## Extreme inbreeding in a European ancestry sample from the contemporary UK population

### Prevalence of EI in European descents from the UKB

We previously identified^{22} 456,426 individuals of European ancestry among the 487,409 UKB participants who have been genotyped. Ancestry was called in our previous study using projected principal components analysis based on known ancestry and whole-genome sequence data from 2504 participants of the 1000 Genomes Project^{23} (Methods). Given that 12 participants had retracted consent, we only analysed 456,414 UKB participants in the present study. We used 301,412 quality-controlled genotyped single-nucleotide polymorphisms (SNPs) to call ROHs using the PLINK software (Methods). As in previous studies^{4,8,10}, ROHs were defined as homozygous >1.5 Mb long genomic segments (Methods). We then estimated for each study participant the percentage of their autosome comprising ROHs as a measure of inbreeding. Such inbreeding measure, hereafter denoted *F*_{ROH}, is a well-established predictor of pedigree inbreeding^{24,25}. Following guidelines from the American College of Medical Genetics and Genomics (ACMG)^{26,27}, EI was called for individuals with *F*_{ROH} > 0.1. The use of both *F*_{ROH} as a measure of inbreeding and of this threshold are recommended by the ACMG for detecting suspected consanguinity between parents.

We thus identified 125 unrelated participants (65 males and 60 females) whose genomes are consistent with their parents being first- or second-degree relatives. That represents a prevalence of EI ~0.03%, i.e., ~1/3652 (95% confidence interval—CI_{95%}: [1/4428–1/3106]). As a sensitivity analysis, and consistent with theory predicting much longer ROHs under EI, we re-estimated the prevalence of EI considering only ROHs > 2 Mb or >5 Mb long. Using these alternative definitions of ROH, also recommended in the ACMG guidelines, we detected 115 (prevalence of ~1/3969; CI_{95%}: [1/4857–1/3355]) and 98 (prevalence of ~1/4658; CI_{95%}: [1/5807–1/3887]) cases of EI, respectively. We also estimated the prevalence of EI using allele-frequency based inbreeding measures or using ROHs detected on both autosome and X-chromosomes of female participants. (Supplementary Table 1). Given that the latter estimates of the prevalence of EI are not statistically distinct (paired *t* test: *p* > 0.05) from our first estimate based on ROHs > 1.5 Mb, we will hereafter only consider ROHs > 1.5 Mb.

We then compared our estimate of the prevalence of EI with the prevalence of incest offences reported in the CSEW between April 2002 and March 2017. That survey reports a total of 11,196 cases of police-recorded incest offences over this time period (URLs). Relative to the population of England and Wales, which varied from 52,602,200 to 58,744,600 between those years (URLs), this represents a prevalence ranging from ~1/5247 (CI_{95%}: [1/5346–1/5151]) to 1/4699 (CI_{95%}: [1/4787–1/4612]). The latter estimate is of the same order of magnitude as our estimated prevalence of EI in the UKB although these two estimates are based on different time periods (births 1938–1967 in the UKB vs. reports 2002–2017 in the CSEW). We then compared the mean years of birth among EI cases with the rest of UKB participants and found no statistical difference (*p* = 0.11). That suggests that the prevalence of EI is relatively unchanged over time although mean inbreeding coefficients have significantly decreased over the years (correlation between year of birth and *F*_{ROH}: *r* = −0.01%; Pearson’s correlation test *p* = 5.5 × 10^{−14}). However, it is important to note that the prevalence of EI and that of police-recorded incest offences cannot naïvely nor strictly be compared because (i) only an unknown but likely small fraction of incest cases are reported to the police, (ii) not all cases of incest would result in viable offspring as observed in this study, and (iii) viable offspring with severe cognitive impairment due to inbreeding are unlikely to enrol themselves as participants in the UKB.

Fry et al.^{28} previously reported that the UKB is not representative of the entire UK population, as it notably, includes healthier and more educated participants than the average population. Such an ascertainment on traits which are negatively correlated with inbreeding (e.g., educational attainment (EA) or height^{28}), may lead the prevalence of EI in the UKB to be an underestimation of the actual prevalence of EI in the UK population. As a consequence, our estimate of prevalence of EI is likely conservative, although the magnitude of the underestimation is difficult to predict as it depends on many other unknown factors which might differ between UKB participants and the general population.

### Deconvolution of underlying mating types

We next estimated the proportion of EI cases born from mating between first-degree relatives (mating type 1; MT1) vs. second-degree relatives (MT2) using a threshold-based approach based on *F*_{ROH}. To determine an optimal threshold, we simulated inbreeding under MT1 and MT2 using phased genotypes from 972 unrelated UKB participants (Methods, Supplementary Table 2). These 972 UKB participants are the offspring from 972 independent parent–offspring (PO) trios identified in the UKB^{29}. Over ~20,000 simulation replicates (one replicate is one simulated EI case) we found that *F*_{ROH} as a predictor of underlying mating type (MT1 vs. MT2) yields an area under the receiver operating characteristic curve (AUC) of ~0.97 and that using *F*_{ROH} > 0.17 as a threshold yields optimal sensitivity and specificity both >0.92 (Fig. 1). Using this threshold, we therefore identified 54/125 (i.e., ~43.2%) EI UKB cases whose parents are most likely first-degree relatives. It is worth noting that complex inbreeding loops between second degree-relatives may also lead to extreme values of *F*_{ROH}. However, mating between first-degree relatives remains a more parsimonious explanation of the empirical observations, in particular in a population of European ancestry where such complex inbreeding loops are uncommon.

We further attempted to quantify the proportion of MT1 born from PO vs. FS mating (*π*_{PO/FS}). Given that the theoretical expectation of *F*_{ROH} is 0.25 both under PO and FS mating, we found, as expected, that *F*_{ROH} alone cannot discriminate PO from FS in our simulations (AUC of ~0.5).

However, *F*_{ROH} being proportional to the cumulative length of ROHs across the genome also implies that the same value of *F*_{ROH} could reflect either fewer larger ROHs, or more smaller ROHs. Therefore, we investigated if the numbers of ROHs detected (*N*_{ROH}) under PO or FS mating are different and if so can discriminate those two types of mating. We found on average over ~20,000 simulation replicates that *N*_{ROH} ~45 ROHs are detected in offspring of FS mating as compared to *N*_{ROH} ~38 ROHs detected in offspring of PO mating (Table 1). Moreover, ROHs detected in offspring of PO mating were on average ~2.7 Mb longer than ROHs detected under FS mating (Table 1). Consistent with these observations, we found that *N*_{ROH} as a predictor of mating type yields a discriminative AUC of ~0.81, with the optimal threshold of >41 yielding a sensitivity of ~0.77 and specificity of ~0.69. Using that threshold we predict that 24/54 (i.e., *π*_{PO/FS} ~44.4%; CI_{95%}: [31.2–57.7%]) EI cases with *F*_{ROH} > 0.17 are likely offspring of parent–offspring mating. We also considered an alternative approach that aims at directly estimating the proportion of EI cases born from PO vs. FS mating from modelling the length distribution of ROHs (Methods). We applied this method to 2244 ROHs segments detected in 54 EI cases with *F*_{ROH} > 0.17 and estimated that *π*_{PO/FS} ~67.6% (CI_{95%}: [45.2–90.1%]). To confirm this finding, we analysed the distribution of *F*_{ROH} from X-chromosome ROHs (hereafter denoted *F*_{ROH-X}) in 26 female EI cases with *F*_{ROH} > 0.17. This analysis is justified by the fact that the theoretical expectation of *F*_{ROH-X} equals 0.5 under PO mating vs. 0.25 under FS mating. We first stratified these 26 female EI cases into two groups (Group 1 and Group 2) depending on whether the likelihood of their autosomal segments lengths is larger under PO mating or under FS mating. More specifically, Group 1 (*N* = 10) and Group 2 (*N* = 16) contain female EI cases predicted to be offspring of FS and PO, respectively (Supplementary Fig. 1) from the length distribution of autosomal ROHs. The mean *F*_{ROH-X} in Group 2 is 0.53 (CI_{95%}: [0.41–0.65]), consistent with PO mating, while the mean *F*_{ROH-X} in Group 1 is 0.34 (CI_{95%}: [0.19–0.49]), which is consistent with FS mating, although standard errors are large. Altogether, we found that between 44.4 and 67.6% of EI cases with *F*_{ROH} > 0.17 are likely offspring of PO mating.

We simulated inbreeding between first-cousins (hereafter denoted MT3) in order to quantify the ability of the *F*_{ROH} > 0.1 threshold recommended by the ACMG guidelines to discriminate MT1 or MT2 from MT3. We recall here that the coefficient of relationship between first-cousins is 0.125, and therefore the expected inbreeding coefficient of their offspring is E[*F*_{ROH}] = 0.5 × 0.125 = 0.0625. Also, MT3 is legal in most countries and thus more common in the population. We found over ~20,000 simulation replicates that *F*_{ROH} yields an AUC of ~0.95, and that using *F*_{ROH} > 0.1 as a threshold yields a sensitivity of ~0.94 and a specificity of ~0.79 to discriminate MT1 or MT2 from MT3 (Fig. 1). This, therefore, suggests that ~8/125 EI cases identified (i.e., ~6.4%) in this study could in fact be offspring of first-cousins mating. Hill and Weir^{30} derived that the theoretical standard deviation of inbreeding coeffcients of offspring of first-cousins is ~0.024. Therefore, assuming under MT3 that *F*_{ROH} is normally distributed with mean 0.0625 and standard deviation 0.024, follows that the probability of *F*_{ROH} > 0.1 equals ~5.9%, which is consistent with our simulations.

### Distribution of ROH in EI cases

As expected, we found that EI cases harboured significantly more and significantly longer ROHs than EI controls (*F*_{ROH} < 0.01) in the population (Table 1). On average, we detected *N*_{ROH} ~33.6 ROHs in EI cases vs. ~4.9 ROHs in EI controls. The mean length of ROHs was *L*_{ROH} ~14.8 Mb in EI cases vs. ~2.1 Mb in EI controls. Both mean numbers and mean lengths of ROHs detected are consistent with our simulations of EI (mean *N*_{ROH} ~33.6 and *L*_{ROH} ~14.0; Table 1). We represent in Fig. 2 the histogram of ROHs length in EI cases, and report in Fig. 3, a few examples of very large ROHs (>100 Mb) covering ~50% of an entire chromosome. We also report X-chromosome ROHs detected 54/125 female EI cases in Supplementary Fig. 2.

Previous theoretical studies have often considered the length of genomic segments homozygous by descent (HBD) to follow an exponential distribution^{31,32}. These studies generally relied on specific assumptions regarding recombination map functions, like Haldane or Kosambi map functions, which yield tractable algebraical simplifications. However, empirical evidence supporting these assumptions remains limited. Moreover, some of these simplifying assumptions like that of independence between the lengths and the numbers of HBD segments have also been criticised^{33}. Here, we used an empirical approach to estimate the length distribution of ROHs segments detected in EI cases using a mixture of exponential distributions. Given that only ROHs larger than 1.5 Mb were detected, we modelled the distribution of lengths minus 1.5 Mb and not directly the length distribution, which would better fit a mixture of truncated exponential distributions (Methods). Mixtures of exponential distributions represent a flexible family of probability distributions, from which the exponential distribution is a special case. We selected the number of mixture components best fitting the data using the Bayesian Information Criterion (BIC). To calibrate our inference, we first estimated the length distribution of >282,635 simulated true HBD segments under various mating types. Our simulations are based on observed recombination maps from the 1000 Genomes Project^{23}, and therefore do not make additional assumptions regarding recombination rates (Methods). We found for all simulated mating types that BIC selects two mixture components, which suggests that the single exponential distribution is likely too simple to characterise the length distribution of HBD segments. Of note, mixtures of two exponential distributions also yield a better fit than gamma distributions that have previously also been proposed^{1}. Similarly, we estimated the length distribution of >99,794 ROHs detected in our simulated data. We found consistently that the length distribution of simulated ROHs is also well characterised by a mixture of two exponential distributions. We report in Table 2, the parameters of the mixture distributions estimated from true HBD segments and from ROHs. We then estimated the length distribution of the 4196 ROHs detected over all EI cases. We found this distribution to fit a 84:16 mixture of exponential distributions with means ~15.7 Mb (larger component) and ~0.7 Mb (smaller component), respectively (Table 2; Fig. 2). Overall, our findings suggest that the length distribution of HBD segments and ROHs can be well approximated with a mixture of two exponential distributions.

Another observation in our simulations was that that the mean number of ROHs detected in an individual was larger than the number of true HBD segments simulated. This somewhat counterintuitive observation is explained by the fact that HBD were defined as segments identical-by-descent (from parents to offspring), while ROHs were re-estimated from the genotypes of simulated offspring. As a consequence, although simulated offspring of matings between unrelated parents have exactly zero HBD segments, they still harbour ROHs > 1.5 Mb given that their chromosomes were sampled from 972 existing UKB participants. Despite not being closely related (genomic relationship (GRM) < 0.05), these 972 UKB participants are still likely to have a distant common ancestor (>25 generations ago), which would lead to detection of ROHs > 1.5 Mb in their (simulated) offspring. We found that simulated offspring of matings between unrelated parents had on average 4.8 ROHs > 1.5 Mb (Table 1). If we subtract that number (i.e., 4.8 ROHs) from the mean number of ROHs detected under simulated inbred matings (Tables 1 and 2), we now find very consistent mean numbers of ROHs and HBD segments per individual. More specifically, for each simulated inbred mating we find, after this correction, 32.5 HBD vs. 33.3 ROH for PO mating, 41.6 HBD vs. 40.4 ROH for fullsibs mating, 20.8 HBD vs. 20.2 ROH for HSs mating, 25.2 HBD vs. 23.5 ROH for avuncular mating, 20.8 vs. 20.1 ROH for grandparent–grandchild mating, 29.8 vs. 26.8 ROH for double-first cousin mating and 14.9 HBD vs. 13.3 ROH for first-cousin mating.

### Phenotypic consequences of EI

We quantified the consequences of EI on multiple traits measured in the UKB. We first analysed ten control traits with prior evidence of inbreeding depression^{4,8,10,13}. Those ten traits are height, hip-to-waist ratio (HWR), handgrip strength (HGS; average of left and right hand), lung function measured as the peak expiratory flow (PEF), visual acuity (VA), auditory acuity (AA), number of years of education (EA), fluid intelligence score (FIS), cognitive function measured as the mean time to correctly identify matches (MTCIM) and fertility measured as the number of children (NCh). We performed linear regressions of these traits on the EI status adjusted for age at recruitment, recruitment centre (treated as a categorical factor), sex, year of birth (treated as a continuous variable), genotyping batch (treated as a factor), socioeconomic status measured by the Townsend deprivation index and population structure measured by ten genetic principal components estimated from HM3 SNPs. As expected, we found that EI cases had a reduced mean in these ten traits as compared to EI controls. More specifically, we found phenotypic means in EI cases to be between 0.3 and 0.7 standard deviation below the population mean (Table 3). Note, that under normality assumptions, between ~25 and ~40% of the population has a phenotype below 0.7 and 0.3 standard deviations below the mean, respectively. Despite the small sample size of 125 EI cases, the reduction was statistically significant (Wald-test *p* < 0.5/10 = 0.005) for 7 out the 10 traits (Table 3). We also specifically estimated the inbreeding load (often denoted *B*), which represents the number of loci with deleterious alleles that would cause one death on average if made homozygous^{3}. As previously recommended^{34}, we estimated *B* using Poisson regression of the number of children engendered onto *F*_{ROH}. Poisson regression was performed using a logarithmic link function as also previously recommended^{34} and adjusted for the same covariates listed above. For this analysis, we used the entire distribution of *F*_{ROH}, (i.e., includes both EI cases and EI controls) and found an estimate of *B* ~1.46 (CI_{95%}: [0.87–2.05]; Wald-test *p* = 1.3 × 10^{−6}; Table 3). The effect of inbreeding on fertility of the resulting inbred offspring, that we have quantified here, has been previously detected in humans^{35}. However, the latter study did not provide an estimate of inbreeding load that can be directly compared with ours. Nonetheless, we found that our estimate is consistent with estimates of inbreeding load on survival of offspring from inbred mating in humans^{3,36} and other species^{34,37}, although these are different traits.

We then assessed whether the observed reduction in these ten traits was consistent with inbreeding depression quantified within EI controls. Under the assumption that inbreeding depression results only from directional dominance effects of deleterious alleles or heterozygote advantage (overdominance), phenotypes are expected to decline linearly with increased inbreeding. However, if epistasis contributes to inbreeding depression^{38,39} or if causal variants for inbreeding depression are rarer^{1}, a nonlinear relationship could be observed in particular for large inbreeding coefficients. To test this hypothesis we first estimated inbreeding depression in 345,276 EI controls unrelated with each other and unrelated with the 125 EI cases. For each of the 10 control traits, we then compared the phenotypic mean in the 125 EI cases, with a linear prediction based on the estimate of inbreeding depression in EI controls. For this analysis inbreeding depression was also estimated using an alternative inbreeding measure (*F*_{UNI}), which we previously showed to be more powerful for detecting inbreeding depression^{4}. The latter analysis did not reveal a significant deviation from the linear prediction (Wald-test *p* > 0.005) regardless of the inbreeding measure used, which therefore underlines that the observed phenotypic reduction in EI cases is consistent with inbreeding depression observed within EI controls (Fig. 4). This also suggests that causal variants contributing to inbreeding depression in those traits are likely well-tagged (i.e., correlated) by common variants in the population. However, we acknowledge that the estimate of inbreeding depression from the EI cases present in the UKB might be too low if, as seems plausible, they are a relatively healthy sample from the population of all EI cases in the UK^{28}.

We next analysed the number of diseases diagnosed in an individual as an overall measure of health (Methods). We used overdispersed Poisson regression to estimate the relative risk (RR) of being diagnosed with at least one disease in EI cases as compared to EI controls. We found a RR of ~1.44 (Wald-test *p* = 3.6 × 10^{−5}; Table 3). To minimise potential biases due to partial or differential disease reporting between UKB participants, we re-estimated RR in individuals with at least one disease diagnosed. This analysis included only 110 of the 125 EI cases identified and similarly showed a reduced but still significant RR ~1.34 (Wald-test *p* = 4.4 × 10^{−4}; Table 3). In summary, we confirm that EI produces offspring with reduced stature (height), cognitive function (EA, FIS, and MTCIM), AA, muscular fitness (HGS), and lung function (PEF), consistent with a linear decline in these traits as inbreeding increases. We also provide additional evidence that offspring resulting from EI have increased risk for developing any type of disease.

### Social context of EI cases

We tested the association between EI and the Townsend depression index, which quantifies the level of socioeconomic deprivation in areas where UKB participants live. We found significant evidence that EI is enriched in more socioeconomically deprived area (odds ratio: 1.22; CI_{95%}: [1.16–1.29]; Wald-test *p* = 2.6 × 10^{−13}), consistent with a previous study^{13}, which reported association between *F*_{ROH} and the same index in the UKB.

We further investigated the social contexts in which EI arose. For that we compared different characteristics of the parents of EI cases with that of the parents of EI controls. We found that 14.5% (i.e., 18/124, 1 missing value) of EI cases vs. 1.5% of controls reported to be adopted as a child (Fisher exact test *p* = 7.3 × 10^{−13}). Given the significance of this difference we therefore focused all subsequent comparisons in nonadopted participants (106 EI cases vs. 339,241 EI controls) in order to minimise biases due to differential reporting of parental traits.

Previous studies^{40} have suggested that low EA of parents could be a cause of inbreeding in the population. Given that EA of parents of UKB participants has not been measured, we therefore tested this hypothesis by comparing mean genetic predictors of EA in UKB participants between EI cases and EI controls. Note that mean genetic predictor of EA is an estimate of the parental average for this trait. We found no statistical evidence that the mean genetic predictor of EA in EI cases deviate from that of EI controls (*t* test *p* = 0.538; Table 3). In fact, the mean genetic predictor of EA in EI cases approximately equals the median of the EA genetic predictor distribution in EI controls, which highlights that EI cases are not outliers on this scale. Besides EA, we then used overdispersed Poisson regression to compare the number of diseases (Online method) reported in parents of EI cases vs. parents of controls, which we used as another proxy for socioeconomic status of parents. We found no significant evidence that parents of EI cases are enriched for comorbidities as compared to parents of EI controls (RR ~0.96; Wald-test *p* = 0.507; Table 3). However, this observation must be interpreted with caution as it may simply reflect that EI cases observed in the UKB may be from more healthier background as compared to EI in the general population. Although additional information on parents of UKB participants was available (i.e., age of parents or age when parent died), missing values rates were often too large (>50%) among EI cases to draw reliable inference. Finally, we investigated if EI cases were geographically clustered, but found no significant association between EI and birth location (North coordinate: Wald-test *p* = 0.15; East-coordinate: Wald-test *p* = 0.08). Note that the absence of geographical clustering that we report only applies to these extreme events and could also reflect lack of statistical power as we still observed variance in mean *F*_{ROH} between different geographical areas of the UK. Altogether, although we observed that EI is more prevalent in more socially deprived areas of the UK, our results point to an absence of evidence that social and geographical stratification of parents contribute to the prevalence of EI in the population.

**Original Source**