16 research outputs found

    The sequences of 150,119 genomes in the UK Biobank

    Detailed knowledge of how diversity in the sequence of the human genome affects phenotypic diversity depends on a comprehensive and reliable characterization of both sequences and phenotypic variation. Over the past decade, insights into this relationship have been obtained from whole-exome sequencing or whole-genome sequencing of large cohorts with rich phenotypic data(1,2). Here we describe the analysis of whole-genome sequencing of 150,119 individuals from the UK Biobank(3). This constitutes a set of high-quality variants, including 585,040,410 single-nucleotide polymorphisms, representing 7.0% of all possible human single-nucleotide polymorphisms, and 58,707,036 indels. This large set of variants allows us to characterize selection based on sequence variation within a population through a depletion rank score of windows along the genome. Depletion rank analysis shows that coding exons represent a small fraction of regions in the genome subject to strong sequence conservation. We define three cohorts within the UK Biobank: a large British Irish cohort, a smaller African cohort and a South Asian cohort. A haplotype reference panel is provided that allows reliable imputation of most variants carried by three or more sequenced individuals. We identified 895,055 structural variants and 2,536,688 microsatellites, groups of variants typically excluded from large-scale whole-genome sequencing studies. Using this formidable new resource, we provide several examples of trait associations for rare variants with large effects not found previously through studies based on whole-exome sequencing and/or imputation

    A genome-wide meta-analysis yields 46 new loci associating with biomarkers of iron homeostasis

    Bell et al. report 46 new loci associated with biomarkers of iron homeostasis, including ferritin levels, iron binding capacity, and iron saturation, in the Icelandic, Danish and UK populations. The associated loci point to new iron-regulating proteins and important genetic differences between men and women

    A genome-wide meta-analysis yields 46 new loci associating with biomarkers of iron homeostasis

    Abstract: Iron is essential for many biological functions and iron deficiency and overload have major health implications. We performed a meta-analysis of three genome-wide association studies from Iceland, the UK and Denmark of blood levels of ferritin (N = 246,139), total iron binding capacity (N = 135,430), iron (N = 163,511) and transferrin saturation (N = 131,471). We found 62 independent sequence variants associating with iron homeostasis parameters at 56 loci, including 46 novel loci. Variants at DUOX2, F5, SLC11A2 and TMPRSS6 associate with iron deficiency anemia, while variants at TF, HFE, TFR2 and TMPRSS6 associate with iron overload. A HBS1L-MYB intergenic region variant associates both with increased risk of iron overload and reduced risk of iron deficiency anemia. The DUOX2 missense variant is present in 14% of the population, associates with all iron homeostasis biomarkers, and increases the risk of iron deficiency anemia by 29%. The associations implicate proteins contributing to the main physiological processes involved in iron homeostasis: iron sensing and storage, inflammation, absorption of iron from the gut, iron recycling, erythropoiesis and bleeding/menstruation

    Graphtyper enables population-scale genotyping using pangenome graphs.

    A fundamental requirement for genetic studies is an accurate determination of sequence variation. While human genome sequence diversity is increasingly well characterized, there is a need for efficient ways to use this knowledge in sequence analysis. Here we present Graphtyper, a publicly available novel algorithm and software for discovering and genotyping sequence variants. Graphtyper realigns short-read sequence data to a pangenome, a variation-aware graph structure that encodes sequence variation within a population by representing possible haplotypes as graph paths. Our results show that Graphtyper is fast, highly scalable, and provides sensitive and accurate genotype calls. Graphtyper genotyped 89.4 million sequence variants in the whole genomes of 28,075 Icelanders using less than 100,000 CPU days, including detailed genotyping of six human leukocyte antigen (HLA) genes. We show that Graphtyper is a valuable tool in characterizing sequence variation in both small and population-scale sequencing studies

    Sequence variation at ANAPC1 accounts for 24% of the variability in corneal endothelial cell density

    The corneal endothelium is crucial for proper vision. Here, Ivarsdottir et al. perform genome-wide association studies for various corneal endothelial cell measurements and find that an intergenic variant near ANAPC1 explains 24% of the variance of endothelial cell density and associates with corneal hysteresis

    Diversity in non-repetitive human sequences not found in the reference genome.

    Genomes usually contain some non-repetitive sequences that are missing from the reference genome and occur only in a population subset. Such non-repetitive, non-reference (NRNR) sequences have remained largely unexplored in terms of their characterization and downstream analyses. Here we describe 3,791 breakpoint-resolved NRNR sequence variants called using PopIns from whole-genome sequence data of 15,219 Icelanders. We found that over 95% of the 244 NRNR sequences that are 200 bp or longer are present in chimpanzees, indicating that they are ancestral. Furthermore, 149 variant loci are in linkage disequilibrium (r² > 0.8) with a genome-wide association study (GWAS) catalog marker, suggesting disease relevance. Additionally, we report an association (P = 3.8 × 10⁻⁸, odds ratio (OR) = 0.92) with myocardial infarction (23,360 cases, 300,771 controls) for a 766-bp NRNR sequence variant. Our results underline the importance of including variation of all complexity levels when searching for variants that associate with disease

    Multiple transmissions of de novo mutations in families.

    De novo mutations (DNMs) cause a large proportion of severe rare diseases of childhood. DNMs that occur early may result in mosaicism of both somatic and germ cells. Such early mutations can cause recurrence of disease. We scanned 1,007 sibling pairs from 251 families and identified 878 DNMs shared by siblings (ssDNMs) at 448 genomic sites. We estimated DNM recurrence probability based on parental mosaicism, sharing of DNMs among siblings, parent-of-origin, mutation type and genomic position. We detected 57.2% of ssDNMs in the parental blood. The recurrence probability of a DNM decreases by 2.27% per year for paternal DNMs and 1.78% per year for maternal DNMs. Maternal ssDNMs are more likely to be T>C mutations than paternal ssDNMs, and less likely to be C>T mutations. Depending on the properties of the DNM, the recurrence probability ranges from 0.011% to 28.5%. We have launched an online calculator to allow estimation of DNM recurrence probability for research purposes

    Lipoprotein(a) Concentration and Risks of Cardiovascular Disease and Diabetes.

    Background: Lipoprotein(a) [Lp(a)] is a causal risk factor for cardiovascular diseases that has no established therapy. The attribute of Lp(a) that affects cardiovascular risk is not established. Low levels of Lp(a) have been associated with type 2 diabetes (T2D). Objectives: This study investigated whether cardiovascular risk is conferred by Lp(a) molar concentration or apolipoprotein(a) [apo(a)] size, and whether the relationship between Lp(a) and T2D risk is causal. Methods: This was a case-control study of 143,087 Icelanders with genetic information, including 17,715 with coronary artery disease (CAD) and 8,734 with T2D. This study used measured and genetically imputed Lp(a) molar concentration, kringle IV type 2 (KIV-2) repeats (which determine apo(a) size), and a splice variant in LPA associated with small apo(a) but low Lp(a) molar concentration to disentangle the relationship between Lp(a) and cardiovascular risk. Loss-of-function homozygotes and other subjects genetically predicted to have low Lp(a) levels were evaluated to assess the relationship between Lp(a) and T2D. Results: Lp(a) molar concentration was associated dose-dependently with CAD risk, peripheral artery disease, aortic valve stenosis, heart failure, and lifespan. Lp(a) molar concentration fully explained the Lp(a) association with CAD, and there was no residual association with apo(a) size. Homozygous carriers of loss-of-function mutations had little or no Lp(a) and were at increased risk of T2D. Conclusions: Molar concentration is the attribute of Lp(a) that affects risk of cardiovascular diseases. Low Lp(a) concentration (bottom 10%) increases T2D risk. Pharmacologic reduction of Lp(a) concentration in the 20% of individuals with the greatest concentration down to the population median is predicted to decrease CAD risk without increasing T2D risk.
    Keywords: Lp(a); Mendelian randomization; coronary artery disease; genetics; type 2 diabetes

    Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits.

    Long-read sequencing (LRS) promises to improve the characterization of structural variants (SVs). We generated LRS data from 3,622 Icelanders and identified a median of 22,636 SVs per individual (a median of 13,353 insertions and 9,474 deletions). We discovered a set of 133,886 reliably genotyped SV alleles and imputed them into 166,281 individuals to explore their effects on diseases and other traits. We discovered an association of a rare deletion in PCSK9 with lower low-density lipoprotein (LDL) cholesterol levels, compared to the population average. We also discovered an association of a multiallelic SV in ACAN with height; we found 11 alleles that differed in the number of a 57-bp-motif repeat and observed a linear relationship between the number of repeats carried and height. These results show that SVs can be accurately characterized at the population scale using LRS data in a genome-wide non-targeted approach and demonstrate how SVs impact phenotypes

    Loss-of-Function Variants in the Tumor-Suppressor Gene PTPN14 Confer Increased Cancer Risk.

    The success of genome-wide association studies (GWAS) in identifying common, low-penetrance variant-cancer associations for the past decade is undisputed. However, discovering additional high-penetrance cancer mutations in unknown cancer predisposing genes requires detection of variant-cancer association of ultra-rare coding variants. Consequently, large-scale next-generation sequence data with associated phenotype information are needed. Here, we used genotype data on 166,281 Icelanders, of whom 49,708 were whole-genome sequenced, and 408,595 individuals from the UK Biobank, of whom 41,147 were whole-exome sequenced, to test for association between loss-of-function burden in autosomal genes and basal cell carcinoma (BCC), the most common cancer in Caucasians. A total of 25,205 BCC cases and 683,058 controls were tested. Rare germline loss-of-function variants in PTPN14 conferred substantial risks of BCC (OR, 8.0; P = 1.9 × 10⁻¹²), with a quarter of carriers getting BCC before age 70 and over half in their lifetime. Furthermore, common variants at the PTPN14 locus were associated with BCC, suggesting PTPN14 as a new, high-impact BCC predisposition gene. A follow-up investigation of 24 cancers and three benign tumor types showed that PTPN14 loss-of-function variants are associated with high risk of cervical cancer (OR, 12.7; P = 1.6 × 10⁻⁴) and low age at diagnosis. Our findings, using power-increasing methods with high-quality rare variant genotypes, highlight future prospects for new discoveries on carcinogenesis.
    SIGNIFICANCE: This study identifies the tumor-suppressor gene PTPN14 as a high-impact BCC predisposition gene and indicates that inactivation of PTPN14 by germline sequence variants may also lead to increased risk of cervical cancer.
    Funding: Common Fund of the Office of the Director of the National Institutes of Health; NIH National Cancer Institute (NCI); NIH National Human Genome Research Institute (NHGRI); NIH National Heart Lung & Blood Institute (NHLBI); NIH National Institute on Drug Abuse (NIDA); NIH National Institute of Mental Health (NIMH); NIH National Institute of Neurological Disorders & Stroke (NINDS)