27 research outputs found

    popSTR2 enables clinical and population-scale genotyping of microsatellites

    Summary: popSTR2 is an update and augmentation of our previous work ‘popSTR: a population-based microsatellite genotyper’. To make genotyping sensitive to inter-sample differences, we supply a kernel to estimate sample-specific slippage rates. For clinical sequencing purposes, a panel of known pathogenic repeat expansions is provided along with a script that scans for markers indicative of a pathogenic expansion and flags them for manual inspection. Like its predecessor, popSTR2 allows for joint genotyping of samples at a population scale. We now provide a binning method that makes the microsatellite genotypes more amenable to analysis within standard association pipelines and can increase association power. Availability and implementation: https://github.com/DecodeGenetics/popSTR. Supplementary information: Supplementary data are available at Bioinformatics online.

    GraphTyper2 enables population-scale genotyping of structural variation using pangenome graphs

    Analysis of sequence diversity in the human genome is fundamental for genetic studies. Structural variants (SVs) are frequently omitted in sequence analysis studies, although each has a relatively large impact on the genome. Here, we present GraphTyper2, which uses pangenome graphs to genotype SVs and small variants using short reads. Comparison to the syndip benchmark dataset shows that our SV genotyping is sensitive and variant segregation in families demonstrates the accuracy of our approach. We demonstrate that incorporating public assembly data into our pipeline greatly improves sensitivity, particularly for large insertions. We validate 6,812 SVs on average per genome using long-read data of 41 Icelanders. We show that GraphTyper2 can simultaneously genotype tens of thousands of whole genomes by characterizing 60 million small variants and half a million SVs in 49,962 Icelanders, including 80 thousand SVs with high confidence. We are grateful to our colleagues from deCODE genetics / Amgen Inc. for their contributions. We also wish to thank all research participants who provided a biological sample to deCODE genetics.

    PopDel identifies medium-size deletions simultaneously in tens of thousands of genomes

    Thousands of genomic structural variants (SVs) segregate in the human population and can impact phenotypic traits and diseases. Their identification in whole-genome sequence data of large cohorts is a major computational challenge. Most current approaches identify SVs in single genomes and afterwards merge the identified variants into a joint call set across many genomes. We describe the approach PopDel, which directly identifies deletions of about 500 to at least 10,000 bp in length in data of many genomes jointly, eliminating the need for subsequent variant merging. PopDel scales to tens of thousands of genomes, as we demonstrate in evaluations on up to 49,962 genomes. We show that PopDel reliably reports common, rare and de novo deletions. On genomes with available high-confidence reference call sets, PopDel shows excellent recall and precision. Genotype inheritance patterns in up to 6,794 trios indicate that genotypes predicted by PopDel are more reliable than those of previous SV callers. Furthermore, PopDel’s running time is competitive with the fastest tested previous tools. The demonstrated scalability and accuracy of PopDel enable routine scans for deletions in large-scale sequencing studies.

    Whole genome characterization of sequence diversity of 15,220 Icelanders

    Understanding of sequence diversity is the cornerstone of analysis of genetic disorders, population genetics, and evolutionary biology. Here, we present an update of our sequencing set to 15,220 Icelanders whom we sequenced to an average genome-wide coverage of 34X. We identified 39,020,168 autosomal variants passing GATK filters: 31,079,378 SNPs and 7,940,790 indels. Calling de novo mutations (DNMs) is a formidable challenge given the high false positive rate in sequencing datasets relative to the mutation rate. Here we addressed this issue by using segregation of alleles in three-generation families. Using this transmission assay, we controlled the false positive rate and identified 108,778 high-quality DNMs. Furthermore, we used our extended family structure and read pair tracing of DNMs to a panel of phased SNPs to determine the parent of origin of 42,961 DNMs.

    The sequences of 150,119 genomes in the UK Biobank

    Detailed knowledge of how diversity in the sequence of the human genome affects phenotypic diversity depends on a comprehensive and reliable characterization of both sequences and phenotypic variation. Over the past decade, insights into this relationship have been obtained from whole-exome sequencing or whole-genome sequencing of large cohorts with rich phenotypic data (1,2). Here we describe the analysis of whole-genome sequencing of 150,119 individuals from the UK Biobank (3). This constitutes a set of high-quality variants, including 585,040,410 single-nucleotide polymorphisms, representing 7.0% of all possible human single-nucleotide polymorphisms, and 58,707,036 indels. This large set of variants allows us to characterize selection based on sequence variation within a population through a depletion rank score of windows along the genome. Depletion rank analysis shows that coding exons represent a small fraction of regions in the genome subject to strong sequence conservation. We define three cohorts within the UK Biobank: a large British Irish cohort, a smaller African cohort and a South Asian cohort. A haplotype reference panel is provided that allows reliable imputation of most variants carried by three or more sequenced individuals. We identified 895,055 structural variants and 2,536,688 microsatellites, groups of variants typically excluded from large-scale whole-genome sequencing studies. Using this formidable new resource, we provide several examples of trait associations for rare variants with large effects not found previously through studies based on whole-exome sequencing and/or imputation.

    The genetic architecture of age-related hearing impairment revealed by genome-wide association analysis.

    Age-related hearing impairment (ARHI) is the most common sensory disorder in older adults. We conducted a genome-wide association meta-analysis of 121,934 ARHI cases and 591,699 controls from Iceland and the UK. We identified 21 novel sequence variants, of which 13 are rare, under either additive or recessive models. Of special interest are a missense variant in LOXHD1 (MAF = 1.96%) and a tandem duplication in FBF1 covering 4 exons (MAF = 0.22%) associating with ARHI (OR = 3.7 for homozygotes, P = 1.7 × 10^-22 and OR = 4.2 for heterozygotes, P = 5.7 × 10^-27, respectively). We constructed an ARHI genetic risk score (GRS) using common variants and showed that a common variant GRS can identify individuals at risk comparable to carriers of rare high penetrance variants. Furthermore, we found that ARHI and tinnitus share genetic causes. This study sheds new light on the genetic architecture of ARHI, through several rare variants in both Mendelian deafness genes and genes not previously linked to hearing.

    A genome-wide meta-analysis yields 46 new loci associating with biomarkers of iron homeostasis

    Bell et al. report 46 new loci associated with biomarkers of iron homeostasis, including ferritin levels, iron binding capacity, and iron saturation, in the Icelandic, Danish and UK populations. The associated loci point to new iron-regulating proteins and important genetic differences between men and women.

    Lipoprotein(a) Concentration and Risks of Cardiovascular Disease and Diabetes

    Background: Lipoprotein(a) [Lp(a)] is a causal risk factor for cardiovascular diseases that has no established therapy. The attribute of Lp(a) that affects cardiovascular risk is not established. Low levels of Lp(a) have been associated with type 2 diabetes (T2D). Objectives: This study investigated whether cardiovascular risk is conferred by Lp(a) molar concentration or apolipoprotein(a) [apo(a)] size, and whether the relationship between Lp(a) and T2D risk is causal. Methods: This was a case-control study of 143,087 Icelanders with genetic information, including 17,715 with coronary artery disease (CAD) and 8,734 with T2D. This study used measured and genetically imputed Lp(a) molar concentration, kringle IV type 2 (KIV-2) repeats (which determine apo(a) size), and a splice variant in LPA associated with small apo(a) but low Lp(a) molar concentration to disentangle the relationship between Lp(a) and cardiovascular risk. Loss-of-function homozygotes and other subjects genetically predicted to have low Lp(a) levels were evaluated to assess the relationship between Lp(a) and T2D. Results: Lp(a) molar concentration was associated dose-dependently with CAD risk, peripheral artery disease, aortic valve stenosis, heart failure, and lifespan. Lp(a) molar concentration fully explained the Lp(a) association with CAD, and there was no residual association with apo(a) size. Homozygous carriers of loss-of-function mutations had little or no Lp(a) and an increased risk of T2D. Conclusions: Molar concentration is the attribute of Lp(a) that affects risk of cardiovascular diseases. Low Lp(a) concentration (bottom 10%) increases T2D risk. Pharmacologic reduction of Lp(a) concentration in the 20% of individuals with the greatest concentration down to the population median is predicted to decrease CAD risk without increasing T2D risk.

    HLA alleles, disease severity, and age associate with T-cell responses following infection with SARS-CoV-2

    Memory T-cell responses following SARS-CoV-2 infection have been extensively investigated but many studies have been small with a limited range of disease severity. Here we analyze SARS-CoV-2 reactive T-cell responses in 768 convalescent SARS-CoV-2-infected (cases) and 500 uninfected (controls) Icelanders. The T-cell responses are stable three to eight months after SARS-CoV-2 infection, irrespective of disease severity, and even those with the mildest symptoms induce broad and persistent T-cell responses. Robust CD4+ T-cell responses are detected against all measured proteins (M, N, S and S1) while the N protein induces the strongest CD8+ T-cell responses. CD4+ T-cell responses correlate with disease severity, humoral responses and age, whereas CD8+ T-cell responses correlate with age and functional antibodies. Further, CD8+ T-cell responses associate with several class I HLA alleles. Our results provide new insight into HLA restriction of CD8+ T-cell immunity and other factors contributing to heterogeneity of T-cell responses following SARS-CoV-2 infection. Funding Information: We thank all of the participants that contributed samples for this study for their invaluable contribution to the research. We also thank our research staff at the Patient Recruitment Center for their thorough work. Publisher Copyright: © 2022, The Author(s).

    Genome-wide association identifies seven loci for pelvic organ prolapse in Iceland and the UK Biobank.

    Pelvic organ prolapse (POP) is a downward descent of one or more of the pelvic organs, resulting in a protrusion of the vaginal wall and/or uterus. We performed a genome-wide association study of POP using data from Iceland and the UK Biobank, a total of 15,010 cases with hospital-based diagnosis code and 340,734 female controls, and found eight sequence variants at seven loci associating with POP (P < 5 × 10^-8); seven common (minor allele frequency > 5%) and one with minor allele frequency of 4.87%. Some of the variants associating with POP also associated with traits of similar pathophysiology. Of these, rs3820282, which may alter the estrogen-based regulation of WNT4, also associates with leiomyoma of uterus, gestational duration and endometriosis. The variant rs3791675 at EFEMP1, a gene involved in connective tissue homeostasis, also associates with hernias and carpal tunnel syndrome. Our results highlight the role of connective tissue metabolism and estrogen exposure in the etiology of POP.