4 research outputs found

    Microsatellites; genotyping, mutation rate and effect on disease

    Microsatellites are polymorphic tracts of short tandem repeats (STRs) with motifs of one to six base pairs (bp), and they account for around 3% of the human genome. Much like copying by hand a text in which the same word occurs many times in a row, the replication of microsatellites is error-prone and frequently adds or removes one or more copies of the repeat motif. As a result, microsatellites mutate several orders of magnitude faster than unique genomic sequences, and for a given microsatellite a population can carry many different length variants. The first objective of this study was to implement a method to jointly determine the number of repeats present at each microsatellite in the genome for a large number of samples. The second goal was to make the determination of repeat numbers more computationally efficient while simultaneously increasing the detection sensitivity for heavily expanded microsatellite alleles, known as repeat expansions. Last, the software was run on two large sets of whole-genome-sequenced individuals, one from Iceland and the other from the UK Biobank. Using the genealogy information available for the Icelandic set, de novo mutation events were detected, and the effects of parental sex, age and genotype on the types and number of mutations found in their offspring were estimated.

    About three percent of the human genome consists of microsatellites, which are polymorphic tracts of short tandem repeats in which the repeated motif is one to six bases long. As when copying a text in which the same word is repeated many times in a row, the risk of error is greater when copying microsatellite sequences than when copying other sequences of the genome, and the result is that a repeat is added or lost relative to the original base sequence. Because of this, microsatellites mutate several orders of magnitude faster than other sequences of the genome, and for a given microsatellite a group of people can carry many different length variants. The first aim of the project was to design and write software that could determine the number of repeats for all microsatellites in the genome in many individuals at once. Next, the algorithm was made faster and at the same time more sensitive to large microsatellite expansion alleles, which can cause many different syndromes in those who carry them. Finally, the software was used to genotype all individuals in two large cohorts, one from Iceland and the other from the United Kingdom. Genealogical information on the Icelandic cohort was used to detect mutations in offspring that were not found in their parents, and these mutations were used to estimate how the age, sex and genotype of parents affect the type and number of mutations they pass on to their offspring.

    GraphTyper2 enables population-scale genotyping of structural variation using pangenome graphs

    Publisher's version (published article). Analysis of sequence diversity in the human genome is fundamental for genetic studies. Structural variants (SVs) are frequently omitted in sequence analysis studies, although each has a relatively large impact on the genome. Here, we present GraphTyper2, which uses pangenome graphs to genotype SVs and small variants using short reads. Comparison to the syndip benchmark dataset shows that our SV genotyping is sensitive, and variant segregation in families demonstrates the accuracy of our approach. We demonstrate that incorporating public assembly data into our pipeline greatly improves sensitivity, particularly for large insertions. We validate 6,812 SVs on average per genome using long-read data of 41 Icelanders. We show that GraphTyper2 can simultaneously genotype tens of thousands of whole genomes by characterizing 60 million small variants and half a million SVs in 49,962 Icelanders, including 80 thousand SVs with high confidence. We are grateful to our colleagues from deCODE genetics / Amgen Inc. for their contributions. We also wish to thank all research participants who provided a biological sample to deCODE genetics. Peer reviewed.

    Whole genome characterization of sequence diversity of 15,220 Icelanders

    Understanding of sequence diversity is the cornerstone of analysis of genetic disorders, population genetics, and evolutionary biology. Here, we present an update of our sequencing set to 15,220 Icelanders whom we sequenced to an average genome-wide coverage of 34X. We identified 39,020,168 autosomal variants passing GATK filters: 31,079,378 SNPs and 7,940,790 indels. Calling de novo mutations (DNMs) is a formidable challenge given the high false positive rate in sequencing datasets relative to the mutation rate. Here we addressed this issue by using segregation of alleles in three-generation families. Using this transmission assay, we controlled the false positive rate and identified 108,778 high-quality DNMs. Furthermore, we used our extended family structure and read-pair tracing of DNMs to a panel of phased SNPs to determine the parent of origin of 42,961 DNMs. Peer reviewed.

    Lipoprotein(a) Concentration and Risks of Cardiovascular Disease and Diabetes

    Publisher's version (published article). Background: Lipoprotein(a) [Lp(a)] is a causal risk factor for cardiovascular diseases that has no established therapy. The attribute of Lp(a) that affects cardiovascular risk is not established. Low levels of Lp(a) have been associated with type 2 diabetes (T2D). Objectives: This study investigated whether cardiovascular risk is conferred by Lp(a) molar concentration or apolipoprotein(a) [apo(a)] size, and whether the relationship between Lp(a) and T2D risk is causal. Methods: This was a case-control study of 143,087 Icelanders with genetic information, including 17,715 with coronary artery disease (CAD) and 8,734 with T2D. This study used measured and genetically imputed Lp(a) molar concentration, kringle IV type 2 (KIV-2) repeats (which determine apo(a) size), and a splice variant in LPA associated with small apo(a) but low Lp(a) molar concentration to disentangle the relationship between Lp(a) and cardiovascular risk. Loss-of-function homozygotes and other subjects genetically predicted to have low Lp(a) levels were evaluated to assess the relationship between Lp(a) and T2D. Results: Lp(a) molar concentration was associated dose-dependently with CAD risk, peripheral artery disease, aortic valve stenosis, heart failure, and lifespan. Lp(a) molar concentration fully explained the Lp(a) association with CAD, and there was no residual association with apo(a) size. Homozygous carriers of loss-of-function mutations had little or no Lp(a) and an increased risk of T2D. Conclusions: Molar concentration is the attribute of Lp(a) that affects risk of cardiovascular diseases. Low Lp(a) concentration (bottom 10%) increases T2D risk. Pharmacologic reduction of Lp(a) concentration in the 20% of individuals with the greatest concentration down to the population median is predicted to decrease CAD risk without increasing T2D risk. Peer reviewed.