    Robust, flexible, and scalable tests for Hardy-Weinberg Equilibrium across diverse ancestries

    Traditional Hardy-Weinberg equilibrium (HWE) tests (the χ2 test and the exact test) have long been used as a metric for evaluating genotype quality, because technical artifacts that lead to incorrect genotype calls can often be identified as deviations from HWE. However, in datasets composed of individuals from diverse ancestries, HWE can be violated even without genotyping error, which complicates the use of HWE testing to assess genotype data quality. In this manuscript, we present the Robust Unified Test for HWE (RUTH), which tests for HWE while accounting for population structure and genotype uncertainty, and we evaluate the impact of population heterogeneity and genotype uncertainty on the standard HWE tests and alternative methods using simulated and real sequence datasets. Our results demonstrate that ignoring population structure or genotype uncertainty in HWE tests can inflate false positive rates by many orders of magnitude. Our evaluations reveal different trade-offs between false positives and statistical power across the methods, with RUTH consistently among the best in all evaluations. RUTH is implemented as a practical and scalable software tool that rapidly performs HWE tests across millions of markers and hundreds of thousands of individuals while supporting the standard VCF/BCF formats. RUTH is publicly available at https://www.github.com/statgen/ruth
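    For reference, the traditional χ2 test mentioned above fits in a few lines. The sketch below is the textbook biallelic 1-df test on hard genotype counts, not RUTH: it deliberately ignores population structure and genotype uncertainty, which is why it can be miscalibrated in ancestrally diverse samples. The function name hwe_chi2 is hypothetical.

```python
import math


def hwe_chi2(n_hom_ref: int, n_het: int, n_hom_alt: int) -> tuple[float, float]:
    """Classic 1-df chi-square HWE test from hard genotype counts.

    Illustrative sketch only: assumes one homogeneous population and
    error-free genotype calls (the assumptions RUTH relaxes).
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic site: HWE test undefined")
    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (n * p * p, 2 * n * p * q, n * q * q)  # HWE proportions p^2, 2pq, q^2
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

    For example, counts of 25/50/25 fit HWE exactly (χ2 = 0), whereas 50/0/50, with no heterozygotes at all, give χ2 = 100.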

    Telomere length is not a main factor for the development of islet autoimmunity and type 1 diabetes in the TEDDY study

    The Environmental Determinants of Diabetes in the Young (TEDDY) study enrolled 8676 children, 3-4 months of age, born with HLA-susceptibility genotypes for islet autoimmunity (IA) and type 1 diabetes (T1D). Whole-genome sequencing (WGS) was performed in 1119 children in a nested case-control study design. Telomere length was estimated from WGS data using five tools: Computel, Telseq, Telomerecat, qMotif and Motif_counter. The estimated median telomere length was 5.10 kb (IQR 4.52-5.68 kb) using Computel. The age at which the blood sample was drawn had a significant negative correlation with telomere length (P = 0.003). European children, particularly those from Finland (P = 0.041) and Sweden (P = 0.001), had shorter telomeres than children from the U.S.A. Paternal age (P = 0.019) was positively associated with telomere length. First-degree relative status, presence of gestational diabetes in the mother, and maternal age did not have a significant impact on estimated telomere length. HLA-DR4/4 or HLA-DR4/X children had significantly longer telomeres than children with HLA-DR3/3 or HLA-DR3/9 haplogenotypes (P = 0.008). Estimated telomere length did not differ significantly with respect to any IA (P = 0.377), IAA-first (P = 0.248), GADA-first (P = 0.248) or T1D (P = 0.861). These results suggest that telomere length has no major impact on the risk for IA, the first step toward T1D. Nevertheless, telomere length was shorter in the populations with high T1D prevalence, Finland and Sweden. Funder: Lund University.

    SARS-CoV-2 susceptibility and COVID-19 disease severity are associated with genetic variants affecting gene expression in a variety of tissues

    Variability in SARS-CoV-2 susceptibility and COVID-19 disease severity between individuals is partly due to genetic factors. Here, we identify 4 genomic loci with suggestive associations for SARS-CoV-2 susceptibility and 19 for COVID-19 disease severity. Four of these 23 loci likely have an ethnicity-specific component. Genome-wide association study (GWAS) signals in 11 loci colocalize with expression quantitative trait loci (eQTLs) associated with the expression of 20 genes in 62 tissues/cell types (range: 1 to 43 tissues per gene), including lung, brain, heart, muscle, and skin as well as the digestive system and immune system. We perform genetic fine mapping to compute 99% credible SNP sets, which identify 10 GWAS loci that have eight or fewer SNPs in the credible set, including three loci with a single likely causal SNP. Our study suggests that the diverse symptoms and disease severity of COVID-19 observed between individuals are associated with variants across the genome that affect gene expression levels in a wide variety of tissue types.
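    The abstract does not say which fine-mapping method produced the 99% credible sets. One common construction, assuming a single causal variant per locus and equal priors (for example with approximate Bayes factors), ranks SNPs by posterior probability and keeps adding them until the cumulative posterior reaches 0.99. The sketch below assumes that approach; the function name and the log Bayes factor inputs are hypothetical.

```python
import math


def credible_set(snp_ids: list[str], log_bf: list[float], coverage: float = 0.99):
    """Smallest set of SNPs whose posterior probabilities sum to >= coverage.

    Assumes one causal variant in the locus and equal priors, so each SNP's
    posterior is its Bayes factor divided by the sum over the locus.
    """
    if not snp_ids or len(snp_ids) != len(log_bf):
        raise ValueError("need matching, non-empty SNP and log-BF lists")
    m = max(log_bf)
    weights = [math.exp(x - m) for x in log_bf]  # subtract max for numerical stability
    total = sum(weights)
    posteriors = [w / total for w in weights]
    ranked = sorted(zip(snp_ids, posteriors), key=lambda t: t[1], reverse=True)
    chosen, cumulative = [], 0.0
    for snp, pp in ranked:
        chosen.append((snp, pp))
        cumulative += pp
        if cumulative >= coverage:
            break
    return chosen
```

    A locus dominated by one SNP yields a one-SNP set, like the three single-SNP loci reported above; a flat signal spreads the posterior and the set grows.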

    A first update on mapping the human genetic architecture of COVID-19

    Peer reviewed

    Exploring various polygenic risk scores for skin cancer in the phenomes of the Michigan genomics initiative and the UK Biobank with a visual catalog: PRSWeb.

    Polygenic risk scores (PRS) are designed to serve as single summary measures that are easy to construct, condensing information from a large number of genetic variants associated with a disease. They have been used for stratification and prediction of disease risk. The primary focus of this paper is to demonstrate how PRS and electronic health record (EHR) data can be combined to better understand the shared and unique genetic architecture and etiology of disease subtypes that may be both related and heterogeneous. PRS construction strategies often depend on the purpose of the study, the available data or summary estimates, and the underlying genetic architecture of a disease. We consider several choices for constructing a PRS using data obtained from various publicly available sources, including the UK Biobank, and evaluate their ability to predict not just the primary phenotype but also secondary phenotypes derived from EHR. This study was conducted using data from 30,702 unrelated, genotyped patients of recent European descent from the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort within Michigan Medicine. We examine the three most common skin cancer subtypes in the USA: basal cell carcinoma, cutaneous squamous cell carcinoma, and melanoma. Using these PRS for the various skin cancer subtypes, we conduct a phenome-wide association study (PheWAS) within the MGI data to evaluate PRS associations with secondary traits. PheWAS results are then replicated using population-based UK Biobank data and compared across PRS construction methods. We develop an accompanying visual catalog called PRSweb that provides detailed PheWAS results and allows users to directly compare different PRS construction methods.
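    In its simplest form, a PRS is a weighted sum of a person's effect-allele dosages, PRS = Σ_j β_j g_j, with the weights β_j taken from GWAS summary statistics. The construction strategies compared in the paper differ mainly in which variants are included and how the weights are chosen; the sketch below shows only the final scoring step and assumes the weights are already given. The function name is hypothetical, and skipping variants missing from a person's data is one simple convention among several.

```python
def polygenic_risk_score(dosages: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0..2) over variants in `weights`.

    Sketch only: variants absent from `dosages` are skipped rather than
    mean-imputed; real pipelines may handle missingness differently.
    """
    score = 0.0
    for variant, beta in weights.items():
        g = dosages.get(variant)
        if g is None:
            continue  # variant not genotyped/imputed for this person
        score += beta * g
    return score
```

    For example, weights {v1: 0.2, v2: -0.1, v3: 0.5} and dosages {v1: 2, v2: 1, v3: 0} give a score of 0.4 - 0.1 + 0 = 0.3.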

    Genome-wide association study of cardiac troponin I in the general population

    Circulating cardiac troponin proteins are associated with structural heart disease and predict incident cardiovascular disease in the general population. However, the genetic contribution to cardiac troponin I (cTnI) concentrations and its causal effect on cardiovascular phenotypes are unclear. We combine data from two large population-based studies, the Trøndelag Health Study and the Generation Scotland Scottish Family Health Study, and perform a genome-wide association study of high-sensitivity cTnI concentrations in 48 115 individuals. We further use two-sample Mendelian randomization to investigate the causal effects of circulating cTnI on acute myocardial infarction (AMI) and heart failure (HF). We identified 12 genetic loci (8 novel) associated with cTnI concentrations. Associated protein-altering variants highlighted putative functional genes: CAND2, HABP2, ANO5, APOH, FHOD3, TNFAIP2, KLKB1 and LMAN1. Phenome-wide association tests in 1688 phecodes and 83 continuous traits in UK Biobank showed associations between a genetic risk score for cTnI and cardiac arrhythmias, metabolic and anthropometric measures. Using two-sample Mendelian randomization, we confirmed the non-causal role of cTnI in AMI (5948 cases, 355 246 controls). We found indications of a causal role of cTnI in HF (47 309 cases and 930 014 controls), but this was not supported by secondary analyses using left ventricular mass as the outcome (18 257 individuals). Our findings clarify the biology underlying the heritable contribution to circulating cTnI and support cTnI as a non-causal biomarker for AMI in the general population. Using genetically informed methods for causal inference helps inform the role and value of measuring cTnI in the general population.
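    The abstract does not name its Mendelian randomization estimator. As an illustration only, the widely used fixed-effect inverse-variance weighted (IVW) estimator combines per-variant Wald ratios β_Y/β_X, where the variant-exposure effects (β_X, here cTnI) and variant-outcome effects (β_Y, e.g. AMI or HF) come from two different samples, weighting each by its first-order precision β_X²/σ_Y². The function name and inputs below are hypothetical.

```python
import math


def ivw_estimate(beta_exposure: list[float], beta_outcome: list[float],
                 se_outcome: list[float]) -> tuple[float, float]:
    """Fixed-effect IVW causal estimate and its standard error.

    Uses first-order weights (ignores uncertainty in beta_exposure) and
    assumes the instruments are independent and valid.
    """
    if not beta_exposure or not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("need matching, non-empty summary-statistic lists")
    num = sum(bx * by / se ** 2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    estimate = num / den  # weighted average of the Wald ratios by / bx
    std_err = math.sqrt(1.0 / den)
    return estimate, std_err
```

    If every instrument's outcome effect is exactly half its exposure effect, the estimate is 0.5 regardless of the weights.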

    Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program

    © 2021, The Author(s). The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes)1. In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
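    Summary figures such as "97% have frequencies of less than 1%" and "46% are singletons" are proportions computed over per-variant allele counts. A minimal sketch, assuming "frequency" means minor allele frequency across the 2N sampled chromosomes and "singleton" means an alternate allele count of exactly 1; these definitions and the function name are assumptions for illustration, not TOPMed's actual pipeline.

```python
def summarize_rarity(alt_allele_counts: list[int], n_samples: int) -> tuple[float, float]:
    """Return (fraction with MAF < 1%, fraction of singletons) for diploid samples."""
    if not alt_allele_counts or n_samples <= 0:
        raise ValueError("need at least one variant and one sample")
    allele_number = 2 * n_samples  # chromosomes sampled (diploid assumption)
    n_variants = len(alt_allele_counts)
    rare = sum(
        1 for ac in alt_allele_counts
        if min(ac, allele_number - ac) / allele_number < 0.01  # minor allele frequency
    )
    singletons = sum(1 for ac in alt_allele_counts if ac == 1)  # one copy in one person
    return rare / n_variants, singletons / n_variants
```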