47 research outputs found

    Conflation of short identity-by-descent segments bias their inferred length distribution

    Identity-by-descent (IBD) is a fundamental concept in genetics with many applications. In a common definition, two haplotypes are said to contain an IBD segment if they share a segment that is inherited from a recent shared common ancestor without intervening recombination. Long IBD segments (> 1cM) can be efficiently detected by a number of algorithms using high-density SNP array data from a population sample. However, these approaches detect IBD based on contiguous segments of identity-by-state, and such segments may exist due to the conflation of smaller, nearby IBD segments. We quantified this effect using coalescent simulations, finding that nearly 40% of inferred segments 1-2cM long are results of conflations of two or more shorter segments, under demographic scenarios typical for modern humans. This biases the inferred IBD segment length distribution, and so can affect downstream inferences. We observed this conflation effect universally across different IBD detection programs and human demographic histories, and found inference of segments longer than 2cM to be much more reliable (less than 5% conflation rate). As an example of how this can negatively affect downstream analyses, we present and analyze a novel estimator of the de novo mutation rate using IBD segments, and demonstrate that the biased length distribution of the IBD segments due to conflation can lead to inflated estimates if the conflation is not modeled. Understanding the conflation effect in detail will make its correction in future methods more tractable.

    KLFDAPC : a supervised machine learning approach for spatial genetic structure analysis

    CSC-University of St Andrews Joint Scholarship (to X.Q.); International Postdoctoral Exchange Fellowship Program (Talent-Introduction Program) from China Postdoc Council (to X.Q.); National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health (grant R35GM142783 to C.W.K.C.). Part of the computation for this work is supported by USC’s Center for Advanced Research Computing (https://carc.usc.edu).
    Geographic patterns of human genetic variation provide important insights into human evolution and disease. A commonly used tool to detect and describe them is principal component analysis (PCA) or the supervised linear discriminant analysis of principal components (DAPC). However, genetic features produced from both approaches could fail to correctly characterize population structure for complex scenarios involving admixture. In this study, we introduce Kernel Local Fisher Discriminant Analysis of Principal Components (KLFDAPC), a supervised non-linear approach for inferring individual geographic genetic structure that could rectify the limitations of these approaches by preserving the multimodal space of samples. We tested the power of KLFDAPC to infer population structure and to predict individual geographic origin using neural networks. Simulation results showed that KLFDAPC has higher discriminatory power than PCA and DAPC. The application of our method to empirical European and East Asian genome-wide genetic datasets indicated that the first two reduced features of KLFDAPC correctly recapitulated the geography of individuals and significantly improved the accuracy of predicting individual geographic origin when compared to PCA and DAPC. Therefore, KLFDAPC can be useful for geographic ancestry inference, design of genome scans and correction for spatial stratification in GWAS that link genes to adaptation or disease susceptibility.

    Mitochondrial genome copy number measured by DNA sequencing in human blood is strongly associated with metabolic traits via cell-type composition differences

    Background: Mitochondrial genome copy number (MT-CN) varies among humans and across tissues and is highly heritable, but its causes and consequences are not well understood. When measured by bulk DNA sequencing in blood, MT-CN may reflect a combination of the number of mitochondria per cell and cell-type composition. Here, we studied MT-CN variation in blood-derived DNA from 19184 Finnish individuals using a combination of genome (N = 4163) and exome sequencing (N = 19034) data as well as imputed genotypes (N = 17718). Results: We identified two loci significantly associated with MT-CN variation: a common variant at the MYB-HBS1L locus (P = 1.6 × 10^-8), which has previously been associated with numerous hematological parameters; and a burden of rare variants in the TMBIM1 gene (P = 3.0 × 10^-8), which has been reported to protect against non-alcoholic fatty liver disease. We also found that MT-CN is strongly associated with insulin levels (P = 2.0 × 10^-21) and other metabolic syndrome (metS)-related traits. Using a Mendelian randomization framework, we show evidence that MT-CN measured in blood is causally related to insulin levels. We then applied an MT-CN polygenic risk score (PRS) derived from Finnish data to the UK Biobank, where the association between the PRS and metS traits was replicated. Adjusting for cell counts largely eliminated these signals, suggesting that MT-CN affects metS via cell-type composition. Conclusion: These results suggest that measurements of MT-CN in blood-derived DNA partially reflect differences in cell-type composition and that these differences are causally linked to insulin and related traits.

    Ultraconserved Elements in the Human Genome: Association and Transmission Analyses of Highly Constrained Single-Nucleotide Polymorphisms

    Ultraconserved elements in the human genome likely harbor important biological functions as they are dosage sensitive and are able to direct tissue-specific expression. Because they are under purifying selection, variants in these elements may have a lower frequency in the population but a higher likelihood of association with complex traits. We tested a set of highly constrained SNPs (hcSNPs) distributed genome-wide among ultraconserved and nearly ultraconserved elements for association with seven traits related to reproductive (age at natural menopause, number of children, age at first child, and age at last child) and overall [longevity, body mass index (BMI), and height] fitness. Using up to 24,047 European-American samples from the National Heart, Lung, and Blood Institute Candidate Gene Association Resource (CARe), we observed an excess of associations with BMI and height. In an independent replication panel the most strongly associated SNPs showed an 8.4-fold enrichment of associations at the nominal level, including three variants in previously identified loci and one in a locus (DENND1A) previously shown to be associated with polycystic ovary syndrome. Finally, using 1430 family trios, we showed that the transmissions from heterozygous parents to offspring of the derived alleles of rare (frequency ≤0.5%) hcSNPs are not biased, particularly after adjusting for the rates of genotype missingness and error in the data. The lack of transmission bias ruled out an immediately and strongly deleterious effect due to the rare derived alleles, consistent with the observation that mice homozygous for the deletion of ultraconserved elements showed no overt phenotype. Our study also illustrated the importance of carefully modeling potential technical confounders when analyzing genotype data of rare variants.

    Concept, Design and Implementation of a Cardiovascular Gene-Centric 50 K SNP Array for Large-Scale Genomic Association Studies

    A wealth of genetic associations for cardiovascular and metabolic phenotypes in humans has been accumulating over the last decade, in particular a large number of loci derived from recent genome wide association studies (GWAS). True complex disease-associated loci often exert modest effects, so their delineation currently requires integration of diverse phenotypic data from large studies to ensure robust meta-analyses. We have designed a gene-centric 50 K single nucleotide polymorphism (SNP) array to assess potentially relevant loci across a range of cardiovascular, metabolic and inflammatory syndromes. The array utilizes a “cosmopolitan” tagging approach to capture the genetic diversity across ∼2,000 loci in populations represented in the HapMap and SeattleSNPs projects. The array content is informed by GWAS of vascular and inflammatory disease, expression quantitative trait loci implicated in atherosclerosis, pathway based approaches and comprehensive literature searching. The custom flexibility of the array platform facilitated interrogation of loci at differing stringencies, according to a gene prioritization strategy that allows saturation of high priority loci with a greater density of markers than the existing GWAS tools, particularly in African HapMap samples. We also demonstrate that the IBC array can be used to complement GWAS, increasing coverage in high priority CVD-related loci across all major HapMap populations. DNA from over 200,000 extensively phenotyped individuals will be genotyped with this array with a significant portion of the generated data being released into the academic domain facilitating in silico replication attempts, analyses of rare variants and cross-cohort meta-analyses in diverse populations. These datasets will also facilitate more robust secondary analyses, such as explorations with alternative genetic models, epistasis and gene-environment interactions.

    A principal component meta-analysis on multiple anthropometric traits identifies novel loci for body shape

    Large consortia have revealed hundreds of genetic loci associated with anthropometric traits, one trait at a time. We examined whether genetic variants affect body shape as a composite phenotype that is represented by a combination of anthropometric traits. We developed an approach that calculates averaged PCs (AvPCs) representing body shape derived from six anthropometric traits (body mass index, height, weight, waist and hip circumference, waist-to-hip ratio). The first four AvPCs explain >99% of the variability, are heritable, and associate with cardiometabolic outcomes. We performed genome-wide association analyses for each body shape composite phenotype across 65 studies and meta-analysed summary statistics. We identify six novel loci: LEMD2 and CD47 for AvPC1, RPS6KA5/C14orf159 and GANAB for AvPC3, and ARL15 and ANP32 for AvPC4. Our findings highlight the value of using multiple traits to define complex phenotypes for discovery, which are not captured by single-trait analyses, and may shed light onto new pathways.
