82 research outputs found

Conflation of short identity-by-descent segments bias their inferred length distribution

Author: Chiang Charleston W. K.
Novembre John
Ralph Peter
Publication venue: 'Genetics Society of America'
Publication date: 17/08/2015

Identity-by-descent (IBD) is a fundamental concept in genetics with many applications. In a common definition, two haplotypes are said to contain an IBD segment if they share a segment that is inherited from a recent shared common ancestor without intervening recombination. Long IBD segments (> 1 cM) can be efficiently detected by a number of algorithms using high-density SNP array data from a population sample. However, these approaches detect IBD based on contiguous segments of identity-by-state, and such segments may exist due to the conflation of smaller, nearby IBD segments. We quantified this effect using coalescent simulations, finding that nearly 40% of inferred segments 1-2 cM long are results of conflations of two or more shorter segments, under demographic scenarios typical for modern humans. This biases the inferred IBD segment length distribution, and so can affect downstream inferences. We observed this conflation effect universally across different IBD detection programs and human demographic histories, and found inference of segments longer than 2 cM to be much more reliable (less than 5% conflation rate). As an example of how this can negatively affect downstream analyses, we present and analyze a novel estimator of the de novo mutation rate using IBD segments, and demonstrate that the biased length distribution of the IBD segments due to conflation can lead to inflated estimates if the conflation is not modeled. Understanding the conflation effect in detail will make its correction in future methods more tractable.

arXiv.org e-Print Archive

Directory of Open Access Journals

KLFDAPC : a supervised machine learning approach for spatial genetic structure analysis

Author: Chiang Charleston W K
Gaggiotti Oscar E
Qin Xinghu
Publication venue: 'Oxford University Press (OUP)'
Publication date: 02/06/2022

CSC-University of St Andrews Joint Scholarship (to X.Q.); International Postdoctoral Exchange Fellowship Program (Talent-Introduction Program) from China Postdoc Council (to X.Q.); National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health (grant R35GM142783 to C.W.K.C.). Part of the computation for this work is supported by USC’s Center for Advanced Research Computing (https://carc.usc.edu).

Geographic patterns of human genetic variation provide important insights into human evolution and disease. A commonly used tool to detect and describe them is principal component analysis (PCA) or the supervised linear discriminant analysis of principal components (DAPC). However, genetic features produced from both approaches could fail to correctly characterize population structure for complex scenarios involving admixture. In this study, we introduce Kernel Local Fisher Discriminant Analysis of Principal Components (KLFDAPC), a supervised non-linear approach for inferring individual geographic genetic structure that could rectify the limitations of these approaches by preserving the multimodal space of samples. We tested the power of KLFDAPC to infer population structure and to predict individual geographic origin using neural networks. Simulation results showed that KLFDAPC has higher discriminatory power than PCA and DAPC. The application of our method to empirical European and East Asian genome-wide genetic datasets indicated that the first two reduced features of KLFDAPC correctly recapitulated the geography of individuals and significantly improved the accuracy of predicting individual geographic origin when compared to PCA and DAPC. Therefore, KLFDAPC can be useful for geographic ancestry inference, design of genome scans and correction for spatial stratification in GWAS that link genes to adaptation or disease susceptibility.

PubMed Central

University of St. Andrews - Pure

St Andrews Research Repository

Evidence of widespread selection on standing variation in Europe at height-associated SNPs.

Author: Chiang Charleston WK
Genetic Investigation of ANthropometric Traits (GIANT) Consortium
Hirschhorn Joel N
Palmer Cameron D
Reich David
Sankararaman Sriram
Turchin Michael C
Publication venue: eScholarship, University of California
Publication date: 01/01/2012

Strong signatures of positive selection at newly arising genetic variants are well documented in humans (1-8), but this form of selection may not be widespread in recent human evolution (9). Because many human traits are highly polygenic and partly determined by common, ancient genetic variation, an alternative model for rapid genetic adaptation has been proposed: weak selection acting on many pre-existing (standing) genetic variants, or polygenic adaptation (10-12). By studying height, a classic polygenic trait, we demonstrate the first human signature of widespread selection on standing variation. We show that frequencies of alleles associated with increased height, both at known loci and genome wide, are systematically elevated in Northern Europeans compared with Southern Europeans (P < 4.3 × 10^{-4}). This pattern mirrors intra-European height differences and is not confounded by ancestry or other ascertainment biases. The systematic frequency differences are consistent with the presence of widespread weak selection (selection coefficients ∼10^{-3}–10^{-5} per allele) rather than genetic drift alone (P < 10^{-15}).

CiteSeerX

PubMed Central

eScholarship - University of California

Evidence of Widespread Selection on Standing Variation in Europe at Height-Associated SNPs

Author: Chiang Charleston W. K.
GIANT Consortium
Hirschhorn Joel Naom
Palmer Cameron Douglas
Reich David Emil
Sankararaman Sriram
Turchin Michael C.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/05/2013

Strong signatures of positive selection at newly arising genetic variants are well-documented in humans, but this form of selection may not be widespread in recent human evolution. Because many human traits are highly polygenic and partly determined by common, ancient genetic variation, an alternative model for rapid genetic adaptation has been proposed: weak selection acting on many pre-existing (standing) genetic variants, or polygenic adaptation. By studying height, a classic polygenic trait, we demonstrate the first human signature of widespread selection on standing variation. We show that frequencies of alleles associated with increased height, both at known loci and genome-wide, are systematically elevated in Northern Europeans compared with Southern Europeans (p < 4.3 × 10^{-4}). This pattern mirrors intra-European height differences and is not confounded by ancestry or other ascertainment biases. The systematic frequency differences are consistent with the presence of widespread weak selection (selection coefficients ~10^{-3}–10^{-5} per allele) rather than genetic drift alone (p < 10^{-15}).

Harvard University - DASH

Exome sequencing of Finnish isolates enhances rare-variant association power.

Exome-sequencing studies have generally been underpowered to identify deleterious alleles with a large effect on complex traits as such alleles are mostly rare. Because the population of northern and eastern Finland has expanded considerably and in isolation following a series of bottlenecks, individuals of these populations have numerous deleterious alleles at a relatively high frequency. Here, using exome sequencing of nearly 20,000 individuals from these regions, we investigate the role of rare coding variants in clinically relevant quantitative cardiometabolic traits. Exome-wide association studies for 64 quantitative traits identified 26 newly associated deleterious alleles. Of these 26 alleles, 19 are either unique to or more than 20 times more frequent in Finnish individuals than in other Europeans and show geographical clustering comparable to Mendelian disease mutations that are characteristic of the Finnish population. We estimate that sequencing studies of populations without this unique history would require hundreds of thousands to millions of participants to achieve comparable association power

eScholarship - University of California

Mitochondrial genome copy number measured by DNA sequencing in human blood is strongly associated with metabolic traits via cell-type composition differences

Author: Abel Haley
Boehnke Michael
Chen Lei
Chiang Charleston W. K.
Christ Ryan
Das Indraniel
Freimer Nelson
Ganel Liron
Hall Ira M.
Havulinna Aki
Kanchi Krishna
Kang Chul Joo
Kuusisto Johanna
Laakso Markku
Larson David
Locke Adam
Palotie Aarno
Regier Allison
Ripatti Samuli
Scott Alexandra
Service Susan
Stitziel Nathan O.
Vangipurapu Jagadish
Young Erica
Publication date: 01/06/2021

Background: Mitochondrial genome copy number (MT-CN) varies among humans and across tissues and is highly heritable, but its causes and consequences are not well understood. When measured by bulk DNA sequencing in blood, MT-CN may reflect a combination of the number of mitochondria per cell and cell-type composition. Here, we studied MT-CN variation in blood-derived DNA from 19184 Finnish individuals using a combination of genome (N = 4163) and exome sequencing (N = 19034) data as well as imputed genotypes (N = 17718). Results: We identified two loci significantly associated with MT-CN variation: a common variant at the MYB-HBS1L locus (P = 1.6 × 10^{-8}), which has previously been associated with numerous hematological parameters; and a burden of rare variants in the TMBIM1 gene (P = 3.0 × 10^{-8}), which has been reported to protect against non-alcoholic fatty liver disease. We also found that MT-CN is strongly associated with insulin levels (P = 2.0 × 10^{-21}) and other metabolic syndrome (metS)-related traits. Using a Mendelian randomization framework, we show evidence that MT-CN measured in blood is causally related to insulin levels. We then applied an MT-CN polygenic risk score (PRS) derived from Finnish data to the UK Biobank, where the association between the PRS and metS traits was replicated. Adjusting for cell counts largely eliminated these signals, suggesting that MT-CN affects metS via cell-type composition. Conclusion: These results suggest that measurements of MT-CN in blood-derived DNA partially reflect differences in cell-type composition and that these differences are causally linked to insulin and related traits.

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Helsingin yliopiston digitaalinen arkisto

Deep Blue Documents at the University of Michigan

Ultraconserved Elements in the Human Genome: Association and Transmission Analyses of Highly Constrained Single-Nucleotide Polymorphisms

Author: Boerwinkle Eric
Chiang Charleston W. K.
Cupples L. Adrienne
Demerath Ellen W.
Franceschini Nora
Hirschhorn Joel N.
Jorgensen Neal W.
Keating Brendan J.
Lange Leslie A.
Lettre Guillaume
Liu Ching-Ti
Murabito Joanne M.
Nock Nora L.
North Kari E.
Papanicolaou George J.
Reiner Alex P.
Rotter Jerome I.
Vedantam Sailaja
Wilson James G.
Publication date: 01/01/2012

Ultraconserved elements in the human genome likely harbor important biological functions as they are dosage sensitive and are able to direct tissue-specific expression. Because they are under purifying selection, variants in these elements may have a lower frequency in the population but a higher likelihood of association with complex traits. We tested a set of highly constrained SNPs (hcSNPs) distributed genome-wide among ultraconserved and nearly ultraconserved elements for association with seven traits related to reproductive (age at natural menopause, number of children, age at first child, and age at last child) and overall [longevity, body mass index (BMI), and height] fitness. Using up to 24,047 European-American samples from the National Heart, Lung, and Blood Institute Candidate Gene Association Resource (CARe), we observed an excess of associations with BMI and height. In an independent replication panel the most strongly associated SNPs showed an 8.4-fold enrichment of associations at the nominal level, including three variants in previously identified loci and one in a locus (DENND1A) previously shown to be associated with polycystic ovary syndrome. Finally, using 1430 family trios, we showed that the transmissions from heterozygous parents to offspring of the derived alleles of rare (frequency ≤0.5%) hcSNPs are not biased, particularly after adjusting for the rates of genotype missingness and error in the data. The lack of transmission bias ruled out an immediately and strongly deleterious effect due to the rare derived alleles, consistent with the observation that mice homozygous for the deletion of ultraconserved elements showed no overt phenotype. Our study also illustrated the importance of carefully modeling potential technical confounders when analyzing genotype data of rare variants

PubMed Central

Carolina Digital Repository

Concept, Design and Implementation of a Cardiovascular Gene-Centric 50 K SNP Array for Large-Scale Genomic Association Studies

Author: Ajmal Saad
Anand Sonia S.
Bailey Swneke D.
Barrett Jeffrey C.
Bhangale Tushar
Boehnke Michael
Boerwinkle Eric
Cappola Thomas P.
Caulfield Mark
Chandrupatla Hareesh R.
Chiang Charleston W. K.
de Bakker Paul I Wen
DerOhannessian Stephanie
Drake Thomas
Edmondson Andrew C.
Engert James C.
Fabsitz Richard R.
Farlow Deborah N.
FitzGerald Garret A.
Fornage Myriam
Frackelton Edward
Gabriel Stacey B.
Gai Xiaowu
Galver Luana
Glessner Joseph T.
Grant Struan F. A.
Groop Leif
Guo Yiran
Hakonarson Hakon
Hall Alistair S.
Hansen Mark
Hattersley Andrew T.
Hirschhorn Joel Naom
Kathiresan Sekar
Keating Brendan J.
Kim Cecelia E.
Koenig Wolfgang
Li Mingyao
Lusis A. Jake
McCarthy Mark I.
Montpetit Alexandre
Munroe Patricia
Murray Sarah S.
Nickerson Deborah A.
Ouwehand Willem
Papanicolaou George J.
Patterson Nick
Price Alkes
Price Thomas S.
Rader Daniel J.
Reich David Emil
Reilly Muredach
Reitsma Pieter H.
Samani Nilesh J.
Schadt Eric
Shaikh Tamim
Taylor Kent
Tischfield Sam
Wang Susanna S.
Whitehead A. Stephen
Wilson James G.
Publication venue: Public Library of Science
Publication date: 01/01/2008

A wealth of genetic associations for cardiovascular and metabolic phenotypes in humans has been accumulating over the last decade, in particular a large number of loci derived from recent genome wide association studies (GWAS). True complex disease-associated loci often exert modest effects, so their delineation currently requires integration of diverse phenotypic data from large studies to ensure robust meta-analyses. We have designed a gene-centric 50 K single nucleotide polymorphism (SNP) array to assess potentially relevant loci across a range of cardiovascular, metabolic and inflammatory syndromes. The array utilizes a “cosmopolitan” tagging approach to capture the genetic diversity across ∼2,000 loci in populations represented in the HapMap and SeattleSNPs projects. The array content is informed by GWAS of vascular and inflammatory disease, expression quantitative trait loci implicated in atherosclerosis, pathway based approaches and comprehensive literature searching. The custom flexibility of the array platform facilitated interrogation of loci at differing stringencies, according to a gene prioritization strategy that allows saturation of high priority loci with a greater density of markers than the existing GWAS tools, particularly in African HapMap samples. We also demonstrate that the IBC array can be used to complement GWAS, increasing coverage in high priority CVD-related loci across all major HapMap populations. DNA from over 200,000 extensively phenotyped individuals will be genotyped with this array with a significant portion of the generated data being released into the academic domain facilitating in silico replication attempts, analyses of rare variants and cross-cohort meta-analyses in diverse populations. These datasets will also facilitate more robust secondary analyses, such as explorations with alternative genetic models, epistasis and gene-environment interactions

Directory of Open Access Journals

Queen Mary Research Online

DigitalCommons@The Texas Medical Center

Hal-Diderot

Public Library of Science (PLOS)

Crossref

LSHTM Research Online

Harvard University - DASH

Springer - Publisher Connector

HAL-Inserm

PubMed Central

Oxford University Research Archive

King's Research Portal

HAL UVSQ

Leicester Research Archive

Combined admixture mapping and association analysis identifies a novel blood pressure genetic locus on 5p13: contributions from the CARe consortium

Admixture mapping based on recently admixed populations is a powerful method to detect disease variants with substantial allele frequency differences in ancestral populations. We performed admixture mapping analysis for systolic blood pressure (SBP) and diastolic blood pressure (DBP), followed by trait-marker association analysis, in 6303 unrelated African-American participants of the Candidate Gene Association Resource (CARe) consortium. We identified five genomic regions (P < 0.001) harboring genetic variants contributing to inter-individual BP variation. In follow-up association analyses, correcting for all tests performed in this study, three loci were significantly associated with SBP and one significantly associated with DBP (P < 10^{-5}). Further analyses suggested that six independent single-nucleotide polymorphisms (SNPs) contributed to the phenotypic variation observed in the admixture mapping analysis. These six SNPs were examined for replication in multiple, large, independent studies of African-Americans [Women's Health Initiative (WHI), Maywood, Genetic Epidemiology Network of Arteriopathy (GENOA) and Howard University Family Study (HUFS)] as well as one native African sample (Nigerian study), with a total replication sample size of 11 882. Meta-analysis of the replication set identified a novel variant (rs7726475) on chromosome 5 between the SUB1 and NPR3 genes, as being associated with SBP and DBP (P < 0.0015 for both); in meta-analyses combining the CARe samples with the replication data, we observed P-values of 4.45 × 10^{-7} for SBP and 7.52 × 10^{-7} for DBP for rs7726475 that were significant after accounting for all the tests performed. Our study highlights that admixture mapping analysis can help identify genetic variants missed by genome-wide association studies because of drastically reduced number of tests in the whole genome.

Carolina Digital Repository