37 research outputs found

    Multiethnic Genetic Association Studies Improve Power for Locus Discovery

    To date, genome-wide association studies have focused almost exclusively on populations of European ancestry. These studies continue with the advent of next-generation sequencing, designed to systematically catalog and test low-frequency variation for a role in disease. A complementary approach would be to focus further efforts on cohorts of multiple ethnicities. This leverages the idea that population genetic drift may have elevated some variants to higher allele frequency in different populations, boosting statistical power to detect an association. Based on empirical allele frequency distributions from eleven populations represented in HapMap Phase 3 and the 1000 Genomes Project, we simulate a range of genetic models to quantify the power of association studies in multiple ethnicities relative to studies that exclusively focus on samples of European ancestry. In each of these simulations, a first phase of GWAS in exclusively European samples is followed by a second GWAS phase in any of the other populations (including a multiethnic design). We find that nontrivial power gains can be achieved by conducting future whole-genome studies in worldwide populations, where, in particular, African populations contribute the largest relative power gains for low-frequency alleles (<5%) of moderate effect that suffer from low power in samples of European descent. Our results emphasize the importance of broadening genetic studies to worldwide populations to ensure efficient discovery of genetic loci contributing to phenotypic trait variability, especially for those traits for which large numbers of samples of European ancestry have already been collected and tested
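
    The allele-frequency argument in this abstract can be illustrated with a rough power calculation. The sketch below is not the authors' simulation framework. It assumes a quantitative trait with unit variance, Hardy-Weinberg proportions, an additive per-allele effect in standard-deviation units, and a 1-df test at the conventional genome-wide threshold of 5e-8. The function name and parameters are hypothetical.

```python
from statistics import NormalDist


def association_power(n_samples, allele_freq, effect_size, alpha=5e-8):
    """Approximate power of a 1-df additive association test.

    Assumptions (illustrative only): unit-variance quantitative trait,
    Hardy-Weinberg proportions, effect_size in SD units per allele.
    """
    std_normal = NormalDist()
    # Non-centrality parameter: sample size times approximate variance explained.
    ncp = n_samples * 2.0 * allele_freq * (1.0 - allele_freq) * effect_size ** 2
    shift = ncp ** 0.5
    # Two-sided critical value for significance level alpha.
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Probability that a N(shift, 1) statistic falls outside [-z_crit, z_crit].
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Same effect size and sample size, different allele frequencies.
    for freq in (0.01, 0.05, 0.20):
        print(freq, association_power(10_000, freq, 0.1))
```

    Under these assumptions, a variant that drift has raised to a higher frequency in one population yields a larger non-centrality parameter, and hence more power, at the same sample size and effect size.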

    Rapid and Accurate Multiple Testing Correction and Power Estimation for Millions of Correlated Markers

    With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for these large datasets. Recently, several studies proposed efficient alternative approaches to the permutation test based on the multivariate normal distribution (MVN). However, they cannot accurately correct for multiple testing in genome-wide association studies for two reasons. First, these methods require partitioning of the genome into many disjoint blocks and ignore all correlations between markers from different blocks. Second, the true null distribution of the test statistic often fails to follow the asymptotic distribution at the tails of the distribution. We propose an accurate and efficient method for multiple testing correction in genome-wide association studies—SLIDE. Our method accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than the error rate of the previous MVN-based methods' corrected p-values, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose an efficient and accurate power estimation method SLIP. SLIP and SLIDE are available at http://slide.cs.ucla.edu
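
    The permutation test that this abstract treats as the gold standard can be sketched in a few lines. This is not SLIDE or any MVN-based method. It is a minimal max-statistic permutation correction on toy data, using absolute Pearson correlation as the per-marker statistic. All function names are hypothetical.

```python
import random
from statistics import mean


def abs_corr(marker, phenotype):
    """Absolute Pearson correlation between one marker and the phenotype."""
    mx, my = mean(marker), mean(phenotype)
    sxy = sum((a - mx) * (b - my) for a, b in zip(marker, phenotype))
    sxx = sum((a - mx) ** 2 for a in marker)
    syy = sum((b - my) ** 2 for b in phenotype)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no association signal
    return abs(sxy) / (sxx * syy) ** 0.5


def max_stat(genotypes, phenotype):
    """Largest per-marker statistic across all markers."""
    return max(abs_corr(marker, phenotype) for marker in genotypes)


def permutation_adjusted_p(genotypes, phenotype, observed, n_perm=1000, seed=0):
    """Multiple-testing-corrected p-value for one observed statistic.

    Shuffling the phenotype keeps the correlation between markers intact,
    so the null distribution of the maximum reflects that correlation.
    """
    rng = random.Random(seed)
    shuffled = list(phenotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_stat(genotypes, shuffled) >= observed:
            exceed += 1
    # Add-one estimator keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    genotypes = [[0, 1, 2, 0, 1, 2, 0, 1], [2, 2, 0, 0, 1, 1, 0, 2]]
    phenotype = [0.1, 1.0, 2.1, -0.2, 0.9, 1.8, 0.0, 1.2]
    observed = max_stat(genotypes, phenotype)
    print(permutation_adjusted_p(genotypes, phenotype, observed, n_perm=200))
```

    Each permutation recomputes every marker's statistic, so the cost grows with the product of markers, samples, and permutations. That cost is what motivates the faster approximations the abstract describes.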

    Quantifying Missing Heritability at Known GWAS Loci

    Recent work has shown that much of the missing heritability of complex traits can be resolved by estimates of heritability explained by all genotyped SNPs. However, it is currently unknown how much heritability is missing due to poor tagging or additional causal variants at known GWAS loci. Here, we use variance components to quantify the heritability explained by all SNPs at known GWAS loci in nine diseases from WTCCC1 and WTCCC2. After accounting for expectation, we observed all SNPs at known GWAS loci to explain 1.29× more heritability than GWAS-associated SNPs on average (P = 3.3 × 10⁻⁵). For some diseases, this increase was individually significant: 2.07× for Multiple Sclerosis (MS) (P = 6.5 × 10⁻⁹) and for Crohn's Disease (CD) (P = 1.3 × 10⁻³); all analyses of autoimmune diseases excluded the well-studied MHC region. Additionally, we found that GWAS loci from other related traits also explained significant heritability. The union of all autoimmune disease loci explained 7.15× more MS heritability than known MS SNPs (P 20,000 Rheumatoid Arthritis (RA) samples typed on ImmunoChip, with 2.37× more heritability from all SNPs at GWAS loci (P = 2.3 × 10⁻⁶) and more heritability from all autoimmune disease loci (P < 1 × 10⁻¹⁶) compared to known RA SNPs (including those identified in this cohort). Our methods adjust for LD between SNPs, which can bias standard estimates of heritability from SNPs even if all causal variants are typed. By comparing adjusted estimates, we hypothesize that the genome-wide distribution of causal variants is enriched for low-frequency alleles, but that causal variants at known GWAS loci are skewed towards common alleles. These findings have important ramifications for fine-mapping study design and our understanding of complex disease architecture. National Institutes of Health (U.S.) (Grant R03HG006731); National Institutes of Health (U.S.) (Fellowship F32GM106584)

    Enhanced Statistical Tests for GWAS in Admixed Populations: Assessment using African Americans from CARe and a Breast Cancer Consortium

    While genome-wide association studies (GWAS) have primarily examined populations of European ancestry, more recent studies often involve additional populations, including admixed populations such as African Americans and Latinos. In admixed populations, linkage disequilibrium (LD) exists both at a fine scale in ancestral populations and at a coarse scale (admixture-LD) due to chromosomal segments of distinct ancestry. Disease association statistics in admixed populations have previously considered SNP association (LD mapping) or admixture association (mapping by admixture-LD), but not both. Here, we introduce a new statistical framework for combining SNP and admixture association in case-control studies, as well as methods for local ancestry-aware imputation. We illustrate the gain in statistical power achieved by these methods by analyzing data of 6,209 unrelated African Americans from the CARe project genotyped on the Affymetrix 6.0 chip, in conjunction with both simulated and real phenotypes, as well as by analyzing the FGFR2 locus using breast cancer GWAS data from 5,761 African-American women. We show that, at typed SNPs, our method yields an 8% increase in statistical power for finding disease risk loci compared to the power achieved by standard methods in case-control studies. At imputed SNPs, we observe an 11% increase in statistical power for mapping disease loci when our local ancestry-aware imputation framework and the new scoring statistic are jointly employed. Finally, we show that our method increases statistical power in regions harboring the causal SNP in the case when the causal SNP is untyped and cannot be imputed. Our methods and our publicly available software are broadly applicable to GWAS in admixed populations

    Genome-Wide Association Study of White Blood Cell Count in 16,388 African Americans: the Continental Origins and Genetic Epidemiology Network (COGENT)

    Total white blood cell (WBC) and neutrophil counts are lower among individuals of African descent due to the common African-derived “null” variant of the Duffy Antigen Receptor for Chemokines (DARC) gene. Additional common genetic polymorphisms were recently associated with total WBC and WBC sub-type levels in European and Japanese populations. No additional loci that account for WBC variability have been identified in African Americans. In order to address this, we performed a large genome-wide association study (GWAS) of total WBC and cell subtype counts in 16,388 African-American participants from 7 population-based cohorts available in the Continental Origins and Genetic Epidemiology Network. In addition to the DARC locus on chromosome 1q23, we identified two other regions (chromosomes 4q13 and 16q22) associated with WBC in African Americans (P < 2.5 × 10⁻⁸). The lead SNP (rs9131) on chromosome 4q13 is located in the CXCL2 gene, which encodes a chemotactic cytokine for polymorphonuclear leukocytes. Independent evidence of the novel CXCL2 association with WBC was present in 3,551 Hispanic Americans, 14,767 Japanese, and 19,509 European Americans. The index SNP (rs12149261) on chromosome 16q22 associated with WBC count is located in a large inter-chromosomal segmental duplication encompassing part of the hydrocephalus inducing homolog (HYDIN) gene. We demonstrate that the chromosome 16q22 association finding is most likely due to a genotyping artifact as a consequence of sequence similarity between duplicated regions on chromosomes 16q22 and 1q21. Among the WBC loci recently identified in European or Japanese populations, replication was observed in our African-American meta-analysis for rs445 of CDK6 on chromosome 7q21 and rs4065321 of PSMD3-CSF3 region on chromosome 17q21. In summary, the CXCL2, CDK6, and PSMD3-CSF3 regions are associated with WBC count in African American and other populations. We also demonstrate that large inter-chromosomal duplications can result in false positive associations in GWAS

    A meta-analysis of genome-wide association studies of epigenetic age acceleration

    Funding: Generation Scotland received core support from the Chief Scientist Office of the Scottish Government Health Directorates (CZD/16/6) and the Scottish Funding Council (HR03006). Genotyping and DNA methylation profiling of the GS samples was carried out by the Genetics Core Laboratory at the Wellcome Trust Clinical Research Facility, Edinburgh, Scotland and was funded by the Medical Research Council UK and the Wellcome Trust (Wellcome Trust Strategic Award “STratifying Resilience and Depression Longitudinally” ((STRADL) Reference 104036/Z/14/Z)). Funding details for the cohorts included in the study by Lu et al. (2018) can be found in their publication. HCW is supported by a JMAS SIM fellowship from the Royal College of Physicians of Edinburgh and by an ESAT College Fellowship from the University of Edinburgh. AMM & HCW acknowledge the support of the Dr. Mortimer and Theresa Sackler Foundation. SH acknowledges support from grant 1U01AG060908-01. REM is supported by Alzheimer’s Research UK major project grant ARUK-PG2017B-10. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Data Availability: Summary statistics from the research reported in the manuscript will be made available immediately following publication on the Edinburgh Data Share portal with a permanent digital object identifier (DOI). According to the terms of consent for Generation Scotland participants, requests for access to the individual-level data must be reviewed by the GS Access Committee ([email protected]). Individual-level data are not immediately available, due to confidentiality considerations and our legal obligation to protect personal information. 
    These data will, however, be made available upon request and after review by the GS access committee, once ethical and data governance concerns regarding personal data have been addressed by the receiving institution through a Data Transfer Agreement. Peer reviewed

    Pathogenic Huntingtin Repeat Expansions in Patients with Frontotemporal Dementia and Amyotrophic Lateral Sclerosis.

    We examined the role of repeat expansions in the pathogenesis of frontotemporal dementia (FTD) and amyotrophic lateral sclerosis (ALS) by analyzing whole-genome sequence data from 2,442 FTD/ALS patients, 2,599 Lewy body dementia (LBD) patients, and 3,158 neurologically healthy subjects. Pathogenic expansions (range, 40-64 CAG repeats) in the huntingtin (HTT) gene were found in three (0.12%) patients diagnosed with pure FTD/ALS syndromes but were not present in the LBD or healthy cohorts. We replicated our findings in an independent collection of 3,674 FTD/ALS patients. Postmortem evaluations of two patients revealed the classical TDP-43 pathology of FTD/ALS, as well as huntingtin-positive, ubiquitin-positive aggregates in the frontal cortex. The neostriatal atrophy that pathologically defines Huntington's disease was absent in both cases. Our findings reveal an etiological relationship between HTT repeat expansions and FTD/ALS syndromes and indicate that genetic screening of FTD/ALS patients for HTT repeat expansions should be considered

    Multiple testing correction in linear mixed models

    BACKGROUND: Multiple hypothesis testing is a major issue in genome-wide association studies (GWAS), which often analyze millions of markers. The permutation test is considered to be the gold standard in multiple testing correction as it accurately takes into account the correlation structure of the genome. Recently, the linear mixed model (LMM) has become the standard practice in GWAS, addressing issues of population structure and insufficient power. However, none of the current multiple testing approaches are applicable to LMM. RESULTS: We were able to estimate per-marker thresholds as accurately as the gold standard approach in real and simulated datasets, while reducing the time required from months to hours. We applied our approach to mouse, yeast, and human datasets to demonstrate the accuracy and efficiency of our approach. CONCLUSIONS: We provide an efficient and accurate multiple testing correction approach for linear mixed models. We further provide an intuition about the relationships between per-marker threshold, genetic relatedness, and heritability, based on our observations in real data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-016-0903-6) contains supplementary material, which is available to authorized users