
    Introduction to the Special Issue: Genome-Wide Association Studies

    Comment: Published at http://dx.doi.org/10.1214/09-STS310 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    A Flexible and Accurate Genotype Imputation Method for the Next Generation of Genome-Wide Association Studies

    Genotype imputation methods are now being widely used in the analysis of genome-wide association studies. Most imputation analyses to date have used the HapMap as a reference dataset, but new reference panels (such as controls genotyped on multiple SNP chips and densely typed samples from the 1,000 Genomes Project) will soon allow a broader range of SNPs to be imputed with higher accuracy, thereby increasing power. We describe a genotype imputation method (IMPUTE version 2) that is designed to address the challenges presented by these new datasets. The main innovation of our approach is a flexible modelling framework that increases accuracy and combines information across multiple reference panels while remaining computationally feasible. We find that IMPUTE v2 attains higher accuracy than other methods when the HapMap provides the sole reference panel, but that the size of the panel constrains the improvements that can be made. We also find that imputation accuracy can be greatly enhanced by expanding the reference panel to contain thousands of chromosomes and that IMPUTE v2 outperforms other methods in this setting at both rare and common SNPs, with overall error rates that are 15%–20% lower than those of the closest competing method. One particularly challenging aspect of next-generation association studies is to integrate information across multiple reference panels genotyped on different sets of SNPs; we show that our approach to this problem has practical advantages over other suggested solutions.
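
    IMPUTE-style methods build on a hidden Markov model in which a study haplotype is treated as an imperfect mosaic of reference haplotypes (the Li and Stephens copying model). The sketch below is a minimal haploid version of that idea, not IMPUTE v2 itself: it assumes a phased target haplotype, a single reference panel, and fixed illustrative switch (`rho`) and mismatch (`eps`) rates, whereas the real method works with recombination maps and estimated parameters. The function name `impute_haplotype` and the toy panel are hypothetical.

```python
def impute_haplotype(target, reference, rho=0.05, eps=0.01):
    """Posterior dosage of allele 1 at every site of a target haplotype.

    target:    list of 0/1 alleles, with None at untyped sites
    reference: list of K reference haplotypes, each a list of 0/1 alleles
    rho:       probability of switching copied haplotype between sites (illustrative)
    eps:       probability that the target allele differs from the copied one
    """
    K, M = len(reference), len(target)

    def emit(k, j):
        # Untyped sites carry no information about which haplotype is copied.
        if target[j] is None:
            return 1.0
        return 1.0 - eps if reference[k][j] == target[j] else eps

    def normalise(row):
        total = sum(row)
        return [x / total for x in row]

    # Forward pass, rescaled at each site to avoid numerical underflow.
    fwd = [normalise([emit(k, 0) / K for k in range(K)])]
    for j in range(1, M):
        prev = fwd[-1]  # sums to 1 after rescaling
        # Stay on haplotype k with prob 1-rho, or jump to any haplotype uniformly.
        fwd.append(normalise([((1 - rho) * prev[k] + rho / K) * emit(k, j)
                              for k in range(K)]))

    # Backward pass, also rescaled.
    bwd = [[1.0] * K for _ in range(M)]
    for j in range(M - 2, -1, -1):
        w = [emit(k, j + 1) * bwd[j + 1][k] for k in range(K)]
        total = sum(w)
        bwd[j] = normalise([(1 - rho) * w[k] + rho * total / K for k in range(K)])

    # Posterior over copied haplotypes gives the expected allele (dosage).
    dosages = []
    for j in range(M):
        post = normalise([fwd[j][k] * bwd[j][k] for k in range(K)])
        dosages.append(sum(p * reference[k][j] for k, p in enumerate(post)))
    return dosages


# Hypothetical toy panel: the target matches haplotype 0 at every typed site.
reference = [[0, 0, 1, 0, 0], [1, 1, 0, 1, 1]]
target = [0, 0, None, 0, 0]
print([round(d, 3) for d in impute_haplotype(target, reference)])
```

    Because untyped sites contribute an emission of 1, the estimate at those sites is driven entirely by the flanking typed markers and by which reference haplotypes match them.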

    Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel

    A major use of the 1000 Genomes Project (1000GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000GP haplotype reference set for use by the human genetics community. Using a set of validation genotypes at SNPs and bi-allelic indels, we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants. © 2014 Macmillan Publishers Limited. All rights reserved.

    Sensitive Detection of Chromosomal Segments of Distinct Ancestry in Admixed Populations

    Identifying chromosomal segments of distinct ancestry has a wide range of applications, from disease mapping to learning about history. Most methods require unlinked markers, but by using all markers from genome-wide scanning arrays it should in principle be possible to infer the ancestry of even very small segments with exquisite accuracy. We describe a method, HAPMIX, which employs an explicit population genetic model to perform such local ancestry inference based on fine-scale variation data. We show that HAPMIX outperforms other methods, and we explore its utility for inferring ancestry, learning about ancestral populations, and inferring dates of admixture. We validate the method empirically by applying it to populations that have experienced recent and ancient admixture: 935 African Americans from the United States and 29 Mozabites from North Africa. HAPMIX will be of particular utility for mapping disease genes in recently admixed populations, as its accurate estimates of local ancestry permit admixture and case-control association signals to be combined, enabling more powerful tests of association than with either signal alone.
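
    Local ancestry inference of this kind is usually framed as a hidden Markov model whose hidden state is the ancestry of the segment at each marker. HAPMIX itself models copying from reference haplotypes of each ancestral population; the sketch below swaps in a much simpler two-state HMM that uses only ancestral allele frequencies, so it illustrates the idea of linked markers sharing information along a chromosome rather than reproducing HAPMIX. The function name, the fixed `switch` rate, and the toy frequencies are hypothetical.

```python
def local_ancestry_posterior(alleles, freq_a, freq_b, switch=0.01, prop_b=0.5):
    """Posterior probability that each marker on one haploid chromosome has ancestry B.

    alleles:        observed 0/1 alleles along the chromosome
    freq_a, freq_b: frequency of allele 1 in ancestral populations A and B
    switch:         probability of an ancestry change between adjacent markers
    prop_b:         admixture proportion contributed by population B
    """
    n = len(alleles)
    prior = (1.0 - prop_b, prop_b)

    def emit(state, j):
        p = freq_a[j] if state == 0 else freq_b[j]
        return p if alleles[j] == 1 else 1.0 - p

    def trans(s, t):
        # After a switch, the new ancestry is drawn from the admixture proportions.
        return (1.0 - switch) * (s == t) + switch * prior[t]

    def normalise(pair):
        total = pair[0] + pair[1]
        return [pair[0] / total, pair[1] / total]

    # Forward pass (rescaled at every marker).
    fwd = [normalise([prior[s] * emit(s, 0) for s in (0, 1)])]
    for j in range(1, n):
        fwd.append(normalise([
            emit(t, j) * sum(fwd[-1][s] * trans(s, t) for s in (0, 1)) for t in (0, 1)
        ]))

    # Backward pass (rescaled at every marker).
    bwd = [[1.0, 1.0] for _ in range(n)]
    for j in range(n - 2, -1, -1):
        bwd[j] = normalise([
            sum(trans(s, t) * emit(t, j + 1) * bwd[j + 1][t] for t in (0, 1))
            for s in (0, 1)
        ])

    # Combine passes: P(ancestry B at marker j | all markers).
    return [normalise([fwd[j][0] * bwd[j][0], fwd[j][1] * bwd[j][1]])[1]
            for j in range(n)]


# Toy example with strongly diagnostic, hypothetical allele frequencies.
post = local_ancestry_posterior([0] * 5 + [1] * 5, [0.05] * 10, [0.95] * 10)
print([round(x, 3) for x in post])
```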

    An Evolutionary Framework for Association Testing in Resequencing Studies

    Sequencing technologies are becoming cheap enough to apply to large numbers of study participants and promise to provide new insights into human phenotypes by bringing to light rare and previously unknown genetic variants. We develop a new framework for the analysis of sequence data that incorporates all of the major features of previously proposed approaches, including those focused on allele counts and allele burden, but is both more general and more powerful. We harness population genetic theory to provide prior information on effect sizes and to create a pooling strategy for information from rare variants. Our method, EMMPAT (Evolutionary Mixed Model for Pooled Association Testing), generates a single test per gene (substantially reducing multiple testing concerns), facilitates graphical summaries, and improves the interpretation of results by allowing calculation of attributable variance. Simulations show that, relative to previously used approaches, our method increases the power to detect genes that affect phenotype when natural selection has kept alleles with large effect sizes rare. We demonstrate our approach on a population-based resequencing study of association between serum triglycerides and variation in ANGPTL4.

    An African Ancestry-Specific Allele of CTLA4 Confers Protection against Rheumatoid Arthritis in African Americans

    Cytotoxic T-lymphocyte associated protein 4 (CTLA4) is a negative regulator of T-cell proliferation. Polymorphisms in CTLA4 have been inconsistently associated with susceptibility to rheumatoid arthritis (RA) in populations of European ancestry but have not been examined in African Americans. The prevalence of RA in most populations of European and Asian ancestry is ∼1.0%; RA is purportedly less common in black Africans, with little known about its prevalence in African Americans. We sought to determine if CTLA4 polymorphisms are associated with RA in African Americans. We performed a 2-stage analysis of 12 haplotype tagging single nucleotide polymorphisms (SNPs) across CTLA4 in a total of 505 African American RA patients and 712 African American controls using Illumina and TaqMan platforms. The frequency of the minor allele (G) of the rs231778 SNP was 0.054 in RA patients, compared with 0.209 in controls (p = 4.462 × 10⁻²⁶, Fisher's exact test). The presence of the G allele was associated with a substantially reduced odds ratio (OR) of having RA (AG+GG genotypes vs. AA genotype, OR 0.19, 95% CI: 0.13–0.26, p = 2.4 × 10⁻²⁸, Fisher's exact test), suggesting a protective effect. This SNP is polymorphic in the African population (minor allele frequency [MAF] 0.09 in the Yoruba population), but is very rare in other groups (MAF = 0.002 in 530 Caucasians genotyped for this study). Markers associated with RA in populations of European ancestry (rs3087243 [+60C/T] and rs231775 [+49A/G]) were not replicated in African Americans. We found no confounding of association for rs231778 after stratifying for the HLA-DRB1 shared epitope, presence of anti-cyclic citrullinated peptide antibody, or degree of admixture from the European population. An African ancestry-specific genetic variant of CTLA4 appears to be associated with protection from RA in African Americans. This finding may explain, in part, the relatively low prevalence of RA in black African populations.
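
    The odds ratio, confidence interval, and Fisher's exact P values reported above are all computed from 2×2 tables (for the carrier comparison: cases and controls by AG+GG versus AA genotype). The sketch below shows one standard way to compute these quantities in plain Python. The Woolf (log-scale) interval is an assumption, since the abstract does not say which interval method was used, and the counts in the usage lines are hypothetical, not the study's data.

```python
from math import comb, exp, log, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed table. Integer weights keep the
    "no more likely" comparison exact.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x):
        # Numerator of P(top-left cell = x) given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    tail = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return tail / comb(n, row1)


def odds_ratio_woolf_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with a Woolf (log-scale) confidence interval; all cells must be > 0."""
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)


# Hypothetical table: rows = cases, controls; columns = AG+GG carriers, AA non-carriers.
print(fisher_exact_two_sided(1, 9, 11, 3))
print(odds_ratio_woolf_ci(1, 9, 11, 3))
```

    An OR below 1 with an interval that excludes 1, as for rs231778 above, is what "protective" means in this design.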

    Dementia Revealed: Novel Chromosome 6 Locus for Late-Onset Alzheimer Disease Provides Genetic Evidence for Folate-Pathway Abnormalities

    Genome-wide association studies (GWAS) of late-onset Alzheimer disease (LOAD) have consistently observed strong evidence of association with polymorphisms in APOE. However, until recently, variants at few other loci with statistically significant associations have replicated across studies. The present study combines data on 483,399 single nucleotide polymorphisms (SNPs) from a previously reported GWAS of 492 LOAD cases and 496 controls and from an independent set of 439 LOAD cases and 608 controls to strengthen power to identify novel genetic association signals. Associations exceeding the experiment-wide significance threshold (…) were replicated in an additional 1,338 cases and 2,003 controls. As expected, these analyses unequivocally confirmed APOE's risk effect (rs2075650, …). Additionally, the SNP rs11754661 at 151.2 Mb of chromosome 6q25.1 in the gene MTHFD1L (which encodes the methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1-like protein) was significantly associated with LOAD (…; Bonferroni-corrected P = 0.022). Subsequent genotyping of SNPs in high linkage disequilibrium (…) with rs11754661 identified statistically significant associations in multiple SNPs (rs803424, P = 0.016; rs2073067, P = 0.03; rs2072064, P = 0.035), reducing the likelihood of association due to genotyping error. In the replication case-control set, we observed an association of rs11754661 in the same direction as the previous association at P = 0.002 (… in combined analysis of discovery and replication sets), with associations of similar statistical significance at several adjacent SNPs (rs17349743, P = 0.005; rs803422, P = 0.004). In summary, we observed and replicated a novel statistically significant association in MTHFD1L, a gene involved in the tetrahydrofolate synthesis pathway. This finding is noteworthy, as MTHFD1L may play a role in the generation of methionine from homocysteine and influence homocysteine-related pathways, and as levels of homocysteine are a significant risk factor for LOAD development.
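
    A Bonferroni correction multiplies each raw P value by the number of tests (capping at 1), or equivalently divides the significance level by that number. The sketch below assumes, for illustration only, that the correction covers all 483,399 genotyped SNPs; under that assumption a family-wise level of 0.05 corresponds to a per-SNP threshold of about 1.03 × 10⁻⁷, and a corrected P of 0.022 corresponds to a raw P of roughly 4.6 × 10⁻⁸. The function names are hypothetical.

```python
def bonferroni(p, n_tests):
    """Bonferroni-adjusted P value: raw P times the number of tests, capped at 1."""
    return min(1.0, p * n_tests)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumption: the correction is over all SNPs analysed in the combined GWAS.
n_snps = 483_399
print(bonferroni_threshold(0.05, n_snps))
print(bonferroni(0.022 / n_snps, n_snps))
```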

    Localization of type 1 diabetes susceptibility to the MHC class I genes HLA-B and HLA-A

    The major histocompatibility complex (MHC) on chromosome 6 is associated with susceptibility to more common diseases than any other region of the human genome, including almost all disorders classified as autoimmune. In type 1 diabetes the major genetic susceptibility determinants have been mapped to the MHC class II genes HLA-DQB1 and HLA-DRB1 (refs 1-3), but these genes cannot completely explain the association between type 1 diabetes and the MHC region. Owing to the region's extreme gene density, the multiplicity of disease-associated alleles, strong associations between alleles, limited genotyping capability, and inadequate statistical approaches and sample sizes, it remains unclear which, and how many, loci within the MHC determine susceptibility. Here, in several large type 1 diabetes data sets, we analyse a combined total of 1,729 polymorphisms and apply statistical methods (recursive partitioning and regression) to pinpoint disease susceptibility to the MHC class I genes HLA-B and HLA-A (risk ratios >1.5; combined P = 2.01 × 10⁻¹⁹ and 2.35 × 10⁻¹³, respectively) in addition to the established associations of the MHC class II genes. Other loci with smaller and/or rarer effects might also be involved, but to find these, future searches must take into account both the HLA class II and class I genes and use even larger samples. Taken together with previous studies, we conclude that MHC-class-I-mediated events, principally involving HLA-B*39, contribute to the aetiology of type 1 diabetes. © 2007 Nature Publishing Group

    Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes

    Genome-wide association (GWA) studies have identified multiple new genomic loci at which common variants modestly but reproducibly influence risk of type 2 diabetes (T2D) (refs 1-11). Established associations to common and rare variants explain only a small proportion of the heritability of T2D. As previously published analyses had limited power to discover loci at which common alleles have modest effects, we performed a meta-analysis of three T2D GWA scans encompassing 10,128 individuals of European descent and ~2.2 million SNPs (directly genotyped and imputed). Replication testing was performed in an independent sample with an effective sample size of up to 53,975. At least six new loci with robust evidence for association were detected, including the JAZF1 (p = 5.0 × 10⁻¹⁴), CDC123/CAMK1D (p = 1.2 × 10⁻¹⁰), TSPAN8/LGR5 (p = 1.1 × 10⁻⁹), THADA (p = 1.1 × 10⁻⁹), ADAMTS9 (p = 1.2 × 10⁻⁸), and NOTCH2 (p = 4.1 × 10⁻⁸) gene regions. The large number of loci with relatively small effects indicates the value of large discovery and follow-up samples in identifying additional clues about the inherited basis of T2D.
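
    Combining per-study association results is commonly done with an inverse-variance-weighted fixed-effect meta-analysis, and case-control sample sizes are often summarised as an effective sample size, 4 / (1/cases + 1/controls), the number of individuals in a balanced design with equivalent power. The abstract does not specify either formula, so the sketch below shows these common choices as assumptions, not as the study's exact procedure; function names and inputs are hypothetical.

```python
from math import sqrt


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis of per-study effects
    (e.g. log odds ratios). Returns (pooled beta, pooled standard error, z statistic)."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    return beta, se, beta / se


def effective_sample_size(n_cases, n_controls):
    """Effective sample size of an unbalanced case-control sample."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# Hypothetical per-study log odds ratios and standard errors for one SNP.
print(fixed_effect_meta([0.12, 0.08, 0.10], [0.05, 0.04, 0.06]))
print(effective_sample_size(500, 1500))
```

    Studies with smaller standard errors receive proportionally more weight, which is why pooling scans increases power for modest effects.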

    Cystatin C and Cardiovascular Disease

    Background: Epidemiological studies show that high circulating cystatin C is associated with risk of cardiovascular disease (CVD), independent of creatinine-based renal function measurements. It is unclear whether this relationship is causal, arises from residual confounding, and/or is a consequence of reverse causation.
    Objectives: The aim of this study was to use Mendelian randomization to investigate whether cystatin C is causally related to CVD in the general population.
    Methods: We incorporated participant data from 16 prospective cohorts (n = 76,481) with 37,126 measures of cystatin C and added genetic data from 43 studies (n = 252,216) with 63,292 CVD events. We used the common variant rs911119 in CST3 as an instrumental variable to investigate the causal role of cystatin C in CVD, including coronary heart disease, ischemic stroke, and heart failure.
    Results: Cystatin C concentrations were associated with CVD risk after adjusting for age, sex, and traditional risk factors (relative risk: 1.82 per doubling of cystatin C; 95% confidence interval [CI]: 1.56 to 2.13; p = 2.12 × 10⁻¹⁴). The minor allele of rs911119 was associated with decreased serum cystatin C (6.13% per allele; 95% CI: 5.75 to 6.50; p = 5.95 × 10⁻²¹¹), explaining 2.8% of the observed variation in cystatin C. Mendelian randomization analysis did not provide evidence for a causal role of cystatin C, with a causal relative risk for CVD of 1.00 per doubling of cystatin C (95% CI: 0.82 to 1.22; p = 0.994), which was statistically different from the observational estimate (p = 1.6 × 10⁻⁵). A causal effect of cystatin C was not detected for any individual component of CVD.
    Conclusions: Mendelian randomization analyses did not support a causal role of cystatin C in the etiology of CVD. As such, therapeutics targeted at lowering circulating cystatin C are unlikely to be effective in preventing CVD.
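
    With a single genetic instrument such as rs911119, the simplest Mendelian randomization estimate is the Wald ratio: the variant's effect on the outcome divided by its effect on the exposure. Expressing the exposure effect in log2 units makes the result a causal effect per doubling of cystatin C; the 6.13% per-allele decrease above is about -0.0913 log2 units. The sketch uses the first-order standard error, which ignores uncertainty in the exposure effect. It is a generic illustration, not the study's exact estimator, and the outcome effect in the usage lines is invented.

```python
from math import exp, log2


def percent_change_to_log2(pct):
    """Convert a per-allele percentage change in a biomarker to log2 units."""
    return log2(1 + pct / 100)


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument causal estimate (outcome effect / exposure effect) with the
    first-order standard error se_outcome / |beta_exposure|."""
    return beta_outcome / beta_exposure, se_outcome / abs(beta_exposure)


def causal_relative_risk(estimate, se, z=1.96):
    """Exponentiate a causal log relative risk and its normal-approximation interval."""
    return exp(estimate), exp(estimate - z * se), exp(estimate + z * se)


# Exposure effect from the abstract; the per-allele log RR for CVD is HYPOTHETICAL.
beta_exp = percent_change_to_log2(-6.13)
est, se = wald_ratio(beta_exp, beta_outcome=-0.002, se_outcome=0.01)
print(causal_relative_risk(est, se))
```

    Because the instrument's effect on the exposure is small, even a precise outcome association yields a fairly wide causal interval, which is one reason these analyses need very large genetic samples.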