29,028 research outputs found

    Improved Imputation of Common and Uncommon Single Nucleotide Polymorphisms (SNPs) with a New Reference Set

    Get PDF
    Statistical imputation of genotype data is an important technique for analysis of genome-wide association studies (GWAS). We have built a reference dataset to improve imputation accuracy for studies of individuals of primarily European descent using genotype data from the Hap1, Omni1, and Omni2.5 human SNP arrays (Illumina). Our dataset contains 2.5-3.1 million variants for 930 European, 157 Asian, and 162 African/African-American individuals. Imputation accuracy of European data from Hap660 or OmniExpress array content, measured by the proportion of variants imputed with R^2^>0.8, improved by 34%, 23% and 12% for variants with MAF of 3%, 5% and 10%, respectively, compared to imputation using publicly available data from 1,000 Genomes and International HapMap projects. The improved accuracy with the use of the new dataset could increase the power for GWAS by as much as 8% relative to genotyping all variants. This reference dataset is available to the scientific community through the NCBI dbGaP portal. Future versions will include additional genotype data as well as non-European populations

    The Population Genetic Signature of Polygenic Local Adaptation

    Full text link
    Adaptation in response to selection on polygenic phenotypes may occur via subtle allele frequencies shifts at many loci. Current population genomic techniques are not well posed to identify such signals. In the past decade, detailed knowledge about the specific loci underlying polygenic traits has begun to emerge from genome-wide association studies (GWAS). Here we combine this knowledge from GWAS with robust population genetic modeling to identify traits that may have been influenced by local adaptation. We exploit the fact that GWAS provide an estimate of the additive effect size of many loci to estimate the mean additive genetic value for a given phenotype across many populations as simple weighted sums of allele frequencies. We first describe a general model of neutral genetic value drift for an arbitrary number of populations with an arbitrary relatedness structure. Based on this model we develop methods for detecting unusually strong correlations between genetic values and specific environmental variables, as well as a generalization of QST/FSTQ_{ST}/F_{ST} comparisons to test for over-dispersion of genetic values among populations. Finally we lay out a framework to identify the individual populations or groups of populations that contribute to the signal of overdispersion. These tests have considerably greater power than their single locus equivalents due to the fact that they look for positive covariance between like effect alleles, and also significantly outperform methods that do not account for population structure. We apply our tests to the Human Genome Diversity Panel (HGDP) dataset using GWAS data for height, skin pigmentation, type 2 diabetes, body mass index, and two inflammatory bowel disease datasets. This analysis uncovers a number of putative signals of local adaptation, and we discuss the biological interpretation and caveats of these results.Comment: 42 pages including 8 figures and 3 tables; supplementary figures and tables not included on this upload, but are mostly unchanged from v

    Association Signals Unveiled by a Comprehensive Gene Set Enrichment Analysis of Dental Caries Genome-Wide Association Studies

    Get PDF
    Gene set-based analysis of genome-wide association study (GWAS) data has recently emerged as a useful approach to examine the joint effects of multiple risk loci in complex human diseases or phenotypes. Dental caries is a common, chronic, and complex disease leading to a decrease in quality of life worldwide. In this study, we applied the approaches of gene set enrichment analysis to a major dental caries GWAS dataset, which consists of 537 cases and 605 controls. Using four complementary gene set analysis methods, we analyzed 1331 Gene Ontology (GO) terms collected from the Molecular Signatures Database (MSigDB). Setting false discovery rate (FDR) threshold as 0.05, we identified 13 significantly associated GO terms. Additionally, 17 terms were further included as marginally associated because they were top ranked by each method, although their FDR is higher than 0.05. In total, we identified 30 promising GO terms, including 'Sphingoid metabolic process,' 'Ubiquitin protein ligase activity,' 'Regulation of cytokine secretion,' and 'Ceramide metabolic process.' These GO terms encompass broad functions that potentially interact and contribute to the oral immune response related to caries development, which have not been reported in the standard single marker based analysis. Collectively, our gene set enrichment analysis provided complementary insights into the molecular mechanisms and polygenic interactions in dental caries, revealing promising association signals that could not be detected through single marker analysis of GWAS data. © 2013 Wang et al

    Genome-wide screening for DNA variants associated with reading and language traits

    Get PDF
    This research was funded by: Max Planck Society, the University of St Andrews - Grant Number: 018696, US National Institutes of Health - Grant Number: P50 HD027802, Wellcome Trust - Grant Number: 090532/Z/09/Z, and Medical Research Council Hub Grant Grant Number: G0900747 91070Reading and language abilities are heritable traits that are likely to share some genetic influences with each other. To identify pleiotropic genetic variants affecting these traits, we first performed a genome‐wide association scan (GWAS) meta‐analysis using three richly characterized datasets comprising individuals with histories of reading or language problems, and their siblings. GWAS was performed in a total of 1862 participants using the first principal component computed from several quantitative measures of reading‐ and language‐related abilities, both before and after adjustment for performance IQ. We identified novel suggestive associations at the SNPs rs59197085 and rs5995177 (uncorrected P ≈ 10–7 for each SNP), located respectively at the CCDC136/FLNC and RBFOX2 genes. Each of these SNPs then showed evidence for effects across multiple reading and language traits in univariate association testing against the individual traits. FLNC encodes a structural protein involved in cytoskeleton remodelling, while RBFOX2 is an important regulator of alternative splicing in neurons. The CCDC136/FLNC locus showed association with a comparable reading/language measure in an independent sample of 6434 participants from the general population, although involving distinct alleles of the associated SNP. Our datasets will form an important part of on‐going international efforts to identify genes contributing to reading and language skills.Publisher PDFPeer reviewe

    Accurate Genomic Prediction Of Human Height

    Full text link
    We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, \sim40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correlate \sim0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction. The variance captured for height is comparable to the estimated SNP heritability from GCTA (GREML) analysis, and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for the SNPs used. Thus, our results resolve the common SNP portion of the "missing heritability" problem -- i.e., the gap between prediction R-squared and SNP heritability. The \sim20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common SNPs. Our primary dataset is the UK Biobank cohort, comprised of almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier GWAS for out-of-sample validation of our results.Comment: 17 pages, 10 figure
    corecore