Improved Imputation of Common and Uncommon Single Nucleotide Polymorphisms (SNPs) with a New Reference Set
Statistical imputation of genotype data is an important technique for the analysis of genome-wide association studies (GWAS). We have built a reference dataset to improve imputation accuracy for studies of individuals of primarily European descent, using genotype data from the Hap1, Omni1, and Omni2.5 human SNP arrays (Illumina). Our dataset contains 2.5-3.1 million variants for 930 European, 157 Asian, and 162 African/African-American individuals. Imputation accuracy of European data from Hap660 or OmniExpress array content, measured by the proportion of variants imputed with R² > 0.8, improved by 34%, 23%, and 12% for variants with minor allele frequency (MAF) of 3%, 5%, and 10%, respectively, compared to imputation using publicly available data from the 1000 Genomes and International HapMap projects. The improved accuracy with the use of the new dataset could increase the power for GWAS by as much as 8% relative to genotyping all variants. This reference dataset is available to the scientific community through the NCBI dbGaP portal. Future versions will include additional genotype data as well as non-European populations.
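The accuracy metric above (the proportion of variants imputed with R² > 0.8, reported per MAF) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' pipeline: we take per-variant imputation R² to be the squared Pearson correlation between imputed allele dosages (0 to 2) and true genotypes in a masked validation set, and we compute MAF from the true genotypes. The function names and the half-open `[lo, hi)` binning are hypothetical.

```python
def dosage_r2(imputed, true):
    """Squared Pearson correlation between imputed dosages and true genotypes
    (assumed definition of per-variant imputation R^2)."""
    n = len(imputed)
    mx, my = sum(imputed) / n, sum(true) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(imputed, true))
    sxx = sum((x - mx) ** 2 for x in imputed)
    syy = sum((y - my) ** 2 for y in true)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined; treat as a failed imputation
    return sxy * sxy / (sxx * syy)


def minor_allele_freq(genotypes):
    """MAF from 0/1/2 genotype counts of one variant."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1 - p)


def fraction_well_imputed(variants, threshold=0.8):
    """variants: list of (imputed_dosages, true_genotypes) pairs."""
    if not variants:
        return 0.0
    good = sum(1 for imp, tru in variants if dosage_r2(imp, tru) > threshold)
    return good / len(variants)


def fraction_by_maf_bin(variants, edges, threshold=0.8):
    """Fraction of well-imputed variants in each half-open MAF bin [lo, hi).
    Bins with no variants map to None."""
    result = {}
    for lo, hi in zip(edges, edges[1:]):
        in_bin = [(i, t) for i, t in variants if lo <= minor_allele_freq(t) < hi]
        result[(lo, hi)] = fraction_well_imputed(in_bin, threshold) if in_bin else None
    return result


if __name__ == "__main__":
    variants = [
        ([0.1, 1.0, 1.9, 0.0], [0, 1, 2, 0]),  # close to truth: R^2 near 1
        ([1.0, 1.0, 1.0, 1.2], [0, 1, 2, 0]),  # uninformative dosages: low R^2
    ]
    print(fraction_well_imputed(variants))  # 0.5
    print(fraction_by_maf_bin(variants, [0.0, 0.1, 0.5]))
```

Comparing two reference panels then amounts to running the same masked variants through each and comparing the per-bin fractions; whether the reported percentages are relative or absolute gains is not specified here.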
The Population Genetic Signature of Polygenic Local Adaptation
Adaptation in response to selection on polygenic phenotypes may occur via subtle allele frequency shifts at many loci. Current population genomic techniques are not well suited to identifying such signals. In the past decade, detailed knowledge about the specific loci underlying polygenic traits has begun to emerge from genome-wide association studies (GWAS). Here we combine this knowledge from GWAS with robust population genetic modeling to identify traits that may have been influenced by local adaptation. We exploit the fact that GWAS provide an estimate of the additive effect size of many loci to estimate the mean additive genetic value for a given phenotype across many populations as simple weighted sums of allele frequencies. We first describe a general model of neutral genetic value drift for an arbitrary number of populations with an arbitrary relatedness structure. Based on this model, we develop methods for detecting unusually strong correlations between genetic values and specific environmental variables, as well as a generalization of QST/FST comparisons to test for overdispersion of genetic values among populations. Finally, we lay out a framework to identify the individual populations or groups of populations that contribute to the signal of overdispersion. These tests have considerably greater power than their single-locus equivalents because they look for positive covariance between like-effect alleles, and they also significantly outperform methods that do not account for population structure. We apply our tests to the Human Genome Diversity Panel (HGDP) dataset using GWAS data for height, skin pigmentation, type 2 diabetes, body mass index, and two inflammatory bowel disease datasets. This analysis uncovers a number of putative signals of local adaptation, and we discuss the biological interpretation and caveats of these results.
Comment: 42 pages including 8 figures and 3 tables; supplementary figures and tables are not included in this upload but are mostly unchanged from v
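The two computational ingredients described above, population genetic values as weighted sums of allele frequencies and a test for overdispersion that accounts for relatedness, can be sketched as below. This is a simplified illustration, not the authors' exact statistic. We assume diploid additive coding, so the mean genetic value of population m is Z_m = 2 * sum_l alpha_l * p_ml, and we assume the caller supplies mean-centered genetic values together with their neutral covariance matrix (how that matrix is estimated from genome-wide data is outside this sketch). Under a multivariate normal neutral model, the quadratic form Q = z^T C^{-1} z follows a chi-square distribution with len(z) degrees of freedom. All function names are hypothetical.

```python
def genetic_values(effects, freqs):
    """Mean additive genetic value per population.
    effects[l]: additive effect size alpha_l of locus l (from a GWAS).
    freqs[m][l]: frequency of the effect allele at locus l in population m.
    Assumes diploid additive coding: Z_m = 2 * sum_l alpha_l * p_ml."""
    return [2.0 * sum(a * p for a, p in zip(effects, row)) for row in freqs]


def solve(matrix, vector):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x


def overdispersion_statistic(z_centered, cov):
    """Q = z^T cov^{-1} z and its degrees of freedom.
    If z ~ N(0, cov) under neutrality, Q ~ chi-square(len(z))."""
    w = solve(cov, z_centered)
    return sum(a * b for a, b in zip(z_centered, w)), len(z_centered)


if __name__ == "__main__":
    z = genetic_values([0.5, -0.2], [[0.1, 0.4], [0.3, 0.2], [0.6, 0.5]])
    print(z)  # approximately [-0.06, 0.22, 0.4]
    print(overdispersion_statistic([1.0, 1.0], [[2.0, 1.0], [1.0, 2.0]]))
```

Because centering on the across-population mean makes the full covariance matrix singular, one common workaround is to drop one population before computing Q; the covariance argument here is a placeholder for whatever structure-aware estimate is used.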
GenEpi: gene-based epistasis discovery using machine learning.
Background: Genome-wide association studies (GWAS) provide a powerful means to identify associations between genetic variants and phenotypes. However, GWAS techniques for detecting epistasis, the interactions between genetic variants associated with phenotypes, are still limited. We believe that developing an efficient and effective GWAS method to detect epistasis will be key to discovering intricate pathogenic mechanisms, which is especially important for complex diseases such as Alzheimer's disease (AD).
Results: In this regard, this study presents GenEpi, a computational package that uncovers epistasis associated with phenotypes using the proposed machine learning approach. GenEpi identifies both within-gene and cross-gene epistasis through a two-stage modeling workflow. In both stages, GenEpi adopts two-element combinatorial encoding when producing features and constructs the prediction models by L1-regularized regression with stability selection. On simulated data, GenEpi outperformed other widely used methods in detecting the ground-truth epistasis. For real data, this study uses AD as an example to demonstrate the capability of GenEpi to find disease-related variants and variant interactions that show both biological meaning and predictive power.
Conclusions: The results on simulated data and AD demonstrate that GenEpi can detect epistasis associated with phenotypes effectively and efficiently. The released package could be generalized to facilitate studies of many complex diseases in the near future.
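The feature construction and model selection described above can be illustrated with a small sketch. It is not GenEpi's code: the encoding (one 0/1 feature per SNP pair and genotype pair) is only one plausible reading of "two-element combinatorial encoding", the L1-regularized regression is a plain coordinate-descent lasso on standardized features, and stability selection is approximated by refitting on random half-subsamples and recording how often each feature gets a nonzero coefficient. All names, the penalty `lam`, and the 0.6 frequency cutoff are hypothetical choices.

```python
import random
from math import sqrt


def pairwise_encode(genotypes):
    """Hypothetical pairwise encoding: for each SNP pair (i, j) and genotype
    pair (a, b), a feature equal to 1.0 when SNP i == a and SNP j == b.
    genotypes[s][k] in {0, 1, 2} for sample s, SNP k."""
    n_snps = len(genotypes[0])
    names, columns = [], []
    for i in range(n_snps):
        for j in range(i + 1, n_snps):
            for a in range(3):
                for b in range(3):
                    names.append(f"snp{i}={a}&snp{j}={b}")
                    columns.append([1.0 if g[i] == a and g[j] == b else 0.0
                                    for g in genotypes])
    return names, columns


def lasso_cd(columns, y, lam, n_iter=100, tol=1e-8):
    """Minimise (1/2n)||y - Xb||^2 + lam*||b||_1 by coordinate descent on
    standardised columns and centred y; constant columns keep b = 0."""
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]
    xs = []
    for col in columns:
        m = sum(col) / n
        s = sqrt(sum((v - m) ** 2 for v in col) / n)
        xs.append([(v - m) / s for v in col] if s > 0 else None)
    beta = [0.0] * len(columns)
    for _ in range(n_iter):
        max_change = 0.0
        for j, x in enumerate(xs):
            if x is None:
                continue
            # correlation of feature j with the partial residual
            rho = sum(xi * (ri + xi * beta[j]) for xi, ri in zip(x, resid)) / n
            new = max(abs(rho) - lam, 0.0) * (1.0 if rho > 0 else -1.0)  # soft threshold
            delta = new - beta[j]
            if delta != 0.0:
                resid = [ri - xi * delta for xi, ri in zip(x, resid)]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def stability_selection(columns, y, lam=0.1, n_rounds=20, frac=0.5, seed=0):
    """Fraction of random subsamples in which each feature is selected."""
    rng = random.Random(seed)
    n = len(y)
    size = int(n * frac)
    counts = [0] * len(columns)
    for _ in range(n_rounds):
        idx = rng.sample(range(n), size)
        beta = lasso_cd([[col[i] for i in idx] for col in columns],
                        [y[i] for i in idx], lam)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_rounds for c in counts]


if __name__ == "__main__":
    # Toy data: phenotype is 1 only when SNP0 == 2 and SNP1 == 2.
    genos = [[i % 3, (i // 3) % 3, (i // 9) % 3] for i in range(81)]
    y = [1.0 if g[0] == 2 and g[1] == 2 else 0.0 for g in genos]
    names, cols = pairwise_encode(genos)
    freq = stability_selection(cols, y)
    print([nm for nm, f in zip(names, freq) if f >= 0.6])
```

On this toy phenotype the interaction feature `snp0=2&snp1=2` should be the one retained; real data would need tuning of `lam`, the subsample fraction, and the cutoff.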
Association Signals Unveiled by a Comprehensive Gene Set Enrichment Analysis of Dental Caries Genome-Wide Association Studies
Gene set-based analysis of genome-wide association study (GWAS) data has recently emerged as a useful approach to examine the joint effects of multiple risk loci in complex human diseases or phenotypes. Dental caries is a common, chronic, and complex disease that reduces quality of life worldwide. In this study, we applied gene set enrichment analysis approaches to a major dental caries GWAS dataset consisting of 537 cases and 605 controls. Using four complementary gene set analysis methods, we analyzed 1331 Gene Ontology (GO) terms collected from the Molecular Signatures Database (MSigDB). Setting the false discovery rate (FDR) threshold at 0.05, we identified 13 significantly associated GO terms. Additionally, 17 terms were included as marginally associated because they were top-ranked by each method, although their FDR was higher than 0.05. In total, we identified 30 promising GO terms, including 'Sphingoid metabolic process,' 'Ubiquitin protein ligase activity,' 'Regulation of cytokine secretion,' and 'Ceramide metabolic process.' These GO terms encompass broad functions that potentially interact and contribute to the oral immune response related to caries development, and they had not been reported in standard single-marker analyses. Collectively, our gene set enrichment analysis provided complementary insights into the molecular mechanisms and polygenic interactions in dental caries, revealing promising association signals that could not be detected through single-marker analysis of GWAS data. © 2013 Wang et al.
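The abstract sets an FDR threshold of 0.05 but does not name the FDR procedure. As an assumption, the sketch below uses the Benjamini-Hochberg step-up procedure, a standard choice, applied to gene-set p-values that some enrichment method has already produced. It returns BH-adjusted p-values and the indices of gene sets declared significant at level `q`.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure.
    Returns (adjusted p-values in input order, sorted indices with adjusted p <= q)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    rejected = [i for i in range(m) if adjusted[i] <= q]
    return adjusted, rejected


if __name__ == "__main__":
    # Hypothetical p-values for four GO terms.
    adj, hits = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
    print(hits)  # [0]
```

Rejecting every hypothesis whose adjusted p-value is at most `q` is equivalent to the usual "largest k with p_(k) <= k*q/m" formulation; with four tests, only the smallest p-value (0.01 <= 0.0125) passes here.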
Genome-wide screening for DNA variants associated with reading and language traits
This research was funded by: Max Planck Society; the University of St Andrews (Grant Number: 018696); US National Institutes of Health (Grant Number: P50 HD027802); Wellcome Trust (Grant Number: 090532/Z/09/Z); and Medical Research Council Hub Grant (Grant Number: G0900747 91070).
Reading and language abilities are heritable traits that are likely to share some genetic influences with each other. To identify pleiotropic genetic variants affecting these traits, we first performed a genome-wide association scan (GWAS) meta-analysis using three richly characterized datasets comprising individuals with histories of reading or language problems, and their siblings. GWAS was performed in a total of 1862 participants using the first principal component computed from several quantitative measures of reading- and language-related abilities, both before and after adjustment for performance IQ. We identified novel suggestive associations at the SNPs rs59197085 and rs5995177 (uncorrected P ≈ 10⁻⁷ for each SNP), located respectively at the CCDC136/FLNC and RBFOX2 genes. Each of these SNPs then showed evidence for effects across multiple reading and language traits in univariate association testing against the individual traits. FLNC encodes a structural protein involved in cytoskeleton remodelling, while RBFOX2 is an important regulator of alternative splicing in neurons. The CCDC136/FLNC locus showed association with a comparable reading/language measure in an independent sample of 6434 participants from the general population, although involving distinct alleles of the associated SNP. Our datasets will form an important part of ongoing international efforts to identify genes contributing to reading and language skills.
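The composite phenotype above, the first principal component of several reading and language measures, optionally adjusted for performance IQ, can be sketched as follows. The abstract does not say how the PCA or the adjustment was done, so this is an assumption-laden illustration: measures are standardized, PC1 is found by power iteration on their correlation matrix, its sign is fixed so that loadings sum to a positive value, and "adjustment for IQ" is modeled as taking residuals from a simple least-squares regression on IQ. Power iteration from an all-ones start works when the measures are positively correlated, as is typical for such batteries.

```python
from math import sqrt


def standardize(col):
    """Centre a column and scale it to unit sample standard deviation."""
    n = len(col)
    m = sum(col) / n
    s = sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
    return [(v - m) / s for v in col]


def first_pc_scores(measures, n_iter=200):
    """measures: list of columns (one per trait) over the same participants.
    Returns (PC1 scores, PC1 loadings) of the standardized measures."""
    z = [standardize(c) for c in measures]
    k, n = len(z), len(z[0])
    corr = [[sum(z[a][i] * z[b][i] for i in range(n)) / (n - 1) for b in range(k)]
            for a in range(k)]
    v = [1.0] * k
    for _ in range(n_iter):  # power iteration towards the leading eigenvector
        w = [sum(corr[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(v) < 0:  # eigenvector sign is arbitrary; make loadings mostly positive
        v = [-x for x in v]
    scores = [sum(v[a] * z[a][i] for a in range(k)) for i in range(n)]
    return scores, v


def residualize(y, covariate):
    """Residuals of y after least-squares regression on a single covariate."""
    n = len(y)
    my, mx = sum(y) / n, sum(covariate) / n
    sxx = sum((x - mx) ** 2 for x in covariate)
    slope = sum((x - mx) * (t - my) for x, t in zip(covariate, y)) / sxx
    return [t - my - slope * (x - mx) for x, t in zip(covariate, y)]


if __name__ == "__main__":
    # Hypothetical scores for six participants on three measures.
    reading = [1, 2, 3, 4, 5, 6]
    spelling = [2, 1, 4, 3, 6, 5]
    vocab = [1, 3, 2, 5, 4, 6]
    iq = [100, 95, 110, 105, 120, 98]
    pc1, loadings = first_pc_scores([reading, spelling, vocab])
    pc1_adj = residualize(pc1, iq)  # "after adjustment for performance IQ"
    print(loadings, pc1_adj)
```

Either `pc1` or `pc1_adj` would then serve as the quantitative trait in the association scan, matching the "before and after adjustment" design.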
Accurate Genomic Prediction Of Human Height
We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high-dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture 40, 20, and 9 percent of total variance, respectively, for the three traits. For example, predicted heights correlate 0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction. The variance captured for height is comparable to the estimated SNP heritability from GCTA (GREML) analysis and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for the SNPs used. Thus, our results resolve the common-SNP portion of the "missing heritability" problem, i.e., the gap between prediction R-squared and SNP heritability. The 20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common SNPs. Our primary dataset is the UK Biobank cohort, comprising almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier GWAS for out-of-sample validation of our results.
Comment: 17 pages, 10 figures
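A linear genomic predictor of the kind described above scores each individual as a weighted sum of genotypes, and "variance captured" can be read as the squared correlation between predicted and actual values. The sketch below is illustrative only: the dictionary format, SNP identifiers, and effect sizes are hypothetical, and the method used to fit the weights is not shown. As a consistency check on the reported numbers, a correlation of 0.65 corresponds to 0.65² ≈ 0.42, in line with the roughly 40 percent of variance quoted for height.

```python
from math import sqrt


def polygenic_score(effects, genotypes):
    """Linear predictor: score = sum over SNPs of effect * allele count.
    effects: dict SNP id -> effect size (hypothetical format).
    genotypes: list of dicts SNP id -> 0/1/2, one dict per individual."""
    return [sum(b * g[snp] for snp, b in effects.items()) for g in genotypes]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def variance_explained(pred, actual):
    """Squared correlation; equals R^2 of the best linear recalibration."""
    return pearson(pred, actual) ** 2


if __name__ == "__main__":
    effects = {"rs1": 0.5, "rs2": -0.25}  # hypothetical SNPs and effects
    people = [{"rs1": 0, "rs2": 2}, {"rs1": 1, "rs2": 1}, {"rs1": 2, "rs2": 0}]
    scores = polygenic_score(effects, people)
    print(scores)                                       # [-0.5, 0.25, 1.0]
    print(variance_explained(scores, [160, 170, 180]))  # 1.0 (up to rounding)
    print(round(0.65 ** 2, 4))                          # 0.4225
```

Note that squared correlation measures variance explained after an optimal linear rescaling of the predictor; using raw scores as height predictions in cm would additionally require that scaling.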
Sex-specific glioma genome-wide association study identifies new risk locus at 3p21.31 in females, and finds sex-differences in risk at 8q24.21
- …
