A fast algorithm for detecting gene-gene interactions in genome-wide association studies
With the recent advent of high-throughput genotyping techniques, genetic data
for genome-wide association studies (GWAS) have become increasingly available,
which entails the development of efficient and effective statistical
approaches. Although many such approaches have been developed and used to
identify single-nucleotide polymorphisms (SNPs) that are associated with
complex traits or diseases, few are able to detect gene-gene interactions among
different SNPs. Genetic interactions, also known as epistasis, have been
recognized to play a pivotal role in contributing to the genetic variation of
phenotypic traits. However, because of an extremely large number of SNP-SNP
combinations in GWAS, the model dimensionality can quickly become so
overwhelming that no prevailing variable selection methods are capable of
handling this problem. In this paper, we present a statistical framework for
characterizing main genetic effects and epistatic interactions in a GWAS.
Specifically, we first propose a two-stage sure independence screening (TS-SIS)
procedure and generate a pool of candidate SNPs and interactions, which serve
as predictors to explain and predict the phenotypes of a complex trait. We also
propose a rates adjusted thresholding estimation (RATE) approach to determine
the size of the reduced model selected by an independence screening.
Regularization regression methods, such as LASSO or SCAD, are then applied to
further identify important genetic effects. Simulation studies show that the
TS-SIS procedure is computationally efficient and has an outstanding finite
sample performance in selecting potential SNPs as well as gene-gene
interactions. We apply the proposed framework to analyze an
ultrahigh-dimensional GWAS data set from the Framingham Heart Study, and select
23 active SNPs and 24 active epistatic interactions for the body mass index
variation. It shows the capability of our procedure to resolve the complexity
of genetic control. Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/14-AOAS771 by the
Institute of Mathematical Statistics (http://www.imstat.org).
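The two-stage screening idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' TS-SIS implementation: it assumes plain marginal Pearson correlation as the screening statistic, forms interactions as products of the SNPs retained in the first stage, and takes the retained model sizes (`keep_main`, `keep_pairs`) as fixed inputs rather than choosing them with RATE. The follow-up LASSO or SCAD step is omitted. All function and parameter names are hypothetical.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Sample Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def top_by_marginal_correlation(columns, y, keep):
    """Rank named predictors by |corr(x, y)| and keep the `keep` strongest."""
    ranked = sorted(
        columns, key=lambda name: abs(pearson(columns[name], y)), reverse=True
    )
    return ranked[:keep]


def two_stage_screen(snps, y, keep_main, keep_pairs):
    """Stage 1 screens single SNPs; stage 2 screens products of retained SNPs.

    Assumption: interactions are only considered among stage-1 survivors,
    which keeps the number of candidate pairs small.
    """
    main = top_by_marginal_correlation(snps, y, keep_main)
    products = {
        (a, b): [u * v for u, v in zip(snps[a], snps[b])]
        for a, b in combinations(main, 2)
    }
    pairs = top_by_marginal_correlation(products, y, keep_pairs)
    return main, pairs


if __name__ == "__main__":
    # Simulated genotypes coded 0/1/2; the effect sizes are arbitrary.
    rng = random.Random(0)
    n, m = 200, 30
    snps = {f"snp{j}": [rng.choice([0, 1, 2]) for _ in range(n)] for j in range(m)}
    y = [
        1.0 * snps["snp3"][i] + 1.5 * snps["snp3"][i] * snps["snp7"][i] + rng.gauss(0, 1)
        for i in range(n)
    ]
    main, pairs = two_stage_screen(snps, y, keep_main=5, keep_pairs=3)
    print("retained SNPs:", main)
    print("retained interactions:", pairs)
```

Restricting the second stage to pairs of retained SNPs reduces the candidate interactions from m(m-1)/2 to keep_main(keep_main-1)/2, at the cost of missing interactions between SNPs that show no marginal signal.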
The evolution of genetic architectures underlying quantitative traits
In the classic view introduced by R. A. Fisher, a quantitative trait is
encoded by many loci with small, additive effects. Recent advances in QTL
mapping have begun to elucidate the genetic architectures underlying vast
numbers of phenotypes across diverse taxa, producing observations that
sometimes contrast with Fisher's blueprint. Despite these considerable
empirical efforts to map the genetic determinants of traits, it remains poorly
understood how the genetic architecture of a trait should evolve, or how it
depends on the selection pressures on the trait. Here we develop a simple,
population-genetic model for the evolution of genetic architectures. Our model
predicts that traits under moderate selection should be encoded by many loci
with highly variable effects, whereas traits under either weak or strong
selection should be encoded by relatively few loci. We compare these
theoretical predictions to qualitative trends in the genetics of human traits,
and to systematic data on the genetics of gene expression levels in yeast. Our
analysis provides an evolutionary explanation for broad empirical patterns in
the genetic basis of traits, and it introduces a single framework that unifies
the diversity of observed genetic architectures, ranging from Mendelian to
Fisherian. Comment: Minor changes in the text; added supplementary material.
Multiple locus linkage analysis of genomewide expression in yeast.
With the ability to measure thousands of related phenotypes from a single biological sample, it is now feasible to genetically dissect systems-level biological phenomena. The genetics of transcriptional regulation and protein abundance are likely to be complex, meaning that genetic variation at multiple loci will influence these phenotypes. Several recent studies have investigated the role of genetic variation in transcription by applying traditional linkage analysis methods to genomewide expression data, where each gene expression level was treated as a quantitative trait and analyzed separately from one another. Here, we develop a new, computationally efficient method for simultaneously mapping multiple gene expression quantitative trait loci that directly uses all of the available data. Information shared across gene expression traits is captured in a way that makes minimal assumptions about the statistical properties of the data. The method produces easy-to-interpret measures of statistical significance for both individual loci and the overall joint significance of multiple loci selected for a given expression trait. We apply the new method to a cross between two strains of the budding yeast Saccharomyces cerevisiae, and estimate that at least 37% of all gene expression traits show two simultaneous linkages, where we have allowed for epistatic interactions. Pairs of jointly linking quantitative trait loci are identified with high confidence for 170 gene expression traits, where it is expected that both loci are true positives for at least 153 traits. In addition, we are able to show that epistatic interactions contribute to gene expression variation for at least 14% of all traits. We compare the proposed approach to an exhaustive two-dimensional scan over all pairs of loci. Surprisingly, we demonstrate that an exhaustive two-dimensional scan is less powerful than the sequential search used here. 
In addition, we show that a two-dimensional scan does not truly allow one to test for simultaneous linkage, and the statistical significance measured from this existing method cannot be interpreted among many traits.
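The contrast between a sequential search and an exhaustive two-dimensional scan can be sketched as follows. This is only an illustration of the two search strategies, not the method of the abstract above: it assumes binary (0/1) genotypes, a single trait, and residual sum of squares from cell means as the fit criterion, and it does not share information across traits or compute any significance measure. The function names are hypothetical.

```python
from itertools import combinations


def cell_rss(y, *loci):
    """Residual sum of squares after fitting one mean per genotype combination.

    With binary genotypes, one mean per cell of two loci is the full two-locus
    model, including the epistatic (interaction) term.
    """
    cells = {}
    for i, value in enumerate(y):
        key = tuple(locus[i] for locus in loci)
        cells.setdefault(key, []).append(value)
    rss = 0.0
    for values in cells.values():
        mean = sum(values) / len(values)
        rss += sum((v - mean) ** 2 for v in values)
    return rss


def sequential_search(genotypes, y):
    """Pick the best single locus, then the best partner conditional on it."""
    first = min(genotypes, key=lambda a: cell_rss(y, genotypes[a]))
    others = [b for b in genotypes if b != first]
    second = min(
        others, key=lambda b: cell_rss(y, genotypes[first], genotypes[b])
    )
    return first, second


def exhaustive_scan(genotypes, y):
    """Fit every pair of loci and return the pair with the smallest RSS."""
    return min(
        combinations(genotypes, 2),
        key=lambda p: cell_rss(y, genotypes[p[0]], genotypes[p[1]]),
    )


if __name__ == "__main__":
    # Toy data: the trait depends on g1, g2, and their interaction.
    genotypes = {
        "g1": [0, 0, 0, 0, 1, 1, 1, 1],
        "g2": [0, 0, 1, 1, 0, 0, 1, 1],
        "g3": [0, 1, 0, 1, 0, 1, 0, 1],
    }
    y = [2 * a + b + a * b for a, b in zip(genotypes["g1"], genotypes["g2"])]
    print(sequential_search(genotypes, y))
    print(exhaustive_scan(genotypes, y))
```

For m loci the sequential search fits m + (m - 1) models, while the exhaustive scan fits m(m - 1)/2, so the latter faces a much larger multiple-testing burden as m grows.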
Dissection of QTL effects for root traits using a chromosome arm-specific mapping population in bread wheat
A high-resolution chromosome arm-specific mapping population was used in an attempt to locate/detect gene(s)/QTL for different root traits on the short arm of rye chromosome 1 (1RS) in bread wheat. This population consisted of induced homoeologous recombinants of 1RS with 1BS, each originating from a different crossover event and distinct from all other recombinants in the proportions of rye and wheat chromatin present. It provides a simple and powerful approach to detect even small QTL effects using fewer progeny. A promising empirical Bayes method was applied to estimate additive and epistatic effects for all possible marker pairs simultaneously in a single model. This method has an advantage for QTL analysis in minimizing the error variance and detecting interaction effects between loci with no main effect. A total of 15 QTL effects, 6 additive and 9 epistatic, were detected for different traits of root length and root weight in 1RS wheat. Epistatic interactions were further partitioned into inter-genomic (wheat and rye alleles) and intra-genomic (rye–rye or wheat–wheat alleles) interactions affecting various root traits. Four common regions were identified involving all the QTL for root traits. Two regions carried QTL for almost all the root traits and were responsible for all the epistatic interactions. Evidence for inter-genomic interactions is provided. Comparison of mean values supported the QTL detection
Informative Bayesian Model Selection: a method for identifying interactions in genome-wide data
In high-dimensional genome-wide association (GWA) data, a key challenge is to detect genomic variants that interact in a nonlinear fashion in their association with disease. Identifying such genomic interactions is important for elucidating the inheritance of complex phenotypes and diseases. In this paper, we introduce a new computational method called Informative Bayesian Model Selection (IBMS) that leverages correlation among variants in GWA data due to linkage disequilibrium to identify interactions accurately in a computationally efficient manner. IBMS combines several statistical methods, including canonical correlation analysis, logistic regression analysis, and a Bayesian statistical measure for evaluating interactions. Compared to BOOST and BEAM, two widely used methods for detecting genomic interactions, IBMS had significantly higher power when evaluated on synthetic data. Furthermore, when applied to Alzheimer's disease GWA data, IBMS identified previously reported interactions. IBMS is a useful method for identifying variants in GWA data, and software that implements IBMS is freely available online from http://lbb.ut.ac.ir/Download/LBBsoft/IBMS.