623 research outputs found
LIMIX: genetic analysis of multiple traits
Multi-trait mixed models have emerged as a promising approach for joint analyses of multiple traits. In principle, the mixed model framework is remarkably general.
However, current methods implement only a very specific range of tasks to optimize the necessary computations. Here, we present a multi-trait modeling framework that is versatile and fast: LIMIX enables to exibly adapt mixed models for a broad range of applications with different observed and hidden covariates, and variable study designs. To highlight the novel modeling aspects of LIMIX we performed three vastly different genetic studies: joint GWAS of correlated blood lipid phenotypes, joint analysis of the expression levels of the multiple transcript-isoforms of
a gene, and pathway-based modeling of molecular traits across environments. In these applications we show that LIMIX increases GWAS power and phenotype prediction
accuracy, in particular when integrating stepwise multi-locus regression into multi-trait models, and when analyzing large numbers of traits. An open source
implementation of LIMIX is freely available at: https://github.com/PMBio/limix
Bayesian Model Selection in Complex Linear Systems, as Illustrated in Genetic Association Studies
Motivated by examples from genetic association studies, this paper considers
the model selection problem in a general complex linear model system and in a
Bayesian framework. We discuss formulating model selection problems and
incorporating context-dependent {\it a priori} information through different
levels of prior specifications. We also derive analytic Bayes factors and their
approximations to facilitate model selection and discuss their theoretical and
computational properties. We demonstrate our Bayesian approach based on an
implemented Markov Chain Monte Carlo (MCMC) algorithm in simulations and a real
data application of mapping tissue-specific eQTLs. Our novel results on Bayes
factors provide a general framework to perform efficient model comparisons in
complex linear model systems
Gene expression in large pedigrees: analytic approaches.
BackgroundWe currently have the ability to quantify transcript abundance of messenger RNA (mRNA), genome-wide, using microarray technologies. Analyzing genotype, phenotype and expression data from 20 pedigrees, the members of our Genetic Analysis Workshop (GAW) 19 gene expression group published 9 papers, tackling some timely and important problems and questions. To study the complexity and interrelationships of genetics and gene expression, we used established statistical tools, developed newer statistical tools, and developed and applied extensions to these tools.MethodsTo study gene expression correlations in the pedigree members (without incorporating genotype or trait data into the analysis), 2 papers used principal components analysis, weighted gene coexpression network analysis, meta-analyses, gene enrichment analyses, and linear mixed models. To explore the relationship between genetics and gene expression, 2 papers studied expression quantitative trait locus allelic heterogeneity through conditional association analyses, and epistasis through interaction analyses. A third paper assessed the feasibility of applying allele-specific binding to filter potential regulatory single-nucleotide polymorphisms (SNPs). Analytic approaches included linear mixed models based on measured genotypes in pedigrees, permutation tests, and covariance kernels. To incorporate both genotype and phenotype data with gene expression, 4 groups employed linear mixed models, nonparametric weighted U statistics, structural equation modeling, Bayesian unified frameworks, and multiple regression.Results and discussionRegarding the analysis of pedigree data, we found that gene expression is familial, indicating that at least 1 factor for pedigree membership or multiple factors for the degree of relationship should be included in analyses, and we developed a method to adjust for familiality prior to conducting weighted co-expression gene network analysis. For SNP association and conditional analyses, we found FaST-LMM (Factored Spectrally Transformed Linear Mixed Model) and SOLAR-MGA (Sequential Oligogenic Linkage Analysis Routines -Major Gene Analysis) have similar type 1 and type 2 errors and can be used almost interchangeably. To improve the power and precision of association tests, prior knowledge of DNase-I hypersensitivity sites or other relevant biological annotations can be incorporated into the analyses. On a biological level, eQTL (expression quantitative trait loci) are genetically complex, exhibiting both allelic heterogeneity and epistasis. Including both genotype and phenotype data together with measurements of gene expression was found to be generally advantageous in terms of generating improved levels of significance and in providing more interpretable biological models.ConclusionsPedigrees can be used to conduct analyses of and enhance gene expression studies
Efficient inference for genetic association studies with multiple outcomes
Combined inference for heterogeneous high-dimensional data is critical in
modern biology, where clinical and various kinds of molecular data may be
available from a single study. Classical genetic association studies regress a
single clinical outcome on many genetic variants one by one, but there is an
increasing demand for joint analysis of many molecular outcomes and genetic
variants in order to unravel functional interactions. Unfortunately, most
existing approaches to joint modelling are either too simplistic to be powerful
or are impracticable for computational reasons. Inspired by Richardson et al.
(2010, Bayesian Statistics 9), we consider a sparse multivariate regression
model that allows simultaneous selection of predictors and associated
responses. As Markov chain Monte Carlo (MCMC) inference on such models can be
prohibitively slow when the number of genetic variants exceeds a few thousand,
we propose a variational inference approach which produces posterior
information very close to that of MCMC inference, at a much reduced
computational cost. Extensive numerical experiments show that our approach
outperforms popular variable selection methods and tailored Bayesian
procedures, dealing within hours with problems involving hundreds of thousands
of genetic variants and tens to hundreds of clinical or molecular outcomes
- …