Fast model-fitting of Bayesian variable selection regression using the iterative complex factorization algorithm
Bayesian variable selection regression (BVSR) is able to jointly analyze
genome-wide genetic datasets, but the slow computation via Markov chain Monte
Carlo (MCMC) has hampered its widespread use. Here we present a novel iterative
method to solve a special class of linear systems, which can increase the speed
of the BVSR model-fitting tenfold. The iterative method hinges on the complex
factorization of the sum of two matrices and the solution path resides in the
complex domain (instead of the real domain). Compared to the Gauss-Seidel
method, the complex factorization converges almost instantaneously and its
error is several orders of magnitude smaller than that of the Gauss-Seidel method. More
importantly, the error is always within the pre-specified precision, whereas that of the
Gauss-Seidel method is not. For large problems with thousands of covariates,
the complex factorization is 10 to 100 times faster than either the
Gauss-Seidel method or the direct method via the Cholesky decomposition. In
BVSR, one needs to repeatedly solve large penalized regression systems whose
design matrices change only slightly between adjacent MCMC steps. This slight
change in the design matrix enables the adaptation of the iterative complex
factorization method. The computational innovation will facilitate the
widespread use of BVSR in reanalyzing genome-wide association datasets. Comment: Accepted versio
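The abstract above compares against two baselines for solving a penalized regression system: a direct Cholesky solve and the Gauss-Seidel iteration. The following is a minimal sketch of those two baselines only, assuming a ridge-type system (X^T X + lam*I) beta = X^T y; the function names, the penalty form, and the toy data are illustrative assumptions, and the iterative complex factorization itself is not reproduced here because the abstract does not give its details.

```python
import math
import random

def ridge_system(X, y, lam):
    """Form A = X^T X + lam*I and b = X^T y (assumed penalized system)."""
    n, p = len(X), len(X[0])
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    return A, b

def cholesky_solve(A, b):
    """Direct solve of A x = b for symmetric positive definite A."""
    p = len(A)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * p                      # forward substitution: L z = b
    for i in range(p):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * p                      # back substitution: L^T x = z
    for i in reversed(range(p)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, p))) / L[i][i]
    return x

def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=10_000):
    """Gauss-Seidel sweeps; stops when the largest coordinate update < tol.

    x0 allows a warm start, e.g. from the solution at a previous step.
    Returns the iterate and the number of sweeps used.
    """
    p = len(A)
    x = list(x0) if x0 is not None else [0.0] * p
    for it in range(1, max_iter + 1):
        delta = 0.0
        for i in range(p):
            s = b[i] - sum(A[i][j] * x[j] for j in range(p) if j != i)
            new = s / A[i][i]
            delta = max(delta, abs(new - x[i]))
            x[i] = new
        if delta < tol:
            return x, it
    return x, max_iter

# Toy data (hypothetical): 50 observations, 5 covariates.
random.seed(0)
n, p, lam = 50, 5, 1.0
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]

A, b = ridge_system(X, y, lam)
x_chol = cholesky_solve(A, b)
x_gs, iters = gauss_seidel(A, b)
max_diff = max(abs(u - v) for u, v in zip(x_chol, x_gs))
print("Gauss-Seidel sweeps:", iters, "max difference vs Cholesky:", max_diff)
```

Note that the Gauss-Seidel stopping rule here bounds the size of the last update, not the true error, which is one way an iterative solver can miss a requested precision.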
Bayesian variable selection regression for genome-wide association studies and other large-scale problems
We consider applying Bayesian Variable Selection Regression, or BVSR, to
genome-wide association studies and similar large-scale regression problems.
Currently, typical genome-wide association studies measure hundreds of
thousands, or millions, of genetic variants (SNPs), in thousands or tens of
thousands of individuals, and attempt to identify regions harboring SNPs that
affect some phenotype or outcome of interest. This goal can naturally be cast
as a variable selection regression problem, with the SNPs as the covariates in
the regression. Characteristic features of genome-wide association studies
include the following: (i) a focus primarily on identifying relevant variables,
rather than on prediction; and (ii) many relevant covariates may have tiny
effects, making it effectively impossible to confidently identify the complete
"correct" subset of variables. Taken together, these factors put a premium on
having interpretable measures of confidence for individual covariates being
included in the model, which we argue is a strength of BVSR compared with
alternatives such as penalized regression methods. Here we focus primarily on
analysis of quantitative phenotypes, and on appropriate prior specification for
BVSR in this setting, emphasizing the idea of considering what the priors imply
about the total proportion of variance in outcome explained by relevant
covariates. We also emphasize the potential for BVSR to estimate this
proportion of variance explained, and hence shed light on the issue of "missing
heritability" in genome-wide association studies. Comment: Published at
http://dx.doi.org/10.1214/11-AOAS455 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Expression QTLs Mapping and Analysis: A Bayesian Perspective.
The aim of expression Quantitative Trait Locus (eQTL) mapping is the identification of DNA sequence variants that explain variation in gene expression. Given the recent yield of trait-associated genetic variants identified by large-scale genome-wide association analyses (GWAS), eQTL mapping has become a useful tool to understand the functional context where these variants operate and eventually narrow down functional gene targets for disease. Despite its extensive application to complex (polygenic) traits and disease, the majority of eQTL studies still rely on univariate data modeling strategies, i.e., testing for association of all transcript-marker pairs. However, these "one-at-a-time" strategies (1) are unable to control the number of false positives when an intricate Linkage Disequilibrium structure is present and (2) are often underpowered to detect the full spectrum of trans-acting regulatory effects. Here we present our viewpoint on the most recent advances in eQTL mapping approaches, with a focus on Bayesian methodology. We review the advantages of the Bayesian approach over frequentist methods and provide an empirical example of polygenic eQTL mapping to illustrate the different properties of frequentist and Bayesian methods. Finally, we discuss how multivariate eQTL mapping approaches have distinctive features with respect to detection of polygenic effects, accuracy, and interpretability of the results.
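The point about "one-at-a-time" testing under Linkage Disequilibrium can be illustrated with a small simulation. The sketch below is not from the paper: it assumes two correlated markers (correlation about 0.9), of which only the first affects the simulated expression level, and it uses plain least squares instead of a Bayesian model. All names and parameter values are hypothetical.

```python
import random

def center(v):
    """Subtract the mean so that slopes can be computed without an intercept."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def marginal_slope(g, y):
    """Univariate regression slope of y on a single marker g."""
    g, y = center(g), center(y)
    return dot(g, y) / dot(g, g)

def joint_slopes(g1, g2, y):
    """Least-squares slopes when both markers enter the model together."""
    g1, g2, y = center(g1), center(g2), center(y)
    a11, a12, a22 = dot(g1, g1), dot(g1, g2), dot(g2, g2)
    b1, b2 = dot(g1, y), dot(g2, y)
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

# Simulated data (hypothetical): marker 1 is causal, marker 2 is only in LD with it.
random.seed(1)
n, r = 2000, 0.9
g1 = [random.gauss(0, 1) for _ in range(n)]
g2 = [r * a + (1 - r * r) ** 0.5 * random.gauss(0, 1) for a in g1]
expr = [1.0 * a + random.gauss(0, 1) for a in g1]

m1 = marginal_slope(g1, expr)
m2 = marginal_slope(g2, expr)          # inflated by LD with marker 1
j1, j2 = joint_slopes(g1, g2, expr)    # marker 2 shrinks toward zero

print("marginal slopes:", m1, m2)
print("joint slopes:   ", j1, j2)
```

Tested one at a time, the non-causal marker shows a sizeable apparent effect; in the joint fit its coefficient drops close to zero, which is the behaviour that motivates multivariate approaches.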
Bayesian meta-analysis across genome-wide association studies of diverse phenotypes
Genome-wide association studies (GWAS) are a powerful tool for understanding the genetic basis of diseases and traits, but most studies have been conducted in isolation, with a focus on either a single phenotype or a set of closely related phenotypes. We describe MetABF, a simple Bayesian framework for performing integrative meta-analysis across multiple GWAS using summary statistics. The approach is applicable across a wide range of study designs and can increase the power by 50% compared with standard frequentist tests when only a subset of studies have a true effect. We demonstrate its utility in a meta-analysis of 20 diverse GWAS which were part of the Wellcome Trust Case Control Consortium 2. The novelty of the approach is its ability to explore, and assess the evidence for, a range of possible true patterns of association across studies in a computationally efficient framework. Peer reviewe