Efficient inference for genetic association studies with multiple outcomes
Combined inference for heterogeneous high-dimensional data is critical in
modern biology, where clinical and various kinds of molecular data may be
available from a single study. Classical genetic association studies regress a
single clinical outcome on many genetic variants one by one, but there is an
increasing demand for joint analysis of many molecular outcomes and genetic
variants in order to unravel functional interactions. Unfortunately, most
existing approaches to joint modelling are either too simplistic to be powerful
or impracticable for computational reasons. Inspired by Richardson et al.
(2010, Bayesian Statistics 9), we consider a sparse multivariate regression
model that allows simultaneous selection of predictors and associated
responses. As Markov chain Monte Carlo (MCMC) inference on such models can be
prohibitively slow when the number of genetic variants exceeds a few thousand,
we propose a variational inference approach which produces posterior
information very close to that of MCMC inference, at a much reduced
computational cost. Extensive numerical experiments show that our approach
outperforms popular variable selection methods and tailored Bayesian
procedures, dealing within hours with problems involving hundreds of thousands
of genetic variants and tens to hundreds of clinical or molecular outcomes.
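The kind of variational scheme described here can be illustrated with a minimal coordinate-ascent sketch for spike-and-slab regression (in the spirit of Carbonetto and Stephens' varbvs updates, not the authors' exact algorithm); the function name, hyperparameters, and fixed noise variance are illustrative assumptions:

```python
import numpy as np

def cavi_spike_slab(X, y, sigma2=1.0, sigma2_b=1.0, pi=0.1, iters=50):
    """Coordinate-ascent variational inference for y = X b + noise,
    b_j = s_j * beta_j with s_j ~ Bern(pi), beta_j ~ N(0, sigma2_b).
    q factorises over j; alpha_j approximates P(s_j = 1 | data)."""
    n, p = X.shape
    xtx = (X ** 2).sum(axis=0)                   # x_j^T x_j per column
    alpha = np.full(p, pi)                       # posterior inclusion probs
    mu = np.zeros(p)                             # posterior mean of beta_j | s_j = 1
    s2 = sigma2 / (xtx + sigma2 / sigma2_b)      # posterior var of beta_j | s_j = 1
    Xr = X @ (alpha * mu)                        # running fitted values
    logit_pi = np.log(pi / (1.0 - pi))
    for _ in range(iters):
        for j in range(p):
            old = alpha[j] * mu[j]
            # residual correlation with x_j, excluding j's own contribution
            r_j = X[:, j] @ y - (X[:, j] @ Xr - xtx[j] * old)
            mu[j] = s2[j] / sigma2 * r_j
            u = logit_pi + 0.5 * np.log(s2[j] / sigma2_b) \
                + mu[j] ** 2 / (2.0 * s2[j])
            alpha[j] = 1.0 / (1.0 + np.exp(-u))
            Xr += X[:, j] * (alpha[j] * mu[j] - old)
    return alpha, mu
```

Each sweep costs O(np), which is what makes this style of inference feasible where MCMC over the same spike-and-slab posterior is not.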
An Empirical Study of Stochastic Variational Algorithms for the Beta Bernoulli Process
Stochastic variational inference (SVI) is emerging as the most promising
candidate for scaling inference in Bayesian probabilistic models to large
datasets. However, the performance of these methods has been assessed primarily
in the context of Bayesian topic models, particularly latent Dirichlet
allocation (LDA). Deriving several new algorithms, and using synthetic, image
and genomic datasets, we investigate whether the understanding gleaned from LDA
applies in the setting of sparse latent factor models, specifically beta
process factor analysis (BPFA). We demonstrate that the big picture is
consistent: using Gibbs sampling within SVI to maintain certain posterior
dependencies is extremely effective. However, we find that different posterior
dependencies are important in BPFA relative to LDA. Particularly,
approximations able to model intra-local variable dependence perform best.
Comment: ICML, 12 pages. Volume 37: Proceedings of The 32nd International Conference on Machine Learning, 2015
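The SVI recipe being assessed — subsample, form a noisy natural-parameter estimate as if the minibatch were the whole data set, then take a decaying Robbins-Monro step — can be sketched on a toy conjugate model (a Gaussian mean with known unit variance; all names and step-size constants below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def svi_gaussian_mean(x, batch=20, steps=500, tau=1.0, kappa=0.7, seed=0):
    """Stochastic variational inference for mu in x_i ~ N(mu, 1) with a
    N(0, 1) prior. The Gaussian variational posterior is tracked via its
    natural parameters (precision, precision * mean)."""
    rng = np.random.default_rng(seed)
    N = len(x)
    prec, pm = 1.0, 0.0                       # initialise at the prior
    for t in range(steps):
        idx = rng.integers(0, N, size=batch)  # sample a minibatch
        # intermediate estimate: rescale the minibatch to the full data size
        prec_hat = 1.0 + N                    # each point adds unit precision
        pm_hat = 0.0 + (N / batch) * x[idx].sum()
        rho = (t + tau) ** -kappa             # Robbins-Monro step size
        prec = (1 - rho) * prec + rho * prec_hat
        pm = (1 - rho) * pm + rho * pm_hat
    return pm / prec, 1.0 / prec              # posterior mean and variance
```

For this conjugate toy model the noisy updates converge to the exact posterior; in BPFA the same outer loop is kept, but the local step is replaced by the richer (e.g. Gibbs-based) approximations the paper compares.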
Binary Linear Classification and Feature Selection via Generalized Approximate Message Passing
For the problem of binary linear classification and feature selection, we
propose algorithmic approaches to classifier design based on the generalized
approximate message passing (GAMP) algorithm, recently proposed in the context
of compressive sensing. We are particularly motivated by problems where the
number of features greatly exceeds the number of training examples, but where
only a few features suffice for accurate classification. We show that
sum-product GAMP can be used to (approximately) minimize the classification
error rate and max-sum GAMP can be used to minimize a wide variety of
regularized loss functions. Furthermore, we describe an
expectation-maximization (EM)-based scheme to learn the associated model
parameters online, as an alternative to cross-validation, and we show that
GAMP's state-evolution framework can be used to accurately predict the
misclassification rate. Finally, we present a detailed numerical study to
confirm the accuracy, speed, and flexibility afforded by our GAMP-based
approaches to binary linear classification and feature selection.
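To give a feel for the message-passing recursion involved, here is a minimal AMP sketch for the linear (Gaussian-output) special case with a sparsity-inducing soft-threshold denoiser; sum-product GAMP generalises this by swapping in a non-Gaussian output channel such as the logistic likelihood used for classification. The function names and the threshold rule are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def soft(u, t):
    """Soft thresholding: the scalar denoiser for a Laplacian prior."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def amp(A, y, iters=50, kappa=2.0):
    """Basic AMP recursion for sparse recovery from y = A x + noise.
    A has n rows (measurements) and p columns (features)."""
    n, p = A.shape
    x = np.zeros(p)
    z = y.copy()
    for _ in range(iters):
        tau = kappa * np.linalg.norm(z) / np.sqrt(n)  # effective noise level
        r = x + A.T @ z                               # pseudo-data estimate
        x_new = soft(r, tau)
        # Onsager correction: average derivative of the denoiser
        onsager = (np.count_nonzero(x_new) / n) * z
        z = y - A @ x_new + onsager
        x = x_new
    return x
```

The Onsager term is what distinguishes AMP from plain iterative thresholding: it decorrelates the residual from past iterates, which is also what makes the state-evolution prediction of performance (mentioned above for GAMP) accurate.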