Variable selection for BART: An application to gene regulation
We consider the task of discovering gene regulatory networks, which are
defined as sets of genes and the corresponding transcription factors which
regulate their expression levels. This can be viewed as a variable selection
problem, potentially with high dimensionality. Variable selection is especially
challenging in high-dimensional settings, where it is difficult to detect
subtle individual effects and interactions between predictors. Bayesian
Additive Regression Trees [BART, Ann. Appl. Stat. 4 (2010) 266-298] provides a
novel nonparametric alternative to parametric regression approaches, such as
the lasso or stepwise regression, especially when the number of relevant
predictors is sparse relative to the total number of available predictors and
the fundamental relationships are nonlinear. We develop a principled
permutation-based inferential approach for determining when the effect of a
selected predictor is likely to be real. Going further, we adapt the BART
procedure to incorporate informed prior information about variable importance.
We present simulations demonstrating that our method compares favorably to
existing parametric and nonparametric procedures in a variety of data settings.
To demonstrate the potential of our approach in a biological context, we apply
it to the task of inferring the gene regulatory network in yeast (Saccharomyces
cerevisiae). We find that our BART-based procedure is best able to recover the
subset of covariates with the largest signal compared to other variable
selection methods. The methods developed in this work are readily available in
the R package bartMachine.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/14-AOAS755 by the
Institute of Mathematical Statistics (http://www.imstat.org)
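The permutation idea in the abstract above can be illustrated with a minimal sketch. It is not the bartMachine procedure: absolute Pearson correlation stands in for a BART-derived importance score, and the max-over-predictors threshold, the function names, and the toy data are all assumptions made for this example. The response is shuffled repeatedly to build a null distribution of importance, and a predictor is kept only if its observed importance exceeds a high quantile of that null.

```python
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_select(columns, y, n_perm=200, alpha=0.05, seed=0):
    """Keep predictors whose importance beats a permutation-null threshold."""
    rng = random.Random(seed)
    observed = [abs_correlation(c, y) for c in columns]
    null_max = []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # break any link between predictors and response
        null_max.append(max(abs_correlation(c, y_perm) for c in columns))
    null_max.sort()
    # Empirical (1 - alpha) quantile of the largest null importance.
    threshold = null_max[min(n_perm - 1, int((1 - alpha) * n_perm))]
    selected = [j for j, score in enumerate(observed) if score > threshold]
    return selected, threshold


# Toy data (assumed): only column 0 drives the response.
rng = random.Random(42)
n = 100
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(5)]
y = [3.0 * x0 + rng.gauss(0, 0.5) for x0 in columns[0]]
selected, threshold = permutation_select(columns, y)
```

Taking the maximum over predictors within each permutation gives a single, conservative cutoff shared by all predictors; a per-predictor cutoff would be a less strict alternative.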
Feature and Variable Selection in Classification
The amount of information in the form of features and variables available
to machine learning algorithms is ever increasing. This can lead to classifiers
that are prone to overfitting in high dimensions; high-dimensional models do
not lend themselves to interpretable results; and the CPU and memory resources
necessary to run on high-dimensional datasets severely limit the applications of
the approaches. Variable and feature selection aim to remedy this by finding a
subset of features that in some way captures the information provided best. In
this paper we present the general methodology and highlight some specific
approaches.
Comment: Part of master seminar in document analysis held by Marcus
Eichenberger-Liwick
Reliable ABC model choice via random forests
Approximate Bayesian computation (ABC) methods provide an elaborate approach
to Bayesian inference on complex models, including model choice. Both
theoretical arguments and simulation experiments indicate, however, that model
posterior probabilities may be poorly evaluated by standard ABC techniques. We
propose a novel approach based on a machine learning tool named random forests
to conduct selection among the highly complex models covered by ABC algorithms.
We thus modify the way Bayesian model selection is both understood and
operated, in that we rephrase the inferential goal as a classification problem,
first predicting the model that best fits the data with random forests and
postponing the approximation of the posterior probability of the predicted MAP
for a second stage also relying on random forests. Compared with earlier
implementations of ABC model choice, the ABC random forest approach offers
several potential improvements: (i) it often has a larger discriminative power
among the competing models, (ii) it is more robust against the number and
choice of statistics summarizing the data, (iii) the computing effort is
drastically reduced (with a gain in computation efficiency of at least a factor of fifty),
and (iv) it includes an approximation of the posterior probability of the
selected model. The call to random forests will undoubtedly extend the range of
size of datasets and complexity of models that ABC can handle. We illustrate
the power of this novel methodology by analyzing controlled experiments as well
as genuine population genetics datasets. The proposed methodologies are
implemented in the R package abcrf available on CRAN.
Comment: 39 pages, 15 figures, 6 tables
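The reframing of model choice as classification described above can be sketched in a few lines. This is not the abcrf implementation: bagged decision stumps stand in for a random forest, the two candidate models (Gaussian versus uniform noise), their priors, and the summary statistics are hypothetical, and the second stage that estimates the posterior probability of the chosen model is left out. A reference table of summaries is simulated from each model, a classifier is trained on it, and the observed summaries are then classified.

```python
import random
import statistics


def summaries(sample):
    """Summary statistics of one dataset: mean, variance, kurtosis."""
    m = statistics.fmean(sample)
    v = statistics.pvariance(sample, mu=m)
    k = sum((x - m) ** 4 for x in sample) / (len(sample) * v * v)
    return [m, v, k]


def simulate(model, rng, n=300):
    """Draw a parameter from its prior, then a dataset from the model."""
    if model == 0:  # hypothetical model 0: Gaussian noise
        sigma = rng.uniform(0.5, 2.0)
        return [rng.gauss(0.0, sigma) for _ in range(n)]
    half_width = rng.uniform(1.0, 4.0)  # hypothetical model 1: uniform noise
    return [rng.uniform(-half_width, half_width) for _ in range(n)]


def fit_stump(rows, labels):
    """Pick the single-feature threshold split with the best training accuracy."""
    best = None
    for f in range(len(rows[0])):
        values = sorted(r[f] for r in rows)
        for t in values[:: max(1, len(values) // 20)]:
            for left in (0, 1):  # label predicted when the feature is <= t
                correct = sum(
                    (left if r[f] <= t else 1 - left) == y
                    for r, y in zip(rows, labels)
                )
                if best is None or correct > best[0]:
                    best = (correct, f, t, left)
    return best[1:]


def predict(stumps, row):
    """Majority vote over stumps; also return the share of votes for model 1."""
    votes = sum((left if row[f] <= t else 1 - left) for f, t, left in stumps)
    share = votes / len(stumps)
    return (1 if share > 0.5 else 0), share


rng = random.Random(1)
labels = [rng.randrange(2) for _ in range(200)]          # model index prior
table = [summaries(simulate(m, rng)) for m in labels]    # reference table
stumps = []
for _ in range(25):  # bagging: refit on bootstrap resamples of the table
    idx = [rng.randrange(len(table)) for _ in table]
    stumps.append(fit_stump([table[i] for i in idx], [labels[i] for i in idx]))

observed = summaries(simulate(1, rng))  # pretend data, generated from model 1
chosen, vote_share = predict(stumps, observed)
```

The vote share returned here is only a classifier score and should not be read as a posterior probability, which is why the abstract defers that estimate to a separate second stage.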
A hybrid algorithm for Bayesian network structure learning with application to multi-label learning
We present a novel hybrid algorithm for Bayesian network structure learning,
called H2PC. It first reconstructs the skeleton of a Bayesian network and then
performs a Bayesian-scoring greedy hill-climbing search to orient the edges.
The algorithm is based on divide-and-conquer constraint-based subroutines to
learn the local structure around a target variable. We conduct two series of
experimental comparisons of H2PC against Max-Min Hill-Climbing (MMHC), which is
currently the most powerful state-of-the-art algorithm for Bayesian network
structure learning. First, we use eight well-known Bayesian network benchmarks
with various data sizes to assess the quality of the learned structure returned
by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in
terms of goodness of fit to new data and quality of the network structure with
respect to the true dependence structure of the data. Second, we investigate
H2PC's ability to solve the multi-label learning problem. We provide
theoretical results to characterize and identify graphically the so-called
minimal label powersets that appear as irreducible factors in the joint
distribution under the faithfulness condition. The multi-label learning problem
is then decomposed into a series of multi-class classification problems, where
each multi-class variable encodes a label powerset. H2PC is shown to compare
favorably to MMHC in terms of global classification accuracy over ten
multi-label data sets covering different application domains. Overall, our
experiments support the conclusions that local structural learning with H2PC in
the form of local neighborhood induction is a theoretically well-motivated and
empirically effective learning framework that is well suited to multi-label
learning. The source code (in R) of H2PC as well as all data sets used for the
empirical tests are publicly available.
Comment: arXiv admin note: text overlap with arXiv:1101.5184 by other authors
Bayesian variable selection for high dimensional generalized linear models: convergence rates of the fitted densities
Bayesian variable selection has gained much empirical success recently in a
variety of applications when the number $K$ of explanatory variables
$(x_1,\ldots,x_K)$ is possibly much larger than the sample size $n$. For
generalized linear models, if most of the $x_j$'s have very small effects on
the response $y$, we show that it is possible to use Bayesian variable
selection to reduce overfitting caused by the curse of dimensionality $K\gg n$.
In this approach a suitable prior can be used to choose a few out of the many
$x_j$'s to model $y$, so that the posterior will propose probability densities
$p$ that are ``often close'' to the true density $p^*$ in some sense. The
closeness can be described by a Hellinger distance between $p$ and $p^*$ that
scales at a power very close to $n^{-1/2}$, which is the ``finite-dimensional
rate'' corresponding to a low-dimensional situation. These findings extend some
recent work of Jiang [Technical Report 05-02 (2005) Dept. Statistics,
Northwestern Univ.] on consistency of Bayesian variable selection for binary
classification.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) at
http://dx.doi.org/10.1214/009053607000000019 by the Institute of
Mathematical Statistics (http://www.imstat.org)
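For readers unfamiliar with the Hellinger distance used above to measure how close a fitted density is to the true one, here is a minimal sketch for discrete distributions. The function name and the 1/2 normalization (which bounds the distance by 1) are assumptions of this example; the paper may use a different scaling, and its densities are not restricted to finite supports.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same support.

    Uses the convention H(p, q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2),
    so that 0 <= H <= 1.
    """
    if len(p) != len(q):
        raise ValueError("distributions must share the same support")
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * total)


same = hellinger([0.2, 0.8], [0.2, 0.8])          # identical distributions
disjoint = hellinger([1.0, 0.0], [0.0, 1.0])      # no shared mass
near = hellinger([0.50, 0.50], [0.55, 0.45])      # a small perturbation
```

Identical distributions are at distance 0 and distributions with disjoint support are at the maximum distance under this convention, so a rate such as the one quoted in the abstract describes how quickly the fitted density moves toward the low end of that range as the sample size grows.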
Nearly optimal Bayesian Shrinkage for High Dimensional Regression
During the past decade, shrinkage priors have received much attention in
Bayesian analysis of high-dimensional data. In this paper, we study the problem
for high-dimensional linear regression models. We show that if the shrinkage
prior has a heavy and flat tail, and allocates a sufficiently large probability
mass in a very small neighborhood of zero, then its posterior properties are as
good as those of the spike-and-slab prior. While remaining efficient in
Bayesian computation, the shrinkage prior can achieve the same nearly optimal
contraction rate and selection consistency as the spike-and-slab prior. Our
numerical results show that under posterior consistency, Bayesian methods can
yield much better results in variable selection than the regularization
methods, such as Lasso and SCAD. We also establish a Bernstein-von Mises type
result comparable to that of Castillo et al. (2015); this result leads to a convenient
way to quantify uncertainties of the regression coefficient estimates, which
has been beyond the ability of regularization methods
- …