Bayesian variable selection for high dimensional generalized linear models: convergence rates of the fitted densities
Bayesian variable selection has gained much empirical success recently in a
variety of applications when the number K of explanatory variables x_1, ..., x_K
is possibly much larger than the sample size n. For
generalized linear models, if most of the x_j's have very small effects on
the response y, we show that it is possible to use Bayesian variable
selection to reduce overfitting caused by the curse of dimensionality K >> n.
In this approach a suitable prior can be used to choose a few out of the many
x_j's to model y, so that the posterior will propose probability densities f
that are ``often close'' to the true density f* in some sense. The
closeness can be described by a Hellinger distance between f and f* that
scales at a power very close to n^{-1/2}, which is the ``finite-dimensional
rate'' corresponding to a low-dimensional situation. These findings extend some
recent work of Jiang [Technical Report 05-02 (2005) Dept. Statistics,
Northwestern Univ.] on consistency of Bayesian variable selection for binary
classification.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/009053607000000019
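For readers unfamiliar with the metric, a minimal LaTeX restatement of the convergence notion, assuming the standard definition of the Hellinger distance (the abstract itself does not spell it out, and the epsilon formulation below is our paraphrase of "a power very close to n^{-1/2}"):

    % Hellinger distance between a fitted density f and the true density f*
    d_H(f, f^*) = \left( \tfrac{1}{2} \int \bigl( \sqrt{f} - \sqrt{f^*} \bigr)^2 \, d\mu \right)^{1/2}
    % Claimed contraction: for any fixed \epsilon > 0, the posterior
    % concentrates on densities satisfying
    d_H(f, f^*) \lesssim n^{-1/2 + \epsilon}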
Gaussian Mixture Regression model with logistic weights, a penalized maximum likelihood approach
We wish to estimate a conditional density using a Gaussian Mixture Regression
model with logistic weights and means depending on the covariate. We aim at
selecting the number of components of this model, as well as the other
parameters, by a penalized maximum likelihood approach. We provide a lower bound
on the penalty, proportional up to a logarithmic term to the dimension of each
model, that ensures an oracle inequality for our estimator. Our theoretical
analysis is supported by numerical experiments.
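A schematic form of the model and selection criterion may help; the notation below is our own, since the abstract names no symbols, and the penalty constant is left abstract:

    % K-component Gaussian mixture regression: logistic weights and means
    % both depend on the covariate x
    s_{K,\theta}(y \mid x) = \sum_{k=1}^{K}
        \frac{e^{\beta_k^\top x + b_k}}{\sum_{l=1}^{K} e^{\beta_l^\top x + b_l}}
        \, \varphi\bigl( y;\, \mu_k(x),\, \Sigma_k \bigr)
    % Penalized maximum likelihood over a collection of models m, where D_m
    % is the dimension of model m and the log factor matches the abstract's
    % "up to a logarithmic term":
    \widehat{m} = \arg\min_m \Bigl\{ -\textstyle\sum_{i=1}^{n} \log \widehat{s}_m(y_i \mid x_i) + \mathrm{pen}(m) \Bigr\},
    \qquad \mathrm{pen}(m) \propto D_m \log n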
Bayesian Neural Tree Models for Nonparametric Regression
Frequentist and Bayesian methods differ in many aspects, but share some basic
optimal properties. In real-life classification and regression problems,
situations exist in which a model based on one of the methods is preferable
based on some subjective criterion. Nonparametric classification and regression
techniques, such as decision trees and neural networks, have frequentist
(classification and regression trees (CART) and artificial neural networks) as
well as Bayesian (Bayesian CART and Bayesian neural networks) approaches to
learning from data. In this work, we present two hybrid models combining the
Bayesian and frequentist versions of CART and neural networks, which we call
the Bayesian neural tree (BNT) models. Both models exploit the architecture of
decision trees and have fewer parameters to tune than advanced
neural networks. Such models can simultaneously perform feature selection and
prediction, are highly flexible, and generalize well in settings with a limited
number of training observations. We study the consistency of the proposed
models, and derive the optimal value of an important model parameter. We also
provide illustrative examples using a wide variety of real-life regression data
sets.
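The abstract does not give the BNT construction itself; as a loose, hypothetical illustration of how a tree/network hybrid can couple feature selection with prediction, here is a scikit-learn sketch (our pipeline, not the authors' model):

    # Hypothetical tree/neural-net hybrid sketch (NOT the BNT model itself):
    # a shallow tree ranks features, then a small MLP predicts from the top ones.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.neural_network import MLPRegressor

    X, y = make_regression(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Step 1: shallow tree selects features via impurity-based importances.
    tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_tr, y_tr)
    keep = np.argsort(tree.feature_importances_)[-5:]  # keep the top 5 features

    # Step 2: small network (few parameters to tune) on the selected features.
    net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
    net.fit(X_tr[:, keep], y_tr)
    print("held-out R^2:", net.score(X_te[:, keep], y_te))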
Challenges in Markov chain Monte Carlo for Bayesian neural networks
Markov chain Monte Carlo (MCMC) methods have not been broadly adopted in
Bayesian neural networks (BNNs). This paper initially reviews the main
challenges in sampling from the parameter posterior of a neural network via
MCMC. Such challenges culminate in a lack of convergence to the parameter
posterior. Nevertheless, this paper shows that a non-converged Markov chain,
generated via MCMC sampling from the parameter space of a neural network, can
yield via Bayesian marginalization a valuable predictive posterior of the
output of the neural network. Classification examples based on multilayer
perceptrons showcase highly accurate predictive posteriors. The postulate of
limited scope for MCMC developments in BNNs is partially valid; an
asymptotically exact parameter posterior seems less plausible, yet an accurate
predictive posterior is a tenable research avenue.
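A minimal sketch of the marginalization step the paper leans on: average the per-draw class probabilities over the chain, converged or not. The network and the `samples` container are placeholders of ours, not the paper's code:

    # Monte Carlo approximation of the predictive posterior
    # p(y | x, D) ~= (1/T) * sum_t p(y | x, theta_t),
    # where theta_t are weight samples from some MCMC chain.
    import numpy as np

    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def mlp_logits(x, params):
        # One-hidden-layer MLP; params = (W1, b1, W2, b2).
        W1, b1, W2, b2 = params
        h = np.tanh(x @ W1 + b1)
        return h @ W2 + b2

    def predictive_posterior(x, samples):
        # Average probabilities (not logits): this is the Bayesian model average.
        probs = [softmax(mlp_logits(x, theta)) for theta in samples]
        return np.mean(probs, axis=0)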
Training Two-Layer ReLU Networks with Gradient Descent is Inconsistent
We prove that two-layer (Leaky)ReLU networks initialized by e.g. the widely
used method proposed by He et al. (2015) and trained using gradient descent on
a least-squares loss are not universally consistent. Specifically, we describe
a large class of one-dimensional data-generating distributions for which, with
high probability, gradient descent only finds a bad local minimum of the
optimization landscape. It turns out that in these cases, the found network
essentially performs linear regression even if the target function is
non-linear. We further provide numerical evidence that this happens in
practical situations for some multi-dimensional distributions, and that
stochastic gradient descent exhibits similar behavior.
Comment: Changes in v2: single-column layout, NTK discussion, new experiment, updated introduction, improved explanations. 20 pages + 33 pages appendix. Code available at https://github.com/dholzmueller/nn_inconsistenc
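A compact sketch of the setting the paper studies, under our own toy choices of data and hyperparameters: a two-layer ReLU network with He-style initialization, trained by full-batch gradient descent on a least-squares loss. This reproduces the setup only; the inconsistency result concerns what such runs converge to, which the sketch does not claim to demonstrate.

    # Two-layer ReLU network, He-style init, full-batch GD on squared loss.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, m = 256, 1, 64                    # samples, input dim, hidden width
    X = rng.uniform(-1.0, 1.0, size=(n, d))
    y = np.sin(3.0 * X[:, 0])               # a non-linear 1-D target

    W1 = rng.normal(0.0, np.sqrt(2.0 / d), size=(d, m))  # He et al. (2015) scale
    b1 = np.zeros(m)
    w2 = rng.normal(0.0, np.sqrt(2.0 / m), size=m)

    lr = 1e-2
    for _ in range(5000):
        z = X @ W1 + b1                     # pre-activations
        h = np.maximum(z, 0.0)              # ReLU
        r = h @ w2 - y                      # residuals of the squared loss
        grad_w2 = h.T @ r / n
        dh = np.outer(r, w2) * (z > 0)      # backprop through the ReLU
        W1 -= lr * (X.T @ dh / n)
        b1 -= lr * dh.mean(axis=0)
        w2 -= lr * grad_w2

    print("final MSE:", float(np.mean((np.maximum(X @ W1 + b1, 0) @ w2 - y) ** 2)))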
A review of probabilistic forecasting and prediction with machine learning
Predictions and forecasts of machine learning models should take the form of
probability distributions, aiming to increase the quantity of information
communicated to end users. Although applications of probabilistic prediction
and forecasting with machine learning models in academia and industry are
becoming more frequent, related concepts and methods have not been formalized
and structured under a holistic view of the entire field. Here, we review the
topic of predictive uncertainty estimation with machine learning algorithms, as
well as the related metrics (consistent scoring functions and proper scoring
rules) for assessing probabilistic predictions. The review covers a time period
spanning from the introduction of early statistical algorithms (linear
regression and time series models based on Bayesian statistics or quantile
regression) to recent machine learning algorithms (including generalized
additive models for location, scale and shape, random forests, boosting and
deep learning algorithms) that are more flexible by nature. The review of the
progress in the field expedites our understanding of how to develop new
algorithms tailored to users' needs, since the latest advancements are based on
some fundamental concepts applied to more complex algorithms. We conclude by
classifying the material and discussing challenges that are becoming a hot
topic of research.
Comment: 83 pages, 5 figures
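As a concrete instance of one consistent scoring function within the review's scope, here is a pinball (quantile) loss sketch; the loss is consistent for the tau-quantile of the predictive distribution, and the example numbers below are ours:

    # Pinball (quantile) loss: a consistent scoring function for quantiles.
    import numpy as np

    def pinball_loss(y_true, q_pred, tau):
        """Mean pinball loss of quantile forecasts q_pred at level tau."""
        diff = y_true - q_pred
        return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

    # Example: scoring a 0.9-quantile forecast against three observations.
    y = np.array([1.0, 2.0, 3.0])
    q = np.array([1.5, 1.5, 2.5])
    print(pinball_loss(y, q, tau=0.9))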
Tree models: a Bayesian perspective
Submitted in partial fulfilment of the requirements for the degree of Master of Philosophy at Queen Mary, University of London, November 2006.
Classical tree models represent an attempt to create nonparametric models which
have good predictive power as well as a simple structure readily comprehensible
by non-experts. Bayesian tree models have been created by a team consisting of
Chipman, George and McCulloch and a second team consisting of Denison, Mallick
and Smith. Both approaches employ Green's Reversible Jump Markov Chain Monte
Carlo technique to carry out a more effective search than the `greedy' methods
used classically. The aim of this work is to evaluate both types of Bayesian
tree models from a Bayesian perspective and to compare them.
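For context, the general acceptance probability of Green's reversible jump move, as it would apply to grow/prune/change proposals on trees; this is the standard statement, not quoted from the thesis. A move from model M with parameters theta to model M' with parameters theta', using auxiliary draws u ~ g and u' ~ g' to match dimensions, is accepted with probability

    \alpha = \min\left\{ 1,\;
      \frac{p(y \mid \theta', M')\, p(\theta' \mid M')\, p(M')\,
            j(M \mid M')\, g'(u')}
           {p(y \mid \theta, M)\, p(\theta \mid M)\, p(M)\,
            j(M' \mid M)\, g(u)}
      \left| \frac{\partial(\theta', u')}{\partial(\theta, u)} \right|
    \right\}

where j(. | .) denotes the probability of choosing the move and the last factor is the Jacobian of the dimension-matching map.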