
    Bayesian variable selection for high dimensional generalized linear models: convergence rates of the fitted densities

    Bayesian variable selection has gained much empirical success recently in a variety of applications when the number K of explanatory variables (x_1, ..., x_K) is possibly much larger than the sample size n. For generalized linear models, if most of the x_j's have very small effects on the response y, we show that it is possible to use Bayesian variable selection to reduce overfitting caused by the curse of dimensionality K ≫ n. In this approach a suitable prior can be used to choose a few out of the many x_j's to model y, so that the posterior will propose probability densities p that are "often close" to the true density p* in some sense. The closeness can be described by a Hellinger distance between p and p* that scales at a power very close to n^{-1/2}, which is the "finite-dimensional rate" corresponding to a low-dimensional situation. These findings extend some recent work of Jiang [Technical Report 05-02 (2005), Dept. Statistics, Northwestern Univ.] on consistency of Bayesian variable selection for binary classification.
    Comment: Published at http://dx.doi.org/10.1214/009053607000000019 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
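    As a rough illustration of the convergence statement above, the Hellinger distance and the rate can be written out as follows; the exponent δ below is a placeholder for the paper's "power very close to n^{-1/2}" phrasing, not a result quoted from it.

```latex
% Hellinger distance between a proposed density p and the true density p^* (standard definition).
\[
  d_H(p, p^*) \;=\; \left( \tfrac{1}{2} \int \bigl( \sqrt{p} - \sqrt{p^*} \bigr)^2 \right)^{1/2}
\]
% Informal reading of the rate claim: for an arbitrarily small \delta > 0 (placeholder exponent),
% densities proposed by the posterior are "often" within
\[
  d_H(p, p^*) \;\lesssim\; n^{-1/2 + \delta} .
\]
```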

    Gaussian Mixture Regression model with logistic weights, a penalized maximum likelihood approach

    We wish to estimate a conditional density using a Gaussian Mixture Regression model with logistic weights and means depending on the covariate. We aim to select the number of components of this model, as well as the other parameters, by a penalized maximum likelihood approach. We provide a lower bound on the penalty, proportional, up to a logarithmic term, to the dimension of each model, that ensures an oracle inequality for our estimator. Our theoretical analysis is supported by some numerical experiments.
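    For concreteness, a minimal sketch of a Gaussian mixture regression with logistic weights is given below; the linear parameterization of the gates and means, and the exact form of the logarithmic factor in the penalty, are illustrative assumptions rather than the precise family and bound studied in the paper.

```latex
% Conditional density: a K-component Gaussian mixture whose weights and means depend on the covariate x.
\[
  s(y \mid x) \;=\; \sum_{k=1}^{K} \pi_k(x)\, \varphi\bigl(y;\ \mu_k(x),\ \sigma_k^2\bigr),
  \qquad
  \pi_k(x) \;=\; \frac{\exp(w_k^{\top} x + b_k)}{\sum_{j=1}^{K} \exp(w_j^{\top} x + b_j)},
\]
% with \varphi the Gaussian density and, e.g., \mu_k(x) = A_k x + c_k. Model selection maximizes the
% log-likelihood penalized by a term proportional, up to a logarithmic factor, to the model dimension D_m:
\[
  \mathrm{pen}(m) \;\ge\; \kappa\, D_m \bigl( 1 + \log(\,\cdot\,) \bigr),
\]
% where \kappa and the argument of the logarithm are placeholders.
```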

    Bayesian Neural Tree Models for Nonparametric Regression

    Frequentist and Bayesian methods differ in many aspects, but share some basic optimal properties. In real-life classification and regression problems, situations exist in which a model based on one of the methods is preferable according to some subjective criterion. Nonparametric classification and regression techniques, such as decision trees and neural networks, have frequentist (classification and regression trees (CART) and artificial neural networks) as well as Bayesian (Bayesian CART and Bayesian neural networks) approaches to learning from data. In this work, we present two hybrid models combining the Bayesian and frequentist versions of CART and neural networks, which we call Bayesian neural tree (BNT) models. Both models exploit the architecture of decision trees and have fewer parameters to tune than advanced neural networks. Such models can simultaneously perform feature selection and prediction, are highly flexible, and generalize well in settings with a limited number of training observations. We study the consistency of the proposed models and derive the optimal value of an important model parameter. We also provide illustrative examples using a wide variety of real-life regression data sets.
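    The abstract does not spell out the BNT architecture, so the sketch below is only one plausible tree/network hybrid, shown to make the "tree architecture with few tunable parameters" idea concrete: a shallow CART tree partitions the inputs and its leaf memberships are fed, together with the raw features, to a small single-hidden-layer network. All names, depths, and widths are illustrative choices, not the authors' construction.

```python
# Illustrative tree/network hybrid (not the authors' exact BNT model):
# a shallow CART tree partitions the input space; its leaf indicators plus the
# raw features are passed to a small one-hidden-layer network.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=300)

# Shallow tree: acts as a feature selector / partitioner with few parameters to tune.
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
leaf_ids = tree.apply(X)                                   # leaf index per sample
leaves = np.unique(leaf_ids)
leaf_onehot = (leaf_ids[:, None] == leaves[None, :]).astype(float)

# Small network on [raw features, leaf indicators]; far fewer units than a deep net.
Z = np.hstack([X, leaf_onehot])
net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0).fit(Z, y)
print("in-sample R^2 of the hybrid:", round(net.score(Z, y), 3))
```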

    Challenges in Markov chain Monte Carlo for Bayesian neural networks

    Markov chain Monte Carlo (MCMC) methods have not been broadly adopted in Bayesian neural networks (BNNs). This paper first reviews the main challenges in sampling from the parameter posterior of a neural network via MCMC. Such challenges culminate in a lack of convergence to the parameter posterior. Nevertheless, this paper shows that a non-converged Markov chain, generated via MCMC sampling from the parameter space of a neural network, can yield, via Bayesian marginalization, a valuable predictive posterior of the output of the neural network. Classification examples based on multilayer perceptrons showcase highly accurate predictive posteriors. The postulate of limited scope for MCMC developments in BNNs is partially valid; an asymptotically exact parameter posterior seems less plausible, yet an accurate predictive posterior is a tenable research avenue.
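    As a sketch of the Bayesian marginalization step the abstract refers to: given parameter samples θ_1, ..., θ_T from an MCMC chain (converged or not), the predictive posterior of a classifier is approximated by averaging the per-sample predicted class probabilities. The tiny MLP forward pass and the randomly drawn "samples" below are stand-ins for illustration, not the paper's sampler or networks.

```python
# Monte Carlo approximation of the predictive posterior
#   p(y | x, D) ≈ (1/T) * sum_t p(y | x, theta_t)
# where theta_1, ..., theta_T are MCMC samples of the network parameters.
import numpy as np

def mlp_probs(x, theta):
    """Forward pass of a tiny one-hidden-layer MLP classifier (illustrative stand-in)."""
    W1, b1, W2, b2 = theta
    h = np.maximum(x @ W1 + b1, 0.0)              # ReLU hidden layer
    logits = h @ W2 + b2
    z = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)      # softmax class probabilities

def predictive_posterior(x, theta_samples):
    """Average class probabilities over the parameter samples (Bayesian marginalization)."""
    probs = np.stack([mlp_probs(x, theta) for theta in theta_samples])
    return probs.mean(axis=0)                     # shape: (n_points, n_classes)

# Hypothetical usage; in practice theta_samples come from an MCMC sampler.
rng = np.random.default_rng(1)
theta_samples = [
    (rng.normal(size=(2, 16)), rng.normal(size=16),
     rng.normal(size=(16, 3)), rng.normal(size=3))
    for _ in range(100)
]
x_test = rng.normal(size=(5, 2))
print(predictive_posterior(x_test, theta_samples))
```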

    Training Two-Layer ReLU Networks with Gradient Descent is Inconsistent

    We prove that two-layer (Leaky)ReLU networks initialized by, e.g., the widely used method proposed by He et al. (2015) and trained using gradient descent on a least-squares loss are not universally consistent. Specifically, we describe a large class of one-dimensional data-generating distributions for which, with high probability, gradient descent only finds a bad local minimum of the optimization landscape. It turns out that in these cases the found network essentially performs linear regression, even if the target function is non-linear. We further provide numerical evidence that this also happens in practical situations for some multi-dimensional distributions, and that stochastic gradient descent exhibits similar behavior.
    Comment: Changes in v2: single-column layout, NTK discussion, new experiment, updated introduction, improved explanations. 20 pages + 33 pages appendix. Code available at https://github.com/dholzmueller/nn_inconsistenc
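    To make the regime concrete, here is a minimal sketch of the setup the abstract describes: a two-layer ReLU network with He (Kaiming) initialization trained by full-batch gradient descent on a one-dimensional least-squares problem. The width, learning rate, step count, and data distribution are arbitrary illustrative choices and do not reproduce the paper's experiments or its negative result.

```python
# Minimal sketch: two-layer ReLU network, He initialization, full-batch gradient
# descent on a 1-D least-squares regression task (illustrative setup only).
import torch

torch.manual_seed(0)
x = torch.rand(256, 1) * 4 - 2                   # 1-D inputs in [-2, 2]
y = torch.sin(3 * x)                             # a non-linear target function

model = torch.nn.Sequential(
    torch.nn.Linear(1, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)
# He et al. (2015) initialization of the first-layer weights.
torch.nn.init.kaiming_normal_(model[0].weight, nonlinearity="relu")

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)   # full batch => plain gradient descent
for step in range(5000):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
print("final training MSE:", loss.item())
```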

    A review of probabilistic forecasting and prediction with machine learning

    Predictions and forecasts of machine learning models should take the form of probability distributions, aiming to increase the quantity of information communicated to end users. Although applications of probabilistic prediction and forecasting with machine learning models in academia and industry are becoming more frequent, related concepts and methods have not been formalized and structured under a holistic view of the entire field. Here, we review the topic of predictive uncertainty estimation with machine learning algorithms, as well as the related metrics (consistent scoring functions and proper scoring rules) for assessing probabilistic predictions. The review covers a time period spanning from the introduction of early statistical methods (linear regression and time series models, based on Bayesian statistics or quantile regression) to recent machine learning algorithms (including generalized additive models for location, scale and shape, random forests, boosting and deep learning algorithms) that are more flexible by nature. Reviewing the progress in the field expedites our understanding of how to develop new algorithms tailored to users' needs, since the latest advancements are based on some fundamental concepts applied to more complex algorithms. We conclude by classifying the material and discussing challenges that are becoming a hot topic of research.
    Comment: 83 pages, 5 figures
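    Since the abstract highlights consistent scoring functions and proper scoring rules, the sketch below shows two textbook examples used to assess probabilistic predictions: the pinball (quantile) loss and a sample-based estimate of the continuous ranked probability score (CRPS). These are the standard forms, not formulas quoted from the review.

```python
# Two standard scores for probabilistic predictions (textbook forms, for illustration):
#   - pinball loss: a consistent scoring function for the tau-quantile
#   - CRPS, estimated from an ensemble / samples of the predictive distribution
import numpy as np

def pinball_loss(y_true, q_pred, tau):
    """Mean pinball loss of quantile predictions q_pred at level tau."""
    diff = y_true - q_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

def crps_from_samples(y_true, samples):
    """Sample-based CRPS: mean over observations of E|X - y| - 0.5 * E|X - X'|,
    where samples has shape (n_obs, n_samples)."""
    term1 = np.mean(np.abs(samples - y_true[:, None]), axis=1)
    term2 = 0.5 * np.mean(np.abs(samples[:, :, None] - samples[:, None, :]), axis=(1, 2))
    return np.mean(term1 - term2)

# Hypothetical usage with a Gaussian predictive ensemble:
rng = np.random.default_rng(2)
y = rng.normal(size=100)
ens = y[:, None] + rng.normal(scale=0.5, size=(100, 200))
print("pinball(0.9):", pinball_loss(y, np.quantile(ens, 0.9, axis=1), 0.9))
print("CRPS:", crps_from_samples(y, ens))
```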

    Tree models: a Bayesian perspective

    Submitted in partial fulfilment of the requirements for the degree of Master of Philosophy at Queen Mary, University of London, November 2006.
    Classical tree models represent an attempt to create nonparametric models which have good predictive power as well as a simple structure readily comprehensible by non-experts. Bayesian tree models have been created by a team consisting of Chipman, George and McCulloch and a second team consisting of Denison, Mallick and Smith. Both approaches employ Green's Reversible Jump Markov Chain Monte Carlo technique to carry out a more effective search than the "greedy" methods used classically. The aim of this work is to evaluate both types of Bayesian tree models from a Bayesian perspective and compare them.