Bayesian Deep Net GLM and GLMM
Deep feedforward neural networks (DFNNs) are a powerful tool for functional
approximation. We describe flexible versions of generalized linear and
generalized linear mixed models incorporating basis functions formed by a DFNN.
Neural networks with random effects have not been widely used in the
literature, perhaps because of the computational challenges of
incorporating subject-specific parameters into already complex models.
Efficient computational methods for high-dimensional Bayesian inference are
developed using Gaussian variational approximation, with a parsimonious but
flexible factor parametrization of the covariance matrix. We implement natural
gradient methods for the optimization, exploiting the factor structure of the
variational covariance matrix in computation of the natural gradient. Our
flexible DFNN models and Bayesian inference approach lead to a regression and
classification method that has high prediction accuracy and is able to
quantify the prediction uncertainty in a principled and convenient way. We also
describe how to perform variable selection in our deep learning method. The
proposed methods are illustrated in a wide range of simulated and real-data
examples, and the results compare favourably to a state-of-the-art flexible
regression and classification method in the statistical literature, the
Bayesian additive regression trees (BART) method. User-friendly software
packages in Matlab, R and Python implementing the proposed methods are
available at https://github.com/VBayesLab
Comment: 35 pages, 7 figures, 10 tables
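The factor parametrization at the heart of this abstract can be made concrete. The sketch below (plain numpy, not the authors' released code; all names are illustrative) draws from a Gaussian whose covariance has the parsimonious form B B^T + diag(d^2), using only the reparameterized draw theta = mu + B z + d * eps that makes gradient-based variational optimization cheap in high dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

p, k = 8, 2                              # parameter dimension, number of factors
mu = rng.normal(size=p)                  # variational mean
B = np.tril(rng.normal(size=(p, k)))     # factor loadings (lower triangular)
d = np.abs(rng.normal(size=p)) + 0.1     # idiosyncratic standard deviations

def sample_theta(n_samples):
    """Reparameterized draws theta = mu + B z + d * eps,
    so theta ~ N(mu, B B^T + diag(d^2))."""
    z = rng.normal(size=(n_samples, k))
    eps = rng.normal(size=(n_samples, p))
    return mu + z @ B.T + eps * d

draws = sample_theta(200_000)
Sigma = B @ B.T + np.diag(d**2)          # implied covariance: only p*(k+1) free parameters
err = np.max(np.abs(np.cov(draws.T) - Sigma))   # small Monte Carlo error
```

In the paper's setting theta collects network weights and random effects, so p is large; storing the covariance via B and d (p(k+1) numbers instead of p(p+1)/2) is what keeps the Gaussian variational approximation tractable.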
Bayesian Compressed Regression
As an alternative to variable selection or shrinkage in high dimensional
regression, we propose to randomly compress the predictors prior to analysis.
This dramatically reduces storage and computational bottlenecks, performing
well when the predictors can be projected to a low dimensional linear subspace
with minimal loss of information about the response. As opposed to existing
Bayesian dimensionality reduction approaches, the exact posterior distribution
conditional on the compressed data is available analytically, speeding up
computation by many orders of magnitude while also bypassing robustness issues
due to convergence and mixing problems with MCMC. Model averaging is used to
reduce sensitivity to the random projection matrix, while accommodating
uncertainty in the subspace dimension. Strong theoretical support is provided
for the approach by showing near parametric convergence rates for the
predictive density in the large p small n asymptotic paradigm. Practical
performance relative to competitors is illustrated in simulations and real data
applications.
Comment: 29 pages, 4 figures
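A minimal sketch of the idea, under simplifying assumptions of my own choosing (known noise variance, a Gaussian prior on the compressed coefficients; not the paper's exact model): project the p predictors down to m dimensions with a random matrix, exploit the conjugate posterior that is then available in closed form, and average predictions over several projections:

```python
import numpy as np

rng = np.random.default_rng(1)

# Predictors lying near a low-dimensional linear subspace (d latent factors),
# the regime in which random compression loses little information.
n, p, m, d = 100, 2000, 10, 5
F = rng.normal(size=(n, d))                       # latent factors
L = rng.normal(size=(p, d))                       # loadings
X = F @ L.T + 0.01 * rng.normal(size=(n, p))
beta_true = np.zeros(p); beta_true[:5] = 0.3
y = X @ beta_true + rng.normal(scale=0.5, size=n)

def compressed_posterior(X, y, m, sigma2=0.25, tau2=1.0):
    """Compress predictors with a random projection, then use the exact
    conjugate normal posterior on the m compressed coefficients."""
    Phi = rng.normal(size=(X.shape[1], m)) / np.sqrt(m)
    Z = X @ Phi                                    # n x m compressed design
    prec = Z.T @ Z / sigma2 + np.eye(m) / tau2     # posterior precision
    mean = np.linalg.solve(prec, Z.T @ y / sigma2) # posterior mean
    return Phi, mean

# New observations from the same subspace; average over 20 random projections
# to reduce sensitivity to any single compression matrix.
F_new = rng.normal(size=(50, d))
X_new = F_new @ L.T + 0.01 * rng.normal(size=(50, p))
preds = []
for _ in range(20):
    Phi, mean = compressed_posterior(X, y, m)
    preds.append(X_new @ Phi @ mean)
pred = np.mean(preds, axis=0)
```

The closed-form posterior in `compressed_posterior` is the point of the abstract: no MCMC is needed once the data are compressed, which is where the claimed orders-of-magnitude speedup comes from.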
Functional Regression
Functional data analysis (FDA) involves the analysis of data whose ideal
units of observation are functions defined on some continuous domain, and the
observed data consist of a sample of functions taken from some population,
sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the
development of this field, which has accelerated in the past 10 years to become
one of the fastest growing areas of statistics, fueled by the growing number of
applications yielding this type of data. One unique characteristic of FDA is
the need to combine information both across and within functions, which Ramsay
and Silverman called replication and regularization, respectively. This article
will focus on functional regression, the area of FDA that has received the most
attention in applications and methodological development. First will be an
introduction to basis functions, key building blocks for regularization in
functional regression methods, followed by an overview of functional regression
methods, split into three types: [1] functional predictor regression
(scalar-on-function), [2] functional response regression (function-on-scalar)
and [3] function-on-function regression. For each, the role of replication and
regularization will be discussed and the methodological development described
in a roughly chronological manner, at times deviating from the historical
timeline to group together similar methods. The primary focus is on modeling
and methodology, highlighting the modeling structures that have been developed
and the various regularization approaches employed. At the end is a brief
discussion describing potential areas of future development in this field.
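To make the basis-function idea concrete, here is a small sketch (numpy, Fourier basis, ridge penalty; the simulated data and all names are illustrative, not any specific method from the article) of scalar-on-function regression: expanding the coefficient function beta(t) in K basis functions turns the integral int x_i(t) beta(t) dt into an ordinary regression on K derived covariates, with the ridge penalty supplying the regularization:

```python
import numpy as np

rng = np.random.default_rng(2)

# n curves observed on a common grid of T points in [0, 1].
T, n = 100, 150
t = np.linspace(0, 1, T)
X = np.array([np.sin(2*np.pi*(t + rng.uniform())) + 0.05*rng.normal(size=T)
              for _ in range(n)])

beta_true = np.cos(2*np.pi*t)                     # true coefficient function
dt = t[1] - t[0]
y = X @ beta_true * dt + 0.05*rng.normal(size=n)  # y_i ~ int x_i(t) beta(t) dt

# Fourier basis Phi: writing beta(t) = Phi(t) c reduces the functional model
# to an ordinary ridge regression on the n x K matrix Z = dt * X @ Phi.
Phi = np.column_stack([np.ones(T)]
                      + [f(2*np.pi*k*t) for k in (1, 2, 3) for f in (np.sin, np.cos)])
K = Phi.shape[1]
Z = dt * X @ Phi
lam = 0.1                                         # ridge penalty = regularization
c_hat = np.linalg.solve(Z.T @ Z + lam*np.eye(K), Z.T @ y)
beta_hat = Phi @ c_hat                            # estimated coefficient function
```

This is the replication-plus-regularization interplay in miniature: information is pooled across the n curves to estimate c, while the penalty keeps the estimated beta(t) smooth.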
Numerically Stable Approximate Bayesian Methods for Generalized Linear Mixed Models and Linear Model Selection
Approximate Bayesian inference methods offer methodology for fitting Bayesian models as fast alternatives to Markov chain Monte Carlo (MCMC) methods, often with only a slight loss of accuracy. In this thesis, we consider variable selection for linear models, and zero-inflated mixed models. Variable selection for linear regression models is ubiquitous in applied statistics. We use the popular g-prior (Zellner, 1986), where g is a prior hyperparameter, for selection among linear models with normal priors. We derive exact expressions for the model selection Bayes factors in terms of special functions depending on the sample size, the number of covariates, and the R-squared of the model. We show that these expressions are accurate, fast to evaluate, and numerically stable. An R package, blma, for Bayesian linear model averaging using these exact expressions has been released on GitHub. We extend the Particle EM method of Rockova (2017) using Particle Variational Approximation and the exact posterior marginal likelihood expressions to derive a computationally efficient algorithm for model selection on data sets with many covariates. Our algorithm performs well relative to existing algorithms, completing in 8 seconds on a model selection problem with a sample size of 600 and 7200 covariates. We also consider zero-inflated models, which have many applications in areas such as manufacturing and public health, but pose numerical issues when fitted to data. We apply a variational approximation to zero-inflated Poisson mixed models with Gaussian distributed random effects, using a combination of variational Bayes (VB) and the Gaussian Variational Approximation (GVA). We also incorporate a novel parameterisation of the covariance of the GVA using the Cholesky factor of the precision matrix, similar to Tan and Nott (2018), to resolve the associated numerical difficulties.
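The flavour of the closed-form expressions can be illustrated in the simplest case. For a fixed g, the Bayes factor of a model with p centred covariates against the intercept-only null under Zellner's g-prior has the well-known closed form BF = (1+g)^((n-1-p)/2) / (1 + g(1-R^2))^((n-1)/2). The thesis's exact expressions go beyond this (e.g. to priors on g), but even the fixed-g case shows why working on the log scale with log1p matters for numerical stability. The sketch below is illustrative only and is not the blma API:

```python
import numpy as np

def g_prior_log_bf(n, p, r2, g):
    """Log Bayes factor of a model with p centred covariates and fit R^2
    against the intercept-only null, under Zellner's g-prior with fixed g.
    log1p keeps the computation numerically stable for large g."""
    return 0.5*(n - 1 - p)*np.log1p(g) - 0.5*(n - 1)*np.log1p(g*(1 - r2))

def r_squared(Xs, y):
    """Ordinary R^2 with the intercept handled by centring."""
    Xc = Xs - Xs.mean(axis=0)
    yc = y - y.mean()
    coef, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    resid = yc - Xc @ coef
    return 1.0 - (resid @ resid) / (yc @ yc)

rng = np.random.default_rng(3)
n = 600
X = rng.normal(size=(n, 3))
y = X[:, 0] + 0.5*X[:, 1] + rng.normal(size=n)   # only the first two matter

g = float(n)                                      # unit-information choice g = n
log_bf_signal = g_prior_log_bf(n, 2, r_squared(X[:, :2], y), g)
log_bf_noise = g_prior_log_bf(n, 1, r_squared(X[:, 2:3], y), g)
```

Because each model's evidence depends on the data only through n, p, and R^2, scoring thousands of candidate models reduces to cheap arithmetic, which is what makes the particle-based search over large model spaces feasible.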
Nonlinearities in Macroeconomic Tail Risk through the Lens of Big Data Quantile Regressions
Modeling and predicting extreme movements in GDP is notoriously difficult and
the selection of appropriate covariates and/or possible forms of nonlinearities
are key in obtaining precise forecasts. In this paper, our focus is on using
large datasets in quantile regression models to forecast the conditional
distribution of US GDP growth. To capture possible nonlinearities, we include
several nonlinear specifications. The resulting models are huge dimensional,
and we thus rely on a set of shrinkage priors. Since Markov chain Monte Carlo
estimation becomes slow in these dimensions, we rely on fast variational Bayes
approximations to the posterior distribution of the coefficients and the latent
states. We find that our proposed set of models produces precise forecasts.
These gains are especially pronounced in the tails. Using Gaussian processes to
approximate the nonlinear component of the model further improves this good
performance, in particular in the right tail.
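As a point of reference for what the quantile-regression building block does, here is a frequentist pinball-loss sketch only (the paper itself works with Bayesian shrinkage priors and variational approximations, which this does not attempt; all names and the simulated data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Heteroskedastic data: the tails behave differently from the centre,
# which is the situation where modelling conditional quantiles pays off.
n = 2000
x = rng.uniform(-1, 1, size=n)
y = 1.0 + 2.0*x + (0.5 + 0.5*np.abs(x))*rng.normal(size=n)

def fit_quantile(x, y, tau, n_iter=5000, lr=0.05):
    """Linear quantile regression by subgradient descent on the pinball
    (check) loss rho_tau(u) = u * (tau - 1[u < 0])."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        u = y - X @ beta
        grad = -X.T @ (tau - (u < 0)) / len(y)   # subgradient of the mean loss
        beta -= lr * grad
    return beta

beta_90 = fit_quantile(x, y, 0.9)                # conditional 0.9 quantile
beta_50 = fit_quantile(x, y, 0.5)                # conditional median
```

Fitting several tau values traces out the conditional distribution of the response, which is the object the paper forecasts for GDP growth; the Bayesian version replaces the loss minimization with an asymmetric-Laplace likelihood and shrinkage priors over a much larger covariate set.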