Exact Bayesian inference via data augmentation
Data augmentation is a common tool in Bayesian statistics, especially in the application of MCMC. Data augmentation is used where direct computation of the posterior density, π(θ|x), of the parameters θ, given the observed data x, is not possible. We show that for a range of problems it is possible to augment the data by y, such that π(θ|x,y) is known and π(y|x) can easily be computed. In particular, π(y|x) is obtained by collapsing π(y,θ|x) through integrating out θ. This allows the exact computation of π(θ|x) as a mixture distribution without recourse to approximating methods such as MCMC. Useful byproducts of the exact posterior distribution are the marginal likelihood of the model and the exact predictive distribution.
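For concreteness, the identity underlying this construction (in the abstract's own notation) is

\[
\pi(\theta \mid x) \;=\; \sum_{y} \pi(\theta \mid x, y)\,\pi(y \mid x),
\qquad
\pi(y \mid x) \;=\; \int \pi(y, \theta \mid x)\, \mathrm{d}\theta,
\]

where the sum runs over the (discrete) augmented data; an integral replaces it when y is continuous. Whenever π(θ|x,y) is available in closed form and π(y|x) can be evaluated over a manageable set of values, the posterior is an exact finite mixture, and the marginal likelihood is obtained from the same collapsing step.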
Efficient data augmentation for fitting stochastic epidemic models to prevalence data
Stochastic epidemic models describe the dynamics of an epidemic as a disease
spreads through a population. Typically, only a fraction of cases are observed
at a set of discrete times. The absence of complete information about the time
evolution of an epidemic gives rise to a complicated latent variable problem in
which the state space size of the epidemic grows large as the population size
increases. This makes analytically integrating over the missing data infeasible
for populations of even moderate size. We present a data augmentation Markov
chain Monte Carlo (MCMC) framework for Bayesian estimation of stochastic
epidemic model parameters, in which measurements are augmented with
subject-level disease histories. In our MCMC algorithm, we propose each new
subject-level path, conditional on the data, using a time-inhomogeneous
continuous-time Markov process with rates determined by the infection histories
of other individuals. The method is general, and may be applied, with minimal
modifications, to a broad class of stochastic epidemic models. We present our
algorithm in the context of multiple stochastic epidemic models in which the
data are binomially sampled prevalence counts, and apply our method to data
from an outbreak of influenza in a British boarding school.
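As a rough illustration of the data-augmentation idea (though not of the paper's continuous-time, subject-level proposal mechanism), the sketch below runs a Metropolis-within-MCMC scheme for a discrete-time chain-binomial SIR model observed through binomially sampled prevalence counts; the model, priors, proposals, and all numerical settings are illustrative assumptions.

```python
# Hedged sketch: data-augmentation MCMC for a discrete-time chain-binomial SIR
# model observed through binomially sampled prevalence counts.  This is a toy
# analogue of the framework described above, not the paper's continuous-time,
# subject-level algorithm; priors, proposals and all settings are assumptions.
import numpy as np
from scipy.stats import binom, gamma as gamma_dist

rng = np.random.default_rng(1)
N, T, rho = 100, 20, 0.5                      # population size, time steps, sampling prob.

def state_paths(n, r):
    """S_t and I_t for t = 0..T implied by new infections n and removals r."""
    S = np.concatenate([[N - 1], N - 1 - np.cumsum(n)])
    I = np.concatenate([[1], 1 + np.cumsum(n - r)])
    return S, I

def log_post(n, r, beta, gam, y):
    """Log joint density of latent path, parameters and data (up to a constant)."""
    if beta <= 0 or gam <= 0:
        return -np.inf
    S, I = state_paths(n, r)
    if (S < 0).any() or (I < 0).any():
        return -np.inf
    p_inf = 1.0 - np.exp(-beta * I[:-1] / N)          # per-step infection prob.
    p_rem = 1.0 - np.exp(-gam)                        # per-step removal prob.
    lp = (binom.logpmf(n, S[:-1], p_inf).sum()        # latent infections
          + binom.logpmf(r, I[:-1], p_rem).sum()      # latent removals
          + binom.logpmf(y, I[1:], rho).sum()         # sampled prevalence counts
          + gamma_dist.logpdf(beta, 2, scale=1.0)     # weakly informative priors
          + gamma_dist.logpdf(gam, 2, scale=1.0))
    return lp if np.isfinite(lp) else -np.inf

# Simulate an epidemic and keep only the noisy prevalence counts y.
beta0, gam0, S, I = 0.8, 0.2, N - 1, 1
n_true, r_true = np.zeros(T, int), np.zeros(T, int)
for t in range(T):
    n_true[t] = rng.binomial(S, 1 - np.exp(-beta0 * I / N))
    r_true[t] = rng.binomial(I, 1 - np.exp(-gam0))
    S, I = S - n_true[t], I + n_true[t] - r_true[t]
y = rng.binomial(state_paths(n_true, r_true)[1][1:], rho)

# Initialise with a feasible latent path that covers the observed counts.
cum = np.maximum(np.maximum.accumulate(y) - 1, 0)
n_cur, r_cur = np.diff(cum, prepend=0), np.zeros(T, int)
beta, gam = 0.5, 0.5
lp = log_post(n_cur, r_cur, beta, gam, y)

# Data-augmentation MCMC: alternate Metropolis updates of the latent event
# counts at a random time point and of the rate parameters.
draws = []
for it in range(10000):
    t = rng.integers(T)
    n_prop, r_prop = n_cur.copy(), r_cur.copy()
    n_prop[t] += rng.integers(-2, 3)                  # symmetric integer random walk;
    r_prop[t] += rng.integers(-2, 3)                  # invalid paths get log-density -inf
    lp_prop = log_post(n_prop, r_prop, beta, gam, y)
    if np.log(rng.random()) < lp_prop - lp:
        n_cur, r_cur, lp = n_prop, r_prop, lp_prop
    beta_p, gam_p = beta + 0.05 * rng.normal(), gam + 0.05 * rng.normal()
    lp_prop = log_post(n_cur, r_cur, beta_p, gam_p, y)
    if np.log(rng.random()) < lp_prop - lp:
        beta, gam, lp = beta_p, gam_p, lp_prop
    if it >= 5000:
        draws.append((beta, gam))
print("posterior means (beta, gamma):", np.mean(draws, axis=0))
```

The feature shared with the paper's framework is that the latent epidemic history is updated conditional on the observed prevalence data, with parameters then updated given the completed data; the paper's contribution is a far more efficient, subject-level proposal for the latent histories in continuous time.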
Conjugate Bayes for probit regression via unified skew-normal distributions
Regression models for dichotomous data are ubiquitous in statistics. Besides
being useful for inference on binary responses, these methods serve also as
building blocks in more complex formulations, such as density regression,
nonparametric classification and graphical models. Within the Bayesian
framework, inference proceeds by updating the priors for the coefficients,
typically set to be Gaussians, with the likelihood induced by probit or logit
regressions for the responses. In this updating, the apparent absence of a
tractable posterior has motivated a variety of computational methods, including
Markov Chain Monte Carlo routines and algorithms which approximate the
posterior. Despite being routinely implemented, Markov Chain Monte Carlo
strategies face mixing or time-inefficiency issues in large p and small n
studies, whereas approximate routines fail to capture the skewness typically
observed in the posterior. This article proves that the posterior distribution
for the probit coefficients has a unified skew-normal kernel, under Gaussian
priors. Such a novel result allows efficient Bayesian inference for a wide
class of applications, especially in large p and small-to-moderate n studies
where state-of-the-art computational methods face notable issues. These
advances are outlined in a genetic study, and further motivate the development
of a wider class of conjugate priors for probit models along with methods to
obtain independent and identically distributed samples from the unified
skew-normal posterior.
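A minimal sketch of what i.i.d. sampling from such a unified skew-normal posterior can look like, assuming a probit likelihood with Gaussian prior beta ~ N(0, nu^2 I) and using the standard SUN additive representation (a Gaussian term plus a multivariate truncated-normal term); the toy data, the naive rejection step for the truncated normal (workable only for small n), and all settings are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch: independent draws from the unified skew-normal (SUN) posterior
# of probit coefficients under a Gaussian prior, via the SUN additive
# representation beta = xi + omega*(U0 + Delta Gamma^{-1} U1).  The naive
# rejection step for the n-variate truncated normal U1 is workable only for
# small n; the toy data and prior settings are illustrative assumptions.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, p, nu = 8, 3, 2.0                          # small n so rejection sampling suffices
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -0.5, 0.25])
y = rng.binomial(1, norm.cdf(X @ beta_true))  # probit-generated binary responses

xi, Omega = np.zeros(p), nu**2 * np.eye(p)    # Gaussian prior N(xi, Omega)
omega = np.sqrt(np.diag(Omega))
Omega_bar = Omega / np.outer(omega, omega)

# SUN posterior parameters induced by the probit likelihood.
D = (2 * y - 1)[:, None] * X                  # rows (2*y_i - 1) * x_i
s = np.sqrt(np.diag(D @ Omega @ D.T) + 1.0)
Gamma = (D @ Omega @ D.T + np.eye(n)) / np.outer(s, s)
gamma_ = (D @ xi) / s
Delta = Omega_bar @ (omega[:, None] * D.T) / s

DG = Delta @ np.linalg.inv(Gamma)
cov_U0 = Omega_bar - DG @ Delta.T
L = np.linalg.cholesky(Gamma)

def draw_truncated(L, gamma_, rng, block=1024):
    """Draw U1 ~ N_n(0, Gamma) conditioned on U1 > -gamma_ by block rejection."""
    dim = len(gamma_)
    while True:
        U = rng.standard_normal((block, dim)) @ L.T
        ok = np.all(U > -gamma_, axis=1)
        if ok.any():
            return U[np.argmax(ok)]

samples = np.empty((2000, p))
for i in range(2000):                         # i.i.d. posterior draws
    U0 = rng.multivariate_normal(np.zeros(p), cov_U0)
    U1 = draw_truncated(L, gamma_, rng)
    samples[i] = xi + omega * (U0 + DG @ U1)
print("posterior mean of beta:", samples.mean(axis=0))
```

The design point is that each draw is independent, so no mixing diagnostics are needed; the practical cost sits entirely in sampling the n-variate truncated normal, which is where specialised samplers matter for larger n.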
Discriminative Nonparametric Latent Feature Relational Models with Data Augmentation
We present a discriminative nonparametric latent feature relational model
(LFRM) for link prediction to automatically infer the dimensionality of latent
features. Under the generic RegBayes (regularized Bayesian inference)
framework, we handily incorporate the prediction loss with probabilistic
inference of a Bayesian model; set distinct regularization parameters for
different types of links to handle the imbalance issue in real networks; and
unify the analysis of both the smooth logistic log-loss and the piecewise
linear hinge loss. For the nonconjugate posterior inference, we present a
simple Gibbs sampler via data augmentation, without the restrictive
assumptions made in variational methods. We further develop an approximate
sampler using stochastic gradient Langevin dynamics to handle large networks
with hundreds of thousands of entities and millions of links, orders of
magnitude larger than what existing LFRM models can process. Extensive studies
on various real networks show promising performance.
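As an illustration of the stochastic gradient Langevin dynamics component only, shown on a toy logistic-likelihood model rather than the latent feature relational model itself, a minimal SGLD update might look as follows; the step size, minibatch size, prior variance, and synthetic data are assumptions made for the sketch.

```python
# Hedged sketch: the form of a stochastic gradient Langevin dynamics (SGLD)
# update, shown on a toy Bayesian logistic-likelihood model rather than the
# latent feature relational model itself.  Step size, minibatch size, prior
# variance and the synthetic data are illustrative assumptions; in practice
# the step size would be tuned or annealed.
import numpy as np

rng = np.random.default_rng(0)
N_obs, d = 100_000, 10                        # many observations, as in large networks
Z = rng.normal(size=(N_obs, d))               # toy pairwise features
w_true = rng.normal(size=d)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-Z @ w_true)))

def grad_log_lik(w, idx):
    """Gradient of the logistic log-likelihood over a minibatch of observations."""
    prob = 1.0 / (1.0 + np.exp(-Z[idx] @ w))
    return Z[idx].T @ (y[idx] - prob)

w, eps, batch, tau2 = np.zeros(d), 1e-5, 256, 1.0     # tau2: Gaussian prior variance
for t in range(3000):
    idx = rng.integers(N_obs, size=batch)
    grad = -w / tau2 + (N_obs / batch) * grad_log_lik(w, idx)   # unbiased posterior gradient
    w += 0.5 * eps * grad + np.sqrt(eps) * rng.normal(size=d)   # Langevin step, no MH correction
print("approximate posterior draw:", np.round(w, 3))
```

The appeal for large networks is that each update touches only a minibatch of observations while the injected Gaussian noise keeps the iterates sampling (approximately) from the posterior rather than collapsing to a point estimate.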