Bayesian Conditional Tensor Factorizations for High-Dimensional Classification
In many application areas, data are collected on a categorical response and
high-dimensional categorical predictors, with the goals being to build a
parsimonious model for classification while doing inferences on the important
predictors. In settings such as genomics, there can be complex interactions
among the predictors. By using a carefully structured Tucker factorization, we
define a model that can characterize any conditional probability, while
facilitating variable selection and modeling of higher-order interactions.
Following a Bayesian approach, we propose a Markov chain Monte Carlo algorithm
for posterior computation accommodating uncertainty in the predictors to be
included. Under near-sparsity assumptions, the posterior distribution for the
conditional probability is shown to achieve close to the parametric rate of
contraction even in ultra high-dimensional settings. The methods are
illustrated using simulation examples and biomedical applications.
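To make the structure concrete, the kind of factorization involved can be sketched as follows (the notation here is illustrative, not the paper's own): a conditional probability tensor for a response $Y$ given categorical predictors $X_1, \ldots, X_p$ admits a Tucker-style expansion

$$P(Y = y \mid X_1 = x_1, \ldots, X_p = x_p) = \sum_{h_1=1}^{k_1} \cdots \sum_{h_p=1}^{k_p} \lambda_{h_1 \cdots h_p}(y) \prod_{j=1}^{p} \psi^{(j)}_{h_j}(x_j),$$

where the core array $\lambda_{h_1 \cdots h_p}(\cdot)$ collects probability vectors over the response and $\psi^{(j)}_{h_j}(x_j)$ is the probability that level $x_j$ of predictor $j$ is allocated to latent level $h_j$. Setting $k_j = 1$ effectively drops $X_j$ from the model, which is what makes variable selection and higher-order interaction modeling natural in this parameterization.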
Scalable Bayesian Non-Negative Tensor Factorization for Massive Count Data
We present a Bayesian non-negative tensor factorization model for
count-valued tensor data, and develop scalable inference algorithms (both batch
and online) for dealing with massive tensors. Our generative model can handle
overdispersed counts as well as infer the rank of the decomposition. Moreover,
leveraging a reparameterization of the Poisson distribution as a multinomial
facilitates conjugacy in the model and enables simple and efficient Gibbs
sampling and variational Bayes (VB) inference updates, with a computational
cost that depends only on the number of nonzeros in the tensor. The model also
provides nice interpretability for the factors: each factor
corresponds to a "topic". We develop a set of online inference algorithms that
allow further scaling up the model to massive tensors, for which batch
inference methods may be infeasible. We apply our framework on diverse
real-world applications, such as \emph{multiway} topic modeling on a scientific
publications database, analyzing a political science data set, and analyzing a
massive household transactions data set.
Comment: ECML PKDD 201
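The Poisson-multinomial reparameterization invoked above is the standard identity: if $x_k \sim \mathrm{Poisson}(\lambda_k)$ independently for $k = 1, \ldots, K$, then $n = \sum_k x_k \sim \mathrm{Poisson}(\sum_k \lambda_k)$ and, conditionally on the total,

$$(x_1, \ldots, x_K) \mid n \;\sim\; \mathrm{Multinomial}\!\left(n;\ \frac{\lambda_1}{\sum_k \lambda_k}, \ldots, \frac{\lambda_K}{\sum_k \lambda_k}\right).$$

This is what allows latent counts to be allocated multinomially across factors within Gibbs or VB updates, with work proportional to the number of nonzero entries of the tensor.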
Bayesian uncertainty quantification in linear models for diffusion MRI
Diffusion MRI (dMRI) is a valuable tool in the assessment of tissue
microstructure. By fitting a model to the dMRI signal it is possible to derive
various quantitative features. Several of the most popular dMRI signal models
are expansions in an appropriately chosen basis, where the coefficients are
determined using some variation of least-squares. However, such approaches lack
any notion of uncertainty, which could be valuable in, for example, group analyses. In
this work, we use a probabilistic interpretation of linear least-squares
methods to recast popular dMRI models as Bayesian ones. This makes it possible
to quantify the uncertainty of any derived quantity. In particular, for
quantities that are affine functions of the coefficients, the posterior
distribution can be expressed in closed-form. We simulated measurements from
single- and double-tensor models where the correct values of several quantities
are known, to validate that the theoretically derived quantiles agree with
those observed empirically. We included results from residual bootstrap for
comparison and found good agreement. The validation employed several different
models: Diffusion Tensor Imaging (DTI), Mean Apparent Propagator MRI (MAP-MRI)
and Constrained Spherical Deconvolution (CSD). We also used in vivo data to
visualize maps of quantitative features and corresponding uncertainties, and to
show how our approach can be used in a group analysis to downweight subjects
with high uncertainty. In summary, we convert successful linear models for dMRI
signal estimation to probabilistic models, capable of accurate uncertainty
quantification.
Comment: Added results from a group analysis and a comparison with residual bootstrap
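For reference, the closed-form posterior behind the affine-quantity claim is the standard Gaussian linear-model result (notation assumed here, not taken from the paper): with $y = Xw + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$, and prior $w \sim \mathcal{N}(m_0, S_0)$,

$$w \mid y \sim \mathcal{N}(m_n, S_n), \qquad S_n = \left(S_0^{-1} + \sigma^{-2} X^\top X\right)^{-1}, \qquad m_n = S_n \left(S_0^{-1} m_0 + \sigma^{-2} X^\top y\right),$$

so any affine quantity $q = a + Bw$ has posterior $q \mid y \sim \mathcal{N}(a + B m_n, B S_n B^\top)$, and its quantiles follow directly.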
Just Another Gibbs Additive Modeller: Interfacing JAGS and mgcv
The BUGS language offers a very flexible way of specifying complex
statistical models for the purposes of Gibbs sampling, while its JAGS variant
offers very convenient R integration via the rjags package. However, including
smoothers in JAGS models can involve some quite tedious coding, especially for
multivariate or adaptive smoothers. Further, if an additive smooth structure is
required then some care is needed, in order to centre smooths appropriately,
and to find appropriate starting values. R package mgcv implements a wide range
of smoothers, all in a manner appropriate for inclusion in JAGS code, and
automates centring and other smooth setup tasks. The purpose of this note is to
describe an interface between mgcv and JAGS, based around an R function,
`jagam', which takes a generalized additive model (GAM) as specified in mgcv
and automatically generates the JAGS model code and data required for inference
about the model via Gibbs sampling. Although the auto-generated JAGS code can
be run as is, the expectation is that the user would wish to modify it in order
to add complex stochastic model components readily specified in JAGS. A simple
interface is also provided for visualisation and further inference about the
estimated smooth components using standard mgcv functionality. The methods
described here will be unnecessarily inefficient if all that is required is
fully Bayesian inference about a standard GAM, rather than the full flexibility
of JAGS. In that case the BayesX package would be more efficient.
Comment: Submitted to the Journal of Statistical Software
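A minimal usage sketch follows, adapted from the pattern in the jagam documentation; the simulated data come from mgcv's gamSim and the model-file name is arbitrary:

  library(mgcv)   ## provides jagam() and sim2jam()
  library(rjags)  ## R interface to JAGS

  dat <- gamSim(1, n = 200)                      ## simulated test data
  jd  <- jagam(y ~ s(x0) + s(x1), data = dat,
               file = "test.jags")               ## writes the JAGS model code
  ## (the user may edit test.jags here to add further stochastic components)
  jm  <- jags.model("test.jags", data = jd$jags.data,
                    inits = jd$jags.ini, n.chains = 1)
  sam <- jags.samples(jm, c("b", "rho"), n.iter = 10000, thin = 10)
  jam <- sim2jam(sam, jd$pregam)                 ## gam-like object from samples
  plot(jam)                                      ## visualise the estimated smooths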