Scalable Bayesian Non-Negative Tensor Factorization for Massive Count Data
We present a Bayesian non-negative tensor factorization model for
count-valued tensor data, and develop scalable inference algorithms (both batch
and online) for dealing with massive tensors. Our generative model can handle
overdispersed counts as well as infer the rank of the decomposition. Moreover,
leveraging a reparameterization of the Poisson distribution as a multinomial
facilitates conjugacy in the model and enables simple and efficient Gibbs
sampling and variational Bayes (VB) inference updates, with a computational
cost that depends only on the number of nonzeros in the tensor. The model also yields interpretable factors: in our model, each factor corresponds to a "topic". We develop a set of online inference algorithms that allow the model to scale further to massive tensors for which batch inference methods may be infeasible. We apply our framework to diverse real-world applications, such as \emph{multiway} topic modeling on a scientific publications database, analyzing a political science data set, and analyzing a massive household transactions data set.
Comment: ECML PKDD 201
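For reference, the reparameterization alluded to above is the standard Poisson-multinomial (latent count) augmentation. The sketch below uses generic CP-factor notation $u$, $v$, $w$ for a three-way count tensor; this notation is ours, not the paper's:

  y_{ijk} \sim \mathrm{Poisson}\Big(\sum_{r=1}^{R} u_{ir} v_{jr} w_{kr}\Big)
  \quad\Longleftrightarrow\quad
  y_{ijk} = \sum_{r=1}^{R} y_{ijkr}, \qquad y_{ijkr} \sim \mathrm{Poisson}(u_{ir} v_{jr} w_{kr}),

  (y_{ijk1}, \dots, y_{ijkR}) \mid y_{ijk} \sim \mathrm{Multinomial}\big(y_{ijk},\ \pi_{ijk}\big), \qquad
  \pi_{ijkr} = \frac{u_{ir} v_{jr} w_{kr}}{\sum_{r'=1}^{R} u_{ir'} v_{jr'} w_{kr'}}.

Because an observed zero forces all of its latent counts to zero, only nonzero entries require latent allocations, which is consistent with the claim that inference cost depends only on the number of nonzeros; with gamma priors on the factors, the allocated counts admit conjugate gamma updates.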
Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC
Despite having various attractive qualities such as high prediction accuracy
and the ability to quantify uncertainty and avoid over-fitting, Bayesian Matrix
Factorization has not been widely adopted because of the prohibitive cost of
inference. In this paper, we propose a scalable distributed Bayesian matrix
factorization algorithm using stochastic gradient MCMC. Our algorithm, based on
Distributed Stochastic Gradient Langevin Dynamics, not only matches the prediction accuracy of standard MCMC methods such as Gibbs sampling, but is also as fast and simple as stochastic gradient descent. In our experiments, we show that our algorithm achieves the same level of prediction accuracy as Gibbs sampling an order of magnitude faster. We also show that our method reduces the prediction error as fast as distributed stochastic gradient descent, achieving a 4.1% improvement in RMSE for the Netflix dataset and a 1.8% improvement for the Yahoo Music dataset.
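For intuition, here is a minimal single-machine sketch of the stochastic gradient Langevin dynamics (SGLD) update that this approach builds on, applied to Gaussian matrix factorization. The function name, data layout, and hyperparameters are illustrative assumptions, not the paper's; the distributed algorithm additionally partitions the rating matrix into blocks across workers.

import numpy as np

def sgld_mf_step(U, V, ratings, batch_idx, step_size, prior_prec=1.0, obs_prec=2.0, rng=None):
    """One SGLD step for Bayesian matrix factorization with a Gaussian likelihood.

    U: (num_users, K) user factors; V: (num_items, K) item factors.
    ratings: (N, 3) array of (user, item, value) triples; batch_idx indexes a minibatch.
    """
    rng = rng if rng is not None else np.random.default_rng()
    users = ratings[batch_idx, 0].astype(int)
    items = ratings[batch_idx, 1].astype(int)
    vals = ratings[batch_idx, 2]

    # Residuals on the minibatch: r_ui - u_u^T v_i.
    err = vals - np.sum(U[users] * V[items], axis=1)

    # Minibatch gradients of the log-likelihood, accumulated per factor row.
    grad_U = np.zeros_like(U)
    grad_V = np.zeros_like(V)
    np.add.at(grad_U, users, obs_prec * err[:, None] * V[items])
    np.add.at(grad_V, items, obs_prec * err[:, None] * U[users])

    # Rescale so the minibatch gradient is an unbiased estimate of the full-data gradient.
    scale = len(ratings) / len(batch_idx)

    # SGLD: half-step along grad(log prior + rescaled log likelihood) plus N(0, step_size) noise.
    for P, grad in ((U, grad_U), (V, grad_V)):
        P += 0.5 * step_size * (-prior_prec * P + scale * grad)
        P += rng.normal(scale=np.sqrt(step_size), size=P.shape)
    return U, V

A typical run would draw random minibatches, decay step_size over iterations, and average later iterates of U and V as posterior samples; in a distributed setting, one natural extension is to assign disjoint blocks of users and items to different workers so their updates do not conflict.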