A Collaborative Kalman Filter for Time-Evolving Dyadic Processes
We present the collaborative Kalman filter (CKF), a dynamic model for
collaborative filtering and related factorization models. Using the matrix
factorization approach to collaborative filtering, the CKF accounts for time
evolution by modeling each low-dimensional latent embedding as a
multidimensional Brownian motion. Each observation is a random variable whose
distribution is parameterized by the dot product of the relevant Brownian
motions at that moment in time. This is naturally interpreted as a Kalman
filter with multiple interacting state space vectors. We also present a method
for learning a dynamically evolving drift parameter for each location by
modeling it as a geometric Brownian motion. We handle posterior intractability
via a mean-field variational approximation, which also preserves tractability
for downstream calculations in a manner similar to the Kalman filter. We
evaluate the model on several large datasets, providing quantitative evaluation
on the 10 million Movielens and 100 million Netflix datasets and qualitative
evaluation on a set of 39 million stock returns divided across roughly 6,500
companies from the years 1962-2014.
Comment: Appeared at the 2014 IEEE International Conference on Data Mining (ICDM).
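The core idea above, each latent factor drifting as a Brownian motion between Kalman-style updates on dot-product observations, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the drift and noise scales, and treating the item factor as fixed so the update is linear, are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5               # latent dimension (assumed)
sigma_drift = 0.1   # Brownian-motion drift scale (assumed)
sigma_obs = 0.5     # observation noise scale (assumed)

# Latent embedding for one user: posterior mean and covariance.
u_mean, u_cov = np.zeros(d), np.eye(d)
v = rng.normal(size=d)  # item factor, held fixed for this sketch

def drift(mean, cov, dt):
    """Brownian-motion transition: the mean is unchanged, uncertainty grows."""
    return mean, cov + sigma_drift**2 * dt * np.eye(d)

def kalman_update(mean, cov, v, y):
    """Condition the user factor on one rating y ~ N(u . v, sigma_obs^2).

    With the item factor v fixed, the observation is linear in u,
    so the standard Kalman filter update applies.
    """
    S = v @ cov @ v + sigma_obs**2   # innovation variance
    K = cov @ v / S                  # Kalman gain
    mean = mean + K * (y - mean @ v)
    cov = cov - np.outer(K, v @ cov)
    return mean, cov

# One drift step in time, then one observed rating.
u_mean, u_cov = drift(u_mean, u_cov, dt=1.0)
u_mean, u_cov = kalman_update(u_mean, u_cov, v, y=4.0)
```

In the full CKF both factors are uncertain, so the observation is bilinear in the states; the paper handles the resulting intractability with a mean-field variational approximation rather than this exact linear update.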
The Discrete Infinite Logistic Normal Distribution
We present the discrete infinite logistic normal distribution (DILN), a
Bayesian nonparametric prior for mixed membership models. DILN is a
generalization of the hierarchical Dirichlet process (HDP) that models
correlation structure between the weights of the atoms at the group level. We
derive a representation of DILN as a normalized collection of gamma-distributed
random variables, and study its statistical properties. We consider
applications to topic modeling and derive a variational inference algorithm for
approximate posterior inference. We study the empirical performance of the DILN
topic model on four corpora, comparing performance with the HDP and the
correlated topic model (CTM). To deal with large-scale data sets, we also
develop an online inference algorithm for DILN and compare with online HDP and
online LDA on the Nature magazine corpus, which contains approximately 350,000
articles.
Comment: This paper will appear in Bayesian Analysis. A shorter version of
this paper appeared at AISTATS 2011, Fort Lauderdale, FL, USA.
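The gamma representation mentioned above can be sketched directly: a group's mixing weights are gamma random variables (with shapes tied to top-level HDP-style weights) scaled by a correlated log-normal term and normalized. This is a simplified sketch under assumed choices (finite truncation, a squared-exponential kernel over synthetic atom locations), not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 8        # truncation level for the atoms (assumed)
alpha = 5.0  # concentration parameter (assumed)

# Top-level stick-breaking weights, as in the HDP.
sticks = rng.beta(1.0, alpha, size=K)
beta = sticks * np.concatenate(([1.0], np.cumprod(1 - sticks)[:-1]))

# Atom "locations" induce a covariance; nearby atoms get correlated weights.
loc = rng.normal(size=(K, 2))
dist2 = ((loc[:, None, :] - loc[None, :, :]) ** 2).sum(-1)
Sigma = np.exp(-dist2)  # squared-exponential kernel (assumed)

def diln_group_weights():
    """One group's mixing weights: gamma variables scaled by a
    correlated log-normal factor, then normalized to sum to one."""
    w = rng.gamma(shape=alpha * beta + 1e-8)           # gamma representation
    m = rng.multivariate_normal(np.zeros(K), Sigma)    # correlated Gaussian
    z = w * np.exp(m)
    return z / z.sum()

p = diln_group_weights()
```

Setting the Gaussian covariance to zero recovers (a truncated version of) the HDP's uncorrelated group-level weights, which is the sense in which DILN generalizes the HDP.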
Bayesian Conditional Density Filtering
We propose a Conditional Density Filtering (C-DF) algorithm for efficient
online Bayesian inference. C-DF adapts MCMC sampling to the online setting,
sampling from approximations to conditional posterior distributions obtained by
propagating surrogate conditional sufficient statistics (a function of data and
parameter estimates) as new data arrive. These quantities eliminate the need to
store or process the entire dataset simultaneously and offer a number of
desirable features. Often, these include a reduction in memory requirements and
runtime and improved mixing, along with state-of-the-art parameter inference
and prediction. These improvements are demonstrated through several
illustrative examples including an application to high dimensional compressed
regression. Finally, we show that C-DF samples converge to the target posterior
distribution asymptotically as sampling proceeds and more data arrives.
Comment: 41 pages, 7 figures, 12 tables.
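The idea of propagating sufficient statistics instead of raw data can be illustrated with the simplest conjugate case: a normal mean with known variance, where the conditional posterior depends on the stream only through the running count and sum. This is a toy sketch of the C-DF pattern, not the paper's algorithm; the model, prior, and synthetic stream are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Model: y ~ N(mu, 1) with conjugate prior mu ~ N(0, 10).
# The conditional posterior of mu depends on the data only through
# the sufficient statistics (n, sum_y), which we propagate online.
prior_mean, prior_var = 0.0, 10.0
n, sum_y = 0, 0.0
samples = []

for t in range(20):                        # batches arriving over time
    batch = rng.normal(loc=2.0, size=50)   # synthetic stream (assumed)
    n += batch.size                        # update surrogate sufficient
    sum_y += batch.sum()                   # statistics; raw data is discarded
    post_var = 1.0 / (1.0 / prior_var + n)
    post_mean = post_var * (prior_mean / prior_var + sum_y)
    # Draw from the conditional posterior given only the statistics.
    samples.extend(rng.normal(post_mean, np.sqrt(post_var), size=10))

estimate = np.mean(samples[-50:])  # concentrates near the true mean of 2.0
```

In C-DF proper the statistics are *surrogate* quantities (functions of data and current parameter estimates) propagated across the conditionals of a general Gibbs sampler, but the memory benefit is the same: nothing beyond the statistics needs to be stored.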