Smoothed Gradients for Stochastic Variational Inference
Stochastic variational inference (SVI) lets us scale up Bayesian computation
to massive data. It uses stochastic optimization to fit a variational
distribution, following easy-to-compute noisy natural gradients. As with most
traditional stochastic optimization methods, SVI takes precautions to use
unbiased stochastic gradients whose expectations are equal to the true
gradients. In this paper, we explore the idea of following biased stochastic
gradients in SVI. Our method replaces the natural gradient with a similarly
constructed vector that uses a fixed-window moving average of some of its
previous terms. We demonstrate several advantages of this technique.
First, its computational cost is the same as for SVI and storage requirements
only multiply by a constant factor. Second, it enjoys significant variance
reduction over the unbiased estimates, smaller bias than averaged gradients,
and leads to smaller mean-squared error against the full gradient. We test our
method on latent Dirichlet allocation with three large corpora.
Comment: Appears in Neural Information Processing Systems, 2014.
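The following is a minimal Python sketch of the fixed-window smoothing idea described in this abstract, for a generic conjugate model with a global variational parameter. The step-size schedule, the constants, and the `minibatch_suff_stats` stand-in are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from collections import deque

def minibatch_suff_stats(batch, lam):
    # Toy stand-in: for a Dirichlet-multinomial model, the sufficient
    # statistics are simply the summed counts in the minibatch.
    return batch.sum(axis=0)

def svi_smoothed(lam, data, eta, N, window=10, n_steps=1000, batch_size=100):
    """SVI where per-minibatch sufficient statistics are averaged over a
    fixed window of the last `window` minibatches, giving a biased but
    lower-variance natural gradient (hypothetical sketch)."""
    rng = np.random.default_rng(0)
    history = deque(maxlen=window)            # fixed-window buffer
    for t in range(n_steps):
        idx = rng.choice(len(data), size=batch_size, replace=False)
        # Sufficient statistics scaled up to the full data set size N.
        s_hat = (N / batch_size) * minibatch_suff_stats(data[idx], lam)
        history.append(s_hat)
        s_bar = np.mean(history, axis=0)      # moving average over the window
        nat_grad = eta + s_bar - lam          # smoothed natural gradient
        rho = (t + 1.0) ** -0.7               # Robbins-Monro step size
        lam = lam + rho * nat_grad
    return lam
```

Storage grows only by the window length (the buffer of `window` statistics vectors), consistent with the constant-factor overhead the abstract notes.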
Consistency of adjacency spectral embedding for the mixed membership stochastic blockmodel
The mixed membership stochastic blockmodel is a statistical model for a
graph, which extends the stochastic blockmodel by allowing every node to
randomly choose a different community each time a decision of whether to form
an edge is made. Whereas spectral analysis for the stochastic blockmodel is
increasingly well established, theory for the mixed membership case is
considerably less developed. Here we show that adjacency spectral embedding
into $\mathbb{R}^k$, followed by fitting the minimum volume enclosing convex
$k$-polytope to the principal components, leads to a consistent estimate
of a $k$-community mixed membership stochastic blockmodel. The key is to
identify a direct correspondence between the mixed membership stochastic
blockmodel and the random dot product graph, which greatly facilitates
theoretical analysis. Specifically, a $2 \to \infty$ norm bound and a central
limit theorem for the random dot product graph are exploited to respectively
show consistency and partially correct the bias of the procedure.
Comment: 12 pages, 6 figures.
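The following is a minimal Python sketch of the first step of this procedure, adjacency spectral embedding into $\mathbb{R}^k$; the subsequent minimum-volume $k$-polytope fit is a separate optimization and is only noted in a comment. Function and variable names are illustrative.

```python
import numpy as np

def adjacency_spectral_embedding(A, k):
    """Embed each node of the graph with symmetric adjacency matrix A as
    a row of U_k |S_k|^{1/2}, where A = U S U^T is the eigendecomposition
    and the k eigenvalues of largest magnitude are retained."""
    vals, vecs = np.linalg.eigh(A)               # symmetric eigendecomposition
    top = np.argsort(np.abs(vals))[::-1][:k]     # k largest-|eigenvalue| indices
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

# Usage: X = adjacency_spectral_embedding(A, k) yields an n-by-k point
# cloud; fitting the minimum volume enclosing convex k-polytope to X then
# estimates the k community memberships.
```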
Scalable Recommendation with Poisson Factorization
We develop a Bayesian Poisson matrix factorization model for forming
recommendations from sparse user behavior data. These data are large user/item
matrices where each user has provided feedback on only a small subset of items,
either explicitly (e.g., through star ratings) or implicitly (e.g., through
views or purchases). In contrast to traditional matrix factorization
approaches, Poisson factorization implicitly models each user's limited
attention to consume items. Moreover, because of the mathematical form of the
Poisson likelihood, the model needs only to explicitly consider the observed
entries in the matrix, leading to both scalable computation and good predictive
performance. We develop a variational inference algorithm for approximate
posterior inference that scales up to massive data sets. This is an efficient
algorithm that iterates over the observed entries and adjusts an approximate
posterior over the user/item representations. We apply our method to large
real-world user data containing users rating movies, users listening to songs,
and users reading scientific papers. In all these settings, Bayesian Poisson
factorization outperforms state-of-the-art matrix factorization methods.
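The following is a minimal Python sketch of why the Poisson likelihood needs only the observed (nonzero) entries, which this abstract identifies as the source of scalability: the rate term over all user/item pairs factorizes into a product of column sums. The function name and array shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.special import gammaln

def poisson_factorization_loglik(rows, cols, vals, theta, beta):
    """Log-likelihood of sparse counts y_ui ~ Poisson(theta_u . beta_i).
    rows, cols, vals encode the nonzero entries; theta is (n_users, K)
    and beta is (n_items, K), both elementwise nonnegative."""
    # Poisson rates only at the observed (nonzero) entries.
    rates = np.einsum('nk,nk->n', theta[rows], beta[cols])
    data_term = np.sum(vals * np.log(rates) - gammaln(vals + 1.0))
    # Total rate over ALL pairs, without forming the dense matrix:
    # sum_{u,i} theta_u . beta_i = theta.sum(0) . beta.sum(0).
    mass_term = theta.sum(axis=0) @ beta.sum(axis=0)
    return data_term - mass_term
```

Because `mass_term` costs O((n_users + n_items) * K) rather than O(n_users * n_items * K), each evaluation scales with the number of observed entries, which is what allows the inference algorithm to iterate only over the nonzeros.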