A new SVD approach to optimal topic estimation
In probabilistic topic models, the quantity of interest---a low-rank
matrix consisting of topic vectors---is hidden in the text corpus matrix,
masked by noise, and Singular Value Decomposition (SVD) is a potentially useful
tool for learning such a matrix. However, different rows and columns of the
matrix are usually on very different scales, and the connection between this
matrix and the singular vectors of the text corpus matrix is usually
complicated and hard to spell out, so using SVD to learn topic models
faces challenges.
We overcome the challenges by introducing a proper Pre-SVD normalization of
the text corpus matrix and a proper column-wise scaling for the matrix of
interest, and by revealing a surprising Post-SVD low-dimensional {\it simplex}
structure. The simplex structure, together with the Pre-SVD normalization and
column-wise scaling, allows us to conveniently reconstruct the matrix of
interest, and motivates a new SVD-based approach to learning topic models.
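To make the general shape of such a pipeline concrete, here is a minimal, illustrative Python sketch of an SVD-based estimator: normalize the corpus matrix, take a rank-K SVD, hunt for simplex vertices among entry-wise ratios of the singular vectors, and reconstruct the topic matrix. The particular normalization, the greedy vertex-hunting step, and all names below are placeholders of our own, not the algorithm proposed in the paper.

import numpy as np

def svd_topic_sketch(D, K):
    # Illustrative sketch only: estimate K topic vectors from a p-by-n
    # word-document frequency matrix D. The normalization and the greedy
    # vertex-hunting step below are placeholders, not the paper's method.
    p, n = D.shape
    # Pre-SVD normalization (placeholder): put rows on comparable scales
    # by dividing by square roots of overall word frequencies.
    row_freq = np.maximum(D.sum(axis=1) / D.sum(), 1e-12)
    M = D / np.sqrt(row_freq)[:, None]
    # Rank-K SVD of the normalized corpus matrix.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    U = U[:, :K]
    # Post-SVD: entry-wise ratios of singular vectors map each word to a
    # point that, in the simplex view, lies in a (K-1)-dimensional simplex.
    R = U[:, 1:] / U[:, :1]
    # Greedy vertex hunting (placeholder): pick K mutually far-apart rows
    # as stand-ins for the estimated simplex vertices.
    idx = [int(np.argmax(np.linalg.norm(R, axis=1)))]
    while len(idx) < K:
        dists = np.min([np.linalg.norm(R - R[i], axis=1) for i in idx], axis=0)
        idx.append(int(np.argmax(dists)))
    V = R[idx]
    # Express each word as a (roughly) convex combination of the vertices,
    # then undo the row scaling and renormalize columns to get topic vectors.
    A = np.hstack([np.ones((K, 1)), V])
    B = np.hstack([np.ones((p, 1)), R])
    W = np.clip(np.linalg.lstsq(A.T, B.T, rcond=None)[0].T, 0, None)
    A_hat = W * np.sqrt(row_freq)[:, None]
    return A_hat / np.maximum(A_hat.sum(axis=0, keepdims=True), 1e-12)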
We show that under the popular probabilistic topic model \citep{hofmann1999},
our method has a faster rate of convergence than existing methods in a wide
variety of cases. In particular, for cases where documents are long or the
number of documents is much larger than the vocabulary size, our method
achieves the optimal rate. At the heart of the proofs is a tight element-wise
bound on the singular vectors of a multinomially distributed data matrix, which
does not exist in the literature and which we have to derive ourselves.
We have applied our method to two data sets, Associated Press (AP) and
Statistics Literature Abstract (SLA), with encouraging results. In particular,
there is a clear simplex structure associated with the SVD of the data
matrices, which largely validates our discovery.

Comment: 73 pages, 8 figures, 6 tables; considered two different VH algorithms,
OVH and GVH, and provided theoretical analysis for each algorithm;
re-organized the upper bound theory part; added a subsection comparing the error
rate with other existing methods; provided another improved version of the error
analysis through a Bernstein inequality for martingales
On some provably correct cases of variational inference for topic models
Variational inference is a very efficient and popular heuristic used in
various forms in the context of latent variable models. It is closely related
to Expectation Maximization (EM), and is applied when exact EM is
computationally infeasible. Despite being immensely popular, current
theoretical understanding of the effectiveness of variational inference based
algorithms is very limited.
In this work we provide the first analysis of instances where variational
inference algorithms converge to the global optimum, in the setting of topic
models.
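For context, the mean-field updates for LDA in the standard form of (Blei et al., 2003) are the flavor of variational inference most implementations run. The following is a minimal sketch of those classical per-document updates, not the particular variant analyzed in this paper.

import numpy as np
from scipy.special import digamma

def vi_doc_update(word_ids, beta, alpha, n_iters=50):
    # Mean-field updates for a single document under LDA, in the standard
    # form of Blei et al. (2003). word_ids: length-N array of word indices;
    # beta: K x V topic-word matrix (rows sum to 1); alpha: length-K prior.
    K = beta.shape[0]
    N = len(word_ids)
    gamma = alpha + N / K                 # variational Dirichlet parameters
    phi = np.full((N, K), 1.0 / K)        # per-word topic responsibilities
    for _ in range(n_iters):
        # Multiplicative phi update: topic-word probability times
        # exp(digamma(gamma)), renormalized per word.
        phi = beta[:, word_ids].T * np.exp(digamma(gamma))
        phi /= phi.sum(axis=1, keepdims=True)
        # gamma update: prior plus expected topic counts in the document.
        gamma = alpha + phi.sum(axis=0)
    return gamma, phi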
More specifically, we show that variational inference provably learns the
optimal parameters of a topic model under natural assumptions on the topic-word
matrix and the topic priors. The properties that the topic-word matrix must
satisfy in our setting are related to the topic expansion assumption introduced
in (Anandkumar et al., 2013), as well as the anchor words assumption in (Arora
et al., 2012c). The assumptions on the topic priors are related to the
well-known Dirichlet prior, introduced to the area of topic modeling by (Blei et
al., 2003).
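As a toy illustration of the anchor-words side of these assumptions (our own example, not taken from the paper): a topic-word matrix has anchor words if every topic owns at least one word that occurs with positive probability only under that topic.

import numpy as np

# Each row is one topic's distribution over a 5-word vocabulary (rows sum
# to 1). Words 0 and 1 are anchor words: each has positive probability
# under exactly one topic, the separability that anchor-word methods rely on.
beta = np.array([
    [0.4, 0.0, 0.3, 0.2, 0.1],   # topic 1: word 0 is its anchor
    [0.0, 0.5, 0.2, 0.2, 0.1],   # topic 2: word 1 is its anchor
])
anchors = np.where((beta > 0).sum(axis=0) == 1)[0]
print(anchors)   # -> [0 1]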
It is well known that initialization plays a crucial role in how well
variational-inference-based algorithms perform in practice. The initializations
that we use are fairly natural. One of them is similar to what is currently
used in LDA-c, the most popular implementation of variational inference for
topic models. The other is an overlapping clustering algorithm, inspired by
work of (Arora et al., 2014) on dictionary learning, which is very simple and
efficient.
While our primary goal is to provide insights into when variational inference
might work in practice, the multiplicative, rather than additive, nature of
the variational inference updates forces us to use fairly non-standard proof
arguments, which we believe will be of general interest.

Comment: 46 pages; compared to the previous version: clarified notation, a number
of typos fixed throughout the paper