A Practical Algorithm for Topic Modeling with Provable Guarantees
Topic models provide a useful method for dimensionality reduction and
exploratory data analysis in large text corpora. Most approaches to topic model
inference have been based on a maximum likelihood objective. Efficient
algorithms exist that approximate this objective, but they have no provable
guarantees. Recently, algorithms have been introduced that provide provable
bounds, but these algorithms are not practical because they are inefficient and
not robust to violations of model assumptions. In this paper we present an
algorithm for topic model inference that is both provable and practical. The
algorithm produces results comparable to the best MCMC implementations while
running orders of magnitude faster. Comment: 26 pages
Necessary and Sufficient Conditions for Novel Word Detection in Separable Topic Models
The simplicial condition and other stronger conditions that imply it have
recently played a central role in developing polynomial time algorithms with
provable asymptotic consistency and sample complexity guarantees for topic
estimation in separable topic models. Of these algorithms, those that rely
solely on the simplicial condition are impractical while the practical ones
need stronger conditions. In this paper, we demonstrate, for the first time,
that the simplicial condition is a fundamental, algorithm-independent,
information-theoretic necessary condition for consistent separable topic
estimation. Furthermore, under solely the simplicial condition, we present a
practical quadratic-complexity algorithm based on random projections which
consistently detects all novel words of all topics using only up to
second-order empirical word moments. This algorithm is amenable to distributed
implementation, making it attractive for 'big-data' scenarios involving a
network of large distributed databases.
Learning loopy graphical models with latent variables: Efficient methods and guarantees
The problem of structure estimation in graphical models with latent variables
is considered. We characterize conditions for tractable graph estimation and
develop efficient methods with provable guarantees. We consider models where
the underlying Markov graph is locally tree-like, and the model is in the
regime of correlation decay. For the special case of the Ising model, the
number of samples required for structural consistency of our method scales
as $\Omega(\theta_{\min}^{-\delta\eta(\eta+1)-2}\log p)$, where $p$ is the
number of variables, $\theta_{\min}$ is the minimum edge potential, $\delta$ is
the depth (i.e., distance from a hidden node to the nearest observed nodes),
and $\eta$ is a parameter which depends on the bounds on node and edge
potentials in the Ising model. Necessary conditions for structural consistency
under any algorithm are derived and our method nearly matches the lower bound
on sample requirements. Further, the proposed method is practical to implement
and provides flexibility to control the number of latent variables and the
cycle lengths in the output graph. Comment: Published in the Annals of
Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/12-AOS1070
by the Institute of Mathematical Statistics (http://www.imstat.org)