4,275 research outputs found
A Spectral Algorithm for Latent Dirichlet Allocation
The problem of topic modeling can be seen as a generalization of the
clustering problem, in that it posits that observations are generated due to
multiple latent factors (e.g., the words in each document are generated as a
mixture of several active topics, as opposed to just one). This increased
representational power comes at the cost of a more challenging unsupervised
learning problem of estimating the topic probability vectors (the distributions
over words for each topic), when only the words are observed and the
corresponding topics are hidden.
We provide a simple and efficient learning procedure that is guaranteed to
recover the parameters for a wide class of mixture models, including the
popular latent Dirichlet allocation (LDA) model. For LDA, the procedure
correctly recovers both the topic probability vectors and the prior over the
topics, using only trigram statistics (i.e., third order moments, which may be
estimated with documents containing just three words). The method, termed
Excess Correlation Analysis (ECA), is based on a spectral decomposition of low
order moments (third and fourth order) via two singular value decompositions
(SVDs). Moreover, the algorithm is scalable since the SVD operations are
carried out on matrices, where is the number of latent factors
(e.g. the number of topics), rather than in the -dimensional observed space
(typically ).Comment: Changed title to match conference version, which appears in Advances
in Neural Information Processing Systems 25, 201
A Review of Codebook Models in Patch-Based Visual Object Recognition
The codebook model-based approach, while ignoring any structural aspect in vision, nonetheless provides state-of-the-art performances on current datasets. The key role of a visual codebook is to provide a way to map the low-level features into a fixed-length vector in histogram space to which standard classifiers can be directly applied. The discriminative power of such a visual codebook determines the quality of the codebook model, whereas the size of the codebook controls the complexity of the model. Thus, the construction of a codebook is an important step which is usually done by cluster analysis. However, clustering is a process that retains regions of high density in a distribution and it follows that the resulting codebook need not have discriminant properties. This is also recognised as a computational bottleneck of such systems. In our recent work, we proposed a resource-allocating codebook, to constructing a discriminant codebook in a one-pass design procedure that slightly outperforms more traditional approaches at drastically reduced computing times. In this review we survey several approaches that have been proposed over the last decade with their use of feature detectors, descriptors, codebook construction schemes, choice of classifiers in recognising objects, and datasets that were used in evaluating the proposed methods
Bayesian nonparametric Plackett-Luce models for the analysis of preferences for college degree programmes
In this paper we propose a Bayesian nonparametric model for clustering
partial ranking data. We start by developing a Bayesian nonparametric extension
of the popular Plackett-Luce choice model that can handle an infinite number of
choice items. Our framework is based on the theory of random atomic measures,
with the prior specified by a completely random measure. We characterise the
posterior distribution given data, and derive a simple and effective Gibbs
sampler for posterior simulation. We then develop a Dirichlet process mixture
extension of our model and apply it to investigate the clustering of
preferences for college degree programmes amongst Irish secondary school
graduates. The existence of clusters of applicants who have similar preferences
for degree programmes is established and we determine that subject matter and
geographical location of the third level institution characterise these
clusters.Comment: Published in at http://dx.doi.org/10.1214/14-AOAS717 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
- …