A Supervised Neural Autoregressive Topic Model for Simultaneous Image Classification and Annotation
Topic modeling based on latent Dirichlet allocation (LDA) has been a
framework of choice to perform scene recognition and annotation. Recently, a
new type of topic model called the Document Neural Autoregressive Distribution
Estimator (DocNADE) was proposed and demonstrated state-of-the-art performance
for document modeling. In this work, we show how to successfully apply and
extend this model to the context of visual scene modeling. Specifically, we
propose SupDocNADE, a supervised extension of DocNADE, that increases the
discriminative power of the hidden topic features by incorporating label
information into the training objective of the model. We also describe how to
leverage information about the spatial position of the visual words and how to
embed additional image annotations, so as to simultaneously perform image
classification and annotation. We test our model on the Scene15, LabelMe and
UIUC-Sports datasets and show that it compares favorably to other topic models
such as the supervised variant of LDA.
Comment: 13 pages, 5 figures
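The abstract above does not spell out the model equations, so the following is only a minimal sketch of the idea it describes: an autoregressive document model whose hidden state for each word depends on the preceding words only, plus a label term added to the training objective. The function names, the full softmax over the vocabulary, the sigmoid activation, and the weight `lam` on the generative term are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array."""
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())


def docnade_log_likelihood(words, W, V, b, c):
    """Autoregressive log-likelihood sum_i log p(v_i | v_<i) of one document.

    words: sequence of word indices; W: (hidden, vocab); V: (vocab, hidden);
    b: (vocab,); c: (hidden,). All parameter names are hypothetical.
    Returns the log-likelihood and the hidden vector computed from all words.
    """
    acc = c.astype(float)  # running pre-activation: c + sum of W[:, v_k] for k < i
    total = 0.0
    for v in words:
        h = 1.0 / (1.0 + np.exp(-acc))      # hidden state from preceding words only
        total += log_softmax(b + V @ h)[v]  # log p(v_i | v_<i), full softmax (assumed)
        acc = acc + W[:, v]                 # word i now becomes part of the context
    return total, 1.0 / (1.0 + np.exp(-acc))


def hybrid_loss(words, label, params, lam=1.0):
    """Assumed supervised objective: -log p(y | v) - lam * log p(v)."""
    W, V, b, c, U, d = params
    log_pv, h = docnade_log_likelihood(words, W, V, b, c)
    log_py = log_softmax(d + U @ h)[label]  # class prediction from the document vector
    return -log_py - lam * log_pv
```

In this sketch, `lam` trades off how strongly the hidden features are shaped by the label versus by the words themselves; a document here would be the bag of visual words (and, optionally, annotation words) extracted from an image.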
A Deep and Autoregressive Approach for Topic Modeling of Multimodal Data
Topic modeling based on latent Dirichlet allocation (LDA) has been a
framework of choice to deal with multimodal data, such as in image annotation
tasks. Another popular approach to modeling multimodal data is through deep
neural networks, such as the deep Boltzmann machine (DBM). Recently, a new type
of topic model called the Document Neural Autoregressive Distribution Estimator
(DocNADE) was proposed and demonstrated state-of-the-art performance for text
document modeling. In this work, we show how to successfully apply and extend
this model to multimodal data, such as simultaneous image classification and
annotation. First, we propose SupDocNADE, a supervised extension of DocNADE,
that increases the discriminative power of the learned hidden topic features,
and show how to employ it to learn a joint representation from image visual
words, annotation words and class label information. We test our model on the
LabelMe and UIUC-Sports data sets and show that it compares favorably to other
topic models. Second, we propose a deep extension of our model and provide an
efficient way of training the deep model. Experimental results show that our
deep model outperforms its shallow version and reaches state-of-the-art
performance on the Multimedia Information Retrieval (MIR) Flickr data set.
Comment: 24 pages, 10 figures. A version was accepted by TPAMI on Aug 4th,
2015. Added a footnote in Section 5.1 about how to train the model in
practice. arXiv admin note: substantial text overlap with arXiv:1305.530
Gibbs Max-margin Topic Models with Data Augmentation
Max-margin learning is a powerful approach to building classifiers and
structured output predictors. Recent work on max-margin supervised topic models
has successfully integrated it with Bayesian topic models to discover
discriminative latent semantic structures and make accurate predictions for
unseen testing data. However, the resulting learning problems are usually hard
to solve because of the non-smoothness of the margin loss. Existing approaches
to building max-margin supervised topic models rely on an iterative procedure
to solve multiple latent SVM subproblems with additional mean-field assumptions
on the desired posterior distributions. This paper presents an alternative
approach by defining a new max-margin loss. Namely, we present Gibbs max-margin
supervised topic models, a latent variable Gibbs classifier to discover hidden
topic representations for various tasks, including classification, regression
and multi-task learning. Gibbs max-margin supervised topic models minimize an
expected margin loss, which is an upper bound of the existing margin loss
derived from an expected prediction rule. By introducing augmented variables
and integrating out the Dirichlet variables analytically by conjugacy, we
develop simple Gibbs sampling algorithms with no restricting assumptions and no
need to solve SVM subproblems. Furthermore, each step of the
"augment-and-collapse" Gibbs sampling algorithms has an analytical conditional
distribution, from which samples can be easily drawn. Experimental results
demonstrate significant improvements in time efficiency. The classification
performance is also significantly improved over competitors on binary,
multi-class and multi-label classification tasks.
Comment: 35 pages
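The upper-bound claim in this abstract can be illustrated with a short derivation. The notation below is assumed for illustration and is not taken from the paper: q is a posterior over latent variables (weights η and topic assignments z), f(η, z) is a discriminant function, y ∈ {−1, +1} is a binary label, and ℓ is a margin parameter. Because the hinge function is convex, Jensen's inequality gives:

```latex
% Margin loss of the expected prediction rule (left-hand side)
% versus the expected margin loss of a Gibbs classifier (right-hand side).
\[
  \max\bigl(0,\; \ell - y\,\mathbb{E}_{q}[f(\eta, z)]\bigr)
  \;\le\;
  \mathbb{E}_{q}\bigl[\max\bigl(0,\; \ell - y\,f(\eta, z)\bigr)\bigr]
\]
% The inequality holds because u \mapsto \max(0, \ell - y u) is convex in u,
% so applying it to the mean is no larger than averaging it over q.
```

Under these assumptions, minimizing the right-hand side also controls the left-hand side, which is consistent with the abstract's statement that the expected margin loss upper-bounds the existing margin loss.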