5,987 research outputs found
Neural Models for Documents with Metadata
Most real-world document collections involve various types of metadata, such
as author, source, and date, and yet the most commonly-used approaches to
modeling text corpora ignore this information. While specialized models have
been developed for particular applications, few are widely used in practice, as
customization typically requires derivation of a custom inference algorithm. In
this paper, we build on recent advances in variational inference methods and
propose a general neural framework, based on topic models, to enable flexible
incorporation of metadata and allow for rapid exploration of alternative
models. Our approach achieves strong performance, with a manageable tradeoff
between perplexity, coherence, and sparsity. Finally, we demonstrate the
potential of our framework through an exploration of a corpus of articles about
US immigration.
Comment: 13 pages, 3 figures, 6 tables; updating to version published at ACL
2018
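To make the general idea concrete, here is a minimal sketch (our own
illustration under assumed layer sizes and names, not the paper's exact
architecture) of a VAE-style topic model whose encoder sees document
covariates and whose decoder adds a covariate-specific deviation to the
topic-word logits:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetadataTopicModel(nn.Module):
    """Toy VAE-style topic model conditioned on document covariates.

    A logistic-normal approximation to the document-topic distribution;
    metadata is concatenated to the encoder input and contributes a
    covariate-specific deviation to the decoder's word logits.
    """

    def __init__(self, vocab_size, n_covariates, n_topics=20, hidden=256):
        super().__init__()
        self.encoder = nn.Linear(vocab_size + n_covariates, hidden)
        self.mu = nn.Linear(hidden, n_topics)
        self.logvar = nn.Linear(hidden, n_topics)
        self.topic_word = nn.Linear(n_topics, vocab_size, bias=False)
        self.covar_word = nn.Linear(n_covariates, vocab_size, bias=False)

    def forward(self, bow, covars):
        # Encode bag-of-words counts together with metadata.
        h = F.softplus(self.encoder(torch.cat([bow, covars], dim=-1)))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        theta = F.softmax(z, dim=-1)                           # topic mixture
        logits = self.topic_word(theta) + self.covar_word(covars)
        recon = -(bow * F.log_softmax(logits, dim=-1)).sum(-1)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        return (recon + kl).mean()  # negative ELBO, to be minimized
```

Training would simply minimize the returned loss with a standard optimizer
such as Adam; swapping covariates in or out only changes tensor shapes, not
the inference algorithm, which is what enables rapid exploration of
alternative models.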
Machine Learning with World Knowledge: The Position and Survey
Machine learning has become pervasive in multiple domains, impacting a wide
variety of applications, such as knowledge discovery and data mining, natural
language processing, information retrieval, computer vision, social and health
informatics, ubiquitous computing, etc. Two essential problems of machine
learning are how to generate features and how to acquire labels for machines to
learn. In particular, labeling large amounts of data for each domain-specific
problem can be very time-consuming and costly, and this has become a key
obstacle to making learning protocols realistic in applications. In this
paper, we discuss how to use existing general-purpose world knowledge to
enhance machine learning processes, by enriching features or reducing the
labeling work. We start by comparing world knowledge with domain-specific
knowledge, then introduce three key problems in using world knowledge in
learning processes: explicit and implicit feature representation, inference
for knowledge linking and disambiguation, and learning with direct or
indirect supervision. Finally, we discuss future directions for this research
topic.
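As a toy illustration of the explicit feature representation discussed above,
the sketch below augments bag-of-words features with concepts from a
hand-made dictionary; a real system would instead link tokens against a
knowledge base such as Probase or Freebase, and the CONCEPTS map here is
purely hypothetical:

```python
from collections import Counter

# Hypothetical toy "world knowledge": surface form -> general concepts.
CONCEPTS = {
    "python": ["programming_language"],
    "java": ["programming_language"],
    "flu": ["disease"],
    "aspirin": ["drug"],
}

def enrich(tokens):
    """Bag-of-words features plus explicit world-knowledge concepts."""
    feats = Counter(tokens)
    for tok in tokens:
        for concept in CONCEPTS.get(tok, []):
            feats[f"concept={concept}"] += 1
    return feats

print(enrich(["python", "java", "tutorial"]))
# Counter({'concept=programming_language': 2, 'python': 1, ...})
```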
Interdependent Gibbs Samplers
Gibbs sampling, as a model-learning method, is known to produce the most
accurate results available in a variety of domains, where it is the de facto
standard. Yet it is also well known that Gibbs random walks usually
have bottlenecks, sometimes termed "local maxima", and thus samplers often
return suboptimal solutions. In this paper we introduce a variation of the
Gibbs sampler which yields high likelihood solutions significantly more often
than the regular Gibbs sampler.
Specifically, we show that combining multiple samplers, with certain
dependence (coupling) between them, results in higher likelihood solutions.
This side-steps the well known issue of identifiability, which has been the
obstacle to combining samplers in previous work. We evaluate the approach on a
Latent Dirichlet Allocation model, and also on HMMs, where precise computation
of likelihoods and comparisons to the standard EM algorithm are possible.
Comment: Added a reference to a previous work which considered a very similar
algorithm
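The paper's coupling scheme is its own; the sketch below only illustrates the
general multi-chain pattern on a toy bimodal objective, with a deliberately
naive coupling step (periodically restarting chains near the best state found
so far). Everything here, including the objective, is an assumption made for
illustration:

```python
import math
import random

def log_lik(state):
    # Toy bimodal objective on binary vectors: "mostly zeros" and "mostly
    # ones" are both modes, but all-ones scores slightly higher. A crude
    # stand-in for an LDA/HMM likelihood with local maxima.
    s, n = sum(state), len(state)
    return 2.0 * max(s, n - s) + 0.5 * s

def gibbs_sweep(state):
    # One systematic-scan Gibbs sweep: resample each coordinate from its
    # conditional distribution given all the others.
    for i in range(len(state)):
        logs = []
        for v in (0, 1):
            state[i] = v
            logs.append(log_lik(state))
        p_one = 1.0 / (1.0 + math.exp(logs[0] - logs[1]))
        state[i] = 1 if random.random() < p_one else 0
    return state

def coupled_gibbs(n_chains=4, n_vars=30, sweeps=100, couple_every=10):
    chains = [[random.randint(0, 1) for _ in range(n_vars)]
              for _ in range(n_chains)]
    for t in range(1, sweeps + 1):
        chains = [gibbs_sweep(c) for c in chains]
        if t % couple_every == 0:
            # Naive coupling: restart all chains near the current best,
            # with a few random flips so they stay distinct.
            best = max(chains, key=log_lik)
            chains = [list(best) for _ in chains]
            for c in chains[1:]:
                for _ in range(3):
                    j = random.randrange(n_vars)
                    c[j] = 1 - c[j]
    return max(chains, key=log_lik)

print(log_lik(coupled_gibbs()))
```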
Prediction-Constrained Training for Semi-Supervised Mixture and Topic Models
Supervisory signals have the potential to make low-dimensional data
representations, like those learned by mixture and topic models, more
interpretable and useful. We propose a framework for training latent variable
models that explicitly balances two goals: recovery of faithful generative
explanations of high-dimensional data, and accurate prediction of associated
semantic labels. Existing approaches fail to achieve these goals due to an
incomplete treatment of a fundamental asymmetry: the intended application is
always predicting labels from data, not data from labels. Our
prediction-constrained objective for training generative models coherently
integrates loss-based supervisory signals while enabling effective
semi-supervised learning from partially labeled data. We derive learning
algorithms for semi-supervised mixture and topic models using stochastic
gradient descent with automatic differentiation. We demonstrate improved
prediction quality compared to several previous supervised topic models,
achieving predictions competitive with high-dimensional logistic regression on
text sentiment analysis and electronic health records tasks while
simultaneously learning interpretable topics.
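A minimal sketch of the prediction-constrained idea on a toy Gaussian
mixture, in our own notation: the scalar `lam` plays the role of the
multiplier trading generative likelihood against label loss, with labels
predicted from the posterior responsibilities. This illustrates the shape of
the objective, not the paper's full algorithm (which also handles partially
labeled data):

```python
import torch
import torch.nn.functional as F

K, D, C = 5, 2, 2  # mixture components, data dims, label classes
means = torch.randn(K, D, requires_grad=True)
log_mix = torch.zeros(K, requires_grad=True)
label_w = torch.zeros(K, C, requires_grad=True)

def pc_loss(x, y, lam=10.0):
    # log p(x) under a spherical-Gaussian mixture.
    log_pxk = -0.5 * ((x[:, None, :] - means[None]) ** 2).sum(-1)
    joint = F.log_softmax(log_mix, 0) + log_pxk
    log_px = torch.logsumexp(joint, dim=1)
    # Predict labels from the posterior responsibilities r(k | x).
    resp = F.softmax(joint, dim=1)
    logits = resp @ label_w
    return -log_px.mean() + lam * F.cross_entropy(logits, y)

x = torch.randn(256, D) + torch.tensor([2.0, 0.0])
y = (x[:, 0] > 2.0).long()  # toy labels tied to the first coordinate
opt = torch.optim.Adam([means, log_mix, label_w], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    pc_loss(x, y).backward()
    opt.step()
```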
Prediction-Constrained Topic Models for Antidepressant Recommendation
Supervisory signals can help topic models discover low-dimensional data
representations that are more interpretable for clinical tasks. We propose a
framework for training supervised latent Dirichlet allocation that balances two
goals: faithful generative explanations of high-dimensional data and accurate
prediction of associated class labels. Existing approaches fail to balance
these goals by not properly handling a fundamental asymmetry: the intended task
is always predicting labels from data, not data from labels. Our new
prediction-constrained objective trains models that predict labels from heldout
data well while also producing good generative likelihoods and interpretable
topic-word parameters. In a case study on predicting depression medications
from electronic health records, we demonstrate improved recommendations
compared to previous supervised topic models and high-dimensional logistic
regression from words alone.
Comment: Accepted poster at NIPS 2017 Workshop on Machine Learning for Health
(https://ml4health.github.io/2017/)
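In our notation (an assumption consistent with the prediction-constrained
training literature, not a formula quoted from this paper), the objective can
be read as constrained likelihood maximization,

```latex
\min_{\theta} \; -\log p_{\theta}(x)
\quad \text{subject to} \quad
\mathcal{L}\bigl(y, \hat{y}_{\theta}(x)\bigr) \le \epsilon ,
```

optimized in practice through the penalized form

```latex
\min_{\theta} \; -\log p_{\theta}(x)
\;+\; \lambda \, \mathcal{L}\bigl(y, \hat{y}_{\theta}(x)\bigr) ,
```

where the multiplier \lambda encodes the asymmetry: the constraint is on
predicting labels from data, never the reverse.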
A Tutorial on Deep Latent Variable Models of Natural Language
There has been much recent, exciting work on combining the complementary
strengths of latent variable models and deep learning. Latent variable modeling
makes it easy to explicitly specify model constraints through conditional
independence properties, while deep learning makes it possible to parameterize
these conditional likelihoods with powerful function approximators. While these
"deep latent variable" models provide a rich, flexible framework for modeling
many real-world phenomena, difficulties exist: deep parameterizations of
conditional likelihoods usually make posterior inference intractable, and
latent variable objectives often complicate backpropagation by introducing
points of non-differentiability. This tutorial explores these issues in depth
through the lens of variational inference.
Comment: EMNLP 2018 Tutorial
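For orientation, the variational-inference lens centers on the evidence lower
bound; in standard notation (ours, not quoted from the tutorial):

```latex
\log p_{\theta}(x) \;\ge\;
\mathbb{E}_{q_{\phi}(z \mid x)}\bigl[\log p_{\theta}(x \mid z)\bigr]
\;-\; \mathrm{KL}\bigl(q_{\phi}(z \mid x) \,\|\, p(z)\bigr)
```

Maximizing the right-hand side over both \theta and \phi sidesteps the
intractable posterior; when z is discrete, the expectation is not directly
differentiable in \phi, which is exactly the non-differentiability issue the
abstract mentions.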
Discovering Discrete Latent Topics with Neural Variational Inference
Topic models have been widely explored as probabilistic generative models of
documents. Traditional inference methods have sought closed-form derivations
for updating the models; however, as the expressiveness of these models grows,
so does the difficulty of performing fast and accurate inference over their
parameters. This paper presents alternative neural approaches to topic
modelling by providing parameterisable distributions over topics which permit
training by backpropagation in the framework of neural variational inference.
In addition, with the help of a stick-breaking construction, we propose a
recurrent network that is able to discover a notionally unbounded number of
topics, analogous to Bayesian non-parametric topic models. Experimental results
on the MXM Song Lyrics, 20NewsGroups and Reuters News datasets demonstrate the
effectiveness and efficiency of these neural topic models.Comment: ICML 201
Familia: A Configurable Topic Modeling Framework for Industrial Text Engineering
In the last decade, a variety of topic models have been proposed for text
engineering. However, except for Probabilistic Latent Semantic Analysis
(PLSA) and Latent Dirichlet Allocation (LDA), most existing topic models are
seldom applied or considered in industrial scenarios, largely because very
few convenient tools have been available to support them. Intimidated by the
expertise and labor required to design and implement parameter inference
algorithms, software engineers tend to simply resort to PLSA/LDA, without
considering whether it is appropriate for the problem at hand. In this paper,
we propose a configurable topic modeling framework named Familia to bridge
the huge gap between academic research and current industrial practice.
Familia supports an important
line of topic models that are widely applicable in text engineering scenarios.
To relieve the burden on software engineers without knowledge of Bayesian
networks, Familia is able to conduct automatic parameter inference for a
variety of topic models. Simply through changing the data organization of
Familia, software engineers are able to easily explore a broad spectrum of
existing topic models or even design their own topic models, and find the one
that best suits the problem at hand. In addition to its extensibility,
Familia provides a novel sampling mechanism that strikes a balance between
the effectiveness and efficiency of parameter inference. Furthermore, Familia
is a large-scale topic modeling framework that supports parallel parameter
inference and distributed parameter storage. The utility and necessity of
Familia are demonstrated in real-life industrial applications. Familia would
significantly enlarge software engineers' arsenal of topic models and pave
the way for utilizing highly customized topic models in real-life problems.
Comment: 21 pages, 15 figures
Discovering shared and individual latent structure in multiple time series
This paper proposes a nonparametric Bayesian method for exploratory data
analysis and feature construction in continuous time series. Our method focuses
on understanding shared features in a set of time series that exhibit
significant individual variability. Our method builds on the framework of
latent Dirichlet allocation (LDA) and its extension to hierarchical Dirichlet
processes, which allows us to characterize each series as switching between
latent "topics", where each topic is characterized as a distribution over
"words" that specify the series dynamics. However, unlike standard
applications of LDA, we discover the words as we learn the model. We apply this
model to the task of tracking the physiological signals of premature infants;
our model obtains clinically significant insights as well as useful features
for supervised learning tasks.
Comment: Additional supplementary section in tex file
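As a rough two-stage approximation to the pipeline this abstract describes
(the paper learns the "words" jointly with the topics; this sketch quantizes
windows first, so it is only illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
series = [rng.standard_normal(500).cumsum() for _ in range(20)]

def windows(x, w=25):
    # Non-overlapping windows; each will be mapped to a discrete "word".
    return np.stack([x[i:i + w] for i in range(0, len(x) - w, w)])

all_win = np.concatenate([windows(s) for s in series])
km = KMeans(n_clusters=30, n_init=10, random_state=0).fit(all_win)

# Each series becomes a bag of quantized-window "words" for LDA.
counts = np.stack([
    np.bincount(km.predict(windows(s)), minlength=30) for s in series
])
lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(counts)
print(lda.transform(counts).round(2))  # per-series topic mixtures
```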
Concept Modeling with Superwords
In information retrieval, a fundamental goal is to transform a document into
concepts that are representative of its content. The term "representative" is
in itself challenging to define, and various tasks require different
granularities of concepts. In this paper, we aim to model concepts that are
sparse over the vocabulary, and that flexibly adapt their content based on
other relevant semantic information such as textual structure or associated
image features. We explore a Bayesian nonparametric model based on nested beta
processes that allows for inferring an unknown number of strictly sparse
concepts. The resulting model provides an inherently different representation
of concepts than a standard LDA (or HDP) based topic model, and allows for
direct incorporation of semantic features. We demonstrate the utility of this
representation on multilingual blog data and the Congressional Record.
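The nested beta process here builds on beta-Bernoulli machinery; a minimal
finite approximation (the basic building block, not the paper's nested
construction) shows how strictly sparse binary concept indicators arise:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, alpha = 8, 12, 2.0  # documents, truncation level, concentration

# Finite beta-Bernoulli approximation to the beta process: most concepts
# get a tiny inclusion probability, so each document activates only a
# sparse subset of them.
pi = rng.beta(alpha / K, 1.0, size=K)
Z = rng.random((N, K)) < pi  # N x K sparse binary concept indicators
print(Z.astype(int))
print("mean concepts per document:", Z.sum(1).mean())
```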
- …