Dirichlet belief networks for topic structure learning
Recently, considerable research effort has been devoted to developing deep
architectures for topic models to learn topic structures. Although several deep
models have been proposed to learn better topic proportions of documents, how
to leverage the benefits of deep structures for learning word distributions of
topics has not yet been rigorously studied. Here we propose a new multi-layer
generative process on word distributions of topics, where each layer consists
of a set of topics and each topic is drawn from a mixture of the topics of the
layer above. As the topics in all layers can be directly interpreted by words,
the proposed model is able to discover interpretable topic hierarchies. As a
self-contained module, our model can be flexibly adapted to different kinds of
topic models to improve their modelling accuracy and interpretability.
Extensive experiments on text corpora demonstrate the advantages of the
proposed model.
Comment: accepted in NIPS 201
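The generative process described above can be sketched concretely: topics at the top layer are drawn from a symmetric Dirichlet over the vocabulary, and each topic in a lower layer is drawn from a Dirichlet centred on a convex mixture of the topics in the layer above. The layer sizes, concentration values, and mixing scheme below are illustrative assumptions, not the paper's exact parameterisation:

```python
import numpy as np

rng = np.random.default_rng(0)

V = 50                    # vocabulary size (assumed)
layer_sizes = [3, 5, 8]   # topics per layer, top to bottom (assumed)

# Top-layer topics: sparse word distributions from a symmetric Dirichlet.
topics = [rng.dirichlet(np.ones(V) * 0.1, size=layer_sizes[0])]

for K in layer_sizes[1:]:
    parent = topics[-1]  # topics of the layer above
    # Each child topic gets its own mixing weights over the parent topics.
    W = rng.dirichlet(np.ones(parent.shape[0]), size=K)
    # Child topic ~ Dirichlet centred on the mixture of parent topics.
    # The concentration (assumed value) controls how closely a child
    # follows its parent mixture.
    concentration = 100.0
    child = np.array([rng.dirichlet(concentration * (w @ parent)) for w in W])
    topics.append(child)

# Every topic in every layer is a distribution over words, which is why
# each layer of the hierarchy remains directly interpretable.
for layer in topics:
    assert np.allclose(layer.sum(axis=1), 1.0)
```

Because each layer's topics live directly on the vocabulary simplex, a topic at any depth can be read off by its highest-probability words, which is the interpretability property the abstract highlights.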
Decontamination of Mutual Contamination Models
Many machine learning problems can be characterized by mutual contamination
models. In these problems, one observes several random samples from different
convex combinations of a set of unknown base distributions, and the goal is to
infer these base distributions. This paper considers the general setting where
the base distributions are defined on arbitrary probability spaces. We examine
three popular machine learning problems that arise in this general setting:
multiclass classification with label noise, demixing of mixed membership
models, and classification with partial labels. In each case, we give
sufficient conditions for identifiability and present algorithms for the
infinite and finite sample settings, with associated performance guarantees.
Comment: Published in JMLR. Subsumes arXiv:1602.0623
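The mutual contamination setup can be illustrated on a finite outcome space. Below, the base distributions and mixing proportions are made-up toy values; in the idealised infinite-sample case with a known, invertible mixing matrix, decontamination reduces to solving a linear system (the paper's actual setting involves unknown mixing proportions and arbitrary probability spaces):

```python
import numpy as np

# Two unknown base distributions over a space of 4 outcomes (toy values).
P = np.array([[0.70, 0.20, 0.05, 0.05],
              [0.10, 0.10, 0.40, 0.40]])

# Mixing matrix: each row gives the convex combination weights that
# produce one observed (contaminated) distribution.
Pi = np.array([[0.8, 0.2],
               [0.3, 0.7]])

# The contaminated distributions are all we get to sample from.
contaminated = Pi @ P

# Idealised decontamination: with Pi known and invertible, recover the
# base distributions by inverting the mixing.
recovered = np.linalg.solve(Pi, contaminated)

assert np.allclose(recovered, P)
```

The finite-sample and unknown-mixing cases are where the paper's identifiability conditions and performance guarantees come in; this sketch only shows why invertible mixing makes the problem well-posed.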