Modeling Topic and Role Information in Meetings using the Hierarchical Dirichlet Process
Abstract. In this paper, we address the modeling of topic and role information in multiparty meetings, via a nonparametric Bayesian model called the hierarchical Dirichlet process. This model provides a powerful solution to topic modeling and a flexible framework for the incorporation of other cues such as speaker role information. We present our modeling framework for topic and role on the AMI Meeting Corpus, and illustrate the effectiveness of the approach in the context of adapting a baseline language model in a large-vocabulary automatic speech recognition system for multiparty meetings. The adapted LM produces significant improvements in terms of both perplexity and word error rate.
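As a point of reference for the perplexity figure mentioned in this abstract, the following is a minimal sketch of how perplexity is computed from per-word probabilities. The function name `perplexity` and the toy probabilities are illustrative assumptions and are not taken from the paper; the abstract does not describe how the adapted LM assigns these probabilities.

```python
import math

def perplexity(word_probs):
    """Perplexity of a word sequence, given the probability a language
    model assigned to each word (hypothetical helper, for illustration)."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    # Average negative log-likelihood per word, then exponentiate.
    avg_neg_log_lik = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(avg_neg_log_lik)

# Toy comparison (made-up numbers): a model that assigns higher probability
# to the observed words has lower perplexity.
baseline = perplexity([0.01, 0.02, 0.005, 0.01])
adapted = perplexity([0.02, 0.03, 0.010, 0.02])
```

Lower perplexity means the model was, on average, less surprised by the observed words, which is the sense in which an adapted LM can improve on a baseline.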
Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts
We present a Bayesian nonparametric framework for multilevel clustering which
utilizes group-level context information to simultaneously discover
low-dimensional structures of the group contents and partitions groups into
clusters. Using the Dirichlet process as the building block, our model
constructs a product base-measure with a nested structure to accommodate
content and context observations at multiple levels. The proposed model
possesses properties that link the nested Dirichlet processes (nDP) and the
Dirichlet process mixture models (DPM) in an interesting way: integrating out
all contents results in the DPM over contexts, whereas integrating out
group-specific contexts results in the nDP mixture over content variables. We
provide a Polya-urn view of the model and an efficient collapsed Gibbs
inference procedure. Extensive experiments on real-world datasets demonstrate
the advantage of utilizing context information via our model in both text and
image domains. Comment: Full version of ICML 201
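The Polya-urn view mentioned in this abstract builds on the standard urn scheme for a single Dirichlet process, which can be sketched as follows. This is only the basic one-level scheme (also known as the Chinese restaurant process), not the paper's multilevel model; the function name `polya_urn_assignments` and the concentration parameter `alpha` are illustrative assumptions.

```python
import random

def polya_urn_assignments(n, alpha, seed=0):
    """Assign n items to clusters with the Polya-urn scheme of a single
    Dirichlet process (illustrative sketch, not the paper's model)."""
    rng = random.Random(seed)
    counts = []   # counts[k] = number of items currently in cluster k
    labels = []   # labels[i] = cluster index of item i
    for _ in range(n):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        labels.append(k)
    return labels, counts

labels, counts = polya_urn_assignments(n=50, alpha=1.0)
```

The number of clusters is not fixed in advance: it grows with the data, and larger values of `alpha` tend to produce more clusters.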
The Discrete Infinite Logistic Normal Distribution
We present the discrete infinite logistic normal distribution (DILN), a
Bayesian nonparametric prior for mixed membership models. DILN is a
generalization of the hierarchical Dirichlet process (HDP) that models
correlation structure between the weights of the atoms at the group level. We
derive a representation of DILN as a normalized collection of gamma-distributed
random variables, and study its statistical properties. We consider
applications to topic modeling and derive a variational inference algorithm for
approximate posterior inference. We study the empirical performance of the DILN
topic model on four corpora, comparing performance with the HDP and the
correlated topic model (CTM). To deal with large-scale data sets, we also
develop an online inference algorithm for DILN and compare with online HDP and
online LDA on Nature magazine, which contains approximately 350,000
articles. Comment: This paper will appear in Bayesian Analysis. A shorter version of
this paper appeared at AISTATS 2011, Fort Lauderdale, FL, US
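The representation as a normalized collection of gamma-distributed random variables can be illustrated with a small finite sketch. It relies on the standard fact that normalizing independent gamma variables with a common scale yields a Dirichlet-distributed vector. The function name `normalized_gamma_weights`, the finite number of atoms, and the optional per-atom `log_scales` argument are assumptions made for illustration: the abstract does not give DILN's exact parameterization, and the correlated draws that DILN would use at the group level are not generated here.

```python
import math
import random

def normalized_gamma_weights(shapes, log_scales=None, seed=0):
    """Draw one probability vector by normalizing independent gamma
    variables (finite illustrative sketch, not the full DILN prior)."""
    rng = random.Random(seed)
    if log_scales is None:
        # Common scale of 1: the result is Dirichlet(shapes) distributed.
        log_scales = [0.0] * len(shapes)
    # random.gammavariate takes (shape, scale); the scale is exp(log_scale).
    z = [rng.gammavariate(a, math.exp(w)) for a, w in zip(shapes, log_scales)]
    total = sum(z)
    return [v / total for v in z]

# Equal scales: an ordinary Dirichlet draw over four atoms.
plain = normalized_gamma_weights([2.0, 1.0, 1.0, 0.5])

# Unequal scales (hypothetical values): atoms with larger log-scale tend to
# receive more weight, which is how per-atom scales can reshape the weights.
scaled = normalized_gamma_weights([2.0, 1.0, 1.0, 0.5],
                                  log_scales=[0.0, 1.5, 1.5, -1.0])
```

In this sketch the log-scales are supplied by the caller; drawing them jointly from a correlated distribution would make the resulting weights correlated across atoms.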