Nested Hierarchical Dirichlet Processes
We develop a nested hierarchical Dirichlet process (nHDP) for hierarchical
topic modeling. The nHDP is a generalization of the nested Chinese restaurant
process (nCRP) that allows each word to follow its own path to a topic node
according to a document-specific distribution on a shared tree. This alleviates
the rigid, single-path formulation of the nCRP, allowing a document to more
easily express thematic borrowings as a random effect. We derive a stochastic
variational inference algorithm for the model, in addition to a greedy subtree
selection method for each document, which allows for efficient inference using
massive collections of text documents. We demonstrate our algorithm on 1.8
million documents from The New York Times and 3.3 million documents from
Wikipedia.
Comment: To appear in IEEE Transactions on Pattern Analysis and Machine
Intelligence, Special Issue on Bayesian Nonparametric
The semi-hierarchical Dirichlet Process and its application to clustering homogeneous distributions
Assessing homogeneity of distributions is an old problem that has received
considerable attention, especially in the nonparametric Bayesian literature. To
this effect, we propose the semi-hierarchical Dirichlet process, a novel
hierarchical prior that extends the hierarchical Dirichlet process of Teh et
al. (2006) and that avoids the degeneracy issues of nested processes recently
described by Camerlenghi et al. (2019a). We go beyond the simple yes/no answer
to the homogeneity question and embed the proposed prior in a random partition
model; this procedure allows us to give a more comprehensive response to the
above question and in fact find groups of populations that are internally
homogeneous when I ≥ 2 such populations are considered. We
study theoretical properties of the semi-hierarchical Dirichlet process and of
the Bayes factor for the homogeneity test when I = 2. Extensive simulation
studies and applications to educational data are also discussed.