18,633 research outputs found
Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts
We present a Bayesian nonparametric framework for multilevel clustering which
utilizes group-level context information to simultaneously discover
low-dimensional structures of the group contents and partitions groups into
clusters. Using the Dirichlet process as the building block, our model
constructs a product base-measure with a nested structure to accommodate
content and context observations at multiple levels. The proposed model
possesses properties that link the nested Dirichlet processes (nDP) and the
Dirichlet process mixture models (DPM) in an interesting way: integrating out
all contents results in the DPM over contexts, whereas integrating out
group-specific contexts results in the nDP mixture over content variables. We
provide a Polya-urn view of the model and an efficient collapsed Gibbs
inference procedure. Extensive experiments on real-world datasets demonstrate
the advantage of utilizing context information via our model in both text and
image domains.Comment: Full version of ICML 201
Recommended from our members
An effective data placement strategy for XML documents
As XML is increasingly being used in Web applications, new
technologies need to be investigated for processing XML documents with high
performance. Parallelism is a promising solution for structured document
processing and data placement is a major factor for system performance
improvement in parallel processing. This paper describes an effective XML
document data placement strategy. The new strategy is based on a multilevel
graph partitioning algorithm with the consideration of the unique features of
XML documents and query distributions. A new algorithm, which is based on
XML query schemas to derive the weighted graph from the labelled directed
graph presentation of XML documents, is also proposed. Performance analysis
on the algorithm presented in the paper shows that the new data placement
strategy exhibits low workload skew and a high degree of parallelism
- …