Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts
We present a Bayesian nonparametric framework for multilevel clustering that
uses group-level context information to simultaneously discover
low-dimensional structures of the group contents and partition groups into
clusters. Using the Dirichlet process as the building block, our model
constructs a product base-measure with a nested structure to accommodate
content and context observations at multiple levels. The proposed model
possesses properties that link the nested Dirichlet processes (nDP) and the
Dirichlet process mixture models (DPM) in an interesting way: integrating out
all contents results in the DPM over contexts, whereas integrating out
group-specific contexts results in the nDP mixture over content variables. We
provide a Polya-urn view of the model and an efficient collapsed Gibbs
inference procedure. Extensive experiments on real-world datasets demonstrate
the advantage of utilizing context information via our model in both text and
image domains.
Comment: Full version of ICML 201
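The Polya-urn view mentioned in this abstract can be illustrated with the standard urn scheme for a single Dirichlet process (the Chinese restaurant process). The sketch below is only that basic building block, not the paper's nested model with contexts; the function name `polya_urn_assignments` and the parameters `n`, `alpha`, and `seed` are hypothetical choices made for illustration.

```python
import random


def polya_urn_assignments(n, alpha, seed=0):
    """Draw cluster labels for n items from a Polya urn (Chinese restaurant process).

    alpha > 0 is the concentration parameter: larger values open new clusters
    more often. This is a generic sketch, not the model from the abstract.
    """
    rng = random.Random(seed)
    labels = []  # labels[i] = cluster index of item i
    counts = []  # counts[k] = number of items currently in cluster k
    for i in range(n):
        # Item i joins existing cluster k with probability counts[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        u = rng.random() * (i + alpha)
        acc = 0.0
        chosen = len(counts)  # default: open a new cluster
        for k, c in enumerate(counts):
            acc += c
            if u < acc:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(0)
        counts[chosen] += 1
        labels.append(chosen)
    return labels, counts


if __name__ == "__main__":
    labels, counts = polya_urn_assignments(n=20, alpha=1.0)
    print(labels)
    print(counts)
```

The number of clusters is not fixed in advance: it grows with `n` at a rate governed by `alpha`, which is the property that makes Dirichlet-process models nonparametric.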
A Hierarchical Dirichlet Process Model with Multiple Levels of Clustering for Human EEG Seizure Modeling
Driven by the multi-level structure of human intracranial
electroencephalogram (iEEG) recordings of epileptic seizures, we introduce a
new variant of a hierarchical Dirichlet Process, the multi-level clustering
hierarchical Dirichlet Process (MLC-HDP), that simultaneously clusters
datasets on multiple levels. Our seizure dataset contains brain activity
recorded in typically more than a hundred individual channels for each seizure
of each patient. The MLC-HDP model clusters over channel types, seizure types,
and patient types simultaneously. We describe this model and its implementation
in detail. We also present the results of a simulation study comparing the
MLC-HDP to a similar model, the Nested Dirichlet Process, and finally
demonstrate the MLC-HDP's use in modeling seizures across multiple patients. We
find the MLC-HDP's clustering to be comparable to independent human physician
clusterings. To our knowledge, the MLC-HDP model is the first in the epilepsy
literature capable of clustering seizures within and between patients.
Comment: ICML201
Probabilistic Multilevel Clustering via Composite Transportation Distance
We propose a novel probabilistic approach to multilevel clustering problems
based on composite transportation distance, which is a variant of
transportation distance where the underlying metric is Kullback-Leibler
divergence. Our method involves solving a joint optimization problem over
spaces of probability measures to simultaneously discover grouping structures
within groups and among groups. By exploiting the connection of our method to
the problem of finding composite transportation barycenters, we develop fast
and efficient optimization algorithms even for potentially large-scale
multilevel datasets. Finally, we present experimental results with both
synthetic and real data to demonstrate the efficiency and scalability of the
proposed approach.
Comment: 25 pages, 3 figures
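To make the idea of a transportation cost with Kullback-Leibler divergence as the ground cost concrete, the sketch below evaluates that cost for one fixed coupling between two sets of discrete distributions. It assumes every distribution has full support wherever the first argument is positive, and it does not perform the minimization over couplings that defines the distance itself, nor the paper's optimization algorithms. The names `kl_divergence` and `transport_cost` are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute zero.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def transport_cost(coupling, atoms_p, atoms_q):
    """Cost of a given coupling when moving mass between two sets of atoms.

    coupling[i][j] is the mass moved from atoms_p[i] to atoms_q[j]; each atom
    is itself a discrete distribution, and the cost of a move is KL divergence.
    The transportation distance would be the minimum of this value over all
    valid couplings, which is not computed here.
    """
    return sum(
        coupling[i][j] * kl_divergence(atoms_p[i], atoms_q[j])
        for i in range(len(atoms_p))
        for j in range(len(atoms_q))
    )


if __name__ == "__main__":
    atoms_p = [[0.5, 0.5], [0.9, 0.1]]
    atoms_q = [[0.25, 0.75], [0.9, 0.1]]
    # Diagonal coupling: each atom in the first set is matched to its counterpart.
    coupling = [[0.5, 0.0], [0.0, 0.5]]
    print(transport_cost(coupling, atoms_p, atoms_q))
```

Because KL divergence is not symmetric, swapping `atoms_p` and `atoms_q` generally changes the cost, which is one way this quantity differs from a transportation distance built on a true metric.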
Clustering Areal Units at Multiple Levels of Resolution to Model Crime in Philadelphia
Estimation of the spatial heterogeneity in crime incidence across an entire
city is an important step towards reducing crime and increasing our
understanding of the physical and social functioning of urban environments.
This is a difficult modeling endeavor since crime incidence can vary smoothly
across space and time but there also exist physical and social barriers that
result in discontinuities in crime rates between different regions within a
city. A further difficulty is that there are different levels of resolution
that can be used for defining regions of a city in order to analyze crime. To
address these challenges, we develop a Bayesian non-parametric approach for the
clustering of urban areal units at different levels of resolution
simultaneously. Our approach is evaluated with an extensive synthetic data
study and then applied to the estimation of crime incidence at various levels
of resolution in the city of Philadelphia.
Posterior Regularization on Bayesian Hierarchical Mixture Clustering
Bayesian hierarchical mixture clustering (BHMC) improves on traditional
Bayesian hierarchical clustering by replacing the conventional
Gaussian-to-Gaussian (G2G) kernels, which govern parent-to-child diffusion
in the generative process, with a Hierarchical Dirichlet Process
Mixture Model (HDPMM). However, the drawback of the BHMC lies in the
possibility of obtaining trees with comparatively high nodal variance in the
higher levels (i.e., those closer to the root node). This suggests that the
separation between nodes, particularly those in the higher levels,
might be weak. We attempt to overcome this drawback through a recent
inferential framework named posterior regularization, which offers a
simple way to impose extra constraints on a Bayesian model to address its
weakness. To enhance the separation of clusters, we apply posterior
regularization to impose max-margin constraints on the nodes at every level of
the hierarchy. In this paper, we detail how posterior regularization (PR) is
applied to BHMC and show that this solution achieves the desired improvements
over the BHMC model.