737 research outputs found

    Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts

    Get PDF
    We present a Bayesian nonparametric framework for multilevel clustering which utilizes group-level context information to simultaneously discover low-dimensional structures of the group contents and partitions groups into clusters. Using the Dirichlet process as the building block, our model constructs a product base-measure with a nested structure to accommodate content and context observations at multiple levels. The proposed model possesses properties that link the nested Dirichlet processes (nDP) and the Dirichlet process mixture models (DPM) in an interesting way: integrating out all contents results in the DPM over contexts, whereas integrating out group-specific contexts results in the nDP mixture over content variables. We provide a Polya-urn view of the model and an efficient collapsed Gibbs inference procedure. Extensive experiments on real-world datasets demonstrate the advantage of utilizing context information via our model in both text and image domains.Comment: Full version of ICML 201

    A Hierarchical Dirichlet Process Model with Multiple Levels of Clustering for Human EEG Seizure Modeling

    Get PDF
    Driven by the multi-level structure of human intracranial electroencephalogram (iEEG) recordings of epileptic seizures, we introduce a new variant of a hierarchical Dirichlet Process---the multi-level clustering hierarchical Dirichlet Process (MLC-HDP)---that simultaneously clusters datasets on multiple levels. Our seizure dataset contains brain activity recorded in typically more than a hundred individual channels for each seizure of each patient. The MLC-HDP model clusters over channels-types, seizure-types, and patient-types simultaneously. We describe this model and its implementation in detail. We also present the results of a simulation study comparing the MLC-HDP to a similar model, the Nested Dirichlet Process and finally demonstrate the MLC-HDP's use in modeling seizures across multiple patients. We find the MLC-HDP's clustering to be comparable to independent human physician clusterings. To our knowledge, the MLC-HDP model is the first in the epilepsy literature capable of clustering seizures within and between patients.Comment: ICML201

    Probabilistic Multilevel Clustering via Composite Transportation Distance

    Full text link
    We propose a novel probabilistic approach to multilevel clustering problems based on composite transportation distance, which is a variant of transportation distance where the underlying metric is Kullback-Leibler divergence. Our method involves solving a joint optimization problem over spaces of probability measures to simultaneously discover grouping structures within groups and among groups. By exploiting the connection of our method to the problem of finding composite transportation barycenters, we develop fast and efficient optimization algorithms even for potentially large-scale multilevel datasets. Finally, we present experimental results with both synthetic and real data to demonstrate the efficiency and scalability of the proposed approach.Comment: 25 pages, 3 figure

    Clustering Areal Units at Multiple Levels of Resolution to Model Crime in Philadelphia

    Full text link
    Estimation of the spatial heterogeneity in crime incidence across an entire city is an important step towards reducing crime and increasing our understanding of the physical and social functioning of urban environments. This is a difficult modeling endeavor since crime incidence can vary smoothly across space and time but there also exist physical and social barriers that result in discontinuities in crime rates between different regions within a city. A further difficulty is that there are different levels of resolution that can be used for defining regions of a city in order to analyze crime. To address these challenges, we develop a Bayesian non-parametric approach for the clustering of urban areal units at different levels of resolution simultaneously. Our approach is evaluated with an extensive synthetic data study and then applied to the estimation of crime incidence at various levels of resolution in the city of Philadelphia

    Posterior Regularization on Bayesian Hierarchical Mixture Clustering

    Full text link
    Bayesian hierarchical mixture clustering (BHMC) improves on the traditional Bayesian hierarchical clustering by, with regard to the parent-to-child diffusion in the generative process, replacing the conventional Gaussian-to-Gaussian (G2G) kernels with a Hierarchical Dirichlet Process Mixture Model (HDPMM). However, the drawback of the BHMC lies in the possibility of obtaining trees with comparatively high nodal variance in the higher levels (i.e., those closer to the root node). This can be interpreted as that the separation between the nodes, particularly those in the higher levels, might be weak. We attempt to overcome this drawback through a recent inferential framework named posterior regularization, which facilitates a simple manner to impose extra constraints on a Bayesian model to address its weakness. To enhance the separation of clusters, we apply posterior regularization to impose max-margin constraints on the nodes at every level of the hierarchy. In this paper, we illustrate the modeling detail of applying the PR on BHMC and show that this solution achieves the desired improvements over the BHMC model
    • …
    corecore