    Exploiting side information in Bayesian nonparametric models and their applications

    My research incorporates side information into advanced Bayesian nonparametric models. We have developed novel models for data clustering and medical data analysis, and have made our methods scalable to large-scale data. I have published this research in several journal and conference papers.

    Centered Partition Process: Informative Priors for Clustering

    There is a very rich literature proposing Bayesian approaches for clustering, starting with a prior probability distribution on partitions. Most approaches assume exchangeability, leading to simple representations in terms of Exchangeable Partition Probability Functions (EPPF). Gibbs-type priors encompass a broad class of such cases, including Dirichlet and Pitman-Yor processes. Even though there have been some proposals to relax the exchangeability assumption, allowing covariate-dependence and partial exchangeability, limited consideration has been given to how to include concrete prior knowledge on the partition. For example, we are motivated by an epidemiological application, in which we wish to cluster birth defects into groups and we have prior knowledge of an initial clustering provided by experts. As a general approach for including such prior knowledge, we propose a Centered Partition (CP) process that modifies the EPPF to favor partitions close to an initial one. Some properties of the CP prior are described, a general algorithm for posterior computation is developed, and we illustrate the methodology through simulation examples and an application to the motivating epidemiology study of birth defects.

    Mixture modeling via vectors of normalized independent finite point processes

    Statistical modeling in the presence of hierarchical data is a crucial task in Bayesian statistics. The Hierarchical Dirichlet Process (HDP) is the foremost tool for handling data organized in groups through mixture modeling. Although the HDP is mathematically tractable, its computational cost is typically demanding, and its analytical complexity represents a barrier for practitioners. The present paper develops a mixture model based on a novel family of Bayesian priors designed for multilevel data and obtained by normalizing a finite point process. A full distribution theory for this new family and the induced clustering is developed, including tractable expressions for marginal, posterior and predictive distributions. Efficient marginal and conditional Gibbs samplers are designed for posterior inference. The proposed mixture model outperforms the HDP in terms of analytical feasibility, clustering discovery, and computational time. The motivating application comes from the analysis of shot put data, which contain performance measurements of athletes across different seasons. In this setting, the proposed model is used to induce clustering of the observations across seasons and athletes. By linking clusters across seasons, similarities and differences in athletes' performances are identified.

    Bayesian Nonparametrics to Model Content, User, and Latent Structure in Hawkes Processes

    Communication in social networks tends to exhibit complex dynamics both in terms of the users involved and the contents exchanged. For example, email exchanges or activities on social media may exhibit reinforcing dynamics, where earlier events trigger follow-up activity through multiple structured latent factors. Such dynamics have been previously represented using models of reinforcement and reciprocation, a canonical example being the Hawkes process (HP). However, previous HP models do not fully capture the rich dynamics of real-world activity. For example, reciprocation may be impacted by the significance and receptivity of the content being communicated, and modeling the content accurately at the individual level may require identification and exploitation of the latent hierarchical structure present among users. Additionally, real-world activity may be driven by multiple latent triggering factors shared by past and future events, with the latent features themselves exhibiting temporal dependency structures. These important characteristics have been largely ignored in previous work. In this dissertation, we address these limitations via three novel Bayesian nonparametric Hawkes process models, where the synergy between Bayesian nonparametric models and Hawkes processes captures the structural and the temporal dynamics of communication in a unified framework. Empirical results demonstrate that our models outperform competing state-of-the-art methods by more accurately capturing the rich dynamics of the interactions and influences among users and events, and by improving predictions about future event times, user clusters, and latent features in various types of communication activities.

    Hierarchical Species Sampling Models

    This paper introduces a general class of hierarchical nonparametric prior distributions. The random probability measures are constructed by a hierarchy of generalized species sampling processes with possibly non-diffuse base measures. The proposed framework provides a general probabilistic foundation for hierarchical random measures with either atomic or mixed base measures and allows for studying their properties, such as the distribution of the marginal and total number of clusters. We show that hierarchical species sampling models have a Chinese Restaurant Franchise representation and can be used as prior distributions to undertake Bayesian nonparametric inference. We provide a method to sample from the posterior distribution together with some numerical illustrations. Our class of priors includes some new hierarchical mixture priors, such as the hierarchical Gnedin measures, and other well-known prior distributions, such as the hierarchical Pitman-Yor and the hierarchical normalized random measures.