Exploiting side information in Bayesian nonparametric models and their applications
My research exploits side information in advanced Bayesian nonparametric models. We have developed novel models for data clustering and medical data analysis, and have made our methods scalable to large-scale data. I have published this research in several journal and conference papers.
Centered Partition Process: Informative Priors for Clustering
There is a very rich literature proposing Bayesian approaches for clustering
starting with a prior probability distribution on partitions. Most approaches
assume exchangeability, leading to simple representations in terms of
Exchangeable Partition Probability Functions (EPPF). Gibbs-type priors
encompass a broad class of such cases, including Dirichlet and Pitman-Yor
processes. Even though there have been some proposals to relax the
exchangeability assumption, allowing covariate-dependence and partial
exchangeability, limited consideration has been given to how to include
concrete prior knowledge about the partition. For example, we are motivated by an
epidemiological application, in which we wish to cluster birth defects into
groups and we have prior knowledge of an initial clustering provided by
experts. As a general approach for including such prior knowledge, we propose a
Centered Partition (CP) process that modifies the EPPF to favor partitions
close to an initial one. Some properties of the CP prior are described, a
general algorithm for posterior computation is developed, and we illustrate the
methodology through simulation examples and an application to the motivating
epidemiology study of birth defects.
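
The idea of favoring partitions close to an initial one can be illustrated with a small sketch. The abstract does not state the functional form, so the code below makes three assumptions: the baseline EPPF is that of a Dirichlet process with concentration `alpha`, closeness is measured by the Variation of Information distance, and the penalty is exponential, `exp(-psi * distance)`. All function names are hypothetical, and the brute-force enumeration is only feasible for a handful of items.

```python
import math
from collections import Counter


def set_partitions(n):
    """Yield every partition of n items as a tuple of labels (restricted-growth form)."""
    def rec(prefix, max_label):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        # An item may join any existing block or open exactly one new block.
        for lab in range(max_label + 2):
            yield from rec(prefix + [lab], max(max_label, lab))
    yield from rec([], -1)


def dp_eppf(labels, alpha):
    """EPPF of a Dirichlet process: alpha^K * prod (n_j - 1)! / rising factorial."""
    sizes = Counter(labels).values()
    n = len(labels)
    log_p = len(sizes) * math.log(alpha) + sum(math.lgamma(s) for s in sizes)
    log_p -= math.lgamma(alpha + n) - math.lgamma(alpha)
    return math.exp(log_p)


def variation_of_information(a, b):
    """Variation of Information between two partitions (assumed distance)."""
    n = len(a)

    def entropy(counts):
        return -sum(k / n * math.log2(k / n) for k in counts)

    h_a = entropy(Counter(a).values())
    h_b = entropy(Counter(b).values())
    h_ab = entropy(Counter(zip(a, b)).values())
    return 2 * h_ab - h_a - h_b


def cp_prior(n, c0, alpha=1.0, psi=1.0):
    """Baseline EPPF tilted by exp(-psi * distance to c0), then normalized."""
    weights = {
        c: dp_eppf(c, alpha) * math.exp(-psi * variation_of_information(c, c0))
        for c in set_partitions(n)
    }
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Expert clustering of four items into two pairs (illustrative only).
prior = cp_prior(4, c0=(0, 0, 1, 1), alpha=1.0, psi=5.0)
```

Under these assumptions, `psi = 0` recovers the baseline exchangeable prior, and larger values of `psi` concentrate mass around the expert partition `c0`.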
Mixture modeling via vectors of normalized independent finite point processes
Statistical modeling in the presence of hierarchical data is a crucial task in
Bayesian statistics. The Hierarchical Dirichlet Process (HDP) is the
foremost tool for handling data organized in groups through mixture modeling.
Although the HDP is mathematically tractable, its computational cost is
typically demanding, and its analytical complexity represents a barrier for
practitioners. The present paper conceives a mixture model based on a novel
family of Bayesian priors designed for multilevel data and obtained by
normalizing a finite point process. A full distribution theory for this new
family and the induced clustering is developed, including tractable expressions
for marginal, posterior and predictive distributions. Efficient marginal and
conditional Gibbs samplers are designed for providing posterior inference. The
proposed mixture model outperforms the HDP in terms of analytical feasibility,
clustering discovery, and computational time. The motivating application comes
from the analysis of shot put data, which contains performance measurements of
athletes across different seasons. In this setting, the proposed model is
exploited to induce clustering of the observations across seasons and athletes.
By linking clusters across seasons, similarities and differences in athletes'
performances are identified.
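
As a rough illustration of what normalizing a finite point process could look like for grouped data, the sketch below draws a random number of shared atoms and, for each group, normalizes independent positive jumps into mixture weights. The abstract does not give the construction, so every specific here is an assumption: the shifted-Poisson number of components, the Gamma jumps, the standard normal atoms, and the names `sample_poisson` and `sample_vector_weights`.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_vector_weights(n_groups, lam=3.0, shape=1.0, seed=0):
    """Shared atoms with group-specific normalized jumps (illustrative assumptions)."""
    rng = random.Random(seed)
    # Assumed: number of components is 1 + Poisson(lam), so it is always at least one.
    m = 1 + sample_poisson(rng, lam)
    # Assumed: atoms are shared by all groups and drawn from a standard normal.
    atoms = [rng.gauss(0.0, 1.0) for _ in range(m)]
    weights = []
    for _ in range(n_groups):
        # Assumed: independent Gamma(shape, 1) jumps, normalized within each group.
        jumps = [rng.gammavariate(shape, 1.0) for _ in range(m)]
        total = sum(jumps)
        weights.append([s / total for s in jumps])
    return atoms, weights


atoms, weights = sample_vector_weights(n_groups=3)
```

Because the atoms are shared while the weights differ by group, a cluster attached to one atom can appear in several groups with different prevalence, which is the kind of linkage across seasons that the abstract describes.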
Bayesian Nonparametrics to Model Content, User, and Latent Structure in Hawkes Processes
Communication in social networks tends to exhibit complex dynamics both in terms of the users involved and the content exchanged. For example, email exchanges or activities on social media may exhibit reinforcing dynamics, where earlier events trigger follow-up activity through multiple structured latent factors. Such dynamics have been previously represented using models of reinforcement and reciprocation, a canonical example being the Hawkes process (HP). However, previous HP models do not fully capture the rich dynamics of real-world activity. For example, reciprocation may be impacted by the significance and receptivity of the content being communicated, and modeling the content accurately at the individual level may require identification and exploitation of the latent hierarchical structure present among users. Additionally, real-world activity may be driven by multiple latent triggering factors shared by past and future events, with the latent features themselves exhibiting temporal dependency structures. These important characteristics have been largely ignored in previous work. In this dissertation, we address these limitations via three novel Bayesian nonparametric Hawkes process models, where the synergy between Bayesian nonparametric models and Hawkes processes captures the structural and the temporal dynamics of communication in a unified framework. Empirical results demonstrate that our models outperform competing state-of-the-art methods, by more accurately capturing the rich dynamics of the interactions and influences among users and events, and by improving predictions about future event times, user clusters, and latent features in various types of communication activities.
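
The reinforcing dynamics mentioned above can be made concrete with the classical univariate Hawkes process, in which each past event temporarily raises the rate of future events. The sketch below is not one of the dissertation's models; it assumes an exponential triggering kernel, and the parameter names `mu`, `alpha`, and `beta` are illustrative.

```python
import math


def hawkes_intensity(t, history, mu=0.5, alpha=0.8, beta=1.0):
    """Conditional intensity: baseline rate plus decaying excitation from past events.

    lambda(t) = mu + sum over past events s < t of alpha * exp(-beta * (t - s))
    """
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in history if s < t)


# Intensity shortly after a burst of three events versus long after it.
events = [1.0, 1.2, 1.3]
soon_after = hawkes_intensity(1.4, events)
long_after = hawkes_intensity(10.0, events)
```

With this kernel, each event triggers `alpha / beta` follow-up events on average, so the process is stable only when that ratio is below one.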
Hierarchical Species Sampling Models
This paper introduces a general class of hierarchical nonparametric prior distributions. The random probability measures are constructed by a hierarchy of generalized species sampling processes with possibly non-diffuse base measures. The proposed framework provides a general probabilistic foundation for hierarchical random measures with either atomic or mixed base measures and allows for studying their properties, such as the distribution of the marginal and total number of clusters. We show that hierarchical species sampling models have a Chinese Restaurant Franchise representation and can be used as prior distributions to undertake Bayesian nonparametric inference. We provide a method to sample from the posterior distribution together with some numerical illustrations. Our class of priors includes some new hierarchical mixture priors, such as the hierarchical Gnedin measures, and other well-known prior distributions, such as the hierarchical Pitman-Yor and the hierarchical normalized random measures.
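
The restaurant metaphor is easiest to see in the single-restaurant case. The sketch below simulates the standard Pitman-Yor seating rule, in which a new customer joins an existing table of size n_j with weight n_j - sigma and opens a new table with weight theta + sigma * K, where K is the current number of tables. It is not the paper's hierarchical sampler; in a franchise, each new table would additionally draw its dish from a shared top-level restaurant. The function name `pitman_yor_crp` is hypothetical.

```python
import random


def pitman_yor_crp(n, theta=1.0, sigma=0.5, seed=0):
    """Seat n customers by the Pitman-Yor predictive rule (0 <= sigma < 1, theta > -sigma)."""
    rng = random.Random(seed)
    table_sizes = []
    assignments = []
    for _ in range(n):
        # Existing tables are discounted by sigma; the last weight opens a new table.
        weights = [size - sigma for size in table_sizes]
        weights.append(theta + sigma * len(table_sizes))
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[table] += 1
        assignments.append(table)
    return assignments, table_sizes


assignments, table_sizes = pitman_yor_crp(50)
```

Setting `sigma = 0` gives the Dirichlet process seating rule, while larger `sigma` produces more tables with heavier-tailed sizes.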