396 research outputs found

Rank-based Bayesian clustering via covariate-informed Mallows mixtures

Author: Eliseussen Emilie
Frigessi Arnoldo
Vitelli Valeria
Publication date: 16/02/2024

Data in the form of rankings, ratings, pairwise comparisons or clicks are frequently collected in diverse fields, from marketing to politics, to understand assessors' individual preferences. Combining such preference data with features associated with the assessors can lead to a better understanding of the assessors' behaviors and choices. The Mallows model is a popular model for rankings, as it flexibly adapts to different types of preference data, and the previously proposed Bayesian Mallows Model (BMM) offers a computationally efficient framework for Bayesian inference that also captures the users' heterogeneity via a finite mixture. We develop a Bayesian Mallows-based finite mixture model that performs clustering while also accounting for assessor-related features, called the Bayesian Mallows model with covariates (BMMx). BMMx is based on a similarity function that a priori favours the aggregation of assessors into a cluster when their covariates are similar, following the Product Partition models (PPMx) proposal. We present two approaches to measure the covariate similarity: one based on a novel deterministic function measuring the covariates' goodness-of-fit to the cluster, and one based on an augmented model as in PPMx. We investigate the performance of BMMx in both simulation experiments and real-data examples, showing the method's potential for advancing the understanding of assessor preferences and behaviors in different applications.

arXiv.org e-Print Archive

Personalized Treatment Selection via Product Partition Models with Covariates

Author: Argiento Raffaele
Pedone Matteo
Stingo Francesco C.
Publication date: 01/09/2023

Precision medicine is an approach for disease treatment that defines treatment strategies based on the individual characteristics of the patients. Motivated by an open problem in cancer genomics, we develop a novel model that flexibly clusters patients with similar predictive characteristics and similar treatment responses; this approach identifies, via predictive inference, which one among a set of treatments is better suited for a new patient. The proposed method is fully model-based, avoiding the underestimation of uncertainty that arises when treatment assignment is performed by adopting heuristic clustering procedures, and belongs to the class of product partition models with covariates, here extended to include the cohesion induced by the Normalized Generalized Gamma process. In simulation studies, the method performs particularly well in scenarios characterized by considerable heterogeneity of the predictive covariates. A cancer genomics case study illustrates the potential benefits in terms of treatment response yielded by the proposed approach. Finally, being model-based, the approach allows estimating cluster-specific response probabilities and then identifying patients more likely to benefit from personalized treatment.

Comment: 31 pages, 7 figures

arXiv.org e-Print Archive

Centered Partition Process: Informative Priors for Clustering

Author: Dunson David B.
Herring Amy H.
Olshan Andrew F.
Paganin Sally
Publication date: 29/01/2019

There is a very rich literature proposing Bayesian approaches for clustering starting with a prior probability distribution on partitions. Most approaches assume exchangeability, leading to simple representations in terms of Exchangeable Partition Probability Functions (EPPF). Gibbs-type priors encompass a broad class of such cases, including Dirichlet and Pitman-Yor processes. Even though there have been some proposals to relax the exchangeability assumption, allowing covariate-dependence and partial exchangeability, limited consideration has been given to how to include concrete prior knowledge on the partition. For example, we are motivated by an epidemiological application, in which we wish to cluster birth defects into groups and we have prior knowledge of an initial clustering provided by experts. As a general approach for including such prior knowledge, we propose a Centered Partition (CP) process that modifies the EPPF to favor partitions close to an initial one. Some properties of the CP prior are described, a general algorithm for posterior computation is developed, and we illustrate the methodology through simulation examples and an application to the motivating epidemiology study of birth defects.

arXiv.org e-Print Archive

PubMed Central

Archivio istituzionale della ricerca - Università di Padova

Graph Sphere: From Nodes to Supernodes in Graphical Models

Author: Beskos Alexandros
Boom Willem van den
De Iorio Maria
Jasra Ajay
Publication date: 18/10/2023

High-dimensional data analysis typically focuses on low-dimensional structure, often to aid interpretation and computational efficiency. Graphical models provide a powerful methodology for learning the conditional independence structure in multivariate data by representing variables as nodes and dependencies as edges. Inference is often focused on individual edges in the latent graph. Nonetheless, there is increasing interest in determining more complex structures, such as communities of nodes, for multiple reasons, including more effective information retrieval and better interpretability. In this work, we propose a multilayer graphical model where we first cluster nodes and then, at the second layer, investigate the relationships among groups of nodes. Specifically, nodes are partitioned into "supernodes" with a data-coherent size-biased tessellation prior which combines ideas from Bayesian nonparametrics and Voronoi tessellations. This construct also accounts for dependence of nodes within supernodes. At the second layer, dependence structure among supernodes is modelled through a Gaussian graphical model, where the focus of inference is on "superedges". We provide theoretical justification for our modelling choices. We design tailored Markov chain Monte Carlo schemes, which also enable parallel computations. We demonstrate the effectiveness of our approach for large-scale structure learning in simulations and a transcriptomics application.

Comment: 71 pages, 18 figures

arXiv.org e-Print Archive

Explaining Differences in Voting Patterns Across Voting Domains Using Hierarchical Bayesian Models

Author: Lipman Erin
Moser Scott
Rodriguez Abel
Publication date: 23/02/2024

Spatial voting models of legislators' preferences are used in political science to test theories about their voting behavior. These models posit that legislators' ideologies, as well as the ideologies reflected in votes for and against a bill or measure, exist as points in some low-dimensional space, and that legislators vote for positions that are close to their own ideologies. Bayesian spatial voting models have been developed to test sharp hypotheses about whether a legislator's revealed ideal point differs for two distinct sets of bills. This project extends such a model to identify covariates that explain whether legislators exhibit such differences in ideal points. We use our method to examine voting behavior on procedural versus final passage votes in the U.S. House of Representatives for the 93rd through 113th Congresses. The analysis provides evidence that legislators in the minority party as well as legislators with a moderate constituency are more likely to have different ideal points for procedural versus final passage votes.

arXiv.org e-Print Archive

Flexible clustering via hidden hierarchical Dirichlet priors

Author: Rebaudo G
Publication date: 01/01/2023

Institutional Research Information System University of Turin

Flexible clustering via hidden hierarchical Dirichlet priors

Author: Lijoi Antonio
Pruenster Igor
Rebaudo Giovanni
Publication venue: 'Wiley'
Publication date: 18/01/2022

The Bayesian approach to inference stands out for naturally allowing borrowing information across heterogeneous populations, with different samples possibly sharing the same distribution. A popular Bayesian nonparametric model for clustering probability distributions is the nested Dirichlet process, which, however, has the drawback of grouping distributions in a single cluster when ties are observed across samples. With the goal of achieving a flexible and effective clustering method for both samples and observations, we investigate a nonparametric prior that arises as the composition of two different discrete random structures and derive a closed-form expression for the induced distribution of the random partition, the fundamental tool regulating the clustering behavior of the model. On the one hand, this yields deeper insight into the theoretical properties of the model and, on the other hand, it yields an MCMC algorithm for evaluating Bayesian inferences of interest. Moreover, we single out limitations of this algorithm when working with more than two populations and, consequently, devise an alternative, more efficient sampling scheme, which, as a by-product, allows testing homogeneity between different populations. Finally, we perform a comparison with the nested Dirichlet process and provide illustrative examples on both synthetic and real data.

arXiv.org e-Print Archive

Archivio istituzionale della Ricerca - Bocconi

Pronounced Genetic Structure in a Highly Mobile Coral Reef Fish, Caesio cuning, in the Coral Triangle

Author: Ablan-Lagman Ma Carmen A.
Ackiss Amanda S.
Ambariyanto
Barber Paul H.
Carpenter Kent E.
Crandall Eric D.
Pardede Shinta
Romena November
Publication venue: ODU Digital Commons
Publication date: 01/01/2013

The redbelly yellowtail fusilier Caesio cuning has a tropical Indo-West Pacific range that straddles the Coral Triangle, a region of dynamic geological history and the highest marine biodiversity on the planet. Previous genetic studies in the Coral Triangle indicate the presence of multiple limits to connectivity. However, these studies have focused almost exclusively on benthic, reef-dwelling species. Schooling, reef-associated fusiliers (Perciformes: Caesionidae) account for a sizable portion of the annual reef catch in the Coral Triangle, yet to date, there have been no in-depth studies on the population structure of fusiliers or other mid-water, reef-associated planktivores across this region. We evaluated the genetic population structure of C. cuning using a 382 bp segment of the mitochondrial control region amplified from over 620 fish sampled from 33 localities across the Philippines and Indonesia. Phylogeographic analysis showed that individuals sampled from sites in western Sumatra belong to a distinct Indian Ocean lineage, resulting in pronounced regional structure between western Sumatra and the rest of the Coral Triangle (φCT = 0.4796, p < 0.004). We found additional significant population structure between central Southeast Asia and eastern Indonesia (φCT = 0.0450, p < 0.001). These data, in conjunction with spatial analyses, indicate that there are 2 major lineages of C. cuning and at least 3 distinct management units across the region. The location of genetic breaks, as well as the distribution of divergent haplotypes across our sampling range, suggests that current oceanographic patterns could be contributing to observed patterns of structure.

Digital Commons @ CSUMB (California State University, Monterey Bay)

Animo Repository - De La Salle University Research

Old Dominion University