396 research outputs found
Rank-based Bayesian clustering via covariate-informed Mallows mixtures
Data in the form of rankings, ratings, pair comparisons or clicks are
frequently collected in diverse fields, from marketing to politics, to
understand assessors' individual preferences. Combining such preference data
with features associated with the assessors can lead to a better understanding
of the assessors' behaviors and choices. The Mallows model is a popular model
for rankings, as it flexibly adapts to different types of preference data, and
the previously proposed Bayesian Mallows Model (BMM) offers a computationally
efficient framework for Bayesian inference that can also capture the users'
heterogeneity via a finite mixture. We develop a Bayesian Mallows-based finite
mixture model that performs clustering while also accounting for
assessor-related features, called the Bayesian Mallows model with covariates
(BMMx). BMMx is based on a similarity function that a priori favours the
aggregation of assessors into a cluster when their covariates are similar,
following the product partition model with covariates (PPMx) proposal. We present two approaches
to measure the covariate similarity: one based on a novel deterministic
function measuring the covariates' goodness-of-fit to the cluster, and one
based on an augmented model as in PPMx. We investigate the performance of BMMx
in both simulation experiments and real-data examples, showing the method's
potential for advancing the understanding of assessor preferences and behaviors
in different applications.
Personalized Treatment Selection via Product Partition Models with Covariates
Precision medicine is an approach for disease treatment that defines
treatment strategies based on the individual characteristics of the patients.
Motivated by an open problem in cancer genomics, we develop a novel model that
flexibly clusters patients with similar predictive characteristics and similar
treatment responses; this approach identifies, via predictive inference, which
one among a set of treatments is better suited for a new patient. The proposed
method is fully model-based, avoiding the underestimation of uncertainty that arises when
treatment assignment is performed by adopting heuristic clustering procedures,
and belongs to the class of product partition models with covariates, here
extended to include the cohesion induced by the Normalized Generalized Gamma
process. The method performs particularly well in scenarios characterized by
considerable heterogeneity of the predictive covariates in simulation studies.
A cancer genomics case study illustrates the potential benefits in terms of
treatment response yielded by the proposed approach. Finally, being
model-based, the approach allows estimating cluster-specific response
probabilities and then identifying patients more likely to benefit from
personalized treatment. Comment: 31 pages, 7 figures
Centered Partition Process: Informative Priors for Clustering
There is a very rich literature proposing Bayesian approaches for clustering
starting with a prior probability distribution on partitions. Most approaches
assume exchangeability, leading to simple representations in terms of
Exchangeable Partition Probability Functions (EPPF). Gibbs-type priors
encompass a broad class of such cases, including Dirichlet and Pitman-Yor
processes. Even though there have been some proposals to relax the
exchangeability assumption, allowing covariate-dependence and partial
exchangeability, limited consideration has been given to how to include
concrete prior knowledge on the partition. For example, we are motivated by an
epidemiological application, in which we wish to cluster birth defects into
groups and we have prior knowledge of an initial clustering provided by
experts. As a general approach for including such prior knowledge, we propose a
Centered Partition (CP) process that modifies the EPPF to favor partitions
close to an initial one. Some properties of the CP prior are described, a
general algorithm for posterior computation is developed, and we illustrate the
methodology through simulation examples and an application to the motivating
epidemiology study of birth defects.
Graph Sphere: From Nodes to Supernodes in Graphical Models
High-dimensional data analysis typically focuses on low-dimensional
structure, often to aid interpretation and computational efficiency. Graphical
models provide a powerful methodology for learning the conditional independence
structure in multivariate data by representing variables as nodes and
dependencies as edges. Inference is often focused on individual edges in the
latent graph. Nonetheless, there is increasing interest in determining more
complex structures, such as communities of nodes, for multiple reasons,
including more effective information retrieval and better interpretability. In
this work, we propose a multilayer graphical model where we first cluster nodes
and then, at the second layer, investigate the relationships among groups of
nodes. Specifically, nodes are partitioned into "supernodes" with a
data-coherent size-biased tessellation prior which combines ideas from Bayesian
nonparametrics and Voronoi tessellations. This construct also allows accounting
for dependence of nodes within supernodes. At the second layer, dependence
structure among supernodes is modelled through a Gaussian graphical model,
where the focus of inference is on "superedges". We provide theoretical
justification for our modelling choices. We design tailored Markov chain Monte
Carlo schemes, which also enable parallel computations. We demonstrate the
effectiveness of our approach for large-scale structure learning in simulations
and a transcriptomics application. Comment: 71 pages, 18 figures
Explaining Differences in Voting Patterns Across Voting Domains Using Hierarchical Bayesian Models
Spatial voting models of legislators' preferences are used in political
science to test theories about their voting behavior. These models posit that
legislators' ideologies as well as the ideologies reflected in votes for and
against a bill or measure exist as points in some low dimensional space, and
that legislators vote for positions that are close to their own ideologies.
Bayesian spatial voting models have been developed to test sharp hypotheses
about whether a legislator's revealed ideal point differs for two distinct sets
of bills. This project extends such a model to identify covariates that explain
whether legislators exhibit such differences in ideal points. We use our method
to examine voting behavior on procedural versus final passage votes in the U.S.
House of Representatives for the 93rd through 113th Congresses. The analysis
provides evidence that legislators in the minority party as well as legislators
with a moderate constituency are more likely to have different ideal points for
procedural versus final passage votes.
Flexible clustering via hidden hierarchical Dirichlet priors
The Bayesian approach to inference stands out for naturally allowing the borrowing of information across heterogeneous populations, with different samples possibly sharing the same distribution. A popular Bayesian nonparametric model for clustering probability distributions is the nested Dirichlet process, which, however, has the drawback of grouping distributions in a single cluster when ties are observed across samples. With the goal of achieving a flexible and effective clustering method for both samples and observations, we investigate a nonparametric prior that arises as the composition of two different discrete random structures and derive a closed-form expression for the induced distribution of the random partition, the fundamental tool regulating the clustering behavior of the model. On the one hand, this allows us to gain deeper insight into the theoretical properties of the model and, on the other hand, it yields an MCMC algorithm for evaluating Bayesian inferences of interest. Moreover, we single out limitations of this algorithm when working with more than two populations and, consequently, devise an alternative, more efficient sampling scheme which, as a by-product, allows testing homogeneity between different populations. Finally, we perform a comparison with the nested Dirichlet process and provide illustrative examples on both synthetic and real data.
Pronounced Genetic Structure in a Highly Mobile Coral Reef Fish, Caesio cuning, in the Coral Triangle
The redbelly yellowtail fusilier Caesio cuning has a tropical Indo-West Pacific range that straddles the Coral Triangle, a region of dynamic geological history and the highest marine biodiversity on the planet. Previous genetic studies in the Coral Triangle indicate the presence of multiple limits to connectivity. However, these studies have focused almost exclusively on benthic, reef-dwelling species. Schooling, reef-associated fusiliers (Perciformes: Caesionidae) account for a sizable portion of the annual reef catch in the Coral Triangle, yet to date, there have been no in-depth studies on the population structure of fusiliers or other mid-water, reef-associated planktivores across this region. We evaluated the genetic population structure of C. cuning using a 382 bp segment of the mitochondrial control region amplified from over 620 fish sampled from 33 localities across the Philippines and Indonesia. Phylogeographic analysis showed that individuals sampled from sites in western Sumatra belong to a distinct Indian Ocean lineage, resulting in pronounced regional structure between western Sumatra and the rest of the Coral Triangle (φCT = 0.4796, p < 0.004). We found additional significant population structure between central Southeast Asia and eastern Indonesia (φCT = 0.0450, p < 0.001). These data, in conjunction with spatial analyses, indicate that there are 2 major lineages of C. cuning and at least 3 distinct management units across the region. The location of genetic breaks as well as the distribution of divergent haplotypes across our sampling range suggests that current oceanographic patterns could be contributing to observed patterns of structure.