Bayesian nonparametric models for ranked data
We develop a Bayesian nonparametric extension of the popular Plackett-Luce
choice model that can handle an infinite number of choice items. Our framework
is based on the theory of random atomic measures, with the prior specified by a
gamma process. We derive a posterior characterization and a simple and
effective Gibbs sampler for posterior simulation. We develop a time-varying
extension of our model, and apply it to the New York Times lists of weekly
bestselling books.
Comment: NIPS - Neural Information Processing Systems (2012)
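For orientation, the finite-item Plackett-Luce likelihood that the abstract above builds on can be sketched as follows. This is only an illustrative sketch of the classical model with a fixed, finite set of items; it does not implement the gamma-process prior or the Gibbs sampler of the paper, and the function name `plackett_luce_log_likelihood` and the example weights are assumptions made here.

```python
import math


def plackett_luce_log_likelihood(ranking, weights):
    """Log-probability of a (possibly partial, top-k) ranking under Plackett-Luce.

    ranking: list of item ids, most preferred first.
    weights: dict mapping every item id to a positive weight (hypothetical values).
    """
    remaining = sum(weights.values())  # total weight of items not yet chosen
    log_prob = 0.0
    for item in ranking:
        # Each item is chosen with probability proportional to its weight
        # among the items that have not been ranked yet.
        log_prob += math.log(weights[item]) - math.log(remaining)
        remaining -= weights[item]
    return log_prob


weights = {"a": 3.0, "b": 2.0, "c": 1.0}
full = plackett_luce_log_likelihood(["a", "b", "c"], weights)  # full ranking
top1 = plackett_luce_log_likelihood(["a"], weights)            # top-1 partial ranking
```

With these weights the full ranking a, b, c has probability (3/6)(2/3)(1/1) = 1/3, and the top-1 choice of a has probability 1/2.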
Bayesian nonparametrics for Sparse Dynamic Networks
We propose a Bayesian nonparametric prior for time-varying networks. To each
node of the network is associated a positive parameter, modeling the
sociability of that node. Sociabilities are assumed to evolve over time, and
are modeled via a dynamic point process model. The model is able to (a) capture
smooth evolution of the interaction between nodes, allowing edges to
appear/disappear over time, (b) capture long-term evolution of the sociabilities
of the nodes, and (c) yield sparse graphs, where the number of edges grows
subquadratically with the number of nodes. The evolution of the sociabilities
is described by a tractable time-varying gamma process. We provide some
theoretical insights into the model and apply it to three real-world datasets.
Comment: 10 pages, 8 figures
Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling
The beta-negative binomial process (BNBP), an integer-valued stochastic
process, is employed to partition a count vector into a latent random count
matrix. As the marginal probability distribution of the BNBP that governs the
exchangeable random partitions of grouped data has not yet been developed,
current inference for the BNBP has to truncate the number of atoms of the beta
process. This paper introduces an exchangeable partition probability function
to explicitly describe how the BNBP clusters the data points of each group into
a random number of exchangeable partitions, which are shared across all the
groups. A fully collapsed Gibbs sampler is developed for the BNBP, leading to a
novel nonparametric Bayesian topic model that is distinct from existing ones,
with simple implementation, fast convergence, good mixing, and state-of-the-art
predictive performance.
Comment: in Neural Information Processing Systems (NIPS) 2014. 9 pages + 3 page appendix
A Voting-Based System for Ethical Decision Making
We present a general approach to automating ethical decisions, drawing on
machine learning and computational social choice. In a nutshell, we propose to
learn a model of societal preferences, and, when faced with a specific ethical
dilemma at runtime, efficiently aggregate those preferences to identify a
desirable choice. We provide a concrete algorithm that instantiates our
approach; some of its crucial steps are informed by a new theory of
swap-dominance efficient voting rules. Finally, we implement and evaluate a
system for ethical decision making in the autonomous vehicle domain, using
preference data collected from 1.3 million people through the Moral Machine
website.
Comment: 25 pages; paper has been reorganized, related work and discussion sections have been expanded
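The aggregation step described in the abstract above (combining many individual preference rankings into one choice) can be illustrated with a simple voting rule. The sketch below uses the Borda count purely as an example; the abstract does not say which rule the authors' system uses, and the function name `borda_winner` and the toy rankings are assumptions made here.

```python
from collections import defaultdict


def borda_winner(rankings):
    """Aggregate rankings (most preferred first) with the Borda count.

    An alternative in position p of a ranking over m alternatives
    receives m - 1 - p points. Ties are broken alphabetically.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for position, alternative in enumerate(ranking):
            scores[alternative] += m - 1 - position
    # sorted() fixes the iteration order so that max() breaks ties alphabetically.
    winner = max(sorted(scores), key=lambda a: scores[a])
    return winner, dict(scores)


# Three hypothetical voters ranking three hypothetical alternatives.
rankings = [["x", "y", "z"], ["y", "x", "z"], ["x", "z", "y"]]
winner, scores = borda_winner(rankings)
```

Here x collects 2 + 1 + 2 = 5 points, y collects 3 and z collects 1, so x is selected.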
Bayesian nonparametric Plackett-Luce models for the analysis of preferences for college degree programmes
In this paper we propose a Bayesian nonparametric model for clustering
partial ranking data. We start by developing a Bayesian nonparametric extension
of the popular Plackett-Luce choice model that can handle an infinite number of
choice items. Our framework is based on the theory of random atomic measures,
with the prior specified by a completely random measure. We characterise the
posterior distribution given data, and derive a simple and effective Gibbs
sampler for posterior simulation. We then develop a Dirichlet process mixture
extension of our model and apply it to investigate the clustering of
preferences for college degree programmes amongst Irish secondary school
graduates. The existence of clusters of applicants who have similar preferences
for degree programmes is established and we determine that subject matter and
geographical location of the third level institution characterise these
clusters.
Comment: Published at http://dx.doi.org/10.1214/14-AOAS717 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
An R package for analyzing and modeling ranking data
A review on competing risks methods for survival analysis
Several techniques for modelling competing risks survival data have been
proposed in both the statistical and machine learning literature.
State-of-the-art methods have extended classical approaches with more flexible
assumptions that can improve predictive performance and accommodate
high-dimensional data and missing values, among other benefits. Despite this, modern approaches have not
been widely employed in applied settings. This article aims to aid the uptake
of such methods by providing a condensed compendium of competing risks survival
methods with a unified notation and interpretation across approaches. We
highlight available software and, when possible, demonstrate their usage via
reproducible R vignettes. Moreover, we discuss two major concerns that can
affect benchmark studies in this context: the choice of performance metrics and
reproducibility.
Comment: 22 pages, 2 tables
Predictive inference with Fleming–Viot-driven dependent Dirichlet processes
We consider predictive inference using a class of temporally dependent
Dirichlet processes driven by Fleming–Viot diffusions, which have a natural
bearing in Bayesian nonparametrics and lend the resulting family of random
probability measures to analytical posterior analysis. Formulating the implied
statistical model as a hidden Markov model, we fully describe the predictive
distribution induced by these Fleming–Viot-driven dependent Dirichlet
processes, for a sequence of observations collected at a certain time given
another set of draws collected at several previous times. This is identified as
a mixture of Pólya urns, whereby the observations can be values from the
baseline distribution or copies of previous draws collected at the same time as
in the usual Pólya urn, or can be sampled from a random subset of the data
collected at previous times. We characterise the time-dependent weights of the
mixture which select such subsets and discuss the asymptotic regimes. We
describe the induced partition by means of a Chinese restaurant process
metaphor with a conveyor belt, whereby new customers who do not sit at an
occupied table open a new table by picking a dish either from the baseline
distribution or from a time-varying offer available on the conveyor belt. We
lay out explicit algorithms for exact and approximate posterior sampling of
both observations and partitions, and illustrate our results on predictive
problems with synthetic and real data.
Comment: 30 pages, 8 figures
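The "usual Pólya urn" that the abstract above refers to can be sketched as follows. This is the standard, static urn scheme for a single Dirichlet process, not the time-dependent mixture of urns derived in the paper; the function name `polya_urn_draws`, the concentration value `theta=2.0`, and the standard normal baseline are assumptions chosen for illustration.

```python
import random


def polya_urn_draws(n, theta, base_sampler, rng):
    """Draw n values sequentially from a Polya urn with concentration theta.

    The (i+1)-th draw is a fresh value from the baseline distribution with
    probability theta / (theta + i); otherwise it copies one of the i
    previous draws chosen uniformly at random.
    """
    draws = []
    for i in range(n):
        # For i == 0 the probability is 1, so the first draw is always new.
        if rng.random() < theta / (theta + i):
            draws.append(base_sampler(rng))
        else:
            draws.append(rng.choice(draws))
    return draws


rng = random.Random(0)
sample = polya_urn_draws(200, theta=2.0, base_sampler=lambda r: r.gauss(0.0, 1.0), rng=rng)
n_clusters = len(set(sample))  # number of distinct values, i.e. occupied "tables"
```

Grouping equal values in `sample` gives the partition described by the Chinese restaurant process metaphor; the number of distinct values grows roughly logarithmically in the number of draws.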