Bayesian spline-based hidden Markov models with applications to actimetry data and sleep analysis
B-spline-based hidden Markov models employ B-splines to specify the emission distributions, offering a more flexible approach to modeling data than conventional parametric HMMs. We introduce a Bayesian framework for inference, enabling the simultaneous estimation of all unknown model parameters, including the number of states. A parsimonious knot configuration of the B-splines is identified by the use of a trans-dimensional Markov chain sampling algorithm, while model selection regarding the number of states can be performed based on the marginal likelihood within a parallel sampling framework. Using extensive simulation studies, we demonstrate the superiority of our methodology over alternative approaches, as well as its robustness and scalability. We illustrate the explorative use of our methods on animal activity data, namely for whitetip sharks. The flexibility of our Bayesian approach also facilitates the incorporation of more realistic assumptions, and we demonstrate this by developing a novel hierarchical conditional HMM to analyse human activity for circadian and sleep modeling. Supplementary materials for this article are available online.
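As a rough illustration of the emission construction described in this abstract (not the authors' code; the knot vector, degree, and weights are all assumed), a state's emission density can be written as a weighted combination of B-spline basis functions evaluated by the Cox-de Boor recursion:

```python
def bspline_basis(j, k, t, x):
    """Cox-de Boor recursion: j-th B-spline basis of degree k on knots t, at x."""
    if k == 0:
        return 1.0 if t[j] <= x < t[j + 1] else 0.0
    val = 0.0
    if t[j + k] != t[j]:
        val += (x - t[j]) / (t[j + k] - t[j]) * bspline_basis(j, k - 1, t, x)
    if t[j + k + 1] != t[j + 1]:
        val += (t[j + k + 1] - x) / (t[j + k + 1] - t[j + 1]) * bspline_basis(j + 1, k - 1, t, x)
    return val

def emission_density(x, weights, knots, degree=3):
    """f(x) = sum_j w_j B_j(x); the weights must be chosen so f integrates to 1."""
    n_basis = len(knots) - degree - 1
    return sum(weights[j] * bspline_basis(j, degree, knots, x) for j in range(n_basis))
```

With clamped knots the basis functions form a partition of unity on the interior of the domain; in the Bayesian model of the paper, both the knot configuration and the weights are sampled rather than fixed.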
Bayesian CART models for insurance claims frequency
The accuracy and interpretability of a (non-life) insurance pricing model are essential qualities to ensure fair and transparent premiums for policy-holders that reflect their risk. In recent years, classification and regression trees (CARTs) and their ensembles have gained popularity in the actuarial literature, since they offer good prediction performance and are relatively easy to interpret. In this paper, we introduce Bayesian CART models for insurance pricing, with a particular focus on claims frequency modelling. In addition to the common Poisson and negative binomial (NB) distributions used for claims frequency, we implement Bayesian CART for the zero-inflated Poisson (ZIP) distribution to address the difficulty arising from imbalanced insurance claims data. To this end, we introduce a general MCMC algorithm using data augmentation methods for posterior tree exploration. We also introduce the deviance information criterion (DIC) for tree model selection. The proposed models are able to identify trees which can better classify the policy-holders into risk groups. Simulations and real insurance data are used to illustrate the applicability of these models.
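For readers unfamiliar with the ZIP response mentioned above, a minimal sketch (parameter names assumed) of its probability mass function, which mixes a point mass at zero with an ordinary Poisson count:

```python
import math

def zip_pmf(k, pi, lam):
    """P(N = k) under a zero-inflated Poisson: structural-zero prob pi, Poisson rate lam."""
    pois = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * pois   # excess zeros plus ordinary Poisson zeros
    return (1.0 - pi) * pois
```

The extra mass at zero is what lets the model absorb the many policy-holders with no claims without distorting the count distribution of those who do claim.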
Bayesian Forecasting in Economics and Finance: A Modern Review
The Bayesian statistical paradigm provides a principled and coherent approach
to probabilistic forecasting. Uncertainty about all unknowns that characterize
any forecasting problem -- model, parameters, latent states -- can be
quantified explicitly, and factored into the forecast distribution via the
process of integration or averaging. Allied with the elegance of the method,
Bayesian forecasting is now underpinned by the burgeoning field of Bayesian
computation, which enables Bayesian forecasts to be produced for virtually any
problem, no matter how large, or complex. The current state of play in Bayesian
forecasting in economics and finance is the subject of this review. The aim is
to provide the reader with an overview of modern approaches to the field, set
in some historical context; and with sufficient computational detail given to
assist the reader with implementation.
Comment: The paper is now published online at https://doi.org/10.1016/j.ijforecast.2023.05.00
Identifiable and interpretable nonparametric factor analysis
Factor models have been widely used to summarize the variability of
high-dimensional data through a set of factors with much lower dimensionality.
Gaussian linear factor models have been particularly popular due to their
interpretability and ease of computation. However, in practice, data often
violate the multivariate Gaussian assumption. To characterize higher-order
dependence and nonlinearity, models that include factors as predictors in
flexible multivariate regression are popular, with GP-LVMs using Gaussian
process (GP) priors for the regression function and VAEs using deep neural
networks. Unfortunately, such approaches lack identifiability and
interpretability and tend to produce brittle and non-reproducible results. To
address these problems by simplifying the nonparametric factor model while
maintaining flexibility, we propose the NIFTY framework, which parsimoniously
transforms uniform latent variables using one-dimensional nonlinear mappings
and then applies a linear generative model. The induced multivariate
distribution falls into a flexible class while maintaining simple computation
and interpretation. We prove that this model is identifiable and empirically
study NIFTY using simulated data, observing good performance in density
estimation and data visualization. We then apply NIFTY to bird song data in an
environmental monitoring application.
Comment: 50 pages, 17 figures
Bayesian computation in astronomy: novel methods for parallel and gradient-free inference
The goal of this thesis is twofold: to introduce the fundamentals of Bayesian inference and computation, focusing on astronomical and cosmological applications, and to present recent advances in probabilistic computational methods developed by the author that aim to facilitate Bayesian data analysis for the next generation of astronomical observations and theoretical models.
The first part of this thesis familiarises the reader with the notion of probability and its relevance for science through the prism of Bayesian reasoning, by introducing the key constituents of the theory and discussing its best practices. The second part includes a pedagogical introduction to the principles of Bayesian computation motivated by the geometric characteristics of probability distributions and followed by a detailed exposition of various methods including Markov chain Monte Carlo (MCMC), Sequential Monte Carlo (SMC) and Nested Sampling (NS). Finally, the third part presents two novel computational methods and their respective software implementations.
The first such development is Ensemble Slice Sampling (ESS), a new class of MCMC algorithms that extend the applicability of the standard Slice Sampler by adaptively tuning its only hyperparameter and utilising an ensemble of parallel walkers in order to efficiently handle strong correlations between parameters. The parallel, black-box and gradient-free nature of the method renders it ideal for use in combination with the computationally expensive and non-differentiable models often encountered in astronomy. ESS is implemented in Python in the well-tested and open-source software package called zeus, which is specifically designed to tackle the computational challenges posed by modern astronomical and cosmological analyses. In particular, use of the code requires minimal, if any, hand-tuning of hyperparameters, while its performance is insensitive to linear correlations and it can scale up to thousands of CPUs without any extra effort.
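As background, the standard univariate slice sampler that ESS generalises can be sketched as follows (this is Neal's stepping-out and shrinkage scheme, not the zeus implementation; the step size w is the single hyperparameter that ESS tunes adaptively):

```python
import math
import random

def slice_sample(logp, x0, w=1.0, n_samples=1000, rng=None):
    """Univariate slice sampling with stepping-out and shrinkage (Neal, 2003)."""
    rng = rng or random.Random(0)
    x, samples = x0, []
    for _ in range(n_samples):
        # draw the slice level: log p(x) + log u, with u ~ Uniform(0, 1]
        logy = logp(x) + math.log(1.0 - rng.random())
        # stepping out: expand an interval of width w until it brackets the slice
        left = x - w * rng.random()
        right = left + w
        while logp(left) > logy:
            left -= w
        while logp(right) > logy:
            right += w
        # shrinkage: propose uniformly, shrinking the interval on each rejection
        while True:
            x_new = left + rng.random() * (right - left)
            if logp(x_new) > logy:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        samples.append(x)
    return samples
```

ESS replaces the fixed proposal direction and hand-tuned w with directions constructed from an ensemble of walkers, which is what makes it robust to strongly correlated targets.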
The next contribution is the introduction of Preconditioned Monte Carlo (PMC), a novel Monte Carlo method for Bayesian inference that facilitates effective sampling of probability distributions with non-trivial geometry. PMC utilises a Normalising Flow (NF) to decorrelate the parameters of the distribution and then proceeds by sampling from the preconditioned target distribution using an adaptive SMC scheme. PMC, through its Python implementation pocoMC, achieves excellent sampling performance, including accurate estimation of the model evidence, for highly correlated, non-Gaussian, and multimodal target distributions. Finally, the code is directly parallelisable, manifesting linear scaling up to thousands of CPUs.
Mean-field Variational Inference via Wasserstein Gradient Flow
Variational inference, such as the mean-field (MF) approximation, requires
certain conjugacy structures for efficient computation. These can impose
unnecessary restrictions on the viable prior distribution family and further
constraints on the variational approximation family. In this work, we introduce
a general computational framework to implement MF variational inference for
Bayesian models, with or without latent variables, using the Wasserstein
gradient flow (WGF), a modern mathematical technique for realizing a gradient
flow over the space of probability measures. Theoretically, we analyze the
algorithmic convergence of the proposed approaches, providing an explicit
expression for the contraction factor. We also strengthen existing results on
MF variational posterior concentration from a polynomial to an exponential
contraction, by utilizing the fixed point equation of the time-discretized WGF.
Computationally, we propose a new constraint-free function approximation method
using neural networks to numerically realize our algorithm. This method is
shown to be more precise and efficient than traditional particle approximation
methods based on Langevin dynamics.
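To make the conjugacy point concrete, here is the textbook coordinate-ascent mean-field update for a bivariate Gaussian target (a classical illustration, not the paper's WGF algorithm): each factor's mean has a closed-form update only because of the Gaussian structure.

```python
def cavi_bivariate_gaussian(mu, lam, n_iters=50):
    """Mean-field CAVI for N(mu, lam^{-1}); returns the factor means (m1, m2)."""
    m1, m2 = 0.0, 0.0
    for _ in range(n_iters):
        # each update conditions on the current mean of the other factor
        m1 = mu[0] - lam[0][1] / lam[0][0] * (m2 - mu[1])
        m2 = mu[1] - lam[1][0] / lam[1][1] * (m1 - mu[0])
    return m1, m2
```

The WGF approach of the paper removes the need for such closed forms by following the gradient flow of the KL objective directly over the space of probability measures.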
Reasoning about quantities and concepts: studies in social learning
We live and learn in a ‘society of mind’. This means that we form beliefs not
just based on our own observations and prior expectations but also based on the
communications from other people, such as our social network peers. Across seven
experiments, I study how people combine their own private observations with other
people’s communications to form and update beliefs about the environment. I will
follow the tradition of rational analysis and benchmark human learning against optimal Bayesian inference at Marr’s computational level. To accommodate human
resource constraints and cognitive biases, I will further contrast human learning
with a variety of process level accounts. In Chapters 2–4, I examine how people
reason about simple environmental quantities. I will focus on the effect of dependent information sources on the success of group and individual learning across a
series of single-player and multi-player judgement tasks. Overall, the results from
Chapters 2–4 highlight the nuances of real social network dynamics and provide
insights into the conditions under which we can expect collective success versus
failures such as the formation of inaccurate worldviews. In Chapter 5, I develop a
more complex social learning task which goes beyond estimation of environmental
quantities and focuses on inductive inference with symbolic concepts. Here, I investigate how people search compositional theory spaces to form and adapt their
beliefs, and how symbolic belief adaptation interfaces with individual and social
learning in a challenging active learning task. Results from Chapter 5 suggest that
people might explore compositional theory spaces using local incremental search;
and that it is difficult for people to use another person’s learning data to improve
upon their hypothesis.
Personalized Treatment Selection via Product Partition Models with Covariates
Precision medicine is an approach for disease treatment that defines
treatment strategies based on the individual characteristics of the patients.
Motivated by an open problem in cancer genomics, we develop a novel model that
flexibly clusters patients with similar predictive characteristics and similar
treatment responses; this approach identifies, via predictive inference, which
one among a set of treatments is better suited for a new patient. The proposed
method is fully model-based, avoiding the underestimation of uncertainty incurred
when treatment assignment is performed via heuristic clustering procedures,
and belongs to the class of product partition models with covariates, here
extended to include the cohesion induced by the Normalized Generalized Gamma
process. The method performs particularly well in scenarios characterized by
considerable heterogeneity of the predictive covariates in simulation studies.
A cancer genomics case study illustrates the potential benefits in terms of
treatment response yielded by the proposed approach. Finally, being
model-based, the approach allows estimating clusters' specific response
probabilities and then identifying patients more likely to benefit from
personalized treatment.
Comment: 31 pages, 7 figures
Cosmology with the Laser Interferometer Space Antenna
The Laser Interferometer Space Antenna (LISA) has two scientific objectives of cosmological focus: to probe the expansion rate of the universe, and to understand stochastic gravitational-wave backgrounds and their implications for early universe and particle physics, from the MeV to the Planck scale. However, the range of potential cosmological applications of gravitational-wave observations extends well beyond these two objectives. This publication presents a summary of the state of the art in LISA cosmology, theory and methods, and identifies new opportunities to use gravitational-wave observations by LISA to probe the universe.
Dimension-Grouped Mixed Membership Models for Multivariate Categorical Data
Mixed Membership Models (MMMs) are a popular family of latent structure
models for complex multivariate data. Instead of forcing each subject to belong
to a single cluster, MMMs incorporate a vector of subject-specific weights
characterizing partial membership across clusters. With this flexibility come
challenges in uniquely identifying, estimating, and interpreting the
parameters. In this article, we propose a new class of Dimension-Grouped MMMs
(Gro-Ms) for multivariate categorical data, which improve parsimony and
interpretability. In Gro-Ms, observed variables are partitioned into groups
such that the latent membership is constant for variables within a group but
can differ across groups. Traditional latent class models are obtained when all
variables are in one group, while traditional MMMs are obtained when each
variable is in its own group. The new model corresponds to a novel
decomposition of probability tensors. Theoretically, we derive transparent
identifiability conditions for both the unknown grouping structure and model
parameters in general settings. Methodologically, we propose a Bayesian
approach for Dirichlet Gro-Ms for inferring the variable grouping structure
and estimating model parameters. Simulation results demonstrate good
computational performance and empirically confirm the identifiability results.
We illustrate the new methodology through applications to a functional
disability survey dataset and a personality test dataset.
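A hypothetical generative sketch of the dimension-grouped structure described above (all names, groupings, and parameters are illustrative, not the paper's): within each variable group, a subject reuses a single latent class drawn from its Dirichlet-distributed membership weights, while different groups may use different classes.

```python
import random

def sample_subject(groups, emission, alpha, rng=None):
    """Generate one subject's categorical responses under a dimension-grouped MMM."""
    rng = rng or random.Random(0)
    # Dirichlet(alpha) membership weights via normalised Gamma draws
    gam = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gam)
    weights = [g / total for g in gam]
    data = {}
    for variables in groups.values():
        # one latent class per group, shared by every variable in the group
        z = rng.choices(range(len(weights)), weights=weights)[0]
        for v in variables:
            probs = emission[v][z]   # categorical emission for variable v, class z
            data[v] = rng.choices(range(len(probs)), weights=probs)[0]
    return data
```

Putting all variables in one group recovers a latent class model, while one group per variable recovers a traditional MMM, mirroring the two limiting cases noted in the abstract.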