
    Bayesian nonparametric Plackett-Luce models for the analysis of preferences for college degree programmes

    In this paper we propose a Bayesian nonparametric model for clustering partial ranking data. We start by developing a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a completely random measure. We characterise the posterior distribution given data, and derive a simple and effective Gibbs sampler for posterior simulation. We then develop a Dirichlet process mixture extension of our model and apply it to investigate the clustering of preferences for college degree programmes amongst Irish secondary school graduates. We establish the existence of clusters of applicants who have similar preferences for degree programmes, and we determine that subject matter and the geographical location of the third-level institution characterise these clusters. Comment: Published at http://dx.doi.org/10.1214/14-AOAS717 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
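    For orientation, the finite-item Plackett-Luce model that the abstract extends can be sketched as follows. This is a minimal illustration, not the paper's nonparametric model or its Gibbs sampler: the function name and the dictionary of item ratings are assumptions made for this example. Each position of a partial (top-m) ranking is filled by choosing among the items not yet ranked, with probability proportional to a positive rating.

```python
import math


def plackett_luce_log_likelihood(ranking, weights):
    """Log-probability of a partial (top-m) ranking under Plackett-Luce.

    ranking: items in order of preference, most preferred first; it may
             list only the top m of the available items.
    weights: dict mapping every available item to a positive rating
             (hypothetical input format chosen for this sketch).
    """
    remaining = sum(weights.values())  # total rating of items not yet ranked
    log_lik = 0.0
    for item in ranking:
        # Probability of picking `item` next among the remaining items.
        log_lik += math.log(weights[item]) - math.log(remaining)
        remaining -= weights[item]
    return log_lik
```

    With ratings a = 2, b = 1, c = 1, the partial ranking (a, b) has probability 2/4 × 1/2 = 0.25.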

    Bayesian econometrics: conjugate analysis and rejection sampling using Mathematica

    Mathematica is a powerful "system for doing mathematics by computer" which runs on personal computers (Macs and MS-DOS machines), workstations and mainframes. Here we show how Bayesian methods can be implemented in Mathematica. One of the drawbacks of Bayesian techniques is that they are computation-intensive, and every computation is a little different. Since Mathematica is so flexible, it can easily be adapted to solve a number of different Bayesian estimation problems. We illustrate the use of Mathematica functions (i) in a traditional conjugate analysis of the linear regression model and (ii) in a completely nonstandard model, where rejection sampling is used to sample from the posterior.
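    As a rough illustration of rejection sampling from a posterior, the sketch below is written in Python rather than Mathematica and uses a toy model that is an assumption of this example, not the paper's: k successes in n Bernoulli trials with a uniform prior on the success probability. Candidates are drawn from the uniform prior and accepted with probability equal to the unnormalised posterior divided by its maximum.

```python
import random


def rejection_sample_posterior(k, n, num_draws, rng):
    """Draw from the posterior of a Bernoulli success probability.

    Assumes a Uniform(0, 1) prior and k successes in n > 0 trials, so the
    unnormalised posterior is theta**k * (1 - theta)**(n - k).
    """
    def unnormalised_posterior(theta):
        return theta ** k * (1.0 - theta) ** (n - k)

    # The density peaks at theta = k / n, which gives the envelope height.
    bound = unnormalised_posterior(k / n)
    draws = []
    while len(draws) < num_draws:
        theta = rng.random()  # proposal from the uniform prior
        if rng.random() * bound <= unnormalised_posterior(theta):
            draws.append(theta)  # accept with probability density / bound
    return draws
```

    In this toy case the exact posterior is Beta(k + 1, n - k + 1), so the sample mean of the draws can be checked against (k + 1) / (n + 2).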

    parallelMCMCcombine: An R Package for Bayesian Methods for Big Data and Analytics

    Recent advances in big data and analytics research have provided a wealth of large data sets that are too big to be analyzed in their entirety, due to restrictions on computer memory or storage size. New Bayesian methods have been developed for data sets that are large only because of their sample sizes; these methods partition big data sets into subsets, and perform independent Bayesian Markov chain Monte Carlo analyses on the subsets. The methods then combine the independent subset posterior samples to estimate a posterior density given the full data set. These approaches were shown to be effective for Bayesian models including logistic regression models, Gaussian mixture models and hierarchical models. Here, we introduce the R package parallelMCMCcombine which carries out four of these techniques for combining independent subset posterior samples. We illustrate each of the methods using a Bayesian logistic regression model for simulation data and a Bayesian Gamma model for real data; we also demonstrate features and capabilities of the R package. The package assumes the user has carried out the Bayesian analysis and has produced the independent subposterior samples outside of the package. The methods are primarily suited to models with unknown parameters of fixed dimension that exist in continuous parameter spaces. We envision this tool will allow researchers to explore the various methods for their specific applications, and will assist future progress in this rapidly developing field. Comment: for published version see: http://www.plosone.org/article/fetchObject.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0108425&representation=PD
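    The idea of combining subset posterior samples can be sketched with one well-known rule, inverse-variance weighted averaging of draws in the style of consensus Monte Carlo. This is a Python sketch for a single scalar parameter, not the R package's API; the function name is hypothetical, and whether this rule corresponds exactly to one of the package's four techniques is not stated in the abstract.

```python
import random
import statistics


def combine_subposteriors(subset_draws):
    """Combine draws for one scalar parameter from independent subset analyses.

    subset_draws: list of lists; subset_draws[s][t] is draw t from the MCMC
                  run on data subset s. Draw t from every subset is averaged
                  with weights proportional to the inverse sample variance
                  of that subset's draws.
    """
    weights = [1.0 / statistics.variance(draws) for draws in subset_draws]
    total = sum(weights)
    num_draws = min(len(draws) for draws in subset_draws)
    return [
        sum(w * draws[t] for w, draws in zip(weights, subset_draws)) / total
        for t in range(num_draws)
    ]
```

    When every subposterior is Gaussian, the combined draws have mean equal to the precision-weighted mean of the subset means and variance equal to the inverse of the summed precisions; for non-Gaussian subposteriors the rule is only an approximation.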

    A Fast Algorithm for Sampling from the Posterior of a von Mises distribution

    Motivated by molecular biology, there has been an upsurge of research activity in directional statistics in general and its Bayesian aspects in particular. The central distribution for the circular case is the von Mises distribution, which has two parameters (mean and concentration) akin to the univariate normal distribution. However, sampling efficiently from the posterior distribution of the concentration parameter has remained a challenge. We describe a novel, highly efficient algorithm to sample from this posterior distribution and fill this long-standing gap.

    Improving Classification When a Class Hierarchy is Available Using a Hierarchy-Based Prior

    We introduce a new method for building classification models when we have prior knowledge of how the classes can be arranged in a hierarchy, based on how easily they can be distinguished. The new method uses a Bayesian form of the multinomial logit (MNL, a.k.a. "softmax") model, with a prior that introduces correlations between the parameters for classes that are nearby in the tree. We compare the performance on simulated data of the new method, the ordinary MNL model, and a model that uses the hierarchy in a different way. We also test the new method on a document labelling problem, and find that it performs better than the other methods, particularly when the amount of training data is small.
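    One simple way to obtain a prior in which nearby classes have correlated parameters is sketched below. The construction is an assumption of this example and is not taken from the abstract: each class parameter is the sum of independent Gaussian effects attached to the tree nodes on the path from the root to that class, so classes that share ancestors share terms. The function name, the path format and the class labels in the usage note are all hypothetical.

```python
import random


def sample_class_parameters(paths, node_scale, rng):
    """Draw one scalar parameter per class from a tree-structured prior.

    paths: dict mapping each class to the list of node names on its path
           from just below the root down to its own leaf.
    node_scale: standard deviation of every node effect (assumed equal
                across nodes to keep the sketch short).
    """
    node_effects = {}
    params = {}
    for cls, path in paths.items():
        total = 0.0
        for node in path:
            # Each node's effect is drawn once and reused by all classes
            # whose path passes through that node.
            if node not in node_effects:
                node_effects[node] = rng.gauss(0.0, node_scale)
            total += node_effects[node]
        params[cls] = total
    return params
```

    With paths such as cat = [animal, cat], dog = [animal, dog] and car = [vehicle, car], the parameters for cat and dog share the animal effect and so have prior covariance node_scale squared, while cat and car are independent a priori.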

    Bayesian inference and non-linear extensions of the CIRCE method for quantifying the uncertainty of closure relationships integrated into thermal-hydraulic system codes

    Uncertainty quantification of closure relationships integrated into thermal-hydraulic system codes is a critical prerequisite for applying the Best-Estimate Plus Uncertainty (BEPU) methodology in nuclear safety and licensing processes. The purpose of the CIRCE method is to estimate the (log)-Gaussian probability distribution of a multiplicative factor applied to a reference closure relationship in order to assess its uncertainty. Even though this method has been implemented with success in numerous physical scenarios, it can still suffer from substantial limitations such as the linearity assumption and the difficulty of properly taking into account the inherent statistical uncertainty. In this paper, we extend the CIRCE method in two respects. On the one hand, we adopt the Bayesian setting, putting prior probability distributions on the parameters of the (log)-Gaussian distribution. The posterior distribution of the parameters is then computed with respect to an experimental database by means of Markov Chain Monte Carlo (MCMC) algorithms. On the other hand, we tackle the more general setting where the simulations do not vary linearly with the multiplicative factor(s). MCMC algorithms then become time-prohibitive when the thermal-hydraulic simulations exceed a few minutes. This handicap is overcome by using Gaussian process (GP) emulators, which can yield both reliable and fast predictions of the simulations. The GP-based MCMC algorithms will be applied to quantify the uncertainty of two condensation closure relationships at a safety injection with respect to a database of experimental tests. The thermal-hydraulic simulations will be run with the CATHARE 2 computer code. Comment: 37 pages, 5 figures