Bayesian nonparametric Plackett-Luce models for the analysis of preferences for college degree programmes
In this paper we propose a Bayesian nonparametric model for clustering
partial ranking data. We start by developing a Bayesian nonparametric extension
of the popular Plackett-Luce choice model that can handle an infinite number of
choice items. Our framework is based on the theory of random atomic measures,
with the prior specified by a completely random measure. We characterise the
posterior distribution given data, and derive a simple and effective Gibbs
sampler for posterior simulation. We then develop a Dirichlet process mixture
extension of our model and apply it to investigate the clustering of
preferences for college degree programmes amongst Irish secondary school
graduates. The existence of clusters of applicants who have similar preferences
for degree programmes is established and we determine that subject matter and
geographical location of the third level institution characterise these
clusters. Comment: Published at http://dx.doi.org/10.1214/14-AOAS717 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
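As a point of reference for the abstract above, the ordinary finite Plackett-Luce model builds a ranking by repeatedly choosing the next item with probability proportional to its weight among the items not yet ranked. The sketch below is a minimal illustration of that finite likelihood for a partial (top-m) ranking; it is not the paper's nonparametric extension or its Gibbs sampler, and the function name and the dictionary-of-weights representation are assumptions made for the example.

```python
import math


def plackett_luce_log_prob(ranking, weights):
    """Log-probability of a top-m partial ranking under a finite Plackett-Luce model.

    ranking: list of item ids, most preferred first (items may be omitted).
    weights: dict mapping every item id to a positive weight (hypothetical layout).
    """
    remaining = sum(weights.values())  # total weight of items not yet ranked
    log_p = 0.0
    for item in ranking:
        w = weights[item]
        # Probability of picking `item` next among the remaining items.
        log_p += math.log(w) - math.log(remaining)
        remaining -= w
    return log_p
```

For example, with weights 1, 2 and 3 for items "a", "b" and "c", the partial ranking ["c", "a"] has probability (3/6) x (1/3) = 1/6.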
Bayesian econometrics: conjugate analysis and rejection sampling using Mathematica
Mathematica is a powerful "system for doing mathematics by computer" which runs on personal computers (Macs and MS-DOS machines), workstations and mainframes. Here we show how Bayesian methods can be implemented in Mathematica. One of the drawbacks of Bayesian techniques is that they are computation-intensive, and every computation is a little different. Since Mathematica is so flexible, it can easily be adapted to solving a number of different Bayesian estimation problems. We illustrate the use of Mathematica functions (i) in a traditional conjugate analysis of the linear regression model and (ii) in a completely nonstandard model, where rejection sampling is used to sample from the posterior
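To illustrate the rejection-sampling idea mentioned in this abstract, here is a minimal sketch in Python rather than Mathematica. The model is an assumption chosen for simplicity (k successes in n Bernoulli trials with a uniform prior on the success probability), not the nonstandard model of the paper. Candidates are drawn from the prior and accepted with probability equal to the likelihood divided by its maximum.

```python
import random


def rejection_sample_posterior(k, n, num_samples, seed=0):
    """Draw from the posterior of a Bernoulli success probability by rejection.

    Assumed toy model: k successes in n trials, uniform prior on (0, 1).
    Proposal = prior; acceptance probability = likelihood / max likelihood.
    """
    rng = random.Random(seed)

    def likelihood(theta):
        return theta ** k * (1.0 - theta) ** (n - k)

    max_lik = likelihood(k / n)  # the likelihood peaks at the MLE k / n
    samples = []
    while len(samples) < num_samples:
        theta = rng.random()  # candidate drawn from the uniform prior
        if rng.random() * max_lik <= likelihood(theta):
            samples.append(theta)
    return samples
```

Under these assumptions the exact posterior is Beta(k + 1, n - k + 1), so the sample mean can be checked against (k + 1) / (n + 2).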
parallelMCMCcombine: An R Package for Bayesian Methods for Big Data and Analytics
Recent advances in big data and analytics research have provided a wealth of
large data sets that are too big to be analyzed in their entirety, due to
restrictions on computer memory or storage size. New Bayesian methods have been
developed for large data sets that are only large due to large sample sizes;
these methods partition big data sets into subsets, and perform independent
Bayesian Markov chain Monte Carlo analyses on the subsets. The methods then
combine the independent subset posterior samples to estimate a posterior
density given the full data set. These approaches were shown to be effective
for Bayesian models including logistic regression models, Gaussian mixture
models and hierarchical models. Here, we introduce the R package
parallelMCMCcombine which carries out four of these techniques for combining
independent subset posterior samples. We illustrate each of the methods using a
Bayesian logistic regression model for simulation data and a Bayesian Gamma
model for real data; we also demonstrate features and capabilities of the R
package. The package assumes the user has carried out the Bayesian analysis and
has produced the independent subposterior samples outside of the package. The
methods are primarily suited to models with unknown parameters of fixed
dimension that exist in continuous parameter spaces. We envision this tool will
allow researchers to explore the various methods for their specific
applications, and will assist future progress in this rapidly developing field. Comment: for published version see:
http://www.plosone.org/article/fetchObject.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0108425&representation=PD
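The combining step described in this abstract can be illustrated with a small sketch. The code below is written in Python and does not use or reproduce the parallelMCMCcombine API; it shows one generic way to combine subset draws for a single scalar parameter, a precision-weighted average of the i-th draw from each subset. The function name and input layout are hypothetical.

```python
from statistics import variance


def combine_subposteriors(sub_samples):
    """Combine scalar subposterior draws by precision-weighted averaging.

    sub_samples: list of lists, one list of draws per data subset
    (hypothetical layout). Each subset is weighted by the inverse of its
    sample variance, and the i-th draws are averaged across subsets.
    """
    weights = [1.0 / variance(draws) for draws in sub_samples]
    total = sum(weights)
    n = min(len(draws) for draws in sub_samples)  # use the common length
    return [
        sum(w * draws[i] for w, draws in zip(weights, sub_samples)) / total
        for i in range(n)
    ]
```

If every subposterior is Gaussian, the combined draws have the mean and variance of the product of the subposterior densities; for other shapes this averaging is only an approximation.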
A Fast Algorithm for Sampling from the Posterior of a von Mises distribution
Motivated by molecular biology, there has been an upsurge of research
activities in directional statistics in general and its Bayesian aspect in
particular. The central distribution for the circular case is the von Mises
distribution, which has two parameters (mean and concentration), akin to the
univariate normal distribution. However, there has been a challenge to sample
efficiently from the posterior distribution of the concentration parameter. We
describe a novel, highly efficient algorithm to sample from the posterior
distribution and fill this long-standing gap
Improving Classification When a Class Hierarchy is Available Using a Hierarchy-Based Prior
We introduce a new method for building classification models when we have
prior knowledge of how the classes can be arranged in a hierarchy, based on how
easily they can be distinguished. The new method uses a Bayesian form of the
multinomial logit (MNL, a.k.a. "softmax") model, with a prior that introduces
correlations between the parameters for classes that are nearby in the tree. We
compare the performance on simulated data of the new method, the ordinary MNL
model, and a model that uses the hierarchy in a different way. We also test the
new method on a document labelling problem, and find that it performs better
than the other methods, particularly when the amount of training data is small
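One simple way a prior can correlate the parameters of classes that sit close together in a tree is to give every node its own independent effect and define each class coefficient as the sum of the effects along its path from the root. The sketch below illustrates that construction under stated assumptions; the abstract does not spell out the paper's exact prior, and the function name, the Gaussian node effects and the `paths` layout are all hypothetical.

```python
import random


def sample_class_coefficients(paths, scale=1.0, seed=0):
    """Sample one coefficient per class from a tree-structured prior (sketch).

    paths: dict mapping each class to the list of node names on its path
    from just below the root down to the class itself (hypothetical layout).
    Each node gets an independent Gaussian effect; a class coefficient is
    the sum of the effects on its path, so classes that share ancestors
    share terms and are therefore correlated a priori.
    """
    rng = random.Random(seed)
    node_effects = {}
    coefficients = {}
    for cls, path in paths.items():
        total = 0.0
        for node in path:
            if node not in node_effects:
                node_effects[node] = rng.gauss(0.0, scale)
            total += node_effects[node]
        coefficients[cls] = total
    return coefficients, node_effects
```

In this sketch two classes under the same parent differ only through their own leaf effects, while classes in different branches share no terms.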
Bayesian inference and non-linear extensions of the CIRCE method for quantifying the uncertainty of closure relationships integrated into thermal-hydraulic system codes
Uncertainty Quantification of closure relationships integrated into
thermal-hydraulic system codes is a critical prerequisite in applying the
Best-Estimate Plus Uncertainty (BEPU) methodology for nuclear safety and
licensing processes. The purpose of the CIRCE method is to estimate the
(log)-Gaussian probability distribution of a multiplicative factor applied to a
reference closure relationship in order to assess its uncertainty. Even though
this method has been implemented with success in numerous physical scenarios,
it can still suffer from substantial limitations such as the linearity
assumption and the difficulty of properly taking into account the inherent
statistical uncertainty. In this paper, we extend the CIRCE method in two
respects. On the one hand, we adopt a Bayesian setting, putting prior
probability distributions on the parameters of the (log)-Gaussian distribution.
The posterior distribution of the parameters is then computed with respect to
an experimental database by means of Markov Chain Monte Carlo (MCMC)
algorithms. On the other hand, we tackle the more general setting where the
simulations do not depend linearly on the multiplicative factor(s). MCMC
algorithms then become time-prohibitive when the thermal-hydraulic simulations
exceed a few minutes. This handicap is overcome by using Gaussian process (GP)
emulators which can yield both reliable and fast predictions of the
simulations. The GP-based MCMC algorithms will be applied to quantify the
uncertainty of two condensation closure relationships at a safety injection
with respect to a database of experimental tests. The thermal-hydraulic
simulations will be run with the CATHARE 2 computer code. Comment: 37 pages, 5 figure