parallelMCMCcombine: An R Package for Bayesian Methods for Big Data and Analytics
Recent advances in big data and analytics research have provided a wealth of
large data sets that are too big to be analyzed in their entirety, due to
restrictions on computer memory or storage size. New Bayesian methods have been
developed for data sets that are large solely because of their sample size;
these methods partition big data sets into subsets and perform independent
Bayesian Markov chain Monte Carlo analyses on the subsets. The methods then
combine the independent subset posterior samples to estimate a posterior
density given the full data set. These approaches were shown to be effective
for Bayesian models including logistic regression models, Gaussian mixture
models and hierarchical models. Here, we introduce the R package
parallelMCMCcombine which carries out four of these techniques for combining
independent subset posterior samples. We illustrate each of the methods using a
Bayesian logistic regression model for simulation data and a Bayesian Gamma
model for real data; we also demonstrate features and capabilities of the R
package. The package assumes the user has carried out the Bayesian analysis and
has produced the independent subposterior samples outside of the package. The
methods are primarily suited to models with unknown parameters of fixed
dimension that exist in continuous parameter spaces. We envision this tool will
allow researchers to explore the various methods for their specific
applications, and will assist future progress in this rapidly developing field.
Comment: for published version see:
http://www.plosone.org/article/fetchObject.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0108425&representation=PD
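The combination step these methods share can be illustrated with a minimal sketch of one such strategy: consensus-style inverse-variance-weighted averaging of subposterior draws. The package itself implements four combination techniques in R; the Python sketch below is illustrative only, and the function name and toy data are hypothetical, not the package's code:

```python
import numpy as np

def consensus_combine(subposteriors):
    """Combine M subposterior sample arrays (each of shape (T, d)) by
    inverse-variance-weighted averaging, draw by draw (consensus-style)."""
    # Weights proportional to the inverse subposterior variance (diagonal sketch).
    weights = [1.0 / np.var(s, axis=0) for s in subposteriors]
    wsum = np.sum(weights, axis=0)
    combined = sum(w * s for w, s in zip(weights, subposteriors)) / wsum
    return combined

rng = np.random.default_rng(0)
# Toy example: three "subposteriors" centred at slightly different means.
subs = [rng.normal(loc=m, scale=1.0, size=(5000, 1)) for m in (0.9, 1.0, 1.1)]
full = consensus_combine(subs)
print(full.shape, full.mean())
```

The weighted average pulls the combined draws toward the subposteriors with the smallest variance; with equal variances, as in the toy example, it reduces to a plain average of the draws.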
Asymptotically Exact, Embarrassingly Parallel MCMC
Communication costs, resulting from synchronization requirements during
learning, can greatly slow down many parallel machine learning algorithms. In
this paper, we present a parallel Markov chain Monte Carlo (MCMC) algorithm in
which subsets of data are processed independently, with very little
communication. First, we arbitrarily partition data onto multiple machines.
Then, on each machine, any classical MCMC method (e.g., Gibbs sampling) may be
used to draw samples from a posterior distribution given the data subset.
Finally, the samples from each machine are combined to form samples from the
full posterior. This embarrassingly parallel algorithm allows each machine to
act independently on a subset of the data (without communication) until the
final combination stage. We prove that our algorithm generates asymptotically
exact samples and empirically demonstrate its ability to parallelize burn-in
and sampling in several models.
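The final combination stage can be sketched for the simplest, parametric case: approximate each subposterior by a Gaussian, so that their product (which targets the full posterior) is again Gaussian, with precision equal to the sum of the subposterior precisions. This standalone sketch uses hypothetical names and toy data and is not the paper's implementation:

```python
import numpy as np

def gaussian_product_combine(subposteriors):
    """Parametric combination: fit a Gaussian to each subposterior sample
    array (shape (T, d)), then form the product density, which is Gaussian
    with precision = sum of precisions and a precision-weighted mean."""
    precisions = [np.linalg.inv(np.cov(s, rowvar=False)) for s in subposteriors]
    means = [s.mean(axis=0) for s in subposteriors]
    prec = sum(precisions)
    cov = np.linalg.inv(prec)
    mean = cov @ sum(P @ m for P, m in zip(precisions, means))
    return mean, cov

rng = np.random.default_rng(1)
# Toy example: three 2-d "subposteriors" centred near (1, -1).
subs = [rng.multivariate_normal([m, -m], 0.5 * np.eye(2), size=4000)
        for m in (0.8, 1.0, 1.2)]
mean, cov = gaussian_product_combine(subs)
print(mean, cov)
```

The paper's asymptotically exact combination replaces the Gaussian fits with nonparametric kernel density estimates of each subposterior, but the product-of-densities structure is the same.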
Semiparametric Multinomial Logit Models for Analysing Consumer Choice Behaviour
The multinomial logit model (MNL) is one of the most frequently used statistical models in marketing applications. It relates an unordered categorical response variable, for example the choice of a brand, to a vector of covariates such as the price of the brand or variables characterising the consumer. In its classical form, all covariates enter the utility function of the MNL model in strictly parametric, linear form. In this paper, we introduce semiparametric extensions in which smooth effects of continuous covariates are modelled by penalised splines. A mixed model representation of these penalised splines is employed to obtain estimates of the corresponding smoothing parameters, leading to a fully automated estimation procedure. To validate semiparametric models against parametric models, we utilise proper scoring rules and compare parametric and semiparametric approaches for a number of brand choice data sets.
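The classical MNL structure the abstract starts from, linear utilities mapped to choice probabilities by a softmax, can be sketched as follows; the function name, coefficients, and covariate values are hypothetical illustrations, not estimates from the paper:

```python
import numpy as np

def mnl_probs(X, betas):
    """Choice probabilities of a multinomial logit model: the utility of
    alternative j for consumer i is x_i' beta_j, and the choice
    probabilities are the softmax of the utilities."""
    U = X @ betas                          # (n, J) utilities
    U = U - U.max(axis=1, keepdims=True)   # subtract row max for stability
    expU = np.exp(U)
    return expU / expU.sum(axis=1, keepdims=True)

# Toy example: 2 consumers, 2 covariates (e.g., price, income), 3 brands.
X = np.array([[1.0, 2.0],
              [0.5, 1.0]])
betas = np.array([[0.3, -0.2, 0.0],    # hypothetical coefficients
                  [-0.5, 0.1, 0.4]])   # one column per alternative
P = mnl_probs(X, betas)
print(P.sum(axis=1))  # each row sums to 1
```

The semiparametric extension in the paper replaces the linear terms x_i' beta_j with penalised-spline smooths of the continuous covariates, leaving the softmax link unchanged.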
Semiparametric posterior limits
We review the Bayesian theory of semiparametric inference following Bickel
and Kleijn (2012) and Kleijn and Knapik (2013). After an overview of efficiency
in parametric and semiparametric estimation problems, we consider the
Bernstein-von Mises theorem (see, e.g., Le Cam and Yang (1990)) and generalize
it to (LAN) regular and (LAE) irregular semiparametric estimation problems. We
formulate a version of the semiparametric Bernstein-von Mises theorem that does
not depend on least-favourable submodels, thus bypassing the most restrictive
condition in the presentation of Bickel and Kleijn (2012). The results are
applied to the (regular) estimation of the linear coefficient in partial linear
regression (with a Gaussian nuisance prior) and of the kernel bandwidth in a
model of normal location mixtures (with a Dirichlet nuisance prior), as well as
the (irregular) estimation of the boundary of the support of a monotone family
of densities (with a Gaussian nuisance prior).
Comment: 47 pp., 1 figure, submitted for publication. arXiv admin note:
substantial text overlap with arXiv:1007.017
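For orientation, the classical parametric Bernstein-von Mises theorem that the abstract generalizes can be stated informally: the posterior for the parameter, centred at an efficient estimator and rescaled by the square root of the sample size, converges in total variation to a normal limit,

```latex
\[
\sup_{B}\,\Bigl|\,
\Pi\bigl(\sqrt{n}\,(\theta-\hat{\theta}_n)\in B \,\bigm|\, X_1,\dots,X_n\bigr)
\;-\; N\bigl(0,\,I_{\theta_0}^{-1}\bigr)(B)
\,\Bigr| \;\xrightarrow{\;P_{\theta_0}\;}\; 0,
\]
```

where $\hat{\theta}_n$ is an efficient estimator and $I_{\theta_0}$ is the Fisher information at the true parameter. In the semiparametric versions discussed above, the Fisher information is replaced by the efficient information for the parameter of interest.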
Semiparametric theory and empirical processes in causal inference
In this paper we review important aspects of semiparametric theory and
empirical processes that arise in causal inference problems. We begin with a
brief introduction to the general problem of causal inference, and go on to
discuss estimation and inference for causal effects under semiparametric
models, which allow parts of the data-generating process to be unrestricted if
they are not of particular interest (i.e., nuisance functions). These models
are very useful in causal problems because the outcome process is often complex
and difficult to model, and there may only be information available about the
treatment process (at best). Semiparametric theory gives a framework for
benchmarking efficiency and constructing estimators in such settings. In the
second part of the paper we discuss empirical process theory, which provides
powerful tools for understanding the asymptotic behavior of semiparametric
estimators that depend on flexible nonparametric estimators of nuisance
functions. These tools are crucial for incorporating machine learning and other
modern methods into causal inference analyses. We conclude by examining related
extensions and future directions for work in semiparametric causal inference.
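A canonical example of the estimators this framework produces is the augmented inverse-probability-weighted (AIPW, doubly robust) estimator of the average treatment effect. The following minimal sketch assumes the nuisance functions (propensity score and outcome regressions) have already been estimated; here they are plugged in at their true values for a toy data-generating process, and all names are illustrative:

```python
import numpy as np

def aipw_ate(y, a, pscore, mu1, mu0):
    """Augmented IPW (doubly robust) estimate of the average treatment
    effect E[Y(1) - Y(0)], given estimated nuisance functions: the
    propensity score and the outcome regressions under treatment (mu1)
    and control (mu0)."""
    psi1 = mu1 + a * (y - mu1) / pscore
    psi0 = mu0 + (1 - a) * (y - mu0) / (1 - pscore)
    return np.mean(psi1 - psi0)

# Toy data: true treatment effect is 2; nuisances supplied at truth.
rng = np.random.default_rng(2)
n = 20000
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-x))       # true propensity score
a = rng.binomial(1, p)
y = x + 2.0 * a + rng.normal(size=n)
est = aipw_ate(y, a, p, mu1=x + 2.0, mu0=x)
print(est)  # doubly robust ATE estimate
```

In practice the nuisance functions are estimated flexibly (e.g., by machine learning), and the empirical-process tools surveyed above are what justify the resulting estimator's asymptotic behavior.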