Clustering South African households based on their asset status using latent variable models
The Agincourt Health and Demographic Surveillance System has since 2001
conducted a biannual household asset survey in order to quantify household
socio-economic status (SES) in a rural population living in northeast South
Africa. The survey contains binary, ordinal and nominal items. In the absence
of income or expenditure data, the SES landscape in the study population is
explored and described by clustering the households into homogeneous groups
based on their asset status. A model-based approach to clustering the Agincourt
households, based on latent variable models, is proposed. In the case of
modeling binary or ordinal items, item response theory models are employed. For
nominal survey items, a factor analysis model, similar in nature to a
multinomial probit model, is used. Both model types have an underlying latent
variable structure; this similarity is exploited and the models are combined
to produce a hybrid model capable of handling mixed data types. Further, a
mixture of the hybrid models is considered to provide clustering capabilities
within the context of mixed binary, ordinal and nominal response data. The
proposed model is termed a mixture of factor analyzers for mixed data (MFA-MD).
The MFA-MD model is applied to the survey data to cluster the Agincourt
households into homogeneous groups. The model is estimated within the Bayesian
paradigm, using a Markov chain Monte Carlo algorithm. Intuitive groupings
result, providing insight to the different socio-economic strata within the
Agincourt region.
Comment: Published at http://dx.doi.org/10.1214/14-AOAS726 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
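The probit-style threshold link that underlies the hybrid model's treatment of binary items can be illustrated with a small simulation; the dimensions, loadings and intercepts below are hypothetical stand-ins, not the paper's estimated model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n households, J binary asset items, q latent factors
n, J, q = 500, 6, 2
Lambda = rng.normal(size=(J, q))   # factor loadings (assumed values)
theta = rng.normal(size=(n, q))    # latent household traits
mu = rng.normal(size=J)            # item intercepts

# Underlying continuous latent responses: z = mu + Lambda theta + noise
z = mu + theta @ Lambda.T + rng.normal(size=(n, J))

# Observed binary items arise by thresholding the latent response at zero
y = (z > 0).astype(int)
```

Ordinal items extend this by replacing the single zero threshold with an ordered set of cutpoints, which is what lets the IRT and factor-analytic pieces share one latent-variable representation.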
A sparse multinomial probit model for classification
A recent development in penalized probit modelling using a hierarchical Bayesian approach has led to a sparse binomial (two-class) probit classifier that can be trained via an EM algorithm. A key advantage of the formulation is that no tuning of hyperparameters relating to the penalty is needed, thus simplifying the model selection process. The resulting model demonstrates excellent classification performance and a high degree of sparsity when used as a kernel machine. It is, however, restricted to the binary classification problem and can only be used in the multinomial situation via a one-against-all or one-against-many strategy. To overcome this, we apply the idea to the multinomial probit model. This leads to a direct multi-class classification approach and is shown to give a sparse solution with accuracy and sparsity comparable with the current state-of-the-art. Comparative numerical benchmark examples are used to demonstrate the method.
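The latent-variable EM underlying the binomial case can be sketched as follows; a fixed ridge penalty stands in for the paper's hierarchical sparsity prior, so this is a simplified illustration rather than the authors' algorithm:

```python
import numpy as np
from scipy.stats import norm

def probit_em(X, y, lam=1.0, iters=50):
    # Simplified EM for probit regression in the binomial (two-class) case.
    # A fixed ridge penalty `lam` replaces the hierarchical sparsity prior
    # of the paper, so no hyperparameter tuning logic appears here.
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        m = X @ w
        pdf = norm.pdf(m)
        cdf = np.clip(norm.cdf(m), 1e-10, 1 - 1e-10)  # guard the tails
        # E-step: posterior mean of the latent Gaussian z given the label y
        ez = np.where(y == 1, m + pdf / cdf, m - pdf / (1.0 - cdf))
        # M-step: penalised least squares on the imputed latent responses
        w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ ez)
    return w
```

The E-step uses the closed-form mean of a truncated normal, which is what makes EM attractive here: no sampling is needed, unlike fully Bayesian treatments of the same latent-variable representation.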
Forecasting adoption of ultra-low-emission vehicles using the GHK simulator and Bayes estimates of a multinomial probit model
In this paper we use Bayes estimates of a multinomial probit model with fully flexible substitution patterns to forecast consumer response to ultra-low-emission vehicles. In this empirical application of the probit Gibbs sampler, we use stated-preference data on vehicle choice from a Germany-wide survey of potential light-duty-vehicle buyers using computer-assisted personal interviewing. We show that Bayesian estimation of a multinomial probit model with a full covariance matrix is feasible for this medium-scale problem. Using the posterior distribution of the parameters of the vehicle choice model as well as the GHK simulator we derive the choice probabilities of the different alternatives. We first show that the Bayes point estimates of the market shares reproduce the observed values. Then, we define a base scenario of vehicle attributes that aims at representing an average of the current vehicle choice situation in Germany. Consumer response to qualitative changes in the base scenario is subsequently studied. In particular, we analyze the effect of increasing the network of service stations for charging electric vehicles as well as for refueling hydrogen. The result is the posterior distribution of the choice probabilities that represent adoption of the energy-efficient technologies.
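The GHK simulator at the core of this pipeline estimates multivariate normal rectangle probabilities by sequentially drawing truncated univariate normals along a Cholesky factorisation. A minimal sketch for P(Z < b) with Z ~ N(0, Sigma), illustrative rather than the paper's production implementation:

```python
import numpy as np
from scipy.stats import norm

def ghk(b, Sigma, n_draws=2000, seed=0):
    # GHK simulator for P(Z < b), Z ~ N(0, Sigma) (illustrative sketch).
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Sigma)
    d = len(b)
    prob = np.ones(n_draws)
    e = np.zeros((n_draws, d))
    for j in range(d):
        # Upper truncation point for e_j given the components drawn so far
        upper = (b[j] - e[:, :j] @ L[j, :j]) / L[j, j]
        p = norm.cdf(upper)
        prob *= p           # accumulate the conditional probabilities
        # Draw e_j from a standard normal truncated to (-inf, upper)
        u = rng.uniform(size=n_draws)
        e[:, j] = norm.ppf(u * p)
    return prob.mean()
```

Because each conditional probability enters the product analytically, the GHK estimator is smooth in the parameters, which is why it pairs well with both posterior simulation and derivative-based methods.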
Bayesian Estimation of Mixed Multinomial Logit Models: Advances and Simulation-Based Evaluations
Variational Bayes (VB) methods have emerged as a fast and
computationally-efficient alternative to Markov chain Monte Carlo (MCMC)
methods for scalable Bayesian estimation of mixed multinomial logit (MMNL)
models. It has been established that VB is substantially faster than MCMC with
practically no compromise in predictive accuracy. In this paper, we address
two critical gaps concerning the usage and understanding of VB for MMNL. First,
extant VB methods are limited to utility specifications involving only
individual-specific taste parameters. Second, the finite-sample properties of
VB estimators and the relative performance of VB, MCMC and maximum simulated
likelihood estimation (MSLE) are not known. To address the former, this study
extends several VB methods for MMNL to admit utility specifications including
both fixed and random utility parameters. To address the latter, we conduct an
extensive simulation-based evaluation to benchmark the extended VB methods
against MCMC and MSLE in terms of estimation times, parameter recovery and
predictive accuracy. The results suggest that all VB variants, except those
relying on an alternative variational lower bound constructed with the
modified Jensen's inequality, perform as well
as MCMC and MSLE at prediction and parameter recovery. In particular, VB with
nonconjugate variational message passing and the delta-method (VB-NCVMP-Delta)
is up to 16 times faster than MCMC and MSLE. Thus, VB-NCVMP-Delta can be an
attractive alternative to MCMC and MSLE for fast, scalable and accurate
estimation of MMNL models.
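The MMNL choice probabilities that all three estimators (VB, MCMC, MSLE) target are simulated by averaging logit probabilities over draws of the random taste parameters. A minimal sketch, assuming independent normal tastes with hypothetical mean and scale parameters:

```python
import numpy as np

def mmnl_probs(X, mu, sigma, n_draws=1000, seed=0):
    # Simulated MMNL choice probabilities (sketch).
    # X: (J, K) attributes of J alternatives; tastes beta ~ N(mu, diag(sigma^2)).
    # Returns the length-J vector of simulated choice probabilities.
    rng = np.random.default_rng(seed)
    beta = mu + sigma * rng.normal(size=(n_draws, len(mu)))  # taste draws
    v = beta @ X.T                                           # (n_draws, J) utilities
    v -= v.max(axis=1, keepdims=True)                        # numerical stability
    p = np.exp(v)
    p /= p.sum(axis=1, keepdims=True)                        # per-draw logit probs
    return p.mean(axis=0)                                    # average over draws
```

Setting sigma to zero recovers the plain multinomial logit, which is a convenient sanity check; MSLE maximises the log of exactly these simulated probabilities, while VB and MCMC instead approximate the posterior over (mu, sigma).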
Bayesian learning of noisy Markov decision processes
We consider the inverse reinforcement learning problem, that is, the problem
of learning from, and then predicting or mimicking a controller based on
state/action data. We propose a statistical model for such data, derived from
the structure of a Markov decision process. Adopting a Bayesian approach to
inference, we show how latent variables of the model can be estimated, and how
predictions about actions can be made, in a unified framework. A new Markov
chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior
distribution. The sampler includes a parameter-expansion step, which is shown
to be essential for its good convergence properties. As an
illustration, the method is applied to learning a human controller.
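A common way to model a noisy controller of this kind is a Boltzmann policy, where actions are chosen with probability proportional to exp(beta * Q(s, a)). The sketch below infers the inverse temperature beta from state/action data; the Q-table is hypothetical, and a simple grid approximation stands in for the paper's MCMC sampler:

```python
import numpy as np

# Hypothetical 2-state, 2-action Q-table for a known MDP
Q = np.array([[1.0, 0.0],
              [0.2, 0.8]])

def loglik(beta, states, actions):
    # Log-likelihood of state/action pairs under a Boltzmann (noisy) policy
    logits = beta * Q[states]                      # (n, 2)
    logz = np.log(np.exp(logits).sum(axis=1))
    return (logits[np.arange(len(actions)), actions] - logz).sum()

# Simulate data from a controller with true inverse temperature 2.0
rng = np.random.default_rng(0)
states = rng.integers(0, 2, size=200)
p = np.exp(2.0 * Q[states])
p /= p.sum(axis=1, keepdims=True)
actions = (rng.uniform(size=200) > p[:, 0]).astype(int)

# Grid posterior over beta under a flat prior
grid = np.linspace(0.1, 5.0, 50)
logpost = np.array([loglik(b, states, actions) for b in grid])
post = np.exp(logpost - logpost.max())
post /= post.sum()
beta_hat = grid @ post   # posterior mean of beta
```

In the paper's setting the reward (and hence Q) is itself unknown and must be inferred jointly, which is where the MCMC sampler and its parameter-expansion step come in.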
Sequential Monte Carlo EM for multivariate probit models
Multivariate probit models (MPM) have the appealing feature of capturing some
of the dependence structure between the components of multidimensional binary
responses. The key for the dependence modelling is the covariance matrix of an
underlying latent multivariate Gaussian. Most approaches to MLE in multivariate
probit regression rely on MCEM algorithms to avoid computationally intensive
evaluations of multivariate normal orthant probabilities. As an alternative to
the much-used Gibbs sampler, a new SMC sampler for truncated multivariate
normals is proposed. The algorithm proceeds in two stages: samples are
first drawn from truncated multivariate Student distributions and then
further evolved towards a Gaussian. The sampler is then embedded in an MCEM
algorithm. The sequential nature of SMC methods can be exploited to design a
fully sequential version of the EM, where the samples are simply updated from
one iteration to the next rather than resampled from scratch. Recycling the
samples in this manner significantly reduces the computational cost. An
alternative view of the standard conditional maximisation step provides the
basis for an iterative procedure to fully perform the maximisation needed in
the EM algorithm. The identifiability of MPM is also thoroughly discussed. In
particular, the likelihood invariance can be embedded in the EM algorithm to
ensure that constrained and unconstrained maximisation are equivalent. A simple
iterative procedure is then derived for either maximisation which takes
effectively no computational time. The method is validated by applying it to
the widely analysed Six Cities dataset and on a higher dimensional simulated
example. Previous approaches to the Six Cities data overly restrict the parameter
space but, by considering the correct invariance, the maximum likelihood is
quite naturally improved when treating the full unrestricted model.
Comment: 26 pages, 2 figures. In press, Computational Statistics & Data Analysis.
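The much-used Gibbs sampler that the SMC method is positioned against draws from a multivariate normal truncated to the orthant implied by the binary responses, one coordinate at a time. A minimal sketch, not the paper's SMC algorithm:

```python
import numpy as np
from scipy.stats import truncnorm

def gibbs_tmvn(mu, Sigma, y, n_iter=500, seed=0):
    # Gibbs sampler for Z ~ N(mu, Sigma) truncated to the orthant set by
    # binary responses y (z_j > 0 iff y_j == 1). Illustrative sketch of the
    # standard alternative to the proposed SMC sampler.
    rng = np.random.default_rng(seed)
    d = len(mu)
    Prec = np.linalg.inv(Sigma)
    z = np.where(y == 1, 1.0, -1.0).astype(float)  # feasible starting point
    draws = []
    for _ in range(n_iter):
        for j in range(d):
            rest = np.arange(d) != j
            # Full conditional of z_j is N(m, v) given the other components
            v = 1.0 / Prec[j, j]
            m = mu[j] - v * Prec[j, rest] @ (z[rest] - mu[rest])
            s = np.sqrt(v)
            # Truncate to the half-line implied by y_j (standardised bounds)
            if y[j] == 1:
                a, b = (0.0 - m) / s, np.inf
            else:
                a, b = -np.inf, (0.0 - m) / s
            z[j] = truncnorm.rvs(a, b, loc=m, scale=s, random_state=rng)
        draws.append(z.copy())
    return np.array(draws)
```

Each sweep updates one coordinate conditional on the rest, so successive draws are correlated; the appeal of the SMC alternative described above is that its particle population can be recycled across EM iterations instead of being regenerated from scratch.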