13,845 research outputs found

    First CLADAG data mining prize : data mining for longitudinal data with different marketing campaigns

    Get PDF
    The CLAssification and Data Analysis Group (CLADAG) of the Italian Statistical Society recently organised a competition, the 'Young Researcher Data Mining Prize' sponsored by the SAS Institute. This paper was the winning entry and in it we detail our approach to the problem proposed and our results. The main methods used are linear regression, mixture models, Bayesian autoregressive and Bayesian dynamic models

    Robust Bayesian inference via coarsening

    Full text link
    The standard approach to Bayesian inference is based on the assumption that the distribution of the data belongs to the chosen model class. However, even a small violation of this assumption can have a large impact on the outcome of a Bayesian procedure. We introduce a simple, coherent approach to Bayesian inference that improves robustness to perturbations from the model: rather than condition on the data exactly, one conditions on a neighborhood of the empirical distribution. When using neighborhoods based on relative entropy estimates, the resulting "coarsened" posterior can be approximated by simply tempering the likelihood---that is, by raising it to a fractional power---thus, inference is often easily implemented with standard methods, and one can even obtain analytical solutions when using conjugate priors. Some theoretical properties are derived, and we illustrate the approach with real and simulated data, using mixture models, autoregressive models of unknown order, and variable selection in linear regression

    Bayesian analysis of mixture autoregressive models covering the complete parameter space

    Get PDF
    Mixture autoregressive (MAR) models provide a flexible way to model time series with predictive distributions which depend on the recent history of the process and are able to accommodate asymmetry and multimodality. Bayesian inference for such models offers the additional advantage of incorporating the uncertainty in the estimated models into the predictions. We introduce a new way of sampling from the posterior distribution of the parameters of MAR models which allows for covering the complete parameter space of the models, unlike previous approaches. We also propose a relabelling algorithm to deal a posteriori with label switching. We apply our new method to simulated and real datasets, discuss the accuracy and performance of our new method, as well as its advantages over previous studies. The idea of density forecasting using MCMC output is also introduced.Comment: 27 pages, 10 figures, 4 table

    A Simple Class of Bayesian Nonparametric Autoregression Models

    Get PDF
    We introduce a model for a time series of continuous outcomes, that can be expressed as fully nonparametric regression or density regression on lagged terms. The model is based on a dependent Dirichlet process prior on a family of random probability measures indexed by the lagged covariates. The approach is also extended to sequences of binary responses. We discuss implementation and applications of the models to a sequence of waiting times between eruptions of the Old Faithful Geyser, and to a dataset consisting of sequences of recurrence indicators for tumors in the bladder of several patients.MIUR 2008MK3AFZFONDECYT 1100010NIH/NCI R01CA075981Mathematic

    Nonparametric Bayesian multiple testing for longitudinal performance stratification

    Full text link
    This paper describes a framework for flexible multiple hypothesis testing of autoregressive time series. The modeling approach is Bayesian, though a blend of frequentist and Bayesian reasoning is used to evaluate procedures. Nonparametric characterizations of both the null and alternative hypotheses will be shown to be the key robustification step necessary to ensure reasonable Type-I error performance. The methodology is applied to part of a large database containing up to 50 years of corporate performance statistics on 24,157 publicly traded American companies, where the primary goal of the analysis is to flag companies whose historical performance is significantly different from that expected due to chance.Comment: Published in at http://dx.doi.org/10.1214/09-AOAS252 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    A Bayesian Nonparametric Markovian Model for Nonstationary Time Series

    Full text link
    Stationary time series models built from parametric distributions are, in general, limited in scope due to the assumptions imposed on the residual distribution and autoregression relationship. We present a modeling approach for univariate time series data, which makes no assumptions of stationarity, and can accommodate complex dynamics and capture nonstandard distributions. The model for the transition density arises from the conditional distribution implied by a Bayesian nonparametric mixture of bivariate normals. This implies a flexible autoregressive form for the conditional transition density, defining a time-homogeneous, nonstationary, Markovian model for real-valued data indexed in discrete-time. To obtain a more computationally tractable algorithm for posterior inference, we utilize a square-root-free Cholesky decomposition of the mixture kernel covariance matrix. Results from simulated data suggest the model is able to recover challenging transition and predictive densities. We also illustrate the model on time intervals between eruptions of the Old Faithful geyser. Extensions to accommodate higher order structure and to develop a state-space model are also discussed

    A Spatio-Temporal Point Process Model for Ambulance Demand

    Full text link
    Ambulance demand estimation at fine time and location scales is critical for fleet management and dynamic deployment. We are motivated by the problem of estimating the spatial distribution of ambulance demand in Toronto, Canada, as it changes over discrete 2-hour intervals. This large-scale dataset is sparse at the desired temporal resolutions and exhibits location-specific serial dependence, daily and weekly seasonality. We address these challenges by introducing a novel characterization of time-varying Gaussian mixture models. We fix the mixture component distributions across all time periods to overcome data sparsity and accurately describe Toronto's spatial structure, while representing the complex spatio-temporal dynamics through time-varying mixture weights. We constrain the mixture weights to capture weekly seasonality, and apply a conditionally autoregressive prior on the mixture weights of each component to represent location-specific short-term serial dependence and daily seasonality. While estimation may be performed using a fixed number of mixture components, we also extend to estimate the number of components using birth-and-death Markov chain Monte Carlo. The proposed model is shown to give higher statistical predictive accuracy and to reduce the error in predicting EMS operational performance by as much as two-thirds compared to a typical industry practice
    corecore