First CLADAG data mining prize : data mining for longitudinal data with different marketing campaigns
The CLAssification and Data Analysis Group (CLADAG) of the Italian
Statistical Society recently organised a competition, the 'Young Researcher Data
Mining Prize' sponsored by the SAS Institute. This paper was the winning entry
and in it we detail our approach to the problem proposed and our results. The main
methods used are linear regression, mixture models, Bayesian autoregressive and
Bayesian dynamic models.
Robust Bayesian inference via coarsening
The standard approach to Bayesian inference is based on the assumption that
the distribution of the data belongs to the chosen model class. However, even a
small violation of this assumption can have a large impact on the outcome of a
Bayesian procedure. We introduce a simple, coherent approach to Bayesian
inference that improves robustness to perturbations from the model: rather than
condition on the data exactly, one conditions on a neighborhood of the
empirical distribution. When using neighborhoods based on relative entropy
estimates, the resulting "coarsened" posterior can be approximated by simply
tempering the likelihood---that is, by raising it to a fractional power---thus,
inference is often easily implemented with standard methods, and one can even
obtain analytical solutions when using conjugate priors. Some theoretical
properties are derived, and we illustrate the approach with real and simulated
data, using mixture models, autoregressive models of unknown order, and
variable selection in linear regression.
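As a minimal sketch of the tempering idea described in this abstract, consider a Beta prior with Bernoulli data. Raising the likelihood to a fractional power simply scales the success and failure counts, so the conjugate update stays in closed form. The function names and the power `zeta` below are illustrative assumptions; the abstract does not state how the power should be chosen.

```python
def tempered_beta_posterior(a, b, successes, failures, zeta):
    """Return the Beta parameters after a tempered conjugate update.

    Prior: Beta(a, b). Likelihood: Bernoulli, raised to the power zeta.
    zeta is a user-supplied fractional power (an assumption of this sketch).
    """
    if not 0.0 < zeta <= 1.0:
        raise ValueError("zeta must lie in (0, 1]")
    # Tempering multiplies the log-likelihood by zeta, which scales the counts.
    return a + zeta * successes, b + zeta * failures


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


# Hypothetical data: 30 successes and 10 failures under a uniform prior.
standard = tempered_beta_posterior(1.0, 1.0, 30, 10, zeta=1.0)
coarsened = tempered_beta_posterior(1.0, 1.0, 30, 10, zeta=0.5)
```

With `zeta=1.0` this reduces to the usual posterior. A smaller `zeta` gives a wider posterior, which is the sense in which the tempered update is less sensitive to small departures from the model.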
Bayesian analysis of mixture autoregressive models covering the complete parameter space
Mixture autoregressive (MAR) models provide a flexible way to model time
series with predictive distributions which depend on the recent history of the
process and are able to accommodate asymmetry and multimodality. Bayesian
inference for such models offers the additional advantage of incorporating the
uncertainty in the estimated models into the predictions. We introduce a new
way of sampling from the posterior distribution of the parameters of MAR models
which allows for covering the complete parameter space of the models, unlike
previous approaches. We also propose a relabelling algorithm to deal a
posteriori with label switching. We apply our new method to simulated and real
datasets, discuss the accuracy and performance of our new method, as well as
its advantages over previous studies. The idea of density forecasting using
MCMC output is also introduced.
Comment: 27 pages, 10 figures, 4 tables
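The predictive distribution of a mixture autoregressive model can be sketched as a weighted sum of component densities whose means depend on recent observations. The example below assumes Gaussian components and hypothetical parameter values; the `density_forecast` helper averages the predictive density over a list of parameter draws, which is one simple way to read "density forecasting using MCMC output" and is not necessarily the authors' procedure.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mar_predictive_density(y, history, weights, intercepts, ar_coefs, sds):
    """One-step-ahead density of a Gaussian mixture autoregressive model.

    history holds past observations, most recent last.
    ar_coefs[k][i] multiplies the observation at lag i + 1 in component k.
    """
    density = 0.0
    for w, c, phi, sd in zip(weights, intercepts, ar_coefs, sds):
        mean = c + sum(p * history[-1 - i] for i, p in enumerate(phi))
        density += w * normal_pdf(y, mean, sd)
    return density


def density_forecast(y, history, draws):
    """Average the predictive density over parameter draws (e.g. MCMC output)."""
    total = sum(mar_predictive_density(y, history, **d) for d in draws)
    return total / len(draws)


# Hypothetical two-component model: the components pull in opposite directions,
# so the predictive density given y_{t-1} = 2 is bimodal.
params = dict(weights=[0.5, 0.5], intercepts=[0.0, 0.0],
              ar_coefs=[[0.9], [-0.9]], sds=[0.5, 0.5])
at_mode = mar_predictive_density(1.8, [2.0], **params)
at_zero = mar_predictive_density(0.0, [2.0], **params)
```

Here `at_mode` is much larger than `at_zero`, which illustrates the multimodality the abstract refers to.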
A Simple Class of Bayesian Nonparametric Autoregression Models
We introduce a model for a time series of continuous outcomes that can be expressed as fully nonparametric regression or density regression on lagged terms. The model is based on a dependent Dirichlet process prior on a family of random probability measures indexed by the lagged covariates. The approach is also extended to sequences of binary responses. We discuss implementation and applications of the models to a sequence of waiting times between eruptions of the Old Faithful Geyser, and to a dataset consisting of sequences of recurrence indicators for tumors in the bladder of several patients.
MIUR 2008MK3AFZ; FONDECYT 1100010; NIH/NCI R01CA075981; Mathematics
Nonparametric Bayesian multiple testing for longitudinal performance stratification
This paper describes a framework for flexible multiple hypothesis testing of
autoregressive time series. The modeling approach is Bayesian, though a blend
of frequentist and Bayesian reasoning is used to evaluate procedures.
Nonparametric characterizations of both the null and alternative hypotheses
will be shown to be the key robustification step necessary to ensure reasonable
Type-I error performance. The methodology is applied to part of a large
database containing up to 50 years of corporate performance statistics on
24,157 publicly traded American companies, where the primary goal of the
analysis is to flag companies whose historical performance is significantly
different from that expected due to chance.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS252 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
A Bayesian Nonparametric Markovian Model for Nonstationary Time Series
Stationary time series models built from parametric distributions are, in
general, limited in scope due to the assumptions imposed on the residual
distribution and autoregression relationship. We present a modeling approach
for univariate time series data, which makes no assumptions of stationarity,
and can accommodate complex dynamics and capture nonstandard distributions. The
model for the transition density arises from the conditional distribution
implied by a Bayesian nonparametric mixture of bivariate normals. This implies
a flexible autoregressive form for the conditional transition density, defining
a time-homogeneous, nonstationary, Markovian model for real-valued data indexed
in discrete-time. To obtain a more computationally tractable algorithm for
posterior inference, we utilize a square-root-free Cholesky decomposition of
the mixture kernel covariance matrix. Results from simulated data suggest the
model is able to recover challenging transition and predictive densities. We
also illustrate the model on time intervals between eruptions of the Old
Faithful geyser. Extensions to accommodate higher order structure and to
develop a state-space model are also discussed.
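The transition density implied by a mixture of bivariate normals on the pair (previous value, next value) can be sketched as follows. Conditioning each component on the previous value gives a univariate normal, and the component weights are reweighted by how well each component explains the previous value. This sketch assumes a finite mixture with hypothetical parameters, whereas the abstract describes a Bayesian nonparametric mixture; the tuple layout is also an assumption made for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def transition_density(y_next, y_prev, components):
    """Density of y_next given y_prev under a bivariate normal mixture.

    Each component is (weight, mu_prev, mu_next, var_prev, var_next, cov).
    """
    terms = []
    for w, mu_p, mu_n, v_p, v_n, cov in components:
        # Reweight by the marginal density of the previous value.
        q = w * normal_pdf(y_prev, mu_p, v_p)
        # Standard conditional mean and variance of a bivariate normal.
        cond_mean = mu_n + cov / v_p * (y_prev - mu_p)
        cond_var = v_n - cov * cov / v_p
        terms.append((q, cond_mean, cond_var))
    total = sum(q for q, _, _ in terms)
    return sum(q / total * normal_pdf(y_next, m, v) for q, m, v in terms)


# Hypothetical two-component mixture with opposite-signed dependence.
components = [
    (0.6, 0.0, 0.0, 1.0, 1.0, 0.5),
    (0.4, 3.0, 3.0, 1.0, 1.0, -0.5),
]
density_at_half = transition_density(0.5, 1.0, components)
```

Because the weights depend on `y_prev`, the conditional mean is a nonlinear function of the previous value even though each component is linear, which is where the flexible autoregressive form comes from.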
A Spatio-Temporal Point Process Model for Ambulance Demand
Ambulance demand estimation at fine time and location scales is critical for
fleet management and dynamic deployment. We are motivated by the problem of
estimating the spatial distribution of ambulance demand in Toronto, Canada, as
it changes over discrete 2-hour intervals. This large-scale dataset is sparse
at the desired temporal resolutions and exhibits location-specific serial
dependence as well as daily and weekly seasonality. We address these challenges by
introducing a novel characterization of time-varying Gaussian mixture models.
We fix the mixture component distributions across all time periods to overcome
data sparsity and accurately describe Toronto's spatial structure, while
representing the complex spatio-temporal dynamics through time-varying mixture
weights. We constrain the mixture weights to capture weekly seasonality, and
apply a conditionally autoregressive prior on the mixture weights of each
component to represent location-specific short-term serial dependence and daily
seasonality. While estimation may be performed using a fixed number of mixture
components, we also extend to estimate the number of components using
birth-and-death Markov chain Monte Carlo. The proposed model is shown to give
higher statistical predictive accuracy and to reduce the error in predicting
EMS operational performance by as much as two-thirds compared to a typical
industry practice.
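The structure described in this abstract, with mixture components fixed across time and only the weights varying, can be sketched as below. Several choices here are assumptions made for illustration and are not taken from the abstract: the components are isotropic, the weights come from a softmax of per-period values, and weekly seasonality is imitated by cycling through a fixed table of periods. The conditionally autoregressive prior and the estimation procedure are not shown.

```python
import math


def isotropic_normal_pdf(point, center, sd):
    """Density of a 2-D normal with covariance sd^2 times the identity."""
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sd * sd)) / (2.0 * math.pi * sd * sd)


def softmax(values):
    """Map real values to positive weights that sum to one."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def demand_density(point, period, centers, sds, logits_by_period):
    """Spatial density at a given period: fixed components, time-varying weights."""
    # Cycling through the table repeats the weights after one full cycle.
    weights = softmax(logits_by_period[period % len(logits_by_period)])
    return sum(w * isotropic_normal_pdf(point, c, s)
               for w, c, s in zip(weights, centers, sds))


# Hypothetical setup: two fixed locations and a two-period cycle.
centers = [(0.0, 0.0), (5.0, 5.0)]
sds = [1.0, 1.0]
logits_by_period = [[2.0, 0.0], [0.0, 2.0]]
busy = demand_density((0.0, 0.0), 0, centers, sds, logits_by_period)
quiet = demand_density((0.0, 0.0), 1, centers, sds, logits_by_period)
```

Sharing the components across periods means every period contributes data to estimating the spatial structure, which is how the approach copes with sparse data at fine time scales.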