Nonparametric Bayesian multiple testing for longitudinal performance stratification
This paper describes a framework for flexible multiple hypothesis testing of
autoregressive time series. The modeling approach is Bayesian, though a blend
of frequentist and Bayesian reasoning is used to evaluate procedures.
Nonparametric characterizations of both the null and alternative hypotheses
will be shown to be the key robustification step necessary to ensure reasonable
Type-I error performance. The methodology is applied to part of a large
database containing up to 50 years of corporate performance statistics on
24,157 publicly traded American companies, where the primary goal of the
analysis is to flag companies whose historical performance is significantly
different from that expected due to chance.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS252 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
Beta-Product Poisson-Dirichlet Processes
Time series data may exhibit clustering over time and, in a multiple time
series context, the clustering behavior may differ across the series. This
paper is motivated by the Bayesian nonparametric modeling of the dependence
between the clustering structures and the distributions of different time
series. We follow a Dirichlet process mixture approach and introduce a new
class of multivariate dependent Dirichlet processes (DDP). The proposed DDPs are
represented in terms of a vector of stick-breaking processes with dependent
weights. The weights are beta random vectors that determine different and
dependent clustering effects along the dimension of the DDP vector. We discuss
some theoretical properties and provide an efficient Markov chain Monte Carlo
algorithm for posterior computation. The effectiveness of the method is
illustrated with a simulation study and an application to the United States and
the European Union industrial production indexes.
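
As a point of reference for the stick-breaking representation mentioned in this abstract, the following is a minimal sketch of the standard truncated stick-breaking construction for a single Dirichlet process. It is not the paper's dependent multivariate construction: the function name `stick_breaking_weights`, the concentration value, and the truncation level are assumptions chosen for illustration only.

```python
import random


def stick_breaking_weights(alpha, num_sticks, rng):
    """Truncated stick-breaking weights for a single Dirichlet process.

    Each break fraction v_k is drawn from Beta(1, alpha) and the k-th
    weight is v_k times the length of stick left after earlier breaks.
    """
    weights = []
    remaining = 1.0
    for _ in range(num_sticks - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    # The last component absorbs the leftover stick so the weights sum to 1.
    weights.append(remaining)
    return weights


# Illustrative values only (assumptions, not taken from the paper).
rng = random.Random(0)
weights = stick_breaking_weights(alpha=2.0, num_sticks=20, rng=rng)
```

In the abstract's setting, each series would have its own stick-breaking sequence, with the break fractions made dependent across series; the sketch above covers only the independent, single-series case.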
Effect fusion using model-based clustering
In social and economic studies many of the collected variables are measured
on a nominal scale, often with a large number of categories. The definition of
categories is usually not unambiguous and different classification schemes
using either a finer or a coarser grid are possible. Categorisation has an
impact when such a variable is included as a covariate in a regression model:
too fine a grid will result in imprecise estimates of the corresponding effects,
whereas too coarse a grid will miss important effects, resulting in
biased effect estimates and poor predictive performance.
To achieve automatic grouping of levels with essentially the same effect, we
adopt a Bayesian approach and specify the prior on the level effects as a
location mixture of spiky normal components. Fusion of level effects is induced
by a prior on the mixture weights which encourages empty components.
Model-based clustering of the effects during MCMC sampling makes it possible to
simultaneously detect categories which have essentially the same effect size
and identify variables with no effect at all. The properties of this approach
are investigated in simulation studies. Finally, the method is applied to
analyse effects of high-dimensional categorical predictors on income in
Austria.
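
The prior described in this abstract can be illustrated with a rough sketch: level effects are drawn from a location mixture of narrow ("spiky") normal components, and a sparse prior on the mixture weights leaves many components empty, so several levels share essentially the same effect. The symmetric Dirichlet prior, the function name `draw_level_effects`, and all numeric settings below are assumptions for illustration, not the authors' specification.

```python
import random


def draw_level_effects(num_levels, num_components, concentration, spike_sd, rng):
    """Draw level effects from a location mixture of narrow normal components."""
    # Symmetric Dirichlet weights via normalized gamma draws (assumed prior);
    # a small concentration pushes most of the mass onto a few components.
    gammas = [rng.gammavariate(concentration, 1.0) for _ in range(num_components)]
    total = sum(gammas)
    weights = [g / total for g in gammas]
    # Component locations; the N(0, 2^2) choice is an assumption.
    locations = [rng.gauss(0.0, 2.0) for _ in range(num_components)]
    # Assign each category level to a component, then draw its effect
    # tightly around that component's location.
    labels = rng.choices(range(num_components), weights=weights, k=num_levels)
    effects = [rng.gauss(locations[z], spike_sd) for z in labels]
    return labels, effects


# Illustrative values only (assumptions, not taken from the paper).
rng = random.Random(1)
labels, effects = draw_level_effects(
    num_levels=12, num_components=10, concentration=0.1, spike_sd=0.01, rng=rng
)
num_fused_groups = len(set(labels))  # occupied components = fused effect groups
```

Levels that share a label have nearly identical effects, which is the fusion behaviour the abstract describes; the posterior clustering step performed during MCMC is not shown here.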
Generalized Species Sampling Priors with Latent Beta reinforcements
Many popular Bayesian nonparametric priors can be characterized in terms of
exchangeable species sampling sequences. However, in some applications,
exchangeability may not be appropriate. We introduce a novel and
probabilistically coherent family of non-exchangeable species sampling
sequences characterized by a tractable predictive probability function with
weights driven by a sequence of independent Beta random variables. We compare
their theoretical clustering properties with those of the Dirichlet Process and
the two-parameter Poisson-Dirichlet process. The proposed construction
provides a complete characterization of the joint process, unlike
existing work. We then propose the use of such a process as a prior distribution in
a hierarchical Bayes modeling framework, and we describe a Markov Chain Monte
Carlo sampler for posterior inference. We evaluate the performance of the prior
and the robustness of the resulting inference in a simulation study, providing
a comparison with popular Dirichlet process mixtures and hidden Markov
models. Finally, we develop an application to the detection of chromosomal
aberrations in breast cancer by leveraging array CGH data.
Comment: Corresponding authors: Edoardo M. Airoldi, Federico Bassetti,
Michele Guindani, and Fabrizio Leisen. To appear in the Journal of the American
Statistical Association.
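
For comparison with the predictive probability function mentioned in this abstract, the following is a minimal sketch of the exchangeable baseline it is contrasted with: the Dirichlet process predictive rule (Chinese restaurant process), where observation n+1 joins existing cluster j with probability n_j / (n + alpha) and opens a new cluster with probability alpha / (n + alpha). This is not the paper's non-exchangeable, Beta-driven scheme; the function name `crp_assignments` and the parameter values are assumptions.

```python
import random


def crp_assignments(num_obs, alpha, rng):
    """Sequentially assign observations to clusters under the DP predictive rule."""
    counts = []  # counts[j] = number of observations currently in cluster j
    labels = []
    for _ in range(num_obs):
        # Unnormalized predictive weights: existing cluster sizes, then alpha
        # for a new cluster. rng.choices normalizes them internally.
        weights = counts + [alpha]
        j = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if j == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[j] += 1
        labels.append(j)
    return labels, counts


# Illustrative values only (assumptions, not taken from the paper).
rng = random.Random(2)
labels, counts = crp_assignments(num_obs=50, alpha=1.5, rng=rng)
```

In the non-exchangeable family described above, the fixed weight alpha would be replaced by weights driven by a sequence of independent Beta random variables, so the order of the observations matters.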