Almost the Best of Three Worlds: Risk, Consistency and Optional Stopping for the Switch Criterion in Nested Model Selection
We study the switch distribution, introduced by Van Erven et al. (2012),
applied to model selection and subsequent estimation. While switching was known
to be strongly consistent, here we show that it achieves minimax optimal
parametric risk rates up to a log log n factor when comparing two nested
exponential families, partially confirming a conjecture by Lauritzen (2012) and
Cavanaugh (2012) that switching behaves asymptotically like the Hannan-Quinn
criterion. Moreover, like Bayes factor model selection but unlike standard
significance testing, when one of the models represents a simple hypothesis,
the switch criterion defines a robust null hypothesis test, meaning that its
Type-I error probability can be bounded irrespective of the stopping rule.
Hence, switching is consistent, insensitive to optional stopping and almost
minimax risk optimal, showing that, Yang's (2005) impossibility result
notwithstanding, it is possible to `almost' combine the strengths of AIC and
Bayes factor model selection.
Comment: To appear in Statistica Sinica.
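Since the abstract ties switching asymptotically to the Hannan-Quinn criterion, here is a minimal sketch of Hannan-Quinn-style selection between two nested Gaussian models, whose 2k log log n penalty is the point of comparison. This is not the switch distribution itself, and the models and data are illustrative assumptions:

```python
import numpy as np

def hannan_quinn(loglik, k, n):
    # HQC = -2 * log-likelihood + 2 * k * log(log(n)); smaller is better
    return -2.0 * loglik + 2.0 * k * np.log(np.log(n))

rng = np.random.default_rng(0)
n = 200
x = rng.normal(0.0, 1.0, n)  # data drawn from the simple model N(0, 1)

# Model 0 (simple hypothesis, k = 0 free parameters): N(0, 1)
ll0 = np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * x**2)

# Model 1 (k = 1 free parameter): N(mu, 1) with mu fitted by maximum likelihood
mu = x.mean()
ll1 = np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (x - mu)**2)

choose_simple = hannan_quinn(ll0, 0, n) <= hannan_quinn(ll1, 1, n)
```

The log log n penalty grows so slowly that it is the smallest penalty still giving strong consistency, which is why matching it (up to the factor above) is notable.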
Asymptotic Properties for Methods Combining Minimum Hellinger Distance Estimates and Bayesian Nonparametric Density Estimates
In frequentist inference, minimizing the Hellinger distance between a kernel
density estimate and a parametric family produces estimators that are both
robust to outliers and statistically efficient when the parametric model is
correct. This paper seeks to extend these results to the use of nonparametric
Bayesian density estimators within disparity methods. We propose two
estimators: one replaces the kernel density estimator with the expected
posterior density from a random histogram prior; the other induces a posterior
over parameters through the posterior for the random histogram. We show that it
is possible to adapt the mathematical machinery of efficient influence
functions from semiparametric models to demonstrate that both our estimators
are efficient in the sense of achieving the Cramer-Rao lower bound. We further
demonstrate a Bernstein-von Mises result for our second estimator, indicating
that its posterior is asymptotically Gaussian. In addition, the robustness
properties of classical minimum Hellinger distance estimators continue to hold.
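As a frequentist baseline for the minimum Hellinger distance idea, here is a sketch that minimizes the squared Hellinger distance between a kernel density estimate and a normal family on a contaminated sample. It illustrates the robustness property only; the paper's Bayesian nonparametric estimators (random histogram priors) are not implemented here, and the data and grid are illustrative assumptions:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
# 5% of the sample are outliers far from the N(0, 1) bulk
data = np.concatenate([rng.normal(0.0, 1.0, 190), rng.normal(8.0, 1.0, 10)])

grid = np.linspace(-6.0, 12.0, 4000)
dx = grid[1] - grid[0]
kde = stats.gaussian_kde(data)(grid)  # nonparametric density estimate

def hellinger_sq(theta):
    """Squared Hellinger distance H^2 = 1 - integral of sqrt(f * g)."""
    mu, log_sigma = theta
    par = stats.norm.pdf(grid, mu, np.exp(log_sigma))
    return 1.0 - np.sum(np.sqrt(kde * par)) * dx

res = optimize.minimize(hellinger_sq, x0=[np.median(data), 0.0],
                        method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
# mu_hat stays near 0, while the raw sample mean is pulled toward the outliers
```

Replacing `kde` with an expected posterior density from a random histogram prior is, in outline, the paper's first proposed estimator.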
Lower Bounds on Exponential Moments of the Quadratic Error in Parameter Estimation
Considering the problem of risk-sensitive parameter estimation, we propose a
fairly wide family of lower bounds on the exponential moments of the quadratic
error, in both the Bayesian and the non-Bayesian regimes. This family of
bounds, which is based on a change of measures, offers considerable freedom in
the choice of the reference measure, and our efforts are devoted to exploring
this freedom to a certain extent. Our focus is mostly on signal models that are
relevant to communication problems, namely, models of a parameter-dependent
signal (modulated signal) corrupted by additive white Gaussian noise, but the
methodology proposed is also applicable to other types of parametric families,
such as models of linear systems driven by random input signals (white noise,
in most cases), and others. In addition to the well-known motivations for the
risk-sensitive cost function (i.e., the exponential quadratic cost function),
most notably its robustness to model uncertainty, we also view this
cost function as a tool for studying fundamental limits concerning the tail
behavior of the estimation error. Another interesting aspect, that we
demonstrate in a certain parametric model, is that the risk-sensitive cost
function may be subjected to phase transitions, owing to some analogies with
statistical mechanics.
Comment: 28 pages; 4 figures; submitted for publication.
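To make the exponential quadratic cost concrete, here is a toy check under an assumed Gaussian location model (unrelated to the paper's lower bounds): the error of the sample mean of n i.i.d. N(theta, sigma^2) draws is N(0, sigma^2/n), so E[exp(s * error^2)] = (1 - 2*s*sigma^2/n)^(-1/2) whenever s < n/(2*sigma^2), and a Monte Carlo estimate agrees. Beyond that threshold in s the moment is infinite, a first hint of the phase-transition behavior mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, s = 100, 1.0, 10.0  # need s < n / (2 * sigma^2) = 50 for finiteness

# error of the sample mean is exactly N(0, v) with v = sigma^2 / n,
# so we can sample it directly instead of averaging n draws each time
v = sigma**2 / n
reps = 200_000
errors = rng.normal(0.0, np.sqrt(v), reps)

# Monte Carlo estimate of the exponential quadratic cost
mc = np.exp(s * errors**2).mean()

# closed form: E[exp(s Z^2)] = (1 - 2 s v)^(-1/2) for Z ~ N(0, v)
analytic = (1.0 - 2.0 * s * v) ** -0.5
```
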
Prediction of time series by statistical learning: general losses and fast rates
We establish rates of convergences in time series forecasting using the
statistical learning approach based on oracle inequalities. A series of papers
extends the oracle inequalities obtained for iid observations to time series
under weak dependence conditions. Given a family of predictors and
observations, oracle inequalities state that a predictor forecasts the series
as well as the best predictor in the family, up to a remainder term.
Using the PAC-Bayesian approach, we establish under weak dependence conditions
oracle inequalities with optimal rates of convergence. We extend previous
results for the absolute loss function to any Lipschitz loss function, with
rates governed by a term that measures the
complexity of the model. We apply the method with quantile loss functions to
forecast the French GDP. Under additional conditions on the loss functions
(satisfied by the quadratic loss function) and on the time series, we refine
the rates of convergence to fast rates, which we achieve for the
first time for uniformly mixing processes. These rates are
known to be optimal in the iid case and for individual sequences. In
particular, we generalize the results of Dalalyan and Tsybakov on sparse
regression estimation to the case of autoregression.
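The quantile-loss forecasting step can be sketched as empirical risk minimization with the pinball loss over linear autoregressive predictors. This is a minimal sketch on simulated AR(1) data; the GDP application and the PAC-Bayesian aggregation over a predictor family are beyond this toy:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
# simulate an AR(1) series: X_t = 0.6 * X_{t-1} + noise
n = 500
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + rng.normal()

def pinball(u, tau):
    """Pinball (quantile) loss: tau * u for u >= 0, (tau - 1) * u otherwise."""
    return np.where(u >= 0, tau * u, (tau - 1) * u)

def risk(theta, tau):
    """Empirical pinball risk of the one-step predictor a + b * X_{t-1}."""
    a, b = theta
    pred = a + b * x[:-1]
    return pinball(x[1:] - pred, tau).mean()

# tau = 0.5 targets the conditional median; other tau give prediction bands
theta_med = minimize(lambda th: risk(th, 0.5), [0.0, 0.0],
                     method="Nelder-Mead").x
```

With symmetric noise the fitted slope should land near the true autoregressive coefficient 0.6.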
Bayesian Bootstrap Analysis of Systems of Equations
Research Methods / Statistical Methods
Good, great, or lucky? Screening for firms with sustained superior performance using heavy-tailed priors
This paper examines historical patterns of ROA (return on assets) for a
cohort of 53,038 publicly traded firms across 93 countries, measured over the
past 45 years. Our goal is to screen for firms whose ROA trajectories suggest
that they have systematically outperformed their peer groups over time. Such a
project faces at least three statistical difficulties: adjustment for relevant
covariates, massive multiplicity, and longitudinal dependence. We conclude
that, once these difficulties are taken into account, demonstrably superior
performance appears to be quite rare. We compare our findings with other recent
management studies on the same subject, and with the popular literature on
corporate success. Our methodological contribution is to propose a new class of
priors for use in large-scale simultaneous testing. These priors are based on
the hypergeometric inverted-beta family, and have two main attractive features:
heavy tails and computational tractability. The family is a four-parameter
generalization of the normal/inverted-beta prior, and is the natural conjugate
prior for shrinkage coefficients in a hierarchical normal model. Our results
emphasize the usefulness of these heavy-tailed priors in large multiple-testing
problems, as they have a mild rate of tail decay in the marginal likelihood,
a property long recognized to be important in testing.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org), DOI: http://dx.doi.org/10.1214/11-AOAS512.
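The appeal of heavy tails can be seen in a one-observation normal-means sketch. A standard Cauchy prior stands in for the paper's hypergeometric inverted-beta family purely for illustration: a normal prior shrinks a large observation by a constant fraction, while a heavy-tailed prior leaves a clear signal nearly untouched:

```python
import numpy as np

grid = np.linspace(-60.0, 60.0, 24001)

def post_mean(y, log_prior):
    """Posterior mean for y | theta ~ N(theta, 1), prior given up to a constant."""
    logw = -0.5 * (y - grid) ** 2 + log_prior(grid)
    w = np.exp(logw - logw.max())  # normalize on the log scale for stability
    return np.sum(grid * w) / np.sum(w)

normal_lp = lambda t: -0.5 * t**2      # N(0, 1) prior, up to a constant
cauchy_lp = lambda t: -np.log1p(t**2)  # standard Cauchy prior, up to a constant

y = 10.0
pm_normal = post_mean(y, normal_lp)  # conjugate answer: y / 2 = 5.0
pm_cauchy = post_mean(y, cauchy_lp)  # close to y: the signal survives shrinkage
```

In a multiple-testing context, this is the mechanism behind the mild tail decay of the marginal likelihood: truly large effects are not penalized for being large.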