Consistency of cross validation for comparing regression procedures
Theoretical developments on cross validation (CV) have mainly focused on
selecting one among a list of finite-dimensional models (e.g., subset or order
selection in linear regression) or selecting a smoothing parameter (e.g.,
bandwidth for kernel smoothing). However, little is known about the consistency of
cross validation when it is used to compare parametric with nonparametric
methods, or to compare among nonparametric methods. We show that under some conditions,
with an appropriate choice of data splitting ratio, cross validation is
consistent in the sense of selecting the better procedure with probability
approaching 1. Our results reveal interesting behavior of cross validation.
When comparing two models (procedures) converging at the same nonparametric
rate, in contrast to the parametric case, it turns out that the proportion of
data used for evaluation in CV does not need to be dominating in size.
Furthermore, it can even be of a smaller order than the proportion for
estimation while not affecting the consistency property.
Comment: Published at http://dx.doi.org/10.1214/009053607000000514 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
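The splitting-ratio idea above can be sketched with a minimal single-split comparison between a parametric and a nonparametric procedure. The data, fitters, and 50/50 split below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def cv_compare(x, y, fit_a, fit_b, eval_frac=0.5):
    """Single-split cross validation: fit on one part, evaluate on the
    other; eval_frac is the data-splitting ratio the abstract discusses."""
    idx = rng.permutation(len(x))
    n_eval = int(len(x) * eval_frac)
    ev, tr = idx[:n_eval], idx[n_eval:]
    err_a = np.mean((y[ev] - fit_a(x[tr], y[tr])(x[ev])) ** 2)
    err_b = np.mean((y[ev] - fit_b(x[tr], y[tr])(x[ev])) ** 2)
    return "A" if err_a < err_b else "B"

def fit_linear(xt, yt):
    # Parametric procedure: simple linear regression.
    slope, intercept = np.polyfit(xt, yt, 1)
    return lambda x: intercept + slope * x

def fit_kernel(xt, yt, h=0.1):
    # Nonparametric procedure: Nadaraya-Watson kernel smoother.
    def predict(x):
        w = np.exp(-0.5 * ((x[:, None] - xt[None, :]) / h) ** 2)
        return (w @ yt) / w.sum(axis=1)
    return predict

x = rng.uniform(0.0, 1.0, 500)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3, 500)  # truth is linear here
print(cv_compare(x, y, fit_linear, fit_kernel))
```

With a truly linear mean function, the consistency result says the linear procedure should be selected with probability approaching 1 as the sample grows; varying `eval_frac` probes the abstract's point about how the evaluation proportion matters.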
Maximum Lq-likelihood estimation
In this paper, the maximum Lq-likelihood estimator (MLqE), a new
parameter estimator based on nonextensive entropy [Kibernetika 3 (1967) 30--35],
is introduced. The properties of the MLqE are studied via asymptotic analysis
and computer simulations. The behavior of the MLqE is characterized by the
degree of distortion q applied to the assumed model. When q is properly
chosen for small and moderate sample sizes, the MLqE can successfully trade
bias for precision, resulting in a substantial reduction of the mean squared
error. When the sample size is large and q tends to 1, a necessary and
sufficient condition to ensure proper asymptotic normality and efficiency of
the MLqE is established.
Comment: Published at http://dx.doi.org/10.1214/09-AOS687 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
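A minimal numerical sketch of the idea: the deformed logarithm Lq(u) = (u^(1-q) - 1)/(1-q) replaces log(u) in the likelihood, and tends to log(u) as q tends to 1. The grid search and the fixed-variance normal model below are simplifications for illustration, not the paper's procedure:

```python
import numpy as np

def Lq(u, q):
    # Deformed logarithm: (u**(1-q) - 1) / (1 - q); tends to log(u) as q -> 1.
    if abs(q - 1.0) < 1e-12:
        return np.log(u)
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)

def mlq_mean(x, q, sigma=1.0):
    """Maximize the Lq-likelihood of N(mu, sigma^2) over mu by grid search.
    For q < 1, low-density observations (outliers) get downweighted."""
    grid = np.linspace(x.min(), x.max(), 4001)
    z = (x[None, :] - grid[:, None]) / sigma
    dens = np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return grid[np.argmax(Lq(dens, q).sum(axis=1))]

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 200)
print(mlq_mean(x, q=1.0))  # ordinary maximum likelihood: the sample mean
print(mlq_mean(x, q=0.9))  # distorted version, trading bias for variance
```

At q = 1 the criterion is the ordinary log-likelihood, so the maximizer coincides with the sample mean; for q < 1 the estimator is a density-weighted mean, which is the bias-for-precision trade-off the abstract describes.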
Forecast Combination Under Heavy-Tailed Errors
Forecast combination has been proven to be a very important technique to
obtain accurate predictions. In many applications, forecast errors exhibit
heavy tail behaviors for various reasons. Unfortunately, to our knowledge,
little has been done to deal with forecast combination for such situations. The
familiar forecast combination methods such as simple average, least squares
regression, or those based on variance-covariance of the forecasts, may perform
very poorly. In this paper, we propose two nonparametric forecast combination
methods to address the problem. One is tailored to situations in which the
forecast errors are strongly believed to have heavy tails that can be modeled
by a scaled Student's t-distribution; the other is designed for more general
situations in which there is a lack of strong or consistent evidence about the
tail behavior of the forecast errors, due to a shortage of data and/or an
evolving data-generating process. Adaptive risk bounds for both methods are
developed. Simulations and a real example show the superior performance of the
new methods.
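As a rough illustration of the first idea, errors can be scored under a scaled Student's t model and forecasters weighted by their cumulative t log-likelihood. The sequential exponential-weighting scheme, the fixed scale, and df = 3 below are generic assumptions for the sketch, not the paper's exact algorithm:

```python
import numpy as np
from math import lgamma, log, pi

def t_log_density(e, scale=1.0, df=3.0):
    # Log-density of a scaled Student's t; its heavy tails penalize large
    # errors far less than a Gaussian would, so one outlier cannot ruin a weight.
    c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi) - log(scale)
    return c - (df + 1) / 2 * np.log1p((e / scale) ** 2 / df)

def combine(forecasts, y):
    """Sequentially weight each forecaster by the cumulative t log-likelihood
    of its past errors, then combine with the normalized weights."""
    T, K = forecasts.shape
    log_w = np.zeros(K)
    out = np.empty(T)
    for t in range(T):
        w = np.exp(log_w - log_w.max())
        out[t] = (w / w.sum()) @ forecasts[t]
        log_w += t_log_density(y[t] - forecasts[t])
    return out

rng = np.random.default_rng(2)
y = np.sin(np.linspace(0, 6, 300))
good = y + 0.1 * rng.standard_t(3, 300)  # accurate forecaster
bad = y + 1.0 * rng.standard_t(3, 300)   # noisy, heavy-tailed forecaster
combined = combine(np.column_stack([good, bad]), y)
print(np.mean(np.abs(combined - y)))
```

The combined forecast quickly concentrates its weight on the accurate forecaster, while the heavy-tailed scoring keeps occasional extreme errors from swinging the weights as violently as a squared-error rule would.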
Adaptive Regression by Mixing
Adaptation over different procedures is of practical importance, since different procedures perform well under different conditions. In many practical situations it is hard to assess which conditions are (approximately) satisfied, and hence to identify the best procedure for the data at hand, so automatic adaptation over various scenarios is desirable. A practically feasible method, named adaptive regression by mixing (ARM), is proposed to convexly combine general candidate regression procedures. Under mild conditions, the resulting estimator is theoretically shown to perform optimally in rates of convergence without knowing which of the original procedures works best.

Simulations are conducted in several settings, including comparing a parametric model with nonparametric alternatives, comparing a neural network with projection pursuit in multidimensional regression, and combining bandwidths in kernel regression. The results clearly support the theoretical properties of ARM.

The ARM algorithm assigns weights to the candidate models (procedures) via a proper assessment of the performance of the estimators. The data are split into two parts: one for estimation and the other for measuring predictive behavior. Although there are many plausible ways to assign the weights, ARM has a connection with information theory, which ensures the desired adaptation capability. Indeed, under mild conditions, we show that the squared L2 risk of the estimator based on ARM is bounded above by the risk of each candidate procedure plus a small penalty term of order 1/n; minimizing over the procedures then gives the automatically optimal rate of convergence for ARM. Model selection often induces unnecessarily large variability in estimation; a proper weighting of the candidate models can be more stable, resulting in a smaller risk. Simulations suggest that ARM works better than model selection using the Akaike or Bayesian information criteria when the error variance is not very small.
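The data-splitting and weighting step can be sketched as follows. Gaussian likelihood scoring on the held-out half, a single split, and the two toy candidate procedures are simplifying assumptions for illustration, not the paper's full algorithm:

```python
import numpy as np

def arm_weights(x, y, procedures, seed=0):
    """One split of ARM-style weighting: fit each candidate on half the
    data, score its Gaussian predictive likelihood on the other half,
    and exponentiate the scores into convex combination weights."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    tr, ev = idx[: len(x) // 2], idx[len(x) // 2:]
    scores = []
    for fit in procedures:
        predict = fit(x[tr], y[tr])
        sigma = np.std(y[tr] - predict(x[tr])) + 1e-8  # scale from the fit half
        e = y[ev] - predict(x[ev])
        # Gaussian log predictive density of the held-out responses
        scores.append(-len(ev) * np.log(sigma) - 0.5 * np.sum((e / sigma) ** 2))
    scores = np.array(scores)
    w = np.exp(scores - scores.max())
    return w / w.sum()

# Two hypothetical candidate procedures: a constant fit and a linear fit.
fit_const = lambda xt, yt: (lambda x: np.full_like(x, yt.mean()))
def fit_line(xt, yt):
    slope, intercept = np.polyfit(xt, yt, 1)
    return lambda x: intercept + slope * x

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, 300)
y = 0.5 + 3.0 * x + rng.normal(0.0, 0.2, 300)  # clearly linear truth
print(arm_weights(x, y, [fit_const, fit_line]))
```

Because the weights are an exponential of a log-likelihood score, a candidate whose held-out predictions are clearly better receives nearly all the mass, while comparably good candidates share it, which is the stability-over-selection point the abstract makes.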