Model Selection, Uniform Inference and Nonparametric Regression
Model selection in the nonparametric regression model is inevitable since any
nonparametric estimator requires tuning parameters to be specified in order for it to be
feasible.
It is, however, standard practice to carry over the theory of nonparametric estimators
when the model is fixed to the case where the tuning parameters are no longer fixed, but
chosen by possibly data-driven model selection algorithms. This theory is not necessarily valid, as the model-selection step is not taken into account. This thesis contributes to
the nonparametric econometrics and statistics literature and, in particular, to the theory
of series estimators, by showing that such estimators have desirable properties and that
valid inference is possible even when a model-selection step precedes estimation.
The first chapter is concerned with K-fold cross-validation and shows that the cross-validated least-squares estimator predicts the response as well as the infeasible best linear predictor, whose dimension may diverge with the sample size. This property, known as risk consistency, is uncommon in econometrics, but it has the benefit of holding under few and very weak conditions. The risk-consistency result relies crucially on a non-asymptotic analysis of the difference between the prediction error of the cross-validated estimator and that of the best linear predictor. As the dimension of the parameter may diverge, this set-up covers both the high-dimensional linear model and the nonparametric regression model, which reduces the need for duplicate theories. An extensive Monte Carlo experiment corroborates the theoretical results by showing that the non-asymptotic bound becomes arbitrarily small as the sample size grows.
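As a rough illustration of the kind of procedure the first chapter studies, the sketch below selects the degree of a polynomial series estimator by K-fold cross-validation. The function names, the polynomial basis, and the toy data-generating process are illustrative assumptions, not the thesis's construction.

```python
import numpy as np

def kfold_cv_degree(x, y, max_degree, K=5, seed=0):
    """Select the polynomial degree of a series estimator by K-fold
    cross-validation (illustrative names, not the thesis's notation)."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % K
    cv_err = np.zeros(max_degree)
    for d in range(1, max_degree + 1):
        X = np.vander(x, d + 1, increasing=True)  # basis 1, x, ..., x^d
        for k in range(K):
            tr, te = folds != k, folds == k
            beta, *_ = np.linalg.lstsq(X[tr], y[tr], rcond=None)
            cv_err[d - 1] += np.sum((y[te] - X[te] @ beta) ** 2)
    return int(np.argmin(cv_err)) + 1  # degree with the smallest CV error

# toy data whose conditional mean is quadratic
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 400)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.1 * rng.standard_normal(400)
print(kfold_cv_degree(x, y, max_degree=8))
```

Degree one is heavily penalised because it misses the quadratic term, while higher degrees add only estimation variance, so the cross-validated choice concentrates near the true degree.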
The second chapter returns to more classical statistics and econometrics by studying the
uniform consistency of the series estimator for the conditional mean function and its
linear functionals. The uniformity holds both over the support of the covariates and over the models considered. Under high-level assumptions, a non-asymptotic linearisation
result delivers uniform rates of convergence for the series estimator. By verifying the
high-level assumptions, case-specific rates can easily be derived. For example, the series
estimator attains, up to a small logarithmic penalty, the minimax rate of convergence for
functions lying in a Hölder ball.
The results from the second chapter form the basis for the inference procedure proposed in the final chapter to construct valid uniform confidence bands for the series
estimator. The uniform confidence bands are valid in the sense that they control the
asymptotic size for the conditional mean function, or its linear functionals, seen as a process in the covariates and the models considered. Given that the results hold uniformly
over the models considered, the inference procedure is valid regardless of which
model-selection algorithm delivers the final model used to estimate the parameters of
interest.
The key quantity is the maximal t-statistic correctly studentised using an estimator for
the standard error. The theory relies on the uniform linearisation result from chapter two
and the concept of strong approximations, or couplings, as the limit distribution of the
maximal t-statistic does not exist. A Monte Carlo study establishes that the uniform
confidence bands have the correct coverage even in finite samples. The chapter concludes
with an application testing for shape restrictions on the demand function for gasoline in
the US using a cross-validated series estimator.
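Because the maximal t-statistic has no limiting distribution, critical values for such bands are in practice simulated. The sketch below uses a Gaussian-multiplier bootstrap for a series estimator, one standard way to implement the coupling idea; it is a simplified stand-in under assumed homoskedastic-style standard errors, not the thesis's exact procedure.

```python
import numpy as np

def max_t_critical_value(X, resid, grid_X, n_boot=2000, alpha=0.05, seed=0):
    """Multiplier-bootstrap critical value for the maximal t-statistic
    of a series estimator over a grid of covariate values.

    X: n x d matrix of series terms at the data points,
    resid: n-vector of regression residuals,
    grid_X: m x d matrix of series terms on the evaluation grid.
    Simplified sketch, not the thesis's exact construction."""
    n = X.shape[0]
    Q_inv = np.linalg.inv(X.T @ X / n)
    psi = grid_X @ Q_inv @ X.T / n                  # m x n influence weights
    se = np.sqrt(np.sum((psi * resid) ** 2, axis=1))  # pointwise std. errors
    rng = np.random.default_rng(seed)
    sup_stats = np.empty(n_boot)
    for b in range(n_boot):
        g = rng.standard_normal(n)                  # Gaussian multipliers
        sup_stats[b] = np.max(np.abs(psi @ (g * resid)) / se)
    return np.quantile(sup_stats, 1 - alpha)
```

A (1 - alpha) uniform band is then the fitted value plus or minus this critical value times the pointwise standard error at each grid point; because it is a supremum quantile, the critical value exceeds the pointwise normal quantile.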
A Convex Framework for Confounding Robust Inference
We study policy evaluation of offline contextual bandits subject to
unobserved confounders. Sensitivity analysis methods are commonly used to
estimate the policy value under the worst-case confounding over a given
uncertainty set. However, existing work often resorts to some coarse relaxation
of the uncertainty set for the sake of tractability, leading to overly
conservative estimation of the policy value. In this paper, we propose a
general estimator that provides a sharp lower bound of the policy value using
convex programming. The generality of our estimator enables various extensions
such as sensitivity analysis with f-divergences, model selection with cross-validation and information criteria, and robust policy learning with the sharp
lower bound. Furthermore, our estimation method can be reformulated as an empirical risk minimization problem thanks to strong duality, which enables us to provide strong theoretical guarantees for the proposed estimator using techniques from M-estimation.

Comment: This is an extension of the following work: https://proceedings.mlr.press/v206/ishikawa23a.html. arXiv admin note: text overlap with arXiv:2302.1334
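To make the setup concrete, here is a minimal sketch of the kind of coarse worst-case bound the paper improves upon. Under a marginal sensitivity model, the inverse-propensity weights are known only up to a factor gamma, and a conservative lower bound on the policy value is a linear program over the weights. This is the classic relaxation that the paper's sharp convex-programming estimator tightens; all names and the model are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def worst_case_policy_value(rewards, base_w, gamma):
    """Conservative lower bound on the policy value when each
    inverse-propensity weight w_i may lie anywhere in
    [base_w_i / gamma, base_w_i * gamma]: minimise the average
    weighted reward subject to the weights averaging to one.
    Illustrative LP relaxation, not the paper's sharp estimator."""
    n = len(rewards)
    res = linprog(
        c=rewards / n,                        # minimise (1/n) sum w_i r_i
        A_eq=np.ones((1, n)), b_eq=[n],       # weights average to one
        bounds=list(zip(base_w / gamma, base_w * gamma)),
        method="highs",
    )
    return res.fun
```

At gamma = 1 the bounds pin the weights down and the LP returns the usual weighted estimate; as gamma grows, mass shifts toward low rewards and the bound becomes more conservative, which is exactly the slack a sharp uncertainty set avoids.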
Multiplicative local linear hazard estimation and best one-sided cross-validation
This paper develops detailed mathematical-statistical theory for a new class of cross-validation techniques for local linear kernel hazard estimators and their multiplicative bias corrections. The new class combines principles of local information with recent advances in indirect cross-validation. A few applications of cross-validating multiplicative kernel hazard estimators do exist in the literature; however, detailed theory and small-sample performance are introduced in this paper and further extended to our new class of best one-sided cross-validation. Best one-sided cross-validation turns out to have excellent performance in its practical illustrations, in its small-sample behaviour, and in its theoretical properties.
Optimal cross-validation in density estimation with the L2-loss
We analyze the performance of cross-validation (CV) in the density estimation framework with two purposes: (i) risk estimation and (ii) model selection. The main focus is given to the so-called leave-p-out CV procedure (Lpo), where p denotes the cardinality of the test set. Closed-form expressions are settled for the Lpo estimator of the risk of projection estimators. These expressions provide a great improvement upon V-fold cross-validation in terms of variability and computational complexity. From a theoretical point of view, closed-form expressions also enable to study the Lpo performance in terms of risk estimation. The optimality of leave-one-out (Loo), that is, Lpo with p = 1, is proved among CV procedures used for risk estimation. Two model selection frameworks are also considered: estimation, as opposed to identification. For estimation with finite sample size n, optimality is achieved for p large enough [with p/n = o(1)] to balance the overfitting resulting from the structure of the model collection. For identification, model selection consistency is settled for Lpo as long as p/n is conveniently related to the rate of convergence of the best estimator in the collection: (i) p/n → 1 as n → +∞ with a parametric rate, and (ii) p/n ≤ C < 1 with some nonparametric estimators. These theoretical results are validated by simulation experiments.

Comment: Published at http://dx.doi.org/10.1214/14-AOS1240 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org/).
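The simplest instance of such closed-form expressions is leave-one-out least-squares CV for a histogram density estimator, where the score reduces to a function of the bin counts. The sketch below states that closed form and checks it against the direct leave-one-out computation; it is an illustration in the spirit of the paper, not its general Lpo formulas.

```python
import numpy as np

def loo_lscv_closed_form(x, n_bins, lo=0.0, hi=1.0):
    """Closed-form leave-one-out least-squares CV score for a histogram
    on [lo, hi]: 2/((n-1)h) - (n+1) * sum_j N_j^2 / (n^2 (n-1) h)."""
    n = len(x)
    h = (hi - lo) / n_bins
    counts, _ = np.histogram(x, bins=n_bins, range=(lo, hi))
    return 2.0 / ((n - 1) * h) - (n + 1) * np.sum(counts**2) / (n**2 * (n - 1) * h)

def loo_lscv_direct(x, n_bins, lo=0.0, hi=1.0):
    """Direct evaluation of the L2 CV criterion:
    integral of fhat^2 minus (2/n) * sum_i fhat_{-i}(x_i)."""
    n = len(x)
    h = (hi - lo) / n_bins
    counts, _ = np.histogram(x, bins=n_bins, range=(lo, hi))
    int_f2 = np.sum((counts / (n * h)) ** 2) * h
    j = np.minimum(((x - lo) / h).astype(int), n_bins - 1)  # bin index of x_i
    f_loo = (counts[j] - 1) / ((n - 1) * h)  # leave-one-out density at x_i
    return int_f2 - 2.0 * np.mean(f_loo)

x = np.random.default_rng(0).uniform(0.0, 1.0, 500)
assert abs(loo_lscv_closed_form(x, 20) - loo_lscv_direct(x, 20)) < 1e-10
```

The closed form needs only the bin counts, so evaluating it across bin widths costs a single pass over the data, which is the computational advantage over V-fold resampling that the abstract refers to.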
Model selection consistency from the perspective of generalization ability and VC theory with an application to Lasso
Model selection is difficult to analyse, yet theoretically and empirically important, especially for high-dimensional data analysis. Recently, the least absolute shrinkage and selection operator (Lasso) has been applied in the statistical and econometric literature. Consistency of Lasso has been established under various conditions, some of which are difficult to verify in practice. In this paper, we study model selection from the perspective of generalization ability, under the framework of structural risk minimization (SRM) and Vapnik-Chervonenkis (VC) theory. The approach emphasizes the balance between the in-sample and out-of-sample fit, which can be achieved by using cross-validation to select a penalty on model complexity. We show that an exact relationship exists between the generalization ability of a model and model selection consistency. By implementing SRM and the VC inequality, we show that Lasso is L2-consistent for model selection under assumptions similar to those imposed on OLS. Furthermore, we derive a probabilistic bound for the distance between the penalized extremum estimator and the extremum estimator without penalty, which is dominated by overfitting. We also propose a new measure of overfitting, GR2, based on generalization ability, which converges to zero if model selection is consistent. Using simulations, we demonstrate that the proposed CV-Lasso algorithm performs well in terms of model selection and overfitting control.
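A minimal version of the CV-Lasso idea can be run with scikit-learn's LassoCV as a stand-in for the paper's algorithm; the data-generating process and the library choice are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# sparse linear model: only the first 3 of 20 coefficients are nonzero
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + 0.5 * rng.standard_normal(n)

# cross-validation selects the complexity penalty, balancing
# in-sample fit against out-of-sample (generalisation) error
model = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(np.abs(model.coef_) > 1e-6)
print(selected)
```

With a strong signal-to-noise ratio the cross-validated penalty retains the true support, which is the kind of selection behaviour the paper's GR2 measure is designed to quantify.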