Inference for High-Dimensional Sparse Econometric Models
This article is about estimation and inference methods for high dimensional
sparse (HDS) regression models in econometrics. High dimensional sparse models
arise in situations where many regressors (or series terms) are available and
the regression function is well-approximated by a parsimonious, yet unknown set
of regressors. The latter condition makes it possible to estimate the entire
regression function effectively by searching for approximately the right set of
regressors. We discuss methods for identifying this set of regressors and
estimating their coefficients based on l1-penalization and describe key
theoretical results. In order to capture realistic practical situations, we
expressly allow for imperfect selection of regressors and study the impact of
this imperfect selection on estimation and inference results. We focus the main
part of the article on the use of HDS models and methods in the instrumental
variables model and the partially linear model. We present a set of novel
inference results for these models and illustrate their use with applications
to returns to schooling and growth regression.
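As a toy illustration of the l1-penalized estimation the abstract refers to, here is a minimal numpy sketch of the Lasso fitted by cyclic coordinate descent on synthetic sparse data. This is an illustrative implementation, not the authors' procedure; all names and parameter values are made up for the example.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """l1-penalized least squares (Lasso) via cyclic coordinate descent.

    Minimizes (1/2n) * ||y - X b||^2 + lam * ||b||_1.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # per-coordinate curvature
    r = y - X @ b                       # current residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]         # partial residual without coordinate j
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

# Sparse ground truth: only 2 of 20 regressors matter.
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[0], beta[1] = 3.0, -2.0
y = X @ beta + 0.1 * rng.standard_normal(n)
b_hat = lasso_cd(X, y, lam=0.1)
```

The l1 penalty sets the coefficients of the irrelevant regressors exactly to zero, which is the "selection" effect the abstract discusses; the nonzero coefficients are shrunk slightly toward zero, illustrating why post-selection inference has to account for imperfect selection.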
A Geometric View on Constrained M-Estimators
We study the estimation error of constrained M-estimators, and derive
explicit upper bounds on the expected estimation error determined by the
Gaussian width of the constraint set. Both of the cases where the true
parameter is on the boundary of the constraint set (matched constraint), and
where the true parameter is strictly in the constraint set (mismatched
constraint) are considered. For both cases, we derive novel universal
estimation error bounds for regression in a generalized linear model with the
canonical link function. Our error bound for the mismatched constraint case is
minimax optimal in terms of its dependence on the sample size, for Gaussian
linear regression by the Lasso.
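The Gaussian width driving these bounds can be illustrated numerically. The sketch below is a Monte Carlo estimate under an assumed choice of constraint set, the unit l1-ball, for which the supremum defining the width has a closed form; it is not taken from the paper.

```python
import numpy as np

def gaussian_width_l1_ball(d, n_samples=20000, seed=0):
    """Monte Carlo estimate of the Gaussian width of the unit l1-ball in R^d.

    w(K) = E sup_{u in K} <g, u> with g standard Gaussian; for
    K = {u : ||u||_1 <= 1} the supremum is attained at a signed
    coordinate vector, so sup_{u in K} <g, u> = ||g||_inf.
    """
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n_samples, d))
    return np.abs(g).max(axis=1).mean()

w = gaussian_width_l1_ball(d=100)
```

The width of the l1-ball grows only logarithmically in the ambient dimension (roughly sqrt(2 log d)), while the l2-ball's width grows like sqrt(d); this gap is why l1-type constraints yield small estimation error from few samples.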
Design of Experiments for Screening
The aim of this paper is to review methods of designing screening
experiments, ranging from designs originally developed for physical experiments
to those especially tailored to experiments on numerical models. The strengths
and weaknesses of the various designs for screening variables in numerical
models are discussed. First, classes of factorial designs for experiments to
estimate main effects and interactions through a linear statistical model are
described, specifically regular and nonregular fractional factorial designs,
supersaturated designs and systematic fractional replicate designs. Generic
issues of aliasing, bias and cancellation of factorial effects are discussed.
Second, group screening experiments are considered including factorial group
screening and sequential bifurcation. Third, random sampling plans are
discussed including Latin hypercube sampling and sampling plans to estimate
elementary effects. Fourth, a variety of modelling methods commonly employed
with screening designs are briefly described. Finally, a novel study
demonstrates six screening methods on two frequently-used exemplars, and their
performances are compared.
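As a concrete illustration of one of the random sampling plans reviewed above, here is a minimal numpy sketch of Latin hypercube sampling. The function name and the stratify-then-permute construction are illustrative choices, not taken from the paper.

```python
import numpy as np

def latin_hypercube(n, d, seed=None):
    """Latin hypercube sample of n points in [0, 1)^d.

    Each axis is cut into n equal strata; each stratum receives
    exactly one point, with strata paired at random across axes.
    """
    rng = np.random.default_rng(seed)
    # One uniform point inside each of the n strata, per dimension.
    u = (rng.random((n, d)) + np.arange(n)[:, None]) / n
    # Randomly re-pair the strata across dimensions.
    for j in range(d):
        u[:, j] = rng.permutation(u[:, j])
    return u

pts = latin_hypercube(n=10, d=3, seed=42)
```

Unlike plain random sampling, every one-dimensional projection of the design is evenly stratified, which is what makes Latin hypercube designs attractive for screening many inputs of a numerical model with few runs.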
Estimation of high-dimensional low-rank matrices
Suppose that we observe entries or, more generally, linear combinations of
entries of an unknown m x T-matrix A corrupted by noise. We are particularly
interested in the high-dimensional setting where the number mT of unknown
entries can be much larger than the sample size N. Motivated by several
applications, we consider estimation of matrix A under the assumption that it
has small rank. This can be viewed as a dimension reduction or sparsity
assumption. In order to shrink toward a low-rank representation, we
investigate penalized least squares estimators with a Schatten-p quasi-norm
penalty term, 0 < p <= 1. We study these estimators under two possible
assumptions---a modified version of the restricted isometry condition and a
uniform bound on the ratio "empirical norm induced by the sampling
operator/Frobenius norm." The main results are stated as nonasymptotic upper
bounds on the prediction risk and on the Schatten-q risk of the estimators,
where q is in [p, 2]. The rates that we obtain for the prediction risk are of
the form rm/N (for m = T), up to logarithmic factors, where r is the rank of
A. The particular examples of multi-task learning and matrix completion are
worked out in detail. The proofs are based on tools from the theory of
empirical processes. As a by-product, we derive bounds for the kth entropy
numbers of the quasi-convex Schatten class embeddings of S_p^M into S_2^M,
0 < p <= 1, which are of independent interest.

Comment: Published at http://dx.doi.org/10.1214/10-AOS860 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
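For the Schatten-1 (nuclear-norm) case, shrinkage toward a low-rank representation can be sketched as soft-thresholding of singular values. The example below illustrates the general idea on a noisy rank-2 matrix; it is a simplified denoising setting, not the sampling-operator estimator analyzed in the paper.

```python
import numpy as np

def svd_soft_threshold(Y, lam):
    """Proximal step for the nuclear-norm (Schatten-1) penalty.

    Solves argmin_A (1/2) * ||Y - A||_F^2 + lam * ||A||_S1 by
    soft-thresholding the singular values of the observation Y.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)  # small singular values -> 0
    return U @ np.diag(s_shrunk) @ Vt

# Rank-2 signal observed under additive Gaussian noise.
rng = np.random.default_rng(1)
m, T, r = 50, 40, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, T))
Y = A + 0.1 * rng.standard_normal((m, T))
A_hat = svd_soft_threshold(Y, lam=1.5)
```

With lam above the operator norm of the noise, the spurious singular values are zeroed out, so the estimate recovers the low rank of the signal while reducing the estimation error relative to the raw observation.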
PAC-Bayesian High Dimensional Bipartite Ranking
This paper is devoted to the bipartite ranking problem, a classical
statistical learning task, in a high dimensional setting. We propose a scoring
and ranking strategy based on the PAC-Bayesian approach. We consider nonlinear
additive scoring functions, and we derive non-asymptotic risk bounds under a
sparsity assumption. In particular, oracle inequalities in probability holding
under a margin condition assess the performance of our procedure, and prove its
minimax optimality. An MCMC-flavored algorithm is proposed to implement our
method, and its behavior is illustrated on synthetic and real-life datasets.
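The quantity at stake in bipartite ranking, the pairwise ranking risk (one minus the AUC), can be computed directly. The numpy sketch below evaluates it for an illustrative sparse linear scoring function on synthetic data; it is not the PAC-Bayesian procedure of the paper.

```python
import numpy as np

def ranking_risk(scores, labels):
    """Empirical bipartite ranking risk: the fraction of
    (positive, negative) pairs that the scoring function orders
    incorrectly (equals 1 - AUC), counting ties as half an error."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]   # all positive-negative pairs
    return (diff < 0).mean() + 0.5 * (diff == 0).mean()

# High-dimensional synthetic data: only the first coordinate is
# informative, so a sparse scoring direction suffices.
rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.standard_normal((n, d))
y = (X[:, 0] + 0.5 * rng.standard_normal(n) > 0).astype(int)
w = np.zeros(d)
w[0] = 1.0                               # sparse linear scorer
risk = ranking_risk(X @ w, y)
```

A score aligned with the informative coordinate achieves a small pairwise ranking risk, while reversing the score drives the risk toward one; sparsity of the scoring function is what the non-asymptotic bounds in the paper exploit.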