9,776 research outputs found

    Inference for High-Dimensional Sparse Econometric Models

    This article is about estimation and inference methods for high dimensional sparse (HDS) regression models in econometrics. High dimensional sparse models arise in situations where many regressors (or series terms) are available and the regression function is well-approximated by a parsimonious, yet unknown set of regressors. The latter condition makes it possible to estimate the entire regression function effectively by searching for approximately the right set of regressors. We discuss methods for identifying this set of regressors and estimating their coefficients based on $\ell_1$-penalization and describe key theoretical results. In order to capture realistic practical situations, we expressly allow for imperfect selection of regressors and study the impact of this imperfect selection on estimation and inference results. We focus the main part of the article on the use of HDS models and methods in the instrumental variables model and the partially linear model. We present a set of novel inference results for these models and illustrate their use with applications to returns to schooling and growth regression.

    A Geometric View on Constrained M-Estimators

    We study the estimation error of constrained M-estimators, and derive explicit upper bounds on the expected estimation error determined by the Gaussian width of the constraint set. We consider both the case where the true parameter is on the boundary of the constraint set (matched constraint) and the case where the true parameter is strictly in the constraint set (mismatched constraint). For both cases, we derive novel universal estimation error bounds for regression in a generalized linear model with the canonical link function. Our error bound for the mismatched constraint case is minimax optimal in terms of its dependence on the sample size, for Gaussian linear regression by the Lasso.

    Design of Experiments for Screening

    The aim of this paper is to review methods of designing screening experiments, ranging from designs originally developed for physical experiments to those especially tailored to experiments on numerical models. The strengths and weaknesses of the various designs for screening variables in numerical models are discussed. First, classes of factorial designs for experiments to estimate main effects and interactions through a linear statistical model are described, specifically regular and nonregular fractional factorial designs, supersaturated designs and systematic fractional replicate designs. Generic issues of aliasing, bias and cancellation of factorial effects are discussed. Second, group screening experiments are considered, including factorial group screening and sequential bifurcation. Third, random sampling plans are discussed, including Latin hypercube sampling and sampling plans to estimate elementary effects. Fourth, a variety of modelling methods commonly employed with screening designs are briefly described. Finally, a novel study demonstrates six screening methods on two frequently-used exemplars, and their performances are compared.

    Estimation of high-dimensional low-rank matrices

    Suppose that we observe entries or, more generally, linear combinations of entries of an unknown $m\times T$-matrix $A$ corrupted by noise. We are particularly interested in the high-dimensional setting where the number $mT$ of unknown entries can be much larger than the sample size $N$. Motivated by several applications, we consider estimation of matrix $A$ under the assumption that it has small rank. This can be viewed as dimension reduction or sparsity assumption. In order to shrink toward a low-rank representation, we investigate penalized least squares estimators with a Schatten-$p$ quasi-norm penalty term, $p\leq1$. We study these estimators under two possible assumptions: a modified version of the restricted isometry condition and a uniform bound on the ratio "empirical norm induced by the sampling operator/Frobenius norm." The main results are stated as nonasymptotic upper bounds on the prediction risk and on the Schatten-$q$ risk of the estimators, where $q\in[p,2]$. The rates that we obtain for the prediction risk are of the form $rm/N$ (for $m=T$), up to logarithmic factors, where $r$ is the rank of $A$. The particular examples of multi-task learning and matrix completion are worked out in detail. The proofs are based on tools from the theory of empirical processes. As a by-product, we derive bounds for the $k$th entropy numbers of the quasi-convex Schatten class embeddings $S_p^M\hookrightarrow S_2^M$, $p<1$, which are of independent interest. Comment: Published at http://dx.doi.org/10.1214/10-AOS860 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    PAC-Bayesian High Dimensional Bipartite Ranking

    This paper is devoted to the bipartite ranking problem, a classical statistical learning task, in a high dimensional setting. We propose a scoring and ranking strategy based on the PAC-Bayesian approach. We consider nonlinear additive scoring functions, and we derive non-asymptotic risk bounds under a sparsity assumption. In particular, oracle inequalities in probability holding under a margin condition assess the performance of our procedure, and prove its minimax optimality. An MCMC-flavored algorithm is proposed to implement our method, along with its behavior on synthetic and real-life datasets.