
    Pivotal estimation via square-root Lasso in nonparametric regression

    We propose a self-tuning $\sqrt{\mathrm{Lasso}}$ method that simultaneously resolves three important practical problems in high-dimensional regression analysis: it handles unknown scale, heteroscedasticity, and (drastic) non-Gaussianity of the noise. In addition, our analysis allows for badly behaved designs, for example, perfectly collinear regressors, and generates sharp bounds even in extreme cases, such as the infinite-variance case and the noiseless case, in contrast to Lasso. We establish various nonasymptotic bounds for $\sqrt{\mathrm{Lasso}}$, including prediction norm rate and sparsity. Our analysis is based on new impact factors that are tailored for bounding the prediction norm. In order to cover heteroscedastic non-Gaussian noise, we rely on moderate deviation theory for self-normalized sums to achieve Gaussian-like results under weak conditions. Moreover, we derive bounds on the performance of ordinary least squares (OLS) applied to the model selected by $\sqrt{\mathrm{Lasso}}$, accounting for possible misspecification of the selected model. Under mild conditions, the rate of convergence of OLS post $\sqrt{\mathrm{Lasso}}$ is as good as $\sqrt{\mathrm{Lasso}}$'s rate. As an application, we consider the use of $\sqrt{\mathrm{Lasso}}$ and OLS post $\sqrt{\mathrm{Lasso}}$ as estimators of nuisance parameters in a generic semiparametric problem (nonlinear moment condition or $Z$-problem), resulting in a construction of $\sqrt{n}$-consistent and asymptotically normal estimators of the main parameters.
    Comment: Published at http://dx.doi.org/10.1214/14-AOS1204 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
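    The estimator is simple enough to sketch directly. Below is a minimal illustration (not the authors' implementation) that solves the $\sqrt{\mathrm{Lasso}}$ program, minimizing $\|y - Xb\|_2/\sqrt{n} + \lambda\|b\|_1$, with cvxpy on synthetic data and then refits OLS on the selected support; the penalty level $\lambda$ is a generic theory-motivated choice of mine, not a value taken from the paper. Note that the estimator never uses the noise level, which is what "pivotal" refers to.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, k = 100, 200, 5
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:k] = 1.0
sigma = 0.5                                   # never used by the estimator (pivotal)
y = X @ beta_true + sigma * rng.standard_normal(n)

lam = 1.1 * np.sqrt(2 * np.log(p) / n)        # illustrative penalty level, independent of sigma

# square-root Lasso: ||y - X b||_2 / sqrt(n) + lam * ||b||_1
b = cp.Variable(p)
cp.Problem(cp.Minimize(cp.norm2(y - X @ b) / np.sqrt(n) + lam * cp.norm1(b))).solve()
beta_sqrt_lasso = b.value

# OLS refit on the selected support ("OLS post sqrt-Lasso")
support = np.flatnonzero(np.abs(beta_sqrt_lasso) > 1e-6)
beta_post = np.zeros(p)
beta_post[support], *_ = np.linalg.lstsq(X[:, support], y, rcond=None)

print("selected support:", support)
```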

    Sharp thresholds for high-dimensional and noisy recovery of sparsity

    The problem of consistently estimating the sparsity pattern of a vector $\beta^* \in \mathbb{R}^p$ based on observations contaminated by noise arises in various contexts, including subset selection in regression, structure estimation in graphical models, sparse approximation, and signal denoising. We analyze the behavior of $\ell_1$-constrained quadratic programming (QP), also referred to as the Lasso, for recovering the sparsity pattern. Our main result is to establish a sharp relation between the problem dimension $p$, the number $k$ of non-zero elements in $\beta^*$, and the number of observations $n$ that are required for reliable recovery. For a broad class of Gaussian ensembles satisfying mutual incoherence conditions, we establish the existence of and compute explicit values for thresholds $\theta_\ell$ and $\theta_u$ with the following properties: for any $\epsilon > 0$, if $n > 2(\theta_u + \epsilon)\log(p - k) + k + 1$, then the Lasso succeeds in recovering the sparsity pattern with probability converging to one for large problems, whereas for $n < 2(\theta_\ell - \epsilon)\log(p - k) + k + 1$, the probability of successful recovery converges to zero. For the special case of the uniform Gaussian ensemble, we show that $\theta_\ell = \theta_u = 1$, so that the threshold is sharp and exactly determined.
    Comment: Appeared as Technical Report 708, Department of Statistics, UC Berkeley.
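    As a quick empirical companion to this result, the toy simulation below sweeps the sample size $n$ for a standard Gaussian design and records how often the Lasso recovers the true support exactly, illustrating the kind of sharp transition the paper characterizes. The regularization level $\lambda \approx \sigma\sqrt{2\log p / n}$ and the problem sizes are illustrative choices of mine, not values taken from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
p, k, sigma, trials = 128, 8, 0.5, 25
true_support = np.arange(k)

def recovery_rate(n):
    """Fraction of trials in which the Lasso recovers the support exactly."""
    hits = 0
    for _ in range(trials):
        X = rng.standard_normal((n, p))            # uniform Gaussian ensemble
        beta = np.zeros(p)
        beta[true_support] = 1.0
        y = X @ beta + sigma * rng.standard_normal(n)
        lam = sigma * np.sqrt(2 * np.log(p) / n)   # theory-motivated penalty level
        fit = Lasso(alpha=lam, max_iter=50_000).fit(X, y)
        if np.array_equal(np.flatnonzero(np.abs(fit.coef_) > 1e-3), true_support):
            hits += 1
    return hits / trials

for n in (30, 60, 90, 120, 180, 240):
    print(f"n = {n:4d}  exact-support recovery rate = {recovery_rate(n):.2f}")
```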

    Large-scale Nonlinear Variable Selection via Kernel Random Features

    We propose a new method for input variable selection in nonlinear regression. The method is embedded into a kernel regression machine that can model general nonlinear functions, not being a priori limited to additive models. This is the first kernel-based variable selection method applicable to large datasets. It sidesteps the typical poor scaling properties of kernel methods by mapping the inputs into a relatively low-dimensional space of random features. The algorithm jointly discovers the variables relevant for the regression task and learns the prediction model by learning appropriate nonlinear random feature maps. We demonstrate the outstanding performance of our method on a set of large-scale synthetic and real datasets.
    Comment: Final version for proceedings of ECML/PKDD 201
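    Since the abstract does not spell out the algorithm, the sketch below is only a generic illustration of the underlying idea, not the authors' method: embed per-input scale parameters inside a random Fourier feature map, fit the regression weights in the low-dimensional feature space, and drive the scales of irrelevant inputs to zero with an $\ell_1$ soft-thresholding step. All hyperparameters are ad hoc and untuned.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, D = 500, 10, 300                         # samples, inputs, random features
X = rng.standard_normal((n, p))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(n)   # only x0, x1 relevant

W = rng.standard_normal((D, p))                # random frequencies
b = rng.uniform(0.0, 2.0 * np.pi, D)           # random phases
s = np.ones(p)                                 # per-input scales (learned)
lam_ridge, lam_l1, lr = 1e-2, 0.05, 1e-2

for _ in range(200):
    U = (X * s) @ W.T + b                      # n x D
    Z = np.sqrt(2.0 / D) * np.cos(U)           # random Fourier features of the scaled inputs
    w = np.linalg.solve(Z.T @ Z + lam_ridge * np.eye(D), Z.T @ y)   # ridge step for weights
    r = Z @ w - y
    # gradient of the mean squared error with respect to the scales s
    dLdU = (2.0 / n) * np.outer(r, w) * (-np.sqrt(2.0 / D) * np.sin(U))
    grad_s = np.einsum('id,ij,dj->j', dLdU, X, W)
    s = s - lr * grad_s
    s = np.sign(s) * np.maximum(np.abs(s) - lr * lam_l1, 0.0)       # l1 soft-thresholding

print("learned input scales:", np.round(np.abs(s), 3))   # relevant inputs should keep larger scales
```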

    Selective inference after feature selection via multiscale bootstrap

    It is common to report confidence intervals or $p$-values for selected features (predictor variables in regression), but these often suffer from selection bias. The selective inference approach resolves this bias by conditioning on the selection event. Most existing studies of selective inference consider a specific feature selection algorithm, such as Lasso, and thus have difficulty handling more complicated algorithms. Moreover, existing studies often condition on unnecessarily restrictive events, leading to over-conditioning and lower statistical power. Our novel and widely applicable resampling method addresses these issues by computing an approximately unbiased selective $p$-value for the selected features. We prove that the $p$-value computed by our resampling method is more accurate and more powerful than those of existing methods, while the computational cost is of the same order as that of the classical bootstrap method. Numerical experiments demonstrate that our algorithm works well even for more complicated feature selection methods such as non-convex regularization.
    Comment: The title has changed (the previous title was "Selective inference after variable selection via multiscale bootstrap"). 23 pages, 11 figures.
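    The bias being corrected here is easy to see in a toy simulation. The snippet below (my illustration, not the paper's multiscale bootstrap) generates data under a global null, selects the feature most correlated with the response, and reports its naive $p$-value; because the selection event is ignored, far more than 5% of these $p$-values fall below 0.05, which is exactly the over-optimism that conditioning on the selection event is meant to remove.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p, reps = 100, 50, 500
naive_pvals = []
for _ in range(reps):
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)                               # global null: y independent of X
    cors = [abs(stats.pearsonr(X[:, j], y)[0]) for j in range(p)]
    j_sel = int(np.argmax(cors))                             # feature selection step
    naive_pvals.append(stats.pearsonr(X[:, j_sel], y)[1])    # naive p-value, ignoring selection

print("fraction of naive p-values below 0.05:",
      np.mean(np.array(naive_pvals) < 0.05))                 # much larger than the nominal 0.05
```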