Model Selection for High Dimensional Quadratic Regression via Regularization
Quadratic regression (QR) models naturally extend linear models by
considering interaction effects between the covariates. To conduct model
selection in QR, it is important to maintain the hierarchical model structure
between main effects and interaction effects. Existing regularization methods
generally achieve this goal by solving complex optimization problems, which
usually demand high computational cost and hence are not feasible for high
dimensional data. This paper focuses on scalable regularization methods for
model selection in high dimensional QR. We first consider two-stage
regularization methods and establish theoretical properties of the two-stage
LASSO. Then, a new regularization method, called Regularization Algorithm under
Marginality Principle (RAMP), is proposed to compute a hierarchy-preserving
regularization solution path efficiently. Both methods are further extended to
solve generalized QR models. Numerical results are also shown to demonstrate
the performance of the methods.
Comment: 37 pages, 1 figure with supplementary material
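
To make the two-stage idea concrete, here is a minimal Python sketch of a two-stage LASSO for quadratic regression that preserves the hierarchy between main effects and interactions: main effects are selected first, and interactions are then built only among the selected variables. This is an illustration under assumed settings (scikit-learn's LassoCV with 5-fold cross-validation), not the authors' RAMP implementation.

import numpy as np
from itertools import combinations_with_replacement
from sklearn.linear_model import LassoCV

def two_stage_lasso_qr(X, y, random_state=0):
    """Stage 1: LASSO on main effects; Stage 2: LASSO on the selected main
    effects plus their pairwise interactions (hierarchy preserved)."""
    n, _ = X.shape
    stage1 = LassoCV(cv=5, random_state=random_state).fit(X, y)
    selected = np.flatnonzero(stage1.coef_ != 0)

    # Interactions (including squares) are formed only among selected main
    # effects, so every retained interaction has its parents in the model.
    pairs = list(combinations_with_replacement(selected, 2))
    Z = (np.column_stack([X[:, j] * X[:, k] for j, k in pairs])
         if pairs else np.empty((n, 0)))
    design = np.hstack([X[:, selected], Z])
    stage2 = LassoCV(cv=5, random_state=random_state).fit(design, y)
    return selected, pairs, stage2

# Toy example with a hierarchical signal: y depends on X1, X2 and X1*X2.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = 2 * X[:, 0] - X[:, 1] + 1.5 * X[:, 0] * X[:, 1] + rng.standard_normal(200)
selected, pairs, fit = two_stage_lasso_qr(X, y)
print("main effects kept in stage 1:", selected)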
Sparsifying the Fisher Linear Discriminant by Rotation
Many high dimensional classification techniques have been proposed in the
literature based on sparse linear discriminant analysis (LDA). To use them
efficiently, sparsity of the linear classifier is a prerequisite. However, such
sparsity might not be readily available in many applications, and rotations of the data are
required to create the needed sparsity. In this paper, we propose a family of
rotations to create the required sparsity. The basic idea is to use the
principal components of the sample covariance matrix of the pooled samples and
its variants to rotate the data first and then apply an existing high
dimensional classifier. This rotate-and-solve procedure can be combined with
any existing classifiers, and is robust against the sparsity level of the true
model. We show that these rotations do create the sparsity needed for high
dimensional classification and provide a theoretical understanding of why such
rotations work empirically. The effectiveness of the proposed method is
demonstrated by a number of simulated and real data examples, and the
improvements of our method over some popular high dimensional classification
rules are clearly shown.
Comment: 30 pages and 9 figures. This paper has been accepted by the Journal of
the Royal Statistical Society: Series B (Statistical Methodology). The first
two versions of this paper were uploaded to Bin Dong's web site under the
title "A Rotate-and-Solve Procedure for Classification" in May 2013 and January
2014. This version may be slightly different from the published version.
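
As an illustration of the rotate-and-solve idea, the sketch below rotates the data with the eigenvectors of the pooled within-class sample covariance and then fits a sparse linear classifier in the rotated coordinates. The L1-penalized logistic regression, like the function name rotate_and_solve, is an assumed stand-in for a sparse LDA rule rather than the classifier used in the paper.

import numpy as np
from sklearn.linear_model import LogisticRegression

def rotate_and_solve(X, y, C=1.0):
    # Pooled within-class sample covariance of the training data.
    classes = np.unique(y)
    n, p = X.shape
    Sw = np.zeros((p, p))
    for c in classes:
        Xc = X[y == c] - X[y == c].mean(axis=0)
        Sw += Xc.T @ Xc
    Sw /= n - len(classes)

    # Rotation: eigenvectors (principal components) of the pooled covariance.
    _, V = np.linalg.eigh(Sw)
    X_rot = X @ V

    # Any existing sparse linear classifier can be plugged in here; an
    # L1-penalised logistic regression serves as the sparse rule for this sketch.
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X_rot, y)
    return V, clf

# New observations are classified after applying the same rotation:
# clf.predict(X_new @ V).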
Variance Estimation Using Refitted Cross-validation in Ultrahigh Dimensional Regression
Variance estimation is a fundamental problem in statistical modeling. In
ultrahigh dimensional linear regressions where the dimensionality is much
larger than sample size, traditional variance estimation techniques are not
applicable. Recent advances in variable selection for ultrahigh dimensional
linear regressions make this problem accessible. One of the major problems in
ultrahigh dimensional regression is the high spurious correlation between the
unobserved realized noise and some of the predictors. As a result, the realized
noise is actually predicted when extra irrelevant variables are selected,
leading to a serious underestimate of the noise level. In this paper, we propose
a two-stage refitted procedure via a data splitting technique, called refitted
cross-validation (RCV), to attenuate the influence of irrelevant variables with
high spurious correlations. Our asymptotic results show that the resulting
procedure performs as well as the oracle estimator, which knows in advance the
mean regression function. The simulation studies lend further support to our
theoretical claims. The naive two-stage estimator, which fits the variables
selected in the first stage, and the plug-in one-stage estimators using LASSO
and SCAD are also studied and compared. Their performance can be improved by
the proposed RCV method.
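
A minimal sketch of the refitted cross-validation idea, assuming the LASSO is used for the selection stage: split the data into two halves, select variables on one half, refit those variables by least squares on the other half, estimate the variance from the refitted residuals, and average the two estimates obtained by swapping the halves. Function names and tuning choices here are illustrative assumptions, not the authors' code.

import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

def rcv_variance(X, y, random_state=0):
    """Refitted cross-validation (RCV) estimate of the noise variance."""
    n = X.shape[0]
    idx = np.random.default_rng(random_state).permutation(n)
    half1, half2 = idx[: n // 2], idx[n // 2 :]

    def one_direction(select_idx, refit_idx):
        # Variable selection on one half of the data ...
        sel = LassoCV(cv=5, random_state=random_state).fit(X[select_idx], y[select_idx])
        S = np.flatnonzero(sel.coef_ != 0)
        if S.size == 0:
            return np.var(y[refit_idx], ddof=1)
        # ... then refit the selected variables by ordinary least squares on
        # the other half and use the refitted residuals for the variance.
        ols = LinearRegression().fit(X[refit_idx][:, S], y[refit_idx])
        resid = y[refit_idx] - ols.predict(X[refit_idx][:, S])
        return np.sum(resid ** 2) / (len(refit_idx) - S.size - 1)

    # Average the two estimates obtained by swapping the roles of the halves.
    return 0.5 * (one_direction(half1, half2) + one_direction(half2, half1))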
- …