We derive sharp performance bounds for least squares regression with L1
regularization from the perspectives of parameter estimation accuracy and
feature selection quality. The main result proved for L1 regularization extends a similar
result in [Ann. Statist. 35 (2007) 2313--2351] for the Dantzig selector. It
gives an affirmative answer to an open question in [Ann. Statist. 35 (2007)
2358--2364]. Moreover, the result leads to an extended view of feature
selection that allows for less restrictive conditions than those in some recent work. Based
on the theoretical insights, a novel two-stage L1-regularization procedure
with selective penalization is analyzed. It is shown that if the target
parameter vector can be decomposed as the sum of a sparse parameter vector with
large coefficients and another less sparse vector with relatively small
coefficients, then the two-stage procedure can lead to improved performance.

Comment: Published at http://dx.doi.org/10.1214/08-AOS659 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
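For concreteness, here is a minimal Python sketch of the two-stage idea described above, assuming scikit-learn's Lasso and the standard column-rescaling equivalence for a weighted L1 penalty. The function name two_stage_lasso and all tuning values (lam1, lam2, tau, eps) are illustrative assumptions, not the paper's prescriptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def two_stage_lasso(X, y, lam1=0.1, lam2=0.1, tau=0.5, eps=1e-2):
    # Stage 1: ordinary Lasso to locate the coefficients that look "large".
    stage1 = Lasso(alpha=lam1, fit_intercept=False, max_iter=10000).fit(X, y)
    large = np.abs(stage1.coef_) > tau

    # Stage 2: selective penalization. A weighted L1 penalty
    # sum_j w_j |b_j| equals a plain Lasso on columns rescaled by 1/w_j,
    # with the fitted coefficients mapped back by the same weights.
    w = np.where(large, eps, 1.0)  # near-zero weight = almost no penalty
    stage2 = Lasso(alpha=lam2, fit_intercept=False, max_iter=10000).fit(X / w, y)
    return stage2.coef_ / w

# Toy target mirroring the abstract's decomposition: a sparse vector of
# large coefficients plus a less sparse vector of small ones.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = 0.02 * rng.standard_normal(p)   # less sparse, small part
beta[:3] = [3.0, -2.0, 1.5]            # sparse, large part
y = X @ beta + 0.5 * rng.standard_normal(n)

print(np.round(two_stage_lasso(X, y)[:5], 2))
```

Under this decomposition, the second stage stops shrinking the selected large coefficients while still regularizing the rest, which is the mechanism behind the improved performance claimed in the abstract.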