The Lasso has attracted considerable attention in recent years. While much
effort has been devoted to proving that the Lasso behaves like a variable
selection procedure, at the price of strong (though unavoidable) assumptions on
the geometric structure of the variables, much less attention has been paid
to the performance of the Lasso as a regularization algorithm.
Our first purpose here is to provide a conceptually very simple result in this
direction. We shall prove that, provided the regularization parameter is
properly chosen, the Lasso performs almost as well as the deterministic Lasso.
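Schematically, writing $f^{*}$ for the regression function and $\hat f_{\lambda}$ for the Lasso estimator (notation introduced here for illustration only; the precise constants and remainder term are those given in the paper), such an l1-oracle inequality takes the shape
\[
\mathbb{E}\,\| f^{*} - \hat f_{\lambda} \|^{2}
\;\le\;
C \,\inf_{h}\Big\{ \| f^{*} - h \|^{2} + \lambda \, \| h \|_{\mathcal{L}_{1}} \Big\}
\;+\; \varepsilon(\lambda),
\]
where the infimum runs over all linear combinations $h$ of the dictionary elements, $\| h \|_{\mathcal{L}_{1}}$ denotes the l1-norm of the coefficients of $h$, the infimum term is precisely the risk attained by the deterministic Lasso, and $\varepsilon(\lambda)$ is a remainder term.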
This result requires no assumption at all, either on the structure of the
variables or on the regression function. Our second purpose is to introduce a
new estimator particularly suited to infinite countable dictionaries. This
estimator is constructed as an l0-penalized estimator among a sequence of
Lasso estimators associated with a dyadic sequence of growing truncated
dictionaries. The selection procedure automatically chooses the level of
truncation of the dictionary that achieves the best tradeoff between
approximation, l1-regularization and sparsity, as sketched below.
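For concreteness, here is a minimal computational sketch of this construction. The function name selected_lasso, the placeholder penalty, and the use of scikit-learn's Lasso as a stand-in for the theoretical estimator are all illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import Lasso

def selected_lasso(X, y, lam=0.1, pen_const=1.0):
    """Sketch of the selected Lasso: fit a Lasso on each dyadic
    truncation of an ordered dictionary (the first 2^k columns of X),
    then select the truncation level by penalized empirical risk.
    The penalty below is a placeholder, not the paper's exact penalty."""
    n, p_max = X.shape
    best = None
    k = 0
    while 2 ** k <= p_max:
        p = 2 ** k
        # Lasso estimator on the dictionary truncated at level p.
        model = Lasso(alpha=lam, fit_intercept=False).fit(X[:, :p], y)
        resid = y - X[:, :p] @ model.coef_
        # Empirical risk plus a complexity term growing with the
        # truncation level (illustrative l0-type penalty).
        crit = np.mean(resid ** 2) + pen_const * p * np.log(n) / n
        if best is None or crit < best[0]:
            best = (crit, p, model)
        k += 1
    return best[1], best[2]

# Usage on synthetic data: a sparse signal in a large dictionary.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 64))
y = X[:, :4] @ np.array([3.0, -2.0, 1.5, 1.0]) + 0.5 * rng.standard_normal(200)
level, model = selected_lasso(X, y)
print("selected truncation level:", level)
```

Note that the dyadic grid keeps the number of Lasso estimators to be fitted logarithmic in the maximal dictionary size.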
From a theoretical point of view, we shall provide an oracle inequality
satisfied by this selected Lasso estimator. The oracle inequalities established
for the Lasso and the selected Lasso estimators shall enable us to derive rates
of convergence on a wide class
of functions, showing that these estimators perform at least as well as greedy
algorithms. Besides, we shall prove that the rates of convergence achieved by
the selected Lasso estimator are optimal in the orthonormal case by bounding
from below the minimax risk over some Besov bodies. Finally, we shall establish
some theoretical results on the performance of the Lasso for infinite
uncountable dictionaries in the specific framework of neural networks. All
the oracle inequalities presented in this paper are obtained by applying a
single general theorem of model selection among a collection of nonlinear
models, which is a direct consequence of the Gaussian concentration inequality.
The key idea that enables us to apply this general theorem is to view
l1-regularization as a model selection procedure among l1-balls.
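Concretely, writing $\gamma_{n}$ for the empirical least-squares criterion (notation introduced here for illustration), this view rests on the elementary identity
\[
\min_{h}\Big\{ \gamma_{n}(h) + \lambda \, \| h \|_{\mathcal{L}_{1}} \Big\}
\;=\;
\min_{R \ge 0}\Big\{ \min_{\| h \|_{\mathcal{L}_{1}} \le R} \gamma_{n}(h) \;+\; \lambda R \Big\},
\]
so that penalizing the l1-norm of the coefficients amounts to selecting, via the linear penalty $\lambda R$, among the least-squares estimators constrained to the l1-balls of radius $R$.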