Model Consistency of Partly Smooth Regularizers
This paper studies least-squares regression penalized with partly smooth
convex regularizers. This class of functions is large and versatile, and
allows one to promote solutions conforming to some notion of low complexity.
Indeed, such regularizers force solutions of variational problems to belong to
a low-dimensional manifold (the so-called model), which is stable under small
perturbations of the function. This property is crucial to make the underlying
low-complexity model robust to small noise. We show that a generalized
"irrepresentable condition" implies stable model selection under small noise
perturbations in the observations and the design matrix, when the
regularization parameter is tuned proportionally to the noise level. This
condition is shown to be almost a necessary condition. We then show that this
condition implies model consistency of the regularized estimator. That is, with
a probability tending to one as the number of measurements increases, the
regularized estimator belongs to the correct low-dimensional model manifold.
This work unifies and generalizes several previous ones, where model
consistency is known to hold for sparse, group sparse, total variation and
low-rank regularizations.
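As a concrete illustration (not taken from the paper), the sketch below specializes the setting to the ℓ1 norm, whose model manifold is the set of vectors with a fixed support: it checks the classical lasso irrepresentable condition for a given design and support, and solves the penalized least-squares problem with a plain proximal-gradient (ISTA) loop. All names, data, and parameter choices are illustrative assumptions.

```python
import numpy as np

def lasso_irrepresentable(A, support, signs):
    """Classical lasso specialization of the irrepresentable condition:
    || A_Sc^T A_S (A_S^T A_S)^{-1} sign(x_S) ||_inf; stable model selection
    is expected when this quantity is strictly below 1."""
    S = np.asarray(support)
    Sc = np.setdiff1d(np.arange(A.shape[1]), S)
    A_S, A_Sc = A[:, S], A[:, Sc]
    return np.max(np.abs(A_Sc.T @ A_S @ np.linalg.solve(A_S.T @ A_S, signs)))

def ista(A, y, lam, n_iter=2000):
    """Proximal-gradient (ISTA) solver for 0.5*||y - A x||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L          # gradient step on the smooth part
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return x

# Toy example: sparse ground truth, small noise, regularization ~ noise level.
rng = np.random.default_rng(0)
n, p = 100, 20
A = rng.standard_normal((n, p)) / np.sqrt(n)
x0 = np.zeros(p)
x0[:3] = [1.0, -2.0, 1.5]
y = A @ x0 + 0.01 * rng.standard_normal(n)
print(lasso_irrepresentable(A, [0, 1, 2], np.sign(x0[:3])))
x_hat = ista(A, y, lam=0.05)
print(np.flatnonzero(np.abs(x_hat) > 1e-8))    # recovered model (support)
```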
Model Consistency for Learning with Mirror-Stratifiable Regularizers
Low-complexity non-smooth convex regularizers are routinely used to impose
some structure (such as sparsity or low-rank) on the coefficients for linear
predictors in supervised learning. Model consistency then consists in selecting
the correct structure (for instance support or rank) by regularized empirical
risk minimization.
It is known that model consistency holds under appropriate non-degeneracy
conditions. However, such conditions typically fail for highly correlated
designs and it is observed that regularization methods tend to select larger
models.
In this work, we provide the theoretical underpinning of this behavior using
the notion of mirror-stratifiable regularizers. This class of regularizers
encompasses the most well-known in the literature, including the ℓ1 or
trace norms. It brings into play a pair of primal-dual models, which in turn
allows one to locate the structure of the solution using a specific dual
certificate.
We also show how this analysis is applicable to optimal solutions of the
learning problem, and also to the iterates computed by a certain class of
stochastic proximal-gradient algorithms.
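To make the last point concrete, here is a minimal stochastic proximal-gradient sketch on an ℓ1-regularized least-squares objective (one instance of a mirror-stratifiable regularizer), printing the active set of the iterates along the run. The step-size schedule, mini-batch size, and toy data are assumptions, not the authors' algorithm or experiments.

```python
import numpy as np

def prox_l1(x, t):
    """Soft-thresholding: proximal operator of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def stochastic_prox_grad(A, y, lam, n_epochs=20, batch=10, seed=0):
    """Stochastic proximal gradient on the l1-regularized least-squares risk,
    reporting the support (active stratum) of the iterate every few epochs."""
    rng = np.random.default_rng(seed)
    n, p = A.shape
    L = np.linalg.norm(A, 2) ** 2 / n          # rough per-sample Lipschitz constant
    x = np.zeros(p)
    for epoch in range(n_epochs):
        step = 1.0 / (L * (1.0 + epoch))       # illustrative decaying step size
        for _ in range(n // batch):
            idx = rng.choice(n, size=batch, replace=False)
            g = A[idx].T @ (A[idx] @ x - y[idx]) / batch
            x = prox_l1(x - step * g, step * lam)
        if (epoch + 1) % 5 == 0:
            print(epoch + 1, np.flatnonzero(np.abs(x) > 1e-10))  # current support
    return x

# Toy usage: a deliberately correlated design, where enlarged supports may appear.
rng = np.random.default_rng(1)
n, p = 200, 30
base = rng.standard_normal((n, p))
A = base + 0.9 * base[:, [0]]                  # all columns share a common component
x0 = np.zeros(p)
x0[:4] = 1.0
y = A @ x0 + 0.05 * rng.standard_normal(n)
x_hat = stochastic_prox_grad(A, y, lam=0.1)
```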
Generalized Pseudolikelihood Methods for Inverse Covariance Estimation
We introduce PseudoNet, a new pseudolikelihood-based estimator of the inverse
covariance matrix, that has a number of useful statistical and computational
properties. We show, through detailed experiments with synthetic data as well
as real-world finance and wind power data, that PseudoNet outperforms
related methods in terms of estimation error and support recovery, making it
well-suited for use in a downstream application, where obtaining low estimation
error can be important. We also show, under regularity conditions, that
PseudoNet is consistent. Our proof assumes the existence of accurate estimates
of the diagonal entries of the underlying inverse covariance matrix; we
additionally provide a two-step method to obtain these estimates, even in a
high-dimensional setting, going beyond the proofs for related methods. Unlike
other pseudolikelihood-based methods, we also show that PseudoNet does not
saturate, i.e., in high dimensions, there is no hard limit on the number of
nonzero entries in the PseudoNet estimate. We present a fast algorithm as well
as screening rules that make computing the PseudoNet estimate over a range of
tuning parameters tractable.
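The exact PseudoNet objective is not reproduced here; as a hedged illustration of the general pseudolikelihood idea, the sketch below uses plain nodewise lasso regressions (each variable regressed on all the others) to build a sparse inverse covariance estimate. Function names, the penalty level, and the toy usage are all illustrative assumptions, not the paper's algorithm.

```python
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_inverse_covariance(X, alpha=0.1):
    """Generic pseudolikelihood-style sketch (not PseudoNet itself): regress each
    variable on all the others with an l1 penalty and convert the regression
    coefficients and residual variances into a sparse inverse covariance estimate."""
    n, p = X.shape
    Theta = np.zeros((p, p))
    for j in range(p):
        others = np.delete(np.arange(p), j)
        model = Lasso(alpha=alpha).fit(X[:, others], X[:, j])
        resid = X[:, j] - model.predict(X[:, others])
        theta_jj = 1.0 / np.var(resid)         # diagonal entry from the residual variance
        Theta[j, j] = theta_jj
        Theta[j, others] = -model.coef_ * theta_jj
    return (Theta + Theta.T) / 2.0             # symmetrize the two asymmetric estimates

# Toy usage on random data; in practice X holds n samples of p variables.
X = np.random.default_rng(2).standard_normal((200, 15))
print(np.count_nonzero(nodewise_inverse_covariance(X, alpha=0.2)))
```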
Kernel dimension reduction in regression
We present a new methodology for sufficient dimension reduction (SDR). Our
methodology derives directly from the formulation of SDR in terms of the
conditional independence of the covariate X from the response Y, given the
projection of X on the central subspace [cf. J. Amer. Statist. Assoc. 86
(1991) 316--342 and Regression Graphics (1998) Wiley]. We show that this
conditional independence assertion can be characterized in terms of conditional
covariance operators on reproducing kernel Hilbert spaces and we show how this
characterization leads to an M-estimator for the central subspace. The
resulting estimator is shown to be consistent under weak conditions; in
particular, we do not have to impose linearity or ellipticity conditions of the
kinds that are generally invoked for SDR methods. We also present empirical
results showing that the new methodology is competitive in practice.
Published at http://dx.doi.org/10.1214/08-AOS637 in the Annals of Statistics
(http://www.imstat.org/aos/) by the Institute of Mathematical Statistics
(http://www.imstat.org).
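As a rough sketch of the kernel characterization (assuming Gaussian kernels and a ridge-style regularizer eps; bandwidths and the optimization over the projection matrix B are left out), the snippet below evaluates an empirical conditional-covariance criterion for a candidate projection: smaller values indicate that the projection retains more predictive information about the response.

```python
import numpy as np

def centered_gram(Z, sigma):
    """Centered Gaussian-kernel Gram matrix of the rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    K = np.exp(-D / (2.0 * sigma ** 2))
    n = Z.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kdr_objective(X, Y, B, sigma_x=1.0, sigma_y=1.0, eps=1e-3):
    """Empirical conditional-covariance criterion for the projection Z = X B:
    trace of G_Y (G_Z + n*eps*I)^{-1}; smaller means Z retains more of the
    predictive information about the response Y."""
    n = X.shape[0]
    G_z = centered_gram(X @ B, sigma_x)
    G_y = centered_gram(np.asarray(Y).reshape(n, -1), sigma_y)
    return np.trace(G_y @ np.linalg.inv(G_z + n * eps * np.eye(n)))

# Compare a projection onto the truly predictive direction with an irrelevant one.
rng = np.random.default_rng(3)
X = rng.standard_normal((150, 5))
Y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(150)
B_good, B_bad = np.eye(5)[:, [0]], np.eye(5)[:, [3]]
print(kdr_objective(X, Y, B_good), kdr_objective(X, Y, B_bad))
```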
- …