Submodular meets Spectral: Greedy Algorithms for Subset Selection, Sparse Approximation and Dictionary Selection
We study the problem of selecting a subset of k random variables from a large
set, in order to obtain the best linear prediction of another variable of
interest. This problem can be viewed in the context of both feature selection
and sparse approximation. We analyze the performance of widely used greedy
heuristics, using insights from the maximization of submodular functions and
spectral analysis. We introduce the submodularity ratio as a key quantity to
help understand why greedy algorithms perform well even when the variables are
highly correlated. Using our techniques, we obtain the strongest known
approximation guarantees for this problem, both in terms of the submodularity
ratio and the smallest k-sparse eigenvalue of the covariance matrix. We further
demonstrate the wide applicability of our techniques by analyzing greedy
algorithms for the dictionary selection problem, and significantly improve the
previously known guarantees. Our theoretical analysis is complemented by
experiments on real-world and synthetic data sets; the experiments show that
the submodularity ratio is a stronger predictor of the performance of greedy
algorithms than other spectral parameters.
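The greedy heuristic analyzed in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the use of a covariance matrix `C` and target-covariance vector `b` as inputs, and the choice of the coefficient of determination R² as the objective are assumptions made for the example. At each step the variable giving the largest R² gain is added, which is exactly the forward-selection scheme whose guarantees the paper states in terms of the submodularity ratio and the smallest k-sparse eigenvalue of `C`.

```python
import numpy as np

def greedy_subset_selection(C, b, k):
    """Forward greedy selection of k variables for linear prediction.

    C : (n, n) covariance matrix of the candidate variables
    b : (n,) covariances between each candidate and the target
    Returns the selected index list and the R^2 of the final subset.
    (Hypothetical interface chosen for this sketch.)
    """
    n = len(b)
    selected = []
    best_r2 = 0.0
    for _ in range(k):
        best_j, best_r2 = None, -np.inf
        for j in range(n):
            if j in selected:
                continue
            T = selected + [j]
            # R^2 of the best linear predictor using variables in T:
            # b_T' C_TT^{-1} b_T (target variance normalized to 1)
            r2 = b[T] @ np.linalg.solve(C[np.ix_(T, T)], b[T])
            if r2 > best_r2:
                best_j, best_r2 = j, r2
        selected.append(best_j)
    return selected, best_r2
```

When the candidate variables are highly correlated, the per-step gains can shrink; the submodularity ratio discussed in the abstract quantifies how far these gains can fall below what a submodular objective would guarantee.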
Robust Lasso-Zero for sparse corruption and model selection with missing covariates
We propose Robust Lasso-Zero, an extension of the Lasso-Zero methodology
[Descloux and Sardy, 2018], initially introduced for sparse linear models, to
the sparse corruptions problem. We give theoretical guarantees on the sign
recovery of the parameters for a slightly simplified version of the estimator,
called Thresholded Justice Pursuit. The use of Robust Lasso-Zero is showcased
for variable selection with missing values in the covariates. In addition to
requiring neither a model for the covariates nor estimates of
their covariance matrix or the noise variance, the method has the great
advantage of handling missing-not-at-random values without specifying a
parametric model. Numerical experiments and a medical application underline the
relevance of Robust Lasso-Zero in such a context with few available
competitors. The method is easy to use and implemented in the R library lass0
- …
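The sparse-corruption idea behind Thresholded Justice Pursuit can be illustrated with a simplified sketch. Everything here is an assumption for illustration: the function name, the hard-threshold parameter `tau`, and the unscaled identity augmentation (the actual estimator may scale the identity block and handle noise differently). The sketch solves basis pursuit on the augmented matrix [X, I], so that sparse corruptions of the responses are absorbed by the identity columns, and then hard-thresholds the regression coefficients.

```python
import numpy as np
from scipy.optimize import linprog

def thresholded_justice_pursuit(X, y, tau):
    """Simplified sketch of a Thresholded Justice Pursuit-style estimator.

    Solves  min ||z||_1  s.t.  [X, I] z = y  as a linear program,
    splits z = (beta, omega) into regression coefficients and
    per-observation corruptions, and hard-thresholds beta at tau.
    (Hypothetical simplification; not the paper's exact procedure.)
    """
    n, p = X.shape
    A = np.hstack([X, np.eye(n)])  # identity block absorbs corruptions
    m = p + n
    # Basis pursuit as an LP: z = u - v with u, v >= 0, minimize 1'(u + v)
    c = np.ones(2 * m)
    A_eq = np.hstack([A, -A])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * m))
    z = res.x[:m] - res.x[m:]
    beta = z[:p]
    return np.where(np.abs(beta) > tau, beta, 0.0)  # hard threshold
```

Because the noiseless basis-pursuit fit drives the penalty to zero (the "Lasso-Zero" regime), the thresholding step is what discards spurious small coefficients and yields the sign-recovery behavior the abstract discusses.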