Robust classification via MOM minimization
We present an extension of Vapnik's classical empirical risk minimizer (ERM)
in which the empirical risk is replaced by a median-of-means (MOM) estimator; the
new estimators are called MOM minimizers. While ERM is sensitive to corruption
of the dataset for many classical loss functions used in classification, we
show that MOM minimizers behave well in theory, in the sense that they achieve
Vapnik's (slow) rates of convergence under weak assumptions: the data are only
required to have a finite second moment, and some outliers may also have
corrupted the dataset.
We propose algorithms inspired by MOM minimizers. These algorithms can be
analyzed using arguments quite similar to those used for Stochastic Block
Gradient descent. As a proof of concept, we show how to modify a proof of
consistency for a descent algorithm to prove consistency of its MOM version. As
MOM algorithms perform a smart subsampling, our procedure can also
substantially reduce computation time and memory resources when applied to
nonlinear algorithms.
The empirical performance is illustrated on both simulated and real
datasets.
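To make the median-of-means idea from the abstract above concrete, here is a minimal sketch of a MOM estimate of a mean (for instance, of per-sample losses). It is not the authors' code: the function name `median_of_means`, the random shuffling before blocking, and the choice of block count are assumptions made for illustration.

```python
import numpy as np


def median_of_means(values, n_blocks, rng=None):
    """Median-of-means estimate of the mean of `values` (illustrative sketch).

    The sample is shuffled, split into `n_blocks` blocks of nearly equal
    size, and the median of the block means is returned.
    """
    values = np.asarray(values, dtype=float)
    if not 1 <= n_blocks <= values.size:
        raise ValueError("n_blocks must be between 1 and the sample size")
    # Assumption: shuffle first so that outliers are spread over blocks at random.
    rng = np.random.default_rng(rng)
    shuffled = rng.permutation(values)
    blocks = np.array_split(shuffled, n_blocks)
    block_means = np.array([block.mean() for block in blocks])
    return float(np.median(block_means))


# Usage: 100 clean observations equal to 1.0 plus 3 gross outliers.
data = np.concatenate([np.ones(100), np.full(3, 1e6)])
robust_estimate = median_of_means(data, n_blocks=11, rng=0)
plain_mean = float(np.mean(data))
```

With 11 blocks and only 3 outliers, at most 3 block means can be contaminated, so the median falls on a clean block, whereas the plain empirical mean is dominated by the outliers.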
Low Complexity Regularization of Linear Inverse Problems
Inverse problems and regularization theory are a central theme in contemporary
signal processing, where the goal is to reconstruct an unknown signal from
partial, indirect, and possibly noisy measurements of it. A now standard method
for recovering the unknown signal is to solve a convex optimization problem
that enforces some prior knowledge about its structure. This has proved
efficient in many problems routinely encountered in imaging sciences,
statistics and machine learning. This chapter delivers a review of recent
advances in the field where the regularization prior promotes solutions
conforming to some notion of simplicity/low-complexity. These priors encompass
as popular examples sparsity and group sparsity (to capture the compressibility
of natural signals and images), total variation and analysis sparsity (to
promote piecewise regularity), and low-rank (as a natural extension of sparsity
to matrix-valued data). Our aim is to provide a unified treatment of all these
regularizations under a single umbrella, namely the theory of partial
smoothness. This framework is very general and accommodates all low-complexity
regularizers just mentioned, as well as many others. Partial smoothness turns
out to be the canonical way to encode low-dimensional models that can be linear
spaces or more general smooth manifolds. This review is intended to serve as a
one-stop shop for understanding the theoretical properties of the
so-regularized solutions. It covers a large spectrum including: (i) recovery
guarantees and stability to noise, both in terms of $\ell^2$-stability and
model (manifold) identification; (ii) sensitivity analysis to perturbations of
the parameters involved (in particular the observations), with applications to
unbiased risk estimation; (iii) convergence properties of the forward-backward
proximal splitting scheme, which is particularly well suited to solving the
corresponding large-scale regularized optimization problem.
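As an illustration of the forward-backward proximal splitting scheme mentioned in the abstract above, the following sketch applies it to the simplest low-complexity prior, the l1 norm (sparsity). The objective 0.5*||Ax - y||^2 + lam*||x||_1, the fixed step size 1/||A||^2, and the names `soft_threshold` and `forward_backward_l1` are assumptions chosen for this example, not taken from the chapter.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (entrywise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def forward_backward_l1(A, y, lam, n_iter=500):
    """Forward-backward iterations for 0.5*||A x - y||^2 + lam*||x||_1.

    Assumes A is a nonzero 2-D array. The step size is 1/L, where
    L = ||A||_2^2 is the Lipschitz constant of the smooth term's gradient.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                           # forward (gradient) step
        x = soft_threshold(x - step * grad, step * lam)    # backward (proximal) step
    return x


# Usage: with A equal to the identity, the minimizer is soft_threshold(y, lam).
A = np.eye(3)
y = np.array([3.0, -0.5, 1.0])
x_hat = forward_backward_l1(A, y, lam=1.0)
```

Other regularizers covered by the review (group sparsity, nuclear norm, and so on) would change only the proximal step, provided that their proximal operator is available in closed form.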
Sparse and stable Markowitz portfolios
We consider the problem of portfolio selection within the classical Markowitz
mean-variance framework, reformulated as a constrained least-squares regression
problem. We propose to add to the objective function a penalty proportional to
the sum of the absolute values of the portfolio weights. This penalty
regularizes (stabilizes) the optimization problem, encourages sparse portfolios
(i.e. portfolios with only a few active positions), and allows one to account
for transaction costs. Our approach recovers the no-short-positions portfolios
as a special case, but also allows for a limited number of short positions.
We implement this methodology on two benchmark data sets constructed by
Fama and French. Using only a modest amount of training data, we construct
portfolios whose out-of-sample performance, as measured by Sharpe ratio, is
consistently and significantly better than that of the naive evenly-weighted
portfolio which constitutes, as shown in recent literature, a very tough
benchmark.
Comment: Better emphasis of main result, new abstract, new examples and
figures. New appendix with full details of algorithm. 17 pages, 6 figures
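The penalized problem described in the abstract above can be written out as follows. This is a sketch in assumed notation that does not appear in the abstract: $R$ is the $T \times N$ matrix of asset returns, $w$ the weight vector, $\hat{\mu}$ the vector of estimated mean returns, $\rho$ the target return, and $\tau \ge 0$ the penalty level.

```latex
\hat{w} = \arg\min_{w \in \mathbb{R}^N} \;
  \|\rho \mathbf{1}_T - R w\|_2^2 + \tau \|w\|_1
\quad \text{subject to} \quad
  w^\top \hat{\mu} = \rho, \qquad w^\top \mathbf{1}_N = 1 .
```

Under the budget constraint $w^\top \mathbf{1}_N = 1$, the penalty satisfies $\|w\|_1 = 1 + 2 \sum_{i : w_i < 0} |w_i|$, so it charges only for short positions. This is one way to see why a sufficiently large $\tau$ would lead to a portfolio without short positions, while smaller values permit a limited number of them.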
Musings on Deep Learning: Properties of SGD
[previously titled "Theory of Deep Learning III: Generalization Properties of SGD"] In Theory III we characterize, with a mix of theory and experiments, the generalization properties of Stochastic Gradient Descent (SGD) in overparametrized deep convolutional networks. We show that SGD selects with high probability solutions that 1) have zero (or small) empirical error, 2) are degenerate as shown in Theory II and 3) have maximum generalization. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. H.M. is supported in part by ARO Grant W911NF-15-1-0385.