81 research outputs found
Sparse Regression Learning by Aggregation and Langevin Monte-Carlo
We consider the problem of regression learning for deterministic design and
independent random errors. We start by proving a sharp PAC-Bayesian type bound
for the exponentially weighted aggregate (EWA) under the expected squared
empirical loss. For a broad class of noise distributions the presented bound is
valid whenever the temperature parameter of the EWA is larger than or
equal to 4σ², where σ² is the noise variance. A remarkable
feature of this result is that it is valid even for unbounded regression
functions and the choice of the temperature parameter depends exclusively on
the noise level. Next, we apply this general bound to the problem of
aggregating the elements of a finite-dimensional linear space spanned by a
dictionary of functions φ_1, …, φ_M. We allow M to be much larger
than the sample size n, but we assume that the true regression function can be
well approximated by a sparse linear combination of the functions φ_j. Under
this sparsity scenario, we propose an EWA with a heavy-tailed prior and we show
that it satisfies a sparsity oracle inequality with leading constant one.
Finally, we propose several Langevin Monte-Carlo algorithms to approximately
compute such an EWA when the number of aggregated functions can be large.
We discuss in some detail the convergence of these algorithms and present
numerical experiments that confirm our theoretical findings. Comment: Short version published in COLT 2009
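The EWA discussed above is an integral over a posterior, which the abstract proposes to approximate by Langevin Monte-Carlo. A minimal Python sketch of that idea, under purely illustrative assumptions (a toy cosine dictionary, a Student-type heavy-tailed prior, hand-picked step size and iteration counts; none of these specifics come from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: n design points, dictionary of M functions with M > n.
n, M = 50, 100
X = rng.uniform(-1.0, 1.0, size=n)
Phi = np.column_stack([np.cos(np.pi * j * X) for j in range(M)])  # dictionary matrix
theta_true = np.zeros(M)
theta_true[:3] = [1.0, -0.5, 0.25]       # sparse combination of dictionary functions
sigma = 0.1                              # noise standard deviation
y = Phi @ theta_true + sigma * rng.normal(size=n)

beta = 4.0 * sigma**2   # temperature; the PAC-Bayesian bound needs beta >= 4*sigma^2
tau = 1.0               # scale of a Student-type heavy-tailed prior (illustrative)

def grad_log_posterior(theta):
    """Gradient of the log posterior: -(empirical loss gradient)/beta + prior term."""
    grad_loss = -(2.0 / n) * Phi.T @ (y - Phi @ theta)  # grad of (1/n)||y - Phi theta||^2
    grad_prior = -4.0 * theta / (tau**2 + theta**2)     # from log prior ~ -2*log(tau^2 + t^2)
    return -grad_loss / beta + grad_prior

h = 1e-4                                 # Langevin step size (kept small for stability)
theta = np.zeros(M)
running_sum = np.zeros(M)
n_steps, burn_in = 5000, 1000
for t in range(n_steps):
    # Unadjusted Langevin update: gradient drift plus Gaussian noise.
    theta = theta + h * grad_log_posterior(theta) + np.sqrt(2.0 * h) * rng.normal(size=M)
    if t >= burn_in:
        running_sum += theta
theta_hat = running_sum / (n_steps - burn_in)  # approximate EWA (posterior mean)
```

The average of the post-burn-in iterates approximates the posterior mean, i.e., the EWA aggregate; in practice the step size h must be tuned against the stiffness of the gradient.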
Estimation of high-dimensional low-rank matrices
Suppose that we observe entries or, more generally, linear combinations of
entries of an unknown m × T matrix A_0 corrupted by noise. We are
particularly interested in the high-dimensional setting where the number mT
of unknown entries can be much larger than the sample size n. Motivated by
several applications, we consider estimation of A_0 under the assumption
that it has small rank. This can be viewed as dimension reduction or sparsity
assumption. In order to shrink toward a low-rank representation, we investigate
penalized least squares estimators with a Schatten-p quasi-norm penalty term,
0 < p ≤ 1. We study these estimators under two possible assumptions---a modified
version of the restricted isometry condition and a uniform bound on the ratio
"empirical norm induced by the sampling operator/Frobenius norm." The main
results are stated as nonasymptotic upper bounds on the prediction risk and on
the Schatten-q risk of the estimators, where q ∈ [p, 2]. The rates that we
obtain for the prediction risk are of the form rm/n (for m = T), up to
logarithmic factors, where r is the rank of A_0. The particular examples of
multi-task learning and matrix completion are worked out in detail. The proofs
are based on tools from the theory of empirical processes. As a by-product, we
derive bounds for the kth entropy numbers of the quasi-convex Schatten class
embeddings S_p^M ↪ S_2^M, 0 < p < 1, which are of independent
interest. Comment: Published at http://dx.doi.org/10.1214/10-AOS860 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
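For the Schatten-1 (nuclear norm) member of the penalty family above, the fully observed case has a closed-form penalized least squares solution: soft-thresholding of the singular values. A small self-contained sketch with a synthetic low-rank matrix and an illustrative noise-driven threshold (the constants are not the ones from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy observation of a low-rank m x T matrix (fully observed for simplicity).
m, T, r = 40, 30, 2
A0 = rng.normal(size=(m, r)) @ rng.normal(size=(r, T))  # rank-r ground truth
sigma = 0.5
Y = A0 + sigma * rng.normal(size=(m, T))

def svd_soft_threshold(Y, lam):
    """Minimizer of (1/2)||Y - A||_F^2 + lam * ||A||_{S_1} (nuclear norm penalty):
    soft-threshold the singular values of Y."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

lam = sigma * (np.sqrt(m) + np.sqrt(T))  # illustrative noise-level-driven threshold
A_hat = svd_soft_threshold(Y, lam)

err_hat = np.linalg.norm(A_hat - A0)     # Frobenius error of the estimator
err_raw = np.linalg.norm(Y - A0)         # Frobenius error of the raw observation
```

The threshold is set at the typical largest singular value of the noise matrix, so noise directions are killed while the few strong signal directions survive (shrunk); the result is a low-rank estimate with smaller Frobenius error than the raw observation.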
An {ℓ1, ℓ2, ℓ∞}-Regularization Approach to High-Dimensional Errors-in-variables Models
Several new estimation methods have been recently proposed for the linear
regression model with observation error in the design. Different assumptions on
the data-generating process have motivated different estimators and analyses.
In particular, the literature has considered (1) observation errors in the design
uniformly bounded by some known constant, and (2) zero-mean independent
observation errors. Under the first assumption, the rates of convergence of the
proposed estimators depend explicitly on this bound, while the second
assumption has been applied when an estimator for the second moment of the
observational error is available. This work proposes and studies two new
estimators which, compared to other procedures for regression models with
errors in the design, exploit an additional ℓ∞-norm regularization.
The first estimator is applicable when both (1) and (2) hold but does not
require an estimator for the second moment of the observational error. The
second estimator is applicable under (2) and requires an estimator for the
second moment of the observation error. Importantly, we impose no assumption on
the accuracy of this pilot estimator, in contrast to the previously known
procedures. As in the recent proposals, we allow the number of covariates to be
much larger than the sample size. We establish the rates of convergence of the
estimators and compare them with the bounds obtained for related estimators in
the literature. These comparisons yield interesting insights into the interplay of
the assumptions and the achievable rates of convergence.
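The paper's estimators are not reproduced here, but the core bias-correction idea in errors-in-variables regression can be illustrated with a related, simpler procedure: a corrected Lasso in the style of Loh and Wainwright, which assumes the error second moment σ_w² is known and subtracts σ_w² I from Z'Z/n so the quadratic term becomes an unbiased surrogate for X'X/n. A sketch with synthetic data (all constants illustrative; this is not the estimator proposed in the abstract):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear model y = X theta + noise, but we only observe Z = X + W (noisy design).
n, p = 500, 10
X = rng.normal(size=(n, p))
theta_true = np.zeros(p)
theta_true[:3] = [2.0, -1.0, 0.5]            # sparse truth
y = X @ theta_true + 0.1 * rng.normal(size=n)
sigma_w = 0.3                                # std of the design observation error
Z = X + sigma_w * rng.normal(size=(n, p))    # observed design

# Bias correction: E[Z'Z/n] = X'X/n + sigma_w^2 * I, so subtract sigma_w^2 * I.
Gamma = Z.T @ Z / n - sigma_w**2 * np.eye(p)  # surrogate for X'X/n
gamma = Z.T @ y / n                           # surrogate for X'y/n

lam, step = 0.05, 0.1
theta = np.zeros(p)
for _ in range(2000):                         # proximal gradient (ISTA) iterations
    theta = theta - step * (Gamma @ theta - gamma)
    theta = np.sign(theta) * np.maximum(np.abs(theta) - step * lam, 0.0)  # soft-threshold
```

Without the subtraction of sigma_w^2 * I, the quadratic term is biased and the naive Lasso on (Z, y) shrinks the coefficients toward zero; the correction removes that attenuation.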
Fast learning rates for plug-in classifiers under the margin condition
It has been recently shown that, under the margin (or low noise) assumption,
there exist classifiers attaining fast rates of convergence of the excess Bayes
risk, i.e., rates faster than n^{-1/2}. The works on this subject
suggested the following two conjectures: (i) the best achievable fast rate is
of the order n^{-1}, and (ii) the plug-in classifiers generally converge more
slowly than the classifiers based on empirical risk minimization. We show that
both conjectures are not correct. In particular, we construct plug-in
classifiers that can achieve not only the fast, but also the super-fast
rates, i.e., rates faster than n^{-1}. We establish minimax lower bounds
showing that the obtained rates cannot be improved. Comment: 36 pages
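A plug-in classifier of the kind studied above estimates the regression function η(x) = P(Y = 1 | X = x) and thresholds the estimate at 1/2. A toy one-dimensional sketch using a Nadaraya-Watson estimate (the data-generating choices and bandwidth are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def eta(x):
    """True regression function P(Y = 1 | X = x) for a toy 1-D problem."""
    return 1.0 / (1.0 + np.exp(-4.0 * x))

n = 2000
X = rng.uniform(-2.0, 2.0, size=n)
Y = (rng.uniform(size=n) < eta(X)).astype(int)

def eta_hat(x, h=0.2):
    """Nadaraya-Watson estimate of the regression function (Gaussian kernel)."""
    w = np.exp(-0.5 * ((x[:, None] - X[None, :]) / h) ** 2)
    return w @ Y / w.sum(axis=1)

x_test = np.linspace(-2.0, 2.0, 201)
plug_in_labels = (eta_hat(x_test) >= 0.5).astype(int)  # plug-in rule 1{eta_hat >= 1/2}
bayes_labels = (eta(x_test) >= 0.5).astype(int)        # Bayes classifier
agreement = np.mean(plug_in_labels == bayes_labels)
```

The margin condition controls how much mass sits near η(x) = 1/2, which is exactly where the plug-in rule can disagree with the Bayes classifier; away from that region, even a moderately accurate η̂ yields the Bayes decision.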
Fast learning rates for plug-in classifiers
It has been recently shown that, under the margin (or low noise) assumption,
there exist classifiers attaining fast rates of convergence of the excess Bayes
risk, that is, rates faster than n^{-1/2}. The work on this subject has
suggested the following two conjectures: (i) the best achievable fast rate is
of the order n^{-1}, and (ii) the plug-in classifiers generally converge more
slowly than the classifiers based on empirical risk minimization. We show that
both conjectures are not correct. In particular, we construct plug-in
classifiers that can achieve not only fast, but also super-fast rates, that is,
rates faster than n^{-1}. We establish minimax lower bounds showing that the
obtained rates cannot be improved. Comment: Published at
http://dx.doi.org/10.1214/009053606000001217 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)