617 research outputs found

Discussion of ``2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization'' by V. Koltchinskii

Author: Tsybakov A. B.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/08/2007

Discussion of ``2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization'' by V. Koltchinskii [arXiv:0708.0083]. Comment: Published at http://dx.doi.org/10.1214/009053606000001064 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

Simultaneous adaptation to the margin and to complexity in classification

Author: Lecué Guillaume
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2005

We consider the problem of adaptation to the margin and to complexity in binary classification. We suggest an exponential weighting aggregation scheme. We use this aggregation procedure to construct classifiers which adapt automatically to margin and complexity. Two main examples are worked out in which adaptivity is achieved in frameworks proposed by Steinwart and Scovel [Learning Theory. Lecture Notes in Comput. Sci. 3559 (2005) 279--294. Springer, Berlin; Ann. Statist. 35 (2007) 575--607] and Tsybakov [Ann. Statist. 32 (2004) 135--166]. Adaptive schemes, like ERM or penalized ERM, usually involve a minimization step. This is not the case for our procedure. Comment: Published at http://dx.doi.org/10.1214/009053607000000055 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
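To illustrate the idea of aggregation by exponential weighting mentioned in this abstract, here is a minimal Python sketch. It is not the paper's procedure: the weight formula (weights proportional to exp(-n * empirical risk / temperature)), the `temperature` parameter, the use of the 0-1 empirical risk, and all function names are assumptions made for illustration. It does show the point made in the abstract that no minimization step is needed: every candidate classifier receives a weight computed directly from its empirical risk.

```python
import math

def empirical_risk(f, sample):
    """Fraction of points (x, y) in the sample that f misclassifies."""
    return sum(1 for x, y in sample if f(x) != y) / len(sample)

def exponential_weights(risks, n, temperature=1.0):
    """Weights proportional to exp(-n * risk / temperature), normalized to sum to 1.

    The exact exponent is an illustrative assumption, not the paper's formula.
    """
    scaled = [-n * r / temperature for r in risks]
    shift = max(scaled)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(s - shift) for s in scaled]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def aggregate(classifiers, weights, x):
    """Sign of the weighted vote of classifiers taking values in {-1, +1}."""
    score = sum(w * f(x) for f, w in zip(classifiers, weights))
    return 1 if score >= 0 else -1

# Hypothetical toy data: labels in {-1, +1}, threshold classifiers on the real line.
sample = [(-2.0, -1), (-1.0, -1), (0.5, 1), (1.5, 1), (2.5, 1)]
classifiers = [lambda x, t=t: 1 if x > t else -1 for t in (0.0, 1.0, 2.0)]

risks = [empirical_risk(f, sample) for f in classifiers]
weights = exponential_weights(risks, n=len(sample))
print(risks, weights, aggregate(classifiers, weights, 0.5))
```

Candidates with smaller empirical risk receive exponentially larger weights, so the weighted vote is dominated by the best-performing classifiers without ever selecting a single one.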

arXiv.org e-Print Archive

An adaptive multiclass nearest neighbor classifier

Authors: Puchkin Nikita, Spokoiny Vladimir
Publication venue: 'EDP Sciences'
Publication date: 03/11/2019

We consider a problem of multiclass classification, where the training sample $S_n = \{(X_i, Y_i)\}_{i=1}^n$ is generated from the model $\mathbb P(Y = m | X = x) = \eta_m(x)$, $1 \leq m \leq M$, and $\eta_1(x), \dots, \eta_M(x)$ are unknown $\alpha$-Hölder continuous functions. Given a test point $X$, our goal is to predict its label. A widely used $\mathsf k$-nearest-neighbors classifier constructs estimates of $\eta_1(X), \dots, \eta_M(X)$ and uses a plug-in rule for the prediction. However, it requires a proper choice of the smoothing parameter $\mathsf k$, which may become tricky in some situations. In our solution, we fix several integers $n_1, \dots, n_K$, compute corresponding $n_k$-nearest-neighbor estimates for each $m$ and each $n_k$ and apply an aggregation procedure. We study an algorithm, which constructs a convex combination of these estimates such that the aggregated estimate behaves approximately as well as an oracle choice. We also provide a non-asymptotic analysis of the procedure, prove its adaptation to the unknown smoothness parameter $\alpha$ and to the margin and establish rates of convergence under mild assumptions. Comment: Accepted in ESAIM: Probability & Statistics. The original publication is available at www.esaim-ps.or
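The pipeline described in this abstract (nearest-neighbor estimates for several neighborhood sizes, a convex combination, then a plug-in rule) can be sketched roughly as follows. This is only an illustration under stated assumptions: the uniform weights are a placeholder and not the data-driven aggregation procedure studied in the paper, classes are indexed from 0 to M-1 rather than 1 to M, and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_estimates(train, x, k, num_classes):
    """Estimate eta_m(x) for each class m as its frequency among the k nearest points."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    counts = Counter(label for _, label in nearest)
    return [counts.get(m, 0) / k for m in range(num_classes)]

def aggregate_estimates(estimates, weights):
    """Convex combination of several vectors of class-probability estimates."""
    num_classes = len(estimates[0])
    return [sum(w * est[m] for w, est in zip(weights, estimates))
            for m in range(num_classes)]

def plug_in_label(eta_hat):
    """Plug-in rule: predict the class with the largest estimated probability."""
    return max(range(len(eta_hat)), key=lambda m: eta_hat[m])

# Hypothetical toy sample: points are tuples, labels are in {0, 1, 2}.
train = [((0.0,), 0), ((0.1,), 0), ((0.2,), 0),
         ((1.0,), 1), ((1.1,), 1), ((2.0,), 2)]
x = (0.05,)
ks = [1, 3, 5]  # plays the role of n_1, ..., n_K

estimates = [knn_estimates(train, x, k, num_classes=3) for k in ks]
weights = [1.0 / len(ks)] * len(ks)  # placeholder: uniform convex weights
eta_hat = aggregate_estimates(estimates, weights)
print(eta_hat, plug_in_label(eta_hat))
```

Replacing the uniform weights with weights chosen from the data is where an adaptive aggregation procedure would enter; the rest of the pipeline stays the same.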

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

PAC-Bayesian High Dimensional Bipartite Ranking

Authors: Guedj Benjamin, Robbiano Sylvain
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018

This paper is devoted to the bipartite ranking problem, a classical statistical learning task, in a high dimensional setting. We propose a scoring and ranking strategy based on the PAC-Bayesian approach. We consider nonlinear additive scoring functions, and we derive non-asymptotic risk bounds under a sparsity assumption. In particular, oracle inequalities in probability holding under a margin condition assess the performance of our procedure, and prove its minimax optimality. An MCMC-flavored algorithm is proposed to implement our method, along with its behavior on synthetic and real-life datasets.

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

UCL Discovery

Beyond Disagreement-based Agnostic Active Learning

Authors: Chaudhuri Kamalika, Zhang Chicheng
Publication date: 11/07/2014

We study agnostic active learning, where the goal is to learn a classifier in a pre-specified hypothesis class interactively with as few label queries as possible, while making no assumptions on the true function generating the labels. The main algorithms for this problem are disagreement-based active learning, which has a high label requirement, and margin-based active learning, which only applies to fairly restricted settings. A major challenge is to find an algorithm which achieves better label complexity, is consistent in an agnostic setting, and applies to general classification problems. In this paper, we provide such an algorithm. Our solution is based on two novel contributions -- a reduction from consistent active learning to confidence-rated prediction with guaranteed error, and a novel confidence-rated predictor.

arXiv.org e-Print Archive

CiteSeerX

Adapting to Unknown Smoothness by Aggregation of Thresholded Wavelet Estimators

Authors: Chesneau Christophe, Lecué Guillaume
Publication date: 01/01/2006

We study the performance of an adaptive procedure based on a convex combination, with data-driven weights, of term-by-term thresholded wavelet estimators. For the bounded regression model, with random uniform design, and the nonparametric density model, we show that the resulting estimator is optimal in the minimax sense over all Besov balls under the $L^2$ risk, without any logarithmic factor.

arXiv.org e-Print Archive

CiteSeerX

Hal-Diderot

HAL - UPEC / UPEM

Sharp Oracle Inequalities for Aggregation of Affine Estimators

Authors: Dalalyan Arnak, Salmon Joseph
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2012

We consider the problem of combining a (possibly uncountably infinite) set of affine estimators in a nonparametric regression model with heteroscedastic Gaussian noise. Focusing on the exponentially weighted aggregate, we prove a PAC-Bayesian type inequality that leads to sharp oracle inequalities in discrete but also in continuous settings. The framework is general enough to cover the combinations of various procedures such as least squares regression, kernel ridge regression, shrinking estimators and many other estimators used in the literature on statistical inverse problems. As a consequence, we show that the proposed aggregate provides an adaptive estimator in the exact minimax sense without either discretizing the range of tuning parameters or splitting the set of observations. We also illustrate numerically the good performance achieved by the exponentially weighted aggregate.

arXiv.org e-Print Archive

HAL-Ecole des Ponts ParisTech

A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity

Authors: Grünwald Peter D., Mehta Nishant A.
Publication date: 20/10/2017

We present a novel notion of complexity that interpolates between and generalizes some classic existing complexity notions in learning theory: for estimators like empirical risk minimization (ERM) with arbitrary bounded losses, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information complexity (also known as stochastic or PAC-Bayesian, $\mathrm{KL}(\text{posterior} \operatorname{\|} \text{prior})$ complexity). For (penalized) ERM, the new complexity reduces to (generalized) normalized maximum likelihood (NML) complexity, i.e. a minimax log-loss individual-sequence regret. Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity via Rademacher complexity to $L_2(P)$ entropy, thereby generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi who did the log-loss case with $L_\infty$. Together, these results recover optimal bounds for VC- and large (polynomial entropy) classes, replacing localized Rademacher complexity by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: 'easiness' (Bernstein) conditions and model complexity. Comment: 38 page

arXiv.org e-Print Archive

CWI's Institutional Repository