Guaranteed risk for multi-class discrimination models
Peer-reviewed conference proceedings. We study the generalization performance of multi-class discrimination systems. We establish two bounds on this performance, in terms of two capacity measures of the family of functions computed: the growth function and covering numbers. These bounds are evaluated on a classifier-combination model that estimates the posterior class probabilities. This makes it possible to compare the suitability of the two capacity measures.
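The covering-number capacity measure lends itself to a quick empirical sketch. The greedy routine below (all names and the greedy strategy are illustrative, not from the paper) estimates an ε-covering number of a finite set of predictors, each represented by its vector of outputs on a sample, under the sup distance on that sample:

```python
import numpy as np

def empirical_covering_number(outputs, eps):
    """Greedy estimate of the eps-covering number of a set of functions,
    each given as a row of outputs on a fixed sample, under the sup
    (l-infinity) distance on that sample.  The greedy centers form an
    eps-packing that also eps-covers, so the count sits between the true
    covering and packing numbers."""
    remaining = list(range(len(outputs)))
    centers = []
    while remaining:
        c = remaining[0]              # pick any still-uncovered function
        centers.append(c)
        # discard every function within eps of the chosen center
        remaining = [i for i in remaining
                     if np.max(np.abs(outputs[i] - outputs[c])) > eps]
    return len(centers)

rng = np.random.default_rng(0)
# 50 random "posterior estimators" evaluated on a sample of 20 points
outputs = rng.random((50, 20))
print(empirical_covering_number(outputs, 0.5))
```

Plotting this count against ε for a real model family gives a rough empirical feel for the capacity term that enters bounds of this kind.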
Learning from compressed observations
The problem of statistical learning is to construct a predictor of a random variable Y as a function of a related random variable X on the basis of an i.i.d. training sample from the joint distribution of (X, Y). Allowable predictors are drawn from some specified class, and the goal is to approach asymptotically the performance (expected loss) of the best predictor in the class. We consider the setting in which one has perfect observation of the X-part of the sample, while the Y-part has to be communicated at some finite bit rate. The encoding of the Y-values is allowed to depend on the X-values. Under suitable regularity conditions on the admissible predictors, the underlying family of probability distributions and the loss function, we give an information-theoretic characterization of achievable predictor performance in terms of conditional distortion-rate functions. The ideas are illustrated on the example of nonparametric regression in Gaussian noise.
Comment: 6 pages; submitted to the 2007 IEEE Information Theory Workshop (ITW 2007).
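A minimal numerical sketch of the setting: labels reach the learner through a fixed R-bit scalar quantizer (simpler than the X-dependent encodings the paper allows), and the fitted predictor's loss improves as the rate grows. The quantizer range, the model, and all names are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def quantize(y, rate_bits, lo=-3.0, hi=3.0):
    """Scalar uniform quantizer with 2**rate_bits levels on [lo, hi];
    the decoder reproduces each cell's midpoint."""
    levels = 2 ** rate_bits
    y = np.clip(y, lo, hi)
    idx = np.floor((y - lo) / (hi - lo) * levels).astype(int)
    idx = np.minimum(idx, levels - 1)
    return lo + (idx + 0.5) * (hi - lo) / levels

# Toy instance: the x-part is observed perfectly, the y-part is quantized.
n = 2000
x = rng.uniform(-1, 1, n)
y = 2.0 * x + rng.normal(0.0, 0.1, n)       # regression in Gaussian noise

mse = {}
for rate in (1, 3, 6):
    yq = quantize(y, rate)
    slope = np.sum(x * yq) / np.sum(x * x)  # least squares through origin
    mse[rate] = np.mean((y - slope * x) ** 2)
    print(f"{rate} bits/label -> MSE {mse[rate]:.4f}")
```

At high rates the loss approaches the noise floor of the unquantized problem, which is the qualitative behavior the distortion-rate characterization quantifies.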
Agnostic Learning of Disjunctions on Symmetric Distributions
We consider the problem of approximating and learning disjunctions (or equivalently, conjunctions) on symmetric distributions over {0,1}^n. Symmetric distributions are distributions whose PDF is invariant under any permutation of the variables. We give a simple proof that for every symmetric distribution D, there exists a set S of n^{O(log(1/ε))} functions such that for every disjunction c, there is a function p, expressible as a linear combination of functions in S, that ε-approximates c in ℓ1 distance on D, that is, E_{x~D}[|p(x) − c(x)|] ≤ ε. This directly gives an agnostic learning algorithm for disjunctions on symmetric distributions that runs in time n^{O(log(1/ε))}. The best previously known bound follows from approximation of the more general class of halfspaces (Wimmer, 2010). We also show that there exists a symmetric distribution D such that the minimum degree of a polynomial that 1/3-approximates the disjunction of all n variables in ℓ1 distance on D is Ω(√n). Therefore the learning result above cannot be achieved via ℓ1-regression with a polynomial basis, the approach used in most other agnostic learning algorithms.
Our technique also gives a simple proof that for every product distribution D and every disjunction c, there exists a polynomial p of degree O(log(1/ε)) such that p ε-approximates c in ℓ1 distance on D. This was first proved by Blais et al. (2008) via a more involved argument.
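The product-distribution case is easy to probe numerically: a disjunction of all n variables is a symmetric function, so under Bernoulli(p)^n both a degree-d fit and its exact ℓ1 error reduce to the distribution of the number of ones. The sketch below uses weighted least squares as a convenient stand-in for ℓ1 regression; n, p, and the degrees are illustrative choices, not from the paper.

```python
import numpy as np
from math import comb

n, p = 20, 0.1
k = np.arange(n + 1)
# P(exactly k coordinates are one) under Bernoulli(p)^n
w = np.array([comb(n, i) * p**i * (1 - p)**(n - i) for i in k])
c = (k >= 1).astype(float)          # the disjunction x1 OR ... OR xn
t = k / n                           # rescale for numerical stability

errs = {}
for d in (0, 2, 4, 8):
    V = np.vander(t, d + 1, increasing=True)
    # weighted least squares fit of a degree-d polynomial in t
    coef, *_ = np.linalg.lstsq(V * np.sqrt(w)[:, None],
                               c * np.sqrt(w), rcond=None)
    errs[d] = np.sum(w * np.abs(V @ coef - c))   # exact l1 distance on D
    print(f"degree {d}: l1 error {errs[d]:.4f}")
```

The error drops quickly as the degree grows, consistent with a degree that scales like log(1/ε) rather than polynomially in 1/ε.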
Complexity of hyperconcepts
In machine learning, maximizing the sample margin can reduce the generalization error. Samples on which the target function has a large margin (γ) convey more information, since they yield more accurate hypotheses. Let X be a finite domain and let 𝒮 denote the set of all samples S ⊆ X of fixed cardinality m. Let H be a class of hypotheses h on X. A hyperconcept h′ is defined as the indicator function of the set A ⊆ 𝒮 of all samples on which the corresponding hypothesis h has a margin of at least γ. An estimate of the complexity of the class H′ of hyperconcepts h′ is obtained, with explicit dependence on γ, the pseudo-dimension of H, and m.
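The definition is concrete for a linear hypothesis class. The sketch below (assuming a linear hypothesis h(x) = ⟨w, x⟩; the data and all names are illustrative, not from the paper) builds the hyperconcept h′ for one hypothesis: an indicator over samples that fires exactly when every point of the sample clears the margin γ.

```python
import numpy as np

rng = np.random.default_rng(2)

def margin(w, X, y):
    """Smallest normalized margin of the linear hypothesis x -> <w, x>
    over the labeled sample (X, y)."""
    return np.min(y * (X @ w)) / np.linalg.norm(w)

def hyperconcept(w, gamma):
    """Indicator over samples: 1.0 iff the hypothesis w attains margin
    at least gamma on every point of the given sample."""
    return lambda X, y: float(margin(w, X, y) >= gamma)

# A sample S of cardinality m, labeled by a ground-truth direction.
m, dim = 30, 5
w_true = np.ones(dim)
X = rng.normal(size=(m, dim))
y = np.sign(X @ w_true)

h_prime = hyperconcept(w_true, gamma=0.01)
print(h_prime(X, y))
```

Ranging over all hypotheses h in H yields the class H′ whose complexity the abstract bounds in terms of γ, the pseudo-dimension of H, and m.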