98 research outputs found

On the Universality of the Logistic Loss Function

Authors: Painsky Amichai, Wornell Gregory W.
Publication date: 10/05/2018

A loss function measures the discrepancy between the true values (observations) and their estimated fits, for a given instance of data. A loss function is said to be proper (unbiased, Fisher consistent) if the fits are defined over a unit simplex, and the minimizer of the expected loss is the true underlying probability of the data. Typical examples are the zero-one loss, the quadratic loss and the Bernoulli log-likelihood loss (log-loss). In this work we show that for binary classification problems, the divergence associated with smooth, proper and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a multiplicative normalization constant. This implies that by minimizing the log-loss (associated with the KL divergence), we minimize an upper bound on the divergence of any loss function in this set. This property justifies the broad use of log-loss in regression, decision trees, deep neural networks and many other applications. In addition, we show that the KL divergence bounds from above any separable Bregman divergence that is convex in its second argument (up to a multiplicative normalization constant). This result introduces a new set of divergence inequalities, similar to the well-known Pinsker inequality.
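
The bound described in this abstract can be checked numerically in one simple case. The sketch below is not taken from the paper: it takes the quadratic loss as an example, whose divergence for Bernoulli parameters p (truth) and q (fit) is (p - q)^2, and compares it with the KL divergence in nats. The constant 1/2 used here comes from Pinsker's inequality and is an assumption for illustration; the paper's own normalization constant may be stated differently. The function names are hypothetical.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence D(p || q) between Bernoulli(p) and Bernoulli(q), in nats."""
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:
            total += a * math.log(a / b)
    return total

def quadratic_divergence(p, q):
    """Excess expected quadratic loss of fitting q when the true probability is p."""
    return (p - q) ** 2

def max_ratio(grid_size=199):
    """Largest observed ratio quadratic_divergence / KL over an open grid of (p, q)."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    worst = 0.0
    for p in grid:
        for q in grid:
            if p != q:  # the ratio is undefined when both divergences are zero
                ratio = quadratic_divergence(p, q) / kl_bernoulli(p, q)
                worst = max(worst, ratio)
    return worst

if __name__ == "__main__":
    # Assumption: natural logarithms, so Pinsker gives (p - q)^2 <= KL / 2.
    print("largest ratio on the grid:", max_ratio())
```

On this grid the ratio stays below 1/2, which is consistent with the quadratic divergence being bounded by the KL divergence up to a multiplicative constant.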

arXiv.org e-Print Archive

Crossref

DSpace@MIT

Multiclass Learning with Simplex Coding

Authors: Mroueh Youssef, Poggio Tomaso, Rosasco Lorenzo, Slotine Jean-Jacques
Publication date: 01/01/2012

In this paper we discuss a novel framework for multiclass learning, defined by a suitable coding/decoding strategy, namely the simplex coding, which generalizes to multiple classes a relaxation approach commonly used in binary classification. In this framework, a relaxation error analysis can be developed without imposing constraints on the hypothesis class considered. Moreover, we show that in this setting it is possible to derive the first provably consistent regularized method with training/tuning complexity that is independent of the number of classes. Tools from convex analysis are introduced that can be used beyond the scope of this paper.
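
The abstract names the simplex coding without defining it. A minimal sketch of one common construction follows, under stated assumptions: each of the K classes is encoded as a unit vector, every pair of distinct code vectors has inner product -1/(K - 1), and decoding picks the class whose code vector has the largest inner product with the predicted score vector. For simplicity the vectors here are embedded in R^K (they span a (K - 1)-dimensional subspace). The function names `simplex_code` and `decode` are hypothetical and not taken from the paper.

```python
import math

def simplex_code(num_classes):
    """Return num_classes unit vectors with pairwise inner product -1/(num_classes - 1)."""
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    # Centre the standard basis of R^k at its mean, then rescale to unit norm.
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]

def decode(score, code):
    """Return the index of the code vector with the largest inner product with score."""
    dots = [sum(s * c for s, c in zip(score, vec)) for vec in code]
    return max(range(len(code)), key=dots.__getitem__)

if __name__ == "__main__":
    code = simplex_code(4)
    # A score vector close to the code of class 2 should decode to class 2.
    noisy = [c + 0.05 for c in code[2]]
    print(decode(noisy, code))
```

For K = 2 this construction reduces to two opposite unit vectors, which corresponds to the usual +1/-1 coding of binary classification with sign decoding.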

arXiv.org e-Print Archive

DSpace@MIT

Archivio istituzionale della ricerca - Università di Genova

Consistency of probabilistic classifier trees

Authors: A Beygelzimer, A Kumar, F Hutter, J Duchi, J Fox, L Bottou, MD Reid, PL Bartlett, T Cover
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Crossref

Ghent University Academic Bibliography