Composite multiclass losses
We consider loss functions for multiclass prediction problems. We show when a multiclass loss can be expressed as a “proper composite loss”, which is the composition of a proper loss and a link function. We extend existing results for binary losses to multiclass losses. We subsume results on “classification calibration” by relating it to properness. We determine the stationarity condition, Bregman representation, order-sensitivity, and quasi-convexity of multiclass proper losses. We then characterise the existence and uniqueness of the composite representation for multiclass losses. We show how the composite representation is related to other core properties of a loss: mixability, admissibility and (strong) convexity of multiclass losses, which we characterise in terms of the Hessian of the Bayes risk. We show that the simple integral representation for binary proper losses cannot be extended to multiclass losses, but offer concrete guidance regarding how to design different loss functions. The conclusion drawn from these results is that the proper composite representation is a natural and convenient tool for the design of multiclass loss functions.
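As a rough illustration of the "proper loss composed with a link" idea, the sketch below pairs the log loss (a standard proper loss) with the softmax map as an inverse link. This pairing, the function names, and the example distribution `p` are assumptions chosen for illustration; they are not taken from the paper.

```python
import math

def softmax(v):
    """Inverse link: map a real-valued score vector to the probability simplex."""
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def log_loss(y, q):
    """Proper loss on the simplex: penalty for predicting q when class y occurs."""
    return -math.log(q[y])

def composite_loss(y, v):
    """Composite loss: the proper loss applied to the inverse link of the scores."""
    return log_loss(y, softmax(v))

def expected_loss(p, q):
    """Expected log loss of prediction q when labels are drawn from p."""
    return sum(p_i * log_loss(i, q) for i, p_i in enumerate(p))

# Hypothetical true class distribution and a few candidate predictions.
p = [0.5, 0.3, 0.2]
candidates = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.6, 0.2, 0.2], [1 / 3, 1 / 3, 1 / 3]]
risks = [expected_loss(p, q) for q in candidates]
best = candidates[risks.index(min(risks))]  # properness: the true p should win
```

Among the listed candidates, the expected loss is smallest at the true distribution, which is the behaviour properness requires. The check covers only these candidates, so it is a sanity check rather than a proof.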
On the Universality of the Logistic Loss Function
A loss function measures the discrepancy between the true values
(observations) and their estimated fits, for a given instance of data. A loss
function is said to be proper (unbiased, Fisher consistent) if the fits are
defined over a unit simplex, and the minimizer of the expected loss is the true
underlying probability of the data. Typical examples are the zero-one loss, the
quadratic loss and the Bernoulli log-likelihood loss (log-loss). In this work
we show that for binary classification problems, the divergence associated with
smooth, proper and convex loss functions is bounded from above by the
Kullback-Leibler (KL) divergence, up to a multiplicative normalization
constant. It implies that by minimizing the log-loss (associated with the KL
divergence), we minimize an upper bound to any choice of loss functions from
this set. This property justifies the broad use of log-loss in regression,
decision trees, deep neural networks and many other applications. In addition,
we show that the KL divergence bounds from above any separable Bregman
divergence that is convex in its second argument (up to a multiplicative
normalization constant). This result introduces a new set of divergence
inequalities, similar to the well-known Pinsker inequality.
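A small numerical check can make the bound concrete for one member of the family. The sketch below uses the quadratic loss in the binary case, where the associated divergence (expected loss at q minus expected loss at p) works out to (p - q)^2, and compares it with the KL divergence between Bernoulli distributions. The constant 2 is the one given by Pinsker's inequality for this particular loss; the abstract does not state the paper's normalization constant, so that choice, the grid, and all function names are assumptions.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence (in nats) between Bernoulli(p) and Bernoulli(q), 0 < q < 1."""
    total = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:  # the term 0 * log(0 / b) is taken as 0
            total += a * math.log(a / b)
    return total

def quadratic_risk(p, q):
    """Expected quadratic loss (y - q)^2 when y ~ Bernoulli(p)."""
    return p * (1 - q) ** 2 + (1 - p) * q ** 2

def quadratic_divergence(p, q):
    """Regret of predicting q instead of p; algebraically equal to (p - q)^2."""
    return quadratic_risk(p, q) - quadratic_risk(p, p)

# Scan an interior grid and record any pair where KL fails to dominate.
grid = [i / 20 for i in range(1, 20)]
violations = [
    (p, q)
    for p in grid
    for q in grid
    if kl_bernoulli(p, q) + 1e-12 < 2 * quadratic_divergence(p, q)
]
```

If `violations` comes back empty, the KL divergence dominates twice the quadratic divergence everywhere on the grid, consistent with Pinsker's inequality.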