Bregman Divergence Bounds and the Universality of the Logarithmic Loss
A loss function measures the discrepancy between the true values and their
estimated fits, for a given instance of data. In classification problems, a
loss function is said to be proper if the minimizer of the expected loss is the
true underlying probability. In this work we show that for binary
classification, the divergence associated with smooth, proper and convex loss
functions is bounded from above by the Kullback-Leibler (KL) divergence, up to
a normalization constant. This implies that by minimizing the log-loss
(associated with the KL divergence), we minimize an upper bound on any choice
of loss from this set. This property suggests that the log-loss is universal in
the sense that it provides performance guarantees to a broad class of accuracy
measures. Importantly, our notion of universality is not restricted to a
specific problem. This allows us to apply our results to many applications,
including predictive modeling, data clustering and sample complexity analysis.
Further, we show that the KL divergence bounds from above any separable Bregman
divergence that is convex in its second argument (up to a normalization
constant). This result introduces a new set of divergence inequalities, similar
to the Pinsker inequality, and extends well-known f-divergence inequality
results.
Comment: arXiv admin note: substantial text overlap with arXiv:1805.0380
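A minimal numerical sketch of the stated bound for one member of this loss family, assuming the squared (Brier) loss: its associated divergence between Bernoulli parameters p and q is (p - q)^2, and the binary form of the Pinsker inequality gives (p - q)^2 <= (1/2) KL(p || q), i.e., the KL divergence bounds it from above up to a normalization constant. The constant 1/2 and the helper names below are illustrative, not taken from the paper.

```python
import numpy as np

def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), natural log."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def brier_divergence(p, q):
    """Divergence of the squared (Brier) loss: E_p[(Y-q)^2] - E_p[(Y-p)^2] = (p-q)^2."""
    return (p - q) ** 2

# Check (p - q)^2 <= 0.5 * KL(p || q) on a grid of Bernoulli parameters.
grid = np.linspace(0.01, 0.99, 99)
P, Q = np.meshgrid(grid, grid)
assert np.all(brier_divergence(P, Q) <= 0.5 * kl_bernoulli(P, Q) + 1e-9)
print("Brier divergence is bounded by KL/2 on the whole grid.")
```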
Minimax optimal quantile and semi-adversarial regret via root-logarithmic regularizers
Quantile (and, more generally, KL) regret bounds, such as those achieved by
NormalHedge (Chaudhuri, Freund, and Hsu 2009) and its variants, relax the
goal of competing against the best individual expert to only competing against
a majority of experts on adversarial data. More recently, the semi-adversarial
paradigm (Bilodeau, Negrea, and Roy 2020) provides an alternative relaxation of
adversarial online learning by considering data that may be neither fully adversarial
nor stochastic (i.i.d.). We achieve the minimax optimal regret in both paradigms
using FTRL with separate, novel, root-logarithmic regularizers, both of which
can be interpreted as yielding variants of NormalHedge. We extend existing KL
regret upper bounds, which hold uniformly over target distributions, to possibly
uncountable expert classes with arbitrary priors; provide the first full-information
lower bounds for quantile regret on finite expert classes (which are tight); and
provide an adaptively minimax optimal algorithm for the semi-adversarial paradigm
that adapts to the true, unknown constraint faster, leading to uniformly improved
regret bounds over existing methods.
https://arxiv.org/pdf/2110.14804.pdf
Published version
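As a rough illustration of the FTRL template underlying these results, the sketch below runs follow-the-regularized-leader over the probability simplex with a plain negative-entropy regularizer, for which the update has the closed form of exponential weights (Hedge). The paper's root-logarithmic regularizers, which yield NormalHedge-like variants, would replace that placeholder choice; the function name and fixed learning rate are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ftrl_hedge(loss_matrix, eta=0.5):
    """Generic FTRL over the simplex with a negative-entropy regularizer.

    With this regularizer the regularized leader has the closed form of
    exponential weights: p_t proportional to exp(-eta * cumulative loss).
    A root-logarithmic regularizer would change this update (not shown).
    """
    T, n = loss_matrix.shape
    cum_loss = np.zeros(n)          # cumulative loss of each expert
    total_learner_loss = 0.0
    for t in range(T):
        logits = -eta * cum_loss
        logits -= logits.max()      # numerical stability
        p = np.exp(logits)
        p /= p.sum()                # play the regularized leader
        total_learner_loss += p @ loss_matrix[t]
        cum_loss += loss_matrix[t]  # observe full-information losses
    return total_learner_loss - cum_loss.min()   # regret vs. best expert

# Toy run: 3 experts, 200 rounds of random losses in [0, 1].
rng = np.random.default_rng(0)
losses = rng.random((200, 3))
print("regret vs. best expert:", ftrl_hedge(losses))
```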