10 research outputs found
On the Universality of the Logistic Loss Function
A loss function measures the discrepancy between the true values
(observations) and their estimated fits, for a given instance of data. A loss
function is said to be proper (unbiased, Fisher consistent) if the fits are
defined over a unit simplex, and the minimizer of the expected loss is the true
underlying probability of the data. Typical examples are the zero-one loss, the
quadratic loss and the Bernoulli log-likelihood loss (log-loss). In this work
we show that for binary classification problems, the divergence associated with
smooth, proper and convex loss functions is bounded from above by the
Kullback-Leibler (KL) divergence, up to a multiplicative normalization
constant. This implies that by minimizing the log-loss (associated with the KL
divergence), we also minimize an upper bound on any loss function from this
set. This property justifies the broad use of the log-loss in regression,
decision trees, deep neural networks and many other applications. In addition,
we show that the KL divergence bounds from above any separable Bregman
divergence that is convex in its second argument (up to a multiplicative
normalization constant). This result introduces a new set of divergence
inequalities, similar to the well-known Pinsker inequality.
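A minimal numerical sketch of the binary case (illustrative, not code from the paper): for the quadratic (Brier) loss, the associated divergence is (p - q)^2, and a Pinsker-type bound gives (p - q)^2 <= KL(p||q)/2, an instance of the KL divergence bounding another proper-loss divergence up to a constant.

```python
import numpy as np

def kl_bin(p, q):
    # binary KL divergence (in nats) between Bernoulli(p) and Bernoulli(q)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def quad_div(p, q):
    # divergence of the proper quadratic (Brier) loss: (p - q)^2
    return (p - q) ** 2

# check the bound quad_div <= KL/2 over a grid of interior probabilities
ps = np.linspace(0.01, 0.99, 99)
P, Q = np.meshgrid(ps, ps)
assert np.all(quad_div(P, Q) <= kl_bin(P, Q) / 2 + 1e-12)
```

The 1/2 here is the multiplicative normalization constant the abstract refers to, specialized to the quadratic loss.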
Bregman Divergence Bounds and the Universality of the Logarithmic Loss
A loss function measures the discrepancy between the true values and their
estimated fits, for a given instance of data. In classification problems, a
loss function is said to be proper if the minimizer of the expected loss is the
true underlying probability. In this work we show that for binary
classification, the divergence associated with smooth, proper and convex loss
functions is bounded from above by the Kullback-Leibler (KL) divergence, up to
a normalization constant. This implies that by minimizing the log-loss
(associated with the KL divergence), we also minimize an upper bound on any
loss from this set. This property suggests that the log-loss is universal in
the sense that it provides performance guarantees to a broad class of accuracy
measures. Importantly, our notion of universality is not restricted to a
specific problem. This allows us to apply our results to many applications,
including predictive modeling, data clustering and sample complexity analysis.
Further, we show that the KL divergence bounds from above any separable Bregman
divergence that is convex in its second argument (up to a normalization
constant). This result introduces a new set of divergence inequalities, similar
to the Pinsker inequality, and extends well-known f-divergence inequality
results.
Comment: arXiv admin note: substantial text overlap with arXiv:1805.0380
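As an illustrative sketch (not code from the paper), a separable Bregman divergence can be computed directly from its generator; choosing the negative-entropy generator phi(x) = x log x recovers the KL divergence named in the abstract.

```python
import numpy as np

def bregman(phi, grad_phi, p, q):
    # separable Bregman divergence:
    #   sum_i [ phi(p_i) - phi(q_i) - phi'(q_i) * (p_i - q_i) ]
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(phi(p) - phi(q) - grad_phi(q) * (p - q))

# negative-entropy generator and its derivative
neg_ent = lambda x: x * np.log(x)
neg_ent_grad = lambda x: np.log(x) + 1.0

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])
kl = np.sum(p * np.log(p / q))
# on the simplex, this Bregman divergence equals the KL divergence
assert np.isclose(bregman(neg_ent, neg_ent_grad, p, q), kl)
```

Other generators (e.g. phi(x) = x^2) yield the other separable Bregman divergences that, per the abstract, the KL divergence bounds from above up to a normalization constant.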
Using Conformal Win Probability to Predict the Winners of the Cancelled 2020 NCAA Basketball Tournaments
The COVID-19 pandemic was responsible for the cancellation of both the men's
and women's 2020 National Collegiate Athletic Association (NCAA) Division 1
basketball tournaments. Starting from the point at which the Division 1
tournaments and any unfinished conference tournaments were cancelled, we
deliver closed-form probabilities that each team would have made the Division 1
tournaments, had they not been cancelled, aided by conformal predictive
distributions. We also deliver probabilities of a team winning March Madness,
given a tournament bracket. We then compare single-game win probabilities
generated with conformal predictive distributions, aptly named conformal win
probabilities, to those generated through linear and logistic regression on
seven years of historical college basketball data, specifically from the
2014-2015 season through the 2020-2021 season. Conformal win probabilities are
shown to be better calibrated than other methods, resulting in more accurate
win probability estimates, while requiring fewer distributional assumptions.
Comment: preprint submitted to Journal of Quantitative Analysis in Sports, 28
pages without figures; figures included at end of document
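A minimal split-conformal sketch of the idea on synthetic data (variable names and the data-generating process are ours, not the paper's): a win probability is obtained by evaluating the conformal predictive distribution of the game margin at zero, i.e. the fraction of calibration residuals that push the predicted spread above zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic calibration set: predicted point spread vs. actual margin of victory
pred_spread = rng.normal(0.0, 8.0, 500)
actual_margin = pred_spread + rng.normal(0.0, 10.0, 500)

# split-conformal residuals from the calibration set
resid = actual_margin - pred_spread

def conformal_win_prob(spread, resid):
    # P(margin > 0): the conformal predictive distribution of the margin,
    # centered at the predicted spread, evaluated at zero
    return np.mean(spread + resid > 0)

print(conformal_win_prob(5.0, resid))
```

Because the residuals come from held-out calibration games rather than a parametric model, the resulting probabilities inherit conformal calibration guarantees with few distributional assumptions, which is the property the abstract highlights.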