10 research outputs found

    On the Universality of the Logistic Loss Function

    A loss function measures the discrepancy between the true values (observations) and their estimated fits, for a given instance of data. A loss function is said to be proper (unbiased, Fisher consistent) if the fits are defined over a unit simplex and the minimizer of the expected loss is the true underlying probability of the data. Typical examples are the zero-one loss, the quadratic loss, and the Bernoulli log-likelihood loss (log-loss). In this work we show that for binary classification problems, the divergence associated with smooth, proper, and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a multiplicative normalization constant. This implies that by minimizing the log-loss (associated with the KL divergence), we minimize an upper bound on any choice of loss function from this set. This property justifies the broad use of the log-loss in regression, decision trees, deep neural networks, and many other applications. In addition, we show that the KL divergence bounds from above any separable Bregman divergence that is convex in its second argument (up to a multiplicative normalization constant). This result introduces a new set of divergence inequalities, similar to the well-known Pinsker inequality.
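    The claimed bound can be written out explicitly; the following LaTeX sketch uses illustrative notation (the loss-associated divergence D_ell and the constant C_ell are names chosen here, not taken from the paper):

        % p, q in (0,1): true and estimated probabilities for the positive class.
        % D_ell: the divergence associated with a smooth, proper, convex loss ell.
        \[
          D_{\mathrm{KL}}(p \,\|\, q) = p \log\frac{p}{q} + (1-p)\log\frac{1-p}{1-q},
        \]
        \[
          D_{\ell}(p \,\|\, q) \;\le\; C_{\ell}\, D_{\mathrm{KL}}(p \,\|\, q)
          \quad \text{for all } p, q \in (0,1),
        \]
        % so driving the log-loss (and hence D_KL) to zero also drives D_ell
        % to zero for every loss ell in this class.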

    Bregman Divergence Bounds and the Universality of the Logarithmic Loss

    A loss function measures the discrepancy between the true values and their estimated fits, for a given instance of data. In classification problems, a loss function is said to be proper if the minimizer of the expected loss is the true underlying probability. In this work we show that for binary classification, the divergence associated with smooth, proper, and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a normalization constant. This implies that by minimizing the log-loss (associated with the KL divergence), we minimize an upper bound on any choice of loss from this set. This property suggests that the log-loss is universal in the sense that it provides performance guarantees to a broad class of accuracy measures. Importantly, our notion of universality is not restricted to a specific problem. This allows us to apply our results to many applications, including predictive modeling, data clustering, and sample complexity analysis. Further, we show that the KL divergence bounds from above any separable Bregman divergence that is convex in its second argument (up to a normalization constant). This result introduces a new set of divergence inequalities, similar to the Pinsker inequality, and extends well-known f-divergence inequality results.
    Comment: arXiv admin note: substantial text overlap with arXiv:1805.0380
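    For orientation, the Pinsker inequality mentioned in the abstract, and the analogous form of the claimed Bregman bound, can be sketched in LaTeX as follows (B_phi and c_phi are illustrative names, not notation from the paper):

        \[
          \tfrac{1}{2}\,\|P - Q\|_1^2 \;\le\; D_{\mathrm{KL}}(P \,\|\, Q)
          \qquad \text{(Pinsker's inequality)},
        \]
        \[
          B_{\phi}(p \,\|\, q) \;\le\; c_{\phi}\, D_{\mathrm{KL}}(p \,\|\, q),
        \]
        % B_phi: a separable Bregman divergence convex in its second argument;
        % c_phi: a normalization constant depending on the generator phi.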

    Using Conformal Win Probability to Predict the Winners of the Cancelled 2020 NCAA Basketball Tournaments

    The COVID-19 pandemic was responsible for the cancellation of both the men's and women's 2020 National Collegiate Athletic Association (NCAA) Division 1 basketball tournaments. Starting from the point at which the Division 1 tournaments and any unfinished conference tournaments were cancelled, we deliver closed-form probabilities for each team of making the Division 1 tournaments, had they not been cancelled, aided by the use of conformal predictive distributions. We also deliver probabilities of a team winning March Madness, given a tournament bracket. We then compare single-game win probabilities generated with conformal predictive distributions, aptly named conformal win probabilities, to those generated through linear and logistic regression on seven years of historical college basketball data, specifically from the 2014-2015 season through the 2020-2021 season. Conformal win probabilities are shown to be better calibrated than other methods, resulting in more accurate win probability estimates, while requiring fewer distributional assumptions.
    Comment: preprint submitted to the Journal of Quantitative Analysis in Sports, 28 pages without figures; figures included at end of document
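    As a rough illustration of the single-game conformal win probability idea (a sketch, not the authors' code), the following Python example builds a split-conformal predictive distribution over the home score differential using scikit-learn's LinearRegression and reads off a win probability; the feature layout and the synthetic data are assumptions made for the example.

        import numpy as np
        from sklearn.linear_model import LinearRegression

        def conformal_win_probability(X_train, y_train, X_cal, y_cal, x_game):
            """P(home score differential > 0) for one game, via a split-conformal
            predictive distribution over the differential."""
            # Fit any point predictor on the proper training split.
            model = LinearRegression().fit(X_train, y_train)
            # Calibration residuals define the conformal predictive distribution.
            residuals = y_cal - model.predict(X_cal)
            # Point prediction for the game of interest.
            y_hat = model.predict(x_game.reshape(1, -1))[0]
            # Predictive CDF at 0: share of calibration residuals with
            # y_hat + residual <= 0 (with the usual +1 finite-sample correction).
            cdf_at_zero = (np.sum(y_hat + residuals <= 0) + 1) / (len(residuals) + 1)
            # Win probability = P(differential > 0) under the predictive distribution.
            return 1.0 - cdf_at_zero

        # Synthetic demonstration (illustrative only).
        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 7))    # hypothetical team-strength features
        y = X @ rng.normal(size=7) + rng.normal(scale=5.0, size=500)
        x_new = rng.normal(size=7)       # features of the game to predict
        p_win = conformal_win_probability(X[:300], y[:300], X[300:], y[300:], x_new)
        print(f"conformal win probability: {p_win:.3f}")

    The train/calibration split mirrors standard split-conformal practice; any regressor could stand in for LinearRegression.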