From Classification Accuracy to Proper Scoring Rules: Elicitability of Probabilistic Top List Predictions
In the face of uncertainty, the need for probabilistic assessments has long
been recognized in the literature on forecasting. In classification, however,
comparative evaluation of classifiers often focuses on predictions specifying a
single class through the use of simple accuracy measures, which disregard any
probabilistic uncertainty quantification. I propose probabilistic top lists as
a novel type of prediction in classification, which bridges the gap between
single-class predictions and predictive distributions. The probabilistic top
list functional is elicitable through the use of strictly consistent evaluation
metrics. The proposed evaluation metrics are based on symmetric proper scoring
rules and admit comparison of various types of predictions ranging from
single-class point predictions to fully specified predictive distributions. The
Brier score yields a metric that is particularly well suited for this kind of
comparison.
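As a rough illustration of the Brier score mentioned above, the following sketch computes the standard multiclass Brier score for a full predictive distribution and its expectation under an assumed true distribution. It is not the top-list metric proposed in the paper, whose exact construction is not given in this abstract; the function names `brier_score` and `expected_brier` are hypothetical.

```python
def brier_score(probs, label):
    """Standard multiclass Brier score for one forecast.

    probs: list of predicted class probabilities (assumed to sum to 1).
    label: index of the observed class.
    Returns sum_k (p_k - 1[k == label])^2; lower is better.
    """
    return sum(
        (p - (1.0 if k == label else 0.0)) ** 2
        for k, p in enumerate(probs)
    )


def expected_brier(probs, true_dist):
    """Expected Brier score of `probs` when labels are drawn from `true_dist`."""
    return sum(q * brier_score(probs, k) for k, q in enumerate(true_dist))


# Illustrative (made-up) numbers: a true distribution and two forecasts.
true_dist = [0.5, 0.3, 0.2]
honest = [0.5, 0.3, 0.2]         # reports the true distribution
overconfident = [1.0, 0.0, 0.0]  # a single-class point prediction as a one-hot vector

honest_score = expected_brier(honest, true_dist)
overconfident_score = expected_brier(overconfident, true_dist)
```

Because the Brier score is a proper scoring rule, reporting the true distribution gives the lowest expected score, so `honest_score` is smaller than `overconfident_score` here.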
In Defense of Softmax Parametrization for Calibrated and Consistent Learning to Defer
Enabling machine learning classifiers to defer their decision to a downstream
expert when the expert is more accurate will ensure improved safety and
performance. This objective can be achieved with the learning-to-defer
framework which aims to jointly learn how to classify and how to defer to the
expert. In recent studies, it has been theoretically shown that popular
estimators for learning to defer parameterized with softmax provide unbounded
estimates for the likelihood of deferring, which makes them uncalibrated.
However, it remains unknown whether this is due to the widely used softmax
parameterization and whether a softmax-based estimator exists that is
statistically consistent and also yields a valid probability estimator. In this
work, we first show that the miscalibrated and unbounded estimator in prior
literature is caused by the symmetric nature of the surrogate losses used,
not by softmax. We then propose a novel statistically consistent
asymmetric softmax-based surrogate loss that can produce valid estimates
without the issue of unboundedness. We further analyze the non-asymptotic
properties of our method and empirically validate its performance and
calibration on benchmark datasets. Comment: NeurIPS 202
Improve learning combining crowdsourced labels by weighting Areas Under the Margin
In supervised learning -- for instance in image classification -- modern
massive datasets are commonly labeled by a crowd of workers. The obtained
labels in this crowdsourcing setting are then aggregated for training. The
aggregation step generally leverages a per-worker trust score. Yet, such
worker-centric approaches disregard each task's ambiguity. Some intrinsically
ambiguous tasks might even fool expert workers, which could eventually be
harmful for the learning step. In a standard supervised learning setting --
with one label per task and balanced classes -- the Area Under the Margin (AUM)
statistic is tailored to identify mislabeled data. We adapt the AUM to identify
ambiguous tasks in crowdsourced learning scenarios, introducing the Weighted
AUM (WAUM). The WAUM is an average of AUMs weighted by worker and task
dependent scores. We show that the WAUM can help discard ambiguous tasks
from the training set, leading to better generalization or calibration
performance. We report improvements with respect to feature-blind aggregation
strategies both for simulated settings and for the CIFAR-10H crowdsourced
dataset.
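The idea of averaging AUMs with weights can be sketched as follows. This is a minimal illustration under stated assumptions: the margin is taken as the assigned-class logit minus the largest other logit, averaged over training epochs, and the per-worker weights are passed in as plain numbers because the abstract does not say how the worker- and task-dependent scores are computed. The names `margin`, `aum`, and `weighted_aum` are hypothetical.

```python
def margin(logits, assigned):
    """Assigned-class logit minus the largest logit among the other classes."""
    others = [z for k, z in enumerate(logits) if k != assigned]
    return logits[assigned] - max(others)


def aum(logits_per_epoch, assigned):
    """Area Under the Margin: the margin averaged over training epochs."""
    margins = [margin(logits, assigned) for logits in logits_per_epoch]
    return sum(margins) / len(margins)


def weighted_aum(aums, weights):
    """Weighted average of per-worker AUMs for one task.

    `weights` stand in for the worker- and task-dependent scores; how they
    are obtained is an assumption left open here.
    """
    total = sum(weights)
    return sum(w * a for w, a in zip(weights, aums)) / total


# Made-up logits for one task over two epochs (three classes).
logits_per_epoch = [[2.0, 1.0, 0.0], [3.0, 1.0, 0.5]]

aum_worker_a = aum(logits_per_epoch, assigned=0)  # worker A answered class 0
aum_worker_b = aum(logits_per_epoch, assigned=1)  # worker B answered class 1
task_waum = weighted_aum([aum_worker_a, aum_worker_b], weights=[3.0, 1.0])
```

A low or negative value would flag the task as a candidate for removal from the training set; the threshold to use is not specified in the abstract.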
Designing Consistent and Convex Surrogates for General Prediction Tasks
Supervised machine learning algorithms are often predicated on the minimization of loss functions which measure error of a given prediction against a ground truth label. The choice of loss function to minimize corresponds to a summary statistic of the underlying data distribution that is learned in this process. Historically, loss function design has often been ad hoc and often results in losses that are not actually statistically consistent with respect to the target prediction task. This work focuses on the design of losses that are simultaneously convex, consistent with respect to a target prediction task, and efficient in the dimension of the prediction space. We provide frameworks to construct such losses in both discrete prediction and continuous estimation settings, as well as tools to lower bound the prediction dimension for certain classes of consistent convex losses. We apply our results throughout to understand prediction tasks such as high-confidence classification, top-k prediction, variance estimation, conditional value at risk, and ratios of expectations.