The learning to defer (L2D) framework has the potential to make AI systems
safer. For a given input, the system can defer the decision to a human if the
human is more likely than the model to take the correct action. We study the
calibration of L2D systems, investigating if the probabilities they output are
sound. We find that Mozannar & Sontag's (2020) multiclass framework is not
calibrated with respect to expert correctness. Moreover, it is not even
guaranteed to produce valid probabilities due to its parameterization being
degenerate for this purpose. We propose an L2D system based on one-vs-all
classifiers that is able to produce calibrated probabilities of expert
correctness. Furthermore, our loss function is also a consistent surrogate for
multiclass L2D, like Mozannar & Sontag's (2020). Our experiments verify that
not only is our system calibrated, but this benefit comes at no cost to
accuracy. Our model's accuracy is always comparable (and often superior) to
Mozannar & Sontag's (2020) model's in tasks ranging from hate speech detection
to galaxy classification to diagnosis of skin lesions.Comment: Accepted at the International Conference on Machine Learning (ICML),
202