6 research outputs found
Binary Classifier Calibration using an Ensemble of Near Isotonic Regression Models
Learning accurate probabilistic models from data is crucial in many practical
tasks in data mining. In this paper we present a new non-parametric calibration
method called \textit{ensemble of near isotonic regression} (ENIR). The method
can be considered as an extension of BBQ, a recently proposed calibration
method, as well as the commonly used calibration method based on isotonic
regression. ENIR is designed to address the key limitation of isotonic
regression which is the monotonicity assumption of the predictions. Similar to
BBQ, the method post-processes the output of a binary classifier to obtain
calibrated probabilities. Thus it can be combined with many existing
classification models. We demonstrate the performance of ENIR on synthetic and
real datasets for the commonly used binary classification models. Experimental
results show that the method outperforms several common binary classifier
calibration methods. In particular on the real data, ENIR commonly performs
statistically significantly better than the other methods, and never worse. It
is able to improve the calibration power of classifiers, while retaining their
discrimination power. The method is also computationally tractable for large
scale datasets, as it runs in $O(N \log N)$ time, where $N$ is the number of
samples.
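ENIR itself is not reproduced here, but the standard isotonic-regression calibration it extends can be sketched with the classic pool-adjacent-violators (PAV) algorithm. The function name and toy data below are illustrative, not from the paper; this is a minimal numpy sketch of the monotone baseline, not the ensemble method.

```python
import numpy as np

def pav_calibrate(scores, labels):
    """Fit a non-decreasing map from classifier scores to probabilities
    via the pool-adjacent-violators algorithm (standard isotonic regression)."""
    order = np.argsort(scores)
    y = labels[order].astype(float)
    means, weights = [], []  # blocks of pooled, monotone means
    for v in y:
        means.append(v)
        weights.append(1.0)
        # Merge adjacent blocks while they violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            w = weights[-2] + weights[-1]
            m = (means[-2] * weights[-2] + means[-1] * weights[-1]) / w
            means[-2:] = [m]
            weights[-2:] = [w]
    # Expand pooled block means back to one calibrated value per sample.
    calibrated = np.repeat(means, np.asarray(weights, dtype=int))
    out = np.empty(len(scores))
    out[order] = calibrated
    return out
```

The monotonicity assumption ENIR relaxes is visible here: the pooled output can never decrease as the raw score increases, even when the data suggest otherwise.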
Calibrating Deep Neural Networks using Focal Loss
Miscalibration -- a mismatch between a model's confidence and its correctness
-- of Deep Neural Networks (DNNs) makes their predictions hard to rely on.
Ideally, we want networks to be accurate, calibrated and confident. We show
that, as opposed to the standard cross-entropy loss, focal loss (Lin et al.,
2017) allows us to learn models that are already very well calibrated. When
combined with temperature scaling, whilst preserving accuracy, it yields
state-of-the-art calibrated models. We provide a thorough analysis of the
factors causing miscalibration, and use the insights we glean from this to
justify the empirically excellent performance of focal loss. To facilitate the
use of focal loss in practice, we also provide a principled approach to
automatically select the hyperparameter involved in the loss function. We
perform extensive experiments on a variety of computer vision and NLP datasets,
and with a wide variety of network architectures, and show that our approach
achieves state-of-the-art accuracy and calibration in almost all cases.
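The loss itself is standard and can be sketched in a few lines of numpy; the paper's scheme for automatically selecting the hyperparameter is not reproduced here, and the function name is mine.

```python
import numpy as np

def binary_focal_loss(p, y, gamma=2.0):
    """Focal loss of Lin et al. (2017): FL(p_t) = -(1 - p_t)^gamma * log(p_t),
    where p_t is the predicted probability of the true class."""
    p_t = np.where(y == 1, p, 1.0 - p)
    p_t = np.clip(p_t, 1e-12, 1.0)  # guard against log(0)
    return -((1.0 - p_t) ** gamma) * np.log(p_t)
```

Setting gamma = 0 recovers the usual cross-entropy; larger gamma down-weights already well-classified examples, which is the mechanism the paper connects to better calibration.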
An Operational Perspective to Fairness Interventions: Where and How to Intervene
As AI-based decision systems proliferate, their successful operationalization
requires balancing multiple desiderata: predictive performance, disparity
across groups, safeguarding sensitive group attributes (e.g., race), and
engineering cost. We present a holistic framework for evaluating and
contextualizing fairness interventions with respect to the above desiderata.
The two key points of practical consideration are \emph{where} (pre-, in-,
post-processing) and \emph{how} (in what way the sensitive group data is used)
the intervention is introduced. We demonstrate our framework with a case study
on predictive parity. In it, we first propose a novel method for achieving
predictive parity fairness without using group data at inference time via
distributionally robust optimization. Then, we showcase the effectiveness of
these methods in a benchmarking study of close to 400 variations across two
major model types (XGBoost vs. Neural Net), ten datasets, and over twenty
unique methodologies. Methodological insights derived from our empirical study
inform the practical design of ML workflows with fairness as a central concern.
We find that predictive parity is difficult to achieve without using group
data, and that the distributionally robust methods we develop, despite
requiring group data during model training (but not at inference), provide a
significant Pareto improvement. Moreover, a plain XGBoost model often
Pareto-dominates neural networks with fairness interventions, highlighting the
importance of model inductive bias.
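Predictive parity requires equal positive predictive value (precision) across groups. A minimal gap metric can be sketched as follows; the function name and the assumption that every group receives at least one positive prediction are mine, not the paper's.

```python
import numpy as np

def predictive_parity_gap(y_true, y_pred, group):
    """Spread of positive predictive value (precision) across groups;
    predictive parity holds when this gap is (approximately) zero.
    Assumes each group receives at least one positive prediction."""
    ppvs = []
    for g in np.unique(group):
        positives = (group == g) & (y_pred == 1)
        ppvs.append(y_true[positives].mean())  # PPV = P(y=1 | predicted positive)
    return max(ppvs) - min(ppvs)
```

A metric like this only evaluates the intervention; the paper's point is that driving the gap to zero without group data at inference time is hard, which motivates their distributionally robust training approach.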