Generative models improve fairness of medical classifiers under
  distribution shifts

Albuquerque, Isabela; Azizi, Shekoofeh; Belgrave, Danielle; Cemgil, Taylan; Gowal, Sven; Karthikesalingam, Alan; Kohli, Pushmeet; Ktena, Ira; Rebuffi, Sylvestre-Alvise; Roy, Abhijit Guha; Tanno, Ryutaro; Wiles, Olivia

Authors: Isabela Albuquerque, Shekoofeh Azizi, Danielle Belgrave, Taylan Cemgil, Sven Gowal, Alan Karthikesalingam, Pushmeet Kohli, Ira Ktena, Sylvestre-Alvise Rebuffi, Abhijit Guha Roy, Ryutaro Tanno, Olivia Wiles
Publication date: 18 April 2023

Abstract

A ubiquitous challenge in machine learning is the problem of domain generalisation. This can exacerbate bias against groups or labels that are underrepresented in the datasets used for model development. Model bias can lead to unintended harms, especially in safety-critical applications like healthcare. Furthermore, the challenge is compounded by the difficulty of obtaining labelled data due to high cost or lack of readily available domain expertise. In our work, we show that learning realistic augmentations automatically from data is possible in a label-efficient manner using generative models. In particular, we leverage the higher abundance of unlabelled data to capture the underlying data distribution of different conditions and subgroups for an imaging modality. By conditioning generative models on appropriate labels, we can steer the distribution of synthetic examples according to specific requirements. We demonstrate that these learned augmentations can surpass heuristic ones by making models more robust and statistically fair both in and out of distribution. To evaluate the generality of our approach, we study three distinct medical imaging contexts of varying difficulty: (i) histopathology images from a publicly available generalisation benchmark, (ii) chest X-rays from publicly available clinical datasets, and (iii) dermatology images characterised by complex shifts and imaging conditions. Complementing real training samples with synthetic ones improves the robustness of models in all three medical tasks and increases fairness by improving the accuracy of diagnosis within underrepresented groups. This approach leads to stark improvements out of distribution (OOD) across modalities: a 7.7% prediction accuracy improvement in histopathology, 5.2% in chest radiology with a 44.6% lower fairness gap, and a striking 63.5% improvement in high-risk sensitivity for dermatology with a 7.5× reduction in fairness gap.
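
The abstract reports reductions in "fairness gap" but this record does not define the term. As a rough illustration only, the sketch below assumes the gap is the difference between the best and worst per-subgroup accuracy; that definition, the function names, and the toy data are all assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


def subgroup_accuracy(y_true, y_pred, groups):
    """Return accuracy per subgroup as a dict {group: accuracy}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def fairness_gap(y_true, y_pred, groups):
    """Assumed definition: best subgroup accuracy minus worst subgroup accuracy."""
    acc = subgroup_accuracy(y_true, y_pred, groups)
    return max(acc.values()) - min(acc.values())


# Hypothetical toy data: two subgroups, "a" and "b", with four examples each.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

per_group = subgroup_accuracy(y_true, y_pred, groups)
gap = fairness_gap(y_true, y_pred, groups)
print(per_group, gap)
```

Under this assumed definition, a smaller gap means the classifier performs more evenly across subgroups. A reduction such as the reported 7.5× would correspond to dividing this quantity by 7.5 relative to a baseline.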

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2304.09218

Last time updated on 22/04/2023