Multiple instance learning (MIL) models have been extensively used in
pathology to predict biomarkers and risk-stratify patients from gigapixel-sized
images. Machine learning problems in medical imaging often deal with rare
diseases, making it important for these models to work in a label-imbalanced
setting. In pathology images, there is an additional level of imbalance: in a
positively labeled Whole Slide Image (WSI), only a fraction of the pixels
contribute to the positive label. This compounds the severity of imbalance
and makes imbalanced classification in pathology challenging. Furthermore,
these imbalances can also occur in out-of-distribution (OOD) datasets when the
models are deployed in the real world. We leverage the idea that decoupling
feature and classifier learning can lead to improved decision boundaries for
label-imbalanced datasets. To this end, we investigate the integration of
supervised contrastive learning with multiple instance learning (SC-MIL).
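To make the combination concrete, here is a minimal NumPy sketch of a supervised contrastive loss applied to bag-level (slide-level) embeddings. It assumes attention-based pooling (in the style of Ilse et al., 2018) to turn a bag of instance features into one embedding, and the standard supervised contrastive (SupCon) loss of Khosla et al. (2020). The abstract does not specify the aggregator, projection head, or temperature, so `attention_pool`, `supcon_loss`, the parameters `v` and `w`, and `tau=0.1` are illustrative assumptions, not the SC-MIL implementation.

```python
import numpy as np


def attention_pool(instances: np.ndarray, v: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Aggregate an (n_instances, d) bag into a single d-dim bag embedding.

    Assumed (non-gated) attention pooling: score_i = w . tanh(V h_i),
    a = softmax(scores), bag = sum_i a_i h_i.
    """
    scores = np.tanh(instances @ v.T) @ w             # (n_instances,)
    scores = scores - scores.max()                    # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()      # softmax over instances
    return attn @ instances                           # convex combination, (d,)


def supcon_loss(z: np.ndarray, labels: np.ndarray, tau: float = 0.1) -> float:
    """Supervised contrastive loss over a batch of bag embeddings z (B, d)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # project onto unit sphere
    sim = z @ z.T / tau
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    # Log-softmax of each anchor over all *other* bags in the batch.
    sim_masked = np.where(not_self, sim, -np.inf)
    row_max = sim_masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(sim_masked - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom
    # Positives: other bags sharing the anchor's slide-level label.
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    valid = n_pos > 0                                 # anchors with >= 1 positive
    if not valid.any():
        raise ValueError("need at least two bags from the same class")
    mean_log_prob_pos = np.where(positives, log_prob, 0.0).sum(axis=1)[valid] / n_pos[valid]
    return float(-mean_log_prob_pos.mean())


# Example: six bags of random instance features in an imbalanced batch (4 vs 2).
rng = np.random.default_rng(0)
d = 8
v = rng.normal(size=(4, d))   # hypothetical attention parameters
w = rng.normal(size=4)
bags = [rng.normal(size=(rng.integers(5, 20), d)) for _ in range(6)]
labels = np.array([0, 0, 0, 0, 1, 1])
z = np.stack([attention_pool(b, v, w) for b in bags])
loss = supcon_loss(z, labels)
```

Because the loss pulls together bags with the same label regardless of how many there are, it shapes the embedding space without a classifier head, which is the "feature learning" half of the decoupling idea.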
Specifically, we propose a joint-training MIL framework for label-imbalanced
data that progressively shifts from learning bag-level representations to
learning an optimal classifier. We perform experiments with
different imbalance settings for two well-studied problems in cancer pathology:
subtyping of non-small cell lung cancer and subtyping of renal cell carcinoma.
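The abstract does not state how the imbalance settings were constructed. One common way to simulate them, sketched here as an assumption, is to keep every slide of the majority class and randomly subsample the minority class at the slide level until a target minority-to-majority ratio is reached. The function name `make_imbalanced_split` and its parameters are hypothetical.

```python
import numpy as np


def make_imbalanced_split(labels, minority_class, ratio, seed=0):
    """Return sorted slide indices with n_minority / n_majority ~= ratio.

    All majority-class slides are kept; minority-class slides are sampled
    without replacement (at least one, at most all available).
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(labels == minority_class)
    majority = np.flatnonzero(labels != minority_class)
    n_keep = max(1, int(round(ratio * len(majority))))
    n_keep = min(n_keep, len(minority))
    kept = rng.choice(minority, size=n_keep, replace=False)
    return np.sort(np.concatenate([majority, kept]))


# Example: a balanced 100/100 slide set reduced to a 1:10 imbalance.
labels_demo = [0] * 100 + [1] * 100
idx = make_imbalanced_split(labels_demo, minority_class=1, ratio=0.1)
```

Fixing `seed` makes each imbalance setting reproducible, so different methods can be compared on the same subsampled slides.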
SC-MIL provides large and consistent improvements over other techniques on both
in-distribution (ID) and OOD held-out sets across multiple imbalanced settings.
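The progressive transition in the joint-training framework can be pictured as a weighted sum of a representation loss and a classification loss whose weight shifts over training. The sketch below assumes a scalar weight `alpha` that ramps from 0 to 1 along a parabolic schedule; the schedule shape and the names `transition_weight`, `cross_entropy`, and `joint_loss` are hypothetical illustrations, and the actual SC-MIL schedule may differ.

```python
import numpy as np


def transition_weight(epoch: int, total_epochs: int) -> float:
    """Hypothetical schedule: weight on the classifier loss grows from 0 to 1.

    A parabolic ramp keeps the emphasis on representation learning early
    and hands over to classifier learning toward the end of training.
    """
    progress = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return progress ** 2


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean softmax cross-entropy for (B, C) logits and integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def joint_loss(con_loss: float, ce_loss: float, epoch: int, total_epochs: int) -> float:
    """Blend a contrastive (representation) loss with a classification loss."""
    alpha = transition_weight(epoch, total_epochs)
    return (1.0 - alpha) * con_loss + alpha * ce_loss


schedule = [transition_weight(e, 5) for e in range(5)]  # [0.0, 0.0625, 0.25, 0.5625, 1.0]
```

With this blend, early epochs are driven almost entirely by the contrastive term, and the final epochs almost entirely by the classifier term, so a single training run approximates the decoupled two-stage recipe.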