Chest X-ray (CXR) anatomical abnormality detection aims to localize and
characterize cardiopulmonary radiological findings in radiographs, which
can expedite clinical workflow and reduce observational oversights. Most
existing methods address this task either in fully supervised settings, which
demand costly large-scale per-abnormality annotations, or in weakly supervised
settings, which still lag far behind fully supervised methods in performance. In
this work, we propose a co-evolutionary image and report distillation (CEIRD)
framework, which approaches semi-supervised abnormality detection in CXR by
grounding the visual detection results with text-classified abnormalities from
paired radiology reports, and vice versa. Concretely, based on the classical
teacher-student pseudo label distillation (TSD) paradigm, we additionally
introduce an auxiliary report classification model, whose prediction is used
for report-guided pseudo detection label refinement (RPDLR) in the primary
vision detection task. Conversely, we also use the prediction of the vision
detection model for abnormality-guided pseudo classification label refinement
(APCLR) in the auxiliary report classification task, and propose a co-evolution
strategy where the vision and report models mutually promote each other with
RPDLR and APCLR performed alternately. In this way, we effectively
incorporate the weak supervision from reports into the semi-supervised TSD
pipeline. Besides the cross-modal pseudo label refinement, we further propose
an intra-image-modal self-adaptive non-maximum suppression, where the pseudo
detection labels generated by the teacher vision model are dynamically
rectified by high-confidence predictions from the student. Experimental results
on the public MIMIC-CXR benchmark demonstrate CEIRD's superior performance to
several recent weakly supervised and semi-supervised methods.
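
To make the cross-modal refinement idea concrete, the following is a minimal sketch of one plausible reading of RPDLR and APCLR. It is not the rule used in CEIRD, which the abstract does not specify. The names `PseudoBox`, `refine_boxes_with_report`, and `refine_report_labels_with_boxes`, as well as all thresholds, are hypothetical. The sketch assumes that a teacher pseudo box is kept only when the report classifier predicts its class as present, and that a report pseudo label is switched to positive when the detector finds that class with high confidence.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PseudoBox:
    """Hypothetical container for one teacher-generated pseudo detection label."""
    label: str                               # abnormality class name
    score: float                             # teacher detection confidence
    box: Tuple[float, float, float, float]   # (x1, y1, x2, y2)


def refine_boxes_with_report(
    boxes: List[PseudoBox],
    report_probs: Dict[str, float],
    det_thresh: float = 0.5,     # assumed value, not from the paper
    report_thresh: float = 0.5,  # assumed value, not from the paper
) -> List[PseudoBox]:
    """RPDLR-style filter (assumed rule): keep a box only if it is confident
    and the report classifier predicts its class as present."""
    return [
        b for b in boxes
        if b.score >= det_thresh
        and report_probs.get(b.label, 0.0) >= report_thresh
    ]


def refine_report_labels_with_boxes(
    report_probs: Dict[str, float],
    boxes: List[PseudoBox],
    det_thresh: float = 0.9,     # assumed; strict to limit false positives
    report_thresh: float = 0.5,  # assumed value, not from the paper
) -> Dict[str, int]:
    """APCLR-style update (assumed rule): a class is positive if the report
    classifier says so, or if the detector finds it with high confidence."""
    labels = {c: int(p >= report_thresh) for c, p in report_probs.items()}
    for b in boxes:
        if b.score >= det_thresh:
            labels[b.label] = 1
    return labels


if __name__ == "__main__":
    boxes = [
        PseudoBox("Pneumothorax", 0.80, (10.0, 20.0, 110.0, 140.0)),
        PseudoBox("Cardiomegaly", 0.70, (50.0, 90.0, 200.0, 220.0)),
        PseudoBox("Atelectasis", 0.95, (30.0, 40.0, 90.0, 120.0)),
    ]
    report_probs = {"Pneumothorax": 0.9, "Cardiomegaly": 0.1, "Atelectasis": 0.3}

    kept = refine_boxes_with_report(boxes, report_probs)
    print([b.label for b in kept])
    print(refine_report_labels_with_boxes(report_probs, boxes))
```

In a co-evolution loop, the two functions would be applied alternately, each model being trained on pseudo labels refined by the other. How the actual framework schedules and weights these steps is not stated in the abstract.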