Predicting optical coherence tomography-derived diabetic macular edema grades from fundus photographs using deep learning
Diabetic eye disease is one of the fastest growing causes of preventable
blindness. With the advent of anti-VEGF (vascular endothelial growth factor)
therapies, it has become increasingly important to detect center-involved
diabetic macular edema (ci-DME). However, center-involved diabetic macular
edema is diagnosed using optical coherence tomography (OCT), which is not
generally available at screening sites because of cost and workflow
constraints. Instead, screening programs rely on the detection of hard exudates
in color fundus photographs as a proxy for DME, often resulting in high false
positive or false negative calls. To improve the accuracy of DME screening, we
trained a deep learning model to use color fundus photographs to predict
ci-DME. Our model had an ROC-AUC of 0.89 (95% CI: 0.87-0.91), which corresponds
to a sensitivity of 85% at a specificity of 80%. In comparison, three retinal
specialists had similar sensitivities (82-85%), but only half the specificity
(45-50%, p<0.001 for each comparison with model). The positive predictive value
(PPV) of the model was 61% (95% CI: 56-66%), approximately double the 36-38% by
the retinal specialists. In addition to predicting ci-DME, our model was able
to detect the presence of intraretinal fluid with an AUC of 0.81 (95% CI:
0.81-0.86) and subretinal fluid with an AUC of 0.88 (95% CI: 0.85-0.91). The
ability of deep learning algorithms to make clinically relevant predictions
that generally require sophisticated 3D-imaging equipment from simple 2D images
has broad relevance to many other applications in medical imaging.
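The relationship between sensitivity, specificity, and PPV reported above can be illustrated with Bayes' rule. The following is a minimal sketch, not the authors' evaluation code: the function name `positive_predictive_value` and the 25% prevalence are assumptions made for illustration, since the abstract does not state the ci-DME prevalence of the evaluation set.

```python
def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              prevalence: float) -> float:
    """PPV via Bayes' rule: probability of disease given a positive call."""
    true_positive_mass = sensitivity * prevalence
    false_positive_mass = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_mass / (true_positive_mass + false_positive_mass)


# Hypothetical prevalence; the abstract does not report this figure.
assumed_prevalence = 0.25

# Operating points taken from the abstract: similar sensitivity,
# but roughly half the specificity for the specialists.
model_ppv = positive_predictive_value(0.85, 0.80, assumed_prevalence)
specialist_like_ppv = positive_predictive_value(0.85, 0.50, assumed_prevalence)

print(f"model PPV: {model_ppv:.2f}")
print(f"specialist-like PPV: {specialist_like_ppv:.2f}")
```

At a fixed sensitivity and prevalence, a lower specificity lets more false positives through, which is why the PPV drops even though sensitivity is unchanged.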
The unreasonable effectiveness of AI CADe polyp detectors to generalize to new countries
Background: Artificial Intelligence (AI) Computer-Aided
Detection (CADe) is commonly used for polyp detection, but data seen in
clinical settings can differ from model training. Few studies evaluate how well
CADe detectors perform on colonoscopies from countries not seen during
training, and none are able to evaluate performance without collecting
expensive and time-intensive labels.
Methods: We trained a CADe polyp detector on Israeli colonoscopy
videos (5004 videos, 1106 hours) and evaluated on Japanese videos (354 videos,
128 hours) by measuring the True Positive Rate (TPR) versus false alarms per
minute (FAPM). We introduce a colonoscopy dissimilarity measure called "MAsked
mediCal Embedding Distance" (MACE) to quantify differences between
colonoscopies, without labels. We evaluated CADe on all Japan videos and on
those with the highest MACE.
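One way to read "TPR versus false alarms per minute" is as an operating curve: pick the most permissive score threshold whose false-alarm rate stays within a budget, then count the polyps detected above it. The sketch below assumes a simplified setup that is not described in the abstract: each polyp is summarized by its highest detector score, each spurious detection by its score, and the function name `tpr_at_fapm` and all inputs are hypothetical.

```python
from typing import Sequence


def tpr_at_fapm(polyp_scores: Sequence[float],
                false_alarm_scores: Sequence[float],
                total_minutes: float,
                target_fapm: float) -> float:
    """TPR at the lowest threshold that keeps false alarms within the budget."""
    # Number of false alarms the budget allows over the whole recording time.
    max_false_alarms = int(target_fapm * total_minutes)
    ranked = sorted(false_alarm_scores, reverse=True)
    if max_false_alarms >= len(ranked):
        threshold = float("-inf")  # every false alarm is affordable
    else:
        # Score of the first false alarm we cannot afford; detections must
        # score strictly above it to count.
        threshold = ranked[max_false_alarms]
    detected = sum(1 for score in polyp_scores if score > threshold)
    return detected / len(polyp_scores)


# Toy data (hypothetical): 4 polyps, 4 spurious detections, 4 minutes of video.
polyps = [0.90, 0.60, 0.40, 0.95]
false_alarms = [0.70, 0.50, 0.30, 0.20]

for budget in (0.0, 0.25, 0.5):
    print(budget, tpr_at_fapm(polyps, false_alarms, 4.0, budget))
```

Sweeping `target_fapm` traces out the curve; a looser false-alarm budget can only keep or raise the TPR.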
Results: MACE correctly quantifies that narrow-band imaging (NBI)
and chromoendoscopy (CE) frames are less similar to Israel data than Japan
whitelight (bootstrapped z-test, |z| > 690, p < for both). Despite
differences in the data, CADe performance on Japan colonoscopies was
non-inferior to Israel ones without additional training (TPR at 0.5 FAPM: 0.957
and 0.972 for Israel and Japan; TPR at 1.0 FAPM: 0.972 and 0.989 for Israel and
Japan; superiority test t > 45.2, p < ). Despite not being trained on
NBI or CE, TPR on those subsets was non-inferior to Japan overall
(non-inferiority test t > 47.3, p < , = 1.5% for both).
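A non-inferiority test asks whether the new-population TPR is no worse than the reference TPR by more than a chosen margin. The sketch below is a generic Welch-style t statistic and is not the authors' procedure: it assumes the two inputs are lists of resampled TPR estimates, treats the 1.5% figure in the text as the non-inferiority margin, and uses a hypothetical function name `non_inferiority_t` with toy data.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def non_inferiority_t(reference: Sequence[float],
                      new: Sequence[float],
                      margin: float) -> float:
    """t statistic for H0: mean(new) <= mean(reference) - margin."""
    difference = mean(new) - mean(reference)
    # Unpooled (Welch) standard error of the difference in means.
    standard_error = math.sqrt(
        stdev(new) ** 2 / len(new) + stdev(reference) ** 2 / len(reference)
    )
    # Shifting by the margin tests "not worse by more than margin".
    return (difference + margin) / standard_error


# Toy TPR estimates (hypothetical), e.g. from resampled video sets.
reference_tpr = [0.95, 0.96, 0.97, 0.96]
new_tpr = [0.97, 0.98, 0.99, 0.98]

t_statistic = non_inferiority_t(reference_tpr, new_tpr, margin=0.015)
print(f"t = {t_statistic:.2f}")
```

A large positive statistic rejects the hypothesis that the new population is worse by more than the margin. Setting `margin=0.0` turns the same statistic into a one-sided superiority test.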
Conclusions: Differences that prevent CADe detectors from
performing well in non-medical settings do not degrade the performance of our
AI CADe polyp detector when applied to data from a new country. MACE can help
medical AI models internationalize by identifying the most "dissimilar" data on
which to evaluate models.