Machine Learning Creates a Simple Endoscopic Classification System that Improves Dysplasia Detection in Barrett's Oesophagus amongst Non-expert Endoscopists
INTRODUCTION: Barrett’s oesophagus (BE) is a precursor to oesophageal adenocarcinoma (OAC). Endoscopic surveillance is
performed to detect dysplasia arising in BE as it is likely to be amenable to curative treatment. At present, there are no
guidelines on who should perform surveillance endoscopy in BE. Machine learning (ML) is a branch of artificial intelligence
(AI) that can generate simple rules, for example in the form of decision trees (DTs). We hypothesised that a DT generated from recognised expert
endoscopists could be used to improve dysplasia detection in non-expert endoscopists. To our knowledge, ML has never been
applied in this manner. METHODS: Video recordings were collected from patients with non-dysplastic (ND-BE) and dysplastic
Barrett’s oesophagus (D-BE) undergoing high-definition endoscopy with i-Scan enhancement (PENTAX®). A strict protocol
was used to record areas of interest after which a corresponding biopsy was taken to confirm the histological diagnosis. In a
blinded manner, videos were shown to 3 experts who were asked to interpret them based on their mucosal and
microvasculature patterns and presence of nodularity and ulceration as well as overall suspected diagnosis. Data generated were
entered into the WEKA package to construct a DT for dysplasia prediction. Non-expert endoscopists (gastroenterology
specialist registrars in training with variable experience and undergraduate medical students with no experience) were asked to
score these same videos both before and after web-based training using the DT constructed from the expert opinion. Accuracy,
sensitivity, and specificity values were calculated before and after training, with p < 0.05 considered statistically significant. RESULTS:
Videos from 40 patients were collected, including 12 recorded both before and after acetic acid (ACA) application. Experts’ average
accuracy for dysplasia prediction was 88%. When experts’ answers were entered into a DT, the resultant decision model had a
92% accuracy with a mean sensitivity and specificity of 97% and 88%, respectively. Addition of ACA did not improve dysplasia
detection. Untrained medical students tended to have a high sensitivity but poor specificity as they “overcalled” normal areas.
Gastroenterology trainees did the opposite with overall low sensitivity but high specificity. Detection improved significantly and
accuracy rose in both groups after formal web-based training, although it did not reach the accuracy generated by experts. For
trainees, sensitivity rose significantly from 71% to 83% with minimal loss of specificity. Specificity rose sharply in students from
31% to 49% with no loss of sensitivity. CONCLUSION: ML is able to define rules learnt from expert opinion. These generate a
simple algorithm to accurately predict dysplasia. Once taught to non-experts, the algorithm significantly improves their rate of
dysplasia detection. This opens the door to standardised training and assessment of competence for those who perform
endoscopy in BE. It may shorten the learning curve and might also be used to compare competence of trainees with recognised
experts as part of their accreditation process.
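
The accuracy, sensitivity, and specificity values reported above follow the standard definitions for a binary test, here taking biopsy histology as the reference for dysplasia. A minimal sketch of those definitions is given below; the function name `diagnostic_metrics` and the counts in the example are hypothetical illustrations, not the study's code or data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp: dysplastic areas called dysplastic
    fp: non-dysplastic areas called dysplastic ("overcalling")
    tn: non-dysplastic areas called non-dysplastic
    fn: dysplastic areas called non-dysplastic (missed dysplasia)
    """
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one dysplastic and one non-dysplastic case")
    accuracy = (tp + tn) / total      # proportion of all calls that were correct
    sensitivity = tp / (tp + fn)      # proportion of dysplastic areas detected
    specificity = tn / (tn + fp)      # proportion of non-dysplastic areas correctly cleared
    return accuracy, sensitivity, specificity


# Illustrative counts only (assumed for the example, not taken from the study).
accuracy, sensitivity, specificity = diagnostic_metrics(tp=8, fp=2, tn=8, fn=2)
print(accuracy, sensitivity, specificity)
```

Under these definitions, a rater who "overcalls" normal areas accumulates false positives, which lowers specificity while leaving sensitivity high, the pattern described for untrained students.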