Combining Survival Analysis and Machine Learning for Mass Cancer Risk Prediction using EHR data
Purely medical cancer screening methods are often costly, time-consuming, and
hard to apply on a large scale. Advanced Artificial Intelligence (AI)
methods greatly help cancer detection but require specific or deep medical
data. These aspects hinder the mass implementation of cancer screening methods.
For these reasons, applying AI methods to the mass personalized assessment of
cancer risk among patients, based on the existing volume of Electronic Health
Records (EHR), would be a disruptive change for healthcare.
This paper presents a novel method for mass cancer risk prediction using EHR
data. Unlike other methods, ours follows a minimally data-greedy
policy, requiring only a history of medical service codes and diagnoses from
the EHR. We formulate the problem as binary classification. The dataset contains
175 441 de-identified patients (2 861 diagnosed with cancer). As a baseline, we
implement a solution based on a recurrent neural network (RNN). We propose a
method that combines machine learning and survival analysis since these
approaches are less computationally heavy, can be combined into an ensemble
(the Survival Ensemble), and can be reproduced in most medical institutions.
We evaluate the Survival Ensemble in several experiments. Firstly, we obtain a
significant difference in the primary metric (Average Precision):
22.8% (ROC AUC 83.7%, F1 17.8%) for the Survival Ensemble versus 15.1%
(ROC AUC 84.9%, F1 21.4%) for the Baseline. Secondly, the performance of the
Survival Ensemble is also confirmed during the ablation study. Thirdly, our
method exceeds age baselines by a significant margin. Fourthly, in the blind
retrospective out-of-time experiment, the proposed method is reliable in cancer
patient detection (9 out of 100 selected). Such results exceed the estimates of
medical screenings, e.g., the best Number Needed to Screen (9 out of 1000
screenings).
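
To put the quoted figures in context, the arithmetic behind them can be sketched as follows. This is only an illustrative calculation and not part of the described method. It assumes that the evaluation split has roughly the same cancer prevalence as the full dataset, in which case a random ranking would have an Average Precision close to that prevalence. The helper `rate` and all variable names are hypothetical.

```python
# Figures quoted in the abstract; everything else is illustrative.
N_PATIENTS = 175_441      # de-identified patients in the dataset
N_CANCER = 2_861          # patients diagnosed with cancer

AP_ENSEMBLE = 0.228       # Average Precision, Survival Ensemble
AP_BASELINE = 0.151       # Average Precision, RNN baseline


def rate(hits: int, total: int) -> float:
    """Fraction of positives in a group of `total` people."""
    if total <= 0:
        raise ValueError("total must be positive")
    return hits / total


# Assumption: the evaluation split has roughly the same prevalence as the
# full dataset, so a random ranking has Average Precision near this value.
prevalence = rate(N_CANCER, N_PATIENTS)

# How many times better than a random ranking each model's AP is.
lift_ensemble = AP_ENSEMBLE / prevalence
lift_baseline = AP_BASELINE / prevalence

# Out-of-time experiment (9 of 100 selected) versus the quoted
# screening estimate (9 of 1000 screenings).
yield_method = rate(9, 100)
yield_screening = rate(9, 1000)
yield_ratio = yield_method / yield_screening

print(f"prevalence: {prevalence:.2%}")
print(f"AP lift over random: ensemble {lift_ensemble:.1f}x, "
      f"baseline {lift_baseline:.1f}x")
print(f"detection yield ratio: {yield_ratio:.1f}x")
```

Under that assumption, both models rank patients far better than chance, and the out-of-time selection yields about ten times as many cancer patients per person examined as the quoted screening estimate.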