Objective: We develop a deep learning framework based on the pre-trained
Bidirectional Encoder Representations from Transformers (BERT) model using
unstructured clinical notes from electronic health records (EHRs) to predict
the risk of disease progression from Mild Cognitive Impairment (MCI) to
Alzheimer's Disease (AD). Materials and Methods: We identified 3657 patients
diagnosed with MCI, together with their progress notes, from the Northwestern
Medicine Enterprise Data Warehouse (NMEDW) between 2000 and 2020. Only progress
notes dated no later than the first MCI diagnosis were used for prediction. We
first preprocessed the notes by deidentification, cleaning, and splitting, and
then pretrained a BERT model for AD (AD-BERT) on the preprocessed notes, starting
from the publicly available Bio+Clinical BERT. The embeddings of all the sections
of a patient's notes processed by AD-BERT were combined by MaxPooling to
compute the probability of MCI-to-AD progression. For replication, we conducted
a similar set of experiments on 2563 MCI patients identified at Weill Cornell
Medicine (WCM) during the same timeframe. Results: Compared with 7 baseline
models, AD-BERT achieved the best performance on both datasets, with an area
under the receiver operating characteristic curve (AUC) of 0.8170 and an F1
score of 0.4178 on the NMEDW dataset, and an AUC of 0.8830 and an F1 score of
0.6836 on the WCM
dataset. Conclusion: We developed a deep learning framework using BERT models
that provides an effective solution for predicting MCI-to-AD progression
through analysis of clinical notes.
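
The pooling step described in Materials and Methods can be illustrated with a minimal sketch. It assumes each note section has already been mapped to a fixed-length embedding vector, and it takes the element-wise maximum across a patient's sections. The abstract does not specify how the pooled vector is turned into a probability, so the linear layer with a sigmoid used here, the function names, and the toy numbers are all hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def max_pool(section_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise maximum over all section embeddings of one patient."""
    if not section_embeddings:
        raise ValueError("need at least one section embedding")
    dim = len(section_embeddings[0])
    if any(len(e) != dim for e in section_embeddings):
        raise ValueError("all embeddings must have the same dimension")
    return [max(e[i] for e in section_embeddings) for i in range(dim)]


def progression_probability(
    pooled: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Hypothetical classification head: linear layer followed by a sigmoid.

    This head is an assumption for illustration; the abstract only states
    that pooled embeddings are used to compute the probability.
    """
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy 3-dimensional embeddings for two note sections (made-up values;
# real BERT embeddings are much larger).
sections = [
    [0.1, -0.3, 0.5],
    [0.4, 0.2, -0.1],
]
pooled = max_pool(sections)

# Made-up weights and bias standing in for trained parameters.
prob = progression_probability(pooled, weights=[1.0, -1.0, 0.5], bias=-0.2)
print(pooled, prob)
```

Max pooling keeps, for each embedding dimension, the strongest signal found in any section, so the pooled vector has the same size regardless of how many sections a patient's notes contain.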