Neural Language Model for Automated Classification of Electronic Medical Records at the Emergency Room. The Significant Benefit of Unsupervised Generative Pre-training
In order to build a national injury surveillance system based on emergency room (ER) visits we are developing a coding system to classify their causes from clinical notes in free-text. Supervised learning techniques have shown good results in this area but require large number of annotated dataset. New levels of performance have been recently achieved in neural language models (NLM) with models based on the Transformer architecture incorporating an unsupervised generative pre-training step. Our hypothesis is that methods involving a generative self-supervised pre-training step can significantly reduce the required number of annotated samples for supervised fine-tuning. In this case study, we assessed whether we could predict from free-text clinical notes whether a visit was the consequence of a traumatic or non-traumatic event. Using fully re-trained GPT-2 models (without OpenAI pre-trained weightings), we compared two scenarios: Scenario A (26 study cases of different training data sizes) consisted in training the GPT-2 on the trauma/non-trauma labeled (up to 161 930) clinical notes. In Scenario B (19 study cases), a first step of self-supervised pre-training phase with unlabeled (up to 151 930) notes and the second step of supervised fine-tuning with labeled (up to 10 000) notes. Results showed that, Scenario A needed to process >6 000 notes to achieve good performance (AUC>0.95), Scenario B needed only 600 notes, gain of a factor 10. At the end case of both scenarios, for 16 times more data (161 930 vs. 10 000), the gain from Scenario A compared to Scenario B is only an improvement of 0.89% in AUC and 2.12% in F1 score. To conclude, it is possible to adapt a multi-purpose NLM model such as the GPT-2 to create a powerful tool for classification of free-text notes with only very small number of labeled samples