Multimodal data matters: language model pre-training over structured and unstructured electronic health records

Abstract

The massive amount of electronic health records (EHR) has created enormous potential for improving healthcare. Clinical codes (structured data) and clinical narratives (unstructured data) are two important textual modalities in EHR. Clinical codes convey diagnostic and treatment information recorded during hospital stays, while clinical notes carry providers' narratives of patient encounters. The two modalities do not exist in isolation and complement each other in most real-life clinical scenarios. However, most existing EHR-oriented studies either focus on a single modality or integrate data from different modalities in a straightforward manner, ignoring the intrinsic interactions between them. To address these issues, we propose a Medical Multimodal Pre-trained Language Model, named MedM-PLM, to learn enhanced EHR representations over structured and unstructured data. In MedM-PLM, two Transformer-based neural network components are first adopted to learn representative characteristics from each modality. A cross-modal module is then introduced to model their interactions. We pre-trained MedM-PLM on the MIMIC-III dataset and verified its effectiveness on three downstream clinical tasks: medication recommendation, 30-day readmission prediction, and ICD coding. Extensive experiments demonstrate the advantages of MedM-PLM over state-of-the-art methods. Further analyses and visualizations show the robustness of our model, which could potentially provide more comprehensive interpretations for clinical decision-making.

Comment: 30 pages, 5 figures
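The overall design described in the abstract, one Transformer encoder per modality followed by a cross-modal interaction module, can be illustrated with a minimal PyTorch sketch. The sketch below is an assumption-laden illustration only: the class name MedMPLMSketch, vocabulary sizes, hidden dimension, head and layer counts, and the symmetric cross-attention fusion are placeholders chosen for clarity, not the published MedM-PLM configuration or pre-training objectives.

    import torch
    import torch.nn as nn


    class MedMPLMSketch(nn.Module):
        """Illustrative two-encoder model with a cross-modal interaction module.

        All sizes are hypothetical; this is not the authors' released architecture.
        """

        def __init__(self, code_vocab=5000, text_vocab=30522, d_model=256,
                     n_heads=4, n_layers=2):
            super().__init__()
            # Modality-specific embeddings: structured clinical codes vs. note tokens.
            self.code_emb = nn.Embedding(code_vocab, d_model)
            self.text_emb = nn.Embedding(text_vocab, d_model)
            # One Transformer encoder per modality.
            self.code_encoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_layers)
            self.text_encoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_layers)
            # Cross-modal module: each modality attends to the other (assumed design).
            self.code_to_text = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.text_to_code = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

        def forward(self, code_ids, text_ids):
            codes = self.code_encoder(self.code_emb(code_ids))
            notes = self.text_encoder(self.text_emb(text_ids))
            # Code representations query the notes, and vice versa.
            codes_fused, _ = self.code_to_text(codes, notes, notes)
            notes_fused, _ = self.text_to_code(notes, codes, codes)
            return codes_fused, notes_fused


    # Usage with random token ids standing in for one batch of patient encounters.
    model = MedMPLMSketch()
    code_ids = torch.randint(0, 5000, (2, 16))    # clinical-code sequences
    text_ids = torch.randint(0, 30522, (2, 64))   # clinical-note token sequences
    codes_fused, notes_fused = model(code_ids, text_ids)
    print(codes_fused.shape, notes_fused.shape)   # (2, 16, 256) and (2, 64, 256)

The fused, per-modality representations produced by such a module could then feed task-specific heads for medication recommendation, 30-day readmission prediction, or ICD coding, as in the downstream evaluations the abstract mentions.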
