The massive amount of electronic health record (EHR) data has created enormous
potential for improving healthcare. Clinical codes (structured data) and
clinical narratives (unstructured data) are two important textual modalities in
EHRs. Clinical codes convey diagnostic and treatment information recorded during
hospitalization, and clinical notes carry the narratives written by clinical providers about patient
encounters. They do not exist in isolation and can complement each other in
most real-life clinical scenarios. However, most existing EHR-oriented studies
either focus on a particular modality or integrate data from different
modalities in a straightforward manner, which ignores the intrinsic
interactions between them. To address these issues, we propose a Medical
Multimodal Pre-trained Language Model, named MedM-PLM, to learn enhanced EHR
representations over structured and unstructured data. In MedM-PLM, two
Transformer-based neural network components are first adopted to learn
representative characteristics from each modality. A cross-modal module is then
introduced to model their interactions. We pre-trained MedM-PLM on the
MIMIC-III dataset and verified the effectiveness of the model on three
downstream clinical tasks, i.e., medication recommendation, 30-day readmission
prediction and ICD coding. Extensive experiments demonstrate the effectiveness of
MedM-PLM compared with state-of-the-art methods. Further analyses and
visualizations show the robustness of our model, which could potentially
provide more comprehensive interpretations for clinical decision-making.
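
To make the described design more concrete, below is a minimal, hypothetical PyTorch sketch of a dual-encoder architecture with a cross-modal attention module, in the spirit of the abstract above. It is not the authors' implementation: all module names, dimensions, the pooling strategy, and the task head are illustrative assumptions.

    # Illustrative sketch only (assumptions, not the MedM-PLM implementation):
    # one Transformer encoder per modality (clinical codes, clinical notes)
    # plus a cross-modal attention module that models their interactions.
    import torch
    import torch.nn as nn


    class ModalityEncoder(nn.Module):
        """Transformer encoder over one EHR modality (code or note tokens)."""

        def __init__(self, vocab_size, d_model=256, n_heads=4, n_layers=2, max_len=512):
            super().__init__()
            self.tok = nn.Embedding(vocab_size, d_model)
            self.pos = nn.Embedding(max_len, d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)

        def forward(self, ids):  # ids: (batch, seq_len)
            pos = torch.arange(ids.size(1), device=ids.device)
            x = self.tok(ids) + self.pos(pos)
            return self.encoder(x)  # (batch, seq_len, d_model)


    class CrossModalModule(nn.Module):
        """Cross-attention in both directions to model code-note interactions."""

        def __init__(self, d_model=256, n_heads=4):
            super().__init__()
            self.code_to_note = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.note_to_code = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

        def forward(self, code_h, note_h):
            code_fused, _ = self.code_to_note(code_h, note_h, note_h)
            note_fused, _ = self.note_to_code(note_h, code_h, code_h)
            return code_fused, note_fused


    class MultimodalEHRModel(nn.Module):
        """Dual encoders + cross-modal fusion; a task head (e.g., multi-label
        medication recommendation or ICD coding) sits on the pooled output."""

        def __init__(self, code_vocab, text_vocab, n_labels, d_model=256):
            super().__init__()
            self.code_enc = ModalityEncoder(code_vocab, d_model)
            self.note_enc = ModalityEncoder(text_vocab, d_model)
            self.cross = CrossModalModule(d_model)
            self.head = nn.Linear(2 * d_model, n_labels)

        def forward(self, code_ids, note_ids):
            code_h = self.code_enc(code_ids)
            note_h = self.note_enc(note_ids)
            code_f, note_f = self.cross(code_h, note_h)
            pooled = torch.cat([code_f.mean(dim=1), note_f.mean(dim=1)], dim=-1)
            return self.head(pooled)  # logits, e.g., for a sigmoid/BCE objective


    # Usage sketch with random ids standing in for codes and note tokens.
    model = MultimodalEHRModel(code_vocab=5000, text_vocab=30000, n_labels=100)
    codes = torch.randint(0, 5000, (2, 64))
    notes = torch.randint(0, 30000, (2, 256))
    logits = model(codes, notes)  # shape: (2, 100)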