7,608 research outputs found
Multimodal Machine Learning for Automated ICD Coding
This study presents a multimodal machine learning model to predict ICD-10
diagnostic codes. We developed separate machine learning models that can handle
data from different modalities, including unstructured text, semi-structured
text and structured tabular data. We further employed an ensemble method to
integrate all modality-specific models to generate ICD-10 codes. Key evidence
was also extracted to make our prediction more convincing and explainable. We
used the Medical Information Mart for Intensive Care III (MIMIC -III) dataset
to validate our approach. For ICD code prediction, our best-performing model
(micro-F1 = 0.7633, micro-AUC = 0.9541) significantly outperforms other
baseline models including TF-IDF (micro-F1 = 0.6721, micro-AUC = 0.7879) and
Text-CNN model (micro-F1 = 0.6569, micro-AUC = 0.9235). For interpretability,
our approach achieves a Jaccard Similarity Coefficient (JSC) of 0.1806 on text
data and 0.3105 on tabular data, where well-trained physicians achieve 0.2780
and 0.5002 respectively.Comment: Machine Learning for Healthcare 201
Predicting Clinical Events by Combining Static and Dynamic Information Using Recurrent Neural Networks
In clinical data sets we often find static information (e.g. patient gender,
blood type, etc.) combined with sequences of data that are recorded during
multiple hospital visits (e.g. medications prescribed, tests performed, etc.).
Recurrent Neural Networks (RNNs) have proven to be very successful for
modelling sequences of data in many areas of Machine Learning. In this work we
present an approach based on RNNs, specifically designed for the clinical
domain, that combines static and dynamic information in order to predict future
events. We work with a database collected in the Charit\'{e} Hospital in Berlin
that contains complete information concerning patients that underwent a kidney
transplantation. After the transplantation three main endpoints can occur:
rejection of the kidney, loss of the kidney and death of the patient. Our goal
is to predict, based on information recorded in the Electronic Health Record of
each patient, whether any of those endpoints will occur within the next six or
twelve months after each visit to the clinic. We compared different types of
RNNs that we developed for this work, with a model based on a Feedforward
Neural Network and a Logistic Regression model. We found that the RNN that we
developed based on Gated Recurrent Units provides the best performance for this
task. We also used the same models for a second task, i.e., next event
prediction, and found that here the model based on a Feedforward Neural Network
outperformed the other models. Our hypothesis is that long-term dependencies
are not as relevant in this task
- …