25 research outputs found
Assessing mortality prediction through different representation models based on concepts extracted from clinical notes
Recent years have seen particular interest in using electronic medical
records (EMRs) for secondary purposes to enhance the quality and safety of
healthcare delivery. EMRs tend to contain large amounts of valuable clinical
notes. Embedding learning is a method for converting notes into a format
that makes them comparable. Transformer-based representation models have
recently made a great leap forward. These models are pre-trained on large
online datasets to understand natural language texts effectively. The quality
of a learned embedding is influenced by how clinical notes are used as input
to representation models. A clinical note has several sections with different
levels of information value. It is also common for healthcare providers to use
different expressions for the same concept. Existing methods use clinical notes
directly or with an initial preprocessing as input to representation models.
In contrast, to learn a good embedding, we first identified the most
informative sections of clinical notes. We then mapped the concepts
extracted from the selected sections to
the standard names in the Unified Medical Language System (UMLS). We used the
standard phrases corresponding to the unique concepts as input for clinical
models. We performed experiments to measure the usefulness of the learned
embedding vectors in the task of hospital mortality prediction on a subset of
the publicly available Medical Information Mart for Intensive Care (MIMIC-III)
dataset. According to the experiments, clinical transformer-based
representation models produced better results when given input built from
the standard names of the extracted unique concepts than with other input
formats. The best-performing models were BioBERT, PubMedBERT, and
UmlsBERT, in that order.
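The normalization step described above, mapping varied clinical expressions to unique standard concept names before embedding, can be sketched in plain Python. The synonym table and concept identifiers below are illustrative placeholders only; a real pipeline would resolve concepts against the UMLS Metathesaurus (e.g. via CUIs).

```python
# Sketch: map free-text clinical expressions to standard concept names,
# deduplicate by concept, and build the input string for a representation
# model. The synonym table is a toy stand-in for a real UMLS lookup.

UMLS_SYNONYMS = {
    # surface expression -> (concept id, preferred standard name)
    "heart attack": ("C0027051", "Myocardial Infarction"),
    "myocardial infarction": ("C0027051", "Myocardial Infarction"),
    "mi": ("C0027051", "Myocardial Infarction"),
    "high blood pressure": ("C0020538", "Hypertensive disease"),
    "hypertension": ("C0020538", "Hypertensive disease"),
}

def normalize_concepts(extracted_phrases):
    """Map extracted phrases to unique standard names (order-preserving)."""
    seen = set()
    standard_names = []
    for phrase in extracted_phrases:
        entry = UMLS_SYNONYMS.get(phrase.lower())
        if entry is None:
            continue  # unmapped phrases are dropped in this sketch
        cui, name = entry
        if cui not in seen:  # keep each unique concept once
            seen.add(cui)
            standard_names.append(name)
    return standard_names

# Phrases as they might be extracted from an informative note section;
# "MI" and "heart attack" collapse to a single standard concept.
phrases = ["High blood pressure", "MI", "heart attack"]
model_input = " ".join(normalize_concepts(phrases))
print(model_input)  # Hypertensive disease Myocardial Infarction
```

The resulting string of standard names is what would then be tokenized and fed to a clinical model such as BioBERT or UmlsBERT.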
Application of Clinical Concept Embeddings for Heart Failure Prediction in UK EHR data
Electronic health records (EHR) are increasingly being used for constructing
disease risk prediction models. Feature engineering in EHR data, however,
is challenging due to its high-dimensional and heterogeneous nature.
Low-dimensional representations of EHR data can potentially mitigate these
challenges. In this paper, we use global vectors (GloVe) to learn word
embeddings for diagnoses and procedures recorded using 13 million ontology
terms across 2.7 million hospitalisations in national UK EHR. We demonstrate
the utility of these embeddings by evaluating their performance in identifying
patients who are at higher risk of being hospitalised for congestive heart
failure. Our findings indicate that embeddings can enable the creation of
robust EHR-derived disease risk prediction models and address some of the
limitations associated with manual clinical feature engineering.
Comment: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018
arXiv:1811.0721
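GloVe embeddings are fit to co-occurrence statistics; for EHR data, a natural choice is to count how often two ontology terms are recorded within the same hospitalisation. A minimal sketch of that counting step, using made-up diagnosis codes rather than the paper's actual data or pipeline:

```python
from collections import Counter
from itertools import combinations

# Sketch: build the symmetric term-term co-occurrence counts that a
# GloVe-style model would be fit to. Each inner list stands for the set
# of ontology terms (e.g. diagnosis/procedure codes) recorded during one
# hospitalisation; the codes here are illustrative only.
hospitalisations = [
    ["I50.0", "I10", "N18.9"],  # heart failure, hypertension, CKD
    ["I50.0", "I10"],
    ["E11.9", "I10"],           # type 2 diabetes, hypertension
]

cooccurrence = Counter()
for terms in hospitalisations:
    # unordered pair of distinct terms, counted once per hospitalisation
    for a, b in combinations(sorted(set(terms)), 2):
        cooccurrence[(a, b)] += 1

print(cooccurrence[("I10", "I50.0")])  # 2: co-occur in two stays
```

The resulting counts matrix is the input on which GloVe's weighted least-squares objective is optimized to produce one low-dimensional vector per term.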