Recent years have seen particular interest in using electronic medical records (EMRs) for secondary purposes to enhance the quality and safety of healthcare delivery. EMRs contain large volumes of valuable clinical notes. Embedding learning is a method for converting these notes into a format that makes them comparable and usable by downstream models. Transformer-based representation models have recently made a great leap forward: pre-trained on large text corpora, they learn to represent natural language effectively. The quality of a learned embedding is influenced by how clinical notes are presented as input to a representation model. A clinical note consists of several sections with different levels of information value, and healthcare providers commonly use different expressions for the same concept. Existing methods feed clinical notes to representation models either directly or after an initial preprocessing step.
In contrast, to learn a good embedding, we first identified the most informative sections of clinical notes.
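As a concrete illustration, the following minimal sketch isolates one named section from a discharge-summary-style note; the example note, the header names, and the regex-based splitting are illustrative assumptions, not the paper's exact extraction rules.

```python
import re

# Hypothetical discharge-summary-style note; MIMIC-III notes use
# section headers of the form "Header:" followed by free text.
note = """History of Present Illness:
65 yo male with chest pain and shortness of breath.
Hospital Course:
Admitted for NSTEMI and started on heparin.
Discharge Medications:
1. Aspirin 81 mg daily
"""

def extract_section(text, header):
    """Return the body of `header`: everything after it, up to the
    next line that looks like another section header."""
    pattern = rf"^{re.escape(header)}:\n(.*?)(?=^\S[^\n]*:\n|\Z)"
    match = re.search(pattern, text, flags=re.DOTALL | re.MULTILINE)
    return match.group(1).strip() if match else ""

# Keep only the most informative section as input for embedding.
print(extract_section(note, "History of Present Illness"))
```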
We then mapped the concepts extracted from the selected sections to their standard names in the Unified Medical Language System (UMLS) and used the standard phrases corresponding to the unique concepts as input to the clinical representation models.
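This normalization step could be realized, for example, with scispaCy's UMLS entity linker; the sketch below assumes scispaCy and its UMLS knowledge base are installed, and this toolchain is one possible choice rather than the paper's stated implementation.

```python
import spacy
from scispacy.linking import EntityLinker  # noqa: F401 (registers the pipe)

# Load a biomedical spaCy model and attach the UMLS linker.
nlp = spacy.load("en_core_sci_sm")
nlp.add_pipe("scispacy_linker",
             config={"resolve_abbreviations": True, "linker_name": "umls"})
linker = nlp.get_pipe("scispacy_linker")

doc = nlp("Pt c/o SOB and chest pain, hx of CHF.")

# Map each detected concept to its canonical UMLS name, keeping
# only unique concepts (deduplicated by CUI).
seen, standard_names = set(), []
for ent in doc.ents:
    if ent._.kb_ents:                   # linker found a UMLS candidate
        cui, _score = ent._.kb_ents[0]  # best-scoring concept
        if cui not in seen:
            seen.add(cui)
            standard_names.append(linker.kb.cui_to_entity[cui].canonical_name)

# Standard phrases of the unique concepts, used as model input.
model_input = " ".join(standard_names)
```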
We performed experiments to measure the usefulness of the learned embedding vectors on the task of in-hospital mortality prediction, using a subset of the publicly available Medical Information Mart for Intensive Care (MIMIC-III) dataset. According to the experiments, the clinical transformer-based representation models produced better results when their input was generated from the standard names of the extracted unique concepts than with other input formats.
The best-performing models were BioBERT, PubMedBERT, and UmlsBERT, in that order.
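To make the evaluation setup concrete, here is a minimal sketch of embedding the normalized input with BioBERT (via Hugging Face Transformers) and training a simple mortality classifier; the mean pooling, the toy data, and the logistic-regression classifier are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-v1.1")
model = AutoModel.from_pretrained("dmis-lab/biobert-v1.1")
model.eval()

def embed(text):
    """Mean-pooled last-layer embedding of one note's standard phrases."""
    inputs = tokenizer(text, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

# Toy stand-ins: standard-name strings per stay and 0/1 mortality labels.
notes = ["Dyspnea Chest Pain Congestive heart failure", "Headache"]
labels = [1, 0]

X = np.stack([embed(n) for n in notes])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
```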