UW-BHI at MEDIQA 2019: An Analysis of Representation Methods for Medical Natural Language Inference
Recent advances in distributed language modeling have led to large
performance increases on a variety of natural language processing (NLP) tasks.
However, it is not well understood how these methods may be augmented by
knowledge-based approaches. This paper compares the performance and internal
representation of an Enhanced Sequential Inference Model (ESIM) between three
experimental conditions based on the representation method: Bidirectional
Encoder Representations from Transformers (BERT), Embeddings of Semantic
Predications (ESP), or Cui2Vec. The methods were evaluated on the Medical
Natural Language Inference (MedNLI) subtask of the MEDIQA 2019 shared task.
This task relied heavily on semantic understanding and thus served as a
suitable evaluation set for the comparison of these representation methods.
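The abstract does not describe ESIM's internals. As a rough illustration of where the compared representations (BERT, ESP, Cui2Vec) enter the model, here is a simplified pure-Python sketch of ESIM's local inference step. It uses dot-product soft alignment followed by the [a; ã; a - ã; a * ã] enhancement. It assumes the token vectors have already been produced by one of the representation methods. The encoder, composition layer and classifier are omitted, and all function names and toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_align(a: List[Vector], b: List[Vector]) -> List[Vector]:
    """For each token vector a_i, return the attention-weighted sum of b."""
    dim = len(b[0])
    aligned = []
    for a_i in a:
        # Attention weights: softmax over dot-product scores e_ij = a_i . b_j.
        weights = softmax([dot(a_i, b_j) for b_j in b])
        aligned.append([sum(w * b_j[k] for w, b_j in zip(weights, b)) for k in range(dim)])
    return aligned


def enhance(a: List[Vector], a_tilde: List[Vector]) -> List[Vector]:
    """Concatenate [a; a~; a - a~; a * a~] at each token position."""
    out = []
    for x, y in zip(a, a_tilde):
        diff = [p - q for p, q in zip(x, y)]
        prod = [p * q for p, q in zip(x, y)]
        out.append(x + y + diff + prod)
    return out


# Hypothetical toy token vectors standing in for BERT, ESP or Cui2Vec outputs.
premise = [[1.0, 0.0], [0.0, 1.0]]
hypothesis = [[1.0, 0.0]]
premise_aligned = soft_align(premise, hypothesis)
m_premise = enhance(premise, premise_aligned)
```

Because only the input vectors change between the three experimental conditions, a comparison of this kind isolates the effect of the representation method on the same inference architecture.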
Probing Pre-Trained Language Models for Disease Knowledge
Pre-trained language models such as ClinicalBERT have achieved impressive
results on tasks such as medical Natural Language Inference. At first glance,
this may suggest that these models are able to perform medical reasoning tasks,
such as mapping symptoms to diseases. However, we find that standard benchmarks
such as MedNLI contain relatively few examples that require such forms of
reasoning. To better understand the medical reasoning capabilities of existing
language models, in this paper we introduce DisKnE, a new benchmark for Disease
Knowledge Evaluation. To construct this benchmark, we annotated each positive
MedNLI example with the types of medical reasoning that are needed. We then
created negative examples by corrupting these positive examples in an
adversarial way. Furthermore, we define training-test splits per disease,
ensuring that no knowledge about test diseases can be learned from the training
data, and we canonicalize the formulation of the hypotheses to avoid the
presence of artefacts. This leads to a number of binary classification
problems, one for each type of reasoning and each disease. When analysing
pre-trained models for the clinical/biomedical domain on the proposed
benchmark, we find that their performance drops considerably.
Comment: Accepted by ACL 2021 Findings.
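The per-disease splitting can be sketched as follows. This is a minimal illustration, not the DisKnE construction code. It assumes each example carries a single disease label and an annotated reasoning type. The `Example` fields and `per_disease_splits` are hypothetical names. The actual benchmark may apply stricter filtering, for example on diseases mentioned in the premise.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Example:
    premise: str
    hypothesis: str
    disease: str         # hypothetical field: the disease the hypothesis refers to
    reasoning_type: str  # hypothetical field: annotated type of medical reasoning
    label: int           # 1 = original positive example, 0 = corrupted negative


def per_disease_splits(
    examples: List[Example],
) -> Dict[Tuple[str, str], Tuple[List[Example], List[Example]]]:
    """Build one binary problem per (reasoning type, test disease).

    The test set holds the examples of that reasoning type about the test disease;
    the training set holds examples of the same type about all other diseases,
    so no training example is labelled with the test disease.
    """
    by_type: Dict[str, List[Example]] = defaultdict(list)
    for ex in examples:
        by_type[ex.reasoning_type].append(ex)

    problems = {}
    for rtype, exs in by_type.items():
        for disease in sorted({ex.disease for ex in exs}):
            test = [ex for ex in exs if ex.disease == disease]
            train = [ex for ex in exs if ex.disease != disease]
            problems[(rtype, disease)] = (train, test)
    return problems


# Hypothetical toy data.
data = [
    Example("p1", "h1", "asthma", "symptom", 1),
    Example("p2", "h2", "asthma", "symptom", 0),
    Example("p3", "h3", "sepsis", "symptom", 1),
]
problems = per_disease_splits(data)
```

Each key of `problems` corresponds to one of the binary classification problems described above, with training and test data disjoint by disease.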
Interpreting patient case descriptions with biomedical language models
The advent of pre-trained language models (LMs) has enabled unprecedented advances in the Natural Language Processing (NLP) field. In this respect, various specialised
LMs for the biomedical domain have been introduced, and, similar to their general-purpose counterparts, these models have achieved state-of-the-art results in many biomedical NLP tasks. This might suggest that they can perform medical reasoning. However, given the challenging nature of the biomedical domain and the scarcity
of labelled data, it is still not fully understood what type of knowledge these models encapsulate and how they can be enhanced further. This research seeks to address
these questions, with a focus on the task of interpreting patient case descriptions, which
provides a means to investigate the models’ ability to perform medical reasoning. In
general, this task is concerned with inferring a diagnosis or recommending a treatment
from a text fragment describing a set of symptoms accompanied by other information.
We started by probing pre-trained language models. For this purpose, we
constructed a benchmark that is derived from an existing dataset (MedNLI). Following
that, to improve the performance of LMs, we used a distant supervision strategy to
identify cases that are similar to a given one. We then showed that using such similar cases can lead to better results than other strategies for augmenting the input to
the LM. As a final contribution, we studied the possibility of fine-tuning biomedical
LMs on PubMed abstracts that correspond to case reports. In particular, we proposed
a self-supervision task which mimics the downstream tasks of inferring diagnoses and
recommending treatments. The findings in this thesis indicate that the performance of the considered biomedical LMs can be improved by using methods that go beyond
relying on additional manually annotated datasets.
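The abstract does not specify how the similar cases are found beyond the use of distant supervision. The sketch below shows only the input-augmentation step: the k cases most similar to a query case are appended to it before it is passed to an LM. As a loudly labelled stand-in for the distantly supervised model, similarity here is plain token-overlap (Jaccard). The `[SEP]` separator, the value of `k`, and all function names are assumptions for illustration.

```python
from typing import List, Set


def tokens(text: str) -> Set[str]:
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    # Stand-in similarity; not the distant supervision method from the thesis.
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def augment_input(query_case: str, case_pool: List[str], k: int = 2,
                  sep: str = " [SEP] ") -> str:
    """Append the k most similar cases from case_pool to query_case."""
    # sorted() is stable, so ties keep their original pool order.
    ranked = sorted(case_pool, key=lambda c: jaccard(query_case, c), reverse=True)
    return sep.join([query_case] + ranked[:k])


# Hypothetical toy case descriptions.
pool = ["fever cough", "rash itching", "cough dyspnea fever chills"]
augmented = augment_input("fever cough dyspnea", pool, k=2)
```

Swapping `jaccard` for a learned similarity model would leave the augmentation step unchanged. That separation is what allows different augmentation strategies to be compared on the same LM.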