Evidence Inference 2.0: More Data, Better Models
How do we most effectively treat a disease or condition? Ideally, we could
consult a database of evidence gleaned from clinical trials to answer such
questions. Unfortunately, no such database exists; clinical trial results are
instead disseminated primarily via lengthy natural language articles. Perusing
all such articles would be prohibitively time-consuming for healthcare
practitioners; they instead tend to depend on manually compiled systematic
reviews of medical literature to inform care.
NLP may speed this process up, and eventually facilitate immediate consultation of
published evidence. The Evidence Inference dataset was recently released to
facilitate research toward this end. This task entails inferring the
comparative performance of two treatments, with respect to a given outcome,
from a particular article (describing a clinical trial) and identifying
supporting evidence. For instance: Does this article report that chemotherapy
performed better than surgery for five-year survival rates of operable cancers?
In this paper, we collect additional annotations to expand the Evidence
Inference dataset by 25%, provide stronger baseline models, systematically
inspect the errors that these make, and probe dataset quality. We also release
an abstract-only (as opposed to full-text) version of the task for rapid model
prototyping. The updated corpus, documentation, and code for new baselines and
evaluations are available at http://evidence-inference.ebm-nlp.com/.
Comment: Accepted as workshop paper into BioNLP. Updated results from SciBERT
to Biomed RoBERT
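The task described in the abstract above (a treatment, a comparator, and an outcome, answered with a comparative result plus supporting evidence) can be pictured as a small data structure. The following is a minimal sketch, not the dataset's actual schema: the class names, field names, and the three-way label set are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    # Hypothetical label set; the abstract only speaks of
    # "comparative performance" and does not list the labels.
    INCREASED = "significantly increased"
    DECREASED = "significantly decreased"
    NO_DIFFERENCE = "no significant difference"


@dataclass(frozen=True)
class Prompt:
    """One question posed against a single trial report."""
    intervention: str
    comparator: str
    outcome: str

    def as_question(self) -> str:
        # Mirrors the phrasing of the example question in the abstract.
        return (
            f"Does this article report that {self.intervention} performed "
            f"better than {self.comparator} for {self.outcome}?"
        )


@dataclass(frozen=True)
class Answer:
    """A model's (or annotator's) response to a Prompt."""
    label: Direction
    evidence: str  # supporting text span copied from the article


prompt = Prompt(
    intervention="chemotherapy",
    comparator="surgery",
    outcome="five-year survival rates of operable cancers",
)
print(prompt.as_question())
```

Keeping the evidence span next to the label reflects the abstract's point that the task covers both the inference and the identification of supporting text.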
Understanding Clinical Trial Reports: Extracting Medical Entities and Their Relations
The best evidence concerning comparative treatment effectiveness comes from
clinical trials, the results of which are reported in unstructured articles.
Medical experts must manually extract information from articles to inform
decision-making, which is time-consuming and expensive. Here we consider the
end-to-end task of both (a) extracting treatments and outcomes from full-text
articles describing clinical trials (entity identification) and, (b) inferring
the reported results for the former with respect to the latter (relation
extraction). We introduce new data for this task, and evaluate models that have
recently achieved state-of-the-art results on similar tasks in Natural Language
Processing. We then propose a new method motivated by how trial results are
typically presented that outperforms these purely data-driven baselines.
Finally, we run a fielded evaluation of the model with a non-profit seeking to
identify existing drugs that might be re-purposed for cancer, showing the
potential utility of end-to-end evidence extraction systems.
Generating (Factual?) Narrative Summaries of RCTs: Experiments with Neural Multi-Document Summarization
We consider the problem of automatically generating a narrative biomedical
evidence summary from multiple trial reports. We evaluate modern neural models
for abstractive summarization of relevant article abstracts from systematic
reviews previously conducted by members of the Cochrane Collaboration, using
the authors' conclusions section of the review abstract as our target. We enlist
medical professionals to evaluate generated summaries, and we find that modern
summarization systems yield consistently fluent and relevant synopses, but that
they are not always factual. We propose new approaches that capitalize on
domain-specific models to inform summarization, e.g., by explicitly demarcating
snippets of inputs that convey key findings, and emphasizing the reports of
large and high-quality trials. We find that these strategies modestly improve
the factual accuracy of generated summaries. Finally, we propose a new method
for automatically evaluating the factuality of generated narrative evidence
syntheses using models that infer the directionality of reported findings.
Comment: 11 pages, 2 figures. Accepted for presentation at the 2021 AMIA
Informatics Summi
Interpreting Neural Networks for and with Natural Language
In the past decade, natural language processing (NLP) systems have come to be built almost exclusively on a backbone of large neural models. As the landscape of feasible tasks has widened due to the capabilities of these models, the space of applications has also widened to include subfields with real-world consequences, such as fact-checking, fake news detection, and medical decision support. The increasing size and nonlinearity of these models result in an opacity that hinders efforts by machine learning practitioners and lay users alike to understand their internals and derive meaning or trust from their predictions.
The fields of explainable artificial intelligence (XAI) and more specifically explainable NLP (ExNLP) have emerged as an active area for remedying this opacity and for ensuring models' reliability and trustworthiness in high-stakes scenarios, by providing textual explanations meaningful to human users. Models that produce justifications for their individual predictions can be inspected for the purposes of debugging, quantifying bias and fairness, understanding model behavior, and ascertaining robustness and privacy. Textual explanation is a predominant form of explanation in machine learning datasets regardless of task modality. As such, this dissertation covers both explaining tasks with natural language and explaining natural language tasks.
In this dissertation, I propose test suites for evaluating the quality of model explanations under two definitions of meaning: faithfulness and human acceptability. I use these evaluation methods to investigate the utility of two explanation forms and three model architectures. I finally propose two methods to improve explanation quality: one which increases the likelihood of faithful highlight explanations and one which improves the human acceptability of free-text explanations. This work strives to increase the likelihood of positive use and outcomes when AI systems are deployed in practice.
Ph.D
Preclinical risk of bias assessment and PICO extraction using natural language processing
Drug development starts with preclinical studies which test the efficacy and
toxicology of potential candidates in living animals, before proceeding to
clinical trials conducted on human subjects. Many drugs shown to be effective
in preclinical animal studies fail in clinical trials, indicating potential
reproducibility issues and translation failure. To obtain less biased research
findings, systematic reviews are performed to collate all relevant evidence from
publications. However, systematic reviews are time-consuming, and
researchers have advocated the use of automation techniques to speed up the
process and reduce human effort. Good progress has been made in
integrating automation tools into reviews of clinical trials, whereas tools
developed for preclinical systematic reviews are scarce. Tools for preclinical
systematic reviews need to be designed specifically for them, because preclinical
experiments differ from clinical trials. In this thesis, I explore natural language
processing models for facilitating two stages in preclinical systematic reviews:
risk of bias assessment and PICO extraction.
A range of measures is used to reduce bias in animal experiments, and
many checklist criteria require the reporting of those measures in publications.
In the first part of the thesis, I implement several binary classification models
to indicate the reporting of random allocation to groups, blinded assessment
of outcome, conflicts of interest, compliance with animal welfare regulations, and
statements of animal exclusions in preclinical publications. I compare traditional
machine learning classifiers using several text representation methods against
convolutional, recurrent, and hierarchical neural networks, and propose two
strategies to adapt BERT models to long documents. My findings indicate that
neural networks and BERT-based models achieve better performance than
traditional classifiers and rule-based approaches. The attention mechanism
and hierarchical architecture in neural networks do not improve performance
but are useful for extracting relevant words or sentences from publications to
inform users’ judgement. The advantages of the transformer structure are
hindered when documents are long and computing resources are limited.
In literature retrieval and citation screening of published evidence, the key
elements of interest are Population, Intervention, Comparator and Outcome,
which compose the framework of PICO. In the second part of the thesis, I first
apply several question answering models based on attention flows and
transformers to extract phrases describing intervention or method of induction
of disease models from clinical abstracts and preclinical full texts. For
preclinical datasets describing multiple interventions or induction methods in
the full texts, I apply additional unsupervised information retrieval methods to
extract relevant sentences. The question answering models achieve good
performance when the text is at abstract level and contains only one
intervention or induction method, while for truncated documents with multiple
PICO mentions, the performance is less satisfactory. Considering this
limitation, I then collect preclinical abstracts with finer-grained PICO
annotations and develop named entity recognition models for extraction of
preclinical PICO elements including Species, Strain, Induction, Intervention,
Comparator and Outcome. I decompose PICO extraction into two independent
tasks: 1) PICO sentence classification, and 2) PICO element detection. For
PICO extraction, BERT-based models pre-trained on biomedical corpora
outperform recurrent networks, and the conditional probabilistic module only
shows advantages in recurrent networks. A self-training strategy applied to
enlarge the training set with unlabelled abstracts yields better performance for
PICO elements that have too few annotated instances.
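The self-training idea mentioned above (enlarging the training set with confidently pseudo-labelled examples) can be sketched on toy data. This is only an illustration of the general loop: the one-dimensional nearest-centroid classifier, the `min_margin` threshold, and all function names are assumptions standing in for the thesis's actual models and settings, which the abstract does not specify.

```python
def fit_centroids(points, labels):
    """Toy 'model': the mean feature value per class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_with_margin(centroids, x):
    """Return the nearest class and how much closer it is than the runner-up."""
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(x - centroids[runner_up]) - abs(x - centroids[best])
    return best, margin


def self_train(points, labels, unlabelled, min_margin=1.0, rounds=3):
    """Repeatedly add confidently pseudo-labelled examples to the training set."""
    points, labels, pool = list(points), list(labels), list(unlabelled)
    for _ in range(rounds):
        centroids = fit_centroids(points, labels)
        still_unlabelled = []
        for x in pool:
            y, margin = predict_with_margin(centroids, x)
            if margin >= min_margin:
                points.append(x)   # accept the pseudo-label
                labels.append(y)
            else:
                still_unlabelled.append(x)  # too uncertain, leave in the pool
        if len(still_unlabelled) == len(pool):
            break  # nothing new was added, so further rounds change nothing
        pool = still_unlabelled
    return fit_centroids(points, labels), pool


centroids, leftover = self_train(
    points=[0.0, 10.0], labels=["a", "b"], unlabelled=[1.0, 9.0, 5.0]
)
print(centroids, leftover)
```

The confidence threshold is the key design choice: examples the current model cannot separate clearly stay unlabelled rather than adding noisy labels to the training set.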
Experimental results demonstrate the feasibility of facilitating preclinical risk
of bias assessment and PICO extraction with natural language processing.