Results of the seventh edition of the BioASQ Challenge
The results of the seventh edition of the BioASQ challenge are presented in
this paper. The aim of the BioASQ challenge is the promotion of systems and
methodologies through the organization of a challenge on the tasks of
large-scale biomedical semantic indexing and question answering. In total, 30
teams with more than 100 systems participated in the challenge this year. As in
previous years, the best systems were able to outperform the strong baselines.
This suggests that state-of-the-art systems are continuously improving, pushing
the frontier of research. Comment: 17 pages, 2 figures
Question Answering with distilled BERT models: A case study for Biomedical Data
In the healthcare industry today, 80% of data is unstructured (Razzak et al., 2019). This poses a challenge for healthcare providers, who rely on unstructured data to inform their decision-making. Although Electronic Health Records (EHRs) exist to integrate patient data, healthcare providers are still challenged with searching for information and answers contained within unstructured data. Prior NLP and Deep Learning research has shown that these methods can improve information extraction from unstructured medical documents. This research expands upon those studies by developing a Question Answering system using distilled BERT models. Healthcare providers can use this system on their local computers to search for and receive answers to specific questions about patients. This paper's best TinyBERT and TinyBioBERT models had Mean Reciprocal Rank (MRR) scores of 0.522 and 0.284, respectively. Based on these findings, this paper concludes that TinyBERT performed better than TinyBioBERT on BioASQ Task 9b data.
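The Mean Reciprocal Rank metric reported above can be sketched in a few lines; the ranked-candidate representation below is an illustrative assumption, not the paper's actual evaluation harness:

```python
def mean_reciprocal_rank(ranked_results):
    """Compute MRR over a set of queries.

    Each element of ranked_results is a list of candidate answers,
    best-ranked first, with each candidate marked True if relevant.
    The reciprocal of the rank of the first relevant candidate is
    averaged over all queries; queries with no relevant candidate
    contribute 0.
    """
    total = 0.0
    for candidates in ranked_results:
        for rank, is_relevant in enumerate(candidates, start=1):
            if is_relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_results)

# Two queries: correct answer at rank 1 and at rank 4 -> (1 + 0.25) / 2
print(mean_reciprocal_rank([[True, False], [False, False, False, True]]))  # 0.625
```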
Constructing Datasets for Multi-hop Reading Comprehension Across Documents
Most Reading Comprehension methods limit themselves to queries which can be
answered using a single sentence, paragraph, or document. Enabling models to
combine disjoint pieces of textual evidence would extend the scope of machine
comprehension methods, but currently there exist no resources to train and test
this capability. We propose a novel task to encourage the development of models
for text understanding across multiple documents and to investigate the limits
of existing methods. In our task, a model learns to seek and combine evidence -
effectively performing multi-hop (i.e., multi-step) inference. We devise a
methodology to produce datasets for this task, given a collection of
query-answer pairs and thematically linked documents. Two datasets from
different domains are induced, and we identify potential pitfalls and devise
circumvention strategies. We evaluate two previously proposed competitive
models and find that one can integrate information across documents. However,
both models struggle to select relevant information, as providing documents
guaranteed to be relevant greatly improves their performance. While the models
outperform several strong baselines, their best accuracy reaches 42.9% compared
to human performance at 74.0%, leaving ample room for improvement. Comment: This paper directly corresponds to the TACL version
(https://transacl.org/ojs/index.php/tacl/article/view/1325) apart from minor
changes in wording, additional footnotes, and appendices
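The dataset-construction idea above, chaining thematically linked documents between a query and its answer, can be illustrated with a toy breadth-first search over a document link graph; the link structure and document names here are hypothetical, not from the paper's actual corpora:

```python
from collections import deque

def evidence_chain(links, start_doc, answer_doc, max_hops=3):
    """Breadth-first search for a chain of linked documents from the
    query's document to the document containing the answer.

    links maps a document id to the documents it is thematically
    linked with. Returns the shortest chain (a list of ids) within
    max_hops, or None if no chain exists.
    """
    queue = deque([[start_doc]])
    seen = {start_doc}
    while queue:
        path = queue.popleft()
        if path[-1] == answer_doc:
            return path
        if len(path) > max_hops:
            continue  # chain would exceed the hop budget
        for nxt in links.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# A 3-hop chain: the query document reaches the answer via d1 and d2.
links = {"q_doc": ["d1"], "d1": ["d2"], "d2": ["a_doc"]}
print(evidence_chain(links, "q_doc", "a_doc"))  # ['q_doc', 'd1', 'd2', 'a_doc']
```

A multi-hop example induced this way forces a model to combine evidence from every document on the chain rather than from a single passage.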
THiFLY Research at SemEval-2023 Task 7: A Multi-granularity System for CTR-based Textual Entailment and Evidence Retrieval
The NLI4CT task aims to determine whether hypotheses are entailed by Clinical
Trial Reports (CTRs) and to retrieve the corresponding supporting evidence.
This task poses a significant challenge, as verifying hypotheses in the NLI4CT
task requires the integration of multiple pieces of evidence from one or two
CTR(s) and the application of diverse levels of reasoning, including textual
and numerical. To address these problems, we present a multi-granularity system
for CTR-based textual entailment and evidence retrieval in this paper.
Specifically, we construct a Multi-granularity Inference Network (MGNet) that
exploits sentence-level and token-level encoding to handle both textual
entailment and evidence retrieval tasks. Moreover, we enhance the numerical
inference capability of the system by leveraging a T5-based model, SciFive,
which is pre-trained on medical corpora. Model ensembling and a joint
inference method are further utilized in the system to increase the stability
and consistency of inference. The system achieves F1-scores of 0.856 and 0.853
on textual entailment and evidence retrieval tasks, resulting in the best
performance on both subtasks. The experimental results corroborate the
effectiveness of our proposed method. Our code is publicly available at
https://github.com/THUMLP/NLI4CT. Comment: Accepted by SemEval-2023
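Model ensembling by averaging per-model class probabilities is one common way to obtain the stability the abstract mentions; the sketch below assumes softmax-probability averaging over entailment logits, which may differ from the authors' exact scheme:

```python
import numpy as np

def ensemble_predict(model_logits):
    """Average softmax probabilities across several models' logits and
    pick the majority class per example (illustrative ensembling step).

    model_logits: list of (n_examples, n_labels) logit arrays, one per model.
    Returns an (n_examples,) array of predicted label indices.
    """
    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))  # stable softmax
        return e / e.sum(axis=-1, keepdims=True)

    probs = np.mean([softmax(l) for l in model_logits], axis=0)
    return probs.argmax(axis=-1)

# Three models scoring two examples over two labels
# (e.g., entailment vs. contradiction); models disagree on example 1,
# but the averaged probabilities still favor label 0.
logits = [np.array([[2.0, 0.5], [0.1, 1.2]]),
          np.array([[1.5, 0.2], [0.3, 0.9]]),
          np.array([[0.4, 1.0], [0.2, 2.0]])]
print(ensemble_predict(logits))  # [0 1]
```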
Beyond MeSH: Fine-Grained Semantic Indexing of Biomedical Literature based on Weak Supervision
In this work, we propose a method for the automated refinement of subject
annotations in biomedical literature at the level of concepts. Semantic
indexing and search of biomedical articles in MEDLINE/PubMed are based on
semantic subject annotations with MeSH descriptors that may correspond to
several related but distinct biomedical concepts. Such semantic annotations do
not adhere to the level of detail available in the domain knowledge and may not
be sufficient to fulfil the information needs of experts in the domain. To this
end, we propose a new method that uses weak supervision to train a concept
annotator on the literature available for a particular disease. We test this
method on the MeSH descriptors for two diseases: Alzheimer's Disease and
Duchenne Muscular Dystrophy. The results indicate that concept-occurrence is a
strong heuristic for automated subject annotation refinement and its use as
weak supervision can lead to improved concept-level annotations. The
fine-grained semantic annotations can enable more precise literature retrieval,
sustain the semantic integration of subject annotations with other domain
resources and ease the maintenance of consistent subject annotations, as new
more detailed entries are added in the MeSH thesaurus over time. Comment: 36 pages, 8 figures; dictionary-based baselines added and conclusions
updated
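The concept-occurrence heuristic, treating the literal occurrence of a concept's terms in an article as a weak label for that concept, can be sketched minimally; the concepts and term lists below are illustrative placeholders, not actual MeSH entries:

```python
def weak_labels(articles, concept_terms):
    """Weakly label each article with the concepts whose surface terms
    occur in its text (concept-occurrence heuristic).

    concept_terms maps a concept name to a list of lowercase surface
    terms. Returns, per article, a sorted list of matched concepts,
    usable as noisy training labels for a concept annotator.
    """
    labels = []
    for text in articles:
        lower = text.lower()
        labels.append(sorted(c for c, terms in concept_terms.items()
                             if any(t in lower for t in terms)))
    return labels

# Hypothetical fine-grained concepts under a coarse disease descriptor.
concepts = {"tau pathology": ["tau protein", "tauopathy"],
            "amyloid": ["amyloid beta", "senile plaque"]}
docs = ["Aggregated tau protein was observed ...",
        "Amyloid beta deposition and tauopathy co-occur ..."]
print(weak_labels(docs, concepts))
# [['tau pathology'], ['amyloid', 'tau pathology']]
```

In the paper's setting these noisy labels supervise a trained annotator, which can then generalize beyond literal term matches.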
Generating (Factual?) Narrative Summaries of RCTs: Experiments with Neural Multi-Document Summarization
We consider the problem of automatically generating a narrative biomedical
evidence summary from multiple trial reports. We evaluate modern neural models
for abstractive summarization of relevant article abstracts from systematic
reviews previously conducted by members of the Cochrane collaboration, using
the authors' conclusions section of the review abstract as our target. We enlist
medical professionals to evaluate generated summaries, and we find that modern
summarization systems yield consistently fluent and relevant synopses, but that
they are not always factual. We propose new approaches that capitalize on
domain-specific models to inform summarization, e.g., by explicitly demarcating
snippets of inputs that convey key findings, and emphasizing the reports of
large and high-quality trials. We find that these strategies modestly improve
the factual accuracy of generated summaries. Finally, we propose a new method
for automatically evaluating the factuality of generated narrative evidence
syntheses using models that infer the directionality of reported findings. Comment: 11 pages, 2 figures. Accepted for presentation at the 2021 AMIA
Informatics Summit
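The directionality-based factuality check can be sketched as follows; the keyword rule below is a toy stand-in for the trained directionality model the paper describes, and the phrase lists are invented for illustration:

```python
def directionality(text):
    """Toy stand-in for a learned directionality classifier: decide
    whether a finding reports a positive effect, no effect, or is
    unclear. The keyword lists are illustrative only."""
    t = text.lower()
    if any(w in t for w in ("improved", "reduced risk", "benefit")):
        return "positive"
    if any(w in t for w in ("no significant", "no difference")):
        return "no_effect"
    return "unclear"

def factuality_agrees(generated, reference):
    """A generated summary counts as factual (in this toy sense) when
    its inferred direction matches that of the reference conclusion."""
    return directionality(generated) == directionality(reference)

print(factuality_agrees("Treatment improved survival.",
                        "The drug improved overall survival rates."))  # True
print(factuality_agrees("Treatment improved survival.",
                        "There was no significant difference."))       # False
```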