Quick and (not so) Dirty: Unsupervised Selection of Justification Sentences for Multi-hop Question Answering
We propose an unsupervised strategy for the selection of justification
sentences for multi-hop question answering (QA) that (a) maximizes the
relevance of the selected sentences, (b) minimizes the overlap between the
selected facts, and (c) maximizes the coverage of both question and answer.
This unsupervised sentence selection method can be coupled with any supervised
QA approach. We show that the sentences selected by our method improve the
performance of a state-of-the-art supervised QA model on two multi-hop QA
datasets: AI2's Reasoning Challenge (ARC) and Multi-Sentence Reading
Comprehension (MultiRC). We obtain new state-of-the-art performance on both
datasets among approaches that do not use external resources for training the
QA system: 56.82% F1 on ARC (41.24% on Challenge and 64.49% on Easy) and 26.1%
EM0 on MultiRC. Our justification sentences have higher quality than the
justifications selected by a strong information retrieval baseline, e.g., by
5.4% F1 in MultiRC. We also show that our unsupervised selection of
justification sentences is more stable across domains than a state-of-the-art
supervised sentence selection method. Comment: Published at EMNLP-IJCNLP 2019 as a long conference paper. Corrected
the name reference for Speer et al., 201
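The three criteria in the abstract above (relevance, low overlap, coverage of question and answer) can be illustrated with a minimal Python sketch. This is not the authors' formula: the whitespace tokenizer, the Jaccard overlap, the way the three terms are combined, and the names `tokens`, `score_set`, and `select` are all assumptions made for illustration, and the per-sentence `relevance` scores are assumed to come from some external retrieval system.

```python
from itertools import combinations


def tokens(text):
    """Lowercase whitespace tokenizer; a stand-in for real preprocessing."""
    return set(text.lower().split())


def score_set(sentences, question, answer, relevance):
    """Score a candidate set of justification sentences (illustrative only)."""
    sent_tokens = [tokens(s) for s in sentences]

    # (a) Relevance: mean of externally supplied per-sentence scores.
    rel = sum(relevance[s] for s in sentences) / len(sentences)

    # (b) Overlap: mean pairwise Jaccard similarity between selected sentences.
    pairs = list(combinations(sent_tokens, 2))
    overlap = (
        sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)
        if pairs else 0.0
    )

    # (c) Coverage: fraction of question and answer terms found in the set.
    union = set().union(*sent_tokens)
    q, a = tokens(question), tokens(answer)
    coverage = (len(q & union) / max(len(q), 1)
                + len(a & union) / max(len(a), 1)) / 2

    # Assumed combination: reward relevance and coverage, penalize overlap.
    return rel * coverage / (1.0 + overlap)


def select(candidates, question, answer, relevance, k=2):
    """Return the size-k subset of candidates with the highest score."""
    return max(
        combinations(candidates, k),
        key=lambda subset: score_set(subset, question, answer, relevance),
    )
```

Under this scoring, two near-duplicate sentences with high individual relevance can lose to a less redundant pair that also covers the answer terms, which is the behavior criteria (b) and (c) are meant to encourage.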
Unsupervised Alignment-based Iterative Evidence Retrieval for Multi-hop Question Answering
Evidence retrieval is a critical stage of question answering (QA), necessary
not only to improve performance, but also to explain the decisions of the
corresponding QA method. We introduce a simple, fast, and unsupervised
iterative evidence retrieval method, which relies on three ideas: (a) an
unsupervised alignment approach to soft-align questions and answers with
justification sentences using only GloVe embeddings, (b) an iterative process
that reformulates queries focusing on terms that are not covered by existing
justifications, and (c) a stopping criterion that terminates retrieval when
the terms in the given question and candidate answers are covered by the
retrieved justifications. Despite its simplicity, our approach outperforms all
the previous methods (including supervised methods) on the evidence selection
task on two datasets: MultiRC and QASC. When these evidence sentences are fed
into a RoBERTa answer classification component, we achieve state-of-the-art QA
performance on these two datasets. Comment: Accepted at ACL 2020 as a long conference paper.
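The iterative loop and stopping criterion described in the abstract above can be sketched as follows. This is a simplified illustration, not the published method: it replaces the GloVe-based soft alignment with exact token matching, and the function names and the `max_hops` parameter are hypothetical.

```python
def tokens(text):
    """Lowercase whitespace tokenizer; a stand-in for real preprocessing."""
    return set(text.lower().split())


def iterative_retrieve(query_text, corpus, max_hops=5):
    """Greedily retrieve sentences, re-querying with still-uncovered terms.

    query_text is the question concatenated with a candidate answer.
    Returns the retrieved chain and any terms left uncovered.
    """
    remaining = tokens(query_text)  # terms not yet covered by a justification
    pool = list(corpus)
    chain = []
    for _ in range(max_hops):
        # Stopping criterion: every query term is covered (or nothing is left).
        if not remaining or not pool:
            break
        # Exact-match stand-in for soft alignment: pick the sentence that
        # covers the most remaining terms.
        best = max(pool, key=lambda s: len(tokens(s) & remaining))
        gained = tokens(best) & remaining
        if not gained:  # no sentence adds coverage, so stop early
            break
        chain.append(best)
        pool.remove(best)
        remaining -= gained  # reformulated query = uncovered terms only
    return chain, remaining
```

For example, with the query "magnet attracts nails" and a corpus containing "a magnet attracts iron" and "nails are made of iron", the first hop covers "magnet" and "attracts", the second hop is driven only by the uncovered term "nails", and retrieval then stops because nothing remains uncovered.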
Semi-Structured Chain-of-Thought: Integrating Multiple Sources of Knowledge for Improved Language Model Reasoning
An important open question pertaining to the use of large language models for
knowledge-intensive tasks is how to effectively integrate knowledge from three
sources: the model's parametric memory, external structured knowledge, and
external unstructured knowledge. Most existing prompting methods either rely
solely on one or two of these sources, or require repeatedly invoking large
language models to generate similar or identical content. In this work, we
overcome these limitations by introducing a novel semi-structured prompting
approach that seamlessly integrates the model's parametric memory with
unstructured knowledge from text documents and structured knowledge from
knowledge graphs. Experimental results on open-domain multi-hop question
answering datasets demonstrate that our prompting method significantly
surpasses existing techniques, even exceeding those which require fine-tuning.
Fusing Temporal Graphs into Transformers for Time-Sensitive Question Answering
Answering time-sensitive questions from long documents requires temporal
reasoning over the times in questions and documents. An important open question
is whether large language models can perform such reasoning solely using a
provided text document, or whether they can benefit from additional temporal
information extracted using other systems. We address this research question by
applying existing temporal information extraction systems to construct temporal
graphs of events, times, and temporal relations in questions and documents. We
then investigate different approaches for fusing these graphs into Transformer
models. Experimental results show that our proposed approach for fusing
temporal graphs into input text substantially enhances the temporal reasoning
capabilities of Transformer models with or without fine-tuning. Additionally,
our proposed method outperforms various graph convolution-based approaches and
establishes a new state-of-the-art performance on SituatedQA and three splits
of TimeQA. Comment: EMNLP 2023 Findings
Semantic role labeling for protein transport predicates
Background: Automatic semantic role labeling (SRL) is a natural language processing (NLP) technique that maps sentences to semantic representations. This technique has been widely studied in recent years, but mostly with data in newswire domains. Here, we report on an SRL model for identifying the semantic roles of biomedical predicates describing protein transport in GeneRIFs, manually curated sentences focusing on gene functions. To avoid the computational cost of syntactic parsing, and because the boundaries of our protein transport roles often did not match up with syntactic phrase boundaries, we approached this problem with a word-chunking paradigm and trained support vector machine classifiers to classify words as being at the beginning, inside or outside of a protein transport role. Results: We collected a set of 837 GeneRIFs describing movements of proteins between cellular components, whose predicates were annotated for the semantic roles AGENT, PATIENT, ORIGIN and DESTINATION. We trained these models with the features of previous word-chunking models, features adapted from phrase-chunking models, and features derived from an analysis of our data. Our models were able to label protein transport semantic roles with 87.6% precision and 79.0% recall when using manually annotated protein boundaries, and 87.0% precision and 74.5% recall when using automatically identified ones. Conclusion: We successfully adapted the word-chunking classification paradigm to semantic role labeling, applying it to a new domain with predicates completely absent from any previous studies. By combining the traditional word and phrasal role labeling features with biomedical features like protein boundaries and MEDPOST part of speech tags, we were able to address the challenges posed by the new domain data and subsequently build robust models that achieved F-measures as high as 83.1. This system for extracting protein transport information from GeneRIFs performs well even with proteins identified automatically, and is therefore more robust than the rule-based methods previously used to extract protein transport roles.
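The word-chunking formulation above labels each word as beginning, inside, or outside a role. A minimal Python sketch of how such per-word labels can be turned back into role spans is shown here, together with the standard balanced F-measure. The `B-`/`I-`/`O` tag spelling, the function names, and the example sentence used below are assumptions for illustration; only the role names and the precision and recall figures come from the abstract.

```python
def bio_to_spans(words, tags):
    """Group per-word B-/I-/O tags into (role, phrase) spans."""
    spans, current_role, current_words = [], None, []
    for word, tag in zip(words, tags):
        prefix, _, role = tag.partition("-")
        # An I- tag with the same role extends the span being built.
        if prefix == "I" and role == current_role:
            current_words.append(word)
            continue
        # Any other tag closes the open span, if there is one.
        if current_role is not None:
            spans.append((current_role, " ".join(current_words)))
        if prefix in ("B", "I"):  # a stray I- tag is treated as a span start
            current_role, current_words = role, [word]
        else:  # "O": outside any role
            current_role, current_words = None, []
    if current_role is not None:
        spans.append((current_role, " ".join(current_words)))
    return spans


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

Applied to the reported 87.6% precision and 79.0% recall, `f_measure` gives approximately 83.1, which matches the best F-measure stated in the abstract.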
Discovering body site and severity modifiers in clinical texts
Objective: To research computational methods for discovering body site and severity modifiers in clinical texts. Methods: We cast the task of discovering body site and severity modifiers as a relation extraction problem in the context of a supervised machine learning framework. We utilize rich linguistic features to represent the pairs of relation arguments and delegate the decision about the nature of the relationship between them to a support vector machine model. We evaluate our models using two corpora that annotate body site and severity modifiers. We also compare the model performance to a number of rule-based baselines. We conduct cross-domain portability experiments. In addition, we carry out feature ablation experiments to determine the contribution of various feature groups. Finally, we perform error analysis and report the sources of errors. Results: The performance of our method for discovering body site modifiers achieves F1 of 0.740–0.908 and our method for discovering severity modifiers achieves F1 of 0.905–0.929. Discussion: Results indicate that both methods perform well on both in-domain and out-domain data, approaching the performance of human annotators. The most salient features are token and named entity features, although syntactic dependency features also contribute to the overall performance. The dominant sources of errors are infrequent patterns in the data and inability of the system to discern deeper semantic structures. Conclusions: We investigated computational methods for discovering body site and severity modifiers in clinical texts. Our best system is released open source as part of the clinical Text Analysis and Knowledge Extraction System (cTAKES).
TEAM-Atreides at SemEval-2022 Task 11: On leveraging data augmentation and ensemble to recognize complex Named Entities in Bangla
Many areas, such as the biological and healthcare domain, artistic works, and
organization names, have nested, overlapping, discontinuous entity mentions
that may even be syntactically or semantically ambiguous in practice.
Traditional sequence tagging algorithms are unable to recognize these complex
mentions because they may violate the assumptions upon which sequence tagging
schemes are founded. In this paper, we describe our contribution to SemEval
2022 Task 11 on identifying such complex Named Entities. We have leveraged the
ensemble of multiple ELECTRA-based models that were exclusively pretrained on
the Bangla language with the performance of ELECTRA-based models pretrained on
English to achieve competitive performance on the Track-11. Besides providing a
system description, we will also present the outcomes of our experiments on
architectural decisions, dataset augmentations, and post-competition findings. Comment: accepted in Proceedings of the 16th International Workshop on
Semantic Evaluation