Retrospective Reader for Machine Reading Comprehension
Machine reading comprehension (MRC) is an AI challenge that requires a machine
to determine the correct answers to questions based on a given passage. MRC
systems must not only answer questions when possible but also distinguish when
no answer is available in the given passage and then tactfully abstain from
answering. When unanswerable questions are involved in the MRC task, an
essential verification module, called a verifier, is required in addition to
the encoder, though the latest practice in MRC modeling still benefits most
from adopting well pre-trained language models as the encoder block and
focusing only on the "reading".
This paper devotes itself to exploring better verifier design for the MRC task
with unanswerable questions. Inspired by how humans solve reading
comprehension questions, we propose a retrospective reader (Retro-Reader) that
integrates two stages of reading and verification strategies: 1) sketchy
reading, which briefly investigates the overall interactions of passage and
question and yields an initial judgment; 2) intensive reading, which verifies
the answer and gives the final prediction. The proposed reader is evaluated on
two benchmark MRC challenge datasets, SQuAD2.0 and NewsQA, achieving new
state-of-the-art results. Significance tests show that our model is
significantly better than the strong ELECTRA and ALBERT baselines. A series of
analyses is also conducted to interpret the effectiveness of the proposed
reader.
Comment: Accepted by AAAI 202
Bipartite Flat-Graph Network for Nested Named Entity Recognition
In this paper, we propose a novel bipartite flat-graph network (BiFlaG) for
nested named entity recognition (NER), which contains two subgraph modules: a
flat NER module for outermost entities and a graph module for all the entities
located in inner layers. Bidirectional LSTM (BiLSTM) and graph convolutional
network (GCN) are adopted to jointly learn flat entities and their inner
dependencies. Different from previous models, which only consider the
unidirectional delivery of information from the innermost layers to outer ones
(or from outside to inside), our model effectively captures the bidirectional
interaction between them. We first use the entities recognized by the flat NER
module to construct an entity graph, which is fed to the graph module. The
richer representation learned by the graph module carries the dependencies of
inner entities and can be exploited to improve outermost entity predictions.
Experimental results on three standard nested NER datasets demonstrate that
our BiFlaG outperforms previous state-of-the-art models.
Comment: Accepted by ACL202
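The graph-construction step above can be sketched in a toy form. The edge convention below (connecting every token pair inside the same outermost span) is an assumption for illustration; BiFlaG builds its graph over learned token representations and processes it with a GCN.

```python
# Toy sketch: build an entity graph from the spans produced by a flat
# NER module. Spans are (start, end) token indices, end inclusive.
# Returning a set of directed edges stands in for the adjacency matrix
# a GCN would consume.

def build_entity_graph(flat_spans):
    """Connect every pair of distinct token positions that fall inside
    the same outermost entity span."""
    edges = set()
    for start, end in flat_spans:
        for i in range(start, end + 1):
            for j in range(start, end + 1):
                if i != j:
                    edges.add((i, j))
    return edges


# One outermost entity covering tokens 1..2 yields a single bidirectional edge.
print(build_entity_graph([(1, 2)]))  # {(1, 2), (2, 1)}
```

Feeding such a graph back through a second module is what lets inner-entity dependencies flow outward and refine the flat (outermost) predictions, rather than information moving in only one direction.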
Hierarchical Contextualized Representation for Named Entity Recognition
Named entity recognition (NER) models are typically based on a bidirectional
LSTM (BiLSTM) architecture. The constraints of its sequential nature and
single-sentence input prevent full utilization of global information from a
larger scope, not only the entire sentence but also the entire document
(dataset). In this paper, we address these two deficiencies and propose a
model augmented with hierarchical contextualized representations: a
sentence-level representation and a document-level representation. At the
sentence level, we take the different contributions of words within a single
sentence into consideration to enhance the sentence representation learned
from an independent BiLSTM via a label embedding attention mechanism. At the
document level, a key-value memory network is adopted to record
document-aware information for each unique word, sensitive to the similarity
of context information. Our two-level hierarchical contextualized
representations are fused with each input token embedding and the
corresponding hidden state of the BiLSTM, respectively. Experimental results
on three benchmark NER datasets (the CoNLL-2003 and OntoNotes 5.0 English
datasets and the CoNLL-2002 Spanish dataset) show that we establish new
state-of-the-art results.
Comment: Accepted by AAAI 202
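The document-level idea can be illustrated with a stripped-down key-value memory. Scalar floats stand in for the hidden-state vectors the actual model stores, and the mean-pooled read is an illustrative aggregation, not necessarily the paper's exact similarity-weighted one.

```python
# Hedged sketch of a key-value memory for document-aware word
# representations: each unique word (key) accumulates the contexts
# (values) it was seen in, and a later occurrence reads them back.
from collections import defaultdict

class KeyValueMemory:
    def __init__(self):
        # word -> list of recorded context values
        self.memory = defaultdict(list)

    def write(self, word, context_value):
        """Record one context in which `word` appeared."""
        self.memory[word].append(context_value)

    def read(self, word):
        """Aggregate all recorded contexts for `word` (mean pooling);
        unseen words get a neutral 0.0."""
        values = self.memory[word]
        return sum(values) / len(values) if values else 0.0


mem = KeyValueMemory()
mem.write("Washington", 1.0)  # e.g. a context suggesting PERSON
mem.write("Washington", 3.0)  # e.g. a context suggesting LOCATION
print(mem.read("Washington"))  # 2.0
```

The payoff is that an ambiguous occurrence of a word can consult every other context in the document before the tag is decided, instead of relying on one sentence alone.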
REM-Net: Recursive Erasure Memory Network for Commonsense Evidence Refinement
When answering a question, people often draw upon their rich world knowledge
in addition to the particular context. While recent works retrieve supporting
facts/evidence from commonsense knowledge bases to supply additional
information for each question, there is still ample room to improve the
quality of that evidence. This is crucial because evidence quality is the key
to answering commonsense questions and even determines the upper bound on a
QA system's performance. In this paper, we propose a recursive erasure memory
network (REM-Net) to improve evidence quality. REM-Net is equipped with a
module that refines the evidence by recursively erasing low-quality evidence
that does not explain the answer to the question. Besides, instead of
retrieving evidence from existing knowledge bases, REM-Net leverages a
pre-trained generative model to generate candidate evidence customized for
the question. We conduct experiments on two commonsense question answering
datasets, WIQA and CosmosQA. The results demonstrate the effectiveness of
REM-Net and show that the refined evidence is explainable.
Comment: Accepted by AAAI 202
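The recursive erasure step can be sketched as a simple loop. The quality scores below are placeholder inputs; in REM-Net the scoring is done by a learned module conditioned on the question, and erased items are dropped from the memory before rescoring.

```python
# Toy sketch of recursive evidence erasure: each round removes the
# weakest evidence item, so only evidence that helps explain the answer
# survives. Scores are assumed precomputed here for illustration.

def refine_evidence(evidence_scores, rounds=2):
    """evidence_scores: dict mapping evidence text -> quality score.
    Erase the lowest-scoring item once per round, keeping at least one."""
    evidence = dict(evidence_scores)
    for _ in range(rounds):
        if len(evidence) <= 1:
            break
        weakest = min(evidence, key=evidence.get)
        del evidence[weakest]  # erase low-quality evidence
    return evidence


candidates = {"fact A": 0.9, "fact B": 0.1, "fact C": 0.5}
print(refine_evidence(candidates, rounds=2))  # {'fact A': 0.9}
```

Because the surviving set is an explicit list of facts rather than an opaque vector, the refined evidence can be inspected directly, which is what makes the refinement explainable.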