Emergent Predication Structure in Hidden State Vectors of Neural Readers
A significant number of neural architectures for reading comprehension have
recently been developed and evaluated on large cloze-style datasets. We present
experiments supporting the emergence of "predication structure" in the hidden
state vectors of these readers. More specifically, we provide evidence that the
hidden state vectors represent atomic formulas $\Phi[c]$ where $\Phi$ is a
semantic property (predicate) and $c$ is a constant symbol entity identifier.
Comment: Accepted for Repl4NLP: 2nd Workshop on Representation Learning for NLP
RACE: Large-scale ReAding Comprehension Dataset From Examinations
We present RACE, a new dataset for benchmark evaluation of methods in the
reading comprehension task. Collected from the English exams for middle and
high school Chinese students aged 12 to 18, RACE consists
of nearly 28,000 passages and nearly 100,000 questions generated by human experts
(English instructors), and covers a variety of topics which are carefully
designed for evaluating the students' ability in understanding and reasoning.
In particular, the proportion of questions that require reasoning is much
larger in RACE than that in other benchmark datasets for reading comprehension,
and there is a significant gap between the performance of the state-of-the-art
models (43%) and the ceiling human performance (95%). We hope this new dataset
can serve as a valuable resource for research and evaluation in machine
comprehension. The dataset is freely available at
http://www.cs.cmu.edu/~glai1/data/race/ and the code is available at
https://github.com/qizhex/RACE_AR_baselines.
Comment: EMNLP 201
Retrospective Reader for Machine Reading Comprehension
Machine reading comprehension (MRC) is an AI challenge that requires a machine
to determine the correct answers to questions based on a given passage. MRC
systems must not only answer questions when necessary but also distinguish when
no answer is available according to the given passage and then tactfully
abstain from answering. When unanswerable questions are involved in the MRC
task, an essential verification module called a verifier is especially required
in addition to the encoder, though the latest practice in MRC modeling still
benefits most from adopting well pre-trained language models as the encoder
block, focusing only on the "reading". This paper devotes itself to exploring
better verifier design for the MRC task with unanswerable questions. Inspired
by how humans solve reading comprehension questions, we propose a
retrospective reader (Retro-Reader) that integrates two stages of reading and
verification strategies: 1) sketchy reading that briefly investigates the
overall interactions of passage and question, and yields an initial judgment; 2)
intensive reading that verifies the answer and gives the final prediction. The
proposed reader is evaluated on two benchmark MRC challenge datasets, SQuAD2.0
and NewsQA, achieving new state-of-the-art results. Significance tests show
that our model is significantly better than the strong ELECTRA and ALBERT
baselines. A series of analyses is also conducted to interpret the
effectiveness of the proposed reader.
Comment: Accepted by AAAI 202
Who did What: A Large-Scale Person-Centered Cloze Dataset
We have constructed a new "Who-did-What" dataset of over 200,000
fill-in-the-gap (cloze) multiple choice reading comprehension problems
constructed from the LDC English Gigaword newswire corpus. The WDW dataset has
a variety of novel features. First, in contrast with the CNN and Daily Mail
datasets (Hermann et al., 2015) we avoid using article summaries for question
formation. Instead, each problem is formed from two independent articles --- an
article given as the passage to be read and a separate article on the same
events used to form the question. Second, we avoid anonymization --- each
choice is a person named entity. Third, the problems have been filtered to
remove a fraction that are easily solved by simple baselines, while remaining
84% solvable by humans. We report performance benchmarks of standard systems
and propose the WDW dataset as a challenge task for the community.
Comment: To appear at EMNLP 2016. Our dataset is available at
tticnlp.github.io/who_did_wha