Densely Connected Attention Propagation for Reading Comprehension
We propose DecaProp (Densely Connected Attention Propagation), a new densely
connected neural architecture for reading comprehension (RC). There are two
distinct characteristics of our model. Firstly, our model densely connects all
pairwise layers of the network, modeling relationships between passage and
query across all hierarchical levels. Secondly, the dense connectors in our
network are learned via attention instead of standard residual skip-connectors.
To this end, we propose novel Bidirectional Attention Connectors (BAC) for
efficiently forging connections throughout the network. We conduct extensive
experiments on four challenging RC benchmarks. Our proposed approach achieves
state-of-the-art results on all four, outperforming existing baselines by up to
… in absolute F1 score. Comment: NIPS 2018
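The Bidirectional Attention Connectors replace standard residual skip-connections with attention learned between layers. As a rough illustration (not the paper's exact formulation: the function names, plain dot-product scoring, and list-of-lists vectors are simplifications for this sketch), one direction of such a connector could look like:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_connect(passage, query):
    """Align each passage vector with an attention-weighted mix of
    query vectors -- one direction of a bidirectional connector."""
    aligned = []
    for p in passage:
        scores = [sum(pi * qi for pi, qi in zip(p, q)) for q in query]
        weights = softmax(scores)
        mix = [sum(w * q[d] for w, q in zip(weights, query))
               for d in range(len(query[0]))]
        aligned.append(mix)
    return aligned
```

Running the same routine with the arguments swapped gives the other direction; the paper additionally compresses the aligned features so the connectors stay cheap enough to place between every pair of layers.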
Fast Reading Comprehension with ConvNets
State-of-the-art deep reading comprehension models are dominated by recurrent
neural nets. Their sequential nature is a natural fit for language, but it also
precludes parallelization within an instance and often becomes the bottleneck
for deploying such models to latency-critical scenarios. This is particularly
problematic for longer texts. Here we present a convolutional architecture as
an alternative to these recurrent architectures. Using simple dilated
convolutional units in place of recurrent ones, we achieve results comparable
to the state of the art on two question answering tasks, while at the same time
achieving up to two orders of magnitude speedups for question answering. Comment: 15 pages, 10 figures, submitted to ICLR 2018
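The parallelism argument rests on a simple property: a dilated convolution computes every output position independently, and stacking layers with growing dilation expands the receptive field exponentially with depth. A minimal scalar sketch (the actual model uses multi-channel gated convolutional units; the function name here is illustrative):

```python
def dilated_conv1d(xs, kernel, dilation):
    """Causal 1-D dilated convolution over a sequence of scalars.
    Positions before the start of the sequence are zero-padded.
    Each output depends only on the input, so all positions can
    be computed in parallel -- unlike a recurrent step."""
    out = []
    for t in range(len(xs)):
        acc = 0.0
        for k, w in enumerate(kernel):
            # Tap k reaches back k * dilation time steps.
            idx = t - k * dilation
            if idx >= 0:
                acc += w * xs[idx]
        out.append(acc)
    return out
```

Stacking such layers with dilations 1, 2, 4, … gives a receptive field of roughly 2^depth while keeping every layer fully parallelizable across positions.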
Object Ordering with Bidirectional Matchings for Visual Reasoning
Visual reasoning with compositional natural language instructions, e.g.,
based on the newly-released Cornell Natural Language Visual Reasoning (NLVR)
dataset, is a challenging task, where the model needs to have the ability to
create an accurate mapping between the diverse phrases and the several objects
placed in complex arrangements in the image. Further, this mapping needs to be
processed to answer the question in the statement given the ordering and
relationship of the objects across three similar images. In this paper, we
propose a novel end-to-end neural model for the NLVR task, where we first use
joint bidirectional attention to build a two-way conditioning between the
visual information and the language phrases. Next, we use an RL-based pointer
network to sort and process the varying number of unordered objects (so as to
match the order of the statement phrases) in each of the three images and then
pool over the three decisions. Our model achieves strong improvements (of 4-6%
absolute) over the state-of-the-art on both the structured representation and
raw image versions of the dataset. Comment: NAACL 2018 (8 pages; added pointer-ordering examples)
DCN+: Mixed Objective and Deep Residual Coattention for Question Answering
Traditional models for question answering optimize using cross entropy loss,
which encourages exact answers at the cost of penalizing nearby or overlapping
answers that are sometimes equally accurate. We propose a mixed objective that
combines cross entropy loss with self-critical policy learning. The objective
uses rewards derived from word overlap to solve the misalignment between
evaluation metric and optimization objective. In addition to the mixed
objective, we improve dynamic coattention networks (DCN) with a deep residual
coattention encoder that is inspired by recent work in deep self-attention and
residual networks. Our proposals improve model performance across question
types and input lengths, especially for long questions that require the
ability to capture long-term dependencies. On the Stanford Question Answering
Dataset, our model achieves state-of-the-art results with 75.1% exact match
accuracy and 83.1% F1, while the ensemble obtains 78.9% exact match accuracy
and 86.0% F1. Comment: 10 pages, 6 figures
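The reward in the mixed objective is derived from word overlap between a candidate answer and the gold answer. The standard SQuAD-style token-level F1 is the natural choice for such a reward; a plain sketch (the function name is illustrative):

```python
from collections import Counter

def f1_overlap(prediction, reference):
    """Token-level F1 between a predicted and a reference answer --
    the kind of word-overlap reward that can supplement cross
    entropy in a self-critical policy-learning objective."""
    pred, ref = prediction.split(), reference.split()
    common = Counter(pred) & Counter(ref)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

In a self-critical setup, the reward of a sampled span, baselined by the reward of the greedily decoded span, scales the policy-gradient term that is combined with the cross-entropy loss, so nearby or overlapping answers are no longer penalized as harshly as completely wrong ones.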
Multi-Relational Question Answering from Narratives: Machine Reading and Reasoning in Simulated Worlds
Question Answering (QA), as a research field, has primarily focused on either
knowledge bases (KBs) or free text as a source of knowledge. These two sources
have historically shaped the kinds of questions that are asked over these
sources, and the methods developed to answer them. In this work, we look
towards a practical use-case of QA over user-instructed knowledge that uniquely
combines elements of both structured QA over knowledge bases, and unstructured
QA over narrative, introducing the task of multi-relational QA over personal
narrative. As a first step towards this goal, we make three key contributions:
(i) we generate and release TextWorldsQA, a set of five diverse datasets, where
each dataset contains dynamic narrative that describes entities and relations
in a simulated world, paired with variably compositional questions over that
knowledge, (ii) we perform a thorough evaluation and analysis of several
state-of-the-art QA models and their variants at this task, and (iii) we
release a lightweight Python-based framework we call TextWorlds for easily
generating arbitrary additional worlds and narrative, with the goal of allowing
the community to create and share a growing collection of diverse worlds as a
test-bed for this task. Comment: published at ACL 2018
Contextual Aware Joint Probability Model Towards Question Answering System
In this paper, we address the question answering challenge with the SQuAD 2.0
dataset. We design a model architecture which leverages BERT's capability of
context-aware word embeddings and BiDAF's context interactive exploration
mechanism. By integrating these two state-of-the-art architectures, our system
tries to extract the contextual word representation at word and character
levels, for better comprehension of both question and context and their
correlations. We also propose our original joint posterior probability
predictor module and its associated loss functions. Our best model so far
obtains an F1 score of 75.842% and an EM score of 72.24% on the test PCE leaderboard.
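A joint posterior over answer spans is typically formed by multiplying the start and end distributions and maximizing over valid spans. The sketch below assumes independent start/end distributions and a hypothetical max_len cap; it is a generic span-extraction decoder, not the paper's specific predictor module:

```python
def best_span(p_start, p_end, max_len=15):
    """Pick the answer span (i, j) maximizing the joint posterior
    p_start[i] * p_end[j], subject to i <= j < i + max_len.
    Assumes the start and end distributions are independent."""
    best, best_score = (0, 0), 0.0
    for i, ps in enumerate(p_start):
        for j in range(i, min(i + max_len, len(p_end))):
            score = ps * p_end[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score
```

The i <= j constraint rules out inverted spans, and the length cap keeps decoding linear in practice; both are standard choices in SQuAD-style extractive QA systems.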
Fast Prototyping a Dialogue Comprehension System for Nurse-Patient Conversations on Symptom Monitoring
Data for human-human spoken dialogues for research and development are
currently very limited in quantity, variety, and sources; such data are even
scarcer in healthcare. In this work, we investigate fast prototyping of a
dialogue comprehension system by leveraging minimal nurse-to-patient
conversations. We propose a framework inspired by nurse-initiated clinical
symptom monitoring conversations to construct a simulated human-human dialogue
dataset, embodying linguistic characteristics of spoken interactions like
thinking aloud, self-contradiction, and topic drift. We then adopt an
established bidirectional attention pointer network on this simulated dataset,
achieving more than 80% F1 score on a held-out test set from real-world
nurse-to-patient conversations. The ability to automatically comprehend
conversations in the healthcare domain by exploiting only limited data has
implications for improving clinical workflows through red flag symptom
detection and triaging capabilities. We demonstrate the feasibility for
efficient and effective extraction, retrieval and comprehension of symptom
checking information discussed in multi-turn human-human spoken conversations. Comment: 8 pages. To appear in NAACL 2019
Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language
The paper introduces methods of adaptation of multilingual masked language
models for a specific language. Pre-trained bidirectional language models show
state-of-the-art performance on a wide range of tasks including reading
comprehension, natural language inference, and sentiment analysis. At the
moment there are two alternative approaches to training such models: monolingual
and multilingual. While language-specific models show superior performance,
multilingual models allow transfer from one language to another and can solve
tasks for several languages simultaneously. This work shows that transfer
learning from a multilingual model to a monolingual model yields significant
performance gains on tasks such as reading comprehension, paraphrase detection,
and sentiment analysis. Furthermore, multilingual initialization of a
monolingual model substantially reduces training time. Pre-trained models for
the Russian language are open-sourced.
Sogou Machine Reading Comprehension Toolkit
Machine reading comprehension has been intensively studied in recent years,
and neural network-based models have shown dominant performance. In this
paper, we present a Sogou Machine Reading Comprehension (SMRC) toolkit that can
be used to support fast and efficient development of modern machine
comprehension models, including both published models and original prototypes.
To achieve this goal, the toolkit provides dataset readers, a flexible
preprocessing pipeline, necessary neural network components, and built-in
models, which make the whole process of data preparation, model construction,
and training easier.
Conditioning LSTM Decoder and Bi-directional Attention Based Question Answering System
Applying neural networks to question answering has gained increasing
popularity in recent years. In this paper, I implement a model with a
bi-directional attention flow layer connected to a multi-layer LSTM encoder,
followed by a start-index decoder and a conditioning end-index decoder.
I introduce a new end-index decoder layer that conditions on the start-index output.
Experiments show this increases model performance by 15.16%. For prediction,
I propose a new smart-span equation, rewarding both short answer length and
high probability at the start and end indices, which further improves
prediction accuracy. The best single model achieves an F1 score of
73.97% and an EM score of 64.95% on the test set. Comment: 7 pages, 7 figures
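The abstract does not reproduce the smart-span equation itself. One plausible length-penalized form is sketched below; the subtractive penalty, the alpha value, and the function name are assumptions for illustration, not the paper's formula:

```python
def smart_span(p_start, p_end, alpha=0.05, max_len=20):
    """A hypothetical length-penalized span score: joint start/end
    probability discounted by answer length, so short high-probability
    answers beat long marginal ones. alpha and the penalty form are
    illustrative assumptions, not taken from the paper."""
    best, best_score = (0, 0), float("-inf")
    for i, ps in enumerate(p_start):
        for j in range(i, min(i + max_len, len(p_end))):
            score = ps * p_end[j] - alpha * (j - i)
            if score > best_score:
                best, best_score = (i, j), score
    return best
```

With a large enough alpha, a one-token span with slightly lower joint probability is preferred over a long span, which matches the stated goal of rewarding both short answer length and high start/end probability.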