17,877 research outputs found
Reinforced Mnemonic Reader for Machine Reading Comprehension
In this paper, we introduce the Reinforced Mnemonic Reader for machine
reading comprehension tasks, which enhances previous attentive readers in two
aspects. First, a reattention mechanism is proposed to refine current
attentions by directly accessing to past attentions that are temporally
memorized in a multi-round alignment architecture, so as to avoid the problems
of attention redundancy and attention deficiency. Second, a new optimization
approach, called dynamic-critical reinforcement learning, is introduced to
extend the standard supervised method. It always encourages to predict a more
acceptable answer so as to address the convergence suppression problem occurred
in traditional reinforcement learning algorithms. Extensive experiments on the
Stanford Question Answering Dataset (SQuAD) show that our model achieves
state-of-the-art results. Meanwhile, our model outperforms previous systems by
over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD
datasets.Comment: Published in 27th International Joint Conference on Artificial
Intelligence (IJCAI), 201
Supervised Attentions for Neural Machine Translation
In this paper, we improve the attention or alignment accuracy of neural
machine translation by utilizing the alignments of training sentence pairs. We
simply compute the distance between the machine attentions and the "true"
alignments, and minimize this cost in the training procedure. Our experiments
on large-scale Chinese-to-English task show that our model improves both
translation and alignment qualities significantly over the large-vocabulary
neural machine translation system, and even beats a state-of-the-art
traditional syntax-based system.Comment: 6 pages. In Proceedings of EMNLP 2016. arXiv admin note: text overlap
with arXiv:1605.0314
Person Search with Natural Language Description
Searching persons in large-scale image databases with the query of natural
language description has important applications in video surveillance. Existing
methods mainly focused on searching persons with image-based or attribute-based
queries, which have major limitations for a practical usage. In this paper, we
study the problem of person search with natural language description. Given the
textual description of a person, the algorithm of the person search is required
to rank all the samples in the person database then retrieve the most relevant
sample corresponding to the queried description. Since there is no person
dataset or benchmark with textual description available, we collect a
large-scale person description dataset with detailed natural language
annotations and person samples from various sources, termed as CUHK Person
Description Dataset (CUHK-PEDES). A wide range of possible models and baselines
have been evaluated and compared on the person search benchmark. An Recurrent
Neural Network with Gated Neural Attention mechanism (GNA-RNN) is proposed to
establish the state-of-the art performance on person search
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Modeling textual or visual information with vector representations trained
from large language or visual datasets has been successfully explored in recent
years. However, tasks such as visual question answering require combining these
vector representations with each other. Approaches to multimodal pooling
include element-wise product or sum, as well as concatenation of the visual and
textual representations. We hypothesize that these methods are not as
expressive as an outer product of the visual and textual vectors. As the outer
product is typically infeasible due to its high dimensionality, we instead
propose utilizing Multimodal Compact Bilinear pooling (MCB) to efficiently and
expressively combine multimodal features. We extensively evaluate MCB on the
visual question answering and grounding tasks. We consistently show the benefit
of MCB over ablations without MCB. For visual question answering, we present an
architecture which uses MCB twice, once for predicting attention over spatial
features and again to combine the attended representation with the question
representation. This model outperforms the state-of-the-art on the Visual7W
dataset and the VQA challenge.Comment: Accepted to EMNLP 201
A Controllable Model of Grounded Response Generation
Current end-to-end neural conversation models inherently lack the flexibility
to impose semantic control in the response generation process, often resulting
in uninteresting responses. Attempts to boost informativeness alone come at the
expense of factual accuracy, as attested by pretrained language models'
propensity to "hallucinate" facts. While this may be mitigated by access to
background knowledge, there is scant guarantee of relevance and informativeness
in generated responses. We propose a framework that we call controllable
grounded response generation (CGRG), in which lexical control phrases are
either provided by a user or automatically extracted by a control phrase
predictor from dialogue context and grounding knowledge. Quantitative and
qualitative results show that, using this framework, a transformer based model
with a novel inductive attention mechanism, trained on a conversation-like
Reddit dataset, outperforms strong generation baselines.Comment: AAAI 202
- …