Unsupervised Alignment-based Iterative Evidence Retrieval for Multi-hop Question Answering
Evidence retrieval is a critical stage of question answering (QA), necessary
not only to improve performance, but also to explain the decisions of the
corresponding QA method. We introduce a simple, fast, and unsupervised
iterative evidence retrieval method, which relies on three ideas: (a) an
unsupervised alignment approach to soft-align questions and answers with
justification sentences using only GloVe embeddings, (b) an iterative process
that reformulates queries focusing on terms that are not covered by existing
justifications, and (c) a stopping criterion that terminates retrieval when
the terms in the given question and candidate answers are covered by the
retrieved justifications. Despite its simplicity, our approach outperforms all
the previous methods (including supervised methods) on the evidence selection
task on two datasets: MultiRC and QASC. When these evidence sentences are fed
into a RoBERTa answer classification component, we achieve state-of-the-art QA
performance on these two datasets.
Comment: Accepted at ACL 2020 as a long conference paper.
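The three ideas in this abstract, soft alignment over word vectors, query reformulation from uncovered terms, and a coverage-based stopping criterion, can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the function names are invented, and random vectors stand in for GloVe embeddings.

```python
import numpy as np

def cosine(u, v):
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu and nv else 0.0

def alignment_score(query, sentence, vectors):
    # Soft alignment: each query term contributes its best cosine
    # match against the sentence's terms.
    return sum(max(cosine(vectors[q], vectors[w]) for w in sentence)
               for q in query)

def retrieve(query, pool, vectors, max_iters=5):
    selected, terms = [], list(query)
    for _ in range(max_iters):
        remaining = [s for s in pool if s not in selected]
        if not remaining:
            break
        # Pick the justification best aligned with the current query terms.
        selected.append(max(remaining,
                            key=lambda s: alignment_score(terms, s, vectors)))
        covered = {w for s in selected for w in s}
        uncovered = [t for t in terms if t not in covered]
        if not uncovered:       # stopping criterion: all terms covered
            break
        terms = uncovered       # reformulation: focus on uncovered terms
    return selected

# Toy vocabulary with seeded random stand-ins for GloVe vectors.
rng = np.random.default_rng(0)
vectors = {w: rng.standard_normal(50)
           for w in ["cat", "sits", "mat", "dog", "runs", "on"]}
pool = [["cat", "sits"], ["dog", "runs"], ["on", "mat"]]
evidence = retrieve(["cat", "sits", "mat"], pool, vectors)
```

After the first pick covers "cat" and "sits", the reformulated query contains only "mat", which steers the second pick toward the sentence containing it.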
A Neural Multi-sequence Alignment TeCHnique (NeuMATCH)
The alignment of heterogeneous sequential data (video to text) is an
important and challenging problem. Standard techniques for this task, including
Dynamic Time Warping (DTW) and Conditional Random Fields (CRFs), suffer from
inherent drawbacks. Mainly, the Markov assumption implies that, given the
immediate past, future alignment decisions are independent of further history.
The separation between similarity computation and alignment decision also
prevents end-to-end training. In this paper, we propose an end-to-end neural
architecture where alignment actions are implemented as moving data between
stacks of Long Short-term Memory (LSTM) blocks. This flexible architecture
supports a large variety of alignment tasks, including one-to-one, one-to-many,
skipping unmatched elements, and (with extensions) non-monotonic alignment.
Extensive experiments on semi-synthetic and real datasets show that our
algorithm outperforms state-of-the-art baselines.
Comment: Accepted at CVPR 2018 (Spotlight). arXiv file includes the paper and the supplemental material.
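The stack-based action space described above (match the two stack tops, or pop an unmatched element from either side) can be illustrated with a toy, non-neural sketch. In NeuMATCH these decisions come from LSTM stacks; here a similarity threshold and a hypothetical one-step lookahead stand in for the learned policy.

```python
def align(seq_a, seq_b, sim, threshold=0.5):
    """Greedy stack-based alignment: Match, Pop-A, or Pop-B actions."""
    a, b = list(seq_a), list(seq_b)
    pairs = []
    while a and b:
        if sim(a[0], b[0]) >= threshold:
            pairs.append((a.pop(0), b.pop(0)))        # Match action
        else:
            # One-step lookahead: skip the element whose removal is
            # more likely to expose a match (Pop-A / Pop-B actions).
            ahead_a = sim(a[1], b[0]) if len(a) > 1 else -1.0
            ahead_b = sim(a[0], b[1]) if len(b) > 1 else -1.0
            (a if ahead_a >= ahead_b else b).pop(0)
    return pairs

exact = lambda x, y: 1.0 if x == y else 0.0
matches = align(["a", "b", "c"], ["a", "x", "c"], exact)
```

On these toy sequences the unmatched "b" and "x" are popped, leaving the one-to-one matches. The real model additionally supports one-to-many and (with extensions) non-monotonic alignment, which this greedy sketch does not.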
Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks
An important goal of computer vision is to build systems that learn visual
representations over time that can be applied to many tasks. In this paper, we
investigate a vision-language embedding as a core representation and show that
it leads to better cross-task transfer than standard multi-task learning. In
particular, the task of visual recognition is aligned to the task of visual
question answering by forcing each to use the same word-region embeddings. We
show this leads to greater inductive transfer from recognition to VQA than
standard multitask learning. Visual recognition also improves, especially for
categories that have relatively few recognition training labels but appear
often in the VQA setting. Thus, our paper takes a small step towards creating
more general vision systems by showing the benefit of interpretable, flexible,
and trainable core representations.
Comment: Accepted in ICCV 2017. The arXiv version has an extra analysis on correlation with human attention.
Quick and (not so) Dirty: Unsupervised Selection of Justification Sentences for Multi-hop Question Answering
We propose an unsupervised strategy for the selection of justification
sentences for multi-hop question answering (QA) that (a) maximizes the
relevance of the selected sentences, (b) minimizes the overlap between the
selected facts, and (c) maximizes the coverage of both question and answer.
This unsupervised sentence selection method can be coupled with any supervised
QA approach. We show that the sentences selected by our method improve the
performance of a state-of-the-art supervised QA model on two multi-hop QA
datasets: AI2's Reasoning Challenge (ARC) and Multi-Sentence Reading
Comprehension (MultiRC). We obtain new state-of-the-art performance on both
datasets among approaches that do not use external resources for training the
QA system: 56.82% F1 on ARC (41.24% on Challenge and 64.49% on Easy) and 26.1%
EM0 on MultiRC. Our justification sentences have higher quality than the
justifications selected by a strong information retrieval baseline, e.g., by
5.4% F1 in MultiRC. We also show that our unsupervised selection of
justification sentences is more stable across domains than a state-of-the-art
supervised sentence selection method.
Comment: Published at EMNLP-IJCNLP 2019 as a long conference paper. Corrected the name reference for Speer et al., 2017.
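The three criteria named in the abstract, relevance, low overlap, and coverage, suggest a simple group-scoring scheme over bag-of-words sentences. The following is a hypothetical sketch of that trade-off, not the paper's actual scoring function; the weighting (unit weights, linear combination) is an assumption.

```python
from itertools import combinations

def group_score(group, qa_terms):
    # Score a candidate justification set by (a) relevance to the
    # question+answer terms, (b) redundancy among selected sentences,
    # and (c) coverage of the question+answer terms.
    qa = set(qa_terms)
    sets = [set(s) for s in group]
    relevance = sum(len(s & qa) for s in sets)
    overlap = sum(len(si & sj)
                  for i, si in enumerate(sets) for sj in sets[i + 1:])
    coverage = len(set().union(*sets) & qa) / len(qa)
    return relevance - overlap + coverage

def select_justifications(pool, qa_terms, k=2):
    # Exhaustive search over k-sentence groups; fine for small pools.
    return max(combinations(pool, k),
               key=lambda g: group_score(g, qa_terms))

pool = [["sun", "heats", "water"],
        ["water", "evaporates"],
        ["sun", "warms", "water"]]
qa = ["sun", "heats", "water", "evaporates"]
chosen = select_justifications(pool, qa)
```

The overlap penalty is what makes this multi-hop friendly: two near-duplicate relevant sentences score worse than two complementary ones that jointly cover the question and answer.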
Learning Social Image Embedding with Deep Multimodal Attention Networks
Learning embeddings of social media data with deep models has attracted
extensive research interest and enabled many applications, such as link
prediction, classification, and cross-modal search. However, for social images
which contain both link information and multimodal contents (e.g., text
description, and visual content), simply employing the embedding learnt from
network structure or data content results in sub-optimal social image
representation. In this paper, we propose a novel social image embedding
approach called Deep Multimodal Attention Networks (DMAN), which employs a deep
model to jointly embed multimodal contents and link information. Specifically,
to effectively capture the correlations between multimodal contents, we propose
a multimodal attention network to encode the fine-grained relations between
image regions and textual words. To leverage the network structure for
embedding learning, a novel Siamese-Triplet neural network is proposed to model
the links among images. With the joint deep model, the learnt embedding can
capture both the multimodal contents and the nonlinear network information.
Extensive experiments are conducted to investigate the effectiveness of our
approach in the applications of multi-label classification and cross-modal
search. Compared to state-of-the-art image embeddings, our proposed DMAN
achieves significant improvement in the tasks of multi-label classification and
cross-modal search.
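The link-modeling component rests on a triplet objective: pull an image's embedding toward a linked image (positive) and push it away from an unlinked one (negative), up to a margin. A minimal NumPy sketch of that loss, purely illustrative of the Siamese-Triplet idea rather than the DMAN architecture itself:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Squared-distance margin loss: penalize triplets where the linked
    # (positive) image is not closer to the anchor than the unlinked
    # (negative) one by at least the margin.
    d_pos = float(np.sum((anchor - positive) ** 2))
    d_neg = float(np.sum((anchor - negative) ** 2))
    return max(0.0, d_pos - d_neg + margin)

# A violated triplet incurs a positive loss; a well-separated one is free.
violated = triplet_loss(np.zeros(3), np.ones(3), np.zeros(3))
satisfied = triplet_loss(np.zeros(3), np.zeros(3), np.ones(3))
```

Minimizing this loss over sampled link triplets is what lets the learned embedding reflect the network structure alongside the multimodal content.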