Multi-Cast Attention Networks for Retrieval-based Question Answering and Response Prediction
Attention is typically used to select informative sub-phrases that are used
for prediction. This paper investigates the novel use of attention as a form of
feature augmentation, i.e., casted attention. We propose Multi-Cast Attention
Networks (MCAN), a new attention mechanism and general model architecture for a
potpourri of ranking tasks in the conversational modeling and question
answering domains. Our approach performs a series of soft attention operations,
each time casting a scalar feature upon the inner word embeddings. The key idea
is to provide a real-valued hint (feature) to a subsequent encoder layer and is
targeted at improving the representation learning process. There are several
advantages to this design, e.g., it allows an arbitrary number of attention
mechanisms to be casted, allowing for multiple attention types (e.g.,
co-attention, intra-attention) and attention variants (e.g., alignment-pooling,
max-pooling, mean-pooling) to be executed simultaneously. This not only
eliminates the costly need to tune the nature of the co-attention layer, but
also provides greater extents of explainability to practitioners. Via extensive
experiments on four well-known benchmark datasets, we show that MCAN achieves
state-of-the-art performance. On the Ubuntu Dialogue Corpus, MCAN outperforms
existing state-of-the-art models. MCAN also achieves the best
performing score to date on the well-studied TrecQA dataset.
Comment: Accepted to KDD 2018 (Paper titled only "Multi-Cast Attention
Networks" in KDD version)
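As a rough illustration of the casting idea in the MCAN abstract above, the following sketch soft-aligns each query word against a document (alignment-pooling co-attention) and appends three scalar features to that word's embedding. The dot-product affinity, the plain-sum compression, and the function names `align`, `cast_features`, and `multi_cast` are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def align(word, other_seq):
    """Soft-align one word vector against another sequence.

    Assumption: dot-product affinity followed by a softmax.
    """
    weights = softmax([dot(word, o) for o in other_seq])
    dim = len(word)
    return [sum(w * o[i] for w, o in zip(weights, other_seq)) for i in range(dim)]


def cast_features(word, attended):
    """Compress a (word, attended) pair into three scalars.

    Assumption: each comparison is reduced with a plain sum.
    """
    f_concat = sum(attended) + sum(word)                 # sum over [attended; word]
    f_mul = sum(a * w for a, w in zip(attended, word))   # sum of element-wise product
    f_sub = sum(a - w for a, w in zip(attended, word))   # sum of element-wise difference
    return [f_concat, f_mul, f_sub]


def multi_cast(query, document):
    """Append the casted scalar features to every query word embedding."""
    return [w + cast_features(w, align(w, document)) for w in query]


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    document = [[1.0, 0.0], [0.0, 1.0]]
    for augmented in multi_cast(query, document):
        print(augmented)
```

Each output vector keeps the original embedding and gains three extra dimensions, which a subsequent encoder layer could consume as hints. Further attention types or pooling variants would simply append more scalars in the same way.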
Multi-Instance Learning for End-to-End Knowledge Base Question Answering
End-to-end training has been a popular approach for knowledge base question
answering (KBQA). However, real world applications often contain answers of
varied quality for users' questions. It is not appropriate to treat all
available answers of a user question equally.
This paper proposes a novel approach based on multiple instance learning to
address the problem of noisy answers by exploring consensus among answers to
the same question in training end-to-end KBQA models. In particular, the QA
pairs are organized into bags with dynamic instance selection and different
options of instance weighting. Curriculum learning is utilized to select
instance bags during training. On the public CQA dataset, the new method
significantly improves both entity accuracy and the Rouge-L score over a
state-of-the-art end-to-end KBQA baseline.
Adversarial Training for Community Question Answer Selection Based on Multi-scale Matching
Community-based question answering (CQA) websites represent an important
source of information. As a result, the problem of matching the most valuable
answers to their corresponding questions has become an increasingly popular
research topic. We frame this task as a binary (relevant/irrelevant)
classification problem, and present an adversarial training framework to
alleviate the label imbalance issue. We employ a generative model to iteratively
sample a subset of challenging negative samples to fool our classification
model. Both models are alternately optimized using the REINFORCE algorithm. The
proposed method is completely different from previous ones, where negative
samples in the training set are directly used or uniformly down-sampled. Further,
we propose using Multi-scale Matching which explicitly inspects the correlation
between words and ngrams of different levels of granularity. We evaluate the
proposed method on the SemEval 2016 and SemEval 2017 datasets and achieve
state-of-the-art or similar performance.
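The generator side of the adversarial sampling described in the abstract above can be sketched with a plain REINFORCE update. This is a simplified illustration, not the paper's method: the classifier is frozen and represented only by a list of hypothetical `fooling_scores` (how strongly each negative candidate fools it), the generator is a bare softmax over per-candidate logits, and the expected-reward baseline is an added variance-reduction assumption.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_generator(fooling_scores, steps=200, lr=0.5, seed=0):
    """Learn a sampling distribution over negative candidates with REINFORCE.

    fooling_scores[i] is the (hypothetical, fixed) reward for sampling
    candidate i, i.e. how much that negative fools the classifier.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(fooling_scores)
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one negative from the current generator distribution.
        i = rng.choices(range(len(probs)), weights=probs)[0]
        # Baseline: expected reward under the current distribution.
        baseline = sum(p * r for p, r in zip(probs, fooling_scores))
        advantage = fooling_scores[i] - baseline
        # Gradient of log p(i) w.r.t. logit j is (1 if j == i else 0) - p(j).
        for j in range(len(logits)):
            grad_log_prob = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * advantage * grad_log_prob
    return softmax(logits)


if __name__ == "__main__":
    # Two easy negatives and one hard one (illustrative values).
    print(train_generator([0.1, 0.2, 0.9]))
```

In a full adversarial setup the classifier would be retrained on the sampled negatives between generator updates, so the rewards would shift over time rather than stay fixed as they do here.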
Simple Question Answering with Subgraph Ranking and Joint-Scoring
Knowledge graph based simple question answering (KBSQA) is a major area of
research within question answering. Although only dealing with simple
questions, i.e., questions that can be answered through a single knowledge base
(KB) fact, this task is neither simple nor close to being solved. Targeting
the two main steps, subgraph selection and fact selection, the research
community has developed sophisticated approaches. However, the importance of
subgraph ranking and leveraging the subject-relation dependency of a KB fact
have not been sufficiently explored. Motivated by this, we present a unified
framework to describe and analyze existing approaches. Using this framework as
a starting point, we focus on two aspects: improving subgraph selection through
a novel ranking method and leveraging the subject-relation dependency by
proposing a joint scoring CNN model with a novel loss function that enforces
the well-order of scores. Our methods achieve a new state of the art (85.44% in
accuracy) on the SimpleQuestions dataset.
Comment: Accepted by The 2019 Annual Conference of the North American Chapter
of the Association for Computational Linguistics (NAACL-HLT 2019). 11 pages,
1 figure
Adversarial TableQA: Attention Supervision for Question Answering on Tables
The task of answering a question given a text passage has shown great
developments on model performance thanks to community efforts in building
useful datasets. Recently, there have been doubts whether such rapid progress
has been based on truly understanding language. The same question has not been
asked in the table question answering (TableQA) task, where we are tasked to
answer a query given a table. We show that existing efforts, which use "answers"
for both evaluation and supervision in TableQA, show deteriorating
performance in adversarial settings where perturbations do not affect the
answer. This insight naturally motivates the development of new models that understand
the question and the table more precisely. For this goal, we propose Neural Operator
(NeOp), a multi-layer sequential network with attention supervision to answer
the query given a table. NeOp uses multiple Selective Recurrent Units (SelRUs)
to further help the interpretability of the answers of the model. Experiments
show that the use of operand information to train the model significantly
improves the performance and interpretability of TableQA models. NeOp
outperforms all the previous models by a big margin.
Comment: ACML 201
No Need to Pay Attention: Simple Recurrent Neural Networks Work! (for Answering "Simple" Questions)
First-order factoid question answering assumes that the question can be
answered by a single fact in a knowledge base (KB). While this does not seem
like a challenging task, many recent attempts that apply either complex
linguistic reasoning or deep neural networks achieve 65%-76% accuracy on
benchmark sets. Our approach formulates the task as two machine learning
problems: detecting the entities in the question, and classifying the question
as one of the relation types in the KB. We train a recurrent neural network to
solve each problem. On the SimpleQuestions dataset, our approach yields
substantial improvements over previously published results, even over neural
networks based on much more complex architectures. The simplicity of our
approach also has practical advantages, such as efficiency and modularity, that
are valuable especially in an industry setting. In fact, we present a
preliminary analysis of the performance of our model on real queries from
Comcast's X1 entertainment platform with millions of users every day.
Comment: 7 pages, to appear in EMNLP 201
Strong Baselines for Simple Question Answering over Knowledge Graphs with and without Neural Networks
We examine the problem of question answering over knowledge graphs, focusing
on simple questions that can be answered by the lookup of a single fact.
Adopting a straightforward decomposition of the problem into entity detection,
entity linking, relation prediction, and evidence combination, we explore
simple yet strong baselines. On the popular SimpleQuestions dataset, we find
that basic LSTMs and GRUs plus a few heuristics yield accuracies that approach
the state of the art, and techniques that do not use neural networks also
perform reasonably well. These results show that gains from sophisticated deep
learning techniques proposed in the literature are quite modest and that some
previous models exhibit unnecessary complexity.
Comment: Published in NAACL HLT 201
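The decomposition named in the abstract above (entity detection, entity linking, relation prediction, evidence combination) can be illustrated with a toy, non-neural pipeline. Everything here is hypothetical: the two-fact `KB`, the `RELATION_KEYWORDS` table, and the string-matching heuristics are invented for the example, and entity detection and linking are collapsed into one substring match. It is a sketch of the pipeline shape, not of the paper's baselines.

```python
# Toy knowledge base: (subject, relation) -> object. Illustrative facts only.
KB = {
    ("barack obama", "place_of_birth"): "Honolulu",
    ("hamlet", "author"): "William Shakespeare",
}

# Hypothetical keyword cues for each relation type.
RELATION_KEYWORDS = {
    "place_of_birth": ["born", "birthplace"],
    "author": ["wrote", "written", "author"],
}


def detect_entity(question, kb):
    """Entity detection and linking, collapsed into one step:
    pick the longest KB subject name that appears in the question."""
    text = question.lower()
    candidates = {subj for subj, _ in kb if subj in text}
    return max(candidates, key=len) if candidates else None


def predict_relation(question, keywords):
    """Relation prediction: pick the relation with the most keyword hits."""
    text = question.lower()
    scores = {rel: sum(k in text for k in kws) for rel, kws in keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def answer(question, kb=KB, keywords=RELATION_KEYWORDS):
    """Evidence combination: look up the single fact for (entity, relation)."""
    subj = detect_entity(question, kb)
    rel = predict_relation(question, keywords)
    return kb.get((subj, rel))


if __name__ == "__main__":
    print(answer("Where was Barack Obama born?"))
    print(answer("Who wrote Hamlet?"))
```

Because each stage is a separate function, any one of them could be swapped for a learned model (for example, a recurrent tagger for entity detection) without touching the others.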
Modeling Human Visual Search Performance on Realistic Webpages Using Analytical and Deep Learning Methods
Modeling visual search not only offers an opportunity to predict the
usability of an interface before actually testing it on real users, but also
advances scientific understanding about human behavior. In this work, we first
conduct a set of analyses on a large-scale dataset of visual search tasks on
realistic webpages. We then present a deep neural network that learns to
predict the scannability of webpage content, i.e., how easy it is for a user to
find a specific target. Our model leverages both heuristic-based features such
as target size and unstructured features such as raw image pixels. This
approach allows us to model complex interactions that might be involved in a
realistic visual search task, which cannot be easily achieved by traditional
analytical models. We analyze the model behavior to offer our insights into how
the salience map learned by the model aligns with human intuition and how the
learned semantic representation of each target type relates to its visual
search performance.
Comment: the 2020 CHI Conference on Human Factors in Computing Systems
One-shot Learning for Question-Answering in Gaokao History Challenge
Answering questions from university admission exams (Gaokao in Chinese) is a
challenging AI task since it requires effective representation to capture
complicated semantic relations between questions and answers. In this work, we
propose a hybrid neural model for the deep question-answering task on history
examinations. Our model employs a cooperative gated neural network to retrieve
answers with the assistance of extra labels given by a neural Turing machine
labeler. Empirical study shows that the labeler works well with only a small
training dataset and the gated mechanism is good at fetching the semantic
representation of lengthy answers. Experiments on question answering
demonstrate that the proposed model obtains substantial performance gains over
various neural model baselines in terms of multiple evaluation metrics.
Comment: Proceedings of the 27th International Conference on Computational
Linguistics (COLING 2018)
An Attentive Survey of Attention Models
The attention model has now become an important concept in neural networks that
has been researched within diverse application domains. This survey provides a
structured and comprehensive overview of the developments in modeling
attention. In particular, we propose a taxonomy which groups existing
techniques into coherent categories. We review salient neural architectures in
which attention has been incorporated, and discuss applications in which
modeling attention has shown a significant impact. We also describe how
attention has been used to improve the interpretability of neural networks.
Finally, we discuss some future research directions in attention. We hope this
survey will provide a succinct introduction to attention models and guide
practitioners while developing approaches for their applications.
Comment: accepted to Transactions on Intelligent Systems and Technology (TIST);
33 pages