Hashing based Answer Selection
Answer selection is an important subtask of question answering (QA), where
deep models usually achieve better performance. Most deep models adopt
question-answer interaction mechanisms, such as attention, to obtain vector
representations for answers. When these interaction-based deep models are
deployed for online prediction, the representations of all answers must be
recalculated for each question. This procedure is time-consuming for deep
models with complex encoders like BERT, which usually achieve better accuracy
than simple encoders.
simple encoders. One possible solution is to store the matrix representation
(encoder output) of each answer in memory to avoid recalculation. But this will
bring large memory cost. In this paper, we propose a novel method, called
hashing based answer selection (HAS), to tackle this problem. HAS adopts a
hashing strategy to learn a binary matrix representation for each answer, which
can dramatically reduce the memory cost for storing the matrix representations
of answers. Hence, HAS can adopt complex encoders like BERT in the model, but
the online prediction of HAS is still fast with a low memory cost. Experimental
results on three popular answer selection datasets show that HAS can outperform
existing models to achieve state-of-the-art performance
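To make the memory saving concrete, here is a minimal sketch of the binarize-and-pack idea in NumPy. The sign-based binarization and the byte packing are assumptions for illustration; HAS's actual hashing layer and training objective may differ.

```python
# Minimal sketch of the memory arithmetic behind binary answer hashing.
# Sign-based binarization and bit packing are illustrative assumptions,
# not HAS's exact hashing layer or training loss.
import numpy as np

def binarize(enc_out: np.ndarray) -> np.ndarray:
    """Binarize an encoder output matrix (seq_len x dim) to {0, 1} bits."""
    return (enc_out > 0).astype(np.uint8)

def pack(bits: np.ndarray) -> np.ndarray:
    """Pack bits into bytes along the feature axis: 32x smaller than float32."""
    return np.packbits(bits, axis=-1)

# A 40-token answer from a 768-dim encoder (BERT-base sized output)
enc_out = np.random.randn(40, 768).astype(np.float32)
packed = pack(binarize(enc_out))
print(enc_out.nbytes, packed.nbytes)  # 122880 vs 3840 bytes (32x reduction)
```

Storing packed bits instead of float32 activations cuts the per-answer footprint by a factor of 32, which is the source of the low memory cost described above.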
Cross Temporal Recurrent Networks for Ranking Question Answer Pairs
Temporal gates play a significant role in modern recurrent neural
encoders, enabling fine-grained control over recursive compositional operations
across time. In recurrent models such as the long short-term memory (LSTM),
temporal gates control how much information is retained or discarded over
time, not only shaping the learned representations but also protecting
against vanishing gradients.
This paper explores the idea of learning temporal gates for sequence pairs
(question and answer), jointly influencing the learned representations in a
pairwise manner. In our approach, temporal gates are learned via 1D
convolutional layers and then cross-applied between question and
answer for joint learning. Empirically, we show that this conceptually simple
sharing of temporal gates can lead to competitive performance across multiple
benchmarks. Intuitively, our network can be interpreted as
learning representations of question and answer pairs in which each side is
aware of what the other is remembering or forgetting, i.e., pairwise temporal
gating. Via extensive experiments, we show that our proposed model achieves
state-of-the-art performance on two community-based QA datasets and competitive
performance on one factoid-based QA dataset.
Comment: Accepted to AAAI201
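As a rough illustration of pairwise temporal gating, the sketch below learns position-wise gates for each sequence with a 1D convolution and cross-applies them to the other sequence. The mean-pooling used to reconcile different sequence lengths, and the placement of gating outside the recurrent cell, are simplifying assumptions rather than the paper's exact architecture.

```python
# Minimal sketch of cross temporal gating (assumed design: gates from one
# sequence modulate the other; the paper's gating and its placement inside
# the recurrent cell may differ).
import torch
import torch.nn as nn

class CrossGate(nn.Module):
    def __init__(self, dim: int, kernel: int = 3):
        super().__init__()
        # 1D convolutions produce position-wise gate logits for each sequence
        self.conv_q = nn.Conv1d(dim, dim, kernel, padding=kernel // 2)
        self.conv_a = nn.Conv1d(dim, dim, kernel, padding=kernel // 2)

    def forward(self, q, a):
        # q: (batch, len_q, dim), a: (batch, len_a, dim)
        gate_q = torch.sigmoid(self.conv_q(q.transpose(1, 2)))  # (b, dim, len_q)
        gate_a = torch.sigmoid(self.conv_a(a.transpose(1, 2)))  # (b, dim, len_a)
        # cross-apply: pool each gate over time so it can modulate the other
        # sequence despite length mismatch (a simplifying assumption here)
        q_gated = q * gate_a.mean(dim=2, keepdim=True).transpose(1, 2)
        a_gated = a * gate_q.mean(dim=2, keepdim=True).transpose(1, 2)
        return q_gated, a_gated

q = torch.randn(2, 20, 64)  # toy question batch
a = torch.randn(2, 30, 64)  # toy answer batch
q_g, a_g = CrossGate(64)(q, a)
print(q_g.shape, a_g.shape)  # torch.Size([2, 20, 64]) torch.Size([2, 30, 64])
```

The essential point the sketch preserves is the crossing: the question's representation is modulated by gates computed from the answer, and vice versa.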
Attentive Convolution: Equipping CNNs with RNN-style Attention Mechanisms
In NLP, convolutional neural networks (CNNs) have benefited less than
recurrent neural networks (RNNs) from attention mechanisms. We hypothesize that
this is because the attention in CNNs has been mainly implemented as attentive
pooling (i.e., it is applied to pooling) rather than as attentive convolution
(i.e., it is integrated into convolution). Convolution is the differentiator of
CNNs in that it can powerfully model the higher-level representation of a word
by taking into account its local fixed-size context in the input text t^x. In
this work, we propose an attentive convolution network, ATTCONV. It extends the
context scope of the convolution operation, deriving higher-level features for
a word not only from its local context but also from information extracted from
nonlocal context by the attention mechanism commonly used in RNNs. This
nonlocal context can come (i) from parts of the input text t^x that are distant
or (ii) from extra (i.e., external) contexts t^y. Experiments on sentence
modeling with zero-context (sentiment analysis), single-context (textual
entailment) and multiple-context (claim verification) demonstrate the
effectiveness of ATTCONV in sentence representation learning with the
incorporation of context. In particular, attentive convolution outperforms
attentive pooling and is a strong competitor to popular attentive RNNs.
Comment: Camera-ready for TACL. 16 page
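As a rough sketch of attentive convolution, the module below combines each word's local convolutional feature with a representation gathered by attention over a (possibly external) context. The dot-product scoring and additive combination are illustrative assumptions; ATTCONV's actual operation integrates attention into the convolution more tightly.

```python
# Minimal sketch of attentive convolution: local conv features plus
# attention-gathered nonlocal context (assumed scoring and combination).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveConv(nn.Module):
    def __init__(self, dim: int, kernel: int = 3):
        super().__init__()
        self.local = nn.Conv1d(dim, dim, kernel, padding=kernel // 2)  # local context
        self.proj = nn.Linear(dim, dim)  # projects the attended nonlocal context

    def forward(self, x, ctx):
        # x:   (batch, len_x, dim)  input text t^x
        # ctx: (batch, len_c, dim)  context (t^x itself, or an external t^y)
        scores = torch.bmm(x, ctx.transpose(1, 2))             # dot-product attention
        attended = torch.bmm(F.softmax(scores, dim=-1), ctx)   # (b, len_x, dim)
        local = self.local(x.transpose(1, 2)).transpose(1, 2)  # (b, len_x, dim)
        # each word's feature combines its local convolution with its
        # attention-gathered nonlocal context
        return torch.relu(local + self.proj(attended))

x = torch.randn(2, 15, 64)
ctx = torch.randn(2, 25, 64)
print(AttentiveConv(64)(x, ctx).shape)  # torch.Size([2, 15, 64])
```

Passing x itself as ctx corresponds to nonlocal context within t^x; passing a second text corresponds to the external context t^y.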