1,487 research outputs found
Temporal Attention-Gated Model for Robust Sequence Classification
Typical techniques for sequence classification are designed for
well-segmented sequences which have been edited to remove noisy or irrelevant
parts. Therefore, such methods cannot be easily applied to the noisy sequences
expected in real-world applications. In this paper, we present the Temporal
Attention-Gated Model (TAGM) which integrates ideas from attention models and
gated recurrent networks to better deal with noisy or unsegmented sequences.
Specifically, we extend the concept of the attention model to measure the
relevance of each observation (time step) in a sequence. We then use a novel gated
recurrent network to learn the hidden representation for the final prediction.
An important advantage of our approach is interpretability since the temporal
attention weights provide a meaningful value for the salience of each time step
in the sequence. We demonstrate the merits of our TAGM approach, both for
prediction accuracy and interpretability, on three different tasks: spoken
digit recognition, text-based sentiment analysis and visual event recognition.
Comment: Accepted by CVPR 201
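For intuition, here is a minimal PyTorch sketch of the gating idea described above (hypothetical module and variable names, not the authors' released code): a relevance score a_t in [0, 1] is computed for each time step and used to interpolate between a candidate recurrent update and the previous hidden state, so low-salience steps barely change the representation. In the paper the relevance scores come from a bi-directional recurrent network over the whole sequence; this sketch uses a simpler per-step scorer.

    import torch
    import torch.nn as nn

    class AttentionGatedRecurrence(nn.Module):
        # Simplified sketch of an attention-gated recurrence, not the TAGM reference code.
        def __init__(self, input_dim, hidden_dim):
            super().__init__()
            # per-step relevance score a_t in [0, 1]
            self.attn = nn.Sequential(
                nn.Linear(input_dim, hidden_dim), nn.Tanh(),
                nn.Linear(hidden_dim, 1), nn.Sigmoid(),
            )
            self.cell = nn.RNNCell(input_dim, hidden_dim)

        def forward(self, x):                            # x: (batch, time, input_dim)
            batch, steps, _ = x.shape
            h = x.new_zeros(batch, self.cell.hidden_size)
            gates = []
            for t in range(steps):
                a_t = self.attn(x[:, t])                 # (batch, 1) salience of step t
                h_cand = self.cell(x[:, t], h)           # candidate recurrent update
                h = a_t * h_cand + (1.0 - a_t) * h       # gated interpolation
                gates.append(a_t)
            # final state for classification, plus per-step weights for interpretability
            return h, torch.cat(gates, dim=1)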
Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations
We propose a novel data augmentation for labeled sentences called contextual
augmentation. We assume an invariance: sentences remain natural even if some of
their words are replaced with other words that stand in paradigmatic relations to
them. We stochastically replace words with other words predicted by a
bi-directional language model at the corresponding positions. Words predicted
from the context are numerous yet still appropriate substitutes for the original
words. Furthermore, we retrofit a language model with a
label-conditional architecture, which allows the model to augment sentences
without breaking label compatibility. In experiments on six text classification
tasks of various kinds, we demonstrate that the proposed method improves
classifiers based on convolutional or recurrent neural networks.
Comment: NAACL 201
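As a rough illustration of the augmentation procedure (not the authors' implementation), the sketch below stochastically masks word positions and samples replacements predicted from the surrounding context. A pretrained masked language model from the Hugging Face transformers library stands in for the paper's label-conditional bi-directional LSTM language model, and the label-conditioning step is omitted.

    import random
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    def augment(sentence, replace_prob=0.15, top_k=5):
        # Replace each word with probability replace_prob by a word predicted from context.
        words = sentence.split()
        out = list(words)
        for i in range(len(words)):
            if random.random() >= replace_prob:
                continue
            masked = " ".join(words[:i] + [fill_mask.tokenizer.mask_token] + words[i + 1:])
            candidates = fill_mask(masked, top_k=top_k)      # top-k contextual predictions
            out[i] = random.choice(candidates)["token_str"]  # sample one substitute
        return " ".join(out)

    print(augment("the actors are fantastic"))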
Learning text representation using recurrent convolutional neural network with highway layers
Recently, the rapid development of word embedding and neural networks has
brought new inspiration to various NLP and IR tasks. In this paper, we describe
a staged hybrid model combining Recurrent Convolutional Neural Networks (RCNN)
with highway layers. The highway network module, incorporated in the middle
stage, takes the output of the bi-directional Recurrent Neural Network (Bi-RNN)
module in the first stage and provides the input to the Convolutional Neural
Network (CNN) module in the last stage. The experiments show that our model
outperforms common neural network models (CNN, RNN, Bi-RNN) on a sentiment
analysis task. In addition, an analysis of how sequence length influences the RCNN
with highway layers shows that our model can learn good representations for
long texts.
Comment: Neu-IR '16 SIGIR Workshop on Neural Information Retrieval
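A minimal PyTorch sketch of the staged pipeline described above, with hypothetical layer sizes chosen only for illustration: a Bi-RNN encodes the token embeddings, a highway layer transforms each position, and a 1-D CNN with max-over-time pooling produces the representation fed to the classifier.

    import torch
    import torch.nn as nn

    class Highway(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.transform = nn.Linear(dim, dim)
            self.gate = nn.Linear(dim, dim)

        def forward(self, x):
            t = torch.sigmoid(self.gate(x))               # carry/transform gate
            return t * torch.relu(self.transform(x)) + (1.0 - t) * x

    class RCNNHighway(nn.Module):
        def __init__(self, vocab_size, emb_dim=128, hidden=64, n_classes=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.birnn = nn.GRU(emb_dim, hidden, bidirectional=True, batch_first=True)
            self.highway = Highway(2 * hidden)
            self.conv = nn.Conv1d(2 * hidden, 100, kernel_size=3, padding=1)
            self.fc = nn.Linear(100, n_classes)

        def forward(self, tokens):                        # tokens: (batch, seq_len)
            h, _ = self.birnn(self.embed(tokens))         # stage 1: Bi-RNN, (batch, seq, 2*hidden)
            h = self.highway(h)                           # stage 2: highway transform per position
            h = self.conv(h.transpose(1, 2))              # stage 3: CNN over the sequence
            return self.fc(h.max(dim=2).values)           # max-over-time pooling -> class logits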
Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together
Neural networks equipped with self-attention have parallelizable computation,
light-weight structure, and the ability to capture both long-range and local
dependencies. Further, their expressive power and performance can be boosted by
using a vector to measure each pairwise dependency, but this requires expanding
the alignment matrix into a tensor, which results in memory and computation
bottlenecks. In this paper, we propose a novel attention mechanism called
"Multi-mask Tensorized Self-Attention" (MTSA), which is as fast and as
memory-efficient as a CNN, but significantly outperforms previous
CNN-/RNN-/attention-based models. MTSA 1) captures both pairwise (token2token)
and global (source2token) dependencies by a novel compatibility function
composed of dot-product and additive attentions, 2) uses a tensor to represent
the feature-wise alignment scores for better expressive power but only requires
parallelizable matrix multiplications, and 3) combines multi-head with
multi-dimensional attentions, and applies a distinct positional mask to each
head (subspace), so the memory and computation can be distributed to multiple
heads, each with sequential information encoded independently. The experiments
show that a CNN/RNN-free model based on MTSA achieves state-of-the-art or
competitive performance on nine NLP benchmarks with compelling memory- and
time-efficiency.
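The sketch below gives one possible reading of the computation for a single MTSA head, with hypothetical names and shapes: a scalar dot-product (token2token) score plus a per-head positional mask is combined with a feature-wise additive (source2token) score, and the feature-wise softmax over positions is factorized into ordinary matrix multiplications so the full n x n x d alignment tensor is never materialized.

    import torch

    def mtsa_head(q, k, v, s2t, pos_mask):
        # q, k, v, s2t: (n, d) for one head; pos_mask: (n, n) additive positional mask.
        d = q.shape[-1]
        t2t = (q @ k.t()) / d ** 0.5 + pos_mask                          # pairwise scalar scores
        e_pair = torch.exp(t2t - t2t.max(dim=1, keepdim=True).values)    # stabilized exp, (n, n)
        e_feat = torch.exp(s2t - s2t.max(dim=0, keepdim=True).values)    # feature-wise scores, (n, d)
        num = e_pair @ (e_feat * v)                                      # weighted values, (n, d)
        den = e_pair @ e_feat                                            # per-feature normalizer
        return num / den                                                 # softmax over positions, per feature

    # toy usage with random tensors
    n, d = 5, 8
    q, k, v, s2t = (torch.randn(n, d) for _ in range(4))
    out = mtsa_head(q, k, v, s2t, pos_mask=torch.zeros(n, n))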