Structural Attention Neural Networks for improved sentiment analysis
We introduce a tree-structured attention neural network for sentences and
small phrases and apply it to the problem of sentiment classification. Our
model expands the current recursive models by incorporating structural
information around a node of a syntactic tree using both bottom-up and top-down
information propagation. Also, the model utilizes structural attention to
identify the most salient representations during the construction of the
syntactic tree. To our knowledge, the proposed models achieve state-of-the-art
performance on the Stanford Sentiment Treebank dataset. Comment: Submitted to EACL2017 for review
Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together
Neural networks equipped with self-attention have parallelizable computation,
light-weight structure, and the ability to capture both long-range and local
dependencies. Further, their expressive power and performance can be boosted by
using a vector to measure pairwise dependency, but this requires expanding the
alignment matrix to a tensor, which results in memory and computation
bottlenecks. In this paper, we propose a novel attention mechanism called
"Multi-mask Tensorized Self-Attention" (MTSA), which is as fast and as
memory-efficient as a CNN, but significantly outperforms previous
CNN-/RNN-/attention-based models. MTSA 1) captures both pairwise (token2token)
and global (source2token) dependencies by a novel compatibility function
composed of dot-product and additive attentions, 2) uses a tensor to represent
the feature-wise alignment scores for better expressive power but only requires
parallelizable matrix multiplications, and 3) combines multi-head with
multi-dimensional attentions, and applies a distinct positional mask to each
head (subspace), so the memory and computation can be distributed to multiple
heads, each with sequential information encoded independently. The experiments
show that a CNN/RNN-free model based on MTSA achieves state-of-the-art or
competitive performance on nine NLP benchmarks with compelling memory- and
time-efficiency.
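The idea of giving each attention head its own positional mask can be illustrated with a minimal sketch. This is not MTSA itself: it is plain scaled dot-product self-attention with no learned projections (queries, keys, and values are all the raw token vectors), and the names `masked_self_attention`, `causal_mask`, and `anticausal_mask` are hypothetical, chosen only for this illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax; masked scores of -inf become weight 0.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def masked_self_attention(X, mask):
    """Scaled dot-product self-attention over token vectors X.

    mask[i][j] is True when token i may attend to token j.
    Simplifying assumption: no learned query/key/value projections.
    """
    n, d = len(X), len(X[0])
    out = []
    for i, q in enumerate(X):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            if mask[i][j] else float("-inf")
            for j, k in enumerate(X)
        ]
        w = softmax(scores)
        # Weighted average of the token vectors.
        out.append([sum(w[j] * X[j][c] for j in range(n)) for c in range(d)])
    return out

# Toy input: three 2-dimensional token vectors (made-up values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
n = len(X)

# Two positional masks, as two different heads might use:
causal_mask = [[j <= i for j in range(n)] for i in range(n)]      # self and earlier tokens
anticausal_mask = [[j >= i for j in range(n)] for i in range(n)]  # self and later tokens

head_a = masked_self_attention(X, causal_mask)
head_b = masked_self_attention(X, anticausal_mask)
```

Under the first mask the first token can only attend to itself, and under the second mask the same holds for the last token, so each head sees a different directional view of the sequence.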
Contextualized Non-local Neural Networks for Sequence Learning
Recently, a large number of neural mechanisms and models have been proposed
for sequence learning, of which self-attention, as exemplified by the
Transformer model, and graph neural networks (GNNs) have attracted much
attention. In this paper, we propose an approach that combines and draws on the
complementary strengths of these two methods. Specifically, we propose
contextualized non-local neural networks (CN), which can both
dynamically construct a task-specific structure of a sentence and leverage rich
local dependencies within a particular neighborhood.
Experimental results on ten NLP tasks in text classification, semantic
matching, and sequence labeling show that our proposed model outperforms
competitive baselines and discovers task-specific dependency structures, thus
providing better interpretability to users. Comment: Accepted by AAAI201
RNNs Implicitly Implement Tensor Product Representations
Recurrent neural networks (RNNs) can learn continuous vector representations
of symbolic structures such as sequences and sentences; these representations
often exhibit linear regularities (analogies). Such regularities motivate our
hypothesis that RNNs that show such regularities implicitly compile symbolic
structures into tensor product representations (TPRs; Smolensky, 1990), which
additively combine tensor products of vectors representing roles (e.g.,
sequence positions) and vectors representing fillers (e.g., particular words).
To test this hypothesis, we introduce Tensor Product Decomposition Networks
(TPDNs), which use TPRs to approximate existing vector representations. We
demonstrate using synthetic data that TPDNs can successfully approximate linear
and tree-based RNN autoencoder representations, suggesting that these
representations exhibit interpretable compositional structure; we explore the
settings that lead RNNs to induce such structure-sensitive representations. By
contrast, further TPDN experiments show that the representations of four models
trained to encode naturally-occurring sentences can be largely approximated
with a bag of words, with only marginal improvements from more sophisticated
structures. We conclude that TPDNs provide a powerful method for interpreting
vector representations, and that standard RNNs can induce compositional
sequence representations that are remarkably well approximated by TPRs; at the
same time, existing training tasks for sentence representation learning may not
be sufficient for inducing robust structural representations. Comment: Accepted to ICLR 201
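The tensor product representation described in this abstract (a sum of tensor products of filler vectors and role vectors) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the TPDN model: the embeddings are made-up values, the roles are one-hot sequence positions, and the helper names `tpr` and `unbind` are hypothetical.

```python
def outer(u, v):
    # Tensor (outer) product of two vectors, as a nested list.
    return [[a * b for b in v] for a in u]

def tpr(fillers, roles):
    """Sum over i of filler_i (x) role_i."""
    rows, cols = len(fillers[0]), len(roles[0])
    total = [[0.0] * cols for _ in range(rows)]
    for f, r in zip(fillers, roles):
        bound = outer(f, r)
        total = [[x + y for x, y in zip(tr, br)] for tr, br in zip(total, bound)]
    return total

def unbind(T, role):
    """Recover the filler bound to `role`; exact when the roles are orthonormal."""
    return [sum(t * r for t, r in zip(row, role)) for row in T]

# Hypothetical filler embeddings for two symbols.
filler_vecs = {"a": [1.0, 0.0, 2.0], "b": [0.0, 3.0, 1.0]}
# One-hot role vectors for sequence positions 0 and 1 (an assumption).
role_vecs = [[1.0, 0.0], [0.0, 1.0]]

seq = ["b", "a"]
T = tpr([filler_vecs[w] for w in seq], role_vecs)

first = unbind(T, role_vecs[0])   # filler at position 0
second = unbind(T, role_vecs[1])  # filler at position 1
```

Because the role vectors here are orthonormal, unbinding returns each filler exactly, so the single matrix `T` encodes both which symbols occur and where.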