Semantic Role Labeling with Associated Memory Network
Semantic role labeling (SRL) is the task of recognizing all predicate-argument pairs of a sentence, and its performance has stalled despite a series of recent works. This paper proposes a novel syntax-agnostic SRL model enhanced by an associated memory network (AMN), which uses inter-sentence attention over label-known associated sentences as a form of memory to further improve dependency-based SRL. Specifically, we use sentences and their labels from the training set as an associated memory cue to help label the target sentence. Furthermore, we compare several strategies for selecting associated sentences and several label merging methods in the AMN, so as to find and exploit the labels of associated sentences while attending to them. By leveraging this attentive memory over known training data, our full model reaches state-of-the-art results on the CoNLL-2009 benchmark datasets in the syntax-agnostic setting, pointing to an effective new line of SRL enhancement beyond exploiting external resources such as pre-trained language models.
Comment: Published at NAACL 2019; this is the camera-ready version; code is available at https://github.com/Frozenmad/AMN_SR
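To make the inter-sentence memory idea concrete, here is a minimal sketch of one way a target sentence can attend over tokens of label-known associated sentences and merge their label embeddings into a memory feature. The module name, tensor shapes, dot-product attention, and concatenation-based merge are illustrative assumptions, not the paper's exact formulation (see the linked repository for that).

```python
import torch
import torch.nn as nn


class AssociatedMemoryAttention(nn.Module):
    """Attend from target tokens to tokens of label-known associated sentences."""

    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        self.label_emb = nn.Embedding(num_labels, hidden_dim)
        self.scale = hidden_dim ** -0.5

    def forward(self, target, assoc, assoc_labels):
        # target:       (batch, T, hidden)  target-sentence token states
        # assoc:        (batch, M, hidden)  token states of associated sentences
        # assoc_labels: (batch, M)          known SRL label ids of those tokens
        scores = torch.bmm(target, assoc.transpose(1, 2)) * self.scale  # (batch, T, M)
        attn = scores.softmax(dim=-1)
        # Merge the known labels of the attended tokens into a per-token memory vector.
        label_mem = torch.bmm(attn, self.label_emb(assoc_labels))       # (batch, T, hidden)
        # Concatenate the memory with the original states for downstream SRL scoring.
        return torch.cat([target, label_mem], dim=-1)                   # (batch, T, 2*hidden)
```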
Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together
Neural networks equipped with self-attention have parallelizable computation, a light-weight structure, and the ability to capture both long-range and local dependencies. Moreover, their expressive power and performance can be boosted by using a vector to measure pairwise dependency, but this requires expanding the alignment matrix to a tensor, which causes memory and computation bottlenecks. In this paper, we propose a novel attention mechanism called "Multi-mask Tensorized Self-Attention" (MTSA), which is as fast and as memory-efficient as a CNN, but significantly outperforms previous CNN-, RNN-, and attention-based models. MTSA 1) captures both pairwise (token2token) and global (source2token) dependencies with a novel compatibility function composed of dot-product and additive attention, 2) uses a tensor to represent the feature-wise alignment scores for better expressive power while requiring only parallelizable matrix multiplications, and 3) combines multi-head with multi-dimensional attention and applies a distinct positional mask to each head (subspace), so that memory and computation are distributed across multiple heads, each encoding sequential information independently. The experiments show that a CNN/RNN-free model based on MTSA achieves state-of-the-art or competitive performance on nine NLP benchmarks with compelling memory and time efficiency.
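The claim that feature-wise (tensorized) alignment needs only matrix multiplications can be illustrated by a softmax factorization: since exp(a + b) = exp(a) * exp(b), combining scalar token2token scores with per-feature source2token scores never requires materializing an (n, n, d) tensor. The single-head sketch below is an assumption-laden illustration of that idea; variable names, the stabilization, and the exact normalization are not taken from the released MTSA implementation.

```python
import torch


def tensorized_attention(q, k, v, s2t_scores, pos_mask):
    # q, k, v:     (n, d)  per-head query/key/value projections of the sequence
    # s2t_scores:  (n, d)  additive source2token scores, one per token and feature
    # pos_mask:    (n, n)  positional mask for this head (0 where allowed, -inf where masked)
    d = q.size(-1)
    t2t = q @ k.t() / d ** 0.5 + pos_mask                                    # scalar token2token scores (n, n)
    # softmax over j of (t2t[i, j] + s2t[j, f]) factorizes, so the (n, n, d)
    # score tensor is never built; two matmuls suffice.
    p = torch.exp(t2t - t2t.max(dim=-1, keepdim=True).values)               # (n, n), numerically stabilized
    e = torch.exp(s2t_scores - s2t_scores.max(dim=0, keepdim=True).values)  # (n, d)
    out = (p @ (e * v)) / (p @ e).clamp_min(1e-9)                           # feature-wise weighted values (n, d)
    return out
```

In a multi-head setting, each head would receive its own pos_mask (e.g. forward, backward, or unmasked), which is how the "multi-mask" part of the mechanism distributes positional information across subspaces.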