HEGEL: Hypergraph Transformer for Long Document Summarization
Extractive summarization for long documents is challenging due to the
extended structured input context. Long-distance sentence dependencies hinder cross-sentence relation modeling, a critical step of extractive summarization. This paper proposes HEGEL, a hypergraph neural network for long
document summarization by capturing high-order cross-sentence relations. HEGEL
updates and learns effective sentence representations with hypergraph
transformer layers and fuses different types of sentence dependencies,
including latent topics, keyword coreference, and section structure. We
validate HEGEL by conducting extensive experiments on two benchmark datasets,
and experimental results demonstrate the effectiveness and efficiency of HEGEL.
Comment: EMNLP 202
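The hypergraph transformer layer described above passes messages from sentences to hyperedges (topics, keyword coreference, sections) and back. The sketch below illustrates that node-edge-node attention pattern in PyTorch, assuming a binary incidence matrix in which every sentence and every hyperedge has at least one member; it is an illustration of the general mechanism, not the authors' implementation.

```python
import torch
import torch.nn as nn

class HypergraphAttentionLayer(nn.Module):
    """Illustrative node -> hyperedge -> node attention step over a sentence hypergraph."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.sent_to_edge = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.edge_to_sent = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, sents, incidence):
        # sents: (1, n_sent, dim) sentence vectors
        # incidence: (n_edge, n_sent) binary hyperedge membership (assumed non-empty rows/cols)
        edge_init = (incidence.float() @ sents[0]) / incidence.sum(-1, keepdim=True)
        edge_init = edge_init.unsqueeze(0)
        # each hyperedge attends over its member sentences only
        edges, _ = self.sent_to_edge(edge_init, sents, sents, attn_mask=(incidence == 0))
        # each sentence attends over the hyperedges it belongs to
        out, _ = self.edge_to_sent(sents, edges, edges, attn_mask=(incidence.t() == 0))
        return self.norm(sents + out)

# toy usage: 5 sentences, 3 hyperedges (e.g. two topics and one section)
layer = HypergraphAttentionLayer(dim=64)
sents = torch.randn(1, 5, 64)
H = torch.tensor([[1, 1, 0, 0, 0],
                  [0, 0, 1, 1, 1],
                  [1, 1, 1, 0, 0]])
updated = layer(sents, H)   # (1, 5, 64) refined sentence representations
```

Stacking several such layers and scoring the refined sentence vectors would yield an extractive selector in the spirit of the abstract.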
Joint Learning of Local and Global Features for Aspect-based Sentiment Classification
Aspect-based sentiment classification (ASC) aims to judge the sentiment
polarity conveyed by the given aspect term in a sentence. The sentiment
polarity is not only determined by the local context but also related to the
words far away from the given aspect term. Most recent attention-based models cannot sufficiently distinguish which words they should pay more attention to in some cases. Meanwhile, graph-based models have been introduced into ASC to encode syntactic dependency tree information. However, these models do
not fully leverage syntactic dependency trees as they neglect to incorporate
dependency relation tag information into representation learning effectively.
In this paper, we address these problems by effectively modeling the local and
global features. First, we design a local encoder containing a Gaussian mask layer and a covariance self-attention layer. The Gaussian mask layer adaptively adjusts the receptive field around the aspect term to de-emphasize unrelated words and pay more attention to local information. The covariance self-attention layer can distinguish the attention weights of different words more distinctly. Furthermore, we propose a dual-level graph
attention network as a global encoder by fully employing dependency tag
information to capture long-distance information effectively. Our model
achieves state-of-the-art performance on both SemEval 2014 and Twitter
datasets.
Comment: under review
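The Gaussian mask layer described above re-weights tokens by their distance to the aspect term. Below is a minimal sketch of that idea with a single learnable width; the exact parameterization in the paper may differ.

```python
import torch
import torch.nn as nn

class GaussianMask(nn.Module):
    """Distance-based soft mask centred on the aspect term (illustrative sketch)."""
    def __init__(self, init_sigma=3.0):
        super().__init__()
        # learnable width: larger sigma -> wider receptive field around the aspect term
        self.log_sigma = nn.Parameter(torch.tensor(float(init_sigma)).log())

    def forward(self, hidden, aspect_pos):
        # hidden: (batch, seq_len, dim); aspect_pos: (batch,) index of the aspect term
        seq_len = hidden.size(1)
        pos = torch.arange(seq_len, device=hidden.device).unsqueeze(0)   # (1, seq_len)
        dist = (pos - aspect_pos.unsqueeze(1)).float()                    # signed token distance
        sigma = self.log_sigma.exp()
        weight = torch.exp(-dist.pow(2) / (2 * sigma.pow(2)))             # (batch, seq_len)
        return hidden * weight.unsqueeze(-1)                              # de-emphasise far words

# toy usage
mask = GaussianMask()
out = mask(torch.randn(2, 10, 32), aspect_pos=torch.tensor([3, 7]))   # (2, 10, 32)
```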
Eliminating Gradient Conflict in Reference-based Line-Art Colorization
Reference-based line-art colorization is a challenging task in computer
vision. The color, texture, and shading are rendered based on an abstract
sketch, which heavily relies on the precise long-range dependency modeling
between the sketch and reference. Popular techniques to bridge the cross-modal
information and model the long-range dependency employ the attention mechanism.
However, in the context of reference-based line-art colorization, several techniques intensify the existing training difficulty of attention, for instance the self-supervised training protocol and GAN-based losses. To understand the training instability, we examine the gradient flow of attention and observe gradient conflict among attention branches. This phenomenon motivates us to alleviate the gradient issue by preserving the dominant gradient branch while removing the conflicting ones. Using this training strategy, we propose a novel attention mechanism, Stop-Gradient Attention (SGA), which outperforms the
attention baseline by a large margin with better training stability. Compared
with state-of-the-art modules in line-art colorization, our approach
demonstrates significant improvements in Fréchet Inception Distance (FID, up
to 27.21%) and structural similarity index measure (SSIM, up to 25.67%) on
several benchmarks. The code of SGA is available at
https://github.com/kunkun0w0/SGA .
Comment: Accepted by ECCV202
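The core idea is to let gradients flow through only the dominant attention branch and detach the conflicting one. A minimal sketch of stop-gradient cross-attention follows; which tensors are detached here is an assumption made for illustration, and the actual SGA design is in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StopGradCrossAttention(nn.Module):
    """Cross-attention from sketch features to reference features with a stop-gradient
    on one branch (illustration of the stop-gradient idea, not the released SGA)."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, sketch_feat, ref_feat):
        # sketch_feat, ref_feat: (batch, tokens, dim)
        q = self.q(sketch_feat)
        # stop the gradient through the key branch so only the value branch
        # (assumed dominant here) is updated through the attention map
        k = self.k(ref_feat.detach())
        v = self.v(ref_feat)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1)
        return attn @ v

# toy usage
sga = StopGradCrossAttention(dim=64)
out = sga(torch.randn(1, 256, 64), torch.randn(1, 256, 64))   # (1, 256, 64)
```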
Recurrent Memory Networks for Language Modeling
Recurrent Neural Networks (RNNs) have obtained excellent results in many
natural language processing (NLP) tasks. However, understanding and
interpreting the source of this success remains a challenge. In this paper, we
propose the Recurrent Memory Network (RMN), a novel RNN architecture that not only
amplifies the power of RNN but also facilitates our understanding of its
internal functioning and allows us to discover underlying patterns in data. We
demonstrate the power of RMN on language modeling and sentence completion
tasks. On language modeling, RMN outperforms the Long Short-Term Memory (LSTM) network on three large German, Italian, and English datasets. Additionally, we
perform in-depth analysis of various linguistic dimensions that RMN captures.
On the Sentence Completion Challenge, for which it is essential to capture sentence
coherence, our RMN obtains 69.2% accuracy, surpassing the previous
state-of-the-art by a large margin.
Comment: 8 pages, 6 figures. Accepted at NAACL 201
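As described, RMN couples a recurrent network with a memory block it can inspect. Below is a minimal sketch, assuming the memory holds the embeddings of the most recent words and the LSTM state attends over them; the window size, gating, and output composition are assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentMemoryBlock(nn.Module):
    """LSTM state attends over the embeddings of the last `window` words (sketch)."""
    def __init__(self, vocab, dim, window=15):
        super().__init__()
        self.window = window
        self.embed = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(2 * dim, vocab)

    def forward(self, tokens):
        # tokens: (batch, seq_len) word ids -> next-word logits per position
        emb = self.embed(tokens)               # (batch, seq_len, dim)
        h, _ = self.lstm(emb)                  # (batch, seq_len, dim)
        logits = []
        for t in range(tokens.size(1)):
            lo = max(0, t - self.window + 1)
            mem = emb[:, lo:t + 1]             # memory of the most recent words
            score = torch.einsum('bd,bmd->bm', h[:, t], mem)
            read = torch.einsum('bm,bmd->bd', F.softmax(score, dim=-1), mem)
            logits.append(self.out(torch.cat([h[:, t], read], dim=-1)))
        return torch.stack(logits, dim=1)      # (batch, seq_len, vocab)

# toy usage
model = RecurrentMemoryBlock(vocab=10000, dim=128)
logits = model(torch.randint(0, 10000, (2, 20)))   # (2, 20, 10000)
```

Inspecting the attention weights over the memory is what allows the kind of linguistic analysis the abstract mentions.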
Dependency relations as source context in phrase-based SMT
The Phrase-Based Statistical Machine Translation (PB-SMT) model has recently begun to include source context modeling, under the assumption that the proper lexical
choice of an ambiguous word can be determined from the context in which it appears. Various types of lexical and syntactic features such as words, parts-of-speech, and
supertags have been explored as effective source context in SMT. In this paper, we show that position-independent syntactic dependency relations of the head of a source phrase can be modeled as useful source context to improve target phrase selection and thereby improve the overall performance of PB-SMT. On a Dutch-English translation task, combining dependency relations with syntactic contextual features (part-of-speech), we achieved a 1.0 BLEU point (Papineni et al., 2002) improvement (3.1% relative) over the baseline.
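The feature in question is the dependency relation attached to the head of a source phrase, used independently of its position. A small sketch of extracting such a label from a pre-parsed sentence follows; the parse format and the head-selection heuristic are assumptions for illustration only.

```python
# Sketch: derive the dependency relation of a source phrase's head word as a
# context feature for phrase translation. The parse format (token, head index,
# relation) and the head-selection heuristic are assumptions, not the original system.

def phrase_head_relation(parse, start, end):
    """parse: list of (token, head_idx, deprel); [start, end) is the source phrase span.
    The phrase head is taken to be the token whose syntactic head lies outside the span."""
    for i in range(start, end):
        _, head, rel = parse[i]
        if head < start or head >= end:
            return rel          # relation linking the phrase to the rest of the sentence
    return parse[start][2]      # fallback: relation of the first token

# toy example: "de hond bijt de man" with a hand-written parse (0-indexed heads, -1 = root)
parse = [("de", 1, "det"), ("hond", 2, "nsubj"), ("bijt", -1, "root"),
         ("de", 4, "det"), ("man", 2, "obj")]
print(phrase_head_relation(parse, 0, 2))   # "nsubj" for the phrase "de hond"
```

The extracted label can then be added as an extra source-side feature when scoring candidate target phrases.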
Neural End-to-End Learning for Computational Argumentation Mining
We investigate neural techniques for end-to-end computational argumentation
mining (AM). We frame AM both as a token-based dependency parsing and as a
token-based sequence tagging problem, including a multi-task learning setup.
Contrary to models that operate on the argument component level, we find that framing AM as dependency parsing leads to subpar performance. In contrast, less complex (local) tagging models based on BiLSTMs perform robustly across classification scenarios and are able to capture long-range dependencies
inherent to the AM problem. Moreover, we find that jointly learning 'natural'
subtasks, in a multi-task learning setup, improves performance.
Comment: To be published at ACL 201
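The sequence-tagging framing labels each token with its argument-component role. A minimal BiLSTM tagger sketch is below; the BIO label set over claims and premises and the model sizes are assumptions, and the paper's full system (including the multi-task setup) is richer.

```python
import torch
import torch.nn as nn

# token-level labels assumed for illustration: Outside, Begin/Inside of Claim / Premise
TAGS = ["O", "B-Claim", "I-Claim", "B-Premise", "I-Premise"]

class BiLSTMTagger(nn.Module):
    """Minimal BiLSTM sequence tagger for argumentation mining (sketch)."""
    def __init__(self, vocab_size, emb_dim=100, hidden=128, n_tags=len(TAGS)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, n_tags)

    def forward(self, tokens):
        # tokens: (batch, seq_len) word ids -> per-token tag logits
        emb = self.embed(tokens)
        h, _ = self.lstm(emb)
        return self.proj(h)                     # (batch, seq_len, n_tags)

# toy usage
model = BiLSTMTagger(vocab_size=10000)
logits = model(torch.randint(0, 10000, (2, 30)))   # (2, 30, 5)
```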