Attention Meets Perturbations: Robust and Interpretable Attention with Adversarial Training
Although attention mechanisms have been applied to a variety of deep learning
models and have been shown to improve prediction performance, they have been
reported to be vulnerable to perturbations. To overcome this vulnerability, we
draw on adversarial training (AT), a powerful regularization technique for
enhancing model robustness. In this paper, we propose a general training
technique for natural language processing tasks, comprising AT for attention
(Attention AT) and more interpretable AT for attention (Attention iAT). The
proposed techniques improve prediction performance and model interpretability
by exploiting the attention mechanisms with AT. In particular, Attention iAT
boosts those advantages by introducing an adversarial perturbation that
enhances the differences in attention within a sentence. Evaluation
experiments on ten open datasets revealed that AT for attention mechanisms,
especially Attention iAT, achieved (1) the best performance in nine out of
ten tasks and (2) more interpretable attention (i.e., the resulting attention
correlated more strongly with gradient-based word importance) on all tasks.
Additionally, the proposed techniques are (3) much less dependent on the
perturbation size used in AT. Our code is available at
https://github.com/shunk031/attention-meets-perturbation
Comment: 12 pages, 4 figures. Accepted by IEEE Access on Jun. 21, 202
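
To make the idea concrete, here is a minimal PyTorch-style sketch of
adversarial training applied to attention scores. It is an illustration only:
the model interface (a forward pass that returns the attention scores and
accepts an additive attn_perturb term) and the epsilon value are assumptions,
not the authors' implementation, which is available at the repository above.

    import torch
    import torch.nn.functional as F

    def attention_at_loss(model, x, y, epsilon=0.02):
        # Clean forward pass; the model is assumed to return both the
        # class logits and the attention scores it used internally.
        logits, attn_scores = model(x, return_attention=True)
        clean_loss = F.cross_entropy(logits, y)

        # Direction in attention-score space that most increases the loss,
        # normalized to a fixed L2 norm (fast-gradient-style perturbation).
        grad, = torch.autograd.grad(clean_loss, attn_scores, retain_graph=True)
        r_adv = epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)

        # Second forward pass with adversarially perturbed attention scores.
        adv_logits = model(x, attn_perturb=r_adv.detach())
        adv_loss = F.cross_entropy(adv_logits, y)

        # Standard AT objective: train on clean plus adversarial loss.
        return clean_loss + adv_loss

The perturbation direction is the loss gradient in attention space rather
than in input space, which is what distinguishes AT for attention from
conventional input-level adversarial training.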
Making Attention Mechanisms More Robust and Interpretable with Virtual Adversarial Training
Although attention mechanisms improve the prediction performance of deep
learning models, they are vulnerable to perturbations, which can degrade both
prediction performance and model interpretability. Adversarial training (AT)
for attention mechanisms has successfully reduced these drawbacks by
considering adversarial perturbations. However, this technique requires label
information, and thus its use is limited to supervised settings. In this
study, we explore incorporating virtual AT (VAT) into attention mechanisms,
by which adversarial perturbations can be computed even from unlabeled data.
To realize this approach, we propose two general training techniques, namely
VAT for attention mechanisms (Attention VAT) and "interpretable" VAT for
attention mechanisms (Attention iVAT), which extend AT for attention
mechanisms to a semi-supervised setting. In particular, Attention iVAT
focuses on differences in attention; thus, it can efficiently learn clearer
attention and improve model interpretability even with unlabeled data.
Empirical experiments on six public datasets revealed that our techniques
provide better prediction performance than conventional AT-based and
VAT-based techniques, as well as stronger agreement with human-provided
evidence for important words in sentences. Moreover, our proposal offers
these advantages without requiring careful selection of the unlabeled data:
even if a model trained with our VAT-based technique uses unlabeled data from
a source other than the target task, both prediction performance and model
interpretability improve.
Comment: 18 pages, 3 figures. Accepted for publication in Springer Applied
Intelligence (APIN).
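
The following sketch shows how the virtual-adversarial variant removes the
dependence on labels: the perturbation on the attention scores is chosen to
maximize the KL divergence between the model's own output distributions
before and after perturbation, so it can be computed on unlabeled data. The
model interface and the hyperparameters (xi, epsilon, a single
power-iteration step) are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def attention_vat_loss(model, x, xi=1e-6, epsilon=0.02, n_power=1):
        # Reference output distribution on the clean input; no labels used.
        with torch.no_grad():
            logits, attn_scores = model(x, return_attention=True)
            p_clean = F.softmax(logits, dim=-1)

        # Start from a random direction and refine it by power iteration so
        # it approximates the direction that most changes the output.
        d = torch.randn_like(attn_scores)
        for _ in range(n_power):
            d = xi * d / (d.norm(dim=-1, keepdim=True) + 1e-12)
            d.requires_grad_(True)
            perturbed_logits = model(x, attn_perturb=d)
            kl = F.kl_div(F.log_softmax(perturbed_logits, dim=-1),
                          p_clean, reduction="batchmean")
            d, = torch.autograd.grad(kl, d)

        # Virtual adversarial perturbation and the consistency loss.
        r_vadv = epsilon * d / (d.norm(dim=-1, keepdim=True) + 1e-12)
        vadv_logits = model(x, attn_perturb=r_vadv.detach())
        return F.kl_div(F.log_softmax(vadv_logits, dim=-1),
                        p_clean, reduction="batchmean")

In a semi-supervised setting, this consistency term is simply added to the
supervised loss computed on the labeled portion of the data.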
Robust Text Classification: Analyzing Prototype-Based Networks
Downstream applications often require text classification models to be
accurate, robust, and interpretable. While the accuracy of state-of-the-art
language models approximates human performance, they are not designed to be
interpretable and often exhibit a drop in performance on noisy data. The
family of Prototype-Based Networks (PBNs), which classify examples based on
their similarity to prototypical examples of a class (prototypes), is
natively interpretable and has been shown to be robust to noise, which has
enabled its wide use in computer vision tasks. In this paper, we study
whether the robustness properties of PBNs transfer to text classification
tasks. We design a modular and comprehensive framework for studying PBNs that
includes different backbone architectures, backbone sizes, and objective
functions. Our evaluation protocol assesses the robustness of models against
character-, word-, and sentence-level perturbations. Our experiments on three
benchmarks show that the robustness of PBNs transfers to NLP classification
tasks facing realistic perturbations. Moreover, the robustness of PBNs is
supported mostly by the objective function that keeps prototypes
interpretable, and the robustness advantage of PBNs over vanilla models
becomes more salient as datasets grow more complex.
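
As a rough sketch of what a prototype-based classification head looks like
over a text encoder, the following PyTorch module scores each class by the
similarity of an input embedding to that class's nearest learnable prototype.
The similarity function is the log-ratio form used in ProtoPNet-style models;
all names, dimensions, and the number of prototypes per class are
illustrative assumptions, not the framework described in the paper.

    import torch
    import torch.nn as nn

    class PrototypeHead(nn.Module):
        def __init__(self, hidden_dim, n_classes, protos_per_class=5):
            super().__init__()
            # Learnable prototype vectors, a fixed number per class,
            # laid out class-major: prototype index = class * P + p.
            self.prototypes = nn.Parameter(
                torch.randn(n_classes * protos_per_class, hidden_dim))
            self.n_classes = n_classes
            self.protos_per_class = protos_per_class

        def forward(self, h):
            # Squared Euclidean distance from each embedding in the batch
            # to every prototype: shape (B, C * P).
            dists = torch.cdist(h, self.prototypes).pow(2)
            # Monotone similarity: large when close, near zero when far.
            sims = torch.log((dists + 1.0) / (dists + 1e-4))
            # Class score = best-matching prototype of that class.
            sims = sims.view(-1, self.n_classes, self.protos_per_class)
            return sims.max(dim=-1).values  # (B, C) logits

PBN objectives typically combine cross-entropy on these logits with
interpretability terms (e.g., clustering and separation losses that keep
prototypes close to real training examples), which is the component the
abstract above identifies as chiefly responsible for robustness.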