53 research outputs found
Attention Meets Perturbations: Robust and Interpretable Attention with Adversarial Training
Although attention mechanisms have been applied to a variety of deep learning
models and have been shown to improve prediction performance, they have been
reported to be vulnerable to perturbations. To overcome this vulnerability, we
draw inspiration from adversarial training (AT), a powerful regularization
technique for enhancing the robustness of models. In this paper, we propose a general training
technique for natural language processing tasks, including AT for attention
(Attention AT) and more interpretable AT for attention (Attention iAT). The
proposed techniques improved prediction performance and model interpretability
by applying AT to the attention mechanisms. In particular, Attention iAT
amplifies these advantages by introducing adversarial perturbations that
sharpen the differences in attention across the words of a sentence. Evaluation
experiments with ten open datasets revealed that AT for attention mechanisms,
especially Attention iAT, demonstrated (1) the best performance in nine out of
ten tasks and (2) more interpretable attention (i.e., the resulting attention
correlated more strongly with gradient-based word importance) for all tasks.
Additionally, the proposed techniques are (3) much less dependent on
perturbation size in AT. Our code is available at
https://github.com/shunk031/attention-meets-perturbation
Comment: 12 pages, 4 figures. Accepted by IEEE Access on Jun. 21, 202
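As a rough illustration of the idea behind Attention AT, the sketch below perturbs the attention scores of a toy attention-pooling classifier in the gradient direction that most increases the loss. This is a minimal numpy sketch, not the paper's implementation: the logistic toy model, the numerical gradient, and the perturbation size `eps` are all illustrative assumptions; only the idea of adding a worst-case L2-bounded perturbation to the attention mechanism comes from the abstract.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict(scores, values, w):
    """Attention-weighted pooling followed by a logistic output unit."""
    a = softmax(scores)          # attention weights over tokens
    h = a @ values               # pooled sentence representation
    return 1.0 / (1.0 + np.exp(-(h @ w)))

def loss(scores, values, w, y):
    p = predict(scores, values, w)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def attention_adversarial_perturbation(scores, values, w, y, eps=0.1):
    """Worst-case (gradient-direction) perturbation of the attention scores."""
    g = np.zeros_like(scores)
    d = 1e-5
    for i in range(len(scores)):  # numerical gradient, for illustration only
        e = np.zeros_like(scores)
        e[i] = d
        g[i] = (loss(scores + e, values, w, y)
                - loss(scores - e, values, w, y)) / (2 * d)
    return eps * g / (np.linalg.norm(g) + 1e-12)  # L2-normalized step

# toy example: 3 tokens with 2-d value vectors
rng = np.random.default_rng(0)
scores = rng.normal(size=3)
values = rng.normal(size=(3, 2))
w = rng.normal(size=2)
y = 1.0

r_adv = attention_adversarial_perturbation(scores, values, w, y)
clean = loss(scores, values, w, y)
perturbed = loss(scores + r_adv, values, w, y)
# Attention AT would minimize the loss under this worst-case perturbation.
```

In training, the clean loss and the loss under `scores + r_adv` would be combined, so the model learns attention that is stable against small perturbations.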
Making Attention Mechanisms More Robust and Interpretable with Virtual Adversarial Training
Although attention mechanisms improve the prediction performance and
interpretability of deep learning models, they are vulnerable to perturbations.
Adversarial training (AT) for attention mechanisms has successfully reduced
this vulnerability by considering adversarial perturbations. However, this
technique requires label information, and thus, its use is limited to
supervised settings. In this study, we explore the concept of incorporating
virtual AT (VAT) into the attention mechanisms, by which adversarial
perturbations can be computed even from unlabeled data. To realize this
approach, we propose two general training techniques, namely VAT for attention
mechanisms (Attention VAT) and "interpretable" VAT for attention mechanisms
(Attention iVAT), which extend AT for attention mechanisms to a semi-supervised
setting. In particular, Attention iVAT focuses on the differences in attention;
thus, it can efficiently learn clearer attention and improve model
interpretability, even with unlabeled data. Empirical experiments based on six
public datasets revealed that our techniques provide better prediction
performance than conventional AT-based as well as VAT-based techniques, and
stronger agreement with evidence that is provided by humans in detecting
important words in sentences. Moreover, our proposal offers these advantages
without needing to add the careful selection of unlabeled data. That is, even
if the model using our VAT-based technique is trained on unlabeled data from a
source other than the target task, both the prediction performance and model
interpretability can be improved.
Comment: 18 pages, 3 figures. Accepted for publication in Springer Applied Intelligence (APIN)
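The key point of the VAT variant is that the adversarial direction is computed against the model's own clean prediction, so no label is needed. The sketch below is a numpy toy, not the paper's method: the attention-pooling softmax classifier, the finite-difference gradient, and the hyperparameters `eps` and `xi` are illustrative assumptions; only the label-free KL-based perturbation idea comes from the abstract.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def class_probs(scores, values, W):
    """Attention pooling followed by a linear softmax classifier."""
    h = softmax(scores) @ values
    return softmax(W @ h)

def kl(p, q):
    return float(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))

def virtual_adversarial_perturbation(scores, values, W, eps=0.1, xi=0.1):
    """Label-free perturbation: maximize KL against the model's own prediction."""
    p = class_probs(scores, values, W)           # no ground-truth label used
    d = np.random.default_rng(1).normal(size=scores.shape)
    d /= np.linalg.norm(d)
    # finite-difference estimate of the gradient of the KL w.r.t. d
    g = np.zeros_like(d)
    h_ = 1e-5
    for i in range(len(d)):
        e = np.zeros_like(d)
        e[i] = h_
        g[i] = (kl(p, class_probs(scores + xi * (d + e), values, W))
                - kl(p, class_probs(scores + xi * (d - e), values, W))) / (2 * h_)
    d = g / (np.linalg.norm(g) + 1e-12)
    return eps * d

# toy example: 4 tokens, 3-d values, 2 classes
rng = np.random.default_rng(0)
scores = rng.normal(size=4)
values = rng.normal(size=(4, 3))
W = rng.normal(size=(2, 3))

r_vadv = virtual_adversarial_perturbation(scores, values, W)
p_clean = class_probs(scores, values, W)
vat_loss = kl(p_clean, class_probs(scores + r_vadv, values, W))
# Attention VAT would add this KL term to the training objective.
```

Because `vat_loss` depends only on the model's predictions, the same term can be computed on unlabeled data, which is what extends AT for attention to the semi-supervised setting.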
Approximate Lesion Localization in Dermoscopy Images
Background: Dermoscopy is one of the major imaging modalities used in the
diagnosis of melanoma and other pigmented skin lesions. Due to the difficulty
and subjectivity of human interpretation, automated analysis of dermoscopy
images has become an important research area. Border detection is often the
first step in this analysis. Methods: In this article, we present an
approximate lesion localization method that serves as a preprocessing step for
detecting borders in dermoscopy images. In this method, first the black frame
around the image is removed using an iterative algorithm. The approximate
location of the lesion is then determined using an ensemble of thresholding
algorithms. Results: The method is tested on a set of 428 dermoscopy images.
The localization error is quantified by a metric that uses dermatologist
determined borders as the ground truth. Conclusion: The results demonstrate
that the method presented here achieves both fast and accurate localization of
lesions in dermoscopy images.
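The ensemble-of-thresholds step can be sketched as follows. The specific thresholding rules (Otsu, mean, median), the majority-vote combination, and the synthetic test image are assumptions for illustration; the abstract specifies only that an ensemble of thresholding algorithms produces an approximate lesion location.

```python
import numpy as np

def otsu_threshold(img):
    """Exhaustive search for the gray level maximizing between-class variance."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    total = img.size
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * hist[t]
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / (total - w0)
        var = w0 * (total - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def ensemble_localize(img, min_votes=2):
    """Each thresholding rule votes 'lesion' for pixels at or below its threshold."""
    thresholds = [otsu_threshold(img), img.mean(), np.median(img)]
    votes = sum((img <= t).astype(int) for t in thresholds)
    ys, xs = np.nonzero(votes >= min_votes)
    # approximate lesion location as the bounding box of the voted pixels
    return xs.min(), ys.min(), xs.max(), ys.max()

# synthetic test image: dark circular "lesion" on a brighter background
h, w = 64, 64
yy, xx = np.mgrid[0:h, 0:w]
img = np.full((h, w), 200.0)
img[(yy - 32) ** 2 + (xx - 32) ** 2 < 10 ** 2] = 50.0

x0, y0, x1, y1 = ensemble_localize(img)
```

Voting across several thresholding rules makes the localization less sensitive to any single rule failing on an unusual image, which matches the robustness motivation of the ensemble.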
AraDIC: Arabic Document Classification using Image-Based Character Embeddings and Class-Balanced Loss
Classical and some deep learning techniques for Arabic text classification often depend on complex morphological analysis, word segmentation, and handcrafted feature engineering. These could be eliminated by using character-level features. We propose a novel end-to-end Arabic document classification framework, the Arabic document image-based classifier (AraDIC), inspired by work on image-based character embeddings. AraDIC consists of an image-based character encoder and a classifier that are trained in an end-to-end fashion using a class-balanced loss to deal with the long-tailed data distribution problem. To evaluate the effectiveness of AraDIC, we created and published two datasets, the Arabic Wikipedia title (AWT) dataset and the Arabic poetry (AraP) dataset. To the best of our knowledge, this is the first image-based character embedding framework addressing the problem of Arabic text classification. We also present the first deep learning-based text classifier widely evaluated on Modern Standard Arabic, colloquial Arabic, and classical Arabic. AraDIC shows performance improvements over classical and deep learning baselines of 12.29% and 23.05% for the micro and macro F-score, respectively.
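The abstract does not specify which class-balanced loss is used; a common formulation reweights cross-entropy by the inverse "effective number" of samples per class. The sketch below illustrates that formulation under that assumption, with the smoothing factor `beta` and the toy class counts as illustrative values.

```python
import numpy as np

def class_balanced_weights(counts, beta=0.999):
    """Weight each class by the inverse 'effective number' of its samples."""
    counts = np.asarray(counts, dtype=float)
    effective = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    weights = 1.0 / effective
    return weights * len(counts) / weights.sum()  # normalize to sum to #classes

def class_balanced_cross_entropy(logits, label, counts, beta=0.999):
    """Softmax cross-entropy reweighted by the class-balanced weight."""
    w = class_balanced_weights(counts, beta)
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return float(-w[label] * np.log(p[label] + 1e-12))

# long-tailed toy distribution: head, body, and tail classes
counts = [5000, 500, 5]
w = class_balanced_weights(counts)
# the tail class receives the largest weight, countering the imbalance
tail_loss = class_balanced_cross_entropy(np.array([2.0, 0.5, -1.0]), 2, counts)
```

Under this weighting, errors on rare classes contribute more to the training objective, which is what lets a classifier cope with a long-tailed label distribution.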
DMS: Deep Multi-Modal Sequence Sets with Hierarchical Modality Attention
There is increasing interest in the use of multimodal data in various web
applications, such as digital advertising and e-commerce. Typical methods for
extracting important information from multimodal data rely on a mid-fusion
architecture that combines the feature representations from multiple encoders.
However, as the number of modalities increases, several potential problems with
the mid-fusion model structure arise, such as an increase in the dimensionality
of the concatenated multimodal features and missing modalities. To address
these problems, we propose a new concept that considers multimodal inputs as a
set of sequences, namely, deep multimodal sequence sets (DMS). Our
set-aware concept consists of three components that capture the relationships
among multiple modalities: (a) a BERT-based encoder to handle the inter- and
intra-order of elements in the sequences, (b) intra-modality residual attention
(IntraMRA) to capture the importance of the elements in a modality, and (c)
inter-modality residual attention (InterMRA) to enhance the importance of
elements with modality-level granularity further. Our concept exhibits
performance that is comparable to or better than the previous set-aware models.
Furthermore, we demonstrate that the visualization of the learned InterMRA and
IntraMRA weights can provide an interpretation of the prediction results.
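The two-level attention described in (b) and (c) can be sketched as follows. This is a numpy toy under stated assumptions: the residual form (mean plus attended summary), the shared query vectors `q_intra`/`q_inter`, and the random features standing in for the BERT-based encoder of component (a) are all illustrative; only the IntraMRA-then-InterMRA structure comes from the abstract.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def residual_attention(X, q):
    """Score rows of X against query q; add the attended summary residually."""
    a = softmax(X @ q)               # importance weights (sum to 1)
    attended = a @ X                 # attention-weighted summary
    return X.mean(axis=0) + attended, a

def dms_pool(modalities, q_intra, q_inter):
    # (b) IntraMRA: weigh the elements inside each modality
    pooled, intra_weights = [], []
    for X in modalities:
        v, a = residual_attention(X, q_intra)
        pooled.append(v)
        intra_weights.append(a)
    P = np.stack(pooled)             # one vector per modality
    # (c) InterMRA: weigh whole modalities against each other
    z, inter_weights = residual_attention(P, q_inter)
    return z, intra_weights, inter_weights

# toy example: three modalities with different sequence lengths, shared dim 4
rng = np.random.default_rng(0)
text, image, meta = (rng.normal(size=(n, 4)) for n in (5, 3, 2))
q_intra, q_inter = rng.normal(size=4), rng.normal(size=4)

z, intra_w, inter_w = dms_pool([text, image, meta], q_intra, q_inter)
# inter_w gives modality-level importance; intra_w gives element-level importance
```

Because each modality is pooled independently before InterMRA, a missing modality can simply be dropped from the input list, and the learned weights remain directly inspectable for interpretation.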