Attentive Convolution: Equipping CNNs with RNN-style Attention Mechanisms
In NLP, convolutional neural networks (CNNs) have benefited less than
recurrent neural networks (RNNs) from attention mechanisms. We hypothesize that
this is because the attention in CNNs has been mainly implemented as attentive
pooling (i.e., it is applied to pooling) rather than as attentive convolution
(i.e., it is integrated into convolution). Convolution is the differentiator of
CNNs in that it can powerfully model the higher-level representation of a word
by taking into account its local fixed-size context in the input text t^x. In
this work, we propose an attentive convolution network, ATTCONV. It extends the
context scope of the convolution operation, deriving higher-level features for
a word not only from its local context but also from nonlocal context
extracted by the attention mechanism commonly used in RNNs. This
nonlocal context can come (i) from parts of the input text t^x that are distant
or (ii) from extra (i.e., external) contexts t^y. Experiments on sentence
modeling with zero-context (sentiment analysis), single-context (textual
entailment) and multiple-context (claim verification) demonstrate the
effectiveness of ATTCONV in sentence representation learning with the
incorporation of context. In particular, attentive convolution outperforms
attentive pooling and is a strong competitor to popular attentive RNNs.
Comment: Camera-ready for TACL. 16 pages
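As a rough illustration of the idea, the sketch below combines a fixed-size local convolution window with an attention-weighted summary of the whole input text t^x. This is a simplified stand-in, not the paper's exact ATTCONV formulation; the function names, dot-product attention, and tanh nonlinearity are all assumptions made for the sake of a runnable example:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_convolution(tx, W_local, W_context, win=3):
    """Sketch: per-position features from local window + nonlocal attention context.

    tx: (T, D) word vectors of the input text t^x.
    W_local: (H, win*D) weights for the local convolution window.
    W_context: (H, D) weights for the attention-derived context vector.
    """
    T, D = tx.shape
    pad = win // 2
    padded = np.vstack([np.zeros((pad, D)), tx, np.zeros((pad, D))])
    out = []
    for i in range(T):
        local = padded[i:i + win].reshape(-1)   # fixed-size local context
        attn = softmax(tx @ tx[i])              # dot-product attention over all of t^x
        context = attn @ tx                     # nonlocal context vector
        out.append(np.tanh(W_local @ local + W_context @ context))
    return np.array(out)                        # (T, H) higher-level features
```

A single weight pair is reused at every position, mirroring how a convolution filter is shared across the text; the attention term is what extends the context beyond the local window.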
Evaluation of pooling operations in convolutional architectures for drug-drug interaction extraction
Background: Deep Neural Networks (DNNs), and in particular Convolutional Neural Networks (CNNs), have recently achieved state-of-the-art results for the task of Drug-Drug Interaction (DDI) extraction. Most CNN architectures incorporate a pooling layer to reduce the dimensionality of the convolution layer output, preserving relevant features and removing irrelevant details. All previous CNN-based systems for DDI extraction used max-pooling layers.
Results: In this paper, we evaluate the performance of various pooling methods (max-pooling, average-pooling and attentive pooling), as well as their combinations, for the task of DDI extraction. Our experiments show that max-pooling achieves a higher F1-score (64.56%) than attentive pooling (59.92%) and average-pooling (58.35%).
Conclusions: Max-pooling outperforms the other alternatives because it is the only one that is invariant to the special pad tokens appended to shorter sentences (padding). Moreover, combining max-pooling with attentive pooling does not improve performance compared with max-pooling alone.
Publication of this article was supported by the Research Program of the Ministry of Economy and Competitiveness - Government of Spain (DeepEMR project TIN2017-87548-C2-1-R) and the TEAM project (Erasmus Mundus Action 2-Strand 2 Programme) funded by the European Commission.
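The padding-invariance argument from the conclusion can be sketched as follows, under two illustrative assumptions not stated in the abstract: features are non-negative (e.g. post-ReLU convolution outputs) and pad tokens map to zero vectors. Under those assumptions, appending pad tokens leaves max-pooling unchanged but shifts average-pooling:

```python
import numpy as np

def max_pool(feats):
    return feats.max(axis=0)

def avg_pool(feats):
    return feats.mean(axis=0)

def attentive_pool(feats, scores):
    w = np.exp(scores - scores.max())
    w /= w.sum()                 # softmax attention weights over tokens
    return w @ feats             # attention-weighted average

# Three real token features (non-negative, as after a ReLU)
feats = np.array([[0.5, 2.0],
                  [1.5, 0.2],
                  [0.9, 1.1]])
# Same sentence with two zero-vector pad tokens appended
padded = np.vstack([feats, np.zeros((2, 2))])
```

Here `max_pool(feats)` equals `max_pool(padded)`, while `avg_pool` is dragged toward zero by the pads; note also that uniform attention scores reduce attentive pooling to the plain average.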
Vision Transformer with Attentive Pooling for Robust Facial Expression Recognition
Facial Expression Recognition (FER) in the wild is an extremely challenging
task. Recently, some Vision Transformers (ViT) have been explored for FER, but
most of them perform inferiorly compared to Convolutional Neural Networks
(CNN). This is mainly because the newly proposed modules are difficult to
train to convergence from scratch due to a lack of inductive bias, and they
tend to focus on occluded and noisy areas. TransFER, a representative
transformer-based method for FER, alleviates this with multi-branch attention
dropping, but at the cost of excessive computation. In contrast, we present two
attentive pooling (AP)
modules to pool noisy features directly. The AP modules include Attentive Patch
Pooling (APP) and Attentive Token Pooling (ATP). They aim to guide the model to
emphasize the most discriminative features while reducing the impacts of less
relevant features. The proposed APP is employed to select the most informative
patches on CNN features, and ATP discards unimportant tokens in ViT. Being
simple to implement and without learnable parameters, the APP and ATP
intuitively reduce the computational cost while boosting the performance by
only pursuing the most discriminative features. Qualitative results demonstrate
the motivations and effectiveness of our attentive pooling modules. In addition,
quantitative results on six in-the-wild datasets outperform other
state-of-the-art methods.
Comment: Code will be made public at https://github.com/youqingxiaozhua/APVi
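A minimal, parameter-free sketch of the token-dropping idea behind ATP: keep only the highest-scoring tokens, reusing attention scores the ViT has already computed. The function name and plain top-k selection are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def attentive_token_pooling(tokens, attn_scores, keep):
    """Keep the `keep` tokens with the highest attention scores.

    tokens: (T, D) token features; attn_scores: (T,) importance scores.
    No learnable parameters are introduced: the scores come from the
    model's existing attention, and selection is a simple top-k.
    """
    idx = np.argsort(attn_scores)[::-1][:keep]  # indices of top-k scores
    return tokens[np.sort(idx)]                 # preserve original token order
```

Dropping low-score tokens shrinks the sequence fed to later layers, which is how the pooling reduces computation while keeping the most discriminative features.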
Attentive Statistics Pooling for Deep Speaker Embedding
This paper proposes attentive statistics pooling for deep speaker embedding
in text-independent speaker verification. In conventional speaker embedding,
frame-level features are averaged over all the frames of a single utterance to
form an utterance-level feature. Our method utilizes an attention mechanism to
give different weights to different frames and generates not only weighted
means but also weighted standard deviations. In this way, it can capture
long-term variations in speaker characteristics more effectively. An evaluation
on the NIST SRE 2012 and VoxCeleb data sets shows that it reduces equal
error rates (EERs) relative to the conventional method by 7.5% and 8.1%, respectively.
Comment: Proc. Interspeech 2018, pp. 2252--2256. arXiv admin note: text overlap
with arXiv:1809.0931
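The weighted mean and weighted standard deviation described above can be sketched as follows. This is a minimal NumPy illustration: in the paper the attention scores come from a small learned network, whereas here they are supplied directly as an argument:

```python
import numpy as np

def attentive_statistics_pooling(frames, scores):
    """Pool frame-level features (T, D) into one utterance-level vector.

    `scores` (T,) are unnormalized per-frame attention scores.
    Returns the concatenation of the attention-weighted mean and the
    attention-weighted standard deviation, a (2*D,) embedding.
    """
    w = np.exp(scores - scores.max())
    w /= w.sum()                               # softmax over frames
    mean = (w[:, None] * frames).sum(axis=0)   # weighted mean
    var = (w[:, None] * frames ** 2).sum(axis=0) - mean ** 2
    std = np.sqrt(np.maximum(var, 1e-12))      # weighted standard deviation
    return np.concatenate([mean, std])
```

With uniform scores this reduces to the conventional mean plus an ordinary standard deviation; non-uniform scores let informative frames dominate both statistics.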
Explaining the Unexplained: A CLass-Enhanced Attentive Response (CLEAR) Approach to Understanding Deep Neural Networks
In this work, we propose CLass-Enhanced Attentive Response (CLEAR): an
approach to visualize and understand the decisions made by deep neural networks
(DNNs) given a specific input. CLEAR facilitates the visualization of attentive
regions and levels of interest of DNNs during the decision-making process. It
also enables the visualization of the most dominant classes associated with
these attentive regions of interest. As such, CLEAR can mitigate some of the
shortcomings of heatmap-based methods associated with decision ambiguity, and
allows for better insights into the decision-making process of DNNs.
Quantitative and qualitative experiments across three different datasets
demonstrate the efficacy of CLEAR for gaining a better understanding of the
inner workings of DNNs during the decision-making process.
Comment: Accepted at the Computer Vision and Pattern Recognition Workshop (CVPR-W)
on Explainable Computer Vision, 201