WaveNets: Wavelet Channel Attention Networks
Channel attention has proven to be an effective technique in computer vision.
However, the channel attention proposed by SENet suffers from information loss
in feature learning, caused by its use of Global Average Pooling (GAP) to
represent each channel as a scalar. Designing effective channel attention
mechanisms therefore requires a way to preserve more feature information when
modeling channel inter-dependencies. In this work, we use wavelet transform
compression as a solution to the channel representation problem. We first test
the wavelet transform as an auto-encoder model equipped with a conventional
channel attention module. Next, we test the wavelet transform as a standalone
channel compression method. We prove that global average pooling is equivalent
to the recursive approximate Haar wavelet transform. Based on this proof, we
generalize channel attention using wavelet compression and name the result
WaveNet. Our method can be embedded within existing channel attention methods
with a couple of lines of code. We evaluate the proposed method on the ImageNet
dataset for the image classification task. Our method outperforms the baseline
SENet and achieves state-of-the-art results. Our code is publicly available at
https://github.com/hady1011/WaveNet-C.
Comment: IEEE BigData 2022 conference
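The abstract states that the method can be embedded in existing channel attention modules with a couple of lines of code. Below is a minimal PyTorch sketch of that idea, not the authors' implementation (see the linked repository): the `WaveletChannelAttention` name, the `keep` parameter, and the replicate padding are our assumptions. With keep=1 and power-of-two spatial sizes, the recursive pairwise averaging reduces exactly to GAP, mirroring the equivalence the abstract claims.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def haar_approx_1d(x):
    # One level of the approximate (low-pass) Haar transform:
    # average adjacent pairs along the last dimension.
    if x.shape[-1] % 2:                                  # pad odd lengths to even
        x = F.pad(x, (0, 1), mode="replicate")
    return 0.5 * (x[..., 0::2] + x[..., 1::2])

class WaveletChannelAttention(nn.Module):
    """SE-style channel attention where GAP is replaced by a truncated
    recursive Haar approximation keeping `keep` coefficients per channel.
    keep=1 (with power-of-two spatial sizes) recovers plain GAP."""
    def __init__(self, channels, reduction=16, keep=4):
        super().__init__()
        self.keep = keep
        self.fc = nn.Sequential(
            nn.Linear(channels * keep, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        z = x.reshape(b, c, h * w)                       # flatten spatial positions
        while z.shape[-1] > self.keep:                   # recurse until `keep` coefficients remain
            z = haar_approx_1d(z)
        if z.shape[-1] < self.keep:                      # pad if sizes did not divide evenly
            z = F.pad(z, (0, self.keep - z.shape[-1]), mode="replicate")
        s = self.fc(z.reshape(b, c * self.keep))         # descriptor -> channel weights
        return x * s.view(b, c, 1, 1)

# Usage: drop-in replacement for an SE block.
attn = WaveletChannelAttention(channels=64)
out = attn(torch.randn(2, 64, 32, 32))                   # -> (2, 64, 32, 32)
```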
ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification
Current speaker verification techniques rely on a neural network to extract
speaker representations. The successful x-vector architecture is a Time Delay
Neural Network (TDNN) that applies statistics pooling to project
variable-length utterances into fixed-length speaker characterizing embeddings.
In this paper, we propose multiple enhancements to this architecture based on
recent trends in the related fields of face verification and computer vision.
Firstly, the initial frame layers can be restructured into 1-dimensional
Res2Net modules with impactful skip connections. Similarly to SE-ResNet, we
introduce Squeeze-and-Excitation blocks in these modules to explicitly model
channel interdependencies. The SE block expands the temporal context of the
frame layer by rescaling the channels according to global properties of the
recording. Secondly, neural networks are known to learn hierarchical features,
with each layer operating on a different level of complexity. To leverage this
complementary information, we aggregate and propagate features of different
hierarchical levels. Finally, we improve the statistics pooling module with
channel-dependent frame attention. This enables the network to focus on
different subsets of frames during each channel's statistics estimation.
The proposed ECAPA-TDNN architecture significantly outperforms state-of-the-art
TDNN-based systems on the VoxCeleb test sets and the 2019 VoxCeleb Speaker
Recognition Challenge.
Comment: proceedings of INTERSPEECH 2020
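Below is a minimal PyTorch sketch of channel-dependent attentive statistics pooling as described above. It is a simplification that omits ECAPA-TDNN's concatenation of global context into the attention input; the module name and bottleneck width are our assumptions.

```python
import torch
import torch.nn as nn

class AttentiveStatsPool(nn.Module):
    """Attentive statistics pooling with channel-dependent frame attention:
    each channel gets its own attention distribution over frames, and the
    output is the attention-weighted mean and standard deviation."""
    def __init__(self, channels, bottleneck=128):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv1d(channels, bottleneck, kernel_size=1),
            nn.Tanh(),
            nn.Conv1d(bottleneck, channels, kernel_size=1),  # one score per channel per frame
        )

    def forward(self, x):                                # x: (batch, channels, frames)
        w = torch.softmax(self.attn(x), dim=2)           # per-channel attention over frames
        mu = torch.sum(w * x, dim=2)                     # weighted mean
        var = torch.sum(w * x * x, dim=2) - mu ** 2      # weighted variance
        sigma = torch.sqrt(var.clamp(min=1e-9))          # weighted standard deviation
        return torch.cat([mu, sigma], dim=1)             # (batch, 2 * channels)

# Usage: pool a variable-length utterance into a fixed-length statistics vector.
pool = AttentiveStatsPool(channels=512)
stats = pool(torch.randn(2, 512, 300))                   # -> (2, 1024)
```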
PKCAM: Previous Knowledge Channel Attention Module
Recently, attention mechanisms have been explored with ConvNets across both
the spatial and channel dimensions. However, to our knowledge, all existing
methods devote their attention modules to capturing local interactions at a
single scale. In this paper, we propose a Previous Knowledge Channel Attention
Module (PKCAM) that captures channel-wise relations across different layers to
model the global context. Our proposed PKCAM module is easily integrated into
any feed-forward CNN architecture and trained in an end-to-end fashion with a
negligible footprint due to its lightweight design. We validate our novel
architecture through extensive experiments on image classification and object
detection tasks with different backbones. Our experiments show consistent
performance improvements over the corresponding baselines. Our code is
published at https://github.com/eslambakr/EMCA
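Below is a minimal PyTorch sketch of the cross-layer idea: a channel descriptor pooled from an earlier layer is fused with the current layer's descriptor before the excitation step. The module name, the linear projection, and the additive fusion are illustrative assumptions, not the authors' exact design (see the linked repository).

```python
import torch
import torch.nn as nn

class PreviousKnowledgeChannelAttention(nn.Module):
    """Illustrative cross-layer channel attention: fuse a descriptor from an
    earlier layer with the current layer's descriptor, then excite channels."""
    def __init__(self, curr_channels, prev_channels, reduction=16):
        super().__init__()
        self.proj_prev = nn.Linear(prev_channels, curr_channels)  # align earlier-layer descriptor
        self.fc = nn.Sequential(
            nn.Linear(curr_channels, curr_channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(curr_channels // reduction, curr_channels),
            nn.Sigmoid(),
        )

    def forward(self, x, prev_feat):
        # x: (b, C, H, W) current features; prev_feat: (b, C_prev, H', W') earlier-layer features
        b, c, _, _ = x.shape
        z_curr = x.mean(dim=(2, 3))                               # GAP descriptor, current layer
        z_prev = self.proj_prev(prev_feat.mean(dim=(2, 3)))       # GAP descriptor, earlier layer
        s = self.fc(z_curr + z_prev)                              # fuse cross-layer knowledge
        return x * s.view(b, c, 1, 1)

# Usage: recalibrate stage-3 features using stage-2 context.
pkca = PreviousKnowledgeChannelAttention(curr_channels=256, prev_channels=128)
out = pkca(torch.randn(2, 256, 14, 14), torch.randn(2, 128, 28, 28))  # -> (2, 256, 14, 14)
```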