Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms
Recent work has shown that the end-to-end approach using a convolutional neural
network (CNN) is effective in various types of machine learning tasks. For
audio signals, the approach takes raw waveforms as input using a 1-D
convolution layer. In this paper, we improve the 1-D CNN architecture for music
auto-tagging by adopting building blocks from state-of-the-art image
classification models, ResNets and SENets, and adding multi-level feature
aggregation to it. We compare different combinations of the modules in building
CNN architectures. The results show that they achieve significant improvements
over previous state-of-the-art models on the MagnaTagATune dataset and
comparable results on the Million Song Dataset. Furthermore, we analyze and
visualize our model to show how the 1-D CNN operates.
Comment: Accepted for publication at ICASSP 201
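The abstract above mentions adopting SENet building blocks in a 1-D CNN. As a rough illustration only, the following pure-Python sketch shows the general squeeze-and-excitation idea applied to a 1-D feature map: average each channel over time, pass the result through two small dense layers, and rescale each channel by the resulting gate. The function names, layer sizes, and weights are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(vec, weights, bias):
    # weights holds one row per output unit (hypothetical toy layer).
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def squeeze_excite_1d(features, w1, b1, w2, b2):
    """Channel-wise rescaling of a 1-D feature map.

    features: list of channels, each a list of samples along time.
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling over the time axis.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation: bottleneck dense layer with ReLU, then sigmoid gates.
    hidden = [max(0.0, h) for h in dense(squeezed, w1, b1)]
    gates = [sigmoid(g) for g in dense(hidden, w2, b2)]
    # Scale: multiply every sample of a channel by that channel's gate.
    scaled = [[g * x for x in ch] for g, ch in zip(gates, features)]
    return scaled, gates


# Toy input: 2 channels, 3 time steps, with made-up weights.
features = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
w1, b1 = [[0.1, -0.2]], [0.0]          # 2 channels -> 1 hidden unit
w2, b2 = [[1.0], [-1.0]], [0.0, 0.0]   # 1 hidden unit -> 2 gates
scaled, gates = squeeze_excite_1d(features, w1, b1, w2, b2)
```

Each gate lies between 0 and 1, so a channel is only ever attenuated relative to its input; how the paper combines such blocks with residual connections and multi-level feature aggregation is not shown here.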
Temporal Feedback Convolutional Recurrent Neural Networks for Keyword Spotting
While end-to-end learning has become a trend in deep learning, the model
architecture is often designed to incorporate domain knowledge. We propose a
novel convolutional recurrent neural network (CRNN) architecture with temporal
feedback connections, inspired by the feedback pathways from the brain to the ears
in the human auditory system. The proposed architecture uses the hidden state of
the RNN module at the previous time step to control the sensitivity of channel-wise
feature activations in the CNN blocks at the current time step, which is analogous
to the mechanism of the outer hair cells. We apply the proposed model to keyword
spotting, where the speech commands have a sequential nature. We show that the proposed
model consistently outperforms the compared model without temporal feedback for
different input/output settings in the CRNN framework. We also investigate the
details of the performance improvement by conducting a failure analysis of the
keyword spotting task and a visualization of the channel-wise feature scaling
in each CNN block.
Comment: This paper is submitted to ICASSP 202
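The temporal feedback described in the abstract above, where the previous RNN hidden state controls channel-wise feature scaling in the CNN at the current step, can be sketched roughly as follows. This is a toy illustration under stated assumptions: the sigmoid gating, the `tanh`-of-mean stand-in for the RNN update, and all names and weights are hypothetical and not the authors' implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def feedback_gates(prev_hidden, weights, bias):
    # One gate per CNN channel, computed from the previous hidden state.
    return [sigmoid(sum(w * h for w, h in zip(row, prev_hidden)) + b)
            for row, b in zip(weights, bias)]


def scale_channels(features, gates):
    # Multiply every sample of a channel by that channel's gate.
    return [[g * x for x in ch] for g, ch in zip(gates, features)]


def run_sequence(frames, weights, bias):
    """frames: list over time of feature maps (channels x samples)."""
    num_channels = len(frames[0])
    hidden = [0.0] * num_channels  # initial hidden state
    outputs = []
    for features in frames:
        gates = feedback_gates(hidden, weights, bias)
        scaled = scale_channels(features, gates)
        # Toy stand-in for the RNN update (assumption): tanh of the
        # per-channel mean of the scaled features.
        hidden = [math.tanh(sum(ch) / len(ch)) for ch in scaled]
        outputs.append(scaled)
    return outputs


# Two identical frames, 2 channels each, with made-up feedback weights.
frames = [[[1.0, 1.0], [2.0, 2.0]], [[1.0, 1.0], [2.0, 2.0]]]
weights = [[1.0, 0.0], [0.0, -1.0]]
bias = [0.0, 0.0]
outputs = run_sequence(frames, weights, bias)
```

Although both input frames are identical, the second frame is scaled differently from the first because the gates depend on the hidden state produced at the previous step.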