A Case Study of Deep-Learned Activations via Hand-Crafted Audio Features
Explaining Convolutional Neural Networks (CNNs) is particularly
challenging in all areas of application, and it is notably
under-researched in the music and audio domain. In this paper, we approach
explainability by exploiting the knowledge we have on hand-crafted audio
features. Our study focuses on a well-defined MIR task, the recognition of
musical instruments from user-generated music recordings. We compute the
similarity between a set of traditional audio features and representations
learned by CNNs. We also propose a technique for measuring the similarity
between activation maps and audio features that are typically presented in the
form of a matrix, such as chromagrams or spectrograms. We observe that some
neurons' activations correspond to well-known classical audio features. In
particular, for shallow layers, we find similarities between activations and
harmonic and percussive components of the spectrum. For deeper layers, we
compare chromagrams with high-level activation maps as well as loudness and
onset rate with deep-learned embeddings.
Comment: The 2018 Joint Workshop on Machine Learning for Music, The Federated
Artificial Intelligence Meeting (FAIM), Joint workshop program of ICML,
IJCAI/ECAI, and AAMAS, Stockholm, Sweden, Saturday, July 14th, 201
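
The abstract does not state which similarity measure is used to compare activation maps with matrix-shaped features such as chromagrams or spectrograms. As a rough illustration only, the sketch below assumes Pearson correlation computed after resampling the activation map to the feature's shape with nearest-neighbour indexing. The function names `resize_nearest` and `matrix_similarity` are hypothetical and do not come from the paper.

```python
import numpy as np


def resize_nearest(m, shape):
    """Resample a 2-D array to `shape` by nearest-neighbour indexing."""
    rows = np.arange(shape[0]) * m.shape[0] // shape[0]
    cols = np.arange(shape[1]) * m.shape[1] // shape[1]
    return m[np.ix_(rows, cols)]


def matrix_similarity(activation, feature):
    """Pearson correlation between an activation map and a feature matrix.

    Assumption: the paper's actual measure is not given in the abstract;
    correlation is used here purely as a stand-in.
    """
    f = np.asarray(feature, dtype=float)
    a = resize_nearest(np.asarray(activation, dtype=float), f.shape)
    # Centre both matrices so the score ignores constant offsets.
    a = a.ravel() - a.mean()
    f = f.ravel() - f.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(f)
    # A constant matrix has no variance; report zero similarity.
    return float(a @ f / denom) if denom > 0 else 0.0
```

A score near 1 would suggest that a neuron's activation map tracks the hand-crafted feature closely, while a score near 0 would suggest no linear relationship under this particular measure.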