Sounds carry an abundance of information about activities and events in our
everyday environment, such as traffic noise, road works, music, or people
talking. Recent machine learning methods, such as convolutional neural networks
(CNNs), can automatically recognize sound activities, a task known as audio
tagging. One such method, pre-trained audio neural
networks (PANNs), provides a neural network that has been pre-trained on over
500 sound classes from the publicly available AudioSet dataset and can be used
as a baseline or starting point for other tasks. However, the existing PANNs
model has high computational complexity and a large storage requirement. This
could limit the deployment of PANNs on resource-constrained devices, such as
on-the-edge sound sensors, and could lead to high energy consumption if many
such devices were deployed. In this paper, we reduce the computational
complexity and memory requirement of the PANNs model by applying a pruning
approach that eliminates redundant parameters. The resulting
Efficient PANNs (E-PANNs) model, which requires 36\% less computation and 70\%
less memory, also slightly improves the sound recognition (audio tagging)
performance. The code for the E-PANNs model has been released under an
open-source license.

Comment: Accepted at the Internoise 2023 conference.
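
For readers unfamiliar with pruning, the following is a minimal, purely
illustrative sketch of magnitude-based structured pruning using PyTorch's
pruning utilities. It is not the exact E-PANNs procedure; the layer sizes and
the 30\% sparsity ratio are arbitrary assumptions chosen for the example.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical convolutional layer standing in for one layer of a
# PANNs-style audio-tagging CNN (the real architecture is larger).
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)

# Structured L1 pruning: zero out the 30% of output channels whose
# filters have the smallest L1 norm (dim=0 indexes output channels).
prune.ln_structured(conv, name="weight", amount=0.3, n=1, dim=0)

# Fold the pruning mask into the weights so the zeros become permanent.
prune.remove(conv, "weight")

# Zeroed channels can then be physically removed (e.g. by rebuilding the
# layer with fewer output channels) to cut FLOPs and memory.
remaining = int((conv.weight.detach().abs().sum(dim=(1, 2, 3)) > 0).sum())
print(f"non-zero output channels: {remaining} / {conv.out_channels}")
```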