Low-Complexity Acoustic Scene Classification Using Data Augmentation and Lightweight ResNet
We present work on low-complexity acoustic scene classification (ASC) with
multiple devices, namely Subtask A of Task 1 of the DCASE2021 challenge.
This subtask focuses on classifying audio samples of multiple devices with a
low-complexity model, where two main difficulties need to be overcome. First,
the audio samples are recorded by different devices, so there is a device
mismatch among the audio samples. We reduce the negative impact of this
mismatch with several strategies, including data augmentation (e.g., mix-up,
spectrum correction, pitch shift), a multi-patch network structure, and
channel attention. Second, the model size
should be smaller than a threshold (e.g., 128 KB required by the DCASE2021
challenge). To meet this condition, we adopt a ResNet with both depthwise
separable convolution and channel attention as the backbone network, and
perform model compression. In summary, we propose a low-complexity ASC method
using data augmentation and a lightweight ResNet. Evaluated on the official
development and evaluation datasets, our method obtains classification accuracy
scores of 71.6% and 66.7%, respectively, and log-loss scores of 1.038
and 1.136, respectively. Our final model size is 110.3 KB, which is smaller than
the maximum of 128 KB.
Comment: 5 pages, 5 figures, 4 tables. Accepted for publication in the 16th
IEEE International Conference on Signal Processing (IEEE ICSP
- …
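
The abstract credits depthwise separable convolution with helping the model fit the 128 KB limit. The sketch below illustrates why that substitution saves parameters. It is not the authors' implementation: the function names, the 3x3 kernel, the 64-channel layer, and the 2 bytes per parameter are all assumptions chosen for illustration, and bias terms are ignored.

```python
# Illustrative parameter-count comparison (hypothetical layer sizes,
# not taken from the paper). Bias terms are ignored.

def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def size_kb(n_params, bytes_per_param=2):
    """Storage in KB, assuming a fixed number of bytes per parameter."""
    return n_params * bytes_per_param / 1024


if __name__ == "__main__":
    k, c_in, c_out = 3, 64, 64  # assumed layer shape
    standard = conv_params(k, c_in, c_out)
    separable = depthwise_separable_params(k, c_in, c_out)
    print(f"standard:  {standard} params, {size_kb(standard):.3f} KB")
    print(f"separable: {separable} params, {size_kb(separable):.3f} KB")
```

Under these assumptions, a single standard layer of this shape would already take up more than half of a 128 KB budget, while the separable version takes under a tenth of it. The abstract also lists model compression as a separate step.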