A Two-Stage Approach to Device-Robust Acoustic Scene Classification
To improve device robustness, a key requirement for a competitive
data-driven acoustic scene classification (ASC) system, a novel two-stage
system based on fully convolutional neural networks (CNNs) is proposed. Our
two-stage system leverages an ad-hoc score combination of two CNN
classifiers: (i) the first CNN classifies acoustic inputs into one of three
broad classes, and (ii) the second CNN classifies the same inputs into one of
ten finer-grained classes. Three different CNN architectures are explored to
implement the two-stage classifiers, and a frequency sub-sampling scheme is
investigated. Moreover, novel data augmentation schemes for ASC are also
investigated. Evaluated on DCASE 2020 Task 1a, our results show that the
proposed ASC system attains a state-of-the-art accuracy on the development set,
where our best system, a two-stage fusion of CNN ensembles, delivers an 81.9%
average accuracy across multi-device test data and yields a significant
improvement on unseen devices. Finally, neural saliency analysis with class
activation mapping (CAM) gives new insights on the patterns learnt by our
models.

Comment: Submitted to ICASSP 2021. Code available:
https://github.com/MihawkHu/DCASE2020_task
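The two-stage score combination described above can be sketched as follows. This is a minimal illustration, not the paper's exact method: the mapping from the ten fine-grained scene classes to the three broad classes, the fusion weight `alpha`, and the re-normalization step are all assumptions made for the example.

```python
import numpy as np

# Hypothetical mapping from 10 fine-grained scene classes to 3 broad
# classes (e.g. indoor / outdoor / transportation); the actual grouping
# used in the paper is not specified here.
FINE_TO_COARSE = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])

def two_stage_fusion(coarse_probs, fine_probs, alpha=0.5):
    """Combine a 3-class and a 10-class softmax output by boosting each
    fine-class score with the probability of its parent broad class,
    then blending with the original fine-class scores."""
    # Boost each fine-class score by its parent coarse-class probability.
    boosted = fine_probs * coarse_probs[FINE_TO_COARSE]
    boosted /= boosted.sum()
    # Blend original and boosted scores, then re-normalize.
    fused = alpha * fine_probs + (1.0 - alpha) * boosted
    return fused / fused.sum()

# Example: the coarse classifier strongly favors broad class 0, while the
# fine classifier is uncertain (uniform). Fusion breaks the tie in favor
# of the fine classes belonging to broad class 0.
coarse = np.array([0.7, 0.2, 0.1])
fine = np.full(10, 0.1)
fused = two_stage_fusion(coarse, fine)
print(fused.argmax())  # → 0
```

The design intuition is that the coarse classifier is more robust to device mismatch (three broad classes are easier to separate than ten), so its scores act as a prior that steers the fine-grained decision.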