Capturing scattered discriminative information using a deep architecture in acoustic scene classification
In acoustic scene classification (ASC), some pairs of classes share many acoustic
properties and are frequently misclassified. To distinguish such
pairs of classes, trivial details scattered throughout the data could be vital
clues. However, these details are less noticeable and are easily removed using
conventional non-linear activations (e.g. ReLU). Furthermore, making design
choices to emphasize trivial details can easily lead to overfitting if the
system is not sufficiently generalized. In this study, based on the analysis of
the ASC task's characteristics, we investigate various methods to capture
discriminative information and simultaneously mitigate the overfitting problem.
We adopt a max feature map method to replace conventional non-linear
activations in a deep neural network; this method applies an element-wise
comparison between different filters of a convolution layer's output. Two data
augmentation methods and two deep architecture modules are further explored to
reduce overfitting and sustain the system's discriminative power. Various
experiments are conducted using the Detection and Classification of Acoustic
Scenes and Events (DCASE) 2020 Task 1-A dataset to validate the proposed methods. Our
results show that the proposed system consistently outperforms the baseline,
where the single best-performing system achieves an accuracy of 70.4%, compared with
65.1% for the baseline.
Comment: Submitted to DCASE2020 workshop
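
The max feature map idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes one common formulation: the channel axis of a convolution output is split into two halves and the element-wise maximum of the two is kept. The function name `max_feature_map`, the channel-first layout, and the toy tensor shape are assumptions for illustration; the abstract does not specify how filters are paired.

```python
import numpy as np


def max_feature_map(x: np.ndarray, axis: int = 1) -> np.ndarray:
    """Keep the element-wise maximum of the two halves of the channel axis.

    Assumption: filters are paired by splitting the channel axis in half;
    other pairings are possible and the abstract does not say which is used.
    """
    if x.shape[axis] % 2 != 0:
        raise ValueError("the channel count must be even to form pairs")
    first_half, second_half = np.split(x, 2, axis=axis)
    return np.maximum(first_half, second_half)


# Hypothetical convolution output: (batch, channels, frequency, time).
conv_out = np.arange(24, dtype=np.float32).reshape(1, 4, 2, 3) - 12.0
activated = max_feature_map(conv_out)

# The channel count is halved; spatial dimensions are unchanged.
print(activated.shape)
```

Unlike ReLU, this operation has no fixed threshold at zero: a small or negative value survives whenever it is the larger of its pair, which is one way low-magnitude details can be retained.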