SubSpectralNet - Using Sub-Spectrogram based Convolutional Neural Networks for Acoustic Scene Classification
Acoustic Scene Classification (ASC) is one of the core research problems in
the field of Computational Sound Scene Analysis. In this work, we present
SubSpectralNet, a novel model which captures discriminative features by
incorporating frequency band-level differences to model soundscapes. Using
mel-spectrograms, we propose taking band-wise crops of the input
time-frequency representations and training a convolutional neural network (CNN)
on these crops. We also propose a modification to the training method for more
efficient learning of the CNN models. We first motivate the use of
sub-spectrograms through intuitive and statistical analyses, and finally we
develop a sub-spectrogram based CNN architecture for ASC. The system is
evaluated on the public ASC development dataset provided for the "Detection and
Classification of Acoustic Scenes and Events" (DCASE) 2018 Challenge. Our best
model achieves an improvement of +14% in terms of classification accuracy with
respect to the DCASE 2018 baseline system. Code and figures are available at
https://github.com/ssrp/SubSpectralNet
Comment: Accepted to IEEE International Conference on Acoustics, Speech, and
Signal Processing (ICASSP) 201
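The band-wise cropping described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `split_into_sub_spectrograms`, the parameters `band_size` and `band_hop`, and the values 40 mel bins, 20-bin crops, and a 10-bin hop are all assumptions chosen for the example.

```python
import numpy as np


def split_into_sub_spectrograms(mel_spec, band_size, band_hop):
    """Cut a (n_mels, n_frames) mel-spectrogram into overlapping frequency bands.

    Hypothetical helper: returns an array of shape
    (n_crops, band_size, n_frames), one crop per band.
    """
    n_mels = mel_spec.shape[0]
    if band_size > n_mels:
        raise ValueError("band_size must not exceed the number of mel bins")
    # Start indices of each crop along the frequency axis.
    starts = range(0, n_mels - band_size + 1, band_hop)
    return np.stack([mel_spec[s:s + band_size, :] for s in starts])


# Assumed input: 40 mel bins by 500 time frames of placeholder values.
mel_spec = np.arange(40 * 500, dtype=np.float32).reshape(40, 500)
subs = split_into_sub_spectrograms(mel_spec, band_size=20, band_hop=10)
print(subs.shape)  # (3, 20, 500)
```

Each crop could then be fed to its own convolutional branch, with the branch outputs combined for the final scene prediction; how the branches are combined is a design choice this sketch does not cover.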
Domestic Activities Classification from Audio Recordings Using Multi-scale Dilated Depthwise Separable Convolutional Network
Domestic activities classification (DAC) from audio recordings aims at
classifying audio recordings into pre-defined categories of domestic
activities, which is an effective way to estimate the daily activities
performed in a home environment. In this paper, we propose a method for DAC from
audio recordings using a multi-scale dilated depthwise separable convolutional
network (DSCN). The DSCN is a lightweight neural network with a small number of
parameters and is thus suitable for deployment on portable terminals with limited
computing resources. To expand the receptive field without increasing the number
of parameters, dilated convolution is used in place of normal convolution in the
DSCN, further improving its performance. In addition, the embeddings
of various scales learned by the dilated DSCN are concatenated as a multi-scale
embedding for representing property differences among various classes of
domestic activities. Evaluated on a public dataset of Task 5 of the 2018
Challenge on Detection and Classification of Acoustic Scenes and Events
(DCASE-2018), the results show that both dilated convolution and multi-scale
embedding contribute to the performance improvement of the proposed method, and
that the proposed method outperforms methods based on state-of-the-art
lightweight networks in terms of classification accuracy.
Comment: 5 pages, 2 figures, 4 tables. Accepted for publication in IEEE
MMSP202
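The two claims in the abstract above, that a depthwise separable convolution needs few parameters and that dilation widens the receptive field without adding any, can be checked with a little arithmetic. This is a rough sketch using the standard textbook formulas with biases ignored. The function names and the layer sizes (3x3 kernel, 64 input channels, 128 output channels) are assumptions for illustration and are not taken from the paper.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard 2-D convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution (bias ignored).

    One kernel x kernel filter per input channel (depthwise step),
    followed by a 1x1 convolution that mixes channels (pointwise step).
    """
    return kernel * kernel * c_in + c_in * c_out


def effective_kernel(kernel, dilation):
    """Span covered along one axis by a dilated kernel."""
    return kernel + (kernel - 1) * (dilation - 1)


# Assumed layer sizes, chosen only for illustration.
print(standard_conv_params(3, 64, 128))   # 73728
print(separable_conv_params(3, 64, 128))  # 8768
# Dilation leaves the weight count unchanged but widens the covered span.
print([effective_kernel(3, d) for d in (1, 2, 4)])  # [3, 5, 9]
```

With these assumed sizes the separable layer uses roughly one eighth of the weights of the standard layer, and a dilation of 4 lets the same 3x3 kernel span 9 positions along each axis.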