
    Feature Learning with Matrix Factorization Applied to Acoustic Scene Classification

    In this paper, we study the usefulness of various matrix factorization methods for learning features for the Acoustic Scene Classification (ASC) problem. A common way of addressing ASC has been to engineer features capable of capturing the specificities of acoustic environments. Instead, we show that better representations of the scenes can be automatically learned from time-frequency representations using matrix factorization techniques. We mainly focus on extensions of Principal Component Analysis and Nonnegative Matrix Factorization, including sparse, kernel-based, convolutive and a novel supervised dictionary learning variant. An experimental evaluation is performed on two of the largest ASC datasets available in order to compare and discuss the usefulness of these methods for the task. We show that the unsupervised learning methods provide better representations of acoustic scenes than the best conventional hand-crafted features on both datasets. Furthermore, the introduction of a novel nonnegative supervised matrix factorization model, together with deep neural networks trained on spectrograms, allows us to reach further improvements.
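    To make the feature-learning step concrete, here is a minimal sketch of unsupervised NMF feature learning on log-mel spectrograms, in the spirit of the paper but not its exact supervised variant. It assumes librosa and scikit-learn; `train_files` and `train_labels` are hypothetical placeholders, and the dictionary size of 128 is an arbitrary choice.

```python
# Sketch of unsupervised NMF feature learning for ASC (not the paper's exact
# supervised variant). `train_files` and `train_labels` are hypothetical.
import numpy as np
import librosa
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression

def log_mel(path, sr=22050, n_mels=64):
    """Load a clip and return a nonnegative log-mel spectrogram (n_mels, frames)."""
    y, _ = librosa.load(path, sr=sr)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return np.log1p(S)

train_specs = [log_mel(f) for f in train_files]   # hypothetical file list
V = np.hstack(train_specs)                        # stack frames: (n_mels, total_frames)

# Learn a nonnegative dictionary over spectrogram frames.
nmf = NMF(n_components=128, init="nndsvda", max_iter=300)
nmf.fit(V.T)

def clip_feature(S):
    """Represent a clip by the time-averaged activations of the dictionary."""
    H = nmf.transform(S.T)            # (frames, n_components)
    return H.mean(axis=0)

X = np.stack([clip_feature(S) for S in train_specs])
clf = LogisticRegression(max_iter=1000).fit(X, train_labels)
```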

    A Hybrid Approach with Multi-channel I-Vectors and Convolutional Neural Networks for Acoustic Scene Classification

    In Acoustic Scene Classification (ASC), two major approaches have been followed: one utilizes engineered features such as mel-frequency cepstral coefficients (MFCCs), while the other uses learned features that are the outcome of an optimization algorithm. I-vectors are the result of a modeling technique that usually takes engineered features as input. It has been shown that standard MFCCs extracted from monaural audio signals lead to i-vectors that exhibit poor performance, especially on indoor acoustic scenes. At the same time, Convolutional Neural Networks (CNNs) are well known for their ability to learn features by optimizing their filters. They have been applied to ASC and have shown promising results. In this paper, we first propose a novel multi-channel i-vector extraction and scoring scheme for ASC, improving performance on indoor and outdoor scenes. Second, we propose a CNN architecture that achieves promising ASC results. Further, we show that i-vectors and CNNs capture complementary information from acoustic scenes. Finally, we propose a hybrid system for ASC that combines multi-channel i-vectors and CNNs through score fusion. Using our method, we participated in the ASC task of the DCASE-2016 challenge. Our hybrid approach achieved first rank among 49 submissions, substantially improving on the previous state of the art.
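    The late score-fusion step can be illustrated with a minimal sketch: per-class scores from the i-vector backend and the CNN are normalized and combined by a weighted sum. The z-normalization and the weight `alpha` are illustrative assumptions; the paper's exact fusion rule may differ.

```python
# Sketch of late score fusion between an i-vector system and a CNN.
# Both score matrices have shape (n_clips, n_classes); `alpha` is a
# hypothetical fusion weight, tunable on a validation set.
import numpy as np

def fuse_scores(ivector_scores, cnn_scores, alpha=0.5):
    """Weighted sum of z-normalized per-class scores; returns predicted classes."""
    def znorm(s):
        return (s - s.mean(axis=1, keepdims=True)) / (s.std(axis=1, keepdims=True) + 1e-9)
    fused = alpha * znorm(ivector_scores) + (1 - alpha) * znorm(cnn_scores)
    return fused.argmax(axis=1)   # predicted class index per clip
```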

    SubSpectralNet - Using Sub-Spectrogram based Convolutional Neural Networks for Acoustic Scene Classification

    Acoustic Scene Classification (ASC) is one of the core research problems in the field of Computational Sound Scene Analysis. In this work, we present SubSpectralNet, a novel model which captures discriminative features by incorporating frequency band-level differences to model soundscapes. Using mel-spectrograms, we propose the idea of taking band-wise crops of the input time-frequency representations and training a convolutional neural network (CNN) on them. We also propose a modification of the training method for more efficient learning of the CNN models. We first motivate the use of sub-spectrograms through intuitive and statistical analyses, and then develop a sub-spectrogram based CNN architecture for ASC. The system is evaluated on the public ASC development dataset provided for the "Detection and Classification of Acoustic Scenes and Events" (DCASE) 2018 Challenge. Our best model achieves an improvement of +14% in classification accuracy over the DCASE 2018 baseline system. Code and figures are available at https://github.com/ssrp/SubSpectralNet. Comment: Accepted to the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2019.
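    Below is a minimal sketch of the sub-spectrogram idea: split the mel axis into bands, run a small CNN on each crop, and merge the band-wise logits. The layer sizes and merge-by-averaging step are illustrative assumptions, not the paper's exact architecture (the authors' repository has the real one); it assumes PyTorch.

```python
# Sketch of a sub-spectrogram CNN: one small network per mel band, with
# band predictions merged by averaging (an illustrative choice).
import torch
import torch.nn as nn

class SubSpectralCNN(nn.Module):
    def __init__(self, n_mels=40, band_size=20, n_classes=10):
        super().__init__()
        self.band_size = band_size
        self.n_bands = n_mels // band_size
        self.band_nets = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(16, n_classes),
            )
            for _ in range(self.n_bands)
        ])

    def forward(self, x):                       # x: (batch, 1, n_mels, time)
        crops = x.split(self.band_size, dim=2)  # band-wise crops along the mel axis
        logits = [net(c) for net, c in zip(self.band_nets, crops)]
        return torch.stack(logits).mean(dim=0)  # merge band-wise predictions
```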

    A Compact and Discriminative Feature Based on Auditory Summary Statistics for Acoustic Scene Classification

    One of the biggest challenges of acoustic scene classification (ASC) is to find proper features to better represent and characterize environmental sounds. Environmental sounds generally involve many sound sources while exhibiting little structure in their temporal-spectral representations. However, the background of an acoustic scene exhibits temporal homogeneity in its acoustic properties, suggesting it could be characterized by distribution statistics rather than temporal details. In this work, we investigated using auditory summary statistics as features for ASC tasks. The inspiration comes from a recent neuroscience study, which shows that the human auditory system tends to perceive sound textures through time-averaged statistics. Based on these statistics, we further proposed using linear discriminant analysis to eliminate redundancies among the statistics while keeping the discriminative information, providing an extremely compact representation for acoustic scenes. Experimental results show the outstanding performance of the proposed feature over conventional handcrafted features. Comment: Accepted as a conference paper at Interspeech 2018.
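    A minimal sketch of this pipeline, assuming simple per-band moments (mean, standard deviation, skewness, kurtosis) as a stand-in for the paper's full set of auditory summary statistics, followed by LDA for compression; `train_specs` and `train_labels` are hypothetical placeholders.

```python
# Sketch of time-averaged statistics followed by LDA compression.
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def summary_stats(S):
    """S: (n_mels, n_frames) log-mel spectrogram -> fixed-length statistics vector."""
    return np.concatenate([
        S.mean(axis=1), S.std(axis=1), skew(S, axis=1), kurtosis(S, axis=1),
    ])

# LDA both classifies and compresses: with C classes the projected space has
# at most C - 1 dimensions, which is the compact representation described above.
X = np.stack([summary_stats(S) for S in train_specs])   # hypothetical spectrograms
lda = LinearDiscriminantAnalysis().fit(X, train_labels)
X_compact = lda.transform(X)                            # (n_clips, n_classes - 1)
```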

    DCASE 2018 Challenge Surrey Cross-Task convolutional neural network baseline

    The Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 challenge consists of five audio classification and sound event detection tasks: 1) Acoustic scene classification, 2) General-purpose audio tagging of Freesound, 3) Bird audio detection, 4) Weakly-labeled semi-supervised sound event detection and 5) Multi-channel audio classification. In this paper, we create a cross-task baseline system for all five tasks based on a convolutional neural network (CNN): a "CNN Baseline" system. We implemented CNNs with 4 layers and 8 layers, originating from AlexNet and VGG in computer vision, and investigated how performance varies from task to task with the same network configuration. Experiments show that the deeper 8-layer CNN performs better than the 4-layer CNN on all tasks except Task 1. Using the 8-layer CNN, we achieve an accuracy of 0.680 on Task 1, an accuracy of 0.895 and a mean average precision (MAP) of 0.928 on Task 2, an accuracy of 0.751 and an area under the curve (AUC) of 0.854 on Task 3, a sound event detection F1 score of 20.8% on Task 4, and an F1 score of 87.75% on Task 5. We released the Python source code of the baseline systems under the MIT license for further research. Comment: Accepted by the DCASE 2018 Workshop. 4 pages. Source code available.
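    A minimal sketch of what a VGG-style 4-layer CNN baseline on log-mel input could look like, with illustrative channel counts; the authors' released source code defines the actual layers. Assumes PyTorch.

```python
# Sketch of a VGG-style 4-layer CNN on log-mel spectrograms.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    """Conv -> BatchNorm -> ReLU -> 2x2 max pooling, as in VGG-style stacks."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1),
        nn.BatchNorm2d(c_out), nn.ReLU(),
        nn.MaxPool2d(2),
    )

class CNN4(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 64), conv_block(64, 128),
            conv_block(128, 256), conv_block(256, 512),
        )
        self.head = nn.Linear(512, n_classes)

    def forward(self, x):              # x: (batch, 1, mels, time)
        h = self.features(x)
        h = h.mean(dim=(2, 3))         # global average pooling over time-frequency
        return self.head(h)
```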