Leveraging deep neural networks with nonnegative representations for improved environmental sound classification
This paper introduces the use of representations based on non-negative matrix factorization (NMF) to train deep neural networks for environmental sound classification. Deep learning systems for sound classification usually rely on the network to learn meaningful representations from spectrograms or hand-crafted features. Instead, we introduce an NMF-based feature learning stage before training deep networks, which proves particularly useful for multi-source acoustic environments such as sound scenes. We rely on two established NMF techniques, one unsupervised and one supervised, to learn better input representations for deep neural networks. This allows us, with simple architectures, to reach performance competitive with more complex systems such as convolutional networks for acoustic scene classification. The proposed systems outperform neural networks trained on time-frequency representations on two acoustic scene classification datasets, as well as the best systems from the 2016 DCASE challenge.
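The NMF feature-learning stage described above can be sketched as follows, using scikit-learn and a synthetic spectrogram in place of real audio (the paper's own implementation details, component count, and pooling choice are not given here, so those are assumptions):

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical stand-in for a magnitude spectrogram: 60 frames x 40 bands.
rng = np.random.default_rng(0)
V = rng.random((60, 40))

# Unsupervised NMF: V ~ W H. The per-frame activations W over the learned
# spectral dictionary H serve as the input representation for the network.
nmf = NMF(n_components=8, init="nndsvda", max_iter=400, random_state=0)
W = nmf.fit_transform(V)      # (frames, K) activations
H = nmf.components_           # (K, bands) spectral dictionary

# Simple clip-level feature: average the activations over time.
clip_feature = W.mean(axis=0)
print(clip_feature.shape)     # (8,)
```

The nonnegativity of W and H is what makes the decomposition interpretable as additive sound sources, which is why it suits multi-source scenes.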
A Hybrid Approach with Multi-channel I-Vectors and Convolutional Neural Networks for Acoustic Scene Classification
In Acoustic Scene Classification (ASC), two major approaches have been
followed. While one utilizes engineered features such as
mel-frequency-cepstral-coefficients (MFCCs), the other uses learned features
that are the outcome of an optimization algorithm. I-vectors are the result of
a modeling technique that usually takes engineered features as input. It has
been shown that standard MFCCs extracted from monaural audio signals lead to
i-vectors that exhibit poor performance, especially on indoor acoustic scenes.
At the same time, Convolutional Neural Networks (CNNs) are well known for their
ability to learn features by optimizing their filters. They have been applied
on ASC and have shown promising results. In this paper, we first propose a
novel multi-channel i-vector extraction and scoring scheme for ASC, improving
their performance on indoor and outdoor scenes. Second, we propose a CNN
architecture that achieves promising ASC results. Further, we show that
i-vectors and CNNs capture complementary information from acoustic scenes.
Finally, we propose a hybrid system for ASC using multi-channel i-vectors and
CNNs by utilizing a score fusion technique. Using our method, we participated
in the ASC task of the DCASE-2016 challenge. Our hybrid approach achieved 1st
rank among 49 submissions, substantially improving the previous state of the
art.
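The score-fusion step of the hybrid system can be illustrated with a minimal sketch. The fusion weight and the toy scores below are assumptions; the real subsystems produce calibrated per-class scores, and the weight would be tuned on a held-out fold:

```python
import numpy as np

# Hypothetical per-class scores from the two subsystems for one clip
# (15 scene classes, as in DCASE 2016).
rng = np.random.default_rng(1)
ivector_scores = rng.random(15)
cnn_scores = rng.random(15)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Late fusion: convex combination of the normalised subsystem scores.
# alpha = 0.5 is an illustrative choice, not the paper's value.
alpha = 0.5
fused = alpha * softmax(ivector_scores) + (1 - alpha) * softmax(cnn_scores)
prediction = int(np.argmax(fused))
```

Because the two subsystems capture complementary information, the fused scores can rank the correct scene higher than either subsystem alone.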
A Compact and Discriminative Feature Based on Auditory Summary Statistics for Acoustic Scene Classification
One of the biggest challenges of acoustic scene classification (ASC) is to
find proper features to better represent and characterize environmental sounds.
Environmental sounds generally involve many sound sources while exhibiting
little structure in temporal-spectral representations. However, the background of an
acoustic scene exhibits temporal homogeneity in acoustic properties, suggesting
it could be characterized by distribution statistics rather than temporal
details. In this work, we investigated using auditory summary statistics as the
feature for ASC tasks. The inspiration comes from a recent neuroscience study,
which shows the human auditory system tends to perceive sound textures through
time-averaged statistics. Based on these statistics, we further proposed using
linear discriminant analysis to eliminate redundancies among them while keeping
the discriminative information, providing an extremely compact representation
for acoustic scenes. Experimental results show the outstanding performance of
the proposed feature over conventional handcrafted features.
Comment: Accepted as a conference paper of Interspeech 201
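The pipeline of time-averaged statistics followed by LDA can be sketched on synthetic data. The particular statistics (per-band mean, standard deviation, skewness) and the corpus sizes below are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)

def summary_statistics(spec):
    """Time-averaged statistics of a (frames x bands) spectrogram:
    per-band mean, std, and skewness, concatenated into one vector."""
    mu = spec.mean(axis=0)
    sd = spec.std(axis=0) + 1e-9
    skew = (((spec - mu) / sd) ** 3).mean(axis=0)
    return np.concatenate([mu, sd, skew])

# Toy corpus: 40 clips from 4 scene classes, each 100 frames x 20 bands.
X = np.stack([summary_statistics(rng.random((100, 20))) for _ in range(40)])
y = np.repeat(np.arange(4), 10)

# LDA projects the 60-dim statistics down to at most C-1 = 3 dimensions,
# discarding directions that do not separate the classes.
lda = LinearDiscriminantAnalysis(n_components=3)
Z = lda.fit_transform(X, y)
print(Z.shape)  # (40, 3)
```

The compactness comes from the LDA bound: with C classes, at most C-1 discriminative dimensions survive, regardless of how many statistics were pooled.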
A database and challenge for acoustic scene classification and event detection
Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
Sound events often occur in unstructured environments where they exhibit wide
variations in their frequency content and temporal structure. Convolutional
neural networks (CNN) are able to extract higher level features that are
invariant to local spectral and temporal variations. Recurrent neural networks
(RNNs) are powerful in learning the longer term temporal context in the audio
signals. CNNs and RNNs as classifiers have recently shown improved performance
over established methods in various sound recognition tasks. We combine these
two approaches in a Convolutional Recurrent Neural Network (CRNN) and apply it
to a polyphonic sound event detection task. We compare the performance of the
proposed CRNN method with CNN, RNN, and other established methods, and observe
a considerable improvement for four different datasets consisting of everyday
sound events.
Comment: Accepted for IEEE Transactions on Audio, Speech and Language
Processing, Special Issue on Sound Scene and Event Analysis
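A minimal CRNN in the spirit of the architecture described above can be sketched in PyTorch. All layer sizes here are illustrative assumptions, not the paper's configuration; the point is the dataflow: convolution for local invariance, a recurrent layer for temporal context, and per-frame sigmoid outputs for overlapping (polyphonic) events:

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Minimal CRNN sketch for polyphonic sound event detection:
    a conv block extracts locally shift-invariant spectral features,
    a GRU models longer-term temporal context, and a per-frame sigmoid
    head emits multi-label event activities."""
    def __init__(self, n_mels=40, n_events=6):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d((1, 4)),          # pool frequency only; keep time
        )
        self.gru = nn.GRU(16 * (n_mels // 4), 32, batch_first=True)
        self.head = nn.Linear(32, n_events)

    def forward(self, x):                   # x: (batch, 1, time, mels)
        z = self.conv(x)                    # (batch, 16, time, mels/4)
        b, c, t, f = z.shape
        z = z.permute(0, 2, 1, 3).reshape(b, t, c * f)
        z, _ = self.gru(z)                  # (batch, time, 32)
        return torch.sigmoid(self.head(z))  # per-frame event probabilities

model = CRNN()
out = model(torch.randn(2, 1, 100, 40))    # 2 clips, 100 frames, 40 bands
print(out.shape)  # torch.Size([2, 100, 6])
```

Sigmoid (rather than softmax) outputs are what make the detection polyphonic: each event class is scored independently per frame, so several events can be active at once.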