333 research outputs found
Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
Sound events often occur in unstructured environments where they exhibit wide
variations in their frequency content and temporal structure. Convolutional
neural networks (CNN) are able to extract higher level features that are
invariant to local spectral and temporal variations. Recurrent neural networks
(RNNs) are powerful in learning the longer term temporal context in the audio
signals. CNNs and RNNs as classifiers have recently shown improved performances
over established methods in various sound recognition tasks. We combine these
two approaches in a Convolutional Recurrent Neural Network (CRNN) and apply it
on a polyphonic sound event detection task. We compare the performance of the
proposed CRNN method with CNN, RNN, and other established methods, and observe
a considerable improvement for four different datasets consisting of everyday
sound events.Comment: Accepted for IEEE Transactions on Audio, Speech and Language
Processing, Special Issue on Sound Scene and Event Analysi
Mosquito Detection with Neural Networks: The Buzz of Deep Learning
Many real-world time-series analysis problems are characterised by scarce
data. Solutions typically rely on hand-crafted features extracted from the time
or frequency domain allied with classification or regression engines which
condition on this (often low-dimensional) feature vector. The huge advances
enjoyed by many application domains in recent years have been fuelled by the
use of deep learning architectures trained on large data sets. This paper
presents an application of deep learning for acoustic event detection in a
challenging, data-scarce, real-world problem. Our candidate challenge is to
accurately detect the presence of a mosquito from its acoustic signature. We
develop convolutional neural networks (CNNs) operating on wavelet
transformations of audio recordings. Furthermore, we interrogate the network's
predictive power by visualising statistics of network-excitatory samples. These
visualisations offer a deep insight into the relative informativeness of
components in the detection problem. We include comparisons with conventional
classifiers, conditioned on both hand-tuned and generic features, to stress the
strength of automatic deep feature learning. Detection is achieved with
performance metrics significantly surpassing those of existing algorithmic
methods, as well as marginally exceeding those attained by individual human
experts.Comment: For data and software related to this paper, see
http://humbug.ac.uk/kiskin2017/. Submitted as a conference paper to ECML 201
- …