Wavelet-based birdsong recognition for conservation : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Palmerston North, New Zealand
Listed in 2017 Dean's List of Exceptional Theses

According to the International Union for Conservation of Nature Red List,
nearly a quarter of the world's bird species are either threatened or at risk of extinction.
To be able to protect endangered species, we need accurate survey methods that reliably
estimate numbers and hence population trends. Acoustic monitoring is the most
commonly-used method to survey birds, particularly cryptic and nocturnal species,
not least because it is non-invasive, unbiased, and relatively time-effective. Unfortunately,
the resulting data still have to be analysed manually. The current practice,
manual spectrogram reading, is tedious, prone to bias due to observer variations, and
not reproducible.
While there is a large literature on automatic recognition of targeted recordings of
small numbers of species, automatic analysis of long field recordings has not been well
studied to date. This thesis considers this problem in detail, presenting experiments
demonstrating the true efficacy of recorders in natural environments under different
conditions, and then working to reduce the noise present in the recording, as well as to
segment and recognise a range of New Zealand native bird species.
The primary issues with field recordings are that the birds are at variable distances
from the recorder, that the recordings are corrupted by many different forms of noise,
that the environment affects the quality of the recorded sound, and that birdsong is
often relatively rare within a recording. Thus, methods of dealing with faint calls,
denoising, and effective segmentation are all needed before individual species can be
recognised reliably. Experiments presented in this thesis demonstrate clearly the effects
of distance and environment on recorded calls. Some of these results are unsurprising,
for example, sound attenuation largely follows an inverse-square relationship with distance. Perhaps more
surprising is that the height from which a call is transmitted has a significant effect on
the recorded sound. Statistical analyses of the experiments, which demonstrate many
significant environmental and sound factors, are presented.
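The inverse-square relationship mentioned above can be illustrated with a short, hypothetical calculation (a sketch, not the thesis's actual analysis): under free-field spherical spreading, received intensity falls with the square of distance, so the sound pressure level drops by about 6 dB for every doubling of distance.

```python
import numpy as np

def level_drop_db(r_near, r_far):
    """dB attenuation between two distances under the inverse-square law.

    Intensity ~ 1/r^2, so the level difference in decibels is
    10*log10((r_far/r_near)^2) = 20*log10(r_far/r_near).
    """
    return 20.0 * np.log10(r_far / r_near)

# Doubling the distance costs ~6 dB regardless of the starting distance.
print(round(level_drop_db(10.0, 20.0), 2))  # -> 6.02
```

In practice, the thesis's experiments show that vegetation, height, and other environmental factors cause recorded calls to deviate from this idealised free-field prediction.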
Regardless of these factors, the recordings have noise present, and removing this
noise is helpful for reliable recognition. A method for denoising based on the wavelet
packet decomposition is presented and demonstrated to significantly improve the quality
of recordings. Following this, wavelets were also used to implement a call detection
algorithm that identifies regions of the recording with calls from a target bird species.
This algorithm is validated using four New Zealand native species, namely Australasian
bittern (Botaurus poiciloptilus), brown kiwi (Apteryx mantelli), morepork (Ninox novaeseelandiae),
and kakapo (Strigops habroptilus), but could be used for any species.
The results demonstrate high recall rates and tolerable false positive rates when compared
to human experts.
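The general idea behind wavelet-based denoising can be sketched in a few lines: decompose the signal, shrink the small (noise-dominated) coefficients, and reconstruct. The minimal sketch below uses a single-level Haar transform with soft thresholding in NumPy; the thesis itself uses the wavelet packet decomposition, which this deliberately does not reproduce.

```python
import numpy as np

def haar_dwt(x):
    """Single-level Haar transform of an even-length signal."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)  # approximation (low-pass)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)  # detail (high-pass)
    return a, d

def haar_idwt(a, d):
    """Exact inverse of haar_dwt."""
    x = np.empty(a.size * 2)
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def soft_threshold(c, t):
    """Shrink coefficients toward zero by t; small ones become exactly zero."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def denoise(x, threshold):
    """Threshold only the detail coefficients, where broadband noise lives."""
    a, d = haar_dwt(x)
    return haar_idwt(a, soft_threshold(d, threshold))
```

With the threshold set to zero the round trip is lossless; raising it suppresses low-energy detail coefficients while keeping the signal's coarse structure.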
Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning
Mosquito Detection with Neural Networks: The Buzz of Deep Learning
Many real-world time-series analysis problems are characterised by scarce
data. Solutions typically rely on hand-crafted features extracted from the time
or frequency domain allied with classification or regression engines which
condition on this (often low-dimensional) feature vector. The huge advances
enjoyed by many application domains in recent years have been fuelled by the
use of deep learning architectures trained on large data sets. This paper
presents an application of deep learning for acoustic event detection in a
challenging, data-scarce, real-world problem. Our candidate challenge is to
accurately detect the presence of a mosquito from its acoustic signature. We
develop convolutional neural networks (CNNs) operating on wavelet
transformations of audio recordings. Furthermore, we interrogate the network's
predictive power by visualising statistics of network-excitatory samples. These
visualisations offer a deep insight into the relative informativeness of
components in the detection problem. We include comparisons with conventional
classifiers, conditioned on both hand-tuned and generic features, to stress the
strength of automatic deep feature learning. Detection is achieved with
performance metrics significantly surpassing those of existing algorithmic
methods, as well as marginally exceeding those attained by individual human
experts.

Comment: For data and software related to this paper, see
http://humbug.ac.uk/kiskin2017/. Submitted as a conference paper to ECML 201
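The core operation inside a CNN applied to wavelet transformations of audio can be illustrated with a rough, hypothetical sketch: a minimal "valid" 2D cross-correlation of a learned kernel over a time-frequency array (this is an illustration in NumPy, not the authors' actual architecture).

```python
import numpy as np

def conv2d_valid(tf_image, kernel):
    """Naive 'valid'-mode 2D cross-correlation over a time-frequency
    representation (frequency bins x time frames), the basic building
    block of a convolutional layer."""
    H, W = tf_image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Dot product of the kernel with one local time-frequency patch.
            out[i, j] = np.sum(tf_image[i:i + kh, j:j + kw] * kernel)
    return out
```

A real detector stacks many such layers (with nonlinearities and pooling) and learns the kernels from data, which is what lets deep feature learning outperform hand-tuned features in the data-scarce setting the abstract describes.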
Differentiable Time-Frequency Scattering on GPU
Joint time-frequency scattering (JTFS) is a convolutional operator in the
time-frequency domain which extracts spectrotemporal modulations at various
rates and scales. It offers an idealized model of spectrotemporal receptive
fields (STRF) in the primary auditory cortex, and thus may serve as a
biologically plausible surrogate for human perceptual judgments at the scale of
isolated audio events. Yet, prior implementations of JTFS and STRF have
remained outside of the standard toolkit of perceptual similarity measures and
evaluation methods for audio generation. We trace this issue down to three
limitations: differentiability, speed, and flexibility. In this paper, we
present an implementation of time-frequency scattering in Python. Unlike prior
implementations, ours accommodates NumPy, PyTorch, and TensorFlow as backends
and is thus portable on both CPU and GPU. We demonstrate the usefulness of JTFS
via three applications: unsupervised manifold learning of spectrotemporal
modulations, supervised classification of musical instruments, and texture
resynthesis of bioacoustic sounds.

Comment: 8 pages, 6 figures. Submitted to the International Conference on
Digital Audio Effects (DAFX) 202
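The multi-backend portability described in this abstract can be sketched with a hypothetical dispatch pattern: the handful of array operations a scattering-style step needs is resolved from a per-backend table, so the same high-level code could run on NumPy, PyTorch, or TensorFlow arrays. Only the NumPy entry is populated below; this is an illustration of the idea, not the paper's actual API.

```python
import numpy as np

# Hypothetical backend table: each backend supplies the same small set of
# array ops. PyTorch/TensorFlow entries would map to torch.fft.fft etc.
BACKENDS = {
    "numpy": {"fft": np.fft.fft, "ifft": np.fft.ifft, "abs": np.abs},
}

def lowpass_modulus(x, backend="numpy"):
    """One scattering-style step, written against the backend table:
    crude low-pass filtering in the Fourier domain, then the modulus."""
    ops = BACKENDS[backend]
    X = ops["fft"](x)
    X[len(X) // 4:] = 0.0          # keep only the lowest frequency bins
    return ops["abs"](ops["ifft"](X))
```

Because every numerical call goes through the table, swapping the backend (and hence CPU vs. GPU execution, or automatic differentiation) requires no change to the transform's own code, which is the portability property the abstract emphasises.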