Robust sound event detection in bioacoustic sensor networks
Bioacoustic sensors, sometimes known as autonomous recording units (ARUs),
can record sounds of wildlife over long periods of time in scalable and
minimally invasive ways. Deriving per-species abundance estimates from these
sensors requires detection, classification, and quantification of animal
vocalizations as individual acoustic events. Yet, variability in ambient noise,
both over time and across sensors, hinders the reliability of current automated
systems for sound event detection (SED), such as convolutional neural networks
(CNN) in the time-frequency domain. In this article, we develop, benchmark, and
combine several machine listening techniques to improve the generalizability of
SED models across heterogeneous acoustic environments. As a case study, we
consider the problem of detecting avian flight calls in ten hours of nocturnal
bird migration audio, captured by a network of six ARUs in the presence
of heterogeneous background noise. Starting from a CNN yielding
state-of-the-art accuracy on this task, we introduce two noise adaptation
techniques, respectively integrating short-term (60 milliseconds) and long-term
(30 minutes) context. First, we apply per-channel energy normalization (PCEN)
in the time-frequency domain, which applies short-term automatic gain control
to every subband in the mel-frequency spectrogram. Second, we replace the
last dense layer in the network with a context-adaptive neural network (CA-NN)
layer. Combining them yields state-of-the-art results that are unmatched by
artificial data augmentation alone. We release a pre-trained version of our
best performing system under the name of BirdVoxDetect, a ready-to-use detector
of avian flight calls in field recordings.Comment: 32 pages, in English. Submitted to PLOS ONE journal in February 2019;
revised August 2019; published October 201
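As a minimal sketch of the short-term noise adaptation described above, the PCEN transform can be written directly from its published formula, PCEN = (E / (eps + M)^alpha + delta)^r - delta^r, where M is a first-order IIR smoothing of the mel-band energies E along time. The parameter values below are common defaults, not necessarily those used by BirdVoxDetect, and the input is a toy spectrogram rather than a real field recording.

```python
import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a (bands x frames) spectrogram."""
    M = np.empty_like(E)
    M[:, 0] = E[:, 0]
    # First-order IIR smoother: the short-term automatic gain control per subband.
    for t in range(1, E.shape[1]):
        M[:, t] = (1 - s) * M[:, t - 1] + s * E[:, t]
    # Divisive normalization followed by root compression.
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r

# Toy mel-spectrogram: 64 bands x 100 frames of nonnegative energies.
E = np.abs(np.random.default_rng(0).standard_normal((64, 100)))
P = pcen(E)
print(P.shape)  # (64, 100)
```

In practice a library implementation such as librosa's `pcen` function would be used instead of this loop; the sketch only illustrates how the smoothed energy M acts as a per-band gain control.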
Audio Event Detection using Weakly Labeled Data
Acoustic event detection is essential for content analysis and description of
multimedia recordings. The majority of current literature on the topic learns
the detectors through fully-supervised techniques employing strongly labeled
data. However, the labels available for the majority of multimedia data are
generally weak and do not provide sufficient detail for such methods to be
employed. In this paper we propose a framework for learning acoustic event
detectors using only weakly labeled data. We first show that audio event
detection using weak labels can be formulated as a Multiple Instance Learning
(MIL) problem. We then suggest two frameworks for solving the MIL problem, one
based on support vector machines and the other on neural networks. The
proposed methods can remove the time-consuming and expensive process of
manually annotating data that fully supervised learning requires. Moreover,
they not only detect events in a recording but also provide the temporal
locations of those events. This yields a more complete description of the
recording and is notable because temporal information is absent from weakly
labeled data in the first place.
Comment: ACM Multimedia 201
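The MIL formulation above can be illustrated with a toy example: a weak, recording-level label constrains only the maximum over per-segment detector scores (the classic max-pooling MIL assumption). The array shapes and loss below are illustrative, not the paper's exact models.

```python
import numpy as np

rng = np.random.default_rng(1)
# 4 recordings (bags) x 10 segments (instances): per-segment detector scores in [0, 1].
segment_scores = rng.uniform(size=(4, 10))
# Bag-level prediction: a recording is positive if any segment is positive.
bag_scores = segment_scores.max(axis=1)
weak_labels = np.array([1, 0, 1, 0])  # recording-level (weak) annotations
# Binary cross-entropy between bag predictions and weak labels.
eps = 1e-9
loss = -np.mean(weak_labels * np.log(bag_scores + eps)
                + (1 - weak_labels) * np.log(1 - bag_scores + eps))
print(float(loss) > 0)
```

Minimizing such a loss over segment scores is what lets a weakly supervised detector recover temporal locations: the segments that drive the bag maximum are the predicted event positions.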
Using Computer Vision And Volunteer Computing To Analyze Avian Nesting Patterns And Reduce Scientist Workload
This paper examines the use of feature detection and background subtraction algorithms to classify and detect events of interest within uncontrolled outdoor avian nesting video from the Wildlife@Home project. We tested feature detection using Speeded Up Robust Features (SURF) with a Support Vector Machine (SVM), as well as four background subtraction algorithms, Mixture of Gaussians (MOG), Running Gaussian Average (AccAvg), ViBe, and Pixel-Based Adaptive Segmentation (PBAS), as methods to automatically detect and classify events from surveillance cameras. AccAvg and a modified PBAS are shown to provide robust results and to compensate for the cryptic coloration of the monitored species. Both methods use the Berkeley Open Infrastructure for Network Computing (BOINC) to provide the computing resources needed to analyze the 68,000+ hours of video in the Wildlife@Home project in a reasonable amount of time. The feature detection technique failed to handle the many challenges found in the low-quality uncontrolled outdoor video, whereas the background subtraction work with AccAvg and the modified version of PBAS provides more accurate detection of events.
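A running-Gaussian-average background model of the kind named above (AccAvg) can be sketched in a few lines: each pixel keeps an exponentially smoothed mean and variance, and pixels far from the model are flagged as foreground. The learning rate `alpha` and threshold `k` below are illustrative parameters, not the paper's tuned values, and the frames are synthetic stand-ins for nest-camera video.

```python
import numpy as np

def accavg_foreground(frames, alpha=0.05, k=2.5):
    """Per-pixel running Gaussian average; returns a foreground mask per frame."""
    mean = frames[0].astype(float)
    var = np.full_like(mean, 25.0)  # initial variance guess
    masks = []
    for f in frames[1:]:
        f = f.astype(float)
        # A pixel is foreground when it deviates from the model by > k sigma.
        fg = np.abs(f - mean) > k * np.sqrt(var)
        masks.append(fg)
        # Update the model only where the pixel still looks like background.
        mean = np.where(fg, mean, (1 - alpha) * mean + alpha * f)
        var = np.where(fg, var, (1 - alpha) * var + alpha * (f - mean) ** 2)
    return masks

# 20 static frames let the model settle, then one frame with a bright intruder.
frames = [np.full((60, 80), 100, np.uint8) for _ in range(20)]
moving = frames[-1].copy()
moving[20:40, 30:50] = 200  # a bird-sized bright region
masks = accavg_foreground(frames + [moving])
print(bool(masks[-1][20:40, 30:50].all()))  # True: region flagged as foreground
```

Selective updating (skipping foreground pixels) is what keeps a slow-moving, cryptically colored animal from being absorbed into the background model.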
Acoustic event detection for multiple overlapping similar sources
Many current paradigms for acoustic event detection (AED) are not adapted to
the organic variability of natural sounds, and/or they assume a limit on the
number of simultaneous sources: often only one source, or one source of each
type, may be active. These aspects are highly undesirable for applications such
as bird population monitoring. We introduce a simple method modelling the
onsets, durations and offsets of acoustic events to avoid intrinsic limits on
polyphony or on inter-event temporal patterns. We evaluate the method in a case
study with over 3000 zebra finch calls. In comparison against an HMM-based
method, we find it more accurate at recovering acoustic events and more robust
for estimating calling rates.
Comment: Accepted for WASPAA 201
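The basic output representation this kind of method works with, events as explicit (onset, offset) pairs with no cap on how many may overlap, can be sketched by segmenting a per-frame activation curve. The threshold and frame hop below are illustrative, not the paper's model.

```python
import numpy as np

def extract_events(activation, threshold=0.5, hop_s=0.01):
    """Convert a per-frame activation curve into (onset_s, offset_s) events."""
    active = activation > threshold
    edges = np.diff(active.astype(int))
    onsets = np.flatnonzero(edges == 1) + 1    # rising edges
    offsets = np.flatnonzero(edges == -1) + 1  # falling edges
    if active[0]:                              # event already active at t = 0
        onsets = np.r_[0, onsets]
    if active[-1]:                             # event still active at the end
        offsets = np.r_[offsets, active.size]
    return [(on * hop_s, off * hop_s) for on, off in zip(onsets, offsets)]

act = np.zeros(100)
act[10:20] = 0.9  # one call
act[50:55] = 0.8  # another call
events = extract_events(act)
print(len(events))  # 2
```

A polyphonic detector would produce one such curve per source (or per call type), so the event list imposes no limit on simultaneous sources, which is the property the abstract emphasizes.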