Robust sound event detection in bioacoustic sensor networks
Bioacoustic sensors, sometimes known as autonomous recording units (ARUs),
can record sounds of wildlife over long periods of time in scalable and
minimally invasive ways. Deriving per-species abundance estimates from these
sensors requires detection, classification, and quantification of animal
vocalizations as individual acoustic events. Yet, variability in ambient noise,
both over time and across sensors, hinders the reliability of current automated
systems for sound event detection (SED), such as convolutional neural networks
(CNN) in the time-frequency domain. In this article, we develop, benchmark, and
combine several machine listening techniques to improve the generalizability of
SED models across heterogeneous acoustic environments. As a case study, we
consider the problem of detecting avian flight calls from a ten-hour recording
of nocturnal bird migration, recorded by a network of six ARUs in the presence
of heterogeneous background noise. Starting from a CNN yielding
state-of-the-art accuracy on this task, we introduce two noise adaptation
techniques, respectively integrating short-term (60 milliseconds) and long-term
(30 minutes) context. First, we apply per-channel energy normalization (PCEN)
in the time-frequency domain, which applies short-term automatic gain control
to every subband in the mel-frequency spectrogram. Second, we replace the
last dense layer in the network with a context-adaptive neural network (CA-NN)
layer. Combining them yields state-of-the-art results that are unmatched by
artificial data augmentation alone. We release a pre-trained version of our
best performing system under the name of BirdVoxDetect, a ready-to-use detector
of avian flight calls in field recordings.
Comment: 32 pages, in English. Submitted to PLOS ONE journal in February 2019;
revised August 2019; published October 201
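Per-channel energy normalization can be sketched in a few lines. The block below is a minimal pure-Python illustration of the commonly used PCEN formulation, in which a first-order low-pass filter M(t, f) = (1 − s)·M(t − 1, f) + s·E(t, f) tracks the energy E of each subband and the output is (E / (ε + M)^α + δ)^r − δ^r. The function name `pcen` and all parameter values (`s`, `alpha`, `delta`, `r`, `eps`) are illustrative assumptions, not the settings used in BirdVoxDetect. In particular, how `s` maps onto the 60-millisecond time scale quoted above depends on the spectrogram hop size, which the abstract does not state.

```python
from typing import List


def pcen(
    energy: List[List[float]],
    s: float = 0.025,     # smoothing coefficient (assumed value)
    alpha: float = 0.98,  # gain normalization exponent (assumed value)
    delta: float = 2.0,   # offset before root compression (assumed value)
    r: float = 0.5,       # root compression exponent (assumed value)
    eps: float = 1e-6,    # avoids division by zero
) -> List[List[float]]:
    """Per-channel energy normalization of a (frames x subbands) energy matrix."""
    if not energy:
        return []
    n_bands = len(energy[0])
    # Smoothed energy M(t, f), initialized with the first frame (an assumption).
    smooth = list(energy[0])
    out: List[List[float]] = []
    for frame in energy:
        row = []
        for f in range(n_bands):
            # First-order IIR low-pass filter along time, independently per subband.
            smooth[f] = (1.0 - s) * smooth[f] + s * frame[f]
            # Automatic gain control: divide by the smoothed energy raised to alpha.
            agc = frame[f] / (eps + smooth[f]) ** alpha
            # Root compression with offset delta, shifted so that zero input maps to zero.
            row.append((agc + delta) ** r - delta ** r)
        out.append(row)
    return out


# Steady background: subband 0 carries 100 times the energy of subband 1.
steady = pcen([[100.0, 1.0]] * 50)
# Quiet subband followed by a sudden 50-fold burst.
burst = pcen([[1.0]] * 20 + [[50.0]])
```

With the steady input, the two normalized outputs settle within about 10% of each other, whereas the sudden burst still stands out by roughly an order of magnitude over the preceding frames. Stationary loudness is divided out while transients are kept, which is the short-term gain control the abstract refers to.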
Semi-automatic semantic enrichment of raw sensor data
Sensor devices are one of the more recent sources of large volumes of generated data, with dedicated sensing equipment used to monitor events and happenings in a wide range of domains, including human biometrics. In recent trials examining the effects that key moments in movies have on the human body, we fitted participants with a number of biometric sensor devices and monitored them as they watched a range of different movies in groups. The purpose of these experiments was to examine the correlation between the highlights of a movie as observed through viewers' biometric sensor readings and the highlights of the same movie as identified by our automatic movie analysis techniques. However, the problem with this type of experiment is that neither the output of the video stream analysis nor the sensor data readings are directly usable in their raw form, because of the sheer volume of low-level data values generated by both the sensors and the movie analysis. This work describes the semi-automated enrichment of both video analysis and sensor data, and the mechanism used to query the data both in centralised environments and in a peer-to-peer architecture, for when the number of sensor devices grows large. We present and validate a scalable means of semi-automating the semantic enrichment of sensor data, thereby providing a means of large-scale sensor management.
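As a hypothetical illustration of what semantic enrichment of raw sensor data can look like, the sketch below lifts low-level readings into subject-predicate-object triples that can then be queried. Everything here is an assumption for illustration: the `Reading` type, the `enrich` and `events_of_type` functions, the rule that flags a reading exceeding 1.2 times the sensor's mean as a `HighArousal` event, and the `ex:` vocabulary. None of it is taken from the work described above, which does not detail its enrichment rules.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple

# A (subject, predicate, object) statement, RDF-style.
Triple = Tuple[str, str, str]


@dataclass
class Reading:
    sensor_id: str
    timestamp: float  # seconds since the start of the screening
    value: float      # e.g. heart rate in beats per minute


def enrich(readings: List[Reading], threshold: float = 1.2) -> List[Triple]:
    """Hypothetical rule: label readings above threshold x the sensor's mean."""
    by_sensor: Dict[str, List[Reading]] = {}
    for r in readings:
        by_sensor.setdefault(r.sensor_id, []).append(r)

    triples: List[Triple] = []
    for sensor_id, rs in by_sensor.items():
        baseline = mean(r.value for r in rs)  # per-sensor baseline
        for r in rs:
            if r.value > threshold * baseline:
                event = f"event:{sensor_id}@{r.timestamp:g}"
                triples.append((event, "rdf:type", "ex:HighArousal"))
                triples.append((event, "ex:observedBy", f"sensor:{sensor_id}"))
                triples.append((event, "ex:atTime", f"{r.timestamp:g}"))
    return triples


def events_of_type(triples: List[Triple], cls: str) -> List[str]:
    """Query: subjects declared to be of the given class."""
    return [s for (s, p, o) in triples if p == "rdf:type" and o == cls]
```

The design point is that queries then run over a handful of labeled events rather than over every raw sample, which is what makes the low-level data volume manageable.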
DeepCough: A Deep Convolutional Neural Network in A Wearable Cough Detection System
In this paper, we present a system that employs a wearable acoustic sensor
and a deep convolutional neural network for detecting coughs. We evaluate the
performance of our system on 14 healthy volunteers and compare it to that of
other cough detection systems that have been reported in the literature.
Experimental results show that our system achieves a classification sensitivity
of 95.1% and a specificity of 99.5%.
Comment: BioCAS-201
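The two figures reported above follow the standard definitions of sensitivity (true positive rate) and specificity (true negative rate). The sketch below spells them out; the counts are hypothetical, chosen only so that the arithmetic reproduces the reported percentages, and are not the study's confusion matrix.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of real coughs that the detector flags."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: share of non-cough sounds that the detector rejects."""
    return tn / (tn + fp)


# Hypothetical counts, chosen only to reproduce the reported percentages.
print(f"sensitivity = {sensitivity(tp=951, fn=49):.1%}")  # sensitivity = 95.1%
print(f"specificity = {specificity(tn=995, fp=5):.1%}")   # specificity = 99.5%
```

Reporting both numbers matters for a detector that runs continuously: a high specificity keeps false alarms rare over long stretches of non-cough audio, while sensitivity measures how many actual coughs are caught.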
Deep Neural Networks for the Recognition and Classification of Heart Murmurs Using Neuromorphic Auditory Sensors
Auscultation is one of the most widely used techniques for
detecting cardiovascular diseases, which are among the main causes
of death in the world. Heart murmurs are the most common abnormal
finding when a patient visits the physician for auscultation.
Murmurs can be either innocent, and therefore harmless, or
abnormal, which may be a sign of a more serious heart condition.
However, the auscultation accuracy of primary care physicians and
even expert cardiologists is not high enough to avoid a substantial
number of both type-I errors (healthy patients are sent for an
echocardiogram) and type-II errors (pathological patients are sent
home without medication or treatment). In this paper, the authors
present a novel convolutional-neural-network-based tool for
distinguishing healthy people from pathological patients using a neuromorphic
auditory sensor for FPGA that is able to decompose the audio into
frequency bands in real time. For this purpose, different networks
have been trained with the heart murmur information contained in
heart sound recordings obtained from nine different heart sound
databases sourced from multiple research groups. These samples
are segmented and preprocessed using the neuromorphic auditory
sensor to decompose their audio information into frequency
bands, after which sonogram images of the same size are
generated. These images have been used to train and test different
convolutional neural network architectures. The best results
have been obtained with a modified version of the AlexNet model,
achieving 97% accuracy (specificity: 95.12%, sensitivity: 93.20%,
PhysioNet/CinC Challenge 2016 score: 0.9416). This tool could aid
cardiologists and primary care physicians in the auscultation process,
supporting decision-making and reducing type-I and
type-II errors.
Ministerio de Economía y Competitividad TEC2016-77785-
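The challenge score quoted above equals the unweighted mean of the reported sensitivity and specificity: (93.20% + 95.12%) / 2 = 94.16%. The sketch below assumes that definition. The function name `challenge_score` is hypothetical, and the official PhysioNet/CinC Challenge 2016 scoring may define its sensitivity and specificity terms in a modified way that is not reproduced here.

```python
def challenge_score(sensitivity: float, specificity: float) -> float:
    """Unweighted mean of sensitivity and specificity (assumed scoring rule)."""
    # Type-I error (healthy patient sent for an echocardiogram) = false positive,
    # which lowers specificity. Type-II error (pathological patient sent home)
    # = false negative, which lowers sensitivity. Averaging penalizes both.
    return (sensitivity + specificity) / 2


score = challenge_score(sensitivity=0.9320, specificity=0.9512)
print(f"{score:.4f}")  # 0.9416
```

Averaging the two rates, rather than using plain accuracy, keeps the score from being dominated by whichever class (healthy or pathological) is more frequent in the test recordings.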
SALSA: A Novel Dataset for Multimodal Group Behavior Analysis
Studying free-standing conversational groups (FCGs) in unstructured social
settings (e.g., a cocktail party) is rewarding due to the wealth of information
available at the group (mining social networks) and individual (recognizing
native behavioral and personality traits) levels. However, analyzing social
scenes involving FCGs is also highly challenging, because crowding and extreme
occlusions make it difficult to extract behavioral cues such as target
locations, speaking activity and head/body pose. To
this end, we propose SALSA, a novel dataset facilitating multimodal and
Synergetic sociAL Scene Analysis, and make two main contributions to research
on automated social interaction analysis: (1) SALSA records social interactions
among 18 participants in a natural indoor environment for over 60 minutes,
in poster presentation and cocktail party contexts that present
difficulties such as low-resolution images, lighting variations,
numerous occlusions, reverberations and interfering sound sources; (2) to
alleviate these problems, we facilitate multimodal analysis by recording the
social interplay using four static surveillance cameras and sociometric badges
worn by each participant, each comprising a microphone, an accelerometer, and
Bluetooth and infrared sensors. In addition to raw data, we also provide annotations
concerning individuals' personality as well as their position, head and body
orientation, and F-formation information over the entire event duration. Through
extensive experiments with state-of-the-art approaches, we show (a) the
limitations of current methods and (b) how the recorded multiple cues
synergetically aid automatic analysis of social interactions. SALSA is
available at http://tev.fbk.eu/salsa.
Comment: 14 pages, 11 figures
- …