Evaluation of classical machine learning techniques towards urban sound recognition embedded systems
Automatic urban sound classification is a desirable capability for urban monitoring systems, allowing real-time monitoring of urban environments and recognition of events. Current embedded systems provide enough computational power to perform real-time urban audio recognition. Using such devices for edge computation when they act as nodes of Wireless Sensor Networks (WSN) drastically reduces the required bandwidth. In this paper, we evaluate classical Machine Learning (ML) techniques for urban sound classification on embedded devices with respect to accuracy and execution time. This evaluation provides a realistic estimate of what can be expected when performing urban sound classification on such constrained devices. In addition, a cascade approach is proposed to combine ML techniques by exploiting embedded characteristics such as pipeline or multi-thread execution present in current embedded devices. The accuracy of this approach is similar to that of traditional solutions, but it also provides more flexibility to prioritize accuracy or timing.
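The abstract does not say how the cascade is built. As a minimal sketch, assuming a cascade means running a cheap classifier first and falling back to a costlier one only when confidence is low, the accuracy/timing trade-off could be exposed through a single threshold. The stage functions, labels, and the `threshold` parameter below are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

# A stage maps a feature vector to (label, confidence in [0, 1]).
Stage = Callable[[Sequence[float]], Tuple[str, float]]


def cascade_predict(
    features: Sequence[float], stages: List[Stage], threshold: float = 0.8
) -> Tuple[str, int]:
    """Run stages in order (cheapest first) and stop at the first confident one.

    Returns the chosen label and the index of the stage that produced it.
    If no stage is confident, the answer of the last stage is returned.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    label = ""
    for index, stage in enumerate(stages):
        label, confidence = stage(features)
        if confidence >= threshold:
            return label, index
    return label, len(stages) - 1


def fast_stage(features: Sequence[float]) -> Tuple[str, float]:
    # Hypothetical cheap rule: high mean energy is treated as a siren.
    energy = sum(x * x for x in features) / len(features)
    return ("siren", 0.9) if energy > 1.0 else ("background", 0.5)


def slow_stage(features: Sequence[float]) -> Tuple[str, float]:
    # Hypothetical costlier rule based on the peak amplitude.
    peak = max(abs(x) for x in features)
    return ("dog_bark", 0.95) if peak > 0.5 else ("background", 0.85)
```

Under these assumptions, raising `threshold` sends more inputs to the slower stage (favoring accuracy), while lowering it lets the fast stage answer more often (favoring timing).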
Environmental Sound Classification with Parallel Temporal-spectral Attention
Convolutional neural networks (CNN) are one of the best-performing neural
network architectures for environmental sound classification (ESC). Recently,
temporal attention mechanisms have been used in CNN to capture the useful
information from the relevant time frames for audio classification, especially
for weakly labelled data where the onset and offset times of the sound events
are not available. In these methods, however, the inherent spectral
characteristics and variations are not explicitly exploited when obtaining the
deep features. In this paper, we propose a novel parallel temporal-spectral
attention mechanism for CNN to learn discriminative sound representations,
which enhances the temporal and spectral features by capturing the importance
of different time frames and frequency bands. Parallel branches are constructed
to allow temporal attention and spectral attention to be applied respectively
in order to mitigate interference from the segments without the presence of
sound events. The experiments on three environmental sound classification (ESC)
datasets and two acoustic scene classification (ASC) datasets show that our
method improves the classification performance and also exhibits robustness to
noise. Comment: submitted to INTERSPEECH202
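The idea of weighting time frames and frequency bands in parallel branches can be illustrated with a small sketch. This is not the paper's mechanism: in the paper the attention weights are learned inside a CNN, whereas here they are derived from mean energy as a stand-in, and the two branches are combined by summation, which is an assumption. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def parallel_attention(
    spec: List[List[float]],
) -> Tuple[List[List[float]], List[float], List[float]]:
    """Apply temporal and spectral attention to spec[f][t] in parallel.

    spec has F frequency bands (rows) and T time frames (columns).
    Returns the re-weighted map plus the per-frame and per-band weights.
    """
    n_freq, n_time = len(spec), len(spec[0])

    # Temporal branch: one weight per time frame (stand-in: mean energy).
    frame_scores = [
        sum(spec[f][t] for f in range(n_freq)) / n_freq for t in range(n_time)
    ]
    time_w = softmax(frame_scores)

    # Spectral branch: one weight per frequency band (stand-in: mean energy).
    band_scores = [sum(row) / n_time for row in spec]
    freq_w = softmax(band_scores)

    # Each branch scales the input independently; the results are summed.
    out = [
        [spec[f][t] * time_w[t] + spec[f][t] * freq_w[f] for t in range(n_time)]
        for f in range(n_freq)
    ]
    return out, time_w, freq_w
```

In this toy version, frames and bands with little energy receive small weights, which mirrors the stated goal of reducing interference from segments that contain no sound event.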
A Robust Interpretable Deep Learning Classifier for Heart Anomaly Detection Without Segmentation
Traditionally, abnormal heart sound classification is framed as a three-stage
process. The first stage involves segmenting the phonocardiogram to detect
fundamental heart sounds; after which features are extracted and classification
is performed. Some researchers in the field argue the segmentation step is an
unwanted computational burden, whereas others embrace it as a prior step to
feature extraction. When comparing accuracies achieved by studies that have
segmented heart sounds before analysis with those that have overlooked that
step, the question of whether to segment heart sounds before feature extraction
is still open. In this study, we explicitly examine the importance of heart
sound segmentation as a prior step for heart sound classification, and then
seek to apply the obtained insights to propose a robust classifier for abnormal
heart sound detection. Furthermore, recognizing the pressing need for
explainable Artificial Intelligence (AI) models in the medical domain, we also
unveil hidden representations learned by the classifier using model
interpretation techniques. Experimental results demonstrate that the
segmentation plays an essential role in abnormal heart sound classification.
Our new classifier is also shown to be robust, stable and most importantly,
explainable, with an accuracy of almost 100% on the widely used PhysioNet
dataset.
Automatic Environmental Sound Recognition: Performance versus Computational Cost
In the context of the Internet of Things (IoT), sound sensing applications
are required to run on embedded platforms where notions of product pricing and
form factor impose hard constraints on the available computing power. Whereas
Automatic Environmental Sound Recognition (AESR) algorithms are most often
developed with limited consideration for computational cost, this article seeks
to determine which AESR algorithm can make the most of a limited amount of
computing power by comparing sound classification performance as a function of its
computational cost. Results suggest that Deep Neural Networks yield the best
ratio of sound classification accuracy across a range of computational costs,
while Gaussian Mixture Models offer a reasonable accuracy at a consistently
small cost, and Support Vector Machines stand between both in terms of
compromise between accuracy and computational cost.
Marine animal sound classification
Software was developed to measure characteristics of marine animal sounds (AcouStat). These measurements proved effective
for classifying sounds in several contexts: identifying species, quantifying the repertoire of a single species, and identifying
individuals. The sound measures included statistics for aggregate bandwidth, intensity, duration, amplitude modulation, frequency
modulation, center frequency, and interactions among these variables. Classification analysis based on these measures suggests
they adequately characterize the variability of bioacoustic signals for many problems. Correct classification to species was as high
as 85%, and correct classification of dolphin whistles to individual was 90%. Funding was provided by the Office of Naval Research through the Naval Undersea Warfare
Center under Contract No. N-00140-90-D-1979.
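To make a few of the listed sound measures concrete, the sketch below computes duration, center frequency, and bandwidth for a sampled signal. The abstract does not give AcouStat's definitions, so the ones used here are assumptions: center frequency as the power-weighted mean frequency, and bandwidth as the power-weighted standard deviation around it. The function names are hypothetical, and the naive DFT is only suitable for short signals.

```python
import cmath
import math
from typing import Dict, List, Sequence


def power_spectrum(samples: Sequence[float]) -> List[float]:
    """One-sided power spectrum via a naive O(n^2) DFT."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def sound_measures(samples: Sequence[float], sample_rate: float) -> Dict[str, float]:
    """Duration, center frequency, and bandwidth of a non-silent signal."""
    n = len(samples)
    power = power_spectrum(samples)
    freqs = [k * sample_rate / n for k in range(len(power))]
    total = sum(power)
    if total == 0.0:
        raise ValueError("signal is silent; spectral measures are undefined")
    # Assumed definition: power-weighted mean frequency.
    center = sum(f * p for f, p in zip(freqs, power)) / total
    # Assumed definition: power-weighted standard deviation around the center.
    spread = math.sqrt(
        sum((f - center) ** 2 * p for f, p in zip(freqs, power)) / total
    )
    return {"duration_s": n / sample_rate, "center_hz": center, "bandwidth_hz": spread}
```

For example, a pure 100 Hz tone sampled at 800 Hz for 80 samples yields a center frequency close to 100 Hz and a bandwidth close to zero under these definitions. Measures of this kind could then serve as input features for a classifier.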