10 research outputs found

Anomaly detection for environmental noise monitoring

Author: Phan Duc H.
Publication date: 01/05/2018

Octave-band sound pressure level is preferred over raw audio for continuous environmental noise monitoring: accepted standards and devices exist, and the data do not compromise voice privacy, so an octave-band sound meter can legally collect data in public. In an experiment that continuously monitors octave-band sound pressure level in a residential street, we show daily noise-level patterns that correlate with human activities. Well-known anomaly detection algorithms, including one-class support vector machine, replicator neural network, and principal component analysis (PCA) based anomaly detection, perform poorly when applied directly to the collected data because they cannot exploit these daily patterns. We therefore propose PCA anomaly detection with a time-varying mean and covariance matrix, estimated over each hour, to detect abnormal acoustic events in the octave-band measurements of the residential noise-monitoring application. The proposed method achieves a recall of 0.83, a precision of 0.88, and an F-measure of 0.85 on the evaluation data set.
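The hourly PCA idea in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not give the number of retained components, the scoring rule, or the decision threshold, so the function names, the `n_components` parameter, and the use of squared reconstruction error as the anomaly score are all assumptions made for illustration.

```python
import numpy as np

def fit_hourly_pca(X, hours, n_components=2):
    """Fit one mean vector and one principal subspace per hour of day.

    X      : (n_samples, n_bands) octave-band levels
    hours  : (n_samples,) hour of day (0..23) for each sample
    Returns a dict mapping hour -> (mean, principal_axes).
    """
    models = {}
    for h in np.unique(hours):
        Xh = X[hours == h]
        mean = Xh.mean(axis=0)
        cov = np.cov(Xh, rowvar=False)          # hour-specific covariance
        _, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
        axes = eigvecs[:, ::-1][:, :n_components]  # keep the largest components
        models[int(h)] = (mean, axes)
    return models

def anomaly_scores(X, hours, models):
    """Squared reconstruction error outside the hour's principal subspace.

    Assumed scoring rule: a large residual means the sample does not fit
    the usual pattern for that hour.
    """
    scores = np.empty(len(X))
    for i, (x, h) in enumerate(zip(X, hours)):
        mean, axes = models[int(h)]
        centered = x - mean
        residual = centered - axes @ (axes.T @ centered)
        scores[i] = residual @ residual
    return scores
```

Samples whose score exceeds a threshold chosen on held-out data would be flagged as abnormal acoustic events; how that threshold is set is left open here.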

ZENODO

Illinois Digital Environment for Access to Learning and Scholarship Repository

Clasificación automática de sonidos utilizando aprendizaje máquina (Automatic sound classification using machine learning)

Author: Rodríguez Ramírez Patricio
Publication date: 01/01/2020

Machine learning has been used intensively for sound recognition in recent years. Some sounds are easily distinguishable, such as a laugh, but others can be very similar to each other, such as a blender and a chainsaw. Furthermore, the inherent variability of these recordings makes the problem difficult to solve with classical processing techniques, but it is a suitable challenge for the high levels of abstraction that machine learning techniques can achieve. This work presents two convolutional neural network (CNN) models for classifying environmental sounds into seven categories. The audio excerpts are taken from the UrbanSound8K database. Both models reach up to 90% accuracy in classifying these sounds.

Universidad de Sevilla. Grado en Ingeniería de las Tecnologías de Telecomunicación

idUS. Depósito de Investigación Universidad de Sevilla

Acoustic scene classification with matrix factorization for unsupervised feature learning

Authors: Bisot Victor, Essid Slim, Richard Gael, Serizel Romain
Publication venue: HAL CCSD

International audience

Deep Learning Based Sound Event Detection and Classification

Author: Nasiri Alireza
Publication venue: Scholar Commons
Publication date: 01/04/2021

The sense of hearing plays an important role in our daily lives. In recent years, there have been many studies on transferring this capability to computers. In this dissertation, we design and implement deep learning based algorithms to improve the ability of computers to recognize different sound events. In the first topic, we investigate sound event detection, which identifies the time boundaries of sound events in addition to their type. For sound event detection, we propose a new method, AudioMask, that benefits from object-detection techniques in computer vision. In this method, we convert the question of identifying time boundaries for sound events into the problem of identifying objects in images by treating the spectrograms of the sound as images. AudioMask first applies Mask R-CNN, an algorithm for detecting objects in images, to the log-scaled mel-spectrograms of the sound files. We then use a frame-based sound event classifier, trained independently from Mask R-CNN, to analyze each individual frame in the candidate segments. Our experiments show that this approach has promising results and can successfully identify the exact time boundaries of the sound events. The code for this study is available at https://github.com/alireza-nasiri/AudioMask. In the second topic, we present SoundCLR, a supervised contrastive learning based method for effective environmental sound classification with state-of-the-art performance, which works by learning representations that disentangle the samples of each class from those of other classes. We also exploit transfer learning and strong data augmentation to improve the results. Our extensive benchmark experiments show that our hybrid deep network models, trained with a combined contrastive and cross-entropy loss, achieve state-of-the-art performance on three benchmark datasets, ESC-10, ESC-50, and US8K, with validation accuracies of 99.75%, 93.4%, and 86.49%, respectively.
The ensemble version of our models also outperforms other top ensemble methods. Finally, we analyze the acoustic emissions generated during the degradation process of SiC composites. The aim here is to identify the state of degradation in the material by classifying its emitted acoustic signals. As our baseline, we use a random forest method on expert-defined features. We also propose a deep neural network of convolutional layers to identify the patterns in the raw sound signals. Our experiments show that both methods can reliably identify the degradation state of the composite and that, on average, the convolutional model significantly outperforms the random forest technique.
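The combined contrastive and cross-entropy loss mentioned in this abstract can be sketched in NumPy. This is not the SoundCLR code: it uses the widely known supervised contrastive loss formulation, and the weighting parameter `alpha`, the `temperature` value, and all function names are assumptions introduced for illustration.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Pull same-class embeddings together and push other classes apart.

    Assumes at least one sample in the batch has a same-class partner.
    """
    labels = np.asarray(labels)
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    not_self = ~np.eye(len(labels), dtype=bool)
    sim = np.where(not_self, sim, -np.inf)      # exclude self-similarity
    # Numerically stable log-softmax over all other samples in the batch
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0                     # skip anchors with no positive
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[has_pos] / pos_count[has_pos]).mean())

def cross_entropy_loss(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    labels = np.asarray(labels)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def hybrid_loss(embeddings, logits, labels, alpha=0.5, temperature=0.1):
    """Weighted sum of both losses; alpha is a hypothetical mixing weight."""
    contrastive = supervised_contrastive_loss(embeddings, labels, temperature)
    return alpha * contrastive + (1.0 - alpha) * cross_entropy_loss(logits, labels)
```

In a training setup the embeddings and logits would come from two heads of the same network, and gradients would be computed by a deep learning framework rather than NumPy.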

Scholar Commons - Institutional Repository of the University of South Carolina

Calibration of sound source localisation for robots using multiple adaptive filter models of the cerebellum

Author: Baxendale Mark David

The aim of this research was to investigate the calibration of Sound Source Localisation (SSL) for robots using the adaptive filter model of the cerebellum and how this could be automatically adapted for multiple acoustic environments. The role of the cerebellum has mainly been identified in the context of motor control, and only in recent years has it been recognised that it has a wider role to play in the senses and cognition. The adaptive filter model of the cerebellum has been successfully applied to a number of robotics applications, but so far none involving the auditory sense. Multiple-model frameworks such as MOdular Selection And Identification for Control (MOSAIC) have also been developed in the context of motor control, and this has been the inspiration for adaptation of audio calibration in multiple acoustic environments; again, application of this approach in the area of the auditory sense is completely new. The thesis showed that it was possible to calibrate the output of an SSL algorithm using the adaptive filter model of the cerebellum, improving the performance compared to the uncalibrated SSL. Using an adaptation of the MOSAIC framework, and specifically using responsibility estimation, a system was developed that was able to select an appropriate set of cerebellar calibration models and to combine their outputs in proportion to how well each was able to calibrate, to improve the SSL estimate in multiple acoustic contexts, including novel contexts. The thesis also developed a responsibility predictor, also part of the MOSAIC framework, and this improved the robustness of the system to abrupt changes in context which could otherwise have resulted in a large performance error. Responsibility prediction also improved robustness to missing ground truth, which can occur in challenging environments where sensory feedback of ground truth becomes impaired; this case has not been addressed in the MOSAIC literature, adding to the novelty of the thesis.
The utility of the so-called cerebellar chip has been further demonstrated through the development of a responsibility predictor that is based on the adaptive filter model of the cerebellum, rather than the more conventional function-fitting neural network used in the literature. Lastly, it was demonstrated that the multiple cerebellar calibration architecture is capable of limited self-organising from a de-novo state, with a predetermined number of models. It was also demonstrated that the responsibility predictor could learn against its model after self-organisation and, to a limited extent, during self-organisation. The thesis addresses an important question of how a robot could improve its ability to listen in multiple, challenging acoustic environments, and recommends future work to develop this ability.
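The responsibility estimation described in this abstract, where model outputs are combined in proportion to how well each model calibrates, can be sketched as a softmax over negative squared errors. This follows the general form used in the MOSAIC literature rather than the thesis itself; the Gaussian width `sigma` and the function names are assumptions made for illustration.

```python
import numpy as np

def responsibilities(errors, sigma=1.0):
    """Normalised weights that favour models with small calibration error.

    errors : per-model error against ground truth (same units as the estimate)
    sigma  : assumed width of the Gaussian likelihood; smaller values make
             the selection sharper.
    """
    log_lik = -np.square(errors) / (2.0 * sigma ** 2)
    log_lik = log_lik - log_lik.max()   # stabilise the exponentials
    weights = np.exp(log_lik)
    return weights / weights.sum()

def combine_estimates(outputs, resp):
    """Responsibility-weighted blend of the per-model calibrated outputs."""
    return float(np.dot(resp, outputs))
```

For example, with per-model errors of 0.1, 2.0 and 5.0 degrees, nearly all of the weight goes to the first model, so the combined azimuth estimate stays close to that model's output.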

UWE Bristol Research Repository