An extensible cluster-graph taxonomy for open set sound scene analysis

Author: Bear H, Benetos E
Publication venue: Workshop on Detection and Classification of Acoustic Scenes and Events
Publication date: 26/09/2018

We present a new extensible and divisible taxonomy for open set sound scene analysis. This new model allows complex scene analysis with tangible descriptors and perception labels. Its novel structure is a cluster graph in which each cluster (or subset) can stand alone for targeted analyses, such as office sound event detection, while maintaining integrity over the whole graph (superset) of labels. The key design benefit is extensibility: new labels can be added as they are needed during new data capture. Furthermore, datasets that use the same taxonomy are easily augmented, saving future data collection effort. Our framework balances the detail needed for complex scene analysis against building 'the taxonomy of everything' and ensures that no label is duplicated in the superset; we demonstrate this with DCASE challenge classifications.
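
As a rough illustration of the "no duplication in the superset" property, the Python sketch below keeps labels in named clusters and lets each cluster be read on its own. It rejects any new cluster that would reuse a label already present elsewhere. The class `ClusterGraphTaxonomy`, its methods and the example labels are hypothetical and are not taken from the paper. The graph edges here only record which clusters are related.

```python
class ClusterGraphTaxonomy:
    """Hypothetical cluster-graph taxonomy: labels live in named clusters,
    clusters can be linked, and no label may appear in two clusters."""

    def __init__(self):
        self.clusters = {}  # cluster name -> set of labels
        self.edges = set()  # frozenset({a, b}) for each linked cluster pair
        self._owner = {}    # label -> name of the cluster that owns it

    def add_cluster(self, name, labels):
        """Add a cluster; reject it if any label is already in the superset."""
        labels = list(labels)
        if name in self.clusters:
            raise ValueError(f"cluster {name!r} already exists")
        clashes = sorted({label for label in labels if label in self._owner})
        if clashes:
            # Nothing has been modified yet, so a rejected cluster leaves no trace.
            raise ValueError(f"labels already in the superset: {clashes}")
        self.clusters[name] = set(labels)
        for label in labels:
            self._owner[label] = name

    def link(self, a, b):
        """Record that two existing clusters are related in the graph."""
        if a not in self.clusters or b not in self.clusters:
            raise KeyError("both clusters must exist before they are linked")
        self.edges.add(frozenset((a, b)))

    def subset(self, name):
        """Labels of one cluster, e.g. for a targeted office-sounds analysis."""
        return set(self.clusters[name])

    def superset(self):
        """Every label across all clusters."""
        return set(self._owner)
```

For example, after `add_cluster("office", ["keyboard", "printer"])`, a later `add_cluster("home", ["printer"])` raises `ValueError`. This is the kind of check that keeps an extended label set consistent.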

Queen Mary Research Online

Leveraging label hierarchies for few-shot everyday sound recognition

Author: Benetos E, Liang J, Phan QH
Publication venue: 7th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE)
Publication date: 03/11/2022

Everyday sounds cover a considerable range of sound categories in our daily life, yet for certain sound categories it is hard to collect sufficient data. Although existing works have successfully applied few-shot learning paradigms to sound recognition, most of them have not exploited the relationship between labels in audio taxonomies. This work adopts a hierarchical prototypical network to leverage the knowledge rooted in audio taxonomies. Specifically, a VGG-like convolutional neural network is used to extract acoustic features. Prototypical nodes are then calculated at each level of the tree structure, and a multi-level loss is obtained by weighting the losses from the multiple levels with a decaying weight. Experimental results demonstrate that our hierarchical prototypical networks not only outperform prototypical networks without hierarchy information but also yield better results than other state-of-the-art algorithms. Our code is available at: https://github.com/JinhuaLiang/HPNs_taggin
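
The sketch below shows, under stated assumptions, how prototype-based losses can be computed at every level of a label tree and combined with decaying weights. It uses plain Python lists instead of CNN embeddings. Several details are hypothetical assumptions and do not come from the authors' implementation:

- the function names;
- the `decay=0.5` default;
- giving the finest (leaf) level weight 1 and weighting coarser level k by `decay**k`.

```python
import math


def prototype(embeddings):
    """Class prototype: the mean of the class's support embeddings."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def proto_loss(query, true_label, support):
    """Cross-entropy of softmax(-squared distance to each prototype).

    support maps each label at this tree level to its support embeddings.
    """
    protos = {label: prototype(embs) for label, embs in support.items()}
    logits = {label: -sq_dist(query, p) for label, p in protos.items()}
    m = max(logits.values())  # subtract the max for a stable log-sum-exp
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return log_z - logits[true_label]


def multi_level_loss(query, labels_per_level, support_per_level, decay=0.5):
    """Sum of per-level losses, level k weighted by decay**k (k=0 is the leaf).

    The decaying weighting scheme here is an assumption for illustration.
    """
    total = 0.0
    for k, (label, support) in enumerate(zip(labels_per_level, support_per_level)):
        total += (decay ** k) * proto_loss(query, label, support)
    return total
```

At a coarser level, the support for a parent node would typically pool the embeddings of all leaves beneath it, so the same `proto_loss` can be reused at every level.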

Queen Mary Research Online

Few-Shot Bioacoustic Event Detection: Enhanced Classifiers for Prototypical Networks

Author: Li R, Liang J, Phan QH
Publication venue: 7th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE)
Publication date: 03/11/2022

Queen Mary Research Online

To bee or not to bee: Investigating machine learning approaches for beehive sound recognition

Author: Benetos E, Nolasco I
Publication venue: 2018 Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2018)
Publication date: 29/09/2018

In this work, we aim to explore the potential of machine learning methods for the problem of beehive sound recognition. A major contribution of this work is the creation and release of annotations for a selection of beehive recordings. By experimenting with both support vector machines and convolutional neural networks, we explore important aspects to consider when developing beehive sound recognition systems with machine learning approaches.

Queen Mary Research Online

Explaining the decisions of anomalous sound detectors

Author: Benetos E, Davies T, Griffi LD, Mai KT
Publication venue: 7th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE)
Publication date: 03/11/2022

Deciding whether a sound is anomalous is accomplished by comparing it to a learnt distribution of inliers. Therefore, learning a distribution close to the true population of inliers is vital for anomalous sound detection (ASD). Data engineering is a common strategy to aid training and improve generalisation. However, in the context of ASD, it is debatable whether data engineering indeed facilitates generalisation or whether it obscures characteristics that distinguish anomalies from inliers. We conduct an exploratory investigation into this question by focusing on frequency-related data engineering. We adapt local model explanations to anomaly detectors and show that models rely on higher frequencies to distinguish anomalies from inliers. We verify this by filtering the frequencies of the input data and observing the change in ASD performance. Our results indicate that sifting out low frequencies by applying high-pass filters aids downstream performance, and this could serve as a simple pre-processing step for improving anomaly detectors.
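
A first-order high-pass filter is one simple way to sift out low frequencies before feature extraction. The Python sketch below implements the standard one-pole RC high-pass difference equation. The abstract does not specify the filter type or cutoff, so the filter order and any cutoff value you pass in (e.g. 500 Hz at a 16 kHz sample rate) are assumptions.

```python
import math


def high_pass(signal, cutoff_hz, sample_rate):
    """First-order (one-pole RC) high-pass filter.

    Difference equation: y[n] = a * (y[n-1] + x[n] - x[n-1]),
    with a = RC / (RC + dt), RC = 1 / (2 * pi * cutoff_hz), dt = 1 / sample_rate.
    """
    if not signal:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    a = rc / (rc + dt)
    out = [signal[0]]  # conventional initial condition: y[0] = x[0]
    for n in range(1, len(signal)):
        out.append(a * (out[-1] + signal[n] - signal[n - 1]))
    return out
```

A constant (DC) input decays towards zero at the output, while content near the Nyquist frequency passes with little attenuation. A steeper filter (e.g. a higher-order Butterworth design) would cut low frequencies more sharply.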

Queen Mary Research Online

Robustness of Adversarial Attacks in Sound Event Classification

Author: Benetos E, Sandler M, Subramanian V
Publication venue: 4th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2019)
Publication date: 25/10/2019

An adversarial attack is a method for generating perturbations to the input of a machine learning model so that the model's output becomes incorrect. The perturbed inputs are known as adversarial examples. In this paper, we investigate the robustness of adversarial examples to simple input transformations, such as MP3 compression, resampling, white noise and reverb, in the task of sound event classification. Through this analysis, we aim to provide insights into the strengths and weaknesses of current adversarial attack algorithms, as well as a baseline for defenses against adversarial attacks. Our work shows that adversarial attacks are not robust to simple input transformations. White noise is the most consistent defense against adversarial attacks, with a success rate of 73.72% averaged across all models and attack algorithms.
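
Adding white noise at a chosen signal-to-noise ratio (SNR) is straightforward. The sketch below scales Gaussian noise so that the ratio of signal power to noise power matches `snr_db`. The function name and the use of a fixed SNR are illustrative assumptions. The abstract does not report the noise level used in the experiments.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=None):
    """Return signal plus white Gaussian noise at the requested SNR (in dB).

    SNR = 10 * log10(P_signal / P_noise), so P_noise = P_signal / 10**(snr_db / 10).
    """
    if not signal:
        return []
    rng = random.Random(seed)  # seeded generator for reproducible noise
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)  # standard deviation of the Gaussian noise
    return [x + rng.gauss(0.0, sigma) for x in signal]
```

In a defence setting, such a transformation would be applied to each (possibly adversarial) input before it reaches the classifier. The SNR trades robustness to the perturbation against accuracy on clean inputs.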

Queen Mary Research Online

Onsets, activity, and events: a multi-task approach for polyphonic sound event modelling

Author: Bear H, Benetos E, Pankajakshan A
Publication venue: 4th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2019)
Publication date: 25/10/2019

State-of-the-art polyphonic sound event detection (SED) systems function as frame-level multi-label classification models. With dynamic polyphony levels at each frame, sound events interfere with each other, which degrades a classifier's ability to learn the exact frequency profile of individual sound events. Frame-level localized classifiers also fail to explicitly model the long-term temporal structure of sound events. Consequently, event-wise detection performance is lower than segment-wise detection performance. We define 'temporally precise polyphonic sound event detection' as the subtask of detecting sound event instances with the correct onset. Here, we investigate the effectiveness of sound activity detection (SAD) and onset detection as auxiliary tasks to improve temporal precision in polyphonic SED using multi-task learning. SAD helps to differentiate event activity frames from noisy and silent frames and helps to avoid missed detections at each frame. Onset predictions mark the start of each event and are in turn used to condition the predictions of both SAD and SED. Our experiments on the URBAN-SED dataset show that conditioning SED on onset detection and SAD yields more than a three-fold relative improvement in event-based F-score.
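
Because the task definition above hinges on detecting events with the correct onset, an onset-only event-based F-score is a useful mental model. The sketch below matches each detection to at most one reference event that has the same label and an onset within a tolerance collar. It relies on three simplifying assumptions:

- a 200 ms collar;
- greedy one-to-one matching;
- ignoring offsets entirely.

Standard DCASE evaluation toolkits such as sed_eval also offer offset conditions and may match events differently.

```python
def onset_f_score(reference, estimated, collar=0.2):
    """Onset-only event-based F-score (simplified, illustrative).

    reference, estimated: lists of (label, onset_in_seconds).
    A detection is a hit if an as-yet-unmatched reference event has the same
    label and an onset within +/- collar seconds; matching is greedy, in order
    of detection onset, and one-to-one.
    """
    matched = [False] * len(reference)
    hits = 0
    for label, onset in sorted(estimated, key=lambda e: e[1]):
        for i, (ref_label, ref_onset) in enumerate(reference):
            if not matched[i] and ref_label == label and abs(ref_onset - onset) <= collar:
                matched[i] = True
                hits += 1
                break
    precision = hits / len(estimated) if estimated else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with references `[("dog", 1.0), ("car", 3.0)]` and detections `[("dog", 1.1), ("car", 3.5)]`, only the dog onset falls within the collar, giving precision and recall of 0.5 and an F-score of 0.5.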

Queen Mary Research Online

Audio tagging using a linear noise modelling layer

Author: Benetos E, Pankajakshan A, Singh S
Publication venue: 4th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2019)
Publication date: 25/10/2019

Label noise refers to the presence of inaccurate target labels in a dataset. It impedes the performance of a deep neural network (DNN) because the network tends to overfit to the label noise, so it is imperative to devise a generic methodology to counter its effects. FSDnoisy18k is an audio dataset collected to encourage research on label noise for sound event classification. The dataset contains ~42.5 hours of audio recordings divided across 20 classes, with a small amount of manually verified labels and a large amount of noisy data. Using this dataset, our work explores the potential of modelling the label noise distribution by adding a linear layer on top of a baseline network. The accuracy of this approach is compared with that of an alternative approach that adopts a noise-robust loss function. Results show that modelling the noise distribution improves the accuracy of the baseline network to a similar extent as the soft bootstrapping loss.
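
One common way to model label noise with a linear layer is a noise-adaptation (transition) matrix T placed after the softmax. Here T[i][j] is the probability that an example whose true class is i is labelled j. The sketch below computes the resulting noisy-label distribution. Training would compare this distribution with the noisy labels, while the clean softmax would be used at test time. Whether the paper constrains its linear layer to be row-stochastic like this is an assumption, so treat this as a generic illustration rather than the authors' method.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def noisy_label_probs(clean_logits, transition):
    """p(noisy = j | x) = sum_i p(clean = i | x) * T[i][j].

    transition[i][j]: assumed probability that true class i is labelled j;
    each row should sum to 1 so the output is a valid distribution.
    """
    p_clean = softmax(clean_logits)
    n_out = len(transition[0])
    return [
        sum(p_clean[i] * transition[i][j] for i in range(len(p_clean)))
        for j in range(n_out)
    ]
```

With an identity matrix the layer passes the clean prediction through unchanged. Off-diagonal mass absorbs systematic label confusions, so the underlying network is not pushed to fit them directly.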

Queen Mary Research Online

Detection and Classification of Acoustic Scenes and Events

Author: Benetos E, Giannoulis D, Lagrange M, Plumbley MD, Rossignol M, Stowell D
Publication date: 30/12/2013

Queen Mary Research Online