Search CORE

Sound Event Detection Using Spatial Features and Convolutional Recurrent Neural Network

Author: Adavanne Sharath
Pertilä Pasi
Virtanen Tuomas
Publication date: 01/01/2017

This paper proposes using low-level spatial features extracted from multichannel audio for sound event detection. We extend the convolutional recurrent neural network to handle more than one type of these multichannel features by learning from each type separately in the initial stages. We show that the network learns sound events in multichannel audio better when the features of each channel are presented as separate layers of a volume rather than concatenated into a single feature vector. Using the proposed spatial features instead of monaural features on the same network gives an absolute F-score improvement of 6.1% on the publicly available TUT-SED 2016 dataset and of 2.7% on the TUT-SED 2009 dataset, which is fifteen times larger.

Comment: Accepted for the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017).
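The two input layouts compared in this abstract can be illustrated with array shapes. The sketch below is a minimal NumPy illustration, not the authors' code: the feature sizes (4 channels, 256 frames, 40 bands) and the helper names `concat_layout` and `volume_layout` are hypothetical. In the volume layout, each kernel of a 2D convolution covers all channels at the same time-frequency position; in the concatenated layout, channel boundaries are only offsets inside one long per-frame vector.

```python
import numpy as np


def concat_layout(features):
    """Concatenate the channels' features into one vector per frame: (T, C * F).

    features: array of shape (C, T, F) = (channels, frames, feature bins).
    """
    c, t, f = features.shape
    # (C, T, F) -> (T, C, F), then merge the channel and bin axes
    return features.transpose(1, 0, 2).reshape(t, c * f)


def volume_layout(features):
    """Keep each channel as a separate layer of a volume: (C, T, F).

    This is the channels-first layout an image-style 2D convolution expects.
    """
    return np.asarray(features)


rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 256, 40))  # hypothetical: 4 channels, 256 frames, 40 bands
print(concat_layout(feats).shape)  # (256, 160)
print(volume_layout(feats).shape)  # (4, 256, 40)
```

In the concatenated row for a frame, channel 1's bins occupy positions 40 to 79, directly after channel 0's bins.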

arXiv.org e-Print Archive

Trepo - Institutional Repository of Tampere University

Polyphonic Sound Event Detection by using Capsule Neural Networks

Author: Gabrielli Leonardo
Principi Emanuele
Squartini Stefano
Vesperini Fabio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/01/2019

Artificial sound event detection (SED) aims to mimic the human ability to perceive and understand what is happening in the surroundings. Deep learning now offers valuable techniques for this goal, such as Convolutional Neural Networks (CNNs). The Capsule Neural Network (CapsNet) architecture was recently introduced in the image processing field to overcome some known limitations of CNNs, specifically their poor robustness to affine transformations (i.e., perspective, size, orientation) and the detection of overlapping images. This motivated the authors to employ CapsNets for the polyphonic SED task, in which multiple sound events occur simultaneously. Specifically, we propose to exploit the capsule units to represent a set of distinctive properties for each individual sound event. Capsule units are connected through so-called "dynamic routing", which encourages learning part-whole relationships and improves detection performance in a polyphonic context. This paper reports extensive evaluations on three publicly available datasets, showing that the CapsNet-based algorithm not only outperforms standard CNNs but also achieves the best results with respect to state-of-the-art algorithms.
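The "dynamic routing" mentioned in this abstract is usually implemented as routing-by-agreement: lower capsules send votes to the output capsules, and coupling coefficients are iteratively increased for votes that agree with the resulting output. The NumPy sketch below follows the generic procedure from the original CapsNet formulation and is not necessarily the exact variant used in this paper; the capsule counts and dimensions in the usage lines are hypothetical. The length of each output capsule is commonly read as the activity of its class.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """Capsule non-linearity: keeps each vector's direction, maps its length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def softmax(x, axis):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)


def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement between two capsule layers.

    u_hat: votes of shape (N_in, N_out, D); u_hat[i, j] is lower capsule i's
           prediction of the pose vector of output capsule j.
    Returns the output capsules v, shape (N_out, D).
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits, initially uniform coupling
    for _ in range(num_iters):
        c = softmax(b, axis=1)                     # each lower capsule splits its vote over outputs
        s = np.einsum("ij,ijd->jd", c, u_hat)      # coupling-weighted sum of votes
        v = squash(s)                              # output capsules
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # boost logits where votes agree with v
    return v


rng = np.random.default_rng(0)
u_hat = rng.standard_normal((32, 6, 16))  # hypothetical: 32 lower capsules, 6 classes, 16-D poses
v = dynamic_routing(u_hat)
presence = np.linalg.norm(v, axis=-1)  # one length per class, each in [0, 1)
print(presence.shape)  # (6,)
```

The squash non-linearity is what lets a vector length act as a score: long inputs approach length 1 and short inputs shrink toward 0.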

arXiv.org e-Print Archive

IRIS Università Politecnica delle Marche


Detection and classification of acoustic scenes and events: an IEEE AASP challenge

Author: Benetos E.
Giannoulis D.
Lagrange M.
Plumbley M. D.
Rossignol M.
Stowell D.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

City Research Online

Detection and Classification of Acoustic Scenes and Events

Author: Benetos E
Giannoulis D
Lagrange M
Plumbley MD
Rossignol M
Stowell D
Publication date: 30/12/2013

Queen Mary Research Online

Acoustic scene classification using spatial spectrum estimation

Author: Sinisalmi Sami
Publication date: 02/01/2020

Analysis of audio from our surroundings gives us important cues about the acoustic scene, and automatic analysis is usually done through sound event detection or by analysing the audio scene as a whole. On the other hand, inspecting the auditory space characteristics, or the openness of the space, is a much less studied aspect. This thesis studies the classification of audio scenes based on these auditory space characteristics using different audio features. In this work, log-mel band energies and spatial spectra are calculated for the audio recordings and used in the classification. The results show that the best performance is obtained with the combination of mel features and spatial spectrum rather than with either one alone. It was also observed that differences within a class can affect the results.
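The abstract does not specify how the log-mel and spatial spectrum features were combined. One common option is early fusion, sketched below under that assumption: each feature set is standardized separately and the two are concatenated frame by frame. The function names and the feature sizes (40 mel bands, 36 look directions) are hypothetical, as is the assumption that the spatial spectrum holds one value per look direction.

```python
import numpy as np


def standardize(x, eps=1e-8):
    """Zero-mean, unit-variance scaling of each feature dimension over frames."""
    return (x - x.mean(axis=0, keepdims=True)) / (x.std(axis=0, keepdims=True) + eps)


def fuse_features(log_mel, spatial_spectrum):
    """Early fusion: concatenate two per-frame feature sets along the feature axis.

    log_mel:          (T, n_mels)  log-mel band energies
    spatial_spectrum: (T, n_dirs)  spatial spectrum, assumed here to hold one
                                   value per look direction
    Returns an array of shape (T, n_mels + n_dirs).
    """
    if log_mel.shape[0] != spatial_spectrum.shape[0]:
        raise ValueError("both feature sets must have the same number of frames")
    # scale each set separately so neither dominates because of its numeric range
    return np.concatenate([standardize(log_mel), standardize(spatial_spectrum)], axis=1)


rng = np.random.default_rng(0)
fused = fuse_features(rng.standard_normal((500, 40)), rng.random((500, 36)))  # hypothetical sizes
print(fused.shape)  # (500, 76)
```

Separate standardization matters here because log energies and spatial power values typically live on very different numeric scales.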

TamPub Julkaisuarkisto - TamPub Institutional Repository

Trepo - Institutional Repository of Tampere University

Sound Event Localization, Detection, and Tracking by Deep Neural Networks

Author: Adavanne Sharath
Publication venue: Tampere University
Publication date: 04/03/2020

In this thesis, we present novel sound representations and classification methods for the task of sound event localization, detection, and tracking (SELDT). The human auditory system has evolved to localize multiple sound events, recognize them, and track their individual motion in an acoustic environment. This ability makes humans context-aware and enables them to interact with their surroundings naturally. Developing similar methods for machines will provide an automatic description of the social and human activities around them and enable machines to be context-aware like humans. Such methods can be used to help the hearing impaired visualize sounds, for robot navigation, and to monitor biodiversity, homes, and cities. A real-life acoustic scene is complex, with multiple sound events that overlap temporally and spatially, including stationary and moving events with varying angular velocities. Additionally, each sound event class can vary considerably: for example, different cars have different horns, and even for the same car model, the duration and temporal structure of the horn sound depend on the driver. Performing SELDT robustly in such overlapping and dynamic sound scenes is challenging for machines. Hence, this thesis investigates the SELDT task using a data-driven approach based on deep neural networks (DNNs). The sound event detection (SED) task requires detecting the onset and offset times of individual sound events along with their labels. For this, we propose to use spatial and perceptual features extracted from multichannel audio for SED with two different DNNs: recurrent neural networks (RNNs) and convolutional recurrent neural networks (CRNNs). We show that multichannel audio features improve SED performance for overlapping sound events compared to traditional single-channel audio features.
The proposed novel features and methods produced state-of-the-art performance for the real-life SED task and won the IEEE AASP DCASE challenge in two consecutive years, 2016 and 2017. Sound event localization is the task of spatially locating individual sound events. Traditionally, this has been approached with parametric methods. In this thesis, we propose a CRNN for estimating the azimuth and elevation angles of multiple temporally overlapping sound events. This is the first DNN-based method performing localization in the complete azimuth and elevation space. Unlike parametric methods, which require the number of active sources to be known, the proposed method learns this information directly from the input data and estimates the sources' respective spatial locations. Further, the proposed CRNN is shown to be more robust than parametric methods in reverberant scenarios. Finally, the detection and localization tasks are performed jointly using a CRNN. This method also tracks the spatial location over time, thus producing the SELDT results. This is the first DNN-based SELDT method, and it is shown to perform on par with stand-alone baselines for SED, localization, and tracking. The proposed SELDT method is evaluated on nine datasets that represent anechoic and reverberant sound scenes, stationary and moving sources with varying velocities, different numbers of overlapping sound events, and different microphone array formats. The results show that the SELDT method can track multiple overlapping sound events that are both spatially stationary and moving.
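The SED output described in this abstract (an onset, an offset, and a label per event) is often obtained from a network's frame-wise class probabilities by thresholding. The sketch below is a minimal version of that post-processing, assuming a fixed threshold of 0.5 and a known frame hop; the function name, class names, and probability values are hypothetical, and the thesis may use different post-processing.

```python
import numpy as np


def frames_to_events(probs, labels, hop_s, threshold=0.5):
    """Turn frame-wise class probabilities into (label, onset_s, offset_s) events.

    probs:  (T, K) per-frame activity probabilities for K classes
    labels: list of K class names
    hop_s:  time between consecutive frames, in seconds
    Each class is thresholded independently, so events may overlap (polyphony).
    """
    active = probs >= threshold
    events = []
    for k, name in enumerate(labels):
        # pad with inactive frames so every active run has a start and an end edge
        padded = np.concatenate([[False], active[:, k], [False]]).astype(int)
        edges = np.flatnonzero(np.diff(padded))
        # edges alternate: onset frame, then the first inactive frame after it
        for on, off in zip(edges[0::2], edges[1::2]):
            events.append((name, float(on * hop_s), float(off * hop_s)))
    return sorted(events, key=lambda e: e[1])


probs = np.array([[0.1, 0.9],
                  [0.8, 0.9],
                  [0.7, 0.2],
                  [0.2, 0.6]])
print(frames_to_events(probs, ["car_horn", "speech"], hop_s=0.5))
# [('speech', 0.0, 1.0), ('car_horn', 0.5, 1.5), ('speech', 1.5, 2.0)]
```

Because each class is handled independently, the "car_horn" event overlaps both "speech" events, which is exactly the polyphonic case the thesis targets.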

TamPub Julkaisuarkisto - TamPub Institutional Repository

Trepo - Institutional Repository of Tampere University