Sound Event Detection Using Spatial Features and Convolutional Recurrent Neural Network
This paper proposes to use low-level spatial features extracted from
multichannel audio for sound event detection. We extend the convolutional
recurrent neural network to handle more than one type of these multichannel
features by learning from each of them separately in the initial stages. We
show that the network learns sound events in multichannel audio better when
the features of each channel are presented as separate layers of a volume,
rather than concatenated into a single feature vector. Using the proposed
spatial features over monaural features on the same network gives an absolute
F-score improvement of 6.1% on the publicly available TUT-SED 2016 dataset and
2.7% on the TUT-SED 2009 dataset, which is fifteen times larger.
Comment: Accepted for the IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP 2017).
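The difference between the two input arrangements described in the abstract can be sketched with NumPy. This is an illustrative sketch, not the paper's code: the feature shapes and channel count are assumptions, and the point is only the shape of the resulting network input.

```python
import numpy as np

# Hypothetical setup: two audio channels, each yielding a
# (time_frames, mel_bands) feature matrix (shapes assumed).
time_frames, mel_bands, n_channels = 256, 40, 2
features = [np.random.randn(time_frames, mel_bands) for _ in range(n_channels)]

# Option 1: concatenate along the feature axis into one wide matrix.
concatenated = np.concatenate(features, axis=1)   # shape (256, 80)

# Option 2: stack the channels as separate layers of a volume,
# analogous to the RGB planes of an image, which is the arrangement
# the abstract reports works better as CRNN input.
volume = np.stack(features, axis=-1)              # shape (256, 40, 2)

print(concatenated.shape, volume.shape)
```

In the stacked form, a 2D convolution sees corresponding time–frequency positions of all channels at once, while concatenation treats the channels as disjoint regions of a single feature map.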
Polyphonic Sound Event Detection by using Capsule Neural Networks
Artificial sound event detection (SED) aims to mimic the human ability
to perceive and understand what is happening in the surroundings. Nowadays,
Deep Learning offers valuable techniques for this goal such as Convolutional
Neural Networks (CNNs). The Capsule Neural Network (CapsNet) architecture has
been recently introduced in the image processing field with the intent to
overcome some of the known limitations of CNNs, specifically regarding the
scarce robustness to affine transformations (i.e., perspective, size,
orientation) and the detection of overlapped images. This motivated the authors
to employ CapsNets to deal with the polyphonic-SED task, in which multiple
sound events occur simultaneously. Specifically, we propose to exploit the
capsule units to represent a set of distinctive properties for each individual
sound event. Capsule units are connected through a so-called "dynamic routing"
that encourages learning part-whole relationships and improves the detection
performance in a polyphonic context. This paper reports extensive evaluations
carried out on three publicly available datasets, showing how the CapsNet-based
algorithm not only outperforms standard CNNs but also achieves the best
results among state-of-the-art algorithms.
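The "dynamic routing" mentioned in the abstract can be sketched as routing-by-agreement between lower- and higher-level capsules. This is a minimal NumPy sketch of the generic routing procedure, not the authors' implementation; the capsule counts, dimensions, and iteration count are illustrative assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Non-linearity that shrinks a vector so its length lies in [0, 1),
    # letting the length act as an event-presence probability.
    norm2 = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

def dynamic_routing(u_hat, n_iters=3):
    # u_hat: prediction vectors from lower capsules to upper capsules,
    # shape (n_lower, n_upper, dim_upper).
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))  # routing logits, start uniform
    for _ in range(n_iters):
        # Coupling coefficients: softmax over the upper capsules.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        # Each upper capsule is a coupling-weighted sum of predictions.
        s = (c[..., None] * u_hat).sum(axis=0)
        v = squash(s)
        # Increase the logit where prediction and output agree (dot product).
        b = b + (u_hat * v[None]).sum(axis=-1)
    return v  # shape (n_upper, dim_upper)

v = dynamic_routing(np.random.randn(8, 3, 4))
print(v.shape)  # (3, 4)
```

The agreement update is what "encourages learning part-whole relationships": lower capsules whose predictions match an upper capsule's output route more of their signal to it on the next iteration.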
The role of perceived source location in auditory stream segregation: separation affects sound organization, common fate does not
The human auditory system is capable of grouping sounds originating from different sound sources into coherent auditory streams, a process termed auditory stream segregation. Several cues can influence auditory stream segregation, but the full set of cues and the way in which they are integrated is still unknown. In the current study, we tested whether auditory motion can serve as a cue for segregating sequences of tones. Our hypothesis was that, following the principle of common fate, sounds emitted by sources moving together in space along similar trajectories will be more likely to be grouped into a single auditory stream, while sounds emitted by independently moving sources will more often be heard as two streams. Stimuli were derived from sound recordings in which the sound source motion was induced by walking humans. Although the results showed a clear effect of spatial separation, auditory motion had a negligible influence on stream segregation. Hence, auditory motion may not be used as a primitive cue in auditory stream segregation.