Multi-Resolution Fully Convolutional Neural Networks for Monaural Audio Source Separation
In deep neural networks with convolutional layers, each layer typically has a fixed-size, single-resolution receptive field (RF). Convolutional layers with a large RF capture global information from the input features, while layers with a small RF capture local details at high resolution. In this work, we introduce novel deep multi-resolution fully convolutional neural networks (MR-FCNN), in which each layer has different RF sizes, extracting multi-resolution features that capture both the global structure and the local details of its input. The proposed MR-FCNN is applied to separate a target audio source from a mixture of many audio sources. Experimental results show that MR-FCNN improves performance over feedforward deep neural networks (DNNs) and single-resolution deep fully convolutional neural networks (FCNNs) on the audio source separation problem.
Comment: arXiv admin note: text overlap with arXiv:1703.0801
A machine-hearing system exploiting head movements for binaural sound localisation in reverberant conditions
This paper is concerned with machine localisation of multiple active speech sources in reverberant environments using two (binaural) microphones. Such conditions typically present a problem for 'classical' binaural models. Inspired by the human ability to utilise head movements, the current study investigated the influence of different head-movement strategies on binaural sound localisation. A machine-hearing system that exploits a multi-step head-rotation strategy for sound localisation was found to produce the best performance in a simulated reverberant acoustic space. This paper also reports the public release of a free database of binaural room impulse responses (BRIRs) that allows simulation of the head rotation used in this study.
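As background, the basic binaural cue such systems build on can be sketched in a few lines (a NumPy illustration of a single-source, anechoic case; not the paper's model): the interaural time difference (ITD) is estimated as the lag maximising the cross-correlation between the two ear signals, and head rotation changes this ITD, which is what a multi-step rotation strategy can exploit.

```python
import numpy as np

def estimate_itd(left, right, fs, max_lag_s=1e-3):
    """Estimate the interaural time difference (seconds) as the
    cross-correlation lag between the two ear signals.
    Illustrative sketch only -- real binaural models work per
    frequency band and handle reverberation explicitly."""
    max_lag = int(max_lag_s * fs)
    lags = np.arange(-max_lag, max_lag + 1)
    corr = [np.sum(left * np.roll(right, -lag)) for lag in lags]
    return lags[int(np.argmax(corr))] / fs

fs = 16000
t = np.arange(0, 0.05, 1 / fs)
sig = np.sin(2 * np.pi * 500 * t)
left = sig
right = np.roll(sig, 8)                 # simulate an 8-sample (0.5 ms) delay
print(estimate_itd(left, right, fs))    # 0.0005
```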
Theory of Sound Field Synthesis
PDF version of the Theory of Sound Field Synthesis presented at http://sfstoolbox.org/en/3.0/
Probing Speech Emotion Recognition Transformers for Linguistic Knowledge
Large, pre-trained neural networks consisting of self-attention layers (transformers) have recently achieved state-of-the-art results on several speech emotion recognition (SER) datasets. These models are typically pre-trained in a self-supervised manner with the goal of improving automatic speech recognition performance, and thus of understanding linguistic information. In this work, we investigate the extent to which this information is exploited during SER fine-tuning. Using a reproducible methodology based on open-source tools, we synthesise prosodically neutral speech utterances while varying the sentiment of the text. Valence predictions of the transformer model are very reactive to positive and negative sentiment content, as well as to negations, but not to intensifiers or reducers, while none of these linguistic features impact arousal or dominance. These findings show that transformers can successfully leverage linguistic information to improve their valence predictions, and that linguistic analysis should be included in their testing.
Comment: This work has been submitted for publication to Interspeech 202
A metric for predicting binaural speech intelligibility in stationary noise and competing speech maskers
One criterion in the design of binaural sound scenes in audio production is the extent to which the intended speech message is correctly understood. Object-based audio broadcasting systems have permitted sound editors to gain more access to the metadata (e.g., intensity and location) of each sound source, providing better control over speech intelligibility. The current study describes and evaluates a binaural distortion-weighted glimpse proportion metric, BiDWGP, which is motivated by better-ear glimpsing and binaural masking level differences. BiDWGP predicts intelligibility from two alternative input forms: either binaural recordings or monophonic recordings from each sound source along with their locations. Two listening experiments were performed with stationary noise and competing speech, one in the presence of a single masker, the other with multiple maskers, for a variety of spatial configurations. Overall, BiDWGP with both input forms predicts listener keyword scores with correlations of 0.95 and 0.91 for single- and multi-masker conditions, respectively. When considering masker type separately, correlations rise to 0.95 and above for both types of maskers. Predictions using the two input forms are very similar, suggesting that BiDWGP can be applied to the design of sound scenes where only individual sound sources and their locations are available.
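The "glimpsing" idea behind the metric can be sketched numerically (a simplified NumPy illustration, not the BiDWGP implementation): in a time-frequency representation, a glimpse is a unit where the target's local level exceeds the masker's by some threshold, and the glimpse proportion is the fraction of such units.

```python
import numpy as np

def glimpse_proportion(target_tf, masker_tf, threshold_db=3.0):
    """Fraction of time-frequency units where the target level
    exceeds the masker level by at least `threshold_db`.
    Simplified sketch -- BiDWGP additionally applies distortion
    weighting and better-ear/binaural-unmasking terms."""
    local_snr_db = 10 * np.log10(target_tf / masker_tf)
    return float(np.mean(local_snr_db >= threshold_db))

rng = np.random.default_rng(0)
target = rng.uniform(0.1, 10.0, size=(32, 100))   # toy spectro-temporal power
masker = rng.uniform(0.1, 10.0, size=(32, 100))
print(glimpse_proportion(target, masker))
```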
The moving minimum audible angle is smaller during self motion than during source motion
We are rarely perfectly still: our heads rotate in three axes and move in three dimensions, constantly varying the spectral and binaural cues at the ear drums. In spite of this motion, static sound sources in the world are typically perceived as stable objects. This argues that the auditory system, in a manner not unlike the vestibulo-ocular reflex, works to compensate for self motion and stabilize our sensory representation of the world. We tested a prediction arising from this postulate: that self motion should be processed more accurately than source motion. We used an infrared motion tracking system to measure head angle, and real-time interpolation of head related impulse responses to create "head-stabilized" signals that appeared to remain fixed in space as the head turned. After being presented with pairs of simultaneous signals consisting of a man and a woman speaking a snippet of speech, normal and hearing impaired listeners were asked to report whether the female voice was to the left or the right of the male voice. In this way we measured the moving minimum audible angle (MMAA). This measurement was made while listeners were asked to turn their heads back and forth between ± 15° and the signals were stabilized in space. After this "self-motion" condition we measured MMAA in a second "source-motion" condition when listeners remained still and the virtual locations of the signals were moved using the trajectories from the first condition. For both normal and hearing impaired listeners, we found that the MMAA for signals moving relative to the head was ~1-2° smaller when the movement was the result of self motion than when it was the result of source motion, even though the motion with respect to the head was identical. These results as well as the results of past experiments suggest that spatial processing involves an ongoing and highly accurate comparison of spatial acoustic cues with self-motion cues.
Perceptual Investigation of Sound Field Synthesis
This thesis investigates the two sound field synthesis methods Wave Field Synthesis and near-field compensated higher order Ambisonics. It summarizes their theory and provides a software toolkit for corresponding numerical simulations. Possible deviations of the synthesized sound field for real loudspeaker arrays and their perceptual relevance are discussed. This is done firstly based on theoretical considerations, and then addressed in several psychoacoustic experiments. These experiments investigate the spatial and timbral fidelity and spectro-temporal artifacts in a systematic way for a large number of different loudspeaker setups.
The experiments are conducted with the help of dynamic binaural synthesis in order to simulate loudspeaker setups with an inter-loudspeaker spacing of under 1 cm. The results show that spatial fidelity can already be achieved with setups having an inter-loudspeaker spacing of 20 cm, whereas timbral fidelity is only possible for setups employing a spacing below 1 cm. Spectro-temporal artifacts are relevant only for the synthesis of focused sources. At the end of the thesis, a binaural auditory model is presented that is able to predict the spatial fidelity for any given loudspeaker setup.
Code to reproduce the figures in the paper 'Assessing localization accuracy in sound field synthesis'
In this upload you find all the scripts and data needed to reproduce the
figures from the paper Wierstorf et al., "Assessing localization accuracy in
sound field synthesis" [1].
## Software Requirements
### Sound Field Synthesis Toolbox
From the [Sound Field Synthesis
Toolbox](https://github.com/sfstoolbox/sfs-matlab) git
repository you need to check out *commit 3730bc0*, which is identical
to release 1.0.0. Under Linux this can be done the following way:
```
git clone https://github.com/sfstoolbox/sfs-matlab.git
cd sfs-matlab
git checkout 3730bc0
cd ..
```
### SOFA (Spatially Oriented Format for Acoustics) Matlab/Octave API
From the [SOFA Matlab/Octave API](https://github.com/sofacoustics/API_MO) git
repository you need to check out *commit 260079a*, which is identical
to release 1.0.1. Under Linux this can be done the following way:
```
git clone https://github.com/sofacoustics/API_MO.git sofa
cd sofa
git checkout 260079a
cd ..
```
### Two!Ears Auditory Front-end
From the [Two!Ears Auditory
Front-end](https://github.com/TWOEARS/auditory-front-end) git repository you
need to check out *commit ce47b54*, which is identical to
release 1.0. Under Linux this can be done the following way:
```
git clone https://github.com/TWOEARS/auditory-front-end.git
cd auditory-front-end
git checkout ce47b54
cd ..
```
## Reproduce data
After downloading all needed packages, start Matlab. Then we have to initialize
all the toolboxes. This can be done by running the following commands from the
directory that contains the three checkouts:
```Matlab
>> cd sfs-matlab
>> SFS_start;
>> cd ../sofa/API_MO
>> SOFAstart;
>> cd ../../auditory-front-end
>> startAuditoryFrontEnd;
>> cd ..
```
Now everything is prepared and you can recreate the data from the numerical
simulation in Fig. 1 and the ITD estimation by the binaural model in Fig. 6:
```Matlab
>> cd fig01
>> fig01
>> cd ../fig06
>> fig06
```
## Reproduce figures
All figures were plotted using gnuplot 5.0. Every figure folder contains a
`figXX.plt` file; executing it with gnuplot produces the resulting PDF file.
Only Fig. 2 is generated via TikZ and has to be compiled from the `fig02.tex`
file using pdflatex.
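The per-figure commands can be collected with a small helper (a hypothetical convenience script, not part of the upload; it only prints the commands rather than running them, since gnuplot and pdflatex must be installed separately):

```python
from pathlib import Path

def figure_commands(root="."):
    """List the shell commands needed to rebuild each figure.
    fig02 is the only TikZ/pdflatex figure; all others use gnuplot.
    (Hypothetical helper; run the printed commands manually.)"""
    cmds = []
    for d in sorted(Path(root).glob("fig*")):
        if d.name == "fig02":
            cmds.append(f"cd {d.name} && pdflatex {d.name}.tex")
        else:
            cmds.append(f"cd {d.name} && gnuplot {d.name}.plt")
    return cmds

for cmd in figure_commands():
    print(cmd)
```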
## References
[1] H. Wierstorf, A. Raake, S. Spors, "Assessing localization accuracy in sound
field synthesis," J. Acoust. Soc. Am., 141, p. 1111-1119 (2017), doi:10.1121/1.4976061