Cognitive performance in open-plan office acoustic simulations: Effects of room acoustics and semantics but not spatial separation of sound sources
The irrelevant sound effect (ISE) characterizes the impairment of short-term memory performance during irrelevant sound relative to quiet. The irrelevant sound presented in most laboratory-based ISE studies has been too limited to represent complex scenarios such as open-plan offices (OPOs), and few studies have considered serial recall of heard information. This paper investigates the ISE using an auditory-verbal serial recall task, in which performance was evaluated for factors relevant to simulating OPO acoustics: the irrelevant sounds, including the semanticity of speech; the reproduction method over headphones; and the room acoustics. Results (Experiments 1 and 2) show that an ISE was exhibited in most conditions with anechoic (irrelevant) nonspeech sounds with or without speech, but the effect was substantially larger with meaningful speech than with foreign speech, suggesting a semantic effect. Performance differences between diotic and binaural reproductions were not statistically robust, suggesting a limited role for the spatial separation of sources. In Experiment 3, a statistically robust ISE was exhibited for binaural room acoustic conditions with mid-frequency reverberation times of T30 = 0.4, 0.8, and 1.1 s, suggesting cognitive impairment regardless of the amount of sound absorption representative of OPOs. Performance differences between the T30 = 0.4 s condition and the T30 = 0.8 and 1.1 s conditions were statistically robust, emphasizing the benefits of increased sound absorption for cognitive performance and reinforcing extant room acoustic design recommendations. Performance differences between T30 = 0.8 s and 1.1 s were not statistically robust. Collectively, these results suggest that certain findings from ISE studies with idiosyncratic acoustics may not translate well to complex OPO acoustic environments.
Automatic Speech Separation for Brain-Controlled Hearing Technologies
Speech perception in crowded acoustic environments is particularly challenging for hearing-impaired listeners. While assistive hearing devices can suppress background noises that are distinct from speech, they struggle to attenuate interfering speakers without knowing which speaker the listener is focusing on. The human brain has a remarkable ability to pick out individual voices in a noisy environment such as a crowded restaurant or a busy city street, and this ability has inspired brain-controlled hearing technologies. A brain-controlled hearing aid acts as an intelligent filter, reading the wearer's brainwaves and enhancing the voice the wearer wants to focus on.
Two essential elements form the core of brain-controlled hearing aids: automatic speech separation (SS), which isolates individual speakers from the mixed audio of an acoustic scene, and auditory attention decoding (AAD), in which the brainwaves of the listener are compared with the separated speakers to determine the attended one, which can then be amplified to facilitate hearing. This dissertation focuses on speech separation and its integration with AAD, aiming to propel the evolution of brain-controlled hearing technologies. The goal is to help users engage in conversations with the people around them seamlessly and efficiently.
This dissertation is structured into two parts. The first part focuses on automatic speech separation models, beginning with the introduction of a real-time monaural speech separation model, followed by more advanced real-time binaural speech separation models. The binaural models use both spectral and spatial features to separate speakers and are more robust to noise and reverberation. Beyond performing speech separation, the binaural models preserve the interaural cues of the separated sound sources, which is a significant step towards immersive augmented hearing. Additionally, the first part explores using speaker identification to improve the performance and robustness of models in long-form speech separation. This part also delves into unsupervised learning methods for multi-channel speech separation, aiming to improve the models' ability to generalize to real-world audio.
The second part of the dissertation integrates the speech separation introduced in the first part with auditory attention decoding (SS-AAD) to develop brain-controlled augmented hearing systems. It is demonstrated that auditory attention decoding with automatically separated speakers is as accurate and fast as with clean speech sounds. Furthermore, to better align the experimental environment of SS-AAD systems with real-life scenarios, the second part introduces a new AAD task that closely simulates real-world complex acoustic settings. The results show that the SS-AAD system is capable of improving speech intelligibility and facilitating tracking of the attended speaker in realistic acoustic environments. Finally, this part presents the use of self-supervised speech representations in SS-AAD systems to enhance the neural decoding of attentional selection.
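As a rough illustration of the SS-AAD loop described above, here is a minimal Python sketch of a stimulus-reconstruction-style decoder: each separated speaker is scored by correlating its envelope with a speech envelope reconstructed from the listener's EEG, and the best-matching speaker is amplified in the remix. This is not the dissertation's model; the envelope correlation scoring, the 64 Hz envelope rate, and the 9 dB gain are illustrative assumptions.

    import numpy as np
    from scipy.signal import hilbert, resample

    def envelope(audio, sr, env_sr=64):
        # Broadband amplitude envelope, downsampled to the (assumed) EEG rate.
        env = np.abs(hilbert(audio))
        return resample(env, int(len(audio) * env_sr / sr))

    def decode_attention(eeg_envelope, separated_sources, sr):
        # Score each separated speaker by the Pearson correlation between
        # its envelope and the envelope reconstructed from the EEG; the
        # best-matching speaker is taken to be the attended one.
        scores = []
        for src in separated_sources:
            env = envelope(src, sr)
            n = min(len(env), len(eeg_envelope))
            scores.append(np.corrcoef(env[:n], eeg_envelope[:n])[0, 1])
        return int(np.argmax(scores)), scores

    def remix(separated_sources, attended_idx, gain_db=9.0):
        # Amplify the attended speaker relative to the others (the gain is
        # an illustrative assumption) and peak-normalize the result.
        g = 10.0 ** (gain_db / 20.0)
        out = sum(g * s if i == attended_idx else s
                  for i, s in enumerate(separated_sources))
        return out / (np.max(np.abs(out)) + 1e-12)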
A psychoacoustic engineering approach to machine sound source separation in reverberant environments
Reverberation continues to present a major problem for sound source separation algorithms, because it corrupts many of the acoustical cues on which these algorithms rely. However, humans demonstrate a remarkable robustness to reverberation, and many of the underlying psychophysical and perceptual mechanisms are well documented. This thesis therefore considers the research question: can the reverberation performance of existing psychoacoustic engineering approaches to machine source separation be improved?

The precedence effect is a perceptual mechanism that aids our ability to localise sounds in reverberant environments. Despite this, relatively little work has been done on incorporating the precedence effect into automated sound source separation. Consequently, a study was conducted that compared several computational precedence models and their impact on the performance of a baseline separation algorithm. The algorithm included a precedence model, which was replaced with each of the other precedence models during the investigation. The models were tested using a novel metric in a range of reverberant rooms and with a range of other mixture parameters. The metric, termed the Ideal Binary Mask Ratio, is shown to be robust to the effects of reverberation and facilitates meaningful and direct comparison between algorithms across different acoustic conditions. Large differences between the performances of the models were observed. The results showed that a separation algorithm incorporating a model based on interaural coherence produces the greatest performance gain over the baseline algorithm. The results from the study also indicated that it may be necessary to adapt the precedence model to the acoustic conditions in which it is utilised.

This effect is analogous to the perceptual Clifton effect, a dynamic component of the precedence effect that appears to adapt precedence to a given acoustic environment in order to maximise its effectiveness. However, no work has been carried out on adapting a precedence model to the acoustic conditions under test: although the necessity for such a component has been suggested in the literature, neither its necessity nor its benefit has been formally validated. Consequently, a further study was conducted in which the parameters of each of the previously compared precedence models were varied in each room, in order to identify whether, and to what extent, separation performance varied with these parameters. The results showed that the reverberation performance of existing psychoacoustic engineering approaches to machine source separation can be improved, yielding significant gains in separation performance.
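For readers unfamiliar with this metric family, the following Python sketch shows the ideal binary mask (IBM) idea underlying the Ideal Binary Mask Ratio, together with a simple mask-agreement score. The agreement score is one plausible reading of such a ratio, not the thesis's exact definition, and the 0 dB local criterion and STFT parameters are illustrative assumptions.

    import numpy as np
    from scipy.signal import stft

    def ideal_binary_mask(target, interferer, sr, lc_db=0.0, nperseg=1024):
        # IBM: 1 where the local target-to-interferer energy ratio exceeds
        # the local criterion lc_db (0 dB here is an assumption).
        _, _, T = stft(target, fs=sr, nperseg=nperseg)
        _, _, I = stft(interferer, fs=sr, nperseg=nperseg)
        local_snr = 20.0 * np.log10((np.abs(T) + 1e-12) / (np.abs(I) + 1e-12))
        return (local_snr > lc_db).astype(float)

    def mask_agreement(estimated_mask, ibm):
        # Fraction of time-frequency units where the estimated mask agrees
        # with the IBM; one plausible IBM-ratio-style score.
        return float(np.mean(estimated_mask == ibm))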
A Phoneme-Scale Assessment of Multichannel Speech Enhancement Algorithms
In the intricate acoustic landscapes where speech intelligibility is challenged by noise and reverberation, multichannel speech enhancement emerges as a promising solution for individuals with hearing loss. Such algorithms are commonly evaluated at the utterance level. However, this approach overlooks the granular acoustic nuances revealed by phoneme-specific analysis, potentially obscuring key insights into their performance. This paper presents an in-depth phoneme-scale evaluation of three state-of-the-art multichannel speech enhancement algorithms. These algorithms (FasNet, MVDR, and Tango) are extensively evaluated across different noise conditions and spatial setups, employing realistic acoustic simulations with measured room impulse responses and leveraging the diversity offered by multiple microphones in a binaural hearing setup. The study emphasizes fine-grained phoneme-level analysis, revealing that while some phonemes, such as plosives, are heavily affected by environmental acoustics and difficult for the algorithms to handle, others, such as nasals and sibilants, see substantial improvements after enhancement. These investigations demonstrate important improvements in phoneme clarity in noisy conditions, with insights that could drive the development of more personalized and phoneme-aware hearing aid technologies.

Comment: This is the preprint of the paper submitted to the Trends in Hearing journal.
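A phoneme-scale evaluation of this kind can be prototyped by slicing the clean and enhanced signals at forced-alignment phoneme boundaries and aggregating a segment metric per phoneme class. The Python sketch below uses SI-SDR as the segment metric and assumes (label, start_s, end_s) alignment tuples; both choices are illustrative assumptions, not the paper's exact protocol.

    import numpy as np

    def si_sdr(reference, estimate, eps=1e-12):
        # Scale-invariant SDR (dB) for one segment.
        ref = reference - reference.mean()
        est = estimate - estimate.mean()
        target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
        noise = est - target
        return 10.0 * np.log10((np.dot(target, target) + eps) /
                               (np.dot(noise, noise) + eps))

    def per_phoneme_scores(clean, enhanced, alignment, sr):
        # `alignment` holds (label, start_s, end_s) tuples from a forced
        # aligner; segment scores are averaged per phoneme label.
        scores = {}
        for label, start, end in alignment:
            a, b = int(start * sr), int(end * sr)
            if b - a < 2:
                continue  # skip degenerate segments
            scores.setdefault(label, []).append(si_sdr(clean[a:b], enhanced[a:b]))
        return {ph: float(np.mean(v)) for ph, v in scores.items()}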
Studies on noise robust automatic speech recognition
Noise in everyday acoustic environments such as cars, traffic, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both classic and novel approaches suggested for noise-robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise-robust automatic speech recognition (course code T-61.6060) held at TKK.