Cognitive performance in open-plan office acoustic simulations: Effects of room acoustics and semantics but not spatial separation of sound sources
The irrelevant sound effect (ISE) characterizes the impairment of short-term memory performance during irrelevant sound relative to quiet. The irrelevant sound presented in most laboratory-based ISE studies has been too limited to represent complex scenarios such as open-plan offices (OPOs), and few studies have considered serial recall of heard information. This paper investigates the ISE using an auditory-verbal serial recall task, in which performance was evaluated for factors relevant to simulating OPO acoustics: the irrelevant sounds, including the semanticity of speech; the reproduction method over headphones; and the room acoustics. Results (Experiments 1 and 2) show that an ISE was exhibited in most conditions with anechoic (irrelevant) nonspeech sounds with or without speech, but the effect was substantially larger with meaningful speech than with foreign speech, suggesting a semantic effect. Performance differences between diotic and binaural reproductions were not statistically robust, suggesting a limited role for the spatial separation of sources. In Experiment 3, a statistically robust ISE was exhibited for binaural room acoustic conditions with mid-frequency reverberation times of T30 = 0.4, 0.8, and 1.1 s, suggesting cognitive impairment regardless of the amount of sound absorption representative of OPOs. Performance differences between the T30 = 0.4 s condition and the T30 = 0.8 and 1.1 s conditions were statistically robust, emphasizing the benefits of increased sound absorption for cognitive performance and reinforcing extant room acoustic design recommendations. Performance differences between T30 = 0.8 s and 1.1 s were not statistically robust. Collectively, these results suggest that certain findings from ISE studies with idiosyncratic acoustics may not translate well to complex OPO acoustic environments.
Automatic Speech Separation for Brain-Controlled Hearing Technologies
Speech perception in crowded acoustic environments is particularly challenging for hearing-impaired listeners. While assistive hearing devices can suppress background noises that are distinct from speech, they struggle to attenuate interfering speakers without knowing which speaker the listener is focusing on. The human brain has a remarkable ability to pick out individual voices in a noisy environment such as a crowded restaurant or a busy city street, and this ability has inspired brain-controlled hearing technologies. A brain-controlled hearing aid acts as an intelligent filter, reading the wearer's brainwaves and enhancing the voice the wearer wants to focus on.
Two essential elements form the core of brain-controlled hearing aids: automatic speech separation (SS), which isolates individual speakers from the mixed audio of an acoustic scene, and auditory attention decoding (AAD), in which the brainwaves of the listener are compared with the separated speakers to determine the attended one, which can then be amplified to facilitate hearing. This dissertation focuses on speech separation and its integration with AAD, aiming to propel the evolution of brain-controlled hearing technologies. The goal is to help users engage in conversations with the people around them seamlessly and efficiently.
This dissertation is structured into two parts. The first part focuses on automatic speech separation models, beginning with the introduction of a real-time monaural speech separation model, followed by more advanced real-time binaural speech separation models. The binaural models use both spectral and spatial features to separate speakers and are more robust to noise and reverberation. Beyond performing speech separation, the binaural models preserve the interaural cues of the separated sound sources, which is a significant step towards immersive augmented hearing. Additionally, the first part explores using speaker identification to improve the performance and robustness of models in long-form speech separation. This part also delves into unsupervised learning methods for multi-channel speech separation, aiming to improve the models' ability to generalize to real-world audio.
The second part of the dissertation integrates the speech separation introduced in the first part with auditory attention decoding (SS-AAD) to develop brain-controlled augmented hearing systems. It is demonstrated that auditory attention decoding with automatically separated speakers is as accurate and fast as with clean speech sounds. Furthermore, to better align the experimental environment of SS-AAD systems with real-life scenarios, the second part introduces a new AAD task that closely simulates real-world complex acoustic settings. The results show that the SS-AAD system is capable of improving speech intelligibility and facilitating tracking of the attended speaker in realistic acoustic environments. Finally, this part presents the use of self-supervised speech representations in SS-AAD systems to enhance the neural decoding of attentional selection.
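As a rough illustration of the SS-AAD loop described above, here is a minimal Python sketch of a stimulus-reconstruction-style decoder: each separated speaker is scored by correlating its envelope with a speech envelope reconstructed from the listener's EEG, and the best-matching speaker is amplified in the remix. This is not the dissertation's model; the envelope correlation scoring, the 64 Hz envelope rate, and the 9 dB gain are illustrative assumptions.

    import numpy as np
    from scipy.signal import hilbert, resample

    def envelope(audio, sr, env_sr=64):
        # Broadband amplitude envelope, downsampled to the (assumed) EEG rate.
        env = np.abs(hilbert(audio))
        return resample(env, int(len(audio) * env_sr / sr))

    def decode_attention(eeg_envelope, separated_sources, sr):
        # Score each separated speaker by the Pearson correlation between
        # its envelope and the envelope reconstructed from the EEG; the
        # best-matching speaker is taken to be the attended one.
        scores = []
        for src in separated_sources:
            env = envelope(src, sr)
            n = min(len(env), len(eeg_envelope))
            scores.append(np.corrcoef(env[:n], eeg_envelope[:n])[0, 1])
        return int(np.argmax(scores)), scores

    def remix(separated_sources, attended_idx, gain_db=9.0):
        # Amplify the attended speaker relative to the others (the gain is
        # an illustrative assumption) and peak-normalize the result.
        g = 10.0 ** (gain_db / 20.0)
        out = sum(g * s if i == attended_idx else s
                  for i, s in enumerate(separated_sources))
        return out / (np.max(np.abs(out)) + 1e-12)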
A psychoacoustic engineering approach to machine sound source separation in reverberant environments
Reverberation continues to present a major problem for sound source separation algorithms, because it corrupts many of the acoustical cues on which these algorithms rely. However, humans demonstrate a remarkable robustness to reverberation, and many of the underlying psychophysical and perceptual mechanisms are well documented. This thesis therefore considers the research question: can the reverberation performance of existing psychoacoustic engineering approaches to machine source separation be improved?

The precedence effect is a perceptual mechanism that aids our ability to localise sounds in reverberant environments. Despite this, relatively little work has been done on incorporating the precedence effect into automated sound source separation. Consequently, a study was conducted that compared several computational precedence models and their impact on the performance of a baseline separation algorithm. The algorithm included a precedence model, which was replaced with each of the other precedence models during the investigation. The models were tested using a novel metric in a range of reverberant rooms and with a range of other mixture parameters. The metric, termed the Ideal Binary Mask Ratio, is shown to be robust to the effects of reverberation and facilitates meaningful and direct comparison between algorithms across different acoustic conditions. Large differences between the performances of the models were observed. The results showed that a separation algorithm incorporating a model based on interaural coherence produces the greatest performance gain over the baseline algorithm. The results from the study also indicated that it may be necessary to adapt the precedence model to the acoustic conditions in which it is utilised.

This effect is analogous to the perceptual Clifton effect, a dynamic component of the precedence effect that appears to adapt precedence to a given acoustic environment in order to maximise its effectiveness. However, no work has been carried out on adapting a precedence model to the acoustic conditions under test: although the necessity for such a component has been suggested in the literature, neither its necessity nor its benefit has been formally validated. Consequently, a further study was conducted in which the parameters of each of the previously compared precedence models were varied in each room, in order to identify whether, and to what extent, separation performance varied with these parameters. The results showed that the reverberation performance of existing psychoacoustic engineering approaches to machine source separation can be improved, yielding significant gains in separation performance.
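For readers unfamiliar with this metric family, the following Python sketch shows the ideal binary mask (IBM) idea underlying the Ideal Binary Mask Ratio, together with a simple mask-agreement score. The agreement score is one plausible reading of such a ratio, not the thesis's exact definition, and the 0 dB local criterion and STFT parameters are illustrative assumptions.

    import numpy as np
    from scipy.signal import stft

    def ideal_binary_mask(target, interferer, sr, lc_db=0.0, nperseg=1024):
        # IBM: 1 where the local target-to-interferer energy ratio exceeds
        # the local criterion lc_db (0 dB here is an assumption).
        _, _, T = stft(target, fs=sr, nperseg=nperseg)
        _, _, I = stft(interferer, fs=sr, nperseg=nperseg)
        local_snr = 20.0 * np.log10((np.abs(T) + 1e-12) / (np.abs(I) + 1e-12))
        return (local_snr > lc_db).astype(float)

    def mask_agreement(estimated_mask, ibm):
        # Fraction of time-frequency units where the estimated mask agrees
        # with the IBM; one plausible IBM-ratio-style score.
        return float(np.mean(estimated_mask == ibm))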
A Phoneme-Scale Assessment of Multichannel Speech Enhancement Algorithms
In the intricate acoustic landscapes where speech intelligibility is challenged by noise and reverberation, multichannel speech enhancement emerges as a promising solution for individuals with hearing loss. Such algorithms are commonly evaluated at the utterance level. However, this approach overlooks the granular acoustic nuances revealed by phoneme-specific analysis, potentially obscuring key insights into their performance. This paper presents an in-depth phoneme-scale evaluation of three state-of-the-art multichannel speech enhancement algorithms. These algorithms (FasNet, MVDR, and Tango) are extensively evaluated across different noise conditions and spatial setups, employing realistic acoustic simulations with measured room impulse responses and leveraging the diversity offered by multiple microphones in a binaural hearing setup. The study emphasizes fine-grained phoneme-level analysis, revealing that while some phonemes, such as plosives, are heavily affected by environmental acoustics and difficult for the algorithms to handle, others, such as nasals and sibilants, see substantial improvements after enhancement. These investigations demonstrate important improvements in phoneme clarity in noisy conditions, with insights that could drive the development of more personalized and phoneme-aware hearing aid technologies.

Comment: This is the preprint of the paper submitted to the Trends in Hearing journal.
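A phoneme-scale evaluation of this kind can be prototyped by slicing the clean and enhanced signals at forced-alignment phoneme boundaries and aggregating a segment metric per phoneme class. The Python sketch below uses SI-SDR as the segment metric and assumes (label, start_s, end_s) alignment tuples; both choices are illustrative assumptions, not the paper's exact protocol.

    import numpy as np

    def si_sdr(reference, estimate, eps=1e-12):
        # Scale-invariant SDR (dB) for one segment.
        ref = reference - reference.mean()
        est = estimate - estimate.mean()
        target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
        noise = est - target
        return 10.0 * np.log10((np.dot(target, target) + eps) /
                               (np.dot(noise, noise) + eps))

    def per_phoneme_scores(clean, enhanced, alignment, sr):
        # `alignment` holds (label, start_s, end_s) tuples from a forced
        # aligner; segment scores are averaged per phoneme label.
        scores = {}
        for label, start, end in alignment:
            a, b = int(start * sr), int(end * sr)
            if b - a < 2:
                continue  # skip degenerate segments
            scores.setdefault(label, []).append(si_sdr(clean[a:b], enhanced[a:b]))
        return {ph: float(np.mean(v)) for ph, v in scores.items()}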
Studies on noise robust automatic speech recognition
Noise in everyday acoustic environments such as cars, traffic, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both classic and novel approaches suggested for noise-robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise-robust automatic speech recognition (course code T-61.6060) held at TKK.