371 research outputs found

Acoustic source separation based on target equalization-cancellation

Author: Mi Jing
Publication date: 20/02/2018

Normal-hearing listeners are good at focusing on the target talker while ignoring the interferers in a multi-talker environment. Therefore, efforts have been devoted to building psychoacoustic models to understand binaural processing in multi-talker environments and to developing bio-inspired source separation algorithms for hearing-assistive devices. This thesis presents a target-Equalization-Cancellation (target-EC) approach to the source separation problem. The idea of the target-EC approach is to use the energy change before and after cancelling the target to estimate a time-frequency (T-F) mask in which each entry estimates the strength of the target signal in the original mixture. Once the mask is calculated, it is applied to the original mixture to preserve the target-dominant T-F units and to suppress the interferer-dominant T-F units. On the psychoacoustic modeling side, when the output of the target-EC approach is evaluated with the Coherence-based Speech Intelligibility Index (CSII), the predicted binaural advantage closely matches the pattern of the measured data. On the application side, the performance of the target-EC source separation algorithm was evaluated by psychoacoustic measurements using both a closed-set speech corpus and an open-set speech corpus, and it was shown that the target-EC cue is a better cue for source separation than the interaural difference cues.

Boston University Institutional Repository (OpenBU)
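
The target-EC abstract above describes estimating a T-F mask from the energy change before and after cancelling the target, then applying it to the mixture. The sketch below is only an illustration of that idea, not the thesis's actual formula: the normalized energy drop used for the mask, the clipping to [0, 1], and the names `ec_mask` and `apply_mask` are all assumptions made here.

```python
def ec_mask(energy_before, energy_after, floor=1e-12):
    """Assumed mask rule: fraction of energy removed by cancelling the target.

    Both inputs are lists of rows (time frames) of per-frequency energies.
    A large energy drop suggests a target-dominant T-F unit (mask near 1);
    little or no drop suggests an interferer-dominant unit (mask near 0).
    """
    mask = []
    for row_before, row_after in zip(energy_before, energy_after):
        mask.append([
            min(1.0, max(0.0, (b - a) / max(b, floor)))
            for b, a in zip(row_before, row_after)
        ])
    return mask


def apply_mask(mixture, mask):
    """Scale each T-F unit of the mixture by its mask value."""
    return [
        [m * x for m, x in zip(mask_row, mix_row)]
        for mask_row, mix_row in zip(mask, mixture)
    ]


# Hypothetical one-frame, two-band example.
mask = ec_mask([[4.0, 1.0]], [[1.0, 1.0]])
separated = apply_mask([[2.0, 1.0]], mask)
```

In this toy input, the first band loses most of its energy when the target is cancelled and is therefore kept, while the second band is unchanged by cancellation and is suppressed.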

Blind identification of acoustic systems and enhancement of reverberant speech

Author: Gaubitch Nikolay Dian
Publication date: 01/01/2007

Imperial Users only

Spiral - Imperial College Digital Repository

EXPERIMENTAL EVALUATION OF MODIFIED PHASE TRANSFORM FOR SOUND SOURCE DETECTION

Author: Ramamurthy Anand
Publication venue: UKnowledge
Publication date: 01/01/2007

The detection of sound sources with microphone arrays can be enhanced through processing individual microphone signals prior to the delay and sum operation. One method in particular, the Phase Transform (PHAT), has demonstrated improvement in sound source location images, especially in reverberant and noisy environments. Recent work proposed a modification to the PHAT that allows varying degrees of spectral whitening through a single parameter, β, which has shown improvement in target detection in simulation results. This work focuses on experimental evaluation of the modified SRP-PHAT algorithm. Performance results are computed from an experimental setup with an 8-element perimeter array, using a receiver operating characteristic (ROC) analysis for detecting sound sources. The results verified simulation results of PHAT-β in improving target detection probabilities. The ROC analysis demonstrated the relationships between various target types (narrowband and broadband), room reverberation levels (high and low) and noise levels (different SNR) with respect to the optimal β. Experimental results strongly agree with those of simulations on the effect of PHAT in significantly improving detection performance for narrowband and broadband signals, especially at low SNR and in the presence of high levels of reverberation.

University of Kentucky
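
The abstract above describes a PHAT variant in which a single parameter controls the degree of spectral whitening. A minimal sketch for one microphone pair follows, assuming the common form in which the cross-spectrum is divided by its magnitude raised to that parameter (called `beta` here). The exponent form, the function names, and the naive DFT are assumptions for illustration; the thesis's full SRP-PHAT array processing is not reproduced.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, kept dependency-free."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_beta(sig_a, sig_b, beta=1.0, eps=1e-12):
    """Circular cross-correlation of equal-length signals with partial whitening.

    beta = 0 leaves the cross-spectrum unweighted (plain cross-correlation);
    beta = 1 removes the magnitude entirely (conventional PHAT).
    """
    spec_a = dft(sig_a)
    spec_b = dft(sig_b)
    cross = [a * b.conjugate() for a, b in zip(spec_a, spec_b)]
    weighted = [c / (abs(c) + eps) ** beta for c in cross]
    return [v.real for v in dft(weighted, inverse=True)]


def estimate_delay(sig_a, sig_b, beta=1.0):
    """Return the lag in samples by which sig_a trails sig_b."""
    corr = gcc_phat_beta(sig_a, sig_b, beta)
    n = len(corr)
    peak = max(range(n), key=lambda i: corr[i])
    # Indices past the midpoint represent negative lags.
    return peak if peak <= n // 2 else peak - n


# Hypothetical test signal: a short pulse and a copy delayed by 3 samples.
reference = [0.0] * 32
reference[5], reference[6] = 1.0, 0.5
delayed = reference[-3:] + reference[:-3]
lag = estimate_delay(delayed, reference, beta=0.7)
```

Intermediate values of `beta` trade off the sharp correlation peaks of full whitening against the noise amplification it causes in low-energy frequency bins, which is why a tunable exponent can help with narrowband sources.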

Multi-channel dereverberation for speech intelligibility improvement in hearing aid applications

Author: Kuklasinski Adam
Publication venue: Aalborg Universitetsforlag
Publication date: 01/01/2016

VBN

DESIGN AND EVALUATION OF HARMONIC SPEECH ENHANCEMENT AND BANDWIDTH EXTENSION

Author: Venkatasubramanian Arvind
Publication venue: Scholarship@Western
Publication date: 01/01/2011

Improving the quality and intelligibility of speech signals continues to be an important topic in mobile communications and hearing aid applications. This thesis explored the possibilities of improving the quality of corrupted speech by cascading a log Minimum Mean Square Error (logMMSE) noise reduction system with a Harmonic Speech Enhancement (HSE) system. In HSE, an adaptive comb filter is deployed to harmonically filter the useful speech signal and suppress the noisy components to the noise floor. A Bandwidth Extension (BWE) algorithm was applied to the enhanced speech for further improvements in speech quality. Performance of this algorithm combination was evaluated using objective speech quality metrics across a variety of noisy and reverberant environments. Results showed that the logMMSE and HSE combination enhanced the speech quality in any reverberant environment and in the presence of multi-talker babble. The objective improvements associated with the BWE were found to be minimal.

Scholarship@Western

Automatic Speech Separation for Brain-Controlled Hearing Technologies

Author: Han Cong
Publication date: 01/01/2024

Speech perception in crowded acoustic environments is particularly challenging for hearing-impaired listeners. While assistive hearing devices can suppress background noises distinct from speech, they struggle to lower interfering speakers without knowing the speaker on which the listener is focusing. The human brain has a remarkable ability to pick out individual voices in a noisy environment like a crowded restaurant or a busy city street. This ability inspires brain-controlled hearing technologies. A brain-controlled hearing aid acts as an intelligent filter, reading wearers’ brainwaves and enhancing the voice they want to focus on. Two essential elements form the core of brain-controlled hearing aids: automatic speech separation (SS), which isolates individual speakers from mixed audio in an acoustic scene, and auditory attention decoding (AAD), in which the brainwaves of listeners are compared with separated speakers to determine the attended one, which can then be amplified to facilitate hearing. This dissertation focuses on speech separation and its integration with AAD, aiming to propel the evolution of brain-controlled hearing technologies. The goal is to help users engage in conversations with people around them seamlessly and efficiently. This dissertation is structured into two parts. The first part focuses on automatic speech separation models, beginning with the introduction of a real-time monaural speech separation model, followed by more advanced real-time binaural speech separation models. The binaural models use both spectral and spatial features to separate speakers and are more robust to noise and reverberation. Beyond performing speech separation, the binaural models preserve the interaural cues of separated sound sources, which is a significant step towards immersive augmented hearing. Additionally, the first part explores using speaker identification to improve the performance and robustness of models in long-form speech separation.
This part also delves into unsupervised learning methods for multi-channel speech separation, aiming to improve the models' ability to generalize to real-world audio. The second part of the dissertation integrates the speech separation introduced in the first part with auditory attention decoding (SS-AAD) to develop brain-controlled augmented hearing systems. It is demonstrated that auditory attention decoding with automatically separated speakers is as accurate and fast as using clean speech sounds. Furthermore, to better align the experimental environment of SS-AAD systems with real-life scenarios, the second part introduces a new AAD task that closely simulates real-world complex acoustic settings. The results show that the SS-AAD system is capable of improving speech intelligibility and facilitating tracking of the attended speaker in realistic acoustic environments. Finally, this part employs self-supervised speech representations in the SS-AAD systems to enhance the neural decoding of attentional selection.

Columbia University Academic Commons