
    Multisensory Integration Sites Identified by Perception of Spatial Wavelet Filtered Visual Speech Gesture Information

    Perception of speech is improved when presentation of the audio signal is accompanied by concordant visual speech gesture information. This enhancement is most prevalent when the audio signal is degraded. One potential means by which the brain affords perceptual enhancement is thought to be through the integration of concordant information from multiple sensory channels at common sites of convergence, multisensory integration (MSI) sites. Some studies have identified potential sites in the superior temporal gyrus/sulcus (STG/S) that are responsive to multisensory information from the auditory speech signal and visual speech movement. One limitation of these studies is that they do not control for activity resulting from attentional modulation cued by such things as visual information signaling the onsets and offsets of the acoustic speech signal, or for activity resulting from MSI of properties of the auditory speech signal with aspects of gross visual motion that are not specific to place-of-articulation information. This fMRI experiment uses spatial wavelet bandpass filtered Japanese sentences presented with background multispeaker audio noise to discern brain activity reflecting MSI induced by auditory and visual correspondence of place-of-articulation information, while controlling for activity resulting from the above-mentioned factors. The experiment consists of a low-frequency (LF) filtered condition containing gross visual motion of the lips, jaw, and head without specific place-of-articulation information; a mid-frequency (MF) filtered condition containing place-of-articulation information; and an unfiltered (UF) condition. Sites of MSI selectively induced by auditory and visual correspondence of place-of-articulation information were identified by the presence of activity in both the MF and UF conditions relative to the LF condition. Based on these criteria, sites of MSI were found predominantly in the left middle temporal gyrus (MTG) and the left STG/S (including the auditory cortex). By controlling for additional factors that could also induce greater activity from visual motion information, this study identifies potential MSI sites that we believe are involved in improved speech perception intelligibility.
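
    To make the selection criterion concrete, the sketch below (ours, not from the paper) applies the conjunction logic to hypothetical voxelwise contrast maps; the volume shape and the threshold value are illustrative assumptions.

        import numpy as np

        # Hypothetical voxelwise contrast maps (e.g., z-statistics vs. baseline);
        # the volume shape and threshold below are illustrative assumptions.
        rng = np.random.default_rng(0)
        z_mf = rng.standard_normal((64, 64, 40))  # mid-frequency (MF) condition
        z_uf = rng.standard_normal((64, 64, 40))  # unfiltered (UF) condition
        z_lf = rng.standard_normal((64, 64, 40))  # low-frequency (LF) condition
        threshold = 2.3                           # illustrative significance cutoff

        # A voxel counts as a candidate MSI site only if BOTH the MF and the UF
        # condition show greater activity than the LF condition (a conjunction).
        msi_mask = ((z_mf - z_lf) > threshold) & ((z_uf - z_lf) > threshold)
        print(msi_mask.sum(), "candidate MSI voxels")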

    Studies on noise robust automatic speech recognition

    Noise in everyday acoustic environments such as cars, traffic, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both classic and novel approaches suggested for noise robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK.

    A Study into Speech Enhancement Techniques in Adverse Environment

    This dissertation developed speech enhancement techniques that improve speech quality in applications such as mobile communications, teleconferencing, and smart loudspeakers. For these applications it is necessary to suppress both noise and reverberation. The contribution of this dissertation is therefore twofold: a single-channel speech enhancement system that exploits the temporal and spectral diversity of the received microphone signal for noise suppression, and a multi-channel speech enhancement method that employs spatial diversity to reduce reverberation.
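
    The abstract does not spell out the single-channel method, but a classic baseline in the same family is magnitude spectral subtraction. A minimal sketch, assuming the first few STFT frames are speech-free so they can serve as the noise estimate:

        import numpy as np
        from scipy.signal import stft, istft

        def spectral_subtract(noisy, fs, noise_frames=10, floor=0.05):
            # Estimate the noise magnitude from the first `noise_frames` STFT
            # frames, assumed speech-free (an assumption of this sketch).
            _, _, X = stft(noisy, fs, nperseg=512)
            noise_mag = np.abs(X[:, :noise_frames]).mean(axis=1, keepdims=True)
            # Subtract the noise magnitude and apply a spectral floor to limit
            # musical-noise artifacts, then resynthesize with the noisy phase.
            mag = np.maximum(np.abs(X) - noise_mag, floor * np.abs(X))
            _, enhanced = istft(mag * np.exp(1j * np.angle(X)), fs, nperseg=512)
            return enhanced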

    Coding Strategies for Cochlear Implants Under Adverse Environments

    Cochlear implants are electronic prosthetic devices that restore partial hearing in patients with severe to profound hearing loss. Although most coding strategies have significantly improved the perception of speech in quiet listening conditions, limitations remain on speech perception under adverse environments such as background noise, reverberation, and band-limited channels. We propose strategies that improve the intelligibility of speech transmitted over telephone networks, reverberated speech, and speech in the presence of background noise. For telephone-processed speech, we examine the effects of adding low-frequency and high-frequency information to the band-limited telephone speech. Four listening conditions were designed to simulate the receiving frequency characteristics of telephone handsets. Results indicated improvement in cochlear implant and bimodal listening when telephone speech was augmented with high-frequency information, and this study therefore provides support for the design of algorithms that extend the bandwidth towards higher frequencies. The results also indicated added benefit from hearing aids for bimodal listeners in all four types of listening conditions. Speech understanding in acoustically reverberant environments is always a difficult task for hearing-impaired listeners. Reverberated sound consists of the direct sound, early reflections, and late reflections; late reflections are known to be detrimental to speech intelligibility. In this study, we propose a reverberation suppression strategy based on spectral subtraction (SS) to suppress the reverberant energies from late reflections. Results from listening tests for two reverberant conditions (RT60 = 0.3 s and 1.0 s) indicated significant improvement when stimuli were processed with the SS strategy. The proposed strategy operates with little to no prior information on the signal and the room characteristics and can therefore potentially be implemented in real-time CI speech processors. For speech in background noise, we propose a mechanism underlying the contribution of harmonics to the benefit of electroacoustic stimulation in cochlear implants. The proposed strategy is based on harmonic modeling and uses a synthesis-driven approach to synthesize the harmonics in voiced segments of speech. Based on objective measures, results indicated improvement in speech quality. This study warrants further work on the development of algorithms to regenerate harmonics of voiced segments in the presence of noise.
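
    As a rough illustration of the synthesis-driven harmonic-regeneration idea for voiced segments, the sketch below rebuilds a harmonic series from an estimated fundamental frequency; the pitch tracker and amplitude estimator that would supply f0 and amps are assumed and not shown.

        import numpy as np

        def synthesize_harmonics(f0, amps, fs, duration):
            # Rebuild a clean harmonic structure for one voiced frame from an
            # estimated fundamental frequency f0 (Hz) and per-harmonic amplitudes.
            t = np.arange(int(duration * fs)) / fs
            return sum(a * np.sin(2 * np.pi * (k + 1) * f0 * t)
                       for k, a in enumerate(amps))

        # Illustrative call: a 100 Hz voiced frame with three harmonics.
        frame = synthesize_harmonics(f0=100.0, amps=[1.0, 0.5, 0.25],
                                     fs=16000, duration=0.032)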

    Reconfigurable Multiband Dynamic Range Compression-based FRM Filter for Hearing Aid

    In this research, we present a method for enhancing the performance of hearing aids using a Multiband Dynamic Range Compression-based Reconfigurable Frequency Response Masking (FRM) Filter Bank. First, a uniform 16-band reconfigurable filter bank is designed utilizing the FRM scheme. The arrangement of each sub-band within the proposed filter bank is chosen to optimize the matching performance. Based on the hearing characteristics of patients, the sub-bands can be distributed across low-, medium-, and high-frequency regions, and the gain can be adjusted per the patient's hearing profile from their audiogram for better auditory compensation. Further, the Multiband Dynamic Range Compression (MBDRC) technique is applied to address the specific needs of individuals with different frequency-dependent hearing impairments: dynamic range compression is applied independently to the different frequency sub-bands within the filter bank. In MBDRC, the compression parameters, such as the compression threshold and ratio, can be adjusted independently for every sub-band, allowing a more tailored approach to the specific hearing needs of different frequency regions. For example, if an individual has more severe hearing loss in the high-frequency regions, higher compression ratios and lower compression thresholds can be applied to those sub-bands to amplify and improve audibility for high-frequency sounds. Once dynamic range compression is applied to each sub-band, the resultant sub-bands are reassembled to yield the final output signal, which can subsequently be transmitted to the speaker or receiver of the hearing aid. A GUI is helpful for visualization and parameter control, including gain adjustment and the compression parameters of this entire process; with this aim in mind, a GUI has been developed in MATLAB. Different audio files can be imported, and their frequency responses generated and observed. Based on a person's audiogram, the control parameters can be set to low, medium, or high, and the sub-band distribution across the low-, medium-, and high-frequency regions can be visualized. Further, the filter bank makes automatic gain adjustments, as seen in the GUI; the gain points for each band can also be manually adjusted according to the user's hearing characteristics to minimize the error. The compression parameters can likewise be set separately for each sub-band per the hearing requirements of the patient, and the processed output can be visualized in the output frequency response tab, where the input and output audio signals can be analyzed.
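
    A minimal sketch of the per-band compression step described above (static gain only; a real hearing aid would add attack/release smoothing, and all parameter values here are made up for illustration):

        import numpy as np

        def compress_band(band, threshold_db, ratio, eps=1e-12):
            # Static compression for one sub-band: levels above threshold_db
            # are attenuated according to the compression ratio.
            level_db = 20 * np.log10(np.abs(band) + eps)
            over_db = np.maximum(level_db - threshold_db, 0.0)
            gain_db = -over_db * (1.0 - 1.0 / ratio)
            return band * 10.0 ** (gain_db / 20.0)

        # Per-band (threshold_db, ratio) pairs, e.g. chosen from an audiogram;
        # these numbers and the stand-in sub-band signals are illustrative.
        params = {"low": (-40.0, 1.5), "mid": (-45.0, 2.0), "high": (-50.0, 3.0)}
        bands = {name: 0.1 * np.random.randn(16000) for name in params}
        output = sum(compress_band(bands[n], thr, r)
                     for n, (thr, r) in params.items())  # reassemble sub-bands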

    Perceptually motivated blind source separation of convolutive audio mixtures


    Cold Diffusion for Speech Enhancement

    Diffusion models have recently shown promising results for difficult enhancement tasks such as the conditional and unconditional restoration of natural images and audio signals. In this work, we explore the possibility of leveraging a recently proposed iterative diffusion framework, namely cold diffusion, to recover clean speech signals from noisy signals. The unique mathematical properties of the cold diffusion sampling process can be utilized to restore high-quality samples from arbitrary degradations. Based on these properties, we propose an improved training algorithm and objective to help the model generalize better during the sampling process. We verify the proposed framework by investigating two model architectures. Experimental results on the benchmark speech enhancement dataset VoiceBank-DEMAND demonstrate the strong performance of the proposed approach compared to representative discriminative models and diffusion-based enhancement models.
    Comment: 5 pages, 1 figure, 1 table, 3 algorithms. Submitted to ICASSP 202
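
    For context, the improved sampling rule from the original cold diffusion paper (Bansal et al.) can be written as the short loop below; restore and degrade stand in for the trained restoration network and the degradation operator, and this is a sketch of the general framework rather than of this paper's exact algorithm.

        def cold_diffusion_sample(x_T, restore, degrade, T):
            # Improved cold diffusion sampling: re-degrade the network's clean
            # estimate at two adjacent severities and remove their difference.
            x = x_T
            for t in range(T, 0, -1):
                x0_hat = restore(x, t)  # the network's clean-speech estimate
                x = x - degrade(x0_hat, t) + degrade(x0_hat, t - 1)
            return x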