1,253 research outputs found

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Get PDF
    International audienceSpeech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output

    Cortical tracking of formant modulations derived from silently presented lip movements and its decline with age

    Get PDF
    The integration of visual and auditory cues is crucial for successful processing of speech, especially under adverse conditions. Recent reports have shown that when participants watch muted videos of speakers, the phonological information about the acoustic speech envelope, which is associated with but independent from the speakers’ lip movements, is tracked by the visual cortex. However, the speech signal also carries richer acoustic details, for example, about the fundamental frequency and the resonant frequencies, whose visuophonological transformation could aid speech processing. Here, we investigated the neural basis of the visuo-phonological transformation processes of these more fine-grained acoustic details and assessed how they change as a function of age. We recorded whole-head magnetoencephalographic (MEG) data while the participants watched silent normal (i.e., natural) and reversed videos of a speaker and paid attention to their lip movements. We found that the visual cortex is able to track the unheard natural modulations of resonant frequencies (or formants) and the pitch (or fundamental frequency) linked to lip movements. Importantly, only the processing of natural unheard formants decreases significantly with age in the visual and also in the cingulate cortex. This is not the case for the processing of the unheard speech envelope, the fundamental frequency, or the purely visual information carried by lip movements. These results show that unheard spectral fine details (along with the unheard acoustic envelope) are transformed from a mere visual to a phonological representation. Aging affects especially the ability to derive spectral dynamics at formant frequencies. As listening in noisy environments should capitalize on the ability to track spectral fine details, our results provide a novel focus on compensatory processes in such challenging situations

    Sound and noise

    Get PDF
    Sound and noise problems in space environment and human tolerance criteria at varying frequencies and intensitie

    Multichannel Online Dereverberation based on Spectral Magnitude Inverse Filtering

    Full text link
    This paper addresses the problem of multichannel online dereverberation. The proposed method is carried out in the short-time Fourier transform (STFT) domain, and for each frequency band independently. In the STFT domain, the time-domain room impulse response is approximately represented by the convolutive transfer function (CTF). The multichannel CTFs are adaptively identified based on the cross-relation method, and using the recursive least square criterion. Instead of the complex-valued CTF convolution model, we use a nonnegative convolution model between the STFT magnitude of the source signal and the CTF magnitude, which is just a coarse approximation of the former model, but is shown to be more robust against the CTF perturbations. Based on this nonnegative model, we propose an online STFT magnitude inverse filtering method. The inverse filters of the CTF magnitude are formulated based on the multiple-input/output inverse theorem (MINT), and adaptively estimated based on the gradient descent criterion. Finally, the inverse filtering is applied to the STFT magnitude of the microphone signals, obtaining an estimate of the STFT magnitude of the source signal. Experiments regarding both speech enhancement and automatic speech recognition are conducted, which demonstrate that the proposed method can effectively suppress reverberation, even for the difficult case of a moving speaker.Comment: Paper submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing. IEEE Signal Processing Letters, 201

    Data-driven Speech Enhancement:from Non-negative Matrix Factorization to Deep Representation Learning

    Get PDF

    A study on different linear and non-linear filtering techniques of speech and speech recognition

    Get PDF
    In any signal noise is an undesired quantity, however most of thetime every signal get mixed with noise at different levels of theirprocessing and application, due to which the information containedby the signal gets distorted and makes the whole signal redundant.A speech signal is very prominent with acoustical noises like bubblenoise, car noise, street noise etc. So for removing the noises researchershave developed various techniques which are called filtering. Basicallyall the filtering techniques are not suitable for every application,hence based on the type of application some techniques are betterthan the others. Broadly, the filtering techniques can be classifiedinto two categories i.e. linear filtering and non-linear filtering.In this paper a study is presented on some of the filtering techniqueswhich are based on linear and nonlinear approaches. These techniquesincludes different adaptive filtering based on algorithm like LMS,NLMS and RLS etc., Kalman filter, ARMA and NARMA time series applicationfor filtering, neural networks combine with fuzzy i.e. ANFIS. Thispaper also includes the application of various features i.e. MFCC,LPC, PLP and gamma for filtering and recognition

    Aerospace medicine and biology: A continuing bibliography with indexes

    Get PDF
    This bibliography lists 138 reports, articles, and other documents introduced into the NASA scientific and technical information system in Jun. 1980
    corecore