6 research outputs found

    Noise-robust whispered speech recognition using a non-audible-murmur microphone with VTS compensation


    Multi-task deep neural network acoustic models with model adaptation using discriminative speaker identity for whisper recognition

    This paper presents a study on large vocabulary continuous whisper automatic recognition (wLVCSR). wLVCSR makes it possible to use ASR equipment in public places without disturbing others or leaking private information. However, wLVCSR is much more challenging than normal LVCSR because of the absence of pitch, which not only makes the signal-to-noise ratio (SNR) of whispers much lower than that of normal speech but also leads to flatness and formant shifts in whisper spectra. Furthermore, far less whisper data is available for training than normal speech data. In this paper, multi-task deep neural network (DNN) acoustic models are deployed to address these problems. In addition, model adaptation based on discriminative speaker identity information is performed on the multi-task DNN to normalize speaker and environmental variability in whispers. On a Mandarin whisper dictation task with 55 hours of whisper data, the proposed speaker-independent (SI) multi-task DNN model achieves a 56.7% character error rate (CER) improvement over a baseline Gaussian Mixture Model (GMM) discriminatively trained using only the whisper data. The CER of the proposed model for normal speech reaches 15.2%, which is close to the performance of a state-of-the-art DNN trained with one thousand hours of speech data. From this baseline, the model-adapted DNN gains a further 10.9% CER reduction over the generic model.

    Velum movement detection based on surface electromyography for speech interface

    Conventional speech communication systems do not perform well in the absence of an intelligible acoustic signal. Silent Speech Interfaces enable speech communication with speech-handicapped users and in noisy environments. However, since no acoustic signal is available, information on nasality may be absent, and nasality is an important and relevant characteristic of several languages, particularly European Portuguese. In this paper we propose a non-invasive method, surface Electromyography (EMG) electrodes positioned in the face and neck regions, to explore the existence of useful information about velum movement. The applied procedure takes advantage of Real-Time Magnetic Resonance Imaging (RT-MRI) data, collected from the same speakers, to interpret and validate the EMG data. By ensuring compatible scenario conditions and proper alignment between the EMG and RT-MRI data, we are able to estimate when the velum moves and the probable type of movement during a nasality occurrence. Overall, the results of this experiment revealed interesting and distinct characteristics in the EMG signal when a nasal vowel is uttered, and showed that it is possible to detect velum movement, particularly with sensors positioned below the ear between the mastoid process and the mandible in the upper neck region.

    Application of neural networks in whispered speech recognition.

    The recent success of Deep Neural Networks (DNN) in different machine learning tasks has contributed significantly to the renewed popularity of artificial neural networks (ANN) and to their current role in Automatic Speech Recognition (ASR). This thesis examines how artificial neural networks can benefit automatic whispered speech recognition...