476 research outputs found

    An on-line speaker adaptation method for HMM-based speech recognizers

    In the past few years numerous techniques have been proposed to improve the efficiency of basic adaptation methods like MLLR and MAP. These adaptation methods have a common aim, which is to increase the likelihood of the phoneme models for a particular speaker. During their operation, these speaker adaptation methods need precise phonetic segmentation information of the actual utterance, but these data samples are often faulty. To improve the overall performance, only those frames from the spoken sentence which are well segmented should be retained, while the incorrectly segmented data should not be used during adaptation. Several heuristic algorithms have been proposed in the literature for the selection of the reliably segmented data blocks, and here we would like to suggest some new heuristics that discriminate between faulty and well-segmented data. The effect of these methods on the efficiency of speech recognition using speaker adaptation is examined, and conclusions for each will be drawn. Besides post-filtering the set of the segmented adaptation examples, another way of improving the efficiency of the adaptation method might be to create a more precise segmentation, which should then reduce the chance of faulty data samples being included. We also suggest such a method here, based on a scoring procedure for the N-best lists, taking into account phoneme duration

    Advancing Electromyographic Continuous Speech Recognition: Signal Preprocessing and Modeling

    Speech is the natural medium of human communication, but audible speech can be overheard by bystanders and excludes speech-disabled people. This work presents a speech recognizer based on surface electromyography, where electric potentials of the facial muscles are captured by surface electrodes, allowing speech to be processed nonacoustically. A system which was state-of-the-art at the beginning of this book is substantially improved in terms of accuracy, flexibility, and robustness

    Concept Type Prediction and Responsive Adaptation in a Dialogue System

    Responsive adaptation in spoken dialog systems involves a change in dialog system behavior in response to a user or a dialog situation. In this paper we address responsive adaptation in the automatic speech recognition (ASR) module of a spoken dialog system. We hypothesize that information about the content of a user utterance may help improve speech recognition for the utterance. We use a two-step process to test this hypothesis: first, we automatically predict the task-relevant concept types likely to be present in a user utterance using features from the dialog context and from the output of first-pass ASR of the utterance; and then, we adapt the ASR's language model to the predicted content of the user's utterance and run a second pass of ASR. We show that: (1) it is possible to achieve high accuracy in determining presence or absence of particular concept types in a post-confirmation utterance; and (2) 2-pass speech recognition with concept type classification and language model adaptation can lead to improved speech recognition performance for post-confirmation utterances

    Current trends in multilingual speech processing

    In this paper, we describe recent work at Idiap Research Institute in the domain of multilingual speech processing and provide some insights into emerging challenges for the research community. Multilingual speech processing has been a topic of ongoing interest to the research community for many years and the field is now receiving renewed interest owing to two strong driving forces. Firstly, technical advances in speech recognition and synthesis are posing new challenges and opportunities to researchers. For example, discriminative features are seeing wide application by the speech recognition community, but additional issues arise when using such features in a multilingual setting. Another example is the apparent convergence of speech recognition and speech synthesis technologies in the form of statistical parametric methodologies. This convergence enables the investigation of new approaches to unified modelling for automatic speech recognition and text-to-speech synthesis (TTS) as well as cross-lingual speaker adaptation for TTS. The second driving force is the impetus being provided by both government and industry for technologies to help break down domestic and international language barriers, these also being barriers to the expansion of policy and commerce. Speech-to-speech and speech-to-text translation are thus emerging as key technologies at the heart of which lies multilingual speech processing