
    Acoustic Modelling for Under-Resourced Languages

    Automatic speech recognition systems have so far been developed for only a few of the 4,000-7,000 existing languages. In this thesis we examine methods to rapidly create acoustic models in new, possibly under-resourced languages in a time- and cost-effective manner. To this end we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages.

    Mispronunciation Detection and Diagnosis in Mandarin-Accented English Speech

    This work presents the development, implementation, and evaluation of a Mispronunciation Detection and Diagnosis (MDD) system applied to pronunciation evaluation of Mandarin-accented English speech. A comprehensive detection and diagnosis of errors in the Electromagnetic Articulography corpus of Mandarin-Accented English (EMA-MAE) was performed using the expert phonetic transcripts and an Automatic Speech Recognition (ASR) system. Articulatory features derived from the parallel kinematic data available in the EMA-MAE corpus were used to identify the most significant articulatory error patterns seen in L2 speakers during common mispronunciations. Using both acoustic and articulatory information, an ASR-based MDD system was built and evaluated across different feature combinations and Deep Neural Network (DNN) architectures. The system captured mispronunciation errors with a detection accuracy of 82.4%, a diagnostic accuracy of 75.8%, and a false rejection rate of 17.2%. The results demonstrate the advantage of using articulatory features both in revealing the significant contributors to mispronunciation and in improving the performance of MDD systems.
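
    The figures above follow the evaluation scheme commonly used for MDD systems. As a point of reference, here is a minimal Python sketch of how such metrics are typically computed from confusion counts; the count names, definitions, and example numbers below are our own illustration, not taken from the paper.

```python
# Hypothetical illustration of common MDD evaluation metrics. The counts
# follow the usual confusion-matrix convention for mispronunciation detection:
#   ta: correct phones accepted as correct        (true acceptance)
#   fr: correct phones flagged as mispronounced   (false rejection)
#   fa: mispronounced phones accepted as correct  (false acceptance)
#   tr: mispronounced phones flagged              (true rejection)
#   cd: flagged mispronunciations whose error type was also diagnosed correctly

def mdd_metrics(ta: int, fr: int, fa: int, tr: int, cd: int) -> dict:
    detection_accuracy = (ta + tr) / (ta + fr + fa + tr)
    false_rejection_rate = fr / (ta + fr)
    diagnostic_accuracy = cd / tr if tr else 0.0
    return {
        "detection_accuracy": detection_accuracy,
        "false_rejection_rate": false_rejection_rate,
        "diagnostic_accuracy": diagnostic_accuracy,
    }

# Example with made-up counts (not from the paper):
print(mdd_metrics(ta=800, fr=160, fa=90, tr=450, cd=340))
```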

    Articulatory features for conversational speech recognition


    Compensating hyperarticulation for automatic speech recognition


    Articulatory features for robust visual speech recognition


    Articulatory feature based continuous speech recognition using probabilistic lexical modeling

    Phonological studies suggest that the typical subword units, such as phones or phonemes, used in automatic speech recognition systems can be decomposed into a set of features based on the articulators used to produce the sound. Most current approaches to integrating articulatory feature (AF) representations into an automatic speech recognition (ASR) system are based on a deterministic, knowledge-based phoneme-to-AF relationship. In this paper, we propose a novel two-stage approach in the framework of probabilistic lexical modeling to integrate AF representations into an ASR system. In the first stage, the relationship between acoustic feature observations and various AFs is modeled. In the second stage, a probabilistic relationship between subword units and AFs is learned using transcribed speech data. Our studies on a continuous speech recognition task show that the proposed approach effectively integrates AFs into an ASR system, and that either phonemes or graphemes can be used as subword units. Analysis of the probabilistic relationship captured by the parameters shows that the approach is capable of adapting the knowledge-based phoneme-to-AF representations using speech data, and that it allows different AFs to evolve asynchronously.
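
    To make the two-stage idea concrete, the following Python sketch shows one plausible form of the second stage: estimating a probabilistic subword-to-AF mapping from first-stage AF posteriors aligned to subword units. All names and the exact estimator are assumptions for illustration; the paper's actual formulation may differ.

```python
import numpy as np

# Minimal sketch of the second stage described above: learning a probabilistic
# subword-to-AF relationship from transcribed data. We assume the first stage
# (e.g. an MLP) has already produced frame-level AF posteriors and that each
# frame is aligned to a subword unit. All names here are illustrative.

def estimate_subword_af_map(af_posteriors: np.ndarray,
                            frame_subword_ids: np.ndarray,
                            n_subwords: int) -> np.ndarray:
    """af_posteriors: (n_frames, n_af_classes) first-stage AF posteriors.
    frame_subword_ids: (n_frames,) aligned subword-unit index per frame.
    Returns a (n_subwords, n_af_classes) row-stochastic matrix playing the
    role of the learned probabilistic lexical model."""
    n_af = af_posteriors.shape[1]
    counts = np.zeros((n_subwords, n_af))
    for u in range(n_subwords):
        mask = frame_subword_ids == u
        if mask.any():
            # Accumulate soft AF counts for every frame aligned to unit u.
            counts[u] = af_posteriors[mask].sum(axis=0)
    counts += 1e-8  # smoothing so unseen units stay valid distributions
    return counts / counts.sum(axis=1, keepdims=True)
```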

    Speech production knowledge in automatic speech recognition

    Although much is known about how speech is produced, and research into speech production has yielded measured articulatory data, feature systems of various kinds, and numerous models, speech production knowledge is almost entirely ignored in current mainstream approaches to automatic speech recognition. Representations of speech production allow simple explanations for many phenomena observed in speech that cannot be easily analyzed from either the acoustic signal or the phonetic transcription alone. In this article, we provide a survey of a growing body of work in which such representations are used to improve automatic speech recognition.

    Modelling and Interpolation of Austrian German and Viennese

    An HMM-based speech synthesis framework is applied to both Standard Austrian German and a Viennese dialectal variety, and several training strategies for multi-dialect modeling, such as dialect clustering and dialect-adaptive training, are investigated. To bridge the gap between processing on the level of HMMs and on the linguistic level, we add phonological transformations to the HMM interpolation and apply them to dialect interpolation. The crucial step is to employ several formalized phonological rules between Austrian German and the Viennese dialect as constraints for the HMM interpolation. We verify the effectiveness of this strategy in a number of perceptual evaluations. Since the HMM space used is acoustic rather than articulatory, evaluation results vary somewhat across the phonological rules; in general, however, the evaluations show that listeners can perceive both continuous and categorical changes of dialect varieties when phonological transformations are employed as switching rules in the HMM interpolation.
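
    As a rough illustration of the two mechanisms this abstract combines, the Python sketch below interpolates Gaussian HMM state parameters continuously in acoustic space and applies a phonological rule as a categorical switching rule. The parameter names, the switching threshold, and the example rule are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

# Illustrative sketch: interpolating between two dialect-dependent HMM state
# distributions (here single diagonal Gaussians per state), with a formalized
# phonological rule acting as a switching constraint. The weight alpha moves
# from Standard Austrian German (0.0) toward the Viennese variety (1.0).

def interpolate_state(mean_a, var_a, mean_b, var_b, alpha: float):
    """Linear interpolation of Gaussian state parameters in acoustic space
    (the continuous change listeners can perceive)."""
    mean = (1 - alpha) * mean_a + alpha * mean_b
    var = (1 - alpha) * var_a + alpha * var_b
    return mean, var

def select_phone(standard_phone: str, dialect_phone: str, alpha: float,
                 threshold: float = 0.5) -> str:
    """A phonological rule applied as a switching rule: below the threshold
    the standard variant's phone is kept, above it the rule's dialectal
    output is used (the categorical change)."""
    return standard_phone if alpha < threshold else dialect_phone

# Example: 30% of the way from the standard variety toward the dialect.
mean, var = interpolate_state(np.array([1.0, 2.0]), np.array([0.5, 0.5]),
                              np.array([2.0, 1.0]), np.array([0.4, 0.6]),
                              alpha=0.3)
phone = select_phone("a", "o", alpha=0.3)  # hypothetical rule a -> o
```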

    Automatic recognition of schwa variants in spontaneous Hungarian speech

    This paper analyzes the nature of optional vowel reduction in Hungarian and the acoustic structure of schwa variants in spontaneous speech. The study focuses on the acoustic patterns of both the basic realizations of Hungarian vowels and their realizations as neutral vowels (schwas), as well as on the design, implementation, and evaluation of a set of algorithms for recognizing both types of realizations from the speech waveform. The authors address the question of whether schwas form a unified group of vowels or show some dependence on the originally intended articulation of the vowels they stand for. The acoustic study uses a database of over 4,000 utterances extracted from continuous speech and recorded from 19 speakers. The authors propose methods for recognizing neutral vowels depending on the various vowels they replace in spontaneous speech. Mel-Frequency Cepstral Coefficients are calculated and used to train Hidden Markov Models. The recognition system was trained on 2,500 utterances and tested on 1,500 utterances. The results show that a neutral vowel can be detected in 72% of all occurrences, and stressed and unstressed syllables can be distinguished in 92% of all cases. Neutralized vowels do not form a unified group of phoneme realizations: the pronunciation of a schwa heavily depends on the articulation configuration of the intended vowel.
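
    For readers unfamiliar with the pipeline, the following Python sketch shows a generic MFCC-plus-HMM classifier of the kind this abstract describes, using librosa and hmmlearn as stand-in tools; the authors' actual toolkit, model topology, and feature configuration are not specified here.

```python
import numpy as np
import librosa
from hmmlearn import hmm

# Rough sketch of an MFCC + HMM classification pipeline. One GaussianHMM is
# trained per class (e.g. "full vowel" vs. "schwa variant"); a test token is
# assigned to the class whose model gives the highest log-likelihood.

def mfcc_features(path: str) -> np.ndarray:
    y, sr = librosa.load(path, sr=None)
    # (n_frames, 13) frame-level Mel-Frequency Cepstral Coefficients
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

def train_class_model(paths: list[str], n_states: int = 3) -> hmm.GaussianHMM:
    feats = [mfcc_features(p) for p in paths]
    X = np.vstack(feats)            # stacked frames from all training tokens
    lengths = [len(f) for f in feats]  # per-token frame counts for hmmlearn
    model = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=20)
    model.fit(X, lengths)
    return model

def classify(path: str, models: dict[str, hmm.GaussianHMM]) -> str:
    X = mfcc_features(path)
    return max(models, key=lambda label: models[label].score(X))
```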