
    Acoustic Modelling for Under-Resourced Languages

    Automatic speech recognition systems have so far been developed for only a few of the 4,000-7,000 existing languages. In this thesis we examine methods to rapidly create acoustic models in new, possibly under-resourced languages in a time- and cost-effective manner. To this end we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages.

    Mispronunciation Detection and Diagnosis in Mandarin-Accented English Speech

    This work presents the development, implementation, and evaluation of a Mispronunciation Detection and Diagnosis (MDD) system applied to pronunciation evaluation of Mandarin-accented English speech. A comprehensive detection and diagnosis of errors in the Electromagnetic Articulography corpus of Mandarin-Accented English (EMA-MAE) was performed using the expert phonetic transcripts and an Automatic Speech Recognition (ASR) system. Articulatory features derived from the parallel kinematic data available in the EMA-MAE corpus were used to identify the most significant articulatory error patterns seen in L2 speakers during common mispronunciations. Using both acoustic and articulatory information, an ASR-based MDD system was built and evaluated across different feature combinations and Deep Neural Network (DNN) architectures. The system captured mispronunciation errors with a detection accuracy of 82.4%, a diagnostic accuracy of 75.8%, and a false rejection rate of 17.2%. The results demonstrate the advantage of using articulatory features both in revealing the significant contributors to mispronunciation and in improving the performance of MDD systems.
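
    The figures above follow the evaluation scheme commonly used for MDD systems. As a point of reference, here is a minimal Python sketch of how such metrics are typically computed from confusion counts; the count names, definitions, and example numbers below are our own illustration, not taken from the paper.

```python
# Hypothetical illustration of common MDD evaluation metrics. The counts
# follow the usual confusion-matrix convention for mispronunciation detection:
#   ta: correct phones accepted as correct        (true acceptance)
#   fr: correct phones flagged as mispronounced   (false rejection)
#   fa: mispronounced phones accepted as correct  (false acceptance)
#   tr: mispronounced phones flagged              (true rejection)
#   cd: flagged mispronunciations whose error type was also diagnosed correctly

def mdd_metrics(ta: int, fr: int, fa: int, tr: int, cd: int) -> dict:
    detection_accuracy = (ta + tr) / (ta + fr + fa + tr)
    false_rejection_rate = fr / (ta + fr)
    diagnostic_accuracy = cd / tr if tr else 0.0
    return {
        "detection_accuracy": detection_accuracy,
        "false_rejection_rate": false_rejection_rate,
        "diagnostic_accuracy": diagnostic_accuracy,
    }

# Example with made-up counts (not from the paper):
print(mdd_metrics(ta=800, fr=160, fa=90, tr=450, cd=340))
```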

    Articulatory features for conversational speech recognition


    Compensating hyperarticulation for automatic speech recognition


    Articulatory features for robust visual speech recognition


    Articulatory feature based continuous speech recognition using probabilistic lexical modeling

    Phonological studies suggest that the typical subword units, such as phones or phonemes, used in automatic speech recognition systems can be decomposed into a set of features based on the articulators used to produce the sound. Most current approaches to integrating articulatory feature (AF) representations into an automatic speech recognition (ASR) system are based on a deterministic, knowledge-based phoneme-to-AF relationship. In this paper, we propose a novel two-stage approach in the framework of probabilistic lexical modeling to integrate AF representations into an ASR system. In the first stage, the relationship between acoustic feature observations and various AFs is modeled. In the second stage, a probabilistic relationship between subword units and AFs is learned using transcribed speech data. Our studies on a continuous speech recognition task show that the proposed approach effectively integrates AFs into an ASR system, and that either phonemes or graphemes can be used as subword units. Analysis of the probabilistic relationship captured by the parameters shows that the approach is capable of adapting the knowledge-based phoneme-to-AF representations using speech data, and that it allows different AFs to evolve asynchronously.
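
    To make the two-stage idea concrete, the following Python sketch shows one plausible form of the second stage: estimating a probabilistic subword-to-AF mapping from first-stage AF posteriors aligned to subword units. All names and the exact estimator are assumptions for illustration; the paper's actual formulation may differ.

```python
import numpy as np

# Minimal sketch of the second stage described above: learning a probabilistic
# subword-to-AF relationship from transcribed data. We assume the first stage
# (e.g. an MLP) has already produced frame-level AF posteriors and that each
# frame is aligned to a subword unit. All names here are illustrative.

def estimate_subword_af_map(af_posteriors: np.ndarray,
                            frame_subword_ids: np.ndarray,
                            n_subwords: int) -> np.ndarray:
    """af_posteriors: (n_frames, n_af_classes) first-stage AF posteriors.
    frame_subword_ids: (n_frames,) aligned subword-unit index per frame.
    Returns a (n_subwords, n_af_classes) row-stochastic matrix playing the
    role of the learned probabilistic lexical model."""
    n_af = af_posteriors.shape[1]
    counts = np.zeros((n_subwords, n_af))
    for u in range(n_subwords):
        mask = frame_subword_ids == u
        if mask.any():
            # Accumulate soft AF counts for every frame aligned to unit u.
            counts[u] = af_posteriors[mask].sum(axis=0)
    counts += 1e-8  # smoothing so unseen units stay valid distributions
    return counts / counts.sum(axis=1, keepdims=True)
```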

    Speech production knowledge in automatic speech recognition

    Although much is known about how speech is produced, and research into speech production has yielded measured articulatory data, feature systems of various kinds, and numerous models, speech production knowledge is almost entirely ignored in current mainstream approaches to automatic speech recognition. Representations of speech production allow simple explanations for many phenomena observed in speech that cannot be easily analyzed from either the acoustic signal or the phonetic transcription alone. In this article, we provide a survey of a growing body of work in which such representations are used to improve automatic speech recognition.

    Modelling and Interpolation of Austrian German and Viennese

    An HMM-based speech synthesis framework is applied to both Standard Austrian German and a Viennese dialectal variety, and several training strategies for multi-dialect modeling, such as dialect clustering and dialect-adaptive training, are investigated. To bridge the gap between processing on the level of HMMs and on the linguistic level, we add phonological transformations to the HMM interpolation and apply them to dialect interpolation. The crucial step is to employ several formalized phonological rules between Austrian German and the Viennese dialect as constraints for the HMM interpolation. We verify the effectiveness of this strategy in a number of perceptual evaluations. Since the HMM space used is acoustic rather than articulatory, evaluation results vary somewhat across the phonological rules; in general, however, the evaluations show that listeners can perceive both continuous and categorical changes of dialect varieties when phonological transformations are employed as switching rules in the HMM interpolation.
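
    As a rough illustration of the two mechanisms this abstract combines, the Python sketch below interpolates Gaussian HMM state parameters continuously in acoustic space and applies a phonological rule as a categorical switching rule. The parameter names, the switching threshold, and the example rule are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

# Illustrative sketch: interpolating between two dialect-dependent HMM state
# distributions (here single diagonal Gaussians per state), with a formalized
# phonological rule acting as a switching constraint. The weight alpha moves
# from Standard Austrian German (0.0) toward the Viennese variety (1.0).

def interpolate_state(mean_a, var_a, mean_b, var_b, alpha: float):
    """Linear interpolation of Gaussian state parameters in acoustic space
    (the continuous change listeners can perceive)."""
    mean = (1 - alpha) * mean_a + alpha * mean_b
    var = (1 - alpha) * var_a + alpha * var_b
    return mean, var

def select_phone(standard_phone: str, dialect_phone: str, alpha: float,
                 threshold: float = 0.5) -> str:
    """A phonological rule applied as a switching rule: below the threshold
    the standard variant's phone is kept, above it the rule's dialectal
    output is used (the categorical change)."""
    return standard_phone if alpha < threshold else dialect_phone

# Example: 30% of the way from the standard variety toward the dialect.
mean, var = interpolate_state(np.array([1.0, 2.0]), np.array([0.5, 0.5]),
                              np.array([2.0, 1.0]), np.array([0.4, 0.6]),
                              alpha=0.3)
phone = select_phone("a", "o", alpha=0.3)  # hypothetical rule a -> o
```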

    Automatic recognition of schwa variants in spontaneous Hungarian speech

    This paper analyzes the nature of optional vowel reduction in Hungarian and the acoustic structure of schwa variants in spontaneous speech. The study focuses on the acoustic patterns of both the basic realizations of Hungarian vowels and their realizations as neutral vowels (schwas), as well as on the design, implementation, and evaluation of a set of algorithms for recognizing both types of realizations from the speech waveform. The authors address the question of whether schwas form a unified group of vowels or show some dependence on the originally intended articulation of the vowels they stand for. The acoustic study uses a database of over 4,000 utterances extracted from continuous speech and recorded from 19 speakers. The authors propose methods for recognizing neutral vowels depending on the various vowels they replace in spontaneous speech. Mel-Frequency Cepstral Coefficients are calculated and used to train Hidden Markov Models. The recognition system was trained on 2,500 utterances and tested on 1,500 utterances. The results show that a neutral vowel can be detected in 72% of all occurrences, and stressed and unstressed syllables can be distinguished in 92% of all cases. Neutralized vowels do not form a unified group of phoneme realizations: the pronunciation of a schwa heavily depends on the articulation configuration of the intended vowel.
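
    For readers unfamiliar with the pipeline, the following Python sketch shows a generic MFCC-plus-HMM classifier of the kind this abstract describes, using librosa and hmmlearn as stand-in tools; the authors' actual toolkit, model topology, and feature configuration are not specified here.

```python
import numpy as np
import librosa
from hmmlearn import hmm

# Rough sketch of an MFCC + HMM classification pipeline. One GaussianHMM is
# trained per class (e.g. "full vowel" vs. "schwa variant"); a test token is
# assigned to the class whose model gives the highest log-likelihood.

def mfcc_features(path: str) -> np.ndarray:
    y, sr = librosa.load(path, sr=None)
    # (n_frames, 13) frame-level Mel-Frequency Cepstral Coefficients
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

def train_class_model(paths: list[str], n_states: int = 3) -> hmm.GaussianHMM:
    feats = [mfcc_features(p) for p in paths]
    X = np.vstack(feats)            # stacked frames from all training tokens
    lengths = [len(f) for f in feats]  # per-token frame counts for hmmlearn
    model = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=20)
    model.fit(X, lengths)
    return model

def classify(path: str, models: dict[str, hmm.GaussianHMM]) -> str:
    X = mfcc_features(path)
    return max(models, key=lambda label: models[label].score(X))
```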