
    PHONOTACTIC AND ACOUSTIC LANGUAGE RECOGNITION

    This thesis deals with phonotactic and acoustic techniques for automatic language recognition (LRE). The first part of the thesis deals with phonotactic language recognition based on the co-occurrence of phone sequences in speech. A thorough study of phone recognition as a tokenization technique for LRE is presented, with focus on the amount of training data for the phone recognizer and on the combination of phone recognizers trained on several languages (Parallel Phone Recognition followed by Language Models, PPRLM). The thesis also introduces a novel technique of anti-models in PPRLM and investigates the use of phone lattices instead of 1-best strings. The work on the phonotactic approach is concluded by a comparison of classical n-gram language models and binary decision trees. The acoustic approach is addressed as well, with the main focus on discriminative training of the target-language acoustic models and on initial (but successful) experiments combining discriminative training with features from which channel effects had been removed. The thesis further examines several techniques for fusing the acoustic and phonotactic approaches. All experiments were performed on standard data from the NIST 2003, 2005 and 2007 evaluations, so the results are directly comparable to those of other laboratories in the LRE community. With the above techniques, the fused systems defined the state of the art in the field and achieved excellent results in two NIST evaluations.
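
    As a rough illustration of the PPRLM idea described above (parallel phone tokenizers, each followed by per-language phonotactic models whose scores are fused), the sketch below uses a toy add-one-smoothed bigram model over 1-best phone strings and naive sum fusion. The branch name, phone inventories and training strings are invented placeholders; the thesis itself uses richer n-gram and lattice modelling and trained fusion back-ends.

```python
# Minimal PPRLM-style phonotactic scoring sketch (toy data, not the thesis's system).
import math
from collections import defaultdict

def train_bigram_lm(phone_strings, alpha=1.0):
    """Estimate an add-alpha smoothed bigram model over phone tokens."""
    counts = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for phones in phone_strings:
        seq = ["<s>"] + phones + ["</s>"]
        vocab.update(seq)
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    n_types = len(vocab)

    def logprob(prev, cur):
        c = counts.get(prev, {})
        return math.log((c.get(cur, 0) + alpha) / (sum(c.values()) + alpha * n_types))

    return logprob

def score_utterance(phones, logprob):
    """Average per-token log-likelihood of a phone string under one phonotactic model."""
    seq = ["<s>"] + phones + ["</s>"]
    return sum(logprob(p, c) for p, c in zip(seq, seq[1:])) / (len(seq) - 1)

# One branch per tokenizer language; training phone strings per target language are toys.
branches = {
    "hu_tokenizer": {
        "english": [["dh", "ax", "k", "ae", "t"], ["dh", "ax", "d", "ao", "g", "z"]],
        "spanish": [["e", "l", "g", "a", "t", "o"], ["l", "o", "s", "p", "e", "rr", "o", "s"]],
    },
}
models = {branch: {lang: train_bigram_lm(data) for lang, data in langs.items()}
          for branch, langs in branches.items()}

# Hypothetical 1-best output of each branch's phone recognizer for one test utterance.
test_phones = {"hu_tokenizer": ["dh", "ax", "d", "ao", "g"]}

fused = defaultdict(float)
for branch, phones in test_phones.items():
    for lang, lm in models[branch].items():
        fused[lang] += score_utterance(phones, lm)   # naive sum fusion across branches

print(max(fused, key=fused.get), dict(fused))
```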

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work on improving the robustness of speech output.
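
    One family of modifications covered by reviews of this kind is spectral reshaping of the speech signal so that more energy sits in perceptually important higher frequencies while the overall level stays fixed (a Lombard-like flattening of spectral tilt). The sketch below is only a minimal illustration of that idea, not a method from the article; the first-order pre-emphasis filter, its coefficient and the equal-RMS constraint are assumptions.

```python
# Illustrative equal-energy spectral-tilt modification (assumed, not from the article).
import numpy as np

def equal_energy_preemphasis(x, coeff=0.95):
    """First-order pre-emphasis (boosts high frequencies), rescaled to the input RMS."""
    y = np.append(x[0], x[1:] - coeff * x[:-1])    # y[n] = x[n] - coeff * x[n-1]
    rms_in = np.sqrt(np.mean(x ** 2))
    rms_out = np.sqrt(np.mean(y ** 2))
    return y * (rms_in / (rms_out + 1e-12))

# Toy usage on a synthetic signal; in practice x would be speech sampled at e.g. 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = 0.6 * np.sin(2 * np.pi * 150 * t) + 0.1 * np.sin(2 * np.pi * 2500 * t)
y = equal_energy_preemphasis(x)
print(round(float(np.sqrt(np.mean(x ** 2))), 4), round(float(np.sqrt(np.mean(y ** 2))), 4))
```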

    Current trends in multilingual speech processing

    In this paper, we describe recent work at Idiap Research Institute in the domain of multilingual speech processing and provide some insights into emerging challenges for the research community. Multilingual speech processing has been a topic of ongoing interest to the research community for many years and the field is now receiving renewed interest owing to two strong driving forces. Firstly, technical advances in speech recognition and synthesis are posing new challenges and opportunities to researchers. For example, discriminative features are seeing wide application by the speech recognition community, but additional issues arise when using such features in a multilingual setting. Another example is the apparent convergence of speech recognition and speech synthesis technologies in the form of statistical parametric methodologies. This convergence enables the investigation of new approaches to unified modelling for automatic speech recognition and text-to-speech synthesis (TTS) as well as cross-lingual speaker adaptation for TTS. The second driving force is the impetus being provided by both government and industry for technologies to help break down domestic and international language barriers, these also being barriers to the expansion of policy and commerce. Speech-to-speech and speech-to-text translation are thus emerging as key technologies at the heart of which lies multilingual speech processing.

    Linguistic constraints for large vocabulary speech recognition.

    by Roger H.Y. Leung. Thesis (M.Phil.)--Chinese University of Hong Kong, 1999. Includes bibliographical references (leaves 79-84). Abstracts in English and Chinese. Contents:
    Chapter 1, Introduction: languages in the world; problems of Chinese speech recognition (unlimited word size, many homophones, differences between spoken and written Chinese, word segmentation); different types of knowledge.
    Chapter 2, Foundations: Chinese phonology and language properties (basic syllable structure); acoustic models (acoustic unit, hidden Markov models); search algorithm; statistical language models (context-independent, word-pair, n-gram, backoff n-gram); smoothing for language models.
    Chapter 3, Lexical Access: phonological and lexical constraints; broad-class representation, statistical measures, frequency normalization and analysis; an isolated-word speech recognizer using broad classes.
    Chapter 4, Character and Word Language Models: perplexity; the Call Home Mandarin corpus (acoustic data, transcription texts); building language models; character-level and word-level language models and their comparison; interpolated language models.
    Chapter 5, N-gram Smoothing: mathematical representation; smoothing techniques (add-one smoothing, Witten-Bell discounting, Good-Turing discounting, absolute and linear discounting); comparison of discounting methods; a continuous-word speech recognizer (experiment setup and results).
    Chapter 6, Summary and Conclusions: summary, further work, conclusion.
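
    Chapters 4 and 5 of this thesis revolve around n-gram smoothing and perplexity-based comparison of language models. As a hedged illustration of two of the named smoothing schemes, the sketch below estimates bigram probabilities with add-one smoothing and with Witten-Bell discounting on a toy corpus; the corpus, the uniform fallback for unseen histories and the bigram order are assumptions, not the thesis's configuration.

```python
# Toy comparison of add-one and Witten-Bell smoothed bigram probabilities.
from collections import defaultdict

def collect_bigrams(sentences):
    counts, vocab = defaultdict(dict), set()
    for sent in sentences:
        seq = ["<s>"] + sent + ["</s>"]
        vocab.update(seq)
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] = counts[prev].get(cur, 0) + 1
    return counts, vocab

def p_add_one(prev, cur, counts, vocab):
    """Laplace (add-one) smoothed bigram probability."""
    c = counts.get(prev, {})
    return (c.get(cur, 0) + 1) / (sum(c.values()) + len(vocab))

def p_witten_bell(prev, cur, counts, vocab):
    """Witten-Bell discounted bigram probability (uniform share for unseen continuations)."""
    c = counts.get(prev, {})
    n = sum(c.values())          # tokens observed after `prev`
    t = len(c)                   # distinct continuation types observed after `prev`
    if n == 0:                   # history never seen: fall back to uniform (assumption)
        return 1.0 / len(vocab)
    if c.get(cur, 0) > 0:        # seen bigram: discounted relative frequency
        return c[cur] / (n + t)
    z = len(vocab) - t           # unseen continuations split the reserved mass T/(N+T)
    return t / (z * (n + t))

train = [["speech", "recognition", "needs", "language", "models"],
         ["language", "models", "need", "smoothing"]]
counts, vocab = collect_bigrams(train)
for p in (p_add_one, p_witten_bell):
    print(p.__name__,
          round(p("language", "models", counts, vocab), 3),     # a seen bigram
          round(p("language", "smoothing", counts, vocab), 3))  # an unseen bigram
```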

    Investigating spoken emotion : the interplay of language and facial expression

    This thesis aims to investigate how spoken expressions of emotion are influenced by the characteristics of spoken language and by facial emotion expression. The first three chapters examined how the production and perception of emotions differ between Cantonese (a tone language) and English (a non-tone language). The rationale for this contrast was that the acoustic property of fundamental frequency (F0) may be used differently in the production and perception of spoken expressions in tone languages, as F0 may be reserved as a linguistic resource for the production of lexical tones. To test this idea, I first developed the Cantonese Audio-visual Emotional Speech (CAVES) database, which was then used as stimuli in all the studies presented in this thesis (Chapter 1). An emotion perception study was then conducted to examine how three groups of participants (Australian English, Malaysian Malay and Hong Kong Cantonese speakers) identified spoken expressions of emotion produced in either English or Cantonese (Chapter 2). As one of the aims of this study was to disambiguate the effects of language from those of culture, these participants were selected on the basis that they either shared similarities in language type (non-tone languages, Malay and English) or culture (collectivist cultures, Cantonese and Malay). The results showed that greater similarity in emotion perception was observed between those who spoke a similar type of language than between those who shared a similar culture. This suggests that some intergroup differences in emotion perception may be attributable to cross-language differences. Following up on these findings, an acoustic analysis study (Chapter 3) showed that, compared to English spoken expressions of emotion, Cantonese expressions made less use of F0-related cues (differences in F0 median and a flatter F0 contour), and the way F0 cues were used also differed. Taken together, these results show that language characteristics (in F0 usage) interact with the production and perception of spoken expressions of emotion. The expression of disgust was used to investigate how facial expressions of emotion affect speech articulation. The rationale for selecting disgust was that the facial expression of disgust involves changes to the mouth region, such as closure and retraction of the lips, and these changes are likely to have an impact on speech articulation. To test this idea, an automatic lip segmentation and measurement algorithm was developed to quantify the configuration of the lips from images (Chapter 5). Comparing neutral to disgust expressive speech, the results showed that disgust expressive speech is produced with a significantly smaller vertical mouth opening, a greater horizontal mouth opening, and lower first and second formant frequencies (F1 and F2). Overall, this thesis provides insight into how aspects of expressive speech may be shaped by language-specific (language type) and universal (facial emotion expression) factors.
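
    The acoustic analysis described above rests on simple F0 statistics (median and contour flatness). The sketch below shows one plausible way to compute such measures with librosa's pYIN pitch tracker; the file paths and the use of the semitone standard deviation as a flatness proxy are assumptions, not the thesis's actual pipeline (which also measured formants and lip configuration).

```python
# Illustrative F0 summary measures (median and a contour-flatness proxy) via librosa pYIN.
import numpy as np
import librosa

def f0_summary(wav_path):
    y, sr = librosa.load(wav_path, sr=None)
    f0, voiced_flag, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                      fmax=librosa.note_to_hz("C7"), sr=sr)
    voiced = f0[voiced_flag & ~np.isnan(f0)]
    if voiced.size == 0:
        return None
    semitones = 12 * np.log2(voiced / np.median(voiced))  # contour relative to its median
    return {"f0_median_hz": float(np.median(voiced)),
            "f0_spread_semitones": float(np.std(semitones))}  # smaller => flatter contour

# Hypothetical usage comparing a neutral and a disgust rendition of the same sentence:
# print(f0_summary("neutral_utt.wav"), f0_summary("disgust_utt.wav"))
```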