2,285 research outputs found

    English Broadcast News Speech Recognition by Humans and Machines

    With recent advances in deep learning, considerable attention has been given to achieving automatic speech recognition performance close to human performance on tasks like conversational telephone speech (CTS) recognition. In this paper we evaluate the usefulness of these proposed techniques on broadcast news (BN), a similarly challenging task. We also perform a set of recognition measurements to understand how close the achieved automatic speech recognition results are to human performance on this task. On two publicly available BN test sets, DEV04F and RT04, our speech recognition system using LSTM and residual network based acoustic models with a combination of n-gram and neural network language models performs at 6.5% and 5.9% word error rate. By achieving new performance milestones on these test sets, our experiments show that techniques developed on other related tasks, like CTS, can be transferred to achieve similar performance. In contrast, the best measured human word error rate on these test sets is much lower, at 3.6% and 2.8% respectively, indicating that there is still room for new techniques and improvements in this space to reach human performance levels. Comment: © 2019 IEEE.
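The word error rate figures quoted in this abstract follow the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below is a minimal illustration of that definition only. The function name `word_error_rate` and the whitespace tokenization are assumptions for the example, not the scoring setup used in the paper, which would normally also apply text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokens are split on whitespace with no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.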

    Porting the galaxy system to Mandarin Chinese

    Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997. Includes bibliographical references (leaves 83-86). By Chao Wang. M.S.

    Spoken Language Learning System: an online conversational spoken language learning system

    Thesis: M.Eng., Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003. Includes bibliographical references (leaves 75-77). The Spoken Language Learning System (SLLS) is intended to be an engaging, educational, and extensible spoken language learning system showcasing the multilingual capabilities of the Spoken Language Systems Group's (SLS) systems. The motivation behind SLLS is to satisfy both the demand for spoken language learning in an increasingly multi-cultural society and the desire for continued development of the multilingual systems at SLS. SLLS is an integration of an Internet presence with augmentations to SLS's Mandarin systems built within the Galaxy architecture, focusing on the situation of an English speaker learning Mandarin. We offer language learners the ability to listen to spoken phrases and simulated conversations online, engage in interactive dynamic conversations over the telephone, and review audio and visual feedback of their conversations. We also provide a wide array of administration and maintenance features online for teachers and administrators to facilitate continued system development and user interaction, such as lesson plan creation, vocabulary management, and a requests forum. User studies have shown that there is an appreciation for the potential of the system and that the core operation is intuitive and entertaining. The studies have also helped to illuminate the vast array of future work necessary to further polish the language learning experience and reduce the administrative burden. The focus of this thesis is the creation of the first iteration of SLLS; we believe we have taken the first step down the long but hopeful path towards helping people speak a foreign language. By Tien-Lok Jonathan Lau. M.Eng.

    PHONOTACTIC AND ACOUSTIC LANGUAGE RECOGNITION

    This thesis deals with phonotactic and acoustic techniques for automatic language recognition (LRE). The first part of the thesis deals with phonotactic language recognition based on co-occurrences of phone sequences in speech. A thorough study of phone recognition as a tokenization technique for LRE is done, with focus on the amount of training data for the phone recognizer and on the combination of phone recognizers trained on several languages (Parallel Phone Recognition followed by Language Model, PPRLM). The thesis also deals with a novel technique of anti-models in PPRLM and investigates using phone lattices instead of strings. The work on the phonotactic approach is concluded by a comparison of classical n-gram modeling techniques and binary decision trees. Acoustic LRE is addressed as well, with the main focus on discriminative techniques for training target language acoustic models and on initial (but successful) experiments with removing channel dependencies. We also investigated the fusion of phonotactic and acoustic approaches. All experiments were performed on standard data from the NIST 2003, 2005 and 2007 evaluations so that the results are directly comparable to those of other laboratories in the LRE community. With the above-mentioned techniques, the fused systems defined the state of the art in the LRE field and reached excellent results in two NIST evaluations.
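The phonotactic idea in this abstract, scoring a tokenized phone sequence against per-language n-gram models and picking the best-scoring language, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the thesis's system: it uses a single stream of phone symbols (PPRLM as described combines several phone recognizers), bigrams with add-one smoothing, and hypothetical helper names (`train_bigram_model`, `log_likelihood`, `identify_language`) and made-up training sequences.

```python
import math
from collections import Counter

START = "<s>"  # assumed start-of-sequence marker


def train_bigram_model(sequences):
    """Count phone bigrams and their left contexts for one language."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = [START] + list(seq)
        vocab.update(padded)
        for prev, curr in zip(padded, padded[1:]):
            bigrams[(prev, curr)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab


def log_likelihood(model, seq, vocab_size):
    """Log-probability of a phone sequence under an add-one smoothed bigram model."""
    bigrams, contexts, _ = model
    padded = [START] + list(seq)
    total = 0.0
    for prev, curr in zip(padded, padded[1:]):
        prob = (bigrams[(prev, curr)] + 1) / (contexts[prev] + vocab_size)
        total += math.log(prob)
    return total


def identify_language(models, seq):
    """Return the language whose bigram model gives the sequence the highest score."""
    shared_vocab = set().union(*(model[2] for model in models.values()))
    return max(
        models,
        key=lambda lang: log_likelihood(models[lang], seq, len(shared_vocab)),
    )


if __name__ == "__main__":
    # Made-up phone strings standing in for phone recognizer output.
    models = {
        "lang_A": train_bigram_model([list("abab"), list("ababa")]),
        "lang_B": train_bigram_model([list("bbaa"), list("bbba")]),
    }
    print(identify_language(models, list("abab")))
    print(identify_language(models, list("bba")))
```

In a PPRLM-style setup, one such set of language models would be trained per phone recognizer and the resulting scores fused; that step is omitted here.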

    A comparison of stuttering behavior and fluency improvement in English-Mandarin bilinguals who stutter

    Despite the number of bilinguals and speakers of English and Mandarin worldwide, up till now there have been no investigations of stuttering in any of the Chinese languages, or in bilinguals who speak both English and Mandarin. Hence, it is not known whether stuttering behavior in Mandarin mimics that in English, or whether speech restructuring techniques such as Prolonged Speech produce the same fluency outcomes in Mandarin speakers as they do for English speakers. Research into stuttering in bilinguals is available but far from adequate. Although the limited extant studies show that bilinguals who stutter (BWS) may stutter either the same or differently across languages, and that treatment effects in one language can automatically carry over to the other language, it is unclear whether these findings are influenced by factors such as language dominance or language structure. These issues need to be clarified because speech language pathologists (SLPs) who work with bilinguals often do not speak the dominant language of their clients. Thus, the language of assessment and treatment becomes an important clinical consideration. The aim of this thesis was to investigate (a) whether the severity and type of stuttering was different in English and Mandarin in English-Mandarin bilingual adults, (b) whether this difference was influenced by language dominance, (c) whether stuttering reductions in English generalized to Mandarin following treatment in English only, and (d) whether treatment generalization was influenced by language dominance. To achieve these aims, a way of establishing the dominant language in bilinguals was a necessary first step. The first part of this thesis reviews the disorder of stuttering and the treatment for adults who stutter, the differences between English and Chinese languages, and stuttering in bilinguals. 
Part Two of this thesis describes the development of a tool for determining language dominance in a multilingual Asian population such as that found in Singapore. This study reviews the complex issues involved in assessing language dominance. It presents the rationale for and description of a self-report classification tool for identifying the dominant language in English-Mandarin bilingual Singaporeans. The decision regarding language dominance was based on a predetermined set of criteria using self-report questionnaire data on language proficiency, frequency of language use, and domain of language use. The tool was administered to 168 English-Mandarin bilingual participants, and the self-report data were validated against the results of a discriminant analysis. The discriminant analysis revealed a reliable three-way classification into English-dominant, Mandarin-dominant, and balanced bilinguals. Scores on a single word receptive vocabulary test supported these dominance classifications. Part Three of this thesis contains two studies investigating stuttering in BWS. The second study of this thesis examined the influence of language dominance on the manifestation of stuttering in English-Mandarin BWS. Results are presented for 30 English-Mandarin BWS who were divided according to their bilingual classification group: 15 English-dominant, four Mandarin-dominant, and 11 balanced bilinguals. All participants underwent comprehensive speech evaluations in both languages. The English-dominant and Mandarin-dominant BWS were found to exhibit greater stuttering in their less dominant language, whereas the balanced bilinguals evidenced similar levels of stuttering in both languages. An analysis of the types of stutter using the Lidcombe Behavioral Data Language showed no significant differences between English and Mandarin for all bilingual groups. 
In the third study of this thesis, the influence of language dominance on the generalization of stuttering reductions from English to Mandarin was investigated. Results are provided for seven English-dominant, three Mandarin-dominant, and four balanced bilinguals who underwent a Smooth Speech intensive program in English only. A comparison of stuttering between their pretreatment scores and three posttreatment interval scores indicated that the degree of fluency transfer from the treated to the untreated language was disproportionate. English-dominant and Mandarin-dominant participants showed greater fluency improvement in their dominant language even if this language was not directly treated. In the final chapter, Part Four, a hypothesis is provided to explain the findings of this thesis. A discussion of the limitations of the thesis and suggestions for future research are also presented. The chapter concludes with a summary of the main contributions that this thesis makes to the field of stuttering in bilinguals.

    Speech-enabled games for vocabulary acquisition in a foreign language

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 79-85). In this thesis, I present two novel ways in which speech recognition technology might aid students with vocabulary acquisition in a foreign language. While research in the applied linguistics field of second language acquisition (SLA) increasingly suggests that students of a foreign language should learn through meaningful interactions carried out in that language, teachers are rarely equipped with tools that allow them to provide interactive environments outside of the classroom. Fortunately, speech and language technologies are becoming robust enough to aid in this regard. This thesis presents two distinct speech-enabled systems to assist students with the difficult task of vocabulary acquisition in Mandarin Chinese. At the core of each system is a Mandarin speech recognizer that, when connected to a web-based graphical user interface, provides students with an interactive environment in which to acquire new Mandarin vocabulary. By Ian C. McGraw. S.M.