345 research outputs found

    Automatic speech recognition of Cantonese-English code-mixing utterances.

    Chan Yeuk Chi Joyce. Thesis (M.Phil.), Chinese University of Hong Kong, 2005. Includes bibliographical references. Abstracts in English and Chinese.
    Contents:
    Chapter 1. Introduction
      1.1 Background
      1.2 Previous Work on Code-switching Speech Recognition
        1.2.1 Keyword Spotting Approach
        1.2.2 Translation Approach
        1.2.3 Language Boundary Detection
      1.3 Motivations of Our Work
      1.4 Methodology
      1.5 Thesis Outline
      1.6 References
    Chapter 2. Fundamentals of Large Vocabulary Continuous Speech Recognition for Cantonese and English
      2.1 Basic Theory of Speech Recognition
        2.1.1 Feature Extraction
        2.1.2 Maximum a Posteriori (MAP) Probability
        2.1.3 Hidden Markov Model (HMM)
        2.1.4 Statistical Language Modeling
        2.1.5 Search Algorithm
      2.2 Word Posterior Probability (WPP)
      2.3 Generalized Word Posterior Probability (GWPP)
      2.4 Characteristics of Cantonese
        2.4.1 Cantonese Phonology
        2.4.2 Variation and Change in Pronunciation
        2.4.3 Syllables and Characters in Cantonese
        2.4.4 Spoken Cantonese vs. Written Chinese
      2.5 Characteristics of English
        2.5.1 English Phonology
        2.5.2 English with Cantonese Accents
      2.6 References
    Chapter 3. Code-mixing and Code-switching Speech Recognition
      3.1 Introduction
      3.2 Definition
        3.2.1 Monolingual Speech Recognition
        3.2.2 Multilingual Speech Recognition
        3.2.3 Code-mixing and Code-switching
      3.3 Conversation in Hong Kong
        3.3.1 Language Choice of Hong Kong People
        3.3.2 Reasons for Code-mixing in Hong Kong
        3.3.3 How Does Code-mixing Occur?
      3.4 Difficulties for Code-mixing Specific to Cantonese-English
        3.4.1 Phonetic Differences
        3.4.2 Phonology Difference
        3.4.3 Accent and Borrowing
        3.4.4 Lexicon and Grammar
        3.4.5 Lack of Appropriate Speech Corpus
      3.5 References
    Chapter 4. Data Collection
      4.1 Data Collection
        4.1.1 Corpus Design
        4.1.2 Recording Setup
        4.1.3 Post-processing of Speech Data
      4.2 A Baseline Database
        4.2.1 Monolingual Spoken Cantonese Speech Data (CUMIX)
      4.3 References
    Chapter 5. System Design and Experimental Setup
      5.1 Overview of the Code-mixing Speech Recognizer
        5.1.1 Bilingual Syllable / Word-based Speech Recognizer
        5.1.2 Language Boundary Detection
        5.1.3 Generalized Word Posterior Probability (GWPP)
      5.2 Acoustic Modeling
        5.2.1 Speech Corpus for Training of Acoustic Models
        5.2.2 Feature Extraction
        5.2.3 Variability in the Speech Signal
        5.2.4 Language Dependency of the Acoustic Models
        5.2.5 Pronunciation Dictionary
        5.2.6 The Training Process of Acoustic Models
        5.2.7 Decoding and Evaluation
      5.3 Language Modeling
        5.3.1 N-gram Language Model
        5.3.2 Difficulties in Data Collection
        5.3.3 Text Data for Training Language Model
        5.3.4 Training Tools
        5.3.5 Training Procedure
        5.3.6 Evaluation of the Language Models
      5.4 Language Boundary Detection
        5.4.1 Phone-based LBD
        5.4.2 Syllable-based LBD
        5.4.3 LBD Based on Syllable Lattice
      5.5 Integration of the Acoustic Model Scores, Language Model Scores and Language Boundary Information
        5.5.1 Integration of Acoustic Model Scores and Language Boundary Information
        5.5.2 Integration of Modified Acoustic Model Scores and Language Model Scores
        5.5.3 Evaluation Criterion
      5.6 References
    Chapter 6. Results and Analysis
      6.1 Speech Data for Development and Evaluation
        6.1.1 Development Data
        6.1.2 Testing Data
      6.2 Performance of Different Acoustic Units
        6.2.1 Analysis of Results
      6.3 Language Boundary Detection
        6.3.1 Phone-based Language Boundary Detection
        6.3.2 Syllable-based Language Boundary Detection (SYL-LBD)
        6.3.3 Language Boundary Detection Based on Syllable Lattice (BILINGUAL-LBD)
        6.3.4 Observations
      6.4 Evaluation of the Language Models
        6.4.1 Character Perplexity
        6.4.2 Phonetic-to-text Conversion Rate
        6.4.3 Observations
      6.5 Character Error Rate
        6.5.1 Without Language Boundary Information
        6.5.2 With Language Boundary Detector SYL-LBD
        6.5.3 With Language Boundary Detector BILINGUAL-LBD
        6.5.4 Observations
      6.6 References
    Chapter 7. Conclusions and Suggestions for Future Work
      7.1 Conclusion
        7.1.1 Difficulties and Solutions
      7.2 Suggestions for Future Work
        7.2.1 Acoustic Modeling
        7.2.2 Pronunciation Modeling
        7.2.3 Language Modeling
        7.2.4 Speech Data
        7.2.5 Language Boundary Detection
      7.3 References
    Appendix A. Code-mixing Utterances in Training Set of CUMIX
    Appendix B. Code-mixing Utterances in Testing Set of CUMIX
    Appendix C. Usage of Speech Data in CUMIX

    Leveraging writing systems changes for deep learning based Chinese affective analysis

    Affective analysis of social media text is in great demand. Online text written in Chinese communities often contains mixed scripts: the major text is written in Chinese, an ideograph-based writing system, and the minor text uses Latin letters, an alphabet-based writing system. This phenomenon is referred to as writing systems changes (WSCs). Past studies have shown that WSCs often reflect unfiltered, immediate affect. However, WSCs pose additional challenges for Natural Language Processing tasks because they can break the syntax of the major text. In this work, we use WSCs as an effective feature in a hybrid deep learning model with an attention network. The WSC scripts are first identified by their encoding range. Then, the document representation of the text is learned through a Long Short-Term Memory (LSTM) model, and a representation of the minor text is learned by a separate Convolutional Neural Network (CNN) model. To further highlight the WSC components, an attention mechanism re-weights the feature vector before the classification layer. Experiments show that the proposed hybrid deep learning method, which better incorporates WSC features, can further improve performance compared with state-of-the-art classification models. The experimental results indicate that WSCs can serve as effective information in the affective analysis of social media text.

    Aspects of the Syntax, Production and Pragmatics of code-switching - with special reference to Cantonese-English

    This dissertation argues for the position that code-switching utterances are constrained by the same set of mechanisms as those which govern monolingual utterances. While this thesis is in line with more recent code-switching theories (e.g. Belazi et al. 1994, MacSwan 1997, Mahootian 1993), this dissertation differs from those works in making two specific claims. First, functional categories and lexical categories exhibit different syntactic behaviour in code-switching. Second, code-switching is subject to the same principles not only in syntax but also in production and pragmatics. Chapter 2 presents a critical review of constraints and processing models previously proposed in the literature. It is suggested that, in view of the vast variety of data, no existing model is completely adequate. Nevertheless, it is argued that a model which does not postulate syntactic constraints (along the lines of Mahootian 1993, MacSwan 1997) or production principles (along the lines of de Bot 1992) specific to code-switching is to be preferred on cognitive and theoretical grounds. Chapter 3 concerns word order between lexical heads and their complements in code-switching. It is shown that the language of a lexical head (i.e. noun or verb) may or may not determine the word order of its complement. Chapter 4 investigates word order between functional heads and their complements in code-switching. In contrast to lexical categories, the language of functional heads (e.g. D, I and C) is shown to determine the word order of their complements in code-switching. It is proposed that word order between heads (lexical or functional) and complements is governed by head-parameters, and that the difference between lexical heads and functional heads is due to their different processing and production in terms of Levelt's (1989) algorithm. Chapter 5 investigates the selection properties of functional categories in code-switching, with special reference to Cantonese-English.
    Contrary to the Functional Head Constraint (Belazi et al. 1994), it is shown that code-switching can occur freely between functional heads and their complements, provided that the c-selection requirements of the functional heads are satisfied. Chapter 6 investigates the selection properties of lexical categories in code-switching, again with special reference to Cantonese-English. It is shown that "language-specific" c-selection properties need not be observed: a Cantonese verb may take an English DP, and an English verb may take a Cantonese demonstrative phrase (DemP). Similar phenomena are drawn from other language pairs involving a language with morphological case and a language without morphological case. The difference between functional categories and lexical categories in their selection properties is again explained in terms of the different production processes they undergo. Chapter 7 is devoted to prepositions, whose status as a functional or a lexical category has been problematic. Based on the behaviour of prepositions in code-switching, it is suggested that prepositions display a dual character. It is proposed that prepositions may well point to the fact that the conventional dichotomy between functional categories and lexical categories is not a primitive one in the lexicon. Chapter 8 looks at code-switching from a wider perspective and explores the pragmatic determinants of code-switching in the light of Relevance Theory (Sperber and Wilson 1995). It is argued that many types of code-switching (e.g. repetitions, quotations, etc.) are motivated by the desire to optimize the "relevance" of a message, with "relevance" as defined in Relevance Theory.

    The language and literacy development of young dual language learners: A critical review

    The number of children living in the United States who are learning two languages is increasing substantially. However, relatively little research has been conducted on the language and literacy development of dual language learners (DLLs), particularly during the early childhood years. To summarize the extant literature and guide future research, a critical analysis of the literature was conducted. A search of major databases for studies of young, typically developing DLLs published between 2000 and 2011 yielded 182 peer-reviewed articles. Findings about DLL children's developmental trajectories in the various areas of language and literacy are presented. Many of these findings should be considered preliminary, because few areas had been examined by multiple studies. Conclusions were drawn only where sufficient evidence existed in a particular area. First, the research shows that DLLs have two separate language systems early in life. Second, differences in some areas of language development, such as vocabulary, appear to exist among DLLs depending on when they were first exposed to their second language. Third, DLLs' language and literacy development may differ from that of monolinguals, although DLLs appear to catch up over time. Fourth, little is known about factors that influence DLLs' development, although the amount of exposure to, and use of, their two languages appears to play key roles. Methodological issues are addressed, and directions for future research are discussed.

    Research Developments in World Englishes

    This book is available open access through the Bloomsbury Open Access programme at www.bloomsburycollections.com. It is funded by the University of Klagenfurt, Austria. Discussing key issues of current relevance and setting the tone for future research in world Englishes, this book provides new perspectives on the diverse realities of Englishes around the world. Written by an international team of established and renowned scholars, it is the inaugural volume in the new series Bloomsbury Advances in World Englishes, dedicated to advancing research in the field. Chapters discuss important topics in contemporary world Englishes research, including de-colonial approaches, emerging varieties in post-protectorates, and international uses as communicative events that highlight the globalizing aspect of English as a semiotic code. The book also expands on cultural conceptualizations to investigate the connections between Englishes and localized cultural knowledge, as well as ongoing changes in, and attitudes towards, local forms in multilingual settings. Closing with an examination of how world Englishes and the use of English as a lingua franca could influence the future teaching of Englishes, Research Developments in World Englishes presents a detailed picture of contemporary research approaches and points the way towards exciting future directions.

    Natural Language Processing: Emerging Neural Approaches and Applications

    This Special Issue highlights the most recent research being carried out in the NLP field and discusses related open issues. It focuses particularly on emerging approaches for language learning, understanding, production, and grounding, whether interactive or autonomous from data, in cognitive and neural systems, and on their potential or real applications in different domains.
