
    Analysis of Dialectal Influence in Pan-Arabic ASR

    In this paper, we analyze the impact of five Arabic dialects on the front-end and pronunciation dictionary components of an Automatic Speech Recognition (ASR) system. We use the ASR system's phonetic decision tree as a diagnostic tool to compare the robustness of MFCC and MLP front-ends to dialectal variation in the speech data, and find that MLP Bottle-Neck features are less robust to dialectal variation. We also perform a rule-based analysis of the pronunciation dictionary, which enables us to identify dialectal words in the vocabulary and to automatically generate pronunciations for unseen words. We show that our technique produces pronunciations with an average phone error rate of 9.2%.
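The abstract reports an average phone error rate but does not define it. A common definition, assumed here rather than taken from the paper, is the Levenshtein edit distance between a reference and a generated phone sequence divided by the reference length, i.e. (S + D + I) / N. The minimal Python sketch below computes this for a single pronunciation; `phone_error_rate` is a hypothetical helper, and how the paper averages over words is not stated in the abstract.

```python
def phone_error_rate(reference, hypothesis):
    """Return (S + D + I) / N for two phone lists (assumed PER definition).

    Uses a standard Levenshtein dynamic program; N is the reference length.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference pronunciation must contain at least one phone")
    # dist[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i reference phones
    for j in range(m + 1):
        dist[0][j] = j  # insert all j hypothesis phones
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost is 0)
            )
    return dist[n][m] / n


if __name__ == "__main__":
    # /q/ realized as a glottal stop, as in Egyptian Arabic: one substitution
    print(phone_error_rate(["q", "a", "l", "b"], ["ʔ", "a", "l", "b"]))
```

Here the example prints 0.25: one substitution over four reference phones. Pooling edits over a whole test set and dividing by the total number of reference phones is another plausible way to "average"; the sketch leaves that choice open.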

    Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

    Peer reviewed

    Characterizing phonetic transformations and fine-grained acoustic differences across dialects

    Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 169-175). This thesis is motivated by the gaps between speech science and technology in analyzing dialects. In speech science, investigating phonetic rules is usually manual, laborious, and time-consuming, which limits the amount of data analyzed. Without sufficient data, the analysis could overlook or over-specify certain phonetic rules. On the other hand, in speech technology applications such as automatic dialect recognition, phonetic rules are rarely modeled explicitly. While many applications do not require such knowledge to perform well, explicitly modeling pronunciation patterns is beneficial in certain applications. For example, users of language-learning software can benefit from explicit and intuitive feedback from the computer on how to alter their pronunciation; in forensic phonetics, it is important that the results of automated systems be justifiable on phonetic grounds. In this work, we propose a mathematical framework to analyze dialects in terms of (1) phonetic transformations and (2) acoustic differences. The proposed Phonetic-based Pronunciation Model (PPM) uses a hidden Markov model to characterize when and how often substitutions, insertions, and deletions occur. In particular, clustering methods are compared to better model deletion transformations. In addition, an acoustic counterpart of PPM, the Acoustic-based Pronunciation Model (APM), is proposed to characterize and locate fine-grained acoustic differences, such as formant transitions and nasalization, across dialects. We used three data sets to empirically compare the proposed models on Arabic and English dialects. Results in automatic dialect recognition demonstrate that the proposed models complement standard baseline systems. Results in pronunciation generation and rule-retrieval experiments indicate that the proposed models learn the underlying phonetic rules across dialects. The proposed system presents postulated pronunciation rules to a phonetician, who interprets and refines them to discover new rules or quantify known ones. This can be done on large corpora to develop rules of greater statistical significance than has previously been possible. Potential applications of this work include speaker characterization and recognition, automatic dialect recognition, automatic speech recognition and synthesis, forensic phonetics, language learning or accent-training education, and assistive diagnosis tools for speech and voice disorders. By Nancy Fang-Yih Chen. Ph.D.
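The PPM itself is an HMM, and the sketch below does not reproduce it. It only illustrates, under stated assumptions, what tallying "when and how often substitutions, insertions, and deletions occur" can look like once reference and surface (dialectal) phone sequences have already been aligned. The alignment step, the `-` gap symbol, the `#` word-boundary context, and the helper `count_transformations` are all hypothetical choices made for illustration; "when" is approximated here by the preceding reference phone.

```python
from collections import Counter

GAP = "-"       # hypothetical gap symbol in a pre-computed alignment
BOUNDARY = "#"  # hypothetical word-boundary symbol used as left context


def count_transformations(aligned_words):
    """Count (kind, left_context, reference_phone, surface_phone) tuples.

    Each word is a list of (reference_phone, surface_phone) pairs from some
    earlier alignment step. A GAP on the surface side marks a deletion, a GAP
    on the reference side marks an insertion, and any other mismatch is a
    substitution. Matching pairs are not counted.
    """
    counts = Counter()
    for word in aligned_words:
        left = BOUNDARY  # each word starts at a boundary
        for ref, surf in word:
            if ref != surf:
                if surf == GAP:
                    kind = "del"
                elif ref == GAP:
                    kind = "ins"
                else:
                    kind = "sub"
                counts[(kind, left, ref, surf)] += 1
            if ref != GAP:
                left = ref  # context advances along the reference side only
    return counts


if __name__ == "__main__":
    words = [
        [("q", "ʔ"), ("a", "a"), ("l", "l"), ("b", "b")],
        [("k", "k"), ("a", "a"), ("t", "t"), ("a", GAP), ("b", "b")],
    ]
    for rule, n in count_transformations(words).items():
        print(rule, n)
```

Relative frequencies of these tuples give a crude, context-conditioned picture of dialectal transformations; an HMM-based model such as the one the thesis proposes would instead learn such statistics jointly with the alignment.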

    Grammatical Contact in the Sahara: Arabic, Berber, and Songhay in Tabelbala and Siwa

    This thesis examines the effects of contact on the grammars of the languages of two Saharan oases, Siwa and Tabelbala. These share similar linguistic ecologies in many respects, and can be regarded as among the most extreme representatives of a language contact situation ongoing for centuries across the oases of the northern Sahara. This work identifies and argues for contact effects across a wide range of core morphology and syntax, using these both to shed new light on regional history and to test claims about the limits on, and expected outcomes of, contact. While reaffirming the ubiquity of pattern copying, the results encourage an expanded understanding of the role of material borrowing in grammatical contact, showing that the borrowing of functional morphemes and of paradigmatic sets of words or phrases containing them can lead to grammatical change. More generally, this work confirms the uniformitarian principle that diachronic change arises through the long-term application of processes observable in synchronic language contact situations. The similarity of the sociolinguistic situations provides a close approximation to a natural controlled experiment, allowing us to pinpoint cases where differences in the original structure of the recipient language appear to have influenced its receptivity to external influence in those aspects of structure.

    Book Reviews

    No abstract.

    Determining code choice: written slogans during Egyptian revolution-January 2011

    This qualitative study aims to describe written code switching between Modern Standard Arabic (MSA) and Egyptian Colloquial Arabic (ECA) in the slogans written during the Egyptian revolution of January 2011. Findings from a questionnaire survey and from an application of Myers-Scotton's (1993) Matrix Language Frame (MLF) model show that ECA clauses make up a significant percentage of the data. These findings shed light on (1) the merging of MSA and ECA as a distinctive feature of Arabic in many domains; (2) the best ways to benefit from this phenomenon in AFL teaching; (3) the importance of the event in documenting the Arabic language and its varieties in the face of future language change; and (4) the link between code and role: building on Bassiouney's (2010) idea that code switching and role are related, this thesis demonstrates that the protesters chose ECA when they wanted to express their anger and embrace their new role or identity as having power over the regime.