    Computational Approaches to Exploring Persian-Accented English

    This work discusses methods involving phonetic speech recognition for detecting Persian-accented English; these methods offer promise for both identifying and mitigating L2 pronunciation errors. Segmental and suprasegmental pronunciation errors particular to Persian speakers of English are also examined.

    MISPRONUNCIATION DETECTION AND DIAGNOSIS IN MANDARIN ACCENTED ENGLISH SPEECH

    This work presents the development, implementation, and evaluation of a Mispronunciation Detection and Diagnosis (MDD) system, with application to pronunciation evaluation of Mandarin-accented English speech. A comprehensive detection and diagnosis of errors in the Electromagnetic Articulography corpus of Mandarin-Accented English (EMA-MAE) was performed using expert phonetic transcripts and an Automatic Speech Recognition (ASR) system. Articulatory features derived from the parallel kinematic data available in the EMA-MAE corpus were used to identify the most significant articulatory error patterns seen in L2 speakers during common mispronunciations. Using both acoustic and articulatory information, an ASR-based MDD system was built and evaluated across different feature combinations and Deep Neural Network (DNN) architectures. The MDD system captured mispronunciation errors with a detection accuracy of 82.4%, a diagnostic accuracy of 75.8%, and a false rejection rate of 17.2%. The results demonstrate the advantage of articulatory features both in revealing the significant contributors to mispronunciation and in improving the performance of MDD systems.
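    The detection and diagnosis figures above follow the usual MDD convention of scoring each phone against the expert transcript. A minimal sketch of how such metrics are commonly computed from confusion counts; the function name and all count values below are illustrative, not taken from the EMA-MAE study:

```python
def mdd_metrics(ta, fr, fa, tr, correct_diag):
    """Common mispronunciation detection/diagnosis metrics.

    ta: correctly pronounced phones accepted as correct (true acceptances)
    fr: correctly pronounced phones flagged as errors (false rejections)
    fa: mispronounced phones accepted as correct (false acceptances)
    tr: mispronounced phones detected as errors (true rejections)
    correct_diag: detected errors whose error type was identified correctly
    """
    detection_accuracy = (ta + tr) / (ta + fr + fa + tr)
    false_rejection_rate = fr / (ta + fr)    # share of good phones wrongly flagged
    diagnostic_accuracy = correct_diag / tr  # of detected errors, type named correctly
    return detection_accuracy, false_rejection_rate, diagnostic_accuracy

# Illustrative counts, not the study's data:
det, frr, diag = mdd_metrics(ta=70, fr=10, fa=5, tr=15, correct_diag=12)
```

    With these hypothetical counts, detection accuracy is 0.85, the false rejection rate 0.125, and diagnostic accuracy 0.8, mirroring how the three reported percentages trade off against one another.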

    Dealing with linguistic mismatches for automatic speech recognition

    Recent breakthroughs in automatic speech recognition (ASR) have resulted in a word error rate (WER) on par with human transcribers on the English Switchboard benchmark. However, dealing with linguistic mismatches between the training and testing data remains a significant unsolved challenge. In a monolingual setting, it is well known that the performance of ASR systems degrades significantly when presented with speech from speakers with accents, dialects, and speaking styles different from those encountered during system training. In a multilingual setting, ASR systems trained on a source language perform even worse when tested on another target language because of mismatches in the number of phonemes, lexical ambiguity, and the power of the phonotactic constraints provided by phone-level n-grams. To address these linguistic mismatches in current ASR systems, my dissertation investigates both knowledge-gnostic and knowledge-agnostic solutions. In the first part, classic theories from acoustics and articulatory phonetics that can be transferred across a dialect continuum, from local dialects to a standardized language, are revisited. Experiments demonstrate that acoustic correlates in the vicinity of landmarks could help bridge mismatches across different local or global varieties in a dialect continuum. In the second part, we design an end-to-end acoustic modeling approach based on the connectionist temporal classification loss and propose to link the training of acoustics and accent together, in a manner similar to the learning process in human speech perception. This joint model not only performed well on ASR with multiple accents but also boosted the accuracy of the accent identification task in comparison to separately trained models.

    Mispronunciation of High Front and Low Hausa Vowels among the Yorùbá Speakers

    Pronunciation in second language learning is sometimes challenging, especially for vowels. Vowels such as [i] and [a] are found in both Hausa and Yorùbá, but [i:] and [a:] are peculiar to Hausa alone. While Hausa has short and long vowels, Yorùbá has only oral and nasal vowels in its vowel inventory. Such phonemic differences constitute learning challenges, especially for Yorùbá native speakers. This cross-sectional study uses mixed methods to examine the production of the high front Hausa vowels [i] and [i:] and the low Hausa vowels [a] and [a:] by Yorùbá speakers, to identify which group performs better: group 1 (Yorùbá native speakers who learned Hausa in secondary school before entering the college of education) or group 2 (Yorùbá native speakers who learned Hausa informally before entering the college of education). The study also seeks to identify the vowel substitutions that occur in the pronunciation tasks, using 80 participants aged 18 and above from the College of Education system in Nigeria, selected by purposive sampling. The findings were discussed in line with Flege and Bohn's (2020) Revised Speech Learning Model. Eight stimuli were audio-recorded, transcribed, and rated by two independent raters, complemented by participant observation techniques. The results of the Mann-Whitney test revealed that group 2 performed better than group 1. The study also found that the short [a] in the first and second syllables had the highest frequency of substitution compared to the [i], [i:], and [a:] vowels. Such problems have pedagogical implications for learning Hausa as a second language.
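    The Mann-Whitney test used to compare the two groups is a rank-based alternative to the t-test, suited to the ordinal rater scores here. A minimal sketch of the U statistic on hypothetical rating scores; tie correction and the p-value from the normal approximation are omitted (a library routine such as scipy.stats.mannwhitneyu would supply them):

```python
def mann_whitney_u(group1, group2):
    """U1 counts, over all cross-group pairs, the 'wins' for group1
    (a tie counts as half a win); U2 is the complementary count."""
    u1 = sum(1.0 if x > y else 0.5 if x == y else 0.0
             for x in group1 for y in group2)
    u2 = len(group1) * len(group2) - u1
    return u1, u2

# Hypothetical pronunciation scores for two learner groups:
u1, u2 = mann_whitney_u([6, 7, 7, 9], [4, 5, 6, 6])
```

    The smaller of U1 and U2 is compared against a critical value (or converted to a z-score for larger samples); a very lopsided split, as in this toy example, indicates that one group's scores systematically outrank the other's.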

    English as a lingua franca: mutual intelligibility of Chinese, Dutch and American speakers of English

    English has become the language of international communication. As a result of this development, we are now confronted with a bewildering variety of ‘Englishes’, spoken with non-native accents. Research determining how intelligible non-native speakers of varying native-language backgrounds are to each other and to native speakers of English has only just started to receive attention. This thesis investigated to what extent Chinese, Dutch and American speakers of English are mutually intelligible. Intelligibility of vowels, simplex consonants and consonant clusters was tested in meaningless sound sequences, as well as in words in meaningless and meaningful short sentences. Speakers (one male, one female per language background) were selected so as to be optimally representative of their peer groups, which were made up of young academic users of English. Intelligibility was tested for all nine combinations of speaker and listener backgrounds. Results show that Chinese-accented English is less intelligible overall than Dutch-accented English, which is less intelligible than American English. Generally, the native-language background of the speaker was less important for the intelligibility than the background of the listener. Also, the results reveal a clear and consistent so-called interlanguage speech intelligibility benefit: speakers of English – whether foreign or native – are more intelligible to listeners with whom they share the native-language background than to listeners with a different native language.

    LEI Universiteit Leiden; China Scholarship Council; Leids Universiteits Fonds; Theoretical and Experimental Linguistics

    The Effect of Speech Elicitation Method on Second Language Phonemic Accuracy

    The present study, a One-Group Posttest-Only Repeated-Measures Design, examined the effect of speech elicitation method on second language (L2) phonemic accuracy of high functional load initial phonemes found in frequently occurring nouns in American English. This effect was further analyzed by including the variable of first language (L1) to determine if L1 moderated any effects found. The data consisted of audio recordings of 61 adult English learners (ELs) enrolled in English for Academic Purposes (EAP) courses at a large, public, post-secondary institution in the United States. Phonemic accuracy was judged by two independent raters as either approximating a standard American English (SAE) pronunciation of the intended phoneme or not, thus a dichotomous scale, and scores were assigned to each participant in terms of the three speech elicitation methods of word reading, word repetition, and picture naming. Results from a repeated measures ANOVA test revealed a statistically significant difference in phonemic accuracy (F(1.47, 87.93) = 25.94, p < .001) based on speech elicitation method, while the two-factor mixed design ANOVA test indicated no statistically significant differences for the moderator variable of native language. However, post-hoc analyses revealed that mean scores on picture naming tasks differed significantly from the other two elicitation methods of word reading and word repetition. The results of this study should heighten attention to the role that various speech elicitation methods, or input modalities, might play in L2 productive accuracy. Implications for practical application suggest that caution should be used when utilizing pictures to elicit specific vocabulary words, even high-frequency words, as they might result in erroneous productions or no utterance at all. These findings could inform pronunciation instructors about best teaching practices when pronunciation accuracy is the objective. Finally, the impact of L1 on L2 pronunciation accuracy might not be as important as once thought.

    The Perceptual and Production Training of /d, tap, r/ in L2 Spanish: Behavioral, Psycholinguistic, and Neurolinguistic Evidence

    When native speakers of American English begin learning Spanish, their acquisition of native-like pronunciation can be hampered by the tap-trill distinction in words like coro `choir' and corro `I run'. The trill proves difficult because it does not exist in English. Although the tap exists as an allophone of /t/ and /d/ in English words like `writer' and `rider', students of Spanish must learn to process it as a phoneme rather than an allophone. Similarly, learners have difficulty acquiring the spirantization of voiced stops, where the /d/ in codo `elbow' is produced as a voiced dental fricative or approximant, which is more like the `th' sound in English. This study investigates whether American English-speaking learners of Spanish can be trained to perceive and produce the intervocalic tap, trill, and /d/ contrasts in Spanish. Participants were trained using both perceptual and production training methods. Past research has reported that perceptual training alone improves both perception and production, and that production training alone improves both as well, but the production training studies have not been limited to production, as trainees have been able to listen to the training stimuli. This study is important because it systematically controls both training modalities so that they can be directly compared, and it introduces a third training methodology that includes both perception and production to discover whether perceptual training, production training, or a combination of the two is most effective. This study also uses cross-modal priming and ERP data in addition to traditional identification and production tasks to evaluate the effect of training, an innovative use of both tasks to determine not only whether trainees perceive and produce the trained L2 contrasts, but also whether they unconsciously process these contrasts and have built new phonemic categories for these sounds.
    All three training paradigms improved English learners' perception or production. While production trainees did not improve in their overall perception and declined in their perception of one contrast, perception trainees improved in their production and overall perception, indicating that perception training transfers more effectively than production training.

    The Effect of Shadowing in Learning L2 Segments: A Perspective from Phonetic Convergence

    This study aimed to investigate the role that phonetic convergence plays in the acquisition of L2 segments. In particular, it examined whether phonetic convergence towards native speakers could help Arabic-speaking second-language (L2) learners of English improve their pronunciation of four problematic English segments (/p, v, ɛ, oʊ/). To do so, the study went through several phases of experimental studies. Phonetic convergence was first explored in the productions of Arabic L2 learners towards five different English native model talkers in a non-interactive setting. Five XAB perceptual similarity judgments and acoustic measurements of VOT, vowel duration, F0, and F1*F2 were used to evaluate phonetic convergence. Based mainly on perceptual measures of phonetic convergence, learners were divided evenly between two groups. C-group (convergence group) received phonetic production training from the model talkers to whom they showed the highest degree of phonetic convergence, while D-group (divergence group) received training from the model talkers they showed divergence from or the least convergence to. Training lasted three consecutive days, with the target segments (i.e., /p, v, ɛ, oʊ/) presented in nonsense words. Learners were trained using the shadowing technique under a low-variability training paradigm in which each learner received training from one native model talker. Native-speaker judgments of segmental intelligibility indicated that both groups showed significant improvement on the post-test; however, no significant differences were found between groups in the overall magnitude of this change. Perceived convergence in learners' speech failed to explain the improvement. However, some patterns of acoustic convergence towards their trainers, regardless of group, predicted the overall segmental intelligibility gains. The findings suggested that the more trainees converged their vowel duration and formants to their trainers, the more their performance improved.
    At the featural level, the study examined the relationship between the preexisting phonetic distance between the Arabic L2 learners of English and the model talkers before exposure and the degree of convergence. Results indicated a direct relationship between how far Arabic L2 learners were from the native model talkers and the degree of convergence in all measured acoustic features. That is, the greater the baseline distance, the greater the degree of phonetic convergence. However, such a relationship might be due to the metric used to assess phonetic convergence: the relationship between phonetic convergence measured by difference in distance (DID) and the absolute baseline distance is always biased due to the way they are calculated (Cohen Priva & Sanker, 2019; MacLeod, 2021). This study found shadowing to be an effective technique for promoting segmental intelligibility among Arabic speakers learning English as an L2. However, this effectiveness might be increased by trainees converging more to their trainers in vowel duration and vowel spectra, or by being similar to their trainers in this regard from the beginning.
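    The difference-in-distance (DID) measure discussed above can be sketched for a single acoustic feature as follows; the feature values are illustrative, not the study's data. Note that DID can never exceed the baseline distance, which is one face of the coupling between DID and baseline distance that Cohen Priva and Sanker (2019) and MacLeod (2021) identify as a source of bias:

```python
def did(model_value, learner_baseline, learner_shadowed):
    """Difference-in-distance convergence for one acoustic feature
    (e.g., vowel duration in ms): positive values mean the learner
    moved toward the model talker after exposure, negative values
    mean divergence."""
    baseline_dist = abs(model_value - learner_baseline)
    shadowed_dist = abs(model_value - learner_shadowed)
    return baseline_dist - shadowed_dist

# A learner starting 40 ms from the model and ending 20 ms away
# converged by 20 ms; a learner starting only 10 ms away could
# never score a DID above 10 ms, however faithfully they shadowed.
convergence = did(model_value=100, learner_baseline=140, learner_shadowed=120)
```

    Because the maximum attainable DID grows with baseline distance, learners who start far from the model can post large convergence scores that learners who start close can never match, which is exactly why the abstract cautions against reading the distance-convergence correlation at face value.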

    A Sound Approach to Language Matters: In Honor of Ocke-Schwen Bohn

    The contributions in this Festschrift were written by Ocke's current and former PhD students, colleagues, and research collaborators. The Festschrift is divided into six sections, moving from the smallest building blocks of language through gradually expanding objects of linguistic inquiry to the highest levels of description - all of which have formed a part of Ocke's career, in connection with his teaching and/or his academic productions: "Segments", "Perception of Accent", "Between Sounds and Graphemes", "Prosody", "Morphology and Syntax", and "Second Language Acquisition". Each one of these illustrates a sound approach to language matters.