2,564 research outputs found

    Self-imitating Feedback Generation Using GAN for Computer-Assisted Pronunciation Training

    Self-imitating feedback is an effective and learner-friendly method for non-native learners in Computer-Assisted Pronunciation Training. Acoustic characteristics of native utterances are extracted, transplanted onto the learner's own speech input, and given back to the learner as corrective feedback. Previous work focused on speech conversion using prosodic transplantation techniques based on the PSOLA algorithm. Motivated by the visual differences found in spectrograms of native and non-native speech, we investigated applying a GAN to generate self-imitating feedback, exploiting the generator's ability acquired through adversarial training. Because this mapping is highly under-constrained, we also adopt a cycle consistency loss to encourage the output to preserve the global structure, which is shared by native and non-native utterances. Trained on 97,200 spectrogram images of short utterances produced by native and non-native speakers of Korean, the generator successfully transforms the non-native spectrogram input into a spectrogram with the properties of self-imitating feedback. Furthermore, the transformed spectrogram shows segmental corrections that cannot be obtained by prosodic transplantation. A perceptual test comparing the self-imitating and correcting abilities of our method with the baseline PSOLA method shows that the generative approach with cycle consistency loss is promising.
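The cycle consistency idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact loss, so the code assumes the common formulation in which a sample mapped to the other domain and back should match the original under a mean absolute (L1) difference. The names `g`, `f`, `toy_g`, `toy_f` and `toy_f_bad` are hypothetical stand-ins for the two generators, and short lists of floats stand in for spectrograms.

```python
from typing import Callable, List

Vector = List[float]
Mapping = Callable[[Vector], Vector]


def l1_distance(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two equal-length vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(x: Vector, y: Vector, g: Mapping, f: Mapping) -> float:
    """Assumed form of the loss: |f(g(x)) - x| + |g(f(y)) - y|.

    x: sample from the source domain (e.g. a non-native spectrogram)
    y: sample from the target domain (e.g. a native spectrogram)
    g: hypothetical source-to-target mapping
    f: hypothetical target-to-source mapping
    """
    forward_cycle = l1_distance(f(g(x)), x)   # source -> target -> source
    backward_cycle = l1_distance(g(f(y)), y)  # target -> source -> target
    return forward_cycle + backward_cycle


# Toy mappings (assumptions for illustration only).
def toy_g(v: Vector) -> Vector:
    return [2.0 * t for t in v]


def toy_f(v: Vector) -> Vector:
    return [0.5 * t for t in v]        # exact inverse of toy_g


def toy_f_bad(v: Vector) -> Vector:
    return [0.5 * t + 1.0 for t in v]  # not an inverse of toy_g


if __name__ == "__main__":
    x, y = [1.0, 2.0], [3.0, 4.0]
    print(cycle_consistency_loss(x, y, toy_g, toy_f))      # inverse pair
    print(cycle_consistency_loss(x, y, toy_g, toy_f_bad))  # non-inverse pair
```

When the two mappings undo each other, the loss is zero; any mismatch between the round trip and the input is penalised, which is what constrains an otherwise under-constrained mapping.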

    Communicative focus on form and second language suprasegmental learning: teaching Cantonese learners to perceive Mandarin tones

    The current study examined how form-focused instruction (FFI) with and without corrective feedback (CF) as output enhancement can facilitate L2 perception of Mandarin tones at both the phonetic and phonological levels in 41 Cantonese learners of Mandarin. Two experimental groups, FFI-only and FFI-CF, received a 90-minute FFI treatment designed to encourage them to notice and practice the categorical distinctions of Mandarin tones through a range of communicative input and output activities. During these activities, the instructors provided CF only to students in the FFI-CF group by recasting and pushing them to repair their mispronunciations of the target features (i.e., output enhancement). The control group received comparable meaning-oriented instruction without any FFI. The effectiveness of FFI was assessed via a forced-choice identification task with both trained and untrained items for a variety of tonal contrasts in Mandarin (high level Tone 1 vs. mid-rising Tone 2 vs. high falling Tone 4). According to statistical comparisons, the FFI-only group attained significant improvement in all lexical and tonal contexts, and such effectiveness was evident particularly in the acquisition of Tone 1 and Tone 4, supposedly the most difficult instances due to their identical phonological status in the learners’ L1, Cantonese. The FFI-CF group, however, demonstrated marginally significant gains only under the trained lexical conditions. The results in turn suggest that FFI promotes learners’ attentional shift from vocabulary to sound learning (generalizable gains in trained and untrained items) and facilitates their access to new phonetic and phonological categories. Yet, the relative advantage of adding CF to FFI as output enhancement remains unclear, especially with respect to the less experienced L2 learners in the current study.

    Teaching Lexical Stress: Effective Practice in a Mandarin ELL Context

    Current trends in teaching pronunciation to ELLs (English Language Learners) point towards a top-down approach, which emphasizes the overarching prosodic features of English rather than the accurate pronunciation of individual consonants and vowels. One of the most integral prosodic features in English is stress. Both lexical stress (stressed syllables within a word) and sentence stress (stressed words within a sentence) play an important role in the prosodic pronunciation of English. However, some languages, such as Mandarin, lack stress in their prosodic systems, instead employing features such as tonality. The two languages nonetheless overlap in their fundamental prosodic structures, since pitch changes are integral both to tonality in Mandarin and to stress in English. I propose that ESL instructors will instill prosodic skills, and thus make their students better communicators, by drawing attention to this positive transfer between the two systems.

    Comprehensibility and Prosody Ratings for Pronunciation Software Development

    In the context of a project developing software for pronunciation practice and feedback for Mandarin-speaking learners of English, a key issue is how to decide which features of pronunciation to focus on in giving feedback. We used naïve and experienced native speaker ratings of comprehensibility and nativeness to establish the key features affecting comprehensibility of the utterances of a group of Chinese learners of English. Native speaker raters assessed the comprehensibility of recorded utterances, pinpointed areas of difficulty, and then rated the same utterances for nativeness after segmental information had been filtered out. The results show that prosodic information is important for comprehensibility, and that there are no significant differences between naïve and experienced raters on either comprehensibility or nativeness judgements. This suggests that naïve judgements are a useful and accessible source of data for identifying the parameters to be used in setting up automated feedback.

    Rapid neural processing of grammatical tone in second language learners

    The present dissertation investigates how beginner learners process grammatical tone in a second language and whether their processing is influenced by phonological transfer. Paper I focuses on the acquisition of Swedish grammatical tone by beginner learners from a non-tonal language, German. Results show that non-tonal beginner learners do not process the grammatical regularities of the tones but rather treat them akin to piano tones. A rightwards-going spread of activity in response to pitch difference in Swedish tones possibly indicates a process of tone sensitisation. Papers II to IV investigate how artificial grammatical tone, taught in a word-picture association paradigm, is acquired by German and Swedish learners. The results of paper II show that interspersed mismatches between grammatical tone and picture referents evoke an N400 only for the Swedish learners. Both learner groups produce N400 responses to picture mismatches related to grammatically meaningful vowel changes. While mismatch detection quickly reaches high accuracy rates, tone mismatches are least accurately and most slowly detected in both learner groups. For processing of the grammatical L2 words outside of mismatch contexts, the results of paper III reveal early, preconscious and late, conscious processing in the Swedish learner group within 20 minutes of acquisition (word recognition component, ELAN, LAN, P600). German learners only produce late responses: a P600 within 20 minutes and a LAN after sleep consolidation. The surprisingly rapid emergence of early grammatical ERP components (ELAN, LAN) is attributed to less resource-heavy processing outside of violation contexts. Results of paper IV, finally, indicate that memory trace formation, as visible in the word recognition component at ~50 ms, is only possible at the highest level of formal and functional similarity, that is, for words with falling tone in Swedish participants. 
    Together, the findings emphasise the importance of phonological transfer in the initial stages of second language acquisition and suggest that the earlier the processing, the more important the impact of phonological transfer.

    Neural systems for auditory perception of lexical tones

    Previous neuroimaging research on cognitive processing of speech tone has generated dramatically different patterns of findings. Even at the basic perception level, brain mapping studies of lexical tones have yielded inconsistent results. Apart from the data inconsistency problem, experimental materials in past studies of tone perception carried little or minimal lexical semantics, an important dimension that should not be dispensed with because speech tones serve to distinguish lexical meanings. The present study sought to examine the neural correlates of the perception of speech tone using lexically meaningful experimental stimuli. A simple lexical tone perception task was devised in which native Mandarin speakers were asked to judge whether or not the two syllables of an auditorily presented Chinese bisyllabic word had the same tone. We selected bisyllabic words as experimental stimuli because Chinese monosyllables often convey little or very vague meanings due to rampant homophony. We found that the left inferior frontal gyrus, the right middle temporal gyrus and bilateral superior temporal gyri are responsible for basic perception of linguistic pitches. Our interpretation of the data sees the left superior temporal gyrus as engaged in primary acoustic analysis of the auditory stimuli, while the right middle superior temporal gyrus and the left inferior frontal region are involved in both tonal and semantic processing of the language stimuli.

    A Method of Teaching English Speaking Learners to Produce Mandarin-Chinese Tones

    Learning Mandarin Chinese tones is a big challenge for English-speaking learners. The average tonal production accuracy is reported to be about 70 percent for intermediate-level learners and 40 percent for beginning-level Chinese learners. Chinese tonal proficiency significantly influences learners' communicative effectiveness, including listening and speaking, but research often overlooks tonal production. This study proposed and tested a novel method of teaching English-speaking learners to pronounce Mandarin Chinese tones. The teaching method includes a Chinese tones bookmark and a 30-50 minute in-class training module. The research undertook five cycles of Design-Based Research (DBR) implementations with 31 public school students, adult learners, and Chinese teachers. Two audio recordings, one pre-training and one post-training, were collected and compared using paired-samples t-tests. Interviews, surveys, and class observations were used to determine the participants' attitudes toward the training and the teaching model. The results revealed that the designed teaching method was effective in improving the tonal production accuracy of English-speaking K-12 children and adult learners. In addition, the results indicated that the participants' attitudes toward the designed method were positive. This study contributes to the current Chinese tonal teaching repertoire and presents a flexible, practical method for teachers to use when instructing students on Chinese tones.
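The pre/post comparison described in this abstract relies on a paired-samples t-test. The sketch below shows how the test statistic is computed from matched scores; it is an illustration under stated assumptions, not the study's analysis. The function name `paired_t_statistic` and the accuracy values are hypothetical, and the p-value lookup against a t distribution is left out.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(pre: Sequence[float], post: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for a paired-samples t-test.

    t = mean(d) / (sd(d) / sqrt(n)), where d = post - pre for each participant
    and sd is the sample standard deviation of the differences.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same number of scores")
    n = len(pre)
    if n < 2:
        raise ValueError("at least two paired observations are required")

    diffs = [after - before for before, after in zip(pre, post)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")

    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical tonal accuracy scores for four learners (not study data).
    pre_scores = [0.40, 0.55, 0.45, 0.60]
    post_scores = [0.70, 0.65, 0.80, 0.75]
    t_value, dof = paired_t_statistic(pre_scores, post_scores)
    print(f"t = {t_value:.3f}, df = {dof}")
```

A paired test is the natural choice here because each learner contributes both a pre-training and a post-training recording, so the analysis works on per-learner differences rather than on two independent groups.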

    Towards the automatic processing of Yongning Na (Sino-Tibetan): developing a 'light' acoustic model of the target language and testing 'heavyweight' models from five national languages

    Automatic speech processing technologies hold great potential to facilitate the urgent task of documenting the world's languages. The present research aims to explore the application of speech recognition tools to a little-documented language, with a view to facilitating processes of annotation, transcription and linguistic analysis. The target language is Yongning Na (a.k.a. Mosuo), an unwritten Sino-Tibetan language with fewer than 50,000 speakers. An acoustic model of Na was built using CMU Sphinx. In addition to this 'light' model, trained on a small data set (only 4 hours of speech from 1 speaker), 'heavyweight' models from five national languages (English, French, Chinese, Vietnamese and Khmer) were also applied to the same data. Preliminary results are reported, and perspectives for the long road ahead are outlined.

    Integrating Automatic Transcription into the Language Documentation Workflow: Experiments with Na Data and the Persephone Toolkit

    Automatic speech recognition tools have potential for facilitating language documentation, but in practice these tools remain little-used by linguists for a variety of reasons: the technology is still new (and evolving rapidly), user-friendly interfaces are still under development, and case studies demonstrating the practical usefulness of automatic recognition in a low-resource setting remain few. This article reports on a success story in integrating automatic transcription into the language documentation workflow, specifically for Yongning Na, a language of Southwest China. Using Persephone, an open-source toolkit, a single-speaker speech transcription tool was trained on five hours of manually transcribed speech. The experiments found that this method can achieve a remarkably low error rate (on the order of 17%), and that automatic transcriptions were useful as a canvas for the linguist. The present report is intended for linguists with little or no knowledge of speech processing. It aims to provide insights into (i) the way the tool operates and (ii) the process of collaborating with natural language processing specialists. Practical recommendations are offered on how to anticipate the requirements of this type of technology from the early stages of data collection in the field. National Foreign Language Resource Cente
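The abstract reports an error rate of roughly 17% without saying how it is computed. A common convention for transcription tools, assumed here for illustration, is the edit distance between the reference and the automatic transcription divided by the reference length. The sketch below implements that convention in plain Python; the function names and the sample label sequences are hypothetical and do not come from the Persephone toolkit or the Na corpus.

```python
from typing import Hashable, Sequence


def edit_distance(ref: Sequence[Hashable], hyp: Sequence[Hashable]) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[Hashable], hyp: Sequence[Hashable]) -> float:
    """Edit distance normalised by reference length (assumed metric)."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical label sequences, not taken from any real corpus.
    reference = ["a", "b", "c", "d"]
    hypothesis = ["a", "x", "c"]
    print(f"error rate: {error_rate(reference, hypothesis):.2f}")
```

Because the distance is normalised by the reference length, the rate can exceed 1.0 when the automatic transcription contains many spurious insertions.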