
    A review of data collection practices using electromagnetic articulography

    This paper reviews data collection practices in electromagnetic articulography (EMA) studies, with a focus on sensor placement. It consists of three parts: in the first part, we introduce electromagnetic articulography as a method. In the second part, we focus on existing data collection practices. Our overview is based on a literature review of 905 publications from a large variety of journals and conferences, identified through a systematic keyword search in Google Scholar. The review shows that experimental designs vary greatly, which in turn may limit researchers' ability to compare results across studies. In the third part of this paper, we describe an EMA data collection procedure that includes an articulatory-driven strategy for determining where to position sensors on the tongue without causing discomfort to the participant. We also evaluate three approaches for preparing (NDI Wave) EMA sensors reported in the literature with respect to how long the sensors remain attached to the tongue: 1) attaching out-of-the-box sensors, 2) attaching sensors coated in latex, and 3) attaching sensors coated in latex with an additional latex flap. Results indicate no clear general effect of sensor preparation type on adhesion duration. A subsequent exploratory analysis reveals that sensors with the additional flap tend to adhere for shorter times than the other two types, but that this pattern is inverted for the most posterior tongue sensor.
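
    As a rough illustration of the exploratory comparison described in this abstract, the sketch below summarises adhesion durations by sensor preparation type and tongue position with pandas. The column names and values are invented placeholders, not the authors' data or analysis code.

```python
# Hypothetical sketch of an adhesion-duration comparison across sensor preparations.
# "prep_type", "sensor_position", and "adhesion_min" are assumed column names.
import pandas as pd

def summarize_adhesion(df: pd.DataFrame) -> pd.DataFrame:
    """Mean adhesion duration (minutes) per preparation type and tongue position."""
    return (
        df.groupby(["prep_type", "sensor_position"])["adhesion_min"]
          .agg(["count", "mean", "std"])
          .sort_values("mean")
    )

# Made-up example records: plain, latex-coated, and latex-with-flap sensors.
records = [
    {"prep_type": "plain", "sensor_position": "tongue_tip",  "adhesion_min": 42.0},
    {"prep_type": "latex", "sensor_position": "tongue_tip",  "adhesion_min": 55.0},
    {"prep_type": "flap",  "sensor_position": "tongue_back", "adhesion_min": 61.0},
]
print(summarize_adhesion(pd.DataFrame(records)))
```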

    Can visual feedback improve English speakers' Mandarin tone production?

    Non-native tones are considered challenging for adult second language speakers to perceive and produce. The current study examined the effect of laboratory-based intensive training on American English speakers’ Mandarin tone production. Participants’ task was to repeat Mandarin words after the model. There were two conditions in the experiment: in one condition, participants received no external feedback, whereas in the other condition they received detailed visual feedback, namely the pitch contour of their own tone production displayed alongside the native version. Eight participants completed training with no feedback and another eight were trained with visual feedback. Results revealed that participants in neither group improved their tone production after training, and participants trained with visual feedback did not show more improvement than those trained without feedback. Given the lack of improvement in participants’ tone production after training, methodological and theoretical limitations of a repetition-based training paradigm are discussed.
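
    As a rough analogue of the visual-feedback condition described in this abstract, the sketch below extracts F0 contours with librosa's pYIN tracker and plots a learner's production against a native model. The file names are placeholders; the study's actual feedback software is not specified in the abstract.

```python
# Sketch: plot a learner's pitch contour alongside a native-speaker model.
import librosa
import matplotlib.pyplot as plt

def f0_contour(path: str, sr: int = 16000):
    """Extract an F0 contour (Hz) with librosa's pYIN pitch tracker."""
    y, sr = librosa.load(path, sr=sr)
    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C6"), sr=sr)
    times = librosa.times_like(f0, sr=sr)
    return times, f0

learner_t, learner_f0 = f0_contour("learner_ma3.wav")  # placeholder file name
native_t, native_f0 = f0_contour("native_ma3.wav")     # placeholder file name

plt.plot(native_t, native_f0, label="native model")
plt.plot(learner_t, learner_f0, label="learner")
plt.xlabel("Time (s)")
plt.ylabel("F0 (Hz)")
plt.legend()
plt.show()
```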

    Restructuring multimodal corrective feedback through Augmented Reality (AR)-enabled videoconferencing in L2 pronunciation teaching

    The problem of cognitive overload is particularly pertinent in multimedia L2 classroom corrective feedback (CF), which involves rich communicative tools to help the class notice the mismatch between the target input and learners’ pronunciation. Based on multimedia design principles, this study developed a new multimodal CF model using augmented reality (AR)-enabled videoconferencing to eliminate extraneous cognitive load and guide learners’ attention to the essential material. Using a quasi-experimental design, the study examined the effectiveness of this new CF model in improving Chinese L2 students’ segmental production and identification of the targeted English consonants (dark /ɫ/, /ð/, and /θ/), as well as their attitudes towards this application. Results indicated that the online multimodal CF environment equipped with AR annotation and filters played a significant role in improving the participants’ production of the target segments. However, this advantage was not found in the auditory identification tests when compared with the offline multimedia CF class. In addition, the learners reported that the new CF model helped direct their attention to the articulatory gestures of the student being corrected and enhanced class efficiency. Implications for computer-assisted pronunciation training and the construction of online/offline multimedia learning environments are also discussed.

    Speaker Independent Acoustic-to-Articulatory Inversion

    Acoustic-to-articulatory inversion, the determination of articulatory parameters from acoustic signals, is a difficult but important problem for many speech processing applications, such as automatic speech recognition (ASR) and computer-aided pronunciation training (CAPT). In recent years, several approaches have been successfully implemented for speaker-dependent models with parallel acoustic and kinematic training data. However, in many practical applications inversion is needed for new speakers for whom no articulatory data are available. To address this problem, this dissertation introduces a novel speaker adaptation approach called Parallel Reference Speaker Weighting (PRSW), based on parallel acoustic and articulatory Hidden Markov Models (HMMs). This approach uses a robust normalized articulatory space and palate-referenced articulatory features, combined with speaker-weighted adaptation, to form an inversion mapping for new speakers that can accurately estimate articulatory trajectories. The proposed PRSW method is evaluated on the newly collected Marquette electromagnetic articulography - Mandarin Accented English (EMA-MAE) corpus using 20 native English speakers. Cross-speaker inversion results show that, given a good selection of reference speakers with consistent acoustic and articulatory patterns, the PRSW approach gives good speaker-independent inversion performance even without kinematic training data.
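
    PRSW itself operates on parallel acoustic and articulatory HMMs; as a much-simplified illustration of the reference-speaker-weighting idea, the sketch below weights reference speakers by acoustic similarity to the new speaker and combines their inversion outputs. The distance-based weights, the per-speaker inverter interface, and all data are assumptions made for illustration, not the dissertation's method.

```python
# Simplified reference-speaker weighting for acoustic-to-articulatory inversion.
import numpy as np

def speaker_weights(target_acoustics: np.ndarray,
                    reference_acoustics: list[np.ndarray]) -> np.ndarray:
    """Softmax weights from negative Euclidean distance between mean acoustic vectors."""
    dists = np.array([np.linalg.norm(target_acoustics.mean(0) - ref.mean(0))
                      for ref in reference_acoustics])
    scores = -dists
    w = np.exp(scores - scores.max())
    return w / w.sum()

def weighted_inversion(target_acoustics: np.ndarray,
                       reference_acoustics: list[np.ndarray],
                       reference_inverters: list) -> np.ndarray:
    """Combine each reference speaker's inversion output using the speaker weights."""
    w = speaker_weights(target_acoustics, reference_acoustics)
    predictions = [inv(target_acoustics) for inv in reference_inverters]  # each (T, n_articulators)
    return np.tensordot(w, np.stack(predictions), axes=1)

# Tiny demo with random features and two linear "inverters" (illustrative only).
rng = np.random.default_rng(0)
refs = [rng.normal(size=(50, 13)) for _ in range(2)]
inverters = [lambda a, M=rng.normal(size=(13, 6)): a @ M for _ in range(2)]
target = rng.normal(size=(40, 13))
print(weighted_inversion(target, refs, inverters).shape)  # (40, 6)
```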

    Lexical Stress Realization in Mandarin Second Language Learners of English: An Acoustic and Articulatory Study

    This dissertation investigated the acoustic and articulatory correlates of lexical stress in Mandarin second language (L2) learners of English, as well as in first language (L1) speakers. The study used minimal pairs differing in stress location (e.g., OBject versus obJECT) obtained from a publicly available Mandarin-accented English electromagnetic articulography corpus. In the acoustic domain, the use of acoustic parameters (duration, intensity, F0, and vowel quality) was measured in stressed and unstressed vowels. In the articulatory domain, positional information for the tongue tip (TT), tongue dorsum (TD), upper lip (UL), lower lip (LL), and jaw (JAW) was retrieved from the concurrent vowel data. Finally, the correlation between acoustic and articulatory measures was computed and compared both within and across groups. The acoustic analysis demonstrated that L2 speakers significantly differentiated stressed from unstressed vowels using all suprasegmental cues, whereas vowel quality was used only to a very limited extent in the L2 group. In the articulatory analysis, Mandarin L2 speakers showed an extremely limited lexical stress effect: a significant difference as a function of lexical stress was noted only in the vertical dimension of low-back vowels. The acoustic and articulatory correlation results revealed a relatively weaker correlation in L2 speakers than in L1 speakers. In the L2 group, certain articulators, such as TD and JAW, demonstrated a stronger correlation than LL and TT.
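
    As a rough illustration of the correlation analysis mentioned in this abstract, the sketch below computes a Pearson correlation between one per-vowel acoustic measure and one articulator position. The variable names and values are fabricated placeholders, not measurements from the corpus.

```python
# Sketch: correlate a per-vowel acoustic measure with an articulator position.
import numpy as np
from scipy.stats import pearsonr

# One value per vowel token (stressed and unstressed), fabricated for illustration.
vowel_duration_ms = np.array([142.0, 95.0, 160.0, 88.0, 150.0, 101.0])
td_vertical_mm    = np.array([ 11.2,  9.8,  12.0,  9.5,  11.6,  10.1])

r, p = pearsonr(vowel_duration_ms, td_vertical_mm)
print(f"duration ~ TD height: r = {r:.2f}, p = {p:.3f}")
```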

    MISPRONUNCIATION DETECTION AND DIAGNOSIS IN MANDARIN ACCENTED ENGLISH SPEECH

    This work presents the development, implementation, and evaluation of a Mispronunciation Detection and Diagnosis (MDD) system, with application to pronunciation evaluation of Mandarin-accented English speech. A comprehensive detection and diagnosis of errors in the Electromagnetic Articulography corpus of Mandarin-Accented English (EMA-MAE) was performed using the expert phonetic transcripts and an Automatic Speech Recognition (ASR) system. Articulatory features derived from the parallel kinematic data available in the EMA-MAE corpus were used to identify the most significant articulatory error patterns seen in L2 speakers during common mispronunciations. Using both acoustic and articulatory information, an ASR-based MDD system was built and evaluated across different feature combinations and Deep Neural Network (DNN) architectures. The MDD system captured mispronunciation errors with a detection accuracy of 82.4%, a diagnostic accuracy of 75.8%, and a false rejection rate of 17.2%. The results demonstrate the advantage of using articulatory features both in revealing the significant contributors to mispronunciation and in improving the performance of MDD systems.
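
    The figures quoted above follow the usual MDD confusion-matrix conventions; the sketch below shows how detection accuracy, false rejection rate, and diagnostic accuracy are typically computed from such counts. The counts themselves are placeholders, not taken from the EMA-MAE evaluation.

```python
# Sketch of commonly used MDD evaluation metrics computed from confusion counts.

def mdd_metrics(true_accept: int, false_reject: int,
                false_accept: int, true_reject: int,
                correct_diagnosis: int) -> dict:
    """true_reject = mispronunciations detected; correct_diagnosis = detected
    mispronunciations whose error type was also identified correctly."""
    total = true_accept + false_reject + false_accept + true_reject
    return {
        "detection_accuracy": (true_accept + true_reject) / total,
        "false_rejection_rate": false_reject / (true_accept + false_reject),
        "diagnostic_accuracy": correct_diagnosis / true_reject,
    }

# Placeholder counts for illustration only.
print(mdd_metrics(true_accept=800, false_reject=120,
                  false_accept=60, true_reject=420,
                  correct_diagnosis=300))
```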