635 research outputs found

    MISPRONUNCIATION DETECTION AND DIAGNOSIS IN MANDARIN ACCENTED ENGLISH SPEECH

    This work presents the development, implementation, and evaluation of a Mispronunciation Detection and Diagnosis (MDD) system, with application to pronunciation evaluation of Mandarin-accented English speech. Errors in the Electromagnetic Articulography corpus of Mandarin-Accented English (EMA-MAE) were comprehensively detected and diagnosed using the expert phonetic transcripts and an Automatic Speech Recognition (ASR) system. Articulatory features derived from the parallel kinematic data available in the EMA-MAE corpus were used to identify the most significant articulatory error patterns seen in L2 speakers during common mispronunciations. Using both acoustic and articulatory information, an ASR-based MDD system was built and evaluated across different feature combinations and Deep Neural Network (DNN) architectures. The MDD system captured mispronunciation errors with a detection accuracy of 82.4%, a diagnostic accuracy of 75.8%, and a false rejection rate of 17.2%. The results demonstrate the advantage of using articulatory features both in revealing the significant contributors to mispronunciation and in improving the performance of MDD systems.

    Primacy of mouth over eyes to perceive audiovisual Mandarin lexical tones

    The visual cues of lexical tones are more implicit and much less investigated than those of consonants and vowels, and it is still unclear which facial areas contribute to tone identification. This study investigated Chinese and English speakers’ eye movements when they were asked to identify audiovisual Mandarin lexical tones. The Chinese and English speakers were presented with audiovisual clips of Mandarin monosyllables (for instance, /ă/, /à/, /ĭ/, /ì/) and were asked to identify whether each syllable carried a dipping tone (/ă/, /ĭ/) or a falling tone (/à/, /ì/). These audiovisual syllables were presented in clear, noisy, and silent (absence of audio signal) conditions. An eye-tracker recorded the participants’ eye movements. Results showed that the participants gazed more at the mouth than at the eyes. In addition, when acoustic conditions became adverse, both the Chinese and English speakers increased their gaze duration at the mouth rather than at the eyes. The findings suggest that the mouth is the primary area that listeners utilise in their perception of audiovisual lexical tones. The similar eye movements of the Chinese and English speakers imply that the mouth acts as a perceptual cue that provides articulatory information, as opposed to social and pragmatic information.

    Sound Symbolism in Foreign Language Phonological Acquisition

    The paper investigates the idea of a symbolic nature of sounds and its implications for the acquisition of foreign language phonology. Firstly, it presents an overview of universal trends in phonetic symbolism, i.e. non-arbitrary representations of a phoneme by specific semantic criteria. Secondly, it discusses the results of a preliminary study on different manifestations of sound symbolism, including emotionally loaded representations of phonemes and other synaesthetic associations. Finally, it explores practical pedagogical implications of sound symbolism and presents a number of innovative classroom activities involving sound-symbolic associations.

    Phonetic complexity affects children’s Mandarin tone production accuracy in disyllabic words: A perceptual study

    This is the first study to examine the effect of phonetic contexts on children’s lexical tone production. Mandarin tones in disyllabic words produced by forty-four 2- to 6-year-old children and twelve mothers were low-pass filtered to eliminate lexical information. Native Mandarin-speaking adults categorized the tones based on the pitch information in the filtered stimuli. All mothers’ tones were categorized with ceiling accuracy. Counter to the findings in most previous studies on children’s tone acquisition and the prevailing assumption in models of speech development that children acquire suprasegmental features much earlier than segmental features, this study found that children as old as six years of age have not mastered the production of Mandarin tones. Children’s tones were judged with significantly lower accuracy than mothers’ productions. Tone accuracy improved, while cross-subject variability in tone accuracy decreased, with age. Children’s tone accuracy was affected by the articulatory complexity of phonetic contexts. Children made more errors in tone combinations with more complex fundamental frequency (F0) contours than in tone sequences with simpler F0 changes. When producing disyllabic tone sequences with complex F0 contours, children tended to shift the F0 contour of the first tone to reduce the F0 change, resulting in more tone errors in the first syllable than in the second syllable and showing substantially more anticipatory coarticulation than adults. The results provide further evidence that acquisition of lexical tones is a protracted process in children. Tones produced accurately by children in one phonetic context may not be produced correctly in another phonetic context. Children demonstrate more anticipatory coarticulation in their disyllabic productions than adults, which may be attributed to children’s immature speech motor control in tone production, and is presumably a by-product of their inability to accomplish complex F0 changes within the syllable time-frame.

    L2 speech learning of European Portuguese /l/ and /ɾ/ by L1-Mandarin learners: experimental evidence and theoretical modelling

    It has long been recognized that the poor distinction between /l/ and /ɾ/ is one of the most perceptible characteristics of Chinese-accented Portuguese. Recent empirical research has revealed that this notorious L2 speech learning difficulty goes beyond the confusion between two L2 categories, as L1-Mandarin learners’ acquisition of Portuguese /l/ and /ɾ/ seems to be subject to the interaction among different prosodic positions, speech modalities and representational levels. This thesis aims to deepen our current understanding of this L2 speech learning process by exploring what constrains the development of L2 phonological categories across syllable positions and how different modalities interact during this process. To achieve this goal, both experimental tasks and theoretical modelling were employed. The first study of this thesis explores the role of cross-linguistic influence and orthography in L2 category formation. In order to elicit cross-linguistic influence directly, a delayed-imitation task was performed with naïve L1-Mandarin listeners. This task examined how Mandarin phonology parses the Portuguese input ([l], [ɾ]) in intervocalic onset and in word-internal coda position. Moreover, whether orthography plays a role during the construction of L2 phonological representations was tested by manipulating the input types given in the experiment (auditory input alone vs. auditory + written input). Our study shows that the naïve Mandarin listeners’ responses were consistent with those of L1-Mandarin learners, suggesting that cross-linguistic influence is responsible for the observed L2 prosodic effects. Moreover, the Mandarin [ɻ] (a repair strategy for /ɾ/) occurred almost exclusively when the written form was given, providing evidence for the cross-linguistic interaction between phonological categorization and orthography during the construction of L2 categories.
In the second study, we first investigate the interaction between speech perception and production in L2 speech learning, by examining whether the L2 deviant productions stem from misperception and whether the order of acquisition in L2 speech perception mirrors that in production. Secondly, we test whether L2 phonological categories remain malleable at a mid-late stage of L2 speech learning. Two perceptual experiments were performed to test L1-Mandarin learners on their discrimination ability between the target Portuguese form and the deviant form employed in L2 production. Expanding on prior research, in this study, the perceptual motivation for L2 speech difficulties was assessed in different syllable constituents (onset and coda) and at both segmental and suprasegmental levels (structural modification). The results demonstrate that some deviant forms observed in L2 production indeed have a perceptual motivation ([w] for the velarised lateral; [l] and [ɾə] for the tap), while some others cannot be attributed to misperception (deletion of syllable-final tap). Furthermore, learners confused the intervocalic /l/ and /ɾ/ bidirectionally in perception, while in production they never misproduced the lateral (/ɾ/ → [l], */l/ → [ɾ]), revealing a mismatch between two speech modalities. By contrast, the order of acquisition (/ɾ/coda > /ɾ/onset) was shown to be consistent in L2 perception and production. The correspondence and discrepancy between the two speech modalities signal a complex relationship between L2 speech perception and production. To assess the plasticity of L2 categories /l/ and /ɾ/, two groups of L1-Mandarin learners who differ substantially in terms of L2 experience were recruited in the perceptual tasks. Our study shows that both groups behaved similarly in terms of the discrimination performance. No evidence for a role of L2 experience was found. The implication of this null result on L2 phonological development is discussed. 
The third study of the thesis aims to contribute to bridging the gap between the L2 experimental evidence and formal theories. Adopting the Bidirectional Phonology and Phonetics Model, we formalise some of the experimental findings that cannot be elucidated by current L2 speech theories, namely, the between and within-subject variation in L2 phonological categorization; the interaction between phonological categorization and orthography during L2 category construction; and the asymmetry between L2 perception and production. Overall, this thesis sheds light on the complex nature of L2 phonological acquisition and provides a formal account of how different modalities interact in shaping L2 speech learning. Moreover, it puts forward testable predictions for future research and suggestions for improving foreign language teaching/training methodologies.

    A syllable-based investigation of coarticulation

    Coarticulation has long been investigated in Speech Sciences and Linguistics (Kühnert & Nolan, 1999). This thesis explores coarticulation through a syllable-based model (Y. Xu, 2020). First, it is hypothesised that consonant and vowel are synchronised at the syllable onset for the sake of reducing temporal degrees of freedom, and that such synchronisation is the essence of coarticulation. Previous efforts in the examination of CV alignment mainly report onset asynchrony (Gao, 2009; Shaw & Chen, 2019). The first study of this thesis tested the synchrony hypothesis using articulatory and acoustic data in Mandarin. Departing from conventional approaches, a minimal triplet paradigm was applied, in which the CV onsets were determined through the consonant and vowel minimal pairs, respectively. Both articulatory and acoustic results showed that CV articulation started in close temporal proximity, supporting the synchrony hypothesis. The second study extended the research to English and to syllables with cluster onsets. By using acoustic data in conjunction with Deep Learning, supporting evidence was found for co-onset, in contrast to the widely reported c-center effect (Byrd, 1995). Secondly, the thesis investigated a mechanism that can maximise synchrony, Dimension Specific Sequential Target Approximation (DSSTA), which is highly relevant to what is commonly known as coarticulation resistance (Recasens & Espinosa, 2009). Evidence from the first two studies shows that, when conflicts arise between the articulation requirements of C and V, the CV gestures can be fulfilled by the same articulator on separate dimensions simultaneously. Finally, the last study tested the hypothesis that resyllabification is the result of coarticulation asymmetry between onset and coda consonants. It was found that neural-network-based models could infer the syllable affiliation of consonants, and that the inferred resyllabified codas had a coarticulatory structure similar to that of canonical onset consonants. In conclusion, this thesis found that many coarticulation-related phenomena, including local vowel-to-vowel anticipatory coarticulation, coarticulation resistance, and resyllabification, stem from the articulatory mechanism of the syllable.

    The Acoustic Features and Didactic Function of Foreigner-Directed Speech: A Scoping Review

    Published online: Aug 1, 2022. Purpose: This scoping review considers the acoustic features of a clear speech register directed to nonnative listeners known as foreigner-directed speech (FDS). We identify vowel hyperarticulation and low speech rate as the most representative acoustic features of FDS; other features, including wide pitch range and high intensity, are still under debate. We also discuss factors that may influence the outcomes and characteristics of FDS. We start by examining accommodation theories, outlining the reasons why FDS is likely to serve a didactic function by helping listeners acquire a second language (L2). We examine how this speech register adapts to listeners’ identities and linguistic needs, suggesting that FDS also takes listeners’ L2 proficiency into account. To confirm the didactic function of FDS, we compare it to other clear speech registers, specifically infant-directed speech and Lombard speech. Conclusions: Our review reveals that research has not yet established whether FDS succeeds as a didactic tool that supports L2 acquisition. Moreover, a complex set of factors determines specific realizations of FDS, which need further exploration. We conclude by summarizing open questions and indicating directions and recommendations for future research. This research was supported by a Doctoral Fellowship (LCF/BQ/DI19/11730045) from “La Caixa” Foundation (ID 100010434) awarded to Giorgio Piazza and by the Spanish Ministry of Science and Innovation through the Ramon y Cajal Research Fellowship (RYC2018-024284-I) awarded to Marina Kalashnikova. This research was supported by the Basque Government through the BERC 2022-2025 program and by the Spanish State Research Agency through BCBL Severo Ochoa excellence accreditation CEX2020-001010-S. This research was also supported by the Spanish Ministry of Economy and Competitiveness (PID2020-113926GB-I00 awarded to Clara D. Martin) and by the European Research Council under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement 819093 awarded to Clara D. Martin).