MISPRONUNCIATION DETECTION AND DIAGNOSIS IN MANDARIN ACCENTED ENGLISH SPEECH
This work presents the development, implementation, and evaluation of a Mispronunciation Detection and Diagnosis (MDD) system, with application to pronunciation evaluation of Mandarin-accented English speech. A comprehensive detection and diagnosis of errors in the Electromagnetic Articulography corpus of Mandarin-Accented English (EMA-MAE) was performed using the expert phonetic transcripts and an Automatic Speech Recognition (ASR) system. Articulatory features derived from the parallel kinematic data available in the EMA-MAE corpus were used to identify the most significant articulatory error patterns seen in L2 speakers during common mispronunciations. Using both acoustic and articulatory information, an ASR-based MDD system was built and evaluated across different feature combinations and Deep Neural Network (DNN) architectures. The MDD system captured mispronunciation errors with a detection accuracy of 82.4%, a diagnostic accuracy of 75.8% and a false rejection rate of 17.2%. The results demonstrate the advantage of using articulatory features both in revealing the significant contributors to mispronunciation and in improving the performance of MDD systems.
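The abstract reports detection accuracy, diagnostic accuracy and false rejection rate without defining them. The following is a minimal sketch, assuming the hierarchical phone-level definitions commonly used in MDD evaluation; the source may define the metrics differently. The names `MddCounts` and `mdd_metrics` and all counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MddCounts:
    """Phone-level outcome counts for an MDD system (hypothetical container)."""
    true_accept: int   # correctly pronounced phone, accepted as correct
    false_reject: int  # correctly pronounced phone, flagged as mispronounced
    false_accept: int  # mispronounced phone, accepted as correct
    correct_diag: int  # mispronounced phone, flagged, error type identified
    diag_error: int    # mispronounced phone, flagged, error type misidentified


def mdd_metrics(c: MddCounts) -> dict:
    """Compute MDD metrics under the assumed definitions (no zero-count guard)."""
    # Every flagged mispronunciation is either diagnosed correctly or not.
    true_reject = c.correct_diag + c.diag_error
    total = c.true_accept + c.false_reject + c.false_accept + true_reject
    return {
        "detection_accuracy": (c.true_accept + true_reject) / total,
        "diagnostic_accuracy": c.correct_diag / true_reject,
        "false_rejection_rate": c.false_reject / (c.true_accept + c.false_reject),
        "false_acceptance_rate": c.false_accept / (c.false_accept + true_reject),
    }


# Made-up counts for illustration only; these are not results from the study.
counts = MddCounts(true_accept=800, false_reject=200, false_accept=50,
                   correct_diag=120, diag_error=30)
metrics = mdd_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed definitions, the false rejection rate is computed only over correctly pronounced phones, while detection accuracy is computed over all phones, so the two figures need not sum to 100%.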
Primacy of mouth over eyes to perceive audiovisual Mandarin lexical tones
The visual cues of lexical tones are more implicit and much less investigated than those of consonants and vowels, and it is still unclear which facial areas contribute to the identification of lexical tones. This study investigated Chinese and English speakers’ eye movements when they were asked to identify audiovisual Mandarin lexical tones. The Chinese and English speakers were presented with audiovisual clips of Mandarin monosyllables (for instance, /ă/, /à/, /ĭ/, /ì/) and were asked to identify whether the syllables carried a dipping tone (/ă/, /ĭ/) or a falling tone (/à/, /ì/). These audiovisual syllables were presented in clear, noisy and silent (absence of audio signal) conditions. An eye-tracker recorded the participants’ eye movements. Results showed that the participants gazed more at the mouth than at the eyes. In addition, when acoustic conditions became adverse, both the Chinese and English speakers increased their gaze duration at the mouth rather than at the eyes. The findings suggest that the mouth is the primary area that listeners utilise in their perception of audiovisual lexical tones. The similar eye movements of the Chinese and English speakers imply that the mouth acts as a perceptual cue that provides articulatory information, as opposed to social and pragmatic information.
Sound Symbolism in Foreign Language Phonological Acquisition
The paper investigates the idea of a symbolic nature of sounds and its implications for the acquisition of foreign language phonology. Firstly, it presents an overview of universal trends in phonetic symbolism, i.e. non-arbitrary representations of a phoneme by specific semantic criteria. Secondly, it discusses the results of a preliminary study on different manifestations of sound symbolism, including emotionally loaded representations of phonemes and other synaesthetic associations. Finally, it explores practical pedagogical implications of sound symbolism and presents a number of innovative classroom activities involving sound-symbolic associations.
Phonetic complexity affects children’s Mandarin tone production accuracy in disyllabic words: A perceptual study
This is the first study to examine the effect of phonetic contexts on children’s lexical tone production. Mandarin tones in disyllabic words produced by forty-four 2- to 6-year-old children and twelve mothers were low-pass filtered to eliminate lexical information. Native Mandarin-speaking adults categorized the tones based on the pitch information in the filtered stimuli. All mothers’ tones were categorized with ceiling accuracy. Counter to the findings in most previous studies on children’s tone acquisition and the prevailing assumption in models of speech development that children acquire suprasegmental features much earlier than segmental features, this study found that children as old as six years of age have not mastered the production of Mandarin tones. Children’s tones were judged with significantly lower accuracy than mothers’ productions. Tone accuracy improved, while cross subject variability in tone accuracy decreased, with age. Children’s tone accuracy was affected by the articulatory complexity of phonetic contexts. Children made more errors in tone combinations with more complex fundamental frequency (F0) contours than tone sequences with simpler F0 changes. When producing disyllabic tone sequences with complex F0 contours, children tended to shift the F0 contour of the first tone to reduce the F0 change, resulting in more tone errors in the first syllable than in the second syllable and showing substantially more anticipatory coarticulation than adults. The results provide further evidence that acquisition of lexical tones is a protracted process in children. Tones produced accurately by children in one phonetic context may not be produced correctly in another phonetic context. 
Children demonstrate more anticipatory coarticulation in their disyllabic productions than adults, which may be attributed to children’s immature speech motor control in tone production, and is presumably a by-product of their inability to accomplish complex F0 changes within the syllable time-frame.
L2 speech learning of European Portuguese /l/ and /ɾ/ by L1-Mandarin learners: experimental evidence and theoretical modelling
It has long been recognized that the poor distinction between /l/ and /ɾ/ is one
of the most perceptible characteristics in Chinese-accented Portuguese. Recent
empirical research revealed that this notorious L2 speech learning difficulty
goes beyond the confusion between two L2 categories, as L1-Mandarin learners’
acquisition of Portuguese /l/ and /ɾ/ seems to be subject to the interaction
among different prosodic positions, speech modalities and representational
levels. This thesis aims to deepen our current understanding of this L2 speech
learning process, by exploring what constrains the development of L2
phonological categories across syllable positions and how different modalities
interact during this process. To achieve this goal, both experimental tasks and
theoretical modelling were employed.
The first study of this thesis explores the role of cross-linguistic influence
and orthography on L2 category formation. In order to elicit cross-linguistic
influence directly, a delayed-imitation task was performed with L1-Mandarin
naïve listeners. This task examined how the Mandarin phonology parses the
Portuguese input ([l], [ɾ]) in intervocalic onset and in word-internal coda
position. Moreover, whether orthography plays a role during the construction
of L2 phonological representation was tested by manipulating the input types
that were given in the experiment (auditory input alone vs. auditory + written
input). Our study shows that naïve Mandarin listeners’ responses were consistent
with those of L1-Mandarin learners, suggesting that cross-linguistic influence is
responsible for the observed L2 prosodic effects. Moreover, the Mandarin [ɻ] (a
repair strategy for /ɾ/) occurred almost exclusively when the written form was
given, providing evidence for the cross-linguistic interaction between
phonological categorization and orthography during the construction of L2
categories.
In the second study, we first investigate the interaction between speech
perception and production in L2 speech learning, by examining whether the L2
deviant productions stem from misperception and whether the order of
acquisition in L2 speech perception mirrors that in production. Secondly, we
test whether L2 phonological categories remain malleable at a mid-late stage of
L2 speech learning. Two perceptual experiments were performed to test
L1-Mandarin learners’ ability to discriminate between the target
Portuguese form and the deviant form employed in L2 production. Expanding
on prior research, in this study, the perceptual motivation for L2 speech
difficulties was assessed in different syllable constituents (onset and coda) and
at both segmental and suprasegmental levels (structural modification). The
results demonstrate that some deviant forms observed in L2 production indeed
have a perceptual motivation ([w] for the velarised lateral; [l] and [ɾə] for the
tap), while some others cannot be attributed to misperception (deletion of
syllable-final tap). Furthermore, learners confused the intervocalic /l/ and /ɾ/
bidirectionally in perception, while in production they never misproduced the
lateral (/ɾ/ → [l], */l/ → [ɾ]), revealing a mismatch between two speech
modalities. By contrast, the order of acquisition (/ɾ/coda > /ɾ/onset) was shown to
be consistent in L2 perception and production. The correspondence and
discrepancy between the two speech modalities signal a complex relationship
between L2 speech perception and production. To assess the plasticity of L2
categories /l/ and /ɾ/, two groups of L1-Mandarin learners who differ
substantially in terms of L2 experience were recruited in the perceptual tasks.
Our study shows that both groups performed similarly in the discrimination
tasks. No evidence for a role of L2 experience was found.
The implication of this null result for L2 phonological development is discussed.
The third study of the thesis aims to contribute to bridging the gap between
the L2 experimental evidence and formal theories. Adopting the Bidirectional
Phonology and Phonetics Model, we formalise some of the experimental
findings that cannot be elucidated by current L2 speech theories, namely, the
between- and within-subject variation in L2 phonological categorization; the
interaction between phonological categorization and orthography during L2
category construction; and the asymmetry between L2 perception and
production.
Overall, this thesis sheds light on the complex nature of L2 phonological
acquisition and provides a formal account of how different modalities interact
in shaping L2 speech learning. Moreover, it puts forward testable predictions
for future research and suggestions for improving foreign language
teaching/training methodologies.
It is well known that confusions involving /l/ and /ɾ/ are one of the most
perceptible characteristics of Portuguese as spoken by Chinese learners. Recent
empirical studies reveal that the difficulty faced by Chinese learners is not
limited to the moderate discrimination between the two L2 categories, since the
acquisition of Portuguese /l/ and /ɾ/ by Chinese learners seems to be subject to
the interaction among different prosodic contexts, speech modalities and
representational levels. This thesis aims to deepen our understanding of this L2
phonological acquisition process, exploring what conditions the development of
L2 phonological categories in different syllable constituents and how the
modalities interact during this process, drawing on both experimental tasks and
theoretical formalisation.
The first study examines the role of cross-linguistic influence and that of
orthography in the construction of L2 categories. To elicit cross-linguistic
influence directly, a delayed-imitation task was administered to native Mandarin
speakers with no knowledge of Portuguese, thereby investigating how Mandarin
phonology categorizes the Portuguese input ([l], [ɾ]) in intervocalic simple
onset and in word-internal coda. In addition, the influence of orthography on
the construction of L2 phonological representations was examined by manipulating
the type of input presented in the experiment (auditory input vs. auditory +
orthographic input). The results from the experimental condition in which
participants received both types of input replicated the prosodic effect
reported in the literature, providing evidence for the interaction between
phonological categorization and orthography in the construction of L2 categories.
In the second study, we investigate the interaction between speech perception
and production in the acquisition of European Portuguese liquids by Chinese
learners, and the plasticity of these phonological categories, addressing the
following questions: 1) do deviant L2 productions result from misperception?
2) is the order of acquisition in L2 consistent across perception and
production? 3) do L2 categories remain malleable at an intermediate stage of
acquisition? Two perceptual tasks were conducted to test the ability of
L1-Mandarin learners to discriminate between the Portuguese target form and the
deviant forms used in production. In the present study, the perceptual
motivation for L2 difficulties was tested in different syllable constituents
(simple onset and coda) and at the segmental and suprasegmental levels
(structural modification). The results demonstrate that some deviant forms
produced by Chinese learners have a perceptual motivation (i.e. [w] for the
velarised lateral; [l] and [ɾə] for the alveolar tap), while others cannot be
analysed as cases of misperception (as is the case for the deletion of the tap
in coda). Furthermore, in intervocalic position, learners have difficulty
discriminating between /l/ and /ɾ/ bidirectionally, but in production the
lateral is never produced incorrectly (/ɾ/ → [l], */l/ → [ɾ]). This reveals a
divergence between the two speech modalities. By contrast, the order of
acquisition (/ɾ/coda > /ɾ/onset) was shown to be consistent in L2 perception and
production. The correspondence and the discrepancy between the two speech
modalities signal a complex relationship between perception and production in
L2 phonological acquisition. Regarding the plasticity of L2 categories, two
groups of L1-Mandarin learners who differed substantially in L2 experience were
recruited for the perceptual tasks. No significant effect of L2 experience was
found. The implication of this null result for L2 phonological development was
discussed.
The third study of this thesis aims to contribute to bridging the gaps between
empirical L2 studies and formal theories. Adopting the Bidirectional Phonology
and Phonetics Model, we formalise the experimental results that current theories
of L2 phonological acquisition cannot explain, namely, the between- and
within-subject variation in L2 phonological categorization; the interaction
between phonological categorization and orthography in the construction of L2
categories; and the asymmetry between L2 perception and production.
In sum, this thesis contributes empirical data to the discussion of the complex
relationship among perception, production and orthography in L2 phonological
acquisition and formalises the interaction among these modalities through a
generative linguistic model. In addition, it presents testable predictions for
future research and suggestions for improving the methodologies for
teaching/training a non-native language.
A syllable-based investigation of coarticulation
Coarticulation has long been investigated in Speech Sciences and Linguistics (Kühnert &
Nolan, 1999). This thesis explores coarticulation through a syllable based model (Y. Xu,
2020). First, it is hypothesised that consonant and vowel are synchronised at the syllable
onset for the sake of reducing temporal degrees of freedom, and such synchronisation
is the essence of coarticulation. Previous efforts in the examination of CV alignment
mainly report onset asynchrony (Gao, 2009; Shaw & Chen, 2019). The first study of this
thesis tested the synchrony hypothesis using articulatory and acoustic data in Mandarin.
Departing from conventional approaches, a minimal triplet paradigm was applied, in
which the CV onsets were determined through the consonant and vowel minimal pairs,
respectively. Both articulatory and acoustical results showed that CV articulation started
in close temporal proximity, supporting the synchrony hypothesis. The second study
extended the research to English and syllables with cluster onsets. By using acoustic data
in conjunction with Deep Learning, supporting evidence was found for co-onset, which
is in contrast to the widely reported c-center effect (Byrd, 1995). Secondly, the thesis
investigated the mechanism that can maximise synchrony – Dimension Specific Sequential
Target Approximation (DSSTA), which is highly relevant to what is commonly known
as coarticulation resistance (Recasens & Espinosa, 2009). Evidence from the first two
studies shows that, when conflicts arise due to articulation requirements between CV, the
CV gestures can be fulfilled by the same articulator on separate dimensions simultaneously.
Last but not least, the final study tested the hypothesis that resyllabification is the result of
coarticulation asymmetry between onset and coda consonants. It was found that neural
network-based models could infer syllable affiliation of consonants, and those inferred
resyllabified codas had a coarticulatory structure similar to that of canonical onset
consonants. In
conclusion, this thesis found that many coarticulation related phenomena, including local
vowel to vowel anticipatory coarticulation, coarticulation resistance, and resyllabification,
stem from the articulatory mechanism of the syllable.
The Acoustic Features and Didactic Function of Foreigner-Directed Speech: A Scoping Review
Published online: Aug 1, 2022
Purpose: This scoping review considers the acoustic features of a clear
speech register directed to nonnative listeners known as foreigner-directed
speech (FDS). We identify vowel hyperarticulation and low speech rate as the
most representative acoustic features of FDS; other features, including wide
pitch range and high intensity, are still under debate. We also discuss factors
that may influence the outcomes and characteristics of FDS. We start by
examining accommodation theories, outlining the reasons why FDS is likely
to serve a didactic function by helping listeners acquire a second language
(L2). We examine how this speech register adapts to listeners’ identities and
linguistic needs, suggesting that FDS also takes listeners’ L2 proficiency into
account. To confirm the didactic function of FDS, we compare it to other
clear speech registers, specifically infant-directed speech and Lombard
speech.
Conclusions: Our review reveals that research has not yet established whether
FDS succeeds as a didactic tool that supports L2 acquisition. Moreover, a complex
set of factors determines specific realizations of FDS, which need further
exploration. We conclude by summarizing open questions and indicating directions
and recommendations for future research.
This research was supported by a Doctoral Fellowship
(LCF/BQ/DI19/11730045) from “La Caixa” Foundation
(ID 100010434) awarded to Giorgio Piazza and by the
Spanish Ministry of Science and Innovation through the
Ramon y Cajal Research Fellowship (RYC2018-024284-I)
awarded to Marina Kalashnikova. This research was supported
by the Basque Government through the BERC
2022-2025 program and by the Spanish State Research
Agency through BCBL Severo Ochoa excellence accreditation
CEX2020-001010-S. This research was also supported
by the Spanish Ministry of Economy and Competitiveness
(PID2020-113926GB-I00 awarded to Clara D. Martin)
and by the European Research Council under the European
Union’s Horizon 2020 research and innovation programme
(Grant Agreement 819093 awarded to Clara D.
Martin).