9 research outputs found

    Primacy of mouth over eyes to perceive audiovisual Mandarin lexical tones

    Get PDF
    The visual cues to lexical tones are more implicit and far less investigated than those to consonants and vowels, and it is still unclear which facial areas contribute to visual tone identification. This study investigated Chinese and English speakers’ eye movements when they were asked to identify audiovisual Mandarin lexical tones. The Chinese and English speakers were presented with audiovisual clips of Mandarin monosyllables (for instance, /ă/, /à/, /ĭ/, /ì/) and were asked to identify whether the syllables carried a dipping tone (/ă/, /ĭ/) or a falling tone (/à/, /ì/). These audiovisual syllables were presented in clear, noisy and silent (absence of audio signal) conditions. An eye-tracker recorded the participants’ eye movements. Results showed that the participants gazed more at the mouth than at the eyes. In addition, when acoustic conditions became adverse, both the Chinese and English speakers increased their gaze duration at the mouth rather than at the eyes. The findings suggest that the mouth is the primary area that listeners utilise in their perception of audiovisual lexical tones. The similar eye movement patterns of the Chinese and English speakers imply that the mouth acts as a perceptual cue that provides articulatory information, as opposed to social and pragmatic information.
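    The gaze measure used here boils down to summing fixation durations inside each area of interest (AOI), such as the mouth and eye regions. Below is a minimal Python sketch of that aggregation; the AOI bounds, record format, and numbers are illustrative assumptions, not taken from the study.

```python
# Sketch: total fixation time per area of interest (AOI).
# AOI bounds and the fixation record format are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Fixation:
    x: float           # gaze position in screen pixels
    y: float
    duration_ms: float

# Hypothetical rectangular AOIs: (x_min, y_min, x_max, y_max)
AOIS = {
    "eyes":  (300, 100, 500, 200),
    "mouth": (330, 320, 470, 420),
}

def gaze_duration_by_aoi(fixations):
    """Sum fixation durations (ms) falling inside each AOI."""
    totals = {name: 0.0 for name in AOIS}
    for f in fixations:
        for name, (x0, y0, x1, y1) in AOIS.items():
            if x0 <= f.x <= x1 and y0 <= f.y <= y1:
                totals[name] += f.duration_ms
    return totals

demo = [Fixation(400, 360, 250), Fixation(410, 380, 300), Fixation(350, 150, 120)]
print(gaze_duration_by_aoi(demo))  # {'eyes': 120.0, 'mouth': 550.0}
```

    Comparing such per-AOI totals across the clear, noisy, and silent conditions is what supports the claim that gaze shifts further toward the mouth as listening conditions degrade.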

    Training Children to Perceive Non-native Lexical Tones: Tone Language Background, Bilingualism, and Auditory-Visual Information

    Get PDF
    This study investigates the role of language background and bilingual status in the perception of foreign lexical tones. Eight groups of participants, children aged 6 and 8 years from each of four language background (tone or non-tone) × bilingual status (monolingual or bilingual) combinations (Thai monolingual, English monolingual, English-Thai bilingual, and English-Arabic bilingual), were trained to perceive the four Mandarin lexical tones. Half the children in each of these eight groups were given auditory-only (AO) training and half auditory-visual (AV) training. In each group, Mandarin tone identification was tested before and after (pre- and post-) training with both an auditory-only test (ao-test) and an auditory-visual test (av-test). The effect of training on Mandarin tone identification was minimal for the 6-year-olds. On the other hand, the 8-year-olds, particularly those with tone language experience, showed greater pre- to post-training improvement, and this was best indexed by ao-test trials. Bilingual vs. monolingual background did not facilitate overall improvement due to training, but it did modulate the efficacy of the training mode: for bilinguals both AO and AV training, and especially AO, resulted in performance gains, whereas for monolinguals training was most effective with AV stimuli. Again, this effect was best indexed by ao-test trials. These results suggest that tone language experience, be it monolingual or bilingual, is a strong predictor of learning unfamiliar tones; that monolinguals learn best from AV training trials and bilinguals from AO training trials; and that there is no metalinguistic advantage due to bilingualism in learning to perceive lexical tones.

    Chinese Tones: Can You Listen With Your Eyes? The Influence of Visual Information on Auditory Perception of Chinese Tones

    Get PDF
    Considering that more than half of the languages spoken in the world (60-70%) are so-called tone languages (Yip, 2002), and that tone is notoriously difficult for Westerners to learn, this dissertation focused on tone perception in Mandarin Chinese by tone-naïve speakers. Moreover, it has been shown that speech perception is more than just an auditory phenomenon, especially in situations where the speaker’s face is visible. Therefore, the aim of this dissertation was also to study the value of visual information (over and above that of acoustic information) in Mandarin tone perception for tone-naïve perceivers, in combination with contextual factors (such as speaking style) and individual factors (such as musical background). Accordingly, this dissertation assesses the relative strength of acoustic and visual information in tone perception and tone classification. In the first two empirical and exploratory studies, in Chapters 2 and 3, we set out to investigate to what extent tone-naïve perceivers are able to identify Mandarin Chinese tones in isolated words, whether they can benefit from seeing the speaker’s face, and what the contribution is of a hyperarticulated speaking style and/or their own musical experience. In Chapter 2 we investigated the effect of visual cues (comparing audio-only with audio-visual presentations) and speaking style (comparing a natural speaking style with a teaching speaking style) on the perception of Mandarin tones by tone-naïve listeners, looking both at the relative strength of these two factors and at their possible interactions; Chapter 3 was concerned with the effects of the participants’ musicality (combined with modality) on Mandarin tone perception. In both studies, a Mandarin Chinese tone identification experiment was conducted: native speakers of a non-tonal language were asked to distinguish Mandarin Chinese tones based on audio-only or audio-visual materials. In order to include variation, the experimental stimuli were recorded using four different speakers in imagined natural and teaching speaking scenarios. The participants’ proportions of correct responses (and average reaction times) were reported. The tone identification experiment presented in Chapter 2 showed that the video conditions (audio-visual natural and audio-visual teaching) resulted in overall higher accuracy in tone perception than the audio-only conditions (audio-only natural and audio-only teaching), but no reaction-time advantage was observed for the audio-visual conditions over the audio-only conditions. Teaching style turned out to make no difference to the speed or accuracy of Mandarin tone perception (as compared to a natural speaking style). In Chapter 3 we presented the same experimental materials and procedure, but now with musicians and non-musicians as participants. The Goldsmiths Musical Sophistication Index (Gold-MSI) was used to assess the musical aptitude of the participants. The data showed that, overall, musicians outperformed non-musicians in the tone identification task in both audio-visual and audio-only conditions. Both groups identified tones more accurately in the audio-visual conditions than in the audio-only conditions. 
These results provided further evidence for the view that the availability of visual cues alongside auditory information is useful for people who have no knowledge of Mandarin Chinese tones when they need to learn to identify these tones. Of all the musical skills measured by the Gold-MSI, the amount of musical training was the only predictor that had an impact on the accuracy of Mandarin tone perception. These findings suggest that learning to perceive Mandarin tones benefits from musical expertise, and that visual information can facilitate Mandarin tone identification, but mainly for tone-naïve non-musicians. In addition, performance differed by tone: musicality improved accuracy for every tone, but some tones were easier to identify than others; in particular, the identification of tone 3 (the low-falling-rising tone) proved to be the easiest, while tone 4 (the high-falling tone) was the most difficult to identify for all participants. The results of the first two experiments, presented in Chapters 2 and 3, showed that adding visual cues to clear auditory information facilitated tone identification for tone-naïve perceivers (accuracy was significantly higher in the audio-visual conditions than in the audio-only conditions). This visual facilitation was unaffected by the (hyperarticulated) speaking style and by the musical skill of the participants. Moreover, variation between speakers and tones affected the accuracy with which tone-naïve perceivers identified Mandarin tones. In Chapter 4, we compared the relative contribution of auditory and visual information during Mandarin Chinese tone perception. More specifically, we aimed to answer two questions: first, whether there is audio-visual integration at the tone level (i.e., we explored perceptual fusion between auditory and visual information); second, how visual information affects tone perception for native speakers and non-native (tone-naïve) speakers. To do this, we constructed various tone combinations of congruent (e.g., an auditory tone 1 paired with a visual tone 1, written as AxVx) and incongruent (e.g., an auditory tone 1 paired with a visual tone 2, written as AxVy) audio-visual materials and presented them to native speakers of Mandarin Chinese and speakers of non-tonal languages. Accuracy, defined as the percentage of correct identifications of a tone based on its auditory realization, was reported. We found that visual information did not significantly contribute to tone identification for native speakers of Mandarin Chinese. When there was a discrepancy between visual cues and acoustic information, participants (native and tone-naïve alike) tended to rely more on the auditory input than on the visual cues. Unlike the native speakers of Mandarin Chinese, the tone-naïve participants were significantly influenced by the visual information during audio-visual integration, and they identified tones more accurately in congruent stimuli than in incongruent stimuli. In line with our previous work, the tone confusion matrix showed that tone identification varied with individual tones, with tone 3 (the low-dipping tone) being the easiest to identify, whereas tone 4 (the high-falling tone) was the most difficult. 
The results showed no evidence of audio-visual integration among the native participants, while visual information was helpful for the tone-naïve participants. However, even for this group, visual information only marginally increased accuracy in the tone identification task, and this increase depended on the tone in question. Chapter 5 again zooms in on the relative strength of auditory and visual information for tone-naïve perceivers, but from the perspective of tone classification. In this chapter, we studied the acoustic and visual features of the tones produced by native speakers of Mandarin Chinese. Computational models based on acoustic features, visual features, and combined acoustic-visual features were constructed to automatically classify Mandarin tones. Moreover, this study examined what perceivers pick up (perception) from what a speaker does (production, facial expression) by studying both production and perception. More specifically, this chapter set out to answer three questions: (1) which acoustic and visual features of tones produced by native speakers can be used to automatically classify Mandarin tones; (2) whether the features used in tone production are similar to or different from those that have cue value for tone-naïve perceivers when they categorize tones; and (3) whether and how visual information (i.e., facial expression and facial pose) contributes to the classification of Mandarin tones over and above the information provided by the acoustic signal. To address these questions, the stimuli that had been recorded (and described in Chapter 2) and the response data that had been collected (and reported on in Chapter 3) were used. Basic acoustic and visual features were extracted, and based on these we used Random Forest classification to identify the most important acoustic and visual features for classifying the tones. The classifiers were trained on produced-tone classification (given a set of auditory and visual features, predict the produced tone) and on perceived/responded-tone classification (given a set of features, predict the tone as identified by the participant). The results showed that acoustic features outperformed visual features for tone classification, both for the produced and for the perceived tone. However, tone-naïve perceivers did resort to visual information in certain cases (when they gave wrong responses). So, visual information does not seem to play a significant role in native speakers’ tone production, but tone-naïve perceivers do sometimes consider visual information in their tone identification. These findings provided additional evidence that auditory information is more important than visual information in Mandarin tone perception and tone classification. Notably, visual features contributed to the participants’ erroneous performance, suggesting that visual information actually misled tone-naïve perceivers in their tone identification task. To some extent, this is consistent with our claim that visual cues do influence tone perception. In addition, the ranking of the auditory and visual features in tone perception showed that the factor perceiver (i.e., the participant) accounted for the largest amount of variance explained in the responses of our tone-naïve participants, indicating the importance of individual differences in tone perception. 
To sum up, perceivers who do not have tone in their language background tend to make use of visual cues from the speaker’s face when perceiving unknown tones (Mandarin Chinese in this dissertation), in addition to the auditory information they clearly also use. However, auditory cues remain the primary source they rely on. A consistent finding across the studies is that variation between tones, speakers and participants affects the accuracy of tone identification for tone-naïve perceivers.
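    The Random Forest approach described above can be illustrated with a short sketch: train a classifier on per-token acoustic and visual features and rank the features by importance. The feature names and synthetic data below are placeholder assumptions for illustration; the dissertation's actual feature set and pipeline are not reproduced here.

```python
# Sketch: Random Forest tone classification from acoustic + visual features.
# Feature names and the synthetic data are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical per-token features: F0 statistics (acoustic) alongside
# head and eyebrow movement measures (visual).
feature_names = ["f0_mean", "f0_slope", "f0_range", "duration",
                 "head_pitch", "eyebrow_raise", "lip_aperture"]
X = rng.normal(size=(200, len(feature_names)))  # 200 synthetic tokens
y = rng.integers(1, 5, size=200)                # tones 1-4

clf = RandomForestClassifier(n_estimators=300, random_state=0)
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())

# Rank features by importance, the basis for comparing acoustic vs. visual cues.
clf.fit(X, y)
for name, importance in sorted(zip(feature_names, clf.feature_importances_),
                               key=lambda p: -p[1]):
    print(f"{name:14s} {importance:.3f}")
```

    The same setup can be trained twice, once with the produced tone as the label and once with the participant's response, which is how the dissertation compares production-side and perception-side cue value.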

    Can an Accelerated Intervention Close the School Readiness Gap for Disadvantaged Children? An Evaluation of the Effects of the LEARN Project’s Summer Pre-Primary Program on Literacy Outcomes in Northern Lao PDR

    Get PDF
    Developed against the backdrop of Sustainable Development Goal 4, as well as a global trend towards rigorous assessment of early childhood programs, this thesis answers questions about the effects of an accelerated school readiness intervention for non-Lao children in disadvantaged communities of the Lao People’s Democratic Republic. Through a longitudinal, cluster randomized controlled trial, the study employs multi-level regression with an analytical sample of 391 children to examine the outcomes of a summer pre-primary program piloted from 2015 to 2018 by the Lao government with support from Plan International and Save the Children International under the Dubai Cares-funded Lao Educational Access, Research, and Networking (LEARN) Project. The research questions are investigated through a design in which the same panel of children is assessed against a control group at three intervals using the Measurement of Development and Early Learning. The thesis identifies significant associations between receiving the treatment and achieving higher gain scores on several emergent literacy tasks between baseline and midline, with effect sizes roughly in line with similar interventions in other contexts. At the same time, the thesis finds that those effects had largely faded by endline. An interaction between treatment and ethnicity was evident in only a few instances, suggesting that the intervention may have boosted school readiness more for Khmu children by the start of grade 1 and more for Hmong children during grade 1. The thesis raises important recommendations about how to improve the fit between the ultimate objectives of accelerated interventions, the evaluations they undergo, and the needs of the broader education system. It also contributes new knowledge by interrogating a global assessment paradigm through a comparative linguistic lens, so that forthcoming evaluations can benefit from the lessons learned from LEARN’s attempt to fit a square peg into a unique alpha-syllabic, tonal Southeast Asian language.
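    As a rough illustration of the analytical approach, the gain-score comparison in a cluster randomized trial can be modelled as a mixed-effects regression with a random intercept per cluster. The variable names and simulated data below are assumptions for illustration; the thesis's actual model specification is not reproduced here.

```python
# Sketch: cluster-randomized gain-score analysis with a random intercept
# per cluster. Variable names and data are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_clusters, n_per = 20, 20
df = pd.DataFrame({
    "cluster": np.repeat(np.arange(n_clusters), n_per),
    # Treatment is assigned at the cluster level, as in a cluster RCT.
    "treatment": np.repeat(rng.integers(0, 2, n_clusters), n_per),
})
cluster_effect = rng.normal(0, 0.3, n_clusters)[df["cluster"]]
df["gain"] = 0.2 * df["treatment"] + cluster_effect + rng.normal(0, 1, len(df))

# The random intercept per cluster accounts for within-cluster correlation.
model = smf.mixedlm("gain ~ treatment", df, groups=df["cluster"]).fit()
print(model.summary())
```

    An interaction term (e.g. treatment × ethnicity) could be added to the formula to probe differential effects like those reported for Khmu and Hmong children.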

    Audiovisual perception of Mandarin lexical tones.

    Get PDF
    It has been widely acknowledged that visual information from a talker’s face, mouth and lip movements plays an important role in the perception of spoken languages. Visual information facilitates speech perception in the audiovisual congruent condition and can even alter speech perception in the audiovisual incongruent condition. Audiovisual speech perception has been extensively researched in terms of consonants and vowels, and it has been thought that visual information from articulatory movements conveys phonetic information (e.g. place of articulation) that facilitates or changes speech perception. However, some research points to another type of visual information, which conveys non-phonetic information (e.g. timing cues) that affects speech perception. The existence of these two types of visual information in the audiovisual integration process suggests that there are two levels of audiovisual speech integration at different stages of processing. The studies in this dissertation focused on audiovisual perception of Mandarin lexical tones. The results of the experiments, which employed behavioural and event-related potential measures, provided evidence that visual information has an effect on auditory lexical tone perception. First, lexical tone perception benefits from the addition of visual information about the corresponding articulatory movement. Second, the perceived duration of lexical tones is changed by incongruent visual information. Moreover, the studies revealed that two types of visual information, a timing (non-phonetic) cue and a tone duration (phonetic/tonetic) cue, are involved in the audiovisual integration process for Mandarin lexical tones. This finding further supports the view that audiovisual speech perception comprises non-phonetic and phonetic-specific levels of processing: non-phonetic audiovisual integration could start at an early stage, while phonetic-specific audiovisual integration could occur at a later stage of processing. Lexical tones have not received much attention in research on audiovisual speech perception. The current studies fill this gap in the research on Mandarin lexical tone perception, and the findings from these experiments have important theoretical implications for audiovisual speech processing.

    Visual cues in Mandarin tone perception

    No full text
    This paper presents results concerning the exploitation of visual cues in the perception of Mandarin tones. The lower part of a female speaker's face was recorded on digital video as she uttered 25 sets of syllabic tokens covering the four different tones of Mandarin. In a subsequent perception study, the audio track alone, as well as an audio-plus-video condition, was presented to native Mandarin speakers, who were required to decide which tone they perceived. Audio was presented in various conditions: clear, babble-noise masked at different SNR levels, and devoiced and amplitude-modulated noise conditions created using LPC resynthesis. In the devoiced and clear audio conditions, there was little improvement over audio alone from the addition of video. However, the addition of visual information did significantly improve perception in the babble-noise masked condition, and this effect increased with decreasing SNR. This outcome suggests that the improvement in noise-masked conditions is not due to additional information in the video per se, but rather to an effect of early integration of acoustic and visual cues facilitating auditory-visual speech perception.
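    Masking speech with babble at a target SNR amounts to scaling the noise so that the signal-to-noise power ratio hits the desired level before mixing. A minimal sketch, with synthetic signals standing in for the study's recordings:

```python
# Sketch: mix a speech signal with babble noise at a target SNR (dB).
# The signals here are synthetic stand-ins for the study's recordings.
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals snr_db, then add."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.linspace(0, 1, 16000))  # 1 s tone-like signal
babble = rng.normal(size=16000)                              # noise stand-in

for snr in (0, -6, -12):  # decreasing SNR = increasingly adverse condition
    mixed = mix_at_snr(speech, babble, snr)
    noise_part = mixed - speech
    realised = 10 * np.log10(np.mean(speech ** 2) / np.mean(noise_part ** 2))
    print(f"target {snr:+d} dB -> realised {realised:+.1f} dB")
```

    Presenting the same tokens at progressively lower SNRs is what lets the study show that the visual benefit grows as the acoustic signal degrades.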
