
    Speaker matters: Natural inter-speaker variation affects 4-month-olds’ perception of audio-visual speech

    First Published September 27, 2019.

    In the language development literature, studies often make inferences about infants' speech perception abilities based on their responses to a single speaker. However, there can be significant natural variability across speakers in how speech is produced (i.e., inter-speaker differences). The current study examined whether inter-speaker differences can affect infants' ability to detect a mismatch between the auditory and visual components of vowels. Using an eye-tracker, 4.5-month-old infants were tested on auditory-visual (AV) matching for two vowels (/i/ and /u/). Critically, infants were tested with two speakers who naturally differed in how distinctively they articulated the two vowels within and across the categories. Only infants who watched and listened to the speaker whose visual articulations of the two vowels were most distinct from one another were sensitive to AV mismatch. This speaker also produced a visually more distinct /i/ than the other speaker did. These findings suggest that infants are sensitive to the distinctiveness of AV information across speakers, and that characteristics of the speaker should be taken into account when making inferences about infants' perceptual abilities.

    The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This research was funded by grant PSI2014-5452-P from the Spanish Ministry of Economy and Competitiveness to M.M. The authors also acknowledge financial support from the ‘Severo Ochoa Program for Centers/Units of Excellence in R&D’ (SEV-2015-490) and from the Basque Government ‘Programa Predoctoral’ to J.P.
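
    For illustration only: in preferential-looking paradigms like this one, sensitivity to AV mismatch is often quantified as each infant's proportion of looking time on matching versus mismatching trials. A minimal Python sketch, with all data hypothetical (the paper's actual analysis may differ):

        import statistics

        def av_mismatch_preference(match_ms, mismatch_ms):
            """Proportion of total looking time spent on matching trials.
            0.5 = no preference; deviations in either direction indicate
            sensitivity to the AV (mis)match."""
            total = match_ms + mismatch_ms
            return match_ms / total if total > 0 else float("nan")

        # Hypothetical per-infant summed looking times (ms) per condition:
        infants = [(5200, 4100), (4800, 4900), (6100, 4300)]
        scores = [av_mismatch_preference(m, mm) for m, mm in infants]
        print(f"mean preference: {statistics.mean(scores):.3f}")  # 0.5 = chance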

    The development of audiovisual vowel processing in monolingual and bilingual infants: a cross-sectional and longitudinal study.

    127 p.

    The aim of the current dissertation is to investigate to what extent infants acquiring one language (monolinguals) and infants acquiring two languages (bilinguals) share strategies during audiovisual speech processing. The dissertation focuses on typically developing Basque and Spanish monolingual and bilingual infants' processing of matching and mismatching audio-visual vowels at 4 and 8 months of age. Using an eye-tracker, the infants' attention to audiovisual match versus mismatch conditions, and to the speakers' eyes versus mouth, was measured in a cross-sectional and a longitudinal design. The cross-sectional data revealed that bilingual and monolingual infants exhibited similar audiovisual matching ability. Furthermore, they exhibited similar looking patterns: at 4 months of age, monolinguals and bilinguals attended more to the speakers' eyes, whereas at 8 months of age they attended equally to the eyes and the mouth. Finally, the longitudinal data revealed that infants' attention to the eyes versus the mouth is correlated between 4 and 8 months of age, regardless of linguistic group. Taken together, the current research demonstrated no clear difference in audiovisual vowel processing between monolingual and bilingual infants. Overall, the dissertation makes fundamental contributions to understanding the processes underlying language acquisition across linguistically diverse populations.
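
    For illustration, the eyes-versus-mouth measure described above can be expressed as a simple attention index over eye-tracker samples coded by area of interest (AOI). A minimal Python sketch with a hypothetical coding scheme, not the dissertation's actual pipeline:

        def eyes_mouth_index(samples):
            """samples: iterable of AOI labels per eye-tracker sample,
            e.g. 'eyes', 'mouth', or 'other'. Returns eyes / (eyes + mouth):
            0.5 means equal attention, >0.5 an eye preference."""
            eyes = sum(1 for s in samples if s == "eyes")
            mouth = sum(1 for s in samples if s == "mouth")
            return eyes / (eyes + mouth) if (eyes + mouth) else float("nan")

        # Hypothetical single-trial sample labels:
        trial = ["eyes"] * 180 + ["mouth"] * 120 + ["other"] * 40
        print(f"eyes/mouth index: {eyes_mouth_index(trial):.2f}")  # 0.60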

    Multi-Level Audio-Visual Interactions in Speech and Language Perception

    That we perceive our environment as a unified scene rather than as individual streams of auditory, visual, and other sensory information has recently provided motivation to move past the long-held tradition of studying these systems separately. Although the senses are each unique in their transduction organs, neural pathways, and primary cortical areas, they are ultimately merged in a meaningful way that allows us to navigate the multisensory world. Investigating how the senses are merged has become an increasingly wide field of research in recent decades, with the introduction and increased availability of neuroimaging techniques. Areas of study range from multisensory object perception to cross-modal attention, multisensory interactions, and integration. This thesis focuses on audio-visual speech perception, with a special focus on the facilitatory effects of visual information on auditory processing. When visual information is concordant with auditory information, it provides an advantage that is measurable in behavioral response times and evoked auditory fields (Chapter 3) and in increased entrainment to multisensory periodic stimuli, as reflected by steady-state responses (Chapter 4). When the audio-visual information is incongruent, the two streams can often, but not always, combine to form a third, physically absent percept (known as the McGurk effect). This effect is investigated (Chapter 5) using real-word stimuli. McGurk percepts were not robustly elicited for a majority of stimulus types, but patterns of responses suggest that the physical and lexical properties of the auditory and visual stimuli may affect the likelihood of obtaining the illusion. Together, these experiments add to the growing body of knowledge suggesting that audio-visual interactions occur at multiple stages of processing.

    Comprehension in-situ: how multimodal information shapes language processing

    The human brain supports communication in dynamic face-to-face environments, where spoken words are embedded in linguistic discourse and accompanied by multimodal cues such as prosody, gestures, and mouth movements. However, we have only limited knowledge of how these multimodal cues jointly modulate language comprehension. In a series of behavioural and EEG studies, we investigated the joint impact of these cues when processing naturalistic-style materials. First, we built a mouth-informativeness corpus of English words, to quantify the mouth informativeness of a large number of words used in the following experiments. Then, across two EEG studies, we found and replicated that native English speakers use multimodal cues and that their interactions dynamically modulate the N400 amplitude elicited by words that are less predictable in the discourse context (indexed by per-word surprisal values). We then extended the findings to second-language comprehenders, finding that multimodal cues modulate L2 comprehension just as in L1, but to a lesser extent, although L2 comprehenders benefit more from meaningful gestures and mouth movements. Finally, in two behavioural experiments investigating whether multimodal cues jointly modulate the learning of new concepts, we found some evidence that the presence of iconic gestures improves memory, and that the effect may be larger when the information is also presented with prosodic accentuation. Overall, these findings suggest that real-world comprehension uses all available cues and weights them dynamically. Multimodal cues should therefore not be neglected in language studies. Investigating communication in naturalistic contexts containing more than one cue can provide new insight into our understanding of language comprehension in the real world.
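
    For illustration, surprisal (the predictability index mentioned above) is the negative log probability of a word given its context, typically estimated from a language model. A minimal Python sketch with hypothetical probabilities:

        import math

        def surprisal(p_word_given_context):
            """Surprisal in bits: -log2 P(word | context). Less predictable
            words have higher surprisal and, in these studies, elicit
            larger N400 amplitudes."""
            return -math.log2(p_word_given_context)

        # Hypothetical conditional probabilities for three words:
        for word, p in [("the", 0.30), ("coffee", 0.02), ("ostrich", 0.0005)]:
            print(f"{word:8s} P={p:<7} surprisal={surprisal(p):5.2f} bits")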

    An exploration of the rhythm of Malay

    In recent years there has been a surge of interest in speech rhythm. However, we still lack a clear understanding of the nature of rhythm and of rhythmic differences across languages. Various metrics have been proposed for measuring rhythm at the phonetic level and making typological comparisons between languages (Ramus et al., 1999; Grabe & Low, 2002; Dellwo, 2006), but debate is ongoing about the extent to which these metrics capture the rhythmic basis of speech (Arvaniti, 2009; Fletcher, in press). Furthermore, cross-linguistic studies of rhythm have covered a relatively small number of languages, and research on previously unclassified languages is necessary to fully develop the typology of rhythm. This study examines the rhythmic features of Malay, for which, to date, relatively little work has been carried out on aspects of rhythm and timing. The material for the analysis comprised 10 sentences produced by 20 speakers of standard Malay (10 males and 10 females). The recordings were first analysed using the rhythm metrics proposed by Ramus et al. (1999) and Grabe & Low (2002). These metrics (∆C, %V, rPVI, nPVI) are based on durational measurements of vocalic and consonantal intervals. The results indicated that Malay clustered with other so-called syllable-timed languages such as French and Spanish on all metrics. However, underlying the overall findings there was a large degree of variability in values across speakers and sentences, with some speakers having values in the range typical of stress-timed languages such as English. Further analysis was carried out in light of Fletcher's (in press) argument that measurements based on duration do not wholly reflect speech rhythm, as many other factors can influence the values of consonantal and vocalic intervals, and Arvaniti's (2009) suggestion that other features of speech should also be considered in descriptions of rhythm, to discover what contributes to listeners' perception of regularity. Spectrographic analysis of the Malay recordings brought to light two parameters that displayed consistency and regularity across all speakers and sentences: the duration of individual vowels and the duration of intervals between intensity minima. This poster presents the results of these investigations and points to connections between the features that seem to be consistently regulated in the timing of Malay connected speech and aspects of Malay phonology. The results are discussed in light of the current debate on descriptions of rhythm.
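
    For illustration, the four metrics named above have simple definitions: %V is the proportion of utterance duration that is vocalic, ∆C is the standard deviation of consonantal interval durations, and the raw and normalised Pairwise Variability Indices (rPVI, nPVI) average the absolute (optionally rate-normalised) differences between successive intervals. A minimal Python sketch with hypothetical interval durations:

        import statistics

        def percent_v(vowels, consonants):
            """%V: proportion of total duration that is vocalic."""
            return 100 * sum(vowels) / (sum(vowels) + sum(consonants))

        def delta_c(consonants):
            """Delta-C: std. deviation of consonantal interval durations."""
            return statistics.stdev(consonants)

        def rpvi(intervals):
            """Raw PVI: mean absolute difference between successive intervals
            (typically consonantal)."""
            return statistics.mean(abs(a - b) for a, b in zip(intervals, intervals[1:]))

        def npvi(intervals):
            """Normalised PVI: rate-normalised successive differences
            (typically vocalic)."""
            return 100 * statistics.mean(
                abs(a - b) / ((a + b) / 2) for a, b in zip(intervals, intervals[1:]))

        v = [80, 95, 70, 110, 85]   # hypothetical vocalic durations (ms)
        c = [60, 140, 75, 90, 120]  # hypothetical consonantal durations (ms)
        print(f"%V={percent_v(v, c):.1f}  dC={delta_c(c):.1f}  "
              f"rPVI={rpvi(c):.1f}  nPVI={npvi(v):.1f}")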

    Acoustic cues for body size: how size-related features are used and perceived

    We live in a noisy world. There is no place on Earth where it is possible to experience complete silence, not even the deepest place in the ocean. Billions of living and nonliving objects around us produce sounds, which differ enormously in their physical structure. Some of these sounds are noisy, some are harmonic, some are continuous, others are impulsive, soft, or loud; the sound environment contains an infinite combination of all these characteristics and more. Evolving in such an environment has resulted in a human auditory system that is able to extract useful information from sounds. We are able to say whether a sound source is stationary or moving (and, in the latter case, the direction of movement), what kind of object produced the sound, and the meaning of the message if the perceived sound is an intentional communicative signal. When we hear someone's voice, for instance, we are able to extract useful information about talker identity beyond the meaning of the heard words. This thesis focuses on a particular kind of information that can be extracted from an acoustic signal: the apparent size of the sound-producing object.

    Production and perception of the voiceless sibilant fricatives in typically developing children with applications for children with cleft palate

    The purpose of this study was to advance the current knowledge base regarding the production and perception of the voiceless sibilant fricatives [s] and [ʃ] in two groups of ten typically developing children each, aged 7 and 11. Developmental differences in production and perception were investigated, as well as the relationship between production and perception. A group of five children with repaired cleft lip and palate (CLP), aged between 7 and 11 years, was included to determine whether children with obligatory limitations in the early development of speech production and perception skills differ in perception or production from typically developing children. The analyses of fricative production indicated that almost all typically developing children (95%) showed a non-overlapping productive distinction between the voiceless sibilant fricatives, with varying degrees of token-to-token variability and variability in dynamic patterns of production. Developmental differences in production between the two age groups were found for fricative duration and the coefficient of variation for [s] at midpoint. Differences in fricative perception were found between the TD-7 and TD-11 groups, with the older children displaying qualitatively steeper slopes on identification functions, and greater accuracy and less variability on tests of fricative discrimination, compared to the younger children. No linear relationship was found between the participants' measures of fricative production and perception in the two age groups. Children with repaired cleft lip and palate showed a greater proportion of overlapping fricative productions but, like the typically developing children, showed individual speaker variability in dynamic spectral patterns during production. In general, the participants in the CLP group showed monotonic crossovers in identification of the [s]-[ʃ] continuum despite most speakers showing no productive distinction. Fricative discrimination in the CLP group was similar to the performance of the TD-7 and TD-11 groups, with the older children in the CLP group demonstrating greater accuracy and less variability compared to the younger children in this group. As with the typically developing children, there did not seem to be a relationship between production and perception of the voiceless sibilant fricatives in the participants with repaired cleft lip and palate.
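
    For illustration, two of the measures used above have simple forms: the coefficient of variation (CV = standard deviation / mean) indexes token-to-token variability, and the slope parameter of a logistic identification function describes how sharply responses shift along the [s]-[ʃ] continuum. A minimal Python sketch with hypothetical values, not the study's actual analysis:

        import math
        import statistics

        def coefficient_of_variation(durations):
            """CV of fricative durations across tokens (unitless)."""
            return statistics.stdev(durations) / statistics.mean(durations)

        def logistic_id(step, midpoint, slope):
            """Predicted proportion of [ʃ] responses at a continuum step;
            a larger `slope` means a steeper, more categorical function."""
            return 1 / (1 + math.exp(-slope * (step - midpoint)))

        tokens_ms = [182, 195, 170, 210, 188]  # hypothetical [s] durations (ms)
        print(f"CV = {coefficient_of_variation(tokens_ms):.3f}")
        for step in range(1, 8):  # hypothetical 7-step [s]-[ʃ] continuum
            print(step, f"{logistic_id(step, midpoint=4, slope=1.8):.2f}")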