Development of audiovisual comprehension skills in prelingually deaf children with cochlear implants
Objective: The present study investigated the development of audiovisual comprehension skills in prelingually deaf children who received cochlear implants.
Design: We analyzed results obtained with the Common Phrases (Robbins et al., 1995) test of sentence comprehension from 80 prelingually deaf children with cochlear implants who were enrolled in a longitudinal study, from pre-implantation to 5 years after implantation.
Results: The results revealed that prelingually deaf children with cochlear implants performed better under audiovisual (AV) presentation compared with auditory-alone (A-alone) or visual-alone (V-alone) conditions. AV sentence comprehension skills were found to be strongly correlated with several clinical outcome measures of speech perception, speech intelligibility, and language. Finally, pre-implantation V-alone performance on the Common Phrases test was strongly correlated with 3-year postimplantation performance on clinical outcome measures of speech perception, speech intelligibility, and language skills.
Conclusions: The results suggest that lipreading skills and AV speech perception reflect a common source of variance associated with the development of phonological processing skills that is shared among a wide range of speech and language outcome measures.
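The correlations reported above are presumably Pearson product-moment coefficients between pre-implant scores and later outcome measures. As a reminder of the computation, here is a minimal pure-Python sketch; the data are invented for illustration, not the study's scores.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical pre-implant V-alone scores vs 3-year language scores.
v_alone = [10, 25, 40, 55, 70]
language = [32, 45, 61, 70, 88]
print(round(pearson_r(v_alone, language), 3))  # → 0.996
```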
The listening talker: A review of human and algorithmic context-induced modifications of speech
Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.
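One of the simplest talker-inspired modifications in this literature is a change in spectral tilt, with energy redistributed toward higher frequencies as in Lombard speech. The sketch below is not from the review; it is a minimal pure-Python illustration of that idea using a first-order pre-emphasis filter, with an illustrative coefficient.

```python
# Minimal sketch (not the review's algorithm): first-order pre-emphasis,
# y[n] = x[n] - a * x[n-1], a crude analogue of the flattened spectral
# tilt observed in Lombard speech. The coefficient 0.97 is illustrative.

def pre_emphasis(samples, a=0.97):
    """Boost high frequencies of a mono signal (list of floats)."""
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - a * samples[n - 1])
    return out

# A constant (purely low-frequency) signal is almost entirely removed:
# only the first sample passes, the rest shrink to 1 - a = 0.03.
print(pre_emphasis([1.0, 1.0, 1.0, 1.0]))
```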
Developmental and cultural factors of audiovisual speech perception in noise
The aim of this project is two-fold: 1) to investigate developmental differences in intelligibility gains from visual cues in speech perception in noise, and 2) to examine how different types of maskers modulate visual enhancement across age groups. A secondary aim of this project is to investigate whether or not bilingualism differentially modulates audiovisual integration during speech-in-noise tasks. To that end, both child and adult, monolingual and bilingual participants completed speech perception in noise tasks across three within-subject variables: (1) masker type: pink noise or two-talker babble; (2) modality: audio-only (AO) and audiovisual (AV); and (3) signal-to-noise ratio (SNR): 0 dB, -4 dB, -8 dB, -12 dB, and -16 dB. The findings revealed that, although both children and adults benefited from visual cues in speech-in-noise tasks, adults showed greater benefit at lower SNRs. Moreover, although child monolingual and bilingual participants performed comparably across all conditions, monolingual adults outperformed simultaneous bilingual adult participants. These results may indicate that the divergent use of visual cues in speech perception between bilingual and monolingual speakers occurs later in development.
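Mixing a target and a masker at a prescribed SNR, as in the 0 to -16 dB conditions above, amounts to scaling the masker so that 10*log10(P_target/P_masker) hits the desired value. A minimal sketch; the function name and signals are mine, not the study's materials.

```python
import math

def mix_at_snr(target, masker, snr_db):
    """Return target + scaled masker, with SNR(dB) = 10*log10(Pt/Pm)."""
    p_t = sum(x * x for x in target) / len(target)   # target power
    p_m = sum(x * x for x in masker) / len(masker)   # masker power
    gain = math.sqrt(p_t / (p_m * 10 ** (snr_db / 10)))
    return [t + gain * m for t, m in zip(target, masker)]

# Verify the achieved SNR for one of the study's conditions (-8 dB).
target = [math.sin(0.1 * n) for n in range(1000)]
masker = [math.sin(0.37 * n) for n in range(1000)]
mixed = mix_at_snr(target, masker, -8)
scaled = [m - t for m, t in zip(mixed, target)]
power = lambda s: sum(x * x for x in s) / len(s)
print(round(10 * math.log10(power(target) / power(scaled)), 6))  # → -8.0
```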
Seeing a talking face matters to infants, children and adults: behavioural and neurophysiological studies
Everyday conversations typically occur face-to-face. Over and above auditory information, visual information from a speaker's face (e.g., lips, eyebrows) contributes to speech perception and comprehension. The facilitation that visual speech cues bring, termed the visual speech benefit, is experienced by infants, children and adults. Even so, studies on speech perception have largely focused on auditory-only speech, leaving a relative paucity of research on the visual speech benefit. Central to this thesis are the behavioural and neurophysiological manifestations of the visual speech benefit. As the visual speech benefit assumes that a listener is attending to a speaker's talking face, the investigations are conducted in relation to the possible modulating effects of gaze behaviour. Three investigations were conducted. Collectively, these studies demonstrate that visual speech information facilitates speech perception, and this has implications for individuals who do not have clear access to the auditory speech signal. The results, for instance the enhancement of 5-month-olds' cortical tracking by visual speech cues, and the effect of idiosyncratic differences in gaze behaviour on speech processing, expand knowledge of auditory-visual speech processing and provide firm bases for new directions in this burgeoning and important area of research.
Unsupervised syntactic chunking with acoustic cues: Computational models for prosodic bootstrapping
Learning to group words into phrases without supervision is a hard task for NLP systems, but infants routinely accomplish it. We hypothesize that infants use acoustic cues to prosody, which NLP systems typically ignore. To evaluate the utility of prosodic information for phrase discovery, we present an HMM-based unsupervised chunker that learns from only transcribed words and raw acoustic correlates to prosody. Unlike previous work on unsupervised parsing and chunking, we use neither gold standard part-of-speech tags nor punctuation in the input. Evaluated on the Switchboard corpus, our model outperforms several baselines that exploit either lexical or prosodic information alone, and, despite producing a flat structure, performs competitively with a state-of-the-art unsupervised lexicalized parser, with a substantial advantage in precision. Our results support the hypothesis that acoustic-prosodic cues provide useful evidence about syntactic phrases for language-learning infants.
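The chunker described above is an HMM decoded with dynamic programming. The toy sketch below is not the paper's model; it only illustrates how a Viterbi decoder over begin/continue chunk tags can exploit one acoustic-prosodic cue (the pause before a word) alongside the word sequence. All probabilities are invented for illustration.

```python
import math

# States: "B" begins a new chunk, "I" continues the current one.
STATES = ("B", "I")
LOG_TRANS = {("B", "B"): math.log(0.3), ("B", "I"): math.log(0.7),
             ("I", "B"): math.log(0.4), ("I", "I"): math.log(0.6)}

def log_emit(state, obs):
    _, pause_ms = obs
    # A long pre-word pause is more likely at a chunk boundary.
    return math.log(0.8 if (pause_ms > 100) == (state == "B") else 0.2)

def viterbi(observations):
    """Most likely B/I tag sequence; sequences must start with 'B'."""
    v = [{s: log_emit(s, observations[0]) + (0.0 if s == "B" else -1e9)
          for s in STATES}]
    back = []
    for obs in observations[1:]:
        col, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda q: v[-1][q] + LOG_TRANS[(q, s)])
            col[s] = v[-1][prev] + LOG_TRANS[(prev, s)] + log_emit(s, obs)
            ptr[s] = prev
        v.append(col)
        back.append(ptr)
    tags = [max(STATES, key=lambda s: v[-1][s])]
    for ptr in reversed(back):
        tags.append(ptr[tags[-1]])
    return tags[::-1]

# (word, pause-before-word in ms): long pauses before "the" and "at".
obs = [("the", 250), ("dog", 20), ("barked", 30), ("at", 180), ("night", 15)]
print(viterbi(obs))  # → ['B', 'I', 'I', 'B', 'I']
```

The decoded tags segment the utterance into [the dog barked] [at night], driven entirely by the pause cue in this toy setting.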
The phylogenetic origin and mechanism of sound symbolism - the role of action-perception circuits
As opposed to the classic Saussurean view on the arbitrary relationship between linguistic form and meaning, non-arbitrariness is a pervasive feature in human language. Sound symbolism, namely the intrinsic relationship between meaningless speech sounds and visual shapes, is a typical case of non-arbitrariness. A demonstration of sound symbolism is the "maluma-takete" effect, in which immanent links are observed between meaningless "round" or "sharp" speech sounds (e.g., maluma vs. takete) and round or sharp abstract visual shapes, respectively. An extensive amount of empirical work suggests that these mappings are shared by humans and play a distinct role in the emergence and acquisition of language. However, important questions are still pending on the origins and mechanism of sound symbolic processing. Those questions are addressed in the present work.
The first part of this dissertation focuses on the validation of sound symbolic effects in a forced choice task, and on the interaction of sound symbolism with two crossmodal mappings shared by humans. To address this question, human subjects were tested with a forced choice task on sound symbolic mappings crossed with two crossmodal audiovisual mappings (pitch-shape and pitch-spatial position). Subjects performed significantly above chance only for the sound symbolic associations but not for the other two mappings. Sound symbolic effects were thus replicated, while the other two crossmodal mappings, involving low-level audiovisual properties such as pitch and spatial position, did not emerge.
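"Significantly above chance" in a two-alternative forced choice task is commonly assessed against a binomial null with p = 0.5. A minimal sketch of such an exact one-sided test follows; the trial counts are illustrative, not the study's data.

```python
from math import comb

def binom_p_above_chance(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided p-value."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Illustrative numbers: 41 congruent choices out of 60 trials,
# well above the chance mean of 30.
print(binom_p_above_chance(41, 60) < 0.01)  # → True
```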
The second issue examined in the present dissertation is the phylogenetic origin of sound symbolic associations. Human subjects and a group of touchscreen-trained great apes were tested with a forced choice task on sound symbolic mappings. Only humans were able to process and/or infer the links between meaningless speech sounds and abstract shapes. These results reveal, for the first time, the specificity of humans' sound symbolic ability, which can be related to neurobiological findings on the distinct development and connectivity of the human language network.
The last part of the dissertation investigates whether action knowledge, and knowledge of the perceptual outputs of our actions, can provide a possible explanation of sound symbolic mappings. In a series of experiments, human subjects performed sound symbolic mappings, and mappings of "round" or "sharp" hand-action sounds with the shapes produced by these hand actions. In addition, the auditory and visual stimuli of both conditions were crossed. Subjects detected congruencies significantly for all mappings, and, most importantly, a positive correlation was observed in their performance across conditions. The physical acoustic and visual similarities between the audiovisual by-products of our hand actions and the sound symbolic pseudowords and shapes indicate that the link between meaningless speech sounds and abstract visual shapes is grounded in action knowledge. From a neurobiological perspective, the link between actions and their audiovisual by-products is also in accordance with distributed action-perception circuits in the human brain. Action-perception circuits, supported by the human neuroanatomical connectivity between auditory, visual, and motor cortices, emerge under associative learning and carry the perceptual and motor knowledge of our actions. These findings give a novel explanation of how symbolic communication is linked to our sensorimotor experiences.
To sum up, the present dissertation (i) validates the presence of sound symbolic effects in a forced choice task, (ii) shows that sound symbolic ability is specific to humans, and (iii) shows that action knowledge can provide the mechanistic glue for mapping meaningless speech sounds to abstract shapes. Overall, the present work contributes to a better understanding of the phylogenetic origins and mechanism of sound symbolic ability in humans.
Neural pathways for visual speech perception
This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread, diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA), has been demonstrated in posterior temporal cortex, ventral and posterior to the multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.
Multi-Sensoriality In Language Acquisition: The Relationship Between Selective Visual Attention Towards The Adult's Face And Language Skills
Introduction: Speech is the result of multimodal or multi-sensorial processes. The auditory and visual components of language provide the child with information crucial to the processing of speech. The language acquisition process is influenced by the child's ability to integrate information from multimodal (audio and visual) sources and to focus attention on the relevant cues in the environment; this is selective visual attention. This dissertation explores the relationship between children's selective visual attention and their early language skills. Several recent studies with infant populations have hypothesised or tested the relationship between children's selective visual attention towards specific regions of the talking face (i.e., the eyes or the mouth) and their language skills. These studies have tried to show how concomitant or longitudinal language skills can explain looking behaviours. In most cases, these studies have speculated on how this relationship is mediated by the child's level of language expertise (this is known as the language expertise hypothesis). However, no studies until now, to the best of our knowledge, have investigated the child's linguistic skills using spontaneous language measures. Aims: The dissertation has one broad aim, within which there are three particular aims. The broad aim is to examine the phenomenon of selective visual attention toward the face in both a laboratory and a naturalistic setting, and its relationship with language development. The three particular aims are as follows.
The first aim is to synthesise and analyse the factors that might determine different looking patterns in infants during audiovisual tasks using dynamic faces, and to describe how the literature explains these patterns in relation to aspects of language development. The second aim is to experimentally investigate the child's selective visual attention towards specific regions of the adult's face (the eyes and the mouth) in a task using the eye-tracking method. In particular, the study explores two questions: First, how do age and language condition (exposure to native vs non-native speech) affect looking behaviour in children? Second, are a child's looking behaviours related to vocal production at the time of the experiment and to vocabulary rates three months later, and if so, how? The third aim is to understand whether selective attention towards the face or other parts of the visual scene (i.e., the object or elsewhere) is influenced or explained by the child's vocal skills at the time of the task, and whether the episodes of fixation towards the adult's face can be predicted by specific phonological and semantic properties (i.e., pre-canonical vocalisations, babbling, words) of the child's speech. Method: For the first study, a systematic review of the literature was conducted, exploring four bibliographic databases and using specific inclusion criteria to select the records. For the second study, eye movements towards a dynamic face (on a screen), speaking in the child's native language (Italian) and a non-native language (English), were tracked using an eye-tracker in 26 infants between 6 and 14 months. Two groups were created based on age (G1, M = 7 months, N = 15 infants; G2, M = 12 months, N = 11 infants). Each child's language skill was assessed twice: at the time of the experiment (through direct observation, Time 1) and three months later (through the MB-CDI, Time 2).
Two groups were created based on the child's vocal production (Time 1, latent class cluster analysis): a high class (higher percentage of babbling and words) vs a low class (higher percentage of pre-canonical vocalisations). For the third study, the looking behaviour of 29 children between 12 and 19 months was tracked, using both a stationary video camera and a head-mounted camera on the mother's head during a single object task. During the task, children were exposed to a set of audiovisual stimuli, real words and non-words, chosen based on the parents' reports and their MB-CDI answers. The child's looking behaviour was coded offline second-by-second for a total of 116 sessions. The coding relates to specific areas of interest, i.e., the face, the object or elsewhere. The vocal production of each child was quantified using a LENA device, and their speech during a play period with their mothers was transcribed phonetically. Results: The systematic search of the literature (Chapter 2) identified 19 papers. Some tried to clarify the role played by audiovisual factors in support of speech perception (provided by looking towards the eyes or the mouth of a talking face). Others related selective visual attention towards specific areas of the adult's face to the child's competence in terms of linguistic or social skills, leading to correspondingly different lines of interpretation. The first empirical study (Chapter 3) shows that Italian children older than 12 months displayed a greater interest in the mouth area, especially when they were exposed to their native language. This accords with the more recent literature but contrasts with the language expertise hypothesis. The second significant result of Chapter 3 is that children who had a higher level of production in terms of babbling and words at the time of the experiment looked more towards the mouth area.
The study reported in Chapter 3 also demonstrated a positive association between the child's looking to the mouth and their expressive vocabulary as measured (using the MB-CDI) three months after the experiment. The second empirical study (Chapter 4) shows a significant difference in the looking time towards the adult's face between children with low and high vocal production in a naturalistic setting. More specifically, from this study we find two things. Firstly, the children who produced more advanced vocal forms (a higher amount of babbling and word production) looked more towards the adult's face, especially when exposed to non-words. Secondly, a significant relationship exists between the episodes of fixation towards the adult's face and the child's vocal skills (i.e., pre-canonical vocalisations, babbling, words); babbling productions predicted the episodes of face fixation in the task as a whole, for both words and non-words. Conclusion: Linguistic and social-based hypotheses attempting to explain the differences in the selective visual attention phenomenon emerged from the literature review. The empirical studies presented in this thesis bring two original contributions to this research field. First, our findings reinforce the idea that the mouth and, more generally, the face provide crucial visual cues when acquiring a language. Secondly, our results demonstrate that language knowledge and language skills at the time the child was observed significantly help to explain different looking behaviours. In other words, we can conclude that each child's attention to faces is shaped by their own linguistic characteristics.