
    Development of audiovisual comprehension skills in prelingually deaf children with cochlear implants

    Objective: The present study investigated the development of audiovisual comprehension skills in prelingually deaf children who received cochlear implants. Design: We analyzed results obtained with the Common Phrases (Robbins et al., 1995) test of sentence comprehension from 80 prelingually deaf children with cochlear implants who were enrolled in a longitudinal study, from pre-implantation to 5 years after implantation. Results: The results revealed that prelingually deaf children with cochlear implants performed better under audiovisual (AV) presentation than under auditory-alone (A-alone) or visual-alone (V-alone) conditions. AV sentence comprehension skills were strongly correlated with several clinical outcome measures of speech perception, speech intelligibility, and language. Finally, pre-implantation V-alone performance on the Common Phrases test was strongly correlated with 3-year post-implantation performance on clinical outcome measures of speech perception, speech intelligibility, and language skills. Conclusions: The results suggest that lipreading skills and AV speech perception reflect a common source of variance, associated with the development of phonological processing skills, that is shared among a wide range of speech and language outcome measures.

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work on improving the robustness of speech output.
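    To make the algorithmic side concrete, the sketch below implements one generic member of the modification family such reviews cover: high-frequency pre-emphasis plus dynamic range compression under an equal-energy constraint, loosely inspired by Lombard speech. It is an illustrative example only, not one of the 46 modifications tabulated in the article; the function name and parameter values are hypothetical.

```python
# Generic intelligibility-oriented modification: first-order pre-emphasis
# (boosts high frequencies, where many consonant cues live) plus power-law
# dynamic range compression, renormalised to the input's RMS energy.
# Hypothetical parameter values; not an algorithm from the review.
import numpy as np
from scipy.signal import lfilter

def modify_for_noise(speech, alpha=0.95, exponent=0.6):
    emphasised = lfilter([1.0, -alpha], [1.0], speech)      # pre-emphasis filter
    compressed = np.sign(emphasised) * np.abs(emphasised) ** exponent
    rms_in = np.sqrt(np.mean(speech ** 2))
    rms_out = np.sqrt(np.mean(compressed ** 2))
    return compressed * (rms_in / (rms_out + 1e-12))        # equal-energy constraint

# Toy usage: a decaying 220 Hz tone standing in for an utterance at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
utterance = 0.5 * np.sin(2 * np.pi * 220 * t) * np.exp(-3 * t)
modified = modify_for_noise(utterance)
```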

    Seeing a talking face matters to infants, children and adults: behavioural and neurophysiological studies

    Everyday conversations typically occur face-to-face. Over and above auditory information, visual information from a speaker’s face, e.g., the lips and eyebrows, contributes to speech perception and comprehension. The facilitation that visual speech cues bring, termed the visual speech benefit, is experienced by infants, children and adults. Even so, studies of speech perception have largely focused on auditory-only speech, leaving a relative paucity of research on the visual speech benefit. Central to this thesis are the behavioural and neurophysiological manifestations of the visual speech benefit. Because the visual speech benefit assumes that a listener is attending to a speaker’s talking face, the investigations are conducted in relation to the possible modulating effects of gaze behaviour. Three investigations were conducted. Collectively, these studies demonstrate that visual speech information facilitates speech perception, which has implications for individuals who do not have clear access to the auditory speech signal. The results, for instance the enhancement of 5-month-olds’ cortical tracking by visual speech cues and the effect of idiosyncratic differences in gaze behaviour on speech processing, expand knowledge of auditory-visual speech processing and provide firm bases for new directions in this burgeoning and important area of research.
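    As background for the cortical tracking result mentioned above: "cortical tracking" is commonly quantified as the correlation between low-frequency neural activity and the speech amplitude envelope. The sketch below shows that generic measure on toy data; it is an illustration under stated assumptions, not the analysis pipeline used in the thesis (the function name and the 1-8 Hz band are choices made here, though that band is typical in the literature).

```python
# Generic "cortical tracking" measure: Pearson correlation between
# band-limited EEG and the band-limited speech amplitude envelope.
# Function name, frequency band, and data are illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def tracking_score(eeg, audio, fs, band=(1.0, 8.0)):
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    envelope = filtfilt(b, a, np.abs(hilbert(audio)))   # 1-8 Hz speech envelope
    eeg_filt = filtfilt(b, a, eeg)                      # matching EEG band
    return np.corrcoef(eeg_filt, envelope)[0, 1]

# Toy data: "EEG" that partly follows the envelope of a noise carrier.
fs = 128
rng = np.random.default_rng(0)
audio = rng.standard_normal(fs * 10)
eeg = 0.3 * np.abs(hilbert(audio)) + rng.standard_normal(fs * 10)
print(tracking_score(eeg, audio, fs))
```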

    Unsupervised syntactic chunking with acoustic cues: Computational models for prosodic bootstrapping

    Learning to group words into phrases without supervision is a hard task for NLP systems, but infants routinely accomplish it. We hypothesize that infants use acoustic cues to prosody, which NLP systems typically ignore. To evaluate the utility of prosodic information for phrase discovery, we present an HMM-based unsupervised chunker that learns from only transcribed words and raw acoustic correlates of prosody. Unlike previous work on unsupervised parsing and chunking, we use neither gold-standard part-of-speech tags nor punctuation in the input. Evaluated on the Switchboard corpus, our model outperforms several baselines that exploit either lexical or prosodic information alone and, despite producing a flat structure, performs competitively with a state-of-the-art unsupervised lexicalized parser, with a substantial advantage in precision. Our results support the hypothesis that acoustic-prosodic cues provide useful evidence about syntactic phrases for language-learning infants.
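    To illustrate the modelling idea (not the paper's actual system), the sketch below trains a small discrete HMM without labels, with hidden states intended to play the role of chunk positions; boundaries can then be read off the decoded state transitions. It assumes the hmmlearn package (>= 0.3, which provides CategoricalHMM), and the toy symbol sequences stand in for quantised word-plus-prosody observations.

```python
# Toy unsupervised chunking HMM: hidden states are meant to behave like
# chunk positions (e.g., initial / internal / final); observations are
# integer codes standing in for quantised word + prosody features.
# Assumes hmmlearn >= 0.3 (CategoricalHMM); the data here are invented.
import numpy as np
from hmmlearn import hmm

utterances = [np.array([0, 1, 2, 0, 1, 3]),
              np.array([2, 2, 0, 1, 3]),
              np.array([0, 1, 0, 1, 2, 3])]
X = np.concatenate(utterances).reshape(-1, 1)     # stacked observations
lengths = [len(u) for u in utterances]            # per-utterance lengths

model = hmm.CategoricalHMM(n_components=3, n_iter=100, random_state=0)
model.fit(X, lengths)                             # unsupervised Baum-Welch

# Decode a new utterance; transitions into the state the model has learned
# to use utterance-initially can be read as candidate phrase boundaries.
logprob, states = model.decode(np.array([[0], [1], [2], [3]]))
print(states)
```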

    The phylogenetic origin and mechanism of sound symbolism - the role of action-perception circuits

    As opposed to the classic Saussurean view of an arbitrary relationship between linguistic form and meaning, non-arbitrariness is a pervasive feature of human language. Sound symbolism—namely, the intrinsic relationship between meaningless speech sounds and visual shapes—is a typical case of non-arbitrariness. A demonstration of sound symbolism is the “maluma-takete” effect, in which immanent links are observed between meaningless ‘round’ or ‘sharp’ speech sounds (e.g., maluma vs. takete) and round or sharp abstract visual shapes, respectively. An extensive body of empirical work suggests that these mappings are shared by humans and play a distinct role in the emergence and acquisition of language. However, important questions are still pending on the origins and mechanism of sound symbolic processing. Those questions are addressed in the present work. The first part of this dissertation focuses on the validation of sound symbolic effects in a forced choice task, and on the interaction of sound symbolism with two crossmodal mappings shared by humans. To address this question, human subjects were tested with a forced choice task on sound symbolic mappings crossed with two crossmodal audiovisual mappings (pitch-shape and pitch-spatial position). Subjects performed significantly above chance only for the sound symbolic associations and not for the other two mappings: sound symbolic effects were replicated, while the two crossmodal mappings involving low-level audiovisual properties, such as pitch and spatial position, did not emerge. The second issue examined in the present dissertation is the phylogenetic origin of sound symbolic associations. Human subjects and a group of touchscreen-trained great apes were tested with a forced choice task on sound symbolic mappings. Only humans were able to process and/or infer the links between meaningless speech sounds and abstract shapes. These results reveal, for the first time, the specificity of humans’ sound symbolic ability, which can be related to neurobiological findings on the distinct development and connectivity of the human language network. The last part of the dissertation investigates whether action knowledge, and knowledge of the perceptual outputs of our actions, can explain sound symbolic mappings. In a series of experiments, human subjects performed sound symbolic mappings as well as mappings of the sounds of ‘round’ or ‘sharp’ hand actions onto the shapes produced by those actions. In addition, the auditory and visual stimuli of the two conditions were crossed. Subjects detected congruencies significantly above chance for all mappings and, most importantly, their performances were positively correlated across conditions. Physical acoustic and visual similarities between the audiovisual by-products of our hand actions and the sound symbolic pseudowords and shapes indicate that the link between meaningless speech sounds and abstract visual shapes is grounded in action knowledge. From a neurobiological perspective, the link between actions and their audiovisual by-products is also in accordance with distributed action-perception circuits in the human brain. Action-perception circuits, supported by the neuroanatomical connectivity between human auditory, visual, and motor cortices, emerge through associative learning and carry the perceptual and motor knowledge of our actions. These findings offer a novel explanation for how symbolic communication is linked to our sensorimotor experiences.
    To sum up, the present dissertation (i) validates the presence of sound symbolic effects in a forced choice task, (ii) shows that sound symbolic ability is specific to humans, and (iii) shows that action knowledge can provide the mechanistic glue for mapping meaningless speech sounds onto abstract shapes. Overall, the present work contributes to a better understanding of the phylogenetic origins and mechanism of sound symbolic ability in humans.
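    The repeated claim that subjects performed significantly above chance in a two-alternative forced choice task reduces, in its simplest form, to a one-sided binomial test against the 0.5 chance level. A minimal sketch with hypothetical trial counts follows; the dissertation's actual statistics are not specified here and would likely also model subject and item variability.

```python
# One-sided binomial test of above-chance performance in a 2AFC task.
# The trial counts are hypothetical, purely for illustration.
from scipy.stats import binomtest

n_trials = 48
n_correct = 33
result = binomtest(n_correct, n_trials, p=0.5, alternative="greater")
print(f"accuracy = {n_correct / n_trials:.2f}, p = {result.pvalue:.4f}")
```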

    Neural pathways for visual speech perception

    This paper examines the questions: what levels of speech can be perceived visually, and how is visual speech represented by the brain? A review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so, and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread, diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA), has been demonstrated in posterior temporal cortex, ventral and posterior to the multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.

    Multi-Sensoriality In Language Acquisition: The Relationship Between Selective Visual Attention Towards The Adult’s Face And Language Skills

    Introduction Speech is the result of multimodal or multi-sensorial processes. The auditory and visual components of language provide the child with information crucial to the processing of speech. The language acquisition process is influenced by the child’s ability to integrate information from multimodal (audio and visual) sources and to focus attention on the relevant cues in the environment; this is selective visual attention. This dissertation will explore the relationship between children’s selective visual attention and their early language skills. Several recent studies with infant populations have hypothesised or tested the relationship between children’s selective visual attention towards specific regions of the talking face (i.e., the eyes or the mouth) and their language skills. These studies have tried to show how concomitant or longitudinal language skills can explain looking behaviours. In most cases, these studies have speculated on how this relationship is mediated by the child’s level of language expertise (this is known as the language expertise hypothesis). However, no studies until now, to the best of our knowledge, have investigated the child’s linguistic skills using spontaneous language measures.
    Aims The dissertation has one broad aim, within which there are three particular aims. The broad aim is to examine the phenomenon of selective visual attention toward the face in both a laboratory and a naturalistic setting, and its relationship with language development. The three particular aims are as follows. The first aim is to synthesise and analyse the factors that might determine different looking patterns in infants during audiovisual tasks using dynamic faces, and to describe how the literature explains these patterns in relation to aspects of language development. The second aim is to experimentally investigate the child’s selective visual attention towards a specific region of the adult’s face (the eyes and the mouth) in a task using the eye-tracking method. In particular, the study will explore two questions: First, how do age and language condition (exposure to native vs non-native speech) affect looking behaviour in children?
    Second, are a child’s looking behaviours related to vocal production at the time of the experiment and to vocabulary rates three months later, and if so, how? The third aim is to understand whether selective attention towards the face or other parts of the visual scene (i.e., the object or elsewhere) is influenced or explained by the child’s vocal skills at the time of the task, and whether the episodes of fixation towards the adult’s face can be predicted by specific phonological and semantic properties (i.e., pre-canonical vocalisations, babbling, words) of the child’s speech.
    Method For the first study, a systematic review of the literature was conducted, exploring four bibliographic databases and using specific inclusion criteria to select the records. For the second study, eye movements towards a dynamic face (on a screen), speaking in the child’s native language (Italian) and a non-native language (English), were tracked using an eye-tracker in 26 infants between 6 and 14 months. Two groups were created based on age (G1, M = 7 months, N = 15 infants; G2, M = 12 months, N = 11 infants). Each child’s language skill was assessed twice: at the time of the experiment (through direct observation, Time 1) and three months later (through the MB-CDI, Time 2). Two groups were created based on the child’s vocal production (Time 1, latent class cluster analysis): a high class (higher percentage of babbling and words) vs a low class (higher percentage of pre-canonical vocalisations). For the third study, the looking behaviour of 29 children between 12 and 19 months was tracked, using both a stationary video camera and a head-mounted camera on the mother’s head during a single object task. During the task, children were exposed to a set of audiovisual stimuli, real words and non-words, chosen based on the parents’ reports and their MB-CDI answers. The child’s looking behaviour was coded offline second-by-second for a total of 116 sessions. The coding relates to specific areas of interest, i.e., the face, the object, or elsewhere. The vocal production of each child was quantified using a LENA device, and their speech during a play period with their mothers was transcribed phonetically.
    Results The systematic search of the literature (Chapter 2) identified 19 papers. Some tried to clarify the role played by audiovisual factors in support of speech perception (provided by looking towards the eyes or the mouth of a talking face). Others related selective visual attention towards specific areas of the adult’s face to the child’s competence in terms of linguistic or social skills, leading to correspondingly different lines of interpretation. The first empirical study (Chapter 3) shows that Italian children older than 12 months displayed a greater interest in the mouth area, especially when they were exposed to their native language. This accords with the more recent literature but contrasts with the language expertise hypothesis. The second significant result of Chapter 3 is that children who had a higher level of production in terms of babbling and words at the time of the experiment looked more towards the mouth area.
    The study reported in Chapter 3 also demonstrated a positive association between the child’s looking to the mouth and their expressive vocabulary as measured (using the MB-CDI) three months after the experiment. The second empirical study (Chapter 4) shows a significant difference in looking time towards the adult’s face between children with low and high vocal production in a naturalistic setting. More specifically, this study yields two findings. Firstly, the children who produced more advanced vocal forms (a higher amount of babbling and word production) looked more towards the adult’s face, especially when exposed to non-words. Secondly, a significant relationship exists between the episodes of fixation towards the adult’s face and the child’s vocal skills (i.e., pre-canonical vocalisations, babbling, words); babbling productions predicted the episodes of face fixation in the task as a whole, for both words and non-words.
    Conclusion Linguistic and social-based hypotheses attempting to explain the differences in the selective visual attention phenomenon emerged from the literature review. The empirical studies presented in this thesis make two original contributions to this research field. First, our findings reinforce the idea that the mouth and, more generally, the face provide crucial visual cues when acquiring a language. Secondly, our results demonstrate that language knowledge and language skills at the time the child was observed significantly help to explain different looking behaviours. In other words, we can conclude that each child’s attention to faces is shaped by their own linguistic characteristics.
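    For readers unfamiliar with the second-by-second coding described in the Method, the usual derived measure is the proportion of looking time per area of interest (face, object, elsewhere, as in the abstract). The sketch below computes that measure from a hypothetical coded session; it is illustrative only, not the thesis's coding software.

```python
# Proportion of looking time per area of interest from second-by-second
# coding. AOI labels follow the abstract; the session data are invented.
from collections import Counter

def looking_proportions(coded_seconds):
    counts = Counter(coded_seconds)
    total = len(coded_seconds)
    return {aoi: counts[aoi] / total for aoi in ("face", "object", "elsewhere")}

session = ["face", "face", "object", "elsewhere", "face",
           "object", "object", "face", "elsewhere", "face"]
print(looking_proportions(session))   # {'face': 0.5, 'object': 0.3, 'elsewhere': 0.2}
```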