    Language and gaze cues: findings from the real world and the lab

    A Rose by Any Other Verb: The Effect of Expectations and Word Category on Processing Effort in Situated Sentence Comprehension

    Recent work has shown that linguistic and visual contexts jointly modulate linguistic expectancy and, thus, the processing effort for a (more or less) expected critical word. According to these findings, uncertainty about the upcoming referent in a visually situated sentence can be reduced by exploiting the selectional restrictions of a preceding word (e.g., a verb or an adjective), which then reduces processing effort on the critical word (e.g., a referential noun). Interestingly, however, no such modulation was observed in these studies on the expectation-generating word itself. The goal of the current study is to investigate whether the reduction of uncertainty (i.e., the generation of expectations) simply does not modulate processing effort, or whether the particular subject-verb-object (SVO) sentence structure used in these studies (which emphasizes the referential nature of the noun as a direct pointer to visually co-present objects) accounts for the observed pattern. To address these questions, the current design reverses the functional roles of nouns and verbs by using sentence constructions in which the noun reduces uncertainty about upcoming verbs, and the verb provides the disambiguating and reference-resolving piece of information. Experiment 1 (a Visual World Paradigm study) and Experiment 2 (a Grammaticality Maze study) both replicate the effect found in previous work (i.e., the effect of visually situated context on the word which uniquely identifies the referent), albeit on the verb in the current study. Results on the noun, where uncertainty is reduced and expectations are generated in the current design, were mixed and were most likely influenced by design decisions specific to each experiment. These results show that processing of the reference-resolving word, whether it be a noun or a verb, reliably benefits from the prior linguistic and visual information that leads to the generation of concrete expectations.
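
    As a rough formalization of the notions used above (these are the standard information-theoretic quantities assumed in this literature, not definitions given in the abstract itself): the processing effort for a critical word is linked to its surprisal, $S(w_t) = -\log P(w_t \mid \text{context}_t)$, where the context comprises both the preceding words and the co-present scene; the uncertainty about the upcoming word is the entropy $H_t = -\sum_{w} P(w \mid \text{context}_t)\,\log P(w \mid \text{context}_t)$; and "reduction of uncertainty" refers to the drop $H_{t-1} - H_t$ produced by an expectation-generating word such as the verb (or, in the reversed design, the noun).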

    Ratings of name agreement and semantic categorization of 247 colored clipart pictures by young German children

    Developmental and longitudinal studies with children increasingly use pictorial stimuli in cognitive, psychological, and psycholinguistic research. To enhance validity and comparability within and across those studies, the use of normed pictures is recommended. Moreover, creating picture sets and evaluating them in rating studies is very time-consuming, particularly for samples of young children, with whom testing time is limited. As an increasing number of studies investigate young German children's semantic language processing with colored clipart stimuli, this work provides a first set of 247 colored cliparts with ratings from German native-speaking children aged 4 to 6 years. We assessed two central rating aspects of pictures: name agreement (Do pictures elicit the intended name of an object?) and semantic categorization (Are objects classified as members of the intended semantic category?). Our ratings indicate that children are proficient in naming and even better at semantically categorizing objects, and that both abilities improve with age across early childhood. Finally, this paper discusses some features of pictorial objects that might be important for children's name agreement and semantic categorization and that could be considered in future picture rating studies.

    Joint attention in spoken human-robot interaction

    Gaze during situated language production and comprehension is tightly coupled with the unfolding speech stream: speakers look at entities before mentioning them (Griffin, 2001; Meyer et al., 1998), while listeners look at objects as they are mentioned (Tanenhaus et al., 1995). Thus, a speaker's gaze to mentioned objects in a shared environment provides the listener with a cue to the speaker's focus of visual attention and potentially to an intended referent. The coordination of interlocutors' visual attention, in order to learn about the partner's goals and intentions, has been called joint attention (Moore and Dunham, 1995; Emery, 2000). By revealing the speaker's communicative intentions, such attentional cues thus complement spoken language, facilitating grounding and sometimes disambiguating references (Hanna and Brennan, 2007). Previous research has shown that people readily attribute intentional states to non-humans as well, such as animals, computers, or robots (Nass and Moon, 2000). Assuming that people indeed ascribe intentional states to a robot, joint attention may be a relevant component of human-robot interaction as well. The objective of this thesis was to investigate the hypothesis that people jointly attend to objects looked at by a speaking robot and that human listeners use this visual information to infer the robot's communicative intentions. Five eye-tracking experiments in a spoken human-robot interaction setting were conducted and provide supporting evidence for this hypothesis. In these experiments, participants' eye movements and responses were recorded while they viewed videos of a robot that described and looked at objects in a scene. The congruency and alignment of robot gaze and the spoken references were manipulated in order to establish the relevance of such gaze cues for utterance comprehension. Results suggest that people follow robot gaze to objects and infer referential intentions from it, causing both facilitation and disruption of reference resolution, depending on the match or mismatch between inferred intentions and the actual utterance. Specifically, Experiments 1-3 show that people assign attentional and intentional states to a robot, interpreting its gaze as a cue to intended referents. This interpretation determined how people grounded spoken references in the scene, thus influencing overall utterance comprehension as well as the production of verbal corrections in response to false robot utterances. In Experiments 4 and 5, we further manipulated the temporal synchronization and linear alignment of robot gaze and speech and found that substantial temporal shifts of gaze relative to speech did not affect utterance comprehension, while the order of visual and spoken referential cues did. These results show that people interpret gaze cues in the order they occur and expect the retrieved referential intentions to be realized accordingly. Thus, our findings converge on the result that people establish joint attention with a robot.

    Even young children make multiple predictions in the complex visual world

    Children can anticipate upcoming input in sentences with semantically constraining verbs. In the visual world, the sentence context is used to anticipatorily fixate the only object matching potential sentence continuations. Adults can even process multiple visual objects in parallel when predicting language. This study examined whether young children can also maintain multiple prediction options in parallel during language processing. In addition, we aimed to replicate the finding that children's receptive vocabulary size modulates their prediction. German children (5–6 years, n = 26) and adults (19–40 years, n = 37) listened to 32 subject–verb–object sentences with semantically constraining verbs (e.g., "The father eats the waffle") while looking at visual scenes of four objects. The number of objects consistent with the verb constraints (e.g., being edible) varied among 0, 1, 3, and 4. A linear mixed-effects model on the proportion of target fixations, with the effect-coded factors condition (i.e., the number of consistent objects), time window, and age group, revealed that upon hearing the verb, children and adults anticipatorily fixated the single visual object, or even multiple visual objects, consistent with the verb constraints, whereas inconsistent objects were fixated less. This provides the first evidence that, comparable to adults, young children maintain multiple prediction options in parallel. Moreover, children with larger receptive vocabulary sizes (Peabody Picture Vocabulary Test) anticipatorily fixated potential targets more often than children with smaller vocabularies, showing that verbal abilities affect children's prediction in the complex visual world.
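
    To illustrate the kind of analysis described above, here is a minimal sketch of fitting such a linear mixed-effects model in Python with statsmodels. The data file and column names (target_fix_prop, condition, time_window, age_group, participant) are hypothetical, and the original study may well have used different software and a richer random-effects structure; this is an illustration, not the authors' analysis code.

        # Sketch: mixed model on target-fixation proportions (hypothetical data).
        import pandas as pd
        import statsmodels.formula.api as smf

        # Long-format table: one row per participant x item x time window,
        # with effect-coded predictors for condition (number of verb-consistent
        # objects), time window, and age group.
        data = pd.read_csv("fixation_proportions.csv")  # hypothetical file

        model = smf.mixedlm(
            "target_fix_prop ~ condition * time_window * age_group",
            data,
            groups=data["participant"],  # random intercepts by participant
        )
        result = model.fit()
        print(result.summary())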

    Graded expectations in visually situated comprehension: Costs and benefits as indexed by the N400

    Recently, Ankener et al. (Frontiers in Psychology, 9, 2387, 2018) presented a visual world study which combined attention and pupillary measures to demonstrate that anticipating a target results in lower effort to integrate that target (noun). However, they found no indication that the anticipatory processes themselves, i.e., the reduction of uncertainty about upcoming referents, result in processing effort (cf. Linzen and Jaeger, Cognitive Science, 40(6), 1382–1411, 2016). In contrast, Maess et al. (Frontiers in Human Neuroscience, 10, 1–11, 2016) found that more constraining verbs elicited a higher N400 amplitude than unconstraining verbs. The aim of the present study was therefore twofold. First, we examined whether the graded ICA effect, which was previously found on the noun as a result of a likelihood manipulation, replicates in ERP measures. Second, we set out to investigate whether the processes leading to the generation of expectations (derived during verb and scene processing) induce an N400 modulation. Our results confirm that visual context is combined with the verb's meaning to establish expectations about upcoming nouns and that these expectations affect the retrieval of the upcoming noun (modulated N400 on the noun). Importantly, however, we found no evidence for different costs in generating more or less specific expectations for upcoming nouns. Thus, the benefits of generating expectations are not associated with any costs in situated language comprehension.

    Influence of speakers' gaze on situated language comprehension: Evidence from Event-Related Potentials

    Behavioral studies have shown that speaker gaze to objects in a co-present scene can influence listeners' sentence comprehension. To gain deeper insight into the mechanisms involved in gaze processing and integration, we conducted two ERP experiments (N = 30, age ranges 18–32 and 19–33 years, respectively). Participants watched a centrally positioned face performing gaze actions aligned to utterances comparing two out of three displayed objects. They were asked to judge whether the sentence was true given the provided scene. We manipulated the second gaze cue to be either Congruent (baseline), Incongruent, or Averted (Exp1)/Mutual (Exp2). When speaker gaze was used to form lexical expectations about upcoming referents, we found an attenuated N200 when phonological information confirmed these expectations (Congruent). Similarly, we observed attenuated N400 amplitudes when gaze-cued expectations (Congruent) facilitated lexical retrieval. Crucially, only a violation of gaze-cued lexical expectations (Incongruent) led to a P600 effect, suggesting the need to revise the mental representation of the situation. Our results support the hypothesis that gaze is utilized above and beyond simply enhancing a cued object's prominence. Rather, gaze to objects leads to their integration into the mental representation of the situation before they are mentioned.

    The Influence of Visual Uncertainty on Word Surprisal and Processing Effort

    A word's predictability or surprisal, as determined by cloze probabilities or language models (Frank, 2013), is related to processing effort, in that less expected words take more effort to process (Hale, 2001; Lau et al., 2013). A word's surprisal, however, may also be influenced by the non-linguistic context, such as visual cues: in the visual world paradigm (VWP), anticipatory eye movements suggest that listeners exploit the scene to predict what will be mentioned next (Altmann and Kamide, 1999). How visual context affects surprisal and processing effort, however, remains unclear. Here, we present a series of four studies providing evidence on how visually determined probabilistic expectations for a spoken target word, as indicated by anticipatory eye movements, predict graded processing effort for that word, as assessed by a pupillometric measure (the Index of Cognitive Activity, ICA). These findings are a clear and robust demonstration that the non-linguistic context can immediately influence both lexical expectations and surprisal-based processing effort.
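
    To make the surprisal notion concrete, the following is a minimal sketch (an illustration, not code from these studies) of how surprisal is computed from a word's contextual probability, whether that probability comes from cloze norms, a language model, or a combination of linguistic and visual context; the example probabilities are invented.

        # Surprisal in bits: less expected words carry higher surprisal and,
        # per the studies above, higher processing effort.
        import math

        def surprisal(p: float) -> float:
            """Surprisal (in bits) of a word with contextual probability p."""
            return -math.log2(p)

        # Invented example probabilities for a spoken target word:
        p_expected = 0.60    # e.g., only one depicted object fits the verb's restrictions
        p_unexpected = 0.05  # e.g., several depicted objects compete

        print(surprisal(p_expected))    # ~0.74 bits
        print(surprisal(p_unexpected))  # ~4.32 bits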