
    Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze

    When speakers describe an image, they tend to look at objects before mentioning them. In this paper, we investigate such sequential cross-modal alignment by modelling the image description generation process computationally. We take as our starting point a state-of-the-art image captioning system and develop several model variants that exploit information from human gaze patterns recorded during language production. In particular, we propose the first approach to image description generation where visual processing is modelled sequentially. Our experiments and analyses confirm that better descriptions can be obtained by exploiting gaze-driven attention and shed light on human cognitive processes by comparing different ways of aligning the gaze modality with language production. We find that processing gaze data sequentially leads to descriptions that are better aligned to those produced by speakers, more diverse, and more natural, particularly when gaze is encoded with a dedicated recurrent component. Comment: In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020).
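    The abstract does not give implementation details, but the idea of encoding gaze with a dedicated recurrent component that steers visual attention can be illustrated with a minimal PyTorch sketch. Everything below (module names, tensor shapes, the GRU choice) is an assumption for illustration, not the authors' code.

    import torch
    import torch.nn as nn

    class GazeGuidedAttention(nn.Module):
        """Hypothetical sketch: a GRU over the fixation sequence produces a
        query that biases attention over image region features."""
        def __init__(self, feat_dim=512, gaze_dim=2, hidden_dim=256):
            super().__init__()
            # Dedicated recurrent component for the gaze modality: it
            # consumes fixations in the order the speaker produced them.
            self.gaze_rnn = nn.GRU(gaze_dim, hidden_dim, batch_first=True)
            self.query = nn.Linear(hidden_dim, feat_dim)

        def forward(self, region_feats, fixations):
            # region_feats: (batch, n_regions, feat_dim) image region features
            # fixations:    (batch, n_fixations, 2) sequential (x, y) gaze points
            _, h = self.gaze_rnn(fixations)            # (1, batch, hidden_dim)
            q = self.query(h.squeeze(0)).unsqueeze(2)  # (batch, feat_dim, 1)
            weights = torch.softmax(torch.bmm(region_feats, q), dim=1)
            # Gaze-conditioned visual context vector for the caption decoder.
            return (weights * region_feats).sum(dim=1)

    regions = torch.randn(4, 36, 512)   # e.g. 36 detected regions per image
    gaze = torch.randn(4, 12, 2)        # e.g. 12 fixations per description
    print(GazeGuidedAttention()(regions, gaze).shape)  # torch.Size([4, 512])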

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial positions of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) that Gestalt grouping is not used as a strategy in these tasks, and (ii) that objects may be stored in and retrieved from a pre-attentional store during this task.

    A facial expression for anxiety.

    Anxiety and fear are often confounded in discussions of human emotions. However, studies of rodent defensive reactions under naturalistic conditions suggest anxiety is functionally distinct from fear. Unambiguous threats, such as predators, elicit flight from rodents (if an escape route is available), whereas ambiguous threats (e.g., the odor of a predator) elicit risk assessment behavior, which is associated with anxiety as it is preferentially modulated by anti-anxiety drugs. However, without human evidence, it would be premature to assume that rodent-based psychological models are valid for humans. We tested the human validity of the risk assessment explanation for anxiety by presenting 8 volunteers with emotive scenarios and asking them to pose facial expressions. Photographs and videos of these expressions were shown to 40 participants who matched them to the scenarios and labeled each expression. Scenarios describing ambiguous threats were preferentially matched to the facial expression posed in response to the same scenario type. This expression consisted of two plausible environmental-scanning behaviors (eye darts and head swivels) and was labeled as anxiety, not fear. The facial expression elicited by unambiguous threat scenarios was labeled as fear. The emotion labels generated were then presented to another 18 participants who matched them back to photographs of the facial expressions. This back-matching of labels to faces also linked anxiety to the environmental-scanning face rather than to the fear face. The results therefore suggest that anxiety produces a distinct facial expression and that it has adaptive value in situations that are ambiguously threatening, supporting a functional, risk-assessment explanation for human anxiety.

    Reading Russian poetry: An expert–novice study

    Studying the role of expertise in poetry reading, we hypothesized that poets’ expert knowledge comprises genre-appropriate reading and comprehension strategies that are reflected in distinct patterns of reading behavior. We recorded eye movements while two groups of native speakers (n=10 each) read selected Russian poetry: an expert group of professional poets who read poetry daily, and a control group of novices who read poetry less than once a month. We conducted mixed-effects regression analyses to test for effects of group on first-fixation durations, first-pass gaze durations, and total reading times per word while controlling for lexical and text-level variables. First-fixation durations exclusively reflected lexical features, and total reading times reflected both lexical and text-level variables; only first-pass gaze durations were additionally modulated by readers’ level of expertise. Whereas novices’ gaze durations became shorter as they progressed through the poems and differed between line-final and non-final words, poets retained a steady pace of first-pass reading throughout the poems and within verse lines. Additionally, poets’ gaze durations were less sensitive to word length. We conclude that readers’ level of expertise modulates the way they read poetry. Our findings support theories of literary comprehension that assume distinct processing modes emerging from prior experience with literary texts.
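    The abstract names the statistical approach but not the tooling. As a rough, hypothetical sketch of a mixed-effects regression of first-pass gaze durations on expertise group and lexical covariates, with by-participant random intercepts, one could use statsmodels in Python; the variable names and synthetic data below are invented for illustration, not the authors' materials.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Synthetic stand-in for the eye-movement records (illustrative only).
    rng = np.random.default_rng(0)
    n = 400
    df = pd.DataFrame({
        "participant": rng.integers(0, 20, n).astype(str),
        "group": rng.choice(["poet", "novice"], n),  # expertise group
        "word_length": rng.integers(2, 12, n),       # lexical covariate
        "line_final": rng.integers(0, 2, n),         # position in verse line
    })
    df["gaze_duration"] = 200 + 8 * df["word_length"] + rng.normal(0, 30, n)

    # Fixed effects of group, word length (and their interaction) and line
    # position; a random intercept per reader captures individual baselines.
    model = smf.mixedlm(
        "gaze_duration ~ group * word_length + line_final",
        data=df,
        groups=df["participant"],
    )
    print(model.fit().summary())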

    Markers of cognitive function in individuals with metabolic disease: Morquio Syndrome and Tyrosinemia Type III

    We characterized cognitive function in two metabolic diseases: individuals with MPS IVa (mucopolysaccharidosis type IVa, Morquio syndrome) and tyrosinemia type III were assessed using tasks of attention, language and oculomotor function. MPS IVa individuals were slower in visual search, but their display-size effects were normal, and the slowing was not due to long reaction times (ruling out slow item processing or distraction). Maintaining gaze in an oculomotor task was difficult. The results implicated sustained attention and task initiation or response processing; shifting attention, accumulating evidence and selecting targets were unaffected. Visual search was also slowed in tyrosinemia type III, and patterns in visual search and fixation tasks pointed to sustained attention impairments, although there were differences from MPS IVa. Language was impaired in tyrosinemia type III but not in MPS IVa. Metabolic diseases thus produced selective cognitive effects. Our results, incorporating new methods for developmental data and model selection, illustrate how cognitive data can contribute to understanding function in biochemical brain systems.

    Gesture and Speech in Interaction - 4th edition (GESPIN 4)

    The fourth edition of Gesture and Speech in Interaction (GESPIN) was held in Nantes, France. With more than 40 papers, these proceedings show just what a flourishing field of enquiry gesture studies continues to be. The keynote speeches of the conference addressed three different aspects of multimodal interaction: gesture and grammar, gesture acquisition, and gesture and social interaction. In a talk entitled "Qualities of event construal in speech and gesture: Aspect and tense", Alan Cienki presented an ongoing research project on narratives in French, German and Russian, a project that focuses especially on the verbal and gestural expression of grammatical tense and aspect in narratives in the three languages. Jean-Marc Colletta's talk, entitled "Gesture and Language Development: towards a unified theoretical framework", described the joint acquisition and development of speech and early conventional and representational gestures. In "Grammar, deixis, and multimodality between code-manifestation and code-integration, or why Kendon's Continuum should be transformed into a gestural circle", Ellen Fricke proposed a revisited grammar of noun phrases that integrates gestures as part of the semiotic and typological codes of individual languages. From a pragmatic and cognitive perspective, Judith Holler explored the use of gaze and hand gestures as means of organizing turns at talk as well as establishing common ground in a presentation entitled "On the pragmatics of multi-modal face-to-face communication: Gesture, speech and gaze in the coordination of mental states and social interaction".

    Among the talks and posters presented at the conference, the vast majority of topics related, quite naturally, to gesture and speech in interaction, understood both in terms of the mapping of units in different semiotic modes and of the use of gesture and speech in social interaction. Several presentations explored the effects of impairments (such as diseases or the natural ageing process) on gesture and speech. The communicative relevance of gesture and speech and audience design in natural interactions, as well as in more controlled settings like television debates and reports, was another topic addressed during the conference. Some participants also presented research on first and second language learning, while others discussed the relationship between gesture and intonation. While most participants presented research on gesture and speech from an observer's perspective, be it in semiotics or pragmatics, some nevertheless focused on another important aspect: the cognitive processes involved in language production and perception. Last but not least, participants also presented talks and posters on the computational analysis of gestures, whether involving external devices (e.g. mocap, Kinect) or the use of specially designed computer software for the post-treatment of gestural data. Importantly, new links were made between semiotics and mocap data.

    An Eye for AI: A Multimodal Bottleneck Transformer Approach for Predicting Individual Eye Movements: Towards Foundation Models for Human Factors & Neuroscience

    Human perception has been a subject of study for centuries. Various eye-tracking methods across many study designs have shed light on individual differences in perception and visual navigation. However, accurately identifying individuals based on gaze behaviour remains a challenge. Artificial intelligence (AI) based methods have led to large successes in domains such as vision and language, and they are now making their way into human factors & neuroscience (HFN). Leveraging AI for HFN requires quantities of data several orders of magnitude larger than the field is used to organising, and standards for data publication remain inconsistent. In this work, we move towards foundation models (FM) for HFN by highlighting important data insights from AI. We propose a multimodal bottleneck transformer, a model architecture that can effectively and efficiently represent and work with the varying modalities encountered in HFN. Results indicate that classification of individuals and prediction of gaze are possible, given more training data.
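    The architecture is not spelled out in the abstract; the general idea of a multimodal bottleneck transformer (in the spirit of Nagrani et al., 2021) is that modality streams exchange information only through a small set of shared bottleneck tokens. The following PyTorch sketch is an illustrative assumption, not the authors' model; the layer choices, dimensions and bottleneck-averaging step are all hypothetical.

    import torch
    import torch.nn as nn

    class BottleneckFusionLayer(nn.Module):
        """Hypothetical sketch: each modality attends to itself plus a few
        shared bottleneck tokens, which carry all cross-modal information."""
        def __init__(self, dim=256, n_heads=4):
            super().__init__()
            self.stream_a = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
            self.stream_b = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)

        def forward(self, tokens_a, tokens_b, bottleneck):
            n_b = bottleneck.size(1)
            # Each stream self-attends over its own tokens plus the bottleneck.
            out_a = self.stream_a(torch.cat([tokens_a, bottleneck], dim=1))
            out_b = self.stream_b(torch.cat([tokens_b, bottleneck], dim=1))
            tokens_a, bn_a = out_a[:, :-n_b], out_a[:, -n_b:]
            tokens_b, bn_b = out_b[:, :-n_b], out_b[:, -n_b:]
            # Averaging the two updated copies keeps a single shared bottleneck.
            return tokens_a, tokens_b, (bn_a + bn_b) / 2

    gaze = torch.randn(2, 50, 256)    # e.g. embedded gaze-scanpath tokens
    stim = torch.randn(2, 196, 256)   # e.g. stimulus image patch tokens
    bn = torch.randn(2, 4, 256)       # small fusion bottleneck
    gaze, stim, bn = BottleneckFusionLayer()(gaze, stim, bn)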