
    When Computer Vision Gazes at Cognition

    Get PDF
    Joint attention is a core, early-developing form of social interaction. It is based on our ability to discriminate the third-party objects that other people are looking at. While it has been shown that people can accurately determine whether another person is looking directly at them versus away, little is known about the human ability to discriminate a third person's gaze directed towards objects that are further away, especially in unconstrained settings where the looker can move her head and eyes freely. In this paper we address this question by jointly exploring human psychophysics and a cognitively motivated computer vision model, which can detect the 3D direction of gaze from 2D face images. The synthesis of the behavioral study and computer vision yields several interesting discoveries. (1) Human accuracy at discriminating targets 8°–10° of visual angle apart is around 40% in a free-looking gaze task; (2) the ability to interpret the gaze of different lookers varies dramatically; (3) this variance can be captured by the computational model; (4) humans significantly outperform the current model. These results collectively show that the acuity of human joint attention is indeed highly impressive, given the computational challenge of the natural looking task. Moreover, the gap between human and model performance, as well as the variability of gaze interpretation across different lookers, calls for further study of the underlying mechanisms humans use for this challenging task.
    Comment: Tao Gao and Daniel Harari contributed equally to this work
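    The evaluation the abstract describes amounts to asking whether a gaze estimate can single out one of several targets spaced 8°–10° apart. Below is a minimal sketch of that geometry, assuming the model outputs a 3D gaze direction and the candidate targets have known 3D positions; all names and numbers are illustrative, not taken from the paper.

    ```python
    import numpy as np

    def angular_error_deg(u, v):
        """Angle in degrees between two 3D direction vectors."""
        u = u / np.linalg.norm(u)
        v = v / np.linalg.norm(v)
        return np.degrees(np.arccos(np.clip(np.dot(u, v), -1.0, 1.0)))

    def pick_target(eye_pos, gaze_dir, targets):
        """Pick the candidate target whose direction from the eye is
        angularly closest to the estimated gaze direction."""
        errs = [angular_error_deg(gaze_dir, t - eye_pos) for t in targets]
        return int(np.argmin(errs)), float(min(errs))

    # Five targets spaced 8 degrees of visual angle apart, one meter away
    # (matching the 8-10 degree spacing the abstract mentions).
    eye = np.array([0.0, 0.0, 0.0])
    targets = np.array([[np.tan(np.radians(8 * k)), 0.0, 1.0] for k in range(5)])
    idx, err = pick_target(eye, np.array([0.28, 0.0, 1.0]), targets)
    print(idx, round(err, 2))  # -> 2, i.e. the ~16-degree target
    ```

    With noisier gaze estimates, the chosen index starts to scatter across neighboring targets, which is exactly the discrimination-accuracy question the paper poses to both humans and the model.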

    Object Referring in Videos with Language and Human Gaze

    Full text link
    We investigate the problem of object referring (OR), i.e., localizing a target object in a visual scene given a language description. Humans perceive the world more as continuous video snippets than as static images, and describe objects not only by their appearance, but also by their spatio-temporal context and motion features. Humans also gaze at the object when they issue a referring expression. Existing works on OR mostly focus on static images only, which fall short of providing many such cues. This paper addresses OR in videos with language and human gaze. To that end, we present a new video dataset for OR, with 30,000 objects over 5,000 stereo video sequences annotated with their descriptions and gaze. We further propose a novel network model for OR in videos that integrates appearance, motion, gaze, and spatio-temporal context into one network. Experimental results show that our method effectively utilizes motion cues, human gaze, and spatio-temporal context, and outperforms previous OR methods. For the dataset and code, please refer to https://people.ee.ethz.ch/~arunv/ORGaze.html.
    Comment: Accepted to CVPR 2018, 10 pages, 6 figures
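    As a rough illustration of the multi-cue integration the abstract describes, here is a minimal late-fusion scorer in PyTorch that ranks candidate objects by concatenated appearance, motion, context, and gaze features. The feature dimensions and fusion-by-concatenation design are assumptions for the sketch, not the paper's actual architecture.

    ```python
    import torch
    import torch.nn as nn

    class GazeFusionScorer(nn.Module):
        """Scores candidate objects by fusing per-candidate appearance,
        motion, spatio-temporal context, and gaze features."""
        def __init__(self, d_app=512, d_mot=128, d_ctx=128, d_gaze=16):
            super().__init__()
            self.head = nn.Sequential(
                nn.Linear(d_app + d_mot + d_ctx + d_gaze, 256),
                nn.ReLU(),
                nn.Linear(256, 1),  # one matching score per candidate
            )

        def forward(self, app, mot, ctx, gaze):
            # Each input: (num_candidates, feature_dim)
            fused = torch.cat([app, mot, ctx, gaze], dim=-1)
            return self.head(fused).squeeze(-1)  # (num_candidates,)

    scorer = GazeFusionScorer()
    scores = scorer(torch.randn(12, 512), torch.randn(12, 128),
                    torch.randn(12, 128), torch.randn(12, 16))
    print(scores.argmax().item())  # index of the predicted referred object
    ```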

    Looking back at the stare-in-the-crowd effect: Staring eyes do not capture attention in visual search

    Get PDF
    The stare-in-the-crowd effect refers to the finding that a visual search for a target of staring eyes among averted-eyes distracters is more efficient than the search for an averted-eyes target among staring distracters. This finding could indicate that staring eyes are prioritized in the processing of the search array, so that attention is more likely to be directed to their location than to any other. However, visual search is a complex process that depends not only upon the properties of the target, but also upon the similarity between the target of the search and the distractor items, and between the distractor items themselves. Across five experiments, we show that the search asymmetry diagnostic of the stare-in-the-crowd effect is more likely the result of a failure to control for the similarity among distracting items between the two critical search conditions than of any special attention-grabbing property of staring gazes. Our results suggest that, contrary to results reported in the literature, staring gazes are not prioritized by attention in visual search.
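    For readers unfamiliar with the measure, "search efficiency" here is conventionally the slope of response time against set size, and the asymmetry is a shallower slope when staring eyes are the target than when they are the distracters. A minimal sketch of that computation, with made-up numbers purely for illustration:

    ```python
    import numpy as np

    def search_slope(set_sizes, rts_ms):
        """Search efficiency as the slope (ms/item) of the RT-by-set-size
        function, estimated with a least-squares line fit."""
        slope, _intercept = np.polyfit(set_sizes, rts_ms, 1)
        return slope

    # Illustrative data only: a shallower slope for the staring-eyes target
    # is what would look like a stare-in-the-crowd search asymmetry.
    set_sizes = np.array([4, 8, 12])
    rt_staring_target = np.array([620.0, 660.0, 700.0])   # ~10 ms/item
    rt_averted_target = np.array([640.0, 840.0, 1040.0])  # ~50 ms/item
    print(search_slope(set_sizes, rt_staring_target),
          search_slope(set_sizes, rt_averted_target))
    ```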

    Investigating non-visual eye movements non-intrusively: Comparing manual and automatic annotation styles

    Get PDF
    Non-visual eye movements (NVEMs) are eye movements that do not serve the provision of visual information. As yet, their cognitive origins and meaning remain under-explored in eye-movement research. The first problem presenting itself in their study is one of annotation: being non-visual, they are not necessarily bound to a specific surface or object of interest, rendering conventional eye-trackers non-ideal for their study. This, however, makes it potentially viable to investigate them without requiring high-resolution data. In this report, we present two approaches to annotating NVEM data: one grid-based, involving manual annotation in ELAN (Max Planck Institute for Psycholinguistics: The Language Archive, 2019), the other Cartesian coordinate-based, derived algorithmically through OpenFace (Baltrušaitis et al., 2018). We evaluated a) the two approaches in themselves, e.g. in terms of consistency, and b) their compatibility, i.e. the possibility of mapping one onto the other. In the case of a), we found good overall consistency in both approaches; in the case of b), there is evidence for the eventual possibility of mapping the OpenFace gaze estimations onto the manual coding grid.
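    A sketch of the coordinate-based route the report describes: OpenFace's FeatureExtraction tool writes per-frame gaze angles (the gaze_angle_x and gaze_angle_y columns, in radians), which can be binned into a coarse grid for comparison with manual ELAN codes. The file name, 10-degree cell size, and direction labels below are assumptions for illustration, not the report's actual mapping.

    ```python
    import csv
    import math

    def gaze_to_grid(angle_x, angle_y, cell_deg=10.0):
        """Bin OpenFace gaze angles (radians) into a coarse 3x3 grid cell,
        e.g. ('up', 'left'). Cell size and labels are assumptions."""
        dx, dy = math.degrees(angle_x), math.degrees(angle_y)
        col = 'left' if dx < -cell_deg else 'right' if dx > cell_deg else 'center'
        row = 'up' if dy < -cell_deg else 'down' if dy > cell_deg else 'middle'
        return row, col

    with open('openface_output.csv', newline='') as f:  # hypothetical file name
        for rec in csv.DictReader(f):
            rec = {k.strip(): v for k, v in rec.items()}  # headers may be padded
            cell = gaze_to_grid(float(rec['gaze_angle_x']),
                                float(rec['gaze_angle_y']))
            # ...compare `cell` with the manual ELAN code for the same frame
    ```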

    Sensory salience processing moderates attenuated gazes on faces in autism spectrum disorder: a case–control study

    Full text link
    Background: Attenuated social attention is a key marker of autism spectrum disorder (ASD). Recent neuroimaging findings also emphasize altered processing of sensory salience in ASD. The locus coeruleus-norepinephrine (LC-NE) system has been established as a modulator of this sensory salience processing (SSP). We tested the hypothesis that altered LC-NE functioning contributes to different SSP and results in diverging social attention in ASD. Methods: We analyzed the baseline eye-tracking data of the EU-AIMS Longitudinal European Autism Project (LEAP) for subgroups of autistic participants (n = 166, age = 6-30 years, IQ = 61-138, gender [female/male] = 41/125) and participants with typical development (TD; n = 166, age = 6-30 years, IQ = 63-138, gender [female/male] = 49/117) that were matched for demographic variables and data quality. Participants watched brief movie scenes (k = 85) depicting humans in social situations (human) or without humans (non-human). SSP was estimated by gazes on physical and motion salience and a corresponding pupillary response that indexes phasic activity of the LC-NE system. Social attention was estimated by gazes on faces via manually defined areas of interest. SSP was compared between groups and related to social attention by linear mixed models that consider temporal dynamics within scenes. Models were controlled for comorbid psychopathology, gaze behavior, and luminance. Results: We found no group differences in gazes on salience, whereas pupillary responses were associated with altered gazes on physical and motion salience. In ASD compared to TD, we observed pupillary responses that were higher for non-human scenes and lower for human scenes. In ASD, we also observed lower gazes on faces across the duration of the scenes. Crucially, this difference in social attention was influenced by gazes on physical salience and moderated by pupillary responses. Limitations: The naturalistic study design precluded experimental manipulations and stimulus control, and effect sizes were small to moderate. Covariate effects of age and IQ indicate that the findings differ between age and developmental subgroups. Conclusions: Pupillary responses, as a proxy of LC-NE phasic activity during visual attention, are suggested to modulate sensory salience processing and contribute to attenuated social attention in ASD.
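    The moderation analysis described here maps naturally onto a linear mixed model with participant-level random effects. Below is a minimal sketch using statsmodels; the column names and the much-simplified formula are hypothetical, not the study's actual specification, which also models temporal dynamics within scenes and further covariates.

    ```python
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical long-format data: one row per participant x scene window,
    # with gaze-on-faces proportion, group, pupil response, and covariates.
    df = pd.read_csv('leap_eyetracking.csv')  # hypothetical file name

    # Group-by-pupil interaction tests the moderation; random intercepts
    # per participant account for repeated measures.
    model = smf.mixedlm(
        'gaze_on_faces ~ group * pupil_response + physical_salience + luminance',
        data=df,
        groups=df['participant_id'],
    )
    print(model.fit().summary())
    ```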

    The role of perspective taking on attention: a review of the special issue on the Reflexive Attentional Shift Phenomenon

    Get PDF
    Attention is a process that alters how cognitive resources are allocated, allowing individuals to efficiently process information at the attended location. The presence of visual or auditory cues in the environment can direct the focus of attention towards certain stimuli, even if the cued stimuli are not the individual's primary target. Samson et al. [1] demonstrated that seeing another person in the scene (i.e. a person-like cue) caused a delay in responding to target stimuli not visible to that person: "altercentric intrusion". This phenomenon, they argue, depends upon the fact that the cue used resembled a person, as opposed to a more generic directional indicator. The characteristics of the cue are at the core of the debate in this special issue. Some maintain that the perceptual-directional characteristics of the cue are sufficient to generate the bias, whilst others argue that the cueing is stronger when the cue has social characteristics (relates to what another individual can perceive). The research contained in this issue confirms that human attention is biased by the presence of a directional cue. We discuss and compare the different studies. The pattern that emerges suggests that the social relevance of the cue is necessary in some contexts but not in others, depending on the cognitive demand of the experimental task. One possibility is that social mechanisms are involved in perspective taking when the task is cognitively demanding, whilst they may not play a role in automatic attention allocation.

    Is anyone looking at me? Direct gaze detection in children with and without autism

    Get PDF
    Atypical processing of eye contact is one of the significant characteristics of individuals with autism, but the mechanism underlying atypical direct gaze processing is still unclear. This study used a visual search paradigm to examine whether facial context affects direct gaze detection in children with autism. Participants were asked to detect target gazes presented among distracters with different gaze directions. The target gazes were either direct or averted, and were presented either alone (Experiment 1) or within a facial context (Experiment 2). Like typically developing children, children with autism were faster and more efficient at detecting direct gaze than averted gaze, whether the eyes were presented alone or within faces. In addition, face inversion disrupted efficient direct gaze detection in typically developing children, but not in children with autism. These results suggest that children with autism use featural information to detect direct gaze, whereas typically developing children use configural information.