
    Holistic gaze strategy to categorize facial expression of varying intensities

    Using faces representing exaggerated emotional expressions, recent behavioural and eye-tracking studies have suggested a dominant role of individual facial features in transmitting diagnostic cues for decoding facial expressions. Considering that in everyday life we frequently view low-intensity expressive faces in which local facial cues are more ambiguous, we probably need to combine expressive cues from more than one facial feature to reliably decode naturalistic facial affects. In this study we applied a morphing technique to systematically vary the intensities of six basic facial expressions of emotion, and employed a self-paced expression categorization task to measure participants’ categorization performance and associated gaze patterns. The analysis of pooled data from all expressions showed that increasing expression intensity improved categorization accuracy, shortened reaction times and reduced the number of fixations directed at faces. The proportion of fixations and viewing time directed at internal facial features (eyes, nose and mouth region), however, was not affected by varying levels of intensity. Further comparison between individual facial expressions revealed that although proportional gaze allocation at individual facial features was quantitatively modulated by the viewed expressions, the overall gaze distribution in face viewing was qualitatively similar across different facial expressions and different intensities. It seems that we adopt a holistic viewing strategy to extract expressive cues from all internal facial features when processing naturalistic facial expressions.
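
    The morphing manipulation at the core of the design can be approximated by a weighted pixel blend between a neutral face and a peak-intensity expression. A minimal NumPy sketch, assuming pre-aligned uint8 images; note that dedicated morphing software additionally warps facial geometry between landmark correspondences, and the function name and step values here are illustrative:

    import numpy as np

    def morph_intensity(neutral, peak, alpha):
        """Cross-fade a neutral face with a full-intensity expression.

        alpha = 0 returns the neutral face, alpha = 1 the peak expression;
        intermediate values give graded expression intensities.
        """
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must lie in [0, 1]")
        blend = (1.0 - alpha) * neutral.astype(float) + alpha * peak.astype(float)
        return np.clip(blend, 0, 255).astype(np.uint8)

    # Example: a six-step intensity continuum for one expression, assuming
    # `neutral` and `peak` are pre-aligned images of identical shape.
    # continuum = [morph_intensity(neutral, peak, a) for a in np.linspace(1/6, 1, 6)]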

    Object Detection Through Exploration With A Foveated Visual Field

    We present a foveated object detector (FOD) as a biologically-inspired alternative to the sliding window (SW) approach, which is the dominant method of search in computer vision object detection. Similar to the human visual system, the FOD has higher resolution at the fovea and lower resolution at the visual periphery. Consequently, more computational resources are allocated at the fovea and relatively fewer at the periphery. The FOD processes the entire scene, uses retino-specific object detection classifiers to guide eye movements, aligns its fovea with regions of interest in the input image and integrates observations across multiple fixations. Our approach combines modern object detectors from computer vision with a recent model of peripheral pooling regions found at the V1 layer of the human visual system. We assess various eye movement strategies on the PASCAL VOC 2007 dataset and show that the FOD performs on par with the SW detector while bringing significant computational cost savings.
    Comment: An extended version of this manuscript was published in PLOS Computational Biology (October 2017) at https://doi.org/10.1371/journal.pcbi.100574
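
    The resource-allocation idea (fine sampling at the fovea, coarser pooling with increasing eccentricity) can be sketched by tiling the visual field with pooling regions whose width grows linearly with eccentricity, a common simplification of the V1 pooling model the detector builds on. The parameter values below are illustrative, not the paper's fitted ones:

    def pooling_regions(field_deg, fovea_deg=1.0, scale=0.5):
        """Tile one half-meridian of the visual field with pooling regions
        whose width grows linearly with eccentricity (width = scale * ecc),
        so sampling is fine at the fovea and coarse in the periphery.
        Returns (start, end) intervals in degrees of visual angle."""
        regions = [(0.0, fovea_deg)]   # full-resolution foveal region
        ecc = fovea_deg
        while ecc < field_deg:
            width = scale * ecc        # pooling gets coarser with eccentricity
            regions.append((ecc, min(ecc + width, field_deg)))
            ecc += width
        return regions

    # With fixation at 0 deg there are many small regions near the fovea and
    # few large ones in the periphery, so per-fixation computation concentrates
    # where the detector has the most detail to work with.
    print(pooling_regions(20.0))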

    Objects predict fixations better than early saliency

    Humans move their eyes while looking at scenes and pictures. Eye movements correlate with shifts in attention and are thought to be a consequence of optimal resource allocation for high-level tasks such as visual recognition. Models of attention, such as “saliency maps,” are often built on the assumption that “early” features (color, contrast, orientation, motion, and so forth) drive attention directly. We explore an alternative hypothesis: observers attend to “interesting” objects. To test this hypothesis, we measure the eye position of human observers while they inspect photographs of common natural scenes. Our observers perform different tasks: artistic evaluation, analysis of content, and search. Immediately after each presentation, our observers are asked to name objects they saw. Weighted with recall frequency, these objects predict fixations in individual images better than early saliency, irrespective of task. Also, saliency combined with object positions predicts which objects are frequently named. This suggests that early saliency has only an indirect effect on attention, acting through recognized objects. Consequently, rather than treating attention as a mere preprocessing step for object recognition, models of both need to be integrated.
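
    The central comparison, recall-weighted object regions versus an early-saliency map as fixation predictors, can be operationalized with a standard fixation-prediction score. A sketch under stated assumptions: bounding boxes stand in for object outlines, and normalized scanpath saliency (NSS) stands in for whatever scoring the authors used:

    import numpy as np

    def object_map(shape, objects):
        """Fixation-prediction map from recalled objects: each object's
        bounding box (x0, y0, x1, y1) is weighted by how often observers
        named it after viewing (its recall frequency)."""
        m = np.zeros(shape, dtype=float)
        for (x0, y0, x1, y1), recall_freq in objects:
            m[y0:y1, x0:x1] += recall_freq
        return m

    def nss(pred_map, fixations):
        """Normalized scanpath saliency: mean z-scored map value at fixated
        pixels; higher means the map predicts fixations better."""
        z = (pred_map - pred_map.mean()) / (pred_map.std() + 1e-9)
        return float(np.mean([z[y, x] for x, y in fixations]))

    # The paper's claim corresponds to
    #   nss(object_map(img_shape, named_objects), fixations)
    # exceeding nss(early_saliency_map, fixations) on most images.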

    Skim reading: an adaptive strategy for reading on the web

    It has been suggested that readers spend a great deal of time skim reading on the Web, and that skim reading reduces comprehension of what has been read. There have been a number of studies exploring skim reading, but relatively little exists on the skim reading of hypertext and Web pages. In the experiment documented here, we utilised eye-tracking methodology to explore how readers skim read hypertext and how hyperlinks affect reading behaviour. The results show that readers read faster when they were skim reading and that comprehension was reduced. However, the presence of hyperlinks seemed to assist readers in picking out important information when skim reading. We suggest that readers engage in an adaptive information foraging strategy in which they attempt to minimise comprehension loss while maintaining a high reading speed. Readers use hyperlinks as markers of important information and use them to read through the text in an efficient and effective way. This suggests that skim reading may not be as damaging to comprehension when reading hypertext, but it does mean that the words we choose to hyperlink become very important to comprehension for those skim reading text on the Web.

    Human Attention in Image Captioning: Dataset and Analysis

    In this work, we present a novel dataset consisting of eye movements and verbal descriptions recorded synchronously over images. Using this data, we study the differences in human attention during free-viewing and image captioning tasks. We look into the relationship between human attention and language constructs during perception and sentence articulation. We also analyse attention deployment mechanisms in the top-down soft attention approach that is argued to mimic human attention in captioning tasks, and investigate whether visual saliency can help image captioning. Our study reveals that (1) human attention behaviour differs in free-viewing and image description tasks: humans tend to fixate on a greater variety of regions in the latter task; (2) there is a strong relationship between described objects and attended objects (97% of the described objects are attended); (3) a convolutional neural network as feature encoder accounts for human-attended regions during image captioning to a great extent (around 78%); (4) the soft-attention mechanism differs from human attention, both spatially and temporally, and there is low correlation between caption scores and attention consistency scores, indicating a large gap between humans and machines with regard to top-down attention; and (5) by integrating the soft attention model with image saliency, we can significantly improve the model's performance on the Flickr30k and MSCOCO benchmarks. The dataset can be found at: https://github.com/SenHe/Human-Attention-in-Image-Captioning.
    Comment: To appear at ICCV 201
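
    One simple way to quantify the reported spatial gap between machine soft attention and human attention is a pixelwise correlation between the two maps on a common grid. The paper's attention consistency score may be defined differently, so treat this as an assumed metric:

    import numpy as np

    def attention_consistency(model_attn, human_map):
        """Pearson correlation between a model's soft-attention map and a
        human fixation-density map of the same shape; values near zero
        reflect the kind of spatial mismatch described in finding (4)."""
        a = model_attn.ravel().astype(float)
        h = human_map.ravel().astype(float)
        a = (a - a.mean()) / (a.std() + 1e-9)
        h = (h - h.mean()) / (h.std() + 1e-9)
        return float(np.mean(a * h))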

    Some, And Possibly All, Scalar Inferences Are Not Delayed: Evidence For Immediate Pragmatic Enrichment

    Scalar inferences are commonly generated when a speaker uses a weaker expression rather than a stronger alternative, e.g., John ate some of the apples implies that he did not eat them all. This article describes a visual-world study investigating how and when perceivers compute these inferences. Participants followed spoken instructions containing the scalar quantifier some directing them to interact with one of several referential targets (e.g., Click on the girl who has some of the balloons). Participants fixated on the target compatible with the implicated meaning of some and avoided a competitor compatible with the literal meaning prior to a disambiguating noun. Further, convergence on the target was as fast for some as for the non-scalar quantifiers none and all. These findings indicate that the scalar inference is computed immediately and is not delayed relative to the literal interpretation of some. It is argued that previous demonstrations that scalar inferences increase processing time are not necessarily due to delays in generating the inference itself, but rather arise because integrating the interpretation of the inference with relevant information in the context may require additional time. With sufficient contextual support, processing delays disappear.
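
    The timing claim rests on the standard visual-world analysis: the proportion of looks to each interest area in small time bins, time-locked to the onset of the quantifier. A sketch under an assumed data layout (per-trial lists of (time_ms, area) gaze samples; real eye-tracker exports differ):

    def fixation_proportions(trials, bin_edges):
        """Proportion of gaze samples on the pragmatic target vs. the
        literal-meaning competitor within each time bin. Immediate
        enrichment predicts the target curve diverging upward before the
        disambiguating noun is heard."""
        props = {"target": [], "competitor": []}
        for lo, hi in zip(bin_edges, bin_edges[1:]):
            in_bin = [area for samples in trials
                      for t, area in samples if lo <= t < hi]
            n = max(len(in_bin), 1)
            for area in props:
                props[area].append(in_bin.count(area) / n)
        return props

    # e.g. fixation_proportions(trials, range(0, 1000, 50)) gives 50 ms bins
    # spanning the first second after quantifier onset.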