Holistic gaze strategy to categorize facial expression of varying intensities
Using faces representing exaggerated emotional expressions, recent behaviour and eye-tracking studies have suggested a dominant role of individual facial features in transmitting diagnostic cues for decoding facial expressions. Considering that in everyday life we frequently view low-intensity expressive faces in which local facial cues are more ambiguous, we probably need to combine expressive cues from more than one facial feature to reliably decode naturalistic facial affects. In this study we applied a morphing technique to systematically vary the intensities of six basic facial expressions of emotion, and employed a self-paced expression categorization task to measure participants’ categorization performance and associated gaze patterns. The analysis of pooled data from all expressions showed that increasing expression intensity improved categorization accuracy, shortened reaction time and reduced the number of fixations directed at faces. The proportion of fixations and viewing time directed at internal facial features (eyes, nose and mouth region), however, was not affected by varying levels of intensity. Further comparison between individual facial expressions revealed that although proportional gaze allocation at individual facial features was quantitatively modulated by the viewed expressions, the overall gaze distribution in face viewing was qualitatively similar across different facial expressions and different intensities. It seems that we adopt a holistic viewing strategy to extract expressive cues from all internal facial features when processing naturalistic facial expressions.
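The gaze measure used above (the proportion of fixations and of viewing time falling on each internal facial feature) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the rectangular AOI coordinates, the `Fixation` record and the function name `aoi_proportions` are hypothetical, and real studies often use non-rectangular regions.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    x: float            # horizontal gaze position (pixels)
    y: float            # vertical gaze position (pixels)
    duration_ms: float  # fixation duration


# Hypothetical, non-overlapping rectangular AOIs as half-open
# (x_min, y_min, x_max, y_max) boxes in image pixels.
AOIS = {
    "eyes": (60, 80, 240, 130),
    "nose": (120, 130, 180, 190),
    "mouth": (100, 190, 200, 240),
}


def aoi_proportions(fixations, aois=AOIS):
    """Return {aoi: (share_of_fixations, share_of_viewing_time)}.

    Shares are relative to all fixations on the stimulus; fixations that
    land outside every AOI count towards the totals but towards no AOI.
    """
    total_n = len(fixations)
    total_t = sum(f.duration_ms for f in fixations)
    result = {}
    for name, (x0, y0, x1, y1) in aois.items():
        inside = [f for f in fixations if x0 <= f.x < x1 and y0 <= f.y < y1]
        n_share = len(inside) / total_n if total_n else 0.0
        t_share = sum(f.duration_ms for f in inside) / total_t if total_t else 0.0
        result[name] = (n_share, t_share)
    return result
```

For example, four fixations of 300, 200, 250 and 250 ms, one each on the eyes, nose and mouth and one off the internal features, give each feature a fixation share of 0.25 and time shares of 0.3, 0.2 and 0.25. Comparing such shares across intensity levels is one way to check whether the gaze distribution stays qualitatively the same.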
Object Detection Through Exploration With A Foveated Visual Field
We present a foveated object detector (FOD) as a biologically-inspired
alternative to the sliding window (SW) approach, which is the dominant method of
search in computer vision object detection. Similar to the human visual system,
the FOD has higher resolution at the fovea and lower resolution at the visual
periphery. Consequently, more computational resources are allocated at the
fovea and relatively fewer at the periphery. The FOD processes the entire
scene, uses retino-specific object detection classifiers to guide eye
movements, aligns its fovea with regions of interest in the input image and
integrates observations across multiple fixations. Our approach combines modern
object detectors from computer vision with a recent model of peripheral pooling
regions found at the V1 layer of the human visual system. We assess various
eye movement strategies on the PASCAL VOC 2007 dataset and show that the FOD
performs on par with the SW detector while bringing significant computational
cost savings.
Comment: An extended version of this manuscript was published in PLOS
Computational Biology (October 2017) at
https://doi.org/10.1371/journal.pcbi.100574
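The exploration loop described above (fixate, score candidate regions with a reliability that falls off towards the periphery, integrate observations across fixations, then move the fovea) can be sketched in a few lines. This is a toy illustration, not the FOD itself: the `acuity` fall-off, the max-based integration rule, the greedy choice of the next fixation, the fixed detector scores standing in for real classifiers, and all names are assumptions made for the sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    x: float
    y: float
    score: float  # hypothetical detector score when the window is seen foveally


def acuity(eccentricity, e2=5.0):
    """Hypothetical reliability factor; it halves at eccentricity e2."""
    return 1.0 / (1.0 + eccentricity / e2)


def explore(candidates, start, n_fixations=3, threshold=0.5):
    """Greedy foveated search over candidate windows (toy model).

    Returns the list of fixation points and the candidates whose
    accumulated evidence reaches `threshold`.
    """
    fixation = start
    evidence = [0.0] * len(candidates)
    visited = []
    for _ in range(n_fixations):
        visited.append(fixation)
        for i, c in enumerate(candidates):
            ecc = math.hypot(c.x - fixation[0], c.y - fixation[1])
            # Integrate across fixations by keeping the strongest observation so far;
            # peripheral views are down-weighted by the acuity factor.
            evidence[i] = max(evidence[i], c.score * acuity(ecc))
        # Next fixation: the not-yet-fixated candidate with the most evidence.
        remaining = [i for i, c in enumerate(candidates) if (c.x, c.y) not in visited]
        if not remaining:
            break
        best = max(remaining, key=lambda i: evidence[i])
        fixation = (candidates[best].x, candidates[best].y)
    detections = [c for c, e in zip(candidates, evidence) if e >= threshold]
    return visited, detections
```

With candidates at (0, 0), (10, 0) and (20, 0) scoring 0.9, 0.8 and 0.2, starting at (0, 0) with two fixations, the loop fixates (0, 0) and then (10, 0) and reports the first two candidates. Only a few windows are ever seen at full weight, which is the intuition behind the computational savings relative to scoring every window uniformly.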
Objects predict fixations better than early saliency
Humans move their eyes while looking at scenes and pictures. Eye movements correlate with shifts in attention and are thought to be a consequence of optimal resource allocation for high-level tasks such as visual recognition. Models of attention, such as “saliency maps,” are often built on the assumption that “early” features (color, contrast, orientation, motion, and so forth) drive attention directly. We explore an alternative hypothesis: Observers attend to “interesting” objects. To test this hypothesis, we measure the eye position of human observers while they inspect photographs of common natural scenes. Our observers perform different tasks: artistic evaluation, analysis of content, and search. Immediately after each presentation, our observers are asked to name the objects they saw. Weighted with recall frequency, these objects predict fixations in individual images better than early saliency, irrespective of task. Also, saliency combined with object positions predicts which objects are frequently named. This suggests that early saliency has only an indirect effect on attention, acting through recognized objects. Consequently, rather than treating attention as a mere preprocessing step for object recognition, models of both need to be integrated.
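The comparison above can be illustrated with a small sketch: build an “object map” in which each named object's region is weighted by its recall frequency, then score any map (object map or saliency map) by how well it separates fixated from non-fixated locations, using the area under the ROC curve. The box format, summing weights where objects overlap, the rank-based AUC and the use of all non-fixated pixels as controls are assumptions for illustration, not necessarily the paper's exact procedure.

```python
def object_map(width, height, objects):
    """Build a prediction map from named objects.

    objects: list of (x0, y0, x1, y1, recall_frequency) half-open boxes
    (hypothetical format). Overlapping objects have their weights summed.
    """
    m = [[0.0] * width for _ in range(height)]
    for x0, y0, x1, y1, freq in objects:
        for y in range(y0, y1):
            for x in range(x0, x1):
                m[y][x] += freq
    return m


def auc(fixated_values, control_values):
    """Probability that a random fixated value outranks a random control value.

    Ties count as half a win, so 0.5 means chance-level prediction.
    """
    wins = 0.0
    for f in fixated_values:
        for c in control_values:
            if f > c:
                wins += 1.0
            elif f == c:
                wins += 0.5
    return wins / (len(fixated_values) * len(control_values))


def map_auc(pred_map, fixations):
    """AUC of a map given fixations as (x, y) pixel tuples; all other pixels are controls."""
    fixated = set(fixations)
    fix_vals = [pred_map[y][x] for x, y in fixated]
    ctrl_vals = [v for y, row in enumerate(pred_map)
                 for x, v in enumerate(row) if (x, y) not in fixated]
    return auc(fix_vals, ctrl_vals)
```

For instance, a 4×4 image with a single object covering its top-left 2×2 cells (recall frequency 0.8) and fixations at (0, 0) and (1, 1) yields an AUC of 13/14. Computing the same score for an early-saliency map on the same fixations gives the head-to-head comparison the abstract describes.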
Skim reading: an adaptive strategy for reading on the web
It has been suggested that readers spend a great deal of time skim reading on the Web and that skim reading reduces their comprehension of what they have read. There have been a number of studies exploring skim reading, but relatively little research exists on the skim reading of hypertext and Webpages. In the experiment documented here, we utilised eye-tracking methodology to explore how readers skim read hypertext and how hyperlinks affect reading behaviour. The results show that readers read faster when skim reading and that their comprehension was reduced. However, the presence of hyperlinks seemed to assist readers in picking out important information when skim reading. We suggest that readers engage in an adaptive information foraging strategy in which they attempt to minimise comprehension loss while maintaining a high reading speed. Readers use hyperlinks as markers of important information and use them to read through the text efficiently and effectively. This suggests that skim reading may not be as damaging to comprehension when reading hypertext, but it does mean that the words we choose to hyperlink become very important to comprehension for those skim reading text on the Web.
Human Attention in Image Captioning: Dataset and Analysis
In this work, we present a novel dataset consisting of eye movements and
verbal descriptions recorded synchronously over images. Using this data, we
study the differences in human attention during free-viewing and image
captioning tasks. We look into the relationship between human attention and
language constructs during perception and sentence articulation. We also
analyse attention deployment mechanisms in the top-down soft attention approach
that is argued to mimic human attention in captioning tasks, and investigate
whether visual saliency can help image captioning. Our study reveals that (1)
human attention behaviour differs between free-viewing and image description
tasks, with humans fixating on a greater variety of regions under the latter task;
(2) there is a strong relationship between described objects and attended
objects ( of the described objects are being attended), (3) a
convolutional neural network used as a feature encoder accounts for human-attended
regions during image captioning to a great extent (around ), (4)
the soft-attention mechanism differs from human attention, both spatially and
temporally, and there is low correlation between caption scores and attention
consistency scores, which indicates a large gap between humans and machines with
regard to top-down attention; and (5) by integrating the soft-attention model
with image saliency, we can significantly improve the model's performance on
Flickr30k and MSCOCO benchmarks. The dataset can be found at:
https://github.com/SenHe/Human-Attention-in-Image-Captioning.
Comment: To appear at ICCV 201
Some, And Possibly All, Scalar Inferences Are Not Delayed: Evidence For Immediate Pragmatic Enrichment
Scalar inferences are commonly generated when a speaker uses a weaker expression rather than a stronger alternative, e.g., “John ate some of the apples” implies that he did not eat them all. This article describes a visual-world study investigating how and when perceivers compute these inferences. Participants followed spoken instructions containing the scalar quantifier “some” directing them to interact with one of several referential targets (e.g., “Click on the girl who has some of the balloons”). Participants fixated on the target compatible with the implicated meaning of “some” and avoided a competitor compatible with the literal meaning before the disambiguating noun was heard. Further, convergence on the target was as fast for “some” as for the non-scalar quantifiers “none” and “all”. These findings indicate that the scalar inference is computed immediately and is not delayed relative to the literal interpretation of “some”. It is argued that previous demonstrations that scalar inferences increase processing time are not necessarily due to delays in generating the inference itself, but rather arise because integrating the interpretation of the inference with relevant information in the context may require additional time. With sufficient contextual support, processing delays disappear.
- …