
    How is Gaze Influenced by Image Transformations? Dataset and Model

    Data size is the bottleneck for developing deep saliency models, because collecting eye-movement data is very time-consuming and expensive. Most current studies on human attention and saliency modeling have used high-quality, undistorted stimuli. In the real world, however, captured images undergo various types of transformations. Can we use these transformations to augment existing saliency datasets? Here, we first create a novel saliency dataset comprising the fixations of 10 observers over 1,900 images degraded by 19 types of transformations. Second, by analyzing eye movements, we find that observers look at different locations over transformed versus original images. Third, we use the new data over transformed images, called data augmentation transformations (DATs), to train deep saliency models. We find that label-preserving DATs with negligible impact on human gaze boost saliency prediction, whereas other DATs that severely alter human gaze degrade performance. These label-preserving, valid augmentation transformations provide a way to enlarge existing saliency datasets. Finally, we introduce a novel saliency model based on a generative adversarial network (dubbed GazeGAN). A modified U-Net serves as the generator of GazeGAN, combining classic skip connections with a novel center-surround connection (CSC) in order to leverage multi-level features. We also propose a histogram loss based on the Alternative Chi-Square Distance (ACS HistLoss) to refine the saliency map in terms of its luminance distribution. Extensive experiments and comparisons over 3 datasets indicate that GazeGAN achieves the best performance on popular saliency evaluation metrics and is more robust to various perturbations. Our code and data are available at: https://github.com/CZHQuality/Sal-CFS-GAN
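As a rough illustration of what a histogram loss of this kind computes, the sketch below compares the luminance histograms of a predicted and a ground-truth saliency map using an alternative chi-square distance (squared difference normalized by the bin sum). The abstract does not specify GazeGAN's exact formulation, bin count, or normalization, so every such detail here is an assumption.

```python
import numpy as np

def acs_hist_loss(pred, target, bins=256, eps=1e-8):
    """Illustrative histogram loss: an alternative chi-square distance
    between the luminance histograms of two saliency maps in [0, 1].
    This is a sketch, not the published ACS HistLoss definition."""
    # Bin both maps over the same luminance range
    h_pred, _ = np.histogram(pred, bins=bins, range=(0.0, 1.0))
    h_tgt, _ = np.histogram(target, bins=bins, range=(0.0, 1.0))
    # Normalize counts to probability distributions
    h_pred = h_pred / max(h_pred.sum(), 1)
    h_tgt = h_tgt / max(h_tgt.sum(), 1)
    # Alternative chi-square: squared difference over the bin sum
    return float(np.sum((h_pred - h_tgt) ** 2 / (h_pred + h_tgt + eps)))
```

Identical maps give a loss of zero, while maps with disjoint luminance distributions give a strictly positive loss, which is the property a luminance-refinement term needs.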

    Visual attention in the real world

    Humans typically direct their gaze and attention at locations important for the tasks they are engaged in. By measuring the direction of gaze, the relative importance of each location can be estimated, which can reveal how cognitive processes choose where gaze is directed. For decades, this has been done in laboratory setups, which have the advantage of being well controlled. Here, visual attention is studied in more life-like situations, which allows testing the ecological validity of laboratory results and allows the use of real-life setups that are hard to mimic in a laboratory. All four studies in this thesis contribute to our understanding of visual attention and perception in situations more complex than those found in traditional laboratory experiments. Bottom-up models of attention use the visual input to predict attention or even the direction of gaze. In such models, the input image is first analyzed for each of several features. In the classic Saliency Map model, these features are color contrast, luminance contrast, and orientation contrast. The "interestingness" of each location in the image is represented in a 'conspicuity map', one for each feature. The Saliency Map model then combines these conspicuity maps by linear addition, and this additivity has recently been challenged. The alternative is to use the maxima across all conspicuity maps. In the first study, the features color contrast and luminance contrast were manipulated in photographs of natural scenes to test which of these mechanisms best predicts human behavior. It was shown that linear addition, as in the original model, matches human behavior best. As all the assumptions of the Saliency Map model about the processes preceding the linear addition of the conspicuity maps are based on physiological research, this result constrains the mechanistic assumptions of future models.
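The two combination rules contrasted in the first study, linear addition versus a pointwise maximum over the per-feature conspicuity maps, can be sketched in a few lines. This is illustrative only: the actual Saliency Map model also includes normalization and multi-scale center-surround processing not shown here, and the toy maps are made up for the example.

```python
import numpy as np

def combine_linear(conspicuity_maps):
    """Classic Saliency Map rule: sum the per-feature conspicuity maps."""
    return np.sum(conspicuity_maps, axis=0)

def combine_max(conspicuity_maps):
    """Challenged alternative: take the pointwise maximum across maps."""
    return np.max(conspicuity_maps, axis=0)

# Two toy 2x2 conspicuity maps (e.g., color and luminance contrast)
maps = np.array([[[1.0, 1.0],
                  [0.0, 1.0]],
                 [[1.0, 2.0],
                  [1.0, 0.0]]])
```

The two rules diverge wherever several features are moderately conspicuous at the same location: linear addition lets such locations outcompete a single strong feature, while the maximum rule ignores all but the strongest feature, which is what makes the manipulation in the first study diagnostic.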
If models of visual attention are to have ecological validity, visual attention must be compared between laboratory and real-world conditions, and this is done in the second study. In the first condition, eye movements and head-centered, first-person perspective movies were recorded while participants explored 15 real-world environments ("free exploration"). Clips from these movies were shown to participants in two laboratory tasks. First, the movies were replayed as they were recorded ("video replay"), and second, a shuffled selection of frames was shown for 1 second each ("1s frame replay"). Eye-movement recordings from all three conditions revealed that, compared with 1s frame replay, the video replay condition was qualitatively more similar to the free exploration condition with respect to the distribution of gaze and the relationship between gaze and model saliency, and was quantitatively better able to predict free exploration gaze. Furthermore, the onset of a new frame in 1s frame replay evoked a reorientation of gaze towards the center. That is, the event of presenting a stimulus in a laboratory setup affects attention in a way unlikely to occur in real life. In conclusion, video replay is a better model for real-world visual input. The hypothesis that walking on more irregular terrain requires visual attention to be directed at the path more was tested on a local street ("Hirschberg") in the third study. Participants walked on both sides of this inclined street: a cobbled road and the immediately adjacent, irregular steps. The environment and instructions were kept constant. Gaze was directed at the path more when participants walked on the steps than on the road. This was accomplished by pointing both the head and the eyes lower on the steps than on the road, while only the eye-in-head orientation was spread out more along the vertical on the steps, indicating more or larger eye movements on the more irregular steps.
These results confirm earlier findings that eye and head movements play distinct roles in directing gaze in real-world situations. Furthermore, they show that implicit tasks (here, not falling) affect visual attention as much as explicit tasks do. The last study asks whether actions affect perception. An ambiguous stimulus that is alternately perceived as rotating clockwise or counterclockwise (the 'percept') was used. When participants had to rotate a manipulandum continuously in a pre-defined direction, either clockwise or counterclockwise, and reported their concurrent percept with a keyboard, percepts were not affected by the movements. If participants had to use the manipulandum to indicate their percept, by rotating either congruently or incongruently with it, the movements did affect perception. This shows that ambiguity in visual input is resolved by relying on motor signals, but only when they are relevant for the task at hand. By using natural stimuli, by comparing behavior in the laboratory with behavior in the real world, by performing an experiment on the street, and by testing how two diverse but everyday sources of information are integrated, the faculty of vision was studied in more life-like situations. The validity of some laboratory work has been examined and confirmed, and first steps in doing experiments in real-world situations have been made. Both seem to be promising approaches for future research.