10,386 research outputs found

    Goal Directed Visual Search Based on Color Cues: Co-operative Effects of Top-Down & Bottom-Up Visual Attention

    Focus of attention plays an important role in perception of the visual environment. Certain objects stand out in the scene irrespective of the observer's goals. This form of attention capture, in which stimulus feature saliency captures our attention, is bottom-up in nature. Prior knowledge about objects and scenes can also influence our attention; this form of attention capture, driven by higher-level knowledge about the objects, is called top-down attention. Top-down attention acts as a feedback mechanism for the feed-forward bottom-up attention. Visual search is the combined result of the top-down (cognitive cue) system and the bottom-up (low-level feature saliency) system. In my thesis I investigate the process of goal-directed visual search based on a color cue, i.e., searching for objects of a certain color. The computational model generates saliency maps that predict the locations of interest during a visual search. The model-generated saliency maps were compared with the results of psychophysical human eye-tracking experiments; the analysis provides a measure of how well the human eye movements correspond with the locations predicted by the saliency maps. The eye-tracking equipment in the Visual Perceptual Laboratory in the Center for Imaging Science was used to conduct the experiments.
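    As an illustration of the kind of combination this thesis describes, the sketch below modulates a toy bottom-up contrast map with a top-down map that scores similarity to a cued target color. The function names, the 3x3 contrast measure, and the weighting scheme are illustrative assumptions, not the thesis' actual model.

```python
# Minimal sketch: combine a toy bottom-up contrast map with a top-down
# color-cue map via a weighted sum (illustrative, not the thesis' model).
import numpy as np

def bottom_up_saliency(image):
    """Toy bottom-up map: absolute difference between a pixel and its 3x3 local mean."""
    gray = image.mean(axis=2)
    pad = np.pad(gray, 1, mode="edge")
    local_mean = sum(pad[i:i + gray.shape[0], j:j + gray.shape[1]]
                     for i in range(3) for j in range(3)) / 9.0
    return np.abs(gray - local_mean)

def top_down_color_cue(image, target_rgb, sigma=40.0):
    """Top-down map: Gaussian similarity of each pixel to the cued target color."""
    dist2 = ((image - np.asarray(target_rgb, dtype=float)) ** 2).sum(axis=2)
    return np.exp(-dist2 / (2.0 * sigma ** 2))

def combined_saliency(image, target_rgb, w_top_down=0.6):
    bu = bottom_up_saliency(image)
    td = top_down_color_cue(image, target_rgb)
    # Normalize each map to [0, 1] before the weighted combination.
    bu = (bu - bu.min()) / (bu.max() - bu.min() + 1e-8)
    td = (td - td.min()) / (td.max() - td.min() + 1e-8)
    return w_top_down * td + (1.0 - w_top_down) * bu

# Example: search for a red target in a random stand-in image.
img = np.random.rand(64, 64, 3) * 255
sal = combined_saliency(img, target_rgb=(255, 0, 0))
print(sal.shape, float(sal.max()))
```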

    Predicting visual fixations on video based on low-level visual features

    To what extent can a computational model of bottom-up visual attention predict what an observer is looking at? What is the contribution of low-level visual features to the deployment of attention? To answer these questions, a new spatio-temporal computational model is proposed. This model incorporates several visual features; therefore, a fusion algorithm is required to combine the different saliency maps (achromatic, chromatic and temporal). To quantitatively assess the model's performance, eye movements were recorded while naive observers viewed natural dynamic scenes. Four complementary metrics were used. In addition, predictions from the proposed model are compared to the predictions from a state-of-the-art model [Itti's model (Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254-1259)] and from three non-biologically-plausible models (uniform, flicker and centered models). Regardless of the metric used, the proposed model shows significant improvement over the selected benchmarking models (except the centered model). Conclusions are drawn regarding both the influence of low-level visual features over time and the central bias in an eye-tracking experiment.
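    The fusion step can be pictured with the small sketch below: each feature map (achromatic, chromatic, temporal) is peak-normalized and combined with a reinforcement term that boosts locations salient in several maps at once. This is a generic fusion heuristic written for illustration; the paper's actual fusion algorithm and weights are not reproduced here.

```python
# Hedged sketch of multi-map fusion: additive combination plus a pairwise
# reinforcement term (illustrative; not the paper's exact algorithm).
import numpy as np

def peak_normalize(m, eps=1e-8):
    m = m - m.min()
    return m / (m.max() + eps)

def fuse_saliency(achromatic, chromatic, temporal, alpha=0.5):
    a, c, t = (peak_normalize(m) for m in (achromatic, chromatic, temporal))
    additive = (a + c + t) / 3.0          # independent contributions
    reinforcing = a * c + a * t + c * t   # pairwise agreement between maps
    return peak_normalize(additive + alpha * reinforcing)

# Example with random stand-in maps for one video frame.
h, w = 72, 96
fused = fuse_saliency(np.random.rand(h, w), np.random.rand(h, w), np.random.rand(h, w))
print(fused.shape, float(fused.min()), float(fused.max()))
```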

    Predicting Visual Attention and Distraction During Visual Search Using Convolutional Neural Networks

    Most studies in computational modeling of visual attention address task-free observation of images. Free-viewing saliency, however, covers only a limited set of everyday scenarios: most visual activities are goal-oriented and demand a great amount of top-down attention control, and visual search in particular demands more top-down control of attention than free viewing. In this paper, we present two approaches to model the visual attention and distraction of observers during visual search. Our first approach adapts a light-weight free-viewing saliency model to predict the eye fixation density maps of human observers over the pixels of search images, using a two-stream convolutional encoder-decoder network trained and evaluated on the COCO-Search18 dataset. This method predicts which locations are more distracting when searching for a particular target. Our network achieves good results on standard saliency metrics (AUC-Judd=0.95, AUC-Borji=0.85, sAUC=0.84, NSS=4.64, KLD=0.93, CC=0.72, SIM=0.54, and IG=2.59). Our second approach is object-based and predicts the distractor and target objects during visual search, where distractors are all objects except the target that observers fixate on during search. This method uses a Mask-RCNN segmentation network pre-trained on MS-COCO and fine-tuned on the COCO-Search18 dataset. We release our segmentation annotations of targets and distractors in COCO-Search18 for three target categories: bottle, bowl, and car. The average scores over the three categories are F1-score=0.64, MAP(IoU=0.5)=0.57, and MAR(IoU=0.5)=0.73. Our implementation code in TensorFlow is publicly available at https://github.com/ManooshSamiei/Distraction-Visual-Search .
    Comment: 33 pages, 24 figures, 12 tables; this is a pre-print manuscript currently under review in the Journal of Vision.
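    For reference, the reported NSS, CC, and SIM scores can be computed from a predicted saliency map and ground-truth fixation data as in the hedged sketch below, written against the standard metric definitions rather than the authors' released code.

```python
# Hedged reference implementations of three common saliency metrics (NSS, CC,
# SIM), following their standard definitions (not the authors' code).
import numpy as np

def nss(saliency, fixation_mask):
    """Normalized Scanpath Saliency: mean z-scored saliency at fixated pixels."""
    s = (saliency - saliency.mean()) / (saliency.std() + 1e-8)
    return float(s[fixation_mask.astype(bool)].mean())

def cc(saliency, density):
    """Pearson linear correlation between predicted and ground-truth density maps."""
    s = saliency - saliency.mean()
    d = density - density.mean()
    return float((s * d).sum() / (np.sqrt((s ** 2).sum() * (d ** 2).sum()) + 1e-8))

def sim(saliency, density):
    """Histogram intersection of the two maps, each normalized to sum to 1."""
    s = saliency / (saliency.sum() + 1e-8)
    d = density / (density.sum() + 1e-8)
    return float(np.minimum(s, d).sum())

# Example on random stand-in data.
pred = np.random.rand(60, 80)
gt_density = np.random.rand(60, 80)
gt_fix = np.random.rand(60, 80) > 0.99   # sparse binary fixation map
print(nss(pred, gt_fix), cc(pred, gt_density), sim(pred, gt_density))
```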

    Pseudo-Saliency for Human Gaze Simulation

    Understanding and modeling human vision is an endeavor which can be, and has been, approached from multiple disciplines. Saliency prediction is a subdomain of computer vision which tries to predict the human eye movements made during either guided or free viewing of static images. In the context of simulation and animation, vision is often also modeled for the purpose of realistic and reactive autonomous agents; such models tend to focus on plausible gaze movements of the eyes and head and are less concerned with scene understanding through visual stimuli. Bringing techniques and knowledge over from computer vision into simulated virtual humans requires a methodology for generating saliency maps. Traditional saliency models are ill suited for this due to their large computational costs as well as the lack of control inherent in most deep-network-based models. The primary contribution of this thesis is a proposed model for generating pseudo-saliency maps for virtual characters, Parametric Saliency Maps (PSM). This parametric model calculates saliency as a weighted combination of 7 factors selected from the saliency and attention literature. Experiments show that the model is expressive enough to mimic results from state-of-the-art saliency models to a high degree of similarity, while being extraordinarily cheap to compute by virtue of running in the graphics processing pipeline of a simulation. As a secondary contribution, two models are proposed for saliency-driven gaze control. These models are expressive and present novel approaches for controlling the gaze of a virtual character using only visual saliency maps as input.
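    The saliency-driven gaze control idea can be illustrated with a minimal winner-take-all loop with inhibition of return, as sketched below; the thesis' two gaze-control models are more elaborate, and the function and parameter names here are illustrative only.

```python
# Minimal sketch of saliency-driven gaze selection: repeatedly pick the
# saliency peak (winner-take-all) and suppress a disc around it
# (inhibition of return). Illustrative only.
import numpy as np

def next_fixations(saliency, n_fixations=5, inhibition_radius=8):
    sal = saliency.astype(float).copy()
    h, w = sal.shape
    yy, xx = np.mgrid[0:h, 0:w]
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(sal), sal.shape)   # winner-take-all
        fixations.append((int(y), int(x)))
        # Inhibition of return: zero out a disc around the chosen location.
        sal[(yy - y) ** 2 + (xx - x) ** 2 <= inhibition_radius ** 2] = 0.0
    return fixations

# Example: scan a random pseudo-saliency map.
print(next_fixations(np.random.rand(48, 64)))
```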

    VST++: Efficient and Stronger Visual Saliency Transformer

    While previous CNN-based models have exhibited promising results for salient object detection (SOD), their ability to explore global long-range dependencies is restricted. Our previous work, the Visual Saliency Transformer (VST), addressed this constraint from a transformer-based sequence-to-sequence perspective, to unify RGB and RGB-D SOD. In VST, we developed a multi-task transformer decoder that concurrently predicts saliency and boundary outcomes in a pure transformer architecture. Moreover, we introduced a novel token upsampling method called reverse T2T for predicting a high-resolution saliency map effortlessly within transformer-based structures. Building upon the VST model, we further propose an efficient and stronger VST version in this work, i.e., VST++. To mitigate the computational costs of the VST model, we propose a Select-Integrate Attention (SIA) module, which partitions the foreground into fine-grained segments and aggregates background information into a single coarse-grained token. To incorporate 3D depth information at low cost, we design a novel depth position encoding method tailored for depth maps. Furthermore, we introduce a token-supervised prediction loss to provide straightforward guidance for the task-related tokens. We evaluate our VST++ model across various transformer-based backbones on RGB, RGB-D, and RGB-T SOD benchmark datasets. Experimental results show that our model outperforms existing methods while achieving a 25% reduction in computational costs without significant performance compromise. The demonstrated strong generalization ability, enhanced performance, and heightened efficiency of our VST++ model highlight its potential.
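    The Select-Integrate idea, as described in the abstract, can be sketched schematically: keep the fine-grained tokens judged to be foreground, pool the remaining background tokens into a single coarse token, and attend over the reduced key/value set. The NumPy sketch below is an illustrative approximation under assumed shapes and an assumed foreground criterion, not the authors' implementation.

```python
# Schematic sketch of a select-integrate style attention reduction:
# foreground tokens are kept, background tokens are mean-pooled into one
# coarse token, then standard scaled dot-product attention is applied.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def select_integrate_attention(q, kv_tokens, fg_scores, keep_ratio=0.25):
    """q: (Nq, d) queries; kv_tokens: (N, d); fg_scores: (N,) foreground scores."""
    n_keep = max(1, int(len(kv_tokens) * keep_ratio))
    order = np.argsort(fg_scores)[::-1]
    fg = kv_tokens[order[:n_keep]]                               # fine-grained foreground tokens
    bg = kv_tokens[order[n_keep:]].mean(axis=0, keepdims=True)   # one coarse background token
    kv = np.concatenate([fg, bg], axis=0)
    attn = softmax(q @ kv.T / np.sqrt(q.shape[-1]))              # scaled dot-product attention
    return attn @ kv

# Example with random tokens (196 patch tokens, 32-dim embeddings assumed).
d = 32
out = select_integrate_attention(np.random.randn(10, d),
                                 np.random.randn(196, d),
                                 np.random.rand(196))
print(out.shape)   # (10, 32)
```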

    Multiscale Discriminant Saliency for Visual Attention

    Bottom-up saliency, an early stage of human visual attention, can be considered as a binary classification problem between center and surround classes. The discriminant power of features for this classification is measured as the mutual information between the features and the distribution of the two classes. The estimated discrepancy between the two feature classes depends strongly on the scale levels considered; multi-scale structure and discriminant power are therefore integrated by employing discrete wavelet features and a hidden Markov tree (HMT). From the wavelet coefficients and the hidden Markov tree parameters, quad-tree-like label structures are constructed and used in the maximum a posteriori (MAP) estimation of the hidden class variables at the corresponding dyadic sub-squares. A saliency value for each dyadic square at each scale level is then computed from the discriminant power principle and the MAP estimate. Finally, the maps across multiple scales are integrated into the final saliency map by an information maximization rule. Both standard quantitative tools such as NSS, LCC and AUC, and qualitative assessments are used to evaluate the proposed multiscale discriminant saliency method (MDIS) against the well-known information-based saliency method AIM on the Bruce database with eye-tracking data. Simulation results are presented and analyzed to verify the validity of MDIS as well as to point out its disadvantages for further research directions.
    Comment: 16 pages, ICCSA 2013 - BIOCA session
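    The discriminant principle can be illustrated in isolation: score a location by the mutual information between a feature and the binary center/surround label, estimated here with simple histograms. The sketch below omits the paper's wavelet features and hidden Markov tree and is only a minimal illustration of the information-theoretic criterion.

```python
# Minimal sketch of discriminant center/surround saliency: mutual information
# between a feature and the binary class label, estimated from histograms.
import numpy as np

def mutual_information(center_vals, surround_vals, bins=16):
    lo = min(center_vals.min(), surround_vals.min())
    hi = max(center_vals.max(), surround_vals.max())
    edges = np.linspace(lo, hi, bins + 1)
    pc, _ = np.histogram(center_vals, bins=edges)
    ps, _ = np.histogram(surround_vals, bins=edges)
    # Joint distribution p(feature bin, class) and its marginals.
    joint = np.stack([pc, ps], axis=1).astype(float)
    joint /= joint.sum()
    pf = joint.sum(axis=1, keepdims=True)     # p(feature bin)
    pcl = joint.sum(axis=0, keepdims=True)    # p(class)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pf @ pcl)[nz])).sum())

# Example: a center patch whose feature distribution differs from its surround
# receives a higher discriminant saliency score than one that matches it.
rng = np.random.default_rng(0)
print(mutual_information(rng.normal(2.0, 1.0, 500), rng.normal(0.0, 1.0, 2000)))
print(mutual_information(rng.normal(0.0, 1.0, 500), rng.normal(0.0, 1.0, 2000)))
```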