
    Saliency Benchmarking Made Easy: Separating Models, Maps and Metrics

    Dozens of new fixation prediction models are published every year and compared on open benchmarks such as MIT300 and LSUN. However, progress in the field can be difficult to judge because models are compared using a variety of inconsistent metrics. Here we show that no single saliency map can perform well under all metrics. Instead, we propose a principled approach to solve the benchmarking problem by separating the notions of saliency models, maps and metrics. Inspired by Bayesian decision theory, we define a saliency model to be a probabilistic model of fixation density prediction and a saliency map to be a metric-specific prediction derived from the model density which maximizes the expected performance on that metric given the model density. We derive these optimal saliency maps for the most commonly used saliency metrics (AUC, sAUC, NSS, CC, SIM, KL-Div) and show that they can be computed analytically or approximated with high precision. We show that this leads to consistent rankings in all metrics and avoids the penalties of using one saliency map for all metrics. Our method allows researchers to have their model compete on many different metrics with the state of the art in those metrics: "good" models will perform well in all metrics. Comment: published at ECCV 201
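
As a rough illustration of why a single map may not suit every metric, the sketch below computes two of the metrics named in the abstract (NSS and CC) for a toy density and for its logarithm. The function names, the 2x2 toy density, and the fixation list are assumptions made for this example; this is not the paper's derivation of optimal maps, only the standard textbook definitions of the two metrics.

```python
import numpy as np

def nss(saliency_map, fixations):
    """Normalized Scanpath Saliency: mean z-scored map value at fixated pixels."""
    s = np.asarray(saliency_map, dtype=float)
    z = (s - s.mean()) / s.std()
    rows, cols = zip(*fixations)
    return float(z[list(rows), list(cols)].mean())

def cc(saliency_map, fixation_density):
    """Pearson correlation between a saliency map and an empirical density map."""
    a = np.asarray(saliency_map, dtype=float).ravel()
    b = np.asarray(fixation_density, dtype=float).ravel()
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical toy data: a 2x2 predicted density and two fixated pixels (row, col).
density = np.array([[0.1, 0.2],
                    [0.3, 0.4]])
fixations = [(1, 1), (1, 0)]

# The log of the density ranks pixels identically (so rank-based scores such as
# AUC are unaffected), yet NSS and CC change because they depend on the values.
log_density = np.log(density)
print("NSS, density    :", nss(density, fixations))
print("NSS, log-density:", nss(log_density, fixations))
print("CC,  density vs log-density:", cc(density, log_density))
```

Because a monotone transform leaves the pixel ranking intact but alters value-based scores, the map that is best for one metric need not be best for another, which is the motivation the abstract gives for metric-specific maps.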

    Attention Allocation Aid for Visual Search

    This paper outlines the development and testing of a novel, feedback-enabled attention allocation aid (AAAD), which uses real-time physiological data to improve human performance in a realistic sequential visual search task. By optimizing over search duration, the aid improves efficiency while preserving decision accuracy as the operator identifies and classifies targets within simulated aerial imagery. Specifically, using experimental eye-tracking data and measurements of target detectability across the human visual field, we develop functional models of detection accuracy as a function of search time, number of eye movements, scan path, and image clutter. These models are then used by the AAAD in conjunction with real-time eye position data to make probabilistic estimates of attained search accuracy and to recommend that the observer either move on to the next image or continue exploring the present image. An experimental evaluation in a scenario motivated by human supervisory control in surveillance missions confirms the benefits of the AAAD. Comment: To be presented at the ACM CHI conference in Denver, Colorado in May 201

    Behind the Machine's Gaze: Biologically Constrained Neural Networks Exhibit Human-like Visual Attention

    By and large, existing computational models of visual attention tacitly assume perfect vision and full access to the stimulus and thereby deviate from foveated biological vision. Moreover, modelling top-down attention is generally reduced to the integration of semantic features without incorporating the signal of high-level visual tasks that have been shown to partially guide human attention. We propose the Neural Visual Attention (NeVA) algorithm to generate visual scanpaths in a top-down manner. With our method, we explore the ability of neural networks on which we impose the biological constraints of foveated vision to generate human-like scanpaths. The scanpaths are generated to maximize performance with respect to the underlying visual task (i.e., classification or reconstruction). Extensive experiments show that the proposed method outperforms state-of-the-art unsupervised human attention models in terms of similarity to human scanpaths. Additionally, the flexibility of the framework allows quantitative investigation of the role of different tasks in the generated visual behaviours. Finally, we demonstrate the superiority of the approach in a novel experiment that investigates the utility of scanpaths in real-world applications with imperfect viewing conditions.

    Digital Oculomotor Biomarkers in Dementia

    Dementia is an umbrella term that covers a number of neurodegenerative syndromes featuring gradual disturbance of various cognitive functions that is severe enough to interfere with tasks of daily life. Dementia is frequently diagnosed only when pathological changes have been developing for years, symptoms of cognitive impairment are evident and the patient's quality of life has already deteriorated significantly. Although brain imaging and fluid biomarkers allow the monitoring of disease progression in vivo, they are expensive, invasive and not necessarily diagnostic in isolation. Recent studies suggest that eye-tracking technology is an innovative tool that holds promise for accelerating early detection of the disease, as well as supporting the development of strategies that minimise impairment during everyday activities. However, the optimal methods for quantitative evaluation of oculomotor behaviour during complex and naturalistic tasks in dementia have yet to be determined. This thesis investigates the development of computational tools and techniques to analyse eye movements of dementia patients and healthy controls under naturalistic and less constrained scenarios to identify novel digital oculomotor biomarkers. Three key contributions are made. First, the evaluation of the role of environment during navigation in patients with typical Alzheimer's disease and Posterior Cortical Atrophy compared to a control group using a combination of eye movement and egocentric video analysis. Second, the development of a novel method of extracting salient features directly from the raw eye-tracking data of a mixed sample of dementia patients during a novel instruction-less cognitive test to detect oculomotor biomarkers of dementia-related cognitive dysfunction. Third, the application of unsupervised anomaly detection techniques for visualisation of oculomotor anomalies during various cognitive tasks.
The work presented in this thesis furthers our understanding of dementia-related oculomotor dysfunction and gives direction to future research on computerised cognitive tests and ecological interventions.

    Learning a saliency map using fixated locations in natural scenes

    Inspired by the primate visual system, computational saliency models decompose visual input into a set of feature maps across spatial scales in a number of pre-specified channels. The outputs of these feature maps are summed to yield the final saliency map. Here we use a least-squares technique to learn the weights associated with these maps from subjects freely fixating natural scenes drawn from four recent eye-tracking data sets. Depending on the data set, the weights can be quite different, with the face and orientation channels usually more important than the color and intensity channels. Inter-subject differences are negligible. We also model a bias toward fixating at the center of images and consider both time-varying and constant factors that contribute to this bias. To compensate for the inadequacy of the standard method of judging performance (area under the ROC curve), we use two other metrics to comprehensively assess performance. Although our model retains the basic structure of the standard saliency model, it outperforms several state-of-the-art saliency algorithms. Furthermore, the simple structure makes the results applicable to numerous studies in psychophysics and physiology and leads to an extremely easy implementation for real-world applications.
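
The weight-learning idea can be sketched as an ordinary least-squares fit of a fixation map onto flattened channel maps. This is a minimal sketch under stated assumptions: the abstract does not say whether the fit is constrained, so plain unconstrained least squares is used here, and the helper name `learn_channel_weights`, the random 32x32 maps, and the "true" weights are hypothetical values invented for the example.

```python
import numpy as np

def learn_channel_weights(feature_maps, fixation_map):
    """Fit one weight per channel so the weighted sum of maps approximates the target."""
    # Each flattened feature map becomes one column of the design matrix.
    X = np.stack([np.asarray(f, dtype=float).ravel() for f in feature_maps], axis=1)
    y = np.asarray(fixation_map, dtype=float).ravel()
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights

# Hypothetical synthetic data: four random channel maps and a noiseless target
# built from known weights, so the fit can be checked against them.
rng = np.random.default_rng(0)
channels = ["color", "intensity", "orientation", "face"]
maps = [rng.random((32, 32)) for _ in channels]
true_weights = np.array([0.1, 0.2, 0.3, 0.4])
target = sum(w * m for w, m in zip(true_weights, maps))

weights = learn_channel_weights(maps, target)
for name, w in zip(channels, weights):
    print(f"{name}: {w:.3f}")
```

With real eye-tracking data the target would be an empirical fixation map rather than a synthetic sum, and a non-negativity constraint on the weights might be preferable; both are left out of this sketch.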

    Scanpath modeling and classification with Hidden Markov Models

    How people look at visual information reveals fundamental information about them: their interests and their states of mind. Previous studies showed that a scanpath, i.e., the sequence of eye movements made by an observer exploring a visual stimulus, can be used to infer observer-related (e.g., task at hand) and stimulus-related (e.g., image semantic category) information. However, eye movements are complex signals and many of these studies rely on limited gaze descriptors and bespoke datasets. Here, we provide a turnkey method for scanpath modeling and classification. This method relies on variational hidden Markov models (HMMs) and discriminant analysis (DA). HMMs encapsulate the dynamic and individualistic dimensions of gaze behavior, allowing DA to capture systematic patterns diagnostic of a given class of observers and/or stimuli. We test our approach on two very different datasets. First, we use fixations recorded while viewing 800 static natural scene images, and infer an observer-related characteristic: the task at hand. We achieve an average correct classification rate of 55.9% (chance = 33%). We show that correct classification rates positively correlate with the number of salient regions present in the stimuli. Second, we use eye positions recorded while viewing 15 conversational videos, and infer a stimulus-related characteristic: the presence or absence of the original soundtrack. We achieve an average correct classification rate of 81.2% (chance = 50%). HMMs make it possible to integrate bottom-up, top-down, and oculomotor influences into a single model of gaze behavior. This synergistic approach between behavior and machine learning will open new avenues for simple quantification of gazing behavior. We release SMAC with HMM, a Matlab toolbox freely available to the community under an open-source license agreement.
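
To give a feel for how an HMM can score a scanpath, the sketch below uses a simpler technique than the one in the abstract: discrete-emission HMMs scored with the scaled forward algorithm, and classification by picking the model with the higher likelihood. It does not use variational inference or discriminant analysis, and it is unrelated to the SMAC with HMM toolbox API. The two models, their parameters, and the encoding of fixations as region-of-interest indices are all hypothetical.

```python
import numpy as np

def log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence (scaled forward algorithm)."""
    start, trans, emit = (np.asarray(a, dtype=float) for a in (start, trans, emit))
    alpha = start * emit[:, obs[0]]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate state beliefs one step, then weight by the emission probability.
            alpha = (alpha @ trans) * emit[:, obs[t]]
        scale = alpha.sum()
        total += np.log(scale)
        alpha = alpha / scale  # rescale to avoid numerical underflow
    return float(total)

# Hypothetical models: two hidden states, three regions of interest (0, 1, 2).
EMIT = [[0.8, 0.1, 0.1],
        [0.1, 0.1, 0.8]]
MODELS = {
    "sticky":    dict(start=[0.5, 0.5], trans=[[0.9, 0.1], [0.1, 0.9]], emit=EMIT),
    "switching": dict(start=[0.5, 0.5], trans=[[0.1, 0.9], [0.9, 0.1]], emit=EMIT),
}

def classify(scanpath):
    """Return the name of the model under which the scanpath is most likely."""
    return max(MODELS, key=lambda name: log_likelihood(scanpath, **MODELS[name]))

print(classify([0, 0, 0, 2, 2, 2]))
print(classify([0, 2, 0, 2, 0, 2]))
```

In practice each class model would be fitted to recorded scanpaths rather than written by hand, and continuous fixation coordinates would call for Gaussian rather than discrete emissions.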

    A computational model of visual attention.

    Visual attention is a process by which the Human Visual System (HVS) selects the most important information from a scene. Visual attention models are computational or mathematical models developed to predict this information. The performance of state-of-the-art visual attention models is limited in terms of prediction accuracy and computational complexity. Despite a significant amount of active research in this area, modelling visual attention is still an open research challenge. This thesis proposes a novel computational model of visual attention that achieves higher prediction accuracy with low computational complexity. A new bottom-up visual attention model based on in-focus regions is proposed. To develop the model, an image dataset is created by capturing images with in-focus and out-of-focus regions. The Discrete Cosine Transform (DCT) spectrum of these images is investigated qualitatively and quantitatively to discover the key frequency coefficients that correspond to the in-focus regions. The model detects these key coefficients by formulating a novel relation between the in-focus and out-of-focus regions in the frequency domain. These frequency coefficients are used to detect the salient in-focus regions. The simulation results show that this attention model achieves good prediction accuracy with low complexity. The prediction accuracy of the proposed in-focus visual attention model is further improved by incorporating the sensitivity of the HVS towards the image centre and human faces. Moreover, the computational complexity is further reduced by using the Integer Cosine Transform (ICT). The model's parameters are tuned using a hill-climbing approach to optimise accuracy. The performance has been analysed qualitatively and quantitatively using two large image datasets with eye-tracking fixation ground truth.
The results show that the model achieves higher prediction accuracy with lower computational complexity than state-of-the-art visual attention models. The proposed model is useful for predicting human fixations in computationally constrained environments. It is mainly useful in applications such as perceptual video coding, image quality assessment, object recognition and image segmentation.