33 research outputs found

    What has been missed for predicting human attention in viewing driving clips?

    Get PDF
    Recent research progress on the topic of human visual attention allocation in scene perception and its simulation is based mainly on studies with static images. However, natural vision requires us to extract visual information that constantly changes due to egocentric movements or the dynamics of the world. It is unclear to what extent spatio-temporal regularity, an inherent regularity in dynamic vision, affects human gaze distribution and saliency computation in visual attention models. In this free-viewing eye-tracking study we manipulated the spatio-temporal regularity of traffic videos by presenting them in normal video sequence, reversed video sequence, normal frame sequence, and randomised frame sequence. The recorded human gaze allocation was then used as the ‘ground truth’ to examine the predictive ability of a number of state-of-the-art visual attention models. The analysis revealed high inter-observer agreement across individual human observers, but all the tested attention models performed significantly worse than humans. The inferior predictability of the models was evident from gaze predictions that were indistinguishable across stimulus presentation sequences, and from a weak central fixation bias. Our findings suggest that a realistic visual attention model for the processing of dynamic scenes should incorporate human visual sensitivity to spatio-temporal regularity and a central fixation bias.
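    The evaluation above compares model-predicted saliency against recorded gaze. Below is a minimal sketch of how such a comparison is typically scored, assuming fixations are given as pixel coordinates and using a Gaussian centre prior as the central-fixation-bias baseline; all names and values here are illustrative, not the study's own code:

        import numpy as np

        def nss(sal_map, fixations):
            """Normalized Scanpath Saliency: mean of the z-scored saliency
            map sampled at the fixated pixels."""
            s = (sal_map - sal_map.mean()) / (sal_map.std() + 1e-8)
            rows, cols = fixations[:, 0], fixations[:, 1]
            return s[rows, cols].mean()

        def center_prior(h, w, sigma_frac=0.25):
            """Isotropic Gaussian centred on the frame: a common
            central-fixation-bias baseline."""
            ys, xs = np.mgrid[0:h, 0:w]
            cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
            sigma = sigma_frac * min(h, w)
            return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

        # Hypothetical fixations as (row, col) coordinates from an eye-tracker.
        fix = np.array([[120, 310], [130, 298], [260, 400]])
        baseline = center_prior(480, 640)
        print("center-prior NSS:", nss(baseline, fix))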

    FastSal: a Computationally Efficient Network for Visual Saliency Prediction

    Get PDF
    This paper focuses on the problem of visual saliency prediction, predicting regions of an image that tend to attract human visual attention, under a constrained computational budget. We modify and test various recent efficient convolutional neural network architectures like EfficientNet and MobileNetV2 and compare them with existing state-of-the-art saliency models such as SalGAN and DeepGaze II, both in terms of standard accuracy metrics like AUC and NSS, and in terms of computational complexity and model size. We find that MobileNetV2 makes an excellent backbone for a visual saliency model and can be effective even without a complex decoder. We also show that knowledge transfer from a more computationally expensive model like DeepGaze II can be achieved via pseudo-labelling an unlabelled dataset, and that this approach gives results on par with many state-of-the-art algorithms at a fraction of the computational cost and model size. Source code is available at https://github.com/feiyanhu/FastSal
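    As a rough illustration of the backbone idea, here is a minimal PyTorch sketch of a MobileNetV2 feature extractor with a deliberately simple decoder; the class name, head design, and training targets (e.g. DeepGaze II predictions used as pseudo-labels on unlabelled images) are assumptions for illustration, not FastSal's actual architecture:

        import torch
        import torch.nn as nn
        from torchvision.models import mobilenet_v2

        class TinySaliencyNet(nn.Module):
            """MobileNetV2 features plus a simple decoder: a 1x1 convolution
            to one channel, then bilinear upsampling to input resolution."""
            def __init__(self):
                super().__init__()
                self.backbone = mobilenet_v2(weights="IMAGENET1K_V1").features
                self.head = nn.Conv2d(1280, 1, kernel_size=1)

            def forward(self, x):
                feats = self.backbone(x)          # (N, 1280, H/32, W/32)
                logits = self.head(feats)         # (N, 1, H/32, W/32)
                up = nn.functional.interpolate(
                    logits, size=x.shape[-2:], mode="bilinear",
                    align_corners=False)
                return torch.sigmoid(up)          # saliency map in [0, 1]

        model = TinySaliencyNet().eval()
        with torch.no_grad():
            sal = model(torch.randn(1, 3, 256, 256))
        print(sal.shape)  # torch.Size([1, 1, 256, 256])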

    Intelligent and Energy-Efficient Data Prioritization in Green Smart Cities: Current Challenges and Future Directions

    Full text link
    The excessive use of digital devices such as cameras and smartphones in smart cities has produced huge data repositories that require automatic tools for efficient browsing, searching, and management. Data prioritization (DP) is a technique that produces a condensed form of the original data by analyzing its contents. Current DP studies are either concerned with data collected through stable capturing devices or focused on prioritization of data of a certain type, such as surveillance, sports, or industry. This necessitates DP tools that intelligently and cost-effectively prioritize a large variety of data for detecting abnormal events and hence manage them effectively, thereby making current smart cities greener. In this article, we first carry out an in-depth investigation of two decades of approaches and trends in DP for data of different natures, genres, and domains in green smart cities. Next, we propose an energy-efficient DP framework based on the intelligent integration of the Internet of Things, artificial intelligence, and big data analytics. Experimental evaluation on real-world surveillance data verifies the energy efficiency and applicability of this framework in green smart cities. Finally, this article highlights the key challenges of DP, its future requirements, and propositions for its integration into green smart cities.
    This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (no. 2016R-1A2B4011712).
    Muhammad, K.; Lloret, J.; Baik, S.W. (2019). Intelligent and Energy-Efficient Data Prioritization in Green Smart Cities: Current Challenges and Future Directions. IEEE Communications Magazine, 57(2), 60-65. https://doi.org/10.1109/MCOM.2018.1800371
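    As a loose illustration of content-based prioritization, the following sketch scores video frames by inter-frame change and keeps only the most eventful fraction; the scoring rule and names are hypothetical stand-ins for the learned abnormality models an actual DP pipeline would use:

        import cv2
        import numpy as np

        def prioritize_frames(video_path, keep_ratio=0.1):
            """Score each frame by how much it differs from the previous one
            (grayscale mean absolute difference) and keep the top fraction,
            reducing the volume of data stored or transmitted."""
            cap = cv2.VideoCapture(video_path)
            scores, frames, prev = [], [], None
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                scores.append(0.0 if prev is None
                              else float(np.mean(cv2.absdiff(gray, prev))))
                frames.append(frame)
                prev = gray
            cap.release()
            k = max(1, int(keep_ratio * len(frames)))
            top = sorted(np.argsort(scores)[-k:])   # k most "eventful" frames
            return [frames[i] for i in top]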

    The Alleviation of Perceptual Blindness During Driving in Urban Areas Guided by Saccades Recommendation

    Get PDF
    In advanced industrial applications, computational visual attention models (CVAMs) can predict visual attention very similarly to actual human attention allocation. This has been used as a very important component of technology in advanced driver assistance systems (ADAS). Given that the biological inspiration of driving-related CVAMs can be obtained from skilled drivers in complex driving conditions, in which the driver’s attention is constantly directed at various salient and informative visual stimuli by alternating the eye fixations via saccades to drive safely, this paper proposes a saccade recommendation strategy to enhance driving safety in urban road environments, particularly when the driver’s vision is impaired by visual crowding. The altered and directed saccades are collected and optimized by extracting four innate features from human dynamic vision. A neural network is designed to classify preferable saccades to reduce perceptual blindness due to visual crowding in urban scenes. A state-of-the-art CVAM is first adopted to localize the predicted eye fixation locations (EFLs) in driving video clips. In addition, human subjects’ gaze at the recommended EFLs is measured via an eye-tracker. The time delays between the predicted EFLs and drivers’ EFLs are analyzed under different driving conditions, followed by the time delays between the predicted EFLs and the driver’s hand control. The visually safe margin is then measured by relating the driving speed to the total delay. Experimental results demonstrate that the recommended saccades can effectively reduce perceptual blindness, helping to further improve road driving safety.
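    The visually safe margin reduces to simple arithmetic: the distance travelled during the total delay between a recommended EFL and the driver's response. A sketch with assumed delay values, not the paper's measured ones:

        def visually_safe_margin(speed_kmh, gaze_delay_s, control_delay_s):
            """Distance travelled while the driver's gaze and hand control
            catch up with a recommended eye fixation location (EFL)."""
            speed_ms = speed_kmh / 3.6
            total_delay = gaze_delay_s + control_delay_s
            return speed_ms * total_delay

        # Hypothetical delays; the paper measures these per driving condition.
        print(visually_safe_margin(50, 0.4, 0.7))  # ~15.3 m at 50 km/h, 1.1 s delay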

    Remote Sensing Scene Classification Based on Convolutional Neural Networks Pre-Trained Using Attention-Guided Sparse Filters

    Get PDF
    Semantic-level land-use scene classification is a challenging problem, in which deep learning methods, e.g., convolutional neural networks (CNNs), have shown remarkable capacity. However, a lack of sufficient labeled images has proved a hindrance to increasing the land-use scene classification accuracy of CNNs. Aiming at this problem, this paper proposes a CNN pre-training method guided by a human visual attention mechanism. Specifically, a computational visual attention model is used to automatically extract salient regions in unlabeled images. Then, sparse filters are adopted to learn features from these salient regions, with the learnt parameters used to initialize the convolutional layers of the CNN. Finally, the CNN is further fine-tuned on labeled images. Experiments are performed on the UCMerced and AID datasets, which show that when combined with a demonstrative CNN, our method can achieve 2.24% higher accuracy than a plain CNN and can obtain an overall accuracy of 92.43% when combined with AlexNet. The results indicate that the proposed method can effectively improve CNN performance using easy-to-access unlabeled images and thus will enhance the performance of land-use scene classification, especially when a large-scale labeled dataset is unavailable.
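    A minimal sketch of the sparse-filtering step, assuming flattened patches sampled from the detected salient regions; the patch size, filter count, and optimiser are illustrative choices, not the paper's exact configuration:

        import torch

        def sparse_filtering_loss(w, patches):
            """Sparse filtering objective (Ngiam et al., 2011): soft-absolute
            features, normalised per feature and then per example; minimise
            the summed L1 activity."""
            f = patches @ w                                # (n_examples, n_features)
            f = torch.sqrt(f ** 2 + 1e-8)                  # soft absolute value
            f = f / (f.norm(dim=0, keepdim=True) + 1e-8)   # normalise each feature
            f = f / (f.norm(dim=1, keepdim=True) + 1e-8)   # normalise each example
            return f.sum()

        # Hypothetical setup: 10k flattened 8x8 salient patches -> 64 filters
        # whose weights would initialize the CNN's first convolutional layer.
        patches = torch.randn(10000, 64)
        w = torch.randn(64, 64, requires_grad=True)
        opt = torch.optim.LBFGS([w], max_iter=50)

        def closure():
            opt.zero_grad()
            loss = sparse_filtering_loss(w, patches)
            loss.backward()
            return loss

        opt.step(closure)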

    Image synthesis based on a model of human vision

    Get PDF
    Modern computer graphics systems are able to construct renderings of such high quality that viewers are deceived into regarding the images as coming from a photographic source. Large amounts of computing resources are expended in this rendering process, using complex mathematical models of lighting and shading. However, psychophysical experiments have revealed that viewers attend to only certain informative regions within a presented image. Furthermore, it has been shown that these visually important regions contain low-level visual feature differences that attract the attention of the viewer. This thesis presents a new approach to image synthesis that exploits these experimental findings by modulating the spatial quality of image regions by their visual importance. Efficiency gains are therefore reaped without sacrificing much of the perceived quality of the image. Two tasks must be undertaken to achieve this goal: firstly, the design of an appropriate region-based model of visual importance, and secondly, the modification of progressive rendering techniques to effect an importance-based rendering approach. A rule-based fuzzy logic model is presented that computes, using spatial feature differences, the relative visual importance of regions in an image. This model improves upon previous work by incorporating threshold effects induced by global feature difference distributions and by using texture concentration measures. A modified approach to progressive ray-tracing is also presented. This new approach uses the visual importance model to guide the progressive refinement of an image. In addition, this concept of visual importance has been incorporated into supersampling, texture mapping and computer animation techniques. Experimental results are presented, illustrating the efficiency gains reaped from using this method of progressive rendering. This visual importance-based rendering approach is expected to have applications in the entertainment industry, where image fidelity may be sacrificed for efficiency purposes, as long as the overall visual impression of the scene is maintained. Different aspects of the approach should find many other applications in image compression, image retrieval, progressive data transmission and active robotic vision.
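    The core idea, spending more rendering effort where visual importance is higher, can be sketched as a simple budget allocation; the region scores below are hypothetical outputs of the fuzzy-logic importance model, and the rounding keeps the allocation approximate:

        import numpy as np

        def allocate_samples(importance, total_samples, floor=1):
            """Distribute a progressive-rendering sample budget across image
            regions in proportion to their visual importance, with a minimum
            number of samples per region."""
            p = np.asarray(importance, dtype=float)
            p = p / p.sum()
            return np.maximum(floor, np.round(p * total_samples)).astype(int)

        # Hypothetical importance scores for 5 segmented regions.
        print(allocate_samples([0.9, 0.1, 0.5, 0.05, 0.2], total_samples=10000))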

    Performance Evaluation of Object Proposal Generators for Salient Object Detection

    Get PDF
    The detection and segmentation of objects appearing in a natural scene, often referred to as object detection, has gained a lot of interest in the computer vision field. Although most existing object detectors aim to detect all the objects in a given scene, it is important to evaluate whether these methods can detect the salient objects in the scene when the number of proposals that can be generated is constrained by timing or computational limits during execution. Salient objects are objects that tend to be fixated more often by human subjects. Their detection is important in applications such as image collection browsing, image display on small devices, and perceptual compression. This thesis proposes a novel evaluation framework that analyzes the performance of popular existing object proposal generators in detecting the most salient objects. This work also shows that, by incorporating saliency constraints, the number of generated object proposals, and thus the computational cost, can be decreased significantly for a target true positive detection rate (TPR). As part of the proposed framework, salient ground-truth masks, which denote only the locations of salient objects, are generated from the original ground-truth masks of a given object detection dataset. This is obtained by first computing a saliency map for the input image and then using it to assign a saliency score to each object in the image; objects whose saliency scores are sufficiently high are labeled salient. The detection rates of existing object proposal generators are analyzed with respect to both the original ground-truth masks and the generated salient ground-truth masks. As part of this work, a salient object detection database with salient ground-truth masks was constructed from the PASCAL VOC 2007 dataset. Not only does this dataset aid in analyzing the performance of existing object detectors for salient object detection, but it also helps in the development of new object detection methods and the evaluation of their performance in terms of successful detection of salient objects.
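    A minimal sketch of the salient ground-truth construction step, assuming a normalised saliency map and per-object boolean masks; the threshold value is illustrative:

        import numpy as np

        def salient_ground_truth(saliency_map, object_masks, threshold=0.5):
            """Keep only objects whose mean saliency inside their ground-truth
            mask is sufficiently high; returns the salient-object masks."""
            keep = []
            for mask in object_masks:   # boolean array, same shape as the map
                if saliency_map[mask].mean() >= threshold:
                    keep.append(mask)
            return keep

        # Hypothetical inputs: a saliency map in [0, 1] and two object masks.
        sal = np.random.rand(240, 320)
        m1 = np.zeros((240, 320), bool); m1[50:100, 60:120] = True
        m2 = np.zeros((240, 320), bool); m2[150:200, 200:260] = True
        print(len(salient_ground_truth(sal, [m1, m2])))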

    Discovering salient objects from videos using spatiotemporal salient region detection

    Get PDF
    Detecting salient objects from images and videos has many useful applications in computer vision. In this paper, a novel spatiotemporal salient region detection approach is proposed. The proposed approach computes spatiotemporal saliency by estimating spatial and temporal saliencies separately. The spatial saliency of an image is computed by estimating the color contrast cue and the color distribution cue. The estimations of these cues exploit patch-level and region-level image abstractions in a unified way. The aforementioned cues are fused to compute an initial spatial saliency map, which is further refined to emphasize saliencies of objects uniformly and to suppress saliencies of background noise. The final spatial saliency map is computed by integrating the refined saliency map with a center prior map. The temporal saliency is computed based on local and global temporal saliency estimations using patch-level optical flow abstractions. Both local and global temporal saliencies are fused to compute the temporal saliency. Finally, spatial and temporal saliencies are integrated to generate a spatiotemporal saliency map. The proposed temporal and spatiotemporal salient region detection approaches are extensively experimented on challenging salient object detection video datasets. The experimental results show that the proposed approaches outperform several state-of-the-art saliency detection approaches. To accommodate different needs with respect to the speed/accuracy tradeoff, faster variants of the spatial, temporal and spatiotemporal salient region detection approaches are also presented in this paper.
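    A much-simplified sketch of the fusion idea, using per-pixel colour contrast as the spatial term and Farneback optical-flow magnitude as the temporal term; the equal fusion weights and cue choices are assumptions, not the paper's full patch/region formulation:

        import cv2
        import numpy as np

        def spatiotemporal_saliency(prev_bgr, curr_bgr):
            """Spatial term: per-pixel colour contrast against the mean frame
            colour (Lab space). Temporal term: dense optical-flow magnitude.
            Both are normalised to [0, 1] and fused with equal weights."""
            lab = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
            spatial = np.linalg.norm(lab - lab.reshape(-1, 3).mean(axis=0), axis=2)

            g0 = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
            g1 = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)
            flow = cv2.calcOpticalFlowFarneback(g0, g1, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            temporal = np.linalg.norm(flow, axis=2)

            norm = lambda m: (m - m.min()) / (m.max() - m.min() + 1e-8)
            return 0.5 * norm(spatial) + 0.5 * norm(temporal)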

    Psychophysical assessment of perceived interest in natural images: The ROI-D database

    Full text link
    We introduce a novel region-of-interest (ROI) database for natural image content, the ROI-D database. The database consists of ROI maps created from manual selections obtained in a psychophysical experiment with 20 participants. The presented stimuli were 42 photographic images taken from 3 publicly available image quality databases. In addition to the ROI selections, dominance ratings were recorded that provide further insight into the interest of the selected ROI in relation to the background. In this paper, the experiment is described, the resulting ROI database is analysed, and possible applications of the database are discussed. The ROI-D database is made freely available to the image processing research community.
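    An ROI map of this kind can be built by averaging the participants' binary selections; a minimal sketch with hypothetical selections:

        import numpy as np

        def roi_map(selections):
            """Aggregate binary ROI selections from multiple participants into
            a normalised map: each pixel's value is the fraction of observers
            who included it in their region of interest."""
            stack = np.stack([s.astype(float) for s in selections])
            return stack.mean(axis=0)

        # Hypothetical selections from 3 of the 20 participants for one image.
        a = np.zeros((120, 160)); a[30:70, 40:100] = 1
        b = np.zeros((120, 160)); b[35:75, 50:110] = 1
        c = np.zeros((120, 160)); c[10:40, 20:60] = 1
        print(roi_map([a, b, c]).max())  # 1.0 where all three selections overlap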