3,687 research outputs found

    Toward predictive machine learning for active vision

    We develop a comprehensive description of the active inference framework, as proposed by Friston (2010), from a machine-learning-compliant perspective. Starting from biological inspiration and auto-encoding principles, we sketch a cognitive architecture that provides a way to implement estimation-oriented control policies. Computer simulations illustrate the effectiveness of the approach through a foveated inspection of the input data. The pros and cons of the control policy are analyzed in detail, showing interesting promise in terms of processing compression. Although optimizing future posterior entropy over the action set is shown to be sufficient for locally optimal action selection, offline calculation using class-specific saliency maps performs better, since pre-computing saccade pathways saves processing costs, with a negligible effect on the recognition/compression rates. Comment: submitted to ICLR 201
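
    To make the action-selection criterion concrete, here is a minimal sketch (not the authors' code; the likelihood model and sizes are illustrative placeholders) of choosing the saccade whose expected posterior entropy over class beliefs is lowest:

        import numpy as np

        def entropy(p):
            p = np.clip(p, 1e-12, 1.0)
            return -np.sum(p * np.log(p))

        def best_action(prior, likelihood):
            """prior: (C,) class beliefs; likelihood[a, o, c] = p(o | c, action a)."""
            A, O, C = likelihood.shape
            expected_H = np.zeros(A)
            for a in range(A):
                for o in range(O):
                    joint = likelihood[a, o] * prior       # p(o, c | a), shape (C,)
                    p_o = joint.sum()                      # p(o | a)
                    if p_o > 1e-12:
                        # weight posterior entropy by the chance of seeing o
                        expected_H[a] += p_o * entropy(joint / p_o)
            return int(np.argmin(expected_H))

        rng = np.random.default_rng(0)
        A, O, C = 3, 5, 4                                  # toy action/observation/class counts
        lik = rng.dirichlet(np.ones(O), size=(A, C)).transpose(0, 2, 1)  # (A, O, C)
        prior = np.full(C, 1.0 / C)
        print(best_action(prior, lik))                     # index of the most informative saccade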

    Learning Gaze Transitions from Depth to Improve Video Saliency Estimation

    In this paper we introduce a novel Depth-Aware Video Saliency approach to predict human focus of attention when viewing RGBD videos on regular 2D screens. We train a generative convolutional neural network that predicts a saliency map for a frame, given the fixation map of the previous frame. Saliency estimation in this scenario is highly important since, in the near future, 3D video content will be easy to acquire yet hard to display. This can be explained, on the one hand, by the dramatic improvement in 3D-capable acquisition equipment; on the other hand, despite considerable progress in 3D display technologies, most 3D displays are still expensive and require wearing special glasses. To evaluate the performance of our approach, we present a new comprehensive database of eye-fixation ground truth for RGBD videos. Our experiments indicate that integrating depth into video saliency calculation is beneficial. We demonstrate that our approach outperforms state-of-the-art methods for video saliency, achieving a 15% relative improvement.
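
    As a rough illustration of the kind of network described, here is a minimal sketch in PyTorch; the layer sizes and the simple concatenation of inputs are assumptions, not the paper's exact architecture:

        import torch
        import torch.nn as nn

        class DepthAwareSaliencyNet(nn.Module):
            """Maps an RGBD frame plus the previous frame's fixation map to a saliency map."""
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(5, 32, 3, padding=1), nn.ReLU(),   # 4 RGBD channels + 1 prev fixation
                    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 1, 1),
                    nn.Sigmoid(),                                # per-pixel saliency in [0, 1]
                )

            def forward(self, rgbd, prev_fixation):
                x = torch.cat([rgbd, prev_fixation], dim=1)
                return self.net(x)

        net = DepthAwareSaliencyNet()
        frame = torch.rand(1, 4, 64, 64)       # toy RGBD frame
        prev_fix = torch.rand(1, 1, 64, 64)    # previous frame's fixation map
        saliency = net(frame, prev_fix)        # (1, 1, 64, 64)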

    cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey

    The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers on computer vision, pattern recognition, and related fields. For this particular review, we focused on reading the ALL 602 conference papers presented at the CVPR2015, the premier annual computer vision event held in June 2015, in order to grasp the trends in the field. Further, we are proposing "DeepSurvey" as a mechanism embodying the entire process from the reading through all the papers, the generation of ideas, and to the writing of paper.Comment: Survey Pape

    Review of Visual Saliency Detection with Comprehensive Information

    Visual saliency detection models simulate the human visual system's perception of a scene and have been widely used in many vision tasks. With the development of acquisition technology, more comprehensive information, such as depth cues, inter-image correspondence, or temporal relationships, has become available to extend image saliency detection to RGBD saliency detection, co-saliency detection, or video saliency detection. RGBD saliency detection models focus on extracting salient regions from RGBD images by incorporating depth information. Co-saliency detection models introduce an inter-image correspondence constraint to discover the common salient object in an image group. The goal of video saliency detection models is to locate motion-related salient objects in video sequences, considering the motion cue and spatiotemporal constraints jointly. In this paper, we review these different types of saliency detection algorithms, summarize the important issues of existing methods, and discuss open problems and future work. Moreover, the evaluation datasets and quantitative measurements are briefly introduced, and an experimental analysis and discussion are conducted to provide a holistic overview of the different saliency detection methods. Comment: 18 pages, 11 figures, 7 tables, Accepted by IEEE Transactions on Circuits and Systems for Video Technology 2018, https://rmcong.github.io

    SG-FCN: A Motion and Memory-Based Deep Learning Model for Video Saliency Detection

    Data-driven saliency detection has attracted strong interest as a result of applying convolutional neural networks to the detection of eye fixations. Although a number of image-based salient object and fixation detection models have been proposed, video fixation detection still requires more exploration. Unlike static image analysis, motion and temporal information are crucial factors affecting human attention when viewing video sequences. Although existing models based on local contrast and low-level features have been extensively researched, they fail to simultaneously consider inter-frame motion and temporal information across neighboring video frames, leading to unsatisfactory performance on complex scenes. To this end, we propose a novel and efficient video eye-fixation detection model to improve saliency detection performance. By simulating the memory and visual attention mechanisms of humans watching a video, we propose a step-gained fully convolutional network that combines memory information on the time axis with motion information on the space axis while storing the saliency information of the current frame. The model is trained hierarchically, which ensures detection accuracy. Extensive experiments comparing against 11 state-of-the-art methods show that our proposed model outperforms all 11 methods across a number of publicly available datasets.
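
    A toy sketch of the memory-plus-motion idea follows; the cues and weights below are assumptions for illustration, not the SG-FCN architecture:

        import numpy as np

        def fuse_saliency(static_map, prev_frame, cur_frame, prev_saliency,
                          w_static=0.5, w_motion=0.3, w_memory=0.2):
            """Fuse a per-frame saliency cue with a motion cue and the previous
            frame's saliency map (the 'memory' on the time axis)."""
            motion = np.abs(cur_frame - prev_frame)        # crude inter-frame motion cue
            motion = motion / (motion.max() + 1e-8)
            fused = (w_static * static_map
                     + w_motion * motion
                     + w_memory * prev_saliency)
            return fused / (fused.max() + 1e-8)            # renormalize to [0, 1]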

    Automatic Salient Object Detection for Panoramic Images Using Region Growing and Fixation Prediction Model

    Almost all previous work on saliency detection has been dedicated to conventional images; however, with the proliferation of panoramic images driven by the rapid development of VR and AR technology, extracting salient content from panoramic images is becoming both more challenging and more valuable. In this paper, we propose a novel bottom-up salient object detection framework for panoramic images. First, we employ a spatial density estimation method to roughly extract object proposal regions, with the help of a region growing algorithm. Meanwhile, an eye fixation model is utilized to predict visually attractive parts of the image from the perspective of the human visual search mechanism. Then, these results are combined by maxima normalization to obtain a coarse saliency map. Finally, a refinement step based on geodesic distance is applied as post-processing to derive the final saliency map. To fairly evaluate the performance of the proposed approach, we introduce a high-quality dataset of panoramic images (SalPan). Extensive evaluations demonstrate the effectiveness of our method on panoramic images and its superiority over other methods. Comment: Previous project website: https://github.com/ChunbiaoZhu/DCC-201
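
    Maxima normalization here refers to the Itti-style map-combination operator; the following is one standard, simplified formulation, assumed for illustration rather than taken from the paper:

        import numpy as np
        from scipy.ndimage import maximum_filter

        def maxima_normalize(m, size=16):
            """Simplified N(.) operator: rescale, then promote maps whose
            local maxima are few and strong over maps with many similar peaks."""
            m = (m - m.min()) / (m.max() - m.min() + 1e-8)   # rescale to [0, 1]
            local_max = maximum_filter(m, size=size)
            peaks = m[(m == local_max) & (m > 0)]            # local maxima values
            m_bar = peaks.mean() if peaks.size else 0.0
            return m * (1.0 - m_bar) ** 2                    # (global max - mean peak)^2 weighting

        def coarse_saliency(proposal_map, fixation_map):
            # combine the region-proposal cue and the eye-fixation cue
            return maxima_normalize(proposal_map) + maxima_normalize(fixation_map)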

    Predicting Head Movement in Panoramic Video: A Deep Reinforcement Learning Approach

    Panoramic video provides an immersive and interactive experience by enabling humans to control the field of view (FoV) through head movement (HM). Thus, HM plays a key role in modeling human attention on panoramic video. This paper establishes a database collecting subjects' HM in panoramic video sequences. From this database, we find that HM data are highly consistent across subjects. Furthermore, we find that deep reinforcement learning (DRL) can be applied to predict HM positions by maximizing the reward of imitating human HM scanpaths through the agent's actions. Based on our findings, we propose a DRL-based HM prediction (DHP) approach with offline and online versions, called offline-DHP and online-DHP. In offline-DHP, multiple DRL workflows are run to determine potential HM positions at each panoramic frame. Then, a heat map of the potential HM positions, named the HM map, is generated as the output of offline-DHP. In online-DHP, the next HM position of one subject is estimated given the currently observed HM position, which is achieved by developing a DRL algorithm upon the learned offline-DHP model. Finally, experiments validate that our approach is effective in both offline and online prediction of HM positions for panoramic video, and that the learned offline-DHP model can improve the performance of online-DHP. Comment: 15 pages, 10 figures, published in TPAMI 201
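
    As an illustration of the offline-DHP output, here is a sketch (not the authors' code) of turning HM positions sampled from multiple runs into a heat map for one panoramic frame, with wrap-around in longitude; the Gaussian width is an assumption:

        import numpy as np

        def hm_heatmap(positions, width, height, sigma=20.0):
            """positions: list of (x, y) pixel coords gathered from multiple DRL runs."""
            xs = np.arange(width)
            ys = np.arange(height)[:, None]
            heat = np.zeros((height, width))
            for x0, y0 in positions:
                # horizontal distance wraps around the panorama's longitude
                dx = np.minimum(np.abs(xs - x0), width - np.abs(xs - x0))
                heat += np.exp(-(dx**2 + (ys - y0) ** 2) / (2 * sigma**2))
            return heat / (heat.max() + 1e-8)              # normalized HM map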

    Semantic and Contrast-Aware Saliency

    In this paper, we propose an integrated model of semantic-aware and contrast-aware saliency, combining both bottom-up and top-down cues for effective saliency estimation and eye fixation prediction. The proposed model processes visual information along two pathways. The first pathway aims to capture attractive semantic information in images, especially the presence of meaningful objects and object parts such as human faces. The second pathway is based on multi-scale on-line feature learning and information maximization; it learns an adaptive sparse representation of the input and discovers high-contrast salient patterns within the image context. The two pathways characterize long-term and short-term attention cues, respectively, and are integrated dynamically using maxima normalization. We investigate two different implementations of the semantic pathway, an end-to-end deep neural network and a dynamic feature integration solution, resulting in the SCA and SCAFI models, respectively. Experimental results on artificial images and 5 popular benchmark datasets demonstrate the superior performance and better plausibility of the proposed model over both classic approaches and recent deep models. Comment: arXiv admin note: text overlap with arXiv:1710.04071 by other authors
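
    For intuition about the contrast pathway, here is a toy self-information sketch in the spirit of information-maximization saliency: rare local patterns receive high saliency. The patch feature (mean intensity) and the histogram estimator are simplifying assumptions, not the paper's learned sparse representation:

        import numpy as np

        def self_information_saliency(gray, patch=8, bins=32):
            """Per-patch saliency = -log p(feature), so uncommon patches stand out."""
            h, w = gray.shape
            gh, gw = h // patch, w // patch
            blocks = gray[: gh * patch, : gw * patch].reshape(gh, patch, gw, patch)
            feats = blocks.transpose(0, 2, 1, 3).reshape(gh * gw, -1).mean(axis=1)
            hist, edges = np.histogram(feats, bins=bins, density=True)
            idx = np.clip(np.digitize(feats, edges[1:-1]), 0, bins - 1)
            p = hist[idx] * (edges[1] - edges[0]) + 1e-8   # probability of each patch feature
            return (-np.log(p)).reshape(gh, gw)            # rare patches -> high saliency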

    cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey

    This paper presents the futuristic challenges discussed in the cvpaper.challenge. In 2015 and 2016, we thoroughly studied 1,600+ papers from several conferences and journals, such as CVPR, ICCV, ECCV, NIPS, PAMI, and IJCV.

    BIT: Biologically Inspired Tracker

    Visual tracking is challenging due to image variations caused by various factors, such as object deformation, scale change, illumination change, and occlusion. Given the superior tracking performance of the human visual system (HVS), an ideal biologically inspired model would be expected to improve computer visual tracking. This is, however, a difficult task owing to our incomplete understanding of how neurons in the HVS work. This paper addresses the challenge through an analysis of the visual cognitive mechanism of the ventral stream in the visual cortex: the proposed tracker simulates shallow neurons (S1 and C1 units) to extract low-level biologically inspired features for the target appearance, and imitates an advanced learning mechanism (S2 and C2 units) to combine generative and discriminative models for target location. In addition, fast Gabor approximation (FGA) and the fast Fourier transform (FFT) are adopted for real-time learning and detection in this framework. Extensive experiments on large-scale benchmark datasets show that the proposed biologically inspired tracker (BIT) performs favorably against state-of-the-art methods in terms of efficiency, accuracy, and robustness. The acceleration techniques in particular ensure that BIT maintains a speed of approximately 45 frames per second.
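
    As an illustration of the FFT speed-up, here is a generic sketch of dense correlation scores computed via the convolution theorem (a textbook formulation, not the tracker's code; score normalization is omitted):

        import numpy as np

        def fft_correlate(image, template):
            """Dense cross-correlation of a template against every image position."""
            H, W = image.shape
            F_img = np.fft.rfft2(image)
            F_tpl = np.fft.rfft2(template, s=(H, W))       # zero-pad template to image size
            scores = np.fft.irfft2(F_img * np.conj(F_tpl), s=(H, W))
            return scores                                  # peak marks the best match

        img = np.random.rand(128, 128)
        tpl = img[40:56, 60:76].copy()                     # 16x16 patch as the "target"
        resp = fft_correlate(img, tpl)
        y, x = np.unravel_index(resp.argmax(), resp.shape) # recovers roughly (40, 60)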