Toward predictive machine learning for active vision
We develop a comprehensive description of the active inference framework, as
proposed by Friston (2010), from a machine-learning perspective. Building on
biological inspiration and auto-encoding principles, we sketch a cognitive
architecture intended to support estimation-oriented control policies. Computer
simulations illustrate the effectiveness of the approach through a foveated
inspection of the input data. The pros and cons of the control policy are
analyzed in detail, showing promising results in terms of processing
compression. Although optimizing the future posterior entropy over the action
set is shown to be sufficient for locally optimal action selection, offline
computation using class-specific saliency maps performs better: it saves
processing costs by precomputing saccade pathways, with a negligible effect on
the recognition/compression rates.
Comment: submitted to ICLR 201
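The entropy-based action selection mentioned above can be illustrated with a minimal sketch, assuming a discrete set of classes and a hypothetical per-action observation model p(o | c, a); the paper's own auto-encoding generative model is not reproduced here. For each candidate action, the code computes the expected entropy of the class posterior after the outcome is observed and greedily picks the action with the lowest value. All function names are illustrative.

```python
import numpy as np


def entropy(p: np.ndarray) -> float:
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def expected_posterior_entropy(prior: np.ndarray, likelihood: np.ndarray) -> float:
    """Expected entropy of the class posterior after one observation.

    prior:      shape (C,), current belief over C classes.
    likelihood: shape (C, O), hypothetical p(observation o | class c) for one action.
    """
    joint = prior[:, None] * likelihood  # p(c, o)
    p_obs = joint.sum(axis=0)            # p(o)
    h = 0.0
    for o, po in enumerate(p_obs):
        if po > 0:
            # weight the posterior entropy H[p(c | o)] by how likely o is
            h += po * entropy(joint[:, o] / po)
    return h


def select_action(prior: np.ndarray, likelihoods: list) -> tuple:
    """Greedy choice: the action expected to leave the least class uncertainty."""
    scores = [expected_posterior_entropy(prior, lik) for lik in likelihoods]
    return int(np.argmin(scores)), scores


if __name__ == "__main__":
    prior = np.array([0.5, 0.5])
    uninformative = np.array([[0.5, 0.5], [0.5, 0.5]])  # outcome ignores the class
    informative = np.array([[0.9, 0.1], [0.1, 0.9]])    # outcome reveals the class
    action, _ = select_action(prior, [uninformative, informative])
    print(action)  # 1: the informative action
```

This is a one-step greedy rule, which matches the "locally optimal" wording above; it does not plan over whole saccade sequences.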
Learning Gaze Transitions from Depth to Improve Video Saliency Estimation
In this paper we introduce a novel Depth-Aware Video Saliency approach to
predict human focus of attention when viewing RGBD videos on regular 2D
screens. We train a generative convolutional neural network which predicts a
saliency map for a frame, given the fixation map of the previous frame.
Saliency estimation in this scenario is highly important because 3D video
content will soon be easy to acquire yet remain hard to display. On the one
hand, 3D-capable acquisition equipment has improved dramatically. On the other
hand, despite considerable progress in 3D display technologies, most 3D
displays are still expensive and require wearing special glasses. To evaluate
the performance of our approach, we present a new comprehensive database of
eye-fixation ground truth for RGBD videos. Our experiments indicate that
integrating depth into video saliency calculation is beneficial. We demonstrate
that our approach outperforms state-of-the-art methods for video saliency,
achieving a 15% relative improvement.
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we read all 602 conference papers presented at CVPR2015,
the premier annual computer vision event, held in June 2015, in order to grasp
the trends in the field. Further, we propose "DeepSurvey", a mechanism
embodying the entire process from reading through all the papers, to
generating ideas, to writing a paper.
Comment: Survey Paper
Review of Visual Saliency Detection with Comprehensive Information
Visual saliency detection models simulate the human visual system's perception
of a scene and have been widely used in many vision tasks. With the development
of acquisition technology, more comprehensive information, such as depth cues,
inter-image correspondence, or temporal relationships, is available to extend
image saliency detection to RGBD saliency detection, co-saliency detection, or
video saliency detection. RGBD saliency detection models extract the salient
regions from RGBD images by incorporating depth information. Co-saliency
detection models introduce an inter-image correspondence constraint to discover
the common salient object in an image group. Video saliency detection models
aim to locate motion-related salient objects in video sequences by jointly
considering motion cues and spatiotemporal constraints. In this paper, we
review different types of saliency detection algorithms, summarize the
important issues of existing methods, and discuss open problems and future
work. Moreover, the evaluation datasets and quantitative measures are briefly
introduced, and experimental analysis and discussion are conducted to provide
a holistic overview of different saliency detection methods.
Comment: 18 pages, 11 figures, 7 tables, Accepted by IEEE Transactions on
Circuits and Systems for Video Technology 2018, https://rmcong.github.io
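The review mentions quantitative measures without naming them in this abstract. As an illustration only, two measures commonly used in salient object detection, mean absolute error (MAE) and the F-measure with beta^2 = 0.3, can be sketched as follows. The adaptive threshold of twice the mean saliency is an assumed choice, and the review may rely on additional or different measures.

```python
import numpy as np


def mae(sal: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error between a saliency map and a binary mask, both in [0, 1]."""
    return float(np.mean(np.abs(sal.astype(float) - gt.astype(float))))


def f_measure(sal: np.ndarray, gt: np.ndarray, beta2: float = 0.3,
              threshold: float = None) -> float:
    """F-measure of a binarized saliency map.

    beta2 = 0.3 weights precision more than recall (a common convention).
    If no threshold is given, twice the mean saliency is used (an assumption).
    """
    if threshold is None:
        threshold = min(2.0 * float(sal.mean()), 1.0)
    pred = sal >= threshold
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()      # correctly detected salient pixels
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    if precision + recall == 0:
        return 0.0
    return float((1 + beta2) * precision * recall / (beta2 * precision + recall))


if __name__ == "__main__":
    gt = np.zeros((4, 4))
    gt[:2, :2] = 1
    print(mae(gt, gt), f_measure(gt, gt))  # a perfect map: MAE 0.0, F-measure 1.0
```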
SG-FCN: A Motion and Memory-Based Deep Learning Model for Video Saliency Detection
Data-driven saliency detection has attracted strong interest as a result of
applying convolutional neural networks to the detection of eye fixations.
Although a number of image-based salient object and fixation detection models
have been proposed, video fixation detection still requires more exploration.
Unlike in image analysis, motion and temporal information are crucial factors
affecting human attention when viewing video sequences. Although existing
models based on local contrast and low-level features have been extensively
researched, they fail to jointly consider interframe motion and temporal
information across neighboring video frames, leading to unsatisfactory
performance on complex scenes. To this end, we propose a novel and efficient
video eye fixation detection model. By simulating the memory and visual
attention mechanisms of humans watching a video, we build a step-gained fully
convolutional network that combines memory information on the time axis with
motion information on the space axis while storing the saliency information of
the current frame. The model is obtained through hierarchical training, which
ensures detection accuracy. Extensive experiments against 11 state-of-the-art
methods show that the proposed model outperforms all 11 methods on a number of
publicly available datasets.
Automatic Salient Object Detection for Panoramic Images Using Region Growing and Fixation Prediction Model
Almost all previous work on saliency detection has been dedicated to
conventional images. However, with the surge of panoramic images driven by the
rapid development of VR and AR technology, extracting salient content from
panoramic images is becoming both more challenging and more valuable.
In this paper, we propose a novel bottom-up salient object detection
framework for panoramic images. First, we employ a spatial density estimation
method to roughly extract object proposal regions with the help of a region
growing algorithm. Meanwhile, an eye fixation model is used to predict
visually attractive parts of the image from the perspective of the human visual
search mechanism. The two results are then combined by maxima normalization to
obtain a coarse saliency map. Finally, a refinement step based on geodesic
distance is applied as post-processing to derive the final saliency map.
To fairly evaluate the performance of the proposed approach, we introduce a
high-quality dataset of panoramic images (SalPan). Extensive evaluations
demonstrate the effectiveness of the proposed method on panoramic images and
its superiority over other methods.
Comment: Previous Project website: https://github.com/ChunbiaoZhu/DCC-201
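Maxima normalization, used here to merge the object-proposal map with the fixation map, is commonly formulated as the N(.) operator of Itti, Koch and Niebur: rescale a map to a fixed range [0, M], compute the mean m_bar of its local maxima other than the global maximum, and multiply the map by (M - m_bar)^2, so that a map with one dominant peak is promoted and a map with many comparable peaks is suppressed. The sketch below assumes that formulation, a 3x3 neighborhood for local maxima, and a hypothetical 10% threshold that ignores weak maxima; it is not the authors' code.

```python
import numpy as np


def local_maxima(m: np.ndarray) -> np.ndarray:
    """Boolean mask of pixels that are >= all 8 neighbours (plateaus count)."""
    p = np.pad(m, 1, mode="constant", constant_values=-np.inf)
    h, w = m.shape
    mask = np.ones_like(m, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            mask &= m >= p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return mask


def maxima_normalize(m, top: float = 1.0, rel_threshold: float = 0.1) -> np.ndarray:
    """Itti-style N(.): rescale to [0, top], then weight by (top - m_bar)^2."""
    m = np.asarray(m, dtype=float)
    lo, hi = m.min(), m.max()
    if hi == lo:
        return np.zeros_like(m)  # a flat map carries no saliency information
    m = (m - lo) / (hi - lo) * top
    # local maxima above the (assumed) threshold, sorted ascending
    peaks = np.sort(m[local_maxima(m) & (m > rel_threshold * top)])
    others = peaks[:-1]  # drop one occurrence of the global maximum
    m_bar = others.mean() if others.size else 0.0
    return m * (top - m_bar) ** 2


def fuse(maps) -> np.ndarray:
    """Sum the maxima-normalized maps and rescale the result to [0, 1]."""
    s = sum(maxima_normalize(x) for x in maps)
    rng = s.max() - s.min()
    return (s - s.min()) / rng if rng > 0 else s
```

With this weighting, a map containing a single clear peak keeps its full strength, while a map with two equally strong peaks is driven to zero, so the fused result favors the more decisive cue.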
Predicting Head Movement in Panoramic Video: A Deep Reinforcement Learning Approach
Panoramic video provides an immersive and interactive experience by enabling
humans to control the field of view (FoV) through head movement (HM). Thus, HM
plays a key role in modeling human attention on panoramic video. This paper
establishes a database of subjects' HM in panoramic video sequences.
From this database, we find that the HM data are highly consistent across
subjects. Furthermore, we find that deep reinforcement learning (DRL) can be
applied to predict HM positions by maximizing the reward for imitating human
HM scanpaths through the agent's actions. Based on these findings, we propose a
DRL-based HM prediction (DHP) approach with offline and online versions, called
offline-DHP and online-DHP. In offline-DHP, multiple DRL workflows are run to
determine potential HM positions at each panoramic frame. A heat map of the
potential HM positions, named the HM map, is then generated as the output of
offline-DHP. In online-DHP, the next HM position of one subject is estimated
from the currently observed HM position, which is achieved by developing a DRL
algorithm upon the learned offline-DHP model. Finally, experiments validate
that our approach is effective in both offline and online prediction of HM
positions for panoramic video, and that the learned offline-DHP model can
improve the performance of online-DHP.
Comment: 15 pages, 10 figures, published in TPAMI 201
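The HM map produced by offline-DHP is a heat map of the potential HM positions. A minimal sketch of that aggregation step is shown below, assuming positions are given as (longitude, latitude) in degrees on an equirectangular frame and that each one contributes a Gaussian of its great-circle distance; the kernel, its width `sigma_deg`, and the map resolution are assumptions, not taken from the paper. The DRL workflows that produce the positions are not modeled.

```python
import numpy as np


def hm_heat_map(positions_deg, height: int = 90, width: int = 180,
                sigma_deg: float = 5.0) -> np.ndarray:
    """Accumulate (longitude, latitude) positions into a normalized heat map.

    Longitude is in [-180, 180), latitude in [-90, 90]. Using great-circle
    distance makes the map wrap correctly across the left/right border.
    """
    # pixel-centre coordinates of the equirectangular grid
    lon = np.linspace(-180, 180, width, endpoint=False) + 180.0 / width
    lat = np.linspace(90, -90, height, endpoint=False) - 90.0 / height
    lon_g, lat_g = np.meshgrid(np.radians(lon), np.radians(lat))
    heat = np.zeros((height, width))
    for lo, la in positions_deg:
        lo, la = np.radians(lo), np.radians(la)
        # spherical law of cosines for the angle between pixel and position
        cos_d = (np.sin(lat_g) * np.sin(la)
                 + np.cos(lat_g) * np.cos(la) * np.cos(lon_g - lo))
        d = np.degrees(np.arccos(np.clip(cos_d, -1.0, 1.0)))
        heat += np.exp(-0.5 * (d / sigma_deg) ** 2)
    total = heat.sum()
    return heat / total if total > 0 else heat
```

Normalizing the map to sum to one lets it be read as a distribution over viewing directions; a planar Gaussian on the image grid would instead split a position near the 180-degree seam into two unrelated blobs.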
Semantic and Contrast-Aware Saliency
In this paper, we propose an integrated model of semantic-aware and
contrast-aware saliency that combines bottom-up and top-down cues for
effective saliency estimation and eye fixation prediction. The proposed model
processes visual information using two pathways. The first pathway captures
attractive semantic information in images, especially the presence of
meaningful objects and object parts such as human faces. The second pathway is
based on multi-scale online feature learning and information maximization; it
learns an adaptive sparse representation of the input and discovers
high-contrast salient patterns within the image context. The two pathways
characterize both long-term and short-term attention cues and are integrated
dynamically using maxima normalization. We investigate two different
implementations of the semantic pathway, an end-to-end deep neural network
solution and a dynamic feature integration solution, resulting in the SCA and
SCAFI models, respectively. Experimental results on artificial images and five
popular benchmark datasets demonstrate the superior performance and better
plausibility of the proposed model over both classic approaches and recent
deep models.
Comment: arXiv admin note: text overlap with arXiv:1710.04071 by other authors
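The contrast pathway relies on information maximization. A much-simplified illustration of the underlying idea, in the spirit of AIM-style self-information saliency, scores each pixel by -log p(feature), where p is estimated from the image's own histogram. Quantized intensity stands in for the learned multi-scale sparse features of SCA; this substitution and the bin count are assumptions made purely for illustration.

```python
import numpy as np


def self_information_saliency(img: np.ndarray, bins: int = 32) -> np.ndarray:
    """Saliency as self-information -log p(feature), rescaled to [0, 1].

    img: 2-D array of intensities in [0, 1]. The 'feature' is only the
    quantized intensity here (an assumption, not SCA's learned features).
    """
    q = np.clip((img * bins).astype(int), 0, bins - 1)  # bin index per pixel
    counts = np.bincount(q.ravel(), minlength=bins).astype(float)
    p = counts / counts.sum()                            # empirical p(feature)
    sal = -np.log(p[q])                                  # rare values -> high saliency
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)


if __name__ == "__main__":
    img = np.zeros((8, 8))
    img[3, 4] = 1.0  # a single odd-one-out pixel
    sal = self_information_saliency(img)
    print(np.unravel_index(np.argmax(sal), sal.shape))
```

Rare feature values carry more information and therefore score as more salient, which is the contrast intuition the pathway builds on.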
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
This paper presents futuristic challenges discussed in the cvpaper.challenge.
In 2015 and 2016, we thoroughly studied 1,600+ papers from several conferences
and journals, such as CVPR, ICCV, ECCV, NIPS, PAMI, and IJCV.
BIT: Biologically Inspired Tracker
Visual tracking is challenging due to image variations caused by various
factors, such as object deformation, scale change, illumination change and
occlusion. Given the superior tracking performance of the human visual system
(HVS), an ideal biologically inspired model is expected to improve computer
visual tracking. This is, however, a difficult task due to the incomplete
understanding of how neurons in the HVS work. This paper addresses this
challenge based on an analysis of the visual cognitive mechanism of the ventral
stream in the visual cortex: it simulates shallow neurons (S1 and C1 units) to
extract low-level biologically inspired features for the target appearance,
and imitates an advanced learning mechanism (S2 and C2 units) to combine
generative and discriminative models for target localization. In addition,
fast Gabor approximation (FGA) and the fast Fourier transform (FFT) are adopted
for real-time learning and detection in this framework. Extensive experiments
on large-scale benchmark datasets show that the proposed biologically inspired
tracker performs favorably against state-of-the-art methods in terms of
efficiency, accuracy, and robustness. The acceleration technique in particular
ensures that BIT maintains a speed of approximately 45 frames per second.
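BIT uses the FFT to make learning and detection fast. The generic trick behind FFT-based detection is that cross-correlating a search image with a template becomes an element-wise product in the frequency domain, reducing the cost from O(N*K) to O(N log N). The sketch below shows only that step on raw pixels; BIT's S1/C1 Gabor features and its S2/C2 generative/discriminative model are not reproduced, and `locate` is a hypothetical helper.

```python
import numpy as np


def fft_correlation(image: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Circular cross-correlation of an image with a template via the FFT.

    response[dy, dx] = sum_{y, x} image[y + dy, x + dx] * template[y, x],
    with indices wrapping around the image borders.
    """
    H, W = image.shape
    h, w = template.shape
    padded = np.zeros((H, W))
    padded[:h, :w] = template  # zero-pad the template to the image size
    F_img = np.fft.fft2(image)
    F_tpl = np.fft.fft2(padded)
    # correlation theorem: multiply by the complex conjugate of the template
    return np.real(np.fft.ifft2(F_img * np.conj(F_tpl)))


def locate(image: np.ndarray, template: np.ndarray) -> tuple:
    """Top-left (row, col) of the best template match after mean removal."""
    r = fft_correlation(image - image.mean(), template - template.mean())
    return np.unravel_index(np.argmax(r), r.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.standard_normal((64, 64))
    tpl = img[20:36, 30:46].copy()  # cut the "target" out of the frame
    print(locate(img, tpl))
```

Because the correlation is circular, responses near the border wrap around; practical correlation-based trackers usually pad or window the search region to limit this effect.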