Fully automatic extraction of salient objects from videos in near real-time
Automatic video segmentation plays an important role in a wide range of
computer vision and image processing applications. Recently, various methods
have been proposed for this purpose. The problem is that most of these methods
are far from real-time processing, even for low-resolution videos, due to their
complex procedures. To address this, we propose a new and very fast method for
automatic video segmentation with the help of 1) efficient optimization of
Markov random fields, in time polynomial in the number of pixels, by introducing
graph cuts; 2) automatic, computationally efficient, yet stable derivation of
segmentation priors using visual saliency and a sequential update mechanism; and
3) an implementation strategy based on the principle of stream processing with
graphics processing units (GPUs). Test results indicate that our method
extracts appropriate regions from videos as precisely as, and much faster than,
previous semi-automatic methods, even though no supervision is incorporated.
Comment: submitted to Special Issue on High Performance Computation on Hardware Accelerators, The Computer Journal
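The graph-cut step in 1) can be illustrated with a minimal toy sketch. This is my own construction, not the paper's implementation: saliency values supply the unary terms of a binary MRF, a constant pairwise term enforces smoothness, and the energy is minimized exactly as an s-t min-cut (here via networkx; a real-time system would use a specialized max-flow solver).

```python
# Toy binary MRF segmentation via a single graph cut (illustrative only):
# saliency-based unary terms plus 4-connected pairwise smoothness terms,
# solved as an s-t minimum cut with networkx.
import networkx as nx
import numpy as np

def graph_cut_segment(saliency, smoothness=1.0):
    """saliency: 2-D array in [0, 1]; returns a boolean foreground mask."""
    h, w = saliency.shape
    g = nx.DiGraph()
    src, snk = "s", "t"
    eps = 1e-6
    for y in range(h):
        for x in range(w):
            p = (y, x)
            s = float(saliency[y, x])
            # Unary terms as negative log-likelihoods:
            # cutting src->p means labeling p background, p->snk means foreground.
            g.add_edge(src, p, capacity=-np.log(1.0 - s + eps))  # cost of background label
            g.add_edge(p, snk, capacity=-np.log(s + eps))        # cost of foreground label
            # Pairwise smoothness on the 4-neighborhood (both directions).
            for q in ((y + 1, x), (y, x + 1)):
                if q[0] < h and q[1] < w:
                    g.add_edge(p, q, capacity=smoothness)
                    g.add_edge(q, p, capacity=smoothness)
    _, (fg, _) = nx.minimum_cut(g, src, snk)  # source side = foreground
    mask = np.zeros((h, w), dtype=bool)
    for node in fg:
        if node != src:
            mask[node] = True
    return mask

sal = np.full((6, 6), 0.1)
sal[2:4, 2:4] = 0.9      # a salient block on a dull background
mask = graph_cut_segment(sal, smoothness=0.5)
```

With these unaries, high-saliency pixels are cheap to label foreground, and the smoothness term keeps the cut along the block boundary.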
Visual saliency estimation by integrating features using multiple kernel learning
In the last few decades, significant achievements have been attained in
predicting where humans look in images using different computational models.
However, how to determine the contributions of different visual features to
overall saliency remains an open problem. To address this issue, a recent class
of models formulates saliency estimation as a supervised learning problem and
accordingly applies machine learning techniques. In this paper, we also address
this challenging problem and propose using multiple kernel learning (MKL) to
combine information coming from different feature dimensions and to perform
integration at an intermediate level. In addition, we propose using the
responses of a recently proposed filterbank of object detectors, known as
Object-Bank, as additional high-level semantic features. We show that our
MKL-based framework, together with the proposed object-specific features,
provides state-of-the-art performance compared to SVM- or AdaBoost-based
saliency models.
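The intermediate-level integration can be sketched as a fixed-weight baseline. The feature blocks and weights below are my own toy choices; actual MKL would additionally learn the per-kernel weights rather than fixing them:

```python
# Fixed-weight multiple-kernel baseline (illustrative sketch): one RBF kernel
# per feature block, combined as a weighted sum and fed to an SVM with a
# precomputed kernel.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def combined_kernel(blocks_a, blocks_b, weights):
    """Weighted sum of RBF kernels, one per feature block."""
    return sum(w * rbf_kernel(a, b)
               for w, a, b in zip(weights, blocks_a, blocks_b))

rng = np.random.default_rng(0)
# Two hypothetical feature channels (e.g., a low-level contrast descriptor
# and a semantic object-detector response).
X1, X2 = rng.normal(size=(80, 5)), rng.normal(size=(80, 3))
y = (X1[:, 0] + X2[:, 0] > 0).astype(int)  # toy salient/non-salient labels
weights = [0.6, 0.4]                        # fixed here; learned in real MKL

K_train = combined_kernel([X1, X2], [X1, X2], weights)
clf = SVC(kernel="precomputed").fit(K_train, y)
train_acc = clf.score(K_train, y)
```

The design point is that each feature dimension keeps its own kernel (and implicitly its own similarity notion) before fusion, instead of concatenating features into one vector.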
Scale Selection of Adaptive Kernel Regression by Joint Saliency Map for Nonrigid Image Registration
Joint saliency map (JSM) [1] was developed to assign high joint saliency
values to the corresponding saliency structures (called Joint Saliency
Structures, JSSs) but zero or low joint saliency values to the outliers (or
mismatches) that are introduced by missing correspondence or local large
deformations between the reference and moving images to be registered. JSM
guides the local structure matching in nonrigid registration by emphasizing
these JSSs' sparse deformation vectors in adaptive kernel regression of
hierarchical sparse deformation vectors for iterative dense deformation
reconstruction. By designing an effective superpixel-based local structure
scale estimator to compute the structure scale of the reference structures, we
further propose determining the scale (i.e., the width) of the kernels in the
adaptive kernel regression by combining these structure scales with the
JSM-based mismatch scales between the local saliency structures. We can
therefore adaptively select the sample size of sparse deformation vectors used
to reconstruct the dense deformation vectors, so that every local structure in
the two images is matched accurately. The experimental results demonstrate that
our method aligns two images with missing correspondence and large local
deformations more accurately than the state-of-the-art methods.
Comment: 9 pages
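A hedged sketch of the core regression step, with my own simplifications: adaptive-bandwidth Nadaraya-Watson kernel regression interpolating sparse displacement vectors into a dense field, where a per-sample bandwidth stands in for the structure-scale / mismatch-scale selection described above.

```python
# Adaptive-bandwidth Gaussian kernel regression of sparse displacement
# vectors onto a dense grid (toy stand-in for the paper's JSM pipeline).
import numpy as np

def dense_field(sparse_pts, sparse_vecs, bandwidths, grid):
    """Nadaraya-Watson regression of displacement vectors at grid points."""
    out = np.zeros((len(grid), sparse_vecs.shape[1]))
    for i, g in enumerate(grid):
        d2 = np.sum((sparse_pts - g) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * bandwidths ** 2))  # per-sample adaptive scale
        out[i] = w @ sparse_vecs / (w.sum() + 1e-12)
    return out

# Four sparse samples: left pair moves right, right pair moves up.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
vecs = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
bw = np.array([0.5, 0.5, 0.5, 0.5])
grid = np.array([[0.5, 0.0], [0.5, 1.0]])
field = dense_field(pts, vecs, bw, grid)
```

In the paper's setting the bandwidths would come from the superpixel-based structure scales and the JSM mismatch scales rather than being constant.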
Augmented Semantic Signatures of Airborne LiDAR Point Clouds for Comparison
LiDAR point clouds provide rich geometric information, which is particularly
useful for the analysis of complex scenes of urban regions. Finding structural
and semantic differences between two three-dimensional point clouds,
say, of the same region but acquired at different time instances, is an
important problem. A comparison of point clouds involves computationally
expensive registration and segmentation. We are interested in capturing the
relative differences in the geometric uncertainty and semantic content of the
point cloud without the registration process. Hence, we propose an
orientation-invariant geometric signature of the point cloud, which integrates
its probabilistic geometric and semantic classifications. We study different
properties of the geometric signature, which are an image-based encoding of
geometric uncertainty and semantic content. We explore different metrics to
determine differences between these signatures, which in turn compare point
clouds without performing point-to-point registration. Our results show that
the differences in the signatures corroborate the geometric and semantic
differences of the point clouds.
Comment: 18 pages, 6 figures, 1 table
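The registration-free comparison can be illustrated with a toy version (assumptions mine; the paper's signature integrates probabilistic geometric and semantic classifications): encode each cloud as a normalized 2-D histogram of per-point features and compare signatures with a chi-squared distance, without any point-to-point matching.

```python
# Toy image-based signature: a normalized 2-D histogram of two per-point
# features (stand-ins for geometric uncertainty and a semantic score),
# compared by chi-squared distance with no registration step.
import numpy as np

def signature(features, bins=16, ranges=((0, 1), (0, 1))):
    hist, _, _ = np.histogram2d(features[:, 0], features[:, 1],
                                bins=bins, range=ranges)
    return hist / (hist.sum() + 1e-12)  # normalize to a distribution

def chi2_distance(s1, s2):
    return 0.5 * float(np.sum((s1 - s2) ** 2 / (s1 + s2 + 1e-12)))

rng = np.random.default_rng(1)
cloud_a = rng.uniform(size=(1000, 2))   # two epochs of an unchanged scene
cloud_b = rng.uniform(size=(1000, 2))
cloud_c = rng.beta(5, 2, size=(1000, 2))  # a changed scene
d_same = chi2_distance(signature(cloud_a), signature(cloud_b))
d_diff = chi2_distance(signature(cloud_a), signature(cloud_c))
```

Because the signature is a feature-space histogram rather than a spatial map, it is insensitive to the clouds' orientation, which mirrors the orientation-invariance claim above.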
Visual saliency detection: a Kalman filter based approach
In this paper, we propose a Kalman-filter-aided saliency detection model
based on the conjecture that salient regions differ considerably from our
"visual expectation", i.e., they are "visually surprising" in nature. In this
work, we structure our model with the immediate objective of predicting
saliency in static images. However, the proposed model can easily be extended
to space-time saliency prediction. Our approach was evaluated on two publicly
available benchmark data sets, and the results were compared with other
existing saliency models. The results clearly illustrate the superior
performance of the proposed model over the other approaches.
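The "visually surprising" conjecture can be made concrete with a minimal 1-D sketch (my illustration, not the authors' model): a Kalman filter tracks the expected intensity along a scanline, and the innovation, i.e., the gap between observation and prediction, serves as a surprise/saliency score.

```python
# 1-D Kalman-filter surprise: a random-walk state model tracks the expected
# intensity; the per-sample innovation magnitude is read out as saliency.
import numpy as np

def kalman_saliency(signal, q=1e-3, r=1e-2):
    """Return per-sample saliency = normalized innovation magnitude."""
    x, p = signal[0], 1.0            # state estimate and its variance
    surprise = np.zeros_like(signal, dtype=float)
    for t, z in enumerate(signal):
        p_pred = p + q               # predict (random-walk process noise q)
        innov = z - x                # how much the observation "surprises" us
        surprise[t] = abs(innov)
        k = p_pred / (p_pred + r)    # Kalman gain (r = measurement noise)
        x = x + k * innov            # correct the expectation
        p = (1 - k) * p_pred
    return surprise / (surprise.max() + 1e-12)

# A dull scanline with one bright, "unexpected" segment starting at index 50.
scanline = np.concatenate([np.full(50, 0.2), np.full(10, 0.9), np.full(50, 0.2)])
sal = kalman_saliency(scanline)
```

The peak lands on the onset of the bright segment: once the filter has adapted to the new level, the same intensity stops being surprising, which is exactly the expectation-violation behavior the abstract describes.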
Human Attention Estimation for Natural Images: An Automatic Gaze Refinement Approach
Photo collections and their applications today attempt to reflect user
interactions in various forms. Moreover, photo applications aim to capture the
users' intentions with minimum effort. Human interest regions in an image carry
powerful information about the user's behavior and can be used in many photo
applications. Research on human visual attention has been conducted in the form
of gaze tracking and computational saliency models in the computer vision
community, and has shown considerable progress. This paper presents an
integration of implicit gaze estimation and a computational saliency model to
effectively estimate human attention regions in images on the fly. Furthermore,
our method estimates human attention via implicit calibration and incremental
model updating, without any active participation from the user. We also present
an extensive analysis and possible applications for personal photo collections.
SG-FCN: A Motion and Memory-Based Deep Learning Model for Video Saliency Detection
Data-driven saliency detection has attracted strong interest as a result of
applying convolutional neural networks to the detection of eye fixations.
Although a number of image-based salient object and fixation detection models
have been proposed, video fixation detection still requires more exploration.
Different from image analysis, motion and temporal information is a crucial
factor affecting human attention when viewing video sequences. Although
existing models based on local contrast and low-level features have been
extensively researched, they fail to simultaneously consider interframe
motion and temporal information across neighboring video frames, leading to
unsatisfactory performance when handling complex scenes. To this end, we
propose a novel and efficient video eye fixation detection model to improve the
saliency detection performance. By simulating the memory mechanism and visual
attention mechanism of human beings when watching a video, we propose a
step-gained fully convolutional network by combining the memory information on
the time axis with the motion information on the space axis while storing the
saliency information of the current frame. The model is obtained through
hierarchical training, which ensures the accuracy of the detection. Extensive
experiments in comparison with 11 state-of-the-art methods are carried out, and
the results show that our proposed model outperforms all 11 methods across a
number of publicly available datasets.
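The motion-plus-memory idea can be caricatured in a few lines of numpy (a conceptual stand-in only; SG-FCN itself is a hierarchically trained fully convolutional network): an inter-frame difference supplies the motion cue, and an exponential moving average of past maps supplies the memory cue.

```python
# Motion cue (inter-frame difference) + memory cue (exponential moving
# average of past saliency), averaged into a per-frame fixation map.
import numpy as np

def video_saliency(frames, beta=0.7):
    """frames: (T, H, W) grayscale video; returns (T, H, W) saliency maps."""
    memory = np.zeros_like(frames[0], dtype=float)
    maps = []
    for t in range(len(frames)):
        motion = np.abs(frames[t] - frames[t - 1]) if t > 0 else np.zeros_like(memory)
        current = motion / (motion.max() + 1e-12)      # normalized motion cue
        memory = beta * memory + (1 - beta) * current  # temporal memory on the time axis
        maps.append(0.5 * current + 0.5 * memory)      # fuse motion and memory
    return np.stack(maps)

T, H, W = 5, 32, 32
frames = np.zeros((T, H, W))
for t in range(T):
    frames[t, 10:14, 4 * t:4 * t + 4] = 1.0            # a block moving rightward
maps = video_saliency(frames)
```

The memory term is what keeps recently attended regions warm across frames, a crude analogue of the step-gained combination of temporal memory and spatial motion described above.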
Benchmark 3D eye-tracking dataset for visual saliency prediction on stereoscopic 3D video
Visual Attention Models (VAMs) predict the image or video regions that are
most likely to attract human attention. Although saliency detection is well
explored for 2D image and video content, only a few attempts have been made to
design 3D saliency prediction models. Newly proposed 3D visual attention models
have to be validated on large-scale video saliency prediction datasets that
include eye-tracking data.
There are several publicly available eye-tracking datasets for 2D image and
video content. In the case of 3D, however, the research community still needs
large-scale video saliency datasets for validating different 3D-VAMs. In this
paper, we introduce a large-scale dataset containing eye-tracking data
collected from 24 subjects who watched 61 stereoscopic 3D videos (and 2D
versions of those) in a free-viewing test. We evaluate the performance of the
existing saliency detection methods on the proposed dataset. In addition, we
have created an online benchmark that validates the performance of existing 2D
and 3D visual attention models and facilitates the addition of new VAMs. Our
benchmark currently contains 50 different VAMs.
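Benchmarks of this kind typically score models with fixation-based metrics such as NSS. A sketch of the standard Normalized Scanpath Saliency computation (my code, not the benchmark's): the mean of the z-scored saliency map sampled at the human fixation locations.

```python
# Normalized Scanpath Saliency (NSS): z-score the predicted map, then
# average it at the recorded fixation points. Positive = better than chance.
import numpy as np

def nss(saliency, fixations):
    """saliency: (H, W) map; fixations: iterable of (row, col) indices."""
    z = (saliency - saliency.mean()) / (saliency.std() + 1e-12)
    return float(np.mean([z[r, c] for r, c in fixations]))

sal = np.zeros((32, 32))
sal[8:12, 8:12] = 1.0                   # the model's predicted hot spot
good = nss(sal, [(9, 9), (10, 10)])     # fixations land on the prediction
bad = nss(sal, [(0, 0), (30, 30)])      # fixations miss it entirely
```

Because the map is z-scored, NSS is invariant to the map's overall scale, which makes scores comparable across models with differently normalized outputs.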
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers
presented at CVPR 2015, the premier annual computer vision event held in
June 2015, in order to grasp the trends in the field. Further, we propose
"DeepSurvey" as a mechanism embodying the entire process, from reading
all the papers, through the generation of ideas, to the writing of papers.
Comment: Survey Paper
Wavelet-based Scale Saliency
Both pixel-based scale saliency (PSS) and basis projection methods focus on
multiscale analysis of data content and structure. Their theoretical relations
and practical combination have been discussed previously. However, no model has
since been proposed for calculating scale saliency on basis-projected
descriptors. This paper extends those ideas into mathematical models and
implements them as wavelet-based scale saliency (WSS). While PSS uses
pixel-value descriptors, WSS treats wavelet sub-bands as basis descriptors. The
paper discusses different wavelet descriptors: the discrete wavelet transform
(DWT), the discrete wavelet packet transform (DWPT), the quaternion wavelet
transform (QWT), and the best-basis quaternion wavelet packet transform
(QWPTBB). WSS saliency maps of the different descriptors are generated and
compared against other saliency methods by both quantitative and qualitative
means. Quantitative results (ROC curves, AUC values, and NSS values) are
collected from simulations on the Bruce and Kootstra image databases, with
human eye-tracking data as ground truth. Furthermore, qualitative visual
results of the saliency maps are analyzed and compared against each other as
well as against the eye-tracking data included in the databases.
Comment: Partly published in ACIIDS 2013, Kuala Lumpur, Malaysia
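The sub-band-as-descriptor idea can be illustrated with a rough sketch (assumptions mine; the WSS model is more involved): a one-level Haar DWT implemented in numpy, whose sub-band coefficients serve as basis descriptors scored by the Shannon entropy of their histograms, in the spirit of entropy-based scale saliency.

```python
# One-level 2-D Haar DWT whose detail sub-bands act as basis descriptors;
# Shannon entropy of a sub-band's coefficient histogram scores how
# "informative" (salient) a patch is at that scale.
import numpy as np

def haar_dwt2(img):
    """Return the (LL, LH, HL, HH) sub-bands of a one-level Haar transform."""
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # row averages
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # row details
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    hl = (a[:, 0::2] - a[:, 1::2]) / 2.0
    lh = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def entropy(coeffs, bins=16):
    """Shannon entropy (bits) of a coefficient histogram."""
    p, _ = np.histogram(coeffs, bins=bins)
    p = p / p.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(2)
flat = np.full((32, 32), 0.5)
textured = rng.uniform(size=(32, 32))
e_flat = entropy(haar_dwt2(flat)[3])      # HH sub-band of a flat patch
e_tex = entropy(haar_dwt2(textured)[3])   # HH sub-band of a textured patch
```

A flat patch yields all-zero detail coefficients and hence zero entropy, while texture spreads the HH histogram out, matching the intuition that scale saliency rewards descriptor unpredictability.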