cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers presented at CVPR2015, the premier annual computer vision event held in June 2015, in order to grasp the trends in the field. Further, we propose "DeepSurvey" as a mechanism embodying the entire process from reading all the papers, through the generation of ideas, to the writing of papers. Comment: Survey Paper
Salient Object Detection in Video using Deep Non-Local Neural Networks
Detection of salient objects in image and video is of great importance in
many computer vision applications. Although the state of the art in saliency detection for still images has changed substantially over the last few years, there have been few improvements in video saliency detection. This paper investigates the use of recently introduced non-local
neural networks in video salient object detection. Non-local neural networks
are applied to capture global dependencies and hence determine the salient
objects. The effect of non-local operations is studied separately on static and
dynamic saliency detection in order to exploit both appearance and motion
features. A novel deep non-local neural network architecture is introduced for
video salient object detection and tested on two well-known datasets, DAVIS and FBMS. The experimental results show that the proposed algorithm outperforms state-of-the-art video saliency detection methods. Comment: Submitted to Journal of Visual Communication and Image Representation
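As a rough illustration of the building block this abstract refers to, a non-local operation computes each position's response as an attention-weighted sum over all positions, which is how global dependencies are captured. The PyTorch sketch below shows a generic embedded-Gaussian non-local block; it is an assumption-laden illustration (channel sizes, residual form, and the surrounding saliency network are not taken from the paper).

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Illustrative embedded-Gaussian non-local block (in the spirit of
    Wang et al.'s non-local neural networks); not the paper's exact
    video-saliency architecture."""
    def __init__(self, in_channels, inter_channels=None):
        super().__init__()
        self.inter_channels = inter_channels or in_channels // 2
        self.theta = nn.Conv2d(in_channels, self.inter_channels, kernel_size=1)
        self.phi = nn.Conv2d(in_channels, self.inter_channels, kernel_size=1)
        self.g = nn.Conv2d(in_channels, self.inter_channels, kernel_size=1)
        self.out = nn.Conv2d(self.inter_channels, in_channels, kernel_size=1)

    def forward(self, x):  # x: (B, C, H, W) feature map
        b, _, h, w = x.shape
        theta = self.theta(x).flatten(2).transpose(1, 2)  # (B, HW, C')
        phi = self.phi(x).flatten(2)                      # (B, C', HW)
        g = self.g(x).flatten(2).transpose(1, 2)          # (B, HW, C')
        attn = torch.softmax(theta @ phi, dim=-1)         # affinities over all positions
        y = (attn @ g).transpose(1, 2).reshape(b, self.inter_channels, h, w)
        return x + self.out(y)                            # residual connection
```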
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
This paper presents the futuristic challenges discussed in the cvpaper.challenge. In 2015 and 2016, we thoroughly studied 1,600+ papers in several conferences and journals such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV.
Two-stream Collaborative Learning with Spatial-Temporal Attention for Video Classification
Video classification is highly important with wide applications, such as
video search and intelligent surveillance. Video naturally consists of static
and motion information, which can be represented by frame and optical flow.
Recently, researchers have generally adopted deep networks to capture the static and motion information separately, which mainly has two limitations: (1) Ignoring the coexistence relationship between spatial and temporal attention, while they should be jointly modelled as the spatial and temporal evolutions of the video so that discriminative video features can be extracted. (2) Ignoring the strong complementarity between static and motion information coexisting in video, while they should be collaboratively learned to boost each other. To address the above two limitations, this paper proposes
the approach of two-stream collaborative learning with spatial-temporal
attention (TCLSTA), which consists of two models: (1) Spatial-temporal
attention model: The spatial-level attention emphasizes the salient regions in
frame, and the temporal-level attention exploits the discriminative frames in
video. They are jointly learned and mutually boosted to learn the
discriminative static and motion features for better classification
performance. (2) Static-motion collaborative model: It not only achieves mutual
guidance on static and motion information to boost the feature learning, but
also adaptively learns the fusion weights of static and motion streams, so as
to exploit the strong complementarity between static and motion information to
promote video classification. Experiments on 4 widely-used datasets show that
our TCLSTA approach achieves the best performance compared with more than 10
state-of-the-art methods. Comment: 14 pages, accepted by IEEE Transactions on Circuits and Systems for Video Technology
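The static-motion collaborative model above can be read as an adaptively weighted fusion of the frame (static) and optical-flow (motion) streams. The sketch below, in PyTorch, is a hypothetical illustration of that idea only; the feature dimensions, classifier heads, and the small network that predicts the fusion weights are assumptions, not the authors' TCLSTA implementation.

```python
import torch
import torch.nn as nn

class AdaptiveTwoStreamFusion(nn.Module):
    """Hypothetical sketch: fuse static (frame) and motion (optical-flow)
    stream predictions with adaptively learned weights."""
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.static_head = nn.Linear(feat_dim, num_classes)
        self.motion_head = nn.Linear(feat_dim, num_classes)
        # predicts a pair of fusion weights from both stream features
        self.weight_net = nn.Sequential(nn.Linear(2 * feat_dim, 2), nn.Softmax(dim=-1))

    def forward(self, static_feat, motion_feat):  # (B, feat_dim) each
        s_logits = self.static_head(static_feat)
        m_logits = self.motion_head(motion_feat)
        w = self.weight_net(torch.cat([static_feat, motion_feat], dim=-1))  # (B, 2)
        return w[:, :1] * s_logits + w[:, 1:] * m_logits  # weighted late fusion
```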
Motion-Appearance Interactive Encoding for Object Segmentation in Unconstrained Videos
We present a novel method of integrating motion and appearance cues for
foreground object segmentation in unconstrained videos. Unlike conventional
methods encoding motion and appearance patterns individually, our method puts
particular emphasis on their mutual assistance. Specifically, we propose using
an interactively constrained encoding (ICE) scheme to incorporate motion and
appearance patterns into a graph that leads to a spatiotemporal energy
optimization. The reason for utilizing ICE is that motion and appearance cues for the same target share an underlying correlative structure and can thus be exploited in a deeply collaborative manner. We perform ICE not only in the
initialization but also in the refinement stage of a two-layer framework for
object segmentation. This scheme allows our method to consistently capture
structural patterns about object perceptions throughout the whole framework.
Our method operates on superpixels instead of raw pixels to reduce the
number of graph nodes by two orders of magnitude. Moreover, we propose to
partially explore the multi-object localization problem with inter-occlusion by
weighted bipartite graph matching. Comprehensive experiments on three benchmark
datasets (i.e., SegTrack, MOViCS, and GaTech) demonstrate the effectiveness of
our approach compared with a wide range of state-of-the-art methods. Comment: 11 pages, 7 figures
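For the multi-object localization step mentioned above, weighted bipartite graph matching can be solved with the Hungarian algorithm. The snippet below is only a toy illustration of that matching step, with an assumed cosine-distance cost over appearance features; the paper's actual cost design and occlusion handling are not reproduced.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_objects(prev_feats: np.ndarray, curr_feats: np.ndarray):
    """Toy weighted bipartite matching between object descriptors in two
    frames (rows of prev_feats and curr_feats); cost is cosine distance."""
    prev = prev_feats / np.linalg.norm(prev_feats, axis=1, keepdims=True)
    curr = curr_feats / np.linalg.norm(curr_feats, axis=1, keepdims=True)
    cost = 1.0 - prev @ curr.T                # lower cost = more similar appearance
    rows, cols = linear_sum_assignment(cost)  # minimum-cost assignment
    return list(zip(rows.tolist(), cols.tolist()))
```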
A Review of Co-saliency Detection Technique: Fundamentals, Applications, and Challenges
Co-saliency detection is a newly emerging and rapidly growing research area in the computer vision community. As a novel branch of visual saliency, co-saliency
detection refers to the discovery of common and salient foregrounds from two or
more relevant images, and can be widely used in many computer vision tasks. The
existing co-saliency detection algorithms mainly consist of three components:
extracting effective features to represent the image regions, exploring the
informative cues or factors to characterize co-saliency, and designing
effective computational frameworks to formulate co-saliency. Although numerous
methods have been developed, the literature is still lacking a deep review and
evaluation of co-saliency detection techniques. In this paper, we aim at
providing a comprehensive review of the fundamentals, challenges, and
applications of co-saliency detection. Specifically, we provide an overview of
some related computer vision works, review the history of co-saliency
detection, summarize and categorize the major algorithms in this research area,
discuss some open issues in this area, present the potential applications of
co-saliency detection, and finally point out some unsolved challenges and
promising future works. We expect this review to be beneficial to both fresh
and senior researchers in this field, and give insights to researchers in other
related areas regarding the utility of co-saliency detection algorithms. Comment: 28 pages, 12 figures, 3 tables
Towards Storytelling from Visual Lifelogging: An Overview
Visual lifelogging consists of acquiring images that capture the daily
experiences of the user by wearing a camera over a long period of time. The
pictures taken offer considerable potential for knowledge mining concerning how
people live their lives, hence, they open up new opportunities for many
potential applications in fields including healthcare, security, leisure and
the quantified self. However, automatically building a story from a huge
collection of unstructured egocentric data presents major challenges. This
paper provides a thorough review of advances made so far in egocentric data
analysis, and in view of the current state of the art, indicates new lines of
research to move us towards storytelling from visual lifelogging. Comment: 16 pages, 11 figures, Submitted to IEEE Transactions on Human-Machine Systems
Review of Visual Saliency Detection with Comprehensive Information
Visual saliency detection models simulate the human visual system to perceive the scene, and have been widely used in many vision tasks. With the development of acquisition technology, more comprehensive information, such as depth cues, inter-image correspondences, or temporal relationships, has become available to extend image saliency detection to RGBD saliency detection, co-saliency detection, or video saliency detection. RGBD saliency detection models focus on extracting the salient regions from RGBD images by combining the depth information. Co-saliency detection models introduce an inter-image correspondence constraint to discover the common salient object in an image group. The goal of video saliency detection models is to locate the motion-related salient object in video sequences, considering the motion cue and spatiotemporal constraints jointly. In this paper, we review the different types of saliency detection algorithms, summarize the important issues of the existing methods, and discuss open problems and future work. Moreover, the evaluation datasets and quantitative measurements are briefly introduced, and an experimental analysis and discussion are conducted to provide a holistic overview of the different saliency detection methods. Comment: 18 pages, 11 figures, 7 tables, Accepted by IEEE Transactions on Circuits and Systems for Video Technology 2018, https://rmcong.github.io
Advances in Human Action Recognition: A Survey
Human action recognition has been an important topic in computer vision due
to its many applications such as video surveillance, human-machine interaction,
and video retrieval. One core problem behind these applications is
automatically recognizing low-level actions and high-level activities of
interest. The former is usually the basis for the latter. This survey gives an
overview of the most recent advances in human action recognition during the
past several years, following a well-formed taxonomy proposed by a previous
survey. From this state-of-the-art survey, researchers can view a panorama of
progress in this area for future research.
Spatio-Temporal Saliency Networks for Dynamic Saliency Prediction
Computational saliency models for still images have gained significant
popularity in recent years. Saliency prediction from videos, on the other hand,
has received relatively little interest from the community. Motivated by this,
in this work, we study the use of deep learning for dynamic saliency prediction
and propose the so-called spatio-temporal saliency networks. The key to our
models is the architecture of two-stream networks where we investigate
different fusion mechanisms to integrate spatial and temporal information. We
evaluate our models on the DIEM and UCF-Sports datasets and present highly
competitive results against the existing state-of-the-art models. We also carry
out some experiments on a number of still images from the MIT300 dataset by
exploiting the optical flow maps predicted from these images. Our results show
that considering inherent motion information in this way can be helpful for
static saliency estimation.
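Among the fusion mechanisms such two-stream saliency networks can use, a simple convolutional late fusion of the spatial and temporal saliency maps is sketched below. This is only one assumed variant for illustration; the backbones, their output shapes, and the 1x1 fusion layer are not taken from the paper.

```python
import torch
import torch.nn as nn

class LateFusionSaliency(nn.Module):
    """Minimal sketch of convolutional late fusion for a two-stream
    (appearance + motion) dynamic saliency model; backbones are assumed
    to map their input to a single-channel saliency map."""
    def __init__(self, spatial_net: nn.Module, temporal_net: nn.Module):
        super().__init__()
        self.spatial_net = spatial_net    # RGB frame    -> (B, 1, H, W)
        self.temporal_net = temporal_net  # optical flow -> (B, 1, H, W)
        self.fuse = nn.Conv2d(2, 1, kernel_size=1)  # learn how to combine the two maps

    def forward(self, frame, flow):
        s = self.spatial_net(frame)
        t = self.temporal_net(flow)
        return torch.sigmoid(self.fuse(torch.cat([s, t], dim=1)))
```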