Generalized Video Deblurring for Dynamic Scenes
Several state-of-the-art video deblurring methods are based on a strong
assumption that the captured scenes are static. These methods fail to deblur
blurry videos in dynamic scenes. Unlike these methods, we propose a video
deblurring method that handles the general blurs inherent in dynamic scenes. To
handle locally varying and general blurs caused by various sources, such as
camera shake, moving objects, and depth variation in a scene, we approximate
the pixel-wise blur kernel with bidirectional optical flows. We then propose a
single energy model that simultaneously estimates optical flows and latent
frames to solve our deblurring problem. We also provide a framework and
efficient solvers to optimize the energy model. By minimizing the proposed
energy function, we achieve significant improvements in removing blurs and
estimating accurate optical flows in blurry frames. Extensive experimental
results demonstrate the superiority of the proposed method on real and
challenging videos where state-of-the-art methods fail at either deblurring or
optical flow estimation.
Comment: CVPR 2015 oral
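The core idea of approximating a pixel-wise blur kernel from bidirectional optical flows can be illustrated with a small sketch. This is a simplified stand-in, not the paper's joint energy-minimization solver; the function name, the sampling scheme, and the grayscale single-frame setting are all assumptions for illustration:

```python
import numpy as np

def motion_blur_sample(frame, x, y, flow_fwd, flow_bwd, n=5):
    """Approximate the blurred value at pixel (x, y) by averaging samples
    along the forward and backward optical flow vectors, i.e. treating the
    pixel-wise blur kernel as a short linear motion trajectory."""
    h, w = frame.shape
    vals = []
    for t in np.linspace(0.0, 1.0, n):
        # sample along both flow directions (bidirectional trajectory)
        for (dx, dy) in (flow_fwd, flow_bwd):
            xi = int(round(np.clip(x + t * dx, 0, w - 1)))
            yi = int(round(np.clip(y + t * dy, 0, h - 1)))
            vals.append(frame[yi, xi])
    return float(np.mean(vals))
```

With zero flow in both directions the "kernel" degenerates to the identity, so the sampled value equals the sharp pixel value, which is the expected limiting behavior.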
Joint Estimation of Camera Pose, Depth, Deblurring, and Super-Resolution from a Blurred Image Sequence
The conventional methods for estimating camera poses and scene structures
from severely blurry or low resolution images often result in failure. The
off-the-shelf deblurring or super-resolution methods may show visually pleasing
results. However, applying each technique independently before matching is
generally ineffective because this naive sequence of procedures ignores the
consistency between images. In this paper, we propose a pioneering unified
framework that solves four problems simultaneously, namely, dense depth
reconstruction, camera pose estimation, super-resolution, and deblurring. By
reflecting a physical imaging process, we formulate a cost minimization problem
and solve it using an alternating optimization technique. The experimental
results on both synthetic and real videos show high-quality depth maps derived
from severely degraded images, in contrast to the failures of naive multi-view
stereo methods. Our proposed method also produces outstanding deblurred and
super-resolved images unlike the independent application or combination of
conventional video deblurring and super-resolution methods.
Comment: accepted to ICCV 201
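The alternating optimization pattern the abstract describes — fixing all but one group of variables and minimizing exactly, then cycling — can be sketched on a toy joint cost. The cost function and variable names here are invented for illustration; the paper's actual variables are depth, pose, and the latent sharp high-resolution frames:

```python
def alternating_minimize(iters=50):
    """Toy alternating optimization of the joint cost
    f(u, v) = (u - v)**2 + (u - 3)**2 + (v - 1)**2:
    each step solves exactly for one variable while holding
    the other fixed, the same pattern applied per-variable-block
    in joint estimation problems."""
    u, v = 0.0, 0.0
    for _ in range(iters):
        u = (v + 3.0) / 2.0   # argmin over u, from df/du = 0
        v = (u + 1.0) / 2.0   # argmin over v, from df/dv = 0
    return u, v
```

Each sub-step is a contraction here, so the iterates converge to the joint minimizer (u, v) = (7/3, 5/3); in the paper, the analogous per-block updates couple the four sub-problems through the shared imaging model.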
Multiple Object Tracking: A Literature Review
Multiple Object Tracking (MOT) is an important computer vision problem which
has gained increasing attention due to its academic and commercial potential.
Although different kinds of approaches have been proposed to tackle this
problem, it still remains challenging due to factors like abrupt appearance
changes and severe object occlusions. In this work, we contribute the first
comprehensive and most recent review on this problem. We inspect the recent
advances in various aspects and propose some interesting directions for future
research. To the best of our knowledge, there has not been any extensive review
on this topic in the community. We endeavor to provide a thorough review on the
development of this problem in recent decades. The main contributions of this
review are fourfold: 1) Key aspects of a multiple object tracking system,
including its formulation, categorization, key principles, and evaluation, are
discussed. 2) Instead of enumerating individual works, we discuss existing
approaches according to various aspects, in each of which methods are divided
into different groups, and each group is discussed in detail with respect to
its principles, advances, and drawbacks. 3) We examine the experiments of
existing publications and
summarize results on popular datasets to provide quantitative comparisons. We
also point to some interesting discoveries by analyzing these results. 4) We
provide a discussion of open issues in MOT research, as well as some
promising directions for future research.
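A recurring formulation in the MOT literature surveyed here is tracking-by-detection, where existing tracks are associated with per-frame detections by spatial overlap. The following greedy IoU association step is a minimal illustrative sketch (real systems typically use Hungarian matching, motion models, and appearance features); all names and the threshold value are assumptions:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, thresh=0.3):
    """Greedy data association: each track claims its best-overlapping
    unclaimed detection above the IoU threshold.
    Returns {track_index: detection_index}."""
    matches, used = {}, set()
    for ti, t in enumerate(tracks):
        best, best_iou = None, thresh
        for di, d in enumerate(detections):
            if di in used:
                continue
            o = iou(t, d)
            if o > best_iou:
                best, best_iou = di, o
        if best is not None:
            matches[ti] = best
            used.add(best)
    return matches
```

Unmatched detections would spawn new tracks and unmatched tracks would age out; those bookkeeping steps are omitted here.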
Pushing the Limits of Deep CNNs for Pedestrian Detection
Compared to other applications in computer vision, convolutional neural
networks have under-performed on pedestrian detection. A breakthrough was made
very recently by using sophisticated deep CNN models combined with
hand-crafted features or an explicit occlusion-handling mechanism. In this work,
we show that by re-using the convolutional feature maps (CFMs) of a deep
convolutional neural network (DCNN) model as image features to train an
ensemble of boosted decision models, we are able to achieve the best reported
accuracy without using specially designed learning algorithms. We empirically
identify and disclose important implementation details. We also show that pixel
labelling may be simply combined with a detector to boost the detection
performance. By adding complementary hand-crafted features such as optical
flow, the DCNN based detector can be further improved. We set a new record on
the Caltech pedestrian dataset, substantially lowering the log-average miss
rate, and achieve results comparable to the state-of-the-art approaches on the
KITTI dataset.
Comment: Fixed some typos
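The pipeline of reusing fixed deep features to train a boosted ensemble can be sketched with decision stumps and AdaBoost. This is a generic stand-in: the CFM feature vectors are assumed to be given (here replaced by a toy array), and the paper's specific boosting variant and stump depth are not reproduced:

```python
import numpy as np

def train_adaboost(X, y, rounds=10):
    """AdaBoost over depth-1 decision stumps, standing in for the
    boosted decision models trained on pooled convolutional feature
    maps. Labels y must be in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)          # per-sample weights
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(d):           # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] >= thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)       # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)       # stump weight
        pred = sign * np.where(X[:, j] >= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)              # reweight samples
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    score = np.zeros(len(X))
    for alpha, j, thr, sign in ensemble:
        score += alpha * sign * np.where(X[:, j] >= thr, 1, -1)
    return np.sign(score)
```

The design point the abstract makes is that the learning stage stays simple: the discriminative power comes from the reused convolutional features, not from a specially designed algorithm.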
Structured Depth Prediction in Challenging Monocular Video Sequences
In this paper, we tackle the problem of estimating the depth of a scene from
a monocular video sequence. In particular, we handle challenging scenarios,
such as non-translational camera motion and dynamic scenes, where traditional
structure from motion and motion stereo methods do not apply. To this end, we
first study the problem of depth estimation from a single image. In this
context, we exploit the availability of a pool of images for which the depth is
known, and formulate monocular depth estimation as a discrete-continuous
optimization problem, where the continuous variables encode the depth of the
superpixels in the input image, and the discrete ones represent relationships
between neighboring superpixels. The solution to this discrete-continuous
optimization problem is obtained by performing inference in a graphical model
using particle belief propagation. To handle video sequences, we then extend
our single image model to a two-frame one that naturally encodes short-range
temporal consistency and inherently handles dynamic objects. Based on the
prediction of this model, we then introduce a fully-connected pairwise CRF that
accounts for longer range spatio-temporal interactions throughout a video. We
demonstrate the effectiveness of our model in both indoor and outdoor
scenarios.
Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects
Using synthetic data for training deep neural networks for robotic
manipulation holds the promise of an almost unlimited amount of pre-labeled
training data, generated safely out of harm's way. One of the key challenges of
synthetic data, to date, has been to bridge the so-called reality gap, so that
networks trained on synthetic data operate correctly when exposed to real-world
data. We explore the reality gap in the context of 6-DoF pose estimation of
known objects from a single RGB image. We show that for this problem the
reality gap can be successfully spanned by a simple combination of domain
randomized and photorealistic data. Using synthetic data generated in this
manner, we introduce a one-shot deep neural network that is able to perform
competitively against a state-of-the-art network trained on a combination of
real and synthetic data. To our knowledge, this is the first deep network
trained only on synthetic data that is able to achieve state-of-the-art
performance on 6-DoF object pose estimation. Our network also generalizes
better to novel environments including extreme lighting conditions, for which
we show qualitative results. Using this network we demonstrate a real-time
system estimating object poses with sufficient accuracy for real-world semantic
grasping of known household objects in clutter by a real robot.
Comment: Conference on Robot Learning (CoRL) 201
Differentiating Objects by Motion: Joint Detection and Tracking of Small Flying Objects
While generic object detection has achieved large improvements with rich
feature hierarchies from deep nets, detecting small objects with poor visual
cues remains challenging. Motion cues from multiple frames may be more
informative for detecting such hard-to-distinguish objects in each frame.
However, how to encode discriminative motion patterns, such as deformations and
pose changes that characterize objects, has remained an open question. To learn
them and thereby realize small object detection, we present a neural model
called the Recurrent Correlational Network, where detection and tracking are
jointly performed over a multi-frame representation learned through a single,
trainable, and end-to-end network. A convolutional long short-term memory
network is utilized for learning informative appearance change for detection,
while the learned representation is shared with tracking to enhance its
performance. In experiments with datasets containing images of scenes with
small flying objects, such as birds and unmanned aerial vehicles, the proposed
method yielded consistent improvements in detection performance over deep
single-frame detectors and existing motion-based detectors. Furthermore, our
network performs as well as state-of-the-art generic object trackers when
evaluated as a tracker on the bird dataset.
Comment: 10 pages, 8 figures
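The correlation operation that correlation-based trackers build on can be illustrated by plain normalized cross-correlation template matching: slide a template over a search window and take the best-scoring offset. This is the classical operation, not the paper's learned correlation layer; the function name and exhaustive search are illustrative choices:

```python
import numpy as np

def ncc_match(search, template):
    """Exhaustive normalized cross-correlation: returns the (row, col)
    top-left offset in `search` where `template` matches best."""
    th, tw = template.shape
    sh, sw = search.shape
    t = template - template.mean()
    best, best_pos = -np.inf, (0, 0)
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            patch = search[y:y + th, x:x + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p * p).sum() * (t * t).sum())
            score = (p * t).sum() / denom if denom > 0 else 0.0
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos
```

A learned correlation layer computes essentially this similarity map between feature maps instead of raw pixels, and makes it differentiable so detection and tracking can share one network.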
Human detection in surveillance videos and its applications - a review
Detecting human beings accurately in a visual surveillance system is crucial for diverse application areas including abnormal event detection, human gait characterization, congestion analysis, person identification, gender classification and fall detection for elderly people. The first step of the detection process is to detect an object which is in motion. Object detection could be performed using background subtraction, optical flow and spatio-temporal filtering techniques. Once detected, a moving object could be classified as a human being using shape-based, texture-based or motion-based features. A comprehensive review with comparisons on available techniques for detecting human beings in surveillance videos is presented in this paper. The characteristics of a few benchmark datasets, as well as future research directions on human detection, are also discussed.
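The first stage the review describes, detecting moving objects via background subtraction, can be sketched with a running-average background model. This is a minimal illustration; the learning rate, threshold, and grayscale setting are assumed, and production systems use more robust models (e.g. mixtures of Gaussians):

```python
import numpy as np

def background_subtract(frames, alpha=0.5, thresh=0.5):
    """Running-average background subtraction: the background is an
    exponential moving average of past frames, and pixels deviating
    from it by more than `thresh` are flagged as moving."""
    bg = frames[0].astype(float)
    masks = []
    for f in frames[1:]:
        mask = np.abs(f - bg) > thresh   # foreground = large deviation
        masks.append(mask)
        bg = (1 - alpha) * bg + alpha * f  # update background model
    return masks
```

The resulting foreground masks feed the second stage, where blob shape, texture, or motion features decide whether each moving region is a human.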
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
The paper presents the futuristic challenges discussed in the
cvpaper.challenge. In 2015 and 2016, we thoroughly studied 1,600+ papers in
several conferences/journals such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV
- …