A survey on 2d object tracking in digital video
This paper surveys object tracking methods in video. Algorithms for rigid, non-rigid, and articulated object tracking are studied. The goal of this article is to review the state-of-the-art tracking methods, classify them into different categories, and identify new trends. Tracking objects in consecutive frames is often supported by a prediction scheme: based on information extracted from previous frames and any high-level information that can be obtained, the state (location) of the object is predicted. An excellent framework for prediction is the Kalman filter, which additionally estimates the prediction error. In complex scenes, multiple hypotheses can be maintained with a particle filter instead of a single hypothesis. Different techniques are given for different types of constraints in video.
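The prediction step described in this abstract can be illustrated with a minimal sketch of a Kalman filter for a 2D image position. The constant-velocity motion model, the noise variances, and all function names below are assumptions chosen for illustration; they are not taken from the survey.

```python
import numpy as np

def make_constant_velocity_model(dt=1.0, process_var=1e-2, meas_var=1.0):
    """Return (F, H, Q, R) for an assumed state [x, y, vx, vy] observed as [x, y]."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)   # state transition
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)   # only position is measured
    Q = process_var * np.eye(4)                  # process noise (assumed value)
    R = meas_var * np.eye(2)                     # measurement noise (assumed value)
    return F, H, Q, R

def predict(x, P, F, Q):
    """Propagate the state and its error covariance one frame ahead."""
    return F @ x, F @ P @ F.T + Q

def update(x_pred, P_pred, z, H, R):
    """Correct the prediction with the position z measured in the new frame."""
    innovation = z - H @ x_pred
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x_pred + K @ innovation
    P = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x, P

def track(measurements, dt=1.0):
    """Run the filter over per-frame (x, y) detections; return the predicted positions."""
    F, H, Q, R = make_constant_velocity_model(dt)
    x = np.array([measurements[0][0], measurements[0][1], 0.0, 0.0])
    P = 10.0 * np.eye(4)                         # large initial uncertainty
    predictions = []
    for z in measurements[1:]:
        x, P = predict(x, P, F, Q)
        predictions.append(x[:2].copy())         # predicted location for this frame
        x, P = update(x, P, np.asarray(z, dtype=float), H, R)
    return np.array(predictions), x, P
```

The covariance `P` returned by `predict` is the filter's own estimate of the prediction error, which a tracker can use, for example, to size the search window in the next frame.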
FULL 3D RECONSTRUCTION OF DYNAMIC NON-RIGID SCENES: ACQUISITION AND ENHANCEMENT
Recent advances in commodity depth or 3D sensing technologies have enabled us to move
closer to the goal of accurately sensing and modeling the 3D representations of complex
dynamic scenes. Indeed, in domains such as virtual reality, security, surveillance and
e-health, there is now a greater demand for affordable and flexible vision systems which
are capable of acquiring high quality 3D reconstructions. Available commodity RGB-D
cameras, though easily accessible, have a limited field-of-view, and acquire noisy and low-resolution measurements, which restricts their direct usage in building such vision systems.
This thesis targets these limitations and builds approaches around commodity 3D
sensing technologies to acquire noise-free and feature preserving full 3D reconstructions
of dynamic scenes containing static or moving, rigid or non-rigid objects. A mono-view
system based on a single RGB-D camera is incapable of acquiring a full 360-degree 3D reconstruction of a dynamic scene instantaneously. For this purpose, a multi-view system
composed of several RGB-D cameras covering the whole scene is used. In the first part of
this thesis, the domain of correctly aligning the information acquired from RGB-D cameras
in a multi-view system to provide full and textured 3D reconstructions of dynamic
scenes, instantaneously, is explored. This is achieved by solving the extrinsic calibration
problem. This thesis proposes an extrinsic calibration framework which uses the 2D
photometric and 3D geometric information, acquired with RGB-D cameras, according
to their relative (in)accuracies, affected by the presence of noise, in a single weighted
bi-objective optimization. An iterative scheme is also proposed, which estimates the parameters
of the noise model affecting both 2D and 3D measurements, and solves the extrinsic
calibration problem simultaneously. Results show improvement in calibration accuracy
as compared to state-of-the-art methods. In the second part of this thesis, the domain
of enhancement of noisy and low-resolution 3D data acquired with commodity RGB-D
cameras in both mono-view and multi-view systems is explored. This thesis extends
the state-of-the-art in mono-view template-free recursive 3D data enhancement which targets
dynamic scenes containing rigid objects, and thus requires tracking only the global
motions of those objects for view-dependent surface representation and filtering. This
thesis proposes to target dynamic scenes containing non-rigid objects which introduces
the complex requirements of tracking relatively large local motions and maintaining data
organization for view-dependent surface representation. The proposed method is shown
to be effective in handling non-rigid objects of changing topologies. Building upon the
previous work, this thesis overcomes the requirement of data organization by proposing
an approach based on view-independent surface representation. View-independence
decreases the complexity of the proposed algorithm and allows it the flexibility to process
and enhance noisy data, acquired with multiple cameras in a multi-view system,
simultaneously. Moreover, qualitative and quantitative experimental analysis shows this
method to be more accurate in removing noise to produce enhanced 3D reconstructions
of non-rigid objects. Although extending this method to a multi-view system would
allow for obtaining instantaneous enhanced full 360-degree 3D reconstructions of non-rigid
objects, it still lacks the ability to explicitly handle low-resolution data. Therefore, this
thesis proposes a novel recursive dynamic multi-frame 3D super-resolution algorithm
together with a novel 3D bilateral total variation regularization to filter out the noise,
recover details and enhance the resolution of data acquired from commodity cameras in
a multi-view system. Results show that this method is able to build accurate, smooth
and feature preserving full 360-degree 3D reconstructions of dynamic scenes containing
non-rigid objects.
Temporally coherent 4D reconstruction of complex dynamic scenes
This paper presents an approach for reconstruction of 4D temporally coherent
models of complex dynamic scenes. No prior knowledge is required of scene
structure or camera calibration allowing reconstruction from multiple moving
cameras. Sparse-to-dense temporal correspondence is integrated with joint
multi-view segmentation and reconstruction to obtain a complete 4D
representation of static and dynamic objects. Temporal coherence is exploited
to overcome visual ambiguities resulting in improved reconstruction of complex
scenes. Robust joint segmentation and reconstruction of dynamic objects is
achieved by introducing a geodesic star convexity constraint. Comparative
evaluation is performed on a variety of unstructured indoor and outdoor dynamic
scenes with hand-held cameras and multiple people. This demonstrates
reconstruction of complete temporally coherent 4D scene models with improved
nonrigid object segmentation and shape reconstruction.
Comment: To appear in The IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 2016. Video available at:
https://www.youtube.com/watch?v=bm_P13_-Ds
Real-Time Enhancement of Dynamic Depth Videos with Non-Rigid Deformations
We propose a novel approach for enhancing depth videos containing non-rigidly deforming objects. Depth sensors are capable of capturing depth maps in real time but suffer from high noise levels and low spatial resolutions. While solutions for reconstructing 3D details in static scenes, or in scenes with rigid global motions, have recently been proposed, handling unconstrained non-rigid deformations in relatively complex scenes remains a challenge. Our solution consists of a recursive dynamic multi-frame super-resolution algorithm in which the relative local 3D motions between consecutive frames are directly accounted for. We rely on the assumption that these 3D motions can be decoupled into lateral motions and radial displacements. This allows us to perform a simple local per-pixel tracking in which both depth measurements and deformations are dynamically optimized. Geometric smoothness is subsequently enforced using a multi-level L1 minimization with a bilateral total variation regularization. The performance of this method is thoroughly evaluated on both real and synthetic data. Compared to alternative approaches, the results show a clear improvement in reconstruction accuracy and in robustness to noise, to relatively large non-rigid deformations, and to topological changes. Moreover, the proposed approach, implemented on a CPU, is shown to be computationally efficient and to run in real time.
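To make the bilateral total variation term mentioned in this abstract more concrete, the sketch below evaluates the classic 2D bilateral total variation penalty on a depth map: a weighted sum of L1 differences between the map and shifted copies of itself. This is the generic image-domain form, not necessarily the exact regularizer used by the authors. The window size, the decay factor `alpha`, the full symmetric window, and the circular boundary handling via `np.roll` are all assumptions made for brevity.

```python
import numpy as np

def bilateral_total_variation(depth, window=2, alpha=0.7):
    """Sum of alpha**(|l|+|m|) * ||D - shift(D, l, m)||_1 over a (2*window+1)^2 neighbourhood.

    Assumptions: the full symmetric window is used, and shifts wrap around the
    image border (np.roll), which a real implementation would likely replace
    with proper boundary handling.
    """
    depth = np.asarray(depth, dtype=float)
    total = 0.0
    for l in range(-window, window + 1):
        for m in range(-window, window + 1):
            if l == 0 and m == 0:
                continue                          # zero shift contributes nothing
            shifted = np.roll(depth, shift=(l, m), axis=(0, 1))
            weight = alpha ** (abs(l) + abs(m))   # farther shifts count less
            total += weight * np.abs(depth - shifted).sum()
    return total
```

A flat depth map scores zero, while isolated noisy pixels raise the penalty. Because the differences are measured in the L1 norm, sharp depth edges are penalized less severely than they would be under a quadratic smoothness term.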
Shape Animation with Combined Captured and Simulated Dynamics
We present a novel volumetric animation generation framework to create new
types of animations from raw 3D surface or point cloud sequences of captured
real performances. The framework considers as input time-incoherent 3D
observations of a moving shape, and is thus particularly suitable for the
output of performance capture platforms. In our system, a suitable virtual
representation of the actor is built from real captures that allows seamless
combination and simulation with virtual external forces and objects, in which
the original captured actor can be reshaped, disassembled or reassembled from
user-specified virtual physics. Instead of using the dominant surface-based
geometric representation of the capture, which is less suitable for volumetric
effects, our pipeline exploits Centroidal Voronoi tessellation decompositions
as unified volumetric representation of the real captured actor, which we show
can be used seamlessly as a building block for all processing stages, from
capture and tracking to virtual physics simulation. The representation makes no
human-specific assumption and can be used to capture and re-simulate the actor
with props or other moving scenery elements. We demonstrate the potential of
this pipeline for virtual reanimation of a real captured event with various
unprecedented volumetric visual effects, such as volumetric distortion,
erosion, morphing, gravity pull, or collisions.
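As a rough illustration of the volumetric representation named in this abstract, a centroidal Voronoi tessellation can be approximated with Lloyd's algorithm on points sampled inside the shape: assign each sample to its nearest site, then move each site to the centroid of its cell. The sketch below is a discrete approximation under that assumption. The function name and parameters are hypothetical, and it does not reproduce the paper's clipped-cell construction.

```python
import numpy as np

def lloyd_cvt(points, n_sites, n_iters=50, seed=0):
    """Approximate a centroidal Voronoi tessellation of a sampled volume.

    points : (N, 3) array of samples assumed to lie inside the shape.
    Returns the site positions and the cell index of every sample.
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Initialise the sites on randomly chosen samples.
    sites = points[rng.choice(len(points), size=n_sites, replace=False)].copy()

    def assign(current_sites):
        # Squared distance from every sample to every site, then nearest site.
        d2 = ((points[:, None, :] - current_sites[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_iters):
        labels = assign(sites)
        for k in range(n_sites):
            members = points[labels == k]
            if len(members) > 0:                  # leave empty cells where they are
                sites[k] = members.mean(axis=0)   # move site to its cell centroid
    return sites, assign(sites)
```

At convergence each site coincides with the centroid of its own cell, which is the defining property of a centroidal tessellation and gives cells of fairly regular size and shape.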
Better Feature Tracking Through Subspace Constraints
Feature tracking in video is a crucial task in computer vision. Usually, the
tracking problem is handled one feature at a time, using a single-feature
tracker like the Kanade-Lucas-Tomasi algorithm, or one of its derivatives.
While this approach works quite well when dealing with high-quality video and
"strong" features, it often falters when faced with dark and noisy video
containing low-quality features. We present a framework for jointly tracking a
set of features, which enables sharing information between the different
features in the scene. We show that our method can be employed to track
features for both rigid and nonrigid motions (possibly of few moving bodies)
even when some features are occluded. Furthermore, it can be used to
significantly improve tracking results in poorly-lit scenes (where there is a
mix of good and bad features). Our approach does not require direct modeling of
the structure or the motion of the scene, and runs in real time on a single CPU
core.
Comment: 8 pages, 2 figures. CVPR 201
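One common way to share information between features is to require that the stacked feature trajectories lie near a low-dimensional subspace. The sketch below shows that idea in its simplest form, a truncated SVD projection of a trajectory matrix. It is an illustrative assumption about what a subspace constraint can look like, not the formulation used in the paper, and the function name and matrix layout are hypothetical.

```python
import numpy as np

def project_to_low_rank(trajectories, rank):
    """Project a trajectory matrix onto its best rank-`rank` approximation.

    trajectories : (2 * n_frames, n_features) array; each column stacks the
    x and y coordinates of one feature over time (assumed layout).
    """
    W = np.asarray(trajectories, dtype=float)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s = s.copy()
    s[rank:] = 0.0                    # discard directions outside the subspace
    return (U * s) @ Vt               # recompose the constrained trajectories
```

Under this assumption, a noisy or weak feature is pulled toward positions consistent with the motion of the stronger features, because its trajectory is forced into the subspace that they span.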