16,505 research outputs found
Temporally coherent 4D reconstruction of complex dynamic scenes
This paper presents an approach for reconstruction of 4D temporally coherent
models of complex dynamic scenes. No prior knowledge is required of scene
structure or camera calibration allowing reconstruction from multiple moving
cameras. Sparse-to-dense temporal correspondence is integrated with joint
multi-view segmentation and reconstruction to obtain a complete 4D
representation of static and dynamic objects. Temporal coherence is exploited
to overcome visual ambiguities resulting in improved reconstruction of complex
scenes. Robust joint segmentation and reconstruction of dynamic objects is
achieved by introducing a geodesic star convexity constraint. Comparative
evaluation is performed on a variety of unstructured indoor and outdoor dynamic
scenes with hand-held cameras and multiple people. This demonstrates
reconstruction of complete temporally coherent 4D scene models with improved
nonrigid object segmentation and shape reconstruction.Comment: To appear in The IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 2016 . Video available at:
https://www.youtube.com/watch?v=bm_P13_-Ds
General Dynamic Scene Reconstruction from Multiple View Video
This paper introduces a general approach to dynamic scene reconstruction from
multiple moving cameras without prior knowledge or limiting constraints on the
scene structure, appearance, or illumination. Existing techniques for dynamic
scene reconstruction from multiple wide-baseline camera views primarily focus
on accurate reconstruction in controlled environments, where the cameras are
fixed and calibrated and background is known. These approaches are not robust
for general dynamic scenes captured with sparse moving cameras. Previous
approaches for outdoor dynamic scene reconstruction assume prior knowledge of
the static background appearance and structure. The primary contributions of
this paper are twofold: an automatic method for initial coarse dynamic scene
segmentation and reconstruction without prior knowledge of background
appearance or structure; and a general robust approach for joint segmentation
refinement and dense reconstruction of dynamic scenes from multiple
wide-baseline static or moving cameras. Evaluation is performed on a variety of
indoor and outdoor scenes with cluttered backgrounds and multiple dynamic
non-rigid objects such as people. Comparison with state-of-the-art approaches
demonstrates improved accuracy in both multiple view segmentation and dense
reconstruction. The proposed approach also eliminates the requirement for prior
knowledge of scene structure and appearance
Dynamic Body VSLAM with Semantic Constraints
Image based reconstruction of urban environments is a challenging problem
that deals with optimization of large number of variables, and has several
sources of errors like the presence of dynamic objects. Since most large scale
approaches make the assumption of observing static scenes, dynamic objects are
relegated to the noise modeling section of such systems. This is an approach of
convenience since the RANSAC based framework used to compute most multiview
geometric quantities for static scenes naturally confine dynamic objects to the
class of outlier measurements. However, reconstructing dynamic objects along
with the static environment helps us get a complete picture of an urban
environment. Such understanding can then be used for important robotic tasks
like path planning for autonomous navigation, obstacle tracking and avoidance,
and other areas. In this paper, we propose a system for robust SLAM that works
in both static and dynamic environments. To overcome the challenge of dynamic
objects in the scene, we propose a new model to incorporate semantic
constraints into the reconstruction algorithm. While some of these constraints
are based on multi-layered dense CRFs trained over appearance as well as motion
cues, other proposed constraints can be expressed as additional terms in the
bundle adjustment optimization process that does iterative refinement of 3D
structure and camera / object motion trajectories. We show results on the
challenging KITTI urban dataset for accuracy of motion segmentation and
reconstruction of the trajectory and shape of moving objects relative to ground
truth. We are able to show average relative error reduction by a significant
amount for moving object trajectory reconstruction relative to state-of-the-art
methods like VISO 2, as well as standard bundle adjustment algorithms
Co-Fusion: Real-time Segmentation, Tracking and Fusion of Multiple Objects
In this paper we introduce Co-Fusion, a dense SLAM system that takes a live
stream of RGB-D images as input and segments the scene into different objects
(using either motion or semantic cues) while simultaneously tracking and
reconstructing their 3D shape in real time. We use a multiple model fitting
approach where each object can move independently from the background and still
be effectively tracked and its shape fused over time using only the information
from pixels associated with that object label. Previous attempts to deal with
dynamic scenes have typically considered moving regions as outliers, and
consequently do not model their shape or track their motion over time. In
contrast, we enable the robot to maintain 3D models for each of the segmented
objects and to improve them over time through fusion. As a result, our system
can enable a robot to maintain a scene description at the object level which
has the potential to allow interactions with its working environment; even in
the case of dynamic scenes.Comment: International Conference on Robotics and Automation (ICRA) 2017,
http://visual.cs.ucl.ac.uk/pubs/cofusion,
https://github.com/martinruenz/co-fusio
Robust Dense Mapping for Large-Scale Dynamic Environments
We present a stereo-based dense mapping algorithm for large-scale dynamic
urban environments. In contrast to other existing methods, we simultaneously
reconstruct the static background, the moving objects, and the potentially
moving but currently stationary objects separately, which is desirable for
high-level mobile robotic tasks such as path planning in crowded environments.
We use both instance-aware semantic segmentation and sparse scene flow to
classify objects as either background, moving, or potentially moving, thereby
ensuring that the system is able to model objects with the potential to
transition from static to dynamic, such as parked cars. Given camera poses
estimated from visual odometry, both the background and the (potentially)
moving objects are reconstructed separately by fusing the depth maps computed
from the stereo input. In addition to visual odometry, sparse scene flow is
also used to estimate the 3D motions of the detected moving objects, in order
to reconstruct them accurately. A map pruning technique is further developed to
improve reconstruction accuracy and reduce memory consumption, leading to
increased scalability. We evaluate our system thoroughly on the well-known
KITTI dataset. Our system is capable of running on a PC at approximately 2.5Hz,
with the primary bottleneck being the instance-aware semantic segmentation,
which is a limitation we hope to address in future work. The source code is
available from the project website (http://andreibarsan.github.io/dynslam).Comment: Presented at IEEE International Conference on Robotics and Automation
(ICRA), 201
Mesh-based 3D Textured Urban Mapping
In the era of autonomous driving, urban mapping represents a core step to let
vehicles interact with the urban context. Successful mapping algorithms have
been proposed in the last decade building the map leveraging on data from a
single sensor. The focus of the system presented in this paper is twofold: the
joint estimation of a 3D map from lidar data and images, based on a 3D mesh,
and its texturing. Indeed, even if most surveying vehicles for mapping are
endowed by cameras and lidar, existing mapping algorithms usually rely on
either images or lidar data; moreover both image-based and lidar-based systems
often represent the map as a point cloud, while a continuous textured mesh
representation would be useful for visualization and navigation purposes. In
the proposed framework, we join the accuracy of the 3D lidar data, and the
dense information and appearance carried by the images, in estimating a
visibility consistent map upon the lidar measurements, and refining it
photometrically through the acquired images. We evaluate the proposed framework
against the KITTI dataset and we show the performance improvement with respect
to two state of the art urban mapping algorithms, and two widely used surface
reconstruction algorithms in Computer Graphics.Comment: accepted at iros 201
- …