9,505 research outputs found
A Unified Framework for Mutual Improvement of SLAM and Semantic Segmentation
This paper presents a novel framework for simultaneously implementing
localization and segmentation, which are two of the most important vision-based
tasks for robotics. While the goals and techniques used for them were
considered to be different previously, we show that by making use of the
intermediate results of the two modules, their performance can be enhanced at
the same time. Our framework is able to handle both the instantaneous motion
and long-term changes of instances in localization with the help of the
segmentation result, which also benefits from the refined 3D pose information.
We conduct experiments on various datasets, and prove that our framework works
effectively on improving the precision and robustness of the two tasks and
outperforms existing localization and segmentation algorithms.Comment: 7 pages, 5 figures.This work has been accepted by ICRA 2019. The demo
video can be found at https://youtu.be/Bkt53dAehj
RGBD Datasets: Past, Present and Future
Since the launch of the Microsoft Kinect, scores of RGBD datasets have been
released. These have propelled advances in areas from reconstruction to gesture
recognition. In this paper we explore the field, reviewing datasets across
eight categories: semantics, object pose estimation, camera tracking, scene
reconstruction, object tracking, human actions, faces and identification. By
extracting relevant information in each category we help researchers to find
appropriate data for their needs, and we consider which datasets have succeeded
in driving computer vision forward and why.
Finally, we examine the future of RGBD datasets. We identify key areas which
are currently underexplored, and suggest that future directions may include
synthetic data and dense reconstructions of static and dynamic scenes.Comment: 8 pages excluding references (CVPR style
Dynamic Objects Segmentation for Visual Localization in Urban Environments
Visual localization and mapping is a crucial capability to address many
challenges in mobile robotics. It constitutes a robust, accurate and
cost-effective approach for local and global pose estimation within prior maps.
Yet, in highly dynamic environments, like crowded city streets, problems arise
as major parts of the image can be covered by dynamic objects. Consequently,
visual odometry pipelines often diverge and the localization systems
malfunction as detected features are not consistent with the precomputed 3D
model. In this work, we present an approach to automatically detect dynamic
object instances to improve the robustness of vision-based localization and
mapping in crowded environments. By training a convolutional neural network
model with a combination of synthetic and real-world data, dynamic object
instance masks are learned in a semi-supervised way. The real-world data can be
collected with a standard camera and requires minimal further post-processing.
Our experiments show that a wide range of dynamic objects can be reliably
detected using the presented method. Promising performance is demonstrated on
our own and also publicly available datasets, which also shows the
generalization capabilities of this approach.Comment: 4 pages, submitted to the IROS 2018 Workshop "From Freezing to
Jostling Robots: Current Challenges and New Paradigms for Safe Robot
Navigation in Dense Crowds
Co-Fusion: Real-time Segmentation, Tracking and Fusion of Multiple Objects
In this paper we introduce Co-Fusion, a dense SLAM system that takes a live
stream of RGB-D images as input and segments the scene into different objects
(using either motion or semantic cues) while simultaneously tracking and
reconstructing their 3D shape in real time. We use a multiple model fitting
approach where each object can move independently from the background and still
be effectively tracked and its shape fused over time using only the information
from pixels associated with that object label. Previous attempts to deal with
dynamic scenes have typically considered moving regions as outliers, and
consequently do not model their shape or track their motion over time. In
contrast, we enable the robot to maintain 3D models for each of the segmented
objects and to improve them over time through fusion. As a result, our system
can enable a robot to maintain a scene description at the object level which
has the potential to allow interactions with its working environment; even in
the case of dynamic scenes.Comment: International Conference on Robotics and Automation (ICRA) 2017,
http://visual.cs.ucl.ac.uk/pubs/cofusion,
https://github.com/martinruenz/co-fusio
RGB-D datasets using microsoft kinect or similar sensors: a survey
RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It takes the advantages of the color image that provides appearance information of an object and also the depth image that is immune to the variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, which are of great importance to benchmark the state-of-the-art. In this paper, we systematically survey popular RGB-D datasets for different applications including object recognition, scene classification, hand gesture recognition, 3D-simultaneous localization and mapping, and pose estimation. We provide the insights into the characteristics of each important dataset, and compare the popularity and the difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description about the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms
DS-SLAM: A Semantic Visual SLAM towards Dynamic Environments
Simultaneous Localization and Mapping (SLAM) is considered to be a
fundamental capability for intelligent mobile robots. Over the past decades,
many impressed SLAM systems have been developed and achieved good performance
under certain circumstances. However, some problems are still not well solved,
for example, how to tackle the moving objects in the dynamic environments, how
to make the robots truly understand the surroundings and accomplish advanced
tasks. In this paper, a robust semantic visual SLAM towards dynamic
environments named DS-SLAM is proposed. Five threads run in parallel in
DS-SLAM: tracking, semantic segmentation, local mapping, loop closing, and
dense semantic map creation. DS-SLAM combines semantic segmentation network
with moving consistency check method to reduce the impact of dynamic objects,
and thus the localization accuracy is highly improved in dynamic environments.
Meanwhile, a dense semantic octo-tree map is produced, which could be employed
for high-level tasks. We conduct experiments both on TUM RGB-D dataset and in
the real-world environment. The results demonstrate the absolute trajectory
accuracy in DS-SLAM can be improved by one order of magnitude compared with
ORB-SLAM2. It is one of the state-of-the-art SLAM systems in high-dynamic
environments. Now the code is available at our github:
https://github.com/ivipsourcecode/DS-SLAMComment: 7 pages, accepted at the 2018 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS 2018). Now the code is available at our
github: https://github.com/ivipsourcecode/DS-SLA
- …