ROAD: Reality Oriented Adaptation for Semantic Segmentation of Urban Scenes
Exploiting synthetic data to learn deep models has attracted increasing
attention in recent years. However, the intrinsic domain difference between
synthetic and real images usually causes a significant performance drop when
applying the learned model to real-world scenarios. This is mainly due to two
reasons: 1) the model overfits to synthetic images, leaving the convolutional
filters unable to extract informative representations from real images; 2)
there is a distribution difference between synthetic and real data, which is
also known as the domain adaptation problem. To this end, we propose a new
reality-oriented adaptation approach for urban scene semantic segmentation by
learning from synthetic data. First, we propose a target-guided distillation
approach to learn the real image style, achieved by training the segmentation
model to imitate a pretrained real-style model on real images.
Second, we further take advantage of the intrinsic spatial structure present
in urban scene images, and propose a spatial-aware adaptation scheme to
effectively align the distributions of the two domains. These two modules can be
readily integrated with existing state-of-the-art semantic segmentation
networks to improve their generalizability when adapting from synthetic to real
urban scenes. We evaluate the proposed method on the Cityscapes dataset by
adapting from the GTAV and SYNTHIA datasets, where the results demonstrate the
effectiveness of our method.
Comment: Add experiments on SYNTHIA; CVPR 2018 camera-ready version
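As a rough illustration of the target-guided distillation step, the sketch below trains a randomly initialized backbone (standing in for the segmentation model) to imitate the features that a frozen, real-image-pretrained teacher produces on real images. The teacher choice (an ImageNet-pretrained ResNet-18), the MSE imitation loss, and the optimizer settings are assumptions made for illustration, not the authors' configuration.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Frozen "real style" teacher: a network pretrained on real images
# (ImageNet weights are used here as a stand-in).
teacher = torch.nn.Sequential(
    *list(models.resnet18(weights=models.ResNet18_Weights.DEFAULT).children())[:-2]
).eval()
for p in teacher.parameters():
    p.requires_grad = False

# Student: the segmentation backbone being adapted (same architecture so
# the feature shapes match; randomly initialized).
student = torch.nn.Sequential(*list(models.resnet18(weights=None).children())[:-2])
opt = torch.optim.SGD(student.parameters(), lr=1e-3, momentum=0.9)

real_images = torch.randn(2, 3, 224, 224)  # placeholder batch of real images
with torch.no_grad():
    t_feat = teacher(real_images)          # target "real style" features
s_feat = student(real_images)
loss_distill = F.mse_loss(s_feat, t_feat)  # imitate the real-style teacher
opt.zero_grad()
loss_distill.backward()
opt.step()
```

In a full pipeline this distillation term would be combined with the supervised segmentation loss on the labeled synthetic images.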
Dynamic Objects Segmentation for Visual Localization in Urban Environments
Visual localization and mapping is a crucial capability to address many
challenges in mobile robotics. It constitutes a robust, accurate and
cost-effective approach for local and global pose estimation within prior maps.
Yet, in highly dynamic environments, like crowded city streets, problems arise
as major parts of the image can be covered by dynamic objects. Consequently,
visual odometry pipelines often diverge and the localization systems
malfunction as detected features are not consistent with the precomputed 3D
model. In this work, we present an approach to automatically detect dynamic
object instances to improve the robustness of vision-based localization and
mapping in crowded environments. By training a convolutional neural network
model with a combination of synthetic and real-world data, dynamic object
instance masks are learned in a semi-supervised way. The real-world data can be
collected with a standard camera and requires minimal further post-processing.
Our experiments show that a wide range of dynamic objects can be reliably
detected using the presented method. Promising performance is demonstrated on
our own dataset as well as on publicly available ones, which shows the
generalization capability of this approach.
Comment: 4 pages, submitted to the IROS 2018 Workshop "From Freezing to
Jostling Robots: Current Challenges and New Paradigms for Safe Robot
Navigation in Dense Crowds"
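The failure mode described above is that features detected on dynamic objects are inconsistent with the precomputed 3D model. A minimal sketch of one natural use of the predicted instance masks, discarding keypoints that fall on dynamic pixels before matching, is shown below; the function and toy data are illustrative assumptions, not part of the paper's pipeline.

```python
import numpy as np

def filter_dynamic_keypoints(keypoints, dynamic_mask):
    """Drop keypoints on pixels segmented as dynamic objects, so that only
    static scene structure is matched against the prior 3D model.

    keypoints: (N, 2) float array of (x, y) image coordinates.
    dynamic_mask: (H, W) bool array, True where a dynamic object was detected.
    """
    xs = keypoints[:, 0].astype(int)
    ys = keypoints[:, 1].astype(int)
    keep = ~dynamic_mask[ys, xs]
    return keypoints[keep]

# Toy usage: a 100x100 image whose left half is covered by a dynamic object.
mask = np.zeros((100, 100), dtype=bool)
mask[:, :50] = True
kps = np.array([[10.0, 20.0], [80.0, 30.0], [45.0, 90.0]])
print(filter_dynamic_keypoints(kps, mask))  # only the keypoint at x=80 survives
```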
A Fully Convolutional Tri-branch Network (FCTN) for Domain Adaptation
A domain adaptation method for urban scene segmentation is proposed in this
work. We develop a fully convolutional tri-branch network, where two branches
assign pseudo labels to images in the unlabeled target domain, while the third
branch is trained with supervision on these pseudo-labeled target images. The
re-labeling and re-training processes alternate. With this design,
the tri-branch network learns target-specific discriminative representations
progressively and, as a result, the cross-domain capability of the segmenter
improves. We evaluate the proposed network on large-scale domain adaptation
experiments using both synthetic (GTA) and real (Cityscapes) images. It is
shown that our solution achieves state-of-the-art performance and outperforms
previous methods by a significant margin.
Comment: Accepted by ICASSP 2018
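One plausible reading of the re-labeling step, in the spirit of tri-training, is that the two labeling branches assign a pseudo label only where they agree with sufficient confidence. The sketch below follows that reading; the agreement rule, the confidence threshold tau, and the ignore-index convention are assumptions, not the authors' exact criterion.

```python
import torch
import torch.nn.functional as F

IGNORE = 255  # ignore index for pixels without a reliable pseudo label

def pseudo_label(logits_a, logits_b, tau=0.9):
    """Pseudo-label a pixel only where both labeling branches predict the
    same class with confidence above tau; mark everything else as IGNORE."""
    prob_a, pred_a = F.softmax(logits_a, dim=1).max(dim=1)
    prob_b, pred_b = F.softmax(logits_b, dim=1).max(dim=1)
    agree = (pred_a == pred_b) & (prob_a > tau) & (prob_b > tau)
    return torch.where(agree, pred_a, torch.full_like(pred_a, IGNORE))

# Toy usage with random branch outputs over 19 classes (Cityscapes-style);
# tau is set low only so the random logits yield some agreed pixels.
logits_a = torch.randn(1, 19, 32, 32)
logits_b = torch.randn(1, 19, 32, 32)
labels = pseudo_label(logits_a, logits_b, tau=0.1)

# The third branch is then trained on these labels, skipping IGNORE pixels.
logits_c = torch.randn(1, 19, 32, 32, requires_grad=True)
loss = F.cross_entropy(logits_c, labels, ignore_index=IGNORE)
loss.backward()
```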
COCO_TS Dataset: Pixel-level Annotations Based on Weak Supervision for Scene Text Segmentation
The absence of large-scale datasets with pixel-level supervision is a
significant obstacle for the training of deep convolutional networks for scene
text segmentation. For this reason, synthetic data generation is normally
employed to enlarge the training dataset. Nonetheless, synthetic data cannot
reproduce the complexity and variability of natural images. In this paper, a
weakly supervised learning approach is used to reduce the shift between
training on synthetic and real data. Pixel-level supervision is generated for
a text detection dataset in which only bounding-box annotations are available.
In particular, the COCO-Text-Segmentation (COCO_TS) dataset, which provides
pixel-level supervision for the COCO-Text dataset, is created and
released. The generated annotations are used to train a deep convolutional
neural network for semantic segmentation. Experiments show that the proposed
dataset can be used instead of synthetic data, allowing us to use only a
fraction of the training samples while significantly improving performance
- …
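A minimal sketch of the box-to-pixel supervision idea described above: pixels outside every bounding box are labeled background, while inside each box a per-pixel text score (standing in for the paper's synthetic-trained scoring model) is thresholded into text, background, or an uncertain label that is ignored during training. The thresholds and label encoding are assumptions for illustration.

```python
import numpy as np

BACKGROUND, TEXT, UNCERTAIN = 0, 1, 255  # UNCERTAIN pixels are skipped in training

def boxes_to_weak_mask(h, w, boxes, scorer):
    """Build a pixel-level supervision map from bounding-box annotations.

    boxes: iterable of (x0, y0, x1, y1) integer corners.
    scorer: returns per-pixel text scores in [0, 1] for a given box region.
    """
    mask = np.full((h, w), BACKGROUND, dtype=np.uint8)
    for x0, y0, x1, y1 in boxes:
        scores = scorer(y0, y1, x0, x1)
        region = np.full(scores.shape, UNCERTAIN, dtype=np.uint8)
        region[scores > 0.7] = TEXT        # confident text pixels
        region[scores < 0.3] = BACKGROUND  # confident background pixels
        mask[y0:y1, x0:x1] = region
    return mask

# Toy usage: a random scorer stands in for the synthetic-trained model.
rng = np.random.default_rng(0)
scorer = lambda y0, y1, x0, x1: rng.random((y1 - y0, x1 - x0))
weak_mask = boxes_to_weak_mask(64, 64, [(10, 10, 30, 25)], scorer)
print(np.unique(weak_mask))  # -> [0 1 255]
```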