Visual road following using intrinsic images
We present a real-time, vision-based road-following method for mobile robots in outdoor environments. The approach combines an image processing method that retrieves illumination-invariant images with an efficient path-following algorithm. The method allows a mobile robot to autonomously navigate along pathways of different types, under adverse lighting conditions, using monocular vision.
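The abstract does not specify how the illumination-invariant ("intrinsic") images are obtained. A common approach in the road-following literature is a log-chromaticity projection (Finlayson-style shadow-free imaging); the sketch below, assuming that technique, illustrates the idea. The projection angle `theta_deg` is camera-dependent and must be calibrated per sensor; the value here is a placeholder.

```python
import numpy as np

def illumination_invariant(rgb, theta_deg=45.0):
    """Compute a 1-D illumination-invariant (shadow-free) grayscale image
    from an RGB image via log-chromaticity projection. theta_deg is the
    camera-specific invariant direction; 45 degrees is a placeholder and
    must be calibrated for a real camera."""
    rgb = rgb.astype(np.float64) + 1.0          # avoid log(0)
    # Log-chromaticity coordinates relative to the green channel.
    log_rg = np.log(rgb[..., 0] / rgb[..., 1])
    log_bg = np.log(rgb[..., 2] / rgb[..., 1])
    theta = np.deg2rad(theta_deg)
    # Project onto the invariant direction: along it, changes in
    # illuminant colour (shadow vs. sunlight) approximately cancel.
    invariant = log_rg * np.cos(theta) + log_bg * np.sin(theta)
    # Normalise to [0, 1] for display or downstream path following.
    lo, hi = invariant.min(), invariant.max()
    return (invariant - lo) / (hi - lo + 1e-12)
```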
Self-Supervised Relative Depth Learning for Urban Scene Understanding
As an agent moves through the world, the apparent motion of scene elements is (usually) inversely proportional to their depth. It is natural for a learning agent to associate image patterns with the magnitude of their displacement over time: as the agent moves, faraway mountains don't move much; nearby trees move a lot. This natural relationship between the appearance of objects and their motion is a rich source of information about the world. In this work, we start by training a deep network, using fully automatic supervision, to predict relative scene depth from single images. The relative depth training images are automatically derived from simple videos of cars moving through a scene, using recent motion segmentation techniques and no human-provided labels. This proxy task of predicting relative depth from a single image induces features in the network that yield large improvements over a network trained from scratch on a set of downstream tasks, including semantic segmentation, joint road segmentation and car detection, and monocular (absolute) depth estimation. The improvement on the semantic segmentation task is greater than that produced by any other automatically supervised method. Moreover, for monocular depth estimation, our unsupervised pre-training method even outperforms supervised pre-training with ImageNet. In addition, we demonstrate benefits from learning to predict (unsupervised) relative depth in the specific videos associated with various downstream tasks: we adapt to the specific scenes in those tasks in an unsupervised manner to improve performance. In summary, for semantic segmentation we present state-of-the-art results among methods that do not use supervised pre-training, and for monocular depth estimation we even exceed the performance of supervised ImageNet-pretrained models, achieving results comparable with state-of-the-art methods.
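To make the "apparent motion is inversely proportional to depth" signal concrete: a minimal sketch of one way to derive a relative-depth proxy from two consecutive frames is shown below, using dense optical flow magnitude. This is only an illustration of the underlying cue, not the paper's pipeline (which also applies motion segmentation to discard independently moving objects), and only the ordering of values matters, matching the relative nature of the supervision.

```python
import cv2
import numpy as np

def relative_depth_proxy(frame_prev, frame_next):
    """Derive a coarse relative-depth label from two consecutive frames:
    under roughly translational camera motion, small flow ~ far away,
    large flow ~ nearby. Illustrative proxy only."""
    g0 = cv2.cvtColor(frame_prev, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame_next, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(
        g0, g1, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    magnitude = np.linalg.norm(flow, axis=2)
    # Rank-transform the magnitudes: the training signal is *relative*
    # depth ordering, so only the ordering matters, not the scale.
    order = magnitude.ravel().argsort().argsort().reshape(magnitude.shape)
    return 1.0 - order / order.max()  # larger value ~ farther away
```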
Spatial calibration of an optical see-through head-mounted display
We present a method for calibrating an optical see-through head-mounted display (HMD) using techniques usually applied to camera calibration (photogrammetry). Using a camera placed inside the HMD to take pictures simultaneously of a tracked object and of features in the HMD display, we can exploit established camera calibration techniques to recover both the intrinsic and extrinsic properties of the HMD (width, height, focal length, optic centre and principal ray of the display). Our method gives low re-projection errors and, unlike existing methods, involves neither time-consuming, error-prone human measurements nor any prior estimates of the HMD geometry.
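Since the display is treated as a virtual pinhole camera, the core recovery step can reuse standard photogrammetric calibration. A minimal sketch, assuming the 3-D/2-D correspondences between the tracked object's features and their positions relative to the display features have already been extracted (the helper name and inputs are hypothetical), might look like this with OpenCV:

```python
import cv2
import numpy as np

def calibrate_display(object_points, image_points, display_size):
    """Recover intrinsic (focal lengths, optic centre) and extrinsic
    (per-view pose) parameters of an HMD display treated as a virtual
    pinhole camera, via standard camera calibration.

    object_points: list of (N, 3) float32 arrays, feature positions in
                   tracker/world coordinates, one array per captured photo.
    image_points:  list of (N, 2) float32 arrays, matching positions in
                   display-pixel coordinates.
    display_size:  (width, height) of the display in pixels."""
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        object_points, image_points, display_size, None, None)
    fx, fy = K[0, 0], K[1, 1]   # focal lengths in display pixels
    cx, cy = K[0, 2], K[1, 2]   # optic centre on the display
    return rms, (fx, fy, cx, cy), rvecs, tvecs
```

The returned RMS value corresponds to the re-projection error the abstract reports as low.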
ROAD: Reality Oriented Adaptation for Semantic Segmentation of Urban Scenes
Exploiting synthetic data to learn deep models has attracted increasing attention in recent years. However, the intrinsic domain difference between synthetic and real images usually causes a significant performance drop when applying the learned model to real-world scenarios. This is mainly due to two reasons: 1) the model overfits to synthetic images, making the convolutional filters incompetent at extracting informative representations from real images; 2) there is a distribution difference between synthetic and real data, also known as the domain adaptation problem. To this end, we propose a new reality-oriented adaptation approach for urban scene semantic segmentation by learning from synthetic data. First, we propose a target-guided distillation approach to learn the real image style, achieved by training the segmentation model to imitate a pretrained real-style model on real images. Second, we further take advantage of the intrinsic spatial structure present in urban scene images and propose a spatial-aware adaptation scheme to effectively align the distributions of the two domains. These two modules can be readily integrated with existing state-of-the-art semantic segmentation networks to improve their generalizability when adapting from synthetic to real urban scenes. We evaluate the proposed method on the Cityscapes dataset by adapting from the GTAV and SYNTHIA datasets, and the results demonstrate the effectiveness of our method.
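The target-guided distillation idea (imitating a pretrained real-style model on real images) can be sketched as a feature-matching loss added to the usual segmentation loss. The sketch below assumes an L2 feature-matching formulation and a `backbone` attribute on both networks; both are illustrative assumptions, and the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_feats, teacher_feats):
    """Penalise the segmentation network's features on real images for
    drifting away from those of a frozen, real-pretrained teacher, so the
    filters stay informative for the real domain. L2 matching is one
    plausible choice; it is an assumption, not the paper's formula."""
    return F.mse_loss(student_feats, teacher_feats.detach())

def train_step(student, teacher, synth_img, synth_label, real_img,
               optimizer, lam=0.1):
    """One hypothetical training step: supervised segmentation loss on
    synthetic data plus distillation on unlabelled real data. The
    .backbone attribute and the weight lam are illustrative assumptions."""
    seg_loss = F.cross_entropy(student(synth_img), synth_label)
    with torch.no_grad():
        t_feat = teacher.backbone(real_img)   # teacher stays frozen
    s_feat = student.backbone(real_img)
    loss = seg_loss + lam * distillation_loss(s_feat, t_feat)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```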