A Multi-Stage Multi-Task Neural Network for Aerial Scene Interpretation and Geolocalization
Semantic segmentation and vision-based geolocalization in aerial images are
challenging tasks in computer vision. Due to the advent of deep convolutional
nets and the availability of relatively low-cost UAVs, they are currently
attracting growing attention in the field. We propose a novel multi-task
multi-stage neural network that is able to handle the two problems at the same
time, in a single forward pass. The first stage of our network predicts
pixelwise class labels, while the second stage provides a precise location
using two branches. One branch uses a regression network, while the other is
used to predict a location map trained as a segmentation task. From a
structural point of view, our architecture uses encoder-decoder modules at each
stage, with the same encoder structure reused. Furthermore, its size is
limited to be tractable on an embedded GPU. We achieve commercial GPS-level
localization accuracy from satellite images with spatial resolution of 1 square
meter per pixel in a city-wide area of interest. On the task of semantic
segmentation, we obtain state-of-the-art results on two challenging datasets,
the Inria Aerial Image Labeling dataset and Massachusetts Buildings.
Comment: 23 pages, 11 figures. Under review at the 15th European Conference on
Computer Vision (ECCV 2018)
Contextual Pyramid Attention Network for Building Segmentation in Aerial Imagery
Building extraction from aerial images has several applications in problems
such as urban planning, change detection, and disaster management. With the
increasing availability of data, Convolutional Neural Networks (CNNs) for
semantic segmentation of remote sensing imagery have improved significantly in
recent years. However, convolutions operate in local neighborhoods and fail to
capture non-local features that are essential in semantic understanding of
aerial images. In this work, we propose to improve the segmentation of buildings
of different sizes by capturing long-range dependencies using contextual pyramid
attention (CPA). The pathways process the input at multiple scales efficiently
and combine them in a weighted manner, similar to an ensemble model. The
proposed method obtains state-of-the-art performance on the Inria Aerial Image
Labeling Dataset with minimal computation costs. Our method improves by 1.8
points over current state-of-the-art methods and by 12.6 points over
existing baselines on the Intersection over Union (IoU) metric without any
post-processing. Code and models will be made publicly available.
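For readers unfamiliar with the metric named above, the following is a minimal sketch of Intersection over Union for a pair of binary building masks. It is not the authors' evaluation code: the function name `iou`, the nested-list mask format, and the convention for two empty masks are all assumptions made for illustration.

```python
def iou(pred, target):
    """Intersection over Union of two binary masks (nested lists of 0/1).

    Hypothetical helper for illustration; not taken from the paper.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # pixel is building in both masks
            if p or t:
                union += 1  # pixel is building in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(iou(pred, target))  # 2 shared pixels / 4 pixels in the union -> 0.5
```

Benchmarks usually report this value as a percentage, so an improvement quoted in "points" is most likely a difference in percentage points of IoU, although the abstract does not say so explicitly.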
Learning Navigation by Visual Localization and Trajectory Prediction
When driving, people make decisions based on current traffic as well as their
desired route. They have a mental map of known routes and are often able to
navigate without needing directions. Current self-driving models perform
better when given additional GPS information. Here we aim to push forward
self-driving research and perform route planning even in the absence of GPS.
Our system learns to predict, in real time, the vehicle's current location and future
trajectory, as a function of time, on a known map, given only the raw video
stream and the intended destination. The GPS signal is available only at
training time, with training data annotation being fully automatic. Unlike
other published models, we predict the vehicle's trajectory for up to
seven seconds ahead, from which complete steering, speed and acceleration
information can be derived for the entire time span. Trajectories capture
navigational information on multiple levels, from instant steering commands
that depend on present traffic and obstacles ahead, to longer-term navigation
decisions, towards a specific destination. We collect our dataset with a
regular car and a smartphone that records video and GPS streams. The GPS data
is used to derive ground-truth supervision labels and create an analytical
representation of the traversed map. In tests, our system outperforms published
methods on visual localization and steering and gives accurate navigation
assistance between any two known locations.
Comment: Submitted to ICRA 202
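The claim that steering, speed and acceleration can be derived from a predicted trajectory can be illustrated with finite differences. The sketch below is not the authors' method: it assumes the trajectory is available as (x, y) waypoints in meters sampled every `dt` seconds, and the function name `kinematics` and its return layout are hypothetical. The yaw rate (change of heading per second) stands in here for steering information.

```python
import math


def kinematics(points, dt):
    """Derive speed, yaw rate and acceleration from evenly timed waypoints.

    points: list of (x, y) positions in meters (assumed representation).
    dt: time between consecutive waypoints, in seconds.
    """
    speeds, headings = [], []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        speeds.append(math.hypot(dx, dy) / dt)  # meters per second
        headings.append(math.atan2(dy, dx))  # direction of travel, radians

    # Longitudinal acceleration: change in speed between segments.
    accelerations = [(v1 - v0) / dt for v0, v1 in zip(speeds, speeds[1:])]

    # Yaw rate: change in heading, wrapped to (-pi, pi] to avoid jumps.
    yaw_rates = []
    for h0, h1 in zip(headings, headings[1:]):
        delta = math.atan2(math.sin(h1 - h0), math.cos(h1 - h0))
        yaw_rates.append(delta / dt)

    return speeds, yaw_rates, accelerations


# A straight-line trajectory that speeds up by 1 m/s every second.
waypoints = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
speeds, yaw_rates, accelerations = kinematics(waypoints, dt=1.0)
print(speeds)         # [1.0, 2.0, 3.0]
print(yaw_rates)      # [0.0, 0.0]
print(accelerations)  # [1.0, 1.0]
```

Converting yaw rate into an actual steering angle would additionally require a vehicle model, which the abstract does not describe.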