3 research outputs found

    A Multi-Stage Multi-Task Neural Network for Aerial Scene Interpretation and Geolocalization

    Semantic segmentation and vision-based geolocalization in aerial images are challenging tasks in computer vision. With the advent of deep convolutional networks and the availability of relatively low-cost UAVs, they are attracting growing attention in the field. We propose a novel multi-task, multi-stage neural network that handles both problems at the same time, in a single forward pass. The first stage of our network predicts pixelwise class labels, while the second stage provides a precise location using two branches. One branch uses a regression network, while the other predicts a location map and is trained as a segmentation task. Structurally, our architecture uses encoder-decoder modules at each stage, with the same encoder structure reused. Furthermore, its size is limited so that it remains tractable on an embedded GPU. We achieve commercial GPS-level localization accuracy from satellite images with a spatial resolution of 1 square meter per pixel in a city-wide area of interest. On the task of semantic segmentation, we obtain state-of-the-art results on two challenging datasets, the Inria Aerial Image Labeling dataset and Massachusetts Buildings. Comment: 23 pages, 11 figures. Under review at the 15th European Conference on Computer Vision (ECCV 2018)

    Contextual Pyramid Attention Network for Building Segmentation in Aerial Imagery

    Building extraction from aerial images has several applications in problems such as urban planning, change detection, and disaster management. With the increasing availability of data, Convolutional Neural Networks (CNNs) for semantic segmentation of remote sensing imagery have improved significantly in recent years. However, convolutions operate in local neighborhoods and fail to capture the non-local features that are essential to semantic understanding of aerial images. In this work, we propose to improve the segmentation of buildings of different sizes by capturing long-range dependencies using contextual pyramid attention (CPA). The pathways process the input at multiple scales efficiently and combine them in a weighted manner, similar to an ensemble model. The proposed method obtains state-of-the-art performance on the Inria Aerial Image Labelling Dataset with minimal computation costs. On the Intersection over Union (IoU) metric, our method improves by 1.8 points over current state-of-the-art methods and by 12.6 points over existing baselines, without any post-processing. Code and models will be made publicly available
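
    As an illustration of the Intersection over Union (IoU) metric mentioned in this abstract, here is a minimal sketch for binary building masks. The function name `binary_iou` and the toy masks are assumptions made for illustration; they are not taken from the paper or its code, and benchmark evaluations may aggregate IoU differently (for example, over a whole dataset rather than per image).

```python
import numpy as np


def binary_iou(pred, target):
    """Intersection over Union for two binary masks of the same shape.

    Hypothetical helper for illustration only. Returns 1.0 when both
    masks are empty, which is a convention chosen here, not a standard.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0
    return float(intersection) / float(union)


# Toy masks: 1 marks a building pixel, 0 marks background.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])

# Two pixels overlap, four pixels are covered by either mask.
print(binary_iou(pred, target))
```

    A gain of "1.8 points" in this setting is usually read as a difference of 0.018 in this ratio when IoU is reported as a percentage.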

    Learning Navigation by Visual Localization and Trajectory Prediction

    When driving, people make decisions based on current traffic as well as their desired route. They have a mental map of known routes and are often able to navigate without needing directions. Current self-driving models improve their performance when given additional GPS information. Here we aim to push self-driving research forward and perform route planning even in the absence of GPS. Our system learns to predict, in real time, the vehicle's current location and future trajectory as a function of time on a known map, given only the raw video stream and the intended destination. The GPS signal is available only at training time, and training data annotation is fully automatic. Unlike other published models, we predict the vehicle's trajectory for up to seven seconds ahead, from which complete steering, speed and acceleration information can be derived for the entire time span. Trajectories capture navigational information on multiple levels, from instant steering commands that depend on present traffic and obstacles ahead to longer-term navigation decisions towards a specific destination. We collect our dataset with a regular car and a smartphone that records video and GPS streams. The GPS data is used to derive ground-truth supervision labels and to create an analytical representation of the traversed map. In tests, our system outperforms published methods on visual localization and steering and gives accurate navigation assistance between any two known locations. Comment: Submitted to ICRA 202
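
    The abstract states that steering, speed and acceleration can be derived from a predicted trajectory. One way this could be done is with finite differences over sampled positions, sketched below. The function name `kinematics_from_trajectory`, the 10 Hz sampling rate, and the use of yaw rate as a stand-in for steering are all assumptions made for illustration; the paper's actual trajectory representation and derivation are not described in this listing.

```python
import numpy as np


def kinematics_from_trajectory(xy, dt):
    """Estimate speed, acceleration and yaw rate from sampled positions.

    Hypothetical helper for illustration only.
    xy: array of shape (T, 2) with map positions in meters.
    dt: time between consecutive samples in seconds.
    """
    xy = np.asarray(xy, dtype=float)
    # Velocity vector at each sample (central differences inside the span).
    velocity = np.gradient(xy, dt, axis=0)
    speed = np.linalg.norm(velocity, axis=1)
    # Longitudinal acceleration as the rate of change of speed.
    acceleration = np.gradient(speed, dt)
    # Heading angle, unwrapped to avoid jumps at +/- pi.
    heading = np.unwrap(np.arctan2(velocity[:, 1], velocity[:, 0]))
    # Yaw rate, used here as a simple proxy for steering.
    yaw_rate = np.gradient(heading, dt)
    return speed, acceleration, yaw_rate


# Assumed example: a seven-second straight trajectory sampled at 10 Hz,
# driven at a constant 5 m/s along the x axis.
t = np.linspace(0.0, 7.0, 71)
xy = np.column_stack([5.0 * t, np.zeros_like(t)])
speed, acceleration, yaw_rate = kinematics_from_trajectory(xy, dt=0.1)
print(speed[:3], acceleration[:3], yaw_rate[:3])
```

    For this straight, constant-speed path the estimated speed stays near 5 m/s while acceleration and yaw rate stay near zero; a curved or accelerating trajectory would produce non-zero values.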