6,523 research outputs found

    Dense prediction of label noise for learning building extraction from aerial drone imagery

    Label noise is a commonly encountered problem in learning building extraction tasks; its presence can reduce performance and increase learning complexity. This is especially true when high-resolution aerial drone imagery is used, as the labels may not perfectly correspond or align with the actual objects in the imagery. In the general machine learning and computer vision context, labels refer to the class associated with the data; in remote sensing-based building extraction, they refer to pixel-level classes. Dense label noise in building extraction tasks has rarely been formalized and assessed. We formulate a taxonomy of label noise models for building extraction tasks, which incorporates both pixel-wise and dense models. When learning dense prediction under label noise, the differences between the clean ground-truth label and the observed noisy label can be encoded by error matrices indicating the locations and types of noisy pixel-level labels. In this work, we explicitly learn to approximate error matrices to improve building extraction performance; essentially, learning dense prediction of label noise as a subtask of the larger building extraction task. We propose two new model frameworks for learning building extraction under dense real-world label noise, and consequently two new network architectures, which approximate the error matrices as intermediate predictions. The first model learns the general error matrix as an intermediate step, and the second model learns the false-positive and false-negative error matrices independently, as intermediate steps. Approximating intermediate error matrices can generate label noise saliency maps for identifying labels with a higher chance of being mislabelled. We used ultra-high-resolution aerial images, noisy observed labels from OpenStreetMap, and clean labels obtained after careful annotation by the authors.
When compared to the baseline model trained and tested using clean labels, our intermediate false-positive/false-negative error matrix model provides an Intersection-over-Union gain of 2.74% and an F1-score gain of 1.75% on the independent test set. Furthermore, our proposed models provide much higher recall than currently used deep learning models for building extraction, while providing comparable precision. We show that intermediate false-positive/false-negative error matrix approximation can improve performance under label noise.
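The error matrices described in the abstract can be illustrated with a small sketch. This is not the authors' implementation; the function name and the binary-mask representation are assumptions for illustration. Given a clean ground-truth building mask and a noisy observed mask, the false-positive matrix marks pixels labelled as building that are actually background, the false-negative matrix marks the reverse, and the general error matrix marks any disagreement:

```python
import numpy as np

def error_matrices(clean, noisy):
    """Hypothetical helper: return (general, false_positive, false_negative)
    error masks for binary building-extraction labels."""
    clean = np.asarray(clean, dtype=bool)
    noisy = np.asarray(noisy, dtype=bool)
    fp = noisy & ~clean   # labelled building, actually background
    fn = ~noisy & clean   # labelled background, actually building
    general = fp | fn     # any pixel-level label disagreement
    return general, fp, fn

# Toy 2x2 example: one spurious building pixel, one missed building pixel.
clean = np.array([[1, 0], [1, 1]])
noisy = np.array([[1, 1], [0, 1]])
g, fp, fn = error_matrices(clean, noisy)
```

A model that predicts such matrices as intermediate outputs can then use them as saliency maps over likely mislabelled pixels, as the abstract describes.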

    Learning Aerial Image Segmentation from Online Maps

    This study deals with semantic segmentation of high-resolution (aerial) images, where a semantic class label is assigned to each pixel via supervised classification as a basis for automatic map generation. Recently, deep convolutional neural networks (CNNs) have shown impressive performance and have quickly become the de facto standard for semantic segmentation, with the added benefit that task-specific feature design is no longer necessary. However, a major downside of deep learning methods is that they are extremely data-hungry, aggravating the perennial bottleneck of supervised classification: obtaining enough annotated training data. On the other hand, it has been observed that they are rather robust against noise in the training labels. This opens up the intriguing possibility of avoiding the annotation of huge amounts of training data, and instead training the classifier from existing legacy data or crowd-sourced maps, which can exhibit high levels of noise. The question addressed in this paper is: can training with large-scale, publicly available labels replace a substantial part of the manual labeling effort and still achieve sufficient performance? Such data will inevitably contain a significant portion of errors, but in return virtually unlimited quantities of it are available in large parts of the world. We adapt a state-of-the-art CNN architecture for semantic segmentation of buildings and roads in aerial images, and compare its performance when using different training data sets, ranging from manually labeled, pixel-accurate ground truth of the same city to automatic training data derived from OpenStreetMap data from distant locations. Our results indicate that satisfying performance can be obtained with significantly less manual annotation effort, by exploiting noisy large-scale training data. Comment: Published in IEEE Transactions on Geoscience and Remote Sensing.