
    Integrating aerial and street view images for urban land use classification

    Urban land use is key to rational urban planning and management. Traditional land use classification methods rely heavily on domain experts, which is both expensive and inefficient. In this paper, deep neural network-based approaches are presented to label urban land use at the pixel level using high-resolution aerial images and ground-level street view images. We use a deep neural network to extract semantic features from sparsely distributed street view images and interpolate them in the spatial domain to match the spatial resolution of the aerial images; the two are then fused through a deep neural network that classifies land use categories. Our methods are tested on a large, publicly available dataset of aerial and street view images of New York City. The results show that aerial images alone achieve relatively high classification accuracy, that ground-level street view images contain useful information for urban land use classification, and that fusing street view image features with aerial images improves classification accuracy. Moreover, we present experimental studies showing that street view images add more value when the resolution of the aerial images is lower, and case studies illustrating how street view images provide useful auxiliary information that boosts performance.
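
    The interpolate-and-fuse step lends itself to a short illustration. The sketch below is not the authors' code: all names are hypothetical, and nearest-neighbour lookup stands in for whatever interpolation scheme the paper actually uses. It spreads sparse street-view feature vectors over the aerial pixel grid and concatenates them channel-wise with the aerial feature map.

```python
# Minimal sketch (hypothetical names): densify sparse street-view features
# to the aerial grid, then fuse by channel-wise concatenation.
import numpy as np
from scipy.spatial import cKDTree

def fuse_features(aerial_feat, sv_coords, sv_feats):
    """aerial_feat: (H, W, Ca) dense feature map from the aerial branch.
    sv_coords: (N, 2) pixel (row, col) positions of the street view images.
    sv_feats: (N, Cs) semantic features from the street view branch.
    Returns an (H, W, Ca + Cs) fused map for pixel-level classification."""
    H, W, _ = aerial_feat.shape
    rows, cols = np.mgrid[0:H, 0:W]
    grid = np.stack([rows.ravel(), cols.ravel()], axis=1)  # every pixel
    _, nearest = cKDTree(sv_coords).query(grid)  # nearest street view image
    dense_sv = sv_feats[nearest].reshape(H, W, -1)
    return np.concatenate([aerial_feat, dense_sv], axis=-1)
```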

    Multi-source data fusion for land use classification using deep learning

    Land use classification is the process of characterising land by its purpose of use. It is important because timely and accurate land use information is essential for land monitoring and management. Traditional classification relies heavily on labour-intensive land surveys, which are time-consuming and expensive. The development of remote sensing (RS) technologies has greatly advanced the automation of land monitoring, since RS images can efficiently capture the physical attributes of the Earth's surface. However, it is challenging to recognise land use from single RS images, especially in high-density cities with complex and mixed land-use patterns, for several reasons. Firstly, it is difficult to extract features from RS images that are closely related to land use semantics yet invariant to complex landscapes. Secondly, RS images capture physical information from a nadir view and thus miss many useful ground-level details. Thirdly, RS images cannot directly capture information about human activities on the land, which is critical for determining land use. To address these issues, this thesis leverages deep learning techniques and exploits recently emerged geospatial big data to complement RS images for land use classification. In summary, the thesis makes the following contributions.

    Firstly, a triplet deep metric learning network has been developed to enhance remotely sensed land-use scene retrieval and classification. Trained with a triplet loss (a short illustrative sketch of this loss follows the abstract), the network embeds RS images into a semantic space where images from the same class lie near each other while those from different classes lie far apart. The resulting features are discriminative and can be exploited for content-based RS image retrieval. Feature reduction methods have been investigated to reduce the redundancy of the learned semantic features, and triplet selection strategies have been examined to ensure effective and efficient training of the network. The extracted features have also been leveraged for image scene classification. Comprehensive experiments on popular land-use scene benchmarks demonstrate the effectiveness of the proposed methods.

    Secondly, a spatial-aware Siamese-like network has been proposed to learn distinguishable embeddings for ground-to-aerial geolocalisation, offering a potential solution for geotagging ground-taken images that lack geotags. A spatial transformer layer is exploited to address the large view variation between ground and aerial images, and a loss combining the triplet loss with a simple and effective location identity loss is designed to further enhance geolocalisation performance. Extensive experiments on a publicly available dataset of cross-view image pairs demonstrate the effectiveness of the method. The proposed method can infer the geolocation of ground-taken images, making it possible to fuse them with RS images for land use classification.

    Thirdly, a deep learning-based approach has been developed to integrate RS images with geotagged ground-taken images for urban land use classification. Specifically, a deep neural network extracts semantic features from sparsely distributed street view images, the features are interpolated in the spatial domain to match the aerial images, and the two are fused through a deep fusion neural network for pixel-level land use classification. The proposed methods were tested on a large public dataset; the results show that street view images contain useful information about land use and that fusing them with aerial images improves classification accuracy. Experimental studies further show that street view images add more value when the resolution of the aerial images is lower, and case studies illustrate how street view images provide useful auxiliary information that boosts performance.

    Finally, a deep learning-based method has been proposed to fuse remote and social sensing data for urban land use classification. Two neural network-based methods automatically extract discriminative, time-dependent social sensing signature features, which are fused with remote sensing image features extracted via a residual neural network. In addition, a deep learning-based strategy addresses the data asynchrony problem by enforcing cross-modal feature consistency and cross-modal triplet constraints during end-to-end training of the model. Extensive experiments on publicly available datasets demonstrate the effectiveness of the proposed methods: the physically sensed RS images and the socially sensed activity signatures complement each other to enhance the accuracy of urban land use classification.
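
    Since the first two contributions both rest on triplet training, a minimal sketch of the standard triplet loss may help. This is an illustration, not the thesis implementation, and the margin value is an assumption; PyTorch's built-in torch.nn.TripletMarginLoss provides equivalent behaviour.

```python
# Illustrative triplet loss (not the thesis code); margin is an assumption.
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.5):
    """anchor and positive are (B, D) embeddings of same-class scenes;
    negative holds (B, D) embeddings of scenes from other classes."""
    d_pos = F.pairwise_distance(anchor, positive)  # anchor <-> same class
    d_neg = F.pairwise_distance(anchor, negative)  # anchor <-> other class
    # A triplet contributes loss until the negative is at least
    # `margin` farther from the anchor than the positive is.
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()
```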

    Deep learning in remote sensing: a review

    Standing at the paradigm shift towards data-intensive science, machine learning techniques are becoming increasingly important. In particular, as a major breakthrough in the field, deep learning has proven to be an extremely powerful tool in many fields. Shall we embrace deep learning as the key to everything? Or should we resist a 'black-box' solution? There are controversial opinions in the remote sensing community. In this article, we analyze the challenges of using deep learning for remote sensing data analysis, review the recent advances, and provide resources that make deep learning in remote sensing ridiculously simple to start with. More importantly, we advocate that remote sensing scientists bring their expertise into deep learning and use it as an implicit general model to tackle unprecedented large-scale, influential challenges, such as climate change and urbanization. (Comment: Accepted for publication in IEEE Geoscience and Remote Sensing Magazine.)

    Learning Aerial Image Segmentation from Online Maps

    This study deals with semantic segmentation of high-resolution (aerial) images, in which a semantic class label is assigned to each pixel via supervised classification as a basis for automatic map generation. Recently, deep convolutional neural networks (CNNs) have shown impressive performance and have quickly become the de facto standard for semantic segmentation, with the added benefit that task-specific feature design is no longer necessary. However, a major downside of deep learning methods is that they are extremely data-hungry, aggravating the perennial bottleneck of supervised classification: obtaining enough annotated training data. On the other hand, they have been observed to be rather robust against noise in the training labels. This opens up the intriguing possibility of avoiding the annotation of huge amounts of training data and instead training the classifier on existing legacy data or crowd-sourced maps, which can exhibit high levels of noise. The question addressed in this paper is: can training with large-scale, publicly available labels replace a substantial part of the manual labeling effort and still achieve sufficient performance? Such data will inevitably contain a significant portion of errors, but in return virtually unlimited quantities are available in large parts of the world. We adapt a state-of-the-art CNN architecture for semantic segmentation of buildings and roads in aerial images and compare its performance when trained on different data sets, ranging from manually labeled, pixel-accurate ground truth of the same city to automatic training data derived from OpenStreetMap data for distant locations. Our results indicate that satisfactory performance can be obtained with significantly less manual annotation effort by exploiting noisy, large-scale training data. (Comment: Published in IEEE Transactions on Geoscience and Remote Sensing.)
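
    The label-generation step can be made concrete with a hedged sketch. This is not the paper's pipeline: it assumes building and road footprints have already been fetched from OpenStreetMap as geometries georeferenced to the aerial tile, and it encodes classes as 0 = background, 1 = building, 2 = road.

```python
# Hedged sketch (hypothetical names): rasterise OSM geometries into a
# per-pixel label mask that can stand in for manual annotation.
from rasterio import features

def osm_label_mask(buildings, roads, transform, height, width):
    """buildings, roads: iterables of geometries in the tile's map CRS.
    transform: affine pixel-to-map transform of the aerial tile."""
    shapes = [(geom, 1) for geom in buildings] + [(geom, 2) for geom in roads]
    return features.rasterize(
        shapes,
        out_shape=(height, width),
        transform=transform,
        fill=0,            # pixels covered by no geometry -> background
        all_touched=True,  # label every pixel a geometry touches
    )
```

    Since OSM stores most roads as centerlines rather than polygons, the road geometries would typically be buffered to a nominal width before rasterising; the resulting masks inherit OSM's omissions and misalignments, which is exactly the label noise the paper studies.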