
    Dense semantic labeling of sub-decimeter resolution images with convolutional neural networks

    Semantic labeling (or pixel-level land-cover classification) in ultra-high resolution imagery (< 10cm) requires statistical models able to learn high-level concepts from spatial data with large appearance variations. Convolutional Neural Networks (CNNs) achieve this goal by discriminatively learning a hierarchy of representations of increasing abstraction. In this paper we present a CNN-based system relying on a downsample-then-upsample architecture. Specifically, it first learns a rough spatial map of high-level representations by means of convolutions and then learns to upsample them back to the original resolution by deconvolutions. By doing so, the CNN learns to densely label every pixel at the original resolution of the image. This brings several advantages, including i) state-of-the-art numerical accuracy, ii) improved geometric accuracy of predictions and iii) high efficiency at inference time. We test the proposed system on the Vaihingen and Potsdam sub-decimeter resolution datasets, involving semantic labeling of aerial images of 9cm and 5cm resolution, respectively. These datasets are composed of many large and fully annotated tiles, allowing an unbiased evaluation of models that make use of spatial information. We do so by comparing two standard CNN architectures to the proposed one: standard patch classification, prediction of local label patches by employing only convolutions, and full patch labeling by employing deconvolutions. All the systems compare favorably with or outperform a state-of-the-art baseline relying on superpixels and powerful appearance descriptors. The proposed full patch labeling CNN outperforms these models by a large margin, while also showing a very appealing inference time. Comment: Accepted in IEEE Transactions on Geoscience and Remote Sensing, 201
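    The downsample-then-upsample idea in this abstract can be illustrated with simple shape arithmetic. The sketch below is not the authors' network: the layer list, kernel sizes, strides, and paddings are hypothetical values chosen only so that the output returns to the input resolution. It uses the standard output-size formulas for a strided convolution and for a transposed convolution (what the abstract calls a deconvolution).

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a strided convolution (or pooling) layer."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a transposed convolution ("deconvolution")."""
    return (size - 1) * stride - 2 * padding + kernel


def trace_shapes(size, down_layers, up_layers):
    """Return the spatial size after every layer, starting with the input."""
    sizes = [size]
    for kernel, stride, padding in down_layers:
        sizes.append(conv_out(sizes[-1], kernel, stride, padding))
    for kernel, stride, padding in up_layers:
        sizes.append(deconv_out(sizes[-1], kernel, stride, padding))
    return sizes


# Hypothetical configuration (kernel, stride, padding); not taken from the paper.
DOWN = [(3, 2, 1), (3, 2, 1), (3, 2, 1)]  # each layer halves the resolution
UP = [(4, 2, 1), (4, 2, 1), (4, 2, 1)]    # each layer doubles the resolution

if __name__ == "__main__":
    print(trace_shapes(64, DOWN, UP))  # [64, 32, 16, 8, 16, 32, 64]
```

    With these assumed settings a 64-pixel patch is reduced to a coarse 8-pixel map and then upsampled back to 64 pixels, so one label can be predicted per input pixel.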

    The detailed interpretation of pole-like street furniture in mobile laser scanning data

    The interpretation of pole-like road furniture in mobile laser scanning data has received much attention in recent years. Most current studies interpret road furniture as a single object, which is infeasible for road furniture with multiple classes. To tackle this problem, we propose a framework that uses machine learning classifiers to interpret road furniture into detailed classes based on their functionalities, such as street lights and traffic signs connected with poles (Figure 1). The overall accuracy of the interpretation in one test site is higher than 90%. A screenshot of our result is shown in Figure 2. To conclude, our framework interprets road furniture well at a detailed level, which is of great importance for precise 3D mapping.

    Deep learning in remote sensing: a review

    Standing at the paradigm shift towards data-intensive science, machine learning techniques are becoming increasingly important. In particular, as a major breakthrough in the field, deep learning has proven to be an extremely powerful tool in many fields. Shall we embrace deep learning as the key to all? Or should we resist a 'black-box' solution? Opinions in the remote sensing community are divided. In this article, we analyze the challenges of using deep learning for remote sensing data analysis, review the recent advances, and provide resources to make deep learning in remote sensing ridiculously simple to start with. More importantly, we encourage remote sensing scientists to bring their expertise into deep learning, and use it as an implicit general model to tackle unprecedented large-scale influential challenges, such as climate change and urbanization. Comment: Accepted for publication in IEEE Geoscience and Remote Sensing Magazin

    Urban Land Cover Classification with Missing Data Modalities Using Deep Convolutional Neural Networks

    Automatic urban land cover classification is a fundamental problem in remote sensing, e.g. for environmental monitoring. The problem is highly challenging, as classes generally have high inter-class and low intra-class variance. Techniques to improve urban land cover classification performance in remote sensing include fusion of data from different sensors with different data modalities. However, such techniques require all modalities to be available to the classifier in the decision-making process, i.e. at test time, as well as in training. If a data modality is missing at test time, current state-of-the-art approaches generally have no procedure for exploiting information from that modality. This represents a waste of potentially useful information. As a remedy, we propose a convolutional neural network (CNN) architecture for urban land cover classification that is able to embed all available training modalities in a so-called hallucination network. The network in effect replaces missing data modalities in the test phase, enabling fusion even when data modalities are missing in testing. We demonstrate the method using two datasets consisting of optical and digital surface model (DSM) images. We simulate missing modalities by assuming that DSM images are missing during testing. Our method outperforms both standard CNNs trained only on optical images and an ensemble of two standard CNNs. We further evaluate the potential of our method to handle situations where only some DSM images are missing during testing. Overall, we show that we can clearly exploit training-time information from the missing modality during testing.
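    The test-time behaviour described in this abstract can be sketched as a small control-flow example. Everything here is a toy assumption rather than the authors' design: the "networks" are hypothetical linear scorers with made-up weights, fusion is done by averaging class probabilities (the abstract does not say how fusion is performed), and the training step that would teach the hallucination stream to mimic the DSM stream is not shown. The only point illustrated is that the hallucination stream takes the optical input and stands in for the DSM stream when DSM data is missing.

```python
import math


def softmax(scores):
    """Convert raw class scores to probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear_stream(weights):
    """Build a toy 'network': one weight row per class, applied to a feature vector."""
    def stream(features):
        return softmax([sum(w * x for w, x in zip(row, features)) for row in weights])
    return stream


def fuse(probs_a, probs_b):
    """Assumed fusion rule: average the two streams' class probabilities."""
    return [(a + b) / 2 for a, b in zip(probs_a, probs_b)]


def predict(optical, dsm, optical_net, dsm_net, hallucination_net):
    """Return the fused class index; fall back to hallucination if dsm is None."""
    p_optical = optical_net(optical)
    if dsm is not None:
        p_second = dsm_net(dsm)                # real DSM stream
    else:
        p_second = hallucination_net(optical)  # stand-in fed with optical data
    fused = fuse(p_optical, p_second)
    return fused.index(max(fused))


# Hypothetical two-class toy weights, for illustration only.
optical_net = linear_stream([[1.0, 0.0], [0.0, 1.0]])
dsm_net = linear_stream([[2.0], [0.0]])
hallucination_net = linear_stream([[0.5, 0.0], [0.0, 0.5]])

if __name__ == "__main__":
    print(predict([1.0, 3.0], [5.0], optical_net, dsm_net, hallucination_net))
    print(predict([1.0, 3.0], None, optical_net, dsm_net, hallucination_net))
```

    The same `predict` call works whether or not a DSM input is supplied, which is the property the abstract emphasises.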

    Classification of very high resolution aerial photos using spectral-spatial convolutional neural networks

    © 2018 Maher Ibrahim Sameen et al. Classification of aerial photographs relying purely on spectral content is a challenging topic in remote sensing. A convolutional neural network (CNN) was developed to classify aerial photographs into seven land cover classes: building, grassland, dense vegetation, waterbody, barren land, road, and shadow. The classifier utilized the spectral and spatial contents of the data to maximize the accuracy of the classification process. The CNN was trained from scratch with manually created ground truth samples. The architecture of the network consisted of a single convolution layer of 32 filters with a kernel size of 3 × 3, a pooling size of 2 × 2, batch normalization, dropout, and a dense layer with Softmax activation. The design of the architecture and its hyperparameters were selected via sensitivity analysis and validation accuracy. The results showed that the proposed model could be effective for classifying the aerial photographs. The overall accuracy and Kappa coefficient of the best model were 0.973 and 0.967, respectively. In addition, the sensitivity analysis suggested that the use of dropout and batch normalization techniques in the CNN is essential to improve the generalization performance of the model. The CNN model without these techniques achieved worse performance, with an overall accuracy and Kappa of 0.932 and 0.922, respectively. This research shows that CNN-based models are robust for land cover classification using aerial photographs. However, the architecture and hyperparameters of these models should be carefully selected and optimized.
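    The two metrics reported in this abstract, overall accuracy and the Kappa coefficient, are both computed from a confusion matrix. The sketch below uses the standard definition of Cohen's kappa; the 2 × 2 matrix is a made-up example, not data from the paper, and the function names are illustrative.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = overall_accuracy(cm)
    row_totals = [sum(cm[i]) for i in range(n)]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement from the row and column marginals.
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up confusion matrix (rows: reference class, columns: predicted class).
EXAMPLE_CM = [
    [45, 5],
    [10, 40],
]

if __name__ == "__main__":
    print(overall_accuracy(EXAMPLE_CM))
    print(cohen_kappa(EXAMPLE_CM))
```

    Because kappa subtracts the agreement expected by chance, it is never higher than the overall accuracy, which is consistent with the pairs of values quoted in the abstract.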