6,016 research outputs found

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to prominence in numerous areas, including computer vision (CV), speech recognition, and natural language processing. While remote sensing (RS) poses a number of unique challenges, primarily related to sensors and applications, it inevitably draws on many of the same theories as CV, e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent developments in the DL field that can be applied to DL for RS. Namely, we focus on theories, tools, and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing DL models.
    Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing

    Data Reduction and Deep-Learning Based Recovery for Geospatial Visualization and Satellite Imagery

    The storage, retrieval, and distribution of data are critical aspects of big data management. Data scientists and decision-makers often need to share large datasets and make decisions on archiving or deleting historical data to cope with resource constraints. As a consequence, there is an urgent need to reduce storage and transmission requirements. A potential approach to mitigating such problems is to reduce big datasets into smaller ones, which not only lowers storage requirements but also lightens the transfer load over the network. High-dimensional data often exhibit strong repetitiveness and recurring patterns across different dimensions. Data carefully prepared by removing redundancies, together with a machine learning model capable of reconstructing the whole dataset from its reduced version, can improve storage scalability and data transfer, and speed up the overall data management pipeline. In this thesis, we explore data reduction strategies for big datasets while ensuring that the data can be transferred and used ubiquitously by all stakeholders, i.e., the entire dataset can be reconstructed with high quality whenever necessary. One of our data reduction strategies follows a straightforward uniform pattern, which guarantees a minimum of 75% data size reduction. We also propose a novel variance-based reduction technique, which focuses on removing only redundant data and offers an additional 1% to 2% deletion rate. We have adopted various traditional machine learning and deep learning approaches for high-quality reconstruction. We evaluated our pipelines with big geospatial data and satellite imagery. Among them, our deep learning approaches have performed very well, both quantitatively and qualitatively, with the capability of reconstructing high-quality features. We also show how to leverage temporal data for better reconstruction. For uniform deletion, the observed reconstruction accuracy is as high as 98.75% on average for spatial meteorological data (e.g., soil moisture and albedo), and 99.09% for satellite imagery. Pushing the deletion rate further with the variance-based deletion method, the decrease in accuracy remains within 1% for spatial meteorological data and 7% for satellite imagery.
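    A minimal sketch of the two reduction strategies described above, assuming the data are regular 2D grids (e.g., rasterized soil-moisture fields). Interpreting the guaranteed 75% reduction as keeping one sample per 2x2 block is an assumption, as are the block size and the variance threshold; a learned model would then reconstruct the full grid from the reduced version.

```python
import numpy as np

def uniform_reduce(grid):
    """Keep one sample per 2x2 block: a guaranteed 75% size reduction."""
    return grid[::2, ::2]

def variance_reduce(grid, threshold=1e-3):
    """Additionally drop kept samples whose surrounding 2x2 block has
    near-zero variance, i.e., blocks that are nearly constant and thus
    redundant. Returns the kept values plus a mask for reconstruction."""
    h, w = grid.shape
    keep = np.ones((h // 2, w // 2), dtype=bool)
    for i in range(0, h - 1, 2):
        for j in range(0, w - 1, 2):
            if grid[i:i + 2, j:j + 2].var() < threshold:
                keep[i // 2, j // 2] = False
    reduced = grid[::2, ::2]
    return reduced[keep], keep
```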

    Towards Efficient 3D Reconstructions from High-Resolution Satellite Imagery

    Recent years have witnessed the rapid growth of commercial satellite imagery. Compared with other imaging products, such as aerial or street-view imagery, modern satellite images are captured at high resolution and with multiple spectral bands, and thus provide unique viewing angles, global coverage, and frequent updates of the Earth's surface. With automated processing and intelligent analysis algorithms, satellite images can enable global-scale 3D modeling applications. This dissertation explores computer vision algorithms to reconstruct 3D models from satellite images at different levels: geometric, semantic, and parametric reconstructions. However, reconstructing from satellite imagery is particularly challenging for the following reasons: 1) Satellite images typically contain an enormous amount of raw pixels. Efficient algorithms are needed to minimize the substantial computational burden. 2) The ground sampling distances of satellite images are comparatively coarse. Visual entities, such as buildings, appear small and cluttered, posing difficulties for 3D modeling. 3) Satellite images usually have complex camera models and inaccurate vendor-provided camera calibrations. Rational polynomial coefficients (RPC) camera models, although widely used, need to be appropriately handled to ensure high-quality reconstructions. To obtain geometric reconstructions efficiently, we propose an edge-aware interpolation-based algorithm to obtain 3D point clouds from satellite image pairs. Initial 2D pixel matches are first established and triangulated to compensate for the RPC calibration errors. An initial (noisy) dense correspondence map can then be estimated by interpolating the inlier matches in an edge-aware manner. After refining the correspondence map with a fast bilateral solver, we obtain dense 3D point clouds via triangulation. Pixel-wise semantic classification results for satellite images are usually noisy because spatial neighborhood information is neglected. Thus, we propose to aggregate multiple corresponding observations of the same 3D point to obtain high-quality semantic models. Instead of just leveraging geometric reconstructions to provide such correspondences, we formulate geometric modeling and semantic reasoning in a joint Markov Random Field (MRF) model. Our experiments show that both tasks benefit from the joint inference. Finally, we propose a novel deep learning based approach to perform single-view parametric reconstructions from satellite imagery. By parametrizing buildings as 3D cuboids, our method simultaneously localizes building instances visible in the image and estimates their corresponding cuboid models. Aerial LiDAR and vectorized GIS maps are utilized as supervision. Our network upsamples CNN features to detect small but cluttered building instances. In addition, we estimate building contours through a separate fully convolutional network to avoid overlapping building cuboids.
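    A minimal sketch of the triangulation step that lifts one refined pixel correspondence to a 3D point, assuming the RPC cameras have been locally approximated by 3x4 projection matrices (a common simplification; the linear DLT formulation and all names here are illustrative, not the dissertation's exact solver).

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one pixel correspondence.

    P1, P2 : 3x4 projection matrices approximating the two RPC cameras.
    x1, x2 : (col, row) pixel coordinates of the same ground point.
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)  # least-squares null vector of A
    X = vt[-1]
    return X[:3] / X[3]          # homogeneous -> Euclidean 3D point
```

    Running this over every pixel of the refined dense correspondence map yields the dense point cloud described above.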

    A Featureless Approach to 3D Polyhedral Building Modeling from Aerial Images

    This paper presents a model-based approach for reconstructing 3D polyhedral building models from aerial images. The proposed approach exploits geometric and photometric properties resulting from the perspective projection of planar structures. Data are provided by calibrated aerial images. The novelty of the approach lies in its featurelessness and in its use of direct optimization based on raw image brightness. The proposed framework avoids feature extraction and matching. The 3D polyhedral model is directly estimated by optimizing an objective function that combines an image-based dissimilarity measure and a gradient score over several aerial images. The optimization is carried out by the Differential Evolution algorithm. The proposed approach is intended to provide more accurate 3D reconstruction than feature-based approaches. Fast 3D model rectification and updating can take advantage of the proposed method. Several results and performance evaluations on real and synthetic images show the feasibility and robustness of the proposed approach.
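    A toy sketch of the direct, featureless optimization idea, with SciPy's differential_evolution standing in for the paper's DE implementation. The synthetic per-view cost below replaces the actual projection of the polyhedron and the raw-brightness dissimilarity, which would plug in at the commented line; all names and values are illustrative.

```python
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(0)
TRUE_HEIGHT = 12.0  # unknown roof height the optimizer should recover

def view_cost(height, noise):
    # Stand-in for: project the candidate polyhedron into one aerial
    # view, then score raw-brightness dissimilarity plus a gradient term.
    return (height - TRUE_HEIGHT) ** 2 + 0.01 * noise

NOISE = rng.normal(size=5)  # one perturbation per synthetic "view"

def objective(params):
    # Sum the image-based cost over all views, mirroring the paper's
    # multi-view objective; params holds the polyhedron parameters.
    (height,) = params
    return sum(view_cost(height, n) for n in NOISE)

result = differential_evolution(objective, bounds=[(0.0, 50.0)], seed=0)
print(result.x)  # close to 12.0
```

    Because DE is derivative-free, the objective may mix non-smooth image scores freely, which is what makes direct brightness-based optimization practical here.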

    Detection and classification of non-stationary signals using sparse representations in adaptive dictionaries

    Automatic classification of non-stationary radio frequency (RF) signals is of particular interest in persistent surveillance and remote sensing applications. Such signals are often acquired in noisy, cluttered environments and may be characterized by complex or unknown analytical models, making feature extraction and classification difficult. This thesis proposes an adaptive classification approach for poorly characterized targets and backgrounds based on sparse representations in non-analytical dictionaries learned from data. Conventional analytical orthogonal dictionaries, e.g., the Short-Time Fourier and Wavelet transforms, can be suboptimal for classifying non-stationary signals, as they impose a rigid tiling of the time-frequency plane and are not designed for a particular signal class. They generally do not lead to sparse decompositions (i.e., with very few non-zero coefficients), and their use in classification requires separate feature selection algorithms. Pursuit-type decompositions in analytical overcomplete (non-orthogonal) dictionaries yield sparse representations by design and work well for signals that are similar to the dictionary elements. The pursuit search, however, has a high computational cost, and the method can perform poorly in the presence of realistic noise and clutter. One such overcomplete analytical dictionary method is also analyzed in this thesis for comparative purposes. The main thrust of the thesis is learning discriminative RF dictionaries directly from data, without relying on analytical constraints or additional knowledge about the signal characteristics. A pursuit search over the learned dictionaries generates sparse classification features used to identify time windows that contain a target pulse. Two state-of-the-art dictionary learning methods, the K-SVD algorithm and Hebbian learning, are compared in terms of their classification performance as a function of dictionary training parameters. Additionally, a novel hybrid dictionary algorithm is introduced, demonstrating better performance and higher robustness to noise. The issue of dictionary dimensionality is explored, and this thesis demonstrates that undercomplete learned dictionaries are suitable for non-stationary RF classification. Results on simulated data sets with varying background clutter and noise levels are presented. Lastly, unsupervised classification with undercomplete learned dictionaries is demonstrated in satellite imagery analysis.
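    A minimal sketch of the learn-then-pursue pipeline, assuming scikit-learn. Note that scikit-learn ships neither K-SVD nor Hebbian learning, so MiniBatchDictionaryLearning stands in for the thesis's dictionary learners; the chirp signals, atom count, and sparsity level are illustrative.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode

# Toy non-stationary signals: noisy chirp windows (stand-in for RF pulses).
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 128)
X = np.stack([np.sin(2 * np.pi * (5 + 20 * f) * t * t)
              + 0.1 * rng.normal(size=t.size)
              for f in rng.random(200)])

# Learn an undercomplete dictionary (32 atoms < 128 sample dimensions),
# matching the thesis's finding that undercomplete dictionaries suffice.
dico = MiniBatchDictionaryLearning(n_components=32, alpha=1.0,
                                   random_state=0).fit(X)

# Pursuit-type sparse codes (OMP, very few non-zero coefficients) serve
# as classification features for a downstream classifier.
codes = sparse_encode(X, dico.components_, algorithm='omp',
                      n_nonzero_coefs=5)
print(codes.shape)  # (200, 32), at most 5 non-zeros per row
```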

    Learning Aerial Image Segmentation from Online Maps

    This study deals with semantic segmentation of high-resolution (aerial) images, where a semantic class label is assigned to each pixel via supervised classification as a basis for automatic map generation. Recently, deep convolutional neural networks (CNNs) have shown impressive performance and have quickly become the de-facto standard for semantic segmentation, with the added benefit that task-specific feature design is no longer necessary. However, a major downside of deep learning methods is that they are extremely data-hungry, aggravating the perennial bottleneck of supervised classification: obtaining enough annotated training data. On the other hand, it has been observed that they are rather robust against noise in the training labels. This opens up the intriguing possibility of avoiding the annotation of huge amounts of training data and instead training the classifier from existing legacy data or crowd-sourced maps, which can exhibit high levels of noise. The question addressed in this paper is: can training with large-scale, publicly available labels replace a substantial part of the manual labeling effort and still achieve sufficient performance? Such data will inevitably contain a significant portion of errors, but in return virtually unlimited quantities are available in large parts of the world. We adapt a state-of-the-art CNN architecture for semantic segmentation of buildings and roads in aerial images and compare its performance when using different training data sets, ranging from manually labeled, pixel-accurate ground truth of the same city to automatic training data derived from OpenStreetMap data from distant locations. Our results indicate that satisfying performance can be obtained with significantly less manual annotation effort by exploiting noisy large-scale training data.
    Comment: Published in IEEE Transactions on Geoscience and Remote Sensing
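    A minimal sketch of training a per-pixel classifier on noisy, map-derived labels, assuming PyTorch. The tiny network, class count (building / road / background), and the dummy batch are illustrative stand-ins for the paper's adapted state-of-the-art architecture and its OpenStreetMap-rasterized training masks.

```python
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    """Minimal fully convolutional network for 3-class segmentation."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, n_classes, 1),  # per-pixel class scores
        )

    def forward(self, x):
        return self.net(x)

model = TinyFCN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()  # tolerates a fraction of noisy labels

# images: aerial tiles; labels: rasterized OpenStreetMap masks (noisy).
images = torch.randn(4, 3, 128, 128)         # dummy batch
labels = torch.randint(0, 3, (4, 128, 128))  # dummy OSM-derived masks

opt.zero_grad()
loss = loss_fn(model(images), labels)  # averaged over all pixels
loss.backward()
opt.step()
```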