263 research outputs found
Dense semantic labeling of sub-decimeter resolution images with convolutional neural networks
Semantic labeling (or pixel-level land-cover classification) in ultra-high
resolution imagery (< 10cm) requires statistical models able to learn high
level concepts from spatial data, with large appearance variations.
Convolutional Neural Networks (CNNs) achieve this goal by learning
discriminatively a hierarchy of representations of increasing abstraction.
In this paper we present a CNN-based system relying on a
downsample-then-upsample architecture. Specifically, it first learns a rough
spatial map of high-level representations by means of convolutions and then
learns to upsample them back to the original resolution by deconvolutions. By
doing so, the CNN learns to densely label every pixel at the original
resolution of the image. This results in many advantages, including i)
state-of-the-art numerical accuracy, ii) improved geometric accuracy of
predictions and iii) high efficiency at inference time.
We test the proposed system on the Vaihingen and Potsdam sub-decimeter
resolution datasets, involving semantic labeling of aerial images of 9cm and
5cm resolution, respectively. These datasets are composed of many large and
fully annotated tiles allowing an unbiased evaluation of models making use of
spatial information. We do so by comparing two standard CNN architectures to
the proposed one: standard patch classification, prediction of local label
patches by employing only convolutions and full patch labeling by employing
deconvolutions. All the systems compare favorably to or outperform a
state-of-the-art baseline relying on superpixels and powerful appearance
descriptors. The proposed full patch labeling CNN outperforms these models by a
large margin, also showing a very appealing inference time.
Comment: Accepted in IEEE Transactions on Geoscience and Remote Sensing, 201
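The downsample-then-upsample idea in the abstract above can be illustrated with the standard output-size formulas for convolutions and transposed convolutions ("deconvolutions"). This is only a minimal sketch: the patch size, kernel sizes, strides and number of stages are hypothetical values chosen for illustration, not the settings used in the paper.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


# Hypothetical settings, for illustration only.
patch = 64

down = patch
for _ in range(3):  # three stride-2 downsampling stages: 64 -> 32 -> 16 -> 8
    down = conv_out(down, kernel=2, stride=2)

up = down
for _ in range(3):  # three stride-2 upsampling stages: 8 -> 16 -> 32 -> 64
    up = deconv_out(up, kernel=2, stride=2)

print(patch, down, up)  # prints: 64 8 64
```

With matching strides, the upsampling path returns a map of the original patch size, which is what allows every pixel to receive a label.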
Segmentation and Classification of Multimodal Imagery
Segmentation and classification are two important computer vision tasks that transform input data into a compact representation that allows fast and efficient analysis. Several challenges exist in generating accurate segmentation or classification results. In a video, for example, objects often change appearance and are partially occluded, making it difficult to delineate the object from its surroundings. This thesis proposes video segmentation and aerial image classification algorithms to address some of these problems and provide accurate results.
We developed a gradient-driven three-dimensional segmentation technique that partitions a video into spatiotemporal objects. The algorithm uses the local gradient computed at each pixel location, together with the global boundary map acquired through deep learning methods, to generate initial pixel groups by traversing from low to high gradient regions. A local clustering method is then employed to refine these initial pixel groups. The refined sub-volumes in the homogeneous regions of the video are selected as initial seeds and iteratively combined with adjacent groups based on intensity similarities. The volume growth is terminated at the color boundaries of the video. The over-segments obtained from the above steps are then merged hierarchically by a multivariate approach, yielding a final segmentation map for each frame. In addition, we implemented a streaming version of the algorithm that requires less memory. The results illustrate that the proposed methodology compares favorably, both qualitatively and quantitatively, in segmentation quality and computational efficiency with the latest state-of-the-art techniques.
We also developed a convolutional neural network (CNN)-based method to efficiently combine information from multisensor remotely sensed images for pixel-wise semantic classification. The CNN features obtained from multiple spectral bands are fused at the initial layers of deep neural networks as opposed to the final layers. The early fusion architecture has fewer parameters and thereby reduces the computational time and GPU memory during training and inference. We also introduce a composite architecture that fuses features throughout the network. The methods were validated on four different datasets: ISPRS Potsdam, Vaihingen, IEEE Zeebruges, and a Sentinel-1/Sentinel-2 dataset. For the Sentinel-1/Sentinel-2 dataset, we obtain the ground truth labels for three classes from OpenStreetMap. Results on all the images show that early fusion, specifically after layer three of the network, achieves results similar to or better than a decision-level fusion mechanism. The performance of the proposed architecture is also on par with the state-of-the-art results.
Semi-supervised learning framework based on cycle generative adversarial networks for very high resolution image classification
Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Civil and Environmental Engineering, 2021.8. Yongil Kim.
Image classification of Very High Resolution (VHR) images is a fundamental task in the remote sensing domain for various applications such as land cover mapping, vegetation mapping, and urban planning. In recent years, deep convolutional neural networks have shown promising performance in image classification studies. In particular, semantic segmentation models based on fully convolutional architectures have greatly reduced computational cost, which has become especially important with the large accumulation of VHR images in recent years.
However, deep learning-based approaches are generally limited by the need for a sufficient amount of labeled data to obtain stable accuracy, and acquiring reference labels for remotely sensed VHR images is labor-intensive and expensive. To overcome this problem, this thesis proposed a semi-supervised learning framework for VHR image classification. Semi-supervised learning uses labeled and unlabeled data together, thus reducing the model's dependency on data labels. To make use of large amounts of unlabeled images, this thesis employed a modified CycleGAN model.
CycleGAN is an image translation model developed from Generative Adversarial Networks (GANs) for image generation. CycleGAN trains on unpaired datasets by using a cycle consistency loss with two generators and two discriminators. Inspired by the concept of cycle consistency, this thesis modified CycleGAN to enable the use of unlabeled VHR data in model training by treating the unlabeled images as images unpaired with their corresponding ground truth maps.
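For orientation, the cycle consistency loss of the original CycleGAN formulation can be written as below. The notation is an assumption made here for illustration: $G$ maps images $x$ to the other domain (in this setting, label maps $y$) and $F$ maps back. The abstract does not state how the thesis modifies this term, so this is the standard form rather than the thesis's own loss.

```latex
\mathcal{L}_{\mathrm{cyc}}(G, F) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]
```

Neither expectation needs a matched $(x, y)$ pair, which is why unpaired images and label maps can contribute to training.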
To utilize a large amount of unlabeled VHR data and a relatively small amount of labeled VHR data, this thesis combined a supervised learning classification model with the modified CycleGAN architecture. The proposed framework contains three phases: a cyclic phase, an adversarial phase, and a supervised learning phase. Through the three phases, both labeled and unlabeled data can be utilized simultaneously to train the model in an end-to-end manner.
The proposed framework was evaluated on a publicly available VHR image dataset, the International Society for Photogrammetry and Remote Sensing (ISPRS) Vaihingen dataset. To validate the accuracy of the proposed framework, benchmark models including both supervised and semi-supervised learning methods were compared on the same dataset. Furthermore, two additional experiments were conducted to confirm the impact of labeled and unlabeled data on classification accuracy and the adaptation of the CycleGAN model to other classification models. These results were evaluated with three popular metrics for image classification: Overall Accuracy (OA), F1-score, and mean Intersection over Union (mIoU).
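The three metrics named above can all be derived from a confusion matrix. The following is a minimal sketch, not the thesis's evaluation code: the function name, the convention that rows are true classes and columns are predictions, and the choice to average F1 over classes are assumptions made for illustration.

```python
def segmentation_metrics(conf):
    """Compute OA, class-averaged F1 and mIoU.

    conf[i][j] is the number of pixels of true class i predicted as class j
    (an assumed convention for this sketch).
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(n))
    f1_scores, ious = [], []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                       # true class c, predicted otherwise
        fp = sum(conf[r][c] for r in range(n)) - tp  # predicted c, true class differs
        f1_den = 2 * tp + fp + fn
        iou_den = tp + fp + fn
        f1_scores.append(2 * tp / f1_den if f1_den else 0.0)
        ious.append(tp / iou_den if iou_den else 0.0)
    return {
        "OA": correct / total,
        "mean_F1": sum(f1_scores) / n,
        "mIoU": sum(ious) / n,
    }


# Toy two-class confusion matrix with made-up counts.
toy = [[50, 10],
       [5, 35]]
scores = segmentation_metrics(toy)
print(scores["OA"])  # prints: 0.85
```

Per class, IoU is never larger than F1, so mIoU is the stricter of the two class-averaged scores.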
The proposed framework achieved the highest accuracy (OA: 0.796, 0.786, and 0.784, respectively, in three test sites) in comparison to the other five benchmarks. In particular, in a test site containing numerous objects with various properties, the largest increase in accuracy was observed due to the regularization effect from the semi-supervised method using unlabeled data with the modified CycleGAN. Moreover, by controlling the amount of labeled and unlabeled data, results indicated that a sufficient amount of both unlabeled and labeled data is required to increase the accuracy when using the semi-supervised CycleGAN. Lastly, this thesis applied the proposed CycleGAN method to other classification models such as the feature pyramid network (FPN) and the pyramid scene parsing network (PSPNet), in place of UNet. In all cases, the proposed framework returned significantly improved results, displaying the framework's applicability for semi-supervised image classification on remotely sensed VHR images.
1. Introduction
2. Background and Related Works
2.1. Deep Learning for Image Classification
2.1.1. Image-level Classification
2.1.2. Fully Convolutional Architectures
2.1.3. Semantic Segmentation for Remote Sensing Images
2.2. Generative Adversarial Networks (GAN)
2.2.1. Introduction to GAN
2.2.2. Image Translation
2.2.3. GAN for Semantic Segmentation
3. Proposed Framework
3.1. Modification of CycleGAN
3.2. Feed-forward Path of the Proposed Framework
3.2.1. Cyclic Phase
3.2.2. Adversarial Phase
3.2.3. Supervised Learning Phase
3.3. Loss Function for Back-propagation
3.4. Proposed Network Architecture
3.4.1. Generator Architecture
3.4.2. Discriminator Architecture
4. Experimental Design
4.1. Overall Workflow
4.2. Vaihingen Dataset
4.3. Implementation Details
4.4. Metrics for Quantitative Evaluation
5. Results and Discussion
5.1. Performance Evaluation of the Proposed Framework
5.2. Comparison of Classification Performance in the Proposed Framework and Benchmarks
5.3. Impact of Labeled and Unlabeled Data for Semi-supervised Learning
5.4. Cycle Consistency in Semi-supervised Learning
5.5. Adaptation of the GAN Framework for Other Classification Models
6. Conclusion
References
Abstract in Korean
Semantic Labeling of High Resolution Images Using EfficientUNets and Transformers
Semantic segmentation necessitates approaches that learn high-level
characteristics while dealing with enormous amounts of data. Convolutional
neural networks (CNNs) can learn unique and adaptive features to achieve this
aim. However, due to the large size and high spatial resolution of remote
sensing images, these networks cannot analyze an entire scene efficiently.
Recently, deep transformers have proven their capability to record global
interactions between different objects in the image. In this paper, we propose
a new segmentation model that combines convolutional neural networks with
transformers, and show that this mixture of local and global feature extraction
techniques provides significant advantages in remote sensing segmentation. In
addition, the proposed model includes two fusion layers that are designed to
represent the multi-modal inputs and outputs of the network efficiently. The input
fusion layer extracts feature maps summarizing the relationship between image
content and elevation maps (DSM). The output fusion layer uses a novel
multi-task segmentation strategy where class labels are identified using
class-specific feature extraction layers and loss functions. Finally, a
fast-marching method is used to convert all unidentified class labels to their
closest known neighbors. Our results demonstrate that the proposed methodology
improves segmentation accuracy compared to state-of-the-art techniques.
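The final step described above, assigning each unidentified pixel the label of its closest known neighbor, can be sketched as follows. Note the substitution: instead of a fast-marching solver, this sketch uses a plain multi-source breadth-first search over 4-connected pixels, which is a simpler discrete stand-in and not the authors' implementation. The `UNKNOWN` marker and the function name are hypothetical.

```python
from collections import deque

UNKNOWN = -1  # hypothetical marker for pixels with no identified class


def fill_unknown(labels):
    """Give each UNKNOWN pixel the label of a nearest known pixel.

    Distance is counted in 4-connected steps; ties are broken by
    row-major order of the known pixels.
    """
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    # Start the search from every known pixel at once.
    queue = deque((r, c) for r in range(h) for c in range(w)
                  if out[r][c] != UNKNOWN)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and out[nr][nc] == UNKNOWN:
                out[nr][nc] = out[r][c]  # inherit the label of the nearer source
                queue.append((nr, nc))
    return out


grid = [[1, UNKNOWN, UNKNOWN, 2],
        [1, UNKNOWN, UNKNOWN, 2]]
filled = fill_unknown(grid)
print(filled)  # prints: [[1, 1, 2, 2], [1, 1, 2, 2]]
```

A fast-marching solver would propagate the front using continuous (Euclidean-like) arrival times rather than step counts, so label boundaries can differ from this sketch on larger unknown regions.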
Deep learning in remote sensing: a review
Standing at the paradigm shift towards data-intensive science, machine
learning techniques are becoming increasingly important. In particular, as a
major breakthrough in the field, deep learning has proven to be an extremely
powerful tool in many fields. Shall we embrace deep learning as the key to all?
Or, should we resist a 'black-box' solution? There are controversial opinions
in the remote sensing community. In this article, we analyze the challenges of
using deep learning for remote sensing data analysis, review the recent
advances, and provide resources to make deep learning in remote sensing
ridiculously simple to start with. More importantly, we encourage remote sensing
scientists to bring their expertise into deep learning, and use it as an
implicit general model to tackle unprecedented large-scale influential
challenges, such as climate change and urbanization.
Comment: Accepted for publication IEEE Geoscience and Remote Sensing Magazin
Scale-aware neural network for semantic segmentation of multi-resolution remote sensing images
Assigning specific categories to geospatial objects at the pixel level is a fundamental task in remote sensing image analysis. Along with the rapid development of sensor technologies, remotely sensed images can be captured at multiple spatial resolutions (MSR) with information content manifested at different scales. Extracting information from these MSR images represents a huge opportunity for enhanced feature representation and characterisation. However, MSR images suffer from two critical issues: (1) increased scale variation of geo-objects and (2) loss of detailed information at coarse spatial resolutions. To bridge these gaps, in this paper we propose a novel scale-aware neural network (SaNet) for the semantic segmentation of MSR remotely sensed imagery. SaNet deploys a densely connected feature fusion module (DCFFM) to capture high-quality multi-scale context, such that the scale variation is handled properly and the quality of segmentation is increased for both large and small objects. A spatial feature recalibration module (SFRM) is further incorporated into the network to learn intact semantic content with enhanced spatial relationships, where the negative effects of information loss are removed. The combination of DCFFM and SFRM allows SaNet to learn a scale-aware feature representation, which outperforms existing multi-scale feature representations. Extensive experiments on three semantic segmentation datasets demonstrated the effectiveness of the proposed SaNet in cross-resolution segmentation.
Development of Mining Sector Applications for Emerging Remote Sensing and Deep Learning Technologies
This thesis uses neural networks and deep learning to address practical, real-world problems in the mining sector. The main focus is on developing novel applications in the area of object detection from remotely sensed data. This area has many potential mining applications and is an important part of moving towards data-driven strategic decision making across the mining sector. The scientific contributions of this research are twofold. Firstly, each of the three case studies demonstrates a new application which couples remote sensing and neural network based technologies for improved data-driven decision making. Secondly, the thesis presents a framework to guide implementation of these technologies in the mining sector, providing a guide for researchers and professionals undertaking further studies of this type. The first case study builds a fully connected neural network method to locate supporting rock bolts from 3D laser scan data. This method combines input features from the remote sensing and mobile robotics research communities, generating accuracy scores up to 22% higher than those found using either feature set in isolation. The neural network approach is also compared to the widely used random forest classifier and is shown to outperform this classifier on the test datasets. Additionally, the algorithms' performance is enhanced by adding a confusion class to the training data and by grouping the output predictions using density-based spatial clustering. The method is tested on two datasets, gathered using different laser scanners, in different types of underground mines which have different rock bolting patterns. In both cases the method is found to be highly capable of detecting the rock bolts, with recall scores of 0.87-0.96. The second case study investigates modern deep learning for LiDAR data. Here, multiple transfer learning strategies and LiDAR data representations are examined for the task of identifying historic mining remains.
A transfer learning approach based on a lunar crater detection model is used, due to the task similarities in both the underlying data structures and the geometries of the objects to be detected. The relationship between dataset resolution and detection accuracy is also examined, with the results showing that the approach is capable of detecting pits and shafts to a high degree of accuracy, with precision and recall scores between 0.80 and 0.92, provided the input data is of sufficient quality and resolution. Alongside resolution, different LiDAR data representations are explored, showing that the precision-recall balance varies depending on the input LiDAR data representation. The third case study creates a deep convolutional neural network model to detect artisanal scale mining from multispectral satellite data. This model is trained from initialisation without transfer learning and demonstrates that accurate multispectral models can be built from a smaller training dataset when appropriate design and data augmentation strategies are adopted. Alongside the deep learning model, novel mosaicing algorithms are developed both to improve cloud cover penetration and to decrease noise in the final prediction maps. When applied to the study area, the results from this model provide valuable information about the expansion, migration and forest encroachment of artisanal scale mining in southwestern Ghana over the last four years. Finally, this thesis presents an implementation framework for these neural network based object detection models, to generalise the findings from this research to new mining sector deep learning tasks. This framework can be used to identify applications which would benefit from neural network approaches, to build the models, and to apply these algorithms in a real-world environment.
The case study chapters confirm that the neural network models are capable of interpreting remotely sensed data to a high degree of accuracy on real-world mining problems, while the framework guides the development of new models to solve a wide range of related challenges.