452 research outputs found
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as it relates to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing DL models.
Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing.
Expediting Building Footprint Segmentation from High-resolution Remote Sensing Images via Progressive Lenient Supervision
The efficacy of building footprint segmentation from remotely sensed images
has been hindered by limited model transfer effectiveness. Many existing building
segmentation methods were developed upon the encoder-decoder architecture of
U-Net, in which the encoder is finetuned from the newly developed backbone
networks that are pre-trained on ImageNet. However, the heavy computational
burden of the existing decoder designs hampers the successful transfer of these
modern encoder networks to remote sensing tasks. Even the widely-adopted deep
supervision strategy fails to mitigate these challenges due to its invalid loss
in hybrid regions where foreground and background pixels are intermixed. In
this paper, we conduct a comprehensive evaluation of existing decoder network
designs for building footprint segmentation and propose an efficient framework
denoted as BFSeg to enhance learning efficiency and effectiveness.
Specifically, a densely-connected coarse-to-fine feature fusion decoder network
that facilitates easy and fast feature fusion across scales is proposed.
Moreover, considering the invalidity of hybrid regions in the down-sampled
ground truth during the deep supervision process, we present a lenient deep
supervision and distillation strategy that enables the network to learn proper
knowledge from deep supervision. Building upon these advancements, we have
developed a new family of building segmentation networks, which consistently
surpass prior works with outstanding performance and efficiency across a wide
range of newly developed encoder networks. The code will be released on
https://github.com/HaonanGuo/BFSeg-Efficient-Building-Footprint-Segmentation-Framework.
Comment: 13 pages, 8 figures. Submitted to IEEE Transactions on Neural Networks and Learning Systems.
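The "invalid loss in hybrid regions" problem the abstract describes arises when a binary ground-truth mask is down-sampled for deep supervision: pooled pixels that mix foreground and background take fractional values that neither class label fits. A minimal sketch of one lenient treatment, masking such hybrid pixels out of the loss (the pooling choice, threshold rule, and function names here are illustrative assumptions, not BFSeg's actual implementation):

```python
import numpy as np

def avg_pool2x2(mask):
    """Average-pool a 2D array with a 2x2 window and stride 2."""
    h, w = mask.shape
    return mask[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def lenient_bce(pred, gt_small, eps=1e-7):
    """Binary cross-entropy computed only on 'pure' pixels of the
    down-sampled ground truth; hybrid pixels (0 < value < 1) are ignored."""
    pure = (gt_small == 0.0) | (gt_small == 1.0)
    p = np.clip(pred[pure], eps, 1 - eps)
    t = gt_small[pure]
    return float(-np.mean(t * np.log(p) + (1 - t) * np.log(1 - p)))

# A 4x4 binary footprint mask: building pixels in the upper-left corner.
gt = np.array([[1, 1, 0, 0],
               [1, 1, 0, 0],
               [1, 0, 0, 0],
               [0, 0, 0, 0]], dtype=float)
gt_small = avg_pool2x2(gt)          # 2x2: [[1.0, 0.0], [0.25, 0.0]]
pred = np.full((2, 2), 0.9)         # a deep side-output after sigmoid
loss = lenient_bce(pred, gt_small)  # the 0.25 hybrid cell is excluded
```

Under a naive deep-supervision loss the 0.25 cell would be forced toward either class; here it simply contributes nothing, which is the leniency the abstract refers to.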
A Semi-supervised Learning Framework Based on Cycle-Consistent Generative Adversarial Networks for Very High Resolution Image Classification
Thesis (M.S.) -- Seoul National University Graduate School: Department of Civil and Environmental Engineering, August 2021. Advisor: Yongil Kim.
Image classification of Very High Resolution (VHR) images is a fundamental task in the remote sensing domain for various applications such as land cover mapping, vegetation mapping, and urban planning. In recent years, deep convolutional neural networks have shown promising performance in image classification studies. In particular, semantic segmentation models with fully convolutional architecture-based networks demonstrated great improvements in terms of computational cost, which has become especially important with the large accumulation of VHR images in recent years.
However, deep learning-based approaches are generally limited by the need for a sufficient amount of labeled data to obtain stable accuracy, and acquiring reference labels of remotely-sensed VHR images is very labor-intensive and expensive. To overcome this problem, this thesis proposed a semi-supervised learning framework for VHR image classification. Semi-supervised learning uses both labeled and unlabeled data together, thus reducing the model's dependency on data labels. To this end, this thesis employed a modified CycleGAN model to utilize large amounts of unlabeled images.
CycleGAN is an image translation model which was developed from Generative Adversarial Networks (GAN) for image generation. CycleGAN trains on unpaired datasets by using a cycle consistency loss with two generators and two discriminators. Inspired by the concept of cycle consistency, this thesis modified CycleGAN to enable the use of unlabeled VHR data in model training by treating the unlabeled images as images unpaired with their corresponding ground truth maps.
To utilize a large amount of unlabeled VHR data and a relatively small amount of labeled VHR data, this thesis combined a supervised learning classification model with the modified CycleGAN architecture. The proposed framework contains three phases: cyclic phase, adversarial phase, and supervised learning phase. Through these three phases, both labeled and unlabeled data can be utilized simultaneously to train the model in an end-to-end manner.
The result of the proposed framework was evaluated by using an open-source VHR image dataset, the International Society for Photogrammetry and Remote Sensing (ISPRS) Vaihingen dataset. To validate the accuracy of the proposed framework, benchmark models including both supervised and semi-supervised learning methods were compared on the same dataset. Furthermore, two additional experiments were conducted to confirm the impact of labeled and unlabeled data on classification accuracy and the adaptability of the CycleGAN model to other classification models. These results were evaluated with three popular metrics for image classification: Overall Accuracy (OA), F1-score, and mean Intersection over Union (mIoU).
The proposed framework achieved the highest accuracy (OA: 0.796, 0.786, and 0.784, respectively, in the three test sites) in comparison to the other five benchmarks. In particular, in a test site containing numerous objects with various properties, the largest increase in accuracy was observed due to the regularization effect from the semi-supervised method using unlabeled data with the modified CycleGAN. Moreover, by controlling the amount of labeled and unlabeled data, results indicated that a relatively sufficient amount of unlabeled and labeled data is required to increase the accuracy when using the semi-supervised CycleGAN. Lastly, this thesis applied the proposed CycleGAN method to other classification models such as the feature pyramid network (FPN) and the pyramid scene parsing network (PSPNet), in place of UNet. In all cases, the proposed framework returned significantly improved results, displaying the framework's applicability for semi-supervised image classification on remotely-sensed VHR images.
1. Introduction
2. Background and Related Works
2.1. Deep Learning for Image Classification
2.1.1. Image-level Classification
2.1.2. Fully Convolutional Architectures
2.1.3. Semantic Segmentation for Remote Sensing Images
2.2. Generative Adversarial Networks (GAN)
2.2.1. Introduction to GAN
2.2.2. Image Translation
2.2.3. GAN for Semantic Segmentation
3. Proposed Framework
3.1. Modification of CycleGAN
3.2. Feed-forward Path of the Proposed Framework
3.2.1. Cyclic Phase
3.2.2. Adversarial Phase
3.2.3. Supervised Learning Phase
3.3. Loss Function for Back-propagation
3.4. Proposed Network Architecture
3.4.1. Generator Architecture
3.4.2. Discriminator Architecture
4. Experimental Design
4.1. Overall Workflow
4.2. Vaihingen Dataset
4.3. Implementation Details
4.4. Metrics for Quantitative Evaluation
5. Results and Discussion
5.1. Performance Evaluation of the Proposed Framework
5.2. Comparison of Classification Performance in the Proposed Framework and Benchmarks
5.3. Impact of Labeled and Unlabeled Data for Semi-supervised Learning
5.4. Cycle Consistency in Semi-supervised Learning
5.5. Adaptation of the GAN Framework for Other Classification Models
6. Conclusion
References
Abstract in Korean
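The way the thesis's cyclic and supervised phases share one objective can be sketched in miniature: a cycle-consistency term exploits unlabeled images (image → map → image should reproduce the input), while a supervised term uses the small labeled subset. The linear "generators", variable names, and loss weight below are illustrative stand-ins, not the thesis's actual networks, and the adversarial phase is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

def l1_loss(a, b):
    return float(np.mean(np.abs(a - b)))

def cyclic_loss(x, G_img2map, G_map2img):
    """Cycle consistency: image -> map -> image should reproduce the input,
    so unlabeled images still provide a training signal."""
    return l1_loss(G_map2img(G_img2map(x)), x)

def supervised_loss(x, y, G_img2map):
    """Standard supervised term, used only for the labeled subset."""
    return l1_loss(G_img2map(x), y)

# Toy invertible linear maps stand in for the two generators.
A = rng.normal(size=(4, 4))
A_inv = np.linalg.inv(A)
G_img2map = lambda x: A @ x        # hypothetical image-to-label generator
G_map2img = lambda y: A_inv @ y    # hypothetical label-to-image generator

x_unlabeled = rng.normal(size=(4, 3))  # unlabeled "images"
x_labeled = rng.normal(size=(4, 3))
y_labels = A @ x_labeled               # perfectly consistent toy labels

lam_cyc = 10.0                         # cycle weight, CycleGAN-style
total = (supervised_loss(x_labeled, y_labels, G_img2map)
         + lam_cyc * cyclic_loss(x_unlabeled, G_img2map, G_map2img))
```

Because the toy generators are exact inverses and the labels are consistent, both terms vanish here; in the thesis the same structure lets gradients from unlabeled images regularize the classifier.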
Deep Learning Methods for Remote Sensing
Remote sensing is a field where important physical characteristics of an area are extracted from emitted or reflected radiation, generally captured by satellite cameras, sensors onboard aerial vehicles, etc. Captured data help researchers develop solutions to sense and detect various characteristics such as forest fires, flooding, changes in urban areas, crop diseases, soil moisture, etc. The recent impressive progress in artificial intelligence (AI) and deep learning has sparked innovations in technologies, algorithms, and approaches and led to results that were unachievable until recently in multiple areas, among them remote sensing. This book consists of sixteen peer-reviewed papers covering new advances in the use of AI for remote sensing.
Remote sensing traffic scene retrieval based on learning control algorithm for robot multimodal sensing information fusion and human-machine interaction and collaboration
In light of advancing socio-economic development and urban infrastructure, urban traffic congestion and accidents have become pressing issues. High-resolution remote sensing images are crucial for supporting urban geographic information systems (GIS), road planning, and vehicle navigation. Additionally, the emergence of robotics presents new possibilities for traffic management and road safety. This study introduces an innovative approach that combines attention mechanisms and robotic multimodal information fusion for retrieving traffic scenes from remote sensing images. Attention mechanisms focus on specific road and traffic features, reducing computation and enhancing detail capture. Graph neural algorithms improve scene retrieval accuracy. To achieve efficient traffic scene retrieval, a robot equipped with advanced sensing technology autonomously navigates urban environments, capturing high-accuracy, wide-coverage images. This facilitates comprehensive traffic databases and real-time traffic information retrieval for precise traffic management. Extensive experiments on large-scale remote sensing datasets demonstrate the feasibility and effectiveness of this approach. The integration of attention mechanisms, graph neural algorithms, and robotic multimodal information fusion enhances traffic scene retrieval, promising improved information extraction accuracy for more effective traffic management, road safety, and intelligent transportation systems. In conclusion, this interdisciplinary approach, combining attention mechanisms, graph neural algorithms, and robotic technology, represents significant progress in traffic scene retrieval from remote sensing images, with potential applications in traffic management, road safety, and urban planning
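As a rough illustration of the attention idea this abstract leans on, scaled dot-product attention weights each candidate feature by its similarity to a query, so descriptors of road and traffic regions can dominate a pooled scene representation used for retrieval. A minimal NumPy sketch (the shapes and names are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Each output row is a similarity-weighted mix of the value rows;
    high query-key similarity means that region dominates the result."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # query-key similarities
    weights = softmax(scores, axis=-1)   # rows sum to 1
    return weights @ V, weights

rng = np.random.default_rng(1)
Q = rng.normal(size=(2, 8))   # e.g. 2 query descriptors (traffic concepts)
K = rng.normal(size=(5, 8))   # 5 candidate region features from an image
V = rng.normal(size=(5, 8))
out, w = scaled_dot_product_attention(Q, K, V)
```

In a retrieval pipeline the attended output would then be compared against database embeddings; only the weighting mechanism is shown here.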
Flood dynamics derived from video remote sensing
Flooding is by far the most pervasive natural hazard, with the human impacts of floods expected to worsen in the coming decades due to climate change. Hydraulic models are a key tool for understanding flood dynamics and play a pivotal role in unravelling the processes that occur during a flood event, including inundation flow patterns and velocities. In the realm of river basin dynamics, video remote sensing is emerging as a transformative tool that can offer insights into flow dynamics and thus, together with other remotely sensed data, has the potential to be deployed to estimate discharge. Moreover, the integration of video remote sensing data with hydraulic models offers a pivotal opportunity to enhance the predictive capacity of these models.
Hydraulic models are traditionally built with accurate terrain, flow and bathymetric data and are often calibrated and validated using observed data to obtain meaningful and actionable model predictions. Data for accurately calibrating and validating hydraulic models are not always available, leaving the assessment of the predictive capabilities of some models deployed in flood risk management in question. Recent advances in remote sensing have heralded the availability of vast video datasets of high resolution. The parallel evolution of computing capabilities, coupled with advancements in artificial intelligence, is enabling the processing of data at unprecedented scales and complexities, allowing us to glean meaningful insights into datasets that can be integrated with hydraulic models. The aims of the research presented in this thesis were twofold. The first aim was to evaluate and explore the potential applications of video from air- and space-borne platforms to comprehensively calibrate and validate two-dimensional hydraulic models. The second aim was to estimate river discharge using satellite video combined with high resolution topographic data. In the first of three empirical chapters, non-intrusive image velocimetry techniques were employed to estimate river surface velocities in a rural catchment. For the first time, a 2D hydraulic model was fully calibrated and validated using velocities derived from Unpiloted Aerial Vehicle (UAV) image velocimetry approaches. This highlighted the value of these data in mitigating the limitations associated with traditional data sources used in parameterizing two-dimensional hydraulic models. This finding inspired the subsequent chapter, where river surface velocities, derived using Large Scale Particle Image Velocimetry (LSPIV), and flood extents, derived using deep neural network-based segmentation, were extracted from satellite video and used to rigorously assess the skill of a two-dimensional hydraulic model.
Harnessing the ability of deep neural networks to learn complex features and deliver accurate and contextually informed flood segmentation, the potential value of satellite video for validating two-dimensional hydraulic model simulations is exhibited. In the final empirical chapter, the convergence of satellite video imagery and high-resolution topographical data bridges the gap between visual observations and quantitative measurements by enabling the direct extraction of velocities from video imagery, which is used to estimate river discharge. Overall, this thesis demonstrates the significant potential of emerging video-based remote sensing datasets and offers approaches for integrating these data into hydraulic modelling and discharge estimation practice. The incorporation of LSPIV techniques into flood modelling workflows signifies a methodological progression, especially in areas lacking robust data collection infrastructure. Satellite video remote sensing heralds a major step forward in our ability to observe river dynamics in real time, with potentially significant implications in the domain of flood modelling science.
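The LSPIV step at the heart of the velocimetry chapters can be reduced to finding, for each interrogation window, the displacement that maximizes the cross-correlation between two frames; dividing by the frame interval and scaling by the ground sampling distance then yields a surface velocity. A toy sketch with synthetic frames (function and variable names are illustrative, not the thesis code):

```python
import numpy as np

def piv_displacement(frame_a, frame_b, max_shift=3):
    """Find the integer (dy, dx) shift that best aligns frame_b to frame_a
    by exhaustive normalized cross-correlation -- the core PIV/LSPIV idea."""
    best, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(frame_b, -dy, axis=0), -dx, axis=1)
            a = frame_a - frame_a.mean()
            b = shifted - shifted.mean()
            score = (a * b).sum() / (np.sqrt((a**2).sum() * (b**2).sum()) + 1e-12)
            if score > best:
                best, best_shift = score, (dy, dx)
    return best_shift

rng = np.random.default_rng(2)
frame_a = rng.normal(size=(16, 16))            # tracer pattern at time t
true_shift = (2, 1)                            # surface moved 2 px down, 1 px right
frame_b = np.roll(np.roll(frame_a, true_shift[0], axis=0),
                  true_shift[1], axis=1)       # pattern at time t + dt
dy, dx = piv_displacement(frame_a, frame_b)
# velocity ~ displacement * pixel_size / frame_dt, given the ground sampling distance
```

Real LSPIV adds sub-pixel peak fitting, interrogation-window tiling, and georeferencing, but the correlation search above is the common core.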
- …