20 research outputs found

    μ΄ˆκ³ ν•΄μƒλ„ μ˜μƒ λΆ„λ₯˜λ₯Ό μœ„ν•œ μˆœν™˜ μ λŒ€μ  생성 신경망 기반의 쀀지도 ν•™μŠ΅ ν”„λ ˆμž„μ›Œν¬

    Get PDF
    Master's thesis (M.S.) -- Graduate School of Seoul National University: College of Engineering, Department of Civil and Environmental Engineering, August 2021. Advisor: Yongil Kim.
Image classification of Very High Resolution (VHR) images is a fundamental task in the remote sensing domain for various applications such as land cover mapping, vegetation mapping, and urban planning.
In recent years, deep convolutional neural networks have shown promising performance in image classification studies. In particular, semantic segmentation models built on fully convolutional architectures demonstrated great improvements in terms of computational cost, which has become especially important with the large accumulation of VHR images in recent years. However, deep learning-based approaches are generally limited by the need for a sufficient amount of labeled data to obtain stable accuracy, and acquiring reference labels of remotely-sensed VHR images is very labor-intensive and expensive. To overcome this problem, this thesis proposed a semi-supervised learning framework for VHR image classification. Semi-supervised learning uses labeled and unlabeled data together, thus reducing the model's dependency on data labels. To this end, this thesis employed a modified CycleGAN model to utilize large amounts of unlabeled images. CycleGAN is an image translation model that was developed from Generative Adversarial Networks (GAN) for image generation; it trains on unpaired datasets by using a cycle consistency loss with two generators and two discriminators. Inspired by the concept of cycle consistency, this thesis modified CycleGAN to enable the use of unlabeled VHR data in model training by treating the unlabeled images as images unpaired with their corresponding ground truth maps. To utilize a large amount of unlabeled VHR data and a relatively small amount of labeled VHR data, this thesis combined a supervised learning classification model with the modified CycleGAN architecture. The proposed framework contains three phases: a cyclic phase, an adversarial phase, and a supervised learning phase. Through these three phases, both labeled and unlabeled data can be utilized simultaneously to train the model in an end-to-end manner.
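The cycle-consistency idea the framework borrows can be illustrated with a minimal sketch. The mappings `g` and `f` below are toy stand-ins for CycleGAN's two generators (the real ones are convolutional networks), and only the L1 reconstruction penalty is shown, not the full adversarial training loop:

```python
import numpy as np

def cycle_consistency_loss(x, g, f):
    """L1 cycle-consistency penalty: map x forward with g, back with f,
    and measure the mean absolute reconstruction error |f(g(x)) - x|."""
    return float(np.mean(np.abs(f(g(x)) - x)))

# Toy stand-in mappings (hypothetical; real CycleGAN generators are CNNs).
g = lambda x: 2.0 * x       # "forward" generator
f = lambda x: 0.5 * x       # "backward" generator inverts g exactly
x = np.random.rand(2, 3, 8, 8)
loss = cycle_consistency_loss(x, g, f)  # exact inverse, so the loss is 0.0
```

When `f` fails to invert `g`, the penalty grows, which is the signal the thesis exploits to learn from images that have no paired ground-truth map.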
The result of the proposed framework was evaluated using an open-source VHR image dataset, the International Society for Photogrammetry and Remote Sensing (ISPRS) Vaihingen dataset. To validate the accuracy of the proposed framework, benchmark models including both supervised and semi-supervised learning methods were compared on the same dataset. Furthermore, two additional experiments were conducted to confirm the impact of labeled and unlabeled data on classification accuracy and the adaptability of the CycleGAN modification to other classification models. The results were evaluated with three popular image classification metrics: Overall Accuracy (OA), F1-score, and mean Intersection over Union (mIoU). The proposed framework achieved the highest accuracy (OA: 0.796, 0.786, and 0.784 in the three test sites, respectively) in comparison to the other five benchmarks. In particular, the largest increase in accuracy was observed in a test site containing numerous objects with various properties, owing to the regularization effect of the semi-supervised method using unlabeled data with the modified CycleGAN. Moreover, by controlling the amounts of labeled and unlabeled data, the results indicated that the accuracy gain from the semi-supervised CycleGAN was larger when unlabeled data was plentiful relative to labeled data. Lastly, this thesis applied the proposed CycleGAN method to other classification models, the feature pyramid network (FPN) and the pyramid scene parsing network (PSPNet), in place of UNet. In all cases, the proposed framework returned significantly improved results, demonstrating its applicability to semi-supervised image classification of remotely-sensed VHR images.
Contents of the thesis:
1. Introduction
2. Background and Related Works
    2.1. Deep Learning for Image Classification
        2.1.1. Image-level Classification
        2.1.2. Fully Convolutional Architectures
        2.1.3. Semantic Segmentation for Remote Sensing Images
    2.2. Generative Adversarial Networks (GAN)
        2.2.1. Introduction to GAN
        2.2.2. Image Translation
        2.2.3. GAN for Semantic Segmentation
3. Proposed Framework
    3.1. Modification of CycleGAN
    3.2. Feed-forward Path of the Proposed Framework
        3.2.1. Cyclic Phase
        3.2.2. Adversarial Phase
        3.2.3. Supervised Learning Phase
    3.3. Loss Function for Back-propagation
    3.4. Proposed Network Architecture
        3.4.1. Generator Architecture
        3.4.2. Discriminator Architecture
4. Experimental Design
    4.1. Overall Workflow
    4.2. Vaihingen Dataset
    4.3. Implementation Details
    4.4. Metrics for Quantitative Evaluation
5. Results and Discussion
    5.1. Performance Evaluation of the Proposed Framework
    5.2. Comparison of Classification Performance in the Proposed Framework and Benchmarks
    5.3. Impact of Labeled and Unlabeled Data for Semi-supervised Learning
    5.4. Cycle Consistency in Semi-supervised Learning
    5.5. Adaptation of the GAN Framework for Other Classification Models
6. Conclusion
References
Abstract in Korean
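The three metrics reported in the evaluation above (OA, F1-score, mIoU) can all be derived from a single class confusion matrix. A minimal NumPy sketch, independent of the thesis code:

```python
import numpy as np

def segmentation_metrics(conf):
    """Compute OA, mean F1, and mIoU from a confusion matrix where
    conf[i, j] counts the pixels of true class i predicted as class j."""
    conf = conf.astype(float)
    tp = np.diag(conf)
    oa = tp.sum() / conf.sum()
    precision = tp / np.maximum(conf.sum(axis=0), 1e-12)
    recall = tp / np.maximum(conf.sum(axis=1), 1e-12)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    iou = tp / np.maximum(conf.sum(axis=0) + conf.sum(axis=1) - tp, 1e-12)
    return oa, f1.mean(), iou.mean()

conf = np.array([[50, 10],
                 [ 5, 35]])
oa, mean_f1, miou = segmentation_metrics(conf)  # oa = 85 / 100 = 0.85
```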

    Multi-Object Segmentation in Complex Urban Scenes from High-Resolution Remote Sensing Data

    Full text link
    Extraction of terrestrial features such as roads and buildings from aerial images with an automatic system has many uses in a wide range of fields, including disaster management, change detection, land cover assessment, and urban planning. The task is commonly difficult because of complex scenes, such as urban scenes, where buildings and road objects are surrounded by shadows, vehicles, trees, etc., and appear in heterogeneous forms with lower inter-class and higher intra-class contrast. Moreover, such extraction is time-consuming and expensive when performed manually by human specialists. Deep convolutional models have displayed considerable performance for feature segmentation from remote sensing data in recent years. However, in the presence of large and continuous areas of obstruction, most of these techniques still cannot detect roads and buildings well. Hence, this work's principal goal is to introduce two novel deep convolutional models based on the UNet family for multi-object segmentation of roads and buildings from aerial imagery. We focused on buildings and road networks because these objects constitute a huge part of urban areas. The presented models are called the multi-level context gating UNet (MCG-UNet) and the bi-directional ConvLSTM UNet (BCL-UNet). The proposed methods retain the advantages of the UNet model and add densely connected convolutions, bi-directional ConvLSTM, and a squeeze-and-excitation module to produce high-resolution segmentation maps and maintain boundary information even under complicated backgrounds. Additionally, we implemented a simple, efficient loss function called the boundary-aware loss (BAL), which allows a network to concentrate on hard semantic segmentation regions, such as overlapping areas, small objects, sophisticated objects, and object boundaries, and to produce high-quality segmentation maps. The presented networks were tested on the Massachusetts building and road datasets.
The MCG-UNet improved the average F1 score over UNet and BCL-UNet by 1.85% and 1.19% for road extraction and by 6.67% and 5.11% for building extraction, respectively. Additionally, the presented MCG-UNet and BCL-UNet networks were compared with other state-of-the-art deep learning-based networks, and the results demonstrated the superiority of the networks in multi-object segmentation tasks.
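The squeeze-and-excitation module mentioned above recalibrates feature channels with a gated bottleneck. A minimal single-image NumPy sketch; the weight shapes and reduction ratio are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Squeeze-and-excitation: global-average-pool each channel ("squeeze"),
    pass the result through a ReLU bottleneck and a sigmoid gate ("excite"),
    then rescale the channels. x: (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    s = x.mean(axis=(1, 2))                  # squeeze: per-channel statistic
    z = np.maximum(w1 @ s, 0.0)              # bottleneck + ReLU
    e = 1.0 / (1.0 + np.exp(-(w2 @ z)))      # sigmoid gate in (0, 1)
    return x * e[:, None, None]              # channel-wise recalibration
```

With zero weights the gate evaluates to sigmoid(0) = 0.5, so every channel is simply halved; trained weights learn which channels to emphasize.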

    Semantic Segmentation and Edge Detectionβ€”Approach to Road Detection in Very High Resolution Satellite Images

    Get PDF
    Road detection technology plays an essential role in a variety of applications, such as urban planning, map updating, traffic monitoring, and automatic vehicle navigation. Recently, there has been much development in detecting roads in high-resolution (HR) satellite images based on semantic segmentation. However, the objects being segmented in such images are of small size, and not all the information in the images is equally important when making a decision. This paper proposes a novel approach to road detection based on semantic segmentation and edge detection. Our approach aims to combine these two techniques to improve road detection: it produces sharp-pixel segmentation maps and uses the segmented masks to generate road edges. In addition, some well-known architectures, such as SegNet, used multi-scale features without refinement; here, using attention blocks in the encoder to predict fine segmentation masks resulted in finer edges. A combination of the weighted cross-entropy loss and the focal Tversky loss is used as the loss function to deal with the highly imbalanced dataset. We conducted various experiments on two real-world datasets covering the three largest regions of Saudi Arabia and Massachusetts. The results demonstrated that the proposed method of encoding HR feature maps effectively predicts sharp segmentation masks to facilitate accurate edge detection, even against a harsh and complicated background.
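The focal Tversky component of the combined loss has a compact closed form: TI = TP / (TP + alpha*FN + beta*FP), loss = (1 - TI)**gamma. A minimal NumPy sketch for a binary mask; the default alpha, beta, and gamma below are common choices, not necessarily those used in the paper:

```python
import numpy as np

def focal_tversky_loss(pred, target, alpha=0.7, beta=0.3, gamma=0.75):
    """Focal Tversky loss for a binary mask. pred holds foreground
    probabilities, target holds {0, 1} labels (both flattened).
    alpha > beta penalizes false negatives more, which favors recall
    on small structures such as thin roads."""
    tp = np.sum(pred * target)
    fn = np.sum((1.0 - pred) * target)
    fp = np.sum(pred * (1.0 - target))
    ti = tp / (tp + alpha * fn + beta * fp + 1e-12)
    return (1.0 - ti) ** gamma
```

A gamma below 1 flattens the loss near TI = 1, keeping gradient signal on examples that are already mostly correct but not perfect.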

    Training a Fully Convolutional Neural Network with Imbalanced, Imperfect and Incomplete Data for Roof Type Segmentation

    Get PDF
    Nowadays, satellites constantly supply world-wide coverage of large-scale, Very High-Resolution (VHR) satellite imagery. The interpretation of such imagery is very expensive if done by a human. However, modern deep learning methods automatically extract semantically meaningful features for image interpretation if trained on a set of high-quality input-output pairs. In 3D reconstruction, automatic prediction of the roof type is an open problem. Even though some research has been done to predict the roof type, either the number of classes was limited to flat and non-flat, or the ground truth was acquired by manually labeling many buildings. Roof type information is, however, publicly available on the internet, for example in the CityGML [3] dataset of Berlin, Germany. On the other hand, such datasets have only very few samples of some classes, contain mislabeling, and are incomplete. There are methods for dealing with class imbalance, such as the focal loss [4] and inverse frequency weights, and recently an adaptation of the loss function in deep learning has been proposed that makes the training of a Fully Convolutional Neural Network (FCN) more robust to errors in the ground truth [5]. Furthermore, Semi-Supervised Learning (SSL) was extended from classification to semantic segmentation; for example, Virtual Adversarial Training (VAT) was evaluated for dense, pixel-wise classification on a benchmark dataset [6]. In this thesis, these solutions are assembled into a combined loss LCOM to train a DeepLabv3+ [7] for roof-type segmentation on an imbalanced, imperfect, and incomplete training dataset. The proposed method achieves considerable improvements and successfully predicts the roof type in many cases, but it also fails in some cases, which are visualized and discussed.
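Inverse frequency weighting and the focal loss mentioned above both have simple forms. A hedged NumPy sketch (probabilities are assumed to be already normalized per pixel; the thesis's combined loss LCOM differs in its details):

```python
import numpy as np

def inverse_frequency_weights(labels, n_classes):
    """Class weights proportional to 1/frequency, scaled so a perfectly
    balanced dataset would give every class a weight of 1."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    return counts.sum() / np.maximum(counts * n_classes, 1e-12)

def focal_loss(p, y, w, gamma=2.0):
    """Weighted focal loss. p: (N, C) class probabilities, y: (N,) integer
    labels, w: (C,) class weights. FL = -w[y] * (1 - p_y)**gamma * log(p_y);
    the (1 - p_y)**gamma factor down-weights easy, confident examples."""
    pt = p[np.arange(len(y)), y]
    return float(np.mean(-w[y] * (1.0 - pt) ** gamma
                         * np.log(np.maximum(pt, 1e-12))))
```

Rare classes (for example an uncommon roof type) get a weight above 1, so misclassifying them costs more than misclassifying the dominant class.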

    Road Segmentation in High-Resolution Images Using Deep Residual Networks

    Get PDF
    Automatic road detection from remote sensing images is a vital application for traffic management, urban planning, and disaster management. The presence of occlusions such as shadows of buildings, trees, and flyovers in high-resolution images, along with misclassifications in databases, creates obstacles for the road detection task. Therefore, an automatic road detection system is required to detect roads in the presence of occlusions. This paper presents a deep convolutional neural network with an encoder-decoder architecture to address the problem of road detection. The architecture contains a U-Network with residual blocks. The U-Network allows the transfer of low-level features to the high-level layers, helping the network learn low-level details. Residual blocks help maintain the network's training performance, which may otherwise deteriorate with depth. The encoder and decoder structures generate a feature map and classify pixels into road and non-road classes, respectively. Experimentation was performed on the Massachusetts road dataset. The results showed that the proposed model achieved better accuracy than current state-of-the-art methods.
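The residual-block idea, y = ReLU(F(x) + x), can be sketched for a single channel. The naive convolution below is illustrative only; real implementations use optimized multi-channel convolutions with batch normalization:

```python
import numpy as np

def conv3x3(x, k):
    """Naive 'same' 3x3 convolution for one channel (kept single-channel
    so the sketch stays short; real blocks convolve across channels)."""
    h, w = x.shape
    padded = np.pad(x, 1)
    out = np.zeros_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * k)
    return out

def residual_block(x, k1, k2):
    """Residual unit: y = ReLU(F(x) + x), where F is two 3x3 convolutions
    with a ReLU in between. The identity skip lets gradients bypass F,
    which is what keeps very deep networks trainable."""
    f = conv3x3(np.maximum(conv3x3(x, k1), 0.0), k2)
    return np.maximum(f + x, 0.0)
```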

    Deep Learning based 3D Segmentation: A Survey

    Full text link
    3D object segmentation is a fundamental and challenging problem in computer vision, with applications in autonomous driving, robotics, augmented reality, and medical image analysis. It has received significant attention from the computer vision, graphics, and machine learning communities. Traditionally, 3D segmentation was performed with hand-crafted features and engineered methods, which failed to achieve acceptable accuracy and could not generalize to large-scale data. Driven by their great success in 2D computer vision, deep learning techniques have recently become the tool of choice for 3D segmentation tasks as well. This has led to an influx of methods in the literature that have been evaluated on different benchmark datasets. This paper provides a comprehensive survey of recent progress in deep learning-based 3D segmentation, covering over 150 papers. It summarizes the most commonly used pipelines, discusses their highlights and shortcomings, and analyzes the competitive results of these segmentation methods. Based on the analysis, it also provides promising research directions for the future. Comment: under review at ACM Computing Surveys; 36 pages, 10 tables, 9 figures.

    CLASSIFICATION OF PARKINSON'S DISEASE IN BRAIN MRI IMAGES USING DEEP RESIDUAL CONVOLUTIONAL NEURAL NETWORK

    Get PDF
    In our aging society, neurodegenerative disorders like Parkinson's disease (PD) are among the most serious health issues. PD is a neurological condition with social and economic effects on individuals. It occurs because the brain's dopamine-producing cells are unable to produce enough of the chemical to support the body's motor functions. The main symptoms of this illness are issues with eyesight, excretion, speech, and mobility, followed by depression, anxiety, sleep problems, and panic attacks. The main aim of this research is to develop a workable clinical decision-making framework that aids the physician in diagnosing patients affected by PD. In this research, we propose a technique to classify Parkinson's disease from MRI brain images. First, the input data are normalized using the min-max normalization method, and noise is removed from the input images using a median filter. The Binary Dragonfly Algorithm is then used to select features. Furthermore, the diseased part is segmented from the MRI brain images using Dense-UNet. Finally, the images are classified as Parkinson's disease or healthy control using the Deep Residual Convolutional Neural Network (DRCNN) technique together with the Enhanced Whale Optimization Algorithm (EWOA) to obtain better classification accuracy. Here, we use the public Parkinson's Progression Markers Initiative (PPMI) dataset of Parkinson's MRI images. The accuracy, sensitivity, specificity, and precision metrics are used with manually gathered data to assess the efficacy of the proposed methodology.
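The first two preprocessing steps described above (min-max normalization and median filtering) are standard image operations. A minimal NumPy sketch, independent of the paper's implementation:

```python
import numpy as np

def min_max_normalize(img):
    """Rescale intensities linearly to the [0, 1] range."""
    lo, hi = float(img.min()), float(img.max())
    return (img - lo) / max(hi - lo, 1e-12)

def median_filter3(img):
    """Naive 3x3 median filter with reflection padding at the borders;
    effective at removing single-pixel (salt-and-pepper) noise while
    preserving edges better than a mean filter."""
    padded = np.pad(img, 1, mode="reflect")
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.median(padded[i:i + 3, j:j + 3])
    return out
```

In practice a library routine such as `scipy.ndimage.median_filter` would replace the explicit loops; the sketch only shows the operation itself.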

    Medical Image Segmentation Review: The success of U-Net

    Full text link
    Automatic medical image segmentation is a crucial topic in the medical domain and a critical component of the computer-aided diagnosis paradigm. U-Net is the most widespread image segmentation architecture due to its flexibility, optimized modular design, and success across medical image modalities. Over the years, the U-Net model has attracted tremendous attention from academic and industrial researchers, and several extensions of the network have been proposed to address the scale and complexity created by medical tasks. Addressing the deficiencies of the naive U-Net model is the foremost step for vendors seeking the proper U-Net variant for their business. Having a compendium of the different variants in one place makes it easier for builders to identify the relevant research, and it helps ML researchers understand the challenges of the biological tasks that strain the model. To address this, we discuss the practical aspects of the U-Net model and suggest a taxonomy to categorize each network variant. Moreover, to measure the performance of these strategies in a clinical application, we propose fair evaluations of some unique and famous designs on well-known datasets. We provide a comprehensive implementation library with trained models for future research. In addition, for ease of future studies, we created an online list of U-Net papers with their possible official implementations. All information is gathered in the https://github.com/NITR098/Awesome-U-Net repository. Comment: submitted to the IEEE Transactions on Pattern Analysis and Machine Intelligence journal.

    A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery

    Full text link
    Semantic segmentation (classification) of Earth Observation imagery is a crucial task in remote sensing. This paper presents a comprehensive review of technical factors to consider when designing neural networks for this purpose. The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), and transformer models, discussing prominent design patterns for these ANN families and their implications for semantic segmentation. Common pre-processing techniques for ensuring optimal data preparation are also covered. These include methods for image normalization and chipping, strategies for addressing data imbalance in training samples, and techniques for overcoming limited data, including augmentation, transfer learning, and domain adaptation. By encompassing both the technical aspects of neural network design and the data-related considerations, this review provides researchers and practitioners with a comprehensive and up-to-date understanding of the factors involved in designing effective neural networks for semantic segmentation of Earth Observation imagery. Comment: 145 pages with 32 figures.
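Chipping, one of the pre-processing techniques covered, tiles a large scene into network-sized patches. A minimal NumPy sketch; the chip size and stride are arbitrary examples, and overlapping strides (stride < chip size) are often used so predictions can be blended at chip borders:

```python
import numpy as np

def chip_image(img, chip_size, stride):
    """Cut a large scene into (possibly overlapping) square chips for
    network input. img: (H, W, C). Returns a list of (row, col, chip)
    so each prediction can later be placed back into the full scene."""
    h, w = img.shape[:2]
    chips = []
    for r in range(0, h - chip_size + 1, stride):
        for c in range(0, w - chip_size + 1, stride):
            chips.append((r, c, img[r:r + chip_size, c:c + chip_size]))
    return chips

scene = np.zeros((256, 256, 3))
tiles = chip_image(scene, 128, 64)   # 3 x 3 = 9 overlapping chips
```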

    Unstructured road extraction and roadside fruit recognition in grape orchards based on a synchronous detection algorithm

    Get PDF
    Accurate road extraction and recognition of roadside fruit in complex orchard environments are essential prerequisites for robotic fruit picking and walking behavioral decisions. In this study, a novel algorithm was proposed for unstructured road extraction and synchronous roadside fruit recognition, with wine grapes and unstructured orchards as the research objects. Initially, a preprocessing method tailored to field orchards was proposed to reduce the interference of adverse factors in the operating environment. The preprocessing method contained four parts: extraction of regions of interest, bilateral filtering, logarithmic space transformation, and image enhancement based on the MSRCR algorithm. Subsequently, analysis of the enhanced image enabled optimization of the gray factor, and a road region extraction method based on dual-space fusion was proposed through color channel enhancement and gray factor optimization. Furthermore, a YOLO model suitable for grape cluster recognition in the wild environment was selected, and its parameters were optimized to enhance the recognition performance of the model for randomly distributed grapes. Finally, a fusion recognition framework was established in which the road extraction result is taken as input and the parameter-optimized YOLO model is used to identify roadside fruits, thus realizing synchronous road extraction and roadside fruit detection. Experimental results demonstrated that the proposed preprocessing-based method reduced the impact of interfering factors in complex orchard environments and enhanced the quality of road extraction. Using the optimized YOLOv7 model, the precision, recall, mAP, and F1-score for roadside fruit cluster detection were 88.9%, 89.7%, 93.4%, and 89.3%, respectively, all higher than those of the YOLOv5 model and more suitable for roadside grape recognition.
Compared to the identification results obtained by the grape detection algorithm alone, the proposed synchronous algorithm increased the number of fruit identifications by 23.84% and the detection speed by 14.33%. This research enhanced the perception ability of robots and provides solid support for behavioral decision systems.