129 research outputs found

    A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery

    Semantic segmentation (classification) of Earth Observation imagery is a crucial task in remote sensing. This paper presents a comprehensive review of technical factors to consider when designing neural networks for this purpose. The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), and transformer models, discussing prominent design patterns for these artificial neural network (ANN) families and their implications for semantic segmentation. Common pre-processing techniques for data preparation are also covered. These include methods for image normalization and chipping, strategies for addressing class imbalance in training samples, and techniques for overcoming limited data, including augmentation, transfer learning, and domain adaptation. By covering both the technical aspects of neural network design and the data-related considerations, this review gives researchers and practitioners an up-to-date understanding of the factors involved in designing effective neural networks for semantic segmentation of Earth Observation imagery. Comment: 145 pages with 32 figures
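The abstract above mentions image normalization and chipping without giving details. As a rough illustration only, the following standard-library sketch shows one common way to tile a large scene into fixed-size chips and to min-max normalize a band. The function names `chip_windows` and `min_max_normalize`, the edge-handling rule, and the choice of min-max scaling are assumptions made for this example, not methods taken from the reviewed paper.

```python
from typing import List, Sequence, Tuple

# (row_offset, col_offset, height, width); a hypothetical window type for this sketch
Window = Tuple[int, int, int, int]


def chip_windows(height: int, width: int, chip: int, stride: int) -> List[Window]:
    """Return windows that tile a height x width image with chip x chip chips.

    Assumption: edge windows are shifted back inside the image so that every
    chip has the full size (requires height >= chip and width >= chip).
    """
    if chip <= 0 or stride <= 0:
        raise ValueError("chip and stride must be positive")
    if height < chip or width < chip:
        raise ValueError("image is smaller than one chip")

    def offsets(length: int) -> List[int]:
        starts = list(range(0, length - chip + 1, stride))
        # Add a final window flush with the edge if the stride left a gap.
        if starts[-1] != length - chip:
            starts.append(length - chip)
        return starts

    return [(r, c, chip, chip) for r in offsets(height) for c in offsets(width)]


def min_max_normalize(band: Sequence[float]) -> List[float]:
    """Rescale one band's pixel values to [0, 1]; a constant band maps to 0."""
    lo, hi = min(band), max(band)
    if hi == lo:
        return [0.0 for _ in band]
    return [(v - lo) / (hi - lo) for v in band]
```

A stride smaller than the chip size produces overlapping chips, which is often used to reduce edge effects at prediction time; whether that is appropriate depends on the model and the data.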

    Siamese Object Tracking for Unmanned Aerial Vehicle: A Review and Comprehensive Analysis

    Unmanned aerial vehicle (UAV)-based visual object tracking has enabled a wide range of applications and attracted increasing attention in the field of intelligent transportation systems because of its versatility and effectiveness. As an emerging force in deep learning, Siamese networks stand out in UAV-based object tracking for their promising balance of accuracy, robustness, and speed. Thanks to the development of embedded processors and the gradual optimization of deep neural networks, Siamese trackers have been studied extensively and have seen preliminary deployment on UAVs. However, due to the UAV's limited onboard computational resources and complex real-world conditions, aerial tracking with Siamese networks still faces severe obstacles in many respects. To further explore the deployment of Siamese networks in UAV-based tracking, this work presents a comprehensive review of leading-edge Siamese trackers, along with an exhaustive UAV-specific analysis based on evaluation with a typical UAV onboard processor. Onboard tests are then conducted to validate the feasibility and efficacy of representative Siamese trackers in real-world UAV deployment. Furthermore, to better promote the development of the tracking community, this work analyzes the limitations of existing Siamese trackers and conducts additional experiments, represented by low-illumination evaluations. Finally, prospects for the development of Siamese tracking for UAV-based intelligent transportation systems are discussed in depth. The unified code library of leading-edge Siamese trackers and the results of their experimental evaluations are available at https://github.com/vision4robotics/SiameseTracking4UAV

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Although remote sensing (RS) poses a number of unique challenges, primarily related to sensors and applications, RS inevitably draws on many of the same theories as CV, e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing DL models. Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing

    Deep Learning Methods for Remote Sensing

    Remote sensing is a field in which important physical characteristics of an area are extracted from emitted radiation, generally captured by satellite cameras, sensors onboard aerial vehicles, etc. Captured data help researchers develop solutions to sense and detect phenomena such as forest fires, flooding, changes in urban areas, crop diseases, soil moisture, etc. The recent impressive progress in artificial intelligence (AI) and deep learning has sparked innovations in technologies, algorithms, and approaches, and has led to results that were unachievable until recently in multiple areas, among them remote sensing. This book consists of sixteen peer-reviewed papers covering new advances in the use of AI for remote sensing.

    Semantic Segmentation and Completion of 2D and 3D Scenes

    Semantic segmentation is one of the fundamental problems in computer vision. This thesis addresses various tasks, all related to the fine-grained, i.e. pixel-wise or voxel-wise, semantic understanding of a scene. In recent years, semantic segmentation by 2D convolutional neural networks has become something of a default pre-processing step for many other computer vision tasks, since it outputs very rich, spatially resolved feature maps and semantic labels that are useful for many higher-level recognition tasks. In this thesis, we make several contributions to the field of semantic scene understanding using an image or a depth measurement, recorded by different types of laser sensors, as input. Firstly, we propose a new approach to 2D semantic segmentation of images. It consists of an adaptation of an existing approach to achieve real-time capability under the constrained hardware demands of a real-life drone. The approach is based on a highly optimized implementation of random forests combined with a label propagation strategy. Next, we shift our focus to what we believe is one of the important next frontiers in computer vision: giving machines the ability to anticipate and extrapolate beyond what is captured in a single frame by a camera or depth sensor. This anticipation capability is what allows humans to interact efficiently with their environment. The need for this ability is most prominently displayed in the behaviour of today's autonomous cars. One of their shortcomings is that they only interpret the current sensor state, which prevents them from anticipating events that would require an adaptation of their driving policy. The result is a lot of sudden braking and non-human-like driving behaviour, which can provoke accidents or negatively impact the traffic flow. Therefore, we first propose a task to spatially anticipate semantic labels outside the field of view of an image.
The task is based on the Cityscapes dataset, where each image has been center cropped. The goal is to train an algorithm that predicts the semantic segmentation map in the area outside the cropped input region. Along with the task itself, we propose an efficient iterative approach based on 2D convolutional neural networks by designing a task-adapted loss function. Afterwards, we switch to the 3D domain. In three dimensions, the goal shifts from assigning pixel-wise labels towards the reconstruction of the full 3D scene using a grid of labeled voxels. Here, one has to anticipate the semantics and geometry in the space that is occluded by the objects themselves from the viewpoint of an image or laser sensor. The task is known as 3D semantic scene completion and has recently attracted a lot of attention. We propose two new approaches that advance the performance of existing 3D semantic scene completion baselines. The first is a two-stream approach in which we leverage a multi-modal input consisting of images and Kinect depth measurements in an early fusion scheme. Moreover, we propose a more memory-efficient input embedding. The second approach to semantic scene completion leverages the power of the recently introduced generative adversarial networks (GANs). Here we construct a network architecture that follows the GAN principles and uses a discriminator network as an additional regularizer in the 3D-CNN training. With our proposed approaches in semantic scene completion we achieve new state-of-the-art performance on two benchmark datasets. Finally, we observe that one of the shortcomings in semantic scene completion is the lack of a realistic, large-scale dataset. We therefore introduce the first real-world dataset for semantic scene completion, based on the KITTI odometry benchmark.
By semantically annotating all scans of a 10 Hz Velodyne laser scanner, driving through urban and countryside areas, we obtain data that is valuable for many tasks, including semantic scene completion. Along with the data, we explore the performance of current semantic scene completion models as well as models for semantic point cloud segmentation and motion segmentation. The results show that there is still a lot of room for improvement in all of these tasks, so our dataset is a valuable contribution to future research in these directions.
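The anticipation task described in this abstract asks for labels in the region outside a center-cropped input. As a minimal sketch of that setup, the following function builds a binary mask marking the pixels that lie outside a centered crop, i.e. the area where predictions would be scored. The function name `outside_crop_mask`, its parameters, and the integer rounding of the crop position are hypothetical; the abstract does not state crop sizes or how the thesis implements the task.

```python
from typing import List


def outside_crop_mask(height: int, width: int, crop_h: int, crop_w: int) -> List[List[int]]:
    """Return a height x width mask with 1 outside a centered crop and 0 inside.

    Assumption: the crop is centered with integer (floor) offsets.
    """
    if not (0 < crop_h <= height and 0 < crop_w <= width):
        raise ValueError("crop must be positive and fit inside the image")
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return [
        [
            0 if (top <= r < top + crop_h and left <= c < left + crop_w) else 1
            for c in range(width)
        ]
        for r in range(height)
    ]
```

Such a mask could be used to restrict a loss or an evaluation metric to the unseen border region, although the task-adapted loss mentioned in the abstract may differ from this.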

    Towards Large-Scale Small Object Detection: Survey and Benchmarks

    With the rise of deep convolutional neural networks, object detection has achieved prominent advances in past years. However, this progress cannot hide the unsatisfactory state of Small Object Detection (SOD), one of the notoriously challenging tasks in computer vision, owing to the poor visual appearance and noisy representation caused by the intrinsic structure of small targets. In addition, the lack of large-scale datasets for benchmarking small object detection methods remains a bottleneck. In this paper, we first conduct a thorough review of small object detection. Then, to catalyze the development of SOD, we construct two large-scale Small Object Detection dAtasets (SODA), SODA-D and SODA-A, which focus on the Driving and Aerial scenarios respectively. SODA-D includes 24828 high-quality traffic images and 278433 instances of nine categories. For SODA-A, we harvest 2513 high-resolution aerial images and annotate 872069 instances over nine classes. To our knowledge, the proposed datasets are the first attempt at large-scale benchmarks with a vast collection of exhaustively annotated instances tailored for multi-category SOD. Finally, we evaluate the performance of mainstream methods on SODA. We expect the released benchmarks to facilitate the development of SOD and spawn more breakthroughs in this field. Datasets and code are available at: https://shaunyuan22.github.io/SODA

    Applications

    Volume 3 describes how resource-aware machine learning methods and techniques are used to successfully solve real-world problems. The book provides numerous specific application examples: in health and medicine, for risk modelling, diagnosis, and treatment selection for diseases; in electronics, steel production and milling, for quality control during manufacturing processes; in traffic and logistics, for smart cities; and for mobile communications.

    Deep learning & remote sensing : pushing the frontiers in image segmentation

    Dissertation (Master's in Informatics), Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, Brasília, 2022. Image segmentation aims to simplify the understanding of digital images. Deep learning-based methods using convolutional neural networks have been game-changing, allowing the exploration of different tasks (e.g., semantic, instance, and panoptic segmentation). Semantic segmentation assigns a class to every pixel in an image, instance segmentation classifies objects at the pixel level with a unique identifier for each target, and panoptic segmentation combines instance-level predictions with different backgrounds. Remote sensing data benefit greatly from these methods, being very suitable for developing new DL algorithms and creating solutions using top-view images. However, some peculiarities prevent remote sensing with orbital and aerial imagery from growing as fast as work on traditional ground-level images (e.g., camera photos): (1) the images are very large, (2) they have different characteristics (e.g., number of channels and image format), (3) they require a large number of pre-processing and post-processing steps (e.g., extracting patches and classifying large scenes), and (4) most open software for labeling and deep learning applications is not well suited to remote sensing for the aforementioned reasons.
This dissertation aimed to improve all three main categories of image segmentation. Within the instance segmentation domain, we proposed three experiments. First, we enhanced the box-based instance segmentation approach for classifying large scenes, allowing practical pipelines to be implemented. Second, we created a bounding-box-free method to obtain instance segmentation results by using semantic segmentation models in a scenario with sparse objects. Third, we improved the previous method for crowded scenes and developed the first study considering semi-supervised learning using remote sensing and GIS data. Subsequently, in the panoptic segmentation domain, we presented the first remote sensing panoptic segmentation dataset, containing fourteen classes, and made available software and a methodology for converting GIS data into the panoptic segmentation (COCO) format. Since our first study considered RGB images, we extended our approach to multispectral data. Finally, we extended the box-free method initially designed for instance segmentation to the panoptic segmentation task. This dissertation analyzed various segmentation methods and image types, and the developed solutions enable the exploration of new tasks (such as panoptic segmentation), the simplification of data labeling (using the proposed semi-supervised learning procedure), and a simplified way to obtain instance and panoptic predictions using simple semantic segmentation models.
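The abstract does not describe how its box-free method turns semantic predictions into instances. Purely as a generic illustration of the idea for scenes with sparse, non-touching objects, the sketch below assigns instance identifiers to a binary semantic mask with 4-connected component labeling. This is a standard technique substituted here for illustration; it should not be read as the dissertation's method, and the function name `label_instances` is hypothetical.

```python
from collections import deque
from typing import List


def label_instances(mask: List[List[int]]) -> List[List[int]]:
    """Assign a distinct positive ID to each 4-connected foreground region.

    mask holds 1 for pixels of the target class and 0 for background.
    IDs are issued in row-major scan order, starting at 1.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or labels[r][c] != 0:
                continue  # background, or already part of a labeled region
            next_id += 1
            labels[r][c] = next_id
            queue = deque([(r, c)])
            # Breadth-first flood fill over the 4-neighbourhood.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and mask[ny][nx] == 1
                        and labels[ny][nx] == 0
                    ):
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
    return labels
```

This simple labeling merges objects that touch, which is why crowded scenes, as the abstract notes, call for a more elaborate approach.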
