A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery
Semantic segmentation (classification) of Earth Observation imagery is a
crucial task in remote sensing. This paper presents a comprehensive review of
technical factors to consider when designing neural networks for this purpose.
The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural
Networks (RNNs), Generative Adversarial Networks (GANs), and transformer
models, discussing prominent design patterns for these ANN families and their
implications for semantic segmentation. Common pre-processing techniques for
ensuring optimal data preparation are also covered. These include methods for
image normalization and chipping, as well as strategies for addressing data
imbalance in training samples, and techniques for overcoming limited data,
including augmentation techniques, transfer learning, and domain adaptation. By
encompassing both the technical aspects of neural network design and the
data-related considerations, this review provides researchers and practitioners
with a comprehensive and up-to-date understanding of the factors involved in
designing effective neural networks for semantic segmentation of Earth
Observation imagery. Comment: 145 pages with 32 figures.
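Two of the pre-processing steps surveyed above, per-band normalization and chipping, can be sketched as follows; the chip size, stride, and min-max scheme are illustrative choices, not recommendations from the review:

```python
import numpy as np

def normalize_band(band, eps=1e-8):
    """Per-band min-max normalization to [0, 1], a common EO pre-processing step."""
    band = band.astype(np.float32)
    return (band - band.min()) / (band.max() - band.min() + eps)

def chip_image(image, chip_size=256, stride=256):
    """Cut a large (H, W, C) scene into fixed-size chips for network input.
    Edge regions that do not fill a full chip are dropped in this simple variant."""
    h, w = image.shape[:2]
    chips = []
    for y in range(0, h - chip_size + 1, stride):
        for x in range(0, w - chip_size + 1, stride):
            chips.append(image[y:y + chip_size, x:x + chip_size])
    return chips

# Example: a 1024x1024 3-band scene yields 16 non-overlapping 256x256 chips.
scene = np.random.randint(0, 4096, size=(1024, 1024, 3), dtype=np.uint16)
scene = np.dstack([normalize_band(scene[..., b]) for b in range(scene.shape[-1])])
chips = chip_image(scene, chip_size=256, stride=256)
print(len(chips))  # 16
```

An overlapping stride (stride < chip_size) is often used instead, at the cost of more chips per scene.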
Siamese Object Tracking for Unmanned Aerial Vehicle: A Review and Comprehensive Analysis
Unmanned aerial vehicle (UAV)-based visual object tracking has enabled a wide
range of applications and attracted increasing attention in the field of
intelligent transportation systems because of its versatility and
effectiveness. As an emerging force in the revolutionary trend of deep
learning, Siamese networks shine in UAV-based object tracking with their
promising balance of accuracy, robustness, and speed. Thanks to the development
of embedded processors and the gradual optimization of deep neural networks,
Siamese trackers have received extensive research attention and achieved
preliminary deployment on UAVs. However, due to the UAV's limited onboard computational
resources and the complex real-world circumstances, aerial tracking with
Siamese networks still faces severe obstacles in many aspects. To further
explore the deployment of Siamese networks in UAV-based tracking, this work
presents a comprehensive review of leading-edge Siamese trackers, along with an
exhaustive UAV-specific analysis based on the evaluation using a typical UAV
onboard processor. Then, onboard tests are conducted to validate the
feasibility and efficacy of representative Siamese trackers in real-world UAV
deployment. Furthermore, to better promote the development of the tracking
community, this work analyzes the limitations of existing Siamese trackers and
conducts additional experiments represented by low-illumination evaluations. In
the end, prospects for the development of Siamese tracking for UAV-based
intelligent transportation systems are deeply discussed. The unified framework
of leading-edge Siamese trackers (i.e., a code library) and the results of their
experimental evaluations are available at
https://github.com/vision4robotics/SiameseTracking4UAV
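The template-matching core of SiamFC-style trackers surveyed above can be sketched as a dense cross-correlation of a template feature map over a search region; this toy NumPy version (single-channel features, no learned backbone) is an illustration of the matching operation, not the trackers' actual implementation:

```python
import numpy as np

def cross_correlate(search, template):
    """Dense cross-correlation of a template feature map over a larger search
    feature map -- the core matching step in SiamFC-style trackers."""
    sh, sw = search.shape
    th, tw = template.shape
    out = np.empty((sh - th + 1, sw - tw + 1), dtype=np.float32)
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(search[y:y + th, x:x + tw] * template)
    return out

# Toy example: the response peak marks the most likely target location.
search = np.zeros((8, 8), dtype=np.float32)
search[3:5, 4:6] = 1.0          # "target" appearance inside the search region
template = np.ones((2, 2), dtype=np.float32)
response = cross_correlate(search, template)
peak = np.unravel_index(np.argmax(response), response.shape)
print(peak)  # (3, 4)
```

Real trackers run this correlation on deep multi-channel features and on embedded hardware, which is exactly where the onboard-resource constraints discussed in the review arise.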
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as it relates to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing DL models. Comment: 64 pages, 411 references. To appear in Journal of Applied Remote
Sensing.
Deep Learning Methods for Remote Sensing
Remote sensing is a field where important physical characteristics of an area are extracted from emitted radiation, generally captured by satellite cameras, sensors onboard aerial vehicles, etc. The captured data help researchers develop solutions to sense and detect various phenomena such as forest fires, flooding, changes in urban areas, crop diseases, soil moisture, etc. The recent impressive progress in artificial intelligence (AI) and deep learning has sparked innovations in technologies, algorithms, and approaches and led to results that were unachievable until recently in multiple areas, among them remote sensing. This book consists of sixteen peer-reviewed papers covering new advances in the use of AI for remote sensing.
Semantic Segmentation and Completion of 2D and 3D Scenes
Semantic segmentation is one of the fundamental problems in computer vision. This thesis addresses various tasks, all related to the fine-grained, i.e. pixel-wise or voxel-wise, semantic understanding of a scene. In recent years, semantic segmentation by 2D convolutional neural networks has become a default pre-processing step for many other computer vision tasks, since it outputs very rich spatially resolved feature maps and semantic labels that are useful for many higher-level recognition tasks. In this thesis, we make several contributions to the field of semantic scene understanding using an image or a depth measurement, recorded by different types of laser sensors, as input. Firstly, we propose a new approach to 2D semantic segmentation of images. It consists of an adaptation of an existing approach for real-time capability under the constrained hardware demands of a real-life drone. The approach is based on a highly optimized implementation of random forests combined with a label propagation strategy. Next, we shift our focus to what we believe is one of the important next frontiers in computer vision: to give machines the ability to anticipate and extrapolate beyond what is captured in a single frame by a camera or depth sensor. This anticipation capability is what allows humans to efficiently interact with their environment. The need for this ability is most prominently displayed in the behaviour of today's autonomous cars. One of their shortcomings is that they only interpret the current sensor state, which prevents them from anticipating events which would require an adaptation of their driving policy. The result is a lot of sudden braking and non-human-like driving behaviour, which can provoke accidents or negatively impact the traffic flow. Therefore we first propose a task to spatially anticipate semantic labels outside the field of view of an image.
The task is based on the Cityscapes dataset, where each image has been center cropped. The goal is to train an algorithm that predicts the semantic segmentation map in the area outside the cropped input region. Along with the task itself, we propose an efficient iterative approach based on 2D convolutional neural networks by designing a task adapted loss function. Afterwards, we switch to the 3D domain. In three dimensions the goal shifts from assigning pixel-wise labels towards the reconstruction of the full 3D scene using a grid of labeled voxels. Thereby one has to anticipate the semantics and geometry in the space that is occluded by the objects themselves from the viewpoint of an image or laser sensor. The task is known as 3D semantic scene completion and has recently caught a lot of attention. Here we propose two new approaches that advance the performance of existing 3D semantic scene completion baselines. The first one is a two stream approach where we leverage a multi-modal input consisting of images and Kinect depth measurements in an early fusion scheme. Moreover we propose a more memory efficient input embedding. The second approach to semantic scene completion leverages the power of the recently introduced generative adversarial networks (GANs). Here we construct a network architecture that follows the GAN principles and uses a discriminator network as an additional regularizer in the 3D-CNN training. With our proposed approaches in semantic scene completion we achieve a new state-of-the-art performance on two benchmark datasets. Finally we observe that one of the shortcomings in semantic scene completion is the lack of a realistic, large scale dataset. We therefore introduce the first real world dataset for semantic scene completion based on the KITTI odometry benchmark. 
By semantically annotating all scans of a 10 Hz Velodyne laser scanner, driving through urban and countryside areas, we obtain data that is valuable for many tasks, including semantic scene completion. Along with the data, we explore the performance of current semantic scene completion models as well as models for semantic point cloud segmentation and motion segmentation. The results show that there is still much room for improvement in both tasks, so our dataset is a valuable contribution to future research in these directions.
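A minimal sketch of the labeled voxel grid that semantic scene completion operates on, built here from a labeled point cloud with a simple last-point-wins rule; the grid shape, voxel size, and label values are illustrative assumptions, not details from the thesis:

```python
import numpy as np

def voxelize(points, labels, grid_shape, voxel_size):
    """Convert a labeled point cloud into the dense labeled voxel grid used as
    ground truth in semantic scene completion; 0 marks empty/unobserved voxels."""
    grid = np.zeros(grid_shape, dtype=np.int64)
    idx = np.floor(points / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    for (i, j, k), lab in zip(idx[inside], labels[inside]):
        grid[i, j, k] = lab   # last point wins; a majority vote is also common
    return grid

# Toy cloud: two points fall into voxel (0,0,0), one into voxel (2,2,2).
points = np.array([[0.1, 0.1, 0.1], [0.9, 0.2, 0.1], [2.5, 2.5, 2.5]])
labels = np.array([1, 2, 3])
grid = voxelize(points, labels, grid_shape=(4, 4, 4), voxel_size=1.0)
print(grid[0, 0, 0], grid[2, 2, 2])  # 2 3
```

Completion models then predict labels for the occluded voxels that remain 0 in such a grid.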
Towards Large-Scale Small Object Detection: Survey and Benchmarks
With the rise of deep convolutional neural networks, object detection has
achieved prominent advances in past years. However, such prosperity could not
camouflage the unsatisfactory situation of Small Object Detection (SOD), one of
the notoriously challenging tasks in computer vision, owing to the poor visual
appearance and noisy representation caused by the intrinsic structure of small
targets. In addition, large-scale datasets for benchmarking small object
detection methods remain a bottleneck. In this paper, we first conduct a
thorough review of small object detection. Then, to catalyze the development of
SOD, we construct two large-scale Small Object Detection dAtasets (SODA),
SODA-D and SODA-A, which focus on the Driving and Aerial scenarios
respectively. SODA-D includes 24,828 high-quality traffic images and 278,433
instances of nine categories. For SODA-A, we harvest 2,513 high-resolution
aerial images and annotate 872,069 instances over nine classes. The proposed
datasets are, to our knowledge, the first attempt at large-scale benchmarks
with a vast collection of exhaustively annotated instances tailored for
multi-category SOD. Finally, we evaluate the performance of mainstream methods
on SODA. We expect the released benchmarks could facilitate the development of
SOD and spawn more breakthroughs in this field. Datasets and codes are
available at: https://shaunyuan22.github.io/SODA
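The "small" regime that SOD benchmarks target can be illustrated with the standard COCO absolute-size buckets, where an instance with area below 32x32 pixels counts as small; the toy box list below is hypothetical:

```python
def coco_size_category(width, height):
    """COCO-style absolute size buckets; small object detection focuses on
    instances occupying fewer than 32x32 pixels."""
    area = width * height
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"

# Count small instances in a toy annotation list of (width, height) boxes.
boxes = [(10, 12), (40, 30), (200, 150), (25, 25)]
n_small = sum(coco_size_category(w, h) == "small" for w, h in boxes)
print(n_small)  # 2
```

Such tiny boxes leave detectors only a handful of feature-map cells per instance, which is the core difficulty the survey reviews.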
Applications
Volume 3 describes how resource-aware machine learning methods and techniques are used to successfully solve real-world problems. The book provides numerous specific application examples: in health and medicine for risk modelling, diagnosis, and treatment selection for diseases; in electronics, steel production, and milling for quality control during manufacturing processes; and in traffic and logistics for smart cities and for mobile communications.
Deep learning & remote sensing : pushing the frontiers in image segmentation
Dissertation (Master's in Informatics) — Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, Brasília, 2022.
Image segmentation aims to simplify the understanding of digital images. Deep learning-based
methods using convolutional neural networks have been game-changing, allowing the exploration
of different tasks (e.g., semantic, instance, and panoptic segmentation). Semantic segmentation
assigns a class to every pixel in an image, instance segmentation classifies objects at a pixel
level with a unique identifier for each target, and panoptic segmentation combines instance-level predictions with different backgrounds. Remote sensing data largely benefits from those
methods, being very suitable for developing new DL algorithms and creating solutions using
top-view images. However, some peculiarities prevent remote sensing using orbital and aerial
imagery from growing when compared to traditional ground-level images (e.g., camera photos):
(1) the images are extensive, (2) they present different characteristics (e.g., number of channels
and image format), (3) a high number of pre-processing and post-processing steps is required (e.g., extracting
patches and classifying large scenes), and (4) most open software for labeling and deep learning applications is not friendly to remote sensing due to the aforementioned reasons. This
dissertation aimed to improve all three main categories of image segmentation. Within the instance segmentation domain, we proposed three experiments. First, we enhanced the box-based
instance segmentation approach for classifying large scenes, allowing practical pipelines to be
implemented. Second, we created a bounding-box free method to reach instance segmentation
results by using semantic segmentation models in a scenario with sparse objects. Third, we
improved the previous method for crowded scenes and developed the first study considering
semi-supervised learning using remote sensing and GIS data. Subsequently, in the panoptic
segmentation domain, we presented the first remote sensing panoptic segmentation dataset, containing fourteen classes, and provided software and a methodology for converting GIS data into
the panoptic segmentation format. Since our first study considered RGB images, we extended
our approach to multispectral data. Finally, we extended the box-free method initially designed
for instance segmentation to the panoptic segmentation task. This dissertation analyzed various
segmentation methods and image types, and the developed solutions enable the exploration of
new tasks (such as panoptic segmentation), the simplification of labeling data (using the proposed semi-supervised learning procedure), and a simplified way to obtain instance and panoptic
predictions using simple semantic segmentation models
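The panoptic format mentioned above can be illustrated with the common trick of fusing a semantic map and an instance-id map into a single encoded map; the `OFFSET` constant and toy arrays are assumptions for illustration, not the dissertation's exact conversion code:

```python
import numpy as np

OFFSET = 1000  # hypothetical encoding constant; COCO panoptic tooling uses a similar id scheme

def to_panoptic(semantic, instance):
    """Fuse a semantic map and an instance-id map into one panoptic map,
    encoding each pixel as class_id * OFFSET + instance_id ('stuff' pixels
    keep instance_id 0)."""
    return semantic.astype(np.int64) * OFFSET + instance.astype(np.int64)

semantic = np.array([[1, 1], [2, 2]])   # class per pixel
instance = np.array([[0, 0], [1, 2]])   # two separate instances of class 2
panoptic = to_panoptic(semantic, instance)
print(panoptic)  # [[1000 1000] [2001 2002]]
```

Decoding is the inverse: `class_id = panoptic // OFFSET` and `instance_id = panoptic % OFFSET`, which is what makes a single-channel panoptic map convenient to store and evaluate.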