A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery
Semantic segmentation (classification) of Earth Observation imagery is a
crucial task in remote sensing. This paper presents a comprehensive review of
technical factors to consider when designing neural networks for this purpose.
The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural
Networks (RNNs), Generative Adversarial Networks (GANs), and transformer
models, discussing prominent design patterns for these ANN families and their
implications for semantic segmentation. Common pre-processing techniques for
ensuring optimal data preparation are also covered. These include methods for
image normalization and chipping, as well as strategies for addressing data
imbalance in training samples, and techniques for overcoming limited data,
including augmentation techniques, transfer learning, and domain adaptation. By
encompassing both the technical aspects of neural network design and the
data-related considerations, this review provides researchers and practitioners
with a comprehensive and up-to-date understanding of the factors involved in
designing effective neural networks for semantic segmentation of Earth
Observation imagery. Comment: 145 pages with 32 figures
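The normalization and chipping steps mentioned in this abstract can be illustrated with a minimal NumPy sketch. This is not the review's own procedure: the percentile stretch, the function names `normalize_bands` and `chip_image`, the chip size, and the synthetic 4-band scene are all assumptions chosen for illustration.

```python
import numpy as np

def normalize_bands(image, low=2.0, high=98.0):
    """Per-band percentile stretch to [0, 1]; image has shape (bands, H, W).

    The 2nd/98th percentile limits are an assumed, commonly used choice.
    """
    image = image.astype(np.float32)
    lo = np.percentile(image, low, axis=(1, 2), keepdims=True)
    hi = np.percentile(image, high, axis=(1, 2), keepdims=True)
    scaled = (image - lo) / np.maximum(hi - lo, 1e-6)  # guard against flat bands
    return np.clip(scaled, 0.0, 1.0)

def chip_image(image, size=256, stride=256):
    """Cut a (bands, H, W) array into (N, bands, size, size) chips.

    Partial tiles at the right and bottom edges are dropped in this sketch.
    """
    _, height, width = image.shape
    chips = [
        image[:, r:r + size, c:c + size]
        for r in range(0, height - size + 1, stride)
        for c in range(0, width - size + 1, stride)
    ]
    return np.stack(chips)

# Synthetic stand-in for a 4-band scene (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
scene = rng.integers(0, 10000, size=(4, 600, 700)).astype(np.uint16)
chips = chip_image(normalize_bands(scene), size=256, stride=256)
```

Setting `stride` smaller than `size` would yield overlapping chips, which is one simple way to get more training samples from a limited scene.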
Change detection and landscape similarity comparison using computer vision methods
Human-induced disturbances of terrestrial and aquatic ecosystems continue at alarming rates. With the advent of both raw sensor and analysis-ready datasets, the need to monitor ecosystem disturbances is now more imperative than ever; yet the task is becoming increasingly complex with increasing sources and varieties of Earth observation data. In this research, computer vision methods and tools are interrogated to understand their capability for comparing spatial patterns. A critical survey of literature provides evidence that computer vision methods are relatively robust to scale and highlights issues involved in parameterization of computer vision models for characterizing significant pattern information in a geographic context. Utilizing two widely used pattern indices to compare spatial patterns in simulated and real-world datasets revealed their potential to detect subtle changes in spatial patterns which would not otherwise be feasible using traditional pixel-level techniques. A texture-based CNN model was developed to extract spatially relevant information for landscape similarity comparison; the CNN feature maps proved to be effective in distinguishing agricultural landscapes from other landscape types (e.g., forest and mountainous landscapes). For real-world human disturbance monitoring, a U-Net CNN was developed and compared with a random forest model. Both modeling frameworks exhibit promising potential to map placer mining disturbance; however, random forests proved simple to train and deploy for placer mapping, while the U-Net may be used to augment RF as it is capable of reducing misclassification errors and will benefit from increasing availability of detailed training data.
Unmasking Clever Hans Predictors and Assessing What Machines Really Learn
Current learning machines have successfully solved hard application problems,
reaching high accuracy and displaying seemingly "intelligent" behavior. Here we
apply recent techniques for explaining decisions of state-of-the-art learning
machines and analyze various tasks from computer vision and arcade games. This
showcases a spectrum of problem-solving behaviors ranging from naive and
short-sighted, to well-informed and strategic. We observe that standard
performance evaluation metrics can be oblivious to distinguishing these diverse
problem solving behaviors. Furthermore, we propose our semi-automated Spectral
Relevance Analysis that provides a practically effective way of characterizing
and validating the behavior of nonlinear learning machines. This helps to
assess whether a learned model indeed delivers reliably for the problem that it
was conceived for. Furthermore, our work intends to add a voice of caution to
the ongoing excitement about machine intelligence and pledges to evaluate and
judge some of these recent successes in a more nuanced manner. Comment: Accepted for publication in Nature Communications
3D pose estimation of flying animals in multi-view video datasets
Flying animals such as bats, birds, and moths are actively studied by researchers wanting to better understand these animals’ behavior and flight characteristics. Towards this goal, multi-view videos of flying animals have been recorded both in laboratory conditions and natural habitats. The analysis of these videos has shifted over time from manual inspection by scientists to more automated and quantitative approaches based on computer vision algorithms.
This thesis describes a study on the largely unexplored problem of 3D pose estimation of flying animals in multi-view video data. This problem has received little attention in the computer vision community where few flying animal datasets exist. Additionally, published solutions from researchers in the natural sciences have not taken full advantage of advancements in computer vision research. This thesis addresses this gap by proposing three different approaches for 3D pose estimation of flying animals in multi-view video datasets, which evolve from successful pose estimation paradigms used in computer vision. The first approach models the appearance of a flying animal with a synthetic 3D graphics model and then uses a Markov Random Field to model 3D pose estimation over time as a single optimization problem. The second approach builds on the success of Pictorial Structures models and further improves them for the case where only a sparse set of landmarks are annotated in training data. The proposed approach first discovers parts from regions of the training images that are not annotated. The discovered parts are then used to generate more accurate appearance likelihood terms which in turn produce more accurate landmark localizations. The third approach takes advantage of the success of deep learning models and adapts existing deep architectures to perform landmark localization. Both the second and third approaches perform 3D pose estimation by first obtaining accurate localization of key landmarks in individual views, and then using calibrated cameras and camera geometry to reconstruct the 3D position of key landmarks.
This thesis shows that the proposed algorithms generate first-of-a-kind and leading results on real-world datasets of bats and moths, respectively. Furthermore, a variety of resources are made freely available to the public to further strengthen the connection between research communities.
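The final step described for the second and third approaches, reconstructing a landmark's 3D position from its 2D locations in calibrated views, can be sketched with standard linear (DLT) triangulation. The thesis does not state which triangulation method it uses, so this is an assumed textbook technique; the camera matrices, the landmark coordinates, and the helper names `triangulate` and `project` are hypothetical.

```python
import numpy as np

def triangulate(projections, points_2d):
    """Linear (DLT) triangulation of one landmark seen in several views.

    projections: list of 3x4 camera projection matrices.
    points_2d:   list of (u, v) pixel coordinates, one per view.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point into pixel coordinates with projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two hypothetical calibrated cameras sharing intrinsics K, offset along x.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

landmark = np.array([0.1, -0.2, 3.0])  # made-up ground-truth 3D position
estimate = triangulate([P1, P2], [project(P1, landmark), project(P2, landmark)])
```

With noise-free detections the estimate matches the true landmark up to numerical precision; with noisy 2D localizations, more views or a nonlinear refinement would typically be needed.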
Training deep retrieval models with noisy datasets
In this thesis we study loss functions that allow Convolutional Neural
Networks (CNNs) to be trained on noisy datasets for the particular task of
Content-Based Image Retrieval (CBIR). In particular, we propose two novel losses to fit
models that generate global image representations. First, a Soft-Matching (SM)
loss, exploiting both image content and metadata, is used to specialize general
CNNs to particular cities or regions using weakly annotated datasets. Second,
a Bag Exponential (BE) loss inspired by the Multiple Instance Learning (MIL)
framework is employed to train CNNs for CBIR on noisy datasets.
The first part of the thesis introduces a novel training framework that, relying
on image content and metadata, learns location-adapted deep models that
provide fine-tuned image descriptors for specific visual contents. Our networks,
which start from a baseline model originally learned for a different task, are specialized
using a custom pairwise loss function, our proposed SM loss, that uses
weak labels based on image content and metadata.
The experimental results show that the proposed location-adapted CNNs
achieve an improvement of up to 55% over the baseline networks on a landmark
discovery task. This implies that the models successfully learn the visual
clues and peculiarities of the region for which they are trained, and generate
image descriptors that are better location-adapted. In addition, for landmarks
that are absent from the training set, or even located in other cities, our proposed
models perform at least as well as the baseline network, which indicates a good
resilience against overfitting.
The second part of the thesis introduces the BE loss function to train CNNs
for image retrieval, borrowing inspiration from the MIL framework. The loss
combines the use of an exponential function acting as a soft margin, and a MIL-based
mechanism working with bags of positive and negative pairs of images.
The method allows deep retrieval networks to be trained on noisy datasets by
weighting the influence of the different samples at loss level, which increases the
performance of the generated global descriptors. The rationale behind the improvement
is that we are handling noise in an end-to-end manner and, therefore,
avoiding its negative influence as well as the unintentional biases due to fixed
pre-processing cleaning procedures. In addition, our method is general enough
to suit other scenarios requiring different weights for the training instances (e.g.
boosting the influence of hard positives during training). The proposed bag exponential
function can be seen as a back door to guide the learning process
according to a certain objective in an end-to-end manner, allowing the model to
approach such an objective smoothly and progressively.
Our results show that our loss allows CNN-based retrieval systems to be
trained with noisy training sets and achieve state-of-the-art performance. Furthermore,
we have found that it is better to use training sets that are highly
correlated with the final task, even if they are noisy, than to train with a clean
set that is only weakly related to the topic at hand. From our point of view,
this result represents a big leap in the applicability of retrieval systems and helps
to reduce the effort needed to set up new CBIR applications: e.g. by allowing
a fast automatic generation of noisy training datasets and then using our bag
exponential loss to deal with noise. Moreover, we also consider that this result
opens a new line of research for CNN-based image retrieval: let the models decide
not only on the best features to solve the task but also on the most relevant
samples to do it. Doctoral Program in Multimedia and Communications, Universidad Carlos III de Madrid and Universidad Rey Juan Carlos. President: Luis Salgado Álvarez de Sotomayor. Secretary: Pablos Martínez Olmos. Member: Ernest Valveny Llobe
Personal Identification Using Ultrawideband Radar Measurement of Walking and Sitting Motions and a Convolutional Neural Network
This study proposes a personal identification technique that applies machine
learning with a two-layered convolutional neural network to spectrogram images
obtained from radar echoes of a target person in motion. The walking and
sitting motions of six participants were measured using an ultrawideband radar
system. Time-frequency analysis was applied to the radar signal to generate
spectrogram images containing the micro-Doppler components associated with limb
movements. A convolutional neural network was trained using the spectrogram
images with personal labels to achieve radar-based personal identification. The
personal identification accuracies were evaluated experimentally to demonstrate
the effectiveness of the proposed technique. Comment: 9 pages, 7 figures, and 3 tables
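The time-frequency step described in this abstract, turning a radar echo into a spectrogram image that exposes micro-Doppler components, can be sketched with a plain short-time Fourier transform. The study does not specify its time-frequency method or parameters, so the Hann window, the window and hop lengths, the sampling rate, and the simulated echo (a constant body Doppler shift with a sinusoidal limb-like modulation) are all assumptions for illustration.

```python
import numpy as np

def spectrogram(signal, window_len=128, hop=32):
    """Short-time Fourier transform power of a complex signal.

    Returns an array of shape (frequency bins, time frames), with the
    zero-frequency bin shifted to the center of the frequency axis.
    """
    window = np.hanning(window_len)
    starts = range(0, len(signal) - window_len + 1, hop)
    frames = np.stack([signal[s:s + window_len] * window for s in starts])
    spectrum = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    return (np.abs(spectrum) ** 2).T

# Simulated echo (hypothetical values): 50 Hz body Doppler shift plus a
# 1.5 Hz periodic modulation standing in for limb motion.
fs = 1000.0                      # assumed sampling rate in Hz
t = np.arange(2000) / fs         # 2 seconds of samples
phase = 2 * np.pi * 50.0 * t + 30.0 * np.sin(2 * np.pi * 1.5 * t)
echo = np.exp(1j * phase)

image = spectrogram(echo)
freqs = np.fft.fftshift(np.fft.fftfreq(128, d=1.0 / fs))
peak_freqs = freqs[np.argmax(image, axis=0)]  # dominant Doppler per frame
```

An array like `image`, usually converted to a log scale, is the kind of input a small convolutional network could then classify by person.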