
    A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery

    Semantic segmentation (per-pixel classification) of Earth Observation imagery is a crucial task in remote sensing. This paper presents a comprehensive review of technical factors to consider when designing neural networks for this purpose. The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), and transformer models, discussing prominent design patterns for these ANN families and their implications for semantic segmentation. Common pre-processing techniques for ensuring optimal data preparation are also covered, including methods for image normalization and chipping, strategies for addressing data imbalance in training samples, and techniques for overcoming limited data, such as augmentation, transfer learning, and domain adaptation. By encompassing both the technical aspects of neural network design and the data-related considerations, this review provides researchers and practitioners with a comprehensive and up-to-date understanding of the factors involved in designing effective neural networks for semantic segmentation of Earth Observation imagery. Comment: 145 pages with 32 figures
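Two of the pre-processing steps this review covers, normalization and chipping, can be sketched in a few lines. This is a minimal illustration, not the review's own code; the band count, scene size, and 256-pixel chip size are illustrative assumptions.

```python
import numpy as np

def normalize_bands(scene: np.ndarray) -> np.ndarray:
    """Standardize each band of a (bands, H, W) scene to zero mean, unit variance."""
    mean = scene.mean(axis=(1, 2), keepdims=True)
    std = scene.std(axis=(1, 2), keepdims=True)
    return (scene - mean) / (std + 1e-8)

def chip(scene: np.ndarray, size: int = 256) -> list:
    """Split a (bands, H, W) scene into non-overlapping size x size chips."""
    _, h, w = scene.shape
    return [scene[:, i:i + size, j:j + size]
            for i in range(0, h - size + 1, size)
            for j in range(0, w - size + 1, size)]

scene = np.random.rand(4, 512, 768).astype(np.float32)  # synthetic 4-band scene
chips = chip(normalize_bands(scene), size=256)
print(len(chips), chips[0].shape)  # 6 chips of shape (4, 256, 256)
```

Chipping to a fixed tile size is what lets arbitrarily large scenes be batched through a fixed-input network.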

    Change detection and landscape similarity comparison using computer vision methods

    Human-induced disturbances of terrestrial and aquatic ecosystems continue at alarming rates. With the advent of both raw sensor and analysis-ready datasets, the need to monitor ecosystem disturbances is now more imperative than ever; yet the task is becoming increasingly complex as the sources and varieties of earth observation data multiply. In this research, computer vision methods and tools are interrogated to understand their capability for comparing spatial patterns. A critical survey of the literature provides evidence that computer vision methods are relatively robust to scale and highlights issues involved in parameterizing computer vision models to characterize significant pattern information in a geographic context. Applying two widely used pattern indices to compare spatial patterns in simulated and real-world datasets revealed their potential to detect subtle changes in spatial patterns that would not be feasible to detect using traditional pixel-level techniques. A texture-based CNN model was developed to extract spatially relevant information for landscape similarity comparison; the CNN feature maps proved effective in distinguishing agricultural landscapes from other landscape types (e.g., forest and mountainous landscapes). For real-world human disturbance monitoring, a U-Net CNN was developed and compared with a random forest (RF) model. Both modeling frameworks exhibit promising potential to map placer mining disturbance; however, the random forest proved simpler to train and deploy for placer mapping, while the U-Net may be used to augment the RF, as it is capable of reducing misclassification errors and will benefit from the increasing availability of detailed training data.
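The landscape-similarity idea above can be sketched minimally: summarize each image as a texture descriptor and compare descriptors with cosine similarity. A gradient-orientation histogram stands in here for the CNN feature maps the thesis actually uses, and the 8-bin histogram and synthetic images are illustrative assumptions.

```python
import numpy as np

def texture_descriptor(img: np.ndarray, bins: int = 8) -> np.ndarray:
    """Magnitude-weighted histogram of gradient orientations, as a crude texture signature."""
    gy, gx = np.gradient(img.astype(np.float64))
    angles = np.arctan2(gy, gx).ravel()
    mags = np.hypot(gx, gy).ravel()
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi), weights=mags)
    return hist / (hist.sum() + 1e-12)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(0)
striped = np.tile(np.sin(np.linspace(0, 8 * np.pi, 64)), (64, 1))  # field-like stripes
noisy = rng.random((64, 64))                                       # unstructured texture
print(cosine_similarity(texture_descriptor(striped), texture_descriptor(striped + 0.01)))
print(cosine_similarity(texture_descriptor(striped), texture_descriptor(noisy)))
```

A brightness shift leaves the descriptor unchanged (high similarity), while a different texture scores lower, which is the behavior one wants for comparing landscape pattern rather than raw pixel values.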

    Unmasking Clever Hans Predictors and Assessing What Machines Really Learn

    Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly "intelligent" behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighted to well-informed and strategic. We observe that standard performance evaluation metrics can be oblivious to distinguishing these diverse problem-solving behaviors. Furthermore, we propose our semi-automated Spectral Relevance Analysis, which provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines. This helps to assess whether a learned model indeed delivers reliably for the problem it was conceived for. Finally, our work intends to add a voice of caution to the ongoing excitement about machine intelligence and pledges to evaluate and judge some of these recent successes in a more nuanced manner. Comment: Accepted for publication in Nature Communications
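The core of Spectral Relevance Analysis is clustering per-sample relevance heatmaps via a spectral embedding, so that distinct decision strategies (e.g. looking at the object vs. at a spurious watermark) fall into separate clusters. The sketch below uses random synthetic "heatmaps" and a two-cluster split as illustrative assumptions; real heatmaps would come from an explanation method such as LRP.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two synthetic strategies: relevance concentrated on the left vs. right half.
maps = np.concatenate([rng.random((20, 8, 8)) * [[1.0] * 4 + [0.0] * 4],
                       rng.random((20, 8, 8)) * [[0.0] * 4 + [1.0] * 4]])
X = maps.reshape(len(maps), -1)

# RBF affinity between flattened heatmaps, then the normalized graph Laplacian.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
A = np.exp(-d2 / d2.mean())
D = A.sum(1)
L = np.eye(len(A)) - A / np.sqrt(D[:, None] * D[None, :])

# The eigenvector of the second-smallest eigenvalue (Fiedler vector)
# separates the two strategy clusters by sign.
vals, vecs = np.linalg.eigh(L)
labels = (vecs[:, 1] > 0).astype(int)
print(labels)
```

Samples whose heatmaps reveal an anomalous strategy end up in their own cluster, which is what makes the analysis a practical screening tool.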

    3D pose estimation of flying animals in multi-view video datasets

    Flying animals such as bats, birds, and moths are actively studied by researchers wanting to better understand these animals’ behavior and flight characteristics. Towards this goal, multi-view videos of flying animals have been recorded both in laboratory conditions and natural habitats. The analysis of these videos has shifted over time from manual inspection by scientists to more automated and quantitative approaches based on computer vision algorithms. This thesis describes a study on the largely unexplored problem of 3D pose estimation of flying animals in multi-view video data. This problem has received little attention in the computer vision community, where few flying animal datasets exist. Additionally, published solutions from researchers in the natural sciences have not taken full advantage of advancements in computer vision research. This thesis addresses this gap by proposing three different approaches for 3D pose estimation of flying animals in multi-view video datasets, which evolve from successful pose estimation paradigms used in computer vision. The first approach models the appearance of a flying animal with a synthetic 3D graphics model and then uses a Markov Random Field to model 3D pose estimation over time as a single optimization problem. The second approach builds on the success of Pictorial Structures models and further improves them for the case where only a sparse set of landmarks is annotated in training data. The proposed approach first discovers parts from regions of the training images that are not annotated. The discovered parts are then used to generate more accurate appearance likelihood terms, which in turn produce more accurate landmark localizations. The third approach takes advantage of the success of deep learning models and adapts existing deep architectures to perform landmark localization.
Both the second and third approaches perform 3D pose estimation by first obtaining accurate localization of key landmarks in individual views, and then using calibrated cameras and camera geometry to reconstruct the 3D position of key landmarks. This thesis shows that the proposed algorithms generate first-of-a-kind and leading results on real-world datasets of bats and moths, respectively. Furthermore, a variety of resources are made freely available to the public to further strengthen the connection between research communities.
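The reconstruction step described above, from per-view 2D landmark detections plus calibrated cameras to a 3D point, is classically done with linear (DLT) triangulation. A minimal sketch, with simple synthetic camera matrices as assumptions rather than the thesis's actual rigs:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two 3x4 projection matrices."""
    A = np.stack([x1[0] * P1[2] - P1[0],
                  x1[1] * P1[2] - P1[1],
                  x2[0] * P2[2] - P2[0],
                  x2[1] * P2[2] - P2[1]])
    _, _, vt = np.linalg.svd(A)          # null space of A = homogeneous 3D point
    X = vt[-1]
    return X[:3] / X[3]                  # dehomogenize

# Two synthetic cameras: identity pose, and a 1-unit baseline along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, -0.1, 4.0])
project = lambda P, X: (P @ np.append(X, 1.0))[:2] / (P @ np.append(X, 1.0))[2]
X_hat = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
print(np.round(X_hat, 6))  # recovers [0.2, -0.1, 4.0]
```

With noisy detections the SVD gives a least-squares solution, which is why accurate 2D localization in each view is the part the thesis concentrates on.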

    Training deep retrieval models with noisy datasets

    In this thesis we study loss functions that make it possible to train Convolutional Neural Networks (CNNs) on noisy datasets for the particular task of Content-Based Image Retrieval (CBIR). In particular, we propose two novel losses to fit models that generate global image representations. First, a Soft-Matching (SM) loss, exploiting both image content and metadata, is used to specialize general CNNs to particular cities or regions using weakly annotated datasets. Second, a Bag Exponential (BE) loss inspired by the Multiple Instance Learning (MIL) framework is employed to train CNNs for CBIR on noisy datasets. The first part of the thesis introduces a novel training framework that, relying on image content and metadata, learns location-adapted deep models that provide fine-tuned image descriptors for specific visual contents. Our networks, which start from a baseline model originally learned for a different task, are specialized using a custom pairwise loss function, our proposed SM loss, which uses weak labels based on image content and metadata. The experimental results show that the proposed location-adapted CNNs achieve an improvement of up to 55% over the baseline networks on a landmark discovery task. This implies that the models successfully learn the visual clues and peculiarities of the region for which they are trained, and generate image descriptors that are better location-adapted. In addition, for those landmarks that are not present in the training set, or even in other cities, our proposed models perform at least as well as the baseline network, which indicates good resilience against overfitting. The second part of the thesis introduces the BE loss function to train CNNs for image retrieval, borrowing inspiration from the MIL framework. The loss combines the use of an exponential function acting as a soft margin, and a MIL-based mechanism working with bags of positive and negative pairs of images.
The method allows deep retrieval networks to be trained on noisy datasets by weighing the influence of the different samples at loss level, which increases the performance of the generated global descriptors. The rationale behind the improvement is that we are handling noise in an end-to-end manner and, therefore, avoiding its negative influence as well as the unintentional biases due to fixed pre-processing cleaning procedures. In addition, our method is general enough to suit other scenarios requiring different weights for the training instances (e.g. boosting the influence of hard positives during training). The proposed bag exponential function can be seen as a back door to guide the learning process according to a certain objective in an end-to-end manner, allowing the model to approach such an objective smoothly and progressively. Our results show that our loss allows CNN-based retrieval systems to be trained with noisy training sets and achieve state-of-the-art performance. Furthermore, we have found that it is better to use training sets that are highly correlated with the final task, even if they are noisy, than to train with a clean set that is only weakly related to the topic at hand. From our point of view, this result represents a big leap in the applicability of retrieval systems and helps to reduce the effort needed to set up new CBIR applications: e.g. by allowing fast automatic generation of noisy training datasets and then using our bag exponential loss to deal with the noise. Moreover, we also consider that this result opens a new line of research for CNN-based image retrieval: letting the models decide not only on the best features to solve the task but also on the most relevant samples to do it. Doctoral Program in Multimedia and Communications, Universidad Carlos III de Madrid and Universidad Rey Juan Carlos. Committee chair: Luis Salgado Álvarez de Sotomayor. Secretary: Pablos Martínez Olmos. Member: Ernest Valveny Llobe
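The two ingredients described, an exponential soft margin on descriptor distances and a MIL-style mechanism over bags of pairs, can be sketched minimally. The margin, temperature, and min-over-bag pooling below are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def exp_soft_margin(d, margin=0.5, temperature=5.0, positive=True):
    """Exponential acting as a soft margin: smoothly penalize positive pairs
    beyond the margin, and negative pairs within it."""
    sign = 1.0 if positive else -1.0
    return np.exp(sign * temperature * (np.asarray(d) - margin))

def bag_loss(pos_dists, neg_dists):
    """MIL-style bag loss: min-pooling lets the most consistent pair in each
    bag dominate, so a mislabeled (noisy) pair contributes little."""
    pos = exp_soft_margin(pos_dists, positive=True)
    neg = exp_soft_margin(neg_dists, positive=False)
    return float(pos.min() + neg.min())

clean = bag_loss(pos_dists=[0.2, 0.3], neg_dists=[1.2, 0.9])
noisy = bag_loss(pos_dists=[0.2, 2.0], neg_dists=[1.2, 0.1])  # one noisy pair per bag
print(clean, noisy)  # identical here: the noisy pairs never determine the min
```

This is the sense in which sample influence is weighed "at loss level": a single corrupted pair in a bag cannot blow up the gradient the way it would in a plain pairwise loss.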

    Personal Identification Using Ultrawideband Radar Measurement of Walking and Sitting Motions and a Convolutional Neural Network

    This study proposes a personal identification technique that applies machine learning with a two-layered convolutional neural network to spectrogram images obtained from radar echoes of a target person in motion. The walking and sitting motions of six participants were measured using an ultrawideband radar system. Time-frequency analysis was applied to the radar signal to generate spectrogram images containing the micro-Doppler components associated with limb movements. A convolutional neural network was trained using the spectrogram images with personal labels to achieve radar-based personal identification. The personal identification accuracies were evaluated experimentally to demonstrate the effectiveness of the proposed technique. Comment: 9 pages, 7 figures, and 3 tables
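The time-frequency step, turning a 1-D echo into a spectrogram image a CNN can consume, is a short-time Fourier transform. A minimal sketch with a synthetic frequency-modulated tone mimicking an oscillating micro-Doppler shift; the sampling rate, window, and hop sizes are illustrative assumptions, not the paper's radar configuration.

```python
import numpy as np

def spectrogram(signal, win=128, hop=64):
    """Magnitude STFT: Hann-windowed frames -> |rFFT| per frame."""
    frames = [signal[i:i + win] * np.hanning(win)
              for i in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (freq bins, time frames)

fs = 1000.0
t = np.arange(0, 2.0, 1 / fs)
# 100 Hz carrier with a +/-30 Hz sinusoidal frequency modulation at 1 Hz,
# a stand-in for a limb's oscillating Doppler component.
echo = np.sin(2 * np.pi * (100 * t + (30 / (2 * np.pi)) * np.sin(2 * np.pi * t)))
S = spectrogram(echo)
print(S.shape)  # (65, 30): one column per frame, usable as a single-channel image
```

The resulting 2-D array is exactly the kind of spectrogram image the paper labels per participant and feeds to the convolutional network.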