
    Deep convolutional regression modelling for forest parameter retrieval

    Accurate forest monitoring is crucial as forests are major global carbon sinks. Additionally, accurate prediction of forest parameters, such as forest biomass and stem volume (SV), has economic importance. Therefore, the development of regression models for forest parameter retrieval is essential. Existing forest parameter estimation methods use regression models that establish pixel-wise relationships between ground reference data and the corresponding pixels in remote sensing (RS) images. However, these models often overlook spatial contextual relationships among neighbouring pixels, limiting the potential for improved forest monitoring. The emergence of deep convolutional neural networks (CNNs) provides opportunities for enhanced forest parameter retrieval, as their convolutional filters allow for contextual modelling. However, utilising deep CNNs for regression presents its own challenges. One significant challenge is that training CNNs typically requires continuous data layers for both predictor and response variables. While RS data is continuous, ground reference data is sparse and scattered across large areas due to the challenges and costs associated with in situ data collection. This thesis tackles the challenges of using CNNs for regression by introducing novel deep learning-based solutions across diverse forest types and parameters. To address the sparsity of available reference data, RS-derived prediction maps can be used as auxiliary data to train the CNN-based regression models; this is explored through two different approaches. Although these prediction maps offer greater spatial coverage than the original ground reference data, they do not guarantee spatially continuous prediction target data, and this work proposes a novel methodology that enables CNN-based regression models to handle such discontinuous targets. Efficient CNN architectures for the regression task are developed by investigating relevant learning objectives, including a new frequency-aware one. To enable large-scale and cost-effective regression modelling of forests, this thesis suggests utilising C-band synthetic aperture radar (SAR) data as the regressor input. Results demonstrate the substantial potential of C-band SAR-based convolutional regression models for forest parameter retrieval.
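    One common way to cope with sparse reference data, sketched below purely for illustration (this is an assumption about the general approach, not the specific method developed in the thesis), is to train a fully convolutional regressor with a loss that is evaluated only at pixels where reference values exist:

```python
import torch
import torch.nn as nn

class ConvRegressor(nn.Module):
    """Minimal fully convolutional regressor: SAR bands in, one forest-parameter map out."""
    def __init__(self, in_channels: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=1),  # per-pixel estimate (e.g. stem volume)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def masked_mse(pred: torch.Tensor, target: torch.Tensor, valid: torch.Tensor) -> torch.Tensor:
    """Mean squared error computed only over pixels that carry reference data (valid == 1)."""
    return ((pred - target) ** 2 * valid).sum() / valid.sum().clamp(min=1)

# Hypothetical batch: 2-band C-band SAR patches with ~5% of pixels carrying reference values.
model = ConvRegressor(in_channels=2)
x = torch.randn(4, 2, 64, 64)                      # SAR input patches
target = torch.zeros(4, 1, 64, 64)                 # reference map (zero where unknown)
valid = (torch.rand(4, 1, 64, 64) < 0.05).float()  # mask of pixels with reference data
loss = masked_mse(model(x), target, valid)
loss.backward()
```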

    A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery

    Semantic segmentation (classification) of Earth Observation imagery is a crucial task in remote sensing. This paper presents a comprehensive review of technical factors to consider when designing neural networks for this purpose. The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), and transformer models, discussing prominent design patterns for these ANN families and their implications for semantic segmentation. Common pre-processing techniques for ensuring optimal data preparation are also covered. These include methods for image normalization and chipping, strategies for addressing data imbalance in training samples, and techniques for overcoming limited data, such as augmentation, transfer learning, and domain adaptation. By encompassing both the technical aspects of neural network design and the data-related considerations, this review provides researchers and practitioners with a comprehensive and up-to-date understanding of the factors involved in designing effective neural networks for semantic segmentation of Earth Observation imagery. Comment: 145 pages with 32 figures.
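    As a minimal illustration of the normalization and chipping steps mentioned in the review (a generic sketch; the array layout, percentile clipping, and chip size are assumptions rather than details taken from the paper):

```python
import numpy as np

def normalize_per_band(scene: np.ndarray, low: float = 2.0, high: float = 98.0) -> np.ndarray:
    """Scale each band of a (bands, H, W) scene to [0, 1] using percentile clipping."""
    out = np.empty_like(scene, dtype=np.float32)
    for b in range(scene.shape[0]):
        lo, hi = np.percentile(scene[b], [low, high])
        out[b] = np.clip((scene[b] - lo) / max(hi - lo, 1e-6), 0.0, 1.0)
    return out

def chip(scene: np.ndarray, size: int = 256, stride: int = 256):
    """Yield (bands, size, size) chips from a (bands, H, W) scene in row-major order."""
    _, h, w = scene.shape
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            yield scene[:, y:y + size, x:x + size]

# Hypothetical 4-band tile; a real workflow would read the bands from a raster file instead.
scene = np.random.rand(4, 1024, 1024).astype(np.float32)
chips = list(chip(normalize_per_band(scene)))
print(len(chips), chips[0].shape)   # 16 (4, 256, 256)
```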

    Deep learning for image-based liver analysis — A comprehensive review focusing on malignant lesions

    Deep learning-based methods, in particular convolutional neural networks and fully convolutional networks, are now widely used in the medical image analysis domain. This review focuses on the deep learning-based analysis of focal liver lesions, with special interest in hepatocellular carcinoma and metastatic cancer, as well as structures such as the parenchyma and the vascular system. We address several neural network architectures used for analyzing anatomical structures and lesions in the liver from various imaging modalities such as computed tomography, magnetic resonance imaging and ultrasound. Image analysis tasks like segmentation, object detection and classification for the liver, liver vessels and liver lesions are discussed. Based on a qualitative search, 91 papers, including journal publications and conference proceedings, were selected for the survey. The papers reviewed in this work are grouped into eight categories based on the methodologies used. A comparison of the evaluation metrics shows that hybrid models performed better for both the liver and the lesion segmentation tasks, ensemble classifiers performed better for the vessel segmentation tasks, and combined approaches performed better for both the lesion classification and detection tasks. Performance was measured using the Dice score for segmentation and accuracy for classification and detection, these being the most commonly used metrics.
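    For reference, the Dice score used to evaluate segmentation measures the overlap between a predicted mask and the ground-truth mask. A minimal sketch, assuming binary masks and synthetic data:

```python
import numpy as np

def dice_score(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient 2*|P ∩ T| / (|P| + |T|) for binary masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps))

# Synthetic example: a predicted lesion mask versus a reference annotation.
pred = np.zeros((128, 128), dtype=np.uint8)
truth = np.zeros((128, 128), dtype=np.uint8)
pred[30:70, 30:70] = 1
truth[40:80, 40:80] = 1
print(dice_score(pred, truth))   # ~0.5625 for this synthetic overlap
```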

    [pt] Open-set semantic segmentation applied to remote sensing images


    Combining Features and Semantics for Low-level Computer Vision

    Visual perception of depth and motion plays a significant role in understanding and navigating the environment. Reconstructing outdoor scenes in 3D and estimating the motion of video cameras are of utmost importance for applications like autonomous driving. The corresponding problems in computer vision have witnessed tremendous progress over the last decades, yet some aspects still remain challenging today. Striking examples are reflective and textureless surfaces or large motions, which cannot be easily recovered using traditional local methods. Further challenges include occlusions, large distortions and difficult lighting conditions. In this thesis, we propose to overcome these challenges by modeling non-local interactions that leverage semantics and contextual information. Firstly, for binocular stereo estimation, we propose to regularize over larger areas of the image using object-category-specific disparity proposals, which we sample using inverse graphics techniques based on a sparse disparity estimate and a semantic segmentation of the image. The disparity proposals encode the fact that objects of certain categories are not arbitrarily shaped but typically exhibit regular structures. We integrate them as a non-local regularizer for the challenging object class 'car' into a superpixel-based graphical model and demonstrate its benefits especially in reflective regions. Secondly, for 3D reconstruction, we leverage the fact that the larger the reconstructed area, the more likely objects of similar type and shape will occur in the scene. This is particularly true for outdoor scenes, where buildings and vehicles often suffer from missing texture or reflections but share similarity in 3D shape. We take advantage of this shape similarity by localizing objects using detectors and jointly reconstructing them while learning a volumetric model of their shape. This allows noise to be reduced and missing surfaces to be completed, as objects of similar shape benefit from all observations for the respective category. Evaluations with respect to LIDAR ground truth on a novel, challenging suburban dataset show the advantages of modeling structural dependencies between objects. Finally, motivated by the success of deep learning techniques in matching problems, we present a method for learning context-aware features for solving optical flow using discrete optimization. Towards this goal, we present an efficient way of training a context network with a large receptive field on top of a local network using dilated convolutions on patches. We perform feature matching by comparing each pixel in the reference image to every pixel in the target image, utilizing fast GPU matrix multiplication. The matching cost volume from the network's output forms the data term for discrete MAP inference in a pairwise Markov random field. Extensive evaluations reveal the importance of context for feature matching.
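    As a rough illustration of the dense matching step described above (a generic sketch, not the authors' implementation; the feature dimensions and the cosine normalization are assumptions), comparing every pixel in the reference image to every pixel in the target image reduces to one matrix multiplication over flattened feature maps, and the resulting similarities can feed the data term of a pairwise Markov random field:

```python
import torch
import torch.nn.functional as F

def matching_cost_volume(feat_ref: torch.Tensor, feat_tgt: torch.Tensor) -> torch.Tensor:
    """Compare every pixel of the reference image with every pixel of the target image.

    feat_ref, feat_tgt: (C, H, W) feature maps, e.g. from a local + context network.
    Returns an (H*W, H*W) similarity matrix; higher values indicate better matches.
    """
    c, h, w = feat_ref.shape
    ref = F.normalize(feat_ref.reshape(c, h * w), dim=0)   # unit-length descriptor per pixel
    tgt = F.normalize(feat_tgt.reshape(c, h * w), dim=0)
    return ref.t() @ tgt                                    # one large GPU-friendly matmul

# Hypothetical 64-dimensional features on a 32x48 grid.
feat_ref = torch.randn(64, 32, 48)
feat_tgt = torch.randn(64, 32, 48)
cost = matching_cost_volume(feat_ref, feat_tgt)
print(cost.shape)   # torch.Size([1536, 1536])
# Negated similarities could serve as the data term for discrete MAP inference in a pairwise MRF.
```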