12 research outputs found

    A NEW SATELLITE IMAGERY STEREO PIPELINE DESIGNED FOR SCALABILITY, ROBUSTNESS AND PERFORMANCE

    Abstract. This paper presents a new Multiview Stereo (MVS) pipeline, called CARS, dedicated to satellite imagery. This pipeline is intended for massive Digital Surface Model (DSM) production and has therefore been designed to maximize scalability, robustness and performance. These properties have driven the design of the workflow as well as the choice of algorithms and parameter trends, making our pipeline unique with respect to existing solutions in the literature. This paper intends to serve as a reference paper for the pipeline implementation, and therefore provides a detailed description of algorithms and workflow. It also demonstrates the pipeline robustness and stability in several use cases, and compares its accuracy with the state-of-the-art pipelines on a reference dataset.

    From light rays to 3D models

    Combining Features and Semantics for Low-level Computer Vision

    Visual perception of depth and motion plays a significant role in understanding and navigating the environment. Reconstructing outdoor scenes in 3D and estimating the motion from video cameras are of utmost importance for applications like autonomous driving. The corresponding problems in computer vision have witnessed tremendous progress over the last decades, yet some aspects still remain challenging today. Striking examples are reflecting and textureless surfaces or large motions which cannot be easily recovered using traditional local methods. Further challenges include occlusions, large distortions and difficult lighting conditions. In this thesis, we propose to overcome these challenges by modeling non-local interactions leveraging semantics and contextual information. Firstly, for binocular stereo estimation, we propose to regularize over larger areas on the image using object-category specific disparity proposals which we sample using inverse graphics techniques based on a sparse disparity estimate and a semantic segmentation of the image. The disparity proposals encode the fact that objects of certain categories are not arbitrarily shaped but typically exhibit regular structures. We integrate them as non-local regularizer for the challenging object class 'car' into a superpixel-based graphical model and demonstrate its benefits especially in reflective regions. Secondly, for 3D reconstruction, we leverage the fact that the larger the reconstructed area, the more likely objects of similar type and shape will occur in the scene. This is particularly true for outdoor scenes where buildings and vehicles often suffer from missing texture or reflections, but share similarity in 3D shape. We take advantage of this shape similarity by localizing objects using detectors and jointly reconstructing them while learning a volumetric model of their shape. 
This allows us to reduce noise while completing missing surfaces, as objects of similar shape benefit from all observations for the respective category. Evaluations with respect to LIDAR ground-truth on a novel challenging suburban dataset show the advantages of modeling structural dependencies between objects. Finally, motivated by the success of deep learning techniques in matching problems, we present a method for learning context-aware features for solving optical flow using discrete optimization. Towards this goal, we present an efficient way of training a context network with a large receptive field size on top of a local network using dilated convolutions on patches. We perform feature matching by comparing each pixel in the reference image to every pixel in the target image, utilizing fast GPU matrix multiplication. The matching cost volume from the network's output forms the data term for discrete MAP inference in a pairwise Markov random field. Extensive evaluations reveal the importance of context for feature matching.
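
The all-pairs feature matching mentioned in this abstract can be illustrated with a minimal sketch. It assumes toy two-dimensional descriptors and a negative dot product as the matching cost; the function names `matching_cost_volume` and `best_matches` are hypothetical, and the winner-takes-all step is a simplification standing in for the MAP inference in a Markov random field that the abstract describes. The nested loops compute a matrix product, which is the operation a GPU would run as a single matrix multiplication.

```python
# Sketch only: toy descriptors, not the learned features from the thesis.

def matching_cost_volume(ref_feats, tgt_feats):
    """Return cost[i][j] = -<ref_feats[i], tgt_feats[j]> for all pixel pairs.

    Each argument is a list of per-pixel feature vectors (one row per pixel).
    The loops compute ref_feats @ tgt_feats^T, negated so that a higher
    similarity gives a lower cost.
    """
    return [
        [-sum(r * t for r, t in zip(ref_vec, tgt_vec)) for tgt_vec in tgt_feats]
        for ref_vec in ref_feats
    ]


def best_matches(cost):
    """Winner-takes-all (an assumption): lowest-cost target per reference pixel."""
    return [min(range(len(row)), key=row.__getitem__) for row in cost]


# Three reference pixels and three target pixels with 2-D unit descriptors.
ref = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
tgt = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]

cost = matching_cost_volume(ref, tgt)
matches = best_matches(cost)
print(matches)
```

In a discrete-optimization setting, `cost` would serve as the data term rather than being minimized independently per pixel.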

    Towards Efficient 3D Reconstructions from High-Resolution Satellite Imagery

    Recent years have witnessed the rapid growth of commercial satellite imagery. Compared with other imaging products, such as aerial or streetview imagery, modern satellite images are captured at high resolution and with multiple spectral bands, and thus provide unique viewing angles, global coverage, and frequent updates of the Earth's surface. With automated processing and intelligent analysis algorithms, satellite images can enable global-scale 3D modeling applications. This dissertation explores computer vision algorithms to reconstruct 3D models from satellite images at different levels: geometric, semantic, and parametric reconstructions. However, reconstructing satellite imagery is particularly challenging for the following reasons: 1) Satellite images typically contain an enormous amount of raw pixels. Efficient algorithms are needed to minimize the substantial computational burden. 2) The ground sampling distances of satellite images are comparatively low. Visual entities, such as buildings, appear visually small and cluttered, thus posing difficulties for 3D modeling. 3) Satellite images usually have complex camera models and inaccurate vendor-provided camera calibrations. Rational polynomial coefficients (RPC) camera models, although widely used, need to be appropriately handled to ensure high-quality reconstructions. To obtain geometric reconstructions efficiently, we propose an edge-aware interpolation-based algorithm to obtain 3D point clouds from satellite image pairs. Initial 2D pixel matches are first established and triangulated to compensate for the RPC calibration errors. Noisy dense correspondences can then be estimated by interpolating the inlier matches in an edge-aware manner. After refining the correspondence map with a fast bilateral solver, we can obtain dense 3D point clouds via triangulation. Pixel-wise semantic classification results for satellite images are usually noisy because spatial neighborhood information is neglected.
Thus, we propose to aggregate multiple corresponding observations of the same 3D point to obtain high-quality semantic models. Instead of just leveraging geometric reconstructions to provide such correspondences, we formulate geometric modeling and semantic reasoning in a joint Markov Random Field (MRF) model. Our experiments show that both tasks can benefit from the joint inference. Finally, we propose a novel deep-learning-based approach to perform single-view parametric reconstructions from satellite imagery. By parametrizing buildings as 3D cuboids, our method simultaneously localizes building instances visible in the image and estimates their corresponding cuboid models. Aerial LiDAR and vectorized GIS maps are utilized as supervision. Our network upsamples CNN features to detect small but cluttered building instances. In addition, we estimate building contours through a separate fully convolutional network to avoid overlapping building cuboids.

    The design and implementation of a purely digital stereo-photogrammetric system on the IBM 3090 multi-user mainframe computer

    This thesis is concerned with an investigation into the possibilities of implementing various aspects of a purely digital stereo-photogrammetric (DSP) system on the IBM 3090 150E mainframe multi-user computer. The main aspects discussed within the context of this thesis are:
    i) Mathematical modelling of the process of formation of digital images in the space and frequency domains.
    ii) Experiments on improving the pictorial quality of digital aerial photos using Inverse and Wiener filters.
    iii) Devising and implementing an approach for the automatic sub-pixel measurement of cross-type fiducial marks for the inner orientation, using the Gradient operator and image modelling least squares (IML) approach.
    iv) Devising and implementing a method for the digital rectification of overlapping aerial photos and the formation of the stereo-model.
    v) Design and implementation of a digital stereo-photogrammetric system (DSP) and the generation of a DTM using visual measurement.
    vi) Investigating the feasibility of stereo-viewing of binary images and the possibility of performing measurements on such images.
    vii) Implementing a method for the automatic generation of a DTM using a one-dimensional image correlation along epipolar lines and experimentally optimizing the size of the correlation window.
    viii) Assessment of the accuracy of the DTM data generated both by the DSP and the automatic correlation method.
    ix) Vectorization of the rectification and correlation programs to achieve higher speed-up factors in the computational process.
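
The one-dimensional correlation along epipolar lines named in item vii) can be sketched as follows. This is an illustrative example only, assuming rectified images in which epipolar lines coincide with image rows, and using normalized cross-correlation as the similarity measure; the abstract does not say which correlation measure the thesis used. The names `ncc` and `match_along_row`, the sample intensities, and the parameter values are all hypothetical. The `half_window` parameter corresponds to the correlation window size that the thesis reports optimizing experimentally.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity windows."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat window: correlation undefined, treat as no match
    return sum(x * y for x, y in zip(da, db)) / denom


def match_along_row(left_row, right_row, x, half_window, max_disparity):
    """Search disparities d in [0, max_disparity] along one image row.

    Compares the left window centred at x with the right window centred
    at x - d and returns (best_d, best_score). Assumes the left window
    lies fully inside the row.
    """
    template = left_row[x - half_window : x + half_window + 1]
    best_d, best_score = None, -2.0  # NCC is never below -1
    for d in range(max_disparity + 1):
        centre = x - d
        if centre - half_window < 0:
            break  # candidate window would leave the image
        candidate = right_row[centre - half_window : centre + half_window + 1]
        score = ncc(template, candidate)
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score


# Hypothetical rows: the bright feature is shifted two pixels to the left
# in the right image.
left_row = [10, 10, 10, 12, 40, 90, 35, 11, 10, 10, 10, 10]
right_row = [10, 12, 40, 90, 35, 11, 10, 10, 10, 10, 10, 10]

disparity, score = match_along_row(left_row, right_row, x=5, half_window=2, max_disparity=4)
print(disparity, round(score, 3))
```

A larger window tends to give more stable scores in low-texture areas at the cost of blurring height discontinuities, which is one reason the window size is usually tuned empirically.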

    Extraction of buildings from high-resolution satellite data and airborne Lidar

    Automatic building extraction is a difficult object recognition problem due to the high complexity of the scene content and the object representation. Selecting appropriate building models to reconstruct poses a dilemma: the models have to be generic in order to represent a variety of building shapes, yet specific enough to differentiate buildings from other objects in the scene. Therefore, a scientific challenge of building extraction lies in constructing a framework for modelling building objects with an appropriate balance between generic and specific models. This thesis investigates a synergy of IKONOS satellite imagery and airborne LIDAR data, which have recently emerged as powerful remote sensing tools, and aims to develop an automatic system that delineates building outlines with more complex shapes while relying less on geometric constraints. The method described in this thesis is a two-step procedure: building detection and building description. A method of automatic building detection that can separate individual buildings from surrounding features is presented. The process is realized in a hierarchical strategy, where terrain, trees, and building objects are sequentially detected. Major research efforts are made on the development of a LIDAR filtering technique, which automatically detects terrain surfaces from a cloud of 3D laser points. The thesis also proposes a method of building description to automatically reconstruct building boundaries. A building object is generally represented as a mosaic of convex polygons. The first stage is to generate polygonal cues by a recursive intersection of both data-driven and model-driven linear features extracted from IKONOS imagery and LIDAR data. The second stage is to collect relevant polygons comprising the building object and to merge them for reconstructing the building outlines.
The developed LIDAR filter was tested on a range of different landforms and showed results good enough to meet most of the requirements of DTM generation and building detection. Also, the implemented building extraction system was able to successfully reconstruct the building outlines, and the accuracy of the building extraction is good enough for mapping purposes.

    Proceedings of the NASA Workshop on Registration and Rectification

    Issues associated with the registration and rectification of remotely sensed data are discussed. Near- and long-range applications research tasks and some medium-range technology augmentation research areas are recommended. Image sharpness, feature extraction, inter-image mapping, error analysis, and verification methods are addressed.