155 research outputs found

    Visual Odometry and Sparse Scene Reconstruction for UAVs with a Multi-Fisheye Camera System

    Get PDF
    Autonomously operating UAVs demand a fast localization for navigation, to actively explore unknown areas and to create maps. For pose estimation, many UAV systems make use of a combination of GPS receivers and inertial sensor units (IMU). However, GPS signal coverage may go down occasionally, especially in the close vicinity of objects, and precise IMUs are too heavy to be carried by lightweight UAVs. This and the high cost of high quality IMU motivate the use of inexpensive vision based sensors for localization using visual odometry or visual SLAM (simultaneous localization and mapping) techniques. The first contribution of this thesis is a more general approach to bundle adjustment with an extended version of the projective coplanarity equation which enables us to make use of omnidirectional multi-camera systems which may consist of fisheye cameras that can capture a large field of view with one shot. We use ray directions as observations instead of image points which is why our approach does not rely on a specific projection model assuming a central projection. In addition, our approach allows the integration and estimation of points at infinity, which classical bundle adjustments are not capable of. We show that the integration of far or infinitely far points stabilizes the estimation of the rotation angles of the camera poses. In its second contribution, we employ this approach to bundle adjustment in a highly integrated system for incremental pose estimation and mapping on light-weight UAVs. Based on the image sequences of a multi-camera system our system makes use of tracked feature points to incrementally build a sparse map and incrementally refines this map using the iSAM2 algorithm. Our system is able to optionally integrate GPS information on the level of carrier phase observations even in underconstrained situations, e.g. if only two satellites are visible, for georeferenced pose estimation. This way, we are able to use all available information in underconstrained GPS situations to keep the mapped 3D model accurate and georeferenced. In its third contribution, we present an approach for re-using existing methods for dense stereo matching with fisheye cameras, which has the advantage that highly optimized existing methods can be applied as a black-box without modifications even with cameras that have field of view of more than 180 deg. We provide a detailed accuracy analysis of the obtained dense stereo results. The accuracy analysis shows the growing uncertainty of observed image points of fisheye cameras due to increasing blur towards the image border. Core of the contribution is a rigorous variance component estimation which allows to estimate the variance of the observed disparities at an image point as a function of the distance of that point to the principal point. We show that this improved stochastic model provides a more realistic prediction of the uncertainty of the triangulated 3D points.Autonom operierende UAVs benötigen eine schnelle Lokalisierung zur Navigation, zur Exploration unbekannter Umgebungen und zur Kartierung. Zur Posenbestimmung verwenden viele UAV-Systeme eine Kombination aus GPS-EmpfĂ€ngern und Inertial-Messeinheiten (IMU). Die VerfĂŒgbarkeit von GPS-Signalen ist jedoch nicht ĂŒberall gewĂ€hrleistet, insbesondere in der NĂ€he abschattender Objekte, und prĂ€zise IMUs sind fĂŒr leichtgewichtige UAVs zu schwer. Auch die hohen Kosten qualitativ hochwertiger IMUs motivieren den Einsatz von kostengĂŒnstigen bildgebenden Sensoren zur Lokalisierung mittels visueller Odometrie oder SLAM-Techniken zur simultanen Lokalisierung und Kartierung. Im ersten wissenschaftlichen Beitrag dieser Arbeit entwickeln wir einen allgemeineren Ansatz fĂŒr die BĂŒndelausgleichung mit einem erweiterten Modell fĂŒr die projektive KollinearitĂ€tsgleichung, sodass auch omnidirektionale Multikamerasysteme verwendet werden können, welche beispielsweise bestehend aus Fisheyekameras mit einer Aufnahme einen großen Sichtbereich abdecken. Durch die Integration von Strahlrichtungen als Beobachtungen ist unser Ansatz nicht von einem kameraspezifischen Abbildungsmodell abhĂ€ngig solange dieses der Zentralprojektion folgt. Zudem erlaubt unser Ansatz die Integration und SchĂ€tzung von unendlich fernen Punkten, was bei klassischen BĂŒndelausgleichungen nicht möglich ist. Wir zeigen, dass durch die Integration weit entfernter und unendlich ferner Punkte die SchĂ€tzung der Rotationswinkel der Kameraposen stabilisiert werden kann. Im zweiten Beitrag verwenden wir diesen entwickelten Ansatz zur BĂŒndelausgleichung fĂŒr ein System zur inkrementellen PosenschĂ€tzung und dĂŒnnbesetzten Kartierung auf einem leichtgewichtigen UAV. Basierend auf den Bildsequenzen eines Mulitkamerasystems baut unser System mittels verfolgter markanter Bildpunkte inkrementell eine dĂŒnnbesetzte Karte auf und verfeinert diese inkrementell mittels des iSAM2-Algorithmus. Unser System ist in der Lage optional auch GPS Informationen auf dem Level von GPS-TrĂ€gerphasen zu integrieren, wodurch sogar in unterbestimmten Situation - beispielsweise bei nur zwei verfĂŒgbaren Satelliten - diese Informationen zur georeferenzierten PosenschĂ€tzung verwendet werden können. Im dritten Beitrag stellen wir einen Ansatz zur Verwendung existierender Methoden fĂŒr dichtes Stereomatching mit Fisheyekameras vor, sodass hoch optimierte existierende Methoden als Black Box ohne Modifzierungen sogar mit Kameras mit einem Gesichtsfeld von mehr als 180 Grad verwendet werden können. Wir stellen eine detaillierte Genauigkeitsanalyse basierend auf dem Ergebnis des dichten Stereomatchings dar. Die Genauigkeitsanalyse zeigt, wie stark die Genauigkeit beobachteter Bildpunkte bei Fisheyekameras zum Bildrand aufgrund von zunehmender UnschĂ€rfe abnimmt. Das KernstĂŒck dieses Beitrags ist eine VarianzkomponentenschĂ€tzung, welche die SchĂ€tzung der Varianz der beobachteten DisparitĂ€ten an einem Bildpunkt als Funktion von der Distanz dieses Punktes zum Hauptpunkt des Bildes ermöglicht. Wir zeigen, dass dieses verbesserte stochastische Modell eine realistischere PrĂ€diktion der Genauigkeiten der 3D Punkte ermöglicht

    3D Scene Geometry Estimation from 360∘^\circ Imagery: A Survey

    Full text link
    This paper provides a comprehensive survey on pioneer and state-of-the-art 3D scene geometry estimation methodologies based on single, two, or multiple images captured under the omnidirectional optics. We first revisit the basic concepts of the spherical camera model, and review the most common acquisition technologies and representation formats suitable for omnidirectional (also called 360∘^\circ, spherical or panoramic) images and videos. We then survey monocular layout and depth inference approaches, highlighting the recent advances in learning-based solutions suited for spherical data. The classical stereo matching is then revised on the spherical domain, where methodologies for detecting and describing sparse and dense features become crucial. The stereo matching concepts are then extrapolated for multiple view camera setups, categorizing them among light fields, multi-view stereo, and structure from motion (or visual simultaneous localization and mapping). We also compile and discuss commonly adopted datasets and figures of merit indicated for each purpose and list recent results for completeness. We conclude this paper by pointing out current and future trends.Comment: Published in ACM Computing Survey

    Real-Time High-Resolution Multiple-Camera Depth Map Estimation Hardware and Its Applications

    Get PDF
    Depth information is used in a variety of 3D based signal processing applications such as autonomous navigation of robots and driving systems, object detection and tracking, computer games, 3D television, and free view-point synthesis. These applications require high accuracy and speed performances for depth estimation. Depth maps can be generated using disparity estimation methods, which are obtained from stereo matching between multiple images. The computational complexity of disparity estimation algorithms and the need of large size and bandwidth for the external and internal memory make the real-time processing of disparity estimation challenging, especially for high resolution images. This thesis proposes a high-resolution high-quality multiple-camera depth map estimation hardware. The proposed hardware is verified in real-time with a complete system from the initial image capture to the display and applications. The details of the complete system are presented. The proposed binocular and trinocular adaptive window size disparity estimation algorithms are carefully designed to be suitable to real-time hardware implementation by allowing efficient parallel and local processing while providing high-quality results. The proposed binocular and trinocular disparity estimation hardware implementations can process 55 frames per second on a Virtex-7 FPGA at a 1024 x 768 XGA video resolution for a 128 pixel disparity range. The proposed binocular disparity estimation hardware provides best quality compared to existing real-time high-resolution disparity estimation hardware implementations. A novel compressed-look up table based rectification algorithm and its real-time hardware implementation are presented. The low-complexity decompression process of the rectification hardware utilizes a negligible amount of LUT and DFF resources of the FPGA while it does not require the existence of external memory. The first real-time high-resolution free viewpoint synthesis hardware utilizing three-camera disparity estimation is presented. The proposed hardware generates high-quality free viewpoint video in real-time for any horizontally aligned arbitrary camera positioned between the leftmost and rightmost physical cameras. The full embedded system of the depth estimation is explained. The presented embedded system transfers disparity results together with synchronized RGB pixels to the PC for application development. Several real-time applications are developed on a PC using the obtained RGB+D results. The implemented depth estimation based real-time software applications are: depth based image thresholding, speed and distance measurement, head-hands-shoulders tracking, virtual mouse using hand tracking and face tracking integrated with free viewpoint synthesis. The proposed binocular disparity estimation hardware is implemented in an ASIC. The ASIC implementation of disparity estimation imposes additional constraints with respect to the FPGA implementation. These restrictions, their implemented efficient solutions and the ASIC implementation results are presented. In addition, a very high-resolution (82.3 MP) 360°x90° omnidirectional multiple camera system is proposed. The hemispherical camera system is able to view the target locations close to horizontal plane with more than two cameras. Therefore, it can be used in high-resolution 360° depth map estimation and its applications in the future

    EVALUATION OF STEREO ALGORITHMS FOR OBSTACLE DETECTION WITH FISHEYE LENSES

    Get PDF

    Multi-task near-field perception for autonomous driving using surround-view fisheye cameras

    Get PDF
    Die Bildung der Augen fĂŒhrte zum Urknall der Evolution. Die Dynamik Ă€nderte sich von einem primitiven Organismus, der auf den Kontakt mit der Nahrung wartete, zu einem Organismus, der durch visuelle Sensoren gesucht wurde. Das menschliche Auge ist eine der raffiniertesten Entwicklungen der Evolution, aber es hat immer noch MĂ€ngel. Der Mensch hat ĂŒber Millionen von Jahren einen biologischen Wahrnehmungsalgorithmus entwickelt, der in der Lage ist, Autos zu fahren, Maschinen zu bedienen, Flugzeuge zu steuern und Schiffe zu navigieren. Die Automatisierung dieser FĂ€higkeiten fĂŒr Computer ist entscheidend fĂŒr verschiedene Anwendungen, darunter selbstfahrende Autos, Augmented RealitĂ€t und architektonische Vermessung. Die visuelle Nahfeldwahrnehmung im Kontext von selbstfahrenden Autos kann die Umgebung in einem Bereich von 0 - 10 Metern und 360° Abdeckung um das Fahrzeug herum wahrnehmen. Sie ist eine entscheidende Entscheidungskomponente bei der Entwicklung eines sichereren automatisierten Fahrens. JĂŒngste Fortschritte im Bereich Computer Vision und Deep Learning in Verbindung mit hochwertigen Sensoren wie Kameras und LiDARs haben ausgereifte Lösungen fĂŒr die visuelle Wahrnehmung hervorgebracht. Bisher stand die Fernfeldwahrnehmung im Vordergrund. Ein weiteres wichtiges Problem ist die begrenzte Rechenleistung, die fĂŒr die Entwicklung von Echtzeit-Anwendungen zur VerfĂŒgung steht. Aufgrund dieses Engpasses kommt es hĂ€ufig zu einem Kompromiss zwischen Leistung und Laufzeiteffizienz. Wir konzentrieren uns auf die folgenden Themen, um diese anzugehen: 1) Entwicklung von Nahfeld-Wahrnehmungsalgorithmen mit hoher Leistung und geringer RechenkomplexitĂ€t fĂŒr verschiedene visuelle Wahrnehmungsaufgaben wie geometrische und semantische Aufgaben unter Verwendung von faltbaren neuronalen Netzen. 2) Verwendung von Multi-Task-Learning zur Überwindung von RechenengpĂ€ssen durch die gemeinsame Nutzung von initialen Faltungsschichten zwischen den Aufgaben und die Entwicklung von Optimierungsstrategien, die die Aufgaben ausbalancieren.The formation of eyes led to the big bang of evolution. The dynamics changed from a primitive organism waiting for the food to come into contact for eating food being sought after by visual sensors. The human eye is one of the most sophisticated developments of evolution, but it still has defects. Humans have evolved a biological perception algorithm capable of driving cars, operating machinery, piloting aircraft, and navigating ships over millions of years. Automating these capabilities for computers is critical for various applications, including self-driving cars, augmented reality, and architectural surveying. Near-field visual perception in the context of self-driving cars can perceive the environment in a range of 0 - 10 meters and 360° coverage around the vehicle. It is a critical decision-making component in the development of safer automated driving. Recent advances in computer vision and deep learning, in conjunction with high-quality sensors such as cameras and LiDARs, have fueled mature visual perception solutions. Until now, far-field perception has been the primary focus. Another significant issue is the limited processing power available for developing real-time applications. Because of this bottleneck, there is frequently a trade-off between performance and run-time efficiency. We concentrate on the following issues in order to address them: 1) Developing near-field perception algorithms with high performance and low computational complexity for various visual perception tasks such as geometric and semantic tasks using convolutional neural networks. 2) Using Multi-Task Learning to overcome computational bottlenecks by sharing initial convolutional layers between tasks and developing optimization strategies that balance tasks

    Real-time indoor assistive localization with mobile omnidirectional vision and cloud GPU acceleration

    Full text link
    In this paper we propose a real-time assistive localization approach to help blind and visually impaired people in navigating an indoor environment. The system consists of a mobile vision front end with a portable panoramic lens mounted on a smart phone, and a remote image feature-based database of the scene on a GPU-enabled server. Compact and elective omnidirectional image features are extracted and represented in the smart phone front end, and then transmitted to the server in the cloud. These features of a short video clip are used to search the database of the indoor environment via image-based indexing to find the location of the current view within the database, which is associated with floor plans of the environment. A median-filter-based multi-frame aggregation strategy is used for single path modeling, and a 2D multi-frame aggregation strategy based on the candidates’ distribution densities is used for multi-path environmental modeling to provide a final location estimation. To deal with the high computational cost in searching a large database for a realistic navigation application, data parallelism and task parallelism properties are identified in the database indexing process, and computation is accelerated by using multi-core CPUs and GPUs. User-friendly HCI particularly for the visually impaired is designed and implemented on an iPhone, which also supports system configurations and scene modeling for new environments. Experiments on a database of an eight-floor building are carried out to demonstrate the capacity of the proposed system, with real-time response (14 fps) and robust localization results

    Enhancing 3D Visual Odometry with Single-Camera Stereo Omnidirectional Systems

    Full text link
    We explore low-cost solutions for efficiently improving the 3D pose estimation problem of a single camera moving in an unfamiliar environment. The visual odometry (VO) task -- as it is called when using computer vision to estimate egomotion -- is of particular interest to mobile robots as well as humans with visual impairments. The payload capacity of small robots like micro-aerial vehicles (drones) requires the use of portable perception equipment, which is constrained by size, weight, energy consumption, and processing power. Using a single camera as the passive sensor for the VO task satisfies these requirements, and it motivates the proposed solutions presented in this thesis. To deliver the portability goal with a single off-the-shelf camera, we have taken two approaches: The first one, and the most extensively studied here, revolves around an unorthodox camera-mirrors configuration (catadioptrics) achieving a stereo omnidirectional system (SOS). The second approach relies on expanding the visual features from the scene into higher dimensionalities to track the pose of a conventional camera in a photogrammetric fashion. The first goal has many interdependent challenges, which we address as part of this thesis: SOS design, projection model, adequate calibration procedure, and application to VO. We show several practical advantages for the single-camera SOS due to its complete 360-degree stereo views, that other conventional 3D sensors lack due to their limited field of view. Since our omnidirectional stereo (omnistereo) views are captured by a single camera, a truly instantaneous pair of panoramic images is possible for 3D perception tasks. Finally, we address the VO problem as a direct multichannel tracking approach, which increases the pose estimation accuracy of the baseline method (i.e., using only grayscale or color information) under the photometric error minimization as the heart of the “direct” tracking algorithm. Currently, this solution has been tested on standard monocular cameras, but it could also be applied to an SOS. We believe the challenges that we attempted to solve have not been considered previously with the level of detail needed for successfully performing VO with a single camera as the ultimate goal in both real-life and simulated scenes

    Visual Distortions in 360-degree Videos.

    Get PDF
    Omnidirectional (or 360°) images and videos are emergent signals being used in many areas, such as robotics and virtual/augmented reality. In particular, for virtual reality applications, they allow an immersive experience in which the user can interactively navigate through a scene with three degrees of freedom, wearing a head-mounted display. Current approaches for capturing, processing, delivering, and displaying 360° content, however, present many open technical challenges and introduce several types of distortions in the visual signal. Some of the distortions are specific to the nature of 360° images and often differ from those encountered in classical visual communication frameworks. This paper provides a first comprehensive review of the most common visual distortions that alter 360° signals going through the different processing elements of the visual communication pipeline. While their impact on viewers' visual perception and the immersive experience at large is still unknown-thus, it is an open research topic-this review serves the purpose of proposing a taxonomy of the visual distortions that can be encountered in 360° signals. Their underlying causes in the end-to-end 360° content distribution pipeline are identified. This taxonomy is essential as a basis for comparing different processing techniques, such as visual enhancement, encoding, and streaming strategies, and allowing the effective design of new algorithms and applications. It is also a useful resource for the design of psycho-visual studies aiming to characterize human perception of 360° content in interactive and immersive applications
    • 

    corecore