    Data2MV - A user behaviour dataset for multi-view scenarios

    The Data2MV dataset contains gaze fixation data obtained experimentally from 45 participants using an Intel RealSense F200 camera module and seven different video playlists. Each playlist lasted approximately 20 minutes and was viewed at least 17 times, with raw tracking data recorded at 0.05-second intervals. The dataset comprises 1,000,845 gaze fixations gathered across 128 experiments. It also includes 68,393 image frames, extracted from each of the 6 videos selected for these experiments, and an equal number of saliency maps generated from aggregate fixation data. Software tools to obtain saliency maps and generate complementary plots are also provided as an open-source software package. The Data2MV dataset was publicly released to the research community on Mendeley Data and helps reduce the current scarcity of such data, particularly in immersive, multi-view streaming scenarios.
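    The step from aggregate fixation data to a saliency map can be illustrated with a minimal sketch. It is not the dataset's released tooling: the function names, the grid size and the Gaussian width `sigma` are assumptions chosen for illustration. Fixations are counted per pixel, smoothed with a separable Gaussian, and scaled to the range [0, 1].

```python
import math

def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(grid, kernel):
    """Convolve every row with the kernel; cells outside the grid count as zero."""
    radius = len(kernel) // 2
    width = len(grid[0])
    out = []
    for row in grid:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = x + k - radius
                if 0 <= xx < width:
                    acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out

def transpose(grid):
    """Swap rows and columns of a rectangular grid."""
    return [list(col) for col in zip(*grid)]

def saliency_map(fixations, width, height, sigma=2.0):
    """Accumulate (x, y) fixations into a grid, blur it, and scale to [0, 1]."""
    grid = [[0.0] * width for _ in range(height)]
    for x, y in fixations:
        col, row = int(x), int(y)
        if 0 <= col < width and 0 <= row < height:
            grid[row][col] += 1.0
    kernel = gaussian_kernel(sigma)
    # Separable blur: horizontal pass, then vertical pass via transposition.
    blurred = transpose(blur_rows(transpose(blur_rows(grid, kernel)), kernel))
    peak = max(max(r) for r in blurred)
    if peak == 0:
        return blurred
    return [[v / peak for v in r] for r in blurred]

# Hypothetical fixations on a 40x20 frame (pixel coordinates).
fixations = [(10, 5), (10, 5), (11, 5), (30, 15)]
smap = saliency_map(fixations, width=40, height=20, sigma=2.0)
```

    In this toy input the most frequently fixated pixel becomes the brightest point of the map, while the isolated fixation produces a weaker blob.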

    Learning the surroundings: 3D scene understanding from omnidirectional images

    Neural networks have become widespread and are used in many different applications. These methods can recognize music and audio, generate full texts from simple ideas, and obtain detailed and relevant information from images and videos. The possibilities offered by neural networks and deep learning methods are countless, and they have become the main tool for research and for new applications in daily life. At the same time, omnidirectional and 360° images are becoming widespread in industry and in consumer society, drawing attention to omnidirectional computer vision. A 360° image captures all the information surrounding the camera in a single shot. The combination of deep learning methods and omnidirectional computer vision has attracted many researchers to this new field. From a single omnidirectional image, we obtain enough information about the environment for a neural network to understand its surroundings and interact with them. For applications such as navigation and autonomous driving, omnidirectional cameras provide information all around the robot, person or vehicle, whereas conventional perspective cameras lack this context information because of their narrow field of view. Although some applications can include several conventional cameras to increase the system's field of view, in tasks where weight matters (e.g. guidance of visually impaired people or navigation of autonomous drones), the fewer cameras needed, the better. In this thesis, we focus on the joint use of omnidirectional cameras, deep learning, geometry and photometric methods. We evaluate different approaches to handling omnidirectional images, adapting previous methods to the distortion of omnidirectional projection models and proposing new solutions to tackle the challenges of this kind of image. For indoor scene understanding, we propose a novel neural network that jointly obtains semantic segmentation and depth maps from a single equirectangular panorama. With a new convolutional approach, our network leverages the context information provided by the panoramic image and exploits the combined information of semantics and depth. On the same topic, we combine deep learning and geometric solvers to recover the scaled structural layout of indoor environments from a single non-central panorama. This combination provides a fast implementation, thanks to the learning approach, and accurate results, thanks to the geometric solvers. Additionally, we propose several approaches for adapting networks to the distortion of omnidirectional projection models, for outdoor navigation and for domain adaptation of previous solutions. All in all, this thesis seeks novel and innovative solutions that take advantage of omnidirectional cameras while overcoming the challenges they pose.

    Design and Analysis of a Single-Camera Omnistereo Sensor for Quadrotor Micro Aerial Vehicles (MAVs)

    We describe the design and 3D sensing performance of an omnidirectional stereo (omnistereo) vision system applied to Micro Aerial Vehicles (MAVs). The proposed omnistereo sensor employs a monocular camera that is co-axially aligned with a pair of hyperboloidal mirrors (a vertically-folded catadioptric configuration). We show that this arrangement provides a compact solution for omnidirectional 3D perception when mounted on top of propeller-based MAVs, which cannot carry large payloads. The theoretical single viewpoint (SVP) constraint helps us derive analytical solutions for the sensor's projective geometry and generate SVP-compliant panoramic images to compute 3D information from stereo correspondences (in a truly synchronous fashion). We perform an extensive analysis of various system characteristics, such as its size, catadioptric spatial resolution, and field of view. In addition, we propose a probabilistic model for estimating the uncertainty of 3D information obtained by triangulating back-projected rays. We validate the projection error of the design using both synthetic and real-life images against ground-truth data. Qualitatively, we show 3D point clouds (dense and sparse) obtained from a single image captured in a real-life experiment. We expect our sensor to be reproducible, as its model parameters can be optimized to suit other catadioptric-based omnistereo vision systems under different circumstances.
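    The triangulation of back-projected rays mentioned above can be sketched as follows. This is a generic midpoint method, assumed here for illustration and not taken from the paper; the function names and the example viewpoints are hypothetical. Each ray starts at one of the two mirror viewpoints, and the 3D point is taken as the midpoint of the shortest segment joining the two rays.

```python
import math

def dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(p1, d1, p2, d2, eps=1e-12):
    """Return (midpoint, gap) for the rays p1 + s*d1 and p2 + t*d2.

    gap is the length of the shortest segment between the rays, a simple
    indicator of how well the two back-projected rays agree.
    """
    w = tuple(u - v for u, v in zip(p1, p2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        raise ValueError("rays are (nearly) parallel; depth is not observable")
    # Closed-form minimizer of |(p1 + s*d1) - (p2 + t*d2)|^2.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = tuple(p + s * v for p, v in zip(p1, d1))
    q2 = tuple(p + t * v for p, v in zip(p2, d2))
    midpoint = tuple((u + v) / 2.0 for u, v in zip(q1, q2))
    return midpoint, math.dist(q1, q2)

# Hypothetical viewpoints stacked along the vertical axis, as in a folded design.
lower_viewpoint = (0.0, 0.0, 0.0)
upper_viewpoint = (0.0, 0.0, 1.0)
target = (2.0, 0.0, 0.5)
ray_lower = tuple(t - p for t, p in zip(target, lower_viewpoint))
ray_upper = tuple(t - p for t, p in zip(target, upper_viewpoint))
point, gap = triangulate_midpoint(lower_viewpoint, ray_lower,
                                  upper_viewpoint, ray_upper)
```

    With noise-free rays the gap is zero and the midpoint coincides with the target. With real correspondences the gap grows, which is one reason an explicit uncertainty model is useful.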

    Coordinated Sensor-Based Area Coverage and Cooperative Localization of a Heterogeneous Fleet of Autonomous Surface Vessels (ASVs)

    Sensor coverage with fleets of robots is a complex task requiring solutions to localization, communication, navigation and basic sensor coverage. Sensor coverage of large areas is a problem that occurs in a variety of environments, from terrestrial to aerial to aquatic. In this thesis we consider the aquatic version of the problem. Given a known aquatic environment and a collection of aquatic surface vehicles with known kinematic and dynamic constraints, how can a fleet of vehicles be deployed to provide sensor coverage of the surface of the body of water? Rather than considering this problem in general, in this work we consider it for a specific fleet consisting of one very well equipped robot aided by a number of smaller, less well equipped devices that must operate in close proximity to the main robot. A boustrophedon decomposition algorithm is developed that incorporates the motion, sensing and communication constraints imposed by the autonomous fleet. Solving the coverage problem leads to a localization/communication problem. A critical problem for a group of autonomous vehicles is ensuring that the collection operates within a common reference frame. Here we consider the problem of localizing a heterogeneous collection of aquatic surface vessels within a global reference frame. We assume that one vessel -- the mother robot -- has access to global position data of high accuracy, while the other vessels -- the child robots -- utilize limited onboard sensors and sophisticated sensors on board the mother robot to localize themselves. This thesis provides details of the design of the elements of the heterogeneous fleet, including the sensors and sensing algorithms, along with the communication strategy used to localize all elements of the fleet within a global reference frame. Details of the robot platforms to be used in implementing a solution are also described. 
Simulation of the approach is used to demonstrate the effectiveness of the algorithm, and the algorithm and its components are evaluated using a fleet of ASVs.
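    The back-and-forth sweep at the core of boustrophedon coverage can be sketched for a single obstacle-free rectangular cell. This is a simplified illustration, not the thesis algorithm: the cell decomposition itself and the fleet's motion, sensing and communication constraints are omitted, and the names `boustrophedon_waypoints` and `swath` (sensor footprint width) are assumptions.

```python
import math

def boustrophedon_waypoints(width, height, swath):
    """Return lawnmower waypoints covering a width x height rectangle.

    Lanes run parallel to the x axis. The lane spacing never exceeds the
    sensor swath, so adjacent sweeps leave no uncovered gap.
    """
    if width <= 0 or height <= 0 or swath <= 0:
        raise ValueError("width, height and swath must be positive")
    lanes = math.ceil(height / swath)
    spacing = height / lanes
    waypoints = []
    for i in range(lanes):
        y = (i + 0.5) * spacing
        # Alternate the sweep direction on every lane.
        start_x, end_x = (0.0, width) if i % 2 == 0 else (width, 0.0)
        waypoints.append((start_x, y))
        waypoints.append((end_x, y))
    return waypoints

# Hypothetical 100 m x 30 m cell swept with a 10 m sensor swath.
path = boustrophedon_waypoints(100.0, 30.0, 10.0)
```

    In a full decomposition, one such sweep would be generated per cell and the cells would then be visited in an order that respects the fleet's constraints.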

    Method for automatic image registration based on distance-dependent planar projective transformations, oriented to images without common features

    Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Ciencias Físicas, Departamento de Arquitectura de Computadores y Automática, defended on 18-12-2015. Multisensory data fusion oriented to image-based applications improves the accuracy, quality and availability of the data and, consequently, the performance of robotic systems. It combines the information of a scene acquired from multiple, different sources into a unified representation of the 3D world scene that is more informative for subsequent image processing, improving either reliability, by using redundant information, or capability, by taking advantage of complementary information. Image registration is one of the most relevant steps in image fusion techniques. This procedure aims at the geometrical alignment of two or more images. Normally, the process relies on feature-matching techniques, which is a drawback when combining sensors that cannot deliver common features. For instance, when combining low-resolution Time-of-Flight (ToF) cameras with high-resolution colour (RGB) cameras, robust feature matching is not reliable. Typically, the fusion of these two sensors has been addressed by computing the cameras' calibration parameters for the coordinate transformation between them. As a result, a low-resolution colour depth map is obtained. To improve the resolution of these maps and reduce the loss of colour information, extrapolation techniques are adopted. A crucial issue for computing high-quality, accurate dense maps is the presence of noise in the depth measurements from the ToF camera, which is normally reduced by means of sensor calibration and filtering techniques. However, the filtering methods used for data extrapolation and denoising usually over-smooth the data, consequently reducing the accuracy of the registration procedure...
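    The conventional calibration-based fusion described above (not the method proposed in the thesis) can be sketched as a three-step mapping: back-project a ToF pixel using its depth, apply the rigid transformation between the cameras, and project into the RGB image. All intrinsics, the rotation `R` and the translation `t` below are hypothetical values, and an ideal pinhole model without lens distortion is assumed.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift a pixel with known depth to a 3D point in the camera frame."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

def rigid_transform(point, R, t):
    """Apply X' = R * X + t with R as a 3x3 nested list and t as a 3-tuple."""
    return tuple(sum(R[i][j] * point[j] for j in range(3)) + t[i]
                 for i in range(3))

def project(point, fx, fy, cx, cy):
    """Project a 3D point in the camera frame to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)

def tof_pixel_to_rgb(u, v, depth, tof_K, rgb_K, R, t):
    """Map one ToF pixel (with depth in metres) to RGB pixel coordinates."""
    point_tof = backproject(u, v, depth, *tof_K)
    point_rgb = rigid_transform(point_tof, R, t)
    return project(point_rgb, *rgb_K)

# Hypothetical calibration: (fx, fy, cx, cy) per camera, 5 cm horizontal baseline.
tof_K = (200.0, 200.0, 80.0, 60.0)
rgb_K = (1000.0, 1000.0, 640.0, 480.0)
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = (0.05, 0.0, 0.0)

far_pixel = tof_pixel_to_rgb(80.0, 60.0, 2.0, tof_K, rgb_K, R, t)
near_pixel = tof_pixel_to_rgb(80.0, 60.0, 1.0, tof_K, rgb_K, R, t)
```

    The same ToF pixel lands at different RGB columns for the two depths, which shows why the mapping between the two images depends on distance and why noisy depth values degrade the registration.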

    A Vision Guided Method Using Omnidirectional Camera

    Omnidirectional cameras are widely used in robotics to capture images in all directions, and many studies make use of a 360° view. In this project, a new guided localization method using omnidirectional cameras is developed. Two omnidirectional cameras with 360° vision capture images facing upward and downward, acting together like a global vision camera that captures a 720° image. Using the omnidirectional cameras and local cues, a robotic platform can determine its current position and use it for onward navigation. An image-based algorithm is developed to overcome the scaling and distortion of objects within the surrounding image, and to determine the platform's exact location with reference to a pre-determined reference marker. The system also attempts to reduce the distortion of the images captured by the two omnidirectional cameras. The project involves hardware and software integration, with emphasis on solving the localization problem. The results of this study show that the images from the two cameras can be processed to form a 720° image and that coloured object detection can be performed.

    Design And Implementation Of An Omnidirectional Mobile Robot Platform

    Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2008. In this study, an omnidirectional mobile robot was designed and implemented with sufficient processing power, sensory units and communication facilities to serve as an application development platform for a wide range of academic research in robotics. The base plane of the robot is attached to two differential drive platforms, giving the base four degrees of freedom. With proper control of the differential drive platforms, the robot can move in any direction, which makes it omnidirectional. A method that reduces odometric errors and makes accurate odometry-based positioning possible is also presented; it exploits the geometrical advantages particular to the robot's mechanical design. The hardware on the moving base consists of batteries, a camera moving in three axes, a dual-core DSP system, a Linux-based control card, a wireless network and video connection, a graphical LCD, and a laser pointer moving in two axes. An algorithm that uses the laser and the camera to obtain three-dimensional distance measurements, forming a range finder developed specifically for this study, was also derived.
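    For context on odometry-based positioning, the dead-reckoning update of a single differential drive unit can be sketched as below. This is the textbook update, assumed here for illustration. It is not the error-reduction method of the thesis, which relies on the geometry of the two coupled platforms. The name `diff_drive_step` and the wheel spacing value are hypothetical.

```python
import math

def diff_drive_step(pose, d_left, d_right, track_width):
    """Advance an (x, y, theta) pose given the distance travelled by each wheel.

    Uses the midpoint-heading approximation, which is accurate for the small
    increments delivered by wheel encoders between two samples.
    """
    x, y, theta = pose
    d_center = (d_left + d_right) / 2.0
    d_theta = (d_right - d_left) / track_width
    heading = theta + d_theta / 2.0
    x += d_center * math.cos(heading)
    y += d_center * math.sin(heading)
    # Wrap the heading to [-pi, pi).
    theta = (theta + d_theta + math.pi) % (2.0 * math.pi) - math.pi
    return (x, y, theta)

# Hypothetical unit with 0.4 m between the wheels.
straight = diff_drive_step((0.0, 0.0, 0.0), 1.0, 1.0, 0.4)
spin = diff_drive_step((0.0, 0.0, 0.0), -0.1, 0.1, 0.4)
```

    Because every step adds encoder error to the previous pose, the estimate drifts over time, which is the motivation for error-reduction schemes such as the one presented in the thesis.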

    Real-Time Multi-Fisheye Camera Self-Localization and Egomotion Estimation in Complex Indoor Environments

    In this work, a real-time capable multi-fisheye camera self-localization and egomotion estimation framework is developed. The thesis covers all aspects, ranging from omnidirectional camera calibration to the development of a complete multi-fisheye camera SLAM system based on a generic multi-camera bundle adjustment method.

    Structureless Camera Motion Estimation of Unordered Omnidirectional Images

    This work aims at providing a novel camera motion estimation pipeline for large collections of unordered omnidirectional images. In order to keep the pipeline as general and flexible as possible, cameras are modelled as unit spheres, which allows any central camera type to be incorporated. For each camera, an unprojection lookup called a P2S-map (Pixel-to-Sphere-map) is generated from the intrinsics, mapping pixels to their corresponding positions on the unit sphere. Consequently, the camera geometry becomes independent of the underlying projection model. The pipeline also generates P2S-maps from world map projections with fewer distortion effects, as known from cartography. Using P2S-maps from camera calibration and world map projection makes it possible to convert omnidirectional camera images to an appropriate world map projection in order to apply standard feature extraction and matching algorithms for data association. The proposed estimation pipeline combines the flexibility of SfM (Structure from Motion) - which handles unordered image collections - with the efficiency of PGO (Pose Graph Optimization), which is used as the back-end in graph-based Visual SLAM (Simultaneous Localization and Mapping) approaches to optimize camera poses from large image sequences. SfM uses BA (Bundle Adjustment) to jointly optimize camera poses (motion) and 3D feature locations (structure), which becomes computationally expensive for large-scale scenarios. In contrast, PGO solves for camera poses (motion) from measured transformations between cameras, keeping the optimization manageable. The proposed estimation algorithm combines both worlds. It obtains up-to-scale transformations between image pairs using two-view constraints, which are jointly scaled using trifocal constraints. A pose graph is generated from the scaled two-view transformations and solved by PGO to obtain camera motion efficiently, even for large image collections. 
The results can serve as initial pose estimates for subsequent 3D reconstruction, e.g. to build a sparse structure from feature correspondences in an SfM or SLAM framework with further refinement via BA. The pipeline also incorporates fixed extrinsic constraints from multi-camera setups as well as depth information provided by RGBD sensors. The entire camera motion estimation pipeline does not need to generate a sparse 3D structure of the captured environment and is therefore called SCME (Structureless Camera Motion Estimation).
Table of contents:
1 Introduction 1.1 Motivation 1.1.1 Increasing Interest of Image-Based 3D Reconstruction 1.1.2 Underground Environments as Challenging Scenario 1.1.3 Improved Mobile Camera Systems for Full Omnidirectional Imaging 1.2 Issues 1.2.1 Directional versus Omnidirectional Image Acquisition 1.2.2 Structure from Motion versus Visual Simultaneous Localization and Mapping 1.3 Contribution 1.4 Structure of this Work
2 Related Work 2.1 Visual Simultaneous Localization and Mapping 2.1.1 Visual Odometry 2.1.2 Pose Graph Optimization 2.2 Structure from Motion 2.2.1 Bundle Adjustment 2.2.2 Structureless Bundle Adjustment 2.3 Corresponding Issues 2.4 Proposed Reconstruction Pipeline
3 Cameras and Pixel-to-Sphere Mappings with P2S-Maps 3.1 Types 3.2 Models 3.2.1 Unified Camera Model 3.2.2 Polynomial Camera Model 3.2.3 Spherical Camera Model 3.3 P2S-Maps - Mapping onto Unit Sphere via Lookup Table 3.3.1 Lookup Table as Color Image 3.3.2 Lookup Interpolation 3.3.3 Depth Data Conversion
4 Calibration 4.1 Overview of Proposed Calibration Pipeline 4.2 Target Detection 4.3 Intrinsic Calibration 4.3.1 Selected Examples 4.4 Extrinsic Calibration 4.4.1 3D-2D Pose Estimation 4.4.2 2D-2D Pose Estimation 4.4.3 Pose Optimization 4.4.4 Uncertainty Estimation 4.4.5 PoseGraph Representation 4.4.6 Bundle Adjustment 4.4.7 Selected Examples
5 Full Omnidirectional Image Projections 5.1 Panoramic Image Stitching 5.2 World Map Projections 5.3 World Map Projection Generator for P2S-Maps 5.4 Conversion between Projections based on P2S-Maps 5.4.1 Proposed Workflow 5.4.2 Data Storage Format 5.4.3 Real World Example
6 Relations between Two Camera Spheres 6.1 Forward and Backward Projection 6.2 Triangulation 6.2.1 Linear Least Squares Method 6.2.2 Alternative Midpoint Method 6.3 Epipolar Geometry 6.4 Transformation Recovery from Essential Matrix 6.4.1 Cheirality 6.4.2 Standard Procedure 6.4.3 Simplified Procedure 6.4.4 Improved Procedure 6.5 Two-View Estimation 6.5.1 Evaluation Strategy 6.5.2 Error Metric 6.5.3 Evaluation of Estimation Algorithms 6.5.4 Concluding Remarks 6.6 Two-View Optimization 6.6.1 Epipolar-Based Error Distances 6.6.2 Projection-Based Error Distances 6.6.3 Comparison between Error Distances 6.7 Two-View Translation Scaling 6.7.1 Linear Least Squares Estimation 6.7.2 Non-Linear Least Squares Optimization 6.7.3 Comparison between Initial and Optimized Scaling Factor 6.8 Homography to Identify Degeneracies 6.8.1 Homography for Spherical Cameras 6.8.2 Homography Estimation 6.8.3 Homography Optimization 6.8.4 Homography and Pure Rotation 6.8.5 Homography in Epipolar Geometry
7 Relations between Three Camera Spheres 7.1 Three View Geometry 7.2 Crossing Epipolar Planes Geometry 7.3 Trifocal Geometry 7.4 Relation between Trifocal, Three-View and Crossing Epipolar Planes 7.5 Translation Ratio between Up-To-Scale Two-View Transformations 7.5.1 Structureless Determination Approaches 7.5.2 Structure-Based Determination Approaches 7.5.3 Comparison between Proposed Approaches
8 Pose Graphs 8.1 Optimization Principle 8.2 Solvers 8.2.1 Additional Graph Solvers 8.2.2 False Loop Closure Detection 8.3 Pose Graph Generation 8.3.1 Generation of Synthetic Pose Graph Data 8.3.2 Optimization of Synthetic Pose Graph Data
9 Structureless Camera Motion Estimation 9.1 SCME Pipeline 9.2 Determination of Two-View Translation Scale Factors 9.3 Integration of Depth Data 9.4 Integration of Extrinsic Camera Constraints
10 Camera Motion Estimation Results 10.1 Directional Camera Images 10.2 Omnidirectional Camera Images
11 Conclusion 11.1 Summary 11.2 Outlook and Future Work
Appendices A.1 Additional Extrinsic Calibration Results A.2 Linear Least Squares Scaling A.3 Proof Rank Deficiency A.4 Alternative Derivation Midpoint Method A.5 Simplification of Depth Calculation A.6 Relation between Epipolar and Circumferential Constraint A.7 Covariance Estimation A.8 Uncertainty Estimation from Epipolar Geometry A.9 Two-View Scaling Factor Estimation: Uncertainty Estimation A.10 Two-View Scaling Factor Optimization: Uncertainty Estimation A.11 Depth from Adjoining Two-View Geometries A.12 Alternative Three-View Derivation A.12.1 Second Derivation Approach A.12.2 Third Derivation Approach A.13 Relation between Trifocal Geometry and Alternative Midpoint Method A.14 Additional Pose Graph Generation Examples A.15 Pose Graph Solver Settings A.16 Additional Pose Graph Optimization Examples
Bibliography
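The idea of a P2S-map, a per-pixel lookup of positions on the unit sphere, can be sketched for one concrete case. The example assumes an equirectangular image and an axis convention (x right, y up, z forward, pixel centres at half-integer coordinates) chosen purely for illustration; the thesis's own conventions and storage format are not reproduced here, and the function names are hypothetical.

```python
import math

def build_p2s_map(width, height):
    """Return a height x width table of unit vectors for an equirectangular image."""
    table = []
    for v in range(height):
        lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
        row = []
        for u in range(width):
            lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
            row.append((math.cos(lat) * math.sin(lon),
                        math.sin(lat),
                        math.cos(lat) * math.cos(lon)))
        table.append(row)
    return table

def sphere_to_pixel(direction, width, height):
    """Inverse mapping: continuous pixel coordinates of a unit direction."""
    x, y, z = direction
    lon = math.atan2(x, z)
    lat = math.asin(max(-1.0, min(1.0, y)))
    u = (lon + math.pi) / (2.0 * math.pi) * width
    v = (math.pi / 2.0 - lat) / math.pi * height
    return (u, v)

# A tiny 8x4 lookup table; real images would use the full sensor resolution.
p2s = build_p2s_map(8, 4)
u_back, v_back = sphere_to_pixel(p2s[1][5], 8, 4)
```

Once every camera is represented by such a table, later geometry can work with unit vectors only, whatever projection produced the image.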