316 research outputs found

    An intelligent robotic vision system with environment perception

    Get PDF
    Ever since the dawn of computer vision[1, 2], 3D environment reconstruction and object 6D pose estimation have been a core problem. This thesis attempts to develop a novel 3D intelligent robotic vision system integrating environment reconstruction and object detection techniques to solve practical problems. Chapter 2 reviews current state-of-the art of 3D vision techniques from environment reconstruction and 6D pose estimation.In Chapter 3 a novel environment reconstruction system is proposed by using coloured point clouds. The evaluation experiment indicates that the proposed algorithm 2 is effective for small-scale and large scale and textureless scenes. Chapter 4 presents Image-6D (that is section 4.2), a learning-based object pose estimation algorithm from a single RGB image. Contour-alignment is introduced as an efficient algorithm for pose refinement in an RGB image. This new method is evaluated on two widely used benchmark image data bases, LINEMOD and Occlusion-LINEMOD. Experiments show that the proposed method surpasses other state-of-the-art RGB based prediction approaches. Chapter 5 describes Point-6D (defined in section 5.2), a novel 6D pose estimation method using coloured point clouds as input. The performance of this new method is demonstrated on LineMOD [3] and YCB-Video [4] dataset. Chapter 6 summarizes contributions and discusses potential future research directions. In addition, we presents an intelligent 3D robotic vision system deployed in a simulated/laboratory nuclear waste disposal scenario in Appendices B. To verify the results, a simulated nuclear waste handling experiment has been successfully completed via the proposed robotic system

    Visual / acoustic detection and localisation in embedded systems

    Get PDF
    ©Cranfield UniversityThe continuous miniaturisation of sensing and processing technologies is increasingly offering a variety of embedded platforms, enabling the accomplishment of a broad range of tasks using such systems. Motivated by these advances, this thesis investigates embedded detection and localisation solutions using vision and acoustic sensors. Focus is particularly placed on surveillance applications using sensor networks. Existing vision-based detection solutions for embedded systems suffer from the sensitivity to environmental conditions. In the literature, there seems to be no algorithm able to simultaneously tackle all the challenges inherent to real-world videos. Regarding the acoustic modality, many research works have investigated acoustic source localisation solutions in distributed sensor networks. Nevertheless, it is still a challenging task to develop an ecient algorithm that deals with the experimental issues, to approach the performance required by these systems and to perform the data processing in a distributed and robust manner. The movement of scene objects is generally accompanied with sound emissions with features that vary from an environment to another. Therefore, considering the combination of the visual and acoustic modalities would offer a significant opportunity for improving the detection and/or localisation using the described platforms. In the light of the described framework, we investigate in the first part of the thesis the use of a cost-effective visual based method that can deal robustly with the issue of motion detection in static, dynamic and moving background conditions. For motion detection in static and dynamic backgrounds, we present the development and the performance analysis of a spatio- temporal form of the Gaussian mixture model. On the other hand, the problem of motion detection in moving backgrounds is addressed by accounting for registration errors in the captured images. By adopting a robust optimisation technique that takes into account the uncertainty about the visual measurements, we show that high detection accuracy can be achieved. In the second part of this thesis, we investigate solutions to the problem of acoustic source localisation using a trust region based optimisation technique. The proposed method shows an overall higher accuracy and convergence improvement compared to a linear-search based method. More importantly, we show that through characterising the errors in measurements, which is a common problem for such platforms, higher accuracy in the localisation can be attained. The last part of this work studies the different possibilities of combining visual and acoustic information in a distributed sensors network. In this context, we first propose to include the acoustic information in the visual model. The obtained new augmented model provides promising improvements in the detection and localisation processes. The second investigated solution consists in the fusion of the measurements coming from the different sensors. An evaluation of the accuracy of localisation and tracking using a centralised/decentralised architecture is conducted in various scenarios and experimental conditions. Results have shown the capability of this fusion approach to yield higher accuracy in the localisation and tracking of an active acoustic source than by using a single type of data

    Exploiting Structural Regularities and Beyond: Vision-based Localization and Mapping in Man-Made Environments

    Get PDF
    Image-based estimation of camera motion, known as visual odometry (VO), plays a very important role in many robotic applications such as control and navigation of unmanned mobile robots, especially when no external navigation reference signal is available. The core problem of VO is the estimation of the camera’s ego-motion (i.e. tracking) either between successive frames, namely relative pose estimation, or with respect to a global map, namely absolute pose estimation. This thesis aims to develop efficient, accurate and robust VO solutions by taking advantage of structural regularities in man-made environments, such as piece-wise planar structures, Manhattan World and more generally, contours and edges. Furthermore, to handle challenging scenarios that are beyond the limits of classical sensor based VO solutions, we investigate a recently emerging sensor — the event camera and study on event-based mapping — one of the key problems in the event-based VO/SLAM. The main achievements are summarized as follows. First, we revisit an old topic on relative pose estimation: accurately and robustly estimating the fundamental matrix given a collection of independently estimated homograhies. Three classical methods are reviewed and then we show a simple but nontrivial two-step normalization within the direct linear method that achieves similar performance to the less attractive and more computationally intensive hallucinated points based method. Second, an efficient 3D rotation estimation algorithm for depth cameras in piece-wise planar environments is presented. It shows that by using surface normal vectors as an input, planar modes in the corresponding density distribution function can be discovered and continuously tracked using efficient non-parametric estimation techniques. The relative rotation can be estimated by registering entire bundles of planar modes by using robust L1-norm minimization. Third, an efficient alternative to the iterative closest point algorithm for real-time tracking of modern depth cameras in ManhattanWorlds is developed. We exploit the common orthogonal structure of man-made environments in order to decouple the estimation of the rotation and the three degrees of freedom of the translation. The derived camera orientation is absolute and thus free of long-term drift, which in turn benefits the accuracy of the translation estimation as well. Fourth, we look into a more general structural regularity—edges. A real-time VO system that uses Canny edges is proposed for RGB-D cameras. Two novel alternatives to classical distance transforms are developed with great properties that significantly improve the classical Euclidean distance field based methods in terms of efficiency, accuracy and robustness. Finally, to deal with challenging scenarios that go beyond what standard RGB/RGB-D cameras can handle, we investigate the recently emerging event camera and focus on the problem of 3D reconstruction from data captured by a stereo event-camera rig moving in a static scene, such as in the context of stereo Simultaneous Localization and Mapping

    Analysis and Exploitation of Automatically Generated Scene Structure from Aerial Imagery

    Get PDF
    The recent advancements made in the field of computer vision, along with the ever increasing rate of computational power has opened up opportunities in the field of automated photogrammetry. Many researchers have focused on using these powerful computer vision algorithms to extract three-dimensional point clouds of scenes from multi-view imagery, with the ultimate goal of creating a photo-realistic scene model. However, geographically accurate three-dimensional scene models have the potential to be exploited for much more than just visualization. This work looks at utilizing automatically generated scene structure from near-nadir aerial imagery to identify and classify objects within the structure, through the analysis of spatial-spectral information. The limitation to this type of imagery is imposed due to the common availability of this type of aerial imagery. Popular third-party computer-vision algorithms are used to generate the scene structure. A voxel-based approach for surface estimation is developed using Manhattan-world assumptions. A surface estimation confidence metric is also presented. This approach provides the basis for further analysis of surface materials, incorporating spectral information. Two cases of spectral analysis are examined: when additional hyperspectral imagery of the reconstructed scene is available, and when only R,G,B spectral information can be obtained. A method for registering the surface estimation to hyperspectral imagery, through orthorectification, is developed. Atmospherically corrected hyperspectral imagery is used to assign reflectance values to estimated surface facets for physical simulation with DIRSIG. A spatial-spectral region growing-based segmentation algorithm is developed for the R,G,B limited case, in order to identify possible materials for user attribution. Finally, an analysis of the geographic accuracy of automatically generated three-dimensional structure is performed. An end-to-end, semi-automated, workflow is developed, described, and made available for use

    Investigation of Intensity Correction in the Context of Image Registration

    Get PDF
    An image registration algorithm with intensity correction was developed. A particular goal was to apply intensity correction instead of using multimodal similarity measures. The algorithm utilises common Levenberg-Marquardt optimisation. The author has chosen two dimensional affine and one dimensional B-Spline model as spatial transformation, as well as intensity correction models specific to CT images. They are global non-linear mapping and smooth local affine correction. The algorithm was tested experimentally using a wide class of simulated images and a limited class of medical images. Affine registration works properly even for deformations which exceed typical deformation encountered in medical practice. B-Spline registration works properly for small deformations and requires further development to increase capture range. The idea of separating intensity correction mapping from similarity measure is shown to have advantages. Choosing intensity correction model can make the registration algorithm specific to the image class of interest

    Advances in Simultaneous Localization and Mapping in Confined Underwater Environments Using Sonar and Optical Imaging.

    Full text link
    This thesis reports on the incorporation of surface information into a probabilistic simultaneous localization and mapping (SLAM) framework used on an autonomous underwater vehicle (AUV) designed for underwater inspection. AUVs operating in cluttered underwater environments, such as ship hulls or dams, are commonly equipped with Doppler-based sensors, which---in addition to navigation---provide a sparse representation of the environment in the form of a three-dimensional (3D) point cloud. The goal of this thesis is to develop perceptual algorithms that take full advantage of these sparse observations for correcting navigational drift and building a model of the environment. In particular, we focus on three objectives. First, we introduce a novel representation of this 3D point cloud as collections of planar features arranged in a factor graph. This factor graph representation probabalistically infers the spatial arrangement of each planar segment and can effectively model smooth surfaces (such as a ship hull). Second, we show how this technique can produce 3D models that serve as input to our pipeline that produces the first-ever 3D photomosaics using a two-dimensional (2D) imaging sonar. Finally, we propose a model-assisted bundle adjustment (BA) framework that allows for robust registration between surfaces observed from a Doppler sensor and visual features detected from optical images. Throughout this thesis, we show methods that produce 3D photomosaics using a combination of triangular meshes (derived from our SLAM framework or given a-priori), optical images, and sonar images. Overall, the contributions of this thesis greatly increase the accuracy, reliability, and utility of in-water ship hull inspection with AUVs despite the challenges they face in underwater environments. We provide results using the Hovering Autonomous Underwater Vehicle (HAUV) for autonomous ship hull inspection, which serves as the primary testbed for the algorithms presented in this thesis. The sensor payload of the HAUV consists primarily of: a Doppler velocity log (DVL) for underwater navigation and ranging, monocular and stereo cameras, and---for some applications---an imaging sonar.PhDElectrical Engineering: SystemsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/120750/1/paulozog_1.pd

    Efficient Dense Registration, Segmentation, and Modeling Methods for RGB-D Environment Perception

    Get PDF
    One perspective for artificial intelligence research is to build machines that perform tasks autonomously in our complex everyday environments. This setting poses challenges to the development of perception skills: A robot should be able to perceive its location and objects in its surrounding, while the objects and the robot itself could also be moving. Objects may not only be composed of rigid parts, but could be non-rigidly deformable or appear in a variety of similar shapes. Furthermore, it could be relevant to the task to observe object semantics. For a robot acting fluently and immediately, these perception challenges demand efficient methods. This theses presents novel approaches to robot perception with RGB-D sensors. It develops efficient registration, segmentation, and modeling methods for scene and object perception. We propose multi-resolution surfel maps as a concise representation for RGB-D measurements. We develop probabilistic registration methods that handle rigid scenes, scenes with multiple rigid parts that move differently, and scenes that undergo non-rigid deformations. We use these methods to learn and perceive 3D models of scenes and objects in both static and dynamic environments. For learning models of static scenes, we propose a real-time capable simultaneous localization and mapping approach. It aligns key views in RGB-D video using our rigid registration method and optimizes the pose graph of the key views. The acquired models are then perceived in live images through detection and tracking within a Bayesian filtering framework. An assumption frequently made for environment mapping is that the observed scene remains static during the mapping process. Through rigid multi-body registration, we take advantage of releasing this assumption: Our registration method segments views into parts that move independently between the views and simultaneously estimates their motion. Within simultaneous motion segmentation, localization, and mapping, we separate scenes into objects by their motion. Our approach acquires 3D models of objects and concurrently infers hierarchical part relations between them using probabilistic reasoning. It can be applied for interactive learning of objects and their part decomposition. Endowing robots with manipulation skills for a large variety of objects is a tedious endeavor if the skill is programmed for every instance of an object class. Furthermore, slight deformations of an instance could not be handled by an inflexible program. Deformable registration is useful to perceive such shape variations, e.g., between specific instances of a tool. We develop an efficient deformable registration method and apply it for the transfer of robot manipulation skills between varying object instances. On the object-class level, we segment images using random decision forest classifiers in real-time. The probabilistic labelings of individual images are fused in 3D semantic maps within a Bayesian framework. We combine our object-class segmentation method with simultaneous localization and mapping to achieve online semantic mapping in real-time. The methods developed in this thesis are evaluated in experiments on publicly available benchmark datasets and novel own datasets. We publicly demonstrate several of our perception approaches within integrated robot systems in the mobile manipulation context.Effiziente Dichte Registrierungs-, Segmentierungs- und Modellierungsmethoden für die RGB-D Umgebungswahrnehmung In dieser Arbeit beschäftigen wir uns mit Herausforderungen der visuellen Wahrnehmung für intelligente Roboter in Alltagsumgebungen. Solche Roboter sollen sich selbst in ihrer Umgebung zurechtfinden, und Wissen über den Verbleib von Objekten erwerben können. Die Schwierigkeit dieser Aufgaben erhöht sich in dynamischen Umgebungen, in denen ein Roboter die Bewegung einzelner Teile differenzieren und auch wahrnehmen muss, wie sich diese Teile bewegen. Bewegt sich ein Roboter selbständig in dieser Umgebung, muss er auch seine eigene Bewegung von der Veränderung der Umgebung unterscheiden. Szenen können sich aber nicht nur durch die Bewegung starrer Teile verändern. Auch die Teile selbst können ihre Form in nicht-rigider Weise ändern. Eine weitere Herausforderung stellt die semantische Interpretation von Szenengeometrie und -aussehen dar. Damit intelligente Roboter unmittelbar und flüssig handeln können, sind effiziente Algorithmen für diese Wahrnehmungsprobleme erforderlich. Im ersten Teil dieser Arbeit entwickeln wir effiziente Methoden zur Repräsentation und Registrierung von RGB-D Messungen. Zunächst stellen wir Multi-Resolutions-Oberflächenelement-Karten (engl. multi-resolution surfel maps, MRSMaps) als eine kompakte Repräsentation von RGB-D Messungen vor, die unseren effizienten Registrierungsmethoden zugrunde liegt. Bilder können effizient in dieser Repräsentation aggregiert werde, wobei auch mehrere Bilder aus verschiedenen Blickpunkten integriert werden können, um Modelle von Szenen und Objekte aus vielfältigen Ansichten darzustellen. Für die effiziente, robuste und genaue Registrierung von MRSMaps wird eine Methode vorgestellt, die Rigidheit der betrachteten Szene voraussetzt. Die Registrierung schätzt die Kamerabewegung zwischen den Bildern und gewinnt ihre Effizienz durch die Ausnutzung der kompakten multi-resolutionalen Darstellung der Karten. Die Registrierungsmethode erzielt hohe Bildverarbeitungsraten auf einer CPU. Wir demonstrieren hohe Effizienz, Genauigkeit und Robustheit unserer Methode im Vergleich zum bisherigen Stand der Forschung auf Vergleichsdatensätzen. In einem weiteren Registrierungsansatz lösen wir uns von der Annahme, dass die betrachtete Szene zwischen Bildern statisch ist. Wir erlauben nun, dass sich rigide Teile der Szene bewegen dürfen, und erweitern unser rigides Registrierungsverfahren auf diesen Fall. Unser Ansatz segmentiert das Bild in Bereiche einzelner Teile, die sich unterschiedlich zwischen Bildern bewegen. Wir demonstrieren hohe Segmentierungsgenauigkeit und Genauigkeit in der Bewegungsschätzung unter Echtzeitbedingungen für die Verarbeitung. Schließlich entwickeln wir ein Verfahren für die Wahrnehmung von nicht-rigiden Deformationen zwischen zwei MRSMaps. Auch hier nutzen wir die multi-resolutionale Struktur in den Karten für ein effizientes Registrieren von grob zu fein. Wir schlagen Methoden vor, um aus den geschätzten Deformationen die lokale Bewegung zwischen den Bildern zu berechnen. Wir evaluieren Genauigkeit und Effizienz des Registrierungsverfahrens. Der zweite Teil dieser Arbeit widmet sich der Verwendung unserer Kartenrepräsentation und Registrierungsmethoden für die Wahrnehmung von Szenen und Objekten. Wir verwenden MRSMaps und unsere rigide Registrierungsmethode, um dichte 3D Modelle von Szenen und Objekten zu lernen. Die räumlichen Beziehungen zwischen Schlüsselansichten, die wir durch Registrierung schätzen, werden in einem Simultanen Lokalisierungs- und Kartierungsverfahren (engl. simultaneous localization and mapping, SLAM) gegeneinander abgewogen, um die Blickposen der Schlüsselansichten zu schätzen. Für das Verfolgen der Kamerapose bezüglich der Modelle in Echtzeit, kombinieren wir die Genauigkeit unserer Registrierung mit der Robustheit von Partikelfiltern. Zu Beginn der Posenverfolgung, oder wenn das Objekt aufgrund von Verdeckungen oder extremen Bewegungen nicht weiter verfolgt werden konnte, initialisieren wir das Filter durch Objektdetektion. Anschließend wenden wir unsere erweiterten Registrierungsverfahren für die Wahrnehmung in nicht-rigiden Szenen und für die Übertragung von Objekthandhabungsfähigkeiten von Robotern an. Wir erweitern unseren rigiden Kartierungsansatz auf dynamische Szenen, in denen sich rigide Teile bewegen. Die Bewegungssegmente in Schlüsselansichten werden zueinander in Bezug gesetzt, um Äquivalenz- und Teilebeziehungen von Objekten probabilistisch zu inferieren, denen die Segmente entsprechen. Auch hier liefert unsere Registrierungsmethode die Bewegung der Kamera bezüglich der Objekte, die wir in einem SLAM Verfahren optimieren. Aus diesen Blickposen wiederum können wir die Bewegungssegmente in dichten Objektmodellen vereinen. Objekte einer Klasse teilen oft eine gemeinsame Topologie von funktionalen Elementen, die durch Formkorrespondenzen ermittelt werden kann. Wir verwenden unsere deformierbare Registrierung, um solche Korrespondenzen zu finden und die Handhabung eines Objektes durch einen Roboter auf neue Objektinstanzen derselben Klasse zu übertragen. Schließlich entwickeln wir einen echtzeitfähigen Ansatz, der Kategorien von Objekten in RGB-D Bildern erkennt und segmentiert. Die Segmentierung basiert auf Ensemblen randomisierter Entscheidungsbäume, die Geometrie- und Texturmerkmale zur Klassifikation verwenden. Wir fusionieren Segmentierungen von Einzelbildern einer Szene aus mehreren Ansichten in einer semantischen Objektklassenkarte mit Hilfe unseres SLAM-Verfahrens. Die vorgestellten Methoden werden auf öffentlich verfügbaren Vergleichsdatensätzen und eigenen Datensätzen evaluiert. Einige unserer Ansätze wurden auch in integrierten Robotersystemen für mobile Objekthantierungsaufgaben öffentlich demonstriert. Sie waren ein wichtiger Bestandteil für das Gewinnen der RoboCup-Roboterwettbewerbe in der RoboCup@Home Liga in den Jahren 2011, 2012 und 2013

    Face tracking and pose estimation with automatic three-dimensional model construction

    Get PDF
    A method for robustly tracking and estimating the face pose of a person using stereo vision is presented. The method is invariant to identity and does not require previous training. A face model is automatically initialised and constructed online: a fixed point distribution is superposed over the face when it is frontal to the cameras, and several appropriate points close to those locations are chosen for tracking. Using the stereo correspondence of the cameras, the three-dimensional (3D) coordinates of these points are extracted, and the 3D model is created. The 2D projections of the model points are tracked separately on the left and right images using SMAT. RANSAC and POSIT are used for 3D pose estimation. Head rotations up to ±45° are correctly estimated. The approach runs in real time. The purpose of this method is to serve as the basis of a driver monitoring system, and has been tested on sequences recorded in a moving car.Ministerio de Educación y CienciaComunidad de Madri

    Lidar-based scale recovery dense SLAM for UAV navigation

    Get PDF
    Imagine of having an autonomous agent (drone, robot, car, ..) that wants to navigate inside an unknown environment. The first question that it needs to answer for accomplish such task is: where Am I? Where are the objects that are surrounding me? The SLAM algorithm can answer to both questions simultaneously, in an on-line manner. This thesis focus on the implementation of a monocular SLAM algorithm on the UAV framework, where the classical obtained sparsity map is densified by means of a Convolutional Neural Network, properly scaled through 2D lidar measurements.Imagine of having an autonomous agent (drone, robot, car, ..) that wants to navigate inside an unknown environment. The first question that it needs to answer for accomplish such task is: where Am I? Where are the objects that are surrounding me? The SLAM algorithm can answer to both questions simultaneously, in an on-line manner. This thesis focus on the implementation of a monocular SLAM algorithm on the UAV framework, where the classical obtained sparsity map is densified by means of a Convolutional Neural Network, properly scaled through 2D lidar measurements
    corecore