5,334 research outputs found

    Learning Action Maps of Large Environments via First-Person Vision

    When people observe and interact with physical spaces, they are able to associate functionality with regions in the environment. Our goal is to automate dense functional understanding of large spaces by leveraging sparse activity demonstrations recorded from an egocentric viewpoint. The method we describe enables functionality estimation both in large scenes where people have behaved and in novel scenes where no behaviors are observed. Our method learns and predicts "Action Maps", which encode the ability of a user to perform activities at various locations. By using an egocentric camera to observe human activities, our method scales with the size of the scene without the need to mount multiple static surveillance cameras, and it is well suited to observing activities up close. We demonstrate that by capturing appearance-based attributes of the environment and associating these attributes with activity demonstrations, our proposed mathematical framework allows for the prediction of Action Maps in new environments. Additionally, we offer a preliminary glance at the applicability of Action Maps by demonstrating a proof-of-concept application in which they are used in concert with activity detections to perform localization. (Comment: To appear at CVPR 2016.)
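
    As a rough illustration of the framework's core idea, the sketch below regresses per-location activity scores from appearance features so that a dense Action Map can be predicted from sparse demonstrations. The feature representation, training data, and choice of regressor are illustrative placeholders, not the authors' actual pipeline.

        # Minimal sketch: predict an "Action Map" by regressing per-cell
        # activity scores from appearance features. All data below is
        # hypothetical; the paper's features capture appearance-based
        # attributes of the environment around each location.
        import numpy as np
        from sklearn.neighbors import KNeighborsRegressor

        def appearance_features(cells):
            # Placeholder feature extractor, standing in for the paper's
            # appearance-based attributes.
            return np.asarray(cells, dtype=float)

        # Sparse demonstrations: locations where an activity was observed
        # (label 1.0) or confidently absent (label 0.0).
        train_cells = [[0.1, 0.2], [0.9, 0.8], [0.5, 0.5]]
        train_labels = [1.0, 0.0, 1.0]

        model = KNeighborsRegressor(n_neighbors=1)
        model.fit(appearance_features(train_cells), train_labels)

        # Dense prediction over a grid of unvisited locations, which is
        # what lets the map extend to novel scenes with no observations.
        grid = [[x / 10.0, y / 10.0] for x in range(10) for y in range(10)]
        action_map = model.predict(appearance_features(grid)).reshape(10, 10)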

    Large Scale 3D Mapping of Indoor Environments Using a Handheld RGBD Camera

    The goal of this research is to investigate the problem of reconstructing a 3D representation of an environment, of arbitrary size, using a handheld color and depth (RGBD) sensor. The focus of this dissertation is to examine four of the underlying subproblems of such a system: camera tracking, loop closure, data storage, and integration.

    First, a system for 3D reconstruction of large indoor planar environments from data captured by an RGBD sensor mounted on a mobile robotic platform is presented, along with an algorithm for constructing nearly drift-free 3D occupancy grids of large indoor environments in an online manner. This approach combines data from an odometry sensor with the output of a visual registration algorithm, and it enforces a Manhattan-world constraint by utilizing factor graphs to produce an accurate online estimate of the trajectory of the mobile robotic platform. Through several experiments in environments of varying size and construction, it is shown that this method reduces rotational and translational drift significantly without performing any loop closing. In addition, the advantages and limitations of an octree data structure for representing a 3D environment are examined.

    Second, the problem of sensor tracking, specifically the use of the KinectFusion algorithm to align two subsequent point clouds generated by an RGBD sensor, is studied. A method is proposed to overcome a significant limitation of the Iterative Closest Point (ICP) algorithm used in KinectFusion, namely its sole reliance upon geometric information. The proposed method uses both geometric and color information in a direct manner that uses all the data in order to accurately estimate camera pose. Data association is performed by computing a warp between the two color images associated with two RGBD point clouds using the Lucas-Kanade algorithm. A subsequent step then estimates the transformation between the point clouds using either a point-to-point or point-to-plane error metric. Scenarios in which each of these metrics fails are described, and a normal covariance test for automatically selecting between them is proposed. Together, Lucas-Kanade data association (LKDA) and covariance testing enable robust camera tracking through areas with few geometric features, while retaining accuracy in environments in which the existing ICP technique succeeds. Experimental results on several publicly available datasets demonstrate the improved performance both qualitatively and quantitatively.

    Third, the choice of state space in the context of loop closure is revisited. Although a relative state space has been discounted by previous authors, it is shown that such a state space is actually extremely powerful, able to achieve recognizable results after just one iteration. The power behind the technique is that changing the orientation of one node affects the other nodes. At the same time, the approach, referred to as Pose Optimization using a Relative State Space (POReSS), is fast because, like the more popular incremental state space, the Jacobian never needs to be explicitly computed. It is further shown that while POReSS quickly computes a solution near the global optimum, it is not precise enough to perform the fine adjustments necessary to achieve acceptable results. As a result, a method is proposed that augments POReSS with a fast variant of Gauss-Seidel, referred to as Graph-Seidel, operating on a global state space to allow the solution to settle closer to the global minimum. Through a set of experiments, it is shown that this combination of POReSS and Graph-Seidel is not only faster but also achieves a lower residual than other non-linear optimization techniques, and, unlike the linear algebra-based techniques, it scales to very large graphs. In addition to revisiting the idea of a relative state space, the benefit of optimizing only the rotational components of a trajectory when performing loop closure is examined (rPOReSS). Finally, an incremental implementation of the rotational optimization is proposed (irPOReSS).
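
    Since the tracking contribution hinges on choosing between the two ICP error metrics, a small sketch of both residuals may help. Correspondences and normals are assumed given, and the selection rule below is only a crude stand-in for the dissertation's normal covariance test; the threshold is illustrative.

        # Sketch of the two ICP error metrics the dissertation chooses between.
        import numpy as np

        def point_to_point_error(src, dst):
            # Sum of squared Euclidean distances between matched points.
            return np.sum(np.linalg.norm(src - dst, axis=1) ** 2)

        def point_to_plane_error(src, dst, dst_normals):
            # Squared distances along the destination surface normals;
            # insensitive to sliding within the local tangent plane.
            return np.sum(np.einsum('ij,ij->i', src - dst, dst_normals) ** 2)

        def select_metric(dst_normals, threshold=0.01):
            # Crude stand-in for the normal covariance test: if the normals
            # barely vary (e.g. a single flat wall), point-to-plane leaves
            # tangential motion unconstrained and discards the constraints
            # the color-based data association provides, so fall back to
            # point-to-point.
            spread = np.linalg.eigvalsh(np.cov(dst_normals.T))
            return "point_to_plane" if spread.min() > threshold else "point_to_point"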

    Image-based 3-D reconstruction of constrained environments

    Nuclear power plays an important role in the United Kingdom's electricity generation infrastructure, providing a reliable baseload of low-carbon electricity. The Advanced Gas-cooled Reactor (AGR) design makes up approximately 50% of the existing fleet; however, many of the operating reactors have exceeded their original design lifetimes. To ensure safe reactor operation, engineers perform periodic in-core visual inspections of reactor components to monitor the structural health of the core as it ages. However, the inspection mechanisms currently deployed provide limited structural information about the fuel channel or defects.

    This thesis investigates the suitability of image-based 3-D reconstruction techniques for acquiring 3-D structural geometry to enable improved diagnostic and prognostic abilities for inspection engineers. Applying image-based 3-D reconstruction to in-core inspection footage highlights significant challenges, most predominantly that the image saliency proves insufficient for general reconstruction frameworks. The contribution of the thesis is threefold. Firstly, a novel semi-dense matching scheme which exploits sparse and dense image correspondence in combination with a novel intra-image region strength approach to improve the stability of the correspondence between images; this yields a 138.53% increase in correct feature matches over similar state-of-the-art image matching paradigms. Secondly, a bespoke incremental Structure-from-Motion (SfM) framework called Constrained Homogeneous SfM (CH-SfM), which is able to derive structure from deficient feature spaces and constrained environments. Thirdly, the application of the CH-SfM framework to remote visual inspection footage gathered within AGR fuel channels, outperforming other state-of-the-art reconstruction approaches and extracting representative 3-D structural geometry of orientational scans and fully circumferential reconstructions. This is demonstrated on in-core and laboratory footage, achieving approximate 3-D point densities of 2.785 and 23.8025 NX/cm² for real in-core inspection footage and high-quality laboratory footage respectively. The demonstrated novelties have applicability to other constrained or feature-poor environments, with future work looking to produce fully dense, photo-realistic 3-D reconstructions.
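
    To illustrate the semi-dense idea of combining sparse and dense correspondence, the sketch below cross-checks sparse ORB matches against a dense optical flow field using stock OpenCV components. It is not the thesis's intra-image region strength scheme, and the agreement threshold is arbitrary.

        # Sketch: densify and filter correspondences in low-saliency footage
        # by requiring sparse feature matches to agree with dense flow.
        import cv2
        import numpy as np

        def semi_dense_matches(img1, img2, agree_px=2.0):
            gray1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
            gray2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

            # Sparse stage: ORB keypoints matched by Hamming distance.
            orb = cv2.ORB_create()
            kp1, des1 = orb.detectAndCompute(gray1, None)
            kp2, des2 = orb.detectAndCompute(gray2, None)
            matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
            sparse = matcher.match(des1, des2)

            # Dense stage: Farneback optical flow over the whole image.
            flow = cv2.calcOpticalFlowFarneback(gray1, gray2, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)

            # Keep sparse matches consistent with the dense flow field, a
            # simple stand-in for the thesis's region-strength weighting.
            kept = []
            for m in sparse:
                x1, y1 = kp1[m.queryIdx].pt
                x2, y2 = kp2[m.trainIdx].pt
                fx, fy = flow[int(y1), int(x1)]
                if np.hypot(x1 + fx - x2, y1 + fy - y2) < agree_px:
                    kept.append(((x1, y1), (x2, y2)))
            return kept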

    Visual Perception For Robotic Spatial Understanding

    Humans understand the world through vision without much effort. We perceive the structure, objects, and people in the environment and pay little direct attention to most of it, until it becomes useful. Intelligent systems, especially mobile robots, have no such biologically engineered vision mechanism to take for granted. In contrast, we must devise algorithmic methods of taking raw sensor data and converting it to something useful very quickly. Vision is such a necessary part of building a robot or any intelligent system that is meant to interact with the world that it is somewhat surprising we don't have off-the-shelf libraries for this capability. Why is this? The simple answer is that the problem is extremely difficult. There has been progress, but the current state of the art is impressive and depressing at the same time. We now have neural networks that can recognize many objects in 2D images, in some cases performing better than a human. Some algorithms can also provide bounding boxes or pixel-level masks to localize the object. We have visual odometry and mapping algorithms that can build reasonably detailed maps over long distances with the right hardware and conditions. On the other hand, we have robots with many sensors and no efficient way to compute their relative extrinsic poses for integrating the data in a single frame. The same networks that produce good object segmentations and labels in a controlled benchmark still miss obvious objects in the real world and have no mechanism for learning on the fly while the robot is exploring. Finally, while we can detect pose for very specific objects, we don't yet have a mechanism that detects pose that generalizes well over categories or that can describe new objects efficiently. We contribute algorithms in four of the areas mentioned above. First, we describe a practical and effective system for calibrating many sensors on a robot with up to 3 different modalities. Second, we present our approach to visual odometry and mapping that exploits the unique capabilities of RGB-D sensors to efficiently build detailed representations of an environment. Third, we describe a 3-D over-segmentation technique that utilizes the models and ego-motion output in the previous step to generate temporally consistent segmentations with camera motion. Finally, we develop a synthesized dataset of chair objects with part labels and investigate the influence of parts on RGB-D based object pose recognition using a novel network architecture we call PartNet.
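
    As a flavor of the calibration problem in the first contribution, the sketch below recovers the rigid transform between two sensors from corresponding 3-D points via the standard Kabsch algorithm. The dissertation's system handles many sensors and up to three modalities; this toy example shows only the single-pair, point-correspondence case.

        # Minimal sketch of one ingredient of multi-sensor extrinsic
        # calibration: the rigid transform between two sensors, estimated
        # from 3-D points observed by both (Kabsch algorithm).
        import numpy as np

        def rigid_transform(points_a, points_b):
            """Find R, t such that points_b ~= R @ points_a + t."""
            ca, cb = points_a.mean(axis=0), points_b.mean(axis=0)
            H = (points_a - ca).T @ (points_b - cb)   # cross-covariance
            U, _, Vt = np.linalg.svd(H)
            # Reflection guard keeps R a proper rotation (det = +1).
            D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
            R = Vt.T @ D @ U.T
            return R, cb - R @ ca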

    Visual Odometry and Sparse Scene Reconstruction for UAVs with a Multi-Fisheye Camera System

    Autonomously operating UAVs demand fast localization for navigation, to actively explore unknown areas, and to create maps. For pose estimation, many UAV systems use a combination of GPS receivers and inertial measurement units (IMUs). However, GPS signal coverage may drop occasionally, especially in the close vicinity of objects, and precise IMUs are too heavy to be carried by lightweight UAVs. This, together with the high cost of high-quality IMUs, motivates the use of inexpensive vision-based sensors for localization using visual odometry or visual SLAM (simultaneous localization and mapping) techniques. The first contribution of this thesis is a more general approach to bundle adjustment with an extended version of the projective coplanarity equation, which enables the use of omnidirectional multi-camera systems, such as systems of fisheye cameras that capture a large field of view in one shot. We use ray directions as observations instead of image points, so our approach does not rely on a specific projection model as long as the projection is central. In addition, our approach allows the integration and estimation of points at infinity, which classical bundle adjustments are not capable of. We show that the integration of far and infinitely far points stabilizes the estimation of the rotation angles of the camera poses. The second contribution employs this approach to bundle adjustment in a highly integrated system for incremental pose estimation and mapping on lightweight UAVs. Based on the image sequences of a multi-camera system, our system uses tracked feature points to incrementally build a sparse map and incrementally refines this map using the iSAM2 algorithm. Our system is able to optionally integrate GPS information at the level of carrier phase observations, even in underconstrained situations, e.g. if only two satellites are visible, for georeferenced pose estimation. This way, we are able to use all available information in underconstrained GPS situations to keep the mapped 3D model accurate and georeferenced. The third contribution presents an approach for re-using existing methods for dense stereo matching with fisheye cameras, which has the advantage that highly optimized existing methods can be applied as a black box, without modification, even to cameras with a field of view of more than 180 degrees. We provide a detailed accuracy analysis of the obtained dense stereo results. The accuracy analysis shows the growing uncertainty of observed image points of fisheye cameras due to increasing blur towards the image border. The core of the contribution is a rigorous variance component estimation which allows the variance of the observed disparity at an image point to be estimated as a function of the distance of that point to the principal point. We show that this improved stochastic model provides a more realistic prediction of the uncertainty of the triangulated 3D points.
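
    A minimal sketch of the ray-observation idea follows: the residual compares an observed unit viewing ray with the ray toward the estimated point in the camera frame, so no specific central projection model enters the residual. The parametrization is simplified (a rotation matrix and a camera center), and the handling of points at infinity via homogeneous coordinates is omitted.

        # Sketch of a ray-direction residual for bundle adjustment.
        # R: world-to-camera rotation, t: camera center in world
        # coordinates, X: 3-D point, observed_ray: measured viewing
        # direction. Stacking these residuals over all observations would
        # form the cost a nonlinear least-squares solver minimizes.
        import numpy as np

        def ray_residual(R, t, X, observed_ray):
            predicted = R @ (X - t)                  # ray in camera frame
            predicted = predicted / np.linalg.norm(predicted)
            return predicted - observed_ray / np.linalg.norm(observed_ray)

        # A point straight ahead of an axis-aligned camera gives a
        # (near-)zero residual.
        r = ray_residual(np.eye(3), np.zeros(3),
                         np.array([0.0, 0.0, 5.0]), np.array([0.0, 0.0, 1.0]))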

    Multi-resolution mapping and planning for UAV navigation in attitude constrained environments

    In this thesis we aim to bridge the gap between high-quality map reconstruction and SE(3) motion planning for Unmanned Aerial Vehicles (UAVs) in challenging environments with narrow openings, such as disaster areas, where attitude must be considered. We propose an efficient system that leverages the concept of adaptive-resolution volumetric mapping, which naturally integrates with the hierarchical decomposition of space in an octree data structure. Instead of a Truncated Signed Distance Function (TSDF), we adopt mapping of occupancy probabilities in a log-odds representation, which allows representation of both surfaces and the entire free (i.e. observed) space, as opposed to unobserved space. We introduce a method for choosing resolution on the fly, in real time, by means of a multi-scale max-min pooling of the input depth image. The notion of explicit free space mapping, paired with the spatial hierarchy in the data structure and the map resolution, allows for collision queries, as needed for robot motion planning, at unprecedented speed. Our mapping strategy supports pinhole cameras as well as spherical sensor models. Additionally, we introduce a first-of-its-kind global minimum-cost path search method based on A* that considers attitude along the path; state-of-the-art methods incorporate attitude only in the refinement stage. To make the problem tractable, our method exploits an adaptive coarse-to-fine approach using global and local A* runs, plus an efficient method to introduce the UAV attitude into the process. We integrate our method with an SE(3) trajectory optimisation method based on a safe-flight-corridor, yielding a complete path planning pipeline. We quantitatively evaluate our mapping strategy in terms of mapping accuracy, memory, runtime performance, and planning performance, showing improvements over the state of the art, particularly in cases requiring high-resolution maps. Furthermore, extensive evaluation is undertaken using the AirSim flight simulator under closed-loop control in a set of randomised maps, allowing us to quantitatively assess our path initialisation method. We show that it achieves significantly higher success rates than the baselines, at a reduced computational burden.
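
    Two pieces of the mapping strategy lend themselves to a short sketch: the log-odds occupancy update that lets free space be represented explicitly, and the multi-scale max-min pooling of a depth image used to pick a resolution on the fly. The constants and the per-image (rather than per-block) resolution decision below are simplifications.

        # Sketch: log-odds occupancy update and depth-driven resolution
        # selection via max-min block pooling. Constants are illustrative.
        import numpy as np

        def logodds_update(cell, hit, l_hit=0.85, l_miss=-0.4):
            # Occupied and free evidence both accumulate, so free space is
            # represented explicitly rather than left "unobserved".
            return cell + (l_hit if hit else l_miss)

        def pick_level(depth, levels=4, rel_range=0.05):
            # Coarsest level whose 2^lvl x 2^lvl blocks show small relative
            # depth variation (max minus min pooling); such regions can be
            # integrated at low resolution without losing surface detail.
            h, w = depth.shape
            for lvl in range(levels, 0, -1):
                b = 2 ** lvl
                d = depth[: h // b * b, : w // b * b].reshape(h // b, b, w // b, b)
                spread = d.max(axis=(1, 3)) - d.min(axis=(1, 3))
                if (spread < rel_range * d.mean()).all():
                    return lvl  # the coarse level is safe everywhere
            return 0  # fall back to the finest resolution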

    Towards Reliable and Accurate Global Structure-from-Motion

    Reconstruction of objects or scenes from sparse point detections across multiple views is one of the most tackled problems in computer vision. Given the coordinates of 2D points tracked in multiple images, the problem consists of estimating the corresponding 3D points and camera calibrations (intrinsics and pose), and can be solved by minimizing reprojection errors using bundle adjustment. However, given bundle adjustment's nonlinear objective function and iterative nature, a good starting guess is required to converge to the global minimum. Global and Incremental Structure-from-Motion methods appear as ways to provide good initializations to bundle adjustment, each with different properties. While Global Structure-from-Motion has been shown to result in more accurate reconstructions than Incremental Structure-from-Motion, the latter has better scalability: by starting with a small subset of images and sequentially adding new views, it allows reconstruction of sequences with millions of images. Additionally, both Global and Incremental Structure-from-Motion methods rely on accurate models of the scene or object, and under noisy conditions or high model uncertainty they may produce poor initializations for bundle adjustment. Recently pOSE, a class of matrix factorization methods, has been proposed as an alternative to conventional Global SfM methods. These methods use VarPro, a second-order optimization method, to minimize a linear combination of an approximation of the reprojection errors and a regularization term based on an affine camera model, and have been shown to converge to global minima at a high rate even when starting from random camera calibration estimates.

    This thesis aims at improving the reliability and accuracy of Global SfM through three approaches. First, by studying conditions for global optimality of point set registration, yielding a point cloud averaging method that can be used when (incomplete) 3D point clouds of the same scene in different coordinate systems are available. Second, by extending pOSE methods to different Structure-from-Motion problem instances, such as non-rigid SfM and radial-distortion-invariant SfM. Third and finally, by replacing the regularization term of pOSE methods with an exponential regularization on the projective depths of the 3D point estimates, resulting in a loss that achieves reconstructions with accuracy close to bundle adjustment.
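
    The sketch below spells out a pOSE-style objective in the spirit of these methods: a weighted blend of a pseudo object space error and an affine-camera error over the visible observations. The residual layout and the weight eta follow the pOSE formulation only loosely, and the naive double loop stands in for the matrix factorization structure that VarPro actually exploits.

        # Sketch of a pOSE-style objective that a VarPro-type solver could
        # minimize from random starting points. Simplified and illustrative.
        import numpy as np

        def pose_objective(P, X, m, visible, eta=0.05):
            # P: (ncams, 3, 4) projective cameras, X: (npts, 4) homogeneous
            # points, m: (ncams, npts, 2) observed image points,
            # visible: (ncams, npts) boolean visibility mask.
            total = 0.0
            for i in range(P.shape[0]):
                for j in range(X.shape[0]):
                    if not visible[i, j]:
                        continue
                    x = P[i] @ X[j]                 # projected point
                    ose = x[:2] - m[i, j] * x[2]    # pseudo object space error
                    aff = x[:2] - m[i, j]           # affine (regularizing) term
                    total += (1 - eta) * ose @ ose + eta * aff @ aff
            return total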