SLAM-based 3D outdoor reconstructions from lidar data
Reconstructing large outdoor environments with depth (RGBD) cameras is not feasible because of lighting conditions and their limited depth range; LIDAR sensors can be used instead.
Most state-of-the-art SLAM methods are designed for indoor environments and depth (RGBD) cameras. We have adapted two SLAM systems to work with LIDAR data and compared the systems on both LIDAR and RGBD data through quantitative evaluations. Results show that the best method for LIDAR data is RTAB-Map, with a clear difference. Additionally, RTAB-Map has been used to create 3D reconstructions with and without photometry from a visible-light color camera. This demonstrates the potential of LIDAR sensors for reconstructing outdoor environments for immersion or audiovisual production applications.
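Quantitative SLAM comparisons of this kind typically report the absolute trajectory error (ATE) between estimated and ground-truth camera positions after a rigid alignment. As a generic illustration (not the authors' evaluation code), a minimal ATE-RMSE computation with a Horn/Umeyama-style alignment step might look like:

```python
import numpy as np

def ate_rmse(gt, est):
    """Absolute trajectory error (RMSE) after rigidly aligning the
    estimated trajectory to ground truth (Kabsch/Horn alignment)."""
    gt = np.asarray(gt, dtype=float)    # (N, 3) ground-truth positions
    est = np.asarray(est, dtype=float)  # (N, 3) estimated positions
    mu_g, mu_e = gt.mean(axis=0), est.mean(axis=0)
    # Optimal rotation from the SVD of the cross-covariance matrix.
    H = (est - mu_e).T @ (gt - mu_g)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    aligned = (R @ (est - mu_e).T).T + mu_g
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))
```

Because the alignment removes any global rigid offset, the metric measures only the trajectory's internal error (drift and noise), which is what makes it suitable for comparing SLAM systems with different coordinate conventions.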
Estimating Head Measurements from 3D Point Clouds
Human head measurements are valuable in ergonomics, acoustics, medicine, computer
vision, and computer graphics, among other fields. Such measurements are usually obtained through entirely or partially manual procedures, a cumbersome practice since the level of accuracy depends on the expertise of the person who takes them. Moreover, manually acquired measurements contain less information from which new measurements can be deduced once the subject is no longer accessible. To overcome these disadvantages, an approach to automatically estimate measurements from 3D point clouds, which serve as long-term representations of humans, has been developed and is described in this manuscript. The 3D point clouds were acquired using an Asus Xtion Pro Live RGBD sensor and KinFu (an open-source implementation of KinectFusion). Qualitative and quantitative evaluations of the estimated measurements are presented. Furthermore, the feasibility of the developed approach was evaluated through a case study in which the estimated measurements were used to assess the influence of anthropometric data on the computation of the interaural time difference.
Considering the promising results obtained when estimating measurements from 3D models acquired with the Asus Xtion Pro Live sensor and KinFu (plus the results reported in the literature), as well as the development of new RGBD sensors, a study of the influence of seven different RGBD sensors on the reconstructions obtained with KinFu is also presented. This study contains qualitative and quantitative evaluations of reconstructions of four diverse objects captured at distances ranging from 40 cm to 120 cm, a range established according to the operational range of the sensors. Furthermore, a collection of the obtained reconstructions is available as a dataset at http://uni-tuebingen.de/en/138898.
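Caliper-style measurements of the kind estimated here (e.g., head breadth or length) can be approximated from a point cloud as extents along its principal axes. The sketch below is purely illustrative, not the thesis's method, and assumes a clean, subject-only cloud:

```python
import numpy as np

def axis_extents(points):
    """Per-axis extents (caliper-style measurements) of a point cloud
    after aligning it to its principal axes with PCA.  Returns the
    extents sorted from largest to smallest."""
    P = np.asarray(points, dtype=float)
    P = P - P.mean(axis=0)
    # Principal axes are the eigenvectors of the covariance matrix.
    _, vecs = np.linalg.eigh(np.cov(P.T))
    aligned = P @ vecs
    extents = aligned.max(axis=0) - aligned.min(axis=0)
    return np.sort(extents)[::-1]
```

Because the measurement is taken in the cloud's own principal frame, it is invariant to how the subject was oriented during scanning, which is one reason point clouds are useful as long-term representations.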
CurveFusion: Reconstructing Thin Structures from RGBD Sequences
We introduce CurveFusion, the first approach for high-quality scanning of thin structures at interactive rates using a handheld RGBD camera. Thin filament-like structures are mathematically just 1D curves embedded in R^3, and integration-based reconstruction works best when depth sequences (from the thin-structure parts) are fused using the object's (unknown) curve skeleton. Thus, using the complementary but noisy color and depth channels, CurveFusion first automatically identifies point samples on potential thin structures and groups them into bundles, each being a group of a fixed number of aligned consecutive frames. The algorithm then extracts per-bundle skeleton curves using L1 axes, and aligns and iteratively merges the L1 segments from all the bundles to form the final complete curve skeleton. Thus, unlike previous methods, reconstruction happens via integration along a data-dependent fusion primitive, i.e., the extracted curve skeleton. We extensively evaluate CurveFusion on a range of challenging examples and different scanner and calibration settings, and present high-fidelity thin-structure reconstructions previously just not possible from raw RGBD sequences.
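The L1 axes mentioned above build on the geometric (L1) median, a center estimate that is robust to the noise and outliers typical of raw RGBD samples. As an illustration of that building block only (not CurveFusion's actual implementation), the classical Weiszfeld iteration computes it:

```python
import numpy as np

def l1_median(points, iters=200, eps=1e-9):
    """Geometric (L1) median of a point set via Weiszfeld iterations:
    repeatedly re-average the points with weights 1/distance, which
    converges to the point minimizing the sum of distances."""
    P = np.asarray(points, dtype=float)
    x = P.mean(axis=0)                  # start from the centroid
    for _ in range(iters):
        d = np.linalg.norm(P - x, axis=1)
        d = np.maximum(d, eps)          # avoid division by zero
        w = 1.0 / d
        x_new = (w[:, None] * P).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < eps:
            break
        x = x_new
    return x
```

Unlike the centroid, the L1 median is barely pulled by a distant outlier, which is what makes it suitable for extracting skeleton samples from noisy thin-structure scans.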
Python API for Altair Inspire Studio with functionality of capturing 3D models from RGBD sensors
This thesis shows how we implemented Python support for writing plugins and interacting with Altair Inspire Studio, in addition to the C++ SDK it has been offering for years. The thesis then shows how we used this new API to develop a plugin for capturing 3D models from reality using RGBD sensors.
Large Scale 3D Mapping of Indoor Environments Using a Handheld RGBD Camera
The goal of this research is to investigate the problem of reconstructing a 3D representation of an environment, of arbitrary size, using a handheld color and depth (RGBD) sensor. The focus of this dissertation is to examine four of the underlying subproblems of this system: camera tracking, loop closure, data storage, and integration. First, a system for 3D reconstruction of large indoor planar environments with data captured from an RGBD sensor mounted on a mobile robotic platform is presented. An algorithm for constructing nearly drift-free 3D occupancy grids of large indoor environments in an online manner is also presented. This approach combines data from an odometry sensor with output from a visual registration algorithm, and it enforces a Manhattan-world constraint by utilizing factor graphs to produce an accurate online estimate of the trajectory of the mobile robotic platform. Through several experiments in environments of varying size and construction, it is shown that this method reduces rotational and translational drift significantly without performing any loop-closing techniques. In addition, the advantages and limitations of an octree data-structure representation of a 3D environment are examined. Second, the problem of sensor tracking, specifically the use of the KinectFusion algorithm to align two subsequent point clouds generated by an RGBD sensor, is studied. A method to overcome a significant limitation of the Iterative Closest Point (ICP) algorithm used in KinectFusion is proposed, namely, its sole reliance upon geometric information. The proposed method uses both geometric and color information in a direct manner that uses all the data to accurately estimate the camera pose. Data association is performed by computing a warp between the two color images associated with the two RGBD point clouds using the Lucas-Kanade algorithm.
A subsequent step then estimates the transformation between the point clouds using either a point-to-point or point-to-plane error metric. Scenarios in which each of these metrics fails are described, and a normal covariance test for automatically selecting between them is proposed. Lucas-Kanade data association (LKDA) together with covariance testing enables robust camera tracking through areas with few geometric features, while retaining accuracy in environments in which the existing ICP technique succeeds. Experimental results on several publicly available datasets demonstrate the improved performance both qualitatively and quantitatively. Third, the choice of state space in the context of performing loop closure is revisited. Although a relative state space has been discounted by previous authors, it is shown that such a state space is actually extremely powerful, able to achieve recognizable results after just one iteration. The power behind the technique is that changing the orientation of one node affects other nodes. At the same time, the approach --- referred to as Pose Optimization using a Relative State Space (POReSS) --- is fast because, like the more popular incremental state space, the Jacobian never needs to be explicitly computed. Furthermore, it is shown that while POReSS is able to quickly compute a solution near the global optimum, it is not precise enough to perform the fine adjustments necessary to achieve acceptable results. As a result, a method to augment POReSS with a fast variant of Gauss-Seidel --- referred to as Graph-Seidel --- on a global state space is proposed, allowing the solution to settle closer to the global minimum. Through a set of experiments, it is shown that this combination of POReSS and Graph-Seidel is not only faster but also achieves a lower residual than other non-linear algebra techniques.
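The two error metrics contrasted above behave differently on planar regions: point-to-point penalizes any displacement between corresponding points, while point-to-plane penalizes only motion along the destination surface normal, letting correspondences slide within the plane. A small illustrative computation of both residuals, given correspondences and destination normals (not the dissertation's code):

```python
import numpy as np

def icp_residuals(src, dst, dst_normals):
    """Per-correspondence ICP residuals for the two classic metrics:
    point-to-point (Euclidean distance between matched points) and
    point-to-plane (distance along the destination normal)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = np.asarray(dst_normals, dtype=float)
    point_to_point = np.linalg.norm(src - dst, axis=1)
    point_to_plane = np.abs(np.einsum('ij,ij->i', src - dst, n))
    return point_to_point, point_to_plane
```

A source point displaced tangentially to the destination plane has a large point-to-point residual but a zero point-to-plane residual, which is why each metric fails in different scenarios and motivates a test for selecting between them.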
Moreover, unlike the linear-algebra-based techniques, it is shown that this approach scales to very large graphs. In addition to revisiting the idea of using a relative state space, the benefits of optimizing only the rotational components of a trajectory in order to perform loop closing are examined (rPOReSS). Finally, an incremental implementation of the rotational optimization is proposed (irPOReSS).
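Gauss-Seidel-style relaxation of a pose graph, the idea behind the Graph-Seidel refinement described above, repeatedly sets each pose to the value that best satisfies its incident constraints, never forming a Jacobian explicitly. A deliberately simplified sketch with scalar poses instead of 6-DoF transforms (not the dissertation's implementation) conveys the idea:

```python
import numpy as np

def graph_seidel_1d(n, edges, iters=500):
    """Gauss-Seidel relaxation of a 1D pose graph.  Poses x[0..n-1];
    each constraint (i, j, z) means x[j] - x[i] ~ z.  Pose x[0] is the
    fixed anchor; every other pose is repeatedly replaced by the mean
    of the values its constraints suggest (coordinate descent on the
    least-squares objective)."""
    x = np.zeros(n)
    for _ in range(iters):
        for k in range(1, n):
            suggestions = []
            for i, j, z in edges:
                if j == k:
                    suggestions.append(x[i] + z)   # forward constraint
                elif i == k:
                    suggestions.append(x[j] - z)   # backward constraint
            if suggestions:
                x[k] = np.mean(suggestions)
    return x
```

With a chain of odometry constraints plus one loop closure, the iteration settles at the least-squares solution that spreads the loop-closure discrepancy over the whole trajectory.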
Efficient 3D Segmentation, Registration and Mapping for Mobile Robots
Sometimes simple is better! For certain situations and tasks, simple but robust methods can achieve the same or better results in the same or less time than related sophisticated approaches. In the context of robots operating in real-world environments, key challenges are perceiving objects of interest and obstacles as well as building maps of the environment and localizing therein. The goal of this thesis is to carefully analyze such problem formulations, to deduce valid assumptions and simplifications, and to develop simple solutions that are both robust and fast. All approaches make use of sensors capturing 3D information, such as consumer RGBD cameras. Comparative evaluations show the performance of the developed approaches. For identifying objects and regions of interest in manipulation tasks, a real-time object segmentation pipeline is proposed. It exploits several common assumptions of manipulation tasks, such as objects being on horizontal support surfaces (and well separated). It achieves real-time performance by using particularly efficient approximations in the individual processing steps, subsampling the input data where possible, and processing only relevant subsets of the data. The resulting pipeline segments 3D input data at up to 30 Hz. In order to obtain complete segmentations of the 3D input data, a second pipeline is proposed that approximates the sampled surface, smooths the underlying data, and segments the smoothed surface into coherent regions belonging to the same geometric primitive. It uses different primitive models and can reliably segment input data into planes, cylinders, and spheres. A thorough comparative evaluation shows state-of-the-art performance while computing such segmentations in near real-time. The second part of the thesis addresses the registration of 3D input data, i.e., consistently aligning input captured from different view poses. Several methods are presented for different types of input data.
For the particular application of mapping with micro aerial vehicles, where the 3D input data is particularly sparse, a pipeline is proposed that uses the same approximate surface reconstruction to exploit the measurement topology, together with a surface-to-surface registration algorithm that robustly aligns the data. Optimization of the resulting graph of determined view poses then yields globally consistent 3D maps. For sequences of RGBD data, this pipeline is extended to include additional subsampling steps and an initial alignment of the data in local windows of the pose graph. In both cases, comparative evaluations show a robust and fast alignment of the input data.
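Plane segmentation of the kind used for detecting horizontal support surfaces is often built on a RANSAC-style dominant-plane fit. The following is a generic sketch of that idea, not the thesis's pipeline (which relies on its own efficient approximations):

```python
import numpy as np

def ransac_plane(points, iters=300, tol=0.02, rng=None):
    """Fit a dominant plane n.p + d = 0 to a point cloud with RANSAC:
    sample 3 points, build their plane, count points within `tol` of
    it, and keep the model with the most inliers.
    Returns (unit normal, offset d, boolean inlier mask)."""
    rng = np.random.default_rng(rng)
    P = np.asarray(points, dtype=float)
    best = (None, None, np.zeros(len(P), dtype=bool))
    for _ in range(iters):
        a, b, c = P[rng.choice(len(P), 3, replace=False)]
        n = np.cross(b - a, c - a)
        norm = np.linalg.norm(n)
        if norm < 1e-12:
            continue                    # degenerate (collinear) sample
        n = n / norm
        d = -n @ a
        inliers = np.abs(P @ n + d) < tol
        if inliers.sum() > best[2].sum():
            best = (n, d, inliers)
    return best
```

Once the support plane is found, the points above it can be clustered into individual object candidates, which is the "well separated objects on a horizontal surface" assumption the pipeline exploits.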
Room layout estimation on mobile devices
Room layout generation is the problem of generating a drawing or a digital model of an existing room from a set of measurements such as laser data or images. The generation of floor plans can find application in the building industry to assess the quality and correctness of an ongoing construction with respect to the initial model, or to quickly sketch the renovation of an apartment. The real-estate industry can rely on automatic generation of floor plans to ease the process of checking the livable surface and to propose virtual visits to prospective customers. As for the general public, the room layout can be integrated into mixed-reality games to provide a more immersive experience, or used in other augmented-reality applications such as room redecoration. The goal of this industrial thesis (CIFRE) is to investigate and take advantage of state-of-the-art mobile devices in order to automate the process of generating room layouts. Nowadays, modern mobile devices usually come with a wide range of sensors, such as an inertial measurement unit (IMU), RGB cameras and, more recently, depth cameras. Moreover, tactile touchscreens offer a natural and simple way to interact with the user, thus favoring the development of interactive applications in which the user can be part of the processing loop. This work aims at exploiting the richness of such devices to address the room layout generation problem. The thesis has three major contributions. We first show how the classic problem of detecting vanishing points in an image can benefit from a prior given by the IMU sensor. We propose a simple and effective algorithm for detecting vanishing points relying on the gravity vector estimated by the IMU. A new public dataset containing images and the relevant IMU data is introduced to help assess vanishing-point algorithms and foster further studies in the field.
As a second contribution, we explored state-of-the-art real-time localization and map-optimization algorithms for RGB-D sensors. Real-time localization is a fundamental task for enabling augmented reality, and thus a critical component when designing interactive applications. We evaluate existing algorithms designed for the common desktop setup with a view to their deployment on a mobile device. For each considered method, we assess the accuracy of the localization as well as the computational performance when ported to a mobile device. Finally, we present a proof-of-concept application able to generate the room layout relying on a Project Tango tablet equipped with an RGB-D sensor. In particular, we propose an algorithm that incrementally processes and fuses the 3D data provided by the sensor in order to obtain the layout of the room. We show how our algorithm can rely on user interaction to correct the generated 3D model during the acquisition process.
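The IMU prior for vanishing-point detection rests on a simple projective fact: under the pinhole model, the vanishing point of the world's vertical direction is the image of the gravity direction, v ~ K g, with g expressed in the camera frame. A minimal sketch assuming an ideal calibrated pinhole camera (illustrative, not the thesis's algorithm):

```python
import numpy as np

def vertical_vanishing_point(K, g_cam):
    """Vanishing point of the world's vertical direction: project the
    gravity direction g (expressed in the camera frame, e.g. from the
    IMU) through the intrinsics, v ~ K g, and dehomogenize."""
    v = np.asarray(K, dtype=float) @ np.asarray(g_cam, dtype=float)
    return v[:2] / v[2]
```

With the vertical vanishing point pinned down by the IMU, only the horizontal vanishing points remain to be searched for in the image, which is what makes the prior effective.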
Reconstruction and recognition of confusable models using three-dimensional perception
Perception is one of the key topics in robotics research. It concerns the processing of external sensor data and its interpretation. The need for fully autonomous robots makes it crucial to help them perform tasks more reliably, flexibly, and efficiently. As these platforms obtain more refined manipulation capabilities, they also require expressive and comprehensive environment models: for manipulation and affordance purposes, their models have to include every object present in the world, together with its location, pose, shape, and other aspects.
The aim of this dissertation is to provide a solution to several of the challenges that arise when addressing the object-grasping problem, with the goal of improving the autonomy of the mobile manipulator robot MANFRED-2. Through the analysis and interpretation of 3D perception, this thesis first covers the localization of supporting planes in the scenario. As the environment will contain many other things apart from the planar surface, the problem in cluttered scenarios has been solved by means of Differential Evolution, a particle-based evolutionary algorithm that evolves over time toward the solution that yields the lowest cost-function value.
Since the final purpose of this thesis is to provide valuable information for grasping applications, a complete model reconstructor has been developed. The proposed method offers many features, such as robustness against abrupt rotations, multi-dimensional optimization, feature extensibility, compatibility with other scan-matching techniques, management of uncertain information, and an initialization process to reduce convergence time. It has been designed using an evolutionary scan-matching optimizer that takes into account surface features of the object, its global form, and also texture and color information.
The last challenge tackled concerns the recognition problem. In order to furnish the robot with useful information about the environment, a meta-classifier that efficiently discerns the observed objects has been implemented. It is capable of distinguishing between confusable objects, such as mugs or dishes with similar shapes but different size or color.
The contributions presented in this thesis have been fully implemented and empirically evaluated on the platform. A continuous grasping pipeline has been developed, covering everything from perception to grasp planning, including visual object recognition for confusable objects. For that purpose, an indoor environment with several objects on a table is presented in the vicinity of the robot. Items are recognized from a database and, if one is chosen, the robot calculates how to grasp it, taking into account the kinematic restrictions associated with the anthropomorphic
hand and the 3D model for this particular object.
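Differential Evolution, as invoked above for supporting-plane localization in clutter, maintains a population of candidate solutions and challenges each one with a trial vector built from the scaled difference of random peers. A generic DE/rand/1/bin sketch (not the MANFRED-2 implementation; parameter values are illustrative defaults):

```python
import numpy as np

def differential_evolution(cost, bounds, pop_size=30, F=0.7, CR=0.9,
                           generations=200, rng=None):
    """Minimal DE/rand/1/bin: every generation, each individual faces a
    trial vector built from three random peers (mutation + binomial
    crossover); the better of the two survives.  Returns the best
    parameter vector found."""
    rng = np.random.default_rng(rng)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = len(lo)
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fit = np.array([cost(p) for p in pop])
    for _ in range(generations):
        for i in range(pop_size):
            peers = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(peers, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True   # keep >= 1 mutant gene
            trial = np.where(cross, mutant, pop[i])
            f = cost(trial)
            if f <= fit[i]:                   # greedy selection
                pop[i], fit[i] = trial, f
    return pop[np.argmin(fit)]
```

For plane localization, the cost function would score a candidate plane's parameters against the point cloud (e.g., by inlier distances), so that the population converges on the supporting surface despite clutter.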