
    Robust Intrinsic and Extrinsic Calibration of RGB-D Cameras

    Color-depth cameras (RGB-D cameras) have become the primary sensors in most robotics systems, from service robotics to industrial robotics applications. Typical consumer-grade RGB-D cameras are provided with a coarse intrinsic and extrinsic calibration that generally does not meet the accuracy requirements of many robotics applications (e.g., highly accurate 3D environment reconstruction and mapping, high-precision object recognition and localization, ...). In this paper, we propose a human-friendly, reliable and accurate calibration framework that makes it easy to estimate both the intrinsic and extrinsic parameters of a general color-depth sensor pair. Our approach is based on a novel two-component error model that unifies the error sources of RGB-D pairs based on different technologies, such as structured-light 3D cameras and time-of-flight cameras. Our method provides several important advantages over other state-of-the-art systems: it is general (i.e., well suited to different types of sensors), based on an easy and stable calibration protocol, provides greater calibration accuracy, and has been implemented within the ROS robotics framework. We report detailed experimental validations and performance comparisons to support these claims.
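
    As a rough illustration only (the abstract does not spell out the paper's actual two-component model), a calibrated depth correction of this kind is often applied as a per-pixel spatial map combined with a global distance-dependent term, after which the corrected points are brought into the color camera frame with the estimated extrinsics. The names undistortion_map, global_poly, R and t below are hypothetical calibration outputs, not quantities from the paper:

        import numpy as np

        def correct_depth(depth_m, undistortion_map, global_poly):
            # Hypothetical two-step correction: a per-pixel (spatially varying)
            # factor followed by a global distance-dependent polynomial.
            d = depth_m * undistortion_map
            return np.polyval(global_poly, d)

        def depth_to_color_frame(points_xyz, R, t):
            # Map 3D points (N x 3) from the depth camera frame to the color
            # camera frame using calibrated extrinsics (rotation R, translation t).
            return points_xyz @ R.T + t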

    Resolving depth measurement ambiguity with commercially available range imaging cameras

    Time-of-flight range imaging is typically performed with the amplitude-modulated continuous-wave method. This involves illuminating a scene with amplitude-modulated light. Reflected light from the scene is received by the sensor with the range to the scene encoded as a phase delay of the modulation envelope. Due to the cyclic nature of phase, an ambiguity in the measured range occurs every half wavelength in distance, thereby limiting the maximum usable range of the camera. This paper proposes a procedure to resolve depth ambiguity using software post-processing. First, the range data is processed to segment the scene into separate objects. The average intensity of each object can then be used to determine which pixels are beyond the non-ambiguous range. The results demonstrate that depth ambiguity can be resolved for various scenes using only the available depth and intensity information. The proposed method reduces the sensitivity to objects with very high and very low reflectance, normally a key problem with basic threshold approaches. The approach is very flexible, as it can be used with any range imaging camera. Furthermore, capture time is not extended, keeping the artifacts caused by moving objects to a minimum. This makes it suitable for applications such as robot vision, where the camera may be moving during captures. The key limitation of the method is its inability to distinguish between two overlapping objects that are separated by exactly one non-ambiguous range. Overall, the reliability of this method is higher than the basic threshold approach, but not as high as the multiple-frequency method of resolving ambiguity.
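
    The ambiguity interval follows directly from the modulation frequency: the range wraps every half modulation wavelength, d_amb = c / (2 * f_mod), which is roughly 5 m at a typical 30 MHz modulation. The sketch below illustrates the idea of using each segmented object's mean intensity, which falls roughly with the square of distance, to decide whether the object lies beyond the first interval; the constant k and the threshold factor are hypothetical placeholders, not values from the paper:

        import numpy as np

        C = 299_792_458.0  # speed of light, m/s

        def unambiguous_range(f_mod_hz):
            # Range at which the modulation phase wraps (half the modulation wavelength).
            return C / (2.0 * f_mod_hz)

        def unwrap_by_intensity(range_m, intensity, labels, f_mod_hz, k=1.0):
            # For each segmented object, add one ambiguity interval to its range if
            # its mean intensity is much lower than expected for its apparent
            # distance. k is a hypothetical constant relating intensity to 1/d^2.
            d_amb = unambiguous_range(f_mod_hz)
            out = range_m.copy()
            for obj in np.unique(labels):
                mask = labels == obj
                d_apparent = np.mean(range_m[mask])
                expected = k / d_apparent**2            # intensity expected at apparent range
                if np.mean(intensity[mask]) < 0.25 * expected:   # illustrative threshold
                    out[mask] += d_amb                  # object lies beyond the first interval
            return out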

    OpenPTrack: Open Source Multi-Camera Calibration and People Tracking for RGB-D Camera Networks

    OpenPTrack is open source software for multi-camera calibration and people tracking in RGB-D camera networks. It can track people in large volumes at sensor frame rate and currently supports a heterogeneous set of 3D sensors. In this work, we describe its user-friendly calibration procedure, which consists of simple steps with real-time feedback and yields accurate estimates of the camera poses that are then used for tracking people. On top of a calibration based on moving a checkerboard within the tracking space and on a global optimization of camera and checkerboard poses, a novel procedure which aligns people detections coming from all sensors in an x-y-time space is used to refine the camera poses. While people detection is executed locally, on the machines connected to each sensor, tracking is performed by a single node which takes into account detections from all over the network. Here we detail how a cascade of algorithms working on depth point clouds and on color, infrared and disparity images is used to perform people detection from different types of sensors and in any indoor lighting condition. We present experiments showing that a considerable improvement can be obtained with the proposed calibration refinement procedure that exploits people detections, and we compare Kinect v1, Kinect v2 and Mesa SR4500 performance for people tracking applications. OpenPTrack is based on the Robot Operating System and the Point Cloud Library and has already been adopted in networks composed of up to ten imagers for interactive arts, education, culture and human–robot interaction applications.
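
    The refinement step above aligns people detections from all sensors in an x-y-time space. As a minimal sketch of that idea (not the actual OpenPTrack implementation), the ground-plane offset between two cameras can be estimated from time-synchronized detections of the same person with a least-squares rigid fit:

        import numpy as np

        def fit_rigid_2d(src, dst):
            # Least-squares 2D rigid transform (R, t) mapping src -> dst.
            # src, dst: (N, 2) arrays of time-synchronized ground-plane detections
            # of the same person seen by two cameras (illustrative only).
            mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
            H = (src - mu_s).T @ (dst - mu_d)    # cross-covariance of centered points
            U, _, Vt = np.linalg.svd(H)
            R = Vt.T @ U.T
            if np.linalg.det(R) < 0:             # guard against reflections
                Vt[-1] *= -1
                R = Vt.T @ U.T
            t = mu_d - R @ mu_s
            return R, t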

    Optical techniques for 3D surface reconstruction in computer-assisted laparoscopic surgery

    One of the main challenges for computer-assisted surgery (CAS) is to determine the intra-operative morphology and motion of soft tissues. This information is a prerequisite for the registration of multi-modal patient-specific data, for enhancing the surgeon's navigation capabilities by observing beyond exposed tissue surfaces, and for providing intelligent control of robotic-assisted instruments. In minimally invasive surgery (MIS), optical techniques are an increasingly attractive approach for in vivo 3D reconstruction of the soft-tissue surface geometry. This paper reviews the state-of-the-art methods for optical intra-operative 3D reconstruction in laparoscopic surgery and discusses the technical challenges and future perspectives towards clinical translation. With the recent paradigm shift of surgical practice towards MIS and new developments in 3D optical imaging, this is a timely discussion about technologies that could facilitate complex CAS procedures in dynamic and deformable anatomical regions.

    Precise Depth Image Based Real-Time 3D Difference Detection

    3D difference detection is the task of verifying whether the 3D geometry of a real object exactly corresponds to a 3D model of this object. This thesis introduces real-time 3D difference detection with a hand-held depth camera. In contrast to previous works, with the proposed approach, geometric differences can be detected in real time and from arbitrary viewpoints. The scan position can therefore be changed on the fly, during the 3D scan, so the user can move closer to the object to inspect details or to bypass occlusions. The main research questions addressed by this thesis are: Q1: How can 3D differences be detected in real time and from arbitrary viewpoints using a single depth camera? Q2: Extending the first question, how can 3D differences be detected with high precision? Q3: Which accuracy can be achieved with concrete setups of the proposed concept for real-time, depth image based 3D difference detection? This thesis answers Q1 by introducing a real-time approach for depth image based 3D difference detection. The real-time difference detection is based on an algorithm which maps the 3D measurements of a depth camera onto an arbitrary 3D model in real time by fusing computer vision (depth imaging and pose estimation) with a computer graphics based analysis-by-synthesis approach. The thesis then answers Q2 by providing solutions for enhancing the 3D difference detection accuracy, both through precise pose estimation and by reducing depth measurement noise. A precise variant of the 3D difference detection concept is proposed, which combines two main aspects. First, the precision of the depth camera's pose estimation is improved by coupling the depth camera with a very precise coordinate measuring machine. Second, measurement noise of the captured depth images is reduced, and missing depth information is filled in, by extending the 3D difference detection with 3D reconstruction. The accuracy of the proposed 3D difference detection is quantified by a quantitative evaluation, which provides an answer to Q3. The accuracy is evaluated both for the basic setup and for the variants that focus on high precision. The quantitative evaluation using real-world data covers both the accuracy which can be achieved with a time-of-flight camera (SwissRanger 4000) and with a structured-light depth camera (Kinect). With the basic setup and the structured-light depth camera, differences of 8 to 24 millimeters can be detected from one meter measurement distance. With the enhancements proposed for precise 3D difference detection, differences of 4 to 12 millimeters can be detected from one meter measurement distance using the same depth camera. By addressing the three research questions, this thesis provides a solution for precise, real-time 3D difference detection based on depth images. With the proposed approach, dense 3D differences can be detected in real time and from arbitrary viewpoints using a single depth camera. Furthermore, by coupling the depth camera with a coordinate measuring machine and by integrating 3D reconstruction into the 3D difference detection, 3D differences can be detected in real time and with high precision.
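
    The analysis-by-synthesis comparison at the core of the approach can be pictured as follows: render a synthetic depth image of the reference 3D model at the tracked camera pose and threshold the per-pixel deviation against the captured depth image. The sketch below shows only that comparison step and assumes the synthetic depth map has already been rendered (the renderer, pose tracking and noise reduction are omitted); the threshold value is illustrative, not taken from the thesis:

        import numpy as np

        def detect_differences(captured_depth, synthetic_depth, threshold_m=0.01):
            # Flag pixels where the measured geometry deviates from the 3D model.
            # synthetic_depth is a metric depth map rendered from the reference
            # model at the tracked camera pose; both maps have the same resolution.
            valid = (captured_depth > 0) & (synthetic_depth > 0)   # ignore missing depth
            deviation = np.where(valid, captured_depth - synthetic_depth, 0.0)
            mask = valid & (np.abs(deviation) > threshold_m)
            return mask, deviation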

    Task-oriented viewpoint planning for free-form objects

    A thesis submitted to the Universitat Politècnica de Catalunya to obtain the degree of Doctor of Philosophy. Doctoral programme: Automatic Control, Robotics and Computer Vision. This thesis was completed at: Institut de Robòtica i Informàtica Industrial, CSIC-UPC. This thesis deals with active sensing and its use in real exploration tasks under both scene ambiguities and measurement uncertainties. While object modeling is the implicit objective of most active sensing algorithms, in this work we have explored new strategies to deal with more generic and more complex tasks. Active sensing requires the ability to move the perceptual system to gather new information. Our approach uses a robot manipulator with a 3D Time-of-Flight (ToF) camera attached to the end-effector. As a complex task, we have focused our attention on plant phenotyping. Plants are complex objects, with leaves that change their position and size over time. Valid viewpoints for a certain plant are hardly valid for a different one, even one belonging to the same species. Some instruments, such as chlorophyll meters or disk sampling tools, require being precisely positioned over a particular location of the leaf. Therefore, their use requires the modeling of specific regions of interest of the plant, including also the free space needed for avoiding obstacles and approaching the leaf with the tool. It is easy to observe that predefined camera trajectories are not valid here, and that usually with a single view it is very difficult to acquire all the required information. The overall objective of this thesis is to solve complex active sensing tasks by embedding their exploratory goal into a pre-estimated geometrical model, using information gain as the fundamental guideline for the reward function. The main contributions can be divided into two groups: first, the evaluation of ToF cameras and their calibration to assess the uncertainty of the measurements (presented in Part I); and second, the proposal of a framework capable of embedding the task, modeled as free and occupied space, which takes into account the modeled sensor's uncertainty to improve the action selection algorithm (presented in Part II). This thesis has given rise to 14 publications, including 5 in indexed journals, and its results have been used in the GARNICS European project. The complete framework is based on the Next-Best-View methodology and can be summarized in the following main steps. First, an initial view of the object (e.g., a plant) is acquired. From this initial view and given a set of candidate viewpoints, the expected gain obtained by moving the robot and acquiring the next image is computed. This computation takes into account the uncertainty of all the different pixels of the sensor, the expected information based on a predefined task model, and the possible occlusions. Once the most promising view is selected, the robot moves, takes a new image, integrates this information into the model, and evaluates again the set of remaining views. Finally, the task terminates when enough information is gathered. In our examples, this process enables the robot to perform a measurement on top of a leaf. The key ingredient is to model the complexity of the task in a layered representation of free-occupied occupancy grid maps.
This allows the requirements of the task to be naturally encoded, the belief state to be maintained and updated with the measurements performed, the expected gains of all potential viewpoints to be simulated and computed, and the termination condition to be encoded. During this work the technology of ToF cameras has evolved enormously. Nowadays it is very popular and ToF cameras are already embedded in some consumer devices. Although the quality of the measurements has improved considerably, it is still not uniform across the sensor. We believe, as demonstrated in various experiments in this work, that careful modeling of the sensor's uncertainty is highly beneficial and helps to design better decision systems. In our case, it enables a more realistic computation of the information gain measure and, consequently, a better selection criterion. This work has been partially supported by a JAE fellowship of the Spanish Scientific Research Council (CSIC), the Spanish Ministry of Science and Innovation, the Catalan Research Commission and the European Commission under the research projects: DPI2008-06022: PAU: Percepción y acción ante incertidumbre; DPI2011-27510: PAU+: Perception and Action in Robotics Problems with Large State Spaces; 201350E102: MANIPlus: Manipulación robotizada de objetos deformables; 2009-SGR-155: SGR ROBÒTICA: Grup de recerca consolidat - Grup de Robòtica; FP6-2004-IST-4-27657: EU PACO PLUS project; FP7-ICT-2009-4-247947: GARNICS: Gardening with a cognitive system; FP7-ICT-2009-6-269959: IntellAct: Intelligent observation and execution of Actions and manipulations. Peer Reviewed
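
    The Next-Best-View loop described above can be condensed to: for every candidate viewpoint, predict which occupancy-grid cells it would observe and score it by the uncertainty (entropy) that a new measurement could remove, then move to the highest-scoring view. The following sketch keeps only that scoring step; visible_cells is a hypothetical helper standing in for the ray casting, occlusion handling and per-pixel sensor uncertainty model developed in the thesis:

        import numpy as np

        def cell_entropy(p):
            # Binary entropy of an occupancy probability, in bits.
            p = np.clip(p, 1e-6, 1 - 1e-6)
            return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

        def next_best_view(candidate_views, grid, visible_cells):
            # Pick the view with the largest expected information gain.
            # visible_cells(view, grid) is a hypothetical helper that returns the
            # occupancy probabilities of the grid cells the view would observe.
            def gain(view):
                probs = visible_cells(view, grid)
                return np.sum(cell_entropy(probs))   # entropy the view could remove
            return max(candidate_views, key=gain)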

    Vision-Based 2D and 3D Human Activity Recognition
