
    RGB-D datasets using microsoft kinect or similar sensors: a survey

    RGB-D data has proven to be a very useful representation of indoor scenes for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, with those of the depth image, which is immune to variations in color, illumination, rotation angle, and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially designed for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, and these are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications, including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping (SLAM), and pose estimation. We provide insights into the characteristics of each important dataset and compare the popularity and difficulty of these datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms.

    Gaze-tracking-based interface for robotic chair guidance

    This research focuses on solutions to enhance the quality of life of wheelchair users, specifically by applying a gaze-tracking-based interface to the guidance of a robotized wheelchair. The interface was applied in two different approaches to the wheelchair control system. The first was an assisted control in which the user was continuously involved in controlling the wheelchair's movement in the environment and the inclination of the different parts of the seat, through gaze and eye blinks obtained with the interface. The second approach took the first steps towards an autonomous wheelchair control in which the wheelchair moves autonomously, avoiding collisions, towards a position defined by the user. To this end, this project developed the basis for obtaining the gaze position relative to the wheelchair and for object detection, so that the optimal route along which the wheelchair should move can be calculated in the future. In addition, the integration of a robotic arm into the wheelchair to manipulate objects was considered: this work identifies, among the detected objects, the object of interest indicated by the user's gaze, so that in the future the robotic arm could select and pick up the object the user wants to manipulate. Beyond these two approaches, an attempt was also made to estimate the user's gaze without the software interface; here, gaze is obtained from pupil-detection libraries, a calibration procedure, and a mathematical model that relates pupil positions to gaze. The results of these implementations are analysed in this work, including some limitations encountered, and future improvements are proposed with the aim of increasing the independence of wheelchair users.
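    As an illustration of the last step, a pupil-position-to-gaze calibration of this kind can be sketched in a few lines. This is a minimal sketch, assuming a second-order polynomial model fitted by least squares to a set of known calibration targets; the feature set and all function names are illustrative, not the exact model used in this work.

```python
# Minimal sketch (assumed model, not the one from this work): map pupil
# coordinates to gaze coordinates with a 2nd-order polynomial fitted by
# least squares on calibration data.
import numpy as np

def poly_features(p):
    """Quadratic feature vector for a pupil position p = (px, py)."""
    px, py = p
    return np.array([1.0, px, py, px * py, px ** 2, py ** 2])

def fit_gaze_model(pupil_pts, gaze_pts):
    """Least-squares fit of gaze = features(pupil) @ coeffs.

    pupil_pts, gaze_pts: (N, 2) arrays collected while the user fixates
    N known calibration targets (e.g. a 9-point grid).
    """
    X = np.stack([poly_features(p) for p in pupil_pts])      # (N, 6)
    coeffs, *_ = np.linalg.lstsq(X, gaze_pts, rcond=None)    # (6, 2)
    return coeffs

def predict_gaze(coeffs, pupil_pt):
    """Estimated gaze position for one pupil sample."""
    return poly_features(pupil_pt) @ coeffs
```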

    HeadOn: Real-time Reenactment of Human Portrait Videos

    We propose HeadOn, the first real-time source-to-target reenactment approach for complete human portrait videos that enables the transfer of torso and head motion, facial expression, and eye gaze. Given a short RGB-D video of the target actor, we automatically construct a personalized geometry proxy that embeds a parametric head, eye, and kinematic torso model. A novel real-time reenactment algorithm employs this proxy to photo-realistically map the captured motion from the source actor to the target actor. On top of the coarse geometric proxy, we propose a video-based rendering technique that composites the modified target portrait video via view- and pose-dependent texturing and creates photo-realistic imagery of the target actor under novel torso and head poses, facial expressions, and gaze directions. To this end, we propose robust tracking of the face and torso of the source actor. We extensively evaluate our approach and show that it enables much greater flexibility in creating realistic reenacted output videos. Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g. Presented at SIGGRAPH 2018.

    Low-cost techniques for patient positioning in percutaneous radiotherapy using an optical imaging system

    Patient positioning is an important part of radiation therapy, which is one of the main treatments for malignant tissue in the human body. Currently, the most common patient positioning methods expose healthy tissue of the patient's body to extra dangerous radiation, and other non-invasive positioning methods are either not very accurate or too costly for an average hospital. In this thesis, we explore the possibility of developing a system comprised of affordable hardware and advanced computer vision algorithms that facilitates patient positioning. Our algorithms are based on the use of affordable RGB-D sensors, image features, ArUco planar markers, and other geometry registration methods. Furthermore, we take advantage of consumer-level computing hardware to make our systems widely accessible; more specifically, we avoid approaches that require dedicated GPU hardware for general-purpose computing, since they are more costly. Across several publications, we explore the use of these tools to increase the accuracy of reconstruction/localization of the patient's pose. We also address the visualization of the patient's target position with respect to their current position, in order to assist the person who performs the positioning, and we make use of augmented reality in conjunction with a real-time 3D tracking algorithm for better interaction between the program and the operator. In addition, we solve more fundamental problems concerning ArUco markers that could be used in the future to improve our systems: high-quality multi-camera calibration and mapping using ArUco markers, and detection of these markers in event cameras, which are very useful in the presence of fast camera movement. We conclude that it is possible to increase the accuracy of 3D reconstruction and localization by combining current computer vision algorithms and fiducial planar markers with RGB-D sensors. This is reflected in the low error we achieved in our experiments on patient positioning, pushing forward the state of the art for this application.
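    To illustrate the marker-based building block, the following is a minimal sketch of ArUco detection and camera-frame pose estimation with OpenCV's aruco module (it assumes opencv-contrib-python 4.7+ and its ArucoDetector API). The intrinsic matrix, distortion coefficients, and marker size are placeholder assumptions; the thesis' full system additionally fuses RGB-D data and registration methods not shown here.

```python
# Hedged sketch: ArUco marker detection and camera-frame pose estimation.
# Assumes opencv-contrib-python >= 4.7 (ArucoDetector API); the intrinsic
# matrix K, distortion coefficients, and marker size are placeholders.
import cv2
import numpy as np

K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])   # assumed camera intrinsics
DIST = np.zeros(5)                # assumed: negligible lens distortion
MARKER_SIZE = 0.05                # marker side length in metres (assumed)

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

def marker_poses(frame_bgr):
    """Detect markers and return {id: (rvec, tvec)} in the camera frame."""
    corners, ids, _rejected = detector.detectMarkers(frame_bgr)
    poses = {}
    if ids is None:
        return poses
    half = MARKER_SIZE / 2.0
    # 3D corners of a square marker centred at its own origin (z = 0 plane).
    obj = np.array([[-half, half, 0], [half, half, 0],
                    [half, -half, 0], [-half, -half, 0]], dtype=np.float32)
    for marker_id, c in zip(ids.flatten(), corners):
        ok, rvec, tvec = cv2.solvePnP(obj, c.reshape(-1, 2), K, DIST)
        if ok:
            poses[int(marker_id)] = (rvec, tvec)
    return poses
```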

    Estimating Head Measurements from 3D Point Clouds

    Human head measurements are valuable in ergonomics, acoustics, medicine, computer vision, and computer graphics, among other fields. Such measurements are usually obtained through entirely or partially manual procedures, which is a cumbersome practice since the level of accuracy depends on the expertise of the person who takes the measurements. Moreover, manually acquired measurements contain less information from which new measurements can be deduced once the subject is no longer accessible. To overcome these disadvantages, an approach to automatically estimate measurements from 3D point clouds, which are long-term representations of humans, has been developed and is described in this manuscript. The 3D point clouds were acquired using an Asus Xtion Pro Live RGB-D sensor and KinFu (an open-source implementation of KinectFusion). Qualitative and quantitative evaluations of the estimated measurements are presented. Furthermore, the feasibility of the developed approach was evaluated through a case study in which the estimated measurements were used to appraise the influence of anthropometric data on the computation of the interaural time difference. Considering the promising results obtained from the estimation of measurements from 3D models acquired with the Asus Xtion Pro Live sensor and KinFu (plus results reported in the literature) and the development of new RGB-D sensors, a study of the influence of seven different RGB-D sensors on the reconstruction obtained with KinFu is also presented. This study contains qualitative and quantitative evaluations of reconstructions of four diverse objects captured at distances ranging from 40 cm to 120 cm; this range was established according to the operational range of the sensors. Furthermore, a collection of the obtained reconstructions is available as a dataset at http://uni-tuebingen.de/en/138898.
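    As a toy illustration of why point clouds allow measurements to be re-derived later, the sketch below computes head extents along the cloud's principal axes. The axis conventions and the specific measurement definition are illustrative assumptions, not the procedure used in the manuscript.

```python
# Toy sketch (illustrative, not the manuscript's procedure): measure the
# extents of a head point cloud along its principal axes, so that new
# measurements can be re-derived later from the stored cloud.
import numpy as np

def head_extents(points):
    """points: (N, 3) array, already cropped to the head. Returns the
    extent along each principal axis in the cloud's units (e.g. metres)."""
    centered = points - points.mean(axis=0)
    # PCA via SVD: rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    aligned = centered @ vt.T            # rotate into the principal frame
    return aligned.max(axis=0) - aligned.min(axis=0)
```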

    Automatic Pipeline Surveillance Air-Vehicle

    This thesis presents the development of a vision-based system for aerial pipeline right-of-way surveillance using optical/infrared sensors mounted on Unmanned Aerial Vehicles (UAVs). The aim of this research is to develop a highly automated, on-board system for detecting and following pipelines while simultaneously detecting any third-party interference. The proposed approach of using a UAV platform could potentially reduce the cost of monitoring and surveying pipelines compared to manned aircraft. The main contributions of this thesis are the development of the image-analysis algorithms, the overall system architecture, and validation in hardware based on a scaled-down test environment. To evaluate the performance of the system, the algorithms were coded in the Python programming language. A small-scale test rig of the pipeline structure, together with expected third-party interference, was set up to simulate the operational environment and to capture/record data for algorithm testing and validation. The pipeline endpoints are identified by transforming the 16-bit depth data of the explored environment into 3D point-cloud world coordinates. Then, using the Random Sample Consensus (RANSAC) approach, the foreground and background are separated based on the transformed 3D point cloud to extract the plane that corresponds to the ground. Simultaneously, the boundaries of the explored environment are detected from the 16-bit depth data using a Canny detector. These boundaries, once transformed into a 3D point cloud, are then filtered based on the real height of the pipeline, using the Euclidean distance of each boundary point to the previously extracted ground plane, for fast and accurate measurements. The filtered boundaries, transformed back into 16-bit depth data, are used to detect the straight lines of the object boundary (Hough lines) using a Hough transform. The pipeline is verified by estimating a centre-line segment from the 3D point cloud of each pair of Hough line segments, and the linearity of the pipeline point cloud is checked by filtering the foreground point cloud by Euclidean distance within the width of the pipeline. The length of the detected centre-line segment is then extended along the filtered point cloud to match the exact pipeline segment. Third-party interference is detected based on four parameters, namely: foreground depth data, pipeline depth data, pipeline endpoint locations in the 3D point cloud, and right-of-way distance. The techniques include detection, classification, and localization algorithms. Finally, a waypoint-based navigation system was implemented so that the air vehicle flies over course waypoints generated online from a heading-angle demand, following the pipeline structure in real time based on the online identification of the pipeline endpoints relative to the camera frame.
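    The ground-plane extraction step can be illustrated with a short sketch. The version below uses Open3D's RANSAC plane segmentation followed by a height filter on candidate boundary points; Open3D itself, the thresholds, and the function names are assumptions for illustration, not the thesis' actual implementation.

```python
# Hedged sketch of the ground-plane step: RANSAC plane fitting on the
# depth-derived point cloud, then a height filter on boundary points.
# Open3D, the thresholds, and the pipeline height are assumptions.
import numpy as np
import open3d as o3d

def split_ground(points_xyz, dist_thresh=0.02):
    """Fit the dominant plane (assumed to be the ground) via RANSAC and
    split the cloud into ground and foreground points."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points_xyz)
    (a, b, c, d), inliers = pcd.segment_plane(
        distance_threshold=dist_thresh, ransac_n=3, num_iterations=1000)
    mask = np.zeros(len(points_xyz), dtype=bool)
    mask[inliers] = True
    return (a, b, c, d), points_xyz[mask], points_xyz[~mask]

def filter_by_height(points_xyz, plane, height, tol=0.05):
    """Keep boundary points whose distance to the ground plane matches
    the expected pipeline height (within tol metres)."""
    a, b, c, d = plane
    n = np.array([a, b, c])
    dist = np.abs(points_xyz @ n + d) / np.linalg.norm(n)
    return points_xyz[np.abs(dist - height) < tol]
```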

    Peer Attention Modeling with Head Pose Trajectory Tracking Using Temporal Thermal Maps

    Human head pose trajectories can represent a wealth of implicit information, such as areas of attention, body language, and potential future actions. This signal is of high value for Human-Robot teams because of the implicit information encoded within it. Although team-based tasks require both explicit and implicit communication among peers, large team sizes, noisy environments, distance, and mission urgency can inhibit the frequency and quality of explicit communication. The goal of this thesis is to improve the capabilities of Human-Robot teams by making use of implicit communication. In support of this goal, the following hypotheses are investigated:
    ● Implicit information about a human subject's attention can be reliably extracted in software by tracking the subject's head pose trajectory, and
    ● Attention can be represented with a 3D temporal thermal map for implicitly determining a subject's Objects Of Interest (OOIs).
    These hypotheses are investigated through experimentation with a new tool for peer attention modeling by Head Pose Trajectory Tracking using Temporal Thermal Maps (HPT4M). This system allows a robot Observing Agent (OA) to view a human teammate and temporally model their Regions Of Interest (ROIs) by generating a 3D thermal map based on the subject's head pose trajectory. The findings of this work are that HPT4M can be used by an OA to contribute to a team search mission by implicitly discovering a human subject's OOI type, mapping the item's location within the searched space, and labeling the item's discovery state. This work also discusses some of the discovered limitations of the technology and the hurdles that must be overcome before implementing HPT4M in a reliable real-world system. Finally, the techniques used in this work are provided as an open-source Robot Operating System (ROS) node at github.com/HPT4M, with the intent of aiding other developers in the robotics community in improving Human-Robot teams. The proofs of principle and tools developed in this thesis are a foundational platform for deeper investigation in future research on improving Human-Robot teams via implicit communication techniques.
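    A minimal sketch of the temporal thermal-map idea is shown below: heat is accumulated in a 3D voxel grid along the ray defined by each head-pose sample, with exponential decay over time, and the hottest voxel serves as a candidate OOI location. The grid resolution, decay rate, and step length are illustrative assumptions, not HPT4M's actual parameters.

```python
# Minimal sketch of a 3D temporal thermal map (parameters are illustrative
# assumptions, not HPT4M's): decay the grid each frame, then deposit heat
# along the ray given by the current head pose.
import numpy as np

GRID = np.zeros((64, 64, 64))   # voxel grid over the searched workspace
VOXEL = 0.1                     # voxel edge length in metres (assumed)
DECAY = 0.99                    # per-frame heat decay factor (assumed)

def update_thermal_map(origin, direction, max_range=5.0, step=0.05):
    """origin, direction: head position and view direction of the current
    head-pose sample, in the grid's world frame (metres)."""
    global GRID
    GRID *= DECAY
    direction = direction / np.linalg.norm(direction)
    for t in np.arange(0.0, max_range, step):
        idx = np.floor((origin + t * direction) / VOXEL).astype(int)
        if np.all(idx >= 0) and np.all(idx < GRID.shape):
            GRID[tuple(idx)] += 1.0

def hottest_voxel():
    """Candidate Object Of Interest location: the peak of the map."""
    return np.unravel_index(np.argmax(GRID), GRID.shape)
```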