263 research outputs found
RGB-D datasets using microsoft kinect or similar sensors: a survey
RGB-D data has proven to be a very useful representation of indoor scenes for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, with those of the depth image, which is immune to variations in color, illumination, rotation angle, and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially designed for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, and these are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications, including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset, and compare the popularity and difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms.
Gaze-tracking-based interface for robotic chair guidance
This research focuses on solutions to enhance the quality of life of wheelchair users, specifically by applying a gaze-tracking-based interface to the guidance of a robotized wheelchair. The interface was applied in two different approaches to the wheelchair control system. The first was an assisted control in which the user was continuously involved in controlling the movement of the wheelchair in the environment, and the inclination of the different parts of the seat, through the user's gaze and eye blinks obtained with the interface. The second approach took the first steps towards applying the device to an autonomous wheelchair control, in which the wheelchair moves autonomously, avoiding collisions, towards a position defined by the user. To this end, the basis for obtaining the gaze position relative to the wheelchair, together with object detection, was developed in this project, so that the optimal route along which the wheelchair should move can be computed in the future. The integration of a robotic arm into the wheelchair to manipulate different objects was also considered: in this work, the object of interest indicated by the user's gaze is identified among the detected objects, so that in the future the robotic arm could select and pick up the object the user wants to manipulate. In addition to the two approaches, an attempt was also made to estimate the user's gaze without the software interface. For this purpose, the gaze is obtained from pupil-detection libraries, a calibration procedure, and a mathematical model that relates pupil positions to gaze. The results of the implementations are analysed in this work, including some limitations encountered, and future improvements are proposed with the aim of increasing the independence of wheelchair users.
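The last approach mentioned above, estimating gaze from pupil detections via a calibration and a mathematical model, is commonly realised as a polynomial regression from pupil coordinates to gaze coordinates. Below is a minimal sketch of that generic technique; the second-order feature set and the synthetic calibration data are illustrative assumptions, not the model actually used in the project:

```python
import numpy as np

def poly_features(px, py):
    # Second-order polynomial features of the pupil centre (px, py),
    # a common choice for 2D gaze-calibration models.
    return np.array([1.0, px, py, px * py, px**2, py**2])

def fit_gaze_model(pupil_pts, gaze_pts):
    # Least-squares fit of one polynomial per gaze coordinate,
    # from pupil/gaze pairs collected during calibration.
    A = np.array([poly_features(px, py) for px, py in pupil_pts])
    coeffs, *_ = np.linalg.lstsq(A, np.array(gaze_pts), rcond=None)
    return coeffs  # shape (6, 2): one column per gaze coordinate

def predict_gaze(coeffs, px, py):
    return poly_features(px, py) @ coeffs

# Calibration pairs: normalised pupil centres and the screen points fixated.
# Synthetic data from a known linear map, so the fit can be checked exactly.
pupil = [(0.1, 0.1), (0.9, 0.1), (0.1, 0.9), (0.9, 0.9),
         (0.5, 0.5), (0.3, 0.7), (0.7, 0.3), (0.5, 0.1)]
gaze = [(1000 * px, 800 * py) for px, py in pupil]
C = fit_gaze_model(pupil, gaze)
gx, gy = predict_gaze(C, 0.4, 0.6)  # → approximately (400.0, 480.0)
```

In practice the calibration points would come from the user fixating known targets while pupil centres are recorded, and the fit would be overdetermined and noisy rather than exact.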
HeadOn: Real-time Reenactment of Human Portrait Videos
We propose HeadOn, the first real-time source-to-target reenactment approach
for complete human portrait videos that enables transfer of torso and head
motion, face expression, and eye gaze. Given a short RGB-D video of the target
actor, we automatically construct a personalized geometry proxy that embeds a
parametric head, eye, and kinematic torso model. A novel real-time reenactment
algorithm employs this proxy to photo-realistically map the captured motion
from the source actor to the target actor. On top of the coarse geometric
proxy, we propose a video-based rendering technique that composites the
modified target portrait video via view- and pose-dependent texturing, and
creates photo-realistic imagery of the target actor under novel torso and head
poses, facial expressions, and gaze directions. To this end, we propose a
robust tracking of the face and torso of the source actor. We extensively
evaluate our approach and show that it enables much greater flexibility in
creating realistic reenacted output videos.
Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g. Presented at Siggraph'1
Low-cost techniques for patient positioning in percutaneous radiotherapy using an optical imaging system
Patient positioning is an important part of radiation therapy, which is one of the main treatments for malignant tissue in the human body. Currently, the most common patient-positioning methods expose healthy tissue of the patient's body to extra, dangerous radiation. Other non-invasive positioning methods are either not very accurate or very costly for an average hospital. In this thesis, we explore the possibility of developing a system composed of affordable hardware and advanced computer vision algorithms that facilitates patient positioning. Our algorithms are based on affordable RGB-D sensors, image features, ArUco planar markers, and other geometry-registration methods. Furthermore, we take advantage of consumer-level computing hardware to make our systems widely accessible. More specifically, we avoid approaches that require dedicated GPU hardware for general-purpose computing, since such hardware is more costly. In different publications, we explore the use of the mentioned tools to increase the accuracy of reconstruction and localization of the patient in their pose. We also consider the visualization of the patient's target position with respect to their current position, in order to assist the person who performs the positioning. Furthermore, we use augmented reality in conjunction with a real-time 3D tracking algorithm for better interaction between the program and the operator. We also solve more fundamental problems concerning ArUco markers that could be used in the future to improve our systems. These include high-quality multi-camera calibration and mapping using ArUco markers, as well as the detection of these markers in event cameras, which are very useful in the presence of fast camera movement.
In the end, we conclude that it is possible to increase the accuracy of 3D reconstruction and localization by combining current computer vision algorithms and fiducial planar markers with RGB-D sensors. This is reflected in the low error we have achieved in our experiments on patient positioning, pushing forward the state of the art for this application.
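The geometry-registration idea behind this work, aligning the patient's current surface with the reference pose recorded at planning time, can be illustrated with the standard Kabsch rigid-alignment algorithm between two corresponding 3D point sets. This is a generic sketch of the technique, not the thesis's actual pipeline:

```python
import numpy as np

def rigid_align(P, Q):
    # Kabsch algorithm: find rotation R and translation t minimising
    # ||R @ P + t - Q|| over corresponding 3D point sets P, Q (shape 3xN).
    cp, cq = P.mean(axis=1, keepdims=True), Q.mean(axis=1, keepdims=True)
    H = (P - cp) @ (Q - cq).T                 # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

# Synthetic check: rotate/translate a point set, then recover the transform.
rng = np.random.default_rng(0)
P = rng.random((3, 50))
angle = 0.3
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
Q = Rz @ P + np.array([[0.1], [0.2], [0.3]])
R, t = rigid_align(P, Q)
err = np.abs(R @ P + t - Q).max()  # ≈ 0
```

Real patient surfaces lack exact point-to-point correspondences, so in practice a step like this sits inside an iterative scheme (e.g. ICP) or is anchored by fiducial markers such as ArUco.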
Estimating Head Measurements from 3D Point Clouds
Human head measurements are valuable in ergonomics, acoustics, medicine, computer
vision, and computer graphics, among other fields. Such measurements are usually obtained through entirely or partially manual procedures, which is cumbersome, since the level of accuracy depends on the expertise of the person who takes the measurements. Moreover, manually acquired measurements contain less information from which new measurements can be deduced once the subject is no longer accessible. To overcome these disadvantages, an approach to automatically estimate measurements from 3D point clouds, which are long-term representations of humans, has been developed and is described in this manuscript. The 3D point clouds were acquired using an Asus Xtion Pro Live RGB-D sensor and KinFu (the open-source implementation of KinectFusion). Qualitative and quantitative evaluations of the estimated
measurements are presented. Furthermore, the feasibility of the developed approach was
evaluated through a case study in which the estimated measurements were used to appraise the influence of anthropometric data on the computation of the interaural time
difference.
Considering the promising results obtained from the estimation of measurements from 3D models acquired with the Asus Xtion Pro Live sensor and KinFu (plus the results reported in the literature) and the development of new RGB-D sensors, a study of the influence of seven different RGB-D sensors on the reconstruction obtained with KinFu is also presented. This study contains qualitative and quantitative evaluations of reconstructions of four diverse objects captured at distances ranging from 40 cm to 120 cm. This range was established according to the operational range of the sensors. Furthermore, a collection of the obtained reconstructions is available as a dataset at http://uni-tuebingen.de/en/138898
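As an illustration of how a measurement can be estimated from such a point cloud, one simple scheme snaps approximate anatomical landmark positions onto the reconstructed cloud and measures the Euclidean distance between the snapped points. This is a hypothetical sketch of that generic idea, not the method evaluated in the manuscript:

```python
import numpy as np

def snap_to_cloud(cloud, point):
    # Return the cloud point nearest to an approximate landmark position.
    return cloud[np.argmin(np.linalg.norm(cloud - point, axis=1))]

def measure(cloud, landmark_a, landmark_b):
    # A head measurement as the Euclidean distance between two landmarks
    # snapped onto the reconstructed point cloud.
    a = snap_to_cloud(cloud, landmark_a)
    b = snap_to_cloud(cloud, landmark_b)
    return np.linalg.norm(a - b)

# Synthetic cloud: points on a sphere of radius 0.09 m (roughly head-sized).
rng = np.random.default_rng(1)
v = rng.normal(size=(5000, 3))
cloud = 0.09 * v / np.linalg.norm(v, axis=1, keepdims=True)

# "Head breadth" estimated between rough left/right landmark guesses,
# which snap to the leftmost/rightmost regions of the cloud.
breadth = measure(cloud, np.array([-0.2, 0.0, 0.0]),
                  np.array([0.2, 0.0, 0.0]))  # ≈ 0.18 m
```

Real measurements such as circumferences would additionally require geodesic paths over the surface rather than straight-line distances.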
Automatic Pipeline Surveillance Air-Vehicle
This thesis presents the development of a vision-based system for
aerial pipeline right-of-way surveillance using optical/infrared sensors
mounted on Unmanned Aerial Vehicles (UAVs). The aim of the research is to
develop a highly automated, on-board system for detecting and following
pipelines while simultaneously detecting any third-party interference. The
proposed approach of using a UAV platform could potentially reduce the cost
of monitoring and surveying pipelines compared to manned aircraft. The main
contributions of this thesis are the development of the image-analysis
algorithms, the overall system architecture, and validation in hardware
based on a scaled-down test environment.
To evaluate the performance of the system, the algorithms were coded in the
Python programming language. A small-scale test rig of the pipeline
structure, as well as of the expected third-party interference, was set up
to simulate the operational environment and to capture/record data for
algorithm testing and validation.
The pipeline endpoints are identified by transforming the 16-bit depth data
of the explored environment into 3D point-cloud world coordinates. Then,
using the Random Sample Consensus (RANSAC) approach, the foreground and
background are separated in the transformed 3D point cloud in order to
extract the plane that corresponds to the ground. Simultaneously, the
boundaries of the explored environment are detected in the 16-bit depth
data using a Canny detector. These boundaries, after being transformed into
a 3D point cloud, are filtered according to the real height of the
pipeline, for fast and accurate measurement, using the Euclidean distance
of each boundary point relative to the previously extracted ground plane.
The filtered boundaries, once transformed back into 16-bit depth data, are
used to detect the straight lines of the object boundary (Hough lines)
using the Hough transform. The pipeline is then verified by estimating a
centre-line segment from the 3D point cloud of each pair of Hough line
segments (transformed into 3D). The pipeline point cloud is then filtered
for linearity within the width of the pipeline, using Euclidean distances
in the foreground point cloud, and finally the detected centre-line segment
is extended along the filtered point cloud to match the exact pipeline
segment.
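The RANSAC ground-plane extraction step described above can be sketched as a plain plane fit: repeatedly sample three points, fit a plane through them, and keep the plane with the most inliers. A minimal illustration on synthetic data; the scene layout, iteration count, and distance threshold are illustrative assumptions, not the thesis's parameters:

```python
import numpy as np

def ransac_plane(points, n_iters=200, threshold=0.01, rng=None):
    # RANSAC plane fit: the best plane is the one with the most points
    # within `threshold` metres of it; returns (normal, point) and inliers.
    rng = rng or np.random.default_rng(0)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(n_iters):
        a, b, c = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(b - a, c - a)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:
            continue  # degenerate (collinear) sample
        normal = normal / norm
        dist = np.abs((points - a) @ normal)  # point-to-plane distances
        inliers = dist < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (normal, a)
    return best_plane, best_inliers

# Synthetic scene: a flat ground plane plus an elevated "pipeline" of points.
rng = np.random.default_rng(2)
ground = np.column_stack([rng.uniform(0, 2, 800), rng.uniform(0, 2, 800),
                          rng.normal(0, 0.002, 800)])   # z ≈ 0
pipe = np.column_stack([rng.uniform(0, 2, 200), np.full(200, 1.0),
                        rng.uniform(0.25, 0.35, 200)])  # elevated points
scene = np.vstack([ground, pipe])
(plane_normal, _), inliers = ransac_plane(scene)
# Ground points become inliers; the elevated pipeline stays in the foreground.
```

Separating the scene this way leaves the off-plane points as the foreground from which the pipeline boundaries and centre line can then be extracted.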
Third-party interference is detected based on four parameters, namely:
foreground depth data; pipeline depth data; the location of the pipeline
endpoints in the 3D point cloud; and the right-of-way distance. The
techniques include detection, classification, and localization algorithms.
Finally, a waypoint-based navigation system was implemented for the
air vehicle to fly over course waypoints generated online, via a
heading-angle demand, so as to follow the pipeline structure in real time
based on the online identification of the pipeline endpoints relative to
the camera frame.
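A heading-angle demand towards the next waypoint is conventionally computed with `atan2`, with the tracking error wrapped to the shortest signed angle. A generic sketch of that convention, not the thesis's implementation:

```python
import math

def heading_demand(vehicle_xy, waypoint_xy):
    # Heading-angle demand (radians, measured from the +x axis) that
    # points the vehicle at the next waypoint along the pipeline.
    dx = waypoint_xy[0] - vehicle_xy[0]
    dy = waypoint_xy[1] - vehicle_xy[1]
    return math.atan2(dy, dx)

def heading_error(demand, current):
    # Shortest signed angular error, wrapped into (-pi, pi], suitable
    # as the input to a turn-rate controller.
    return math.atan2(math.sin(demand - current), math.cos(demand - current))

demand = heading_demand((0.0, 0.0), (1.0, 1.0))  # → pi/4
err = heading_error(demand, math.pi)             # → -3*pi/4
```

The sin/cos wrapping avoids the discontinuity at ±pi that a naive subtraction of headings would produce.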
Peer Attention Modeling with Head Pose Trajectory Tracking Using Temporal Thermal Maps
Human head pose trajectories can represent a wealth of implicit information, such as areas of attention, body language, potential future actions, and more. This signal is of high value in Human-Robot teams because of the implicit information encoded within it. Although team-based tasks require both explicit and implicit communication among peers, large team sizes, noisy environments, distance, and mission urgency can inhibit the frequency and quality of explicit communication. The goal of this thesis is to improve the capabilities of Human-Robot teams by making use of implicit communication. In support of this goal, the following hypotheses are investigated:
● Implicit information about a human subject’s attention can be reliably extracted with software by tracking the subject’s head pose trajectory, and
● Attention can be represented with a 3D temporal thermal map for implicitly determining a subject’s Objects Of Interest (OOIs).
These hypotheses are investigated by experimentation with a new tool for peer attention modeling by Head Pose Trajectory Tracking using Temporal Thermal Maps (HPT4M). This system allows a robot Observing Agent (OA) to view a human teammate and temporally model their Regions Of Interest (ROIs) by generating a 3D thermal map based on the subject’s head pose trajectory.
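A 3D temporal thermal map of this kind can be pictured as a voxel grid in which heat is deposited along each head-pose ray and all heat decays over time, so regions attended recently and frequently stay hottest. A toy sketch under those assumptions (the grid size, decay rate, and ray-marching step are all illustrative, not HPT4M's actual parameters):

```python
import numpy as np

class ThermalMap:
    # Coarse 3D "temporal thermal map": voxels accumulate heat wherever
    # the subject's head-pose ray passes, and all heat decays each update.
    def __init__(self, shape=(20, 20, 20), voxel_size=0.1, decay=0.95):
        self.grid = np.zeros(shape)
        self.voxel_size = voxel_size
        self.decay = decay

    def update(self, origin, direction, step=0.05, max_range=2.0):
        self.grid *= self.decay                    # temporal decay
        direction = direction / np.linalg.norm(direction)
        for r in np.arange(0.0, max_range, step):  # march along the ray
            pos = origin + r * direction
            idx = tuple(int(p // self.voxel_size) for p in pos)
            if all(0 <= i < s for i, s in zip(idx, self.grid.shape)):
                self.grid[idx] += 1.0

    def hottest_voxel(self):
        # The current best guess for the subject's region of interest.
        return np.unravel_index(self.grid.argmax(), self.grid.shape)

tmap = ThermalMap()
for _ in range(10):  # subject repeatedly looks along +x from the grid edge
    tmap.update(np.array([0.05, 0.55, 0.55]), np.array([1.0, 0.0, 0.0]))
hot = tmap.hottest_voxel()  # lies on the repeated gaze ray (y, z index 5)
```

In a full system the heat would be deposited on occupied voxels hit by the ray rather than along its whole length, and the hottest regions would be matched against detected objects to label OOIs.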
The findings in this work are that HPT4M can be used by an OA to contribute to a team search mission by implicitly discovering a human subject’s OOI type, mapping the item’s location within the searched space, and labeling the item’s discovery state. Furthermore, this work discusses some of the discovered limitations of this technology and hurdles that must be overcome before implementing HPT4M in a reliable real-world system.
Finally, the techniques used in this work are provided as an open-source Robot Operating System (ROS) node at github.com/HPT4M, with the intent that it will aid other developers in the robotics community in improving Human-Robot teams. Furthermore, the proofs of principle and tools developed in this thesis form a foundational platform for deeper investigation in future research on improving Human-Robot teams via implicit communication techniques.