    3-D Hand Pose Estimation from Kinect's Point Cloud Using Appearance Matching

    We present a novel appearance-based approach to estimating the pose of a human hand from the point clouds provided by the low-cost Microsoft Kinect sensor. Two cases are considered: the free-hand case, in which the hand is isolated from the surrounding environment, and the hand-object case, in which the different types of interaction are classified. The hand-object case is clearly the more challenging of the two, since it has to deal with multiple tracks. The proposed approach belongs to the class of partial pose estimation methods, in which the pose estimated in one frame is used to initialize the next. The pose is estimated by applying a modified version of the Iterative Closest Point (ICP) algorithm to synthetic models, which yields the rigid transformation that aligns each model with the input data. The framework uses a "pure" point cloud as provided by the Kinect sensor, without any other information such as RGB values or normal vector components. For this reason, the method can also be applied to data obtained from other types of depth sensors or RGB-D cameras.
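
    As an illustration of the alignment step described above, the following is a minimal sketch of plain point-to-point ICP. It is not the authors' modified algorithm: it works in 2-D instead of 3-D to keep the closed-form rotation short, uses brute-force nearest neighbours, and all function names (`apply_rigid`, `best_rigid_2d`, `icp_2d`) are hypothetical.

```python
import math


def apply_rigid(points, theta, tx, ty):
    """Rotate 2-D points by theta (radians), then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def nearest(p, cloud):
    """Brute-force nearest neighbour of p in cloud (squared Euclidean distance)."""
    return min(cloud, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def best_rigid_2d(src, dst):
    """Least-squares rigid transform mapping paired points src[i] -> dst[i]."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(q[0] for q in dst) / n
    cdy = sum(q[1] for q in dst) / n
    cos_term = 0.0
    sin_term = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - csx, py - csy, qx - cdx, qy - cdy
        cos_term += px * qx + py * qy
        sin_term += px * qy - py * qx
    theta = math.atan2(sin_term, cos_term)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    return theta, cdx - (c * csx - s * csy), cdy - (s * csx + c * csy)


def icp_2d(model, data, max_iters=50, tol=1e-12):
    """Align model to data; returns (theta, tx, ty, mean squared residual)."""
    theta = tx = ty = 0.0
    prev_err = float("inf")
    err = prev_err
    for _ in range(max_iters):
        moved = apply_rigid(model, theta, tx, ty)
        matches = [nearest(p, data) for p in moved]
        # Re-estimate the total transform from the original model points.
        theta, tx, ty = best_rigid_2d(model, matches)
        aligned = apply_rigid(model, theta, tx, ty)
        err = sum((a[0] - m[0]) ** 2 + (a[1] - m[1]) ** 2
                  for a, m in zip(aligned, matches)) / len(model)
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return theta, tx, ty, err
```

    In a frame-to-frame tracker of the kind the abstract describes, the transform found for one frame would serve as the starting value of `theta`, `tx`, `ty` for the next frame rather than the identity used here.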

    A fast and robust hand-driven 3D mouse

    The development of new interaction paradigms requires natural interaction: people should be able to interact with technology using the same means they use in everyday life, that is, gestures, expressions and voice. Following this idea, in this paper we propose a non-intrusive, vision-based tracking system able to capture hand motion and simple hand gestures. The proposed device allows the hand to be used as a "natural" 3D mouse, where the forefinger tip or the palm centre identifies a 3D marker and hand gestures simulate the mouse buttons. The approach is based on a monoscopic tracking algorithm that is computationally fast and robust against noise and cluttered backgrounds. Two image streams are processed in parallel, exploiting multi-core architectures, and their results are combined to obtain a constrained stereoscopic problem. The system has been implemented and thoroughly tested in an experimental environment in which the 3D hand mouse was used to interact with objects in a virtual reality application. We also report results on the performance of the tracker, which demonstrate the precision and robustness of the proposed system.

    Real-Time Markerless Tracking the Human Hands for 3D Interaction

    This thesis presents methods for enabling suitable human-computer interaction using only movements of the bare human hands in free space. This kind of interaction is natural and intuitive, particularly because actions familiar from everyday life can be carried over. Furthermore, the input is contact-free, which is a great advantage, for example, in medical applications because of hygiene requirements. Translating hand movements into control signals requires an automatic method for tracking the pose and/or posture of the hand. In this context, the simultaneous recognition of both hands is desirable to allow for more natural input. The first contribution of this thesis is a novel video-based method for real-time detection of the positions and orientations of both bare human hands, each in four different predefined postures. Based on such a system, novel interaction interfaces can be developed. However, the design of such interfaces is a non-trivial task, and the development of novel interaction techniques is often necessary to make efficient and easily operable interfaces possible at all. To this end, several novel interaction techniques are presented and investigated in this thesis, which solve existing problems and substantially improve the applicability of such a new input device. These techniques are not restricted to this input instrument and can also be employed to improve the handling of other interaction devices. Finally, several new interaction interfaces are described and analyzed to demonstrate possible applications of bare-handed interaction in specific interaction scenarios.

    3D hand tracking.

    The hand is often considered one of the most natural and intuitive modalities for human-to-human interaction. In human-computer interaction (HCI), proper 3D hand tracking is the first step towards a more intuitive HCI system that can be used in applications such as gesture recognition, virtual object manipulation and gaming. However, accurate 3D hand tracking remains a challenging problem because of the hand's deformation, appearance similarity, high inter-finger occlusion and complex articulated motion. 3D hand tracking is also interesting from a theoretical point of view, as it touches three major areas of computer vision: segmentation (of the hand), detection (of hand parts), and tracking (of the hand). This thesis proposes a region-based skin color detection technique, a model-based 3D hand tracking technique and an appearance-based 3D hand tracking technique to bring human-computer interaction applications one step closer. All techniques are briefly described below. Skin color provides a powerful cue for complex computer vision applications. Although skin color detection has been an active research area for decades, the mainstream technology is based on individual pixels. This thesis presents a new region-based technique for skin color detection which outperforms the current state-of-the-art pixel-based skin color detection technique on the popular Compaq dataset (Jones & Rehg 2002). The proposed technique achieves a 91.17% true positive rate with a 13.12% false positive rate on the Compaq dataset, tested over approximately 14,000 web images. Hand tracking is not a trivial task, as it requires tracking the 27 degrees of freedom of the hand. Hand deformation, self-occlusion, appearance similarity and irregular motion are major problems that make 3D hand tracking very challenging. This thesis proposes a model-based 3D hand tracking technique, which is improved by the proposed depth-foreground-background feature, a palm deformation module and a context cue.
However, the major problem of model-based techniques is that they are computationally expensive. This can be overcome by discriminative techniques, as described below. Discriminative techniques (for example, random forests) are good at hand part detection, but they fail under sensor noise and high inter-finger occlusion. Additionally, these techniques have difficulty modelling kinematic or temporal constraints. Although model-based descriptive (for example, Markov Random Field) or generative (for example, Hidden Markov Model) techniques utilize kinematic and temporal constraints well, they are computationally expensive and rarely recover from tracking failure. This thesis presents a unified framework for 3D hand tracking that uses the best of both methodologies and outperforms the current state-of-the-art 3D hand tracking techniques. The 3D hand tracking techniques proposed in this thesis can be used to extract accurate hand movement features and enable complex human-machine interaction such as gaming and virtual object manipulation.
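
    Detection rates of the kind quoted for the skin color detector are derived from pixel-level confusion counts. The following minimal sketch shows the definitions; the function name `detection_rates` and the counts in the usage note are hypothetical and are not taken from the thesis.

```python
def detection_rates(tp, fp, tn, fn):
    """Return detection rates from confusion counts.

    tp/fn count skin pixels that were detected/missed,
    fp/tn count non-skin pixels that were flagged/correctly rejected.
    """
    positives = tp + fn  # all true skin pixels
    negatives = fp + tn  # all true non-skin pixels
    return {
        "tpr": tp / positives,  # true positive rate (recall)
        "fnr": fn / positives,  # false negative rate, always 1 - tpr
        "fpr": fp / negatives,  # false positive rate
    }
```

    For example, `detection_rates(tp=90, fp=10, tn=190, fn=10)` gives a true positive rate of 0.9, a false negative rate of 0.1 and a false positive rate of 0.05. The true positive and false negative rates always sum to one, whereas the false positive rate is computed over the non-skin pixels and varies independently.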

    Markerless Motion Capture via Convolutional Neural Network

    A human motion capture system can be defined as a process that digitally records the movements of a person and then translates them into computer-animated images. To achieve this goal, motion capture systems usually exploit different types of algorithms, including techniques such as pose estimation and background subtraction; the latter aims at segmenting moving objects from the background under multiple challenging scenarios. Recently, encoder-decoder deep neural networks designed for this task have achieved impressive results, outperforming classical approaches. The aim of this thesis is to evaluate and discuss the predictions provided by the multi-scale convolutional neural network FgSegNet_v2, a deep learning-based method that represents the current state of the art for scene-specific background subtraction. In this work, FgSegNet_v2 is trained and tested on the BBSoF S.r.l. dataset, extending its scene-specific use to a more general application in several environments.

    Stochastic optimization and interactive machine learning for human motion analysis

    The analysis of human motion from visual data is a central issue in the computer vision research community: it enables a wide range of applications, yet it remains a challenging problem in unconstrained scenarios and general conditions. Human motion analysis is used in the entertainment industry for movie and video game production, and in medical applications for rehabilitation and biomechanical studies. It is also used for human-computer interaction in many kinds of environments and for large-scale analysis of data from social networks such as YouTube or Flickr, to mention some of its use cases. In this thesis we have studied human motion analysis techniques with a focus on their application in smart room environments. That is, we have studied methods that support the analysis of people's behavior in the room, that allow interaction with computers in a natural manner and, in general, that introduce computers into environments of human activity to enable new kinds of services in an unobtrusive way. The thesis is structured in two parts, which study 3D pose estimation from multiple views and the recognition of gestures using range sensors. First, we propose a generic framework for hierarchically layered particle filtering (HPF) that is especially suited to motion capture tasks. Human motion capture generally involves tracking or optimizing high-dimensional state vectors, and one also has to deal with multi-modal probability density functions (pdfs). HPF overcomes this problem by means of multiple passes through subsets of the state space variables. Then, based on the HPF framework, we propose a method to estimate the anthropometry of the subject, which in turn makes it possible to obtain a human body model adjusted to the subject. We also propose two new methods for human motion capture: APO, which is based on a new weighting function strategy for approximate partitioning of observations, and DD-HPF, which employs body part detections to improve particle propagation and weight evaluation. Both are integrated within the HPF framework. The second part of this thesis is centered on the detection of gestures, and focuses on the problem of reducing the annotation and training effort required to train a detector for a specific gesture. To reduce this effort, we propose a solution based on online random forests that allows training in real time, while new data is received in sequence. The main aspect that makes the solution effective is the method we propose for collecting hard negative examples while training the forests. The method applies the detector trained up to the current frame to that frame, and then collects samples based on the detector's response so that they are more relevant for training. In this manner, training is more effective in terms of the number of annotated frames required.
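
    The idea of making several passes over subsets of the state variables can be sketched as follows. This is a simplified, hypothetical illustration and not the thesis's HPF implementation: the function name, the Gaussian diffusion, the multinomial resampling and the fixed default seed are all assumptions made for the example.

```python
import random


def layered_particle_filter(likelihood, init, layers, n=500, sigma=0.5, rng=None):
    """One time step of a layered particle filter.

    likelihood: callable mapping a state (list of floats) to a non-negative weight.
    init:       initial state; every particle starts from a copy of it.
    layers:     list of index lists; each layer is a sub-state refined in one pass.
    Returns the mean of the particles after the final layer.
    """
    rng = rng or random.Random(0)  # fixed seed by default, for reproducibility
    particles = [list(init) for _ in range(n)]
    for layer in layers:
        # Diffuse only the variables that belong to the current layer.
        for p in particles:
            for i in layer:
                p[i] += rng.gauss(0.0, sigma)
        # Weight every particle against the observation model.
        weights = [likelihood(p) for p in particles]
        total = sum(weights)
        if total == 0.0:
            weights = [1.0 / n] * n  # degenerate case: fall back to uniform
        else:
            weights = [w / total for w in weights]
        # Multinomial resampling concentrates particles on likely sub-states.
        particles = [list(p) for p in rng.choices(particles, weights=weights, k=n)]
    dim = len(init)
    return [sum(p[i] for p in particles) / n for i in range(dim)]
```

    Each pass searches only a low-dimensional slice of the state, so far fewer particles are needed than for a single pass over the full state vector. In a body model, the layers might correspond to the torso first and the limbs afterwards, although that ordering is an assumption here.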

    Capturing Hands in Action using Discriminative Salient Points and Physics Simulation

    Hand motion capture is a popular research field, recently gaining more attention due to the ubiquity of RGB-D sensors. However, even the most recent approaches focus on the case of a single isolated hand. In this work, we focus on hands that interact with other hands or objects and present a framework that successfully captures motion in such interaction scenarios for both rigid and articulated objects. Our framework combines a generative model with discriminatively trained salient points to achieve a low tracking error, and with collision detection and physics simulation to achieve physically plausible estimates even in the case of occlusions and missing visual data. Since all components are unified in a single objective function which is almost everywhere differentiable, it can be optimized with standard optimization techniques. Our approach works for monocular RGB-D sequences as well as setups with multiple synchronized RGB cameras. For a qualitative and quantitative evaluation, we captured 29 sequences with a large variety of interactions and up to 150 degrees of freedom. Comment: Accepted for publication by the International Journal of Computer Vision (IJCV) on 16.02.2016 (submitted on 17.10.14). A combination into a single framework of an ECCV'12 multicamera-RGB and a monocular-RGBD GCPR'14 hand tracking paper with several extensions, additional experiments and details.

    Tracking hands in action for gesture-based computer input

    This thesis introduces new methods for markerless tracking of the full articulated motion of hands and for informing the design of gesture-based computer input. Emerging devices such as smartwatches or virtual/augmented reality glasses need new input devices for interaction on the move. The highly dexterous human hands could provide an always-on input capability without the need to carry a physical device. First, we present novel methods that address the hard computer vision problem of hand tracking under varying numbers of cameras, viewpoints, and run-time requirements. Second, we contribute to the design of gesture-based interaction techniques by presenting heuristic and computational approaches. The contributions of this thesis allow users to interact effectively with computers through markerless tracking of hands and objects in desktop, mobile, and egocentric scenarios.

