1,235 research outputs found

    Down-Sampling coupled to Elastic Kernel Machines for Efficient Recognition of Isolated Gestures

    In the field of gestural action recognition, many studies have focused on dimensionality reduction along the spatial axis, both to reduce the variability of gestural sequences expressed in the reduced space and to lower the computational complexity of their processing. Notably, very few of these methods have explicitly addressed dimensionality reduction along the time axis. This is, however, a major issue for the use of elastic distances, which are characterized by quadratic complexity. To partially fill this apparent gap, we present in this paper an approach based on temporal down-sampling associated with elastic kernel machine learning. We show experimentally, on two data sets that are widely referenced in the domain of human gesture recognition and very different in terms of motion-capture quality, that it is possible to significantly reduce the number of skeleton frames while maintaining a good recognition rate. The method gives satisfactory results at a level currently reached by state-of-the-art methods on these data sets. The reduction in computational complexity makes this approach eligible for real-time applications. Comment: ICPR 2014, International Conference on Pattern Recognition, Stockholm, Sweden (2014)
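    The following is a minimal sketch, not the authors' implementation, of the pipeline this abstract describes: uniform temporal down-sampling of a skeleton sequence followed by a DTW-based "elastic" Gaussian kernel that could feed a precomputed-kernel SVM. The function names, the `keep` and `gamma` parameters, and the plain DTW recursion are all assumptions; note also that DTW-based kernels are not positive semi-definite in general.

```python
import numpy as np

def downsample(seq, keep=10):
    """Keep `keep` evenly spaced frames from a (T, d) skeleton sequence."""
    idx = np.linspace(0, len(seq) - 1, num=keep).round().astype(int)
    return seq[idx]

def dtw_distance(a, b):
    """Classic dynamic time warping; quadratic in sequence length, which is
    exactly why temporal down-sampling pays off."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def elastic_gram(seqs_a, seqs_b, gamma=0.1):
    """Gram matrix K[i, j] = exp(-gamma * DTW(a_i, b_j)), usable with
    sklearn.svm.SVC(kernel='precomputed')."""
    return np.exp(-gamma * np.array(
        [[dtw_distance(a, b) for b in seqs_b] for a in seqs_a]))
```

    Down-sampling each sequence to `keep` frames before building the Gram matrix cuts the cost of every DTW call quadratically, which is the complexity argument the abstract makes.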

    3-D Hand Pose Estimation from Kinect's Point Cloud Using Appearance Matching

    We present a novel appearance-based approach for pose estimation of a human hand using the point clouds provided by the low-cost Microsoft Kinect sensor. Both the free-hand case, in which the hand is isolated from the surrounding environment, and the hand-object case, in which the different types of interaction are classified, have been considered. The hand-object case is clearly the more challenging task, since it has to deal with multiple tracks. The approach proposed here belongs to the class of partial pose estimation methods, in which the pose estimated in one frame is used to initialize the next. The pose estimate is obtained by applying a modified version of the Iterative Closest Point (ICP) algorithm to synthetic models, finding the rigid transformation that aligns each model with the input data. The proposed framework uses a "pure" point cloud as provided by the Kinect sensor, without any other information such as RGB values or normal vector components. For this reason, the proposed method can also be applied to data obtained from other types of depth sensors or RGB-D cameras.
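    As a point of reference, here is a minimal sketch of one standard rigid-ICP loop of the kind this abstract builds on. The paper's specific modifications to ICP are not reproduced; the iteration count and the use of scipy's k-d tree for correspondences are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(model, cloud, iters=20):
    """Align an (N, 3) synthetic model to an (M, 3) Kinect point cloud.
    Returns the accumulated rotation R and translation t."""
    R, t = np.eye(3), np.zeros(3)
    src = model.copy()
    tree = cKDTree(cloud)
    for _ in range(iters):
        _, nn = tree.query(src)            # closest-point correspondences
        dst = cloud[nn]
        mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
        H = (src - mu_s).T @ (dst - mu_d)
        U, _, Vt = np.linalg.svd(H)        # Kabsch: best rotation via SVD
        R_step = Vt.T @ U.T
        if np.linalg.det(R_step) < 0:      # guard against reflections
            Vt[-1] *= -1
            R_step = Vt.T @ U.T
        t_step = mu_d - R_step @ mu_s
        src = src @ R_step.T + t_step      # apply the incremental transform
        R, t = R_step @ R, R_step @ t + t_step
    return R, t
```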

    Human gesture classification by brute-force machine learning for exergaming in physiotherapy

    In this paper, a novel approach for human gesture classification on skeletal data is proposed for the application of exergaming in physiotherapy. Unlike existing methods, we propose to use a general classifier such as Random Forests to recognize dynamic gestures. The temporal dimension is handled afterwards by majority voting in a sliding window over the classifier's consecutive predictions. The gestures can share partially similar postures, in which case the classifier decides based on the dissimilar postures. This brute-force classification strategy is viable because dynamic human gestures exhibit sufficiently dissimilar postures. Online continuous human gesture recognition can classify dynamic gestures at an early stage, which is a crucial advantage when controlling a game by automatic gesture recognition. Ground truth is also easy to obtain, since all postures in a gesture receive the same label, without any discretization into consecutive postures. This way, new gestures can easily be added, which is advantageous in adaptive game development. We evaluate our strategy by leave-one-subject-out cross-validation on a self-captured stealth-game gesture dataset and on the publicly available Microsoft Research Cambridge-12 Kinect (MSRC-12) dataset. On the first dataset we achieve an accuracy of 96.72%. Furthermore, we show that Random Forests perform better than Support Vector Machines. On the second dataset we achieve an accuracy of 98.37%, which is on average 3.57% better than existing methods.
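    A hedged sketch of the two-stage strategy described above: a frame-level Random Forest followed by majority voting over a sliding window of its predictions. The synthetic training data, the window length, and the feature layout are placeholders, not the paper's setup.

```python
from collections import Counter, deque
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 60))     # 200 postures x (20 joints * 3 coords), synthetic
y_train = rng.integers(0, 3, size=200)   # three dummy gesture labels

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)                # stage 1: per-posture classification

def smoothed_labels(frames, window=15):
    """Stage 2: majority vote over the last `window` per-frame predictions."""
    recent = deque(maxlen=window)
    for frame in frames:
        recent.append(int(clf.predict(frame.reshape(1, -1))[0]))
        yield Counter(recent).most_common(1)[0][0]
```

    Because every posture of a gesture carries the gesture's label, the vote can stabilize before the gesture finishes, which is the early-classification advantage the abstract points out.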

    Learning event patterns for gesture detection

    Usability often plays a key role when software is brought to market, including clearly structured workflows, the way information is presented to the user, and, last but not least, how the user interacts with the application. In this context, input devices such as 3D cameras or (multi-)touch displays have become omnipresent, enabling new, intuitive forms of user interaction. State-of-the-art systems tightly couple application logic with separate gesture detection components for the supported devices. Hard-coded rules, or static models obtained by applying machine learning algorithms to many training samples, are used to robustly detect a predefined set of gesture patterns. Extending these sets with new patterns or modifying existing ones, if possible at all, is difficult for both application developers and end users. Furthermore, adding gesture support for legacy software or for additional devices becomes difficult with this hard-wired approach. In previous research we demonstrated how the database community can contribute to this challenge by leveraging complex event processing on data streams to express gesture patterns. While this declarative approach decouples application logic from gesture detection components, its major drawback was the non-intuitive definition of gesture queries. In this paper, we present an approach related to density-based clustering for finding declarative gesture descriptions from only a few samples. We demonstrate the algorithms by mining definitions for multi-dimensional gestures from the sensor data stream delivered by a Microsoft Kinect 3D camera, and provide a way for non-expert users to intuitively customize gesture-controlled user interfaces, even at runtime.
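    The abstract does not spell out the algorithm, but a loose illustration of the density-based idea might look like the following: pool sample points from a few gesture demonstrations, cluster them with DBSCAN, and turn each cluster into a spherical region predicate that a stream query could test. All names, thresholds, and the sphere representation are invented for illustration; this is not the paper's algorithm or query language.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def mine_regions(samples, eps=0.08, min_samples=5):
    """samples: (N, 3) hand positions pooled from demonstrations (metres).
    Returns (centre, radius) pairs, one per dense cluster."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(samples)
    regions = []
    for k in set(labels) - {-1}:          # -1 marks DBSCAN noise points
        pts = samples[labels == k]
        centre = pts.mean(axis=0)
        radius = np.linalg.norm(pts - centre, axis=1).max()
        regions.append((centre, radius))  # "hand inside sphere(centre, radius)"
    return regions
```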

    Physical rehabilitation based on kinect serious games: ThG therapy game

    This thesis presents a serious-game platform developed using the Unity 3D game engine and the Kinect V2 sensor as a natural user interface. The aim of this work was to provide a tool for objective evaluation of patients' movements during physiotherapy sessions, as well as an enjoyable way to increase patient engagement in motor rehabilitation training exercises. The developed platform detects the 3D motion of different body joints with the Kinect V2 sensor and stores the data in a remote database. The patient-data management component covers the whole physiotherapy process and includes biometric data, information relevant to the physiotherapist concerning the patient's clinical history, scores obtained during serious-game-based training, and metric values such as the distance between the feet during a game, left- and right-foot usage frequency, and execution time for the movements imposed by the game mechanics. The thesis describes the technologies and techniques used to develop the platform and presents some results on its usability.
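    As a small illustration of the kind of metric the platform stores, the distance between the feet can be computed directly from Kinect V2 joint positions. The joint-name keys and the dictionary layout below are assumptions, not the platform's actual data model.

```python
import numpy as np

def feet_distance(skeleton):
    """skeleton: dict mapping joint name -> (x, y, z) position in metres."""
    left = np.asarray(skeleton["FootLeft"])
    right = np.asarray(skeleton["FootRight"])
    return float(np.linalg.norm(left - right))

# e.g. feet_distance({"FootLeft": (0.1, 0.0, 2.0), "FootRight": (-0.2, 0.0, 2.1)})
```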

    RGB-D-based Action Recognition Datasets: A Survey

    Human action recognition from RGB-D (Red, Green, Blue and Depth) data has attracted increasing attention since the first work was reported in 2010. Over this period, many benchmark datasets have been created to facilitate the development and evaluation of new algorithms. This raises the question of which dataset to select, and how to use it to provide a fair and objective comparative evaluation against state-of-the-art methods. To address this issue, this paper provides a comprehensive review of the most commonly used action recognition RGB-D video datasets, including 27 single-view datasets, 10 multi-view datasets, and 7 multi-person datasets. The detailed information and analysis of these datasets constitute a useful resource for guiding an insightful selection of datasets for future research. In addition, issues with current algorithm evaluation vis-à-vis the limitations of the available datasets and evaluation protocols are highlighted, resulting in a number of recommendations for the collection of new datasets and the use of evaluation protocols.

    Facial feature point fitting with combined color and depth information for interactive displays

    Interactive displays are driven by natural interaction with the user, necessitating a computer system that recognizes body gestures and facial expressions. Recognizing user input easily and reliably enough for a satisfying user experience is difficult, as the complexities of human communication are hard to interpret in real time. Recognizing facial expressions in particular is a problem that requires high accuracy and efficiency for stable interaction environments. The recent availability of the Kinect, a low-cost, low-resolution sensor that supplies simultaneous color and depth images, provides a breakthrough opportunity to enhance the interactive capabilities of displays and the overall user experience. This new RGBD (RGB + depth) sensor generates an additional channel of depth information that can be used to improve the performance of existing state-of-the-art technology and to develop new techniques. The Active Shape Model (ASM) is a well-known deformable model that has been extensively studied for facial feature point placement. Previous shape-model techniques have applied 3D reconstruction using multiple cameras, or other statistical methods, to produce 3D information from 2D color images. These methods showed improved results compared to using only color data, but required an additional deformable model or expensive imaging equipment. In this thesis, an ASM is trained using the RGBD image produced by the Kinect. The real-time information from the depth sensor is registered to the color image to create a pixel-for-pixel match. To improve the quality of the depth image, a temporal median filter is applied to reduce the random noise produced by the sensor. The resulting combined model is designed to produce more robust fitting of facial feature points compared to a purely color-based active shape model.
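    A minimal sketch of the temporal median filter step described above: taking the per-pixel median over the last few registered depth frames suppresses the sensor's random noise. The buffer length, class name, and frame format are assumptions, not the thesis implementation.

```python
from collections import deque
import numpy as np

class TemporalMedianFilter:
    def __init__(self, length=5):
        self.frames = deque(maxlen=length)   # ring buffer of recent depth frames

    def __call__(self, depth):
        """depth: (H, W) uint16 depth image already registered to the color frame.
        Returns the per-pixel median over the buffered frames."""
        self.frames.append(depth)
        return np.median(np.stack(self.frames), axis=0).astype(depth.dtype)
```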