Down-Sampling coupled to Elastic Kernel Machines for Efficient Recognition of Isolated Gestures
In the field of gestural action recognition, many studies have focused on
dimensionality reduction along the spatial axis, to reduce both the variability
of gestural sequences expressed in the reduced space, and the computational
complexity of their processing. It is noticeable that very few of these methods
have explicitly addressed the dimensionality reduction along the time axis.
This is however a major issue with regard to the use of elastic distances
characterized by a quadratic complexity. To partially fill this apparent gap,
we present in this paper an approach that combines temporal down-sampling with
elastic kernel machine learning. We experimentally show, on two data sets
that are widely referenced in the domain of human gesture recognition, and very
different in terms of quality of motion capture, that it is possible to
significantly reduce the number of skeleton frames while maintaining a good
recognition rate. The method proves to give satisfactory results at a level
currently reached by state-of-the-art methods on these data sets. The
computational complexity reduction makes this approach eligible for real-time
applications.
Comment: ICPR 2014, International Conference on Pattern Recognition, Stockholm,
Sweden (2014).
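The quadratic cost of elastic distances is what makes temporal down-sampling pay off. As a hedged illustration (not the paper's exact elastic kernel), the sketch below pairs uniform frame down-sampling with a plain dynamic time warping (DTW) distance; the function names, the 20-joint feature size, and the random sequences are illustrative.

```python
import numpy as np

def downsample(seq, keep=8):
    """Uniformly retain `keep` frames from a (T, D) skeleton sequence."""
    idx = np.linspace(0, len(seq) - 1, num=keep).round().astype(int)
    return seq[idx]

def dtw(a, b):
    """Classic dynamic time warping: O(len(a) * len(b)) time and space."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Hypothetical sequences: 20 joints x 3 coordinates = 60 features per frame.
rng = np.random.default_rng(0)
g1 = rng.normal(size=(100, 60))
g2 = rng.normal(size=(120, 60))
# Keeping 8 of ~100 frames cuts the DTW cost by roughly (100/8)^2, i.e. ~156x.
d = dtw(downsample(g1), downsample(g2))
```

An elastic kernel such as exp(-dtw(x, y) / sigma) can then feed a kernel machine; the down-sampling factor trades recognition rate against speed, which is the trade-off the paper quantifies.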
3-D Hand Pose Estimation from Kinect's Point Cloud Using Appearance Matching
We present a novel appearance-based approach for pose estimation of a human
hand using the point clouds provided by the low-cost Microsoft Kinect sensor.
Both the free-hand case, in which the hand is isolated from the surrounding
environment, and the hand-object case, in which the different types of
interactions are classified, have been considered. The hand-object case is
clearly the most challenging, as it has to deal with multiple tracks. The
approach proposed here belongs to the class of partial pose estimation where
the estimated pose in a frame is used for the initialization of the next one.
The pose estimation is obtained by applying a modified version of the Iterative
Closest Point (ICP) algorithm to synthetic models to obtain the rigid
transformation that aligns each model with respect to the input data. The
proposed framework uses a "pure" point cloud as provided by the Kinect sensor
without any other information such as RGB values or normal vector components.
For this reason, the proposed method can also be applied to data obtained from
other types of depth sensors or RGB-D cameras.
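The paper's modified ICP variant is not spelled out here; the following minimal sketch shows the standard point-to-point ICP loop it builds on: nearest-neighbour correspondences followed by a least-squares (Kabsch) rigid transform. Function names, cloud sizes, and the brute-force matching are illustrative simplifications.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (Kabsch)."""
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(model, scene, iters=10):
    """Point-to-point ICP: iteratively align `model` to the `scene` cloud."""
    pts = model.copy()
    for _ in range(iters):
        # nearest-neighbour correspondences (brute force, for clarity only)
        d2 = ((pts[:, None, :] - scene[None, :, :]) ** 2).sum(-1)
        matched = scene[d2.argmin(1)]
        R, t = best_rigid_transform(pts, matched)
        pts = pts @ R.T + t
    return pts

# Align a hypothetical model cloud to a slightly shifted copy of itself.
rng = np.random.default_rng(0)
model = rng.uniform(size=(200, 3))
scene = model + np.array([0.01, -0.005, 0.01])
aligned = icp(model, scene)
```

In the hand-tracking setting, `model` would be a synthetic hand model and the previous frame's estimated pose would initialize the next one, as the abstract describes.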
Human gesture classification by brute-force machine learning for exergaming in physiotherapy
In this paper, a novel approach for human gesture classification on skeletal data is proposed for the application of exergaming in physiotherapy. Unlike existing methods, we propose to use a general classifier like Random Forests to recognize dynamic gestures. The temporal dimension is handled afterwards by majority voting in a sliding window over the consecutive predictions of the classifier. The gestures can have partially similar postures, so the classifier decides based on the dissimilar postures. This brute-force classification strategy is permitted because dynamic human gestures show sufficiently dissimilar postures. Online continuous human gesture recognition can classify dynamic gestures at an early stage, which is a crucial advantage when controlling a game by automatic gesture recognition. Also, ground truth can be easily obtained, since all postures in a gesture get the same label, without any discretization into consecutive postures. This way, new gestures can be easily added, which is advantageous in adaptive game development. We evaluate our strategy by a leave-one-subject-out cross-validation on a self-captured stealth game gesture dataset and the publicly available Microsoft Research Cambridge-12 Kinect (MSRC-12) dataset. On the first dataset we achieve an excellent accuracy rate of 96.72%. Furthermore, we show that Random Forests perform better than Support Vector Machines. On the second dataset we achieve an accuracy rate of 98.37%, which is on average 3.57% better than existing methods.
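The brute-force strategy described can be sketched as a per-frame classifier followed by sliding-window majority voting. The snippet below is a toy illustration, not the paper's features or dataset: a Random Forest trained on 2-D stand-in posture features, with its per-frame predictions smoothed over a window.

```python
import numpy as np
from collections import Counter, deque
from sklearn.ensemble import RandomForestClassifier

def majority_vote(frame_preds, window=5):
    """Smooth per-frame class predictions with a sliding-window majority vote."""
    buf, out = deque(maxlen=window), []
    for p in frame_preds:
        buf.append(p)
        out.append(Counter(buf).most_common(1)[0][0])
    return out

# Toy stand-in for skeletal postures: two gesture classes in a 2-D feature space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (200, 2)), rng.normal(2.0, 0.3, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Per-frame predictions on a simulated stream, then temporal smoothing.
stream = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(2.0, 0.3, (30, 2))])
smoothed = majority_vote(clf.predict(stream), window=5)
```

Because the vote only looks backwards over recent frames, a gesture can be recognized before it completes, which matches the early-classification advantage the abstract claims.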
Learning event patterns for gesture detection
Usability often plays a key role when software is brought to market, including clearly structured workflows, the way of presenting information to the user, and, last but not least, how the user interacts with the application. In this context, input devices such as 3D cameras or (multi-)touch displays have become omnipresent in order to define new intuitive ways of user interaction. State-of-the-art systems tightly couple application logic with separate gesture detection components for supported devices. Hard-coded rules or static models obtained by applying machine learning algorithms to many training
samples are used in order to robustly detect a predefined set of gesture patterns. If possible at all, extending these sets with new patterns or modifying existing ones is difficult for both application developers and end users. Further, adding gesture support for legacy software
or for additional devices becomes difficult with this hardwired approach. In previous research we demonstrated how the database community can contribute to this challenge by leveraging complex event processing on data streams to express gesture patterns. While this declarative approach
decouples application logic from gesture detection components, its major drawback was the non-intuitive definition of gesture queries. In this paper, we present an approach related to density-based clustering in order to find declarative gesture descriptions using only a few samples.
We demonstrate the algorithms by mining definitions for multi-dimensional gestures from the sensor data stream delivered by a Microsoft Kinect 3D camera, and provide a way for non-expert users to intuitively customize gesture-controlled user interfaces even at runtime.
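The paper's density-based mining of declarative gesture queries is not reproduced here. As a rough, hedged sketch of the idea, the code below derives an ordered sequence of spatial regions from a few sample traces and detects a gesture when a new trace visits those regions in order; the segment-averaging step is a deliberate simplification of the clustering, and all names are illustrative.

```python
import numpy as np

def learn_pattern(samples, n_regions=4):
    """Derive an ordered list of (center, radius) regions from a few example
    traces by pooling and averaging corresponding segments across samples
    (a simple density-inspired stand-in for the paper's clustering)."""
    regions = []
    for k in range(n_regions):
        pts = np.vstack([s[k * len(s) // n_regions:(k + 1) * len(s) // n_regions]
                         for s in samples])
        center = pts.mean(0)
        radius = np.linalg.norm(pts - center, axis=1).max() * 1.2
        regions.append((center, radius))
    return regions

def matches(trace, regions):
    """Declarative detection: the trace must visit each region in order."""
    i = 0
    for p in trace:
        center, radius = regions[i]
        if np.linalg.norm(p - center) <= radius:
            i += 1
            if i == len(regions):
                return True
    return False

# Hypothetical example: three recorded right-swipe traces of a hand position.
t = np.linspace(0.0, 1.0, 20)
samples = [np.stack([t, np.full(20, dy)], axis=1) for dy in (-0.02, 0.0, 0.02)]
regions = learn_pattern(samples, n_regions=4)
```

The learned (center, radius) sequence is itself a declarative description, so it could be translated into an event-processing query rather than hard-coded, which is the point the abstract makes.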
Physical rehabilitation based on Kinect serious games: ThG therapy game
This thesis presents a serious game platform developed using the Unity 3D game
engine and the Kinect V2 sensor as a natural user interface. The aim of this work
was to provide a tool for the objective evaluation of patients' movements during
physiotherapy sessions, as well as an enjoyable way to increase patient engagement
in motor rehabilitation exercises.
The developed platform, based on the Kinect V2 sensor, detects the 3D motion of
different body joints and stores the data in a remote database. The patient data
managed during the physiotherapy process includes biometric data, data relevant
to the physiotherapist concerning the patient's clinical history, scores obtained
during serious-game-based training, and values of metrics such as the distance
between the feet during a game, left and right foot usage frequency, and the
execution time of the imposed movements associated with the game mechanics. A
description of the technologies and techniques used to develop the platform, and
some results related to its usability, are
presented in this thesis.
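Metrics such as the distance between the feet are straightforward to compute from Kinect V2 joint positions. The sketch below is a hedged illustration: the dict-based frame layout and the lift-based usage heuristic are assumptions, not the thesis's exact formulas, though the joint names (FootLeft, FootRight) follow the real Kinect V2 convention.

```python
import numpy as np

def feet_distance(skeleton_frame):
    """Euclidean distance between the two foot joints of one Kinect V2 skeleton
    frame, given as a dict of joint name -> (x, y, z) in metres."""
    left = np.asarray(skeleton_frame["FootLeft"])
    right = np.asarray(skeleton_frame["FootRight"])
    return float(np.linalg.norm(left - right))

def foot_usage_frequency(frames, axis=1, lift_threshold=0.05):
    """Fraction of frames in which each foot is lifted above its resting height
    by more than `lift_threshold` metres (a simple stand-in for the platform's
    left/right usage metric; axis 1 is 'up' in Kinect camera space)."""
    lefts = np.array([f["FootLeft"][axis] for f in frames])
    rights = np.array([f["FootRight"][axis] for f in frames])
    base_l, base_r = lefts.min(), rights.min()
    return {
        "left": float((lefts - base_l > lift_threshold).mean()),
        "right": float((rights - base_r > lift_threshold).mean()),
    }
```

Per-game values of such metrics, together with scores, could then be written to the remote database the thesis describes.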
RGB-D-based Action Recognition Datasets: A Survey
Human action recognition from RGB-D (Red, Green, Blue and Depth) data has
attracted increasing attention since the first work reported in 2010. Over this
period, many benchmark datasets have been created to facilitate the development
and evaluation of new algorithms. This raises the question of which dataset to
select and how to use it in providing a fair and objective comparative
evaluation against state-of-the-art methods. To address this issue, this paper
provides a comprehensive review of the most commonly used action recognition
related RGB-D video datasets, including 27 single-view datasets, 10 multi-view
datasets, and 7 multi-person datasets. The detailed information and analysis of
these datasets is a useful resource in guiding insightful selection of datasets
for future research. In addition, the issues with current algorithm evaluation
vis-à-vis limitations of the available datasets and evaluation protocols
are also highlighted, resulting in a number of recommendations for the collection
of new datasets and the use of evaluation protocols.
Facial feature point fitting with combined color and depth information for interactive displays
Interactive displays are driven by natural interaction with the user, necessitating a computer system that recognizes body gestures and facial expressions. User inputs are not easily or reliably recognized for a satisfying user experience, as the complexities of human communication are difficult to interpret in real-time. Recognizing facial expressions in particular is a problem that requires high accuracy and efficiency for stable interaction environments. The recent availability of the Kinect, a low-cost, low-resolution sensor that supplies simultaneous color and depth images, provides a breakthrough opportunity to enhance the interactive capabilities of displays and the overall user experience. This new RGBD (RGB + depth) sensor generates an additional channel of depth information that can be used to improve the performance of existing state-of-the-art technology and to develop new techniques. The Active Shape Model (ASM) is a well-known deformable model that has been extensively studied for facial feature point placement. Previous shape model techniques have applied 3D reconstruction using multiple cameras or other statistical methods to produce 3D information from 2D color images. These methods showed improved results compared to using only color data, but required an additional deformable model or expensive imaging equipment. In this thesis, an ASM model is trained using the RGBD image produced by the Kinect. The real-time information from the depth sensor is registered to the color image to create a pixel-for-pixel match. To improve the quality of the depth image, a temporal median filter is applied to reduce random noise produced by the sensor. The resulting combined model is designed to produce more robust fitting of facial feature points compared to a purely color-based active shape model.
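A temporal median filter of the kind described can be sketched as a per-pixel median over a small ring buffer of recent depth frames; the class below is an illustrative sketch, not the thesis's exact implementation.

```python
import numpy as np
from collections import deque

class TemporalMedianFilter:
    """Per-pixel median over the last `size` depth frames; suppresses the
    random, flickering noise typical of low-cost depth sensors while leaving
    static structure intact."""
    def __init__(self, size=5):
        self.buf = deque(maxlen=size)

    def push(self, depth):
        """Add one depth frame and return the filtered frame."""
        self.buf.append(np.asarray(depth, dtype=np.float32))
        return np.median(np.stack(self.buf), axis=0)
```

A median is preferred over a mean here because isolated outlier depth readings (a common Kinect failure mode) are discarded entirely rather than averaged in.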