232 research outputs found
Multi-modal human gesture recognition combining dynamic programming and probabilistic methods
In this M.Sc. thesis, we address the problem of Human Gesture Recognition using Human Behavior Analysis technologies. In particular, we apply the proposed methodologies in both health care and social applications. In these contexts, gestures are usually performed in a natural way, producing high variability among the Human Poses that belong to them. This makes Human Gesture Recognition a very challenging task, and it also complicates generalization when developing technologies for Human Behavior Analysis. To tackle the complete framework for Human Gesture Recognition, we split the process into three main goals: computing multi-modal feature spaces, probabilistic modelling of gestures, and clustering of Human Poses for Sub-Gesture representation. Each of these goals implicitly includes different challenging problems, which are interconnected and addressed by three presented approaches: Bag-of-Visual-and-Depth-Words, Probabilistic-Based Dynamic Time Warping, and Sub-Gesture Representation. The methodologies of each of these approaches are explained in detail in the next sections. We have validated the presented approaches on different public and custom-designed data sets, showing high performance and the viability of using our methods for real Human Behavior Analysis systems and applications. Finally, we summarize different related applications currently in development and present conclusions and future research directions.
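For readers unfamiliar with Dynamic Time Warping, the following is a minimal sketch of the classic (non-probabilistic) algorithm that the thesis's Probabilistic-Based Dynamic Time Warping builds on. It is not the thesis's method: the function name `dtw_distance`, the scalar feature sequences, and the absolute-difference cost are assumptions chosen only for illustration.

```python
from math import inf


def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic DTW alignment cost between two sequences (illustrative only)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Hypothetical 1-D feature sequences: the same gesture performed at two speeds.
slow = [1, 2, 2, 3]
fast = [1, 2, 3]
print(dtw_distance(fast, slow))
```

Because DTW allows one sequence to repeat elements while the other advances, the two sequences above align at zero cost even though their lengths differ, which is the property that makes it attractive for gestures performed at varying speeds.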
Computer vision for body tracking in professional environments
Master's in Image Processing and Computer Vision
The goal of this work is to build the basis for a smartphone application that
provides functionalities for recording human motion data, training machine learning
algorithms, and recognizing professional gestures.
First, we take advantage of the new mobile phone cameras, either infrared or
stereoscopic, to record RGB-D data. Then, a bottom-up pose estimation algorithm
based on Deep Learning extracts the 2D human skeleton and derives the third dimension
from the depth data. Finally, we use a gesture recognition engine, which is based
on K-means and Hidden Markov Models (HMMs). The performance of the machine
learning algorithm has been tested with professional gestures using silk-weaving,
TV-assembly, and hand-made glass datasets.
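The step of lifting a 2D skeleton into 3D with depth data can be sketched as a pinhole-camera back-projection. This is only one plausible reading of that step, not the work's actual implementation: the function name `backproject_keypoints`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the convention that a depth of zero means "no measurement" are all assumptions.

```python
def backproject_keypoints(keypoints_2d, depth_map, fx, fy, cx, cy):
    """Lift 2D pixel keypoints to 3D camera coordinates using a depth map.

    keypoints_2d: list of (u, v) pixel coordinates (u = column, v = row).
    depth_map:    2D list indexed as depth_map[row][col], in metric units.
    fx, fy:       focal lengths in pixels (assumed known from calibration).
    cx, cy:       principal point in pixels (assumed known from calibration).
    """
    points_3d = []
    for u, v in keypoints_2d:
        z = depth_map[int(round(v))][int(round(u))]
        if z <= 0:
            # Assumed convention: non-positive depth marks a missing reading.
            points_3d.append(None)
            continue
        # Standard pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points_3d.append((x, y, z))
    return points_3d


# Hypothetical 3x3 depth map (metres) with one missing reading at row 2, col 2.
depth = [[2.0, 2.0, 2.0],
         [2.0, 2.0, 2.0],
         [2.0, 2.0, 0.0]]
print(backproject_keypoints([(1, 1), (2, 0), (2, 2)], depth,
                            fx=1.0, fy=1.0, cx=1.0, cy=1.0))
```

Keypoints without a valid depth reading are returned as `None` here so that a downstream recognizer can decide how to handle the gap.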
SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark
In this paper, we present SignAvatars, the first large-scale multi-prompt 3D
sign language (SL) motion dataset designed to bridge the communication gap for
hearing-impaired individuals. While research on digital communication has been
growing exponentially, the majority of existing
communication technologies primarily cater to spoken or written languages,
instead of SL, the essential communication method for hearing-impaired
communities. Existing SL datasets, dictionaries, and sign language production
(SLP) methods are typically limited to 2D, as annotating 3D models and
avatars for SL is usually an entirely manual and labor-intensive process
conducted by SL experts, often resulting in unnatural avatars. In response to
these challenges, we compile and curate the SignAvatars dataset, which
comprises 70,000 videos from 153 signers, totaling 8.34 million frames,
covering both isolated signs and continuous, co-articulated signs, with
multiple prompts including HamNoSys, spoken language, and words. To yield 3D
holistic annotations, including meshes and biomechanically-valid poses of body,
hands, and face, as well as 2D and 3D keypoints, we introduce an automated
annotation pipeline operating on our large corpus of SL videos. SignAvatars
facilitates various tasks such as 3D sign language recognition (SLR) and the
novel 3D SL production (SLP) from diverse inputs like text scripts, individual
words, and HamNoSys notation. Hence, to evaluate the potential of SignAvatars,
we further propose a unified benchmark of 3D SL holistic motion production. We
believe that this work is a significant step forward towards bringing the
digital world to the hearing-impaired communities. Our project page is at
https://signavatars.github.io/
Comment: 9 pages; Project page available at https://signavatars.github.io
A Temporal Densely Connected Recurrent Network for Event-based Human Pose Estimation
Event cameras are emerging bio-inspired vision sensors that report per-pixel
brightness changes asynchronously. They offer the notable advantages of high
dynamic range, high-speed response, and a low power budget, which enable them to
capture local motions well in uncontrolled environments. This motivates us to unlock
the potential of event cameras for human pose estimation, as human pose
estimation with event cameras is rarely explored. Due to the paradigm
shift from conventional frame-based cameras, however, event signals in a time
interval contain very limited information, as event cameras capture only
the moving body parts and ignore static body parts, leaving some
parts incomplete or even missing in the time interval. This paper
proposes a novel densely connected recurrent architecture to address the
problem of incomplete information. With this recurrent architecture, we can
explicitly model not only sequential but also non-sequential geometric
consistency across time steps, accumulating information from previous frames to
recover the entire human body and achieve stable and accurate human pose
estimation from event data. Moreover, to better evaluate our model, we collect
a large scale multimodal event-based dataset that comes with human pose
annotations, which is by far the most challenging one to the best of our
knowledge. The experimental results on two public datasets and our own dataset
demonstrate the effectiveness and strength of our approach. Code will be
available online to facilitate future research.
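To illustrate why event signals in a single time interval carry limited information, here is a minimal sketch that accumulates events into a signed count frame. The abstract does not describe the paper's event representation, so this is an assumption: the event tuple layout `(x, y, t, polarity)` and the function name `accumulate_events` are hypothetical.

```python
def accumulate_events(events, width, height, t_start, t_end):
    """Sum event polarities per pixel over the half-open interval [t_start, t_end).

    events: iterable of (x, y, t, polarity) tuples, where polarity > 0 marks a
            brightness increase and any other value marks a decrease
            (assumed layout, not taken from the paper).
    """
    frame = [[0] * width for _ in range(height)]
    for x, y, t, polarity in events:
        if t_start <= t < t_end:
            frame[y][x] += 1 if polarity > 0 else -1
    return frame


# Hypothetical events: only pixels where something moved produce any signal.
events = [(0, 0, 0.1, 1), (0, 0, 0.2, 1), (1, 1, 0.3, -1), (1, 0, 1.5, 1)]
print(accumulate_events(events, width=2, height=2, t_start=0.0, t_end=1.0))
```

Pixels covering static body parts stay at zero in such a frame, which is the incompleteness that motivates carrying information across time steps with a recurrent model.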