Classifying types of gesture and inferring intent
In order to infer intent from gesture, a rudimentary classification of gestures into five main classes is introduced. The classification is intended as a basis for incorporating an understanding of gesture into human-robot interaction (HRI). Some requirements for the operational classification of gesture by a robot interacting with humans are also suggested.
Speech-driven Animation with Meaningful Behaviors
Conversational agents (CAs) play an important role in human-computer interaction. Creating believable movements for CAs is challenging, since the movements have to be meaningful and natural, reflecting the coupling between gestures and speech. Previous studies have mainly relied on rule-based or data-driven approaches. Rule-based methods focus on creating meaningful behaviors that convey the underlying message, but the gestures cannot be easily synchronized with speech. Data-driven approaches, especially speech-driven models, can capture the relationship between speech and gestures, but they create behaviors that disregard the meaning of the message. This study proposes to bridge the gap between these two approaches, overcoming their limitations. The approach builds a dynamic Bayesian network (DBN) in which a discrete variable is added to condition the behaviors on an underlying constraint. The study implements and evaluates the approach with two such constraints: discourse functions and prototypical behaviors. By constraining on discourse functions (e.g., questions), the model learns the characteristic behaviors associated with a given discourse class, learning the rules from the data. By constraining on prototypical behaviors (e.g., head nods), the approach can be embedded in a rule-based system as a behavior realizer, creating trajectories that are tightly synchronized with speech. The study proposes a DBN structure and a training approach that (1) model the cause-effect relationship between the constraint and the gestures, (2) initialize the state configuration of the models, increasing the range of the generated behaviors, and (3) capture the differences in behaviors across constraints by enforcing sparse transitions between shared and constraint-exclusive states. Objective and subjective evaluations demonstrate the benefits of the proposed approach over an unconstrained model.
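The core mechanism lends itself to a compact illustration. The sketch below is a hypothetical, heavily simplified rendering of the idea rather than the authors' model: a hidden gesture-state chain whose transition structure is gated by a discrete constraint variable (e.g., a discourse function), scored against speech features with a standard forward pass. All names, dimensions, and parameters are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code): per-constraint transition
# matrices over a shared set of gesture states, Gaussian speech emissions,
# and a forward-algorithm log-likelihood conditioned on the constraint.
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 4          # shared + constraint-exclusive gesture states
N_CONSTRAINTS = 2     # e.g. {statement, question}

# One transition matrix per constraint; concentrated Dirichlet rows mimic
# the "sparse transitions" between shared and exclusive states.
A = rng.dirichlet(alpha=[5, 1, 1, 1], size=(N_CONSTRAINTS, N_STATES))

# Gaussian emissions over a 1-D speech feature (e.g. pitch), one per state.
means = np.linspace(-2.0, 2.0, N_STATES)

def log_emission(x):
    """Log N(x; mean_s, 1) for every state s."""
    return -0.5 * (x - means) ** 2 - 0.5 * np.log(2 * np.pi)

def forward_loglik(speech, constraint):
    """Log-likelihood of a speech sequence under one constraint's chain."""
    log_alpha = np.log(np.full(N_STATES, 1.0 / N_STATES)) + log_emission(speech[0])
    for x in speech[1:]:
        trans = np.logaddexp.reduce(
            log_alpha[:, None] + np.log(A[constraint]), axis=0)
        log_alpha = trans + log_emission(x)
    return np.logaddexp.reduce(log_alpha)

speech = rng.normal(size=20)  # stand-in acoustic feature sequence
scores = [forward_loglik(speech, c) for c in range(N_CONSTRAINTS)]
print("most plausible constraint:", int(np.argmax(scores)))
```

In the paper's generation setting the constraint is given and the gesture trajectory is synthesized; the scoring direction above is only the simplest way to show how the discrete variable reshapes the state dynamics.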
The spatiotemporal representation of dance and music gestures using topological gesture analysis (TGA)
Spatiotemporal gestures in music and dance have been approached using both qualitative and quantitative research methods. Applying quantitative methods has offered new perspectives but imposed several constraints, such as artificial metric systems, weak links with qualitative information, and incomplete accounts of variability. In this study, we tackle these problems using concepts from topology to analyze gestural relationships in space. Topological Gesture Analysis (TGA) relies on the projection of musical cues onto gesture trajectories, which generates point clouds in a three-dimensional space. These point clouds can be interpreted as topologies equipped with musical qualities, offering insight into the relationships between gesture, space, and music. Using this method, we investigate the relationships between musical meter, dance style, and expertise in two popular dances (samba and Charleston). The results show how musical meter is encoded in the dancer's space and how relevant information about style and expertise can be revealed by means of simple topological relationships.
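The projection step can be made concrete with a small sketch. Under stated assumptions (a 3-D trajectory sampled at fixed rate, beat times as the musical cues, a fixed meter), the snippet below groups trajectory points sampled at beats by metric position, yielding per-position point clouds whose spread hints at how the meter is laid out in space. Function and variable names are hypothetical, not from the authors' implementation.

```python
# Assumed minimal version of the TGA projection: label each beat-sampled
# trajectory point with its position in the metric cycle.
import numpy as np

def tga_point_clouds(times, trajectory, beat_times, meter=4):
    """Group trajectory points sampled at beat times by metric position."""
    clouds = {b: [] for b in range(meter)}
    for i, t in enumerate(beat_times):
        idx = np.argmin(np.abs(times - t))   # nearest trajectory sample
        clouds[i % meter].append(trajectory[idx])
    return {b: np.array(pts) for b, pts in clouds.items() if pts}

# Toy data: a circular hip-like trajectory at 100 Hz, beats at 120 BPM.
times = np.arange(0, 8, 0.01)
trajectory = np.stack([np.cos(2 * np.pi * 0.5 * times),
                       np.sin(2 * np.pi * 0.5 * times),
                       0.05 * times], axis=1)
beats = np.arange(0, 8, 0.5)

for pos, cloud in tga_point_clouds(times, trajectory, beats).items():
    print(f"beat {pos + 1}: {len(cloud)} points, "
          f"centroid {cloud.mean(axis=0).round(2)}")
```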
An original framework for understanding human actions and body language by using deep neural networks
The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour.
By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way.
These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively.
The processing of body movements, meanwhile, plays a key role in the fields of action recognition and affective computing. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements;
both are essential tasks in many computer vision applications, including event recognition and video surveillance.
In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: the first proposes a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) for the recognition of sign language and semaphoric hand gestures; the second presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; the last provides a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs).
The performance of LSTM-RNNs is explored in depth, due to their ability to model the long-term contextual information of temporal sequences, which makes them suitable for analysing body movements.
All the modules were tested on challenging datasets that are well known in the state of the art, showing remarkable results compared to current methods in the literature.
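The two-branch design of the second module is easy to sketch. The PyTorch snippet below is an assumed reading of that architecture, not the thesis code: one stacked-LSTM branch over 2D joint positions, one over frame-to-frame motion, fused before a linear classifier. Branch inputs, layer sizes, and the fusion scheme are illustrative choices.

```python
# A rough sketch of a two-branch stacked LSTM over 2-D skeleton sequences.
import torch
import torch.nn as nn

class TwoBranchLSTM(nn.Module):
    def __init__(self, n_joints=18, hidden=128, n_classes=10):
        super().__init__()
        d = n_joints * 2  # (x, y) coordinates per joint
        self.pose_branch = nn.LSTM(d, hidden, num_layers=2, batch_first=True)
        self.motion_branch = nn.LSTM(d, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, skel):                 # skel: (batch, frames, d)
        motion = skel[:, 1:] - skel[:, :-1]  # temporal differences
        _, (h_pose, _) = self.pose_branch(skel)
        _, (h_motion, _) = self.motion_branch(motion)
        fused = torch.cat([h_pose[-1], h_motion[-1]], dim=1)
        return self.head(fused)              # class logits

model = TwoBranchLSTM()
logits = model(torch.randn(4, 30, 36))       # 4 clips, 30 frames, 18 joints
print(logits.shape)                          # torch.Size([4, 10])
```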
Detection of bimanual gestures everywhere: why it matters, what we need and what is missing
Bimanual gestures are of the utmost importance for the study of motor coordination in humans and in everyday activities. A reliable detection of bimanual gestures in unconstrained environments is fundamental for their clinical study and for assessing common activities of daily living. This paper investigates techniques for a reliable, unconstrained detection and classification of bimanual gestures. It assumes the availability of inertial data originating from the two hands/arms, builds upon a previously developed technique for gesture modelling based on Gaussian Mixture Modelling (GMM) and Gaussian Mixture Regression (GMR), and compares different modelling and classification techniques, which are based on a number of assumptions inspired by the literature about how bimanual gestures are represented and modelled in the brain. Experiments show results related to five everyday bimanual activities, which have been selected on the basis of three main parameters: whether or not the two hands are constrained by a physical tool, whether or not a specific sequence of single-hand gestures is required, and whether or not the activity is recursive. In the best-performing combination of modelling approach and classification technique, five out of five activities are recognized with an accuracy of up to 97%, a precision of 82%, and a recall of 100%.
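One classification scheme consistent with this abstract can be sketched briefly: fit one Gaussian Mixture Model per bimanual activity on concatenated two-arm inertial features, then label a new window by maximum average log-likelihood. Feature layout, dimensions, and activity names below are assumptions, and the paper's actual pipeline also involves GMR-based gesture models.

```python
# Assumed per-class GMM classifier over two-arm inertial feature windows.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

ACTIVITIES = ["sweep", "open_jar", "fold_towel"]   # hypothetical labels
DIM = 12  # e.g. 3-axis accelerometer + gyroscope per arm, two arms

# Toy training windows per activity (rows = windowed feature vectors).
train = {a: rng.normal(loc=i, size=(200, DIM))
         for i, a in enumerate(ACTIVITIES)}

models = {a: GaussianMixture(n_components=3, random_state=0).fit(X)
          for a, X in train.items()}

def classify(window):
    """Pick the activity whose GMM assigns the window the highest likelihood."""
    scores = {a: m.score(window) for a, m in models.items()}
    return max(scores, key=scores.get)

test_window = rng.normal(loc=1, size=(50, DIM))  # resembles "open_jar"
print(classify(test_window))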
Pantomimic Gestures for Human-Robot Interaction
This work introduces a pantomimic gesture interface, which classifies human hand gestures using unmanned aerial vehicle (UAV) behaviour recordings as training data. We argue that pantomimic gestures are more intuitive than iconic gestures and show that a pantomimic gesture recognition strategy trained using micro UAV behaviour recordings can be more robust than one trained directly on hand gestures. Hand gestures are isolated by applying a maximum information criterion, with features extracted using principal component analysis (PCA) and compared using a nearest neighbour classifier. These features are biased in that they are better suited to classifying certain behaviours. We show how a Bayesian update step accounting for the geometry of training features compensates for this, resulting in fairer classification results, and we introduce a weighted voting system to aid in sequence labelling.
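The recognition pipeline the abstract outlines (PCA features, nearest-neighbour comparison, voting over a sequence) can be sketched as follows. This is a hedged, simplified sketch: the Bayesian bias-compensation step is omitted, the vote is a plain trailing-window majority rather than the paper's weighted scheme, and all names and sizes are assumptions.

```python
# Assumed PCA + k-NN gesture classifier with majority-vote sequence smoothing.
import numpy as np
from collections import Counter
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)

# Toy training set: flattened gesture trajectories with behaviour labels.
X_train = rng.normal(size=(120, 60))
y_train = rng.integers(0, 3, size=120)

pca = PCA(n_components=8).fit(X_train)
knn = KNeighborsClassifier(n_neighbors=3).fit(pca.transform(X_train), y_train)

def label_sequence(frames, window=5):
    """Per-frame k-NN labels smoothed by a majority vote over a trailing window."""
    preds = knn.predict(pca.transform(frames))
    smoothed = []
    for i in range(len(preds)):
        votes = Counter(preds[max(0, i - window):i + 1])
        smoothed.append(votes.most_common(1)[0][0])
    return smoothed

print(label_sequence(rng.normal(size=(10, 60)))[:5])
```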