A Mimetic Strategy to Engage Voluntary Physical Activity In Interactive Entertainment
We describe the design and implementation of a vision based interactive
entertainment system that makes use of both involuntary and voluntary control
paradigms. Unintentional input to the system from a potential viewer is used to
drive attention-getting output and encourage the transition to voluntary
interactive behaviour. The iMime system consists of a character animation
engine based on the interaction metaphor of a mime performer that simulates
non-verbal communication strategies, without spoken dialogue, to capture and
hold the attention of a viewer. The system was developed in the context of a
project studying care of dementia sufferers. Care for a dementia sufferer can
place unreasonable demands on the time and attentional resources of their
caregivers or family members. Our study contributes to the eventual development
of a system aimed at providing relief to dementia caregivers, while at the same
time serving as a source of pleasant interactive entertainment for viewers. The
work reported here is also aimed at a more general study of the design of
interactive entertainment systems involving a mixture of voluntary and
involuntary control. Comment: 6 pages, 7 figures, ECAG08 workshop
A graphical model based solution to the facial feature point tracking problem
In this paper a facial feature point tracker that is motivated by applications
such as human-computer interfaces and facial expression analysis systems is
proposed. The proposed tracker is based on a graphical model framework. The
facial features are tracked through video streams by incorporating statistical relations in time as well as spatial relations between feature points. By exploiting the spatial relationships between feature points, the proposed method provides robustness in real-world conditions such as arbitrary head movements and occlusions. A Gabor feature-based occlusion detector is developed and used to handle occlusions. The performance of the proposed tracker has been evaluated
on real video data under various conditions including occluded facial gestures
and head movements. It is also compared to two popular methods, one based
on Kalman filtering exploiting temporal relations, and the other based on active
appearance models (AAM). Improvements provided by the proposed approach
are demonstrated through both visual displays and quantitative analysis.
An original framework for understanding human actions and body language by using deep neural networks
The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour.
By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way.
These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively.
Meanwhile, the processing of body movements plays a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements;
both are essential tasks in many computer vision applications, including event recognition, and video surveillance.
In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: in the first one, a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) for the recognition of sign language and semaphoric hand gestures is proposed; the second module presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs) is provided.
The performance of LSTM-RNNs is explored in depth, owing to their ability to model the long-term contextual information of temporal sequences, which makes them suitable for analysing body movements.
All the modules were tested on challenging datasets that are well known in the state of the art, showing remarkable results compared to current methods in the literature.
Looking at the Body: Automatic Analysis of Body Gestures and Self-Adaptors in Psychological Distress
Psychological distress is a significant and growing issue in society.
Automatic detection, assessment, and analysis of such distress is an active
area of research. Compared to modalities such as face, head, and vocal,
research investigating the use of the body modality for these tasks is
relatively sparse. This is, in part, due to the limited available datasets and
difficulty in automatically extracting useful body features. Recent advances in
pose estimation and deep learning have enabled new approaches to this modality
and domain. To enable this research, we have collected and analyzed a new
dataset containing full body videos for short interviews and self-reported
distress labels. We propose a novel method to automatically detect
self-adaptors and fidgeting, a subset of self-adaptors that has been shown to
be correlated with psychological distress. We perform analysis on statistical
body gestures and fidgeting features to explore how distress levels affect
participants' behaviors. We then propose a multi-modal approach that combines
different feature representations using Multi-modal Deep Denoising
Auto-Encoders and Improved Fisher Vector Encoding. We demonstrate that our
proposed model, combining audio-visual features with automatically detected
fidgeting behavioral cues, can successfully predict distress levels in a
dataset labeled with self-reported anxiety and depression levels.
Computationally efficient deformable 3D object tracking with a monocular RGB camera
182 p. Monocular RGB cameras are present in most scopes and devices, including embedded environments like robots, cars and home automation. Most of these environments have in common a significant presence of human operators with whom the system has to interact. This context provides the motivation to use the captured monocular images to improve the understanding of the operator and the surrounding scene for more accurate results and applications. However, monocular images do not have depth information, which is a crucial element in understanding the 3D scene correctly. Estimating the three-dimensional information of an object in the scene using a single two-dimensional image is already a challenge. The challenge grows if the object is deformable (e.g., a human body or a human face) and there is a need to track its movements and interactions in the scene. Several methods attempt to solve this task, including modern regression methods based on Deep Neural Networks. However, despite the great results, most are computationally demanding and therefore unsuitable for several environments. Computational efficiency is a critical feature for computationally constrained setups like embedded or onboard systems present in robotics and automotive applications, among others. This study proposes computationally efficient methodologies to reconstruct and track three-dimensional deformable objects, such as human faces and human bodies, using a single monocular RGB camera. To model the deformability of faces and bodies, it considers two types of deformations: non-rigid deformations for face tracking, and rigid multi-body deformations for body pose tracking. Furthermore, it studies their performance on computationally restricted devices like smartphones and onboard systems used in the automotive industry.
The information extracted from such devices gives valuable insight into human behaviour, a crucial element in improving human-machine interaction. We tested the proposed approaches in different challenging application fields like onboard driver monitoring systems, human behaviour analysis from monocular videos, and human face tracking on embedded devices.