2 research outputs found
Adaptive Body Gesture Representation for Automatic Emotion Recognition
We present a computational model and a system for the automated recognition of emotions starting from full-body movement. Three-dimensional motion data of full-body movements are obtained either from professional optical motion-capture systems (Qualisys) or from low-cost RGB-D sensors (Kinect and Kinect2). A number of features are then automatically extracted at different levels, from kinematics of a single joint to more global expressive features inspired by psychology and humanistic theories (e.g., contraction index, fluidity, and impulsiveness). An abstraction layer based on dictionary learning further processes these movement features to increase the model generality and to deal with intraclass variability, noise, and incomplete information characterizing emotion expression in human movement. The resulting feature vector is the input for a classifier performing real-time automatic emotion recognition based on linear support vector machines. The recognition performance of the proposed model is presented and discussed, including the tradeoff between precision of the tracking measures (we compare the Kinect RGB-D sensor and the Qualisys motion-capture system) versus dimension of the training dataset. The resulting model and system have been successfully applied in the development of serious games for helping autistic children learn to recognize and express emotions by means of their full-body movement
Online space-variant background modeling with sparse coding
In this paper, we propose a sparse coding approach to background modeling. The obtained model is based on dictionaries which we learn and keep up to date as new data are provided by a video camera. We observe that, without dynamic events, video frames may be seen as noisy data belonging to the background. Over time, such background is subject to local and global changes due to variable illumination conditions, camera jitter, stable scene changes, and intermittent motion of background objects. To capture the locality of some changes, we propose a space-variant analysis where we learn a dictionary of atoms for each image patch, the size of which depends on the background variability. At run time, each patch is represented by a linear combination of the atoms learnt online. A change is detected when the atoms are not sufficient to provide an appropriate representation, and stable changes over time trigger an update of the current dictionary. Even if the overall procedure is carried out at a coarse level, a pixel-wise segmentation can be obtained by comparing the atoms with the patch corresponding to the dynamic event. Experiments on benchmarks indicate that the proposed method achieves very good performances on a variety of scenarios. An assessment on long video streams confirms our method incorporates periodical changes, as the ones caused by variations in natural illumination. The model, fully data driven, is suitable as a main component of a change detection system