A Probabilistic Framework for Joint Head Tracking and Pose Estimation
Head tracking and pose estimation are usually treated as two sequential, separate problems: pose is estimated on the head patch provided by a tracking module. However, the precision of head pose estimation depends on tracking accuracy, which in turn could benefit from knowledge of the head orientation. This work therefore treats head tracking and pose estimation as two coupled problems in a probabilistic setting. Head pose models are learned and incorporated into a mixed-state particle filter framework for joint head tracking and pose estimation. Experimental results on real sequences show the effectiveness of the method in estimating more stable and accurate pose values.
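To make the idea of a mixed-state particle filter concrete, here is a minimal Python sketch of one filtering step in which each particle carries a continuous head location and a discrete pose index. It is an illustration under stated assumptions, not the authors' implementation: the 1-D location, the Gaussian random-walk motion, the pose transition table, and the `likelihood` callback are all hypothetical stand-ins for the learned models mentioned in the abstract.

```python
import math
import random

def mixed_state_pf_step(particles, weights, likelihood, pose_transition,
                        motion_std=2.0, rng=random):
    """One resample / propagate / reweight step of a toy mixed-state filter.

    particles: list of (x, pose) with x a float (hypothetical 1-D head
        location) and pose an int index into the discrete pose set.
    weights: normalized particle weights.
    likelihood: hypothetical callback (x, pose) -> non-negative float.
    pose_transition: row-stochastic table, pose_transition[i][j] = P(j | i).
    """
    n = len(particles)
    # Resample particles in proportion to their current weights.
    resampled = rng.choices(particles, weights=weights, k=n)
    moved = []
    for x, pose in resampled:
        # Continuous part: Gaussian random walk (an assumed motion model).
        new_x = x + rng.gauss(0.0, motion_std)
        # Discrete part: sample the next pose index from the transition row.
        row = pose_transition[pose]
        new_pose = rng.choices(range(len(row)), weights=row, k=1)[0]
        moved.append((new_x, new_pose))
    # Reweight jointly on location and pose, then normalize.
    raw = [likelihood(x, pose) for x, pose in moved]
    total = sum(raw)
    if total <= 0.0:
        raise ValueError("all particles have zero likelihood")
    return moved, [w / total for w in raw]
```

Because location and pose are scored by a single likelihood, a particle survives resampling only if both parts of its state explain the image, which is the coupling the abstract argues for.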
Implicit Motion Modeling in Tracking by Sequential Monte Carlo Filtering (Modélisation implicite du mouvement en suivi par filtrage de Monte Carlo séquentiel)
Sequential Monte Carlo (SMC) filtering is one of the most popular methods for visual tracking. In this context, it is generally assumed that, given the position of an object in successive images, the observations extracted from the images of this object are independent. In this paper, we argue that, on the contrary, these observations are strongly correlated. To account for this correlation, we propose a new model that can be interpreted as adding a likelihood term that implicitly models motion measurements. The new model resolves visual ambiguities while keeping object models simple, as shown by the results obtained on several sequences and with different object models (contour or color distribution).
Probabilistic Head Pose Tracking Evaluation in Single and Multiple Camera Setups
This paper presents our participation in the head pose estimation tasks of the CLEAR 07 evaluation workshop, where two tasks were to be addressed. The first task consisted of estimating head poses with respect to (w.r.t.) a single camera capturing people seated in a meeting room scenario. The second task consisted of estimating the head pose of people moving in a room from four cameras w.r.t. a global room coordinate system. To solve the first task, we used a probabilistic exemplar-based head pose tracking method using a mixed-state particle filter based on a representation in a joint state space of head localization and pose variables. This state space representation allows a combined search for both the optimal head location and pose. To solve the second task, we first applied the same head tracking framework to estimate the head pose w.r.t. each of the four cameras. Then, using the camera calibration parameters, the head poses w.r.t. the individual cameras were transformed into head poses w.r.t. the global room coordinates, and the estimates obtained from the four cameras were fused using reliability measures based on skin detection. Good head pose tracking performance was obtained for both tasks.
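As an illustration of the fusion step described above, the following Python sketch converts per-camera pan estimates to room coordinates and combines them with a reliability-weighted circular mean. It is a minimal sketch under assumptions not stated in the abstract: calibration is reduced to a single yaw offset per camera, only the pan angle is fused, and the reliability weights (for example, a skin-pixel fraction in the head patch) are hypothetical inputs. The function names are illustrative.

```python
import math

def to_room_pan(pan_cam_deg, camera_yaw_deg):
    """Convert a pan angle measured w.r.t. a camera into room coordinates.

    Assumes (hypothetically) that the calibration reduces to one yaw
    offset per camera; a real setup would apply a full 3-D rotation.
    """
    return (pan_cam_deg + camera_yaw_deg) % 360.0

def fuse_pans(pans_room_deg, reliabilities):
    """Reliability-weighted circular mean of pan angles, in [0, 360)."""
    # Average unit vectors rather than raw angles so that 350 and 10
    # degrees fuse to roughly 0 degrees instead of 180.
    x = sum(w * math.cos(math.radians(p))
            for p, w in zip(pans_room_deg, reliabilities))
    y = sum(w * math.sin(math.radians(p))
            for p, w in zip(pans_room_deg, reliabilities))
    if x == 0.0 and y == 0.0:
        raise ValueError("weighted directions cancel out; mean is undefined")
    return math.degrees(math.atan2(y, x)) % 360.0
```

A camera with zero reliability contributes nothing to the fused estimate, so a view in which no skin is detected is effectively ignored.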
Speech/Non-Speech Detection in Meetings from Automatically Extracted Low Resolution Visual Features
In this paper, we address the problem of estimating who is speaking from automatically extracted low-resolution visual cues in group meetings. Traditionally, speech/non-speech detection and speaker diarization try to find who speaks and when from audio features only. Recent work has addressed the problem audio-visually, but often with less emphasis on the visual component. Because of the high probability of losing the audio stream during video conferences, this work proposes methods for estimating speech using only low-resolution visual cues. We carry out experiments to compare how context, through the observation of group behaviour and task-oriented activities, can help improve estimates of speaking status. We test on 105 minutes of natural meeting data with unconstrained conversations.
A Cognitive and Unsupervised MAP Adaptation Approach to the Recognition of the Focus of Attention from Head Pose
In this paper, the recognition of the visual focus of attention (VFOA) of meeting participants (as defined by their eye gaze direction) from their head pose is addressed. To this end, the head pose observations are modeled using a Hidden Markov Model (HMM) whose hidden states correspond to the VFOA. The novelties are threefold. First, contrary to previous studies on the topic, in our set-up the potential VFOA of a person is not restricted to other participants only, but includes environmental targets (a table and a projection screen), which increases the complexity of the task, with more VFOA targets spread across both the pan and tilt gaze space. Second, the HMM parameters are set by exploiting results from cognitive science on saccadic eye motion, which allow us to predict what the head pose should be given an actual gaze target. Third, an unsupervised parameter adaptation step is proposed which accounts for the specific gazing behaviour of each participant. Using a publicly available corpus of 8 meetings featuring 4 persons, we analyze the above methods by evaluating, through objective performance measures, the recognition of the VFOA from head pose information obtained using either a magnetic sensor device or a vision-based tracking system.
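The HMM formulation above can be sketched in a few lines of Python: hidden states are VFOA targets, observations are head pan angles, and the most likely target sequence is recovered with the Viterbi algorithm. Everything numeric here is an assumption made for illustration: the linear rule in `head_pan_for_target` (the head covers a fraction `kappa` of the gaze shift), the Gaussian emission width, and the `p_stay` transition probability are hypothetical, and the unsupervised MAP adaptation step is not shown.

```python
import math

def head_pan_for_target(target_pan, reference_pan=0.0, kappa=0.5):
    """Predicted head pan when gazing at a target (hypothetical linear
    rule: the head performs a fraction kappa of the gaze rotation)."""
    return reference_pan + kappa * (target_pan - reference_pan)

def log_gauss(x, mean, std):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))

def viterbi_vfoa(pans, targets, std=10.0, p_stay=0.9):
    """Most likely VFOA target sequence for a sequence of head pans.

    targets: dict mapping target name -> gaze pan angle in degrees.
    """
    names = list(targets)
    n = len(names)
    if n < 2 or not pans:
        raise ValueError("need at least two targets and one observation")
    means = [head_pan_for_target(targets[k]) for k in names]
    log_stay = math.log(p_stay)
    log_move = math.log((1.0 - p_stay) / (n - 1))
    # Uniform prior over targets, then the first emission.
    score = [math.log(1.0 / n) + log_gauss(pans[0], m, std) for m in means]
    back = []
    for x in pans[1:]:
        new_score, ptr = [], []
        for j in range(n):
            cands = [score[i] + (log_stay if i == j else log_move)
                     for i in range(n)]
            best = max(range(n), key=cands.__getitem__)
            ptr.append(best)
            new_score.append(cands[best] + log_gauss(x, means[j], std))
        score = new_score
        back.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n), key=score.__getitem__)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [names[s] for s in path]
```

Setting the emission means from the gaze targets, rather than learning them from labeled data, is what lets such a model be configured from the room geometry alone; the adaptation step described in the abstract would then refine these means per participant.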
Multi-party Focus of Attention Recognition in Meetings from Head Pose and Multimodal Contextual Cues
We address the problem of recognizing the visual focus of attention (VFOA) of meeting participants from their head pose and contextual cues. The main contribution of the paper is the use of a head pose posterior distribution as a representation of the head pose information contained in the image data. This posterior encodes the probabilities of the different head poses given the image data, and therefore constitutes a richer representation of the data than the mean or the mode of this distribution, which all previous work has used. These observations are exploited in a joint interaction model of all meeting participants' pose observations, VFOAs, and speaking status, together with environmental contextual cues. Numerical experiments on a public database of 4 meetings lasting 22 min on average show that this change of representation allows for a 5.4% gain with respect to the standard approach using head pose as the observation.
Recognizing People's Focus of Attention from Head Poses: a Study
This paper presents a study on the recognition of the visual focus of attention (VFOA) of meeting participants based on their head pose. Contrary to previous studies on the topic, in our set-up the potential VFOA of a person is not restricted to other meeting participants only, but includes environmental targets (a table and a projection screen). This has two consequences. First, it increases the number of possible ambiguities in identifying the VFOA from the head pose. Second, in the scenario we present here, full knowledge of the head pointing direction is required to identify the VFOA; an incomplete representation of the head pointing direction (head pan only) will not suffice. In this paper, using a corpus of 8 meetings of 10 minutes average length, featuring 4 persons discussing statements projected on a screen, we analyze the above issues by evaluating, through numerical performance measures, the recognition of the VFOA from head pose information obtained using either a magnetic sensor device (the ground truth) or a vision-based tracking system (head pose estimates). The results clearly show that in such complex but realistic situations, it can be optimistic to believe that the recognition of the VFOA can be based solely on the head pose, as some previous studies had suggested.
A Rao-Blackwellized Mixed State Particle Filter for Head Pose Tracking
This paper presents a Rao-Blackwellized mixed-state particle filter for joint head tracking and pose estimation. Rao-Blackwellizing a particle filter consists of marginalizing some of the variables of the state space in order to compute their posterior probability density function exactly. Marginalizing variables reduces the dimension of the configuration space, which makes the particle filter more efficient and lowers the number of particles required. Experiments were conducted on our head pose ground truth video database, which consists of people engaged in meeting discussions. Results from these experiments demonstrate the benefits of the Rao-Blackwellized particle filter model, with fewer particles, over the mixed-state particle filter model.
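The marginalization described above can be illustrated with a short Python sketch. The abstract does not say which variables are marginalized, so the sketch assumes, for illustration only, that it is a discrete pose index: each particle then samples only the remaining (continuous) variables and carries an exact distribution over poses, updated in closed form. The function name, the transition table, and the per-pose likelihoods are hypothetical.

```python
def rb_pose_update(pose_belief, transition, pose_likelihoods):
    """Exact update of the pose distribution carried by one particle.

    pose_belief: P(pose at t-1 | past data), a list summing to 1.
    transition: row-stochastic table, transition[i][j] = P(j | i).
    pose_likelihoods: hypothetical p(image | particle location, pose j).
    Returns (evidence, new_belief): evidence is the likelihood with the
    pose summed out, used as this particle's weight increment.
    """
    n = len(pose_belief)
    # Prediction: push the belief through the pose transition model.
    predicted = [sum(pose_belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Correction: multiply by the per-pose likelihoods.
    joint = [predicted[j] * pose_likelihoods[j] for j in range(n)]
    evidence = sum(joint)
    if evidence <= 0.0:
        raise ValueError("observation has zero likelihood under every pose")
    return evidence, [v / evidence for v in joint]
```

Since the pose is summed out rather than sampled, particles only need to cover the sampled part of the state, which is consistent with the lower particle counts reported in the abstract.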
A Video Database for Head Pose Tracking Evaluation
This document describes our work to provide a video database of people in real situations, with their head pose continuously annotated over time. The head poses were annotated using a magnetic 3D location and orientation tracker, the Flock of Birds. The recording environments were a meeting room and an office, each with its usual light sources. 16 people took part in the meeting room recordings and 15 in the office recordings, giving high variability in person appearance.