3 research outputs found

Conditional Sequence Model for Context-based Recognition of Gaze Aversion

Author: Trevor Darrell

Abstract. Eye gaze and gesture form key conversational grounding cues that are used extensively in face-to-face interaction among people. To accurately recognize visual feedback during interaction, people often use contextual knowledge from previous and current events to anticipate when feedback is most likely to occur. In this paper, we investigate how dialog context from an embodied conversational agent (ECA) can improve visual recognition of eye gestures. We propose a new framework for contextual recognition based on Latent-Dynamic Conditional Random Field (LDCRF) models to learn the sub-structure and external dynamics of contextual cues. Our experiments show that adding contextual information improves visual recognition of eye gestures and demonstrate that the LDCRF model for context-based recognition of gaze aversion gestures outperforms Support Vector Machines, Hidden Markov Models

CiteSeerX

Infinite Hidden Conditional Random Fields for the Recognition of Human Behaviour

Author: Bousmalis Konstantinos
Publication venue: Computing, Imperial College London
Publication date: 01/04/2015

While detecting and interpreting temporal patterns of nonverbal behavioral cues in a given context is a natural and often unconscious process for humans, it remains a rather difficult task for computer systems. In this thesis we are primarily motivated by the problem of recognizing expressions of high-level behavior, and specifically agreement and disagreement. We thoroughly dissect the problem by surveying the nonverbal behavioral cues that could be present during displays of agreement and disagreement; we discuss a number of methods that could be used or adapted to detect these suggested cues; we list some publicly available databases these tools could be trained on for the analysis of spontaneous, audiovisual instances of agreement and disagreement; we examine the few existing attempts at agreement and disagreement classification; and we discuss the challenges in automatically detecting agreement and disagreement. We present experiments that show that an existing discriminative graphical model, the Hidden Conditional Random Field (HCRF), is the best performing on this task. The HCRF is a discriminative latent variable model which has been previously shown to successfully learn the hidden structure of a given classification problem (provided an appropriate validation of the number of hidden states). We show here that HCRFs are also able to capture what makes each of these social attitudes unique. We present an efficient technique to analyze the concepts learned by the HCRF model and show that these coincide with the findings from social psychology regarding which cues are most prevalent in agreement and disagreement. Our experiments are performed on a spontaneous expressions dataset curated from real televised debates. The HCRF model outperforms conventional approaches such as Hidden Markov Models and Support Vector Machines.
Subsequently, we examine existing graphical models that use Bayesian nonparametrics to have a countably infinite number of hidden states and adapt their complexity to the data at hand. We identify a gap in the literature, namely the lack of a discriminative graphical model of this kind, and we present our suggestion for the first such model: an HCRF with an infinite number of hidden states, the Infinite Hidden Conditional Random Field (IHCRF). In summary, the IHCRF is an undirected discriminative graphical model for sequence classification and uses a countably infinite number of hidden states. We present two variants of this model. The first is a fully nonparametric model that relies on Hierarchical Dirichlet Processes and a Markov Chain Monte Carlo inference approach. The second is a semi-parametric model that uses Dirichlet Process Mixtures and relies on a mean-field variational inference approach. We show that both models are able to converge to a correct number of represented hidden states, and perform as well as the best finite HCRFs (chosen via cross-validation) for the difficult tasks of recognizing instances of agreement, disagreement, and pain in audiovisual sequences.

Spiral - Imperial College Digital Repository