Conditional Sequence Model for Context-based Recognition of Gaze Aversion
Abstract. Eye gaze and gesture form key conversational grounding cues that are used extensively in face-to-face interaction among people. To accurately recognize visual feedback during interaction, people often use contextual knowledge from previous and current events to anticipate when feedback is most likely to occur. In this paper, we investigate how dialog context from an embodied conversational agent (ECA) can improve visual recognition of eye gestures. We propose a new framework for contextual recognition based on Latent-Dynamic Conditional Random Field (LDCRF) models to learn the sub-structure and external dynamics of contextual cues. Our experiments show that adding contextual information improves visual recognition of eye gestures and demonstrate that the LDCRF model for context-based recognition of gaze aversion gestures outperforms Support Vector Machines, Hidden Markov Models
Infinite Hidden Conditional Random Fields for the Recognition of Human Behaviour
While detecting and interpreting temporal patterns of nonverbal behavioral cues
in a given context is a natural and often unconscious process for humans, it
remains a rather difficult task for computer systems.
In this thesis we are primarily motivated by the problem of recognizing
expressions of high-level behavior, and specifically agreement and
disagreement.
We thoroughly dissect the problem by surveying the nonverbal behavioral cues
that could be present during displays of agreement and disagreement; we discuss
a number of methods that could be used or adapted to detect these suggested
cues; we list some publicly available databases these tools could be trained on
for the analysis of spontaneous, audiovisual instances of agreement and
disagreement; we examine the few existing attempts at agreement and disagreement
classification, and we discuss the challenges in automatically detecting
agreement and disagreement.
We present experiments that show that an existing discriminative graphical
model, the Hidden Conditional Random Field (HCRF), is the best performing model
on this task. The
HCRF is a discriminative latent variable model which has been previously shown
to successfully learn the hidden structure of a given classification problem
(provided an appropriate validation of the number of hidden states).
We show here that HCRFs are also able to capture what makes each of these social
attitudes unique. We present an efficient technique to analyze the concepts
learned by the HCRF model and show that these coincide with the findings from
social psychology regarding which cues are most prevalent in agreement and
disagreement. Our experiments are performed on a spontaneous expressions dataset
curated from real televised debates.
The HCRF model outperforms conventional approaches such as Hidden Markov Models
and Support Vector Machines.
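
To make the role of the hidden states concrete, the following is a minimal sketch of how a linear-chain HCRF could turn a feature sequence into class probabilities by summing over all hidden-state paths with a forward recursion. It is not taken from the thesis: the parameter names (`w_obs`, `w_label`, `w_trans`), the choice of potentials, and the toy numbers are assumptions made for illustration only.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def hcrf_log_partition(x, y, w_obs, w_label, w_trans):
    """Log of the sum over hidden paths h of exp(score(y, h, x)).

    Assumed (hypothetical) potentials:
      w_obs[h]        weights linking hidden state h to the observation features
      w_label[y][h]   compatibility of hidden state h with class y
      w_trans[y][g][h] class-specific transition score from state g to state h
    """
    n_hidden = len(w_obs)

    def node(t, h):
        # Observation score plus label compatibility for state h at time t.
        return sum(w * f for w, f in zip(w_obs[h], x[t])) + w_label[y][h]

    # Forward recursion in the log domain.
    alpha = [node(0, h) for h in range(n_hidden)]
    for t in range(1, len(x)):
        alpha = [
            node(t, h)
            + logsumexp([alpha[g] + w_trans[y][g][h] for g in range(n_hidden)])
            for h in range(n_hidden)
        ]
    return logsumexp(alpha)


def hcrf_posterior(x, w_obs, w_label, w_trans):
    """Class probabilities P(y | x), with hidden states marginalized out."""
    n_labels = len(w_label)
    log_z = [
        hcrf_log_partition(x, y, w_obs, w_label, w_trans) for y in range(n_labels)
    ]
    total = logsumexp(log_z)
    return [math.exp(v - total) for v in log_z]


# Toy example: 3 time steps, 2 features, 2 hidden states, 2 classes.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_obs = [[0.5, -0.2], [-0.3, 0.8]]
w_label = [[0.1, -0.1], [-0.4, 0.6]]
w_trans = [[[0.2, -0.1], [0.0, 0.3]], [[-0.2, 0.4], [0.1, 0.0]]]
posterior = hcrf_posterior(x, w_obs, w_label, w_trans)
```

In this sketch the number of hidden states is fixed by the shape of `w_obs`, which is the quantity that would need to be validated for a finite model.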
Subsequently, we examine existing graphical models that use Bayesian
nonparametrics to have a countably infinite number of hidden states and adapt
their complexity to the data at hand.
We identify a gap in the literature, namely the lack of a discriminative
graphical model of this kind, and we present our suggestion for the first such model: an HCRF
with an infinite number of hidden states, the Infinite Hidden Conditional Random
Field (IHCRF).
In summary, the IHCRF is an undirected discriminative graphical model for
sequence classification and uses a countably infinite number of hidden states.
We present two variants of this model. The first is a fully nonparametric model
that relies on Hierarchical Dirichlet Processes and a Markov Chain Monte Carlo
inference approach. The second is a semi-parametric model that uses Dirichlet
Process Mixtures and relies on a mean-field variational inference approach. We
show that both models are able to converge to a correct number of represented
hidden states, and perform as well as the best finite HCRFs (chosen via
cross-validation) for the difficult tasks of recognizing instances of
agreement, disagreement, and pain in audiovisual sequences.
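
As an illustration of how a model can have a countably infinite number of hidden states yet use only a few of them, the sketch below draws state weights from the standard stick-breaking construction of a Dirichlet process. This is the generic textbook construction, not the inference procedure of either IHCRF variant; the function names, the truncation level, and the usage threshold are assumptions made for this example.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw the first `truncation` weights of a Dirichlet process.

    Each step breaks off a Beta(1, alpha) fraction of the remaining stick.
    Returns the weights and the leftover mass assigned to all later states.
    """
    weights = []
    remaining = 1.0
    for _ in range(truncation):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining


def effective_states(weights, threshold=0.01):
    """Count states whose weight exceeds a (hypothetical) usage threshold."""
    return sum(1 for w in weights if w > threshold)


rng = random.Random(0)  # fixed seed so the draw is repeatable
weights, leftover = stick_breaking_weights(alpha=2.0, truncation=50, rng=rng)
used = effective_states(weights)
```

A smaller concentration parameter `alpha` tends to put most of the mass on the first few states, while a larger one spreads it over more states, which is one way such priors let model complexity vary with the data.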