HEMVIP: Human Evaluation of Multiple Videos in Parallel
In many research areas (for example, motion and gesture generation), objective
measures alone do not provide an accurate impression of key stimulus traits
such as perceived quality or appropriateness. The gold standard is instead to
evaluate these aspects through user studies, especially subjective evaluations
of video stimuli. Common evaluation paradigms either present individual stimuli
to be scored on Likert-type scales, or ask users to compare and rate videos in
a pairwise fashion. However, the time and resources required for such
evaluations scale poorly as the number of conditions to be compared increases.
Building on standards used for evaluating the quality of multimedia codecs,
this paper instead introduces a framework for granular rating of multiple
comparable videos in parallel. This methodology essentially analyses all
condition pairs at once. Our contributions are 1) a proposed framework, called
HEMVIP, for parallel and granular evaluation of multiple video stimuli and 2) a
validation study confirming that results obtained using the tool are in close
agreement with results of prior studies using conventional multiple pairwise
comparisons.
Comment: 8 pages, 2 figures
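To make the scaling argument concrete, the following is a minimal Python sketch, not taken from the paper: the function name, the 0-100 slider scale, and the condition labels are all hypothetical. It shows how one parallel rating pass over N conditions yields all N*(N-1)/2 pairwise comparisons at once, whereas a conventional pairwise design would require that many separate trials per rater.

    from itertools import combinations

    def pairwise_from_parallel(ratings):
        """Derive signed pairwise preferences from one parallel rating pass.

        `ratings` maps condition label -> slider score (assumed 0-100).
        Rating N conditions side by side yields all N*(N-1)/2 pairwise
        comparisons in a single trial.
        """
        return {(a, b): ratings[a] - ratings[b]
                for a, b in combinations(sorted(ratings), 2)}

    # Example: 5 conditions rated on one screen -> 10 pairwise differences,
    # which a pairwise paradigm would spread over 10 separate trials.
    scores = {"cond_A": 72, "cond_B": 55, "cond_C": 81, "cond_D": 40, "cond_E": 63}
    print(len(pairwise_from_parallel(scores)))  # 10 == 5 * 4 / 2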
A Review of Evaluation Practices of Gesture Generation in Embodied Conversational Agents
Embodied Conversational Agents (ECAs) take on different forms, including
virtual avatars and physical agents such as humanoid robots. ECAs are often
designed to produce nonverbal behaviour that complements or enhances their
verbal communication. One form of nonverbal behaviour is co-speech gesturing:
movements that the agent makes with its arms and hands, paired with its
verbal communication. Co-speech gestures for ECAs can be created using
different generation methods, such as rule-based and data-driven processes.
However, reports on gesture generation methods use a variety of evaluation
measures, which hinders comparison. To address this, we conducted a systematic
review on co-speech gesture generation methods for iconic, metaphoric, deictic
or beat gestures, including their evaluation methods. We reviewed 22 studies
in which an ECA with a human-like upper body used co-speech gesturing in a
social human-agent interaction and whose performance was evaluated in a user
study. We found that most studies used a within-subject design and relied on a
form of subjective evaluation, but lacked a systematic approach. Overall,
methodological quality was low-to-moderate and few systematic conclusions could
be drawn. We argue that the field requires rigorous and uniform tools for the
evaluation of co-speech gesture systems. We propose recommendations for
future empirical evaluation, including standardised phrases and test scenarios
for testing generative models, and a research checklist that can be used both
to report the information relevant to evaluating generative models and to
evaluate their co-speech gesture use.
Comment: 9 pages