3 research outputs found

    HEMVIP: Human Evaluation of Multiple Videos in Parallel

    In many research areas, for example motion and gesture generation, objective measures alone do not provide an accurate impression of key stimulus traits such as perceived quality or appropriateness. The gold standard is instead to evaluate these aspects through user studies, especially subjective evaluations of video stimuli. Common evaluation paradigms either present individual stimuli to be scored on Likert-type scales, or ask users to compare and rate videos in a pairwise fashion. However, the time and resources required for such evaluations scale poorly as the number of conditions to be compared increases. Building on standards used for evaluating the quality of multimedia codecs, this paper instead introduces a framework for granular rating of multiple comparable videos in parallel. This methodology essentially analyses all condition pairs at once. Our contributions are 1) a proposed framework, called HEMVIP, for parallel and granular evaluation of multiple video stimuli and 2) a validation study confirming that results obtained using the tool are in close agreement with results of prior studies using conventional multiple pairwise comparisons.
    Comment: 8 pages, 2 figures
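
    The efficiency argument here is that one page of parallel ratings implicitly yields an outcome for every pair of the conditions shown on that page. As a rough illustration only, the minimal Python sketch below converts one page of parallel ratings into pairwise outcomes; the condition names, the 0-100 score range and the tie handling are illustrative assumptions, not details of the HEMVIP implementation.

    from itertools import combinations

    def parallel_page_to_pairwise(ratings):
        """Turn one parallel-rating page (condition -> slider score, e.g. 0-100)
        into an outcome for every pair of conditions shown on that page."""
        outcomes = []
        for a, b in combinations(sorted(ratings), 2):
            if ratings[a] > ratings[b]:
                outcomes.append((a, b, "first preferred"))
            elif ratings[a] < ratings[b]:
                outcomes.append((a, b, "second preferred"))
            else:
                outcomes.append((a, b, "tie"))
        return outcomes

    # One page rated by one participant covers all C(4, 2) = 6 condition pairs,
    # whereas a pairwise paradigm would need 6 separate comparison trials.
    page = {"cond_A": 72, "cond_B": 55, "cond_C": 55, "cond_D": 30}
    for first, second, result in parallel_page_to_pairwise(page):
        print(first, second, result)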

    A Review of Evaluation Practices of Gesture Generation in Embodied Conversational Agents

    Embodied Conversational Agents (ECAs) take on different forms, including virtual avatars or physical agents such as humanoid robots. ECAs are often designed to produce nonverbal behaviour to complement or enhance their verbal communication. One form of nonverbal behaviour is co-speech gesturing, which involves movements that the agent makes with its arms and hands, paired with verbal communication. Co-speech gestures for ECAs can be created using different generation methods, such as rule-based and data-driven processes. However, reports on gesture generation methods use a variety of evaluation measures, which hinders comparison. To address this, we conducted a systematic review of co-speech gesture generation methods for iconic, metaphoric, deictic or beat gestures, including their evaluation methods. We reviewed 22 studies that had an ECA with a human-like upper body that used co-speech gesturing in a social human-agent interaction, including a user study to evaluate its performance. We found that most studies used a within-subject design and relied on a form of subjective evaluation, but lacked a systematic approach. Overall, methodological quality was low-to-moderate and few systematic conclusions could be drawn. We argue that the field requires rigorous and uniform tools for the evaluation of co-speech gesture systems. We propose recommendations for future empirical evaluation, including standardised phrases and test scenarios for testing generative models, as well as a research checklist that can be used to report relevant information for the evaluation of generative models and to evaluate co-speech gesture use.
    Comment: 9 pages