280 research outputs found

    Development of duplex eye contact framework for human-robot inter communication

    Funding Information: This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (Ministry of Science and ICT) under Grant NRF-2020R1A2B5B02002478, in part by Sejong University through its Faculty Research Program, and in part by the Directorate of Research and Extension (DRE), Chittagong University of Engineering and Technology. Peer reviewed. Publisher PDF.

    Personality Expression of Conversational Agents through Visual and Verbal Feedback, and Personality Preferences according to Performed Tasks

    Master's thesis, Department of Communication, College of Social Sciences, Seoul National University Graduate School, August 2019. Advisor: Joonhwan Lee.
    Conversational agents with psychological and emotional abilities could facilitate natural communication between humans and computers, while agents' unnatural expressions and reactions could frustrate users and harm the relationship. Whereas affective computing has mainly addressed this by applying emotion, this research applies the concept of personality to conversational agents to implement natural feedback and reactions, and explores how conversational agents' personalities can be expressed. The selected cues were visual feedback and verbal cues. Using a between-participants design, Study 1 measured the perception of five personalities toward different kinds of visual feedback, and Study 2 measured the perception of five personalities depending on different verbal cues delivered in voices of different genders. Given that certain personalities of conversational agents are considered more suitable for certain tasks, Study 3 investigated user preference and perceived intelligence toward conversational agents with different personalities and tasks. The results of Studies 1 and 2 show that the motion of the visual feedback strongly influenced perceived personality, whereas color was not a decisive factor; in addition, except for agreeableness, different verbal cues were perceived as different personalities. The results of Study 3 show that for conversational agents performing service, physical, and office tasks, openness was the most preferred and perceived as the most intelligent, while for social tasks extraverted conversational agents were the most preferred and perceived as the most intelligent. Overall, fast and active visual feedback is suitable for designing conversational agents with distinct and positive personalities, perceptions of conversational agents' personalities differed according to the gender of the voice, and diverse, expressive cues were suitable for expressing positive personalities. Interactions between conversational agents and humans showed perception patterns similar to those in human-human interactions.
    Contents: 1. Introduction; 2. Related work; 3. Study 1; 4. Study 2; 5. Study 3; 6. Conclusions; 7. Discussion for overall study; References; Appendix 1. Big Five personality questionnaires; Appendix 2. Godspeed scale questionnaires; Abstract in Korean.
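    To make the kind of analysis the studies describe concrete, below is a hedged sketch of a between-participants comparison: mean perceived ratings for one Big Five trait compared across visual-feedback motion conditions with a one-way ANOVA. The condition names, rating scale, and values are illustrative assumptions, not the thesis's data or analysis code.

```python
# Hedged sketch of a between-participants comparison of perceived personality.
# Ratings below are invented placeholder values on a hypothetical 1-7 scale.
import numpy as np
from scipy import stats

# Perceived "extraversion" ratings from three hypothetical motion conditions.
slow = np.array([2, 3, 3, 2, 4, 3, 2, 3])
medium = np.array([4, 4, 5, 3, 4, 5, 4, 4])
fast = np.array([6, 5, 6, 7, 5, 6, 6, 5])

# One-way ANOVA tests whether mean perceived extraversion differs by motion.
f_stat, p_value = stats.f_oneway(slow, medium, fast)
print(f"Means: slow={slow.mean():.2f}, medium={medium.mean():.2f}, fast={fast.mean():.2f}")
print(f"One-way ANOVA across motion conditions: F={f_stat:.2f}, p={p_value:.4f}")
```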

    People Interpret Robotic Non-linguistic Utterances Categorically

    We present results of an experiment probing whether adults exhibit categorical perception when affectively rating robot-like sounds (Non-linguistic Utterances). The experimental design followed the traditional methodology from the psychology domain for measuring categorical perception: stimulus continua for robot sounds were presented to subjects, who were asked to complete a discrimination task and an identification task. In the former, subjects were asked to rate whether stimulus pairs were affectively different; in the latter, they were asked to rate single stimuli affectively. The experiment confirms that Non-linguistic Utterances can convey affect and that they are drawn towards prototypical emotions, showing that people exhibit categorical perception at the level of inferred affective meaning when hearing robot-like sounds. We speculate on how these insights can be used to automatically design and generate affect-laden robot-like utterances.
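    As an illustration of the analysis such a design implies, the sketch below fits a logistic identification curve along a stimulus continuum and checks for a discrimination peak near the fitted boundary, the classic signature of categorical perception. The step count, labels, and data values are invented for illustration; the paper's actual stimuli and ratings are not reproduced here.

```python
# Minimal sketch of a categorical-perception analysis over a stimulus continuum.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    """Standard logistic identification curve with boundary x0 and slope k."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

steps = np.arange(1, 9)                       # positions along an 8-step continuum
p_happy = np.array([0.05, 0.08, 0.15, 0.40,   # proportion of "happy" labels per step
                    0.75, 0.90, 0.95, 0.97])  # (illustrative values only)

# Fit the identification function; the inflection point x0 estimates the
# category boundary between the two affective prototypes.
(x0, k), _ = curve_fit(logistic, steps, p_happy, p0=[4.5, 1.0])

# Categorical perception predicts a discrimination peak near the boundary:
# between-category pairs are rated as more affectively different than
# within-category pairs of equal physical distance.
discrim = np.array([0.10, 0.12, 0.35, 0.60, 0.40, 0.15, 0.11])  # pairs (i, i+1)
peak_pair = int(np.argmax(discrim)) + 1

print(f"Estimated category boundary at step {x0:.2f} (slope {k:.2f})")
print(f"Discrimination peaks between steps {peak_pair} and {peak_pair + 1}")
```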

    MoveBox: Democratizing MoCap for the Microsoft Rocketbox Avatar Library

    This paper presents MoveBox, an open-sourced toolbox for animating motion-captured (MoCap) movements onto the Microsoft Rocketbox library of avatars. Motion capture is performed in real time using a single depth sensor, such as Azure Kinect or Windows Kinect V2, or extracted offline from existing RGB videos by leveraging deep-learning computer vision techniques. The toolbox enables real-time animation of the user's avatar by converting transformations between systems that have different joints and hierarchies. Additional features include recording, playback, and looping of animations, basic audio lip sync, blinking, resizing of avatars, and finger and hand animations. Our main contribution lies both in the creation of this open-source tool and in its validation on different devices, together with a discussion of MoveBox's capabilities by end users.
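    The retargeting step the abstract mentions (converting transformations between skeletons with different joints and hierarchies) can be sketched roughly as below. This is not the MoveBox source; the joint map, bone names, and rest-pose handling are simplified assumptions meant only to show the idea of remapping sensor joint rotations onto an avatar rig.

```python
# Hedged sketch of joint retargeting between a Kinect-style skeleton and an
# avatar rig with differently named bones. All names and mappings are hypothetical.
from dataclasses import dataclass
from scipy.spatial.transform import Rotation as R

# Source (sensor) joint name -> target (avatar) bone name.
JOINT_MAP = {
    "SpineChest": "Spine2",
    "ShoulderLeft": "LeftArm",
    "ElbowLeft": "LeftForeArm",
    "WristLeft": "LeftHand",
}

@dataclass
class Bone:
    name: str
    rest_rotation: R  # bind-pose rotation of the avatar bone

def retarget_frame(sensor_rotations: dict, avatar_bones: dict) -> dict:
    """Convert one frame of sensor joint rotations into avatar bone rotations.

    Each sensor rotation (relative to the sensor's rest pose) is composed with
    the avatar bone's own bind pose, so that differing rest orientations
    between the two hierarchies do not distort the motion.
    """
    out = {}
    for src, dst in JOINT_MAP.items():
        if src not in sensor_rotations or dst not in avatar_bones:
            continue  # skip joints the sensor did not track this frame
        bone = avatar_bones[dst]
        out[dst] = bone.rest_rotation * sensor_rotations[src]
    return out
```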

    Speech Driven Expressive Animations

    Current state-of-the-art lip-sync facial animation systems use vision-based performance capture methods, which are highly resource-consuming. These techniques lack scalability and post hoc customizability, whilst simpler and more automated alternatives often lack expressiveness. We propose an extension of a deep-learning-based, speech-driven lip-sync facial synthesis system that allows for expressiveness and manual tweaking in the emotion space. Our model generates expressive animations by mapping recorded speech features into facial rig parameters. The architecture consists of a conditional Variational Autoencoder conditioned on speech, whose latent space controls the facial expression during inference and is driven by predictions from a Speech Emotion Recognition (SER) module. To the best of our knowledge, this approach has not been tried before in the literature. The results show that our SER model is able to make meaningful predictions and generalize to unseen game speech utterances. Our user study shows that participants significantly prefer our model's animations over animations generated from random emotions and over a baseline neutral-emotion model.
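    A rough sketch of the described architecture, under assumed feature dimensions, is shown below: a conditional VAE whose encoder and decoder are both conditioned on speech features, with the latent code standing in for the expression signal a SER module would supply at inference time. Layer sizes, dimensions, and names are illustrative assumptions, not the authors' configuration.

```python
# Hedged PyTorch sketch of a speech-conditioned VAE for facial rig parameters.
import torch
import torch.nn as nn

SPEECH_DIM, RIG_DIM, LATENT_DIM = 128, 51, 8  # assumed sizes for illustration

class SpeechConditionedVAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder sees rig parameters together with the speech condition.
        self.encoder = nn.Sequential(
            nn.Linear(RIG_DIM + SPEECH_DIM, 256), nn.ReLU(),
            nn.Linear(256, 2 * LATENT_DIM),  # outputs mean and log-variance
        )
        # Decoder reconstructs rig parameters from latent code + speech.
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM + SPEECH_DIM, 256), nn.ReLU(),
            nn.Linear(256, RIG_DIM),
        )

    def forward(self, rig, speech):
        stats = self.encoder(torch.cat([rig, speech], dim=-1))
        mu, logvar = stats.chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        recon = self.decoder(torch.cat([z, speech], dim=-1))
        return recon, mu, logvar

    @torch.no_grad()
    def animate(self, speech, z):
        """At inference time the latent z (e.g. predicted from emotion labels)
        selects the expression while the speech features drive the lip motion."""
        return self.decoder(torch.cat([z, speech], dim=-1))
```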

    Improved facial feature fitting for model based coding and animation

    EThOS - Electronic Theses Online Service, United Kingdom.

    Robot-Mediated Interviews: a Robotic Intermediary for Facilitating Communication with Children

    Robots have been used in a variety of education, therapy, and entertainment contexts. This thesis introduces the novel application of using humanoid robots for Robot-Mediated Interviews (RMIs). In the initial stages of this research it was necessary to first establish, as a baseline, whether children would respond to a robot in an interview setting; therefore the first study compared how children responded to a robot and to a human in an interview setting. Following this successful initial investigation, the second study expanded on this research by examining how children would respond to questions of different types and difficulty from a robot compared to a human interviewer. Building on these studies, the third study investigated how an RMI approach would work for children with special needs. Following the positive results from the three studies, which indicated that an RMI approach may have some potential, three separate user panel sessions were organised with user groups that have expertise in working with children and for whom the system would be potentially useful in their daily work. The panel sessions were designed to gather feedback on the previous studies and to outline a set of requirements for making an RMI system feasible for real-world users. The feedback and requirements from the user groups were considered and implemented in the system before a final field trial was conducted with a potential real-world user. The results of the studies reveal that children generally interacted with KASPAR in a way very similar to how they interacted with a human interviewer, regardless of question type or difficulty. The feedback gathered from experts working with children suggested that the three most important and desirable features of an RMI system were reliability, flexibility, and ease of use. The feedback from the experts also indicated that an RMI system would most likely be used with children with special needs. The final field trial with 10 children and a potential real-world user illustrated that an RMI system could potentially be used effectively outside of a research context, with all of the children in the trial responding to the robot. Feedback from the educational psychologist testing the system suggested that an RMI approach could have real-world implications if the system were developed further.

    Multi-Sensory Emotion Recognition with Speech and Facial Expression

    Emotion plays an important role in human beings' daily lives. Understanding emotions and recognizing how to react to others' feelings are fundamental to engaging in successful social interactions. Emotion recognition is not only significant in daily life but also a hot topic in academic research, as new techniques such as emotion recognition from speech context reveal how emotions relate to the content we are uttering. The demand for and importance of emotion recognition have greatly increased in recent years in many applications, such as video games, human-computer interaction, cognitive computing, and affective computing. Emotion recognition can be performed from many sources, including text, speech, hand and body gesture, and facial expression. Presently, most emotion recognition methods use only one of these sources. Human emotion changes from moment to moment, and relying on a single modality may not reflect it correctly. This research is motivated by the desire to understand and evaluate human emotion through multiple channels, such as speech and facial expression. In this dissertation, multi-sensory emotion recognition is explored. The proposed framework can recognize emotion from speech, from facial expression, or from both. The system has three main parts: the facial emotion recognizer, the speech emotion recognizer, and the information fusion. The information fusion part takes the results from the speech emotion recognizer and the facial emotion recognizer, integrates them with a novel weighted method, and gives a final decision on the emotion after fusion. The experiments show that with the weighted fusion method, accuracy improves by an average of 3.66% compared to fusion without weights, and the improvement in recognition rate reaches 18.27% and 5.66% compared to speech emotion recognition alone and facial expression recognition alone, respectively. By improving emotion recognition accuracy, the proposed multi-sensory emotion recognition system can help to improve the naturalness of human-computer interaction.
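    A minimal sketch of decision-level weighted fusion, in the spirit of the framework described above, is given below. The dissertation's exact weighting scheme is not reproduced; the emotion labels, weights, and probabilities are placeholder assumptions.

```python
# Hedged sketch of weighted decision-level fusion of two emotion recognizers.
import numpy as np

EMOTIONS = ["angry", "happy", "neutral", "sad", "surprise"]

def fuse(speech_probs: np.ndarray, face_probs: np.ndarray,
         w_speech: float = 0.4, w_face: float = 0.6) -> str:
    """Combine the two modalities' class probabilities and return the fused label.

    The weights would normally be derived from each recognizer's validation
    accuracy (the stronger modality gets more weight); the defaults here are
    placeholders.
    """
    fused = w_speech * speech_probs + w_face * face_probs
    return EMOTIONS[int(np.argmax(fused))]

# Example: speech leans "sad", the face leans "neutral"; the weighted vote decides.
speech = np.array([0.05, 0.10, 0.25, 0.50, 0.10])
face = np.array([0.05, 0.15, 0.55, 0.20, 0.05])
print(fuse(speech, face))  # -> "neutral" with the default weights
```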
