
    A multimodal multiparty human-robot dialogue corpus for real world interaction

    Kyoto University / Honda Research Institute Japan Co., Ltd. LREC 2018 Special Speech Sessions "Speech Resources Collection in Real-World Situations"; Phoenix Seagaia Conference Center, Miyazaki; 2018-05-09
    We have developed the MPR multimodal dialogue corpus and describe research activities using the corpus aimed at enabling multiparty human-robot verbal communication in real-world settings. While that is the final goal, the immediate focus of our project and the corpus is non-verbal communication, especially social signal processing by machines as the foundation of human-machine verbal communication. The MPR corpus stores annotated audio-visual recordings of dialogues between one robot and one or more (up to three) participants. The annotations include speech segments, addressees of speech, transcripts, interaction states, and dialogue act types. Our research on multiparty dialogue management, boredom recognition, response obligation recognition, surprise detection, and repair detection using the corpus is briefly introduced, and an analysis of repair in multiuser situations is presented. Multiuser situations exhibit richer repair behaviors and demand more sophisticated repair handling by machines.
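    The annotation layers listed above can be pictured as one record per speech segment. The following Python sketch is purely illustrative: the field names and types are assumptions, since the abstract names the annotation layers but not the corpus's actual file format or schema.

```python
# Hypothetical record for one annotated segment in a multiparty
# human-robot dialogue corpus such as MPR. Field names are assumptions,
# not the corpus's actual schema.
from dataclasses import dataclass

@dataclass
class AnnotatedSegment:
    speaker: str            # who produced the speech segment (robot or a participant)
    addressee: str          # who the utterance is directed at (robot, a participant, the group)
    start_sec: float        # segment start time in the audio-visual recording
    end_sec: float          # segment end time
    transcript: str         # transcript of the speech segment
    interaction_state: str  # annotated interaction state of the speaker
    dialogue_act: str       # dialogue act type label

# Illustrative instance with made-up values
seg = AnnotatedSegment(
    speaker="participant_1",
    addressee="robot",
    start_sec=12.4,
    end_sec=14.1,
    transcript="Can you repeat that?",
    interaction_state="engaged",
    dialogue_act="question",
)
```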

    Proceedings of the LREC 2018 Special Speech Sessions

    LREC 2018 Special Speech Sessions "Speech Resources Collection in Real-World Situations"; Phoenix Seagaia Conference Center, Miyazaki; 2018-05-09

    Real-Time Topic and Sentiment Analysis in Human-Robot Conversation

    Socially interactive robots, especially those designed for entertainment and companionship, must be able to hold conversations with users that feel natural and engaging for humans. Two important components of such conversations are adherence to the topic of conversation and inclusion of affective expressions. Most previous approaches have concentrated on topic detection or sentiment analysis alone, and approaches that attempt to address both are limited by domain and by type of reply. This thesis presents a new approach, implemented on a humanoid robot interface, that detects the topic and sentiment of a user's utterances from text-transcribed speech. It also generates domain-independent, topically relevant verbal replies and appropriate positive and negative emotional expressions in real time. The front end of the system is a smartphone app that functions as the robot's face. It displays emotionally expressive eyes, transcribes verbal input as text, and synthesizes spoken replies. The back end of the system is implemented on the robot's onboard computer. It connects with the app via Bluetooth, receives and processes the transcribed input, and returns verbal replies and sentiment scores. The back end consists of a topic-detection subsystem and a sentiment-analysis subsystem. The topic-detection subsystem uses a Latent Semantic Indexing model of a conversation corpus, followed by a search in the online database ConceptNet 5, to generate a topically relevant reply. The sentiment-analysis subsystem disambiguates the input words, obtains their sentiment scores from SentiWordNet, and returns the average of the scores as the overall sentiment score. The system was hypothesized to engage users more with both subsystems working together than with either subsystem alone, and each subsystem alone was hypothesized to engage users more than a random control. In computational evaluations, each subsystem performed weakly but positively. In user evaluations, users reported a higher level of topical relevance and emotional appropriateness in conversations in which the subsystems were working together, and they reported higher engagement especially in conversations in which the topic-detection subsystem was working. It is concluded that the system partially fulfills its goals, and suggestions for future work are presented.
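    The sentiment-analysis step described above (look up each word's SentiWordNet scores and average them) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it skips the word-sense disambiguation step and simply takes the first SentiWordNet synset for each word, and it assumes NLTK with the 'wordnet' and 'sentiwordnet' corpora installed.

```python
# Minimal sketch of SentiWordNet-based utterance scoring: average the
# (positive - negative) scores of the input words. Real word-sense
# disambiguation is omitted; the first synset is used as a stand-in.
# Assumes: pip install nltk, plus nltk.download('wordnet') and
# nltk.download('sentiwordnet').
from nltk.corpus import sentiwordnet as swn

def utterance_sentiment(words):
    scores = []
    for word in words:
        synsets = list(swn.senti_synsets(word))
        if not synsets:
            continue  # word not covered by SentiWordNet
        sense = synsets[0]  # crude stand-in for proper disambiguation
        scores.append(sense.pos_score() - sense.neg_score())
    return sum(scores) / len(scores) if scores else 0.0

print(utterance_sentiment(["happy", "robot", "terrible"]))
```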

    A Real-Time Architecture for Conversational Agents

    Consider two people having a face-to-face conversation. They sometimes listen, sometimes talk, and sometimes interrupt each other. They use facial expressions to signal that they are confused. They point at objects. They jump from topic to topic opportunistically. When an acquaintance walks by, they nod and say hello. All the while they have other concerns on their minds, such as not missing the meeting that starts in 10 minutes. Like many other human behaviors, these are not easy to replicate in artificial agents. In this work we look into the design requirements of an embodied agent that can participate in such natural conversations in a mixed-initiative, multimodal setting. Such an agent needs to understand that participating in a conversation is not merely a matter of sending a message and then waiting to receive a response: both partners are simultaneously active at all times. The agent should be able to deal with different, sometimes conflicting goals, and always be ready to address events that may interrupt the current topic of conversation. To address these requirements, we have created a modular architecture of distributed functional units that compete with each other for control over available resources. Each of these units, called a schema, has its own sense-think-act cycle. In the field of robotics, this design is often referred to as behavior-based or schema-based. The major contribution of this work is merging behavior-based robotics with plan-based human-computer interaction.
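    A minimal sketch of the competing-schemas idea described above, under the assumption that each schema bids an activation value and an arbiter grants a shared resource (here, the output channel) to the highest bidder each cycle. All class and method names are invented for illustration and are not the thesis's actual code.

```python
# Illustrative schema-based control loop: each schema runs its own
# sense-think-act cycle; an arbiter gives the shared output channel
# to the schema with the highest activation. Names are hypothetical.
class Schema:
    def __init__(self, name):
        self.name = name
        self.activation = 0.0

    def sense(self, world):
        pass  # read the parts of the world state this schema cares about

    def think(self):
        pass  # update self.activation (the bid for the shared resource)

    def act(self):
        print(f"{self.name} takes the output channel")

class GreetingSchema(Schema):
    def sense(self, world):
        self.acquaintance_nearby = world.get("acquaintance_nearby", False)

    def think(self):
        # Greeting a passer-by briefly outranks ordinary conversation
        self.activation = 0.9 if self.acquaintance_nearby else 0.0

class TopicTalkSchema(Schema):
    def think(self):
        self.activation = 0.5  # default drive to continue the conversation

def arbitration_cycle(schemas, world):
    """One cycle: every schema senses and thinks; only the winner acts."""
    for schema in schemas:
        schema.sense(world)
        schema.think()
    max(schemas, key=lambda s: s.activation).act()

arbitration_cycle([GreetingSchema("greet"), TopicTalkSchema("chat")],
                  {"acquaintance_nearby": True})
```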