2 research outputs found

    Spiking Neural Networks for Early Prediction in Human Robot Collaboration

    Full text available
    This paper introduces the Turn-Taking Spiking Neural Network (TTSNet), a cognitive model that performs early prediction of a human's or agent's turn-taking intentions. The TTSNet framework relies on implicit and explicit multimodal communication cues (physical, neurological, and physiological) to predict robustly and unambiguously when a turn-taking event will occur. To test the proposed theories, TTSNet was implemented on a robotic nurse assistant that predicts the surgeon's turn-taking intentions and delivers surgical instruments accordingly. In experiments on early turn-taking prediction, TTSNet reached an F1 score of 0.683 given 10% of the completed action, 0.852 at 50%, and 0.894 at 100%. It outperformed multiple state-of-the-art algorithms and surpassed human performance when only limited partial observation (< 40%) was available. Such early turn-taking prediction would allow robots to act proactively, facilitating collaboration and increasing team efficiency.
    Comment: Under review for journal.
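
    The evaluation protocol behind those F1 numbers is worth making concrete: the model sees only a prefix of each action and must commit to a prediction early. Below is a minimal sketch of that partial-observation evaluation, assuming a synthetic dataset and a simple threshold classifier as a stand-in for TTSNet (both are illustrative assumptions, not the paper's method).

    import numpy as np

    def f1(y_true, y_pred):
        # Standard F1 = harmonic mean of precision and recall.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

    def predict(partial_seq):
        # Stand-in classifier (assumption): flag a turn-taking intention
        # when mean activity in the observed prefix is high.
        return int(partial_seq.mean() > 0.6)

    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=50)          # 1 = turn-taking cue present
    # Hypothetical 1-D "multimodal activity" traces, 100 samples per action.
    sequences = [rng.random(100) + 0.2 * y for y in labels]

    for ratio in (0.1, 0.5, 1.0):                 # 10%, 50%, 100% observed
        preds = [predict(s[: max(1, int(len(s) * ratio))]) for s in sequences]
        print(f"F1 @ {ratio:.0%} of the action: {f1(labels, preds):.3f}")

    As in the paper's setup, scores improve as more of the action is observed; the interesting regime is how early the prediction becomes reliable.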

    Prediction of Turn-Taking by Combining Prosodic and Eye-Gaze Information in Poster Conversations

    No full text
    We investigate turn-taking behavior in poster-session conversations. Although the presenter holds most of the turns during a session, the audience's utterances are the more important ones and should not be missed. This paper therefore addresses prediction of turn-taking by the audience, split into two sub-tasks: predicting speaker change and predicting the next speaker. We analyze eye-gaze information and its relationship to turn-taking, introducing joint eye-gaze events by the presenter and audience, and we parameterize the audience's backchannel patterns. Machine learning with these features shows that combining the presenter's prosodic features with the joint eye-gaze features is effective for predicting speaker change, while eye-gaze duration and backchannels preceding the speaker change are useful for predicting the next speaker among the audience.
    Index Terms: multi-party interaction, turn-taking, prosody, eye-gaze
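
    As a rough illustration of the two sub-tasks, the sketch below trains a speaker-change classifier on presenter prosodic features plus a joint eye-gaze indicator, then ranks audience members for the next-speaker decision by gaze duration and preceding backchannels. The feature names, weights, and synthetic data are assumptions for illustration; the paper's actual feature set and learner are richer.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 200
    # Sub-task 1 features per presenter utterance end (hypothetical):
    # F0 slope, energy contour, and whether a joint eye-gaze event
    # (presenter and an audience member looking at each other) occurred.
    X_change = np.column_stack([
        rng.normal(size=n),          # presenter F0 slope
        rng.normal(size=n),          # presenter energy contour
        rng.integers(0, 2, size=n),  # joint eye-gaze event indicator
    ])
    # Synthetic labels: speaker change correlates with the gaze event.
    y_change = (X_change[:, 2] + rng.normal(scale=0.5, size=n) > 0.5).astype(int)
    clf_change = LogisticRegression().fit(X_change, y_change)

    def next_speaker(gaze_durations, backchannel_counts):
        # Sub-task 2 (assumed scoring): favour audience members who gazed
        # at the presenter longer and produced more backchannels just
        # before the change; the 0.5 weight is arbitrary for illustration.
        scores = np.asarray(gaze_durations) + 0.5 * np.asarray(backchannel_counts)
        return int(np.argmax(scores))

    print("speaker change?", clf_change.predict(X_change[:3]))
    print("next speaker index:", next_speaker([1.2, 3.4, 0.3], [0, 2, 1]))

    The two-stage structure mirrors the paper's decomposition: only when a speaker change is predicted does the next-speaker ranking among the audience come into play.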