Spiking Neural Networks for Early Prediction in Human Robot Collaboration
This paper introduces the Turn-Taking Spiking Neural Network (TTSNet), a
cognitive model for early prediction of a human's or agent's turn-taking
intentions. The TTSNet framework relies on implicit and explicit
multimodal communication cues (physical, neurological and physiological) to be
able to predict when the turn-taking event will occur in a robust and
unambiguous fashion. To test the theories proposed, the TTSNet framework was
implemented on an assistant robotic nurse, which predicts surgeon's turn-taking
intentions and delivers surgical instruments accordingly. Experiments were
conducted to evaluate TTSNet's performance in early turn-taking prediction. It
was found to reach an F1 score of 0.683 given 10% of the completed action, an
F1 score of 0.852 at 50%, and 0.894 at 100% of the completed action. TTSNet
outperformed multiple state-of-the-art algorithms and surpassed human
performance when only limited partial observation was available (< 40%). Such early
turn-taking prediction capability would allow robots to perform collaborative
actions proactively, facilitating collaboration and increasing team
efficiency.
Comment: Under review for journal
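For reference on the metric reported above: F1 is the harmonic mean of precision and recall, so the scores quoted (0.683 at 10% observation, 0.852 at 50%, 0.894 at 100%) jointly reflect both false positives and false negatives in the turn-taking predictions. A minimal sketch of the computation (the example counts are illustrative, not taken from the paper):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 as the harmonic mean of precision and recall,
    computed from true-positive, false-positive, and
    false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative counts: 60 TP, 20 FP, 20 FN
# precision = 0.75, recall = 0.75, so F1 = 0.75
print(f1_score(60, 20, 20))
```

Because the harmonic mean penalizes imbalance, a predictor that fires on every frame (high recall, low precision) cannot reach these F1 values.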
Prediction of Turn-Taking by Combining Prosodic and Eye-Gaze Information in Poster Conversations
We investigate turn-taking behaviors in conversations in poster sessions. While the poster presenter holds most of the turns during a session, the audience's utterances are more important and should not be missed. In this paper, therefore, we address prediction of turn-taking by the audience, dividing it into two sub-tasks: prediction of speaker change and prediction of the next speaker. We analyzed eye-gaze information and its relationship with turn-taking, introducing joint eye-gaze events by the presenter and the audience, and also parameterized backchannel patterns of the audience. Machine learning with these features shows that the combination of the presenter's prosodic features and the joint eye-gaze features is effective for predicting speaker change, while eye-gaze duration and backchannels preceding the speaker change are useful for predicting the next speaker among the audience. Index Terms: multi-party interaction, turn-taking, prosody, eye-gaze