Rapport, the close and harmonious relationship in which interaction partners
are "in sync" with each other, has been shown to result in smoother social
interactions, better collaboration, and improved interpersonal outcomes. In
this work, we are the first to investigate automatic prediction of low rapport
during natural interactions within small groups. This task is challenging because
rapport manifests only in subtle non-verbal signals that are, in addition,
subject to the influences of group dynamics as well as interpersonal
idiosyncrasies. We record videos of unscripted discussions among three to four
people using a multi-view camera system and microphones. We analyse a rich set
of non-verbal signals for rapport detection, namely facial expressions, hand
motion, gaze, speaker turns, and speech prosody. Using facial features, we can
detect low rapport with an average precision of 0.7 (chance level at 0.25),
while incorporating prior knowledge of participants' personalities even
enables early prediction without a drop in performance. We further provide a
detailed analysis of different feature sets and the amount of information
contained in different temporal segments of the interactions.

Comment: 12 pages, 6 figures
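To illustrate what an average precision of 0.7 against a chance level of 0.25 means, here is a minimal sketch of non-interpolated average precision (AP): the mean of precision@k over the ranks k at which a positive (here, a low-rapport sample) is retrieved. The abstract does not state how its AP was computed, so this definition is an assumption. It also assumes that the quoted chance level corresponds to the fraction of positive samples, which is the expected AP of a random scorer. The function name `average_precision` and the synthetic data are hypothetical and do not come from the study.

```python
import random


def average_precision(labels, scores):
    """Non-interpolated AP (assumed definition): mean of precision@k over
    the ranks k at which a positive (label 1) example is retrieved."""
    # Rank samples by descending score; ties keep their input order.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = 0
    precisions = []
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / k)
    if not precisions:
        raise ValueError("average precision is undefined without positives")
    return sum(precisions) / len(precisions)


# Perfect ranking: the single positive is ranked first, so AP = 1.0.
print(average_precision([1, 0, 0, 0], [0.9, 0.1, 0.2, 0.3]))  # 1.0

# Worst ranking: the single positive is ranked last of four, so AP = 1/4.
print(average_precision([1, 0, 0, 0], [0.0, 0.1, 0.2, 0.3]))  # 0.25

# Random scores on hypothetical data in which 25% of samples are positive:
# AP approaches the positive-class prevalence, i.e. the chance level.
rng = random.Random(0)
n = 20000
labels = [1 if i % 4 == 0 else 0 for i in range(n)]
scores = [rng.random() for _ in range(n)]
print(round(average_precision(labels, scores), 2))
```

Under these assumptions, a random detector scores about 0.25 AP on data with 25% positives, while a perfect detector scores 1.0. A reported AP of 0.7 therefore sits well above chance.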