This paper presents a study of personality recognition using video data
from different scenarios. Our goal is to jointly model nonverbal behavioral
cues with contextual information for a robust, multi-scenario, personality
recognition system. To this end, we build a novel multi-stream Convolutional
Neural Network (CNN) framework that considers multiple sources of
information. From a given scenario, we extract spatio-temporal motion
descriptors from every individual in the scene, spatio-temporal motion
descriptors encoding social group dynamics, and proxemics descriptors to encode
the interaction with the surrounding context. All the proposed descriptors are
mapped to the same feature space, facilitating the overall learning effort.
Experiments on two public datasets demonstrate the effectiveness of jointly
modeling the mutual Person-Context information, outperforming state-of-the-art
results for personality recognition in two different scenarios. Lastly, we
present CNN class activation maps for each personality trait, shedding light on
behavioral patterns linked with personality attributes.
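
As a rough illustration of mapping several descriptor streams to one feature
space, the sketch below projects three streams into a shared space and fuses
them before predicting trait scores. It is not the architecture used in this
work: the stream names, descriptor sizes, shared dimension, number of traits,
the plain linear projections, and the function fuse_and_predict are all
assumptions made for the example, and the weights are random rather than
learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical descriptor sizes for the three streams (not from the paper).
STREAM_DIMS = {"individual_motion": 128, "group_motion": 96, "proxemics": 16}
SHARED_DIM = 64   # assumed size of the common feature space
NUM_TRAITS = 5    # assumed number of personality traits

# One linear projection per stream, mapping it into the shared space.
# Random weights stand in for parameters that would normally be learned.
projections = {
    name: rng.normal(scale=0.1, size=(dim, SHARED_DIM))
    for name, dim in STREAM_DIMS.items()
}

# Prediction head applied to the concatenated shared-space features.
head = rng.normal(scale=0.1, size=(SHARED_DIM * len(STREAM_DIMS), NUM_TRAITS))


def fuse_and_predict(descriptors):
    """Project each stream to the shared space, concatenate, and score traits."""
    shared = [
        np.maximum(descriptors[name] @ projections[name], 0.0)  # ReLU
        for name in STREAM_DIMS
    ]
    fused = np.concatenate(shared, axis=-1)
    return fused @ head


# A batch of 4 samples with random descriptors for each stream.
batch = {name: rng.normal(size=(4, dim)) for name, dim in STREAM_DIMS.items()}
scores = fuse_and_predict(batch)
print(scores.shape)  # (4, 5): one score per assumed trait for each sample
```

Because every stream ends up with the same dimensionality, the fusion step can
treat the streams uniformly, which is one plausible reading of how a shared
feature space eases learning.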