This paper proposes an approach to automatically categorize the social
interactions of a user wearing a photo-camera (2 fpm), relying solely on what
the camera is seeing. The problem is challenging due to the overwhelming
complexity of social life and the extreme intra-class variability of social
interactions captured under unconstrained conditions. We adopt the
formalization proposed in Bugental's social theory, which groups human relations
into five social domains with related categories. Our method is a new deep
learning architecture that exploits the hierarchical structure of the label
space and relies on a set of social attributes estimated at frame level to
provide a semantic representation of social interactions. Experimental results
on the new EgoSocialRelation dataset demonstrate the effectiveness of our
proposal.

Comment: Accepted at ICIP 201