Building embodied autonomous agents capable of participating in social
interactions with humans is one of the main challenges in AI. Within the Deep
Reinforcement Learning (DRL) field, this objective motivated multiple works on
embodied language use. However, current approaches focus on language as a
communication tool in very simplified and non-diverse social situations: the
"naturalness" of language is reduced to the concept of high vocabulary size and
variability. In this paper, we argue that aiming towards human-level AI
requires a broader set of key social skills: 1) language use in complex and
variable social contexts; 2) beyond language, complex embodied communication in
multimodal settings within constantly evolving social worlds. We explain how
concepts from cognitive sciences could help AI to draw a roadmap towards
human-like intelligence, with a focus on its social dimensions. As a first
step, we propose to expand current research to a broader set of core social
skills. To do this, we present SocialAI, a benchmark to assess the acquisition
of social skills of DRL agents using multiple grid-world environments featuring
other (scripted) social agents. We then study the limits of a recent SOTA DRL
approach when tested on SocialAI and discuss important next steps towards
proficient social agents. Videos and code are available at
https://sites.google.com/view/socialai.Comment: under review. This paper extends and generalizes work in
arXiv:2104.1320