For most of human history, face-to-face interactions have been the primary and most fundamental way to build social relationships, and even in the digital era they remain the basis of our closest bonds. These interactions are built on the dynamic integration and coordination of verbal and non-verbal information between multiple people. However, the psychological processes underlying face-to-face interaction remain difficult to study. In this Review, we first discuss three ways in which the multimodal phenomena underlying face-to-face social interaction can be organized to provide a solid basis for theory development. Next, we review three types of theory of social interaction: theories that focus on the social meaning of actions, theories that explain actions in terms of simple behaviour rules and theories that rely on rich cognitive models of the internal states of others. Finally, we address how different methods can be used to distinguish between theories, showcasing new approaches and outlining important directions for future research. Advances in how face-to-face social interaction can be studied, combined with a renewed focus on cognitive theories, could lead to a renaissance in social interaction research and advance scientific understanding of face-to-face interaction and its underlying cognitive foundations.