Research on group activity recognition mostly relies on the standard
two-stream approach (RGB and optical flow) for its input features. Few works have
explored explicit pose information, and none uses it directly to reason about
the interactions between persons. In this paper, we leverage skeleton information
to learn the interactions between individuals directly from it. With our
proposed method, GIRN, multiple relationship types are inferred by independent
modules that describe the relations between body joints pair by pair.
In addition to the joint relations, we also experiment with the previously
unexplored relationship between individuals and relevant objects (e.g., the
volleyball). The individuals' distinct relations are then merged through an
attention mechanism that gives more importance to the individuals most
relevant for distinguishing the group activity. We evaluate our method on the
Volleyball dataset, obtaining results competitive with the state of the art. Our
experiments demonstrate the potential of skeleton-based approaches for modeling
multi-person interactions.

Comment: 26 pages, 5 figures, accepted manuscript in Elsevier Pattern
Recognition, minor writing revisions and new reference
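
The two ideas summarized in the abstract, pairwise relations between body joints and attention-based merging of per-person relations, can be illustrated with a minimal sketch. This is not the GIRN implementation: the abstract does not specify the relation modules, so `pair_feature` is a hand-written stand-in for a learned module, and the names `person_relation` and `attention_pool`, the scoring vector, and the toy 2D skeletons are all hypothetical.

```python
import math

def pair_feature(p, q):
    """Toy stand-in for a learned relation module: offset and distance between two 2D joints."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    return [dx, dy, math.hypot(dx, dy)]

def person_relation(skeleton, others):
    """Average pair_feature over every joint pair between one person and all other persons."""
    total = [0.0, 0.0, 0.0]
    count = 0
    for other in others:
        for p in skeleton:
            for q in other:
                feat = pair_feature(p, q)
                total = [t + v for t, v in zip(total, feat)]
                count += 1
    return [t / count for t in total] if count else total

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attention_pool(features, score_vec):
    """Score each person's relation vector with a fixed (assumed) vector,
    then return the softmax-weighted sum and the attention weights."""
    scores = [sum(w * f for w, f in zip(score_vec, feat)) for feat in features]
    alphas = softmax(scores)
    dim = len(features[0])
    pooled = [sum(a * feat[d] for a, feat in zip(alphas, features)) for d in range(dim)]
    return pooled, alphas

# Three toy skeletons with two joints each (hypothetical coordinates).
skeletons = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(1.0, 0.0), (1.0, 1.0)],
    [(4.0, 0.0), (4.0, 1.0)],
]
relations = [
    person_relation(s, skeletons[:i] + skeletons[i + 1:])
    for i, s in enumerate(skeletons)
]
# The scoring vector here attends only to mean joint distance (an arbitrary choice).
group_feature, alphas = attention_pool(relations, [0.0, 0.0, 1.0])
```

In a trained model both the pairwise function and the attention scores would be learned. Under the same assumptions, an object such as the ball could be passed in as an extra single-joint "skeleton".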