Few-shot action recognition aims to recognize novel action classes using only
a small number of labeled training samples. In this work, we propose a novel
approach that first summarizes each video into compound prototypes consisting
of a group of global prototypes and a group of focused prototypes, and then
compares video similarity based on the prototypes. Each global prototype is
encouraged to summarize a specific aspect from the entire video, for example,
the start/evolution of the action. Since no explicit annotation is available
to supervise the global prototypes, we additionally use a group of focused
prototypes, each of which attends to certain timestamps in the video. We
compare video similarity by matching the compound
prototypes between the support and query videos. The global prototypes are
directly matched to compare videos from the same perspective, for example, to
compare whether two actions start similarly. For the focused prototypes, since
the timing of an action varies from video to video, we apply bipartite
matching so that actions at different temporal positions and with different
shifts can still be compared. Experiments demonstrate that our proposed method achieves
state-of-the-art results on multiple benchmarks.

Comment: ECCV 202
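
The two matching strategies described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of cosine similarity, the equal number of prototypes per video, and the plain averaging of scores are all assumptions made for illustration. The exhaustive search over permutations stands in for a proper assignment solver (for example the Hungarian algorithm, available as `scipy.optimize.linear_sum_assignment`) and is only practical for a handful of prototypes.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two prototype vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def global_similarity(support, query):
    """Direct matching: the i-th support prototype is compared only with
    the i-th query prototype, so both describe the same aspect."""
    return sum(cosine(s, q) for s, q in zip(support, query)) / len(support)


def focused_similarity(support, query):
    """Bipartite matching: find the one-to-one assignment of support to
    query prototypes that maximizes total similarity.

    Brute force over permutations; assumes len(support) == len(query).
    Returns the mean matched similarity and the chosen assignment, where
    assignment[i] is the query index matched to support prototype i.
    """
    n = len(support)
    sim = [[cosine(s, q) for q in query] for s in support]
    best = max(
        permutations(range(n)),
        key=lambda p: sum(sim[i][p[i]] for i in range(n)),
    )
    return sum(sim[i][best[i]] for i in range(n)) / n, best


# Toy prototypes: the query contains the same two patterns as the
# support video, but in swapped temporal order.
support_protos = [[1.0, 0.0], [0.0, 1.0]]
query_protos = [[0.0, 1.0], [1.0, 0.0]]

direct_score = global_similarity(support_protos, query_protos)
matched_score, assignment = focused_similarity(support_protos, query_protos)
```

In this toy case index-aligned comparison finds no similarity, while bipartite matching recovers the swapped correspondence, which is the behavior wanted for focused prototypes whose timestamps shift between videos.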