Recognizing Video Events with Varying Rhythms
Recognizing events in long, complex videos with multiple sub-activities
has received sustained attention recently. This task is more challenging than
traditional action recognition with short, relatively homogeneous video clips.
In this paper, we investigate the problem of recognizing long and complex
events with varying action rhythms, which has not been considered in the
literature but is a practical challenge. Our work is inspired in part by how
humans identify events with varying rhythms: by quickly catching the frames that
contribute most to a specific event. We propose a two-stage \emph{end-to-end}
framework, in which the first stage selects the most significant frames while
the second stage recognizes the event using the selected frames. Our model
needs only \emph{event-level labels} in the training stage, and thus is more
practical when the sub-activity labels are missing or difficult to obtain. The
results of extensive experiments show that our model can achieve significant
improvement in event recognition from long videos while maintaining high
accuracy even if the test videos suffer from severe rhythm changes. This
demonstrates the potential of our method for real-world video-based
applications, where test and training videos can differ drastically in rhythms
of sub-activities.
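
The two-stage idea described above (select the most significant frames, then recognize the event from them) can be illustrated with a minimal sketch. This is not the authors' implementation: the linear frame scorer, the hard top-k selection, the mean pooling, the linear classifier, and all names (`select_frames`, `recognize_event`, `scorer`, `class_weights`) are assumptions made for illustration, and the sketch covers inference only, not the end-to-end training with event-level labels.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_frames(frames: List[Vector], scorer: Vector, k: int) -> List[int]:
    """Stage 1 (hypothetical): score every frame with a linear scorer and
    return the indices of the k highest-scoring frames in temporal order."""
    scores = [dot(f, scorer) for f in frames]
    top = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def recognize_event(frames: List[Vector], selected: List[int],
                    class_weights: List[Vector]) -> int:
    """Stage 2 (hypothetical): mean-pool the selected frames and return the
    index of the class with the highest linear score."""
    dim = len(frames[0])
    pooled = [sum(frames[i][d] for i in selected) / len(selected)
              for d in range(dim)]
    logits = [dot(pooled, w) for w in class_weights]
    return max(range(len(logits)), key=lambda c: logits[c])


# Toy features, assumed for illustration: [salience, cue for class 0, cue for class 1].
key = [1.0, 1.0, 0.0]      # informative frame
filler = [0.0, 0.0, 1.0]   # uninformative, distracting frame

scorer = [1.0, 0.0, 0.0]
class_weights = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# The same event at two rhythms: the slow version pads the key frames with filler.
fast = [filler, key, key, filler]
slow = [filler] * 10 + [key] + [filler] * 10 + [key] + [filler] * 10

fast_pred = recognize_event(fast, select_frames(fast, scorer, 2), class_weights)
slow_pred = recognize_event(slow, select_frames(slow, scorer, 2), class_weights)

# Baseline for contrast: pooling over every frame of the slow video.
slow_all_pred = recognize_event(slow, list(range(len(slow))), class_weights)
```

In this toy setting both versions of the video yield the same prediction because only the two key frames reach the second stage, whereas pooling over all frames of the slow video lets the filler dominate. Hard top-k selection is used here for simplicity; it is not differentiable, so a trainable version would need a different selection mechanism, which the abstract does not specify.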