1,510 research outputs found
Detecting events and key actors in multi-person videos
Multi-person event recognition is a challenging task, often with many people
active in the scene but only a small subset contributing to an actual event. In
this paper, we propose a model which learns to detect events in such videos
while automatically "attending" to the people responsible for the event. Our
model does not use explicit annotations regarding who or where those people are
during training and testing. In particular, we track people in videos and use a
recurrent neural network (RNN) to represent the track features. We learn
time-varying attention weights to combine these features at each time-instant.
The attended features are then processed using another RNN for event
detection/classification. Since most video datasets with multiple people are
restricted to a small number of videos, we also collected a new basketball
dataset comprising 257 basketball games with 14K event annotations
corresponding to 11 event classes. Our model outperforms state-of-the-art
methods for both event classification and detection on this new dataset.
Additionally, we show that the attention mechanism is able to consistently
localize the relevant players.Comment: Accepted for publication in CVPR'1
- …