30,949 research outputs found
Localizing Actions from Video Labels and Pseudo-Annotations
The goal of this paper is to determine the spatio-temporal location of
actions in video. Where training from hard to obtain box annotations is the
norm, we propose an intuitive and effective algorithm that localizes actions
from their class label only. We are inspired by recent work showing that
unsupervised action proposals selected with human point-supervision perform as
well as using expensive box annotations. Rather than asking users to provide
point supervision, we propose fully automatic visual cues that replace manual
point annotations. We call the cues pseudo-annotations, introduce five of them,
and propose a correlation metric for automatically selecting and combining
them. Thorough evaluation on challenging action localization datasets shows
that we reach results comparable to results with full box supervision. We also
show that pseudo-annotations can be leveraged during testing to improve weakly-
and strongly-supervised localizers.Comment: BMV
- …