Action Recognition from Single Timestamp Supervision in Untrimmed Videos
Recognising actions in videos relies on labelled supervision during training,
typically the start and end times of each action instance. This supervision is
not only subjective, but also expensive to acquire. Weak video-level
supervision has been successfully exploited for recognition in untrimmed
videos; however, it struggles as the number of different actions per
training video increases. We propose a method supervised by single
timestamps located around each action instance in untrimmed videos. We replace
expensive action bounds with sampling distributions initialised from these
timestamps. We then use the classifier's response to iteratively update the
sampling distributions. We demonstrate that these distributions converge to the
location and extent of discriminative action segments. We evaluate our method
on three datasets for fine-grained recognition, with an increasing number of
different actions per video, and show that single timestamps offer a reasonable
compromise between recognition performance and labelling effort, performing
comparably to full temporal supervision. Our update method improves top-1 test
accuracy by up to 5.4% across the evaluated datasets.
Comment: CVPR 201
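
A minimal sketch of the idea the abstract describes, assuming a Gaussian sampling distribution initialised around each annotated timestamp and a soft re-weighting of that distribution by per-frame classifier scores; the function names, the update rule, and the toy scores below are illustrative assumptions, not the paper's implementation.

import numpy as np

def init_distribution(timestamp, num_frames, sigma=15.0):
    # Gaussian sampling distribution centred on a single annotated timestamp.
    frames = np.arange(num_frames)
    w = np.exp(-0.5 * ((frames - timestamp) / sigma) ** 2)
    return w / w.sum()

def sample_training_frames(dist, k=8, rng=None):
    # Draw k frame indices for this action instance from its distribution.
    rng = rng or np.random.default_rng()
    return rng.choice(len(dist), size=k, p=dist)

def update_distribution(dist, classifier_scores, lr=0.5):
    # Re-weight the distribution towards frames the classifier scores highly
    # for the annotated class, so it drifts towards the discriminative segment.
    response = np.exp(classifier_scores - classifier_scores.max())
    response /= response.sum()
    new_dist = (1 - lr) * dist + lr * response
    return new_dist / new_dist.sum()

# Toy usage: a 200-frame video annotated with a single timestamp at frame 60,
# while the (hypothetical) discriminative segment lies around frames 70-110.
rng = np.random.default_rng(0)
dist = init_distribution(timestamp=60, num_frames=200)
scores = rng.normal(size=200)          # stand-in for per-frame class scores
scores[70:110] += 3.0                  # pretend the true action lies here
for _ in range(10):
    _ = sample_training_frames(dist, rng=rng)   # would feed the classifier
    dist = update_distribution(dist, scores)
print(int(dist.argmax()))              # the mode has moved into the action

In the actual training loop the per-frame scores would come from the recognition network being trained, so distribution updates and classifier updates alternate.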
Annotating Object Instances with a Polygon-RNN
We propose an approach for semi-automatic annotation of object instances.
While most current methods treat object segmentation as a pixel-labeling
problem, we here cast it as a polygon prediction task, mimicking how most
current datasets have been annotated. In particular, our approach takes as
input an image crop and sequentially produces vertices of the polygon outlining
the object. This allows a human annotator to intervene at any time and correct
a vertex if needed, producing a segmentation as accurate as the annotator
desires. We show that our approach speeds up the annotation process by a
factor of 4.7 across all classes in Cityscapes, while achieving 78.4% agreement
in IoU with original ground-truth, matching the typical agreement between human
annotators. For cars, our speed-up factor is 7.3 for an agreement of 82.2%. We
further show generalization capabilities of our approach to unseen datasets
- …
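
A minimal sketch of the interactive annotation loop the abstract describes: the polygon is decoded one vertex at a time and the annotator may correct any proposed vertex before decoding continues. Here predict_next_vertex and get_correction are hypothetical stand-ins for the Polygon-RNN model and the annotation interface, not the authors' API.

from typing import Callable, List, Optional, Tuple

Vertex = Tuple[int, int]  # (x, y) on a coarse prediction grid

def annotate_instance(
    predict_next_vertex: Callable[[List[Vertex]], Optional[Vertex]],
    get_correction: Callable[[Vertex], Optional[Vertex]],
    max_vertices: int = 60,
) -> List[Vertex]:
    # Decode a polygon vertex by vertex; a corrected vertex is appended in
    # place of the proposal, so later predictions condition on the human input.
    polygon: List[Vertex] = []
    for _ in range(max_vertices):
        proposal = predict_next_vertex(polygon)
        if proposal is None:          # model signals end of polygon
            break
        correction = get_correction(proposal)
        polygon.append(correction if correction is not None else proposal)
    return polygon

# Toy usage with stand-ins: the "model" traces a fixed square and the
# "annotator" nudges one vertex.
square = [(5, 5), (20, 5), (20, 20), (5, 20)]
fake_model = lambda poly: square[len(poly)] if len(poly) < len(square) else None
fake_annotator = lambda v: (22, 22) if v == (20, 20) else None
print(annotate_instance(fake_model, fake_annotator))
# [(5, 5), (20, 5), (22, 22), (5, 20)]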