1,135 research outputs found
UntrimmedNets for Weakly Supervised Action Recognition and Detection
Current action recognition methods heavily rely on trimmed videos for model
training. However, it is expensive and time-consuming to acquire a large-scale
trimmed video dataset. This paper presents a new weakly supervised
architecture, called UntrimmedNet, which is able to directly learn action
recognition models from untrimmed videos without the requirement of temporal
annotations of action instances. Our UntrimmedNet couples two important
components, the classification module and the selection module, to learn the
action models and reason about the temporal duration of action instances,
respectively. These two components are implemented with feed-forward networks,
and UntrimmedNet is therefore an end-to-end trainable architecture. We exploit
the learned models for action recognition (WSR) and detection (WSD) on the
untrimmed video datasets of THUMOS14 and ActivityNet. Although our UntrimmedNet
only employs weak supervision, our method achieves performance superior or
comparable to that of those strongly supervised approaches on these two
datasets.Comment: camera-ready version to appear in CVPR201
Multiple Instance Curriculum Learning for Weakly Supervised Object Detection
When supervising an object detector with weakly labeled data, most existing
approaches are prone to trapping in the discriminative object parts, e.g.,
finding the face of a cat instead of the full body, due to lacking the
supervision on the extent of full objects. To address this challenge, we
incorporate object segmentation into the detector training, which guides the
model to correctly localize the full objects. We propose the multiple instance
curriculum learning (MICL) method, which injects curriculum learning (CL) into
the multiple instance learning (MIL) framework. The MICL method starts by
automatically picking the easy training examples, where the extent of the
segmentation masks agree with detection bounding boxes. The training set is
gradually expanded to include harder examples to train strong detectors that
handle complex images. The proposed MICL method with segmentation in the loop
outperforms the state-of-the-art weakly supervised object detectors by a
substantial margin on the PASCAL VOC datasets.Comment: Published in BMVC 201
Unsupervised Object Discovery and Tracking in Video Collections
This paper addresses the problem of automatically localizing dominant objects
as spatio-temporal tubes in a noisy collection of videos with minimal or even
no supervision. We formulate the problem as a combination of two complementary
processes: discovery and tracking. The first one establishes correspondences
between prominent regions across videos, and the second one associates
successive similar object regions within the same video. Interestingly, our
algorithm also discovers the implicit topology of frames associated with
instances of the same object class across different videos, a role normally
left to supervisory information in the form of class labels in conventional
image and video understanding methods. Indeed, as demonstrated by our
experiments, our method can handle video collections featuring multiple object
classes, and substantially outperforms the state of the art in colocalization,
even though it tackles a broader problem with much less supervision
- …