1,665 research outputs found
Weakly-Supervised Temporal Localization via Occurrence Count Learning
We propose a novel model for temporal detection and localization which allows
the training of deep neural networks using only counts of event occurrences as
training labels. This powerful weakly-supervised framework alleviates the
burden of the imprecise and time-consuming process of annotating event
locations in temporal data. Unlike existing methods, in which localization is
explicitly achieved by design, our model learns localization implicitly as a
byproduct of learning to count instances. This unique feature is a direct
consequence of the model's theoretical properties. We validate the
effectiveness of our approach in a number of experiments (drum hit and piano
onset detection in audio, digit detection in images) and demonstrate
performance comparable to that of fully-supervised state-of-the-art methods,
despite much weaker training requirements.Comment: Accepted at ICML 201
Circulant temporal encoding for video retrieval and temporal alignment
We address the problem of specific video event retrieval. Given a query video
of a specific event, e.g., a concert of Madonna, the goal is to retrieve other
videos of the same event that temporally overlap with the query. Our approach
encodes the frame descriptors of a video to jointly represent their appearance
and temporal order. It exploits the properties of circulant matrices to
efficiently compare the videos in the frequency domain. This offers a
significant gain in complexity and accurately localizes the matching parts of
videos. The descriptors can be compressed in the frequency domain with a
product quantizer adapted to complex numbers. In this case, video retrieval is
performed without decompressing the descriptors. We also consider the temporal
alignment of a set of videos. We exploit the matching confidence and an
estimate of the temporal offset computed for all pairs of videos by our
retrieval approach. Our robust algorithm aligns the videos on a global timeline
by maximizing the set of temporally consistent matches. The global temporal
alignment enables synchronous playback of the videos of a given scene
Localizing Objects While Learning Their Appearance
Abstract. Learning a new object class from cluttered training images is very challenging when the location of object instances is unknown. Previous works generally require objects covering a large portion of the images. We present a novel approach that can cope with extensive clutter as well as large scale and appearance variations between object instances. To make this possible we propose a conditional random field that starts from generic knowledge and then progressively adapts to the new class. Our approach simultaneously localizes object instances while learning an appearance model specific for the class. We demonstrate this on the challenging Pascal VOC 2007 dataset. Furthermore, our method enables to train any state-of-the-art object detector in a weakly supervised fashion, although it would normally require object location annotations.
- …