Unsupervised Object Discovery and Tracking in Video Collections
This paper addresses the problem of automatically localizing dominant objects
as spatio-temporal tubes in a noisy collection of videos with minimal or even
no supervision. We formulate the problem as a combination of two complementary
processes: discovery and tracking. The first one establishes correspondences
between prominent regions across videos, and the second one associates
successive similar object regions within the same video. Interestingly, our
algorithm also discovers the implicit topology of frames associated with
instances of the same object class across different videos, a role normally
left to supervisory information in the form of class labels in conventional
image and video understanding methods. Indeed, as demonstrated by our
experiments, our method can handle video collections featuring multiple object
classes, and substantially outperforms the state of the art in colocalization,
even though it tackles a broader problem with much less supervision.
Image Co-localization by Mimicking a Good Detector's Confidence Score Distribution
Given a set of images containing objects from the same category, the task of
image co-localization is to identify and localize each instance. This paper
shows that this problem can be solved by a simple but intriguing idea, that is,
a common object detector can be learnt by making its detection confidence
scores distributed like those of a strongly supervised detector. More
specifically, we observe that given a set of object proposals extracted from an
image that contains the object of interest, an accurate strongly supervised
object detector should give high scores to only a small minority of proposals,
and low scores to most of them. Thus, we devise an entropy-based objective
function to enforce the above property when learning the common object
detector. Once the detector is learnt, we resort to a segmentation approach to
refine the localization. We show that despite its simplicity, our approach
outperforms state-of-the-art methods.
Comment: Accepted to Proc. European Conf. Computer Vision 201
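The property described in the abstract above (high scores for only a small minority of proposals, low scores for most) can be illustrated with an entropy measure. The following is a minimal sketch, not the paper's actual objective, which the abstract does not specify: it assumes that proposal scores are turned into a distribution with a softmax and that the Shannon entropy of that distribution is the quantity of interest. The function name `proposal_entropy` and the example scores are hypothetical.

```python
import math


def proposal_entropy(scores):
    """Shannon entropy (in nats) of the softmax distribution over proposal scores.

    Assumption for illustration: a softmax normalises the raw detector scores.
    A low value means the score mass sits on a few proposals.
    """
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical scores for five proposals from one image.
peaked = [8.0, 0.1, 0.0, -0.2, 0.1]  # one confident proposal
flat = [1.0, 1.0, 1.0, 1.0, 1.0]     # no proposal stands out

print(proposal_entropy(peaked) < proposal_entropy(flat))
```

Under these assumptions, a detector whose scores resemble `peaked` has lower entropy than one whose scores resemble `flat`, so minimising such a term would push the learnt detector toward the peaked behaviour the abstract attributes to a strongly supervised detector.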
Object Position Labelling in Video Using PRBS Audio Multilateration
Supervised machine learning approaches for tracking objects’ positions in video typically require a large set of images in which the positions are labelled. Human labelling is time-consuming, and automatic position labelling using visual markers is generally not possible because visible markers would corrupt the data. Here, we present an approach in which an object is tracked using a hidden tag that emits a pseudorandom binary sequence (PRBS) audio signal. Four microphones arranged in a planar cross formation capture parallel recordings of the PRBS signal. Multilateration, using the time difference of arrival (TDoA) of the PRBS at each microphone, is used to estimate the position of the emitter. We describe and evaluate the method by which the TDoAs are obtained and the emitter position is calculated. In evaluation, the approach yielded three-dimensional position estimates with a mean error of 18.56 cm. In its present form, the method is suitable for applications in which precision is not a priority, but three-dimensional object coordinates are required rather than two-dimensional camera view coordinates.
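The multilateration step described above can be sketched for four coplanar microphones. This is an illustrative closed-form solver, not necessarily the one used in the paper, whose solver the abstract does not describe. It assumes the microphones lie in the plane z = 0, the emitter is on the positive-z side (a planar array cannot tell z from -z), the speed of sound is 343 m/s, and the TDoAs are noise-free. The names `locate_emitter` and `simulate_tdoas`, the 0.5 m arm length, and the test position are all hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 linear system a v = b by Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("degenerate geometry: linear system is singular")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det3(m) / d)
    return solution


def locate_emitter(mics, tdoas, c=SPEED_OF_SOUND):
    """Estimate (x, y, z) of the emitter, with z >= 0.

    mics  : four (x, y) microphone positions in the plane z = 0, in metres
    tdoas : arrival-time differences t_i - t_0 for microphones 1..3, in seconds

    Squaring r_i = r_0 + d_i and subtracting r_0^2 gives equations that are
    linear in (x, y, r_0), where r_0 is the range to microphone 0.
    """
    x0, y0 = mics[0]
    k0 = x0 * x0 + y0 * y0
    a, b = [], []
    for (xi, yi), t in zip(mics[1:], tdoas):
        d = c * t  # range difference r_i - r_0
        a.append([2.0 * (x0 - xi), 2.0 * (y0 - yi), -2.0 * d])
        b.append(d * d - (xi * xi + yi * yi) + k0)
    x, y, r0 = solve3(a, b)
    z_squared = r0 * r0 - (x - x0) ** 2 - (y - y0) ** 2
    return x, y, math.sqrt(max(z_squared, 0.0))


def simulate_tdoas(mics, emitter, c=SPEED_OF_SOUND):
    """Ideal, noise-free TDoAs relative to microphone 0 for a known emitter."""
    ex, ey, ez = emitter
    times = [math.sqrt((ex - mx) ** 2 + (ey - my) ** 2 + ez ** 2) / c
             for mx, my in mics]
    return [t - times[0] for t in times[1:]]


# Hypothetical planar cross with 0.5 m arms and a hypothetical emitter position.
cross = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.5), (0.0, -0.5)]
estimate = locate_emitter(cross, simulate_tdoas(cross, (0.6, 0.2, 1.0)))
print(estimate)
```

In this sketch the linear system becomes singular when the emitter is equidistant from all four microphones (on the array's axis) or lies in one of the diagonal planes x = ±y of the cross, so a practical implementation would need a more robust solver and some handling of measurement noise.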