12,417 research outputs found
Unsupervised Object Discovery and Localization in the Wild: Part-based Matching with Bottom-up Region Proposals
This paper addresses unsupervised discovery and localization of dominant
objects from a noisy image collection with multiple object classes. The setting
of this problem is fully unsupervised, without even image-level annotations or
any assumption of a single dominant class. This is far more general than
typical colocalization, cosegmentation, or weakly-supervised localization
tasks. We tackle the discovery and localization problem using a part-based
region matching approach: We use off-the-shelf region proposals to form a set
of candidate bounding boxes for objects and object parts. These regions are
efficiently matched across images using a probabilistic Hough transform that
evaluates the confidence for each candidate correspondence considering both
appearance and spatial consistency. Dominant objects are discovered and
localized by comparing the scores of candidate regions and selecting those that
stand out over other regions containing them. Extensive experimental
evaluations on standard benchmarks demonstrate that the proposed approach
significantly outperforms the current state of the art in colocalization, and
achieves robust object discovery in challenging mixed-class datasets.Comment: CVPR 201
Unsupervised Object Discovery and Tracking in Video Collections
This paper addresses the problem of automatically localizing dominant objects
as spatio-temporal tubes in a noisy collection of videos with minimal or even
no supervision. We formulate the problem as a combination of two complementary
processes: discovery and tracking. The first one establishes correspondences
between prominent regions across videos, and the second one associates
successive similar object regions within the same video. Interestingly, our
algorithm also discovers the implicit topology of frames associated with
instances of the same object class across different videos, a role normally
left to supervisory information in the form of class labels in conventional
image and video understanding methods. Indeed, as demonstrated by our
experiments, our method can handle video collections featuring multiple object
classes, and substantially outperforms the state of the art in colocalization,
even though it tackles a broader problem with much less supervision
- …