We propose a method for knowledge transfer between semantically related
classes in ImageNet. By transferring knowledge from the images that have
bounding-box annotations to the others, our method is capable of automatically
populating ImageNet with many more bounding boxes and even pixel-level
segmentations. The underlying assumption that objects from semantically related
classes look alike is formalized in our novel Associative Embedding (AE)
representation. AE recovers the latent low-dimensional space of appearance
variations among image windows. The dimensions of AE space tend to correspond
to aspects of window appearance (e.g. side view, close-up, background). We
model the overlap of a window with an object using Gaussian Process (GP)
regression, which spreads annotations smoothly through AE space. The
probabilistic nature of GP allows our method to perform self-assessment, i.e.
to assign a quality estimate to its own output. This enables trading off the
number of returned annotations for their quality. A large-scale experiment on
219 classes and 0.5 million images demonstrates that our method outperforms
state-of-the-art methods and baselines for both object localization and
segmentation. Using self-assessment, we can automatically return bounding-box
annotations for 30% of all images with high localization accuracy (i.e. 73%
average overlap with ground-truth).

Comment: A final CVPR version with a correction in Eq. (1). IEEE Conference on
Computer Vision and Pattern Recognition, 2014.
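As an illustration of the pipeline the abstract describes, here is a minimal sketch, not the authors' implementation: PCA stands in for the Associative Embedding, GP regression predicts window-object overlap, and the predictive standard deviation drives self-assessment. All names, dimensions, and the confidence threshold are assumptions.

```python
# Sketch of the abstract's pipeline: embed window descriptors into a
# low-dimensional appearance space, regress window-object overlap with a
# Gaussian Process, and use predictive variance for self-assessment.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Toy data: high-dimensional window descriptors; overlap (IoU in [0, 1])
# with the object is known only for windows from annotated source images.
X_source = rng.normal(size=(200, 128))      # annotated windows
y_source = rng.uniform(0.0, 1.0, size=200)  # their overlap with the object
X_target = rng.normal(size=(50, 128))       # windows from unannotated images

# 1) Recover a low-dimensional appearance space (PCA as a stand-in for AE).
embed = PCA(n_components=10).fit(X_source)
Z_source = embed.transform(X_source)
Z_target = embed.transform(X_target)

# 2) GP regression spreads overlap annotations smoothly through the space.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel())
gp.fit(Z_source, y_source)

# 3) Self-assessment: the predictive std is a per-window quality estimate,
# so we can trade off the number of returned annotations for their quality.
mean, std = gp.predict(Z_target, return_std=True)
confident = std < 0.5  # hypothetical threshold
print(f"returning {confident.sum()}/{len(std)} windows with high confidence")
```

Raising or lowering the variance threshold moves along the quantity-quality trade-off mentioned in the abstract: a stricter threshold returns fewer annotations with higher expected localization accuracy.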