Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Current approaches to object category recognition require
datasets of training images to be manually prepared, with
varying degrees of supervision. We present an approach
that can learn an object category from just its name, by utilizing the raw output of image search engines available on the Internet. We develop a new model, TSI-pLSA, which
extends pLSA (as applied to visual words) to include spatial information in a translation and scale invariant manner. Our approach can handle the high intra-class variability and large proportion of unrelated images returned
by search engines. We evaluate the models on standard test
sets, showing performance competitive with existing methods trained on hand-prepared datasets.
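
For readers unfamiliar with pLSA as applied to visual words, the following is a minimal sketch of the standard pLSA EM updates on an image-by-visual-word count matrix. It illustrates only plain pLSA, not the TSI-pLSA model itself: the translation and scale invariant spatial terms are omitted. The function names `plsa` and `normalize`, the parameters `n_topics`, `n_iter`, and `seed`, and the toy count matrix are assumptions made for illustration and do not come from the paper.

```python
import random


def normalize(row):
    """Scale a list of non-negative values to sum to 1 (uniform if all zero)."""
    total = sum(row)
    if total > 0:
        return [v / total for v in row]
    return [1.0 / len(row)] * len(row)


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit plain pLSA by EM (hypothetical helper, not the paper's code).

    counts[d][w] is the number of times visual word w occurs in image d.
    Returns (p_w_z, p_z_d), where p_w_z[z][w] = P(w | z)
    and p_z_d[d][z] = P(z | d).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    # Random initialisation; the small offset keeps every entry positive.
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iter):
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    r = counts[d][w] * post[z]
                    new_w_z[z][w] += r
                    new_z_d[d][z] += r
        # M-step: renormalise the expected counts into distributions.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d


# Toy data (made up): four "images" over a four-word visual vocabulary.
toy_counts = [[5, 3, 0, 0], [4, 6, 0, 0], [0, 0, 7, 2], [0, 0, 3, 8]]
topic_words, image_topics = plsa(toy_counts, n_topics=2)
```

In this sketch each row of `image_topics` is a mixture over topics for one image. In a setting like the one the abstract describes, one would expect some topics to capture the object category and others the unrelated images returned by the search engine, though that interpretation depends on the data and is not demonstrated by this toy example.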