29 research outputs found
Zero-Shot Object Detection by Hybrid Region Embedding
Object detection is considered as one of the most challenging problems in
computer vision, since it requires correct prediction of both classes and
locations of objects in images. In this study, we define a more difficult
scenario, namely zero-shot object detection (ZSD) where no visual training data
is available for some of the target object classes. We present a novel approach
to tackle this ZSD problem, where a convex combination of embeddings are used
in conjunction with a detection framework. For evaluation of ZSD methods, we
propose a simple dataset constructed from Fashion-MNIST images and also a
custom zero-shot split for the Pascal VOC detection challenge. The experimental
results suggest that our method yields promising results for ZSD
Doodle to Search: Practical Zero-Shot Sketch-based Image Retrieval
In this paper, we investigate the problem of zero-shot sketch-based image
retrieval (ZS-SBIR), where human sketches are used as queries to conduct
retrieval of photos from unseen categories. We importantly advance prior arts
by proposing a novel ZS-SBIR scenario that represents a firm step forward in
its practical application. The new setting uniquely recognizes two important
yet often neglected challenges of practical ZS-SBIR, (i) the large domain gap
between amateur sketch and photo, and (ii) the necessity for moving towards
large-scale retrieval. We first contribute to the community a novel ZS-SBIR
dataset, QuickDraw-Extended, that consists of 330,000 sketches and 204,000
photos spanning across 110 categories. Highly abstract amateur human sketches
are purposefully sourced to maximize the domain gap, instead of ones included
in existing datasets that can often be semi-photorealistic. We then formulate a
ZS-SBIR framework to jointly model sketches and photos into a common embedding
space. A novel strategy to mine the mutual information among domains is
specifically engineered to alleviate the domain gap. External semantic
knowledge is further embedded to aid semantic transfer. We show that, rather
surprisingly, retrieval performance significantly outperforms that of
state-of-the-art on existing datasets that can already be achieved using a
reduced version of our model. We further demonstrate the superior performance
of our full model by comparing with a number of alternatives on the newly
proposed dataset. The new dataset, plus all training and testing code of our
model, will be publicly released to facilitate future researchComment: Oral paper in CVPR 201