Context-Aware Zero-Shot Recognition
We present a novel problem setting in zero-shot learning: zero-shot object
recognition and detection in context. Contrary to traditional zero-shot
learning methods, which simply infer unseen categories by transferring
knowledge from objects belonging to semantically similar seen categories,
we aim to identify novel objects in an image surrounded by known objects
using an inter-object relation prior. Specifically, we
leverage the visual context and the geometric relationships between all pairs
of objects in a single image, and capture the information useful to infer
unseen categories. We integrate our context-aware zero-shot learning framework
into the traditional zero-shot learning techniques seamlessly using a
Conditional Random Field (CRF). The proposed algorithm is evaluated on both
zero-shot region classification and zero-shot detection tasks. Results on the
Visual Genome (VG) dataset show that our model significantly boosts performance
over traditional methods by exploiting the additional visual context.
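The CRF formulation described above can be illustrated with a small sketch: each region gets a unary zero-shot score per candidate label, plus pairwise terms encoding how compatible two labels are given their relation, and inference picks the joint labeling that maximizes the total score. The example below is a hypothetical toy instance (the scores, class names, and use of iterated conditional modes as the solver are all illustrative assumptions, not the paper's exact model).

```python
# Hypothetical sketch of CRF-style context-aware zero-shot inference:
# each region i is assigned the label maximizing its unary ZSL score
# plus pairwise compatibility with the labels of related regions.
# All scores below are made up for illustration.

def infer_labels(unary, pairwise, n_iters=10):
    """Iterated conditional modes on
    score(y) = sum_i unary[i][y_i] + sum_{(i,j)} pairwise[(i,j)][y_i][y_j]."""
    # Start from the context-free (unary-only) labeling.
    labels = {i: max(u, key=u.get) for i, u in unary.items()}
    for _ in range(n_iters):
        changed = False
        for i, u in unary.items():
            def score(c):
                s = u[c]
                for (a, b), table in pairwise.items():
                    if a == i:
                        s += table[c][labels[b]]
                    elif b == i:
                        s += table[labels[a]][c]
                return s
            best = max(u, key=score)
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:  # converged
            break
    return labels

# Region 0 is a confidently recognized seen class; region 1 is ambiguous
# between two unseen classes under the unary ZSL scores alone.
unary = {
    0: {"person": 2.0},
    1: {"horse": 0.9, "car": 1.0},   # unary alone slightly prefers "car"
}
# Relation prior: a person riding/above something co-occurs with horses
# far more often than with cars.
pairwise = {
    (0, 1): {"person": {"horse": 1.5, "car": 0.1}},
}
print(infer_labels(unary, pairwise))  # context flips region 1 to "horse"
```

The point of the toy example is the flip: the unary scores alone would label region 1 as "car", but the pairwise relation prior with the known "person" region overrides it, which is the behavior the abstract attributes to visual context.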
Webly Supervised Semantic Embeddings for Large Scale Zero-Shot Learning
Zero-shot learning (ZSL) makes object recognition in images possible in the
absence of visual training data for a subset of the classes in a dataset. When
the number of classes is large, classes are usually represented by semantic
class prototypes learned automatically from unannotated text collections. This
typically leads to much lower performance than with manually designed semantic
prototypes such as attributes. While most ZSL works focus on the visual aspect
and reuse standard semantic prototypes learned from generic text collections,
we focus on the problem of semantic class prototype design for large scale ZSL.
More specifically, we investigate the use of noisy textual metadata associated
with photos as text collections, as we hypothesize that they are likely to provide
more plausible semantic embeddings for visual classes if exploited
appropriately. We thus make use of a source-based voting strategy to improve
the robustness of semantic prototypes. Evaluation on the large-scale ImageNet
dataset shows a significant improvement in ZSL performance over two strong
baselines, and over the semantic embeddings commonly used in previous works. We
show that this improvement is obtained for several embedding methods, leading to
state-of-the-art results when automatically created visual and text features
are used.
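One plausible reading of the source-based voting strategy is sketched below: word vectors for a class are first pooled per source (e.g. per website hosting the photo metadata), and the sources then vote via a per-dimension median, so a single noisy source with many documents cannot dominate the class prototype. The data, function names, and the choice of mean-then-median aggregation are illustrative assumptions; the paper's exact voting scheme may differ.

```python
# Hedged sketch of source-based voting for semantic class prototypes.
# Pool embeddings per source first, then let sources vote (median), so
# one noisy source cannot dominate the prototype. Illustrative only.
from statistics import median

def prototype(per_source_vectors):
    """per_source_vectors: {source: [embedding, ...]} -> class prototype."""
    # Step 1: one vector per source (mean over that source's documents).
    source_means = []
    for vecs in per_source_vectors.values():
        dim = len(vecs[0])
        source_means.append(
            [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
        )
    # Step 2: per-dimension median across sources (the "vote").
    dim = len(source_means[0])
    return [median(m[d] for m in source_means) for d in range(dim)]

# Two clean sources roughly agree; a third noisy source contributes many
# outlier documents, yet the median vote stays near the clean consensus.
vectors = {
    "site_a": [[1.0, 0.0], [1.2, 0.1]],
    "site_b": [[0.9, 0.0]],
    "noisy":  [[9.0, 5.0], [8.0, 5.0], [10.0, 5.0]],
}
print(prototype(vectors))
```

Had the vote been a plain mean over all documents, the noisy source would pull the prototype far off; the per-source pooling plus median is what buys the robustness the abstract describes.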