16,078 research outputs found
Zero-Shot Object Detection by Hybrid Region Embedding
Object detection is considered as one of the most challenging problems in
computer vision, since it requires correct prediction of both classes and
locations of objects in images. In this study, we define a more difficult
scenario, namely zero-shot object detection (ZSD) where no visual training data
is available for some of the target object classes. We present a novel approach
to tackle this ZSD problem, where a convex combination of embeddings are used
in conjunction with a detection framework. For evaluation of ZSD methods, we
propose a simple dataset constructed from Fashion-MNIST images and also a
custom zero-shot split for the Pascal VOC detection challenge. The experimental
results suggest that our method yields promising results for ZSD
Multi-Target Prediction: A Unifying View on Problems and Methods
Multi-target prediction (MTP) is concerned with the simultaneous prediction
of multiple target variables of diverse type. Due to its enormous application
potential, it has developed into an active and rapidly expanding research field
that combines several subfields of machine learning, including multivariate
regression, multi-label classification, multi-task learning, dyadic prediction,
zero-shot learning, network inference, and matrix completion. In this paper, we
present a unifying view on MTP problems and methods. First, we formally discuss
commonalities and differences between existing MTP problems. To this end, we
introduce a general framework that covers the above subfields as special cases.
As a second contribution, we provide a structured overview of MTP methods. This
is accomplished by identifying a number of key properties, which distinguish
such methods and determine their suitability for different types of problems.
Finally, we also discuss a few challenges for future research
Comparator Networks
The objective of this work is set-based verification, e.g. to decide if two
sets of images of a face are of the same person or not. The traditional
approach to this problem is to learn to generate a feature vector per image,
aggregate them into one vector to represent the set, and then compute the
cosine similarity between sets. Instead, we design a neural network
architecture that can directly learn set-wise verification. Our contributions
are: (i) We propose a Deep Comparator Network (DCN) that can ingest a pair of
sets (each may contain a variable number of images) as inputs, and compute a
similarity between the pair--this involves attending to multiple discriminative
local regions (landmarks), and comparing local descriptors between pairs of
faces; (ii) To encourage high-quality representations for each set, internal
competition is introduced for recalibration based on the landmark score; (iii)
Inspired by image retrieval, a novel hard sample mining regime is proposed to
control the sampling process, such that the DCN is complementary to the
standard image classification models. Evaluations on the IARPA Janus face
recognition benchmarks show that the comparator networks outperform the
previous state-of-the-art results by a large margin.Comment: To appear in ECCV 201
Group Membership Prediction
The group membership prediction (GMP) problem involves predicting whether or
not a collection of instances share a certain semantic property. For instance,
in kinship verification given a collection of images, the goal is to predict
whether or not they share a {\it familial} relationship. In this context we
propose a novel probability model and introduce latent {\em view-specific} and
{\em view-shared} random variables to jointly account for the view-specific
appearance and cross-view similarities among data instances. Our model posits
that data from each view is independent conditioned on the shared variables.
This postulate leads to a parametric probability model that decomposes group
membership likelihood into a tensor product of data-independent parameters and
data-dependent factors. We propose learning the data-independent parameters in
a discriminative way with bilinear classifiers, and test our prediction
algorithm on challenging visual recognition tasks such as multi-camera person
re-identification and kinship verification. On most benchmark datasets, our
method can significantly outperform the current state-of-the-art.Comment: accepted for ICCV 201
Multi-Label Zero-Shot Learning with Structured Knowledge Graphs
In this paper, we propose a novel deep learning architecture for multi-label
zero-shot learning (ML-ZSL), which is able to predict multiple unseen class
labels for each input instance. Inspired by the way humans utilize semantic
knowledge between objects of interests, we propose a framework that
incorporates knowledge graphs for describing the relationships between multiple
labels. Our model learns an information propagation mechanism from the semantic
label space, which can be applied to model the interdependencies between seen
and unseen class labels. With such investigation of structured knowledge graphs
for visual reasoning, we show that our model can be applied for solving
multi-label classification and ML-ZSL tasks. Compared to state-of-the-art
approaches, comparable or improved performances can be achieved by our method.Comment: CVPR 201
- …