2,043 research outputs found
Weakly-supervised learning of visual relations
This paper introduces a novel approach for modeling visual relations between
pairs of objects. We call relation a triplet of the form (subject, predicate,
object) where the predicate is typically a preposition (eg. 'under', 'in front
of') or a verb ('hold', 'ride') that links a pair of objects (subject, object).
Learning such relations is challenging as the objects have different spatial
configurations and appearances depending on the relation in which they occur.
Another major challenge comes from the difficulty to get annotations,
especially at box-level, for all possible triplets, which makes both learning
and evaluation difficult. The contributions of this paper are threefold. First,
we design strong yet flexible visual features that encode the appearance and
spatial configuration for pairs of objects. Second, we propose a
weakly-supervised discriminative clustering model to learn relations from
image-level labels only. Third we introduce a new challenging dataset of
unusual relations (UnRel) together with an exhaustive annotation, that enables
accurate evaluation of visual relation retrieval. We show experimentally that
our model results in state-of-the-art results on the visual relationship
dataset significantly improving performance on previously unseen relations
(zero-shot learning), and confirm this observation on our newly introduced
UnRel dataset
Weakly-supervised learning of visual relations
This paper introduces a novel approach for modeling visual relations between
pairs of objects. We call relation a triplet of the form (subject, predicate,
object) where the predicate is typically a preposition (eg. 'under', 'in front
of') or a verb ('hold', 'ride') that links a pair of objects (subject, object).
Learning such relations is challenging as the objects have different spatial
configurations and appearances depending on the relation in which they occur.
Another major challenge comes from the difficulty to get annotations,
especially at box-level, for all possible triplets, which makes both learning
and evaluation difficult. The contributions of this paper are threefold. First,
we design strong yet flexible visual features that encode the appearance and
spatial configuration for pairs of objects. Second, we propose a
weakly-supervised discriminative clustering model to learn relations from
image-level labels only. Third we introduce a new challenging dataset of
unusual relations (UnRel) together with an exhaustive annotation, that enables
accurate evaluation of visual relation retrieval. We show experimentally that
our model results in state-of-the-art results on the visual relationship
dataset significantly improving performance on previously unseen relations
(zero-shot learning), and confirm this observation on our newly introduced
UnRel dataset
Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval
Where previous reviews on content-based image retrieval emphasize on what can
be seen in an image to bridge the semantic gap, this survey considers what
people tag about an image. A comprehensive treatise of three closely linked
problems, i.e., image tag assignment, refinement, and tag-based image retrieval
is presented. While existing works vary in terms of their targeted tasks and
methodology, they rely on the key functionality of tag relevance, i.e.
estimating the relevance of a specific tag with respect to the visual content
of a given image and its social context. By analyzing what information a
specific method exploits to construct its tag relevance function and how such
information is exploited, this paper introduces a taxonomy to structure the
growing literature, understand the ingredients of the main works, clarify their
connections and difference, and recognize their merits and limitations. For a
head-to-head comparison between the state-of-the-art, a new experimental
protocol is presented, with training sets containing 10k, 100k and 1m images
and an evaluation on three test sets, contributed by various research groups.
Eleven representative works are implemented and evaluated. Putting all this
together, the survey aims to provide an overview of the past and foster
progress for the near future.Comment: to appear in ACM Computing Survey
- …