Hierarchy-based Image Embeddings for Semantic Image Retrieval
Deep neural networks trained for classification have been found to learn
powerful image representations, which are also often used for other tasks such
as comparing images w.r.t. their visual similarity. However, visual similarity
does not imply semantic similarity. In order to learn semantically
discriminative features, we propose to map images onto class embeddings whose
pair-wise dot products correspond to a measure of semantic similarity between
classes. Such an embedding not only improves image retrieval results, but could also facilitate integrating semantics for other tasks, e.g., novelty
detection or few-shot learning. We introduce a deterministic algorithm for
computing the class centroids directly based on prior world-knowledge encoded
in a hierarchy of classes such as WordNet. Experiments on CIFAR-100, NABirds,
and ImageNet show that our learned semantic image embeddings improve the
semantic consistency of image retrieval results by a large margin.Comment: Accepted at WACV 2019. Source code:
https://github.com/cvjena/semantic-embedding
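A minimal sketch of the idea, under assumptions of our own: given a precomputed class-similarity matrix S with unit diagonal (e.g., derived from WordNet lowest-common-ancestor heights), class embeddings can be obtained by factoring S so that pairwise dot products reproduce it, and the network is trained to pull image features toward their class embedding. The factorization and loss below are plausible stand-ins, not necessarily the authors' exact deterministic algorithm.

```python
# Sketch only: `S` is an assumed (num_classes x num_classes) semantic
# similarity matrix with unit diagonal, e.g. derived from a class hierarchy.
import numpy as np
import torch
import torch.nn.functional as F

def class_embeddings_from_similarity(S):
    """Factor symmetric PSD S so that E @ E.T ~= S; rows are class embeddings."""
    eigval, eigvec = np.linalg.eigh(S)
    eigval = np.clip(eigval, 0.0, None)       # guard tiny negative eigenvalues
    E = eigvec * np.sqrt(eigval)              # one row per class
    # If diag(S) == 1, the rows already lie on the unit hypersphere.
    return torch.as_tensor(E, dtype=torch.float32)

def embedding_loss(features, labels, class_emb):
    """Pull L2-normalized image features toward their class embedding."""
    feats = F.normalize(features, dim=1)      # (batch, dim)
    targets = class_emb[labels]               # (batch, dim)
    return (1.0 - (feats * targets).sum(dim=1)).mean()
```

Under this construction, two classes that are close in the hierarchy share nearby centroids, so image features trained toward them inherit the semantic geometry.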
SegSort: Segmentation by Discriminative Sorting of Segments
Almost all existing deep learning approaches for semantic segmentation tackle
this task as a pixel-wise classification problem. Yet humans understand a scene
not in terms of pixels, but by decomposing it into perceptual groups and
structures that are the basic building blocks of recognition. This motivates us
to propose an end-to-end pixel-wise metric learning approach that mimics this
process. In our approach, the optimal visual representation determines the
right segmentation within individual images and associates segments with the
same semantic classes across images. The core visual learning problem is
therefore to maximize the similarity within segments and minimize the
similarity between segments. Given a model trained this way, inference is
performed consistently by extracting pixel-wise embeddings and clustering, with
each segment's semantic label determined by a majority vote among its nearest neighbors from an annotated set.
As a result, we present SegSort as a first attempt to use deep learning for unsupervised semantic segmentation, approaching the performance of its supervised counterpart. When supervision is available, SegSort shows consistent
improvements over conventional approaches based on pixel-wise softmax training.
Additionally, our approach produces more precise boundaries and consistent
region predictions. The proposed SegSort further produces an interpretable
result, as each choice of label can be easily understood from the retrieved
nearest segments.
Comment: In ICCV 2019. Webpage & Code:
https://jyhjinghwang.github.io/projects/segsort.htm
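The inference procedure described above lends itself to a compact sketch. Hedged: the clustering routine, iteration count, and neighbor count below are illustrative assumptions, not the released implementation.

```python
# Sketch: cluster pixel embeddings into segments, then label each segment by
# a majority vote over its nearest annotated segments.
import torch
import torch.nn.functional as F

def segsort_infer(pixel_emb, proto_emb, proto_labels, n_segments=64, knn=5):
    """pixel_emb: (H*W, D) embeddings of one image; proto_emb: (M, D)
    embeddings of annotated segments; proto_labels: (M,) int64 labels."""
    pixel_emb = F.normalize(pixel_emb, dim=1)
    proto_emb = F.normalize(proto_emb, dim=1)
    n_segments = min(n_segments, pixel_emb.size(0))
    # 1) Group pixels into segments with a few spherical k-means steps.
    centers = pixel_emb[torch.randperm(pixel_emb.size(0))[:n_segments]].clone()
    for _ in range(10):
        assign = (pixel_emb @ centers.t()).argmax(dim=1)     # nearest center
        for k in range(n_segments):
            mask = assign == k
            if mask.any():
                centers[k] = F.normalize(pixel_emb[mask].mean(dim=0), dim=0)
    # 2) Label each segment by majority vote over nearest annotated segments.
    sims = centers @ proto_emb.t()                           # (K, M)
    nn_labels = proto_labels[sims.topk(knn, dim=1).indices]  # (K, knn)
    seg_labels = nn_labels.mode(dim=1).values                # majority vote
    return seg_labels[assign]                                # per-pixel labels
```

Because each label comes from retrieved nearest segments, the prediction can be inspected by looking at those retrieved segments directly, which is the interpretability property the abstract highlights.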
Cross Pixel Optical Flow Similarity for Self-Supervised Learning
We propose a novel method for learning convolutional neural image
representations without manual supervision. We use motion cues in the form of optical flow to supervise representations of static images. The obvious
approach of training a network to predict flow from a single image can be
needlessly difficult due to intrinsic ambiguities in this prediction task. We
instead propose a much simpler learning goal: embed pixels such that the
similarity between their embeddings matches that between their optical flow
vectors. At test time, the learned deep network can be used without access to
video or flow information and transferred to tasks such as image
classification, detection, and segmentation. Our method, which significantly simplifies previous attempts at using motion for self-supervision, achieves state-of-the-art results among self-supervision methods that use motion cues, competitive results for self-supervision in general, and state-of-the-art results overall in self-supervised pretraining for semantic image segmentation, as demonstrated on standard benchmarks.
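A minimal sketch of this learning goal, with kernel choices and the temperature as placeholders of our own rather than the paper's exact formulation: the row-normalized similarity distribution over pixel embeddings is matched to the one induced by their optical-flow vectors.

```python
# Sketch: make the pairwise similarity structure of pixel embeddings match
# that of their optical-flow vectors, via KL between row-softmaxed kernels.
import torch
import torch.nn.functional as F

def cross_pixel_flow_loss(embeddings, flow, temperature=0.1):
    """embeddings: (N, D) embeddings of N sampled pixels from one frame;
    flow: (N, 2) optical-flow vectors at the same pixel locations."""
    emb = F.normalize(embeddings, dim=1)
    emb_sim = emb @ emb.t() / temperature              # embedding kernel (N, N)
    flow_sim = -torch.cdist(flow, flow) / temperature  # nearby flows -> high sim
    target = F.softmax(flow_sim, dim=1)                # flow-derived distribution
    log_pred = F.log_softmax(emb_sim, dim=1)
    # KL divergence between the two cross-pixel similarity distributions.
    return F.kl_div(log_pred, target, reduction="batchmean")
```

Note that flow only enters the loss as a training target; at test time the embedding network runs on a single image, which is why the representation transfers without video or flow input.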
Scene Graph Embeddings Using Relative Similarity Supervision
Scene graphs are a powerful structured representation of the underlying
content of images, and embeddings derived from them have been shown to be
useful in multiple downstream tasks. In this work, we employ a graph
convolutional network to exploit structure in scene graphs and produce image
embeddings useful for semantic image retrieval. Different from
classification-centric supervision traditionally available for learning image
representations, we address the task of learning from relative similarity
labels in a ranking context. Rooted within the contrastive learning paradigm,
we propose a novel loss function that operates on pairs of similar and
dissimilar images and imposes relative ordering between them in embedding
space. We demonstrate that this Ranking loss, coupled with an intuitive triple
sampling strategy, leads to robust representations that outperform well-known
contrastive losses on the retrieval task. In addition, we provide qualitative
evidence of how retrieved results that utilize structured scene information
capture the global context of the scene, different from visual similarity
search.
Comment: Accepted to AAAI 202
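A minimal sketch of a ranking loss of this kind, under assumptions of our own (the margin, cosine similarity, and triple layout are illustrative, not necessarily the paper's exact loss): given triples ranked by relative similarity labels, the more-similar image is pushed closer to the anchor than the less-similar one.

```python
# Sketch: impose a relative ordering between a similar and a dissimilar
# image in embedding space, with embeddings e.g. from a GCN over scene graphs.
import torch
import torch.nn.functional as F

def ranking_loss(anchor, positive, negative, margin=0.2):
    """anchor/positive/negative: (B, D) image embeddings; enforces
    sim(anchor, positive) >= sim(anchor, negative) + margin."""
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    n = F.normalize(negative, dim=1)
    sim_pos = (a * p).sum(dim=1)            # cosine similarity to similar image
    sim_neg = (a * n).sum(dim=1)            # cosine similarity to dissimilar one
    return F.relu(sim_neg - sim_pos + margin).mean()
```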