Memory vectors for similarity search in high-dimensional spaces
We study an indexing architecture to store and search in a database of
high-dimensional vectors from the perspective of statistical signal processing
and decision theory. This architecture is composed of several memory units,
each of which summarizes a fraction of the database by a single representative
vector. The potential similarity of the query to one of the vectors stored in
the memory unit is gauged by a simple correlation with the memory unit's
representative vector. This representative optimizes the test of the following
hypothesis: the query is independent of any vector in the memory unit vs. the
query is a simple perturbation of one of the stored vectors.
Compared to exhaustive search, our approach finds the most similar database
vectors significantly faster without a noticeable reduction in search quality.
Interestingly, the reduction of complexity is provably better in
high-dimensional spaces. We empirically demonstrate its practical interest in a
large-scale image search scenario with off-the-shelf state-of-the-art
descriptors.
Comment: Accepted to IEEE Transactions on Big Data
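The memory-unit test can be sketched with the simplest construction for the representative vector, a plain sum of the stored vectors (the dimensions, unit size, and query setup below are illustrative, not the paper's exact optimized representative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 256, 16          # vector dimension, vectors per memory unit

# A memory unit: n unit-norm database vectors.
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Simplest representative: the sum of the stored vectors.
m = X.sum(axis=0)

def score(q, m):
    """One correlation decides whether the unit may contain a match."""
    return float(q @ m)

# Stored vectors used as queries correlate strongly with the representative...
match = np.mean([score(X[i], m) for i in range(n)])

# ...while independent queries correlate weakly, so in high dimension a
# single dot product per unit can prune most of the database.
Q = rng.normal(size=(100, d))
Q /= np.linalg.norm(Q, axis=1, keepdims=True)
rand = np.mean([score(q, m) for q in Q])
```

The gap between `match` and `rand` widens as `d` grows, which is the abstract's point that the complexity reduction is provably better in high dimension.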
Hierarchy-based Image Embeddings for Semantic Image Retrieval
Deep neural networks trained for classification have been found to learn
powerful image representations, which are also often used for other tasks such
as comparing images w.r.t. their visual similarity. However, visual similarity
does not imply semantic similarity. In order to learn semantically
discriminative features, we propose to map images onto class embeddings whose
pair-wise dot products correspond to a measure of semantic similarity between
classes. Such an embedding not only improves image retrieval results, but
could also facilitate integrating semantics for other tasks, e.g., novelty
detection or few-shot learning. We introduce a deterministic algorithm for
computing the class centroids directly based on prior world-knowledge encoded
in a hierarchy of classes such as WordNet. Experiments on CIFAR-100, NABirds,
and ImageNet show that our learned semantic image embeddings improve the
semantic consistency of image retrieval results by a large margin.
Comment: Accepted at WACV 2019. Source code:
https://github.com/cvjena/semantic-embedding
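As a sketch of the idea (not the paper's exact deterministic algorithm), one can derive a class-similarity matrix from a toy hierarchy and factorize it so that pairwise dot products of the resulting class embeddings reproduce the similarities; the tiny taxonomy and the similarity measure below are illustrative stand-ins for WordNet and the paper's hierarchy-based measure:

```python
import numpy as np

# Toy hierarchy given as root-to-leaf paths, standing in for WordNet.
paths = {
    "cat":   ["entity", "animal", "mammal", "cat"],
    "dog":   ["entity", "animal", "mammal", "dog"],
    "eagle": ["entity", "animal", "bird", "eagle"],
    "oak":   ["entity", "plant", "oak"],
}
classes = list(paths)

def sim(a, b):
    """Illustrative semantic similarity: depth of the lowest common
    ancestor divided by the deeper class's path length."""
    shared = sum(x == y for x, y in zip(paths[a], paths[b]))
    return shared / max(len(paths[a]), len(paths[b]))

S = np.array([[sim(a, b) for b in classes] for a in classes])

# Any factorization S = E E^T yields class embeddings whose pairwise dot
# products equal the semantic similarities; eigendecomposition is one choice.
w, V = np.linalg.eigh(S)
w = np.clip(w, 0.0, None)       # guard against tiny negative eigenvalues
E = V * np.sqrt(w)              # rows of E are the class embeddings
```

Images are then trained to map onto these rows, so that visual distance inherits the hierarchy's notion of semantic distance.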
SegSort: Segmentation by Discriminative Sorting of Segments
Almost all existing deep learning approaches for semantic segmentation tackle
this task as a pixel-wise classification problem. Yet humans understand a scene
not in terms of pixels, but by decomposing it into perceptual groups and
structures that are the basic building blocks of recognition. This motivates us
to propose an end-to-end pixel-wise metric learning approach that mimics this
process. In our approach, the optimal visual representation determines the
right segmentation within individual images and associates segments with the
same semantic classes across images. The core visual learning problem is
therefore to maximize the similarity within segments and minimize the
similarity between segments. Given a model trained this way, inference is
performed consistently by extracting pixel-wise embeddings and clustering, with
the semantic label determined by the majority vote of its nearest neighbors
from an annotated set.
As a result, we present SegSort, a first attempt at using deep learning
for unsupervised semantic segmentation, achieving performance close to that of
its supervised counterpart. When supervision is available, SegSort shows consistent
improvements over conventional approaches based on pixel-wise softmax training.
Additionally, our approach produces more precise boundaries and consistent
region predictions. The proposed SegSort further produces an interpretable
result, as each choice of label can be easily understood from the retrieved
nearest segments.
Comment: In ICCV 2019. Webpage & Code: https://jyhjinghwang.github.io/projects/segsort.htm
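The retrieval-based labeling step can be sketched as a k-nearest-neighbor majority vote over an annotated segment bank; the function name, the two-dimensional embeddings, and the toy bank below are illustrative, not the paper's implementation:

```python
from collections import Counter
import numpy as np

def label_by_retrieval(seg_emb, bank_emb, bank_labels, k=3):
    """Assign a semantic label to a segment embedding by majority vote
    over its k nearest neighbors (cosine similarity) in an annotated bank."""
    q = seg_emb / np.linalg.norm(seg_emb)
    B = bank_emb / np.linalg.norm(bank_emb, axis=1, keepdims=True)
    nearest = np.argsort(B @ q)[::-1][:k]
    votes = Counter(bank_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy annotated bank: segment embeddings scattered around two prototypes.
rng = np.random.default_rng(1)
bank_emb = np.vstack(
    [[1.0, 0.0] + 0.05 * rng.normal(size=2) for _ in range(3)]    # "road"
  + [[0.0, 1.0] + 0.05 * rng.normal(size=2) for _ in range(3)])   # "sky"
bank_labels = ["road"] * 3 + ["sky"] * 3

pred = label_by_retrieval(np.array([0.1, 0.95]), bank_emb, bank_labels)
```

This is also what makes the result interpretable: the retrieved neighbors behind each vote can be shown directly as the evidence for the chosen label.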
The Common Stability Mechanism behind most Self-Supervised Learning Approaches
The last couple of years have witnessed tremendous progress in self-supervised
learning (SSL), the success of which can be attributed to the introduction of
useful inductive biases in the learning process to learn meaningful visual
representations while avoiding collapse. These inductive biases and constraints
manifest themselves in the form of different optimization formulations in the
SSL techniques, e.g., by utilizing negative examples in a contrastive
formulation, or an exponential moving average and a predictor in BYOL and SimSiam.
In this paper, we provide a framework to explain the stability mechanism of
these different SSL techniques: i) we discuss the working mechanism of
contrastive techniques like SimCLR, non-contrastive techniques like BYOL, SWAV,
SimSiam, Barlow Twins, and DINO; ii) we provide an argument that despite
different formulations these methods implicitly optimize a similar objective
function, i.e. minimizing the magnitude of the expected representation over all
data samples, or the mean of the data distribution, while maximizing the
magnitude of the expected representation of individual samples over different
data augmentations; iii) we provide mathematical and empirical evidence to
support our framework. We formulate different hypotheses and test them using
the ImageNet100 dataset.
Comment: Additional visualizations (.gif): https://github.com/abskjha/CenterVectorSS
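The two quantities in (ii) can be sketched as simple diagnostics over a batch of per-augmentation representations; the array layout and function name below are my own, not the paper's:

```python
import numpy as np

def stability_diagnostics(Z):
    """Z: (num_samples, num_augmentations, dim) L2-normalized representations.
    Returns:
      center_norm      -- norm of the mean over all samples and augmentations,
                          the quantity the framework says is implicitly
                          minimized (large => collapse toward one point);
      mean_sample_norm -- average over samples of the norm of each sample's
                          mean over augmentations, the quantity maximized
                          (large => a sample's augmented views agree).
    """
    center_norm = float(np.linalg.norm(Z.mean(axis=(0, 1))))
    mean_sample_norm = float(np.linalg.norm(Z.mean(axis=1), axis=1).mean())
    return center_norm, mean_sample_norm

# Healthy: distinct per-sample directions, consistent across augmentations.
rng = np.random.default_rng(0)
D = rng.normal(size=(64, 16))
D /= np.linalg.norm(D, axis=1, keepdims=True)
healthy = np.repeat(D[:, None, :], 4, axis=1)

# Collapsed: every representation is the same point on the sphere.
collapsed = np.tile(np.eye(16)[0], (64, 4, 1))

c_h, s_h = stability_diagnostics(healthy)    # small center norm, sample norm 1
c_c, s_c = stability_diagnostics(collapsed)  # both norms equal 1
```

Under this framing, SimCLR's negatives, BYOL's moving average, and Barlow Twins' redundancy term are all different mechanisms for keeping `center_norm` small while `mean_sample_norm` stays large.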