11 research outputs found
Interferences in match kernels
We consider the design of an image representation that embeds and aggregates a set of local descriptors into a single vector. Popular representations of this kind include the bag-of-visual-words, the Fisher vector and the VLAD. When two such image representations are compared with the dot-product, the image-to-image similarity can be interpreted as a match kernel. In match kernels, one has to deal with interference, i.e., with the fact that even if two descriptors are unrelated, their matching score may contribute to the overall similarity. We formalise this problem and propose two related solutions, both aimed at equalising the individual contributions of the local descriptors in the final representation. These methods modify the aggregation stage by including a set of per-descriptor weights. They differ by the objective function that is optimised to compute those weights. The first is a "democratisation" strategy that aims at equalising the relative importance of each descriptor in the set comparison metric. The second one involves equalising the match of a single descriptor to the aggregated vector. These concurrent methods give a substantial performance boost over the state of the art in image search with short or mid-size vectors, as demonstrated by our experiments on standard public image retrieval benchmarks.
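To make the weighting concrete, here is a minimal NumPy sketch of the "democratisation" idea: a Sinkhorn-like fixed-point iteration that rescales per-descriptor weights until every descriptor contributes roughly equally to the dot-product match kernel. The function names, iteration count and the assumption of L2-normalised descriptors are illustrative choices, not the paper's exact algorithm.

```python
import numpy as np

def democratic_weights(X, n_iter=10, eps=1e-8):
    """Find weights lam so that lam_i * sum_j lam_j * <x_i, x_j> is roughly
    constant over i, i.e. each descriptor contributes equally to the kernel.
    X: (n, d) array of L2-normalised local descriptors (assumption)."""
    K = np.maximum(X @ X.T, 0.0)           # pairwise matching scores, negatives clipped
    lam = np.ones(len(X))
    for _ in range(n_iter):                # Sinkhorn-like rescaling
        contrib = lam * (K @ lam)          # current contribution of each descriptor
        lam = lam / np.sqrt(np.maximum(contrib, eps))
    return lam

def democratic_aggregate(X):
    """Aggregate a descriptor set into one vector with equalised contributions."""
    lam = democratic_weights(X)
    v = (lam[:, None] * X).sum(axis=0)     # weighted sum replaces plain sum pooling
    return v / (np.linalg.norm(v) + 1e-12) # L2-normalise for dot-product comparison
```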
Second-order Democratic Aggregation
Aggregated second-order features extracted from deep convolutional networks
have been shown to be effective for texture generation, fine-grained
recognition, material classification, and scene understanding. In this paper,
we study a class of orderless aggregation functions designed to minimize
interference or equalize contributions in the context of second-order features.
We show that they can be computed just as efficiently as their first-order
counterparts and that they have favorable properties over aggregation by summation.
Another line of work has shown that matrix power normalization after
aggregation can significantly improve the generalization of second-order
representations. We show that matrix power normalization implicitly equalizes
contributions during aggregation, thus establishing a connection between matrix
normalization techniques and prior work on minimizing interference. Based on
this analysis, we present γ-democratic aggregators that interpolate between sum
pooling (γ=1) and democratic pooling (γ=0), outperforming both on several
classification tasks. Moreover, unlike power normalization, γ-democratic
aggregation can be computed in a low-dimensional space by sketching, which
allows the use of very high-dimensional second-order features. This results in
state-of-the-art performance on several datasets.
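As a rough illustration of the interpolation described above, the sketch below computes γ-democratic weights from the second-order match kernel, which is just the element-wise square of the first-order one, so the weights never require forming the d×d features explicitly. The fixed-point target alpha_i*(K@alpha)_i = (sum_j K_ij)**gamma and all names are a reading of the abstract, not the authors' reference implementation.

```python
import numpy as np

def gamma_democratic_weights(K, gamma=0.5, n_iter=10, eps=1e-8):
    """Fixed-point sketch: find alpha with alpha_i * (K @ alpha)_i close to
    (sum_j K_ij)**gamma. gamma=1 gives alpha=1 (sum pooling); gamma=0 gives
    fully democratic pooling (equal contributions)."""
    target = np.maximum(K.sum(axis=1), eps) ** gamma
    alpha = np.ones(len(K))
    for _ in range(n_iter):
        contrib = alpha * (K @ alpha)
        alpha = alpha * np.sqrt(target / np.maximum(contrib, eps))
    return alpha

def second_order_gamma_democratic(X, gamma=0.5):
    """Aggregate weighted outer products sum_i alpha_i * x_i x_i^T. X: (n, d)."""
    K = (X @ X.T) ** 2                     # <x_i x_i^T, x_j x_j^T> = <x_i, x_j>^2
    alpha = gamma_democratic_weights(K, gamma)
    A = (alpha[:, None] * X).T @ X         # d x d second-order representation
    return A.ravel() / (np.linalg.norm(A) + 1e-12)
```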
Re-ranking for Writer Identification and Writer Retrieval
Automatic writer identification is a common problem in document analysis.
State-of-the-art methods typically focus on the feature extraction step with
traditional or deep-learning-based techniques. In retrieval problems,
re-ranking is a commonly used technique to improve the results. Re-ranking
refines an initial ranking result by using the knowledge contained in the
ranked result, e.g., by exploiting nearest neighbor relations. To the best of
our knowledge, re-ranking has not been used for writer
identification/retrieval. A possible reason might be that publicly available
benchmark datasets contain only a few samples per writer, which makes
re-ranking less promising. We show that a re-ranking step based on k-reciprocal
nearest neighbor relationships is advantageous for writer identification, even
if only a few samples per writer are available. We use these reciprocal
relationships in two ways: encoding them into new vectors, as originally
proposed, or integrating them by means of query expansion. We show that both
techniques outperform the baseline results in terms of mAP on three writer
identification datasets.
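A minimal sketch of the query-expansion variant mentioned above, assuming L2-normalised global descriptors and cosine similarity; the k-reciprocal encoding of the original re-ranking method is more involved, so the helper names and the value of k here are purely illustrative.

```python
import numpy as np

def k_reciprocal_neighbors(sim, q_idx, k):
    """Indices j (excluding the query) that are in the query's top-k and that
    also have the query in their own top-k (reciprocal relation)."""
    forward = [j for j in np.argsort(-sim[q_idx]) if j != q_idx][:k]
    return np.array([j for j in forward
                     if q_idx in np.argsort(-sim[j])[:k + 1]], dtype=int)

def reciprocal_query_expansion(feats, q_idx, k=5):
    """Average the query with its k-reciprocal neighbours and re-query.
    feats: (n, d) global descriptors; returns the re-ranked gallery indices."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = feats @ feats.T
    recip = k_reciprocal_neighbors(sim, q_idx, k)
    expanded = feats[q_idx] + feats[recip].sum(axis=0)   # expanded query descriptor
    expanded /= np.linalg.norm(expanded)
    scores = feats @ expanded
    scores[q_idx] = -np.inf                # drop the query from its own ranking
    return np.argsort(-scores)
```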
Set-Based Face Recognition Beyond Disentanglement: Burstiness Suppression With Variance Vocabulary
Set-based face recognition (SFR) aims to recognize face sets in the
unconstrained scenario, where the appearance of the same identity may change
dramatically under extreme variances (e.g., illumination, pose, expression). We
argue that the two crucial issues in SFR, the face quality and burstiness, are
both identity-irrelevant and variance-relevant. Quality and burstiness
assessment is hindered by entanglement with identity, while face recognition is
hindered by entanglement with variance. Thus we propose
to separate the identity features from the variance features in a lightweight
set-based disentanglement framework. Beyond disentanglement, the
variance features are fully utilized to indicate face quality and burstiness in
a set, rather than being discarded after training. To suppress face burstiness
in the sets, we propose a vocabulary-based burst suppression (VBS) method which
quantizes faces with a reference vocabulary. With inter-word and intra-word
normalization operations on the assignment scores, the face burstiness degrees
are appropriately estimated. The extensive illustrations and experiments
demonstrate the effect of the disentanglement framework with VBS, which
achieves new state-of-the-art results on the SFR benchmarks. The code will be
released at https://github.com/Liubinggunzu/set_burstiness.
Comment: Accepted at ACM MM 2022; code will be released.
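The sketch below illustrates the general idea of vocabulary-based burst suppression as described in the abstract: soft-assign each face's variance feature to a reference vocabulary, normalise the assignment scores per face (inter-word) and per word (intra-word), and down-weight faces that crowd into the same words. The soft-assignment temperature and the exact form of the normalisations are assumptions; the released code should be consulted for the actual method.

```python
import numpy as np

def burst_suppression_weights(var_feats, vocab, tau=0.1):
    """var_feats: (n, d) L2-normalised variance features of the faces in a set.
    vocab:     (k, d) reference vocabulary of variance 'words'.
    Returns convex weights that down-weight bursty faces."""
    sims = var_feats @ vocab.T                       # (n, k) raw assignment scores
    assign = np.exp(sims / tau)
    assign /= assign.sum(axis=1, keepdims=True)      # inter-word normalisation (per face)
    word_mass = assign.sum(axis=0, keepdims=True)    # how many faces crowd each word
    intra = assign / np.maximum(word_mass, 1e-8)     # intra-word normalisation (per word)
    weights = intra.sum(axis=1)                      # faces in crowded words score low
    return weights / weights.sum()
```

The resulting weights would then replace plain averaging when the identity features of a set are aggregated into a single template.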
Generalized Sum Pooling for Metric Learning
A common architectural choice for deep metric learning is a convolutional
neural network followed by global average pooling (GAP). Albeit simple, GAP is
a highly effective way to aggregate information. One possible explanation for
the effectiveness of GAP is to view each feature vector as representing a
different semantic entity and GAP as a convex combination of them. Following
this perspective, we generalize GAP and propose a learnable generalized sum
pooling method (GSP). GSP improves GAP with two distinct abilities: i) the
ability to choose a subset of semantic entities, effectively learning to ignore
nuisance information, and ii) the ability to learn the weights corresponding to
the importance of each entity. Formally, we propose an entropy-smoothed optimal
transport problem and show that it is a strict generalization of GAP, i.e., a
specific realization of the problem gives back GAP. We show that this
optimization problem enjoys analytical gradients enabling us to use it as a
direct learnable replacement for GAP. We further propose a zero-shot loss to
ease the learning of GSP. We show the effectiveness of our method with
extensive evaluations on 4 popular metric learning benchmarks. Code is
available at: GSP-DML Framework.
Comment: Accepted as a conference paper at the International Conference on
Computer Vision (ICCV) 202
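To give a feel for the entropy-smoothed optimal transport view, here is a toy sketch: a two-bin Sinkhorn problem that sends at most a fraction rho of the feature mass to a "keep" bin, driven by per-feature relevance scores, and uses the resulting keep mass as pooling weights. The two-bin construction, the relevance scores and all parameter names are assumptions for illustration rather than the paper's formulation; with rho=1 the weights become uniform and the pooling reduces to GAP, mirroring the strict-generalization property stated above.

```python
import numpy as np

def generalized_sum_pooling_sketch(feats, relevance, rho=0.5, eps=0.1, n_iter=50):
    """feats: (n, d) local feature vectors; relevance: (n,) relevance scores
    (e.g. from a learned linear map). Returns a pooled (d,) vector."""
    n = len(feats)
    C = np.stack([-relevance, np.zeros(n)], axis=1)   # cost to 'keep' vs. 'discard' bin
    K = np.exp(-C / eps)                              # Gibbs kernel of the smoothed OT
    a = np.full(n, 1.0 / n)                           # uniform mass on the features
    b = np.array([rho, 1.0 - rho])                    # bin marginals: keep at most rho
    u, v = np.ones(n), np.ones(2)
    for _ in range(n_iter):                           # Sinkhorn iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                   # transport plan (n, 2)
    w = P[:, 0] / max(P[:, 0].sum(), 1e-12)           # convex pooling weights
    return w @ feats                                  # weighted sum replaces GAP
```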