4 research outputs found
Generalizable Embeddings with Cross-batch Metric Learning
Global average pooling (GAP) is a popular component in deep metric learning
(DML) for aggregating features. Its effectiveness is often attributed to
treating each feature vector as a distinct semantic entity and GAP as a
combination of them. Although substantiated, the algorithmic implications of
this explanation for learning generalizable entities that represent unseen
classes, a crucial goal of DML, remain unclear. To address this, we formulate GAP as a convex
combination of learnable prototypes. We then show that the prototype learning
can be expressed as a recursive process fitting a linear predictor to a batch
of samples. Building on that perspective, we consider two batches of disjoint
classes at each iteration and regularize the learning by expressing the samples
of a batch with the prototypes that are fitted to the other batch. We validate
our approach on 4 popular DML benchmarks.
Comment: © 2023 IEEE. Personal use of this material is permitted.
Permission from IEEE must be obtained for all other uses, in any current or
future media, including reprinting/republishing this material for advertising
or promotional purposes, creating new collective works, for resale or
redistribution to servers or lists, or reuse of any copyrighted component of
this work in other works.
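To make the prototype view above concrete, the following is a minimal PyTorch sketch, not the authors' implementation: it writes GAP as a uniform convex combination of local feature vectors, fits per-batch prototypes with a toy rule (class means), and penalizes how well one batch is expressed by the other batch's prototypes. All shapes, the fitting rule, and the penalty weighting are illustrative assumptions.

```python
# Hedged sketch with assumed shapes and a toy fitting rule; not the paper's code.
import torch

def gap_as_convex_combination(feats):           # feats: (B, C, H, W)
    B, C, H, W = feats.shape
    vectors = feats.flatten(2).transpose(1, 2)  # (B, H*W, C): local "entities"
    weights = torch.full((B, H * W, 1), 1.0 / (H * W))  # uniform convex weights
    return (weights * vectors).sum(dim=1)       # identical to global average pooling

def fit_prototypes(embeddings, labels):
    # Toy "linear predictor" fit: one prototype per class present in the batch.
    classes = labels.unique()
    return torch.stack([embeddings[labels == c].mean(dim=0) for c in classes])

def cross_batch_penalty(emb_other, protos):
    # Express the other batch's embeddings with these prototypes
    # (least-squares coefficients) and penalize the reconstruction residual.
    coeffs = torch.linalg.lstsq(protos.T, emb_other.T).solution.T  # (N, K)
    return (emb_other - coeffs @ protos).pow(2).mean()

# Two batches with disjoint classes, regularized symmetrically:
emb_a, lab_a = torch.randn(32, 128), torch.randint(0, 4, (32,))
emb_b, lab_b = torch.randn(32, 128), torch.randint(4, 8, (32,))
reg = 0.5 * (cross_batch_penalty(emb_b, fit_prototypes(emb_a, lab_a))
             + cross_batch_penalty(emb_a, fit_prototypes(emb_b, lab_b)))
```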
Feature Embedding by Template Matching as a ResNet Block
Convolution blocks serve as local feature extractors and are key to the
success of neural networks. To make the local semantic feature embedding
explicit, we reformulate convolution blocks as feature selection according to
the best matching kernel. In this manner, we show that typical ResNet blocks
indeed perform local feature embedding via template matching once batch
normalization (BN) followed by a rectified linear unit (ReLU) is interpreted
as an arg-max optimizer. Following this perspective, we tailor a residual
block that explicitly enforces semantically meaningful local feature embedding
by using
label information. Specifically, we assign a feature vector to each local
region according to the classes that the corresponding region matches. We
evaluate our method on three popular benchmark datasets with several
architectures for image classification and consistently show that our approach
substantially improves the performance of the baseline architectures.
Comment: Accepted at the British Machine Vision Conference 2022 (BMVC 2022).
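As a rough illustration of the template-matching reading described above, the sketch below (not the paper's block) assigns every spatial location the embedding of its best-matching 1x1 kernel via a hard arg-max; the paper instead obtains this selection by interpreting BN followed by ReLU as an arg-max optimizer. The kernel size, dimensions, and embedding table are assumptions.

```python
import torch
import torch.nn as nn

class TemplateMatchingSketch(nn.Module):
    """Each 1x1 kernel acts as a template; every spatial location receives
    the embedding vector of its best-matching template. The hard arg-max is
    used purely for illustration (and is not differentiable)."""
    def __init__(self, in_channels=64, num_templates=16, embed_dim=64):
        super().__init__()
        self.match = nn.Conv2d(in_channels, num_templates, kernel_size=1)
        self.embed = nn.Embedding(num_templates, embed_dim)

    def forward(self, x):                  # x: (B, C, H, W)
        scores = self.match(x)             # (B, T, H, W): template responses
        best = scores.argmax(dim=1)        # (B, H, W): best-matching template
        local = self.embed(best)           # (B, H, W, D): per-location embedding
        return local.permute(0, 3, 1, 2)   # (B, D, H, W)
```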
Deep Metric Learning with Chance Constraints
Deep metric learning (DML) aims to minimize the empirical expected loss of
pairwise intra-/inter-class proximity violations in the embedding space. We
relate DML to the feasibility problem of finite chance constraints. We show
that the minimizer of proxy-based DML satisfies certain chance constraints,
and that the worst-case generalization performance of proxy-based methods can
be characterized by the radius of the smallest ball around a class proxy that
covers the entire domain of the corresponding class samples, suggesting that
multiple proxies per class help performance. To provide a scalable algorithm
and to exploit more proxies, we consider the chance constraints implied by the
minimizers of proxy-based DML instances and reformulate DML as finding a
feasible point in the intersection of such constraints, a problem that we
approximately solve by iterative projections. Simply put, we repeatedly
train a regularized proxy-based loss and re-initialize the proxies with the
embeddings of the deliberately selected new samples. We apply our method with
the well-accepted losses and evaluate on four popular benchmark datasets for
image retrieval. Outperforming the state of the art, our method consistently
improves the performance of the applied losses. Code is available at:
https://github.com/yetigurbuz/ccp-dml
Comment: Under review at IEEE Transactions on Neural Networks and Learning
Systems.
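The "train, then re-initialize proxies from selected samples" loop can be pictured with the small PyTorch sketch below. The selection rule used here, the same-class sample farthest from the current proxy, is an illustrative guess and not the paper's criterion.

```python
import torch

def reinitialize_proxies(proxies, embeddings, labels):
    """Replace each class proxy with the embedding of a deliberately selected
    sample of that class. The 'farthest from the current proxy' rule is an
    illustrative assumption, not the paper's selection criterion."""
    new_proxies = proxies.clone()
    for c in labels.unique():
        class_emb = embeddings[labels == c]
        dists = (class_emb - proxies[c]).norm(dim=1)
        new_proxies[c] = class_emb[dists.argmax()]
    return new_proxies

# One outer "projection" round would then look like: train a regularized
# proxy-based loss for a while, call reinitialize_proxies on fresh
# embeddings, and repeat until the constraints are approximately satisfied.
```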
Generalized Sum Pooling for Metric Learning
A common architectural choice for deep metric learning is a convolutional
neural network followed by global average pooling (GAP). Albeit simple, GAP is
a highly effective way to aggregate information. One possible explanation for
the effectiveness of GAP is considering each feature vector as representing a
different semantic entity and GAP as a convex combination of them. Following
this perspective, we generalize GAP and propose a learnable generalized sum
pooling method (GSP). GSP improves GAP with two distinct abilities: i) the
ability to choose a subset of semantic entities, effectively learning to ignore
nuisance information, and ii) learning the weights corresponding to the
importance of each entity. Formally, we propose an entropy-smoothed optimal
transport problem and show that it is a strict generalization of GAP, i.e., a
specific realization of the problem gives back GAP. We show that this
optimization problem enjoys analytical gradients enabling us to use it as a
direct learnable replacement for GAP. We further propose a zero-shot loss to
ease the learning of GSP. We show the effectiveness of our method with
extensive evaluations on 4 popular metric learning benchmarks. Code is
available at: GSP-DML Framework
Comment: Accepted as a conference paper at the International Conference on
Computer Vision (ICCV) 2023.
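For intuition only, the module below is a softmax-based stand-in for GSP's entropy-smoothed transport problem: pooling weights are derived from matching local features against learnable prototypes, so poorly matching locations receive lower weight, and with a single prototype the weights become uniform and the layer reduces to GAP. The prototype count, temperature, and weighting rule are assumptions, not the paper's formulation.

```python
import torch
import torch.nn as nn

class WeightedPoolingSketch(nn.Module):
    """Illustrative stand-in for GSP (not the entropy-smoothed OT solver):
    pooling weights come from prototype-matching scores. With a single
    prototype the softmax is constant, the weights are uniform, and the
    output equals plain GAP."""
    def __init__(self, channels=128, num_prototypes=8, temperature=0.1):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, channels))
        self.temperature = temperature

    def forward(self, feats):                     # feats: (B, C, H, W)
        B, C, H, W = feats.shape
        v = feats.flatten(2).transpose(1, 2)      # (B, N, C), N = H * W
        sim = v @ self.prototypes.T               # (B, N, K): matching scores
        relevance = torch.softmax(sim / self.temperature, dim=2).amax(dim=2)
        weights = relevance / relevance.sum(dim=1, keepdim=True)  # convex weights
        return (weights.unsqueeze(-1) * v).sum(dim=1)             # (B, C)
```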