Pairwise Learning via Stagewise Training in Proximal Setting
Pairwise objective paradigms are an essential aspect of
machine learning. Examples of approaches that use pairwise
objective functions include differential networks in face recognition, metric
learning, bipartite learning, multiple kernel learning, and maximization of the area
under the curve (AUC). Compared to pointwise learning, the effective sample size in
pairwise learning grows quadratically with the number of samples, and so does the
complexity. Researchers mostly address this challenge by utilizing an online
learning system. Recent research has, however, offered adaptive sample size
training for smooth loss functions as a better strategy in terms of convergence
and complexity, but without a comprehensive theoretical study. In a distinct
line of research, importance sampling has sparked a considerable amount of
interest in finite pointwise-sum minimization, because the variance of the
stochastic gradient can slow convergence
considerably. In this paper, we combine adaptive sample size and importance
sampling techniques for pairwise learning, with convergence guarantees for
nonsmooth convex pairwise loss functions. In particular, the model is trained
stochastically using an expanded training set for a predefined number of
iterations derived from the stability bounds. In addition, we demonstrate that
sampling instances from opposite classes at each iteration reduces the variance of the
gradient, hence accelerating convergence. Experiments on a broad variety of
datasets in AUC maximization confirm the theoretical results. Comment: 10 pages
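As a rough illustration of the idea this abstract describes, the following minimal Python sketch trains a linear AUC-style scorer with a stagewise, expanding sample size and importance-sampled opposite-class pairs. The function names, the norm-based importance weights, and the per-stage iteration budget are illustrative assumptions, not the authors' algorithm.

```python
# Minimal sketch (not the authors' code) of stagewise pairwise training with
# importance sampling of opposite-class instances, for a linear AUC-style scorer.
import numpy as np

def pairwise_hinge_grad(w, x_pos, x_neg):
    """Subgradient of the pairwise hinge loss max(0, 1 - w.(x_pos - x_neg))."""
    diff = x_pos - x_neg
    return -diff if 1.0 - w @ diff > 0.0 else np.zeros_like(w)

def stagewise_pairwise_sgd(X, y, n_stages=4, iters_per_stage=200, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for stage in range(1, n_stages + 1):
        # Adaptive sample size: each stage trains on a larger prefix of the data.
        m = min(n, (n // n_stages) * stage)
        Xs, ys = X[:m], y[:m]
        pos, neg = np.where(ys == 1)[0], np.where(ys == -1)[0]
        # Importance weights: proportional to feature norms here (purely illustrative);
        # the paper motivates weights chosen to reduce stochastic-gradient variance.
        p_pos = np.linalg.norm(Xs[pos], axis=1); p_pos /= p_pos.sum()
        p_neg = np.linalg.norm(Xs[neg], axis=1); p_neg /= p_neg.sum()
        for _ in range(iters_per_stage):  # fixed iteration budget per stage
            i = rng.choice(pos, p=p_pos)
            j = rng.choice(neg, p=p_neg)
            w -= lr * pairwise_hinge_grad(w, Xs[i], Xs[j])
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 10))
    y = np.where(X[:, 0] + 0.5 * rng.normal(size=500) > 0, 1, -1)
    w = stagewise_pairwise_sgd(X, y)
    scores = X @ w
    auc = (scores[y == 1][:, None] > scores[y == -1][None, :]).mean()
    print(f"empirical AUC: {auc:.3f}")
```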
Variance Reduced Online Gradient Descent for Kernelized Pairwise Learning with Limited Memory
Pairwise learning is essential in machine learning, especially for problems
involving loss functions defined on pairs of training examples. Online gradient
descent (OGD) algorithms have been proposed to handle online pairwise learning,
where data arrives sequentially. However, the pairwise nature of the problem
makes scalability challenging, as the gradient computation for a new sample
involves all past samples. Recent advancements in OGD algorithms have aimed to
reduce the complexity of calculating online gradients, achieving complexities
less than O(T) and even as low as O(1). However, these approaches are
primarily limited to linear models and suffer from induced variance. In this study, we
propose a limited memory OGD algorithm that extends to kernel online pairwise
learning while improving the sublinear regret. Specifically, we establish a
clear connection between the variance of online gradients and the regret, and
construct online gradients using the most recent stratified samples, with a
limited buffer of fixed size representing all past data; this keeps the per-round
gradient complexity proportional to the buffer size, and we employ random Fourier features
for kernel approximation. Importantly, our theoretical results demonstrate that
the variance-reduced online gradients lead to an improved sublinear regret
bound. The experiments on real-world datasets demonstrate the superiority of
our algorithm over both kernelized and linear online pairwise learning
algorithms. Comment: Accepted in ACML 2023
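Below is a hedged sketch of the kind of limited-memory, kernelized online pairwise update the abstract describes: random Fourier features approximate an RBF kernel, and each new example is paired only against a small buffer of recent opposite-label examples. The plain FIFO buffer, feature count, learning rate, and pairwise hinge loss are simplifications and assumptions, not the paper's exact construction (which uses stratified sampling).

```python
# Illustrative sketch (not the paper's implementation) of online pairwise gradient
# descent with a small buffer and random Fourier features for kernel approximation.
import numpy as np
from collections import deque

class RFFPairwiseOGD:
    def __init__(self, dim, n_features=100, buffer_size=32, lr=0.05, gamma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Random Fourier features approximating the RBF kernel exp(-gamma ||x - x'||^2).
        self.W = rng.normal(scale=np.sqrt(2 * gamma), size=(n_features, dim))
        self.b = rng.uniform(0, 2 * np.pi, size=n_features)
        self.theta = np.zeros(n_features)
        self.lr = lr
        # Limited memory: only the most recent examples stand in for all past data.
        self.buffer = deque(maxlen=buffer_size)

    def _phi(self, x):
        return np.sqrt(2.0 / len(self.b)) * np.cos(self.W @ x + self.b)

    def partial_fit(self, x, y):
        """One online step: pair the new example with buffered opposite-label examples."""
        phi_x = self._phi(x)
        grad = np.zeros_like(self.theta)
        opposite = [(phi_b, y_b) for phi_b, y_b in self.buffer if y_b != y]
        for phi_b, _ in opposite:
            diff = (phi_x - phi_b) if y == 1 else (phi_b - phi_x)
            if self.theta @ diff < 1.0:            # pairwise hinge loss is active
                grad -= diff
        if opposite:
            self.theta -= self.lr * grad / len(opposite)
        self.buffer.append((phi_x, y))

    def score(self, x):
        return self.theta @ self._phi(x)

# Usage: stream examples once, in arrival order.
if __name__ == "__main__":
    rng = np.random.default_rng(2)
    model = RFFPairwiseOGD(dim=5)
    for _ in range(2000):
        x = rng.normal(size=5)
        y = 1 if x.sum() > 0 else -1
        model.partial_fit(x, y)
    print("score of a positive-leaning point:", round(model.score(np.ones(5)), 3))
```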
SCALE: Online Self-Supervised Lifelong Learning without Prior Knowledge
Unsupervised lifelong learning refers to the ability to learn over time while
memorizing previous patterns without supervision. Previous works assumed strong
prior knowledge about the incoming data (e.g., knowing the class boundaries)
which can be impossible to obtain in complex and unpredictable environments. In
this paper, motivated by real-world scenarios, we formally define the online
unsupervised lifelong learning problem with class-incremental streaming data,
which is non-iid and single-pass. The problem is more challenging than existing
lifelong learning problems due to the absence of labels and prior knowledge. To
address the issue, we propose Self-Supervised ContrAstive Lifelong LEarning
(SCALE) which extracts and memorizes knowledge on-the-fly. SCALE is designed
around three major components: a pseudo-supervised contrastive loss, a
self-supervised forgetting loss, and an online memory update for uniform subset
selection. All three components are designed to work collaboratively to
maximize learning performance. Our loss functions leverage pairwise similarity and
thus remove the dependency on supervision or prior knowledge. We perform
comprehensive experiments of SCALE under iid and four non-iid data streams.
SCALE outperforms the best state-of-the-art algorithm in all settings, with
improvements of up to 3.83%, 2.77% and 5.86% in kNN accuracy on the CIFAR-10,
CIFAR-100 and SubImageNet datasets. Comment: Submitted for review
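To make the label-free pairwise idea concrete, here is a minimal sketch of a pseudo-supervised contrastive loss driven only by pairwise similarity; the similarity threshold, temperature, and thresholding rule are illustrative assumptions, not SCALE's actual loss.

```python
# Hedged sketch of a contrastive loss whose "positives" come from pairwise
# similarity alone (no labels), in the spirit of the SCALE abstract.
import numpy as np

def pseudo_supervised_contrastive_loss(z, tau=0.5, pos_threshold=0.8):
    """z: (n, d) array of L2-normalised embeddings from the current batch + memory."""
    sim = z @ z.T                                  # cosine similarity (z is normalised)
    np.fill_diagonal(sim, -np.inf)                 # never contrast an example with itself
    # Pseudo-positives: pairs whose similarity exceeds a threshold; all other batch
    # members act as negatives, removing any dependency on class labels.
    pos_mask = sim > pos_threshold
    logits = sim / tau
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    losses = []
    for i in range(len(z)):
        if pos_mask[i].any():
            losses.append(-log_prob[i, pos_mask[i]].mean())
    return float(np.mean(losses)) if losses else 0.0

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    z = rng.normal(size=(16, 8))
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    print("loss on a random batch:", round(pseudo_supervised_contrastive_loss(z), 3))
```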
Ranked List Loss for Deep Metric Learning
The objective of deep metric learning (DML) is to learn embeddings that can
capture semantic similarity and dissimilarity information among data points.
Existing pairwise or tripletwise loss functions used in DML are known to suffer
from slow convergence due to a large proportion of trivial pairs or triplets as
the model improves. To address this, ranking-motivated structured losses have
recently been proposed to incorporate multiple examples and exploit the structured
information among them. They converge faster and achieve state-of-the-art
performance. In this work, we unveil two limitations of existing
ranking-motivated structured losses and propose a novel ranked list loss to
solve both of them. First, given a query, only a fraction of data points is
incorporated to build the similarity structure. Consequently, some useful
examples are ignored and the structure is less informative. To address this, we
propose to build a set-based similarity structure by exploiting all instances
in the gallery. The learning setting can be interpreted as few-shot retrieval:
given a mini-batch, every example is iteratively used as a query, and the
remaining examples compose the gallery to search, i.e., the support set in the
few-shot setting. The gallery examples are split into a positive set and a
negative set. For every
mini-batch, the learning objective of ranked list loss is to make the query
closer to the positive set than to the negative set by a margin. Second,
previous methods aim to pull positive pairs as close as possible in the
embedding space. As a result, the intraclass data distribution tends to be
extremely compressed. In contrast, we propose to learn a hypersphere for each
class in order to preserve useful similarity structure inside it, which
functions as regularisation. Extensive experiments demonstrate the superiority
of our proposal by comparing with the state-of-the-art methods. Comment: Accepted to T-PAMI. Therefore, to read the official version, please go
to IEEE Xplore. Fine-grained image retrieval task. Our source code is
available online: https://github.com/XinshaoAmosWang/Ranked-List-Loss-for-DM
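For intuition, a minimal sketch of a ranked-list-style objective is given below: every example in a mini-batch acts as the query in turn, positives are penalised only outside a hypersphere of radius (alpha - margin), which avoids compressing the intra-class distribution, and negatives are pushed beyond alpha. The constants and the omission of the paper's weighting of violators are simplifications, not the T-PAMI formulation.

```python
# Hedged sketch of a ranked-list-style loss over one mini-batch of embeddings.
import numpy as np

def ranked_list_loss(embeddings, labels, alpha=1.2, margin=0.4):
    losses = []
    for q in range(len(embeddings)):
        d = np.linalg.norm(embeddings - embeddings[q], axis=1)   # distances to the gallery
        pos = (labels == labels[q]) & (np.arange(len(labels)) != q)
        neg = labels != labels[q]
        # Positives: penalise only points outside the class hypersphere, so the
        # intra-class distribution is not collapsed to a single point.
        pos_loss = np.maximum(0.0, d[pos] - (alpha - margin))
        # Negatives: push every violator out of the alpha-ball around the query.
        neg_loss = np.maximum(0.0, alpha - d[neg])
        losses.append(pos_loss.sum() + neg_loss.sum())
    return float(np.mean(losses))

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    emb = rng.normal(size=(12, 16))
    lab = np.repeat(np.arange(4), 3)          # 4 classes, 3 samples each
    print("batch loss:", round(ranked_list_loss(emb, lab), 3))
```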