Contrastive Learning Can Find An Optimal Basis For Approximately View-Invariant Functions
Contrastive learning is a powerful framework for learning self-supervised
representations that generalize well to downstream supervised tasks. We show
that multiple existing contrastive learning methods can be reinterpreted as
learning kernel functions that approximate a fixed positive-pair kernel. We
then prove that a simple representation obtained by combining this kernel with
PCA provably minimizes the worst-case approximation error of linear predictors,
under a straightforward assumption that positive pairs have similar labels. Our
analysis is based on a decomposition of the target function in terms of the
eigenfunctions of a positive-pair Markov chain, and a surprising equivalence
between these eigenfunctions and the output of Kernel PCA. We give
generalization bounds for downstream linear prediction using our Kernel PCA
representation, and show empirically on a set of synthetic tasks that applying
Kernel PCA to contrastive learning models can indeed approximately recover the
Markov chain eigenfunctions, although the accuracy depends on the kernel
parameterization as well as on the augmentation strength.
Comment: Published at ICLR 202
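As a rough illustration of the pipeline this abstract describes, the sketch below runs Kernel PCA on a Gram matrix built from contrastive-style embeddings. It is a minimal sketch under stated assumptions: `embed` is a placeholder for a trained encoder (not the paper's model), the inner product of normalized embeddings stands in for the learned positive-pair kernel, and the component count is arbitrary.

```python
import numpy as np

def kernel_pca(K, num_components):
    """Kernel PCA on a precomputed Gram matrix K (n x n): returns the
    top components, i.e., eigenvectors scaled by sqrt(eigenvalues)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = H @ K @ H                           # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(Kc)    # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:num_components]
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

# `embed` is a stand-in for a trained contrastive encoder.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
embed = lambda x: x / np.linalg.norm(x, axis=1, keepdims=True)
F = embed(X)
K = F @ F.T                           # learned kernel ~ positive-pair kernel
Z = kernel_pca(K, num_components=8)   # representation for linear prediction
```

In the paper's framing, the columns of `Z` play the role of the positive-pair Markov chain eigenfunctions evaluated at the sample points.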
Stein Variational Gradient Descent with Multiple Kernels
Stein variational gradient descent (SVGD) and its variants have shown
promising successes in approximate inference for complex distributions. In
practice, we notice that the kernel used in SVGD-based methods has a decisive
effect on the empirical performance. The radial basis function (RBF) kernel
with the median heuristic is a common choice in previous approaches, but
unfortunately it has proven to be sub-optimal. Inspired by the paradigm of
Multiple Kernel Learning (MKL), our solution to this flaw is to use a
combination of multiple kernels to approximate the optimal kernel, rather than
a single one, which may limit performance and flexibility. Specifically, we
first extend Kernelized Stein Discrepancy (KSD) to its multiple-kernel view,
called Multiple Kernelized Stein Discrepancy (MKSD), and then leverage MKSD to
construct a general algorithm, Multiple Kernel SVGD (MK-SVGD). Further,
MK-SVGD can automatically assign a weight to each kernel without any
additional parameters, which means that our method not only removes the
dependence on a single optimal kernel but also maintains computational
efficiency. Experiments on various tasks and models demonstrate that our
proposed method consistently matches or outperforms the competing methods.
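The following minimal sketch shows an SVGD update that mixes several RBF kernels, in the spirit of MK-SVGD. The uniform kernel weights are an illustrative stand-in for the MKSD-derived weighting described in the abstract, and the bandwidths and step size are arbitrary choices.

```python
import numpy as np

def rbf_kernel_and_grad(X, h):
    """RBF kernel matrix K[j, i] = k(x_j, x_i) with bandwidth h, and
    gradK[j, i] = grad_{x_j} k(x_j, x_i), as used in the SVGD update."""
    diff = X[:, None, :] - X[None, :, :]          # (n, n, d)
    K = np.exp(-np.sum(diff ** 2, axis=-1) / h)
    gradK = -2.0 / h * diff * K[:, :, None]
    return K, gradK

def mk_svgd_step(X, score, bandwidths, weights, step=1e-1):
    """One SVGD update using a convex combination of RBF kernels;
    score(X) returns grad log p(x) at each particle."""
    n = X.shape[0]
    grads = score(X)                              # (n, d)
    phi = np.zeros_like(X)
    for h, w in zip(bandwidths, weights):
        K, gradK = rbf_kernel_and_grad(X, h)
        # Attraction toward high density plus a repulsive spreading term.
        phi += w * (K @ grads + gradK.sum(axis=0)) / n
    return X + step * phi

# Toy target: standard Gaussian, so grad log p(x) = -x.
rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, size=(100, 2))
for _ in range(200):
    X = mk_svgd_step(X, lambda x: -x,
                     bandwidths=[0.5, 1.0, 2.0], weights=[1/3, 1/3, 1/3])
```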
Client-server multi-task learning from distributed datasets
A client-server architecture to simultaneously solve multiple learning tasks
from distributed datasets is described. In this architecture, each client is
associated with an individual learning task and the associated dataset of
examples. The goal of the architecture is to perform information fusion from
multiple datasets while preserving privacy of individual data. The role of the
server is to collect data in real time from the clients and codify the
information in a common database. The information coded in this database can be
used by all the clients to solve their individual learning task, so that each
client can exploit the informative content of all the datasets without actually
having access to private data of others. The proposed algorithmic framework,
based on regularization theory and kernel methods, uses a suitable class of
mixed effect kernels. The new method is illustrated through a simulated music
recommendation system.
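A minimal sketch of the mixed-effect idea: a kernel that blends a shared component with a task-specific component, plugged into kernel ridge regression over the pooled client data. The specific kernel form, the mixing weight `lam`, and the regression solver are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """Standard RBF kernel between row-wise sample matrices A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def mixed_effect_kernel(X, tasks, lam=0.5):
    """K = lam * shared kernel + (1 - lam) * task-specific kernel.
    The task-specific part masks the shared kernel to same-task pairs,
    one illustrative member of the mixed-effect kernel class."""
    K = rbf(X, X)
    same_task = (tasks[:, None] == tasks[None, :]).astype(float)
    return lam * K + (1 - lam) * K * same_task

# Pooled data from 3 "clients"; the server only sees (X, y, task ids).
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 4))
tasks = np.repeat(np.arange(3), 30)
y = X[:, 0] + 0.1 * tasks + 0.05 * rng.normal(size=90)

K = mixed_effect_kernel(X, tasks)
alpha = np.linalg.solve(K + 1e-2 * np.eye(len(y)), y)  # kernel ridge regression
```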
Kernel Modulation: A Parameter-Efficient Method for Training Convolutional Neural Networks
Deep Neural Networks, particularly Convolutional Neural Networks (ConvNets),
have achieved incredible success in many vision tasks, but they usually
require millions of parameters to reach good accuracy. With increasing
applications that use ConvNets, updating hundreds of networks for multiple
tasks on an embedded device can be costly in terms of memory, bandwidth, and
energy. Approaches to reduce this cost include model compression and
parameter-efficient models that adapt a subset of network layers for each new
task. This work proposes a novel parameter-efficient kernel modulation (KM)
method that adapts all parameters of a base network instead of a subset of
layers. KM uses lightweight task-specialized kernel modulators that require
only an additional 1.4% of the base network parameters. With multiple tasks,
only the task-specialized KM weights are communicated and stored on the
end-user device. We applied this method in training ConvNets for Transfer
Learning and Meta-Learning scenarios. Our results show that KM delivers up to
9% higher accuracy than other parameter-efficient methods on the Transfer
Learning benchmark.
Comment: Accepted at 2022 26th International Conference on Pattern Recognition (ICPR)
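A hedged sketch of the kernel-modulation idea: a frozen base convolution whose weights are rescaled by a small trainable modulator. The modulator shape below (one scalar per channel pair) is an assumption for illustration; the paper's exact parameterization, and its 1.4% overhead figure, may differ.

```python
import torch
import torch.nn.functional as F

class ModulatedConv2d(torch.nn.Module):
    """Conv layer whose frozen base weight is rescaled by a small
    task-specific modulator (illustrative, not the paper's exact KM)."""

    def __init__(self, base_weight):
        super().__init__()
        self.register_buffer("base_weight", base_weight)  # frozen base network
        out_ch, in_ch, _, _ = base_weight.shape
        # One scalar per (out, in) channel pair: far fewer parameters than
        # the full kernel, which also has k*k spatial entries.
        self.modulator = torch.nn.Parameter(torch.ones(out_ch, in_ch, 1, 1))

    def forward(self, x):
        return F.conv2d(x, self.base_weight * self.modulator, padding=1)

base = torch.randn(16, 3, 3, 3)          # stand-in for pretrained 3x3 weights
layer = ModulatedConv2d(base)
out = layer(torch.randn(2, 3, 32, 32))   # only `modulator` trains per task
```

Per task, only the modulator tensors would be stored and communicated, which is what makes the approach attractive on embedded devices.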
Generalizing Supervised Deep Learning MRI Reconstruction to Multiple and Unseen Contrasts using Meta-Learning Hypernetworks
Meta-learning has recently emerged as a data-efficient learning technique
for various medical imaging operations and has helped advance contemporary
deep learning models. Furthermore, meta-learning enhances the knowledge
generalization of the imaging tasks by learning both shared and discriminative
weights for various configurations of imaging tasks. However, existing
meta-learning models attempt to learn a single set of weight initializations of
a neural network that might be restrictive for multimodal data. This work aims
to develop a multimodal meta-learning model for image reconstruction, which
augments meta-learning with evolutionary capabilities to encompass diverse
acquisition settings of multimodal data. Our proposed model, called KM-MAML
(Kernel Modulation-based Multimodal Meta-Learning), has hypernetworks that
evolve to generate mode-specific weights. These weights provide the
mode-specific inductive bias for multiple modes by re-calibrating each kernel
of the base network for image reconstruction via a low-rank kernel modulation
operation. We incorporate gradient-based meta-learning (GBML) in the contextual
space to update the weights of the hypernetworks for different modes. The
hypernetworks and the reconstruction network in the GBML setting provide
discriminative mode-specific features and low-level image features,
respectively. Experiments on multi-contrast MRI reconstruction show that our
model (i) exhibits superior reconstruction performance over joint training,
other meta-learning methods, and context-specific MRI reconstruction methods,
and (ii) exhibits better adaptation capabilities, with improvement margins of
0.5 dB in PSNR and 0.01 in SSIM. In addition, a representation analysis with
U-Net shows that
kernel modulation infuses 80% of mode-specific representation changes in the
high-resolution layers. Our source code is available at
https://github.com/sriprabhar/KM-MAML/.
Comment: Accepted for publication in Elsevier Applied Soft Computing Journal, 36 pages, 18 figures
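To make the low-rank kernel modulation concrete, here is a toy hypernetwork that maps a mode embedding to a rank-1 rescaling of a base kernel. The embedding size, the rank, and the network shape are assumptions for illustration, not the KM-MAML architecture.

```python
import torch

class LowRankModulator(torch.nn.Module):
    """Hypernetwork sketch: maps a mode embedding to a rank-1 modulation
    of a base conv kernel (illustrative rank and layer sizes)."""

    def __init__(self, out_ch, in_ch, embed_dim=8):
        super().__init__()
        self.to_u = torch.nn.Linear(embed_dim, out_ch)
        self.to_v = torch.nn.Linear(embed_dim, in_ch)

    def forward(self, base_weight, mode_embedding):
        u = self.to_u(mode_embedding)               # (out_ch,)
        v = self.to_v(mode_embedding)               # (in_ch,)
        mod = torch.outer(u, v)[:, :, None, None]   # rank-1, (out, in, 1, 1)
        return base_weight * (1.0 + mod)            # re-calibrated kernel

base = torch.randn(16, 3, 3, 3)        # stand-in for base network weights
hyper = LowRankModulator(16, 3)
w_mode = hyper(base, torch.randn(8))   # mode-specific weights for one contrast
```

In a GBML setting, the hypernetwork parameters would be updated in the outer loop so that each mode embedding yields good reconstruction weights after adaptation.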
Efficient Multi-Template Learning for Structured Prediction
Conditional random field (CRF) and Structural Support Vector Machine
(Structural SVM) are two state-of-the-art methods for structured prediction
that capture the interdependencies among output variables. The success of
these methods is attributed to the fact that their discriminative models are
able to account for overlapping features on the whole input observations. These
features are usually generated by applying a given set of templates on labeled
data, but improper templates may lead to degraded performance. To alleviate
this issue, we propose a novel multiple-template learning paradigm that learns
the structured prediction model and the importance of each template
simultaneously, so that hundreds of arbitrary templates can be added to the
learning model without concern. This paradigm can be formulated as a special
multiple kernel learning problem with an exponential number of constraints. We
then introduce an efficient cutting-plane algorithm to solve this problem in
the primal and present its convergence analysis. We also evaluate the proposed
learning paradigm on two widely-studied structured prediction tasks,
i.e., sequence labeling and dependency parsing. Extensive experimental
results show that the proposed method outperforms CRFs and Structural SVMs
because it exploits the importance of each template. Our complexity analysis
and empirical results also show that our proposed method is more efficient
than OnlineMKL on very sparse and high-dimensional data. We further extend
this paradigm for structured prediction using generalized ℓp-block-norm
regularization with p ≥ 1, and experiments show competitive performance when …
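Since the abstract is truncated, the sketch below only illustrates the general idea it describes: weighting template-generated feature blocks in an MKL-style alternating scheme. It is a simplified reweighting heuristic in the spirit of ℓp-block-norm MKL, not the paper's cutting-plane algorithm, and the two "templates" are toy feature maps.

```python
import numpy as np

def block_weights(W_blocks, p=2.0):
    """Reweight feature blocks by their coefficient norms, a simplified
    heuristic in the spirit of l_p-block-norm MKL."""
    norms = np.array([np.linalg.norm(w) for w in W_blocks]) + 1e-12
    d = norms ** (2.0 / (p + 1.0))
    return d / d.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
blocks = [X, X ** 2]                        # two "templates" as feature blocks
y = X[:, 0] + 0.05 * rng.normal(size=100)   # only the first template matters

d = np.ones(2) / 2                           # start with uniform importance
for _ in range(10):                          # alternate: fit, then reweight
    Phi = np.hstack([np.sqrt(dm) * B for dm, B in zip(d, blocks)])
    w = np.linalg.solve(Phi.T @ Phi + 1e-2 * np.eye(10), Phi.T @ y)
    d = block_weights([w[:5], w[5:]], p=2.0)
# `d` now concentrates on the informative template, mirroring the idea of
# learning template importance jointly with the predictor.
```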