Learning to Forget for Meta-Learning
Few-shot learning is a challenging problem where the goal is to achieve
generalization from only a few examples. Model-agnostic meta-learning (MAML)
tackles the problem by formulating prior knowledge as a common initialization
across tasks, which is then used to quickly adapt to unseen tasks. However,
forcibly sharing an initialization can lead to conflicts among tasks and to a
compromised location on the optimization landscape (one undesired by the tasks),
thereby hindering task adaptation. Further, we observe that the degree of conflict
differs among not only tasks but also layers of a neural network. Thus, we
propose task-and-layer-wise attenuation on the compromised initialization to
reduce its influence. As the attenuation dynamically controls (or selectively
forgets) the influence of prior knowledge for a given task and each layer, we
name our method L2F (Learn to Forget). The experimental results demonstrate
that the proposed method provides faster adaptation and greatly improves
performance. Furthermore, L2F can be easily applied to other state-of-the-art
MAML-based frameworks and improves them, illustrating its simplicity and
generalizability.
Comment: CVPR 2020. Code at https://github.com/baiksung/L2
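The task-and-layer-wise attenuation described in this abstract can be illustrated with a minimal sketch. It assumes that each layer of the shared initialization is multiplied by a factor in (0, 1), and that this factor is produced from a per-layer summary of the task's gradients. The function name `attenuate_initialization`, the single linear gate, and the parameters `gate_weights` and `gate_bias` are hypothetical choices for illustration and are not taken from the L2F code.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attenuate_initialization(init_params, task_grads, gate_weights, gate_bias):
    """Scale each layer of a shared initialization by a task-conditioned factor.

    init_params: list of arrays, one per layer (the shared initialization).
    task_grads:  list of arrays with the same shapes (gradients on the task).
    gate_weights, gate_bias: parameters of a hypothetical linear gate with
        one output per layer.
    """
    # Assumption: summarize each layer by the mean of its gradient on the task.
    layer_stats = np.array([g.mean() for g in task_grads])
    # One attenuation factor per layer, squashed into (0, 1).
    gammas = sigmoid(gate_weights @ layer_stats + gate_bias)
    # A factor near 0 "forgets" the layer's prior; a factor near 1 keeps it.
    attenuated = [gamma * w for gamma, w in zip(gammas, init_params)]
    return attenuated, gammas


# Usage with two toy layers and an untrained (all-zero) gate.
init = [np.ones((2, 2)), np.full(3, 2.0)]
grads = [np.zeros((2, 2)), np.zeros(3)]
attenuated, gammas = attenuate_initialization(
    init, grads, gate_weights=np.zeros((2, 2)), gate_bias=np.zeros(2)
)
```

In a MAML-style pipeline, the attenuated parameters would replace the shared initialization as the starting point of the inner-loop adaptation for that task, and the gate would be trained in the outer loop together with the initialization.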
Writer adaptation for offline text recognition: An exploration of neural network-based methods
Handwriting recognition has seen significant success with the use of deep
learning. However, a persistent shortcoming of neural networks is that they are
not well-equipped to deal with shifting data distributions. In the field of
handwritten text recognition (HTR), this shows itself in poor recognition
accuracy for writers that are not similar to those seen during training. An
ideal HTR model should be adaptive to new writing styles in order to handle the
vast amount of possible writing styles. In this paper, we explore how HTR
models can be made writer adaptive by using only a handful of examples from a
new writer (e.g., 16 examples) for adaptation. Two HTR architectures are used
as base models, using a ResNet backbone along with either an LSTM or
Transformer sequence decoder. Using these base models, two methods are
considered to make them writer adaptive: 1) model-agnostic meta-learning
(MAML), an algorithm commonly used for tasks such as few-shot classification,
and 2) writer codes, an idea originating from automatic speech recognition.
Results show that an HTR-specific version of MAML known as MetaHTR improves
performance compared to the baseline with a 1.4 to 2.0 improvement in word
error rate (WER). The improvement due to writer adaptation is between 0.2 and
0.7 WER, where a deeper model seems to lend itself better to adaptation using
MetaHTR than a shallower model. However, applying MetaHTR to larger HTR models
or sentence-level HTR may become prohibitive due to its high computational and
memory requirements. Lastly, writer codes based on learned features or Hinge
statistical features did not lead to improved recognition performance.
Comment: 21 pages including appendices, 6 figures, 10 tables
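The results in this abstract are reported as word error rate (WER). As a reference point, the sketch below computes WER under the common edit-distance definition: the minimum number of word substitutions, insertions and deletions needed to turn the hypothesis into the reference, divided by the number of reference words. This definition is an assumption here; the paper may compute or aggregate WER differently (for instance over isolated word images), and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Usage: one substituted word out of three reference words.
wer = word_error_rate("the cat sat", "the hat sat")
```

Writer adaptation would then be evaluated by comparing this score before and after adapting the model on the handful of examples from the new writer.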
Learning with Constraint Learning: New Perspective, Solution Strategy and Various Applications
The complexity of learning problems, such as Generative Adversarial Network
(GAN) and its variants, multi-task and meta-learning, hyper-parameter learning,
and a variety of real-world vision applications, demands a deeper understanding
of their underlying coupling mechanisms. Existing approaches often address
these problems in isolation, lacking a unified perspective that can reveal
commonalities and enable effective solutions. Therefore, in this work, we
propose a new framework, named Learning with Constraint Learning (LwCL), that
can holistically examine challenges and provide a unified methodology to tackle
all the above-mentioned complex learning and vision problems. Specifically,
LwCL is designed as a general hierarchical optimization model that captures the
essence of these diverse learning and vision problems. Furthermore, we develop
a gradient-response based fast solution strategy to overcome optimization
challenges of the LwCL framework. Our proposed framework efficiently addresses
a wide range of applications in learning and vision, encompassing three
categories and nine different problem types. Extensive experiments on synthetic
tasks and real-world applications verify the effectiveness of our approach. The
LwCL framework offers a comprehensive solution for tackling complex machine
learning and computer vision problems, bridging the gap between theory and
practice.
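The abstract does not write out its hierarchical optimization model. Purely as a point of reference, a generic bilevel (hierarchical) optimization problem can be stated as below. The symbols x, y, F and f are assumptions chosen for illustration and are not LwCL's own notation; the paper's actual model may add constraints or structure not shown here.

```latex
\min_{x \in \mathcal{X}} \; F\bigl(x, y^{*}(x)\bigr)
\quad \text{s.t.} \quad
y^{*}(x) \in \operatorname*{arg\,min}_{y \in \mathcal{Y}} \; f(x, y)
```

In hyper-parameter learning, for example, x would play the role of the hyperparameters, y the model weights, f the training loss and F the validation loss. The coupling between the two levels, where the upper objective depends on the solution of the lower one, is what makes such problems hard to solve.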
Generalizing Supervised Deep Learning MRI Reconstruction to Multiple and Unseen Contrasts using Meta-Learning Hypernetworks
Meta-learning has recently been an emerging data-efficient learning technique
for various medical imaging operations and has helped advance contemporary deep
learning models. Furthermore, meta-learning enhances the knowledge
generalization of the imaging tasks by learning both shared and discriminative
weights for various configurations of imaging tasks. However, existing
meta-learning models attempt to learn a single set of weight initializations of
a neural network that might be restrictive for multimodal data. This work aims
to develop a multimodal meta-learning model for image reconstruction, which
augments meta-learning with evolutionary capabilities to encompass diverse
acquisition settings of multimodal data. Our proposed model, called KM-MAML
(Kernel Modulation-based Multimodal Meta-Learning), has hypernetworks that
evolve to generate mode-specific weights. These weights provide the
mode-specific inductive bias for multiple modes by re-calibrating each kernel
of the base network for image reconstruction via a low-rank kernel modulation
operation. We incorporate gradient-based meta-learning (GBML) in the contextual
space to update the weights of the hypernetworks for different modes. The
hypernetworks and the reconstruction network in the GBML setting provide
discriminative mode-specific features and low-level image features,
respectively. Experiments on multi-contrast MRI reconstruction show that our
model (i) exhibits superior reconstruction performance over joint training,
other meta-learning methods, and context-specific MRI reconstruction methods,
and (ii) has better adaptation capabilities, with improvement margins of 0.5 dB
in PSNR and 0.01 in SSIM. Besides, a representation analysis with U-Net shows that
kernel modulation infuses 80% of mode-specific representation changes in the
high-resolution layers. Our source code is available at
https://github.com/sriprabhar/KM-MAML/.
Comment: Accepted for publication in Elsevier Applied Soft Computing Journal,
36 pages, 18 figures
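The low-rank kernel modulation mentioned in this abstract can be sketched as follows. The sketch assumes that each spatial kernel of a convolution weight is rescaled by one entry of a low-rank matrix built from two small factors, which here stand in for the mode-specific outputs of the hypernetworks. The function name `modulate_kernels`, the factor names `u` and `v`, and the multiplicative form itself are assumptions for illustration; the exact operation used in KM-MAML may differ.

```python
import numpy as np


def modulate_kernels(weight, u, v):
    """Rescale every (out, in) kernel of a conv weight by a low-rank factor.

    weight: array of shape (out_channels, in_channels, k, k).
    u:      array of shape (out_channels, r).
    v:      array of shape (r, in_channels).
    """
    # (out_channels, in_channels) scaling matrix of rank at most r.
    scale = u @ v
    # Broadcast one scalar over each k x k spatial kernel.
    return weight * scale[:, :, None, None]


# Usage: a rank-2 modulation of an 8 x 4 bank of 3 x 3 kernels.
rng = np.random.default_rng(0)
weight = rng.standard_normal((8, 4, 3, 3))
u = rng.standard_normal((8, 2))
v = rng.standard_normal((2, 4))
modulated = modulate_kernels(weight, u, v)
```

The appeal of a low-rank factorization is the parameter count: a full per-kernel scaling needs out_channels x in_channels values per mode, while the two factors need only r x (out_channels + in_channels).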
Learning an Explicit Hyperparameter Prediction Policy Conditioned on Tasks
Meta learning has recently attracted much attention in the machine learning
community. In contrast to conventional machine learning, which aims to learn inherent
prediction rules to predict labels for new query data, meta learning aims to
learn the learning methodology for machine learning from observed tasks, so as
to generalize to new query tasks by leveraging the meta-learned learning
methodology. In this study, we interpret such learning methodology as learning
an explicit hyperparameter prediction policy shared by all training tasks.
Specifically, this policy is represented as a parameterized function called
meta-learner, mapping from a training/test task to its suitable hyperparameter
setting, extracted from a pre-specified function set called meta learning
machine. Such a setting guarantees that the meta-learned learning methodology
can flexibly fit diverse query tasks, whereas many current meta learning methods
obtain only fixed hyperparameters and are therefore less adaptable to variations
across query tasks. This understanding of meta learning also makes it easy to
draw on traditional learning theory to analyze its generalization bounds with
general losses/tasks/models. The theory naturally
leads to some feasible control strategies for improving the quality of
the extracted meta-learner, which are verified to improve its
generalization capability in some typical meta learning applications, including
few-shot regression, few-shot classification and domain generalization.
Comment: 59 pages. arXiv admin note: text overlap with arXiv:1904.03758 by
other authors
A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning
Forgetting refers to the loss or deterioration of previously acquired
information or knowledge. While the existing surveys on forgetting have
primarily focused on continual learning, forgetting is a prevalent phenomenon
observed in various other research domains within deep learning. Forgetting
manifests in research fields such as generative models due to generator shifts,
and federated learning due to heterogeneous data distributions across clients.
Addressing forgetting encompasses several challenges, including balancing the
retention of old task knowledge with fast learning of new tasks, managing task
interference with conflicting goals, and preventing privacy leakage.
Moreover, most existing surveys on continual learning implicitly assume that
forgetting is always harmful. In contrast, our survey argues that forgetting is
a double-edged sword and can be beneficial and desirable in certain cases, such
as privacy-preserving scenarios. By exploring forgetting in a broader context,
we aim to present a more nuanced understanding of this phenomenon and highlight
its potential advantages. Through this comprehensive survey, we aspire to
uncover potential solutions by drawing upon ideas and approaches from various
fields that have dealt with forgetting. By examining forgetting beyond its
conventional boundaries, in future work, we hope to encourage the development
of novel strategies for mitigating, harnessing, or even embracing forgetting in
real applications. A comprehensive list of papers about forgetting in various
research fields is available at
https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning.