9 research outputs found
Look-Ahead Selective Plasticity for Continual Learning
Recent progress in contrastive representation learning has shown that it can yield robust representations that avoid catastrophic forgetting in continual learning tasks. Most of these methods avoid forgetting by limiting changes in components of the deep neural network (DNN) that hold significant information about previously seen tasks. While these methods have been successful in preserving the aspects of learned parameters believed to be most relevant for distinguishing previous classes, the retained parameters may be overfitted to seen data, leading to poor generalization even though “forgetting” is avoided. Inspired by the modulation of early sensory neurons by top-down feedback projections from cortical neurons in perception and visual processing, we propose a class-incremental continual learning algorithm that identifies and attempts to preserve weights that contribute to the model performing well on new, unseen classes, by assessing their generalizability on a small predictive batch of the next episode of data. With experiments on popular image classification datasets, we demonstrate the effectiveness of the proposed approach and explain how using the model’s first encounter with new data to simulate a feedback signal for modulating the plasticity of weights provides more information for training than using the loss value alone, and how it can guide the model’s learning through new experiences.
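The general look-ahead idea — score each weight on a small batch from the upcoming episode and modulate its plasticity accordingly — can be sketched as follows. This is a minimal NumPy illustration under our own assumptions (a toy softmax-regression model and a hypothetical `grad` helper), not the authors' actual algorithm or network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear classifier trained on the current task (4 features, 3 classes).
W = rng.normal(size=(4, 3))

def grad(W, X, y):
    """Softmax cross-entropy gradient w.r.t. W (illustrative helper)."""
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1.0
    return X.T @ p / len(y)

# A small "look-ahead" batch from the next episode (assumption: the
# learner sees one predictive batch before training on that episode).
X_next = rng.normal(size=(8, 4))
y_next = rng.integers(0, 3, size=8)

# Weights with a large look-ahead gradient are treated as important for
# the new classes and protected: their plasticity is reduced.
importance = np.abs(grad(W, X_next, y_next))
plasticity = 1.0 / (1.0 + importance / importance.mean())

# Plasticity-modulated SGD step on the current task's batch.
X_cur = rng.normal(size=(8, 4))
y_cur = rng.integers(0, 3, size=8)
W -= 0.1 * plasticity * grad(W, X_cur, y_cur)
```

The plasticity mask plays the role of the simulated feedback signal: it carries per-weight information from the first encounter with new data, rather than a single scalar loss.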
Data-Efficient Learning via Minimizing Hyperspherical Energy
Deep learning on large-scale data is dominant nowadays. The unprecedented
scale of data has arguably been one of the most important driving forces for
the success of deep learning. However, there still exist scenarios where
collecting data or labels can be extremely expensive, e.g., medical imaging
and robotics. To fill this gap, this paper considers the problem of
data-efficient learning from scratch using a small amount of representative
data. First, we characterize this problem by active learning on homeomorphic
tubes of spherical manifolds. This naturally generates a feasible hypothesis
class. With homologous topological properties, we identify an important
connection -- finding tube manifolds is equivalent to minimizing hyperspherical
energy (MHE) in physical geometry. Inspired by this connection, we propose an
MHE-based active learning (MHEAL) algorithm, and provide comprehensive
theoretical guarantees for MHEAL, covering convergence and generalization
analysis. Finally, we demonstrate the empirical performance of MHEAL in a wide
range of applications on data-efficient learning, including deep clustering,
distribution matching, version space sampling, and deep active learning.
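To make the hyperspherical-energy objective concrete, here is a small NumPy sketch of the Riesz s-energy on the unit sphere, paired with a naive greedy selection loop as an illustrative stand-in for active sample selection. The greedy scheme and all names here are our own assumptions, not the MHEAL algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(1)

def hyperspherical_energy(V, s=1.0, eps=1e-8):
    """Riesz s-energy of unit vectors V: sum over pairs of 1/||v_i - v_j||^s."""
    d = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=-1)
    iu = np.triu_indices(len(V), k=1)
    return float(np.sum(1.0 / (d[iu] + eps) ** s))

# Pool of candidate embeddings, projected onto the unit sphere.
pool = rng.normal(size=(50, 8))
pool /= np.linalg.norm(pool, axis=1, keepdims=True)

# Greedy selection: repeatedly add the candidate that keeps the energy of
# the selected set lowest, i.e., the most "spread out" (representative) point.
selected = [0]
for _ in range(4):
    rest = [i for i in range(len(pool)) if i not in selected]
    best = min(rest, key=lambda i: hyperspherical_energy(pool[selected + [i]]))
    selected.append(best)
```

Low energy corresponds to well-separated points on the sphere, which is why minimizing it yields a small set of representative samples.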
Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
Large foundation models are becoming ubiquitous, but training them from
scratch is prohibitively expensive. Thus, efficiently adapting these powerful
models to downstream tasks is increasingly important. In this paper, we study a
principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream
task adaptation. Despite demonstrating good generalizability, OFT still uses a
fairly large number of trainable parameters due to the high dimensionality of
orthogonal matrices. To address this, we start by examining OFT from an
information transmission perspective, and then identify a few key desiderata
that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast
Fourier transform algorithm enables efficient information transmission, we
propose an efficient orthogonal parameterization using butterfly structures. We
apply this parameterization to OFT, creating a novel parameter-efficient
finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a
special case, BOFT introduces a generalized orthogonal finetuning framework.
Finally, we conduct an extensive empirical study of adapting large vision
transformers, large language models, and text-to-image diffusion models to
various downstream tasks in vision and language.
Comment: Technical Report (33 pages, 18 figures)
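The butterfly structure behind this parameterization can be illustrated in a few lines: compose log2(d) sparse orthogonal factors, each rotating index pairs at a fixed stride, exactly as the Cooley-Tukey FFT routes information. This is a minimal NumPy sketch of the general butterfly construction under our own assumptions, not BOFT's exact parameterization.

```python
import numpy as np

def butterfly_factor(d, stride, thetas):
    """Orthogonal factor rotating index pairs (i, i + stride) by angles thetas."""
    B = np.eye(d)
    k = 0
    for block in range(0, d, 2 * stride):
        for i in range(block, block + stride):
            j = i + stride
            c, s = np.cos(thetas[k]), np.sin(thetas[k])
            B[[i, i], [i, j]] = c, -s
            B[[j, j], [i, j]] = s, c
            k += 1
    return B

# Compose log2(d) factors into a dense orthogonal matrix R, using only
# (d/2) * log2(d) angles instead of the O(d^2) entries of a full
# orthogonal matrix -- the source of the parameter savings.
d = 8
rng = np.random.default_rng(2)
R = np.eye(d)
for stride in (1, 2, 4):
    R = butterfly_factor(d, stride, rng.uniform(0, 2 * np.pi, d // 2)) @ R
```

Each factor is a product of disjoint 2x2 rotations, so it is orthogonal, and a product of orthogonal factors stays orthogonal; the frozen pretrained weights are then multiplied by R during finetuning.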