
    Look-Ahead Selective Plasticity for Continual Learning

    Recent progress in contrastive representation learning has shown that it can yield robust representations that avoid catastrophic forgetting in continual learning tasks. Most of these methods avoid forgetting by limiting changes to components of the deep neural network (DNN) that hold significant information about previously seen tasks. While these methods have been successful in preserving the learned parameters believed to be most relevant for distinguishing previous classes, the retained parameters may be overfitted to seen data, leading to poor generalization even though “forgetting” is avoided. Inspired by the modulation of early sensory neurons by top-down feedback projections from cortical neurons in perception and visual processing, we propose a class-incremental continual learning algorithm that identifies and attempts to preserve weights that contribute to the model performing well on new, unseen classes, by assessing their generalizability on a small predictive batch from the next episode of data. With experiments on popular image classification datasets, we demonstrate the effectiveness of the proposed approach and explain how using the model’s first encounter with new data to simulate a feedback signal for modulating weight plasticity provides more information for training than using the loss value alone, and how it can guide the model’s learning through new experiences.
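The core idea — gating per-parameter plasticity by how each weight's gradient looks on a small "look-ahead" batch of the next episode — can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact algorithm: the scoring rule (normalized absolute look-ahead gradient) and the function names are assumptions.

```python
import numpy as np

def plasticity_mask(grad_lookahead, temperature=1.0):
    """Map gradients computed on a small look-ahead batch of the next
    episode to per-parameter plasticity values in [0, 1].

    Parameters whose look-ahead gradients are near zero are treated as
    already useful for the new data and are protected (mask -> 0);
    parameters with large look-ahead gradients remain plastic (mask -> 1).
    This scoring rule is a hypothetical simplification.
    """
    score = np.abs(grad_lookahead)
    score = score / (score.max() + 1e-12)  # normalize to [0, 1]
    return score ** temperature            # 0 = frozen, 1 = fully plastic

def modulated_sgd_step(w, grad_task, grad_lookahead, lr=0.1):
    """SGD step whose per-parameter learning rate is gated by the mask."""
    return w - lr * plasticity_mask(grad_lookahead) * grad_task
```

In this sketch the look-ahead batch plays the role of the top-down feedback signal: it modulates how much each weight is allowed to move during training on the current task, rather than entering the loss directly.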

    Data-Efficient Learning via Minimizing Hyperspherical Energy

    Deep learning on large-scale data is dominant nowadays. The unprecedented scale of data has arguably been one of the most important driving forces behind the success of deep learning. However, there still exist scenarios where collecting data or labels is extremely expensive, e.g., medical imaging and robotics. To fill this gap, this paper considers the problem of data-efficient learning from scratch using a small amount of representative data. First, we characterize this problem by active learning on homeomorphic tubes of spherical manifolds. This naturally generates a feasible hypothesis class. Using homologous topological properties, we identify an important connection: finding tube manifolds is equivalent to minimizing hyperspherical energy (MHE) in physical geometry. Inspired by this connection, we propose an MHE-based active learning (MHEAL) algorithm and provide comprehensive theoretical guarantees for MHEAL, covering convergence and generalization analysis. Finally, we demonstrate the empirical performance of MHEAL in a wide range of data-efficient learning applications, including deep clustering, distribution matching, version space sampling, and deep active learning.
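Hyperspherical energy is the Riesz potential of a point set on the unit sphere: for s = 1 it is the sum of inverse pairwise Euclidean distances, so minimizing it spreads points apart. The sketch below computes this energy and performs a naive greedy selection of representative points that keeps the running energy small. This is an illustrative simplification, assuming a plain greedy rule; the actual MHEAL algorithm and its guarantees are in the paper.

```python
import numpy as np

def hyperspherical_energy(V, eps=1e-12):
    """Riesz s-energy (s = 1) of row vectors V: sum over pairs of
    1 / ||v_i - v_j||. Lower energy means a more uniform spread."""
    D = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=-1)
    iu = np.triu_indices(len(V), k=1)
    return float(np.sum(1.0 / (D[iu] + eps)))

def greedy_mhe_select(pool, k, seed=0):
    """Greedily pick k points from `pool` (rows are normalized to the unit
    sphere), each step adding the candidate that keeps the hyperspherical
    energy of the selected set smallest. A naive O(k * n) sketch."""
    rng = np.random.default_rng(seed)
    pool = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    chosen = [int(rng.integers(len(pool)))]
    remaining = set(range(len(pool))) - set(chosen)
    for _ in range(k - 1):
        best, best_e = None, np.inf
        for j in remaining:
            e = hyperspherical_energy(pool[chosen + [j]])
            if e < best_e:
                best, best_e = j, e
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

For two antipodal unit vectors the pairwise distance is 2, so the energy is 0.5 — a quick sanity check on the definition.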

    Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization

    Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.
    Comment: Technical Report (33 pages, 18 figures)
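The parameter saving comes from the butterfly structure itself: a dense d x d orthogonal matrix has d(d-1)/2 degrees of freedom, whereas an FFT-style butterfly composition of log2(d) stages of 2x2 rotations uses only (d/2) * log2(d) angles while remaining exactly orthogonal. The sketch below builds such a matrix; the pairing scheme and parameter layout are illustrative assumptions, not BOFT's exact parameterization.

```python
import numpy as np

def butterfly_orthogonal(thetas):
    """Compose an orthogonal d x d matrix (d = 2**m) from m stages of
    2x2 Givens rotations arranged in an FFT-style butterfly pattern.

    thetas: array of shape (m, d // 2), one rotation angle per index
    pair per stage. At stage s, index i is paired with i + 2**s
    (indices differing in bit s), mirroring the Cooley-Tukey pattern.
    """
    m, half = thetas.shape
    d = 2 * half
    assert d == 2 ** m, "d must be a power of two with m = log2(d) stages"
    Q = np.eye(d)
    for stage in range(m):
        B = np.zeros((d, d))
        stride = 2 ** stage
        k = 0
        paired = set()
        for i in range(d):
            if i in paired:
                continue
            j = i + stride  # partner index differing in bit `stage`
            c, s = np.cos(thetas[stage, k]), np.sin(thetas[stage, k])
            B[i, i], B[i, j] = c, -s
            B[j, i], B[j, j] = s, c
            paired.update((i, j))
            k += 1
        Q = B @ Q  # each stage is orthogonal, so the product is too
    return Q
```

Because every stage is a block of disjoint plane rotations, the product is orthogonal by construction — no orthogonality penalty or projection step is needed during finetuning.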

    Reinforced Neighborhood Selection Guided Multi-Relational Graph Neural Networks
