
    Scalable Recollections for Continual Lifelong Learning

    Given the recent success of Deep Learning applied to a variety of single tasks, it is natural to consider more human-realistic settings. Perhaps the most difficult of these settings is that of continual lifelong learning, where the model must learn online over a continuous stream of non-stationary data. A successful continual lifelong learning system must have three key capabilities: it must learn and adapt over time, it must not forget what it has learned, and it must be efficient in both training time and memory. Recent techniques have focused their efforts primarily on the first two capabilities, while questions of efficiency remain largely unexplored. In this paper, we consider the problem of efficient and effective storage of experiences over very large time-frames. In particular, we consider the case where typical experiences are O(n) bits and memories are limited to O(k) bits for k << n. We present a novel scalable architecture and training algorithm in this challenging domain and provide an extensive evaluation of its performance. Our results show that we can achieve considerable gains on top of state-of-the-art methods such as GEM.
    Comment: AAAI 201
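    The memory-budget setting in this abstract (experiences of O(n) bits, memories of O(k) bits with k << n) can be illustrated with a minimal sketch, assuming PyTorch: an autoencoder compresses each incoming experience to a small code that is stored in the replay buffer and decoded back into an approximate experience for rehearsal. All class names, layer sizes, and the autoencoder itself are illustrative assumptions, not the paper's architecture.

```python
# Illustrative sketch only: store compressed codes instead of raw experiences.
# Assumes PyTorch; names and sizes are hypothetical, not the paper's design.
import torch
import torch.nn as nn

class Recollector(nn.Module):
    def __init__(self, n_features=784, k_code=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 128), nn.ReLU(),
                                     nn.Linear(128, k_code))
        self.decoder = nn.Sequential(nn.Linear(k_code, 128), nn.ReLU(),
                                     nn.Linear(128, n_features))

    def compress(self, x):           # roughly O(k) storage per experience
        return self.encoder(x)

    def recall(self, code):          # approximate O(n) reconstruction for replay
        return self.decoder(code)

buffer = []                          # replay buffer holds codes, not raw data
model = Recollector()
x = torch.randn(32, 784)             # a batch of incoming experiences
with torch.no_grad():
    buffer.append(model.compress(x)) # store compressed recollections
rehearsed = model.recall(buffer[0])  # decode later for replay-based training
```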

    Chroma-VAE: Mitigating Shortcut Learning with Generative Classifiers

    Deep neural networks are susceptible to shortcut learning, using simple features to achieve low training loss without discovering essential semantic structure. Contrary to prior belief, we show that generative models alone are not sufficient to prevent shortcut learning, despite an incentive to recover a more comprehensive representation of the data than discriminative approaches. However, we observe that shortcuts are preferentially encoded with minimal information, a fact that generative models can exploit to mitigate shortcut learning. In particular, we propose Chroma-VAE, a two-pronged approach where a VAE classifier is initially trained to isolate the shortcut in a small latent subspace, allowing a secondary classifier to be trained on the complementary, shortcut-free latent subspace. In addition to demonstrating the efficacy of Chroma-VAE on benchmark and real-world shortcut learning tasks, our work highlights the potential for manipulating the latent space of generative classifiers to isolate or interpret specific correlations.
    Comment: Presented at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022)
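    A minimal sketch of the two-subspace idea described above, assuming PyTorch: the latent code is split into a small "shortcut" subspace seen by the first classification head, while a second classifier is trained only on the complementary subspace. Layer sizes, names, and the training schedule are assumptions for illustration, not the Chroma-VAE recipe.

```python
# Minimal sketch of a split-latent VAE classifier; illustrative only.
import torch
import torch.nn as nn

class SplitLatentVAE(nn.Module):
    def __init__(self, d_in=784, d_shortcut=2, d_rest=30, n_classes=10):
        super().__init__()
        d_z = d_shortcut + d_rest
        self.enc = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(),
                                 nn.Linear(256, 2 * d_z))  # mean and log-variance
        self.dec = nn.Sequential(nn.Linear(d_z, 256), nn.ReLU(),
                                 nn.Linear(256, d_in))
        self.d_shortcut = d_shortcut
        # Stage 1: this head only sees the small subspace, so easy (shortcut)
        # features tend to concentrate there during joint VAE training.
        self.shortcut_head = nn.Linear(d_shortcut, n_classes)
        # Stage 2: a separate classifier trained on the complementary subspace.
        self.clean_head = nn.Linear(d_rest, n_classes)

    def encode(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return z[:, :self.d_shortcut], z[:, self.d_shortcut:]

model = SplitLatentVAE()
x = torch.randn(8, 784)
z_shortcut, z_rest = model.encode(x)
stage1_logits = model.shortcut_head(z_shortcut)  # trained jointly with the VAE
stage2_logits = model.clean_head(z_rest)         # trained afterwards, shortcut-free
```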

    Dynamic Narrowing of VAE Bottlenecks Using GECO and L0 Regularization

    When designing variational autoencoders (VAEs) or other types of latent space models, the dimensionality of the latent space is typically defined upfront. In this process, it is possible that the number of dimensions is under- or over-provisioned for the application at hand. In case the dimensionality is not predefined, this parameter is usually determined using time- and resource-consuming cross-validation. For these reasons we have developed a technique to shrink the latent space dimensionality of VAEs automatically and on-the-fly during training, using Generalized ELBO with Constrained Optimization (GECO) and the L0-Augment-REINFORCE-Merge (L0-ARM) gradient estimator. The GECO optimizer ensures that we do not violate a predefined upper bound on the reconstruction error. This paper presents the algorithmic details of our method along with experimental results on five different datasets. We find that our training procedure is stable and that the latent space can be pruned effectively without violating the GECO constraints.
    Comment: 16 pages, 3 figures, 1 table
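    A hedged sketch of the two ingredients named in the abstract, assuming PyTorch: a GECO-style multiplicative update of a Lagrange multiplier that enforces an upper bound on reconstruction error, and per-dimension gates on the latent code. For brevity the gates below are plain sigmoids rather than the L0-ARM estimator the paper uses; all constants and names are illustrative.

```python
# Sketch only: GECO-style constrained objective plus soft gates on latent dims.
# The L0-ARM estimator from the paper is replaced here by deterministic sigmoids.
import torch

d_z, tol = 32, 0.05                       # latent size, reconstruction tolerance
gate_logits = torch.zeros(d_z, requires_grad=True)   # one gate per latent dim
lam = torch.tensor(1.0)                   # GECO Lagrange multiplier

def step(recon_error, kl, lr_lambda=0.01):
    """One GECO-style update: the multiplier grows while the constraint
    (recon_error <= tol) is violated and shrinks once it is satisfied."""
    global lam
    constraint = recon_error - tol
    loss = kl + lam * constraint          # constrained ELBO surrogate
    with torch.no_grad():
        lam = lam * torch.exp(lr_lambda * constraint)   # multiplicative update
    return loss

gates = torch.sigmoid(gate_logits)        # soft 0/1 masks; an L0-style penalty on
z = torch.randn(4, d_z) * gates           # the gates would drive unused dims to zero
print(step(recon_error=torch.tensor(0.08), kl=torch.tensor(1.2)))
```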

    Representation Learning and Applications in Local Differential Privacy

    Latent variable models (LVMs) provide an elegant, efficient, and interpretable approach to learning the generative process of observed data. Latent variables can capture salient features within often highly-correlated data, forming powerful tools in machine learning. For high-dimensional data, LVMs are typically parameterised by deep neural networks and trained by maximising a variational lower bound on the data log-likelihood. These models often suffer from poor use of their latent variable, with ad-hoc annealing factors used to encourage retention of information in the latent variable. In this work, we first introduce a novel approach to latent variable modelling, based on an objective that encourages both data reconstruction and generation. This ensures by design that the latent representations capture information about the data. Second, we consider a novel approach to inducing local differential privacy (LDP) in high dimensions with a specifically-designed LVM. LDP offers a rigorous approach to preserving one’s privacy against both adversaries and the database administrator. Existing LDP mechanisms struggle to retain data utility in high dimensions owing to prohibitive noise requirements. We circumvent this by inducing LDP on the low-dimensional manifold underlying the data. Further, we introduce a novel approach for downstream model learning using LDP training data, enabling the training of performant machine learning models. We achieve significant performance gains over current state-of-the-art LDP mechanisms, demonstrating far-reaching implications for the widespread practice of data collection and sharing. Finally, we scale up this approach, adapting current state-of-the-art representation learning models to induce LDP in even higher dimensions, further widening the scope of LDP mechanisms for high-dimensional data collection.
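    The manifold-based LDP idea can be illustrated with a generic sketch, assuming PyTorch: records are encoded into a low-dimensional latent space, clipped to bound sensitivity, and perturbed with Laplace noise before release. The clipping bound, epsilon, and the Laplace mechanism here are generic assumptions for illustration, not the specific mechanism developed in the thesis.

```python
# Generic sketch: add privacy noise in a learned low-dimensional space instead
# of the raw high-dimensional data. Assumes PyTorch; all constants are illustrative.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 8))

def ldp_release(x, clip=1.0, epsilon=2.0):
    """Encode, clip to bound the L1 sensitivity, then add Laplace noise."""
    z = encoder(x)
    z = z.clamp(-clip, clip)                        # per-coordinate clipping
    sensitivity = 2.0 * clip * z.shape[-1]          # worst-case L1 change
    noise = torch.distributions.Laplace(0.0, sensitivity / epsilon).sample(z.shape)
    return z + noise                                # privatised low-dim record

x = torch.randn(16, 784)                            # raw high-dimensional records
private_z = ldp_release(x)                          # shared instead of x
```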