Scalable Recollections for Continual Lifelong Learning
Given the recent success of Deep Learning applied to a variety of single
tasks, it is natural to consider more human-realistic settings. Perhaps the
most difficult of these settings is that of continual lifelong learning, where
the model must learn online over a continuous stream of non-stationary data. A
successful continual lifelong learning system must have three key capabilities:
it must learn and adapt over time, it must not forget what it has learned, and
it must be efficient in both training time and memory. Recent techniques have
focused their efforts primarily on the first two capabilities while questions
of efficiency remain largely unexplored. In this paper, we consider the problem
of efficient and effective storage of experiences over very large time-frames.
In particular, we consider the case where typical experiences are O(n) bits and
memories are limited to O(k) bits for k << n. We present a novel scalable
architecture and training algorithm in this challenging domain and provide an
extensive evaluation of its performance. Our results show that we can achieve
considerable gains on top of state-of-the-art methods such as GEM.
Comment: AAAI 201
Chroma-VAE: Mitigating Shortcut Learning with Generative Classifiers
Deep neural networks are susceptible to shortcut learning, using simple
features to achieve low training loss without discovering essential semantic
structure. Contrary to prior belief, we show that generative models alone are
not sufficient to prevent shortcut learning, despite an incentive to recover a
more comprehensive representation of the data than discriminative approaches.
However, we observe that shortcuts are preferentially encoded with minimal
information, a fact that generative models can exploit to mitigate shortcut
learning. In particular, we propose Chroma-VAE, a two-pronged approach where a
VAE classifier is initially trained to isolate the shortcut in a small latent
subspace, allowing a secondary classifier to be trained on the complementary,
shortcut-free latent subspace. In addition to demonstrating the efficacy of
Chroma-VAE on benchmark and real-world shortcut learning tasks, our work
highlights the potential for manipulating the latent space of generative
classifiers to isolate or interpret specific correlations.
Comment: Presented at the 36th Conference on Neural Information Processing
Systems (NeurIPS 2022)
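The latent-space partition described in the Chroma-VAE abstract can be illustrated with a minimal sketch. It assumes the shortcut has already been isolated in the first `shortcut_dim` coordinates of each latent vector, and it uses a nearest-centroid rule as a stand-in for the secondary classifier. The names `split_latent`, `fit_centroids`, and `predict` are hypothetical, and the abstract does not specify the classifier the authors actually use.

```python
def split_latent(z, shortcut_dim):
    """Split a latent vector into (shortcut part, complementary part).

    Assumption: the shortcut lives in the first `shortcut_dim` coordinates.
    """
    if not 0 < shortcut_dim < len(z):
        raise ValueError("shortcut_dim must leave both subspaces non-empty")
    return z[:shortcut_dim], z[shortcut_dim:]


def fit_centroids(latents, labels, shortcut_dim):
    """Fit a stand-in secondary classifier on the complementary subspace only."""
    sums, counts = {}, {}
    for z, y in zip(latents, labels):
        _, z_free = split_latent(z, shortcut_dim)
        acc = sums.setdefault(y, [0.0] * len(z_free))
        for i, v in enumerate(z_free):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    # Per-class mean of the shortcut-free coordinates.
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, z, shortcut_dim):
    """Predict the class whose centroid is nearest in the complementary subspace."""
    _, z_free = split_latent(z, shortcut_dim)

    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(z_free, center))

    return min(centroids, key=lambda y: sqdist(centroids[y]))
```

Because the secondary classifier never reads the shortcut coordinates, a test input whose shortcut coordinate disagrees with its semantic coordinates is still classified by the latter.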
Dynamic Narrowing of VAE Bottlenecks Using GECO and L0 Regularization
When designing variational autoencoders (VAEs) or other types of latent space
models, the dimensionality of the latent space is typically defined upfront. In
this process, it is possible that the number of dimensions is under- or
overprovisioned for the application at hand. In case the dimensionality is not
predefined, this parameter is usually determined using time- and
resource-consuming cross-validation. For these reasons we have developed a
technique to shrink the latent space dimensionality of VAEs automatically and
on-the-fly during training using Generalized ELBO with Constrained Optimization
(GECO) and the L0-Augment-REINFORCE-Merge (L0-ARM) gradient estimator.
The GECO optimizer ensures that we are not violating a predefined upper bound
on the reconstruction error. This paper presents the algorithmic details of our
method along with experimental results on five different datasets. We find that
our training procedure is stable and that the latent space can be pruned
effectively without violating the GECO constraints.
Comment: 16 pages, 3 figures, 1 table
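The role GECO plays in the abstract above, keeping the reconstruction error under a predefined bound, can be sketched as a Lagrange-multiplier update. This is a minimal illustration of the general GECO idea, not the paper's implementation: the function names, the moving-average factor `alpha`, and `step_size` are assumptions, and the L0-ARM pruning of latent dimensions is not shown.

```python
import math


def geco_step(lam, constraint_ma, recon_error, tolerance,
              alpha=0.99, step_size=0.1):
    """One hypothetical GECO-style multiplier update.

    The constraint (recon_error - tolerance) is <= 0 when the bound holds.
    """
    constraint = recon_error - tolerance
    # An exponential moving average smooths minibatch noise in the constraint.
    constraint_ma = alpha * constraint_ma + (1.0 - alpha) * constraint
    # The multiplicative update keeps lam positive: it grows while the bound
    # is violated (constraint_ma > 0) and shrinks once it is satisfied.
    lam = lam * math.exp(step_size * constraint_ma)
    return lam, constraint_ma


def geco_loss(kl, recon_error, tolerance, lam):
    """Lagrangian: minimize the KL term subject to the reconstruction bound."""
    return kl + lam * (recon_error - tolerance)
```

In a training loop, `geco_loss` would be minimized with respect to the model parameters while `geco_step` is called once per batch to adjust `lam`.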
Representation Learning and Applications in Local Differential Privacy
Latent variable models (LVMs) provide an elegant, efficient, and interpretable approach to learning the generation process of observed data. Latent variables can capture salient features within often highly-correlated data, forming powerful tools in machine learning.
For high-dimensional data, LVMs are typically parameterised by deep neural networks, and trained by maximising a variational lower bound on the data log likelihood. These models often suffer from poor use of their latent variable, with ad-hoc annealing factors used to encourage retention of information in the latent variable. In this work, we first introduce a novel approach to latent variable modelling, based on an objective that encourages both data reconstruction and generation. This ensures by design that the latent representations capture information about the data.
Second, we consider a novel approach to inducing local differential privacy (LDP) in high dimensions with a specifically-designed LVM. LDP offers a rigorous approach to preserving one’s privacy against both adversaries and the database administrator. Existing LDP mechanisms struggle to retain data utility in high dimensions owing to prohibitive noise requirements. We circumvent this by inducing LDP on the low-dimensional manifold underlying the data. Further, we introduce a novel approach for downstream model learning using LDP training data, enabling the training of performant machine learning models. We achieve significant performance gains over current state-of-the-art LDP mechanisms, demonstrating far-reaching implications for the widespread practice of data collection and sharing.
Finally, we scale up this approach, adapting current state-of-the-art representation learning models to induce LDP in even higher dimensions, further widening the scope of LDP mechanisms for high-dimensional data collection.
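To make the noise argument in this abstract concrete, the sketch below applies the standard Laplace mechanism to a clipped latent vector. It is an illustration under stated assumptions, not the mechanism from the thesis, which the abstract does not specify: `laplace_scale` and `privatize_latent` are hypothetical names, and each coordinate is assumed to be clipped to `[-clip, clip]`, so the L1 sensitivity is `2 * clip * dim`.

```python
import random


def laplace_scale(clip, dim, epsilon):
    """Laplace noise scale giving epsilon-LDP for vectors in [-clip, clip]^dim.

    Two clipped vectors differ by at most 2 * clip * dim in L1 norm.
    """
    return 2.0 * clip * dim / epsilon


def privatize_latent(z, clip, epsilon, rng=random):
    """Clip a latent vector and add independent Laplace noise per coordinate."""
    scale = laplace_scale(clip, len(z), epsilon)
    clipped = [max(-clip, min(clip, v)) for v in z]
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    return [
        v + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        for v in clipped
    ]
```

The required scale grows linearly with the dimension, which is why privatizing a short latent code needs far less noise per coordinate than privatizing the raw high-dimensional input at the same `epsilon`.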