Exploiting diversity for efficient machine learning
A common practice for solving machine learning problems is currently to consider
each problem in isolation, starting from scratch every time a new learning problem
is encountered or a new model is proposed. This is a perfectly feasible solution
when the problems are sufficiently easy or, if the problem is hard, when a large
amount of resources, both in terms of training data and computation, is
available. Although this naive approach has been the main focus of research in
machine learning for a few decades and has had a lot of success, it becomes infeasible
if the problem is too hard in proportion to the available resources. When using
a complex model in this naive approach, it is necessary to collect large data
sets (if possible at all) to avoid overfitting and hence it is also necessary to use
large computational resources to handle the increased amount of data, first during
training to process a large data set and then also at test time to execute a complex
model.
An alternative to this strategy of treating each learning problem independently
is to leverage related data sets and computation encapsulated in previously
trained models. By doing that we can decrease the amount of data necessary to
reach a satisfactory level of performance and, consequently, improve the accuracy
achievable and decrease training time. Our attack on this problem is to exploit
diversity - in the structure of the data set, in the features learnt and in the
inductive biases of different neural network architectures.
In the setting of learning from multiple sources we introduce multiple-source
cross-validation, which gives an unbiased estimator of the test error when the data
set is composed of data coming from multiple sources and the data at test time are
coming from a new unseen source. We also propose new estimators of variance of
the standard k-fold cross-validation and multiple-source cross-validation, which
have lower bias than previously known ones.
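
As a rough illustration of the setting described above, the sketch below holds out one source at a time, so every test fold comes from a source unseen during training. The function names and the `fit` and `error` callables are hypothetical placeholders, and averaging the per-source errors with equal weight is an assumption, not necessarily the estimator defined in the thesis.

```python
from collections import defaultdict


def multiple_source_folds(sources):
    """Yield (held_out_source, train_idx, test_idx), one fold per source.

    `sources[i]` is the source label of example i. Each fold tests on all
    examples of one source and trains on the examples of all other sources.
    """
    by_source = defaultdict(list)
    for i, s in enumerate(sources):
        by_source[s].append(i)
    for held_out in sorted(by_source):
        test_idx = by_source[held_out]
        train_idx = sorted(
            i for s, idx in by_source.items() if s != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


def multiple_source_cv_error(sources, fit, error):
    """Average the held-out error over sources (equal weighting is assumed)."""
    errs = []
    for _, train_idx, test_idx in multiple_source_folds(sources):
        model = fit(train_idx)               # hypothetical: train on these indices
        errs.append(error(model, test_idx))  # hypothetical: error on held-out source
    return sum(errs) / len(errs)
```

Unlike standard k-fold cross-validation, no source contributes examples to both the training and the test side of a fold.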
To improve unsupervised learning we introduce scheduled denoising autoencoders,
which learn a more diverse set of features than the standard denoising
autoencoder. This is thanks to their training procedure, which starts with a
high level of noise, when the network learns coarse features, and then lowers
the noise gradually, which allows the network to learn more local
features. A connection between this training procedure and curriculum learning
is also drawn. We further develop the idea of learning a diverse representation
by explicitly incorporating the goal of obtaining a diverse representation into the
training objective. The proposed model, the composite denoising autoencoder,
learns multiple subsets of features focused on modelling variations in the data set
at different levels of granularity.
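
The scheduled training procedure described above can be sketched as a noise level that decreases over epochs. This is a minimal sketch under stated assumptions: the linear schedule, the masking-noise corruption, and the names `noise_schedule` and `corrupt` are illustrative choices, not details confirmed by the abstract.

```python
import random


def noise_schedule(start, end, n_epochs):
    """Noise levels decreasing linearly from `start` to `end` (assumed shape)."""
    if n_epochs == 1:
        return [start]
    step = (start - end) / (n_epochs - 1)
    return [start - k * step for k in range(n_epochs)]


def corrupt(x, noise_level, rng):
    """Masking noise: set each input to zero with probability `noise_level`."""
    return [0.0 if rng.random() < noise_level else v for v in x]


# Usage: early epochs see heavily corrupted inputs, later epochs lightly
# corrupted ones. The autoencoder would be trained to reconstruct `x`
# from `corrupt(x, level, rng)` at each epoch.
rng = random.Random(0)
x = [0.2, 0.9, 0.5, 0.7]
for level in noise_schedule(0.7, 0.1, 4):
    noisy_x = corrupt(x, level, rng)
```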
Finally, we introduce the idea of model blending, a variant of model compression,
in which the two models, the teacher and the student, are both strong
models but differ in their inductive biases. As an example, we train convolutional
networks using the guidance of bidirectional long short-term memory
(LSTM) networks. This allows the convolutional neural network to be trained to be
more accurate than the LSTM network at no extra cost at test time.
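
One common way to train a student with the guidance of a teacher, as in model compression, is to mix the usual loss on the true label with a loss on the teacher's predicted class probabilities. The sketch below shows that generic formulation. The mixing weight `lam` and the function names are assumptions, and the exact objective used in the thesis may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, pred_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a prediction."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, pred_probs))


def blended_loss(student_logits, teacher_probs, label, lam=0.5):
    """Weighted sum of hard-label loss and teacher-guided soft-target loss.

    lam = 1.0 ignores the teacher; lam = 0.0 trains only on teacher outputs.
    """
    student_probs = softmax(student_logits)
    one_hot = [1.0 if k == label else 0.0 for k in range(len(student_logits))]
    hard = cross_entropy(one_hot, student_probs)
    soft = cross_entropy(teacher_probs, student_probs)
    return lam * hard + (1.0 - lam) * soft
```

The teacher is only needed during training, which is why the student incurs no extra cost at test time.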
Accelerating and Privatizing Diffusion Models
Diffusion models (DMs) have emerged as a powerful class of generative models. DMs offer both state-of-the-art synthesis quality and sample diversity in combination with a robust and scalable learning objective. DMs rely on a diffusion process that gradually perturbs the data towards a normal distribution, while the neural network learns to denoise. Formally, the problem reduces to learning the score function, i.e., the gradient of the log-density of the perturbed data. The reverse of the diffusion process can be approximated by a differential equation, defined by the learned score function, and can therefore be used for generation when starting from random noise. In this thesis, we give a thorough and beginner-friendly introduction to DMs and discuss their history starting from early work on score-based generative models. Furthermore, we discuss connections to other statistical models and lay out applications of DMs, with a focus on image generative modeling.
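
For orientation, the quantities mentioned above can be written out in a standard form from the score-based generative modeling literature. The notation is an assumption made here for illustration: $\alpha_t$ and $\sigma_t$ are the signal and noise scales of the perturbation, $\lambda(t)$ is a loss weighting, $s_\theta$ is the learned score model, and $f$ and $g$ are the drift and diffusion coefficients of the forward process.

```latex
% Assumed notation; the thesis may use different symbols and weightings.
\begin{align}
x_t &= \alpha_t x_0 + \sigma_t \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I), \\
\nabla_{x_t} \log p(x_t \mid x_0) &= -\frac{x_t - \alpha_t x_0}{\sigma_t^2} = -\frac{\epsilon}{\sigma_t}, \\
\mathcal{L}(\theta) &= \mathbb{E}_{t, x_0, \epsilon}\left[ \lambda(t) \left\| s_\theta(x_t, t) + \frac{\epsilon}{\sigma_t} \right\|_2^2 \right], \\
\frac{\mathrm{d}x_t}{\mathrm{d}t} &= f(x_t, t) - \tfrac{1}{2}\, g(t)^2 \, s_\theta(x_t, t).
\end{align}
```

The first two lines give the perturbation and its conditional score, the third is a denoising score matching objective, and the last is the deterministic differential equation that is integrated from noise back to data at generation time.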
We then present CLD: a new DM based on critically-damped Langevin dynamics. CLD can be interpreted as running a joint diffusion in an extended space, where the auxiliary variables can be considered "velocities" that are coupled to the data variables as in Hamiltonian dynamics. We derive a novel score matching objective for CLD-based DMs and introduce a fast solver for the reverse diffusion process which is inspired by methods from the statistical mechanics literature. The CLD framework provides new insights into DMs and generalizes many existing DMs which are based on overdamped Langevin dynamics.
Next, we present GENIE, a novel higher-order numerical solver for DMs. Many existing higher-order solvers for DMs build on finite difference schemes, which break down in the large step size limit as approximations become too crude. GENIE, on the other hand, learns neural network-based models for higher-order derivatives whose precision does not depend on the step size. The additional networks in GENIE are implemented as small output heads on top of the neural backbone of the original DM, keeping the computational overhead minimal. Unlike recent sampling distillation methods that fundamentally alter the generation process in DMs, GENIE still solves the true generative differential equation, and therefore naturally enables applications such as encoding and guided sampling.
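
To illustrate why a higher-order derivative helps at large step sizes, the toy sketch below compares an Euler step with a second-order truncated Taylor step on the scalar ODE dx/dt = -x. This is not GENIE itself: the analytic derivative `df_dt` merely stands in for what a learned output head would supply, and all names are hypothetical.

```python
import math


def euler_step(x, h, f):
    """First-order step: x(t + h) is approximated by x + h * f(x)."""
    return x + h * f(x)


def taylor2_step(x, h, f, df_dt):
    """Second-order truncated Taylor step using a supplied derivative of f."""
    return x + h * f(x) + 0.5 * h * h * df_dt(x)


def f(x):
    """Toy ODE right-hand side: dx/dt = -x."""
    return -x


def df_dt(x):
    """Total time derivative of f along the solution: d/dt(-x) = x."""
    return x


h = 0.5                  # deliberately large step
exact = math.exp(-h)     # true solution starting from x(0) = 1
euler_error = abs(euler_step(1.0, h, f) - exact)
taylor_error = abs(taylor2_step(1.0, h, f, df_dt) - exact)
```

With the derivative supplied directly rather than estimated from finite differences, its accuracy does not degrade as `h` grows, which is the property the paragraph above attributes to the learned derivative models.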
The fourth chapter presents differentially private diffusion models (DPDMs), DMs trained with strict differential privacy guarantees. While modern machine learning models rely on increasingly large training datasets, data is often limited in privacy-sensitive domains. Generative models trained on sensitive data with differential privacy guarantees can sidestep this challenge, providing access to synthetic data instead. DPDMs enforce privacy by using differentially private stochastic gradient descent for training. We thoroughly study the design space of DPDMs and propose noise multiplicity, a simple yet powerful modification of the DM training objective tailored to the differential privacy setting. We motivate and show numerically why DMs are better suited for differentially private generative modeling than one-shot generators such as generative adversarial networks or normalizing flows.
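
The core of differentially private stochastic gradient descent is to clip each per-example gradient and add Gaussian noise to their sum before averaging. The sketch below shows only that step, using plain Python lists. The function names are hypothetical, and privacy accounting and the noise multiplicity modification are not covered.

```python
import math
import random


def clip(grad, max_norm):
    """Scale `grad` so that its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip per-example gradients, sum, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for j, g in enumerate(clip(grad, max_norm)):
            total[j] += g
    # Noise standard deviation is proportional to the clipping norm.
    std = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, std) for t in total]
    return [v / len(per_example_grads) for v in noisy]


# Usage: the returned vector replaces the ordinary minibatch gradient
# in the optimizer update.
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]
private_grad = dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Clipping bounds the influence of any single training example, and the added noise is what yields the formal privacy guarantee.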
Finally, we propose to distill the knowledge of large pre-trained DMs into smaller student DMs. Large-scale DMs have achieved unprecedented results across several domains; however, they generally require a large amount of GPU memory and are slow at inference time, making it difficult to deploy them in real time or on resource-limited devices. In particular, we propose an approximate score matching objective that regresses the student model towards predictions of the teacher DM rather than the clean data, as is done in standard DM training. We show that student models outperform the larger teacher model for a variety of compute budgets. Additionally, the student models may also be deployed on GPUs with significantly less memory than was required for the original teacher model.