Towards Efficient and Trustworthy AI Through Hardware-Algorithm-Communication Co-Design
Artificial intelligence (AI) algorithms based on neural networks have been
designed for decades with the goal of maximising some measure of accuracy. This
has led to two undesired effects. First, model complexity has risen
exponentially when measured in terms of computation and memory requirements.
Second, state-of-the-art AI models are largely incapable of providing
trustworthy measures of their uncertainty, possibly 'hallucinating' their
answers and discouraging their adoption for decision-making in sensitive
applications.
With the goal of realising efficient and trustworthy AI, in this paper we
highlight research directions at the intersection of hardware and software
design that integrate physical insights into computational substrates,
neuroscientific principles concerning efficient information processing,
information-theoretic results on optimal uncertainty quantification, and
communication-theoretic guidelines for distributed processing. Overall, the
paper advocates for novel design methodologies that target not only accuracy
but also uncertainty quantification, while leveraging emerging computing
hardware architectures that move beyond the traditional von Neumann digital
computing paradigm to embrace in-memory, neuromorphic, and quantum computing
technologies. An important overarching principle of the proposed approach is to
view the stochasticity inherent in the computational substrate and in the
communication channels between processors as a resource to be leveraged for the
purpose of representing and processing classical and quantum uncertainty.
Repulsive Deep Ensembles are Bayesian
Deep ensembles have recently gained popularity in the deep learning community
for their conceptual simplicity and efficiency. However, maintaining functional
diversity between ensemble members that are independently trained with gradient
descent is challenging. This can lead to pathologies when adding more ensemble
members, such as a saturation of the ensemble performance, which converges to
the performance of a single model. Moreover, this not only affects the
quality of the ensemble's predictions but, even more so, its uncertainty
estimates, and thus its performance on out-of-distribution data. We hypothesize
that this limitation can be overcome by discouraging different ensemble members
from collapsing to the same function. To this end, we introduce a kernelized
repulsive term in the update rule of the deep ensembles. We show that this
simple modification not only enforces and maintains diversity among the members
but, even more importantly, transforms the maximum a posteriori inference into
proper Bayesian inference. Namely, we show that the training dynamics of our
proposed repulsive ensembles follow a Wasserstein gradient flow of the KL
divergence with the true posterior. We study repulsive terms in weight and
function space and empirically compare their performance to standard ensembles
and Bayesian baselines on synthetic and real-world prediction tasks.
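The kernelized repulsive update can be illustrated with a closely related, well-known scheme: a Stein-variational-style update on a toy one-dimensional Gaussian posterior. The sketch below is our illustration, not the authors' exact update rule; all names are hypothetical. It shows how a kernel repulsion term keeps ensemble members from collapsing to the same point.

```python
import numpy as np

# Toy sketch: a kernelized repulsive update (SVGD form) for an
# ensemble of scalar "weights". Target posterior: standard normal,
# so grad log p(w) = -w. All names here are illustrative.

def grad_log_post(ws):
    return -ws  # gradient of log N(0, 1)

def repulsive_step(ws, lr=0.05, h=1.0):
    """One update: kernel-smoothed posterior gradient plus a kernel
    repulsion term that pushes ensemble members apart."""
    diff = ws[:, None] - ws[None, :]             # diff[i, j] = w_i - w_j
    K = np.exp(-diff ** 2 / (2 * h ** 2))        # RBF kernel matrix
    drift = K @ grad_log_post(ws)                # smoothed drift toward the posterior
    repulsion = (diff / h ** 2 * K).sum(axis=1)  # repels collapsing members
    return ws + lr * (drift + repulsion) / len(ws)

ws = np.linspace(-0.5, 0.5, 5)  # a nearly collapsed ensemble
for _ in range(1000):
    ws = repulsive_step(ws)
print(np.round(ws, 2))  # members spread out to cover the posterior
```

Without the repulsion term, every member would drift to the posterior mode; with it, the stationary configuration spreads the members over the posterior mass.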
Conditional Variational Autoencoder for Learned Image Reconstruction
Learned image reconstruction techniques using deep neural networks have recently gained popularity and have delivered promising empirical results. However, most approaches focus on a single recovery for each observation and thus neglect uncertainty information. In this work, we develop a novel computational framework that approximates the posterior distribution of the unknown image at each query observation. The proposed framework is very flexible: it handles implicit noise models and priors, it incorporates the data formation process (i.e., the forward operator), and the learned reconstructive properties are transferable between different datasets. Once the network is trained using the conditional variational autoencoder loss, it provides a computationally efficient sampler for the approximate posterior distribution via feed-forward propagation, and the summarizing statistics of the generated samples are used for both point estimation and uncertainty quantification. We illustrate the proposed framework with extensive numerical experiments on positron emission tomography (with both moderate and low count levels), showing that the framework generates high-quality samples when compared with state-of-the-art methods.
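The feed-forward sampling stage described above can be sketched as follows. The decoder here is a random linear map standing in for a trained CVAE decoder, and all names and dimensions are hypothetical; the point is the mechanics of drawing latent codes, decoding them conditioned on the observation, and summarizing the samples for a point estimate and an uncertainty map.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained CVAE decoder: maps a latent code
# z and the observation y to an image estimate. In practice this is a
# deep network; a random linear map suffices to show the mechanics.
LATENT, OBS, IMG = 8, 16, 32
Wz = rng.normal(size=(IMG, LATENT)) * 0.1
Wy = rng.normal(size=(IMG, OBS)) * 0.1

def decoder(z, y):
    return Wz @ z + Wy @ y

def posterior_summary(y, n_samples=500):
    """Feed-forward sampling from the approximate posterior p(x | y):
    draw z ~ N(0, I), decode, then summarize the samples."""
    samples = np.stack([decoder(rng.normal(size=LATENT), y)
                        for _ in range(n_samples)])
    # sample mean = point estimate, sample std = per-pixel uncertainty
    return samples.mean(axis=0), samples.std(axis=0)

y = rng.normal(size=OBS)  # a query observation
mean_img, std_img = posterior_summary(y)
```

Because sampling is pure feed-forward propagation, drawing hundreds of posterior samples costs only hundreds of decoder evaluations, with no iterative inference at test time.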
Stable ResNet
Deep ResNet architectures have achieved state-of-the-art performance on many
tasks. While they solve the problem of vanishing gradients, they might suffer
from exploding gradients as the depth becomes large (Yang et al. 2017).
Moreover, recent results have shown that ResNet might lose expressivity as the
depth goes to infinity (Yang et al. 2017, Hayou et al. 2019). To resolve these
issues, we introduce a new class of ResNet architectures, called Stable ResNet,
that have the property of stabilizing the gradient while ensuring expressivity
in the infinite depth limit.
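A minimal sketch of the stabilization idea, assuming a uniform 1/√L scaling of the residual branches (one member of the family of scalings; the branch here is a random linear map standing in for a trained layer, and all names are our choices):

```python
import numpy as np

rng = np.random.default_rng(1)
L, D = 100, 64  # depth and width (illustrative values)

# Random residual branches; in a real network each branch is a trained
# conv/MLP block, but random maps suffice to show the scaling effect.
Ws = rng.normal(size=(L, D, D)) / np.sqrt(D)

def resnet_forward(x, scale):
    # x_{l+1} = x_l + scale * W_l @ relu(x_l); a scale of 1/sqrt(L)
    # keeps activation (and hence gradient) magnitudes bounded in L.
    for W in Ws:
        x = x + scale * (W @ np.maximum(x, 0.0))
    return x

x0 = rng.normal(size=D)
vanilla = resnet_forward(x0, 1.0)            # unscaled: norm blows up with depth
stable = resnet_forward(x0, 1.0 / np.sqrt(L))  # scaled: norm stays moderate
print(np.linalg.norm(vanilla), np.linalg.norm(stable))
```

With unscaled branches the activation norm grows geometrically in depth, whereas the 1/√L scaling keeps it of the same order as the input, which is the forward-pass counterpart of the gradient stability the paper targets.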
Non-parametric machine learning for biological sequence data
In the past decade there has been a massive increase in the volume of biological sequence data, driven by massively parallel sequencing technologies. This has enabled data-driven statistical analyses using non-parametric predictive models (including those from machine learning) to complement more traditional, hypothesis-driven approaches. This thesis addresses several challenges that arise when applying non-parametric predictive models to biological sequence data.
Some of these challenges arise due to the nature of the biological system of interest. For example, in the study of the human microbiome the phylogenetic relationships between microorganisms are often ignored in statistical analyses. This thesis outlines a novel approach to modelling phylogenetic similarity using string kernels and demonstrates its utility in the two-sample test and host-trait prediction.
Other challenges arise from limitations in our understanding of the models themselves. For example, calculating variable importance (a key task in biomedical applications) is not possible for many models. This thesis describes a novel extension of an existing approach to compute importance scores for grouped variables in a Bayesian neural network. It also explores the behaviour of random forest classifiers when applied to microbial datasets, with a focus on the robustness of the biological findings under different modelling assumptions.
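A minimal example of the kind of string kernel the phylogenetic approach builds on is the k-mer spectrum kernel: the inner product of k-mer count vectors of two sequences. This simple form is our illustration only; the thesis develops phylogeny-aware variants on top of such kernels.

```python
from collections import Counter

def kmer_counts(seq, k=3):
    """Count all overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(a, b, k=3):
    """Spectrum string kernel: inner product of k-mer count vectors.
    Only k-mers shared by both sequences contribute."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(ca[m] * cb[m] for m in ca.keys() & cb.keys())

s1 = "ACGTACGT"
s2 = "ACGTTGCA"  # shares the 3-mers ACG and CGT with s1
s3 = "TTTTTTTT"  # shares no 3-mer with s1
print(spectrum_kernel(s1, s2), spectrum_kernel(s1, s3))  # → 4 0
```

Because the kernel never enumerates the exponentially many possible k-mers, only those present in the inputs, it scales to long biological sequences, and a Gram matrix of such values can feed directly into kernel two-sample tests or kernel-based prediction.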
On kernel and feature learning in neural networks
Inspired by the theory of wide neural networks (NNs), kernel learning and feature learning have recently emerged as two paradigms through which we can understand the complex behaviours of large-scale deep learning systems in practice. In the literature, they are often portrayed as two opposing ends of a dichotomy, both with their own strengths and weaknesses: one, kernel learning, draws connections to well-studied machine learning techniques like kernel methods and Gaussian Processes, whereas the other, feature learning, promises to capture more of the rich, but as yet unexplained, properties that are unique to NNs.
In this thesis, we present three works studying properties of NNs that combine insights from both perspectives, highlighting not only their differences but also their shared similarities. We start by reviewing relevant literature on the theory of deep learning, with a focus on the study of wide NNs. This provides context for a discussion of kernel and feature learning, and against this backdrop, we proceed to describe our contributions. First, we examine the relationship between ensembles of wide NNs and Bayesian inference using connections from kernel learning to Gaussian Processes, and propose a modification that accounts for missing variance at initialisation in NN functions, resulting in a Bayesian interpretation of our trained deep ensembles. Next, we combine kernel and feature learning to demonstrate the suitability of the feature kernel, i.e. the kernel induced by inner products over final-layer NN features, as a target for knowledge distillation, where one seeks to use a powerful teacher model to improve the performance of a weaker student model. Finally, we explore the gap between collapsed and whitened features in self-supervised learning, highlighting the decay rate of eigenvalues in the feature kernel as a key quantity that bridges this gap and impacts downstream generalisation performance, especially in settings with scarce labelled data. We conclude with a discussion of our contributions, including limitations and future outlook.
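The feature kernel mentioned above is straightforward to compute: it is the Gram matrix of inner products between final-layer features. The sketch below uses a random two-layer network as a stand-in for a trained model (all shapes and names are hypothetical) and also extracts the eigenvalues whose decay rate the thesis links to downstream generalisation.

```python
import numpy as np

rng = np.random.default_rng(2)

N, D, H, F = 6, 10, 32, 16  # samples, input dim, hidden dim, feature dim
X = rng.normal(size=(N, D))
W1 = rng.normal(size=(D, H)) / np.sqrt(D)
W2 = rng.normal(size=(H, F)) / np.sqrt(H)

def features(X):
    """Final-layer features of a toy two-layer ReLU network."""
    return np.maximum(X @ W1, 0.0) @ W2

Phi = features(X)
K = Phi @ Phi.T                   # feature kernel: N x N Gram matrix
eigvals = np.linalg.eigvalsh(K)   # spectrum whose decay rate matters downstream
```

Since K is an inner-product Gram matrix it is symmetric positive semi-definite by construction, so its eigenvalues are non-negative and their decay profile is well-defined.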