Simple Regularisation for Uncertainty-Aware Knowledge Distillation
Considering uncertainty estimation of modern neural networks (NNs) is one of the most important steps towards deploying machine learning systems in meaningful real-world applications such as medicine, finance or autonomous systems. At the moment, ensembles of different NNs constitute the state-of-the-art in both accuracy and uncertainty estimation across different tasks. However, ensembles of NNs are impractical under real-world constraints, since their computation and memory consumption scale linearly with the size of the ensemble, which increases their latency and deployment cost. In this work, we examine a simple regularisation approach for distribution-free knowledge distillation of an ensemble of machine learning models into a single NN. The aim of the regularisation is to preserve the diversity, accuracy and uncertainty estimation characteristics of the original ensemble without any intricacies, such as fine-tuning. We demonstrate the generality of the approach on combinations of toy data, SVHN/CIFAR-10, simple to complex NN architectures and different tasks.
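To make the setup concrete, below is a minimal sketch of distilling an ensemble's predictive distribution into a single student network, with a regularisation term that discourages the student from collapsing the ensemble's uncertainty. The entropy-matching regulariser and its weight beta are illustrative assumptions; the abstract does not specify the paper's exact regulariser.

# Minimal sketch: ensemble-to-student distillation with an uncertainty-
# preserving regulariser (illustrative; not the paper's exact term).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, ensemble_logits, beta=0.1):
    # student_logits: (B, C); ensemble_logits: (M, B, C) for M members.
    ensemble_probs = F.softmax(ensemble_logits, dim=-1).mean(dim=0)   # (B, C)
    student_log_probs = F.log_softmax(student_logits, dim=-1)

    # Match the ensemble's mean predictive distribution.
    kl = F.kl_div(student_log_probs, ensemble_probs, reduction="batchmean")

    # Hypothetical regulariser: keep the student's predictive entropy close
    # to the ensemble's so its uncertainty estimates are not washed out.
    student_entropy = -(student_log_probs.exp() * student_log_probs).sum(-1)
    ensemble_entropy = -(ensemble_probs * ensemble_probs.clamp_min(1e-12).log()).sum(-1)
    reg = (student_entropy - ensemble_entropy).abs().mean()
    return kl + beta * reg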
An Online Learning Method for Microgrid Energy Management Control
We propose a novel Model Predictive Control (MPC) scheme based on online learning (OL) for microgrid energy management, where the control optimisation is embedded as the last layer of a neural network. The proposed MPC scheme handles uncertainty in the load and renewable generation power profiles and in electricity prices by employing the predictions of an online-trained neural network in the optimisation problem. To adapt to possible changes in the environment, the neural network is trained online on continuously received data. The network hyperparameters are selected through a hyperparameter optimisation performed on a pretraining dataset before the controller is deployed. We show the effectiveness of the proposed method for microgrid energy management through extensive experiments on real microgrid datasets. Moreover, we show that the proposed algorithm has good transfer learning (TL) capabilities across different microgrids.
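A minimal sketch of the control loop implied by the abstract follows: an online-updated forecaster supplies predictions to a receding-horizon optimisation at every step. The Forecaster class, the greedy placeholder standing in for the embedded optimisation layer, and all constants are assumptions for illustration, not the paper's formulation.

import numpy as np

class Forecaster:
    # Toy online-learned predictor: exponential moving average per horizon step.
    def __init__(self, horizon, lr=0.1):
        self.estimate = np.zeros(horizon)
        self.lr = lr

    def predict(self):
        return self.estimate.copy()

    def update(self, observed):
        # Online training step on newly received data.
        self.estimate += self.lr * (observed - self.estimate)

def control_step(price_forecaster, battery_soc, capacity=10.0):
    # Greedy placeholder for the embedded optimisation layer: charge when the
    # predicted price is below its median, discharge otherwise, then apply
    # only the first action of the plan (receding horizon).
    prices = price_forecaster.predict()
    plan = np.where(prices < np.median(prices), 1.0, -1.0)
    plan = np.clip(plan, -battery_soc, capacity - battery_soc)
    return plan[0]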
Navigating Noise: A Study of How Noise Influences Generalisation and Calibration of Neural Networks
Enhancing the generalisation abilities of neural networks (NNs) by integrating noise such as MixUp or Dropout during training has emerged as a powerful and adaptable technique. Despite the proven efficacy of noise in NN training, there is no consensus regarding which noise sources, types and placements yield maximal benefits in generalisation and confidence calibration. This study thoroughly explores diverse noise modalities to evaluate their impacts on NNs' generalisation and calibration under in-distribution and out-of-distribution settings, paired with experiments investigating the metric landscapes of the learnt representations across a spectrum of NN architectures, tasks, and datasets. Our study shows that AugMix and weak augmentation exhibit cross-task effectiveness in computer vision, underscoring the need to tailor noise to specific domains. Our findings highlight the efficacy of combining noises and of hyperparameter transfer within a single domain, but also the difficulty of transferring the benefits to other domains. Furthermore, the study underscores the complexity of simultaneously optimising for both generalisation and calibration, emphasising the need for practitioners to carefully consider noise combinations and hyperparameter tuning for optimal performance in specific tasks and datasets.
Comment: Accepted at Transactions on Machine Learning Research (April 2024). Martin and Ondrej contributed equally.
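As one concrete example of the noise modalities studied above, a standard MixUp training step blends pairs of inputs and targets with a Beta-sampled coefficient; this is the textbook formulation, not this paper's specific hyperparameter choices.

import torch

def mixup_batch(x, y_onehot, alpha=0.2):
    # x: (B, ...) inputs; y_onehot: (B, C) one-hot targets.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mixed, y_mixed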
ComBiNet: Compact Convolutional Bayesian Neural Network for Image Segmentation
Fully convolutional U-shaped neural networks have largely been the dominant approach for pixel-wise image segmentation. In this work, we tackle two defects that hinder their deployment in real-world applications: 1) predictions lack the uncertainty quantification that may be crucial to many decision-making systems; 2) large memory storage and computational consumption demand extensive hardware resources. To address these issues and improve their practicality, we demonstrate a few-parameter compact Bayesian convolutional architecture that achieves a marginal improvement in accuracy in comparison to related work while using significantly fewer parameters and compute operations. The architecture combines parameter-efficient operations, such as separable convolutions, bilinear interpolation and multi-scale feature propagation, with Bayesian inference for per-pixel uncertainty quantification through Monte Carlo Dropout. The best performing configurations required fewer than 2.5 million parameters on diverse, challenging datasets with few observations.
Comment: Accepted for publication at ICANN 2021. Code at:
https://github.com/martinferianc/ComBiNet
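The per-pixel uncertainty mechanism named in the abstract, Monte Carlo Dropout, can be sketched as follows: dropout stays active at test time, several stochastic forward passes are averaged, and per-pixel predictive entropy serves as the uncertainty map. The model argument is assumed to be any segmentation network containing dropout layers; the sample count is an arbitrary choice here.

import torch

@torch.no_grad()
def mc_dropout_segment(model, image, num_samples=20):
    # image: (B, C, H, W); returns mean class probabilities and per-pixel entropy.
    model.train()  # keeps dropout stochastic; beware of batch-norm side effects
    probs = torch.stack([
        torch.softmax(model(image), dim=1) for _ in range(num_samples)
    ]).mean(dim=0)                                                # (B, K, H, W)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)  # (B, H, W)
    return probs, entropy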
On the effects of quantisation on model uncertainty in Bayesian neural networks
Bayesian neural networks (BNNs) are making significant progress in many research areas where decision-making needs to be accompanied by uncertainty estimation. Being able to quantify uncertainty while making decisions is essential for understanding when the model is over- or under-confident, and hence BNNs are attracting interest in safety-critical applications, such as autonomous driving, healthcare, and robotics. Nevertheless, BNNs have not been as widely used in industrial practice, mainly because of their increased memory and compute costs. In this work, we investigate the quantisation of BNNs by compressing 32-bit floating-point weights and activations to their integer counterparts, an approach that has already been successful in reducing the compute demand of standard pointwise neural networks. We study three types of quantised BNNs, evaluate them under a wide range of settings, and empirically demonstrate that a uniform quantisation scheme applied to BNNs does not substantially decrease their quality of uncertainty estimation.
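For reference, here is a minimal sketch of the uniform (affine) quantisation scheme the abstract refers to, mapping 32-bit floating-point tensors to 8-bit integers via a scale and zero-point; where this is applied inside the three quantised BNN variants is specific to the paper and not reproduced here.

import numpy as np

def quantise_uniform(x, num_bits=8):
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantise(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

# Example: quantising one sampled weight tensor and checking the error.
w = np.random.randn(64, 64).astype(np.float32)
q, s, z = quantise_uniform(w)
print("max abs error:", np.abs(dequantise(q, s, z) - w).max())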
Cultural Alignment in Large Language Models: An Explanatory Analysis Based on Hofstede's Cultural Dimensions
The deployment of large language models (LLMs) raises concerns regarding their cultural misalignment and its potential ramifications for individuals from various cultural backgrounds. Existing work has investigated political and social biases and public opinions rather than cultural values. To address this limitation, the proposed Cultural Alignment Test (CAT) quantifies cultural alignment using Hofstede's cultural dimensions framework, which offers an explanatory cross-cultural comparison through latent variable analysis. We apply our approach to assess the cultural values embedded in state-of-the-art LLMs, such as ChatGPT and Bard, across the diverse cultures of four countries: the United States (US), Saudi Arabia, China, and Slovakia, using different prompting styles and hyperparameter settings. Our results not only quantify the cultural alignment of LLMs with certain countries, but also reveal differences between LLMs along the explanatory cultural dimensions. While no LLM provided satisfactory results in understanding cultural values, GPT-4 exhibited the highest CAT score for the cultural values of the US.
Comment: 31 pages.
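Purely as a hypothetical illustration of dimension-level alignment scoring (the paper's CAT relies on its own latent variable analysis, not this formula), one could compare LLM-derived Hofstede dimension scores against a country's reference profile:

import numpy as np

DIMENSIONS = ["power_distance", "individualism", "masculinity",
              "uncertainty_avoidance", "long_term_orientation", "indulgence"]

def alignment_score(llm_scores, country_scores):
    # Both arguments: dicts mapping Hofstede dimension -> score in [0, 100].
    llm = np.array([llm_scores[d] for d in DIMENSIONS], dtype=float)
    ref = np.array([country_scores[d] for d in DIMENSIONS], dtype=float)
    # Normalised inverse mean absolute difference: 1.0 means perfect alignment.
    return 1.0 - np.abs(llm - ref).mean() / 100.0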
Toward Full-Stack Acceleration of Deep Convolutional Neural Networks on FPGAs
Due to the huge success and rapid development of convolutional neural networks (CNNs), there is a growing demand for hardware accelerators that accommodate a variety of CNNs, improving their inference latency and energy efficiency to enable deployment in real-time applications. Among popular platforms, field-programmable gate arrays (FPGAs) have been widely adopted for CNN acceleration because of their capability to provide superior energy efficiency and low-latency processing while supporting high reconfigurability, making them favorable for accelerating rapidly evolving CNN algorithms. This article introduces a highly customized streaming hardware architecture that focuses on improving compute efficiency for streaming applications by providing full-stack acceleration of CNNs on FPGAs. The proposed accelerator maps most computational functions, that is, convolutional and deconvolutional layers, onto a single unified module, and implements the residual and concatenative connections between functions with high efficiency, to support the inference of mainstream CNNs with different topologies. The architecture is further optimized by exploiting different levels of parallelism, fusing layers, and fully leveraging digital signal processing (DSP) blocks. The proposed accelerator has been implemented on Intel's Arria 10 GX1150 hardware and evaluated with a wide range of benchmark models. The results demonstrate a high throughput of over 1.3 TOP/s and up to 97% compute (multiply-accumulate, MAC) efficiency, outperforming state-of-the-art FPGA accelerators.
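To unpack the efficiency metric quoted above: MAC efficiency is measured throughput relative to the peak throughput of the instantiated MAC units. The sketch below uses hypothetical placeholder numbers, not figures from the article.

def mac_efficiency(measured_tops, num_macs, clock_ghz, ops_per_mac=2):
    # Peak TOP/s = MAC units * (multiply + accumulate = 2 ops) * clock rate.
    peak_tops = num_macs * ops_per_mac * clock_ghz * 1e9 / 1e12
    return measured_tops / peak_tops

# Hypothetical example: 2400 MACs at 0.3 GHz yield a 1.44 TOP/s peak, so
# sustaining 1.3 TOP/s would correspond to roughly 90% MAC efficiency.
print(f"{mac_efficiency(1.3, 2400, 0.3):.0%}")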