58 research outputs found
On the Usefulness of Deep Ensemble Diversity for Out-of-Distribution Detection
The ability to detect Out-of-Distribution (OOD) data is important in
safety-critical applications of deep learning. The aim is to separate
In-Distribution (ID) data drawn from the training distribution from OOD data
using a measure of uncertainty extracted from a deep neural network. Deep
Ensembles are a well-established method of improving the quality of uncertainty
estimates produced by deep neural networks, and have been shown to have
superior OOD detection performance compared to single models. An existing
intuition in the literature is that the diversity of Deep Ensemble predictions
indicates distributional shift, and so measures of diversity such as Mutual
Information (MI) should be used for OOD detection. We show experimentally that
this intuition is not valid on ImageNet-scale OOD detection -- using MI leads
to 30-40% worse %FPR@95 compared to single-model entropy on some OOD datasets.
We suggest an alternative explanation for Deep Ensembles' better OOD detection
performance -- OOD detection is binary classification and we are ensembling
diverse classifiers. As such we show that practically, even better OOD
detection performance can be achieved for Deep Ensembles by averaging
task-specific detection scores such as Energy over the ensemble.
Comment: Workshop on Uncertainty Quantification for Computer Vision, ECCV 202
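The scores contrasted in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the common definitions of Mutual Information (entropy of the averaged prediction minus the average per-member entropy) and of Energy (negative log-sum-exp of the logits), and the function names and the `(members, samples, classes)` array layout are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p, eps=1e-12):
    # Shannon entropy over the class axis.
    return -(p * np.log(p + eps)).sum(axis=-1)

def ensemble_scores(logits):
    """logits has shape (members, samples, classes); layout is an assumption."""
    probs = softmax(logits)
    mean_probs = probs.mean(axis=0)
    total = entropy(mean_probs)              # entropy of the averaged prediction
    expected = entropy(probs).mean(axis=0)   # average entropy of each member
    mi = total - expected                    # diversity-based score
    # Per-member Energy = -logsumexp(logits), then averaged over members.
    m = logits.max(axis=-1, keepdims=True)
    lse = (m + np.log(np.exp(logits - m).sum(axis=-1, keepdims=True))).squeeze(-1)
    mean_energy = (-lse).mean(axis=0)
    return {"mi": mi, "entropy": total, "mean_energy": mean_energy}

# Two members that agree, then two that confidently disagree.
agree = np.array([[[5.0, 0.0, 0.0]], [[5.0, 0.0, 0.0]]])
disagree = np.array([[[5.0, 0.0, 0.0]], [[0.0, 5.0, 0.0]]])
scores_agree = ensemble_scores(agree)
scores_disagree = ensemble_scores(disagree)
```

In this toy case MI is zero when the members agree and positive when they disagree, while the averaged Energy is identical for both inputs, which shows that the two kinds of score measure different things.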
Mixed-TD: Efficient Neural Network Accelerator with Layer-Specific Tensor Decomposition
Neural Network designs are quite diverse, from VGG-style to ResNet-style, and
from Convolutional Neural Networks to Transformers. Towards the design of
efficient accelerators, many works have adopted a dataflow-based, inter-layer
pipelined architecture, with hardware customised for each layer,
achieving ultra high throughput and low latency. The deployment of neural
networks to such dataflow architecture accelerators is usually hindered by the
available on-chip memory as it is desirable to preload the weights of neural
networks on-chip to maximise the system performance. To address this, networks
are usually compressed before deployment through methods such as pruning,
quantization and tensor decomposition. In this paper, a framework for mapping
CNNs onto FPGAs based on a novel tensor decomposition method called Mixed-TD is
proposed. The proposed method applies layer-specific Singular Value
Decomposition (SVD) and Canonical Polyadic Decomposition (CPD) in a mixed
manner, achieving 1.73x to 10.29x throughput per DSP to state-of-the-art CNNs.
Our work is open-sourced: https://github.com/Yu-Zhewen/Mixed-TD
Comment: accepted by FPL202
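The SVD half of such a decomposition can be sketched on a single layer. This is only an illustration under stated assumptions, not the Mixed-TD method itself: the layer shape, the rank, and the helper names are hypothetical, the CPD branch is omitted, and the per-layer choice between methods is not modelled.

```python
import numpy as np

def svd_factorise(weight, rank):
    """Approximate weight (out x in) as a @ b, with a: out x rank, b: rank x in."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # absorb singular values into the left factor
    b = vt[:rank, :]
    return a, b

def param_count(*arrays):
    # Total number of stored weights across the given arrays.
    return sum(arr.size for arr in arrays)

rng = np.random.default_rng(0)
# Hypothetical conv layer: 64 output channels, 32 input channels, 3x3 kernel.
w = rng.standard_normal((64, 32, 3, 3))
w2d = w.reshape(64, -1)          # flatten to (out, in * kh * kw) = (64, 288)

a, b = svd_factorise(w2d, rank=16)
rel_err = np.linalg.norm(w2d - a @ b) / np.linalg.norm(w2d)
original_params = param_count(w2d)    # 64 * 288
factored_params = param_count(a, b)   # 64 * 16 + 16 * 288
```

The two factors correspond to two smaller consecutive layers, so the on-chip weight storage falls from 18432 to 5632 values in this example, at the cost of the approximation error in `rel_err`. Random weights are close to a worst case for low-rank approximation; trained layers typically compress better.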
Window-Based Early-Exit Cascades for Uncertainty Estimation: When Deep Ensembles are More Efficient than Single Models
Deep Ensembles are a simple, reliable, and effective method of improving both
the predictive performance and uncertainty estimates of deep learning
approaches. However, they are widely criticised as being computationally
expensive, due to the need to deploy multiple independent models. Recent work
has challenged this view, showing that for predictive accuracy, ensembles can
be more computationally efficient (at inference) than scaling single models
within an architecture family. This is achieved by cascading ensemble members
via an early-exit approach. In this work, we investigate extending these
efficiency gains to tasks related to uncertainty estimation. As many such
tasks, e.g. selective classification, are binary classification, our key novel
insight is to only pass samples within a window close to the binary decision
boundary to later cascade stages. Experiments on ImageNet-scale data across a
number of network architectures and uncertainty tasks show that the proposed
window-based early-exit approach is able to achieve a superior
uncertainty-computation trade-off compared to scaling single models. For
example, a cascaded EfficientNet-B2 ensemble is able to achieve similar
coverage at 5% risk as a single EfficientNet-B4 with <30% the number of MACs.
We also find that cascades/ensembles give more reliable improvements on OOD
data vs scaling models up. Code for this work is available at:
https://github.com/Guoxoug/window-early-exit
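The window-based exit rule described above can be sketched as follows. This is a minimal illustration rather than the released implementation: `window_cascade`, the `threshold` and `half_width` parameters, and the toy stage functions are all hypothetical names and values.

```python
def window_cascade(x, stages, threshold, half_width):
    """Run stages in order and exit once the score leaves the window.

    Each stage maps a sample to an uncertainty-related score. A sample whose
    score lies within half_width of the binary decision threshold is passed
    to the next (more expensive) stage; the last stage always answers.
    """
    for index, stage in enumerate(stages, start=1):
        score = stage(x)
        outside_window = abs(score - threshold) > half_width
        if outside_window or index == len(stages):
            return score, index
    raise ValueError("stages must be non-empty")

# Toy stand-ins: stage one is a single model's confidence, stage two averages
# it with a second (hypothetical) member's confidence.
def stage_one(x):
    return x

def stage_two(x):
    return (x + min(1.0, x + 0.2)) / 2

stages = [stage_one, stage_two]
easy = window_cascade(0.9, stages, threshold=0.5, half_width=0.1)
hard = window_cascade(0.55, stages, threshold=0.5, half_width=0.1)
```

Here the confident sample exits after the first stage, while the sample close to the decision threshold is forwarded to the second. Only forwarded samples pay for the extra computation, which is where the efficiency gain comes from.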
Augmenting Softmax Information for Selective Classification with Out-of-Distribution Data
Detecting out-of-distribution (OOD) data is a task that is receiving an
increasing amount of research attention in the domain of deep learning for
computer vision. However, the performance of detection methods is generally
evaluated on the task in isolation, rather than also considering potential
downstream tasks in tandem. In this work, we examine selective classification
in the presence of OOD data (SCOD). That is to say, the motivation for
detecting OOD samples is to reject them so their impact on the quality of
predictions is reduced. We show that, under this task specification, existing
post-hoc methods perform quite differently compared to when evaluated only on
OOD detection. This is because it is no longer an issue to conflate
in-distribution (ID) data with OOD data if the ID data is going to be
misclassified. However, the conflation within ID data of correct and incorrect
predictions becomes undesirable. We also propose a novel method for SCOD,
Softmax Information Retaining Combination (SIRC), that augments softmax-based
confidence scores with feature-agnostic information such that their ability to
identify OOD samples is improved without sacrificing separation between correct
and incorrect ID predictions. Experiments on a wide variety of ImageNet-scale
datasets and convolutional neural network architectures show that SIRC is able
to consistently match or outperform the baseline for SCOD, whilst existing OOD
detection methods fail to do so.
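The difference between evaluating OOD detection and SCOD comes down to how samples are labelled. The sketch below is illustrative only: the helper names, the toy confidence scores, and the use of AUROC as the metric are assumptions, and it does not implement SIRC.

```python
def ood_detection_labels(is_ood):
    # OOD detection: every ID sample is a positive, correct or not.
    return [int(not o) for o in is_ood]

def scod_labels(is_ood, pred, target):
    # SCOD: only correctly classified ID samples should be accepted;
    # misclassified ID samples and OOD samples should both be rejected.
    return [int((not o) and p == t) for o, p, t in zip(is_ood, pred, target)]

def auroc(scores, labels):
    # Probability that a random positive outscores a random negative
    # (ties count as half a win).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: three ID samples (the third is misclassified) and two OOD samples.
is_ood = [False, False, False, True, True]
pred = [1, 2, 0, 1, 0]
target = [1, 2, 1, None, None]
confidence = [0.9, 0.8, 0.85, 0.3, 0.6]

auroc_ood = auroc(confidence, ood_detection_labels(is_ood))
auroc_scod = auroc(confidence, scod_labels(is_ood, pred, target))
```

On this toy data the confidence score separates ID from OOD perfectly (AUROC 1.0) but scores lower on SCOD (5/6), because the confidently misclassified ID sample now counts as something to reject.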
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.
Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201
Approximate FPGA-based LSTMs under Computation Time Constraints
Recurrent Neural Networks and in particular Long Short-Term Memory (LSTM)
networks have demonstrated state-of-the-art accuracy in several emerging
Artificial Intelligence tasks. However, the models are becoming increasingly
demanding in terms of computational and memory load. Emerging latency-sensitive
applications including mobile robots and autonomous vehicles often operate
under stringent computation time constraints. In this paper, we address the
challenge of deploying computationally demanding LSTMs at a constrained time
budget by introducing an approximate computing scheme that combines iterative
low-rank compression and pruning, along with a novel FPGA-based LSTM
architecture. Combined in an end-to-end framework, the approximation method's
parameters are optimised and the architecture is configured to address the
problem of high-performance LSTM execution in time-constrained applications.
Quantitative evaluation on a real-life image captioning application indicates
that the proposed methods require up to 6.5x less time to achieve the same
application-level accuracy compared to a baseline method, while achieving an
average of 25x higher accuracy under the same computation time constraints.
Comment: Accepted at the 14th International Symposium in Applied Reconfigurable Computing (ARC) 201
FMM-X3D: FPGA-based modeling and mapping of X3D for Human Action Recognition
3D Convolutional Neural Networks are gaining increasing attention from
researchers and practitioners and have found applications in many domains, such
as surveillance systems, autonomous vehicles, human monitoring systems, and
video retrieval. However, their widespread adoption is hindered by their high
computational and memory requirements, especially when resource-constrained
systems are targeted. This paper addresses the problem of mapping X3D, a
state-of-the-art model in Human Action Recognition that achieves accuracy of
95.5% in the UCF101 benchmark, onto any FPGA device. The proposed toolflow
generates an optimised stream-based hardware system, taking into account the
available resources and off-chip memory characteristics of the FPGA device. The
generated designs push the current performance-accuracy Pareto front further,
and enable for the first time the targeting of such complex model architectures
for the Human Action Recognition task.
Comment: 8 pages, 6 figures, 2 table