Resource-Efficient Neural Networks for Embedded Systems
While machine learning is traditionally a resource intensive task, embedded
systems, autonomous navigation, and the vision of the Internet of Things fuel
the interest in resource-efficient approaches. These approaches aim for a
carefully chosen trade-off between performance and resource consumption in
terms of computation and energy. The development of such approaches is among
the major challenges in current machine learning research and key to ensuring a
smooth transition of machine learning technology from a scientific environment
with virtually unlimited computing resources into everyday applications. In
this article, we provide an overview of the current state of the art of machine
learning techniques facilitating these real-world requirements. In particular,
we focus on deep neural networks (DNNs), the predominant machine learning
models of the past decade. We give a comprehensive overview of the vast
literature that can be mainly split into three non-mutually exclusive
categories: (i) quantized neural networks, (ii) network pruning, and (iii)
structural efficiency. These techniques can be applied during training or as
post-processing, and they are widely used to reduce the computational demands
in terms of memory footprint, inference speed, and energy efficiency. We
substantiate our discussion with experiments on well-known benchmark data sets
to showcase the difficulty of finding good trade-offs between
resource-efficiency and predictive performance.
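Two of the categories named above, quantization and pruning, can be illustrated with a minimal NumPy sketch. This is not code from the surveyed work; it is a toy example of post-training 8-bit linear quantization and 50% magnitude pruning on a small random weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 4)).astype(np.float32)

# Post-training 8-bit linear quantization: map floats to int8 levels
# and reconstruct them; the reconstruction error is bounded by scale/2.
scale = np.abs(weights).max() / 127.0
q = np.round(weights / scale).astype(np.int8)
dequantized = q.astype(np.float32) * scale

# Magnitude pruning: zero out the 50% smallest-magnitude weights.
threshold = np.quantile(np.abs(weights), 0.5)
pruned = np.where(np.abs(weights) >= threshold, weights, 0.0)

print("max quantization error:", np.max(np.abs(weights - dequantized)))
print("sparsity after pruning:", (pruned == 0).mean())
```

In practice both steps interact with predictive accuracy, which is exactly the trade-off the article's experiments explore.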
To Compress, or Not to Compress: Characterizing Deep Learning Model Compression for Embedded Inference
The recent advances in deep neural networks (DNNs) make them attractive for embedded systems. However, it can take a long time for DNNs to make an inference on resource-constrained computing devices. Model compression techniques can address the computation issue of deep inference on embedded devices. These techniques are highly attractive, as they rely neither on specialized hardware nor on computation offloading, which is often infeasible due to privacy concerns or high latency. However, it remains unclear how model compression techniques perform across a wide range of DNNs. To design efficient embedded deep learning solutions, we need to understand their behaviors. This work develops a quantitative approach to characterizing model compression techniques on a representative embedded deep learning architecture, the NVIDIA Jetson TX2. We perform extensive experiments on 11 influential neural network architectures from the image classification and natural language processing domains. We experimentally show how two mainstream compression techniques, data quantization and pruning, perform on these network architectures, and what the implications of compression are for model storage size, inference time, energy consumption, and performance metrics. We demonstrate that there are opportunities to achieve fast deep inference on embedded systems, but one must carefully choose the compression settings. Our results provide insights on when and how to apply model compression techniques, along with guidelines for designing efficient embedded deep learning systems.
How to train accurate BNNs for embedded systems?
A key enabler of deploying convolutional neural networks on
resource-constrained embedded systems is the binary neural network (BNN). BNNs
save on memory and simplify computation by binarizing both features and
weights. Unfortunately, binarization is inevitably accompanied by a severe
decrease in accuracy. To reduce the accuracy gap between binary and
full-precision networks, many repair methods have been proposed in the recent
past, which we have classified and put into a single overview in this chapter.
The repair methods are divided into two main branches, training techniques and
network topology changes, which can further be split into smaller categories.
The latter category introduces additional cost (energy consumption or
additional area) for an embedded system, while the former does not. From our
overview, we observe that progress has been made in reducing the accuracy gap,
but BNN papers are not aligned on what repair methods should be used to get
highly accurate BNNs. Therefore, this chapter contains an empirical review that
evaluates the benefits of many repair methods in isolation over the
ResNet-20/CIFAR-10 and ResNet-18/CIFAR-100 benchmarks. We found three repair
categories most beneficial: feature binarizer, feature normalization, and
double residual. Based on this review we discuss future directions and research
opportunities. We sketch the benefits and costs associated with BNNs on embedded
systems, because it remains to be seen whether BNNs will be able to close the
accuracy gap while staying highly energy-efficient on resource-constrained
embedded systems.
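The core binarization step the abstract refers to, and one repair from the feature-normalization category it highlights, can be sketched in plain NumPy. This is an illustrative toy layer, not the training pipeline of any reviewed paper:

```python
import numpy as np

def binarize(x):
    # Deterministic sign binarization: every value becomes +1 or -1
    # (zero is conventionally sent to +1).
    return np.where(x >= 0, 1.0, -1.0)

rng = np.random.default_rng(1)
features = rng.normal(size=(2, 8))
weights = rng.normal(size=(8, 4))

# Binary layer: with both operands in {-1, +1}, the dot product can be
# realized with XNOR + popcount on embedded hardware.
out = binarize(features) @ binarize(weights)

# A feature-normalization-style repair: centring features per channel
# before binarizing shifts the effective binarization threshold.
centred = features - features.mean(axis=0, keepdims=True)
out_repaired = binarize(centred) @ binarize(weights)
```

The accuracy gap arises because the sign function discards all magnitude information; the repair methods surveyed in the chapter try to recover part of it cheaply.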
An approximate randomization-based neural network with dedicated digital architecture for energy-constrained devices
Variable energy constraints affect the implementations of neural networks on battery-operated embedded systems. This
paper describes a learning algorithm for randomization-based neural networks with hard-limit activation functions. The
approach adopts a novel cost function that balances accuracy and network complexity during training. From an energy-specific perspective, the new learning strategy makes it possible to adjust, dynamically and in real time, the number of operations
during the network’s forward phase. The proposed learning scheme leads to efficient predictors supported by digital
architectures. The resulting digital architecture can switch to approximate computing at run time, in compliance with the
available energy budget. Experiments on 10 real-world prediction testbeds confirmed the effectiveness of the learning
scheme. Additional tests on limited-resource devices supported the implementation efficiency of the overall design
approach.
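The general shape of a randomization-based network with hard-limit activations can be sketched as follows. This is a generic illustration (random fixed hidden layer, trained linear readout) with invented data and sizes; the paper's specific cost function and its ten testbeds are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative regression task (the paper's real-world testbeds are not public here).
X = rng.normal(size=(200, 5))
y = np.sin(X @ rng.normal(size=5))

# Random hidden layer, fixed after initialization, with a hard-limit
# (Heaviside step) activation: each hidden unit outputs 0 or 1.
n_hidden = 64
W = rng.normal(size=(5, n_hidden))
b = rng.normal(size=n_hidden)
H = (X @ W + b > 0).astype(np.float64)

# Only the linear readout is trained, via regularized least squares;
# lam is an illustrative regularization constant.
lam = 1e-2
beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ y)

mse = np.mean((H @ beta - y) ** 2)
print(f"training MSE: {mse:.4f}")
```

Because the hidden activations are binary, the forward phase reduces to threshold comparisons and an accumulation, which is what makes run-time approximation and operation-count scaling attractive on energy-constrained hardware.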