Resource-Efficient Neural Networks for Embedded Systems
While machine learning is traditionally a resource intensive task, embedded
systems, autonomous navigation, and the vision of the Internet of Things fuel
the interest in resource-efficient approaches. These approaches aim for a
carefully chosen trade-off between performance and resource consumption in
terms of computation and energy. The development of such approaches is among
the major challenges in current machine learning research and key to ensuring a
smooth transition of machine learning technology from a scientific environment
with virtually unlimited computing resources into everyday applications. In
this article, we provide an overview of the current state of the art of machine
learning techniques facilitating these real-world requirements. In particular,
we focus on deep neural networks (DNNs), the predominant machine learning
models of the past decade. We give a comprehensive overview of the vast
literature that can be mainly split into three non-mutually exclusive
categories: (i) quantized neural networks, (ii) network pruning, and (iii)
structural efficiency. These techniques can be applied during training or as
post-processing, and they are widely used to reduce the computational demands
in terms of memory footprint, inference speed, and energy efficiency. We
substantiate our discussion with experiments on well-known benchmark data sets
to showcase the difficulty of finding good trade-offs between
resource-efficiency and predictive performance.
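Two of the categories named above, quantization and pruning, can be illustrated with a minimal NumPy sketch. This is not code from the surveyed work; it is a toy example of post-training 8-bit linear quantization and 50% magnitude pruning on a small random weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 4)).astype(np.float32)

# Post-training 8-bit linear quantization: map floats to int8 levels
# and reconstruct them; the reconstruction error is bounded by scale/2.
scale = np.abs(weights).max() / 127.0
q = np.round(weights / scale).astype(np.int8)
dequantized = q.astype(np.float32) * scale

# Magnitude pruning: zero out the 50% smallest-magnitude weights.
threshold = np.quantile(np.abs(weights), 0.5)
pruned = np.where(np.abs(weights) >= threshold, weights, 0.0)

print("max quantization error:", np.max(np.abs(weights - dequantized)))
print("sparsity after pruning:", (pruned == 0).mean())
```

In practice both steps interact with predictive accuracy, which is exactly the trade-off the article's experiments explore.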
To Compress, or Not to Compress: Characterizing Deep Learning Model Compression for Embedded Inference
The recent advances in deep neural networks (DNNs) make them attractive for embedded systems. However, it can take a long time for DNNs to make an inference on resource-constrained computing devices. Model compression techniques can address the computation issue of deep inference on embedded devices. These techniques are highly attractive, as they rely neither on specialized hardware nor on computation offloading, which is often infeasible due to privacy concerns or high latency. However, it remains unclear how model compression techniques perform across a wide range of DNNs. To design efficient embedded deep learning solutions, we need to understand their behaviors. This work develops a quantitative approach to characterizing model compression techniques on a representative embedded deep learning architecture, the NVIDIA Jetson TX2. We perform extensive experiments on 11 influential neural network architectures from the image classification and natural language processing domains. We experimentally show how two mainstream compression techniques, data quantization and pruning, perform on these network architectures, and what the implications of compression are for model storage size, inference time, energy consumption, and performance metrics. We demonstrate that there are opportunities to achieve fast deep inference on embedded systems, but one must carefully choose the compression settings. Our results provide insights on when and how to apply model compression techniques, along with guidelines for designing efficient embedded deep learning systems.
How to train accurate BNNs for embedded systems?
A key enabler of deploying convolutional neural networks on
resource-constrained embedded systems is the binary neural network (BNN). BNNs
save on memory and simplify computation by binarizing both features and
weights. Unfortunately, binarization is inevitably accompanied by a severe
decrease in accuracy. To reduce the accuracy gap between binary and
full-precision networks, many repair methods have been proposed in the recent
past, which we have classified and put into a single overview in this chapter.
The repair methods are divided into two main branches, training techniques and
network topology changes, which can further be split into smaller categories.
The latter category introduces additional cost (energy consumption or
additional area) for an embedded system, while the former does not. From our
overview, we observe that progress has been made in reducing the accuracy gap,
but BNN papers are not aligned on what repair methods should be used to get
highly accurate BNNs. Therefore, this chapter contains an empirical review that
evaluates the benefits of many repair methods in isolation over the
ResNet-20/CIFAR-10 and ResNet-18/CIFAR-100 benchmarks. We found three repair
categories most beneficial: feature binarizer, feature normalization, and
double residual. Based on this review we discuss future directions and research
opportunities. We sketch the benefits and costs associated with BNNs on embedded
systems, because it remains to be seen whether BNNs will be able to close the
accuracy gap while staying highly energy-efficient on resource-constrained
embedded systems.
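The core binarization step the abstract refers to, and one repair from the feature-normalization category it highlights, can be sketched in plain NumPy. This is an illustrative toy layer, not the training pipeline of any reviewed paper:

```python
import numpy as np

def binarize(x):
    # Deterministic sign binarization: every value becomes +1 or -1
    # (zero is conventionally sent to +1).
    return np.where(x >= 0, 1.0, -1.0)

rng = np.random.default_rng(1)
features = rng.normal(size=(2, 8))
weights = rng.normal(size=(8, 4))

# Binary layer: with both operands in {-1, +1}, the dot product can be
# realized with XNOR + popcount on embedded hardware.
out = binarize(features) @ binarize(weights)

# A feature-normalization-style repair: centring features per channel
# before binarizing shifts the effective binarization threshold.
centred = features - features.mean(axis=0, keepdims=True)
out_repaired = binarize(centred) @ binarize(weights)
```

The accuracy gap arises because the sign function discards all magnitude information; the repair methods surveyed in the chapter try to recover part of it cheaply.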
An approximate randomization-based neural network with dedicated digital architecture for energy-constrained devices
Variable energy constraints affect the implementations of neural networks on battery-operated embedded systems. This
paper describes a learning algorithm for randomization-based neural networks with hard-limit activation functions. The
approach adopts a novel cost function that balances accuracy and network complexity during training. From an energy-specific perspective, the new learning strategy makes it possible to adjust, dynamically and in real time, the number of operations
during the network’s forward phase. The proposed learning scheme leads to efficient predictors supported by digital
architectures. The resulting digital architecture can switch to approximate computing at run time, in compliance with the
available energy budget. Experiments on 10 real-world prediction testbeds confirmed the effectiveness of the learning
scheme. Additional tests on limited-resource devices supported the implementation efficiency of the overall design
approach.
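The general shape of a randomization-based network with hard-limit activations can be sketched as follows. This is a generic illustration (random fixed hidden layer, trained linear readout) with invented data and sizes; the paper's specific cost function and its ten testbeds are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative regression task (the paper's real-world testbeds are not public here).
X = rng.normal(size=(200, 5))
y = np.sin(X @ rng.normal(size=5))

# Random hidden layer, fixed after initialization, with a hard-limit
# (Heaviside step) activation: each hidden unit outputs 0 or 1.
n_hidden = 64
W = rng.normal(size=(5, n_hidden))
b = rng.normal(size=n_hidden)
H = (X @ W + b > 0).astype(np.float64)

# Only the linear readout is trained, via regularized least squares;
# lam is an illustrative regularization constant.
lam = 1e-2
beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ y)

mse = np.mean((H @ beta - y) ** 2)
print(f"training MSE: {mse:.4f}")
```

Because the hidden activations are binary, the forward phase reduces to threshold comparisons and an accumulation, which is what makes run-time approximation and operation-count scaling attractive on energy-constrained hardware.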