Computing in the blink of an eye: Current possibilities for edge computing and hardware-agnostic programming
With the rapid advancement of the Internet of Things, systems combining sensing, communication, and computation are becoming ubiquitous. The systems built with these technologies are increasingly complex and therefore require more automation and intelligent decision-making, while often interacting with humans. It is thus critical that such interactions run smoothly in real time and that the automation strategies do not introduce significant delays, typically no larger than 100 milliseconds, roughly the duration of a blink of the human eye. Deploying algorithms on embedded devices closer to where data are collected, in order to avoid such delays, is one of the main motivations of edge computing. Further advantages of edge computing include improved reliability and data privacy management. This work showcases the possibilities of different embedded platforms that are often used as edge computing nodes: embedded microcontrollers, embedded microprocessors, FPGAs and embedded GPUs. The embedded solutions are compared with respect to their cost, complexity, energy consumption and computing speed, establishing valuable guidelines for designers of complex systems that need to make use of edge computing. Furthermore, this paper shows the possibilities of hardware-agnostic programming using OpenCL, illustrating the price to pay in efficiency when software must be easily deployable on different hardware platforms.
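As a rough illustration of the hardware-agnostic programming idea mentioned in this abstract, the sketch below runs the same OpenCL vector-addition kernel on whatever device a runtime exposes (CPU, embedded GPU, or FPGA with a vendor toolchain). It uses pyopencl; the kernel, array size, and device selection are illustrative assumptions, not code from the paper.

```python
# Minimal sketch: one OpenCL kernel, deployable on any device exposing an
# OpenCL runtime (CPU, embedded GPU, FPGA). Assumes pyopencl and numpy.
import numpy as np
import pyopencl as cl

KERNEL_SRC = """
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *out) {
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
"""

def run_vadd(n=1024):
    ctx = cl.create_some_context()          # picks any available OpenCL device
    queue = cl.CommandQueue(ctx)
    prg = cl.Program(ctx, KERNEL_SRC).build()

    a = np.random.rand(n).astype(np.float32)
    b = np.random.rand(n).astype(np.float32)
    out = np.empty_like(a)

    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

    prg.vadd(queue, a.shape, None, a_buf, b_buf, out_buf)
    cl.enqueue_copy(queue, out, out_buf)
    return out

if __name__ == "__main__":
    print(run_vadd()[:4])
```

The point of the sketch is that the kernel source is unchanged across targets; only the runtime's device choice (and the resulting efficiency) differs, which is the trade-off the paper quantifies.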
Convolutional Neural Networks on Embedded Automotive Platforms: A Qualitative Comparison
In the last decade, the rise of power-efficient, heterogeneous embedded platforms paved the way to the effective adoption of neural networks in several application domains. In particular, many-core accelerators (e.g., GPUs and FPGAs) are used to run Convolutional Neural Networks, e.g., in autonomous vehicles and Industry 4.0. At the same time, advanced research on neural networks is producing interesting results in computer vision applications, and NN packages for computer vision object detection and categorization such as YOLO, GoogleNet and AlexNet have reached an unprecedented level of accuracy and performance. With this work, we aim to validate the effectiveness and efficiency of the most recent networks on state-of-the-art embedded platforms, with commercial-off-the-shelf Systems-on-Chip such as the NVIDIA Tegra X2 and Xilinx Ultrascale+. In our vision, this work will support the choice of the most appropriate CNN package and computing system, and at the same time tries to “make some order” in the field.
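As a hedged illustration of the kind of measurement such a comparison involves, the sketch below times single-image CNN inference with PyTorch; on an embedded GPU (e.g., a Jetson-class board) the same script runs with device="cuda". The model constructors, input size, and timing loop are assumptions for illustration, not the authors' benchmark code.

```python
# Illustrative latency measurement for pretrained-style CNNs, in the spirit of
# the comparison above. Assumes a recent torch/torchvision installation.
import time
import torch
import torchvision.models as models

def measure_latency(model, device="cpu", iters=50, warmup=10):
    model = model.eval().to(device)
    x = torch.randn(1, 3, 224, 224, device=device)    # single 224x224 RGB frame
    with torch.no_grad():
        for _ in range(warmup):                        # warm up caches/allocators
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1000.0  # ms per inference

if __name__ == "__main__":
    for name, ctor in [("alexnet", models.alexnet), ("googlenet", models.googlenet)]:
        print(name, f"{measure_latency(ctor(weights=None)):.1f} ms")
```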
Recent Advances in Embedded Computing, Intelligence and Applications
The recent proliferation of Internet of Things deployments and edge computing, combined with artificial intelligence, has led to exciting new application scenarios in which embedded digital devices are essential enablers. Moreover, new powerful and efficient devices are appearing to cope with workloads formerly reserved for the cloud, such as deep learning. These devices allow processing close to where data are generated, avoiding bottlenecks due to communication limitations. The efficient integration of hardware, software and artificial intelligence capabilities deployed in real sensing contexts empowers the edge intelligence paradigm, which will ultimately foster the offloading of processing functionalities to the edge. In this Special Issue, researchers have contributed nine peer-reviewed papers covering a wide range of topics in the area of edge intelligence. Among them are hardware-accelerated implementations of deep neural networks, IoT platforms for extreme edge computing, neuro-evolvable and neuromorphic machine learning, and embedded recommender systems.
Adaptively Lossy Image Compression for Onboard Processing
More efficient image-compression codecs are an emerging requirement for spacecraft because increasingly complex, onboard image sensors can rapidly saturate the downlink bandwidth of communication transceivers. While these codecs reduce transmitted data volume, many are compute-intensive and require rapid processing to sustain sensor data rates. Emerging next-generation small satellite (SmallSat) computers provide compelling computational capability to enable more onboard processing and compression than previously considered. For this research, we apply two compression algorithms for deployment on modern flight hardware: (1) end-to-end, neural-network-based image compression (CNN-JPEG); and (2) adaptive image compression through feature-point detection (FPD-JPEG). These algorithms rely on intelligent data-processing pipelines that adapt to sensor data to compress it more effectively, ensuring efficient use of limited downlink bandwidths. The first algorithm, CNN-JPEG, employs a hybrid approach adapted from the literature combining convolutional neural networks (CNNs) and JPEG; however, we modify and tune the training scheme for satellite imagery to account for observed training instabilities. This hybrid CNN-JPEG approach shows 23.5% better average peak signal-to-noise ratio (PSNR) and 33.5% better average structural similarity index (SSIM) versus standard JPEG on a dataset collected on the Space Test Program – Houston 5 (STP-H5-CSP) mission onboard the International Space Station (ISS). For our second algorithm, we developed a novel adaptive image-compression pipeline based upon JPEG that leverages the Oriented FAST and Rotated BRIEF (ORB) feature-point detection algorithm to adaptively tune the compression ratio, allowing a tradeoff between PSNR/SSIM and combined file size over a batch of STP-H5-CSP images. We achieve a less than 1% drop in average PSNR and SSIM while reducing the combined file size by 29.6% compared to JPEG using a static quality factor (QF) of 90.
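To make the feature-point-driven adaptation concrete, the sketch below (not code from the paper) uses OpenCV's ORB detector to estimate how feature-rich a frame is and selects a JPEG quality factor accordingly; the thresholds and quality factors are illustrative assumptions.

```python
# Rough sketch of feature-point-driven adaptive JPEG compression, loosely
# following the FPD-JPEG idea described above. Assumes opencv-python (cv2)
# and numpy; thresholds and quality factors are made-up illustrative values.
import cv2
import numpy as np

def adaptive_jpeg(image_bgr, max_features=2000):
    """Encode an image as JPEG, spending more bits on feature-rich frames."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=max_features)
    keypoints = orb.detect(gray, None)            # FAST corners + orientation

    density = len(keypoints) / max_features       # crude "information" score
    if density > 0.5:
        quality = 90                              # many features: keep detail
    elif density > 0.2:
        quality = 75
    else:
        quality = 50                              # flat image: compress harder

    ok, encoded = cv2.imencode(".jpg", image_bgr,
                               [int(cv2.IMWRITE_JPEG_QUALITY), quality])
    assert ok
    return encoded, quality

if __name__ == "__main__":
    frame = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)
    data, qf = adaptive_jpeg(frame)
    print(f"QF={qf}, {data.size} bytes")
```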
Tiny Classifier Circuits: Evolving Accelerators for Tabular Data
A typical machine learning (ML) development cycle for edge computing is to maximise the performance during model training and then minimise the memory/area footprint of the trained model for deployment on edge devices targeting CPUs, GPUs, microcontrollers, or custom hardware accelerators. This paper proposes a methodology for automatically generating predictor circuits for classification of tabular data with comparable prediction performance to conventional ML techniques while using substantially fewer hardware resources and power. The proposed methodology uses an evolutionary algorithm to search over the space of logic gates and automatically generates a classifier circuit with maximised training prediction accuracy. Classifier circuits are so tiny (i.e., consisting of no more than 300 logic gates) that they are called "Tiny Classifier" circuits, and can efficiently be implemented in ASIC or on an FPGA. We empirically evaluate the automatic Tiny Classifier circuit generation methodology, or "Auto Tiny Classifiers", on a wide range of tabular datasets, and compare it against conventional ML techniques such as Amazon's AutoGluon, Google's TabNet and a neural search over Multi-Layer Perceptrons. Despite Tiny Classifiers being constrained to a few hundred logic gates, we observe no statistically significant difference in prediction performance in comparison to the best-performing ML baseline. When synthesised as a silicon chip, Tiny Classifiers use 8-18x less area and 4-8x less power. When implemented as an ultra-low-cost chip on a flexible substrate (i.e., FlexIC), they occupy 10-75x less area and consume 13-75x less power compared to the most hardware-efficient ML baseline. On an FPGA, Tiny Classifiers consume 3-11x fewer resources.
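As a toy illustration of the evolutionary search this abstract describes, the sketch below evolves a small netlist of two-input logic gates that classifies binarised tabular features; the gate set, circuit size, mutation scheme, and dataset are illustrative assumptions and are far simpler than the paper's actual methodology.

```python
# Toy evolutionary search over a netlist of 2-input logic gates, in the spirit
# of "Auto Tiny Classifiers". Inputs are binarised features; the last gate's
# output is the predicted class. All hyperparameters are illustrative.
import random

GATES = {
    "AND":  lambda a, b: a & b,
    "OR":   lambda a, b: a | b,
    "XOR":  lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
}

def random_circuit(n_inputs, n_gates=12):
    # Each gate: (gate_name, src_a, src_b); sources index inputs or earlier gates.
    return [(random.choice(list(GATES)),
             random.randrange(n_inputs + i),
             random.randrange(n_inputs + i)) for i in range(n_gates)]

def evaluate(circuit, x):
    signals = list(x)
    for name, a, b in circuit:
        signals.append(GATES[name](signals[a], signals[b]))
    return signals[-1]                       # last gate output = prediction

def accuracy(circuit, X, y):
    return sum(evaluate(circuit, x) == t for x, t in zip(X, y)) / len(y)

def mutate(circuit, n_inputs):
    c = list(circuit)
    i = random.randrange(len(c))
    c[i] = (random.choice(list(GATES)),
            random.randrange(n_inputs + i),
            random.randrange(n_inputs + i))
    return c

def evolve(X, y, n_gates=12, generations=500):
    n_inputs = len(X[0])
    best = random_circuit(n_inputs, n_gates)
    best_acc = accuracy(best, X, y)
    for _ in range(generations):             # simple (1+1) evolutionary strategy
        cand = mutate(best, n_inputs)
        acc = accuracy(cand, X, y)
        if acc >= best_acc:
            best, best_acc = cand, acc
    return best, best_acc

if __name__ == "__main__":
    # Toy dataset: label is the XOR of the first two binarised features.
    X = [[random.randint(0, 1) for _ in range(4)] for _ in range(200)]
    y = [x[0] ^ x[1] for x in X]
    _, acc = evolve(X, y)
    print(f"training accuracy: {acc:.2f}")
```

The resulting netlist maps directly onto gates, which is why such circuits can be synthesised for ASIC, FlexIC, or FPGA targets with very small area and power budgets.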