DORY: Automatic End-to-End Deployment of Real-World DNNs on Low-Cost IoT MCUs
The deployment of Deep Neural Networks (DNNs) on end-nodes at the extreme
edge of the Internet-of-Things is a critical enabler to support pervasive Deep
Learning-enhanced applications. Low-Cost MCU-based end-nodes have limited
on-chip memory and often replace caches with scratchpads, to reduce area
overheads and increase energy efficiency -- requiring explicit DMA-based memory
transfers between different levels of the memory hierarchy. Mapping modern DNNs
on these systems requires aggressive topology-dependent tiling and
double-buffering. In this work, we propose DORY (Deployment Oriented to memoRY)
- an automatic tool to deploy DNNs on low-cost MCUs with typically less than
1MB of on-chip SRAM memory. DORY abstracts tiling as a Constraint Programming
(CP) problem: it maximizes L1 memory utilization under the topological
constraints imposed by each DNN layer. Then, it generates ANSI C code to
orchestrate off- and on-chip transfers and computation phases. Furthermore, to
maximize speed, DORY augments the CP formulation with heuristics promoting
performance-effective tile sizes. As a case study for DORY, we target
GreenWaves Technologies GAP8, one of the most advanced parallel ultra-low power
MCU-class devices on the market. On this device, DORY achieves up to 2.5x
better MAC/cycle than the GreenWaves proprietary software solution and 18.1x
better than the state-of-the-art result on an STM32-F746 MCU on single layers.
Using our tool, GAP8 can perform end-to-end inference of a 1.0-MobileNet-128
network consuming just 63 pJ/MAC on average @ 4.3 fps - 15.4x better than an
STM32-F746. We release all our developments - the DORY framework, the optimized
backend kernels, and the related heuristics - as open-source software.
Comment: 14 pages, 12 figures, 4 tables, 2 listings. Accepted for publication
in IEEE Transactions on Computers
(https://ieeexplore.ieee.org/document/9381618)
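The core of DORY's approach can be illustrated with a small sketch. The real tool formulates tiling as a Constraint Programming problem with performance heuristics; the brute-force search below only illustrates the objective (maximize L1 utilization) and the constraints (double-buffered input/output tiles plus weights must fit in L1). All layer and memory sizes here are illustrative, not taken from the paper.

```python
# Toy version of DORY's tiling objective for one convolutional layer:
# pick tile sizes so that double-buffered I/O tiles and the weight tile
# fit in L1, while using as much of L1 as possible.

def tile_search(h, w, c_in, c_out, k, l1_bytes, elem=1):
    """Return (tile_h, tile_w, tile_cout, used_bytes) maximizing L1 use."""
    best = None
    for th in range(1, h + 1):
        for tw in range(1, w + 1):
            for tco in range(1, c_out + 1):
                # buffer sizes for one tile (input halo of k-1 per spatial dim)
                in_b = (th + k - 1) * (tw + k - 1) * c_in * elem
                w_b = k * k * c_in * tco * elem
                out_b = th * tw * tco * elem
                used = 2 * (in_b + out_b) + w_b  # x2: double-buffered I/O
                if used <= l1_bytes and (best is None or used > best[3]):
                    best = (th, tw, tco, used)
    return best

tile = tile_search(h=32, w=32, c_in=16, c_out=32, k=3, l1_bytes=64 * 1024)
print(tile)
```

A real CP solver would prune this search space and add the paper's heuristics favoring tile sizes that map well onto the backend kernels.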
Human activity recognition: suitability of a neuromorphic approach for on-edge AIoT applications
Human activity recognition (HAR) is a classification problem involving time-dependent signals produced by body monitoring, and its application domain covers all aspects of human life, from healthcare to sport, from safety to smart environments. As such, it is naturally well suited for on-edge deployment of personalized point-of-care (POC) analyses or other services tailored to the user. However, typical smart and wearable devices suffer from significant energy-consumption limitations, and this severely hinders the successful employment of edge computing for tasks like HAR. In this paper, we investigate how this problem can be mitigated by adopting a neuromorphic approach. By comparing optimized classifiers based on traditional deep neural network (DNN) architectures as well as on recent alternatives like the Legendre Memory Unit (LMU), we show how spiking neural networks (SNNs) can effectively deal with the temporal signals typical of HAR, providing high performance at a low energy cost. By carrying out an application-oriented hyperparameter optimization, we also propose a methodology that can be flexibly extended to different domains, enlarging the range of neuro-inspired classifiers suitable for on-edge artificial intelligence of things (AIoT) applications.
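The event-driven dynamics that make SNNs attractive for temporal HAR signals can be sketched with a single leaky integrate-and-fire (LIF) neuron: it integrates its input over time and emits sparse spikes, so computation (and energy) is spent only on events. This toy simulation is purely illustrative, not the paper's optimized classifier; the time constant and threshold below are made-up values.

```python
# Toy leaky integrate-and-fire (LIF) neuron: leaky integration of an
# input signal, with a spike and membrane reset on threshold crossing.

def lif_spikes(inputs, tau=0.9, threshold=1.0):
    """Simulate one LIF neuron over a sampled signal; return the spike train."""
    v, spikes = 0.0, []
    for x in inputs:
        v = tau * v + x          # leaky integration of the input current
        if v >= threshold:       # fire and reset on threshold crossing
            spikes.append(1)
            v = 0.0
        else:
            spikes.append(0)
    return spikes

print(lif_spikes([0.3, 0.4, 0.5, 0.1, 0.9, 0.2]))  # -> [0, 0, 1, 0, 0, 1]
```

The sparse binary output is what neuromorphic hardware exploits: idle time steps cost almost nothing, in contrast to a DNN that computes densely at every step.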
2022 roadmap on neuromorphic computing and engineering
Modern computation based on the von Neumann architecture is now a mature, cutting-edge science. In the von Neumann architecture, processing and memory units are implemented as separate blocks interchanging data intensively and continuously. This data transfer is responsible for a large part of the power consumption. The next generation of computer technology is expected to solve problems at the exascale, with 10^18 calculations each second. Even though these future computers will be incredibly powerful, if they are based on von Neumann type architectures, they will consume between 20 and 30 megawatts of power and will not have intrinsic physically built-in capabilities to learn or deal with complex data as our brain does. These needs can be addressed by neuromorphic computing systems, which are inspired by the biological concepts of the human brain. This new generation of computers has the potential to store and process large amounts of digital information with much lower power consumption than conventional processors. Among their potential future applications, an important niche is moving control from data centers to edge devices. The aim of this roadmap is to present a snapshot of the present state of neuromorphic technology and provide an opinion on the challenges and opportunities that the future holds in the major areas of neuromorphic technology, namely materials, devices, neuromorphic circuits, neuromorphic algorithms, applications, and ethics. The roadmap is a collection of perspectives where leading researchers in the neuromorphic community provide their own view of the current state and the future challenges for each research area. We hope that this roadmap will be a useful resource, providing a concise yet comprehensive introduction for readers outside this field and for those who are just entering it, as well as future perspectives for those who are well established in the neuromorphic computing community.
Lightweight Neural Architecture Search for Temporal Convolutional Networks at the Edge
Neural Architecture Search (NAS) is quickly becoming the go-to approach to
optimize the structure of Deep Learning (DL) models for complex tasks such as
Image Classification or Object Detection. However, many other relevant
applications of DL, especially at the edge, are based on time-series processing
and require models with unique features, for which NAS is less explored. This
work focuses in particular on Temporal Convolutional Networks (TCNs), a
convolutional model for time-series processing that has recently emerged as a
promising alternative to more complex recurrent architectures. We propose the
first NAS tool that explicitly targets the optimization of the most peculiar
architectural parameters of TCNs, namely dilation, receptive field, and number
of features in each layer. The proposed approach searches for networks that
offer good trade-offs between accuracy and number of parameters/operations,
enabling an efficient deployment on embedded platforms. We test the proposed
NAS on four real-world, edge-relevant tasks, involving audio and bio-signals.
Results show that, starting from a single seed network, our method is capable
of obtaining a rich collection of Pareto optimal architectures, among which we
obtain models with the same accuracy as the seed, and 15.9-152x fewer
parameters. Compared to three state-of-the-art NAS tools, ProxylessNAS,
MorphNet and FBNetV2, our method explores a larger search space for TCNs (up to
10^12x) and obtains superior solutions, while requiring low GPU memory and
search time. We deploy our NAS outputs on two distinct edge devices, the
multicore GreenWaves Technology GAP8 IoT processor and the single-core
STMicroelectronics STM32H7 microcontroller. With respect to
state-of-the-art hand-tuned models, we reduce latency and energy by up to 5.5x
and 3.8x on the two targets, respectively, without any accuracy loss.
Comment: Accepted for publication in IEEE Transactions on Computers