835 research outputs found
A 64mW DNN-based Visual Navigation Engine for Autonomous Nano-Drones
Fully-autonomous miniaturized robots (e.g., drones), with artificial
intelligence (AI) based visual navigation capabilities are extremely
challenging drivers of Internet-of-Things edge intelligence capabilities.
Visual navigation based on AI approaches, such as deep neural networks (DNNs)
are becoming pervasive for standard-size drones, but are considered out of
reach for nanodrones with size of a few cm. In this work, we
present the first (to the best of our knowledge) demonstration of a navigation
engine for autonomous nano-drones capable of closed-loop end-to-end DNN-based
visual navigation. To achieve this goal we developed a complete methodology for
parallel execution of complex DNNs directly on-bard of resource-constrained
milliwatt-scale nodes. Our system is based on GAP8, a novel parallel
ultra-low-power computing platform, and a 27 g commercial, open-source
CrazyFlie 2.0 nano-quadrotor. As part of our general methodology we discuss the
software mapping techniques that enable the state-of-the-art deep convolutional
neural network presented in [1] to be fully executed on-board within a strict 6
fps real-time constraint with no compromise in terms of flight results, while
all processing is done with only 64 mW on average. Our navigation engine is
flexible and can be used to span a wide performance range: at its peak
performance corner it achieves 18 fps while still consuming on average just
3.5% of the power envelope of the deployed nano-aircraft.Comment: 15 pages, 13 figures, 5 tables, 2 listings, accepted for publication
in the IEEE Internet of Things Journal (IEEE IOTJ
Bridging the Gap Between Neural Networks and Neuromorphic Hardware with A Neural Network Compiler
Different from developing neural networks (NNs) for general-purpose
processors, the development for NN chips usually faces with some
hardware-specific restrictions, such as limited precision of network signals
and parameters, constrained computation scale, and limited types of non-linear
functions.
This paper proposes a general methodology to address the challenges. We
decouple the NN applications from the target hardware by introducing a compiler
that can transform an existing trained, unrestricted NN into an equivalent
network that meets the given hardware's constraints. We propose multiple
techniques to make the transformation adaptable to different kinds of NN chips,
and reliable for restrict hardware constraints.
We have built such a software tool that supports both spiking neural networks
(SNNs) and traditional artificial neural networks (ANNs). We have demonstrated
its effectiveness with a fabricated neuromorphic chip and a
processing-in-memory (PIM) design. Tests show that the inference error caused
by this solution is insignificant and the transformation time is much shorter
than the retraining time. Also, we have studied the parameter-sensitivity
evaluations to explore the tradeoffs between network error and resource
utilization for different transformation strategies, which could provide
insights for co-design optimization of neuromorphic hardware and software.Comment: Accepted by ASPLOS 201
Always-On 674uW @ 4GOP/s Error Resilient Binary Neural Networks with Aggressive SRAM Voltage Scaling on a 22nm IoT End-Node
Binary Neural Networks (BNNs) have been shown to be robust to random
bit-level noise, making aggressive voltage scaling attractive as a power-saving
technique for both logic and SRAMs. In this work, we introduce the first fully
programmable IoT end-node system-on-chip (SoC) capable of executing
software-defined, hardware-accelerated BNNs at ultra-low voltage. Our SoC
exploits a hybrid memory scheme where error-vulnerable SRAMs are complemented
by reliable standard-cell memories to safely store critical data under
aggressive voltage scaling. On a prototype in 22nm FDX technology, we
demonstrate that both the logic and SRAM voltage can be dropped to 0.5Vwithout
any accuracy penalty on a BNN trained for the CIFAR-10 dataset, improving
energy efficiency by 2.2X w.r.t. nominal conditions. Furthermore, we show that
the supply voltage can be dropped to 0.42V (50% of nominal) while keeping more
than99% of the nominal accuracy (with a bit error rate ~1/1000). In this
operating point, our prototype performs 4Gop/s (15.4Inference/s on the CIFAR-10
dataset) by computing up to 13binary ops per pJ, achieving 22.8 Inference/s/mW
while keeping within a peak power envelope of 674uW - low enough to enable
always-on operation in ultra-low power smart cameras, long-lifetime
environmental sensors, and insect-sized pico-drones.Comment: Submitted to ISICAS2020 journal special issu
Towards Efficient In-memory Computing Hardware for Quantized Neural Networks: State-of-the-art, Open Challenges and Perspectives
The amount of data processed in the cloud, the development of
Internet-of-Things (IoT) applications, and growing data privacy concerns force
the transition from cloud-based to edge-based processing. Limited energy and
computational resources on edge push the transition from traditional von
Neumann architectures to In-memory Computing (IMC), especially for machine
learning and neural network applications. Network compression techniques are
applied to implement a neural network on limited hardware resources.
Quantization is one of the most efficient network compression techniques
allowing to reduce the memory footprint, latency, and energy consumption. This
paper provides a comprehensive review of IMC-based Quantized Neural Networks
(QNN) and links software-based quantization approaches to IMC hardware
implementation. Moreover, open challenges, QNN design requirements,
recommendations, and perspectives along with an IMC-based QNN hardware roadmap
are provided
- …