471 research outputs found
A 64mW DNN-based Visual Navigation Engine for Autonomous Nano-Drones
Fully-autonomous miniaturized robots (e.g., drones), with artificial
intelligence (AI) based visual navigation capabilities are extremely
challenging drivers of Internet-of-Things edge intelligence capabilities.
Visual navigation based on AI approaches, such as deep neural networks (DNNs)
are becoming pervasive for standard-size drones, but are considered out of
reach for nanodrones with size of a few cm. In this work, we
present the first (to the best of our knowledge) demonstration of a navigation
engine for autonomous nano-drones capable of closed-loop end-to-end DNN-based
visual navigation. To achieve this goal we developed a complete methodology for
parallel execution of complex DNNs directly on-bard of resource-constrained
milliwatt-scale nodes. Our system is based on GAP8, a novel parallel
ultra-low-power computing platform, and a 27 g commercial, open-source
CrazyFlie 2.0 nano-quadrotor. As part of our general methodology we discuss the
software mapping techniques that enable the state-of-the-art deep convolutional
neural network presented in [1] to be fully executed on-board within a strict 6
fps real-time constraint with no compromise in terms of flight results, while
all processing is done with only 64 mW on average. Our navigation engine is
flexible and can be used to span a wide performance range: at its peak
performance corner it achieves 18 fps while still consuming on average just
3.5% of the power envelope of the deployed nano-aircraft.Comment: 15 pages, 13 figures, 5 tables, 2 listings, accepted for publication
in the IEEE Internet of Things Journal (IEEE IOTJ
An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics
Near-sensor data analytics is a promising direction for IoT endpoints, as it
minimizes energy spent on communication and reduces network load - but it also
poses security concerns, as valuable data is stored or sent over the network at
various stages of the analytics pipeline. Using encryption to protect sensitive
data at the boundary of the on-chip analytics engine is a way to address data
security issues. To cope with the combined workload of analytics and encryption
in a tight power envelope, we propose Fulmine, a System-on-Chip based on a
tightly-coupled multi-core cluster augmented with specialized blocks for
compute-intensive data processing and encryption functions, supporting software
programmability for regular computing tasks. The Fulmine SoC, fabricated in
65nm technology, consumes less than 20mW on average at 0.8V achieving an
efficiency of up to 70pJ/B in encryption, 50pJ/px in convolution, or up to
25MIPS/mW in software. As a strong argument for real-life flexible application
of our platform, we show experimental results for three secure analytics use
cases: secure autonomous aerial surveillance with a state-of-the-art deep CNN
consuming 3.16pJ per equivalent RISC op; local CNN-based face detection with
secured remote recognition in 5.74pJ/op; and seizure detection with encrypted
data collection from EEG within 12.7pJ/op.Comment: 15 pages, 12 figures, accepted for publication to the IEEE
Transactions on Circuits and Systems - I: Regular Paper
Always-On 674uW @ 4GOP/s Error Resilient Binary Neural Networks with Aggressive SRAM Voltage Scaling on a 22nm IoT End-Node
Binary Neural Networks (BNNs) have been shown to be robust to random
bit-level noise, making aggressive voltage scaling attractive as a power-saving
technique for both logic and SRAMs. In this work, we introduce the first fully
programmable IoT end-node system-on-chip (SoC) capable of executing
software-defined, hardware-accelerated BNNs at ultra-low voltage. Our SoC
exploits a hybrid memory scheme where error-vulnerable SRAMs are complemented
by reliable standard-cell memories to safely store critical data under
aggressive voltage scaling. On a prototype in 22nm FDX technology, we
demonstrate that both the logic and SRAM voltage can be dropped to 0.5Vwithout
any accuracy penalty on a BNN trained for the CIFAR-10 dataset, improving
energy efficiency by 2.2X w.r.t. nominal conditions. Furthermore, we show that
the supply voltage can be dropped to 0.42V (50% of nominal) while keeping more
than99% of the nominal accuracy (with a bit error rate ~1/1000). In this
operating point, our prototype performs 4Gop/s (15.4Inference/s on the CIFAR-10
dataset) by computing up to 13binary ops per pJ, achieving 22.8 Inference/s/mW
while keeping within a peak power envelope of 674uW - low enough to enable
always-on operation in ultra-low power smart cameras, long-lifetime
environmental sensors, and insect-sized pico-drones.Comment: Submitted to ISICAS2020 journal special issu
An Open Source and Open Hardware Deep Learning-Powered Visual Navigation Engine for Autonomous Nano-UAVs
Nano-size unmanned aerial vehicles (UAVs), with few centimeters of diameter and sub-10 Watts of total power budget, have so far been considered incapable of running sophisticated visual-based autonomous navigation software without external aid from base-stations, ad-hoc local positioning infrastructure, and powerful external computation servers. In this work, we present what is, to the best of our knowledge, the first 27g nano-UAV system able to run aboard an end-to-end, closed-loop visual pipeline for autonomous navigation based on a state-of-the-art deep-learning algorithm, built upon the open-source CrazyFlie 2.0 nano-quadrotor. Our visual navigation engine is enabled by the combination of an ultra-low power computing device (the GAP8 system-on-chip) with a novel methodology for the deployment of deep convolutional neural networks (CNNs). We enable onboard real-time execution of a state-of-the-art deep CNN at up to 18Hz. Field experiments demonstrate that the system's high responsiveness prevents collisions with unexpected dynamic obstacles up to a flight speed of 1.5m/s. In addition, we also demonstrate the capability of our visual navigation engine of fully autonomous indoor navigation on a 113m previously unseen path. To share our key findings with the embedded and robotics communities and foster further developments in autonomous nano-UAVs, we publicly release all our code, datasets, and trained networks
Computing in the blink of an eye: Current possibilities for edge computing and hardware-agnostic programming
With the rapid advancements of the internet of things, systems including sensing, communication, and computation become ubiquitous. The systems that are built with these technologies are increasingly complex and therefore require more automation and intelligent decision-making, while often including contact with humans. It is thus critical that such interactions run smoothly in real time, and that the automation strategies do not introduce important delays, usually not larger than 100 milliseconds, as the blink of a human eye. Pushing the deployment of the algorithms on embedded devices closer to where data is collected to avoid delays is one of the main motivations of edge computing. Further advantages of edge computing include improved reliability and data privacy management. This work showcases the possibilities of different embedded platforms that are often used as edge computing nodes: embedded microcontrollers, embedded microprocessors, FPGAs and embedded GPUs. The embedded solutions are compared with respect to their cost, complexity, energy consumption and computing speed establishing valuable guidelines for designers of complex systems that need to make use of edge computing. Furthermore, this paper shows the possibilities of hardware-agnostic programming using OpenCL, illustrating the price to pay in efficiency when software can be easily deployed on different hardware platforms
Sub-pJ per operation scalable computing: The PULP experience
none1noUltra-low power operation and extreme energy efficiency are strong requirements for a number of high-growth Internet of-Things (IoT) applications requiring near-sensor processing. A promising approach to achieve major energy efficiency improvements is near-threshold computing. However, frequency degradation due to aggressive voltage scaling may not be acceptable for performance-constrained applications. The PULP platform leverages multi-core parallelism with explicitly-managed shared L1 memory to overcome performance degradation at low voltage, while maintaining the flexibility and programmability typical of instruction processors. PULP supports OpenMP, OpenCL, and OpenVX parallel programming with hardware support for energy efficient synchronization. Multiple silicon implementations of PULP have been taped out and achieve hundreds of GOPS/W on video, audio, inertial sensor data processing and classification, within power envelopes of a few milliwatts. PULP hardware and software are open-source, with the goal of supporting and boosting an innovation ecosystem focusing on ULP computing for the IoT.openRossi, DavideRossi, David
XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference
Binary Neural Networks (BNNs) are promising to deliver accuracy comparable to
conventional deep neural networks at a fraction of the cost in terms of memory
and energy. In this paper, we introduce the XNOR Neural Engine (XNE), a fully
digital configurable hardware accelerator IP for BNNs, integrated within a
microcontroller unit (MCU) equipped with an autonomous I/O subsystem and hybrid
SRAM / standard cell memory. The XNE is able to fully compute convolutional and
dense layers in autonomy or in cooperation with the core in the MCU to realize
more complex behaviors. We show post-synthesis results in 65nm and 22nm
technology for the XNE IP and post-layout results in 22nm for the full MCU
indicating that this system can drop the energy cost per binary operation to
21.6fJ per operation at 0.4V, and at the same time is flexible and performant
enough to execute state-of-the-art BNN topologies such as ResNet-34 in less
than 2.2mJ per frame at 8.9 fps.Comment: 11 pages, 8 figures, 2 tables, 3 listings. Accepted for presentation
at CODES'18 and for publication in IEEE Transactions on Computer-Aided Design
of Circuits and Systems (TCAD) as part of the ESWEEK-TCAD special issu
A Multi-Precision Bit-Serial Hardware Accelerator IP for Deep Learning Enabled Internet-of-Things
Deep Neural Networks (DNNs) computation-hungry algorithms demand hardware platforms capable of meeting rigid power and timing requirements.
We introduce the Serial-MAC-engine (SMAC-engine), a fully-digital hardware accelerator for inference of quantized DNNs suitable for integration in a heterogeneous System-on-Chip (SoC). The accelerator is completely embedded in the form of a Hardware Processing Engine (HWPE) in the PULPissimo platform, a RISCV-based programmable architecture that targets the computational requirements of IoT applications. The SMAC-engine supports configurable precision for both weights (8/6/4 bits) and activations (8/4 bits), with scalable performance. Results in 65 nm technology demonstrate that the serial-MAC approach enables the accelerator to achieve a maximum throughput of 14.28 GMAC/s, consuming 0.58 pJ/MAC @ 1.0 V when operating at a precision of 4 bits for weights and 8 bits for activations
Deploying RIOT operating system on a reconfigurable Internet of Things end-device
Dissertação de mestrado integrado em Engenharia Eletrónica Industrial e ComputadoresThe Internet of Everything (IoE) is enabling the connection of an infinity of
physical objects to the Internet, and has the potential to connect every single
existing object in the world. This empowers a market with endless opportunities
where the big players are forecasting, by 2020, more than 50 billion connected
devices, representing an 8 trillion USD market.
The IoE is a broad concept that comprises several technological areas and will
certainly, include more in the future. Some of those already existing fields are the
Internet of Energy related with the connectivity of electrical power grids, Internet
of Medical Things (IoMT), for instance, enables patient monitoring, Internet of
Industrial Things (IoIT), which is dedicated to industrial plants, and the Internet
of Things (IoT) that focus on the connection of everyday objects (e.g. home
appliances, wearables, transports, buildings, etc.) to the Internet.
The diversity of scenarios where IoT can be deployed, and consequently the
different constraints associated to each device, leads to a heterogeneous network
composed by several communication technologies and protocols co-existing on the
same physical space. Therefore, the key requirements of an IoT network are
the connectivity and the interoperability between devices. Such requirement is
achieved by the adoption of standard protocols and a well-defined lightweight network
stack. Due to the adoption of a standard network stack, the data processed
and transmitted between devices tends to increase. Because most of the devices
connected are resource constrained, i.e., low memory, low processing capabilities,
available energy, the communication can severally decrease the device’s performance.
Hereupon, to tackle such issues without sacrificing other important requirements,
this dissertation aims to deploy an operating system (OS) for IoT, the
RIOT-OS, while providing a study on how network-related tasks can benefit from
hardware accelerators (deployed on reconfigurable technology), specially designed
to process and filter packets received by an IoT device.O conceito Internet of Everything (IoE) permite a conexão de uma infinidade
de objetos à Internet e tem o potencial de conectar todos os objetos existentes no
mundo. Favorecendo assim o aparecimento de novos mercados e infinitas possibilidades,
em que os grandes intervenientes destes mercados preveem até 2020 a
conexão de mais de 50 mil milhões de dispositivos, representando um mercado de
8 mil milhões de dólares.
IoE é um amplo conceito que inclui várias áreas tecnológicas e irá certamente
incluir mais no futuro. Algumas das áreas já existentes são: a Internet of Energy
relacionada com a conexão de redes de transporte e distribuição de energia à
Internet; Internet of Medical Things (IoMT), que possibilita a monotorização de
pacientes; Internet of Industrial Things (IoIT), dedicada a instalações industriais
e a Internet of Things (IoT), que foca na conexão de objetos do dia-a-dia (e.g.
eletrodomésticos, wearables, transportes, edifícios, etc.) à Internet.
A diversidade de cenários à qual IoT pode ser aplicado, e consequentemente,
as diferentes restrições aplicadas a cada dispositivo, levam à criação de uma rede
heterogénea composto por diversas tecnologias de comunicação e protocolos a coexistir
no mesmo espaço físico. Desta forma, os requisitos chave aplicados às redes
IoT são a conectividade e interoperabilidade entre dispositivos. Estes requisitos
são atingidos com a adoção de protocolos standard e pilhas de comunicação bem
definidas. Com a adoção de pilhas de comunicação standard, a informação processada
e transmitida entre dispostos tende a aumentar. Visto que a maioria dos
dispositivos conectados possuem escaços recursos, i.e., memória reduzida, baixa
capacidade de processamento, pouca energia disponível, o aumento da capacidade
de comunicação pode degradar o desempenho destes dispositivos.
Posto isto, para lidar com estes problemas e sem sacrificar outros requisitos importantes,
esta dissertação pretende fazer o porting de um sistema operativo IoT,
o RIOT, para uma solução reconfigurável, o CUTE mote. O principal objetivo
consiste na realização de um estudo sobre os benefícios que as tarefas relacionadas
com as camadas de rede podem ter ao serem executadas em hardware via aceleradores
dedicados. Estes aceleradores são especialmente projetados para processar
e filtrar pacotes de dados provenientes de uma interface radio em redes IoT periféricas
- …