Methods and Applications for Low-power Deep Neural Networks on Edge Devices
The abstract is provided in the attachment.
A Non-conventional Sum-and-Max based Neural Network layer for Low Power Classification
The increasing need for small, low-power Deep Neural Networks (DNNs) in edge computing applications motivates the investigation of new architectures that perform well on low-resource/mobile devices. To this aim, many different structures have been proposed in the literature, mainly targeting a reduction in the cost introduced by the Multiply and Accumulate (MAC) primitive. In this work, a DNN layer based on the novel Sum and Max (SAM) paradigm is proposed. It requires neither multiplications nor the insertion of complex non-linear operations. Furthermore, it is particularly amenable to aggressive pruning, and thus needs very few parameters to work. The layer is tested on a simple classification task and its cost is compared with that of a classic MAC-based DNN layer of equivalent accuracy, in order to assess the resource savings that this new structure could introduce.
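The abstract does not give the exact layer equations. As a rough sketch only, assuming the "sum and max" primitive replaces the multiply with an addition and the accumulate with a maximum (a tropical/morphological form, which is one common multiplication-free design), a SAM-style neuron could look like this; the function name and shapes are illustrative, not the paper's API:

```python
import numpy as np

def sam_layer(x, W):
    """Sketch of a Sum-and-Max neuron: each output is the maximum of
    (weight + input) sums, so no multiplications are needed."""
    # W has shape (n_out, n_in); broadcasting adds x to every row of W
    return np.max(W + x, axis=1)

x = np.array([0.2, -0.5, 1.0])
W = np.zeros((2, 3))            # toy weights for illustration
print(sam_layer(x, W))          # -> [1. 1.]
```

The appeal on low-power hardware is that additions and comparisons are far cheaper than multiplications in both silicon area and energy.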
Multiply-And-Max/min Neurons at the Edge: Pruned Autoencoder Implementation
In response to the increasing interest in Internet of Things (IoT) applications, several studies explore ways to reduce the size of Deep Neural Networks (DNNs) to allow implementations on edge devices with strongly constrained resources. To this aim, pruning removes redundant interconnections between neurons, thus reducing a DNN's memory footprint and computational complexity while minimizing the performance loss. In recent years, many works presenting new pruning techniques and prunable architectures have been proposed, but relatively little effort has been devoted to implementing and validating their performance on hardware. Recently, we introduced neurons based on the Multiply-And-Max/min (MAM) map-reduce paradigm. When state-of-the-art unstructured pruning techniques are applied, MAM-based neurons have shown better pruning capabilities compared to standard neurons based on the Multiply and Accumulate (MAC) paradigm. In this work, we implement MAM on-device for the first time to demonstrate the feasibility of MAM-based DNNs at the Edge. In particular, as a case study, we implement an autoencoder for electrocardiogram (ECG) signals on a low-end microcontroller unit (MCU), namely the STM32F767ZI based on the ARM Cortex-M7. We show that the tail of a pruned MAM-based autoencoder fits on the targeted device while keeping good reconstruction accuracy (average Signal-to-Noise Ratio of 32.6 dB), where a standard MAC-based implementation with the same accuracy would not. Furthermore, the implemented MAM-based layer guarantees lower energy consumption and inference time compared to the MAC-based layer at the same level of performance.
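To make the map-reduce idea concrete: where a MAC neuron sums all input-weight products, a Multiply-And-Max/min neuron keeps only the extreme products. A minimal sketch, assuming the reduce step is the sum of the maximum and minimum product (the names and the exact reduction are illustrative, not taken from the abstract):

```python
import numpy as np

def mac_neuron(x, w):
    """Standard Multiply-And-Accumulate: sum of all products."""
    return np.sum(w * x)

def mam_neuron(x, w):
    """Multiply-And-Max/min sketch: only the largest and smallest
    products contribute, which is why most weights can be pruned."""
    p = w * x
    return p.max() + p.min()

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 0.25])
# products: [0.5, -2.0, 0.75] -> max 0.75, min -2.0
print(mam_neuron(x, w))   # -> -1.25
```

Because only two products survive the reduction, weights that never produce an extreme value contribute nothing, which intuitively explains the strong compatibility with aggressive unstructured pruning reported above.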
Aggressively prunable MAM²-based Deep Neural Oracle for ECG acquisition by Compressed Sensing
The growing interest in Internet of Things (IoT) and mobile biomedical applications is pushing the investigation of approaches that can reduce the energy consumed while acquiring data. Compressed Sensing (CS) is a technique that reduces the energy required for the acquisition and compression of a sparse signal by transferring the complexity to the reconstruction stage. Many works leverage Deep Neural Networks (DNNs) for signal reconstruction and, assuming that this operation also has to be performed on an IoT device, the DNN architecture must fit in small, low-energy devices. Pruning techniques, which can reduce the size of DNNs by removing unnecessary parameters and thus decreasing storage requirements, can be of great help in this effort. In this work, a novel Multiply and Max&Min (MAM²) map-reduce paradigm, trained with the vanishing contributions technique and then pruned with the activation rate method, is proposed. The result is a naturally and aggressively pruned DNN layer structure. This structure is used to reduce the complexity of a DNN-based CS reconstructor and its performance is verified. As an example, MAM²-based layers still retain the baseline accuracy of the CS decoder with 94% of the parameters pruned, against 25% when using classic MAC-based layers only.
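The CS split of work described above (cheap acquisition on the sensor, expensive reconstruction elsewhere) can be shown with the standard linear measurement model; this is textbook CS, not the paper's specific decoder, and the dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 128, 32                      # signal length, measurements (m << n)
A = rng.standard_normal((m, n))     # random sensing matrix, fixed on the sensor

x = np.zeros(n)                     # sparse signal: only 3 nonzero samples
x[[5, 40, 90]] = [1.0, -2.0, 0.5]

# Acquisition is a single cheap matrix-vector product on the IoT node;
# the hard part, recovering x from y (done by a DNN oracle in the paper),
# runs on the reconstruction side.
y = A @ x
print(y.shape)                      # -> (32,)
```

The 4x reduction from 128 samples to 32 measurements in this toy setup is what saves acquisition energy; the pruned MAM² reconstructor then keeps the recovery stage small enough for constrained devices.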
A High-level Implementation Framework for Non-Recurrent Artificial Neural Networks on FPGA
This paper presents a fully parametrized framework, entirely described in VHDL, to simplify the FPGA implementation of non-recurrent Artificial Neural Networks (ANNs). It works independently of the complexity of the network in terms of number of neurons, layers and, to some extent, overall topology. More specifically, the network may consist of fully-connected, max-pooling or convolutional layers, which can be arbitrarily combined. The ANN is used only for inference, while back-propagation is performed off-line during the learning phase. The target of this work is to achieve fast-prototyping, small, low-power and cost-effective implementations of ANNs to be employed directly on IoT sensing nodes (i.e., Edge Computing). The performance of the implemented ANNs is assessed for two real applications, namely hand movement recognition based on electromyographic signals and handwritten character recognition. Energy per operation is measured in the FPGA realization and compared with the corresponding ANN implemented on a microcontroller (μC) to demonstrate the advantage of the FPGA-based solution.
PEDRo: an Event-based Dataset for Person Detection in Robotics
Event-based cameras are devices based on neuromorphic sensors that are gaining popularity in different fields, including robotics. They are suitable for tasks requiring high-speed, low-latency, low-power operation. Person detection is one of these, allowing mobile robots to monitor areas and navigate in crowded environments. Most of the available event-based datasets that contain annotated human figures and were collected with a moving camera are designed for autonomous driving tasks. Yet, robotic tasks are certainly not limited to the recognition of pedestrians walking on sidewalks, which makes the above datasets of limited utility. To address this impasse, we introduce a new, fully manually labeled dataset called PEDRo. It has been specifically developed for person detection and contains a total of 43,259 bounding boxes across 119 recordings. A moving DAVIS346 event-based camera has been used to collect events in a large variety of indoor and outdoor scenarios with various lighting and meteorological conditions (such as sunny, rainy and snowy). To the best of our knowledge, this is now the largest available dataset for event-based person detection recorded with a moving camera and manually labeled.
Event-based Classification with Recurrent Spiking Neural Networks on Low-end Micro-Controller Units
Due to its intrinsic sparsity both in time and space, event-based data is optimally suited for edge-computing applications that require low power and low latency. Time-varying signals encoded with this data representation are best processed with Spiking Neural Networks (SNNs). In particular, recurrent SNNs (RSNNs) can solve temporal tasks using a relatively low number of parameters, and therefore support hardware implementation on resource-constrained computing architectures. These premises motivate exploring the properties of such structures on low-power processing systems, to test their limits in terms of both computational accuracy and resource consumption, without having to resort to full-custom implementations. In this work, we implemented an RSNN model on a low-end, resource-constrained ARM Cortex-M4-based Micro Controller Unit (MCU). We trained it on a down-sampled version of the N-MNIST event-based dataset for digit recognition as an example to assess its performance in the inference phase. With an accuracy of 97.2%, the implementation has an average energy consumption as low as 4.1 μJ and a worst-case computational time of 150.4 μs per time-step at an operating frequency of 180 MHz, showing that deploying RSNNs on MCU devices is a feasible option for small real-time image-vision tasks.
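The per-time-step cost quoted above comes from updating the spiking neurons once per input frame. As a minimal sketch, assuming a standard discrete-time leaky integrate-and-fire (LIF) model with recurrent connections (the abstract does not specify the exact neuron model, so the decay factor, threshold and reset scheme here are illustrative):

```python
import numpy as np

def lif_step(v, spikes_in, spikes_rec, W_in, W_rec, alpha=0.9, v_th=1.0):
    """One discrete time-step of a recurrent LIF layer:
    leak the membrane, integrate input and recurrent spikes,
    fire where the threshold is crossed, then reset those neurons."""
    v = alpha * v + W_in @ spikes_in + W_rec @ spikes_rec
    out = (v >= v_th).astype(float)   # binary output spikes
    v = v * (1.0 - out)               # reset-to-zero for fired neurons
    return v, out

# Toy usage: 2 neurons, input spike on channel 0 only
v = np.zeros(2)
v, out = lif_step(v, np.array([1.0, 0.0]), np.zeros(2),
                  W_in=np.eye(2) * 1.5, W_rec=np.zeros((2, 2)))
print(out)   # -> [1. 0.]
```

Because the state update involves only multiply-adds gated by binary spikes, sparse event streams leave most of these operations idle, which is what makes the MCU energy figures above plausible.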