Energy-efficient neural networks with near-threshold processors and hardware accelerators

Authors: Howard Neil; Nunez-Yanez Jose
Publication venue: 'Elsevier BV'
Publication date: 01/06/2021

BiSon-e: a lightweight and high-performance accelerator for narrow integer linear algebra computing on the edge

Authors: Cristal Kestelman Adrián; Figueras Bagué Roger; Olivieri Mauro; Ramírez Lazo Cristóbal; Reggiani Enrico; Unsal Osman Sabri
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2022

Linear algebra computational kernels based on byte and sub-byte integer data formats underlie many classes of applications, ranging from Deep Learning to Pattern Matching. Porting the computation of these applications from the cloud to edge and mobile devices would enable significant improvements in security, safety, and energy efficiency. However, despite their low memory and energy demands, their intrinsically high computational intensity makes these workloads challenging to execute on highly resource-constrained devices. In this paper, we present BiSon-e, a novel RISC-V based architecture that accelerates linear algebra kernels based on narrow integer computations on edge processors by performing Single Instruction Multiple Data (SIMD) operations on off-the-shelf scalar Functional Units (FUs). Our novel architecture is built upon the binary segmentation technique, which significantly reduces the memory footprint and the arithmetic intensity of linear algebra kernels requiring narrow data sizes. We integrate BiSon-e into a complete System-on-Chip (SoC) based on RISC-V, synthesized and placed and routed in 65nm and 22nm technologies, introducing a negligible 0.07% area overhead with respect to the baseline architecture.
Our experimental evaluation shows that, when computing the Convolution and Fully-Connected layers of the AlexNet and VGG-16 Convolutional Neural Networks (CNNs) with 8-, 4-, and 2-bit data, our solution gains up to 5.6×, 13.9× and 24× in execution time compared to the scalar implementation of a single RISC-V core, and improves the energy efficiency of string matching tasks by 5× when compared to a RISC-V-based Vector Processing Unit (VPU).

This research was supported by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of the total eligible cost, under the DRAC project [001-P-001723], and by the Spanish State Research Agency - Ministry of Science and Innovation (contract PID2019-107255GB). This research was also supported by the grant PRE2020-095272 funded by MCIN/AEI/10.13039/501100011033 and by “ESF Investing in your future”.

Peer Reviewed
Postprint (author's final draft)
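
The binary segmentation technique named in the abstract can be illustrated in software. The Python sketch below shows only the general idea, under stated assumptions: elements are unsigned, Python's arbitrary-precision integers stand in for a fixed-width scalar multiplier, and the helper names (pack, dot_binary_segmentation) are hypothetical. It is not the BiSon-e hardware design, whose details the abstract does not give.

```python
def pack(values, slot_bits):
    """Pack small unsigned integers into one integer, one value per slot.

    Hypothetical helper: values[i] occupies bits [i*slot_bits, (i+1)*slot_bits).
    """
    word = 0
    for i, v in enumerate(values):
        word |= v << (i * slot_bits)
    return word


def dot_binary_segmentation(a, b, elem_bits):
    """Dot product of two vectors of unsigned elem_bits-wide integers,
    computed with a single wide multiplication (illustrative sketch)."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("vectors must be non-empty and of equal length")
    limit = 1 << elem_bits
    if any(not 0 <= x < limit for x in list(a) + list(b)):
        raise ValueError("element does not fit in elem_bits")

    # Each slot of the product holds a sum of at most n products of two
    # elem_bits-wide values, so it needs 2*elem_bits + ceil(log2(n)) bits
    # to guarantee that no carry spills into the neighbouring slot.
    slot_bits = 2 * elem_bits + (n - 1).bit_length()

    # Packing b in reverse order makes the products a[i] * b[i] all land
    # in the same slot (index n - 1) of the wide product.
    packed_a = pack(a, slot_bits)
    packed_b = pack(list(reversed(b)), slot_bits)
    product = packed_a * packed_b

    # Extract slot n - 1, which now contains the dot product.
    return (product >> ((n - 1) * slot_bits)) & ((1 << slot_bits) - 1)


# Four 2-bit elements per vector: 1*3 + 2*1 + 3*2 + 0*3
print(dot_binary_segmentation([1, 2, 3, 0], [3, 1, 2, 3], elem_bits=2))  # 11
```

In this sketch one multiplication replaces n multiply-accumulate steps, at the cost of padding each element to a wider slot; a fixed-width hardware multiplier would bound how many elements fit in one operand.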

UPCommons. Portal del coneixement obert de la UPC