7 research outputs found

    Mix-GEMM: An efficient HW-SW architecture for mixed-precision quantized deep neural networks inference on edge devices

    Deep Neural Network (DNN) inference based on quantized narrow-precision integer data represents a promising research direction toward efficient deep learning computations on edge and mobile devices. On one side, recent progress in Quantization-Aware Training (QAT) frameworks aimed at improving the accuracy of extremely quantized DNNs allows achieving results close to Floating-Point 32 (FP32), and provides high flexibility in the selection of data sizes. Unfortunately, current Central Processing Unit (CPU) architectures and Instruction Set Architectures (ISAs) targeting resource-constrained devices present limitations on the range of data sizes supported to compute DNN kernels. This paper presents Mix-GEMM, a hardware-software co-designed architecture capable of efficiently computing quantized DNN convolutional kernels based on byte and sub-byte data sizes. Mix-GEMM accelerates General Matrix Multiplication (GEMM), the core kernel of DNNs, supporting all data size combinations from 8- to 2-bit, including mixed-precision computations, and featuring performance that scales as the computational data sizes decrease. Our experimental evaluation, performed on representative quantized Convolutional Neural Networks (CNNs), shows that a RISC-V based edge System-on-Chip (SoC) integrating Mix-GEMM achieves up to 1.3 TOPS/W in energy efficiency, and up to 13.6 GOPS in throughput, gaining from 5.3× to 15.1× in performance over the OpenBLAS GEMM frameworks running on a commercial RISC-V based edge processor.
By performing synthesis and Place and Route (PnR) of the enhanced SoC in Global Foundries 22nm FDX technology, we show that Mix-GEMM only accounts for 1% of the overall area consumption. This research was supported by the ERDF Operational Program of Catalonia 2014-2020, with a grant from the Spanish State Research Agency [PID2019-107255GB] and with DRAC project [001-P-001723], by the grant [PID2019-107255G-C21] funded by MCIN/AEI/ 10.13039/501100011033, by the Generalitat de Catalunya [2017-SGR-1328], and by Lenovo-BSC Contract-Framework (2020). The Spanish Ministry of Economy, Industry and Competitiveness has partially supported M. Doblas through an FPU fellowship [FPU20-04076] and M. Moreto through a Ramon y Cajal fellowship [RYC-2016-21104]. Peer Reviewed. Postprint (author's final draft)
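
As a rough software illustration of the mixed-precision idea in the abstract above (8-bit activations multiplied by sub-byte weights), the following sketch packs signed 4-bit weights two per byte and unpacks them inside a plain integer GEMM. It is not the Mix-GEMM architecture or its ISA extension; the function names (`pack_int4`, `unpack_int4`, `gemm_a8_w4`) and the low-nibble-first packing layout are assumptions made only for this example.

```python
def pack_int4(values):
    """Pack signed 4-bit integers (-8..7) two per byte, low nibble first.

    The layout is an assumption for this sketch, not taken from the paper.
    """
    values = list(values)
    if len(values) % 2:
        values.append(0)  # pad to an even count
    packed = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)


def unpack_int4(packed, count):
    """Recover `count` signed 4-bit integers from bytes made by pack_int4."""
    out = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            # Sign-extend: nibbles 8..15 represent -8..-1.
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out[:count]


def gemm_a8_w4(a_rows, w_cols_packed, k):
    """Compute C = A @ W with 8-bit activations and packed 4-bit weights.

    a_rows: list of rows, each a list of k integers in the int8 range.
    w_cols_packed: list of columns, each packed with pack_int4 (k weights).
    Accumulation uses Python integers, standing in for a wide accumulator.
    """
    result = []
    for row in a_rows:
        out_row = []
        for col_packed in w_cols_packed:
            col = unpack_int4(col_packed, k)
            out_row.append(sum(a * w for a, w in zip(row, col)))
        result.append(out_row)
    return result
```

Packing halves the weight storage relative to 8-bit data; a hardware unit can additionally process more narrow operands per cycle, which this software sketch does not model.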

    Innovative intelligent sensors to objectively understand exercise interventions for older adults

    The population of most western countries is ageing and, therefore, the ageing issue now matters more than ever. According to United Nations reports from 2017, there were a total of 15.8 million (26.9%) people over 60 years of age in the United Kingdom, and the numbers are projected to reach 23.5 million (31.5%) by 2050. Spending on medical treatment and healthcare for older adults accounts for two-fifths of the UK National Health Service (NHS) budget. Keeping older people healthy is a challenge. In general, exercise is believed to benefit both mental and physical health. Specifically, many studies report that resistance band exercises have potentially positive effects on both mental and physical health. However, treatment using resistance band exercise is usually done in unmonitored environments, such as at home or in a rehabilitation centre; therefore, the exercise cannot be measured and/or quantified accurately. Despite many years of research, the true effectiveness of resistance band exercises remains unclear. [Continues.]

    Runtime Adaptation in Embedded Computing Systems using Markov Decision Processes

    During the design and implementation of embedded computing systems (ECSs), engineers must make assumptions on how the system will be used after being built and deployed. Traditionally, these important decisions were made at design time for a fleet of ECSs prior to deployment. In contrast to this approach, this research explores and develops techniques to enable adaptation of ECSs at runtime to the environments and applications in which they operate. Adaptation is enabled such that the usage assumptions and performance optimization decisions can be made autonomously at runtime in the deployed system. This thesis utilizes Markov Decision Processes (MDPs), a powerful and well-established mathematical framework used for decision making under uncertainty, to control computing systems at runtime. The resulting control is performed in ways that are more dynamic, robust and adaptable than alternatives in many scenarios. The techniques developed in this thesis are first applied to a reconfigurable embedded digital signal processing system. In this effort, several challenges are encountered and resolved using novel approaches. Through extensive simulations and a prototype implementation, the robustness of the adaptation is demonstrated in comparison with the prior state-of-the-art. The thesis continues by developing an efficient algorithm for conversion of MDP models to actionable control policies, a required step known as solving the MDP. The solver algorithm is developed in the context of ECSs that contain general-purpose embedded GPUs (graphics processing units). The novel solver algorithm, Sparse Parallel Value Iteration (SPVI), makes use of the parallel processing capabilities provided by such GPUs, and also exploits the sparsity that typically exists in MDPs when used to model and control ECSs.
To extend the applicability of the runtime adaptation techniques to smaller and more strictly resource-constrained ECSs, another solver, Sparse Value Iteration (SVI), is developed for use on microcontrollers. The method is explored in a detailed case study involving a cellular (LTE-M) connected sensor that adapts to varying communications profiles. The case study reveals that the proposed adaptation framework outperforms a competing approach based on Reinforcement Learning (RL) in terms of robustness and adaptation, with comparable resource requirements. Finally, the thesis concludes by analyzing the various logistical challenges that exist when deploying MDPs on ECSs. In response to these challenges, the thesis contributes an open source software package to the engineering community. The package contains libraries of MDP solvers, parsers, datasets and reference solutions, which provide a comprehensive infrastructure for exploring the trade-offs among existing embedded MDP techniques, and experimenting with novel approaches.
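
To make "solving the MDP" concrete, here is a minimal sketch of textbook value iteration over a sparse transition representation, in which only nonzero transition probabilities are stored and visited. It is not the thesis's SPVI or SVI algorithm, whose details are not given in the abstract; the function name, the nested-list data layout, and the default parameters are assumptions for illustration only.

```python
def sparse_value_iteration(transitions, rewards, gamma=0.9, tol=1e-8,
                           max_iters=10_000):
    """Generic value iteration over sparse transition lists (illustrative).

    transitions[s][a]: list of (next_state, probability) pairs containing
                       only the nonzero entries for state s and action a.
    rewards[s][a]:     immediate reward for taking action a in state s.
    Returns (values, policy), where policy[s] is the greedy action index.
    """
    n_states = len(transitions)

    def q_value(s, a, values):
        # Only nonzero transitions are visited, which is where sparsity helps.
        return rewards[s][a] + gamma * sum(
            p * values[t] for t, p in transitions[s][a])

    values = [0.0] * n_states
    for _ in range(max_iters):
        new_values = [
            max(q_value(s, a, values) for a in range(len(transitions[s])))
            for s in range(n_states)
        ]
        delta = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if delta < tol:  # stop once the largest update is negligible
            break

    policy = [
        max(range(len(transitions[s])),
            key=lambda a, s=s: q_value(s, a, values))
        for s in range(n_states)
    ]
    return values, policy
```

Storing transitions as short lists keeps the cost of each sweep proportional to the number of nonzero transitions rather than to the square of the number of states, which is the general motivation for sparsity-aware solvers on small devices.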

    Signal Processing for Early Warning Arrhythmia Detection and Survival Prediction for Clinical Decision

    According to the British Heart Foundation, around 7 million people in the UK live with heart and circulatory diseases; about 25% of all deaths in the UK are caused by cardiovascular diseases, and more than 30,000 people a year suffer an out-of-hospital cardiac arrest. As people all over the world continue to live busy and stressful lives, a vast majority of people start showing cardiac arrhythmia-related symptoms which, if not treated in time, may lead to a serious heart condition or even sudden cardiac death. To identify the early-warning signs in cardiac arrhythmia, methods to identify the precursors to fatal arrhythmia were developed in this research study, using a wearable kit. To enable accurate classification between arrhythmic beats, novel feature extraction algorithms using spectral components were developed. Often a fatal cardiac arrhythmia, or a serious injury, may lead to trauma, and in such situations it becomes imperative that the critical care teams have adequate information about the patient’s health status at a remote location following an ambulatory response. A real-time trauma scoring algorithm was developed, and correlation and regression analyses were performed to arrive at these scores using the physiological parameters and vital signs. It was found that with appropriate feature extraction algorithms, supervised learning classifiers could identify the precursors to arrhythmia in real time and on a resource-constrained device, regardless of time and location. The trauma scoring algorithm, implemented using an ICU patients’ dataset, produced values that agreed with the patients’ status, and events could be logged to electronic health records using standard clinical coding systems. It could, therefore, be concluded that regardless of the situation and location of an individual, fatal arrhythmia and trauma events could be identified ahead of time, before reaching a state of emergency.