278 research outputs found
ControlPULP: A RISC-V On-Chip Parallel Power Controller for Many-Core HPC Processors with FPGA-Based Hardware-In-The-Loop Power and Thermal Emulation
High-Performance Computing (HPC) processors are nowadays integrated
Cyber-Physical Systems demanding complex and high-bandwidth closed-loop power
and thermal control strategies. To efficiently satisfy real-time multi-input
multi-output (MIMO) optimal power requirements, high-end processors integrate
an on-die power controller system (PCS).
While traditional PCSs are based on a simple microcontroller (MCU)-class
core, more scalable and flexible PCS architectures are required to support
advanced MIMO control algorithms for managing the ever-increasing number of
cores, power states, and process, voltage, and temperature variability.
This paper presents ControlPULP, an open-source, HW/SW RISC-V parallel PCS
platform consisting of a single-core MCU with fast interrupt handling coupled
with a scalable multi-core programmable cluster accelerator and a specialized
DMA engine for the parallel acceleration of real-time power management
policies. ControlPULP relies on FreeRTOS to schedule a reactive power control
firmware (PCF) application layer.
We demonstrate ControlPULP in a power management use-case targeting a
next-generation 72-core HPC processor. We first show that the multi-core
cluster accelerates the PCF, achieving 4.9x speedup compared to single-core
execution, enabling more advanced power management algorithms within the
control hyper-period at a modest area overhead of about 0.1% of the area of a
modern HPC CPU die. We then assess the PCS and PCF by designing an FPGA-based,
closed-loop emulation framework that leverages the heterogeneous SoCs paradigm,
achieving DVFS tracking with a mean deviation within 3% of the plant's thermal
design power (TDP) against a software-equivalent model-in-the-loop approach.
Finally, we show that the proposed PCF compares favorably with an
industry-grade control algorithm under computation-intensive workloads.
Comment: 33 pages, 11 figures
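As a toy illustration of the kind of per-core policy such a power control firmware iterates every control hyper-period, the sketch below applies a proportional-integral law to each core's temperature; the gains, setpoint, frequency limits, and function names are illustrative assumptions, not ControlPULP's actual firmware.

```python
# Hypothetical sketch of one PCF hyper-period: a PI law nudges each core's
# frequency toward a temperature setpoint. All constants are illustrative.
def pcf_step(freqs, temps, integ, t_set=85.0, kp=0.02, ki=0.005,
             f_min=0.8, f_max=3.0):
    """One hyper-period of PI control over all cores; returns new freqs."""
    new_freqs = []
    for i, (f, t) in enumerate(zip(freqs, temps)):
        err = t_set - t                   # positive when the core is cool
        integ[i] += err                   # accumulate integral term
        f_new = f + kp * err + ki * integ[i]
        new_freqs.append(min(max(f_new, f_min), f_max))  # clamp to range
    return new_freqs

n_cores = 72
freqs = [2.0] * n_cores
integ = [0.0] * n_cores
temps = [90.0] * 36 + [80.0] * 36      # half the die running hot
freqs = pcf_step(freqs, temps, integ)  # hot cores slow down, cool ones speed up
```

A parallel PCF would split the per-core loop across the cluster; the control law itself is unchanged.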
Patient-specific controller for an implantable artificial pancreas
Ph.D. (Doctor of Philosophy)
Custom optimization algorithms for efficient hardware implementation
The focus is on real-time optimal decision making with applications in advanced control
systems. These computationally intensive schemes, which involve the repeated solution of
(convex) optimization problems within a sampling interval, require more efficient computational
methods than are currently available in order to extend their application to highly dynamic
systems and to setups with resource-constrained embedded computing platforms.
A range of techniques are proposed to exploit synergies between digital hardware, numerical
analysis and algorithm design. These techniques build on top of parameterisable
hardware code generation tools that generate VHDL code describing custom computing
architectures for interior-point methods and a range of first-order constrained optimization
methods. Since memory limitations are often important in embedded implementations, we
develop a custom storage scheme for KKT matrices arising in interior-point methods for
control, which reduces memory requirements significantly and prevents I/O bandwidth
limitations from affecting the performance in our implementations. To take advantage of
the trend towards parallel computing architectures and to exploit the special characteristics
of our custom architectures we propose several high-level parallel optimal control
schemes that can reduce computation time. A novel optimization formulation was devised
for reducing the computational effort of solving certain problems, independently of the computing
platform used. To be able to solve optimization problems in fixed-point
arithmetic, which is significantly more resource-efficient than floating-point, tailored linear
algebra algorithms were developed for solving the linear systems that form the computational
bottleneck in many optimization methods. These methods come with guarantees
for reliable operation. We also provide finite-precision error analysis for fixed-point implementations
of first-order methods that can be used to minimize the use of resources while
meeting accuracy specifications. The suggested techniques are demonstrated on several
practical examples, including a hardware-in-the-loop setup for optimization-based control
of a large airliner.
Open Access
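As a toy illustration of constrained optimization in fixed-point arithmetic of the kind discussed above, the sketch below runs projected gradient descent on a small box-constrained QP entirely in Q16.16 integers; the format, step size, and problem data are illustrative assumptions, not the thesis's implementation.

```python
# Hypothetical sketch: projected gradient descent on a 2x2 box-constrained QP,
# executed in Q16.16 fixed-point arithmetic. All parameters are illustrative.
FRAC_BITS = 16
SCALE = 1 << FRAC_BITS

def to_fx(x): return int(round(x * SCALE))
def from_fx(x): return x / SCALE

def fx_mul(a, b):
    # Fixed-point multiply: full-precision integer product, then rescale.
    return (a * b) >> FRAC_BITS

def solve_qp_fixed(H, f, lo, hi, step, iters=200):
    """Minimize 0.5 x'Hx + f'x subject to lo <= x <= hi, in Q16.16."""
    n = len(f)
    Hq = [[to_fx(v) for v in row] for row in H]
    fq = [to_fx(v) for v in f]
    loq, hiq = [to_fx(v) for v in lo], [to_fx(v) for v in hi]
    sq = to_fx(step)
    x = [0] * n
    for _ in range(iters):
        for i in range(n):
            g = fq[i]
            for j in range(n):
                g += fx_mul(Hq[i][j], x[j])      # gradient = Hx + f
            xi = x[i] - fx_mul(sq, g)            # gradient step
            x[i] = min(max(xi, loq[i]), hiq[i])  # projection onto the box
    return [from_fx(v) for v in x]

# Unconstrained minimizer of 0.5 x'Hx + f'x is [1, 1]; the box clips x0 to 0.5.
sol = solve_qp_fixed(H=[[2.0, 0.0], [0.0, 2.0]], f=[-2.0, -2.0],
                     lo=[0.0, 0.0], hi=[0.5, 2.0], step=0.2)
```

The same structure, with guarded scaling of intermediate quantities, is what makes fixed-point first-order methods attractive on resource-constrained hardware.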
Embedded Processor Selection/Performance Estimation using FPGA-based Profiling
In embedded systems, modeling the performance of candidate processor architectures is very important to enable the designer to estimate the capability of each architecture against the target application. Considering the large number of available embedded processors, the need has increased for an infrastructure by which it is possible to estimate the performance of a given application on a given processor with a minimum of time and resources. This dissertation presents a framework that employs the softcore MicroBlaze processor as a reference architecture, where FPGA-based profiling is implemented to extract the functional statistics that characterize the target application. Linear regression analysis is implemented to map the functional statistics of the target application to the performance of the candidate processor architecture. Hence, this approach does not require running the target application on each candidate processor; instead, it is run only on the reference processor, which allows testing many processor architectures in a very short time.
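The regression step can be sketched as follows; the instruction-mix features, counts, and cycle numbers are illustrative placeholders (the cycle column is constructed to be consistent with per-feature costs of 1, 2, and 3 cycles), not measured data.

```python
# Hypothetical sketch: map per-application functional statistics (instruction-mix
# counts from FPGA-based profiling on the reference core) to cycle counts
# measured once on a candidate processor, via ordinary least squares.
def fit_linear(X, y):
    """Least squares via normal equations (X'X) b = X'y for 3 features."""
    n = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(n)]
    M = [XtX[i] + [Xty[i]] for i in range(n)]
    for c in range(n):                      # Gaussian elimination w/ pivoting
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    b = [0.0] * n
    for r in range(n - 1, -1, -1):          # back-substitution
        b[r] = (M[r][n] - sum(M[r][k] * b[k] for k in range(r + 1, n))) / M[r][r]
    return b

# Rows: profiled applications. Columns: ALU ops, loads/stores, branches.
stats = [[1000., 200., 100.], [500., 1200., 80.],
         [800., 400., 300.], [1500., 100., 50.]]
cycles = [1700., 3140., 2500., 1850.]      # candidate-core measurements
coef = fit_linear(stats, cycles)           # per-feature cost in cycles

# Predict a new application's performance from its profile alone,
# without ever running it on the candidate processor.
new_app = [900., 300., 150.]
predicted_cycles = sum(a * c for a, c in zip(new_app, coef))
```

In practice many more applications and features would be used, and the fit quality itself indicates how well the candidate core is characterized.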
A federated learning framework for the next-generation machine learning systems
MSc dissertation in Industrial Electronics and Computers Engineering (specialization in Embedded Systems and Computers). The end of Moore's Law, aligned with rising concerns about data privacy, is forcing machine learning
(ML) to shift from the cloud to the deep edge, near to the data source. In the next-generation ML systems,
the inference and part of the training process will be performed right on the edge, while the cloud will be
responsible for major ML model updates. This new computing paradigm, referred to by academia and
industry researchers as federated learning, alleviates the cloud and network infrastructure while
increasing data privacy. Recent advances have made it possible to efficiently execute the inference pass
of quantized artificial neural networks on Arm Cortex-M and RISC-V (RV32IMCXpulp) microcontroller units
(MCUs). Nevertheless, training is still confined to the cloud, imposing the transfer of high volumes
of private data over a network.
To tackle this issue, this MSc thesis makes the first attempt to run decentralized training on Arm
Cortex-M MCUs. To port part of the training process to the deep edge, it proposes L-SGD, a lightweight
version of stochastic gradient descent optimized for maximum speed and minimal memory footprint
on Arm Cortex-M MCUs. L-SGD is 16.35x faster than the TensorFlow solution while registering a
memory footprint reduction of 13.72%. This comes at the cost of a negligible accuracy drop of only 0.12%.
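A minimal pure-Python sketch of the plain SGD update that L-SGD streamlines, fitting a small linear model one sample at a time (as a RAM-constrained MCU would); it does not model the quantization or Cortex-M specifics, and the data and learning rate are illustrative.

```python
# Hypothetical sketch of the per-sample SGD update underlying on-device
# training: one sample resident at a time, parameters updated in place.
def sgd_step(w, b, x, y, lr):
    # One stochastic gradient step on the squared error of y = w*x + b.
    err = (w * x + b) - y
    return w - lr * 2 * err * x, b - lr * 2 * err

# Fit y = 2x + 1 from single samples; no batch buffer is ever allocated.
w, b = 0.0, 0.0
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (-1.0, -1.0)]
for _ in range(2000):
    for x, y in data:
        w, b = sgd_step(w, b, x, y, lr=0.05)
```

The point of the sketch is the memory shape of the loop, not the model: only the parameters and the current sample need to live in RAM.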
To merge local model updates returned by edge devices, this MSc thesis proposes R-FedAvg, an
implementation of the FedAvg algorithm that reduces the impact of faulty model updates returned by
malicious devices.
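The abstract does not give R-FedAvg's exact filtering rule, so the sketch below substitutes a standard robust aggregator in the same spirit: a coordinate-wise trimmed mean that discards the most extreme client updates before averaging, limiting the influence of faulty or malicious devices. All names and numbers are illustrative.

```python
# Hypothetical robust stand-in for the R-FedAvg aggregation step.
def trimmed_fedavg(updates, trim=1):
    """Aggregate per-client weight vectors, trimming `trim` highs/lows."""
    n_params = len(updates[0])
    merged = []
    for i in range(n_params):
        vals = sorted(u[i] for u in updates)
        kept = vals[trim:len(vals) - trim]   # drop extremes per coordinate
        merged.append(sum(kept) / len(kept))
    return merged

# Four honest clients near [1.0, -2.0], one malicious client far off.
client_updates = [
    [1.00, -2.00],
    [1.10, -1.90],
    [0.95, -2.05],
    [1.05, -1.95],
    [50.0, 40.0],   # faulty/malicious update
]
merged = trimmed_fedavg(client_updates, trim=1)
```

Plain FedAvg would average the malicious vector in; trimming keeps the merged model near the honest consensus.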
Reliability and Security Assessment of Modern Embedded Devices
The abstract is in the attachment.
High efficiency smart voltage regulating module for green mobile computing
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. In this thesis, a design for a smart, high-efficiency voltage regulating module capable of supplying the core of modern microprocessors incorporating dynamic voltage and frequency scaling (DVS) capability is accomplished using a RISC-based microcontroller to provide all the functions required to control, protect, and supply the core with the variable operating voltage set by the DVS management system. Voltage regulating modules normally provide maximum power efficiency at the designed peak load, and efficiency falls off as the load moves towards lower values. A mathematical model was derived for the main converter and small-signal analysis was performed in order to determine system stability and to select a control scheme that would improve the converter's transient response without requiring intensive computational power. A simulation model was built using Matlab/Simulink and, after experimenting with tuned PID and fuzzy logic controllers, a simple fuzzy logic control scheme was selected to control the pulse-width-modulated converter; several methods were devised to reduce the computational power required, making the whole system realizable using a low-power RISC-based microcontroller. The same microcontroller handles circuit adaptation and protects the load against over-voltage and over-current conditions. A novel circuit technique and operation control scheme enables the designed module to selectively change some of the circuit elements in the main pulse-width-modulated buck converter so as to improve efficiency over a wider range of loads. For very light loads, as when the device goes into standby, sleep or hibernation mode, a secondary converter starts operating and the main converter stops.
The secondary converter adopts a different operation scheme based on a switched-capacitor technique, which provides high efficiency at low load currents. A fuzzy logic control scheme was chosen for the main converter for its lighter computational requirements, permitting implementation on ultra-low-power embedded controllers. Passive and active components were carefully selected to augment operational efficiency. These measures enabled the designed voltage regulating module to operate with an efficiency improvement of 3% to 5% in the off-peak load region. At low loads, as when the computer system goes into standby or sleep mode, the efficiency improvement is better than 13%, which makes a noticeable contribution to extending battery run time and thus to lowering the carbon footprint of human consumption.
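As a toy illustration of the kind of lightweight fuzzy rule base the thesis favours, the sketch below maps output-voltage error to a duty-cycle correction through three triangular membership sets; the breakpoints, output weights, and function names are illustrative assumptions, not the thesis's design.

```python
# Hypothetical single-input fuzzy controller for a buck converter's duty cycle.
def tri(x, a, b, c):
    # Triangular membership function peaking at b, zero outside [a, c].
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_duty_correction(err):
    """Map voltage error (V) to a duty-cycle correction via three rules."""
    mu_neg = tri(err, -1.0, -0.5, 0.0)   # error negative -> decrease duty
    mu_zero = tri(err, -0.5, 0.0, 0.5)   # error small    -> hold duty
    mu_pos = tri(err, 0.0, 0.5, 1.0)     # error positive -> increase duty
    weights = (-0.05, 0.0, 0.05)         # rule consequents (duty delta)
    mu = (mu_neg, mu_zero, mu_pos)
    s = sum(mu)
    if s == 0.0:                         # saturate outside the universe
        return -0.05 if err < 0 else 0.05
    # Weighted-average (centroid-style) defuzzification.
    return sum(m * w for m, w in zip(mu, weights)) / s
```

Nothing here needs multiplies wider than the native word or any trigonometric or exponential routine, which is what makes such schemes feasible on low-power RISC microcontrollers.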
Alternative vehicle electronic architecture for individual wheel control
Electronic control systems have become an integral part of the modern vehicle and
their installation rate is still on a sharp rise. Their application areas range from
powertrain, chassis and body control to entertainment. Each system is conventionally
controlled by a centralised controller with hard-wired links to sensors and actuators. As
systems have become more complex, a rise in the number of system components and
amount of wiring harness has followed. This leads to serious problems with safety,
reliability and space limitations. Different networking and vehicle electronic architectures
have been developed by others to ease these problems. The thesis proposes an alternative
architecture, namely the Distributed Wheel Architecture, for its potential benefits in terms of
vehicle dynamics, safety and ease of functional addition. The architecture would have a
networked controller on each wheel to perform its dynamic control including braking,
suspension and steering.
The project involves conducting a preliminary study and comparing the proposed
architecture with four alternative existing or high potential architectures. The areas of
study are functionality, complexity, and reliability.
Existing ABS, active suspension and four wheel steering systems are evaluated in
this work by simulation of their operations using road test data. They are used as
exemplary systems, for modelling of the new electronic architecture together with the
four alternatives. A prediction technique is developed, based on the derivation of software
pseudo code from system specifications, to estimate the microcontroller specifications of
all the system ECUs. The estimate indicates the feasibility of implementing the
architectures using current microcontrollers. Message transfer on the Controller Area
Network (CAN) of each architecture is simulated to find its associated delays, and hence
the feasibility of installing CAN in the architectures. Architecture component costs are
estimated from the costs of wires, ECUs, sensors and actuators. The number of wires is
obtained from the wiring models derived from exemplary system data. ECU peripheral
component counts are estimated from their statistical plot against the number of ECU
pins of collected ECUs. Architecture component reliability is estimated based on two
established reliability handbooks.
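The CAN delay estimates mentioned above rest on a per-frame timing bound; a minimal sketch using the classic worst-case formula for a standard (11-bit identifier) frame with worst-case bit stuffing, C = (g + 8s + 13 + floor((g + 8s - 1)/4)) bit times with g = 34; the bitrate below is an illustrative choice.

```python
# Worst-case transmission time of a standard CAN frame, including
# worst-case bit stuffing (classic schedulability-analysis bound).
def can_frame_time_us(data_bytes, bitrate_bps):
    g = 34                                   # stuff-affected overhead bits
    stuffed = (g + 8 * data_bytes - 1) // 4  # worst-case stuff bits
    bits = g + 8 * data_bytes + 13 + stuffed
    return bits * 1e6 / bitrate_bps

# An 8-byte frame at 500 kbit/s occupies the bus for at most 270 us, which
# bounds the blocking a higher-priority wheel-control message can suffer.
t = can_frame_time_us(8, 500_000)
```

Summing such bounds over the message set of each architecture is the simplest way to check that critical transfers meet their deadlines under a given load.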
The results suggest that all of the five architectures could be implemented using
present microcontrollers. In addition, critical data transfer via CAN is made within time
limits under current levels of message load, indicating the possibility of installing CAN in
these architectures. The proposed architecture is expected to be costlier in terms of
components than the rest of the architectures, while it is among the leaders for wiring
weight saving. However, it is expected to suffer from a relatively higher probability of
system component failure.
The proposed architecture is found not to be economically viable at present, but shows
potential in reducing vehicle wiring and weight problems.
A survey on run-time power monitors at the edge
Effectively managing energy and power consumption is crucial to the success of any computing system design, helping mitigate the efficiency obstacles posed by the downsizing of systems while also being a valuable step towards green and sustainable computing. The quality of energy and power management is strongly affected by the prompt availability of reliable and accurate information on the power consumption of the different parts composing the monitored system. At the same time, effective energy and power management is even more critical for devices at the edge, which have proliferated over the past decade with the digital revolution brought by the Internet of Things. This manuscript provides a comprehensive conceptual framework to classify the different approaches to implementing run-time power monitors for edge devices that have appeared in the literature, leading the reader toward the solutions that best fit their application needs and the requirements and constraints of their target computing platforms. Run-time power monitors at the edge are analyzed according to both power modeling and monitoring implementation aspects, identifying specific quality metrics for each in order to create a consistent and detailed taxonomy that encompasses the vast existing literature and provides a sound reference to the interested reader.
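As a toy illustration of the most widespread power-model family such surveys cover, the sketch below evaluates a linear model over performance-counter activity; the counter names, weights, and idle power are illustrative placeholders, not values from any surveyed monitor.

```python
# Hypothetical linear counter-based power model: P = P_idle + sum(w_i * c_i).
def power_estimate(counters, weights, p_idle):
    """Estimate power (W) from per-interval event counts and fitted weights."""
    return p_idle + sum(weights[name] * counters[name] for name in weights)

# Per-event energy weights (J/event) as a model fitted offline would supply.
weights = {"instr_retired": 0.8e-9, "cache_miss": 5.0e-9, "mem_access": 2.0e-9}
# Event counts sampled over one monitoring interval of one second.
sample = {"instr_retired": 2_000_000, "cache_miss": 50_000, "mem_access": 400_000}
p = power_estimate(sample, weights, p_idle=0.030)   # watts
```

The modeling-versus-implementation split in the survey maps directly onto this sketch: the weights are the power model, while how and where the counters are sampled is the monitoring implementation.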
Neural network computing using on-chip accelerators
The use of neural networks, machine learning, or artificial intelligence, in its broadest and most controversial sense, has been a tumultuous journey involving three distinct hype cycles and a history dating back to the 1960s. Resurgent, enthusiastic interest in machine learning and its applications bolsters the case for machine learning as a fundamental computational kernel. Furthermore, researchers have demonstrated that machine learning can be utilized as an auxiliary component of applications to enhance or enable new types of computation such as approximate computing or automatic parallelization. In our view, machine learning becomes not the underlying application, but a ubiquitous component of applications. This view necessitates a different approach towards the deployment of machine learning computation that spans not only hardware design of accelerator architectures, but also user and supervisor software to enable the safe, simultaneous use of machine learning accelerator resources.
In this dissertation, we propose a multi-transaction model of neural network computation to meet the needs of future machine learning applications. We demonstrate that this model, encompassing a decoupled backend accelerator for inference and learning together with hardware and software for managing neural network transactions, can be achieved with low overhead and integrated with a modern RISC-V microprocessor. Our extensions span user and supervisor software and data structures and, coupled with our hardware, enable multiple transactions from different address spaces to execute simultaneously, yet safely. Together, our system demonstrates the utility of a multi-transaction model in increasing energy efficiency and improving overall accelerator throughput for machine learning applications.
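A toy software analogue of the multi-transaction idea, in which clients from different address spaces submit transactions that a single shared accelerator drains in order, each against its own isolated weight context; every name and the dot-product "inference" are illustrative simplifications of what the dissertation realizes in hardware.

```python
# Hypothetical sketch: per-client contexts keep transactions isolated even
# though they share one backend accelerator.
from collections import deque

class Accelerator:
    def __init__(self):
        self.queue = deque()
        self.contexts = {}          # per-client weights, isolated by key

    def load_weights(self, client_id, weights):
        self.contexts[client_id] = list(weights)

    def submit(self, client_id, inputs):
        self.queue.append((client_id, list(inputs)))

    def run_all(self):
        # Drain transactions in order; each sees only its own context.
        results = []
        while self.queue:
            cid, x = self.queue.popleft()
            w = self.contexts[cid]
            results.append((cid, sum(a * b for a, b in zip(w, x))))
        return results

acc = Accelerator()
acc.load_weights("procA", [1.0, 2.0])
acc.load_weights("procB", [10.0, 0.0])
acc.submit("procA", [3.0, 4.0])     # uses procA's weights only
acc.submit("procB", [3.0, 4.0])     # uses procB's weights only
out = acc.run_all()
```

The hardware version enforces the same isolation with address-space identifiers and supervisor-managed state rather than a Python dictionary.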