278 research outputs found
ControlPULP: A RISC-V On-Chip Parallel Power Controller for Many-Core HPC Processors with FPGA-Based Hardware-In-The-Loop Power and Thermal Emulation
High-Performance Computing (HPC) processors are nowadays integrated
Cyber-Physical Systems demanding complex and high-bandwidth closed-loop power
and thermal control strategies. To efficiently satisfy real-time multi-input
multi-output (MIMO) optimal power requirements, high-end processors integrate
an on-die power controller system (PCS).
While traditional PCSs are based on a simple microcontroller (MCU)-class
core, more scalable and flexible PCS architectures are required to support
advanced MIMO control algorithms for managing the ever-increasing number of
cores, power states, and process, voltage, and temperature variability.
This paper presents ControlPULP, an open-source, HW/SW RISC-V parallel PCS
platform consisting of a single-core MCU with fast interrupt handling coupled
with a scalable multi-core programmable cluster accelerator and a specialized
DMA engine for the parallel acceleration of real-time power management
policies. ControlPULP relies on FreeRTOS to schedule a reactive power control
firmware (PCF) application layer.
We demonstrate ControlPULP in a power management use-case targeting a
next-generation 72-core HPC processor. We first show that the multi-core
cluster accelerates the PCF, achieving 4.9x speedup compared to single-core
execution, enabling more advanced power management algorithms within the
control hyper-period at a modest area overhead of about 0.1% of the area of a
modern HPC CPU die. We then assess the PCS and PCF by designing an FPGA-based,
closed-loop emulation framework that leverages the heterogeneous SoCs paradigm,
achieving DVFS tracking with a mean deviation within 3% of the plant's thermal
design power (TDP) against a software-equivalent model-in-the-loop approach.
Finally, we show that the proposed PCF compares favorably with an
industry-grade control algorithm under computation-intensive workloads.
Comment: 33 pages, 11 figures
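As a toy illustration of the kind of per-core policy such a power control firmware iterates every control hyper-period, the sketch below applies a proportional-integral law to each core's temperature; the gains, setpoint, frequency limits, and function names are illustrative assumptions, not ControlPULP's actual firmware.

```python
# Hypothetical sketch of one PCF hyper-period: a PI law nudges each core's
# frequency toward a temperature setpoint. All constants are illustrative.
def pcf_step(freqs, temps, integ, t_set=85.0, kp=0.02, ki=0.005,
             f_min=0.8, f_max=3.0):
    """One hyper-period of PI control over all cores; returns new freqs."""
    new_freqs = []
    for i, (f, t) in enumerate(zip(freqs, temps)):
        err = t_set - t                   # positive when the core is cool
        integ[i] += err                   # accumulate integral term
        f_new = f + kp * err + ki * integ[i]
        new_freqs.append(min(max(f_new, f_min), f_max))  # clamp to range
    return new_freqs

n_cores = 72
freqs = [2.0] * n_cores
integ = [0.0] * n_cores
temps = [90.0] * 36 + [80.0] * 36      # half the die running hot
freqs = pcf_step(freqs, temps, integ)  # hot cores slow down, cool ones speed up
```

A parallel PCF would split the per-core loop across the cluster; the control law itself is unchanged.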
Patient-specific controller for an implantable artificial pancreas
Ph.D. (Doctor of Philosophy)
Custom optimization algorithms for efficient hardware implementation
The focus is on real-time optimal decision making with applications in advanced control
systems. These computationally intensive schemes, which involve the repeated solution of
(convex) optimization problems within a sampling interval, require more efficient computational
methods than are currently available in order to extend their application to highly dynamic
systems and to setups with resource-constrained embedded computing platforms.
A range of techniques are proposed to exploit synergies between digital hardware, numerical
analysis and algorithm design. These techniques build on top of parameterisable
hardware code generation tools that generate VHDL code describing custom computing
architectures for interior-point methods and a range of first-order constrained optimization
methods. Since memory limitations are often important in embedded implementations, we
develop a custom storage scheme for KKT matrices arising in interior-point methods for
control, which reduces memory requirements significantly and prevents I/O bandwidth
limitations from affecting the performance in our implementations. To take advantage of
the trend towards parallel computing architectures and to exploit the special characteristics
of our custom architectures we propose several high-level parallel optimal control
schemes that can reduce computation time. A novel optimization formulation was devised
for reducing the computational effort of solving certain problems, independently of the computing
platform used. To be able to solve optimization problems in fixed-point
arithmetic, which is significantly more resource-efficient than floating-point, tailored linear
algebra algorithms were developed for solving the linear systems that form the computational
bottleneck in many optimization methods. These methods come with guarantees
for reliable operation. We also provide finite-precision error analysis for fixed-point implementations
of first-order methods that can be used to minimize the use of resources while
meeting accuracy specifications. The suggested techniques are demonstrated on several
practical examples, including a hardware-in-the-loop setup for optimization-based control
of a large airliner.
Open Access
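As a toy illustration of constrained optimization in fixed-point arithmetic of the kind discussed above, the sketch below runs projected gradient descent on a small box-constrained QP entirely in Q16.16 integers; the format, step size, and problem data are illustrative assumptions, not the thesis's implementation.

```python
# Hypothetical sketch: projected gradient descent on a 2x2 box-constrained QP,
# executed in Q16.16 fixed-point arithmetic. All parameters are illustrative.
FRAC_BITS = 16
SCALE = 1 << FRAC_BITS

def to_fx(x): return int(round(x * SCALE))
def from_fx(x): return x / SCALE

def fx_mul(a, b):
    # Fixed-point multiply: full-precision integer product, then rescale.
    return (a * b) >> FRAC_BITS

def solve_qp_fixed(H, f, lo, hi, step, iters=200):
    """Minimize 0.5 x'Hx + f'x subject to lo <= x <= hi, in Q16.16."""
    n = len(f)
    Hq = [[to_fx(v) for v in row] for row in H]
    fq = [to_fx(v) for v in f]
    loq, hiq = [to_fx(v) for v in lo], [to_fx(v) for v in hi]
    sq = to_fx(step)
    x = [0] * n
    for _ in range(iters):
        for i in range(n):
            g = fq[i]
            for j in range(n):
                g += fx_mul(Hq[i][j], x[j])      # gradient = Hx + f
            xi = x[i] - fx_mul(sq, g)            # gradient step
            x[i] = min(max(xi, loq[i]), hiq[i])  # projection onto the box
    return [from_fx(v) for v in x]

# Unconstrained minimizer of 0.5 x'Hx + f'x is [1, 1]; the box clips x0 to 0.5.
sol = solve_qp_fixed(H=[[2.0, 0.0], [0.0, 2.0]], f=[-2.0, -2.0],
                     lo=[0.0, 0.0], hi=[0.5, 2.0], step=0.2)
```

The same structure, with guarded scaling of intermediate quantities, is what makes fixed-point first-order methods attractive on resource-constrained hardware.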
Embedded Processor Selection/Performance Estimation using FPGA-based Profiling
In embedded systems, modeling the performance of candidate processor architectures is very important to enable the designer to estimate the capability of each architecture against the target application. Considering the large number of available embedded processors, the need has increased for an infrastructure by which it is possible to estimate the performance of a given application on a given processor with a minimum of time and resources. This dissertation presents a framework that employs the softcore MicroBlaze processor as a reference architecture, where FPGA-based profiling is implemented to extract the functional statistics that characterize the target application. Linear regression analysis is implemented to map the functional statistics of the target application to the performance of the candidate processor architecture. Hence, this approach does not require running the target application on each candidate processor; instead, it is run only on the reference processor, which allows testing many processor architectures in a very short time.
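The regression step can be sketched as follows; the instruction-mix features, counts, and cycle numbers are illustrative placeholders (the cycle column is constructed to be consistent with per-feature costs of 1, 2, and 3 cycles), not measured data.

```python
# Hypothetical sketch: map per-application functional statistics (instruction-mix
# counts from FPGA-based profiling on the reference core) to cycle counts
# measured once on a candidate processor, via ordinary least squares.
def fit_linear(X, y):
    """Least squares via normal equations (X'X) b = X'y for 3 features."""
    n = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(n)]
    M = [XtX[i] + [Xty[i]] for i in range(n)]
    for c in range(n):                      # Gaussian elimination w/ pivoting
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    b = [0.0] * n
    for r in range(n - 1, -1, -1):          # back-substitution
        b[r] = (M[r][n] - sum(M[r][k] * b[k] for k in range(r + 1, n))) / M[r][r]
    return b

# Rows: profiled applications. Columns: ALU ops, loads/stores, branches.
stats = [[1000., 200., 100.], [500., 1200., 80.],
         [800., 400., 300.], [1500., 100., 50.]]
cycles = [1700., 3140., 2500., 1850.]      # candidate-core measurements
coef = fit_linear(stats, cycles)           # per-feature cost in cycles

# Predict a new application's performance from its profile alone,
# without ever running it on the candidate processor.
new_app = [900., 300., 150.]
predicted_cycles = sum(a * c for a, c in zip(new_app, coef))
```

In practice many more applications and features would be used, and the fit quality itself indicates how well the candidate core is characterized.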
A federated learning framework for the next-generation machine learning systems
MSc dissertation in Industrial Electronics and Computers Engineering (specialization in Embedded Systems and Computers). The end of Moore's Law, aligned with rising concerns about data privacy, is forcing machine learning
(ML) to shift from the cloud to the deep edge, near to the data source. In the next-generation ML systems,
the inference and part of the training process will be performed right on the edge, while the cloud will be
responsible for major ML model updates. This new computing paradigm, referred to by academia and
industry researchers as federated learning, alleviates the cloud and network infrastructure while
increasing data privacy. Recent advances have made it possible to efficiently execute the inference pass
of quantized artificial neural networks on Arm Cortex-M and RISC-V (RV32IMCXpulp) microcontroller units
(MCUs). Nevertheless, training is still confined to the cloud, imposing the transfer of high volumes
of private data over a network.
To tackle this issue, this MSc thesis makes the first attempt to run decentralized training on Arm
Cortex-M MCUs. To port part of the training process to the deep edge, it proposes L-SGD, a lightweight
version of stochastic gradient descent optimized for maximum speed and minimal memory footprint
on Arm Cortex-M MCUs. L-SGD is 16.35x faster than the TensorFlow solution while registering a
memory footprint reduction of 13.72%. This comes at the cost of a negligible accuracy drop of only 0.12%.
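A minimal pure-Python sketch of the plain SGD update that L-SGD streamlines, fitting a small linear model one sample at a time (as a RAM-constrained MCU would); it does not model the quantization or Cortex-M specifics, and the data and learning rate are illustrative.

```python
# Hypothetical sketch of the per-sample SGD update underlying on-device
# training: one sample resident at a time, parameters updated in place.
def sgd_step(w, b, x, y, lr):
    # One stochastic gradient step on the squared error of y = w*x + b.
    err = (w * x + b) - y
    return w - lr * 2 * err * x, b - lr * 2 * err

# Fit y = 2x + 1 from single samples; no batch buffer is ever allocated.
w, b = 0.0, 0.0
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (-1.0, -1.0)]
for _ in range(2000):
    for x, y in data:
        w, b = sgd_step(w, b, x, y, lr=0.05)
```

The point of the sketch is the memory shape of the loop, not the model: only the parameters and the current sample need to live in RAM.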
To merge local model updates returned by edge devices, this MSc thesis proposes R-FedAvg, an
implementation of the FedAvg algorithm that reduces the impact of faulty model updates returned by
malicious devices.
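The abstract does not give R-FedAvg's exact filtering rule, so the sketch below substitutes a standard robust aggregator in the same spirit: a coordinate-wise trimmed mean that discards the most extreme client updates before averaging, limiting the influence of faulty or malicious devices. All names and numbers are illustrative.

```python
# Hypothetical robust stand-in for the R-FedAvg aggregation step.
def trimmed_fedavg(updates, trim=1):
    """Aggregate per-client weight vectors, trimming `trim` highs/lows."""
    n_params = len(updates[0])
    merged = []
    for i in range(n_params):
        vals = sorted(u[i] for u in updates)
        kept = vals[trim:len(vals) - trim]   # drop extremes per coordinate
        merged.append(sum(kept) / len(kept))
    return merged

# Four honest clients near [1.0, -2.0], one malicious client far off.
client_updates = [
    [1.00, -2.00],
    [1.10, -1.90],
    [0.95, -2.05],
    [1.05, -1.95],
    [50.0, 40.0],   # faulty/malicious update
]
merged = trimmed_fedavg(client_updates, trim=1)
```

Plain FedAvg would average the malicious vector in; trimming keeps the merged model near the honest consensus.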
Reliability and Security Assessment of Modern Embedded Devices
The abstract is in the attachment.
High efficiency smart voltage regulating module for green mobile computing
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. In this thesis, a design for a smart, high-efficiency voltage regulating module capable of supplying the core of modern microprocessors incorporating dynamic voltage and frequency scaling (DVS) capability is accomplished using a RISC-based microcontroller to provide all the functions required to control, protect, and supply the core with the variable operating voltage set by the DVS management system. Voltage regulating modules normally provide maximum power efficiency at the designed peak load, and efficiency falls off as the load moves towards lower values. A mathematical model was derived for the main converter and small-signal analysis was performed in order to determine system stability and to select a control scheme that would improve the converter's transient response without requiring intensive computational power. A simulation model was built using Matlab/Simulink and, after experimenting with tuned PID and fuzzy logic controllers, a simple fuzzy logic control scheme was selected to control the pulse-width-modulated converter; several methods were devised to reduce the computational power required, making the whole system realizable using a low-power RISC-based microcontroller. The same microcontroller handles circuit adaptation and protects the load against over-voltage and over-current conditions. A novel circuit technique and operation control scheme enables the designed module to selectively change some of the circuit elements in the main pulse-width-modulated buck converter so as to improve efficiency over a wider range of loads. For very light loads, as when the device goes into standby, sleep or hibernation mode, a secondary converter starts operating and the main converter stops.
The secondary converter adopts a different operation scheme based on a switched-capacitor technique, which provides high efficiency at low load currents. A fuzzy logic control scheme was chosen for the main converter for its lighter computational requirements, permitting implementation on ultra-low-power embedded controllers. Passive and active components were carefully selected to augment operational efficiency. These measures enabled the designed voltage regulating module to operate with an efficiency improvement of 3% to 5% in the off-peak load region. At low loads, as when the computer system goes into standby or sleep mode, the efficiency improvement is better than 13%, which makes a noticeable contribution to extending battery run time and thus to lowering the carbon footprint of human consumption.
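As a toy illustration of the kind of lightweight fuzzy rule base the thesis favours, the sketch below maps output-voltage error to a duty-cycle correction through three triangular membership sets; the breakpoints, output weights, and function names are illustrative assumptions, not the thesis's design.

```python
# Hypothetical single-input fuzzy controller for a buck converter's duty cycle.
def tri(x, a, b, c):
    # Triangular membership function peaking at b, zero outside [a, c].
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_duty_correction(err):
    """Map voltage error (V) to a duty-cycle correction via three rules."""
    mu_neg = tri(err, -1.0, -0.5, 0.0)   # error negative -> decrease duty
    mu_zero = tri(err, -0.5, 0.0, 0.5)   # error small    -> hold duty
    mu_pos = tri(err, 0.0, 0.5, 1.0)     # error positive -> increase duty
    weights = (-0.05, 0.0, 0.05)         # rule consequents (duty delta)
    mu = (mu_neg, mu_zero, mu_pos)
    s = sum(mu)
    if s == 0.0:                         # saturate outside the universe
        return -0.05 if err < 0 else 0.05
    # Weighted-average (centroid-style) defuzzification.
    return sum(m * w for m, w in zip(mu, weights)) / s
```

Nothing here needs multiplies wider than the native word or any trigonometric or exponential routine, which is what makes such schemes feasible on low-power RISC microcontrollers.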
Alternative vehicle electronic architecture for individual wheel control
Electronic control systems have become an integral part of the modern vehicle and
their installation rate is still on a sharp rise. Their application areas range from
powertrain, chassis and body control to entertainment. Each system is conventionally
controlled by a centralised controller with hard-wired links to sensors and actuators. As
systems have become more complex, a rise in the number of system components and
amount of wiring harness has followed. This leads to serious problems with safety,
reliability and space limitations. Different networking and vehicle electronic architectures
have been developed by others to ease these problems. The thesis proposes an alternative
architecture, namely the Distributed Wheel Architecture, for its potential benefits in terms of
vehicle dynamics, safety and ease of functional addition. The architecture would have a
networked controller on each wheel to perform its dynamic control including braking,
suspension and steering.
The project involves conducting a preliminary study and comparing the proposed
architecture with four alternative existing or high potential architectures. The areas of
study are functionality, complexity, and reliability.
Existing ABS, active suspension and four wheel steering systems are evaluated in
this work by simulation of their operations using road test data. They are used as
exemplary systems, for modelling of the new electronic architecture together with the
four alternatives. A prediction technique is developed, based on the derivation of software
pseudo code from system specifications, to estimate the microcontroller specifications of
all the system ECUs. The estimate indicates the feasibility of implementing the
architectures using current microcontrollers. Message transfer on the Controller Area
Network (CAN) of each architecture is simulated to find its associated delays, and hence
the feasibility of installing CAN in the architectures. Architecture component costs are
estimated from the costs of wires, ECUs, sensors and actuators. The number of wires is
obtained from the wiring models derived from exemplary system data. ECU peripheral
component counts are estimated from their statistical plot against the number of ECU
pins of collected ECUs. Architecture component reliability is estimated based on two
established reliability handbooks.
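The CAN delay estimates mentioned above rest on a per-frame timing bound; a minimal sketch using the classic worst-case formula for a standard (11-bit identifier) frame with worst-case bit stuffing, C = (g + 8s + 13 + floor((g + 8s - 1)/4)) bit times with g = 34; the bitrate below is an illustrative choice.

```python
# Worst-case transmission time of a standard CAN frame, including
# worst-case bit stuffing (classic schedulability-analysis bound).
def can_frame_time_us(data_bytes, bitrate_bps):
    g = 34                                   # stuff-affected overhead bits
    stuffed = (g + 8 * data_bytes - 1) // 4  # worst-case stuff bits
    bits = g + 8 * data_bytes + 13 + stuffed
    return bits * 1e6 / bitrate_bps

# An 8-byte frame at 500 kbit/s occupies the bus for at most 270 us, which
# bounds the blocking a higher-priority wheel-control message can suffer.
t = can_frame_time_us(8, 500_000)
```

Summing such bounds over the message set of each architecture is the simplest way to check that critical transfers meet their deadlines under a given load.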
The results suggest that all of the five architectures could be implemented using
present microcontrollers. In addition, critical data transfer via CAN is made within time
limits under current levels of message load, indicating the possibility of installing CAN in
these architectures. The proposed architecture is expected to be costlier in terms of
components than the rest of the architectures, while it is among the leaders for wiring
weight saving. However, it is expected to suffer from a relatively higher probability of
system component failure.
The proposed architecture is found not to be economically viable at present, but shows
potential in reducing vehicle wiring and weight problems.
A survey on run-time power monitors at the edge
Effectively managing energy and power consumption is crucial to the success of any computing system design, helping mitigate the efficiency obstacles posed by the downsizing of systems while also being a valuable step towards green and sustainable computing. The quality of energy and power management is strongly affected by the prompt availability of reliable and accurate information on the power consumption of the different parts composing the monitored system. At the same time, effective energy and power management is even more critical for devices at the edge, which have proliferated over the past decade with the digital revolution brought by the Internet of Things. This manuscript provides a comprehensive conceptual framework to classify the different approaches to implementing run-time power monitors for edge devices that have appeared in the literature, leading the reader toward the solutions that best fit their application needs and the requirements and constraints of their target computing platforms. Run-time power monitors at the edge are analyzed according to both power modeling and monitoring implementation aspects, identifying specific quality metrics for each in order to create a consistent and detailed taxonomy that encompasses the vast existing literature and provides a sound reference to the interested reader.
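As a toy illustration of the most widespread power-model family such surveys cover, the sketch below evaluates a linear model over performance-counter activity; the counter names, weights, and idle power are illustrative placeholders, not values from any surveyed monitor.

```python
# Hypothetical linear counter-based power model: P = P_idle + sum(w_i * c_i).
def power_estimate(counters, weights, p_idle):
    """Estimate power (W) from per-interval event counts and fitted weights."""
    return p_idle + sum(weights[name] * counters[name] for name in weights)

# Per-event energy weights (J/event) as a model fitted offline would supply.
weights = {"instr_retired": 0.8e-9, "cache_miss": 5.0e-9, "mem_access": 2.0e-9}
# Event counts sampled over one monitoring interval of one second.
sample = {"instr_retired": 2_000_000, "cache_miss": 50_000, "mem_access": 400_000}
p = power_estimate(sample, weights, p_idle=0.030)   # watts
```

The modeling-versus-implementation split in the survey maps directly onto this sketch: the weights are the power model, while how and where the counters are sampled is the monitoring implementation.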
Neural network computing using on-chip accelerators
The use of neural networks, machine learning, or artificial intelligence, in its broadest and most controversial sense, has been a tumultuous journey involving three distinct hype cycles and a history dating back to the 1960s. Resurgent, enthusiastic interest in machine learning and its applications bolsters the case for machine learning as a fundamental computational kernel. Furthermore, researchers have demonstrated that machine learning can be utilized as an auxiliary component of applications to enhance or enable new types of computation such as approximate computing or automatic parallelization. In our view, machine learning becomes not the underlying application, but a ubiquitous component of applications. This view necessitates a different approach towards the deployment of machine learning computation that spans not only hardware design of accelerator architectures, but also user and supervisor software to enable the safe, simultaneous use of machine learning accelerator resources.
In this dissertation, we propose a multi-transaction model of neural network computation to meet the needs of future machine learning applications. We demonstrate that this model, encompassing a decoupled backend accelerator for inference and learning together with hardware and software for managing neural network transactions, can be achieved with low overhead and integrated with a modern RISC-V microprocessor. Our extensions span user and supervisor software and data structures and, coupled with our hardware, enable multiple transactions from different address spaces to execute simultaneously, yet safely. Together, our system demonstrates the utility of a multi-transaction model in increasing energy efficiency and improving overall accelerator throughput for machine learning applications.
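A toy software analogue of the multi-transaction idea, in which clients from different address spaces submit transactions that a single shared accelerator drains in order, each against its own isolated weight context; every name and the dot-product "inference" are illustrative simplifications of what the dissertation realizes in hardware.

```python
# Hypothetical sketch: per-client contexts keep transactions isolated even
# though they share one backend accelerator.
from collections import deque

class Accelerator:
    def __init__(self):
        self.queue = deque()
        self.contexts = {}          # per-client weights, isolated by key

    def load_weights(self, client_id, weights):
        self.contexts[client_id] = list(weights)

    def submit(self, client_id, inputs):
        self.queue.append((client_id, list(inputs)))

    def run_all(self):
        # Drain transactions in order; each sees only its own context.
        results = []
        while self.queue:
            cid, x = self.queue.popleft()
            w = self.contexts[cid]
            results.append((cid, sum(a * b for a, b in zip(w, x))))
        return results

acc = Accelerator()
acc.load_weights("procA", [1.0, 2.0])
acc.load_weights("procB", [10.0, 0.0])
acc.submit("procA", [3.0, 4.0])     # uses procA's weights only
acc.submit("procB", [3.0, 4.0])     # uses procB's weights only
out = acc.run_all()
```

The hardware version enforces the same isolation with address-space identifiers and supervisor-managed state rather than a Python dictionary.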