278 research outputs found

    ControlPULP: A RISC-V On-Chip Parallel Power Controller for Many-Core HPC Processors with FPGA-Based Hardware-In-The-Loop Power and Thermal Emulation

    Full text link
    High-Performance Computing (HPC) processors are now integrated Cyber-Physical Systems demanding complex and high-bandwidth closed-loop power and thermal control strategies. To efficiently satisfy real-time multi-input multi-output (MIMO) optimal power requirements, high-end processors integrate an on-die power controller system (PCS). While traditional PCSs are based on a simple microcontroller (MCU)-class core, more scalable and flexible PCS architectures are required to support advanced MIMO control algorithms for managing the ever-increasing number of cores, power states, and process, voltage, and temperature variability. This paper presents ControlPULP, an open-source, HW/SW RISC-V parallel PCS platform consisting of a single-core MCU with fast interrupt handling coupled with a scalable multi-core programmable cluster accelerator and a specialized DMA engine for the parallel acceleration of real-time power management policies. ControlPULP relies on FreeRTOS to schedule a reactive power control firmware (PCF) application layer. We demonstrate ControlPULP in a power management use case targeting a next-generation 72-core HPC processor. We first show that the multi-core cluster accelerates the PCF, achieving a 4.9x speedup compared to single-core execution, enabling more advanced power management algorithms within the control hyper-period at a small area overhead, about 0.1% of the area of a modern HPC CPU die. We then assess the PCS and PCF by designing an FPGA-based, closed-loop emulation framework that leverages the heterogeneous SoC paradigm, achieving DVFS tracking with a mean deviation within 3% of the plant's thermal design power (TDP) against a software-equivalent model-in-the-loop approach. Finally, we show that the proposed PCF compares favorably with an industry-grade control algorithm under computationally intensive workloads. (Comment: 33 pages, 11 figures)
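
    To make the role of the PCF concrete, the sketch below shows the kind of periodic control step such firmware runs: per-core PI temperature regulation followed by a package-level TDP cap. It is a minimal illustration only; the core count matches the 72-core use case above, but the gains, temperature limit, TDP value and cubic power proxy are assumptions of this example, not ControlPULP's actual firmware.

        # Minimal sketch of one power-control hyper-period: per-core PI control
        # toward a temperature limit, then a package-level TDP cap.
        # All constants below are illustrative assumptions.

        N_CORES = 72
        TDP_W = 360.0            # assumed thermal design power for the example
        T_MAX_C = 85.0           # assumed per-core temperature limit
        KP, KI = 0.05, 0.01      # assumed PI gains
        F_MIN, F_MAX = 0.8, 3.0  # GHz

        integ = [0.0] * N_CORES

        def power_of(freq_ghz):
            # Crude cubic frequency-to-power proxy used only for this sketch.
            return 1.5 * freq_ghz ** 3

        def control_step(temps_c, target_freqs):
            """One hyper-period: temps_c and target_freqs are per-core lists."""
            freqs = []
            for i, (t, f_req) in enumerate(zip(temps_c, target_freqs)):
                err = T_MAX_C - t                  # positive => thermal headroom
                integ[i] += err
                f = min(f_req, F_MIN + KP * err + KI * integ[i])
                freqs.append(max(F_MIN, min(F_MAX, f)))
            # Enforce the package power cap by uniform scaling of the frequencies.
            total_p = sum(power_of(f) for f in freqs)
            if total_p > TDP_W:
                scale = (TDP_W / total_p) ** (1.0 / 3.0)
                freqs = [max(F_MIN, f * scale) for f in freqs]
            return freqs

        # Example: one control step with uniform 70 C readings and a 2.6 GHz request.
        print(control_step([70.0] * N_CORES, [2.6] * N_CORES)[:4])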

    PATIENT-SPECIFIC CONTROLLER FOR AN IMPLANTABLE ARTIFICIAL PANCREAS

    Get PDF
    Ph.D. (Doctor of Philosophy)

    Custom optimization algorithms for efficient hardware implementation

    No full text
    The focus is on real-time optimal decision making with application to advanced control systems. These computationally intensive schemes, which involve the repeated solution of (convex) optimization problems within a sampling interval, require more efficient computational methods than currently available for extending their application to highly dynamical systems and to setups with resource-constrained embedded computing platforms. A range of techniques is proposed to exploit synergies between digital hardware, numerical analysis and algorithm design. These techniques build on top of parameterisable hardware code generation tools that generate VHDL code describing custom computing architectures for interior-point methods and a range of first-order constrained optimization methods. Since memory limitations are often important in embedded implementations, we develop a custom storage scheme for KKT matrices arising in interior-point methods for control, which reduces memory requirements significantly and prevents I/O bandwidth limitations from affecting the performance of our implementations. To take advantage of the trend towards parallel computing architectures and to exploit the special characteristics of our custom architectures, we propose several high-level parallel optimal control schemes that can reduce computation time. A novel optimization formulation was devised for reducing the computational effort in solving certain problems, independent of the computing platform used. In order to solve optimization problems in fixed-point arithmetic, which is significantly more resource-efficient than floating-point, tailored linear algebra algorithms were developed for solving the linear systems that form the computational bottleneck in many optimization methods. These methods come with guarantees for reliable operation. We also provide finite-precision error analysis for fixed-point implementations of first-order methods, which can be used to minimize resource usage while meeting accuracy specifications. The suggested techniques are demonstrated on several practical examples, including a hardware-in-the-loop setup for optimization-based control of a large airliner. (Open Access)
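
    As a minimal illustration of the kind of fixed-point first-order method this line of work targets, the sketch below runs a projected-gradient iteration for a small box-constrained QP with all arithmetic emulated in a Q15.16 fixed-point format. The format, step size and example problem are assumptions of this sketch, not the dissertation's tailored linear algebra.

        import numpy as np

        FRAC_BITS = 16                      # assumed fixed-point format: Q15.16
        SCALE = 1 << FRAC_BITS

        def to_fx(x):
            return np.round(np.asarray(x, dtype=float) * SCALE).astype(np.int64)

        def fx_mul(a, b):
            return (a * b) >> FRAC_BITS     # fixed-point multiply with truncation

        def projected_gradient_fx(Q, q, lb, ub, alpha, iters=200):
            """Box-constrained QP  min 0.5 x'Qx + q'x,  lb <= x <= ub,
            solved with a fixed-point projected-gradient iteration."""
            Qf, qf = to_fx(Q), to_fx(q)
            lo, hi = to_fx(lb), to_fx(ub)
            af = to_fx(alpha)
            x = np.zeros_like(qf)
            for _ in range(iters):
                grad = np.array([np.sum(fx_mul(Qf[i], x)) for i in range(len(x))]) + qf
                x = np.clip(x - fx_mul(af, grad), lo, hi)   # gradient step + projection
            return x.astype(float) / SCALE

        # Tiny example problem (illustrative only); the solution is roughly [0.5, 1.0].
        Q = np.array([[2.0, 0.0], [0.0, 1.0]])
        q = np.array([-2.0, -1.0])
        print(projected_gradient_fx(Q, q, lb=[0, 0], ub=[0.5, 2.0], alpha=0.3))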

    Embedded Processor Selection/Performance Estimation using FPGA-based Profiling

    Get PDF
    In embedded systems, modeling the performance of the candidate processor architectures is very important because it enables the designer to estimate the capability of each architecture for the target application. Considering the large number of available embedded processors, there is a growing need for an infrastructure that can estimate the performance of a given application on a given processor with a minimum of time and resources. This dissertation presents a framework that employs the soft-core MicroBlaze processor as a reference architecture, where FPGA-based profiling is implemented to extract the functional statistics that characterize the target application. Linear regression analysis is used to map the functional statistics of the target application to the performance of the candidate processor architecture. Hence, this approach does not require running the target application on each candidate processor; instead, it is run only on the reference processor, which allows many processor architectures to be evaluated in a very short time
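
    A minimal sketch of the regression step is given below: functional statistics profiled on the reference processor are mapped to cycle counts measured on a candidate processor by ordinary least squares, and the fitted coefficients then predict the candidate's performance for a new application without running it there. The counter categories and numbers are hypothetical; the dissertation's actual feature set is not specified here.

        import numpy as np

        # Each row: functional statistics profiled on the reference (MicroBlaze) run,
        # e.g. [ALU ops, loads/stores, branches, multiplies]; these counter names are
        # assumptions for the example.  y: measured cycle counts of the same
        # applications on one candidate processor.
        X = np.array([
            [120e3,  40e3, 15e3,  2e3],
            [300e3, 110e3, 42e3,  9e3],
            [ 80e3,  22e3,  9e3,  1e3],
            [500e3, 180e3, 70e3, 20e3],
        ])
        y = np.array([210e3, 560e3, 140e3, 930e3])

        # Least-squares fit of per-event cost coefficients (with an intercept term).
        A = np.hstack([X, np.ones((len(X), 1))])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)

        # Estimate the candidate's cycle count for a new application from its
        # reference-profile statistics alone, without running it on the candidate.
        new_profile = np.array([200e3, 65e3, 25e3, 5e3, 1.0])
        print("estimated cycles:", new_profile @ coef)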

    A federated learning framework for the next-generation machine learning systems

    Get PDF
    MSc dissertation in Industrial Electronics and Computers Engineering (specialization in Embedded Systems and Computers). The end of Moore's Law, aligned with rising concerns about data privacy, is forcing machine learning (ML) to shift from the cloud to the deep edge, near to the data source. In next-generation ML systems, the inference and part of the training process will be performed right on the edge, while the cloud will be responsible for major ML model updates. This new computing paradigm, referred to by academia and industry researchers as federated learning, alleviates the cloud and network infrastructure while increasing data privacy. Recent advances have made it possible to efficiently execute the inference pass of quantized artificial neural networks on Arm Cortex-M and RISC-V (RV32IMCXpulp) microcontroller units (MCUs). Nevertheless, training is still confined to the cloud, imposing the transfer of high volumes of private data over a network. To tackle this issue, this MSc thesis makes the first attempt to run decentralized training on Arm Cortex-M MCUs. To port part of the training process to the deep edge, the thesis proposes L-SGD, a lightweight version of stochastic gradient descent optimized for maximum speed and minimal memory footprint on Arm Cortex-M MCUs. L-SGD is 16.35x faster than the TensorFlow solution while registering a memory footprint reduction of 13.72%, at the cost of a negligible accuracy drop of only 0.12%. To merge the local model updates returned by edge devices, the thesis proposes R-FedAvg, an implementation of the FedAvg algorithm that reduces the impact of faulty model updates returned by malicious devices
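
    The abstract does not detail R-FedAvg's filtering rule, so the sketch below illustrates the general idea with a simple stand-in: FedAvg-style weighted averaging in which client updates far from the coordinate-wise median (measured against the median absolute deviation of the distances) are discarded before aggregation.

        import numpy as np

        def robust_fedavg(client_updates, client_sizes, max_dev=3.0):
            """Aggregate per-client weight vectors, discarding outliers.

            client_updates : list of 1-D numpy arrays (flattened model weights)
            client_sizes   : number of local samples per client (FedAvg weighting)
            max_dev        : reject updates farther than max_dev * MAD beyond the
                             median distance (illustrative rule only, not the
                             thesis' actual R-FedAvg criterion).
            """
            U = np.stack(client_updates)
            med = np.median(U, axis=0)
            dists = np.linalg.norm(U - med, axis=1)
            mad = np.median(np.abs(dists - np.median(dists))) + 1e-12
            keep = dists <= np.median(dists) + max_dev * mad
            w = np.asarray(client_sizes, dtype=float) * keep
            return (U * w[:, None]).sum(axis=0) / w.sum()

        # Example: two honest clients and one malicious update.
        honest1 = np.array([0.10, -0.20, 0.05])
        honest2 = np.array([0.12, -0.18, 0.04])
        attack  = np.array([5.00,  4.00, -6.00])
        print(robust_fedavg([honest1, honest2, attack], [100, 120, 90]))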

    Reliability and Security Assessment of Modern Embedded Devices

    Get PDF
    The abstract is in the attachment

    Alternative vehicle electronic architecture for individual wheel control

    Get PDF
    Electronic control systems have become an integral part of the modern vehicle and their installation rate is still rising sharply. Their application areas range from powertrain, chassis and body control to entertainment. Each system is conventionally controlled by a centralised controller with hard-wired links to sensors and actuators. As systems have become more complex, a rise in the number of system components and the amount of wiring harness has followed. This leads to serious problems of safety, reliability and space limitation. Different networking and vehicle electronic architectures have been developed by others to ease these problems. The thesis proposes an alternative architecture, namely the Distributed Wheel Architecture, for its potential benefits in terms of vehicle dynamics, safety and ease of functional addition. The architecture would have a networked controller on each wheel to perform its dynamic control, including braking, suspension and steering. The project involves conducting a preliminary study and comparing the proposed architecture with four alternative existing or high-potential architectures. The areas of study are functionality, complexity and reliability. Existing ABS, active suspension and four-wheel steering systems are evaluated in this work by simulation of their operations using road test data. They are used as exemplary systems for modelling the new electronic architecture together with the four alternatives. A prediction technique is developed, based on the derivation of software pseudo-code from system specifications, to estimate the microcontroller specifications of all the system ECUs. The estimate indicates the feasibility of implementing the architectures using current microcontrollers. Message transfer on the Controller Area Network (CAN) of each architecture is simulated to find its associated delays, and hence the feasibility of installing CAN in the architectures. Architecture component costs are estimated from the costs of wires, ECUs, sensors and actuators. The number of wires is obtained from wiring models derived from exemplary system data. ECU peripheral component counts are estimated from their statistical plot against the number of ECU pins of collected ECUs. Architecture component reliability is estimated based on two established reliability handbooks. The results suggest that all five architectures could be implemented using present microcontrollers. In addition, critical data transfer via CAN is made within time limits under current levels of message load, indicating the possibility of installing CAN in these architectures. The proposed architecture is expected to be costlier in terms of components than the rest of the architectures, while it is among the leaders for wiring weight saving. However, it is expected to suffer from a relatively higher probability of system component failure. The proposed architecture is found not to be economically viable at present, but shows potential in reducing vehicle wiring and weight problems
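
    The CAN delay analysis mentioned above can be illustrated with the standard worst-case timing equations: a frame's transmission time including worst-case bit stuffing, plus a priority-based response-time iteration over higher-priority traffic. The bus speed and message set below are made up for the example and are not the thesis' data.

        import math

        TAU_BIT = 1.0 / 500e3          # assumed 500 kbit/s bus -> bit time in seconds

        def frame_time(payload_bytes):
            """Worst-case transmission time of a standard (11-bit ID) CAN frame,
            including worst-case bit stuffing."""
            bits = 44 + 8 * payload_bytes
            stuff = (34 + 8 * payload_bytes - 1) // 4
            return (bits + stuff) * TAU_BIT

        def response_time(msgs, idx):
            """Classical response-time iteration for message idx.
            msgs: list of (period_s, payload_bytes), ordered highest priority first."""
            C = [frame_time(p) for _, p in msgs]
            blocking = max(C[idx + 1:], default=0.0)      # one lower-priority frame
            w = blocking
            while True:
                w_next = blocking + sum(
                    math.ceil((w + TAU_BIT) / msgs[j][0]) * C[j] for j in range(idx))
                if w_next == w:
                    return w + C[idx]
                w = w_next

        # Illustrative message set: (period in s, payload in bytes), priority order.
        messages = [(0.005, 2), (0.010, 4), (0.010, 8), (0.020, 8)]
        for i in range(len(messages)):
            print(f"msg {i}: worst-case latency {response_time(messages, i)*1e3:.3f} ms")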

    A survey on run-time power monitors at the edge

    Get PDF
    Effectively managing energy and power consumption is crucial to the design of any computing system: it helps mitigate the efficiency obstacles that come with downsizing and is a valuable step towards green and sustainable computing. The quality of energy and power management strongly depends on the prompt availability of reliable and accurate information about the power consumed by the different parts of the monitored system. Effective energy and power management is even more critical for devices at the edge, which have proliferated over the past decade with the digital revolution brought by the Internet of Things. This manuscript provides a comprehensive conceptual framework to classify the approaches to implementing run-time power monitors for edge devices that have appeared in the literature, leading the reader toward the solutions that best fit their application needs and the requirements and constraints of their target computing platforms. Run-time power monitors at the edge are analyzed according to both the power modeling and the monitoring implementation aspects, identifying specific quality metrics for each, in order to create a consistent and detailed taxonomy that encompasses the vast existing literature and provides a sound reference to the interested reader
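
    As a concrete (and deliberately simplified) example of the software-level monitors covered by such a taxonomy, the sketch below samples activity counters each interval, applies a pre-calibrated linear power model and smooths the result. The counter names, coefficients and smoothing factor are illustrative assumptions, not values taken from the survey.

        # Minimal sketch of a run-time, counter-based power monitor: every sampling
        # interval it reads activity-counter deltas, applies a pre-calibrated linear
        # power model and smooths the estimate with an EWMA.

        IDLE_POWER_W = 0.20
        COEFF = {"instructions": 9.0e-8, "cache_misses": 2.5e-6, "mem_accesses": 6.0e-7}
        ALPHA = 0.3                      # EWMA smoothing factor

        class PowerMonitor:
            def __init__(self):
                self.estimate_w = IDLE_POWER_W

            def sample(self, counter_deltas):
                """counter_deltas: events observed since the previous sample."""
                instant = IDLE_POWER_W + sum(
                    COEFF[name] * counter_deltas.get(name, 0) for name in COEFF)
                self.estimate_w = ALPHA * instant + (1 - ALPHA) * self.estimate_w
                return self.estimate_w

        mon = PowerMonitor()
        print(mon.sample({"instructions": 2.0e6, "cache_misses": 5.0e3, "mem_accesses": 1.0e5}))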

    Neural network computing using on-chip accelerators

    Get PDF
    The use of neural networks, machine learning, or artificial intelligence in its broadest and most controversial sense, has followed a tumultuous path involving three distinct hype cycles and a history dating back to the 1960s. Resurgent, enthusiastic interest in machine learning and its applications bolsters the case for machine learning as a fundamental computational kernel. Furthermore, researchers have demonstrated that machine learning can be utilized as an auxiliary component of applications to enhance or enable new types of computation such as approximate computing or automatic parallelization. In our view, machine learning becomes not the underlying application but a ubiquitous component of applications. This view necessitates a different approach towards the deployment of machine learning computation, one that spans not only the hardware design of accelerator architectures but also the user and supervisor software needed to enable the safe, simultaneous use of machine learning accelerator resources. In this dissertation, we propose a multi-transaction model of neural network computation to meet the needs of future machine learning applications. We demonstrate that this model, encompassing a decoupled backend accelerator for inference and learning along with the hardware and software for managing neural network transactions, can be achieved with low overhead and integrated with a modern RISC-V microprocessor. Our extensions span user and supervisor software and data structures and, coupled with our hardware, enable multiple transactions from different address spaces to execute simultaneously, yet safely. Together, our system demonstrates the utility of a multi-transaction model to deliver energy-efficiency improvements and increase overall accelerator throughput for machine learning applications
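
    The dissertation's exact interface is not given here, but the flavor of a multi-transaction model can be sketched as follows: each address space owns its own transaction with a queue of work items, and the accelerator front end interleaves them round-robin so that no single process monopolizes the device. All field names and the scheduling policy below are assumptions of this illustration, not the actual design.

        from collections import deque
        from dataclasses import dataclass, field
        from itertools import count

        _ids = count()

        @dataclass
        class Transaction:
            asid: int                   # address-space identifier of the owner
            kind: str                   # "inference" or "learning"
            pending: deque = field(default_factory=deque)   # queued work items
            tid: int = field(default_factory=lambda: next(_ids))

        def submit(txn, job):
            txn.pending.append(job)

        def accelerator_tick(transactions):
            """One scheduling step: pick the next transaction with pending work,
            round-robin across address spaces."""
            for _ in range(len(transactions)):
                txn = transactions[0]
                transactions.rotate(-1)
                if txn.pending:
                    job = txn.pending.popleft()
                    return txn.tid, txn.asid, job     # would be issued to hardware
            return None

        # Two processes sharing the accelerator via separate transactions.
        txns = deque([Transaction(asid=1, kind="inference"),
                      Transaction(asid=2, kind="learning")])
        submit(txns[0], "conv1"); submit(txns[0], "conv2"); submit(txns[1], "grad0")
        for _ in range(3):
            print(accelerator_tick(txns))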
