High-Performance Computing (HPC) processors are nowadays integrated
Cyber-Physical Systems demanding complex and high-bandwidth closed-loop power
and thermal control strategies. To efficiently satisfy real-time multi-input
multi-output (MIMO) optimal power requirements, high-end processors integrate
an on-die power controller system (PCS).
While traditional PCSs are based on a simple microcontroller (MCU)-class
core, more scalable and flexible PCS architectures are required to support
advanced MIMO control algorithms for managing the ever-increasing number of
cores, power states, and process, voltage, and temperature variability.
This paper presents ControlPULP, an open-source, HW/SW RISC-V parallel PCS
platform consisting of a single-core MCU with fast interrupt handling coupled
with a scalable multi-core programmable cluster accelerator and a specialized
DMA engine for the parallel acceleration of real-time power management
policies. ControlPULP relies on FreeRTOS to schedule a reactive power control
firmware (PCF) application layer.
We demonstrate ControlPULP in a power management use-case targeting a
next-generation 72-core HPC processor. We first show that the multi-core
cluster accelerates the PCF, achieving 4.9x speedup compared to single-core
execution, enabling more advanced power management algorithms within the
control hyper-period at a shallow area overhead, about 0.1% the area of a
modern HPC CPU die. We then assess the PCS and PCF by designing an FPGA-based,
closed-loop emulation framework that leverages the heterogeneous SoCs paradigm,
achieving DVFS tracking with a mean deviation within 3% the plant's thermal
design power (TDP) against a software-equivalent model-in-the-loop approach.
Finally, we show that the proposed PCF compares favorably with an
industry-grade control algorithm under computational-intensive workloads.Comment: 33 pages, 11 figure