173 research outputs found
Ultra-Fast and Accurate Simulation for Large-Scale Many-Core Processors
The proposed methods enable ultra-fast and accurate emulations of large-scale NoC architectures with up to thousands of nodes using only on-chip resources of a single FPGA. The evaluation results show that more than 5,000× simulation speedup over BookSim, one of the most widely used software-based NoC simulators, is achieved while the simulation accuracy is maintained. i
ControlPULP: A RISC-V On-Chip Parallel Power Controller for Many-Core HPC Processors with FPGA-Based Hardware-In-The-Loop Power and Thermal Emulation
High-Performance Computing (HPC) processors are nowadays integrated
Cyber-Physical Systems demanding complex and high-bandwidth closed-loop power
and thermal control strategies. To efficiently satisfy real-time multi-input
multi-output (MIMO) optimal power requirements, high-end processors integrate
an on-die power controller system (PCS).
While traditional PCSs are based on a simple microcontroller (MCU)-class
core, more scalable and flexible PCS architectures are required to support
advanced MIMO control algorithms for managing the ever-increasing number of
cores, power states, and process, voltage, and temperature variability.
This paper presents ControlPULP, an open-source, HW/SW RISC-V parallel PCS
platform consisting of a single-core MCU with fast interrupt handling coupled
with a scalable multi-core programmable cluster accelerator and a specialized
DMA engine for the parallel acceleration of real-time power management
policies. ControlPULP relies on FreeRTOS to schedule a reactive power control
firmware (PCF) application layer.
We demonstrate ControlPULP in a power management use-case targeting a
next-generation 72-core HPC processor. We first show that the multi-core
cluster accelerates the PCF, achieving 4.9x speedup compared to single-core
execution, enabling more advanced power management algorithms within the
control hyper-period at a shallow area overhead, about 0.1% the area of a
modern HPC CPU die. We then assess the PCS and PCF by designing an FPGA-based,
closed-loop emulation framework that leverages the heterogeneous SoCs paradigm,
achieving DVFS tracking with a mean deviation within 3% the plant's thermal
design power (TDP) against a software-equivalent model-in-the-loop approach.
Finally, we show that the proposed PCF compares favorably with an
industry-grade control algorithm under computational-intensive workloads.Comment: 33 pages, 11 figure
An FPGA-based infrastructure for fine-grained DVFS analysis in high-performance embedded systems
Emerging technologies provide SoCs with fine-grained DVFS capabilities both in space (number of domains) and time (transients in the order of tens of nanoseconds). Analyzing these systems requires cycle-accurate accounting of rapidly-changing dynamics and complex interactions among accelerators, interconnect, memory, and OS. We present an FPGA-based infrastructure that facilitates such analyses for high-performance embedded systems. We show how our infrastructure can be used to first generate SoCs with loosely-coupled accelerators, and then perform design-space exploration considering several DVFS policies under full-system workload scenarios, sweeping spatial and temporal domain granularity
A RISC-V-based FPGA Overlay to Simplify Embedded Accelerator Deployment
Modern cyber-physical systems (CPS) are increasingly adopting heterogeneous systems-on-chip (HeSoCs) as a computing platform to satisfy the demands of their sophisticated workloads. FPGA-based HeSoCs can reach high performance and energy efficiency at the cost of increased design complexity. High-Level Synthesis (HLS) can ease IP design, but automated tools still lack the maturity to efficiently and easily tackle system-level integration of the many hardware and software blocks included in a modern CPS. We present an innovative hardware overlay offering plug-and-play integration of HLS-compiled or handcrafted acceleration IPs thanks to a customizable wrapper attached to the overlay interconnect and providing shared-memory communication to the overlay cores. The latter are based on the open RISC-V ISA and offer simplified software management of the acceleration IP. Deploying the proposed overlay on a Xilinx ZU9EG shows ≈ 20% LUT usage and ≈ 4× speedup compared to program execution on the ARM host core
Recommended from our members
Next Generation Silicon Photonic Transceiver: From Device Innovation to System Analysis
Silicon photonics is recognized as a disruptive technology that has the potential to reshape many application areas, for example, data center communication, telecommunications, high-performance computing, and sensing. The key capability that silicon photonics offers is to leverage CMOS-style design, fabrication, and test infrastructure to build compact, energy-efficient, and high-performance integrated photonic systems-on- chip at low cost. As the need to squeeze more data into a given bandwidth and a given footprint increases, silicon photonics becomes more and more promising. This work develops and demonstrates novel devices, methodologies, and architectures to resolve the challenges facing the next-generation silicon photonic transceivers. The first part of this thesis focuses on the topology optimization of passive silicon photonic devices. Specifically, a novel device optimization methodology - particle swarm optimization in conjunction with 3D finite-difference time-domain (FDTD), has been proposed and proven to be an effective way to design a wide range of passive silicon photonic devices. We demonstrate a polarization rotator and a 90â—¦ optical hybrid for polarization-diversity and phase-diversity communications - two important schemes to increase the communication capacity by increasing the spectral efficiency. The second part of this thesis focuses on the design and characterization of the next- generation silicon photonic transceivers. We demonstrate a polarization-insensitive WDM receiver with an aggregate data rate of 160 Gb/s. This receiver adopts a novel architecture which effectively reduces the polarization-dependent loss. In addition, we demonstrate a III-V/silicon hybrid external cavity laser with a tuning range larger than 60 nm in the C-band on a silicon-on-insulator platform. A III-V semiconductor gain chip is hybridized into the silicon chip by edge-coupling to the silicon chip. The demonstrated packaging method requires only passive alignment and is thus suitable for high-volume production. We also demonstrate all silicon-photonics-based transmission of 34 Gbaud (272 Gb/s) dual-polarization 16-QAM using our integrated laser and silicon photonic coherent transceiver. The results show no additional penalty compared to commercially available narrow linewidth tunable lasers. The last part of this thesis focuses on the chip-scale optical interconnect and presents two different types of reconfigurable memory interconnects for multi-core many-memory computing systems. These reconfigurable interconnects can effectively alleviate the memory access issues, such as non-uniform memory access, and Network-on-Chip (NoC) hot-spots that plague the many-memory computing systems by dynamically directing the available memory bandwidth to the required memory interface
- …