Search CORE

173 research outputs found

Ultra-Fast and Accurate Simulation for Large-Scale Many-Core Processors

Author: Thiem Van Chu
Publication venue
Publication date: 03/04/2020
Field of study

The proposed methods enable ultra-fast and accurate emulations of large-scale NoC architectures with up to thousands of nodes using only on-chip resources of a single FPGA. The evaluation results show that more than 5,000× simulation speedup over BookSim, one of the most widely used software-based NoC simulators, is achieved while the simulation accuracy is maintained. i

CiteSeerX

ControlPULP: A RISC-V On-Chip Parallel Power Controller for Many-Core HPC Processors with FPGA-Based Hardware-In-The-Loop Power and Thermal Emulation

Author: Balas Robert
Bambini Giovanni
Bartolini Andrea
Benini Luca
Ciani Maicol
del Vecchio Antonio
Ottaviano Alessandro
Rossi Davide
Publication venue
Publication date: 19/06/2023
Field of study

High-Performance Computing (HPC) processors are nowadays integrated Cyber-Physical Systems demanding complex and high-bandwidth closed-loop power and thermal control strategies. To efficiently satisfy real-time multi-input multi-output (MIMO) optimal power requirements, high-end processors integrate an on-die power controller system (PCS). While traditional PCSs are based on a simple microcontroller (MCU)-class core, more scalable and flexible PCS architectures are required to support advanced MIMO control algorithms for managing the ever-increasing number of cores, power states, and process, voltage, and temperature variability. This paper presents ControlPULP, an open-source, HW/SW RISC-V parallel PCS platform consisting of a single-core MCU with fast interrupt handling coupled with a scalable multi-core programmable cluster accelerator and a specialized DMA engine for the parallel acceleration of real-time power management policies. ControlPULP relies on FreeRTOS to schedule a reactive power control firmware (PCF) application layer. We demonstrate ControlPULP in a power management use-case targeting a next-generation 72-core HPC processor. We first show that the multi-core cluster accelerates the PCF, achieving 4.9x speedup compared to single-core execution, enabling more advanced power management algorithms within the control hyper-period at a shallow area overhead, about 0.1% the area of a modern HPC CPU die. We then assess the PCS and PCF by designing an FPGA-based, closed-loop emulation framework that leverages the heterogeneous SoCs paradigm, achieving DVFS tracking with a mean deviation within 3% the plant's thermal design power (TDP) against a software-equivalent model-in-the-loop approach. Finally, we show that the proposed PCF compares favorably with an industry-grade control algorithm under computational-intensive workloads.Comment: 33 pages, 11 figure

arXiv.org e-Print Archive

High Speed Cycle-Approximate Simulation of Embedded Cache-Incoherent and Coherent Chip-Multiprocessors

Author: Christopher Thompson
D Burger
D Chiou
Daniel Sanchez
E Argollo
ES Chung
J Chen
K Skadron
K Wang
M Monchiero
Miles Gould
MMK Martin
N Binkert
N Hardavellas
Nigel Topham
P Ren
PS Magnusson
SS Mukherjee
X Zhu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/12/2018
Field of study

Crossref

Edinburgh Research Explorer

Prototyping Methodologies and Design of Communication-centric Heterogeneous Many-core Architectures

Author: Masing Leonard Jannik
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2020
Field of study

KITopen

Deadline, Energy and Buffer-Aware Task Mapping Optimization in NoC-Based SoCs Using Genetic Algorithms

Author: Alves da Silva Eduardo
Bruch Jaison Valmor
Soares Indrusiak Leandro
Zeferino Cesar Albenes
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2017
Field of study

White Rose Research Online

A NoC-based hybrid message-passing/shared-memory approach to CMP design

Author: Agarwal
Daemen
Forsell
Grecu
Karniadakis
Lorensen
Mario R. Casu
Massimo Ruo Roch
Maurizio Zamboni
Owens
Paulin
Radulescu
Sergio V. Tota
Snir
Tota
Publication venue: Elsevier
Publication date: 01/01/2011
Field of study

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

An FPGA-based infrastructure for fine-grained DVFS analysis in high-performance embedded systems

Author: Carloni Luca P.
Cota Emilio G.
Di Guglielmo Giuseppe
Mantovani Paolo
Pilato Christian
Shepard Ken
Tien Kevin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016
Field of study

Emerging technologies provide SoCs with fine-grained DVFS capabilities both in space (number of domains) and time (transients in the order of tens of nanoseconds). Analyzing these systems requires cycle-accurate accounting of rapidly-changing dynamics and complex interactions among accelerators, interconnect, memory, and OS. We present an FPGA-based infrastructure that facilitates such analyses for high-performance embedded systems. We show how our infrastructure can be used to first generate SoCs with loosely-coupled accelerators, and then perform design-space exploration considering several DVFS policies under full-system workload scenarios, sweeping spatial and temporal domain granularity

Archivio istituzionale della ricerca - Politecnico di Milano

A RISC-V-based FPGA Overlay to Simplify Embedded Accelerator Deployment

Author: Bellocchi Gianluca
Capotondi Alessandro
Conti Francesco
Marongiu Andrea
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021
Field of study

Modern cyber-physical systems (CPS) are increasingly adopting heterogeneous systems-on-chip (HeSoCs) as a computing platform to satisfy the demands of their sophisticated workloads. FPGA-based HeSoCs can reach high performance and energy efficiency at the cost of increased design complexity. High-Level Synthesis (HLS) can ease IP design, but automated tools still lack the maturity to efficiently and easily tackle system-level integration of the many hardware and software blocks included in a modern CPS. We present an innovative hardware overlay offering plug-and-play integration of HLS-compiled or handcrafted acceleration IPs thanks to a customizable wrapper attached to the overlay interconnect and providing shared-memory communication to the overlay cores. The latter are based on the open RISC-V ISA and offer simplified software management of the acceleration IP. Deploying the proposed overlay on a Xilinx ZU9EG shows ≈ 20% LUT usage and ≈ 4× speedup compared to program execution on the ARM host core

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Recommended from our members

Next Generation Silicon Photonic Transceiver: From Device Innovation to System Analysis

Author: Guan Hang
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2018
Field of study

Silicon photonics is recognized as a disruptive technology that has the potential to reshape many application areas, for example, data center communication, telecommunications, high-performance computing, and sensing. The key capability that silicon photonics offers is to leverage CMOS-style design, fabrication, and test infrastructure to build compact, energy-efficient, and high-performance integrated photonic systems-on- chip at low cost. As the need to squeeze more data into a given bandwidth and a given footprint increases, silicon photonics becomes more and more promising. This work develops and demonstrates novel devices, methodologies, and architectures to resolve the challenges facing the next-generation silicon photonic transceivers. The first part of this thesis focuses on the topology optimization of passive silicon photonic devices. Specifically, a novel device optimization methodology - particle swarm optimization in conjunction with 3D finite-difference time-domain (FDTD), has been proposed and proven to be an effective way to design a wide range of passive silicon photonic devices. We demonstrate a polarization rotator and a 90◦ optical hybrid for polarization-diversity and phase-diversity communications - two important schemes to increase the communication capacity by increasing the spectral efficiency. The second part of this thesis focuses on the design and characterization of the next- generation silicon photonic transceivers. We demonstrate a polarization-insensitive WDM receiver with an aggregate data rate of 160 Gb/s. This receiver adopts a novel architecture which effectively reduces the polarization-dependent loss. In addition, we demonstrate a III-V/silicon hybrid external cavity laser with a tuning range larger than 60 nm in the C-band on a silicon-on-insulator platform. A III-V semiconductor gain chip is hybridized into the silicon chip by edge-coupling to the silicon chip. The demonstrated packaging method requires only passive alignment and is thus suitable for high-volume production. We also demonstrate all silicon-photonics-based transmission of 34 Gbaud (272 Gb/s) dual-polarization 16-QAM using our integrated laser and silicon photonic coherent transceiver. The results show no additional penalty compared to commercially available narrow linewidth tunable lasers. The last part of this thesis focuses on the chip-scale optical interconnect and presents two different types of reconfigurable memory interconnects for multi-core many-memory computing systems. These reconfigurable interconnects can effectively alleviate the memory access issues, such as non-uniform memory access, and Network-on-Chip (NoC) hot-spots that plague the many-memory computing systems by dynamically directing the available memory bandwidth to the required memory interface

Columbia University Academic Commons