Search CORE

96 research outputs found

Xar-Trek: Run-Time Execution Migration among FPGAs and Heterogeneous-ISA CPUs

Author: Barbalace Antonio
Chuang Ho-Ren
Horta Edson
Olivier Pierre
Philippidis Cesar
Ravindran Binoy
VSathish Naarayanan Rao
Publication venue: Association for Computing Machinery (ACM)
Publication date: 27/10/2021

Datacenter servers are increasingly heterogeneous: from x86 host CPUs, to ARM or RISC-V CPUs in NICs/SSDs, to FPGAs. Prior work has demonstrated that migrating application execution at run-time across heterogeneous-ISA CPUs can yield significant performance and energy gains, with relatively little programmer effort. However, FPGAs have often been overlooked in that context: hardware acceleration using FPGAs involves statically implementing select application functions, which prohibits dynamic and transparent migration. We present Xar-Trek, a new compiler and run-time software framework that overcomes this limitation. Xar-Trek compiles an application for several CPU ISAs and compiles select application functions for acceleration on an FPGA, allowing execution to migrate between heterogeneous-ISA CPUs and FPGAs at run-time. Xar-Trek's run-time monitors server workloads and migrates application functions to an FPGA or to heterogeneous-ISA CPUs based on a scheduling policy. We develop a heuristic policy that uses application workload profiles to make scheduling decisions. Our evaluations, conducted on a system with x86-64 server CPUs, ARM64 server CPUs, and an Alveo accelerator card, show performance gains of 1% to 88% over no-migration baselines.
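The abstract does not spell out the heuristic policy. As a rough illustration only, the sketch below shows one way a profile-driven placement decision could look. It assumes hypothetical per-function profiles (`Profile`) that record a function's run time on an idle x86 CPU, ARM CPU, and FPGA, plus a hypothetical linear load penalty for the CPUs. The function `choose_target` and its cost model are assumptions for illustration, not Xar-Trek's published policy.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical profile: a function's run time (seconds) on each idle target."""
    x86: float
    arm: float
    fpga: float


def choose_target(profile: Profile, x86_load: float, arm_load: float,
                  fpga_busy: bool) -> str:
    """Return the target with the lowest load-adjusted expected run time.

    Assumed cost model (not taken from the paper): a CPU's expected time grows
    linearly with its load (load 1.0 doubles the time), and a busy FPGA is
    treated as unavailable rather than slowed down.
    """
    estimates = {
        "x86": profile.x86 * (1.0 + x86_load),
        "arm": profile.arm * (1.0 + arm_load),
    }
    if not fpga_busy:
        estimates["fpga"] = profile.fpga
    # Pick the key with the smallest estimate; ties go to the first key inserted.
    return min(estimates, key=estimates.get)


# Example: a function that runs fastest on the FPGA when the FPGA is free.
p = Profile(x86=2.0, arm=5.0, fpga=1.5)
print(choose_target(p, x86_load=0.0, arm_load=0.0, fpga_busy=False))  # fpga
print(choose_target(p, x86_load=2.0, arm_load=0.0, fpga_busy=True))   # arm
```

A real run-time would refresh the load figures continuously and account for migration cost. Both are left out here to keep the decision rule visible.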

arXiv.org e-Print Archive

Edinburgh Research Explorer

Edge Computing: The Case for Heterogeneous-ISA Container Migration

Author: Attardi Giuseppe
Barbalace Antonio
Cadar Cristian
Collberg Christian S
Das Anirban
DeVuyst Matthew
Dinaburg Artem
Furlong Matthew
Gordon Mark S
Hightower Kelsey
Křoustek Jakub
Madhavapeddy Anil
Venkat Ashish
Wang Ruoyu
York Richard
Publication venue: Association for Computing Machinery (ACM)
Publication date: 17/03/2020

Crossref

Edinburgh Research Explorer

H-Container: Enabling Heterogeneous-ISA Container Migration in Edge Computing

Author: Barbalace Antonio
Karaoui Mohamed L.
Olivier Pierre
Ravindran Binoy
Wang Wei
Xing Tong
Publication venue: Association for Computing Machinery (ACM)
Publication date: 28/03/2022

Edinburgh Research Explorer

The University of Manchester - Institutional Repository

UNIFICO: Thread Migration in Heterogeneous-ISA CPUs without State Transformation

Author: Barbalace Antonio
Franke Björn
Khordadi Amir
Mavrogeorgis Nikolaos
Mu Pei
Vasiladiotis Christos
Publication date: 20/02/2024

Heterogeneous-ISA processor designs have attracted considerable research interest. However, unlike their homogeneous-ISA counterparts, they require explicit software support to bridge ISA heterogeneity. The lack of a compilation toolchain ready to support heterogeneous-ISA targets has been a major factor hindering research in this emerging area. For any such compiler, getting the mechanics of state transformation upon migration right, and doing so efficiently, is critical. In particular, any runtime conversion of the current program stack from one architecture to another would be prohibitively expensive. In this paper, we design and develop Unifico, a new multi-ISA compiler that generates binaries that maintain the same stack layout during their execution on either architecture. Unifico avoids the need for runtime stack transformation, thus eliminating the overheads associated with ISA migration. Additional responsibilities of the Unifico compiler backend include maintaining a uniform ABI and virtual address space across ISAs. Unifico is implemented using the LLVM compiler infrastructure and currently targets the x86-64 and ARMv8 ISAs. We have evaluated Unifico across a range of compute-intensive NAS benchmarks and show its minimal impact on overall execution time, with less than 6% overhead introduced on average. Compared with the state-of-the-art Popcorn compiler, Unifico reduces binary size overhead from ∼200% to ∼10%, while eliminating the stack transformation overhead during ISA migration.
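To make concrete the cost that Unifico avoids, here is a toy model of stack state transformation. Frame layouts are hypothetical mappings from variable names to byte offsets, and each variable is assumed to occupy one 8-byte slot. Real multi-ISA compilers work on machine-level frames with far more state (return addresses, callee-saved registers, pointers into the stack), so this is a conceptual sketch under those assumptions. `transform_frame` and `needs_transformation` are illustrative names, not part of Unifico or Popcorn.

```python
Layout = dict[str, int]  # hypothetical: variable name -> byte offset in the frame


def transform_frame(frame: bytes, src: Layout, dst: Layout, slot: int = 8) -> bytes:
    """Copy every variable from its source-ISA offset to its destination-ISA offset.

    This per-variable rewrite is the kind of work a state-transforming runtime
    must do at migration time when the two ISAs lay out frames differently.
    Assumes a non-empty layout in which each variable occupies `slot` bytes.
    """
    out = bytearray(max(dst.values()) + slot)  # frame sized for the dst layout
    for name, off in src.items():
        new_off = dst[name]
        out[new_off:new_off + slot] = frame[off:off + slot]
    return bytes(out)


def needs_transformation(src: Layout, dst: Layout) -> bool:
    """With identical layouts (the property the abstract describes),
    the frame can be reused as-is and no rewrite is needed."""
    return src != dst


# Example: two ISAs that place the same two locals in swapped slots.
x86 = {"i": 0, "sum": 8}
arm = {"i": 8, "sum": 0}
frame = (1).to_bytes(8, "little") + (42).to_bytes(8, "little")
moved = transform_frame(frame, x86, arm)
print(int.from_bytes(moved[0:8], "little"))  # 42 (sum now at offset 0)
```

Under a uniform layout `needs_transformation` returns `False`, and migration skips the copy loop entirely. That skipped work is the overhead the abstract says Unifico eliminates.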

Edinburgh Research Explorer

Recommended from our members

Achieving Accurate Predictions of Future Events Under Hardware Heterogeneity

Author: Prodromou Andreas
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Heterogeneous hardware is becoming increasingly common in modern systems, and research breakthroughs reinforce the expectation that heterogeneity will keep increasing. Appropriate use of heterogeneity can yield significant gains in performance and power consumption; poor use, however, can be detrimental. Intelligent scheduling and resource management are crucial challenges to overcome in order to harness the full potential of heterogeneous hardware. As systems grow larger and include greater hardware diversity, the importance of intelligent scheduling and resource management increases further.

This dissertation presents techniques that aid scheduling and resource management on heterogeneous hardware by accurately predicting upcoming runtime events. With a proactive and accurate view of the near future, schedulers can use the underlying hardware more efficiently and take full advantage of its benefits.

By adapting a majority element heuristic, this dissertation significantly improves the accuracy of predicting which memory addresses are about to be accessed, while reducing prediction-related costs by a factor of ten thousand compared to previously proposed predictive approaches. Coupled with novel microarchitectural modifications, accurate address predictions are shown to improve the performance of heterogeneous memory architectures.

Machine learning-based performance predictors are further presented, capable of predicting a program's performance when executed on a given general-purpose core. Trained to model the subtleties of the interaction between hardware and software, these predictors generate highly accurate predictions even for cores with different Instruction Set Architectures. Using these predictions for job scheduling is shown to improve overall system performance. The trained predictors are further examined and interpreted to visualize the correlations between features picked up and amplified during training.

Finally, this dissertation demonstrates that scheduling algorithms cannot guarantee an optimal schedule in realistic execution scenarios, because of the underlying hardware heterogeneity, the wide range of software runtime requirements, and prediction error from performance predictors. In response, deep neural networks are trained to select one scheduling approach from a list of options with varied overheads and correctness guarantees. The chosen approach is the one most likely to return the highest-performance schedule with the lowest overhead for a particular instance of the job-to-core assignment problem.
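The abstract does not describe how the majority element heuristic is adapted. As an assumption, the sketch below uses the classic Boyer-Moore majority vote algorithm over a stream of observed addresses and keeps only a candidate and a counter. `MajorityPredictor` is a hypothetical name. Treating the surviving candidate as the next-access prediction is an illustrative choice, not the dissertation's method.

```python
from typing import Iterable, Optional


class MajorityPredictor:
    """Streaming Boyer-Moore majority vote over observed (e.g. page) addresses."""

    def __init__(self) -> None:
        self.candidate: Optional[int] = None  # current majority candidate
        self.count = 0                        # net votes for the candidate

    def observe(self, addr: int) -> None:
        if self.count == 0:
            # No surviving candidate: adopt the new address.
            self.candidate, self.count = addr, 1
        elif addr == self.candidate:
            self.count += 1
        else:
            # A differing address cancels one vote for the candidate.
            self.count -= 1

    def observe_all(self, addrs: Iterable[int]) -> None:
        for a in addrs:
            self.observe(a)

    def predict(self) -> Optional[int]:
        """Hypothetical prediction: the candidate is the next address to be accessed."""
        return self.candidate


p = MajorityPredictor()
p.observe_all([0x1000, 0x2000, 0x1000, 0x3000, 0x1000])
print(hex(p.predict()))  # 0x1000
```

The state is two values regardless of stream length, so the bookkeeping cost per access is constant. The candidate is guaranteed to be the true majority only when one address accounts for more than half of the observations. Otherwise it is just the last survivor of the voting.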

eScholarship - University of California