Search CORE

4 research outputs found

SARA: Self-Aware Resource Allocation for Heterogeneous MPSoCs

Author: Alavoine Olivier
Lin Bill
Song Yang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/04/2018
Field of study

In modern heterogeneous MPSoCs, the management of shared memory resources is crucial in delivering end-to-end QoS. Previous frameworks have either focused on singular QoS targets or the allocation of partitionable resources among CPU applications at relatively slow timescales. However, heterogeneous MPSoCs typically require instant response from the memory system where most resources cannot be partitioned. Moreover, the health of different cores in a heterogeneous MPSoC is often measured by diverse performance objectives. In this work, we propose a Self-Aware Resource Allocation (SARA) framework for heterogeneous MPSoCs. Priority-based adaptation allows cores to use different target performance and self-monitor their own intrinsic health. In response, the system allocates non-partitionable resources based on priorities. The proposed framework meets a diverse range of QoS demands from heterogeneous cores.Comment: Accepted by the 55th annual Design Automation Conference 2018 (DAC'18

arXiv.org e-Print Archive

Crossref

DarkCache: Energy-performance Optimization of Tiled Multi-cores by Adaptively Power Gating LLC Banks

Author: Colombo Luca
Fornaciari William
Zoni Davide
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018
Field of study

The Last Level Cache (LLC) is a key element to improve application performance in multi-cores. To handle the worst case, the main design trend employs tiled architectures with a large LLC organized in banks, which goes underutilized in several realistic scenarios. Our proposal, named DarkCache, aims at properly powering off such unused banks to optimize the Energy-Delay Product (EDP) through an adaptive cache reconfiguration, thus aggressively reducing the leakage energy. The implemented solution is general and it can recognize and skip the activation of the DarkCache policy for the few strong memory intensive applications that actually require the use of the entire LLC. The validation has been carried out on 16- and 64-core architectures also accounting for two state-of-the-art methodologies. Compared to the baseline solution, DarkCache exhibits a performance overhead within 2% and an average EDP improvement of 32.58% and 36.41% considering 16 and 64 cores, respectively. Moreover, DarkCache shows an average EDP gain between 16.15% (16 cores) and 21.05% (64 cores) compared to the best state-of-the-art we evaluated, and it confirms a good scalability since the gain improves with the size of the architecture

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Recommended from our members

Improving virtual memory performance in virtualized environments

Author: Marathe Yashwant
Publication venue
Publication date: 25/02/2021
Field of study

Virtual Memory is a major system performance bottleneck in virtualized environments. In addition to expensive address translations, frequent virtual machine context switches are common in virtualized environments, resulting in increased TLB miss rates, subsequent expensive page walks and data cache contention due to incoming page table entries evicting useful data. Orthogonally, translation coherence, which is currently an expensive operation implemented in software, can consume up to 50% of the runtime of an application executing on the guest. To improve the performance of virtual memory in virtualized environments, two solutions have been proposed in this thesis - namely, (1) Context Switch Aware Large TLB (CSALT), an architecture which addresses the problem of increased TLB miss rates and their adverse impact on data caches. CSALT copes with the increased demand of context switches by storing a large number TLB entries. It mitigates data cache contention by employing a novel TLB-aware cache partitioning scheme. On 8-core systems that switch between two virtual machine contexts executing multi-threaded workloads, CSALT achieves an average performance improvement of 85% over a baseline with conventional L1-L2 TLBs and 25% over a baseline which has a large L3 TLB (2) Translation Coherence using Addressable TLBs (TCAT), a hardware translation coherence scheme which eliminates almost all of the overheads associated with address translation coherence. TCAT overlays translation coherence atop cache coherence to accurately identify slave cores. It then leverages the addressable Part-Of-Memory TLB (POM-TLB) to eliminate expensive Inter Processor Interrupts (IPI) and achieve precise invalidations on the slave core. On 8-core systems with one virtual machine context executing multi-threaded workloads, TCAT achieves an average performance improvement of 13% over the kvmtlb baselineElectrical and Computer Engineerin

Texas ScholarWorks