SARA: Self-Aware Resource Allocation for Heterogeneous MPSoCs
In modern heterogeneous MPSoCs, the management of shared memory resources is
crucial in delivering end-to-end QoS. Previous frameworks have either focused
on singular QoS targets or the allocation of partitionable resources among CPU
applications at relatively slow timescales. However, heterogeneous MPSoCs
typically require instant response from the memory system where most resources
cannot be partitioned. Moreover, the health of different cores in a
heterogeneous MPSoC is often measured by diverse performance objectives. In
this work, we propose a Self-Aware Resource Allocation (SARA) framework for
heterogeneous MPSoCs. Priority-based adaptation allows cores to set different
performance targets and self-monitor their own intrinsic health. In response,
the system allocates non-partitionable resources based on priorities. The
proposed framework meets a diverse range of QoS demands from heterogeneous
cores.
Comment: Accepted by the 55th Annual Design Automation Conference 2018
(DAC'18).
DarkCache: Energy-performance Optimization of Tiled Multi-cores by Adaptively Power Gating LLC Banks
The Last Level Cache (LLC) is a key element for improving application performance in multi-cores. To handle the worst case, the main design trend employs tiled architectures with a large LLC organized in banks, which is underutilized in several realistic scenarios. Our proposal, named DarkCache, powers off such unused banks through adaptive cache reconfiguration to optimize the Energy-Delay Product (EDP), thus aggressively reducing leakage energy. The implemented solution is general: it can recognize the few strongly memory-intensive applications that actually require the entire LLC and skip activation of the DarkCache policy for them. The validation has been carried out on 16- and 64-core architectures and includes a comparison with two state-of-the-art methodologies. Compared to the baseline solution, DarkCache exhibits a performance overhead within 2% and an average EDP improvement of 32.58% and 36.41% for 16 and 64 cores, respectively. Moreover, DarkCache shows an average EDP gain between 16.15% (16 cores) and 21.05% (64 cores) compared to the best state-of-the-art methodology we evaluated, and it scales well, since the gain grows with the size of the architecture.
Improving virtual memory performance in virtualized environments
Virtual memory is a major system performance bottleneck in virtualized environments. In addition to expensive address translations, frequent virtual machine context switches are common in virtualized environments, resulting in increased TLB miss rates, subsequent expensive page walks, and data cache contention due to incoming page table entries evicting useful data. Orthogonally, translation coherence, which is currently an expensive operation implemented in software, can consume up to 50% of the runtime of an application executing on the guest. To improve the performance of virtual memory in virtualized environments, this thesis proposes two solutions: (1) Context Switch Aware Large TLB (CSALT), an architecture that addresses the problem of increased TLB miss rates and their adverse impact on data caches. CSALT copes with the increased demand of context switches by storing a large number of TLB entries. It mitigates data cache contention by employing a novel TLB-aware cache partitioning scheme. On 8-core systems that switch between two virtual machine contexts executing multi-threaded workloads, CSALT achieves an average performance improvement of 85% over a baseline with conventional L1-L2 TLBs and 25% over a baseline that has a large L3 TLB. (2) Translation Coherence using Addressable TLBs (TCAT), a hardware translation coherence scheme that eliminates almost all of the overheads associated with address translation coherence. TCAT overlays translation coherence atop cache coherence to accurately identify slave cores. It then leverages the addressable Part-Of-Memory TLB (POM-TLB) to eliminate expensive Inter Processor Interrupts (IPIs) and achieve precise invalidations on the slave core. On 8-core systems with one virtual machine context executing multi-threaded workloads, TCAT achieves an average performance improvement of 13% over the kvmtlb baseline.
Electrical and Computer Engineering