A bandwidth-aware memory-subsystem resource management using non-invasive resource profilers for large CMP systems
Abstract—By integrating multiple cores on a single chip, Chip Multiprocessors (CMPs) provide an attractive approach to improving both system throughput and efficiency. This integration allows the sharing of on-chip resources, which may lead to destructive interference between the executing workloads. The memory subsystem is an important shared resource that contributes significantly to overall throughput and power consumption. In order to prevent destructive interference, the cache capacity and memory bandwidth requirements of the last-level cache have to be controlled. While previously proposed schemes focus on resource sharing within a chip, we explore additional possibilities both inside and outside a single chip. We propose a dynamic memory-subsystem resource management scheme that considers both cache capacity and memory bandwidth contention in large multi-chip CMP systems. Our approach uses low-overhead, non-invasive resource profilers based on Mattson's stack distance algorithm to project each core's resource requirements and guide our cache partitioning algorithms. Our bandwidth-aware algorithm seeks throughput optimizations across multiple chips by migrating workloads from the most resource-overcommitted chips to the ones with more available resources. Using bandwidth as a criterion results in an overall 18% reduction in memory bandwidth along with a 7.9% reduction in miss rate, compared to existing resource management schemes. Using a cycle-accurate full-system simulator, our approach achieved an average throughput improvement of 8.5%.
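To make the profiling idea concrete, below is a minimal sketch of a Mattson stack-distance profiler of the kind the abstract refers to: each access records the LRU-stack depth at which its cache line is found, and the resulting histogram projects how many misses a core would incur for any given allocation. The class and method names (StackDistanceProfiler, projected_misses), the cache-line granularity, and the fully associative stack model are illustrative assumptions, not the paper's implementation.

#include <algorithm>
#include <cstdint>
#include <iterator>
#include <list>
#include <vector>

// Sketch of a stack-distance profiler: one LRU stack per core, plus a
// histogram of reuse distances used to project miss counts offline.
class StackDistanceProfiler {
    std::list<uint64_t> lru_stack;   // most recently used line tag at front
    std::vector<uint64_t> hist;      // hist[d] = reuses found at stack depth d
    uint64_t cold_misses = 0;        // first-touch accesses (infinite distance)
    uint64_t deep_reuses = 0;        // reuses deeper than the tracked range

public:
    explicit StackDistanceProfiler(size_t max_ways) : hist(max_ways, 0) {}

    void access(uint64_t line_addr) {
        auto it = std::find(lru_stack.begin(), lru_stack.end(), line_addr);
        if (it == lru_stack.end()) {
            cold_misses++;                            // never seen before
        } else {
            size_t depth = static_cast<size_t>(std::distance(lru_stack.begin(), it));
            if (depth < hist.size()) hist[depth]++;
            else deep_reuses++;
            lru_stack.erase(it);
        }
        lru_stack.push_front(line_addr);              // becomes most recently used
    }

    // Project the misses this core would see with `ways` allocated ways:
    // every access whose reuse distance is at least `ways` would miss.
    uint64_t projected_misses(size_t ways) const {
        uint64_t misses = cold_misses + deep_reuses;
        for (size_t d = ways; d < hist.size(); ++d) misses += hist[d];
        return misses;
    }
};

Because the profiler only observes the access stream and never changes replacement decisions, it can run alongside the real cache, which is consistent with the non-invasive profiling the abstract describes.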
Bank-aware Dynamic Cache Partitioning for Multicore Architectures
Abstract—As Chip-Multiprocessor (CMP) systems have become the predominant topology for leading microprocessors, critical components of the system are now integrated on a single chip. This enables sharing of computational resources that was not previously possible. In addition, the virtualization of these computational resources exposes the system to a mix of diverse and competing workloads. The cache is a resource of primary concern, as it can be dominant in controlling overall throughput. In order to prevent destructive interference between divergent workloads, the last level of cache must be partitioned. Many solutions have been proposed in the past, but most of them assume either simplified cache hierarchies with no realistic restrictions or complex cache schemes that are difficult to integrate into a real design. To address this problem, we propose a dynamic partitioning strategy based on realistic last-level cache designs of CMP processors. We used a cycle-accurate, full-system simulator based on Simics and GEMS to evaluate our partitioning scheme on an 8-core DNUCA CMP system. Results for the 8-core system show that our proposed scheme provides on average a 70% reduction in misses compared to non-partitioned shared caches, and a 25% reduction in misses compared to statically, equally partitioned (private) caches.
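As an illustration of how per-core miss projections can drive a partitioning decision, here is a sketch of a generic utility-based (greedy marginal-gain) way-allocation step. This is not the paper's bank-aware algorithm, which additionally accounts for the banked DNUCA layout; the function name, the miss-curve input format, and the sample curves are hypothetical.

#include <cstdint>
#include <iostream>
#include <vector>

// misses[c][w] = projected misses of core c when given w ways (w = 0..W),
// e.g. obtained from a stack-distance profiler as sketched above.
std::vector<size_t> partition_ways(const std::vector<std::vector<uint64_t>>& misses,
                                   size_t total_ways) {
    size_t cores = misses.size();
    std::vector<size_t> alloc(cores, 0);
    for (size_t w = 0; w < total_ways; ++w) {
        // Hand the next way to the core with the largest marginal miss reduction.
        size_t best_core = 0;
        uint64_t best_gain = 0;
        for (size_t c = 0; c < cores; ++c) {
            size_t cur = alloc[c];
            if (cur + 1 >= misses[c].size()) continue;   // curve exhausted
            uint64_t gain = misses[c][cur] - misses[c][cur + 1];
            if (gain >= best_gain) { best_gain = gain; best_core = c; }
        }
        alloc[best_core]++;
    }
    return alloc;
}

int main() {
    // Hypothetical miss curves for two cores sharing 8 ways.
    std::vector<std::vector<uint64_t>> misses = {
        {1000, 600, 400, 300, 250, 230, 220, 215, 212},   // cache-sensitive core
        {500, 480, 470, 465, 462, 461, 460, 460, 460}     // cache-insensitive core
    };
    for (size_t ways : partition_ways(misses, 8)) std::cout << ways << " ";
    std::cout << "\n";   // most ways go to the cache-sensitive core
}

In this toy example the cache-sensitive core receives the bulk of the ways because each additional way removes more misses for it than for its streaming neighbor, which is the intuition behind dynamic, utility-driven partitioning of a shared last-level cache.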
