Search CORE

16 research outputs found

Architectural Support for Operating System-Driven CMP Cache Management

Author: Rafique Nauman
Taek Won
Thottethodi Mithuna
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/2006
Field of study

The role of the operating system (OS) in managing shared resources such as CPU time, memory, peripherals, and even energy is well motivated and understood [22]. Unfortu- nately, one key resource|lower-level shared cache in chip multi-processors|is commonly managed purely in hardware by rudimentary replacement policies such as least-recently- used (LRU). The rigid nature of the hardware cache manage- ment policy poses a serious problem since there is no single best cache management policy across all sharing scenarios. For example, the cache management policy for a scenario where applications from a single organization are running under \best effort performance expectation is likely to be different from the policy for a scenario where applications from competing business entities (say, at a third party data center) are running under a minimum service level expecta- tion. When it comes to managing shared caches, there is an inherent tension between exibility and performance. On one hand, managing the shared cache in the OS offers immense policy exibility since it may be implemented in soft- ware. Unfortunately, it is prohibitively expensive in terms of performance for the OS to be involved in managing tempo- rally fine-grain events such as cache allocation. On the other hand, sophisticated hardware-only cache management tech- niques to achieve fair sharing or throughput maximization have been proposed. But they offer no policy exibility. This paper addresses this problem by designing architec- tural support for OS to effciently manage shared caches with a wide variety of policies. Our scheme consists of a hard- ware cache quota management mechanism, an OS interface and a set of OS level quota orchestration policies. The hard- ware mechanism guarantees that OS-specifed quotas are en- forced in shared caches, thus eliminating the need for (and the performance penalty of) temporally fine-grained OS in- tervention. The OS retains policy exibility since it can tune the quotas during regularly scheduled OS interventions. We demonstrate that our scheme can support a wide range of policies including policies that provide (a) passive per- formance differentiation, (b) reactive fairness by miss-rate equalization and (c) reactive performance differentiation

CiteSeerX

Purdue E-Pubs

Recommended from our members

Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler

Author: Federova Alexandra
Seltzer Margo I.
Smith Michael D.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 18/12/2012
Field of study

We describe a new operating system scheduling algorithm that improves performance isolation on chip multiprocessors (CMP). Poor performance isolation occurs when an application’s performance is determined by the behaviour of its co-runners, i.e., other applications simultaneously running with it. This performance dependency is caused by unfair, corunner-dependent cache allocation on CMPs. Poor performance isolation interferes with the operating system’s control over priority enforcement and hinders QoS provisioning. Previous solutions required modifications to the hardware. We present a new software solution. Our cache-fair algorithm ensures that the application runs as quickly as it would under fair cache allocation, regardless of how the cache is actually allocated. If the thread executes fewer instructions per cycle than it would under fair cache allocation, the scheduler increases that thread’s CPU timeslice. This way, the thread’s overall performance does not suffer because it is allowed to use the CPU longer. We describe our implementation of the algorithm in Solaris™ 10, and show that it significantly improves performance isolation for SPEC CPU, SPEC JBB and TPC-C.Engineering and Applied Science

Harvard University - DASH

BP-NUCA: Cache Pressure-Aware Migration for High-Performance Caching in CMPs

Author: Fu Guitao
Jia Xiaomin
Jiang Jiang
Qi Shubo
Wang Yongwen
Zhang Minxuan
Zhao Tianlei
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 26/01/2012
Field of study

As the momentum behind Chip Multi-Processors (CMPs) continues to grow, Last Level Cache (LLC) management becomes a crucial issue to CMPs because off-chip accesses often involve a big latency. Private cache design is distinguished by smaller local access latency, good performance isolation and easy scalability, thus is becoming an attractive design alternative for LLC of CMPs. This paper proposes Balanced Private Non-Uniform Cache Architecture (BP-NUCA), a new LLC architecture that starts from private cache design for smaller local access latency and good performance isolation, then introduces a low cost mechanism to dynamically migrate private blocks among peer private caches of LLC to improve the overall space utilization. BP-NUCA achieves this by measuring the cache access pressure level that each cache set experiences at runtime and then using the information to guide block migration among different private caches of LLC. A heavily accessed set, namely a set with high access pressure level, is allowed to migrate its evicted blocks to peer private caches, replacing blocks of sets which are with the same index and have low access pressure level. By migrating blocks from heavily accessed cache sets to less accessed cache sets, BP-NUCA effectively balances space utilization of LLC among different cores. Experimental results using a full system CMP simulator show that BP-NUCA improves the overall throughput by as much as 20.3 %, 12.4 %, 14.5 % and 18.0 % (on average 7.7 %, 4.4 %, 4.0 % and 6.1 %) over private cache, shared cache, shared cache management scheme UCP and private cache organization CC respectively on a 4-core CMP for SPEC CPU2006 benchmarks

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

GDP : using dataflow properties to accurately estimate interference-free performance at runtime

Author: Eeckhout Lieven
Jahre Magnus
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

Multi-core memory systems commonly share resources between processors. Resource sharing improves utilization at the cost of increased inter-application interference which may lead to priority inversion, missed deadlines and unpredictable interactive performance. A key component to effectively manage multi-core resources is performance accounting which aims to accurately estimate interference-free application performance. Previously proposed accounting systems are either invasive or transparent. Invasive accounting systems can be accurate, but slow down latency-sensitive processes. Transparent accounting systems do not affect performance, but tend to provide less accurate performance estimates. We propose a novel class of performance accounting systems that achieve both performance-transparency and superior accuracy. We call the approach dataflow accounting, and the key idea is to track dynamic dataflow properties and use these to estimate interference-free performance. Our main contribution is Graph-based Dynamic Performance (GDP) accounting. GDP dynamically builds a dataflow graph of load requests and periods where the processor commits instructions. This graph concisely represents the relationship between memory loads and forward progress in program execution. More specifically, GDP estimates interference-free stall cycles by multiplying the critical path length of the dataflow graph with the estimated interference-free memory latency. GDP is very accurate with mean IPC estimation errors of 3.4% and 9.8% for our 4- and 8-core processors, respectively. When GDP is used in a cache partitioning policy, we observe average system throughput improvements of 11.9% and 20.8% compared to partitioning using the state-of-the-art Application Slowdown Model

Crossref

Ghent University Academic Bibliography

NORA - Norwegian Open Research Archives

Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler

Author: Alexandra Fedorova
Margo Seltzer
Michael D. Smith
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007
Field of study

(Article begins on next page) The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters. Citation Feorova, Alexandra, Margo Seltzer, and Michael D. Smith. 2007.Improving performance isolation on chip multiprocessors via an operating system scheduler. In Proceedings of the 16t

CiteSeerX

Crossref

Replacement policies for shared caches on symmetric multicores : a programmer-centric point of view

Author: Michaud Pierre
Publication venue: HAL CCSD
Publication date: 01/01/2008
Field of study

The presence of shared caches in current multicore processors may generate a lot of performance variability when several applications execute simultaneously. For the programmer of an application with quality-of-service goals, this performance variability may lead to a very pessimistic tuning. To solve this problem, there must be a way for the programmer to define a reasonable performance target and make sure that the actual performance is greater than or close to the target. We propose that the performance target be defined as the performancemeasured when each core runs a copy of the application, which we call self-performance. This study characterizes self-performance and explains how the shared-cache replacement policy can be modified for self-performance to be meaningful

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

HAL-Rennes 1

Memory hierarchy performance measurement of commercial dual-core desktop processors

Author: Carl Staelin
David Koppelman
Hammond
Jih-Kwon Peir
Kalla
Lu Peng
Tribuvan K. Prakash
Yen-Kuang Chen
Publication venue: 'Elsevier BV'
Publication date
Field of study

Crossref

A Survey on Cache Management Mechanisms for Real-Time Embedded Systems

Author: ALHAMMAD AHMED
FRÖHLICH ANTÔNIO
GRACIOLI GIOVANI
MANCUSO RENATO
Pellizzoni Rodolfo
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/11/2015
Field of study

© ACM, 2015. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Computing Surveys, {48, 2, (November 2015)} http://doi.acm.org/10.1145/2830555Multicore processors are being extensively used by real-time systems, mainly because of their demand for increased computing power. However, multicore processors have shared resources that affect the predictability of real-time systems, which is the key to correctly estimate the worst-case execution time of tasks. One of the main factors for unpredictability in a multicore processor is the cache memory hierarchy. Recently, many research works have proposed different techniques to deal with caches in multicore processors in the context of real-time systems. Nevertheless, a review and categorization of these techniques is still an open topic and would be very useful for the real-time community. In this article, we present a survey of cache management techniques for real-time embedded systems, from the first studies of the field in 1990 up to the latest research published in 2014. We categorize the main research works and provide a detailed comparison in terms of similarities and differences. We also identify key challenges and discuss future research directions.King Saud University NSER

University of Waterloo's Institutional Repository

Crossref

Leveraging virtualization technologies for resource partitioning in mixed criticality systems

Author: Li Ye
Publication venue
Publication date: 28/11/2015
Field of study

Multi- and many-core processors are becoming increasingly popular in embedded systems. Many of these processors now feature hardware virtualization capabilities, such as the ARM Cortex A15, and x86 processors with Intel VT-x or AMD-V support. Hardware virtualization offers opportunities to partition physical resources, including processor cores, memory and I/O devices amongst guest virtual machines. Mixed criticality systems and services can then co-exist on the same platform in separate virtual machines. However, traditional virtual machine systems are too expensive because of the costs of trapping into hypervisors to multiplex and manage machine physical resources on behalf of separate guests. For example, hypervisors are needed to schedule separate VMs on physical processor cores. Additionally, traditional hypervisors have memory footprints that are often too large for many embedded computing systems. This dissertation presents the design of the Quest-V separation kernel, which partitions services of different criticality levels across separate virtual machines, or sandboxes. Each sandbox encapsulates a subset of machine physical resources that it manages without requiring intervention of a hypervisor. In Quest-V, a hypervisor is not needed for normal operation, except to bootstrap the system and establish communication channels between sandboxes. This approach not only reduces the memory footprint of the most privileged protection domain, it removes it from the control path during normal system operation, thereby heightening security

Boston University Institutional Repository (OpenBU)