Fairness-aware scheduling on single-ISA heterogeneous multi-cores
Single-ISA heterogeneous multi-cores consisting of small (e.g., in-order) and big (e.g., out-of-order) cores dramatically improve energy and power efficiency by scheduling workloads on the most appropriate core type. A significant body of recent work has focused on improving system throughput through scheduling. However, none of the prior work has looked into fairness. Yet, guaranteeing that all threads make equal progress on heterogeneous multi-cores is of utmost importance for both multi-threaded and multi-program workloads to improve performance and quality-of-service. Furthermore, modern operating systems affinitize workloads to cores (pinned scheduling), which dramatically affects fairness on heterogeneous multi-cores. In this paper, we propose fairness-aware scheduling for single-ISA heterogeneous multi-cores, and explore two flavors for doing so. Equal-time scheduling runs each thread or workload on each core type for an equal fraction of the time, whereas equal-progress scheduling strives to get equal amounts of work done on each core type. Our experimental results demonstrate an average 14% (and up to 25%) performance improvement over pinned scheduling through fairness-aware scheduling for homogeneous multi-threaded workloads; equal-progress scheduling improves performance by 32% on average for heterogeneous multi-threaded workloads. Further, we report dramatic improvements in fairness over prior scheduling proposals for multi-program workloads, while achieving system throughput comparable to throughput-optimized scheduling, and an average 21% improvement in throughput over pinned scheduling.
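The two scheduling flavors named in this abstract can be illustrated with a toy model. The sketch below is not the paper's implementation: it assumes fixed time quanta, a known constant speed for each thread on each core type, and a progress metric defined as work done relative to running on a big core the whole time. The function names and the tie-breaking rule are hypothetical, chosen only for illustration.

```python
def equal_time_schedule(num_threads, num_big, num_quanta):
    """Rotate threads over the big cores round-robin, so that every thread
    receives the same share of big-core time (assumed toy policy)."""
    schedule = []
    for q in range(num_quanta):
        start = (q * num_big) % num_threads
        on_big = {(start + i) % num_threads for i in range(num_big)}
        schedule.append(on_big)
    return schedule


def equal_progress_schedule(big_speed, small_speed, num_big, num_quanta):
    """Each quantum, give the big cores to the threads that have made the
    least progress so far. Progress is assumed to be work done divided by
    the work the thread would have done on a big core for the same time."""
    n = len(big_speed)
    work = [0.0] * n
    schedule = []
    for q in range(num_quanta):
        # Progress after q completed quanta; all zero before the first one.
        progress = [work[t] / (big_speed[t] * q) if q else 0.0 for t in range(n)]
        # Ties are broken by thread index (an arbitrary illustrative choice).
        laggards = sorted(range(n), key=lambda t: (progress[t], t))[:num_big]
        on_big = set(laggards)
        for t in range(n):
            work[t] += big_speed[t] if t in on_big else small_speed[t]
        schedule.append(on_big)
    final_progress = [work[t] / (big_speed[t] * num_quanta) for t in range(n)]
    return schedule, final_progress
```

In this model, equal-time scheduling ignores how much each thread suffers on a small core, while equal-progress scheduling gives more big-core time to threads that slow down more on small cores.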
Utilizing Criticality Stacks for Dynamic Voltage and Frequency Scaling
Thread imbalance is inevitable for multithreaded applications due to the necessity of synchronization
primitives to coordinate access to memory and system resources. This imbalance leads to
a bounding of application performance, but, more importantly for mobile devices, this imbalance
also leads to energy inefficiencies. Recent works have begun to quantify this imbalance and look
to leverage it not only for performance improvements, but for energy savings as well. All these
works, though, test the theory through the use of simulators and power estimation tools. These
results may show that the theory is sound, but the complexities of how a real machine handles synchronization
may lead to diminished results by either having too large a performance impact,
or too little energy savings. In this work, we implement one such algorithm, PCSLB, and improve
upon it in order to see if the results shown for this technique are feasible for use in real machines.
With the improved algorithm, PCSLB-Max, and the CritScale Linux kernel module, we show that,
in fact, there are energy savings available to us while mitigating the performance