698 research outputs found
Fairness-aware scheduling on single-ISA heterogeneous multi-cores
Single-ISA heterogeneous multi-cores consisting of small (e.g., in-order) and big (e.g., out-of-order) cores dramatically improve energy- and power-efficiency by scheduling workloads on the most appropriate core type. A significant body of recent work has focused on improving system throughput through scheduling. However, none of the prior work has looked into fairness. Yet, guaranteeing that all threads make equal progress on heterogeneous multi-cores is of utmost importance for both multi-threaded and multi-program workloads to improve performance and quality-of-service. Furthermore, modern operating systems affinitize workloads to cores (pinned scheduling) which dramatically affects fairness on heterogeneous multi-cores. In this paper, we propose fairness-aware scheduling for single-ISA heterogeneous multi-cores, and explore two flavors for doing so. Equal-time scheduling runs each thread or workload on each core type for an equal fraction of the time, whereas equal-progress scheduling strives at getting equal amounts of work done on each core type. Our experimental results demonstrate an average 14% (and up to 25%) performance improvement over pinned scheduling through fairness-aware scheduling for homogeneous multi-threaded workloads; equal-progress scheduling improves performance by 32% on average for heterogeneous multi-threaded workloads. Further, we report dramatic improvements in fairness over prior scheduling proposals for multi-program workloads, while achieving system throughput comparable to throughput-optimized scheduling, and an average 21% improvement in throughput over pinned scheduling
POSTER: Exploiting asymmetric multi-core processors with flexible system sofware
Energy efficiency has become the main challenge for high performance computing (HPC). The use of mobile asymmetric multi-core architectures to build future multi-core systems is an approach towards energy savings while keeping high performance. However, it is not known yet whether such systems are ready to handle parallel applications. This paper fills this gap by evaluating emerging parallel applications on an asymmetric multi-core. We make use of the PARSEC benchmark suite and a processor that implements the ARM big.LITTLE architecture. We conclude that these applications are not mature enough to run on such systems, as they suffer from load imbalance.
Furthermore, we explore the behaviour of dynamic scheduling solutions on either the Operating System (OS) or the runtime level. Comparing these approaches shows us that the most efficient scheduling takes place in the runtime level, influencing the future research towards such solutions.This work has been supported by the Spanish Government (SEV2015-0493), by the Spanish Ministry of Science and Innovation (contracts TIN2015-65316-P), by Generalitat de
Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), by the RoMoL ERC Advanced Grant (GA 321253) and the
European HiPEAC Network of Excellence. The Mont-Blanc project receives funding from the EU's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 610402 and from the EU's H2020 Framework Programme (H2020/2014-2020) under grant agreement number 671697.
M. Moretó has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047. M. Casas
is supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund programme of the Marie
Curie Actions of the 7th R&D Framework Programme of the European Union (Contract 2013 BP B 00243).Peer ReviewedPostprint (author's final draft
Modeling and scheduling heterogeneous multi-core architectures
Om de prestatie van toekomstige processors en processorarchitecturen te evalueren wordt vaak gebruik gemaakt van een simulator die het gedrag en de prestatie van de processor modelleert. De prestatie bepalen van de uitvoering van een computerprogramma op een gegeven processorarchitectuur m.b.v. een simulator duurt echter vele grootteordes langer dan de werkelijke uitvoeringstijd. Dit beperkt in belangrijke mate de hoeveelheid experimenten die gedaan kunnen worden.
In dit doctoraatswerk werd het Multi-Program Performance Model (MPPM) ontwikkeld, een innovatief alternatief voor traditionele simulatie, dat het mogelijk maakt om tot 100.000x sneller een processorconfiguratie te evalueren. MPPM laat ons toe om nooit geziene exploraties te doen. Gebruik makend van dit raamwerk hebben we aangetoond dat de taakplanning cruciaal is om heterogene meerkernige processors optimaal te benutten.
Vervolgens werd een nieuwe manier voorgesteld om op een schaalbare manier de taakplanning uit te voeren, namelijk Performance Impact Estimation (PIE). Tijdens de uitvoering van een draad op een gegeven processorkern schatten we de prestatie op een ander type kern op basis van eenvoudig op te meten prestatiemetrieken. Zo beschikken we op elk moment over alle nodige informatie om een efficiënte taakplanning te doen. Dit laat ons bovendien toe te optimaliseren voor verschillende criteria zoals uitvoeringstijd, doorvoersnelheid of fairness
Mage: Online Interference-Aware Scheduling in Multi-Scale Heterogeneous Systems
Heterogeneity has grown in popularity both at the core and server level as a
way to improve both performance and energy efficiency. However, despite these
benefits, scheduling applications in heterogeneous machines remains
challenging. Additionally, when these heterogeneous resources accommodate
multiple applications to increase utilization, resources are prone to
contention, destructive interference, and unpredictable performance. Existing
solutions examine heterogeneity either across or within a server, leading to
missed performance and efficiency opportunities. We present Mage, a practical
interference-aware runtime that optimizes performance and efficiency in systems
with intra- and inter-server heterogeneity. Mage leverages fast and online data
mining to quickly explore the space of application placements, and determine
the one that minimizes destructive interference between co-resident
applications. Mage continuously monitors the performance of active
applications, and, upon detecting QoS violations, it determines whether
alternative placements would prove more beneficial, taking into account any
overheads from migration. Across 350 application mixes on a heterogeneous CMP,
Mage improves performance by 38% and up to 2x compared to a greedy scheduler.
Across 160 mixes on a heterogeneous cluster, Mage improves performance by 30%
on average and up to 52% over the greedy scheduler, and by 11% over the
combination of Paragon [15] for inter- and intra-server heterogeneity
COLAB:A Collaborative Multi-factor Scheduler for Asymmetric Multicore Processors
Funding: Partially funded by the UK EPSRC grants Discovery: Pattern Discovery and Program Shaping for Many-core Systems (EP/P020631/1) and ABC: Adaptive Brokerage for Cloud (EP/R010528/1); Royal Academy of Engineering under the Research Fellowship scheme.Increasingly prevalent asymmetric multicore processors (AMP) are necessary for delivering performance in the era of limited power budget and dark silicon. However, the software fails to use them efficiently. OS schedulers, in particular, handle asymmetry only under restricted scenarios. We have efficient symmetric schedulers, efficient asymmetric schedulers for single-threaded workloads, and efficient asymmetric schedulers for single program workloads. What we do not have is a scheduler that can handle all runtime factors affecting AMP for multi-threaded multi-programmed workloads. This paper introduces the first general purpose asymmetry-aware scheduler for multi-threaded multi-programmed workloads. It estimates the performance of each thread on each type of core and identifies communication patterns and bottleneck threads. The scheduler then makes coordinated core assignment and thread selection decisions that still provide each application its fair share of the processor's time. We evaluate our approach using the GEM5 simulator on four distinct big.LITTLE configurations and 26 mixed workloads composed of PARSEC and SPLASH2 benchmarks. Compared to the state-of-the art Linux CFS and AMP-aware schedulers, we demonstrate performance gains of up to 25% and 5% to 15% on average depending on the hardware setup.Postprin
- …