
    Fairness-aware scheduling on single-ISA heterogeneous multi-cores

    Single-ISA heterogeneous multi-cores consisting of small (e.g., in-order) and big (e.g., out-of-order) cores dramatically improve energy- and power-efficiency by scheduling workloads on the most appropriate core type. A significant body of recent work has focused on improving system throughput through scheduling. However, none of the prior work has looked into fairness. Yet, guaranteeing that all threads make equal progress on heterogeneous multi-cores is of utmost importance for both multi-threaded and multi-program workloads to improve performance and quality-of-service. Furthermore, modern operating systems affinitize workloads to cores (pinned scheduling), which dramatically affects fairness on heterogeneous multi-cores. In this paper, we propose fairness-aware scheduling for single-ISA heterogeneous multi-cores, and explore two flavors for doing so. Equal-time scheduling runs each thread or workload on each core type for an equal fraction of the time, whereas equal-progress scheduling strives to get equal amounts of work done on each core type. Our experimental results demonstrate an average 14% (and up to 25%) performance improvement over pinned scheduling through fairness-aware scheduling for homogeneous multi-threaded workloads; equal-progress scheduling improves performance by 32% on average for heterogeneous multi-threaded workloads. Further, we report dramatic improvements in fairness over prior scheduling proposals for multi-program workloads, while achieving system throughput comparable to throughput-optimized scheduling, and an average 21% improvement in throughput over pinned scheduling.
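    The equal-progress flavor can be illustrated with a small scheduling loop: each quantum, the threads that have accumulated the least normalized work are moved to the big cores. The sketch below is a minimal illustration under assumed thread names, quantum granularity, and a fixed big-core speedup; it is not the paper's implementation.

```python
# Minimal sketch of equal-progress scheduling on a big.LITTLE-style multi-core.
# Thread names, the quantum granularity, and the fixed big-core speedup are
# illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Thread:
    name: str
    progress: float = 0.0  # work completed so far, normalized to little-core speed

def schedule_quantum(threads, num_big, big_speedup=2.0):
    """Give the big cores to the threads that have made the least progress."""
    by_progress = sorted(threads, key=lambda t: t.progress)
    on_big, on_little = by_progress[:num_big], by_progress[num_big:]
    for t in on_big:
        t.progress += big_speedup  # a big core retires more work per quantum
    for t in on_little:
        t.progress += 1.0
    return [t.name for t in on_big]

threads = [Thread(f"t{i}") for i in range(4)]
for quantum in range(6):
    print(f"quantum {quantum}: big core runs {schedule_quantum(threads, num_big=1)}")
```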

    POSTER: Exploiting asymmetric multi-core processors with flexible system software

    Energy efficiency has become the main challenge for high performance computing (HPC). The use of mobile asymmetric multi-core architectures to build future multi-core systems is an approach towards energy savings while keeping high performance. However, it is not yet known whether such systems are ready to handle parallel applications. This paper fills this gap by evaluating emerging parallel applications on an asymmetric multi-core. We make use of the PARSEC benchmark suite and a processor that implements the ARM big.LITTLE architecture. We conclude that these applications are not mature enough to run on such systems, as they suffer from load imbalance. Furthermore, we explore the behaviour of dynamic scheduling solutions at either the operating system (OS) or the runtime level. Comparing these approaches shows that the most efficient scheduling takes place at the runtime level, pointing future research towards such solutions. This work has been supported by the Spanish Government (SEV2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), by the RoMoL ERC Advanced Grant (GA 321253) and the European HiPEAC Network of Excellence. The Mont-Blanc project receives funding from the EU's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 610402 and from the EU's H2020 Framework Programme (H2020/2014-2020) under grant agreement number 671697. M. Moretó has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047. M. Casas is supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund programme of the Marie Curie Actions of the 7th R&D Framework Programme of the European Union (Contract 2013 BP B 00243).

    Modeling and scheduling heterogeneous multi-core architectures

    To evaluate the performance of future processors and processor architectures, a simulator is typically used that models the behavior and performance of the processor. However, determining the performance of a program running on a given processor architecture by means of a simulator takes many orders of magnitude longer than the actual execution time, which severely limits the number of experiments that can be performed. This doctoral work develops the Multi-Program Performance Model (MPPM), an innovative alternative to traditional simulation that makes it possible to evaluate a processor configuration up to 100,000x faster. MPPM enables explorations at a previously unattainable scale. Using this framework, we show that scheduling is crucial to making optimal use of heterogeneous multi-core processors. We then propose a new, scalable way to perform scheduling, namely Performance Impact Estimation (PIE). While a thread executes on a given core type, we estimate its performance on the other core type based on easily measured performance metrics, so that at any point in time all the information needed for efficient scheduling is available. This also allows optimizing for different criteria such as execution time, throughput, or fairness.
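    The core idea behind PIE-style scheduling can be sketched as follows: while a thread runs on one core type, predict its CPI on the other type from easily measured counters and pick the thread with the largest predicted benefit for the big core. The split into base and memory components and the scaling factors below are assumptions for illustration, not the published model.

```python
# Illustrative sketch of Performance Impact Estimation (PIE)-style reasoning.
# The scaling factors and counter split are assumed, not the paper's model.

def estimate_big_core_cpi(little_cpi_base, little_cpi_mem,
                          ilp_scaling=0.5, mlp_scaling=0.7):
    """Predict CPI on the big (out-of-order) core from little-core measurements.

    little_cpi_base: non-memory CPI measured on the in-order core
    little_cpi_mem:  memory-stall CPI measured on the in-order core
    ilp_scaling:     how much the OoO core shrinks the base component (assumed)
    mlp_scaling:     how much extra MLP shrinks memory stalls (assumed)
    """
    return little_cpi_base * ilp_scaling + little_cpi_mem * mlp_scaling

def pick_thread_for_big_core(threads):
    """Choose the thread with the largest predicted big-core speedup."""
    def speedup(t):
        little_cpi = t["cpi_base"] + t["cpi_mem"]
        big_cpi = estimate_big_core_cpi(t["cpi_base"], t["cpi_mem"])
        return little_cpi / big_cpi
    return max(threads, key=speedup)

threads = [
    {"name": "compute-bound", "cpi_base": 1.2, "cpi_mem": 0.3},
    {"name": "memory-bound",  "cpi_base": 0.8, "cpi_mem": 2.5},
]
print(pick_thread_for_big_core(threads)["name"])  # compute-bound gains more
```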

    Mage: Online Interference-Aware Scheduling in Multi-Scale Heterogeneous Systems

    Heterogeneity has grown in popularity both at the core and server level as a way to improve both performance and energy efficiency. However, despite these benefits, scheduling applications in heterogeneous machines remains challenging. Additionally, when these heterogeneous resources accommodate multiple applications to increase utilization, resources are prone to contention, destructive interference, and unpredictable performance. Existing solutions examine heterogeneity either across or within a server, leading to missed performance and efficiency opportunities. We present Mage, a practical interference-aware runtime that optimizes performance and efficiency in systems with intra- and inter-server heterogeneity. Mage leverages fast and online data mining to quickly explore the space of application placements and determine the one that minimizes destructive interference between co-resident applications. Mage continuously monitors the performance of active applications and, upon detecting QoS violations, determines whether alternative placements would prove more beneficial, taking into account any overheads from migration. Across 350 application mixes on a heterogeneous CMP, Mage improves performance by 38% and up to 2x compared to a greedy scheduler. Across 160 mixes on a heterogeneous cluster, Mage improves performance by 30% on average and up to 52% over the greedy scheduler, and by 11% over the combination of Paragon [15] for inter- and intra-server heterogeneity.
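    The monitor-and-migrate loop described above can be sketched as follows: on a QoS violation, consider moving the application to the placement with the best predicted performance, but only if the predicted gain outweighs the migration cost. The placement names, the performance model, and the cost value are placeholders, not Mage's actual data-mining machinery.

```python
# Hedged sketch of a Mage-style decision loop; all values below are placeholders.

def qos_violated(app):
    return app["observed_perf"] < app["qos_target"]

def maybe_migrate(app, current, placements, predict_perf, migration_cost=0.05):
    """Return the placement the application should use for the next interval."""
    if not qos_violated(app):
        return current
    candidate = max(placements, key=lambda p: predict_perf(app, p))
    gain = predict_perf(app, candidate) - app["observed_perf"]
    # Migrate only when the expected improvement exceeds the migration overhead.
    return candidate if gain > migration_cost else current

app = {"observed_perf": 0.8, "qos_target": 1.0}
perf_model = {"big-core-alone": 1.1, "big-core-shared": 0.85, "little-core": 0.6}
print(maybe_migrate(app, "big-core-shared", list(perf_model),
                    lambda a, p: perf_model[p]))  # -> big-core-alone
```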

    COLAB: A Collaborative Multi-factor Scheduler for Asymmetric Multicore Processors

    Funding: Partially funded by the UK EPSRC grants Discovery: Pattern Discovery and Program Shaping for Many-core Systems (EP/P020631/1) and ABC: Adaptive Brokerage for Cloud (EP/R010528/1); Royal Academy of Engineering under the Research Fellowship scheme. Increasingly prevalent asymmetric multicore processors (AMPs) are necessary for delivering performance in the era of limited power budgets and dark silicon. However, software fails to use them efficiently. OS schedulers, in particular, handle asymmetry only under restricted scenarios. We have efficient symmetric schedulers, efficient asymmetric schedulers for single-threaded workloads, and efficient asymmetric schedulers for single-program workloads. What we do not have is a scheduler that can handle all runtime factors affecting AMPs for multi-threaded multi-programmed workloads. This paper introduces the first general-purpose asymmetry-aware scheduler for multi-threaded multi-programmed workloads. It estimates the performance of each thread on each type of core and identifies communication patterns and bottleneck threads. The scheduler then makes coordinated core assignment and thread selection decisions that still provide each application its fair share of the processor's time. We evaluate our approach using the GEM5 simulator on four distinct big.LITTLE configurations and 26 mixed workloads composed of PARSEC and SPLASH2 benchmarks. Compared to the state-of-the-art Linux CFS and AMP-aware schedulers, we demonstrate performance gains of up to 25%, and 5% to 15% on average, depending on the hardware setup.
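    The coordinated decision described above can be sketched as a multi-factor ranking: combine each thread's predicted big-core speedup with how many other threads block on it (a bottleneck proxy), then hand big cores to the top-ranked threads while capping each application's share of big-core time for fairness. The weights, field names, and fairness cap below are illustrative assumptions, not COLAB's exact policy.

```python
# Rough sketch of COLAB-style multi-factor core assignment; weights and the
# fairness cap are illustrative assumptions.

def rank(thread, w_speedup=0.6, w_block=0.4):
    return w_speedup * thread["predicted_speedup"] + w_block * thread["blocked_waiters"]

def assign_big_cores(threads, num_big, big_share, fair_cap=0.5):
    """Pick threads for the big cores, skipping apps already over their fair share."""
    eligible = [t for t in threads if big_share.get(t["app"], 0.0) < fair_cap]
    chosen = sorted(eligible, key=rank, reverse=True)[:num_big]
    for t in chosen:
        big_share[t["app"]] = big_share.get(t["app"], 0.0) + 1.0 / len(threads)
    return chosen

threads = [
    {"app": "A", "predicted_speedup": 1.8, "blocked_waiters": 0},
    {"app": "A", "predicted_speedup": 1.2, "blocked_waiters": 3},  # bottleneck thread
    {"app": "B", "predicted_speedup": 1.5, "blocked_waiters": 1},
]
print(assign_big_cores(threads, num_big=1, big_share={}))
```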