11 research outputs found

    SUBOPTIMAL APPROACHES TO SCHEDULING MALLEABLE TASKS

    In the paper, the problem of scheduling a set of n malleable tasks on m parallel computers is considered. The tasks may be executed by several processors simultaneously, and the processing speed of a task is a function of the number of processors allotted. The problem is motivated by real-life applications of parallel computer systems in scientific computing of highly parallelizable tasks. Starting from the continuous version of the problem (i.e., where the tasks may require a fractional part of the resources), we propose a general approximation algorithm with a performance guarantee equal to 2. Then, some improvements are derived that lead to a very good average behavior of the scheduling algorithm. Pozna

    DROM: Enabling Efficient and Effortless Malleability for Resource Managers

    In the design of future HPC systems, research in resource management is showing an increasing interest in a more dynamic control of the available resources. It has been proven that enabling jobs to change the number of computing resources at run time, i.e., their malleability, can significantly improve HPC system performance. However, job schedulers and applications typically do not support malleability due to the common belief that it introduces additional programming complexity and performance impact. This paper presents DROM, an interface that provides efficient malleability with no effort for program developers. The running application is enabled to adapt the number of threads to the number of assigned computing resources in a way that is completely transparent to the user, through the integration of DROM with standard programming models such as OpenMP/OmpSs and MPI. We designed the APIs to be easily used by any programming model, application, and job scheduler or resource manager. Our experimental results from the analysis of two realistic use cases, based on malleability by reducing the number of cores a job uses per node and on job co-allocation, show the potential of DROM for improving the performance of HPC systems. In particular, a workload of two MPI+OpenMP neuro-simulators is tested, reporting improvements in system metrics such as total run time and average response time of up to 8% and 48%, respectively. This work is partially supported by the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology through the TIN2015-65316-P project, by the Generalitat de Catalunya (contract 2017-SGR-1414), and by the European Union's Horizon 2020 under grant agreement No 785907 (HBP SGA2). Peer Reviewed. Postprint (author's final draft

    Provably Efficient Adaptive Scheduling for Parallel Jobs

    Scheduling competing jobs on multiprocessors has always been an important issue for parallel and distributed systems. The challenge is to ensure global, system-wide efficiency while offering a level of fairness to user jobs. Various degrees of success have been achieved over the years. However, few existing schemes address both efficiency and fairness over a wide range of workloads. Moreover, in order to obtain analytical results, most of them require prior information about jobs, which may be difficult to obtain in real applications. This paper presents two novel adaptive scheduling algorithms: GRAD for centralized scheduling and WRAD for distributed scheduling. Both GRAD and WRAD ensure fair allocation under all levels of workload, and they offer provable efficiency without requiring prior information about a job's parallelism. Moreover, they provide effective control over the scheduling overhead and ensure efficient utilization of processors. To the best of our knowledge, they are the first non-clairvoyant scheduling algorithms that offer such guarantees. We also believe that our new approach of a resource request-allotment protocol deserves further exploration. Specifically, both GRAD and WRAD are O(1)-competitive with respect to mean response time for batched jobs, and O(1)-competitive with respect to makespan for non-batched jobs with arbitrary release times. The simulation results show that, for non-batched jobs, the makespan produced by GRAD is no more than 1.39 times the optimal on average and never exceeds 4.5 times the optimal. For batched jobs, the mean response time produced by GRAD is no more than 2.37 times the optimal on average and never exceeds 5.5 times the optimal. Singapore-MIT Alliance (SMA

    Holistic Slowdown Driven Scheduling and Resource Management for Malleable Jobs

    In job scheduling, the concept of malleability has been explored for many years. Research shows that malleability improves system performance, but its use in HPC never became widespread. The causes are the difficulty of developing malleable applications and the lack of support and integration across the different layers of the HPC software stack. However, in recent years, malleability in job scheduling has become more critical because of the increasing complexity of hardware and workloads. In this context, using nodes in exclusive mode is not always the most efficient solution, as it was for traditional HPC jobs, where applications were highly tuned for static allocations but offered zero flexibility for dynamic execution. This paper proposes a new holistic, dynamic job scheduling policy, Slowdown Driven (SD-Policy), which exploits the malleability of applications as the key technology to reduce the average slowdown and response time of jobs. SD-Policy is based on backfill and node sharing. It applies malleability to running jobs to make room for jobs that will run with a reduced set of resources, but only when the estimated slowdown improves over the static approach. We implemented SD-Policy in SLURM and evaluated it in a real production environment and with a simulator using workloads of up to 198K jobs. Results show better resource utilization, with reductions in makespan, response time, slowdown, and energy consumption of up to 7%, 50%, 70%, and 6%, respectively, for the evaluated workloads.
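    The decision rule described in the abstract above (shrink running jobs only when the estimated slowdown improves over the static approach) can be illustrated with a minimal sketch. The abstract does not say how SD-Policy computes its estimate, so everything here is an assumption: slowdown is taken to be the common ratio of response time to run time, the malleable case is normalized by the job's static run time, and the function names `slowdown` and `prefer_malleable` are hypothetical.

```python
def slowdown(wait_time: float, run_time: float) -> float:
    """Slowdown as (wait + run) / run. This common definition is an
    assumption; the abstract does not define the metric."""
    return (wait_time + run_time) / run_time


def prefer_malleable(static_wait: float, static_run: float,
                     shrunk_wait: float, shrunk_run: float) -> bool:
    """Hypothetical decision rule: start a job on a reduced allocation
    only if its estimated slowdown beats the static estimate.

    The shrunk response time is normalized by the static run time so
    that stretching the job on fewer resources counts against it.
    """
    static_sd = slowdown(static_wait, static_run)
    shrunk_sd = (shrunk_wait + shrunk_run) / static_run
    return shrunk_sd < static_sd


# A job that would wait 3600 s and then run 1800 s on a full allocation,
# versus starting immediately and running 2700 s on fewer resources.
print(slowdown(3600, 1800))
print(prefer_malleable(3600, 1800, 0, 2700))
```

    With these illustrative numbers the static slowdown is 3.0 and the shrunk estimate is 1.5, so the sketch prefers the malleable start. A job that would not have waited at all is never shrunk under this rule, because stretching its run time can only raise its slowdown.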

    Combining malleability and I/O control mechanisms to enhance the execution of multiple applications

    This work presents a common framework that integrates CLARISSE, a cross-layer runtime for the I/O software stack, and FlexMPI, a runtime that provides dynamic load balancing and malleability capabilities for MPI applications. This integration is performed both at the application level, as libraries executed within the application, and at the central-controller level, as external components that manage the execution of different applications. We show that cooperation between both runtimes provides important benefits for overall system performance. First, by means of monitoring, the CPU, communication, and I/O performance of all executing applications is collected, providing a holistic view of the complete platform utilization. Second, we introduce a coordinated way of using the CLARISSE and FlexMPI control mechanisms, based on two different optimization strategies, with the aim of improving both application I/O and overall system performance. Finally, we present a detailed description of this proposal, as well as an empirical evaluation of the framework on a cluster, showing significant performance improvements at both the application and platform-wide levels. We demonstrate that with this proposal the overall I/O time of an application can be reduced by up to 49% and the aggregated FLOPS of all running applications can be increased by 10% with respect to the baseline case. (C) 2018 Elsevier Inc. All rights reserved. The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work has been partially supported by the Spanish “Ministerio de Economia y Competitividad” under the project grant TIN2016-79637-P “Towards Unification of HPC and Big Data paradigms” and by the EU under the COST Program Action IC1305, Network for Sustainable Ultrascale Computing (NESUS).

    Provably efficient online non-clairvoyant adaptive scheduling

    To the best of our knowledge, GRAD is the first non-clairvoyant scheduling algorithm that offers such guarantees. We also believe that our new approach of a resource request-allotment protocol deserves further exploration. The simulation results show that, for non-batched jobs, the makespan produced by GRAD is no more than 1.39 times the optimal on average. For batched jobs, the mean response time produced by GRAD is no more than 2.37 times the optimal on average.

    Group-based optimization for parallel job scheduling in clusters via heuristic search

    Job scheduling for parallel processing typically makes scheduling decisions on a per-job basis due to the dynamic arrival of jobs. Such decision making provides limited options for finding globally best schedules. Most research uses off-line optimization, which is not realistic. We propose an optimization on the basis of limited-size dynamic job grouping per priority class. We apply heuristic, domain-knowledge-based high-level search and branch-and-bound methods to heavy workload traces to capture good schedules. Special plan-based conservative backfilling and shifting policies are used to augment the search. Our objective is to minimize average relative response times for long and medium job classes while keeping utilization high. The scheduling algorithm is extended from the SCOJO-PECT coarse-grain pre-emptive time-sharing scheduler. The proposed scheduler was evaluated using real traces and the Lublin-Feitelson synthetic workload model. The comparisons were made with the conservative SCOJO-PECT scheduler. The results are promising: the average relative response times were improved by 18-32 while the loss of utilization was still contained within 2

    Efficient Approximation Algorithms for Scheduling Malleable Tasks

    A malleable task is a computational unit which may be executed on any arbitrary number of processors, its execution time depending on the amount of resources allotted to it. According to the standard behavior of parallel applications, we assume that the malleable tasks are monotonic, i.e., that the execution time decreases with the number of processors while the computational work increases. This paper presents a new approach for scheduling a set of independent malleable tasks which leads to a worst-case guarantee of √3 for the minimization of the parallel execution time, or makespan. It improves on all other existing practical results, including the two-phase method introduced by Turek et al. The main idea is to transfer the difficulty of a two-phase method from the scheduling part to the allotment selection. We show how to formulate this last problem as a knapsack optimization problem. Then, the scheduling problem is solved by a dual approximation which leads to a simple structure of two consecutive shelves.
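    The monotonicity assumption in the abstract above can be made concrete with a short check. This is only an illustrative sketch, not code from the paper: the function name `is_monotonic`, the list-based task representation, the use of non-strict inequalities, and the Amdahl-style example profile are all assumptions. Work is taken to be the number of processors multiplied by the execution time.

```python
from typing import Sequence


def is_monotonic(times: Sequence[float]) -> bool:
    """Check the monotonic malleable-task assumption.

    times[i] is the execution time on i + 1 processors (hypothetical
    representation). The task is monotonic if the time never increases
    and the work p * t(p) never decreases as processors are added.
    """
    works = [(i + 1) * t for i, t in enumerate(times)]
    time_ok = all(a >= b for a, b in zip(times, times[1:]))
    work_ok = all(a <= b for a, b in zip(works, works[1:]))
    return time_ok and work_ok


# Illustrative profile: 10% sequential part, 90% perfectly parallel part.
amdahl = [0.1 + 0.9 / p for p in range(1, 9)]
print(is_monotonic(amdahl))

# A superlinear speedup makes the work shrink, violating the assumption.
print(is_monotonic([8.0, 3.0, 1.0]))
```

    The first profile passes because its time falls while its work grows by the sequential share for each added processor. The second fails because the work drops from 8 to 6 to 3.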
