4 research outputs found

    Adaptive Resource and Job Management for Limited Power Consumption

    The last decades have been characterized by an ever growing requirement in terms of computing and storage resources. This tendency has recently put the pressure on the ability to efficiently manage the power required to operate the huge amount of electrical components associated with state-of-the-art high performance computing systems. The power consumption of a supercomputer needs to be adjusted based on varying power budget or electricity availabilities. As a consequence, Resource and Job Management Systems have to be adequately adapted in order to efficiently schedule jobs with optimized performance while limiting power usage whenever needed. We introduce in this paper a new scheduling strategy that can adapt the executed workload to a limited power budget. The originality of this approach relies upon a combination of speed scaling and node shutdown techniques for power reductions. It is implemented into the widely used resource and job management system SLURM. Finally, it is validated through large scale emulations using real production workload traces of the supercomputer Curie.

    Uncovering the effect of system performance on user behavior from traces of parallel systems

    Intuitively, it seems that understanding how the performance of a system affects its users requires research in psychology and the conducting of live experiments. We demonstrate that it is possible to uncover the effect from traces of the system. In particular, we found that the behavior of users of parallel systems is correlated with the response time of their jobs, not the slowdown as was previously assumed. We show that response times affect the decision of users to continue or abort their interactive session with the system, and that this may relate to expectations the users develop. Although this research was conducted in the context of parallel systems, we believe our results are more general and may pertain to other types of systems as well.
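
The abstract above contrasts two job metrics, response time and slowdown. As a minimal sketch of how they differ, the snippet below uses the common definitions (response time = wait time + runtime; slowdown = response time / runtime). The `Job` class, its field names, and the sample numbers are hypothetical and are not taken from the paper or its traces.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical trace record; all times are in seconds."""
    submit: float  # when the user submitted the job
    start: float   # when the job began running
    end: float     # when the job finished

    @property
    def runtime(self) -> float:
        return self.end - self.start

    @property
    def response_time(self) -> float:
        # Wait time plus runtime: what the user actually experiences.
        return self.end - self.submit

    @property
    def slowdown(self) -> float:
        # Response time normalized by runtime.
        return self.response_time / self.runtime


# Illustrative values only: a short job that waited a while,
# and a long job that waited much longer in absolute terms.
short = Job(submit=0, start=90, end=100)
long_job = Job(submit=0, start=600, end=6600)

for name, job in (("short", short), ("long", long_job)):
    print(f"{name}: response={job.response_time:.0f}s slowdown={job.slowdown:.2f}")
```

With these made-up values the two metrics rank the jobs in opposite order: the short job has the smaller response time but the larger slowdown, which is why it matters which of the two user behavior is correlated with.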

    ADEPT Runtime/Scalability Predictor in support of Adaptive Scheduling

    A job scheduler determines the order and duration of the allocation of resources, e.g. CPU, to the tasks waiting to run on a computer. Round-Robin and First-Come-First-Serve are examples of algorithms for making such resource allocation decisions. Parallel job schedulers make resource allocation decisions for applications that need multiple CPU cores, on computers consisting of many CPU cores connected by different interconnects. An adaptive parallel scheduler is a parallel scheduler that is capable of adjusting its resource allocation decisions based on the current resource usage and demand. Adaptive parallel schedulers that decide the numbers of CPU cores to allocate to a parallel job provide more flexibility and potentially improve performance significantly for both local and grid job scheduling compared to non-adaptive schedulers. A major reason why adaptive schedulers are not yet used in practice is the lack of knowledge of the scalability curves of the applications, together with the high cost of existing white-box approaches for scalability prediction. We show that a runtime and scalability prediction tool can be developed with 3 requirements: accuracy comparable to white-box methods, applicability, and robustness. Applicability depends only on knowledge feasible to gain in a production environment. Robustness addresses anomalous behaviour and unreliable predictions. We present ADEPT, a speedup and runtime prediction tool that satisfies all criteria for both single problem size and across different problem sizes of a parallel application. ADEPT is also capable of handling anomalies and judging reliability of its predictions. We demonstrate these using experiments with MPI and OpenMP implementations of NAS benchmarks and seven real applications.

    Workload Modeling for Computer Systems Performance Evaluation
