
    Metascheduling of HPC Jobs in Day-Ahead Electricity Markets

    High performance grid computing is a key enabler of large scale collaborative computational science. With the promise of exascale computing, high performance grid systems are expected to incur electricity bills that grow super-linearly over time. To achieve cost effectiveness in these systems, scheduling algorithms must exploit the electricity price variations, both in space and time, that are prevalent in dynamic electricity markets. In this paper, we present a metascheduling algorithm that optimizes the placement of jobs in a compute grid which consumes electricity from the day-ahead wholesale market. We formulate the scheduling problem as a Minimum Cost Maximum Flow problem and leverage queue waiting time and electricity price predictions to accurately estimate the cost of job execution at a system. Using trace-based simulation with real and synthetic workload traces and real electricity price data sets, we demonstrate our approach on two currently operational grids, XSEDE and NorduGrid. Our experimental setup collectively constitutes more than 433K processors spread across 58 compute systems in 17 geographically distributed locations. Experiments show that our approach simultaneously optimizes the total electricity cost and the average response time of the grid, without being unfair to users of the local batch systems. Comment: Appears in IEEE Transactions on Parallel and Distributed Systems.
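
    The Minimum Cost Maximum Flow formulation mentioned above can be illustrated with a small sketch. The following Python fragment (using the networkx library) models jobs as unit sources, compute systems as capacity-limited intermediaries, and a common sink; edge weights stand in for the estimated electricity cost of running a job at a site. The job list, slot counts, and price figures are invented for illustration and do not come from the paper.

    import networkx as nx

    jobs = {"j1": 2.0, "j2": 5.0, "j3": 1.5}      # job -> estimated runtime (hours), assumed
    systems = {"siteA": (2, 0.04),                # system -> (free slots, price $/CPU-hour), assumed
               "siteB": (1, 0.09)}

    G = nx.DiGraph()
    for j in jobs:
        G.add_node(j, demand=-1)                  # each job supplies one unit of flow
    G.add_node("sink", demand=len(jobs))          # every job must be placed somewhere

    for j, runtime in jobs.items():
        for s, (slots, price) in systems.items():
            # Edge cost approximates the electricity cost of running job j at site s,
            # scaled to integer cents for the flow solver.
            G.add_edge(j, s, capacity=1, weight=int(runtime * price * 100))
    for s, (slots, _) in systems.items():
        G.add_edge(s, "sink", capacity=slots, weight=0)

    flow = nx.min_cost_flow(G)                    # node -> {successor: flow units}
    placement = {j: s for j in jobs for s, f in flow[j].items() if f == 1}
    print(placement)                              # e.g. {'j1': 'siteA', 'j2': 'siteA', 'j3': 'siteB'}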

    The effect of real workloads and stochastic workloads on the performance of allocation and scheduling algorithms in 2D mesh multicomputers

    The performance of existing non-contiguous processor allocation strategies has traditionally been evaluated by means of simulation, using a stochastic workload model to generate a stream of incoming jobs. To validate these results, the algorithms' performance also needs to be evaluated against a real workload trace. In this paper, we evaluate the performance of several well-known processor allocation and job scheduling strategies using a real workload trace and compare the results against those obtained with a stochastic workload. Our results reveal that the conclusions reached on the relative performance merits of the allocation strategies when a real workload trace is used are, in general, compatible with those obtained when a stochastic workload is used.
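
    For illustration, the two workload sources compared above can be sketched as follows: a stochastic model that draws inter-arrival times and runtimes from distributions, and a replay of a real trace read from a log file. The distribution parameters and the trace field layout (submit time, runtime, processor count per record) are assumptions made for this sketch, not the paper's actual configuration.

    import random

    def stochastic_workload(n_jobs, arrival_rate=0.01):
        """Generate (submit_time, runtime, processors) tuples from simple distributions."""
        t = 0.0
        for _ in range(n_jobs):
            t += random.expovariate(arrival_rate)              # Poisson arrivals
            runtime = random.lognormvariate(7.0, 1.5)          # heavy-tailed runtimes (seconds)
            procs = random.choice([1, 2, 4, 8, 16, 32, 64])    # power-of-two requests
            yield (t, runtime, procs)

    def trace_workload(path):
        """Replay a real trace: one whitespace-separated job record per line."""
        with open(path) as f:
            for line in f:
                if not line.strip() or line.startswith(";"):   # skip comments and blanks
                    continue
                fields = line.split()
                # Assumed field positions: submit time, runtime, allocated processors.
                yield (float(fields[1]), float(fields[3]), int(fields[4]))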

    Towards ServMark, an Architecture for Testing Grid Services

    Technical University of Delft - Technical Report ServMark-2006-002, July 2006. Grid computing provides a natural way to aggregate resources from different administrative domains for building large scale distributed environments. The Web Services paradigm proposes a way by which virtual services can be seamlessly integrated into global-scale solutions to complex problems. While the usage of Grid technology ranges from academia and research to the business world and production, two issues must be addressed: the promised functionality must be accurately quantified, and the performance must be evaluated by well-defined means. Without adequate functionality demonstrators, systems cannot be tuned or adequately configured, and Web services cannot be adequately stressed in production environments. Without performance evaluation systems, the system design and procurement processes are impaired, and the performance of Web Services in production cannot be assessed. In this paper, we present ServMark, a carefully researched tool for Grid performance evaluation. While we acknowledge that a lot of ground must be covered to fulfill the requirements of a system for testing Grid environments and Web (and Grid) Services, we believe that ServMark addresses the minimal set of critical issues.

    Supercomputer Emulation For Evaluating Scheduling Algorithms

    Scheduling algorithms have a significant impact on the optimal utilization of HPC facilities, yet the vast majority of the research in this area is done using simulations. Simulations ignore many factors that affect a real scheduler, such as its scheduling processing time, communication latencies, and the scheduler's intrinsic implementation complexity. As a result, despite theoretical improvements reported in several articles, practically none of the newly proposed algorithms have been implemented in real schedulers, and HPC facilities still use the basic first-come-first-served (FCFS) with backfill scheduling policy. A better approach could therefore be to use real schedulers in an emulation environment to evaluate new algorithms. This thesis investigates two related challenges in emulation: computational cost and faithfulness of the results to real scheduling environments. It finds that the sampling, shrinking, and shuffling of a trace must be done carefully to keep the classical metrics invariant, or varying linearly, with respect to the size and times of the original workload. This is accomplished by careful control of the submission period and by accounting for drifts in the submission period and trace duration. This methodology can help researchers better evaluate their scheduling algorithms and help HPC administrators optimize the parameters of production schedulers. To assess the proposed methodology, we evaluated both the FCFS-with-backfill and suspend/resume scheduling algorithms. The results strongly suggest that suspend/resume leads to better utilization of a supercomputer when high priorities are given to big jobs.
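
    As a rough illustration of the trace shrinking discussed above, the sketch below samples a fraction of a workload's jobs and compresses their submission times by the same fraction, so that the arrival intensity, and hence the offered load, stays roughly unchanged. The tuple layout and the proportional rescaling rule are assumptions for this sketch rather than the thesis's exact procedure.

    def shrink_trace(jobs, keep_fraction):
        """jobs: list of (submit_time, runtime, processors) sorted by submit time."""
        step = max(1, round(1 / keep_fraction))
        sampled = jobs[::step]                      # systematic sampling of every step-th job
        if not sampled:
            return []
        t0 = sampled[0][0]
        # Compress the submission period by the same fraction as the job count,
        # so the arrival intensity (jobs per unit time) stays roughly unchanged.
        return [((s - t0) * keep_fraction, r, p) for (s, r, p) in sampled]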

    Fuzzy C-Mean And Genetic Algorithms Based Scheduling For Independent Jobs In Computational Grid

    Grid computing is becoming the most important research area in high performance computing. In this setting, job scheduling faces complicated problems: discovering the diversity of available resources, selecting the appropriate applications, and mapping them to suitable resources. The central problem is optimal job scheduling, in which Grid nodes must allocate the appropriate resources for each job. In this paper, we combine two popular algorithms, Fuzzy C-Means and Genetic Algorithms, for scheduling in the Grid. Our model classifies jobs based mainly on the Fuzzy C-Means algorithm and maps the jobs to appropriate resources based mainly on a Genetic Algorithm. In our experiments, we fed historical workload information into our simulator and obtained better results than traditional scheduling policies. Finally, the paper also discusses the job classification approach and the optimization engine in Grid scheduling.
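
    The two-stage idea described above can be sketched in a toy form: a fuzzy C-means-style membership computation to classify jobs by length, followed by a small genetic algorithm that evolves a job-to-resource mapping. The cluster centres, the makespan fitness function, and the GA parameters below are illustrative assumptions, not the paper's exact design.

    import random

    def fcm_memberships(job_lengths, centres, m=2.0):
        """One fuzzy C-means membership computation for one-dimensional job lengths."""
        memberships = []
        for x in job_lengths:
            dists = [abs(x - c) + 1e-9 for c in centres]
            u = [1.0 / sum((d_i / d_j) ** (2 / (m - 1)) for d_j in dists) for d_i in dists]
            memberships.append(u)                   # one membership value per cluster
        return memberships

    def ga_map(job_lengths, speeds, pop=30, gens=200):
        """Evolve a job-to-resource mapping that minimises makespan (assumes >= 2 jobs)."""
        n, r = len(job_lengths), len(speeds)

        def makespan(assign):
            load = [0.0] * r
            for j, res in enumerate(assign):
                load[res] += job_lengths[j] / speeds[res]
            return max(load)

        population = [[random.randrange(r) for _ in range(n)] for _ in range(pop)]
        for _ in range(gens):
            population.sort(key=makespan)
            parents = population[:pop // 2]         # keep the fitter half
            children = []
            while len(parents) + len(children) < pop:
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, n)
                child = a[:cut] + b[cut:]           # one-point crossover
                if random.random() < 0.1:           # occasional mutation
                    child[random.randrange(n)] = random.randrange(r)
                children.append(child)
            population = parents + children
        return min(population, key=makespan)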

    Extending Scojo-PECT by migration based on system-level checkpointing

    In recent years, a significant amount of research has been done on job scheduling in the high performance computing area. Parallel jobs have different running times and require different numbers of processors, so jobs need to be scheduled and packed to improve system utilization. Scojo-PECT is a job scheduler which provides service guarantees by using coarse-grain time sharing; however, it does not provide process migration. We extend Scojo-PECT by migrating parallel jobs based on system-level checkpointing. We investigate different cases in the Scojo-PECT scheduling algorithm where migration based on system-level checkpointing can be used to improve resource utilization and reduce job response time. Our experimental results show a reduction in the relative response times of medium jobs compared with the original Scojo-PECT scheduler, while long jobs do not suffer any disadvantage.
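
    A minimal sketch of the kind of migration decision described above: when a preempted job could resume sooner on another partition, checkpoint it and restore it there, provided the checkpoint overhead is small relative to the job's remaining work. The data structures and the overhead threshold are illustrative assumptions; the actual system-level checkpointing mechanism is abstracted away.

    def try_migrate(job, partitions, checkpoint_cost):
        """Return the partition a preempted job should resume on, or None to wait.

        job: dict with 'procs' (processors needed) and 'remaining' (seconds of work left).
        partitions: dict mapping partition name -> free processor count.
        checkpoint_cost: estimated seconds to checkpoint, transfer and restore the job.
        """
        # Migration only pays off if the checkpoint/restore overhead is small
        # relative to the work the job still has to do (the 10% threshold is assumed).
        if checkpoint_cost >= 0.1 * job["remaining"]:
            return None
        candidates = [p for p, free in partitions.items() if free >= job["procs"]]
        # Pick the tightest fit so large free holes remain available for bigger jobs.
        return min(candidates, key=lambda p: partitions[p], default=None)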

    The Impact of Data Replication on Job Scheduling Performance in Hierarchical Data Grid

    In data-intensive applications, data transfer is a primary cause of job execution delay. Data access time depends on bandwidth, and the major bottleneck to supporting fast data access in Grids is the high latency of Wide Area Networks and the Internet. Effective scheduling can reduce the amount of data transferred across the Internet by dispatching a job to where the needed data are present. Another solution is to use a data replication mechanism. The objective of dynamic replication strategies is to reduce file access time, which in turn reduces job runtime. In this paper, we develop a job scheduling policy and a dynamic data replication strategy, called HRS (Hierarchical Replication Strategy), to improve data access efficiency. We study our approach and evaluate it through simulation. The results show that our algorithm improves by 12% over current strategies. Comment: 11 pages, 7 figures.
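
    The data-aware dispatch idea above can be sketched as follows: run the job at a site that already holds its input file, and otherwise run it at the least loaded site and create a replica there, pulled from the top tier of the hierarchy. The site layout and the least-loaded tie-breaking are assumptions for this sketch, not the HRS algorithm verbatim.

    def dispatch(job_file, sites, root="root_storage"):
        """sites: dict name -> {'replicas': set of file names, 'load': queued jobs}."""
        holders = [s for s in sites if job_file in sites[s]["replicas"]]
        if holders:
            # The data is already local somewhere: run on the least loaded holder,
            # so no wide-area transfer is needed at all.
            return min(holders, key=lambda s: sites[s]["load"]), None
        # No site holds the file: run on the least loaded site and create a new
        # replica there, pulled from the top storage tier of the hierarchy.
        target = min(sites, key=lambda s: sites[s]["load"])
        sites[target]["replicas"].add(job_file)     # dynamic replication
        return target, root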

    Evaluating SLURM simulator with real-machine SLURM and vice versa

    Having a precise and fast job scheduler model that resembles the behavior of real-machine job scheduling software is extremely important in the field of job scheduling. The idea behind the SLURM simulator is to preserve the original code of the core SLURM functions while allowing for all the advantages of a simulator. Since 2011, the SLURM simulator has passed through several iterations of improvements in different research centers. In this work, we present our latest improvements of the SLURM simulator and perform the first-ever validation of the simulator against the real machine. In particular, we improved the simulator's performance by about 2.6 times, made the simulator deterministic across several runs with the same set-up, and improved its accuracy: its deviation from the real machine is lowered from the previous 12% to at most 1.7%. Finally, we illustrate with several use cases the value of the simulator for job scheduling researchers, SLURM system administrators, and SLURM developers.
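
    The validation step described above boils down to lining up the same jobs from a real-machine run and a simulator run and measuring how far the simulated wait times deviate from the real ones. The sketch below assumes per-job wait times keyed by job id; the input format is an assumption, not the simulator's actual output.

    def relative_deviation(real_waits, sim_waits):
        """Mean relative deviation of simulated wait times from real ones, as a fraction."""
        common = set(real_waits) & set(sim_waits)
        devs = [abs(sim_waits[j] - real_waits[j]) / max(real_waits[j], 1.0) for j in common]
        return sum(devs) / len(devs) if devs else 0.0

    # e.g. relative_deviation({"1": 120, "2": 3600}, {"1": 118, "2": 3650}) ~= 0.015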