131 research outputs found

    Grid-job scheduling with reservations and preemption

    Get PDF
    Computational grids make it possible to exploit grid resources across multiple clusters when grid jobs are deconstructed into tasks and allocated across clusters. Grid-job tasks are often scheduled in the form of workflows which require synchronization, and advance reservation makes it easy to guarantee predictable resource provisioning for these jobs. However, advance reservation for grid jobs creates roadblocks and fragmentation which adversely affects the system utilization and response times for local jobs. We provide a solution which incorporates relaxed reservations and uses a modified version of the standard grid-scheduling algorithm, HEFT, to obtain flexibility in placing reservations for workflow grid jobs. Furthermore, we deploy the relaxed reservation with modified HEFT as an extension of the preemption based job scheduling framework, SCOJO-PECT job scheduler. In SCOJO-PECT, relaxed reservations serve the additional purpose of permitting scheduler optimizations which shift the overall schedule forward. Furthermore, a propagation heuristics algorithm is used to alleviate the workflow job makespan extension caused by the slack of relaxed reservation. Our solution aims at decreasing the fragmentation caused by grid jobs, so that local jobs and system utilization are not compromised, and at the same time grid jobs also have reasonable response times

    Quality of service based data-aware scheduling

    Get PDF
    Distributed supercomputers have been widely used for solving complex computational problems and modeling complex phenomena such as black holes, the environment, supply-chain economics, etc. In this work we analyze the use of these distributed supercomputers for time sensitive data-driven applications. We present the scheduling challenges involved in running deadline sensitive applications on shared distributed supercomputers running large parallel jobs and introduce a ``data-aware\u27\u27 scheduling paradigm that overcomes these challenges by making use of Quality of Service classes for running applications on shared resources. We evaluate the new data-aware scheduling paradigm using an event-driven hurricane simulation framework which attempts to run various simulations modeling storm surge, wave height, etc. in a timely fashion to be used by first responders and emergency officials. We further generalize the work and demonstrate with examples how data-aware computing can be used in other applications with similar requirements

    Resource allocation model for grid computing environment

    Get PDF
    Grid computing is a collection of heterogeneous resources that is highly dynamic and unpredictable. It is typically used for solving scientific or technical problems that require a large number of computer processing cycles or access to substantial amounts of data. Various resource allocation strategies have been used to make resource use more productive, with subsequent distributed environmental performance increases. The user sends a job by providing a predetermined time limit for running that job. Then, the scheduler gives priority to work according to the request and scheduling policy and places it in the waiting queue. When the resource is released, the scheduler selects the job from the waiting queue with a specific algorithm. Requests will be rejected if the required resources are not available. The user can re-submit a new request by modifying the parameter until available resources can be found. Eventually, there is a decrease in idle resources between work and resource utilization, and the waiting time will increase. An effective scheduling policy is required to improve resource use and reduce waiting times. In this paper, the FCFS-LRH method is proposed, where jobs received will be sorted by arrival time, execution time, and the number of resources needed. After the sorting process, the work will be placed in a logical view, and the job will be sent to the actual resource when it executes. The experimental results show that the proposed model can increase resource utilization by 1.34% and reduce waiting time by 20.47% when compared to existing approaches. This finding could be beneficially implemented in cloud systems resource allocation management

    Backfilling with fairness and slack for parallel job scheduling

    Get PDF
    Parallel jobs have different runtimes and numbers of threads/processes. Thus, scheduling parallel jobs involves a packing problem. If jobs are packed as tightly as possible, utilization will be improved. Otherwise, some resources have to stay idle. The common solution to deal with idle resources is backfilling, which schedule smaller jobs submitted later to execute earlier as long as they do not postpone the first job or all the previous jobs in the waiting queue. Traditionally, backfilling uses first fit for idle resources, according to the submission order. However, in this case, better packing of jobs could be missed. Hence, we propose an algorithm which looks further ahead if significantly improving utilization. However at the same time, this could be unfair to some jobs ahead in the queue. So we use a delay factor as a constraint to limit unfairness. We propose a branch and bound algorithm which selects jobs for backfilling which keep utilization high, while trying to stay close to First-Come-First-Served (FCFS). We evaluate relative response time and utilization and compare to other backfilling approaches. The selection of jobs for backfilling to optimize for high utilization and low delay is implemented as an extension of the existing Scojo-PECT preemptive scheduler

    Optimum Allocation of Distributed Service Workflows with Probabilistic Real-Time Guarantees

    Get PDF
    This paper addresses the problem of optimum allocation of distributed real-time workflows with probabilistic service guarantees over a set of physical resources. The discussion focuses on how such a problem may be mathematically formalized, in terms of both constraints and objective function to be optimized, which also accounts for possible business rules for regulating the deployment of the workflows. The presented formal problem constitutes a probabilistic admission control test that may be run by a provider in order to decide whether or not it is worth to admit new workflows into the system and to decide what the optimum allocation of the workflow to the available resources is. Various options are presented, which may be plugged into the formal problem description, depending on the specific needs of individual workflows. The presented problem has been implemented using GAMS and has been tested under various solvers. An illustrative numerical example and an analysis of the results of the implemented model under realistic settings are presented

    Development of Data-Driven Dispatching Heuristics for Heterogeneous HPC Systems

    Get PDF
    Nell’ambito dei sistemi High-Performance Computing, l'uso di euristiche di dispatching efficaci, per lo scheduling e l'allocazione dei jobs in arrivo, è fondamentale al fine di ottenere buoni livelli di Quality of Service. In questo elaborato ci concentreremo sul design e l’analisi di euristiche di allocazione delle risorse, che saranno progettate per sistemi HPC eterogenei, nei quali i nodi possono essere equipaggiati con diverse tipologie di unità di elaborazione. Impiegheremo poi euristiche data-driven per la predizione della durata dei jobs, e valuteremo il tutto dal punto di vista del throughput di sistema. Considereremo in particolare Eurora, un sistema HPC eterogeneo realizzato da CINECA, oltre che un workload catturato dal relativo log di sistema, contenente jobs reali inviati dagli utenti. Tutto ciò è stato possibile grazie ad AccaSim, un simulatore di sistemi HPC sviluppato nel Dipartimento di Informatica - Scienza e Ingegneria (DISI) dell’Università di Bologna, ed al quale si è contribuito in modo sostanziale. Quest’elaborato mostra che l’impatto di diverse euristiche di allocazione sul throughput di un sistema HPC eterogeneo non è trascurabile, con variazioni in grado di raggiungere picchi di un ordine di grandezza, e più pronunciate considerando brevi intervalli temporali, dell'ordine dei mesi. Abbiamo inoltre osservato che l’impiego di euristiche per la predizione della durata dei jobs è di grande beneficio al throughput su tutte le euristiche di allocazione, e specialmente su quelle che integrano in maniera più profonda tali elementi data-driven. Infine, l’analisi effettuata ha permesso di caratterizzare integralmente il sistema Eurora ed il relativo workload, permettendoci di comprendere al meglio gli effetti su di esso dei diversi metodi di dispatching, nonché di estendere le nostre considerazioni anche ad altre classi di sistemi

    Simulation techniques in an artificial society model

    Get PDF
    Artificial society refers to a generic class of agent-based simulation models used to discover global social structures and collective behavior produced by simple local rules and interaction mechanisms. Artificial society models are applicable in a variety of disciplines, including the modeling of chemical and biological processes, natural phenomena, and complex adaptive systems. We focus on the underlying simulation techniques used in artificial society discrete-event simulation models, including model time evolution and computational performance.;Although for some applications synchronous time evolution is the correct modeling approach, many other applications are better represented using asynchronous time evolution. We claim that asynchronous time evolution can eliminate potential simulation artifacts produced using synchronous time evolution. Using an adaptation of a popular artificial society model, we show that very different output can result based solely on the choice of asynchronous or synchronous time evolution. Based on the event list implementation chosen, the use of discrete-event simulation to incorporate asynchronous time evolution can incur a substantial loss in computational performance. Accordingly, we evaluate select event list implementations within the artificial society simulation model and demonstrate that acceptable performance can be achieved.;In addition to the artificial society model, we show that transforming from a synchronous to an asynchronous system proves beneficial for scheduling resources in a parallel system. We focus on non-FCFS job scheduling policies that permit jobs to backfill, i.e., to move ahead in the queue, given that they do not delay certain previously submitted jobs. Instead of using a single queue of jobs, we propose a simple yet effective backfilling scheduling policy that effectively separates short from long jobs by incorporating multiple queues. By monitoring system performance, our policy adapts its configuration parameters in response to severe changes in the job arrival pattern and/or resource demands. Detailed performance comparisons via simulation using actual parallel workload traces indicate that our proposed policy consistently outperforms traditional backfilling in a variety of contexts
    • …
    corecore