
    Resource Allocation on SMP Clusters

    This paper studies the resource allocation of parallel jobs on SMP clusters. Previous parallel job scheduling algorithms, such as the backfilling algorithm EASY (Extensible Argonne Scheduling sYstem), focus only on the allocation of CPUs. As communication cost becomes a bottleneck and gradually dominates the performance of program executions on multicomputers, some processor allocation policies require that the processors allocated to a job be as contiguous as possible. However, some of these policies lead to an external fragmentation problem: enough processors become available for a requested job, but they are dispersed throughout the system, so the job must wait until enough contiguous processors are free. In short, allocating adjacent processors may improve the run time of jobs, but it increases their waiting time. In this paper, we suggest that an effective processor allocation policy should balance communication cost against waiting delay. The principle is simple. When the network is busy or the communication cost is high, processors should be allocated as contiguously as possible. On the other hand, when the communication cost is light, processors should be allocated as soon as possible to shorten the waiting time.
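    The balancing principle above can be illustrated with a minimal sketch. It assumes a simplified model that the abstract does not specify: processors form a linear array with a free/busy flag each, "adjacent" means consecutive indices, and network pressure is summarised by a hypothetical normalised `comm_load` compared against a hypothetical `threshold`. It is not the paper's actual policy, only an instance of "contiguous when communication is heavy, first-available when it is light".

```python
from typing import List, Optional


def find_contiguous(free: List[bool], n: int) -> Optional[List[int]]:
    """Return the first run of n adjacent free processors, or None if none exists."""
    if n <= 0:
        raise ValueError("n must be positive")
    run_start, run_len = 0, 0
    for i, is_free in enumerate(free):
        if is_free:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == n:
                return list(range(run_start, run_start + n))
        else:
            run_len = 0  # a busy processor breaks the current run
    return None


def find_any(free: List[bool], n: int) -> Optional[List[int]]:
    """Return the first n free processors regardless of adjacency, or None."""
    if n <= 0:
        raise ValueError("n must be positive")
    ids = [i for i, is_free in enumerate(free) if is_free]
    return ids[:n] if len(ids) >= n else None


def allocate(free: List[bool], n: int, comm_load: float,
             threshold: float = 0.5) -> Optional[List[int]]:
    """Hypothetical policy: insist on contiguity only when communication is heavy.

    Returning None means the job keeps waiting in the queue.
    """
    if comm_load >= threshold:
        # Heavy traffic: accept extra waiting to get a compact allocation.
        return find_contiguous(free, n)
    # Light traffic: start as soon as enough processors are free anywhere.
    return find_any(free, n)
```

    With `free = [True, False, True, True, False, True]`, a 3-processor job waits under heavy load (no run of three free neighbours exists) but starts immediately on processors 0, 2 and 3 under light load; that waiting is the external fragmentation the abstract describes.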

    Autonomous grid scheduling using probabilistic job runtime forecasting

    Computational Grids are evolving into a global, service-oriented architecture: a universal platform for delivering future computational services to a range of applications of varying complexity and resource requirements. The thesis focuses on developing a new scheduling model for general-purpose utility clusters based on the concept of user-requested job completion deadlines. In such a system, a user would be able to request that each job finish by a certain deadline, and possibly at a certain monetary cost. Implementing deadline scheduling depends on the ability to predict the execution time of each queued job and on an adaptive scheduling algorithm able to use those predictions to maximise deadline adherence. The thesis proposes novel solutions to these two problems and documents their implementation in a largely autonomous and self-managing way. The starting point of the work is an extensive analysis of a representative Grid workload, revealing consistent workflow patterns, usage cycles, and correlations between the execution times of jobs and the job properties commonly collected by the Grid middleware for accounting purposes. An automated approach is proposed to identify these dependencies and use them to partition the highly variable workload into subsets of more consistent and predictable behaviour. A range of time-series forecasting models, applied in this context for the first time, were used to model job execution times as a function of their historical behaviour and associated properties. Based on the resulting runtime predictions, a novel scheduling algorithm estimates the latest start time at which each job can still meet its requested deadline and sorts the queue accordingly to minimise the amount of deadline overrun. The proposed approach was tested using an actual job trace collected from a production Grid facility.
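    The latest-start-time ordering described above can be sketched as follows, under stated assumptions: each job's latest start time is taken as its deadline minus its predicted runtime, the queue is sorted by that value in ascending order, and overrun is measured by running jobs back-to-back on a single resource. The single-resource model and the names `Job`, `order_queue` and `total_overrun` are hypothetical simplifications, not the thesis's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    deadline: float           # requested completion time, on the same clock as "now"
    predicted_runtime: float  # output of some runtime predictor (assumed given)

    @property
    def latest_start(self) -> float:
        # Latest start that still meets the deadline if the prediction is exact.
        return self.deadline - self.predicted_runtime


def order_queue(queue: List[Job]) -> List[Job]:
    """Sort jobs so that the one with the earliest latest-start time runs first."""
    return sorted(queue, key=lambda job: job.latest_start)


def total_overrun(schedule: List[Job], now: float = 0.0) -> float:
    """Sum of deadline overruns when jobs run back-to-back on one resource."""
    t, overrun = now, 0.0
    for job in schedule:
        t += job.predicted_runtime
        overrun += max(0.0, t - job.deadline)
    return overrun


queue = [Job("a", 10, 4), Job("b", 5, 3), Job("c", 8, 1)]
fifo = total_overrun(queue)                 # submission order a, b, c
sorted_ = total_overrun(order_queue(queue))  # latest-start order b, a, c
```

    In this toy queue, submission order makes job `b` finish 2 time units late, while latest-start ordering (b, a, c) meets every deadline. This is a heuristic: on other inputs it does not necessarily minimise total overrun.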
    The best-performing execution time predictor (the auto-regressive moving average method), coupled with workload partitioning based on three simultaneous job properties, achieved a median absolute percentage error centroid of only 4.75%. This level of prediction accuracy enabled the proposed deadline scheduling method to reduce the average deadline overrun time ten-fold compared with the benchmark batch scheduler. Overall, the thesis demonstrates that deadline scheduling of computational jobs on the Grid is achievable using statistical forecasting of job execution times based on historical information. The proposed approach is easily implementable, substantially self-managing, and better matched to the human workflow, making it well suited for implementation in the utility Grids of the future.
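    For reference, a plain median absolute percentage error (MdAPE) can be computed as below. This is a generic definition, not the thesis's exact metric: the "centroid" aggregation across workload partitions is not described in the abstract and is not reproduced here. The function name `mdape` is hypothetical.

```python
from statistics import median
from typing import Sequence


def mdape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Median of |actual - predicted| / |actual|, expressed as a percentage."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("percentage error is undefined for zero actual values")
    return median(100 * abs(a - p) / abs(a) for a, p in zip(actual, predicted))
```

    For example, runtimes of 100, 200 and 50 seconds predicted as 110, 190 and 50 give per-job errors of 10%, 5% and 0%, so the MdAPE is 5%. Using the median rather than the mean keeps a few badly mispredicted jobs from dominating the figure.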