Benchmarks and Standards for the Evaluation of Parallel Job Schedulers
The evaluation of parallel job schedulers hinges on the workloads used. It is suggested that these be standardized, in terms of both format and content, so as to ease the evaluation and comparison of different systems. The question remains whether this can encompass both traditional parallel systems and metacomputing systems. This paper is based on a panel on this subject that was held at the workshop and the ensuing discussion; its authors are the panel members as well as participants from the audience. Naturally, not all of us agree with all the opinions expressed here.
Coarse-grain time sharing with advantageous overhead minimization for parallel job scheduling
Parallel job scheduling on cluster computers employs several strategies to maximize both the utilization of the hardware and the throughput at which jobs are processed. Another consideration is response time, or how quickly a job finishes after submission. One possible means of achieving these goals is preemption. Preemptive scheduling techniques incur an overhead cost, typically associated with swapping jobs in and out of memory; as memory and data sets increase in size, this overhead grows with them. This paper presents a technique for reducing the overhead incurred by swapping jobs in and out of memory as a result of preemption, developed in the context of the Scojo-PECT preemptive scheduler. Additionally, a design for extending the existing Cluster Simulator to support analysis of scheduling overhead in preemptive scheduling techniques is presented. A reduction in the overhead incurred through preemptive scheduling, achieved by applying standard fitting algorithms in a multi-state job allocation heuristic, is shown.
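The overhead-avoidance idea can be made concrete with a minimal sketch. Everything below is hypothetical (invented node size, swap cost, and job names, not the Scojo-PECT code): a node swaps out resident jobs only when the incoming job does not fit in free memory, so small jobs time-share without paying the swap cost.

```python
# Hypothetical sketch: coarse-grain time sharing where preempted jobs are
# swapped out only when the incoming job does not fit in free memory.
# Not the Scojo-PECT scheduler; it only illustrates overhead avoidance.

NODE_MEM = 100          # assumed per-node memory units
SWAP_COST = 5.0         # assumed fixed cost per swap-out

def preempt(resident, name, mem):
    """Evict resident jobs (largest first) only until the incoming job
    fits; each eviction incurs SWAP_COST. Returns (resident, overhead)."""
    used = sum(resident.values())
    overhead = 0.0
    for victim in sorted(resident, key=resident.get, reverse=True):
        if NODE_MEM - used >= mem:
            break
        used -= resident.pop(victim)   # swap the victim out
        overhead += SWAP_COST
    resident[name] = mem               # incoming job becomes resident
    return resident, overhead

resident = {'a': 30, 'b': 20}                    # jobs currently in memory
_, cost_small = preempt(dict(resident), 'c', 40)  # fits alongside: no swap
_, cost_big = preempt(dict(resident), 'd', 90)    # must evict both: 2 swaps
print(cost_small, cost_big)  # 0.0 10.0
```

A memory-resident job that is preempted but never evicted resumes with zero swap cost, which is exactly the room for time slicing the technique exploits.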
Multi-attribute demand characterization and layered service pricing
As cloud computing gains popularity, understanding the pattern and structure of its workload is increasingly important in order to drive effective resource allocation and pricing decisions. In the cloud model, virtual machines (VMs), each consisting of a bundle of computing resources, are presented to users for purchase. Thus, the cloud context requires multi-attribute models of demand. While most of the available studies have focused on one specific attribute of a virtual request such as CPU or memory, to the best of our knowledge there is no work on the joint distribution of resource usage. In the first part of this dissertation, we develop a joint distribution model that captures the relationship among multiple resources by fitting the marginal distribution of each resource type as well as the non-linear structure of their correlation via a copula distribution. We validate our models using a public data set of Google data center usage.
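The modelling approach sketched above, fitting each marginal and coupling them through a copula, can be illustrated in miniature. Everything below is an assumption for illustration: the exponential marginals, their means, and the correlation are invented stand-ins, not the values fitted to the Google trace.

```python
# Illustrative Gaussian-copula sketch of joint (CPU, memory) demand.
# Marginals and parameters are invented; only the technique is real.
import math
import random

random.seed(0)

def phi(z):
    """Standard normal CDF, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sample_joint(n, rho, cpu_mean, mem_mean):
    """Draw (cpu, mem) pairs with exponential marginals and a Gaussian
    copula with correlation rho capturing their dependence."""
    out = []
    for _ in range(n):
        z1 = random.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho * rho) * random.gauss(0, 1)
        u1, u2 = phi(z1), phi(z2)            # correlated uniforms: the copula
        cpu = -cpu_mean * math.log(1 - u1)   # exponential inverse CDF
        mem = -mem_mean * math.log(1 - u2)
        out.append((cpu, mem))
    return out

pairs = sample_joint(20000, 0.7, 2.0, 4.0)
n = len(pairs)
mc = sum(c for c, _ in pairs) / n
mm = sum(m for _, m in pairs) / n
cov = sum((c - mc) * (m - mm) for c, m in pairs) / n
print(round(mc, 1), round(mm, 1), cov > 0)
```

The point of the copula split is visible here: the two marginal means come out as specified, while the positive covariance is controlled entirely by the copula's rho, independently of the marginal shapes.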
Constructing the demand model is essential for provisioning a revenue-optimal configuration for VMs or the quality of service (QoS) offered by a provider. In the second part of the dissertation, we turn to the service pricing problem in a multi-provider setting: given service configurations (qualities) offered by different providers, choose a proper price for each offered service so as to undercut competitors and attract customers. With the rise of layered service-oriented architectures, there is a need for more advanced solutions that manage the interactions among service providers at multiple levels. Brokers, as the intermediaries between customers and lower-level providers, play a key role in improving the efficiency of service-oriented structures by matching the demands of customers to the services of providers. We analyze a layered market in which service brokers and service providers, offering different QoS, compete in a Bertrand game at different levels of an oligopoly market. We examine the interaction among players and the effect of price competition on their market shares. We also study the market with partial cooperation, where a subset of players optimizes their total revenue instead of each maximizing their own profit independently, and we analyze the impact of this cooperation on the market and on customers' social welfare.
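A toy version of the price-competition setting might look like the following best-response iteration for two QoS-differentiated providers. The demand model (logit) and all parameters are assumptions chosen for illustration, not the dissertation's market model.

```python
# Hypothetical two-provider Bertrand game with QoS-differentiated logit
# demand; qualities, costs, and the price grid are illustrative only.
import math

QUALITY = [1.0, 1.5]    # assumed QoS levels
COST = [0.2, 0.4]       # assumed unit costs (higher QoS costs more)

def share(i, prices):
    """Logit market share; customer utility = quality - price."""
    u = [QUALITY[j] - prices[j] for j in range(2)]
    e = [math.exp(x) for x in u]
    return e[i] / sum(e)

def profit(i, prices):
    return (prices[i] - COST[i]) * share(i, prices)

def best_response(i, prices, grid):
    """Pick the grid price maximizing player i's profit, rivals fixed."""
    return max(grid, key=lambda p: profit(i, prices[:i] + [p] + prices[i + 1:]))

grid = [k / 100 for k in range(1, 301)]   # candidate prices 0.01 .. 3.00
prices = [1.0, 1.0]
for _ in range(50):                       # iterate toward equilibrium
    prices = [best_response(0, prices, grid),
              best_response(1, prices, grid)]
print(prices)
```

Under these assumed parameters both players price above cost, and the higher-quality, higher-cost provider ends up with the higher equilibrium price, which is the kind of interaction between QoS and price competition the analysis examines.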
Memory Usage in the LANL CM-5 Workload
It is generally agreed that memory requirements should be taken into account in the scheduling of parallel jobs. However, so far the work on combined processor and memory scheduling has not been based on detailed information and measurements. To rectify this problem, we present an analysis of memory usage by a production workload on a large parallel machine, the 1024-node CM-5 installed at Los Alamos National Lab. Our main observations are: the distribution of memory requests has strong discrete components, i.e. some sizes are much more popular than others; many jobs use a relatively small fraction of the memory available on each node, so there is some room for time slicing among several memory-resident jobs; and larger jobs (using more nodes) tend to use more memory, but it is difficult to characterize the scaling of per-processor memory usage.
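The first two observations can be made concrete with a small sketch. The request sizes and the per-node memory below are invented, not the CM-5 trace; the sketch only shows how discrete spikes and the time-slicing headroom would be measured.

```python
# Illustrative sketch: flag "strong discrete components" (sizes with an
# outsized share of requests) and the fraction of jobs small enough to
# time-share. Data are made up, not the LANL CM-5 workload.
from collections import Counter

requests_mb = [8] * 50 + [16] * 30 + [32] * 10 + \
              [7, 9, 12, 20, 24, 31, 33, 40, 5, 48]

def popular_sizes(requests, threshold=0.05):
    """Return {size: share} for sizes exceeding the request-share threshold."""
    counts = Counter(requests)
    n = len(requests)
    return {s: c / n for s, c in counts.items() if c / n > threshold}

pop = popular_sizes(requests_mb)
print(sorted(pop))                 # the discrete spikes

node_mem = 128                     # assumed per-node memory (MB)
resident_fit = sum(1 for r in requests_mb if r <= node_mem / 4)
print(resident_fit / len(requests_mb))  # share fitting 4-way time slicing
```

In this synthetic example a handful of sizes dominate the distribution, and nearly all requests fit in a quarter of node memory, mirroring the two observations above in miniature.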
Autonomous grid scheduling using probabilistic job runtime scheduling
Computational Grids are evolving into a global, service-oriented architecture –
a universal platform for delivering future computational services to a range of
applications of varying complexity and resource requirements. The thesis focuses
on developing a new scheduling model for general-purpose, utility clusters
based on the concept of user-requested job completion deadlines. In such a
system, a user would be able to request that each job finish by a certain
deadline, possibly at a certain monetary cost. Implementing deadline scheduling is
dependent on the ability to predict the execution time of each queued job, and
on an adaptive scheduling algorithm able to use those predictions to maximise
deadline adherence. The thesis proposes novel solutions to these two problems
and documents their implementation in a largely autonomous and self-managing
way.
The starting point of the work is an extensive analysis of a representative
Grid workload, revealing consistent workflow patterns, usage cycles and correlations between the execution times of jobs and their properties commonly collected
by the Grid middleware for accounting purposes. An automated approach is
proposed to identify these dependencies and use them to partition the highly
variable workload into subsets of more consistent and predictable behaviour.
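The partitioning idea can be illustrated with a toy example: group synthetic jobs by properties the middleware would record (here "user" and "queue", both invented) and compare runtime variability before and after.

```python
# Hypothetical sketch of workload partitioning: the whole trace has highly
# variable runtimes, while property-based subsets are far more consistent.
# All jobs and properties below are synthetic.
from collections import defaultdict
from statistics import mean, pstdev

jobs = [
    {"user": "u1", "queue": "short", "runtime": 55},
    {"user": "u1", "queue": "short", "runtime": 60},
    {"user": "u1", "queue": "short", "runtime": 65},
    {"user": "u2", "queue": "long",  "runtime": 3400},
    {"user": "u2", "queue": "long",  "runtime": 3600},
    {"user": "u2", "queue": "long",  "runtime": 3800},
]

def cv(xs):
    """Coefficient of variation: relative spread of runtimes."""
    return pstdev(xs) / mean(xs)

whole = cv([j["runtime"] for j in jobs])          # whole workload: very spread

parts = defaultdict(list)
for j in jobs:
    parts[(j["user"], j["queue"])].append(j["runtime"])

per_part = [cv(rs) for rs in parts.values()]      # per subset: tight
print(round(whole, 2), [round(c, 2) for c in per_part])
```

The drop in the coefficient of variation after grouping is what makes the subsets "more consistent and predictable" and therefore amenable to forecasting.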
A range of time-series forecasting models, applied in this context for the first
time, were used to model the job execution times as a function of their historical
behaviour and associated properties. Based on the resulting predictions of job
runtimes a novel scheduling algorithm is able to estimate the latest job start
time necessary to meet the requested deadline and to sort the queue accordingly,
minimising the amount of deadline overrun.
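The scheduling step can be sketched as a simple sort by latest feasible start time (deadline minus predicted runtime). This is a reconstruction of the idea with invented jobs, not the thesis's implementation.

```python
# Sketch of deadline scheduling: a job's latest start time is its deadline
# minus its predicted runtime; sorting the queue by this value runs the
# most urgent jobs first. Jobs and times below are invented.

def order_by_latest_start(queue):
    """queue: list of (name, deadline, predicted_runtime).
    Returns names ordered by latest feasible start time, ascending."""
    return [name for _, name in sorted(
        (deadline - predicted, name) for name, deadline, predicted in queue)]

queue = [
    ("a", 100.0, 20.0),   # latest start 80
    ("b", 50.0, 40.0),    # latest start 10: most urgent
    ("c", 60.0, 10.0),    # latest start 50
]
print(order_by_latest_start(queue))  # ['b', 'c', 'a']
```

The quality of this ordering depends directly on the runtime predictions, which is why prediction accuracy drives the deadline-overrun results reported below.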
The testing of the proposed approach was done using the actual job trace
collected from a production Grid facility. The best performing execution time
predictor (the auto-regressive moving average method) coupled to workload
partitioning based on three simultaneous job properties returned the median
absolute percentage error centroid of only 4.75%. This level of prediction
accuracy enabled the proposed deadline scheduling method to reduce the average deadline overrun time ten-fold compared to the benchmark batch scheduler.
Overall, the thesis demonstrates that deadline scheduling of computational
jobs on the Grid is achievable using statistical forecasting of job execution times
based on historical information. The proposed approach is easily implementable,
substantially self-managing, and better matched to the human workflow, making
it well suited for implementation in the utility Grids of the future.
Design and evaluation of multi-objective online scheduling strategies for parallel machines using computational intelligence
This thesis presents a methodology for automatically generating
online scheduling algorithms for a complex objective defined by a
machine provider. Such complex objective functions are required if
the providers have several simple objectives. For example, the
different relationships to the various users must be incorporated
during the development of appropriate scheduling algorithms. Our
research is focused on online scheduling with independent parallel
jobs, multiple identical machines and a small user community.
First, Evolutionary Algorithms are used, by way of example, to create a
7-dimensional solution space of feasible schedules for a given
workload trace. Within this step, no preferences between different
basic objectives need to be defined. This solution space enables
the resource providers to define a complex evaluation objective
based on their specific preferences. Second, optimized scheduling
strategies are generated by using two different approaches. On the
one hand, an adaptation of a Greedy scheduling algorithm is
applied that uses weights to determine the order of jobs. These job
weights are again extracted from workload traces with the help of
Evolutionary Algorithms. On the other hand, a Fuzzy rule-based
scheduling system is applied. Here, we classify a scheduling
situation, which consists of many parameters such as the time of day, the
day of the week, the waiting queue length, etc. Depending on this
classification, a Fuzzy rule-based system chooses an appropriate
sorting criterion for the waiting job queue and a suitable
scheduling algorithm. Finally, both approaches, the Greedy
scheduling strategy and the Fuzzy rule-based scheduling system,
are compared, again using workload traces. The achieved results
demonstrate the applicability of our approach for generating such
multi-objective scheduling strategies.
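The Greedy variant described above can be sketched as a weighted job ordering. The attributes and hand-picked weights below are illustrative stand-ins for the weights an Evolutionary Algorithm would extract from workload traces.

```python
# Hedged sketch of a Greedy weight-based job ordering: each waiting job
# gets a linear score over its attributes, and the queue is sorted by it.
# Attributes and weights are invented, not EA-optimised values.

# assumed job attributes: (name, wait_time_s, requested_nodes, est_runtime_s)
jobs = [
    ("j1", 120.0, 64, 3600.0),
    ("j2", 900.0, 8, 600.0),
    ("j3", 30.0, 256, 7200.0),
]

WEIGHTS = (1.0, -0.5, -0.01)   # hand-picked: favour waits, penalise size/length

def score(job):
    """Linear weighted score over a job's attributes."""
    _, wait, nodes, runtime = job
    w_wait, w_nodes, w_runtime = WEIGHTS
    return w_wait * wait + w_nodes * nodes + w_runtime * runtime

order = sorted(jobs, key=score, reverse=True)
print([j[0] for j in order])  # ['j2', 'j1', 'j3']
```

Tuning the weight vector changes which objectives the ordering favours; searching that weight space over recorded traces is where the Evolutionary Algorithm comes in.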