10 research outputs found
Grid-job scheduling with reservations and preemption
Computational grids make it possible to exploit grid resources across multiple clusters when grid jobs are deconstructed into tasks and allocated across clusters. Grid-job tasks are often scheduled in the form of workflows which require synchronization, and advance reservation makes it easy to guarantee predictable resource provisioning for these jobs. However, advance reservation for grid jobs creates roadblocks and fragmentation which adversely affect the system utilization and response times for local jobs. We provide a solution which incorporates relaxed reservations and uses a modified version of the standard grid-scheduling algorithm, HEFT, to obtain flexibility in placing reservations for workflow grid jobs. Furthermore, we deploy the relaxed reservation with modified HEFT as an extension of the preemption-based job scheduling framework SCOJO-PECT. In SCOJO-PECT, relaxed reservations serve the additional purpose of permitting scheduler optimizations which shift the overall schedule forward. In addition, a propagation heuristic is used to alleviate the workflow job makespan extension caused by the slack of relaxed reservation. Our solution aims at decreasing the fragmentation caused by grid jobs, so that local jobs and system utilization are not compromised, while grid jobs still have reasonable response times.
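As a rough illustration of the task-prioritisation step in standard HEFT (not the modified version the abstract describes, whose details are not given here), the sketch below computes the upward rank of each task: its average computation cost plus the largest sum of communication cost and upward rank over its successors. The function name, the four-task workflow, and all cost values are hypothetical.

```python
from functools import lru_cache


def upward_ranks(avg_cost, avg_comm, successors):
    """Return the standard HEFT upward rank of every task in a DAG.

    avg_cost:   task -> average computation cost across resources
    avg_comm:   (task, successor) -> average communication cost
    successors: task -> list of direct successors (missing key = exit task)
    """
    @lru_cache(maxsize=None)
    def rank(task):
        # Longest remaining path from this task to an exit task.
        tail = max(
            (avg_comm[(task, s)] + rank(s) for s in successors.get(task, ())),
            default=0.0,
        )
        return avg_cost[task] + tail

    return {t: rank(t) for t in avg_cost}


# Hypothetical four-task workflow: A feeds B and C, which both feed D.
avg_cost = {"A": 3.0, "B": 2.0, "C": 4.0, "D": 1.0}
successors = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
avg_comm = {("A", "B"): 1.0, ("A", "C"): 2.0, ("B", "D"): 1.0, ("C", "D"): 1.0}

ranks = upward_ranks(avg_cost, avg_comm, successors)
# HEFT considers tasks in decreasing order of upward rank.
order = sorted(ranks, key=ranks.get, reverse=True)
print(order)  # ['A', 'C', 'B', 'D']
```

In full HEFT each task in this order is then placed on the resource giving the earliest finish time; a relaxed reservation would presumably add slack around that placement, but that step is not sketched here.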
Optimizing performance of workflow executions under authorization control
Business processes or workflows are often used to
model enterprise or scientific applications. Automating
workflow executions on computing resources has received
considerable attention. However, many
workflow scenarios still involve human activities and
consist of a mixture of human tasks and computing
tasks.
Human involvement introduces security and
authorization concerns, requiring restrictions on who
is allowed to perform which tasks at what time. Role-
Based Access Control (RBAC) is a popular authorization
mechanism. In RBAC, the authorization concepts such as
roles and permissions are defined, and various
authorization constraints are supported, including
separation of duty, temporal constraints, etc. Under
RBAC, users are assigned to certain roles, while the
roles are associated with prescribed permissions.
When we assess resource capacities, or evaluate the
performance of workflow executions on supporting
platforms, it is often assumed that when a task is
allocated to a resource, the resource will accept the
task and start the execution once a processor becomes available. However, when the authorization policies
are taken into account, this assumption may not be
true and the situation becomes more complex. For
example, when a task arrives, a valid and activated
role has to be assigned to a task before the task can
start execution. The deployed authorization
constraints may delay the workflow execution due to
the roles' availability, or other restrictions on the
role assignments, which will consequently have
negative impact on application performance.
When the authorization constraints are present to
restrict the workflow executions, it entails new
research issues that have not been studied yet in
conventional workflow management. This thesis aims to
investigate these new research issues.
First, it is important to know whether a feasible
authorization solution can be found to enable the
executions of all tasks in a workflow, i.e., check the
feasibility of the deployed authorization constraints.
This thesis studies the issue of the feasibility
checking and models the feasibility checking problem
as a constraint satisfaction problem.
Second, it is useful to know when the performance of
workflow executions will not be affected by the given
authorization constraints. This thesis proposes the
methods to determine the time durations when the given
authorization constraints do not have impact.
Third, when the authorization constraints do have
the performance impact, how can we quantitatively
analyse and determine the impact? When there are multiple choices to assign the roles to the tasks,
will different choices lead to the different
performance impact? If so, can we find an optimal way
to conduct the task-role assignments so that the
performance impact is minimized? This thesis proposes
the method to analyze the delay caused by the
authorization constraints if the workflow arrives
beyond the non-impact time duration calculated above.
Through the analysis of the delay, we realize that the
authorization method, i.e., the method to select the
roles to assign to the tasks affects the length of the
delay caused by the authorization constraints. Based
on this finding, we propose an optimal authorization
method, called the Global Authorization Aware (GAA)
method.
Fourth, a key reason why authorization constraints
may have impact on performance is that the
authorization control directs the tasks to some
particular roles. How, then, can we determine the level of
workload directed to each role given a set of
authorization constraints? This thesis conducts the
theoretical analysis about how the authorization
constraints direct the workload to the roles, and
proposes the methods to calculate the arrival rate of
the requests directed to each role under the role,
temporal and cardinality constraints.
Finally, the amount of resources allocated to
support each individual role may have impact on the
execution performance of the workflows. Therefore, it
is desired to develop the strategies to determine the
adequate amount of resources when the authorization
control is present in the system. This thesis presents the methods to allocate the appropriate quantity of
resources, including both human resources and
computing resources. Different features of human
resources and computing resources are taken into
account. For human resources, the objective is to
maximize the performance subject to the budgets to
hire the human resources, while for computing
resources, the strategy aims to allocate adequate
amount of computing resources to meet the QoS
requirements.
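To make the feasibility-checking idea concrete, here is a minimal sketch of a constraint satisfaction search, assuming a much simplified model: each task has a set of permitted roles, and a separation-of-duty constraint forbids two given tasks from being assigned the same role. The thesis's actual formulation (including temporal and cardinality constraints) is not reproduced; the function name, task names, and role names are hypothetical.

```python
def find_assignment(tasks, allowed_roles, separated):
    """Backtracking search for a feasible task -> role assignment.

    tasks:         ordered list of task names
    allowed_roles: task -> list of roles permitted to perform it
    separated:     pairs of tasks that must not share a role
    Returns a dict, or None when the constraints are infeasible.
    """
    conflicts = {t: set() for t in tasks}
    for a, b in separated:
        conflicts[a].add(b)
        conflicts[b].add(a)

    def extend(i, assignment):
        if i == len(tasks):
            return dict(assignment)
        task = tasks[i]
        for role in allowed_roles[task]:
            # Reject a role already used by a task in conflict with this one.
            if all(assignment.get(other) != role for other in conflicts[task]):
                assignment[task] = role
                result = extend(i + 1, assignment)
                if result is not None:
                    return result
                del assignment[task]
        return None

    return extend(0, {})


# Hypothetical three-task workflow with two separation-of-duty constraints.
tasks = ["submit", "approve", "audit"]
allowed_roles = {
    "submit": ["clerk"],
    "approve": ["clerk", "manager"],
    "audit": ["manager", "auditor"],
}
separated = [("submit", "approve"), ("approve", "audit")]

feasible = find_assignment(tasks, allowed_roles, separated)
# An infeasible case: both tasks may only use the same single role.
infeasible = find_assignment(
    ["submit", "approve"],
    {"submit": ["clerk"], "approve": ["clerk"]},
    [("submit", "approve")],
)
```

A `None` result means no role assignment can satisfy the deployed constraints, which is the situation a feasibility check is meant to detect before the workflow is executed.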
Mapping DAG-based applications to multiclusters with background workload
Before an application modelled as a Directed Acyclic Graph (DAG) is executed on a heterogeneous system, a DAG mapping policy is often enacted. After mapping, the tasks (in the DAG-based application) to be executed at each computational resource are determined. The tasks are then sent to the corresponding resources, where they are orchestrated in the pre-designed pattern to complete the work. Most DAG mapping policies in the literature assume that each computational resource is a processing node of a single processor, i.e. the tasks mapped to a resource are to be run in sequence. Our studies demonstrate that if the resource is actually a cluster with multiple processing nodes, this assumption will cause a misperception in the tasks' execution time and execution order. This will disturb the pre-designed cooperation among tasks so that the expected performance cannot be achieved. In this paper, a DAG mapping algorithm is presented for multicluster architectures. Each constituent cluster in the multicluster is shared by background workload (from other users) and has its own independent local scheduler. The multicluster DAG mapping policy is based on theoretical analysis and its performance is evaluated through extensive experimental studies. The results show that compared with conventional DAG mapping policies, the new scheme that we present can significantly improve the scheduling performance of a DAG-based application in terms of the schedule length
Personal mobile grids with a honeybee inspired resource scheduler
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The overall aim of the thesis has been to introduce Personal Mobile Grids (PM-Grids) as a novel paradigm in grid computing that scales grid infrastructures to mobile devices and extends grid entities to individual personal users. In this thesis, architectural designs as well as simulation models for PM-Grids are developed.
The core of any grid system is its resource scheduler. However, virtually all current conventional grid schedulers do not address the non-clairvoyant scheduling problem, where job information is not available before the end of execution. Therefore, this thesis proposes a honeybee inspired resource scheduling heuristic for PM-Grids (HoPe) incorporating a radical approach to grid resource scheduling to tackle this problem. A detailed design and implementation of HoPe with a decentralised self-management and adaptive policy are initiated.
Among the other main contributions are a comprehensive taxonomy of grid systems as well as a detailed analysis of the honeybee colony and its nectar acquisition process (NAP), from the resource scheduling perspective, which have not been presented in any previous work, to the best of our knowledge.
PM-Grid designs and HoPe implementation were evaluated thoroughly through a strictly controlled empirical evaluation framework with a well-established heuristic in high throughput computing, the opportunistic scheduling heuristic (OSH), as a benchmark algorithm. Comparisons with optimal values and worst bounds are conducted to gain a clear insight into HoPe behaviour, in terms of stability, throughput, turnaround time and speedup, under different running conditions of number of jobs and grid scales.
Experimental results demonstrate the superiority of HoPe performance where it has successfully maintained optimum stability and throughput in more than 95% of the experiments, with HoPe achieving three times better than the OSH under extremely heavy loads. Regarding the turnaround time and speedup, HoPe has effectively achieved less than 50% of the turnaround time incurred by the OSH, while doubling its speedup in more than 60% of the experiments.
These results indicate the potential of both PM-Grids and HoPe in realising futuristic grid visions. Considering the deployment of PM-Grids in real life scenarios and the utilisation of HoPe in other parallel processing and high throughput computing systems is therefore recommended.
Workload characterization, modeling, and prediction in grid Computing
Workloads play an important role in experimental performance studies of computer systems. This thesis presents a comprehensive characterization of real workloads on production clusters and Grids. A variety of correlation structures and rich scaling behavior are identified in workload attributes such as job arrivals and run times, including pseudo-periodicity, long range dependence, and strong temporal locality. Based on the analytic results, workload models are developed to fit the real data. For job arrivals three different kinds of autocorrelations are investigated. For short to middle range dependent data, Markov modulated Poisson processes (MMPP) are good models because they can capture correlations between interarrival times while remaining analytically tractable. For long range dependent and multifractal processes, the multifractal wavelet model (MWM) is able to reconstruct the scaling behavior and it provides a coherent wavelet framework for analysis and synthesis. Pseudo-periodicity is a special kind of autocorrelation and it can be modeled by a matching pursuit approach. For workload attributes such as run time a new model is proposed that can fit not only the marginal distribution but also the second order statistics such as the autocorrelation function (ACF). The development of workload models enables simulation studies of Grid scheduling strategies. By using the synthetic traces, the performance impacts of workload correlations in Grid scheduling are quantitatively evaluated. The results indicate that autocorrelations in workload attributes can cause performance degradation; in some situations the difference can be up to several orders of magnitude. The larger the autocorrelation, the worse the performance, and this is shown at both the cluster and Grid level. This study shows the importance of realistic workload models in performance evaluation studies.
Regarding performance predictions, this thesis treats the targeted resources as a "black box" and takes a statistical approach. It is shown that statistical learning based methods, after a well-thought-out and fine-tuned design, are able to deliver good accuracy and performance.
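The MMPP idea mentioned above can be illustrated with a small simulation. The sketch below assumes a two-state process in which a hidden Markov chain switches between a "busy" and a "quiet" state, each with its own Poisson arrival rate; it is a generic textbook construction, not the fitted model from the thesis, and the function name and all rate values are hypothetical.

```python
import random


def simulate_mmpp(rates, switch_rates, horizon, seed=0):
    """Simulate job arrival times from a two-state MMPP up to `horizon`.

    rates:        (lambda_0, lambda_1) Poisson arrival rate in each state
    switch_rates: (q_0, q_1) rate of leaving state 0 and state 1
    """
    rng = random.Random(seed)
    t, state, arrivals = 0.0, 0, []
    while True:
        # Arrival and state switch are competing exponential clocks.
        total = rates[state] + switch_rates[state]
        t += rng.expovariate(total)
        if t >= horizon:
            return arrivals
        if rng.random() < rates[state] / total:
            arrivals.append(t)   # the arrival clock fired first
        else:
            state = 1 - state    # the modulating chain switched state


# Hypothetical parameters: bursts of 5 jobs per time unit, lulls of 0.2.
arrivals = simulate_mmpp(rates=(5.0, 0.2), switch_rates=(0.1, 0.1), horizon=100.0)
interarrivals = [b - a for a, b in zip(arrivals, arrivals[1:])]
```

Because the arrival rate persists while the hidden state persists, consecutive interarrival times are correlated, which is the property a plain Poisson process cannot reproduce.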
Workload modeling and performance evaluation in parallel systems
Scheduling plays a significant role in producing good performance for clusters and grids. Smart scheduling policies in these systems are essential to enable efficient resource allocation mechanisms. One of the key factors that have a strong effect on scheduling is the workload. This workload problem is associated with four research topics to obtain an effective scheduler, namely workload characterisation, workload modeling, performance evaluation and prediction, and scheduling design. Workload data collected from real systems are the best source for improving our knowledge about performance issues of clusters and grids. Observed features of these workloads are precious sources of clues, which can be utilized to enhance scheduling. To this end, several long-term parallel and grid workloads have been collected and this thesis used these real workloads in the study of workload characterisation, workload modeling, performance evaluation and prediction. Our research resulted in many workload modeling tools, a performance predictor and several useful clues that are essential to develop efficient cluster and grid schedulers.
Operating policies for energy efficient large scale computing
PhD Thesis
Energy costs now dominate IT infrastructure total cost of ownership, with datacentre
operators predicted to spend more on energy than hardware infrastructure in the
next five years. With Western European datacentre power consumption estimated at
56 TWh/year in 2007 and projected to double by 2020, improvements in the energy efficiency
of IT operations are imperative. The issue is further compounded by social and
political factors and strict environmental legislation governing organisations.
One example of such large IT systems is high-throughput cycle stealing distributed
systems such as HTCondor and BOINC, which allow organisations to leverage
spare capacity on existing infrastructure to undertake valuable computation.
As a consequence of increased scrutiny of the energy impact of these systems, aggressive
power management policies are often employed to reduce the energy impact
of institutional clusters, but in doing so these policies severely restrict the computational
resources available for high-throughput systems. These policies are often configured
to quickly transition servers and end-user cluster machines into low power
states after only short idle periods, further compounding the issue of reliability.
In this thesis, we evaluate operating policies for energy efficiency in large-scale
computing environments by means of trace-driven discrete event simulation, leveraging
real-world workload traces collected within Newcastle University.
The major contributions of this thesis are as follows:
i) Evaluation of novel energy efficient management policies for a decentralised
peer-to-peer (P2P) BitTorrent environment.
ii) Introduction of a novel simulation environment for the evaluation of energy efficiency
of large scale high-throughput computing systems, and proposal of a generalisable
model of energy consumption in high-throughput computing systems.
iii) Proposal and evaluation of resource allocation strategies for energy consumption
in high-throughput computing systems for a real workload.
iv) Proposal and evaluation, for a real workload, of mechanisms to reduce wasted task
execution within high-throughput computing systems to reduce energy consumption.
v) Evaluation of the impact of fault tolerance mechanisms on energy consumption