10 research outputs found

    PFAS: A Resource-Performance-Fluctuation-Aware Workflow Scheduling Algorithm for Grid Computing

    Full text link

    Grid-job scheduling with reservations and preemption

    Get PDF
    Computational grids make it possible to exploit grid resources across multiple clusters when grid jobs are deconstructed into tasks and allocated across clusters. Grid-job tasks are often scheduled in the form of workflows which require synchronization, and advance reservation makes it easy to guarantee predictable resource provisioning for these jobs. However, advance reservation for grid jobs creates roadblocks and fragmentation which adversely affects the system utilization and response times for local jobs. We provide a solution which incorporates relaxed reservations and uses a modified version of the standard grid-scheduling algorithm, HEFT, to obtain flexibility in placing reservations for workflow grid jobs. Furthermore, we deploy the relaxed reservation with modified HEFT as an extension of the preemption based job scheduling framework, SCOJO-PECT job scheduler. In SCOJO-PECT, relaxed reservations serve the additional purpose of permitting scheduler optimizations which shift the overall schedule forward. Furthermore, a propagation heuristics algorithm is used to alleviate the workflow job makespan extension caused by the slack of relaxed reservation. Our solution aims at decreasing the fragmentation caused by grid jobs, so that local jobs and system utilization are not compromised, and at the same time grid jobs also have reasonable response times

    Optimizing performance of workflow executions under authorization control

    Get PDF
    “Business processes or workflows are often used to model enterprise or scientific applications. It has received considerable attention to automate workflow executions on computing resources. However, many workflow scenarios still involve human activities and consist of a mixture of human tasks and computing tasks. Human involvement introduces security and authorization concerns, requiring restrictions on who is allowed to perform which tasks at what time. Role- Based Access Control (RBAC) is a popular authorization mechanism. In RBAC, the authorization concepts such as roles and permissions are defined, and various authorization constraints are supported, including separation of duty, temporal constraints, etc. Under RBAC, users are assigned to certain roles, while the roles are associated with prescribed permissions. When we assess resource capacities, or evaluate the performance of workflow executions on supporting platforms, it is often assumed that when a task is allocated to a resource, the resource will accept the task and start the execution once a processor becomes available. However, when the authorization policies are taken into account,” this assumption may not be true and the situation becomes more complex. For example, when a task arrives, a valid and activated role has to be assigned to a task before the task can start execution. The deployed authorization constraints may delay the workflow execution due to the roles’ availability, or other restrictions on the role assignments, which will consequently have negative impact on application performance. When the authorization constraints are present to restrict the workflow executions, it entails new research issues that have not been studied yet in conventional workflow management. This thesis aims to investigate these new research issues. First, it is important to know whether a feasible authorization solution can be found to enable the executions of all tasks in a workflow, i.e., check the feasibility of the deployed authorization constraints. This thesis studies the issue of the feasibility checking and models the feasibility checking problem as a constraints satisfaction problem. Second, it is useful to know when the performance of workflow executions will not be affected by the given authorization constraints. This thesis proposes the methods to determine the time durations when the given authorization constraints do not have impact. Third, when the authorization constraints do have the performance impact, how can we quantitatively analyse and determine the impact? When there are multiple choices to assign the roles to the tasks, will different choices lead to the different performance impact? If so, can we find an optimal way to conduct the task-role assignments so that the performance impact is minimized? This thesis proposes the method to analyze the delay caused by the authorization constraints if the workflow arrives beyond the non-impact time duration calculated above. Through the analysis of the delay, we realize that the authorization method, i.e., the method to select the roles to assign to the tasks affects the length of the delay caused by the authorization constraints. Based on this finding, we propose an optimal authorization method, called the Global Authorization Aware (GAA) method. Fourth, a key reason why authorization constraints may have impact on performance is because the authorization control directs the tasks to some particular roles. Then how to determine the level of workload directed to each role given a set of authorization constraints? This thesis conducts the theoretical analysis about how the authorization constraints direct the workload to the roles, and proposes the methods to calculate the arriving rate of the requests directed to each role under the role, temporal and cardinality constraints. Finally, the amount of resources allocated to support each individual role may have impact on the execution performance of the workflows. Therefore, it is desired to develop the strategies to determine the adequate amount of resources when the authorization control is present in the system. This thesis presents the methods to allocate the appropriate quantity for resources, including both human resources and computing resources. Different features of human resources and computing resources are taken into account. For human resources, the objective is to maximize the performance subject to the budgets to hire the human resources, while for computing resources, the strategy aims to allocate adequate amount of computing resources to meet the QoS requirements

    Mapping DAG-based applications to multiclusters with background workload

    Get PDF
    Before an application modelled as a Directed Acyclic Graph (DAG) is executed on a heterogeneous system, a DAG mapping policy is often enacted. After mapping, the tasks (in the DAG-based application) to be executed at each computational resource are determined. The tasks are then sent to the corresponding resources, where they are orchestrated in the pre-designed pattern to complete the work. Most DAG mapping policies in the literature assume that each computational resource is a processing node of a single processor, i.e. the tasks mapped to a resource are to be run in sequence. Our studies demonstrate that if the resource is actually a cluster with multiple processing nodes, this assumption will cause a misperception in the tasks' execution time and execution order. This will disturb the pre-designed cooperation among tasks so that the expected performance cannot be achieved. In this paper, a DAG mapping algorithm is presented for multicluster architectures. Each constituent cluster in the multicluster is shared by background workload (from other users) and has its own independent local scheduler. The multicluster DAG mapping policy is based on theoretical analysis and its performance is evaluated through extensive experimental studies. The results show that compared with conventional DAG mapping policies, the new scheme that we present can significantly improve the scheduling performance of a DAG-based application in terms of the schedule length

    Workload characterization, modeling, and prediction in grid Computing

    Get PDF
    Workloads play an important role in experimental performance studies of computer systems. This thesis presents a comprehensive characterization of real workloads on production clusters and Grids. A variety of correlation structures and rich scaling behavior are identified in workload attributes such as job arrivals and run times, including pseudo-periodicity, long range dependence, and strong temporal locality. Based on the analytic results workload models are developed to fit the real data. For job arrivals three different kinds of autocorrelations are investigated. For short to middle range dependent data, Markov modulated Poisson processes (MMPP) are good models because they can capture correlations between interarrival times while remaining analytically tractable. For long range dependent and multifractal processes, the multifractal wavelet model (MWM) is able to reconstruct the scaling behavior and it provides a coherent wavelet framework for analysis and synthesis. Pseudo-periodicity is a special kind of autocorrelation and it can be modeled by a matching pursuit approach. For workload attributes such as run time a new model is proposed that can fit not only the marginal distribution but also the second order statistics such as the autocorrelation function (ACF). The development of workload models enable the simulation studies of Grid scheduling strategies. By using the synthetic traces, the performance impacts of workload correlations in Grid scheduling is quantitatively evaluated. The results indicate that autocorrelations in workload attributes can cause performance degradation, in some situations the difference can be up to several orders of magnitude. The larger the autocorrelation, the worse the performance, it is proved both at the cluster and Grid level. This study shows the importance of realistic workload models in performance evaluation studies. Regarding performance predictions, this thesis treats the targeted resources as a ``black box'' and takes a statistical approach. It is shown that statistical learning based methods, after a well-thought and fine-tuned design, are able to deliver good accuracy and performance.UBL - phd migration 201

    Workload modeling and performance evaluation in parallel systems

    Get PDF
    Scheduling plays a significant role in producing good performance for clusters and grids. Smart scheduling policies in these systems are essential to enable efficient resource allocation mechanisms. One of the key factors that have a strong effect on scheduling is the workload. This workload problem is associated with four research topics to obtain an effective scheduler, namely workload characterisation, workload modeling, performance evaluation and prediction, and scheduling design. Workload data collected from real systems are the best source for improving our knowledge about performance issues of clusters and grids. Observed features of these workloads are precious sources of clues, which can be utilized to enhance scheduling. To this end, several long-term parallel and grid workloads have been collected and this thesis used these real workloads in the study of workload characterisation, workload modeling, per formance evaluation and prediction. Our research resulted in many workload modeling tools, a performance predictor and several useful clues that are essential to develop efficient cluster and grid schedulers.UBL - phd migration 201

    Personal mobile grids with a honeybee inspired resource scheduler

    Get PDF
    The overall aim of the thesis has been to introduce Personal Mobile Grids (PMGrids) as a novel paradigm in grid computing that scales grid infrastructures to mobile devices and extends grid entities to individual personal users. In this thesis, architectural designs as well as simulation models for PM-Grids are developed. The core of any grid system is its resource scheduler. However, virtually all current conventional grid schedulers do not address the non-clairvoyant scheduling problem, where job information is not available before the end of execution. Therefore, this thesis proposes a honeybee inspired resource scheduling heuristic for PM-Grids (HoPe) incorporating a radical approach to grid resource scheduling to tackle this problem. A detailed design and implementation of HoPe with a decentralised self-management and adaptive policy are initiated. Among the other main contributions are a comprehensive taxonomy of grid systems as well as a detailed analysis of the honeybee colony and its nectar acquisition process (NAP), from the resource scheduling perspective, which have not been presented in any previous work, to the best of our knowledge. PM-Grid designs and HoPe implementation were evaluated thoroughly through a strictly controlled empirical evaluation framework with a well-established heuristic in high throughput computing, the opportunistic scheduling heuristic (OSH), as a benchmark algorithm. Comparisons with optimal values and worst bounds are conducted to gain a clear insight into HoPe behaviour, in terms of stability, throughput, turnaround time and speedup, under different running conditions of number of jobs and grid scales. Experimental results demonstrate the superiority of HoPe performance where it has successfully maintained optimum stability and throughput in more than 95% of the experiments, with HoPe achieving three times better than the OSH under extremely heavy loads. Regarding the turnaround time and speedup, HoPe has effectively achieved less than 50% of the turnaround time incurred by the OSH, while doubling its speedup in more than 60% of the experiments. These results indicate the potential of both PM-Grids and HoPe in realising futuristic grid visions. Therefore considering the deployment of PM-Grids in real life scenarios and the utilisation of HoPe in other parallel processing and high throughput computing systems are recommended.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Operating policies for energy efficient large scale computing

    Get PDF
    PhD ThesisEnergy costs now dominate IT infrastructure total cost of ownership, with datacentre operators predicted to spend more on energy than hardware infrastructure in the next five years. With Western European datacentre power consumption estimated at 56 TWh/year in 2007 and projected to double by 2020, improvements in energy efficiency of IT operations is imperative. The issue is further compounded by social and political factors and strict environmental legislation governing organisations. One such example of large IT systems includes high-throughput cycle stealing distributed systems such as HTCondor and BOINC, which allow organisations to leverage spare capacity on existing infrastructure to undertake valuable computation. As a consequence of increased scrutiny of the energy impact of these systems, aggressive power management policies are often employed to reduce the energy impact of institutional clusters, but in doing so these policies severely restrict the computational resources available for high-throughput systems. These policies are often configured to quickly transition servers and end-user cluster machines into low power states after only short idle periods, further compounding the issue of reliability. In this thesis, we evaluate operating policies for energy efficiency in large-scale computing environments by means of trace-driven discrete event simulation, leveraging real-world workload traces collected within Newcastle University. The major contributions of this thesis are as follows: i) Evaluation of novel energy efficient management policies for a decentralised peer-to-peer (P2P) BitTorrent environment. ii) Introduce a novel simulation environment for the evaluation of energy efficiency of large scale high-throughput computing systems, and propose a generalisable model of energy consumption in high-throughput computing systems. iii iii) Proposal and evaluation of resource allocation strategies for energy consumption in high-throughput computing systems for a real workload. iv) Proposal and evaluation for a realworkload ofmechanisms to reduce wasted task execution within high-throughput computing systems to reduce energy consumption. v) Evaluation of the impact of fault tolerance mechanisms on energy consumption
    corecore