50 research outputs found

    GridSim: A Toolkit for the Modeling and Simulation of Distributed Resource Management and Scheduling for Grid Computing

    Full text link
    Clusters, grids, and peer-to-peer (P2P) networks have emerged as popular paradigms for next generation parallel and distributed computing. The management of resources and scheduling of applications in such large-scale distributed systems is a complex undertaking. In order to prove the effectiveness of resource brokers and associated scheduling algorithms, their performance needs to be evaluated under different scenarios such as varying number of resources and users with different requirements. In a grid environment, it is hard and even impossible to perform scheduler performance evaluation in a repeatable and controllable manner as resources and users are distributed across multiple organizations with their own policies. To overcome this limitation, we have developed a Java-based discrete-event grid simulation toolkit called GridSim. The toolkit supports modeling and simulation of heterogeneous grid resources (both time- and space-shared), users and application models. It provides primitives for creation of application tasks, mapping of tasks to resources, and their management. To demonstrate suitability of the GridSim toolkit, we have simulated a Nimrod-G like grid resource broker and evaluated the performance of deadline and budget constrained cost- and time-minimization scheduling algorithms

    Economic-based Distributed Resource Management and Scheduling for Grid Computing

    Full text link
    Computational Grids, emerging as an infrastructure for next generation computing, enable the sharing, selection, and aggregation of geographically distributed resources for solving large-scale problems in science, engineering, and commerce. As the resources in the Grid are heterogeneous and geographically distributed with varying availability and a variety of usage and cost policies for diverse users at different times and, priorities as well as goals that vary with time. The management of resources and application scheduling in such a large and distributed environment is a complex task. This thesis proposes a distributed computational economy as an effective metaphor for the management of resources and application scheduling. It proposes an architectural framework that supports resource trading and quality of services based scheduling. It enables the regulation of supply and demand for resources and provides an incentive for resource owners for participating in the Grid and motives the users to trade-off between the deadline, budget, and the required level of quality of service. The thesis demonstrates the capability of economic-based systems for peer-to-peer distributed computing by developing users' quality-of-service requirements driven scheduling strategies and algorithms. It demonstrates their effectiveness by performing scheduling experiments on the World-Wide Grid for solving parameter sweep applications

    A study in grid simulation and scheduling

    Get PDF
    Grid computing is emerging as an essential tool for large scale analysis and problem solving in scientific and business domains. Whilst the idea of stealing unused processor cycles is as old as the Internet, we are still far from reaching a position where many distributed resources can be seamlessly utilised on demand. One major issue preventing this vision is deciding how to effectively manage the remote resources and how to schedule the tasks amongst these resources. This thesis describes an investigation into Grid computing, specifically the problem of Grid scheduling. This complex problem has many unique features making it particularly difficult to solve and as a result many current Grid systems employ simplistic, inefficient solutions. This work describes the development of a simulation tool, G-Sim, which can be used to test the effectiveness of potential Grid scheduling algorithms under realistic operating conditions. This tool is used to analyse the effectiveness of a simple, novel scheduling technique in numerous scenarios. The results are positive and show that it could be applied to current procedures to enhance performance and decrease the negative effect of resource failure. Finally a conversion between the Grid scheduling problem and the classic computing problem SAT is provided. Such a conversion allows for the possibility of applying sophisticated SAT solving procedures to Grid scheduling providing potentially effective solutions

    Using swarm intelligence for distributed job scheduling on the grid

    Get PDF
    With the rapid growth of data and computational needs, distributed systems and computational Grids are gaining more and more attention. Grids are playing an important and growing role in today networks. The huge amount of computations a Grid can fulfill in a specific time cannot be done by the best super computers. However, Grid performance can still be improved by making sure all the resources available in the Grid are utilized by a good load balancing algorithm. The purpose of such algorithms is to make sure all nodes are equally involved in Grid computations. This research proposes two new distributed swarm intelligence inspired load balancing algorithms. One is based on ant colony optimization and is called AntZ, the other one is based on particle swarm optimization and is called ParticleZ. Distributed load balancing does not incorporate a single point of failure in the system. In the AntZ algorithm, an ant is invoked in response to submitting a job to the Grid and this ant surfs the network to find the best resource to deliver the job to. In the ParticleZ algorithm, each node plays a role as a particle and moves toward other particles by sharing its workload among them. We will be simulating our proposed approaches using a Grid simulation toolkit (GridSim) dedicated to Grid simulations. The performance of the algorithms will be evaluated using several performance criteria (e.g. makespan and load balancing level). A comparison of our proposed approaches with a classical approach called State Broadcast Algorithm and two random approaches will also be provided. Experimental results show the proposed algorithms (AntZ and ParticleZ) can perform very well in a Grid environment. In particular, the use of particle swarm optimization, which has not been addressed in the literature, can yield better performance results in many scenarios than the ant colony approach

    The Inter-cloud meta-scheduling

    Get PDF
    Inter-cloud is a recently emerging approach that expands cloud elasticity. By facilitating an adaptable setting, it purposes at the realization of a scalable resource provisioning that enables a diversity of cloud user requirements to be handled efficiently. This study’s contribution is in the inter-cloud performance optimization of job executions using metascheduling concepts. This includes the development of the inter-cloud meta-scheduling (ICMS) framework, the ICMS optimal schemes and the SimIC toolkit. The ICMS model is an architectural strategy for managing and scheduling user services in virtualized dynamically inter-linked clouds. This is achieved by the development of a model that includes a set of algorithms, namely the Service-Request, Service-Distribution, Service-Availability and Service-Allocation algorithms. These along with resource management optimal schemes offer the novel functionalities of the ICMS where the message exchanging implements the job distributions method, the VM deployment offers the VM management features and the local resource management system details the management of the local cloud schedulers. The generated system offers great flexibility by facilitating a lightweight resource management methodology while at the same time handling the heterogeneity of different clouds through advanced service level agreement coordination. Experimental results are productive as the proposed ICMS model achieves enhancement of the performance of service distribution for a variety of criteria such as service execution times, makespan, turnaround times, utilization levels and energy consumption rates for various inter-cloud entities, e.g. users, hosts and VMs. For example, ICMS optimizes the performance of a non-meta-brokering inter-cloud by 3%, while ICMS with full optimal schemes achieves 9% optimization for the same configurations. The whole experimental platform is implemented into the inter-cloud Simulation toolkit (SimIC) developed by the author, which is a discrete event simulation framework

    Evaluating the Robustness of Resource Allocations Obtained through Performance Modeling with Stochastic Process Algebra

    Get PDF
    Recent developments in the field of parallel and distributed computing has led to a proliferation of solving large and computationally intensive mathematical, science, or engineering problems, that consist of several parallelizable parts and several non-parallelizable (sequential) parts. In a parallel and distributed computing environment, the performance goal is to optimize the execution of parallelizable parts of an application on concurrent processors. This requires efficient application scheduling and resource allocation for mapping applications to a set of suitable parallel processors such that the overall performance goal is achieved. However, such computational environments are often prone to unpredictable variations in application (problem and algorithm) and system characteristics. Therefore, a robustness study is required to guarantee a desired level of performance. Given an initial workload, a mapping of applications to resources is considered to be robust if that mapping optimizes execution performance and guarantees a desired level of performance in the presence of unpredictable perturbations at runtime. In this research, a stochastic process algebra, Performance Evaluation Process Algebra (PEPA), is used for obtaining resource allocations via a numerical analysis of performance modeling of the parallel execution of applications on parallel computing resources. The PEPA performance model is translated into an underlying mathematical Markov chain model for obtaining performance measures. Further, a robustness analysis of the allocation techniques is performed for finding a robustmapping from a set of initial mapping schemes. The numerical analysis of the performance models have confirmed similarity with the simulation results of earlier research available in existing literature. When compared to direct experiments and simulations, numerical models and the corresponding analyses are easier to reproduce, do not incur any setup or installation costs, do not impose any prerequisites for learning a simulation framework, and are not limited by the complexity of the underlying infrastructure or simulation libraries

    Integrated computational and network QOS in grid computing

    Get PDF
    Master'sMASTER OF ENGINEERIN

    An enhanced ant colony system algorithm for dynamic fault tolerance in grid computing

    Get PDF
    Fault tolerance in grid computing allows the system to continue operate despite occurrence of failure. Most fault tolerance algorithms focus on fault handling techniques such as task reprocessing, checkpointing, task replication, penalty, and task migration. Ant colony system (ACS), a variant of ant colony optimization (ACO), is one of the promising algorithms for fault tolerance due to its ability to adapt to both static and dynamic combinatorial optimization problems. However, ACS algorithm does not consider the resource fitness during task scheduling which leads to poor load balancing and lower execution success rate. This research proposes dynamic ACS fault tolerance with suspension (DAFTS) in grid computing that focuses on providing effective fault tolerance techniques to improve the execution success rate and load balancing. The proposed algorithm consists of dynamic evaporation rate, resource fitness-based scheduling process, enhanced pheromone update with trust factor and suspension, and checkpoint-based task reprocessing. The research framework consists of four phases which are identifying fault tolerance techniques, enhancing resource assignment and job scheduling, improving fault tolerance algorithm and, evaluating the performance of the proposed algorithm. The proposed algorithm was developed in a simulated grid environment called GridSim and evaluated against other fault tolerance algorithms such as trust-based ACO, fault tolerance ACO, ACO without fault tolerance and ACO with fault tolerance in terms of total execution time, average latency, average makespan, throughput, execution success rate and load balancing. Experimental results showed that the proposed algorithm achieved the best performance in most aspects, and second best in terms of load balancing. The DAFTS achieved the smallest increase on execution time, average makespan and average latency by 7%, 11% and 5% respectively, and smallest decrease on throughput and execution success rate by 6.49% and 9% respectively as the failure rate increases. The DAFTS also achieved the smallest increment on execution time, average makespan and average latency by 5.8, 8.5 and 8.7 times respectively, and highest increase on throughput and highest execution success rate by 72.9% and 93.7% respectively as the number of jobs increases. The proposed algorithm can effectively overcome load balancing problems and increase execution success rates in distributed systems that are prone to faults

    Distributed Trust Management in Grid Computing Environments

    Get PDF
    Grid computing environments are open distributed systems in which autonomous participants collaborate with each other using specific mechanisms and protocols. In general, the participants have different aims and objectives, can join and leave the Grid environment any time, have different capabilities for offering services, and often do not have sufficient knowledge about their collaboration partners. As a result, it is quite difficult to rely on the outcome of the collaboration process. Furthermore, the overall decision whether to rely at all on a collaboration partner or not may be affected by other non-functional aspects that cannot be generally determined for every possible situation, but should rather be under the control of the user when requesting such a decision. In this thesis, the idea that trust is the major requirement for enabling collaboration among partners in Grid environments is investigated. The probability for a successful future interaction among partners is considered as closely related to the mutual trust values the partners assign to each other. Thus, the level of trust represents the level of intention of Grid participants to collaborate. Trust is classified into two categories: identity trust and behavior trust. Identity trust is concerned with verifying the authenticity of an interaction partner, whereas behavior trust deals with the trustworthiness of an interaction partner. In order to calculate the identity trust, a "small-worlds"-like scheme is proposed. The overall behavior trust of an interaction partner is built up by considering several factors, such as accuracy or reliability. These factors of behavior trust are continuously tested and verified. In this way, a history of past collaborations that is used for future decisions on further collaborations between collaboration partners is collected. This kind of experience is also shared as recommendations to other participants. An interesting problem analysed is the difficulty of discovering the "real" behavior of an interaction partner from the "observed" behavior. If there are behavioral deviations, then it is not clear under what circumstances the deviating behavior of a partner is going to be tolerated. Issues involved in managing behavior trust of Grid participants are investigated and an approach based on the idea of using statistical methods of quality assurance for identifying the "real" behavior of a participant during an interaction and for "keeping" the behavior of the participants "in-control" is proposed. Another problem addressed is the security in Grid environments. Grids are designed to provide access and control over enormous remote computational resources, storage devices and scientific instruments. The information exchanged, saved or processed can be quite valuable and thus, a Grid is an attractive target for attacks to extract this information. Here, the confidentiality of the communication between Grid participants, together with issues related to authorization, integrity, management and non-repudiation are considered. A hybrid message level encryption scheme for securing the communication between Grid participants is proposed. It is based on a combination of two asymmetric cryptographic techniques, a variant of Public Key Infrastructure (PKI) and Certificateless Public Key Cryptography (CL-PKC). The different methods to trust management are implemented on a simulation infrastructure. The proposed system architecture can be configured to the domain specific trust requirements by the use of several separate trust profiles covering the entire lifecycle of trust establishment and management. Different experiments illustrate further how Grid participants can build, manage and evolve trust between them in order to have a successful collaboration. Although the approach is basically conceived for Grid environments, it is generic enough to be used for establishing and managing trust in many Grid-like distributed environments

    Scheduling for Large Scale Distributed Computing Systems: Approaches and Performance Evaluation Issues

    Get PDF
    Although our everyday life and society now depends heavily oncommunication infrastructures and computation infrastructures,scientists and engineers have always been among the main consumers ofcomputing power. This document provides a coherent overview of theresearch I have conducted in the last 15 years and which targets themanagement and performance evaluation of large scale distributedcomputing infrastructures such as clusters, grids, desktop grids,volunteer computing platforms, ... when used for scientific computing.In the first part of this document, I present how I have addressedscheduling problems arising on distributed platforms (like computinggrids) with a particular emphasis on heterogeneity and multi-userissues, hence in connection with game theory. Most of these problemsare relaxed from a classical combinatorial optimization formulationinto a continuous form, which allows to easily account for keyplatform characteristics such as heterogeneity or complex topologywhile providing efficient practical and distributed solutions.The second part presents my main contributions to the SimGrid project,which is a simulation toolkit for building simulators of distributedapplications (originally designed for scheduling algorithm evaluationpurposes). It comprises a unified presentation of how the questions ofvalidation and scalability have been addressed in SimGrid as well asthoughts on specific challenges related to methodological aspects andto the application of SimGrid to the HPC context
    corecore