70 research outputs found

    Fault Tolerant Scheduling of Partitioned and Grouped Jobs in Grid Computing (FTPG)

    Get PDF
    Computational grids have the potential for solving scientific and large - scale problems using heterogeneous and geographically distributed resources In addition to the challenges of managing and scheduling resources reliable challenges arise because the grid infrastructure is unreliable There are two major problems in Scheduling the Grid 1 Efficient Scheduling of jobs 2 Providing fault tolerance in a reliable manner Most of the existing strategies do not provide fault tolerance There are some algorithms which provide fault tolerance but they do a large amount of redundant computation to provide fault tolerance This paper addresses this issue and minimizes redundant work by using a group level table of data This technique is suitable for partitioning and group scheduling of job

    Using swarm intelligence for distributed job scheduling on the grid

    Get PDF
    With the rapid growth of data and computational needs, distributed systems and computational Grids are gaining more and more attention. Grids are playing an important and growing role in today networks. The huge amount of computations a Grid can fulfill in a specific time cannot be done by the best super computers. However, Grid performance can still be improved by making sure all the resources available in the Grid are utilized by a good load balancing algorithm. The purpose of such algorithms is to make sure all nodes are equally involved in Grid computations. This research proposes two new distributed swarm intelligence inspired load balancing algorithms. One is based on ant colony optimization and is called AntZ, the other one is based on particle swarm optimization and is called ParticleZ. Distributed load balancing does not incorporate a single point of failure in the system. In the AntZ algorithm, an ant is invoked in response to submitting a job to the Grid and this ant surfs the network to find the best resource to deliver the job to. In the ParticleZ algorithm, each node plays a role as a particle and moves toward other particles by sharing its workload among them. We will be simulating our proposed approaches using a Grid simulation toolkit (GridSim) dedicated to Grid simulations. The performance of the algorithms will be evaluated using several performance criteria (e.g. makespan and load balancing level). A comparison of our proposed approaches with a classical approach called State Broadcast Algorithm and two random approaches will also be provided. Experimental results show the proposed algorithms (AntZ and ParticleZ) can perform very well in a Grid environment. In particular, the use of particle swarm optimization, which has not been addressed in the literature, can yield better performance results in many scenarios than the ant colony approach

    Economic-based Distributed Resource Management and Scheduling for Grid Computing

    Full text link
    Computational Grids, emerging as an infrastructure for next generation computing, enable the sharing, selection, and aggregation of geographically distributed resources for solving large-scale problems in science, engineering, and commerce. As the resources in the Grid are heterogeneous and geographically distributed with varying availability and a variety of usage and cost policies for diverse users at different times and, priorities as well as goals that vary with time. The management of resources and application scheduling in such a large and distributed environment is a complex task. This thesis proposes a distributed computational economy as an effective metaphor for the management of resources and application scheduling. It proposes an architectural framework that supports resource trading and quality of services based scheduling. It enables the regulation of supply and demand for resources and provides an incentive for resource owners for participating in the Grid and motives the users to trade-off between the deadline, budget, and the required level of quality of service. The thesis demonstrates the capability of economic-based systems for peer-to-peer distributed computing by developing users' quality-of-service requirements driven scheduling strategies and algorithms. It demonstrates their effectiveness by performing scheduling experiments on the World-Wide Grid for solving parameter sweep applications

    An enhanced ant colony system algorithm for dynamic fault tolerance in grid computing

    Get PDF
    Fault tolerance in grid computing allows the system to continue operate despite occurrence of failure. Most fault tolerance algorithms focus on fault handling techniques such as task reprocessing, checkpointing, task replication, penalty, and task migration. Ant colony system (ACS), a variant of ant colony optimization (ACO), is one of the promising algorithms for fault tolerance due to its ability to adapt to both static and dynamic combinatorial optimization problems. However, ACS algorithm does not consider the resource fitness during task scheduling which leads to poor load balancing and lower execution success rate. This research proposes dynamic ACS fault tolerance with suspension (DAFTS) in grid computing that focuses on providing effective fault tolerance techniques to improve the execution success rate and load balancing. The proposed algorithm consists of dynamic evaporation rate, resource fitness-based scheduling process, enhanced pheromone update with trust factor and suspension, and checkpoint-based task reprocessing. The research framework consists of four phases which are identifying fault tolerance techniques, enhancing resource assignment and job scheduling, improving fault tolerance algorithm and, evaluating the performance of the proposed algorithm. The proposed algorithm was developed in a simulated grid environment called GridSim and evaluated against other fault tolerance algorithms such as trust-based ACO, fault tolerance ACO, ACO without fault tolerance and ACO with fault tolerance in terms of total execution time, average latency, average makespan, throughput, execution success rate and load balancing. Experimental results showed that the proposed algorithm achieved the best performance in most aspects, and second best in terms of load balancing. The DAFTS achieved the smallest increase on execution time, average makespan and average latency by 7%, 11% and 5% respectively, and smallest decrease on throughput and execution success rate by 6.49% and 9% respectively as the failure rate increases. The DAFTS also achieved the smallest increment on execution time, average makespan and average latency by 5.8, 8.5 and 8.7 times respectively, and highest increase on throughput and highest execution success rate by 72.9% and 93.7% respectively as the number of jobs increases. The proposed algorithm can effectively overcome load balancing problems and increase execution success rates in distributed systems that are prone to faults

    LBSim: A simulation system for dynamic load-balancing algorithms for distributed systems.

    Get PDF
    In a distributed system consisting of autonomous computational units, the total computational power of all the units needs to be utilized efficiently by applying suitable load-balancing policies. For accomplishing the task, a large number of load balancing algorithms have been proposed in the literature. To facilitate the performance study of each of these load-balancing strategies, simulation has been widely used. However comparison of the load balancing algorithms becomes difficult if a different simulator is used for each case. There have been few studies on generalized simulation of load-balancing algorithms in distributed systems. Most of the simulation systems address the experiments for some particular load-balancing algorithms, whereas this thesis aims to study the simulation for a broad range of algorithms. After the characterization of the distributed systems and the extraction of the common components of load-balancing algorithms, a simulation system, called LBSim, has been built. LBSim is a generalized event-driven simulator for studying load-balancing algorithms with coarse-grained applications running on distributed networks of autonomous processing nodes. In order to verify that the simulation model can represent actual systems reasonably well, we have validated LBSim both qualitatively and quantitatively. As a toolkit of simulation, LBSim programming libraries can be reused to implement load-balancing algorithms for the purpose of performance measurement and analysis from different perspectives. As a framework of algorithm simulation can be extended with a moderate effort by following object-oriented methodology, to meet any new requirements that may arise in the future.Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .D8. Source: Masters Abstracts International, Volume: 43-05, page: 1747. Adviser: A. K. Aggarwal. Thesis (M.Sc.)--University of Windsor (Canada), 2004

    Regionally distributed architecture for dynamic e-learning environment (RDADeLE)

    Get PDF
    e-Learning is becoming an influential role as an economic method and a flexible mode of study in the institutions of higher education today which has a presence in an increasing number of college and university courses. e-Learning as system of systems is a dynamic and scalable environment. Within this environment, e-learning is still searching for a permanent, comfortable and serviceable position that is to be controlled, managed, flexible, accessible and continually up-to-date with the wider university structure. As most academic and business institutions and training centres around the world have adopted the e-learning concept and technology in order to create, deliver and manage their learning materials through the web, it has become the focus of investigation. However, management, monitoring and collaboration between these institutions and centres are limited. Existing technologies such as grid, web services and agents are promising better results. In this research a new architecture has been developed and adopted to make the e-learning environment more dynamic and scalable by dividing it into regional data grids which are managed and monitored by agents. Multi-agent technology has been applied to integrate each regional data grid with others in order to produce an architecture which is more scalable, reliable, and efficient. The result we refer to as Regionally Distributed Architecture for Dynamic e-Learning Environment (RDADeLE). Our RDADeLE architecture is an agent-based grid environment which is composed of components such as learners, staff, nodes, regional grids, grid services and Learning Objects (LOs). These components are built and organised as a multi-agent system (MAS) using the Java Agent Development (JADE) platform. The main role of the agents in our architecture is to control and monitor grid components in order to build an adaptable, extensible, and flexible grid-based e-learning system. Two techniques have been developed and adopted in the architecture to build LOs' information and grid services. The first technique is the XML-based Registries Technique (XRT). In this technique LOs' information is built using XML registries to be discovered by the learners. The registries are written in Dublin Core Metadata Initiative (DCMI) format. The second technique is the Registered-based Services Technique (RST). In this technique the services are grid services which are built using agents. The services are registered with the Directory Facilitator (DF) of a JADE platform in order to be discovered by all other components. All components of the RDADeLE system, including grid service, are built as a multi-agent system (MAS). Each regional grid in the first technique has only its own registry, whereas in the second technique the grid services of all regional grids have to be registered with the DF. We have evaluated the RDADeLE system guided by both techniques by building a simulation of the prototype. The prototype has a main interface which consists of the name of the system (RDADeLE) and a specification table which includes Number of Regional Grids, Number of Nodes, Maximum Number of Learners connected to each node, and Number of Grid Services to be filled by the administrator of the RDADeLE system in order to create the prototype. Using the RST technique shows that the RDADeLE system can be built with more regional grids with less memory consumption. Moreover, using the RST technique shows that more grid services can be registered in the RDADeLE system with a lower average search time and the search performance is increased compared with the XRT technique. Finally, using one or both techniques, the XRT or the RST, in the prototype does not affect the reliability of the RDADeLE system.Royal Commission for Jubail and Yanbu - Directorate General For Jubail Project Kingdom of Saudi Arabi
    • …
    corecore