Search CORE

83 research outputs found

Reliable DAG Scheduling on Grids with Rewinding and Migration

Author: Cole Murray
Hernandez Israel
Publication venue
Publication date: 01/01/2007
Field of study

Crossref

Edinburgh Research Explorer

Workflow scheduling for service oriented cloud computing

Author: Fida Adnan
Publication venue: 'University of Saskatchewan Library'
Publication date: 01/01/2008
Field of study

Service Orientation (SO) and grid computing are two computing paradigms that when put together using Internet technologies promise to provide a scalable yet flexible computing platform for a diverse set of distributed computing applications. This practice gives rise to the notion of a computing cloud that addresses some previous limitations of interoperability, resource sharing and utilization within distributed computing. In such a Service Oriented Computing Cloud (SOCC), applications are formed by composing a set of services together. In addition, hierarchical service layers are also possible where general purpose services at lower layers are composed to deliver more domain specific services at the higher layer. In general an SOCC is a horizontally scalable computing platform that offers its resources as services in a standardized fashion. Workflow based applications are a suitable target for SOCC where workflow tasks are executed via service calls within the cloud. One or more workflows can be deployed over an SOCC and their execution requires scheduling of services to workflow tasks as the task become ready following their interdependencies. In this thesis heuristics based scheduling policies are evaluated for scheduling workflows over a collection of services offered by the SOCC. Various execution scenarios and workflow characteristics are considered to understand the implication of the heuristic based workflow scheduling

eCommons@USASK

University of Saskatchewan Research Archive

Reactive Scheduling of DAG Applications on Heterogeneous and Dynamic Distributed Computing Systems

Author: Hernandez Jesus Israel
Publication venue
Publication date: 04/12/2008
Field of study

Institute for Computing Systems ArchitectureEmerging technologies enable a set of distributed resources across a network to be linked together and used in a coordinated fashion to solve a particular parallel application at the same time. Such applications are often abstracted as directed acyclic graphs (DAGs), in which vertices represent application tasks and edges represent data dependencies between tasks. Effective scheduling mechanisms for DAG applications are essential to exploit the tremendous potential of computational resources. The core issues are that the availability and performance of resources, which are already by their nature heterogeneous, can be expected to vary dynamically, even during the course of an execution. In this thesis, we first consider the problem of scheduling DAG task graphs onto heterogeneous resources with changeable capabilities. We propose a list-scheduling heuristic approach, the Global Task Positioning (GTP) scheduling method, which addresses the problem by allowing rescheduling and migration of tasks in response to significant variations in resource characteristics. We observed from experiments with GTP that in an execution with relatively frequent migration, it may be that, over time, the results of some task have been copied to several other sites, and so a subsequent migrated task may have several possible sources for each of its inputs. Some of these copies may now be more quickly accessible than the original, due to dynamic variations in communication capabilities. To exploit this observation, we extended our model with a Copying Management(CM) function, resulting in a new version, the Global Task Positioning with copying facilities (GTP/c) system. The idea is to reuse such copies, in subsequent migration of placed tasks, in order to reduce the impact of migration cost on makespan. Finally, we believe that fault tolerance is an important issue in heterogeneous and dynamic computational environments as the availability of resources cannot be guaranteed. To address the problem of processor failure, we propose a rewinding mechanism which rewinds the progress of the application to a previous state, thereby preserving the execution in spite of the failed processor(s). We evaluate our mechanisms through simulation, since this allow us to generate repeatable patterns of resource performance variation. We use a standard benchmark set of DAGs, comparing performance against that of competing algorithms from the scheduling literature

Edinburgh Research Archive

Grid-centric scheduling strategies for workflow applications

Author: Zhang Yang
Publication venue
Publication date: 01/01/2010
Field of study

Grid computing faces a great challenge because the resources are not localized, but distributed, heterogeneous and dynamic. Thus, it is essential to provide a set of programming tools that execute an application on the Grid resources with as little input from the user as possible. The thesis of this work is that Grid-centric scheduling techniques of workflow applications can provide good usability of the Grid environment by reliably executing the application on a large scale distributed system with good performance. We support our thesis with new and effective approaches in the following five aspects. First, we modeled the performance of the existing scheduling approaches in a multi-cluster Grid environment. We implemented several widely-used scheduling algorithms and identified the best candidate. The study further introduced a new measurement, based on our experiments, which can improve the schedule quality of some scheduling algorithms as much as 20 fold in a multi-cluster Grid environment. Second, we studied the scalability of the existing Grid scheduling algorithms. To deal with Grid systems consisting of hundreds of thousands of resources, we designed and implemented a novel approach that performs explicit resource selection decoupled from scheduling Our experimental evaluation confirmed that our decoupled approach can be scalable in such an environment without sacrificing the quality of the schedule by more than 10%. Third, we proposed solutions to address the dynamic nature of Grid computing with a new cluster-based hybrid scheduling mechanism. Our experimental results collected from real executions on production clusters demonstrated that this approach produces programs running 30% to 100% faster than the other scheduling approaches we implemented on both reserved and shared resources. Fourth, we improved the reliability of Grid computing by incorporating fault- tolerance and recovery mechanisms into the workow application execution. Our experiments on a simulated multi-cluster Grid environment demonstrated the effectiveness of our approach and also characterized the three-way trade-off between reliability, performance and resource usage when executing a workflow application. Finally, we improved the large batch-queue wait time often found in production Grid clusters. We developed a novel approach to partition the workow application and submit them judiciously to achieve less total batch-queue wait time. The experimental results derived from production site batch queue logs show that our approach can reduce total wait time by as much as 70%. Our approaches combined can greatly improve the usability of Grid computing while increasing the performance of workow applications on a multi-cluster Grid environment

DSpace at Rice University

Process Rescheduling: Enabling Performance by Applying Multiple Metrics and Efficient Adaptations

Author: Alexandre Carissimi
Hans-Ulrich Heiss
Laércio Pilla
Philippe Navaux
Rodrigo Righi
Publication venue: 'IntechOpen'
Publication date: 17/08/2010
Field of study

IntechOpen

Proceedings of the First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014): Porto, Portugal

Author: Barbosa Jorge
Carretero Pérez Jesús
García Blas Francisco Javier
Morla Ricardo
Publication venue
Publication date: 01/01/2014
Field of study

Proceedings of: First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014). Porto (Portugal), August 27-28, 2014

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

A Multi-Objective Load Balancing System for Cloud Environments

Author: Lu J
Ramezani F
Taheri J
Zomaya AY
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2017
Field of study

© 2017 The British Computer Society. All rights reserved. Virtual machine (VM) live migration has been applied to system load balancing in cloud environments for the purpose of minimizing VM downtime and maximizing resource utilization. However, the migration process is both time-and cost-consuming as it requires the transfer of large size files or memory pages and consumes a huge amount of power and memory for the origin and destination physical machine (PM), especially for storage VM migration. This process also leads to VM downtime or slowdown. To deal with these shortcomings, we develop a Multi-objective Load Balancing (MO-LB) system that avoids VM migration and achieves system load balancing by transferring extra workload from a set of VMs allocated on an overloaded PM to other compatible VMs in the cluster with greater capacity. To reduce the time factor even more and optimize load balancing over a cloud cluster, MO-LB contains a CPU Usage Prediction (CUP) sub-system. The CUP not only predicts the performance of the VMs but also determines a set of appropriate VMs with the potential to execute the extra workload imposed on the VMs of an overloaded PM. We also design a Multi-Objective Task Scheduling optimization model using Particle Swarm Optimization to migrate the extra workload to the compatible VMs. The proposed method is evaluated using a VMware-vSphere-based private cloud in contrast to the VM migration technique applied by vMotion. The evaluation results show that the MO-LB system dramatically increases VM performance while reducing service response time, memory usage, job makespan, power consumption and the time taken for the load balancing process

OPUS - University of Technology Sydney

Generic access to symbolic computing services

Author: Carstea Alexandru
Publication venue: Mathematical and Computer Sciences
Publication date: 01/01/2012
Field of study

Symbolic computation is one of the computational domains that requires large computational resources. Computer Algebra Systems (CAS), the main tools used for symbolic computations, are mainly designed to be used as software tools installed on standalone machines that do not provide the required resources for solving large symbolic computation problems. In order to support symbolic computations an infrastructure built upon massively distributed computational environments must be developed. Building an infrastructure for symbolic computations requires a thorough analysis of the most important requirements raised by the symbolic computation world and must be built based on the most suitable architectural styles and technologies. The architecture that we propose is composed of several main components: the Computer Algebra System (CAS) Server that exposes the functionality implemented by one or more supporting CASs through generic interfaces of Grid Services; the Architecture for Grid Symbolic Services Orchestration (AGSSO) Server that allows seamless composition of CAS Server capabilities; and client side libraries to assist the users in describing workflows for symbolic computations directly within the CAS environment. We have also designed and developed a framework for automatic data management of mathematical content that relies on OpenMath encoding. To support the validation and fine tuning of the system we have developed a simulation platform that mimics the environment on which the architecture is deployed

CiteSeerX

ROS: The Research Output Service. Heriot-Watt University Edinburgh

Economic-based Distributed Resource Management and Scheduling for Grid Computing

Author: Buyya Rajkumar
Publication venue
Publication date: 01/01/2002
Field of study

Computational Grids, emerging as an infrastructure for next generation computing, enable the sharing, selection, and aggregation of geographically distributed resources for solving large-scale problems in science, engineering, and commerce. As the resources in the Grid are heterogeneous and geographically distributed with varying availability and a variety of usage and cost policies for diverse users at different times and, priorities as well as goals that vary with time. The management of resources and application scheduling in such a large and distributed environment is a complex task. This thesis proposes a distributed computational economy as an effective metaphor for the management of resources and application scheduling. It proposes an architectural framework that supports resource trading and quality of services based scheduling. It enables the regulation of supply and demand for resources and provides an incentive for resource owners for participating in the Grid and motives the users to trade-off between the deadline, budget, and the required level of quality of service. The thesis demonstrates the capability of economic-based systems for peer-to-peer distributed computing by developing users' quality-of-service requirements driven scheduling strategies and algorithms. It demonstrates their effectiveness by performing scheduling experiments on the World-Wide Grid for solving parameter sweep applications

arXiv.org e-Print Archive

CiteSeerX

CERN Document Server