A Taxonomy of Workflow Management Systems for Grid Computing
With the advent of Grid and application technologies, scientists and
engineers are building more and more complex applications to manage and process
large data sets, and execute scientific experiments on distributed resources.
Such application scenarios require means for composing and executing complex
workflows. Therefore, many efforts have been made towards the development of
workflow management systems for Grid computing. In this paper, we propose a
taxonomy that characterizes and classifies various approaches for building and
executing workflows on Grids. We also survey several representative Grid
workflow systems developed by various projects world-wide to demonstrate the
comprehensiveness of the taxonomy. The taxonomy not only highlights the design
and engineering similarities and differences among state-of-the-art Grid
workflow systems, but also identifies the areas that need further research.
Comment: 29 pages, 15 figures
Hybrid spot instance based resource provisioning strategy in dynamic cloud environment
Utilizing resources to the maximum extent in a large-scale distributed cloud environment is a major challenge due to the nature of the cloud. Spot Instances in the Amazon Elastic Compute Cloud (EC2) are provisioned to the highest bidder with no guarantee of task completion, and they incur overhead in the form of longer task execution time and cost. The paper demonstrates that the proposed Hybrid Spot Instance strategy can avoid the last-partial-hour and cost overheads. By redefining the bid price, it aims to provide reliable service to ongoing tasks so that long-running tasks complete without being abruptly interrupted. The strategy also allows on-demand resources to be acquired when the spot price crosses the on-demand price, thereby providing high reliability. This avoids the overhead of checkpointing, restarting and workload migration incurred in existing systems, leading to efficient resource usage for both providers and users. Service providers' revenue is optimized by eliminating the free last partial hour, which is a cost burden for the provider. Simulations based on real-time prices of various instance types and heterogeneous applications show that the number of out-of-bid scenarios can be reduced substantially, which increases the number of completed tasks. Checkpointing is also minimized, which reduces its associated overhead. This resource provisioning strategy aims to give preference to existing customers and to tasks that are nearing completion.
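The decision rule outlined in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: the function names `choose_action` and `simulate`, the action labels, and the choice to raise the bid exactly to the current spot price are all assumptions made for the example.

```python
def choose_action(spot_price, current_bid, on_demand_price):
    """Pick one action for a running task given current prices.

    Hypothetical rule (an assumption, not the paper's exact policy):
    - if the spot price has crossed the on-demand price, use on-demand;
    - else if the spot price exceeds our bid, raise the bid to match it
      so the running task is not interrupted;
    - otherwise keep running on the spot instance with the same bid.
    Returns a tuple (action, bid_after_action).
    """
    if spot_price > on_demand_price:
        return ("switch_to_on_demand", current_bid)
    if spot_price > current_bid:
        return ("raise_bid", spot_price)
    return ("keep_spot", current_bid)


def simulate(prices, initial_bid, on_demand_price):
    """Apply the rule to a sequence of spot prices and list the actions."""
    bid = initial_bid
    actions = []
    for price in prices:
        action, bid = choose_action(price, bid, on_demand_price)
        actions.append(action)
    return actions


if __name__ == "__main__":
    # Illustrative prices only; not taken from the paper's simulation.
    print(simulate([0.03, 0.05, 0.12], initial_bid=0.04, on_demand_price=0.10))
```

With the illustrative trace above, the task stays on spot, then has its bid raised, then moves to on-demand once the spot price exceeds the on-demand price, so no out-of-bid interruption occurs in this toy run.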
Notes on Cloud computing principles
This letter provides a review of fundamental distributed systems and economic
Cloud computing principles. These principles are frequently deployed in their
respective fields, but their inter-dependencies are often neglected. Given that
Cloud Computing first and foremost is a new business model, a new model to sell
computational resources, the understanding of these concepts is facilitated by
treating them in unison. Here, we review some of the most important concepts
and how they relate to each other.
Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management
As users of big data applications expect fresh results, we witness a new breed of stream processing systems (SPS) that are designed to scale to large numbers of cloud-hosted machines. Such systems face new challenges: (i) to benefit from the pay-as-you-go model of cloud computing, they must scale out on demand, acquiring additional virtual machines (VMs) and parallelising operators when the workload increases; (ii) failures are common with deployments on hundreds of VMs, so systems must be fault-tolerant with fast recovery times, yet low per-machine overheads. An open question is how to achieve these two goals when stream queries include stateful operators, which must be scaled out and recovered without affecting query results. Our key idea is to expose internal operator state explicitly to the SPS through a set of state management primitives. Based on them, we describe an integrated approach for dynamic scale out and recovery of stateful operators. Externalised operator state is checkpointed periodically by the SPS and backed up to upstream VMs. The SPS identifies individual operator bottlenecks and automatically scales them out by allocating new VMs and partitioning the checkpointed state. At any point, failed operators are recovered by restoring checkpointed state on a new VM and replaying unprocessed tuples. We evaluate this approach with the Linear Road Benchmark on the Amazon EC2 cloud platform and show that it can scale automatically to a load factor of L=350 with 50 VMs, while recovering quickly from failures. Copyright © 2013 ACM
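The checkpoint, restore-and-replay, and partition steps described in this abstract can be illustrated with a small single-process sketch. It is not the paper's system or API: the class `CountingOperator`, the functions `restore` and `partition`, and the CRC32-based key routing are hypothetical names and choices made for this example, and the backup to upstream VMs is reduced to holding the checkpoint in a variable.

```python
import copy
import zlib


class CountingOperator:
    """Toy stateful operator that keeps a running sum per key."""

    def __init__(self, state=None):
        # Operator state is an explicit dict so it can be externalised.
        self.state = dict(state or {})

    def process(self, key, value):
        self.state[key] = self.state.get(key, 0) + value

    def checkpoint(self):
        # Return an independent copy of the state (the "backup").
        return copy.deepcopy(self.state)


def restore(checkpoint, unprocessed):
    """Recover an operator: load the checkpoint, then replay the tuples
    that arrived after the checkpoint was taken."""
    op = CountingOperator(checkpoint)
    for key, value in unprocessed:
        op.process(key, value)
    return op


def partition(checkpoint, n):
    """Split checkpointed state into n parts by a deterministic key hash,
    one part per new operator instance (assumed routing scheme)."""
    parts = [{} for _ in range(n)]
    for key, value in checkpoint.items():
        index = zlib.crc32(str(key).encode("utf-8")) % n
        parts[index][key] = value
    return parts


if __name__ == "__main__":
    op = CountingOperator()
    op.process("a", 1)
    op.process("b", 2)
    saved = op.checkpoint()
    op.process("a", 5)                      # arrives after the checkpoint
    recovered = restore(saved, [("a", 5)])  # replay the unprocessed tuple
    print(recovered.state == op.state)
    print(partition(saved, 2))
```

In this sketch recovery reproduces the pre-failure state because every tuple after the checkpoint is replayed, and partitioning places each key in exactly one part so that scaled-out instances do not share state.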