    A Taxonomy of Workflow Management Systems for Grid Computing

    With the advent of Grid and application technologies, scientists and engineers are building increasingly complex applications to manage and process large data sets and to execute scientific experiments on distributed resources. Such application scenarios require means for composing and executing complex workflows. Therefore, many efforts have been made towards the development of workflow management systems for Grid computing. In this paper, we propose a taxonomy that characterizes and classifies various approaches for building and executing workflows on Grids. We also survey several representative Grid workflow systems developed by various projects world-wide to demonstrate the comprehensiveness of the taxonomy. The taxonomy not only highlights the design and engineering similarities and differences of state-of-the-art Grid workflow systems, but also identifies the areas that need further research. Comment: 29 pages, 15 figures

    Hybrid spot instance based resource provisioning strategy in dynamic cloud environment

    Maximizing resource utilization in a large-scale distributed cloud environment is a major challenge due to the nature of the cloud. Spot Instances in the Amazon Elastic Compute Cloud (EC2) are provisioned to the highest bidder with no guarantee of task completion, and they incur overhead in task execution time and price. The paper demonstrates that the last-partial-hour and cost overheads can be avoided by the proposed Hybrid Spot Instance strategy. The strategy aims to provide reliable service to ongoing tasks by redefining the bid price, so that long-running tasks complete without being abruptly interrupted. It also acquires on-demand resources when the spot price crosses the on-demand price, thereby achieving high reliability. This avoids the overhead of checkpointing, restarting, and workload migration found in existing systems, leading to efficient resource usage for both providers and users. Service providers' revenue is optimized by eliminating the free last partial hour, which is a taxing factor for the provider. Simulations based on real-time prices of various instances, considering heterogeneous applications, show that the number of out-of-bid scenarios can be largely reduced, which increases the number of completed tasks. Checkpointing is also minimized, which reduces its associated overhead. This resource provisioning strategy gives preference to existing customers and to tasks that are nearing completion.

    Notes on Cloud computing principles

    This letter provides a review of fundamental distributed systems and economic Cloud computing principles. These principles are frequently deployed in their respective fields, but their inter-dependencies are often neglected. Given that Cloud computing is first and foremost a new business model, a new way to sell computational resources, these concepts are easier to understand when treated in unison. Here, we review some of the most important concepts and how they relate to each other.

    Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management

    As users of big data applications expect fresh results, we witness a new breed of stream processing systems (SPS) that are designed to scale to large numbers of cloud-hosted machines. Such systems face new challenges: (i) to benefit from the pay-as-you-go model of cloud computing, they must scale out on demand, acquiring additional virtual machines (VMs) and parallelising operators when the workload increases; (ii) failures are common with deployments on hundreds of VMs - systems must be fault-tolerant with fast recovery times, yet low per-machine overheads. An open question is how to achieve these two goals when stream queries include stateful operators, which must be scaled out and recovered without affecting query results. Our key idea is to expose internal operator state explicitly to the SPS through a set of state management primitives. Based on them, we describe an integrated approach for dynamic scale out and recovery of stateful operators. Externalised operator state is checkpointed periodically by the SPS and backed up to upstream VMs. The SPS identifies individual operator bottlenecks and automatically scales them out by allocating new VMs and partitioning the checkpointed state. At any point, failed operators are recovered by restoring checkpointed state on a new VM and replaying unprocessed tuples. We evaluate this approach with the Linear Road Benchmark on the Amazon EC2 cloud platform and show that it can scale automatically to a load factor of L=350 with 50 VMs, while recovering quickly from failures. Copyright © 2013 ACM