1,712 research outputs found

    A Taxonomy of Workflow Management Systems for Grid Computing

    Full text link
    With the advent of Grid and application technologies, scientists and engineers are building more and more complex applications to manage and process large data sets, and execute scientific experiments on distributed resources. Such application scenarios require means for composing and executing complex workflows. Therefore, many efforts have been made towards the development of workflow management systems for Grid computing. In this paper, we propose a taxonomy that characterizes and classifies various approaches for building and executing workflows on Grids. We also survey several representative Grid workflow systems developed by various projects world-wide to demonstrate the comprehensiveness of the taxonomy. The taxonomy not only highlights the design and engineering similarities and differences of state-of-the-art in Grid workflow systems, but also identifies the areas that need further research.Comment: 29 pages, 15 figure

    Topology-aware GPU scheduling for learning workloads in cloud environments

    Get PDF
    Recent advances in hardware, such as systems with multiple GPUs and their availability in the cloud, are enabling deep learning in various domains including health care, autonomous vehicles, and Internet of Things. Multi-GPU systems exhibit complex connectivity among GPUs and between GPUs and CPUs. Workload schedulers must consider hardware topology and workload communication requirements in order to allocate CPU and GPU resources for optimal execution time and improved utilization in shared cloud environments. This paper presents a new topology-aware workload placement strategy to schedule deep learning jobs on multi-GPU systems. The placement strategy is evaluated with a prototype on a Power8 machine with Tesla P100 cards, showing speedups of up to ≈1.30x compared to state-of-the-art strategies; the proposed algorithm achieves this result by allocating GPUs that satisfy workload requirements while preventing interference. Additionally, a large-scale simulation shows that the proposed strategy provides higher resource utilization and performance in cloud systems.This project is supported by the IBM/BSC Technology Center for Supercomputing collaboration agreement. It has also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). It is also partially supported by the Ministry of Economy of Spain under contract TIN2015-65316-P and Generalitat de Catalunya under contract 2014SGR1051, by the ICREA Academia program, and by the BSC-CNS Severo Ochoa program (SEV-2015-0493). We thank our IBM Research colleagues Alaa Youssef and Asser Tantawi for the valuable discussions. We also thank SC17 committee member Blair Bethwaite of Monash University for his constructive feedback on the earlier drafts of this paper.Peer ReviewedPostprint (published version

    Decentralized Resource Scheduling in Grid/Cloud Computing

    Get PDF
    In the Grid/Cloud environment, applications or services and resources belong to different organizations with different objectives. Entities in the Grid/Cloud are autonomous and self-interested; however, they are willing to share their resources and services to achieve their individual and collective goals. In such open environment, the scheduling decision is a challenge given the decentralized nature of the environment. Each entity has specific requirements and objectives that need to achieve. In this thesis, we review the Grid/Cloud computing technologies, environment characteristics and structure and indicate the challenges within the resource scheduling. We capture the Grid/Cloud scheduling model based on the complete requirement of the environment. We further create a mapping between the Grid/Cloud scheduling problem and the combinatorial allocation problem and propose an adequate economic-based optimization model based on the characteristic and the structure nature of the Grid/Cloud. By adequacy, we mean that a comprehensive view of required properties of the Grid/Cloud is captured. We utilize the captured properties and propose a bidding language that is expressive where entities have the ability to specify any set of preferences in the Grid/Cloud and simple as entities have the ability to express structured preferences directly. We propose a winner determination model and mechanism that utilizes the proposed bidding language and finds a scheduling solution. Our proposed approach integrates concepts and principles of mechanism design and classical scheduling theory. Furthermore, we argue that in such open environment privacy concerns by nature is part of the requirement in the Grid/Cloud. Hence, any scheduling decision within the Grid/Cloud computing environment is to incorporate the feasibility of privacy protection of an entity. Each entity has specific requirements in terms of scheduling and privacy preferences. We analyze the privacy problem in the Grid/Cloud computing environment and propose an economic based model and solution architecture that provides a scheduling solution given privacy concerns in the Grid/Cloud. Finally, as a demonstration of the applicability of the approach, we apply our solution by integrating with Globus toolkit (a well adopted tool to enable Grid/Cloud computing environment). We also, created simulation experimental results to capture the economic and time efficiency of the proposed solution

    Supporting Quality of Service in Scientific Workflows

    Get PDF
    While workflow management systems have been utilized in enterprises to support businesses for almost two decades, the use of workflows in scientific environments was fairly uncommon until recently. Nowadays, scientists use workflow systems to conduct scientific experiments, simulations, and distributed computations. However, most scientific workflow management systems have not been built using existing workflow technology; rather they have been designed and developed from scratch. Due to the lack of generality of early scientific workflow systems, many domain-specific workflow systems have been developed. Generally speaking, those domain-specific approaches lack common acceptance and tool support and offer lower robustness compared to business workflow systems. In this thesis, the use of the industry standard BPEL, a workflow language for modeling business processes, is proposed for the modeling and the execution of scientific workflows. Due to the widespread use of BPEL in enterprises, a number of stable and mature software products exist. The language is expressive (Turingcomplete) and not restricted to specific applications. BPEL is well suited for the modeling of scientific workflows, but existing implementations of the standard lack important features that are necessary for the execution of scientific workflows. This work presents components that extend an existing implementation of the BPEL standard and eliminate the identified weaknesses. The components thus provide the technical basis for use of BPEL in academia. The particular focus is on so-called non-functional (Quality of Service) requirements. These requirements include scalability, reliability (fault tolerance), data security, and cost (of executing a workflow). From a technical perspective, the workflow system must be able to interface with the middleware systems that are commonly used by the scientific workflow community to allow access to heterogeneous, distributed resources (especially Grid and Cloud resources). The major components cover exactly these requirements: Cloud Resource Provisioner Scalability of the workflow system is achieved by automatically adding additional (Cloud) resources to the workflow system’s resource pool when the workflow system is heavily loaded. Fault Tolerance Module High reliability is achieved via continuous monitoring of workflow execution and corrective interventions, such as re-execution of a failed workflow step or replacement of the faulty resource. Cost Aware Data Flow Aware Scheduler The majority of scientific workflow systems only take the performance and utilization of resources for the execution of workflow steps into account when making scheduling decisions. The presented workflow system goes beyond that. By defining preference values for the weighting of costs and the anticipated workflow execution time, workflow users may influence the resource selection process. The developed multiobjective scheduling algorithm respects the defined weighting and makes both efficient and advantageous decisions using a heuristic approach. Security Extensions Because it supports various encryption, signature and authentication mechanisms (e.g., Grid Security Infrastructure), the workflow system guarantees data security in the transfer of workflow data. Furthermore, this work identifies the need to equip workflow developers with workflow modeling tools that can be used intuitively. This dissertation presents two modeling tools that support users with different needs. The first tool, DAVO (domain-adaptable, Visual BPEL Orchestrator), operates at a low level of abstraction and allows users with knowledge of BPEL to use the full extent of the language. DAVO is a software that offers extensibility and customizability for different application domains. These features are used in the implementation of the second tool, SimpleBPEL Composer. SimpleBPEL is aimed at users with little or no background in computer science and allows for quick and intuitive development of BPEL workflows based on predefined components

    Cost and Performance-Based Resource Selection Scheme for Asynchronous Replicated System in Utility-Based Computing Environment

    Get PDF
    A resource selection problem for asynchronous replicated systems in utility-based computing environment is addressed in this paper. The needs for a special attention on this problem lies on the fact that most of the existing replication scheme in this computing system whether implicitly support synchronous replication and/or only consider read-only job. The problem is undoubtedly complex to be solved as two main issues need to be concerned simultaneously, i.e. 1) the difficulty on predicting the performance of the resources in terms of job response time, and 2) an efficient mechanism must be employed in order to measure the trade-off between the performance and the monetary cost incurred on resources so that minimum cost is preserved while providing low job response time. Therefore, a simple yet efficient algorithm that deals with the complexity of resource selection problem in utility-based computing systems is proposed in this paper. The problem is formulated as a Multi Criteria Decision Making (MCDM) problem. The advantages of the algorithm are two-folds. On one fold, it hides the complexity of resource selection process without neglecting important components that affect job response time. The difficulty on estimating job response time is captured by representing them in terms of different QoS criteria levels at each resource. On the other fold, this representation further relaxed the complexity in measuring the trade-offs between the performance and the monetary cost incurred on resources. The experiments proved that our proposed resource selection scheme achieves an appealing result with good system performance and low monetary cost as compared to existing algorithms

    A theoretical and computational basis for CATNETS

    Get PDF
    The main content of this report is the identification and definition of market mechanisms for Application Layer Networks (ALNs). On basis of the structured Market Engineering process, the work comprises the identification of requirements which adequate market mechanisms for ALNs have to fulfill. Subsequently, two mechanisms for each, the centralized and the decentralized case are described in this document. These build the theoretical foundation for the work within the following two years of the CATNETS project. --Grid Computing
    • …
    corecore