    Optimizing Workflow Data Footprint

    In this paper we examine the issue of optimizing disk usage and scheduling large-scale scientific workflows onto distributed resources where the workflows are data-intensive, requiring large amounts of data storage, and the resources have limited storage capacity. Our approach is two-fold: we minimize the amount of space a workflow requires during execution by removing data files at runtime when they are no longer needed, and we demonstrate that workflows may have to be restructured to reduce their overall data footprint. We show the results of our data management and workflow restructuring solutions using a Laser Interferometer Gravitational-Wave Observatory (LIGO) application and an astronomy application, Montage, running on a large-scale production grid, the Open Science Grid. We show that although a 48% reduction in the data footprint of Montage can be achieved with dynamic data cleanup techniques, LIGO Scientific Collaboration workflows require additional restructuring to achieve a 56% reduction in data space usage. We also examine the cost of the workflow restructuring in terms of the application's runtime.
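    The dynamic cleanup idea can be illustrated with a small reference-counting sketch, assuming tasks are executed in an order consistent with their dependencies: a file is deleted as soon as the last task that reads it has finished. The task format and function name are illustrative assumptions, not the cleanup-job placement implemented by the authors' workflow system.

```python
# Minimal sketch of dynamic data cleanup: delete each file as soon as the last
# task that reads it has finished. The task format and function name are
# illustrative assumptions, not the cleanup-job placement used in the paper.
import os
from collections import defaultdict

def run_with_cleanup(tasks, execute):
    """tasks: dicts with 'name', 'inputs', 'outputs' (file paths), listed in an
    order consistent with their dependencies. execute: runs a single task."""
    readers_left = defaultdict(int)          # how many tasks still read each file
    for task in tasks:
        for path in task["inputs"]:
            readers_left[path] += 1

    for task in tasks:
        execute(task)                        # produces task["outputs"] on disk
        for path in task["inputs"]:
            readers_left[path] -= 1
            if readers_left[path] == 0 and os.path.exists(path):
                os.remove(path)              # last consumer done: reclaim space
```

    In a production workflow system the same bookkeeping is typically expressed as explicit cleanup jobs added to the workflow graph rather than an in-process loop, so deletions run on the execution sites themselves.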

    On the Use of Cloud Computing for Scientific Workflows

    This paper explores the use of cloud computing for scientific workflows, focusing on a widely used astronomy application, Montage. The approach is to evaluate, from the point of view of a scientific workflow, the tradeoffs between running in a local environment, if such is available, and running in a virtual environment via remote, wide-area network resource access. Our results show that for Montage, a workflow with short job runtimes, the virtual environment can provide good compute time performance, but it can suffer from resource scheduling delays and wide-area communications.
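    The tradeoff can be made concrete with a rough back-of-the-envelope model: for a workflow of many short jobs, per-job scheduling delay and wide-area data staging can outweigh any difference in raw compute speed. The function and numbers below are illustrative assumptions, not measurements from the paper.

```python
# Back-of-the-envelope model of the tradeoff: for short-running jobs,
# per-job scheduling delay and wide-area data staging can dominate makespan.
# All parameter names and numbers are illustrative assumptions.

def estimated_makespan(n_jobs, job_runtime_s, sched_delay_s, data_gb, wan_gbps=None):
    """Rough serial estimate: per-job overhead plus compute time, plus the time
    to stage data over the wide-area network (skipped when wan_gbps is None)."""
    transfer_s = 0.0 if wan_gbps is None else (data_gb * 8) / wan_gbps
    return n_jobs * (sched_delay_s + job_runtime_s) + transfer_s

# Local cluster: data already in place, small per-job scheduling delay.
local = estimated_makespan(n_jobs=1000, job_runtime_s=10, sched_delay_s=1, data_gb=0)
# Remote virtual environment: similar compute, larger per-job delay,
# a few GB staged over a 100 Mbit/s wide-area link.
remote = estimated_makespan(n_jobs=1000, job_runtime_s=10, sched_delay_s=5,
                            data_gb=4, wan_gbps=0.1)
print(f"local ~ {local:.0f} s, remote ~ {remote:.0f} s")
```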

    Presentation an Approach for Placement Phase in Mapping Algorithm

    The data requirements of both scientific and commercial applications have been increasing drastically in recent years. Just a couple of years ago, the data requirements of an average scientific application were measured in terabytes, whereas today we measure them in petabytes. Moreover, these requirements continue to grow rapidly every year, and in less than a decade they are expected to reach the exabyte (1 million terabytes) scale. In this work we do not use data duplication, because it increases the cost of using a cloud system. This paper presents an approach to mapping workflow tasks and data between the data centers of a cloud system. The approach encompasses two phases, each of which is given enough input to map tasks and data between data centers in such a way that the total time for task execution and data movement is minimized. In other words, the goal of the approach is to strike a trade-off between these two objectives. Simulations demonstrate that the approach fulfills the stated goals effectively. Keywords: distributed system, scientific application, data requirement
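    The flavour of such a placement decision can be illustrated with a simple greedy sketch that assigns each task to the data center minimizing estimated execution time plus input-data transfer time. The data model and the greedy rule are assumptions made for illustration; they are not the two-phase algorithm evaluated in the paper.

```python
# Greedy placement sketch: assign each task to the data center that minimizes
# estimated execution time plus input-data transfer time. The data model and
# greedy rule are illustrative assumptions, not the paper's two-phase algorithm.
from dataclasses import dataclass

@dataclass
class DataCenter:
    name: str
    compute_speed: float        # work units processed per second
    bandwidth_gbps: dict        # link bandwidth to other centers, keyed by name

def place_tasks(tasks, centers, data_location):
    """tasks: (task_name, work_units, {dataset_name: size_gb}) tuples.
    data_location: dataset_name -> name of the center currently holding it."""
    placement = {}
    for name, work, inputs in tasks:
        best_center, best_cost = None, float("inf")
        for dc in centers:
            exec_time = work / dc.compute_speed
            move_time = sum(
                0.0 if data_location[ds] == dc.name
                else (size_gb * 8) / dc.bandwidth_gbps[data_location[ds]]
                for ds, size_gb in inputs.items())
            if exec_time + move_time < best_cost:
                best_center, best_cost = dc.name, exec_time + move_time
        placement[name] = best_center
    return placement
```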

    Utilizing the blackboard paradigm to implement a workflow engine

    Workflow management has evolved into a mature field with numerous workflow management systems offering scores of features. These systems are designed to automate the business processes of organisations. However, many of these workflow engines struggle to support complex workflows. There has been relatively little research into building a workflow engine utilizing the blackboard paradigm. The blackboard paradigm can be characterized as specialists interacting with and updating a centralized data structure, namely the blackboard, with partial and complete solutions. The opportunistic control innate to the blackboard paradigm can be leveraged to support the execution of complex workflows. Furthermore, the blackboard architecture can be seen to accommodate comprehensive workflow functionality. This research aims to verify whether or not the blackboard paradigm can be used to build a workflow engine. To validate this research, a prototype was designed and developed following stringent guidelines in order to remain true to the blackboard paradigm. Four main perspectives of workflow management, namely the functional, behavioural, informational and operational aspects, with their quality indicators and requirements, were used to evaluate the prototype. This evaluation approach was chosen since it is universally applicable to any workflow engine and thereby provides a common platform on which the prototype can be judged and compared against other workflow engines. The two most important quality indicators are the level of support a workflow engine can provide for 20 main workflow patterns and 40 main data patterns. Test cases based on these patterns were developed and executed within the prototype to determine the level of support. It was found that the prototype supports 85% of all the workflow patterns and 72.5% of all the data patterns. This reveals some functional limitations in the prototype, and improvement suggestions are given that can boost these scores to 95% and 90% for workflow and data patterns respectively. The nature of the blackboard paradigm prevents support of only 5% and 10% of the workflow and data patterns respectively. The prototype is shown to substantially outperform most other workflow engines in the level of pattern support. Besides support for these patterns, other less important quality indicators provided by the main aspects of workflow management are also found to be present in the prototype. Given the above evidence, it is possible to conclude that a workflow engine can be successfully built utilizing the blackboard paradigm.
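    The core of the blackboard paradigm described above can be sketched in a few lines: specialists watch a shared blackboard and fire opportunistically whenever the facts they need are present, posting partial solutions that may in turn enable other specialists. The toy workflow and rule format below are illustrative assumptions, not the evaluated prototype.

```python
# Toy blackboard: specialists fire opportunistically whenever the facts they
# need are on the blackboard, posting new partial solutions. The workflow and
# rule format are illustrative assumptions, not the evaluated prototype.

blackboard = {"order_received"}          # shared, centrally updated set of facts

# Each specialist: (name, facts it needs, fact it contributes)
specialists = [
    ("validate_order", {"order_received"},                  "order_valid"),
    ("reserve_stock",  {"order_valid"},                     "stock_reserved"),
    ("take_payment",   {"order_valid"},                     "payment_taken"),
    ("ship_order",     {"stock_reserved", "payment_taken"}, "order_shipped"),
]

def run_blackboard():
    progress = True
    while progress:                      # opportunistic control loop
        progress = False
        for name, needs, contributes in specialists:
            if needs.issubset(blackboard) and contributes not in blackboard:
                print("firing specialist:", name)
                blackboard.add(contributes)   # post a partial solution
                progress = True               # new facts may enable others

run_blackboard()
print("final blackboard:", sorted(blackboard))
```

    The two specialists enabled by the same fact fire independently and are later synchronised by a third, which loosely corresponds to the parallel-split and synchronisation workflow patterns among those used in the evaluation.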
