
    Evolutionary multi-objective workflow scheduling in Cloud

    Cloud computing provides promising platforms for executing large applications, offering enormous computational resources on demand. In a Cloud model, users are charged based on their usage of resources and the required quality of service (QoS) specifications. Although many workflow scheduling algorithms exist for traditional distributed or heterogeneous computing environments, they are difficult to apply directly to the Cloud, since the Cloud differs from traditional heterogeneous environments in its service-based resource management and pay-per-use pricing strategies. In this paper, we highlight these difficulties and model the workflow scheduling problem, which optimizes both makespan and cost, as a Multi-objective Optimization Problem (MOP) for Cloud environments. We propose an evolutionary multi-objective optimization (EMO)-based algorithm to solve this workflow scheduling problem on an infrastructure as a service (IaaS) platform. Novel schemes for problem-specific encoding, population initialization, fitness evaluation and genetic operators are proposed in this algorithm. Extensive experiments on real-world and randomly generated workflows show that the schedules produced by our evolutionary algorithm are more stable on most of the workflows under the instance-based IaaS computing and pricing models. The results also show that our algorithm achieves significantly better solutions than existing state-of-the-art QoS optimization scheduling algorithms in most cases. The experiments are based on the on-demand instance types of Amazon EC2; however, the proposed algorithm is easy to extend to the resources and pricing models of other IaaS services. This work is supported by the National Science Foundation of China under Grant no. 61272420 and the Provincial Science Foundation of Jiangsu under Grant no. BK2011022.
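    To make the two objectives concrete, the following is a minimal Python sketch (illustrative only; the chromosome encoding, hourly-billing simplification and all names are assumptions, not the paper's exact model) of how a candidate schedule could be evaluated on makespan and monetary cost under instance-based IaaS pricing, the trade-off an EMO algorithm such as the one proposed here explores.

```python
# Illustrative sketch: evaluate one candidate workflow schedule on the two
# objectives (makespan, cost) under hourly on-demand IaaS pricing.
# Simplifying assumption: each instance is leased from time 0 until its last
# task finishes and is billed per started hour.
import math
from collections import defaultdict

def evaluate(tasks, deps, schedule, runtime, price_per_hour):
    """tasks: task ids in topological order.
    deps: task -> list of predecessor tasks.
    schedule: task -> (instance_type, instance_index), as chosen by a chromosome.
    runtime: (task, instance_type) -> execution time in seconds."""
    finish = {}
    free_at = defaultdict(float)              # instance -> time it becomes free
    for t in tasks:
        inst = schedule[t]
        ready = max((finish[p] for p in deps.get(t, [])), default=0.0)
        start = max(ready, free_at[inst])
        finish[t] = start + runtime[(t, inst[0])]
        free_at[inst] = finish[t]
    makespan = max(finish.values())
    cost = sum(math.ceil(end / 3600.0) * price_per_hour[inst_type]
               for (inst_type, _), end in free_at.items())
    return makespan, cost                     # both objectives are minimized
```

    A multi-objective GA would call such an evaluation for every individual in the population and keep the non-dominated (makespan, cost) trade-offs.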

    Scientific Workflow Integration For Services Computing

    In recent years, significant scientific advances have increasingly been achieved through complex scientific processes. With the exponential growth in computing technologies and scientific data, a scientific workflow may comprise a large number of heterogeneous scientific services and applications provided by different organizations. These services, applications, and their associated data are usually distributed across heterogeneous computing environments. The integration and management of such scientific workflows are pushing the limits of current workflow technology. This dissertation presents an integrated solution to composing, scheduling, executing and developing scientific workflows and scientific workflow management systems. To provide a foundation for workflow composition, scheduling, execution and management, we propose the first reference architecture for scientific workflow management systems. The reference architecture not only provides a high-level organization of subsystems and their interactions in a workflow system, but also provides a basis for comparison between different systems and guidance for the architectural design of an SWFMS in a specific scientific domain. To integrate heterogeneous services and applications and enable them to be composed into workflows, we propose a task template model which not only provides an appropriate abstraction of heterogeneous services and applications, but also encapsulates the composition and mapping of shims and functional task components within a task interface. Our proposed task specification language (TSL) not only integrates heterogeneous services and applications into uniform workflow tasks, but also provides a solution to both the TYPE-I and TYPE-II shimming problems in composing scientific workflows. To schedule scientific workflows in emerging services computing environments, we propose two workflow scheduling algorithms, the SHEFT algorithm and the SCPOR algorithm, which prioritize tasks in a workflow, map tasks onto suitable resources and order the execution of tasks on the assigned resources so that the workflow makespan is minimized. Our extensive experiments show that the proposed algorithms not only outperform other algorithms for large-scale, data-intensive and compute-intensive workflows, but also allow the assigned resources to scale elastically with the size of the workflows. To execute workflows on distributed computing environments, we propose a task run model to capture the run-time behaviors of tasks. The proposed task run description language (TRDL) enables the execution of task instances constructed from heterogeneous services and applications. We also develop an SOA-based task management subsystem to manage all task templates, task instances and task runs for the invocation and execution of various heterogeneous task components. Finally, our SOA-based workflow management system, the VIEW system, and a VIEW-based workflow application system, the FiberFlow system, validate our architectures, models, languages, and algorithms.
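    As background for the task-prioritization step mentioned above, the sketch below shows the HEFT-style "upward rank" that list schedulers in this family typically start from. SHEFT and SCPOR extend this idea with their own resource mapping and ordering rules, which are defined in the dissertation, so treat this only as a hedged illustration of the common starting point.

```python
# Upward-rank prioritization for a DAG of tasks (HEFT-style): a task's rank is
# its average execution cost plus the most expensive path through its
# successors; tasks are then scheduled in decreasing rank order.
def upward_rank_order(tasks, succ, avg_cost, avg_comm):
    """tasks: all task ids.
    succ: task -> list of successor tasks.
    avg_cost: task -> average execution cost over all candidate resources.
    avg_comm: (task, successor) -> average communication cost on that edge."""
    memo = {}

    def rank(t):
        if t not in memo:
            memo[t] = avg_cost[t] + max(
                (avg_comm[(t, s)] + rank(s) for s in succ.get(t, [])),
                default=0.0)
        return memo[t]

    return sorted(tasks, key=rank, reverse=True)
```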

    Model Exploration Using OpenMOLE - a workflow engine for large scale distributed design of experiments and parameter tuning

    OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high-level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. In this work, we briefly expose the strong assets of OpenMOLE and demonstrate its efficiency at exploring the parameter set of an agent simulation model. We perform a multi-objective optimisation on this model using computationally expensive Genetic Algorithms (GA). OpenMOLE hides the complexity of designing such an experiment thanks to its DSL, and transparently distributes the optimisation process. The example shows how an initialisation of the GA with a population of 200,000 individuals can be evaluated in one hour on the European Grid Infrastructure. Comment: IEEE High Performance Computing and Simulation conference 2015, June 2015, Amsterdam, Netherlands
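    OpenMOLE's actual DSL is Scala-based, so the Python sketch below is only a conceptual analogue of the pattern it automates: evaluating a whole GA population of model parameterizations in parallel, with a local process pool standing in for the grid, cluster and cloud back-ends that OpenMOLE can transparently delegate to. The model, parameters and population size are purely illustrative.

```python
# Conceptual analogue only (not OpenMOLE's API): evaluate a population of
# candidate model parameterizations in parallel on two objectives.
from concurrent.futures import ProcessPoolExecutor
import random

def run_model(params):
    """Stand-in for an expensive agent-based simulation; returns two
    objective values to be minimized (purely illustrative)."""
    x, y = params["x"], params["y"]
    return (x - 1.0) ** 2 + y ** 2, x ** 2 + (y - 1.0) ** 2

if __name__ == "__main__":
    population = [{"x": random.uniform(-5, 5), "y": random.uniform(-5, 5)}
                  for _ in range(1000)]
    with ProcessPoolExecutor() as pool:        # OpenMOLE would delegate this
        objectives = list(pool.map(run_model, population))  # step to EGI/HPC
    # A multi-objective GA (e.g. NSGA-II) would now select and vary
    # individuals based on these objective values to form the next generation.
    print(min(objectives))
```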

    A Taxonomy of Workflow Management Systems for Grid Computing

    With the advent of Grid and application technologies, scientists and engineers are building more and more complex applications to manage and process large data sets and execute scientific experiments on distributed resources. Such application scenarios require means for composing and executing complex workflows. Therefore, many efforts have been made towards the development of workflow management systems for Grid computing. In this paper, we propose a taxonomy that characterizes and classifies various approaches for building and executing workflows on Grids. We also survey several representative Grid workflow systems developed by various projects world-wide to demonstrate the comprehensiveness of the taxonomy. The taxonomy not only highlights the design and engineering similarities and differences of state-of-the-art Grid workflow systems, but also identifies the areas that need further research. Comment: 29 pages, 15 figures

    Heterogeneous hierarchical workflow composition

    Workflow systems promise scientists an automated end-to-end path from hypothesis to discovery. However, expecting any single workflow system to deliver such a wide range of capabilities is impractical. A more practical solution is to compose the end-to-end workflow from more than one system. With this goal in mind, the integration of task-based and in situ workflows is explored, where the result is a hierarchical heterogeneous workflow composed of subworkflows, with different levels of the hierarchy using different programming, execution, and data models. Materials science use cases demonstrate the advantages of such heterogeneous hierarchical workflow composition. This work is a collaboration between Argonne National Laboratory and the Barcelona Supercomputing Center within the Joint Laboratory for Extreme-Scale Computing. This research is supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under contract number DE-AC02-06CH11357, program manager Laura Biven, and by the Spanish Government (SEV2015-0493), the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), and the Generalitat de Catalunya (contract 2014-SGR-1051).
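    A rough Python sketch of the composition idea (all names are hypothetical, not from the paper): the outer task-based workflow treats each subworkflow, including an in situ one, as a single node, so every level of the hierarchy can keep its own programming, execution, and data model.

```python
# Hypothetical sketch of a hierarchical heterogeneous workflow: the outer
# workflow only sequences subworkflows and passes data between them; how each
# subworkflow runs internally (task-based, in situ, ...) is its own concern.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class SubWorkflow:
    name: str
    model: str                           # e.g. "in-situ" or "task-based"
    run: Callable[[Dict], Dict]          # executes under its own runtime

@dataclass
class HierarchicalWorkflow:
    stages: List[SubWorkflow] = field(default_factory=list)

    def execute(self, data: Dict) -> Dict:
        for stage in self.stages:        # outer level: simple task sequencing
            data = stage.run(data)
        return data

# Usage: compose an in situ simulation/analysis stage with a task-based
# post-processing stage, in the spirit of the materials-science use cases.
wf = HierarchicalWorkflow([
    SubWorkflow("simulate+analyze", "in-situ", lambda d: {**d, "frames": 10}),
    SubWorkflow("post-process", "task-based", lambda d: {**d, "report": True}),
])
print(wf.execute({}))
```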

    Distributed data mining in grid computing environments

    The computing-intensive data mining of inherently Internet-wide distributed data, referred to as Distributed Data Mining (DDM), calls for the support of a powerful Grid with an effective scheduling framework. DDM often follows the computing paradigm of local processing and global synthesizing. It involves every phase of the Data Mining (DM) process, which makes the workflow of DDM very complex; it can be modelled only by a Directed Acyclic Graph (DAG) with multiple data entries. Motivated by the need for a practical solution to the Grid scheduling problem for the DDM workflow, this paper proposes a novel two-phase scheduling framework, comprising External Scheduling and Internal Scheduling, on a two-level Grid architecture (InterGrid, IntraGrid). Currently, a DM IntraGrid, named DMGCE (Data Mining Grid Computing Environment), has been developed with a dynamic scheduling framework for competitive DAGs in a heterogeneous computing environment. This system is implemented in an established Multi-Agent System (MAS) environment, in which the reuse of existing DM algorithms is achieved by encapsulating them into agents. Practical classification problems from oil well logging analysis are used to measure the system performance. The detailed experimental procedure and result analysis are also discussed in this paper.
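    The following is a hedged Python sketch (hypothetical names, not the DMGCE implementation) of the two-phase idea: an external scheduler places whole DDM sub-workflows onto IntraGrids, and each IntraGrid then releases the tasks of its DAG, which may have several data-entry tasks, as their inputs become available.

```python
# Sketch of a two-phase scheduling framework on a two-level Grid.
def external_schedule(subworkflows, intragrids, load):
    """Phase 1 (InterGrid): map each DDM sub-workflow to the least-loaded
    IntraGrid. subworkflows: name -> estimated work; load: IntraGrid -> load."""
    placement = {}
    for wf, cost in subworkflows.items():
        target = min(intragrids, key=lambda g: load[g])
        placement[wf] = target
        load[target] += cost
    return placement

def internal_schedule(dag_tasks, deps):
    """Phase 2 (within one IntraGrid): release tasks whose predecessors are
    done; a DAG with multiple data entries simply has several ready tasks at
    the start. Assumes deps describes an acyclic graph."""
    done, order = set(), []
    while len(done) < len(dag_tasks):
        ready = [t for t in dag_tasks
                 if t not in done and all(p in done for p in deps.get(t, []))]
        order.extend(ready)
        done.update(ready)
    return order
```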

    Context-Aware Information Retrieval for Enhanced Situation Awareness

    In the coalition forces, users are increasingly challenged with the issues of information overload and correlation of information from heterogeneous sources. Users might need different pieces of information, ranging from information about a single building to the resolution strategy of a global conflict. Sometimes, the time, location and past history of information access can also shape the information needs of users. Information systems need to help users pull together data from disparate sources according to their expressed needs (as represented by system queries), as well as less specific criteria. Information consumers have varying roles, tasks/missions, goals and agendas, knowledge and background, and personal preferences. These factors can be used to shape both the execution of user queries and the form in which retrieved information is packaged. However, full automation of this daunting information aggregation and customization task is not possible with existing approaches. In this paper we present an infrastructure for context-aware information retrieval to enhance situation awareness. The infrastructure provides each user with a customized, mission-oriented system that gives access to the right information from heterogeneous sources in the context of a particular task, plan and/or mission. The approach rests on five intertwined fundamental concepts, namely Workflow, Context, Ontology, Profile and Information Aggregation. The exploitation of this knowledge, using appropriate domain ontologies, will make it feasible to provide contextual assistance in various ways to the work performed according to a user's task-relevant information requirements. This paper formalizes these concepts and their interrelationships.
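    As a minimal illustration of how profile and task context could shape a query before it is sent to heterogeneous sources (all field names are hypothetical; the paper's infrastructure combines Workflow, Context, Ontology, Profile and Information Aggregation in a far richer way), a short Python sketch:

```python
# Illustrative only: augment a user query with role- and task-derived terms
# and with context-derived filters before dispatching it to data sources.
def contextualize_query(query_terms, profile, context):
    """profile: user role and preferences; context: active task, location, time."""
    terms = list(query_terms)
    terms += profile.get("role_terms", [])     # shaped by the user's role
    terms += context.get("task_terms", [])     # shaped by the current mission task
    filters = {
        "area_of_interest": context.get("location"),
        "recency_hours": profile.get("max_age_hours", 24),
    }
    return {"terms": terms, "filters": filters}

# Usage
print(contextualize_query(
    ["bridge", "status"],
    {"role_terms": ["engineering"], "max_age_hours": 6},
    {"task_terms": ["route clearance"], "location": "sector B"},
))
```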