5 research outputs found

    A scientific workflow framework for scientific data querying and processing

    We are at the beginning of the new era of "e-science". Researchers in many areas of science, especially in astrophysics, physics, climatology, and biology, now face tremendous increases in data volumes, as well as in the corresponding data analysis tools. These growing data volumes and tools demand a better framework to manage the new-generation scientific research cycle, from data capture and data curation to data analysis, data query, and data visualization. Scientific workflows are proving to be one of the key technologies for scientists to formalize and structure complex scientific processes to enable and accelerate many significant scientific discoveries. Although several scientific workflow management systems (SWFMSs) have been developed, a formal scientific workflow composition framework, in which workflows and constructs can be composed arbitrarily to process and query collectional scientific data sets, has yet to be proposed. In this thesis, I make several contributions towards formalizing a scientific workflow composition framework. First, we proposed a dataflow-based scientific workflow composition model, including a scientific workflow model that separates the declaration of the workflow interface from the definition of its functional body, and a set of workflow constructs, including Map, Reduce, Tree, Loop, Conditional, and Curry, which are fully compositional with one another. Our workflow composition framework is unique in that workflows are the only operands for composition; in this way, our approach elegantly solves the two-world problem in existing composition frameworks, in which composition needs to deal with both the world of tasks and the world of workflows. Second, we formalized a collection-oriented data model, called the collectional data model, to model hierarchical collection-oriented scientific data, and a set of well-defined operators to manipulate and query such data. 
To the best of our knowledge, this is the first algebraic approach to modeling collection-oriented scientific data. Finally, we developed a prototype scientific workflow management system, called View. The View system implemented the above techniques in its subsystems and integrated them within a service-oriented architecture.
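The idea that workflows are the only operands for composition can be illustrated with a minimal Python sketch. It is not the View system's implementation: the `Workflow` class and the `map_construct`, `reduce_construct`, and `curry_construct` helpers are hypothetical names chosen for illustration, and only three of the six constructs named in the abstract are shown.

```python
from dataclasses import dataclass
from functools import reduce as fold
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Workflow:
    """Hypothetical workflow: a declared interface kept apart from its body."""
    inputs: Tuple[str, ...]      # interface: names of the input ports
    body: Callable[..., Any]     # functional body

    def run(self, *args: Any) -> Any:
        if len(args) != len(self.inputs):
            raise TypeError(f"expected {len(self.inputs)} inputs, got {len(args)}")
        return self.body(*args)


def map_construct(w: Workflow) -> Workflow:
    """Map: lift a one-input workflow so that it processes a collection."""
    return Workflow(("items",), lambda items: [w.run(x) for x in items])


def reduce_construct(w: Workflow, initial: Any) -> Workflow:
    """Reduce: fold a two-input workflow over a collection."""
    return Workflow(("items",), lambda items: fold(w.run, items, initial))


def curry_construct(w: Workflow, value: Any) -> Workflow:
    """Curry: fix the first input, giving a workflow with one fewer input."""
    return Workflow(w.inputs[1:], lambda *rest: w.run(value, *rest))


# Every construct takes workflows and returns a workflow, so they nest freely.
scale = Workflow(("factor", "x"), lambda factor, x: factor * x)
add = Workflow(("a", "b"), lambda a, b: a + b)

double = curry_construct(scale, 2)         # workflow with the single input "x"
double_all = map_construct(double)         # Map applied to a curried workflow
nested = map_construct(double_all)         # Map applied to a Map
total = reduce_construct(add, 0)

doubled = double_all.run([1, 2, 3])
summed = total.run(doubled)
nested_result = nested.run([[1], [2, 3]])
```

Because each construct returns an ordinary `Workflow`, there is no separate notion of a task in this sketch, which is the property the abstract describes as avoiding the two-world problem.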

    Improving Usability And Scalability Of Big Data Workflows In The Cloud

    Big data workflows have recently emerged as the next generation of data-centric workflow technologies to address the five “V” challenges of big data: volume, variety, velocity, veracity, and value. More formally, a big data workflow is the computerized modeling and automation of a process consisting of a set of computational tasks and their data interdependencies to process and analyze data of ever-increasing scale, complexity, and rate of acquisition. The convergence of big data and workflows creates new challenges for the workflow community. First, the variety of big data results in a need to integrate a large number of remote Web services and other heterogeneous task components, which can consume and produce data in various formats and models, into a uniform and interoperable workflow. Existing approaches fall short: they address the so-called shimming problem only in an ad hoc manner and are unable to provide a generic solution. We automatically insert pieces of code, called shims or adaptors, to resolve the data type mismatches. Second, the volume of big data results in a large number of datasets that need to be queried and analyzed in an effective and personalized manner. Further, there is also a strong need for sharing, reusing, and repurposing existing tasks and workflows across different users and institutes. To overcome such limitations, we propose a folksonomy-based social workflow recommendation system to improve workflow design productivity and to support efficient dataset querying and analysis. Third, the volume of big data results in the need to process and analyze data of ever-increasing scale, complexity, and rate of acquisition. However, a scalable distributed data model that abstracts and automates data distribution, parallelism, and scalable processing is still missing. We propose a NoSQL collectional data model that addresses this limitation. 
Finally, the volume of big data, combined with the unbounded resource-leasing capability foreseen in the cloud, enables data scientists to wring actionable insights from the data in a time- and cost-efficient manner. We propose the BARENTS scheduler, which supports high-performance workflow scheduling in a heterogeneous cloud-computing environment with the single objective of minimizing the workflow makespan under a user-provided budget constraint.
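The shimming step mentioned in this abstract, inserting an adaptor wherever one task's output type does not match the next task's input type, can be sketched roughly as follows. This is an illustration under stated assumptions, not the dissertation's algorithm: the `Task` class, the `SHIMS` registry, the string type labels, and `insert_shims` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Task:
    """Hypothetical task with declared input and output data types."""
    name: str
    in_type: str
    out_type: str
    fn: Callable[[Any], Any]


# Hypothetical shim registry keyed by (produced type, expected type).
SHIMS: Dict[Tuple[str, str], Callable[[Any], Any]] = {
    ("csv", "rows"): lambda text: [line.split(",") for line in text.splitlines()],
}


def insert_shims(tasks: List[Task], shims: Dict[Tuple[str, str], Callable]) -> List[Task]:
    """Return a new pipeline with a shim task placed at every type mismatch."""
    result: List[Task] = []
    for task in tasks:
        if result and result[-1].out_type != task.in_type:
            key = (result[-1].out_type, task.in_type)
            if key not in shims:
                raise TypeError(f"no shim registered for {key[0]} -> {key[1]}")
            result.append(Task(f"shim:{key[0]}->{key[1]}", key[0], key[1], shims[key]))
        result.append(task)
    return result


def run(tasks: List[Task], data: Any) -> Any:
    """Execute the linear pipeline by feeding each output to the next task."""
    for task in tasks:
        data = task.fn(data)
    return data


# A producer of CSV text followed by a consumer that expects parsed rows.
fetch = Task("fetch", "none", "csv", lambda _: "a,1\nb,2")
count = Task("count", "rows", "int", len)

pipeline = insert_shims([fetch, count], SHIMS)
names = [t.name for t in pipeline]
row_count = run(pipeline, None)
```

The sketch handles only linear pipelines and exact type-label matches; a generic solution of the kind the abstract calls for would also need to cover branching workflows and richer type systems.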

    Querying and managing opm-compliant scientific workflow provenance

    Provenance, the metadata that records the derivation history of scientific results, is important in scientific workflows for interpreting, validating, and analyzing the results of scientific computing. Recently, to promote and facilitate interoperability among heterogeneous provenance systems, the Open Provenance Model (OPM) has been proposed and has played an important role in the community. In this dissertation, to efficiently query and manage OPM-compliant provenance, we first propose a provenance collection framework that collects both prospective provenance, which captures an abstract workflow specification as a recipe for future data derivation, and retrospective provenance, which captures past workflow execution and data derivation information. We then propose a relational database-based provenance system, called OPMPROV, that stores, reasons over, and queries prospective and retrospective provenance in OPM-compliant form. We finally propose OPQL, an OPM-level provenance query language that is directly defined over the OPM model. An OPQL query takes an OPM graph as input and produces an OPM graph as output; therefore, OPQL queries are not tightly coupled to the underlying provenance storage strategies. Our provenance store, provenance collection framework, and provenance query language feature native support for the OPM model.
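The graph-in, graph-out property described for OPQL can be illustrated with a small Python sketch. It does not use OPQL syntax and is not OPMPROV code: the `OPMGraph` class and the `lineage` operator are hypothetical, and only the OPM edge types `used` and `wasGeneratedBy` appear in the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# An edge points from an effect to its cause, as OPM dependencies do.
Edge = Tuple[str, str, str]  # (effect, relation, cause)


@dataclass(frozen=True)
class OPMGraph:
    """Hypothetical in-memory provenance graph."""
    nodes: FrozenSet[str]
    edges: FrozenSet[Edge]


def lineage(graph: OPMGraph, start: str) -> OPMGraph:
    """Return the subgraph of everything `start` transitively depends on."""
    if start not in graph.nodes:
        return OPMGraph(frozenset(), frozenset())
    seen = {start}
    kept = set()
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for edge in graph.edges:
            effect, _relation, cause = edge
            if effect == current:
                kept.add(edge)
                if cause not in seen:
                    seen.add(cause)
                    frontier.append(cause)
    return OPMGraph(frozenset(seen), frozenset(kept))


# raw --used--> align --generates--> aligned --used--> plot --generates--> figure
g = OPMGraph(
    nodes=frozenset({"raw", "align", "aligned", "plot", "figure"}),
    edges=frozenset({
        ("align", "used", "raw"),
        ("aligned", "wasGeneratedBy", "align"),
        ("plot", "used", "aligned"),
        ("figure", "wasGeneratedBy", "plot"),
    }),
)

aligned_lineage = lineage(g, "aligned")
# The result is itself a graph, so one query can feed another.
composed = lineage(lineage(g, "figure"), "aligned")
```

Since the operator only sees and returns `OPMGraph` values, it is independent of how the provenance is stored, which mirrors the decoupling from storage strategies that the abstract attributes to OPQL.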

    Actes du Symposium International - Le livre, la Roumanie, l’Europe / Proceedings of the International Symposium Books, Romania, Europe - 5ème édition 24-26 septembre 2012

    Tome 2 des actes du Symposium International "Le livre, la Roumanie, l'Europe" qui s'est tenu les 24, 25 et 26 septembre 2012 à Mamaia, Roumanie, organisé par la Bibliothèque Métropolitaine de Bucarest. / Volume 2 of the Proceedings of the International Symposium "Books, Romania, Europe" held on 24, 25 and 26 September 2012 in Mamaia, Romania, organized by the Bucharest Metropolitan Library. Textes réunis et présentés par / Texts collected and presented by: Réjean Savard, Chantal Stanescu, Hermina G.B. Anghelescu, Cristina Io