A scientific workflow framework for scientific data querying and processing

Author: Fei Xubo
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2011

We are at the beginning of the new era of "e-science". Researchers in many areas of science, especially astrophysics, physics, climatology and biology, are now facing tremendous increases in data volumes, as well as in the corresponding data analysis tools. These growing data sets and tools demand a better framework to manage the new-generation scientific research cycle, from data capture and data curation to data analysis, data querying and data visualization. Scientific workflows are proving to be one of the key technologies that let scientists formalize and structure complex scientific processes, enabling and accelerating many significant scientific discoveries. Although several scientific workflow management systems (SWFMSs) have been developed, a formal scientific workflow composition framework, in which workflows and constructs can be composed arbitrarily to process and query collectional scientific data sets, has yet to be proposed. In this thesis, I make several contributions towards formalizing a scientific workflow composition framework. First, we proposed a dataflow-based scientific workflow composition model, consisting of a scientific workflow model that separates the declaration of the workflow interface from the definition of its functional body, and a set of workflow constructs, including Map, Reduce, Tree, Loop, Conditional, and Curry, which are fully compositional with one another. Our workflow composition framework is unique in that workflows are the only operands for composition; in this way, our approach elegantly solves the two-world problem in existing composition frameworks, in which composition needs to deal with both the world of tasks and the world of workflows. Second, we formalized a collection-oriented data model, called the collectional data model, to model hierarchical collection-oriented scientific data, and a set of well-defined operators to manipulate and query such data.
To the best of our knowledge, this is the first algebraic approach to modeling collection-oriented scientific data. Finally, we developed a prototype scientific workflow management system, called View. The View system implements the above techniques in its subsystems and integrates them within a service-oriented architecture.
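A minimal sketch of the composition idea described above, assuming a workflow can be modeled as a named function whose interface (the name) is kept separate from its functional body. Every construct takes workflows and returns a workflow, so constructs nest freely and tasks never appear as a second kind of operand. The class and function names (Workflow, seq, map_wf, reduce_wf, conditional_wf, loop_wf, curry_wf) are hypothetical and are not the View system's API; the Tree construct is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Workflow:
    """Hypothetical workflow: an interface (name) kept separate from its body."""
    name: str
    body: Callable[[Any], Any]

    def __call__(self, x: Any) -> Any:
        return self.body(x)


def seq(first: Workflow, second: Workflow) -> Workflow:
    """Sequential composition: feed first's output into second."""
    return Workflow(f"{first.name};{second.name}", lambda x: second(first(x)))


def map_wf(w: Workflow) -> Workflow:
    """Map: apply w to every element of a collection."""
    return Workflow(f"Map({w.name})", lambda xs: [w(x) for x in xs])


def reduce_wf(w: Workflow, init: Any) -> Workflow:
    """Reduce: fold a collection with a workflow that takes (accumulator, item)."""
    def body(xs):
        acc = init
        for x in xs:
            acc = w((acc, x))
        return acc
    return Workflow(f"Reduce({w.name})", body)


def conditional_wf(test: Workflow, then_w: Workflow, else_w: Workflow) -> Workflow:
    """Conditional: route the input to one of two workflows."""
    return Workflow(f"If({test.name})", lambda x: then_w(x) if test(x) else else_w(x))


def loop_wf(test: Workflow, body_w: Workflow) -> Workflow:
    """Loop: re-run body_w on its own output while test holds."""
    def body(x):
        while test(x):
            x = body_w(x)
        return x
    return Workflow(f"Loop({body_w.name})", body)


def curry_wf(w: Workflow, fixed: Any) -> Workflow:
    """Curry: fix the first input of a two-input workflow."""
    return Workflow(f"Curry({w.name})", lambda y: w((fixed, y)))


# Primitive workflows standing in for real scientific tasks.
square = Workflow("square", lambda x: x * x)
add = Workflow("add", lambda p: p[0] + p[1])
is_big = Workflow("is_big", lambda x: x > 10)

# Constructs nest freely because every operand is itself a Workflow.
sum_of_squares = seq(map_wf(square), reduce_wf(add, 0))
clip = conditional_wf(is_big, Workflow("cap", lambda x: 10), Workflow("id", lambda x: x))
add5 = curry_wf(add, 5)
double_until_big = loop_wf(Workflow("small", lambda x: x <= 10),
                           Workflow("double", lambda x: 2 * x))

print(sum_of_squares([1, 2, 3]))   # 14
print(map_wf(clip)([3, 12]))       # [3, 10]
print(add5(2))                     # 7
print(double_until_big(3))         # 12
```

Treating primitive tasks as workflows too is the design choice that keeps every construct closed over a single operand type.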

Digital Commons@Wayne State University

Improving Usability And Scalability Of Big Data Workflows In The Cloud

Author: Mohan Aravind
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2017

Big data workflows have recently emerged as the next generation of data-centric workflow technologies to address the five "V" challenges of big data: volume, variety, velocity, veracity, and value. More formally, a big data workflow is the computerized modeling and automation of a process consisting of a set of computational tasks and their data interdependencies, used to process and analyze data of ever-increasing scale, complexity, and rate of acquisition. The convergence of big data and workflows creates new challenges for the workflow community. First, the variety of big data creates a need to integrate a large number of remote Web services and other heterogeneous task components, which consume and produce data in various formats and models, into a uniform and interoperable workflow. Existing approaches address this so-called shimming problem only in an ad hoc manner and cannot provide a generic solution. We automatically insert pieces of code, called shims or adaptors, to resolve data type mismatches. Second, the volume of big data results in a large number of datasets that need to be queried and analyzed in an effective and personalized manner. There is also a strong need to share, reuse, and repurpose existing tasks and workflows across different users and institutions. To overcome these limitations, we propose a folksonomy-based social workflow recommendation system that improves workflow design productivity and supports efficient dataset querying and analysis. Third, the volume of big data creates the need to process and analyze data of ever-increasing scale, complexity, and rate of acquisition, yet a scalable distributed data model that abstracts and automates data distribution, parallelism, and scalable processing is still missing. We propose a NoSQL collectional data model that addresses this limitation.
Finally, the volume of big data, combined with the unbounded resource-leasing capability foreseen in the cloud, enables data scientists to wring actionable insights from the data in a time- and cost-efficient manner. We propose the BARENTS scheduler, which supports high-performance workflow scheduling in a heterogeneous cloud-computing environment with a single objective: minimizing the workflow makespan under a user-provided budget constraint.
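The shim-insertion step described in the abstract above can be pictured as a search over a registry of type converters. This is a hypothetical sketch, not the thesis's algorithm: the Task type, the SHIMS registry and its data-type names are invented for illustration, and a breadth-first search picks the shortest chain of shims between each mismatched pair of adjacent tasks.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Task:
    """Hypothetical task component with one input type and one output type."""
    name: str
    in_type: str
    out_type: str


# Hypothetical shim registry: (from_type, to_type) -> shim name.
SHIMS: Dict[Tuple[str, str], str] = {
    ("CSV", "Table"): "csv_to_table",
    ("Table", "JSON"): "table_to_json",
    ("XML", "Table"): "xml_to_table",
}


def find_shim_chain(src: str, dst: str) -> List[str]:
    """Breadth-first search for the shortest chain of shims converting src to dst."""
    if src == dst:
        return []  # types already match, so no shim is needed
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        current, path = queue.popleft()
        for (a, b), shim in SHIMS.items():
            if a == current and b not in seen:
                new_path = path + [shim]
                if b == dst:
                    return new_path
                seen.add(b)
                queue.append((b, new_path))
    raise ValueError(f"no shim chain from {src} to {dst}")


def insert_shims(pipeline: List[Task]) -> List[str]:
    """Return step names with shims inserted wherever adjacent types mismatch."""
    steps = [pipeline[0].name]
    for prev, nxt in zip(pipeline, pipeline[1:]):
        steps += find_shim_chain(prev.out_type, nxt.in_type)
        steps.append(nxt.name)
    return steps


pipeline = [Task("fetch", "URL", "CSV"), Task("publish", "JSON", "HTML")]
print(insert_shims(pipeline))
# ['fetch', 'csv_to_table', 'table_to_json', 'publish']
```

Because the search is breadth-first, the fewest shims are inserted; a real system would also have to weigh lossy or expensive conversions, which this sketch ignores.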

Digital Commons@Wayne State University

Querying and managing opm-compliant scientific workflow provenance

Author: Lim Chunhyeok
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2012

Provenance, the metadata that records the derivation history of scientific results, is important in scientific workflows for interpreting, validating, and analyzing the results of scientific computing. Recently, to promote and facilitate interoperability among heterogeneous provenance systems, the Open Provenance Model (OPM) has been proposed and has played an important role in the community. In this dissertation, to efficiently query and manage OPM-compliant provenance, we first propose a provenance collection framework that collects both prospective provenance, which captures an abstract workflow specification as a recipe for future data derivation, and retrospective provenance, which captures past workflow executions and data derivation information. We then propose a relational database-based provenance system, called OPMPROV, that stores, reasons over, and queries OPM-compliant prospective and retrospective provenance. We finally propose OPQL, an OPM-level provenance query language that is defined directly over the OPM model. An OPQL query takes an OPM graph as input and produces an OPM graph as output; therefore, OPQL queries are not tightly coupled to the underlying provenance storage strategies. Our provenance store, provenance collection framework, and provenance query language all feature native support for the OPM model.
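The graph-in, graph-out property of OPQL described above is what makes its queries composable and independent of storage. The sketch below illustrates that property only; it is not OPQL syntax or OPMPROV code. It assumes a hypothetical OPMGraph type whose edges follow OPM's effect-to-cause direction (for example used and wasGeneratedBy) and implements a lineage query that returns the subgraph a given artifact depends on.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

# (effect, relation, cause): OPM edges point from the effect to its cause.
Edge = Tuple[str, str, str]


@dataclass
class OPMGraph:
    """Hypothetical in-memory OPM graph: artifact/process nodes plus typed edges."""
    nodes: Set[str] = field(default_factory=set)
    edges: Set[Edge] = field(default_factory=set)


def lineage(g: OPMGraph, start: str) -> OPMGraph:
    """Graph in, graph out: the subgraph of everything `start` causally depends on."""
    result = OPMGraph(nodes={start})
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for effect, rel, cause in g.edges:
            if effect == node:
                result.edges.add((effect, rel, cause))
                if cause not in result.nodes:
                    result.nodes.add(cause)
                    frontier.append(cause)
    return result


g = OPMGraph(
    nodes={"raw.csv", "clean", "clean.csv", "plot", "fig.png", "notes.txt"},
    edges={
        ("clean", "used", "raw.csv"),
        ("clean.csv", "wasGeneratedBy", "clean"),
        ("plot", "used", "clean.csv"),
        ("fig.png", "wasGeneratedBy", "plot"),
    },
)

sub = lineage(g, "fig.png")
print(sorted(sub.nodes))
# ['clean', 'clean.csv', 'fig.png', 'plot', 'raw.csv']

# The result is itself an OPMGraph, so it can be fed into another query.
print(sorted(lineage(sub, "clean.csv").nodes))
# ['clean', 'clean.csv', 'raw.csv']
```

The query only ever sees the graph, never a table layout or storage backend, which mirrors the decoupling the abstract attributes to OPQL.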

Digital Commons@Wayne State University


Intensely distributed nanoscience: co-ordinating scientific work in a large multi-sited cross-disciplinary nanomedical project

Author: Roubert Francois
Publication venue: Brunel University London
Publication date: 01/01/2017

This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. This thesis is concerned with the study of biomedical scientific research work that is intensely distributed, i.e. socially distributed across multiple institutions, sites, and disciplines. Specifically, this PhD probes the ways in which scientists co-operating on multi-sited cross-disciplinary projects design, use and maintain information-based resources to conduct and co-ordinate their experimental activities. The research focuses on the roles of information artefacts, i.e. the tools, media and devices used to store, track, display, and retrieve information in paper or electronic format, in helping the scientists integrate their activities to achieve concerted action. To examine how scientists in globally distributed settings organise and co-ordinate their scientific work using information artefacts, a multi-method, multi-sited study informed by different ethnographic perspectives was conducted, focused on a large European cross-disciplinary translational research project in nanodiagnostics. Situated interviews with project scientists, participant observations and participatory learning exercises were designed and deployed. From the data analysis, several abstractions were developed to represent how the joint use of key information artefacts supports the co-ordination of experimental activities. Subsequently, a framework was developed to highlight key interactional strategies that experimenters need to manage when using artefacts to organise their work co-operatively. This framework was then used as a guiding device to identify innovative ways to design future digital interactive systems that support the co-ordination of intensely distributed scientific work. Several key findings emerged from this study.
We identify that the experimental protocol acts as a co-ordinative map that is co-designed dynamically to disseminate various instantiations of experimental executions across sites. We also shed light on the ways the protocol, the lab book and the material log are used jointly to support the articulation of scientific work. The protocol and the lab book are used both locally and across co-operating sites to support four levels of repeatability and reproducibility that are key to experimental validation. The use of the local protocol / lab book dyad at each site is further integrated with that of a centralised material log artefact to enable a system of exchange of scientific content (e.g. experimental processes, intermediate results and observations) and experimental materials (both physical materials and key information). We found that this integration into a co-ordinative cluster supports awareness and the articulation of experimental activities both locally and across remote labs. From this understanding, we derived several sensitising tensions that frame the strategies scientific practitioners need to manage when designing their multi-sited experimental work, and that technologists should consider when designing systems to support them: (1) formalisation / flexibility; (2) articulability / local appropriateness; (3) scrutiny / tinkering; (4) accountability / applicability; (5) traceability / improvisation; and (6) lastingness / immediacy. Lastly, based on these tensions, we suggest a number of implications for the design of interactive information artefacts that can help manage both local and multi-sited co-ordination in intensely distributed scientific projects.

Brunel University Research Archive

Actes du Symposium International - Le livre, la Roumanie, l’Europe / Proceedings of the International Symposium Books, Romania, Europe - 5ème édition 24-26 septembre 2012

Author: Anghelescu Hermina G.B.
Bajjaly Stephen
Bats Raphaëlle
Bernaoui Radia
Bokhonskaya Elena
Braham Hager
Cantau Alina
cstanescu
de Miribel Marielle
Doncque Marie-Paule
Epron Benoît
Hassoun Mohamed
Ion Cristina
Kniffel Leonard
Levreaud Philippe
Marian Koren
Marinescu Luiza
Masson Helmut
Nazare Daniel
Pepene Nicolae
Savard Réjean
Savova Julia
Stanescu Chantal
Svenbro Anna
Wigell-Ryynänen Barbro
Publication venue: Université de Bucarest

Volume 2 of the Proceedings of the International Symposium "Books, Romania, Europe", held on 24, 25 and 26 September 2012 in Mamaia, Romania, organized by the Bucharest Metropolitan Library. Texts collected and presented by: Réjean Savard, Chantal Stanescu, Hermina G.B. Anghelescu, Cristina Io

Bibliothèque numérique de l'enssib