5 research outputs found
A scientific workflow framework for scientific data querying and processing
We are at the beginning of the new era of "e-science". Researchers in many areas of science, especially astrophysics, physics, climatology and biology, now face tremendous increases in data volumes, as well as in the corresponding data analysis tools. These growing data volumes and tools demand a better framework for managing the new-generation scientific research cycle, from data capture and data curation to data analysis, data query and data visualization. Scientific workflows are proving to be one of the key technologies for scientists to formalize and structure complex scientific processes, enabling and accelerating many significant scientific discoveries. Although several scientific workflow management systems (SWFMSs) have been developed, a formal scientific workflow composition framework, in which workflows and constructs can be composed arbitrarily to process and query collectional scientific data sets, has yet to be proposed.
In this thesis, I make several contributions towards formalizing a scientific workflow composition framework. First, we proposed a dataflow-based scientific workflow composition model, comprising a scientific workflow model that separates the declaration of the workflow interface from the definition of its functional body, and a set of workflow constructs, including Map, Reduce, Tree, Loop, Conditional, and Curry, which are fully compositional with one another. Our workflow composition framework is unique in that workflows are the only operands for composition; in this way, our approach elegantly solves the two-world problem in existing composition frameworks, in which composition needs to deal with both the world of tasks and the world of workflows. Second, we formalized a collection-oriented data model, called the collectional data model, to model hierarchical collection-oriented scientific data, together with a set of well-defined operators to manipulate and query such data. To the best of our knowledge, this is the first algebraic approach to modeling collection-oriented scientific data. Finally, we developed a prototype scientific workflow management system, called View. The View system implements the above techniques in its subsystems and integrates them within a service-oriented architecture.
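The idea that workflows are the only operands of composition can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Workflow` class, the `primitive`, `map_construct`, `reduce_construct` and `pipe` helpers, and the dictionary-based interface are hypothetical and are not the View system's API. The sketch only shows an interface declared separately from a body, and constructs that take workflows and return workflows.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Workflow:
    # Interface: declared separately from the body.
    inputs: tuple
    outputs: tuple
    # Body: a callable mapping an input dict to an output dict.
    body: Callable[[dict], dict]

    def run(self, **kwargs):
        missing = set(self.inputs) - set(kwargs)
        if missing:
            raise TypeError(f"missing inputs: {sorted(missing)}")
        return self.body(kwargs)


def primitive(fn, inputs, outputs):
    """Wrap a plain function as a workflow, so tasks never appear as operands."""
    def body(args):
        return {outputs[0]: fn(*(args[name] for name in inputs))}
    return Workflow(tuple(inputs), tuple(outputs), body)


def map_construct(w):
    """Map: apply a one-input, one-output workflow to each element of a collection."""
    (i,), (o,) = w.inputs, w.outputs
    def body(args):
        return {o: [w.run(**{i: x})[o] for x in args[i]]}
    return Workflow(w.inputs, w.outputs, body)


def reduce_construct(w, initial):
    """Reduce: fold a two-input workflow over a collection named 'items'."""
    acc_name, item_name = w.inputs
    (o,) = w.outputs
    def body(args):
        acc = initial
        for x in args["items"]:
            acc = w.run(**{acc_name: acc, item_name: x})[o]
        return {o: acc}
    return Workflow(("items",), w.outputs, body)


def pipe(first, second):
    """Sequential composition: first's single output feeds second's single input."""
    (mid,) = first.outputs
    (inp,) = second.inputs
    def body(args):
        return second.run(**{inp: first.run(**args)[mid]})
    return Workflow(first.inputs, second.outputs, body)


square = primitive(lambda x: x * x, ["x"], ["y"])
add = primitive(lambda a, b: a + b, ["a", "b"], ["total"])

# Every operand below is a Workflow, and so is the result.
sum_of_squares = pipe(map_construct(square), reduce_construct(add, 0))
result = sum_of_squares.run(x=[1, 2, 3])
```

Because each construct returns a `Workflow`, the result of one composition can be passed to any other construct without a separate notion of "task".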
Improving Usability And Scalability Of Big Data Workflows In The Cloud
Big data workflows have recently emerged as the next generation of data-centric workflow technologies to address the five "V" challenges of big data: volume, variety, velocity, veracity, and value. More formally, a big data workflow is the computerized modeling and automation of a process consisting of a set of computational tasks and their data interdependencies to process and analyze data of ever-increasing scale, complexity, and rate of acquisition. The convergence of big data and workflows creates new challenges in the workflow community.
First, the variety of big data results in a need to integrate a large number of remote Web services and other heterogeneous task components, which can consume and produce data in various formats and models, into a uniform and interoperable workflow. Existing approaches address the so-called shimming problem only in an ad hoc manner and are unable to provide a generic solution. We automatically insert pieces of code, called shims or adaptors, to resolve the data type mismatches.
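A minimal sketch of automatic shim insertion, under stated assumptions: the `Task` class, the `SHIMS` registry and the `connect` function are hypothetical names introduced here, types are plain Python types, and the pipeline is linear. It is not the approach's actual implementation; it only shows a converter being spliced in wherever an upstream output type differs from the downstream input type.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    name: str
    in_type: type
    out_type: type
    fn: Callable


# Hypothetical shim registry: (produced type, expected type) -> converter.
SHIMS = {
    (str, int): int,
    (int, str): str,
    (str, list): lambda s: s.split(","),
}


def connect(tasks):
    """Return a pipeline with a shim task inserted at every type mismatch."""
    pipeline = []
    for task in tasks:
        if pipeline and pipeline[-1].out_type is not task.in_type:
            src, dst = pipeline[-1].out_type, task.in_type
            if (src, dst) not in SHIMS:
                raise TypeError(f"no shim for {src.__name__} -> {dst.__name__}")
            shim_name = f"shim_{src.__name__}_to_{dst.__name__}"
            pipeline.append(Task(shim_name, src, dst, SHIMS[(src, dst)]))
        pipeline.append(task)
    return pipeline


def run(pipeline, value):
    """Execute the tasks in order, passing each output to the next task."""
    for task in pipeline:
        value = task.fn(value)
    return value


read_count = Task("read_count", str, str, lambda s: s.strip())
double = Task("double", int, int, lambda n: n * 2)

pipeline = connect([read_count, double])
names = [t.name for t in pipeline]
output = run(pipeline, " 21 ")
```

The workflow designer lists only `read_count` and `double`; the string-to-integer adaptor is added by `connect`, and an unsupported mismatch fails early with a `TypeError`.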
Second, the volume of big data results in a large number of datasets that need to be queried and analyzed in an effective and personalized manner. Further, there is also a strong need for sharing, reusing, and repurposing existing tasks and workflows across different users and institutes. To overcome such limitations, we propose a folksonomy-based social workflow recommendation system to improve workflow design productivity and to make dataset querying and analysis more efficient.
Third, the volume of big data results in the need to process and analyze data of ever-increasing scale, complexity, and rate of acquisition. However, a scalable distributed data model that abstracts and automates data distribution, parallelism, and scalable processing is still missing. We propose a NoSQL collectional data model that addresses this limitation.
Finally, the volume of big data, combined with the unbounded resource leasing capability foreseen in the cloud, enables data scientists to wring actionable insights from the data in a time- and cost-efficient manner. We propose the BARENTS scheduler, which supports high-performance workflow scheduling in a heterogeneous cloud-computing environment with the single objective of minimizing the workflow makespan under a user-provided budget constraint.
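The scheduling objective, minimum makespan subject to a budget, can be made concrete with a toy example. This sketch is not the BARENTS algorithm, which the abstract does not describe; it is an exhaustive search over a four-task workflow, and the task sizes, the DAG, the VM types and the cost model (runtime multiplied by price, one VM per task) are all assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical inputs: work units per task, task dependencies,
# and VM types given as (speed, price per time unit).
WORK = {"A": 4, "B": 8, "C": 6, "D": 2}
PARENTS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
VM_TYPES = {"small": (1.0, 1.0), "large": (2.0, 3.0)}


def evaluate(assignment):
    """Return (makespan, cost) for a task -> VM type assignment."""
    finish = {}
    cost = 0.0
    for task in WORK:  # insertion order of WORK is a valid topological order
        speed, price = VM_TYPES[assignment[task]]
        runtime = WORK[task] / speed
        start = max((finish[p] for p in PARENTS[task]), default=0.0)
        finish[task] = start + runtime
        cost += runtime * price
    return max(finish.values()), cost


def best_schedule(budget):
    """Try every assignment; keep the fastest one whose cost fits the budget."""
    best = None
    for choice in product(VM_TYPES, repeat=len(WORK)):
        assignment = dict(zip(WORK, choice))
        makespan, cost = evaluate(assignment)
        if cost <= budget and (best is None or makespan < best[0]):
            best = (makespan, cost, assignment)
    return best  # None when no assignment is affordable


plan = best_schedule(budget=24)
```

Exhaustive search grows exponentially with the number of tasks, so it is useful only for stating the objective precisely; a practical scheduler would need a heuristic or approximation.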
Querying and managing OPM-compliant scientific workflow provenance
Provenance, the metadata that records the derivation history of scientific results, is important in scientific workflows for interpreting, validating, and analyzing the results of scientific computing. Recently, to promote and facilitate interoperability among heterogeneous provenance systems, the Open Provenance Model (OPM) has been proposed and has played an important role in the community.
In this dissertation, to efficiently query and manage OPM-compliant provenance, we first propose a provenance collection framework that collects both prospective provenance, which captures an abstract workflow specification as a recipe for future data derivation, and retrospective provenance, which captures past workflow execution and data derivation information. We then propose a relational database-based provenance system, called OPMPROV, that stores, reasons over, and queries prospective and retrospective provenance, both of which are OPM-compliant. We finally propose OPQL, an OPM-level provenance query language that is directly defined over the OPM model. An OPQL query takes an OPM graph as input and produces an OPM graph as output; therefore, OPQL queries are not tightly coupled to the underlying provenance storage strategies. Our provenance store, provenance collection framework, and provenance query language all feature native support for the OPM model.
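The graph-in, graph-out property can be sketched in Python. This is not OPQL syntax, and `OPMGraph`, `Edge`, `lineage` and `restrict` are hypothetical names introduced for illustration. The relation names `used` and `wasGeneratedBy` are OPM dependency types, and edges point from effect to cause as in OPM. Because each operator returns a graph, operators compose freely and make no reference to how the provenance is stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    effect: str    # the node that depends on something
    relation: str  # e.g. "used", "wasGeneratedBy"
    cause: str     # the node it depends on


@dataclass(frozen=True)
class OPMGraph:
    nodes: frozenset
    edges: frozenset


def lineage(graph, target):
    """Return the subgraph of everything `target` transitively depends on."""
    keep, frontier = {target}, [target]
    while frontier:
        node = frontier.pop()
        for e in graph.edges:
            if e.effect == node and e.cause not in keep:
                keep.add(e.cause)
                frontier.append(e.cause)
    edges = frozenset(e for e in graph.edges
                      if e.effect in keep and e.cause in keep)
    return OPMGraph(frozenset(keep), edges)


def restrict(graph, relations):
    """Keep only edges whose relation is in `relations`; nodes are unchanged."""
    return OPMGraph(graph.nodes,
                    frozenset(e for e in graph.edges if e.relation in relations))


# A small, made-up provenance graph: raw -> align -> aligned -> plot -> figure, log.
g = OPMGraph(
    nodes=frozenset({"raw", "align", "aligned", "plot", "figure", "log"}),
    edges=frozenset({
        Edge("align", "used", "raw"),
        Edge("aligned", "wasGeneratedBy", "align"),
        Edge("plot", "used", "aligned"),
        Edge("figure", "wasGeneratedBy", "plot"),
        Edge("log", "wasGeneratedBy", "plot"),
    }),
)

aligned_lineage = lineage(g, "aligned")
# Queries compose because the output of one is a valid input to another.
figure_inputs = restrict(lineage(g, "figure"), {"used"})
```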
Intensely distributed nanoscience: co-ordinating scientific work in a large multi-sited cross-disciplinary nanomedical project
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
This thesis is concerned with the study of biomedical scientific research work that is intensely distributed, i.e. socially distributed across multiple institutions, sites, and disciplines. Specifically, this PhD probes the ways in which scientists co-operating on multi-sited cross-disciplinary projects design, use and maintain information-based resources to conduct and co-ordinate their experimental activities. The research focuses on the roles of information artefacts, i.e. the tools, media and devices used to store, track, display, and retrieve information in paper or electronic format, in helping the scientists integrate their activities to achieve concerted action.
To examine how scientists in globally distributed settings organise and co-ordinate their scientific work using information artefacts, a multi-method multi-sited study informed by different ethnographic perspectives was conducted, focused on a large European cross-disciplinary translational research project in nanodiagnostics. Situated interviews with project scientists, participant observations and participatory learning exercises were designed and deployed. From the data analysis, several abstractions were developed to represent how the joint use of key information artefacts supports the co-ordination of experimental activities. Subsequently, a framework was developed to highlight key interactional strategies that need to be managed by experimenters when using artefacts to organise their work co-operatively. This framework was then used as a guiding device to identify innovative ways to design future digital interactive systems to support the co-ordination of intensely distributed scientific work.
From this study, several key findings came to light. We identify that the experimental protocol acts as a co-ordinative map that is co-designed dynamically to disseminate various instantiations of experimental executions across sites. We have also shed light on the ways the protocol, the lab book and the material log are used jointly to support the articulation of scientific work. The protocol and the lab book are used both locally and across co-operating sites to support four repeatability and reproducibility levels that are key to experimental validation. The use of the local protocol / lab book dyads at each site is further integrated with that of a centralised material log artefact to enable a system of exchange of scientific content (e.g. experimental processes, intermediate results and observations) and experimental materials (both physical materials and key information). We have found that this integration into a co-ordinative cluster supports awareness and the articulation of experimental activities both locally and across remote labs. From this understanding, we have derived several sensitising tensions to frame the strategies that scientific practitioners need to manage when designing their multi-sited experimental work, and that technologists should consider when designing systems to support them: (1) formalisation / flexibility; (2) articulability / local appropriateness; (3) scrutiny / tinkering; (4) accountability / applicability; (5) traceability / improvisation; and (6) lastingness / immediacy. Lastly, based on these tensions, we have suggested a number of implications for the design of interactive information artefacts that can help manage both local and multi-sited co-ordination in intensely distributed scientific projects.
Actes du Symposium International - Le livre, la Roumanie, l’Europe / Proceedings of the International Symposium Books, Romania, Europe - 5ème édition 24-26 septembre 2012
Volume 2 of the Proceedings of the International Symposium "Books, Romania, Europe" ("Le livre, la Roumanie, l'Europe"), held on 24, 25 and 26 September 2012 in Mamaia, Romania, organized by the Bucharest Metropolitan Library.
Texts collected and presented by:
Réjean Savard
Chantal Stanescu
Hermina G.B. Anghelescu
Cristina Io