
    Pingo: A Framework for the Management of Storage of Intermediate Outputs of Computational Workflows

    Scientific workflows allow scientists to model and express all of the steps of a data processing pipeline, typically as a directed acyclic graph (DAG). These workflows are made up of a collection of tasks that usually take a long time to compute and that produce a considerable amount of intermediate data. Because of the nature of scientific exploration, a workflow can be modified and re-run multiple times, and new workflows may be created that make use of past intermediate datasets. Storing intermediate datasets therefore has the potential to save computation time. Since storage is limited, a central problem is determining which intermediate datasets should be saved at creation time in order to minimize the computational time of the workflows to be run in the future. This thesis proposes the design and implementation of Pingo, a system that manages the computation of scientific workflows as well as the storage, provenance and deletion of intermediate datasets. Pingo uses the history of workflows submitted to the system to predict the datasets most likely to be needed in the future, and subjects the decision of dataset deletion to the optimization of the computational time of future workflows. Dissertation/Thesis, Masters Thesis, Computer Science 201
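
    The dataset-retention problem described above can be viewed as a cost-aware cache-eviction decision. The sketch below is a hypothetical illustration of that framing, not Pingo's actual algorithm: each intermediate dataset is scored by its estimated reuse probability (which a system like Pingo would derive from workflow-submission history) multiplied by the recomputation time it would save, per gigabyte of storage it occupies; when space runs out, the lowest-scoring datasets are deleted first. All class names, fields and the scoring formula are assumptions made for illustration.

```python
# Hypothetical sketch of history-driven intermediate-dataset retention;
# not Pingo's actual algorithm.
from dataclasses import dataclass

@dataclass
class IntermediateDataset:
    name: str
    size_gb: float            # storage footprint
    recompute_hours: float    # time to regenerate from upstream tasks
    reuse_probability: float  # estimated from workflow-submission history

def retention_score(d: IntermediateDataset) -> float:
    """Expected compute hours saved per GB of storage kept (illustrative metric)."""
    return (d.reuse_probability * d.recompute_hours) / d.size_gb

def select_for_deletion(datasets, capacity_gb):
    """Keep the highest-scoring datasets that fit; mark the rest for deletion."""
    ranked = sorted(datasets, key=retention_score, reverse=True)
    used, to_delete = 0.0, []
    for d in ranked:
        if used + d.size_gb <= capacity_gb:
            used += d.size_gb
        else:
            to_delete.append(d)
    return to_delete

if __name__ == "__main__":
    pool = [
        IntermediateDataset("alignment", 40.0, 6.0, 0.8),
        IntermediateDataset("raw_filtered", 200.0, 1.0, 0.1),
        IntermediateDataset("features", 15.0, 12.0, 0.6),
    ]
    for d in select_for_deletion(pool, capacity_gb=60.0):
        print("delete:", d.name)   # prints: delete: raw_filtered
```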

    Reusability Challenges of Scientific Workflows: A Case Study for Galaxy

    Scientific workflow has become essential in software engineering because it provides a structured approach to designing, executing, and analyzing scientific experiments. Software developers and researchers have developed hundreds of scientific workflow management systems so that scientists in various domains can benefit from them by automating repetitive tasks, enhancing collaboration, and ensuring the reproducibility of their results. However, even for expert users, workflow creation is a complex task due to the dramatic growth of tools and data heterogeneity. Scientists therefore attempt to reuse existing workflows shared in workflow repositories. Unfortunately, several challenges prevent scientists from reusing those workflows. In this study, we first attempted to identify those reusability challenges. We also offered an action list and evidence-based guidelines to promote the reusability of scientific workflows. Our intensive manual investigation examined the reusability of existing workflows and exposed several challenges. The challenges preventing reusability include tool upgrading, tool support unavailability, design flaws, incomplete workflows, failure to load a workflow, etc. These challenges and our action list offer guidelines to future workflow composers for creating better workflows with enhanced reusability. In the future, we plan to develop a recommender system built on reusable workflows that can assist scientists in creating effective and error-free workflows. Comment: Accepted in APSEC 202

    Automatic vs Manual Provenance Abstractions: Mind the Gap

    In recent years the need to simplify or to hide sensitive information in provenance has given rise to research on provenance abstraction. In the context of scientific workflows, existing research provides techniques to semi-automatically create abstractions of a given workflow description, which are in turn used as filters over the workflow's provenance traces. An alternative approach commonly adopted by scientists is to build workflows with abstractions embedded into the workflow's design, such as sub-workflows. This paper reports on a comparison of manual versus semi-automated approaches in a context where result abstractions are used to filter report-worthy results of computational scientific analyses. Specifically, we take a real-world workflow containing user-created design abstractions and compare these with abstractions created by the ZOOM UserViews and Workflow Summaries systems. Our comparison shows that the semi-automatic and manual approaches largely overlap from a process perspective, whereas there is a dramatic mismatch in terms of the data artefacts retained in an abstracted account of derivation. We discuss the reasons and suggest future research directions. Comment: Preprint accepted to the 2016 workshop on the Theory and Applications of Provenance, TAPP 201
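
    To make the notion of an embedded design abstraction concrete, the sketch below shows one simple way a user-defined grouping (such as a sub-workflow) could be used to filter a provenance trace: every node inside the group is collapsed into a single composite node, and edges internal to the group are dropped. This is a hypothetical illustration, not the abstraction procedure of ZOOM UserViews or Workflow Summaries; all function and node names are assumptions.

```python
# Hypothetical sketch: collapse a group of provenance nodes into one abstract node.
# The grouping is assumed to come from the workflow design (e.g. a sub-workflow).

def abstract_provenance(edges, group, label):
    """edges: iterable of (source, target) pairs in a provenance DAG.
    group: set of node names to hide behind a single composite node `label`."""
    abstracted = set()
    for src, dst in edges:
        s = label if src in group else src
        t = label if dst in group else dst
        if s != t:                      # drop edges internal to the group
            abstracted.add((s, t))
    return sorted(abstracted)

trace = [("fetch", "align"), ("align", "filter"), ("filter", "report")]
print(abstract_provenance(trace, {"align", "filter"}, "preprocess"))
# [('fetch', 'preprocess'), ('preprocess', 'report')]
```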

    EPiK-a Workflow for Electron Tomography in Kepler.

    Scientific workflows integrate data and computing interfaces as configurable, semi-automatic graphs to solve a scientific problem. Kepler is such a software system for designing, executing, reusing, evolving, archiving and sharing scientific workflows. Electron tomography (ET) enables high-resolution views of complex cellular structures, such as cytoskeletons, organelles, viruses and chromosomes. Imaging investigations produce large datasets. For instance, in electron tomography the size of a 16-fold image tilt series is about 65 gigabytes, with each projection image containing 4096 × 4096 pixels. When serial sections or montage techniques are used for large-field ET, the dataset becomes even larger. For higher-resolution images with multiple tilt series, the data size may be in the terabyte range. The demands of mass data processing and complex algorithms require the integration of diverse codes into flexible software structures. This paper describes a workflow for Electron Tomography Programs in Kepler (EPiK). The EPiK workflow embeds the tracking process of IMOD and realizes the main algorithms, including filtered backprojection (FBP) from TxBR and iterative reconstruction methods. We have tested the three-dimensional (3D) reconstruction process using EPiK on ET data. EPiK can be a potential toolkit for biology researchers, with the advantages of logical viewing, easy handling, convenient sharing and future extensibility.
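
    For readers unfamiliar with the reconstruction step mentioned above, the sketch below illustrates filtered backprojection for the simplest 2D parallel-beam case using NumPy: each projection is ramp-filtered in the Fourier domain and then smeared back across the image along its acquisition angle. This is an illustrative toy, not the TxBR or IMOD implementation used in EPiK, and it ignores the curvilinear geometry and large data volumes of real electron tomography.

```python
import numpy as np

def filtered_backprojection(sinogram, angles_deg):
    """Minimal 2D parallel-beam FBP: ramp-filter each projection, then
    backproject it across the image along its acquisition angle (toy sketch)."""
    n_angles, n_det = sinogram.shape
    # Ramp filter applied along the detector axis in the Fourier domain.
    freqs = np.fft.fftfreq(n_det)
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=1) * np.abs(freqs), axis=1))

    # Backproject onto an n_det x n_det grid centred at the origin.
    recon = np.zeros((n_det, n_det))
    coords = np.arange(n_det) - n_det / 2
    xx, yy = np.meshgrid(coords, coords)
    for proj, theta in zip(filtered, np.deg2rad(angles_deg)):
        # Detector coordinate of each pixel for this projection angle.
        t = xx * np.cos(theta) + yy * np.sin(theta) + n_det / 2
        recon += np.interp(t.ravel(), np.arange(n_det), proj).reshape(n_det, n_det)
    return recon * np.pi / (2 * len(angles_deg))

# Toy usage with a random sinogram (no real data): 60 angles, 128 detector bins.
angles = np.linspace(0.0, 180.0, 60, endpoint=False)
image = filtered_backprojection(np.random.rand(60, 128), angles)
print(image.shape)  # (128, 128)
```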

    The Evolution of myExperiment

    The myExperiment social website for sharing scientific workflows, designed according to Web 2.0 principles, has grown to be the largest public repository of its kind. It is distinctive for its focus on sharing methods, its researcher-centric design and its facility to aggregate content into shareable 'research objects'. This evolution of myExperiment has occurred hand in hand with its users. myExperiment now supports Linked Data as a step toward our vision of the future research environment, which we categorise here as '3rd generation e-Research'.

    The Sciences of Data – Moving Towards a Comprehensive Systems Perspective

    Data science's rapid development in a dynamically growing data environment endows it with unique characteristics among scientific disciplines, combining challenges typically encountered in theoretical as well as empirical sciences. This raises questions as to the identification of the most pressing problems for data science, as well as to what constitutes its theoretical foundations. In this contribution, we first describe data science from the perspective of the philosophy of science. We argue that the current mode of development of data science is adequately described by what we term the differentiational-expansionist mode. This leads us to conclude that data science concerns the acquisition of scientific theories relating to the application of methods, workflows and algorithms that generate value for users, which we term the integrative view. This definition emphasizes the interdependent nature of human and algorithmic elements in complex data workflows. We then offer four challenges for the future of the field. We conclude that since full control of entire data workflows is unfeasible, attention should be redirected towards the creation of an infrastructure by which data workflows will self-organize in a useful manner.

    Progress and prospects for accelerating materials science with automated and autonomous workflows

    Accelerating materials research by integrating automation with artificial intelligence is increasingly recognized as a grand scientific challenge for discovering and developing materials for emerging and future technologies. While the solid-state materials science community has demonstrated a broad range of high-throughput methods and effectively leveraged computational techniques to accelerate individual research tasks, revolutionary acceleration of materials discovery has yet to be fully realized. This perspective review presents a framework and ontology to outline a materials experiment lifecycle and to visualize materials discovery workflows, providing a context for mapping the realized levels of automation and the next generation of autonomous loops in terms of scientific and automation complexity. Expanding autonomous loops to encompass larger portions of complex workflows will require the integration of a range of experimental techniques as well as the automation of expert decisions, including subtle reasoning about data quality, responses to unexpected data, and model design. Recent demonstrations of workflows that integrate multiple techniques and include autonomous loops, combined with emerging advances in artificial intelligence and high-throughput experimentation, signal the imminence of a revolution in materials discovery.
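
    As a rough illustration of what an autonomous loop entails, the sketch below couples a simulated experiment with a simple propose-measure-update cycle: a selection rule proposes the next candidate composition, an automated experiment measures it, and the accumulated observations steer the next proposal. It is a hypothetical toy, not a description of any system in the review; the objective function, the acquisition rule and all names are assumptions.

```python
import random

# Hypothetical sketch of an autonomous materials-discovery loop:
# propose a candidate, run a (simulated) experiment, update the record, repeat.

def run_experiment(composition: float) -> float:
    """Stand-in for an automated synthesis + characterisation step (simulated)."""
    return -(composition - 0.7) ** 2 + random.gauss(0, 0.01)

def propose_next(observations, candidates):
    """Greedy-with-jitter acquisition rule: explore at random when nothing is
    known, otherwise sample near the best composition measured so far."""
    if not observations:
        return random.choice(candidates)
    best_x, _ = max(observations, key=lambda xy: xy[1])
    return min(candidates, key=lambda x: abs(x - best_x) + random.uniform(0, 0.1))

observations = []
candidates = [i / 100 for i in range(101)]
for _ in range(20):                     # autonomous loop: propose, measure, update
    x = propose_next(observations, candidates)
    observations.append((x, run_experiment(x)))

print("best composition found:", max(observations, key=lambda xy: xy[1])[0])
```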