328 research outputs found

Toward open computational communication science: A practical road map for reusable data and code

Author: Strycharz J.
Trilling D.
van Atteveldt W.
Welbers Kasper
Publication date: 01/01/2019

Computational communication science (CCS) offers an opportunity to accelerate the scope and pace of discovery in communication research. This article argues that CCS will profit from adopting open science practices by fostering the reusability of data and code. We discuss the goals and challenges related to creating reusable data and code and offer practical guidance to individual researchers to achieve this. More specifically, we argue for integration of the research process into reusable workflows and recognition of tools and data as academic work. The challenges and road map are also critically discussed in terms of the additional burden they place on individual scholars, which culminates in a call to action for the field to support and incentivize the reusability of tools and data.

UvA-DARE

An Intermediate Data-driven Methodology for Scientific Workflow Management System to Support Reusability

Author: Chakroborti Debasish 1989-
Publication venue: University of Saskatchewan Library
Publication date: 06/01/2020

A workflow is the automatic processing of different logical sub-tasks according to a set of rules. A workflow management system (WfMS) is a system that helps us accomplish a complex scientific task by making a sequential arrangement of sub-tasks available as tools. Workflows are formed with modules from various domains in a WfMS, and many collaborators from those domains are involved in the workflow design process. WfMSs have gained popularity in recent years for managing various tools in a system and ensuring dependencies while building a sequence of executions for scientific analyses. Because they involve heterogeneous tools and require collaboration, Collaborative Scientific Workflow Management Systems (CSWfMS) have gained significant interest in the scientific analysis community. Such systems face big data explosion issues, with high velocity and variety in the large amounts of heterogeneous data from different domains. Therefore, a large amount of heterogeneous data needs to be managed in a Scientific Workflow Management System (SWfMS) with a proper decision mechanism. Although a number of studies have addressed the cost management of data, none of the existing studies concern a real-time decision mechanism or a reusability mechanism. Moreover, frequent execution of workflows in a SWfMS generates a massive amount of data, and such data are always incremental in nature. Input data or module outcomes of a workflow in a SWfMS are usually large in size. Processing such data-intensive workflows is usually time-consuming when modules are computationally expensive for their respective inputs. In addition, the lack of data reusability, limited error recovery, inefficient workflow processing, inefficient storage of derived data, missing metadata association, and the lack of validation of a technique's effectiveness in existing systems need to be addressed in a SWfMS to build workflows efficiently while managing the big data explosion. To address these issues, this thesis first proposes an intermediate data management scheme for a SWfMS. In our second attempt, we explored the possibilities and introduced an automatic recommendation technique for a SWfMS based on real-world workflow data (i.e., Galaxy [1] workflows). Our investigations show that the proposed technique can facilitate 51% of workflow building in a SWfMS by reusing intermediate data of previous workflows and can reduce the execution time of workflow building in a SWfMS by 74%. Later we propose an adaptive version of our technique that considers the states of tools in a SWfMS and shows around 40% reusability for workflows. Finally, in our fourth study, we conducted several experiments to analyze the performance and explore the effectiveness of the technique in a SWfMS in various environments. The technique emphasizes reduced storage cost, increased data reusability, and faster workflow execution and, to the best of our knowledge, is the first of its kind. The detailed architecture and evaluation of the technique are presented in this thesis. We believe our findings and the developed system will contribute significantly to the research domain of SWfMSs.
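The abstract above describes reusing intermediate data from earlier workflow runs. The following is a minimal Python sketch of that general idea only, not the thesis's actual scheme or its recommendation technique: each step's output is stored under a key derived from the tool name, its parameters, and the key of its input, so a later workflow that shares a prefix of steps can skip recomputation. All names here (`step_key`, `IntermediateStore`, `run_workflow`, and the toy tools) are hypothetical and chosen for illustration.

```python
import hashlib
import json


def step_key(tool, params, input_key):
    """Derive a deterministic key from tool name, parameters, and upstream key."""
    payload = json.dumps(
        {"tool": tool, "params": params, "input": input_key}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class IntermediateStore:
    """In-memory store of step outputs (hypothetical; a real system would persist them)."""

    def __init__(self):
        self._data = {}
        self.hits = 0
        self.misses = 0

    def run_step(self, tool, params, input_key, input_value, func):
        key = step_key(tool, params, input_key)
        if key in self._data:
            self.hits += 1  # reuse the stored intermediate result
        else:
            self.misses += 1  # compute once and store for later workflows
            self._data[key] = func(input_value, **params)
        return key, self._data[key]


def run_workflow(store, raw_input, steps):
    """Run (tool, params, func) steps in order, reusing stored outputs when possible."""
    key = hashlib.sha256(repr(raw_input).encode("utf-8")).hexdigest()
    value = raw_input
    for tool, params, func in steps:
        key, value = store.run_step(tool, params, key, value, func)
    return value


# Toy tools standing in for real analysis modules (assumptions for illustration).
def trim(reads, min_len):
    return [r for r in reads if len(r) >= min_len]


def count(reads):
    return len(reads)


def longest(reads):
    return max(reads, key=len)


store = IntermediateStore()
reads = ["ACGT", "AC", "ACGTAC"]
result_a = run_workflow(store, reads, [("trim", {"min_len": 3}, trim), ("count", {}, count)])
# The second workflow shares the "trim" step, so only "longest" is computed.
result_b = run_workflow(store, reads, [("trim", {"min_len": 3}, trim), ("longest", {}, longest)])
```

Keying each step on its upstream key means a change to any earlier tool or parameter invalidates everything downstream, which is the conservative choice for a sketch like this. Deciding which intermediate results are worth storing is the part the thesis addresses and is not modelled here.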

University of Saskatchewan Research Archive

NFDI4Ing - the National Research Data Infrastructure for Engineering Sciences

Author: Anthofer Verena
Auer Sören
Başkaya Sait
Bischof Christian
Bronger Torsten
Claus Florian
Cordes Florian
Demandt Évariste
Eifert Thomas
Flemisch Bernd
Fuchs Matthias
Fuhrmans Marc
Gerike Regine
Gerstner Eva-Maria
Hanke Vanessa
Heine Ina
Huebser Louis
Iglezakis Dorothea
Jagusch Gerald
Klinger Axel
Krafczyk Manfred
Kraft Angelina
Kuckertz Patrick
Küsters Ulrike
Lachmayer Roland
Langenbach Christian
Mozgova Iryna
Müller Matthias S.
Nestler Britta
Pelz Peter
Politze Marius
Preuß Nils
Przybylski-Freund Marie-Dominique
Rißler-Pipka Nanette
Robinius Martin
Schachtner Joachim
Schlenz Hartmut
Schmitt Robert H.
Schwarz Annett
Schwibs Jürgen
Selzer Michael
Sens Irina
Stemmer Christian
Stille Wolfgang
Stolten Detlef
Stotzka Rainer
Streit Achim
Strötgen Robert
Stäcker Thomas
Wang Wei Min
Publication venue: Zenodo
Publication date: 01/01/2020

HPC Application Cloudification: The StreamFlow Toolkit

Author: Aldinucci Marco
Cantalupo Barbara
Colonnelli Iacopo
Esposito Roberto
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/01/2021

DolphinNext: a distributed data processing platform for high throughput genomics

Author: Garber Manuel
Kucukural Alper
Ozturk Ahmet R.
Turkyilmaz Osman
Yukselen Onur
Publication venue: eScholarship@UMassChan
Publication date: 19/04/2020

BACKGROUND: The emergence of high throughput technologies that produce vast amounts of genomic data, such as next-generation sequencing (NGS), is transforming biological research. The dramatic increase in the volume of data, together with the variety and continuous change of data processing tools, algorithms and databases, makes analysis the main bottleneck for scientific discovery. The processing of high throughput datasets typically involves many different computational programs, each of which performs a specific step in a pipeline. Given the wide range of applications and organizational infrastructures, there is a great need for highly parallel, flexible, portable, and reproducible data processing frameworks. Several platforms currently exist for the design and execution of complex pipelines. Unfortunately, current platforms lack the necessary combination of parallelism, portability, flexibility and/or reproducibility that is required by the current research environment. To address these shortcomings, workflow frameworks that provide a platform to develop and share portable pipelines have recently arisen. We complement these new platforms by providing a graphical user interface to create, maintain, and execute complex pipelines. Such a platform will simplify robust and reproducible workflow creation for non-technical users as well as provide a robust platform to maintain pipelines for large organizations. RESULTS: To simplify development, maintenance, and execution of complex pipelines we created DolphinNext. DolphinNext facilitates building and deployment of complex pipelines using a modular approach implemented in a graphical interface that relies on the powerful Nextflow workflow framework by providing: 1. A drag-and-drop user interface that visualizes pipelines and allows users to create pipelines without familiarity with the underlying programming languages. 2. Modules to execute and monitor pipelines in distributed computing environments such as high-performance clusters and/or cloud. 3. Reproducible pipelines with version tracking and stand-alone versions that can be run independently. 4. Modular process design with process revisioning support to increase reusability and pipeline development efficiency. 5. Pipeline sharing with GitHub and automated testing. 6. Extensive reports with R Markdown and Shiny support for interactive data visualization and analysis. CONCLUSION: DolphinNext is a flexible, intuitive, web-based data processing and analysis platform that enables creating, deploying, sharing, and executing complex Nextflow pipelines with extensive revisioning and interactive reporting to enhance reproducible results.

eScholarship@UMMS

Struct: an R/Bioconductor-based framework for standardised metabolomics data analysis and beyond

Author: Jankevics Andris
Lloyd Gavin Rhys
Weber Ralf J M
Publication venue: Oxford University Press (OUP)
Publication date: 01/12/2020