651 research outputs found

    Fine-Grain Interoperability of Scientific Workflows in Distributed Computing Infrastructures

    Today there exists a wide variety of scientific workflow management systems, each designed to fulfill the needs of a certain scientific community. Unfortunately, once a workflow application has been designed in one particular system, it becomes very hard to share it with users working with different systems. Portability of workflows and interoperability between current systems barely exist. In this work, we present the fine-grained interoperability solution proposed in the SHIWA European project, which brings together four representative European workflow systems: ASKALON, MOTEUR, WS-PGRADE, and Triana. The proposed interoperability is realised at two levels of abstraction: abstract and concrete. At the abstract level, we propose a generic Interoperable Workflow Intermediate Representation (IWIR) that can be used as a common bridge for translating workflows between different languages independent of the underlying distributed computing infrastructure. At the concrete level, we propose a bundling technique that aggregates the abstract IWIR representation and concrete task representations to enable workflow instantiation, execution and scheduling. We illustrate our approach with case studies of two real-world workflow applications designed in their native environment and then translated and executed by a foreign workflow system on a foreign distributed computing infrastructure. © 2013 Springer Science+Business Media Dordrecht
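
    To make the two abstraction levels concrete, the sketch below shows how an abstract, engine-neutral workflow description of the kind IWIR provides might be assembled and serialized. The element names and the build_workflow helper are illustrative assumptions, not the actual IWIR schema or SHIWA tooling.

```python
# Illustrative sketch only: the element and attribute names below are
# simplified stand-ins, not the actual IWIR schema.
import xml.etree.ElementTree as ET

def build_workflow(name, tasks, links):
    """Build a minimal IWIR-style XML document for an abstract workflow.

    tasks: list of (task_name, task_type, input_ports, output_ports)
    links: list of (from_port, to_port) strings such as "taskA/out".
    """
    wf = ET.Element("workflow", {"name": name, "version": "0.1"})
    for tname, ttype, inputs, outputs in tasks:
        task = ET.SubElement(wf, "task", {"name": tname, "tasktype": ttype})
        in_el = ET.SubElement(task, "inputPorts")
        for port in inputs:
            ET.SubElement(in_el, "inputPort", {"name": port, "type": "file"})
        out_el = ET.SubElement(task, "outputPorts")
        for port in outputs:
            ET.SubElement(out_el, "outputPort", {"name": port, "type": "file"})
    links_el = ET.SubElement(wf, "links")
    for src, dst in links:
        ET.SubElement(links_el, "link", {"from": src, "to": dst})
    return ET.tostring(wf, encoding="unicode")

# Example: a two-task pipeline that any engine with an importer for this
# representation could re-instantiate with its own concrete task bindings.
print(build_workflow(
    "align_and_filter",
    tasks=[("align", "atomic", ["reads"], ["bam"]),
           ("filter", "atomic", ["bam"], ["filtered"])],
    links=[("align/bam", "filter/bam")],
))
```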

    A novel approach to user-steering in scientific workflows

    From the scientist's perspective, workflow execution is a black box: the scientist submits the workflow and, at the end, either the result or a notification of failed completion is returned. For long-running experiments, or for workflows that are still in an experimental phase, this may not be acceptable; scientists may need to fine-tune and monitor their experiments. To support scientists with a dedicated user-interaction tool, we introduce intervention points (iPoints), where the user takes over control for a while and has the possibility to interfere, namely to change some parameters or data, to stop or restart the workflow, or even to deviate from the original workflow model during enactment. We plan to implement our solution in the IWIR \cite{plan2011} language, which was designed to provide interoperability between four existing, well-known SWfMSs within the framework of the SHIWA project
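
    A minimal sketch of how such an intervention point might be embedded in an enactment loop follows; the intervention_point function, the StopWorkflow exception and the parameter names are hypothetical and only illustrate the pause/steer/abort behaviour described above.

```python
# Hypothetical sketch of an intervention point (iPoint): the names below are
# illustrative, not the authors' API.
class StopWorkflow(Exception):
    """Raised when the user decides to abort the enactment."""

def intervention_point(name, params):
    """Pause enactment, show current parameters and let the user steer."""
    print(f"[iPoint:{name}] current parameters: {params}")
    choice = input("continue / edit <key>=<value> / stop > ").strip()
    if choice == "stop":
        raise StopWorkflow(name)
    if choice.startswith("edit "):
        key, _, value = choice[5:].partition("=")
        params[key] = value          # user overrides a parameter in flight
    return params

def run_workflow(params):
    params = intervention_point("before_analysis", params)   # user may re-tune
    result = f"analysed {params['dataset']} with t={params['threshold']}"
    intervention_point("after_analysis", {"result": result})  # user may inspect
    return result

if __name__ == "__main__":
    try:
        run_workflow({"dataset": "run42.csv", "threshold": "0.05"})
    except StopWorkflow as ip:
        print(f"workflow stopped by user at iPoint {ip}")
```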

    FAIR Computational Workflows

    Computational workflows describe the complex multi-step methods that are used for data collection, data preparation, analytics, predictive modelling, and simulation that lead to new data products. They can inherently contribute to the FAIR data principles: by processing data according to established metadata; by creating metadata themselves during the processing of data; and by tracking and recording data provenance. These properties aid data quality assessment and contribute to secondary data usage. Moreover, workflows are digital objects in their own right. This paper argues that FAIR principles for workflows need to address their specific nature in terms of their composition of executable software steps, their provenance, and their development. Accepted for the Data Intelligence special issue: FAIR best practices 2019. Carole Goble acknowledges funding by BioExcel2 (H2020 823830), IBISBA1.0 (H2020 730976) and EOSCLife (H2020 824087). Daniel Schober's work was financed by Phenomenal (H2020 654241) at the initiation phase of this effort; current work is an in-kind contribution. Kristian Peters is funded by the German Network for Bioinformatics Infrastructure (de.NBI) and acknowledges BMBF funding under grant number 031L0107. Stian Soiland-Reyes is funded by BioExcel2 (H2020 823830). Daniel Garijo and Yolanda Gil gratefully acknowledge support from DARPA award W911NF-18-1-0027, NIH award 1R01AG059874-01, and NSF award ICER-1740683
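
    As a rough illustration of workflows creating metadata and recording provenance while they run, the sketch below wraps a workflow step so that it emits a simple provenance record alongside its outputs. The record layout and function names are assumptions for illustration, not a formal standard such as W3C PROV or any specific workflow system's API.

```python
# Illustrative sketch: a workflow step that emits its own provenance record.
# The record layout is a simplified stand-in, not a formal provenance standard.
import hashlib
import json
import time

def sha256(path):
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()

def run_step(step_name, tool_version, inputs, outputs, action):
    started = time.time()
    action()                                   # the actual computation
    record = {
        "step": step_name,
        "tool_version": tool_version,
        "started": started,
        "ended": time.time(),
        "inputs": {p: sha256(p) for p in inputs},
        "outputs": {p: sha256(p) for p in outputs},
    }
    with open(f"{step_name}.prov.json", "w") as fh:
        json.dump(record, fh, indent=2)        # metadata created alongside data
    return record

def demo_action():
    with open("total.txt", "w") as fh:
        fh.write("6\n")

if __name__ == "__main__":
    with open("numbers.txt", "w") as fh:
        fh.write("1\n2\n3\n")
    run_step("sum_numbers", "demo-0.1",
             inputs=["numbers.txt"], outputs=["total.txt"], action=demo_action)
```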

    An extensible and scalable Pilot-MapReduce framework for data intensive applications on distributed cyberinfrastructure

    The volume and complexity of data that must be analyzed in scientific applications is increasing exponentially. Often, this data is distributed; thus, the ability to analyze data by localizing it will yield limited returns. Therefore, efficient processing of large distributed datasets is required, ideally without introducing fundamentally new programming models or methods. For example, it is desirable to extend MapReduce, a proven and effective programming model for processing large datasets, to work more effectively on distributed data and on different infrastructure (such as non-Hadoop, general-purpose clusters). We posit that this can be achieved with an effective and efficient runtime environment and without refactoring MapReduce itself. MapReduce on distributed data requires effective distributed coordination of computation (map and reduce) and data, as well as distributed data management (in particular the transfer of intermediate data units). To address these requirements, we design and implement Pilot-MapReduce (PMR), a flexible, infrastructure-independent runtime environment for MapReduce. PMR is based on Pilot abstractions for both compute (Pilot-Jobs) and data (Pilot-Data): it utilizes Pilot-Jobs to couple the map phase computation to the nearby source data, and Pilot-Data to move intermediate data to the reduce computation phase using parallel data transfers. We analyze the effectiveness of PMR over applications with different characteristics (e.g. different volumes of intermediate and output data). Our experimental evaluations show that the Pilot abstraction for data movement across multiple clusters is promising and can lower the time span of the entire MapReduce execution. We also investigate the performance of PMR with distributed data using a word count and a genome sequencing application over different MapReduce configurations. By comparing and contrasting the PMR approach with similar capabilities of Seqal and Crossbow, two Hadoop MapReduce-based Next Generation Sequencing (NGS) applications, we find that PMR is a viable tool to support distributed NGS analytics. Our experiments show that PMR provides the desired flexibility in the deployment and configuration of MapReduce runs to address specific application characteristics and achieve optimal performance, both locally and over wide-area multiple clusters
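
    The sketch below illustrates the coordination pattern described above with a toy word count: map tasks run on the pilot co-located with each data chunk, and the intermediate results are then gathered by the pilot that executes the reduce phase. The Pilot class and its methods are hypothetical stand-ins, not the actual Pilot-Job/Pilot-Data API used by PMR.

```python
# Conceptual sketch of the PMR coordination pattern. The Pilot class and its
# methods are hypothetical stand-ins for pilot-job / pilot-data abstractions.
from collections import defaultdict

class Pilot:
    """A placeholder for a pilot holding resources on one cluster."""
    def __init__(self, site):
        self.site = site
    def run(self, func, *args):
        return func(*args)          # a real pilot would run this remotely

def word_count_map(chunk):
    counts = defaultdict(int)
    for word in chunk.split():
        counts[word] += 1
    return dict(counts)

def word_count_reduce(partials):
    total = defaultdict(int)
    for partial in partials:        # intermediate data gathered from all sites
        for word, n in partial.items():
            total[word] += n
    return dict(total)

# Map tasks are coupled to the pilot nearest to each data chunk; the
# intermediate results are then "moved" to the pilot running the reduce phase.
data = {"siteA": "to be or not to be", "siteB": "to map and to reduce"}
pilots = {site: Pilot(site) for site in data}
intermediate = [pilots[site].run(word_count_map, chunk) for site, chunk in data.items()]
reduce_pilot = Pilot("siteA")
print(reduce_pilot.run(word_count_reduce, intermediate))
```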

    ENABLING GENERIC DISTRIBUTED COMPUTING INFRASTRUCTURE COMPATIBILITY FOR WORKFLOW MANAGEMENT SYSTEMS

    Solving workflow management systems' Distributed Computing Infrastructure (DCI) incompatibility and workflow interoperability issues is a very challenging and complex task. Workflow management systems (and therefore their workflows, workflow developers and also their end-users) are bound tightly to a limited number of supported DCIs, and significant effort is required to add support for additional DCIs. In this paper we specify a concept for enabling generic DCI compatibility for grid workflow management systems (such as ASKALON, MOTEUR, gUSE/WS-PGRADE, etc.) at the job level and, indirectly, at the workflow level. To enable DCI compatibility among the different workflow management systems we have developed the DCI Bridge software solution. We describe its internal architecture and provide usage scenarios that show how the service resolves DCI interoperability issues between various middleware types. The generic DCI Bridge service enables the execution of jobs on the major existing DCI platforms, such as Service Grids (Globus Toolkit 2 and 4, gLite, ARC, UNICORE), Desktop Grids, Web services, or even cloud-based DCIs
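
    The adapter idea behind such a bridge can be sketched as follows: a single generic job description is handed to the bridge, which translates it for whichever middleware backend is requested. All class and method names are illustrative assumptions, not the DCI Bridge interfaces themselves.

```python
# Illustrative sketch of the adapter idea behind a DCI bridge: a generic job
# description is translated for whichever middleware backend is requested.
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class JobDescription:
    executable: str
    arguments: list = field(default_factory=list)
    input_files: list = field(default_factory=list)

class DCIAdapter(ABC):
    @abstractmethod
    def submit(self, job: JobDescription) -> str: ...

class GLiteAdapter(DCIAdapter):
    def submit(self, job):
        # a real adapter would generate a JDL file and contact the gLite WMS
        return f"glite://{job.executable}"

class CloudAdapter(DCIAdapter):
    def submit(self, job):
        # a real adapter would start a VM or container and stage input files
        return f"cloud://{job.executable}"

class DCIBridge:
    def __init__(self):
        self.adapters = {"glite": GLiteAdapter(), "cloud": CloudAdapter()}
    def submit(self, job, dci):
        return self.adapters[dci].submit(job)   # same job, different backend

bridge = DCIBridge()
job = JobDescription("blastp", ["-query", "seqs.fasta"])
print(bridge.submit(job, "glite"), bridge.submit(job, "cloud"))
```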

    Building essential biodiversity variables (EBVs) of species distribution and abundance at a global scale

    Much biodiversity data is collected worldwide, but it remains challenging to assemble the scattered knowledge for assessing biodiversity status and trends. The concept of Essential Biodiversity Variables (EBVs) was introduced to structure biodiversity monitoring globally, and to harmonize and standardize biodiversity data from disparate sources to capture a minimum set of critical variables required to study, report and manage biodiversity change. Here, we assess the challenges of a ‘Big Data’ approach to building global EBV data products across taxa and spatiotemporal scales, focusing on species distribution and abundance. The majority of currently available data on species distributions derives from incidentally reported observations or from surveys where presence-only or presence–absence data are sampled repeatedly with standardized protocols. Most abundance data come from opportunistic population counts or from population time series using standardized protocols (e.g. repeated surveys of the same population from single or multiple sites). Enormous complexity exists in integrating these heterogeneous, multi-source data sets across space, time, taxa and different sampling methods. Integration of such data into global EBV data products requires correcting biases introduced by imperfect detection and varying sampling effort, dealing with different spatial resolution and extents, harmonizing measurement units from different data sources or sampling methods, applying statistical tools and models for spatial inter- or extrapolation, and quantifying sources of uncertainty and errors in data and models. To support the development of EBVs by the Group on Earth Observations Biodiversity Observation Network (GEO BON), we identify 11 key workflow steps that will operationalize the process of building EBV data products within and across research infrastructures worldwide. These workflow steps take multiple sequential activities into account, including identification and aggregation of various raw data sources, data quality control, taxonomic name matching and statistical modelling of integrated data. We illustrate these steps with concrete examples from existing citizen science and professional monitoring projects, including eBird, the Tropical Ecology Assessment and Monitoring network, the Living Planet Index and the Baltic Sea zooplankton monitoring. The identified workflow steps are applicable to both terrestrial and aquatic systems and a broad range of spatial, temporal and taxonomic scales. They depend on clear, findable and accessible metadata, and we provide an overview of current data and metadata standards. Several challenges remain to be solved for building global EBV data products: (i) developing tools and models for combining heterogeneous, multi-source data sets and filling data gaps in geographic, temporal and taxonomic coverage, (ii) integrating emerging methods and technologies for data collection such as citizen science, sensor networks, DNA-based techniques and satellite remote sensing, (iii) solving major technical issues related to data product structure, data storage, execution of workflows and the production process/cycle as well as approaching technical interoperability among research infrastructures, (iv) allowing semantic interoperability by developing and adopting standards and tools for capturing consistent data and metadata, and (v) ensuring legal interoperability by endorsing open data or data that are free from restrictions on use, modification and sharing. 
Addressing these challenges is critical for biodiversity research and for assessing progress towards conservation policy targets and sustainable development goals
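
    As a toy illustration of a few of these workflow steps (aggregation of raw observations, taxonomic name matching and spatial harmonisation), the sketch below folds point records into a species-by-grid-cell abundance table. The records, the synonym table and the grid resolution are invented for illustration and do not correspond to any of the cited monitoring projects.

```python
# Toy sketch of a few EBV workflow steps: name matching, spatial binning and
# aggregation. All data and the synonym table are invented for illustration.
SYNONYMS = {"Parus caeruleus": "Cyanistes caeruleus"}   # name-matching table

def harmonise_names(records):
    return [{**r, "species": SYNONYMS.get(r["species"], r["species"])}
            for r in records]

def to_grid_cell(lat, lon, resolution=1.0):
    return (round(lat / resolution) * resolution,
            round(lon / resolution) * resolution)

def aggregate_counts(records, resolution=1.0):
    """Aggregate per-visit counts into a species x grid-cell abundance table."""
    table = {}
    for r in records:
        key = (r["species"], to_grid_cell(r["lat"], r["lon"], resolution))
        table[key] = table.get(key, 0) + r["count"]
    return table

raw = [
    {"species": "Parus caeruleus",     "lat": 52.4, "lon": 13.1, "count": 3},
    {"species": "Cyanistes caeruleus", "lat": 52.6, "lon": 13.4, "count": 2},
]
print(aggregate_counts(harmonise_names(raw)))
```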

    Fine-Grained Workflow Interoperability in Life Sciences

    Recent decades have witnessed an exponential increase in available biological data due to advances in key technologies for the life sciences. Specialized computing resources and scripting skills are now required to deliver results in a timely fashion: desktop computers or monolithic approaches can no longer keep pace with either the growth of available biological data or the complexity of analysis techniques. Workflows offer an accessible way to counter this trend by facilitating parallelization and distribution of computations. Given their structured and repeatable nature, workflows also provide a transparent process to satisfy the strict reproducibility standards required by the scientific method.
    One of the goals of our work is to assist researchers in accessing computing resources without the need for programming or scripting skills. To this effect, we created a toolset able to integrate any command-line tool into workflow systems. Out of the box, our toolset supports two widely used workflow systems, but our modular design allows for seamless additions in order to support further workflow engines. Recognizing the importance of early and robust workflow design, we also extended a well-established, desktop-based analytics platform that contains more than two thousand tasks (each being a building block for a workflow), allows easy development of new tasks and is able to integrate external command-line tools. We developed a converter plug-in that offers a user-friendly mechanism to execute workflows on distributed high-performance computing resources, an exercise that would otherwise require technical skills typically not associated with the average life scientist's profile. Our converter extension generates virtually identical versions of the same workflows, which can then be executed on more capable computing resources. That is, not only did we leverage the capacity of distributed high-performance resources and the conveniences of a workflow engine designed for personal computers, but we also circumvented the computing limitations of personal computers and the steep learning curve associated with creating workflows for distributed environments. Our converter extension has immediate applications for researchers, and we showcase our results by means of three use cases relevant for life scientists: structural bioinformatics, immunoinformatics and metabolomics
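
    The wrapping idea can be sketched as follows: a command-line tool is described once, and engine- or resource-specific stubs are generated from that description, for example a plain shell invocation for a desktop run and a batch script for an HPC cluster. The ToolDescription fields and the two emitters are assumptions for illustration; they do not reproduce the authors' toolset or converter plug-in.

```python
# Hedged sketch: describe a command-line tool once, emit target-specific stubs.
# Field names and emitters are illustrative, not the authors' toolset.
from dataclasses import dataclass, field

@dataclass
class ToolDescription:
    name: str
    command: str                     # template with {placeholders}
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

def to_shell_command(tool: ToolDescription, **bindings) -> str:
    """Concrete invocation for a local / desktop run."""
    return tool.command.format(**bindings)

def to_batch_script(tool: ToolDescription, cores: int, **bindings) -> str:
    """Roughly what a converter targeting an HPC batch system might emit."""
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name={tool.name}",
        f"#SBATCH --cpus-per-task={cores}",
        tool.command.format(**bindings),
    ])

blast = ToolDescription(
    name="blastp",
    command="blastp -query {query} -db {db} -out {out}",
    inputs=["query", "db"], outputs=["out"],
)
print(to_shell_command(blast, query="seqs.fa", db="nr", out="hits.tsv"))
print(to_batch_script(blast, cores=8, query="seqs.fa", db="nr", out="hits.tsv"))
```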

    Consortium Proposal NFDI-MatWerk

    This is the official proposal that the NFDI consortium NFDI-MatWerk submitted to the DFG to request funding for the project. Visit www.dfg.de/nfdi for more information on the German National Research Data Infrastructure (Nationale Forschungsdateninfrastruktur, NFDI) initiative. Visit www.nfdi-matwerk.de for the latest information about the project NFDI-MatWerk