1,104 research outputs found

Improving Pipelining Tools for Pre-processing Data

Author: Lage Yeray
Laza Rosalía
Méndez José Ramón
Novo-Lourés María
Pavón Reyes
Ruano-Ordás David
Publication venue: 'Universidad Internacional de La Rioja'
Publication date: 01/01/2022

The last several years have seen the emergence of data mining and its transformation into a powerful tool that adds value to business and research. Data mining makes it possible to explore and find unseen connections between variables and facts observed in different domains, helping us to better understand reality. The programming methods and frameworks used to analyse data have evolved over time. Currently, pipelining schemes are the most reliable way of analysing data and, as a result, several important companies now offer this kind of service. Moreover, several frameworks compatible with different programming languages are available for developing computational pipelines, and many research studies have addressed the optimization of data processing speed. However, as this study shows, early error detection techniques and developer support mechanisms are very limited in these frameworks. In this context, this study introduces several improvements: constraints of different types for the early detection of errors, functions that facilitate debugging of concrete tasks included in a pipeline, the invalidation of erroneous instances, and a burst-processing scheme. By adding these functionalities, we developed Big Data Pipelining for Java (BDP4J, https://github.com/sing-group/bdp4j), a fully functional new pipelining framework that shows the potential of these features.
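
As a rough illustration of two ideas named in the abstract above, the following sketch checks a type constraint between consecutive tasks before any data is processed, and invalidates instances that fail inside a task instead of aborting the whole run. It is not BDP4J's API (BDP4J is a Java framework); every name used here (`Instance`, `Task`, `check_pipeline`, `run_pipeline`) is hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Instance:
    """One unit of data flowing through the pipeline (hypothetical name)."""
    data: Any
    valid: bool = True


@dataclass
class Task:
    """A pipeline step with declared input and output types (hypothetical name)."""
    name: str
    input_type: type
    output_type: type
    func: Callable[[Any], Any]


def check_pipeline(tasks: List[Task]) -> None:
    """Early error detection: fail before processing if adjacent tasks mismatch."""
    for prev, nxt in zip(tasks, tasks[1:]):
        if not issubclass(prev.output_type, nxt.input_type):
            raise TypeError(
                f"{prev.name} outputs {prev.output_type.__name__}, "
                f"but {nxt.name} expects {nxt.input_type.__name__}"
            )


def run_pipeline(tasks: List[Task], instances: List[Instance]) -> List[Instance]:
    """Run every task on every valid instance; invalidate instances that fail."""
    check_pipeline(tasks)
    for task in tasks:
        for inst in instances:
            if not inst.valid:
                continue  # skip instances invalidated by an earlier task
            try:
                inst.data = task.func(inst.data)
            except Exception:
                inst.valid = False  # invalidate instead of aborting the run
    return instances
```

In this sketch a pipeline whose second task expects a type the first task does not produce is rejected by `check_pipeline` before any instance is touched, while a malformed instance is only marked invalid and the remaining instances continue through the pipeline.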

Re-UNIR

DIALNET

Improving pipelining tools for pre-processing data

Author: Lage Yeray
Laza Fidalgo Rosalía
Méndez Reboredo José Ramón
Novo Lourés María
Pavón Rial Maria Reyes
Ruano Ordás David Alfonso
Publication venue: Sistemas Informáticos de Nova Xeración
Publication date: 04/12/2023

Investigo

Fine-Grain Interoperability of Scientific Workflows in Distributed Computing Infrastructures

Author: David Rogers
E Deelman
I Taylor
Ian Harvey
Ian Taylor
J Montagnat
Johan Montagnat
Kassian Plankensteiner
Matthias Janetschek
P Kacsuk
PD Wells
Péter Kacsuk
Radu Prodan
T Glatard
Thomas Fahringer
WB Dobrusky
Ákos Balaskó
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Today there exists a wide variety of scientific workflow management systems, each designed to fulfill the needs of a certain scientific community. Unfortunately, once a workflow application has been designed in one particular system, it becomes very hard to share it with users working with different systems. Portability of workflows and interoperability between current systems barely exist. In this work, we present the fine-grained interoperability solution proposed in the SHIWA European project, which brings together four representative European workflow systems: ASKALON, MOTEUR, WS-PGRADE, and Triana. The proposed interoperability is realised at two levels of abstraction: abstract and concrete. At the abstract level, we propose a generic Interoperable Workflow Intermediate Representation (IWIR) that can be used as a common bridge for translating workflows between different languages, independent of the underlying distributed computing infrastructure. At the concrete level, we propose a bundling technique that aggregates the abstract IWIR representation and concrete task representations to enable workflow instantiation, execution and scheduling. We illustrate case studies using two real workflow applications designed in a native environment and then translated and executed by a foreign workflow system in a foreign distributed computing infrastructure. © 2013 Springer Science+Business Media Dordrecht

CiteSeerX

Crossref

SZTAKI Publication Repository

Online Research @ Cardiff

HAL-UNICE

University of Innsbruck Digital Library

The architecture of discovery net : towards grid-based discovery services

Author: Wendel Patrick
Publication date: 01/01/2008

Imperial Users onl

Spiral - Imperial College Digital Repository

Streaming Support for Data Intensive Cloud-Based Sequence Analysis

Author: Abouelhoda Mohamed
Bruggmann Rémy
El-Kalioby Mohamed
Issa Shadi A.
Kienzler Romeo
Tonellato Peter J.
Wall Dennis
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2013

Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with limited or no infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks in which the NGS sequences can be processed independently of one another. We also provide the elastream package, which supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.
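
The streaming idea in the abstract above, processing independent sequences while the rest of the data is still in transit, can be sketched with Python generators. This is only an illustration under simplifying assumptions: it does not use or reproduce the elastream package, incoming data is modelled as an iterable of byte chunks with one sequence per line, and the function names (`records_from_chunks`, `gc_content`, `process_stream`) are hypothetical. GC content stands in for any per-sequence analysis.

```python
from typing import Iterable, Iterator


def records_from_chunks(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield complete newline-terminated records as soon as they have arrived."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; the tail may be partial.
        *complete, buffer = buffer.split(b"\n")
        for line in complete:
            if line:
                yield line.decode("ascii")
    if buffer:
        yield buffer.decode("ascii")  # final record without trailing newline


def gc_content(seq: str) -> float:
    """Fraction of G and C bases; a stand-in for any per-sequence analysis."""
    return sum(base in "GC" for base in seq) / len(seq)


def process_stream(chunks: Iterable[bytes]) -> Iterator[float]:
    """Analyse each record independently while later chunks are still in transit."""
    for seq in records_from_chunks(chunks):
        yield gc_content(seq)
```

Because each stage is a generator, a result for the first sequence is available as soon as its bytes have arrived, without waiting for the full transfer. This only works because the sequences are assumed to be independent, which is the class of tasks the abstract targets.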

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

Bern Open Repository and Information System (BORIS)