
    Multi-objective scheduling of Scientific Workflows in multisite clouds

    Clouds appear as appropriate infrastructures for executing Scientific Workflows (SWfs). A cloud is typically made of several sites (or data centers), each with its own resources and data. Thus, it becomes important to be able to execute some SWfs at more than one cloud site because of the geographical distribution of data or of available resources among different cloud sites. Therefore, a major problem is how to execute a SWf in a multisite cloud while reducing both execution time and monetary costs. In this paper, we propose a general solution based on multi-objective scheduling for executing SWfs in a multisite cloud. The solution consists of a multi-objective cost model including execution time and monetary costs, a Single Site Virtual Machine (VM) Provisioning approach (SSVP), and ActGreedy, a multisite scheduling approach. We present an experimental evaluation based on the execution of the SciEvol SWf in the Microsoft Azure cloud. The results reveal that our scheduling approach significantly outperforms two adapted baseline algorithms (which we propose by adapting two existing algorithms) and that its scheduling time is reasonable compared with genetic and brute-force algorithms. The results also show that our cost model is accurate and that SSVP generates better VM provisioning plans than an existing approach.

    Work partially funded by the EU H2020 Programme and MCTI/RNP-Brazil (HPC4E, grant agreement number 689772), CNPq, FAPERJ, INRIA (MUSIC project) and Microsoft (ZcloudFlow project), and performed in the context of the Computational Biology Institute (www.ibc-montpellier.fr). We would like to thank Kary Ocaña for her help in modeling and executing the SciEvol SWf.
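
    To make the idea of a multi-objective cost model concrete, the sketch below combines normalized execution time and monetary cost into a single weighted score and greedily picks the best candidate plan. It is only an illustration of the weighting idea under assumed names and weights (Plan, pick_plan, w_time); it is not the paper's SSVP or ActGreedy implementation.

```python
# Minimal sketch (illustrative assumptions, not the paper's algorithms):
# combine normalized execution time and monetary cost with a user-chosen
# weight and pick the plan with the lowest combined cost.
from dataclasses import dataclass

@dataclass
class Plan:
    site: str            # cloud site / data center
    exec_time_s: float   # estimated execution time in seconds
    cost_usd: float      # estimated monetary cost in US dollars

def weighted_cost(plan: Plan, t_max: float, c_max: float, w_time: float = 0.5) -> float:
    """Normalize both objectives to [0, 1] and combine them with weight w_time."""
    return w_time * (plan.exec_time_s / t_max) + (1 - w_time) * (plan.cost_usd / c_max)

def pick_plan(plans: list[Plan], w_time: float = 0.5) -> Plan:
    """Greedily select the plan with the lowest weighted multi-objective cost."""
    t_max = max(p.exec_time_s for p in plans)
    c_max = max(p.cost_usd for p in plans)
    return min(plans, key=lambda p: weighted_cost(p, t_max, c_max, w_time))

if __name__ == "__main__":
    candidates = [
        Plan("site-A", exec_time_s=3600, cost_usd=4.0),
        Plan("site-B", exec_time_s=5400, cost_usd=2.5),
    ]
    print(pick_plan(candidates, w_time=0.7))  # weight favors execution time
```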

    Sharing scientific experiments and workflows in environmental applications

    Environmental applications have been stimulating cooperation among scientists from different disciplines. There are many examples where this cooperation takes place through the exchange of scientific resources such as data, programs and mathematical models. The LeSelect architecture supports environmental applications in which scientists may share their data and programs. We believe that, besides programs and data, models as well as experiments and workflows are scientific resources that need to be shared in environmental applications. Therefore, in this paper we propose an extension to the LeSelect architecture that allows the sharing of models, experiments and workflows.

    Grid Data Management: Open Problems and New Issues

    Initially developed for the scientific community, Grid computing is now gaining much interest in important areas such as enterprise information systems. This makes data management critical, since the techniques must scale up while addressing the autonomy, dynamicity and heterogeneity of the data sources. In this paper, we discuss the main open problems and new issues related to Grid data management. We first recall the main principles behind data management in distributed systems and the basic techniques. We then specify the requirements for Grid data management. Finally, we introduce the main techniques needed to address these requirements. This implies revisiting distributed database techniques in major ways, in particular by using P2P techniques.

    ProvLight: Efficient Workflow Provenance Capture on the Edge-to-Cloud Continuum

    Modern scientific workflows require hybrid infrastructures combining numerous decentralized resources on the IoT/Edge interconnected to Cloud/HPC systems (aka the Computing Continuum) to enable their optimized execution. Understanding and optimizing the performance of such complex Edge-to-Cloud workflows is challenging. Capturing the provenance of key performance indicators, with their related data and processes, may assist in understanding and optimizing workflow executions. However, the capture overhead can be prohibitive, particularly in resource-constrained devices such as the ones on the IoT/Edge. To address this challenge, based on a performance analysis of existing systems, we propose ProvLight, a tool to enable efficient provenance capture on the IoT/Edge. We leverage simplified data models, data compression and grouping, and lightweight transmission protocols to reduce overheads. We further integrate ProvLight into the E2Clab framework to enable workflow provenance capture across the Edge-to-Cloud Continuum. This integration makes E2Clab a promising platform for the performance optimization of applications through reproducible experiments. We validate ProvLight at a large scale with synthetic workloads on 64 real-life IoT/Edge devices in the FIT IoT LAB testbed. Evaluations show that ProvLight outperforms state-of-the-art systems like ProvLake and DfAnalyzer in resource-constrained devices. ProvLight is 26-37x faster to capture and transmit provenance data; uses 5-7x less CPU; 2x less memory; transmits 2x less data; and consumes 2-2.5x less energy. ProvLight and E2Clab are available as open-source tools.
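
    As a rough illustration of how grouping and compression cut per-record transmission overhead on constrained devices, the sketch below buffers provenance records into batches and compresses each batch before it is handed to a transport. The ProvenanceBuffer class, its fields and the batch size are hypothetical; this is not ProvLight's actual data model or API.

```python
# Illustrative sketch only (not ProvLight's API): group provenance records
# into batches and compress each batch before transmission.
import json
import time
import zlib
from typing import Optional

class ProvenanceBuffer:
    def __init__(self, batch_size: int = 50):
        self.batch_size = batch_size
        self.records = []

    def capture(self, task_id: str, attrs: dict) -> Optional[bytes]:
        """Record one provenance event; return a compressed batch when full."""
        self.records.append({"task": task_id, "ts": time.time(), **attrs})
        if len(self.records) >= self.batch_size:
            return self.flush()
        return None

    def flush(self) -> bytes:
        """Serialize and compress the buffered records, then clear the buffer."""
        payload = zlib.compress(json.dumps(self.records).encode("utf-8"))
        self.records.clear()
        return payload  # hand this to any lightweight transmission protocol

buf = ProvenanceBuffer(batch_size=2)
buf.capture("preprocess", {"cpu_s": 0.4})
packet = buf.capture("train", {"cpu_s": 1.2})
if packet is not None:
    print(f"compressed batch of {len(packet)} bytes ready to transmit")
```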

    Mediators Metadata Management Services: An Implementation Using GOA++ System

    The main contribution of this work is the development of a Metadata Manager to interconnect heterogeneous and autonomous information sources in a flexible, expandable and transparent way. Interoperability at the semantic level is achieved using an integration layer, structured hierarchically, based on the concept of Mediators. The services of a Mediator Metadata Manager (MMM) are specified and implemented using functions based on the Outlines of GOA++. The MMM services are available in the form of a GOA++ API and can be accessed remotely via CORBA or through local API calls.
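
    The mediator pattern described above can be pictured with a small sketch in which per-source wrappers hide heterogeneous sources behind a common query interface and the mediator integrates their answers. All class and method names are invented for illustration; the sketch does not use the real GOA++ or CORBA APIs.

```python
# Conceptual sketch of the mediator integration layer (hypothetical names,
# not the GOA++ or CORBA interfaces): wrappers translate a mediator-level
# query into each source's own dialect and the mediator merges the results.
from abc import ABC, abstractmethod

class SourceWrapper(ABC):
    @abstractmethod
    def query(self, keyword: str) -> list[dict]:
        """Translate a mediator-level query into the source's own dialect."""

class RelationalSource(SourceWrapper):
    def query(self, keyword: str) -> list[dict]:
        # A real wrapper would issue SQL against the local schema.
        return [{"source": "relational", "match": keyword}]

class FileSource(SourceWrapper):
    def query(self, keyword: str) -> list[dict]:
        # A real wrapper would scan files or a local index.
        return [{"source": "files", "match": keyword}]

class Mediator:
    def __init__(self, sources: list[SourceWrapper]):
        self.sources = sources

    def query(self, keyword: str) -> list[dict]:
        # Integrate the answers of all autonomous sources into one result.
        results = []
        for source in self.sources:
            results.extend(source.query(keyword))
        return results

print(Mediator([RelationalSource(), FileSource()]).query("rainfall"))
```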

    BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments

    Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives, using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the case studies' execution time by up to 98%. We also show how the application of machine learning techniques can enrich the analysis process.
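
    The kind of query that such a web application might abstract can be sketched as follows: a toy provenance table of task executions and an aggregate query reporting total runtime per workflow activity. The schema, table and column names are hypothetical assumptions, not BioWorkbench's actual provenance model.

```python
# Hedged illustration (assumed schema, not BioWorkbench's): a provenance
# query over a toy SQLite database reporting total runtime per activity.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE task_execution (
    workflow TEXT, activity TEXT, duration_s REAL)""")
conn.executemany(
    "INSERT INTO task_execution VALUES (?, ?, ?)",
    [("SwiftPhylo", "align", 120.0),
     ("SwiftPhylo", "build_tree", 340.5),
     ("SwiftPhylo", "align", 98.2)],
)

# Total execution time per activity, largest first.
for activity, total in conn.execute(
        """SELECT activity, SUM(duration_s)
           FROM task_execution
           GROUP BY activity
           ORDER BY SUM(duration_s) DESC"""):
    print(f"{activity}: {total:.1f} s")
```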

    Enhancing Energy Production with Exascale HPC Methods

    High Performance Computing (HPC) resources have become a key enabler for tackling more ambitious challenges in many disciplines. In this step forward, an explosion in available parallelism and the use of special-purpose processors are crucial. With such a goal, the HPC4E project applies new exascale HPC techniques to energy industry simulations, customizing them where necessary and going beyond the state of the art in the exascale HPC simulations required for different energy sources. In this paper, a general overview of these methods is presented, as well as some specific preliminary results.

    The research leading to these results has received funding from the European Union's Horizon 2020 Programme (2014-2020) under the HPC4E project (www.hpc4e.eu), grant agreement n° 689772, the Spanish Ministry of Economy and Competitiveness under the CODEC2 project (TIN2015-63562-R), and from the Brazilian Ministry of Science, Technology and Innovation through Rede Nacional de Pesquisa (RNP). Computer time on the Endeavour cluster was provided by Intel Corporation, which enabled us to obtain the presented experimental results in uncertainty quantification in seismic imaging.

    SARAVÁ: data sharing for online communities in P2P

    This paper describes SARAVÁ, a research project that aims at investigating new challenges in P2P data sharing for online communities. The major advantage of P2P is a completely decentralized approach to data sharing which does not require centralized administration. Users may be high in number and interested in different kinds of collaboration, sharing their knowledge, ideas, experiences, etc. Data sources can also be high in number, fairly autonomous, i.e. locally owned and controlled, and highly heterogeneous, with different semantics and structures. Our project deals with new, decentralized data management techniques that scale up while addressing the autonomy, dynamic behavior and heterogeneity of both users and data sources. In this context, we focus on two major problems: query processing with uncertain data and the management of scientific workflows.