Search CORE

13 research outputs found

Simulation of the Performance of MapReduce Jobs

Author: Tiago André da Rocha Ferreira
Publication venue
Publication date: 12/02/2016
Field of study

Repositório Aberto da Universidade do Porto

BIGhybrid - A Toolkit for Simulating MapReduce on Hybrid Infrastructures

Author: Fedak Gilles
Geyer Claudio
Santos dos Anjos Julio Cesar
Publication venue: HAL CCSD
Publication date: 09/09/2014
Field of study

Cloud computing has increasingly been used as a platform for running large business and data processing applications. Although clouds have become highly popular, when it comes to data processing, the cost of usage is not negligible. Conversely, Desktop Grids, have been used by a plethora of projects, taking advantage of the high number of resources provided for free by volunteers. Merging cloud computing and desktop grids into hybrid infrastructure can provide a feasible low-cost solution for big data analysis. Although frameworks like MapReduce have been conceived to exploit commodity hardware, their use on hybrid infrastructure poses some challenges due to large resource heterogeneity and high churn rate. This study introduces BIGhybrid a toolkit to simulate MapReduce on hybrid environments. The main goal is to provide a framework for developers and system designers to address the issues of hybrid MapReduce. In this paper, we describe the framework which simulates the assembly of two existing middleware: BitDew- MapReduce for Desktop Grids and Hadoop-BlobSeer for Cloud Computing. Experimental results included in this work demonstrate the feasibility of our approach

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

BIGhybrid: A Simulator for MapReduce Applications in Hybrid Distributed Infrastructures Validated with the Grid5000 Experimental Platform

Author: Anjos Julio,
Fedak Gilles
Geyer Claudio
Publication venue: 'Wiley'
Publication date: 01/01/2015
Field of study

International audienceSUMMARY Cloud computing has increasingly been used as a platform for running large business and data processing applications. Conversely, Desktop Grids have been successfully employed in a wide range of projects, because they are able to take advantage of a large number of resources provided free of charge by volunteers. A hybrid infrastructure created from the combination of Cloud and Desktop Grids infrastructures can provide a low-cost and scalable solution for Big Data analysis. Although frameworks like MapReduce have been designed to exploit commodity hardware, their ability to take advantage of a hybrid infrastructure poses significant challenges due to their large resource heterogeneity and high churn rate. In this paper is proposed BIGhybrid, a simulator for two existing classes of MapReduce runtime environments: BitDew-MapReduce designed for Desktop Grids and BlobSeer-Hadoop designed for Cloud computing, where the goal is to carry out accurate simulations of MapReduce executions in a hybrid infrastructure composed of Cloud computing and Desktop Grid resources. This work describes the principles of the simulator and describes the validation of BigHybrid with the Grid5000 experimental platform. Owing to BigHybrid, developers can investigate and evaluate new algorithms to enable MapReduce to be executed in hybrid infrastructures. This includes topics such as resource allocation and data splitting. Concurrency and Computation: Practice and Experienc

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

Adding Storage Simulation Capacities to the SimGrid Toolkit: Concepts, Models, and API

Author: Lebre Adrien
Legrand Arnaud
Suter Frédéric
Veyre Pierre
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 04/05/2015
Field of study

International audienceFor each kind of distributed computing infrastructures, i.e., clusters, grids, clouds, data centers, or supercomputers, storage is a essential component to cope with the tremendous increase in scientific data production and the ever-growing need for data analysis and preservation. Understanding the performance of a storage subsystem or dimensioning it properly is an important concern for which simulation can help by allowing for fast, fully repeatable, and configurable experiments for arbitrary hypothetical scenarios. However, most simulation frameworks tailored for the study of distributed systems offer no or little abstractions or models of storage resources.In this paper, we detail the extension of SimGrid, a versatile toolkit for the simulation of large-scale distributed computing systems, with storage simulation capacities. We first define the required abstractions and propose a new API to handle storage components and their contents in SimGrid-based simulators. Then we characterize the performance of the fundamental storage component that are disks and derive models of these resources. Finally we list several concrete use cases of storage simulations in clusters, grids, clouds, and data centers for which the proposed extension would be beneficial

HAL-ENS-LYON

HAL-IN2P3

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL Mines Nantes

Hal-Diderot

MT-EA4Cloud: A Methodology For testing and optimising energy-aware cloud systems

Author: Cerro Cañizares Pablo
Lara Jaramillo Juan de
Llana Luis
Núñez Alberto
Publication venue: 'Elsevier BV'
Publication date: 01/05/2020
Field of study

Currently, using conventional techniques for checking and optimising the energy consumption in cloud systems is unpractical, due to the massive computational resources required. An appropriate test suite focusing on the parts of the cloud to be tested must be efficiently synthesised and executed, while the correctness of the test results must be checked. Additionally, alternative cloud configurations that optimise the energetic consumption of the cloud must be generated and analysed accordingly, which is challenging. To solve these issues we present MT-EA4Cloud, a formal approach to check the correctness – from an energy-aware point of view – of cloud systems and optimise their energy consumption. To make the checking of energy consumption practical, MT-EA4Cloud combines metamorphic testing, evolutionary algorithms and simulation. Metamorphic testing allows to formally model the underlying cloud infrastructure in the form of metamorphic relations. We use metamorphic testing to alleviate both the reliable test set problem, generating appropriate test suites focused on the features reflected in the metamorphic relations, and the oracle problem, using the metamorphic relations to check the generated results automatically. MT-EA4Cloud uses evolutionary algorithms to efficiently guide the search for optimising the energetic consumption of cloud systems, which can be calculated using different cloud simulatorsThis work was supported by the Spanish MINECO/FEDER projects DArDOS, FAME and MASSIVE under Grants TIN2015-65845-C3-1-R, RTI2018-093608-B-C31 and RTI2018-095255- B-I00, and the Comunidad de Madrid project FORTE-CM under grant S2018/TCS-4314. The first author is also supported by the Universidad Complutense de Madrid Santander Universidades grant (CT17/17-CT18/17

Biblos-e Archivo

Bridging a Gap Between Research and Production: Contributions to Scheduling and Simulation

Author: Suter Frédéric
Publication venue: HAL CCSD
Publication date: 10/12/2014
Field of study

Large scale distributed computing infrastructures (e.g., data centers, grids, or clouds) are used by scientists from various domains to produce outstanding research results, such as the discovery of the Higgs Boson in High Energy Physics. These infrastructures are also studied by Computer Scientists to produce their own set of scientific results. Ideally, a virtuous circle should exist between Domain and Computer Scientists: the former raising challenges that could be addressed by the latter. Unfortunately, in many occasions, a gap exists that prevents such an ideal and fostering collaboration. This habilitation covers research works conducted in the fields of scheduling and simulation that contribute to the filling of this gap. It discusses the necessary conditions to achieve this goal and details concrete initiatives in this endeavor

HAL-ENS-LYON

Thèses en Ligne

HAL-IN2P3

INRIA a CCSD electronic archive server

Hal-Diderot

MT-EA4Cloud: A Methodology For testing and optimising energy-aware cloud systems

Author: Cañizares Pablo C.
Lara Juan de
Llana Díaz Luis Fernando
Núñez Covarrubias Alberto
Publication venue: 'Elsevier BV'
Publication date: 01/05/2020
Field of study

Docta Complutense

Biblos-e Archivo

Scheduling for Large Scale Distributed Computing Systems: Approaches and Performance Evaluation Issues

Author: Legrand Arnaud
Publication venue: HAL CCSD
Publication date: 02/11/2015
Field of study

Although our everyday life and society now depends heavily oncommunication infrastructures and computation infrastructures,scientists and engineers have always been among the main consumers ofcomputing power. This document provides a coherent overview of theresearch I have conducted in the last 15 years and which targets themanagement and performance evaluation of large scale distributedcomputing infrastructures such as clusters, grids, desktop grids,volunteer computing platforms, ... when used for scientific computing.In the first part of this document, I present how I have addressedscheduling problems arising on distributed platforms (like computinggrids) with a particular emphasis on heterogeneity and multi-userissues, hence in connection with game theory. Most of these problemsare relaxed from a classical combinatorial optimization formulationinto a continuous form, which allows to easily account for keyplatform characteristics such as heterogeneity or complex topologywhile providing efficient practical and distributed solutions.The second part presents my main contributions to the SimGrid project,which is a simulation toolkit for building simulators of distributedapplications (originally designed for scheduling algorithm evaluationpurposes). It comprises a unified presentation of how the questions ofvalidation and scalability have been addressed in SimGrid as well asthoughts on specific challenges related to methodological aspects andto the application of SimGrid to the HPC context

Thèses en Ligne

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server