13 research outputs found

    BIGhybrid - A Toolkit for Simulating MapReduce on Hybrid Infrastructures

    Get PDF
    Cloud computing has increasingly been used as a platform for running large business and data processing applications. Although clouds have become highly popular, when it comes to data processing, the cost of usage is not negligible. Conversely, Desktop Grids, have been used by a plethora of projects, taking advantage of the high number of resources provided for free by volunteers. Merging cloud computing and desktop grids into hybrid infrastructure can provide a feasible low-cost solution for big data analysis. Although frameworks like MapReduce have been conceived to exploit commodity hardware, their use on hybrid infrastructure poses some challenges due to large resource heterogeneity and high churn rate. This study introduces BIGhybrid a toolkit to simulate MapReduce on hybrid environments. The main goal is to provide a framework for developers and system designers to address the issues of hybrid MapReduce. In this paper, we describe the framework which simulates the assembly of two existing middleware: BitDew- MapReduce for Desktop Grids and Hadoop-BlobSeer for Cloud Computing. Experimental results included in this work demonstrate the feasibility of our approach

    BIGhybrid: A Simulator for MapReduce Applications in Hybrid Distributed Infrastructures Validated with the Grid5000 Experimental Platform

    Get PDF
    International audienceSUMMARY Cloud computing has increasingly been used as a platform for running large business and data processing applications. Conversely, Desktop Grids have been successfully employed in a wide range of projects, because they are able to take advantage of a large number of resources provided free of charge by volunteers. A hybrid infrastructure created from the combination of Cloud and Desktop Grids infrastructures can provide a low-cost and scalable solution for Big Data analysis. Although frameworks like MapReduce have been designed to exploit commodity hardware, their ability to take advantage of a hybrid infrastructure poses significant challenges due to their large resource heterogeneity and high churn rate. In this paper is proposed BIGhybrid, a simulator for two existing classes of MapReduce runtime environments: BitDew-MapReduce designed for Desktop Grids and BlobSeer-Hadoop designed for Cloud computing, where the goal is to carry out accurate simulations of MapReduce executions in a hybrid infrastructure composed of Cloud computing and Desktop Grid resources. This work describes the principles of the simulator and describes the validation of BigHybrid with the Grid5000 experimental platform. Owing to BigHybrid, developers can investigate and evaluate new algorithms to enable MapReduce to be executed in hybrid infrastructures. This includes topics such as resource allocation and data splitting. Concurrency and Computation: Practice and Experienc

    Adding Storage Simulation Capacities to the SimGrid Toolkit: Concepts, Models, and API

    Get PDF
    International audienceFor each kind of distributed computing infrastructures, i.e., clusters, grids, clouds, data centers, or supercomputers, storage is a essential component to cope with the tremendous increase in scientific data production and the ever-growing need for data analysis and preservation. Understanding the performance of a storage subsystem or dimensioning it properly is an important concern for which simulation can help by allowing for fast, fully repeatable, and configurable experiments for arbitrary hypothetical scenarios. However, most simulation frameworks tailored for the study of distributed systems offer no or little abstractions or models of storage resources.In this paper, we detail the extension of SimGrid, a versatile toolkit for the simulation of large-scale distributed computing systems, with storage simulation capacities. We first define the required abstractions and propose a new API to handle storage components and their contents in SimGrid-based simulators. Then we characterize the performance of the fundamental storage component that are disks and derive models of these resources. Finally we list several concrete use cases of storage simulations in clusters, grids, clouds, and data centers for which the proposed extension would be beneficial

    MT-EA4Cloud: A Methodology For testing and optimising energy-aware cloud systems

    Full text link
    Currently, using conventional techniques for checking and optimising the energy consumption in cloud systems is unpractical, due to the massive computational resources required. An appropriate test suite focusing on the parts of the cloud to be tested must be efficiently synthesised and executed, while the correctness of the test results must be checked. Additionally, alternative cloud configurations that optimise the energetic consumption of the cloud must be generated and analysed accordingly, which is challenging. To solve these issues we present MT-EA4Cloud, a formal approach to check the correctness – from an energy-aware point of view – of cloud systems and optimise their energy consumption. To make the checking of energy consumption practical, MT-EA4Cloud combines metamorphic testing, evolutionary algorithms and simulation. Metamorphic testing allows to formally model the underlying cloud infrastructure in the form of metamorphic relations. We use metamorphic testing to alleviate both the reliable test set problem, generating appropriate test suites focused on the features reflected in the metamorphic relations, and the oracle problem, using the metamorphic relations to check the generated results automatically. MT-EA4Cloud uses evolutionary algorithms to efficiently guide the search for optimising the energetic consumption of cloud systems, which can be calculated using different cloud simulatorsThis work was supported by the Spanish MINECO/FEDER projects DArDOS, FAME and MASSIVE under Grants TIN2015-65845-C3-1-R, RTI2018-093608-B-C31 and RTI2018-095255- B-I00, and the Comunidad de Madrid project FORTE-CM under grant S2018/TCS-4314. The first author is also supported by the Universidad Complutense de Madrid Santander Universidades grant (CT17/17-CT18/17

    Bridging a Gap Between Research and Production: Contributions to Scheduling and Simulation

    Get PDF
    Large scale distributed computing infrastructures (e.g., data centers, grids, or clouds) are used by scientists from various domains to produce outstanding research results, such as the discovery of the Higgs Boson in High Energy Physics. These infrastructures are also studied by Computer Scientists to produce their own set of scientific results. Ideally, a virtuous circle should exist between Domain and Computer Scientists: the former raising challenges that could be addressed by the latter. Unfortunately, in many occasions, a gap exists that prevents such an ideal and fostering collaboration. This habilitation covers research works conducted in the fields of scheduling and simulation that contribute to the filling of this gap. It discusses the necessary conditions to achieve this goal and details concrete initiatives in this endeavor

    MT-EA4Cloud: A Methodology For testing and optimising energy-aware cloud systems

    Get PDF
    Currently, using conventional techniques for checking and optimising the energy consumption in cloud systems is unpractical, due to the massive computational resources required. An appropriate test suite focusing on the parts of the cloud to be tested must be efficiently synthesised and executed, while the correctness of the test results must be checked. Additionally, alternative cloud configurations that optimise the energetic consumption of the cloud must be generated and analysed accordingly, which is challenging. To solve these issues we present MT-EA4Cloud, a formal approach to check the correctness – from an energy-aware point of view – of cloud systems and optimise their energy consumption. To make the checking of energy consumption practical, MT-EA4Cloud combines metamorphic testing, evolutionary algorithms and simulation. Metamorphic testing allows to formally model the underlying cloud infrastructure in the form of metamorphic relations. We use metamorphic testing to alleviate both the reliable test set problem, generating appropriate test suites focused on the features reflected in the metamorphic relations, and the oracle problem, using the metamorphic relations to check the generated results automatically. MT-EA4Cloud uses evolutionary algorithms to efficiently guide the search for optimising the energetic consumption of cloud systems, which can be calculated using different cloud simulators

    Scheduling for Large Scale Distributed Computing Systems: Approaches and Performance Evaluation Issues

    Get PDF
    Although our everyday life and society now depends heavily oncommunication infrastructures and computation infrastructures,scientists and engineers have always been among the main consumers ofcomputing power. This document provides a coherent overview of theresearch I have conducted in the last 15 years and which targets themanagement and performance evaluation of large scale distributedcomputing infrastructures such as clusters, grids, desktop grids,volunteer computing platforms, ... when used for scientific computing.In the first part of this document, I present how I have addressedscheduling problems arising on distributed platforms (like computinggrids) with a particular emphasis on heterogeneity and multi-userissues, hence in connection with game theory. Most of these problemsare relaxed from a classical combinatorial optimization formulationinto a continuous form, which allows to easily account for keyplatform characteristics such as heterogeneity or complex topologywhile providing efficient practical and distributed solutions.The second part presents my main contributions to the SimGrid project,which is a simulation toolkit for building simulators of distributedapplications (originally designed for scheduling algorithm evaluationpurposes). It comprises a unified presentation of how the questions ofvalidation and scalability have been addressed in SimGrid as well asthoughts on specific challenges related to methodological aspects andto the application of SimGrid to the HPC context
    corecore