29 research outputs found

    Grid'5000: a large scale and highly reconfigurable grid experimental testbed

    Large-scale distributed systems such as Grids are difficult to study from theoretical models and simulators only. Most Grids deployed at large scale are production platforms that are inappropriate research tools because of their limited reconfiguration, control and monitoring capabilities. In this paper, we present Grid’5000, a 5000-CPU nationwide infrastructure for research in Grid computing. Grid’5000 is designed to provide a scientific tool for computer scientists similar to the large-scale instruments used by physicists, astronomers, and biologists. We describe the motivations, design considerations, architecture, control, and monitoring infrastructure of this experimental platform. We present configuration examples and performance results for the reconfiguration subsystem.

    Roborobo! a Fast Robot Simulator for Swarm and Collective Robotics

    Roborobo! is a multi-platform, highly portable robot simulator for large-scale collective robotics experiments. Roborobo! is coded in C++ and follows the KISS guideline ("Keep it simple"). Its only external dependency is therefore the widely available SDL library for fast 2D graphics. Roborobo! is based on a Khepera/ePuck model. It is targeted at fast single- and multi-robot simulation, and has already been used in more than a dozen published studies mainly concerned with evolutionary swarm robotics, including environment-driven self-adaptation and distributed evolutionary optimization, as well as online onboard embodied evolution and embodied morphogenesis. Comment: 2 pages, 1 figure.

    Enhanced Failure Detection Mechanism in MapReduce

    The popularity of the MapReduce programming model has increased the research community's interest in improving it. Among other directions, fault tolerance, and failure detection in particular, appears to be crucial but has not yet reached a satisfactory level. Motivated by this, I decided to devote my main research during this period to building a prototype system architecture for the MapReduce framework with a new failure detection service, comprising both an analytical (theoretical) part and an implementation part. I am confident that this work should lead the way for further contributions to failure detection in NoSQL application frameworks and in cloud storage systems in general.

    Evaluating WUW, a service to enhance users' satisfaction in Content-Based Peer-to-Peer Networks

    Nowadays, Peer-to-Peer (P2P) architectures are becoming more popular in content delivery applications thanks to valuable characteristics such as scalability, performance and low maintenance costs. In those systems, peers share their resources automatically (bandwidth, storage, etc.) and not only download content but also upload content to other peers organized in a neighbourhood. Each peer's neighbourhood is based mainly on QoS-related parameters (available bandwidth, number of connections, etc.) and the amount of exchanged content. We consider that peers are under the control of users, who are autonomous and free persons with rights, preferences and interests. As users' resources are the richness of P2P systems, we think it is important to satisfy their preferences beyond QoS. In this paper we present the first experimental results of WUW (What Users Want), a service located on top of a P2P layer and proposed to satisfy users' preferences during content exchange. In the current implementation we use the BitTorrent protocol to measure to what extent users' preferences influence the P2P behaviour when WUW is used. We describe how the experimental scenarios are built using the resources provided by Grid'5000. Our preliminary results are encouraging because they show a low overhead of WUW on the global content sharing performance.

    On the sustainability of large-scale computer science testbeds: the Grid'5000 case

    In this position paper, we look at the financial sustainability of Grid'5000. The duration of the project (over 12 years) owes more to successive investment decisions and continued support than to a well-understood and operated business model generating enough revenue to cover costs and investments. In this paper, we give an overview of a typical cost structure for a large-scale testbed, summarize our views in a few statements, and develop the pros and cons of a few funding sources. The way Grid'5000 is funded is detailed, before giving some data used to compute the unit cost of Grid'5000 resources.

    A Generic API for Load Balancing in Structured P2P Systems

    Real-world datasets are known to be highly skewed, often leading to significant load imbalance in the distributed systems that manage them. To address this issue, there exist almost as many load balancing strategies as there are different systems. When designing a scalable distributed system geared towards handling large amounts of information, it is often hard to anticipate which kind of strategy will be the most efficient at maintaining adequate performance in terms of response time, scalability and reliability at any time. Based on this observation, we describe the methodology behind the building of a generic API to implement and experiment with any strategy independently of the rest of the code, for instance before making a definitive choice. We then show how this API is compatible with well-known existing systems and their load balancing schemes. We also present results from our own distributed system, which targets the continuous storage of events structured according to the Semantic Web standards and later retrieved by interested parties. As such, our system constitutes a typical example of a Big Data environment.

    Cost Function based Event Triggered Model Predictive Controllers - Application to Big Data Cloud Services

    High-rate cluster reconfiguration is a costly issue in Big Data Cloud services. Current control solutions manage to scale the cluster according to the workload; however, they do not try to minimize the number of system reconfigurations. Event-based control is known to reduce the number of control updates, typically by waiting for the system states to degrade below a given threshold before reacting. However, computer science systems often have exogenous inputs (such as client connections) with delayed impacts, which make it possible to anticipate state degradation. In this paper, a novel event-triggered approach is proposed. The triggering mechanism relies on a Model Predictive Controller and is defined on the value of the optimal cost function instead of the state or output error. This controller reduces the number of control changes in the normal operation mode through constraints in the MPC formulation, but also ensures a very reactive behavior to changes in exogenous inputs. This novel control approach is evaluated using a model validated on a real Big Data system. The controller efficiently scales the cluster according to specifications while reducing its reconfigurations.

    Application du contrôle pour garantir la performance des systèmes Big Data (Applying control to guarantee the performance of Big Data systems)

    We are on the verge of an enormous data explosion, and the amount of data that companies must process keeps growing. To meet this challenge, Google developed MapReduce, a parallel programming model that is becoming the de facto tool for the analysis of Big Data systems. Although its use is, to some extent, already widespread in industry, guaranteeing the performance of such a complex system poses major problems, and managing it requires a high level of expertise. This article addresses these challenges by proposing the first autonomous system that guarantees response time constraints for a concurrent MapReduce workload. We develop the first dynamic model of a MapReduce cluster. In addition, a closed-loop controller is designed and implemented to guarantee a given response time. A feedforward controller is also added to improve the system's response in the presence of disturbances, in this case variations in the number of clients. The approach is validated online on a 40-node MapReduce cluster running an intensive Business Intelligence workload. Our experiments show that the resulting controller can guarantee the response time constraints.

    Grid computing: a case study in hybrid GMRES method

    Grid computing in general is a special type of parallel computing. It aims to deliver high-performance computing over distributed platforms for computation- and data-intensive applications by making use of a very large amount of resources. The GMRES method is widely used to solve large sparse linear systems. In this paper, we present an effective parallel hybrid asynchronous method, which combines the typical parallel GMRES method with the Least Squares method, which needs some eigenvalues obtained from a parallel Arnoldi process. We apply it on the Grid computing platform Grid5000. The numerical results show that this hybrid method has some advantages over the general GMRES method for some real or complex systems.