    SimGrid: a Sustained Effort for the Versatile Simulation of Large Scale Distributed Systems

    In this paper we present SimGrid, a toolkit for the versatile simulation of large-scale distributed systems, whose development has been sustained for the last fifteen years. Over this period, SimGrid has evolved from a one-laboratory project in the U.S. into a scientific instrument developed by an international collaboration. The keys to making this evolution possible have been securing funding, improving the quality of the software, and increasing the user base. In this paper we describe how we have been able to make advances on all three fronts, on which we plan to intensify our efforts over the upcoming years. Comment: 4 pages, submission to WSSSPE'1

    GRAS: a Research and Development framework for Grid services

    Grid platforms federate large numbers of resources across several organizations. While their promise is great, these platforms have proven challenging to use because of their inherent heterogeneity and dynamic characteristics. Therefore, grid application development is possible only if robust distributed service infrastructures, e.g. for resource and data discovery, resource monitoring or application deployment, are available. These infrastructures, which are large-scale, distributed, loosely coupled applications, are very difficult to design, develop and tune. This paper presents the Grid Reality And Simulation (GRAS) framework, which allows grid developers to first implement and experiment with such an infrastructure in simulation, benefiting from a controlled and fast environment. The infrastructure can then be deployed in situ without code modification. We first detail the design goals and the implementation of GRAS, and contrast them with the state of the art. We then present a case study to highlight the fundamentals of GRAS and illustrate its ease of use. In addition, we quantify the complexity of a code example using either GRAS or several other communication solutions. We also conduct tests over LAN and WAN networks to assess performance. We find that the code using GRAS is simpler and shorter than with any other solution, while achieving better performance than most of the other solutions.
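    GRAS's central claim is that the same application code runs first in simulation and then in situ. The Python sketch below illustrates only that idea, under assumptions of our own: the classes `SimTransport` and `SocketTransport`, the function `exchange`, and the latency and bandwidth figures are hypothetical and are not the GRAS API. The application function talks to an abstract send/recv interface, and only the backend changes: an in-memory link with a virtual clock, or a real local socket pair.

```python
import socket
from collections import deque


class SimTransport:
    """Simulated link: messages stay in memory and time is virtual (hypothetical model)."""

    def __init__(self, latency=0.01, bandwidth=1e6):
        self.queue = deque()
        self.clock = 0.0            # simulated seconds elapsed
        self.latency = latency      # seconds per message (assumed value)
        self.bandwidth = bandwidth  # bytes per second (assumed value)

    def send(self, payload: bytes) -> None:
        # Advance virtual time instead of actually waiting.
        self.clock += self.latency + len(payload) / self.bandwidth
        self.queue.append(payload)

    def recv(self) -> bytes:
        return self.queue.popleft()


class SocketTransport:
    """Real link over a local socket pair, using 4-byte length-prefixed frames."""

    def __init__(self):
        self.out_sock, self.in_sock = socket.socketpair()

    def send(self, payload: bytes) -> None:
        self.out_sock.sendall(len(payload).to_bytes(4, "big") + payload)

    def recv(self) -> bytes:
        size = int.from_bytes(self._read_exactly(4), "big")
        return self._read_exactly(size)

    def _read_exactly(self, n: int) -> bytes:
        data = b""
        while len(data) < n:
            chunk = self.in_sock.recv(n - len(data))
            if not chunk:
                raise ConnectionError("socket closed before the frame was complete")
            data += chunk
        return data

    def close(self) -> None:
        self.out_sock.close()
        self.in_sock.close()


def exchange(transport, message: str) -> str:
    """Application logic: identical whichever transport is plugged in."""
    transport.send(message.encode())
    return transport.recv().decode()
```

    Both `exchange(SimTransport(), "ping")` and `exchange(SocketTransport(), "ping")` return `"ping"`; only the simulated backend advances a virtual clock, which is what makes controlled and fast experiments possible before deployment.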

    SimGrid: a Generic Framework for Large-Scale Distributed Experiments

    In this paper we describe a comprehensive simulation framework, SimGrid, for the simulation of distributed applications on distributed platforms. Our goal is to describe the salient capabilities of SimGrid and explain how they allow users to perform simulations for a wide range of applications and platforms.

    Dynamic Performance Forecasting for Network-Enabled Servers in a Heterogeneous Environment

    This paper presents a tool for dynamically forecasting the performance of Network-Enabled Servers. FAST (Fast Agent's System Timer) is a software package that allows client applications to obtain accurate forecasts of communication and computation times and of memory use in a heterogeneous environment. It relies on low-level software packages, i.e., network and host monitoring tools, and on some of our developments in modeling computation routines. The FAST internals and user interface are presented, and the execution time predicted by FAST is compared with the measured time of a complex matrix multiplication executed on a heterogeneous platform.
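    FAST combines monitored network figures with models of computation routines to predict execution time. The sketch below illustrates that combination only, under assumptions of our own: a cubic cost model for matrix multiplication fitted by least squares to benchmark samples, and a latency-plus-size-over-bandwidth model for transfers, where latency and bandwidth would come from a monitoring tool. The function names, the model shape and the example numbers are hypothetical, not FAST's interface or measurements.

```python
def fit_cubic_coefficient(samples):
    """Least-squares fit of t = a * n**3 to (n, seconds) benchmark samples.

    Minimizing sum((t - a*x)**2) with x = n**3 gives a = sum(x*t) / sum(x*x).
    """
    numerator = sum(t * n**3 for n, t in samples)
    denominator = sum(n**6 for n, _ in samples)
    return numerator / denominator


def forecast(n, a, latency, bandwidth, word_bytes=8):
    """Predicted time to ship two n x n operands, multiply them, and return the result.

    latency (s) and bandwidth (bytes/s) stand for values a monitoring tool would report.
    """
    compute = a * n**3                                     # modeled computation time
    one_matrix = latency + n * n * word_bytes / bandwidth  # one n x n transfer
    transfer = 3 * one_matrix                              # two inputs + one output
    return compute + transfer
```

    For instance, samples `[(100, 2e-3), (200, 1.6e-2)]` yield `a = 2e-9`, and `forecast(1000, 2e-9, 0.0, 1e8)` predicts about 2.24 s: 2 s of computation plus 0.24 s for three 8 MB transfers.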

    GRAS: a Research and Development Framework for Grid and P2P Infrastructures

    Distributed service architectures are mandatory to handle the platform scale and dynamicity that hinder the development of grid and P2P applications. These large-scale distributed applications are difficult to design, develop and tune because of both theoretical and practical issues. This paper presents the GRAS framework, which allows developers to first implement and experiment with such an infrastructure in simulation, benefiting from a controlled environment. The infrastructure can then be deployed in situ without code modification. We detail our design goals and contrast them with the state of the art. We study the exchange of a message (from the Pastry protocol) using either GRAS or several other solutions. We quantify both code complexity and performance, and find that GRAS performs better on both metrics.

    A Simple Model of Communication APIs – Application to Dynamic Partial-order Reduction

    We are interested in the verification, using model checking, of distributed programs that communicate asynchronously over standard communication APIs such as MPI. This is feasible only if the set of executions that the model checker explores is aggressively reduced to a subset of representative executions, using techniques such as dynamic partial-order reduction (DPOR). We propose a small set of core primitives in terms of which such APIs can be defined, and we formally specify these primitives in TLA+. From this specification we derive theorems about the (in)dependence of invocations of the primitives, and use them in a DPOR-based verifier that runs within SimGrid, a simulation framework for distributed programming. Our preliminary experimental results indicate that we obtain good reductions, even though complex network operations are implemented in terms of the core communication primitives.
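    Why an independence relation pays off can be seen by brute force on a toy example. The sketch below is not DPOR (which avoids enumerating all interleavings), and its dependency rule is a hypothetical simplification, not the theorems derived from the TLA+ specification: two actions are treated as dependent when they belong to the same process or touch the same mailbox. Interleavings that order every dependent pair the same way are equivalent, so a reduced exploration needs only one representative per class.

```python
from itertools import combinations


def make_actions(proc, mailboxes):
    """Build one process's action sequence; each action touches one mailbox."""
    return [{"id": f"{proc}#{i}", "proc": proc, "mbox": m}
            for i, m in enumerate(mailboxes)]


def interleavings(seqs):
    """Yield every interleaving of the per-process action sequences."""
    if all(not s for s in seqs):
        yield []
        return
    for i, s in enumerate(seqs):
        if s:
            rest = seqs[:i] + [s[1:]] + seqs[i + 1:]
            for tail in interleavings(rest):
                yield [s[0]] + tail


def dependent(a, b):
    # Hypothetical rule: same process, or both actions use the same mailbox.
    return a["proc"] == b["proc"] or a["mbox"] == b["mbox"]


def equivalence_key(run):
    """Ordered dependent pairs; runs with equal keys are equivalent."""
    return frozenset((x["id"], y["id"])
                     for x, y in combinations(run, 2) if dependent(x, y))


seqs = [make_actions("p0", ["m1", "m2"]),
        make_actions("p1", ["m1"]),
        make_actions("p2", ["m3"])]
runs = list(interleavings(seqs))
classes = {equivalence_key(run) for run in runs}
print(len(runs), len(classes))  # 12 2
```

    Only the two accesses to mailbox `m1` conflict, so the 12 interleavings collapse into 2 classes; a DPOR explorer aims to visit roughly one run per class without generating the other runs first.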

    Assessing the Performance of MPI Applications Through Time-Independent Trace Replay

    Simulation is a popular approach to obtaining objective performance indicators on platforms that are not at one's disposal. It may help in dimensioning compute clusters in large computing centers. In this work we present a framework for the off-line simulation of MPI applications. Its main originality with regard to the literature is that it relies on time-independent execution traces. This allows us to completely decouple the acquisition process from the actual replay of the traces in a simulation context. As a result, we are able to acquire traces for large application instances without being limited to an execution on a single compute cluster. Finally, our framework is built on top of a scalable, fast, and validated simulation kernel. In this paper, we present the time-independent trace format used, investigate several acquisition strategies, detail the trace replay tool we developed, and assess the quality of our simulation framework in terms of accuracy, acquisition time, simulation time, and trace size.
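    A time-independent trace records volumes (flops computed, bytes sent) rather than timestamps, so time is only computed at replay, from platform parameters. The sketch below assumes a hypothetical one-action-per-line format (`<process> compute <flops>`, `<process> send <dst> <bytes>`, `<process> recv <src>`) and a deliberately naive platform model with no contention, where a send blocks its sender for the full transfer time; it is neither the paper's trace format nor SimGrid's network model.

```python
from collections import defaultdict, deque


def parse(trace_lines):
    """Group trace actions by process id, preserving per-process order."""
    actions = defaultdict(deque)
    for line in trace_lines:
        fields = line.split()
        actions[fields[0]].append(fields[1:])
    return actions


def replay(trace_lines, flops_per_sec=1e9, latency=1e-4, bandwidth=1e8):
    """Return the simulated completion time of each process (naive model)."""
    actions = parse(trace_lines)
    clock = {proc: 0.0 for proc in actions}
    in_flight = defaultdict(deque)  # (src, dst) -> arrival times, FIFO
    while any(actions.values()):
        progressed = False
        for proc, queue in actions.items():
            while queue:
                kind, *args = queue[0]
                if kind == "compute":
                    clock[proc] += float(args[0]) / flops_per_sec
                elif kind == "send":
                    dst, size = args[0], float(args[1])
                    clock[proc] += latency + size / bandwidth
                    in_flight[(proc, dst)].append(clock[proc])
                elif kind == "recv":
                    src = args[0]
                    if not in_flight[(src, proc)]:
                        break  # matching send not replayed yet: try another process
                    arrival = in_flight[(src, proc)].popleft()
                    clock[proc] = max(clock[proc], arrival)
                else:
                    raise ValueError(f"unknown action: {kind}")
                queue.popleft()
                progressed = True
        if not progressed:
            raise RuntimeError("deadlock: a recv has no matching send")
    return clock
```

    Replaying `["p0 compute 1e9", "p0 send p1 1e6", "p1 recv p0", "p1 compute 5e8"]` with the default parameters gives about 1.0101 s for `p0` and 1.5101 s for `p1`; replaying the same trace with `flops_per_sec=2e9` roughly halves the computation part, which is the decoupling of acquisition from replay in miniature.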

    System-level State Equality Detection for the Dynamic Verification of Distributed Applications

    This poster presents our solution for detecting the state equality of legacy MPI applications directly at the system level, which is important for formally verifying these applications.
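    State equality detection matters because a stateful model checker can stop exploring whenever it reaches a state it has already visited. The sketch below shows only that pruning on a toy system of two bounded counters; packing the state with `struct` is a hypothetical stand-in for a system-level memory snapshot, and comparing SHA-256 digests rather than full snapshots is a simplification a real checker would need to guard against collisions. It is not the poster's technique.

```python
import hashlib
import struct


def snapshot(state):
    """Hypothetical stand-in for a raw memory snapshot: pack the state into bytes."""
    return struct.pack("!ii", *state)


def successors(state):
    """Toy system: two processes, each may increment its own counter up to 2."""
    x, y = state
    if x < 2:
        yield (x + 1, y)
    if y < 2:
        yield (x, y + 1)


def explore(initial, detect_equal_states):
    """Depth-first exploration; returns the number of states expanded."""
    visited = set()
    expanded = 0
    stack = [initial]
    while stack:
        state = stack.pop()
        if detect_equal_states:
            digest = hashlib.sha256(snapshot(state)).digest()
            if digest in visited:
                continue  # same bytes as an earlier state: prune this branch
            visited.add(digest)
        expanded += 1
        stack.extend(successors(state))
    return expanded
```

    Without equality detection, `explore((0, 0), False)` expands every interleaving prefix (19 nodes); with it, `explore((0, 0), True)` expands each of the 9 distinct states once.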

    Byte-Range Asynchronous Locking in Distributed Settings

    This paper investigates a mutual exclusion algorithm for distributed systems. We introduce a new algorithm based on the Naimi-Trehel algorithm, which takes advantage of the distributed approach of Naimi-Trehel while allowing partial locks to be requested. Such ranged locks offer semantics close to POSIX file locking, where threads lock parts of a shared file. We evaluate our algorithm by comparing its performance with that of the original Naimi-Trehel algorithm and of a centralized mutual exclusion algorithm. The performance metric considered is the average time to obtain a lock.
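    The abstract does not describe the algorithm's message exchanges, so the sketch below illustrates only the ranged-lock semantics: which requests may be granted concurrently. It is a hypothetical centralized manager, not the Naimi-Trehel-based algorithm, and it assumes exclusive locks only, half-open byte ranges `[start, end)`, one range per owner, and FIFO ordering among overlapping requests.

```python
from collections import deque


class RangeLockManager:
    """Centralized, exclusive byte-range lock manager (illustrative only)."""

    def __init__(self):
        self.held = {}          # owner -> (start, end), half-open range
        self.waiting = deque()  # (owner, start, end) in arrival order

    @staticmethod
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    def _conflicts_with_held(self, rng):
        return any(self.overlaps(rng, h) for h in self.held.values())

    def request(self, owner, start, end):
        """Grant immediately if nothing overlapping is held or queued; else queue."""
        rng = (start, end)
        queued_ahead = any(self.overlaps(rng, (s, e)) for _, s, e in self.waiting)
        if self._conflicts_with_held(rng) or queued_ahead:
            self.waiting.append((owner, start, end))
            return False
        self.held[owner] = rng  # simplification: one range per owner
        return True

    def release(self, owner):
        """Release owner's range; return the owners granted as a result, in order."""
        del self.held[owner]
        granted, still_waiting = [], deque()
        for o, s, e in self.waiting:
            rng = (s, e)
            # Do not let a request overtake an earlier overlapping one still waiting.
            blocked = self._conflicts_with_held(rng) or any(
                self.overlaps(rng, (ws, we)) for _, ws, we in still_waiting)
            if blocked:
                still_waiting.append((o, s, e))
            else:
                self.held[o] = rng
                granted.append(o)
        self.waiting = still_waiting
        return granted
```

    Locks on disjoint ranges such as `[0, 100)` and `[100, 200)` are granted together, whereas a whole-file lock would serialize them; this extra concurrency is what ranged locks offer over plain mutual exclusion.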

    The Java Learning Machine: A Learning Management System Dedicated To Computer Science Education

    This paper presents the Java Learning Machine (JLM), a platform dedicated to computer programming education. This generic platform supports teachers in creating programming microworlds suited to their courses. It features an integrated graphical environment that provides a short feedback loop to students, in order to improve the effectiveness of the autonomous learning process. This paper presents the motivations behind the platform and its main functionalities.