
    Fault Tolerant Adaptive Parallel and Distributed Simulation through Functional Replication

    This paper presents FT-GAIA, a software-based fault-tolerant parallel and distributed simulation middleware. FT-GAIA has been designed to reliably handle Parallel And Distributed Simulation (PADS) models, which are needed to properly simulate and analyze complex systems arising in any kind of scientific or engineering field. PADS takes advantage of multiple execution units running on multicore processors, clusters of workstations or HPC systems. However, large computing systems, such as HPC systems that include hundreds of thousands of computing nodes, have to handle frequent failures of some components. To cope with this issue, FT-GAIA transparently replicates simulation entities and distributes them on multiple execution nodes. This allows the simulation to tolerate crash-failures of computing nodes. Moreover, FT-GAIA offers some protection against Byzantine failures, since interaction messages among the simulated entities are replicated as well, so that the receiving entity can identify and discard corrupted messages. Results from an analytical model and from an experimental evaluation show that FT-GAIA provides a high degree of fault tolerance, at the cost of a moderate increase in the computational load of the execution units. Comment: arXiv admin note: substantial text overlap with arXiv:1606.0731
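    The message-filtering idea can be illustrated with a minimal majority-vote sketch. The abstract does not spell out FT-GAIA's actual filtering rule, so everything here is an assumption: replicas are taken to be deterministic (correct copies are identical), at most `f` copies of a message are corrupted, and the hypothetical function `filter_replicated_message` accepts a payload once `f + 1` identical copies have arrived.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def filter_replicated_message(
    copies: Sequence[Hashable], max_byzantine: int
) -> Optional[Hashable]:
    """Hypothetical filter for one logical message received as replicated copies.

    Assumes at most `max_byzantine` copies are corrupted and that correct
    replicas send identical payloads. A payload reported by at least
    max_byzantine + 1 copies therefore includes one from a correct replica
    and is accepted; otherwise no decision is possible and None is returned.
    """
    if not copies:
        return None
    # Most frequent payload among the received copies (ties keep first-seen order).
    payload, votes = Counter(copies).most_common(1)[0]
    if votes >= max_byzantine + 1:
        return payload
    return None


if __name__ == "__main__":
    # Three replicas of the sender, one of which delivered a corrupted payload.
    print(filter_replicated_message(["move(3,4)", "move(3,4)", "move(9,9)"], 1))
```

    A crashed replica simply contributes no copy, so the sketch still reaches a decision as long as `f + 1` matching copies arrive. The price of this redundancy is extra messages and extra computation per simulated entity.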

    Versatile, Scalable, and Accurate Simulation of Distributed Applications and Platforms

    International audience. The study of parallel and distributed applications and platforms, whether in the cluster, grid, peer-to-peer, volunteer, or cloud computing domain, often mandates empirical evaluation of proposed algorithmic and system solutions via simulation. Unlike direct experimentation via an application deployment on a real-world testbed, simulation enables fully repeatable and configurable experiments for arbitrary hypothetical scenarios. Two key concerns are accuracy (so that simulation results are scientifically sound) and scalability (so that simulation experiments can be fast and memory-efficient). While the scalability of a simulator is easily measured, the accuracy of many state-of-the-art simulators is largely unknown because they have not been sufficiently validated. In this work we describe recent accuracy and scalability advances made in the context of the SimGrid simulation framework. A design goal of SimGrid is that it should be versatile, i.e., applicable across all aforementioned domains. We present quantitative results that show that SimGrid compares favorably to state-of-the-art domain-specific simulators in terms of scalability, accuracy, or the trade-off between the two. An important implication is that, contrary to popular wisdom, striving for versatility in a simulator is not an impediment but instead is conducive to improving both accuracy and scalability.

    Design Space Exploration and Resource Management of Multi/Many-Core Systems

    The increasing demand for processing a higher number of applications and related data on computing platforms has resulted in reliance on multi-/many-core chips, as they facilitate parallel processing. However, there is a desire for these platforms to be energy-efficient and reliable, and they need to perform secure computations in the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers in terms of state-of-the-art contributions and upcoming trends.

    Mixing multi-core CPUs and GPUs for scientific simulation software

    Recent technological and economic developments have led to widespread availability of multi-core CPUs and specialist accelerator processors such as graphics processing units (GPUs). The accelerated computational performance possible from these devices can be very high for some application paradigms. Software languages and systems such as NVIDIA's CUDA and the Khronos consortium's Open Computing Language (OpenCL) support a number of individual parallel application programming paradigms. To scale up the performance of some complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and very-many-core GPUs for data parallelism is necessary. We describe our use of hybrid applications using threading approaches and multi-core CPUs to control independent GPU devices. We present speed-up data, discuss multi-threading software issues for the applications-level programmer, and offer some suggested areas for language development and integration between coarse-grained and fine-grained multi-thread systems. We discuss results from three common simulation algorithmic areas: partial differential equations, graph cluster metric calculations and random number generation. We report on programming experiences and selected performance for these algorithms on single and multiple GPUs, on multi-core CPUs, on a CellBE, and using OpenCL. We discuss programmer usability issues and the outlook and trends in multi-core programming for scientific applications developers.

    Improving Simulations of MPI Applications Using A Hybrid Network Model with Topology and Contention Support

    Proper modeling of collective communications is essential for understanding the behavior of medium-to-large scale parallel applications, and even minor deviations in implementation can adversely affect the prediction of real-world performance. We propose a hybrid network model extending LogP-based approaches to account for topology and contention in high-speed TCP networks. This model is validated within SMPI, an MPI implementation provided by the SimGrid simulation toolkit. With SMPI, standard MPI applications can be compiled and run in a simulated network environment, and traces can be captured without incurring errors from tracing overheads or poor clock synchronization as in physical experiments. SMPI provides features for simulating applications that require large amounts of time or resources, including selective execution, RAM folding, and off-line replay of execution traces. We validate our model by comparing traces produced by SMPI with those from other simulation platforms, as well as real-world environments.
    Good modeling of collective communications is indispensable for understanding the performance of parallel applications, and differences in their implementation, even minimal ones, can drastically change the expected performance. We propose a hybrid network model that extends LogP-type approaches while accounting for topology and contention in high-performance networks using TCP. This model is implemented and validated within SMPI, an MPI implementation provided by the SimGrid environment. SMPI makes it possible to compile and run MPI applications without modification in a simulated environment. Traces can then be captured without the intrusiveness or the clock-synchronization problems usually encountered in real experiments. SMPI can also simulate memory-hungry or compute-hungry applications using techniques such as selective execution, memory folding, or offline replay of execution traces. We validate our model by comparing the traces produced with SMPI against real execution traces. We show the gain obtained by also comparing them with those produced by more classical models used in competing tools.
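    For reference, the basic LogP model that this work extends charges a small point-to-point message `o` of CPU overhead at the sender, `L` of network latency and `o` at the receiver, and lets one process inject a new message only every `max(g, o)`. The sketch below encodes those textbook rules plus a simple round-based binomial-tree broadcast estimate. It does not reproduce the paper's hybrid model or its topology and contention handling; the class `LogP` and its method names are hypothetical, and the broadcast estimate assumes `g <= L + 2o`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogP:
    """Textbook LogP parameters (all times in the same unit, e.g. microseconds)."""

    L: float  # network latency of a small message
    o: float  # CPU overhead to send or to receive one message
    g: float  # minimum gap between consecutive message injections by one process

    def send_time(self, n_messages: int = 1) -> float:
        """Completion time of n small messages sent back-to-back between two processes."""
        if n_messages <= 0:
            return 0.0
        # First message: sender overhead + latency + receiver overhead.
        first = 2 * self.o + self.L
        # Each later message is injected max(g, o) after the previous one.
        return first + (n_messages - 1) * max(self.g, self.o)

    def binomial_broadcast_time(self, processes: int) -> float:
        """Round-based binomial-tree broadcast: ceil(log2 P) rounds of one message each.

        Simplifying assumption: g <= L + 2o, so the gap never delays a round.
        """
        if processes < 1:
            raise ValueError("need at least one process")
        rounds = (processes - 1).bit_length()  # exact ceil(log2(P)) for integers
        return rounds * (self.L + 2 * self.o)


if __name__ == "__main__":
    net = LogP(L=5.0, o=1.0, g=2.0)
    print(net.send_time(3), net.binomial_broadcast_time(8))  # prints 11.0 21.0
```

    Plain LogP estimates like these ignore network topology and link contention, which is what the proposed hybrid model adds on top of the LogP family.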

    Analytical cost metrics: days of future past

    2019 Summer. Includes bibliographical references. Future exascale high-performance computing (HPC) systems are expected to be increasingly heterogeneous, consisting of several multi-core CPUs and a large number of accelerators: special-purpose hardware that will increase the computing power of the system in a very energy-efficient way. Specialized, energy-efficient accelerators are also an important component in many diverse systems beyond HPC: gaming machines, general-purpose workstations, tablets, phones and other media devices. With Moore's law driving the evolution of hardware platforms towards exascale, the dominant performance metric (time efficiency) has now expanded to also incorporate power/energy efficiency. This work builds analytical cost models for cost metrics such as time, energy, memory access, and silicon area. These models are used to predict the performance of applications, for performance tuning, and for chip design. The idea is to work with domain-specific accelerators where analytical cost models can be accurately used for performance optimization. The performance optimization problems are formulated as mathematical optimization problems. This work explores the analytical cost modeling and mathematical optimization approach in a few ways. For stencil applications and GPU architectures, analytical cost models are developed for execution time as well as energy. The models are used for performance tuning over existing architectures, and are coupled with silicon area models of GPU architectures to generate highly efficient architecture configurations. For matrix chain products, analytical closed-form solutions for off-chip data movement are built and used to minimize the total data movement cost of a minimum op count tree.
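    The "minimum op count tree" for a matrix chain product is the parenthesization that minimizes the number of scalar multiplications, which the classic O(n^3) dynamic program finds. The sketch below computes only that tree; the thesis's closed-form off-chip data-movement model is not described in the abstract and is not reproduced here. The function name `min_op_count_tree` and the tree encoding (a leaf is a matrix index, a pair is a product) are hypothetical.

```python
from functools import lru_cache
from typing import Sequence, Tuple, Union

# A leaf is the index of an input matrix; an inner node is a (left, right) product.
Tree = Union[int, Tuple["Tree", "Tree"]]


def min_op_count_tree(dims: Sequence[int]) -> Tuple[int, Tree]:
    """Matrix i has shape dims[i] x dims[i + 1].

    Returns (minimum scalar multiplications, parenthesization tree).
    """
    n = len(dims) - 1
    if n < 1:
        raise ValueError("need at least one matrix")

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> Tuple[int, Tree]:
        # Cheapest way to compute the product of matrices i..j (inclusive).
        if i == j:
            return 0, i
        candidates = []
        for k in range(i, j):
            left_cost, left_tree = best(i, k)
            right_cost, right_tree = best(k + 1, j)
            # Final product: (dims[i] x dims[k+1]) times (dims[k+1] x dims[j+1]).
            cost = left_cost + right_cost + dims[i] * dims[k + 1] * dims[j + 1]
            candidates.append((cost, (left_tree, right_tree)))
        return min(candidates, key=lambda c: c[0])

    return best(0, n - 1)


if __name__ == "__main__":
    # A: 10x30, B: 30x5, C: 5x60; (AB)C needs 1500 + 3000 multiplications.
    print(min_op_count_tree([10, 30, 5, 60]))  # prints (4500, ((0, 1), 2))
```

    A data-movement-aware optimizer would keep this tree structure but replace the multiplication count with a cost model for off-chip traffic.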

    Contribution to Infrastructure Convergence between High-Performance Computing and Large-Scale Data Processing

    The amount of produced data, either in the scientific community or the commercial world, is constantly growing. The field of Big Data has emerged to handle large amounts of data on distributed computing infrastructures. High-Performance Computing (HPC) infrastructures are traditionally used for the execution of compute-intensive workloads. However, the HPC community is also facing an increasing need to process large amounts of data derived from high-definition sensors and large physics instruments. The convergence of the two fields, HPC and Big Data, is currently taking place. In fact, the HPC community already uses Big Data tools, which are not always integrated correctly, especially at the level of the file system and the Resource and Job Management System (RJMS). In order to understand how we can leverage HPC clusters for Big Data usage, and what the challenges are for HPC infrastructures, we have studied multiple aspects of the convergence. We initially provide a survey of software provisioning methods, with a focus on data-intensive applications. We contribute a new RJMS collaboration technique called BeBiDa, which is based on 50 lines of code, whereas similar solutions use at least 1000 times more. We evaluate this mechanism under real conditions and in a simulated environment with our simulator Batsim. Furthermore, we provide extensions to Batsim to support I/O, and showcase the development of a generic file system model along with a Big Data application model. This allows us to complement BeBiDa's real-condition experiments with simulations while enabling us to study file system dimensioning and trade-offs. All the experiments and analyses in this work have been done with reproducibility in mind. Based on this experience, we propose to integrate the development workflow and data analysis into the reproducibility mindset, and give feedback on our experiences with a list of best practices.

    Dynamic Resource Allocation in Embedded, High-Performance and Cloud Computing

    The availability of many-core computing platforms enables a wide variety of technical solutions for systems across the embedded, high-performance and cloud computing domains. However, large-scale manycore systems are notoriously hard to optimise. Choices regarding resource allocation alone can account for wide variability in timeliness and energy dissipation (up to several orders of magnitude). Dynamic Resource Allocation in Embedded, High-Performance and Cloud Computing covers dynamic resource allocation heuristics for manycore systems, aiming to provide appropriate guarantees on performance and energy efficiency. It addresses different types of systems, aiming to harmonise the approaches to dynamic allocation across the complete spectrum, from systems with little flexibility and strict real-time guarantees all the way to highly dynamic systems with soft performance requirements. Technical topics presented in the book include: load and resource models; admission control; feedback-based allocation and optimisation; search-based allocation heuristics; distributed allocation based on swarm intelligence; and value-based allocation. Each of the topics is illustrated with examples based on realistic computational platforms such as Network-on-Chip manycore processors, grids and private cloud environments. Note: EUR 6,000 BPC fee funded by the EC FP7 Post-Grant Open Access Pilot.