16 research outputs found

    A Method for Describing the Topological Structure of Computing Clusters Based on Subgraph Product Operations

    The topological structure of communication networks in supercomputers becomes more complex as installations grow in size and complexity. Many methods exist for describing it, but the resulting descriptions are cumbersome and therefore hard to manipulate. This article proposes an approach in which the communication network of a supercomputer is described as a construction kit whose elements are typical topological structures often found in computing systems. For this purpose, a language for describing topological structure has been developed, based on the operation of subgraph products. The language is similar in spirit to the NetML and OMNeT++ languages. Special attention is paid to exceptions in the regularity of real supercomputer networks; special constructs have been added to the language so that these exceptions can be described. To support work with the language, a library has been developed in the C programming language, together with a Python 3 wrapper over it that can be used to visualize the graphs the language describes. The expressive power of the language is demonstrated by describing the computing clusters Tianhe-2A, AI Bridging Cloud Infrastructure, and Lomonosov-2. The method was tested and compared with GraphViz DOT, showing a multiple-fold reduction in the size of the record needed to store the topology of some major Top500 systems.
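As a rough illustration of why product-based descriptions can be compact, a 2D torus can be built as the Cartesian product of two rings. This is only a sketch in Python: it is not the article's language or its C library, and the function names `ring`, `cartesian_product`, and `torus` are hypothetical.

```python
from itertools import product


def ring(n):
    """Edge set of a ring (cycle) on vertices 0..n-1; assumes n >= 3."""
    return {frozenset((i, (i + 1) % n)) for i in range(n)}


def cartesian_product(nodes_a, edges_a, nodes_b, edges_b):
    """Cartesian product of two undirected graphs.

    Vertices are pairs (a, b). Two pairs are adjacent when they agree
    in one coordinate and are adjacent in the other.
    """
    nodes = set(product(nodes_a, nodes_b))
    edges = set()
    # Copy every edge of graph A once for each vertex of graph B.
    for e in edges_a:
        u, v = tuple(e)
        for b in nodes_b:
            edges.add(frozenset(((u, b), (v, b))))
    # Copy every edge of graph B once for each vertex of graph A.
    for e in edges_b:
        u, v = tuple(e)
        for a in nodes_a:
            edges.add(frozenset(((a, u), (a, v))))
    return nodes, edges


def torus(rows, cols):
    """A rows x cols 2D torus as the product of two rings (hypothetical helper)."""
    return cartesian_product(
        set(range(rows)), ring(rows), set(range(cols)), ring(cols)
    )


nodes, edges = torus(4, 8)
print(len(nodes), len(edges))
```

Here two ring sizes are enough to generate every link of the 4 x 8 torus, whereas an explicit edge list needs one entry per link.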

    Improving Scalability of Application-Level Checkpoint-Recovery by Reducing Checkpoint Sizes

    This is a post-peer-review, pre-copyedit version of an article published in New Generation Computing. The final authenticated version is available online at: https://doi.org/10.1007/s00354-013-0302-4 [Abstract] The execution times of large-scale parallel applications on today's multi/many-core systems are usually longer than the mean time between failures. Therefore, parallel applications must tolerate hardware failures so that a machine failure does not wipe out all completed computation. Checkpointing and rollback recovery is one of the most popular techniques for implementing fault-tolerant applications. However, checkpointing parallel applications is expensive in terms of computing time, network utilization and storage resources. Thus, checkpoint-recovery techniques should minimize these costs in order to be useful for large-scale systems. In this paper, three different and complementary techniques to reduce the size of the checkpoints generated by application-level checkpointing are proposed and implemented. Detailed experimental results obtained on a multicore cluster show the effectiveness of the proposed methods in reducing checkpointing cost. Funding: Ministerio de Ciencia e Innovación; TIN2010-16735. Galicia. Consellería de Economía e Industria; 10PXIB105180P.

    Stencil codes on a vector length agnostic architecture

    Data-level parallelism is frequently ignored or underutilized. Achieved through vector/SIMD capabilities, it can provide substantial performance improvements on top of widely used techniques such as thread-level parallelism. However, manual vectorization is a tedious and costly process that needs to be repeated for each specific instruction set or register size. In addition, automatic compiler vectorization is sensitive to code complexity and is usually limited by data and control dependencies. To address some of these issues, Arm recently released a new vector ISA, the Scalable Vector Extension (SVE), which is Vector-Length Agnostic (VLA). VLA enables the generation of binary files that run regardless of the physical vector register length. In this paper we leverage the main characteristics of SVE to implement and optimize stencil computations, which are ubiquitous in scientific computing. We show that SVE enables easy deployment of textbook optimizations like loop unrolling, loop fusion, load trading or data reuse. Our detailed simulations using vector lengths ranging from 128 to 2,048 bits show that these optimizations can lead to performance improvements over straightforward vectorized code of up to 56.6% for 2,048-bit vectors. In addition, we show that certain optimizations can hurt performance due to a reduction in arithmetic intensity, and provide insight useful for compiler optimizers.
This work has been partially supported by the European HiPEAC Network of Excellence, by the Spanish Ministry of Economy and Competitiveness (contract TIN2015-65316-P), and by the Generalitat de Catalunya (contracts 2017-SGR-1328 and 2017-SGR-1414). The Mont-Blanc project receives funding from the EU's H2020 Framework Programme (H2020/2014-2020) under grant agreements no. 671697 and no. 779877. M. Moreto has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramon y Cajal fellowship number RYC-2016-21104. Finally, A. Armejach has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Juan de la Cierva postdoctoral fellowship number FJCI-2015-24753. Peer reviewed. Postprint (author's final draft).

    ReStore: In-Memory REplicated STORagE for Rapid Recovery in Fault-Tolerant Algorithms

    Fault-tolerant distributed applications require mechanisms to recover data lost via a process failure. On modern cluster systems it is typically impractical to request replacement resources after such a failure. Therefore, applications have to continue working with the remaining resources. This requires redistributing the workload and that the non-failed processes reload data. We present an algorithmic framework and its C++ library implementation ReStore for MPI programs that enables recovery of data after process failures. By storing all required data in memory via an appropriate data distribution and replication, recovery is substantially faster than with standard checkpointing schemes that rely on a parallel file system. As the application developer can specify which data to load, we also support shrinking recovery instead of recovery using spare compute nodes. We evaluate ReStore in both controlled, isolated environments and real applications. Our experiments show loading times of lost input data in the range of milliseconds on up to 24576 processors and a substantial speedup of the recovery time for the fault-tolerant version of a widely used bioinformatics application

    Simulation based formal verification of cyber-physical systems

    Cyber-Physical Systems (CPSs) have become an intrinsic part of the 21st century world. Systems like Smart Grids, Transportation, and Healthcare help us run our lives and businesses smoothly, successfully and safely. Since malfunctions in these CPSs can have serious, expensive, sometimes fatal consequences, System-Level Formal Verification (SLFV) tools are vital to minimise the likelihood of errors occurring during the development process and beyond. Their applicability is supported by the increasingly widespread use of Model Based Design (MBD) tools. MBD enables the simulation of CPS models in order to check for their correct behaviour from the very initial design phase. The disadvantage is that SLFV for complex CPSs is an extremely time-consuming process, which typically requires several months of simulation. Current SLFV tools are aimed at accelerating the verification process with multiple simulators working simultaneously. To this end, they compute all the scenarios in advance in such a way as to split and simulate them in parallel. Furthermore, they compute optimised simulation campaigns in order to simulate common prefixes of these scenarios only once, thus avoiding redundant simulation. Nevertheless, there are still limitations that prevent a more widespread adoption of SLFV tools. Firstly, current tools cannot optimise simulation campaigns from existing datasets with collected scenarios. Secondly, there are currently no methods to predict the time required to complete the SLFV process. This lack of ability to predict the length of the process makes scheduling verification activities highly problematic. In this thesis, we present how we are able to overcome these limitations with the use of a simulation campaign optimiser and an execution time estimator. 
The optimiser tool is aimed at speeding up the SLFV process by using a data-intensive algorithm to obtain optimised simulation campaigns from existing datasets that may contain a large quantity of collected scenarios. The estimator tool is able to accurately predict the execution time needed to simulate a given simulation campaign by using an effective machine-independent method.
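The idea of simulating common prefixes only once can be illustrated with a small count of simulation steps. This is a minimal sketch under stated assumptions, not the thesis's optimiser: it assumes each scenario is a finite sequence of discrete events and that the simulator can save and restore its state at the end of any prefix. The function names are hypothetical.

```python
def steps_without_sharing(scenarios):
    """Total simulation steps when every scenario is run from scratch."""
    return sum(len(s) for s in scenarios)


def steps_with_prefix_sharing(scenarios):
    """Total steps when each distinct prefix is simulated only once.

    Every distinct non-empty prefix corresponds to one node of a prefix
    tree (trie), i.e. one simulation step that is run once and then
    reused by all scenarios starting with that prefix.
    """
    prefixes = set()
    for s in scenarios:
        for i in range(1, len(s) + 1):
            prefixes.add(tuple(s[:i]))
    return len(prefixes)


# Hypothetical scenarios: each letter stands for one disturbance event.
scenarios = [
    ("a", "b", "c"),
    ("a", "b", "d"),
    ("a", "e", "c"),
]

print(steps_without_sharing(scenarios))      # 9
print(steps_with_prefix_sharing(scenarios))  # 6
```

In this toy case the shared prefixes `a` and `a, b` are simulated once instead of three and two times, so the campaign needs 6 steps rather than 9.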

    Resiliency in numerical algorithm design for extreme scale simulations

    This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’ held March 1–6, 2020, at Schloss Dagstuhl, that was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of involving an enormous amount of resources and costs. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 10^23 floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features as well as specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation and (2) how do we best design the algorithms and software to meet these requirements? 
While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies for the case in which an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically.
Peer reviewed. Article signed by 36 authors: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik Göddeke, Marco Heisig, Fabienne Jezequel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Ortiz, Francesco Rizzi, Ulrich Rude, Martin Schulz, Fred Fung, Robert Speck, Linda Stals, Keita Teranishi, Samuel Thibault, Dominik Thonnes, Andreas Wagner and Barbara Wohlmuth. Postprint (author's final draft).
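The energy and operation-count figures quoted in this abstract (48 h at 20 MW, about a million kWh, about 100k Euro) can be checked with a few lines of arithmetic. The sketch below assumes an electricity price of 0.10 Euro per kWh, which is what the abstract's own numbers imply, and a sustained rate of 10^18 floating-point operations per second for an exascale system. Both values are assumptions for illustration, not figures stated by the authors.

```python
HOURS = 48                  # run time from the abstract
POWER_MW = 20               # system power draw from the abstract
PRICE_EUR_PER_KWH = 0.10    # assumed price, implied by 100k Euro / 1M kWh
EXAFLOPS = 1e18             # assumed sustained rate: 10^18 operations per second

# Energy: 20 MW = 20,000 kW, multiplied by 48 hours.
energy_kwh = HOURS * POWER_MW * 1000

# Cost at the assumed price.
cost_eur = energy_kwh * PRICE_EUR_PER_KWH

# Operations executed in 48 hours at the assumed sustained rate.
operations = EXAFLOPS * HOURS * 3600

print(f"energy: {energy_kwh:,} kWh")
print(f"cost:   {cost_eur:,.0f} Euro")
print(f"ops:    {operations:.3e}")
```

Under these assumptions the run uses 960,000 kWh, costs about 96,000 Euro, and executes roughly 1.7 x 10^23 operations, which matches the rounded figures in the abstract.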


    My dissertation focuses on developing scalable algorithms for analyzing large complex networks and evaluating how the results change when the network changes. Network analysis has become a ubiquitous and very effective tool in big data analysis, particularly for understanding the mechanisms of complex systems that arise in diverse disciplines such as cybersecurity [83], biology [15], sociology [5], and epidemiology [7]. However, data from real-world systems are inherently noisy because they are influenced by fluctuations in experiments, subjective interpretation of data, and limitations of computing resources. Therefore, the corresponding networks are also approximate. This research addresses the problem of obtaining accurate results from large noisy networks efficiently. My dissertation has four main components. The first component consists of developing efficient and scalable algorithms for centrality computations that produce reliable results on noisy networks. Two novel contributions I made in this area are a group-testing-based [16] algorithm for identifying high-centrality vertices that is much faster than current methods, and an algorithm for computing the betweenness centrality of a specific vertex. The second component consists of developing quantitative metrics to measure how different noise models affect the analysis results. We implemented a uniform perturbation model based on random addition/deletion of edges of a network. To quantify the stability of a network, we investigated the effect that perturbations have on the top-k ranked vertices and on the local structural properties of the top-ranked vertices. The third component consists of developing efficient software for network analysis. I have been part of the development of a software package, ESSENS (Extensible, Scalable Software for Evolving NetworkS) [76], that effectively supports our algorithms on large networks.
The fourth component is a literature review of the various noise models that researchers have applied to networks and the methods they have used to quantify the stability, sensitivity, robustness, and reliability of networks. These four aspects together will lead to efficient, accurate, and highly scalable algorithms for analyzing noisy networks.

    Sheared Rayleigh-Bénard Turbulence


    High performance cloud computing on multicore computers

    The cloud has become a major computing platform, with virtualization being key to letting applications run and share resources in the cloud. A wide spectrum of applications need to process large amounts of data at high speeds in the cloud, e.g., analyzing customer data to find out purchase behavior, processing location data to determine geographical trends, or mining social media data to assess brand sentiment. To achieve high performance, these applications create and use multiple threads running on multicore processors. However, existing virtualization technology cannot support the efficient execution of such applications on virtual machines, so they suffer from poor and unstable performance in the cloud. Targeting multi-threaded applications, the dissertation analyzes and diagnoses their performance issues on virtual machines, and designs practical solutions to improve their performance. The dissertation makes the following contributions. First, the dissertation conducts extensive experiments with standard multicore applications in order to evaluate the performance overhead on virtualization systems and diagnose the contributing factors. Second, focusing on one main source of the performance overhead, excessive spinning, the dissertation designs and evaluates a holistic solution that makes effective use of the hardware virtualization support in processors to reduce excessive spinning at low cost. Third, focusing on application scalability, which is the most important performance feature for multi-threaded applications, the dissertation models application scalability in virtual machines and analyzes how application scalability changes with virtualization and resource sharing. Based on the modeling and analysis, the dissertation identifies key application features and system factors that affect application scalability, and reveals possible approaches for improving scalability. 
Fourth, the dissertation explores one approach to improving application scalability by making full use of the virtual resources of each virtual machine. The general idea is to match the workload distribution among the virtual CPUs in a virtual machine and the virtual CPU resource of the virtual machine manager.