48 research outputs found

    A Minimally Intrusive Low-Memory Approach to Resilience for Existing Transient Solvers

    We propose a novel, minimally intrusive approach to adding fault tolerance to existing complex scientific simulation codes, used for addressing a broad range of time-dependent problems on the next generation of supercomputers. Exascale systems have the potential to allow much larger, more accurate and scale-resolving simulations of transient processes than can be performed on current petascale systems. However, with a much larger number of components, exascale computers are expected to suffer a node failure every few minutes. Many existing parallel simulation codes are not tolerant of these failures and existing resilience methodologies would necessitate major modifications or redesign of the application. Our approach combines the proposed user-level failure mitigation extensions to the Message-Passing Interface (MPI), with the concepts of message-logging and remote in-memory checkpointing, to demonstrate how to add scalable resilience to transient solvers. Logging MPI communication reduces the storage requirement of static data, such as finite element operators, and allows a spare MPI process to rebuild these data structures independently of other ranks. Remote in-memory checkpointing avoids disk I/O contention on large parallel filesystems. A prototype implementation is applied to Nektar++, a scalable, production-ready transient simulation framework. Forward-path and recovery-path performance of the resilience algorithm is analysed through experiments using the solver for the incompressible Navier-Stokes equations, and strong scaling of the approach is observed.

    Methods to Improve Applicability and Efficiency of Distributed Data-Centric Compute Frameworks

    The success of modern applications depends on the insights they collect from their data repositories. Data repositories for such applications currently exceed exabytes and are rapidly increasing in size, as they collect data from varied sources - web applications, mobile phones, sensors and other connected devices. Distributed storage and data-centric compute frameworks have been invented to store and analyze these large datasets. This dissertation focuses on extending the applicability and improving the efficiency of distributed data-centric compute frameworks.

    Algebraic approaches to distributed compression and network error correction

    Algebraic codes have been studied for decades and have extensive applications in communication and storage systems. In this dissertation, we propose several novel algebraic approaches for distributed compression and network error protection problems. In the first part of this dissertation we propose the usage of Reed-Solomon codes for compression of two nonbinary sources. Reed-Solomon codes are easy to design and offer natural rate adaptivity. We compare their performance with multistage LDPC codes and show that algebraic soft-decision decoding of Reed-Solomon codes can be used effectively under certain correlation structures. As part of this work we have proposed a method that adapts list decoding for the problem of syndrome decoding. This in turn allows us to arrive at improved methods for the compression of multicast network coding vectors. When more than two correlated sources are present, we consider a correlation model given by a system of linear equations. We propose a transformation of the correlation model and a way to determine proper decoding schedules. Our scheme allows us to exploit more correlations than those in the previous work and the simulation results confirm its better performance. In the second part of this dissertation we study the network protection problem in the presence of adversarial errors and failures. In particular, we consider the usage of network coding for the problem of simultaneous protection of multiple unicast connections, under certain restrictions on the network topology. The proposed scheme allows the sharing of protection resources among multiple unicast connections. Simulations show that our proposed scheme saves network resources by 4%-15% compared to the protection scheme based on simple repetition codes, especially when the number of primary paths is large or the costs for establishing primary paths are high.

    Spin squeezed GKP codes for quantum error correction in atomic ensembles

    GKP codes encode a qubit in displaced phase space combs of a continuous-variable (CV) quantum system and are useful for correcting a variety of high-weight photonic errors. Here we propose atomic ensemble analogues of the single-mode CV GKP code by using the quantum central limit theorem to pull back the phase space structure of a CV system to the compact phase space of a quantum spin system. We study the optimal recovery performance of these codes under error channels described by stochastic relaxation and isotropic ballistic dephasing processes using the diversity combining approach for calculating channel fidelity. We find that the spin GKP codes outperform other spin system codes such as cat codes or binomial codes. Our spin GKP codes based on the two-axis countertwisting interaction and superpositions of SU(2) coherent states are direct spin analogues of the finite-energy CV GKP codes, whereas our codes based on one-axis twisting do not yet have well-studied CV analogues. An implementation of the spin GKP codes is proposed which uses the linear combination of unitaries method, applicable to both the CV and spin GKP settings. Finally, we discuss a fault-tolerant approximate gate set for quantum computing with spin GKP-encoded qubits, obtained by translating gates from the CV GKP setting using the quantum central limit theorem.

    Software for Exascale Computing - SPPEXA 2016-2019

    This open access book summarizes the research done and results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG) presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer’s series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA’s first funding phase, and provides an overview of SPPEXA’s contributions towards exascale computing in today's supercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest.

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Programming Languages and Systems

    This open access book constitutes the proceedings of the 31st European Symposium on Programming, ESOP 2022, which was held during April 5-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 21 regular papers presented in this volume were carefully reviewed and selected from 64 submissions. They deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems.