48 research outputs found
A Minimally Intrusive Low-Memory Approach to Resilience for Existing Transient Solvers
We propose a novel, minimally intrusive approach to adding fault tolerance to existing complex scientific simulation codes, used for addressing a broad range of time-dependent problems on the next generation of supercomputers. Exascale systems have the potential to allow much larger, more accurate and scale-resolving simulations of transient processes than can be performed on current petascale systems. However, with a much larger number of components, exascale computers are expected to suffer a node failure every few minutes. Many existing parallel simulation codes are not tolerant of these failures and existing resilience methodologies would necessitate major modifications or redesign of the application. Our approach combines the proposed user-level failure mitigation extensions to the Message-Passing Interface (MPI), with the concepts of message-logging and remote in-memory checkpointing, to demonstrate how to add scalable resilience to transient solvers. Logging MPI communication reduces the storage requirement of static data, such as finite element operators, and allows a spare MPI process to rebuild these data structures independently of other ranks. Remote in-memory checkpointing avoids disk I/O contention on large parallel filesystems. A prototype implementation is applied to Nektar++, a scalable, production-ready transient simulation framework. Forward-path and recovery-path performance of the resilience algorithm is analysed through experiments using the solver for the incompressible Navier-Stokes equations, and strong scaling of the approach is observed
Methods to Improve Applicability and Efficiency of Distributed Data-Centric Compute Frameworks
The success of modern applications depends on the insights they collect from their data repositories. Data repositories for such applications currently exceed exabytes and are rapidly increasing in size, as they collect data from varied sources - web applications, mobile phones, sensors and other connected devices. Distributed storage and data-centric compute frameworks have been invented to store and analyze these large datasets. This dissertation focuses on extending the applicability and improving the efficiency of distributed data-centric compute frameworks
Algebraic approaches to distributed compression and network error correction
Algebraic codes have been studied for decades and have extensive applications in communication and storage systems. In this dissertation, we propose several novel algebraic approaches for distributed compression and network error protection problems.
In the first part of this dissertation we propose the usage of Reed-Solomon codes for compression of two nonbinary sources. Reed-Solomon codes are easy to design and offer natural rate adaptivity. We compare their performance with multistage LDPC codes and show that algebraic soft-decision decoding of Reed-Solomon codes can be used effectively under certain correlation structures. As part of this work we have proposed a method that adapts list decoding for the problem of syndrome decoding. This in turn allows us to arrive at improved methods for the compression of multicast network coding vectors. When more than two correlated sources are present, we consider a correlation model given by a system of linear equations. We propose a transformation of correlation model and a way to determine proper decoding schedules. Our scheme allows us to exploit more correlations than those in the previous work and the simulation results confirm its better performance.
In the second part of this dissertation we study the network protection problem in the presence of adversarial errors and failures. In particular, we consider the usage of network coding for the problem of simultaneous protection of multiple unicast connections, under certain restrictions on the network topology. The proposed scheme allows the sharing of protection resources among multiple unicast connections. Simulations show that our proposed scheme saves network resources by 4%-15% compared to the protection scheme based on simple repetition codes, especially when the number of primary paths is large or the costs for establishing primary paths are high
Spin squeezed GKP codes for quantum error correction in atomic ensembles
GKP codes encode a qubit in displaced phase space combs of a
continuous-variable (CV) quantum system and are useful for correcting a variety
of high-weight photonic errors. Here we propose atomic ensemble analogues of
the single-mode CV GKP code by using the quantum central limit theorem to pull
back the phase space structure of a CV system to the compact phase space of a
quantum spin system. We study the optimal recovery performance of these codes
under error channels described by stochastic relaxation and isotropic ballistic
dephasing processes using the diversity combining approach for calculating
channel fidelity. We find that the spin GKP codes outperform other spin system
codes such as cat codes or binomial codes. Our spin GKP codes based on the
two-axis countertwisting interaction and superpositions of SU(2) coherent
states are direct spin analogues of the finite-energy CV GKP codes, whereas our
codes based on one-axis twisting do not yet have well-studied CV analogues. An
implementation of the spin GKP codes is proposed which uses the linear
combination of unitaries method, applicable to both the CV and spin GKP
settings. Finally, we discuss a fault-tolerant approximate gate set for quantum
computing with spin GKP-encoded qubits, obtained by translating gates from the
CV GKP setting using quantum central limit theorem.Comment: More details added to the previous versions with more figure
Software for Exascale Computing - SPPEXA 2016-2019
This open access book summarizes the research done and results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG) presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer’s series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA’s first funding phase, and provides an overview of SPPEXA’s contributions towards exascale computing in today's sumpercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest
LIPIcs, Volume 251, ITCS 2023, Complete Volume
LIPIcs, Volume 251, ITCS 2023, Complete Volum
Programming Languages and Systems
This open access book constitutes the proceedings of the 31st European Symposium on Programming, ESOP 2022, which was held during April 5-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 21 regular papers presented in this volume were carefully reviewed and selected from 64 submissions. They deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems