Okapi: Causally Consistent Geo-Replication Made Faster, Cheaper and More Available

Author: Didona Diego
Spirovska Kristina
Zwaenepoel Willy
Publication venue
Publication date: 14/02/2017
Field of study

Okapi is a new causally consistent geo-replicated key-value store. Okapi leverages two key design choices to achieve high performance. First, it relies on hybrid logical/physical clocks to achieve low latency even in the presence of clock skew. Second, Okapi achieves higher resource efficiency and better availability, at the expense of a slight increase in update visibility latency. To this end, Okapi implements a new stabilization protocol that uses a combination of vector and scalar clocks and makes a remote update visible when its delivery has been acknowledged by every data center. We evaluate Okapi with different workloads on Amazon AWS, using three geographically distributed regions and 96 nodes. We compare Okapi with two recent approaches to causal consistency, Cure and GentleRain. We show that Okapi delivers up to two orders of magnitude better performance than GentleRain and that Okapi achieves up to 3.5x lower latency and a 60% reduction of the meta-data overhead with respect to Cure.
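The hybrid logical/physical clock idea mentioned in this abstract can be illustrated with a minimal sketch. It follows the generic hybrid-logical-clock scheme and is not Okapi's implementation; the names `Timestamp`, `HybridClock`, `tick` and `update` are hypothetical, and the physical time source is passed in as a callable so that skew can be simulated.

```python
import time
from dataclasses import dataclass


@dataclass(order=True, frozen=True)
class Timestamp:
    physical: int  # largest physical time observed so far
    logical: int   # counter that breaks ties when physical time does not advance


class HybridClock:
    """Generic hybrid logical/physical clock (illustrative, not Okapi's code)."""

    def __init__(self, now=lambda: time.time_ns() // 1_000_000):
        self._now = now            # assumed source of physical time (milliseconds)
        self.ts = Timestamp(0, 0)

    def tick(self):
        """Timestamp a local or send event."""
        pt = self._now()
        if pt > self.ts.physical:
            self.ts = Timestamp(pt, 0)
        else:
            self.ts = Timestamp(self.ts.physical, self.ts.logical + 1)
        return self.ts

    def update(self, remote):
        """Merge the timestamp carried by a received message."""
        pt = self._now()
        phys = max(self.ts.physical, remote.physical, pt)
        if phys == self.ts.physical and phys == remote.physical:
            logical = max(self.ts.logical, remote.logical) + 1
        elif phys == self.ts.physical:
            logical = self.ts.logical + 1
        elif phys == remote.physical:
            logical = remote.logical + 1
        else:
            logical = 0
        self.ts = Timestamp(phys, logical)
        return self.ts


# Usage: the receiver's physical clock lags by 10 ms, yet the receive
# event is still ordered after the send event, without waiting.
sender = HybridClock(now=lambda: 100)
receiver = HybridClock(now=lambda: 90)
sent = sender.tick()
received = receiver.update(sent)
print(sent < received)  # True
```

The point of the sketch is that a node with a lagging physical clock can timestamp causally later events immediately, instead of blocking until its own clock catches up.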

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Space Efficient Breadth-First and Level Traversals of Consistent Global States of Parallel Programs

Author: B Ganter
G Pruesse
G Steiner
KM Chandy
L Bianco
L Lamport
M Chein
M Habib
MM Sysło
S Alagar
T Ball
VK Garg
Publication venue
Publication date: 24/07/2017
Field of study

Enumerating consistent global states of a computation is a fundamental problem in parallel computing with applications to debugging, testing and runtime verification of parallel programs. Breadth-first search (BFS) enumeration is especially useful for these applications as it finds an erroneous consistent global state with the least number of events possible. The total number of executed events in a global state is called its rank. BFS also allows enumeration of all global states of a given rank or within a range of ranks. If a computation on n processes has m events per process on average, then the traditional BFS (Cooper-Marzullo and its variants) requires $\mathcal{O}(\frac{m^{n-1}}{n})$ space in the worst case, whereas our algorithm performs the BFS in $\mathcal{O}(m^2n^2)$ space. Thus, we reduce the space complexity for BFS enumeration of consistent global states exponentially, and give the first polynomial space algorithm for this task. In our experimental evaluation of seven benchmarks, traditional BFS fails in many cases by exhausting the 2 GB heap space allowed to the JVM. In contrast, our implementation uses less than 60 MB memory and is also faster in many cases.
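To make the notions of consistent global state and rank concrete, here is a minimal sketch of the traditional level-by-level BFS that the abstract uses as its baseline. It is not the paper's polynomial-space algorithm: it keeps an entire level in memory, which is the source of the exponential space cost. The input encoding (one list of vector clocks per process, with 1-based event counts) and the names `is_consistent` and `bfs_levels` are assumptions made for illustration.

```python
def is_consistent(cut, clocks):
    """A cut (events taken per process) is consistent if the last event taken
    on each process depends only on events that are inside the cut."""
    for proc, taken in enumerate(cut):
        if taken == 0:
            continue
        vc = clocks[proc][taken - 1]  # vector clock of the last event taken
        if any(vc[j] > cut[j] for j in range(len(cut))):
            return False
    return True


def bfs_levels(clocks):
    """Yield the consistent global states rank by rank (rank = total events)."""
    n = len(clocks)
    level = {tuple([0] * n)}
    while level:
        yield sorted(level)
        nxt = set()
        for cut in level:
            for i in range(n):
                if cut[i] < len(clocks[i]):
                    succ = cut[:i] + (cut[i] + 1,) + cut[i + 1:]
                    if is_consistent(succ, clocks):
                        nxt.add(succ)
        level = nxt  # the whole next level is held in memory


# Two processes; the first event of P1 receives a message sent by the
# first event of P0, so the cut (0, 1) is not consistent.
clocks = [
    [(1, 0), (2, 0)],  # P0
    [(1, 1), (1, 2)],  # P1
]
for rank, states in enumerate(bfs_levels(clocks)):
    print(rank, states)
```

Expanding only consistent cuts is sufficient here because every consistent cut of rank r+1 has a consistent predecessor of rank r.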

arXiv.org e-Print Archive

Crossref

Execution replay and debugging

Author: De Bosschere Koen
de Kergommeaux Jacques Chassin
Ronsse Michiel
Publication venue
Publication date: 01/01/2000
Field of study

As most parallel and distributed programs are internally non-deterministic -- consecutive runs with the same input might result in a different program flow -- vanilla cyclic debugging techniques as such are useless. In order to use cyclic debugging tools, we need a tool that records information about an execution so that it can be replayed for debugging. Because recording information interferes with the execution, we must limit the amount of information and keep the processing of the information fast. This paper contains a survey of existing execution replay techniques and tools. Comment: In M. Ducasse (ed), proceedings of the Fourth International Workshop on Automated Debugging (AADebug 2000), August 2000, Munich. cs.SE/001003

arXiv.org e-Print Archive

CiteSeerX

Ghent University Academic Bibliography

Revisiting LFSMs

Author: Arnault François
Berger Thierry
Minier Marine
Pousse Benjamin
Publication venue
Publication date: 01/01/2010
Field of study

Linear Finite State Machines (LFSMs) are particular primitives widely used in information theory, coding theory and cryptography. Among those linear automata, a particular case of study is Linear Feedback Shift Registers (LFSRs), used in many cryptographic applications such as the design of stream ciphers or pseudo-random generation. LFSRs can be seen as particular LFSMs without inputs. In this paper, we first recall the description of LFSMs using the traditional matrix representation. Then, we introduce a new matrix representation with polynomial fractional coefficients. This new representation leads to sparse representations and implementations. As direct applications, we focus our work on the Windmill LFSRs case, used for example in the E0 stream cipher, and on other general applications that use this new representation. In a second part, a new design criterion for LFSRs called diffusion delay is introduced and compared with existing related notions. This criterion represents the diffusion capacity of an LFSR. Thus, using the matrix representation, we present a new algorithm to randomly pick LFSRs with good properties (including the new one) and sparse descriptions dedicated to hardware and software designs. We present some examples of LFSRs generated using our algorithm to show the relevance of our approach. Comment: Submitted to IEEE-I
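For readers unfamiliar with LFSRs, the following minimal sketch shows a plain Fibonacci-style register, that is, a linear machine without inputs whose next state is a fixed linear function of the current one. It does not implement the paper's fractional matrix representation, Windmill LFSRs, or the diffusion delay criterion; the function names and the 4-bit tap choice are assumptions chosen for illustration.

```python
def lfsr_step(state, taps, width):
    """Advance a Fibonacci LFSR by one step.

    `taps` lists the bit positions (0 = least significant) that are XORed
    together to form the feedback bit, which is shifted in at the top.
    """
    feedback = 0
    for t in taps:
        feedback ^= (state >> t) & 1
    return (state >> 1) | (feedback << (width - 1))


def period(seed, taps, width):
    """Count the steps until the register returns to its seed state.

    Assumes the seed lies on a cycle, which holds for a non-zero seed
    when tap position 0 is included.
    """
    state, steps = lfsr_step(seed, taps, width), 1
    while state != seed:
        state = lfsr_step(state, taps, width)
        steps += 1
    return steps


# A 4-bit register with taps at bits 0 and 1 cycles through all
# 15 non-zero states before repeating.
print(period(0b0001, (0, 1), 4))  # 15
```

The all-zero state is a fixed point of any such register, which is why a maximal-length n-bit LFSR has period 2^n - 1 rather than 2^n.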

arXiv.org e-Print Archive

HAL-UNILIM

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Efficient, Near Complete and Often Sound Hybrid Dynamic Data Race Prediction (extended version)

Author: Stadtmüller Kai
Sulzmann Martin
Publication venue
Publication date: 04/11/2020
Field of study

Dynamic data race prediction aims to identify races based on a single program run represented by a trace. The challenge is to remain efficient while being as sound and as complete as possible. Efficient means a linear run-time, as otherwise the method is unlikely to scale to real-world programs. We introduce an efficient, near complete and often sound dynamic data race prediction method that combines the lockset method with several improvements made in the area of happens-before methods. By near complete we mean that the method is complete in theory but for efficiency reasons the implementation applies some optimizations that may result in incompleteness. The method can be shown to be sound for two threads but is unsound in general. We provide extensive experimental data that shows that our method works well in practice. Comment: typos, appendi
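The lockset half of such a hybrid method can be sketched as follows. This is a deliberately naive, quadratic illustration of the basic lockset check only; it is not the authors' method, it has none of the happens-before refinements the abstract describes, and it is not linear-time. The trace encoding (`(thread, op, target)` tuples with the ops `acq`, `rel`, `r`, `w`) and the name `lockset_races` are assumptions.

```python
def lockset_races(trace):
    """Report pairs of conflicting accesses that share no common lock.

    Two accesses conflict if they touch the same variable from different
    threads and at least one of them is a write.
    """
    held = {}      # thread -> set of locks currently held
    accesses = {}  # variable -> list of (index, thread, is_write, lockset)
    races = []
    for idx, (thread, op, target) in enumerate(trace):
        locks = held.setdefault(thread, set())
        if op == "acq":
            locks.add(target)
        elif op == "rel":
            locks.discard(target)
        elif op in ("r", "w"):
            cur = (idx, thread, op == "w", frozenset(locks))
            for prev in accesses.setdefault(target, []):
                different_threads = prev[1] != thread
                one_is_write = prev[2] or cur[2]
                no_common_lock = not (prev[3] & cur[3])
                if different_threads and one_is_write and no_common_lock:
                    races.append((prev[0], idx))
            accesses[target].append(cur)
        else:
            raise ValueError(f"unknown operation: {op}")
    return races


trace = [
    ("T1", "acq", "m"), ("T1", "w", "x"), ("T1", "rel", "m"),
    ("T2", "w", "x"),                      # x written without holding m
    ("T2", "acq", "m"), ("T2", "w", "y"), ("T2", "rel", "m"),
    ("T1", "acq", "m"), ("T1", "w", "y"), ("T1", "rel", "m"),
]
print(lockset_races(trace))  # [(1, 3)]
```

A pure lockset check like this one can report pairs that can never actually race, which is one reason to combine it with happens-before information.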

arXiv.org e-Print Archive

Partially ordered distributed computations on asynchronous point-to-point networks

Author: Adelstein
Barbosa
Bertsekas
Birman
Charon-Bost
Corrêa
Dobrev
Drummond
Garg
Golub
Karp
Kshemkalyani
Kumar
Lamport
Li
Lynch
Peterson
Raynal
Ricardo C. Corrêa
Schiper
Singhal
Valmir C. Barbosa
Publication venue: 'Elsevier BV'
Publication date: 01/01/2006
Field of study

Asynchronous executions of a distributed algorithm differ from each other due to the nondeterminism in the order in which the messages exchanged are handled. In many situations of interest, the asynchronous executions induced by restricting nondeterminism are more efficient, in an application-specific sense, than the others. In this work, we define partially ordered executions of a distributed algorithm as the executions satisfying some restricted orders of their actions in two different frameworks, those of the so-called event- and pulse-driven computations. The aim of these restrictions is to characterize asynchronous executions that are likely to be more efficient for some important classes of applications. Also, an asynchronous algorithm that ensures the occurrence of partially ordered executions is given for each case. Two of the applications that we believe may benefit from the restricted nondeterminism are backtrack search, in the event-driven case, and iterative algorithms for systems of linear equations, in the pulse-driven case.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Non-intrusive on-the-fly data race detection using execution replay

Author: De Bosschere Koen
Ronsse Michiel
Publication venue
Publication date: 01/01/2000
Field of study

This paper presents a practical solution for detecting data races in parallel programs. The solution consists of a combination of execution replay (RecPlay) with automatic on-the-fly data race detection. This combination enables us to perform the data race detection on an unaltered execution (almost no probe effect). Furthermore, the usage of multilevel bitmaps and snooped matrix clocks limits the amount of memory used. As the record phase of RecPlay is highly efficient, there is no need to switch it off, thereby eliminating the possibility of Heisenbugs because tracing can be left on all the time. Comment: In M. Ducasse (ed), proceedings of the Fourth International Workshop on Automated Debugging (AAdebug 2000), August 2000, Munich. cs.SE/001003

arXiv.org e-Print Archive

Ghent University Academic Bibliography