    Okapi: Causally Consistent Geo-Replication Made Faster, Cheaper and More Available

    Okapi is a new causally consistent geo-replicated key-value store. Okapi leverages two key design choices to achieve high performance. First, it relies on hybrid logical/physical clocks to achieve low latency even in the presence of clock skew. Second, Okapi achieves higher resource efficiency and better availability, at the expense of a slight increase in update visibility latency. To this end, Okapi implements a new stabilization protocol that uses a combination of vector and scalar clocks and makes a remote update visible when its delivery has been acknowledged by every data center. We evaluate Okapi with different workloads on Amazon AWS, using three geographically distributed regions and 96 nodes. We compare Okapi with two recent approaches to causal consistency, Cure and GentleRain. We show that Okapi delivers up to two orders of magnitude better performance than GentleRain and that Okapi achieves up to 3.5x lower latency and a 60% reduction of the meta-data overhead with respect to Cure.
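As a rough illustration of the hybrid logical/physical clocks mentioned in the abstract above, the following is a minimal sketch of the generic hybrid logical clock update rules. It is not Okapi's implementation: the names `HLCTimestamp`, `HybridLogicalClock`, `tick`, `update`, and the injected `now_fn` are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(order=True, frozen=True)
class HLCTimestamp:
    physical: int  # largest physical time observed so far
    logical: int   # counter that orders events sharing the same physical part


class HybridLogicalClock:
    """Generic hybrid logical clock (illustrative, not Okapi's code)."""

    def __init__(self, now_fn):
        self._now = now_fn              # assumed source of local physical time
        self._ts = HLCTimestamp(0, 0)

    def tick(self):
        """Advance the clock for a local or send event."""
        pt = self._now()
        if pt > self._ts.physical:
            self._ts = HLCTimestamp(pt, 0)
        else:
            # Physical time has not advanced: bump the logical counter.
            self._ts = HLCTimestamp(self._ts.physical, self._ts.logical + 1)
        return self._ts

    def update(self, remote):
        """Merge a timestamp received from another node."""
        pt = self._now()
        new_physical = max(self._ts.physical, remote.physical, pt)
        if new_physical == self._ts.physical and new_physical == remote.physical:
            logical = max(self._ts.logical, remote.logical) + 1
        elif new_physical == self._ts.physical:
            logical = self._ts.logical + 1
        elif new_physical == remote.physical:
            logical = remote.logical + 1
        else:
            logical = 0
        self._ts = HLCTimestamp(new_physical, logical)
        return self._ts


# A node whose physical clock lags behind still orders the receive
# after the send, without waiting for its own clock to catch up.
sender = HybridLogicalClock(lambda: 10)
receiver = HybridLogicalClock(lambda: 5)   # skewed clock
sent = sender.tick()
received = receiver.update(sent)
```

The receive timestamp compares greater than the send timestamp even though the receiver's physical clock is behind, which is the property that lets such clocks avoid blocking under clock skew.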

    Space Efficient Breadth-First and Level Traversals of Consistent Global States of Parallel Programs

    Enumerating consistent global states of a computation is a fundamental problem in parallel computing with applications to debugging, testing and runtime verification of parallel programs. Breadth-first search (BFS) enumeration is especially useful for these applications as it finds an erroneous consistent global state with the least number of events possible. The total number of executed events in a global state is called its rank. BFS also allows enumeration of all global states of a given rank or within a range of ranks. If a computation on n processes has m events per process on average, then the traditional BFS (Cooper-Marzullo and its variants) requires $\mathcal{O}(\frac{m^{n-1}}{n})$ space in the worst case, whereas our algorithm requires $\mathcal{O}(m^2n^2)$ space. Thus, we reduce the space complexity for BFS enumeration of consistent global states exponentially and give the first polynomial space algorithm for this task. In our experimental evaluation of seven benchmarks, traditional BFS fails in many cases by exhausting the 2 GB heap space allowed to the JVM. In contrast, our implementation uses less than 60 MB memory and is also faster in many cases.
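To make the notions of rank and level-by-level enumeration concrete, here is a minimal sketch of the straightforward BFS that keeps a whole level in memory. It is not the polynomial-space algorithm the abstract describes. It assumes each event carries a vector clock, and the helper names `is_consistent` and `levels` as well as the tiny two-process computation are hypothetical.

```python
def is_consistent(cut, clocks):
    """cut[i] is the number of events of process i included in the state.

    The state is consistent if the last included event of every process
    depends only on events that are also included.
    """
    for i, count in enumerate(cut):
        if count == 0:
            continue
        vc = clocks[i][count - 1]  # vector clock of the last included event
        if any(vc[j] > cut[j] for j in range(len(cut))):
            return False
    return True


def levels(clocks):
    """Yield the consistent global states rank by rank (rank = sum of cut)."""
    n = len(clocks)
    level = {tuple([0] * n)}
    while level:
        yield sorted(level)
        next_level = set()
        for cut in level:
            for i in range(n):
                if cut[i] < len(clocks[i]):
                    succ = cut[:i] + (cut[i] + 1,) + cut[i + 1:]
                    if is_consistent(succ, clocks):
                        next_level.add(succ)
        level = next_level  # the whole level is held in memory at once


# Hypothetical computation: P0 executes a1, then a2 (a send);
# P1 executes b1 (local), then b2 (the matching receive).
clocks = [
    [(1, 0), (2, 0)],  # vector clocks of P0's events
    [(0, 1), (2, 2)],  # vector clocks of P1's events
]
for rank, states in enumerate(levels(clocks)):
    print(rank, states)
```

Because every level is stored as a set, memory grows with the width of the widest level, which is the cost that motivates space-efficient alternatives.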

    Execution replay and debugging

    As most parallel and distributed programs are internally non-deterministic -- consecutive runs with the same input might result in a different program flow -- vanilla cyclic debugging techniques as such are useless. In order to use cyclic debugging tools, we need a tool that records information about an execution so that it can be replayed for debugging. Because recording information interferes with the execution, we must limit the amount of information and keep the processing of the information fast. This paper contains a survey of existing execution replay techniques and tools. Comment: In M. Ducasse (ed), proceedings of the Fourth International Workshop on Automated Debugging (AADebug 2000), August 2000, Munich. cs.SE/001003

    Revisiting LFSMs

    Linear Finite State Machines (LFSMs) are particular primitives widely used in information theory, coding theory and cryptography. Among those linear automata, a particular case of study is Linear Feedback Shift Registers (LFSRs), used in many cryptographic applications such as the design of stream ciphers or pseudo-random generation. LFSRs can be seen as particular LFSMs without inputs. In this paper, we first recall the description of LFSMs using the traditional matrix representation. Then, we introduce a new matrix representation with polynomial fractional coefficients. This new representation leads to sparse representations and implementations. As direct applications, we focus our work on the Windmill LFSRs case, used for example in the E0 stream cipher, and on other general applications that use this new representation. In a second part, a new design criterion called diffusion delay for LFSRs is introduced and compared with existing related notions. This criterion represents the diffusion capacity of an LFSR. Thus, using the matrix representation, we present a new algorithm to randomly pick LFSRs with good properties (including the new one) and sparse descriptions dedicated to hardware and software designs. We present some examples of LFSRs generated using our algorithm to show the relevance of our approach. Comment: Submitted to IEEE-I
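For readers unfamiliar with LFSRs, the sketch below shows a small Fibonacci LFSR together with its traditional transition-matrix form over GF(2). It does not reproduce the paper's new representation or its selection algorithm. The function names, the 4-bit register size, and the feedback choice (polynomial x^4 + x^3 + 1) are assumptions made only for this example.

```python
def lfsr_step(state, shifts, nbits):
    """One step of a right-shifting Fibonacci LFSR.

    The feedback bit is the XOR of the state bits at positions `shifts`
    (bit 0 is the least significant bit); it enters at the top bit.
    """
    bit = 0
    for s in shifts:
        bit ^= (state >> s) & 1
    return (state >> 1) | (bit << (nbits - 1))


def transition_matrix(shifts, nbits):
    """Rows of the GF(2) matrix A such that new_bits = A * old_bits."""
    rows = []
    for k in range(nbits - 1):
        # New bit k is the old bit k + 1 (plain shift).
        rows.append([1 if j == k + 1 else 0 for j in range(nbits)])
    # The new top bit is the XOR of the tapped bits.
    rows.append([1 if j in shifts else 0 for j in range(nbits)])
    return rows


def apply_matrix(rows, state):
    """Multiply the matrix by the state's bit vector, modulo 2."""
    nbits = len(rows)
    bits = [(state >> j) & 1 for j in range(nbits)]
    new_bits = [sum(r[j] * bits[j] for j in range(nbits)) % 2 for r in rows]
    return sum(b << k for k, b in enumerate(new_bits))


def period(start, shifts, nbits):
    """Number of steps until the register returns to `start`."""
    state, steps = lfsr_step(start, shifts, nbits), 1
    while state != start:
        state = lfsr_step(state, shifts, nbits)
        steps += 1
    return steps


SHIFTS, NBITS = (0, 1), 4   # assumed taps for x^4 + x^3 + 1
A = transition_matrix(SHIFTS, NBITS)
```

With these taps the register cycles through all 15 nonzero states, and applying `A` to any state gives the same result as `lfsr_step`, which is the sense in which an LFSR is an LFSM without inputs.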

    Efficient, Near Complete and Often Sound Hybrid Dynamic Data Race Prediction (extended version)

    Dynamic data race prediction aims to identify races based on a single program run represented by a trace. The challenge is to remain efficient while being as sound and as complete as possible. Efficient means a linear run-time, as otherwise the method is unlikely to scale for real-world programs. We introduce an efficient, near complete and often sound dynamic data race prediction method that combines the lockset method with several improvements made in the area of happens-before methods. By near complete we mean that the method is complete in theory but for efficiency reasons the implementation applies some optimizations that may result in incompleteness. The method can be shown to be sound for two threads but is unsound in general. We provide extensive experimental data that shows that our method works well in practice. Comment: typos, appendi
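The lockset method named in the abstract above can be sketched in its basic form as follows. This is only the classic lockset check, without any of the happens-before improvements the paper combines it with. The trace format of `(thread, op, target)` tuples and the function name `lockset_races` are hypothetical.

```python
def lockset_races(trace):
    """Flag variables whose accesses share no common lock.

    `trace` is a list of (thread, op, target) tuples where op is one of
    "acq", "rel" (target is a lock) or "rd", "wr" (target is a variable).
    """
    held = {}        # thread -> set of locks currently held
    candidates = {}  # variable -> locks held on every access so far
    accessors = {}   # variable -> threads that accessed it
    written = set()  # variables with at least one write
    flagged = []
    for thread, op, target in trace:
        locks = held.setdefault(thread, set())
        if op == "acq":
            locks.add(target)
        elif op == "rel":
            locks.discard(target)
        elif op in ("rd", "wr"):
            if target in candidates:
                candidates[target] &= locks      # refine the candidate set
            else:
                candidates[target] = set(locks)  # first access: copy held locks
            accessors.setdefault(target, set()).add(thread)
            if op == "wr":
                written.add(target)
            if (not candidates[target]
                    and len(accessors[target]) > 1
                    and target in written
                    and target not in flagged):
                flagged.append(target)
    return flagged


# Hypothetical trace: x is always protected by lock m, y is not.
trace = [
    ("t1", "acq", "m"), ("t1", "wr", "x"), ("t1", "rel", "m"),
    ("t1", "wr", "y"),
    ("t2", "acq", "m"), ("t2", "rd", "x"), ("t2", "rd", "y"),
    ("t2", "rel", "m"),
]
```

On this trace only `y` is flagged. A pure lockset check can report accesses that are in fact ordered by other synchronization, which is one reason it is commonly paired with happens-before reasoning.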

    Partially ordered distributed computations on asynchronous point-to-point networks

    Asynchronous executions of a distributed algorithm differ from each other due to the nondeterminism in the order in which the messages exchanged are handled. In many situations of interest, the asynchronous executions induced by restricting nondeterminism are more efficient, in an application-specific sense, than the others. In this work, we define partially ordered executions of a distributed algorithm as the executions satisfying some restricted orders of their actions in two different frameworks, those of the so-called event- and pulse-driven computations. The aim of these restrictions is to characterize asynchronous executions that are likely to be more efficient for some important classes of applications. Also, an asynchronous algorithm that ensures the occurrence of partially ordered executions is given for each case. Two of the applications that we believe may benefit from the restricted nondeterminism are backtrack search, in the event-driven case, and iterative algorithms for systems of linear equations, in the pulse-driven case.

    Non-intrusive on-the-fly data race detection using execution replay

    This paper presents a practical solution for detecting data races in parallel programs. The solution consists of a combination of execution replay (RecPlay) with automatic on-the-fly data race detection. This combination enables us to perform the data race detection on an unaltered execution (almost no probe effect). Furthermore, the usage of multilevel bitmaps and snooped matrix clocks limits the amount of memory used. As the record phase of RecPlay is highly efficient, there is no need to switch it off, thereby eliminating the possibility of Heisenbugs because tracing can be left on all the time. Comment: In M. Ducasse (ed), proceedings of the Fourth International Workshop on Automated Debugging (AAdebug 2000), August 2000, Munich. cs.SE/001003