15,815 research outputs found
Okapi: Causally Consistent Geo-Replication Made Faster, Cheaper and More Available
Okapi is a new causally consistent geo-replicated key- value store. Okapi
leverages two key design choices to achieve high performance. First, it relies
on hybrid logical/physical clocks to achieve low latency even in the presence
of clock skew. Second, Okapi achieves higher resource efficiency and better
availability, at the expense of a slight increase in update visibility latency.
To this end, Okapi implements a new stabilization protocol that uses a
combination of vector and scalar clocks and makes a remote update visible when
its delivery has been acknowledged by every data center. We evaluate Okapi with
different workloads on Amazon AWS, using three geographically distributed
regions and 96 nodes. We compare Okapi with two recent approaches to causal
consistency, Cure and GentleRain. We show that Okapi delivers up to two orders
of magnitude better performance than GentleRain and that Okapi achieves up to
3.5x lower latency and a 60% reduction of the meta-data overhead with respect
to Cure
Space Efficient Breadth-First and Level Traversals of Consistent Global States of Parallel Programs
Enumerating consistent global states of a computation is a fundamental
problem in parallel computing with applications to debug- ging, testing and
runtime verification of parallel programs. Breadth-first search (BFS)
enumeration is especially useful for these applications as it finds an
erroneous consistent global state with the least number of events possible. The
total number of executed events in a global state is called its rank. BFS also
allows enumeration of all global states of a given rank or within a range of
ranks. If a computation on n processes has m events per process on average,
then the traditional BFS (Cooper-Marzullo and its variants) requires
space in the worst case, whereas ou r
algorithm performs the BFS requires space. Thus, we
reduce the space complexity for BFS enumeration of consistent global states
exponentially. and give the first polynomial space algorithm for this task. In
our experimental evaluation of seven benchmarks, traditional BFS fails in many
cases by exhausting the 2 GB heap space allowed to the JVM. In contrast, our
implementation uses less than 60 MB memory and is also faster in many cases
Execution replay and debugging
As most parallel and distributed programs are internally non-deterministic --
consecutive runs with the same input might result in a different program flow
-- vanilla cyclic debugging techniques as such are useless. In order to use
cyclic debugging tools, we need a tool that records information about an
execution so that it can be replayed for debugging. Because recording
information interferes with the execution, we must limit the amount of
information and keep the processing of the information fast. This paper
contains a survey of existing execution replay techniques and tools.Comment: In M. Ducasse (ed), proceedings of the Fourth International Workshop
on Automated Debugging (AADebug 2000), August 2000, Munich. cs.SE/001003
Revisiting LFSMs
Linear Finite State Machines (LFSMs) are particular primitives widely used in
information theory, coding theory and cryptography. Among those linear
automata, a particular case of study is Linear Feedback Shift Registers (LFSRs)
used in many cryptographic applications such as design of stream ciphers or
pseudo-random generation. LFSRs could be seen as particular LFSMs without
inputs.
In this paper, we first recall the description of LFSMs using traditional
matrices representation. Then, we introduce a new matrices representation with
polynomial fractional coefficients. This new representation leads to sparse
representations and implementations. As direct applications, we focus our work
on the Windmill LFSRs case, used for example in the E0 stream cipher and on
other general applications that use this new representation.
In a second part, a new design criterion called diffusion delay for LFSRs is
introduced and well compared with existing related notions. This criterion
represents the diffusion capacity of an LFSR. Thus, using the matrices
representation, we present a new algorithm to randomly pick LFSRs with good
properties (including the new one) and sparse descriptions dedicated to
hardware and software designs. We present some examples of LFSRs generated
using our algorithm to show the relevance of our approach.Comment: Submitted to IEEE-I
Efficient, Near Complete and Often Sound Hybrid Dynamic Data Race Prediction (extended version)
Dynamic data race prediction aims to identify races based on a single program
run represented by a trace. The challenge is to remain efficient while being as
sound and as complete as possible. Efficient means a linear run-time as
otherwise the method unlikely scales for real-world programs. We introduce an
efficient, near complete and often sound dynamic data race prediction method
that combines the lockset method with several improvements made in the area of
happens-before methods. By near complete we mean that the method is complete in
theory but for efficiency reasons the implementation applies some optimizations
that may result in incompleteness. The method can be shown to be sound for two
threads but is unsound in general. We provide extensive experimental data that
shows that our method works well in practice.Comment: typos, appendi
Partially ordered distributed computations on asynchronous point-to-point networks
Asynchronous executions of a distributed algorithm differ from each other due
to the nondeterminism in the order in which the messages exchanged are handled.
In many situations of interest, the asynchronous executions induced by
restricting nondeterminism are more efficient, in an application-specific
sense, than the others. In this work, we define partially ordered executions of
a distributed algorithm as the executions satisfying some restricted orders of
their actions in two different frameworks, those of the so-called event- and
pulse-driven computations. The aim of these restrictions is to characterize
asynchronous executions that are likely to be more efficient for some important
classes of applications. Also, an asynchronous algorithm that ensures the
occurrence of partially ordered executions is given for each case. Two of the
applications that we believe may benefit from the restricted nondeterminism are
backtrack search, in the event-driven case, and iterative algorithms for
systems of linear equations, in the pulse-driven case
Non-intrusive on-the-fly data race detection using execution replay
This paper presents a practical solution for detecting data races in parallel
programs. The solution consists of a combination of execution replay (RecPlay)
with automatic on-the-fly data race detection. This combination enables us to
perform the data race detection on an unaltered execution (almost no probe
effect). Furthermore, the usage of multilevel bitmaps and snooped matrix clocks
limits the amount of memory used. As the record phase of RecPlay is highly
efficient, there is no need to switch it off, hereby eliminating the
possibility of Heisenbugs because tracing can be left on all the time.Comment: In M. Ducasse (ed), proceedings of the Fourth International Workshop
on Automated Debugging (AAdebug 2000), August 2000, Munich. cs.SE/001003
- …