15,139 research outputs found
A thread calculus with molecular dynamics
We present a theory of threads, interleaving of threads, and interaction
between threads and services with features of molecular dynamics, a model of
computation that bears on computations in which dynamic data structures are
involved. Threads can interact with services of which the states consist of
structured data objects and computations take place by means of actions which
may change the structure of the data objects. The features introduced include
restriction of the scope of names used in threads to refer to data objects.
Because that feature makes it troublesome to provide a model based on
structural operational semantics and bisimulation, we construct a projective
limit model for the theory.Comment: 47 pages; examples and results added, phrasing improved, references
replace
Lost in Abstraction: Monotonicity in Multi-Threaded Programs (Extended Technical Report)
Monotonicity in concurrent systems stipulates that, in any global state,
extant system actions remain executable when new processes are added to the
state. This concept is not only natural and common in multi-threaded software,
but also useful: if every thread's memory is finite, monotonicity often
guarantees the decidability of safety property verification even when the
number of running threads is unknown. In this paper, we show that the act of
obtaining finite-data thread abstractions for model checking can be at odds
with monotonicity: Predicate-abstracting certain widely used monotone software
results in non-monotone multi-threaded Boolean programs - the monotonicity is
lost in the abstraction. As a result, well-established sound and complete
safety checking algorithms become inapplicable; in fact, safety checking turns
out to be undecidable for the obtained class of unbounded-thread Boolean
programs. We demonstrate how the abstract programs can be modified into
monotone ones, without affecting safety properties of the non-monotone
abstraction. This significantly improves earlier approaches of enforcing
monotonicity via overapproximations
Mathematizing C++ concurrency
Shared-memory concurrency in C and C++ is pervasive in systems programming, but has long been poorly defined. This motivated an ongoing shared effort by the standards committees to specify concurrent behaviour in the next versions of both languages. They aim to provide strong guarantees for race-free programs, together with new (but subtle) relaxed-memory atomic primitives for high-performance concurrent code. However, the current draft standards, while the result of careful deliberation, are not yet clear and rigorous definitions, and harbour substantial problems in their details.
In this paper we establish a mathematical (yet readable) semantics for C++ concurrency. We aim to capture the intent of the current (`Final Committee') Draft as closely as possible, but discuss changes that fix many of its problems. We prove that a proposed x86 implementation of the concurrency primitives is correct with respect to the x86-TSO model, and describe our Cppmem tool for exploring the semantics of examples, using code generated from our Isabelle/HOL definitions.
Having already motivated changes to the draft standard, this work will aid discussion of any further changes, provide a correctness condition for compilers, and give a much-needed basis for analysis and verification of concurrent C and C++ programs
Efficient, Near Complete and Often Sound Hybrid Dynamic Data Race Prediction (extended version)
Dynamic data race prediction aims to identify races based on a single program
run represented by a trace. The challenge is to remain efficient while being as
sound and as complete as possible. Efficient means a linear run-time as
otherwise the method unlikely scales for real-world programs. We introduce an
efficient, near complete and often sound dynamic data race prediction method
that combines the lockset method with several improvements made in the area of
happens-before methods. By near complete we mean that the method is complete in
theory but for efficiency reasons the implementation applies some optimizations
that may result in incompleteness. The method can be shown to be sound for two
threads but is unsound in general. We provide extensive experimental data that
shows that our method works well in practice.Comment: typos, appendi
FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search
We present FLASH (\textbf{F}ast \textbf{L}SH \textbf{A}lgorithm for
\textbf{S}imilarity search accelerated with \textbf{H}PC), a similarity search
system for ultra-high dimensional datasets on a single machine, that does not
require similarity computations and is tailored for high-performance computing
platforms. By leveraging a LSH style randomized indexing procedure and
combining it with several principled techniques, such as reservoir sampling,
recent advances in one-pass minwise hashing, and count based estimations, we
reduce the computational and parallelization costs of similarity search, while
retaining sound theoretical guarantees.
We evaluate FLASH on several real, high-dimensional datasets from different
domains, including text, malicious URL, click-through prediction, social
networks, etc. Our experiments shed new light on the difficulties associated
with datasets having several million dimensions. Current state-of-the-art
implementations either fail on the presented scale or are orders of magnitude
slower than FLASH. FLASH is capable of computing an approximate k-NN graph,
from scratch, over the full webspam dataset (1.3 billion nonzeros) in less than
10 seconds. Computing a full k-NN graph in less than 10 seconds on the webspam
dataset, using brute-force (), will require at least 20 teraflops. We
provide CPU and GPU implementations of FLASH for replicability of our results
Synchronising C/C++ and POWER
Shared memory concurrency relies on synchronisation primitives: compare-and-swap, load-reserve/store-conditional (aka LL/SC), language-level mutexes, and so on. In a sequentially consistent setting, or even in the TSO setting of x86 and Sparc, these have well-understood semantics. But in the very relaxed settings of IBM®, POWER®, ARM, or C/C++, it remains surprisingly unclear exactly what the programmer can depend on.
This paper studies relaxed-memory synchronisation. On the hardware side, we give a clear semantic characterisation of the load-reserve/store-conditional primitives as provided by POWER multiprocessors, for the first time since they were introduced 20 years ago; we cover their interaction with relaxed loads, stores, barriers, and dependencies. Our model, while not officially sanctioned by the vendor, is validated by extensive testing, comparing actual implementation behaviour against an oracle generated from the model, and by detailed discussion with IBM staff. We believe the ARM semantics to be similar.
On the software side, we prove sound a proposed compilation scheme of the C/C++ synchronisation constructs to POWER, including C/C++ spinlock mutexes, fences, and read-modify-write operations, together with the simpler atomic operations for which soundness is already known from our previous work; this is a first step in verifying concurrent algorithms that use load-reserve/store-conditional with respect to a realistic semantics. We also build confidence in the C/C++ model in its own terms, fixing some omissions and contributing to the C standards committee adoption of the C++11 concurrency model
- …