





In this thesis we present a computer-aided programming approach to concurrency. Our approach
helps the programmer by automatically fixing concurrency-related bugs, i.e. bugs that occur
when the program is executed using an aggressive preemptive scheduler, but not when using a
non-preemptive (cooperative) scheduler. Bugs are program behaviours that are incorrect w.r.t.
a specification. We consider both user-provided explicit specifications in the form of assertion
statements in the code as well as an implicit specification. The implicit specification is inferred
from the non-preemptive behaviour. Let us consider sequences of calls that the program makes
to an external interface. The implicit specification requires that any such sequence produced
under a preemptive scheduler should be included in the set of sequences produced under a
non-preemptive scheduler.
We consider several semantics-preserving fixes that go beyond atomic sections typically
explored in the synchronisation synthesis literature. Our synthesis is able to place locks, barriers
and wait-signal statements and last, but not least reorder independent statements. The latter
may be useful if a thread is released to early, e.g., before some initialisation is completed. We
guarantee that our synthesis does not introduce deadlocks and that the synchronisation inserted
is optimal w.r.t. a given objective function.
We dub our solution trace-based synchronisation synthesis and it is loosely based on
counterexample-guided inductive synthesis (CEGIS). The synthesis works by discovering a
trace that is incorrect w.r.t. the specification and identifying ordering constraints crucial to trigger
the specification violation. Synchronisation may be placed immediately (greedy approach) or
delayed until all incorrect traces are found (non-greedy approach). For the non-greedy approach
we construct a set of global constraints over synchronisation placements. Each model of the
global constraints set corresponds to a correctness-ensuring synchronisation placement. The
placement that is optimal w.r.t. the given objective function is chosen as the synchronisation
solution.
iii
We evaluate our approach on a number of realistic (albeit simplified) Linux device-driver
benchmarks. The benchmarks are versions of the drivers with known concurrency-related bugs.
For the experiments with an explicit specification we added assertions that would detect the bugs
in the experiments. Device drivers lend themselves to implicit specification, where the device and
the operating system are the external interfaces. Our experiments demonstrate that our synthesis
method is precise and efficient. We implemented objective functions for coarse-grained and
fine-grained locking and observed that different synchronisation placements are produced for




I would like to thank all those people who helped me in the last four years to finish this thesis.
First and foremost Thomas A. Henzinger whose advice proved invaluable to guide me along
the way. Despite his responsibilities as a president he always has time for his students. Pavol
Černý and Arjun Radhakrishna helped me get a smooth start with my PhD by involving me into
the project that would eventually make up this thesis. But before even starting my PhD at IST I
visited Laura Kovács at TU Vienna who encouraged me to persue a PhD degree. I would also
like to thank Roderick Bloem and Bettina Könighofer for the many interesting discussions and
for inviting me to Graz.
Thanks to the professors from my thesis committee for reviewing this thesis: Roderick
Bloem, Krishnendu Chatterjee, Pavol Černý, Thomas A. Henzinger and Laura Kovács.
Further I am grateful to my collaborators Ashutosh Gupta, Pavol Černý, Thomas A. Hen-
zinger, Arjun Radhakrishna, Leonid Ryzhyk and Roopsha Samanta for the fruitful collaborations
we had. Without them I would not have been able to conduct this research.
My office mates Przemysław Daca, Mirco Giacobbe, Andreas Pavlogiannis, Arjun Radhakrishna,
Jakob Rueß and the usual coffee break crowd consisting of Sergiy Bogomolov, Ventsislav Chonev,
Rasmus Ibsen-Jensen, Bernhard Kragl, Petr Novotný, Jan Otop and Roopsha Samanta made my
time at IST a much more pleasent experience
Last but not least I would like to thank my family for their continued support even though I
did not have as much time for them as I would have liked.
v
About the Author
Thorsten Tarrach completed a BSc and MSc in Computer Science at the University of
Saarland in Saarbrücken, Germany in 2010. After two years in the industry as a software engineer
he joined IST in 2012. During his PhD studies he worked on synthesis of synchronisation for
concurrent programs. The PhD was concluded in 2016.
vi
Record of Publications
The work appearing in this thesis was supported in part by the European Research Council
(ERC) under grant 267989 (QUAREM), by the Austrian Science Fund (FWF) under grants
S11402-N23 (RiSE) and Z211-N23 (Wittgenstein Award), by NSF under award CCF 1421752
and the Expeditions award CCF 1138996, by the Simons Foundation and by a gift from the Intel
Corporation.
The formalisation in Chapter 2 and the synthesis technique presented in Chapter 6 is an
extension of our work that appeared in [Černý et al., 2015b]. We extended said paper by
including a proof for Theorem 6.4.5 that shows that language inclusion is undecidable for our
particular construction of automata and independence relation. Further, we introduced a set of
global mutex constraints that replace the greedy approach of our previous work and enables
optimal lock placement according to an objective function. This part of the work is to appear in
the Formal Methods in System Design journal.
Chapters 3, 4 and 5 contain the research from our publications [Černý et al., 2013], [Černý
et al., 2014] and [Gupta et al., 2015] respectively. The presentation has been modified w.r.t. the
original papers to match the formalisation presented in Chapter 2. This is especially true for





About the Author v
Record of Publications vi
Tabel of Contents vii
List of Tables x
List of Figures xi
List of Abbreviations xiii
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Illustrative Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Formal Framework and Problem Statement 19
2.1 Concurrent Programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 Executions and Traces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3 Program Correctness and the Synthesis Problem . . . . . . . . . . . . . . . . . 29
2.4 Trace-based Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
viii
3 Synthesis for an Explicit Specification 33
3.1 Problem Statement and Illustrative Examples . . . . . . . . . . . . . . . . . . 33
3.2 Semantics-preserving Transformations . . . . . . . . . . . . . . . . . . . . . . 36
3.3 Generalising Counterexamples to Partial-order Neighbourhoods . . . . . . . . 38
3.4 Synthesis by Elimination of Bad Neighbourhoods . . . . . . . . . . . . . . . . 42
3.5 Implementation and Experiments . . . . . . . . . . . . . . . . . . . . . . . . . 44
4 Regression-free Synthesis 49
4.1 Problem Statement and Illustrative Example . . . . . . . . . . . . . . . . . . . 49
4.2 Good and Bad Traces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.3 The Regression-free Synthesis Procedure . . . . . . . . . . . . . . . . . . . . 60
4.4 Implementation and Experiments . . . . . . . . . . . . . . . . . . . . . . . . . 63
5 Synthesis of Locks and Other Synchronisation Primitives 67
5.1 Problem Statement and Illustrative Examples . . . . . . . . . . . . . . . . . . 67
5.2 Computing Good and Bad Neighbourhoods . . . . . . . . . . . . . . . . . . . 71
5.3 Synchronisation Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.4 Bug Summarisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.5 Accelerating CEGAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.6 Implementation and Experiments . . . . . . . . . . . . . . . . . . . . . . . . . 88
6 Synthesis using an Implicit Specification 95
6.1 Problem Statement and Illustrative Example . . . . . . . . . . . . . . . . . . . 95
6.2 Solution Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.3 Abstract Concurrent Programs . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.4 Checking Preemption-safety . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.5 Finding and Enforcing Mutex Constraints . . . . . . . . . . . . . . . . . . . . 120
6.6 Global Lock Placement Constraints . . . . . . . . . . . . . . . . . . . . . . . 126
6.7 Optimising Lock Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . 131




List of Publications 151
x
List of Tables
3.1 Synchronisation patterns in Linux device drivers. . . . . . . . . . . . . . . . . 45
3.2 Subpatterns of the REORDER pattern. . . . . . . . . . . . . . . . . . . . . . . . 46
3.3 Micro-benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.4 Results for Linux Realtek 8169 driver benchmark . . . . . . . . . . . . . . . . 47
4.1 Results in iterations and time needed. . . . . . . . . . . . . . . . . . . . . . . 64
5.1 Experiments: ϕB generation . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.2 Experiments: synchronisation synthesis L : Locks WS : Wait-notifies B :
Barriers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.3 Experiments: bug summarisation 2S : 2-stage access bug DR : Data-race bug
OAV : Other atomicity violation DU : Define-use bugs OV : Order-violation
bug (only human inspection) Human : Human interpretation of the bug . . . . 92
5.4 Experiments: CEGAR acceleration . . . . . . . . . . . . . . . . . . . . . . . . 93
6.1 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.2 Lock placement statistics: the number of synthesised lock variables, lock and
unlock statements, and the number of abstract statements protected by locks for
different objective functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
xi
List of Figures
1.1 Example with two threads incrementing x . . . . . . . . . . . . . . . . . . . . 7
1.2 Example with correct locks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1 Syntax ofW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2 Single-thread semantics ofW . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2 Single-thread semantics ofW (continued) . . . . . . . . . . . . . . . . . . . . 25
2.3 Non-preemptive semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4 Additional rule for preemptive semantics . . . . . . . . . . . . . . . . . . . . . 26
2.5 Basic flow of trace-based synthesis . . . . . . . . . . . . . . . . . . . . . . . . 31
3.1 Illustrative examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2 Example for copying idempotent statements. . . . . . . . . . . . . . . . . . . . 38
3.3 Examples of REORDER subpatterns and corresponding elimination graphs. . . . 45
4.1 Program analysis with good and bad traces . . . . . . . . . . . . . . . . . . . . 52
4.2 Sample Good Traces for Regression-preventing constraints . . . . . . . . . . . 56
4.3 Eliminating bad traces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.4 The CEGIS and PACES spectrum . . . . . . . . . . . . . . . . . . . . . . . . 62
5.1 Online banking: This trace is drawn from a program consisting of three threads,
one for withdrawing money, one for depositing money, and one for checking
consistency of the bank account after completion of a withdrawal and a deposit.
(In all the examples in this chapter, we represent traces using typed global variable declarations/initialisations, followed
by each thread’s typed local variable declarations and statements. Note that this representation depicts a trace and not
a program.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
xii
5.2 Translation of statement with identifier ` to guarded action . . . . . . . . . . . 73
5.3 Rewrite rules for synchronisation synthesis . . . . . . . . . . . . . . . . . . . 79
5.4 Example programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.4 Example programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.5 Inference rules for bug summarisation (In this figure, `1 < `′1 < `′′1 and `2 < `′2 < `′′2 .) 83
6.1 Running example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.2 Interaction of the device driver with the OS and the device . . . . . . . . . . . 99
6.3 Abstraction of the running example . . . . . . . . . . . . . . . . . . . . . . . 100
6.4 Running example with the synthesised locks . . . . . . . . . . . . . . . . . . . 101
6.5 Solution Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.6 Syntax ofWabs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.7 Partial set of rules for single-thread semantics ofWabs . . . . . . . . . . . . . . 105
6.8 Example motivating branch-tagging . . . . . . . . . . . . . . . . . . . . . . . 105
6.9 Abstraction function fromW toWabs . . . . . . . . . . . . . . . . . . . . . . 106
6.10 Simulator algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.11 Example for illustrating construction of Bk,I for k = 1 and I = {(α, β)}. . . . 116
6.12 Atomicity violation patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.13 Example: Mutex constraints and conflicts . . . . . . . . . . . . . . . . . . . . 124
xiii
List of Abbreviations
CEGAR Counterexample guided abstraction refinement
CEGIS Counterexample guided inductive synthesis
HB Happens-before
LOC Lines of Code
mutex mutual exclusion
NFA Non-deterministic finite automaton




The main goal of this thesis is to introduce a technique for computer-aided programming where
the programmer is helped with the hardest aspect of concurrent programs: synchronisation.
A concurrent program allows several threads of statements to run at the same time. Such
programs typically require synchronisation to ensure threads do not interfere with each other.
Our technique helps the programmer by automatically repairing a concurrent program by adding
missing synchronisation to the source code.
1.1 Motivation
The recent interest in concurrency in both industry and academia stems from developments
in the CPU market. While the single-core performance has stagnated, the number of cores
in today’s CPUs is growing. This means programmers can only improve the performance of
their algorithms by parallelising them. A second motivation for concurrency is to ensure that
systems stay responsive even during long running computations. Even single-core systems
may create the illusion of running threads in parallel using time-sharing, i.e. quickly switching
between threads. Such a switch is called a context-switch. In this work we assume a single core
that uses frequent context-switches to run threads concurrently. The work would also apply to
a hypothetical multi-core system that guarantees sequential consistency of memory accesses.
Common multi-core systems today do not offer such guarantees, so that additional synthesis
steps would be required to enforce memory consistency.
However, implementing concurrent programs is difficult as it requires great care not to
2
introduce errors due to missing synchronisation. Without synchronisation concurrent systems
offer no guarantee as to the relative progress between threads. We consider in this thesis shared-
memory concurrency, where each threads has a local state and read/write access to a shared state.
Other types of concurrency are discussed in Section 1.3. Lack of synchronisation may lead to an
unintended shared state if two threads manipulate the same variables concurrently. Such states
outside the specification are considered bugs.
Our work aims to discover and fix concurrency bugs. Concurrency bugs only occur because
of scheduler choices on the interleaving of threads. The stochastic nature of these bugs make
them hard to discover by traditional testing.
Techniques to automatically discover bugs have a long research history and are important
for critical applications, such as software used in the automotive or aviation industry. They are
also used in hardware design as replacing hardware is much more expensive. The term Formal
Methods is a term used to describe such techniques. They all have in common that a formal
specification is used to enable automatic bug discovery. Bug discovery methods for concurrent
programs range from the discovery of common bug patterns [Jin et al., 2012], race detection
[Savage et al., 1997] to software verification.
Software verification is a method to prove software conforms to its specification. It was
pioneered in the 60s with Floyd-Hoare triples [Floyd, 1967; Hoare, 1969] for manual verification,
and later automated using model checking [Clarke and Emerson, 1982; Queille and Sifakis,
1982]. To this day automated software verification remains a challenge because depending on
whether the state space is finite it is either undecidable or PSPACE-complete [Esparza, 1997;
Schnoebelen, 2002], meaning for real-world concurrent programs there is no efficient model-
checking algorithm available. This lead to the development of sound, but incomplete algorithms
that will find all bugs, but also return false-positives. In practice this leads to a huge number of
false-positive bug reports the programmer has to comb through. An example for a model-checker
that can deal with concurrent programs is CBMC [Clarke et al., 2004]. In our experience it
cannot verify programs with more than 400 lines of code and 5 threads in reasonable time.
We want to go one step further and not just discover concurrency bugs, but also assist
programmers in fixing these bugs automatically. The programmer would provide a program
that is sequentially correct and tested enough to ensure that there are no bugs in the sequential
execution. This is a reasonable assumption as programmers have developed and tested sequential
programs for decades. Our tool then finds concurrency bugs and automatically modifies the
3
source code to eliminate the bug by adding synchronisation constructs.
This type of program repair is a form of synthesis. Traditional, functional synthesis, which
attempts to generate whole programs from specifications, goes back to the 60s [Green, 1969;
Waldinger and Lee, 1969]. Despite half a century of research, synthesis of full programs is still
out of reach. Our synthesis is less ambitious: We only aim to introduce isolated synchronisation
primitives into the code.
Our synthesis is based on verification: A verifier is used to find a bug in the program that we fix
using our synthesis. This means we inherit the disadvantages of automated program verification,
such as false-positives. In our case false-positives would lead to additional synchronisation,
which may negatively impact performance, but not introduce incorrect behaviour. In general,
introduction of synchronisation in the program code may cause deadlocks. This is a state where
some threads of the program are blocked indefinitely and the program cannot terminate. To
illustrate this, take the most common type of synchronisation primitive, locks, which ensure
that certain areas of the program are mutually exclusive. A thread tid1 trying to acquire a lock
currently held by thread tid2 has to wait for the lock to become available. If both threads try to
acquire a lock currently held by the other, none can make progress. This is a classic example for
a deadlock. In our synthesis we take care not to introduce deadlocks.
Finally, we envision a system where the programmer may write code completely ignoring
concurrency and the necessary synchronisation is automatically inserted using our synthesis
technique. While in this thesis we introduce all the needed techniques, they do not scale to
real-world programs.
Though our work is applicable to a wide variety of programs, we evaluated our imple-
mentation on Linux device drivers. Device drivers are a crucial pieces of code that facilitate
communication between the operating system and the hardware. They must inherently support
concurrency and a bug in a device driver is more devastating because it may cause the entire
system to crash.
1.1.1 Synthesis approach
We observe that programmers tend to take an iterative approach to bug fixing. After a bug is
discovered the programmer would typically try to reconstruct the control flow (trace) that lead
to the error and then mentally abstract the trace to narrow down the root cause of the bug. The
4
program is then fixed by modifying the source code. Finally, testing is resumed until the program
is found to be correct.
We mimic this iterative approach in our synthesis, so in a sense our automatic approach
proceeds in the same way as a human programmer would. Each iteration checks if the program is
correct w.r.t. a formal specification. If it is incorrect a counterexample trace is produced. A trace
is a sequence of program statements that were executed. The trace contains information which
branches were taken and at which points the program makes context-switches. A counterexample
trace is a witness of a specification failure and we call such traces bad traces.
The original counterexample trace is then generalised to a set of bad traces. Intuitively
a larger set describes the root cause of the bug. Finally the program is refined by inserting
synchronisation into the code, such that the set of bad traces become infeasible. An infeasible
trace is one that cannot occur because of the semantics of the program. The generalisation is
therefore a crucial step in speeding up the process, because it ensures a whole set of bad traces is
removed from the program as opposed to just removing the original counterexample trace. The
synthesis algorithm continues in a loop finding new bad traces until the refined program is found
to be correct. We dub this approach trace-based synchronisation synthesis.
We consider two types of specifications: Explicit specification and implicit specification. An
explicit specification is provided by the programmer in addition to the actual program, typically
in the form of assertions. Assert statements contain an expression that needs to evaluate to true
whenever the execution reaches the assertion. If an assertion evaluates to false the trace leading
to it is considered a counterexample.
It is difficult for a programmer to provide a complete specification that will expose all
possible concurrency bugs. However, our problem set-up is suitable for an implicit specification.
For this we introduce two schedulers: A hypothetical non-preemptive scheduler that will not
switch threads except at specific preemption points and the real-world preemptive scheduler that
may switch threads at any point. Our implicit specification requires that the behaviour of the
program under the preemptive scheduler should be the same as executing the program under the
non-preemptive scheduler. For the implicit specification the counterexample is a trace that is
possible under the preemptive, but not under the non-preemptive scheduler.
In the literature the most common synchronisation primitives employed for concurrent
program repair are atomic sections. An atomic section is a sequence of adjacent statements
and no context-switch is allowed in between executing these statements. Atomic sections are
5
a theoretical construct and not directly implementable. There exist techniques however to
convert them to locks [Cherem et al., 2008]. We focus on synchronisation primitives commonly
employed by programmers today, such as locks, barriers and wait-signal statements. A lock
marks several sections of statements that are mutually exclusive to each other. This makes them
weaker than atomic sections, which are exclusive w.r.t. to any part of the code. We also explore
the reordering of independent statements as an alternative to wait-signal statements. Reordering
produces more efficient code as it avoids the introduction of additional synchronisation primitives
that generally have a negative runtime impact. In our study in Section 3.5.1 we confirm that
reordering is a common fix for concurrency bugs in device drivers. For our work we assume a
sequentially consistent memory model where the CPU does not reorder statements.
1.1.2 Techniques introduced
In this thesis we present four approaches to trace-based synchronisation synthesis increasing
in sophistication. As mentioned above we investigated two kinds of specification: explicit and
implicit. For the explicit specification the programmer typically annotates the source code with
assertions and an off-the-shelve model-checker [Clarke et al., 2004] can be used to check the
program for correctness.
We start with this basic setup for our first implementation of the trace-based synchronisation
synthesis technique in Chapter 3. After the model-checker returns a trace we generalise it and
insert two kinds of synchronisation primitives to eliminate the bug. Firstly we try to reorder two
statements from the same thread in the source code. This eliminates bugs where one threads is
released too early. If that fails we place an atomic section in the code. While the model-checker
respects the semantics of atomic sections and confirms the final program is correct, the atomic
sections cannot be compiled to real machine code.
For statement reordering we consider only independent statements, so that swapping does not
affect the semantics of the thread. However, by reordering we may still introduce new bugs in
other threads that rely on the original ordering. Chapter 4 improves over the previous technique
by not just analysing counterexamples, but also learning from good traces. A good trace is
one that does not violate the specification. During our synthesis we use the model-checker to
generate both good and bad traces. When we eliminate bad traces using statement reordering we
guarantee that none of the observed good traces becomes bad. This means that w.r.t. to the good
6
traces observed to this point we guarantee there are no regressions.
As atomic sections are not directly implementable in most commonly used programming
languages we improved our synthesis to generate common synchronisation primitives, such as
locks, barriers and wait-signals in Chapter 5. Synthesising a lock is more challenging because
it is exclusive only w.r.t. code in other threads protected by the same lock. We accomplish
this by employing a theorem prover to generalise the counterexample traces. Using the prover
we obtain a formula of ordering constraints that includes all bad traces of the program. This
formula may contain disjunctions and conjunctions, e.g. statement A has to happen before B or C
before D. We have patterns that match parts of these formulæ and each pattern corresponds to a
synchronisation primitive that eliminates the traces. While this changes the synthesis step we
still employ an off-the-shelf model-checker in a loop to find counterexample traces we analyse.
Finally, in Chapter 6 we move to an implicit specification: We require that the program
exhibits the same observable behaviour under the preemptive and the non-preemptive scheduler.
This can no longer be checked using an off-the-shelf model-checker; we develop an algorithm
based on automata language-inclusion. This works by translating the control-flow of the program
into two automata, a non-preemptive one that serves as the specification and a preemptive one
that represents all possible traces. The language-inclusion checks if all behaviours possible
under the preemptive automaton are also possible under the non-preemptive automaton. The
language these automata produce is the sequence of read/write access to the memory. If these
sequences are identical we know that the result computed is the same. We can relax the restriction
that they need to be identical because two reads or access to different memory loctions do not
interfere with each other. The synchronisation synthesis discovers critical sections in the same
way as in Chapter 5, but does not insert locks greedily. Only once all critical sections are known
our tool inserts locks into the source code. This allows us to optimise the lock placement, for
example preferring a minimal number of lock statements or to minimise the number of statements
protected by a lock. Depending on the platform and workload of the program one may be better
than the other in terms of performance.
1.2 Illustrative Example
For a very simple example we assume two threads that increment a shared variable x by 1. To
make things more interesting a variable y is assigned in both as well. The example is given in
7
Figure 1.1 Example with two threads incrementing x
thread1
`1: l1 := x
`2: l11 := y
`3: x := l1 + 1
(a) Example
thread2
`A: l22 := y
`B: l2 := x











Informally the programmer’s expectation is that x should have increased by 2 after execution
of both threads. Without loss of generality let us assume the initial value of x is 0. Without
synchronisation the following trace is possible and results in x = 1: `1 → `2 → `A → `B →
`C → `3. Intuitively the problem with this trace is that both threads read value 0 into their
thread-local variables (l1 and l2) and both write back 1 to x.
In order to analyse the trace we first need a formal specification. An explicit specification
could be a post-condition x = x′ + 2, where x′ is the original value and x is the value after both
threads completed. Though simple in this case in general it is difficult for the programmer to
come up with the exact specification that would detect all possible race-conditions.
Our implicit specification approach ensures that the preemptive execution of the program
is equivalent to the non-preemptive execution of the program. One way to define equivalence
is to require that the result of the calculation is identical [Bloem et al., 2014]. This approach
is not applicable to reactive systems and we therefore require that the interaction with the
environment is retained, so that for the environment the preemptive program appears to behave
the same as the non-preemptive program. For this example the environment is the shared memory
and the interactions are reads and writes. We relax the requirement that the order of memory
accesses needs to be exactly the same as the order of reads w.r.t. other reads is unimportant. The
same is true for accesses to different variables. In Figure 1.1b we present the abstract program
with the access to global variables. There are two traces possible under the non-preemptive
scheduler: thread1 followed by thread2 and the other way around. This results in two
possible sequences of memory accesses:
(1) read1(x)→ read1(y)→ write1(x)→ read2(y)→ read2(x)→ write2(x) and
(2) read2(y)→ read2(x)→ write2(x)→ read1(x)→ read1(y)→ write1(x)
Under a preemptive scheduler more sequences are possible. A sequence is considered within
the specification if it can be transformed into a sequence of the non-preemptive scheduler. As a
8
transformation we allow that adjacent accesses can be swapped if they are independent. Accesses
are independent if they are both reads or if they refer to different variables. For example assume
that the scheduler starts with thread2 and then switches to thread1 after executing `A.
(3) read2(y)→ read1(x)→ read1(y)→ write1(x)→ read2(x)→ write2(x)
This sequence is within the specification, because it is equivalent to sequence (1). read2(y) is
independent w.r.t. all other accesses and be moved past the accesses of thread1. The sequence
(4) read2(y)→ read2(x)→ read1(x)→ read1(y)→ write1(x)→ write2(x)
is not within the specification as read2(x) and write1(x) are dependent and cannot be swapped.
This sequence corresponds to the bad trace `A → `B → `1 → `2 → `3 → `C. A trace is bad if it
violates the specification.
Trace generalisation. A bad trace is subsequently generalised to find the root cause of the
bug. We introduce different generalisation techniques in various chapters, which work on the
same principle: The sequence of events in the trace are seen as happens-before relations that
should be relaxed as long as all traces are still bad. We consider the following bad trace π:
`A → `B → `1 → `2 → `3 → `C. The trace can be represented by a happen-before formula:
f = hb(`A, `B) ∧ hb(`B, `1) ∧ hb(`1, `2) ∧ hb(`2, `3) ∧ hb(`3, `C)
where hb(x, y) describes all traces where x occurs before y. The formula f therefore represents
exactly our original bad trace. We can construct a larger formula f ′ that contains also the
happens-before relations that are implied by transitivity. For example hb(`A, `B) ∧ hb(`B, `1) also
implies hb(`A, `1). We have that
f ′ =
∧
{hb(`x, `y)|`x <π `y}
where `x <π `y denotes that `x appeared before `y in trace π.
The generalisation step consists of removing as many happens-before relations as possible
from f ′, such that all traces represented by the HB-formula are bad. For example if we remove
hb(`3, `C) from f ′, the HB-formula now represents two traces: the original π and the trace
`A → `B → `1 → `2 → `C → `3. As both traces are bad, removing hb(`3, `C) is a valid
generalisation. If we were to remove hb(`3, `C), hb(`2, `C), hb(`1, `C) f ′ would contain the trace
9
Figure 1.2 Example with correct locks
thread1
lock(lk)
`1: l1 := x
`2: l11 := y
`3: x := l1 + 1
unlock(lk)
thread2
`A: l22 := y
lock(lk)
`B: l2 := x
`C: x := l2 + 1
unlock(lk)
`A → `B → `C → `1 → `2 → `3, which conforms to the specification. Therefore the constraint
hb(`1, `C) must be part of generalisation of f ′. The most general HB-formula that represents all
bad traces in the program is
f ′′ = hb(`1, `C) ∧ hb(`B, `3)
Informally this means that both threads have to read x before either thread writes to it.
Synchronisation synthesis. As f ′′ represents all bad traces, the negation represents all good
traces of the program.
¬f ′′ = hb(`C, `1) ∨ hb(`3, `B)
From this HB-formula we can infer a lock. The formula ¬f ′′ mandates that either the statement
`C happens before `1 or `3 happens before `B. This is exactly what a lock around `1, `2, `3 and
`B, `C enforces. The final program is shown in Figure 1.2.
1.3 Related Work
Concurrency
There are different notions of concurrency. First let us consider true concurrency as introduced
in Petri nets [Petri, 1962]. True concurrency places no restrictions on the order of events.
Mazurkiewicz traces [Mazurkiewicz, 1987] are partial-order traces used to describe the behaviour
of a concurrent system. The trace does not enforce an order between events that can occur
concurrently.
Contrary, interleaved concurrency describes a model where a scheduler orders the events.
One model are message-passing systems. A message passing system is a set of independent
processes that communicate with each other using one or more channels they send and receive
messages from. The messages on the channel synchronise the processes. Hoare introduced a
10
first notation of message-passing concurrency: Communicating Sequential Processes [Hoare,
1978]. This evolved into the Calculus of Communicating Systems [Milner, 1980] and was later
extended to the π-calculus [Milner, 1992].
Shared-memory concurrency is another model of interleaved concurrency. We use this model
in this thesis as it is the model used by multi-threaded programs. It describes a system where
concurrently running processes communicate by accessing a common shared memory. This roots
back to [Dijkstra, 1968; Cadiou and Levy, 1973]. Traces of shared-memory concurrent programs
represent a total order over the events of all process.
Reasoning about concurrent programs. Software verification was pioneered in the 60s by
[Floyd, 1967] and [Hoare, 1969] who independently invented what is today known as the Floyd-
Hoare triples. Their research was based on earlier work in [McCarthy, 1960; McCarthy, 1962].
These triples essentially capture the semantics of a statement in an imperative program in a
formal way. For each possible program statement the corresponding triple defines a precondition
that must hold before executing the statement and a postcondition that is guaranteed to hold
afterwards. This was intended to construct manual proofs of a sequential program, which is,
however, only feasible for trivially small programs.
Verification was motivated by the need to reason systematically over concurrent programs.
Concurrent programs can have hard to identify bugs, such as race conditions, which makes it
desirable to reason about them systematically. A proof system for shared-variable concurrency
was developed in [Owicki and Gries, 1976] that can show non-interference of parallel threads
with respect to the correctness proof of the other threads.
Correctness specifications for concurrent programs. Up to that point all correctness prop-
erties of a program were expressed as propositional formulæ, which were proven to be invariants,
and termination conditions, which were proven using ranking functions. [Pnueli, 1977] and
later also [Lamport, 1980] proposed the use of temporal logic to express correctness conditions
of reactive systems. A reactive system runs indefinitely, accepting input and producing output.
Temporal logic is suitable to reason over reactive systems because it allows one to express condi-
tions that must be met in future states (as seen from the state currently under observation). This
shift was motivated by concurrent programs, for which a new class of properties was explored:
liveness properties. Liveness captures the notion that all threads of a program should be able
11
to make progress eventually. If we consider critical sections for example, the safety property is
that no two threads may be inside the critical section at the same time. The liveness property
is that no thread may wait forever to enter the critical section (starvation). There exist several
variants of temporal logic, such as LTL (linear temporal logic), CTL (computational tree logic)
and CTL* (a combination of both).
Model checking
Owicki-Gries style proofs can be constructed by hand or automatically using a verification
condition generator and a theorem prover. In practice it turns out to be difficult to automatically
generate verification conditions for programs with loops or recursion.
[Clarke and Emerson, 1982] as well as [Queille and Sifakis, 1982] independently discovered
model checking, a procedure that can mechanically determine if a program with finitely many
states meets its temporal logic specification. They expressed the state space of a program as
a Kripke structure, named after its author [Kripke, 1963]. A Kripke structure is basically a
finite automaton where every state has a successor and a label is assigned to every state. This
process is called model checking because it checks if the Kripke structure is a model for the
temporal formula. If the check fails a counterexample is produced that we use for our synthesis.
The strength of the procedure is that it naturally lends itself to the verification of concurrent
programs. In sequential programs the only source of non-determinism is the input, making
systematic exploration of all branches a feasible option. Concurrent programs have an additional
source of non-determinism: The possible interleavings of concurrent processes. This space
cannot be explored easily by testing, making automated verification the preferred option. Both
sources of non-determinism lead to a so-called state explosion, meaning that the Kripke structure
is exponential in the size of the original program. This model checking approach was later
implemented in the EMC Model Checker [Clarke et al., 1983; Clarke et al., 1986].
State explosion due to inputs. The fact that the state explosion makes it infeasible to construct
a Kripke structure for complex programs gave birth to the idea of symbolic model checking.
In symbolic model checking a set of states is defined by a formula that represents the states.
Symbolic execution of a statement results in a new formula that represents the states the system
can be in after execution. This technique was pioneered by [McMillan, 1993], who implement it
in the SMV model checker. An important precondition for symbolic model checking is a suitable
12
representation of formulæ. [Bryant, 1986] discovered Binary Decision Diagrams (BDDs) that
represent Boolean formulæ canonically and allow for inexpensive logical operations needed in
verification.
Symbolic model checking represents the set of states exactly which means that the formulæ
potentially have to track all program variables and the BDDs can grow exponentially. [Cousot
and Cousot, 1977] introduced the concept of abstract interpretation that can be used to abstract
states and reason about transitions on abstract states. The abstract state space is smaller, however
transitions are sound because they overapproximates the set of concrete states reachable. [Graf
and Saidi, 1997] implement this idea for a small process calculus and use the theorem prover
PVS to perform operations on the abstract domain.
While abstraction preserves soundness, it is incomplete: overestimating the reachable state
may lead to the erroneous conclusion that an error state is reachable. While Graf and Saidi
recognised this problem, they suggested that the algorithm is rerun with a different set of
predicates, leaving it to the user to refine the abstraction. [Clarke et al., 2000] and [Ball et al.,
2001] introduced automated abstraction refinement, where once the spurious counter-example
is discovered the abstraction is refined to eliminate the counter-example. In [Henzinger et al.,
2002] abstraction refinement was further improved to lazy abstraction. In lazy abstraction not the
entire program is reverified after an abstraction refinement, but only those parts that are affected
by the new predicates. This approach was implement in the BLAST model checker.
State explosion due to concurrency. In [Lipton, 1975] the concept of reduction was intro-
duced which can greatly simplify program proofs. Proving that certain parts of the program
behave the same whether they are interrupted or not allows us to reason about them as if they were
atomic sections. We use his notion of left- and right-movers in our work to test if a statements
interferes with another statement.
A number of techniques were developed to tackle the state explosion problem that occurs
due to concurrency. One is assume-guarantee reasoning [Jones, 1983; Henzinger et al., 2000],
which allows a modular approach to proving concurrent programs. This is essentially a divide
and conquer strategy where every concurrent process can be analysed separately assuming non-
interference from the other processes. Another approach is partial-order reductions [Godefroid
et al., 1996] that define an equivalence class of interleavings and for each class only one
representative interleaving is explored.
13
In this thesis we use bounded model checking (BMC), where model checking is bounded
by some measure. BMC is unsound because it cannot guarantee the absence of bugs beyond
the bound, but it is still able to finding difficult bugs. [Biere et al., 2003] restrict the length
of the paths checked, which is increased whenever no error is found (up to a certain point).
Later [Qadeer and Rehof, 2005] proved BMC bounded by the number of context-switches to be
decidable for Boolean programs. Despite being unsound in general, empirically it was shown that
if there is no bug up to a sufficiently high bound the program is most likely bug-free [Musuvathi
and Qadeer, 2007]. We use a continuation of this work, that is implemented in the model checker
CBMC [Clarke et al., 2004]. CBMC uses SAT instead of BBDs to reason over states.
[Elmas et al., 2009] deal with the state explosion using a combination of abstraction and
reduction. These two are applied alternatingly to simplify the program. The abstraction overes-
timates the program’s behaviour while the reduction collapses several statements into a single
atomic block, thereby reducing the number of possible interleavings.
Bug summarisation
Verification techniques typically provide the user with a counterexample trace if the program
does not match the specification. Counterexamples of concurrent programs can be difficult to
understand as they typically contain a lot of spurious context-switches unrelated to the bug. Bug
summarisation techniques aim to present the user with easy to understand information, such
as classifying the bug (race condition, atomicity violation, two-stage access bugs, etc.) and
highlighting problematic context-switches or code sections that lead to the error.
Though this thesis focuses primarily on synthesis we also developed a bug summarisation
technique based on our trace generalisation technique from Chapter 5. Our tool is able to
present to the user a minimal set of required ordering constraints that will trigger the bug and
a bug classification. There is a number of prior work in fault localisation, error explanation,
counterexample minimisation and bug summarisation for concurrent programs.
In [Kashyap and Garg, 2008], the authors focus on shortening counterexamples in message-
passing programs to a set of “crucial events” that are both necessary and sufficient to reach the
bug. In [Jalbert and Sen, 2010], the authors introduce a heuristic to simplify concurrent error
traces by reducing the number of context-switches. There are several papers that survey and
classify common concurrency bug patterns [Farchi et al., 2003; Lu et al., 2008]. We can extend
our bug inference rules using the bug patterns from the papers. Finally, there is a large body of
14
work on automatic detection of specific bugs such as data races and atomicity violations [Savage
et al., 1997; Engler and Ashcraft, 2003; Wang et al., 2010; Said et al., 2011].
Synthesis
Synthesis aims to produce code from a specification and has been studied for decades, but
remains a hard problem today.
Functional synthesis uses an input-output relation as a specification and was pioneered in
early work by [Green, 1969] and [Waldinger and Lee, 1969]. These approaches generally cannot
synthesise loops in the programs. A more recent approach by [Solar-Lezama et al., 2006] is
less ambitious and assumes the programmer provides the general program structure in addition
to the specification, so that only straight-line parts of the program are synthesised. This partial
synthesis proved to be more tractable in practice compared to the full program synthesis.
Reactive systems require a different specification because they do not terminate. The
synthesis question in the automata-theoretic framework was first asked in [Church, 1962] and
solved by [Rabin, 1972]. [Manna and Wolper, 1984] used temporal logics as a specification
and synthesised a communicating system. Finally, [Pnueli and Rosner, 1989] and [Ramadge
and Wonham, 1987] synthesised a reactive module from a temporal formula. They model the
synthesis as a game where player 1 tries to satisfy the specification and player 2 models the
environment and tries to violate the specification. The synthesis corresponds to finding a winning
strategy for player 1. Assume-guarantee synthesis [Chatterjee and Henzinger, 2007] synthesises
concurrent processes one-by-one by assuming the other process will work according to their
specification.
Program repair. Our synthesis is a form of program repair. Program repair takes as input a
program that almost conforms to the specification and minimally modifies the program to ensure
it is correct. The works [Jobstmann et al., 2005; Griesmayer et al., 2006; Samanta et al., 2008]
introduce program repair for sequential Boolean programs.
We target concurrent programs that lack appropriate synchronisation, which our synthesis
inserts. A large body of work deals with the repair of concurrent programs by inserting atomic
sections and fences [Clarke and Emerson, 1982; Vechev et al., 2009; Vechev et al., 2010a].
The approaches in [Clarke and Emerson, 1982; Vechev et al., 2009] were based on inferring
synchronisation by constructing and exploring the entire product graph or tableaux corresponding
15
to a concurrent program. In [Vechev et al., 2009] the authors discuss a trade-off between
permissiveness and synchronisation cost. The synthesis, however, works by adding guards to
statements and the cost is the number of shared accesses in the guards. This is conceptionally
very different from the standard synchronisation primitives we introduce into the code. The
approach in [Vechev et al., 2010a] combines synthesis with abstraction refinement. If a program
violates the specification it may either be due to the abstraction not being precise enough or due
to an actual bug. Then either the abstraction is refined or an atomic section is inserted. Similar
to our method in Chapter 6 an atomicity constraint formula is derived for each faulty trace that
eliminates the context switches. The placement of atomic sections is delayed until no more
faulty traces are found (non-greedy approach). What is missing compared to our approach is
a trace generalisation step, that would let their synthesis loop terminate more quickly. We do
not implement abstraction refinement as it is part of the off-the-shelve verification tool we use
(except in Chapter 6). Further, our approach is able to synthesise real-world synchronisation
primitives, such as locks, barriers and wait-signals, as well as statement reordering. Using the
lock placement technique in [Cherem et al., 2008] atomic sections can be transformed into locks.
Nevertheless, that approach will lead to less fine-grained locking because in the lock placement
step the original specification that lead to the atomic sections being discovered is not longer taken
into account. In [Deshmukh et al., 2010], the authors present a technique that can directly infer
locks to ensure linearisability of concurrent libraries. The main difference from this previous
work is the added counterexample generalisation step that generalises counterexample traces to
formulæ of ordering constraints and eliminate them by inserting appropriate synchronisation. In
[Zhang and Wang, 2014] the authors devise a system that prevents harmful traces at runtime. In
contrast in this work we propose to add common synchronisation constructs to the source code
before compile time. A very recent program repair paper is [Khoshnood et al., 2015], which is
very similar to the technique we present in Chapter 5. Our technique was published earlier in
[Gupta et al., 2015]. All these techniques require an explicit user-provided specification in the
form of assertions or post-conditions. It is very hard for the user to ensure all race conditions
are discoverable by assertions. Our technique in Chapter 6 uses an implicit specification that
guarantees that the synthesised program behaves as if it were executed non-preemptively.
Our program repair algorithm considers the reordering of independent statements as an
alternative to placing wait-signal statements. Statement ordering has been studied before in [Solar-
Lezama et al., 2008] and [Vechev and Yahav, 2008], where the input is a partial order of a set
16
of statements and the output of the synthesis is a total order of the statements. Our synthesis in
Chapter 3 accepts as input a buggy program, and we reorder statements to remove the bug (while
preserving the sequential semantics).
As an alternative to the above techniques [Jin et al., 2012] developed CFIX. It uses predeter-
mined patterns to detect concurrency bugs and a fixing strategy for each pattern of bug. CFIX
scales very well and was shown to work on real code bases, but it is not sound in that it cannot
find all concurrency bugs.
Trace generalisation
All our approaches use generalisation of counterexample traces to guide the synthesis. This
novel approach allows us to focus on traces that can be obtained easily from an off-the-shelve
verification tool and the generalisation helps us find the most general bug fix.
In verification, concurrent trace generalisation was used in [Sinha and Wang, 2010; Sinha
and Wang, 2011]; and in [Alglave et al., 2013] for detecting errors due to weak memory models.
In these works the bounded verification problem is encoded in SMT by encoding the data- and
control-flow. Generalisations of good traces was previously used in [Farzan et al., 2013b] to
create an inductive data-flow graph (iDFG) to represent a proof of program correctness. They do
not attempt to use iDFGs in synthesis.
In [Weeratunge et al., 2011] the authors develop a technique that is based on running the
program in a profile phase to gather positive behaviours (traces) and use them as a specification
for the synthesis. This has the advantage that the approach is much more scalable as positive
traces are cheap to find. The disadvantage is that it may overestimate the required atomicity
because some positive traces were not observed during the profiling phase.
Representations of trace sets. The encoding of trace sets used in Chapter 5 was introduced
in [Wang et al., 2009], and subsequently generalised in [Kahlon and Wang, 2010; Sinha and
Wang, 2011]. We derive from this encoding a succinct, explicit representation of traces using
formulæ over happens-before constraints. In other work, interference scenarios have been
proposed in [Farzan et al., 2013a] to represent concurrent executions that are behaviourally
equivalent under the same input values. For sequential programs, the authors in [Basu et al.,
2003] represent all counterexamples of recursive programs using pushdown automata.
17
Implicit Specification
The concepts of sequential consistency, linearisablity and serialisability have been used as
implicit specifications. Sequential consistency and linearisablity are properties of a concurrent
datastructure that has a number of methods called from various threads. Sequential consistency
[Lamport, 1979] requires that every concurrent execution of the threads is equivalent to a
sequential interleaving of the method calls. Linearisability [Herlihy and Wing, 1990] is a
stronger constraint that also requires that the relative order of calls between threads is preserved.
Serialisability [Eswaran et al., 1976; Papadimitriou, 1986] is a term that originated in the
database world and describes a property of modern database systems that guarantee that for
every concurrent execution of a set of transactions there exists a sequential execution of those
transactions that results in the same state of the database.
There has been a body of work on using a non-preemptive (cooperative) scheduler as an
implicit specification. The notion of cooperability was introduced in [Yi and Flanagan, 2010].
They require the user to annotate the program with yield statements to indicate thread interference.
Then their system verifies that the yield specification is complete meaning that every trace is
cooperable. A preemptive trace is cooperable if it is equivalent to a trace under the cooperative
scheduler. While we abstract the program to variable reads and writes, [Yi and Flanagan, 2010]
uses Lipton’s notion of left- and right-mover to define equivalence between a preemptive and a
cooperating trace.
A recent paper [Bloem et al., 2014] uses implicit specifications for synchronisation synthesis.
While their specification is given by sequential behaviours, ours is given by non-preemptive
behaviours. This makes our approach applicable to scenarios where threads need to communicate
explicitly. Further, correctness in [Bloem et al., 2014] is determined by comparing values at
the end of the execution. In contrast, we compare sequences of events, which serves as a more




Formal Framework and Problem
Statement
We present the syntax and semantics of a concrete concurrent while languageW . WhileW (and
our tools) permits non-recursive function call and return statements, we skip these constructs in
the formalisation below. We conclude the section by formalising our notion of correctness for
concrete concurrent programs.
2.1 Concurrent Programs
In our work, we assume a read or a write to a single shared variable executes atomically and
further assume a sequentially consistent memory model.
2.1.1 Syntax ofW (Figure 2.1)
A concurrent program is a finite collection of threads 〈T1, . . . ,Tn〉 where each thread is a
statement written in the syntax ofW . Variables inW can be categorised into
• shared variables ShVar i,
• thread-local variables LoVar i,
• lock variables LkVar i,
• condition variables CondVar i for wait-signal statements,
• guard variables GrdVar i for assumptions and
• the special variable atomic_level.
20
The LkVar i, CondVar i and GrdVar i variables are also shared between all threads. All variables
range over integers, except for guard variables that range over Booleans {false, true}. The
atomic_level variable keeps track how deeply nested the execution is inside atomic sections.
It is not a shared or local variable. It is increased by the semantics whenever an atomic section
starts and decreased when an atomic section ends. Atomic sections can be nested in the code.
Each statement is labelled with a unique location identifier `; we denote by stmt(`) the statement
labelled by `.
We use SV = {ShVar i|i ∈ N} to denote the set of shared variables, each LVi is the set of
local variables of thread Ti, and V = SV ∪
⋃
i LVi is the set of all variables. Let Vi = SV ∪LVi
denote the set of variables that can be read from and written by thread Ti.
The languageW includes standard sequential constructs, such as assignments, loops, condi-
tionals, and goto statements. Additional statements control the interaction between threads, such
as lock, wait/signal, and yield statements. InW , we only permit expressions that read from at
most one shared variable and assignments that either read from or write to exactly one shared
variable1.
The languageW supports both implicit and explicit specification. The two are not meant to
be used together. The assert statement is used to provide an explicit specification.
To allow an implicit specification the languageW has two statements that allow commu-
nication with an external system: input(ch) reads from and output(ch, ShExp) writes to a
communication channel ch. The channel is an interface between the program and an external
system. The external system cannot observe the internal state of the program and only observes
the information flow on the channel. The implicit specification requires that the interaction
observable to the external system is identical between the preemptive and non-preemptive ex-
ecution of the program. In practice, we use the channels to model device registers. A device
register is a special memory address, reading and writing from and to it is visible to the device.
This is used to exchange information with a device. In our presentation, we assume all channels
communicate with the same external system.
1An expression/assignment statement that involves reading from/writing to multiple shared variables can always
be rewritten into a sequence of atomic read/atomic write statements using local variables. For example the statement
x := x+ 1, where x is a global variable can be translated to l = x; x = l+ 1, where l is a fresh local variable.
21
Figure 2.1 Syntax ofW
LbStmt ::= Labelled Statement
` : stmt Statement annotated with a location
LbStmt1;LbStmt2 Sequence of statements
stmt ::= Statement
skip marks the end of the thread
ShVar := LoExp Assignment to shared variable
LoVar := ShExp Assignment to local variable
ShVar := havoc Assign non-deterministic value
ShVar := input(ch) Read a value from channel ch
output(ch,ShExp) Write value of ShExp to channel ch
if (ShExp) then LbStmt1 else LbStmt2 conditional
while (ShExp) LbStmt while loop
assert(ShExp) Assert ShExp evaluates to 6= 0
await(ShExp) Busy wait for ShExp to become 6= 0
lock(LkVar) Locks the mutex lock
unlock(LkVar) Unlocks the mutex lock
wait(CondVar) Waits for CondVar to be signalled
wait_not(CondVar) Waits for CondVar to be reset
signal(CondVar) Notifies condition variable
reset(CondVar) Resets condition variable
wait_reset(CondVar) Waits and resets in an atomic operation
assume(GrdVar) Assume guard to be true
assume_not(GrdVar) Assume guard to be false
GrdVar ← GrdExpr Assigns GrdVar the result of GrdExpr
yield Allow current thread to be descheduled
goto(`) Set the next statement to `
atomic_start/atomic_end Start/End an atomic section
LoExp ::= Local-variable expression
c Integer constant
LoVar Thread-local variable
op(LoExp1, . . . ,LoExpn) Operator application
ShExp ::= Shared-variable expression
LoExp Local-variable expression
ShVar Shared variable
op(ShVar ,LoExp1, . . . ,LoExpn) Operator application with shared variable
GrdExpr ::= Expression over guard variables
true/false Boolean constant
GrdVar Guard variable
boolop(GrdExpr1, . . . ,GrdExprn) Boolean operation
22
2.1.2 Semantics ofW
We first define the semantics of a single thread inW , and then extend the definition to concurrent
non-preemptive and preemptive semantics.
Single-thread semantics (Figure 2.2). Let us fix a thread identifier tid . We use tid inter-
changeably with the program it represents. A state of a single thread is given by 〈V , `〉 where
V is a valuation of all program variables, and ` is a location identifier, indicating the statement
in tid to be executed next. A thread is guaranteed not to read or write thread-local variables of
other threads.
We define the flow graph Gtid for thread tid in a manner similar to the control-flow graph of
tid . Every node of Gtid represents a single statement (basic blocks are not merged) and the node
is labelled with the location identifier ` of the statement. The flow graph Gtid has a unique entry
node and a unique exit node. These two may coincide if the thread has no statements The entry
node is the first labelled statement in tid ; we denote its location identifier by firsttid . The exit
node is a special node corresponding to a hypothetical statement lasttid : skip placed at the end
of tid .
We define successors of events of tid using Gtid . The event last has no successors. We define
succ(`) = `′ if node ` : stmt in Gtid has exactly one outgoing edge to node `′ : stmt ′. Nodes
representing conditionals and loops have two outgoing edges. We define succ1(`) = `1 and
succ2(`) = `2 if node ` : stmt in Gtid has exactly two outgoing edges to nodes `1 : stmt1 and
`2 : stmt2. Here succ1 represents the then or the loop branch, whereas succ2 represents the else
or the loopexit branch.
We can now define the single-thread operational semantics. A single execution step 〈V , `〉 α→
〈V ′, `′〉 changes the program state from 〈V , `〉 to 〈V ′, `′〉, while optionally outputting an observ-
able symbol α. The absence of a symbol is denoted using ε. The outputs are used only for the
implicit specification. In the following, e represents an expression and e[v/V [v]] evaluates an
expression by replacing all variables v with their values in V . We use V [v := k] to denote that
variable v is set to k and all other variables in V remain unchanged.
In Figure 2.2, we present the rules for single execution steps. Each step is atomic, no
interference can occur while the expressions in the premise are being evaluated. Note that every
expression may reference only one shared variable and all other variables must be thread-local.
23
We use integer expressions for conditional and loop statements using the standard C interpretation
of 0 meaning false and all other integers meaning true. The only rules with an observable
output are:
1. HAVOC: Statement ` : ShVar := havoc assigns shared variable ShVar a non-deterministic
value (say k) and outputs the observable (tid , havoc, k, ShVar).
2. INPUT, OUTPUT: ` : ShVar := input(ch) and ` : output(ch, ShExp) read and write
values to the channel ch, and output (tid , in, k, ch) and (tid , out, k, ch), where k is the
value read or written, respectively.
Intuitively, the observables record the sequence of non-deterministic guesses, as well as the in-
put/output interaction with the tagged channels. The semantics of the synchronisation statements
shown in Figure 2.2 is standard. The lock and unlock statements do not count and do not allow
double (un)locking. There are no rules for goto and the sequence statement because they are
already taken care of by the flow graph.
The await(ShExp) statement is a busy-wait, short for while (ShExp) {}. The atomic start and
end statements atomically increment and decrement the atomic_level variable. This allows
for nested atomic sections to be handled correctly. A succeeding assertion (where the expression
evaluates to true) does not change the state except to advance the location identifier.
Concurrent semantics. A state of a concurrent program is given by 〈V , ctid, (`1, . . . , `n)〉
where V is a valuation of all program variables, ctid is the thread identifier of the currently
executing thread and `1, . . . , `n are the location identifiers of the statements to be executed next
in threads T1 to Tn, respectively. There are three additional states: 〈terminated〉 indicates
the program has finished, 〈failed〉 indicates an assertion failed and 〈invalid〉 indicates an
assumption failed. Initially, all integer program variables and ctid equal 0, all guard variables
equal false and for each i ∈ [1, n] : `i = firsti. We introduce a non-preemptive and a
preemptive semantics. The former is used as an implicit specification of allowed executions,
whereas the latter models concurrent sequentially consistent executions of the program.
Non-preemptive semantics (Figure 2.3). The non-preemptive semantics ensures that a single
thread from the program keeps executing using the single-thread semantics (Rule SEQ) until
one of the following occurs: (a) the thread finishes execution (Rule THREAD_END) or (b) it
encounters a yield, lock, wait or wait_not statement (Rule NSWITCH). In these cases, a context-
switch is possible, however, the new thread must not be blocked. We consider a thread blocked if
24
Figure 2.2 Single-thread semantics ofW
stmt(`) = ShVar := LoExp LoExp[v/V[v]] = k
〈V, `〉 ε→ 〈V[ShVar := k], succ(`)〉
ASSIGNMENT
stmt(`) = ShVar := havoc k ∈ Z
〈V, `〉 (tid ,havoc,k,ShVar)−−−−−−−−−−−−→ 〈V[ShVar := k], succ(`)〉
HAVOC
stmt(`) = ShVar := input(ch) k ∈ Z
〈V, `〉 (tid ,in,k,ch)−−−−−−−→ 〈V[ShVar := k], succ(`)〉
INPUT
stmt(`) = output(ch,ShExp) ShExp[v/V[v]] = k
〈V, `〉 (tid ,out,k,ch)−−−−−−−−→ 〈V, succ(`)〉
OUTPUT
stmt(`) = if (ShExp) then LbStmt1 else LbStmt2 ShExp[v/V[v]] 6= 0
〈V, `〉 ε→ 〈V, succ1(`)〉
IF1
stmt(`) = if (ShExp) then LbStmt1 else LbStmt2 ShExp[v/V[v]] = 0
〈V, `〉 ε→ 〈V, succ2(`)〉
IF2
stmt(`) = while (ShExp) LbStmt ShExp[v/V[v]] 6= 0
〈V, `〉 ε→ 〈V, succ1(`)〉
WHILE1
stmt(`) = while (ShExp) LbStmt ShExp[v/V[v]] = 0
〈V, `〉 ε→ 〈V, succ2(`)〉
WHILE2
stmt(`) = lock(LkVar) V[LkVar ] = 0
〈V, `〉 ε→ 〈V[LkVar := tid ], succ(`)〉
LOCK
stmt(`) = unlock(LkVar) V[LkVar ] = tid
〈V, `〉 ε→ 〈V[LkVar := 0], succ(`)〉
UNLOCK
stmt(`) = wait(CondVar)/wait_not(CondVar) V[CondVar ] = 1/0
〈V, `〉 ε→ 〈V, succ(`)〉
WAIT/WAIT_NOT
stmt(`) = wait_reset(CondVar) V[CondVar ] = 1
〈V, `〉 ε→ 〈V[CondVar := 0], succ(`)〉
WAIT_RESET
stmt(`) = signal(CondVar)/reset(CondVar)
〈V, `〉 ε→ 〈V[CondVar := 1/0], succ(`)〉
SIGNAL/RESET
stmt(`) = assume(GrdVar)/assume_not(GrdVar)
V[GrdVar ] = true/false
〈V, `〉 ε→ 〈V, succ(`)〉
ASSUME/ASSUME_NOT
stmt(`) = assert(ShExp) ShExp[v/V[v]] 6= 0
〈V, `〉 ε→ 〈V, succ(`)〉
ASSERTION SUCCEEDING
stmt(`) = GrdVar ← GrdExpr GrdExpr [v/V[v]] = k k ∈ {true, false}
〈V, `〉 ε→ 〈V[GrdVar := k], succ(`)〉
SET GUARD
25
Figure 2.2 Single-thread semantics ofW (continued)
stmt(`) = await(ShExp) ShExp[v/V[v]] 6= 0 `′ = succ(`)
〈V, `〉 ε→ 〈V, `′〉
AWAIT1
stmt(`) = await(ShExp) ShExp[v/V[v]] = 0
〈V, `〉 ε→ 〈V, `〉
AWAIT2
stmt(`) = atomic_start V[atomic_level] = k `′ = succ(`)
〈V, `〉 ε→ 〈V[atomic_level := k + 1], `′〉
ATOMIC START
stmt(`) = atomic_end V[atomic_level] = k k > 0 `′ = succ(`)
〈V, `〉 ε→ 〈V[atomic_level := k − 1], `′〉
ATOMIC END
its current statement is to acquire an unavailable lock, to wait for a condition that is not signalled
or the thread reached the last event. We call yield, lock, wait and wait_not statements preemption
points. Note the difference between wait/wait_not and assume/assume_not. The former allow
for a context-switch while the latter transitions to the 〈invalid〉 state if the assume is not fulfilled
(rule ASSUME/ASSUME_NOT). A special rule exists for termination (rule TERMINATE), which
requires that all threads finished execution, all locks are unlocked and also that we are outside
any atomic section. If an assertion fails the execution transitions to the 〈failed〉 state (rule
ASSERTION FAILURE). The await statement causes a livelock in the non-preemptive semantics
if its condition is not fulfilled.
Preemptive semantics (Figure 2.3, Figure 2.4). The preemptive semantics of a program is
obtained from the non-preemptive semantics by relaxing the condition on context-switches, and
allowing context-switches at all program points. In particular, the preemptive semantics consist
of the rules of the non-preemptive semantics and the single rule PSWITCH in Figure 2.4. A
preemptive context-switch is only possible when outside an atomic section.
2.2 Executions and Traces
Let W denote the set of all concurrent programs inW .
Executions. A non-preemptive/preemptive execution of a concurrent program C in W is an
alternating sequence of program states and (possibly empty) observable symbols, S0`1S1 . . . `kSk,
such that
26
Figure 2.3 Non-preemptive semantics
ctid = i 〈V, `i〉
α→ 〈V ′, `′i〉
〈V, ctid, (. . . , `i, . . .)〉
α→ 〈V ′, ctid, (. . . , `′i, . . .)〉
SEQ
ctid = i `i = lasti ctid
′ ∈ {1, . . . , n} ¬blocked(`ctid′ ,V)
〈V, ctid, (. . . , `i, . . .)〉
ε→ 〈V, ctid′, (. . . , `i, . . .)〉
THREAD_END
stmt(`i) = lock(lk)/wait(cv)/wait_not(cv)/wait_reset(cv)/yield
ctid = i ctid′ ∈ {1, . . . , n} ¬blocked(`ctid′ ,V)
〈V, ctid, (. . . , `i, . . .)〉
ε→ 〈V, ctid′, (. . . , `i, . . .)〉
NSWITCH
∀i. `i = lasti ∀j. V[lkj ] = 0 V[atomic_level] = 0
〈V, ctid, (`1, . . . , `n)〉
ε→ 〈terminated〉
TERMINATE
ctid = i stmt(`i) = assume(gv)/assume_not(gv) V[gv] = 0/1
〈V, ctid, (`1, . . . , `n)〉
ε→ 〈invalid〉
ASSUME/ASSUME_NOT
ctid = i stmt(`i) = assert(ShExp) ShExp[v/V[v]] = 0
〈V, ctid, (`1, . . . , `n)〉
ε→ 〈failed〉
ASSERTION FAILURE
blocked(`,V) = (stmt(`) = lock(LkVar) ∧ V[LkVar ] 6= 0)
∨ (stmt(`) = wait(CondVar) ∧ V[CondVar ] = 0)
∨ (stmt(`) = wait_not(CondVar) ∧ V[CondVar ] = 1)
∨ (stmt(`) = wait_reset(CondVar) ∧ V[CondVar ] = 0)
∨ (∃i : ` = lasti))
Figure 2.4 Additional rule for preemptive semantics
ctid′ ∈ {1, . . . , n} ¬blocked(`ctid′ ,V) V [atomic_level] = 0
〈V , ctid, (`1, . . . , `n)〉
ε→ 〈V , ctid′, (`1, . . . , `n)〉
PSWITCH
27
(a) S0 is the initial state of C,
(b) ∀j ∈ [0, k − 1]. Sj = 〈Vj, ctidj, (`1j , . . . , `nj )〉,
(c) ∀j ∈ [0, k − 1], according to the non-preemptive/preemptive semantics of W , we have
`ctid = `j ∧ ∃α. Sj
αj+1−−→ Sj+1, and
(d) Sk is the state 〈terminated〉 or 〈failed〉.
Intuitively an execution must be valid w.r.t. the non-preemptive/preemptive semantics and
terminate in the end. An execution is good if it satisfies a given specification, and bad otherwise.
Traces. A trace is a sequence of location identifiers `1; . . . ; `n. We occasionally write tid0.`1;
. . . ; tidn.`n to highlight the thread of each statement. These two notations are equivalent as the
location identifiers are unique and each location identifier is associated with exactly one thread.
Alternatively, we also use → to separate location identifiers in a trace as in `1 → . . . → `n.
The language L(π) of a trace π = `1 . . . `n is the set of all executions S0`′1S1 . . . `′nSn+1 where
`i = `
′
i for i ∈ [1, n].
For any two events `i, `j ∈ locs(π), we say `i <π `j if `i occurs before `j in π. A trace π is
feasible if its language has at least one execution (i.e., L(π) 6= ∅), and is infeasible otherwise. A
feasible trace π is good if all executions in L(π) are good, and is bad otherwise.
We introduce the following functions that help us reason about traces and events:
Function Explanation
stmt(`) Returns the statement for event `
tid(`) Returns the thread identifier for `
locs(tid) Returns a set of location identifiers of all statements in thread tid
locs(π) Returns a set of events of trace π
If an event repeats in a trace due to loops it is tagged with its repetition number as in the
following example:
while(y) {
`1 : y := y− 1
}
`2 : y := 2




1; `2 repeats event `1 three times.
28
2.2.1 Trace Neighbourhoods
We define equivalence classes over traces, called neighbourhoods. Since the trace is the result of a
program run, loops are unrolled a fixed number of times. This makes the relation < well-defined
for loops, as the events in the loop are tagged with their repetition number. The neighbourhood
Nπ of a trace π is a set of traces Nπ = {σ | locs(σ) = locs(π) ∧ ∀`i, `j ∈ locs(π). tid(`i) =
tid(`j) ∧ `i<π`j ⇒ `i<σ`j}. Intuitively, Nπ contains all traces having the same events as π and
having the same order of events within each thread. A trace in Nπ may be infeasible, good, or
bad. We denote the subsets of good and bad traces in Nπ by N gπ and N bπ , respectively. We call
N bπ and N gπ the bad and good neighbourhoods of π. The languages L(Nπ), L(N gπ ) and L(N bπ)
are the unions of the languages of all traces in Nπ, N gπ and N bπ , respectively. We say `i N `j if
`i <π `j for all traces π in N. We use `i  `j if the neighbourhood is clear from the context.
Note thatNπ corresponds to a partial order (locs(π),v), with `i v `j iff `i<π`j and tid(`i) =
tid(`j). However, N gπ and N bπ do not, in general, correspond to a partial order.
Representation of concurrent trace sets. There are multiple ways to represent trace sets.
Some representations may be more expressive or useful for reasoning about concurrent programs
than others. A candidate representation that has been used for certain trace sets is a partial order
over events [Wang et al., 2009]. The neighbourhood of a trace, as defined above, can also be
represented as a partial order. However, the good neighbourhood or the bad neighbourhood of a
trace is, in general, not a partial order. In our work, we represent trace sets as HB-formulæ. An
HB-formula is a Boolean combination of happens-before causality constraints (`i < `j) between
events. HB-formulæ can represent arbitrary finite sets of finite traces, and in particular, good
and bad neighbourhoods. As we will see later, HB-formulæ are not only expressive, but also
versatile enough to be usable for diverse objectives.
We represent the sequence of events from thread T with location identifiers between ` and `′
(inclusive) by T[` : `′]. We also use the symbol L to denote location identifier ranges such as
` : `′.
Partial-order HB-formulæ and partial-order neighbourhoods. For our earlier techniques
we need to restrict HB-formulæ to those representing partial orders. We call an HB-formula a
partial-order HB-formula if it does not contain disjunctions (∨) or negations (¬). A partial-order
neighbourhood is a neighbourhood that can be represented using a partial-order HB-formula.
29
For a partial-order neighbourhood Nπ, we use Nπ \ hb(A,B) to denote the neighbourhood Nπ
with the constraint hb(A,B) removed.
2.2.2 Representing Trace Neighbourhoods
Representing subsets of trace neighbourhoods. We represent subsets of trace neighbour-
hoods using happens-before formulæ, or, HB-formulæ. An HB-formula ϕ for a trace π is either a:
(a) basic constraint of the form hb(`i, `j) where `i, `j ∈ locs(π); or (b) a Boolean combination
of HB-formulæ, i.e., one of ϕ1 ∧ ϕ2, ϕ1 ∨ ϕ2, or ¬ϕ1 where ϕ1 and ϕ2 are HB-formulæ.
The semantics [[ϕ]] of an HB-formula ϕ for a trace π is subset of Nπ, defined as follows:
(a) for a basic constraint hb(`i, `j), we have that [[hb(`i, `j)]] = {σ ∈ Nπ | `i <σ `j}; and (b) for
Boolean combinations, we have that [[ϕ1 ∧ ϕ2]] = [[ϕ1]] ∩ [[ϕ2]], [[ϕ1 ∨ ϕ2]] = [[ϕ1]] ∪ [[ϕ2]], and
[[¬ϕ1]] = Nπ \ [[ϕ1]], respectively.
In a slight abuse of notation we write `i < `j for hb(`i, `j).
Remark 2.2.1. Our HB-formulæ only represent constraints on scheduling. One could define
more expressive constraints which include constraints not just on scheduling, but also on variable
valuations in individual executions. However, our hypothesis is that happens-before constraints
on scheduling are sufficient to express many interesting properties of traces and executions. This
is also supported by empirical data that shows that most concurrency bugs are due to bad ordering
of statements in a trace rather than the interaction between schedules and variable valuations [Lu
et al., 2008].
2.3 Program Correctness and the Synthesis Problem
2.3.1 Explicit specification
We call an execution assertion-safe if the final state is not 〈failed〉. A program C is assertion-
safe if all executions are assertion-safe. Intuitively, a program is assertion-safe if no failing
assertion is reachable.
A trace π is called assertion-safe if all executions in L(π) are assertion-safe. Otherwise the
trace is called erroneous.
30
2.3.2 Implicit specification
Observable behaviours. Let π be an execution of program C in W, then we denote with
ω = obs(π) the sequence of non-empty observable symbols in π. We use [[C]]NP , resp. [[C]]P , to
denote the non-preemptive, resp. preemptive, observable behaviour of C, that is all sequences
obs(π) of all executions π under the non-preemptive, resp. preemptive, scheduling.
We specify correctness of concurrent programs in W using two implicit criteria, presented
below.
Preemption-safety. Observable behaviours ω1 and ω2 of a program C in W are equivalent
if: (a) the subsequences of ω1 and ω2 containing only symbols of the form (tid , in, k, t) and
(tid , out, k, t) are equal and (b) for each thread identifier tid , the subsequences of ω1 and
ω2 containing only symbols of the form (tid , havoc, k, x) are equal. Intuitively, observable
behaviours are equivalent if they have the same interaction with the interface, and the same
non-deterministic choices (havoc) in each thread. For sets O1 and O2 of observable behaviours,
we write O1 b O2 to denote that each sequence in O1 has an equivalent sequence in O2.
Given concurrent programs C and C ′ in W such that C ′ is obtained by adding locks to C, C ′ is
preemption-safe w.r.t. C if [[C ′]]P b [[C]]NP .
We call a program correct if it is preemption-safe in case of implicit specification or if it is
assertion-safe in case of explicit specification.
2.3.3 Deadlock-freedom
A state S of concurrent program C in W is a deadlock state under non-preemptive/preemptive
semantics if
(a) the repeated application of the rules of the non-preemptive/preemptive semantics from the
initial state S0 of C can lead to S,
(b) S 6= 〈terminated〉,
(c) S 6= 〈invalid〉,
(d) S 6= 〈failed〉 and
(e) ¬∃S ′: 〈S〉 α→ 〈S ′〉 according to the non-preemptive/preemptive semantics ofW .
Program C in W is deadlock-free under non-preemptive/preemptive semantics if no non-preemptive/
preemptive execution of C hits a deadlock state. In other words, every non-preemptive/preemptive
31
Figure 2.5 Basic flow of trace-based synthesis









execution of C ends in state 〈terminated〉, 〈failed〉 or 〈invalid〉. The 〈invalid〉 state indic-
ates an assumption did not hold, which we do not consider a deadlock. We say C is deadlock-free
if it is deadlock-free under both non-preemptive and preemptive semantics.
Deadlocks are typically caused by two threads acquiring the same locks in different order
or a wait statement with no concurrent thread signalling. Note that the await statement may
introduce a livelock (but not a deadlock), i.e. 〈S〉 ε→ 〈S〉 is possible forever while the program
makes no progress.
2.3.4 Synthesis Problem
In Chapters 3 to 6 we present solutions to various synchronisation synthesis problems, that are
all variants of the basic synchronisation synthesis problem: Given a concurrent program C in W,
the goal is to synthesise a new concurrent program C ′ in W such that:
(a) C ′ is obtained by adding synchronisation to C,
(b) C ′ is correct and
(c) C ′ contains no deadlocks not already present in C.
2.4 Trace-based Synthesis
Our synthesis algorithm is loosely based on counterexample guided inductive synthesis (CE-
GIS) [Solar-Lezama et al., 2006]. CEGIS works in a loop that iteratively proposes a candidate
32
solution and then checks if the solution matches the specification. If this is the case the synthesis
algorithm found a correct solution and terminates. If the solution does not match the specification
a counterexample is produced that is used to refine the solution for the next loop iteration.
We call our synthesis algorithm the trace-based synchronisation synthesis. The algorithms
presented in Chapters 3 to 6 are variations of this basic synthesis algorithm.
Figure 2.5 shows the basic components of our basic algorithm. The input is the program C
that uses either implicit or explicit specification. The algorithm works in a loop that iteratively
refines the program C ′ (initially C ′ = C). The loop condition is whether C ′ is correct according to
the specification. For programs with explicit specification we can use for example an off-the-shelf
model-checker to test this. If C ′ is correct this it is returned and the algorithm terminates. If C ′
is incorrect the check will produce a faulty (counterexample) trace that explains the reason for
incorrectness, for example a failed assertion is reached. The goal of the synthesis is to eliminate
the buggy trace from C. The first step to eliminate the bug is generalisation of the buggy trace.
The goal of this step is to capture the “essence” of the bug. The trace may for example contain a
large number of context switches, but only few are actually required to trigger the bug. The last
step is to eliminate the bug by modifying C ′ such that the buggy trace and a set of similar traces




Synthesis for an Explicit Specification
3.1 Problem Statement and Illustrative Examples
Our first technique focuses on a specific class of synchronisation bugs that can be fixed using
statement reordering, i.e., a rearrangement of program statements that changes the program’s
concurrent behaviour while preserving the sequential semantics of each thread. For example, a
pointer initialisation may be moved before a statement releasing another thread that dereferences
the pointer. We develop a technique for automating this type of transformation.
Our study of real-world concurrency bugs from device drivers shows that the most common
fix used for 28% of driver concurrency bugs is statement reordering and only 17% of bugs are
fixed using locks.
We also consider other semantics-preserving transformations inspired by practical bug-fixing
techniques. For example, the synthesis tool may repeat idempotent statements multiple times
(we give an example where duplicating an statement removes a concurrency bug). As a fall-back
option we place atomic sections to fix other concurrency bugs.
In this approach we generalise counterexample traces to partial-order neighbourhoods (see
Section 2.2.1). We first find a counterexample trace using an off-the-shelf tool, and then
generalise it to a partial-order neighbourhood. We achieve this generalisation by combining ideas
from Lipton reduction [Lipton, 1975] and error invariants [Ermis et al., 2012]. We relax the
ordering of a pair of statements in the trace if swapping these statements preserves error invariants
(and thus the bug can still be reached). Intuitively, the resulting partial-order neighbourhood
captures the ‘true cause’ of the bug. For instance, if the counterexample trace includes context
34
switches that are not necessary to reach the bug, these context switches will not be required by
the partial-order neighbourhood.
A key insight in our algorithm is that given a partial-order neighbourhood N of trace π, the
problem of eliminating N can be phrased as the problem of creating a minimal cycle in a graph
(representing the partial order neighbourhood) by adding new edges. A graph with a cycle does
not allow linearisation and hence a cycle corresponds to a set of transformations that together
eliminate N. The additional edges correspond to possible statement reordering or the insertion
of atomic sections. Each additional edge is labelled by a cost (for instance, the length of the
atomic section).
We implemented this techniques in a prototype tool called CONCURRENCYSWAPPER. As
specifications, we handle assertions, deadlocks, and generic conditions such as pointer use
before initialisation. However, our techniques apply to a larger class of reachability properties.
For finding buggy traces, we use the model checker Q [Q]. If Q produces a buggy trace,
CONCURRENCYSWAPPER generalises it to a partial-order neighbourhood, which it then tries to
eliminate first by statement reordering, and failing that, using an atomic section. Otherwise, the
current version of the driver is returned, with all the discovered bugs fixed.
We evaluated our tool on (a) five microbenchmarks that are simplified versions of bugs from
Linux device drivers, and (b) a simplified driver for the Realtek 8169 Ethernet controller. The
latter had 364 LOC, seven threads, and contained five bugs. In the experiments, we found that:
(a) bug finding and verification (in Q) dominates time spent generalising counterexamples, and
(b) using generalised counterexamples reduces the number of bug-finding iterations.
3.1.1 Illustrative Examples
Generalising buggy traces. In Figure 3.1a, thread1 and thread2 concurrently increment
x. The assertion states that x is 2 in the end. It fails in trace π ≡ A→ B→ 1→ C→ 2→ D→
3 → 4, where both threads read the initial x value 0, and then write back 1 to x. However, π
is just one trace exhibiting this bug. For example, swapping B and 1 in π gives another buggy
trace. Let π be the HB-formula
∧
`i,`j
hb(`i, `j) iff statement `i occurs before `j in π. We
relax π by removing all constraints hb(`i, `j) where stmt(`i); stmt(`j) has the same effect as
stmt(`j); stmt(`i). This gives us the HB-formula 1π (shown in Figure 3.1c). All traces where
the execution order respects 1π fail the assertion.
35
Figure 3.1 Illustrative examples
init : x := 0; t1 := 0
thread1 thread2
A : l1 := x 1 : l2 := x
B : l1 := l1+ 1 2 : l2 := l2+ 1
C : x := l1 3 : x := l2
D : t1 := 1 4 : assert(¬t1 ∨ x = 2)
(a) Concurrent increment
init : IntrMask := 0; ready := 0; handled := 0
init_thread intr_thread
M : IntrMask := 1 R : assume(IntrMask = 1)





































For C and 3, the sequence C; 3 is not equivalent to 3; C when l1 6= l2. However, in all traces
of 1π, it can be seen that l1 = l2 = 1, and further, this is sufficient to trigger the bug. These
sufficient conditions to trigger bugs are error invariants. Using this information, we can further
relax π to 1π shown in Figure 3.1d, where the only constraints are that both threads read
x before either writes to it, and that D occurs before 4. A main component of our synthesis
algorithm is the generalisation of buggy traces to determine their root cause.
Atomic sections. We attempt to eliminate the bug represented by π by adding atomic sections.
For example, adding an atomic section around 1, 2, and 3 in 1π gives us 1+π from Figure 3.1e,
where the atomic section is collapsed into a single node. Note that 1+π represents an empty
neighbourhood, as there is a cycle of nodes [1; 2; 3] and C. Intuitively, the cycle implies that
[1; 2; 3] happens both before and after C, which is impossible. Hence, adding an atomic section
around [1; 2; 3] eliminates all traces represented by 1π from the program. The atomic section
[1; 2; 3] does not eliminate the buggy trace A → [1; 2; 3] → B → C → D → 4. Analysing this
trace similarly, we find that another atomic section [A; B; C] is needed to obtain a correct program.
The number of bug fixing iterations can be reduced using error invariants [Ermis et al., 2012].
36
For example, in 2π, the atomic section [1; 2; 3] is not sufficient to create a cycle; instead, we
immediately see that both [1; 2; 3] and [A; B; C] are needed. The error invariant l1 = l2 allows
the partial order to be generalised further, removing the edge between C and 3 (Figure 3.1d).
This shows that error invariants help the synthesis by allowing more general partial orders to be
discovered.
Instruction reordering. The example in Figure 3.1b is inspired by a real bug from a Linux
device driver. Thread intr_thread runs when interrupts are enabled, i.e., IntrMask is 1,
and attempts to handle them; it fails if the driver is not ready. The init_thread enables
interrupts and readies the driver.
The bug is that interrupts are enabled before the driver is ready, for example, in trace
θ ≡ M→ R→ S→ N→ T. Note that statements M and N are independent, i.e., M; N is equivalent
to N; M. We construct an HB-formula from θ as before, but remove the constraint M θ N, resulting
in Figure 3.1f (excluding the dashed edge). Adding the edge N→ M creates a cycle and eliminates
the bug. This edge changes the order of M and N, forcing the order N; M. This results in a correct
program with the driver ready to handle interrupts before they are enabled.
Following the ideas presented in this section, our synthesis algorithm works by generalising
counterexample traces to partial-order neighbourhoods and eliminating them using atomic section
insertion or statement reordering.
3.2 Semantics-preserving Transformations
We consider two kinds of transformations for fixing bugs:
• A reordering transformation θ = `1 ! `2 transforms program C to C ′ if location `1
immediately precedes `2 in C and `2 immediately precedes `1 in C ′ with everything else
unchanged. We only consider cases where the sequential semantics are preserved, i.e., if
(a) `1 and l2 are from the same basic block; and (b) `1; l2 is equivalent to `2; `1.
• An atomic section transformation θ = [`1; `2] transforms C to C ′ if neighbouring locations
`1 and `2 are in an atomic section in C ′, but not in C.
We write C θ1...θk−−−→ C ′ if applying each of θi in order transforms C to C ′. We say transformation θ
acts across preemption points if either θ = `1 ! `2 and one of `1 or `2 is a preemption point; or
if θ = [`1; `2] and `2 is a preemption point.
37
Given a program C, we define program constraints to represent sets of programs that can be
obtained through applying program transformations on C.
• Atomicity constraint: Program C ′ |= [`i; `j] if `i and `j are in an atomic block.
• Ordering constraint: Program C ′ |= `i v `j if `i and `j are from the same basic block and
either `i occurs before `j , or C ′ satisfies [`i; `j].
If C ′ |= Φ, we say that C ′ satisfies Φ. Further, we define conjunction of Φ1 and Φ2 by letting
C ′ |= Φ1 ∧ Φ2 ⇐⇒ (C ′ |= Φ1 ∧ C ′ |= Φ2). We define implication as C ′ |= Φ1 =⇒ Φ2 ⇐⇒
(C ′ |= Φ1 =⇒ C ′ |= Φ2) and its negation as C ′ |= Φ1 /=⇒ Φ2 ⇐⇒ ¬ (C ′ |= Φ1 =⇒ C ′ |= Φ2)
A program constraint Φ is weaker than Φ′ if Φ 6= Φ′ and
• ∀`i, `j. (Φ =⇒ [`i; `j]) =⇒ (Φ′ =⇒ [`i; `j]) or
• (@`i, `j. Φ′ =⇒ [`i; `j] ∧ Φ /=⇒ [`i; `j]) ∧ (∀`i, `j. (Φ =⇒ `i v `j) =⇒ (Φ′ /=⇒
`i v `j)).
Intuitively, this means Φ is weaker than Φ′ if Φ has fewer atomic sections or if the number of
atomic sections are the same than Φ is weaker if it enforces fewer ordering constraints.
As our reorderings need to preserve the sequential semantics of the thread, we can compute
some reordering constraints even before considering concurrent executions. The procedure
SemPreservingOrders computes a program constraint Φ as follows. For each thread M it
picks `, `′ ∈ locs(M) such that ` precedes `′ in the original program, and checks if stmt(`) and
stmt(`′) commute, i.e., we test using a theorem prover two conditions: (a) stmt(`′); stmt(`) can
execute to completion from each state stmt(`); stmt(`′) can; and (b) they have the same effect.
If they do not commute, then Φ← Φ ∧ ` v `′.
If SemPreservingOrders returns Φ on input M , then every M ′ satisfying Φ (and obtained
by reordering) is sequentially equivalent to M , and no weaker Φ′ has the same property.
Example 3.2.1. Running SemPreservingOrders on the code fragment from Figure 3.1b gives
us a single constraint S v T as all other pairs of statements are independent of each other. In
Figure 3.1a, we get A v B v C ∧ 1 v 2 v 3 v 4.
Other transformations. We motivate another sequential-semantics preserving transformation
with an example. Some further transformations are in Section 3.5.1.
Example 3.2.2. In Figure 3.2, the timer thread is invoked when timer_enabled = 1
to handle requests. The device shutdown thread, shutdown, handles the remaining requests
38
Figure 3.2 Example for copying idempotent statements.
init : timer_enabled := 1; halted := 0
timer shutdown
atomic_start 1 : work_queue() work_queue() {
A : assume(timer_enabled) 2 : timer_enabled := 0 P : unsafe()
B : timer_enabled := 0 3 : assert(¬timer_enabled) Q : timer_enabled := 1
C : work_queue() }
atomic_end
and disables the timer. There are two correctness conditions: (1) the timer is disabled after
device shutdown; and (2) the unsafe() function can be accessed only by one thread at a time.
Condition (2) is violated as statements 1 and C can cause unsafe to be executed simultaneously.
This happens if statement C calls unsafe through work_queue, and after executing a few
statements of unsafe, thread timer executes and, in the atomic section, calls unsafe. One
fix is to move 2 before 1. This introduces a trace where the assertion fails as the timer gets
re-enabled by 1 (switching 1 and 2 is not semantics preserving). A possible fix is to execute
statement 2 twice, before and after statement 1.
The above example illustrates another useful semantics-preserving transformation, namely,
replication of idempotent statements. A statement ` occurring after `′ can be replicated before `′
if stmt(`); stmt(`′); stmt(`) has the same effect as stmt(`′); stmt(`).
3.3 Generalising Counterexamples to Partial-order Neighbour-
hoods
In this chapter we consider bad partial-order neighbourhoods represented by a partial-order
HB-formula. We consider an execution bad if its last state is 〈failed〉. A bad partial-order
neighbourhood contains only bad traces, i.e., for every trace there exists at least one bad execution.
We do not guarantee that the bad partial-order neighbourhood of trace π contains all bad traces
in Nπ
Error invariants. Error invariants were introduced in [Ermis et al., 2012] in a sequential
setting. Here we use them to generalise counterexamples to partial-order HB-formulæ. Let π be
a trace S0`1S1 . . . `kSk. An error invariant ErrInv is a function from ` to state assertions, such
that :
39
(a) ErrInv(`1) represents exactly state S0
(b) ∀i. ErrInv(`i) over-approximates the set of states reachable at `i along π. That is, ErrInv(`i)
holds for Si.
(c) ∀i. ErrInv(`i) under-approximates the set of states from which we can reach the failed state
along π starting from `i. That is, if ErrInv(`i) holds for Si, then Sn = 〈failed〉.
We generalise the notion of error invariant to neighbourhoods. An error invariant for Nπ
is a function ErrInv from location identifier ` to state assertions such that ErrInv is an error
invariant for every trace π′ in Nπ.
3.3.1 Generalising Counterexample Traces
Given an erroneous trace π, we now present techniques for generalising it into a bad partial-order
neighbourhood N. We assume we have discovered for each location `i in the original trace an
error invariant ErrInv(`i) = J`i .
The trace generalisation technique proceeds iteratively. Given a partial-order neighbourhood
N represented by a partial-order HB-formula, in each step, we attempt to relax N by removing
the conjunct hb(`a, `b) for two location identifiers `a, `b where tid(`a) 6= tid(`b). Further, we
require that ¬∃` : `a N ` N `b. However, we need to ensure that the resultant neighbourhood
remains bad after the relaxation, i.e., that every trace contained in it is an erroneous trace. We
formalise this condition below.
Let `c, `d ∈ locs(π) be such that `c N `a N `b N `d and ∀` ∈ locs(π). ` N `c ∨ `c N
` N `d ∨ `d N `. Further, let κ ⊆ locs(π) be the set {`|`c N ` N `d} \ {`c, `d}, i.e., κ
represents the set of statements occur between `c and `d. We call the triple (`c, `d, κ) a border set
of `a, `b and Nπ.
Let J`c and J`d be the error invariants at `c and `d. Intuitively, we check that we can get
from J`c to J`d for every ordering of statements in κ allowed by N \ hb(`a, `b). Formally, let
`1, `2, . . . `n be such that each `i ∈ κ and ∀` ∈ κ. ∃i. `i = `, and ∀i. `i N `i+1 ∨ `i =
`b ∧ `i+1 = `a. We allow relaxing the condition hb(`a, `b) in a step if and only if the following
holds: for every sequence `1; `2; . . . ; `n satisfying the above conditions, the Hoare-triple
{J`c}stmt(`1); stmt(`2); . . . ; stmt(`n){J`d} is valid.
Therefore, the full technique for generalising a trace is as follows: We start with the neigh-
bourhood that contains only the original trace N ≡
∧
{hb(`i, `j)|`i <π `j}. Then, in each step,
40
we pick `a and `b, and then check the above conditions. If they hold, we relax by removing the
constraint hb(`a, `b) from N. Although this technique is sound and complete for generalising
traces, it can be inefficient due to the large number of complex checks needed in each iteration.
Instead, we present an alternative algorithm (Algorithm 3.1) that is sound, but incomplete. The
outline of the algorithm is the same as the complete technique presented above, i.e., in each
iteration, the algorithm attempts to relax hb(`a, `b). However, we use two alternative checks.
Algorithm 3.1 Generalising linear counterexamples
Require: counterexample trace π, error invariant ErrInv
Ensure: bad partial-order neighbourhood N represented by an HB-formula
1: N ←
∧
{hb(`i, `j)|`i <π `j}
2: for all hb(`a, `b) in N do
3: N ← N \ hb(`a, `b)
4: `c, `d, κ← borderSet(`a, `b,N)
5: res← true
6: for all U, V ∈ κ do
7: if U / V ∧ V / U then
8: res← res ∧ (check1(`u, `v, `c, `d,N,ErrInv ) ∨
9: check2(`u, `v, `c, `d,N,ErrInv ))
10: end if
11: end for
12: if ¬ res then N ← N ∧ hb(`a, `b)
13: end for
14: return N
Rule 1, implemented in procedure check1, allows relaxing the order between statements
that commute under certain conditions. To relax the edge from `u to `v, we check if there
exists K1 such that {J`c}stmt(`c){K1} is a valid Hoare-triple and K1∧ stmt(`v); stmt(`u) =⇒
K1 ∧ stmt(`u); stmt(`v). Intuitively, we are checking if the statements at `u and `v commute
given the pre-condition K1. Further, we require that other statements do not interfere with K1,
i.e., for all ` ∈ κ, K1 is preserved under `, i.e., {K1}stmt(`){K1} is a valid Hoare-triple.
Rule 2, implemented in procedure check2, allows relaxing the order between statements
which do not commute, but ensure the similar post-conditions in both orders. The procedure
check2(`u, `v, `c, `d,N) works as follows. Let J`c be the error invariant at `c, and let J`d be the
error invariant at `d. The procedure returns true if and only if there exists two state assertions
K1 and K2 such that for all nodes the following conditions hold: (a) {J`c}stmt(`c){K1},
{K1}stmt(`u); stmt(`v){K2}, and {K1}stmt(`v); stmt(`u){K2} are valid Hoare-triples; and
(b) K2 =⇒ J`d . These conditions state that the error invariants are sufficient to prove that `u
and `v commute. Furthermore, let ` be any other node in κ. We require that ` preserves K1 and
41
K2, i.e., the following two Hoare-triples are valid: (c) {K1}stmt(`){K1} (d) {K2}stmt(`){K2}.
Intuitively, instead of checking all allowed paths from `c to `d, we find state assertions K1 and
K2 that are strong enough to prove commutativity, but are preserved by other statements in κ.
Example 3.3.1.
• Consider threads (1 : x := 0; 2 : x := x+1) and (A : x := x+1; B : assert(x ≤ 1)). Here,
1→ A→ 2→ B is an erroneous trace. However, the ordering of A and 2 is irrelevant to
the bug. This order can be eliminated by applying Rule 1 with precondition K1 ≡ true,
as we have A; 2 =⇒ 2; A.
• Using Rule 1 in the illustrative example (Figure 3.1a) taking K1 to be l1 = 1 ∧ l2 = 1
lets us commute the statements x := l1 and x := l2.
• Consider two threads each with the code (1 : x := 3) and (A : x := 2; B : assert(x = 0)).
The erroneous trace here is 1→ A→ B. Here, it is clear that 1 and A cannot be relaxed by
check1, because scheduling them in reverse order results in a different state. However, in
the context of this trace, interchanging A and 1 still preserves the error. Therefore, using
Rule 2 with K1 ≡ true and K2 ≡ x > 0 relax the ordering between A and 1.
We note that Rule 1 and Rule 2 provide only a sound, not a complete proof system for trace
generalisation. Application of both these rules involve finding suitable K1 and K2. The set of
conditions imposed on K1 and K2 can be expressed as Horn clauses. Solving Horn clauses
(in logics useful for program analysis) is a focus of recent research. Non-recursive version
was solved by [Gupta et al., 2011a], and recursive Horn clauses are solved successfully using
heuristics, for example, in [Gupta et al., 2011b]. These techniques can be used to implement
check1 and check2.
Theorem 3.3.2. Let π be a counterexample trace corresponding to an erroneous trace, and
ErrInv an error invariant for π. If Algorithm 3.1 returns neighbourhood N on π and ErrInv ,
then N is a bad neighbourhood of π.
Now at step i, we pick a constraint in N, and try to remove it. We remove a constraint
hb(`u, `v) if in all traces of N where `u is immediately followed by `v, we can swap the two and
still hit the bug. That the border nodes lb and ub exist is easy to see (X is finite).
42
3.4 Synthesis by Elimination of Bad Neighbourhoods
We now present Algorithm 3.2 to solve the synthesis problem stated in Section 2.3.4 for programs
with an explicit specification using atomic sections and reorderings. It works by finding a buggy
trace, generalising it, and then eliminating it using either an atomic section, or a code reordering.
The algorithm maintains a program constraint Φ. In each iteration, program C ′ which satisfies
Φ is picked and verified. If correct, it is returned. Otherwise, Φ is strengthened using the
generalised counterexample. Note that as Verify is solving an undecidable problem, it may
not terminate. This results in our algorithm not terminating as well. However, as the constraint is
strengthened at each step and only a finite number exist, if all calls to Verify terminate, then
the algorithm terminates and always returns a correct program. The verification question may be
undecidable because our language is Turing-complete. This correct program, in the worst case,
will have every thread enclosed in an atomic section.
Algorithm 3.2 Synthesis algorithm
Require: Library C
Ensure: Error-free program C ′ sequentially equivalent to C
1: Φ← SemPreservingOrders(C)
2: while true do
3: C ′ ← Choose(Φ)
4: if Verify(C ′) return C ′
5: N ← Generalise(cex(C ′),Φ)
6: Φ← Φ ∧ Eliminate(N,Φ)
7: end while
Algorithm SemPreservingOrders was defined in Section 3.2. Generalise is the Al-
gorithm 3.1. Algorithm Choose picks a program satisfying a given constraint. Eliminate (see
below) finds constraints to eliminate a generalised po-trace.
The basic idea behind generalised trace elimination is that N encodes the happens-before
relation among statements and hence cannot contain loops. Hence, we aim to enforce minimal
constraints to introduce a cycle in the HB-formula representing N. We extend the HB-formula
representingN by introducing constraints corresponding to possible atomic sections and reorder-
ings. We then find the smallest cycles, which correspond to the required minimal constraints.
Fix a program C and a partial-order neighbourhood N for the remainder of this section.
We represent N with an elimination graph G(N,Φ) = (S,E), that is a weighted graph with
vertices S = locs(π). Here Φ refers to the constraints from SemPreservingOrders and previous
43
iterations. The edges E ⊆ S × N × S are described below. Let `, `′ ∈ S. The function cons
assigns a constraint to each edge of the elimination graph. We have (`, w, `′) ∈ E if:
• tid(`) 6= tid(`′) ∧ `  `′ ∧ w = 1 ∧ ¬∃`′′. `  `′′  `′. In this case, we define
cons((`, w, `′)) = >.
• tid(`) = tid(`′), where `  `′ and either ` and `′ belong to different blocks or Φ =⇒
` v `′. We have w = |{`′′ | `  `′′  `′}|. Here, we let cons((`, w, `′)) = >. These edges
correspond to happens-before relations that hold due to Φ.
• tid(`) = tid(`′) ∧ Φ =⇒ `′ v ` and w = A · |{`′′ | `′  `′′  `}| for some constant
A ∈ N. Here, we define cons((`, w, `′)) = ({`, `′}, ∅). Such edges correspond to adding
an atomic section around ` and `′. We give the atomic section a cost proportional to the
minimum number of control locations it contains.
• tid(`) = tid(`′) ∧ Φ /=⇒ ` v `′ ∧ Φ /=⇒ `′ v ` and w = R · |{(`′′, `′′′) | Φ /
=⇒ `′′ v `′′′ ∧ Φ =⇒ `′′ v ` ∧ `′ v `′′′} for some R ∈ N. Here, we define
cons((`, w, `′)) = (∅, {(`, `′)}). This edge corresponds to forcing the order ` before `′ and
has a cost proportional to the number of additional statement orders the constraint implies.
Intuitively, an edge (`, w, `′) with cons((`, w, `′)) = > represents a happens-before relation true
in any C ′ satisfying Φ. Every remaining edge (`, w, `′) is a happens-before relation true in any
program satisfying cons((`, w, `′)). We pick A much larger than R to prefer solutions having
only reorderings rather than atomic sections (picking A and R such that A > R · |locs(π)|2 is
sufficient).
Let `0 . . . `n−1`0 be a cycle in the elimination graph for the partial-order neighbourhood N
and Φ such that `0 = ` ∧ `n−1 = `′ and tid(`) 6= tid(`′). We call such a cycle an elimination
cycle. We show that any elimination cycle gives us a constraint that eliminates all traces in N.
From the elimination cycle, we obtain the following constraint
∧n−2
i=0 cons((xi, xi+1)). This is
the constraint returned by Eliminate (called from Algorithm 3.2). A constraint Φ′ eliminates a
neighbourhood N iff all libraries satisfying Φ ∧ Φ′ and sequentially equivalent to C do not share
a trace with N.
Theorem 3.4.1. LetG(N,Φ) contain an elimination cycle `0`1 . . . `n−1`0. Then,
∧n−2
i=0 cons((`i, `i+1))
eliminates the partial-order neighbourhood N.
Proof. Assume π = S0`1S1 . . . `kSk to be an execution. Any execution π in C ′ satisfying Φ and
cons(`i, `i+1) has `i < `i+1. Hence, any trace π satisfying
∧n−2
i=0 cons((`i, `i+1)) and Φ satisfies
44
`0 < `n−1. However, as (`n−1, `0) is an edge in the elimination graph where `0 and `n−1 come
from different threads, we have that `n−1 N `0 and hence, `n−1 < `0. This is not possible as
`0 and `n−1 correspond to different threads. Hence, every execution π ∈ N is eliminated by∧n−2
i=0 cons((`i, `i+1)).
Further, the minimal elimination cycle corresponds to a minimal constraint. As A >
R|locs(π)|2, atomic sections are used iff N cannot be eliminated by reordering.
Theorem 3.4.2. If Φ is the constraint corresponding to the minimal cycle in the elimination
graph, no strictly weaker constraint is sufficient to eliminate N.
Finding minimal cycles can be done by running an all-pairs shortest path algorithm, and
finding nodes u, v from different threads such that sum of distances u to v and v to u is minimal.
Hence, the theorem follows.
Theorem 3.4.3. Finding minimal elimination cycles in the elimination graph G(N,Φ) can be
done in time polynomial in the size of N and Φ.
3.5 Implementation and Experiments
3.5.1 A study of concurrency bugs in Linux drivers
Our work is motivated by a study of concurrency defects in Linux device drivers. Drivers are
required to perform well under concurrent workloads, which calls for sparing and fine-grained
use of locks. This, in turn, provokes many concurrency-related bugs, making concurrency
a major source of errors in drivers [Chou et al., 2001; Ryzhyk et al., 2009]. Additionally,
the kernel imposes a number of constraints on the use of locks. For example, a driver may
not perform blocking operations, such as acquiring a mutex, in its interrupt handler routine.
Driver threads interact with other kernel threads; however, since the developer can not modify
kernel code outside the driver, synchronisation with external threads must be achieved without
placing additional locks in the kernel. Our study considered 100 most recent (as of Dec. 2012)
concurrency-related defects fixed in Linux device drivers (we used the Linux kernel development
archive obtained from www.kernel.org). These defects occurred in 68 different drivers,
all maintained by different developers. For each bug, we rely on manual code inspection to
understand the exact nature of the bug and the fix.
45
Table 3.1 Synchronisation patterns in Linux device drivers.
pattern description #
REORDER Reorder program statements to eliminate a race 28
LOCK Protect racing code sections with a lock 17
OPTIMISTIC Check if another thread has modified the value of a shared variable 10
BARRIER Use a system-provided function to wait for a racing thread to terminate or complete
a critical section
7
ATOMIC Replace a statement-sequence with an equivalent atomic primitive 6
UPGRADE Replace a synchronisation primitive with a stronger one 5
UNSHARE Avoid sharing by creating a private copy of a shared variable 3
CLONE Replicate an idempotent statement 1
ADHOC Transformations that do not fall into one of the previous categories 23
Total 100
Figure 3.3 Examples of REORDER subpatterns and corresponding elimination graphs.
(a) REORDER.RELEASE
init : x := 0, run := 0
thread1 thread2
B : x := 1 1 : wait(run)
A : signal(run) 2 : assert(x)
((((
(
B : x := 1
(b) REORDER.LOCK
init : x := 1
thread1 thread2
B : x := 0 1 : lock(l)
D : x := 1 2 : assert(x)
C : unlock(l) 3 : unlock(l)
((((
(
D : x := 1
(c) REORDER.DELAY




A : x := 0 1 : assert(x)
B : wait(exit) 2 : signal(exit)
A : x := 0
We observed that many bug fixes involve subtle and seemingly ad hoc code transformations.
In-depth analysis reveals several common patterns, shown in Table 3.1. In particular, 28 of 100
fixes were semantic-preserving statement reorderings (the REORDER pattern). These further fall
into several subpatterns (see Table 3.2 and Figure 3.3). Reordering statements often involves
additional side effects. For example, moving a statement across function boundaries may require
adding arguments or return values to functions. Our implementation currently does not perform
these, but can be extended to do so.
Interestingly, the LOCK pattern (17%) is rarer than expected. Performance and kernel-
imposed constraints often prevent lock usage. This observation confirms that locks are not
a universal band-aid for concurrency defects in OS code. We do not discuss remaining bug
categories, but note that we encountered 23 bug fixes that did not fit into any pattern (AD HOC in
Table 3.1). We expect to discover new patterns among these as we include more defects in our
study.
46
Table 3.2 Subpatterns of the REORDER pattern.
pattern description example #
REORDER.RELEASE Move a variable assignment to a location before another
thread accessing this variable is released
Figure 3.3a 11
REORDER.LOCK Move statements to existing lock-protected section Figure 3.3b 10
REORDER.DELAY Delay assignment to a shared variable until a racing thread
accessing this variable has terminated
Figure 3.3c 6
REORDER.RW Reorder accesses to a pair of shared variables Figure 3.1b 1
REORDER.ADHOC Application-specific reordering – 1
3.5.2 Synthesis case study
We implemented our synthesis algorithm in a tool called CONCURRENCYSWAPPER1. It handles
a restricted subset of C, avoiding complex parts including pointer arithmetic, aliasing, bit-wise
arithmetic, etc. It uses CPAChecker [Beyer and Keremoglu, 2011] to convert C statements into
formulæ representing statements, as in Section 2.3.4. We use the bounded model checking tool
Q [Q] to detect three kinds of bugs: (a) assertion failures; (b) generic correctness conditions
(e.g., initialisation-before-use for pointers); and (c) deadlocks (as Q does not detect deadlocks,
we manually encoded these as suitable assertions for our examples). We generalise buggy traces,
using the Z3 theorem prover [de Moura and Bjørner, 2008] to perform the required checks for
Rules 1 and 2. The current implementation does not compute invariants during generalisation;
but even without invariant computation, our tool came up with the right program transformations
quickly. To evaluate the effectiveness of trace generalisation, we ran the experiments with and
without it.
Reporting. Although each iteration of the algorithm eliminates a buggy po-trace, additional
traces may exhibit the same bug. We report the iterations needed to completely fix a bug, i.e.,
until no more traces exhibit a similar bug. Also, we report separately, the time taken to: (a) find
bugs; (b) generalise the trace and find a fix; and (c) verify the correct program. We report the
verification time separately as it is usually the largest fraction of execution time.
Benchmarks. Our initial evaluation consisted of 5 microbenchmarks each of 15–30 lines of
code without comments, and modelling a single concurrency defect found in a real Linux driver.
The iterations required and fix patterns are summarised in Table 3.3. All measurements were




Benchmark Fix pattern Iters. Iters (w/o trace gen.)
ex1 REORDER.RW 1 1
ex2 REORDER.RELEASE 1 1
ex3 REORDER.LOCK 1 1
ex4 REORDER.ADHOC 3 3
ex5 LOCK 2 3
Table 3.4 Results for Linux Realtek 8169 driver benchmark
Bug Fix pattern With trace generalisation Without trace generalisation
Iters. Bug-finding Iters. Additional bug-finding
bug1 REORDER.RELEASE 1 8 sec 1 same
bug2 REORDER.DELAY 1 23 sec 4 same + 80 sec
bug3 REORDER.RW 1 93 sec 1 same
bug4 REORDER.RW 1 94 sec 1 same
bug5 REORDER.ADHOC 2 47 sec 2 same
done on an Intel core i5-3320M laptop with 8GB of RAM. The synthesis took less than 15
seconds for each case, with trace analysis taking less than 0.5 seconds. Also, in 1 case, not using
trace generalisation leads to an additional iteration, leading to a larger execution time.
We evaluate the scalability of CONCURRENCYSWAPPER using a simplified version of the
Linux Realtek 8169 driver. This driver is representative of medium to high-end drivers both in
terms of overall complexity and the complexity of synchronisation logic. We extracted the driver’s
complete synchronisation skeleton, including code and variables related to thread synchronisation
and communication. The skeleton does not include the actual device management code, which
is irrelevant to concurrency, and was additionally simplified to avoid currently unsupported C
constructs. We provide an environment model to simulate all (7) OS threads that interact with
the driver. The resulting skeleton had 364 LOC, while the original driver had around 7,000 LOC.
The skeleton had 5 concurrency defects.
Q was able to find all the defects, and CONCURRENCYSWAPPER was able to find fixes for
each defect through statement reordering. The results are summarised in Table 3.4. In each
iteration, the trace analysis phase took less than 2 seconds. The extra bug finding times due
to additional iterations is reported for the runs without trace generalisation. In one case, the 3
additional iterations were required without trace generalisation. The bug finding times dominate
the trace analysis times, justifying the use of complex trace generalisation procedure to avoid





4.1 Problem Statement and Illustrative Example
In Chapter 3 we introduced a trace-based algorithm for program repair of programs with explicit
specification. We applied two types of program transformations: (1) reordering of adjacent
statements stmt(`a); stmt(`b) within a thread if the statements are sequentially independent (i.e.,
if stmt(`a); stmt(`b) is sequentially equivalent to stmt(`b); stmt(`a)), and (2) inserting atomic
sections. Reordering of statements is given priority as it may result in a better performance than
the insertion of atomic sections.
While placing atomic sections is safe, reordering statements can result in assertion violations
becoming reachable that were not reachable in the original program. This is called a regression.
In this chapter we demonstrate the use of good traces to prevent regressions.
We say that such an algorithm is regression-free if after every iteration, we have that: first,
all bad traces examined so far are removed, and second, no good trace examined so far is turned
into a bad trace of the new program. (Of course, to make this definition precise, we will need to
define a correspondence between traces of the original program and the new program.)
In related work by [von Essen and Jobstmann, 2013], the goal is to repair reactive systems
(given as automata) according to an LTL specification, with a guarantee that good traces do not
disappear as a result of the repair. They do not deal with concurrency bugs or synchronisation.
Our algorithm. Our algorithm learns constraints on the space of candidate solutions from
both good traces and bad traces. We explain constraint learning using as an example the program
50
transformation `1 ! `2 (Section 3.2), which reorders statements within threads. From a bad
trace, we learn reordering constraints that eliminate the counterexample using Algorithm 3.2.
While eliminating the counterexample, such reorderings may transform a (not necessarily
preemption-free) good trace into a bad trace — this would constitute a regression. In order to
avoid regressions, our algorithm learns also from good traces. Intuitively, from a good trace π,
we want to learn all the ways in which π can be transformed by reordering without turning it into
an error trace— this is expressed as a program constraint. The program constraint is (a) sound,
if all programs satisfying the constraint are regression-free; and (b) complete, if all programs
violating the constraint have regressions. However, as learning a sound and complete constraint
is not possible from a single trace, given a good trace π we learn a sound constraint that only
guarantees that π is not transformed into a bad trace. We generate the constraint using data-flow
analysis on the statements in π. The main idea of the analysis is that in good traces, the data-flow
into passing assertions is protected by synchronisation mechanisms (such as locks) and data-flow
into conditionals along the trace. This protection may fail if we reorder statements. We thus find
a constraint that prevents such bad reorderings.
Summarising, as the algorithm progresses and sees a set of bad traces and a set of good traces,
it learns constraints that encode the ways in which the program can be transformed in order to
eliminate the bad traces without turning the good traces into bad traces of the resulting program.
CEGIS vs PACES. A popular recent approach to synthesis is counterexample-guided inductive
synthesis (CEGIS) [Solar-Lezama et al., 2006]. Our algorithm can be viewed as an instance of
CEGIS with the important feature that we learn from positive examples. We dub this approach
PACES, for Positive- and Counter-Examples in Synthesis. The input to the CEGIS algorithm
is a specification ϕ (possibly in multiple pieces – say, as a temporal formula and a language
of possible solutions [Alur et al., 2013]). In the basic CEGIS loop, the synthesiser proposes
a candidate solution S, which is then checked against ϕ. If it is correct, the CEGIS loop
terminates; if not, a counterexample is provided and the synthesiser uses it to improve S.
In practice, the CEGIS loop often faces performance issues, in particular, it can suffer from
regressions: new candidate solutions may introduce errors that were not present in previous
candidate solutions. We address this issue by making use of positive examples (good traces)
in addition to counterexamples (bad traces). The good traces are used to learn constraints that
ensure that these good traces are preserved in the candidate solution programs proposed by the
51
CEGIS loop. The PACES approach applies in many program synthesis contexts, but in this
chapter, we focus on program repair for concurrency.
Experimental evaluation. To evaluate our approach, we implemented a program repair tool,
named CONREPAIR, and applied it to a collection of (simplified) open-source Linux device
drivers. We looked at concurrency bugs that were reported and fixed in the Linux kernel
repository. We used five examples, where we modelled the concurrency skeleton in sufficient
detail to reproduce the bug (between 35 and 80 lines of code per example). In addition, we did
two larger case studies, of the rtl8169 and usb-serial drivers, modelled in more detail,
with about 400 lines of code each. As explained above, our tool tries to fix a bug by reordering
statements within threads, which is the preferred solution and/or by inserting atomic sections. In
each case, our tool found a solution that fixed the problem (or, in the case of rtl8169, multiple
problems). To evaluate the impact of using positive examples, we compared CONREPAIR with
two versions of CONCURRENCYSWAPPER (Chapter 3), which do not use positive examples.
The first version of CONCURRENCYSWAPPER (ce1) prefers to exhaust all possible statement
reorderings before using atomic sections; the second version (ce2) heuristically decides to insert
atomic sections earlier. We found that (a) the new tool converges to a solution in a significantly
smaller number of iterations than (ce1), and (b) the new tool finds solutions with fewer atomic
sections than (ce2) in a comparable number of iterations. We thus conclude that the use of
positive examples can substantially improve the performance and quality of counterexample-
guided inductive synthesis algorithms. In theory it is possible that ce2 inserts an atomic section
earlier that is not needed.
4.1.1 Illustrative Example
We motivate our approach on the program C in Figure 4.1a. There is a bug witnessed by the
following trace: π1 = A → B → 1 → 2 → 3 (the assertion at line 3 fails). Let us attempt to
fix the bug using the algorithm from Chapter 3. The algorithm discovers possible fixes by first
generalising the trace into a partial order (Figure 4.1b, without the dotted edges) representing the
happens-before relations necessary for the bug to occur, and second, trying to create a cycle in
the partial order to eliminate the generalised counterexample. It finds three possible ways to do
this: swapping B and C, or moving C before A, or moving A after C, indicated by the dotted edges
in Figure 4.1b. Assume that we continue with swapping B and C to obtain program C1 where the
52
Figure 4.1 Program analysis with good and bad traces
init : x := 0; y := 0; z := 0
thread1 thread2 thread3
1 : await(x = 1) A : x := 1 n : await(z = 1)
2 : await(y = 1) B : y := 1 p : assert(y = 1)

















(c) Learning from a good trace
first thread is A; C; B. Program C1 contains an error trace π2 = A→ C→ n→ p (the assertion
at line p fails). This bug was not in the original program, but was introduced by our fix. We refer
to this type of bug as a regression.
In order to prevent regressions, the algorithm learns from good traces. Consider the following
good trace π3 = A→ B→ C→ 1→ 2→ n→ 3→ p. The algorithm analyses the trace, and
produces the graph in Figure 4.1c. Here, the thick red edges indicate the reads-from relation for
assert commands, and the dashed blue edges indicate the reads-from relation for await commands.
Intuitively, the algorithm now analyses why the assertion at line p holds in the given trace. This
assertion reads the value written in line B (indicated by the thick red edge). The algorithm finds a
path from B to p composed entirely from intra-thread sequential edges (B→ C and n→ p) and
dashed blue edges (C→ n). This path guarantees that this trace cannot be changed by different
scheduler choices into a path where p reads from elsewhere and fails. From the good trace π2 we
thus find that there could be a regression unless B precedes C and n precedes p. Having learned
this constraint, the synthesiser can find a better way to fix π1. Of the three options described
above, it chooses the only way which does not reorder B and C, i.e., it moves A after C. This fixes
the program without regressions.
4.1.2 Problem Statement
We use the definition of program constraints Φ introduced in Section 3.2.
53
Trace Transformations and Regressions. A trace π = `0 . . . `m transforms into a trace π′ =
`′0 . . . `
′
m by switching if: (a) `0 . . . `n = `
′
0 . . . `
′
n and the suffixes `n+2 . . . `m and `
′
n+2 . . . `
′
m are
equal; and (b) ln = `′n+1 ∧ `n+1 = `′n. We label switching transformations as a:
• Free transformation if `n and `n+1 are from different threads. We write π′ ∈ f(π) if a
sequence of free transformations takes π to π′.
• Reordering transformation θ = `] ! `[ acting on π if `n = `] and `n+1 = `[ and `], `[
from the same thread.
We have π′ ∈ θ(π) if repeated applications of reordering transformations acting on π give
π′. Similarly, π′ ∈ θf (π) if repeated applications of θ and free transformations acting on π
give π′.
Reordering is a special form a switching because it requires changes to the order of statements
in the actual program. Similarly, π′ is obtained by atomic section transformation θ = [`1; `2]
acting on a trace π if π′ ∈ f(π), and there are no context-switches between `1 and `2 in π′
(Section 3.2).
Trace analysis graphs. We use trace analysis graphs to characterise data-flow and scheduling
in a trace. First, given a trace π = `0 . . . `n, we define the function depends to recursively




{(last(i, v), i)} ∪
dependsπ(last(i, v))
)
where v ranges over variables read by `i, and last(i, v) returns j if `i reads
the value of v written by `j and last(i, v) = ⊥ if no such j exists. As the base case, we define
dependsπ(⊥) = ∅.
Now, a trace analysis graph for trace π = `0 . . . `n is a multi-graph G(π) = 〈V,→〉, where
V = {⊥} ∪ {i|0 ≤ i ≤ n} are the positions in the trace along with ⊥ (representing the initial
state) and→ contains the following types of edges.
1. Intra-thread order (IntraThreadOrder ): We have x → y if either x < y, and `x and `y
are from the same thread, or if x = ⊥.
2. Data-flow into conditionals (DFConds): We have
⋃
a∈conds dependsπ(a) ⊆→ where
x ∈ conds iff stmt(`x) is an assume or an await statement.
3. Data-flow into assertions (DFAsserts): We have
⋃
a∈asserts dependsπ(a) ⊆→ where x ∈
asserts iff stmt(`x) is an assert statement.
4. Non-free order (NonFreeOrder ): We have x → y if stmt(`x) and stmt(`y) write two
different values to the same variable. Intuitively, the non-free orders prevent switching
54
transformations that switch `x and `y.
Regressions. Suppose C θ1,...,θk−−−−→ C ′. We say θ1, . . . , θk introduces a regression with respect to
a good trace π = `0 . . . `m of C if there exists a trace π′ = `′0 . . . `′m ∈ θ
f
k ◦ . . . ◦ θ
f
1 (π) such that:
(a) π′ is a bad trace of C ′; (b) π does not freely transform into any bad trace of C; and (c) for
every data-flow into conditionals edge x → y (say ly reads the variables V from lx) in G(π),
the edge p(x)→ p(y) is a data-flow into conditionals edge in G(π′) (where `′p(y) reads the same
variables V from `′p(x)). Here, p(i) is the position in π
′ of the statement at position i in π after the
sequence of switching transformations that take π to π′. We say θ1 . . . θk introduces a regression
with respect to a set TG of good traces if it introduces a regression with respect to at least one
trace π ∈ TG.
Intuitively, a program-transformation induces a regression if it allows a good trace π to
become a bad trace π′ due to the program transformations. Further, we require that π and π′ have
the conditionals enabled in the same way, i.e., the assume and await statements read from the
same locations.
Remark 4.1.1. The above definition of regression attempts to capture the intuition that a good
trace transforms into a “similar” bad trace. The notion of similar asks that the traces have the
same data-flow into conditionals – this condition can be relaxed to obtain more general notions
of regression. However, this makes trace analysis and finding regression-free fixes much harder
(See Example 4.2.4).
Example 4.1.2. In Figure 4.1, the trace π = A; B; C; n; p transforms under B ! C to
π′ = A; C; B; n; p, which freely transforms to π′′ = A; C; n; p; B. Hence, B ! C introduces a
regression with respect to π as π does not freely transform into a bad trace, and π′ is bad while
the await in n still reads from C.
The Regression-free Program-Repair Problem. Intuitively, the program-repair problem asks
for a correct program C ′ that is a transformation of C. Further, C ′ should preserve all sequential
behaviour of C; and if all preemption-free behaviour of C is good, we require that C ′ preserves it.
Program repair problem. The input is a program C where all complete sequential traces
are good. The result is a sequence of program transformations θ1 . . . θn and C ′, such that
(a) C θ1...θn−−−→ C ′; (b) C ′ has no bad traces; (c) for each complete sequential trace π of C, there
55
exists a complete sequential trace π′ of C ′ such that π′ ∈ θ1 ◦ θ2 . . . ◦ θn(π); and (d) if all
complete preemption-free traces of C are good, then for each such trace π, there exists a complete
preemption-free trace π′ of C ′ such that π′ ∈ θ1 ◦ θ2 . . . ◦ θn(π). We call the conditions (c) and
(d) the preservation of sequential and correct preemption-free behaviour.
Regression-free error fix. Our approach to the above problem is through repeated regression-
free error fixing. Formally, the regression-free error fix problem takes a set of good traces TG, a
program C and a bad trace π as input, and produces transformations θ1, . . . , θk and C ′ such that
C θ1...θk−−−→ C ′, π′ ∈ θfk ◦ . . . ◦ θ
f
1 (π) is a trace in C ′, and θ1, . . . , θk does not introduce a regression
with respect to TG.
Our approach to program-repair is through learning regression preventing constraints from
good traces and error eliminating constraints from bad traces.
4.2 Good and Bad Traces
4.2.1 Learning from Good Traces
Given a trace π of C, a program constraint Φ is a sound regression preventing constraint for π if
no sequence of program transformations θ1, . . . , θk, such that C
θ1...θk−−−→ C ′ and C ′ |= Φ, introduces
a regression with respect to π. Further, if every θ1 . . . θk, such that C
θ1...θk−−−→ C ′ and C ′ 6|= Φ1,
introduces a regression with respect to π, then Φ is a complete regression preventing constraint.
Example 4.2.1. Let the program C be {1 : x := 1; 2 : y := 1}||{A : await(y); B : assert(x =
1)}. In Figure 4.2a, the constraint Φ∗ = (1 v 2 ∧ A v B) is a sound and complete regression-
preventing constraint for the trace 1→ 2→ A→ B.
Lemma 4.2.2. For a program C and a good trace π, the sound and complete regression-
preventing constraint Φ∗ is computable in exponential time in |π|.
Intuitively, the proof relies on an algorithm that iteratively applies all possible free and
program transformations in different combinations (there are a finite, though exponential, number
of these) to π. It then records the constraints satisfied by programs obtained by transformations
that do not introduce regressions.
1see Section 3.2 for definition
56
Figure 4.2 Sample Good Traces for Regression-preventing constraints
1 : x := 1
2 : y := 1
A : await(y = 1)
B : assert(x = 1)
(a)
1 : x := 1
2 : y := 1
A : await(y = 1)
B : assert(x = 1)
C : a := 1
3 : assume(a = 1)
4 : x := 0
(b)
1 : x := 1
2′ : y := 2
2 : y := 1A : await(y >= 1)
B : assert(x = 1)
(c)
While the complexity is exponential, we can show that this cost is unavoidable. We do not
present the proof here, but only state that it is non-constructive and is based on Shannon’s lower
bounds on circuit complexity for boolean functions.
Lemma 4.2.3. There exist a class of programs Cn, and traces πn of length O(n) such that the




The sound and complete constraints are usually large and impractical to compute. Instead, we
present an algorithm to compute sound regression-preventing constraints. The main issue here
is non-locality, i.e., statements that are not close to the assertion may influence the regression-
preventing constraint.
Example 4.2.4. The trace in Figure 4.2b is a simple extension of Figure 4.2a. However, the
constraint (1 v 2 ∧ A v B) (from Example 4.2.1) does not prevent regressions for Figure 4.2b.
An additional constraint B v C ∧ 3 v 4 is needed as reordering these statements can lead
to the assertion failing by reading the value of x “too late”, i.e., from the statement 4 (trace:
1→ 2→ A→ C→ 3→ 4→ B).
57
Figure 4.2c clarifies our definition of regression, which requires that the data-flow edges into
assumptions and awaits need to be preserved. The await can be activated by both 2 and 2′; in
the trace we analyse it is activated by 2. Moving 2′ before 1 could activate the await “too early”
and the assertion would fail (trace: 2′ → A→ B). However, it is not possible to learn this purely
with data-flow analysis – for example, if statement 2′ was y := −1, then this would not lead to a
bad trace. Hence, we exclude such cases from our definition of regressions by requiring that the
await reads A reads from the same location.
Learning Sound Regression-Preventing Constraints. The sound regression-preventing con-
straint learned by our algorithm for a trace ensures that the data-flow into an assertion is preserved.
This is achieved through two steps: suppose an assertion at location `a reads from a write at
location `w. First, the constraint ensures that `w always happens before `a. Second, the constraint
ensures that no other writes interfere with the above read-write relationship.
For ensuring happens-before relationships, we use the notion of a cover. Intuitively, given a
trace π of C where location `x happens before location `y, we learn a constraint Φ that ensures
that if C ′ |= Φ, then each trace π′ of C ′ obtained as free and program transformations acting on π
satisfies the happens-before relationship between `x and `y. Formally, given a trace π of program
C, we call a path x1 → x2 → . . . → xn in the trace analysis graph a cover of edge x → y if
x = x1 ∧ y = xn and each of xi → xi+1 is either a intra-thread order edge, or a data-flow into
conditionals edge, or a non-free order edge.
Given a trace π = `0; `1 . . . `n, where statement at position r (i.e., `r) reads a set of variables
(say V ) written by a statement at position w (i.e., `w), the non-interference edges define a
sufficient set of happens-before relations to ensure that no other statements can interfere with
the read-write pair, i.e., that every other write to V either happens before w or after r. Formally,
we have that interfere(w → r) = {r → w′ | w′ > r ∧ write(`w′) ∩ write(`w) ∩ read(`r) 6=
∅} ∪ {w′ → w | w′ < w ∧ write(`w′) ∩ write(`w) ∩ read(`r) 6= ∅} where read(`) and write(`)
are the variables read and written at location `. If w = ⊥, we have interfere(w → r) = {r →
w′ | w′ > r ∧ write(`w′) ∩ read(`r) 6= ∅}.
Algorithm 4.1 works by ensuring that for each data-flow into assertions edge e, the edge
itself is covered and that the interference edges are covered. For each such cover, the set of
intra-thread order edges needed for the covering are conjuncted to obtain a constraint. We take
the disjunction Φ′ of the constraints produced by all covers of one edge and add it to a constraint
58
Algorithm 4.1 LearnGoodUnder
Require: A good trace π
Ensure: Regression-preventing constraint Φ
1: Φ← true;G← G(π)







3: if e is not covered then return
∧
{`x ≤ `y | x→ y is a intra-thread order edge}
4: Φ′ ← false
5: for all x1 → x2 → . . .→ xn cover of e do
6: Φ′ ← Φ′ ∨
∧
{`xi ≤ `xi+1 | xi → xi+1 is a intra-thread order edge and xi 6= ⊥
`xi and `xi+1 are from the same execution of a basic block in π }
7: end for
8: Φ← Φ ∧ Φ′
9: end for
10: return Φ
Φ to be returned. If an edge cannot be covered, the algorithm falls back by returning a constraint
that fixes all current intra-thread orders. The algorithm can be made to run in polynomial time in
|π| using standard dynamic programming techniques.
Theorem 4.2.5. Given a trace π, Algorithm 4.1 returns a constraint Φ that is a sound regression-
preventing constraint for π and runs in polynomial time in |π|.
Proof. The fallback case (line 3) is trivially sound. Let us assume towards contradiction that there
is a bad trace π′ = `′0; `
′
1 . . . `
′
n of C ′ |= Φ, that is obtained by transformation of π = `0; `1 . . . `n.
For each 0 ≤ i < n, let p(i) be such that the statement at position i in π is at position p(i) in π′
after the sequence of switching transformations taking π to π′.
If for every data-flow into assertion edge in x→ y in G(π), we have that p(x)→ p(y) is a
corresponding data-flow into assertion edge in G(π′), then it can be easily shown that π′ is also
good (each corresponding edge in π′ reads the same values as in π). Now, suppose x→ y is the
first (with minimal x) such edge in π that does not hold in π′. We will show in two steps that
p(x) happens before p(y) in π′, and that p(y) reads from p(x) which will lead to a contradiction.
For the first step, we know that there exists a cover of x→ y in π. For now, assume there is
exactly one cover – the other case is similar. For each edge a → b in this cover, no switching
transformation can switch the order of `a and `b:
• If a→ b is a data-flow into conditionals edge, as π′ has to preserve all DFConds edges
(definition of regression), p(a) happens before p(b) in π′.
• If a→ b is a non-free order edge, no switching transformation can reorder a and b as that
would change variables values (by definition of non-free edges).
59
• If a→ b is a intra-thread order edge, we have that C ′ |= Φ and Φ =⇒ a v b, and hence,
no switching transformation would change the order of a and b.
Hence, we have that all the happens before relations given by the cover are all preserved by π′
and hence, p(a) happens before p(a) in π′. The fact that p(y) reads from p(x) follows from a
similar argument with the interfere(x → y) edges showing that every interfering write either
happens before p(x) or after p(y).
4.2.2 Eliminating Bad Traces
Given a bad trace π of C, a program constraint Φ is a error eliminating constraint if for all
transformations θ1, . . . , θk and C ′ such that C
θ1...θk−−−→ C ′ and C ′ |= Φ, each bad trace π′ in
θfk ◦ . . . ◦ θ
f
1 (π) is not a trace of C ′. In Chapter 3, we presented an algorithm to fix bad traces
using reordering and atomic sections. The main idea behind the algorithm is as follows. Given a
bad trace π, we (a) first, generalise the trace into a partial order neighbourhood; and (b) then,
compute a program constraint that violates some essential part of the ordering necessary for the
bug.
More precisely, the algorithm builds a trace elimination graph which contain edges corres-
ponding to the orderings necessary for the bug to occur, as well as the edges corresponding
program constraints. Fixes are found by finding cycles in this graph – the conjunction of the
program constraints in a cycle form an error elimination constraint. Intuitively, the program
constraints in the cycle will enforce a happens-before conflicting with the orderings necessary
for the bug.
Example 4.2.6. Consider the program in Figure 4.3a and the trace elimination graph for the
trace A; B; 1; 2; C. The orderings A happens-before 1 and 2 happens-before C are necessary for
the error to happen. The cycle C→ A→ 1→ 2→ C is the elimination cycle. The corresponding
error eliminating constraint is C v A ∧ 1 v 2, and one possible fix is to move C ahead of A.
For the bad trace A; 1; B in Figure 4.3b, the elimination cycle is A→ 1→ B→ A giving us the
constraint [A; B] and an atomic section around A; B as the fix.
The FixBad algorithm. The FixBad algorithm takes as input a program C, a constraint Φ
and a bad trace π. It outputs a program constraint Φ′, sequence of program transformations
θ1, . . . , θk, and a new program C ′, such that C
θ1...θk−−−→ C ′. The algorithm guarantees that (a) Φ′ is
60
Figure 4.3 Eliminating bad traces
A : x := 1
B : z := 1
C : y := 1
1 : await(x = 1)
2 : assert(y = 1)
1 v 2C v A
(a)
A : x := 0
B : x := 1
1 : assert(x = 1)[A,B]
(b)
A : x := 1
B : y := 1
1 : assert(y = 1)
B  1
(c)
an error eliminating constraint; (b) C ′ |= Φ∧C ′ |= Φ′; and (c) if there is no preemption-free trace
π′ of C such that π freely transforms to π′ (i.e., π′ ∈ f(π)), then none of the transformations
θ ∈ {θ1, . . . , θk} acts across preemption points. The fact that θ1 . . . θk and C ′ can be chosen to
satisfy (c) is a consequence of the algorithm from Chapter 3.
Fixes using wait/signal statements. Some programs cannot be fixed by statement reordering
or atomic section insertion. These programs are in general outside our definition of the program
repair problem as they have bad sequential traces. However, they can be fixed by the insertion of
wait/signal statements. One such example is depicted in Figure 4.3c where the trace 1; A; B causes
an assertion failure. A possible fix is to add a wait statement before 1 and a corresponding signal
statement after B. The algorithm FixBad can be modified to insert such wait-signal statements
by also considering constraints of the form X  Y to represent that X is scheduled before Y
– the corresponding program transformation is to add a wait statement before Y and a signal
statement after X . In Figure 4.3c, the edge B → 1 represents such a constraint B  1 – the
elimination cycle 1→ B→ 1 corresponds to the above described fix.
4.3 The Regression-free Synthesis Procedure
Algorithm 4.2 is a program-repair algorithm to fix concurrency bugs while avoiding regressions.
The algorithm maintains the current program C, and a constraint Φ that restricts possible reorder-
ings. In each iteration, the algorithm tests if C is correct and if so returns C. If not it picks a trace
61
Algorithm 4.2 Program-Repair Procedure for Concurrency
Require: A concurrent program C, all sequential traces are good
Ensure: Program C ′ such that C ′ has no bad traces
1: Φ← true;TG ← ∅; C ′ ← C
2: while true do
3: if Verify(C ′) = true then return C ′
4: Choose π from C ′ (non-deterministic)
5: if π is non-erroneous then
6: Φ← Φ ∧ LearnGood(π)
7: TG ← TG ∪ {π}
8: else
9: ([θ1, . . . , θk], C ′,Φ′)← FixBad(C ′,P ,Φ, π)





g|π′g ∈ θk ◦ . . . ◦ θ1(πg) ∧ π′g ∈ C ′}
12: end if
13: end while
π in C (line 4). If the trace is good it learns the regression-preventing constraint Φ for π and
the trace π is added to the set of good traces TG (TG is required only for the correctness proof).
If π is bad it calls FixBad to generate a new program that excludes π while respecting Φ, and
Φ is strengthened by conjunction with the error elimination constraint Φ′ produced by FixBad .
The algorithm terminates with a valid solution for all choices of C ′ in line 10 as the constraint
Φ is strengthened in each FixBad iteration. Eventually, the strongest program-constraint will
restrict the possible program C ′ to one with large enough atomic sections such that it will have
only preemption-free or sequential traces.
Theorem 4.3.1 (Soundness). Given a program C, Algorithm 4.2 returns a program C ′ with no
bad traces that preserves the sequential and correct preemption-free behaviour of C. Further,
each iteration of the while loop where a bad trace π is chosen performs a regression-free error
fix with respect to the good traces TG.
The extension of the FixBad algorithm to wait/signal fixes in Algorithm 4.2 may lead to C ′
not preserving the good preemption-free and sequential behaviours of C. However, in this case,
the input C violates the pre-conditions of the algorithm.
Theorem 4.3.2 (Fair Termination). Assuming that a bad trace will eventually be chosen in line 4
if one exists in C, Algorithm 4.2 terminates for any instantiation of FixBad .
62




P(P , c∗, i)
∃?i∗ s.t.
¬P(P , c∗, i∗)
E = E∪
{i∗}
∃?c∗ : c∗ |= Φ
∃?i∗ s.t.
¬P(P , c∗, i∗)
∃?i∗ s.t.





4.3.1 A Generic Program-Repair Procedure
We now explain how our program-repair algorithm relates to generic synthesis algorithms based
on counter-example guided inductive synthesis (CEGIS) [Solar-Lezama et al., 2006]. In the
CEGIS approach, the input is a partial-program P , i.e., a non-deterministic program and the
goal is to specialise P to a program C so that all behaviours of C satisfy a specification. In our
case, the partial-program would non-deterministically choose between various reorderings and
atomic sections. Let C be the set of choices (e.g., statement orderings) available in P . For a
given c ∈ C, let P(P , c, i) be the predicate that program obtained by specialising P with c
behaves correctly on the input i.
The CEGIS algorithm maintains a set E of inputs called experiments. In each iteration,
it finds c∗ ∈ C such that the ∀i ∈ E : P(P , c∗, i). Then, it attempts to find an input i∗ such
that P(P , c∗, i∗) does not hold. If there is no such input, then c∗ is the correct specialisation.
Otherwise, i∗ is added to E . This algorithm is illustrated in Figure 4.4(left). Alternatively, CEGIS
can be rewritten in terms of constraints on C. For each input i, we associate the constraint φi
where φi(c)⇔ P(P , c, i). Now, instead of E , the algorithm maintains the constraint Φ =
∧
i∈E φi.
Every iteration, the algorithm picks a c such that c |= Φ; tries to find an input i∗ such that
¬P(P , c, i) holds, and then strengthens Φ by φi∗ .
This algorithm is exactly the else branch (i.e., FixBad algorithm) of an iteration in Al-
gorithm 4.2 where i∗ and φi∗ correspond to π and FixBad(π). Intuitively, the initial variable
values in π and the scheduler choices are the inputs to our concurrent programs. This suggests
that the then branch in Algorithm 4.2 could also be incorporated into the standard CEGIS
approach. This extension (dubbed PACES for Positive and Counter-Examples in Synthesis) to
the CEGIS approach is shown in Figure 4.4(right). Here, the algorithm in each iteration may
choose to find an input for which the program is correct and use the constraints arising from it.
We discuss the advantages and disadvantages of this approach below.
63
Constraints vs. Inputs. A major advantage of using constraints instead of sample inputs is
the possibility of using over- and under-approximations. As seen in Section 4.2.1, it is sometimes
easier to work with approximations of constraints due to simplicity of representation at the cost
of potentially missing good solutions. Another advantage is that the sample inputs may have no
simple representations in some domains. The scheduler decisions are one such example – the
scheduler choices for one program are hard to translate into the scheduler choices for another.
For example, the original CEGIS for concurrency work [Solar-Lezama et al., 2008] uses ad-hoc
trace projection to translate the scheduler choices between programs.
Positive-examples and Counter-examples vs. Counter-examples. In standard program-
repair tasks, although the faulty program and the search space C may be large, the solution
program is usually “near” the original program, i.e., the fix is small. Further, we do not want
to change the given program unnecessarily. In this case, the use of positive examples and
over-approximations of learned constraints can be used to narrow down the search space quickly.
Another possible advantage comes in the case where the search space for synthesis is structured
(for example, in modular synthesis). In this case, we can use the correct behaviour displayed by
a candidate solution to fix parts of the search space.
4.4 Implementation and Experiments
We implemented Algorithm 4.2 in our tool CONREPAIR2 consisting of 3300 lines of Scala code.
The model checker CBMC [Clarke et al., 2004] is used for generating both good and bad traces,
and on an average more than 95% of the total execution time is spent in CBMC. Model checking
is far from optimal to obtain good traces, and we expect that techniques from [Sen, 2008] can be
used to generate good traces much faster. Our tool can operate in two modes: In “mixed” mode
it first analyses good traces and then proceeds to fixing the program. The baseline “badOnly”
mode skips the analysis of good traces (corresponds to the algorithm in Chapter 3).
In practice the analysis of bad traces usually generates a large number of potential reorderings
that could fix the bug. Our previous algorithm from Chapter 3 (badOnly ce1) prefers reorderings
over atomic sections, but in examples where an atomic section is the only fix, this algorithm has
2available as open-source software along with benchmarks: https://github.com/thorstent/
ConRepair
64
Table 4.1 Results in iterations and time needed.
File LOC mixed badOnly ce1 badOnly ce2
ex1.c 60 1 2 2
ex2.c 37 2 5 6
ex3.c 35 1 2 2
ex4.c 60 1 2 2
ex5.c 43 1 8 3
ex-regr.c 30 2 2 2
paper1.c 28 1 3 33
dv1394.c 81 1 (13+4s) 51 (60s) 51 (9s)
iwl3945.c 66 1(3+2s) 2(2s) 2(2s)
lc-rc.c 40 10 (2+7s) 179 (122s) 203 (134s)
rtl8169.c 405 7 (10+45m) >100 (>6h) 8 (54m)
usb-serial.c 410 4 (56+20m) 6 (38m) 6 (38m)
poor performance. To address this we implemented a heuristic (ce2) that places atomic sections
before having tried all possible reorderings, but this can result in solutions having unnecessary
atomic sections.
The fall back case in Algorithm 4.1 severely limits further fixes – it forces further fixes
involving the same statements to be atomic sections. Hence, in our implementation, we omit
this step and prefer an unsound algorithm (i.e., not necessarily regression-free) that can fix more
programs with reorderings. While the implemented algorithm is unsound, our experiments show
that even without the fallback, in our examples, there is no regression except for one artificial
example (ex-regr.c) constructed precisely for that purpose.
Benchmarks. We evaluate our tool on a set of examples that model real bugs found and fixed
in Linux device drivers by their developers. To this end, we explored a history of bug fixes in
the drivers subtree of the Linux kernel and identified concurrency bugs. We further focused our
attention on a subset of particularly subtle bugs involving more than two racing threads and/or a
mix of different synchronisation mechanisms, e.g., lock-based and lock-free synchronisation.
Approximately 20% of concurrency bugs that we considered satisfy this criterion. Such bugs
are particularly tricky to fix either manually or automatically, as new races or deadlocks can be
easily introduced while eliminating them. Hence, these bugs are most likely to benefit from good
trace analysis.
Table 4.1 shows our experimental results: the iterations and the wall-clock time needed to
find a valid fix for our mixed algorithm and the two heuristics of the badOnly algorithm. For the
mixed algorithm the time is split into the time needed to generate and analyse good traces (first
65
number) and the time needed for the fixing afterwards. All measurements were done on an Intel
core i5-3320M laptop with 8GB of RAM.
Detailed analysis. The artificial examples ex1.c to ex5.c are used for testing and take
only a few seconds; example paper1.c is the one in Figure 4.1a. Example ex-regr.c was
constructed to show unsoundness of the implementation. Example usb-serial.c models
the USB-to-serial adapter driver. Here, from the good traces the tool learns that two statements
should not be reordered as it will trigger another bug. This prompts them to be reordered above a
third statement together, while the badOnly analysis would first move one, find a new bug, and
then fix that by moving the other statement. Thus, the good trace analysis saves us two rounds of
bug fixing and reduces bug fixing time by 18 minutes.
The rtl8169.c example models the Realtek 8169 driver containing 5 concurrency bugs.
One of the reorderings that the tool considers introduces a new bug; further, after doing the
reordering, the atomic section is the only valid fix. The good trace analysis discover that the
reordering would lead to a new bug, and thus the algorithm does not use it. But, without good
traces, the tool uses the faulty reordering and then ce1 takes a very long time to search through
all possible reorderings and then discover that an atomic section is required. The situation is
improved when using heuristic ce2 as it interrupts the search early. However, the same heuristic
has an adverse effect in the dv1394.c example: by interrupting the search early, it prevents
the algorithm from finding a correct reordering and inserts an unnecessary atomic section. The
dv1394.c example also benefits from good traces in a different way than the other examples.
Instead of preventing regressions, they are used to obtain hints as to what reorderings would
provide coverage for a specific data-flow into assertion edge. Then, if a bad trace is encountered
and can be fixed by the hinted reordering, the hinted reordering is preferred over all other possible
ones. Without hints the dv1394.c example would require 5 iterations. Though hints are not
part of our theory they are a simple and logical extension.
Example lc-rc.c models a bug in an ultra-wide band driver that requires two reorderings
to fix. Though there is initially no deadlock, one may easily be introduced when reordering
statements. Here, the good-trace analysis identifies a dependency between two await statements
and learns not to reorder statements to prevent a deadlock. Without good traces, a large number




Synthesis of Locks and Other
Synchronisation Primitives
5.1 Problem Statement and Illustrative Examples
In Chapters 3 and 4 we introduced a technique to repair concurrent programs using statement
reordering and atomic sections. However, statement reordering only fixes a certain type of bugs
and atomic sections are not directly implementable. In this chapter we introduce a technique that
is able to place locks, wait-signal and barriers in the code. These are common synchronisation
primitives supported in many programming languages.
In the two previous chapters we generalised a concurrent trace into a set of traces that under-
approximates the target trace sets. In this chapter, we present a succinct, complete representation
of such concurrent trace sets, which can drive diverse verification, fault localisation, repair, and
synthesis techniques for concurrent programs. The representation is complete in the sense that it
encodes every trace in the trace set of interest.
Neighbourhood computation. In Chapters 3 and 4 we used partial-order neighbourhoods,
which are not able to represent exactly all bad interleavings of a trace. In this chapter we
move to neighbourhoods that allow disjunctions in the HB-formula. Given a trace π and a
correctness specification, we present a method to generate an HB-formula ϕB representing the
bad neighbourhood of π (see Section 2.2). To generate ϕB , we first encode all the bad executions
in L(Nπ) in a quantifier-free first-order formula Φ such that an execution π is a model of Φ iff π
68
is a bad execution in L(N bπ). We then incrementally construct ϕB. Initially, ϕB is set to false.
In each step: (1) we invoke an SMT solver to obtain a model for Φ that does not belong to the
language of the subset ofN bπ represented by the current ϕB , (2) generalise the trace of the model
into an HB-formula ϕ, and (3) update ϕB by adding ϕ as a disjunct. We iterate until there is no
new model of Φ. The trace generalisation used in each iteration has the following properties: (a)
the model obtained in the iteration satisfies ϕ, and (b) any trace inNπ that satisfies ϕ is bad. The
final HB-formula obtained is an exact representation of N bπ .
While an exact representation is a worthy goal, the corresponding ϕB may not be succinct.
To gain succinctness and utility, we trade in exactness. In particular, we permit the inclusion of
infeasible traces to obtain a succinct HB-formula representing a sound overapproximation of
N bπ . The overapproximation of N bπ is sound in the sense that it is guaranteed to not include any
good traces. To generate such a succinct HB-formula, we enhance the above algorithm. We use
data-flow analysis and minimal unsatisfiability core (unsat core) computation for generalising
the trace of the model into an HB-formula ϕ in step (2) of each iteration. This new trace
generalisation step has the following properties: (a) the model obtained in the iteration satisfies
ϕ, and (b) any trace in Nπ satisfying ϕ is either bad or infeasible.
Complementing ϕB, the succinct representation of a sound overapproximation of N bπ yields
ϕG, a succinct representation of a sound overapproximation of N gπ . Note that complementing
the exact representation of N bπ does not yield an exact representation of N gπ . In fact, our
existing methodology cannot produce an exact representation of N gπ . Figure 5.1 shows the
exact representation of N bπ and the representations for sound overapproximations of N gπ and N bπ
obtained by our method for the example trace shown.
We implemented the above algorithm as a tool TARA and used it to generate (succinct)
representations of trace sets of programs drawn from the software verification competition (SV-
Comp) [Beyer, 2014] and the regression suites of ESBMC [Morse et al., 2014] and CONREPAIR
(Chapter 4).
Synchronisation synthesis. We present a novel algorithm that uses ϕG to synthesise syn-
chronisation for eliminating the bad neighbourhood of π. The algorithm proceeds by apply-
ing rewrite rules to derive synchronisation primitives such as mutex locks, barriers, shared
exclusive locks and wait-signal statements from easily-identifiable patterns in ϕG. For ex-
ample, a missing mutex lock in the example in Figure 5.1 that ensures the statements at TW[1]
69
Figure 5.1 Online banking: This trace is drawn from a program consisting of three threads, one
for withdrawing money, one for depositing money, and one for checking consistency of the bank
account after completion of a withdrawal and a deposit. (In all the examples in this chapter, we represent traces
using typed global variable declarations/initialisations, followed by each thread’s typed local variable declarations and statements. Note that
this representation depicts a trace and not a program.)
init : x := balance; deposited := 0; withdrawn := 0
thread_withdraw
TW[1] : temp := balance;
TW[2] : balance := temp− withdrawal;
TW[3] : withdrawn := 1
thread_deposit
TD[1] : temp2 := balance;
TD[2] : balance := temp2+ deposit;
TD[3] : deposited := 1
thread_checkresult
TC[1] : assume(deposited = 1 ∧ withdrawn = 1);
TC[2] : assert(balance = x+ deposit− withdrawal)
Exact representation of N bπ:
hb(TW[1],TD[2]) ∧ hb(TD[1],TW[2]) ∧ hb(TW[3],TC[1]) ∧ hb(TD[3],TC[1])
Exact representation of N gπ :
(hb(TD[2],TW[1]) ∨ hb(TW[2],TD[1])) ∧ hb(TW[3],TC[1]) ∧ hb(TD[3],TC[1])
Representation of sound overapproximation of N bπ:
hb(TW[1],TD[2]) ∧ hb(TD[1],TW[2])
Representation of sound overapproximation of N gπ :
hb(TD[2],TW[1]) ∨ hb(TW[2],TD[1])
70
and TW[2] in thread_withdraw do not interfere with the statements TD[1] and TD[2] in
thread_deposit is identified by the pattern hb(TD[2],TW[1]) ∨ hb(TW[2],TD[1]) in ϕG. We
have implemented this algorithm as an extension of our tool TARA and used it to successfully
synthesise synchronisation for our benchmarks.
We further demonstrate the applicability of our representations of good and bad neigh-
bourhoods of a trace to bug summarisation and verification based on counterexample-guided
abstraction refinement (CEGAR).
Bug summarisation. Error detection tools based on model checking and static analyses typic-
ally provide counterexample traces to help with program debugging. However, these traces can
be long and encumbered with unnecessary data, providing little insight about the actual bug. We
use ϕB, the representation for a sound overapproximation of a trace’s bad neighbourhood, for
counterexample and bug summarisation. The HB-formula ϕB encodes relevant ordering inform-
ation about all counterexamples in the neighbourhood of π and can be viewed as a stand-alone
counterexample summary. While this can already be useful feedback for a human debugger, we
present a set of rules to infer specific bugs such as data races, atomicity violations, two-stage
access bugs and define-use order violations. These rules work by identifying particular patterns
in ϕB and combining them with some lightweight data-flow information. We have extended
TARA for bug summarisation and evaluated it on our benchmarks.
Accelerating CEGAR. We also recognise an application of our representation of bad neigh-
bourhoods of abstract counterexamples in accelerating CEGAR for concurrent programs. CE-
GAR often takes many iterations to find the right predicates for proving correctness of a program.
There is a number of prior work to enhance the CEGAR loop by finding better predicates,
e.g. [Beyer et al., 2007; Sharma et al., 2012]. In the setting of hardware model-checking (for
circuits), Glusman et al. [Glusman et al., 2003] extend the CEGAR loop by adding several predic-
ates if a spurious counterexample is found; they generate all counterexamples of the same length
and gather information about valuations crucial to the incorrectness of the counterexamples. In a
similar setting, Wang et al. [Wang et al., 2006] improve the CEGAR by introducing a technique
to eliminate all spurious counterexamples for an invariant. Sakunkonchak et al. [Sakunkonchak
et al., 2007] apply CEGAR optimisations to software model checking and speed up the search
for predicates that make the counterexample spurious. However, they do not use interpolants and
71
instead search the counterexample for conflicting predicates. Bjesse et al. [Bjesse and Kukula,
2004] use predicates obtained from CEGAR to guide bounded-model checking (BMC) and
extend its reach.
The choice of refinement procedure usually determines the number of iterations necessary.
Many heuristics have been proposed to find relevant predicates quickly, e.g., [Beyer et al.,
2007]. This problem is compounded in concurrent program verification, where the existence of a
large number of interleavings can delay the discovery of interesting spurious counterexamples
that lead to relevant predicates. We present a new predicate learning procedure that uses the
HB-formula ϕB representing the bad neighbourhood of a spurious counterexample of an abstract
concurrent program. In each iteration of the CEGAR loop, our procedure refines the abstraction
to eliminate multiple spurious abstract counterexamples drawn from ϕB , using a method similar
to beautiful interpolants [Albarghouthi and McMillan, 2013]. We have integrated our TARA-
based refinement procedure within SATABS [Clarke et al., 2005] and have been able to reduce
the number of iterations needed to verify various example programs.
Highlights. We introduce a novel representation for concurrent trace sets based on HB-formulæ
(Section 5.2). HB-formulæ have several useful properties. They can express arbitrary finite trace
sets. They enable efficient computation and concise expression of unions over trace sets. This is
exploited by our tool TARA to compute succinct representations of sound overapproximations of
good and bad neighbourhoods of a trace. HB-formulæ are an intuitively appealing representation
for trace sets. They can reveal specific patterns of causality relations between events that can
drive diverse verification, fault localisation, repair, and synthesis techniques for concurrent
programs. We demonstrate the use of our tool in three applications — synchronisation synthesis
(Section 5.3), bug summarisation (Section 5.4), and CEGAR acceleration (Section 5.5).
5.2 Computing Good and Bad Neighbourhoods
In this section, we present an algorithm for computing an exact representation for the bad
neighbourhood of a trace. However, as this representation may be unwieldy and complex,
we further provide an algorithm to produce sound overapproximations of N bπ and N gπ , i.e., to
find succinct HB-formulæ ϕG and ϕB such that N gπ ⊆ [[ϕG]], N bπ ⊆ [[ϕB]], and [[ϕG]] ∩ N bπ =
72
[[ϕB]]∩N gπ = ∅. Note that ϕG∧ϕB is not necessarily false, because ϕG∧ϕB may still represent
infeasible traces.
We use two different formalisms to express statements.
• Guarded actions. Here, a statement from thread Ti is either a guarded action assume(G)→
assign or an assertion assert(G), where G is a Boolean expression over Vi and assign
is a parallel assignment var 1, . . . , var m := expr1, . . . , exprm of expressions over Vi to
variables in Vi. We use just := to indicate no assignment is happening (i.e. the statement
equals assume(G)).
• Transition predicates. Here, a statement from thread Ti is a predicate over variables from
Vi ∪ V ′i where V ′i contains primed versions of variables in Vi. Intuitively, variables from
Vi and V ′i represent the values of program variables before and after the execution of
the statement, respectively. For example, the assignment x := x + y is represented as
x′ = x + y. The advantage of this formalism is that it can express non-deterministic
statements which we need to model abstract programs in Section 5.5. Assertions are
represented as before, i.e., as assert(G), where G is a Boolean expression over Vi.
The transition predicate formalism is used exclusively for CEGAR accelaration in Section 5.5.
For traces of programs we use the guarded actions formalism.
Translating traces to guarded actions. In Figure 5.2 we present a translation function that
translates statements in traces into guarded actions. We do not consider havoc, input and output
statements as they are used primarily for the implicit specification. To indicate what branch the
trace took for loops and conditionals we use the words then, else, loop and exitloop. Statements
inside an atomic section are merged into one translated statement.
Encoding bad executions. Given a trace π, our algorithm is based on constructing a quantifier
free first-order formula that represents all bad executions in L(Nπ). We use the concurrent trace
program encoding [Wang et al., 2009] which is based on a concurrent single static assignment
(CSSA) form of traces. We recall the encoding below to make the presentation self-contained.
We present the encoding for the case where statements are expressed as guarded actions; the
case where statements are expressed as transition predicates is similar. Given a trace π, we first
rewrite it into the CSSA form.
73
Figure 5.2 Translation of statement with identifier ` to guarded action
ShVar := LoExp = true→ ShVar := LoExp
LoVar := ShExp = true→ LoVar := ShExp
if (ShExp) then = ShExp → :=
if (ShExp) else = ¬ShExp → :=
while ShExp loop = ShExp → :=
while ShExp exitloop = ¬ShExp → :=
assert(ShExp) = assert(ShExp)
await(ShExp) = ShExp → :=
lock(LkVar) = LkVar = 0→ LkVar := tid(`)
unlock(LkVar) = LkVar = tid(`)→ LkVar := 0
wait(CondVar) = CondVar = 1→ :=
wait_not(CondVar) = CondVar = 0→ :=
signal(CondVar) = true→ CondVar := 1
reset(CondVar) = true→ CondVar := 0
wait_reset(CondVar) = CondVar = 1→ CondVar := 0
assume(GrdVar) = GrdVar = true→ :=
assume_not(GrdVar) = GrdVar = false→ :=
GrdVar ← GrdExpr = true→ GrdVar := GrdExpr
• For each variable v, we introduce a unique name vw,` for each event ` that may change the
value of v (here, w stands for “write”). Further, for each variable v, we introduce a unique
name vι to represent the value of v at the start of an execution.
• For each event ` that reads a variable v, we replace v as follows:
– If v is a local variable, we replace v by vw,`′ where `′ is the most recent event from
the thread that writes to v; and
– If v is a shared variable, we replace v by vr,` (where r stands for “read”) and we store
an additional constraint, where vr,` = π(vι, vw,`1 , vw,`2,, . . . , vw,``) where `i ranges
over all events from other threads that write to v and the most recent event from the
same thread that writes to v.
The π-functions above are analogous to the φ-functions used to express joins in sequential
single static assignment encodings, i.e., vr,` = π(vι, vw,`1 , . . . , vw,``) expresses that ` reads
either the initial value of v, or the value written by one of `1, . . . , ``.
• Further, for each event `, we define the condition that ` is feasibly reached. If ` is the first
event in a thread, we set cond(`) = true. Otherwise, cond(`) depends on the previous
event from the same thread in π (say `′). If `′ is an assertion, we let cond(`) = cond(`′).
Otherwise, `′ is a guarded action assume(G)→ assign, and we let cond(`) = cond(`′)∧G.
Example 5.2.1. In the running example from Figure 5.1, the statement TW[1] : temp := balance
74
would be encoded as tempw,TW[1] = balancer,TW[1] ∧ balancer,TW[1] = π(balanceι, balancew,TD[2]).
Given a trace π rewritten in the CSSA form, the following constraints encode executions in
the neighbourhood Nπ of π:
• Thread orders. In any execution in the neighbourhood of π, the order of events in
each thread is the same as in the trace π. We define ΦPO =
∧
{hb(`i, `j) | tid(`i) =
tid(`j) ∧ `i <π `j}.
• Variable assignments. This part of the encoding is a direct translation of the assignments







i, where ` ranges over
events of the form T[`] : stmt with stmt being assume(G)→ v1w,`, . . . , vmw,` := expr1, . . . , exprm.
• π-constraints. Each π-constraint chooses a value for a read of a shared variable from
possible writes. Formally, each condition vr,` = π(vι, vw,`1 , . . . , vw,``) is rewritten as
[vr,` = vι ∧
∧
i hb(`, `i)] ∨
∨`
i=1[vr,` = vw,`i ∧ cond(`i) ∧ hb(`i, `) ∧
∧
j 6=i(hb(`j, `i) ∨
hb(`, `j))]. Intuitively, the above formula states that: (a) the value of v read by ` is
either the initial value of v (denoted as vι) or written by one of `1, . . . , ``; (b) if the
value is the initial value, all `i happen after `; and (c) if the value is written by `i,
then `i is feasibly reached and all conflicting writes either happen before `i or after `.
We denote by ΦPI the conjunction of all such π-constraints. For example, for the π-
function from Example 5.2.1, the corresponding constraint is (balancer,TW[1] = balanceι ∧
hb(TW[1],TD[2])) ∨ (balancer,TW[1] = balancew,TD[2] ∧ hb(TD[2],TW[1])).
• Correctness condition. For correctness, if an assertion event ` = T[`] : assert(G`)
is feasibly reached, then G` must hold. Hence, the correctness condition is ΦCOR =∧
`(cond(`)⇒ G`) where ` ranges over assertion events.
The final encoding for bad executions is given by ΦCTP(π) = ΦPO ∧ΦVD ∧ΦPI ∧ ¬ΦCOR. We
also encode the complementary correctness condition as ΦCTP(π) = ΦPO ∧ΦVD ∧ΦPI ∧ΦCOR.
For convenience, we use an auxiliary formula ΦFEA to represent the condition that each
assumption must hold. We have ΦFEA =
∧
` cond(`) where ` ranges over all events.
An execution π corresponds to a model V of ΦCTP if: (a) the value of each vι in V is the
initial value of v in π; (b) the value of each vr,` in V is the value of v read by ` in π; (c) the value
of each vw,` in V is the value of v written by ` in π; and (d) the value of hb(`i, `j) in V is true if
and only if `i occurs before `j in π.
75
Theorem 5.2.2. Given a trace π, (a) for every model V of ΦCTP(π) there is a bad execution
π ∈ L(N bπ) such that π corresponds to V; and (b) for every π ∈ L(N bπ) there is a model V of
ΦCTP(π) such that π corresponds to V .
Bad neighbourhood computation. Armed with ΦCTP — an SMT encoding of bad executions
in the neighbourhood of a trace π — we now present an algorithm to compute a representation
of N bπ . Algorithm 5.1 proceeds by repeatedly computing satisfying assignments to ΦCTP using
an SMT solver (lines 2 and 3), and accumulating the HB-formulæ in the models (lines 4 and 5).
We conjoin ΦCTP with additional constraints to ensure that the same satisfying assignments are
not returned each time.
Algorithm 5.1 Computing the bad neighbourhood of a trace
Require: Trace π
Ensure: HB-formula ϕB such that N bπ = [[ϕB]].
1: Φ← ΦCTP(π); ϕB ← false
2: while Φ ∧ ¬ϕB is satisfiable do
3: V ← satisfying assignment for Φ ∧ ¬ϕB
4: ϕ′B ←
∧
{hb(`, `′) | V |= hb(`, `′)}
5: ϕB ← ϕB ∨ ϕ′B
6: end while
7: return ϕB
Overapproximating bad neighbourhoods. While Algorithm 5.1 computes an exact repres-
entation of N bπ , it is inefficient in practice. Hence, we forgo the goal of an exact representation.
Instead, we compute a sound overapproximation of N bπ , which may include infeasible traces, but
not good traces. Given trace π, Algorithm 5.2 computes sound overapproximations of N bπ and
N gπ . Algorithm 5.2 performs several optimisations with respect to Algorithm 5.1 to accumulate
weaker constraints from each model of ΦCTP , i.e., Algorithm 5.2 attempts to accumulate larger
subsets of Nπ into ϕB in each iteration.
• Data-flow analysis. From the model V of ΦCTP(π), the data-flow analysis retains those
happens-before constraints (ϕ′B) that are necessary to preserve the data-flow into the failing
assertion in the corresponding execution. We use the function DF V(`) (line 5) to compute
constraints that ensure ` can be feasibly reached and can read the same variable values as in
V . Given the execution corresponding to V , let reads(`), readsG(`), and srcEvent(v, `)
represent the variables read by `, the variables read by ` in the guard (if ` is not a guarded
76
assignment, readsG(`) = ∅), and the event that writes the value of v read by `. We have
DF V(`) = DF
1
V(`) ∪ DF 2V(`) where:












{(v, srcEvent(v, `′), `′)} ∪ DF V(srcEvent(v, `′))
]
where event `′ ranges over E = {`′ | tid(`) = tid(`′) ∧ V |= hb(`′, `)}.
Intuitively, DF 1V ensures that ` can read the same values as in V and DF 2V ensures that ` is
feasibly reached. We then get additional constraints ADF necessary to ensure conflicting
writes do not affect the data-flow into the assertion (line 6).
• Unsatisfiable core computation. Next, we perform two rounds of generalisation on
ϕB′ through unsatisfiable core computation. In the first round, we construct a formula
ϕB′ ∧ Choices(V) ∧ ΦCTP(π) where Choices(V) fixes the initial variable values to the
ones from V (line 10). A satisfying assignment to this formula models executions where no
failing assertion is feasibly reached. Therefore, if the formula is unsatisfiable, the happens-
before constraints from the unsatisfiable core (line 12) ensure that all executions satisfying
Choices(V) are bad. Note that if all statements are deterministic, the above formula is
always unsatisfiable. In the second round (line 10), we follow a similar procedure, but
with the formula ϕB′ ∧ ΦFEA ∧ ΦCTP . Here, a model is a good execution and hence,
the constraints from the unsatisfiable core (line 10) ensure that any feasible execution is
necessarily bad.
Roughly, the first round allows us to generalise the HB-formula in the case of data-
dependent bugs. The second round lets us generalise further in the case of data-independent
bugs.
The sound overapproximation, ϕG, of N gπ is obtained by complementing ϕB (line 22). Note
that ϕB returned is in disjunctive normal form (DNF), while ϕG is in conjunctive normal form
(CNF).
Theorem 5.2.3. For a trace π, if Algorithm 5.2 returns (ϕB, ϕG), thenN bπ ⊆ [[ϕB]],N gπ ⊆ [[ϕG]],
and [[ϕG]] ∩N bπ = [[ϕB]] ∩N gπ = ∅.
The complexity of Algorithm 5.2 originates from SMT check in line 2, which is NP-complete.
A comparison with Algorithm 3.1 is not meaningful, because Algorithm 5.2 basically solves the
verification problem for a straight-line program.
77
Algorithm 5.2 Computing sound overapproximations of the bad and good neighbourhoods of a
trace
Require: Trace π
Ensure: HB-formulæ (ϕB, ϕG) such that N gπ ⊆ [[ϕG]], N bπ ⊆ [[ϕB]], and [[ϕG]] ∩ [[ϕB]] = ∅.
1: Φ← ΦCTP(π); ϕB ← false
2: while Φ ∧ ¬ϕB is satisfiable do
3: V ← satisfying assignment for Φ ∧ ¬ϕB
4: . Data-flow analysis







{(v, `k, `i) |





9: . Unsat-core computation
10: Choices(V)←
∧
v∈V vι = V [vι]
11: if ϕB′ ∧ Choices(V) ∧ ΦCTP(π)) is unsatisfiable then
12: ϕB′ ← MinUNSATCore(Soft← ϕB′ ,
13: Hard← Choices(V) ∧ ΦCTP(π))
14: end if
15: if ϕB′ ∧ ΦFEA(π) ∧ ΦCTP(π)) is unsatisfiable then
16: ϕB′ ← MinUNSATCore(Soft← ϕB′ ,
17: Hard← ΦFEA(π) ∧ ΦCTP(π))
18: end if
19: ϕB ← ϕB ∨ ϕB′
20: end while
21: ϕG ← ¬ϕB
22: return (ϕB, ϕG)
78
5.3 Synchronisation Synthesis
We use the representation of a sound overapproximation of the good neighbourhood of a trace
π (returned as ϕG by Algorithm 5.2) to synthesise synchronisation that eliminates the bad
neighbourhood of π. Missing synchronisation primitives such as locks, barriers, and wait-signal
statements present themselves as easily identifiable HB-formula patterns in ϕG. Our algorithm
derives the required synchronisation using rules that rewrite such patterns into the corresponding
primitives.
Synchronisation primitives. We first describe various synchronisation primitives that we
derive. Recall from Section 2.2.1 that we use the notation T[`] to refer to events labelled with
T[`], and the notations T[` : `′] and T[L] to refer to corresponding event sequences.




denotes a wait to make T2[`2] wait
for T1[`1] to complete, and a signal to make T1[`1] signal T2[`2] upon completion.
2. Locks. A lock lock(T1[L1], . . . ,Tn[Ln]) denotes a common lock protecting each event se-
quence Ti[Li], i ∈ [1, n], to ensure that these event sequences cannot execute concurrently.
3. Barriers. A barrier Barrier
(
T1[`1], . . . ,Tn[`n]
)
at location `i of thread Ti, i ∈ [1, n],
prevents each thread Ti from proceeding beyond `i until every other thread Tj reaches `j.
In other words, Ti cannot execute the event at `i until every other Tj executes the event at
`j − 1.
4. Shared-exclusive locks. A shared-exclusive lock (or, a readers-writers lock)
ShExLock
(
Sh :Ts1[Ls1], . . . ,Tsn[Lsn], Ex :Tx1[Lx1], . . . ,Txm[Lxm]
)
permits concurrent exe-
cution of all event sequences Tsi[Lsi], i ∈ [1, n], while preventing concurrent execution
of (a) any two Txi[Lxi] and Txj[Lxj] with i 6= j, and (b) any Txi[Lxi] and Tsj[Lsj].
Rewriting ϕG to derive synchronisation. During the rewrite process below, we use disjunct-
ive formulæ (denoted by ψ) where each disjunct is either an atomic hb-constraint of the form
hb(Ti[`i],Tj[`j]), or a synchronisation primitive. For a trace π, we repeatedly apply the rewrite
rules from Figure 5.3 on ϕG (in CNF, as returned from Algorithm 5.2) until no more rules
are applicable. The ADD.WAITSIGNAL, ADD.LOCK and ADD.BARRIER rules introduce the
wait-signal, locks, and barrier primitives. The MERGE.LOCKS rule merges locks across pairs of
threads, while the MERGE.LOCKS.DEADLOCKS rule merges locks that can potentially lead to
deadlocks. The MULTITHREAD.LOCK and MULTITHREAD.BARRIER rules inductively derive
79
Figure 5.3 Rewrite rules for synchronisation synthesis
hb(T1[`
′
































i=1 lock(Ti[Li],Tn+1[Ln+1]) ∨ ψ












′] ⊆ T1[L1] T2[L2′] ⊆ T2[L2]
lock(T1[L1],T2[L2]) ∨ ψ
MERGE.LOCKS





















































































locks and barriers spanning multiple threads. The ADD.SHAREDEXCLUSIVELOCK rule derives
a shared exclusive lock from already inferred locks. Since ϕG, as generated by Algorithm 5.2, is
already optimised, we do not merge WaitSignal primitives.
We explain two of the above rules here. The premise of the ADD.LOCK rule asks for two
event sequences T1[`1 : `′1] and T2[`2 : `
′
2] such that one of them has to finish execution before the
other starts, i.e., hb(T1[`′1],T2[`2])∨hb(T2[`′2],T1[`1]). Equivalently, the two event sequences do
not execute concurrently. This is enforced by the lock lock(T1[`1 : `′1],T2[`2 : `
′
2]). The premise
of the MERGE.LOCKS.DEADLOCKS rule looks for two already derived locks, acquired by two
threads in different orders (which may lead to a deadlock), and merges these locks into one.
Note that the rewriting process always terminates. However, depending on the order of
rules applied, we may obtain different formulæ. Upon termination, we get a CNF formula over
synchronisation primitives. We pick a set S of synchronisation primitives, consisting of one
primitive from each conjunct. Let CS be the program obtained by inserting each synchronisation
80
primitive in S into the corresponding position in the original concurrent program C.
Theorem 5.3.1 (Soundness of rewrite rules). Given a trace π, let CS be obtained as described
above. Let π ∈ L(Nπ) be a deadlock-free execution of CS . Then π /∈ L(N bπ), i.e., π is not bad.
While CS is not guaranteed deadlock-free, we perform simple consistency checks when
choosing S to prevent obvious deadlocks. For example, we ensure that WaitSignal primitives
in S do not introduce ordering cycles over locs(π).
Note that our rewrite rules are by no means complete. It may be possible to derive the
above synchronisation primitives using different rules that represent other scenarios. Further,
our rewrite system can also be extended to other synchronisation primitives. We now present
examples illustrating the application of our rules.
Example 5.3.2. For the example trace shown in Figure 5.1, ϕG is given by hb(TD[2],TW[1]) ∨
hb(TW[2],TD[1]). Applying the ADD.LOCK rewrite rule yields lock(TW[1 : 2],TD[1 : 2]).
Example 5.3.3. For the example trace shown in Figure 5.4(a), ϕG is given by (hb(TF[4],TS[3])∨
hb(TS[4],TF[3])) ∧ hb(TS[4],TF[5]) ∧ hb(TF[4],TS[5]). Applying the ADD.LOCK rewrite rule
yields: lock(TF[3 : 4],TS[3 : 4])∧hb(TS[4],TF[5])∧hb(TF[4],TS[5]). Applying the ADD.BARRIER





Example 5.3.4. For the example trace shown in Figure 5.4(b), ϕG is as shown. The disjuncts ψ1
and ψ2 are not relevant for this example except for the fact that ψ1 is common to the 3rd and 4th
conjuncts, ψ2 is common to the 5th and 6th conjuncts and ψ1 6= ψ2.
• Applying ADD.LOCK yields: hb(TI[2],TF[2])∧hb(TI[2],TS[2])∧(lock(TF[4],TS[3 : 4])∨
ψ1)∧(lock(TF[3 : 4],TS[4])∨ψ1)∧(lock(TF[4],TS[3 : 4])∨ψ2)∧(lock(TF[3 : 4],TS[4])∨
ψ2).
• Applying MERGE.LOCKS next yields: hb(TI[2],TF[2])∧hb(TI[2],TS[2])∧(lock(TF[3 : 4],
TS[3 : 4]) ∨ ψ1) ∧ (lock(TF[3 : 4],TS[3 : 4]) ∨ ψ2).









∧ (lock(TF[3 : 4],TS[3 : 4]) ∨ ψ1) ∧
(lock(TF[3 : 4],TS[3 : 4]) ∨ ψ2).
Note that the MERGE.LOCKS rule does not apply to the last two conjuncts as ψ1 6= ψ2. One pos-








, lock(TF[3 : 4],TS[3 : 4])}.
81
Figure 5.4 Example programs
init: value1 := 1; value2 := 2;
value3 := 4; value4 := 8; sum := 0;
flag1 := 0; flag2 := 0
thread_firsthalf
TF[1] : localsum1 := value1;
TF[2] : localsum1 := localsum+ value2;
TF[3] : temp1 := sum;
TF[4] : sum := temp1+ localsum1;
TF[5] : value1 := value1/sum;
TF[6] : value2 := value2/sum;
TF[7] : flag1 := 1
thread_secondhalf
TS[1] : localsum2:=value3;
TS[2] : localsum2:=localsum2+ value4;
TS[3] : temp2:=sum;





TC[1] : assume(flag1 = 1 ∧ flag2 = 1);
TC[2] : assert(value1 + value2 + value3 +
value4 = 1)
ϕG: (hb(TF[4],TS[3]) ∨ hb(TS[4],TF[3])) ∧
hb(TS[4],TF[5]) ∧ hb(TF[4],TS[5])
(a) Normalisation. The goal of the program this trace
is drawn from is to normalise a set of values such
that their sum computes to 1. The program consists
of three threads. The first and second thread process
one half each of the set of values. Once the first and
second thread run to completion, the third thread








TF[1] : assume(intrmask = 1);
TF[2] : assert(initdone = 1);
TF[3] : temp1:=workqueueitems;
TF[4] : workqueueitems:=temp1+ 1;
TF[5] : interrupts:=interrupts+ 1
thread_second_irqhandler
TS[1] : assume(intrmask = 1);
TS[2] : assert(initdone = 1);
TS[3] : temp2:=workqueueitems;
TS[4] : workqueueitem:=temp2+ 1;
TS[5] : interrupts:=interrupts+ 1
thread_checkworkqueue
TC[1] : assert(workqueueitems ≥ interrupts);
ϕG: hb(TI[2],TF[2]) ∧ hb(TI[2],TS[2])∧
(hb(TS[4],TF[4]) ∨ hb(TF[4],TS[3]) ∨ ψ1) ∧
(hb(TF[4],TS[4]) ∨ hb(TS[4],TF[3]) ∨ ψ1)∧
(hb(TS[4],TF[4]) ∨ hb(TF[4],TS[3]) ∨ ψ2) ∧
(hb(TF[4],TS[4]) ∨ hb(TS[4],TF[3]) ∨ ψ2)
(b) Interrupt handler (simplified snippet of the Linux
RealTek 8169 network driver). Once the intrmask
variable is set by the interruptmaskset thread, the
hardware starts producing interrupts. The handling of
these interrupts, by the two irqhandler threads, is cor-
rect only if the driver initialisation is complete (captured
by the initdone variable). The irqhandlers add
items to a workqueue; the addition of items is modelled
using a counter workqueueitems. The variable
interrupts counts the total number of interrupts
handled by the irqhandler threads and the thread
checkworkqueue uses interrupts to check for
inconsistencies in the workqueue.
82






TN[1] : assume(registered 6= 0);
TN[2] : assert(∗hw_start = drv_hw_start)
/* pointer dereference */
void drv_hw_start() {
/* does something */
}
ϕB : hb(TN[2],TP[2])
(c) Network device initialiser. This trace is drawn
from a simplified snippet of the Linux RealTek 8169
network driver. The pci thread signals that a network
device is registered using the variable registered
and sets hw_start to point to the drv_hw_start
method. The network thread calls drv_open once
the network device is registered. The drv_open
method dereferences the hw_start pointer.
globals: int[] pagetable, memory;
init: pagetable[1] = 5, memory[5] =
10;
thread_pagetableaccess:
locals: int loc, data, page;
TP[1]: page := 1;
TP[2]: loc := pagetable[page];
TP[3]: data := memory[loc];
TP[4]: assert (data = 10);
thread_datamove:
locals: int page, newloc, loc;
TD[1]: page, newloc := 1, 20;
TD[2]: loc := pagetable[page];
TD[3]: pagetable[page] := newloc;
TD[4]: memory[newloc] :=
memory[loc];
ϕB : hb(TD[3],TP[2]) ∧ hb(TP[3],TD[4])
(d) Page-table. The pagetableaccess thread reads
a memory location loc from pagetable and reads
data from that memory location. The datamove
thread reads the current memory location loc from
pagetable, updates pagetable with a new memory
location newloc and copies the data from the old
memory location to the new memory location.
5.4 Bug Summarisation
Next we show that the representation for a sound overapproximation of the bad neighbourhood
of a trace π (returned as ϕB by Algorithm 5.2) is useful for counterexample summarisation and
bug summarisation. The HB-formula ϕB encapsulates relevant ordering information about all
counterexamples in the neighbourhood of π and can be viewed as a stand-alone counterexample
summary. For instance, in Figure 5.4(c), one may view ϕB = hb(TN[2],TP[2]) as a counter-
example summary that indicates a possible order violation. While such a bug report can already
be useful to a human debugger, a cursory examination of the data-flow through the events in ϕB
can enable formulation of a more precise bug summary. To this end, we present a set of rules to
help infer specific bugs such as data races, define-use order violations and two-stage access bugs.
5.4.1 Inferring Bug Summaries from ϕB
We assume ϕB is in DNF. Our inference rules are presented in Figure 5.5. For a thread T, a
location `, and a global program variable v, (a) read(T[`], v) denotes that event T[`] reads from
v, (b) write(T[`], v) denotes that event T[`] writes to v, and (c) access(T[`], v) denotes that
83
Figure 5.5 Inference rules for bug summarisation (In this figure, `1 < `′1 < `′′1 and `2 < `′2 < `′′2 .)
hb(T1[`
′






















DataRace({T1[`′1],T1[`′′1 ]}, {T2[`′2],T2[`′′2 ]})
DATARACE.2
hb(T1[`1],T2[`2]) ∧ hb(T2[`2],T1[`′1]) ∧ ψ









2]) ∧ hb(T2[`2],T1[`′1]) ∧ ψ











hb(T1[`1],T2[`2]) ∧ hb(T2[`′2],T1[`′1]) ∧ ψ
write(T1[`1], v) write(T1[`
′









hb(T1[`1],T2[`2]) ∧ hb(T2[`′2],T1[`′1]) ∧ ψ
read(T1[`1], v) read(T1[`
′









hb(T1[`1],T2[`2]) ∧ ψ read(T1[`1], v) write(T2[`2], v)
∃σ ∈ Nπ : σ |= hb(T1[`1],T2[`2]) ∧ ψ ∧
∧
T[`] write(T[`], v) ∧ T[`] 6= T1[`1]⇒ hb(T1[`1],T[`])
DefineUse(T1[`1],T2[`2])
DEFINEUSE
event T[`] reads from or writes to v. In the discussion below, we combine these with ordering
constraints in a natural manner. For example, read(T1[`1], v)→ write(T2[`2], v) says that event
T1[`1] happens before T2[`2] and that read(T1[`1], v) and write(T2[`2], v) hold.
Data races. Take two events `′1; `′′1, where event `′1 has statement r:=v + 1, event `′′1 has
statement v:=r, and r is a local variable modelling a register. In this case, a data race between
events `′1; `
′′
1 and some other event `2 writing to v in another thread manifests itself in a trace
σ as the ordering pattern `′1 <σ `2 <σ `
′′
1. Hence, the DATARACE.1 rule infers a possible
data race between events labelled T1[l′1],T1[l
′′
1], and T2[l2] if the pattern read(T1[l
′
1], v) →
access(T2[l2], v)→ write(T1[l′′1], v) is found in ϕB.




2 reads from v and `
′′
2 writes to v, a data
race can manifest in a trace σ as `′1 <σ `
′′







2], if the patterns read(T1[`
′
1], v)→ write(T2[`′′2], v)
and read(T2[`′2], v)→ write(T1[`′′1], v) is found in the same disjunct of ϕB.
84
Atomicity violations. The ATOMICITYVIOLATION rules generalise the DATARACE rules. If
the data-flow and ordering pattern access(T1[`1], v) → access(T2[`2], v) → access(T1[`′1], v)
manifests inϕB , the first rule infers a possible atomicity violation of the event sequence T1[`1 : `′1]
via event T2[`2]. If the patterns access(T1[`1], v)→ access(T2[`′2], v) and access(T2[`2], v)→
access(T1[`
′
1], v) manifest in the same disjunct of ϕB , the second rule infers a possible atomicity
violation of the event sequence T1[`1 : `′1] and event sequence T2[`2 : `
′
2].
Two stage access. The TWOSTAGEACCESSBUG rules capture two classic scenarios of two-
stage access bugs, indicating violations of some consistency requirement of accesses to v and
w. In particular, the values of v and w read by a thread could be inconsistent if either of the fol-
lowing patterns manifest in ϕB: (a) write(T1[`1], v)→ read(T2[`2], v)→ read(T2[`′2], w)→
write(T1[`
′
1], w); or (b) read(T1[`1], v)→write(T2[`2], v)→ write(T2[`′2], w)→ read(T1[`′1], w).
Define-use ordering. The DEFINEUSE rule infers a specific type of order violation indicating
the use of a variable before it is defined. Given ϕB in DNF, if the ordering read(T1[`1], v)→
write(T2[`2], v) manifests in a disjunct δ, the rule infers a define-use order violation if there
exists a trace σ ∈ Nπ such that σ satisfies δ and T1[`1] precedes all events that write to v in σ.
Starting from ϕB given in DNF, we repeatedly apply the inference rules from Figure 5.5 until
no more rules are applicable. We report all inferred bugs as possible violations. Note that our
goal here is only to assist the user in program debugging. Our inference rules are not complete.
We do not claim that our inferred bugs will manifest in the program’s executions, or that they
will match a human debugger’s intuition. We now present examples illustrating the application
of some of our bug inference rules.
Example 5.4.1. For the example trace shown in Figure 5.1, ϕB is given by hb(TW[1],TD[2]) ∧
hb(TD[1],TW[2]). Since read(TW[1], balance), write(TW[2], balance), read(TD[1], balance) and
write(TD[2], balance) hold, we can apply the DATARACE.2 rule to infer a DataRace(W[1 : 2], Y[1 : 2]).
Note that this bug inference matches the synchronisation lock(TW[1 : 2],TD[1 : 2]) synthesised in
Example 5.3.2.
Example 5.4.2. Consider the example trace shown in Figure 5.4(c). In our encoding, the
pointer hw_start is modelled as an integer variable hw that is initially 0 (since hw_start is
uninitialised). The pointer dereference in TN[2] is modelled as assert(hw > 0). For this example,
ϕB is given by hb(TN[2],TP[2]). Since read(TN[2], hw) and write(TP[2], hw) hold, and trace
85
TP[1],TN[1],TN[2],TP[2] satisfies the last premise of the DEFINEUSE rule, we can apply the rule
to infer a define-use order violation between TN[2] and TP[2].
Example 5.4.3. For the example trace shown in Figure 5.4(d), ϕB is given by hb(TD[3],TP[2])∧
hb(TP[3],TD[4]). Sincewrite(TD[3], pagetable),write(TD[4],memory), read(TP[2], pagetable)
and read(TP[3],memory) hold, we can apply the TWOSTAGEACCESSBUG.1 rule to infer
TwoStageAccessBug(TD[3 : 4],TP[2 : 3]).
5.5 Accelerating CEGAR
Finally, we present a procedure for learning predicates for refinement in a CEGAR loop [Clarke
et al., 2000], with the help of TARA. An abstraction-refinement loop proceeds by building an
abstract model of an input program and applying a model-checker on the abstract model. If the
abstract model satisfies the correctness specification, then the input program is correct. Otherwise,
the model-checker finds an abstract counterexample, i.e., an execution in the abstract model. The
abstraction counterexample is spurious if there is no concrete execution that corresponds to the
abstract counterexample. Given a spurious counterexample, the refinement procedure refines
the abstract model. This is done by finding predicates that inform the abstraction procedure to
construct the next abstract model by adding the relevant details to the current abstract model
such that the spurious counterexample is absent from next abstract model. The process starts
over with the newly refined abstraction. After a number of iterations, the abstract model may
have no more counterexamples, which proves the correctness of the input program. For simpler
presentation, we assume that the input program is correct and all the abstract counterexamples
are spurious.
An abstraction-refinement loop often takes many iterations to find the right set of predicates
to prove correctness of the input program. This usually depends on the design of the refinement
procedure. Many heuristics have been proposed to find the relevant predicates in fewer iterations
(see, for example, [Beyer et al., 2007]). We aim to use TARA to accelerate the search for the
right predicates, i.e., reduce the number of iterations of a CEGAR loop.
Our refinement procedure takes a concurrent abstract counterexample as input and returns
refinement predicates. First, we analyse the counterexample using TARA and obtain an HB-
formula ϕB that encodes a set of incorrect interleavings. We sample a number of interleavings
from ϕB and attempt to compute refinement predicates that simultaneously remove all the
86
sampled spurious inter-leavings using a method similar to beautiful interpolants [Albarghouthi
and McMillan, 2013].
5.5.1 Abstraction and Refinement
An abstract model of a concurrent program C = 〈V, {T1, . . . ,Tk}, SV, 〈LV1, . . . , LVk〉〉 is
another concurrent program Ĉ = 〈V, {T̂1, . . . , T̂k}, SV, 〈LV1, . . . , LVk〉〉 such that, for each
i ∈ [1, k] and event ` in Ti, there is an event ˆ̀ in T̂i such that if S0`S1 is feasible then S0 ˆ̀S1 is
feasible.
In predicate abstraction, the abstract event ˆ̀corresponding to an event ` is defined using a
set of predicates as follows. Let us suppose predicates ρ1, . . . , ρm are used for abstraction. Let
i ∈ [1,m]. Let βi be the weakest precondition of ` over ρi, and γi be the weakest precondition of
` over ¬ρi. Let β̂i and γ̂i be the weakest formulæ that are Boolean combinations of ρ1, . . . , ρm,
and imply βi and γi, respectively. S0 ˆ̀S1 is feasible iff ∀i ∈ [1,m] : (S0 |= β̂i → S1 |=
ρi) ∧ (S0 |= γ̂i → S1 |= ¬ρi).
Let S0 ˆ̀1S1 . . . ˆ̀nSn be a spurious counterexample, i.e., an execution in the abstract model that
violates the specification. A refinement procedure computes additional predicates α0, α1, . . . , αn−1, αn
over program variables that satisfy the following constraint.
α0 = true ∧ αn = false ∧
n∧
i=1
αi−1 ∧ `i → α′i
Note that the primed formula α′i is the formula αi where each variable v is replaced by its
primed version v′. Recall that v′ represents the value of v after the execution of a statement. An
abstract model computed using predicates α1, . . . , αn−1 is guaranteed to not exhibit the spurious
counterexample [Henzinger et al., 2004].
5.5.2 Sampling an HB-formula
We pass trace ˆ̀1 . . . ˆ̀n to TARA and obtain an HB-formula ϕB in DNF to represent bad abstract
traces. ϕB is a formula over events ˆ̀1 . . . ˆ̀n. With slight abuse of notation, we assume that ϕB
is a formula over events `1 . . . `n, which can be obtained by replacing abstract events by their
corresponding concrete events in ϕB . We sample a few concrete infeasible traces that satisfy ϕB
and try to compute the simultaneous refinement predicates, i.e., predicates that eliminate all the
sampled traces from the abstract program. Intuitively, learning predicates simultaneously using
87
multiple spurious counterexamples may allow us to find more general predicates. Both sampling
and simultaneous refinement are heuristics choices. Here, we present one possible choice for
the sampling. However, one can imagine a wide array of heuristics for these choices. In our
sampling heuristic, we search for two disjuncts in ϕB of the form
ϕ1 ∧ ea < eb and ϕ2 ∧ eb < ea
such that negation of any HB-formula in ϕ1 is not in ϕ2. We generate traces π1 and π2 such that
(a) they satisfy ϕ1 ∧ ϕ2 ∧ ea < eb and ϕ1 ∧ ϕ2 ∧ eb < ea respectively; and (b) they are of the




































e31 . . . e
3
k3
If ϕ1 ∧ ϕ2 ∧ ea < eb and ϕ1 ∧ ϕ2 ∧ eb < ea are satisfiable, such traces always exist. Both the
traces have a common prefix and suffix, and two middle segments e11 . . . e
1
k1




swapped. From the traces, we obtain refinement predicates α1 . . . αk0 , β1 . . . βk1+k2 , γ1 . . . γk1+k2 ,
and δ1 . . . δk3 by solving the following constraints.
α0 = true ∧
k0∧
i=1
(αi−1 ∧ `0i → α′i) ∧ αk0 = β0 = γ0 (prefix)
k1∧
i=1
(βi−1 ∧ `1i → β′i) ∧
k1+k2∧
i=k1+1
(βi−1 ∧ `2i−k1 → β
′
i) (mid trace 1)
k2∧
i=1
(γi−1 ∧ `2i → γ′i) ∧
k2+k1∧
i=k2+1
(γi−1 ∧ `1i−k2 → γ
′
i) (mid trace 2)
δ0 = βk1+k2 = γk1+k2 ∧
k3∧
i=1
(δi−1 ∧ `3i → δ′i) ∧ δk3 = false (suffix)
In the above equations, the first and last constraints correspond to the prefix and suffix respectively.
The second and third constraints correspond to the middle segments of the two traces.
5.5.3 Constraint Solving for Simultaneous Refinement
We discuss how to solve the above constraints for refinement. The above constraints are a set
of non-recursive Horn clauses. Many techniques exist to solve such constraints (e.g. [Gupta
et al., 2011a; Bjørner et al., 2013]). Since we are aiming for simultaneous refinement, we
prefer the solutions for the unknown predicates to be simple atomic formulæ. If an unknown
predicate appears as consequent of multiple implications (for example, αk0+1), then the solver
may naturally give a solution that is a disjunction of two atomic formulæ. We use the method
88
that is presented in Section 4 of [Albarghouthi and McMillan, 2013] for the theory of linear
arithmetic that forces a solver to return solutions for the above constraints with single atomic
formulæ if such a solution exists.
5.6 Implementation and Experiments
We have implemented Algorithms 5.1 and 5.2 in a tool TARA1. TARA consists of 4000 lines of
C++ code and uses Z3 [de Moura and Bjørner, 2008] to discharge SMT queries. We use a new
input format, CTRC, for specifying traces. The CTRC format consists of global and thread-local
variables along with types and any initial valuations, and the statements (in SMT-LIB format) in
each thread. This makes TARA independent and easy to use with any front-end that can translate
statements to the SMT-LIB syntax. We use a modified version of CONREPAIR (Chapter 4) to
generate CTRC files for bad traces. CONREPAIR, in turn, uses CBMC [Clarke et al., 2004]
to find bad traces in programs and CPACHECKER [Beyer and Keremoglu, 2011] to translate C
statements into the SMT-LIB format.
TARA has a number of different output options. Algorithm 5.1 generates an HB-formula
in DNF, which is often large. Algorithm 5.2 generates a succinct HB-formula in DNF, the
sizes of whose disjuncts are locally minimised. In our experience, the unsat core provided
by Z3 is often far from minimal. Hence, we first use Z3 to compute an unsat core and then
use a custom minimisation technique—we use Z3 incrementally with triggers to activate and
deactivate expressions for unsat core minimisation. TARA can also generate an HB-formula in
CNF representing bad neighbourhoods. However, this is computationally more expensive.
Experiments. Our benchmarks are from a diverse set of sources, namely, the concurrency
track of the 2014 software verification competition SV-COMP [Beyer, 2014] (suite sv) and the
regression-test suites of CONREPAIR (Chapter 4) (suite cr) and ESBMC [Morse et al., 2014]
(suite es). We also use a set of small handmade examples with common bug patterns (suite hm).
The cr suite contains simplified versions of real buggy code from the Linux kernel. To test the
limits of TARA, we use the loop-x examples that have two threads each executing a loop of
x iterations. For correct behaviour, each iteration should execute atomically with respect to
iterations of the other thread. However, the locks required to ensure this are missing.
1available as open-source software along with benchmarks: https://github.com/thorstent/TARA
89
All measurements were done on an Intel core i5-3320M laptop with 8GB of RAM. Our
results are presented in Table 5.1. The time reported only includes the time taken by TARA, and
not the time needed to find a bad trace in the benchmark program. The #Threads/#Instrs column
in Table 5.1 indicates the complexity of the benchmarks in terms of the number of threads
and statements. The performance of SMT queries involving ΦCTP is mostly influenced by the
number and size of π-functions. The #π-functions/#Disjuncts column indicates the number of
π-functions and average number of arguments per π-function.
The performance of TARA using Algorithm 5.1 and Algorithm 5.2 are in columns marked
Algo.1 and Algo.2, respectively. For each algorithm, we report the number of iterations, the total
time taken and the size of the generated ϕB (as the number of disjuncts and the average number
of terms in each disjunct). Algorithm 5.1 times out after 10 minutes in many cases—in such
cases, we report the number of loop iterations completed before the timeout. With Algorithm 5.2,
TARA terminates within 5 seconds for each benchmark. This time is negligible compared to the
time taken to find the initial counterexample trace. For example, CBMC took 2 minutes to find
the trace usb-serial-1, while our analysis completed exploration of its bad neighbourhood
in 2 seconds. We tested the limits of our tool in the loop-x examples. With 32 iterations per
thread, we exceeded the timeout and hit the limit of our current implementation.
Table 5.1 Experiments: ϕB generation
Iterations Total time Size of ϕB
Name Suite #Threads/#Instrs #π-functions/#Disjuncts Algo.1 Algo.2 Algo.1 Algo.2 Algo.1 Algo.2
reorder_2 sv 2/3 2/2.0 1 1 18ms 28ms 1/2.0 1/2.0
define_use cr 2/4 2/2.0 1 1 15ms 22ms 1/2.0 1/1.0
em28xx cr 2/8 4/2.0 1 1 16ms 25ms 1/2.0 1/1.0
locks es 3/8 10/1.6 12 2 27ms 37ms 12/5.5 2/4.0
2stage hm 2/8 5/1.4 8 1 26ms 32ms 8/3.8 1/2.0
drbd_receiver cr 2/9 5/1.6 40 1 42ms 28ms 40/3.9 1/1.0
md cr 3/11 4/1.8 40 1 76ms 33ms 40/6.1 1/1.0
lazy01 sv 3/12 6/3.7 2 2 31ms 57ms 2/3.0 2/2.0
locks_hb hm 4/13 10/2.2 >29.0k 7 TO 119ms TO 6/3.0
lc_rc cr 4/14 8/2.0 4.6k 1 21.4s 37ms 4.6k/16.7 1/1.0
barrier_locks hm 3/18 17/2.6 10.6k 6 1.4min 521ms 10.6k/10.0 4/1.5
stateful01 sv 3/19 10/3.4 2.3k 2 10.5s 84ms 2.3k/9.4 2/1.0
read_write_lock sv 4/22 16/3.4 9.2k 4 1.6min 319ms 9.2k/16.1 4/3.0
loop hm 2/38 14/2.7 2 1 38ms 72ms 2/3.0 1/2.0
fib_bench sv 3/39 24/3.6 >20.5k 2 TO 2.3s TO 2/10.0
i2c_hid cr 2/42 26/4.5 >23.4k 3 TO 615ms TO 3/1.3
rtl8169-1 cr 7/71 22/2.7 >20.4k 1 TO 111ms TO 1/2.0
rtl8169-2 cr 7/116 41/2.3 >7.3k 1 TO 463ms TO 1/1.0
rtl8169-5 cr 7/134 48/3.1 >5.5k 1 TO 1.5s TO 1/1.0
rtl8169-4 cr 7/142 48/3.0 >8.4k 9 TO 3.8s TO 2/1.0
rtl8169-6 cr 7/144 52/2.9 >8.1k 1 TO 887ms TO 1/1.0
usb_serial-1 cr 7/151 87/3.7 >5.5k 1 TO 1.9s TO 1/1.0
usb_serial-2 cr 7/163 93/3.6 >4.4k 3 TO 4.4s TO 1/1.0
rtl8169-3 cr 8/174 61/3.6 >4.2k 2 TO 2.7s TO 1/1.0
usb_serial-3 cr 7/178 100/3.7 >4.3k 1 TO 2.1s TO 1/1.0
loop-2 N/A 2/16 8/3.0 >4.0k 4 11.6s 135ms 4.0k/8.9 4/2.0
loop-8 N/A 2/64 32/9.0 >15.3k 16 TO 3s TO 16/2.0
loop-32 N/A 2/256 128/33.0 >674 64 TO 35.5min TO 64/2.0
91
5.6.1 Synchronisation Synthesis
We implemented the synthesis algorithm as an extension to TARA. Given a trace π, TARA
supports synchronisation synthesis as an optional step after generating succinct representations
ofN gπ andN bπ . The implementation first attempts to apply the rules ADD.BARRIER, ADD.LOCK
and ADD.WAITSIGNAL (in that order). Then, the merging rules are applied, first merging locks
across thread pairs, and then merging barriers and locks spanning multiple threads.
We report the results of synchronisation synthesis experiments in Table 5.2. In each case, we
report the numbers of locks (#L), barriers (#B) and wait-signal (#WS) primitives synthesised.
The synthesised synchronisation matched our (human) intuition about the repairs needed. Since
TARA generates fairly small ϕG formulæ, the synthesis takes less than 50 microseconds in every
case.
Table 5.2 Experiments: synchronisation synthesis
L : Locks WS : Wait-notifies B : Barriers.
Name #L #B #WS Name #L #B #WS
reorder_2 1 0 0 loop 1 0 0
define_use 0 0 1 fib_bench 1 0 0
em28xx 0 0 1 i2c_hid 1 0 2
locks 1 0 0 rtl8169-1 0 0 1
2stage 0 0 1 rtl8169-2 0 0 1
drbd_receiver 0 0 1 rtl8169-5 0 0 1
md 0 0 1 rtl8169-4 0 0 2
lazy01 0 0 2 rtl8169-6 0 0 1
locks_hb 1 0 2 usb_serial-1 0 0 1
lc_rc 0 0 1 usb_serial-2 0 0 1
barrier_locks 1 1 0 rtl8169-3 0 0 1
stateful01 0 0 2 usb_serial-3 0 0 1
read_write_lock 4 0 0
5.6.2 Bug Summarisation
Given a trace π, TARA supports bug summarisation as an optional step after generating ϕB.
Starting from ϕB in DNF, the implementation attempts to apply the DATARACE, ATOMICITYVI-
OLATION, TWOSTAGEACCESSBUG and DEFINEUSE inference rules (in that order). Identical
bug reports are merged to avoid duplicates.
The experimental results of using our TARA-based bug summarisation on our test-suite are
presented in Table 5.3. We report the numbers of data races, atomicity violations, two-stage
access bugs and define-use bugs inferred. The Human column in the table presents a classification
of the bugs present in the benchmarks, as reported by an expert user. The last column indicates if
92
TARA’s bug summary matched the human classification. For the majority of benchmarks, TARA
summarised the bug correctly (Yes). In some cases, TARA did not infer a bug summary (–). For
the usb_serial-1 benchmark, TARA’s bug summary contradicted the human classification. For
each example, the implementation takes at most 12 milliseconds.
Table 5.3 Experiments: bug summarisation
2S : 2-stage access bug DR : Data-race bug OAV : Other atomicity violation
DU : Define-use bugs OV : Order-violation bug (only human inspection)
Human : Human interpretation of the bug
Name #2S #DR #AV #DU Human Bug summary right?
reorder_2 0 0 0 1 DU Yes
define_use 0 0 0 1 DU Yes
em28xx 0 0 0 1 DU Yes
locks 0 2 0 0 DR Yes
2stage 1 0 0 0 2S Yes
drbd_receiver 0 0 0 0 OV –
md 0 0 0 1 DU Yes
lazy01 0 0 0 0 OV –
locks_hb 0 2 0 2 DR, DU Yes
lc_rc 0 0 0 0 OV –
barrier_locks 0 2 0 0 DR, OV Yes
stateful01 0 0 0 0 OV –
read_write_lock 0 0 4 0 AV Yes
hm-loop 0 1 0 0 DR Yes
fib_bench 0 0 2 0 AV Yes
i2c_hid 0 0 1 0 AV, OV Yes
rtl8169-1 0 0 0 1 DU Yes
rtl8169-2 0 0 0 1 DU Yes
rtl8169-5 0 0 0 0 OV –
rtl8169-4 0 0 0 0 OV –
rtl8169-6 0 0 0 0 OV –
usb_serial-1 0 0 0 1 OV No
usb_serial-2 0 0 0 0 OV –
rtl8169-3 0 0 0 0 OV –
usb_serial-3 0 0 0 0 OV –
5.6.3 Accelerating CEGAR
We have implemented the abstraction refinement procedure in SATABS [Donaldson et al., 2011]
and refer to the modified version as SATABS[TARA]. In Table 5.4 we present the result of running
SATABS and SATABS[TARA] on three hand crafted examples. Each of these examples contain
two threads and 15-20 lines of code. Our method reduces the number of iterations in all the
93
Table 5.4 Experiments: CEGAR acceleration
Example SATABS SATABS[TARA]
Iterations Time(s) Iterations Time(s)
example1 55 35.4 45 33.5
example2 65 45.7 60 47.9
example3 45 23.0 41 23.9
examples. However, the total time of verification increases for two examples due to the fact that




Synthesis using an Implicit Specification
6.1 Problem Statement and Illustrative Example
In the past three chapters we introduced synthesis techniques for programs using an explicit
specification, such as assertions. Explicit specifications suffer from the drawback that it is
difficult to ensure that the specification is complete and fully captures the programmer’s intent.
We propose a solution where the specification is implicit. We observe that a core difficulty in
concurrent programming originates from the fact that the scheduler can preempt the execution
of a thread at any time. We therefore give the developer the option to program assuming a
friendly, non-preemptive, scheduler. Our tool automatically synthesises synchronisation code
to ensure that every behaviour of the program under preemptive scheduling is included in the
set of behaviours produced under non-preemptive scheduling. Thus, we use the non-preemptive
semantics as an implicit correctness specification.
The non-preemptive scheduling model (also known as cooperative scheduling [Yi and
Flanagan, 2010]) can simplify the development of concurrent software, including operating
system (OS) kernels, network servers, database systems, etc. [Sadowski and Yi, 2010; Ryzhyk
et al., 2009]. In the non-preemptive model, a thread can only be descheduled by voluntarily
yielding control, e.g., by invoking a blocking operation. Synchronisation primitives may be
used for communication between threads, e.g., a producer thread may use a semaphore to notify
the consumer about availability of data. However, one does not need to worry about protecting
accesses to shared state: a series of memory accesses executes atomically as long as the scheduled
thread does not yield.
96
A user evaluation by Sadowski and Yi [Sadowski and Yi, 2010] demonstrated that this model
makes it easier for programmers to reason about and identify defects in concurrent code. There
exist alternative implicit correctness specifications for concurrent programs. For example, for
functional programs one can specify the final output of the sequential execution as the correct
output. The synthesiser must then generate a concurrent program that is guaranteed to produce
the same output as the sequential version [Bloem et al., 2014]. This approach does not allow any
form of thread coordination, e.g., threads cannot be arranged in a producer-consumer fashion. In
addition, it is not applicable to reactive systems, such as device drivers, where threads are not
required to terminate.
Another implicit specification technique is based on placing atomic sections in the source
code of the program [Flanagan and Qadeer, 2003]. In the synthesised program the computation
performed by an atomic section must appear atomic with respect to the rest of the program. Spe-
cifications based on atomic sections and specifications based on the non-preemptive scheduling
model, used by our tool, can be easily expressed in terms of each other. For example, one can
simulate atomic sections by placing yield statements before and after each atomic section, as
well as around every statement that does not belong to any atomic section.
We believe that, at least for systems code, specifications based on the non-preemptive
scheduling model are easier to write and are less error-prone than atomic sections. Atomic
sections are subject to syntactic constraints. Each section is marked by a pair of matching
opening and closing statements, which in practice means that the section must start and end
within the same program block. In contrast, a yield can be placed anywhere in the program.
Moreover, atomic sections restrict the use of thread synchronisation primitives such as
semaphores. An atomic section either executes in its entirety or not at all. In the former case,
all wait conditions along the execution path through the atomic section must be simultaneously
satisfied before the atomic section starts executing. In practice, to avoid deadlocks, one can only
place a blocking statement at the start of an atomic section. Combined with syntactic constraints
discussed above, this restricts the use of thread coordination with atomic sections—a severe
limitation for systems code where thread coordination is common. In contrast, synchronisation
primitives can be used freely under non-preemptive scheduling. Internally, they are modelled
using yields: for instance, a semaphore acquisition statement is modelled by a yield followed by
an assume statement that proceeds when the semaphore becomes available.
Lastly, our specification defaults to the safe choice of assuming everything needs to be atomic
97
unless a yield statement is placed by the programmer. In contrast, code that uses atomic sections
can be preempted at any point unless protected by an explicit atomic section.
In defining behavioural equivalence between preemptive and non-preemptive executions, we
focus on externally observable program behaviours: two program executions are observationally
equivalent if they generate the same sequences of calls to interfaces of interest. This approach fa-
cilitates modular synthesis where a module’s behaviour is characterised in terms of its interaction
with other modules. Given a multi-threaded program C and a synthesised program C ′ obtained
by adding synchronisation to C, C ′ is preemption-safe w.r.t. C if for each execution of C ′ under a
preemptive scheduler, there is an observationally equivalent non-preemptive execution of C. Our
synthesis goal is to automatically generate a preemption-safe version of the input program.
We rely on abstraction to achieve efficient synthesis of multi-threaded programs. We propose
a simple, data-oblivious abstraction inspired by an analysis of synchronisation patterns in OS
code, which tend to be independent of data values. The abstraction tracks types of accesses
(read or write) to each memory location while ignoring their values. In addition, the abstraction
tracks branching choices. Calls to an external interface are modelled as writes to a special
memory location, with independent interfaces modelled as separate locations. To the best of
our knowledge, our proposed abstraction is yet to be explored in the verification and synthesis
literature. The abstract program is denoted as Cabs.
Two abstract program executions are observationally equivalent if they are equal modulo the
classical independence relation I on memory accesses. This means that every sequence ω of
observable actions is equivalent to a set of sequences of observable actions that are derived from
ω by repeatedly commuting independent actions. Independent actions are accesses to different
locations, and accesses to the same location iff they are both read accesses. Using this notion of
equivalence, the notion of preemption-safety is extended to abstract programs.
Our abstraction is reminiscent of previously used abstractions that track reads and writes to
individual locations (e.g., [Vechev et al., 2010b; Alglave et al., 2014]). However, our abstraction
is novel as it additionally tracks some control-flow information (specifically, the branches taken)
giving us higher precision with almost negligible computational cost.
Under abstraction, we model each thread as a nondeterministic finite automaton (NFA) over a
finite alphabet, with each symbol corresponding to a read or a write to a particular variable. This
enables us to construct NFAs NPabs, representing the abstraction of the original program C under
non-preemptive scheduling, and Pabs, representing the abstraction of the synthesised program
98
C ′ under preemptive scheduling. We show that preemption-safety of C ′ w.r.t. C is implied by
preemption-safety of the abstract synthesised program C ′abs w.r.t. the abstract original program
Cabs, which, in turn, is implied by language inclusion modulo I of NFAs Pabs and NPabs. While
the problem of language inclusion modulo an independence relation is undecidable [Bertoni et al.,
1982], we show that the antichain-based algorithm for standard language inclusion [de Wulf
et al., 2006] can be adapted to decide a bounded version of language inclusion modulo an
independence relation.
Our synthesis works in a counterexample-guided inductive synthesis (CEGIS) loop that
accumulates a set of global constraints. The loop starts with a counterexample obtained from
the language inclusion check. A counterexample is a sequence of locations in Cabs, which
when executed in order produce an observation sequence that is valid under the preemptive
semantics, but not under the non-preemptive semantics. From the counterexample we infer
mutual exclusion (mutex) constraints, which when enforced in the language inclusion check
avoid returning the same counterexample again. We accumulate the mutex constraints from
all counterexamples iteratively generated by the language inclusion check. Once the language
inclusion check succeeds, we construct a set of global constraints using the accumulated mutex
constraints and constraints for enforcing deadlock-freedom. This approach is the key difference
to our paper [Černý et al., 2015b], where a greedy approach is employed that immediately places
a lock to eliminate a bug. The greedy approach may result in a suboptimal lock placement with
unnecessarily overlapping or nested locks.
The global approach allows us to use an objective function f to find an optimal lock placement
w.r.t. f once all mutex constraints have been identified. Examples of objective functions include
minimising the number of lock statements (leading to coarse-grained locking) and maximising
concurrency (leading to fine-grained locking). We encode such an objective function, together
with the global constraints, into a weighted maximum satisfiability (MaxSAT) problem, which is
then solved using an off-the-shelf solver.
Since the synthesised lock placement is guaranteed not to introduce deadlocks our solution
follows good programming practices with respect to locks: no double locking, no double
unlocking and no locks locked at the end of the execution.
We implemented our synthesis algorithm in a new prototype tool called LISS (Language
Inclusion-based Synchronisation Synthesis) and evaluated it on a series of device driver bench-
marks, including an Ethernet driver for Linux and the synchronisation skeleton of a USB-to-serial
99
Figure 6.1 Running example
procedure OPEN_DEV
`1: if (open == 0) then
`2: POWER_UP
end if




`5: if (open > 0) then
`6: open := open− 1















controller driver, as well as an in-memory key-value store server. First, LISS was able to detect
and eliminate all but two known race conditions in our examples; these included one race
condition that we previously missed when synthesising from explicit specifications (Chapter 4),
due to a missing assertion. Second, our abstraction proved highly efficient: LISS runs an order of
magnitude faster on the more complicated examples than our previous synthesis tool based on
the CBMC model checker. Third, our coarse abstraction proved surprisingly precise in practice:
across all our benchmarks, we only encountered three program locations where manual abstrac-
tion refinement was needed to avoid the generation of unnecessary synchronisation. Fourth, our
tool finds a deadlock-free lock placement for both a fine-grained and a coarse-grained objective
function. Overall, our evaluation strongly supports the use of the implicit specification approach
based on non-preemptive scheduling semantics as well as the use of the data-oblivious abstraction
to achieve practical synthesis for real-world systems code. With the two objective functions we
implemented, LISS produces an optimal lock placements w.r.t. the objective.
6.1.1 Illustrative Example
Figure 6.1 contains our running example, a part of a device driver. A driver interfaces the
operating system with the hardware device (as illustrated in Figure 6.2) and may be used by
different threads of the operating system in parallel. An operating system thread wishing to
use the device must first call the OPEN_DEV procedure and finally the CLOSE_DEV procedure
to indicate it no longer needs the device. The driver keeps track of the number of threads that
100
Figure 6.3 Abstraction of the running example
procedure OPEN_DEV_ABS
`1a: read(open)



















interact with the device. The first thread to call OPEN_DEV will cause the driver to power up the
device (line `2), the last thread to call CLOSE_DEV will cause the driver to power down the device
(line `8). The interaction between the driver and the device are represented as procedure calls
in lines `2 and `8. From the device’s perspective, the power-on and power-off signals alternate.
In general, we must assume that it is not safe to send the power-on signal twice in a row to the
device. If executed with the non-preemptive scheduler the code in Figure 6.1 will produce a
sequence of a power-on signal followed by a power-off signal followed by a power-on signal and
so on.
Consider the case where the procedure OPEN_DEV is called in parallel by two operating
system threads that want to initiate usage of the device. Without additional synchronisation, there
could be two calls to POWER_UP in a row when executing under a preemptive scheduler. Consider
two threads (T1 and T2) running the OPEN_DEV procedure. The corresponding trace is T1.`1;
T2.`1; T1.`2; T2.`2; T2.`3; T2.`4; T1.`3; T1.`4. This sequence is not observationally equivalent
to any sequence that can be produced when executing with a non-preemptive scheduler.
Figure 6.3 contains the abstracted versions of the two procedures, OPEN_DEV_ABS and
CLOSE_DEV_ABS. For instance, the statement open := open + 1 is abstracted to the two
statements labelled `3a and `3b. The calls to the device (POWER_UP and POWER_DOWN) are
abstracted as writes to a hypothetical dev variable. This expresses the fact that interactions
with the device are never independent. The abstraction is coarse, but still captures the problem.
Consider two threads (T1 and T2) running the OPEN_DEV_ABS procedure. The following trace
is possible under a preemptive scheduler, but not under a non-preemptive scheduler: T1.`1a;
T1.`1b; T2.`1a; T2.`1b; T1.`2; T2.`2; T2.`3a; T2.`3b; T2.`4; T1.`3a; T1.`3b; T1.`4. Moreover, the
101
Figure 6.4 Running example with the synthesised locks
procedure OPEN_DEV
lock(LkVar)
`1: if (open == 0) then
`2: POWER_UP
end if






`5: if (open > 0) then
`6: open := open− 1







trace cannot be transformed by swapping independent events into any trace possible under a
non-preemptive scheduler. This is because statements `3b : write(open) and `1a : read(open) are
not independent. Further, `2 : write(dev) is not independent with itself. Hence, the abstract trace
exhibits the problem of two successive calls to POWER_UP when executing with a preemptive
scheduler. Our synthesis algorithm finds this problem, and stores it as a mutex constraint:
mtx([`1a : `3b], [`2 : `3b]). Intuitively this constraint expresses the fact if one thread is executing
any instruction between `1a and `3b no other thread may execute `2 or `3b.
While this constraint ensures two parallel calls to OPEN_DEV behave correctly, two parallel
calls to CLOSE_DEV may result in the the device receiving two POWER_DOWN signals. This is
represented by the concrete trace T1.`5; T1.`6; T2.`5; T2.`6; T2.`7; T2.`8; T2.`9; T1.`7; T1.`8;
T1.`9. The corresponding abstract trace is T1.`5a; T1.`5b; T1.`6a; T1.`6b; T2.`5a; T2.`5b; T2.`6a;
T2.`6b; T2.`7a; T2.`7b; T2.`8; T2.`9; T1.`7a; T1.`7b; T1.`8; T1.`9. This trace is not possible
under a non-preemptive scheduler and cannot be transformed to a trace possible under a non-
preemptive scheduler. This results in a second mutex constraint mtx([`5a : `8], [`6b : `8]). With
both mutex constraints the program is correct. Our lock placement procedure then encodes these
constraints in SMT and the models of the SMT formula are all the correct lock placements. In
Figure 6.4 we show OPEN_DEV and CLOSE_DEV with the inserted locks.
6.2 Solution Overview
Reduction of preemption-safety to language inclusion. To ensure tractability of checking
preemption-safety, we build the abstract program Cabs from C using the abstraction function
described in Section 6.3. Under abstraction, we model each thread as a nondeterministic finite

103
Automaton modification for enforcing mutex constraints. Once we have the mutex con-
straints inferred from a generalised counterexample, we enforce them in P′abs, effectively remov-
ing transitions from the automaton that violate the mutex constraint. This completes our loop and
we repeat the language inclusion check of P′abs and NPabs. If another counterexample is found
our loop continues, if the language inclusion check succeeds we proceed to the lock placement.
This differs from the greedy approach employed in [Černý et al., 2015b] that modifies C ′abs,
constructs a new automaton P′abs from C ′abs and then restarts the language inclusion check. The
greedy approach inserts locks into C ′abs that are never removed in a future iteration, which can
lead to inefficient lock placement. For example a larger lock may be placed that completely
surrounds an earlier placed lock.
Computation of an f -optimal lock placement. Once P′abs and NPabs satisfy language inclu-
sion modulo I , we formulate global constraints over lock placements for ensuring correctness.
These global constraints include all mutex constraints inferred over all iterations and constraints
for enforcing deadlock-freedom. Any model of the global constraints corresponds to a lock place-
ment that ensures program correctness. We describe the formulation of these global constraints
in Section 6.6.
Given a cost function f , we compute a lock placement that satisfies the global constraints
and is optimal w.r.t. f . We then synthesise the final output C ′ by inserting the computed lock
placement in C. We present various objective functions and describe the computation of their
respective optimal solutions in Section 6.7.
6.3 Abstract Concurrent Programs
The state of the concrete semantics contains unbounded integer variables, which may result in
an infinite state space. We therefore introduce a simple, data-oblivious abstraction Wabs for
concurrent programs written in W communicating with an external system. The abstraction
tracks types of accesses (read or write) to each memory location while abstracting away their
values. Inputs/outputs to a channel are modelled as writes to a special memory location (dev).
Even inputs are modelled as writes because in our applications we cannot assume that reads from
the external interface are free of side-effects in the component on the other side of the interface.
Havocs become ordinary writes to the variable they are assigned to. Every branch is taken
104
Figure 6.6 Syntax ofWabs
var ::= Variables
ShVar Shared variable
dev Variable for interaction with channels
GrdExpr ::= Expression over guard variables
true/false Boolean constant
GrdVar Guard variable
boolop(GrdExpr 1, . . . ,GrdExprn) Boolean operation
LbStmt ::= Labelled Statement
` : stmt Statement annotated with a location
LbStmt1; LbStmt2 Sequence of statements
stmt ::= Statement
skip marks the end of the thread
read(var) Read a shared variable var
write(var) Write to shared variable var
if (∗) then LbStmt1 else LbStmt2 conditional
while (∗) LbStmt while loop
lock(LkVar) Locks the mutex lock
. . . remaining statements as in Figure 2.1
non-deterministically and tracked. Given C written inW , we denote by Cabs the corresponding
abstract program written inWabs.
Abstract Syntax (Figure 6.6). In the figure, var denotes all shared program variables and
the dev variable. The syntax of all synchronisation primitives and the assumptions over guard
variables remains unchanged. The purpose of the guard variables is to improve the precision of
our otherwise coarse abstraction. Currently, they are inferred manually, but can presumably be
inferred automatically using an iterative abstraction-refinement loop. In our current benchmarks,
guard variables needed to be introduced in only three scenarios.
Abstract semantics. As before, we first define the semantics ofWabs for a single-thread.
Single-thread semantics (Figure 6.7). The abstract state of a single thread tid is given simply by
〈Vo, `〉 where Vo is a valuation of all lock, condition and guard variables and ` is the location of
the statement in tid to be executed next. We define the flow graph and successors for locations in
the abstract program tid in the same way as before. An abstract observable symbol is of the form:
(tid , θ, `), where θ ∈ {(read, sv), (write, sv), then, else, loop, exitloop}. The symbol θ records
the type of access to variables along with the variable name ((read, v), (write, v)) and records
non-deterministic branching choices {if, else, loop, exitloop}. Figure 6.7 presents the rules for
105
Figure 6.7 Partial set of rules for single-thread semantics ofWabs
stmt(`) = read(var) `′ = succ(`)
〈Vo, `〉
(tid ,(read,var),`)−−−−−−−−−→ 〈Vo, `′〉
READ
stmt(`) = write(var) `′ = succ(`)
〈Vo, `〉
(tid ,(write,var),`)−−−−−−−−−→ 〈Vo, `′〉
WRITE
stmt(`) = if (∗) then ls1 else ls2 `′ = succ1(`)
〈Vo, `〉
(tid ,then,`)−−−−−−→ 〈Vo, `′〉
IF1
stmt(`) = if (∗) then ls1 else ls2 `′ = succ2(`)
〈Vo, `〉
(tid ,else,`)−−−−−→ 〈Vo, `′〉
IF2
stmt(`) = while (∗) ls `′ = succ1(`)
〈Vo, `〉
(tid ,loop,`)−−−−−−→ 〈Vo, `′〉
WHILE1
stmt(`) = while (∗) ls `′ = succ2(`)
〈Vo, `〉
(tid ,exitloop,`)−−−−−−−−→ 〈Vo, `′〉
WHILE2
Figure 6.8 Example motivating branch-tagging
x := 0; y := 0
Thread T1
`1: x := 0
`2: if (y) then
`3: yield
end if




`6: x := 1
statements unique toWabs; the rules for statements common toWabs andW are the same.
Concurrent semantics. A state of an abstract concurrent program is either 〈terminated〉,
〈invalid〉, or is given by 〈Vo, ctid, (`1, . . . , `n)〉 where Vo is a valuation of all lock, condition
and guard variables, ctid is the current thread identifier and `1, . . . , `n are the locations of the
statements to be executed next in threads T1 to Tn, respectively. There is no 〈failed〉 state
because Wabs does not have assertions. The non-preemptive and preemptive semantics of a
concurrent program written inWabs are defined in the same way as that of a concurrent program
written inW minus the ASSERTION FAILURE rule.
106
Figure 6.9 Abstraction function fromW toWabs
⟪LoExp⟫` = (nothing)
⟪ShVar⟫` = ` : read(ShVar)
⟪op(ShVar ,LoExp1, . . . ,LoExpn)⟫` = ` : read(ShVar)
⟪LbStmt1; LbStmt2⟫ = ⟪LbStmt1⟫; ⟪LbStmt2⟫
⟪` : ShVar := LoExp⟫ = ` : write(ShVar)
⟪` : LoVar := ShExp⟫ = ⟪ShExp⟫`
⟪` : ShVar := havoc⟫ = ` : write(ShVar)
⟪` : ShVar := input(ch)⟫ = ` : write(dev); ` : write(ShVar)
⟪` : output(ShVar , ch)⟫ = ` : read(ShVar); ` : write(dev)
⟪` : if (ShExp) then LbStmt1 else LbStmt2⟫ = ⟪ShExp⟫`; ` : if (∗) then ⟪LbStmt1⟫ else ⟪LbStmt2⟫
⟪` : while (ShExp) LbStmt⟫ = ⟪ShExp⟫`; ` : while (∗) ⟪LbStmt⟫; ⟪ShExp⟫`
⟪` : assert(ShExp)⟫ = (nothing)
⟪` : await(ShExp)⟫ = ⟪ShExp⟫`; ` : while (∗) ⟪LbStmt⟫
⟪` : lock(LkVar)⟫ = ` : lock(LkVar)
. . .
6.3.1 Abstraction function (Figure 6.9)
A thread in W can be translated to Wabs using the abstraction function ⟪⟫. The abstraction
replaces all global variable access with read(var) and write(var) and replaces branching condi-
tions with nondeterminism (∗). Since we will use the abstraction only for programs with implicit
specification the assert statements are removed during the translation. All synchronisation
primitives remain unaffected by the abstraction. The abstraction may result in duplicate labels `,
which are replaced by fresh labels. goto statements are rewritten accordingly. For example in
our running example in Figure 6.1 the abstraction of `1 results in two abstract labels `1a and `1b
in Figure 6.3.
6.4 Checking Preemption-safety
6.4.1 Reduction of Preemption-safety to Language Inclusion
Soundness of the abstraction. Formally, two observable behaviours ω1 = α0 . . . αk and
ω2 = β0 . . . βk of an abstract program Cabs in Wabs are equivalent if:
(A1) For each thread tid , the subsequences of α0 . . . αk and β0 . . . βk containing only symbols
of the form (tid , a, `), with a ∈ {(read, var), (write, var), then, else, loop, loopexit} are
equal,
107
(A2) For each variable var, the subsequences of α0 . . . αk and β0 . . . βk containing only write
symbols (of the form (tid , (write, var), `)) are equal and
(A3) For each variable var, the multisets of symbols of the form (tid , (read, var), `) between
any two write symbols, as well as before the first write symbol and after the last write
symbol are identical.
Using this notion of equivalence, the notion of preemption-safety is extended to abstract programs:
Given abstract concurrent programs Cabs and C ′abs in Wabs such that C ′abs is obtained by adding
locks to Cabs, C ′abs is preemption-safe w.r.t. Cabs if [[C ′abs]]P babs [[Cabs]]NP .
For the abstraction to be sound we require only that whenever preemption-safety does not
hold for a program C, then there must be a trace in its abstraction Cabs feasible under preemptive,
but not under non-preemptive semantics.
To illustrate this we use the program in Figure 6.8, which is not preemption-safe. To see
this consider the observation (T1, out, 10, ch) that cannot occur in the non-preemptive semantics
because x is always 0 at `4. Note that `3 is unreachable because the variable y is initialised to
0 and never assigned. With the preemptive semantics the output can be observed if thread T2
interrupts thread T1 between lines `1 and `4. An example trace would be `1; `6; `2; `4; `5.
If we consider the abstract semantics, we notice that under the non-preemptive abstract
semantics `3 is reachable because the abstraction makes the branching condition in `2 non-
deterministic. However, since our abstraction is sound there must still be an observation se-
quence that is observable under the abstract preemptive semantics, but not under the abstract
non-preemptive semantics. This observation sequence is (T1, (write, x), `1), (T2, (write, x), `6),
(T1, (read, y), `2), (T1, else, `2), (T1, (read, x), `4), (T1, then, `2), (T1, (write, dev), `5). The branch
tagging records that the else branch is taken in `2. The non-preemptive semantics cannot produce
this observation sequences because it must also take the else branch in `2 and can therefore not
reach the yield statement and context-switch. As a site note, it is also not possible to transform
this observation sequence into an equivalent one under the non-preemptive semantics because of
the write to x at `6 and the accesses to x in `1 and `4.
This example illustrates why branch tagging is crucial to soundness of the abstraction. If
we assume a hypothetical abstract semantics without branch tagging we would get the follow-
ing preemptive observation sequence: (T1, (write, x), `1), (T2, (write, x), `6), (T1, (read, y), `2),
(T1, (read, x), `4), (T1, (write, dev), `5). This sequence would also be a valid observation se-
108
quence under the non-preemptive semantics, because it could take the then branch in `2 and
reach the yield statement and context-switch.
Theorem 6.4.1 (soundness). Given concurrent program C and a synthesised program C ′ obtained
by adding locks to C, [[C ′abs]]P babs [[Cabs]]NP =⇒ [[C ′]]P b [[C]]NP .
Proof. It is easier to prove the contrapositive: [[C ′]]P 6b [[C]]NP =⇒ [[C ′abs]]P 6babs [[Cabs]]NP .
[[C ′]]P 6b [[C]]NP means that there is an observation sequence ω′ of [[C ′]]P with no equival-
ent observation sequence in [[C]]NP . We now show that the abstract sequence ω′abs in [[C ′abs]]P
corresponding to the sequence ω′ has no equivalent sequence in [[Cabs]]NP .
Towards contradiction we assume there is such an equivalent sequence ωabs in [[Cabs]]NP . We
show that if ωabs indeed existed it would correspond to a concrete sequence ω that is equivalent
to ω′, thereby contradicting our assumption.
By (A1) ωabs would have the same control flow as ω′abs because of the branch tagging. By
(A2) and (A3) ωabs would have the same data-flow, meaning all reads from global variables
are reading the values written by the same writes as in ω′abs. Since all interactions with the
environment are abstracted to write(dev) the order of interactions must be the same between ωabs
and ω′abs. This means that, assuming all inputs and havocs are returning the same value, in the
execution ω corresponding to ωabs all variables valuation are identical to those in ω′. Therefore,
ω is feasible and its interaction with the environment is identical to ω′ as all variable valuations
are identical. Identical interaction with the environment is how equivalence between ω and ω′ is
defined. This concludes our proof.
Language inclusion modulo an independence relation. We define the problem of language
inclusion modulo an independence relation. Let I be a non-reflexive, symmetric binary relation
over an alphabet Σ. We refer to I as the independence relation and to elements of I as independent
symbol pairs. We define a symmetric binary relation ≈I over words in Σ∗: for all words
σ, σ′ ∈ Σ∗ and (α, β) ∈ I , (σ · αβ · σ′, σ · βα · σ′) ∈≈I . Let ≈tI denote the reflexive transitive
closure of ≈I .1 Given a language L over Σ, the closure of L w.r.t. I , denoted CloI(L), is the set
{σ ∈ Σ∗: ∃σ′ ∈ L with (σ, σ′) ∈ ≈tI}. Thus, CloI(L) consists of all words that can be obtained
from some word in L by repeatedly commuting adjacent independent symbol pairs from I .
1The equivalence classes of ≈tI are Mazurkiewicz traces.
109
Definition 6.4.2 (Language inclusion modulo an independence relation). Given NFAs A,B over
a common alphabet Σ and an independence relation I over Σ, the language inclusion problem
modulo I is: L(A) ⊆ CloI(L(B))?
Data independence relation. We define the data independence relation ID over our observable
symbols. Two symbols α = (tidα, aα, `α) and β = (tidβ, aβ, `β) are independent, (α, β) ∈ ID,
iff (I0) tidα 6= tidβ and one of the following hold:
(I1) aα or aβ in {then, else, loop, loopexit}
(I2) aα and aβ are both (read, var)
(I3) aα is in {(write, varα), (read, varα)} and aβ is in {(write, varβ), (read, varβ)} and
varα 6= varβ
Checking preemption-safety. Under abstraction, we model each thread as a nondeterministic
finite automaton (NFA) over a finite alphabet consisting of abstract observable symbols. This
enables us to construct NFAs NPabs and P′abs accepting the languages [[Cabs]]NP and [[C ′abs]]P ,
respectively. Cabs is the abstract program corresponding to the input program C and C ′abs is the
program corresponding to the result of the synthesis C ′. It turns out that preemption-safety of C ′
w.r.t. C is implied by preemption-safety of C ′abs w.r.t. Cabs, which, in turn, is implied by language
inclusion modulo ID of NFAs P′abs and NPabs. NFAs P
′
abs and NPabs satisfy language inclusion
modulo ID if any word accepted by P′abs is equivalent to some word obtainable by repeatedly
commuting adjacent independent symbol pairs in a word accepted by NPabs.
Proposition 6.4.3. Given concurrent programs C and C ′, [[C ′abs]]P babs [[Cabs]]NP iff L(P′abs) ⊆
CloID(L(NPabs)).
Proof. By construction P′abs, resp. NPabs, accept exactly the the observation sequences that C ′abs,
resp. Cabs, may produce under the preemptive, resp. non-preemptive, semantics (denoted by
[[C ′abs]]P , resp. [[Cabs]]NP ). It remains to show that two observation sequences ω1 = α0 . . . αk and
ω2 = β0 . . . βk are equivalent iff ω1 ∈ CloID({ω2}).
We first show that ω1 ∈ CloID({ω2}) implies ω1 is equivalent to ω2. The proof proceeds by
induction: The base case is that no symbols are swapped and is trivially true. The inductive case
assumes that ω′ is equivalent to ω2 and we needs to show that after one single swap operation in
110
ω′, resulting in ω′′, ω′ is equivalent to ω′′ and therefore by transitivity also equivalent to ω2. Rule
(A1) holds because ID does not allow symbols of the same thread to be swapped (I0). To prove
(A2) we use the fact that writes to the same variable cannot be swapped (I2), (I3). To prove (A3)
we use the fact that reads and writes to the same variable are not independent (I2), (I3).
It remains to show that ω1 is equivalent to ω2 implies ω1 ∈ CloID({ω2}). Clearly ω1 and ω2
consist of the same multiset of symbols (A1). Therefore it is possible to transform ω2 into ω1
by swapping adjacent symbols. It remains to show that all swaps involve independent symbols.
By (A1) the order of events in each thread does not change, therefore condition (I0) is always
fulfilled. Branch tags can swap with every other symbol (I1) and accesses to different variables
can swap with each other (I3). For each variables ShVar (A2) ensures that writes are in the same
order and (A3) allows reads in between to be reordered. These swaps are allowed by (I2). No
other swaps can occur.
6.4.2 Checking Language Inclusion
We first focus on the problem of language inclusion modulo an independence relation (Defini-
tion 6.4.2). This question corresponds to preemption-safety (Theorem 6.4.1, Proposition 6.4.3)
and its solution drives our synchronisation synthesis.
Theorem 6.4.4. For NFAs A,B over alphabet Σ and a symmetric, irreflexive independence
relation I ⊆ Σ× Σ, the problem L(A) ⊆ CloI(L(B)) is undecidable [Bertoni et al., 1982].
We now show that this general undecidability result extends to our specific NFAs and
independence relation ID.
Theorem 6.4.5. For NFAs P′abs and NPabs constructed from Cabs, the problem L(P′abs) ⊆
CloID(L(NPabs)) is undecidable.
Proof. Our proof is by reduction from the language inclusion modulo an independence relation
problem (Definition 6.4.2). Theorem 6.4.5 follows from the undecidability of this problem
(Theorem 6.4.4).
Assume we are given NFAs A = (QA,Σ,∆A, Qι,A, FA) and B = (QB,Σ,∆B, Qι,B, FB)
and an independence relation I ⊆ Σ× Σ. Without loss of generality we assume A and B to be
deterministic, complete, and free of ε-transitions, meaning from every state there is exactly one
111
Figure 6.10 Simulator algorithm
Thread T1
`1: while (*) do
`2: signal(ch-sym) . choose symbol
`3: wait_reset(ch-sym-compl)
`4: sA1 ← ∆1A(sA1, . . . , sAn, τ 1, . . . , τ p)
`5: . . .
`6: sAn ← ∆nA(sA1, . . . , sAn, τ 1, . . . , τ p)
`7: sB1 ← ∆1B(sB1, . . . , sBm, τ 1, . . . , τ p)
`8: . . .



















`14: while (*) do
`15: wait_reset(ch-sym)
`16: τ 1 ← α1
`17: . . .






transition for each symbol. We show that we can construct a program Cabs that is preemption-safe
iff L(A) ⊆ CloI(L(B)).
For our reduction we construct a program Cabs that simulates A or B if run with a preemptive
scheduler and simulates onlyB if run with a non-preemptive scheduler. Note thatL(A)∪L(B) ⊆
CloI(L(B)) iff L(A) ⊆ CloI(L(B)). For every symbol α ∈ Σ our simulator produces a
sequence ωα of abstract observable symbols. We say two such sequences ωα and ωβ commute if
ωα ·ωβ ≈tID ωβ ·ωα, i.e, if ωβ ·ωα can be obtained from ωα ·ωβ by repeatedly swapping adjacent
symbol pairs in ID.
We will show that (a) Cabs simulates A or B if run with a preemptive scheduler and simulates
only B if run with a non-preemptive scheduler, and (b) sequences ωα and ωβ commute iff
(α, β) ∈ I .
The simulator is shown in Figure 6.10. States and symbols of A and B are mapped to
natural numbers and represented as bitvectors to enable simulation using the languageWabs.
In particular we use Boolean guard variables from Wabs to represent the bitvectors. We use
true to represent 1 and false to represent 0. As the state space and the alphabet are finite we
know the number of bits needed a priori. We use n, m, and p for the number of bits needed to
represent QA, QB, and Σ, respectively. The transition functions ∆A and ∆B likewise work on
the individual bits. We represent bitvector x of length n as x1 . . . xn.
Thread T1 simulates both automata A and B simultaneously. We assume the initial states
112
of A and B are mapped to the number 0. In each iteration of the loop in thread T1 a symbol
α ∈ Σ is chosen non-deterministically and applied to both automata (we discuss this step in the
next paragraph). Whether thread T1 simulates A or B is decided only in the end: depending
on the value of simA we assert that a final state of A or B was reached. The value of simA is
assigned in thread T2 and can only be true if T2 is preempted between locations `12 and `13.
With the non-preemptive scheduler the variable simA will always be false because thread T2
cannot be preempted. The simulator can only reach the 〈terminated〉 state if all assumptions
hold as otherwise it would end in the 〈invalid〉 state. The guard final will only be assigned
true in `11 if either simA is false and a final state of B has been reached or if simA is true
and a final state of A has been reached. Therefore the valid non-preemptive executions can only
simulate B. In the preemptive setting the simulator can simulate either A or B because simA
can be either true or false. Note that the statement in location `11 executes atomically and the
value of simA cannot change during its evaluation. This means that P′abs simulates L(A) ∪ L(B)
and NPabs simulates L(B).
We use τ to store the symbol used by the transition function. The choice of the next symbol
needs to be non-deterministic to enable simulation of A,B and there is no havoc statement in
Wabs. We therefore use the fact that the next thread to execute is chosen non-deterministically
at a preemption point. We define a thread Tα for every α ∈ Σ that assigns to τ the number
α maps to. Threads Tα can only run if the conditional variable ch-sym is set to 1 by the
notify statement in `2. The wait_reset(ch-sym-compl) in `3 is a preemption point for the non-
preemptive semantics. Then, exactly one thread Tα can proceed because the wait_reset(ch-sym)
statement in `15 atomically resets ch-sym to 0. After setting τ and outputting the representation
of α thread Tα, notifies thread T1 using condition variable ch-sym-compl. Another symbol
can only be produced in the next loop iteration of T1.
To produce an observable sequence faithful to I for each symbol in Σ we define a homo-
morphism h that maps symbols from Σ to sequences of observables. Assuming the symbol
α ∈ Σ is chosen, we produce the following observables:
• Loop tag. To output α the thread Tα has to perform one loop iteration. This implicitly
produces a loop tag (Tα, loop, `14).
• Conflict variables. For each pair of (α, αi) /∈ I , we define a conflict variable v{α,αi}. Note
that v{α,αi} = v{αi,α} and two writes to v{α,αi} do not commute under ID. For each αi, we
produce a tag (Tα, (write, v{α,αi}, `oi)). Therefore if two variables α1 and α2 are dependent
113
the observation sequences produced for each of them will contain a write to v{α1,α2}.
Formally, the homomorphism h is given by h(α) = (Tα, loop, `14); (Tα, (write, v{α,α1}), `o1);
· · · ; (Tα, (write, v{α,αk}), `ok). For a sequence σ = α1 . . . αn use define h(σ) = h(α1) . . . h(αn).
We show that (α1, α2) ∈ I iff h(α1) and h(α2) commute. The loop tags are independent iff
α1 6= α2. If α1 = α2 then (α1, α2) /∈ I and h(α1) and h(α2) do not commute due to the loop
tags. Assuming (α1, α2) ∈ I then h(α1) and h(α2) commute because they have no common
conflict variable they write to. On the other hand, if (α1, α2) /∈ I , then both h(α1) and h(α2)
will contain (Tα{1,2} , (write, v{α1,α2}), `oi) and therefore cannot commute. We extend this result
to sequences and have that h(σ′) ≈tID h(σ) iff σ
′ ≈tI σ.
This concludes our reduction. It remains to show that Cabs is preemption-safe iff L(A) ⊆
CloI(L(B)). By Proposition 6.4.3 it suffices to show that L(A) ⊆ CloI(L(B)) iff L(P′abs) ⊆
CloID(L(NPabs)).
1. We assume that L(A) ⊆ CloI(L(B)). Then, for every word σ ∈ L(A) we have that
σ ∈ CloI(L(B)). By construction h(σ) ∈ L(P′abs). It remains to show that h(σ) ∈
CloID(L(NPabs)). By σ ∈ CloI(L(B)) we know there exists a word σ′ ∈ L(B), such that
σ′ ≈tI σ. Therefore also h(σ′) ≈tID h(σ) and by construction h(σ
′) ∈ L(NPabs).
2. We assume that L(A) 6⊆ CloI(L(B)). Then, there exists a word σ ∈ L(A) such that
σ /∈ CloI(L(B)). By construction h(σ) ∈ L(P′abs). Let us assume towards contradiction
that h(σ) ∈ CloID(L(NPabs)). Then there exists a word ω in L(NPabs) such that ω ≈tID
h(σ). By construction, this implies there exists some σ′ ∈ L(B) such that ω = h(σ′)
and h(σ′) ≈tID h(σ). Thus, there exists σ
′ ∈ L(B) such that σ′ ≈tI σ. This implies
σ ∈ CloI(L(B)), which is a contradiction.
Fortunately, a bounded version of the language inclusion modulo I problem is decidable.
Recall the relation≈I over Σ∗ from Section 6.4.1. We define a symmetric binary relation≈iI over
Σ∗: (σ, σ′) ∈≈iI iff ∃(α, β) ∈ I: (σ, σ′) ∈≈I , σ[i] = σ′[i + 1] = α and σ[i + 1] = σ′[i] = β.
Thus ≈iI consists of all words that can be obtained from each other by commuting the symbols
at positions i and i+ 1. We next define a symmetric binary relation  over Σ∗: (σ, σ′) ∈ iff
∃σ1, . . . , σt: (σ, σ1) ∈≈i1I , . . . , (σt, σ′) ∈≈
it+1
I and i1 < . . . < it+1. The relation  intuitively
consists of words obtained from each other by making a single forward pass commuting multiple
pairs of adjacent symbols. We recursively define k as follows: 0 is the identity relation
114
id. For k > 0 we define k= ◦ k−1, the composition of  with k−1. Given a language
L over Σ, we use Clok,I(L) to denote the set {σ ∈ Σ∗ : ∃σ′ ∈ L with (σ, σ′) ∈k}. In
other words, Clok,I(L) consists of all words which can be generated from L using a finite-state
transducer that remembers at most k symbols of its input words in its states. By definition we
have Clo0,I(L) = L.
Definition 6.4.6 (Bounded language inclusion modulo an independence relation). Given NFAsA,B
over Σ, I ⊆ Σ× Σ and a constant k > 0, the k-bounded language inclusion problem modulo I
is: L(A) ⊆ Clok,I(L(B))?
We present an algorithm to check k-bounded language inclusion modulo I , based on the
antichain algorithm for standard language inclusion [de Wulf et al., 2006].
6.4.3 Antichain algorithm for language inclusion
Given a partial order (X,v), an antichain over X is a set of elements of X that are incomparable
w.r.t. v. In order to check L(A) ⊆ L(B) for NFAs A = (QA,Σ,∆A, Qι,A, FA) and B =
(QB,Σ,∆B, Qι,B, FB), the antichain algorithm proceeds by exploring A and B in lockstep.
Without loss of generality we assume that A and B do not have ε-transitions. While A is
explored nondeterministically, B is determinised on the fly for exploration. The algorithm
maintains an antichain, consisting of tuples of the form (sA, SB), where sA ∈ QA and SB ⊆ QB .
The ordering relation v is given by (sA, SB) v (s′A, S ′B) iff sA = s′A and SB ⊆ S ′B. The
algorithm also maintains a frontier set of tuples yet to be explored.
Given state sA ∈ QA and a symbol α ∈ Σ, let succα(sA) denote {s′A ∈ QA : (sA, α, s′A) ∈
∆A}. Given set of states SB ⊆ QB , let succα(SB) denote {s′B ∈ QB : ∃sB ∈ SB : (sB, α, s′B) ∈




In each step, the antichain algorithm explores A and B by computing α-successors of all
tuples in its current frontier set for all possible symbols α ∈ Σ. Whenever a tuple (sA, SB) is
found with sA ∈ FA and SB ∩ FB = ∅, the algorithm reports a counterexample to language
inclusion. Otherwise, the algorithm updates its frontier set and antichain to include the newly
computed successors using the two rules enumerated below. Given a newly computed successor
tuple p′, if there does not exist a tuple p in the antichain with p v p′, then p′ is added to the
115
frontier set or antichain (Rule R1). If p′ is added and there exist tuples p1, . . . , pn in the antichain
with p′ v p1, . . . , pn, then p1, . . . , pn are removed from the antichain (Rule R2). The algorithm
terminates by either reporting a counterexample, or by declaring success when the frontier
becomes empty.
6.4.4 Antichain algorithm for k-bounded language inclusion modulo I
This algorithm is essentially the same as the standard antichain algorithm, with the automaton B
above replaced by an automaton Bk,I accepting Clok,I(L(B)). The set QBk,I of states of Bk,I
consists of triples (sB, η1, η2), where sB ∈ QB and η1, η2 are words over Σ of up to k length.
Intuitively, the words η1 and η2 store symbols that are expected to be matched later along a run.
The word η1 contains a list of symbols for transitions taken by Bk,I , but not yet matched in B,
whereas η2 contains a list of symbols for transitions taken in B, but not yet matched in Bk,I . We
use ∅ to denote the empty list. Since for every transition of Bk,I , the automaton B will perform
one transition, we have |η1| = |η2|. The set of initial states of Bk,I is {(sB, ∅, ∅) : sB ∈ Qι,B}.
The set of final states of Bk,I is {(sB, ∅, ∅) : sB ∈ FB}. The transition relation ∆Bk,I is construc-
ted by repeatedly performing the following steps, in order, for each state (sB, η1, η2) and each
symbol α. In what follows, η[\i] denotes the word obtained from η by removing its ith symbol.
Given (sB, η1, η2) and α
• Step S1: Pick new s′B and β ∈ Σ such that (sB, β, s′B) ∈ ∆B
• Step S2:
(a) If ∀i: η1[i] 6= α and α is independent of all symbols in η1,
η′2 := η2 · α and η′1 := η1,
(b) else, if ∃i: η1[i] = α and α is independent of all symbols in η1 prior to i, η′1 := η1[\i]
and η′2 := η2
(c) else, go to S1
• Step S3:
(a) If ∀i: η′2[i] 6= β and β is independent of all symbols in η′2,
η′′1 :=η
′
1 · β and η′′2 := η′2,
(b) else, if ∃i: η′2[i] = β and β is independent of all symbols in η′2 prior to i, η′2 := η′2[\i]












q0, ∅, ∅ start
q1, ∅, ∅ q2, β, α q2, ∅, ∅ q1, α, β
q2, β, α q2, ∅, ∅ q2, ∅, ∅ q2, α, β
B1,{(α,β)}:
α α β β
α β α β
(c) else, go to S1
• Step S4: Add ((sB, η1, η2), α, (s′B, η′′1 , η′′2)) to ∆Bk,I and go to 1.
Example 6.4.7. In Figure 6.11, we have an NFA B with L(B) = {αβ, β}, I = {(α, β)} and
k = 1. The states of Bk,I are triples (q, η1, η2), where q ∈ QB and η1, η2 ∈ {α, β}∗. We
explain the derivation of a couple of transitions of Bk,I . The transition shown in bold from
(q0, ∅, ∅) on symbol β is obtained by applying the following steps once: S1. Pick q1 following
the transition (q0, α, q1) ∈ ∆B. S2(a). η′2 := β, η′1 := ∅. S3(a). η′′1 := α, η′′2 := β. S4. Add
((q0, ∅, ∅), β, (q1, α, β)) to ∆Bk,I . The transition shown in bold from (q1, α, β) on symbol α is
obtained as follows: S1. Pick q2 following the transition (q1, β, q2) ∈ ∆B. S2(b). η′1 := ∅,
η′2 := β. S3(b). η
′′
2 := ∅, η′′1 := ∅. S4. Add ((q1, α, β), β, (q2, ∅, ∅)) to ∆Bk,I . It can be seen that
Bk,I accepts the language {αβ, βα, β} = Clok,I(L(B)).
Proposition 6.4.8. Given k ≥ 0, the automaton Bk,I accepts at least Clok,I(L(B)).
Proof. The proof is by induction on k. The base case is trivially true, as L(B0,I) = L(B) =
Clo0,I(L(B)). The induction case assumes that Bk,I accepts at least Clok,I(L(B)) and we want
to show that Bk+1,I accepts at least Clok+1,I(L(B)). We take a word ω ∈ Clok+1,I(L(B)). It
must be derived from a word ω′ ∈ Clok,I(L(B)) by one additional forward pass of swapping.
Bk+1,I accepts ω: In step S1 we pick the same transitions in ∆B as to accept ω′. Steps S2 and
S3 will be identical as for ω′ with the exception of those adjacent symbol pairs that are newly
swapped in ω. For those pairs the symbols are first added to η2 and η1 by S2 and S3. In the next
step they are removed because the swapping only allows adjacent symbols to be swapped. This
also shows that the bound k + 1 suffices to accept ω.
In general NFA Bk,I can accept words not in Clok,I(L(B)). Intuitively this is because Bk,I
has two stacks and can also accept words where the swapping is done in a backward pass (instead
117
of a forward pass required in our definition). For our purposes it is sound to accept more words
as long as they are obtained only by swapping independent symbols.
Proposition 6.4.9. Given k ≥ 0, the automaton Bk,I accepts at most CloI(L(B)).
Proof. We need to show that ω′ ∈ Bk,I =⇒ ω′ ∈ CloI(L(B)). For this we need to show that
ω′ is a permutation of a word ω ∈ L(B) by repeatedly swapping independent, adjacent symbols.
The word ω′ must be a permutation of ω because Bk,I only accepts if η1 and η2 are empty and
the stacks represent exactly the symbols not matched yet in NFA B. Further, we need to show
only independent symbols may be swapped. The stack η1 contains the symbols not yet matched
by B and η2 the symbols that were instead accepted by B, but not yet presented as input to Bk,I .
Before adding a new symbol to the stack we ensure it is independent with all symbols on the
other stack because once matched later it will have to come after all of these. When a symbols
is removed it is ensured that it is independent with all symbols on its own stack because it is
practically moved ahead of the other symbols on the stack.
6.4.5 Language inclusion check algorithm
We develop an algorithm to check language inclusion modulo I (Section 6.4.4) by iteratively
increasing the bound k. The algorithm is incremental: the check for k + 1-bounded language
inclusion modulo I only explores paths along which the bound k was exceeded in the previous
iteration.
The algorithm for k-bounded language inclusion modulo I is presented as function INCLU-
SION in Algorithm 6.1 (ignore Lines 25-29 for now) . The antichain set consists of tuples of
the form (sA, SBk,I ), where sA ∈ QA and SBk,I ⊆ QB × Σk × Σk. The frontier consists of
tuples of the form (sA, SBk,I , cex), where cex ∈ Σ∗. The word cex is a sequence of symbols
of transitions explored in A to get to state sA. If the language inclusion check fails, cex is
returned as a counterexample to language inclusion modulo I . Each tuple in the frontier set
is first checked for equivalence w.r.t. acceptance (Line 20). If this check fails, the function
reports language inclusion failure and returns the counterexample cex (Line 20). If this check
succeeds, the successors are computed (Line 23). If a successor satisfies rule R1, it is ignored
(Line 24), otherwise it is added to the frontier (Line 31) and the antichain (Line 32). When
adding a successor to the frontier the symbol α it appended to the counterexample, denoted as
118
Algorithm 6.1 Checking language inclusion modulo I
Require: Automata A = (QA,Σ,∆A, Qι,A, FA), B = (QB,Σ,∆B, Qι,B, FB) and independ-
ence relation I ⊆ Σ× Σ
Ensure: true iff L(A) ⊆ CloI(L(B))
1: frontier ← {(sA, {(Qι,B, ∅, ∅)}, ∅) : sA ∈ Qι,A}
2: No tuple in frontier is dirty
3: antichain ← frontier
4: overflow ← ∅
5: k ← 2
6: while true do
7: cex← INCLUSION(k)
8: if cex 6= true ∧ cex is spurious then
9: k ← k + 1
10: frontier ← {(sA, SBk,I ) ∈ frontier : SBk,I not dirty} ∪ overflow
11: antichain ← {(sA, SBk,I ) ∈ antichain : SBk,I not dirty} ∪ overflow






18: while frontier 6= ∅ do
19: remove a tuple (sA, SBk,I , cex) from frontier
20: if sA ∈ FA ∧ (SBk,I ∩ FB) = ∅ then return cex
21: end if




)← succα(sA, SBk,I )
24: if @p ∈ antichain : p v (s′A, S ′Bk,I ) then . Rule R1
25: if ∃(sB, η1, η2) ∈ S ′Bk,I : |η1| > k ∨ |η2| > k then







28: S ′Bk,I ← {(sB, η1, η2) ∈ S
′
Bk,I
: |η1| ≤ k ∧ |η2| ≤ k}
29: Mark S ′Bk,I dirty
30: end if
31: frontier ← frontier ∪ {(s′A, S ′Bk,I , cex · α)}












cex · α. During the update of the antichain the algorithm ensures that its invariant is preserved
according to rule R2.
We need to ensure that our language inclusion honours the bound k by ignoring states that
exceed the bound. These states are stored for later to allow for a restart of the language inclusion
algorithm with a higher bound. Given a newly computed successor (s′A, S
′
Bk,I
) for an iteration
with bound k, if there exists some (sB, η1, η2) in S ′Bk,I such that the length of η1 or η2 exceeds k
(Line 25), we remember the tuple (s′A, S
′
Bk,I
) in the set overflow (Line 27). We then prune S ′Bk,I
by removing all states (sB, η1, η2) where |η1| > k ∨ |η2| > k (line 28) and mark S ′Bk,I as dirty
(line 28). If we find a counterexample to language inclusion we return it and test if it is spurious
(Line 8). In case it is spurious we increase the bound to k + 1, remove all dirty items from the
antichain and frontier (lines 10-11), and add the items from the overflow set (Line 12) to the
antichain set and frontier. Intuitively this will undo all exploration from the point(s) the bound
was exceeded and restarts from that/those point(s).
We call a counterexample cex from our language inclusion algorithm spurious if it is not a
counterexample to the unbounded language inclusion, formally cex ∈ CloI(L(B)). This test is
decidable because there is only a finite number of permutations of cex. This spuriousness arises
from the fact that the bounded language-inclusion algorithm is incomplete and every spurious
example can be eliminated by sufficently increasing the bound k. Note, however, that there exists
automata and independence relations for which there is a (different) spurious counterexample for
every k. In practice we test if a cex is spurious by building an automata A that accepts exactly
cex and running the language inclusion algorithm algorithm with k being the length of cex. This
is very fast because there is exactly one path through A.
Theorem 6.4.10 (bounded language inclusion check). The procedure INCLUSION of Algorithm 6.1
decides L(A) ⊆ L(Bk,I) for NFAs A, B, bound k, and independence relation I .
Proof. Our algorithm takes as arguments automata A and B. Conceptually, the algorithm
constructs Bk,I and uses the antichain algorithm [de Wulf et al., 2006] to decide the language
inclusion. For efficiency, we modify the original antichain language inclusion algorithm to
construct the automaton BI on the fly in the successor relation succ (line 23). The bound k is
enforced separately in line 25.
Theorem 6.4.11 (preemption-safety problem). If program C is not preemption-safe ([[C]]P 6b
[[C]]NP ), then Algorithm 6.1 will return false.
120
Proof. By Theorem 6.4.1 we know [[Cabs]]P 6babs [[Cabs]]NP . From Proposition 6.4.3 we get
L(Pabs) * CloID(L(NPabs)). From Proposition 6.4.9 we know that for any k this is equivalent
to L(Pabs) * L(Bk,I), where B = NPabs. Theorem 6.4.10 shows that Algorithm 6.1 decides
this for any bound k.
6.5 Finding and Enforcing Mutex Constraints
If the language inclusion check fails it returns a counterexample trace. Using this counterexample
we derive a set of mutual exclusion (mutex) constraints that we enforce in P′abs to eliminate the
counterexample and then rerun the language inclusion check with the new P′abs.
6.5.1 Finding Mutex Constraints
The counterexample cex returned by the language inclusion check is a sequence of observables.
Since our observables record every branching decision it is easy to reconstruct from cex a trace
π: tid0.`0; . . . ; tidn.`n, where each `i is a location identifier from Cabs.
Recall the definition of good, bad neighbourhoods and HB-formulæ from Section 2.2. In our
setting good traces are those that are equivalent to a non-preemptive trace and all other feasible
traces are bad.
Non-preemptive neighbourhood. First, we define function Φ to extract a conjunction of
atomic ordering constraints from a trace π, such that all traces π′ in Φ(π) produce an observation
sequence equivalent to π. Then, we obtain a correctness constraint ϕ that represents all good
traces in nhood(cex). Remember, that the good traces are those that are observationally equival-
ent to a non-preemptive trace. The correctness constraint ϕ is a disjunction over the ordering




Φ(π) enforces the order between conflicting accesses in the abstract trace π:
Φ(π) =
∧
{Ti.`j <Tk.`l : i 6= k ∧ Ti.`j precedes Tk.`l in π∧
Ti.`j, Tk.`l access same variable ∧ Ti.`j or Tk.`l is a write}
121
Algorithm 6.2 Counterexample enumeration and generalisation algorithm
Require: Trace π, formula of good traces ϕG in nhood(π)
Ensure: HB-formula of bad traces ϕB
1: Ψ← quantifier-free first-order formula representing all feasible traces in nhood(π)
2: ΨB ← Ψ ∧ ¬ϕG
3: ϕB ← false
4: while ΨB ∧ ¬ϕB is satisfiable do
5: ψ ← satisfying assignment for ΨB ∧ ¬ϕB
6: σ ← trace represented by ψ
7: . Conflicting access analysis
8: ϕB′ ← Φ(σ)
9: . Unsat-core computation
10: ϕB′′ ← MinUNSATCore(Soft← ϕB′ ,
11: Hard← Ψ ∧ ϕG)
12: ϕB ← ϕB ∨ ϕB′′
13: end while
14: return ϕB
Example. Recall the counterexample trace from the running example in Section 6.1.1: cex =
T1.`1a; T1.`1b; T2.`1a; T2.`1b; T1.`2; T2.`2; T2.`3a; T2.`3b; T2.`4; T1.`3a; T1.`3b; T1.`4. There
are two feasible traces in N gπ :
• π1 = T1.`1a; T1.`1b; T1.`2; T1.`3a; T1.`3b; T1.`4; T2.`1a; T2.`1b; T2.`2; T2.`3a; T2.`3b;
T2.`4 and
• π2 = T2.`1a; T2.`1b; T2.`2; T2.`3a; T2.`3b; T2.`4; T1.`1a; T1.`1b; T1.`2; T1.`3a; T1.`3b;
T1.`4.
We represent
• π1 as Φ(π1) = ({T1.`1a, T1.`3a, T1.`3b} < T2.`3b) ∧ (T1.`3b < {T2.`1a, T2.`3a, T2.`3b}) ∧
(T1.`2 < T2.`2) and
• π2 as Φ(π2) = (T2.`3b < {T1.`1a, T1.`3a, T1.`3b}) ∧ ({T2.`1a, T2.`3a, T2.`3b} < T1.`3b) ∧
(T2.`2 < T1.`2).
The correctness specification is Θ = Φ(π1) ∨ Φ(π2).
Counterexample enumeration and generalisation. We next build a quantifier-free first-order
formula ΨB over the event identifiers in cex such that any model of ΨB corresponds to a bad,
feasible trace in nhood(cex). A trace is feasible if it respects the preexisting synchronisation,
which is not abstracted away. Bad traces are those that are feasible under the preemptive semantics
and not in ϕG. Further, we define a generalisation function G that works on conjunctions of
122
atomic ordering constraints ϕ by iteratively removing a constraint as long as the intersection of
traces represented by G(ϕ) and ϕG is empty. This results in a local minimum of atomic ordering
constraints in G(ϕ), so that removing any remaining constraint would include a good trace in
G(ϕ). We iteratively enumerate models ψ of ΨB, building a constraint ϕB′ = Φ(ψ) for each
model ψ and generalising ϕB′ to represent a larger set of bad traces using G. This results in an
ordering constraint in disjunctive normal form ϕB =
∨
ψ∈ΨB G(Φ(ψ)), such that the intersection
of ϕB and ϕG is empty and the union equals nhood(cex).
Algorithm 6.2 shows how the algorithm works. For each model ψ of ΨB a trace σ is
extracted in Line 6. From the trace the formula ϕB′ is extracted using Φ described above (Line 8).
Line 10 describes the generalisation function G, which is implemented using an unsat core
computation. We construct a formula ϕB′ ∧ Ψ ∧ ϕG, where Ψ ∧ ϕG is a hard constraint and
ϕ′B are soft constraints. A satisfying assignment to this formula models feasible traces that
are observationally equivalent to a non-preemptive trace. Since σ is a bad trace the formula
ϕB′ ∧Ψ ∧ ϕG must be unsatisfiable. The result of the unsat core computation is a formula ϕB′′
that is a conjunction of a minimal set of happens-before constraints required to ensure all trace
represented by ϕB′′ are bad.
Example. Our trace π from Section 6.1.1 is generalised toG(Φ(π)) = T2.`1a < T1.`3b∧T1.`3b <
T2.`3b. This constraint captures the interleavings where T2 interrupts T1 between locations `1a
and `3b. Any trace that fulfils this constraint is bad. All bad traces in Nπ are represented as
P = (T2.`1a < T1.`3b ∧ T1.`3b < T2.`3b) ∨ (T1.`1a < T2.`3b ∧ T2.`3b < T1.`3b).
Inferring mutex constraints. The constraint inference uses the same patterns as are used to
infer locks in Chapter 5. From each clause ρ in P described above, we infer mutex constraints to
eliminate all bad traces satisfying ρ. The key observation we exploit is that atomicity violations
show up in our formulæ as two simple patterns of ordering constraints between events.
1. The first pattern tid1.`1 < tid2.`2 ∧ tid2.`′2 < tid1.`′1 (visualised in Figure 6.12a)
indicates an atomicity violation (thread tid2 interrupts tid1 at a critical moment).
2. The second pattern is tid1.`1 < tid2.`′2 ∧ tid2.`2 < tid1.`′1 (visualised in Figure 6.12b).
This pattern is a generalisation of the first pattern in that either tid1 interrupts tid2 or the
other way round.




Figure 6.12 Atomicity violation patterns












Example. The generalised counterexample constraint T2.`1a < T1.`3b ∧ T1.`3b < T2.`3b yields
the constraint mutex mtx(T2.[`1a : `3b],T1.[`3b : `3b]). In the next section we show how this
mutex constraint is enforced in P′abs.
6.5.2 Enforcing Mutex Constraints
To enforce mutex constraints in P′abs, we prune paths in P
′
abs that violate the mutex constraints.









j ) of location identifiers satisfying the following:




i are adjacent locations in thread tid i,
(b) `cprej , `
cpost
j are adjacent locations in the other thread tid j ,
(c) `1 ≤ `prei , `midi , `
post
i ≤ `′1 and
(d) `2 ≤ `cprej , `
cpost
j ≤ `′2.
Intuitively, a conflict represents a minimal violation of a mutex constraint due to the execution of





thread i. Note that a statement at location ` in thread tid is executed when the current location of
tid changes from ` to succ(`).








j ), let pre(c) = `
pre
i , mid(c) = `
mid
i , post(c) =
`posti , cpre(c) = `
cpre
j and cpost(c) = `
cpost
j . Further, let tid1(c) = i and tid2(c) = j. To prune
all interleavings prohibited by the mutex constraints from P′abs we need to consider all conflicts
derived from all mutex constraints. We denote this set as C and let K = |C|.
Example. We have an example program and its flow-graph in Figure 6.13 (we skip the statement
labels in the nodes here). Suppose in some iteration we obtain mtx(T1.[`a1 : `a2],T2.[`b1 : `b4]).
This yields 2 conflicts: c1 given by (`b1, `b2, `b3, `a1, `a2) and c2 given by (`b2, `b3, `b4, `a1, `a2).
124



















On an aside, this example also illustrates the difficulty of lock placement in the actual code. The
mutex constraint would naïvely be translated to the lock lock(T1.[`1 : `2],T2.[`1 : `4]). This is
not a valid lock placement; in executions executing the else branch, the lock is never released.
Constructing new P′abs. Initially, let NFA P
′
abs be given by the tuple (Qold,Σ∪{ε},∆old, Qι,old, Fold),
where
(a) Qold is the set of states 〈Vo, ctid, (`1, . . . , `n)〉 of the abstract program Cabs corresponding to
C, as well as 〈terminated〉 and 〈invalid〉,
(b) Σ is the set of abstract observable symbols,
(c) Qι,old is the initial state of Cabs,
(d) Fold = {〈terminated〉} and
(e) ∆old ⊆ Qold × Σ ∪ {ε} × Qold is the transition relation with (q, α, q′) ∈ ∆old iff q
α→ q′
according to the abstract preemptive semantics.
To enable pruning paths that violate mutex constraints, we augment the state space of P′abs to
track the status of conflicts c1, . . . , cK using four-valued propositions p1, . . . , pK , respectively.
Initially all propositions are 0. Proposition pk is incremented from 0 to 1 when conflict ck is
activated, i.e., when control moves from `prei to `
mid
i along a path. Proposition pk is incremented
from 1 to 2 when conflict ck progresses, i.e., when thread tid i is at `midi and control moves from
`cprej to `
cpost
j . Proposition pk is incremented from 2 to 3 when conflict ck completes, i.e., when
control moves from `midi to `
post
i . In practice the value 3 is never reached because the state is
pruned when the conflict completes. Proposition pk is reset to 0 when conflict ck is aborted, i.e.,
when thread tid i is at `midi and either moves to a location different from `
post
i , or moves to `
post
i





Example. In Figure 6.13, conflict c1 is activated when T2 moves from `b1 to `b2; c1 progresses if
125
now T1 moves from `a1 to `a2 and is aborted if instead T2 moves from `b2 to `b3; c1 completes
after progressing if T2 moves from `b2 to `b3 and is aborted if instead T2 moves from `b2 to `b5.
Formally, the new P′abs is given by the tuple (Qnew,Σ ∪ {ε},∆new, Qι,new, Fnew), where:
(a) Qnew = Qold × {0, 1, 2}K ,
(b) Σ is the set of abstract observable symbols as before,
(c) Qι,new = (Qι,old, (0, . . . , 0)),
(d) Fnew = {(Q, (p1, . . . , pK)) : Q ∈ Fold ∧ p1, . . . , pK ∈ {0, 1, 2}} and
(e) ∆new is constructed as follows:
add ((Q, (p1, . . . , pK)), α, (Q′, (p′1, . . . , p
′
K))) to ∆new iff
(Q,α,Q′) ∈ ∆old and for each k ∈ [1, K], the following hold:
1. Conflict activation: (the statement at location pre(ck) in thread tid1(ck) is executed)
if pk = 0, ctid = ctid′ = tid1(ck), `ctid = pre(ck) and `′ctid = mid(ck), then p
′
k = 1,
2. Conflict progress: (thread tid1(ck) is interrupted by tid2(ck) and the conflicting statement
at location cpre(ck) is executed)
else if pk = 1, ctid = ctid′ = tid2(ck), `ctid = cpre(ck) and `′ctid = cpost(ck), then
p′k = 2,
3. Conflict completion and state pruning: (the statement at location mid(ck) in thread
tid1(ck) is executed and that completes the conflict)
else if pk = 2, ctid = ctid′ = tid1(ck), `ctid = mid(ck) and `′ctid = post(ck), then delete
state (Q′, (p′1, . . . , p
′
K)),
4. Conflict abortion 1: (tid1(ck) executes alternate statement)
else if pk = 1 or 2, ctid = ctid′ = tid1(ck), `ctid = mid(ck) and `′ctid 6= post(ck), then
p′k = 0,
5. Conflict abortion 2: (tid1(ck) executes statement at location mid(ck) without interruption
by tid2(ck))
else if pk = 1, ctid = ctid′ = tid1(ck), `ctid = mid(ck) and `′ctid = post(ck), then p
′
k = 0
In our implementation, the new P′abs is constructed on-the-fly. Moreover, we do not maintain
the entire set of propositions p1, . . . , pK in each state of P′abs. A proposition pi is added to the
list of tracked propositions only after conflict ci is activated. Once conflict ci is aborted, pi is
dropped from the list of tracked propositions.
Theorem 6.5.1. We are given a program Cabs and a sequence of observable symbols ω that
is a counterexample to preemption-safety, formally ω ∈ L(P′abs) ∧ ω /∈ CloI(L(NPabs)). If a
126
pattern P eliminating ω is found, then, after enforcing all resulting mutex constraints in P′abs,
the counterexample ω is no longer accepted by P′abs, formally ω /∈ L(P′abs).
Proof. The pattern P eliminating ω represents a mutex constraint mtx(tid i.[`1:`′′1], tid j.[`2:`
′′
2]),









j ). Each such conflict represents a context switch that is not allowed:




j → `midi → `
post
i . Because P eliminates ω we know that ω has a
context switch from tid i.`′1 to tid j.`
′
2, where `1 ≤ `′1 ≤ `′′1 and `2 ≤ `′2 ≤ `′′2. One of the conflicts
























immediately following `′2. If now a context switch happens at location `
′
1 switching to location
`′2, this triggers the conflict and this trace will be discarded in P
′
abs.
6.6 Global Lock Placement Constraints
Our synthesis loop will keep collecting and enforcing conflicts P′abs until the language inclusion
check holds. At that point we have collected a set of conflicts Call that need to be enforced in
the original program source code. To avoid deadlocks, the lock placement has to conform to a
number of constraints.
We encode the global lock placement constraints for ensuring correctness as an SMT2
formula LkCons. Let L denote the set of all location and Lk denote the set of all locks
available for synthesis. We use scalars `, `′, `1, . . . of type L to denote locations and scal-
ars LkVar ,LkVar ′,LkVar 1, . . . of type Lk to denote locks. The number of locks is finite and
there is a fixed locking order. Let Pre(`) denote the set of all immediate predecessors in node
` : stmt(`) in the flow-graph of the thread tid(`) in C. We use the following Boolean variables
in the encoding.
2The encoding of the global lock placement constraints is essentially a SAT formula. We present and use this as
an SMT formula to enable combining the encoding with objective functions for optimisation (see Section 6.7).
127
LockBefore(`,LkVar) lock(LkVar) is placed just before the state-
ment represented by `
LockAfter(`,LkVar) lock(LkVar) is placed just after the statement
represented by `
UnlockBefore(`,LkVar) unlock(LkVar) is placed just before the state-
ment represented by `
UnlockAfter(`,LkVar) unlock(LkVar) is placed just after the state-
ment represented by `
For every location ` in the source code we allow a lock to be placed either immediately
before or after it. If a lock LkVar is placed before `, than ` is protected by LkVar . If LkVar is
placed after `, than ` is not protected by LkVar , but the successor statements are. Both options
are needed, e.g. to lock before the first statement of a thread and to unlock after the last statement
of a thread. We define three additional Boolean variables:
(D1) InLock(`,LkVar): If location ` has no predecessor than it is protected by LkVar if there
is a lock statement before `.
InLock(`,LkVar) = LockBefore(`,LkVar)
If there exists a predecessor `′ to ` than ` is protected by LkVar if either there is a lock
statement before ` or if `′ is protected by LkVar and there is no unlock in between.
InLock(`,LkVar) = LockBefore(`,LkVar) ∨ (¬UnlockBefore(`,LkVar)∧
InLockEnd(`′,LkVar))
Note that either all predecessors are protected by a lock or none. We enforce this in Rule
(C7) below.
(D2) InLockEnd(`,LkVar): The successors of ` are protected by LkVar if either location ` is
protected by LkVar or lock(LkVar) is placed after `.
(InLock(`,LkVar) ∧ ¬UnlockAfter(`,LkVar)) ∨ LockAfter(`,LkVar)
(D3) Order(LkVar ,LkVar ′): We give a fixed lock order that is transitive, asymmetric, and
irreflexive.
128
Order(LkVar ,LkVar ′) = true iff LkVar needs to be acquired before LkVar ′. This
means that an instruction lock(LkVar) cannot be place inside the scope of LkVar ′.
We describe the constraints and their SMT formulation constituting LkCons below. All
constraints are quantified over all `, `′, `1, . . . ∈ L and all LkVar ,LkVar ′,LkVar 1, . . . ∈ Lk.
(C1) All locations in the same conflict in Call are protected by the same lock.
∀C ∈ Call : `, `′ ∈ C⇒ ∃LkVar . InLock(`,LkVar) ∧ InLock(`′,LkVar)
(C2) Placing lock(LkVar) immediately before/after unlock(LkVar) is disallowed. Doing so
would make (C1) unsound, as two adjacent locations could be protected by the same lock
and there could still be a context-switch in between because of the immediate unlocking
and locking again. If ` has a predecessor `′ then
UnlockBefore(`,LkVar)⇒(¬LockAfter(`′,LkVar))
LockBefore(`,LkVar)⇒(¬UnlockAfter(`′,LkVar))
(C3) We enforce the lock order according to Order defined in (D3).




InLockEnd(`′,LkVar ′))⇒ Order(LkVar ′,LkVar)
(C4) Existing locks may not be nested inside synthesised locks. They are implicitly ordered
before synthesised locks in our lock order.
(stmt(`) = lock(. . .))⇒ ¬InLock(`,LkVar)
(C5) No wait statements may be in the scope of synthesised locks to prevent deadlocks.
(stmt(`) = wait(. . .)/wait_not(. . .)/wait_reset(. . .))⇒ ¬InLock(`,LkVar)
(C6) Placing both lock(LkVar) and unlock(LkVar) before/after ` is disallowed.
(¬LockBefore(`,LkVar) ∨ ¬UnlockBefore(`,LkVar)) ∧
(¬LockAfter(`,LkVar) ∨ ¬UnlockAfter(`,LkVar))
(C7) All predecessors must agree on their InLockEnd status. This ensures that joining branches









(C8) unlock(LkVar) can only be placed only after a lock(LkVar).
UnlockAfter(`,LkVar)⇒ InLock(`,LkVar)
If ` has a predecessor `′ then also
UnlockBefore(`,LkVar)⇒ InLockEnd(`′,LkVar)
else if ` has no predecessor then
UnlockBefore(`,LkVar) = false
(C9) We forbid double locking: A lock may not be acquired if that location is already protected
by the lock.
LockAfter(`,LkVar)⇒ ¬InLock(`,LkVar)
If ` has a predecessor `′ then also
LockBefore(`,LkVar)⇒ ¬InLockEnd(`,LkVar)
(C10) The end state lasti of thread i is unlocked. This prevents locks from leaking.
∀i : ¬InLock(lasti, lk)
According to constraints (C4) and (C5) no locks may be placed around existing wait or lock
statements. Since both statements are implicit preemption points, where the non-preemptive
semantics may context-switch, it is never necessary to synthesise a lock across an existing lock
or wait statement to ensure preemption-safety.
We have the following result.
Theorem 6.6.1. Let concurrent program C ′ be obtained by inserting any lock placement satisfy-
ing LkCons into concurrent program C. Then C ′ is guaranteed to be preemption-safe w.r.t. C and
not to introduce new deadlocks (that were not already present in C).
Proof. To show preemption-safety we need to show that language inclusion holds (Proposi-
tion 6.4.3). Language inclusion follows directly from constraint (C1), which ensures that all
mutex constraints are enforced as locks. Further, constraints (C2) and (C6) ensure that there
is never a releasing and immediate reacquiring of locks in between statements. This is crucial
because otherwise a context-switch in between two instructions protected by a lock would be
possible.
Let as assume towards contradiction that a new deadlocked state s = 〈V , ctid, (`1, . . . , `n)〉
is reachable in C ′. By definition this means that none of the rules of the preemptive semantics
130
ofW (Figures 2.3 and 2.4) is applicable in s. Remember, that an infinite loop is considered a
lifelock. We proceed to enumerate all rules of the preemptive semantics that may block:
• All threads reached their last location, then the TERMINATE rule is the only one that
could be applicable. If it is not, then a lock is still locked. This deadlock is prevented by
condition (C10).
• The rule NSWITCH is not applicable because the other thread is blocked and SEQ is not
applicable because none of the rules of the single-thread semantics (Figure 2.2) apply.
The following sequential rules have preconditions that may prevent them from being
applicable.
– Rule LOCK may not proceed if the lock LkVar is taken. If LkVar = ctid we
have a case of double-locking that is prevented by constraint (C9). Otherwise
LkVar = j 6= ctid. In this case tid ctid is waiting for tid j . This may be because of
(a) a circular dependency of locks. This cannot be a new deadlock because of
constraints (C4) and (C3) enforcing a strict lock order even w.r.t. existing locks.
(b) another deadlock in tid j . This deadlock cannot be new because we can make a
recursive argument about the deadlock in tid j .
– Rule UNLOCK may not proceed if the lock is not owned by the executing thread.
In this case we either have a case of double-unlock (prevented by constraint (C8))
or a lock is unlocked that is not held by tid ctid at that point. The latter may happen
because the lock was not taken on all control flow paths leading to `ctid. This is
prevented by constraints (C7) and (C8).
– Rules WAIT/WAIT_NOT/WAIT_RESET may not proceed if the condition variable is
not in the right state. According to constraint (C5) `ctid cannot be protected by a syn-
thesised lock. This means the deadlock is either not new or it is caused by a deadlock
in a different thread making it impossible to reach signal(CondVar)/reset(CondVar).
In that case a recursive argument applies.
• The THREAD_END rule is not applicable because all other threads are blocked. This is
impossible by the same reasoning as above.
131
6.7 Optimising Lock Placement
The global lock placement constraint LkCons constructed in Section 6.6 often has multiple models
corresponding to very different lock placements. The desirability of these lock placements varies
considerably due to performance considerations. For example a coarse-grained lock placement
may be useful when the cost of locking operations is relatively high compared to the cost of
executing the critical sections, while a fine-grained lock placement should be used when locking
operations are cheap compared to the cost of executing the critical sections. Neither of these
lock placement strategies is guaranteed to find the optimally performing program in all scenarios.
It is necessary for the programmer to judge when each criterion is to be used.
Here, we present objective functions f to distinguish between different lock placements. Our
synthesis algorithm combines the function f with the global lock placement constraints LkCons
into a single maximum satisfiability modulo theories (MaxSMT) problem and the optimal model
corresponds to the f -optimal lock placement. We present objective functions for coarse- and
fine-grained locking.
Objective functions. We say that a statement ` : stmt in a concurrent program C is protected
by a lock LkVar if InLock(`,LkVar) is true. We define the two objective functions as follows:
1. Coarsest-grained locking. This objective function prefers a program C1 over C2 if the
number of lock statements in C1 is fewer than in C2. Among the programs having the same
number of lock statements, the ones with the fewest statements protected by any lock are
preferred. Formally, we can define Coarse(Ci) to be λ+ ε · StmtInLock(Ci) where λ is the
count of lock statements in Ci, StmtInLock(Ci) is the count of statements in Ci that are
protected by any lock and ε is given by 1
2k
where k is the total number of statements in Ci.
The reasoning behind this formula is that the total cost is always dominated by the number




2. Finest-grained locking. This objective function prefers a program C1 over C2 if C1 allows
more concurrency than C2. Concurrency of a program is measured by the number of
pairs of statements from different threads that cannot be executed together. Formally, we
define Fine(Ci) to be the total number of pairs of statements `1 : stmt1 and `2 : stmt2 from
132
different threads that cannot be executed at the same time, i.e., are protected by the same
lock.
Optimisation procedure. The main idea behind the optimisation procedure for the above
objective functions is to build an instance of the MaxSMT problem using the global lock
placement constraint LkCons such that (a) every model of LkCons is a model for the MaxSMT
problem and the other way round; and (b) the cost of each model for the MaxSMT problem is
the cost of the corresponding locking scheme according to the chosen objective function. The
optimal lock placement is then computed by solving the MaxSMT problem.
A MaxSMT problem instance is given by 〈Φ, 〈(Ψ1, w1), . . .〉〉 where Φ and each Ψi are SMT
formulæ and each wi is a real number. The formula Φ is called the hard constraint, and each
Ψi is called a soft constraint with associated weight wi. Given an assignment V of variables
occurring in the constraints, its cost c is defined as the sum of the weights of soft constraints that
are falsified by V : c =
∑
i:V 6|=Ψi wi. The objective of the MaxSMT problem is to find a model
that satisfies Φ with minimal cost. Intuitively, by minimising the cost we maximise the sum of
the weights of the satisfied soft constraints.
In the following, we write InLock(`) as a short-hand for
∨
LkVar InLock(`,LkVar), and
similarly LockBefore(`) and LockAfter(`). For each of our two objective functions, the hard
constraint for the MaxSMT problem is LkCons and the soft constraints and associated weights
are as specified below:
• For the coarsest-grained locking objective function, the soft constraints are of three types:
(a) ¬LockBefore(`) with weight 1, (b) ¬LockAfter(`) with weight 1, and (c) ¬InLock(`)
with weight ε, where ε is as defined above.
• For the finest-grained locking objective function, the soft constraints are given by
∧
lk ¬InLock(`, lk)∨
¬InLock(`′, lk), for each pair of statements ` and `′ from different threads. The weight of
each soft constraint is 1.
Theorem 6.7.1. For the coarsest-grained and finest-grained objective functions, the cost of
the optimal program is equal to the cost of the model for the corresponding MaxSMT problem
obtained as described above.
133
6.8 Implementation and Experiments
In order to evaluate our synthesis algorithm, we implemented it in a tool called LISS, comprised
of 5400 lines of C++ code. LISS uses Clang/LLVM 3.6 to parse C code and insert locks into the
code. By using Clang’s rewriter, LISS is able to maintain the original formatting of the source
code. As a MaxSMT solver, we use Z3 version 4.4.1 (unstable branch). LISS is available as
open-source software along with benchmarks3. The language inclusion algorithm is available
separately as a library called LIMI4. LISS implements the synthesis method presented in this
chapter with several optimisations. For example, we take advantage of the fact that language
inclusion violations can often be detected by exploring only a small fraction of NPabs and P′abs,
which we construct on the fly.
Our prototype implementation has some limitations. First, LISS uses function inlining during
the analysis phase and therefore cannot handle recursive programs. During lock placement,
however, functions are taken into consideration and it is ensured that a function does not “leak”
locks. Second, we do not implement any form of alias analysis, which can lead to unsound
abstractions. For example, we abstract statements of the form “*x = 0” as writes to variable x,
while in reality other variables can be affected due to pointer aliasing. We sidestep this issue
by manually massaging input programs to eliminate aliasing. This is not a limitation of our
technique, which could be combined with known aliasing analysis techniques.
We evaluate our synthesis method w.r.t. the following criteria: (1) Effectiveness of synthesis
from implicit specifications; (2) Efficiency of the proposed synthesis algorithm; (3) Precision of
the proposed coarse abstraction scheme on real-world programs; (4) Quality of the locks placed.
6.8.1 Benchmarks
We ran LISS on a number of benchmarks, summarised in Table 6.1. For each benchmark we
report the complexity (lines of code (LOC), number of threads (Th)), the number of iterations
(It) of the language inclusion check (Figure 6.5) and the maximum bound k (MB) that was used
in any iteration of the language inclusion check. Further we report the total time (TT) taken by




objective functions (Coarse, Fine). Finally, we report the maximum resident set size (Memory).
All measurements were done on an Intel core i5-3320M laptop with 8GB of RAM.
Implicit vs explicit specification. In order to evaluate the effectiveness of synthesis from
implicit specifications, we apply LISS to the set of benchmarks used in our previous CONREPAIR
tool for assertion-based synthesis (Chapter 4). In addition, we evaluate LISS and CONREPAIR
on several new assertion-based benchmarks (Table 6.1). We added yield statements to the source
code of the benchmarks to indicate where a context-switch in the driver would be expected by
the developer. This is a very light-weight annotation burden compared to the assertions required
by CONREPAIR.
The set includes synthetic microbenchmarks modelling typical concurrency bug patterns in
Linux drivers and the usb-serial macrobenchmark, which models a complete synchronisa-
tion skeleton of the USB-to-serial adapter driver. For LISS we preprocess these benchmarks
by eliminating assertions used as explicit specifications for synthesis. In addition, we replace
statements of the form assume(v) with await(v), redeclaring all variables v used in such
statements as condition variables. This is necessary as our program syntax does not include
assume statements.
We use LISS to synthesise a preemption-safe, deadlock-free version of each benchmark.
This method is based on the assumption that the benchmark is correct under non-preemptive
scheduling and bugs can only arise due to preemptive scheduling. We discovered two benchmarks
(lc-rc.c and myri10ge.c) that violated this assumption, i.e., they contained race conditions
that manifested under non-preemptive scheduling; LISS did not detect these race conditions.
LISS was able to detect and fix all other known races without relying on assertions. Furthermore,
LISS detected a new race in the usb-serial family of benchmarks, which was not detected by
CONREPAIR due to a missing assertion. We compared the output of LISS (using coarse-grained
locking as an objective function) with manually placed synchronisation (taken from real bug
fixes) and found that the two versions were similar in most of our examples.
Performance and precision. CONREPAIR uses CBMC for verification and counterexample
generation. Due to the coarse abstraction we use, both are much cheaper with LISS. For example,
verification of usb-serial.c, which was the most complex in our set of benchmarks, took
LISS 103 seconds, whereas it took CONREPAIR 20 minutes.
The loss of precision due to abstraction may cause the inclusion check to return a counter-
135
Table 6.1 Experiments
Name LOC Th It MB TT Coarse Fine Memory CR
ConRepair benchmarks
ex1.c 18 2 1 1 <1s <1s <1s 29MB <1s
ex2.c 23 2 1 1 <1s <1s <1s 29MB <1s
ex3.c 37 2 1 1 <1s <1s <1s 29MB <1s
ex5.c 42 2 4 1 <1s <1s <1s 32MB <1s
lc-rc.cc 35 4 0 1 <1s N/A N/A 15MB 9s
dv1394.c 37 2 2 1 <1s <1s <1s 32MB 17s
em28xx.c 20 2 1 1 <1s <1s <1s 29MB <1s
f_acm.c 54 3 6 1 <1s <1s <1s 35MB 1872s
i915_irq.c 17 2 1 1 <1s <1s <1s 29MB 2.6s
ipath.c 23 2 1 3 <1s <1s <1s 29MB 12s
iwl3945.c 26 3 0 1 <1s <1s <1s 15MB 5s
md.c 35 2 1 1 <1s <1s <1s 30MB 1.5s
myri10ge.cc 60 4 0 3 <1s N/A N/A 16MB 1.5s
usb-serial.bug1.c 357 7 2 1 6.1s <1s <1s 267MB ∞b
usb-serial.bug2.c 355 7 2 1 4.5s <1s <1s 175MB 3563s
usb-serial.bug3.c 352 7 2 1 2.8s <1s <1s 105MB ∞b
usb-serial.bug4.c 351 7 2 1 3.8s <1s <1s 130MB ∞b
usb-serial.ca 357 7 0 3 31.9s N/A N/A 792MB 1200s
CPMAC driver benchmark
cpmac.bug1.c 1275 5 1 2 6s 1.6s 1.1s 156MB
cpmac.bug2.c 1275 5 4 10 152.9s 63s 41.4s 1210MB
cpmac.bug3.c 1270 5 9 4 11.1s 16.2s 9.6s 521MB
cpmac.bug4.c 1276 5 4 7 107.3s 10.5s 6.5s 5392MB
cpmac.bug5.c 1275 5 4 4 136.5s 11s 7.7s 3549MB
cpmac.ca 1276 5 0 1 2.1s N/A N/A 114MB
memcached benchmark
memcached.c 294 2 104 2 22.8s 6.2s 2.1s 114MB
Th=Threads, It=Iterations, MB=Max bound, TT=Time for language incl. loop,
CR=CONREPAIR time
a bug-free example
b timeout after three hours
c race not detected, as it was present under non-preemptive scheduling
136
Table 6.2 Lock placement statistics: the number of synthesised lock variables, lock and unlock
statements, and the number of abstract statements protected by locks for different objective
functions.
Name
















cpmac.bug1 2 6/6 11 1 3/3 11 1 3/3 9
cpmac.bug2 2 22/23 119 1 4/4 98 1 6/7 95
cpmac.bug3 1 4/4 29 1 2/3 29 1 5/6 28
cpmac.bug4 4 16/16 53 1 4/4 53 1 6/6 26
cpmac.bug5 3 15/15 30 1 4/4 30 1 5/5 30
memcached 2 5/5 26 1 1/1 28 1 2/2 24
example that is spurious in the concrete program, leading to unnecessary synchronisation being
synthesised. On our existing benchmarks, this only occurred once in the usb-serial driver,
where abstracting away the return value of a function led to an infeasible trace. We refined the
abstraction manually by introducing a guard variable to model the return value.
Real-world benchmarks. While these results are encouraging, synthetic benchmarks are not
necessarily representative of real-world performance.
CPMAC benchmark. We therefore implemented another set of benchmarks based on a complete
Linux driver for the TI AR7 CPMAC Ethernet controller. The benchmark was constructed
as follows. We manually preprocessed driver source code to eliminate pointer aliasing. We
combined the driver with a model of the OS API and the software interface of the device written
in C. We modelled most OS API functions as writes to a special memory location. Groups of
unrelated functions were modelled using separate locations. Slightly more complex models
were required for API functions that affect thread synchronisation. For example, the free_irq
function, which disables the driver’s interrupt handler, blocks, waiting for any outstanding
interrupts to finish. Drivers can rely on this behaviour to avoid races. We introduced a condition
variable to model this synchronisation. Similarly, most device accesses were modelled as writes
to a special ioval variable. Thus, the only part of the device that required a more accurate
model was its interrupt enabling logic, which affects the behaviour of the driver’s interrupt
handler thread.
Our original model consisted of eight threads. LISS ran out of memory on this model, so we
simplified it to five threads by eliminating parts of driver functionality. Nevertheless, we believe
137
that the resulting model represents the most complex synchronisation synthesis case study, based
on real-world code, reported in the literature.
The CPMAC driver used in this case study did not contain any known concurrency bugs,
so we artificially simulated five typical race conditions that commonly occur in drivers of this
type (see Section 3.5.1). LISS was able to detect and automatically fix each of these defects
(bottom part of Table 6.1). The coarse abstraction may lead to unnecessary synchronisation,
which may be solved by abstraction refinement using guard variables. We only encountered two
program locations where manual abstraction refinement was necessary. This process could be
automated; automatic abstraction refinement is a known technique [Vechev et al., 2010a] we
could implement in LISS.
Memcached benchmark. Finally, we evaluate LISS on memcached, an in-memory key-value
store version 1.4.5 [memcached]. The core of memcached is a non-reentrant library of store
manipulation primitives. This library is wrapped into the thread.c module that implements
a thread-safe API used by server threads. Each API function performs a sequence of library
calls protected with locks. In this case study, we synthesise lock placement for a fragment of the
thread.c module. In contrast to our other case studies, here we would like to synthesise lock-
ing from scratch rather than fix defects in existing lock placement. Furthermore, optimal locking
strategy in this benchmark depends on the specific load. We envision that the programmer may
synthesise both a coarse-grained and a fine-grained version and at deployment the appropriate
version is selected.
Quality of synthesis. Next, we focus on the quality of synthesised solutions for the two real-
world benchmarks from our benchmark set. Table 6.2 compares the implementation synthesised
for these benchmarks using each objective functions in terms of (1) the number of locks used in
synthesised code, (2) the number of lock and unlock statements generated, and (3) total number
of program statements protected by synthesised locks.
We observe that different objective functions produce significantly different results in terms
of the size of synthesised critical sections and the number of lock and unlock operations guarding
them: the fine-grained objective synthesises smaller critical sections at the cost of introducing a
larger number of lock and unlock operations. Implementations synthesised without an objective
function are clearly of lower quality than the optimised versions: they contains large critical
sections, protected by unnecessarily many locks. These observations hold for the CPMAC
138
benchmarks, where we start with a program that has most synchronisation already in place, as
well as the memcached benchmark, where we synthesise synchronisation from scratch.
To summarise our experiments, we found that (1) our coarse abstraction is highly precise
in practice; (2) manual effort involved in synchronisation synthesis can be further reduced via
automatic abstraction refinement; (3) additional work is required to improve the performance
of our method to be able to handle real-world systems without simplification; (4) the objective
functions allow specialising synthesis to a particular locking scheme; (5) the time required to




In this thesis we introduced a number of synchronisation techniques for concurrent programs.
We started with a simple technique that can infer statement reorderings and atomic sections for
concurrent programs with assertions (Chapters 3 and 4). Next, we improved the synthesis by
considering additional synchronisation primitives: Instead of atomic sections, which are not
directly implementable, we now place locks, barriers and wait-signal statements. Finally, we
moved from an explicit specification to an implicit specification. The implicit specification
relieves the programmer of the burden of providing sufficient assertions to specify correctness of
the program. Our synthesis is guaranteed not to introduce deadlocks and the lock placement can
be optimised using a static objective function.
We developed a number of tools for this thesis. The most mature tool is LISS, which can
parse a significant framework of the C programming language and automatically insert missing
locks so that the program is correct w.r.t. our implicit specification. A number of key elements
are missing to enable LISS to process real-world programs: For example to process pointers
we would need an aliasing analysis. An abstraction refinement would be needed to increase
precision of the synthesis. Further, for real-world programs there could be performance issues
because the automata in the language inclusion step would grow too large.
An this point we have a number of research directions we can pursue. In ongoing work
[Černý et al., 2015a] we aim to optimise lock placement not merely using syntactic criteria,
but by optimising the actual performance of the program running on a specific system. In this
approach we start with a synthesised program that uses coarse locking and then profile the
performance on a real system. Using those measurements we adjust the locking to be more
140
fine-grained in those areas where a high contention was measured.
Another direction are weak memory models. In this thesis we assumed a sequentially consist
memory model, which is an idealised model not found in computers today. To deal with weak
memory models we could adapt our techniques to synthesise fences.
141
Bibliography
[Albarghouthi and McMillan, 2013] A. Albarghouthi and K. McMillan, “Beautiful Interpolants,”
In CAV, pages 313–329, 2013.
[Alglave et al., 2014] J. Alglave, D. Kroening, V. Nimal, and D. Poetzl, “Don’t Sit on the Fence
— A Static Analysis Approach to Automatic Fence Insertion,” In CAV, pages 508–524, 2014.
[Alglave et al., 2013] J. Alglave, D. Kroening, and M. Tautschnig, “Partial Orders for Efficient
Bounded Model Checking of Concurrent Software,” In CAV, pages 141–157, 2013.
[Alur et al., 2013] R. Alur, R. Bodík, G. Juniwal, M. Martin, M. Raghothaman, S. Seshia,
R. Singh, A. Solar-Lezama, E. Torlak, and A. Udupa, “Syntax-guided synthesis,” In FMCAD,
pages 1–17, 2013.
[Ball et al., 2001] T. Ball, R. Majumdar, T. Millstein, and S. K. Rajamani, “Automatic predicate
abstraction of C programs,” In PLDI, pages 203–213, 2001.
[Basu et al., 2003] S. Basu, D. Saha, Y. Lin, and S. Smolka, “Generation of all counter-examples
for push-down systems,” In FORTE, pages 79–94, 2003.
[Bertoni et al., 1982] A. Bertoni, G. Mauri, and N. Sabadini, “Equivalence and membership
problems for regular trace languages,” In Automata, Languages and Programming, pages
61–71, 1982.
[Beyer, 2014] D. Beyer, “Status Report on Software Verification,” In TACAS, pages 373–388,
2014, http://sv-comp.sosy-lab.org/.
[Beyer et al., 2007] D. Beyer, T. Henzinger, R. Majumdar, and A. Rybalchenko, “Path invari-
ants,” In PLDI, pages 300–309, 2007.
142
[Beyer and Keremoglu, 2011] D. Beyer and E. Keremoglu, “CPAchecker: A Tool for Con-
figurable Software Verification,” In CAV, pages 184–190, 2011, http://cpachecker.
sosy-lab.org/.
[Biere et al., 2003] A. Biere, A. Cimatti, E. M. Clarke, O. Strichman, and Y. Zhu, “Bounded
model checking,” Advances in computers, 58:117–148, 2003.
[Bjesse and Kukula, 2004] P. Bjesse and J. Kukula, “Using counter example guided abstraction
refinement to find complex bugs,” In DATE, pages 156–161, 2004.
[Bjørner et al., 2013] N. Bjørner, K. McMillan, and A. Rybalchenko, “On Solving Universally
Quantified Horn Clauses,” In SAS, pages 105–125, 2013.
[Bloem et al., 2014] R. Bloem, G. Hofferek, B. Könighofer, R. Könighofer, S. Außerlechner,
and R. Spörk, “Synthesis of Synchronization using Uninterpreted Functions,” In FMCAD,
pages 35–42, 2014.
[Bryant, 1986] R. Bryant, “Graph-Based Algorithms for Boolean Function Manipulation,”
Computers, IEEE Transactions on, C-35(8):677–691, 1986.
[Cadiou and Levy, 1973] J. M. Cadiou and J.-J. Levy, “Mechanizable proofs about parallel
processes,” In Switching and Automata Theory, pages 34–48, 1973.
[Černý et al., 2015a] P. Černý, E. M. Clarke, T. A. Henzinger, A. Radhakrishna, L. Ryzhyk,
R. Samanta, and T. Tarrach, “Optimizing Solution Quality in Synchronization Synthesis,”
ArXiv e-prints, November 2015, arXiv:1511.07163.
[Černý et al., 2015b] P. Černý, E. M. Clarke, T. A. Henzinger, A. Radhakrishna, L. Ryzhyk,
R. Samanta, and T. Tarrach, “From Non-preemptive to Preemptive Scheduling Using Syn-
chronization Synthesis,” In CAV, pages 180–197, 2015.
[Černý et al., 2013] P. Černý, T. A. Henzinger, A. Radhakrishna, L. Ryzhyk, and T. Tarrach,
“Efficient Synthesis for Concurrency by Semantics-Preserving Transformations,” In CAV,
pages 951–967, 2013.
[Černý et al., 2014] P. Černý, T. A. Henzinger, A. Radhakrishna, L. Ryzhyk, and T. Tarrach,
“Regression-Free Synthesis for Concurrency,” In CAV, pages 568–584, 2014.
143
[Chatterjee and Henzinger, 2007] K. Chatterjee and T. A. Henzinger, “Assume-guarantee syn-
thesis,” In Tools and Algorithms for the Construction and Analysis of Systems, pages 261–275,
2007.
[Cherem et al., 2008] S. Cherem, T. Chilimbi, and S. Gulwani, “Inferring locks for atomic
sections,” In PLDI, pages 304–315, 2008.
[Chou et al., 2001] A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler, “An Empirical Study
of Operating Systems Errors,” In SOSP, pages 73–88, 2001.
[Church, 1962] A. Church, “Logic, arithmetic and automata,” In Proceedings of the interna-
tional congress of mathematicians, pages 23–35, 1962.
[Clarke et al., 1986] E. M. Clarke, E. A. Emerson, and A. P. Sistla, “Automatic verification of
finite-state concurrent systems using temporal logic specifications,” ACM Trans. Program.
Lang. Syst., 8(2):244–263, 1986.
[Clarke et al., 2000] E. M. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith, “Counterexample-
Guided Abstraction Refinement,” In CAV, pages 154–169, 2000.
[Clarke et al., 2004] E. M. Clarke, D. Kroening, and F. Lerda, “A Tool for Checking ANSI-C
Programs,” In TACAS, pages 168–176, 2004, http://www.cprover.org/cbmc/.
[Clarke et al., 2005] E. M. Clarke, D. Kroening, N. Sharygina, and K. Yorav, “SATABS: SAT-
based predicate abstraction for ANSI-C,” In TACAS, pages 570–574, 2005.
[Clarke and Emerson, 1982] E. M. Clarke and E. A. Emerson, “Design and synthesis of syn-
chronization skeletons using branching time temporal logic,” In Logics of Programs, pages
52–71, 1982.
[Clarke et al., 1983] E. M. Clarke, E. A. Emerson, and A. P. Sistla, “Automatic verification of
finite state concurrent system using temporal logic specifications: a practical approach,” In
POPL, pages 117–126, 1983.
[Cousot and Cousot, 1977] P. Cousot and R. Cousot, “Abstract interpretation: a unified lattice
model for static analysis of programs by construction or approximation of fixpoints,” In
POPL, pages 238–252, 1977.
144
[de Moura and Bjørner, 2008] L. de Moura and N. Bjørner, “Z3: An efficient SMT solver,” In
Tools and Algorithms for the Construction and Analysis of Systems, pages 337–340, 2008.
[de Wulf et al., 2006] M. de Wulf, L. Doyen, T. A. Henzinger, and J.-F. Raskin, “Antichains: A
new algorithm for checking universality of finite automata,” In CAV, pages 17–30, 2006.
[Deshmukh et al., 2010] J. Deshmukh, G. Ramalingam, V. Ranganath, and K. Vaswani, “Lo-
gical Concurrency Control from Sequential Proofs,” In Programming Languages and Systems,
pages 226–245, 2010.
[Dijkstra, 1968] E. W. Dijkstra, “Co-operating Sequential Processes,” Programming Languages.
Academic Press, New York, 1968.
[Donaldson et al., 2011] A. Donaldson, A. Kaiser, D. Kroening, and T. Wahl, “Symmetry-
Aware Predicate Abstraction for Shared-Variable Concurrent Programs,” In CAV, pages
356–371, 2011.
[Elmas et al., 2009] T. Elmas, S. Qadeer, and S. Tasiran, “A calculus of atomic actions,” In
POPL, pages 2–15, 2009.
[Engler and Ashcraft, 2003] D. Engler and K. Ashcraft, “RacerX: effective, static detection of
race conditions and deadlocks,” In SOSP, pages 237–252, 2003.
[Ermis et al., 2012] E. Ermis, M. Schäf, and T. Wies, “Error Invariants,” In FM, pages 187–201,
2012.
[Esparza, 1997] J. Esparza, “Decidability of model checking for infinite-state concurrent sys-
tems,” Acta Informatica, 34(2):85–107, 1997.
[Eswaran et al., 1976] K. P. Eswaran, J. N. Gray, R. A. Lorie, and I. L. Traiger, “The notions
of consistency and predicate locks in a database system,” Communications of the ACM,
19(11):624–633, 1976.
[Farchi et al., 2003] E. Farchi, Y. Nir, and S. Ur, “Concurrent bug patterns and how to test them,”
In PDPS, page 7 pp., 2003.
[Farzan et al., 2013a] A. Farzan, A. Holzer, N. Razavi, and H. Veith, “Con2colic testing,” In
FSE, pages 37–47, 2013.
145
[Farzan et al., 2013b] A. Farzan, Z. Kincaid, and A. Podelski, “Inductive data flow graphs,” In
POPL, pages 129–142, 2013.
[Flanagan and Qadeer, 2003] C. Flanagan and S. Qadeer, “Types for atomicity,” In ACM
SIGPLAN Notices, pages 1–12, 2003.
[Floyd, 1967] R. W. Floyd, “Assigning meanings to programs,” Mathematical aspects of
computer science, 19(19-32):1, 1967.
[Glusman et al., 2003] M. Glusman, G. Kamhi, S. Mador-Haim, R. Fraer, and M. Vardi,
“Multiple-counterexample guided iterative abstraction refinement: An industrial evaluation,”
In TACAS, pages 176–191, 2003.
[Godefroid et al., 1996] P. Godefroid, D. Peled, and M. Staskauskas, “Using partial-order
methods in the formal validation of industrial concurrent programs,” In ISSTA, pages 261–269,
1996.
[Graf and Saidi, 1997] S. Graf and H. Saidi, “Construction of abstract state graphs with PVS,”
In Computer Aided Verification, pages 72–83, 1997.
[Green, 1969] C. Green, “Application of theorem proving to problem solving,” Technical report,
DTIC Document, 1969.
[Griesmayer et al., 2006] A. Griesmayer, R. Bloem, and B. Cook, “Repair of Boolean Programs
with an Application to C,” In CAV, pages 358–371, 2006.
[Gupta et al., 2011a] A. Gupta, C. Popeea, and A. Rybalchenko, “Solving Recursion-Free Horn
Clauses over LI+UIF,” In APLAS, pages 188–203, 2011.
[Gupta et al., 2011b] A. Gupta, C. Popeea, and A. Rybalchenko, “Threader: A Constraint-Based
Verifier for Multi-threaded Programs,” In CAV, pages 412–417, 2011.
[Gupta et al., 2015] A. Gupta, T. A. Henzinger, A. Radhakrishna, R. Samanta, and T. Tarrach,
“Succinct Representation of Concurrent Trace Sets,” In POPL, pages 433–444, 2015.
[Henzinger et al., 2004] T. A. Henzinger, R. Jhala, R. Majumdar, and K. L. McMillan, “Ab-
stractions from proofs,” In POPL, pages 232–244, 2004.
146
[Henzinger et al., 2002] T. A. Henzinger, R. Jhala, R. Majumdar, and G. Sutre, “Lazy abstrac-
tion,” In POPL, pages 58–70, 2002.
[Henzinger et al., 2000] T. A. Henzinger, S. Qadeer, and S. K. Rajamani, “Decomposing
refinement proofs using assume-guarantee reasoning,” In ICCAD, pages 245–253, 2000.
[Herlihy and Wing, 1990] M. P. Herlihy and J. M. Wing, “Linearizability: A correctness
condition for concurrent objects,” In TOPLAS, pages 463–492, 1990.
[Hoare, 1969] C. A. R. Hoare, “An axiomatic basis for computer programming,” Commun.
ACM, 12(10):576–580, 1969.
[Hoare, 1978] C. A. R. Hoare, “Communicating Sequential Processes,” Commun. ACM,
21(8):666–677, August 1978.
[Jalbert and Sen, 2010] N. Jalbert and K. Sen, “A trace simplification technique for effective
debugging of concurrent programs,” In FSE, pages 57–66, 2010.
[Jin et al., 2012] G. Jin, W. Zhang, D. Deng, B. Liblit, and S. Lu, “Automated Concurrency-Bug
Fixing,” In OSDI, pages 221–236, 2012.
[Jobstmann et al., 2005] B. Jobstmann, A. Griesmayer, and R. Bloem, “Program Repair as a
Game,” In CAV, pages 226–238, 2005.
[Jones, 1983] C. B. Jones, “Specification and Design of (Parallel) Programs.,” In IFIP congress,
pages 321–332, 1983.
[Kahlon and Wang, 2010] V. Kahlon and C. Wang, “Universal causality graphs: a precise
happens-before model for detecting bugs in concurrent programs,” In CAV, pages 434–449,
2010.
[Kashyap and Garg, 2008] S. Kashyap and V. Garg, “Producing Short Counterexamples Using
“Crucial Events”,” In CAV, pages 491–503, 2008.
[Khoshnood et al., 2015] S. Khoshnood, M. Kusano, and C. Wang, “ConcBugAssist: Constraint
solving for diagnosis and repair of concurrency bugs,” In International Symposium on Software
Testing and Analysis, 2015.
147
[Kripke, 1963] S. Kripke, “Semantical Considerations on Modal Logic,” Acta Philosophica
Fennica, pages 83–94, 1963.
[Lamport, 1979] L. Lamport, “How to make a multiprocessor computer that correctly executes
multiprocess programs,” Computers, IEEE Transactions on, 100(9):690–691, 1979.
[Lamport, 1980] L. Lamport, ““Sometime is sometimes” not “never” – On the temporal logic
of programs,” In POPL, pages 174–185, 1980.
[Lipton, 1975] R. J. Lipton, “Reduction: a new method of proving properties of systems of
processes,” In POPL, pages 78–86, 1975.
[Lu et al., 2008] S. Lu, S. Park, E. Seo, and Y. Zhou, “Learning from mistakes: a comprehensive
study on real world concurrency bug characteristics,” In ASPLOS, pages 329–339, 2008.
[Manna and Wolper, 1984] Z. Manna and P. Wolper, “Synthesis of Communicating Processes
from Temporal Logic Specifications,” In TOPLAS, pages 68–93, 1984.
[Mazurkiewicz, 1987] A. Mazurkiewicz, “Trace theory,” In Petri Nets: Applications and
Relationships to Other Models of Concurrency, pages 278–324, 1987.
[McCarthy, 1960] J. McCarthy, “Recursive functions of symbolic expressions and their compu-
tation by machine, Part I,” Communications of the ACM, 3(4):184–195, 1960.
[McCarthy, 1962] J. McCarthy, “Towards a mathematical science of computation,” In IFIP
Congress, pages 21–28, 1962.
[McMillan, 1993] K. L. McMillan, Symbolic Model Checking, Springer US, 1993.
[memcached] “Memcached distributed memory object caching system,” http://memcached.org.
[Milner, 1980] R. Milner, A Calculus of Communicating Systems, Lecture Notes in Computer
Science. Springer-Verlag, 1980.
[Milner, 1992] R. Milner, “Functions as processes,” Mathematical Structures in Computer
Science, 2:119–141, 6 1992.
[Morse et al., 2014] J. Morse, M. Ramalho, L. Cordeiro, D. Nicole, and B. Fischer, “ESBMC
1.22,” In TACAS, pages 405–407, 2014, http://www.esbmc.org/.
148
[Musuvathi and Qadeer, 2007] M. Musuvathi and S. Qadeer, “Iterative context bounding for
systematic testing of multithreaded programs,” In PLDI, pages 446–455, 2007.
[Owicki and Gries, 1976] S. Owicki and D. Gries, “An axiomatic proof technique for parallel
programs I,” Acta Informatica, 6(4):319–340, 1976.
[Papadimitriou, 1986] C. Papadimitriou, The theory of database concurrency control, Computer
Science Press Inc., Rockville, MD, 1986.
[Petri, 1962] C. A. Petri, Kommunikation mit Automaten, PhD thesis, Universität Hamburg,
1962.
[Pnueli and Rosner, 1989] A. Pnueli and R. Rosner, “On the synthesis of a reactive module,” In
POPL, pages 179–190, 1989.
[Pnueli, 1977] A. Pnueli, “The temporal logic of programs,” In Foundations of Computer
Science, 18th Annual Symposium on, pages 46–57, 1977.
[Q] “Q Program Verifier,” http://research.microsoft.com/en-us/projects/
verifierq/.
[Qadeer and Rehof, 2005] S. Qadeer and J. Rehof, “Context-bounded model checking of con-
current software,” In Tools and Algorithms for the Construction and Analysis of Systems,
pages 93–107, 2005.
[Queille and Sifakis, 1982] J. Queille and J. Sifakis, “Specification and verification of concur-
rent systems in CESAR,” In International Symposium on Programming, pages 337–351,
1982.
[Rabin, 1972] M. O. Rabin, “Automata on infinite objects and Church’s problem,” 1972.
[Ramadge and Wonham, 1987] P. J. Ramadge and W. M. Wonham, “Supervisory control of a
class of discrete event processes,” SIAM journal on control and optimization, 25(1):206–230,
1987.
[Ryzhyk et al., 2009] L. Ryzhyk, P. Chubb, I. Kuz, and G. Heiser, “Dingo: Taming Device
Drivers,” In Eurosys, 2009.
149
[Sadowski and Yi, 2010] C. Sadowski and J. Yi, “User Evaluation of Correctness Conditions:
A Case Study of Cooperability,” In PLATEAU, pages 2:1–2:6, 2010.
[Said et al., 2011] M. Said, C. Wang, Z. Yang, and K. Sakallah, “Generating data race witnesses
by an SMT-based analysis,” In NASA Formal Methods, pages 313–327, 2011.
[Sakunkonchak et al., 2007] T. Sakunkonchak, S. Komatsu, and M. Fujita, “Using counter-
example analysis to minimize the number of predicates for predicate abstraction,” In ATVA,
pages 553–563, 2007.
[Samanta et al., 2008] R. Samanta, J. Deshmukh, and A. Emerson, “Automatic Generation of
Local Repairs for Boolean Programs,” In FMCAD, pages 1–10, 2008.
[Savage et al., 1997] S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson,
“Eraser: A dynamic data race detector for multithreaded programs,” ACM Transactions
on Computer Systems (TOCS), 15(4):391–411, 1997.
[Schnoebelen, 2002] P. Schnoebelen, “The Complexity of Temporal Logic Model Checking,”
Advances in modal logic, 4(393-436):35, 2002.
[Sen, 2008] K. Sen, “Race Directed Random Testing of Concurrent Programs,” In PLDI, 2008.
[Sharma et al., 2012] R. Sharma, A. V. Nori, and A. Aiken, “Interpolants as Classifiers,” In
CAV, pages 71–87, 2012.
[Sinha and Wang, 2010] N. Sinha and C. Wang, “Staged concurrent program analysis,” In FSE,
2010.
[Sinha and Wang, 2011] N. Sinha and C. Wang, “On Interference Abstractions,” In POPL,
2011.
[Solar-Lezama et al., 2008] A. Solar-Lezama, C. Jones, and R. Bodík, “Sketching concurrent
data structures,” In PLDI, pages 136–148, 2008.
[Solar-Lezama et al., 2006] A. Solar-Lezama, L. Tancau, R. Bodík, S. A. Seshia, and V. A.
Saraswat, “Combinatorial sketching for finite programs,” In ASPLOS, pages 404–415, 2006.
[Vechev and Yahav, 2008] M. Vechev and E. Yahav, “Deriving linearizable fine-grained concur-
rent objects,” In PLDI, pages 125–135, 2008.
150
[Vechev et al., 2010a] M. Vechev, E. Yahav, and G. Yorsh, “Abstraction-guided synthesis of
synchronization,” In POPL, pages 327–338, 2010.
[Vechev et al., 2009] M. T. Vechev, E. Yahav, and G. Yorsh, “Inferring Synchronization under
Limited Observability,” In TACAS, pages 139–154, 2009.
[Vechev et al., 2010b] M. T. Vechev, E. Yahav, R. Raman, and V. Sarkar, “Automatic Verification
of Determinism for Structured Parallel Programs,” In SAS, pages 455–471, 2010.
[von Essen and Jobstmann, 2013] C. von Essen and B. Jobstmann, “Program Repair without
Regret,” In CAV, pages 896–911, 2013.
[Waldinger and Lee, 1969] R. J. Waldinger and R. C. T. Lee, “PROW: a step toward automatic
program writing,” In Proceedings of the 1st international joint conference on Artificial
intelligence, pages 241–252, 1969.
[Wang et al., 2009] C. Wang, S. Kundu, M. Ganai, and A. Gupta, “Symbolic predictive analysis
for concurrent programs,” In FM, pages 256–272, 2009.
[Wang et al., 2006] C. Wang, B. Li, H. Jin, G. Hachtel, and F. Somenzi, “Improving Ariadne’s
bundle by following multiple threads in abstraction refinement,” IEEE TCAD, pages 2297–
2316, 2006.
[Wang et al., 2010] C. Wang, R. Limaye, M. Ganai, and A. Gupta, “Trace-based symbolic
analysis for atomicity violations,” In TACAS, pages 328–342, 2010.
[Weeratunge et al., 2011] D. Weeratunge, X. Zhang, and S. Jaganathan, “Accentuating the
positive: atomicity inference and enforcement using correct executions,” In OOPSLA, pages
19–34, 2011.
[Yi and Flanagan, 2010] J. Yi and C. Flanagan, “Effects for cooperable and serializable threads,”
In Proceedings of the 5th ACM SIGPLAN workshop on Types in language design and imple-
mentation, pages 3–14, 2010.
[Zhang and Wang, 2014] L. Zhang and C. Wang, “Runtime prevention of concurrency related
type-state violations in multithreaded applications,” In ISSTA, pages 1–12, 2014.
151
List of Publications
• Pavol Černý, Edmund M. Clarke, Thomas A. Henzinger, Arjun Radhakrishna, Leonid
Ryzhyk, Roopsha Samanta and Thorsten Tarrach,
“From Non-preemptive to Preemptive Scheduling Using Synchronization Synthesis,”
In Computer Aided Verification (2015), pages 180–197.
Implementation: https://github.com/thorstent/Liss
• Ashutosh Gupta, Thomas A. Henzinger, Arjun Radhakrishna, Roopsha Samanta and
Thorsten Tarrach,
“Succinct Representation of Concurrent Trace Sets,”
In Principles of Programming Languages (2015), pages 433–444.
Implementation: https://github.com/thorstent/TARA
• Pavol Černý, Thomas A. Henzinger, Arjun Radhakrishna, Leonid Ryzhyk and Thorsten
Tarrach,
“Regression-Free Synthesis for Concurrency,”
In Computer Aided Verification (2014), pages 568–584.
Implementation: https://github.com/thorstent/ConRepair
• Pavol Černý, Thomas A. Henzinger, Arjun Radhakrishna, Leonid Ryzhyk and Thorsten
Tarrach,
“Efficient Synthesis for Concurrency by Semantics-Preserving Transformations,”
In Computer Aided Verification (2013), pages 951–967, 2013.
Implementation: https://github.com/thorstent/ConcurrencySwapper
• Michael Backes, Cătălin Hriţcu and Thorsten Tarrach,
“Automatically verifying typing constraints for a data processing language,”
In Certified Programs and Proofs (2011), pages 296–313.
