From Non-preemptive to Preemptive Scheduling using Synchronization Synthesis by Černý, Pavol et al.
From Non-preemptive to Preemptive Scheduling
using Synchronization Synthesis ⋆
Pavol Černý1, Edmund M. Clarke2, Thomas A. Henzinger3, Arjun
Radhakrishna4, Leonid Ryzhyk2, Roopsha Samanta3, and Thorsten Tarrach3
1 University of Colorado Boulder
2 Carnegie Mellon University
3 IST Austria
4 University of Pennsylvania
Abstract. We present a computer-aided programming approach to con-
currency. The approach allows programmers to program assuming a
friendly, non-preemptive scheduler, and our synthesis procedure inserts
synchronization to ensure that the final program works even with a pre-
emptive scheduler. The correctness specification is implicit, inferred from
the non-preemptive behavior. Let us consider sequences of calls that the
program makes to an external interface. The specification requires that
any such sequence produced under a preemptive scheduler should be in-
cluded in the set of such sequences produced under a non-preemptive
scheduler. The solution is based on a finitary abstraction, an algorithm
for bounded language inclusion modulo an independence relation, and
rules for inserting synchronization. We apply the approach to device-
driver programming, where the driver threads call the software interface
of the device and the API provided by the operating system. Our exper-
iments demonstrate that our synthesis method is precise and efficient,
and, since it does not require explicit specifications, is more practical
than the conventional approach based on user-provided assertions.
1 Introduction
Concurrent shared-memory programming is notoriously difficult and error-prone.
Program synthesis for concurrency aims to mitigate this complexity by synthe-
sizing synchronization code automatically [4, 5, 8, 11]. However, specifying the
programmer’s intent may be a challenge in itself. Declarative mechanisms, such
as assertions, suffer from the drawback that it is difficult to ensure that the
specification is complete and fully captures the programmer’s intent.
We propose a solution where the specification is implicit. We observe that
a core difficulty in concurrent programming originates from the fact that the
scheduler can preempt the execution of a thread at any time. We therefore give
the developer the option to program assuming a friendly, non-preemptive scheduler.
Our tool automatically synthesizes synchronization code to ensure that
every behavior of the program under preemptive scheduling is included in the
set of behaviors produced under non-preemptive scheduling. Thus, we use the
non-preemptive semantics as an implicit correctness specification.
⋆ This research was supported in part by the European Research Council (ERC) under
grant 267989 (QUAREM), by the Austrian Science Fund (FWF) under grants
S11402-N23 (RiSE) and Z211-N23 (Wittgenstein Award), by NSF under award CCF
1421752 and the Expeditions award CCF 1138996, by the Simons Foundation, and
by a gift from the Intel Corporation.
The non-preemptive scheduling model dramatically simplifies the develop-
ment of concurrent software, including operating system (OS) kernels, network
servers, database systems, etc. [13, 14]. In this model, a thread can only be de-
scheduled by voluntarily yielding control, e.g., by invoking a blocking operation.
Synchronization primitives may be used for communication between threads,
e.g., a producer thread may use a semaphore to notify the consumer about
availability of data. However, one does not need to worry about protecting ac-
cesses to shared state: a series of memory accesses executes atomically as long
as the scheduled thread does not yield.
In defining behavioral equivalence between preemptive and non-preemptive
executions, we focus on externally observable program behaviors: two program
executions are observationally equivalent if they generate the same sequences of
calls to interfaces of interest. This approach facilitates modular synthesis where a
module’s behavior is characterized in terms of its interaction with other modules.
Given a multi-threaded program C and a synthesized program C′ obtained by
adding synchronization to C, C′ is preemption-safe w.r.t. C if for each execution
of C′ under a preemptive scheduler, there is an observationally equivalent non-
preemptive execution of C. Our synthesis goal is to automatically generate a
preemption-safe version of the input program.
We rely on abstraction to achieve efficient synthesis of multi-threaded pro-
grams. We propose a simple, data-oblivious abstraction inspired by an analysis
of synchronization patterns in OS code, which tend to be independent of data
values. The abstraction tracks types of accesses (read or write) to each memory
location while ignoring their values. In addition, the abstraction tracks branch-
ing choices. Calls to an external interface are modeled as writes to a special
memory location, with independent interfaces modeled as separate locations. To
the best of our knowledge, our proposed abstraction is yet to be explored in the
verification and synthesis literature.
Two abstract program executions are observationally equivalent if they are
equal modulo the classical independence relation I on memory accesses: accesses
to different locations are independent, and accesses to the same location are
independent iff they are both read accesses. Using this notion of equivalence,
the notion of preemption-safety is extended to abstract programs.
Under abstraction, we model each thread as a nondeterministic finite automa-
ton (NFA) over a finite alphabet, with each symbol corresponding to a read or a
write to a particular variable. This enables us to construct NFAs N, representing
the abstraction of the original program C under non-preemptive scheduling,
and P , representing the abstraction of the synthesized program C′ under pre-
emptive scheduling. We show that preemption-safety of C′ w.r.t. C is implied
by preemption-safety of the abstract synthesized program w.r.t. the abstract
original program, which, in turn, is implied by language inclusion modulo I of
NFAs P and N . While the problem of language inclusion modulo an indepen-
dence relation is undecidable [2], we show that the antichain-based algorithm for
standard language inclusion [9] can be adapted to decide a bounded version of
language inclusion modulo an independence relation.
Our overall synthesis procedure works as follows: we run the algorithm for
bounded language inclusion modulo I, iteratively increasing the bound, until it
reports that the inclusion holds, or finds a counterexample, or reaches a timeout.
In the first case, the synthesis procedure terminates successfully. In the second
case, the counterexample is generalized to a set of counterexamples represented
as a Boolean combination of ordering constraints over control-flow locations
(as in [11]). These constraints are analyzed for patterns indicating the type of
concurrency bug (atomicity, ordering violation) and the type of applicable fix
(lock insertion, statement reordering). After applying the fix(es), the procedure
is restarted from scratch; the process continues until we find a preemption-safe
program, or reach a timeout.
We implemented our synthesis procedure in a new prototype tool called Liss
(Language Inclusion-based Synchronization Synthesis) and evaluated it on a se-
ries of device driver benchmarks, including an Ethernet driver for Linux and the
synchronization skeleton of a USB-to-serial controller driver. First, Liss was able
to detect and eliminate all but two known race conditions in our examples; these
included one race condition that we previously missed when synthesizing from
explicit specifications [5], due to a missing assertion. Second, our abstraction
proved highly efficient: Liss runs an order of magnitude faster on the more com-
plicated examples than our previous synthesis tool based on the CBMC model
checker. Third, our coarse abstraction proved surprisingly precise in practice:
across all our benchmarks, we only encountered three program locations where
manual abstraction refinement was needed to avoid the generation of unneces-
sary synchronization. Overall, our evaluation strongly supports the use of the
implicit specification approach based on non-preemptive scheduling semantics
as well as the use of the data-oblivious abstraction to achieve practical synthesis
for real-world systems code.
Contributions. First, we propose a new specification-free approach to synchro-
nization synthesis. Given a program written assuming a friendly, non-preemptive
scheduler, we automatically generate a preemption-safe version of the program.
Second, we introduce a novel abstraction scheme and use it to reduce preemption-
safety to language inclusion modulo an independence relation. Third, we present
the first language inclusion-based synchronization synthesis procedure and tool
for concurrent programs. Our synthesis procedure includes a new algorithm for
a bounded version of our inherently undecidable language inclusion problem.
Finally, we evaluate our synthesis procedure on several examples. To the best of
our knowledge, Liss is the first synthesis tool capable of handling realistic (al-
beit simplified) device driver code, while previous tools were evaluated on small
fragments of driver code or on manually extracted synchronization skeletons.
Related work. Synthesis of synchronization is an active research area [3–6,10–
12,15, 16]. Closest to our work is a recent paper by Bloem et al. [3], which uses
implicit specifications for synchronization synthesis. While their specification is
given by sequential behaviors, ours is given by non-preemptive behaviors. This
makes our approach applicable to scenarios where threads need to communicate
explicitly. Further, correctness in [3] is determined by comparing values at the
end of the execution. In contrast, we compare sequences of events, which serves
as a more suitable specification for infinitely-looping reactive systems.

(a)
void open_dev() {
1: while (*) {
2:   if (open==0) {
3:     power_up();
4:   }
5:   open=open+1;
6:   yield; } }
void close_dev() {
7: while (*) {
8:   if (open>0) {
9:     open=open-1;
10:    if (open==0) {
11:      power_down();
12:  } }
13:  yield; } }

(b)
void open_dev_abs() {
1: while (*) {
2:   (A) r open;
     if (*) {
3:     (B) w dev;
4:   }
5:   (C) r open;
     (D) w open;
6:   yield; } }
void close_dev_abs() {
7: while (*) {
8:   (E) r open;
     if (*) {
9:     (F) r open;
       (G) w open;
10:    (H) r open;
       if (*) {
11:      (I) w dev;
12:  } }
13:  yield; } }

Fig. 1: Running example and its abstraction
Many efforts in synthesis of synchronization focus on user-provided specifi-
cations, such as assertions (our previous work [4, 5, 11]). However, it is hard to
determine if a given set of assertions represents a complete specification. In this
paper, we are solving language inclusion, a computationally harder problem than
reachability. However, due to our abstraction, our tool performs significantly bet-
ter than tools from [4,5], which are based on a mature model checker (CBMC [7]).
Our abstraction is reminiscent of previously used abstractions that track reads
and writes to individual locations (e.g., [1,17]). However, our abstraction is novel
as it additionally tracks some control-flow information (specifically, the branches
taken) giving us higher precision with almost negligible computational cost. The
synthesis part of our approach is based on [11].
In [16] the authors rely on assertions for synchronization synthesis and in-
clude iterative abstraction refinement in their framework. This is an interesting
extension to pursue for our abstraction. In other related work, CFix [12] can
detect and fix concurrency bugs by identifying simple bug patterns in the code.
2 Illustrative Example
Fig. 1a contains our running example. Consider the case where the procedures
open_dev() and close_dev() are invoked in parallel, possibly multiple times
(modeled as a non-deterministic while loop). The functions power_up() and
power_down() represent calls to a device. For the non-preemptive scheduler,
the sequence of calls to the device will always be a repeating sequence of one
call to power_up(), followed by one call to power_down(). Without additional
synchronization, however, there could be two calls to power_up() in a row when
executing it with a preemptive scheduler. Such a sequence is not observationally
equivalent to any sequence that can be produced when executing with a non-
preemptive scheduler.
Fig. 1b contains the abstracted versions of the two procedures, open_dev_abs()
and close_dev_abs() (we omit tracking of branching choices in the example).
For instance, the instruction open = open + 1 is abstracted
to the two instructions labeled (C) and (D). The abstraction is coarse, but
still captures the problem. Consider two threads T1 and T2 running the
open_dev_abs() procedure. The following trace is possible under a preemptive
scheduler, but not under a non-preemptive scheduler: T1.A; T2.A; T1.B;
T1.C; T1.D; T2.B; T2.C; T2.D. Moreover, the trace cannot be transformed
by swapping independent events into any trace possible under a non-preemptive
scheduler. This is because instructions A and D are not independent: T2.A reads
open and T1.D writes open, so T2.A cannot be moved past T1.D. Hence,
the abstract trace exhibits the problem of two successive calls to power_up()
when executing with a preemptive scheduler. Our synthesis procedure finds this
problem, and fixes it by introducing a lock in open_dev() (see Sec. 5).
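The difference between the two schedulers in the running example can be replayed mechanically. The following Python sketch is an illustration only and is not part of the authors' tool. It assumes a single iteration of open_dev() in each of two threads, split into four atomic steps (the read and the write of open = open + 1 are separate steps, as in the abstraction), and it ignores close_dev(). The helper names run and interleavings are hypothetical.

```python
from itertools import permutations

N_STEPS = 4  # atomic steps in one iteration of open_dev()

def step(pc, shared, local, calls):
    """Execute atomic step number `pc` of open_dev() for one thread."""
    if pc == 0:                       # evaluate the test `open == 0`
        local["was_closed"] = (shared["open"] == 0)
    elif pc == 1:                     # power_up() only if the test succeeded
        if local["was_closed"]:
            calls.append("power_up")
    elif pc == 2:                     # read half of `open = open + 1`
        local["tmp"] = shared["open"]
    else:                             # write half of `open = open + 1`
        shared["open"] = local["tmp"] + 1

def run(schedule, n_threads=2):
    """Run the threads in the order given by `schedule`; return the device calls."""
    shared, calls = {"open": 0}, []
    local = [dict() for _ in range(n_threads)]
    pcs = [0] * n_threads
    for t in schedule:
        step(pcs[t], shared, local[t], calls)
        pcs[t] += 1
    return tuple(calls)

def interleavings(counts):
    """All schedules in which thread t appears exactly counts[t] times."""
    if all(c == 0 for c in counts):
        yield ()
        return
    for t, c in enumerate(counts):
        if c > 0:
            rest = counts[:t] + (c - 1,) + counts[t + 1:]
            for tail in interleavings(rest):
                yield (t,) + tail

# Preemptive: a context switch may happen between any two atomic steps.
preemptive = {run(s) for s in interleavings((N_STEPS, N_STEPS))}
# Non-preemptive: each thread runs until its yield at the end of the iteration.
non_preemptive = {run((a,) * N_STEPS + (b,) * N_STEPS)
                  for a, b in permutations(range(2))}
```

Under these assumptions, the only call sequence present in preemptive but absent from non_preemptive is the one with two successive power_up calls, which is the violation discussed above.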
3 Preliminaries and Problem Statement
Syntax. We assume that programs are written in a concurrent while language W.
A concurrent program C in W is a finite collection of threads ⟨T1, . . . ,Tn⟩
where each thread is a statement written in the syntax from Fig. 2. All W variables
(program variables std_var, lock variables lock_var, and condition variables
cond_var) range over integers and each statement is labeled with a unique
location identifier l. The only non-standard syntactic constructs in W relate to
the tags. Intuitively, each tag is a communication channel between the program
and an interface to an external system, and the input(tag) and output(tag, expr)
statements read from and write to the channel. We assume that the program
and the external system interface can only communicate through the channel.
In practice, we use the tags to model device registers. In our presentation, we
consider only a single external interface. Our implementation can handle com-
munication with several interfaces.
expr  ::= std_var | constant | operator(expr, expr, . . ., expr)
lstmt ::= loc: stmt | lstmt; lstmt
stmt  ::= skip | std_var := expr | std_var := havoc()
        | if (expr) lstmt else lstmt | while (expr) lstmt | std_var := input(tag)
        | output(tag, expr) | lock(lock_var) | unlock(lock_var)
        | signal(cond_var) | await(cond_var) | reset(cond_var) | yield
Fig. 2: Syntax of W
Semantics. We begin by defining the semantics of a single thread in W, and then
extend the definition to concurrent non-preemptive and preemptive semantics.
Note that in our work, reads and writes are assumed to execute atomically and
further, we assume a sequentially consistent memory model.
Single-thread semantics. A program state is given by ⟨V,P⟩ where V is a valua-
tion of all program variables, and P is the statement that remains to be executed.
Let us fix a thread identifier tid .
The operational semantics of a thread executing in isolation is given in Fig. 3.
A single execution step $\langle V, P\rangle \xrightarrow{\alpha} \langle V', P'\rangle$ changes the program state from ⟨V,P⟩
to ⟨V′,P′⟩ while optionally outputting an observable symbol α. The absence of
a symbol is denoted using ε. Most rules from Fig. 3 are standard; the special
rules are the Havoc, Input, and Output rules.
1. Havoc: Statement l ∶ x ∶= havoc assigns x a non-deterministic value (say k)
and outputs the observable (tid ,havoc, k, x).
2. Input, Output: l ∶ x ∶= input(t) and l ∶ output(t, e) read and write values to
the channel t, and output (tid , input, k, t) and (tid ,output, k, t), where k is
the value read or written, respectively.
Intuitively, the observables record the sequence of non-deterministic guesses,
as well as the input/output interaction with the tagged channels. In the following,
e represents an expression and e[v/V[v]] evaluates an expression by replacing
all variables v with their values in V.
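\[
\frac{e[v/V[v]] = k}
     {\langle V,\ l : x := e\rangle \xrightarrow{\epsilon} \langle V[x := k],\ \mathsf{skip}\rangle}\ \textsc{Assign}
\qquad
\frac{k \in \mathbb{N} \quad \alpha = (\mathit{tid}, \mathsf{havoc}, k, x)}
     {\langle V,\ l : x := \mathsf{havoc}\rangle \xrightarrow{\alpha} \langle V[x := k],\ \mathsf{skip}\rangle}\ \textsc{Havoc}
\]
\[
\frac{e[v/V[v]] = \mathsf{false}}
     {\langle V,\ l : \mathsf{while}(e)\ s\rangle \xrightarrow{\epsilon} \langle V,\ \mathsf{skip}\rangle}\ \textsc{While1}
\qquad
\frac{e[v/V[v]] = \mathsf{true}}
     {\langle V,\ l : \mathsf{while}(e)\ s\rangle \xrightarrow{\epsilon} \langle V,\ s; \mathsf{while}(e)\ s\rangle}\ \textsc{While2}
\]
\[
\frac{e[v/V[v]] = \mathsf{true}}
     {\langle V,\ l : \mathsf{if}\ e\ \mathsf{then}\ s_1\ \mathsf{else}\ s_2\rangle \xrightarrow{\epsilon} \langle V,\ s_1\rangle}\ \textsc{If1}
\qquad
\frac{e[v/V[v]] = \mathsf{false}}
     {\langle V,\ l : \mathsf{if}\ e\ \mathsf{then}\ s_1\ \mathsf{else}\ s_2\rangle \xrightarrow{\epsilon} \langle V,\ s_2\rangle}\ \textsc{If2}
\]
\[
\frac{\langle V,\ s_1\rangle \xrightarrow{\alpha} \langle V',\ s_1'\rangle}
     {\langle V,\ l : s_1; s_2\rangle \xrightarrow{\alpha} \langle V',\ s_1'; s_2\rangle}\ \textsc{Sequence}
\qquad
\frac{k \in \mathbb{N} \quad \alpha = (\mathit{tid}, \mathsf{input}, k, t)}
     {\langle V,\ l : x := \mathsf{input}(t)\rangle \xrightarrow{\alpha} \langle V[x := k],\ \mathsf{skip}\rangle}\ \textsc{Input}
\]
\[
\frac{}
     {\langle V,\ l : \mathsf{skip}; s_2\rangle \xrightarrow{\epsilon} \langle V,\ s_2\rangle}\ \textsc{Skip}
\qquad
\frac{e[v/V[v]] = k \quad \alpha = (\mathit{tid}, \mathsf{output}, k, t)}
     {\langle V,\ l : \mathsf{output}(t, e)\rangle \xrightarrow{\alpha} \langle V,\ \mathsf{skip}\rangle}\ \textsc{Output}
\]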
Fig. 3: Single thread semantics of W
Non-preemptive semantics. The non-preemptive semantics of W is presented in
Appendix A. The non-preemptive semantics ensures that a single thread from
the program keeps executing as detailed above until one of the following occurs:
(a) the thread finishes execution, or it encounters (b) a yield statement, or (c) a
lock statement and the lock is taken, or (d) an await statement and the condition
variable is not set. In these cases, a context-switch is possible.
Preemptive semantics. The preemptive semantics of a program is obtained from
the non-preemptive semantics by relaxing the condition on context-switches, and
allowing context-switches at all program points (see Appendix A).
3.1 Problem statement
A non-preemptive observation sequence of a program C is a sequence α0 . . . αk for which
there exist program states $S^{pre}_0, S^{post}_0, \ldots, S^{pre}_k, S^{post}_k$ such that, according to
the non-preemptive semantics of W, we have: (a) for each 0 ≤ i ≤ k,
$\langle S^{pre}_i\rangle \xrightarrow{\alpha_i} \langle S^{post}_i\rangle$, (b) for each 0 ≤ i < k,
$\langle S^{post}_i\rangle \xrightarrow{\epsilon}{}^{*} \langle S^{pre}_{i+1}\rangle$, and (c) for the initial state $S_\iota$ and
a final state $S_f$ (i.e., where all threads have finished execution),
$\langle S_\iota\rangle \xrightarrow{\epsilon}{}^{*} \langle S^{pre}_0\rangle$ and $\langle S^{post}_k\rangle \xrightarrow{\epsilon}{}^{*} \langle S_f\rangle$.
Similarly, a preemptive observation sequence of a program C
is a sequence α0 . . . αk as above, with the non-preemptive semantics replaced with
preemptive semantics. We denote the sets of non-preemptive and preemptive
observation sequences of a program C by [[C]]NP and [[C]]P , respectively.
We say that observation sequences α0 . . . αk and β0 . . . βk are equivalent if:
– The subsequences of α0 . . . αk and β0 . . . βk containing only symbols of the
form (tid , input, k, t) and (tid , output, k, t) are equal, and
– For each thread identifier tid , the subsequences of α0 . . . αk and β0 . . . βk
containing only symbols of the form (tid , havoc, k, x) are equal.
Intuitively, observable sequences are equivalent if they have the same interaction
with the interface, and the same non-deterministic choices in each thread. For
sets of observable sequences O1 and O2, we write O1 ⊆ O2 to denote that each
sequence in O1 has an equivalent sequence in O2. Given a concurrent program
C and a synthesized program C′ obtained by adding synchronization to C, the
program C′ is preemption-safe w.r.t. C if [[C′]]P ⊆ [[C]]NP .
We are now ready to state our synthesis problem. Given a concurrent program
C, the aim is to synthesize a program C′, by adding synchronization to C, such
that C′ is preemption-safe w.r.t. C.
3.2 Language Inclusion Modulo an Independence Relation
We reduce the problem of checking if a synthesized solution is preemption-safe
w.r.t. the original program to an automata-theoretic problem.
Abstract semantics for W. We first define a single-thread abstract semantics
for W (Fig. 4), which tracks types of accesses (read or write) to each memory
location while abstracting away their values. Inputs/outputs to an external in-
terface are modeled as writes to a special memory location (dev). Even inputs
are modeled as writes because in our applications we cannot assume that reads
from the external interface are free of side-effects. Havocs become ordinary writes
to the variable they are assigned to. Every branch is taken non-deterministically
and tracked. The only constructs preserved are the lock and condition variables.
The abstract program state consists of the valuations of the lock and condition
variables and the statement that remains to be executed. In the abstraction,
an observable is of the form (tid ,{read,write, exit, loop, then, else}, v, l) and ob-
serves the type of access (read/write) to variable v and records non-deterministic
branching choices (exit/loop/then/else). The latter are not associated with any
variable.
In Fig. 4, given expression e, the function Reads(tid , e, l) represents the se-
quence (tid , read, v1, l) ⋅ . . . ⋅ (tid , read, vn, l) where v1, . . . , vn are the variables in
e, in the order they are read to evaluate e.
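\[
\frac{\alpha = \mathit{Reads}(\mathit{tid}, e, l) \cdot (\mathit{tid}, \mathsf{write}, x, l)}
     {\langle V,\ l : x := e\rangle \xrightarrow{\alpha} \langle V,\ \mathsf{skip}\rangle}\ \textsc{Assign}
\qquad
\frac{\alpha = (\mathit{tid}, \mathsf{write}, x, l)}
     {\langle V,\ l : x := \mathsf{havoc}\rangle \xrightarrow{\alpha} \langle V,\ \mathsf{skip}\rangle}\ \textsc{Havoc}
\]
\[
\frac{\alpha = \mathit{Reads}(\mathit{tid}, e, l) \cdot (\mathit{tid}, \mathsf{exit}, \ , l)}
     {\langle V,\ l : \mathsf{while}(e)\ s\rangle \xrightarrow{\alpha} \langle V,\ \mathsf{skip}\rangle}\ \textsc{While1}
\qquad
\frac{\alpha = \mathit{Reads}(\mathit{tid}, e, l) \cdot (\mathit{tid}, \mathsf{loop}, \ , l)}
     {\langle V,\ l : \mathsf{while}(e)\ s\rangle \xrightarrow{\alpha} \langle V,\ s; \mathsf{while}(e)\ s\rangle}\ \textsc{While2}
\]
\[
\frac{\alpha = \mathit{Reads}(\mathit{tid}, e, l) \cdot (\mathit{tid}, \mathsf{then}, \ , l)}
     {\langle V,\ l : \mathsf{if}\ e\ \mathsf{then}\ s_1\ \mathsf{else}\ s_2\rangle \xrightarrow{\alpha} \langle V,\ s_1\rangle}\ \textsc{If1}
\qquad
\frac{\alpha = \mathit{Reads}(\mathit{tid}, e, l) \cdot (\mathit{tid}, \mathsf{else}, \ , l)}
     {\langle V,\ l : \mathsf{if}\ e\ \mathsf{then}\ s_1\ \mathsf{else}\ s_2\rangle \xrightarrow{\alpha} \langle V,\ s_2\rangle}\ \textsc{If2}
\]
\[
\frac{\langle V,\ s_1\rangle \xrightarrow{\alpha} \langle V',\ s_1'\rangle}
     {\langle V,\ l : s_1; s_2\rangle \xrightarrow{\alpha} \langle V',\ s_1'; s_2\rangle}\ \textsc{Sequence}
\qquad
\frac{\alpha = (\mathit{tid}, \mathsf{write}, \mathsf{dev}, l) \cdot (\mathit{tid}, \mathsf{write}, x, l)}
     {\langle V,\ l : x := \mathsf{input}(t)\rangle \xrightarrow{\alpha} \langle V,\ \mathsf{skip}\rangle}\ \textsc{Input}
\]
\[
\frac{}
     {\langle V,\ l : \mathsf{skip}; s_2\rangle \xrightarrow{\epsilon} \langle V,\ s_2\rangle}\ \textsc{Skip}
\qquad
\frac{\alpha = \mathit{Reads}(\mathit{tid}, e, l) \cdot (\mathit{tid}, \mathsf{write}, \mathsf{dev}, l)}
     {\langle V,\ l : \mathsf{output}(t, e)\rangle \xrightarrow{\alpha} \langle V,\ \mathsf{skip}\rangle}\ \textsc{Output}
\]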
Fig. 4: Single thread abstract semantics of W
The abstract program semantics (Figures 6 and 7) is the same as the concrete
program semantics where the single thread semantics is replaced by the abstract
single thread semantics. Locks and condition variables, and the operations on
them, are not abstracted.
As with the concrete semantics of W, we can define the non-preemptive and
preemptive observable sequences for abstract semantics. For a concurrent pro-
gram C, we denote the sets of abstract preemptive and non-preemptive observable
sequences by [[C]]Pabs and [[C]]NPabs , respectively.
Abstract observation sequences α0 . . . αk and β0 . . . βk are equivalent if:
– For each thread tid , the subsequences of α0 . . . αk and β0 . . . βk
containing only symbols of the form (tid , a, v, l), with a ∈{read,write, exit, loop, then, else} are equal,
– For each variable v, the subsequences of α0 . . . αk and β0 . . . βk containing
only write symbols (of the form (tid ,write, v, l)) are equal, and
– For each variable v, the multisets of symbols of the form (tid , read, v, l) be-
tween any two write symbols, as well as before the first write symbol and
after the last write symbol are identical.
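The three conditions above translate directly into a check on two finite sequences. The Python sketch below is a minimal illustration, assuming abstract symbols are encoded as tuples (tid, access, variable, location) and that branch symbols carry None as their variable; the function names are hypothetical. As a usage example it encodes open_dev_abs() from Fig. 1b with the if branch taken and confirms that the preemptive trace of Sec. 2 is not equivalent to either non-preemptive trace built from the same events.

```python
from collections import Counter

def per_thread(seq, tid):
    """Subsequence of all symbols issued by thread `tid`."""
    return [s for s in seq if s[0] == tid]

def writes_to(seq, var):
    """Subsequence of write symbols on variable `var`."""
    return [s for s in seq if s[1] == "write" and s[2] == var]

def read_multisets(seq, var):
    """Multisets of reads of `var` in each segment delimited by writes to `var`."""
    segments = [Counter()]
    for s in seq:
        if s[2] != var:
            continue
        if s[1] == "write":
            segments.append(Counter())   # a write closes the current segment
        elif s[1] == "read":
            segments[-1][s] += 1
    return segments

def equivalent(seq1, seq2):
    """Check the three equivalence conditions on abstract observation sequences."""
    tids = {s[0] for s in seq1 + seq2}
    variables = {s[2] for s in seq1 + seq2 if s[1] in ("read", "write")}
    return (all(per_thread(seq1, t) == per_thread(seq2, t) for t in tids)
            and all(writes_to(seq1, v) == writes_to(seq2, v) for v in variables)
            and all(read_multisets(seq1, v) == read_multisets(seq2, v)
                    for v in variables))

def open_dev_abs(tid):
    """One iteration of open_dev_abs() with the branch taken (labels A to D)."""
    return [(tid, "read", "open", "A"), (tid, "write", "dev", "B"),
            (tid, "read", "open", "C"), (tid, "write", "open", "D")]

t1, t2 = open_dev_abs("T1"), open_dev_abs("T2")
# T1.A; T2.A; T1.B; T1.C; T1.D; T2.B; T2.C; T2.D
preemptive_trace = [t1[0], t2[0], t1[1], t1[2], t1[3], t2[1], t2[2], t2[3]]
non_preemptive_traces = [t1 + t2, t2 + t1]
```

The check fails for t1 + t2 because the read T2.A lands before the write T1.D in the preemptive trace but after it in the non-preemptive one, and it fails for t2 + t1 because the order of the writes differs.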
We first show that the abstract semantics is sound w.r.t. preemption-safety (see
Appendix B for the proof).
Theorem 1. Given concurrent program C and a synthesized program C′ ob-
tained by adding synchronization to C, [[C′]]Pabs ⊆ [[C]]NPabs ⇒ [[C′]]P ⊆ [[C]]NP .
Abstract semantics to automata. An NFA A is a tuple (Q,Σ,∆,Qι, F )
where Σ is a finite alphabet, Q,Qι, F are finite sets of states, initial states and
final states, respectively and ∆ is a set of transitions. A word σ0 . . . σk ∈ Σ∗ is
accepted by A if there exists a sequence of states q0 . . . qk+1 such that q0 ∈ Qι and
qk+1 ∈ F and ∀i ∶ (qi, σi, qi+1) ∈ ∆. The set of all words accepted by A is called
the language of A and is denoted L(A).
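As a concrete reading of this definition, the following Python sketch decides acceptance by tracking the set of states reachable after each symbol. The dictionary-based encoding of an NFA (keys init, final, delta) is an assumption made for illustration, and ε-transitions are not handled. The sample automaton has transitions q0 to q1 on a, q1 to q2 on b, and q0 to q2 on b, so it accepts exactly the words ab and b.

```python
def accepts(nfa, word):
    """Return True iff `word` (a tuple of symbols) is in the language of `nfa`."""
    current = set(nfa["init"])
    for sym in word:
        # All states reachable from the current set on symbol `sym`.
        current = {t for s in current for t in nfa["delta"].get((s, sym), ())}
    return bool(current & nfa["final"])

# Sample NFA (hypothetical encoding): L(B) = {ab, b}.
B = {
    "init": {"q0"},
    "final": {"q2"},
    "delta": {("q0", "a"): {"q1"}, ("q1", "b"): {"q2"}, ("q0", "b"): {"q2"}},
}
```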
Given a program C, we can construct automata A([[C]]NPabs ) and A([[C]]Pabs)
that accept exactly the observable sequences under the respective semantics.
We describe their construction informally. Each automaton state is a program
state of the abstract semantics and the alphabet is the set of abstract observable
symbols. There is a transition from one state to another on an observable symbol
(or on ε) iff the program can execute one step under the corresponding semantics
to reach the other state while outputting the observable symbol (or ε).
Language inclusion modulo an independence relation. Let I be an irreflexive,
symmetric binary relation over an alphabet Σ. We refer to I as the
independence relation and to elements of I as independent symbol pairs. We
define a symmetric binary relation ≈ over words in Σ∗: for all words σ,σ′ ∈ Σ∗
and (α,β) ∈ I, (σ ⋅ αβ ⋅ σ′, σ ⋅ βα ⋅ σ′) ∈ ≈. Let ≈t denote the reflexive transitive
closure of ≈; the equivalence classes of ≈t are Mazurkiewicz traces. Given a
language L over Σ, the closure of L w.r.t. I, denoted
CloI(L), is the set {σ ∈ Σ∗∶ ∃σ′ ∈ L with (σ,σ′) ∈ ≈t}. Thus, CloI(L) consists of
all words that can be obtained from some word in L by repeatedly commuting
adjacent independent symbol pairs from I.
Definition 1 (Language inclusion modulo an independence relation).
Given NFAs A,B over a common alphabet Σ and an independence relation I
over Σ, the language inclusion problem modulo I is: L(A) ⊆ CloI(L(B))?
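For a finite language the closure CloI(L) can be computed by exhaustive search. The Python sketch below is only an illustration of the definition (the paper works with regular languages given by NFAs, where this enumeration is not available in general). It assumes words are tuples of symbols and that the independence relation is given as a set of ordered pairs containing both orientations of each pair; the function name is hypothetical.

```python
from collections import deque

def closure(language, indep):
    """All words reachable from `language` by swapping adjacent independent symbols."""
    seen = set(language)
    queue = deque(seen)
    while queue:
        w = queue.popleft()
        for i in range(len(w) - 1):
            if (w[i], w[i + 1]) in indep:
                # Commute the symbols at positions i and i + 1.
                v = w[:i] + (w[i + 1], w[i]) + w[i + 2:]
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    return seen

indep_ab = {("a", "b"), ("b", "a")}
example_closure = closure({("a", "b"), ("b",)}, indep_ab)
```

The search terminates because a word has only finitely many permutations, and every word it produces is related to some word of the input language by the reflexive transitive closure of the swap relation.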
We reduce preemption-safety under the abstract semantics to language
inclusion modulo an independence relation. The independence relation I
we use is defined on the set of abstract observable symbols as follows:
((tid , a, v, l), (tid′, a′, v′, l′)) ∈ I iff tid ≠ tid′, and one of the following holds:
(a) v ≠ v′ or (b) a ≠ write ∧ a′ ≠ write.
Proposition 1. Given concurrent programs C and C′, [[C′]]Pabs ⊆ [[C]]NPabs iff
L(A([[C′]]Pabs)) ⊆ CloI(L(A([[C]]NPabs ))).
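The independence relation on abstract observable symbols is a simple predicate. A minimal Python sketch, assuming symbols are encoded as tuples (tid, access, variable, location); the encoding and the function name are illustrative choices, not taken from the tool:

```python
def independent(s1, s2):
    """True iff the two abstract symbols are independent.

    Symbols are (tid, access, variable, location) tuples. Two symbols are
    independent iff they come from different threads and either touch
    different variables or are both non-writes.
    """
    tid1, a1, v1, _ = s1
    tid2, a2, v2, _ = s2
    return tid1 != tid2 and (v1 != v2 or (a1 != "write" and a2 != "write"))
```

With the labels of Fig. 1b, the read T2.A and the write T1.D both access open, so independent returns False for that pair; this is the pair that blocks the reordering in the running example.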
4 Checking Language Inclusion
We first focus on the problem of language inclusion modulo an independence
relation (Definition 1). This question corresponds to preemption-safety (Thm. 1,
Prop. 1) and its solution drives our synchronization synthesis (Sec. 5).
Theorem 2. For NFAs A,B over alphabet Σ and an independence relation I ⊆
Σ ×Σ, L(A) ⊆ CloI(L(B)) is undecidable [2].
Fortunately, a bounded version of the problem is decidable. Recall the relation
≈ over Σ∗ from Sec. 3.2. We define a symmetric binary relation ≈i over Σ∗:
(σ,σ′) ∈ ≈i iff ∃(α,β) ∈ I: (σ,σ′) ∈ ≈, σ[i] = σ′[i + 1] = α and σ[i + 1] = σ′[i] = β.
Thus ≈i relates exactly those pairs of words that can be obtained from each other
by commuting the symbols at positions i and i + 1. We next define a symmetric binary
relation ≍ over Σ∗: (σ,σ′) ∈ ≍ iff ∃σ1, . . . , σt:
$(\sigma,\sigma_1) \in\ \approx_{i_1}, \ldots, (\sigma_t,\sigma') \in\ \approx_{i_{t+1}}$ and
$i_1 < \ldots < i_{t+1}$. The relation ≍ intuitively relates words obtained from each
other by making a single forward pass commuting multiple pairs of adjacent
symbols. Let ≍k denote the k-composition of ≍ with itself. Given a language L
over Σ, we use Clok,I(L) to denote the set {σ ∈ Σ∗ ∶ ∃σ′ ∈ L with (σ,σ′) ∈ ≍k}.
In other words, Clok,I(L) consists of all words which can be generated from L
using a finite-state transducer that remembers at most k symbols of its input
words in its states.
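To make the bounded closure concrete, the Python sketch below enumerates it for a finite language. It is an illustration under two assumptions that go beyond the text: a forward pass is allowed to perform no swap at all (so that words of L stay in the result, as in Example 1 below), and the independence relation is passed as a set of ordered pairs containing both orientations. The function names are hypothetical.

```python
def forward_pass(word, indep, start=0):
    """Words obtained from `word` by swaps at strictly increasing positions >= start."""
    result = {word}  # assumption: the empty pass (no swap) is allowed
    for i in range(start, len(word) - 1):
        if (word[i], word[i + 1]) in indep:
            swapped = word[:i] + (word[i + 1], word[i]) + word[i + 2:]
            # Any further swap in this pass must use a position greater than i.
            result |= forward_pass(swapped, indep, i + 1)
    return result

def bounded_closure(language, indep, k):
    """Apply k forward passes to every word of a finite language."""
    current = set(language)
    for _ in range(k):
        current = {v for w in current for v in forward_pass(w, indep)}
    return current

def symmetric(pairs):
    """Add the reversed orientation of every pair."""
    return set(pairs) | {(b, a) for a, b in pairs}

all_indep = symmetric({("a", "b"), ("a", "c"), ("b", "c")})
one_pass = bounded_closure({("a", "b", "c")}, all_indep, 1)
two_passes = bounded_closure({("a", "b", "c")}, all_indep, 2)
```

With all three symbols pairwise independent, a single pass from abc reaches four of the six permutations (a symbol can only be pushed forward within one pass), while two passes reach all six. This shows why the bound k matters.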
Definition 2 (Bounded language inclusion modulo an independence
relation). Given NFAs A,B over Σ, I ⊆ Σ × Σ and a constant k > 0, the
k-bounded language inclusion problem modulo I is: L(A) ⊆ Clok,I(L(B))?
Theorem 3. For NFAs A,B over Σ, I ⊆ Σ ×Σ and a constant k > 0, L(A) ⊆
Clok,I(L(B)) is decidable.
We present an algorithm to check k-bounded language inclusion modulo I,
based on the antichain algorithm for standard language inclusion [9].
Antichain algorithm for language inclusion. Given a partial order (X,⊑),
an antichain over X is a set of elements of X that are incomparable w.r.t. ⊑.
In order to check L(A) ⊆ L(B) for NFAs A = (QA,Σ,∆A,Qι,A, FA) and
B = (QB ,Σ,∆B ,Qι,B , FB), the antichain algorithm proceeds by exploring A
and B in lockstep. While A is explored nondeterministically, B is determinized
on the fly for exploration. The algorithm maintains an antichain, consisting of
tuples of the form (sA, SB), where sA ∈ QA and SB ⊆ QB . The ordering relation
⊑ is given by (sA, SB) ⊑ (s′A, S′B) iff sA = s′A and SB ⊆ S′B . The algorithm also
maintains a frontier set of tuples yet to be explored.
Given state sA ∈ QA and a symbol α ∈ Σ, let succα(sA) denote {s′A ∈ QA ∶
(sA, α, s′A) ∈ ∆A}. Given set of states SB ⊆ QB , let succα(SB) denote {s′B ∈
QB ∶ ∃sB ∈ SB ∶ (sB , α, s′B) ∈ ∆B}. Given tuple (sA, SB) in the frontier set, let
succα(sA, SB) denote {(s′A, S′B) ∶ s′A ∈ succα(sA), S′B = succα(SB)}.
In each step, the antichain algorithm explores A and B by computing α-
successors of all tuples in its current frontier set for all possible symbols α ∈ Σ.
Whenever a tuple (sA, SB) is found with sA ∈ FA and SB ∩ FB = ∅, the algo-
rithm reports a counterexample to language inclusion. Otherwise, the algorithm
updates its frontier set and antichain to include the newly computed successors
using the two rules enumerated below. Given a newly computed successor tuple
p′:
– Rule 1: if there exists a tuple p in the antichain with p ⊑ p′, then p′ is not
added to the frontier set or antichain,
– Rule 2: else, if there exist tuples p1, . . . , pn in the antichain with p′ ⊑
p1, . . . , pn, then p1, . . . , pn are removed from the antichain.
The algorithm terminates by either reporting a counterexample, or by declaring
success when the frontier becomes empty.
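The steps above can be sketched in Python for plain language inclusion L(A) ⊆ L(B). This is a minimal illustration rather than the implementation used in Liss: the dictionary encoding of NFAs is an assumption, ε-transitions are not handled, and a tuple that passes Rule 1 is added to both the antichain and the frontier after Rule 2 has pruned the antichain. For the k-bounded check modulo I, B would be replaced by the automaton Bk,I.

```python
from collections import deque

def antichain_inclusion(A, B, alphabet):
    """Return None if L(A) is included in L(B), else a counterexample word."""
    def succ_set(states, sym):
        # On-the-fly determinization step for B.
        return frozenset(t for s in states for t in B["delta"].get((s, sym), ()))

    start_B = frozenset(B["init"])
    antichain = {(sA, start_B) for sA in A["init"]}
    frontier = deque((sA, start_B, ()) for sA in A["init"])
    while frontier:
        sA, SB, word = frontier.popleft()
        if sA in A["final"] and not (SB & B["final"]):
            return word  # A accepts `word`, but no run of B does
        for sym in alphabet:
            SB2 = succ_set(SB, sym)
            for sA2 in A["delta"].get((sA, sym), ()):
                # Rule 1: skip the new tuple if a smaller tuple is already stored.
                if any(q == sA2 and S <= SB2 for q, S in antichain):
                    continue
                # Rule 2: drop stored tuples that the new tuple makes redundant.
                antichain = {(q, S) for q, S in antichain
                             if not (q == sA2 and SB2 <= S)}
                antichain.add((sA2, SB2))
                frontier.append((sA2, SB2, word + (sym,)))
    return None

# Hypothetical sample automata: L(B) = {ab, b}, L(A_ab) = {ab}, L(A_ba) = {ba}.
B = {"init": {"q0"}, "final": {"q2"},
     "delta": {("q0", "a"): {"q1"}, ("q1", "b"): {"q2"}, ("q0", "b"): {"q2"}}}
A_ab = {"init": {"p0"}, "final": {"p2"},
        "delta": {("p0", "a"): {"p1"}, ("p1", "b"): {"p2"}}}
A_ba = {"init": {"p0"}, "final": {"p2"},
        "delta": {("p0", "b"): {"p1"}, ("p1", "a"): {"p2"}}}
```

Rule 1 is sound because a stored tuple with the same A-state and a smaller set of B-states reaches a counterexample whenever the discarded tuple would.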
Antichain algorithm for k-bounded language inclusion modulo I. This
algorithm is essentially the same as the standard antichain algorithm, with the
automaton B above replaced by an automaton Bk,I accepting Clok,I(L(B)). The
set QBk,I of states of Bk,I consists of triples (sB , η1, η2), where sB ∈ QB and η1, η2
are words over Σ of length at most k. Intuitively, the words η1 and η2 store symbols that
are expected to be matched later along a run. The set of initial states of Bk,I is
{(sB ,∅,∅) ∶ sB ∈ Qι,B}. The set of final states of Bk,I is {(sB ,∅,∅) ∶ sB ∈ FB}.
The transition relation ∆Bk,I is constructed by repeatedly applying the following
rules, in order, for each state (sB , η1, η2) and each symbol α. In what follows,
η[∖i] denotes the word obtained from η by removing its ith symbol.
1. Pick new s′B and β ∈ Σ such that (sB , β, s′B) ∈ ∆B.
2. (a) If ∀i: η1[i] ≠ α and α is independent of all symbols in η1,
η′2 ∶= η2 ⋅ α and η′1 ∶= η1; (b) else, if ∃i: η1[i] = α and α is independent of all
symbols in η1 prior to i, η′1 ∶= η1[∖i] and η′2 ∶= η2; (c) else, go to 1.
3. (a) If ∀i: η′2[i] ≠ β and β is independent of all symbols in η′2,
η′1 ∶= η′1 ⋅ β; (b) else, if ∃i: η′2[i] = β and β is independent of all symbols in η′2
prior to i, η′2 ∶= η′2[∖i]; (c) else, go to 1.
4. Add ((sB , η1, η2), α, (s′B , η′1, η′2)) to ∆Bk,I and go to 1.
Example 1. In Fig. 5, we have an NFA B with L(B) = {αβ,β}, I = {(α,β)} and
k = 1. The states of Bk,I are triples (q, η1, η2), where q ∈ QB and η1, η2 ∈ {∅, α, β}.
We explain the derivation of a couple of transitions of Bk,I . The transition shown
in bold from (q0,∅,∅) on symbol β is obtained by applying the following rules
once: 1. Pick q1 since (q0, α, q1) ∈ ∆B . 2(a). η′2 ∶= β, η′1 ∶= ∅. 3(a). η′1 ∶= α. 4. Add
((q0,∅,∅), β, (q1, α, β)) to ∆Bk,I . The transition shown in bold from (q1, α, β)
on symbol α is obtained as follows: 1. Pick q2 since (q1, β, q2) ∈ ∆B . 2(b). η′1 ∶= ∅,
η′2 ∶= β. 3(b). η′2 ∶= ∅. 4. Add ((q1, α, β), α, (q2,∅,∅)) to ∆Bk,I . It can be seen
that Bk,I accepts the language {αβ,βα,β} = Clok,I(L(B)).
[Figure: the NFA B with states q0 (start), q1 and q2 and edge labels α, β, β, and the automaton B1,{(α,β)}, whose states include (q0,∅,∅) (start), (q1,∅,∅), (q2, β, α), (q2,∅,∅), (q1, α, β) and (q2, α, β).]
Fig. 5: Example for illustrating construction of Bk,I for k = 1 and I = {(α,β)}.
Proposition 2. Given k > 0, NFA Bk,I described above accepts Clok,I(L(B)).
We develop a procedure to check language inclusion modulo I by iteratively
increasing the bound k (see Appendix C for the complete algorithm). The procedure
is incremental: the check for (k+1)-bounded language inclusion modulo I only
explores paths along which the bound k was exceeded in the previous iteration.
5 Synchronization Synthesis
We now present our iterative synchronization synthesis procedure, which is based
on the procedure in [11]. The reader is referred to [11] for further details. The
synthesis procedure starts with the original program C and in each iteration
generates a candidate synthesized program C′. The candidate C′ is checked for
preemption-safety w.r.t. C under the abstract semantics, using our procedure for
bounded language inclusion modulo I. If C′ is found preemption-safe w.r.t. C
under the abstract semantics, the synthesis procedure outputs C′. Otherwise, an
abstract counterexample cex is obtained. The counterexample is analyzed to in-
fer additional synchronization to be added to C′ for generating a new synthesized
candidate.
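The overall loop can be sketched as follows. This is only an outline of the control flow described above: the callbacks check_inclusion, infer_fixes and apply_fix are hypothetical placeholders for the language inclusion check, the counterexample analysis and the insertion of synchronization, and the iteration budget is an assumption added to keep the sketch terminating.

```python
def synthesize(program, check_inclusion, infer_fixes, apply_fix, max_iters=100):
    """Iterate: check the candidate, analyze the counterexample, add a fix."""
    candidate = program
    for _ in range(max_iters):
        # None means the candidate is preemption-safe w.r.t. the original.
        cex = check_inclusion(candidate, program)
        if cex is None:
            return candidate
        fixes = infer_fixes(cex)
        if not fixes:
            raise RuntimeError("no inference rule matches the counterexample")
        candidate = apply_fix(candidate, fixes[0])
    raise RuntimeError("iteration budget exhausted")
```

Choosing fixes[0] is an arbitrary simplification; a real tool may rank or combine the inferred fixes.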
The counterexample trace cex is a sequence of event identifiers:
tid0.l0; . . . ; tidn.ln, where each li is a location identifier. We first analyze the
neighborhood of cex, denoted nhood(cex), consisting of traces that are permu-
tations of the events in cex. Note that each trace corresponds to an abstract
observation sequence. Furthermore, note that preemption-safety requires the
abstract observation sequence of any trace in nhood(cex) to be equivalent to
that of some trace in nhood(cex) feasible under non-preemptive semantics. Let
bad traces refer to the traces in nhood(cex) that are feasible under preemptive
semantics and do not meet the preemption-safety requirement. The goal of our
counterexample analysis is to characterize all bad traces in nhood(cex) in order
to enable inference of synchronization fixes.
In order to succinctly represent subsets of nhood(cex), we use ordering con-
straints. Intuitively, ordering constraints are of the following forms: (a) atomic
constraints Φ = A < B where A and B are events from cex. The constraint A < B
represents the set of traces in nhood(cex) where event A is scheduled before
event B; (b) Boolean combinations of atomic constraints Φ1 ∧ Φ2, Φ1 ∨ Φ2 and ¬Φ1.
We have that Φ1 ∧ Φ2 and Φ1 ∨ Φ2 respectively represent the intersection
and union of the set of traces represented by Φ1 and Φ2, and that ¬Φ1 represents
the complement (with respect to nhood(cex)) of the traces represented by Φ1.
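The set semantics of ordering constraints can be made concrete with a small sketch. It enumerates nhood explicitly as all permutations of the events, which is only feasible for tiny examples and ignores feasibility of the traces; the function names are hypothetical.

```python
from itertools import permutations


def nhood(events):
    """All permutations of the events, as a set of tuples."""
    return set(permutations(events))


def before(a, b, traces):
    """Atomic constraint a < b: traces in which a is scheduled before b."""
    return {t for t in traces if t.index(a) < t.index(b)}


def conj(s1, s2):
    return s1 & s2          # intersection


def disj(s1, s2):
    return s1 | s2          # union


def neg(s, traces):
    return traces - s       # complement with respect to nhood
```

For three events A, B, C, the constraint A < B ∧ B < C denotes exactly the single trace (A, B, C), and ¬(A < B) denotes the same set as B < A.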
Non-preemptive neighborhood. First, we generate all traces in nhood(cex)
that are feasible under non-preemptive semantics. We represent a single trace
pi using an ordering constraint Φpi that captures the ordering between non-
independent accesses to variables in pi. We represent all traces in nhood(cex)
that are feasible under non-preemptive semantics using the expression Φ = ⋁pi Φpi.
The expression Φ acts as the correctness specification for traces in nhood(cex).
Example. Recall the counterexample trace from the running example
in Sec. 2: cex = T1.A;T2.A;T1.B;T1.C;T1.D;T2.B;T2.C;T2.D.
There are two traces in nhood(cex) that are feasible under
non-preemptive semantics: pi1 = T1.A;T1.B;T1.C;T1.D;T2.A;T2.B;T2.C;T2.D
and pi2 = T2.A;T2.B;T2.C;T2.D;T1.A;T1.B;T1.C;T1.D. We represent pi1 as
Φ(pi1) = {T1.A,T1.C,T1.D} < T2.D ∧ T1.D < {T2.A,T2.C,T2.D} ∧ T1.B < T2.B and
pi2 as Φ(pi2) = T2.D < {T1.A,T1.C,T1.D} ∧ {T2.A,T2.C,T2.D} < T1.D ∧ T2.B < T1.B.
The correctness specification is Φ = Φ(pi1) ∨ Φ(pi2).
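The two non-preemptive traces of the example can be reproduced with a short sketch. It assumes that the traced part of each thread contains no yield or blocking statement, so that a thread runs to completion once it is scheduled; the function name and the dictionary encoding of threads are hypothetical.

```python
from itertools import permutations


def nonpreemptive_traces(threads):
    """threads maps a thread id to its sequence of location labels.

    Assumes no yields or blocking statements: each scheduled thread runs
    to completion, so a trace is fixed by the order of the threads.
    """
    traces = []
    for order in permutations(sorted(threads)):
        traces.append([f"{tid}.{loc}" for tid in order for loc in threads[tid]])
    return traces


running_example = {"T1": "ABCD", "T2": "ABCD"}
pi1, pi2 = (";".join(t) for t in nonpreemptive_traces(running_example))
```

Here pi1 and pi2 are the two traces listed in the example above.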
Counterexample generalization. We next build a quantifier-free first order
formula Ψ over the event identifiers in cex such that any model of Ψ corresponds
to a bad trace in nhood(cex). We iteratively enumerate models pi of Ψ , building a
constraint ρ = Φ(pi) for each model pi, and generalizing each ρ into ρg to represent
a larger set of bad traces.
Example. Our trace cex from Sec. 2 would be generalized to T2.A < T1.D ∧ T1.D <
T2.D. Any trace that fulfills this constraint is bad.
Inferring fixes. From each generalized formula ρg described above, we infer
possible synchronization fixes to eliminate all bad traces satisfying ρg. The key
observation we exploit is that common concurrency bugs often show up in our
formulas as simple patterns of ordering constraints between events. For example,
the pattern tid1.l1 < tid2.l2 ∧ tid2.l′2 < tid1.l′1 indicates an atomicity violation
and can be rewritten into lock(tid1.[l1 ∶ l′1], tid2.[l2 ∶ l′2]). The complete list of
such rewrite rules is in Appendix D. This list includes inference of locks and
reordering of notify statements. The set of patterns we use for synchronization
inference is not complete, i.e., there might be generalized formulae ρg that are
not matched by any pattern. In practice, we found our current set of patterns
to be adequate for most common concurrency bugs, including all bugs from the
benchmarks in this paper. Our technique and tool can be easily extended with
new patterns.
Example. The generalized constraint T2.A < T1.D ∧ T1.D < T2.D matches the lock
rule and yields lock(T2.[A ∶ D],T1.[D ∶ D]). Since the lock involves events in the
same function, the lock is merged into a single lock around instructions A and D in
open_dev_abs. This lock is not sufficient to make the program preemption-safe.
Another iteration of the synthesis procedure generates another counterexample
for analysis and synchronization inference.
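Matching the lock pattern against a conjunction of atomic constraints can be sketched as below. The encoding is an assumption made for illustration: an atomic constraint is a pair ((tid, loc), (tid, loc)) meaning that the first event precedes the second, and location labels are assumed to compare in program order. The function names are hypothetical, and the rest of the formula (ψ) is ignored.

```python
from itertools import permutations


def infer_locks(atoms):
    """Find pairs tid1.l1 < tid2.l2 and tid2.l2' < tid1.l1' in `atoms`.

    Returns lock suggestions ((tid1, l1, l1'), (tid2, l2, l2')).
    """
    locks = []
    for (a, b), (c, d) in permutations(atoms, 2):
        (tid1, l1), (tid2, l2) = a, b
        (tid2b, l2p), (tid1b, l1p) = c, d
        same_threads = tid1 != tid2 and tid2b == tid2 and tid1b == tid1
        # Assumption: labels sort in program order, so ranges are well formed.
        if same_threads and l1 <= l1p and l2 <= l2p:
            locks.append(((tid1, l1, l1p), (tid2, l2, l2p)))
    return locks


def show(lock):
    (t1, a, b), (t2, c, d) = lock
    return f"lock({t1}.[{a}:{b}], {t2}.[{c}:{d}])"


rho_g = [(("T2", "A"), ("T1", "D")), (("T1", "D"), ("T2", "D"))]
suggestions = [show(l) for l in infer_locks(rho_g)]
```

For the generalized constraint of the example, suggestions contains the single entry lock(T2.[A:D], T1.[D:D]).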
Proposition 3. If our synthesis procedure generates a program C′, then C′ is
preemption-safe with respect to C.
Note that our procedure does not guarantee that the synthesized program C′
is deadlock-free. However, we avoid obvious deadlocks using heuristics such as
merging overlapping locks. Further, our tool supports detection of any additional
deadlocks introduced by synthesis, but relies on the user to fix them.
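One plausible reading of the lock-merging heuristic is an interval merge over instruction ranges. The sketch below is an assumption about how such a heuristic could look, not a description of what Liss does: locks are given as (first, last) instruction indices within a single function, and overlapping ranges are replaced by one range protected by a single lock, so that no thread has to acquire two locks in the overlap.

```python
def merge_locks(ranges):
    """Merge overlapping (first, last) instruction ranges into single locks."""
    merged = []
    for first, last in sorted(ranges):
        if merged and first <= merged[-1][1]:
            # Overlap with the previous range: extend it instead of nesting.
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged
```

For example, merge_locks([(0, 3), (2, 5), (7, 8)]) keeps (7, 8) and replaces the first two ranges by (0, 5).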
6 Implementation and Evaluation
We implemented our synthesis procedure in Liss. Liss consists of 5000 lines
of C++ code and uses Clang/LLVM and Z3 as libraries. It is available as
open-source software along with benchmarks at https://github.com/thorstent/Liss.
The language inclusion algorithm is available separately as a library called
Limi (https://github.com/thorstent/Limi). Liss implements the synthesis
method presented in this paper with several optimizations. For example, we take
advantage of the fact that language inclusion violations can often be detected by
exploring only a small fraction of the input automata by constructing A([[C]]NPabs)
and A([[C]]Pabs) on the fly.
Our prototype implementation has several limitations. First, Liss uses func-
tion inlining and therefore cannot handle recursive programs. Second, we do not
implement any form of alias analysis, which can lead to unsound abstractions.
For example, we abstract statements of the form “*x = 0” as writes to variable
x, while in reality other variables can be affected due to pointer aliasing. We
sidestep this issue by manually massaging input programs to eliminate aliasing.
Finally, Liss implements a simplistic lock insertion strategy. Inference rules
(see Sec. 5) produce locks expressed as sets of instructions that should be inside a
lock. Placing the actual lock and unlock instructions in the C code is challenging
because the instructions in the trace may span several basic blocks or even
functions. We follow a structural approach where we find the innermost common
parent block for the first and last instructions of the lock and place the lock and
unlock instruction there. This does not work if the code has gotos or returns
that could cause control to jump over the unlock statement. At the moment, we
simply report such situations to the user.
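The structural placement step amounts to finding the lowest common ancestor of two blocks in the block tree. The following sketch assumes a hypothetical encoding in which parent maps each block to its enclosing block; it does not model the goto and return cases mentioned above.

```python
def ancestors(block, parent):
    """The block itself followed by its enclosing blocks, innermost first."""
    chain = [block]
    while block in parent:
        block = parent[block]
        chain.append(block)
    return chain


def innermost_common_block(first, last, parent):
    """Innermost block enclosing both the first and the last instruction."""
    enclosing_first = set(ancestors(first, parent))
    for block in ancestors(last, parent):
        if block in enclosing_first:
            return block
    return None
```

With parent = {"if_body": "loop_body", "else_body": "loop_body", "loop_body": "func"}, a lock whose first instruction is in if_body and whose last instruction is in else_body would be placed in loop_body.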
We evaluate our synthesis method against the following criteria: (1) Effec-
tiveness of synthesis from implicit specifications; (2) Efficiency of the proposed
synthesis procedure; (3) Precision of the proposed coarse abstraction scheme on
real-world programs.
Implicit vs. explicit synthesis. In order to evaluate the effectiveness of synthesis
from implicit specifications, we apply Liss to the set of benchmarks used
in our previous ConRepair tool for assertion-based synthesis [5]. In addition,
we evaluate Liss and ConRepair on several new assertion-based benchmarks
(Table 1). The set includes microbenchmarks modeling typical concurrency bug
patterns in Linux drivers and the usb-serial macrobenchmark, which mod-
els a complete synchronization skeleton of the USB-to-serial adapter driver. We
preprocess these benchmarks by eliminating assertions used as explicit specifi-
cations for synthesis. In addition, we replace statements of the form assume(v)
with await(v), redeclaring all variables v used in such statements as condi-
tion variables. This is necessary as our program syntax does not include assume
statements.
We use Liss to synthesize a preemption-safe version of each benchmark.
This method is based on the assumption that the benchmark is correct under
non-preemptive scheduling and bugs can only arise due to preemptive scheduling.
We discovered two benchmarks (lc-rc.c and myri10ge.c) that violated
this assumption, i.e., they contained race conditions that manifested under
non-preemptive scheduling; Liss did not detect these race conditions. Liss was able
to detect and fix all other known races without relying on assertions. Furthermore,
Liss detected a new race in the usb-serial family of benchmarks, which
was not detected by ConRepair due to a missing assertion. We compared the
output of Liss with manually placed synchronization (taken from real bug fixes)
and found that the two versions were similar in most of our examples.

Name                 LOC   Th  It  MB  BF(s)   Syn(s)   Ver(s)   CR(s)
ConRepair benchmarks [5]
ex1.c                18    2   1   1   <1s     <1s      <1s      <1s
ex2.c                23    2   1   1   <1s     <1s      <1s      <1s
ex3.c                37    2   1   1   <1s     <1s      <1s      <1s
ex5.c                42    2   3   1   <1s     <1s      2s       <1s
lc-rc.c              35    4   0   1   -       -        <1s      9s
dv1394.c             37    2   1   1   <1s     <1s      <1s      17s
em28xx.c             20    2   1   1   <1s     <1s      <1s      <1s
f_acm.c              80    3   1   1   <1s     <1s      <1s      1871.99s
i915_irq.c           17    2   1   1   <1s     <1s      <1s      2.6s
ipath.c              23    2   1   1   <1s     <1s      <1s      12s
iwl3945.c            26    3   1   1   <1s     <1s      <1s      5s
md.c                 35    2   1   1   <1s     <1s      <1s      1.5s
myri10ge.c           60    4   0   3   -       -        <1s      1.5s
usb-serial.bug1.c    357   7   2   1   0.4s    3.1s     3.4s     ∞ (b)
usb-serial.bug2.c    355   7   1   3   0.7s    2.1s     12.9s    3563s
usb-serial.bug3.c    352   7   1   4   3.8s    1.3s     111.1s   ∞ (b)
usb-serial.bug4.c    351   7   1   4   93.9s   2.4s     123.1s   ∞ (b)
usb-serial.c (a)     357   7   0   4   -       -        103.2s   1200s
CPMAC driver benchmark
cpmac.bug1.c         1275  5   1   1   1.3s    113.4s   21.9s    -
cpmac.bug2.c         1275  5   1   1   3.3s    68.4s    27.8s    -
cpmac.bug3.c         1270  5   1   1   5.4s    111.3s   8.7s     -
cpmac.bug4.c         1276  5   2   1   2.4s    124.8s   31.5s    -
cpmac.bug5.c         1275  5   1   1   2.8s    112.0s   58.0s    -
cpmac.c (a)          1276  5   0   1   -       -        17.4s    -
Th=Threads, It=Iterations, MB=Max bound, BF=Bug finding, Syn=Synthesis,
Ver=Verification, CR=ConRepair; (a) bug-free example; (b) timeout after 3 hours
Table 1: Experiments
Performance and precision. ConRepair uses CBMC for verification and
counterexample generation. Due to the coarse abstraction we use, both steps are
much cheaper with Liss. For example, verification of usb-serial.c, which was
the most complex in our set of benchmarks, took Liss 103 seconds, whereas it
took ConRepair 20 minutes [5].
The loss of precision due to abstraction may cause the inclusion check to
return a counterexample that is spurious in the concrete program, leading to
unnecessary synchronization being synthesized. On our existing benchmarks,
this only occurred once in the usb-serial driver, where abstracting away the
return value of a function led to an infeasible trace. We refined the abstraction
manually by introducing a condition variable to model the return value.
While this result is encouraging, synthetic benchmarks are not necessarily
representative of real-world performance. We therefore implemented another set
of benchmarks based on a complete Linux driver for the TI AR7 CPMAC Ether-
net controller. The benchmark was constructed as follows. We manually prepro-
cessed driver source code to eliminate pointer aliasing. We combined the driver
with a model of the OS API and the software interface of the device written in
C. We modeled most OS API functions as writes to a special memory location.
Groups of unrelated functions were modeled using separate locations. Slightly
more complex models were required for API functions that affect thread
synchronization. For example, the free_irq function, which disables the driver’s
interrupt handler, blocks waiting for any outstanding interrupts to finish. Drivers can
rely on this behavior to avoid races. We introduced a condition variable to model
this synchronization. Similarly, most device accesses were modeled as writes to
a special ioval variable. Thus, the only part of the device that required a more
accurate model was its interrupt enabling logic, which affects the behavior of
the driver’s interrupt handler thread.
Our original model consisted of eight threads. Liss ran out of memory on
this model, so we simplified it to five threads by eliminating parts of driver
functionality. Nevertheless, we believe that the resulting model represents the
most complex synchronization synthesis case study, based on real-world code,
reported in the literature.
The CPMAC driver used in this case study did not contain any known con-
currency bugs, so we artificially simulated five typical race conditions that com-
monly occur in drivers of this type [4]. Liss was able to detect and automatically
fix each of these defects (bottom part of Table 1). We only encountered two pro-
gram locations where manual abstraction refinement was necessary.
We conclude that (1) our coarse abstraction is highly precise in practice;
(2) manual effort involved in synchronization synthesis can be further reduced
via automatic abstraction refinement; (3) additional work is required to improve
the performance of our method to be able to handle real-world systems without
simplification. In particular, our analysis indicates that significant speed-up can
be obtained by incorporating a partial order reduction scheme into the language
inclusion algorithm.
7 Conclusion
We believe our approach and the encouraging experimental results open several
directions for future research. Combining the abstraction refinement, verification
(checking language inclusion modulo an independence relation), and synthesis
(inserting synchronization) more tightly could bring improvements in efficiency.
An additional direction we plan on exploring is automated handling of deadlocks,
i.e., extending our technique to automatically synthesize deadlock-free programs.
Finally, we plan to further develop our prototype tool and apply it to other
domains of concurrent systems code.
References
1. Alglave, J., Kroening, D., Nimal, V., Poetzl, D.: Don’t sit on the fence - A static
analysis approach to automatic fence insertion. In: CAV. pp. 508–524 (2014)
2. Bertoni, A., Mauri, G., Sabadini, N.: Equivalence and membership problems for
regular trace languages. In: Automata, Languages and Programming, pp. 61–71.
Springer (1982)
3. Bloem, R., Hofferek, G., Könighofer, B., Könighofer, R., Außerlechner, S., Spörk,
R.: Synthesis of synchronization using uninterpreted functions. In: FMCAD. pp.
35–42 (2014)
4. Černý, P., Henzinger, T., Radhakrishna, A., Ryzhyk, L., Tarrach, T.: Efficient
synthesis for concurrency by semantics-preserving transformations. In: CAV. pp.
951–967 (2013)
5. Černý, P., Henzinger, T., Radhakrishna, A., Ryzhyk, L., Tarrach, T.:
Regression-free synthesis for concurrency. In: CAV. pp. 568–584 (2014),
https://github.com/thorstent/ConRepair
6. Cherem, S., Chilimbi, T., Gulwani, S.: Inferring locks for atomic sections. In: PLDI.
pp. 304–315 (2008)
7. Clarke, E., Kroening, D., Lerda, F.: A tool for checking ANSI-C programs. In:
TACAS. pp. 168–176 (2004), http://www.cprover.org/cbmc/
8. Clarke, E.M., Emerson, E.A.: Design and synthesis of synchronization skeletons
using branching time temporal logic. Springer (1982)
9. De Wulf, M., Doyen, L., Henzinger, T.A., Raskin, J.F.: Antichains: A new algo-
rithm for checking universality of finite automata. In: CAV. pp. 17–30. Springer
(2006)
10. Deshmukh, J., Ramalingam, G., Ranganath, V., Vaswani, K.: Logical Concurrency
Control from Sequential Proofs. In: Programming Languages and Systems, pp.
226–245 (2010)
11. Gupta, A., Henzinger, T., Radhakrishna, A., Samanta, R., Tarrach, T.: Succinct
representation of concurrent trace sets. In: POPL. pp. 433–444 (2015)
12. Jin, G., Zhang, W., Deng, D., Liblit, B., Lu, S.: Automated Concurrency-Bug
Fixing. In: OSDI, pp. 221–236 (2012)
13. Ryzhyk, L., Chubb, P., Kuz, I., Heiser, G.: Dingo: Taming device drivers. In:
EuroSys (Apr 2009)
14. Sadowski, C., Yi, J.: User evaluation of correctness conditions: A case study of
cooperability. In: PLATEAU. pp. 2:1–2:6 (2010)
15. Solar-Lezama, A., Jones, C., Bodík, R.: Sketching concurrent data structures. In:
PLDI. pp. 136–148 (2008)
16. Vechev, M., Yahav, E., Yorsh, G.: Abstraction-guided synthesis of synchronization.
In: POPL. pp. 327–338 (2010)
17. Vechev, M.T., Yahav, E., Raman, R., Sarkar, V.: Automatic verification of deter-
minism for structured parallel programs. In: SAS. pp. 455–471 (2010)
A Semantics of preemptive and non-preemptive
execution
In Fig. 6 we present the non-preemptive semantics. The preemptive semantics
consist of the rules of the non-preemptive semantics and the single rule in Fig. 7.
We denote the state of a program as ⟨V, ctid, (P1, . . . ,Pn)⟩ where (a) Valu-
ation V is a valuation of all program variables. Further, for each lock l, we have
that V[l] holds the identifier of the thread that currently holds the lock, or 0
if no thread holds the lock. Similarly, for a condition variable c, we have that
V[c] = 0 if the variable is reset and V[c] = 1 otherwise. (b) The value ctid is the
thread identifier of the current executing thread or 0 in the initial state, and
(c) Program fragments P1 to Pn are the parts of the program to be executed by
T1 to Tn, respectively.
The premise in rule Sequential refers to the single-threaded semantics in
Fig. 3 or the abstract single-threaded semantics in Fig. 4. Rules LockYield and
AwaitYield force a context-switch iff the lock is not available or the condition
variable is not set.
ScheduleStart:
    ctid = 0    1 ≤ ctid′ ≤ n
    ──────────────────────────────────────────────────────
    ⟨V, ctid, (P1, ..., Pn)⟩ → ⟨V, ctid′, (P1, ..., Pn)⟩

Sequential:
    ctid = i    ⟨V, Pi⟩ ─α→ ⟨V′, P′i⟩
    ──────────────────────────────────────────────────────
    ⟨V, ctid, (P1, ..., Pi, ..., Pn)⟩ ─α→ ⟨V′, ctid, (P1, ..., P′i, ..., Pn)⟩

LockYield:
    ctid = i    V(l) ∉ {0, i}    1 ≤ ctid′ ≤ n
    ──────────────────────────────────────────────────────
    ⟨V, ctid, (P1, ..., lock(l), ..., Pn)⟩ → ⟨V, ctid′, (P1, ..., lock(l), ..., Pn)⟩

Lock:
    ctid = i    V(l) ∈ {0, i}
    ──────────────────────────────────────────────────────
    ⟨V, ctid, (P1, ..., lock(l), ..., Pn)⟩ → ⟨V[l := i], ctid, (P1, ..., skip, ..., Pn)⟩

Unlock:
    ctid = i    V(l) = ctid
    ──────────────────────────────────────────────────────
    ⟨V, ctid, (P1, ..., unlock(l), ..., Pn)⟩ → ⟨V[l := 0], ctid, (P1, ..., skip, ..., Pn)⟩

AwaitYield:
    ctid = i    V(c) = false    1 ≤ ctid′ ≤ n
    ──────────────────────────────────────────────────────
    ⟨V, ctid, (P1, ..., await(c), ..., Pn)⟩ → ⟨V, ctid′, (P1, ..., await(c), ..., Pn)⟩

Await:
    ctid = i    V(c) = true
    ──────────────────────────────────────────────────────
    ⟨V, ctid, (P1, ..., await(c), ..., Pn)⟩ → ⟨V, ctid, (P1, ..., skip, ..., Pn)⟩

Signal:
    ctid = i
    ──────────────────────────────────────────────────────
    ⟨V, ctid, (P1, ..., signal(c), ..., Pn)⟩ → ⟨V[c := true], ctid, (P1, ..., skip, ..., Pn)⟩

Reset:
    ctid = i
    ──────────────────────────────────────────────────────
    ⟨V, ctid, (P1, ..., reset(c), ..., Pn)⟩ → ⟨V[c := false], ctid, (P1, ..., skip, ..., Pn)⟩

Sequence:
    s1 ≠ skip    ⟨V, ctid, (P1, ..., s1, ..., Pn)⟩ ─α→ ⟨V, ctid, (P1, ..., s′1, ..., Pn)⟩
    ──────────────────────────────────────────────────────
    ⟨V, ctid, (P1, ..., s1; s2, ..., Pn)⟩ ─α→ ⟨V, ctid, (P1, ..., s′1; s2, ..., Pn)⟩

DescheduleSkip:
    ctid = i    1 ≤ ctid′ ≤ n    Pi = skip
    ──────────────────────────────────────────────────────
    ⟨V, ctid, (P1, ..., Pi, ..., Pn)⟩ → ⟨V, ctid′, (P1, ..., Pi, ..., Pn)⟩

Yield:
    ctid = i    1 ≤ ctid′ ≤ n
    ──────────────────────────────────────────────────────
    ⟨V, ctid, (P1, ..., yield, ..., Pn)⟩ → ⟨V, ctid′, (P1, ..., skip, ..., Pn)⟩

Fig. 6: Operational non-preemptive semantics
DeschedulePreempt:
    1 ≤ ctid′ ≤ n
    ──────────────────────────────────────────────────────
    ⟨V, ctid, (P1, ..., Pn)⟩ → ⟨V, ctid′, (P1, ..., Pn)⟩

Fig. 7: From non-preemptive semantics to preemptive semantics
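The difference between the two semantics can be illustrated by enumerating the traces of straight-line threads. The sketch below is a strong simplification, made only for illustration: threads are plain lists of event labels without locks, condition variables or yield statements, so the only scheduling rules modeled are ScheduleStart, DescheduleSkip and, in the preemptive case, DeschedulePreempt. The function name is hypothetical.

```python
def traces(threads, preemptive):
    """Enumerate complete event sequences of straight-line threads."""
    n = len(threads)
    results = set()

    def run(current, pcs, trace):
        if all(pcs[i] == len(threads[i]) for i in range(n)):
            results.add(tuple(trace))
            return
        # A switch is always allowed at the start, after the current thread
        # has finished, or at any point under preemptive scheduling.
        switch_allowed = (preemptive or current is None
                          or pcs[current] == len(threads[current]))
        for i in range(n):
            if pcs[i] == len(threads[i]):
                continue  # thread i has finished
            if i != current and not switch_allowed:
                continue  # non-preemptive: the running thread keeps going
            nxt = list(pcs)
            nxt[i] += 1
            run(i, tuple(nxt), trace + [threads[i][pcs[i]]])

    run(None, tuple(0 for _ in threads), [])
    return results
```

For the threads [["a1", "a2"], ["b1"]], the non-preemptive semantics yields the two traces in which each thread runs to completion, while the preemptive semantics additionally yields (a1, b1, a2).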
B Proof of Thm. 1
Theorem 1. Given concurrent program C and a synthesized program C′ ob-
tained by adding synchronization to C, [[C′]]Pabs ⊆ [[C]]NPabs ⇒ [[C′]]P ⊆ [[C]]NP .
Proof. Let us assume [[C′]]Pabs ⊆ [[C]]NPabs .
Let σ′ be a concrete observation sequence in [[C′]]P . Let σ′abs be the abstract
observation sequence in [[C′]]Pabs corresponding to σ′. Then, there exists
σabs ∈ [[C]]NPabs such that σabs is equivalent to σ′abs.
Observe that if two abstract observation sequences — σ′abs from [[C′]]Pabs and
σabs from [[C]]NPabs — are equivalent, then they correspond to executions over
the same observable control-flow paths with the same data-flow into havoc and
input/output statements. Hence, σ′abs and σabs either both map back to infeasible
concrete observation sequences, or both map back to feasible, equivalent concrete
observation sequences.
Since σ′abs maps back to a feasible concrete observation sequence σ′ by def-
inition, σabs also maps back to a feasible concrete observation sequence, say σ,
such that σ is equivalent to σ′. Hence, we have [[C′]]P ⊆ [[C]]NP . ⊓⊔
C Language Inclusion Procedure
The algorithm for k-bounded language inclusion modulo I is presented as function
Inclusion in Algo. 1 (ignore Lines 22-25 for now). The function proceeds
exactly as the standard antichain algorithm outlined earlier. It explores A non-
deterministically as before, and Bk,I is determinized on the fly for exploration.
The antichain and frontier sets consist of tuples of the form (sA, SBk,I ), where
sA ∈ QA and SBk,I ⊆ QB ×Σk ×Σk. Each tuple in the frontier set is first checked
for equivalence w.r.t. acceptance (Line 18). If this check fails, the function re-
ports language inclusion failure (Line 18). If this check succeeds, the successors
are computed (Line 20). If a successor satisfies Rule 1, it is ignored (Line 21),
otherwise it is added to the frontier (Line 26) and the antichain (Line 27). During
the update of the antichain the algorithm ensures that its invariant is preserved
according to rule 2. The frontier also stores a sequence of symbols that lead to a
particular tuple of states in order to return a counterexample trace if language
inclusion fails.
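For reference, the standard antichain-based check for plain NFA inclusion (without the closure modulo I and without the bound k) can be sketched as follows. The encoding is hypothetical: an automaton is a dictionary with keys init and final (sets of states) and delta (a map from (state, symbol) to a set of successor states). The function returns a counterexample word, or None if inclusion holds.

```python
from collections import deque


def nfa_included(A, B, alphabet):
    """Check L(A) ⊆ L(B); return a counterexample word or None."""

    def post(states, sym):
        out = set()
        for s in states:
            out |= B["delta"].get((s, sym), set())
        return frozenset(out)

    b0 = frozenset(B["init"])
    frontier = deque((a, b0, ()) for a in A["init"])
    antichain = [(a, b0) for a in A["init"]]
    while frontier:
        a, S, word = frontier.popleft()
        # A accepts the word but no state of B reached on it is accepting.
        if a in A["final"] and not (S & B["final"]):
            return word
        for sym in alphabet:
            S2 = post(S, sym)
            for a2 in A["delta"].get((a, sym), set()):
                # Rule 1: skip successors subsumed by an antichain element.
                if any(p == a2 and T <= S2 for p, T in antichain):
                    continue
                # Rule 2: drop antichain elements subsumed by the successor.
                antichain = [(p, T) for p, T in antichain
                             if not (p == a2 and S2 <= T)]
                antichain.append((a2, S2))
                frontier.append((a2, S2, word + (sym,)))
    return None
```

A pair (a, T) subsumes (a, S) when T ⊆ S, since any word rejected from S is also rejected from the smaller set T; this is why exploring (a, S) is redundant once (a, T) has been explored.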
We develop a procedure to check language inclusion modulo I by iteratively
increasing the bound k (see Algo. 1). The procedure is incremental: the check
for (k+1)-bounded language inclusion modulo I only explores paths
along which the bound k was exceeded in the previous iteration. Given a newly
computed successor (s′A, S′Bk,I) for an iteration with bound k, if there exists
some (sB, η1, η2) in S′Bk,I such that the length of η1 or η2 exceeds k (Line 22),
we remember the tuple (s′A, S′Bk,I) in the set overflow (Line 23). We continue
exploration of Bk,I from all states (sB, η1, η2) with ∣η1∣ ≤ k ∧ ∣η2∣ ≤ k, but mark
them dirty. If we find a counterexample to language inclusion, we return it and
test whether it is spurious (Line 8). A counterexample may be spurious because
we removed states exceeding k. In that case, we increase the bound to k+1,
remove all dirty items from the antichain and frontier (Lines 10-11), and add the
items from overflow, which is then reset (Line 12). Intuitively, this undoes all
exploration from the point(s) at which the bound was exceeded and restarts from
those point(s).
To test if a particular counterexample is spurious, we invoke the language
inclusion procedure, replacing the preemptive automaton with the exact trace
(trace automaton) and allowing an infinite bound. This is fast and guaranteed
to terminate as the trace automaton does not have loops. We found that this
optimization helps find a valid counterexample faster.
Algorithm 1 Checking language inclusion modulo I
Require: Automata A = (QA, ΣA, ∆A, IA, FA) and B = (QB, ΣB, ∆B, IB, FB)
Ensure: true only if L(A) ⊆ CloI(L(B)), false only if L(A) ⊈ CloI(L(B))
 1: frontier ← {(sA, {(IB, ∅, ∅)}, ∅) : sA ∈ IA}
 2: All tuples in frontier are not dirty
 3: antichain ← frontier
 4: overflow ← ∅
 5: k ← 2
 6: while true do
 7:     cex ← inclusion(k)
 8:     if cex ≠ true ∧ cex is spurious then
 9:         k ← k + 1
10:         frontier ← {(sA, SBk,I) ∈ frontier : SBk,I not dirty} ∪ overflow
11:         antichain ← {(sA, SBk,I) ∈ antichain : SBk,I not dirty} ∪ overflow
12:         overflow ← ∅
13:     else
14:         return cex
15: function inclusion(k)
16:     while frontier ≠ ∅ do
17:         remove a tuple (sA, SBk,I, cex) from frontier
18:         if sA ∈ FA ∧ (SBk,I ∩ FB) = ∅ then return cex
19:         for all α ∈ Σ do
20:             (s′A, S′Bk,I) ← succα(sA, SBk,I)
21:             if ∄p ∈ antichain : p ⊑ (s′A, S′Bk,I) then              ▷ Rule 1
22:                 if ∃(sB, η1, η2) ∈ S′Bk,I : ∣η1∣ > k ∨ ∣η2∣ > k then
23:                     if S′Bk,I not dirty then overflow ← overflow ∪ {(s′A, S′Bk,I)}
24:                     S′Bk,I ← {(sB, η1, η2) ∈ S′Bk,I : ∣η1∣ ≤ k ∧ ∣η2∣ ≤ k}
25:                     Mark S′Bk,I dirty
26:                 frontier ← frontier ∪ {(s′A, S′Bk,I, cex ⋅ α)}
27:                 antichain ← antichain ∖ {p : S′Bk,I ⊑ p} ∪ {(s′A, S′Bk,I)}   ▷ Rule 2
28:     return true
D Synchronization inference rules
The inference rules are applied as rewrite rules to the formula ρg obtained in
Sec. 5. Each rule requires a certain subexpression in ρg and rewrites it to a
synchronization primitive. That means that a single ρg could possibly be solved
by one of several synchronization primitives.
The two lock rules fix atomicity violations and the reorder rule fixes ordering
violations. The Add.Lock rule captures a set of traces where thread 1 is
descheduled at or after location l1 and thread 2 is scheduled at or before l2.
Another context switch deschedules thread 2 at or after l′2 and schedules
thread 1 again at or before l′1. As this pattern is present in the generalized ρg, this
context switch is necessary to make the trace bad. We can avoid this context
switch by adding the lock from the conclusion. The Add.Lock2 rule captures
the more general case where both thread 2 interrupting thread 1 and thread 1
interrupting thread 2 yield bad traces.
The Add.Reorder rule captures an ordering violation that can be fixed
by moving a signal() statement. Intuitively the await() statement is signaled too
early and thread 1 can start running in the preemptive semantics. In the non-
preemptive semantics thread 2 keeps running after a signal() statement until a
preemption point is reached.
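The effect of a reorder fix on the statement list of thread 2 can be sketched as below. The sketch rests on an assumption about the meaning of reorder(l2, l′2), namely that the signal() statement at position l2 is moved to just after position l′2, so that the awaiting thread cannot proceed before l′2 has executed. The function name and the list encoding of a thread are hypothetical.

```python
def reorder(stmts, l2, l2p):
    """Move the statement at index l2 to just after index l2p (l2 < l2p)."""
    if not 0 <= l2 < l2p < len(stmts):
        raise ValueError("expected 0 <= l2 < l2p < len(stmts)")
    moved = stmts[l2]
    return stmts[:l2] + stmts[l2 + 1:l2p + 1] + [moved] + stmts[l2p + 1:]
```

For example, reorder(["signal(c)", "x = 1", "y = 2"], 0, 1) returns ["x = 1", "signal(c)", "y = 2"].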
Add.Lock:
    ρg = tid1.l1 < tid2.l′2 ∧ tid2.l2 < tid1.l′1 ∧ ψ
    ──────────────────────────────────────────────────────
    lock(tid1.[l1 : l′1], tid2.[l2 : l′2]) ∨ ψ

Add.Lock2:
    ρg = tid1.l1 < tid2.l2 ∧ tid2.l′2 < tid1.l′1 ∧ ψ
    ──────────────────────────────────────────────────────
    lock(tid1.[l1 : l′1], tid2.[l2 : l′2]) ∨ ψ

Add.Reorder:
    ρg = tid1.l′1 < tid2.l′2 ∧ ψ    ∃tid1.l1, tid2.l2 : tid1.l1 < tid1.l′1
    tid2.l2 < tid2.l′2    tid1.l1 = await(c)    tid2.l2 = signal(c)
    ──────────────────────────────────────────────────────
    reorder(tid2.l2, tid2.l′2) ∨ ψ

Fig. 8: Synchronization inference rules
