LNCS by Cerny, Pavol et al.
Efficient Synthesis for Concurrency by
Semantics-Preserving Transformations*
Pavol Černý¹, Thomas A. Henzinger², Arjun Radhakrishna², Leonid Ryzhyk³, and Thorsten Tarrach²
¹ University of Colorado Boulder
² IST Austria
³ NICTA
Abstract. We develop program synthesis techniques that can help programmers fix concurrency-related bugs. We make two new contributions to synthesis for concurrency, the first improving the efficiency of the synthesized code, and the second improving the efficiency of the synthesis procedure itself. The first contribution is to have the synthesis procedure explore a variety of (sequential) semantics-preserving program transformations. Classically, only one such transformation has been considered, namely, the insertion of synchronization primitives (such as locks). Based on common manual bug-fixing techniques used by Linux device-driver developers, we explore additional, more efficient transformations, such as the reordering of independent instructions. The second contribution is to speed up the counterexample-guided removal of concurrency bugs within the synthesis procedure by considering partial-order traces (instead of linear traces) as counterexamples. A partial-order error trace represents a set of linear (interleaved) traces of a concurrent program, all of which lead to the same error. By eliminating a partial-order error trace, we eliminate in a single iteration of the synthesis procedure all linearizations of the partial-order trace. We evaluated our techniques on several simplified examples of real concurrency bugs that occurred in Linux device drivers.
1 Introduction
We develop program synthesis techniques that can help programmers fix concurrency-related bugs. We place ourselves in a setting where all the threads of the program are sequentially correct, i.e., all the errors are due to concurrency. In this setting, our goal is to automatically fix concurrency errors. In other words, the programmer needs to worry only about sequential correctness, and the synthesis tool automatically makes the program safe for concurrent execution.
Our first contribution is to have the synthesis procedure explore a variety of (sequential) semantics-preserving program transformations to obtain efficient concurrent code. In existing work on partial program synthesis, mostly only one such transformation has been considered: the insertion of synchronization primitives such as locks [15, 3]. Our study of real-world concurrency bugs from device drivers shows that only 17% of bugs are fixed using locks. For the remaining bugs, developers use other program transformations that avoid synchronization primitives, yielding more efficient code. In particular, the most common fix, used for 28% of driver concurrency bugs, is instruction reordering, i.e., a rearrangement of program instructions that changes the driver's concurrent behavior while preserving the sequential semantics of each thread. For example, a pointer initialization may be moved before an instruction that releases another thread which dereferences the pointer. We develop a technique for automating this type of transformation. We also consider other semantics-preserving transformations inspired by practical bug-fixing techniques. For example, the synthesis tool may repeat idempotent instructions multiple times (we give an example where duplicating an instruction removes a concurrency bug).
* This work was supported in part by the Austrian Science Fund NFN RiSE (Rigorous Systems Engineering), by the ERC Advanced Grant QUAREM (Quantitative Reactive Modeling), and by a gift from Intel Corporation.
Our second contribution is to increase the efficiency of the synthesis procedure itself by considering partial-order traces (as opposed to linear traces) as counterexamples in the context of counterexample-guided synthesis. A partial order on the instructions involved in the counterexample represents a set of linear counterexample traces that all lead to the same error. We first find a linear counterexample trace using an off-the-shelf tool, and then generalize it to a partial-order trace. We achieve this generalization by combining ideas from Lipton reduction [9] and error invariants [6]. We relax the ordering of a pair of instructions in the linear trace if swapping these instructions preserves error invariants (and thus the bug can still be reached). Intuitively, the resulting partial-order trace captures the ‘true cause’ of the bug. For instance, if the linear counterexample includes context switches that are not necessary to reach the bug, these context switches will not be required by the partial-order trace.
A key insight in our algorithm is that, given the partial-order trace µ, the problem of eliminating µ can be phrased as the problem of creating a minimal cycle in a graph (representing the partial order) by adding new edges. A graph with a cycle does not allow a linearization, and hence a cycle corresponds to a set of transformations that together eliminate µ. The additional edges correspond to possible instruction reorderings or insertions of atomic sections. Each additional edge is labeled by a cost (for instance, the length of the atomic section).
We implemented our techniques in a prototype tool called ConcurrencySwapper. As specifications, we handle assertions, deadlocks, and generic conditions such as pointer use before initialization. However, our techniques apply to a larger class of reachability properties. For finding buggy traces, we use the model checker Poirot [1]. If Poirot produces a buggy trace, ConcurrencySwapper generalizes it to a partial-order trace, which it then tries to eliminate first by instruction reordering and, failing that, by an atomic section. Otherwise, the current version of the driver is returned, with all the discovered bugs fixed.
We evaluated our tool on (a) five microbenchmarks that are simplified versions of bugs from Linux device drivers, and (b) a simplified driver for the Realtek 8169 Ethernet controller. The latter had 364 LOC, seven threads, and contained five bugs. In the experiments, we found that: (a) the time spent on bug finding and verification (in Poirot) dominates the time spent generalizing counterexamples, and (b) using generalized counterexamples reduces the number of bug-finding iterations.

Fig. 1: Illustrative examples

(a) Concurrent increment
    init: x = 0; t1 = F
    thread1              thread2
    A: l1 = x            1: l2 = x
    B: l1++              2: l2++
    C: x = l1            3: x = l2
    D: t1 = T            4: assert(!t1 ∨ x = 2)

(b) Interrupt handling
    init: IntrMask = 0; ready = 0; handled = 0;
    init_thread          intr_thread
    M: IntrMask = 1      R: assume(IntrMask = 1)
    N: ready = 1         S: handled = ready
                         T: assert(handled)

(c) ⪯¹π, (d) ⪯²π (one edge labeled l1 = l2), (e) ⪯¹⁺π: partial-order graphs over the statements A, B, C, D and 1, 2, 3, 4; (f) θ: partial-order graph over M, N, R, S, T. [Graph edges not recoverable from the extracted text.]
Related work. Synthesis for concurrent programs has attracted considerable research [13, 15, 3], which is mainly concerned with the insertion of locks. In contrast, we consider general semantics-preserving transformations, a key one being instruction reordering. In [13] and [14], an order for a given set of instructions is synthesized, whereas we are given a buggy program, and we reorder instructions to remove the bug (while preserving the sequential semantics). The main difference from previous work is the algorithm. We generalize counterexample traces to partial-order traces and eliminate them by adding constraints on the instruction order. In contrast, in [13], for example, the problem of choosing orderings is reduced to resolution of nondeterminism.
Concurrent trace theory has long studied the idea of treating traces as partial orders over events, as in the seminal work on Mazurkiewicz traces (see, for example, [10]). However, their use in counterexample-guided approaches to abstraction refinement, verification, or synthesis has not been considered before. We augment concurrent traces with error invariants, which have recently been studied in the sequential setting [6].
2 Illustrating Examples
Generalizing buggy traces. In Figure 1a, thread1 and thread2 concurrently increment x. The assertion states that x is 2 in the end. It fails in trace π ≡ A → B → 1 → C → 2 → D → 3 → 4, where both threads read the initial value 0 of x and then write back 1 to x. However, π is just one trace exhibiting this bug. For example, swapping B and 1 in π gives another buggy trace. Let ⪯π be a total order where X ⪯π Y iff statement X occurs before Y in π. We relax ⪯π by removing all constraints X ⪯π Y where X; Y has the same effect as Y; X. This gives us the partial order ⪯¹π (shown in Figure 1c). All traces whose execution order respects ⪯¹π fail the assertion.
For C and 3, the sequence C; 3 is not equivalent to 3; C when l1 ≠ l2. However, in all traces of ⪯¹π, it can be seen that l1 = l2 = 1, and further, this is sufficient to trigger the bug. Such sufficient conditions for triggering a bug are error invariants. Using this information, we can further relax ⪯¹π to ⪯²π, shown in Figure 1d, where the only constraints are that both threads read x before either writes to it, and that D occurs before 4. A main component of our synthesis algorithm is the generalization of buggy traces to determine their root cause.
Atomic sections. We attempt to eliminate the bug represented by ⪯¹π by adding atomic sections. For example, adding an atomic section around 1, 2, and 3 in ⪯¹π gives us ⪯¹⁺π from Figure 1e, where the atomic section is collapsed into a single node. Note that ⪯¹⁺π is not a valid partial order, as there is a cycle between the node [1;2;3] and C. Intuitively, the cycle implies that [1;2;3] happens both before and after C, which is impossible. Hence, adding an atomic section around [1;2;3] eliminates all traces represented by ⪯¹π from the program. However, the atomic section [1;2;3] does not eliminate the buggy trace A → [1;2;3] → B → C → D → 4. Analyzing this trace similarly, we find that another atomic section [A;B;C] is needed to obtain a correct program.
The number of bug-fixing iterations can be reduced using error invariants. For example, in ⪯²π, the atomic section [1;2;3] is not sufficient to create a cycle; instead, we immediately see that both [1;2;3] and [A;B;C] are needed.
Instruction reordering. The example in Figure 1b is inspired by a real bug from a Linux device driver. Thread intr_thread runs when interrupts are enabled, i.e., IntrMask is 1, and attempts to handle them; it fails if the driver is not ready. Thread init_thread enables interrupts and readies the driver. The bug is that interrupts are enabled before the driver is ready, for example, in trace θ ≡ M → R → S → N → T. Note that statements M and N are independent, i.e., M; N is equivalent to N; M. We construct a partial order from θ as before, but also remove the constraint M ⪯θ N, giving us Figure 1f (excluding the dashed edge). Adding the edge N → M creates a cycle and eliminates the bug. This edge changes the order of M and N, forcing the order N; M. This results in a correct program, with the driver ready to handle interrupts before they are enabled.
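The effect of the reordering in Figure 1b can be confirmed by exhaustively exploring interleavings. The following is a small explicit-state sketch, assumed for illustration only (it is not Poirot or ConcurrencySwapper): statements are functions on a variable map, assume blocks a thread when its condition is false, and a failing assertion sets err to 1. The helpers assign, assume, check, and error_reachable are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, int]
# A statement maps a state to its successor, or to None if it is a blocking assume.
Stmt = Callable[[State], Optional[State]]


def assign(var: str, expr: Callable[[State], int]) -> Stmt:
    def step(s: State) -> Optional[State]:
        t = dict(s)
        t[var] = expr(s)
        return t
    return step


def assume(cond: Callable[[State], bool]) -> Stmt:
    def step(s: State) -> Optional[State]:
        return dict(s) if cond(s) else None
    return step


def check(cond: Callable[[State], bool]) -> Stmt:
    # Models assert(cond): a failure sets the error flag err to 1.
    def step(s: State) -> Optional[State]:
        t = dict(s)
        if not cond(s):
            t["err"] = 1
        return t
    return step


def error_reachable(threads: List[List[Stmt]], init: State) -> bool:
    """Explore every interleaving by DFS and report whether err = 1 is reachable."""
    start: Tuple[Tuple[int, ...], State] = (tuple(0 for _ in threads), init)
    stack = [start]
    seen = set()
    while stack:
        pcs, s = stack.pop()
        if s.get("err", 0) == 1:
            return True
        key = (pcs, tuple(sorted(s.items())))
        if key in seen:
            continue
        seen.add(key)
        for i, code in enumerate(threads):
            if pcs[i] < len(code):
                nxt = code[pcs[i]](s)
                if nxt is not None:  # None means this thread is blocked here
                    stack.append((pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:], nxt))
    return False


# Figure 1b: init_thread runs M; N and intr_thread runs R; S; T.
M = assign("IntrMask", lambda s: 1)
N = assign("ready", lambda s: 1)
R = assume(lambda s: s["IntrMask"] == 1)
S = assign("handled", lambda s: s["ready"])
T = check(lambda s: s["handled"] == 1)
INIT: State = {"IntrMask": 0, "ready": 0, "handled": 0, "err": 0}

ORIGINAL = [[M, N], [R, S, T]]
REORDERED = [[N, M], [R, S, T]]  # the fix: the edge N -> M forces N; M
```

Here error_reachable(ORIGINAL, INIT) finds the failing interleaving M, R, S, N, T, whereas with REORDERED the assume R can only pass after N has set ready, so the assertion always holds.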
Following the ideas presented in this section, our synthesis algorithm works by
generalizing linear counterexample traces to partial-order traces and eliminating
them using atomic section insertion or instruction reordering.
3 Model and Problem Statement
Let V be a set of variables ranging over a domain D. A V-valuation is a function 𝒱 : V → D. A state assertion φ is a first-order constraint over valuations of variables. We model an instruction as an assertion τ over V and V′. Intuitively, V and V′ represent the values of the variables before and after the execution of the instruction, respectively. For example, x′ = x + y represents x = x + y in a C-like language. Given a V-valuation 𝒱 and a V′-valuation 𝒱′, we write (𝒱, 𝒱′) ⊨ τ to denote the fact that assertion τ holds for the values given by 𝒱 and 𝒱′. Furthermore, we require that instructions are deterministic, i.e., for every 𝒱, there is at most one 𝒱′ such that (𝒱, 𝒱′) ⊨ τ.
We model procedures as control-flow graphs (CFGs) with locations labeled with instructions. Formally, a method is a tuple 〈V, I, O, S, ∆, sι, sf, inst〉 where (a) V is a set of variables, and I ⊆ V and O ⊆ V are the input and output variables, respectively; (b) S is a finite set of control locations, and sι and sf are the initial and final control locations, respectively; (c) ∆ ⊆ S × S is a set of transitions; and (d) inst is a function labeling control locations with instructions. The initial values of the input variables and the final values of the output variables are the arguments and the return values of the method call, respectively. We assume that methods are deterministic and that all the CFGs are reducible (see [11]).
A concurrent library P is a tuple 〈M1, . . . , Mn〉 of methods with mutually disjoint control locations. Let locs(Mi), vars(Mi), and ∆(Mi) be the control locations, variables, and transitions of Mi, respectively. Further, let globals(P) denote the variables shared among methods, and locs(P) = ⋃i locs(Mi) the locations of P.
Modeling language constructs. Programs are encoded as CFGs in a standard way. For example, if(x == 0) is a choice between the then and else branches, prefixed with assume(x == 0) and assume(x != 0), respectively. We model assertions in Mi using a variable errMi that is set to 1 when an assertion fails. To block execution after an assertion failure, we replace every instruction τ by errMi = 0 ∧ τ. Atomic sections are modeled using an auxiliary variable that prevents other methods from running when the program is inside an atomic section.
Semantics. The methods of a library can be executed in parallel by an unbounded number of threads. We assume that each thread executes one method. Let Tids be a set of thread identifiers. A thread state is a triple (tid, s, 𝒱) where tid ∈ Tids, s ∈ locs(Mi) is a control location, and 𝒱 is a (vars(Mi) \ globals(P))-valuation. A thread state is initial (resp., final) if its control location is initial (resp., final). A library state (G, T) contains a globals(P)-valuation G and a set T of thread states with unique thread identifiers. State (G, T) is final for thread tid if (tid, s, 𝒱) ∈ T where s is a final control location. We denote by (G, T)tid the valuation given by G ∪ 𝒱, where 𝒱 is such that (tid, s, 𝒱) ∈ T.
A single-step execution of thread tid is a triple ((G, T), s, (G′, T′)) such that there exist Mi, (tid, s, 𝒱) ∈ T and (tid, s′, 𝒱′) ∈ T′ with (a) T \ {(tid, s, 𝒱)} = T′ \ {(tid, s′, 𝒱′)}; and (b) (G ∪ 𝒱, G′ ∪ 𝒱′) ⊨ inst(s) and (s, s′) ∈ ∆(Mi). A trace π of P is a sequence (G0, T0)s0(G1, T1)s1 . . . sn−1(Gn, Tn) where every thread state in T0 is initial, and every ((Gi, Ti), si, (Gi+1, Ti+1)) is a single-step execution. Since our instructions are deterministic, we write π as [(G0, T0)]s0 → s1 → . . .. A sequential trace of P is a trace π = (G0, T0)s0 . . . such that all the transitions of a single thread tid occur in a contiguous block (say (Gk, Tk)sk . . . (Gk+l, Tk+l)), with (Gk+l, Tk+l) being final for tid. For any such block, we write [(Gk, Ik) − seq(Mi) → (Gk+l, Ok+l)], where Ik and Ok+l are the valuations of the input and output variables in (Gk, Tk)tid and (Gk+l, Tk+l)tid, respectively. The sequential semantics SeqSem(Mi) of method Mi is the set of all relations [(G, I) − seq(Mi) → (G′, O)] that occur in sequential traces of P. Intuitively, SeqSem(Mi) characterizes all sequential input-output behaviour of a method.
For every trace π = (G0, T0)s0 . . . sn−1(Gn, Tn), we define time_π : {0, . . . , n} → ℕ as follows: (a) time_π(0) = 0; (b) time_π(i + 1) = time_π(i) if si is in an atomic section; and (c) time_π(i + 1) = time_π(i) + 1 otherwise.
The synthesis problem. A trace π = (G0, T0)s0(G1, T1) . . . of P is erroneous if errMi is set to 1 in some library state (Gj, Tj). Let P = 〈M1, . . . , Mn〉 be a concurrent library. Library P is sequentially correct if no sequential trace of P is erroneous. Let M and M′ be two methods with the same input and output variables. M and M′ are sequentially equivalent if SeqSem(M) = SeqSem(M′). Let P′ = 〈M′1, . . . , M′n〉 be a concurrent library. We say that P′ is sequentially equivalent to P if, for all i, Mi is sequentially equivalent to M′i.
The concurrent library synthesis problem asks the following: given a sequentially correct concurrent library P = 〈M1, . . . , Mn〉, output a sequentially equivalent library P′ = 〈M′1, . . . , M′n〉 such that no trace of P′ is erroneous.
Intuitively, the problem asks for a version P′ of P which is safe for concurrent execution but has the same sequential semantics. The obvious solution of adding an atomic section around each method is undesirable. Instead, our approach is as follows: (a) find an erroneous trace π of P; (b) compute a generalization of π; and (c) transform the methods minimally to avoid the bug.
We observe the following about the definition. First, we assume a sequentially correct library to ensure that at least one solution to the synthesis problem exists. However, our approach can be easily extended to drop this assumption; in that case, it may terminate without finding a solution. Second, correctness is specified by assertions in the code. However, our approach works for a more general class of specifications, namely, safety properties on the global state.
4 Semantics-Preserving Transformations
We present a few sequential-semantics preserving transformations, focusing on
statement reordering and insertion of atomic sections. We also give a motivating
example for idempotent statement duplication, but do not treat it formally, in
the interest of space. Our synthesis algorithm operates by collecting constraints
on possible solutions, and hence, we present representations of constraints on
possible solutions obtained by the considered transformations.
Statement reordering. Statement reordering is a transformation that changes the order of statements within a method. Notably, this transformation can change concurrent behavior without changing sequential semantics (e.g., in Figure 1b). A block is a single-entry, maximal sequence of control locations s0 . . . sk representing straight-line code. For simplicity, we only consider reorderings within blocks. Our techniques can be extended to allow reorderings across block boundaries.
We represent multiple reorderings of method M compactly using a reordering constraint ⊑ ⊆ locs(M) × locs(M) that specifies a partial order on locs(M). We may have s ⊑ s′ only if s and s′ are from the same block. We write s ⊑p s′ if s ⊑ s′ and s is different from s′. Let M′ be a method obtained by statement reordering from method M. Method M′ satisfies ⊑ if for all s ⊑p s′, s occurs before s′ in the corresponding block. A reordering constraint ⊑ is weaker than ⊑′ if s ⊑ s′ ⟹ s ⊑′ s′.
As our reorderings need to preserve sequential semantics, we can compute some reordering constraints even before considering concurrent executions. The procedure SemPreservingOrders computes a reordering constraint ⊑ as follows. It first constructs the total order ⊑ of control locations in each block. Then, it picks s and s′ such that s ⊑p s′ and checks whether inst(s) and inst(s′) commute, i.e., it uses a theorem prover to test two conditions: (a) inst(s′); inst(s) can execute to completion from each state from which inst(s); inst(s′) can; and (b) both sequences have the same effect. If they commute, then the pair (s, s′) is removed from ⊑ and we repeat the process. When no such pair exists, we return ⊑. If SemPreservingOrders returns ⊑ on input M, then every M′ satisfying ⊑ (and obtained by reordering) is sequentially equivalent to M, and no weaker ⊑′ has the same property.
Example 1. Running SemPreservingOrders on the code fragment from Figure 1b gives us a single constraint S ⊑ T, as all other pairs of statements are independent of each other. In Figure 1a, we get A ⊑ B ⊑ C and 1 ⊑ 2 ⊑ 3 ⊑ 4.
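To make the commutation test concrete, here is a simplified sketch of SemPreservingOrders for a single block. Two assumptions are loud here: the theorem-prover query is replaced by brute-force evaluation over a small finite set of states, and blocking (condition (a) on executing to completion) is not modeled. Instead of removing commuting pairs one at a time, the sketch keeps only non-commuting pairs and closes them transitively; the helper names are hypothetical.

```python
from itertools import combinations, product
from typing import Callable, Dict, List, Set, Tuple

State = Dict[str, int]
Stmt = Callable[[State], State]


def assign(var: str, expr: Callable[[State], int]) -> Stmt:
    """Deterministic assignment var = expr, returning a fresh state."""
    return lambda s: {**s, var: expr(s)}


def commute(f: Stmt, g: Stmt, states: List[State]) -> bool:
    # Stand-in for the prover: f; g and g; f agree on every sample state.
    return all(g(f(st)) == f(g(st)) for st in states)


def sem_preserving_orders(block: List[Tuple[str, Stmt]],
                          states: List[State]) -> Set[Tuple[str, str]]:
    """Keep s before s' (s earlier in the block) only if they do not commute,
    then close the kept pairs transitively."""
    order = {(a, b) for (a, f), (b, g) in combinations(block, 2)
             if not commute(f, g, states)}
    for _ in block:  # enough rounds for the transitive closure of one block
        order |= {(a, d) for (a, b) in order for (c, d) in order if b == c}
    return order


# thread1 of Figure 1a, with t1 = T modeled as t1 = 1.
THREAD1 = [
    ("A", assign("l1", lambda s: s["x"])),
    ("B", assign("l1", lambda s: s["l1"] + 1)),
    ("C", assign("x", lambda s: s["l1"])),
    ("D", assign("t1", lambda s: 1)),
]
STATES = [{"x": x, "l1": l1, "t1": t1}
          for x, l1, t1 in product(range(3), range(3), range(2))]
ORDER = sem_preserving_orders(THREAD1, STATES)
```

On thread1 this reproduces A ⊑ B ⊑ C from Example 1, with D left unordered because it commutes with A, B, and C.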
Atomic sections. Our second semantics-preserving transformation is the insertion of an atomic section. An atomic section encapsulates a set of statements and ensures that no concurrent thread can interrupt the execution of these statements.
We represent an atomic section by the set of control locations A it contains. An atomicity constraint α ⊆ 2^locs(P) is satisfied by a set of atomic sections {A1, A2, . . . , An} if ∀A ∈ α. ∃Ai : A ⊆ Ai. An atomicity constraint α is weaker than an atomicity constraint α′ if ∀A ∈ α. ∃A′ ∈ α′ : A ⊆ A′. Any set of atomic sections that satisfies α′ also satisfies the weaker α.
Combining atomicity and reordering constraints. A constraint is an atomicity- and reordering-constraint pair. Constraint (α, ⊑) is weaker than (α′, ⊑′) if either α is weaker than α′, or α = α′ and ⊑ is weaker than ⊑′. Intuitively, we prefer reordering to inserting atomic sections. We define the conjunction of constraints (α, ⊑) ∧ (α′, ⊑′) as (α′′, ⊑′′) where:
– α′′ = α ∪ α′ ∪ {{s, s′} | ((s ⊑ s′ ∧ s′ ⊑′ s) ∨ (s ⊑′ s′ ∧ s′ ⊑ s)) ∧ s ≠ s′};
– ⊑′′ = (⊑ ∪ ⊑′) \ {(s, s′) | ((s ⊑ s′ ∧ s′ ⊑′ s) ∨ (s ⊑′ s′ ∧ s′ ⊑ s)) ∧ s ≠ s′}.
Intuitively, if ⊑ and ⊑′ disagree on the order of s and s′, we put them in an atomic section. We define ⊤ as a trivial constraint satisfied by all libraries.
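The conjunction and the satisfaction check for atomicity constraints are plain set operations, so they can be sketched directly. In this illustration (an assumption, not the tool's data structures), a reordering constraint is a set of location pairs, an atomicity constraint is a set of frozensets of locations, and conjoin and satisfies are hypothetical helper names.

```python
from typing import FrozenSet, Set, Tuple

Loc = str
Order = Set[Tuple[Loc, Loc]]        # reordering constraint as a set of pairs (s, s')
Atomicity = Set[FrozenSet[Loc]]     # atomicity constraint as a set of location sets


def conjoin(c1: Tuple[Atomicity, Order],
            c2: Tuple[Atomicity, Order]) -> Tuple[Atomicity, Order]:
    """Conjunction of (alpha, ord) and (alpha', ord') per the two set equations."""
    (a1, o1), (a2, o2) = c1, c2
    # Pairs on which the two reordering constraints disagree, in either direction.
    clash = {(s, t) for (s, t) in o1 | o2
             if s != t and (((s, t) in o1 and (t, s) in o2)
                            or ((s, t) in o2 and (t, s) in o1))}
    alpha = a1 | a2 | {frozenset({s, t}) for (s, t) in clash}
    order = (o1 | o2) - clash
    return alpha, order


def satisfies(sections: Set[FrozenSet[Loc]], alpha: Atomicity) -> bool:
    """Atomic sections satisfy alpha if each A in alpha lies inside some section."""
    return all(any(a <= sec for sec in sections) for a in alpha)


ALPHA, ORDER = conjoin((set(), {("M", "N")}), (set(), {("N", "M"), ("S", "T")}))
```

Because the two inputs disagree on M and N, the result places {M, N} in the atomicity constraint and keeps only S ⊑ T as a reordering constraint.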
Other transformations. We motivate another sequential-semantics-preserving transformation with an example. Some further transformations are discussed in Section 7.1.
Example 2. In Figure 2, the timer thread is invoked when timer_enabled = 1 to handle requests. The device shutdown thread, shutdown, handles the remaining requests and disables the timer. There are two correctness conditions: (1) the timer is disabled after device shutdown; and (2) the unsafe() function can be accessed by only one thread at a time. Condition (2) is violated, as statements 1 and C can cause unsafe to be executed simultaneously. This happens if C calls unsafe, and after executing a few instructions of unsafe, thread timer executes and, in the atomic section, calls unsafe. One fix is to move 2 before 1. However, this introduces a trace where the assertion fails, as the timer gets re-enabled by 1 (switching 1 and 2 is not semantics-preserving). A possible fix is to execute statement 2 twice, before and after statement 1.
init: timer_enabled = 1, halted = 0

timer:
    atomic {
      A: assume(timer_enabled)
      B: timer_enabled = 0
      C: work_queue()
    }

shutdown:
    1: work_queue()
    2: timer_enabled = 0
    3: assert(!timer_enabled)

work_queue() {
    P: unsafe()
    Q: timer_enabled = 1
}

Fig. 2: Example for copying idempotent statements.
The above example illustrates another useful semantics-preserving transformation, namely, replication of idempotent statements. A statement s occurring after s′ can be replicated before s′ if s; s′; s has the same effect as s′; s.
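The replication condition s; s′; s ≡ s′; s can be tested on an abstract model of Figure 2. In this sketch, brute-force evaluation over a few states stands in for a prover, work_queue() is abstracted to its write timer_enabled = 1 plus a hypothetical counter calls that stands for the side effects of unsafe(), and bump is a made-up non-idempotent statement included for contrast.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
Stmt = Callable[[State], State]


def can_replicate_before(s: Stmt, s_prime: Stmt, states: List[State]) -> bool:
    """s (occurring after s') may be copied before s' if s; s'; s == s'; s on all samples."""
    return all(s(s_prime(s(st))) == s(s_prime(st)) for st in states)


def disable_timer(st: State) -> State:
    # Statement 2: timer_enabled = 0
    return {**st, "timer_enabled": 0}


def work_queue(st: State) -> State:
    # Statement 1 (P: unsafe(); Q: timer_enabled = 1); "calls" is a hypothetical
    # counter standing in for the side effects of unsafe().
    return {**st, "calls": st["calls"] + 1, "timer_enabled": 1}


def bump(st: State) -> State:
    # A non-idempotent statement, for contrast.
    return {**st, "calls": st["calls"] + 1}


STATES = [{"timer_enabled": t, "calls": c} for t in range(2) for c in range(3)]
```

Statement 2 passes the test, so copying it before statement 1 preserves the sequential semantics; bump fails, because executing it twice changes the counter.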
5 Generalizing Counterexamples to Partial-Order Traces
Partial-order traces. A po-trace µ of a concurrent library P is a tuple 〈ϕin, X, ⪯, loc, ϕend〉 where (a) X is a finite set; (b) ⪯ is a partial order on X; (c) loc : X → Tids × locs(P) is a function labeling X with thread identifiers and control locations; and (d) ϕin and ϕend are state assertions.
A po-trace represents a set of traces of the library. We say a trace π = (G0, T0)s0(G1, T1) . . . sn−1(Gn, Tn) is contained in µ if there is a bijection f : X → {0, 1, . . . , n − 1} such that: (a) f(x) = i ⟹ loc(x) = (tid, si) and ∃𝒱 : (tid, si, 𝒱) ∈ Ti; (b) x1 ⪯ x2 ⟹ time_π(f(x1)) ≤ time_π(f(x2)); and (c) (f(x) = 0 ⟹ (G0, T0)tid ⊨ ϕin) ∧ (f(x) = n − 1 ⟹ (Gn, Tn)tid ⊨ ϕend). Intuitively, π ∈ µ if the execution order of statements in π respects the partial order given by ⪯, the condition ϕin holds at the beginning, and the condition ϕend holds at the end. If the order ⪯ is linear, we call µ a linear po-trace.
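Condition (b) of containment, together with the definition of time_π, can be checked mechanically. The sketch below assumes each statement occurs at most once in the trace, so the bijection f is simply the position of a statement; it ignores ϕin and ϕend. The partial order PO_THETA for Figure 1b is assumed for illustration, since the edges of Figure 1f are not given in the text.

```python
from typing import Dict, List, Set, Tuple


def time_stamps(atomic: List[bool]) -> List[int]:
    """time(0) = 0; time(i+1) = time(i) if step i is atomic, else time(i) + 1."""
    times = [0]
    for in_atomic in atomic:
        times.append(times[-1] if in_atomic else times[-1] + 1)
    return times


def respects(trace: List[str], atomic: List[bool],
             po: Set[Tuple[str, str]]) -> bool:
    """Condition (b): x1 before x2 in po implies time(f(x1)) <= time(f(x2))."""
    f: Dict[str, int] = {x: i for i, x in enumerate(trace)}  # the bijection f
    times = time_stamps(atomic)
    return all(times[f[a]] <= times[f[b]] for (a, b) in po)


# Assumed partial order for Figure 1b after dropping M before N.
PO_THETA = {("M", "R"), ("R", "S"), ("S", "T"), ("S", "N")}
```

With no atomic sections, θ itself respects PO_THETA, while N, M, R, S, T does not, since it runs N before S.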
As an example, we show how an erroneous trace π = (G0, T0)s0(G1, T1) . . . sn−1(Gn, Tn) such that Gn(err) = 1 is converted into a linear po-trace µπ = 〈ϕin, X, ⪯π, loc, ϕend〉. The elements of the tuple are defined as follows: (a) X is {0, 1, . . . , n − 1}; (b) ⪯π is the natural order on ℕ; (c) loc(i) = (tid, si) if ((Gi, Ti), si, (Gi+1, Ti+1)) is a single-step execution of thread tid; (d) ϕin expresses that we start from any initial state; and (e) ϕend expresses that an error is reached (i.e., err = 1).
Given two po-traces µ = 〈ϕin, X, ⪯µ, loc, ϕend〉 and µ′ = 〈ϕin, X, ⪯µ′, loc, ϕend〉 sharing the same trace locations X and loc, and the same assertions ϕin and ϕend, we say that µ′ is a relaxation of µ if ⪯µ ⊇ ⪯µ′. Intuitively, a relaxed po-trace puts fewer constraints on the order of execution of statements.
Error invariants. Error invariants were introduced in [6] in a sequential setting. Here we use them to generalize counterexamples to partial-order traces. Let µ be a linear po-trace 〈ϕin, X, ⪯, loc, ϕend〉. Without loss of generality, let X be {0, 1, . . . , n − 1} and let ⪯ be the natural order on ℕ. An error invariant ErrInv is a function from X to state assertions such that: (a) ErrInv(0) = ϕin; (b) ErrInv(i) (for 0 < i ≤ n − 1) over-approximates the set of states reachable at i along µ, that is, for every trace π = (G0, T0)s0(G1, T1) . . . sn−1(Gn, Tn) executing the statements of µ in order, if ϕin holds for (G0, T0), then ErrInv(i) holds for (Gi, Ti); and (c) ErrInv(i) under-approximates the set of states from which ϕend can be reached along µ starting from i, that is, for every such trace π, if ErrInv(i) holds for (Gi, Ti), then ϕend holds for (Gn, Tn).
As an example, consider the linear po-trace given by the sequence of statements A: x=x+1; B: x=2*x; C: y=y+1, where ϕin is x = 0 ∧ y = 0 and ϕend is x > 0 ∧ y > 0. An error invariant ErrInv can be x > 0 ∧ y ≥ 0 before the execution of B, and the same formula before C.
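The three conditions on ErrInv can be checked for this example by brute force. This is a sketch under a stated assumption: validity is tested only on a small sample of integer states, not proved, so it can refute a candidate but does not replace a solver. ErrInv(0) = ϕin holds by construction of the list passed in; weak_inv is a deliberately too-weak candidate added for contrast.

```python
from itertools import product
from typing import Callable, Dict, List

State = Dict[str, int]
Pred = Callable[[State], bool]

# The linear po-trace A: x = x + 1; B: x = 2 * x; C: y = y + 1.
STMTS: List[Callable[[State], State]] = [
    lambda s: {**s, "x": s["x"] + 1},
    lambda s: {**s, "x": 2 * s["x"]},
    lambda s: {**s, "y": s["y"] + 1},
]


def phi_in(s: State) -> bool:
    return s["x"] == 0 and s["y"] == 0


def phi_end(s: State) -> bool:
    return s["x"] > 0 and s["y"] > 0


def good_inv(s: State) -> bool:   # the invariant from the text, before B and C
    return s["x"] > 0 and s["y"] >= 0


def weak_inv(s: State) -> bool:   # too weak: admits x = 0
    return s["x"] >= 0 and s["y"] >= 0


def run(s: State, lo: int, hi: int) -> State:
    """Execute statements lo..hi-1 of the trace from state s."""
    for stmt in STMTS[lo:hi]:
        s = stmt(s)
    return s


def is_error_invariant(inv: List[Pred], samples: List[State]) -> bool:
    n = len(STMTS)
    # (b) over-approximation: from phi_in, the state at position i satisfies inv[i].
    forward = all(inv[i](run(s, 0, i))
                  for s in samples if phi_in(s) for i in range(n))
    # (c) under-approximation: every sample satisfying inv[i] reaches phi_end.
    backward = all(phi_end(run(s, i, n))
                   for i in range(n) for s in samples if inv[i](s))
    return forward and backward


SAMPLES = [{"x": x, "y": y} for x, y in product(range(-2, 3), repeat=2)]
```

The invariant from the text passes both checks; the weaker x ≥ 0 ∧ y ≥ 0 fails condition (c), because from x = 0 statement B leaves x at 0.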
We generalize the notion of error invariant to (non-linear) po-traces. Let a po-trace µ be a tuple 〈ϕin, X, ⪯, loc, ϕend〉. An error invariant for µ is a function ErrInv from X to state assertions such that ErrInv is an error invariant for every linear po-trace µ′ such that µ is a relaxation of µ′.
5.1 Generalizing counterexample traces
We say that a po-trace µ is a counterexample if every trace π contained in µ is erroneous. Given an erroneous trace π, or equivalently, a linear po-trace µl, we now present techniques for generalizing it into a non-linear po-trace µ that is a counterexample.
The trace generalization technique proceeds iteratively. Given a po-trace µ = 〈ϕin, X, ⪯, loc, ϕend〉, in each step, we attempt to relax µ by removing the relation A ⪯ B for some A, B ∈ X where A and B correspond to statements from different threads. Further, we require that A and B are adjacent, i.e., ¬∃P ∉ {A, B} : A ⪯ P ⪯ B. However, we need to ensure that the resulting po-trace remains a counterexample after the relaxation, i.e., that every trace contained in it is erroneous. We formalize this condition below.
Let C, D ∈ X be such that C ⪯ A ⪯ B ⪯ D and ∀E ∈ X : E ⪯ C ∨ C ⪯ E ⪯ D ∨ D ⪯ E. Further, let κ ⊆ X be the set {E | C ⪯ E ⪯ D} \ {D}, i.e., κ represents the set of instructions that occur between C and D. We call the triple (C, D, κ) a border set of A, B, and ⪯.
Let JC and JD be the error invariants at C and D. Intuitively, we check that we can get from JC to JD for every ordering of instructions in κ allowed by ⪯ \ {(A, B)}. Formally, let X1, X2, . . . , Xn be such that each Xi ∈ κ, ∀E ∈ κ. ∃i : Xi = E, and ∀i : Xi ⪯ Xi+1 ∨ (Xi = B ∧ Xi+1 = A). Let si be the instruction corresponding to Xi. We allow relaxing the condition A ⪯ B in a step if and only if the following holds: for every sequence X1, X2, . . . , Xn satisfying the above conditions, the Hoare triple {JC} s1; s2; . . . ; sn {JD} is valid.
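The complete check above can be sketched for small border sets by enumerating every ordering of κ that the relaxed order allows and testing each Hoare triple. The following is a hedged illustration: Hoare-triple validity is approximated by running the statements from a finite sample of states, and the example data, the border set (C taken to be statement 1), and the error invariants JC ≡ true and JD ≡ x > 1 are assumptions built from the first item of Example 3.

```python
from itertools import permutations
from typing import Callable, Dict, Iterator, List, Set, Tuple

State = Dict[str, int]
Pred = Callable[[State], bool]
Stmt = Callable[[State], State]
Order = Set[Tuple[str, str]]


def orderings(kappa: List[str], order: Order) -> Iterator[Tuple[str, ...]]:
    """All sequences over kappa that respect every pair of the (relaxed) order."""
    for seq in permutations(kappa):
        pos = {x: i for i, x in enumerate(seq)}
        if all(pos[a] < pos[b] for (a, b) in order if a in pos and b in pos):
            yield seq


def hoare_valid(pre: Pred, stmts: List[Stmt], post: Pred,
                samples: List[State]) -> bool:
    """{pre} stmts {post}, checked by running stmts from each sample satisfying pre."""
    for st in samples:
        if pre(st):
            out = st
            for stmt in stmts:
                out = stmt(out)
            if not post(out):
                return False
    return True


def may_relax(a: str, b: str, kappa: List[str], order: Order, code: Dict[str, Stmt],
              j_c: Pred, j_d: Pred, samples: List[State]) -> bool:
    """{J_C} s1; ...; sn {J_D} for every ordering allowed once (a, b) is dropped."""
    relaxed = order - {(a, b)}
    return all(hoare_valid(j_c, [code[x] for x in seq], j_d, samples)
               for seq in orderings(kappa, relaxed))


# Methods 1: x = 0; 2: x++ and A: x++; B: assert(x <= 1); trace 1 -> A -> 2 -> B.
CODE: Dict[str, Stmt] = {
    "1": lambda s: {**s, "x": 0},
    "A": lambda s: {**s, "x": s["x"] + 1},
    "2": lambda s: {**s, "x": s["x"] + 1},
}
KAPPA = ["1", "A", "2"]
TRACE_ORDER = {("1", "A"), ("1", "2"), ("A", "2")}
SAMPLES = [{"x": v} for v in range(-2, 3)]
```

Dropping A ⪯ 2 is allowed, since both remaining orders end with x = 2, whereas dropping 1 ⪯ A would admit A; 1; 2, which ends with x = 1 and misses the error. The enumeration grows factorially with |κ|, which illustrates why the text calls the complete technique inefficient.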
Therefore, the full technique for generalizing a trace is as follows: in each step, we pick A and B and check the above conditions. If they hold, we relax µ by removing the pair (A, B) from ⪯. Although this technique is sound and complete for generalizing traces, it can be inefficient due to the large number of complex checks needed in each iteration. Instead, we present an alternative algorithm (Algorithm 1), which is sound but incomplete. The outline of the algorithm is the same as that of the complete technique presented above, i.e., in each iteration, the algorithm attempts to relax A ⪯ B. However, we use two alternative checks.
Note that when we try to relax an edge from A to B, we need to check that this does not invalidate any previous relaxation. Therefore, we recheck all the previously relaxed edges in the border set given by A and B.
Algorithm 1 Generalizing linear counterexamples
Input: linear counterexample po-trace µl, error invariant ErrInv
Output: counterexample po-trace µ that is a relaxation of µl
1: µ = µl
2: for all A ⪯µl B do
3:     removeEdge(A, B, µ)
4:     C, D, κ ← borderSet(A, B, µ)
5:     res ← true
6:     for all U, V ∈ κ do
7:         if U ⋠µ V ∧ V ⋠µ U then
8:             res ← res ∧ (check1(U, V, C, D, µ, ErrInv) ∨ check2(U, V, C, D, µ, ErrInv))
9:     if ¬res then addEdge(A, B, µ)
10: return µ
Rule 1, implemented in procedure check1, allows relaxing the order between statements that commute under certain conditions. Let sU and sV be the instructions corresponding to U and V. To relax the edge from U to V, we check whether there exists K1 such that {JC} sC {K1} is a valid Hoare triple and K1 ∧ sV; sU ⟹ K1 ∧ sU; sV. Intuitively, we are checking whether the instructions sU and sV commute given the precondition K1. Further, we require that other instructions do not interfere with K1, i.e., for all E ∈ κ with instruction sE, K1 is preserved under sE: {K1} sE {K1} is a valid Hoare triple.
Rule 2, implemented in procedure check2, allows relaxing the order between statements which do not commute but ensure similar postconditions in both orders. The procedure check2(U, V, C, D, µ) works as follows. Let JC be the error invariant at C, and let JD be the error invariant at D. Let sU and sV be the two instructions at nodes U and V. The procedure returns true if and only if there exist two state assertions K1 and K2 such that all the following conditions hold: (a) {JC} sC {K1}, {K1} sU; sV {K2}, and {K1} sV; sU {K2} are valid Hoare triples; and (b) K2 ⟹ JD. These conditions state that the error invariants are sufficient to prove that sU and sV commute with respect to reaching JD. Furthermore, let E be any other node in κ, and let sE be the corresponding instruction. We require that sE preserves K1 and K2, i.e., the following two Hoare triples are valid: (c) {K1} sE {K1} and (d) {K2} sE {K2}. Intuitively, instead of checking all allowed paths from C to D, we find state assertions K1 and K2 that are strong enough to prove commutativity but are preserved by the other statements in κ.
Example 3.
– Consider methods 1: x = 0; 2: x++ and A: x++; B: assert(x ≤ 1). Here, 1 → A → 2 → B is an erroneous trace. However, the ordering of A and 2 is irrelevant to the bug. This order can be eliminated by applying Rule 1 with precondition K1 ≡ true, as we have A; 2 ⟹ 2; A.
– Using Rule 1 in the illustrative example (Figure 1a) with K1 taken to be l1 = 1 ∧ l2 = 1 lets us commute the statements x = l1 and x = l2.
– Consider two methods with the code 1: x = 3 and A: x = 2; B: assert(x == 0), respectively. The erroneous trace here is 1 → A → B. Here, it is clear that 1 and A do not commute, i.e., 1; A ≢ A; 1. However, in the context of this trace, interchanging A and 1 still preserves the error. Therefore, Rule 2 with K1 ≡ true and K2 ≡ x > 0 relaxes the ordering between A and 1.
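The Rule 2 conditions for the third item above can be checked once K1 and K2 are supplied. In this sketch, finding K1 and K2 (a Horn-clause problem, as discussed next) is not attempted: they are passed in by hand. The {JC} sC {K1} condition is omitted because K1 ≡ true makes it trivially valid, the error invariant JD ≡ x ≠ 0 is an assumption for the state before B, and Hoare triples are checked only on sample states.

```python
from typing import Callable, Dict, List, Sequence

State = Dict[str, int]
Pred = Callable[[State], bool]
Stmt = Callable[[State], State]


def hoare_valid(pre: Pred, stmts: Sequence[Stmt], post: Pred,
                samples: List[State]) -> bool:
    """Brute-force {pre} stmts {post} over a finite sample of states."""
    for st in samples:
        if pre(st):
            out = st
            for stmt in stmts:
                out = stmt(out)
            if not post(out):
                return False
    return True


def check2(s_u: Stmt, s_v: Stmt, k1: Pred, k2: Pred, j_d: Pred,
           others: Sequence[Stmt], samples: List[State]) -> bool:
    """Rule 2 with user-supplied K1 and K2; {J_C} s_C {K1} is omitted here."""
    return (hoare_valid(k1, [s_u, s_v], k2, samples)         # (a) U then V
            and hoare_valid(k1, [s_v, s_u], k2, samples)     # (a) V then U
            and all(j_d(st) for st in samples if k2(st))     # (b) K2 implies J_D
            and all(hoare_valid(k1, [e], k1, samples)        # (c) K1 preserved
                    and hoare_valid(k2, [e], k2, samples)    # (d) K2 preserved
                    for e in others))


def commute(f: Stmt, g: Stmt, samples: List[State]) -> bool:
    return all(g(f(st)) == f(g(st)) for st in samples)


# Third item of Example 3: 1: x = 3 and A: x = 2; B: assert(x == 0).
def stmt_1(s: State) -> State:
    return {**s, "x": 3}


def stmt_a(s: State) -> State:
    return {**s, "x": 2}


SAMPLES = [{"x": v} for v in range(-4, 5)]
```

The two statements do not commute, so Rule 1 cannot be used, but both orders end in a state with x > 0, which is enough to keep the assertion failing.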
We note that Rule 1 and Rule 2 provide only a sound, not a complete, proof system for trace generalization. Applying either rule involves finding suitable K1 and K2. The set of conditions imposed on K1 and K2 can be expressed as Horn clauses. Solving Horn clauses (in logics useful for program analysis) is a focus of recent research. The non-recursive version was solved in [7], and recursive Horn clauses have been solved successfully using heuristics, for example, in [8]. These techniques can be used to implement check1 and check2.
Theorem 1. Let $\mu_l$ be a linear counterexample po-trace corresponding to an erroneous trace, and ErrInv an error invariant for $\mu_l$. If Algorithm 1 returns po-trace $\mu$ on $\mu_l$ and ErrInv, then $\mu$ is a counterexample and a relaxation of $\mu_l$.
6 Synthesis by Elimination of Partial-Order Counterexamples
We now present Algorithm 2, which solves the synthesis problem stated in Section 3. It works by finding a buggy trace, generalizing it, and then eliminating it using either an atomic section or a code reordering. The algorithm maintains an atomicity constraint α and a reordering constraint ⊑. In each iteration, a library P′ that satisfies (α, ⊑) is picked and verified. If it is correct, it is returned. Otherwise, (α, ⊑) is strengthened using the generalized counterexample. Note that since Verify solves an undecidable problem, it may not terminate, in which case our algorithm does not terminate either. However, as the constraint is strengthened at each step and only finitely many constraints exist, if all calls to Verify terminate, then the algorithm terminates and always returns a correct library. In the worst case, this correct library has every method enclosed in an atomic section.
Procedure SemPreservingOrders was defined in Section 4. Generalize is Algorithm 1. Procedure Choose picks a library satisfying a given constraint. Eliminate (see below) finds constraints to eliminate a generalized po-trace.
The basic idea behind generalized trace elimination is that µ encodes the happens-before relation among instructions and hence cannot contain cycles. We therefore look for minimal constraints that introduce a cycle into the µ relation. We extend the graph representing µ by introducing constraint edges corresponding to possible atomic sections and reorderings. We then find the smallest cycles, which correspond to the required minimal constraints.
Algorithm 2 Synthesis algorithm
Input: Library P
Output: Error-free library P′ sequentially equivalent to P
1: (α, ⊑) ← (∅, SemPreservingOrders(P))
2: while true do
3:   P′ ← Choose(⊑, α)
4:   if Verify(P′) return P′
5:   µ ← Generalize(cex(P′), ⊑)
6:   (α, ⊑) ← (α, ⊑) ∧ Eliminate(µ, (α, ⊑))
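The main loop of Algorithm 2 can be transcribed almost literally as a higher-order Python function. This is an illustrative sketch, not the tool's code: the procedures are supplied as callbacks whose behaviour is assumed, Verify and cex are merged into one `verify` callback that returns None for a correct library and a counterexample otherwise, `conjoin` stands for ∧ on constraints, and Generalize receives the whole constraint rather than only ⊑. The toy instance at the end is hypothetical: a constraint is a set of required orderings, and each "bug" is removed by one of them.

```python
# A minimal sketch of Algorithm 2's counterexample-guided synthesis loop.
from typing import Callable, Optional, Tuple, TypeVar

C = TypeVar("C")  # constraint, e.g. a pair (alpha, order)
L = TypeVar("L")  # candidate library
T = TypeVar("T")  # counterexample trace / generalized po-trace


def synthesize(
    initial: C,
    choose: Callable[[C], L],
    verify: Callable[[L], Optional[T]],  # None means "verified correct"
    generalize: Callable[[T, C], T],
    eliminate: Callable[[T, C], C],
    conjoin: Callable[[C, C], C],
) -> Tuple[L, int]:
    constraint = initial
    iterations = 0
    while True:  # diverges if verify never returns, as noted in the text
        iterations += 1
        candidate = choose(constraint)
        cex = verify(candidate)
        if cex is None:
            return candidate, iterations
        po_trace = generalize(cex, constraint)
        # Strengthen the constraint so that the whole po-trace is excluded.
        constraint = conjoin(constraint, eliminate(po_trace, constraint))


# Toy instance (hypothetical): the "library" is just its set of orderings.
BUGS = [("B", "A"), ("D", "C")]  # orderings each hypothetical bug requires


def toy_verify(lib):
    missing = [b for b in BUGS if b not in lib]
    return missing[0] if missing else None


lib, iters = synthesize(
    frozenset(),
    choose=lambda c: c,
    verify=toy_verify,
    generalize=lambda cex, c: cex,
    eliminate=lambda t, c: frozenset({t}),
    conjoin=lambda a, b: a | b,
)
print(sorted(lib), iters)  # [('B', 'A'), ('D', 'C')] 3
```

Each failed verification adds exactly one ordering here, so two bugs cost two strengthening steps plus a final successful check.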
Fix a library $P$ and a partial-order trace $\mu = \langle \varphi_{in}, X, \prec_\mu, loc, \varphi_{end} \rangle$ for the remainder of this section. The elimination graph $G(\mu, (\alpha, \sqsubseteq)) = (S, E)$ is a weighted graph with vertices $S = X$. The edges $E \subseteq S \times \mathbb{N} \times S$ are described below. Let $x, x' \in S$ with $loc(x) = (tid, s)$ and $loc(x') = (tid', s')$. The function $cons$ assigns a constraint to each edge of the elimination graph. We have $(x, w, x') \in E$ if:
– $tid \neq tid' \wedge x \prec_\mu x' \wedge w = 1 \wedge \neg\exists x'' : x \prec_\mu x'' \prec_\mu x'$. In this case, we define $cons((x, w, x')) = \top$.
– $tid = tid'$, where $x \prec_\mu x'$ and either $x$ and $x'$ belong to different blocks or $s \sqsubseteq s'$. We have $w = |\{x'' \mid x \prec_\mu x'' \prec_\mu x'\}|$. Here, we let $cons((x, w, x')) = \top$. These edges correspond to happens-before relations that hold due to $\sqsubseteq$.
– $tid = tid' \wedge s' \sqsubseteq s$ and $w = A \cdot |\{s'' \mid s' \prec_\mu s'' \prec_\mu s\}|$ for some constant $A \in \mathbb{N}$. Here, we define $cons((x, w, x')) = (\{s, s'\}, \emptyset)$. Such edges correspond to adding an atomic section around $s$ and $s'$. We give the atomic section a cost proportional to the minimum number of control locations it contains.
– $tid = tid' \wedge s \not\sqsubseteq s' \wedge s' \not\sqsubseteq s$ and $w = R \cdot |\{(s'', s''') \mid s'' \not\sqsubseteq s''' \wedge s \sqsubseteq s'' \wedge s''' \sqsubseteq s'\}|$ for some $R \in \mathbb{N}$. Here, we define $cons((x, w, x')) = (\emptyset, \{(s, s')\})$. This edge corresponds to forcing the order $s$ before $s'$ and has a cost proportional to the number of additional statement orders the constraint implies.
Intuitively, an edge $(x, w, x')$ with $cons((x, w, x')) = \top$ represents a happens-before relation that holds in any $P'$ satisfying $(\alpha, \sqsubseteq)$. Every remaining edge $(x, w, x')$ is a happens-before relation that holds in any library satisfying $cons((x, w, x'))$. We pick $A$ much larger than $R$ to prefer solutions having only reorderings rather than atomic sections (picking $A$ and $R$ such that $A > R \cdot |X|^2$ is sufficient).
Let $x_0 \ldots x_{n-1} x_0$ be a cycle in the elimination graph for a po-trace $\mu$ and $(\alpha, \sqsubseteq)$ such that $loc(x_0) = (tid, s) \wedge loc(x_{n-1}) = (tid', s')$ and $tid \neq tid'$. We call such a cycle an elimination cycle. We show that any elimination cycle gives us a constraint that eliminates all traces in the po-trace $\mu$. From the elimination cycle, we obtain the constraint $\bigwedge_{i=0}^{n-2} cons((x_i, x_{i+1}))$. This is the constraint returned by Eliminate (called from Algorithm 2). Fix a constraint $(\alpha, \sqsubseteq)$. A constraint $(\alpha', \sqsubseteq')$ eliminates a po-trace $\mu$ iff no library that satisfies $(\alpha, \sqsubseteq) \wedge (\alpha', \sqsubseteq')$ and is sequentially equivalent to $P$ shares a trace with $\mu$.
Theorem 2. Let $G(\mu, (\alpha, \sqsubseteq))$ contain an elimination cycle $x_0 x_1 \ldots x_{n-1} x_0$. Then $\bigwedge_{i=0}^{n-2} cons((x_i, x_{i+1}))$ eliminates the po-trace $\mu$.
Proof. Say $\pi = (G_0, T_0)\, s_0\, (G_1, T_1) \ldots s_{n-1}\, (G_n, T_n) \in \mu$ and let $f : X \to \{1, \ldots, n\}$ be the bijection witnessing the containment. Any trace $\pi$ of a library $P'$ satisfying $(\alpha, \sqsubseteq)$ and $cons((x_i, x_{i+1}))$ has $time(f(x_i)) \leq time(f(x_{i+1}))$. Hence, any trace $\pi$ satisfying $\bigwedge_{i=0}^{n-2} cons((x_i, x_{i+1}))$ and $(\alpha, \sqsubseteq)$ satisfies $time(f(x_0)) \leq time(f(x_{n-1}))$. However, as $(x_{n-1}, x_0)$ is an edge in the elimination graph where $x_0$ and $x_{n-1}$ come from different threads, we have $x_{n-1} \prec_\pi x_0$ and hence $time(f(x_{n-1})) \leq time(f(x_0))$. Therefore, $time(f(x_0)) = time(f(x_{n-1}))$. This is not possible, as $x_0$ and $x_{n-1}$ correspond to different threads. Hence, every trace $\pi \in \mu$ is eliminated by $\bigwedge_{i=0}^{n-2} cons((x_i, x_{i+1}))$. □
Further, a minimal elimination cycle corresponds to a minimal constraint. As $A > R \cdot |X|^2$, atomic sections are used iff $\mu$ cannot be eliminated by reordering.
Theorem 3. If $(\alpha, \sqsubseteq)$ is the constraint corresponding to the minimal cycle in the elimination graph, no strictly weaker constraint is sufficient to eliminate $\mu$.
Theorem 4. Finding minimal elimination cycles in the elimination graph $G(\mu, (\alpha, \sqsubseteq))$ can be done in time polynomial in the size of $\mu$, $\alpha$, and $\sqsubseteq$.
Finding minimal cycles can be done by running an all-pairs shortest-path algorithm and finding nodes $u, v$ from different threads such that the sum of the distances from $u$ to $v$ and from $v$ to $u$ is minimal. Hence, the theorem follows.
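The all-pairs shortest-path procedure for minimal elimination cycles can be sketched in Python with the Floyd–Warshall algorithm: compute all distances, pick the pair of nodes u, v from different threads minimizing d(u, v) + d(v, u), rebuild that cycle, and collect the constraints on its edges. The edge list is hand-built for illustration (modelled loosely on the reorder.release example of Figure 3a) rather than derived from a po-trace by the four edge rules; the node names, the labels ("order", …) and ("atomic", …), and the costs A = 100 and R = 1 (so that A > R·|X|² = 16) are all assumptions.

```python
# Sketch: find a minimum-weight elimination cycle via all-pairs shortest paths.
from itertools import product
from typing import Dict, Hashable, List, Optional, Sequence, Set, Tuple

INF = float("inf")
Edge = Tuple[Hashable, int, Hashable, Optional[tuple]]  # (x, w, x', cons); None = top


def min_elimination_cycle(
    nodes: Sequence[Hashable], thread: Dict[Hashable, str], edges: List[Edge]
) -> Optional[Tuple[int, List[Hashable], Set[tuple]]]:
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    dist = [[INF] * n for _ in range(n)]
    nxt: List[List[Optional[int]]] = [[None] * n for _ in range(n)]
    label: Dict[Tuple[int, int], Optional[tuple]] = {}
    for x, w, y, cons in edges:  # keep the cheapest of parallel edges
        i, j = idx[x], idx[y]
        if w < dist[i][j]:
            dist[i][j], nxt[i][j], label[(i, j)] = w, j, cons
    for k, i, j in product(range(n), repeat=3):  # Floyd-Warshall, k outermost
        if dist[i][k] + dist[k][j] < dist[i][j]:
            dist[i][j] = dist[i][k] + dist[k][j]
            nxt[i][j] = nxt[i][k]
    best = None
    for i, j in product(range(n), repeat=2):  # u, v from different threads
        total = dist[i][j] + dist[j][i]
        if thread[nodes[i]] != thread[nodes[j]] and total < INF:
            if best is None or total < best[0]:
                best = (total, i, j)
    if best is None:
        return None  # no elimination cycle found
    total, u, v = best

    def path(a: int, b: int) -> List[int]:
        steps = [a]
        while a != b:
            a = nxt[a][b]
            steps.append(a)
        return steps

    cycle = path(u, v) + path(v, u)[1:]
    cons = {label[(a, b)] for a, b in zip(cycle, cycle[1:]) if label[(a, b)] is not None}
    return int(total), [nodes[c] for c in cycle], cons


nodes = ["A", "B", "1", "2"]  # A: run = T, B: x = T | 1: wait(run), 2: assert(x)
thread = {"A": "t1", "B": "t1", "1": "t2", "2": "t2"}
A_COST, R_COST = 100, 1
edges = [
    ("A", 1, "1", None),                       # cross-thread happens-before
    ("1", 0, "2", None),                       # program order fixed by the ordering relation
    ("2", 1, "B", None),                       # assert(x) reads x before B writes it
    ("B", R_COST, "A", ("order", "B", "A")),   # reordering: B before A
    ("B", A_COST, "A", ("atomic", "A", "B")),  # alternative: atomic section
]
print(min_elimination_cycle(nodes, thread, edges))
# (3, ['A', '1', '2', 'B', 'A'], {('order', 'B', 'A')})
```

Because the atomic-section edge is far more expensive, the cheap reordering edge closes the cycle; removing the reordering edge makes the same search fall back to the atomic section, with cycle weight 102.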
7 Application to Systems Code
7.1 A study of concurrency bugs in Linux drivers
Our work is motivated by a study of concurrency defects in Linux device drivers. Drivers are required to perform well under concurrent workloads, which calls for sparing and fine-grained use of locks. This, in turn, provokes many concurrency-related bugs, making concurrency a major source of errors in drivers [4, 12]. Our study considered the 100 most recent (as of Dec. 2012) concurrency-related defects fixed in Linux device drivers (we used the Linux kernel development archive obtained from www.kernel.org). These defects occurred in 68 different drivers, all maintained by different developers. For each bug, we relied on manual code inspection to understand the exact nature of the bug and the fix.
We observed that many bug fixes involve subtle and seemingly ad hoc code transformations. In-depth analysis reveals several common patterns, shown in Table 1. In particular, 28 of the 100 fixes were semantics-preserving statement reorderings (the reorder pattern). These further fall into several subpatterns (see Table 2 and Figure 3). Reordering instructions often requires additional code changes. For example, moving a statement across function boundaries may require adding arguments or return values to functions. Our implementation currently does not perform such changes, but could be extended to do so.
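To see why reordering alone removes the race in the reorder.release subpattern (Figure 3a), one can enumerate every interleaving of the two threads before and after moving B: x = T in front of A: run = T. The Python sketch below is a hypothetical toy explorer, not part of ConcurrencySwapper; the tuple encoding of statements is an assumption, and wait(v) is modelled as blocking until v is true.

```python
# Exhaustively explore interleavings of a tiny multi-threaded program and report
# whether some interleaving fails an assertion. Statement encoding (hypothetical):
# ("assign", var, value), ("wait", var), ("assert", var).
from typing import Dict, List, Tuple

Stmt = Tuple  # e.g. ("assign", "x", True)


def has_violation(threads: List[List[Stmt]], init: Dict[str, bool]) -> bool:
    def dfs(pcs: Tuple[int, ...], state: Dict[str, bool]) -> bool:
        for t, prog in enumerate(threads):
            pc = pcs[t]
            if pc == len(prog):
                continue  # thread t has finished
            kind, var, *rest = prog[pc]
            if kind == "wait" and not state[var]:
                continue  # thread t is blocked until var becomes true
            if kind == "assert" and not state[var]:
                return True  # an interleaving reaches a failing assertion
            new_state = dict(state)
            if kind == "assign":
                new_state[var] = rest[0]
            new_pcs = pcs[:t] + (pc + 1,) + pcs[t + 1:]
            if dfs(new_pcs, new_state):
                return True
        return False

    return dfs(tuple(0 for _ in threads), dict(init))


# Figure 3a (reorder.release): thread2 waits for run, then asserts x.
thread2 = [("wait", "run"), ("assert", "x")]
buggy = [("assign", "run", True), ("assign", "x", True)]  # A: run = T; B: x = T
fixed = [("assign", "x", True), ("assign", "run", True)]  # B moved before A
init = {"x": False, "run": False}
print(has_violation([buggy, thread2], init))  # True
print(has_violation([fixed, thread2], init))  # False
```

In the buggy order, the interleaving A, 1, 2 reads x before B sets it; after the reordering, wait(run) can only return once x is already true.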
Interestingly, the lock pattern (17%) is rarer than expected. Performance and kernel-imposed constraints often prevent lock usage. This observation confirms that locks are not a universal band-aid for concurrency defects in OS code. We do not discuss the remaining bug categories, but note that we encountered 23 bug fixes that did not fit into any pattern (adhoc in Table 1). We expect to discover new patterns among these as we include more defects in our study.
pattern     description                                                                                    #
reorder     Reorder program statements to eliminate a race                                                28
lock        Protect racing code sections with a lock                                                      17
optimistic  Check if another thread has modified the value of a shared variable                           10
barrier     Use a system-provided function to wait for a racing thread to terminate or complete a critical section   7
atomic      Replace a statement sequence with an equivalent atomic primitive                               6
upgrade     Replace a synchronization primitive with a stronger one                                        5
unshare     Avoid sharing by creating a private copy of a shared variable                                 3
clone       Replicate an idempotent statement                                                              1
adhoc       Transformations that do not fall into one of the previous categories                          23
Total                                                                                                    100
Table 1: Synchronization patterns in Linux device drivers.
(a) reorder.release
init: x=F, run=F
thread1                  thread2
B: x = T                 1: wait(run)
A: run = T               2: assert(x)
[struck out] B: x = T
(b) reorder.lock
init: x=T
thread1                  thread2
B: x = F                 1: lock()
D: x = T                 2: assert(x)
C: unlock()              3: unlock()
[struck out] D: x = T
(c) reorder.delay
init: x=T
thread1                  thread2
[struck out] A: x = F    1: assert(x)
B: wait(exit)            2: exit = T
A: x = F
Fig. 3: Examples of reorder subpatterns and corresponding elimination graphs.
7.2 Synthesis case study
We implemented our algorithms in a tool called ConcurrencySwapper (source code and benchmarks are available at https://github.com/thorstent/ConcurrencySwapper). It handles a restricted subset of C, avoiding complex features such as pointer arithmetic, aliasing, and bit-wise arithmetic. It uses CPAchecker [2] to convert C statements into formulae representing instructions, as in Section 3. We use the bounded model checking tool Poirot [1] to detect three kinds of bugs: (a) assertion failures; (b) violations of generic correctness conditions (e.g., initialization-before-use for pointers); and (c) deadlocks (as Poirot does not detect deadlocks, we manually encoded these as suitable assertions for our examples). We generalize buggy traces using the Z3 theorem prover [5] to perform the required checks for Rules 1 and 2. The current implementation does not compute invariants during generalization, but even without invariant computation, our tool found the right program transformations quickly. To evaluate the effectiveness of trace generalization, we ran the experiments with and without it.
Reporting. Although each iteration of the algorithm eliminates a buggy po-trace, additional traces may exhibit the same bug. We report the number of iterations needed to completely fix a bug, i.e., until no more traces exhibit a similar bug. We also report separately the time taken to (a) find bugs; (b) generalize the trace and find a fix; and (c) verify the correct program. We report the verification time separately as it usually accounts for the largest fraction of the execution time.
pattern          description                                                                                      example  #
reorder.release  Move a variable assignment to a location before another thread accessing this variable is released  Fig. 3a  11
reorder.lock     Move statements to existing lock-protected section                                               Fig. 3b  10
reorder.delay    Delay assignment to a shared variable until a racing thread accessing this variable has terminated  Fig. 3c   6
reorder.rw       Reorder accesses to a pair of shared variables                                                   Fig. 1b   1
reorder.adhoc    Application-specific reordering                                                                  –         1
Table 2: Subpatterns of the reorder pattern.
Benchmarks. Our initial evaluation consisted of 5 microbenchmarks, each of 15–30 lines of code (excluding comments) and each modeling a single concurrency defect found in a real Linux driver. The iterations required and the fix patterns are summarized in Table 3. The synthesis took less than 15 seconds in each case, with trace analysis taking less than 0.5 seconds. In one case (ex5), not using trace generalization leads to an additional iteration and hence a longer execution time.
Benchmark  Fix pattern      Iters.  Iters. (w/o trace gen.)
ex1        reorder.rw       1       1
ex2        reorder.release  1       1
ex3        reorder.lock     1       1
ex4        reorder.adhoc    3       3
ex5        lock             2       3
Table 3: Micro-benchmarks
We evaluate the scalability of ConcurrencySwapper using a simplified version of the Linux Realtek 8169 driver. This driver is representative of medium to high-end drivers, both in terms of overall complexity and the complexity of its synchronization logic. We extracted the driver's complete synchronization skeleton, including code and variables related to thread synchronization and communication. The skeleton does not include the actual device management code, which is irrelevant to concurrency, and was additionally simplified to avoid currently unsupported C constructs. We provide an environment model that simulates all seven OS threads that interact with the driver. The resulting skeleton had 364 LOC, while the original driver had around 7,000 LOC. The skeleton had 5 concurrency defects.
Poirot was able to find all the defects, and ConcurrencySwapper was able to find fixes for each defect through statement reordering. The results are summarized in Table 4. In each iteration, the trace analysis phase took less than 2 seconds. For the runs without trace generalization, we report the extra bug-finding time caused by additional iterations. In one case (bug2), 3 additional iterations were required without trace generalization. The bug-finding times dominate the trace analysis times, justifying the use of a complex trace generalization procedure to avoid additional iterations. The verification phase took around 30 minutes.
Bug   Fix pattern      With trace generalization   Without trace generalization
                       Iters.  Bug-finding         Iters.  Additional bug-finding
bug1  reorder.release  1       8 sec               1       same
bug2  reorder.delay    1       23 sec              4       same + 80 sec
bug3  reorder.rw       1       93 sec              1       same
bug4  reorder.rw       1       94 sec              1       same
bug5  reorder.adhoc    2       47 sec              2       same
Table 4: Results for Linux Realtek 8169 driver benchmark
8 Conclusion
The contributions of the paper are two-fold. First, our synthesis procedure considers a variety of semantics-preserving transformations (not just lock placement). Second, in order to speed up the synthesis procedure, we consider counterexamples that are partial orders on instructions (as opposed to linear orders).
There are several possible directions for future work. First, we will investigate generalizations of counterexamples in CEGAR algorithms for verification, rather than synthesis. Second, we plan to address issues in systems programming such as weak memory models. In this paper, we compute minimal sufficient ordering constraints. In a sequentially consistent model, reordering statements alone is sufficient to enforce these constraints, but in a weak memory model, additional fences may be needed to enforce them.
References
1. Poirot: The concurrency sleuth. http://research.microsoft.com/en-us/projects/poirot/.
2. D. Beyer and E. Keremoglu. CPAchecker: A tool for configurable software verification. In CAV, pages 184–190, 2011.
3. S. Cherem, T. Chilimbi, and S. Gulwani. Inferring locks for atomic sections. In PLDI, pages 304–315, 2008.
4. A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler. An empirical study of operating systems errors. In SOSP, pages 73–88, 2001.
5. L. de Moura and N. Bjørner. Z3: An efficient SMT solver. In TACAS, pages 337–340, 2008.
6. E. Ermis, M. Schäf, and T. Wies. Error invariants. In FM, pages 187–201, 2012.
7. A. Gupta, C. Popeea, and A. Rybalchenko. Solving recursion-free Horn clauses over LI+UIF. In APLAS, pages 188–203, 2011.
8. A. Gupta, C. Popeea, and A. Rybalchenko. Threader: A constraint-based verifier for multi-threaded programs. In CAV, pages 412–417, 2011.
9. R. Lipton. Reduction: A method of proving properties of parallel programs. Commun. ACM, 18(12):717–721, 1975.
10. A. Mazurkiewicz. Trace theory. In Petri Nets: Applications and Relationships to Other Models of Concurrency, volume 255 of LNCS, pages 278–324. Springer, 1987.
11. F. Nielson, H. Nielson, and C. Hankin. Principles of program analysis (2. corr. print). Springer, 2005.
12. L. Ryzhyk, P. Chubb, I. Kuz, and G. Heiser. Dingo: Taming device drivers. In EuroSys, 2009.
13. A. Solar-Lezama, C. Jones, and R. Bodík. Sketching concurrent data structures. In PLDI, pages 136–148, 2008.
14. M. Vechev and E. Yahav. Deriving linearizable fine-grained concurrent objects. In PLDI, pages 125–135, 2008.
15. M. Vechev, E. Yahav, and G. Yorsh. Abstraction-guided synthesis of synchronization. In POPL, pages 327–338, 2010.
