Extracting Safe Thread Schedules from Incomplete Model Checking Results by Metzler, Patrick et al.
Patrick Metzler · Neeraj Suri · Georg Weissenbacher
Abstract Model checkers frequently fail to completely
verify a concurrent program, even if partial-order reduc-
tion is applied. The verification engineer is left in doubt
whether the program is safe and the effort towards ver-
ifying the program is wasted.
We present a technique that uses the results of such
incomplete verification attempts to construct a (fair)
scheduler that allows the safe execution of the par-
tially verified concurrent program. This scheduler re-
stricts the execution to schedules that have been proven
safe (and prevents executions that were found to be
erroneous). We evaluate the performance of our tech-
nique and show how it can be improved using partial-
order reduction. While constraining the scheduler re-
sults in a considerable performance penalty in general,
we show that in some cases our approach—somewhat
surprisingly—even leads to faster executions.
P. Metzler supported by the German Academic Exchange Ser-
vice (DAAD).
N. Suri supported in part by H2020-SU-ICT-2018-2 CON-
CORDIA GA #830927 and BMBF-Hessen TUD CRISP.
G. Weissenbacher funded by the Vienna Science and Technol-
ogy Fund (WWTF) through the project Heisenbugs (VRG11-
005) and the LogiCS doctoral program W1255-N23 of the











Automated verification of concurrent programs is in-
herently difficult because of exponentially large state
spaces [39]. State space reductions such as partial-order
reduction (POR) [10,17,16] allow a model checker to
focus on a subset of all reachable states while the veri-
fication result is valid for all reachable states. However,
even reduced state spaces may be intractably large [17]
and corresponding programs infeasible to (automati-
cally) verify, requiring manual intervention.
We propose a novel model checking approach
for safety verification of potentially non-terminating
programs with a bounded number of threads, non-
deterministic scheduling, and shared memory. Our
approach iteratively generates incomplete verification
results (IVRs) to prove the safety of a program under
a (semi-)deterministic scheduler. Our contribution is
the novel generation and use of IVRs based on existing
model checking algorithms, where we use lazy abstrac-
tion with interpolants [40] to instantiate our approach.
The scheduling constraints induced by an IVR can be
enforced by iteratively relaxed scheduling [29], a tech-
nique to enforce fine-grained orderings of concurrent
memory events. When the scheduling constraints of
an IVR are enforced, all executions (for all possible in-
puts) are safe, even if the underlying (operating system)
scheduler is non-deterministic. Thereby, the program
can be executed safely before a complete verification
result is available. Executions can still exploit concur-
rency and the number of memory accesses that are
executed concurrently may even be increased. As the
model checking problem is eased, additional programs
become tractable. Furthermore, IVRs can be used to
safely execute unsafe programs which are safe under at
least one scheduler. E.g., instead of programming synchronization explicitly, our model checking algorithm can be used to synthesize synchronization so that all executions are safe.

[Fig. 1 listing, only partially recovered:
1 initially:
2 empty buffer of size N
3 count = 0
…
15 if count < N:
16 put item
…
24 if count > 0:
25 remove item
…]
Fig. 1: An erroneous version of the producer-consumer problem
We use the producer-consumer example from Fig. 1 to explain our approach. The verifier analyses an initial schedule, e.g., one where threads T1 and T2 produce and consume in turns, and emits an IVR R1 guaranteeing safe executions under this schedule. With its second IVR, the verifier might verify the correctness of producing two items in a row, and the scheduling constraints can be relaxed accordingly. When the verifier hits an unsafe execution (the producer causes an overflow or the consumer causes an underflow), it emits an unsafe IVR for debugging. If the verifier manages to analyze all possible executions of the program, it reports the final result partially safe, as the program can be used safely under all inputs but unsafe executions exist. Had there been no unsafe IVRs, the final result would be safe; had there been no safe IVRs, it would be unsafe.
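The mapping from the IVRs emitted during a completed run to the final verdict can be sketched in Python as follows. The helper `final_verdict` and the encoding of each IVR as a single safe/unsafe flag are illustrative assumptions, not part of the algorithm's interface.

```python
from typing import Iterable


def final_verdict(ivr_is_safe: Iterable[bool]) -> str:
    """Summarize a completed run from the safety flags of all emitted IVRs.

    Hypothetical helper: True marks a safe IVR, False an unsafe IVR
    (a counterexample). Assumes the verifier explored all executions
    and emitted at least one IVR.
    """
    flags = list(ivr_is_safe)
    if not flags:
        raise ValueError("expected at least one IVR")
    has_safe = any(flags)
    has_unsafe = not all(flags)
    if has_safe and has_unsafe:
        return "partially safe"  # safe under some, but not all, schedulers
    return "unsafe" if has_unsafe else "safe"


print(final_verdict([True, True, False]))  # partially safe
```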
This paper shows how to instantiate our approach
by answering the following questions: 1. Which state
space abstractions are suitable for iterative model
checking? The abstraction should be able to represent
non-terminating executions and facilitate the extrac-
tion of schedules. 2. How to formalize and represent
suitable IVRs? IVRs should be as small as possible
in order to allow short iterations, while they must be
large enough to guarantee fully functional executions
under all possible program inputs. More precisely, for
every possible program input, an IVR must cover a pro-
gram execution. 3. What are suitable model checking
algorithms that can be adapted to produce IVRs? A suitable algorithm should make it easy to select schedules for exploration.
Beyond the contributions of a previous version of
this paper [30], this extended version contains proofs
of our formal statements, a more detailed description
of constructing ARTs with the monolithic Impact al-
gorithm for concurrent programs and our iterative ex-
tension, a more detailed description of the implemen-
tation for our evaluation, additional experimental per-
formance measurements, additional illustration of our
case studies, and a more detailed discussion of section
schedules and their optimization.
2 Incomplete verification results
2.1 Basic definitions
A program P comprises a set S of states (including
a distinguished initial state) and a finite set T of threads.
Each state s ∈ S maps program counters and variables
to values. We use l(s) to denote the program location
of a state s, which comprises a local location lT (s) for
each thread T ∈ T . W.l.o.g. we assume the existence
of a single error location that is only reachable if the
program P is not safe.
A state formula φ is a predicate over the program variables encoding all states s in which φ(s) evaluates to true. A transition relation R relates states s and their successor states s′. The transition relation of each thread T is partitioned into local transitions $R_{l,l'}$ such that $l = l_T(s)$ and $l' = l_T(s')$ for all $s, s'$ satisfying $R_{l,l'}(s, s')$, and $R_{l,l'}$ leaves the program locations and variables of other threads unchanged. We use Guard(R) to denote a predicate encoding $\exists s'.\, R(s, s')$, e.g., $\mathit{Guard}(R_{15,16})$ is (count < N) for the transition from location 15 to 16 in Fig. 1.
We say that $R_{l,l'}$ (or T, respectively) is active at location l and enabled in a state s iff $l_T(s) = l$ and s satisfies Guard(R). We write enabled(s) for the set of enabled transitions at s. Multiple transitions of a thread T at a location can be active, but we allow only one transition R of T to be enabled at a given state. If such an R exists, we write $\mathit{enabled}_T(s) := \{R\}$, and $\mathit{enabled}_T(s) := \emptyset$ otherwise.
If there exist states s for which no transition of a
thread T is enabled (e.g., in line 14 in Fig. 1), T may
block. We assume that such locations lT (s) are (conser-
vatively) marked by may-block(lT (s)).
An execution is a sequence s0, T1, s1, . . . , where s0 is the initial state and the states $s_i$ and $s_{i+1}$ in every adjacent triple $(s_i, T_{i+1}, s_{i+1})$ are related by the transition relation of $T_{i+1}$. An execution that does not reach the
error location is safe. A deadlock is a state s in which
no transitions are enabled. W.l.o.g. we assume that all
finite executions correspond to deadlocks and are un-
desirable; intentionally terminating executions can be
modelled using terminal locations with self-loops.
An execution τ is (strongly) fair if every thread T enabled infinitely often in τ is also scheduled infinitely often [5]. We assume that fairness is desirable and enforce it by our algorithm presented in Sec. 3. Other notions of fairness, such as weak fairness, can be enforced analogously to our use of strong fairness.
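For executions that eventually repeat a finite loop forever (ultimately periodic, or lasso-shaped, executions), strong fairness can be decided from the loop alone. The sketch below assumes such a lasso shape and a hypothetical encoding of each loop step as the set of threads enabled in the current state together with the thread scheduled next; it does not handle arbitrary infinite executions.

```python
from typing import Sequence, Set, Tuple

# One step of the repeated loop: (threads enabled in the state, thread scheduled next).
Step = Tuple[Set[str], str]


def is_strongly_fair(loop: Sequence[Step]) -> bool:
    """Strong fairness of an ultimately periodic execution prefix . loop^omega.

    The finite prefix does not matter: a thread is enabled infinitely often
    iff it is enabled in some state of the loop, and it is scheduled
    infinitely often iff it is scheduled somewhere in the loop.
    """
    enabled_in_loop: Set[str] = set().union(*(enabled for enabled, _ in loop))
    scheduled_in_loop = {thread for _, thread in loop}
    return enabled_in_loop <= scheduled_in_loop


# T1 and T2 take turns: fair.
print(is_strongly_fair([({"T1", "T2"}, "T1"), ({"T1", "T2"}, "T2")]))  # True
# Only T1 is ever scheduled although T2 stays enabled: unfair.
print(is_strongly_fair([({"T1", "T2"}, "T1")]))  # False
```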
Non-determinism can arise both through schedul-
ing and non-deterministic transitions. A scheduler can
resolve the former kind of non-determinism.
Definition 1 (scheduler) A scheduler ζ : (S×T )∗×
S → T of a program P is a function that takes an
execution prefix s0, T1, . . . , Tn, sn and selects a thread
that is enabled at sn, if such a thread exists. A scheduler
ζ is deadlock-free (fair, respectively) if all executions
possible under ζ are deadlock-free (fair).
A scheduler for the program of Fig. 1, for in-
stance, must select T1 rather than T2 for the pre-
fix sinit , T1, s1, T1, s2, T1, s3, T2, s4, T2, s5, since at that
point the lock is held by T1 and enabledT2(s5) = ∅.
Non-deterministic transitions are the second source of non-determinism. If $R_{l,l'}$ of thread T allows multiple successor states for a state s, we presume the existence of a set X of input symbols such that each ι ∈ X determines a unique successor state s′ by selecting an $R^{\iota}_{l,l'} \subseteq R_{l,l'}$ with $R^{\iota}_{l,l'}(s, s')$.
Definition 2 (input) An input is a function χ : (S ×
T )∗ → X, which chooses an input symbol depending
on the current execution prefix.
In conjunction, an input and a scheduler render
a program completely deterministic: the input χ and
scheduler ζ select a transition in each step such that
each adjacent triple (si, Ti+1, si+1) is uniquely deter-
mined.
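To illustrate how a scheduler and an input together fix a single execution, the following sketch uses a hypothetical toy program with one shared integer x, where an input symbol in {1, 2} resolves thread T1's non-deterministic increment. The names `successor`, `run`, and `round_robin` are illustrative; for simplicity, the input function here receives the whole prefix, including its last state.

```python
from typing import Callable, List, Optional

# Hypothetical toy program: a single shared integer x, initially 0.
THREADS = ("T1", "T2")
Prefix = List  # alternating s0, T1, s1, ..., Tn, sn


def successor(thread: str, x: int, symbol: int) -> Optional[int]:
    """Toy transition relation; `symbol` resolves T1's non-determinism."""
    if x >= 6:
        return None                      # no thread is enabled: deadlock
    if thread == "T1":
        return x + symbol                # non-deterministic: x += 1 or x += 2
    return x * 2 if x > 0 else x + 1     # T2 is deterministic and ignores `symbol`


def run(scheduler: Callable[[Prefix], str], inp: Callable[[Prefix], int],
        s0: int = 0, max_steps: int = 10) -> Prefix:
    """Builds the unique execution fixed by a scheduler and an input."""
    prefix: Prefix = [s0]
    for _ in range(max_steps):
        thread = scheduler(prefix)
        nxt = successor(thread, prefix[-1], inp(prefix))
        if nxt is None:                  # the chosen thread is not enabled: stop
            break
        prefix += [thread, nxt]
    return prefix


def round_robin(prefix: Prefix) -> str:
    # The number of steps taken so far is (len(prefix) - 1) // 2.
    return THREADS[((len(prefix) - 1) // 2) % len(THREADS)]


print(run(round_robin, lambda prefix: 2))  # [0, 'T1', 2, 'T2', 4, 'T1', 6]
print(run(round_robin, lambda prefix: 1))  # [0, 'T1', 1, 'T2', 2, 'T1', 3, 'T2', 6]
```

Running the same scheduler with a different input yields a different, but again unique, execution.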
For Partial Order Reduction (POR), we assume
that a symmetric independence relation ‖ on transi-
tions of different threads is given, which induces an
equivalence relation on executions. Two transitions R1
and R2 are only independent if they are from distinct threads, they commute at states where both R1 and R2 are enabled, and executing R1 neither enables nor disables R2. If R1 and R2 are not independent, we write R1 ∦ R2.
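The three independence conditions can be checked by brute force on a small finite state space. In this sketch, transitions are hypothetical deterministic partial functions on states (x, y) that return None when they are not enabled; distinctness of the threads is assumed to be checked separately.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

State = Tuple[int, int]                          # hypothetical state: (x, y)
Transition = Callable[[State], Optional[State]]  # returns None if not enabled


def independent(r1: Transition, r2: Transition, states: Iterable[State]) -> bool:
    """Brute-force independence check of two transitions of distinct threads.

    Checks, on every given state, that neither transition enables or disables
    the other and that both orders commute where both are enabled.
    """
    for s in states:
        a, b = r1(s), r2(s)
        if a is not None and (r2(a) is None) != (b is None):
            return False                 # r1 enables or disables r2
        if b is not None and (r1(b) is None) != (a is None):
            return False                 # r2 enables or disables r1
        if a is not None and b is not None and r2(a) != r1(b):
            return False                 # the two orders reach different states
    return True


def inc_x(s: State) -> Optional[State]:
    return (s[0] + 1, s[1]) if s[0] < 2 else None


def inc_y(s: State) -> Optional[State]:
    return (s[0], s[1] + 1) if s[1] < 2 else None


def copy_y_to_x(s: State) -> Optional[State]:
    return (s[1], s[1])


STATES = list(product(range(3), repeat=2))
print(independent(inc_x, inc_y, STATES))        # True
print(independent(copy_y_to_x, inc_y, STATES))  # False
```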
2.2 Requirements on incomplete verification results
Our goal is to ease the verification task by producing
incomplete verification results (IVRs) which prove the
program safety under reduced non-determinism, i.e.,
only for a certain scheduler. We only allow “legitimate”
restrictions of the scheduler that do not introduce dead-
locks or exclude threads. Inputs must not be restricted,
since this might reduce functionality and result in un-
handled inputs.
Hence, we define an IVR to be a function R that
maps execution prefixes to sets of threads, representing
scheduling constraints. An IVR for the program from
Fig. 1, for instance, may output {T1} in states with
an empty buffer, meaning that only thread T1 may be
scheduled here, and {T2} otherwise, so that an item is
produced if and only if the buffer is empty. A scheduler
ζR enforces (the scheduling constraints of) an IVR R
if ζR(τ) ∈ R(τ) for all execution prefixes τ. An IVR R permits all executions that are possible under some scheduler that enforces R.
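An IVR and the enforcement condition ζR(τ) ∈ R(τ) can be written down directly. The sketch below assumes a simplified state that records only the number of buffered items, encodes the example IVR described above as `ivr_pc`, and checks enforcement on a finite sample of prefixes only; all names are illustrative.

```python
from typing import Callable, FrozenSet, Iterable, List

# Hypothetical simplification: a state is just the number of buffered items,
# and a prefix alternates states and threads: [s0, T1, s1, ..., sn].
Prefix = List
IVR = Callable[[Prefix], FrozenSet[str]]


def ivr_pc(prefix: Prefix) -> FrozenSet[str]:
    """The example IVR: only the producer T1 on an empty buffer, else only T2."""
    return frozenset({"T1"}) if prefix[-1] == 0 else frozenset({"T2"})


def enforces(scheduler: Callable[[Prefix], str], ivr: IVR,
             prefixes: Iterable[Prefix]) -> bool:
    """zeta enforces R iff zeta(tau) is in R(tau) for every prefix tau.

    Only the given finite sample of prefixes is checked, so a True result
    is evidence, not a proof.
    """
    return all(scheduler(tau) in ivr(tau) for tau in prefixes)


def alternate(tau: Prefix) -> str:
    # T1 after an even number of steps, T2 after an odd number: T1, T2, T1, ...
    return "T1" if len(tau) % 4 == 1 else "T2"


def always_t1(tau: Prefix) -> str:
    return "T1"


SAMPLE = [[0], [0, "T1", 1], [0, "T1", 1, "T2", 0]]
print(enforces(alternate, ivr_pc, SAMPLE))  # True
print(enforces(always_t1, ivr_pc, SAMPLE))  # False
```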
The remainder of this subsection discusses the re-
quirements on useful IVRs. We define safe, realizable,
deadlock-free, fairness-admitting, and fair IVRs. In the
following subsection, we instantiate IVRs with abstract
reachability trees (ARTs). Fig. 2 gives an overview of the logical relationship between properties of ARTs (left) and IVRs (right).
Safety. An IVR R can either expose a bug in a pro-
gram or guarantee that all permitted executions are
safe. Here, we are only concerned with the latter case.
An IVR R is safe if all executions permitted by R are
safe. An unsafe IVR permits an unsafe execution and
is called a counterexample.
Completeness. To reduce the work for the model
checker, a safe IVR R should ideally have to prove the
correctness of as few executions as possible. At the same
time, it should cover sufficiently many executions so
that the program can be used without functional re-
strictions. For instance, the IVR R(τ) := ∅, for all τ ,
is safe but not useful, as it does not permit any execution. Consequently, R should permit at least one enabled thread in every non-deadlock state, which is ensured by realizable IVRs: an IVR R is realizable if at least one scheduler that enforces R exists. Furthermore, an IVR should never introduce a deadlock: an IVR R is deadlock-free if all schedulers that enforce R are deadlock-free.
Fairness. In general, we deem only fair executions
desirable. The IVR R(τ) := {T1}, for instance, is
deadlock-free for the program of Fig. 1 but useless,
as no item is consumed. A deadlock-free IVR R admits fairness if there exists a fair scheduler enforcing R (i.e., a fair execution of the program is possible).
If an IVR permits both fair and unfair executions, it might be difficult to guarantee fairness at runtime. In such cases, a fair IVR can be used: a deadlock-free IVR R is fair if all schedulers enforcing R are fair.
ART:                              IVR:
A is safe             ⇒    RA is safe
    ⇑
    ⇑                        RA is realizable
                                   ⇑
A is deadlock-free    ⇒    RA is deadlock-free
    ⇑                              ⇑
A admits fairness     ⇒    RA admits fairness
    ⇑                              ⇑
A is fair             ⇒    RA is fair
Fig. 2: Overview of the relationship between properties of IVRs and ARTs. ⇒ and ⇑ denote logical implication.
2.3 Abstract reachability trees as incomplete
verification results
In this subsection, we instantiate the notion of IVRs using abstract reachability trees (ARTs), which underlie a
range of software model checking tools [21,28,23,9] and
have recently been used for concurrent programs [40].
Due to the explicit representation of scheduling choices
from the beginning of an execution up to an (abstract)
state, ARTs are well-suited to represent IVRs. Model
checking algorithms based on ARTs perform a path-
wise exploration of program executions and represent
the current state of the exploration using a tree in which
each node v corresponds to a set of states at a program
location l(v). These states, represented by a predicate
φ(v), (safely) over-approximate the states reachable via
the program path from the root of the ART (ε) to v.
Edges expanded at v correspond to transitions starting
at l(v). A node w may cover v (written v B w) if the
states at w include all states at v (φ(v) ⇒ φ(w)); in
this case, v is covered (covered(v)) and its successors
need not be further explored. (Intuitively, executions
reaching v are continued from w.) Formally, an ART is
defined as follows:
Definition 3 (abstract reachability tree [28,40])
An abstract reachability tree (ART) is a tuple A =
(V, ε,−→,B), where (V,−→) is a finite tree with root ε ∈ V
and B⊆ V×V is a covering relation. Nodes v are labeled
with global control locations and state formulas, written l(v) and φ(v), respectively. Edges (v, w) ∈ −→ are labeled with a thread and a transition, written $v \xrightarrow{T,R} w$.
Intuitively, an ART A is well-labeled [28] if A's −→-edges represent the transitions of the program and edges v B w indicate that all states modeled by node v are also modeled by node w. Formally, A is well-labeled if (i) φ(ε) represents the initial state, (ii) for every edge $v \xrightarrow{T,R_{l,l'}} w$ in A, we have $\varphi(v)(s) \wedge R_{l,l'}(s, s') \Rightarrow \varphi(w)(s')$, $l_T(v) = l$, and $l_T(w) = l'$, and (iii) for all v, w with v B w, φ(v) ⇒ φ(w) and ¬covered(w).
[Fig. 3 graphic not recoverable. Surviving node labels include mutex = 0 ∧ count = 0, mutex = 1 ∧ count = 0, mutex = 1 ∧ count = 1, mutex = 0 ∧ count = 1, and false; surviving edge labels include T2: if count > 0, T2: remove item, T2: count -= 1, T2: unlock(mutex), and T2: else.]
Fig. 3: An (incomplete) ART for the program of Fig. 1
An incomplete ART Ap-c for the producer-consumer
problem of Fig. 1 is shown in Fig. 3. Nodes show the
state formulas and edges are labeled with the thread
and statement corresponding to the transition. The
dashed edge is a B-edge.
ART-induced schedulers. A well-labeled ART A di-
rectly corresponds to an IVR RA that simulates an
execution by traversing A . We define RA as follows:
Let τ = s0, T1, s1, . . . , sn be an execution prefix. If A
contains no path that corresponds to τ , RA leaves the
schedules for this execution unconstrained. Otherwise,
let vn be the last node of the path in A that corre-
sponds to τ . RA permits exactly those threads that are
expanded at vn (or at w if vn is covered by some node
w). Execution prefixes are matched with (B ∪ −→)-
paths, which is, in particular, necessary to build infinite
executions. For example, the execution prefix

$$\tau = s_0, \underbrace{T_1, s_1, \ldots, T_1,}_{T_1 \text{ scheduled 6 times}}\ s_6, \underbrace{T_2, s_7, \ldots, T_2,}_{T_2 \text{ scheduled 6 times}}\ s_0$$

corresponds to the path in $\mathcal{A}_{p\text{-}c}$ from ε over v1, . . . , v12 back to ε. As only T1 is expanded at ε, $R_{\mathcal{A}_{p\text{-}c}}$ allows only {T1} after τ.
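The traversal that defines RA can be sketched as follows. The class `ART` is a hypothetical minimal representation that keeps only thread-labeled tree edges and the covering relation; it assumes at most one edge per thread at each node and matches a prefix by its sequence of scheduled threads only, ignoring state formulas and transitions.

```python
from typing import Dict, FrozenSet, Sequence

ALL_THREADS = frozenset({"T1", "T2"})


class ART:
    """Minimal ART skeleton: tree edges labeled with threads, plus coverings.

    Simplifying assumptions: at most one edge per thread at each node, and
    matching a prefix uses only its thread sequence (no state formulas).
    """

    def __init__(self) -> None:
        self.children: Dict[str, Dict[str, str]] = {}  # node -> thread -> child
        self.covered_by: Dict[str, str] = {}           # v -> w for v B w

    def add_edge(self, v: str, thread: str, w: str) -> None:
        self.children.setdefault(v, {})[thread] = w

    def resolve(self, v: str) -> str:
        # Executions reaching a covered node continue from its covering node
        # (covering nodes are themselves uncovered in a well-labeled ART).
        return self.covered_by.get(v, v)

    def ivr(self, threads: Sequence[str], root: str = "eps") -> FrozenSet[str]:
        """R_A for the prefix whose scheduled threads are `threads`."""
        v = self.resolve(root)
        for t in threads:
            nxt = self.children.get(v, {}).get(t)
            if nxt is None:
                return ALL_THREADS  # no corresponding path: unconstrained
            v = self.resolve(nxt)
        return frozenset(self.children.get(v, {}))


# The ART of Fig. 3 reduced to its main path: T1 six times, T2 six times, v12 B eps.
a_pc = ART()
nodes = ["eps"] + [f"v{i}" for i in range(1, 13)]
for i in range(12):
    a_pc.add_edge(nodes[i], "T1" if i < 6 else "T2", nodes[i + 1])
a_pc.covered_by["v12"] = "eps"

print(sorted(a_pc.ivr(["T1"] * 6 + ["T2"] * 6)))  # ['T1']
print(sorted(a_pc.ivr(["T1"] * 6)))               # ['T2']
print(sorted(a_pc.ivr(["T2"])))                   # ['T1', 'T2']
```

The first call reproduces the example in the text: after six T1 steps and six T2 steps, the covering v12 B ε leads back to ε, where only T1 is expanded.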
Safety. An ART is safe if whenever $l_T(v)$ is the error location then φ(v) = false. As only safe executions may correspond to a path in a safe ART (cf. Theorem 3.3 of [40]), RA is a safe IVR.
Completeness. In order to derive a deadlock-free IVR from a well-labeled ART A, we have to fully expand at least one thread T at each node v that represents reachable states (where T is fully expanded at v if v has an outgoing edge for every active transition of T at $l_T(v)$). However, there may exist reachable states s represented by φ(v) for which no transition of T is enabled (i.e., $\mathit{enabled}_T(s) = \emptyset$). If T is the only thread expanded at v, RA is then not realizable. This situation can arise for locations l at which T may block (marked with may-block($l_T$)).
Consequently, whenever may-block($l_T(v)$) holds in a deadlock-free ART A, we require that φ(v) is strong enough to entail that the transition R of T expanded at v (or at the node covering v, respectively) is enabled (i.e., φ(v) ⇒ Guard(R)). For instance, φ(v1) in the ART of Fig. 3 proves the enabledness of T1 at v1, as φ(v1) ⇒ mutex = 0 and lock(mutex) is enabled if mutex = 0.
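Checking the entailment φ(v) ⇒ Guard(R) is a validity query that a model checker would typically hand to a decision procedure. As a stand-in, the sketch below merely enumerates a small bounded domain of valuations of mutex and count, which gives evidence but no proof; the helper names are hypothetical, and φ(v1) is taken to be count = 0 ∧ mutex = 0, its label after the refinement discussed in Sec. 3.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Iterator

Valuation = Dict[str, int]
Formula = Callable[[Valuation], bool]


def entails(phi: Formula, psi: Formula, valuations: Iterable[Valuation]) -> bool:
    """phi => psi, checked on every valuation of a finite domain."""
    return all(psi(s) for s in valuations if phi(s))


def bounded_valuations(max_count: int = 3) -> Iterator[Valuation]:
    """All valuations with mutex in {0, 1} and 0 <= count <= max_count."""
    for mutex, count in product((0, 1), range(max_count + 1)):
        yield {"mutex": mutex, "count": count}


def phi_v1(s: Valuation) -> bool:      # label of v1 after refinement
    return s["count"] == 0 and s["mutex"] == 0


def guard_lock(s: Valuation) -> bool:  # Guard of lock(mutex)
    return s["mutex"] == 0


print(entails(phi_v1, guard_lock, bounded_valuations()))          # True
print(entails(lambda s: True, guard_lock, bounded_valuations()))  # False
```

The second call shows why the unrefined label true would not suffice: it does not entail that lock(mutex) is enabled.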
Lemma 1 If an ART A is deadlock-free, RA is a
deadlock-free IVR.
Proof Let RA be the IVR of a deadlock-free ART A .
First, we construct a scheduler that enforces RA , which
proves that RA is realizable. Second, we show that all
schedulers that enforce RA are deadlock-free, which
concludes the proof that RA is deadlock-free.
For arbitrary execution prefixes of the form τ = s0, T1, s1, . . . , sn, let $\mathcal{T}'(\tau) = R_{\mathcal{A}}(\tau) \cap \{T \in \mathcal{T} : \mathit{enabled}_T(s_n) \neq \emptyset\}$. Let $\zeta : (S \times \mathcal{T})^* \times S \to \mathcal{T}$ be an arbitrary function such that $\zeta(\tau) \in \mathcal{T}'(\tau)$ for all τ with $\mathcal{T}'(\tau) \neq \emptyset$. (A description of how ζ can be constructed is given by the definition of RA.) By construction, ζ enforces RA if ζ is a scheduler. We show that ζ is a scheduler by contradiction. Assume that ζ is not a scheduler. Then there exists an execution prefix τ = s0, T1, s1, . . . , sn such that ζ(τ) = T, $\mathit{enabled}_T(s_n) = \emptyset$, and $\mathit{enabled}(s_n) \neq \emptyset$.
case τ does not correspond to a path in A: By the definition of RA, $R_{\mathcal{A}}(\tau) = \mathcal{T}$. By the assumption $\mathit{enabled}(s_n) \neq \emptyset$, $\mathcal{T}'(\tau)$ is not empty. By the construction of ζ, $T \in \mathcal{T}'(\tau)$. Contradiction to $\mathit{enabled}_T(s_n) = \emptyset$.
case τ corresponds to a path π = v0, T1, R1, v1, . . . , vn in A: By the construction of RA, T is expanded at vn.
case may-block($l_T(v_n)$): By the definition of may-block, T has exactly one transition R active at $l_T(v_n)$. As A is deadlock-free, φ(vn) ⇒ Guard(R). By the assumption that τ corresponds to the path π, $s_n \models \varphi(v_n)$. Hence, $s_n \models \mathit{Guard}(R)$ and $R \in \mathit{enabled}_T(s_n)$. Contradiction to $\mathit{enabled}_T(s_n) = \emptyset$.
case not may-block($l_T(v_n)$): By the definition of may-block, $\mathit{enabled}_T(s_n) \neq \emptyset$. Contradiction to $\mathit{enabled}_T(s_n) = \emptyset$.
It remains to show that all schedulers that enforce RA are deadlock-free. Let ζ be an arbitrary scheduler that enforces RA. Assume that ζ is not deadlock-free. Then there exists an execution τ = s0, T1, s1, . . . , sn that is possible under ζ such that sn is a deadlock, i.e., $\forall T \in \mathcal{T}.\ \mathit{enabled}_T(s_n) = \emptyset$ and $\exists T \in \mathcal{T}.\ \exists R_{l,l'}.\ l_T(s_n) = l$. As τ is an execution permitted by RA, τ corresponds to a path π = v0, T1, R1, v1, . . . , vn in A. Let T = ζ(τ). By the choice of ζ, T is expanded at vn. With the same argument as above, in case may-block($l_T(v_n)$), we have φ(vn) ⇒ Guard(R) for some transition $R_{l,l'}$ with $l_T(v_n) = l_T(s_n) = l$, and thus a contradiction to $\mathit{enabled}(s_n) = \emptyset$; in case not may-block($l_T(v_n)$), we have $\mathit{enabled}_T(s_n) \neq \emptyset$, a contradiction to $\mathit{enabled}_T(s_n) = \emptyset$.
Fairness. IVRs derived from deadlock-free ARTs do
not necessarily admit fairness if the underlying ART
contains cycles (across B and −→ edges) that represent
unfair executions. In order to make sure a deadlock-free
ART admits fairness we implement a scheduler that al-
lows A to schedule each thread infinitely often (when-
ever it is enabled infinitely often) by requiring that ev-
ery (B ∪ −→)-cycle is “fair”, defined as follows.
Definition 4 (ART admitting fairness) A deadlock-
free ART A = (V, ε,−→,B) admits fairness if every
(B ∪ −→)-cycle contains, for every thread T that is
enabled at a node of the cycle, a node v such that T is
expanded at v.
Before we prove the fairness properties of IVRs induced by ARTs, we state the following auxiliary proposition.
Proposition 1 (completely visited cycles) Let
G = (V,−→) be a directed, finite graph. For all infinite
paths $\pi \in V^{\omega}$ through G and for all nodes v ∈ V that
occur infinitely often in π, there exists a cycle π′ in G
such that π′ contains v and all nodes of π′ are visited
infinitely often by π.
Lemma 2 If an ART A admits fairness, RA is an
IVR that admits fairness.
Proof We need to show that, for an arbitrary ART A that admits fairness, there exists a fair scheduler ζ that enforces RA. After constructing ζ, we show that ζ is fair by
contradiction.
Let τ = s0, T1, s1, . . . , sn be an execution prefix and let π be a path such that τ corresponds to π = v0, T1, . . . , vn. By γ(T), we denote the number of occurrences of T in π. Let $\mathcal{T}'$ be the set of threads that are both enabled at sn and permitted by A, i.e., $\mathcal{T}' = R_{\mathcal{A}}(\tau) \cap \{T : \mathit{enabled}_T(s_n) \neq \emptyset\}$. We let ζ schedule an arbitrary thread $T \in \mathcal{T}'$ such that no other thread in $\mathcal{T}'$ occurs less often in π, i.e., $\zeta(\tau) = T \in \mathcal{T}'$ such that $\forall U \in \mathcal{T}'.\ \gamma(T) \leq \gamma(U)$. By Lemma 1 and as A admits fairness, ζ is indeed a scheduler ($\mathcal{T}'$ is only empty when enabled(sn) is empty).

Fig. 4: A (B ∪ −→)-cycle (B is shown by a dashed line)
It remains to show that ζ is fair, i.e., that every execution scheduled by ζ is fair. Let τ be an execution that is scheduled by ζ (τ is of the form τ = sinit, ζ(sinit), s1, . . .). If τ is finite, it is trivially fair. Otherwise, assume that τ is not fair. Then there exists a thread T that is infinitely often enabled in τ but does not occur in τ after some prefix of τ. Let π be a path in A such that τ corresponds to π. Let vT be a node at which T is enabled and that occurs infinitely often in π. As A is finite and by Proposition 1, there exists a cycle that contains vT such that π visits all nodes in this cycle infinitely often. As A admits fairness, there exists an edge $v \xrightarrow{T,a}_{\mathcal{A}} v'$ such that v is in this cycle and a ∈ enabled(s) for all states s that correspond to v. As T is not scheduled in τ after some finite number i of steps, there exist one or more other threads T′ ≠ T with $v \xrightarrow{T'}_{\mathcal{A}} w$ for some w ≠ v′ that are scheduled at v in all steps k > i. Let t be the set of those threads T′. By the construction of the scheduler, γ(T′) ≤ γ(T) for all T′ ∈ t. After only finitely many steps l, γ(T) < γ(T′) for all T′ ∈ t (e.g., take l to be the product of the maximum path length from v to v and the number $\sum_{T' \in t} \bigl(1 + \gamma(T) - \gamma(T')\bigr)$ of required visits of v). Hence, there exists a prefix of π of length l′ ≥ l in which $v \xrightarrow{T}_{\mathcal{A}} v'$ is the last step, i.e., T has been scheduled. Contradiction to the assumption that T is not scheduled after i steps in π.
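The scheduler ζ constructed in this proof always picks a permitted, enabled thread with the fewest occurrences so far. A minimal Python sketch of that choice follows; representing RA(τ) and the enabled threads as plain sets, passing the scheduled threads of the prefix as `history`, and breaking ties by thread name are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Optional


def least_scheduled(permitted: Iterable[str], enabled: Iterable[str],
                    history: List[str]) -> Optional[str]:
    """Chooses a permitted, enabled thread that occurs least often in `history`.

    `history` lists the threads scheduled so far, i.e., gamma(T) is the
    number of occurrences of T in it. Ties are broken by thread name so that
    the choice is deterministic. Returns None if no permitted thread is enabled.
    """
    candidates = set(permitted) & set(enabled)
    if not candidates:
        return None
    gamma = Counter(history)
    return min(candidates, key=lambda t: (gamma[t], t))


history: List[str] = []
for _ in range(4):
    choice = least_scheduled({"T1", "T2"}, {"T1", "T2"}, history)
    assert choice is not None
    history.append(choice)
print(history)  # ['T1', 'T2', 'T1', 'T2']
```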
Note that the expansion of a thread T at a node in a cycle does not guarantee that the corresponding transition is part of the cycle. A slight modification of the fairness condition for ARTs leads to a sufficient condition for ARTs to induce fair IVRs, as the following definition and lemma show. The difference in the fairness condition is that every thread enabled at a node of a (B ∪ −→)-cycle c must be expanded along an edge of c itself, which we denote by fair(c). The (B ∪ −→)-cycle shown in Fig. 4, for instance, is fair.
Definition 5 (fair ART) A deadlock-free ART A =
(V, ε,−→,B) is fair if fair(c) holds for every (B ∪ −→)-
cycle c.
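Definitions 4 and 5 differ only in where a thread enabled on a cycle must be expanded: at some node of the cycle, or along an edge of the cycle itself. The following brute-force sketch checks both conditions on a small (B ∪ −→)-graph given as labeled edges, with covering edges marked by None. The map of threads enabled at each node is a hypothetical input, and only simple cycles are enumerated, which suffices because every cycle is composed of simple cycles.

```python
from typing import Dict, List, Optional, Set, Tuple

# (source, target, thread); thread None marks a covering (B) edge.
Edge = Tuple[str, str, Optional[str]]


def simple_cycles(edges: List[Edge]) -> List[List[Edge]]:
    """All simple cycles as edge lists (brute force; fine for small ARTs)."""
    nodes = sorted({e[0] for e in edges} | {e[1] for e in edges})
    rank = {v: i for i, v in enumerate(nodes)}
    out: Dict[str, List[Edge]] = {}
    for e in edges:
        out.setdefault(e[0], []).append(e)
    cycles: List[List[Edge]] = []

    def dfs(start: str, v: str, path: List[Edge], seen: Set[str]) -> None:
        for e in out.get(v, []):
            w = e[1]
            if w == start:
                cycles.append(path + [e])
            elif rank[w] > rank[start] and w not in seen:
                # Each cycle is reported once, from its lowest-ranked node.
                seen.add(w)
                dfs(start, w, path + [e], seen)
                seen.remove(w)

    for s in nodes:
        dfs(s, s, [], {s})
    return cycles


def _needed(cycle: List[Edge], enabled: Dict[str, Set[str]]) -> Set[str]:
    # Threads enabled at some node of the cycle.
    return set().union(*(enabled.get(src, set()) for src, _, _ in cycle))


def admits_fairness(edges: List[Edge], enabled: Dict[str, Set[str]]) -> bool:
    """Definition 4: each such thread is expanded at some node of the cycle."""
    for c in simple_cycles(edges):
        on_cycle = {src for src, _, _ in c}
        expanded = {t for src, _, t in edges if src in on_cycle and t is not None}
        if not _needed(c, enabled) <= expanded:
            return False
    return True


def is_fair(edges: List[Edge], enabled: Dict[str, Set[str]]) -> bool:
    """Definition 5, fair(c): each such thread labels an edge of the cycle itself."""
    for c in simple_cycles(edges):
        labels = {t for _, _, t in c if t is not None}
        if not _needed(c, enabled) <= labels:
            return False
    return True


# eps -T1-> v1 B eps is a cycle; T2 is expanded at eps, but off the cycle.
EDGES = [("eps", "v1", "T1"), ("v1", "eps", None), ("eps", "v2", "T2")]
ENABLED = {"eps": {"T1", "T2"}, "v1": {"T1"}}
print(admits_fairness(EDGES, ENABLED), is_fair(EDGES, ENABLED))  # True False
```

The example is exactly the situation noted above: T2 is expanded at a node of the cycle, so the ART admits fairness, but no edge of the cycle is labeled with T2, so the cycle is not fair.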
Lemma 3 (fairness) For all fair ARTs A , RA is a
fair IVR.
Proof Let A be a fair ART. By Lemma 1 and as A is deadlock-free, RA is realizable; let ζ be an arbitrary scheduler that enforces RA. It remains to show that ζ is fair, which we prove by contradiction. Suppose that an unfair execution τ is possible under ζ. There exists a thread T that is enabled infinitely often in τ but does not occur in τ after a finite prefix. Let π be a path through A such that τ corresponds to π. As VA is finite, there exists a node v that occurs infinitely often in π and at which T is enabled. As A is finite and by Proposition 1, v is part of a cycle of which all nodes occur infinitely often in π. By fairness, one edge in this cycle is labeled with T. By the definition of ARTs ((VA, −→A) is a tree), this edge occurs infinitely often in π, i.e., T is scheduled infinitely often in τ. Contradiction.
Given an ART A that admits fairness, one can gen-
erate a fair ART A ′ such that RA permits all execu-
tions permitted by RA ′ .
3 Iterative model checking
A suitable algorithm for our framework must gener-
ate fair IVRs. We use model checking based on ARTs
(cf. Sec. 2.3), which allows us to check infinite executions and explicitly represent scheduling. Nevertheless, other program analysis techniques such as symbolic execution are also suitable for generating IVRs. Our algorithm (Alg. 1) is an iterative extension of the Impact algorithm [28] in its variant for concurrent programs [40]. We chose Impact as a base for our algorithm because it has an available implementation for multi-threaded programs, which we use to evaluate our approach in Sec. 5.
Impact generates an ART by path-wise unwinding
the transitions of a program. Once an error location is
reached at a node v, Impact checks whether the path
π from the ART’s root to v corresponds to a feasible
execution. If this is the case, a property violation is
reported; otherwise, the node labeling is strengthened
via interpolation. Thereby, a well-labeled ART is main-
tained. Once the ART is complete, its node labeling
provides a safety proof for the program.
Algorithm 1: Iterative Impact for concurrent programs: main procedure (based on [40])
input: Program with threads T
intermediate outputs: fair ARTs A1 ⊆ A2 ⊆ . . . ⊆ An and unsafe ARTs
output: safe, partially safe, or unsafe
Data: A = (V, ε, −→, B) := ({ε}, ε, ∅, ∅), W := {ε}, I := {}
1 Function Main()
2 while true do
3 status := Iteration()
4 if status = no progress then
5 break
6 else if status = counterexample then
7 yield A as an unsafe IVR
8 else
9 A′ := Remove_Error_Paths(A)
10 yield A′ as a safe IVR
11 if A is safe then
12 return safe
…
18 W := New_Schedule_Start()
19 if W = ∅ then
20 return no progress
21 while W ≠ ∅ do
22 select and remove v from W
23 Close(v)
24 if v not covered then
25 status := Refine(v)
26 if status = counterexample then
27 return counterexample
28 status := Check_Enabledness(v)
29 if status = no progress then
…
34 π := v_0 −(T_1,R_1)→ v_1 . . . −(T_n,R_n)→ v_n path from ε to v
35 if not may-block(l_{T_n}(v_{n−1})) then
36 return progress
37 if R_1 ∧ . . . ∧ R_{n−1} ∧ ¬Guard(R_n) is unsat then
…
42 for all uncovered nodes w that have been created before v do
43 if l(w) = l(v) ∧ (φ(v) ⇒ φ(w)) ∧ ∀c ∈ CA(v, w). fair(c) then
44 B := B ∪ {(v, w)}
45 B := B \ {(x, y) : v ⊑ y}
46 for T with v −T→ v′ and not w −T→ w′ do
47 add (v, T) to I
48 Function Backtrack(v)
49 π := v_0 −(T_1,R_1)→ v_1 . . . −(T_n,R_n)→ v_n path from ε to v
50 i := n − 1
51 while i ≥ 0 do
52 if ∃T, v′_i. v_i −T→ v′_i ∉ A ∧ (Skip(v_i, T) = false) then
53 add v_i −T→ v′_i to A
54 W := W ∪ {v′_i}
55 prune −(T_{i+2},R_{i+2})→ v_{i+3} . . . −(T_n,R_n)→ v_n from A
56 φ(v_{i+1}) := false
57 return progress
58 i := i − 1
59 return no progress
60 Function Expand(v)
61 T := Schedule_Thread(v)
62 Expand_Thread(T, v)
To build an ART as in the producer-consumer example of Fig. 3, Impact starts by constructing the root node ε with φ(ε) = true and l(ε) = (8, 12), where we indicate locations by line numbers in Fig. 1. Initially, mutex = 0, count = 0, and the buffer size is bounded by an arbitrary constant N > 0. Thread T1 is expanded by adding a node v1 with φ(v1) = true and l(v1) = (14, 12). From v1, thread T1 is expanded repeatedly until node v6 with φ(v6) = true and l(v6) = (8, 12) is produced. At this point, all statements of the produce() procedure have been expanded once. As v6 has the same global location as ε and φ(v6) ⇒ φ(ε), a covering v6 B ε can be inserted. However, when the else branch of thread T1 at node v2 is expanded, a node verror labeled with the error location is added. In order to check the feasibility of the error path ε −→ v1 −→ v2 −→ verror, Impact tries
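to find a sequence interpolant for the sequence U of formulas along this path. As we assume that the buffer is never of size 0, i.e., N > 0, $\bigwedge U$ is unsatisfiable and a possible sequence interpolant is:

$$I_0 \equiv \mathit{true}, \qquad I_1 \equiv \mathit{count} = 0 \wedge \mathit{mutex} = 0, \qquad I_2 \equiv \mathit{count} = 0 \wedge \mathit{mutex}' = 1, \qquad I_3 \equiv \mathit{false}$$

with:

$$I_0 \wedge \mathit{count} = 0 \wedge \mathit{mutex} = 0 \Rightarrow I_1, \qquad I_1 \wedge \mathit{mutex}' = 1 \Rightarrow I_2, \qquad I_2 \wedge \mathit{count} \geq N \Rightarrow I_3.$$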
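The three implications can be sanity-checked by enumerating a bounded domain. The sketch takes U1, U2, and U3 to be the antecedents appearing in the implications above, checks only these implications (not the vocabulary condition of interpolants), and uses bounded enumeration as an assumed stand-in for a real decision procedure. It also shows that the argument relies on N > 0.

```python
from itertools import product

# One valuation is (count, mutex, mutex_p, n), where mutex_p stands for mutex'
# and n for the buffer size N. U1..U3 are the antecedents of the implications.
U = [
    lambda c, m, mp, n: c == 0 and m == 0,   # U1
    lambda c, m, mp, n: mp == 1,             # U2
    lambda c, m, mp, n: c >= n,              # U3
]
I = [
    lambda c, m, mp, n: True,                # I0
    lambda c, m, mp, n: c == 0 and m == 0,   # I1
    lambda c, m, mp, n: c == 0 and mp == 1,  # I2
    lambda c, m, mp, n: False,               # I3
]


def is_sequence_interpolant(min_n: int = 1, max_val: int = 3) -> bool:
    """Checks I_{i-1} and U_i => I_i for i = 1..3 on a bounded domain."""
    domain = product(range(max_val + 1), (0, 1), (0, 1), range(min_n, max_val + 1))
    for c, m, mp, n in domain:
        for i in range(1, len(I)):
            if I[i - 1](c, m, mp, n) and U[i - 1](c, m, mp, n) and not I[i](c, m, mp, n):
                return False
    return True


print(is_sequence_interpolant())         # True: holds under N > 0
print(is_sequence_interpolant(min_n=0))  # False: fails once N = 0 is allowed
```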
Hence, verror can be labeled with false, so that the ART
remains safe, and the preceding labels can be updated
to φ(ε) = φ(v1) = count = 0 ∧ mutex = 0 and φ(v2) =
count = 0∧mutex = 1. Due to the relabeling, the covering
v6 B ε has to be removed and v6 has to be expanded.
When T2 has been expanded six times beginning at
v6, a node v12 is added with l(v12) = (8, 12). Impact
applies a heuristic that attempts to introduce cover-
ings eagerly, which results in a label φ(v12) = mutex =
0∧ count = 0 and a covering v12 B ε can be added. With
this covering, the current ART is fair and can be used as
an IVR. In contrast, Impact for concurrent programs
would then continue to explore additional interleavings
by expanding, e.g., T2 at ε. A complete ART is found
when both error paths and all interleavings of produce()
and consume() that respect the available buffer size N are
explored. Impact for concurrent programs does not ter-
minate until such a complete ART is found and would
not terminate at all if the buffer size is unbounded. Our
algorithm, however, is able to yield a fair IVR each time a new interleaving has been explored.
In each iteration, our extended algorithm yields an
IVR which is either unsafe (a counterexample) or fair
(can be used as scheduling constraints). If the algo-
rithm terminates, it outputs “safe”, “partially safe”,
or “unsafe”, depending on whether the program is safe
under all, some, or no schedulers. Procedure Main()
repeatedly calls Iteration() (line 3), which, intuitively,
corresponds to an execution of the original algorithm
of [40] under a deterministic scheduler. Iteration()
(potentially) extends the ART A . If no progress is
made (A is unchanged), the algorithm terminates
(lines 12, 14, and 16). Otherwise, an intermediate output is yielded: either A as an unsafe IVR (line 7) or A with all previously found counterexamples removed, i.e., the largest fair ART that is a subgraph of A, computed by Remove_Error_Paths() (lines 9 and 10).
Iteration() maintains a work list W of nodes v to be explored via Close(v), which tries to find (as in [40]) a node w that covers v. In addition to the covering check of [40], we check fairness, where $C_A(v, w)$ denotes all cycles that would be closed by adding the covering edge v ⊳ w (line 43). If such a node w is found, any thread T that is expanded at v but not at w (line 46) must not be skipped at w by POR. Instead of expanding T immediately at w (as in [40]), which would explore another schedule, T is added to the set I so that it can be explored in a subsequent iteration. If no covering node for v is found, v is refined, which returns a counterexample if v has a feasible error path (line 25). Otherwise (line 28), Check Enabledness() performs a deadlock check by testing whether the last transition that leads to v is enabled in all states represented by the predecessor node. If not, deadlock-freedom is not guaranteed, and Backtrack() tries to find a substitute node where exploration can continue.
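The fairness part of the covering check in Close(v) can be illustrated on a toy tree. The sketch assumes, as a simplification of Definition 5 (which is not restated here), that a cycle is fair if every thread takes at least one transition on it, and it only considers the cycle w → … → v ⊳ w formed with tree edges when w is an ancestor of v; cycles through existing covering edges are ignored. The names tree_path_threads, covering_is_fair, and deferred_threads are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Toy ART encoding (hypothetical): parent[v] = (u, t) means a tree edge u --t--> v
# taken by thread t. Covering edges are not stored.
Parent = Dict[str, Tuple[str, int]]


def tree_path_threads(parent: Parent, w: str, v: str) -> Optional[List[int]]:
    """Threads along the tree path w -> ... -> v, or None if w is not an ancestor of v."""
    threads: List[int] = []
    node = v
    while node != w:
        if node not in parent:
            return None
        node, t = parent[node]
        threads.append(t)
    return threads[::-1]


def covering_is_fair(parent: Parent, v: str, w: str, all_threads: Iterable[int]) -> bool:
    """Assumed fairness criterion: every thread moves at least once on the closed cycle."""
    threads = tree_path_threads(parent, w, v)
    if threads is None:
        return True  # simplification: ignore cycles through existing covering edges
    return set(all_threads) <= set(threads)


def deferred_threads(expanded: Dict[str, Set[int]], v: str, w: str) -> Set[int]:
    """Threads expanded at v but not at w; they go to the set I for a later iteration."""
    return expanded[v] - expanded[w]
```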
The deterministic scheduler of Iteration() is controlled by New Schedule Start() and Schedule Thread(). The former selects a set of initial nodes for the exploration (line 18); the latter decides which thread to expand at a given node (line 61). For New Schedule Start(), we use a simple heuristic that selects the first node (in breadth-first order) that is not yet fully expanded; for Schedule Thread(), we use a round-robin scheduler that switches to the next thread once a back jump occurs (e.g., when the end of a loop body is reached). Additionally, Schedule Thread() returns only threads that are necessary to expand at the given node after POR (cf. Skip() [40]). More elaborate heuristics are conceivable but out of the scope of this paper.
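The two scheduling heuristics can be sketched as follows. new_schedule_start() returns the first node in breadth-first order that is not yet fully expanded, and RoundRobin.schedule_thread() keeps the current thread until it performs a back jump, skipping threads that POR does not require at the node. The function names, the children dictionary, and the necessary set are illustrative assumptions rather than Impara's interface.

```python
from collections import deque
from typing import Callable, Dict, Hashable, List, Optional, Set


def new_schedule_start(root: Hashable, children: Dict[Hashable, List[Hashable]],
                       fully_expanded: Callable[[Hashable], bool]) -> Optional[Hashable]:
    """First node, in breadth-first order, that is not yet fully expanded."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if not fully_expanded(node):
            return node
        queue.extend(children.get(node, []))
    return None


class RoundRobin:
    """Round-robin Schedule Thread(): stay on a thread until it performs a back jump."""

    def __init__(self, threads: List[int]) -> None:
        self.threads = threads
        self.current = 0  # index into self.threads

    def schedule_thread(self, necessary: Set[int], back_jump: bool) -> Optional[int]:
        # necessary: threads that POR still requires to be expanded at this node
        if back_jump:
            self.current = (self.current + 1) % len(self.threads)
        for k in range(len(self.threads)):
            idx = (self.current + k) % len(self.threads)
            if self.threads[idx] in necessary:
                self.current = idx
                return self.threads[idx]
        return None
```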
The correctness of Alg. 1 w.r.t. safety follows from the correctness of [28] and [40]. Moreover, Alg. 1 guarantees fairness:
Lemma 4 (fairness of Alg. 1) Any safe ART A generated by Alg. 1 is fair.
Proof By contradiction. Assume that Alg. 1 returns a safe ART $A = (V_A, \varepsilon, \to_A, \rhd)$ that is not fair. By Definition 5, A contains a $(\rhd \cup \to_A)$-cycle c that does not satisfy fair(c). As $(V_A, \to_A)$ is a tree, the cycle contains a ⊳ edge. However, in line 43, Alg. 1 checks whether a candidate covering would close an unfair cycle, and a ⊳ edge is only added if all cycles it closes are fair. Contradiction.
4 Partial-order reduction
A naive enforcement of the context switches at the relevant nodes of a safe IVR $R_A$ would result in a strictly sequential execution of the transitions, foiling any benefits of concurrency. To enable parallel executions, we introduce program schedules that relax the scheduling constraints by means of partial-order reduction (POR). Note that this application of POR concerns the enforcement of scheduling constraints and occurs in addition to the POR applied by our model checking algorithm when constructing an ART (cf. Sec. 3). Nevertheless, the dependency information used for POR during model checking can be reused, so that redundant computations are avoided.

Variables:
  int x, y, z
Thread T1:
  while true:
    x := 1
    if z = 0:
      y := 1
Thread T2:
  while true:
    y := 0
[Figure: ART of the program; edge labels include T1: if z=0, T1: else, and T1: y:=1]
Fig. 5 (a) A program with a fair ART
T1: e1 (x := 1), e2 (read z)
T2: e3 (y := 0), e4 (x := 0)
Fig. 5 (b) The section schedule for π1
[Figure: program schedule with edges labeled σ2, z = 0 and σ3, z ≠ 0]
Fig. 5 (c) A corresponding program schedule
The goal is to permit the parallel execution of inde-
pendent transitions (in different threads) whose order
does not affect the outcome of the execution represented
by A (i.e., the resulting traces are Mazurkiewicz-
equivalent). Using traditional POR to construct such
scheduling constraints poses two challenges: 1. Execu-
tions may be infinite, but we need a finite representation
of scheduling constraints. 2. The control flow of an ex-
ecution may be unpredictable, i.e., it is a priori unclear
which scheduling constraints will apply. We solve issue 1 by partitioning ARTs into sections and associating a finite schedule with every section. To address issue 2, we require that sections do not contain branchings (neither control-flow branchings nor non-deterministic transitions).
Consider the program and the corresponding ART in Fig. 5a. The if-statement of T1 is modeled as a separate read transition followed by a branching at node v3. We define three section paths:
$\pi_1 := \varepsilon \to v_1 \to v_2 \to v_3 \to v_4$
$\pi_2 := v_4 \to v_5 \to v_7 \to \varepsilon$
$\pi_3 := v_4 \to v_6 \to \varepsilon$
After π1 has been executed, a scheduler can distinguish the cases z = 0 and z ≠ 0 and schedule π2 or π3 accordingly.
Formally, a section path $v_1 \xrightarrow{R_1} \cdots \xrightarrow{R_n} v_{n+1}$ corresponds to a branching-free path in an ART whose first transition may be guarded. A section path follows $\to_A$ edges, skipping covering edges ⊳. The section schedule of a section path describes the Mazurkiewicz equivalence class of the contained transitions and is defined as the smallest partial order $\sigma = (V_\sigma, \to_\sigma)$ such that
$V_\sigma = \{e_1, \ldots, e_n\}$ and $\to_\sigma \supseteq \{(e_i, e_j) : i < j \wedge R_i \nparallel R_j\}$,
where $e_i$, $1 \le i \le n$, is the occurrence of transition $R_i$ at position i, and $R_i \nparallel R_j$ denotes that $R_i$ and $R_j$ are dependent.
The section schedule σ(π1) of π1 is depicted in Fig. 5b. It consists of four events: e1 (T1: x:=1), e2 (T1: read z), e3 (T2: y:=0), and e4 (T2: x:=0). An arrow e → e′ indicates that σ(π1) requires e to occur before e′. Events of the same thread are ordered according to the program order of the respective thread. Events e1 and e4 are from different threads and write to the same variable, hence they are dependent and the section schedule needs to specify an ordering: e1 must occur before e4. Accordingly, the complete section schedule is $(\{e_1, e_2, e_3, e_4\}, \{(e_1, e_2), (e_3, e_4), (e_1, e_4)\})$.
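The definition of σ can be computed directly from a section path. The sketch below, a hypothetical helper rather than the authors' implementation, treats two events as dependent if they belong to the same thread or access a common variable that at least one of them writes; σ is then the reflexive-transitive closure of the returned pairs. It assumes the events of π1 occur along the path in the order e1, e2, e3, e4.

```python
from itertools import combinations
from typing import FrozenSet, List, NamedTuple, Set, Tuple


class Event(NamedTuple):
    name: str
    thread: int
    reads: FrozenSet[str]
    writes: FrozenSet[str]


def dependent(a: Event, b: Event) -> bool:
    """Same thread (program order), or a common variable written by at least one of them."""
    if a.thread == b.thread:
        return True
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & a.reads)


def section_schedule(path: List[Event]) -> Set[Tuple[str, str]]:
    """Pairs (e_i, e_j) with i < j whose transitions are dependent; they generate sigma."""
    return {(a.name, b.name) for a, b in combinations(path, 2) if dependent(a, b)}


f = frozenset
pi1 = [Event("e1", 1, f(), f({"x"})), Event("e2", 1, f({"z"}), f()),
       Event("e3", 2, f(), f({"y"})), Event("e4", 2, f(), f({"x"}))]
```

For these four events, the only cross-thread dependency is between the two writes to x, and the returned pairs are already transitively closed.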
By the following lemma, an execution that starts in a state corresponding to the first node of a section and is scheduled according to the respective section schedule always leads to a state corresponding to the last node of the section. For instance, the following execution fragments both lead from the initial state to a state represented by v4 ($s_4, s'_4 \models \varphi(v_4)$), as e1 and e3 are independent and can be swapped:
$s_{init}, T_1, s_1, T_2, s_2, T_1, s_3, T_2, s_4$ (events e1, e3, e2, e4)
$s_{init}, T_2, s'_1, T_1, s'_2, T_1, s'_3, T_2, s'_4$ (events e3, e1, e2, e4)
Lemma 5 (correctness of section schedules) Let τ be a linear extension of a section schedule σ(π) of a section path π in a deadlock-free ART A. Then τ is equivalent to a linear extension of σ(π) that corresponds to π.
Proof Let π be a section path, σ(π) its section schedule,
and τ a linear extension of σ(π). As σ(π) is a partial or-
der, all linear extensions of σ(π) are equivalent [17], in
particular the linear extension of σ(π) that corresponds
to π.
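Lemma 5 can be checked by brute force on the running example: enumerate all linear extensions of σ(π1) and confirm that each is Mazurkiewicz-equivalent to the path order. The equivalence test below relies on events being distinct occurrences, in which case two sequences are equivalent iff they contain the same events and order every dependent pair identically. Both helpers are illustrative and only practical for small sections.

```python
from itertools import permutations
from typing import Iterable, List, Sequence, Set, Tuple

Pair = Tuple[str, str]


def linear_extensions(events: Sequence[str], before: Set[Pair]) -> List[List[str]]:
    """Brute force: all orders of events that place a ahead of b for every (a, b)."""
    result = []
    for order in permutations(events):
        pos = {e: i for i, e in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in before):
            result.append(list(order))
    return result


def equivalent(t1: Sequence[str], t2: Sequence[str], dependent_pairs: Iterable[Pair]) -> bool:
    """Mazurkiewicz equivalence for sequences of distinct events:
    same events, and every dependent pair occurs in the same relative order."""
    if sorted(t1) != sorted(t2):
        return False
    p1 = {e: i for i, e in enumerate(t1)}
    p2 = {e: i for i, e in enumerate(t2)}
    return all((p1[a] < p1[b]) == (p2[a] < p2[b]) for a, b in dependent_pairs)
```

For σ(π1) this yields five linear extensions, among them both orders e1, e3, e2, e4 and e3, e1, e2, e4.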
A program schedule Σ comprises several section schedules. Σ is a labeled graph $(V_\Sigma, \to_\Sigma)$. Each node $v \in V_\Sigma$ is the start of a section path π in A. Each edge is labeled with the section schedule of π and the guard Guard(R) of the first transition R in π. As A is deadlock-free, there exists a thread T which is fully expanded at v in A, and we require that Σ likewise has outgoing edges at v labeled with T for each transition of T at v. Fig. 5c shows a program schedule for our example program.
A scheduler can enforce the scheduling constraints
of a program schedule by picking a section schedule
that matches the current execution prefix and schedul-
ing an event whose predecessors (according to the sec-
tion schedule) have already been executed. Hence, all
independent events in a section can be executed con-
currently without synchronization. All events of a sec-
tion schedule have to appear before the first event of
the next section schedule, so that the states reached
between sections correspond to nodes of the program
schedule. For example, the event T1 : y := 1 from sec-
tion π2 must not occur in between events T1 : read z
and T2 : y := 0 from section π1.
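The enforcement rule described above (run an event only after its predecessors in the current section schedule, and switch sections only after all events of the current section have occurred) can be sketched as a sequential checker. SectionRunner, the predecessor-set encoding, and the guard lambdas on a state dictionary are hypothetical; the section names mirror π1 to π3 of Fig. 5, with π2 and π3 left empty for brevity.

```python
from typing import Callable, Dict, List, Set, Tuple

Preds = Dict[str, Set[str]]          # event -> events that must occur before it
Guard = Callable[[dict], bool]       # evaluated on the program state
Section = Tuple[Preds, List[Tuple[Guard, str]]]  # (section schedule, guarded successors)


class SectionRunner:
    """Enforce one section schedule at a time (sequential, hypothetical sketch)."""

    def __init__(self, sections: Dict[str, Section], start: str) -> None:
        self.sections = sections
        self.current = start
        self.done: Set[str] = set()

    def enabled(self) -> List[str]:
        """Events of the current section whose predecessors have all been executed."""
        preds, _ = self.sections[self.current]
        return sorted(e for e, p in preds.items() if e not in self.done and p <= self.done)

    def execute(self, event: str) -> None:
        if event not in self.enabled():
            raise RuntimeError(f"{event} violates the section schedule")
        self.done.add(event)

    def advance(self, state: dict) -> str:
        """Switch to a successor only after every event of the current section ran."""
        preds, successors = self.sections[self.current]
        if set(preds) - self.done:
            raise RuntimeError("current section not finished")
        for guard, name in successors:
            if guard(state):
                self.current, self.done = name, set()
                return name
        raise RuntimeError("no successor section matches the state")


sections: Dict[str, Section] = {
    "pi1": ({"e1": set(), "e2": {"e1"}, "e3": set(), "e4": {"e1", "e3"}},
            [(lambda s: s["z"] == 0, "pi2"), (lambda s: s["z"] != 0, "pi3")]),
    "pi2": ({}, []),
    "pi3": ({}, []),
}
```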
A program schedule of an ART A that admits fair-
ness permits exactly those executions that correspond
to a path in A (modulo Mazurkiewicz equivalence).
In particular, as Mazurkiewicz equivalence preserves
safety properties [17], only safe executions are permit-
ted.
Lemma 6 (correctness of program schedules) Let
A be an ART that admits fairness and Σ a program
schedule for A . All program executions that adhere to
the scheduling constraints of Σ are equivalent to an ex-
ecution that corresponds to a path in A .
Proof Let A be an ART that admits fairness, Σ a program schedule for A, and τ an execution that adheres to the scheduling constraints of Σ. We show by induction on the length of τ′ that all finite prefixes τ′ of τ are equivalent to an execution prefix that corresponds to a path from ε in A.
Base case: τ′ is empty. Then τ′ corresponds to the empty path in A.
Inductive case: Let $\pi_{\tau'} = v_0 \xrightarrow{\sigma_0(\pi_0)}_{\Sigma} \cdots\, v_n \xrightarrow{\sigma_n(\pi_n)}_{\Sigma} v_{n+1}$ be the path in Σ that τ′ corresponds to. Let $\tau' = x_1 x_2$ be partitioned so that $x_1$ corresponds to the prefix $v_0 \ldots v_n$ of that path. Such a partition exists, as an event must occur after all events from the previous section schedule and before all events from the following section schedule.
By the induction hypothesis, there exists an execution $x_1^{\approx}$ that is equivalent to $x_1$ and corresponds to the path $\pi_0 \ldots \pi_{n-1}$ in A. By Lemma 5, there exists a linear extension $x_2^{\approx}$ of $\sigma_n(\pi_n)$ that is equivalent to $x_2$ and corresponds to $\pi_n$ in A. Thus, $x_1^{\approx} x_2^{\approx}$ is































execute_critical_section()
unlock(mutex2)
unlock(mutex1)













Fig. 8: Section schedule for the program of Fig. 7
5 Evaluation
In five case studies, we evaluate our iterative model
checking algorithm and scheduling based on IVRs.
We use the Impara model checker [40], as it is the only available implementation we have found of model checking for non-terminating, multi-threaded programs based on a forward analysis on ARTs. Impara uses lazy abstraction with interpolants based on weakest preconditions. We extend the tool by implementing our algorithm presented in Sec. 3. Impara accepts C programs as input; however, some language features are not supported, and we have rewritten programs accordingly.1 We refer to the (non-iterative) Impara tool as Impara-C (for complete verification) and to our extension of Impara with iterative model checking as Impara-IMC.
1 E.g., Pthread mutexes, some uses of the address-of operator, and reuse of the same function by several threads are not supported. We solve these issues by rewriting our benchmark

Threads
  T1: while true: produce()
  T2: while true: produce()
  T3: while true: consume()
  T4: while true: consume()
produce:
  if buffer_is_not_full():
    lock()
…
  if buffer_is_not_empty():
    lock()
    assert buffer_is_not_empty()
    remove_item()
    unlock()
Fig. 9 (b) First IVR (simplified)
5.1 Implementation
To evaluate the enforcement of program schedules
for infinite executions, we implement a custom (user
space) scheduler.
In a first step, we automatically translate ARTs constructed by Impara-IMC to program schedules encoded as vector clocks. To omit sections of the generated program schedule that would never be executed, and thereby reduce the size of the program schedule, we discard all paths in the ART that lead only to nodes labeled with false. As we use only deadlock-free ARTs, an alternative feasible path always exists. A given ART is traversed from the root. Recursively, we build section paths by traversing the graph until a branching node is reached. At the branching node, a fully expanded thread T is chosen. The next sections are started at all child nodes of the branching node that are reached by a transition of T. For each section, the section schedule is generated based on the dependency information of memory accesses. Section schedules are represented by vector clocks. Additionally, each section schedule contains a link to all possible successor sections, i.e., those sections that start at a direct successor node of the current section. If there exist nodes v, w such that all possible (interleaved) paths between v and w are section paths and equivalent to each other, a single section path between v and w with relaxed scheduling constraints is sufficient. In this case, no dependencies between memory events need to be enforced. However, since we use only the first IVR in our experiments (produced in a single iteration of Alg. 1), we do not evaluate this case.
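The recursive construction of section paths can be sketched on a toy ART. The encoding (a dictionary from a node to its outgoing tree edges, with covering edges omitted) and the chosen map, which names the fully expanded thread selected at each branching node, are assumptions for illustration; a node with exactly one outgoing edge is treated as non-branching. The example encodes the section paths π1 to π3 of Fig. 5a.

```python
from typing import Dict, List, Tuple

# Toy ART encoding (hypothetical): node -> outgoing tree edges (thread, label, child).
# Covering edges are omitted, so covered leaves simply have no outgoing edges.
Art = Dict[str, List[Tuple[int, str, str]]]


def split_sections(art: Art, root: str, chosen: Dict[str, int]) -> List[Tuple[str, List[str]]]:
    """Cut the ART into branching-free section paths.

    At a branching node, only the edges of the fully expanded thread chosen[node]
    start new sections; each section runs until the next branching node or a leaf.
    """
    sections: List[Tuple[str, List[str]]] = []
    todo, seen = [root], set()
    while todo:
        start = todo.pop()
        if start in seen:
            continue
        seen.add(start)
        edges = art.get(start, [])
        if len(edges) > 1:
            edges = [e for e in edges if e[0] == chosen[start]]
        for _, label, node in edges:
            labels = [label]
            while len(art.get(node, [])) == 1:  # follow the unique successor
                _, label, node = art[node][0]
                labels.append(label)
            sections.append((start, labels))
            if art.get(node):  # stopped at a branching node: continue there
                todo.append(node)
    return sections


fig5_art: Art = {
    "eps": [(1, "x:=1", "v1")],
    "v1": [(1, "read z", "v2")],
    "v2": [(2, "y:=0", "v3")],
    "v3": [(2, "x:=0", "v4")],
    "v4": [(1, "assume z=0", "v5"), (1, "assume z!=0", "v6")],
    "v5": [(1, "y:=1", "v7")],
}
```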
Firstly, all section schedules for the given ART are
generated by enumerating them, including link infor-
mation about successor sections, and marking the ini-
tial section.
programs so that Impara handles them correctly and their
semantics is not changed. We will publish our modifications
to Impara, including two bug fixes.
Secondly, we instrument the source code of bench-
mark programs manually with callbacks to our user
space scheduler and code for time measurement. The
user space scheduler is implemented in C++11 and
uses the C++ standard library for atomic memory
operations. Program schedules are included as header
files. Every access to a non-thread-local, global variable
(shared variable) is replaced by a C++ preprocessor
macro that calls the user space scheduler, executes the
original statement, and calls the user space scheduler
to notify that the statement has been executed. In
our selection of benchmark programs, we had to in-
strument assignments and if-then-else statements. In
the case of control flow branchings that depend on a
shared variable, i.e., an if-then-else statement where
the branching expression depends on a shared vari-
able, additional callbacks are necessary to notify the
scheduler of the taken control flow path.
To ensure that memory accesses enclosed by call-
backs are indeed executed after the preceding callback
and before the succeeding callback, memory fences
are used.
The result of steps one and two is a multi-threaded program that executes concurrent memory accesses according to a given program schedule. Threads are executed concurrently and only forced to execute sequentially where required by the program schedule. Each time a thread T enters the callback preceding a memory access, T looks up the current section schedule and the program counters of the other threads. If the vector clock of the section schedule, at the position of the current event of T, shows an event of another thread that has to occur first, T waits until this event has been executed. If the section schedule requires no further events to occur before the current event of T, T executes the current memory access and, in the succeeding callback, updates its program counter so that the other threads are notified that T has executed another event. Once all events of the current section have been executed, T chooses the successor section associated with its current event. Waiting for all threads to completely execute the current section before switching to a successor section ensures that the program, at the end of each section, reaches a state that is represented by a node in the program schedule (and thereby in the ART generated by the model checker). If T has no successor section associated with its current event, T waits for another thread to choose the next section. If the last node of the current section is a branching node, only the thread with a control flow branching chooses the next section, based on the taken control flow branch.
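The callback protocol described above can be sketched with a condition variable in place of the authors' C++11 atomics. In this hypothetical encoding, required[t][k] maps each other thread u to the number of events of u in the current section that must have completed before event k of thread t (its vector-clock entry); the demo enforces the single cross-thread constraint of Fig. 5b, namely that T1's x := 1 precedes T2's x := 0.

```python
import threading
from typing import Callable, Dict, List, Tuple


class VectorClockScheduler:
    """Sketch: event k of thread t may run once every other thread u has executed
    at least required[t][k][u] events of the current section."""

    def __init__(self, required: Dict[int, List[Dict[int, int]]]) -> None:
        self.required = required
        self.pc = {t: 0 for t in required}  # events executed so far, per thread
        self.cond = threading.Condition()
        self.log: List[Tuple[int, str]] = []

    def before(self, t: int) -> None:
        """Callback preceding a shared access: block until its vector clock is satisfied."""
        with self.cond:
            need = self.required[t][self.pc[t]]
            self.cond.wait_for(lambda: all(self.pc[u] >= n for u, n in need.items()))

    def after(self, t: int, label: str) -> None:
        """Callback succeeding the access: advance t's program counter, wake waiters."""
        with self.cond:
            self.log.append((t, label))
            self.pc[t] += 1
            self.cond.notify_all()


def worker(sched: VectorClockScheduler, t: int,
           accesses: List[Tuple[str, Callable[[], object]]]) -> None:
    for label, access in accesses:
        sched.before(t)
        access()  # the original shared-memory statement
        sched.after(t, label)


def run_fig5b() -> Tuple[dict, List[Tuple[int, str]]]:
    state = {"x": 0, "y": 0, "z": 0}
    # Only cross-thread constraint of sigma(pi_1): e1 (T1: x:=1) before e4 (T2: x:=0).
    sched = VectorClockScheduler({1: [{}, {}], 2: [{}, {1: 1}]})
    t1 = threading.Thread(target=worker, args=(sched, 1, [
        ("x:=1", lambda: state.update(x=1)), ("read z", lambda: state["z"])]))
    t2 = threading.Thread(target=worker, args=(sched, 2, [
        ("y:=0", lambda: state.update(y=0)), ("x:=0", lambda: state.update(x=0))]))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return state, sched.log
```

Whatever interleaving the threads take, T2's write to x waits for T1's, so x ends up as 0. Section switching is omitted to keep the sketch short.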
Thirdly, we instrument the benchmark programs with code for time measurement. Each thread executes an infinite loop. Each time a thread has accomplished useful work in the current loop iteration, e.g., producing or consuming an item, writing a block or inode, or executing the critical section, it increments its performance counter. The main thread sleeps for 2 seconds (the timeout duration), then prints the sum of the performance counters of all threads and terminates the program. Each benchmark program is run five times in this way, and we report the median of the performance counter sums. All experiments have been executed on a 4-core Intel Core i5-6500 CPU at 3.2 GHz.
While we manually instrumented the benchmark source code, an automated instrumentation is conceivable. The main tasks of such an automated instrumentation are to identify shared variables and all points in the program where dependent expressions are accessed. Relevant shared variables can either be overapproximated, so that all shared or global variables are included, or be found by a static dependency analysis. Even if the variables to be instrumented are overapproximated, we expect the additional execution time overhead to be small, as our experiments indicate that a callback to our scheduler is fast if the current thread does not have to wait for other threads before executing the next variable access. Expressions that depend on a shared variable can likewise be found by a static dependency analysis. The automated instrumentation may of course be implemented at the level of the intermediate representation of a compiler instead of at the source code level.
5.2 Infeasible complete verification
Even for a moderate number of threads, complete veri-
fication, i.e., verification of a program under all possible
schedules and inputs, may be infeasible. In particular,
Impara-C times out (after 72 h) on a corrected variant
of the producer-consumer problem (Fig. 1) with four
producers and four consumers. Impara-IMC produces
the first IVR R1 after 4:29:53 hours. A simplification
of R1 is depicted in Fig. 6; it covers all executions in
which the threads appear to execute their loop bodies
atomically in the order T1, T2, . . . , T8. While the main bottleneck for Impara-C is state explosion and finding many coverings for different schedules, we observe that the main difficulty in producing R1 is to find a single covering that comprises all threads, i.e., to find a fair cycle. The essential predicates that lead to a fair cycle are:
count > 0, count + 1 > 0, count + 2 > 0, count + 3 > 0,
count ≠ 1000, count ≠ 999, count ≠ 998, count ≠ 997
The subsequent IVRs R2, . . . ,R8 are found much
faster than the first IVR, after 19:31, 12:3, 6:13, 28:0,
9:25, 8:27, and 8:40 minutes. We stop the model checker
after eight IVRs. According to our implementation of
New Schedule Start() in Alg. 1, IVR Ri permits, in
addition to all executions permitted by Ri−1, those
executions in which the threads appear in the or-
der Ti, T1, . . . , Ti−1, Ti+1, . . . , T8. Hence, R8 gives the
scheduler more freedom than R1, which may result in a
better execution performance, e.g., because a producer
which has its item available earlier does not have to
wait for all previous producers.
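The sets of loop-body orders admitted by successive IVRs, as described above, can be written down explicitly. This small helper only illustrates that description; it enumerates orders of atomic loop bodies, not the IVRs themselves.

```python
from typing import Set, Tuple


def permitted_orders(i: int, n: int) -> Set[Tuple[int, ...]]:
    """Orders of atomic loop bodies admitted by IVR R_i (1-based) for n threads:
    R_i adds the order T_i, T_1, ..., T_{i-1}, T_{i+1}, ..., T_n to those of R_{i-1}."""
    orders = set()
    for k in range(1, i + 1):
        orders.add((k,) + tuple(t for t in range(1, n + 1) if t != k))
    return orders
```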
5.3 Deadlocks
Deadlocks are a common issue in multi-threaded programs; they may occur when multiple mutexes are acquired in the wrong order, as in the program in Fig. 7, in which two threads use two mutexes to protect their critical sections. A deadlock is reached, e.g., when T2 acquires mutex2 directly after T1 has acquired mutex1. A monolithic verification approach would try to verify one or more executions and, as soon as a deadlock is found, report the execution that leads to the deadlock as a counterexample. With manual intervention, this counterexample can be inspected in order to identify and fix the bug.
In contrast, Impara-IMC logs both safe and unsafe IVRs. The first IVR found in this example covers all executions in which Threads 1 and 2 execute their loop bodies in turns, with Thread 1 beginning. The corresponding program schedule consists of a single section schedule, depicted in Fig. 8. As expected, executing the program while enforcing the first program schedule never leads to a deadlock, whereas executing the uninstrumented program (without scheduling constraints) leads to a deadlock after only a few hundred loop iterations. Hence, IMC makes it possible to use the program safely, free of deadlocks, and without manual intervention.
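The effect of the first program schedule can be imitated with a simple turn variable: both threads acquire the two mutexes in opposite orders, as in Fig. 7, but their loop bodies are forced to alternate with Thread 1 first. This is a hedged Python sketch that enforces the coarse turn order directly with a condition variable rather than through a generated section schedule.

```python
import threading
from typing import List


def run_alternating(iterations: int) -> List[int]:
    """Two threads take two mutexes in opposite orders, but their loop bodies
    are forced to alternate (T1 first), so neither can block the other."""
    mutex1, mutex2 = threading.Lock(), threading.Lock()
    cond = threading.Condition()
    state = {"turn": 1}
    trace: List[int] = []

    def body(tid: int, first, second) -> None:
        for _ in range(iterations):
            with cond:
                cond.wait_for(lambda: state["turn"] == tid)
            with first, second:      # opposite acquisition orders in T1 and T2
                trace.append(tid)    # stands in for execute_critical_section()
            with cond:
                state["turn"] = 2 if tid == 1 else 1
                cond.notify_all()

    t1 = threading.Thread(target=body, args=(1, mutex1, mutex2))
    t2 = threading.Thread(target=body, args=(2, mutex2, mutex1))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return trace
```

Because only the thread whose turn it is may enter the locking region, the circular wait behind the deadlock cannot arise.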
5.4 Race conditions through erroneous synchronization
The program in Fig. 9a shows a variant of the producer-consumer problem with two producers and two consumers which uses erroneous synchronization: both the produce and consume procedures check the fill level of the buffer without acquiring the mutex first. For example, a buffer underflow occurs if the buffer contains only one item and the two consumers concurrently find that the buffer is not empty; although the buffer becomes empty after the first consumer has removed the last item, the second consumer tries to remove another item.

mutex m_inode
mutex m_busy
…
if not inode:
lock(m_busy)
busy := true
unlock(m_busy)
inode := true
…
if not busy:
block := 0
…
inode := false
busy := false
unlock(m_inode)
unlock(m_busy)
Fig. 10: The file system benchmark

Thread T1:
while true:
if not inode:
busy := true
inode := true
atomic-begin
assume inode and busy
…
if not busy:
atomic-begin
assume not busy
…
assume inode = busy
inode := false
busy := false
atomic-end
Fig. 11: The file system benchmark with synchronization constraints in assume statements

Thread T′2:
  while true:
    atomic-begin
    assume not busy
    block := 0
    atomic-end
Fig. 12: Thread T′2: the if-statement is omitted

initially:
empty buffer of size 1000
count = 0
…
if count != 1000:
int return_value = produce()
…
if top > 0:
return_value = consume();
assert(return_value != UNDERFLOW);
unlock()
Fig. 13: A correct program for the producer-consumer problem with four producers and four consumers
A simplified version of the first IVR found by Impara-IMC is depicted in Fig. 9b. The simplification merges all individual edges of a procedure into a single edge, which is possible as Impara-IMC does not apply context switches inside procedures during the first iteration. Since both procedures appear to be executed atomically, no assertion violation is found during the first iteration. We ran the program with a program schedule corresponding to the first IVR and, as expected, observed no assertion violations.
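The difference between the racy interleaving and the atomic procedures of the first IVR can be replayed deterministically. The function below is a hypothetical illustration: with atomic=False, both consumers perform their emptiness check before either removes an item, which is exactly the interleaving described for Fig. 9a; with atomic=True, each consumer checks and removes in one step.

```python
from typing import List


def consumers_check_then_act(items: int, atomic: bool) -> List[str]:
    """Replay two consumers on a buffer holding `items` items and
    return the assertion failures (assert buffer_is_not_empty()) observed."""
    buffer = list(range(items))
    failures: List[str] = []
    if atomic:
        for _ in (1, 2):
            if buffer:  # check and removal happen without interruption
                buffer.pop()
        return failures
    saw_item = [len(buffer) > 0 for _ in (1, 2)]  # both checks happen first
    for c, ok in zip((1, 2), saw_item):
        if ok:
            if not buffer:  # the assertion in the consumer fails here
                failures.append(f"consumer {c}: buffer empty")
            else:
                buffer.pop()
    return failures
```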
5.5 Declarative synchronization
Fig. 10 shows an extension of a benchmark used in [15], which is a simplified extract of the multi-threaded Frangipani file system. The program uses a time-varying mutex: depending on the current value of the busy bit, a disk block is protected by m_busy or m_inode. We want to evaluate whether we can use Impara-IMC to generate safe program schedules even if all mutexes are (intentionally) removed from the program.
For this purpose, we use a variant of the file system benchmark, shown in Fig. 11, where all mutexes are removed and synchronization constraints are declared as assume statements. For T1, it is sufficient to ensure that the block is written only if it is allocated, i.e., both inode and busy are true. For T2, it is sufficient to ensure that the block is only reset if it is not busy, i.e., busy = false. Finally, for T3, it is necessary to ensure that the block is deallocated only if it is already deallocated or fully allocated, i.e., inode = busy.
Running Impara-IMC on the file system benchmark without mutexes yields a first program schedule that schedules T1, T2, T3 repeatedly in this order, according to our simple heuristic for an initial IVR. However, although all executions permitted by this schedule are fair, the if-condition of T2 always evaluates to false, and T2 never performs useful work. To obtain a more useful schedule, we inform the model checker that the (omitted) else-branch of Thread T2 is not useful. We encode this information by inserting else: assume false. After simplifying the code, we obtain T′2 as depicted in Fig. 12. For the updated code, Impara-IMC yields a first schedule that schedules T3 before T2 before T1, so that all threads perform useful work.
5.6 Performance
Tab. 1 shows the performance impact of enforcing
IVRs on several correct programs. Each program is
model-checked once until the first IVR (Impara-IMC)
and once completely (Impara-C). As a baseline, the program is run without schedule enforcement (unconstrained). The first IVR is enforced without optimizations (Opt0) and with optimizations (Opt1, Opt2). Opt1 applies
POR and omits operations on synchronization objects
(mutexes, barriers).2 Opt2 uses, in addition to Opt1,
longer section schedules (by replicating a section eight
times) and stronger partial-order reduction that identi-
fies independent accesses to distinct indices of an array.
Additionally, for the producer-consumer benchmark,
we apply a compiler-like optimization, removing and
reordering events to reduce the number of constraints.3
Both Opt1 and Opt2 enable the concurrent execution
of more memory accesses, e.g., because the beginning
of a critical section can already be executed before
a thread arrives at a constrained access that has to
wait. The schedules for each benchmark (Opt0–Opt2)
are obtained from the first IVR. As all benchmarks
use unbounded loops, we measure the execution time
performance by counting useful (i.e., with a successful
concurrent access such as a produced item) loop itera-
tions and terminating the execution after 2 seconds.
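The stronger partial-order reduction of Opt2 amounts to a finer dependency relation. The sketch below assumes array indices are known when the schedule is generated; Access and dependent are hypothetical names, and an unknown index falls back to the conservative array-level check.

```python
from typing import NamedTuple, Optional


class Access(NamedTuple):
    thread: int
    var: str
    index: Optional[int]  # array index if statically known, else None
    write: bool


def dependent(a: Access, b: Access, index_aware: bool) -> bool:
    """Dependency check; with index_aware=True, accesses to distinct known indices
    of the same array are treated as independent (the stronger POR of Opt2)."""
    if a.thread == b.thread:
        return True
    if a.var != b.var or not (a.write or b.write):
        return False
    if index_aware and a.index is not None and b.index is not None:
        return a.index == b.index
    return True
```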
Using a section schedule of the producer-consumer benchmark with two threads as an example, Figs. 14a and 14b illustrate the difference between the optimizations. Fig. 14a shows a section schedule for Opt0. All shared memory events are executed strictly sequentially, as is the case for unconstrained executions: only the thread holding the lock is allowed to access shared memory. Opt1 removes the lock operations while maintaining the same ordering of events. Opt2, cf. Fig. 14b, relaxes the original ordering, subsumes eight loop executions of both threads, and eliminates the redundant read event of count.
In Fig. 14b, when the consumer executes the scheduler callback before its first event (read count), it looks up the constraint e12 → e21 and waits for the producer to finish event e12. When the producer, in the callback after e12, has notified that e12 has been executed, the consumer continues and executes e21. Similarly, the producer is permitted to execute e14 before e23 has been executed. Thus, the constrained execution under the optimized schedule permits “more” concurrency (i.e., more events can be executed concurrently) than the unconstrained execution with locks. For instance, the consumer is allowed to read the counter as soon as the producer has written it and does not have to wait for the producer to also write an item to the buffer.
2 As enforcing an IVR makes synchronization over existing mutexes and barriers redundant, omitting them is safe.
3 Opt2 follows a general algorithm; however, we have not automated our implementation of Opt2, as implementing the required compiler optimizations would be a large effort. Our implementation of Opt1 is automated.
We use the producer-consumer implementation (with correct synchronization and buffer size 1000) from SV-COMP [1] (stack_safe), modified with an unbounded loop and with 1, 2, and 4 producers and consumers. The double lock benchmark is a corrected version (lock operations in T2 reversed) of the deadlock benchmark (Sec. 5.3), where the critical section is simulated by sleeping for 1 ms; the uncorrected version reached a deadlock after only 172 loop iterations. The file system benchmark from SV-COMP (time_var_mutex_safe) is extended with a third thread and, again, with unbounded loops as in Sec. 5.5. The barrier benchmark uses two barriers to implement ring communication between threads.
As the model checking columns of Tab. 1 show, Impara-IMC often finds the first IVR much faster than Impara-C completes model checking, and never more slowly; it can produce an IVR even for our largest benchmarks, where Impara-C times out. For a buffer size of 5, Impara-C can verify the producer-consumer benchmark even with eight threads, but again, Impara-IMC is considerably faster in finding the first IVR. Subsequent IVRs were generated considerably faster than the first IVR, which might be caused by caching of facts in the model checker.
The verification time for the producer-consumer
benchmark of both Impara-C and Impara-IMC ap-
pears to grow exponentially with the number of threads.
This growth is not a limitation of our approach but
a property of the application of lazy abstraction with
interpolants in Impara. Potentially, Impara can be im-
proved by including symmetry reduction, which would
reduce the verification time for both Impara-C and
Impara-IMC but is outside of the scope of this work.
Somewhat surprisingly, some benchmarks are slower when executed unconstrained than under Opt2. We conjecture that this is caused by more memory accesses being executed in parallel under Opt2, as all other effects of Opt2 only improve handling by our user space scheduler and do not affect unconstrained executions. It is, however, not directly possible to measure the effect of parallelizing memory accesses: in order to re-sequentialize memory accesses under Opt2, synchronization (e.g., over a mutex) would have to be added, which produces additional overhead.

T1 (producer):
  if (count < N)
    local_count = count
    buf[local_count + 1] = item
    count = local_count + 1
  e11: lock
  e12: read count
  e13: read count
  e14: write buf
  e15: write count
  e16: unlock
T2 (consumer):
  if (count > 0)
    local_count = count
    count = local_count - 1
    item = buf[local_count - 1]
  e21: lock
  e22: read count
  e23: read count
  e24: write count
  e25: read buf
  e26: unlock
Fig. 14 (a) Section schedule for the producer-consumer benchmark (Opt0)

T1 (producer):
  local_count = count
  count = local_count + 1
  buf[local_count + 1] = item
  local_count = count
  count = local_count + 1
  buf[local_count + 1] = item
  e11: read count
  e12: write count
  e13: write buf
  e14: read count
  e15: write count
  e16: write buf
T1 (producer):
  local_count = count
  count = local_count + 1
  buf[local_count + 1] = item
  local_count = count
  count = local_count + 1
  buf[local_count + 1] = item
  e21: read count
  e22: write count
  e23: write buf
  e24: read count
  e25: write count
  e26: write buf
Fig. 14 (b) Section schedule for the producer-consumer benchmark (Opt2)

Table 1: Experimental results (to: timeout; times rounded to full seconds). Model checking times are given for the first IVR (Impara-IMC) and for complete verification (Impara-C). Performance (higher is better) is measured as the number of useful (e.g., with a successful concurrent access such as a produced item) loop iterations within a time limit of 2 seconds.
Benchmark | 1st IVR (Impara-IMC) | Impara-C | Opt0 | Opt1 | Opt2 | Unconstrained
prod.-cons. 1p 1c 1000b | 2 m 0 s | to (72 h) | 4 864 489 | 7 466 093 | 11 370 258 | 8 199 202
prod.-cons. 2p 2c 1000b | 23 m 47 s | to (72 h) | 3 400 187 | 5 959 041 | 8 428 598 | 11 643 208
prod.-cons. 4p 4c 1000b | 4 h 29 m 53 s | to (72 h) | 1 327 063 | 2 576 695 | 3 676 876 | 7 210 796
prod.-cons. 1p 1c 5b | 2 s | 2 m 28 s | 4 945 116 | 7 075 596 | 12 372 817 | 7 915 465
prod.-cons. 2p 2c 5b | 18 s | 1 m 16 s | 3 194 019 | 5 514 429 | 9 271 859 | 6 933 172
prod.-cons. 4p 4c 5b | 2 m 41 s | 9 m 44 s | 1 345 991 | 2 465 108 | 3 392 111 | 3 240 136
double lock 1 ms | 0 s | 0 s | 1 845 | 1 834 | 3 217 | 1 797
file system | 0 s | 0 s | 3 667 | 4 877 035 | 6 705 672 | 23 822 129
barrier | 1 s | 4 m 14 s | 1 238 720 | 8 285 228 | 14 586 849 | 1 077 907
In all cases but one (double lock, where Opt0 and Opt1 perform about equally), Opt2 is considerably faster than Opt1, which is considerably faster than Opt0. The highest overhead is observed for the file system benchmark, where Opt2 is about 3.5 times slower than the unconstrained execution. We conjecture that the high overhead here stems from an unequal distribution of loop iterations among threads in the unconstrained execution: the loop body of T2, which is shorter and probably faster, was executed nearly 100 times more frequently than that of T1. Opt0–Opt2 execute all threads in a nearly balanced way.
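The relative numbers quoted in this discussion can be recomputed from Table 1. The dictionary below copies the performance columns of the table; the helper functions are only for illustration.

```python
# Performance values copied from Table 1: (Opt0, Opt1, Opt2, Unconstrained).
TABLE1 = {
    "prod.-cons. 1p 1c 1000b": (4_864_489, 7_466_093, 11_370_258, 8_199_202),
    "prod.-cons. 2p 2c 1000b": (3_400_187, 5_959_041, 8_428_598, 11_643_208),
    "prod.-cons. 4p 4c 1000b": (1_327_063, 2_576_695, 3_676_876, 7_210_796),
    "prod.-cons. 1p 1c 5b": (4_945_116, 7_075_596, 12_372_817, 7_915_465),
    "prod.-cons. 2p 2c 5b": (3_194_019, 5_514_429, 9_271_859, 6_933_172),
    "prod.-cons. 4p 4c 5b": (1_345_991, 2_465_108, 3_392_111, 3_240_136),
    "double lock 1 ms": (1_845, 1_834, 3_217, 1_797),
    "file system": (3_667, 4_877_035, 6_705_672, 23_822_129),
    "barrier": (1_238_720, 8_285_228, 14_586_849, 1_077_907),
}


def slowdown_vs_unconstrained(name: str) -> float:
    """Ratio of useful iterations, unconstrained over Opt2 (above 1 means Opt2 is slower)."""
    _, _, opt2, unconstrained = TABLE1[name]
    return unconstrained / opt2


def not_monotone() -> list:
    """Benchmarks for which Opt0 < Opt1 < Opt2 does not hold."""
    return [n for n, (o0, o1, o2, _) in TABLE1.items() if not o0 < o1 < o2]
```

For the file system benchmark the ratio is about 3.55, and double lock is the only benchmark for which Opt0 < Opt1 < Opt2 fails, in line with the text.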
In addition to the Pthread barriers used in the barrier benchmark, we tried a variant with busy-waiting barriers, where the unconstrained execution showed a performance of 13 567 135, which is still lower than that of Opt2.
Table 2: Experimental performance results for pfscan (execution times in
seconds)

| Schedule | Constrained (s) | Unconstrained (s) | Relative (constrained/unconstrained) |
|----------|-----------------|-------------------|--------------------------------------|
| S1       | 3.34            | 3.25              | 1.03                                 |
| S2       | 3.34            | 3.25              | 1.03                                 |
| S3       | 3.6             | 3.25              | 1.10                                 |
| S4       | 3.57            | 3.25              | 1.10                                 |

Comparing the results for the producer-consumer benchmark with a buffer
size of 1000 to those for a buffer size of 5, we observe that the buffer
size has no considerable effect on Opt0–Opt2, but it does affect most of
the unconstrained executions. This is to be expected, as the first IVR
uses at most four cells of the buffer (in the case of four producers). The
performance of unconstrained executions decreases with a smaller buffer,
since the chance that the buffer is full and a producer has to wait is
higher. For all three configurations with a buffer size of 5, Opt2 achieves
the highest performance.
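To make the performance metric of Table 1 and the effect of the buffer size more concrete, the following Python sketch runs an unconstrained bounded-buffer producer-consumer workload for a fixed time and counts the loop iterations whose buffer access succeeded. It is a hypothetical harness (the function name useful_iterations and the 10 ms wait are our assumptions), not the benchmark code behind Table 1; counting an iteration as useful only when its put or get succeeds is our reading of "successful concurrent access". Because of CPython's global interpreter lock, its absolute numbers say nothing about the C results.

```python
import queue
import threading
import time


def useful_iterations(producers, consumers, buffer_size, duration=2.0):
    """Run an unconstrained producer-consumer workload for `duration` seconds
    and return how many loop iterations performed a successful buffer access."""
    buf = queue.Queue(maxsize=buffer_size)  # bounded buffer shared by all threads
    stop = threading.Event()
    count_lock = threading.Lock()
    useful = 0

    def record():
        nonlocal useful
        with count_lock:
            useful += 1

    def producer():
        while not stop.is_set():
            try:
                # Waits at most 10 ms while the buffer is full; an iteration
                # that times out here is not counted as useful.
                buf.put(object(), timeout=0.01)
            except queue.Full:
                continue
            record()

    def consumer():
        while not stop.is_set():
            try:
                buf.get(timeout=0.01)  # waits at most 10 ms while the buffer is empty
            except queue.Empty:
                continue
            record()

    threads = [threading.Thread(target=producer) for _ in range(producers)]
    threads += [threading.Thread(target=consumer) for _ in range(consumers)]
    for t in threads:
        t.start()
    time.sleep(duration)  # the time limit (2 seconds in Table 1)
    stop.set()
    for t in threads:
        t.join()
    return useful
```

Calling, for example, useful_iterations(4, 4, 5) and useful_iterations(4, 4, 1000) lets one observe the influence of the buffer size in this toy setting, where producers on a small buffer more often find it full.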
Even in repeated executions of the experiment, the unconstrained variant of
double lock showed only “starving” executions, in the sense that the second
thread was never able to acquire the mutexes before the timeout of 2
seconds. Hence, the constrained executions improve on the operating system
scheduler in terms of a balanced execution of all threads.
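One simple way to rule out the starvation observed for double lock is to let a scheduler hand out turns in a fixed cyclic order. The Python sketch below only illustrates that idea (RoundRobinTurns and double_lock_run are hypothetical names); it is not the scheduler described in this paper, which enforces the schedules contained in an IVR rather than a fixed round-robin order. Each thread must wait for its turn before acquiring the two mutexes, so no thread can be overtaken indefinitely.

```python
import threading


class RoundRobinTurns:
    """Hypothetical turn keeper: thread i may enter its critical section only
    when the turn counter equals i (order 0, 1, ..., n-1, 0, ...)."""

    def __init__(self, n_threads):
        self.n = n_threads
        self.turn = 0
        self.cond = threading.Condition()

    def wait_turn(self, tid):
        with self.cond:
            self.cond.wait_for(lambda: self.turn == tid)

    def pass_turn(self):
        with self.cond:
            self.turn = (self.turn + 1) % self.n
            self.cond.notify_all()


def double_lock_run(n_threads, iterations):
    """Each thread repeatedly acquires two mutexes under the turn discipline;
    returns the order in which threads completed their critical sections."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    turns = RoundRobinTurns(n_threads)
    trace = []

    def worker(tid):
        for _ in range(iterations):
            turns.wait_turn(tid)
            with lock_a, lock_b:  # acquires lock_a, then lock_b
                trace.append(tid)
            turns.pass_turn()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return trace


print(double_lock_run(2, 3))  # [0, 1, 0, 1, 0, 1]
```

Since every thread performs the same number of iterations and the turn passes cyclically, the run terminates and each thread gets exactly its share of critical sections.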
In order to compare our approach to the enforcement of input-covering
schedules [7] (explained in Section 6), we measure the overhead of our
scheduler implementation on the pfscan benchmark used there. Pfscan is a
parallel implementation of grep that uses 1 producer and 2 consumer threads
to distribute tasks, each consisting of reading a file and searching it for
a given query. As input, we use 8 files with 100 MB of random content each.
We evaluate 4 different schedules⁴, which show an overhead between 3% and
10% (with Opt2). Hence, IVRs can perform much better than input-covering
schedules, for which [7] reports an overhead of 60%.
Tab. 2 contains our experimental results for the pfscan benchmark. We use
two worker threads in addition to the main thread. The benchmark is
executed under the scheduling constraints of several program schedules
S1–S4 (column two) and without constraints (column three). Execution times
are given in seconds. The fourth column gives the relative execution time
(overhead). In all constrained configurations, operations on
synchronization objects have been omitted (Opt1). S1, S2, and S3 are
program schedules of the kind that can be produced during the first
iteration of our model checking algorithm. Program schedule S4 allows any
interleaving of critical sections, so that all executions of the
unconstrained program are matched. S1 and S2 contain sections that comprise
both worker threads, while S3 and S4 contain only single-threaded sections.
S1 and S2 differ in the ordering of the worker threads.
⁴ As Impara cannot handle several features used by pfscan (such as
condition variables, structs, and standard output), we manually generate
initial IVRs.
S3 causes an overhead of 10% with respect to the unconstrained execution.
Although S4 allows any interleaving of critical sections, an overhead of
10% remains, caused by looking up section schedules during the execution.
S1 and S2 show only a small overhead of 3%. We conjecture that the lower
number of section schedule look-ups (compared to S3 and S4) is responsible
for this considerably lower overhead: since each section of S1 and S2
comprises both worker threads, fewer sections, and thus fewer look-ups, are
needed to cover an execution.
6 Related work
Unbounded model checking [20,40,33,18] is a technique to verify the
correctness of potentially non-terminating programs. In our setting, we
deploy algorithms that use abstract reachability trees (ARTs) [21,28,40] to
represent the already explored state space and schedules, and that perform
this exploration in a forward manner. Instead of discarding an ART after an
unsuccessful attempt to verify a program, we use the ART to extract safe
schedules.
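As a rough illustration of extracting safe schedules from an ART, the Python sketch below assumes a heavily simplified tree in which every edge is one step of one thread (labelled with its thread id) and every leaf is marked "safe" (e.g., covered), "error", or "open" (not yet explored when verification stopped). The class ArtNode, the status strings, and safe_schedules are hypothetical; the sketch ignores abstraction, data non-determinism, and how covered leaves continue, and it is not the extraction procedure defined in this paper. It collects exactly the thread sequences that lead to safe leaves, so paths into error or open leaves are excluded.

```python
from dataclasses import dataclass, field


@dataclass
class ArtNode:
    """Simplified ART node; edges are labelled with the id of the stepping thread.

    status is "expanded" for inner nodes and, for leaves, one of
    "safe" (e.g., covered), "error", or "open" (not yet explored).
    """
    status: str
    children: dict = field(default_factory=dict)  # thread id -> successor node


def safe_schedules(node, prefix=()):
    """Thread-id sequences from the root to safe leaves, in thread-id order."""
    if not node.children:
        return [prefix] if node.status == "safe" else []
    schedules = []
    for tid, child in sorted(node.children.items()):
        schedules.extend(safe_schedules(child, prefix + (tid,)))
    return schedules


art = ArtNode("expanded", {
    1: ArtNode("safe"),
    2: ArtNode("expanded", {1: ArtNode("error"), 2: ArtNode("safe")}),
    3: ArtNode("open"),
})
print(safe_schedules(art))  # [(1,), (2, 2)]
```

A scheduler built on such a result would only admit executions that follow one of the returned sequences, which is the intuition behind restricting execution to schedules that have been proven safe.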
Conditional model checking [8] reuses arbitrary intermediate verification
results. In contrast to our approach, it is not guaranteed to prove the
safety of a program that is functional under all inputs, and it does not
enforce the preconditions (e.g., scheduling constraints) of the
intermediate result.
Context bounding [37,36,32] eases the model checking problem by bounding
the number of context switches. It is limited to finite executions and,
unlike our approach, does not enforce schedules at runtime.
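The effect of a context bound can be illustrated by counting interleavings. The Python sketch below enumerates all interleavings of threads that execute a given number of atomic steps and keeps those with at most k context switches. Here a context switch is counted whenever two consecutive steps belong to different threads; this is a simplification, since the cited works differ in which switches they count (e.g., only preemptive ones). The function names are ours, and the brute-force enumeration is only meant for tiny examples.

```python
from itertools import permutations


def interleavings(steps_per_thread):
    """All distinct interleavings; thread t executes steps_per_thread[t] steps."""
    events = [t for t, n in enumerate(steps_per_thread) for _ in range(n)]
    return sorted(set(permutations(events)))  # brute force: tiny inputs only


def context_switches(schedule):
    """Number of positions at which the running thread changes."""
    return sum(1 for a, b in zip(schedule, schedule[1:]) if a != b)


def context_bounded(steps_per_thread, k):
    """Interleavings with at most k context switches."""
    return [s for s in interleavings(steps_per_thread) if context_switches(s) <= k]


print(len(interleavings([2, 2])))       # 6
print(context_bounded([2, 2], 1))       # [(0, 0, 1, 1), (1, 1, 0, 0)]
print(len(context_bounded([2, 2], 2)))  # 4
```

For two threads with two steps each, 6 interleavings exist in total, of which a bound of one switch keeps 2 and a bound of two keeps 4.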
Automated fence insertion [13,24,2,3,26] transforms a program that is safe
under sequential consistency into a program that is also safe under weaker
memory models. While this reduces the non-determinism in the ordering of
events, non-determinism due to scheduling cannot be influenced.
Synchronization synthesis [19] inserts synchronization primitives in order
to prevent incorrect executions, but may introduce deadlocks.
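To illustrate what fence insertion addresses, the Python sketch below exhaustively explores the classic store-buffering litmus test under sequential consistency (SC) and under a simple model of TSO, in which every thread has a FIFO store buffer that is flushed to memory at nondeterministic points and loads first consult the thread's own buffer. This model is our simplification for illustration and is not taken from the cited fence-insertion tools. Placing a fence before each load (the thread waits until its store buffer is empty) removes the outcome r0 = r1 = 0, which SC forbids but TSO allows.

```python
# Store-buffering (SB) litmus test, x = y = 0 initially:
#   thread 0: x = 1; r0 = y        thread 1: y = 1; r1 = x
PROGRAM = (
    (("store", "x"), ("load", "y")),
    (("store", "y"), ("load", "x")),
)


def outcomes(tso, fence=False):
    """Exhaustively explore the litmus test and return all final (r0, r1).

    tso=False: stores write memory immediately (sequential consistency).
    tso=True:  stores enter the thread's FIFO store buffer, which is flushed
               to memory one entry at a time at nondeterministic points.
    fence=True: a load may only execute once its thread's buffer is empty.
    """
    start = ((0, 0), ((), ()), (("x", 0), ("y", 0)), (None, None))
    stack, seen, results = [start], set(), set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        memory = dict(mem)
        if all(pc == len(PROGRAM[t]) for t, pc in enumerate(pcs)) and not any(bufs):
            results.add(regs)
            continue
        for t in (0, 1):
            if bufs[t]:  # flush the oldest buffered store of thread t
                (var, val), rest = bufs[t][0], bufs[t][1:]
                new_mem = tuple(sorted({**memory, var: val}.items()))
                new_bufs = tuple(rest if i == t else b for i, b in enumerate(bufs))
                stack.append((pcs, new_bufs, new_mem, regs))
            if pcs[t] == len(PROGRAM[t]):
                continue
            op, var = PROGRAM[t][pcs[t]]
            new_pcs = tuple(pc + 1 if i == t else pc for i, pc in enumerate(pcs))
            if op == "store" and tso:
                new_bufs = tuple(b + ((var, 1),) if i == t else b
                                 for i, b in enumerate(bufs))
                stack.append((new_pcs, new_bufs, mem, regs))
            elif op == "store":
                new_mem = tuple(sorted({**memory, var: 1}.items()))
                stack.append((new_pcs, bufs, new_mem, regs))
            elif not (fence and bufs[t]):
                # A load reads the thread's newest buffered value for var, if any.
                own = [v_val for v_name, v_val in bufs[t] if v_name == var]
                value = own[-1] if own else memory[var]
                new_regs = tuple(value if i == t else r for i, r in enumerate(regs))
                stack.append((new_pcs, bufs, mem, new_regs))
    return results


print(sorted(outcomes(tso=False)))             # [(0, 1), (1, 0), (1, 1)]
print(sorted(outcomes(tso=True)))              # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(sorted(outcomes(tso=True, fence=True)))  # [(0, 1), (1, 0), (1, 1)]
```

Even with fences, three outcomes remain possible; which one occurs is still decided by scheduling, matching the remark that fence insertion does not influence scheduling non-determinism.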
Deterministic multi-threading (DMT) [4,6,7,11,12,27,31,35] reduces
non-determinism due to scheduling in multi-threaded programs. Schedules are
chosen dynamically, depending on the explicit input, and cannot be enforced
by a model checker. Nevertheless, there are combinations with model
checking [11] and instances that schedule based on previously recorded
executions [12].
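The way a DMT runtime can derive the schedule from the explicit input may be illustrated with a deterministic logical-clock rule: each thread advances a counter by an amount determined by its input-dependent work, and the thread with the smallest counter (ties broken by thread id) runs next. The Python sketch below only simulates such scheduling decisions; the rule is a simplification in the spirit of logical-clock-based DMT systems, not the exact algorithm of any cited system, and deterministic_schedule is a hypothetical name.

```python
def deterministic_schedule(work, steps):
    """Simulate `steps` scheduling decisions for threads 0..n-1.

    work[t] is the (input-dependent) amount by which thread t's logical clock
    advances each time it runs; the thread with the smallest clock runs next,
    ties broken by the lower thread id. No real threads are involved.
    """
    clocks = [0] * len(work)
    order = []
    for _ in range(steps):
        t = min(range(len(work)), key=lambda i: (clocks[i], i))
        order.append(t)
        clocks[t] += work[t]
    return order


print(deterministic_schedule([1, 1], 4))  # [0, 1, 0, 1]
print(deterministic_schedule([1, 2], 4))  # [0, 1, 0, 0]
```

The same input always yields the same order, while a different input may yield a different one; because the order is only fixed once the concrete input is known, it is not something a model checker could enforce in advance.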
We are aware of only one DMT approach that supports symbolic inputs [7].
Similar to our sections, its bounded epochs describe infinite schedules as
permutations of finite schedules. Via symbolic execution, an input-covering
set of schedules is generated, which contains a schedule for each
permutation of bounded epochs. As all permutations need to be analyzed
(even if they are infeasible), state space explosion through concurrency is
only partially avoided; indeed, the experimental evaluation in [7] shows
that the analysis is infeasible even for five threads when the program has
many such permutations. In contrast, our approach does not require
race-freedom, uses model checking, allows sections to contain multiple
threads, omits infeasible schedules, and allows a safe execution from the
first schedule on. Consequently, an IVR can be considerably smaller than an
input-covering set of schedules.
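A back-of-the-envelope count indicates why analyzing all permutations becomes expensive quickly. Under a deliberately simplified model, which is our assumption and not the exact construction of [7], an epoch is fixed by the order in which each of n threads runs one bounded segment, giving n! candidate orders per epoch and (n!)^e orders over e independently ordered epochs. The function names below are hypothetical.

```python
from math import factorial


def orders_per_epoch(n_threads):
    """Ways to order one bounded segment of each of n threads."""
    return factorial(n_threads)


def orders_over_epochs(n_threads, n_epochs):
    """Count if every epoch's order is chosen independently."""
    return orders_per_epoch(n_threads) ** n_epochs


print([orders_per_epoch(n) for n in range(1, 6)])  # [1, 2, 6, 24, 120]
print(orders_over_epochs(5, 2))                    # 14400
```

Already for five threads, two epochs give 14 400 orders in this toy model, whether or not they are feasible.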
Deterministic concurrency requires a program to be deterministic regardless
of scheduling. In [38], a deterministic variant of a concurrent program is
synthesized based on constraints on conflicts learned by abstract
interpretation. In contrast to DMT, symbolic inputs are supported; however,
unlike with IVRs, no general safety properties are verified and the degree
of non-determinism is not adjustable.
Sequentialized programs [37,25,14,22,33,34] emulate the semantics of a
multi-threaded program, allowing tools for sequential programs to be used.
The number of possible schedules is either not reduced at all or reduced
similarly to context bounding.
7 Conclusion
We present a formal framework for using IVRs to extract safe schedules. We
state why it is legitimate to constrain scheduling (in contrast to inputs)
and formulate general requirements on model checkers in our framework. We
instantiate our framework with the Impact model checking algorithm and find
in our evaluation that it can be used to (1) model check programs that are
intractable for monolithic model checkers, (2) safely execute a program,
given an IVR, even if unsafe executions exist, (3) synthesize
synchronization via assume statements, and (4) guarantee fair executions.
A drawback of enforcing IVRs is a potential execution time overhead;
however, in several cases, constrained executions turned out to be even
faster than unconstrained executions.
References
1. Benchmark suite of the competition on software verification (SV-COMP). https://github.com/sosy-lab/sv-benchmarks
2. Abdulla, P.A., Atig, M.F., Chen, Y., Leonardsson, C., Rezine, A.: Counter-example guided fence insertion under TSO. In: TACAS. Springer (2012)
3. Abdulla, P.A., Atig, M.F., Chen, Y., Leonardsson, C., Rezine, A.: Memorax, a precise and sound tool for automatic fence insertion under TSO. In: TACAS, LNCS. Springer (2013)
4. Aviram, A., Weng, S., Hu, S., Ford, B.: Efficient system-enforced deterministic parallelism. In: OSDI. USENIX Association (2010)
5. Baier, C., Katoen, J.P.: Principles of model checking. MIT Press (2008)
6. Bergan, T., Anderson, O., Devietti, J., Ceze, L., Grossman, D.: CoreDet: a compiler and runtime system for deterministic multithreaded execution. In: ASPLOS. ACM (2010)
7. Bergan, T., Ceze, L., Grossman, D.: Input-covering schedules for multithreaded programs. In: OOPSLA (2013)
8. Beyer, D., Henzinger, T.A., Keremoglu, M.E., Wendler, P.: Conditional model checking: a technique to pass information between verifiers. In: FSE. ACM (2012)
9. Beyer, D., Keremoglu, M.E.: CPAchecker: A tool for configurable software verification. In: CAV, LNCS, vol. 6806, pp. 184–190. Springer (2011)
10. Clarke, E.M., Grumberg, O., Minea, M., Peled, D.: State space reduction using partial order techniques. STTT 2(3) (1999)
11. Cui, H., Simsa, J., Lin, Y., Li, H., Blum, B., Xu, X., Yang, J., Gibson, G.A., Bryant, R.E.: Parrot: a practical runtime for deterministic, stable, and reliable threads. In: SOSP. ACM (2013)
12. Cui, H., Wu, J., Gallagher, J., Guo, H., Yang, J.: Efficient deterministic multithreading through schedule relaxation. In: SOSP. ACM (2011)
13. Fang, X., Lee, J., Midkiff, S.P.: Automatic fence insertion for shared memory multiprocessing. In: ICS. ACM (2003)
14. Fischer, B., Inverso, O., Parlato, G.: CSeq: A concurrency pre-processor for sequential C verification tools. In: ASE. IEEE (2013)
15. Flanagan, C., Freund, S.N., Qadeer, S.: Thread-modular verification for shared-memory programs. In: ESOP, LNCS. Springer (2002)
16. Flanagan, C., Godefroid, P.: Dynamic partial-order reduction for model checking software. In: POPL. ACM (2005)
17. Godefroid, P.: Partial-Order Methods for the Verification of Concurrent Systems - An Approach to the State-Explosion Problem, LNCS, vol. 1032. Springer (1996)
18. Günther, H., Laarman, A., Sokolova, A., Weissenbacher, G.: Dynamic reductions for model checking concurrent software. In: VMCAI, LNCS. Springer (2017)
19. Gupta, A., Henzinger, T.A., Radhakrishna, A., Samanta, R., Tarrach, T.: Succinct representation of concurrent trace sets. In: POPL. ACM (2015)
20. Henzinger, T.A., Jhala, R., Majumdar, R.: Race checking by context inference. In: PLDI. ACM (2004)
21. Henzinger, T.A., Jhala, R., Majumdar, R., Sutre, G.: Lazy abstraction. In: POPL, pp. 58–70. ACM (2002)
22. Inverso, O., Tomasco, E., Fischer, B., La Torre, S., Parlato, G.: Bounded model checking of multi-threaded C programs via lazy sequentialization. In: CAV. Springer (2014)
23. Kroening, D., Weissenbacher, G.: Interpolation-based software verification with Wolverine. In: CAV, LNCS, vol. 6806, pp. 573–578. Springer (2011)
24. Kuperstein, M., Vechev, M.T., Yahav, E.: Automatic inference of memory fences. In: FMCAD. IEEE (2010)
25. Lal, A., Reps, T.W.: Reducing concurrent analysis under a context bound to sequential analysis. Formal Methods in System Design 35(1), 73–97 (2009)
26. Linden, A., Wolper, P.: A verification-based approach to memory fence insertion in PSO memory systems. In: TACAS, LNCS. Springer (2013)
27. Liu, T., Curtsinger, C., Berger, E.D.: Dthreads: efficient deterministic multithreading. In: SOSP. ACM (2011)
28. McMillan, K.L.: Lazy abstraction with interpolants. In: CAV, LNCS. Springer (2006)
29. Metzler, P., Saissi, H., Bokor, P., Suri, N.: Quick verification of concurrent programs by iteratively relaxed scheduling. In: ASE. IEEE Computer Society (2017)
30. Metzler, P., Suri, N., Weissenbacher, G.: Extracting safe thread schedules from incomplete model checking results. In: SPIN, LNCS. Springer (2019)
31. Mushtaq, H., Al-Ars, Z., Bertels, K.: DetLock: Portable and efficient deterministic execution for shared memory multicore systems. In: High Performance Computing, Networking, Storage and Analysis. IEEE (2012)
32. Musuvathi, M., Qadeer, S.: Iterative context bounding for systematic testing of multithreaded programs. In: PLDI. ACM (2007)
33. Nguyen, T.L., Fischer, B., La Torre, S., Parlato, G.: Lazy sequentialization for the safety verification of unbounded concurrent programs. In: ATVA, LNCS (2016)
34. Nguyen, T.L., Schrammel, P., Fischer, B., La Torre, S., Parlato, G.: Parallel bug-finding in concurrent programs via reduced interleaving instances. In: ASE. IEEE Computer Society (2017)
35. Olszewski, M., Ansel, J., Amarasinghe, S.P.: Kendo: efficient deterministic multithreading in software. In: ASPLOS (2009)
36. Qadeer, S., Rehof, J.: Context-bounded model checking of concurrent software. In: TACAS, LNCS. Springer (2005)
37. Qadeer, S., Wu, D.: KISS: keep it simple and sequential. In: PLDI. ACM (2004)
38. Raychev, V., Vechev, M.T., Yahav, E.: Automatic synthesis of deterministic concurrency. In: SAS. Springer (2013)
39. Valmari, A.: The state explosion problem. In: Lectures on Petri Nets I: Basic Models, Advances in Petri Nets. Springer (1996)
40. Wachter, B., Kroening, D., Ouaknine, J.: Verifying multi-threaded software with Impact. In: FMCAD. IEEE (2013)
