Safe Execution of Concurrent Programs by Enforcement of Scheduling
  Constraints by Metzler, Patrick et al.
Patrick Metzler, Habib Saissi, Péter Bokor, and Neeraj Suri
Technische Universität Darmstadt
{metzler, saissi, pbokor, suri}@deeds.informatik.tu-darmstadt.de
arXiv:1809.01955v1 [cs.PL] 6 Sep 2018
Abstract
Automated software verification of concurrent programs is challenging
because of exponentially growing state spaces. Verification techniques
such as model checking need to explore a large number of executions
that are possible under a non-deterministic scheduler. State space
reduction techniques such as partial order reduction simplify the
verification problem; however, the reduced state space may still be
exponentially large and intractable.
This paper discusses Iteratively Relaxed Scheduling, a framework that
uses scheduling constraints in order to simplify the verification problem
and enable automated verification of programs which could not be han-
dled with fully non-deterministic scheduling. Program executions are safe
as long as the same scheduling constraints are enforced under which the
program has been verified, e.g., by instrumenting a program with ad-
ditional synchronization. As strict enforcement of scheduling constraints
may induce a high execution time overhead, we present optimizations over
a naive solution that reduce this overhead. Our evaluation of a prototype
implementation on well-known benchmark programs shows the effect of
scheduling constraints on the execution time overhead and how this over-
head can be reduced by relaxing and choosing constraints.
1 Introduction
Concurrent programs with non-deterministic scheduling may show exponentially
large state spaces which is a hurdle for automated verification such as model
checking [Val96]. In contrast to testing, verification covers all possible pro-
gram behavior, including different behavior due to non-deterministic schedul-
ing. In order to reduce the complexity of the verification task, model check-
ing techniques such as partial order reduction (POR) [CGMP99,God96,FG05]
exist that reduce a concurrent program’s state space. Although recent POR
algorithms [WKO13,AAJS14,AAdlB+17,SRDK17] show considerable improvements,
even reduced state spaces may be of exponential size [God96] and
state-of-the-art tools for automated verification may need a long time to
verify a program. In other cases, state-of-the-art tools may not terminate
at all for a given program. Even if it is possible to manually rewrite such
a program to make it tractable for a particular tool, such a process would be
time-consuming. Overall, the delay that is introduced by verification between completion of
the implementation of a concurrent program and deployment, the verification
delay, is presumably still unacceptably high for a wide industrial adoption of
verification.
Iteratively Relaxed Scheduling (IRS) [MSBS17] is a framework for verification
of safety properties on concurrent programs under non-deterministic schedul-
ing. It enables a reduction and adjustment of the verification delay by enforcing
scheduling constraints that ensure a safe program execution even when only a
fraction of the state space has been proven safe. By iteratively using
intermediate verification results, scheduling constraints can be relaxed to gradually
increase the amount of non-determinism of scheduling.
Permitting a safe execution of a program before complete verification has
finished is a key novelty of IRS and distinguishes it from other verification tech-
niques that use intermediate verification results. We are aware of two classes
of approaches to restricting scheduling of programs, which is also necessary in
IRS to ensure that only correct executions may occur. (A) Deterministic multi-
threading (DMT) and related techniques facilitate concurrency testing by re-
ducing the amount of non-determinism due to concurrency, thereby reducing the
number of necessary test cases and improving reproducibility of errors. Schedul-
ing of concurrent programs is restricted such that scheduling is deterministic for
a particular input or only a reduced set of schedules may occur for a particu-
lar input [LCB11,CSL+13]. (B) Another approach to reduce non-determinism
is to synthesize synchronization statements, which allows scheduling to be
restricted independently of inputs, in contrast to (A). Representatives of this
class are automated fence insertion, which has been applied to facilitate
verification of concurrent programs with relaxed memory models [BM08,FLM03],
and synchronization synthesis such as [GHR+15], which inserts locks and other
synchronization primitives that are more powerful than fences in that they can
also eliminate scheduler-related non-determinism. Methods in (A) depend on
concrete inputs and
are therefore not suited for verification, as a change in the program inputs may
have an effect on scheduling that is unforeseeable for the verifier. The effective-
ness of (B) is limited as only non-determinism due to relaxed memory accesses is
removed (in case of fence insertion) or deadlocks may be introduced [GHR+15],
which limits the program’s functionality.
Compared to such existing approaches, IRS shows several differences that
make it a candidate for more effective reduction of the verification delay. IRS
uses intermediate verification results to record and iteratively increase knowl-
edge about schedules that adhere to a given program specification (we denote
such schedules as correct). Such schedules are guaranteed to show correct be-
havior independently of program inputs. IRS applies scheduling constraints in
order to enforce that only correct schedules occur. As more schedules are ex-
plored during verification and thus known to be correct, scheduling constraints
can be relaxed. In contrast to (A), determinism can be enforced independently
of program inputs and the amount of non-determinism is controllable dynami-
cally (during program execution) via scheduling constraints. Additionally, un-
like some techniques in (A), IRS is able to provide strong determinism [OAA09].
In contrast to (B), IRS addresses scheduler-related non-determinism and makes it
possible to adjust the amount of non-determinism between fully deterministic and fully
non-deterministic executions.
Common to all the aforementioned techniques is execution time overhead
caused by additional synchronization. A central assumption underlying the fea-
sibility of IRS is that scheduling constraints can be designed such that the exe-
cution time overhead drops below an acceptable threshold. This paper validates
this assumption for different classes of benchmark programs. Hence, IRS makes it
possible to find a sweet spot between a low verification overhead and a low execution
time overhead.
In order to iteratively relax scheduling constraints, it is necessary to col-
lect knowledge about correct schedules already during the verification process
rather than monolithically at the end. In contrast to existing model check-
ing approaches that use intermediate verification results, IRS comprises addi-
tional requirements: an intermediate verification result proves safety under some
scheduling constraints. These constraints must permit full use of the program,
i.e., no restrictions on the program inputs or the execution length may be ap-
plied. This paper provides a formal interface between IRS and a model checker
that describes this requirement for finite executions. We leave the formalization
of requirements for infinite (non-terminating) executions for future work.
Contributions. We provide a formal framework and algorithm for IRS that
(1) states requirements for verification algorithms on their intermediate results
and (2) enables a correct execution of a program provided with such an inter-
mediate verification result, without introducing deadlocks. (3) We evaluate the
effect of relaxing scheduling constraints on execution time overhead, based on an
optimized scheduling implementation of IRS. Our results show that execution
time overhead depends not only on the number of scheduling constraints but
on their structure as well. After verifying only 1% of a program’s Mazurkiewicz
traces, execution time overhead of IRS can be reduced below 50%.
2 Overview
Iteratively relaxed scheduling (IRS) is an iterative verification approach for con-
current programs with non-deterministic scheduling, in contrast to the con-
ventional, monolithic approach to program verification. This section reviews
the basic IRS approach that we have previously presented [MSBS17]. Detailed
definitions are provided in Section 3. The conventional approach to program
verification can be described as follows: (1) Develop a program or update an
existing program. (2) Verify the (updated) program. (3) Repeat steps (1)–(2)
until the verification is successful. (4) The (updated) program can be safely
used under a non-deterministic scheduler, i.e., with all feasible schedules. Since
all feasible schedules have to be verified, the verification step may be slow or
may not even terminate, in which case a different verification technique or tool
would have to be tried or the program would have to be rewritten such that
it is tractable by the used verification technique. In any of these cases, a large
verification delay is introduced.
By constraining the scheduler and thereby reducing the number of feasible
schedules, IRS tries to reduce the verification delay. A program can be safely
used as soon as a sufficiently large initial set of safe schedules is found. An
IRS execution environment ensures that only safe schedules are feasible. Safe
schedules that are found later can be used to iteratively relax the scheduling
constraints. In particular, the verification approach of IRS is: (1) Develop a
program or update an existing program. (2) Start an iterative verification pro-
cess which verifies in each iteration an individual schedule or a set of schedules.
(3) As soon as a sufficient subset of safe schedules is found, the program can be
safely used inside an IRS execution environment. (4) New safe schedules that
are found in subsequent iterations may relax the scheduling constraints during
execution of the program.
The set of initially found safe schedules must enable a program execution
without restricting functionality. Hence, for each possible input, a safe schedule
must be available. Additionally, a specific use case may require additional ini-
tially safe schedules, for example to allow a practical enforcement of scheduling
constraints.
Initial experiments have shown that constraining scheduling may introduce
considerable execution time overhead [MSBS17]. Only if it is possible to reduce
the execution time overhead by relaxing scheduling constraints can the overhead
incurred by IRS be adjusted: the more schedules are verified, the less overhead
will occur. In this case, the sweet spot between a short verification delay
and a small execution time overhead can be found by continuously testing the
execution time overhead with the current set of schedules found to be safe. As
soon as the execution time overhead is small enough (i.e., a “sufficient amount
of non-determinism” is used), the program can be used and verification can be
stopped (i.e., no more than the “necessary amount of non-determinism” is used).
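The search for this sweet spot can be pictured as a simple loop. The following Python sketch is only an illustration under stated assumptions: `verify_next_schedule` and `measure_overhead` are hypothetical callbacks that stand in for one iteration of the verifier and for an overhead measurement of the program under the current constraints. Neither name is taken from the IRS prototype.

```python
def relax_until_acceptable(verify_next_schedule, measure_overhead, threshold):
    """Collect safe schedules until the measured overhead is acceptable.

    verify_next_schedule: hypothetical callback, returns a newly verified
        safe schedule or None once the verifier has nothing left to report.
    measure_overhead: hypothetical callback, returns the execution time
        overhead observed when only the given safe schedules are permitted.
    """
    safe_schedules = []
    while True:
        # Test the overhead with the schedules known to be safe so far.
        if safe_schedules and measure_overhead(safe_schedules) <= threshold:
            return safe_schedules, True    # sweet spot reached, stop verifying
        schedule = verify_next_schedule()
        if schedule is None:
            return safe_schedules, False   # verifier finished, overhead still too high
        safe_schedules.append(schedule)
```

With a toy overhead model such as `lambda s: 4.0 / len(s)` and a threshold of 0.5, the loop stops as soon as eight schedules have been collected; the model itself is made up for illustration and says nothing about real programs.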
If a program shows a sufficient set of schedules to use it without restricting
functionality and additionally shows schedules that violate the specification, IRS
can be used to nevertheless safely execute the program. The program may be
fixed, in which case it may be possible to eventually remove all scheduling con-
straints, or it may be left unchanged and used only with scheduling constraints
that guarantee safety.
Several conceivable use cases are given in [MSBS17], including (1) safely
using programs that are not tractable by conventional verification, (2) safely
using programs with bugs, (3) verifying as many schedules as possible within a
given time budget, and (4) verifying as many schedules as necessary for a given
budget of execution time performance.
An IRS execution environment may be realized inside an application program
or by modifying the operating system. For example, in the former case, the
program may be instrumented so that a thread waits before memory accesses
that are not yet permitted to occur, according to the scheduling constraints.
Even if the scheduler of the operating system is non-deterministic, the scheduling
constraints are enforced. In the latter case, it is conceivable to directly constrain
the scheduler of the operating system to obtain an IRS execution environment
and enforce schedules.
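The instrumentation-based variant can be sketched in a few lines of Python. This is a minimal illustration, not the prototype's implementation: the class `ScheduleEnforcer`, its methods `before` and `after`, and the `must_follow` map are hypothetical names, and a list append stands in for the instrumented memory access.

```python
import threading

class ScheduleEnforcer:
    """Blocks an event until all events it has to follow have occurred."""

    def __init__(self, must_follow):
        # must_follow (assumed encoding): event -> set of events that must occur first
        self._must_follow = must_follow
        self._done = set()
        self._cond = threading.Condition()

    def before(self, event):
        # Called by the instrumented thread right before the memory access.
        with self._cond:
            self._cond.wait_for(
                lambda: self._must_follow.get(event, set()) <= self._done)

    def after(self, event):
        # Called right after the memory access; wakes up waiting threads.
        with self._cond:
            self._done.add(event)
            self._cond.notify_all()

def run_demo():
    # Single constraint: event (T2, 0) must follow event (T1, 1).
    enforcer = ScheduleEnforcer({("T2", 0): {("T1", 1)}})
    log = []

    def worker(thread, n_events):
        for k in range(n_events):
            event = (thread, k)
            enforcer.before(event)
            log.append(event)      # stands in for the instrumented memory access
            enforcer.after(event)

    threads = [threading.Thread(target=worker, args=("T2", 2)),
               threading.Thread(target=worker, args=("T1", 3))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Regardless of how the operating system schedules the two threads, the returned log lists (T1, 1) before (T2, 0). The sketch synchronizes on every event; it does not attempt the optimizations discussed later in the paper.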
Unlike previous approaches to deterministically execute concurrent programs
(e.g., [LCB11,CSL+13,CWG+11]), IRS provides a novel approach to constrain
scheduling independently of program inputs. This independence makes it
compatible with program verification. However, only verification techniques
that yield meaningful intermediate verification results are suitable for IRS.
Meaningful intermediate verification results either show a counterexample to
program correctness or guarantee correctness under some feasible scheduling
constraints. No additional constraints should be necessary, such as constraints on
program inputs or execution length, as a program may not be fully operational
under such constraints. In particular, a correct schedule has to be known for
each possible program input, even if inputs are given interactively (during a
program execution).
Intermediate verification results of this form have previously not been pro-
posed, to the best of our knowledge. Nevertheless, some existing verification
approaches show similarities or potential to be applied in the framework of IRS.
Forward search-based, symbolic model checking, such as the Impact algorithm
for concurrent programs by Wachter, Kroening, and Ouaknine [WKO13], seems
to be suitable to generate intermediate verification results for IRS, by using a
depth-first strategy. This strategy might use heuristics that prioritize executions
according to their number of context switches, as it is done in context bound-
ing analyses, e.g., by Qadeer and Rehof [QR05] or in iterative context bounding
by Musuvathi and Qadeer [MQ07]. Furthermore, verification in IRS can parti-
tion the state space of a program into disjoint sets of schedules and verify each
partition individually. A similar approach has been followed for bug finding by
Nguyen et al. [NSF+17]. Conditional model checking by Beyer et al. [BHKW12]
could be used in the framework of IRS if conditions passed between verification
runs encode suitable intermediate results.
Mazurkiewicz traces are equivalence classes on program executions and are
used in the context of partial order reduction (POR) [GKW+15,WKO13,FG05,
GFYS07, AAJS14] to identify executions that can be skipped during verifica-
tion. In order to formalize suitable intermediate verification results, IRS uses an
extended notion of Mazurkiewicz traces, called symbolic traces. Based on sym-
bolic traces, we formalize the requirements on a verifier for IRS in the following
section.
As in the related field of DMT, additional synchronization is necessary
to enforce scheduling constraints. Our experiments confirm that constraining
scheduling introduces a considerable execution time overhead, in extreme cases
a 44-fold slowdown. A main concern for the practicality of IRS is to limit this
overhead depending on the requirements of a use case. We try to design IRS with
a low overhead by addressing several aspects: the amount of additional synchro-
nization for schedule enforcement, storage and look-up of scheduling constraints,
and the effect of relaxing constraints on the execution time overhead.
Section 3 addresses the former aspects: reducing the amount of additional
synchronization is achieved by permitting threads to execute as long as possible
without interruption. For an efficient encoding of scheduling constraints, we
introduce trace prefixes, based on symbolic traces. Section 5 investigates the latter
aspect of execution time overhead due to scheduling constraints.
3 The IRS Algorithm
An IRS algorithm consists of a verifier and an execution environment, which
run concurrently. The verifier continuously verifies schedules and reports sets
of safe schedules to the execution environment. In order to guarantee program
executions without functional restrictions, submitted sets of safe schedules must
represent feasible schedules for all program inputs. This restriction on suitable
intermediate verification results is formalized below as symbolic traces. A
symbolic trace permits at least one schedule for each possible program input.
A symbolic trace that is reported by the verifier to the execution environment
is called an admissible trace. The execution environment maintains scheduling
constraints, which are updated for each submitted admissible trace.
When implementing IRS, it is desirable to efficiently maintain and enforce
scheduling constraints in order to incur as little overhead as feasible over con-
ventional program execution. This section presents a detailed IRS algorithm
and discusses its efficiency with respect to:
• Storing scheduling constraints
• Looking up scheduling constraints
• Synchronization between threads for the enforcement of scheduling con-
straints
The general IRS algorithm [MSBS17] maintains a set of admissible traces
and controls the scheduling of a given program such that at any time, the cur-
rent partial execution adheres to some admissible trace. As more and more
schedules or symbolic traces are proven to be correct, they are added to the
set of admissible traces. This representation of scheduling constraints has an
exponential space requirement and it seems impractical to store all symbolic
traces for large programs. Similarly, when permission for an event is checked,
the look-up time is exponential if no further structure is given to the set of
admissible traces. Unfoldings have been applied for model checking both Petri
nets [McM92] and concurrent programs [KSH12,RSSK15,SRDK17]. By unfold-
ings, it is possible to represent all executions of a concurrent program in a single
data structure, which is more space-efficient than storing a set of all symbolic
traces since each event occurs only once in an unfolding. Looking up an event in
an unfolding is faster than searching in an unstructured set of symbolic traces,
as well. However, the size of an unfolding can still grow quickly (exponentially
in the worst case) with an increasing number of threads [KSH14]. The space
efficiency of verification based on a depth-first search is lost. Hence, unfoldings
are not directly suitable to store scheduling constraints for practical programs.
In order to implement IRS, we address the problem of space complexity by
using trace prefixes. If all admissible Mazurkiewicz traces or executions are
stored in order to express scheduling constraints, more space is required each
time a new execution has been verified and is permitted. In contrast, trace
prefixes can be used as scheduling constraints such that when new executions
are permitted, constraints may be removed and less space is required.
However, the use of trace prefixes requires the verifier to explore symbolic traces in
a depth-first manner. More freedom can be given to the verifier by extending
trace prefixes to partial unfoldings, at the price of a higher space requirement.
Additionally, an interesting question, left for future work, is how to
generalize scheduling constraints for non-terminating programs, for example by
representing scheduling constraints via cyclic graphs or automata.
Our tests of several IRS implementations confirmed that as expected, inter-
thread synchronization incurs a major part of execution time overhead of IRS
over unconstrained scheduling. In order to reduce the execution time overhead
caused by synchronization between threads, it is crucial to omit such
synchronization whenever an event need not be scheduled after an event from
another thread. The IRS algorithm presented in this section achieves this by
executing several events without intermediate synchronization, as is detailed below.
Besides reducing the amount of inter-thread synchronization, execution time
overhead can be considerably reduced by reducing the duration of a single syn-
chronization, for example by using lock-free synchronization instead of locks.
We discuss this matter in Section 4.
In the following, we state our system model and present the IRS algorithm,
proving correctness and progress of the algorithm.
3.1 System Model
We model a (concurrent) program P as a transition system $(S, S_{\mathit{init}}, \Sigma, \rightarrow)$
where $S$ is a set of states, $S_{\mathit{init}} \subseteq S$ is a set of initial states (program inputs), $\Sigma$
is a finite set of threads, and $\rightarrow \subseteq (S \times \Sigma) \rightharpoonup S$ is an acyclic transition relation.
We require that for a given state and thread, there is at most one successor
state, i.e., scheduling is the only source of non-determinism. We write
$s_1 \xrightarrow{t} s_2$ to denote $(s_1, t, s_2) \in \rightarrow$.
An execution of P is an initial state and a sequence of events
$(s_0, u) \in S_{\mathit{init}} \times (\Sigma \times \mathbb{N})^*$, where $u = e_1 \ldots e_n$, such that the following holds. (1) There
exist states and transitions such that $s_0 \xrightarrow{t_1} s_1 \cdots \xrightarrow{t_n} s_n$ and (2) each event
$e_i = (t_i, k)$ contains thread $t_i$ and counts occurrences of $t_i$ in $u$ before position
$i$: $k = |\{e_j : j < i \wedge e_j = (t_i, \_)\}|$. We denote the thread $t$ of an event $e = (t, k)$
by $\mathit{tid}(e)$. We write $s_0 \xrightarrow{u} s_n$ if execution $(s_0, u)$ leads to state $s_n$. An execution
$(s_0, u)$ is complete, written $\Downarrow(s_0, u)$, if it leads to a terminal state and otherwise
is partial. The set of all executions of P is denoted by executions(P ).
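As a small illustration of the event numbering in condition (2), the following Python sketch turns a sequence of thread identifiers into events (t, k). The function names `to_events` and `tid` are ours and only mirror the notation above; representing threads as strings is an assumption made for readability.

```python
from collections import defaultdict

def to_events(thread_sequence):
    """Turn a sequence of thread ids into events (t, k), where k counts
    how often thread t already occurred earlier in the sequence."""
    seen = defaultdict(int)
    events = []
    for t in thread_sequence:
        events.append((t, seen[t]))
        seen[t] += 1
    return events

def tid(event):
    """Return the thread t of an event e = (t, k)."""
    return event[0]
```

For example, the thread sequence T1, T2, T1 yields the events (T1, 0), (T2, 0), (T1, 1), so that each event is unique within an execution.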
We assume that a dependency relation is given for every program P that gives
rise to a Mazurkiewicz equivalence relation on executions [Maz86,AAJS14] via a
happens-before relation between events. We extend the notion of Mazurkiewicz
traces by symbolic traces, defined below. The happens-before relation of one
or more executions is represented by a symbolic trace graph, a triple
$o = (E_o, C_o, \rightarrow_o)$ such that
• $E_o \subseteq (\Sigma \times \mathbb{N})$ is a set of events,
• $C_o$ is a set of path constraints (state predicates, e.g., collected during
model checking), and
• $\rightarrow_o \subseteq E_o \times C_o \times E_o$ is a partial order labeled with path constraints, which
expresses a happens-before relation.
As an auxiliary function, we introduce remove(e, o), which removes event e
from symbolic trace graph o. Formally, $\mathit{remove}(e, (E_o, C_o, \rightarrow_o)) = (E'_o, C'_o, \rightarrow'_o)$
such that
• $E'_o = E_o \setminus \{e\}$,
• $\rightarrow'_o = \{(e_1, c, e_2) \in \rightarrow_o : e_1 \neq e \wedge e_2 \neq e\}$, and
• $C'_o = \{c : (\_, c, \_) \in \rightarrow'_o\}$.
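An execution $(s_0, u)$ with $u = e_1 \ldots e_n$ adheres to the happens-before relation of a symbolic trace
graph $o = (E_o, C_o, \rightarrow_o)$, written $(s_0, u) \preccurlyeq o$, if $u$ is empty, or
• $e_1 \in E_o$,
• $\forall (e, c, e') \in \rightarrow_o.\ e' = e_1 \Rightarrow s_0 \nvDash c$, and
• $(s_1, e_2 \ldots e_n) \preccurlyeq \mathit{remove}(e_1, o)$, where $s_1$ is the successor state of $s_0$ after $e_1$.
If $u$ additionally contains exactly the events of $E_o$ ($E_o = \{e : e \in u\}$), we write
$(s_0, u) \approx o$. Execution $(s_0, u)$ is then called a linearization of $o$.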
Based on symbolic trace graphs and their correspondence to happens-before
relations of executions, we define (symbolic) traces, as a generalization
of Mazurkiewicz traces. Intuitively, a symbolic trace contains scheduling
information for all possible program inputs and represents all executions of a
program with matching scheduling.
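Definition 1. A (symbolic) trace is a symbolic trace graph $o$ such that
$\forall s_0 \in S_{\mathit{init}}.\ \exists u.\ (s_0, u) \in \mathit{executions}(P) \wedge \Downarrow(s_0, u) \wedge (s_0, u) \approx o$.
A trace prefix $o$ is a trace, except that we do not require executions to be
complete: $\forall s_0 \in S_{\mathit{init}}.\ \exists u.\ (s_0, u) \in \mathit{executions}(P) \wedge (s_0, u) \preccurlyeq o$.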
1  T1:
2    input: int *x
3    local: int a
4    a := *x
5    *x := a + 1
6    assert *x == a + 1
7  T2:
8    input: int *y
9    local: int b
10   b := *y
11   *y := b + 1
Figure 1: Example program.
[Figure 2 (graph): nodes (T1, 0): read x, (T1, 1): write x, (T1, 2): read x,
(T2, 0): read y, (T2, 1): write y; two edges between events of T1 and T2,
each labeled [x==y].]
Figure 2: Example symbolic trace for the program of Figure 1. In all executions
that adhere to this trace, the assertion in line 6 of thread T1 is not violated.
(Transitive edges are omitted.)
Example 1. A symbolic trace for the program of Figure 1 is given in Figure 2.
The program consists of two threads, T1 and T2. Each thread is given a pointer
as input and increments the value at the pointer’s target, mistakenly without
synchronization. Thread T1 asserts that the target of x indeed holds the intended
value. In case the pointers x and y point to different memory locations, the
threads do not interfere with each other and the assertion holds. Otherwise,
dependent accesses occur and the assertion does not hold under every possible
ordering of events. The symbolic trace in Figure 2 ensures that the assertion
holds in all executions that adhere to the trace. Nodes correspond to events and
are labeled with the corresponding memory access for clarity. Edges between
events of the same thread represent the thread’s program order; edges between
events of different threads represent scheduling constraints. Since dependencies
between T1 and T2 occur only if the pointer targets match, x==y, the scheduling
constraints are labeled with this condition.
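The effect of aliasing in Example 1 can be reproduced by brute force. The following Python sketch enumerates all interleavings of the five events of Figure 1 and replays the program for each. It is an illustration under stated assumptions: memory is modelled as a dictionary with initial value 0, and the scheduling constraint used at the end ((T1, 2) before (T2, 1)) is one hypothetical choice that excludes the violation, not necessarily the constraint drawn in Figure 2.

```python
from itertools import combinations

T1 = [("T1", 0), ("T1", 1), ("T1", 2)]   # read x, write x, read x (assert)
T2 = [("T2", 0), ("T2", 1)]              # read y, write y

def interleavings(a, b):
    """All merges of a and b that preserve each thread's program order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]

def assertion_holds(schedule, aliased):
    """Replay the program of Figure 1 under the given schedule."""
    mem = {"x": 0, "y": 0}
    x = "x"
    y = "x" if aliased else "y"          # aliased: both pointers target the same cell
    a = b = None
    for event in schedule:
        if event == ("T1", 0):
            a = mem[x]
        elif event == ("T1", 1):
            mem[x] = a + 1
        elif event == ("T1", 2):
            if mem[x] != a + 1:
                return False             # assertion in line 6 violated
        elif event == ("T2", 0):
            b = mem[y]
        elif event == ("T2", 1):
            mem[y] = b + 1
    return True

def respects(schedule, first, second):
    """True if event `first` occurs before event `second` in the schedule."""
    return schedule.index(first) < schedule.index(second)

all_schedules = list(interleavings(T1, T2))
violating = [s for s in all_schedules if not assertion_holds(s, aliased=True)]
# Hypothetical constraint for the case x == y: (T1, 2) happens before (T2, 1).
constrained = [s for s in all_schedules if respects(s, ("T1", 2), ("T2", 1))]
```

In this model, one of the ten interleavings violates the assertion when x and y alias, namely the one in which both events of T2 fall between the write and the final read of T1. None of the four schedules that satisfy the hypothetical constraint violates it, and without aliasing no schedule does.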
If a trace graph is used to encode scheduling constraints, $(s_0, u) \preccurlyeq o$ can be
checked as follows: $(s_0, u) \preccurlyeq o$ holds if $u$ is empty. Intuitively, an empty $u$
does not contain any events that can violate any constraint given
by $o$. If $u$ is not empty, $(s_0, u) \preccurlyeq o$ is satisfied if the first event $e_1$ of $u$ occurs in $o$
without an incoming edge whose path constraint is satisfied in $s_0$ and, recursively,
$(s', u') \preccurlyeq o'$ holds, where $s'$ is the successor state of $s_0$ after $e_1$, $u'$ is $u$ with the
first event removed, and $o'$ is $o$ with $e_1$ and all adjacent edges removed.
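The check just described translates almost directly into code. The sketch below is a minimal Python rendering under our own assumptions: a trace graph is a pair (events, edges), each edge is a triple (source, constraint, target) whose constraint is a Python predicate on states, and `step` is a hypothetical function that returns the successor state for a given state and thread. The set of path constraints is left implicit because it can be derived from the edges.

```python
def remove(e, graph):
    """Remove event e and all adjacent edges from a trace graph."""
    events, edges = graph
    remaining = {(a, c, b) for (a, c, b) in edges if a != e and b != e}
    return (events - {e}, remaining)

def adheres(state, u, graph, step):
    """Check whether execution (state, u) adheres to the trace graph.

    step(state, thread) is an assumed helper returning the successor state.
    """
    for e in u:
        events, edges = graph
        if e not in events:
            return False
        # e must not have an incoming edge whose constraint holds in `state`,
        # i.e. no predecessor that still has to be executed first.
        if any(c(state) for (_, c, target) in edges if target == e):
            return False
        graph = remove(e, graph)
        state = step(state, e[0])
    return True                          # the empty remainder adheres trivially

# Example graph for the program of Figure 1. The inter-thread edge is a
# hypothetical choice, labeled with the path constraint x == y.
always = lambda s: True
aliased = lambda s: s["x_eq_y"]
EVENTS = {("T1", 0), ("T1", 1), ("T1", 2), ("T2", 0), ("T2", 1)}
EDGES = {
    (("T1", 0), always, ("T1", 1)),      # program order of T1
    (("T1", 1), always, ("T1", 2)),
    (("T2", 0), always, ("T2", 1)),      # program order of T2
    (("T1", 2), aliased, ("T2", 1)),     # scheduling constraint under x == y
}
GRAPH = (EVENTS, EDGES)
keep_state = lambda s, t: s              # the constraint x == y never changes
```

In this encoding, an execution in which (T2, 1) occurs before (T1, 2) is rejected when the initial state satisfies x == y and accepted otherwise.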
Algorithm 1: IRS with trace prefixes and execution of sequences without
synchronization
Data: oadm – the admissible trace prefix, initially an arbitrary complete,
      correct trace of program P
1  Verifier:
2    initialize internal verification status G such that safeG(oadm);
3    while not finishedG do
4      do next verification step and update G;
5      if ∃o′ < oadm . safeG(o′) then
6        oadm ← o′;
7  Execution environment:
8    set the current execution (s0, u) to the current program input and
     empty sequence;
9    while P has not terminated do
10     choose some sequence v from free(s0, u, oadm);
11     execute v;
12     append v to u;
3.2 Algorithm
The general IRS algorithm [MSBS17] requires a synchronization between indi-
vidual threads and the IRS execution environment after each event in order to
check compliance of the current execution with a previously verified trace. Ad-
ditionally, it stores all current admissible traces explicitly, which increases space
requirements and look-up times as the verification advances. With Algorithm 1,
we present an IRS algorithm that can be efficiently implemented. It addresses
both previously described issues by the use of trace prefixes as scheduling con-
straints and allowing threads to run uninterrupted for multiple memory events
whenever scheduling constraints do not require synchronization.
In order to simplify the presentation, it is assumed that the IRS execution
environment enforces sequential consistency independently from scheduling con-
straints. For platforms where this incurs a considerable slowdown, scheduling of
events of the same thread can be relaxed by considering intra-thread scheduling
constraints.
Do Not Synchronize Already Reversed Races. By using trace prefixes
as scheduling constraints, it is possible to avoid synchronization before events
when every possible continuation of the current execution is proven to be error-
free. The corresponding part in an admissible trace does not have to be enforced
and scheduling constraints can be removed.
Instead of managing a set of admissible traces, Algorithm 1 uses a single
trace as the current admissible trace prefix. Every event that occurs in this
prefix has to be executed according to its partial order; every additional
event, however, may be executed without synchronization. Once the verifier has collected
enough information about correct executions of the program, the admissible
trace prefix is updated. In order to prevent unnecessary assumptions on the
verifier, we do not require the use of a specific data structure such as a state
graph. Instead, we only require that the verifier maintains an internal state G
that contains information on safe parts of the state space.
Definition 2. Given a program P and a state predicate error_free() (induced
by the property to be verified), a verifier maintains an internal state G and
provides predicates $\mathit{safe}_G()$ and $\mathit{finished}_G$ such that
$\forall s.\ \mathit{safe}_G(s) \Rightarrow \forall u.\ \forall s'.\ s \xrightarrow{u} s' \Rightarrow \mathit{error\_free}(s')$
and $\mathit{finished}_G$ holds when verification has finished.
In other words, a correct verifier guarantees safety for all (partial) executions
from a state $s$ whenever $\mathit{safe}_G(s)$ holds. We use a derived definition for safety of
trace prefixes $o$ that guarantees safety for all executions satisfying $o$: $\mathit{safe}_G(o)$
holds if $\forall s_0 \in S_{\mathit{init}}.\ \forall s'.\ \forall u.\ s_0 \xrightarrow{u} s' \Rightarrow ((u \approx o \Rightarrow \mathit{safe}_G(s')) \wedge (u \preccurlyeq o \Rightarrow \mathit{error\_free}(s')))$.
An implementation of the verifier may, for example, use an
abstract reachability tree [WKO13] to realize G.
The current admissible trace prefix $o_{adm}$ is updated by shortening it, i.e.,
by removing constraints at the end of its happens-before order. Formally, a
new admissible trace prefix $o'$ is required to satisfy $o' < o_{adm}$, which is defined
as: $(E_1, C_1, \rightarrow_1) < (E_2, C_2, \rightarrow_2)$ if
$E_1 \subsetneq E_2 \wedge C_1 = \{c : (\_, c, \_) \in \rightarrow_1\} \wedge \rightarrow_1 = \{(e_1, c, e_2) \in \rightarrow_2 : e_1, e_2 \in E_1\} \wedge \forall e_1 \rightarrow_2 e_2.\ e_1 \notin E_1 \Rightarrow e_2 \notin E_1$.
On a more abstract level, the verifier finds an initial, complete, and correct trace $o$ of the
program and generates a sequence $o > o_1 > \ldots > o_n$ of subsequent trace prefixes
such that for all $1 \leq i \leq n$, $\mathit{safe}_G(o_i)$.
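The shortening relation can be checked mechanically. The following Python sketch tests the three structural conditions of the definition for graphs given as pairs (events, edges), where each edge is a triple (source, constraint, target). The function name `is_shorter_prefix` and the use of strings as constraint labels are assumptions for illustration. The condition on the constraint set is omitted because the sketch derives constraints from the edges.

```python
def is_shorter_prefix(g1, g2):
    """Check whether g1 < g2 holds for trace graphs given as (events, edges)."""
    events1, edges1 = g1
    events2, edges2 = g2
    # E1 is a proper subset of E2.
    proper_subset = events1 < events2
    # The edges of g1 are exactly the edges of g2 between events of E1.
    restricted = edges1 == {(a, c, b) for (a, c, b) in edges2
                            if a in events1 and b in events1}
    # Only events at the end of the happens-before order may be dropped:
    # if an event is kept, all of its predecessors in g2 are kept as well.
    downward_closed = all(a in events1 for (a, _, b) in edges2 if b in events1)
    return proper_subset and restricted and downward_closed
```

For a chain a → b → c, the graph restricted to {a, b} is a valid shortening, whereas the graph restricted to {b, c} is not, because it keeps b while dropping its predecessor a.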
A verifier can update a trace prefix o as follows. Each edge (e1, e2) with
$\mathit{tid}(e_1) \neq \mathit{tid}(e_2)$ in o is interpreted as a scheduling constraint that requires e2 to
be executed after e1. Updates of trace prefixes remove scheduling constraints.
Let o′ be o with e1, e2, and all their successors (w.r.t. the happens-before
relation) removed. It is safe to remove the scheduling constraint (e1, e2) if
all states s that are reachable by a linearization of o′ are safe, i.e., safeG(s).
Depending on the verification approach used, it may be more efficient to delay
the removal of (e1, e2) until it occurs at an end of o, w.r.t. the happens-before
relation, i.e., no event happens after e2 that has an incoming or outgoing edge
with an event from another thread.
In the worst case, even if scheduling constraint (e1, e2) is at the end of a
trace prefix, the verifier has to prove safety for exponentially many states before
(e1, e2) can be safely removed. On the one hand, this complexity is a general
limitation of IRS. On the other hand, the duty of the verifier can be reduced
exponentially by adding only one scheduling constraint, which may reduce the
verification delay considerably.
Do Not Preempt Minimal Events. In addition to the use of trace
prefixes, Algorithm 1 omits synchronization before events that do not have to
occur second in a race, i.e., events that do not have a predecessor in $o_{adm}$ from
a different thread: $\{e \in o : \forall e' \in o.\ e' \xrightarrow{c}_o e \Rightarrow \mathit{tid}(e') = \mathit{tid}(e)\}$.
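The set of events that need no synchronization can be read off the trace graph directly. The Python sketch below computes it for a graph given as a set of events (thread, k) and a set of edges (source, constraint, target). The function name `sync_free_events` and the example edge between T1 and T2 are hypothetical.

```python
def sync_free_events(events, edges):
    """Events whose predecessors in the graph all belong to the same thread."""
    return {e for e in events
            if all(src[0] == e[0] for (src, _, dst) in edges if dst == e)}

# Example: the events of Figure 1 with one assumed inter-thread constraint.
EVENTS = {("T1", 0), ("T1", 1), ("T1", 2), ("T2", 0), ("T2", 1)}
EDGES = {
    (("T1", 0), "true", ("T1", 1)),
    (("T1", 1), "true", ("T1", 2)),
    (("T2", 0), "true", ("T2", 1)),
    (("T1", 2), "x==y", ("T2", 1)),      # hypothetical scheduling constraint
}
```

In this example only (T2, 1) has a predecessor from another thread, so it is the only event before which a thread has to synchronize.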
The execution environment of Algorithm 1 reduces the number of synchro-
nizations by permitting a sequence of events, potentially from multiple threads,
between two synchronizations. This sequence is chosen from the set free(s0, u, o)
as a continuation of the current execution (s0, u) that adheres to o or contains
only synchronization-free events.
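Definition 3. Given an execution $(s_0, u)$, the set of
synchronization-free events, $\mathit{free}(s_0, u, o)$, is defined as
$\mathit{free}(s_0, u, o) := \{v = e_1 \ldots e_n \in (\Sigma \times \mathbb{N})^+ : \exists s.\ s_0 \xrightarrow{u \cdot v} s \wedge \forall 1 \leq i \leq n.\ (e_i \notin o \vee (s_0, u \cdot e_1 \ldots e_i) \preccurlyeq o)\}$.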
In Algorithm 1, verifier and execution environment are executed concur-
rently. The execution environment can be executed several times during a single
run of the verifier.
3.3 Correctness and Progress
An IRS algorithm is correct if only safe executions, i.e., executions that do not
violate the safety specification, can occur under its execution environment. The
following theorem provides correctness of Algorithm 1.
Theorem 1 (Correctness). Whenever an execution (s0, u) with u = e1 . . . en
and s0 −e1→ . . . −en→ sn has been executed by Algorithm 1, all visited states are
error-free, i.e., ∀0 ≤ i ≤ n. error_free(si).
Proof. Let (s0, u) with u = e1 . . . en and s0 −e1→ . . . −en→ sn be an execution
executed by Algorithm 1, let oadm be the final admissible trace prefix, and G the
final verifier state. By the definition of free(), there exists a prefix v = e1 . . . ek of
u such that v ≈ oadm . Algorithm 1 in line 5 ensures that safeG(oadm). By the
definition of safeG(), safeG(sk) and error_free(si) hold for all 0 ≤ i ≤ k. The
verifier guarantees that safeG(sk) implies error_free(si) for all k ≤ i ≤ n.
In addition to correctness, an important requirement is that a program is
never completely blocked by scheduling constraints (provided that at least one
correct execution exists). The following progress theorem guarantees that this
cannot happen with Algorithm 1.
Theorem 2 (Progress). Whenever an execution (s0, u) has been executed by
Algorithm 1 with verified state graph G and admissible trace prefix oadm , either
the program has terminated or free(s0, u, oadm) is not empty.
Proof. Assume that the program has not terminated. Let oadm be the initial
admissible trace prefix and let o′adm be the current admissible trace prefix. Case
(s0, u) ≼ o′adm and not (s0, u) ≈ o′adm : as oadm is a feasible trace and o′adm ≽
oadm (no constraints can be added), there exists some e ∈ o′adm such that
u · e ≼ oadm . By definition, e ∈ free(s0, u, o′adm). Case not (s0, u) ≼ o′adm : by
correctness, (s0, v) ≼ o′adm for some prefix v of u. Hence, for any e such that
(s0, v · e) is an execution, e ∈ free(s0, u, o′adm).
4 Implementation
We have implemented Algorithm 1 from Section 3 in our IRS proto-
type [MSBS17]. This prototype handles C and C++ programs translated
to LLVM-IR. The LLVM-IR code is instrumented via the LLVM compiler
infrastructure [ZZZa] in order to enforce an admissible trace prefix whenever
the program is executed. The IRS execution environment is realized completely
inside the instrumented application program and does not depend on any
modifications of the operating system or assumptions on the used scheduler.
Via a standard dependency analysis the prototype identifies all dependent
memory accesses, which are memory accesses that either directly access
global memory or may influence the result of another global memory access.
Scheduling constraints are enforced by callbacks directly before each dependent
memory access that check whether this memory access is currently permitted.
Callbacks directly after each dependent memory access communicate to other
threads that the memory access has been performed. Memory fences inside
these callbacks ensure sequential consistency, as assumed by our presentation in
Section 3.2. Before each instrumented memory access, a thread checks whether
an event of another thread has to occur before its own upcoming event via a
look-up in a global vector clock. Busy waiting is performed until the current
thread is permitted to continue. After the memory access, the callback signals
that the memory access is completed by updating the global vector clock. In
contrast to earlier versions of our prototype, no thread is added to the program.
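The callback scheme can be illustrated with a minimal Python sketch. It is not the LLVM-based prototype: the class `ScheduleEnforcer`, its method names, and the encoding of constraints as a dictionary are assumptions made for this example, and Python needs no explicit memory fences, whereas the instrumented C/C++ code does.

```python
import threading
import time

class ScheduleEnforcer:
    """Global vector clock: clock[t] counts the completed events of thread t."""

    def __init__(self, num_threads, constraints):
        self.clock = [0] * num_threads
        # (tid, event index) -> list of (other tid, other event index)
        # that must have completed before this event may run.
        self.constraints = constraints

    def before_access(self, tid):
        upcoming = self.clock[tid]
        for other_tid, other_idx in self.constraints.get((tid, upcoming), ()):
            # Busy waiting; sleep(0) merely yields the interpreter lock.
            while self.clock[other_tid] <= other_idx:
                time.sleep(0)

    def after_access(self, tid):
        # Only thread `tid` writes its own clock entry.
        self.clock[tid] += 1

def run_demo():
    # Event 0 of thread 1 must wait for event 0 of thread 0.
    enforcer = ScheduleEnforcer(2, {(1, 0): [(0, 0)]})
    log = []

    def worker(tid, delay):
        time.sleep(delay)            # thread 0 is deliberately late
        enforcer.before_access(tid)
        log.append(tid)              # stands in for the memory access
        enforcer.after_access(tid)

    threads = [threading.Thread(target=worker, args=(0, 0.05)),
               threading.Thread(target=worker, args=(1, 0.0))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Although thread 1 is ready first, `run_demo()` records thread 0 before thread 1, because thread 1 spins in `before_access` until the clock entry of thread 0 has advanced.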
When testing several alternatives of implementing schedule enforcement, we
observed that, as expected, lock-based implementations of waiting for other
threads’ events are much slower than busy waiting. A disadvantage of busy
waiting is CPU consumption during waiting, which can reduce performance when
more threads are active than hardware cores are available. We expect that
improvements over our current, simple scheme of busy-waiting for permissions can
be made by the use of a more advanced combination of busy waiting with
lock-based synchronization or scheduler interaction (e.g., the POSIX sched_yield()
system call). Additionally, we tested an implementation that uses a loadable
kernel module to communicate with the Linux scheduler. Whenever an event
is not yet permitted to be executed, the corresponding task’s state is set to
TASK_WAIT and only restored once the event is permitted. This design
circumvents the additional CPU consumption of busy waiting. However, additional
overhead appears because the current program counter of each thread has to be
communicated to the loadable kernel module. In our tests, this design showed
an advantage only if most events were constrained, i.e., the likelihood that an
event has to wait is high.
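One conceivable combination of busy waiting with lock-based synchronization is to spin for a bounded number of iterations and to block only afterwards. The following Python sketch illustrates that idea; it is not part of the prototype and has not been evaluated, and the class name `HybridGate` and the parameter `spin_limit` are hypothetical.

```python
import threading

class HybridGate:
    """Permission flag that is first polled and then waited on with a lock."""

    def __init__(self, spin_limit=1000):
        self._permitted = False
        self._cond = threading.Condition()
        self._spin_limit = spin_limit

    def wait(self):
        # Phase 1: bounded busy waiting, cheap if permission arrives soon.
        for _ in range(self._spin_limit):
            if self._permitted:
                return "spin"
        # Phase 2: block on a condition variable, which frees the CPU.
        with self._cond:
            while not self._permitted:
                self._cond.wait()
        return "block"

    def permit(self):
        with self._cond:
            self._permitted = True
            self._cond.notify_all()

def run_demo():
    gate = HybridGate(spin_limit=0)   # skip spinning to show the blocking path
    results = []
    waiter = threading.Thread(target=lambda: results.append(gate.wait()))
    waiter.start()
    gate.permit()
    waiter.join()
    return results
```

A suitable spin limit would have to be determined experimentally; the trade-off is the one described above, namely wasted CPU time while spinning versus the latency of lock-based wake-ups.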
5 Experimental Evaluation
Enforcing scheduling constraints in order to disable schedules outside of a given
admissible trace is likely to incur execution time overhead (here: simply
overhead) in comparison to plain program executions (without IRS). A crucial factor
for the applicability of IRS in practice is how scheduling constraints in IRS
influence this overhead, which we evaluate on several benchmark programs. The
main goal of this evaluation is to investigate whether, for a given admissible
trace and induced scheduling constraints, relaxing those constraints reduces the
overhead and, if this is the case, how fast. Additionally, we investigate whether
the selection of the initial and following admissible traces, i.e., the structure of
the admissible trace prefix, influences the overhead.
Figure 3: Execution time overhead of IRS relative to uninstrumented
benchmarks for decreasing numbers of scheduling constraints (two-threaded
benchmarks). [Plot: x-axis: scheduling constraints, from 100% to 0%; y-axis: relative
execution time, from 0.95 to 1.25; series: Bigshot, Dekker, Fibonacci, Lamport,
Peterson, Shared Pointer, uninstrumented.]
Setup. All experiments have been conducted with our IRS implementa-
tion described in Section 4. The hardware used is an Intel Core i5-6500 CPU
at 3.20GHz with four cores running Linux 4.8.0. Each benchmark is run with
and without instrumentation by our prototype. The instrumented version is run
in several configurations, with a decreasing amount of scheduling constraints.
The initial number of scheduling constraints and the number of scheduling con-
straints that can be removed in one step, and thereby the number of config-
urations per benchmark, vary as the number of conflicting memory accesses
varies among benchmarks. Each configuration is run 1000 times. We report
the median execution time and overhead relative to the unmodified benchmark.
Detailed measurement results are shown in Appendix A.
Figure 4: Execution time overhead of IRS relative to uninstrumented
benchmarks for decreasing numbers of scheduling constraints (many-threaded
benchmarks). [Plot: x-axis: scheduling constraints, from 100% to 0%; y-axis: relative
execution time (log. scale); series: Indexer, Indexer-Opt, Last Zero,
Last Zero-Opt, uninstrumented.]
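For concreteness, the relative overhead of one configuration can be computed from repeated timings as sketched below. The function name and the sample numbers are illustrative only and are not taken from our measurements.

```python
from statistics import median

def relative_overhead(instrumented_times, baseline_times):
    """Overhead of the instrumented runs relative to the plain benchmark.

    Both arguments are lists of execution times in the same unit.
    A negative result means the instrumented version ran faster.
    """
    base = median(baseline_times)
    return (median(instrumented_times) - base) / base

# Illustrative timings in microseconds (not measured data).
SAMPLE_BASELINE = [100, 102, 98]
SAMPLE_INSTRUMENTED = [120, 125, 122]
```

With these sample values, the medians are 100 and 122, so the function returns an overhead of roughly 0.22, i.e., 22%.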
Benchmark set 1. The first set of benchmarks are concurrent programs
from the SV-COMP benchmark suite [ZZZb] and the POR literature (Shared
Pointer, [GFYS07]). We chose these benchmarks because they are well-studied
verification problems and contain a high amount of concurrent interaction, which
is expected to highlight performance issues of IRS. All benchmarks contain two
threads. The corresponding results are shown in Fig. 3. For these benchmarks,
IRS produces a maximum overhead of 22%, which is much less than we expected
and might already be an acceptable overhead for certain applications.
For all benchmarks, the overhead is reduced by relaxing scheduling constraints,
albeit in some cases, a significant reduction occurs only at the last reduction
step. In some cases, the overhead is negative, i.e., the instrumented version
of a benchmark executed faster than the plain benchmark. We conjecture that
both measurement noise and improved timing of cache operations due to a
different interleaving of memory operations may be relevant for this effect, as
already noted by Olszewski et al. [OAA09]. Similarly, an increased overhead after
removing scheduling constraints could be caused by these effects. Overall, both
the initial overhead and the amount of reductions are lower than we expected.
Benchmark set 2. Since we expected a higher overhead, we conducted the
same experiment on two benchmarks from the POR literature (Indexer [FG05]
with 15 threads and Last Zero [AAJS14] with 16 threads), where a larger
number of threads and dependencies results in a larger number of scheduling
constraints. Fig. 4 shows the corresponding results.
Indeed, for the Indexer and Last Zero benchmarks, the overhead is much higher.
Interestingly, the overhead for Indexer abruptly decreases from 1904% to 61%
at the transition from 3 to 2 scheduling constraints. We explain this observation
by the fact that the permitted trace prefix with 3 scheduling constraints requires
3 threads to wait, while after removing 1 scheduling constraint, only 2 threads
have to wait. Since our implementation uses busy waiting, many concurrently
waiting threads may prevent threads that are not required to wait from quickly
proceeding.
Structure of scheduling constraints. An interesting question is whether
the overhead can be reduced by choosing a different trace prefix with roughly
the same amount of scheduling constraints. Interestingly, we have found op-
timized traces for both Indexer and Last Zero that indeed show a drastically
reduced overhead with the same or even more scheduling constraints. The cor-
responding results are depicted as Indexer-Opt and Last Zero-Opt in Fig. 4.
For Indexer, we found that a trace prefix that requires fewer threads
to wait can be executed faster. Fig. 5 shows two alternative traces for
Indexer. Nodes represent events and edges a happens-before relation. The nodes
of even-indexed threads are shown in gray and events of the same thread are
arranged vertically one below the other. Fig. 5a shows one of the slower trace
prefixes, where many threads wait rarely, and Fig. 5b shows one of the faster
(optimized) trace prefixes, where few threads wait often. For the former trace
prefix, 16 Mazurkiewicz traces, for the latter trace prefix, only 8 Mazurkiewicz
traces have to be verified. Although more scheduling constraints are enforced,
the program execution is faster with the latter trace prefix. While we optimize
trace prefixes manually, it is conceivable that verifiers can prioritize faster trace
prefixes automatically, e.g., by applying a heuristic or by testing few traces and
comparing their overhead. Such a prioritization resembles the effects of order-
ing heuristics on the performance of POR algorithms studied by Lauterburg
et al. [LKMA10]. For Last Zero, our original trace prefixes require the second
event of a worker thread to wait for the first event of the next worker thread. By
letting threads wait already before their first events, the program execution is
drastically accelerated already for 100% scheduling constraints, i.e., when only
a single Mazurkiewicz trace is verified.
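A simple heuristic of the kind mentioned above could rank candidate trace prefixes by the number of threads that ever have to wait. The sketch below is a hypothetical illustration, not something we implemented or evaluated: events are (thread id, index) pairs, a trace prefix is reduced to its set of cross-thread edges, and the candidate names are made up.

```python
def waiting_threads(edges):
    """Threads owning at least one event with a cross-thread predecessor."""
    return {b[0] for a, b in edges if a[0] != b[0]}

def pick_prefix(candidates):
    """Name of the candidate whose constraints make the fewest threads wait."""
    return min(candidates, key=lambda name: len(waiting_threads(candidates[name])))

CANDIDATES = {
    # Three constraints, three different threads wait once each.
    "many_threads_wait_rarely": {
        ((0, 0), (1, 0)), ((1, 1), (2, 0)), ((2, 1), (3, 0)),
    },
    # Four constraints, but only thread 1 ever waits.
    "few_threads_wait_often": {
        ((0, 0), (1, 0)), ((0, 1), (1, 1)), ((0, 2), (1, 2)), ((0, 3), (1, 3)),
    },
}
```

In this toy example the heuristic prefers the second candidate even though it contains more scheduling constraints, which mirrors the observation for Indexer. Whether such a ranking predicts the overhead in general would have to be confirmed by measurements.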
Summary. Our results show that relaxing scheduling constraints can reduce
the overhead for all benchmarks. For example, after verifying only 8 of 4096
Mazurkiewicz traces of Indexer, the overhead is reduced from 2841% to 48%.
However, in other cases, the execution time may not decrease considerably until
a large part of all scheduling constraints has been removed. In yet other cases,
the overhead is reduced considerably by removing a single scheduling constraint,
while it does not change considerably before and after this step. Besides the
number of scheduling constraints, the choice of the permissible trace prefix, i.e.,
the structure of the induced scheduling constraints, may have a large influence
on the overhead. These observations suggest that a sensible selection of an
initial trace during verification can considerably improve the execution time
performance of a program that is executed with IRS. Comparing our current
results for Indexer and Last Zero to earlier experiments with a less optimized
schedule enforcement [MSBS17], we see a considerable speed-up when optimized
trace prefixes are used.
Figure 5: Trace prefixes for Indexer (threads with only conflict-free events are
omitted). (a) Many threads wait rarely. (b) Few threads wait often (optimized).
6 Related Work
Deterministic multi-threading (DMT) [BAD+10, CSL+13, CWG+11, LCB11,
OAA09, AWHF10] limits the amount of non-determinism due to scheduling
for multi-threaded programs. Dthreads by Liu et al. [LCB11] adapts the
interface of the multi-threading library Pthreads and guarantees, for any
given input, a deterministic execution. Dthreads interleaves parallel phases
(in which threads write only to a local copy of the shared memory) and
sequential phases (in which the local copies are merged). Dthreads cannot
handle programs that bypass the Pthreads library by synchronizing directly
over shared memory [LCB11].
Cui et al. propose Peregrine [CWG+11], which initially records a set of ex-
ecutions and enforces schedules of these initial executions during subsequent
executions where these schedules are compatible. Schedules may be incompat-
ible if an input is seen that leads to a different schedule. In the subsequently
presented Parrot framework [CSL+13], Cui et al. propose to combine DMT with
a model checker for bug-finding. Parts of a program that are manually marked
as performance-critical are executed non-deterministically and model checked
to increase the confidence about their correctness. Only the remaining parts of
the program are executed deterministically, so that the overhead of additional
synchronization is reduced.
In contrast to IRS, the above described DMT approaches do not provide
any guarantees about which schedule is enforced, for a particular input. Us-
ing these approaches to simplify program verification is therefore impractical if
many program inputs need to be covered. While we conjecture that some of the
former techniques can be extended to communicate a general scheduling policy
that guides a verifier, it is not directly clear how to do so. In contrast, IRS pro-
vides a formal interface that uses admissible traces to communicate scheduling
constraints. Additionally, the above described DMT approaches do not allow
relaxing scheduling constraints during runtime, in contrast to IRS, which makes it
possible to iteratively relax scheduling constraints and, provided that the program is
eventually proven safe, remove all scheduling constraints. On the implementation
level, the approaches of [OAA09,LCB11,CSL+13] (but not [BAD+10,CWG+11])
synchronize only at library calls (such as uses of Pthreads locks), which improves
execution time performance but may result in non-deterministic executions when
global memory is accessed (perhaps accidentally) directly, e.g., without lock pro-
tection. In contrast, our IRS implementation schedules all accesses to shared
variables.
Program analyses that use context bounding [QR05] consider only those ex-
ecutions of a program which contain only up to k context switches between
threads, for a typically small bound k. While reachability for concurrent, re-
cursive programs is undecidable [Ram00], additionally bounding the number of
context switches makes the problem decidable [QR05]. Context bounding may
be used within IRS, although it is only a special case of scheduling constraints
in IRS. Similar to context bounding, a generally undecidable model checking
problem may become decidable when handled with IRS: by only checking a
limited set of symbolic traces, IRS makes it possible to safely use a program even if its
reachability problem is undecidable under unconstrained scheduling.
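To make the notion of a context bound concrete, the following Python sketch enumerates the interleavings of a small program and keeps those with at most k context switches. It is a brute-force illustration only; the function names are hypothetical, and a schedule is modeled simply as a tuple of thread ids, one per executed step.

```python
def context_switches(schedule):
    """Number of positions at which the running thread changes."""
    return sum(1 for prev, cur in zip(schedule, schedule[1:]) if prev != cur)

def interleavings(steps_per_thread):
    """Yield all schedules in which thread t executes steps_per_thread[t] steps."""
    remaining = list(steps_per_thread)
    prefix = []

    def extend():
        if not any(remaining):
            yield tuple(prefix)
            return
        for tid, left in enumerate(remaining):
            if left:
                remaining[tid] -= 1
                prefix.append(tid)
                yield from extend()
                prefix.pop()
                remaining[tid] += 1

    yield from extend()

def bounded_schedules(steps_per_thread, k):
    """Schedules with at most k context switches."""
    return [s for s in interleavings(steps_per_thread)
            if context_switches(s) <= k]
```

For two threads with two steps each, there are six interleavings in total; two of them need a single context switch and four of them need at most two. An analysis with bound k = 1 therefore inspects only a third of the schedules in this tiny example.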
When applied with bounded model checking (BMC) for concurrency bug
finding [RG05,CF11,MQ07,LR09,ITF+14], context bounding focuses the search
for erroneous schedules on those with few context switches. Consequently,
potential bugs are missed that manifest themselves only after more context switches
than the current bound. However, based on empirical results, Musuvathi and
Qadeer argue that a low context bound is sufficient to find many interesting
bugs [MQ07]. They propose iterative context bounding (ICB) as an extension
to BMC: a program is iteratively checked with an increasing context bound,
similar to increasing the bound on execution lengths in BMC. Given limited
resources (that usually do not allow searching the complete state space of a
program), ICB prioritizes schedules with few context switches. This search strategy
of ICB could be used by a model checker in conjunction with IRS. However, in
contrast to bug finding based on BMC, IRS requires a sound program analy-
sis (under scheduling constraints), i.e., a safety proof for complete, unbounded
program executions, which is not given, in general, by BMC. Another difference
between context bounding in bug finding and IRS lies in the guarantees about
scheduling: when searching for erroneous schedules, bug finding may use assumptions
about the likelihood of schedules in order to guide the search. However, any
assumptions about the likelihood of schedules are not enforced. Bug finding
consequently accepts to miss feasible executions of a program that contain, e.g.,
a bug that has not been found under context bounding. In contrast, IRS guar-
antees that only checked executions may occur.
Nguyen et al. [NSF+17] transform a concurrent program into several in-
stances that show only a reduced number of schedules. Each instance is checked
individually by BMC (and with a context bound). Similar to IRS, this de-
creases the complexity of the model checking problem and improves bug find-
ing. However, their approach of dividing a program into instances is based on
lazy sequentialization for BMC [FIP13] and therefore not directly usable for
verification.
Conditional model checking [BHKW12] is a general framework to reuse (in
general arbitrary) intermediate verification results. In contrast to IRS, it does
not require intermediate verification results to prove safety of a fully functional
program variant and does not enforce the preconditions of the intermediate
result.
Partial order reduction (POR) [GKW+15,WKO13,FG05,GFYS07,AAJS14]
identifies equivalent executions in order to verify a sufficiently large subset.
It is orthogonal to IRS and can be used in conjunction, as we demonstrate
via symbolic traces. IRS differs from other verification techniques that handle
thread scheduling explicitly in ensuring a safe program execution as soon as a
single symbolic trace is found.
In addition to scheduling, relaxed memory models in modern architectures
are a source of non-determinism. In [BM08], a memory monitoring approach is
proposed to make sure that sequential consistency is maintained during the ex-
ecution of a program. Fang et al. [FLM03] present an automated memory fence
insertion technique to enforce SC using instrumentation at the source code level.
In both cases, the program can be safely verified under the assumption that SC
holds with a reduced state space. Similarly to IRS, these approaches restrict
the amount of non-determinism. However, in contrast to IRS, they are not
able to dynamically adapt the amount of non-determinism and are restricted to
non-determinism due to relaxed memory access. Slightly related, synchroniza-
tion synthesis, for example presented by Gupta et al. [GHR+15], automatically
inserts locks and other synchronization primitives that are more powerful than
fences in that also scheduler-related non-determinism can be eliminated. How-
ever, this technique may introduce deadlocks into a program [GHR+15], hence
it is unsuitable for IRS, where we have to rely on the fact that a verified schedule
does not limit the program’s functionality.
7 Conclusion
Iteratively Relaxed Scheduling makes it possible to adjust both the amount of
scheduler-related non-determinism and the size of the relevant part of a program’s state
space to be verified. This paper discusses issues of how to efficiently implement
IRS in terms of execution time overhead, i.e., how to efficiently encode and
enforce scheduling constraints. Furthermore, we formalize the requirements on
verifiers for IRS. Support for non-terminating programs is left for future work.
Our experimental results show that iteratively relaxing scheduling
constraints can reduce execution time overhead. Thereby, we give evidence that
IRS indeed allows adjusting both the verification delay and the incurred
execution time overhead in order to find a sweet spot. Interestingly, we found
cases in which a much earlier reduction of execution time overhead is obtained
by choosing favorable scheduling constraints, which suggests that execution
time performance does not simply rely on the number of scheduling constraints
but to a large extent also on their structure.
References
[AAdlB+17] Elvira Albert, Puri Arenas, Maria Garcia de la Banda, Miguel
Gómez-Zamalloa, and Peter J. Stuckey. Context-sensitive dynamic
partial order reduction. In Rupak Majumdar and Viktor Kuncak,
editors, Computer Aided Verification - 29th International Confer-
ence, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceed-
ings, Part I, volume 10426 of Lecture Notes in Computer Science,
pages 526–543. Springer, 2017.
[AAJS14] Parosh Aziz Abdulla, Stavros Aronis, Bengt Jonsson, and Kon-
stantinos F. Sagonas. Optimal dynamic partial order reduction.
In Suresh Jagannathan and Peter Sewell, editors, The 41st Annual
ACM SIGPLAN-SIGACT Symposium on Principles of Program-
ming Languages, POPL ’14, San Diego, CA, USA, January 20-21,
2014, pages 373–384. ACM, 2014.
[AWHF10] Amittai Aviram, Shu-Chun Weng, Sen Hu, and Bryan Ford. Ef-
ficient system-enforced deterministic parallelism. In Remzi H.
Arpaci-Dusseau and Brad Chen, editors, 9th USENIX Symposium
on Operating Systems Design and Implementation, OSDI 2010,
October 4-6, 2010, Vancouver, BC, Canada, Proceedings, pages
193–206. USENIX Association, 2010.
[BAD+10] Tom Bergan, Owen Anderson, Joseph Devietti, Luis Ceze, and Dan
Grossman. Coredet: a compiler and runtime system for determin-
istic multithreaded execution. In James C. Hoe and Vikram S.
Adve, editors, Proceedings of the 15th International Conference on
Architectural Support for Programming Languages and Operating
Systems, ASPLOS 2010, Pittsburgh, Pennsylvania, USA, March
13-17, 2010, pages 53–64. ACM, 2010.
[BHKW12] Dirk Beyer, Thomas A. Henzinger, M. Erkan Keremoglu, and
Philipp Wendler. Conditional model checking: a technique to pass
information between verifiers. In Will Tracz, Martin P. Robil-
lard, and Tevfik Bultan, editors, 20th ACM SIGSOFT Symposium
on the Foundations of Software Engineering (FSE-20), SIGSOFT-
/FSE’12, Cary, NC, USA - November 11 - 16, 2012, page 57.
ACM, 2012.
[BM08] Sebastian Burckhardt and Madanlal Musuvathi. Effective pro-
gram verification for relaxed memory models. In Aarti Gupta and
Sharad Malik, editors, Computer Aided Verification, 20th Inter-
national Conference, CAV 2008, Princeton, NJ, USA, July 7-14,
2008, Proceedings, volume 5123 of Lecture Notes in Computer Sci-
ence, pages 107–120. Springer, 2008.
[CF11] Lucas C. Cordeiro and Bernd Fischer. Verifying multi-threaded
software using smt-based context-bounded model checking. In Pro-
ceedings of the 33rd International Conference on Software Engi-
neering, ICSE 2011, Waikiki, Honolulu , HI, USA, May 21-28,
2011, pages 331–340, 2011.
[CGMP99] Edmund M. Clarke, Orna Grumberg, Marius Minea, and Doron
Peled. State space reduction using partial order techniques. STTT,
2(3):279–287, 1999.
[CSL+13] Heming Cui, Jirí Simsa, Yi-Hong Lin, Hao Li, Ben Blum, Xi-
nan Xu, Junfeng Yang, Garth A. Gibson, and Randal E. Bryant.
Parrot: a practical runtime for deterministic, stable, and reliable
threads. In Michael Kaminsky and Mike Dahlin, editors, ACM
SIGOPS 24th Symposium on Operating Systems Principles, SOSP
’13, Farmington, PA, USA, November 3-6, 2013, pages 388–405.
ACM, 2013.
[CWG+11] Heming Cui, Jingyue Wu, John Gallagher, Huayang Guo, and Jun-
feng Yang. Efficient deterministic multithreading through schedule
relaxation. In Ted Wobber and Peter Druschel, editors, Proceed-
ings of the 23rd ACM Symposium on Operating Systems Principles
2011, SOSP 2011, Cascais, Portugal, October 23-26, 2011, pages
337–351. ACM, 2011.
[FG05] Cormac Flanagan and Patrice Godefroid. Dynamic partial-order
reduction for model checking software. In Jens Palsberg and Martín
Abadi, editors, Proceedings of the 32nd ACM SIGPLAN-SIGACT
Symposium on Principles of Programming Languages, POPL 2005,
Long Beach, California, USA, January 12-14, 2005, pages 110–
121. ACM, 2005.
[FIP13] Bernd Fischer, Omar Inverso, and Gennaro Parlato. Cseq: A con-
currency pre-processor for sequential C verification tools. In Ewen
Denney, Tevfik Bultan, and Andreas Zeller, editors, 2013 28th
IEEE/ACM International Conference on Automated Software En-
gineering, ASE 2013, Silicon Valley, CA, USA, November 11-15,
2013, pages 710–713. IEEE, 2013.
[FLM03] Xing Fang, Jaejin Lee, and Samuel P. Midkiff. Automatic fence
insertion for shared memory multiprocessing. In Utpal Baner-
jee, Kyle Gallivan, and Antonio González, editors, Proceedings of
the 17th Annual International Conference on Supercomputing, ICS
2003, San Francisco, CA, USA, June 23-26, 2003, pages 285–294.
ACM, 2003.
[GFYS07] Guy Gueta, Cormac Flanagan, Eran Yahav, and Mooly Sagiv.
Cartesian partial-order reduction. In Dragan Bosnacki and Stefan
Edelkamp, editors, SPIN, volume 4595 of Lecture Notes in Com-
puter Science, pages 95–112. Springer, 2007.
[GHR+15] Ashutosh Gupta, Thomas A. Henzinger, Arjun Radhakrishna,
Roopsha Samanta, and Thorsten Tarrach. Succinct representa-
tion of concurrent trace sets. In Sriram K. Rajamani and David
Walker, editors, POPL, pages 433–444. ACM, 2015.
[GKW+15] Shengjian Guo, Markus Kusano, Chao Wang, Zijiang Yang, and
Aarti Gupta. Assertion guided symbolic execution of multi-
threaded programs. In Elisabetta Di Nitto, Mark Harman, and
Patrick Heymans, editors, Proceedings of the 2015 10th Joint Meet-
ing on Foundations of Software Engineering, ESEC/FSE 2015,
Bergamo, Italy, August 30 - September 4, 2015, pages 854–865.
ACM, 2015.
[God96] Patrice Godefroid. Partial-Order Methods for the Verification of
Concurrent Systems - An Approach to the State-Explosion Prob-
lem, volume 1032 of Lecture Notes in Computer Science. Springer,
1996.
[ITF+14] Omar Inverso, Ermenegildo Tomasco, Bernd Fischer, Salvatore La
Torre, and Gennaro Parlato. Bounded model checking of multi-
threaded C programs via lazy sequentialization. In Armin Biere
and Roderick Bloem, editors, Computer Aided Verification - 26th
International Conference, CAV 2014, volume 8559 of Lecture Notes
in Computer Science, pages 585–602. Springer, 2014.
[KSH12] Kari Kähkönen, Olli Saarikivi, and Keijo Heljanko. Using un-
foldings in automated testing of multithreaded programs. In
Michael Goedicke, Tim Menzies, and Motoshi Saeki, editors,
IEEE/ACM International Conference on Automated Software En-
gineering, ASE’12, Essen, Germany, September 3-7, 2012, pages
150–159. ACM, 2012.
[KSH14] Kari Kähkönen, Olli Saarikivi, and Keijo Heljanko. Unfolding
based automated testing of multithreaded programs. Automated
Software Engineering, pages 1–41, 2014.
[LCB11] Tongping Liu, Charlie Curtsinger, and Emery D. Berger. Dthreads:
efficient deterministic multithreading. In Ted Wobber and Peter
Druschel, editors, Proceedings of the 23rd ACM Symposium on
Operating Systems Principles 2011, SOSP 2011, pages 327–336.
ACM, 2011.
[LKMA10] Steven Lauterburg, Rajesh K. Karmani, Darko Marinov, and Gul
Agha. Evaluating ordering heuristics for dynamic partial-order re-
duction techniques. In David S. Rosenblum and Gabriele Taentzer,
editors, FASE, volume 6013 of Lecture Notes in Computer Science,
pages 308–322. Springer, 2010.
[LR09] Akash Lal and Thomas W. Reps. Reducing concurrent analysis
under a context bound to sequential analysis. Formal Methods in
System Design, 35(1):73–97, 2009.
[Maz86] Antoni W. Mazurkiewicz. Trace theory. In Wilfried Brauer, Wolf-
gang Reisig, and Grzegorz Rozenberg, editors, Advances in Petri
Nets, volume 255 of Lecture Notes in Computer Science, pages
279–324. Springer, 1986.
[McM92] Kenneth L. McMillan. Using unfoldings to avoid the state explo-
sion problem in the verification of asynchronous circuits. In Gre-
gor von Bochmann and David K. Probst, editors, Computer Aided
Verification, Fourth International Workshop, CAV ’92, Montreal,
Canada, June 29 - July 1, 1992, Proceedings, volume 663 of Lecture
Notes in Computer Science, pages 164–177. Springer, 1992.
[MQ07] Madanlal Musuvathi and Shaz Qadeer. Iterative context bounding
for systematic testing of multithreaded programs. In Jeanne Fer-
rante and Kathryn S. McKinley, editors, Proceedings of the ACM
SIGPLAN 2007 Conference on Programming Language Design and
Implementation, San Diego, California, USA, June 10-13, 2007,
pages 446–455. ACM, 2007.
[MSBS17] Patrick Metzler, Habib Saissi, Péter Bokor, and Neeraj Suri. Quick
verification of concurrent programs by iteratively relaxed schedul-
ing. In Grigore Rosu, Massimiliano Di Penta, and Tien N. Nguyen,
editors, ASE, pages 776–781. IEEE, 2017.
[NSF+17] Truc L. Nguyen, Peter Schrammel, Bernd Fischer, Salvatore La
Torre, and Gennaro Parlato. Parallel bug-finding in concurrent
programs via reduced interleaving instances. In Grigore Rosu,
Massimiliano Di Penta, and Tien N. Nguyen, editors, Proceedings
of the 32nd IEEE/ACM International Conference on Automated
Software Engineering, ASE 2017, Urbana, IL, USA, October 30 -
November 03, 2017, pages 753–764. IEEE Computer Society, 2017.
[OAA09] Marek Olszewski, Jason Ansel, and Saman P. Amarasinghe.
Kendo: efficient deterministic multithreading in software. In Pro-
ceedings of the 14th International Conference on Architectural Sup-
port for Programming Languages and Operating Systems, ASPLOS
2009, Washington, DC, USA, March 7-11, 2009, pages 97–108,
2009.
[QR05] Shaz Qadeer and Jakob Rehof. Context-bounded model check-
ing of concurrent software. In Nicolas Halbwachs and Lenore D.
Zuck, editors, Tools and Algorithms for the Construction and Anal-
ysis of Systems, 11th International Conference, TACAS 2005, vol-
ume 3440 of Lecture Notes in Computer Science, pages 93–107.
Springer, 2005.
[Ram00] G. Ramalingam. Context-sensitive synchronization-sensitive anal-
ysis is undecidable. ACM Trans. Program. Lang. Syst., 22(2):416–
430, 2000.
[RG05] Ishai Rabinovitz and Orna Grumberg. Bounded model checking
of concurrent programs. In Kousha Etessami and Sriram K. Ra-
jamani, editors, Computer Aided Verification, 17th International
Conference, CAV 2005, Edinburgh, Scotland, UK, July 6-10, 2005,
Proceedings, volume 3576 of Lecture Notes in Computer Science,
pages 82–97. Springer, 2005.
[RSSK15] César Rodríguez, Marcelo Sousa, Subodh Sharma, and Daniel
Kroening. Unfolding-based partial order reduction. In Luca Aceto
and David de Frutos-Escrig, editors, 26th International Conference
on Concurrency Theory, CONCUR 2015, volume 42 of LIPIcs,
pages 456–469. Schloss Dagstuhl - Leibniz-Zentrum fuer Infor-
matik, 2015.
[SRDK17] Marcelo Sousa, César Rodríguez, Vijay D’Silva, and Daniel Kroen-
ing. Abstract interpretation with unfoldings. In Rupak Majumdar
and Viktor Kuncak, editors, Computer Aided Verification - 29th
International Conference, CAV 2017, Heidelberg, Germany, July
24-28, 2017, Proceedings, Part II, volume 10427 of Lecture Notes
in Computer Science, pages 197–216. Springer, 2017.
[Val96] Antti Valmari. The state explosion problem. In Wolfgang Reisig
and Grzegorz Rozenberg, editors, Lectures on Petri Nets I: Basic
Models, Advances in Petri Nets, the volumes are based on the Ad-
vanced Course on Petri Nets, held in Dagstuhl, September 1996,
volume 1491 of Lecture Notes in Computer Science, pages 429–528.
Springer, 1996.
[WKO13] Björn Wachter, Daniel Kroening, and Joël Ouaknine. Verify-
ing multi-threaded software with impact. In Formal Methods in
Computer-Aided Design, FMCAD 2013, Portland, OR, USA, Oc-
tober 20-23, 2013, pages 210–217. IEEE, 2013.
[ZZZa] The LLVM compiler infrastructure. http://llvm.org.
[ZZZb] Collection of verification tasks.
A Measurement Results
The following table shows our detailed measurement results. The columns give
the benchmark name (the suffix -opt marks runs with optimized trace prefixes),
the number of constraints in the respective trace prefix, the mean execution
time in µs, and the execution time overhead relative to the uninstrumented
benchmark version.
Benchmark            Constraints  Time (µs)  Overhead
bigshot                        1        124        5%
bigshot                        0        121        3%
dekker                         2        115        4%
dekker                         1        114        3%
dekker                         0        113        2%
fibonacci                     98        176       13%
fibonacci                     44        169        9%
fibonacci                     24        181       12%
fibonacci                      0        166        6%
lamport                       16        123       12%
lamport                       15        123       12%
lamport                       10        124       13%
lamport                        7        124       13%
lamport                        6        124       13%
lamport                        4        123       12%
lamport                        2        123       12%
lamport                        1        124       13%
lamport                        0        113        3%
peterson                      28        124        8%
peterson                      24        125        9%
peterson                      22        122        6%
peterson                       1        123        7%
peterson                       0        113       -2%
shared pointer                 3        135       22%
shared pointer                 2        134       21%
shared pointer                 1        133       20%
shared pointer                 0        115        4%
indexer(15)                   12       7538     2692%
indexer(15)                    8       7603     2716%
indexer(15)                    4       6793     2416%
indexer(15)                    3       5412     1904%
indexer(15)                    2        435       61%
indexer(15)                    1        299       11%
indexer(15)                    0        235      -13%
last zero(16)                 15      10664     4288%
last zero(16)                  8       5286     2075%
last zero(16)                  5        492      102%
last zero(16)                  1        263        8%
last zero(16)                  0        230       -5%
indexer(15)-opt               12       5558     2841%
indexer(15)-opt                9        279       48%
indexer(15)-opt                6        257       36%
indexer(15)-opt                0        215       14%
last zero(16)-opt             15        378       94%
last zero(16)-opt              8        269       38%
last zero(16)-opt              5        253       30%
last zero(16)-opt              1        250       28%
last zero(16)-opt              0        223       14%
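
The table does not list the uninstrumented baseline times. As a reading aid, the following sketch assumes that the overhead column is the relative difference between the instrumented and the uninstrumented mean execution time; the symbols t_instr and t_base are our notation and do not appear in the measurements themselves.

```latex
\[
  \text{overhead} \;=\; \frac{t_{\text{instr}} - t_{\text{base}}}{t_{\text{base}}} \cdot 100\,\%
\]
% Consistency check on two indexer(15) rows, under the assumption above:
% 12 constraints: 7538 µs at 2692 % implies t_base ≈ 7538 / 27.92 ≈ 270 µs
%  0 constraints: (235 - 270) / 270 ≈ -0.13, i.e. about -13 %, as listed
```

Under this assumption, a negative overhead simply means that the instrumented version ran faster on average than the uninstrumented one in that measurement.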
