Distributed Symbolic Bounded Property
Checking 1
Pradeep K. Nalla, Roland J. Weiss, Prakash Peranandam,
Jürgen Ruf, Thomas Kropf, Wolfgang Rosenstiel
Wilhelm-Schickard-Institut für Informatik
Universität Tübingen
Sand 13, 72076 Tübingen, Germany
Abstract
In this paper we describe an algorithm for distributed, BDD-based bounded property checking and
its implementation in the veriﬁcation tool SymC. The distributed algorithm veriﬁes larger models
and returns results faster than the sequential version.
The core algorithm distributes partitions of the state set to the computation nodes once the state
set reaches a threshold size. The nodes then proceed with image computation asynchronously. The
main scalability problem of this scheme is the overlap of state set partitions. We present static and
dynamic overlap reduction techniques.
Keywords: Veriﬁcation, bounded model checking, property checking, binary decision diagrams,
parallelization.
1 Introduction
Although symbolic representations of state spaces [8] based on Binary Decision
Diagrams (BDDs) [7] and bounded model checking (BMC) [3] have dramatically
increased the design sizes that can be handled by verification tools,
research in model checking techniques still concentrates on enabling faster
verification of larger models. Large designs cause memory overflow during
exploration of the state space, the dreaded state space explosion. Several
solutions have been proposed to deal with the immense memory requirements
of BDDs. One proposal is to partition BDDs [30] into two or more pieces
and handle them separately during further traversal. The traversal of the
partitions can be done sequentially [10] or in parallel [14].
1 This work has been funded in part by the German Research Council (DFG) within
projects GRASP and KOMFORT and by the BMBF and edacentrum within project FEST.
Electronic Notes in Theoretical Computer Science 135 (2006) 47–63
1571-0661 © 2006 Elsevier B.V.
www.elsevier.com/locate/entcs
doi:10.1016/j.entcs.2005.10.018
Open access under CC BY-NC-ND license.
In [28], a combination of on-the-ﬂy [12] and bounded model checking is
presented, which is implemented in the tool SymC. The checking algorithm
traverses the product automaton of model and property until it either detects
a validation or a violation of the property, or the explicit or implicit time bound
is reached. Only the frontier set is kept in memory, i.e. no ﬁx-point iterations
are performed. This approach performs well for certain classes of models
and properties, but the sequential version also faces memory exhaustion for
large models, e.g. for some of the ISCAS89 examples. This fact motivated the
parallelization of the proof algorithm which we present here.
The paper is organized as follows. The next section discusses related work
and our contributions. Section 3 summarizes symbolic bounded property
checking, followed by a description of the distributed algorithm. Then, we
present our static and dynamic methods for overlap reduction. Section 6 gives
experimental results. Finally, we conclude and mention future work.
2 Related Work
2.1 Partitioning
Many approaches for decomposing Boolean functions represented as BDDs
exist in the literature. For distributed verification [16,14], splitting algorithms
aim at creating balanced partitions. Similar approaches also exist in
sequential verification methodologies [11,10]. The main distinguishing feature
of these algorithms is the cost function employed for selecting the splitting
variable. The cost functions typically take into account the achieved memory
reduction, the amount of sharing between the cofactors, and the memory
balance of the cofactors. The CUDD package [31] also contains various
decomposition algorithms, producing both balanced and unbalanced partitions.
Furthermore, decomposition techniques allow the same function to be represented
by multiple BDDs that together require less memory [20,19]. The image
computation algorithms have to be adapted to these techniques, but the more
complex operations are offset by the reduced peak memory requirements of the
BDDs [30]. As shown in [4], the reduction can even be exponential. Finally, dense
under-approximations [26,25] try to reduce the memory requirements of the
BDD while still capturing a large percentage of the state space. These algorithms
are of minor interest for state set distribution as they result in unbalanced
subsets.
None of the proposed heuristics consider subsequent state overlap. How-
ever, similar eﬀorts are undertaken for model checkers with an explicit state
graph representation [21,5]. They apply graph algorithms that heuristically
try to ﬁnd partitions with few crossover transitions in order to reduce the
communication eﬀort between processes. In [15], the authors investigate state
space distribution in the context of model checking Petri nets, also employing
an explicit representation. These approaches cannot be directly applied to
symbolic representations.
2.2 Distributed model checking
Recently, the state space explosion problem has raised interest in adapting
model checking algorithms to distributed environments. This includes both
explicit [32,5,17,6] and symbolic [14,16,2,13,18] model checking methodologies.
The group at Haifa also works on the parallelization of BDD-based veriﬁ-
cation algorithms. At the core, they create k slices of the current state set and
distribute these slices to k cluster machines. They use the slicing technology
from [20], but with an enhanced cost function for selecting slicing variables
[16]. States are classiﬁed as owned and non-owned. After every image compu-
tation step the non-owned states are distributed to the owning nodes. In [16]
load balancing is achieved by adjusting the slices if the initial balance is lost.
In [14] they try to keep only as many nodes busy as necessary by splitting
and joining BDDs on demand. The exchange of non-owned states after every
step makes their algorithm mainly synchronous. In [14,16] reachability is com-
puted with ﬁx-point iterations, in [2] regular expressions are used to indicate
illegal behavior and µ-calculus formulas are checked in [13]. Our approach
checks time-bounded properties speciﬁed in PSL (Property Speciﬁcation Lan-
guage) [1] or FLTL (Finite Linear Time Temporal Logic) [27] without ﬁxpoint
iterations.
2.3 Contributions
Synchronous schemes for parallelizing BDD-based veriﬁcation algorithms re-
duce the potential speedup because processes are kept waiting for others to
complete. Up to now, no successful asynchronous BDD-based veriﬁcation al-
gorithms have been proposed.
The main contribution of our approach is such an asynchronous distributed
algorithm. This algorithm becomes feasible only when the shared states due
to crossover transitions are reduced to avoid duplicate work. We present
algorithms for static and dynamic overlap reduction.
[Figure: block diagram with the elements System description, Property description, Translation to formal representation, Translation to AR-automata, Symbolic execution engine, SymC, Accept / reject properties, Witness / counterexample.]
Fig. 1. Overview of SymC operation.
3 Sequential Symbolic Bounded Property Checking
The formal veriﬁcation algorithms in [28,22] combine bounded property check-
ing and symbolic traversal. The temporal logic formulas are converted to spe-
cial ﬁnite state machines called Accept-Reject automata (AR-automata) [27].
AR-automata allow ﬁnding violations or validations of properties on ﬁnite se-
quences, thus they are well suited for bounded property checking. The check-
ing algorithm manipulates both the system description and the AR-automata
represented as BDDs. In order to avoid the construction of the complete
transition relation, a set of conjunctively partitioned transition relations is
built, which is used for early quantiﬁcation [9]. The algorithms have been
implemented in the tool SymC, whose general operation is shown in Fig. 1.
An iteration of the sequential veriﬁcation algorithm works in two steps.
First, the successor states of the AR-automata are computed and the ter-
mination condition is checked. If the termination condition is not satisﬁed,
image computation is performed on the system in the second step. During im-
age computation the conjunction of all partitions is built on-the-ﬂy to obtain
the successor state set. Like bounded model checking [3], this property check-
ing algorithm does not traverse the state space exhaustively but examines all
reachable states within a given time bound.
A central optimization technique for the algorithm is state set splitting.
Whenever a threshold for the size of the BDD representing the current state
set is reached, the set is split into disjoint parts and the algorithm continues
working on these subsets in a divide-and-conquer manner.
The sequential veriﬁcation algorithm continues with one of the subsets
and stacks the others. This can happen recursively. Traversal proceeds on the
current subset until the time bound is reached or the termination condition
is satisfied. Termination stops the verification upon finding either a
validation or a violation of the property. Otherwise, the process is repeated for
all stacked subsets. The termination condition diﬀers if one checks the prop-
erty on all paths, i.e. universal quantiﬁcation, or on one path, i.e. existential
quantiﬁcation. Informally, the sequential termination condition is deﬁned as
follows:
Universal: If one reject state is detected in the current state set, a violation
of the property is found. If all states in the current state set are accepting
states, a validation of the property is found. Otherwise, the property is still
pending.
Existential: If one accept state is detected in the current state set, a
validation of the property is found. If all states in the current state set are
rejecting states, a violation of the property is found. Otherwise, the property
is still pending.
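To make the split-and-stack traversal and the two termination conditions concrete, the following Python sketch replaces BDDs with explicit frozensets of product states and uses the number of states as a stand-in for the BDD size threshold. All names (`bounded_check`, `check_termination`, `succ`, `status`) are hypothetical. Treating a local validation (universal case) or a local violation (existential case) as closing only the current subset, rather than the whole run, is our reading of the text, not a detail stated by the authors.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List, Tuple

State = Hashable
Status = Callable[[State], str]  # assumed to return "accept", "reject" or "pending"


def check_termination(states: FrozenSet[State], status: Status, universal: bool) -> str:
    """Local termination condition of Section 3, evaluated on an explicit state set."""
    labels = [status(s) for s in states]
    decisive, uniform = ("reject", "accept") if universal else ("accept", "reject")
    if decisive in labels:                       # one decisive state suffices
        return "violation" if universal else "validation"
    if labels and all(label == uniform for label in labels):
        return "validation" if universal else "violation"
    return "pending"


def bounded_check(init: Iterable[State], succ: Callable[[State], Iterable[State]],
                  status: Status, bound: int, threshold: int,
                  universal: bool = True) -> str:
    """Frontier-only bounded traversal with split-and-stack (explicit-state sketch)."""
    stop = "violation" if universal else "validation"         # ends the whole run
    subset_done = "validation" if universal else "violation"  # closes one subset only
    stack: List[Tuple[FrozenSet[State], int]] = [(frozenset(init), 0)]
    unresolved = False
    while stack:
        states, step = stack.pop()
        while states:                                # empty frontier: no paths left
            verdict = check_termination(states, status, universal)
            if verdict == stop:
                return stop
            if verdict == subset_done:
                break
            if step == bound:                        # time bound reached, still pending
                unresolved = True
                break
            if len(states) > threshold:              # stand-in for the BDD size threshold
                ordered = sorted(states, key=repr)
                half = len(ordered) // 2
                stack.append((frozenset(ordered[half:]), step))  # stack one part
                states = frozenset(ordered[:half])               # continue with the other
            states = frozenset(t for s in states for t in succ(s))  # image computation
            step += 1
    return "pending" if unresolved else subset_done
```

Only the current frontier is kept, mirroring the absence of fix-point iterations; a stacked subset remembers the step at which it was split so that its own time bound is respected.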
4 Parallelization of Bounded Property Checking
The distributed checking algorithm is composed of an initial sequential stage
and a subsequent parallel stage. First, the transition relation is created on all
k computation nodes and state space traversal proceeds sequentially on one
node until a threshold limit on the BDD size triggers state set distribution.
The splitting into k subsets is already performed in parallel and every node is
responsible for getting its own disjoint part of the whole state set. The nodes
start state space traversal independently on these subsets. The termination
condition stays the same; however, the nodes have to communicate their local
results in order to allow testing termination conditions that depend on all
states.
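As a minimal sketch of this global check, assume each node reports one of three verdicts for its own subset; the function name and the string encoding are hypothetical, and the actual message exchange between the nodes is not modeled. A single decisive local verdict settles the property, whereas the opposite verdict needs agreement from every node.

```python
from typing import Iterable


def combine_local_verdicts(verdicts: Iterable[str], universal: bool = True) -> str:
    """Global termination check over the verdicts reported by all computation nodes.

    Each verdict is "violation", "validation" or "pending" for one node's subset.
    """
    verdicts = list(verdicts)
    decisive = "violation" if universal else "validation"    # one node suffices
    needs_all = "validation" if universal else "violation"   # every node must agree
    if decisive in verdicts:
        return decisive
    if verdicts and all(v == needs_all for v in verdicts):
        return needs_all
    return "pending"
```

For example, a universal validation reported by only some of the nodes stays pending until the remaining nodes finish in accept states.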
This simple scheme fails to provide signiﬁcant speedups on many models
because of crossover transitions. These transitions start in a state of the cur-
rent subset but lead to a state that is already present in one of the other state
subsets. We call this phenomenon state set overlap, or just overlap. Of course,
image computation for overlapping states is performed redundantly. As image
computation is one of the key components of formal veriﬁcation tools, redun-
dancy of such a component badly aﬀects the time and memory requirements
of the whole veriﬁcation process. Thus, optimizing the distributed algorithm
concentrates on reducing the overlap (see section 5).
4.1 State set distribution
Splitting the state set into k parts for subsequent traversal in parallel is a
costly operation. Therefore, we already perform it in parallel. For simplicity
we assume that k = 2^n, n ∈ N. Basically, once the first node dumps its state
set to disk, all other nodes pick up the dumped set after notiﬁcation. Then,
each node splits the set into two parts and depending on its rank, a number
identifying every node, it drops one part and continues splitting on the other
part recursively until only its own subset remains. The algorithm is illustrated
in Fig. 2.
// get subset i of k slices from state set S
getSubset(in: S, k, i; out: Si)
  Si := S
  for j := 1 .. log2(k)
    split(Si; g, h)
    // skip h on odd bit, skip g on even bit
    if i % 2 = 1 then Si := g else Si := h
    i := i / 2  // get next bit (integer division)

// split state set S into two slices g and h
split(in: S; out: g, h)

[Diagram: S is split into g and h; depending on the bits of its rank, a node skips h (Srank = g) or skips g (Srank = h), recursively, until only its own slice Srank remains.]
Fig. 2. Algorithm for state set distribution. The left-hand side gives the distribution algorithm;
an example application is shown on the right-hand side.
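The rank-driven distribution can be sketched in Python on explicit state sets. The sketch assumes ranks 0 ≤ i < k and a deterministic `split`; determinism matters because every node repeats the same splits independently and must arrive at consistent, disjoint slices. `split_by_median` is a hypothetical stand-in for the BDD splitting heuristic, not the splitting method used by SymC.

```python
from typing import Callable, FrozenSet, Tuple, TypeVar

T = TypeVar("T")
Split = Callable[[FrozenSet[T]], Tuple[FrozenSet[T], FrozenSet[T]]]


def get_subset(states: FrozenSet[T], k: int, i: int, split: Split) -> FrozenSet[T]:
    """Return slice i (0 <= i < k) of `states`; k must be a power of two."""
    if k < 1 or k & (k - 1):
        raise ValueError("k must be a power of two")
    if not 0 <= i < k:
        raise ValueError("rank i must satisfy 0 <= i < k")
    subset = states
    for _ in range(k.bit_length() - 1):      # log2(k) splitting rounds
        g, h = split(subset)
        subset = g if i % 2 == 1 else h      # odd bit keeps g, even bit keeps h
        i //= 2                              # move on to the next bit of the rank
    return subset


def split_by_median(states: FrozenSet[int]) -> Tuple[FrozenSet[int], FrozenSet[int]]:
    """Hypothetical deterministic split: upper half as g, lower half as h."""
    ordered = sorted(states)
    half = len(ordered) // 2
    return frozenset(ordered[half:]), frozenset(ordered[:half])
```

With `frozenset(range(8))` and k = 4, ranks 0 to 3 receive {0, 1}, {4, 5}, {2, 3} and {6, 7}, which are pairwise disjoint and cover the whole set.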
4.2 State set overlap
After all nodes have picked their state subsets, the nodes proceed with symbolic
state space traversal. A very important observation is that after a few steps
of traversal, state overlap between network nodes may emerge.
Deﬁnition 1 Let S be a set represented using a BDD. Then ‖S‖ denotes the
number of states in S, which is given by the number of maximal minterms of
the BDD.
Definition 2 Let S be a nonempty set and S1, . . . , Sk ⊆ S with k ≥ 2. Then
we define the state overlap ok ∈ [0, 1] of these partitions as:

$$o_k = \frac{\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \|S_i \cap S_j\|}{\|S\| \sum_{i=1}^{k-1} i}. \qquad (1)$$

The overlap is thus the normalized average number of states in the pairwise
intersections of the subsets. The sum in the denominator equals
$\sum_{i=1}^{k-1} i = k(k-1)/2$, the number of pairs Si, Sj with i < j. An overlap of
ok = 0 corresponds to disjoint partitions and an overlap of ok = 1 corresponds
to partitions that are all identical to S.
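Eq. (1) can be checked numerically with a small sketch on explicit state sets; `state_overlap` is a hypothetical helper, and in a BDD setting the state counts ‖·‖ would instead come from a minterm count such as CUDD's `Cudd_CountMinterm`.

```python
from itertools import combinations
from typing import AbstractSet, Sequence


def state_overlap(whole: AbstractSet, parts: Sequence[AbstractSet]) -> float:
    """State overlap o_k of Eq. (1), computed on explicit state sets."""
    k = len(parts)
    if k < 2 or not whole:
        raise ValueError("need k >= 2 subsets of a nonempty set")
    shared = sum(len(a & b) for a, b in combinations(parts, 2))  # sum of ||Si ∩ Sj||, i < j
    pairs = k * (k - 1) // 2                                        # = sum_{i=1}^{k-1} i
    return shared / (len(whole) * pairs)
```

For S = {0, 1, 2, 3}, the disjoint parts {0, 1} and {2, 3} give o2 = 0, the parts {0, 1, 2} and {1, 2, 3} share two of four states and give o2 = 0.5, and three copies of S give o3 = 1.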
5 Overlap Reduction
Boolean functions represent all the state sets and the transition relation in
symbolic traversal. This representation can grow large if the sets to be
represented are big, which directly increases the memory requirements. The
Boolean functions are represented and manipulated using BDDs. The memory
requirements |f| of a Boolean function f are defined as the number of its
BDD nodes. In order to reduce the memory requirements, one can partition a
Boolean function into smaller parts whose union is the whole set.
[Figure: three panels, each showing a state set S split on variable v into S1 and S2, followed by n traversal steps leading to Image^n(S1) and Image^n(S2); in panel (b), Image^n(S1) = Image^n(S2). Panel titles: (a) Tv depends on v only; (b) Tv depends on ei; (c) Tv depends on ei.]
Fig. 3. Possible overlap of subsets after n steps with dependencies on the splitting variable.
Definition 3 Given a Boolean function f : B^n → B, f is partitioned into
two functions f1 and f2 on a variable v from the support set of f with

$$f = f_1 \lor f_2 \quad \text{where} \quad f_1 = v \land f_v,\; f_2 = \bar{v} \land f_{\bar{v}}. \qquad (2)$$
The splitting variable v deﬁnes the partitioning of f into f1 and f2. This
splitting can be implemented easily with BDD operations. BDDs are com-
pressed decision trees where common subtrees are joined. This causes sig-
niﬁcant sharing of nodes in a function’s representation. Thus, splitting a
function f into two functions f1 and f2 with a poor choice of v may not nec-
essarily reduce the memory requirements of the split functions and can result
in |f1| ≈ |f2| ≈ |f |. In the following discourse, we identify a state set with its
characteristic function represented as BDD.
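The partition of Eq. (2) can be illustrated with a truth-table sketch in which a Boolean function is a Python callable over assignment tuples. This is only an illustration under that assumption; a BDD package performs the same construction with cofactor and AND operations, and a truth table cannot show the node-sharing effect |f1| ≈ |f2| ≈ |f| discussed above. The helper names are hypothetical.

```python
from itertools import product
from typing import Callable, Tuple

Assignment = Tuple[bool, ...]
BoolFn = Callable[[Assignment], bool]


def cofactor(f: BoolFn, v: int, value: bool) -> BoolFn:
    """Cofactor of f with respect to x_v = value (f_v for True, f_v-bar for False)."""
    return lambda x: f(x[:v] + (value,) + x[v + 1:])


def split_on(f: BoolFn, v: int) -> Tuple[BoolFn, BoolFn]:
    """Partition of Eq. (2): f1 = v AND f_v, f2 = NOT v AND f_v-bar."""
    f_pos, f_neg = cofactor(f, v, True), cofactor(f, v, False)
    return (lambda x: x[v] and f_pos(x)), (lambda x: (not x[v]) and f_neg(x))


def is_disjoint_cover(f: BoolFn, f1: BoolFn, f2: BoolFn, n: int) -> bool:
    """Check f = f1 OR f2 and f1 AND f2 = false over all 2^n assignments."""
    for x in product((False, True), repeat=n):
        if f(x) != (f1(x) or f2(x)) or (f1(x) and f2(x)):
            return False
    return True
```

For f = (x0 ∧ x1) ∨ x2, splitting on any of the three variables yields two disjoint functions whose disjunction is f.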
5.1 Static overlap reduction
Overlap originates from states in diﬀerent sets having transitions to the same
next states. In order to minimize the overlap of splits, the selected splitting
variable v should not allow states that have common next states to be in
diﬀerent splits. In other words, v should partition the states such that they
have no common next states. In practice, such a partitioning is generally not
possible, but one can put some effort into selecting the splitting variable v to
minimize overlap. To find a good splitting variable, we statically analyze
the design, which is represented as a finite state machine (FSM).
Definition 4 An FSM A is a 4-tuple A = (S, Σ, T, I), where S = {s1, . . . , sn}
is a finite set of states encoded by state variables e1, . . . , em, Σ is a finite input
alphabet, T ⊆ S × Σ × S is a transition relation represented by the partitions
T1, . . . , Tm, and I ⊆ S is the set of initial states.
The idea of selecting a good splitting variable v relies on the conjunctively
partitioned transition relation T [9]. For every i ∈ {1, . . . , m}, a partition Ti of
the transition relation corresponds to the truth value of next state variable
e′i such that $T = \bigwedge_{i=1}^{m} T_i$. We pick v from the set of state variables
E = {e1, . . . , em}. Fig. 3 (a) shows the best case, where there is no overlap. This
kind of situation is only possible if v keeps its truth value in all further
steps, i.e. partition Tv (Ti with v = ei) depends only on v. Though this is the
ideal case, such situations hardly occur in real designs. This means that v
might change its truth value in future steps, as its partition of the transition
relation depends on more factors. The worst case of almost complete overlap
can occur if Tv depends on input variables only disjunctively, as depicted
by Fig. 3 (b). The common case lies between these two extremes and
happens when v depends on inputs conjunctively with other combinations of
state variables, as depicted in Fig. 3 (c). The MinOverlap algorithm pioneers the
use of static information from the partitioned transition relation T to find a
good splitting variable v.
In a pre-processing step, every state variable is assigned an inﬂuence and
the variables are ordered decreasingly by their inﬂuence. The inﬂuence table
maps state variables to their inﬂuence. Later, the splitting variable selection
algorithm utilizes this information.
Definition 5 Let l1, l2 ∈ N be influence lookaheads. For a given FSM A, the
influence Φl1,l2(e) ∈ [−1, 1] of a state variable e ∈ E, with |E| = m, is defined
as

$$\Phi_{l_1,l_2}(e) = \frac{|D^{\uparrow}(e, l_1)| - |D^{\downarrow}(e, l_2)|}{m}. \qquad (3)$$

Set D↑(e, l1) contains all state variables that get influenced by e in l1 steps,
and set D↓(e, l2) contains all state variables that influence e in l2 steps. These
sets are determined iteratively starting with l1 = 1 and l2 = 1. Each Ti directly
corresponds to the truth value of the next state variable e′i, so we compute
these sets by walking all Ti and ei. For D↑(e, 1), we count the partitions Ti
that contain e in their support, whereas for D↓(e, 1) we count the state variables
in the support of the partition Ti with ei = e.
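The influence table can be sketched from the support sets of the partitions alone. The sketch assumes that "in l steps" means "within at most l dependency hops", that an lookahead of 0 yields an empty set (as for Φ1,0), and that input variables in a support are simply ignored; `support`, `reach` and `influence` are hypothetical names, not SymC functions.

```python
from typing import Dict, Set


def reach(start: str, edges: Dict[str, Set[str]], steps: int) -> Set[str]:
    """Variables reachable from `start` in 1..steps hops along `edges`."""
    frontier: Set[str] = {start}
    seen: Set[str] = set()
    for _ in range(steps):
        frontier = {w for u in frontier for w in edges.get(u, set())} - seen
        seen |= frontier
    return seen


def influence(support: Dict[str, Set[str]], l1: int, l2: int) -> Dict[str, float]:
    """Phi_{l1,l2}(e) of Eq. (3); support[e] is the support of partition T_e."""
    m = len(support)
    # down[e]: state variables in the support of T_e (input variables are dropped)
    down = {e: set(deps) & set(support) for e, deps in support.items()}
    # up[d]: state variables whose partition T_e contains d
    up: Dict[str, Set[str]] = {e: set() for e in support}
    for e, deps in down.items():
        for d in deps:
            up[d].add(e)
    return {e: (len(reach(e, up, l1)) - len(reach(e, down, l2))) / m for e in support}
```

For support = {"a": {"a"}, "b": {"a", "in0"}, "c": {"b"}}, Φ1,1 gives 1/3 for a, 0 for b and −1/3 for c; the pre-processing step would then order the variables with `sorted(phi, key=phi.get, reverse=True)`.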
The basic assumption of the MinOverlap algorithm is that splitting on a
variable v with high influence will lead to fewer crossover transitions between the
resulting partitions, because the values of many next state variables, as
measured by Φl1,l2(v), depend on v. Of course, other factors also determine the
values of these next state variables, weakening our assumption. Our algorithm
works well if the partitioned transition relations Ti depend only on conjunctively
connected variables. It degrades if the Ti depend on disjunctively connected
variables where at least one disjunct contains only input variables. However, it is
computationally expensive to analyze all Boolean connectives of the clauses of every Ti.
The actual MinOverlap algorithm picks a viable state variable for split-
ting. The state variables are categorized based on their inﬂuence and put into
diﬀerent sets. We start with the set containing variables with a high inﬂu-
ence and check them against a balancing condition. Alongside, we compute
the cost of these variables with the cost function from [16] that consists of a
redundancy and a reduction factor. If none of the examined variables satisfies
the balancing condition, the variable with minimal cost is selected. Fig. 4
gives the pseudo code for MinOverlap.
// S is the current state set
// S1 and S2 are the resulting partitions
// Φ is the influence table
// δ is the memory balance factor
// α is the weight for the cost function
split(in: S, Φ, δ, α; out: S1, S2)
  bestVar := Φ.top()
  minCost := cost(S, bestVar, α)
  while (C := getCandidateSet(Φ)) ≠ ∅
    for all w ∈ C
      if max(|S_w|, |S_w̄|) ≤ δ|S| then
        v := w; goto do_split
      else
        thisCost := cost(S, w, α)
        if thisCost < minCost then
          minCost := thisCost; bestVar := w
  v := bestVar
do_split: S1 := S_v; S2 := S_v̄
Fig. 4. State set splitting with the MinOverlap algorithm.
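The control flow of Fig. 4 can be sketched in Python with the BDD-specific parts passed in as callables. The cofactor sizes and the cost function are assumptions here: `cofactor_sizes` would return the node counts of the two cofactors, and `cost` is a placeholder for the redundancy/reduction cost function of [16], whose exact form is not reproduced.

```python
from typing import Callable, Hashable, Iterable, Optional, Sequence, Tuple

Var = Hashable


def min_overlap_split(candidate_sets: Sequence[Iterable[Var]],
                      cofactor_sizes: Callable[[Var], Tuple[int, int]],
                      size: int, delta: float,
                      cost: Callable[[Var], float]) -> Var:
    """Select a splitting variable following the control flow of Fig. 4.

    candidate_sets: variables grouped by influence, highest-influence group first.
    cofactor_sizes(w): (|S_w|, |S_w-bar|), e.g. BDD node counts of the cofactors.
    cost(w): placeholder for the cost function of [16].
    """
    best: Optional[Var] = None
    min_cost = float("inf")
    for group in candidate_sets:                 # groups in decreasing influence
        for w in group:
            if max(cofactor_sizes(w)) <= delta * size:
                return w                         # balanced variable found: split on it
            this_cost = cost(w)
            if this_cost < min_cost:
                best, min_cost = w, this_cost
    if best is None:
        raise ValueError("no candidate splitting variables")
    return best                                  # fallback: cheapest examined variable
```

Because the groups are visited in decreasing influence, a balanced high-influence variable is always preferred over a cheaper but lower-influence one.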
5.2 Dynamic overlap reduction
Initially, the overlap between state sets of network nodes is reduced by ap-
plying the MinOverlap algorithm. However, in general overlap may still
reappear after a few steps of state space traversal. In order to further confine
the overlap we perform dynamic overlap reduction. This is a methodology
where we allow overlap to some extent and heuristically select a time frame to
remove it periodically. We perform overlap removal after state set distribution
(see section 4.1). This method is iteratively performed either throughout the
veriﬁcation process or up to n times. An extra node called coordinator orga-
nizes the communication between the nodes and performs dynamic removal
of state overlap. The overlap removal algorithm for each node works in three
steps:
(i) Upon reaching a reduction time point 2, the node dumps its current state
set onto the network drive and sends a message to the coordinator.
(ii) The coordinator removes the overlap of the node's state set with respect to
the state space already visited by other nodes at this time point, and updates
the history of the visited state space. Then it dumps the trimmed state set
and informs the corresponding node to proceed with the reduced state space.
(iii) Finally, once all nodes have passed a reduction time point, the coordinator
removes the state space history of that time point.
2 The state set distribution time point and the reduction period determine the reduction
time points.
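Steps (ii) and (iii) of the coordinator can be sketched with explicit state sets kept in memory instead of BDDs dumped to a network drive. The class and method names are hypothetical, and the special case of a node ending up with an empty set (handled by state set sharing) is not modeled.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, Set


class OverlapCoordinator:
    """Explicit-state sketch of the coordinator's steps (ii) and (iii)."""

    def __init__(self, num_nodes: int) -> None:
        self.num_nodes = num_nodes
        self.history: Dict[int, Set[Hashable]] = defaultdict(set)  # time point -> visited states
        self.arrivals: Dict[int, int] = defaultdict(int)           # time point -> nodes seen

    def reduce(self, time_point: int, states: FrozenSet[Hashable]) -> FrozenSet[Hashable]:
        """Trim `states` against what earlier nodes dumped at the same time point."""
        trimmed = frozenset(states - self.history[time_point])   # step (ii): remove overlap
        self.history[time_point] |= states                        # step (ii): update history
        self.arrivals[time_point] += 1
        if self.arrivals[time_point] == self.num_nodes:          # step (iii): all nodes passed
            del self.history[time_point]
            del self.arrivals[time_point]
        return trimmed
```

A node arriving later at the same time point is trimmed against everything seen so far, so the last node ends up disjoint from all others, which is the source of the natural load balancing described below.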
Fig. 5 delineates the usage of overlap reduction in the main computation
loop of the symbolic simulation algorithm in the parallel stage. We have to
check the termination condition locally, i.e. only in the current subset (line
11), and globally, which requires communication with the other nodes (line
9). For example, in order to show a universal validation, all nodes have to
finish in accept states locally, which can only be checked globally.
1 // S is the set of initial states
2 // t is the checking time bound
3 // p is the period of steps at which overlap removal is performed
4 // n is the overlap removal limit, 0 indicates continuous reduction
5 simulate(in: S, t, p, n)
6   reductionLimit := 0; reductionStep := 0
7   if n > 0 then tillEnd := false else tillEnd := true
8   for iteration := 0 .. t − 1
9     checkTerminationConditionGlobally()
10    S := imageAR(S) // Compute image of AR-automata.
11    checkTerminationConditionLocally(S)
12    S := imageT(S) // Compute image of the system.
13    if (reductionLimit < n) ∨ tillEnd then
14      reductionStep++
15      if reductionStep = p then
16        S := removeOverlap(S)
17        reductionStep := 0; reductionLimit++
Fig. 5. Main computation loop for state overlap removal.
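The interplay of the period p and the limit n in this loop can be checked with a small sketch that keeps only the reduction bookkeeping and records the 0-based iterations at which removeOverlap would run; image computation and termination checks are omitted, and the function name is hypothetical.

```python
from typing import List


def removal_iterations(t: int, p: int, n: int) -> List[int]:
    """0-based iterations at which removeOverlap runs in the loop of Fig. 5.

    p is the reduction period and n the removal limit (0 means reduce until the end).
    """
    till_end = n == 0
    reduction_limit = reduction_step = 0
    hits: List[int] = []
    for iteration in range(t):
        # image computation of the AR-automata and the system would happen here
        if reduction_limit < n or till_end:
            reduction_step += 1
            if reduction_step == p:
                hits.append(iteration)
                reduction_step = 0
                reduction_limit += 1
    return hits
```

With t = 10 and p = 3, a limit of n = 2 triggers removal after iterations 2 and 5 only, while n = 0 keeps reducing and adds iteration 8.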
The main advantage of our dynamic reduction method is that nodes do
not have to wait for slow nodes. After dumping their current state set, faster
nodes can continue to traverse the product automaton. Therefore, we achieve
asynchronous overlap removal between network nodes. Although nodes have
to wait for the coordinator to update their state set, this time is not signiﬁcant
compared to the time spent on image computation.
An interesting side eﬀect of our asynchronous methodology is the resulting
natural load balancing. The very last node that reaches a reduction time
point gets its overlap removed with respect to all other nodes. So this last
node has no states in common with the other nodes at this reduction step.
Our experiments show that usually the last node has the smallest subset after
overlap removal. This in turn means faster image computation, enabling
this node to reach the forthcoming reduction time point faster. Hence, at
that reduction time point this particular node will arrive earlier than other
nodes, and therefore continues with a larger state set. This process alternates
among the nodes accordingly depending on the weight of image computation,
resulting in natural load balancing between the network nodes.
For some models, the overlap is so high that the late nodes become empty
after overlap removal. This special situation is handled by state set sharing
with the following node that reaches any reduction time point.
6 Experimental Results
We performed our experiments on the Kepler cluster at the University of
Tuebingen 3 . This cluster contains 98 computing nodes, each consisting of
dual 650 MHz Pentium-III processors with 1 GB of shared memory (512 MB
for each processor). We conducted our experiments on some of the circuits
from the ISCAS89 benchmarks and a model of a holonic production system
[29]. All experiments were performed with dynamic variable ordering disabled
in the BDD package. For circuits from the ISCAS89 benchmarks we check for
reachability of a state at high Hamming distance from the initial states (see
equation 4) along with properties from [2]. In the holonic production system
we check for consumption of a workpiece (see equation 5). All properties are
checked universally. The properties written in FLTL look like this, where
b > 0 are explicit time bounds on the properties:
G[b] !(s1512.start & s1512.video & ... & s1512.I1733) (4)
F[b] OutBuﬀer.s consume (5)
6.1 Static overlap reduction
In this part we concentrate on comparing the static overlap reduction heuris-
tic MinOverlap to an altered version of the slicing heuristic from [16] labeled
EqualDist, and the variable disjunction decomposition algorithm from the
CUDD package [31] labeled as VarDisj. The MinOverlap algorithm is de-
noted by the inﬂuence Φl1,l2 used for ordering the state variables. In these
experiments, we use a balancing condition of $\max(|f_1|, |f_2|) \le \tfrac{2}{3}|f|$ for the
MinOverlap algorithm. The results are shown in Fig. 6.
3 http://kepler.sfb382-zdv.uni-tuebingen.de
Design | Procs | State vars | Threshold | Split. alg. | Recursion level : splitting vars | Step : ok · 100 | st | vt
s1269 | 2 | 37 | 5000 | Φ1,1 | 1:32 | 1:31.5 | 0.11 | 214.13 (0.55)
s1269 | 2 | 37 | 5000 | Φ1,0 | 1:32 | 1:31.5 | 0.11 | 217.5 (0.51)
s1269 | 2 | 37 | 5000 | EqualDist | 1:0 | 1:58.6 | 0.36 | 236.7 (0.28)
s1269 | 2 | 37 | 5000 | VarDisj | 1:14 | 1:57.4 | 0.15 | 371.2 (0.25)
s1512 | 2 | 57 | 10000 | Φ5,1 | 1:96 | 5:64.8 / 10:89.3 | 0.15 | 3204.85 (4.52)
s1512 | 2 | 57 | 10000 | Φ1,0 | 1:10 | 5:89.3 / 10:91.8 | 0.11 | 3330.31 (4.4)
s1512 | 2 | 57 | 10000 | EqualDist | 1:10 | 5:89.3 / 10:91.8 | 0.87 | 3324.89 (3.72)
s1512 | 2 | 57 | 10000 | VarDisj | 1:10 | 5:89.3 / 10:91.8 | 0.38 | 3312.75 (3.75)
s1269 | 8 | 37 | 5000 | Φ1,1 | 1:32 / 2:34 / 3:36 | 1:9.1 | 0.68 | 69.9 (0.55)
s1269 | 8 | 37 | 5000 | Φ1,0 | 1:32 / 2:34 / 3:40,36 | 1:9.1 | 0.68 | 69.43 (0.52)
s1269 | 8 | 37 | 5000 | EqualDist | 1:0 / 2:42,6 / 3:12,8,40 | 1:12.7 | 0.55 | 58.8 (0.29)
s1269 | 8 | 37 | 5000 | VarDisj | 1:14 / 2:12 / 3:10,26 | NA | 0.26 | 58.4 (0.27)
s1512 | 8 | 57 | 10000 | Φ5,1 | 1:96 / 2:98 / 3:100,10 | 10:54.6 / 15:68.3 | 0.23 | 2891.7 (4.6)
s1512 | 8 | 57 | 10000 | Φ1,0 | 1:10 / 2:12 / 3:14 | 10:76.4 / 15:94.2 | 0.15 | 2993.0 (4.4)
s1512 | 8 | 57 | 10000 | EqualDist | 1:10 / 2:96,12 / 3:12,14,94 | 10:76.6 / 15:94.2 | 1.5 | 2988.9 (3.7)
s1512 | 8 | 57 | 10000 | VarDisj | 1:10 / 2:12,94,96 / 3:12,96,14 | 10:76.4 / 15:94.2 | 0.71 | 3029.2 (3.7)
nh2 | 8 | 118 | 50000 | Φ1,1 | 1:14 / 2:48,16 / 3:10,18,52 | 60:11.7 / 100:25.4 | 0.65 | #481 (7.12)
nh2 | 8 | 118 | 50000 | Φ1,0 | 1:14 / 2:54,16 / 3:48,18,10 | 60:12.2 / 100:26.6 | 0.51 | #272 (7.0)
nh2 | 8 | 118 | 50000 | EqualDist | 1:4 / 2:58,176 / 3:24,134,54,40 | 60:22.1 / 100:40.8 | 13.4 | #151 (6.04)
nh2 | 8 | 118 | 50000 | VarDisj | 1:4 / 2:58,40 / 3:44,54,18,176 | 60:20.9 / 100:41.0 | 13.3 | #145 (6.07)
Fig. 6. Comparison of MinOverlap with other heuristics. The first four columns list the design,
the number of processors used, the number of state variables and the splitting threshold. The
fifth column indicates the splitting algorithm. The sixth column gives, at each splitting recursion
level, the indexes of the selected splitting variables; the CUDD package identifies variables by
index. The seventh column shows the overlap (ok · 100) at different iteration steps. The last two
columns list the average splitting time st and the total verification time vt, respectively (a memory
overflow is indicated by #, followed by the maximum number of steps). The splitting time
corresponds to the time spent in algorithm split as described in Fig. 4.
Discussion: The preprocessing step of the MinOverlap algorithm does
not require a signiﬁcant amount of time, in all experiments it consumed less
than 1% of the verification time. For two processors, design s1269 shows
a signiﬁcant reduction in overlap by selecting high inﬂuence variables and
hence a gain in overall veriﬁcation time can be observed. Both MinOverlap
and EqualDist picked high inﬂuence variables, but only MinOverlap reduced
the overlap signiﬁcantly. This is due to the inﬂuence lookahead condition
explained in Section 5.1. The inﬂuence stays positive for MinOverlap and
becomes negative for EqualDist with Φ1,1.
For eight processors, design s1269 has low overlap with all splitting al-
gorithms. But the other two designs clearly show the beneﬁt of applying
MinOverlap, both for designs with huge and moderate overlap. Design s1512
belongs to the category with huge overlap after a few steps. However, MinOverlap
with a lookahead of l1 = 5 is able to significantly delay the occurrence
of overlap and reduce the verification time. Nevertheless, this reveals that high
influence variables can only help to reduce the overlap for a few steps but cannot
avoid it beyond a limit, making dynamic removal techniques a must. The
overlap in design nh2 increases much more slowly than in the other designs, but its
size leads to memory overflow. Again, MinOverlap is able to reduce the overlap,
even after 100 steps. This allows the nodes to go a lot further without
memory overflow.
6.2 Dynamic overlap reduction
First, we ran some of the larger designs in sequential SymC with all relevant
optimizations switched on. The sequential algorithm splits the state set re-
peatedly upon reaching the threshold, whereas the parallel version does it only
during state set distribution. The results are available in Fig. 7. For most of
the designs, the sequential algorithm cannot complete traversal due to memory
overflow or timeouts 4.
Design | Threshold | Φl1,l2 | Time bound | Peak node count (millions) | vt
s4863 | 20000 | Φ1,1 | 5 | 4.48 | #2
s1512 | 50000 | Φ2,1 | 100 | 3.80 | *80
s1423p1 | 50000 | Φ1,0 | - | 13.55 | #11
s1423p3 | 50000 | Φ1,0 | 12 | 14.27 | *11
nh2 | 50000 | Φ1,1 | 1000 | 2.30 | 663
Fig. 7. Results for fully optimized sequential SymC. The first column lists the design name. Column
two gives the splitting threshold. The third column shows the influence used for MinOverlap
splitting. The fourth and fifth columns list the time bound specified in the property and the maximum
peak node count in millions, respectively. The last column shows the overall verification time. #n
or *n denote memory overflow or timeout at step n.
Fig. 8 shows the results of the distributed approach with dynamic over-
lap removal using 32 processors dedicated to the checking algorithm and one
processor acting as the coordinator. In these experiments, dynamic overlap
removal is applied throughout the veriﬁcation process repeatedly every p steps.
Discussion: First of all, the parallel algorithm is able to ﬁnish all the
problems that the sequential approach was not able to handle due to space or
time restrictions.
Designs s4863 and s1512 clearly show the advantage of combining parallelization
with dynamic overlap removal: decreasing p reduces the verification time. The
traversal of design nh2 also completes, with a speedup of 2.8 over the
sequential version.
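These trends can be recomputed from the overall verification times (vt) in Figs. 7 and 8. The sketch below transcribes a subset of those values. The helper names `speedup` and `vt_grows_with_period` are hypothetical. The time unit is left unspecified because the tables do not state one.

```python
# Overall verification times (vt) transcribed from Fig. 7 (sequential SymC)
# and Fig. 8 (distributed algorithm, keyed by design and reduction period p).
SEQ_VT = {"nh2": 663.0}
PAR_VT = {
    ("s4863", 1): 587.73, ("s4863", 2): 613.32,
    ("s1512", 2): 508.21, ("s1512", 3): 522.25, ("s1512", 5): 643.61,
    ("s1423p3", 1): 1322.22, ("s1423p3", 2): 1171.18, ("s1423p3", 3): 791.0,
    ("nh2", 100): 230.08,
}


def speedup(design: str, p: int) -> float:
    """Sequential vt divided by distributed vt."""
    return SEQ_VT[design] / PAR_VT[(design, p)]


def vt_grows_with_period(design: str) -> bool:
    """True if vt strictly increases as the reduction period p increases."""
    periods = sorted(p for d, p in PAR_VT if d == design)
    times = [PAR_VT[(design, p)] for p in periods]
    return all(a < b for a, b in zip(times, times[1:]))


print(f"nh2 speedup: {speedup('nh2', 100):.2f}")
for d in ("s4863", "s1512", "s1423p3"):
    print(d, vt_grows_with_period(d))
```

The ratio for nh2 is 663 / 230.08 ≈ 2.88. The period check holds for s4863 and s1512 but fails for s1423p3, the anomaly discussed for design s1423.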
⁴ Experiments were stopped after one hour.
Design   Threshold  p (Φ_{l1,l2})  Time bound  Seq. time (step)  Peak node count  vt
s4863    20000      1 (Φ_{1,1})    5           1.67 (1)          8.39             587.73
s4863    20000      2 (Φ_{2,1})    5           1.69 (1)          10.52            613.32
s1512    10000      2 (Φ_{2,1})    100         108.75 (33)       2.41             508.21
s1512    10000      3 (Φ_{3,1})    100         108.95 (33)       2.55             522.25
s1512    10000      5 (Φ_{5,1})    100         106.53 (33)       2.83             643.61
s1423p1  50000      1 (Φ_{1,1})    -           75.70 (8)         13.20            748.5
s1423p1  50000      2 (Φ_{2,1})    -           75.54 (8)         13.77            806.99
s1423p1  50000      5 (Φ_{5,1})    -           76.13 (8)         11.23            567.41
s1423p2  50000      1 (Φ_{1,1})    -           76.62 (8)         1.87             114.25
s1423p3  50000      1 (Φ_{1,1})    12          152.04 (9)        14.51            1322.22
s1423p3  50000      2 (Φ_{2,1})    12          151.84 (9)        13.18            1171.18
s1423p3  50000      3 (Φ_{3,1})    12          153.18 (9)        7.68             791
nh2      50000      100 (Φ_{1,1})  1000        86.22 (132)       1.658            230.08
Fig. 8. Results of the distributed algorithm with dynamic overlap removal. The first two columns
give the design and the splitting threshold. The third column shows the time period p at which
overlap reduction is performed and the influence used in MinOverlap. The fourth column lists the
time bound specified in the property. Column five lists the time taken by the sequential part and
the time step at which the parallel stage starts. Column six shows the maximum peak node count
over all nodes, in millions. The last column lists the overall verification time.
For design s1423 we considered three properties, p1, p2 and p3. Properties p1
and p2 are taken from [2] and are pure LTL properties; hence no time bound is
specified for them. Compared to [2], SymC finds errors in the designs
significantly faster, even when the different hardware configurations are taken into account.
However, design s1423 behaves unexpectedly: verification time increases
with shorter dynamic overlap reduction periods. This effect is caused by the
behavior of the BDDs representing the state sets: removing states from a
set can actually increase the size of its BDD representation. This calls for
heuristics that decide when and how to apply dynamic overlap removal. We are also
investigating whether dynamic variable reordering alleviates this problem.
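The effect that a smaller state set can have a larger BDD is easy to reproduce on a toy example. The sketch below is an illustration, not SymC or CUDD code. It builds a reduced ordered BDD from a full truth table under a fixed variable order with a hypothetical helper `robdd_size`, which is only feasible for a handful of state bits.

```python
from typing import Callable, Dict, Tuple, Union

# A node is either a terminal (bool) or a (level, low, high) triple.
Node = Union[bool, Tuple[int, object, object]]


def robdd_size(f: Callable[[Tuple[int, ...]], bool], n: int) -> int:
    """Number of internal nodes in the ROBDD of f under the order x0 < ... < x(n-1).

    f is the characteristic function of a state set over n state bits.
    The diagram is built by Shannon expansion over the full truth table.
    """
    unique: Dict[Tuple[int, Node, Node], Node] = {}

    def build(prefix: Tuple[int, ...]) -> Node:
        level = len(prefix)
        if level == n:
            return bool(f(prefix))  # terminal: is this state in the set?
        low = build(prefix + (0,))
        high = build(prefix + (1,))
        if low == high:  # reduction rule 1: drop redundant tests
            return low
        key = (level, low, high)
        return unique.setdefault(key, key)  # reduction rule 2: share equal nodes

    build(())
    return len(unique)
```

With four state bits, the set of all 16 states is just the terminal node (0 internal nodes). Removing only the all-zero state (`any(s)`) gives 4 nodes, and keeping only the 8 even-parity states (`sum(s) % 2 == 0`) gives 7 nodes, although that set is half as large.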
Fig. 9 depicts the natural load balancing for circuit s1512 with
reduction period 2. For readability, only four nodes are shown.
The load-balancing effect is clearly visible where nodes 0 and 24 swap
their arrival order during execution.
Finally, measurements indicate that reading BDDs from and writing them to
disk do not contribute significantly to the overall verification time. Thus,
network I/O is not a bottleneck of the distributed algorithm.
[Plot: node arrival order (y-axis, 0 to 30) at reduction time points 1 to 10 (x-axis) for node 0, node 8, node 16 and node 24.]
Fig. 9. Arrival order of nodes at reduction points showing load balancing between the nodes.
7 Conclusions and Future Work
This paper presents the parallelization of a BDD-based bounded property
checking algorithm. The two main contributions are a novel splitting algorithm
that takes overlap reduction into account, and a distributed on-the-fly algorithm
for asynchronous state-space traversal whose dynamic overlap reduction
results in natural load balancing.
The MinOverlap splitting heuristic enhances existing decomposition algorithms
by preprocessing the transition relation and using this information to
order the list of candidate splitting variables. The experiments show that
this preprocessing step can reduce both the overlap and the splitting
time. Furthermore, MinOverlap rarely degrades the splitting runtime
or the resulting overlap significantly.
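The intuition behind choosing splitting variables with the transition relation in mind can be illustrated by a brute-force stand-in. This is not MinOverlap, which orders candidate variables from a preprocessing of the transition relation. The sketch instead assumes explicit bit-vector states and a hypothetical deterministic next-state function, and simply measures, for each candidate variable, how much the images of the two resulting partitions overlap.

```python
from itertools import product
from typing import Callable, Set, Tuple

State = Tuple[int, ...]
Step = Callable[[State], State]  # hypothetical deterministic next-state function


def image(states: Set[State], step: Step) -> Set[State]:
    """Explicit one-step image of a state set."""
    return {step(s) for s in states}


def split_overlap(states: Set[State], var: int, step: Step) -> int:
    """Number of states shared by the images of the two halves when splitting on var."""
    low = {s for s in states if s[var] == 0}
    high = states - low
    return len(image(low, step) & image(high, step))


def choose_split_var(states: Set[State], n_vars: int, step: Step) -> int:
    """Variable whose split yields the smallest image overlap (ties: lowest index)."""
    return min(range(n_vars), key=lambda v: (split_overlap(states, v, step), v))


def clear_x0(s: State) -> State:
    """Example transition that forgets the value of bit x0."""
    return (0, s[1], s[2])


all_states = set(product((0, 1), repeat=3))
```

Because `clear_x0` forgets bit x0, splitting on x0 yields two partitions with identical images (overlap of 4 states), whereas splitting on x1 or x2 yields disjoint images, so `choose_split_var(all_states, 3, clear_x0)` picks x1.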
Dynamic overlap reduction is an important technique for enabling verification
of larger designs and significantly improves the applicability of the distributed
algorithm. Reassigning idle nodes avoids wasted computation power.
However, for some designs overlap reduction can actually increase the size of the BDD
representation of sets that contain fewer states. This appears to be related to the fact
that the BDD variable order is fixed after the sequential stage. We are
experimenting with different variable orderings on the computation nodes to handle
these cases. Furthermore, we are extending our experiments to designs from
the VIS suite and recent IBM examples.
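Reassigning idle nodes can be pictured with the following sketch. The policy is hypothetical: an idle node (one with an empty frontier) receives half of the currently largest frontier, and splitting by sorted state order stands in for splitting the state-set BDD on a variable.

```python
from typing import List, Set


def reassign_idle(frontiers: List[Set[int]]) -> List[Set[int]]:
    """Give each idle node half of the currently largest frontier (hypothetical policy)."""
    result = [set(f) for f in frontiers]
    for i in range(len(result)):
        if result[i]:
            continue  # node i still has work of its own
        donor = max(range(len(result)), key=lambda j: len(result[j]))
        if len(result[donor]) < 2:
            break  # nothing left that is worth splitting
        items = sorted(result[donor])
        half = len(items) // 2
        # The donor keeps the lower half; the idle node takes the upper half.
        result[donor], result[i] = set(items[:half]), set(items[half:])
    return result
```

For instance, `reassign_idle([{1, 2, 3, 4}, set(), {5}])` returns `[{1, 2}, {3, 4}, {5}]`. The union of all frontiers is unchanged, so no state is lost by the reassignment.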
8 Acknowledgements
We thank the reviewers for their detailed comments, which helped
improve the quality of this paper.
References
[1] Accellera, “Property Specification Language (PSL), Version 1.1,” (2004),
http://www.eda.org/vfv.
[2] Ben-David, S., T. Heyman, O. Grumberg and A. Schuster, Scalable distributed on-the-fly
symbolic model checking, International Journal on Software Tools for Technology Transfer
(STTT) 4(4) (2003), pp. 496–504.
[3] Biere, A., A. Cimatti, E. M. Clarke, O. Strichman and Y. Zhu, Bounded model checking, in:
M. Zelkowitz, editor, Highly Dependable Software, Advances in Computers 58, Academic Press,
2003.
[4] Bollig, B. and I. Wegener, Partitioned BDDs vs. other BDD models, in: ACM/IEEE
International Workshop on Logic Synthesis (IWLS), 1997.
[5] Braberman, V., A. Olivero and F. Schapachnik, Issues in distributed timed model checking:
Building Zeus, International Journal on Software Tools for Technology Transfer (STTT) 7(1)
(2005), pp. 4–18.
[6] Brim, L., I. Černá, P. Moravec and J. Šimša, Distributed partial order reduction of state spaces,
in: Proceedings of PDMC 2004 [24].
[7] Bryant, R. E., Symbolic boolean manipulation with ordered binary-decision diagrams, ACM
Computing Surveys 24(3) (1992), pp. 293–318.
[8] Burch, J., E. Clarke, K. L. McMillan, D. Dill and L. Hwang, Symbolic Model Checking: 10^20
States and Beyond, Information and Computation 98 (1992), pp. 142–170.
[9] Burch, J. R., E. M. Clarke and D. E. Long, Representing circuits more efficiently in symbolic
model checking, in: 28th Conference on Design Automation (1991), pp. 403–407.
[10] Cabodi, G., P. Camurati, L. Lavagno and S. Quer, Disjunctive partitioning and partial iterative
squaring: An effective approach for symbolic traversal of large circuits, in: 34th Conference on
Design Automation (1997), pp. 728–733.
[11] Cabodi, G., P. Camurati and S. Quer, Improved reachability analysis of large finite state
machines, in: Proceedings of ICCAD 1996 [23], pp. 354–360.
[12] Clarke, E. M., O. Grumberg and D. E. Peled, “Model Checking,” The MIT Press, 1999.
[13] Grumberg, O., T. Heyman and A. Schuster, Distributed symbolic model checking for µ-calculus,
in: G. Berry, H. Comon and A. Finkel, editors, Computer Aided Verification, 13th International
Conference, Lecture Notes in Computer Science 2102 (2001), pp. 350–362.
[14] Grumberg, O., T. Heyman and A. Schuster, A work-efficient distributed algorithm for
reachability analysis, in: W. A. Hunt Jr. and F. Somenzi, editors, Computer Aided Verification,
15th International Conference, Lecture Notes in Computer Science 2725 (2003), pp. 54–66.
[15] Haverkort, B., A. Bell and H. Bohnenkamp, On the efficient sequential and distributed
generation of very large Markov chains from stochastic Petri nets, in: 8th International
Workshop on Petri Nets and Performance Models (1999).
[16] Heyman, T., D. Geist, O. Grumberg and A. Schuster, Achieving scalability in parallel
reachability analysis of very large circuits, in: E. A. Emerson and A. P. Sistla, editors, Computer
Aided Verification, 12th International Conference, Lecture Notes in Computer Science 1855
(2000), pp. 20–35.
[17] Inggs, C. P. and H. Barringer, CTL* model checking on a shared-memory architecture, in:
Proceedings of PDMC 2004 [24].
[18] Lange, M. and H. W. Loidl, Parallel and symbolic model checking for fixpoint logic with Chop,
in: Proceedings of PDMC 2004 [24].
[19] McMillan, K. L., A conjunctively decomposed boolean representation for symbolic model
checking, in: R. Alur and T. A. Henzinger, editors, Computer Aided Verification, 8th
International Conference, Lecture Notes in Computer Science 1102 (1996), pp. 13–25.
[20] Narayan, A., J. Jain, M. Fujita and A. L. Sangiovanni-Vincentelli, Partitioned ROBDDs
- a compact, canonical and efficiently manipulable representation for boolean functions, in:
Proceedings of ICCAD 1996 [23], pp. 547–554.
[21] Orzan, S., J. van de Pol and M. V. Espada, A state space distribution policy based on abstract
interpretation, in: Proceedings of PDMC 2004 [24].
[22] Peranandam, P. M., R. J. Weiss, J. Ruf, T. Kropf and W. Rosenstiel, Dynamic guiding of
bounded property checking, in: IEEE International High Level Design Validation and Test
Workshop 2004 (HLDVT 04), 2004.
[23] “Proceedings of ICCAD 1996,” ACM and IEEE Computer Society Press, 1996.
[24] “Proceedings of PDMC 2004,” Electronic Notes in Theoretical Computer Science, Elsevier,
2004.
[25] Ravi, K., K. L. McMillan, T. R. Shiple and F. Somenzi, Approximation and decomposition of
binary decision diagrams, in: 35th Conference on Design Automation (1998), pp. 445–450.
[26] Ravi, K. and F. Somenzi, High-density reachability analysis, in: 1995 IEEE/ACM International
Conference on CAD (1995), pp. 154–158.
[27] Ruf, J., D. W. Hoffmann, T. Kropf and W. Rosenstiel, Simulation-guided property checking
based on a multi-valued AR-automata, in: W. Nebel and A. Jerraya, editors, Design,
Automation and Test in Europe 2001 (2001), pp. 742–748.
[28] Ruf, J., P. M. Peranandam, T. Kropf and W. Rosenstiel, Bounded property checking with
symbolic simulation, in: Forum on Specification and Design Languages 2003, 2003.
[29] Ruf, J., R. J. Weiss, T. Kropf and W. Rosenstiel, Modeling and formal verification of production
automation systems, in: E. et al., editors, Integration of Software Specification Techniques for
Applications in Engineering, Lecture Notes in Computer Science 3147, Springer, 2004, pp.
541–566.
[30] Sahoo, D., S. K. Iyer, J. Jain, C. Stangier, A. Narayan, D. L. Dill and E. A. Emerson, A
partitioning methodology for BDD-based verification, in: A. J. Hu and A. K. Martin, editors,
Formal Methods in Computer-Aided Design, Fifth International Conference, Lecture Notes in
Computer Science 3312 (2004), pp. 399–413.
[31] Somenzi, F., CUDD: CU decision diagram package, release 2.4.0,
http://vlsi.colorado.edu/~fabio/CUDD (2004).
[32] Stern, U. and D. L. Dill, Parallelizing the Murφ verifier, in: O. Grumberg, editor, Computer
Aided Verification, 9th International Conference, Lecture Notes in Computer Science 1254
(1997), pp. 256–278.
