On the non-termination of MDG-based abstract state enumeration by Mohamed, Otmane Aït et al.
Theoretical Computer Science 300 (2003) 161–179
www.elsevier.com/locate/tcs
On the non-termination of MDG-based abstract state
enumeration
Otmane Aït Mohamed a,1, Xiaoyu Song b,∗, Eduard Cerny c
a Concordia University, ECE Department, 1455 de Maisonneuve Blvd. W., Montreal, Canada H3G 1M8
b Portland State University, P.O. Box 751, Portland, OR 97207-0751, USA
c Université of Montréal, C.P. 6128, Succ. Centre-Ville, Montréal, Canada H3C 3J7
Received 20 December 1999; received in revised form 8 March 2001; accepted 23 August 2001
Communicated by R. Gorrieri
Abstract
Multiway decision graphs are a new class of decision graphs for representing abstract state
machines. This yields a new verification technique that can deal with the data-width problem
by using abstract sorts and uninterpreted functions to represent data values and data operations,
respectively. However, in many cases, it may suffer from the non-termination of the state
enumeration procedure. This paper presents a novel approach to solving the non-termination
problem when the generated set of states, even if infinite, represents a structured domain where terms
(states) share certain repetitive patterns. The approach is based on the schematization method
developed by Chen and Hsiang, namely ρ-terms. Schematization provides a suitable formalism
for finitely manipulating infinite sets of terms. We illustrate the effectiveness of our method by
several examples. © 2002 Elsevier Science B.V. All rights reserved.
Keywords: Multiway decision graphs; Reachability analysis; Recurrent domains; ρ-terms
1. Introduction
Two main approaches to formal verification have been studied: interactive verification
using a theorem prover, and model checking. Each method possesses its own strengths
and weaknesses. Mechanical theorem proving is more general, but requires intensive
human guidance and is thus time consuming. Model checking is automatic, but it
is applicable only to finite state machines and suffers from the state-explosion problem.
∗ Corresponding author. Tel.: +1-503-725-5398; fax: +1-503-725-3807.
E-mail addresses: ait@ece.concordia.ca (O.A. Mohamed), song@ee.pdx.edu (X. Song), cerny@iro.umontreal.ca (E. Cerny).
1 This work was performed while the first author was at University of Montreal.
0304-3975/03/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0304-3975(01)00345-0
This limits its use to relatively small circuits. The problem has been widely addressed
in the literature. The works described in [1, 3, 4, 7, 8, 11, 16] exploit Bryant's reduced
ordered binary decision diagrams (ROBDDs) [2] to encode sets of states and to perform
implicit enumeration of the state space. However, these methods are not adequate in
general for verifying circuits with large and complex data paths, because of the Boolean
representation of circuits. More specifically, every individual bit of every data signal
is represented by a Boolean variable, while the size of a ROBDD grows, sometimes
exponentially, with the number of variables. This means that ROBDD-based verification
methods often take too much time, or run out of memory, when applied to circuits
having a complex data path.
Recently, a new verification approach was presented to overcome the above drawbacks.
This approach is based on abstract descriptions of state machines (ASM)
which are encoded by a new class of decision graphs, called multiway decision graphs
(MDGs) [10], of which ROBDDs are a special case. With MDGs, one can integrate two
verification techniques that have been very successful: implicit state enumeration and
the use of abstract sorts and uninterpreted function symbols. MDGs are decision graphs
that can represent relations as well as sets of states, and they incorporate variables of
abstract sorts to denote data values, and uninterpreted function symbols to denote data
operations. A set of basic operations on MDGs was implemented to perform various
kinds of verification. It includes the algorithms for disjunction, relational product
(image computation), pruning by subsumption, and rewriting [10].
Unfortunately, the method suffers in many cases from an important problem, namely
non-termination when computing the set of reachable states. This can be a severe
limitation on the use of MDGs as a verification tool. For example, consider an abstract
description of a conventional (non-pipelined) microprocessor where a state variable
pc of abstract sort represents the program counter, a generic constant zero of the
same abstract sort denotes the initial value of pc, and an abstract function symbol inc
describes how the program counter is incremented by a non-branch instruction. The
MDG representing the set of reachable states of the microprocessor would contain states
of the form

\[ (pc,\ \underbrace{inc(\ldots inc(}_{k}\, zero) \ldots)) \]

for every k ≥ 0. Consequently, there is no finite MDG representation of the set of reachable
states, and hence the reachability algorithm will not terminate. This illustrates a
typical form of non-termination, due to the fact that the terms can be arbitrarily large
and hence arbitrarily many.
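The divergence described above can be reproduced with a small sketch. It is not the MDG procedure: it assumes terms are plain Python values (the string "zero" and nested tuples for inc applications), uses ordinary sets instead of MDGs, and cuts the loop off after a hypothetical bound max_iterations so that it can be run.

```python
def inc(term):
    """Apply the uninterpreted symbol inc by building a larger term."""
    return ("inc", term)


def enumerate_pc(max_iterations):
    """Naive frontier-based enumeration of the values of pc.

    Returns the values found and a flag telling whether a fixed point
    was reached before the cut-off.
    """
    reached = {"zero"}
    frontier = {"zero"}
    for _ in range(max_iterations):
        new = {inc(t) for t in frontier} - reached
        if not new:
            return reached, True   # fixed point: nothing new was produced
        reached |= new
        frontier = new
    return reached, False          # cut off: the frontier is still non-empty


states, converged = enumerate_pc(5)
print(len(states), converged)
```

Every iteration produces a term that is syntactically new, so the frontier never becomes empty, whatever bound is chosen.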
In this paper, we present a method based on schematization to deal with this kind
of non-termination problem. Schematization is a method for finitely representing infinite
sets of objects [12, 5, 15, 9, 14]: terms, rewrite rules, substitutions, etc. If some
infinite sets can be represented finitely and if algebraic operations can be performed
on these finite representations, then the problem of non-termination can be avoided.
The abstract machines we consider are those that present a cyclic behavior starting
from any state of the machine. Let us return to the above example: as we can see,
the labels of the pc node in the MDGs generated by the reachability analysis procedure
form a structured domain whose terms share a repetitive pattern. For instance,
the pattern of the domain {inc^k(zero) : k ∈ ℕ} is inc(◇): for every term
inc^k(zero) in the domain, inc(inc^k(zero)) is also in the domain. Such a set is usually
represented by a single expression Φ(inc(◇), N, zero), where the first argument of
Φ represents the pattern, the second argument serves as a counter, called the degree
variable, and the third argument represents the base term. This kind of expression
is called a ρ-term [5]. By allowing ρ-terms to be part of the language, we are
able to represent an infinite MDG finitely, by labeling some edges by a ρ-term, i.e.,
we can represent a logical formula having an infinite number of disjuncts of the form
pc = zero ∨ pc = inc(zero) ∨ pc = inc(inc(zero)) ∨ ⋯ ∨ pc = inc^k(zero) ∨ ⋯ as
pc = Φ(inc(◇), N, zero).
The paper is organized as follows: In Section 2, we briefly present MDGs. In
Section 3, we provide a background on ρ-terms, and we define the MDG extension
which incorporates the ρ-terms. In Section 4, we illustrate our method on several
examples. Finally, we conclude with some remarks and discuss the direction of future
work.
Related Works. The non-termination problem was studied in [17]. The authors presented
a method based on the generalization of the state variable that causes divergence,
like the variable pc in the example. Rather than starting the reachability analysis with
a generic constant zero as the value of pc, a fresh variable (footnote 2) is assigned to pc at the
beginning of the analysis. As a consequence, the set of states represented by pc is
enlarged, so that any incrementation of pc leads the ASM to a state where the new
value of pc is an instance of its arbitrary initial value. This technique is applicable only
to circuits which, like a conventional (non-pipelined) processor, have a cyclic behavior:
starting from a ready state, doing some work, and returning to a ready state. This class
of circuits is called processor-like loop circuits. Unfortunately, if the loop is not
entered in the initial state, then this generalization technique may not work. The
solution proposed by the authors consists of manually finding the entry state of the loop,
where the generalization must be done. In general, it is difficult to identify
processor-like loops in a machine. The heuristics used by the authors require human intervention
at different stages of the verification process. Furthermore, the main drawback of this
generalization method is the loss of the information provided by axioms which partially
interpret abstract function symbols. This means that we are deprived of powerful
automated deduction techniques, such as rewriting, which are useful when carrying out
verification with MDGs. In this work, we use a schematization which retains the
advantages of generalization while avoiding its weaknesses in dealing with non-termination.
Rewriting rules that characterize uninterpreted functions can still be used: it suffices to
instantiate the degree variables to specific values and apply suitable rules.
2 A fresh variable is disjoint from all other variables.
2. Multiway decision graphs
An MDG is a finite, directed acyclic graph (DAG) where the leaves are labeled by
True (⊤), the internal nodes are labeled by terms, and the edges issuing from an internal
node are labeled by terms of the same sort as the label of that node. Such a graph is a canonical
representation, modulo a set of well-formedness conditions (see [10] for details), of
a certain quantifier-free formula, called a directed formula (DF). A DF is a formula of a
variant of first-order logic with equality and sorts, with a distinction between concrete
sorts and abstract sorts. This distinction is a syntactic counterpart of the hardware
difference between data path and control. Concrete sorts have enumerations which are
sets of individual constants, while abstract sorts do not.
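Syntax. Let F be a set of function symbols and V a set of variables. We denote
the set of terms freely generated from F and V by T(F, V). The syntax of a directed
formula is then given by the grammar below, where the subscripts abs and conc mark
the abstract and concrete variants of a syntactic class:

\[
\begin{array}{llcl}
\text{Sort} & S & ::= & S_{\mathit{abs}} \mid S_{\mathit{conc}} \\
\text{Abstract sort} & S_{\mathit{abs}} & ::= & \alpha \mid \beta \mid \gamma \mid \ldots \\
\text{Concrete sort} & S_{\mathit{conc}} & ::= & \alpha \mid \beta \mid \gamma \mid \ldots \\
\text{Generic constant} & C_{\mathit{abs}} & ::= & a \mid b \mid c \mid \ldots \\
\text{Concrete constant} & C_{\mathit{conc}} & ::= & a \mid b \mid c \mid \ldots \mid 0 \mid 1 \mid \ldots \\
\text{Variable} & V & ::= & V_{\mathit{abs}} \mid V_{\mathit{conc}} \\
\text{Abstract variable} & V_{\mathit{abs}} & ::= & x \mid y \mid z \mid \ldots \\
\text{Concrete variable} & V_{\mathit{conc}} & ::= & x \mid y \mid z \mid \ldots \\
\text{Directed formula} & \mathit{Disj} & ::= & \mathit{Conj} \lor \mathit{Disj} \mid \mathit{Conj} \\
 & \mathit{Conj} & ::= & \mathit{Eq} \land \mathit{Conj} \mid \mathit{Eq} \\
 & \mathit{Eq} & ::= & A = C_{\mathit{conc}} \quad (A \in T(F, V)) \\
 & & \mid & V_{\mathit{conc}} = C_{\mathit{conc}} \\
 & & \mid & V_{\mathit{abs}} = A \quad (A \in T(F, V))
\end{array}
\]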
The vocabulary consists of generic constants, concrete constants, abstract variables,
concrete variables and function symbols. The distinction between abstract and concrete
sorts leads to a distinction between three kinds of function symbols. Let f be a function
symbol of type α_1 × ⋯ × α_n → α_{n+1}. If α_{n+1} is an abstract sort then f is an abstract
function symbol. If all of α_1, …, α_{n+1} are concrete, f is a concrete function symbol.
If α_{n+1} is concrete while at least one of α_1, …, α_n is abstract, then we refer to f as a
cross-operator; cross-operators are useful for modeling feedback from the data path to
the control circuitry. Atomic formulae are the equations, generated by the clause Eq,
plus ⊤ (truth) and ⊥ (false). Directed formulae are disjunctions of conjunctions of
equations.
An equation is well-typed if the sort of the term on the left-hand side of the equation
is the same as the sort of the term on the right-hand side.
A directed formula is well-typed and is of type U → V if and only if: (1) each
equation is well-typed, (2) each term A in an equation of the form A = C or
v = A is in T(F, U), and (3) every abstract variable v ∈ V appears as the LHS of
an equation v = A in each of the disjuncts.
Just as ROBDDs [2] must be reduced and ordered, MDGs must obey a set of
well-formedness conditions given in [10]. Among other things, these conditions specify the
kinds of nodes that may appear in an MDG. An internal node may be labeled by
a variable of concrete sort, with edges issuing from the node labeled by individual
constants in the enumeration of the sort; or by a variable of abstract sort, with edges
labeled by concretely reduced terms of that sort; or by a cross-term of a sort α, with
edges labeled by the individual constants in the enumeration of α. All leaf nodes are
labeled ⊤, except when the graph has a single node, which may be labeled ⊤ or ⊥.
Semantics. An interpretation is a mapping ψ that assigns a denotation to each sort,
constant and function symbol, and satisfies the following conditions:
• The denotation ψ(S) of an abstract sort S is a non-empty set.
• If S is a concrete sort with enumeration {a_1, …, a_m}, then ψ(S) = {ψ(a_1), …, ψ(a_m)}
and ψ(a_i) ≠ ψ(a_j) for all i, j such that i ≠ j.
• A variable assignment with domain X compatible with an interpretation ψ is a
function φ that maps every variable x of X of sort S to an element φ(x) of ψ(S).
We write Φ^ψ_X for the set of ψ-compatible assignments to the variables in X.
• If f(t_1, …, t_n) is a term of sort S_{n+1} and t_1, …, t_n are terms of sorts S_1, …, S_n,
respectively, then the denotation ψ(f(t_1, …, t_n)) is defined as ψ(f)(ψ(t_1), …, ψ(t_n)).
In particular, if the arity of f is equal to 0 (i.e., f is a generic constant of sort S), then
ψ(f) ∈ ψ(S).
• We write ψ, φ ⊨ P if a formula P denotes truth under an interpretation ψ and a
ψ-compatible variable assignment φ to the free variables of P; ψ ⊨ P if ψ, φ ⊨ P for
all such assignments φ; and ⊨ P if ψ ⊨ P for all interpretations ψ. Two formulae P
and Q are logically equivalent iff ⊨ P ⇔ Q.
MDG-based abstract state enumeration. A circuit is described at the register-transfer
level as a collection of components interconnected by nets that carry signals. Each signal
is represented by a variable. Variables denoting control signals have concrete sorts,
while variables denoting data values have abstract sorts. Uninterpreted function
symbols used to model control operations must have a concrete sort, while
data operations are viewed as black boxes and are modeled by uninterpreted function
symbols which must have an abstract sort. A set of basic operations on MDGs
is implemented to perform various kinds of verification for a given circuit. This
set includes algorithms for disjunction, relational product (image computation), pruning
by subsumption, and rewriting; these are discussed in detail in [10]. Most
of the verification techniques with MDGs are based on an implicit reachability analysis
algorithm which forms the kernel of this tool. The algorithm is based on abstract state
enumeration [10], where sets of states, as well as transition and output relations, are
represented using MDGs. Because of abstract variables and the uninterpreted nature of
function symbols, the reachability analysis algorithm may not terminate, and the least
fixed point may not be reached in state enumeration. The procedure, called ReAn for
Reachability Analysis, is described by the following pseudo-code:
1. Proc ReAn(D)
2.   R := FI; Q := FI; K := 0;
3.   loop
4.     K := K + 1;
5.     I := Fresh(X, K);
6.     N := RelP({I, Q, FT}, X ∪ Y, η);
7.     Q := PbyS(N, R);
8.     if Q = ⊥ then return success;
9.     R := PbyS(R, Q);
10.    R := Disj(R, Q);
11.  end loop;
12. end ReAn;
where D = (X, Y, Z, FI, FT, FO) is an abstract state machine description in which X, Y, Z
are disjoint sets of variables, viz. the input, state, and output variables, respectively,
and FI, FT, FO are DFs representing a set of initial states, the transition, and the output
relations, respectively.
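The control structure of ReAn can be illustrated with a rough sketch over explicit state sets. This is only an analogy under strong assumptions: Python sets stand in for MDGs, set difference stands in for PbyS, the image under a hypothetical step function stands in for RelP, and the guard max_iterations is added so that the sketch always returns.

```python
def rean(initial_states, inputs, step, max_iterations=1000):
    """Frontier-based reachability over explicit sets (not MDGs)."""
    reached = set(initial_states)    # plays the role of R
    frontier = set(initial_states)   # plays the role of Q
    for k in range(1, max_iterations + 1):
        # Image of the frontier under every input (stand-in for RelP).
        image = {step(s, x) for s in frontier for x in inputs}
        # Keep only states not seen before (stand-in for PbyS(N, R)).
        frontier = image - reached
        if not frontier:
            return reached, k        # fixed point reached at iteration k
        reached |= frontier          # stand-in for Disj(R, Q)
    raise RuntimeError("no fixed point within max_iterations")


def counter_step(state, y):
    """A hypothetical modulo-3 counter that advances when y = 1."""
    return (state + 1) % 3 if y == 1 else state


reached, iterations = rean({0}, {0, 1}, counter_step)
print(sorted(reached), iterations)
```

On a finite state space such as this counter the frontier eventually becomes empty; with abstract terms built from uninterpreted symbols it need not.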
We describe the most important steps of the algorithm; for more details see [10, 17].
In this pseudo-code, I, N, Q and R are program variables that take as values MDGs
representing sets of states. We will identify the program variables and their values in
the following explanations when there is no risk of confusion.
Before each loop iteration, R represents the set of reachable states found so far,
while Q represents the frontier set, i.e., a subset of Set^ψ_Y(R) containing at least all
those states that entered Set^ψ_Y(R) for the first time in the previous iteration.
In line 5, Fresh(X, K) constructs a one-path MDG representing a conjunction of
equations x = u, one for each abstract input variable x ∈ X, where u is a fresh variable from
the set of auxiliary abstract variables U. The value of the loop counter K is used to
generate the fresh variables. This one-path MDG is assigned to I, which represents the
set of input vectors.
In line 6, the relational product operation computes the MDG N representing the set
of states reachable in one step from the frontier set Q of states that have not been
visited.
Note that the MDG Q representing the frontier set is of type U → Y, the MDG I
representing the set of input vectors is of type U → X, and the MDG FT representing
the transition relation is of type (X ∪ Y) → Y′. The result of taking the conjunction
of these three MDGs would be of type U → (X ∪ Y ∪ Y′), the result of subsequently
removing the variables in X ∪ Y by existential quantification would be of type U → Y′,
and the result of subsequently applying the renaming substitution η would be of type
U → Y. The RelP operation performs these three operations in one pass, and assigns
the resulting MDG of type U → Y to N.
Lines 7 and 8 check if the states reachable in one step are included in the sets of
states found in the previous iterations. This is done by using the prune-by-subsumption
operation PbyS, which removes from N those paths which are instances of some paths
in R. This operation uses syntactic matching between terms that label two corresponding
nodes to find such paths. If the result is the empty set, then the procedure terminates
and reports success. Otherwise, the MDG Q represents the new frontier set.
Line 9 simplifies R by removing from it any paths that are subsumed by Q, using
PbyS. There may be such paths because Q was not computed earlier as an exact
difference. Then line 10 computes the new value of R by taking the disjunction of R
and Q, which represents the set of states Set^ψ_Y(R) ∪ Set^ψ_Y(Q), and assigning it to R.
In general, this procedure may not terminate. When the MDGs generated in line
7 have a regular structure, we can schematize them by labeling some of their edges
by a special term that represents this family of infinite objects. By using this finite
representation, we can manipulate them by appropriate algebraic operations, such as
unification. The non-termination problem can thus be avoided. In the next sections, we
define ρ-terms and we show how to use them to solve the non-termination problem.
3. A solution to the non-termination problem
Schematization is a formalism for finitely describing infinite families of objects.
Different schematizations have been studied in recent years, using term schemes [12],
recurrence terms [6, 15], rules with membership constraints [9], meta-rules [14], and
primal grammars [13]. We chose recurrence terms (ρ-terms) [5] for schematizing the
infinite states generated during reachability analysis because their algebraic operations
are decidable and ρ-terms have been used as an extension to the Prolog language [5],
in which our MDG verification system is also implemented.
3.1. Preliminaries: ρ-terms [5]
Let F be a set of function symbols, V a set of variables and D a set of degree
variables. Let Φ be a special function symbol of arity 3. We use sequences of positive
numbers, also called positions, to refer to specific subterms in a term. The empty
sequence ε is a position in any term t, while a sequence i · u is a position in a term
t = f(t_1, …, t_n) only if 1 ≤ i ≤ n and u is a position in t_i, where · is concatenation. If
u is a position in a term t, then the subterm t|_u of t at position u is t, if u = ε, and
t_i|_v, if t = f(t_1, …, t_n) and u = i · v, for some i with 1 ≤ i ≤ n. We denote by t[s]_u the
result of replacing in t the subterm at position u by s. We define t[s]_u to be the term
s, if u = ε, and the term f(t_1, …, t_{i−1}, t_i[s]_v, t_{i+1}, …, t_n), if t = f(t_1, …, t_n) and u = i · v.
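The two position operations can be sketched as follows. The representation is an assumption made for illustration: a constant or variable is a Python string, an application f(t_1, …, t_n) is the tuple ("f", t_1, …, t_n), and a position is a tuple of 1-based argument indices, the empty tuple being the root.

```python
def subterm_at(term, position):
    """Return the subterm of term at the given position."""
    for i in position:
        # term[0] is the function symbol, term[i] is the i-th argument.
        term = term[i]
    return term


def replace_at(term, position, new):
    """Return a copy of term with the subterm at position replaced by new."""
    if not position:
        return new
    i, rest = position[0], position[1:]
    return term[:i] + (replace_at(term[i], rest, new),) + term[i + 1:]


t = ("f", ("g", "a"), "b")
print(subterm_at(t, (1, 1)))
print(replace_at(t, (1, 1), "c"))
```

Because tuples are immutable, replace_at rebuilds only the path from the root to the replaced position and leaves the original term unchanged.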
Definition 1 (ρ-term). A ρ-term is either a variable in V, or an expression f(t_1, …, t_n),
where f is a function symbol of arity n and t_1, …, t_n are ρ-terms, or Φ(h[◇]_u, N, l),
where h[l]_u is a ground term, u is a non-root position of h, N is a degree variable,
and ◇ is a special symbol serving as a place holder. Φ(h[◇]_u, N, l) is called a generator.
A ρ-term is ground if it does not contain any variable, and is called a proper ρ-term
if its degree variables are uninstantiated.
Definition 2. We inductively define the function Dvar, which computes the set of degree
variables of a ρ-term, as follows:

\[
\begin{aligned}
Dvar(x) &= \emptyset && \text{if } x \in V \\
Dvar(f(t_1, \ldots, t_{ar(f)})) &= Dvar(t_1) \cup \cdots \cup Dvar(t_{ar(f)}) && \text{if } f \in F \text{ and } t_1, \ldots, t_{ar(f)} \text{ are } \rho\text{-terms} \\
Dvar(\Phi(h[\diamond]_u, N, l)) &= \{N\}
\end{aligned}
\]

A proper ρ-term represents an infinite set of terms of T(F, V).
We use H[n_1/N_1, …, n_m/N_m] to denote the ρ-term obtained from a proper ρ-term H
by instantiating the degree variables N_1, …, N_m by natural numbers n_1, …, n_m,
respectively, and then evaluating the instantiated generators Φ(h[◇]_u, n, l) to

\[ \underbrace{h[h[\ldots h[}_{n}\, l\, \underbrace{]_u \ldots ]_u ]_u}_{n} , \]

denoted by h^n[l]_{u^n}.
This means that, for a particular natural number n, a ρ-term denotes one term from
T(F, V). Formally, the unfolding is defined as follows:

\[
\begin{aligned}
\Phi(h[\diamond]_u, 0, l) &\overset{\text{def}}{=} l \\
\Phi(h[\diamond]_u, n+1, l) &\overset{\text{def}}{=} h[\Phi(h[\diamond]_u, n, l)]_u
\end{aligned}
\]
Example 3. g(Φ(inc(◇), N, a), Φ(inc(◇), M, b))[0/N, 2/M] = g(a, inc(inc(b))).
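The unfolding rules and Example 3 can be checked with a short sketch. The encoding is an assumption of this sketch, not the paper's implementation: applications are tuples ("f", arg, …), the place holder is the string "<>", and a generator Φ(h[◇]_u, N, l) is the tuple ("PHI", pattern, "N", base) whose pattern contains exactly one place holder.

```python
HOLE = "<>"


def plug(pattern, filler):
    """Replace the place holder in pattern by filler."""
    if pattern == HOLE:
        return filler
    if isinstance(pattern, tuple):
        return (pattern[0],) + tuple(plug(arg, filler) for arg in pattern[1:])
    return pattern


def unfold(pattern, n, base):
    """Unfold a generator n times: n = 0 gives base, n + 1 wraps once more."""
    term = base
    for _ in range(n):
        term = plug(pattern, term)
    return term


def instantiate(term, degrees):
    """Instantiate every generator in term using the mapping degrees."""
    if isinstance(term, tuple) and term[0] == "PHI":
        _, pattern, degree_var, base = term
        return unfold(pattern, degrees[degree_var], base)
    if isinstance(term, tuple):
        return (term[0],) + tuple(instantiate(arg, degrees) for arg in term[1:])
    return term


example = ("g", ("PHI", ("inc", HOLE), "N", "a"), ("PHI", ("inc", HOLE), "M", "b"))
print(instantiate(example, {"N": 0, "M": 2}))
```

The printed term is the tuple encoding of g(a, inc(inc(b))), in agreement with Example 3.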
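Definition 4 (Recurrence domain). A recurrence domain, denoted Ω(H), is a set
obtained by unfolding a ρ-term H for all the possible values of each of its degree
variables:

\[
\begin{aligned}
\Omega(x) &\overset{\text{def}}{=} \{x\} \\
\Omega(\Phi(h[\diamond]_u, N, l)) &\overset{\text{def}}{=} \{\Phi(h[\diamond]_u, n, l) \mid n \in \mathbb{N}\} \\
\Omega(f(t_1, \ldots, t_n)) &\overset{\text{def}}{=} \{f(s_1, \ldots, s_n) \mid s_i \in \Omega(t_i),\ 1 \le i \le n\}
\end{aligned}
\]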
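A recurrence domain is infinite as soon as the ρ-term contains a generator, so a program can only list a finite part of it. The sketch below enumerates the members obtained with every degree smaller than a hypothetical parameter bound. It assumes the same illustrative encoding as before: tuples for applications, "<>" for the place holder, and ("PHI", pattern, degree_variable, base) for a generator.

```python
from itertools import product

HOLE = "<>"


def plug(pattern, filler):
    """Replace the place holder in pattern by filler."""
    if pattern == HOLE:
        return filler
    if isinstance(pattern, tuple):
        return (pattern[0],) + tuple(plug(arg, filler) for arg in pattern[1:])
    return pattern


def domain_prefix(term, bound):
    """List the members of the recurrence domain with all degrees < bound."""
    if isinstance(term, tuple) and term[0] == "PHI":
        _, pattern, _, base = term
        members, current = [], base
        for _ in range(bound):
            members.append(current)
            current = plug(pattern, current)
        return members
    if isinstance(term, tuple):
        # Third clause of the definition: combine the domains of the arguments.
        choices = [domain_prefix(arg, bound) for arg in term[1:]]
        return [(term[0],) + combo for combo in product(*choices)]
    return [term]


h = ("g", ("PHI", ("inc", HOLE), "N", "a"), "x")
print(domain_prefix(h, 2))
```

As in the definition, each argument is unfolded independently of the others.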
Definition 5 (Equivalence of ρ-terms). Two ρ-terms L and R which share the same
degree variables N_1, …, N_m are equivalent, denoted as L ≡ R, if and only if
Ω(L[n_1/N_1, …, n_m/N_m]) = Ω(R[n_1/N_1, …, n_m/N_m]) for all n_1, …, n_m ∈ ℕ. If L and R do not share
any degree variables, then L ≡ R if and only if Ω(L) = Ω(R).
Definition 6 (Inclusion of ρ-terms). A ρ-term L is included in a ρ-term R, denoted as
L ⊆ R, where L is a ground term and the set of degree variables of R is disjoint from
those of L, if and only if Ω(L) ⊆ Ω(R). If R is also ground, then L ⊆ R if and only if
for all s in Ω(L), there exists a term t in Ω(R) such that s is identical to t.
We call an unfolding substitution a finite set us = {q_1 ∗ Q_1 + k_1/N_1, …, q_m ∗ Q_m + k_m/N_m}
such that N_i ≠ N_j for all i ≠ j and N_i ≠ Q_j for all i and j, where the q_i and k_i are
integers, N_i and Q_i are degree variables, 1 ≤ i ≤ m, ∗ is arithmetic multiplication and
+ is arithmetic addition. An empty unfolding substitution is denoted by id.
Applying q ∗ Q + k/N to a ρ-term H is defined as:

\[
\begin{aligned}
x[q * Q + k/N] &= x \\
f(s_1, \ldots, s_m)[q * Q + k/N] &= f(s_1[q * Q + k/N], \ldots, s_m[q * Q + k/N]) \\
\Phi(h[\diamond]_u, M, l)[q * Q + k/N] &= \Phi(h[\diamond]_u, M, l) \quad \text{if } M \neq N \\
\Phi(h[\diamond]_u, N, l)[q * Q + k/N] &= h^k[\Phi(h^q[\diamond]_{u^q}, Q, l)]_{u^k}
\end{aligned}
\]
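The four rules for applying q ∗ Q + k/N can be sketched as follows. The encoding is again an assumption made for illustration (tuples for applications, "<>" for the place holder, ("PHI", pattern, degree_variable, base) for a generator), and the function and parameter names are hypothetical. The sketch assumes q ≥ 1 and k ≥ 0.

```python
HOLE = "<>"


def plug(pattern, filler):
    """Replace the place holder in pattern by filler."""
    if pattern == HOLE:
        return filler
    if isinstance(pattern, tuple):
        return (pattern[0],) + tuple(plug(arg, filler) for arg in pattern[1:])
    return pattern


def power(pattern, times):
    """Compose pattern with itself: the result still contains one place holder."""
    result = HOLE
    for _ in range(times):
        result = plug(pattern, result)
    return result


def apply_unfolding(term, q, new_var, k, old_var):
    """Apply the unfolding substitution q * new_var + k / old_var to term."""
    if isinstance(term, tuple) and term[0] == "PHI":
        _, pattern, degree_var, base = term
        if degree_var != old_var:
            return term                               # third rule
        inner = ("PHI", power(pattern, q), new_var, base)
        return plug(power(pattern, k), inner)         # fourth rule
    if isinstance(term, tuple):                       # second rule
        return (term[0],) + tuple(
            apply_unfolding(arg, q, new_var, k, old_var) for arg in term[1:]
        )
    return term                                       # first rule


h = ("PHI", ("inc", HOLE), "N", "zero")
print(apply_unfolding(h, 1, "M", 1, "N"))
```

With q = 1 and k = 1 the generator is peeled once, which gives the encoding of inc(Φ(inc(◇), M, zero)).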
3.2. Application of ρ-terms in MDGs
The extension of the syntax of directed formulae to incorporate ρ-terms is
straightforward. We allow the term A, defined in the syntax of Section 2, to be a proper
ρ-term. A ρ-DF is a directed formula where some of its terms are ρ-terms. For a ρ-DF
P with m degree variables N_1, …, N_m, we extend the function Ω by morphism as:

\[ \Omega(P) = \bigvee_{n_1, n_2, \ldots, n_m \in \mathbb{N}} P[n_1/N_1, \ldots, n_m/N_m] \]

where P[n_1/N_1, …, n_m/N_m] denotes the ρ-DF obtained by instantiating the degree
variables N_1, …, N_m by natural numbers n_1, n_2, …, n_m.
Thus, Ω(P) is an infinite disjunction obtained by unfolding each ρ-term in P for all
possible values of each of its degree variables.
If t is a ρ-term Φ(h[◇]_u, N, l), then the denotation ψ(Φ(h[◇]_u, N, l)) under an
interpretation ψ is defined as Φ(ψ(h)[◇]_u, N, ψ(l)), where ψ(h) and ψ(l) are the interpretations
of terms h and l, respectively.
Let s denote a finite simultaneous substitution [n_1/N_1, …, n_m/N_m]. We write ψ, φ, s ⊨ P
if a formula P[s] denotes truth under an interpretation ψ and a ψ-compatible
variable assignment φ to the free variables of P[s]; ψ, φ ⊨ P if ψ, φ, s ⊨ P for all
substitutions s; ψ ⊨ P if ψ, φ ⊨ P for all ψ-compatible assignments φ; and ⊨ P if
ψ ⊨ P for all interpretations ψ. Two formulae P and Q are logically equivalent if and
only if ⊨ P ⇔ Q.
The ρ-directed formulae are used to represent either sets or relations. For a ρ-DF
P of type U → V, where U contains only abstract variables, and for an interpretation
ψ, P represents the following set of vectors:

\[ Set^{\psi}(P) = \{ \varphi \in \Phi^{\psi}_{V} \mid \psi, \varphi \models \exists \vec{N} \cdot (\exists U) P \} \]

where $\vec{N}$ represents the vector of degree variables that appear in the ρ-terms
contained in P.
Lemma 7. Φ(inc(◇), N, zero) is a concretely reduced term.
The proof is by induction on the degree variable N.
• Base case: N = 0. By the unfolding rule, Φ(inc(◇), 0, zero) = zero, and zero is a concretely reduced
term by definition.
• Inductive case: By the unfolding rule, the term Φ(inc(◇), N + 1, zero) is equal to
inc(Φ(inc(◇), N, zero)). Since Φ(inc(◇), N, zero) is concretely reduced by the inductive
hypothesis, inc(Φ(inc(◇), N, zero)) is concretely reduced, and thus the term
Φ(inc(◇), N + 1, zero) is a concretely reduced term.
Theorem 8. The extension of MDGs by ρ-terms is conservative.
Since a ρ-term is concretely reduced, the definition of well-formedness of MDGs
is still valid. This fact ensures that all the results proven for MDGs [10] remain true for
ρ-MDGs.
3.3. Abstract state enumeration with ρ-terms
Having incorporated ρ-terms in the syntax of DFs, we need some extensions of the
existing algorithms for MDGs. In this section, we present an extension of the reachability
analysis algorithm that includes the appropriate handling of ρ-terms.
1. Proc Generalize(Q, K, v, t)
2.   if t is a generator then t := t[Dvar(t)#K / Dvar(t)];
3.   for each equation in Q of the form v = rhs where rhs is not a ρ-term
4.     replace rhs by t;
5.   end for
6. end Generalize
Given a ρ-term t and an abstract variable v, this procedure generalizes the variable v to
this ρ-term with its degree variable replaced by a fresh one, obtained by concatenating
the degree variable of t with the value of the counter K. This counter counts the number
of passes through the reachability analysis loop (i.e., the number of transitions by
which the state machine has advanced from the initial state). The second condition in line
3 is necessary because we need to generalize again during reachability analysis (see
Example 11). The procedure is called by a modified version of ReAn, called ReAn′, described
below:
1. Proc ReAn′(D, v, t)
2.   K := 0; FI := Generalize(FI, K, v, t); R := FI; Q := FI;
3.   loop
4.     K := K + 1;
5.     Q := Generalize(Q, K, v, t);
6.     I := Fresh(X, K);
7.     N := RelP′({I, Q, FT}, X ∪ Y, η);
8.     N := Unfold(N);
9.     Q := PbyS′(N, R);
10.    if Q = ⊥ then return success;
11.    R := PbyS′(R, Q);
12.    R := Disj′(R, Q);
13.  end loop;
14. end ReAn′;
where D is a description of an abstract machine as described in Section 2, v is a state
variable to be generalized and t is a ρ-term (footnote 3).
We use modified versions of the relational product, the disjunction algorithm and the
prune-by-subsumption algorithm. These new versions are extended by suitable rules
to handle ρ-terms.
In line 2 we generalize the state variable v to the ρ-term t. This operation is done on
the DF describing the initial state. The new operations (footnote 4) to perform at each iteration
in the loop are:
Line 5: Generalizing the variable v in the frontier set Q. This operation can be just
a renaming of the degree variable of a ρ-term.
Line 7: Computing the states reachable in one transition from the states in Q.
Line 8: Unfolding each ρ-term in N by using the unfolding rules given in
Section 3. For example, the unfolding of the ρ-term eqz(Φ(inc(◇), M, zero)) = 1 gives
eqz(zero) = 1 and eqz(inc(Φ(inc(◇), M′, zero))) = 1.
It is not difficult to generalize ReAn′ when several state variables cause divergence.
The variable to be generalized and the ρ-term are supplied by the user after observation
of the trace of the original ReAn algorithm. Unlike the generalization by variable,
where we lose any partial interpretations of uninterpreted functions, our method allows
the rewriting rules that express such interpretations to be applied during reachability
analysis, by using the unfolding rules. This
permits a useful simplification of the sets of states, thus reducing the possibility of
false negative answers to invariant checking that can result from the simple variable
generalization.
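The Generalize step of this section can be sketched on a toy representation. The sketch rests on several assumptions that are not in the paper: a DF is a list of disjuncts, each disjunct is a dictionary from variable names to terms, a generator is the tuple ("PHI", pattern, degree_variable, base), and the condition "rhs is not a ρ-term" is read as "rhs contains no generator".

```python
def contains_generator(term):
    """True if term is a generator or has a generator among its subterms."""
    if not isinstance(term, tuple):
        return False
    return term[0] == "PHI" or any(contains_generator(arg) for arg in term[1:])


def generalize(formula, k, var, generator):
    """Replace the value of var by generator, renamed with the counter k."""
    tag, pattern, degree_var, base = generator
    # Fresh degree variable: the old name concatenated with the counter.
    fresh = (tag, pattern, f"{degree_var}#{k}", base)
    result = []
    for disjunct in formula:
        updated = dict(disjunct)          # do not modify the input formula
        if var in updated and not contains_generator(updated[var]):
            updated[var] = fresh
        result.append(updated)
    return result


H = ("PHI", ("inc", "<>"), "N", "zero")
initial = [{"count": "zero"}]
print(generalize(initial, 0, "count", H))
```

Disjuncts whose value for the variable already contains a generator are left untouched, following the condition in line 3 of the pseudo-code.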
3 For the moment, this term is supplied by the user; we plan in the near future to infer this
term automatically from the structural pattern of divergence.
4 The other operations remain unchanged.
4. Examples
In this section, we illustrate our method on three different examples. Each example
is chosen to demonstrate one particular aspect of the method. The first one shows
that in some cases the generalization of the initial state is sufficient. Also, we give
an outline of the unification algorithm of ρ-terms [5]. The second one illustrates the
use of rewriting rules to obtain a regular structure. For this example, it also suffices
to generalize the initial state using a ρ-term. The third and final example is more
complicated and requires more than one generalization during reachability analysis.
Example 9. Consider a synchronous machine which includes a data register count,
a multiplexer mux, and a functional block represented by the uninterpreted function
symbol inc which takes count as its input and produces an abstract value inc(count).
The transition relation Tr of this machine is as follows:
Tr ≡ ((y = 0) ∧ count′ = count) ∨
((y = 1) ∧ count′ = inc(count))
where count′ is the next state variable of the register.
Suppose that count initially contains a generic constant zero. If we explore the state
space using ReAn, the procedure never terminates and generates an unbounded
sequence of values for count: zero, inc(zero), inc(inc(zero)), …, inc^k(zero), …. The variable
to be generalized is count and the ρ-term is H = Φ(inc(◇), N, zero). Hence, we use
ReAn′ to do reachability analysis as follows:
ReAn′(({y}, {count}, ∅, count = zero, Tr, ∅), count, H).
In line 2, the call Generalize(count = zero, 0, count, H) returns a ρ-DF representing the
initial state of the machine

\[ P_0 = [count = \Phi(inc(\diamond), N_0, zero)] \]

The next states computed in line 7 are described by the formula

\[ P_1 = (count = \Phi(inc(\diamond), N_1, zero)) \lor (count = inc(\Phi(inc(\diamond), N_1, zero))) \tag{1} \]

The ρ-DF in Eq. (1) represents the set

\[ Set^{\psi}(P_1) = \{ \varphi \in \Phi^{\psi}_{\{count\}} \mid \psi, \varphi \models \exists N_1 .\, P_1 \} \]

and the set represented by the initial ρ-DF P0 is

\[ Set^{\psi}(P_0) = \{ \varphi \in \Phi^{\psi}_{\{count\}} \mid \psi, \varphi \models \exists N_0 .\, P_0 \} \]

The problem now is to show that

\[ Set^{\psi}(P_1) \subseteq Set^{\psi}(P_0) \]
This inclusion is checked by PbyS′(P1; P0). Informally, this operation can be viewed as
PbyS(;(P1); ;(P0)). There are two edges issuing from the node count in P1, labeled by
(inc(); N1; zero) and inc((inc(); N1; zero)), respectively. There is one edge issuing
from the same node count in P0, labeled by (inc(); N0; zero). It remains to show
that the -terms in P1 are included in the -term in P0.
For the first subproblem, ∀N1.∃N0. Φ(inc(◇), N1, zero) ⊆ Φ(inc(◇), N0, zero), we replace
N0 by N1. Hence, we get Φ(inc(◇), N1, zero) ⊆ Φ(inc(◇), N1, zero), which holds trivially.
For the second, ∀N1.∃N0. inc(Φ(inc(◇), N1, zero)) ⊆ Φ(inc(◇), N0, zero), we replace the
variable N0 by N1 + 1. By applying the unfolding rules, Φ(inc(◇), N1 + 1, zero) becomes
inc(Φ(inc(◇), N1, zero)), and we get:
inc(Φ(inc(◇), N1, zero)) ⊆ inc(Φ(inc(◇), N1, zero))
Therefore, after one pass through the loop, the newly reached states are covered by
the initial states and the procedure terminates.
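The two witnesses used in this argument (N0 = N1 for the first disjunct, N0 = N1 + 1 for the second) can be sanity-checked on concrete degrees. The sketch below is an illustration only: it assumes the same nested-tuple encoding of terms, `unfold` is a hypothetical helper that expands the ρ-term for a fixed degree, and the loop tests finitely many values of N1, so it is a bounded check rather than a proof of inclusion.

```python
ZERO = "zero"

def inc(t):
    return ("inc", t)

def unfold(n):
    """Expand the rho-term Phi(inc(<>), n, zero) for a concrete degree n >= 0.

    Unfolding rules assumed here:
      Phi(h, 0, l)     -> l
      Phi(h, n + 1, l) -> h[Phi(h, n, l)]
    """
    t = ZERO
    for _ in range(n):
        t = inc(t)
    return t

def covered(bound):
    """For every N1 < bound, check both disjuncts of P1 against P0."""
    for n1 in range(bound):
        first = unfold(n1)            # count = Phi(inc(<>), N1, zero)
        second = inc(unfold(n1))      # count = inc(Phi(inc(<>), N1, zero))
        if first != unfold(n1):       # witness N0 = N1
            return False
        if second != unfold(n1 + 1):  # witness N0 = N1 + 1
            return False
    return True
```

The second comparison is the interesting one: it succeeds because one unfolding step of degree N1 + 1 produces exactly inc applied to the term of degree N1.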
Example 10. Consider a more complex synchronous circuit which consists of a data
register count, two multiplexers mux1 and mux2, and three functional blocks repre-
sented by uninterpreted function symbols inc; dec, and eqz. The functions inc and dec
take as their input count and produce an abstract output inc(count) and dec(count),
respectively. The cross-term eqz takes as its abstract input count and produces a
concrete output of sort bool. The transition relation R of this machine is as
follows:
R≡ [((y = 0) ∧ count′ = inc(count))∨
((y = 1) ∧ eqz(count) = 0 ∧ count′ = dec(count))∨
((y = 1) ∧ eqz(count) = 1 ∧ count′ = count)]
where count′ represents the next state variable of count.
We define the following rewriting rules to give a partial meaning to the function
symbols:
(1) eqz(zero) → 1
(2) eqz(inc(x)) → 0
(3) dec(inc(x)) → x
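To make the effect of rules (1)-(3) concrete, here is a small Python sketch of a bottom-up normalizer. The tuple encoding of terms and the function name `rewrite` are assumptions introduced for this illustration; terms to which no rule applies, such as dec(zero), are left uninterpreted, which reflects the partial meaning given to the symbols.

```python
def rewrite(t):
    """Normalize a term bottom-up with rules (1)-(3).

    Hypothetical encoding: "zero", 0 and 1 are constants;
    ("inc", t), ("dec", t) and ("eqz", t) are unary applications.
    """
    if not isinstance(t, tuple):
        return t                              # constant: nothing to do
    op, arg = t[0], rewrite(t[1])             # normalize the argument first
    if op == "eqz":
        if arg == "zero":
            return 1                          # rule (1): eqz(zero) -> 1
        if isinstance(arg, tuple) and arg[0] == "inc":
            return 0                          # rule (2): eqz(inc(x)) -> 0
    if op == "dec" and isinstance(arg, tuple) and arg[0] == "inc":
        return arg[1]                         # rule (3): dec(inc(x)) -> x
    return (op, arg)                          # no rule applies
```

For example, `rewrite(("dec", ("inc", ("inc", "zero"))))` yields the encoding of inc(zero), which is how dec disappears from the reachable values of count.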
Suppose that register count initially contains a generic constant zero. Reachability
analysis of this machine produces an infinite number of states for the register count,
containing the values zero, inc(zero), inc(inc(zero)), .... This regular structure is
obtained because the dec operator is removed by rewriting with rule (3). This divergence
suggests generalizing the register count to the ρ-term H = Φ(inc(◇), N, zero). The
initial state is thus described by the ρ-DF s0 = [count = H].
After one transition, the reached states are:
s1 = [eqz(H[N1/N]) = 1 ∧ count = H[N1/N]],
s2 = [eqz(H[N1/N]) = 0 ∧ count = dec(H[N1/N])], and
s3 = [count = inc(H[N1/N])].
It is clear that s3 is covered by s0, since inc(H[N1/N]) ⊆ H (see the first example).
The state s1 is also covered by the initial state, because:
• If N1 = 0, then s1 can be rewritten to:
s1 = [eqz(Φ(inc(◇), 0, zero)) = 1 ∧ count = Φ(inc(◇), 0, zero)]
        (by applying the unfolding substitution)
   = [eqz(zero) = 1 ∧ count = zero]
        (by applying the unfolding rules)
   = [count = zero]
        (simplification by rewriting with rule (1))
   = s′1
We can see immediately that s′1 is an instance of the initial state s0, by replacing the
degree variable N of H by 0.
• If N1 = N′ + 1, then s1 can be rewritten to:
s1 = [eqz(Φ(inc(◇), N′ + 1, zero)) = 1 ∧ count = Φ(inc(◇), N′ + 1, zero)]
        (by applying the unfolding substitution)
   = [eqz(inc(Φ(inc(◇), N′, zero))) = 1 ∧ count = inc(Φ(inc(◇), N′, zero))]
        (by applying the unfolding rules)
   = [0 = 1 ∧ count = inc(Φ(inc(◇), N′, zero))]
        (simplification by rewriting with rule (2))
   = ⊥ (that is, in this case s1 is unreachable)
It remains to show that the set of states described by s2 is covered by s0.
Fig. 1. A state machine.
We have:
• If N1 = 0, then s2 can be rewritten to:
s2 = [eqz(Φ(inc(◇), 0, zero)) = 0 ∧ count = dec(Φ(inc(◇), 0, zero))]
        (by applying the unfolding substitution)
   = [eqz(zero) = 0 ∧ count = dec(zero)]
        (by applying the unfolding rules)
   = [1 = 0 ∧ count = dec(zero)]
        (simplification by rewriting with rule (1))
   = ⊥ (that is, in this case s2 is unreachable)
• If N1 = N′ + 1, then s2 can be rewritten to:
s2 = [eqz(Φ(inc(◇), N′ + 1, zero)) = 0 ∧ count = dec(Φ(inc(◇), N′ + 1, zero))]
        (by applying the unfolding substitution)
   = [eqz(inc(Φ(inc(◇), N′, zero))) = 0 ∧ count = dec(inc(Φ(inc(◇), N′, zero)))]
        (by applying the unfolding rules)
   = [count = Φ(inc(◇), N′, zero)]
        (simplification by rewriting with rules (2) and (3))
   = s′2
s′2 is an instance of s0, by letting N equal N′.
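The case analysis of this example can be replayed for concrete degrees. The following Python sketch is illustrative and rests on assumptions: terms are nested tuples, `unfold` and `degree` are hypothetical helpers, and the outcome of eqz and dec is computed directly from rules (1)-(3). For a given N1 it reports, for each reachable successor s1, s2 or s3, the degree N of the instance of H that covers it.

```python
def unfold(n):
    """Expand the rho-term H for a concrete degree n (assumed encoding)."""
    t = "zero"
    for _ in range(n):
        t = ("inc", t)
    return t

def degree(t):
    """Return m if t encodes inc^m(zero), otherwise None."""
    m = 0
    while isinstance(t, tuple) and t[0] == "inc":
        t, m = t[1], m + 1
    return m if t == "zero" else None

def successors(n1):
    """Covering degrees of the states reached from count = H[n1/N]."""
    count = unfold(n1)
    is_zero = (count == "zero")               # rules (1) and (2) decide eqz
    result = {"s3": degree(("inc", count))}   # y = 0: count' = inc(count)
    if is_zero:
        result["s1"] = degree(count)          # y = 1, eqz = 1: count' = count
    else:
        result["s2"] = degree(count[1])       # y = 1, eqz = 0: rule (3) strips one inc
    return result
```

For N1 = 0 only s1 and s3 appear, and for N1 = N′ + 1 only s2 and s3 appear, matching the two ⊥ cases above; every reported degree is a natural number, so each successor is an instance of s0.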
Example 11. Our third example concerns the synchronous machine shown in Fig. 1.
It consists of three states R1, R2 and R3. We use one state variable R of concrete sort
{R1; R2; R3} to describe the behavior of this machine.
The transition relation is as follows:
R ≡
(ie = 0 ∧ R = R1 ∧ R′ = R1 ∧ c′ = c) ∨
(eqz(c) = 0 ∧ R = R1 ∧ R′ = R1 ∧ c′ = c) ∨
(ie = 1 ∧ eqz(c) = 1 ∧ R = R1 ∧ R′ = R2 ∧ c′ = c) ∨
(iy = 0 ∧ ie = 0 ∧ R = R2 ∧ R′ = R2 ∧ c′ = c) ∨
(iy = 0 ∧ ie = 1 ∧ R = R2 ∧ R′ = R3 ∧ c′ = inc(c)) ∨
(iy = 1 ∧ R = R2 ∧ R′ = R1 ∧ c′ = c) ∨
(ie = 1 ∧ R = R3 ∧ R′ = R3 ∧ c′ = c) ∨
(ie = 0 ∧ R = R3 ∧ R′ = R2 ∧ c′ = c)
Define the following rewriting rules:
eqz(zero) → 1
eqz(inc(x)) → 0
The initial state of this machine is s0 = [R = R1 ∧ c = zero].
The states reached after one transition are s1 = [R = R1 ∧ c = zero] and
s2 = [R = R2 ∧ c = zero].
State s1 is covered by s0, hence we continue the analysis with s2. After the next
transition, there are three possible states:
s3 = [R = R3 ∧ c = inc(zero)],
s4 = [R = R2 ∧ c = zero], and
s5 = [R = R1 ∧ c = zero].
s4 is covered by s2 and s5 is covered by s0. If we continued the exploration, we would
find that the procedure never terminates. The divergence occurs because the register c
can take an unbounded set of values:
zero, inc(zero), inc(inc(zero)), ...
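This growth can be reproduced with a concrete simulation of the transition relation. The sketch below is an illustration under stated assumptions: the value of c is encoded by the number of inc applications to zero, eqz is evaluated with the two rewriting rules, and `explore` is a hypothetical bounded frontier loop rather than the MDG procedure.

```python
def successors(state):
    """Concrete successors of (R, c); c counts the inc applied to zero."""
    r, c = state
    eqz = 1 if c == 0 else 0                  # eqz(zero) -> 1, eqz(inc(x)) -> 0
    out = set()
    for ie in (0, 1):
        for iy in (0, 1):
            if r == "R1":
                if ie == 0 or eqz == 0:
                    out.add(("R1", c))
                if ie == 1 and eqz == 1:
                    out.add(("R2", c))
            elif r == "R2":
                if iy == 1:
                    out.add(("R1", c))
                elif ie == 0:
                    out.add(("R2", c))
                else:
                    out.add(("R3", c + 1))    # c' = inc(c)
            else:                             # r == "R3"
                out.add(("R3", c) if ie == 1 else ("R2", c))
    return out

def explore(max_iters):
    """Bounded breadth-first enumeration from s0 = (R1, zero)."""
    reached = {("R1", 0)}
    frontier = set(reached)
    for _ in range(max_iters):
        new = set().union(*(successors(s) for s in frontier)) - reached
        if not new:
            return reached, True              # fixpoint reached
        reached |= new
        frontier = new
    return reached, False                     # bound hit, still growing
```

In this encoding the loop R2 → R3 → R2 adds one inc every two iterations, so the largest value of c keeps increasing and no fixpoint is found within any bound.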
This divergence suggests generalizing the register c to the ρ-term H = Φ(inc(◇), N, zero).
The initial state becomes s0 = [R = R1 ∧ c = H].
After one transition, we can reach three possible states:
s1 = [R = R1 ∧ c = H[N0/N]],
s2 = [R = R1 ∧ eqz(H[N0/N]) = 0 ∧ c = H[N0/N]], and
s3 = [R = R2 ∧ eqz(H[N0/N]) = 1 ∧ c = H[N0/N]].
• If N0 = 0, then
s2 = [R = R1 ∧ eqz(zero) = 0 ∧ c = zero]
   = ⊥ (this state is unreachable)
s3 = [R = R2 ∧ eqz(zero) = 1 ∧ c = zero]
   = [R = R2 ∧ c = zero]
   = s′3
• If N0 = N′ + 1, then
s2 = [R = R1 ∧ eqz(inc(H[N′/N])) = 0 ∧ c = inc(H[N′/N])]
   = [R = R1 ∧ c = inc(H[N′/N])]
s3 = [R = R2 ∧ eqz(inc(H[N′/N])) = 1 ∧ c = inc(H[N′/N])]
   = ⊥ (this state is unreachable)
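State s1 is an instance of s0, and the reachable case of s2 is covered by s0 because
inc(H[N′/N]) ⊆ H. The register c in s′3 must be generalized again, and we then continue
the analysis from s′′3 = [R = R2 ∧ c = H] (H replacing zero).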
After one transition we reach
s4 = [R = R3 ∧ c = inc(H[N0/N])],
s5 = [R = R2 ∧ c = H[N0/N]], and
s6 = [R = R1 ∧ c = H[N0/N]].
s5 is covered by s′′3 and s6 is covered by s0, hence we continue the analysis from s4.
After one transition, we reach
s7 = [R = R3 ∧ c = inc(H[N0/N][N1/N0])] and
s8 = [R = R2 ∧ c = inc(H[N0/N][N1/N0])].
s7 is covered by s4 and s8 is covered by s′′3, since inc(H[N0/N][N1/N0]) ⊆ H[N0/N].
After three transitions the procedure terminates.
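The terminating exploration can be imitated with a simple stand-in for the ρ-term. This is a sketch under explicit assumptions, not the ReAn′ algorithm: the abstract value of c is a lower bound k on the degree, so that k = 0 plays the role of H and k = 1 the role of inc(H); coverage is tested by comparing lower bounds instead of calling PbyS′; and the value zero reached in R2 is generalized to H immediately, as in the text.

```python
def successors(state):
    """Abstract successors of (R, k): c ranges over inc^n(zero), n >= k."""
    r, k = state
    if r == "R1":
        out = [("R1", k),                 # ie = 0: c unchanged
               ("R1", max(k, 1))]         # eqz(c) = 0: c has at least one inc
        if k == 0:
            out.append(("R2", 0))         # eqz(c) = 1: c = zero, generalized to H
        return out
    if r == "R2":
        return [("R2", k),                # iy = 0, ie = 0
                ("R3", k + 1),            # iy = 0, ie = 1: c' = inc(c)
                ("R1", k)]                # iy = 1
    return [("R3", k), ("R2", k)]         # r == "R3"

def covered(state, reached):
    """(R, k) is covered if some reached (R, k2) has k2 <= k."""
    r, k = state
    return any(r == r2 and k >= k2 for r2, k2 in reached)

def reach(init):
    """Worklist exploration that keeps only states not already covered."""
    reached, work = [init], [init]
    while work:
        for nxt in successors(work.pop()):
            if not covered(nxt, reached):
                reached.append(nxt)
                work.append(nxt)
    return reached
```

Starting from `("R1", 0)`, the exploration keeps three abstract states, corresponding to s0, s′′3 and s4, and then stops because every further successor is covered.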
5. Conclusions
The non-termination problem of reachability analysis is a severe limitation of the
methods based on abstract state machines. We have presented a new approach, based
on schematization using ρ-terms, to finitely represent the infinite sets of states
generated during reachability analysis. Schematization provides a suitable formalism for
dealing explicitly, by finite means, with infinite families of objects: it describes
infinite sets of terms by finite expressions and permits effective manipulation of the
schematized sets. It is constructed so that the unification problem and the inclusion
problem for the schematized sets are decidable. This means that we can use these
operations in reachability analysis and invariant checking. We have also proposed an
extension to the syntax of MDGs and to the reachability analysis algorithm to incorporate
ρ-terms. Furthermore, schematization enlarges the set of states just enough to deal with
non-termination, since the generalization is performed by a ρ-term, which carries more
restricted information than the fresh free variable used in earlier approaches.
Thus, the false negatives introduced by generalization with variables are considerably
reduced, because the ρ-term represents all the possible states of the machine and
contains only symbols defined in the machine.
Future work is directed to deriving sufficient conditions for detecting divergence of
reachability analysis. These conditions can be described as structural patterns of the
transition relation of a given abstract state machine. It would be interesting to have a
method that automatically infers a ρ-term from the structural pattern of divergence.
Acknowledgements
The work was partially supported by NSERC Canada-Nortel Cooperative Research
Grant CRD 191958.
References
[1] S. Bose, A.L. Fisher, Automatic veri3cation of synchronous circuits using symbolic logic simulation and
temporal logic, in: Proc. IMEC–IFIP Workshop on Applied Formal Methods for Correct VLSI Design,
1989.
[2] R.E. Bryant, Graph-based algorithms for boolean function manipulation, IEEE Trans. Comput. 35 (8)
(1986) 677–691.
[3] R.E. Bryant, D.L. Beatty, C.-J.H. Seger, Formal hardware veri3cation by symbolic ternary trajectory
evaluation, In 28th ACM=IEEE Design Automation Conf., 1991.
[4] J.R. Burch, E.M. Clarke, D.E. Long, K.L. McMillan, D.L. Dill, Symbolic model checking for sequential
circuit veri3cation, IEEE Trans. Comput. Aided Design 13 (4) (1994) 401–424.
[5] H. Chen, J. Hsiang, Recurrence domains: their uni3cation and application to logic programming, Inform.
and Comput. 122 (1995) 45–69.
[6] H. Chen, J. Hsiang, H.-C. Kong, On 3nite representations of in3nite sequences of terms, in: S. Kaplan,
M. Okada (Eds.), Proc. 2nd Internat. Workshop on Conditional and Typed Rewriting Systems, Montreal
(Canada), Lecture Notes in Computer Science, Vol. 516, Springer, Berlin, June 1990, pp. 100–114.
[7] H. Cho, G. Hachtel, S.-W. Jeong, B. Plessier, E. Schwarz, F. Somenzi, ATPG aspects of FSM
veri3cation, in: Internat. Conf. on Computer-Aided Design, 1990.
[8] L. Claesen, F. Proesmans, E. Verlind, H. De Man, SFG-tracing: a methodology for the automatic
veri3cation of MOS transistor level implementations from high level behavioral speci3cations, in: P.A.
Subrahmanyam (Ed.), Internat. Workshop on Formal Methods in VLSI Design, Miami, FL, January
1991.
[9] H. Comon, Completion of rewrite systems with membership constraints, in: W. Kuich (Ed.), Proc. 19th
ICALP Conf., Wien (Austria), Lecture Notes in Computer Science, Vol. 623, Springer, Berlin, July
1992, pp. 392–403; exists also as Research report 699, LRI, Orsay.
[10] F. Corella, Z. Zhou, X. Song, M. Langevin, E. Cerny, Multiway decision graphs for automated hardware
veri3cation, Formal Methods System Design 10 (1) (1997) 7–46.
[11] O. Coudert, J.C. Madre, A uni3ed framework for the formal veri3cation of sequential circuits, in:
Internat. Conf. on Computer-Aided Design, 1990.
O.A. Mohamed et al. / Theoretical Computer Science 300 (2003) 161–179 179
[12] B. Gramlich, Uni3cation of term schemes — theory and applications, SEKI Report SR-88-18, Universit)at
Kaiserslautern, Germany, 1988.
[13] M. Hermann, On the relation between primitive recursion, schematization, and divergence, in: H.
Kirchner, G. Levi (Eds.), Proc. 3rd Conf. on Algebraic and Logic Programming, Volterra (Italy),
Lecture Notes in Computer Science, Vol. 632, Springer, Berlin, September 1992, pp. 115–127.
[14] H. Kirchner, Schematization of in3nite sets of rewrite rules generated by divergent completion process,
Theoret. Comput. Sci. 67 (2–3) (1989) 303–332.
[15] G. Salzer, The uni3cation of in3nite sets of terms and its applications, in: A. Voronkov (Ed.), Proc. 3rd
Internat. Conf. on Logic Programming and Automated Reasoning, St. Petersburg (Russia), Lecture Notes
in Computer Science (in Arti3cial Intelligence), Vol. 624, Springer, Berlin, July 1992, pp. 409–420.
[16] H.J. Touati, H. Savoj, B. Lin, R.K. Brayton, A. Sangiovanni-Vincentelli, Implicit state enumeration of
3nite state machines using BDDs, in: Internat. Conf. on Computer-Aided Design, 1990.
[17] Z. Zhou, X. Song, S. Tahar, F. Corella, E. Cerny, M. Langevin, Formal veri3cation of the island tunnel
controller using multiway decision graphs, in: Proc. Internat. Conf. on Formal Methods in Computer
Aided Design (FMCAD’96), Palo Alto, California, USA, November 1996.
