Controlling state explosion during automatic verification of delay-insensitive and delay-constrained VLSI systems using the POM verifier by Jensen, L. & Probst, D.
3rd NASA Symposium on VLSI Design 1991
N94-18365
Controlling State Explosion During
Automatic Verification of Delay-Insensitive
and Delay-Constrained VLSI Systems
Using the POM Verifier¹
D. Probst and L. Jensen
Department of Computer Science
Concordia University
1455 de Maisonneuve Blvd. West
Montreal, Quebec, Canada H3G 1M8
Abstract- Delay-insensitive VLSI systems have a certain appeal on the ground
due to difficulties with clocks; they are even more attractive in space. We answer
the question, is it possible to control state explosion arising from various
sources during automatic verification (model checking) of delay-insensitive
systems? State explosion due to concurrency is handled by introducing a
partial-order representation for systems, and defining system correctness as a simple
relation between two partial orders on the same set of system events (a graph
problem). State explosion due to nondeterminism (chiefly arbitration) is handled
when the system to be verified has a clean, finite recurrence structure.
Backwards branching is a further optimization. The heart of this approach is
the ability, during model checking, to discover a compact finite presentation
of the verified system without prior composition of system components. The
fully-implemented POM verification system has polynomial space and time
performance on traditional asynchronous-circuit benchmarks that are exponential
in space and time for other verification systems. We also sketch the
generalization of this approach to handle delay-constrained VLSI systems.
Keywords: delay-insensitive system, model checking, state explosion, partial-order rep-
resentation, recurrence structure, state encoding, delay-constrained reactive system.
1 Introduction
Delay-insensitive systems are motivated by difficulties with clock distribution and compo-
nent composition in clocked systems [1,2,5,9]. In a delay-insensitive system, modules may
be interconnected to form systems in such a way that system correctness does not depend
on delays in either modules or interconnection media. Gate-level implementations of mod-
ules whose specifications are delay-insensitive are often themselves quasi-delay-insensitive;
essentially, the assumption of isochronic forks allows one gate to handshake on behalf of
another. Most interesting are delay-constrained reactive systems, in which either outputs
or inputs or both must appear in some temporal window relative to enabling inputs or
outputs. Hardware systems in space make delay insensitivity even more attractive due
to (i) pervasive asynchronous communication, and (ii) extremely-low-power applications.
Delay insensitivity has a natural link to controlling state explosion during automatic
verification; the simple enabling relations in delay-insensitive control systems make it easy to
discover a solution to the state-explosion problem based on causality checking. To build an
automatic verifier based on causality checking, you need two things: (i) an expressive
finite partial-order representation strategy that explicitly distinguishes concurrency, choice
and recurrence, and (ii) a "goal-directed" state-encoding strategy that is both comprehensive
(includes all causality) and minimal (has fewest states), the last for performance
reasons. Given these two things, you can combine the best features of automata-based and
partial-order-based computational verification methods.
¹ This research was supported by the Natural Sciences and Engineering Research Council of Canada under
grants A3363 and MEF0040121. Email: probst@erim.ca.
2 Behavior Automata
The basic automata used to represent processes are called behavior automata, which can
be unrolled to produce event structures (essentially sets of partially-ordered computations
with all branching due to conflict resolution made explicit) [5-8]. Partial orders and con-
current computation are discussed in [3]. Restrictions on behavior automata trade off
between expressiveness and processability (e.g., the efficiency of verification algorithms)
[8]. The most important rules for delay insensitivity are (cf. [10]):
Rule 1 Any two events at the same port in a partially-ordered computation are order-
separated by at least one event at some other port.
Rule 2 There is no immediate order relation between two input events or two output
events. Each ordering chain is an infinite sequence of strictly alternating input and
output events.
We seek abstract, i.e., black-box, specifications [4]. For this purpose, behavior automata
are constructed in three phases. First, there is a deterministic finite-state machine (stick
figure) that expresses both conflict resolution (choice) and recurrence structure. This is a
"small" automaton relative to the full transition system. Second, there is an expansion of
dfsm transitions (sticks) into finite posets, with additional machinery (sockets) to define
possibly nonsequential concatenation of posets. Third, there is an iterative process of
labeling successor arrows in posets, which terminates with an appropriate state encoding.
We sketch the formal definition of behavior automaton. Given disjoint alphabets Act
(process actions), Arr (successor-arrow labels), Com (dfsm transitions) and Soc (sockets),
first define Pos as the set of finite labeled posets over $\mathrm{Act} \cup \mathrm{Soc}$. Each member of Pos is
a labeled poset $(B, \Gamma, \nu)$, where (i) $\Gamma$ is a partial order over $B \subseteq \mathrm{Act} \cup \mathrm{Soc}$, and
(ii) $\nu: \Omega \to \mathrm{Arr}$ assigns a label to each element in the successor relation $\Omega$ (the transitive
reduction of $\Gamma$). A behavior automaton is a 3-tuple $(D, \eta, o)$, where (i) D is a dfsm over Com,
(ii) $\eta: \mathrm{Com} \to \mathrm{Pos}$ maps dfsm transitions to labeled posets, and (iii) $o: \mathrm{Soc} \to \mathrm{powerset}(\mathrm{Act})$
maps sockets to sets of process actions. Map o defines which process actions can "plug in"
to an empty socket when a poset command is concatenated to a sequence of earlier poset
commands as defined by dfsm D.
A C-element has two input ports a and b, and an output port c. Two actions are
possible at a given port depending on whether the signal transition is rising (+) or falling
(-). There is no conflict resolution (choice), and the recurrence structure of D is a simple
loop. Transitions (sticks) concatenate sequentially in this example, shown in Fig. 1. Both
the reset action and action c- can fill the unique socket in this poset. Digit colons identify
dfsm D vertices.
Figure 1: Behavior automaton for a C-element.
In the absence of conflict resolution, each enabled output action must be performed
eventually (indicated by bracketing). The use of both dashed and solid arrows is a visual
reminder that a process specification contains both an interprocess protocol (given by
the dashed arrows) and an intraprocess protocol (given by the solid arrows). Here, the
state encoding (arrow labeling) is essentially fixed; since the state is encoded as the set
of successor arrows crossing from the past to the future, i.e., crossing a consistent cut
produced by a partial execution, using fewer arrow labels would alter the enabling relation
of the C-element.
The semantics are straightforward. For example, action a+ is enabled in any state
containing arrow 1; when it is performed, arrow 1 is removed from the state and arrow 3
is added. Similarly, action c+ is enabled and required (because of the bracket) in any state
containing arrows 3 and 4. When it is performed, arrows 3 and 4 are removed from the
state and arrows 5 and 6 are added. Action c- has preset and postset given by: {7, 8} c-
{1, 2}.
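The enabling rule just described can be sketched in a few lines of Python. This is only an illustration, not the POM implementation: the presets and postsets of a+, c+ and c- are taken from the text, while those of b+, a- and b- and the initial state {1, 2} are assumptions made by symmetry, since Fig. 1 is not reproduced here.

```python
# Arrow-set semantics for the C-element: a state is the set of successor
# arrows crossing the current cut. Entries marked "assumed" are NOT given
# in the text; they are filled in by symmetry for illustration only.
ACTIONS = {
    "a+": ({1}, {3}),
    "b+": ({2}, {4}),        # assumed
    "c+": ({3, 4}, {5, 6}),
    "a-": ({5}, {7}),        # assumed
    "b-": ({6}, {8}),        # assumed
    "c-": ({7, 8}, {1, 2}),
}


def enabled(state):
    """Actions whose whole preset is contained in the state."""
    return sorted(name for name, (pre, _) in ACTIONS.items() if pre <= state)


def perform(state, action):
    """Remove the preset arrows from the state and add the postset arrows."""
    pre, post = ACTIONS[action]
    if not pre <= state:
        raise ValueError(f"{action} is not enabled in {sorted(state)}")
    return frozenset((state - pre) | post)


# One full cycle from the assumed post-reset state {1, 2}.
state = frozenset({1, 2})
for action in ["a+", "b+", "c+", "a-", "b-", "c-"]:
    state = perform(state, action)
```

After one full cycle the state returns to the arrow set it started from, which mirrors the simple-loop recurrence structure of D.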
Behavior automata are more interesting when branching is involved. A delay-insensitive
arbiter has two input ports a and b, and two output ports c and d. It grants exclusive
access to one of two competing clients at a time. The behavior automaton is shown in Fig.
2.
Clients follow a four-cycle protocol. (A) = [c+] -> a- and (B) = [d+] -> b- are the
two critical sections. The labeling shown, if completed, would be conservative (the state
encoding includes all causality, but is not minimal). Having arrows 8, 9 and 10 in state
encodings indicates who made the token available (viz., first client, second client and
reset action). These three arrows are distinct instances of causality that must be checked
separately. Still, there are too many state encodings.
We can group arrows 8, 9 and 10 into an equivalence class t. This does not alter the
Figure 2: Behavior automaton for a delay-insensitive arbiter.
of arrow t requires backing up in the behavior automaton to both possible sources, viz.,
actions a- and b-. In state {1, 5, t}, c+ and d+ are concurrently enabled but conflicting
actions. Verification algorithms that process behavior automata perform both forwards
branching (conflict resolution) and backwards branching (examination of distinct recent
pasts).
After equivalenced arrow t has been defined, we can complete the picture in Fig. 2 to
make it match the formal definition (the labeled arrows leaving posets are derivable from
map o). Consider the second poset command. The top socket is filled only by action a+;
its arrow is labeled 1. The middle socket is filled by any of the actions a-, b- and reset;
its arrow is labeled t. The remaining (interior) poset arrows are given arbitrary distinct
labels.
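The effect of equivalencing arrows on the state encoding can be illustrated with a small Python sketch. It assumes, purely for illustration, that the conservative labeling would produce the three states {1, 5, 8}, {1, 5, 9} and {1, 5, 10}; only the merged state {1, 5, t} and the three possible fillers of the middle socket are stated in the text. The names `EQUIV`, `SOURCES`, `encode` and `backward_branches` are hypothetical.

```python
# Arrows 8, 9 and 10 are merged into the equivalence class "t".
EQUIV = {8: "t", 9: "t", 10: "t"}

# Actions that can be the source of arrow t (the fillers of the middle socket).
SOURCES = {"t": ["a-", "b-", "reset"]}


def encode(state):
    """Replace each arrow by its equivalence class, if it has one."""
    return frozenset(EQUIV.get(arrow, arrow) for arrow in state)


def backward_branches(state):
    """One (arrow, source action) pair per distinct past that must be checked."""
    return [
        (arrow, source)
        for arrow in sorted(state, key=str)
        for source in SOURCES.get(arrow, [])
    ]


# Assumed conservative encodings that differ only in who released the token.
conservative = [{1, 5, 8}, {1, 5, 9}, {1, 5, 10}]
merged = {encode(s) for s in conservative}
branches = backward_branches(frozenset({1, 5, "t"}))
```

Three conservative encodings collapse into one, and the information that was dropped from the state is recovered on demand by branching backwards over the listed sources.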
3 Correctness as a Graph Problem
We define correctness by using the mirror mP of specification P as a conceptual
implementation tester [1]. We form an imaginary closed system S by linking mirror mP of
specification P to the implementation network of processes Net. This produces an infinite
pomtree (event structure) of system events on which two partial orders are defined; system
correctness is then expressible as a simple, easily-checked relation between the partial
orders. The standard model-independent notion of correctness is as follows. Is there a
failure somewhere, causing system S to become undefined? Does the system just stop,
violating fundamental liveness? Is some progress requirement of P violated? Is there
(program-detectable) nondeterminate livelock in S so that an appeal to fairness of system
components is necessary to assert progress? Is some conflict corresponding to output
choice in P resolved unfairly?
Mirror mP is formed by inverting the type of P's actions and the causal/noncausal
interpretation of P's successor arrows, turning P's dashed arrows into solid arrows and
vice versa. Brackets are preserved unchanged. Every action that can be performed in S is
a linked (output action, input action) pair. As a result, we can check whether intraprocess
protocols support interprocess protocols in closed system S.
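The mirror construction can be sketched as a pure function over a small, hypothetical encoding of a specification (a dictionary with action types, arrow kinds and a set of bracketed actions). The encoding and the C-element fragment below are assumptions for illustration; in particular, which C-element arrows are solid and which are dashed is inferred, not read off Fig. 1.

```python
FLIP_TYPE = {"in": "out", "out": "in"}
FLIP_ARROW = {"dashed": "solid", "solid": "dashed"}


def mirror(spec):
    """Invert action types and arrow interpretations; keep brackets as they are."""
    return {
        "types": {action: FLIP_TYPE[t] for action, t in spec["types"].items()},
        "arrows": {arc: FLIP_ARROW[kind] for arc, kind in spec["arrows"].items()},
        "brackets": set(spec["brackets"]),
    }


# Assumed fragment of a C-element specification P.
P = {
    "types": {"a+": "in", "b+": "in", "c+": "out", "a-": "in"},
    "arrows": {
        ("a+", "c+"): "solid",   # intraprocess (causal), assumed
        ("b+", "c+"): "solid",   # intraprocess (causal), assumed
        ("c+", "a-"): "dashed",  # interprocess (noncausal), assumed
    },
    "brackets": {"c+"},
}

mP = mirror(P)
```

Applying `mirror` twice gives back the original specification, which is a quick sanity check on any encoding of this kind.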
We bootstrap the dashed (noncausal, interprocess protocol) and solid (causal, intraprocess
protocol) relations from process actions to system actions, defining an event structure
(sometimes called pomtree) with a noncausal enabling relation on top of the usual causal
enabling one. For example, a noncausal predecessor of system action $\sigma$ is found by locating
the embedded process input action, stepping back along a dashed process arrow, and
returning to the system alphabet. We have thus defined "noncausal preset" of a system
action. Essentially, the safety correctness relation is: whenever a dashed arrow links two
system actions, a chain of solid arrows must also link the two actions.
Let $\sigma$ be a system action that is causally enabled in S. There is a safety violation at $\sigma$
unless
(a) its noncausal preset is also causally enabled in S, and
(b) each member of its noncausal preset is a causal ancestor of $\sigma$.
The causal preset of $\sigma$ is defined only when $\sigma$ is a bracketed system action: it is the set
of nearest performances of linked mP output actions on any causal chain coming into $\sigma$.
In order that a bracketed $\sigma$ in S is neither a safety nor a progress violation, it is necessary
that the causal and noncausal presets of $\sigma$ match exactly. When backwards branching is
present in S, these conditions are generalized to hold along each distinct past (backwards
branch). Backwards branching is necessary to resolve multiple sources of equivalenced
arrows.
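Condition (b) is a reachability question on the solid-arrow graph, and can be sketched as follows. This is a simplified illustration on an explicit finite graph, assuming predecessor maps as input; it ignores condition (a), brackets and backwards branching, and the function names are hypothetical.

```python
def causal_ancestors(event, solid_preds):
    """All events reachable from `event` by stepping back along solid arrows."""
    seen = set()
    stack = list(solid_preds.get(event, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(solid_preds.get(current, ()))
    return seen


def safety_violations(dashed_preds, solid_preds):
    """Events whose noncausal preset contains a non-ancestor, with the culprits."""
    violations = []
    for event, noncausal_preset in dashed_preds.items():
        missing = sorted(set(noncausal_preset) - causal_ancestors(event, solid_preds))
        if missing:
            violations.append((event, missing))
    return sorted(violations)


# Dashed arrow e1 -> e3 is backed by the solid chain e1 -> e2 -> e3.
ok = safety_violations({"e3": ["e1"]}, {"e3": ["e2"], "e2": ["e1"]})

# Same dashed arrow, but the solid chain from e1 is broken.
bad = safety_violations({"e3": ["e1"]}, {"e3": ["e2"]})
```

The first system has no violation; in the second, e1 is reported as a member of the noncausal preset of e3 that is not a causal ancestor.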
4 Model Checking
The algorithm is straightforward. Starting from system reset, we enumerate causally-
enabled system actions and visit one system cut per action. We consider each enabled
action in a state produced by some partially-ordered past that we have generated. First,
we repeatedly step back across single dashed arrows to compute the action's noncausal
preset. Second, we repeatedly (finitely) chain back across multiple solid arrows to compute
the action's partial causal ancestor set (or causal preset if the action is bracketed). When
equivalenced arrows are encountered, we branch backwards to check each possible source.
The speedup is due to two effects:
1. we effectively check cuts in the generated past that we have passed by without vis-
iting, and
2. for equivalenced arrows, we effectively check cuts in pasts that we have not generated.
This kills state explosion due to concurrency and/or nondeterminism. We traverse
each determinate segment (stick) of the implicitly constructed system behavior automaton
(stick figure) precisely once. Backwards branching catches all causality that would have
been visible had we traversed the system stick figure in some other way. Example system
stick figures are shown in Fig. 3.
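To see why visiting one cut per action matters, the following brute-force sketch counts the consistent cuts (downward-closed event sets) of a small poset. It is only an illustration of state explosion due to concurrency, not part of the verifier; `count_cuts` is a hypothetical helper and is itself exponential.

```python
from itertools import combinations


def count_cuts(events, preds):
    """Count subsets of events that contain every predecessor of each member."""
    total = 0
    for size in range(len(events) + 1):
        for subset in combinations(events, size):
            chosen = set(subset)
            if all(set(preds.get(e, ())) <= chosen for e in chosen):
                total += 1
    return total


events = ["w", "x", "y", "z"]

# Four pairwise concurrent events: every subset is a consistent cut.
concurrent_cuts = count_cuts(events, {})

# Four totally ordered events: only the prefixes are consistent cuts.
chain_cuts = count_cuts(events, {"x": ["w"], "y": ["x"], "z": ["y"]})
```

With four concurrent events there are 16 cuts but only 4 actions, so a method that visits one cut per action avoids the exponential factor that an interleaving-based search pays.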
We keep the termination table small by making the mapping from P states to S states
Figure 3: System stick figures for the n-DME verification problem (panels: 2-DME, 3-DME, 4-DME).
visible branching and recurrence structure. Explicit structure in each component allows
the verification algorithm to uncover a structure in system S. In particular, when we cycle
in P, we can arrange to cycle in S. As a result, termination is achieved by checkpointing
very few global states of system S. The top level of the algorithm visits system actions and
tries to complete P sticks; the lower level of the algorithm does arrow checking.
5 Output-Delay-Constrained Reactive Systems
To fix ideas, consider a hardware system that is a space-based component of a missile
defense system; this component receives massive amounts of target-acquisition data asyn-
chronously, and is required to process it in real time and communicate the result. There
are two types of delay constraint that could appear in a requirements specification of such
a component, which is a typical reactive system. First, there could be a temporal interval,
relative to the arrival of a complete problem instance, during which the component must
respond; this is an output delay constraint. Second, there could be a temporal interval, rel-
ative to the departure of the previous result and/or the arrival of other input, during which
the external world can safely stimulate the component; this is an input delay constraint.
The simplest delay-constrained reactive systems are those in which delay constraints are
imposed only on the intraprocess protocol, i.e., on module response; in this case, the
mechanism that ensures input safety is unchanged (the interprocess protocol is still real
or virtual handshaking). The difficult case is an interprocess protocol that specifies when
the module can be overwhelmed by high-bandwidth input; we leave the difficult case for
future work. In our representation, minimum/maximum-delay information is expressed by
putting timing windows directly on output actions. Minimum-delay information may be
freely entered on successor arrows, but maximum-delay semantics is constrained by questions
of physical realizability. We choose the following uniform semantics. If bracketed
output action c is annotated with the temporal interval $(t_{min}, t_{max})$, then action c will
be performed no earlier than $t_{min}$ units and no later than $t_{max}$ units after the holding of
its preset pre(c).
The standard verification algorithm for precedence constraints (described in section
4) can easily be extended to check these new delay constraints. When checking for a
(precedence) safety violation at system action $\sigma$, we determine whether there is a causal
chain to $\sigma$ from each member of $\sigma$'s noncausal preset, say, $pre(\sigma)$. First, copy the timing
window on each output action to each of its predecessor arrows. Second, find the sums
of $t_{min}$ and $t_{max}$ along all causal chains to $\sigma$ from each member of its noncausal preset
$pre(\sigma)$. Consider the maximum delay case. For $\tau \in pre(\sigma)$, define $D(\tau, \sigma)$ as the maximum
sum of $t_{max}$ values along any causal chain from $\tau$ to $\sigma$. Then system action $\sigma$ will be
performed no later than $\max_{\tau} D(\tau, \sigma)$ units after the holding of its noncausal
preset $pre(\sigma)$. For the minimum delay case, define $d(\tau, \sigma)$ as the maximum sum of $t_{min}$
values, and take $\min_{\tau} d(\tau, \sigma)$; $\sigma$ will be performed no earlier than this many
units after the holding of its noncausal preset.
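The two bounds can be sketched as a longest-path computation over the causal chains. The sketch below assumes an explicit acyclic successor map and a timing window on every action along the chains; the graph, the window values and the function name `delay_bounds` are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from functools import lru_cache


def delay_bounds(succ, window, preset, target):
    """Return (earliest, latest) delay of `target` after its noncausal preset holds.

    succ maps an event to its causal successors; window maps an action to its
    (t_min, t_max) window, which is charged to every arrow entering that action.
    """

    @lru_cache(maxsize=None)
    def longest(event, index):
        # Largest sum of window component `index` over chains event -> target,
        # or None when no causal chain exists.
        if event == target:
            return 0
        best = None
        for nxt in succ.get(event, ()):
            rest = longest(nxt, index)
            if rest is not None:
                total = window[nxt][index] + rest
                best = total if best is None else max(best, total)
        return best

    d = {tau: longest(tau, 0) for tau in preset}  # maximum sums of t_min
    D = {tau: longest(tau, 1) for tau in preset}  # maximum sums of t_max
    if None in d.values():
        raise ValueError("precedence safety violation: missing causal chain")
    return min(d.values()), max(D.values())


# Assumed example: preset {t1, t2}, intermediate actions x and y, target s.
succ = {"t1": ["x"], "t2": ["x", "y", "s"], "x": ["s"], "y": ["s"]}
window = {"x": (1, 3), "y": (2, 5), "s": (2, 4)}
bounds = delay_bounds(succ, window, ["t1", "t2"], "s")
```

Here d(t1) = 3 and d(t2) = 4, so the earliest bound is 3; D(t1) = 7 and D(t2) = 9, so the latest bound is 9. A missing causal chain is reported as the precedence violation that the untimed check would already have caught.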
6 Conclusion
A complete verification package has been written by Lin Jensen in the Trilogy programming
language running on an IBM PC. The POM system has polynomial space and time
performance on benchmarks that are exponential in space and time for other verification
systems. Consider the ring of DME elements benchmark. The runtime for verification of
both safety and progress properties is quadratic in n, the number of DME elements. The
number of system states grows exponentially with n. For example, when n = 9, the
time is 180 s (roughly $10^9$ states); when n = 10, the time is 220 s (roughly $10^{10}$ states). The
space requirements for these problems do not exceed 64K bytes, i.e., one IBM PC data segment.
What are the compiler-independent space requirements? One must store the input;
this is linear. One must store the termination table; this is quadratic. Given reasonable
garbage collection, the working storage to do backwards chaining in a partially-ordered
system computation is linear, because one constructs and compares simple presets. The
limiting resource is the quadratic space used to store the termination table. To repeat,
both space and time are quadratic, in this example, to verify a concurrent system with
exponentially many states. Building up the actual partially-ordered system computations
themselves is unnecessary; we work directly with the uncomposed behavior automata of
the system components. We have also shown, at least in the simple case of output-delay-constrained
reactive systems, that verifying temporal window constraints is barely more
expensive than verifying precedence constraints. In general, the achievable efficiency of
a real-time verification algorithm is a sensitive function of the precise abstraction of real
time used in the model.
References
[1] D. L. Dill, "Trace theory for automatic hierarchical verification of speed-independent
circuits", PhD Thesis, Department of Computer Science, Carnegie Mellon University,
Report CMU-CS-88-119, February 1988. Also MIT Press, 1989.
[2] A. J. Martin, "Compiling communicating processes into delay-insensitive VLSI circuits",
Distributed Computing, Vol. 1, No. 4, October 1986, pp. 226-234.
[3] V. R. Pratt, "Modeling concurrency with partial orders", Int. J. of Parallel Prog.,
Vol. 15, No. 1, February 1986, pp. 33-71.
[4] D. K. Probst and H. F. Li, "Abstract specification of synchronous data types for
VLSI and proving the correctness of systolic network implementations", IEEE Trans.
on Computers, Vol. C-37, No. 6, June 1988, pp. 710-720.
[5] D. K. Probst and H. F. Li, "Abstract specification, composition and proof of correctness
of delay-insensitive circuits and systems", Technical Report, Department of
Computer Science, Concordia University, CS-VLSI-88-2, April 1988 (Revised March
1989).
[6] D. K. Probst and H. F. Li, "Partial-order model checking of delay-insensitive systems".
In R. Hobson et al. (Eds.), Canadian Conference on VLSI 1989, Proceedings,
Vancouver, BC, October 1989, pp. 73-80.
[7] D. K. Probst and H. F. Li, "Using partial-order semantics to avoid the state explosion
problem in asynchronous systems". In E. M. Clarke and R. P. Kurshan, (Eds.), Workshop
on Computer-Aided Verification '90, DIMACS Series, Vol. 3, 1991, pp. 15-24.
Also Lect. Notes in Comput. Sci., Springer Verlag, forthcoming.
[8] D. K. Probst and H. F. Li, "Partial-order model checking: A guide for the perplexed".
In K. G. Larsen and A. Skou, (Eds.), Workshop on Computer-Aided Verification '91,
Proceedings, Department of Mathematics and Computer Science, Aalborg University,
Report IR-91-5, July 1991, pp. 4p5-4}16. Also Lect. Notes in Comput. Sci., Springer
Verlag, forthcoming.
[9] J.v.d. Snepscheut, "Trace theory and VLSI design", Lect. Notes in Comput. Sci. 200,
Springer Verlag, 1985.
[10] J. T. Udding, "A formal model for defining and classifying delay-insensitive circuits",
Distributed Computing, Vol. 1, No. 4, October 1986, pp. 197-204.
