Abstract. In formal design verification, successful model checking is typically preceded by a laborious manual process of constructing design abstractions. We present a methodology for partially (and in some cases fully) bypassing the abstraction process. For this purpose, we provide the designer with abstraction operators which, if used judiciously in the description of a design, structure the corresponding state space hierarchically. This structure can then be exploited by verification tools, and makes possible the automatic and exhaustive exploration of state spaces that would otherwise be out of scope for existing model checkers. Specifically, we present the following contributions:
- A temporal abstraction operator that aggregates transitions and hides intermediate steps. Mathematically, our abstraction operator is a function that maps a flat transition system into a two-level hierarchy where each atomic upper-level transition expands into an entire lower-level transition system. For example, an arithmetic operation may expand into a sequence of bit operations.
- A BDD-based algorithm for the symbolic exploration of multi-level hierarchies of transition systems. The algorithm traverses a level-n transition by expanding the corresponding level-(n−1) transition system on the fly. The level-n successors of a state are determined by computing a level-(n−1) reach set, which is then immediately released from memory. In this fashion, we can exhaustively explore hierarchically structured state spaces whose flat counterparts cause memory overflows.
- We experimentally demonstrate the efficiency of our method with three examples: a multiplier, a cache coherence protocol, and a multiprocessor system. In the first two examples, we obtain significant improvements in run times and peak BDD sizes over traditional state-space search. The third example cannot be model checked at all using conventional methods (without manual abstractions), but can be analyzed fully automatically using transition hierarchies.
Introduction
Formal design verification is a methodology for detecting logical errors in high-level designs. In formal design verification, the designer describes a system in a language with a mathematical semantics, and then the system description is analyzed against various [...] single transition of next φ for M aggregates as many transitions of M as are required to satisfy the condition φ, and hides the intermediate steps. For example, if M is a gate-level description of an ALU, and φ signals the completion of an arithmetic operation, then next φ for M is an operation-level description of the ALU. Mathematically, the semantics of next φ for M is defined as a two-level hierarchy of transition systems:
each transition of the upper-level (e.g., operation-level) transition system abstracts an entire lower-level (e.g., gate-level) transition system. By nesting next operators, we obtain multi-level hierarchies of transition systems. Structuring a state space into a multi-level transition hierarchy makes the exhaustive exploration of very large state spaces possible: after the traversal of a level-n transition, the computed reach set for the corresponding level-(n−1) transition system represents hidden intermediate steps and can be removed from memory.
In Section 2, we briefly review the language of Reactive Modules and give a simple example of a transition hierarchy. In Section 3, we introduce an algorithm for the symbolic exploration of transition hierarchies. In Section 4, we present experimental results that demonstrate the efficiency of our algorithm. For this purpose, we design a system comprising two processors with simple instruction sets, local caches, and shared memory. If we simply put these components together, using parallel composition but no next operator, the resulting flat transition system is far beyond the scope of existing model checkers. If, however, we use the next operator to aggregate and hide internal transitions between synchronization points before composing the various subsystems, the resulting transition hierarchy can be explored using the search routines of VIS, and correctness requirements can be checked fully automatically. Thus, describing a design with next can eliminate the need for manual abstractions in verification.
Related work. The concept of temporal abstraction is inspired by the notion of multiform time in synchronous programming languages [BlGJ91, Hal93], and by the notion of action refinement in algebraic languages [AH89]. All of that work, however, concerns only the modeling of systems, and not automatic verification.
Temporal abstraction is also implicitly present in the concept of stuttering [Lam83]: a stuttering transition of a system is a transition that leaves all observable variables unchanged. Ignoring differences in the number of stuttering transitions leads to various notions of stutter-insensitive equivalences on state spaces (e.g., weak bisimulation). This suggests the following approach to model checking: for each component system, compute the appropriate stutter-insensitive equivalence, and before search, replace the component by the smaller quotient space. This approach, which has been implemented in tools such as the Concurrency Workbench [CPS93], requires the manipulation of the transition relations of individual components, and has not been shown to be competitive with simple search (cf. Section 3.1 vs. Section 3.2).
Partial-order methods avoid the exploration of unnecessary interleavings between the transitions of component systems. Space and time gains from partial-order reduction have been reported for both enumerative [HP94] and BDD-based [ABH+97] verification. By declaring sequences of transitions to be atomic, the next operator also reduces the number of interleavings between concurrent transitions. However, while partial-order reductions need to be "discovered" a posteriori from the system description, transition hierarchies are specified a priori by the designer, as an integral part of the system description.
For example, Figure 2 shows a module that adds two words. The environment of the word-adder consists of two modules: a command module, which provides the operands to be added and an instruction that they be added, and a bit-adder, which is called repeatedly by the word-adder. Hence the word-adder has the external variables addOp1 and addOp2 of type WORD, which contain the two operands provided by the command module; the external variable doAdd of type BOOLEAN, which is set by the command module whenever the two operands should be added; and three output bits of the bit-adder: the sum bitResult, the carry-out cOut, and the flag doneBitAdd, which is set whenever a bit-addition is completed. The word-adder has the interface variables addResult of type WORD, which contains the sum; overflow of type BOOLEAN, which indicates addition overflow; doneAdd of type BOOLEAN, which is set whenever a word-addition is complete; and four input bits for the bit-adder: the operands bit1 and bit2, the carry-in cIn, and the flag doBitAdd, which instructs the bit-adder to perform a bit-addition. The word-adder has the private variables state of type FLAGTYPE, which indicates whether an addition is being executed, and bitCount of type LOGWORD, which tracks the position of the active bits during the addition of two words.
We assume that, once a word-addition is requested, the command module keeps the variable doAdd true until the word-adder signals completion of the addition by setting doneAdd to true.
The state of a reactive module changes in a sequence of rounds. In the first round, the initial values of all interface and private variables are determined. In each subsequent round, the new values of all interface and private variables are determined, possibly dependent on some latched values of external, interface, and private variables from the previous round, and possibly dependent on some new values of external variables from the current round. No assumptions are made about the initial values of external variables, nor about how they are updated in subsequent rounds. However, in order to avoid cyclic dependencies between variables, a module may not, within a single round, update an interface variable x dependent on the new value of an external variable y while the environment updates y dependent on the new value of x. This restriction is enforced by collecting variables into atoms that can be ordered linearly such that in each round, the variables within an atom can be updated simultaneously provided that all variables within earlier atoms have already been updated. Thus, a round consists of several subrounds, one per atom.
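The linear ordering of atoms described above amounts to a topological sort of the await dependencies. The following Python sketch derives one subround order with the standard-library graphlib module; it is only an illustration, not part of any Reactive Modules tool, and the function name subround_order and the atom names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Illustrative sketch; subround_order and the atom names are hypothetical.


def subround_order(awaits: dict[str, set[str]]) -> list[str]:
    """Return one linear order of atoms (one subround per atom).

    `awaits` maps each atom to the atoms whose *new* values it reads in
    the same round; those atoms must be updated in earlier subrounds.
    Raises ValueError if the await dependencies are cyclic.
    """
    try:
        return list(TopologicalSorter(awaits).static_order())
    except CycleError as err:
        # err.args[1] holds the detected cycle as a list of atoms
        raise ValueError(f"cyclic await dependencies: {err.args[1]}") from err


# Hypothetical atoms loosely mirroring four subrounds of an adder round.
adder_atoms = {
    "init_bit_add": {"command"},   # reacts to the command module
    "bit_add": {"init_bit_add"},   # services the bit-addition request
    "record_result": {"bit_add"},  # records bitResult and cOut
}
print(subround_order(adder_atoms))
# ['command', 'init_bit_add', 'bit_add', 'record_result']
```

If the await dependencies contain a cycle, no linear order exists and the sketch reports the cycle instead of returning an order.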
A round of the word-adder consists of four subrounds: first, the command module may provide new operands and issue an add instruction; second, the word-adder may initiate a bit-addition; third, the bit-adder may perform a bit-addition; fourth, the word-adder may record the result of a bit-addition and signal the completion of the word-addition. Accordingly, the interface and private variables of the word-adder are [...]
The first and third subrounds of each round are taken by the command module and the bit-adder, respectively. The bit-adder, shown in Figure 3, needs one round for a bit-addition, but can choose to wait indefinitely before servicing a request. A word-addition of two n-bit numbers therefore requires at least n rounds, one round for each bitwise addition. In the first of these rounds, the word-adder reacts to the command module, and the first bits may (or may not) be added. In the last of these rounds, the n-th bits are added, and the word-adder signals completion of the addition. [...]
The await dependencies between variables are required to be acyclic. A variable is history-free if it is not read by any atom. For obvious reasons, the values of history-free variables do not have to be stored during state-space traversal.
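To make the round count concrete, here is a minimal Python sketch of the bit-serial addition the word-adder performs, one bit-addition per round. It is an illustration rather than the module of Figure 2: the function name word_add and the bit_adder_ready callback (standing in for the bit-adder's freedom to wait) are hypothetical, and overflow is assumed to be the unsigned carry out of the most significant bit.

```python
from typing import Callable, Optional, Tuple

# Illustrative sketch; word_add and bit_adder_ready are hypothetical names.


def word_add(op1: int, op2: int, n: int,
             bit_adder_ready: Optional[Callable[[int], bool]] = None
             ) -> Tuple[int, bool, int]:
    """Add two n-bit words one bit per round.

    bit_adder_ready(r) says whether the bit-adder services the pending
    request in round r (it may wait indefinitely); by default it always does.
    Returns (addResult, overflow, rounds used).
    """
    ready = bit_adder_ready or (lambda r: True)
    result, carry, bit_count, rounds = 0, 0, 0, 0
    while bit_count < n:
        rounds += 1
        if not ready(rounds):
            continue  # bit-adder waits this round; nothing changes
        b1 = (op1 >> bit_count) & 1                # bit1
        b2 = (op2 >> bit_count) & 1                # bit2
        s = b1 ^ b2 ^ carry                        # bitResult
        carry = (b1 & b2) | (carry & (b1 ^ b2))    # cOut, next cIn
        result |= s << bit_count
        bit_count += 1
    return result, bool(carry), rounds


print(word_add(5, 3, 4))                         # (8, False, 4)
print(word_add(5, 3, 4, lambda r: r % 2 == 0))   # (8, False, 8)
```

The second call shows why n rounds is only a lower bound: a bit-adder that idles every other round doubles the number of rounds without changing the result.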
Flat vs. Hierarchical Models
We discuss two operations for building complex reactive modules from simple ones. The parallel-composition operator abstracts the spatial complexity of a system by collecting the atoms of several modules within a single module. The next operator abstracts the temporal complexity of a system by combining several rounds of a module as subrounds of a single round. Intuitively, if M is a reactive module and φ is a condition on the [...] a word-adder composed with a bit-adder. The two models differ only in their level of temporal granularity. In the flat model ConcreteAdder, the addition of two n-bit words takes at least n rounds. In the hierarchical model AbstractAdder, the addition of two n-bit words takes a single round, because the next-abstracted module combines into a single round as many rounds as are necessary to make either doAdd false or doneAdd true. In other words, in the flat model, bit-additions are atomic. Thus the flat model is adequate under the assumption that the addition unit is put into an environment that interacts with it only before and after bit-additions, and does not interrupt in the middle of a bit-addition. By contrast, in the hierarchical model, word-additions are atomic. Therefore the hierarchical model is adequate only under the stronger assumption that the addition unit is put into an environment that interacts with it only before and after word-additions, and does not interrupt in the middle of a word-addition. While the flat model is adequate in more situations, we will see that the hierarchical model can be verified more efficiently, and should therefore be preferred whenever it is adequate.
[...] states in the hierarchical model, and thus in effect history-free. This results in further savings in memory for storing states.
The savings are particularly pronounced when hierarchical models are composed. Consider two flat models M and N, and two hierarchical models M [...]. In other words, if the interaction between two component systems can be restricted, then some of the state-explosion problem may be avoided. Indeed, as we shall see, in complex systems with many components but well-defined interactions between the components, the computational savings, both in time and memory, can be enormous.
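The informal semantics of next can be illustrated on an explicit state graph. The Python sketch below assumes a nondeterministic successor function step (one concrete round of M) and a predicate phi; both names, and the function next_successors, are hypothetical. It returns the states at which one abstract round ends, namely the first phi-states reached after at least one concrete round, while the intermediate states are explored and then dropped. As a simplifying assumption, a path that never reaches a phi-state contributes no abstract successor.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Set

# Illustrative sketch; next_successors, step, and phi are hypothetical names.
State = Hashable


def next_successors(
    s: State,
    step: Callable[[State], Iterable[State]],
    phi: Callable[[State], bool],
) -> Set[State]:
    """Successors of s in 'next phi for M': run one or more rounds of M and
    stop at the first state satisfying phi. Transient states (not phi) are
    explored but never returned."""
    result: Set[State] = set()
    seen: Set[State] = set()
    frontier = deque(step(s))
    while frontier:
        t = frontier.popleft()
        if t in seen:
            continue
        seen.add(t)
        if phi(t):
            result.add(t)             # the abstract round ends here
        else:
            frontier.extend(step(t))  # hidden intermediate step
    return result


def step(k: int) -> Set[int]:
    """Toy module: a counter that may stutter or advance, wrapping after 3."""
    return {k, k + 1} if k < 3 else {0}


print(next_successors(0, step, lambda k: k == 3))  # {3}
```

With phi true only at 3, the stuttering and the intermediate values 1 and 2 disappear: a single abstract round leads from 0 directly to 3.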
In the following, we first define the transition relations of composite and hierarchical modules from the transition relations of the components. Then we present a nested-search algorithm that explores the state space of a hierarchical module efficiently. The nested-search algorithm uses an implicit, algorithmic representation of the transition relation of a hierarchical module for image computation.
Explicit Definition of Transition Relations
The state-transition graph of a reactive module can be specified by a symbolic transition relation. Given a module M with variables X, the symbolic transition relation of M is a boolean function $T_M(X, X')$ [...]
The reachable state set of a module can be computed by iterated application of the transition relation. For this purpose, it is theoretically possible to construct, using the above definitions, a BDD for the symbolic transition relation of a hierarchical module. In practice, however, the intermediate BDDs built during the construction often blow up, resulting in memory overflow. For parallel composition, a common trick is to leave the transition relation conjunctively decomposed and represent it as a set of BDDs, rather than computing their conjunction as a single BDD [TSL90]. Early quantification heuristics are then used to compute efficiently the image of a state set under a conjunctively partitioned transition relation. For next abstraction, we propose a similar approach.
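The following sketch shows reachability by iterated image computation with a conjunctively partitioned transition relation. To stay self-contained it uses explicit Python sets of boolean tuples instead of BDDs, so it reflects only the structure of the computation (the conjuncts are never multiplied into one relation) and not early quantification, which is specific to the BDD setting. The function names and the two-bit counter are hypothetical.

```python
from itertools import product
from typing import Callable, Iterable, Sequence, Set, Tuple

# Illustrative explicit-set sketch (no BDDs); all names are hypothetical.
State = Tuple[bool, ...]
Conjunct = Callable[[State, State], bool]


def image(states: Iterable[State], conjuncts: Sequence[Conjunct],
          nvars: int) -> Set[State]:
    """Image of a state set under T(x, x') = T_1 ∧ ... ∧ T_k, with each
    conjunct T_j kept separate rather than conjoined up front."""
    succ: Set[State] = set()
    for x in states:
        for xp in product((False, True), repeat=nvars):
            if all(t(x, xp) for t in conjuncts):
                succ.add(xp)
    return succ


def reach(init: Iterable[State], conjuncts: Sequence[Conjunct],
          nvars: int) -> Set[State]:
    """Reachable states as the least fixpoint of R = Init ∪ Image(R)."""
    reached = set(init)
    frontier = set(reached)
    while frontier:
        frontier = image(frontier, conjuncts, nvars) - reached
        reached |= frontier
    return reached


# Two-bit counter over latches (b0, b1), one conjunct per latch.
def t_b0(x: State, xp: State) -> bool:
    return xp[0] == (not x[0])


def t_b1(x: State, xp: State) -> bool:
    return xp[1] == (x[1] != x[0])


print(len(reach({(False, False)}, [t_b0, t_b1], 2)))  # 4
```

Each iteration only applies the image to the newly reached frontier, which is the usual way to organize a reachability fixpoint.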
Efficient Computation with Implicit Transition Relations
For model checking, it suffices to represent the symbolic transition relation of a module not explicitly, as a BDD, but implicitly, as an algorithm that, given a state set, computes the set of successor states. This algorithm can then be iterated for reachability analysis and more general verification problems. In contrast to a BDD for $T_M(X, X')$, which explicitly represents the transition relation of module M, the recursive algorithm for computing the function $R^1_M$ implicitly represents the same information. In practice, a mixture of explicit symbolic representation of transition relations (for small modules) and implicit image computation (for complex modules) will be most efficient. We report on our experiences with nested search in the following section.
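Under the same explicit-set simplification, the nested search can be sketched as an image function for next φ for M that is computed on demand rather than stored. The inner loop explores the lower-level reach set of transient states only until it finds the φ-states that end an abstract round; that set is local to the call and is released when the call returns, mirroring the memory behavior described above. This is an illustrative sketch, not the paper's Algorithm 3.2 or the VIS command; step_image, phi, abstract_image, and hierarchical_reach are hypothetical names.

```python
from typing import Callable, Hashable, Iterable, Set

# Illustrative sketch of nested search over explicit sets; all names are
# hypothetical and do not correspond to the VIS implementation.
State = Hashable
ImageFn = Callable[[Set[State]], Set[State]]


def abstract_image(states: Iterable[State], step_image: ImageFn,
                   phi: Callable[[State], bool]) -> Set[State]:
    """Image of a state set under 'next phi for M', computed on demand.

    step_image computes one concrete round of M on a set of states. The
    inner reach set is local to this call and is freed when it returns.
    """
    inner_frontier = step_image(set(states))
    inner_reached = set(inner_frontier)
    result: Set[State] = set()
    while inner_frontier:
        done = {t for t in inner_frontier if phi(t)}
        result |= done                     # abstract round ends here
        transient = inner_frontier - done  # hidden intermediate states
        inner_frontier = step_image(transient) - inner_reached
        inner_reached |= inner_frontier
    return result


def hierarchical_reach(init: Iterable[State], step_image: ImageFn,
                       phi: Callable[[State], bool]) -> Set[State]:
    """Outer reachability that only ever stores abstract-level states."""
    reached = set(init)
    frontier = set(reached)
    while frontier:
        frontier = abstract_image(frontier, step_image, phi) - reached
        reached |= frontier
    return reached


def mod6_step(states: Set[int]) -> Set[int]:
    """Toy module: one round increments a mod-6 counter."""
    return {(k + 1) % 6 for k in states}


print(hierarchical_reach({0}, mod6_step, lambda k: k % 3 == 0))  # {0, 3}
```

The outer result contains only the abstract states 0 and 3; the values 1, 2, 4, and 5 are visited inside abstract_image but are never kept in the outer reach set.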
Experiments
The aim of our experiments is to investigate the efficiency of the proposed method for the automatic reachability analysis of complex designs. All experimental results reported in this paper were obtained by modeling the systems in Verilog and using the vl2mv Verilog compiler together with VIS [BSVH+96]. We implemented a new command in VIS, called abstract_reach, based on Algorithm 3.2.
Multiplier
We model a word-multiplier that functions by repeated addition, using the word-adder [...]. We perform reachability analysis with both models. Model 1 is given to VIS directly, and reachability analysis is performed using the compute_reach command of VIS. In order to analyze Model 2, we use the abstract_reach command with the aggregation predicate doAdd ⇒ doneAdd. As a result, the states in which doAdd is true and doneAdd is false become transient states.
We experiment with two 4-bit operands and an 8-bit result. In this case, Model 1 has 68 latches and 1416 gates. After the next abstraction, 24 of these latches become history-free; that is, their values are independent of previous nontransient values. In particular, the local variables of the adder become history-free, and hence, are represented by trivial functions in the BDD that represents the reachable states. Table 1 shows the peak BDD sizes for both models.
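For intuition about Model 2, the following Python sketch multiplies by repeated addition, treating each word-addition as a single atomic step (as the next-abstracted adder does), and classifies states against the aggregation predicate doAdd ⇒ doneAdd. The function names, the wraparound of the accumulator to the result width, and the encoding of doAdd and doneAdd as Python booleans are assumptions made for illustration.

```python
from typing import Tuple

# Illustrative sketch; multiply and is_transient are hypothetical names.


def multiply(op1: int, op2: int, width: int = 8) -> Tuple[int, int]:
    """Multiply by repeated addition: add op1 to an accumulator op2 times.
    Each loop iteration stands for one word-addition, which is a single
    atomic round once the adder is next-abstracted.
    Returns (product truncated to `width` bits, number of word-additions)."""
    mask = (1 << width) - 1
    acc, additions = 0, 0
    for _ in range(op2):
        acc = (acc + op1) & mask  # result truncated to `width` bits
        additions += 1
    return acc, additions


def is_transient(do_add: bool, done_add: bool) -> bool:
    """True for states that violate doAdd ⇒ doneAdd; under the aggregation
    predicate these are treated as intermediate steps of an abstract round."""
    return do_add and not done_add


print(multiply(15, 15))  # (225, 15): two 4-bit operands, 8-bit result
```

With 4-bit operands the largest product, 15 × 15 = 225, still fits into the 8-bit result, so the truncation never takes effect in this configuration.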
Cache Coherence Protocol
We describe the various components of a generic cache coherence protocol before discussing our results. Each cache block can be in one of three states: INVALID, READ SHARED, or WRITE EXCLUSIVE. Multiple processors can have the same memory location in their caches in the READ SHARED state, but only one processor can have a given location in the WRITE EXCLUSIVE state. A directory records, for each memory location, which processors have cached that location and in which states (READ SHARED, WRITE EXCLUSIVE) these blocks are. For lack of space, we will not explain the protocol formally; an example scenario gives the general flavor. Suppose that Processor 1 has a location in WRITE EXCLUSIVE, and Processor 2 wants to read this location. First, Cache 2 records a read miss and communicates it to the directory. The directory then sends a message to Processor 1, requesting it to move the state of the block under consideration from WRITE EXCLUSIVE to READ SHARED. Cache 1 acknowledges this request and also sends the latest version of the data in this block to the directory. The directory then services Cache 2 with this data, and Cache 2 gets the block as READ SHARED. Each of these steps involves a transaction on the bus, which could take an arbitrary number of rounds due to the asynchronous nature of the bus.
We experiment with two levels of temporal granularity. Model 1 is a flat model of the memory system, and Model 2 is a hierarchical model that abstracts temporal detail about the bus. While a bus transaction can consume multiple rounds in Model 1, it is forced to complete in a single round in Model 2. For our experiments, we choose a 1-bit address bus and a 1-bit data bus. In this case, Model 1 has 44 latches, of which 6 become history-free in Model 2. The peak BDD sizes during reachability analysis for both models are reported in Table 1.
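The example scenario above can be replayed with a small Python sketch of a directory entry for one memory location. It treats each request as one atomic step (in effect, every bus transaction completes in a single round, as in Model 2) and assumes that a write invalidates all other copies, which the text does not spell out; the class Directory and its methods are hypothetical.

```python
from enum import Enum

# Illustrative sketch; Directory and its methods are hypothetical, and each
# request is assumed to complete atomically (one round per bus transaction).


class Block(Enum):
    INVALID = 0
    READ_SHARED = 1
    WRITE_EXCLUSIVE = 2


class Directory:
    """Directory record for one memory location: the state of each cache's copy."""

    def __init__(self, nprocs: int) -> None:
        self.state = [Block.INVALID] * nprocs

    def read(self, p: int) -> None:
        if self.state[p] is not Block.INVALID:
            return  # read hit: no bus transaction needed
        for q, st in enumerate(self.state):
            if st is Block.WRITE_EXCLUSIVE:
                # owner downgrades to READ SHARED and returns the data
                self.state[q] = Block.READ_SHARED
        self.state[p] = Block.READ_SHARED

    def write(self, p: int) -> None:
        for q in range(len(self.state)):
            if q != p:
                self.state[q] = Block.INVALID  # assumed: invalidate other copies
        self.state[p] = Block.WRITE_EXCLUSIVE

    def invariant(self) -> bool:
        """At most one cache holds the location in WRITE EXCLUSIVE."""
        return sum(st is Block.WRITE_EXCLUSIVE for st in self.state) <= 1


d = Directory(2)
d.write(0)  # Processor 1 (index 0) holds the block in WRITE EXCLUSIVE
d.read(1)   # Processor 2 (index 1) reads: both end up READ SHARED
print([st.name for st in d.state])  # ['READ_SHARED', 'READ_SHARED']
```

A model checker explores all interleavings of such requests; here the invariant method plays the role of the property that reachability analysis would check in every reachable state.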
Processor-Memory System
Aiming for a more dramatic improvement over flat modeling, we compose several systems whose interactions are limited. We put together two processors, each with an ALU consisting of the adder and multiplier described earlier, and the cache protocol, to obtain a complete processor-memory system. A block diagram of the system is shown in Figure 4. The processors have a simple instruction set: load/store register to/from memory, add two register operands, multiply two register operands, compare two registers, and conditional branch. Again we experiment with two models.
Model 1 is flat, and Model 2 is constructed by composing next-abstracted versions of the multipliers, adders, and bus protocol. We choose a 1-bit-wide address bus and a 2-bit-wide data bus. In this case, Model 1 has 147 latches, of which 36 become history-free in Model 2 (15 in each multiplier and 6 in the cache protocol). Here, reachability analysis for Model 1 is beyond the capability of current verification tools. However, fully automatic reachability analysis succeeds for Model 2, which structures the design using the next operator.
Consider, for example, the situation where both processors start a multiplication at the same time. In Model 1, there are several transient states due to the interleaving of independent suboperations of the two multipliers. These transient states are entirely absent in Model 2. Indeed, nested search (Algorithm 3.2) is the key to verifying this example: we run out of memory when trying to compute an explicit representation of the transition relation for Model 2. 
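A toy calculation shows why the interleavings disappear. The Python sketch below, with a hypothetical number k of internal steps per multiplication, explores the reachable progress pairs (i, j) of two independent operations started together: in the flat setting each step advances one operation by one, while in the next-abstracted setting each operation jumps from start to finish in one step. The numbers are illustrative only and are not the state counts of the models above.

```python
from math import comb
from typing import Set, Tuple

# Illustrative toy count; reachable_pairs and k are hypothetical.


def reachable_pairs(k: int, atomic: bool) -> Set[Tuple[int, int]]:
    """Reachable progress pairs (i, j), 0 <= i, j <= k, of two independent
    k-step operations that start together. With atomic=True, each operation
    completes in one step, as under next abstraction."""
    stride = k if atomic else 1
    start = (0, 0)
    seen, stack = {start}, [start]
    while stack:
        i, j = stack.pop()
        for nxt in ((i + stride, j), (i, j + stride)):
            if max(nxt) <= k and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


k = 3
print(len(reachable_pairs(k, atomic=False)))  # 16 = (k + 1) ** 2
print(len(reachable_pairs(k, atomic=True)))   # 4
print(comb(2 * k, k))  # 20 interleaved schedules in the flat setting
```

The flat count grows quadratically in k (and exponentially in the number of concurrent components), while the atomic count stays at two progress values per component.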
Conclusions
We introduced a formal way of describing a design using both temporal and spatial abstraction operators for structuring the description. The temporal abstraction operator next induces a hierarchy of transitions on the state space, where a high-level transition corresponds to a sequence of low-level transitions. We exploited transition hierarchies in symbolic reachability analysis and presented an algorithm for proving invariants of reactive modules using hierarchical search. We tested the algorithm on arithmetic circuits, cache coherence protocols, and processor-memory systems, using an extension of VIS. The experimental results are encouraging, giving fully automatic results even on systems that are amenable to existing tools only after manual abstractions. Transition hierarchies can also be exploited for efficiency gains in enumerative reachability analysis [AH96]. We are currently building a formal verification tool for reactive modules, called MOCHA, which will incorporate both symbolic and enumerative hierarchical search as primitives.
While the next operator is ideally suited for abstracting subsystems that interact with each other at predetermined synchronization points, it does not permit the "out-of-order execution" of low-level transitions. We are currently investigating additional abstraction operators, such as operators that permit the temporal abstraction of pipelined designs.
