Partial order techniques enable reducing the size of the state graph used for model checking, thus alleviating the 'state space explosion' problem. These reductions are based on selecting a subset of the enabled operations from each program state. So far, these methods have been studied, implemented and demonstrated for assertional languages that model the executions of a program as computation sequences, in particular the logic LTL (linear temporal logic). The present paper shows, for the first time, how this approach can be applied to languages that model the behavior of a program as a tree. We study here partial order reductions for branching temporal logics, e.g., the logics CTL and CTL* (all logics with the next-time operator removed) and process algebras such as CCS. Conditions on the subset of successors from each node to guarantee reduction that preserves CTL* properties are given. The experimental results provided show that the reduction is substantial.
Introduction
Partial order (or more accurately, commutativity-based) methods are useful for tackling the exponential blowup in the memory required for the automated verification by model-checking of concurrent programs. They exploit the fact that many properties are insensitive to the order in which concurrent actions are executed. Fixing one out of many such orders can then be used to reduce the memory and time needed to check such properties. Such methods were studied so far [5, 10, 20, 23, 24] in conjunction with specifications that make assertions about the set of interleaved executions of the program; e.g., that use linear temporal logic without the next-state operator (LTL-X).
Partially supported by ESPRIT project P6021: "Building Correct Reactive Systems (REACT)".
State-based algorithms for model checking a system are patterned after a depth-first search of the system's configurations or states, thus generating a state graph that allows checking whether a concurrent finite state program $P$ satisfies a temporal logic property $\varphi$. Partial order reductions are aimed at constructing a reduced state graph, based on exploring for each visited state only a subset of the enabled operations, so that only some of the successors of that state are expanded and, hence, specifications can be verified in less space and time. The correctness of the reduced state graph generation algorithm is based on employing a set of constraints that limit the choice of such subsets of operations to those that guarantee that the truth of specifications is preserved.
The next step is to try to extend these methods to handle other types of specifications. Natural candidates are specification languages based on branching models, in particular, branching time temporal logics. Such logics, as opposed to LTL-X, can distinguish the state where a nondeterministic choice is made in the execution of the program. We are guided by three main reasons for our pursuit of a reduction that preserves branching-time logics. The first one is achieving greater expressiveness, e.g., by using a logic such as CTL*-X, which, besides being able to distinguish the nondeterministic choices, can express all LTL-X properties. The second one is the existence of some interesting restricted versions of branching time logics such as CTL-X. Although CTL-X does not include LTL-X (and vice versa), it can, by virtue of the branching operators, describe many interesting properties of programs. Moreover, due to its restrictions, it has a model-checking algorithm that is linear in the size of the checked formula [2], as opposed to the exponential algorithm for LTL-X.
The third motivation for such a reduction lies in the fact that branching temporal properties are preserved by bisimulation [1]; besides basing our correctness proof on this fact, checking that two states are bisimulation equivalent is itself important for process-algebra style correctness. Thus, our reductions can be used to reduce the time needed and the size of the state graph, and can be used in conjunction with process-algebra based tools such as PSF and AUTO [15, 19].
The first goal of this paper then is to find the proper constraints on the subset that is chosen to be explored at each visited state. Not unexpectedly, the set of constraints turns out to be strictly stronger than the one needed for LTL-X. Indeed, CTL*-X is more expressive than LTL-X is, so that branching points due to nondeterministic choices should be preserved in the reduced graph. Of course, this also means that reduction for LTL-X can produce smaller state graphs, and thus be more efficient in space and time. This is compensated by the fact that some branching time logics such as CTL-X have model checking algorithms that are linear rather than exponential in the size of the checked property.
The proof of the correctness of our algorithm is novel in that it is rather different from the one used for LTL-X reductions [20]: instead of using traces [16], i.e., equivalence classes of sequences, we show stuttering equivalence between the full and the reduced state graph [1]. This equivalence was proved in [1] to be a necessary and sufficient condition for ensuring that the two stuttering equivalent structures satisfy the same CTL*-X formulas.
CTL*-X is the most expressive of the logics we discuss and, consequently, the same result holds for logics that are included in it, namely ACTL*-X, ACTL-X, and CTL-X.
Experimental results show that even with the additional constraint on selecting subsets of the enabled operations, the reduction is still substantial. We demonstrate the reduction on various algorithms and protocols and compare it to the reduction obtained for LTL-X. The simplicity of the reduction algorithm, and the small overhead in time and memory it incurs, suggests that one can obtain significant improvement for state-based model checking, by using the suggested reduction algorithm, with a relatively small investment. We also investigate using our algorithm as part of a branching bisimulation checker.
Experiments indicate that it is more efficient to use our reduction strategy to generate a state graph to be checked than it is to generate and check the full state graph.
Basic Notions
Syntax of CTL*-X. Let $PV$ be a finite set of propositions. The set of state formulas and the set of path formulas are defined inductively:
S1. every member of $PV$ is a state formula,
S2. if $\varphi$ and $\psi$ are state formulas, then so are $\neg\varphi$ and $\varphi \wedge \psi$,
S3. if $\varphi$ is a path formula, then $A\varphi$ is a state formula,
P1. any state formula $\varphi$ is also a path formula,
P2. if $\varphi$, $\psi$ are path formulas, then so are $\varphi \wedge \psi$ and $\neg\varphi$,
P3. if $\varphi$, $\psi$ are path formulas, then so is $U(\varphi, \psi)$.
The modal operator $A$ has the intuitive meaning: "for all paths". $U$ denotes the standard Until.
CTL*-X consists of the set of all state formulae.
The following abbreviations will be used:
ACTL*-X. The sublogic of CTL*-X in which the modality E is prohibited, and negation can be applied only to subformulas that do not contain modalities.
ACTL-X. The sublogic of CTL-X in which the modality E is prohibited, and negation can be applied only to subformulas that do not contain modalities.
LTL-X. Restriction to formulas of the form $A\varphi$, where $\varphi$ does not contain A and E. We usually write $\varphi$ instead of $A\varphi$ if confusion is unlikely.
Semantics of CTL*-X. Let $T$ be a set of labels. A model for CTL*-X is a pair $(F, V)$, where $F = (W, (\xrightarrow{a})_{a \in T}, w_0)$ is a directed, rooted, edge-labeled graph with node set $W$ and initial node $w_0$, while $V$ is a valuation function that assigns to each node $w$ the set of propositions $V(w) \subseteq PV$.
Observe that satisfaction over our models coincides with the standard definition as provided in, e.g., [1].
Programs, State Graphs and Independence
For purposes of state graph generation and model checking, the specific syntactic structure of programs is not important. Instead, a finite-state program $P$ is viewed as a 4-tuple $(Q, T, \iota, I)$ where $Q$ is a finite set of states giving, e.g., the values of the variables and the contents of the message queues, $T$ is a finite set of operations such as assignments and send and receive actions to and from the message queues, $\iota \in Q$ is the initial state, and $I$ is a so-called independence relation on the program's actions that will be discussed below (see Definition 2.1). The enabling condition $en(q) \subseteq T$ is the set of operations that can be executed from a state $q$. Each operation $a \in T$ is identified with a partial function $a: Q \to Q$ (its denotation) that needs to be defined at least for each $q$ such that $a \in en(q)$. We assume that $en(q) \neq \emptyset$ for any $q$. $\hat{V}(s) = V(st(s))$. In the sequel, we shall not distinguish between $V$ and $\hat{V}$.
Partial order reduction exploits concurrency in programs and the fact that truth of specifications is often insensitive to the order in which so-called independent actions from different concurrent components occur in computations. Such independent actions can be, e.g., assignments to variables that are local to different components and send-actions in different components that affect separate message queues. The information as to which actions are independent can be given in an abstract way as follows:
Definition 2.1
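The view of a program as a set of states, a set of operations given as partial functions, and an enabling condition can be made concrete with a small sketch. Everything below is illustrative and not taken from the paper: the toy program (two bounded counters and a `reset` operation), the names `OPS`, `en`, `full_state_graph` and `independent`, and in particular the reading of independence used in `independent`, which is a commonly used formulation (independent operations neither enable nor disable one another and commute when both are enabled) and is only assumed to correspond to the definition intended here.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

State = Tuple[int, int]  # hypothetical state: the values of two variables (x, y)

# Each operation is a partial function Q -> Q; None models "not enabled here".
def inc_x(q: State) -> Optional[State]:
    return (q[0] + 1, q[1]) if q[0] < 2 else None

def inc_y(q: State) -> Optional[State]:
    return (q[0], q[1] + 1) if q[1] < 2 else None

def reset(q: State) -> Optional[State]:
    return (0, 0) if q == (2, 2) else None

OPS: Dict[str, Callable[[State], Optional[State]]] = {
    "inc_x": inc_x, "inc_y": inc_y, "reset": reset,
}

def en(q: State) -> List[str]:
    """Names of the operations that can be executed from state q."""
    return [name for name, op in OPS.items() if op(q) is not None]

def full_state_graph(initial: State) -> Dict[State, List[Tuple[str, State]]]:
    """Depth-first generation of the full state graph: every enabled
    operation of every reachable state is expanded."""
    graph: Dict[State, List[Tuple[str, State]]] = {}
    stack = [initial]
    while stack:
        q = stack.pop()
        if q in graph:
            continue
        graph[q] = [(a, OPS[a](q)) for a in en(q)]
        stack.extend(target for _, target in graph[q])
    return graph

def independent(a: str, b: str, states: Iterable[State]) -> bool:
    """Assumed reading of independence, checked over the given states."""
    fa, fb = OPS[a], OPS[b]
    for q in states:
        # Executing one operation must not change whether the other is enabled.
        for f, g in ((fa, fb), (fb, fa)):
            q1 = f(q)
            if q1 is not None and (g(q) is None) != (g(q1) is None):
                return False
        # When both are enabled, the two execution orders must commute.
        if fa(q) is not None and fb(q) is not None and fa(fb(q)) != fb(fa(q)):
            return False
    return True

graph = full_state_graph((0, 0))
```

In this toy program `inc_x` and `inc_y` come out as independent, while `inc_x` and `reset` do not, because executing `inc_x` in state (1, 2) enables `reset`.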
Stuttering Equivalence
The correctness of the reduction method will be based on the notion of stuttering equivalence. In [1] stuttering equivalence is defined using approximants indexed by $n$. Because our models are finite, it is easy to see that the two definitions are equivalent.
The Algorithm
The reduction algorithm is based upon a modified depth-first-search algorithm. It generates a reduced state graph $G'$ for the checked program $P$ such that any checked property $\varphi$ holds under $G'$ exactly when it holds under the full state graph $G$ of $P$. This is guaranteed by ensuring that the model corresponding to $G$ is stuttering equivalent with the model corresponding to $G'$ (see Theorem 3.2). The idea of the reduction is that from each state in the reduced state graph, the set of enabled operations is examined, and only a subset of it is used to generate successors. This contrasts with the construction of the full state graph, where all of the enabled operations are expanded. The subset of the operations $E(q)$ taken from a state $q$ satisfies restrictions C0, ..., C3 below, in order to preserve stuttering equivalence between the full and the reduced model.
To explain the restrictions imposed on the set $E(q)$, let us assume first that the full model $M$ does not contain loops except for self loops. Definition 3.1 describes the cases in which the model $M'$ resulting after removing a transition from $M$ is stuttering equivalent with $M$. In Figure 1 we have indicated the two situations in which the $a$-labeled transition need not be expanded from state $q$ in $M$. This is the case when the states $q$
C1 No operation $a \in T \setminus E(q)$ that is dependent on the operation in $E(q)$ can be executed in $P$ before the operation from $E(q)$ is executed.
Now, consider the first action $c$ along the path that depends on $b$, and let $q$ be the state on the path from which $c$ is taken. Since the $b$-action must have occurred along the path before reaching state $q$, commutativity of independent actions implies that the constructed prefix of its counterpart ends in state $q$, from which $c$ and, indeed, the whole sequence of subsequent actions along the path can be taken.
The final restriction [20] is needed in case $M$ does have non-trivial self loops and prevents deferring the execution of operations along such loops. In the figure to the right it can be seen that by choosing $E(q_1) = a_1$, $E(q_2) = a_2$ and $E(q_3) = a_3$, which satisfy C1 and C2 (when only $b$ is visible), the operation $b$ is not present in the resulting reduced state graph (which includes only the nodes for states $q_1$, $q_2$, $q_3$ and the three edges between them). In this case, the model corresponding to the full state graph is not stuttering equivalent with the model corresponding to the reduced state graph. C3 prohibits omitting such operations occurring along a cycle.
C3 If $E(q) \neq en(q)$, then the operation in $E(q)$, when applied to the current state $q$, does not close a cycle on the search stack (i.e., we do not allow that an open node with value $a(q)$ exists on the search stack).
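The way the restrictions shape the depth-first search can be illustrated with a minimal sketch. It is not the algorithm of the paper's figure: the function names, the toy program and the predicate `candidate_ok` are all hypothetical. The sketch enforces only two things directly, namely that the selected set is either a singleton or the full set of enabled operations, and the cycle-closing check of C3; whether a singleton satisfies the remaining restrictions is delegated to `candidate_ok`, which the caller is assumed to supply.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

def reduced_state_graph(
    initial: Hashable,
    en: Callable[[Hashable], List[str]],
    apply_op: Callable[[str, Hashable], Hashable],
    candidate_ok: Callable[[Hashable, str], bool],
) -> Dict[Hashable, List[Tuple[str, Hashable]]]:
    """Modified DFS that expands a singleton set where allowed, else en(q)."""
    graph: Dict[Hashable, List[Tuple[str, Hashable]]] = {}
    on_stack: Set[Hashable] = set()  # the open nodes of the search stack

    def select(q: Hashable) -> List[str]:
        enabled = en(q)
        for a in enabled:
            # candidate_ok stands for the restrictions not checked here;
            # the second test is C3: a(q) must not be an open node.
            if candidate_ok(q, a) and apply_op(a, q) not in on_stack:
                return [a]
        return enabled  # fall back to full expansion

    def dfs(q: Hashable) -> None:
        on_stack.add(q)
        graph[q] = []
        for a in select(q):
            target = apply_op(a, q)
            graph[q].append((a, target))
            if target not in graph:
                dfs(target)
        on_stack.discard(q)

    dfs(initial)
    return graph

# Hypothetical toy program: two counters modulo 3, both always enabled.
def toy_en(q):
    return ["inc_x", "inc_y"]

def toy_apply(a, q):
    x, y = q
    return ((x + 1) % 3, y) if a == "inc_x" else (x, (y + 1) % 3)

# Assumption for the demo only: {inc_x} is always an acceptable singleton.
reduced = reduced_state_graph((0, 0), toy_en, toy_apply, lambda q, a: a == "inc_x")
```

In the toy run the search follows `inc_x` alone until doing so would return to a state that is still open; at that point C3 forces a full expansion, so `inc_y` is not deferred forever along the cycle. The recursive formulation is chosen for readability; a production version would use an explicit stack.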
The conditions C1, C2, C3 are sufficient to guarantee that the reduced state graph will preserve any checked linear temporal logic property $\varphi$ [21].
Checking that a singleton set $\{a\}$ satisfies condition C1 is not detailed in the algorithm given in Figure 3. Because finding optimal ample sets is NP-complete [20], any implementation of ample will use heuristics that may also depend on the specific programming language used (to define a finite state program in our sense).
Such heuristics are based on checking the type of the operation $a$ (e.g., a local assignment, a synchronous receive operation, etc.) and some conditions on the rest of the program, and the state of the current node $s$: according to the type of the operation, there are certain conditions whose satisfaction in the current state $s$ guarantees that $\{a\}$ satisfies C1. For example, the simplest condition is that $a$ is a local assignment, and is not within a non-deterministic choice with other operations. A slightly more complicated condition applies when $a$ is a non-synchronous receive. Then C1 is guaranteed if there is no other receive operation from the same queue in any other process (this holds vacuously when a communication queue can be shared only by a pair of processes), and the queue is not empty in the state $st(s)$. A more complete description of checking C1 appears in [11].
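The two example conditions just described can be written down as a small predicate. This is only a sketch of those two cases under an invented data model: the `Op` record, its fields, and the `receivers` and `queue_contents` maps are hypothetical and do not come from any tool. Every other kind of operation is conservatively rejected.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Set

@dataclass(frozen=True)
class Op:
    name: str
    kind: str                    # "local_assign", "async_recv", or anything else
    process: str                 # the process that executes the operation
    queue: Optional[str] = None  # queue read by an asynchronous receive
    in_choice: bool = False      # True if it is an alternative of a nondeterministic choice

def singleton_satisfies_c1(
    op: Op,
    queue_contents: Dict[str, Sequence[object]],
    receivers: Dict[str, Set[str]],
) -> bool:
    """Heuristic check that {op} satisfies C1 in the current state.

    queue_contents: current contents of each message queue.
    receivers: for each queue, the processes containing a receive from it.
    """
    if op.kind == "local_assign":
        # Simplest case: a local assignment outside any nondeterministic choice.
        return not op.in_choice
    if op.kind == "async_recv" and op.queue is not None:
        # No other process receives from this queue, and the queue is non-empty.
        others = receivers.get(op.queue, set()) - {op.process}
        return not others and len(queue_contents.get(op.queue, ())) > 0
    return False  # unknown kinds: do not claim C1

assign = Op("x := x + 1", "local_assign", "P1")
recv = Op("q ? m", "async_recv", "P2", queue="q")
```

For instance, `singleton_satisfies_c1(recv, {"q": ["m"]}, {"q": {"P2"}})` holds, whereas the same call with an empty queue or with a second receiving process does not.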
Complexity of the Algorithm
The time complexity of the algorithm is $O(n_r \cdot C + m_r)$, where $n_r$ is the number of states in the generated state graph, $m_r$ is the number of edges and $C$ is the complexity of computing an ample set. This is obvious as the algorithm is a modified depth-first search through the state graph. Computing ample sets can be done in constant time along the lines of the earlier explanation. As to the amount of space, this is clearly linear in the number of states and edges. Hence we obtain an $O(n_r + m_r)$ space and time complexity for the algorithm.
Experimental Results

Reduced State Space for Various Algorithms and Protocols
The algorithm described in this paper was implemented by Gerard Holzmann in SPIN [9] and run on several examples. The table in Figure 4 below contains the number of states and edges, memory used in bytes, and time in seconds of generating full state graphs and reduced state graphs for both LTL-X and CTL-X. The reduction for CTL-X contains an additional restriction, namely C0, on selecting the subset of successors. This restricts the subsets of successors to be either the full set of the enabled operations, or a singleton set. In order to make the comparison unbiased towards any particular checked property, all operations were considered invisible during the tests. The set of properties that can be checked without making any program operation visible includes properties such as deadlock and termination.
All measurements were made on a Sparc-10 workstation with 128 Mbyte of RAM. The runtimes are the sum of system-time and user-time. The algorithms checked are as follows: leader is a leader election algorithm for a unidirectional ring [3], sorting is a pipeline distributed sorting algorithm, urp is AT&T universal receiver protocol, dtp is a data transfer protocol, snoopy is a cache coherence protocol, pftp is a file transfer protocol [9] and tpc is a model of a telephone switch.
For the first two examples, leader and sorting, the reductions with and without the additional restriction C0 are the same. Both give (when repeated with different numbers of processes) an exponential reduction of the state graph. For urp and dtp, the reduction in space and time is very similar with and without the additional restriction. For snoopy, the CTL-X reduction generates a state graph that is about 25% bigger in space, and takes about 50% more time. For pftp, the LTL-X reduction is about twice as good in space and time, and for tpc, it is about 2.5 times better in space and more than three times better in time.
These results demonstrate that the inclusion of the reduction algorithm is beneficial for all the above examples. A substantial reduction can be achieved with relatively small cost, as the implementation of the reduction algorithm is simple and incurs only very small overhead (for further implementation details, refer to [11], where an efficient LTL-X implementation is described). Even in the cases where the reduction is not very big (in comparison to some other reductions, as for the leader algorithm), such as in the tpc algorithm, where the gain in space is a factor of four, one can obtain a considerable benefit: since the algorithm is complicated enough to consume a large amount of memory, even the fourfold memory reduction could reduce the execution time from over two hours to about a minute and a half (avoiding needless memory swaps).
Verifying Branching Bisimulation
The reduction method described in Section 4 can be further exploited in the context of process algebra. It can be used to verify whether two states of a program are branching bisimilar [15, 19].
Let $Invis$ denote the set of invisible operations, i.e., $Invis = T \setminus Vis$. As we identify each operation from $Invis$ with a silent step, the definition of branching bisimulation can be formulated as follows:
it was conjectured that the same time complexity suffices for minimizing arbitrary state graphs. By Theorem 5.2, our algorithm generates a state graph that is branching bisimilar to the full state graph. Moreover it has time and space complexity $O(n_r + m_r)$, where $n_r$ and $m_r$ are the numbers of generated states and edges. This raises the possibility of using our algorithm as a preprocessing phase to constructing a minimal branching bisimilar state graph, thus allowing the minimization algorithm to run in time $O(n_r^2 + n_r m_r)$ using $O(n_r + m_r)$ space. The algorithms [8, 22] can be applied to a reduced state graph, instead of the full one.
The 'benchmark' example used in literature, and by us, is Milner's scheduler as described in [17]. This is a simple token ring consisting of $k$ cyclic processes $C_i$, which, on having received the token, communicate with some system and then concurrently wait for acknowledgment and wait to pass on the token. Process $C_i$ is described by the CCS equation $C_i = t_i \cdot c_i \cdot (\bar{a}_i \mid \bar{t}_{i \bmod k + 1}) \cdot C_i$. The complete scheduler on $k$ processes is described by $Sch_k = (\bar{t}_1 \cdot nil \mid C_1 \mid \cdots \mid C_k) \setminus t_1 \cdots \setminus t_k$, where the first process starts $C_1$. The operator '$\mid$' denotes a concurrent composition, and '$\cdot$' means sequencing. Letters correspond to operations. Two letters, where one is barred, e.g., $c$ and $\bar{c}$, may synchronize, thus producing an invisible action. The operator '$\setminus$' is the restriction operator which, in this case, forces the $t_i$, $\bar{t}_i$ synchronizations to occur. $nil$ is the idle process that does nothing.
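To make the scheduler concrete, the following sketch generates the full state graph of one possible encoding of it. The encoding is an assumption of this sketch, not the representation used in the experiments: each cycler is given five local states with invented names, each token hand-over is modelled as a single invisible `tau` step, and the labels `c1`, `a1`, ... stand for the communication and acknowledgment actions. State counts obtained this way need not coincide with those reported for other encodings.

```python
from typing import Dict, List, Tuple

# Local states of a cycler C_i (names invented for this sketch):
# WAIT: waiting for the token       READY: holds the token, about to do c_i
# BOTH: ack and token pass pending  ACK_ONLY: token passed, ack pending
# PASS_ONLY: ack done, token pass pending
WAIT, READY, BOTH, ACK_ONLY, PASS_ONLY = range(5)

State = Tuple[bool, Tuple[int, ...]]  # (starter still holds the token, local states)
Edge = Tuple[str, State]

def successors(state: State, k: int) -> List[Edge]:
    starter, procs = state
    result: List[Edge] = []

    def update(changes: Dict[int, int]) -> Tuple[int, ...]:
        new = list(procs)
        for index, value in changes.items():
            new[index] = value
        return tuple(new)

    # The starter process hands the token to C_1 (index 0) by synchronization.
    if starter and procs[0] == WAIT:
        result.append(("tau", (False, update({0: READY}))))
    for i, s in enumerate(procs):
        nxt = (i + 1) % k  # 0-based form of "i mod k + 1"
        if s == READY:
            result.append((f"c{i + 1}", (starter, update({i: BOTH}))))
        if s == BOTH:
            result.append((f"a{i + 1}", (starter, update({i: PASS_ONLY}))))
        if s == ACK_ONLY:
            result.append((f"a{i + 1}", (starter, update({i: WAIT}))))
        # Passing the token synchronizes with the next cycler, which must wait.
        if s in (BOTH, PASS_ONLY) and procs[nxt] == WAIT:
            after = ACK_ONLY if s == BOTH else WAIT
            result.append(("tau", (starter, update({i: after, nxt: READY}))))
    return result

def scheduler_graph(k: int) -> Dict[State, List[Edge]]:
    """Full state graph of the encoded scheduler with k >= 2 cyclers."""
    if k < 2:
        raise ValueError("this sketch assumes at least two cyclers")
    initial: State = (True, (WAIT,) * k)
    graph: Dict[State, List[Edge]] = {}
    stack = [initial]
    while stack:
        q = stack.pop()
        if q in graph:
            continue
        graph[q] = successors(q, k)
        stack.extend(target for _, target in graph[q])
    return graph

def token_count(state: State) -> int:
    """Number of token holders; it should be exactly one in every state."""
    starter, procs = state
    return int(starter) + sum(s in (READY, BOTH, PASS_ONLY) for s in procs)
```

Calling `scheduler_graph(k)` for increasing `k` shows how quickly the number of interleavings of the acknowledgment actions grows, which is what makes the example a useful benchmark for reduction methods.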
In Figure 5 we have collected some results for various sizes $k$ of the token ring. The measurements were done on a Sparc 1+ workstation with 16MB of memory. The actual generation of both the reduced and the full state graphs is achieved by a script written in PERL, an interpreted language with heavy use of pattern matching, and the absolute times should be interpreted accordingly. A C implementation can be expected to run at least an order of magnitude faster. The number of states and edges in the full state graph is given in the 2nd and 3rd columns. We consider both the case that only the communication actions ($c_i$) are visible and the case that both $c_i$ and the acknowledgment actions ($\bar{a}_i$) remain visible. For both cases we give the sizes of the state graphs as generated by our algorithm and the minimal state graphs (as given by an implementation of the Vaandrager/Groote algorithm, part of the PSF [15] toolkit). The time column gives the time in seconds that our algorithm needs to generate the state graph, where only $c_i$ is visible. We see that in this case not only are the resulting reduced state graphs small, but the time to generate them is small as well. This should be contrasted with the figures in the 2nd to last column that give the times it takes to generate the full state graph. The time for the actual minimization of the reduced state graph is negligible for these sizes of state graphs. In other words, one gains considerably here by generating the state graph using our reduction algorithm.
A second experiment shows that even in the case when our algorithm does not substantially reduce the state graph, the overhead of doing the reduction is negligible. In this case, both $c_i$ and $\bar{a}_i$ are made visible and the reduction of the number of edges is only between 11% (k=4) and 26% (k=8). Here, more than a half of the operations are visible, which defeats most of the reduction. This is fortunately atypical.
Furthermore, one can see that the minimal state graph also grows exponentially, producing a minimal state graph that is only about 50% smaller than the full state graph. The 3rd to last and 2nd to last columns, marked as "full" and "PO", show the time it takes to generate the full state graph and reduced state graph, respectively. One can see that even though in this case the reduction is small, the overhead that our algorithm incurs is minimal when compared to generating the full state graph; in fact, the algorithm still runs a little faster.
Although minimizing a state graph w.r.t. branching bisimulation is a global process, certain equivalence preserving transformations can be done locally during state graph generation. For instance, states that have precisely one outgoing transition can be removed if that transition is invisible.¹ The column labeled 'PO & $\tau$-removal' shows the result of augmenting our partial order algorithm with invisible-step removal. There is now a reduction in the number of states as well as a more substantial reduction in the edges. Interestingly, the resulting state graphs are in fact the minimal branching bisimilar ones. The last column shows that there is no time penalty. In fact, the running times are almost the same, which is not surprising because the algorithm has to visit the same number of nodes as before. Note however that the minimization algorithm will run in time and space proportional to the size of the reduced graph. Hence, invisible-step removal is advantageous for the minimization phase.
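The local transformation mentioned above, removing a state whose only outgoing transition is invisible, can be sketched as a post-processing pass over an already generated graph. The paper applies it during state graph generation; the pass below is a simplification, and its name, the `"tau"` label convention, the decision to always keep the initial state, and the merging of duplicate edges are assumptions of this sketch.

```python
from typing import Dict, Hashable, List, Tuple

TAU = "tau"  # label assumed to mark invisible transitions

Graph = Dict[Hashable, List[Tuple[str, Hashable]]]

def remove_invisible_steps(graph: Graph, initial: Hashable) -> Graph:
    """Remove states whose single outgoing transition is invisible,
    redirecting their incoming edges to the target of that transition."""
    g: Graph = {s: list(edges) for s, edges in graph.items()}
    changed = True
    while changed:
        changed = False
        for s in list(g):
            edges = g[s]
            if s == initial or len(edges) != 1:
                continue  # keep the root; need exactly one outgoing edge
            label, target = edges[0]
            if label != TAU or target == s:
                continue  # visible step or invisible self loop: keep the state
            del g[s]
            for u in g:
                redirected = [(l, target if v == s else v) for l, v in g[u]]
                g[u] = list(dict.fromkeys(redirected))  # drop duplicate edges
            changed = True
    return g

example: Graph = {
    0: [("c", 1)],
    1: [(TAU, 2)],            # exactly one outgoing, invisible transition
    2: [("a", 0), (TAU, 3)],
    3: [("c", 0)],
}
smaller = remove_invisible_steps(example, 0)
```

In the example, state 1 disappears and the `c` transition of state 0 leads directly to state 2. Each iteration of the outer loop removes at least one state, so the pass terminates even when the invisible steps form a cycle.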
Conclusions
We have presented an algorithm for generating reduced state graphs to be used for model-checking branching temporal properties. The usual DFS expansion algorithm was modified so that only subsets of the successors from each state are expanded. This allows reducing the number of states and edges, and thus allows reducing the space and time used for this construction and for model checking. The branching time logics include the temporal logic CTL*-X, which is more expressive than the linear time logic LTL-X. They also include the logic CTL-X which has a model-checking algorithm that is linear in the size of the checked property [2]. These advantages in either expressiveness or efficiency can now be combined with the ability to reduce the state graph using partial order methods.
On the other hand, we have shown that, in general, the reduction of the state graph for preserving branching properties is more restricted than the one for LTL-X: an additional restriction was added, limiting the subset of successors taken from each state to be either the full set of successors or a singleton set.
Experimental results show that the suggested algorithm results in a substantial reduction in both space and time over the traditional full state graph exploration. Also, the algorithm proved to be the preferred way to generate state graphs to verify branching bisimulation.
