The aim of this paper is to show how large model checking problems for Computation Tree Logic (CTL) can be handled using current powerful vector processors. Although efficient recursive model checking algorithms for CTL, which run in time proportional to both the size of the Kripke structure and the length of the formula, have already been proposed [7, 2], these algorithms cannot be vectorized because of their recursive procedure calls. In this paper we propose a new model checking algorithm for CTL, called a vectorized model checking algorithm, which is suitable for execution on vector processors. It can handle Kripke structures with more than 1 million states derived from a deterministic sequential machine.
Introduction
Recently, various kinds of formal methods for automatic verification have been widely studied. Among them, the model checking approach based on a branching time temporal logic called CTL (Computation Tree Logic) [2, 3, 5, 6, 7] is one of the most efficient. In the verification of a system which consists of several machines communicating with each other, however, there is the so-called state explosion problem. Although several works try to avoid this problem [8, 10], it seems difficult to avoid in general, and there are strong requirements for the verification of large systems [11].
Our main aim is to clarify how large a machine can be verified with the model checking algorithm for CTL by using current powerful vector processors. As a first step toward this goal, we have attempted to vectorize model checking for CTL on Kripke structures. Although the model checking algorithms in [7] are efficient and run in time proportional to both the size of the Kripke structure and the length of the CTL formula, they are not suitable for vector processors because their recursive procedure calls make them difficult to vectorize. In order to extract high performance from vector processors, we need an algorithm that uses repeated uniform operations on array type data. It is easy to develop such a model checking algorithm based on the fixpoint calculations of the CTL semantics, but a direct implementation of the fixpoint calculations would easily lead to an algorithm whose time complexity is proportional to $|S|^3$ or $|S|^4$, where $|S|$ is the number of states of the Kripke structure.
The new model checking algorithm proposed here, called the vectorized model checking algorithm, can be vectorized for execution on vector processors. It runs in time linear in both the size of the Kripke structure (i.e. $|S| + |R|$, where $|R|$ is the number of edges of the Kripke structure) and the length of the CTL formula. We also implemented the algorithm on a vector processor, the FACOM VP400E. The analysis of the storage requirement of the implementation shows that it can manipulate Kripke structures with more than 1 million states derived from a deterministic sequential machine. We also present the result of an experiment which shows the efficiency of our implementation.
This paper is organized as follows: Section 2 summarizes the definition of CTL. Section 3 describes the vectorized model checking algorithm for CTL together with an explanation of vector processors. In Section 4 we explain the implementation of the algorithm on the vector processor FACOM VP400E and show its experimental result. Section 5 concludes the paper and summarizes the remaining future problems.
Computation Tree Logic
Computation Tree Logic (CTL) [6] is a branching time temporal logic. Let $AP$ be a set of atomic propositions. CTL formulas are inductively defined as follows:
• If $p \in AP$, then $p$ is a CTL formula.
• If $\eta$ is a CTL formula, then so are $\neg\eta$, $EX\eta$ and $EG\eta$.
• If $\eta$ and $\xi$ are CTL formulas, then so are $\eta \vee \xi$ and $E[\eta U \xi]$.
The semantics of CTL is defined over a Kripke structure $K = (S, R, I)$, where
• $S$ is a non-empty finite set of states.
• $R \subseteq S \times S$ is a total binary relation on $S$ (i.e. for all $s \in S$, there exists $s' \in S$ such that $(s, s') \in R$).
• $I : S \to 2^{AP}$ is an interpretation function which labels each state with the set of atomic propositions true at that state.
An infinite sequence of states $\pi = s_0 s_1 s_2 \ldots$ is called a path from $s_0$ if $(s_i, s_{i+1}) \in R$ for all $i \ge 0$. $\pi(i)$ denotes the $i$-th state of the sequence $\pi$ (i.e. $\pi(i) = s_i$).
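The definition of a Kripke structure can be made concrete with a small sketch. The following Python fragment is only an illustration under our own assumptions: states are numbered from 0, the function name `make_kripke` and the dictionary layout are hypothetical, and the predecessor lists are stored because the procedures discussed later traverse edges backwards.

```python
def make_kripke(num_states, edges, labels):
    """Build K = (S, R, I) with S = {0, ..., num_states - 1}.

    edges  : list of pairs (s, t) meaning (s, t) is in R
    labels : labels[s] is the set I(s) of atomic propositions true at s
    """
    succ = [[] for _ in range(num_states)]
    pred = [[] for _ in range(num_states)]
    for s, t in edges:
        succ[s].append(t)
        pred[t].append(s)
    # R must be total: every state needs at least one successor.
    for s in range(num_states):
        if not succ[s]:
            raise ValueError(f"state {s} has no successor; R is not total")
    return {"succ": succ, "pred": pred, "labels": labels}


# A three-state example: 0 -> 1, 1 -> 2, 1 -> 0, 2 -> 2.
K = make_kripke(
    3,
    [(0, 1), (1, 2), (2, 2), (1, 0)],
    [{"p"}, {"p", "q"}, {"q"}],
)
```

The totality check mirrors the requirement on $R$ above; without it, a state with no successor would have no path starting from it.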
The truth value of a CTL formula is defined at a state of a Kripke structure, and $K, s \models \eta$ denotes that a CTL formula $\eta$ holds at a state $s$ of a Kripke structure $K$. If there is no ambiguity, we omit $K$ and just write $s \models \eta$. The relation $\models$ is recursively defined as follows:
• $s \models p$ ($p \in AP$) iff $p \in I(s)$.
• $s \models \neg\eta$ iff $s \not\models \eta$.
• $s \models \eta \vee \xi$ iff $s \models \eta$ or $s \models \xi$.
• $s \models EX\eta$ iff there exists some next state $s'$ of $s$ (i.e. $(s, s') \in R$) such that $s' \models \eta$.
• $s \models EG\eta$ iff there exists some path $\pi$ on $K$ starting from the state $s$ such that $\pi(i) \models \eta$ for all $i \ge 0$.
• $s \models E[\eta U \xi]$ iff there exists some path $\pi$ on $K$ starting from the state $s$ such that $\exists i \ge 0$, $\pi(i) \models \xi$, and $\pi(j) \models \eta$ for all $0 \le j < i$.

Vector processors are supercomputers for large-scale computations. They achieve more than several hundred MFLOPS (Million FLoating-point Operations Per Second) by vector instructions which execute uniform operations on array-structured data using pipelined functional units, and they usually have a large main memory of several hundred megabytes. In addition to floating-point operations, they also support integer and bit-wise logical operations.
Although the maximum speed of vector processors is very high, the following two points are very important in programming vector processors to achieve their maximum performance:

Vectorization ratio: The vectorization ratio is the ratio of the operations executed by vector instructions to all the operations in a program. This ratio should be more than 90% to obtain high performance from vector processors.
Vector length: Since there are some overheads for setting up vector instructions, the length of operands (vector length) of vector instructions should be large enough; it should be larger than several hundreds to get maximum performance of vector processors.
As for data transmission between the main memory and the vector registers, there are load/store pipelines which support basically the following three types of vector accesses: contiguous vector access, constant strided vector access and indirectly addressed vector access (see Figure 1).
Furthermore, vector processors support DO loops with conditional statements and the vector compress function shown in Figure 2. These vectorized functions are very powerful in implementing a vectorized model checker for CTL.
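To make the three access types and the compress function concrete, the following plain Python sketch shows what each of them computes on an ordinary list. It is only an illustration of the data movement: the function names are hypothetical, and a vector processor performs these operations in hardware through its load/store pipelines rather than element by element.

```python
def contiguous(a, start, length):
    """Contiguous access: a[start], a[start + 1], ..."""
    return a[start:start + length]


def strided(a, start, stride, length):
    """Constant strided access: a[start], a[start + stride], ..."""
    return a[start:start + stride * length:stride]


def indirect(a, index):
    """Indirectly addressed access (gather): a[index[0]], a[index[1]], ..."""
    return [a[i] for i in index]


def compress(a, mask):
    """Vector compress: keep the elements whose mask entry is true,
    packed together without gaps."""
    return [x for x, m in zip(a, mask) if m]


a = [10, 11, 12, 13, 14, 15, 16, 17]
first_three_from_2 = contiguous(a, 2, 3)
every_third_from_1 = strided(a, 1, 3, 3)
gathered = indirect(a, [7, 0, 3])
larger_than_14 = compress(a, [v > 14 for v in a])
```

In a model checker, a compress of this kind is a natural way to collect the states that satisfy a condition into a dense work list, and indirect access is what is needed when the predecessors of a state are not stored consecutively.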
Case $E[\eta U \xi]$: From the definition of $E[\eta U \xi]$, it holds at the states where $\xi$ is true, and it also holds at the states from which such states can be reached only through states where $\eta$ holds. Therefore, this is a kind of reachability problem. A state where $\eta$ holds but $\xi$ does not is temporarily labeled 2, indicating that its label will be determined later by checking reachability.
Next, for all the states labeled 2, the procedure checks whether a state labeled 1 can be reached through states labeled 2; the states from which this is possible are labeled 1 and the others are labeled 0. This step is done as follows (lines 12-27): For each state where $E[\eta U \xi]$ is newly determined to be true, the labels of its predecessor states are checked. If a label is 2, the predecessor is relabeled as 1, because it can reach a state labeled 1, and it is added to the set which keeps track of the states newly labeled as 1. This step is repeated until no more states are newly labeled as 1 (lines 12-23). Finally, the states whose labels are still 2 are labeled as 0, because they cannot reach a state labeled 1 (lines 24-27).
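The labeling scheme described above can be sketched in Python as follows. This is a scalar illustration written from the prose description, not the vectorized procedure itself: the function name `verify_eu`, the argument layout (`pred[t]` is the list of predecessors of state `t`, and the two label arrays hold 0 or 1) and the variable names are our assumptions.

```python
def verify_eu(pred, label_eta, label_xi):
    """Label each state with 1 if E[eta U xi] holds there, else 0."""
    n = len(pred)
    label = [0] * n
    newly_true = []                 # plays the role of the set N1
    for s in range(n):
        if label_xi[s] == 1:
            label[s] = 1            # xi holds: E[eta U xi] holds
            newly_true.append(s)
        elif label_eta[s] == 1:
            label[s] = 2            # undecided: depends on reachability
        # otherwise the label stays 0

    # Propagate backwards from the states newly labeled 1.
    while newly_true:
        next_true = []              # plays the role of the set N2
        for t in newly_true:
            for s in pred[t]:
                if label[s] == 2:
                    label[s] = 1
                    next_true.append(s)
        newly_true = next_true

    # States still labeled 2 cannot reach a state labeled 1.
    for s in range(n):
        if label[s] == 2:
            label[s] = 0
    return label


# Edges: 0 -> 1, 1 -> 2, 2 -> 2, 3 -> 3, 3 -> 0.
pred = [[3], [0], [1, 2], [3]]
eu = verify_eu(pred, label_eta=[0, 1, 0, 1], label_xi=[0, 0, 1, 0])
```

In the example, state 3 satisfies $\eta$ but can only loop on itself or move to state 0, where neither formula holds, so it ends with label 0. Each state is relabeled from 2 to 1 at most once, so each predecessor list is scanned at most once.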
Case $EG\eta$: From the definition of $EG\eta$, it holds at the states on loops that consist only of states where $\eta$ holds. It also holds at the states from which such loops can be reached through states where $\eta$ holds. The procedure for finding such states is shown in Figure 4. More precisely, after initializing the set of states $N_1$ to be empty, the procedure labels the states where $\eta$ holds as 1 and the states where $\eta$ does not hold as 0 (lines 3-10). For each state where $\eta$ holds, the labels of its predecessor states are incremented if they are greater than 0 (lines 13-16). At this point, the label 0 means that $EG\eta$ does not hold at the state; a label greater than 0 means that $\eta$ holds at the state and that the state has (label $- 1$) successor states where $\eta$ holds. Next, each state labeled 1 (i.e. a state which has no successor state where $\eta$ holds) is relabeled to 0 and inserted into the set $N_1$, which keeps track of the states newly labeled as 0 (lines 18-23). In lines 27-37, for each state at which $EG\eta$ is newly determined not to hold, the labels of its predecessor states are decremented. This step is repeated for the states that are thereby left with no such successor states, until no more such states exist. Finally, the states whose labels are greater than 1 are relabeled to 1, because such states have at least one infinite path on which $\eta$ always holds; the other states are labeled as 0 (lines 38-43).
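The counting scheme for $EG\eta$ can be sketched in the same style. Again this is a scalar Python illustration reconstructed from the description, with hypothetical names (`verify_eg`, `pred`, `label_eta`); it assumes that `pred[t]` lists each predecessor of `t` exactly once.

```python
def verify_eg(pred, label_eta):
    """Label each state with 1 if EG eta holds there, else 0."""
    n = len(pred)
    # 1 where eta holds, 0 elsewhere.
    label = [1 if label_eta[s] == 1 else 0 for s in range(n)]

    # After this loop: label[s] - 1 = number of successors of s where eta holds
    # (for the states s where eta holds).
    for t in range(n):
        if label_eta[t] == 1:
            for s in pred[t]:
                if label[s] > 0:
                    label[s] += 1

    # States where eta holds but no successor satisfies eta.
    newly_false = []                # plays the role of the set N1
    for s in range(n):
        if label[s] == 1:
            label[s] = 0
            newly_false.append(s)

    # Each newly falsified state removes one candidate successor
    # from each of its predecessors.
    while newly_false:
        next_false = []             # plays the role of the set N2
        for t in newly_false:
            for s in pred[t]:
                if label[s] > 0:
                    label[s] -= 1
                    if label[s] == 1:
                        label[s] = 0
                        next_false.append(s)
        newly_false = next_false

    return [1 if v > 1 else 0 for v in label]


# Edges: 0 -> 1, 1 -> 0, 1 -> 2, 2 -> 3, 3 -> 3.
pred = [[1], [0], [1], [2, 3]]
eg = verify_eg(pred, label_eta=[1, 1, 1, 0])
```

In the example, states 0 and 1 form a loop on which $\eta$ always holds, while state 2 can only move to state 3, where $\eta$ fails. A surviving state always keeps at least one successor that also survives, which is why a label greater than 1 at the end implies an infinite path.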
[Figures 3 and 4: pseudocode of the verification procedures, including Verify_EU($E[\eta U \xi]$) and Verify_EG($EG\eta$) (Figure 4).]
Time complexity

It is clear that Verify_Not($\neg\eta$) and Verify_Or($\eta \vee \xi$) run in time proportional to $|S|$.
In the case of Verify_EX($EX\eta$), lines 6-7 are executed only $|R|$ times in total by adopting a data structure which assigns to each state a list of its predecessor states.
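The role of the predecessor lists in this bound can be seen in a short sketch. The Python function below is our own scalar illustration of an $EX$ check; the name `verify_ex` and the edge counter are hypothetical and are there only to show that each edge is visited at most once.

```python
def verify_ex(pred, label_eta):
    """Label each state with 1 if EX eta holds there, else 0.

    Also returns the number of edge visits, which never exceeds |R|.
    """
    n = len(pred)
    label = [0] * n
    edge_visits = 0
    for t in range(n):              # |S| iterations
        if label_eta[t] == 1:
            for s in pred[t]:       # each edge (s, t) is visited at most once
                label[s] = 1
                edge_visits += 1
    return label, edge_visits


# Edges: 0 -> 1, 1 -> 0, 1 -> 2, 2 -> 3, 3 -> 3, so |R| = 5.
pred = [[1], [0], [1], [2, 3]]
ex, visits = verify_ex(pred, label_eta=[0, 0, 1, 1])
```

The outer loop costs $|S|$ steps and the inner loop touches every edge at most once, which gives the $O(|S| + |R|)$ bound.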
Therefore, the time complexity is $O(|S| + |R|)$. In the case of Verify_EU($E[\eta U \xi]$), the for loop ... $E[\eta U \xi]$). Therefore, its time complexity is $O(|S| + |R|)$.
Since one of these procedures is executed for each sub-formula of a given CTL formula $\eta$, the time complexity of the vectorized model checking algorithm is $O((|S| + |R|)\,|\eta|)$, where $|\eta|$ denotes the length of $\eta$.
Fairness constraints
Fairness constraints can be handled efficiently by labeling the fair states, i.e. the states which have at least one fair path [7, 9]. This can be done by first obtaining the fair strongly connected components and then finding the states from which the fair strongly connected components can be reached. There is a well known linear time algorithm based on depth first search for obtaining the strongly connected components of a directed graph [1]. It seems difficult to vectorize this algorithm. We leave the vectorization of this part as a future problem and use the well known non-vectorized algorithm.
Once the strongly connected components have been obtained, it is easy to vectorize the procedure which decides whether they are fair or not. The reachability problem can also be vectorized in the same way as the model checking of $E[\eta U \xi]$.
The labeling of fair states should be done once before model checking starts, and it should also be done whenever an $EG$ operator is evaluated.
Vectorized Model Checker
Implementation
We have implemented the vectorized model checking algorithm on a vector processor, the FACOM VP400E, as a vectorized model checker. In the VP400E, three pipelined vector functional units, each of which consists of 4 pipelined units, can operate in parallel with a cycle time of 7 nanoseconds. Its peak performance is about 1714 MFLOPS. It has a main memory of 256 M bytes, of which 200 M bytes can be used as a user area. The input of the vectorized model checker is a Moore type deterministic sequential machine and the CTL formulas to be verified. The checker internally creates the corresponding Kripke structure from the given Moore type deterministic sequential machine.
Let $M = (X, Z, \Sigma, \delta, \lambda, s_0)$ be a Moore type deterministic sequential machine, where
• $X$ is a finite and nonempty set of binary input signals (atomic propositions);
• $Z$ is a finite and nonempty set of binary output signals (atomic propositions);
• $\Sigma$ is a finite and nonempty set of states;
• $\delta : 2^X \times \Sigma \to \Sigma$ is the state transition function;
• $\lambda : \Sigma \to 2^Z$ is the output function;
• $s_0$ is the initial state.
Then the corresponding Kripke structure $K = (S, R, I)$ is as follows:
$\ldots \cup \{ z \mid z \in \ldots$
Intuitively, there is a one-to-one correspondence between the transition edges of the Moore type deterministic sequential machine $M$ and the states of the corresponding Kripke structure $K$. The size of the Kripke structure becomes as follows: ... Figure 4). The average vector length for these parts is the average number of predecessor states of each state, that is, $|R|/|S|$ or $2^{|X|}$. For example, if the machine has 10 input signals, the average vector length is 1024, which is large enough to extract high performance from a vector processor.
In order to represent the transition relation $R$ of a Kripke structure, we use 2 integer arrays $Q$ and $R$. Since there is a one-to-one correspondence between the edges of a sequential machine and the states of the corresponding Kripke structure, it is possible to number the states of the Kripke structure so that each state $s_i$ has the predecessor states $s_{Q(i)}, \ldots, s_{R(i)}$. With this numbering method, we can use the efficient contiguous access (see Figure 1) to calculate the labels of predecessor states, which is the most time consuming part of the model checker. The sizes of the arrays $Q$ and $R$ are both $|S|$. These 2 integer arrays $Q$ and $R$ can easily be constructed directly from a given Moore type deterministic sequential machine, in time proportional to the size of the corresponding Kripke structure, by using one additional integer array of size $|S|$.
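One way such a numbering could be obtained is sketched below in Python. The sketch rests on assumptions of ours that go beyond the text: each Kripke state is taken to be a pair (machine state $\sigma$, input $x$), its predecessors are taken to be all pairs whose transition enters $\sigma$, and the states are numbered by a counting sort on the target of the transition. The function name `build_q_r`, the table `delta[sigma][x]` and the dictionary `number` are hypothetical.

```python
def build_q_r(delta):
    """delta[sigma][x] is the target machine state of the edge (sigma, x).

    Returns (number, Q, R): number maps each edge (sigma, x) to its Kripke
    state index i, and the predecessors of i are Q[i], Q[i] + 1, ..., R[i].
    """
    m = len(delta)                  # number of machine states
    k = len(delta[0])               # number of input combinations, 2 ** |X|

    # block_start[t] is the first index of the block of edges entering t.
    block_start = [0] * (m + 1)
    for sigma in range(m):
        for x in range(k):
            block_start[delta[sigma][x] + 1] += 1
    for t in range(m):
        block_start[t + 1] += block_start[t]

    # Number the edges so that edges with the same target are consecutive.
    next_free = block_start[:m]
    number = {}
    for sigma in range(m):
        for x in range(k):
            t = delta[sigma][x]
            number[(sigma, x)] = next_free[t]
            next_free[t] += 1

    # Under our assumption, the predecessors of (sigma, x) are exactly
    # the edges entering sigma, i.e. one contiguous block.
    n = m * k
    Q = [0] * n
    R = [0] * n
    for (sigma, x), i in number.items():
        Q[i] = block_start[sigma]
        R[i] = block_start[sigma + 1] - 1
    return number, Q, R


# Two machine states, one input signal (two input combinations).
delta = [[0, 1],    # state 0: input 0 -> 0, input 1 -> 1
         [0, 0]]    # state 1: input 0 -> 0, input 1 -> 0
number, Q, R = build_q_r(delta)
```

With this layout the labels of all predecessors of state $i$ can be read with one contiguous access from index `Q[i]` to `R[i]`. A machine state with no incoming edge yields an empty range (`Q[i] > R[i]`).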
As for $N_1$ and $N_2$ used in Verify_EU($E[\eta U \xi]$) and Verify_EG($EG\eta$), we also use integer arrays with corresponding index variables. Initialization of $N_i$ (i.e. $N_i := \emptyset$, $i = 1, 2$) can be done by just assigning 0 to the corresponding index variable. Insertion of a state $s$ into $N_i$ (i.e. $N_i := N_i \cup \{s\}$) can be done by just storing the state $s$ at the place in $N_i$ pointed to by the corresponding index variable and updating the index variable. As for copying data from $N_2$ to $N_1$ (i.e. $N_1 := N_2$), we just exchange the roles of $N_1$ and $N_2$ instead of actually copying the data. The maximum required size of each of the arrays $N_1$ and $N_2$ is $|S|$.
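The array-plus-index representation of $N_1$ and $N_2$ can be sketched as follows. The class name `StateSet` and its method names are hypothetical; the sketch assumes, as the labeling guarantees in the procedures above, that a state is never inserted twice into the same set.

```python
class StateSet:
    """A set of states stored in a preallocated array with an index variable."""

    def __init__(self, capacity):
        self.items = [0] * capacity   # integer array of size |S|
        self.count = 0                # index variable: number of stored states

    def clear(self):
        """N := empty set, done by resetting the index variable only."""
        self.count = 0

    def insert(self, s):
        """N := N united with {s}; the caller ensures s is not already present."""
        self.items[self.count] = s
        self.count += 1

    def members(self):
        return self.items[:self.count]


n1 = StateSet(8)
n2 = StateSet(8)
n2.insert(5)
n2.insert(3)

# N1 := N2 without copying: exchange the roles of the two arrays,
# then empty the array that now plays the role of N2.
n1, n2 = n2, n1
n2.clear()
```

All three operations take constant time, and the stored states occupy a dense prefix of the array, which keeps the loops over $N_1$ suitable for vector instructions.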
In order to store the truth value of each sub-formula at each state, we use 1 bit each. Therefore, the total memory required for this purpose is $|S| \times |\eta| / 8$ bytes. In addition, we use a working integer array of size $|S|$ to store a label for each state when checking temporal operators.
We also use $|S| \times 8$ words to handle fairness constraints. Note that 1 integer word consists of 4 bytes. Therefore, the total amount of required memory is $|S| \times 56 + |S| \times |\eta| / 8$ bytes.
For CTL formulas which contain 256 and 1024 operators, the vectorized model checker can manipulate Kripke structures of 2.3 million and 1.1 million states respectively with a main memory of 200 M bytes.
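These capacity figures can be checked with a few lines of arithmetic. The Python sketch below uses the memory formula given above; the function name `max_states` is hypothetical, and we assume that 200 M bytes means $200 \times 10^6$ bytes. The breakdown of the 56 bytes in the comment (14 words: $Q$, $R$, one additional array, $N_1$, $N_2$, the working array and 8 words for fairness) is our own reading of the preceding paragraphs.

```python
# Fixed cost per state: 14 integer words of 4 bytes each = 56 bytes
# (our reading: Q, R, one additional array, N1, N2, working array,
#  and 8 words for fairness constraints).
FIXED_BYTES_PER_STATE = 56


def max_states(memory_bytes, formula_length):
    """Largest |S| with |S| * 56 + |S| * formula_length / 8 <= memory_bytes.

    Computed in bits to stay in integer arithmetic.
    """
    bits_per_state = FIXED_BYTES_PER_STATE * 8 + formula_length
    return (memory_bytes * 8) // bits_per_state


USER_MEMORY = 200 * 10**6           # assumption: 1 M byte = 10**6 bytes

states_256 = max_states(USER_MEMORY, 256)     # 88 bytes per state
states_1024 = max_states(USER_MEMORY, 1024)   # 184 bytes per state
```

Under these assumptions the two results are 2,272,727 and 1,086,956 states, which agree with the stated 2.3 million and 1.1 million.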
Example
In order to measure the efficiency of the vectorized model checker, we applied it to the verification of two large Kripke structures (SR8 and SR9) corresponding to synchronous shift registers with parallel load and serial output. SR8 consists of 131,072 states and 67,108,864 edges (each state has 512 edges), and SR9 consists of 524,288 states and 536,870,912 edges (each state has 1024 edges) as shown in Table 1 . The CTL formulas which give their full specifications contain more than 300 different sub-formulas with no fairness constraints.
The benchmark results without fairness handling are shown in Table 2. Both SR8 and SR9 are verified to be true on the VP-400E in about 5 seconds and 29 seconds, using 13 MB and 52 MB of memory respectively. This means that our vectorized model checker evaluates about 7 to 8 states in a second, which implies that it will be able to verify a Kripke structure of 1 million states in a couple of minutes. Furthermore, the acceleration ratio obtained by our algorithm is around 26 to 39, which is an extremely high ratio for non-numeric application programs. We also verified SR8 by using the CTL model checker B1.0 developed by Clarke et al. [5] installed on a Sun-3/80. It took about 5,068 seconds to verify SR8. Considering that some benchmark tests show that the scalar unit of the FACOM VP-400E is about 6.7 times faster than the Sun-3/80, it would still take about 750 seconds to verify SR8 even if we installed the CTL model checker B1.0 on the VP-400E, because their algorithm cannot be vectorized due to recursive calls.
Conclusion
We proposed a vectorized model checking algorithm for CTL and implemented the vectorized model checker on a vector processor, the FACOM VP400E. Almost all parts of the algorithm are vectorized, except the part which obtains strongly connected components for handling fairness constraints.
It can handle Kripke structures of about 2.3 million or 1.1 million states derived from a Moore type deterministic sequential machine when the length of the given CTL formula is less than 256 or 1024 respectively.
We also presented examples which show that a CTL formula of 368 different subformulas is verified to be true in 29 seconds on a Kripke structure with 524,288 states and 536,870,912 edges.
There are several remaining future problems:
The first is to devise a vectorized algorithm for obtaining the strongly connected components of a directed graph, which needs further consideration.
The second is a vectorization of model checking which can handle sequential machines directly [2] . We think this is not so difficult and would like to implement it in the near future.
The third is a vectorization of model checking based on Binary Decision Diagrams (BDD) because the BDD is a powerful technique to reduce necessary amount of memory for model checking dramatically in some cases [4] .
Although our current version of the vectorized model checker takes a single Moore type sequential machine as input, it would be interesting to attempt the vectorization of the procedure which creates the direct product of several concurrent/parallel sequential machines. It would also be interesting to devise a vectorized algorithm for model checking without creating the direct product explicitly.
