Abstract
Introduction
State explosion poses a serious problem in sequential circuit verification, irrespective of the method followed [1], [6]. Designers tackle this problem by partitioning the circuit into a control part and a data path: all the data storage registers and the data transformation / status detection circuits are placed in the data path (DP), and the sequential aspects of the behaviour are handled by the control part (CP) (Figure 1).
A verifier designed to exploit this existing partition in the circuit structure can avoid the state explosion problem. In such a scheme, the verifier can be organized as two broad modules: (i) a DP Verifier and (ii) a CP Verifier. The DP verifier, like a conventional one, produces a "yes/no" correctness answer depending upon whether or not the data path structure supports the register transfer operations and the status checking operations contained in the RTL behavioural description. In addition, if the answer is "yes", it extracts a behavioural specification of the control part involving the control output lines and the status input lines of the control part. The CP verifier, in turn, takes this behaviour produced by the DP verifier as one input and the structural interconnection description of the CP components, their behaviour and the initial state as three other inputs, and produces the final "yes/no" correctness answer. Figure 2 gives the schematic of a CP-DP partition based verifier.

It may be noted that there is a one-to-one correspondence between the RTL behaviour and the control level behaviour: the control flow schema of the two descriptions remains the same. The only differences are that the register status checking phrases are changed to bit level status checking conditional expressions (ST Analysis [8]) and the register assignment statements of the former are changed to control signal assignment statements (RT Analysis). Since the DP structure does not have any temporal characteristic, the DP verification mechanism involves analysis of only the spatial interconnection of the components. The CP verification, on the other hand, consists of temporal reasoning in a suitably chosen temporal logic framework [2]. Figure 3 depicts the two modules of the DP Verifier, the RT Analyzer and the ST Analyzer. This paper describes the RT analysis task of DP verification.
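The two-stage organization described above can be sketched as a small driver. This is only an illustrative outline: the names `DPResult`, `verify`, `dp_verifier` and `cp_verifier` are hypothetical and do not come from the paper, and the two verifiers are passed in as opaque callables.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class DPResult:
    ok: bool                     # "yes/no" answer of the DP verifier
    cp_behaviour: Optional[Any]  # control-level behaviour, extracted only when ok


def verify(rtl_behaviour: Any,
           dp_structure: Any,
           cp_structure: Any,
           cp_component_behaviours: Any,
           cp_initial_state: Any,
           dp_verifier: Callable[[Any, Any], DPResult],
           cp_verifier: Callable[[Any, Any, Any, Any], bool]) -> bool:
    """Run the DP verifier first; only on "yes" hand its output to the CP verifier."""
    dp = dp_verifier(rtl_behaviour, dp_structure)
    if not dp.ok:
        return False
    # The CP verifier receives the extracted behaviour plus three other inputs.
    return cp_verifier(dp.cp_behaviour, cp_structure,
                       cp_component_behaviours, cp_initial_state)


# Stub verifiers, used only to show the data flow.
accepted = verify("rtl", "dp", "cp", "behaviours", "s0",
                  lambda rtl, dp: DPResult(True, "control-level behaviour"),
                  lambda beh, struct, comp, init: beh is not None)
rejected = verify("rtl", "dp", "cp", "behaviours", "s0",
                  lambda rtl, dp: DPResult(False, None),
                  lambda beh, struct, comp, init: True)
```

With the stubs, `accepted` is `True` and `rejected` is `False`, because a "no" from the DP stage ends the verification before the CP stage runs.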
There have been attempts to exploit the control part-data path partition in sequential circuits during verification. Word Level Model Checking using Multi-Terminal BDDs, BDD arrays, BMDs, etc. for data path representation has been reported [10]. Theorem Proving approaches with predicate based data path representations have also been reported [4, 5, 7, 9, 10]. Various issues pertaining to automation and completeness, however, have not been elaborately described. In many cases, the inputs have been so encoded as to keep the primitive inference steps simpler. In the present work, the emphasis is precisely on identifying the issues in reaching a complete analyzer.

Section 2 gives a logical basis of a CP-DP partition based verifier. Section 3 describes the task of RT analysis in data path verification. The termination, soundness and completeness issues are formally established. A complexity analysis of the basic algorithm, along with an extension for concurrent RT operation analysis, is also included. Section 4 reports experimental results on verification of some typical data paths against some single and concurrent RT operations. The paper concludes in Section 5 by identifying some limitations of the method.
Logical Basis of CP-DP Partition Based Verification
Let $I$ be the implementation of the entire circuit, $S$ be the behavioural specification and $I_1$ and $I_2$ be the implementation of the data path and the control part, respectively.
RT-Analysis
The analysis method described in this section uses the following notation.
Notation
A: A control signal assertion pattern of the form f< 
Phases of RT Analysis
The DP structure and the DP component behaviours together produce a unary function $R_s : M \to A$. For RT analysis, $R_s$ captures the DP structure in its entirety. The RT analysis problem consists of the following two subproblems: (i) all possible sequences of micro-operations which accomplish $p_b$ are to be selected from $M$, and (ii) corresponding to each sequence selected in (i), the control signals that are to be asserted have to be constructed using the function $R_s$. The second task is relatively simpler and is described first; then the first one is presented.
Construction of the Control Signal Assertion List for a Given Micro-operation Sequence
If, for an RT operation $p_b$, the concurrent micro-operations that are found to be necessary are $\mu_1$, $\mu_2$ and $\mu_3$, then the control signal assertion pattern $A$ for $p_b$ is found from $R_s(\mu_1)$, $R_s(\mu_2)$ and $R_s(\mu_3)$ by an associative binary operation called the superposition operation. The operation is defined as follows.

The process terminates here and the sequence of micro-operations needed to route the data from the source(s) to the destination is available in the reverse of the order in which they have been used.
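Since the formal definition of superposition is not reproduced here, the following is only a sketch under stated assumptions: an assertion pattern is encoded as a mapping from control signal name to asserted value, and superposition merges two patterns, reporting a conflict (as `None`) when the same signal would need two different values. The signal names, the micro-operation names and the table `R_s` below are hypothetical.

```python
from functools import reduce
from typing import Dict, Optional

Pattern = Dict[str, int]  # assumed encoding: control signal -> asserted value


def superpose(a: Optional[Pattern], b: Optional[Pattern]) -> Optional[Pattern]:
    """Merge two assertion patterns; None marks (and propagates) a conflict."""
    if a is None or b is None:
        return None
    merged = dict(a)
    for signal, value in b.items():
        if merged.get(signal, value) != value:
            return None  # same control signal asked to take two values
        merged[signal] = value
    return merged


# Hypothetical R_s: micro-operation name -> control signal assertion pattern.
R_s = {
    "mu1": {"r1_out_en": 1},
    "mu2": {"alu_op": 0, "bus_sel": 1},
    "mu3": {"r3_load": 1, "bus_sel": 1},
}

# Because the merge is associative, the patterns can be folded in any grouping.
pattern = reduce(superpose, (R_s[m] for m in ("mu1", "mu2", "mu3")))
conflict = superpose({"bus_sel": 1}, {"bus_sel": 0})
```

Under this encoding, `pattern` collects all four control signals, and `conflict` is `None` because the two patterns disagree on `bus_sel`.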
Definition 1 Superposition of Assertion
For each register transfer expression, there may be more than one applicable micro-operation. At each step, the results of rewriting using these micro-operations have to be collected for subsequent rewritings. Also, the set of signals replaced and the sequence of micro-operations have to be extended. A tree is the most obvious structure to depict the progress; it is referred to as the rewrite-tree. Each node of the tree (data type NODE) is associated with
Correctness of the function analyze_rt
Theorem 1 (Termination) The function always terminates.
Proof:
The function always constructs a finite rewrite tree because each node has a finite number of children, as many as the drivers of the signal $s$ being replaced, and each branch has a maximum depth equal to the number of non-register signals in the data path. We have the following corollary of the above theorem, whose proof is obvious.
Corollary 1
If the set of sequences of micro-operations returned by the "analyze_rt" function is empty ($\emptyset$), then the RT-operation cannot be accomplished in the given data path.
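The backward rewriting and the empty-result case can be illustrated with a simplified sketch. It is not the paper's `analyze_rt`. It assumes RT-expressions are either a signal name or a tuple `(op, left, right)`, that micro-operations are triples `(name, lhs signal, rhs expression)`, and that a signal reappearing on a branch indicates a loop. The function name `analyze_rt_sketch` and the example data path are hypothetical.

```python
from typing import List, Set, Tuple, Union

Expr = Union[str, tuple]         # a signal name, or (op, left, right)
MicroOp = Tuple[str, str, Expr]  # (name, lhs signal, rhs expression)


def signals(e: Expr) -> List[str]:
    """Signals occurring in e, left to right."""
    return [e] if isinstance(e, str) else signals(e[1]) + signals(e[2])


def substitute(e: Expr, s: str, by: Expr) -> Expr:
    """Rewrite every occurrence of signal s in e."""
    if isinstance(e, str):
        return by if e == s else e
    return (e[0], substitute(e[1], s, by), substitute(e[2], s, by))


def analyze_rt_sketch(dest: str, goal: Expr, micro_ops: List[MicroOp],
                      registers: Set[str]) -> List[List[str]]:
    sources = set(signals(goal))
    found: List[List[str]] = []

    def expand(p_c, s, replaced, used):
        for name, lhs, rhs in micro_ops:
            if lhs != s:
                continue
            # Heuristic from the text: no register on the rhs except a source.
            if any(x in registers and x not in sources for x in signals(rhs)):
                continue
            visit(substitute(p_c, s, rhs), replaced | {s}, used + [name])

    def visit(p_c, replaced, used):
        if p_c == goal:
            found.append(used[::-1])  # report in source-to-destination order
            return
        pending = [x for x in signals(p_c) if x not in registers]
        if not pending or pending[0] in replaced:
            return  # dead end, or a loop in the data path
        expand(p_c, pending[0], replaced, used)

    expand(dest, dest, frozenset(), [])  # the root rewrites the destination
    return found


# Hypothetical data path: two ALU input buses b1, b2 and an ALU output "out".
MICRO_OPS = [("m1", "b1", "r1"), ("m2", "b2", "r2"),
             ("m3", "out", ("add", "b1", "b2")),
             ("m4", "r1", "out"), ("m5", "b1", "r2")]
REGISTERS = {"r1", "r2", "r3"}

possible = analyze_rt_sketch("r1", ("add", "r1", "r2"), MICRO_OPS, REGISTERS)
impossible = analyze_rt_sketch("r1", ("add", "r1", "r3"), MICRO_OPS, REGISTERS)
```

Each branch stops after every non-register signal has been replaced at most once, which mirrors the depth bound used in the termination argument. In this example `impossible` is the empty list, since no micro-operation drives a bus from `r3`.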
Complexity of RT-analysis
For each rewrite step, that is, for each node in the rewrite tree, there are three major tasks to be performed: (i) choosing a suitable signal $s$ for rewriting, (ii) choosing the subset $M_s$ of micro-operations with $s$ on the lhs and no register signal on the rhs other than the source register(s) of $p_b$, and (iii) rewriting all the occurrences of signal $s$ in $p_c$. The first step takes constant time because the assumption that there can be only one ALU in each path from the source(s) to a destination permits at most two signals on the lhs of any RT-expression. For each of them, it has to be checked whether it is a non-register signal by accessing the signal table. The table can be probed directly because the RT-operations (expressions) and the micro-operations are all encoded in terms of index values into the signal table. The complexity of the second step is as follows. Suppose the micro-operation table $M$ is hashed on signal name. Each signal appears at the rhs of only a certain number of micro-operations, the maximum of which can always be specified. Therefore, finding $M_s$ from a hashed $M$ takes constant time. When table $M$ is not hashed but sorted on the lhs-signal of the micro-operations, finding $M_s$ takes $O(\log_2 \|M\|)$ time. Thus, the complexity of each rewriting is $O(1)$ or $O(\log_2 \|M\|)$, depending upon whether $M$ is hashed or sorted on signal names, respectively.
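The two ways of locating the micro-operations driven onto a signal can be sketched as follows. The table contents and the names `m_s_hashed` and `m_s_sorted` are hypothetical, and the further filtering on rhs registers mentioned in step (ii) is left out.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical micro-operation table: (name, lhs signal, rhs description).
MICRO_OPS = [("m1", "b1", "r1"), ("m2", "b2", "r2"), ("m3", "out", "b1+b2"),
             ("m4", "r1", "out"), ("m5", "b1", "r2")]

# Hashed organization: one dictionary probe per lookup.
by_lhs = defaultdict(list)
for op in MICRO_OPS:
    by_lhs[op[1]].append(op)


def m_s_hashed(s):
    """Micro-operations with s on the lhs, found by hashing on the signal name."""
    return by_lhs.get(s, [])


# Sorted organization: binary search over the lhs signal names.
sorted_ops = sorted(MICRO_OPS, key=lambda op: op[1])  # stable sort
lhs_keys = [op[1] for op in sorted_ops]


def m_s_sorted(s):
    """Same subset, found with two binary searches on the sorted table."""
    lo = bisect_left(lhs_keys, s)
    hi = bisect_right(lhs_keys, s)
    return sorted_ops[lo:hi]
```

Both functions return the same subset for any signal; only the lookup cost differs, which is the distinction drawn in the paragraph above.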
The number of nodes of the rewrite tree, each of which involves the above complexity, depends upon the operation involved in the RT-operation $p_b$. Obviously, it is the maximum for a binary (ALU) operation. In any case, it is always constant (for each ALU). For example, for a binary operation, the root node has as many successors as there are buses. Some of these are (ALU) input buses and the remaining are the output buses. (There can be more than one ALU in the data path.) Each node corresponding to an output bus will have one successor corresponding to the ALU operation, resulting in an RT-expression involving the input buses. The input buses do not expand any further because they can only be driven by registers, and the heuristic prevents such micro-operations from being used except for one of the source registers. The two ALU input buses, therefore, can be rewritten by two source registers, accounting for two linear subtrees of depth two. Thus the rewrite tree for an RT-operation with a binary ALU operation will have 8 nodes. For $n$ ALUs in the data path, $8n$ nodes are generated. The complexity figures for RT-analysis can be summarized as follows.
For a data path with $n$ ALUs and the micro-operation table hashed, it is $O(n)$. For a data path with $n$ ALUs and a sorted micro-operation table, it is $O(n \cdot \log_2 \|M\|)$.
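These totals follow by multiplying the node count by the per-node cost. Taking the figure of 8 rewrite-tree nodes per ALU and the per-node costs stated above, and writing $T$ for the total cost (a symbol introduced here only for illustration):

$$
\begin{aligned}
T_{\text{hashed}}(n) &= 8n \cdot O(1) = O(n), \\
T_{\text{sorted}}(n) &= 8n \cdot O(\log_2 \|M\|) = O(n \cdot \log_2 \|M\|).
\end{aligned}
$$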
Analysis of Concurrent RT-operations
Let $P = \{p_1, p_2, \ldots, p_n\}$ be the given set of concurrent RT-operations to be analyzed to find all possible sequences of micro-operations that accomplish them. One straightforward approach for analysis could be to invoke "analyze_rt" with each member $p_i$ of $P$. Let $S_i$ be the set of sequences of micro-operations that can accomplish $p_i$. Let $\sigma_{ij}$ be the $j$-th sequence in the set $S_i$. Let $T$ represent the $n$-ary Cartesian product of the sets $S_i$; that is, $T = \times_{1 \le i \le n} \{S_i\}$.
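The construction of $T$ can be sketched with `itertools.product`. The sequence sets below are hypothetical placeholders, and the sketch only builds the product; it does not show any further processing of $T$.

```python
from itertools import product

# Hypothetical S_1 and S_2: sets of micro-operation sequences for p_1 and p_2.
S = [
    [("m2", "m1", "m3", "m4")],     # S_1: one way to accomplish p_1
    [("m6", "m7"), ("m8", "m7")],   # S_2: two ways to accomplish p_2
]

# T is the n-ary Cartesian product: one sequence chosen from every S_i.
T = list(product(*S))
```

The size of `T` is the product of the sizes of the individual sets, so it is empty as soon as any single RT-operation has no accomplishing sequence.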
Experiments
The DP-structures given in Figure 4 are analyzed for both single RT-operations and concurrent RT-operations; the performances of the analysis functions are given in Table 1 and Table 2. 'P/N' indicates possible/not possible; $N_1$ and $N_2$ stand for the number of rewrite-tree nodes and the number of concurrent tree nodes generated, respectively. Real-life circuits such as the bit-wise shift-and-add multiplier, Booth's multiplier, a divider and the Tamarack processor [3] have also been analyzed for all RT-operations in their RTL behavioural descriptions. Some data paths that are faulty or inadequate with respect to the given RT-operations have been tested. It has been found that in many cases the analysis points to faults such as "intermediary register(s) in the data path" (indicating non-realizability of the RT operation in one time step), "loop detected in the data path", etc.
Conclusion
A CP-DP partition based verification scheme has been proposed and validated. The DP verification problem has been discussed in detail. The task comprises two major subtasks, namely the status checking analysis (ST Analysis) and the register transfer operation analysis (RT Analysis). An RT-analysis algorithm has been presented, and its termination, soundness and completeness have been treated rigorously. The complexity of the algorithm has been found to be at worst $O(n)$ or $O(n \cdot \log_2 \|M\|)$, where $n$ is the number of ALUs and $M$ is the set of micro-operations supported by the data path. A simple extension for concurrent RT-operations has been discussed. The method has been tested on two 1-bus, two 2-bus and a 3-bus data path architectures for both valid and non-valid RT-operations. The method has also been successfully applied to arithmetic circuits such as the bit-wise shift-and-add multiplier, Booth's multiplier, divider, etc. In fact, the data paths of these problems, having no bus, have been found to be simpler than those of Figure 4.

Although the TAMARACK CPU [3] has been verified using this method, for a full-fledged CPU the instruction set specification involves RT operations which may need more than one time step. More sophisticated analysis is, therefore, needed for that purpose. The present analyzer, enhanced in this direction, is also likely to be useful for synthesis of behaviours higher than RTL. Again, a CP-DP partition based approach is likely to face hurdles for many circuits where the partition is absent or not easily discernible. A pipelined architecture is a case in point. It will be interesting to examine whether the method can be enhanced for such cases.

[Figure 4 panels: ckt1-1bus, ckt2-1bus, ckt1-2bus, ckt2-2bus, ckt-3bus; components labelled r1, r2, r3, r4, a1, a2 and res]
Figure 4. Example Data Paths for RT Analysis

| Ckt name | RT-op fed for analysis | P/N | $N_1$ |
|---|---|---|---|
| ckt1-1bus | r1 ← a1 + a2 | P | 2 |
| | r1 ← r1 + r2 | N | 3 |
| ckt2-1bus | res ← r1 + a1 | P | 2 |
| | r1 ← r1 + r2 | N | 3 |
| ckt1-2bus | r1 ← r2 + a1 | P | 3 |
| | r1 ← r1 + r2 | N | 8 |
| ckt2-2bus | r1 ← r1 + a1 | P | 3 |
| | r1 ← r1 + r2 | N | 4 |
| ckt-3bus | r1 ← r1 + r2 | P | 8 |
| | r1 ← r1 + r1 | P | 4 |

| Concurrent RT-ops fed for analysis | P/N | $N_2$ |
|---|---|---|
| {r1 ← r1 + r2, r3 ← r2, r4 ← r1} | P | 30 |
| {r1 ← r1 + r2, r4 ← r1 + r2} | P | 16 |
| {r1 ← r1 + r2, r4 ← r1 + r3} | N | 14 |

Table 2. Performance of "analyze_concur_rt"
