Abstract-This paper introduces the concept of hierarchical cellular automata (HCA). The theory of HCA is developed over the Galois extension field GF(2^{p^{q^{r...}}}), where each cell of the CA can store and process a symbol in that extension field. The hierarchical field structure of GF(2^{p^{q^{r...}}}) is employed for the design of an HCA-based test pattern generator (HCATPG). The HCATPG is ideally suited for testing very large scale integration circuits specified in hierarchical structural description. Experimental results establish that the HCATPG achieves higher fault coverage than could be achieved with any other test structure. The concept of percentile improvement in fault coverage is introduced to give a realistic assessment of the fault coverage achieved with the proposed HCATPG.
I. INTRODUCTION
With the growing complexity of present-day very large scale integration (VLSI) circuits, the cost of testers (ATE) has become a significant part of the overall product cost. Built-in self-test (BIST) has emerged as an alternative that reduces the test cost of VLSI designs. A typical BIST structure includes an on-chip test pattern generator (TPG) and a signature analyzer (SA). Linear feedback shift registers (LFSRs) [1] and cellular automata (CA) [2] are extensively used as test pattern generators. A wide variety of such structures has been proposed in [3]-[5]. Alternative schemes [6], [7] implemented with arithmetic pattern generators/compactors have also been reported.
BIST schemes aim to meet the basic requirements of high fault coverage with minimum test application time and low overhead. However, the desired fault coverage for arbitrary random logic is difficult to achieve with the BIST methodologies proposed so far. The GLFSR [8] and phase-shift LFSR-based [9] pseudorandom pattern generators were proposed to improve the fault efficiency. The basic limitation of the conventional on-chip TPG structures is that they are designed without due consideration of the structure of the given circuit under test (CUT).
BIST schemes targeting analysis of logic circuits from their behavioral and RTL descriptions are reported in [10]-[13]. However, these schemes impose a number of design restrictions and are suitable only for a specific class of logic circuits.
Against this background, we report an efficient BIST methodology that employs a hierarchical cellular automata (HCA)-based test pattern generator (HCATPG). The class of CA dealt with in [14] is defined over the Galois field GF(2), which can handle elements from the set {0, 1}. By contrast, each cell in an HCA, defined over the Galois extension field GF(2^{p^{q^{r...}}}), can store a symbol belonging to {0, 1, 2, 3, ..., (2^{pqr...} - 1)} [...] blocks of the CUT. The structural analysis of the subsets of primary inputs (PIs) input to different subcircuits enables the HCATPG to achieve higher fault coverage for a CUT. The TPG folding scheme [15] can be employed to reduce the overhead of the HCATPG. The design of the HCATPG for a given CUT is reported in Section IV, after an introduction to extension field preliminaries in Section II and to HCA in Section III. The concept of percentile improvement in fault coverage (PIFC) is introduced in Section IV to give a realistic assessment of the proposed design.
II. EXTENSION FIELD PRELIMINARIES
There exists an element α in the extension field GF(2^p) that generates all the nonzero elements (α, α^2, ..., α^{2^p - 1}) of the field. α is the generator, and the irreducible polynomial of which α is a root is called the generator polynomial A(x) (coefficients of A(x) ∈ GF(2)) [16].
We choose primitive polynomials as the generator polynomial so that all the 2^p elements of the extension field are distinct.
The element α can be represented by a p × p matrix M having its elements 0/1 ∈ GF(2) [...]; vector (10) = 2, the last column of matrix M. The other elements of the field are computed from [...]. The star (multiplication) and plus (addition) tables are shown in Table I. The first rows and the first columns of the tables represent the GF(2^2) elements in decimal notation.
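As an illustration of how plus and star tables of this kind can be produced, the following minimal sketch builds them for GF(2^2) with the generator polynomial A(x) = x^2 + x + 1 (the only primitive polynomial of degree 2 over GF(2)). The function names `gf_mul` and `gf_tables`, and the encoding of a field element as the integer whose bits are its polynomial coefficients, are assumptions made for this sketch and are not taken from the paper.

```python
def gf_mul(a, b, poly, p):
    """Multiply a and b in GF(2^p); poly is the generator polynomial as a bit mask."""
    result = 0
    while b:
        if b & 1:              # current bit of b set: add (XOR) the shifted a
            result ^= a
        b >>= 1
        a <<= 1                # multiply a by x
        if a & (1 << p):       # degree reached p: reduce modulo the polynomial
            a ^= poly
    return result


def gf_tables(poly, p):
    """Return the (plus, star) tables of GF(2^p), indexed by decimal element value."""
    size = 1 << p
    plus = [[a ^ b for b in range(size)] for a in range(size)]
    star = [[gf_mul(a, b, poly, p) for b in range(size)] for a in range(size)]
    return plus, star


# GF(2^2) with A(x) = x^2 + x + 1, written as the bit mask 0b111.
plus, star = gf_tables(0b111, 2)

# Successive powers of the generator (decimal 2) visit every nonzero element.
alpha, x, powers = 2, 1, []
for _ in range(3):
    x = gf_mul(x, alpha, 0b111, 2)
    powers.append(x)

for row in star:
    print(row)
print(powers)
```

In this encoding addition is a bitwise XOR, so only multiplication needs the generator polynomial.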
Extension of Extension Field:
The extension field GF(2^{pq}) over GF(2), where p and q are both positive integers, can be represented as GF(2^{p^q}), that is, the extension of the extension field GF(2^p), with the same number of 2^{pq} elements from the set {0, 1, 2, ..., (2^{pq} - 1)} and isomorphic field operators. Similarly, the extension field GF(2^{pqr...}) can be viewed as the extension field GF(2^{p^{q^{r...}}}). For example, the Galois extension field GF(2^6) can also be viewed as the extension of the extension field GF(2^2) or GF(2^3), that is, as GF(2^{2^3}) or GF(2^{3^2}).
In GF(2^{p^q}), the coefficients of the generator polynomial B(x) belong to GF(2^p). The generator of GF(2^{p^q}) can be represented by a q × q matrix with elements in GF(2^p). α is the generator of GF(2^p). An element of GF(2^p) in turn can be represented by a p × p binary matrix.
In effect, the elements of the extension field GF(2^{p^q}) are hierarchically partitioned.
Example 2: Fig. 2 illustrates the hierarchical partitioning of field elements in an extension field. The structure of a generator element of GF(2^6) is shown in Fig. 2(a). GF(2^6) can also be viewed as GF(2^{2^3}) or GF(2^{3^2}). The hierarchical structural implementations of the elements belonging to these two fields are shown in Fig. 2 [...] (Table II) [...] isomorphic. However, for the current engineering application (dealing with the hierarchical structure of a CUT), the hierarchical partitioning of field elements has distinct advantages. This property of the hierarchical field structure is employed for the design of the HCA introduced in the next section.
III. HIERARCHICAL CELLULAR AUTOMATA
The theory of GF(2) CA has been dealt with in [14]. A cell of this class of CA stores and processes an element 0/1 ∈ GF(2), and hence it is referred to as GF(2) CA. Fig. 3 shows the general structure of an n-cell hierarchical GF(2^{p^q}) CA. For ease of presentation of the basic concept of HCATPG, in the rest of the paper we deal with only two levels of hierarchy. Generalization to additional levels of hierarchy follows directly. With two levels of hierarchy, each cell of an n-cell GF(2^{p^q}) HCA consists of q subcells, each having p FFs (Fig. 4). The interconnections among the cells of the HCA are weighted (Fig. 3) in the sense that, to arrive at the next state q_i(t + 1) of the ith cell, the present states of the (i - 1)th, ith, and (i + 1)th cells are multiplied by w_{i-1}, w_i, and w_{i+1}, respectively, and then added. [...]
Proof: If the HCA under the operation with T forms a cyclic group, then there should exist an integer u such that
T^u = I (1)
where I is the identity matrix, and for any state f(x)
f_u(x) = T^u · f(x) = f(x). (2)
As per (1), the necessary condition for a cyclic group is the existence of [...]. The sufficient condition is proved with the following property of a finite group of nonsingular square matrices. The group that we are interested in is the commutative subgroup generated by the powers of T. This is a cyclic subgroup with identity element I. The group is also Abelian. The order of this group is at most n = 2^{(pq)L}, where L is the order of T. This is true because T is only a transformation matrix which maps one vector of length L (with elements ∈ GF(2^{p^q})) to another one of the same length, and the number of such vectors can at most be (2^{pq})^L = 2^{(pq)L}.
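The weighted next-state rule and the return-to-start condition T^u · f = f of (2) can be tried out on a toy example. The sketch below uses a three-cell CA whose cells hold GF(2^2) symbols (a single level of hierarchy, for brevity); the matrix `T` and the function names are hypothetical choices for illustration and are not the design of Fig. 3 or Fig. 7.

```python
def gf4_mul(a, b):
    """Multiply in GF(2^2) with generator polynomial x^2 + x + 1."""
    r = 0
    for _ in range(2):         # b has two bits
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b100:          # reduce modulo x^2 + x + 1
            a ^= 0b111
    return r


def hca_step(state, T):
    """q_i(t+1) = sum over j of T[i][j] * q_j(t), with arithmetic in GF(2^2)."""
    nxt = []
    for row in T:
        acc = 0
        for w, q in zip(row, state):
            acc ^= gf4_mul(w, q)   # addition in GF(2^2) is XOR
        nxt.append(acc)
    return nxt


def cycle_length(start, T):
    """Smallest u > 0 with T^u * start == start, or None if start is never revisited."""
    seen = set()
    cur = tuple(start)
    steps = 0
    while cur not in seen:
        seen.add(cur)
        cur = tuple(hca_step(list(cur), T))
        steps += 1
        if cur == tuple(start):
            return steps
    return None


# Hypothetical three-neighborhood (tridiagonal) weight matrix over GF(2^2).
T = [[2, 1, 0],
     [1, 2, 1],
     [0, 1, 2]]

first = hca_step([1, 0, 0], T)
u = cycle_length([1, 0, 0], T)
print(first, u)
```

The determinant of this particular `T` over GF(2^2) is nonzero, so every state lies on a cycle and `cycle_length` returns an integer rather than `None`.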
IV. DESIGN OF HCATPG
The hierarchical structural description of the CUT is assumed to be available for this design methodology. That is, the CUT is described as a network of modules/blocks, each of which in turn has a network of submodules/subblocks. The proposed scheme extracts the clustering of primary inputs (PIs) to different hierarchical modules and submodules of the CUT. We next introduce the concept of dependent PI clusters followed by the basic guiding principle of the design.
Definition 2: If two PI clusters enter into the same circuit module/submodule, the dependency is said to exist between them and they are referred to as dependent clusters.
Guiding Principle of the Design: Rather than considering PIs to the circuit at bit level, we can look at a subset of PIs as a cluster/subcluster of inputs to different hierarchical modules of the CUT and decide on the values of the extension field parameters p, q, r, ... of the HCA to be employed in the design of the TPG. Further, the interconnection among the HCATPG cells exploits the dependencies of PI clusters to different circuit blocks. The hierarchical structure of GF(2^{p^{q^{r...}}}) field elements (Fig. 2) supports the desired tuning of the HCATPG to realize the objective of enhanced fault coverage.
A. Tuning HCATPG for a CUT
The basic principle of the design is illustrated with the help of the example CUT of Fig. 6. It can be seen that the PIs of the CUT are grouped into four nine-bit buses (A, B, C, and E). It is logical to assume that the 36 PIs of the CUT are not all independent so far as their functionality is concerned. The nine PIs of type A, input to module M1, are functionally similar and can be considered to form a cluster of PIs rather than nine independent PI lines, each carrying a single bit. A similar consideration is valid for B, C, and E. The guiding motivation is that, instead of feeding the PIs of a nine-bit bus cluster from nine cells of a 36-cell GF(2) CA, we propose to feed the cluster from one cell of a four-cell GF(2^9) CA in a single-level hierarchical implementation.
Next, on further analysis of the module M1 of Fig. 6, it can be observed that the nine members of cluster A [E] are divided into three groups (subclusters), each group containing three PIs and fed into three different submodules (MUX1, MUX2, and MUX3). Similar cases can be observed for the other blocks (M2, M3, ...). It is logical to assume that, within a cluster, the intrasubcluster PIs are functionally closer than the intersubcluster PIs. The functional dependencies of the PIs in a subcluster are to be reflected in the design of the TPG cells. This problem for the example circuit of Fig. 6 can be solved if the GF(2^9) CA cell is viewed hierarchically as a GF(2^{p^q}) HCA cell (p = 3 and q = 3 for the present example). It is observed that the dependent PI clusters/subclusters closely interact among themselves to detect the faults of the corresponding circuit block/subblock. Therefore, the CA cells feeding the dependent clusters/subclusters should have interdependence and so should be interconnected. That is, the tuning of the HCATPG for a given CUT boils down to the following steps. [...] An example four-cell dependency matrix is noted below, where the first cell, as per the first row, has dependency on the second cell and itself. The second cell is dependent on the first and third, and so on. [...]
B. Selection of Field Parameters and n
This step partitions the primary inputs to form the clusters and subclusters input to different circuit blocks. The cardinalities of the clusters and subclusters are identified. The most frequent cardinality of the clusters (c_1, c_2, ..., c_k) is chosen as the value of (p × q), whereas the most frequent cardinality of subclusters within a cluster is taken as the value of parameter p. The algorithm to select the extension field parameters and n for two levels of hierarchy is as follows.
Step 1. [...]
Step 2. Fix (p × q) as the most frequent cardinality of PI clusters.
Step 3. Identify subclusters and their cardinalities for each cluster c_i.
Step 4. The value of p for a cluster c_i is the most frequent cardinality of the subclusters within c_i. Obtain the value of q from Step 2.
Step 5. [...]
Step 6. Return p, q, and n.
This process can be repeated for the design of an HCATPG with more than two levels of hierarchy. The number of hierarchical levels will be the same as that of the structural hierarchy of the CUT. The HCATPG for testing the circuit of Fig. 6 is a four-cell GF(2^{3^3}) HCA with two levels of hierarchy. The cells are marked as 0, 1, 2, and 3.
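The parameter selection described above can be sketched as follows for two levels of hierarchy. This is only an illustrative reading of the recoverable steps: the function name `select_field_parameters`, the input format, the pooling of subcluster sizes across all clusters, and in particular the rule n = ceil(total PIs / (p × q)) are assumptions of this sketch, since the corresponding steps are not available here.

```python
from collections import Counter


def select_field_parameters(clusters):
    """clusters: one entry per PI cluster, each a list of its subcluster sizes."""
    # Step 2: (p x q) is the most frequent cluster cardinality.
    cluster_sizes = [sum(subs) for subs in clusters]
    pq = Counter(cluster_sizes).most_common(1)[0][0]

    # Steps 3-4: p is the most frequent subcluster cardinality
    # (assumption: counted over all clusters together).
    sub_sizes = [s for subs in clusters for s in subs]
    p = Counter(sub_sizes).most_common(1)[0][0]

    # q follows from p x q (rounded up if p does not divide it).
    q = -(-pq // p)

    # Assumption: enough cells of p*q bits each to drive every PI.
    total_pis = sum(cluster_sizes)
    n = -(-total_pis // (p * q))
    return p, q, n


# Clusters of the Fig. 6 example: four nine-bit buses, each split into three 3-bit groups.
params = select_field_parameters([[3, 3, 3]] * 4)
print(params)
```

For the Fig. 6 example this yields p = 3, q = 3, and n = 4, matching the four-cell HCA with p = 3 and q = 3 described in the text.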
Once the clusters of PIs get identified, the next step is to investigate their interrelationship, that is, the formation of the dependency matrix for the HCATPG.
C. Identification of Dependency Matrix
The existence of dependencies among the PI clusters is exploited to design the dependency matrix. The algorithmic steps to generate the dependency matrix D are given as follows. [...]
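One way to derive a dependency matrix directly from Definition 2 is sketched below: two clusters are marked dependent whenever they enter the same module or submodule. The function name `dependency_matrix`, the input format, the `self_dependent` argument, and the module-to-cluster assignment in the example are all hypothetical; they are not the connectivity of Fig. 6 nor the paper's own algorithmic steps.

```python
def dependency_matrix(n_clusters, module_inputs, self_dependent=()):
    """Build a binary matrix D with D[i][j] = 1 if clusters i and j are dependent.

    module_inputs: maps a module name to the set of cluster indices entering it.
    self_dependent: cluster indices whose diagonal entry should be set
                    (kept as an explicit design choice in this sketch).
    """
    D = [[0] * n_clusters for _ in range(n_clusters)]
    for clusters in module_inputs.values():
        for a in clusters:
            for b in clusters:
                if a != b:         # Definition 2: both enter the same module
                    D[a][b] = 1
    for i in self_dependent:
        D[i][i] = 1
    return D


# Hypothetical CUT: three modules, each fed by two neighboring clusters.
modules = {"M1": {0, 1}, "M2": {1, 2}, "M3": {2, 3}}
D = dependency_matrix(4, modules, self_dependent=[0])
for row in D:
    print(row)
```

Because dependency under Definition 2 is mutual, the off-diagonal part of the resulting matrix is symmetric.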
D. Design of Characteristic Matrix T
For the proposed design, we employ the HCA having a cyclic structure (group HCA). The following theorems guide the efficient design of a group HCA (T matrix). [...] For the HCA of Fig. 7, whose T matrix has equal nonzero weights within each of columns 1, 2, and 3 (the second power of the generator, its fourth power, and the generator itself, respectively), det[D] = 1 (D being a binary matrix). It is a group CA. In general, the cycle structure of a group CA is defined as [μ_1(k_1), μ_2(k_2), ...], that is, it has μ_i cycles of length k_i, ∀ i = 1, 2, .... The cycle structure of the CA of Fig. 7 is [1(1), 1(262 143)]; it has one cycle of length 1 and one of length 262 143.
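The cycle-structure notation can be illustrated on a much smaller group CA. The sketch below enumerates the cycles of an invertible next-state function; as a stand-in it uses a 4-bit GF(2) register that multiplies its state by x modulo the primitive polynomial x^4 + x + 1, which is a hypothetical example and not the HCA of Fig. 7. The function names are likewise assumptions of this sketch.

```python
from collections import Counter


def cycle_structure(step, n_states):
    """Map cycle length -> number of cycles, for an invertible next-state function."""
    seen = [False] * n_states
    structure = Counter()
    for s in range(n_states):
        if seen[s]:
            continue
        length, cur = 0, s
        while not seen[cur]:       # walk the cycle containing s
            seen[cur] = True
            cur = step(cur)
            length += 1
        structure[length] += 1
    return dict(structure)


def times_x_gf16(s):
    """Multiply the 4-bit state by x modulo x^4 + x + 1 (primitive over GF(2))."""
    s <<= 1
    if s & 0b10000:
        s ^= 0b10011
    return s


structure = cycle_structure(times_x_gf16, 16)
print(structure)   # {1: 1, 15: 1}
```

In the notation of the text this toy register has cycle structure [1(1), 1(15)]: the all-zero state forms a cycle of length 1 and the remaining 2^4 - 1 states form one cycle. The figure 262 143 quoted for Fig. 7 is likewise 2^18 - 1.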
Property 1: An HCA designed with primitive weight values (Definition 1) while satisfying the result of Theorem 2 generates larger length cycles, of the order of 2^{npq-1}.
Experimental validation of Property 1 is reported in Table III. The first column shows the number of cells (n) and the extension field parameter (p) of different GF(2^p) CA. The HCA for a particular n and p is designed for a large number of trials with primitive and nonprimitive weight sets. In columns 2 and 3, the cycle lengths produced by the HCA in at least 25% of the cases are noted. The HCAs are designed with different values of n and p. [...]
Step 1. Select a seed from the SF.
Step 2. [...]
Each of the algorithms reported has been coded in the C language, and a complete package named BECKIT has been developed. It is found that, to complete the design of the HCATPG for a CUT, more than 97% of the processing time of BECKIT is consumed by fault simulation. Hence, the additional computation cost of the HCATPG design has only a marginal effect on the total run time.
E. Experimental Results
The experimental setup for evaluation of the fault coverage of HCATPG is as follows. 1) The PI clusters are identified from the hierarchical structural description provided in [ITC99] for benchmark circuits. 2) In order to increase the number of experimental circuits, more than one benchmark circuit is combined to get a hierarchical structural netlist. Table IV shows [...] and local tuning I_local = 10; c) desired FC is specified as 100% and udfp for a module as 0.05%; d) the SF is assumed to have two seeds only. Table V shows the fault coverage figures in Column 3, with the number of test vectors noted in Column 2, for the circuits specified in Table IV. The performance of HCATPG evaluated through BECKIT is compared with GLFSR (Column 4) [8] and maximal length GF(2) CA-based test structures (Column 5). We have designed the maximal length GLFSR-based TPGs for δ = 2, 3, 4, and 5 as proposed in [8]. A CUT is tested for five seeds (taken randomly) with each of the four TPGs. The maximum [...] For sequential circuits, the FFs are assumed to be initialized to 0 through hardware reset.
The figures of Table V confirm that the HCA-based design achieves better fault coverage. In absolute terms, the improvement of the fault coverage figures of HCATPG over those of GLFSR and GF(2) [...] in the range [...] where X = fault coverage with the new design (say, with HCATPG) and Y = fault coverage with the existing scheme (say, GF(2) CA/GLFSR). The PIFC with HCATPG over GLFSR and GF(2) CA is noted in the last two columns of Table V for comparison. The results of Table V clearly establish the inherent strength of HCATPG to push up the fault coverage figures in the region of 97%-99% achieved by the contemporary schemes.
F. Area Overhead of HCATPG
The process of hardware implementation for GF(2) CA is reported in [14]. An n-cell GF(2) CA (T matrix) is realized with n GF(2) cells (FFs). The interconnection among the cells follows from each of the rows of T, with its elements as 0 or 1. For a particular i, the outputs from cell j (j = 1 to n), where T[i, j] ≠ 0, are XORed and then connected to the input of cell i. Similarly, an n-cell GF(2^{p^q}) hierarchical CA with characteristic matrix [T] [...] Fig. 7). This results in an nqp × nqp (18 × 18) T_bin.
The HCA structure noted in Fig. 7 is designed from its T matrix employing multilevel XOR gates to reduce hardware overhead. The [T]_{3×3} describes the interconnection among the cells 0-2, whereas the weights (the generator and its second and fourth powers) [...] In the process, the number of two-input XOR gates needed to realize the HCA of Fig. 7 is reduced to 38, while the number of two-input XOR gates required to realize the HCA from its T_bin is 71. For further reduction in the area overhead of HCATPG, the concept of TPG folding that we have reported in [15] has been implemented.
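The gate count for a direct realization from a binary characteristic matrix can be estimated with a small helper: a cell whose row has k nonzero entries XORs k inputs, which takes k - 1 two-input XOR gates. The function name `xor_gate_count` and the 3 × 3 example matrix are hypothetical; the sketch does not reproduce the 18 × 18 T_bin of Fig. 7 or the multilevel sharing that brings its count down to 38.

```python
def xor_gate_count(t_bin):
    """Two-input XOR gates for a direct realization of a binary T matrix.

    A row with k ones feeds k cell outputs into one cell input,
    which needs k - 1 two-input XOR gates (none if k <= 1).
    """
    return sum(max(sum(row) - 1, 0) for row in t_bin)


# Hypothetical 3-cell GF(2) CA with three-neighborhood dependency.
t_bin = [[1, 1, 0],
         [1, 1, 1],
         [0, 1, 1]]
gates = xor_gate_count(t_bin)
print(gates)   # 4
```

This count is an upper bound for the flat realization; sharing common XOR terms across rows, as in the multilevel design described above, can only lower it.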
V. CONCLUSION
This paper presents an innovative concept of HCA. The theory of the extension field is utilized in designing the HCA. We establish HCA as an efficient TPG structure for testing VLSI circuits. A scheme to customize the HCATPG structure for a CUT is noted to achieve maximal fault efficiency.
