Index Terms: Built-in self-test (BIST), circuit under test (CUT), derived sequence, detectable error probability estimate, Hamming distance, parity tree compactor, sequence weight, sequence mergeability, space compaction, time compaction.
I. INTRODUCTION

WITH the increasing complexity of system design and rising levels of integration density in digital design, better and more effective testing methods are required to ensure reliable operation of chips, the mainstay of many of today's sophisticated systems. The concept of testing has broad applicability, and finding highly efficient testing techniques that ensure correct system performance has assumed significant importance [1]-[15]. The conventional technique for testing digital circuits requires applying test patterns generated by a test pattern generator (TPG) to the circuit under test (CUT) and comparing the responses with known correct responses. For large circuits, however, the storage required for the fault-free responses makes this procedure rather expensive, and alternative approaches are therefore sought to minimize the amount of storage needed. Built-in self-testing is a design approach that provides the capability of solving many of the problems otherwise encountered in testing digital systems. It combines the concepts of built-in test (BIT) and self-test (ST) into one, termed built-in self-test (BIST). In BIST, test generation, test application, and response verification are all accomplished through built-in hardware, which allows different parts of a chip to be tested in parallel, reducing the required testing time and eliminating the need for external test equipment. As the cost of testing is becoming a major component of the manufacturing cost of a new product, BIST reduces manufacturing, test, and maintenance costs, and improves diagnosis. Several companies, such as Motorola, AT&T, and IBM, have included BIST in their products. For example, the Motorola 68020 microprocessor is tested using BIST techniques, and the microcode ROM of the Intel 80386 microprocessor is self-tested. Similarly, AT&T has incorporated BIST into more than 200 chips.
A typical BIST environment, as shown in Fig. 1, uses a test pattern generator (TPG) that sends its outputs to a CUT, and the output streams from the CUT are fed into a test data analyzer. A fault is detected if the circuit response differs from that of the fault-free circuit. The test data analyzer comprises a response compaction unit (RCU), storage for the fault-free response of the CUT, and a comparator. In order to reduce the amount of data represented by the fault-free and faulty CUT responses, data compression is used to create signatures (short binary streams) from the CUT and its corresponding fault-free circuit. The signatures are compared, and a fault is detected if they do not match. BIST techniques may be used during normal functional operation of the unit under test (on-line testing), as well as when a system is not carrying out its normal functions (off-line testing). Where detecting real-time errors is not that important, systems, boards, and chips can be tested in off-line BIST mode. BIST techniques use pseudorandom or pseudoexhaustive test pattern generators, or on-chip storage of reduced test sets. Today, exhaustive testing of logic circuits is no longer used, since only a few test patterns are needed to ensure full fault coverage for stuck-type faults. Reduced test sets can be generated using algorithms such as FAN, among others. Built-in test generators can often generate such reduced test sets at low cost, making BIST techniques suitable for on-chip self-testing. This paper focuses on the response compaction process of built-in self-testing techniques, that is, the process of reducing the test response to a signature. Instead of comparing the fault-free responses bit by bit with the observed outputs of the CUT, as in conventional testing methods, the observed signature is compared with the correct one, thereby reducing the storage needed for the correct circuit responses.
The test data analyzer consists of a compaction unit, a comparator, and storage (a memory device). The compaction unit can be divided into a space compaction unit and a time compaction unit. In general, the multiple input sequences coming from a CUT are fed into a space compactor, which provides a smaller number of output bit streams; most often, the test responses are compressed into a single sequence. Space compaction offers a solution to the problem of achieving high-quality built-in self-testing of complex chips without monitoring a large number of internal test points. It reduces testing time and area overhead by merging the test sequences coming from these internal test points into a single stream of bits. This single bit stream is then fed into a time compactor, and a shorter stream is obtained at the output. The extra logic representing the compaction circuit must be as simple as possible, so that it can be easily embedded within the circuit under test, and should not introduce signal delays that affect either the test execution time or the normal functionality of the circuit being tested. In addition, the signature must be as short as possible in order to minimize the amount of memory needed to store the fault-free signatures. Also, signatures obtained from faulty output responses should differ from the corresponding fault-free signatures, which unfortunately is not always the case.
0018-9456/00$10.00 © 2000 IEEE
A fundamental problem with compaction techniques is error masking or aliasing [13] which occurs when the signatures of a faulty output response map to the fault-free signature, usually calculated by identifying a good circuit, applying test patterns to it, and then having the compaction unit generate the fault-free reference. Aliasing causes loss of information, which affects the testing quality of BIST and reduces the fault coverage (the number of faults detected, after compaction, over the total number of injected faults). Several methods have been suggested for computing the aliasing probability. The exact computation of this aliasing probability is known to be an NP-hard problem. In practice, high fault coverage, over 99%, is required and therefore, any space compression technique which maintains more percentage error coverage information would have to be considered for investigation. This paper considers the general problem of designing and analyzing efficient space compression techniques for built-in self-testing of VLSI circuits using compact test sets. The techniques are based on identifying certain inherent properties of the test output responses of the CUT and the knowledge of failure probabilities. The mergeability criteria of output sequences are developed utilizing the concepts of Hamming distance and sequence weights for a pair of outputs as well as an arbitrary number of outputs (generalized mergeability), and the effect of failure probabilities on the mergeability criteria is analyzed as well. The techniques proposed achieve a very high fault coverage for single stuck-line faults, with low CPU simulation time, and acceptable hardware overhead, as evident from extensive simulation results on ISCAS 85 combinational benchmark circuits, under conditions of both stochastic independence and dependence of single and multiple line errors.
II. SPACE COMPRESSION APPROACH
The space compression techniques proposed in this paper are basically an extension of the hybrid space compression (HSC) proposed initially by Li and Robinson [6], and the dynamic space compression (DSC) subsequently put forward by Jone and Das [3]. HSC uses AND, OR, and XOR gates to implement an output compaction tree that compresses the multiple outputs of the CUT to a single line. In DSC, instead of assigning static values to the probabilities of single and double line errors, these values are estimated dynamically from the CUT structure, viz., from the number of single lines and shared lines connected to an output, during the computation process. The techniques developed herein, on the other hand, use AND (NAND), OR (NOR), and XOR (XNOR) gates as appropriate to design a compaction tree that also compresses the outputs of the CUT to a single line, but based on criteria of sequence mergeability that utilize the concepts of Hamming distance and sequence weights (second-order as well as nth-order). The logic functions selected to construct the compaction tree are determined by the characteristics of the sequences that form the inputs to the various gates. The mergeability criteria are developed first on the assumption of stochastic independence of single and double line errors, and then of single and multiple line errors. If stochastic independence is not assumed, on the other hand, it was observed that in many cases the probability of error occurrence plays a very significant role in the selection of gates for merging either a pair or a number of sequences. In this latter case of stochastic dependence, the selection of appropriate output lines for merger is based on calculating the detectable error probability estimates E (both second-order and nth-order).
The gate selection criteria may, however, remain unchanged under certain special conditions of probability assignments for single line and double line errors, based on the computation of detectable error probability estimates, as against the selection criteria under the condition of stochastic independence.
III. SEQUENCE CHARACTERIZATION AND MERGEABILITY OF RESPONSE DATA OUTPUT FOR SPACE COMPACTION
The principal idea in space compaction is to compress functional test outputs of the CUT possibly into one single test response to derive the signature without sacrificing too much information. The logic function to be selected to build the compaction tree is essentially determined by the characteristics of the sequences which are inputs to the gates based on some mergeability criteria to be satisfied. The basic theme of the suggested approaches in this paper is to select a suitable gate to merge either two or any arbitrary number of candidate output lines of the CUT under conditions of stochastic independence and stochastic dependence of line errors, using sequence characterization developed in the paper. In the following sections the mathematical basis of these approaches is first given with appropriate notations and terminologies.
A. Sequence Weights and Derived Sequences
Consider a pair of output sequences of a CUT of equal length, where the length is the number of bit positions in each sequence. The Hamming distance between the two sequences is the number of bit positions in which they differ.
Definition 1: The first-order 1-weight of a sequence is the number of 1s in the sequence. Similarly, the first-order 0-weight of a sequence is the number of 0s in the sequence.
In general, however, it is not expected that any two distinct pairs of sequences at the output of the CUT will be identical, and hence the possibility of the corresponding derived pairs being identical is also remote.
We can extend the concepts of 1-weight and 0-weight to deal with n sequences. The nth-order 1-weight of the n sequences in a group is given by the number of bit positions in which all the sequences have 1s.
Definition 4: For n sequences of equal length, the nth-order 0-weight is defined in the same way as the nth-order 1-weight and corresponds to the number of bit positions in which all the sequences have 0s.
Example 3: Consider three sequences of equal length that agree in bit positions 2, 9, 10, 11, and 12 (counting from the right), all having 1s there. The third-order 1-weight of these sequences is therefore 5. Note that each of the sequences has 1s in other bit positions besides bit positions 2, 9, 10, 11, and 12. The third-order 0-weight is 2, since all three sequences have 0s in bit positions 1 and 8 (counting from the right).
We will denote an nth-order 1-weight by $w_{n1}$ and an nth-order 0-weight by $w_{n0}$, where the first subscript in each case corresponds to the sequence number or order, while the second subscript gives the binary digit corresponding to the weight.
Definition 5: For n sequences of equal length, the nth-order derived sequences are obtained by deleting the bit positions in which at least two of the sequences differ and replacing these bit positions with a dash (-).
Example 4: For the three sequences of the previous example, each derived sequence retains the 1s in bit positions 2, 9, 10, 11, and 12 and the 0s in bit positions 1 and 8, with a dash in every other bit position. For these derived sequences, the third-order 1-weight is again 5 and the third-order 0-weight is again 2.
Property 5: If the nth-order 1-weight is $w_{n1}$ and the nth-order 0-weight is $w_{n0}$ for the derived sequences in a bundle of sequences of length $L$, then $w_{n1} + w_{n0} + r = L$, where $r$, called the residue, is the number of bit positions in which at least two of the sequences in the bundle have different entries, and the corresponding sets of derived sequences have dash (-) entries.
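The definitions of Hamming distance, higher-order weights, derived sequences, and the residue can be checked mechanically. The following Python sketch is illustrative only: the function names and the three 6-bit sequences are hypothetical and are not the sequences of Examples 3 and 4. Sequences are written as bit strings, and a bundle is processed column by column.

```python
from typing import List, Tuple


def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def bundle_weights(seqs: List[str]) -> Tuple[int, int, int]:
    """Return (1-weight, 0-weight, residue) for a bundle of bit strings."""
    w1 = w0 = residue = 0
    for column in zip(*seqs):          # one tuple of bits per bit position
        if all(bit == "1" for bit in column):
            w1 += 1                    # every sequence has a 1 here
        elif all(bit == "0" for bit in column):
            w0 += 1                    # every sequence has a 0 here
        else:
            residue += 1               # at least two sequences differ
    return w1, w0, residue


def derived_sequences(seqs: List[str]) -> List[str]:
    """Replace every position where the sequences disagree with a dash."""
    agree = [len(set(column)) == 1 for column in zip(*seqs)]
    return ["".join(b if ok else "-" for b, ok in zip(s, agree)) for s in seqs]


# Hypothetical bundle of three sequences of length 6.
bundle = ["110100", "100101", "110001"]
w1, w0, r = bundle_weights(bundle)
```

For this hypothetical bundle the third-order 1-weight is 1, the third-order 0-weight is 2, and the residue is 3, so the three quantities add up to the sequence length 6, as stated in Property 5.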
IV. GATE SELECTION UNDER PAIRWISE MERGEABILITY
In this section we will briefly summarize the key results concerning pairwise mergeability of response data output of a CUT in the design of space compactors. These will be provided in the form of certain theorems without proofs under conditions of both stochastic independence and stochastic dependence of line errors, of which the details could be found in [14] . Generally space compression has been accomplished using XOR gates in cascade or in tree structure. In [1] and [14] , a combination of both cascade and tree structures (cascade-tree) has been adopted as the framework comprised of AND (NAND), OR (NOR) and XOR (XNOR) operators. The gate selection was primarily based on mergeability criteria that use the properties of Hamming distance, sequence weights, and derived sequences, together with the concept of detectable error probability estimates [6] for a two-input logic function, given two input sequences of length , under conditions of stochastic dependence of single and double line errors at the output of a CUT.
A. Pairwise Merger Under Stochastic Independence of Line Errors
Consider four output sequence streams of a CUT that form two distinct output pairs, whose corresponding derived sequence pairs are (00-01-10, 00-01-10) and (0000--11, 0000--11), respectively. Both derived sequence pairs have the same length, but they are not identical.
B. Pairwise Merger Under Stochastic Dependence of Line Errors
In order to consider the role of probability in error occurrence and its effect on sequence mergeability, Li and Robinson [6] defined a parameter called the detectable error probability estimate for a two-input logic function, given two input sequences of equal length (in this paper we call it the second-order detectable error probability estimate).
The estimate is expressed in terms of four quantities: the probability of a single error effect felt at the output of the CUT, the probability of a double error effect felt at the output of the CUT, the number of single line errors at the output of the gate if that gate is used, and the number of double line errors at the output of the gate if that gate is used. Based on the detectable error probability estimate of Li and Robinson, results are derived that profoundly influence the selection of gates for merger. However, the probability does not play any role in the selection between AND (NAND) and OR (NOR) gates if we use the empirical formula of Li and Robinson. The theorem below states the condition that determines the selection between AND (NAND) and OR (NOR) gates.
Theorem 11: For an output sequence pair of length and Hamming distance , an AND (NAND) gate is preferable to an OR (NOR) gate if .
Corollary 11.1: An OR (NOR) gate is selected if, on the other hand, .
The proofs of the above theorems, along with other relevant details including algorithms for selecting the response data outputs and simulation results on ISCAS 85 combinational benchmark circuits, are given in [14] and are not repeated here. Our objective in this paper is rather to discuss the case of generalized mergeability in greater detail, as the subsequent sections will demonstrate.
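To make the counting behind the second-order estimate concrete, the sketch below enumerates, for a pair of sequences and a candidate gate, the single line and double line errors that change the gate output. It is a brute-force illustration and not the closed-form results of [14]. The function names, the example sequences, and the probability values are hypothetical, and the form of the estimate used here (a probability-weighted sum of the two counts) is an assumption made only for illustration.

```python
from typing import Tuple

# Two-input gate functions on single bits.
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}


def detectable_errors(seq_a: str, seq_b: str, gate: str) -> Tuple[int, int]:
    """Count single and double line errors that change the gate output."""
    f = GATES[gate]
    singles = doubles = 0
    for ca, cb in zip(seq_a, seq_b):
        a, b = int(ca), int(cb)
        good = f(a, b)
        # A single line error flips exactly one of the two gate inputs.
        singles += int(f(a ^ 1, b) != good) + int(f(a, b ^ 1) != good)
        # A double line error flips both gate inputs at once.
        doubles += int(f(a ^ 1, b ^ 1) != good)
    return singles, doubles


def error_estimate(seq_a: str, seq_b: str, gate: str,
                   p_single: float, p_double: float) -> float:
    """ASSUMED form of the estimate: probability-weighted error counts."""
    n_single, n_double = detectable_errors(seq_a, seq_b, gate)
    return p_single * n_single + p_double * n_double


# Hypothetical pair of length 4 with Hamming distance 2.
a, b = "1100", "1010"
counts = {g: detectable_errors(a, b, g) for g in GATES}
```

For this pair, the AND and OR gates each propagate 4 single line errors and 2 double line errors, while the XOR gate propagates all 8 single line errors and no double line errors, which shows why the relative values of the two probabilities matter when XOR is compared with the other gates.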
V. GATE SELECTION UNDER GENERALIZED MERGEABILITY
We now consider the case of generalized sequence mergeability under conditions of both stochastic independence and stochastic dependence of multiple line errors occurring at the output of a CUT in the design of the desirable compaction tree. The following definitions are relevant.
Definition 6: If n sequences at the output of a CUT are grouped together on the basis of a certain nth-order 1-weight or 0-weight, the process is termed bundling and the grouped sequences are termed bundled sequences. Whenever a number of sequences are bundled, we say that we are working under the bundling constraint.
Definition 7: The error multiplicity is the number of simultaneous errors that can occur at the outputs of a CUT, that is, the number of lines at the output of a CUT that can be faulty simultaneously.
We next discuss our pertinent results on generalized mergeability of response data outputs, assuming stochastic independence of multiple line errors.
A. Generalized Mergeability Under Stochastic Independence of Multiple Line Errors
Theorem 12: On the assumption of stochastic independence of errors for a bundled set of output sequences each of length at the output of a CUT, the maximum number of possible errors with all possible multiplicities of errors is . , let the derived sequences be . In addition, let the th-order 1-weight be and th-order 0-weight be . In the extreme case when and the total faults detected by -input AND (NAND), OR (NOR) and XOR (XNOR) gates are, respectively, as follows: AND/NAND OR/NOR and XOR/XNOR The theorems are very important in the selection of the best subsets of output sequences in maximizing error detection at the CUT output during space compaction. The proofs of these theorems are intentionally avoided for the sake of brevity though the theorems are retained for completeness. Some of the results in the said context can also be found in [1] .
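The effect of error multiplicity on a bundled set of sequences can be explored by exhaustive enumeration. The sketch below is a hypothetical illustration, not the closed-form expressions of the theorems: for every bit position it flips every non-empty subset of the gate inputs and records, per multiplicity, how many of these error patterns change the output of an n-input AND, OR, or XOR gate. Under this counting model a bundle of n sequences of length L admits L(2^n - 1) error patterns in total. All names and the example bundle are assumptions.

```python
from functools import reduce
from itertools import combinations
from operator import and_, or_, xor
from typing import Dict, List

GATES = {"AND": and_, "OR": or_, "XOR": xor}


def total_error_patterns(seqs: List[str]) -> int:
    """Non-empty subsets of the n lines, counted for each bit position."""
    return len(seqs[0]) * (2 ** len(seqs) - 1)


def detected_by_multiplicity(seqs: List[str], gate: str) -> Dict[int, int]:
    """Map each multiplicity m to the number of m-line errors detected."""
    op = GATES[gate]
    n = len(seqs)
    counts = {m: 0 for m in range(1, n + 1)}
    for column in zip(*seqs):
        bits = [int(b) for b in column]
        good = reduce(op, bits)                      # fault-free gate output
        for m in range(1, n + 1):
            for lines in combinations(range(n), m):  # which lines are in error
                faulty = [b ^ 1 if i in lines else b
                          for i, b in enumerate(bits)]
                if reduce(op, faulty) != good:
                    counts[m] += 1
    return counts


# Hypothetical bundle: three sequences of length 4.
bundle = ["1101", "1001", "1100"]
detected = {g: detected_by_multiplicity(bundle, g) for g in GATES}
```

For this bundle the XOR gate detects every odd-multiplicity error and no even-multiplicity error, whereas the AND and OR gates detect some errors of every multiplicity, depending on the positions in which all the sequences carry 1s or 0s.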
VI. IMPLEMENTATION STRATEGY
To implement the generalized mergeability criteria outlined in the preceding section and construct the space compactor, an algorithm was developed. The algorithm was written in the C language and was executed on ISCAS 85 combinational benchmark circuits using the fault simulation and fault detection programs ATALANTA, FSIM, and COMPACTEST. The steps of the algorithm are given next, followed by a simple illustrative example.
A) Separation Criteria of Sequences for AND (NAND) and OR (NOR) Gates
A1) All original sequences are separated in two lists: ANDlist, ORlist as follows (NAND and NOR gate listings with AND and OR gates, respectively, are omitted in subsequent descriptions of the algorithm).
• All sequences that have number of 1s are sent to the ANDlist (where is the length of the sequence • From the ANDlist, group sequences according to an algorithm that takes into consideration , i.e. , in case of AND gate:
: number of matching 1s in the same bit positions of all candidate sequences. : number of sequences that have matching of 1's bits in common.
• Detailed explanation of this algorithm is given as follows:
B1.1) The weight (the number of matching 1s) overrides the number of sequences. In other words, the group that has the maximum weight is the best group as long as .
B1.2) Between two groups that have the same weight, the best grouping is the one that has the larger number of sequences.
B2) The same algorithm can be applied to the ORlist, with 1 replaced by 0.
B3) Mergeability Criteria: Assume the following:
: Number of matching 1s in the same bit positions of all candidate sequences. : Number of matching 0s in the same bit positions of all candidate sequences.
B3.1) For
: the best grouping of sequences obtained from B1) is merged together with an AND gate only. The resulting output is sent back to the AND sublist.
B3.2) For : the best grouping of sequences obtained from B1) is merged together with AND and XOR gates (XNOR is omitted from discussions). The resulting output is sent either to AND, OR, or XOR (when case exists) sublists. All the resulting trees of gates are found by using a recursive algorithm.
B3.3) For : the best grouping of sequences obtained from B2) is merged together with an OR gate only. The resulting output is sent back to the OR sublist.
B3.4) For : the best grouping of sequences obtained from B1) is merged together with OR and XOR gates. The resulting output is sent either to AND, OR, or XOR (when case exists) sublists. All the resulting trees of gates are found by using a recursive algorithm. B4) Sorting the output sequence after merging with an XOR gate:
If the output sequence has:
• Number of 1s : the sequence is sent to the AND sublist of candidates.
• Number of 1s : the sequence is sent to the OR sublist of candidates.
• Number of 1s = : the sequence is sent either to the AND or OR sublists according to the algorithm described in A2).
C) Stage Two and Intermediate Stages
• Repeat the same steps as in Stage A.
• Intermediate stages exist as long as there are still sequences which can be grouped by either AND, AND (XOR), OR, OR (XOR) gates.
D) Last Stage (XOR Processing)
The obtained sequences that cannot be grouped by AND or OR gates are merged together by the XOR gate. At this point, we get the last output sequence of the whole list of sequences.
The following rules are important for the clarification of the first algorithm.
Rule: Assume we have two 1's groupings of sequences, each characterized by its weight W (the number of matching 1s) and its number of sequences N. The following rule is applied to obtain the optimal solution (the best grouping of the two): a) The weight W takes precedence over (overrides) the number of sequences N. In other words, the group that has the maximum value of W is the best solution. b) If both groups have the same W, the optimal solution is the group that has the maximum value of N. c) If both groups have the same W and the same N, the optimal solution is either one. This rule can be generalized to several groups of sequences. Consider a list of sequences numbered seq0, seq1, seq2, and so on.
In order to understand the process of obtaining the "Best Group," the following definitions, categories of grouping, and stages of processing are provided.
Definition and initialization of parameters:
• maxW: a variable which denotes the maximum number of matching bits (0s or 1s, according to the kind of grouping under consideration) in the same bit positions obtained so far in the process of finding the BEST GROUP.
• maxN: a variable which denotes the maximum number of sequences that have maxW matching bits (1s or 0s, according to the kind of grouping under consideration) in common obtained so far in the process of finding the BEST GROUP. As a first step, we initialize both maxW and maxN to zero.
Stages of processing: At the first stage, the "Selecting the Best Group" subalgorithm (explained later) is applied by starting with seq0 and going through each of seq1, seq2, and so on up to the last sequence.
At the second stage, the "Selecting the Best Group" subalgorithm is applied by starting with seq1 and going through each of the following sequences: seq2, seq3, and so on up to the last sequence. In general, each stage starts one sequence later than the previous stage and goes through all the sequences that follow its starting sequence. The last stage of processing during the execution of the algorithm starts at the second-to-last sequence and goes through the last sequence.
Categories of groupings:
The various kinds of groupings are defined and classified as follows:
Cat [3] Better Grouping: It has the same conditions as in Good Grouping. In addition, it is the last good grouping found in a stage. The grouping is considered as a candidate for the "Best Grouping."
Cat [4] Best Grouping: It has the maximum of and parameters found after going through all the stages and so it is the best group of all the stages.
Subalgorithm to select the "BEST GROUP": In stage 1, we start the processing with seq0.
1) To obtain the matching 1s in each bit position, we compute (seq0 AND seq1), which yields a result with its own weight W and number of sequences N. We compare the resulting W with maxW and, if needed (when W = maxW), we compare the resulting N with maxN, and record the higher value in each case: we put MAX[W, maxW] in maxW, and we put MAX[N, maxN] in maxN.
• If the resulting W is less than maxW, then (seq0, seq1) is a bad grouping according to Cat [1], and we restart 1) for seq0 and seq2.
• If (seq0, seq1) is a good grouping according to Cat [2], we can say that (seq0, seq1) is the best group so far. Then we go to 2).
2) Now we AND the result of step 1) with seq2, which yields a new result with its own W and N. We compare the resulting W with maxW and, if needed (when W = maxW), we compare the resulting N with maxN, and record the higher value in each case: we put MAX[W, maxW] in maxW, and we put MAX[N, maxN] in maxN.
• If the resulting W is less than maxW, then (seq0, seq1, seq2) is a bad grouping according to Cat [1], and we restart 2) for the result of step 1) and seq3.
• If the result of step 1) together with seq2 is a good grouping according to Cat [2], we can say that it is the best group so far. Then we go to 3).
3) Now we AND the result of step 2) with seq3, which yields a new result with its own W and N. We compare the resulting W with maxW (obtained in step 2) and, if needed (when W = maxW), we compare the resulting N with maxN (obtained in step 2), and record the higher value in each case: we put MAX[W, maxW] in maxW, and we put MAX[N, maxN] in maxN.
• If the resulting W is less than maxW, then (seq0, seq1, seq2, seq3) is a bad grouping according to Cat [1], and we restart 3) for the result of step 2) and seq4.
• If the result of step 2) together with seq3 is a good grouping according to Cat [2], we can say that it is the best group so far. Then we go to the next step.
4) We repeat the above steps until we exhaust all sequences starting with sequence 0. Whenever we find the "better grouping" according to Cat [3] in the first stage (steps 1, 2, 3), we compare its resulting parameters with those of the first grouping of the second stage, which starts with sequence 1.
5) We repeat step 4) for all the existing stages. At the last stage, we obtain the optimal grouping, which is the "best grouping for the whole list of sequences" according to Cat [4]. This step is achieved using a recursive method.
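The staged search for the best group of 1's can be sketched as follows. This is a simplified, greedy reading of the subalgorithm in Python rather than the C implementation referred to in this section: the names `best_group`, `max_w`, and `max_n` and the example sequences are hypothetical, and two rejection conditions are assumptions, namely that a candidate is a bad grouping when its weight falls below the running maximum, and that a grouping with no common 1s is not considered.

```python
from typing import List, Tuple


def popcount(mask: int) -> int:
    """Number of 1 bits, i.e. the weight of an ANDed result."""
    return bin(mask).count("1")


def best_group(seqs: List[str]) -> Tuple[List[int], int, int]:
    """Return (indices of best group, its weight W, its size N)."""
    values = [int(s, 2) for s in seqs]
    max_w, max_n, best = 0, 0, []
    # One stage per starting sequence: seq0, then seq1, and so on.
    for start in range(len(values) - 1):
        group, mask = [start], values[start]
        for j in range(start + 1, len(values)):
            candidate = mask & values[j]       # common 1s if seq j is added
            w, n = popcount(candidate), len(group) + 1
            if w == 0 or w < max_w:
                continue                       # bad grouping: skip seq j
            group, mask = group + [j], candidate
            # Weight overrides group size; size only breaks ties in weight.
            if (w, n) > (max_w, max_n):
                max_w, max_n, best = w, n, list(group)
    return best, max_w, max_n


# Hypothetical list of four sequences of length 6.
sequences = ["110110", "110011", "010110", "110111"]
group, weight, size = best_group(sequences)
```

In this example the first stage finds the group (seq0, seq1, seq3) with three common 1s, but the second stage finds (seq1, seq3) with four common 1s, which wins even though it contains fewer sequences, because the weight overrides the number of sequences. Being greedy, the sketch can miss groupings that an exhaustive or recursive search would find.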
A. Mergeability Criteria
The main idea for merging a set of sequences by AND (NAND), OR (NOR) and XOR (XNOR) gates can be summarized as one of finding if this set of sequences satisfies one of the following criteria:
• If , the obtained group sequences are merged together with an AND (NAND) gate only.
• If , the obtained group sequences are merged together with an OR (NOR) gate only.
• If and , the remaining sequences which cannot be merged by either AND (NAND) or OR (NOR) gates are merged together with the XOR (XNOR) gate only at this level.
VII. SPACE COMPACTION UNDER GENERALIZED MERGEABILITY BASED ON STOCHASTIC DEPENDENCE OF MULTIPLE LINE ERRORS
In this section, we will present generalized mergeability criteria for merging an arbitrary number of response data outputs of the CUT under stochastic dependence of multiple line errors. In order to do that, we first extend the definition of second-order detectable error probability estimate as defined earlier to cover the case of sequences. We state the following obvious theorems without proof.
Theorem 20: The second-order detectable error probability estimate when two sequences and of length are merged by using an AND (NAND) gate is AND/NAND where is the Hamming distance between the two sequences. Theorem 21: The second-order detectable error probability estimate when two sequences and of length are merged by using an OR (NOR) gate is OR/NOR with being the Hamming distance between the sequences. Theorem 22: The second-order detectable error probability estimate when two sequences and of length are merged by using an XOR (XNOR) gate is (XOR/XNOR) = .
A. Derivation of the nth-Order Detectable Error Probability Estimate for Individual Gates
In this section, we will determine the th-order detectable error probability estimate for AND (NAND), OR (NOR) and XOR (XNOR) gates. They are derived for the case when is odd. The expressions for s will be very much similar when is even. Let , each of length , be the output sequences at the output of a CUT. Let the corresponding th-order derived sequences be , having th-order 1-weight and 0-weight, and , respectively.
Let , be the probability of -line errors. Realistically, one can assume that and . Let be the residue, that is, the number of bit positions in which at least two of the sequences in the bundle have different entries. Let be the number of columns in the bundle set with vertical 1-weight equal to . Let us denote the binomial coefficient ( ) simply by the symbol or . Example 6: Consider the following five sequences of length each:
In this example, the 5th-order 1-weight is and the 5th-order 0-weight is . The residue is equal to 6 since there are 6 bit positions in which at least two sequences differ. 
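The column-wise quantities used in this derivation (the number of columns of the bundle having a given vertical 1-weight, and from these the 1-weight, 0-weight, and residue) can be computed directly. The Python sketch below is illustrative: the function names are hypothetical and the five sequences of length 8 are made up for the purpose and are not those of Example 6.

```python
from collections import Counter
from typing import Dict, List


def column_weight_histogram(seqs: List[str]) -> Counter:
    """Map each vertical 1-weight k to the number of columns with k ones."""
    return Counter(sum(int(bit) for bit in column) for column in zip(*seqs))


def summarize(seqs: List[str]) -> Dict[str, int]:
    """Derive the nth-order weights and the residue from the histogram."""
    n = len(seqs)
    hist = column_weight_histogram(seqs)
    # Columns that are neither all 1s nor all 0s make up the residue.
    residue = sum(count for k, count in hist.items() if 0 < k < n)
    return {"one_weight": hist[n], "zero_weight": hist[0], "residue": residue}


# Hypothetical bundle of five sequences of length 8.
bundle = ["11010010", "10010110", "11110000", "10011010", "11010100"]
hist = column_weight_histogram(bundle)
summary = summarize(bundle)
```

For this bundle two columns contain five 1s and one column contains none, so the 5th-order 1-weight is 2, the 5th-order 0-weight is 1, and the residue is 5.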
B. Implementation
A heuristic approach has been adopted and implemented in the following algorithm in order to obtain results within an acceptable CPU time. The heuristic approach is very useful in this case, since evaluating a closed-form expression for the nth-order detectable error probability estimates can be computationally intensive.
A1) From all the output sequences, select a group of sequences having the largest nth-order 0-weight. Then select another group of sequences having the largest nth-order 1-weight.
A2) Choice between selected 1-weight and 0-weight groups
In order to make a choice between the largest groups based on 1's grouping or 0's grouping, let us consider the following example.
Suppose that step A1) yields a 0's grouping and a 1's grouping, each characterized by its weight and its number of sequences. The best of the two groups is selected according to the following rule:
a. The weight overrides the number of sequences, which means that the group with the maximum weight is the best solution.
b. If the weights are equal, the best group is the one with the larger number of sequences.
c. If both the weights and the numbers of sequences are equal, the best group is either one. In the implementation, the group with the nth-order 1-weight is chosen.
A3) Choice between (AND/NAND, OR/NOR) gates
, then use either OR/NOR or XOR/XNOR gate. In the implementation, the XOR gate was chosen in this situation.
A5) Apply the above procedure to the merged sequence and all the remaining sequences until only one sequence remains at the output.
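The rule of step A2) for choosing between the selected 0's grouping and 1's grouping can be written compactly. In the sketch below each group is a hypothetical (weight, number of sequences) pair and the function name is illustrative; the tie-breaking in favor of the 1's grouping follows the implementation choice stated in the rule.

```python
from typing import Tuple

Group = Tuple[int, int]  # (weight, number of sequences)


def choose_group(zero_group: Group, one_group: Group) -> str:
    """Return "0" or "1" for the grouping selected by the rule of step A2)."""
    # Tuple comparison checks the weight first and the size only on a tie,
    # which is exactly "weight overrides number of sequences".
    if zero_group > one_group:
        return "0"
    # Larger 1's grouping, or a complete tie, selects the 1's grouping.
    return "1"
```

For instance, a 0's grouping with weight 5 and two sequences is preferred over a 1's grouping with weight 4 and six sequences, because the weight is compared first.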
VIII. EXPERIMENTAL RESULTS
To demonstrate the feasibility of the proposed space compaction schemes, independent simulations were conducted on various ISCAS 85 combinational benchmark circuits. We used ATALANTA (a fault simulation program developed at the Virginia Polytechnic Institute and State University) to generate the fault-free output sequences required to construct our space compactor circuits and to test the benchmark circuits using reduced or compact test sets, accompanied by a random testing session with the FSIM fault simulation program to generate pseudorandom test sets. We used the COMPACTEST program to generate reduced test sets that detect most detectable single stuck-line faults for all of the benchmark circuits. For each circuit, we determined the number of test vectors used to construct the compaction tree, the CPU time taken to construct the compactor, the number of applied test vectors, the simulation CPU time, and the percentage fault coverage by running the ATALANTA and FSIM programs on a SUN Sparc 5 workstation, and COMPACTEST on an IBM AIX machine, under conditions of both pairwise and generalized mergeability. For comparison purposes, we used a parity tree space compactor composed of XOR gates, which propagates errors on an odd number of inputs and is usually considered ideal for space compression. With the parity tree space compactor as the reference, the proposed schemes for constructing compaction trees were evaluated through extensive simulation runs on ISCAS 85 combinational benchmark circuits. Many of these simulation results are reported in [1] and [14]. In the generalized mergeability case, the heuristic-based approach to generating the compression trees also works satisfactorily, as is obvious from the results of simulation. In the following tables we provide some experimental results for the generalized mergeability case on ISCAS 85 benchmark circuits using FSIM, ATALANTA, and COMPACTEST.
As is evident from the tables, in all cases our space compactors compare very favorably with the parity tree space compactor in terms of fault coverage and reduced CPU time. The hardware overhead of the compactors was found to be within acceptable limits (in general, within 1-5% and up to 15% in certain cases). The hardware overhead was estimated as the ratio of the weighted gate count metric (the average fanin multiplied by the number of gates) of the compactor to that of the total circuit comprising the CUT and the space compactor.
For some combinational circuits, the simulations yielded several space compressors (trees). In such a situation, we selected the "best tree," which is the tree that gave the highest fault coverage; when two or more trees had identical fault coverage, we selected the one that gave the smallest CPU time.
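The selection rule for the "best tree" can be sketched as follows. This is only an illustrative sketch, not the program used in the experiments: the `TreeResult` record, the function name `select_best_tree`, and the sample numbers are all assumptions introduced here and are not taken from the tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeResult:
    """Simulation outcome for one candidate compaction tree (hypothetical record)."""
    name: str
    fault_coverage: float  # percent
    cpu_time: float        # seconds


def select_best_tree(results):
    """Highest fault coverage wins; ties are broken by the smallest CPU time."""
    if not results:
        raise ValueError("no candidate trees")
    # Negating the coverage lets a single min() express both criteria.
    return min(results, key=lambda r: (-r.fault_coverage, r.cpu_time))


# Made-up numbers for illustration only; they are not taken from the tables.
candidates = [
    TreeResult("tree_a", 95.0, 1.5),
    TreeResult("tree_b", 97.5, 2.0),
    TreeResult("tree_c", 97.5, 1.25),
]
best = select_best_tree(candidates)
print(best.name)
```

Here `tree_b` and `tree_c` tie on fault coverage, so the smaller CPU time decides in favor of `tree_c`.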
For each ISCAS 85 combinational circuit, several variables were determined and included in tabular format. These variables are: the number of test vectors used to construct the best compaction tree, the CPU time taken to construct the best compaction tree, the number of applied test vectors corresponding to the best tree obtained, the simulation CPU time, and the percentage of fault coverage, obtained by running the ATALANTA and FSIM programs on a SUN Sparc 5 workstation and COMPACTEST on an IBM AIX machine.
IX. CLASSIFICATION OF SIMULATED RESULTS
All the simulation results obtained from FSIM, ATALANTA, and COMPACTEST are classified in tables and presented in the next paragraph under the following four categories: simulations without using compactors, simulations by assuming stochastic independence of line errors, simulations by considering stochastic dependence of line errors, and simulations by using the parity tree as a space compactor. In addition, we estimated the hardware overhead for all the circuits in two tables. The first table determines the percentage of hardware overhead for FSIM and ATALANTA while the second determines the same percentage for COMPACTEST.
A. Simulation Results Without Using Compactors
Using FSIM, ATALANTA, and COMPACTEST as fault simulators, we determined the fault coverage and the CPU simulation time required for all ISCAS 85 benchmark circuits, without using compactors. The results are shown in Tables I-III, respectively. From these tables, we can readily conclude that ATALANTA and COMPACTEST provide similar fault coverage results. These results are much higher than those provided by the FSIM simulations for all ISCAS 85 circuits.
Since all the simulations were conducted on the Sparc 5 SUN workstation, it is legitimate to compare the CPU times of all the simulators. Having done that, we notice that FSIM provides the smallest (best) CPU simulation time for almost all circuits, while COMPACTEST provides by far the highest (worst) CPU time as compared to FSIM and ATALANTA, except for circuit c17.
B. Simulation Results by Assuming Stochastic Independence
Tables IV-VI show the simulation results for all ISCAS 85 benchmark circuits by assuming stochastic independence of multiple line errors using FSIM, ATALANTA, and COMPACTEST, respectively. The number of space compressors obtained for each circuit is indicated in the last column of each table. In the case where several space compactors are obtained, we selected the space compactor (tree) that gave the highest fault coverage; when two or more compactors had identical fault coverage, we selected the one that has the smallest CPU time.
It is worth mentioning that the CPU time taken to construct the best compaction tree is obtained by dividing the total CPU time computed for all trees, as reported by the C program simulation, by the total number of trees obtained for a specific ISCAS 85 benchmark circuit. In the case of stochastic independence, ATALANTA provides the best fault coverage results among all the simulators. FSIM provides the best CPU simulation time, followed by ATALANTA.
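The construction-time figure described above is thus a per-tree average rather than a time measured for the best tree alone. As a worked form, with symbols that are assumptions introduced here ($T_{\text{total}}$ for the total CPU time over all trees of a circuit and $N_{\text{trees}}$ for the number of trees obtained for that circuit):

```latex
T_{\text{construct}} = \frac{T_{\text{total}}}{N_{\text{trees}}}
```

For instance, a hypothetical circuit whose four trees take 2.0 s in total would be reported with a construction time of 0.5 s.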
C. Simulation Results by Assuming Stochastic Dependence
By assuming stochastic dependence of line errors, we obtained Tables VII-IX, which represent the simulation results obtained from the C program as well as the simulation results obtained from FSIM, ATALANTA, and COMPACTEST. It is worth noting here that for each grouping, which is the probability of the th line error occurrence is equal to for , and it is equal to for . This operation is computed for each grouping encountered during the execution of the algorithm described above.
As far as fault coverage is concerned, ATALANTA provides the best results, while FSIM provides the best results in terms of CPU simulation time. 
D. Simulation Results Using the Parity Tree as Space Compactor
We also simulated all ISCAS 85 benchmark circuits with parity tree space compactors using FSIM, ATALANTA, and COMPACTEST. The obtained simulation results are given in Tables X-XII, respectively. Comparing the fault coverage of all the simulators, ATALANTA in general provides the best results, followed by COMPACTEST. In terms of CPU time, FSIM provides the smallest (best) results, followed by ATALANTA.
From the simulation experiments, it is clear that in all cases our space compactor is comparable in all respects with the parity tree space compactor. For some circuits, we obtained better fault coverage and a reduction in CPU time using our space compactor than with a parity tree compactor.
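The behavior of the reference parity tree compactor, which propagates errors that occur on an odd number of inputs, can be illustrated with a minimal sketch. The function names and the bit vectors below are hypothetical and serve only as an example. The XOR gates are evaluated as a chain rather than a balanced tree, which gives the same output because XOR is associative.

```python
from functools import reduce
from operator import xor


def parity_tree(bits):
    """Compact one output vector to a single bit using XOR gates."""
    return reduce(xor, bits, 0)


def error_propagates(fault_free, faulty):
    """True if the compacted bit differs between fault-free and faulty responses."""
    return parity_tree(fault_free) != parity_tree(faulty)


good = [1, 0, 1, 1, 0]          # hypothetical fault-free CUT output vector
one_error = [0, 0, 1, 1, 0]     # line 0 flipped
two_errors = [0, 1, 1, 1, 0]    # lines 0 and 1 flipped
three_errors = [0, 1, 0, 1, 0]  # lines 0, 1, and 2 flipped

print(error_propagates(good, one_error))     # odd number of line errors
print(error_propagates(good, two_errors))    # even number of line errors
print(error_propagates(good, three_errors))  # odd number of line errors
```

The one-error and three-error vectors change the compacted bit, whereas the two-error vector is masked, which is the aliasing case of a parity tree.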
X. HARDWARE OVERHEAD
Tables XIII and XIV show the hardware overhead estimates for all ISCAS 85 benchmark circuits corresponding to ATALANTA/FSIM and COMPACTEST, respectively.
In order to estimate the hardware overhead, we used the ratio of the weighted gate count metric, that is, average fanins multiplied by the number of gates, of the compactor and that of the total circuit comprised of the CUT and the space compactor.
As can be seen from the above tables, the hardware overhead of the best compactor for the ISCAS 85 benchmark circuits is as small as 0.2-4.2% in the case of ATALANTA and FSIM, except for circuit c17, where it equals 14.3% (the space compactor of this circuit is composed of one gate only). For the COMPACTEST simulator, the hardware overhead is even smaller: it ranges from 0.01 to 3.9%, and again equals 14.3% for circuit c17.
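The overhead estimate can be checked by hand for c17. The sketch below is an illustration under stated assumptions, not the procedure used to produce the tables: it models c17 as its six two-input NAND gates and assumes that the single-gate compactor is a two-input gate combining the two primary outputs of c17. The function names are hypothetical.

```python
def weighted_gate_count(fanins):
    """Average fanin times the number of gates (equivalently, the sum of fanins)."""
    if not fanins:
        return 0.0
    average_fanin = sum(fanins) / len(fanins)
    return average_fanin * len(fanins)


def hardware_overhead_percent(cut_fanins, compactor_fanins):
    """Weighted gate count of the compactor over that of CUT plus compactor."""
    compactor = weighted_gate_count(compactor_fanins)
    total = weighted_gate_count(cut_fanins) + compactor
    return 100.0 * compactor / total


c17_fanins = [2] * 6    # c17: six two-input NAND gates
compactor_fanins = [2]  # assumption: one two-input gate on the two outputs

overhead = hardware_overhead_percent(c17_fanins, compactor_fanins)
print(round(overhead, 1))
```

Under these assumptions the ratio is 2/(12 + 2), or about 14.3%, which agrees with the value quoted for c17.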
For the large circuits such as c2670, c7552, we obtained groupings of two, three, and four sequences. In the case of stochastic dependence of errors, the space compactor logical circuit depends on the values assigned to . Therefore, changing could change the gate composition of the corresponding logical circuit.
XI. CONCLUSIONS
This paper presents space compaction techniques for test output responses in the context of built-in self-testing of VLSI circuits. In this research, however, no emphasis was placed on designing aliasing-free compressors. We have rather endeavored to show how the input test sets and their lengths play a role in the design of compression networks. Loss of information in general is unavoidable when the size of the output responses is reduced. In the simulation experiments we used the reduced sets of tests provided by ATALANTA and COMPACTEST for ISCAS 85 benchmark combinational circuits, under conditions of stochastic independence and dependence of single and multiple line errors (two line errors, in fact, come as a special case). Though the provided reduced test sets are not the minimal test sets that ensure 100% fault coverage, experimental results indicate that the designed space compressors are comparable to parity tree compactors in almost all respects. The design methods are simple and the resulting hardware overhead is low, which makes them well suited to the BIST environment.
Tony (Toni) F. Barakat received the B.Sc. degree (Honors) in physics from the Lebanese University, Beirut, Lebanon, and the M.A.Sc. degree in electrical and computer engineering from the University of Ottawa, Ottawa, Ont., Canada, in 1997.
He was with Nortel Networks, Ottawa, as a Software Engineer specializing in ATM technology (Magellan family). Later, he joined Lucent Technologies, Naperville, IL, as a Member of Technical Staff. There, he was a software developer in the packet switching unit development platform of the 5ESS switch and a Network Engineer in the 7R/E packet driver and packet local solutions. His research interests include digital systems design and digital circuits testing, including data compression in built-in self-testing of VLSI circuits.
Mr. Barakat is coauthor (along with S. R. Das and E. M. Petriu) of a paper which received the prestigious Rudolph Christian Karl Diesel Best Paper Award in recognition of the excellence of the contribution. The paper was scheduled for the Fifth Biennial World Conference on Integrated Design and Process Technology, which was held in Dallas, TX, in June 2000.
He is currently an Assistant Professor of Electrical and Computer Engineering at Duke University, Durham, NC. His current research projects are in design and test of system-on-a-chip, built-in self-testing (BIST), distributed sensor networks, real-time embedded operating systems, architectural optimization of microelectrofluidic systems, and thermal management in integrated circuits. He has published over 40 papers in archival journals and refereed conference proceedings. His research support is provided by the National Science Foundation, DARPA, the North Carolina Networking Initiative, and several other industrial sponsors.
Dr. Chakrabarty was the recipient of the 1999 National Science Foundation CAREER award, and the Mercator Professor award from Deutsche Forschungsgemeinschaft, Germany, for 2000-2001. He is also the coauthor of a paper that received the James Beausang Best Student Paper Award at the IEEE VLSI Test Symposium, 2000. He serves as Vice-Chair of Technical Activities in IEEE's Test Technology Technical Council, and holds a US patent on built-in self-test. He is a member of Sigma Xi.
