The data paths of most contemporary general and special purpose processors include registers, adders and other arithmetic circuits. If these circuits are also used for Built-In Self Test, the extra area required for embedding testing structures can be cut down efficiently. Several schemes based on accumulators, subtracters, multipliers and shift registers have been proposed and analyzed in the past for parallel test response compaction, whereas some efforts have also been devoted to the bit-serial response compaction case. In this paper, we analyze and evaluate the bit-serial version of a recently proposed scheme for parallel test response compaction [5]. Experimental results on the ISCAS'85 benchmark circuits indicate that the post-compaction fault coverage drop attained by the new scheme is significantly lower than that of other already known accumulator-based compaction schemes.
Introduction
Advances in semiconductor process technology force IC companies to move towards very deep submicron integrated circuit technology in order to take advantage of the increased functionality, higher speeds and decreased costs that it offers. Very deep submicron ICs, although capable of offering increased speed and integration of millions of gates, require new and effective test methodologies in order to be tested adequately and cost effectively.
Built-In Self-Test (BIST) is becoming a very attractive Design For Testability (DFT) strategy since it reduces external testing requirements. BIST incorporates the Circuit Under Test (CUT) and its tester in the same IC, enabling the chip to test itself. Although this leads to increased implementation area, this DFT method is becoming more and more attractive since it decreases the time to market, it often leads to higher testing quality and it cuts down the cost effectively [1].
The quality of a BIST scheme depends on: (a) the Test Pattern Generator (TPG), a circuit that produces the patterns applied to the CUT, and (b) the Test Response Verifier, a circuit that captures the responses of the CUT, compacts them to one single pattern called the signature and compares this against the signature of a fault-free CUT.
Even if the test pattern sequence generated by the TPG achieves 100% fault coverage for a specific fault model, the post-compaction fault coverage may be much smaller due to the well-known problem of aliasing. Aliasing is the possibility that a faulty and a fault-free circuit produce the same signature although the CUT's output responses differ.
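The effect of aliasing on fault coverage can be illustrated with a minimal sketch. It assumes that the fault coverage drop is the difference between the coverage measured on the raw responses and the coverage measured on the signatures. The function name `coverage_drop`, the toy response streams and the use of a plain count of 1s as the compactor are illustrative assumptions and are not taken from the experiments of this paper.

```python
def coverage_drop(good, faulty_responses, compact):
    """Return (pre, post, drop) fault coverage in percent.

    good             -- fault-free response bit stream (list of 0/1)
    faulty_responses -- one response bit stream per modelled fault
    compact          -- hypothetical compactor: bit stream -> signature
    """
    total = len(faulty_responses)
    good_signature = compact(good)
    # A fault is detected before compaction if its response differs at all.
    detected_pre = sum(1 for r in faulty_responses if r != good)
    # After compaction only the signatures are compared, so aliasing can hide faults.
    detected_post = sum(1 for r in faulty_responses if compact(r) != good_signature)
    pre = 100.0 * detected_pre / total
    post = 100.0 * detected_post / total
    return pre, post, pre - post


# Toy data (assumed): four faults, one of them undetected by the test set.
good = [1, 0, 1, 0]
faulty = [[0, 1, 1, 0], [1, 1, 1, 0], [1, 0, 1, 0], [0, 0, 0, 0]]
pre, post, drop = coverage_drop(good, faulty, compact=sum)
```

In this toy case the first faulty stream has the same number of 1s as the fault-free stream, so a 1s-counting compactor aliases on it even though the raw responses differ.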
The silicon area required for embedding BIST can be minimized if some of the original building blocks of the circuit are utilized to generate patterns and/or to compact test responses. The datapaths of processors as well as of digital signal processing circuits contain adders, subtracters and multipliers. The suitability of these circuits for test response compaction in test-per-clock BIST schemes has been investigated in [2-5]. In several cases however, the use of a test-per-scan scheme, that is, bit-serial pattern generation and compaction, is imperative. For example, we can refer to: (a) embedded cores with an isolation ring, (b) circuits with a boundary scan path and (c) sequential circuits with scan paths.
The suitability of various arithmetic circuits for bit-serial test pattern generation and test response compaction was investigated in [6]. Bit-serial compaction by an accumulator using either a 2's complement adder or a rotate carry adder was discussed. Finally, the author of [6] proposed the use of a serial-parallel response compaction scheme and derived an upper bound on the limiting value of the aliasing probability for sufficiently long test sequences. In many cases however, the length of the test set may not be very large. In such cases, the actual aliasing ratio may be far larger than the upper bound derived in [6] and the post-compaction fault coverage may drop below acceptable levels. Experiments performed on the ISCAS'85 benchmark circuits confirmed this concern (experimental results are presented in Section 3). Recently, a new test response compaction scheme based on an accumulator behaving as a multiple-input Non-Linear Feedback Shift Register has been proposed [5]. It has been shown that this scheme achieves a significantly lower post-compaction fault coverage drop than the other accumulator-based test response compaction schemes in test-per-clock BIST. However, its effectiveness in test-per-scan structures has not been investigated. In this work we investigate its suitability for serial test response compaction. Experiments on the ISCAS'85 circuits reveal its superiority over the already known accumulator-based bit-serial schemes.
In the next section we review the bit-serial and serial-parallel response compaction schemes that have already been proposed and we analyze the bit-serial version of the parallel scheme proposed in [5]. We also compare these schemes in terms of area overhead. Experimental results on the ISCAS'85 circuits are presented in Section 3. Conclusions are given in the last section.
Bit Serial Response Compaction Schemes
Consider a CUT which has x outputs and y internal state flip-flops, with y ≥ 0 (y = 0 in the case of a combinational circuit), connected in a single scan path. The response of the CUT is shifted out serially by the application of x + y cycles of a shift clock, say S.
Finally, suppose that the width of the available accumulator is k. Consider first the response compactor of Figure 1 (ignore the dashed-line logic). We will denote this scheme as "bit-serial accumulator". Each response bit that is shifted out is added at the least significant bit position, so that all the bits of the signature can be affected by possible erroneous responses. For small test lengths and large accumulator widths (k = 32 or 64) this is equivalent to a counter of the 1s of the test responses, whereas for smaller values of k and large test lengths the response compactor of Figure 1 is equivalent to a modulo 2^k counter of 1s. Provided that an accumulator exists in the original system, this scheme does not impose any area overhead.
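The behaviour of the bit-serial accumulator can be sketched in a few lines. This is a behavioural model under stated assumptions, not a description of the hardware of Figure 1: the accumulator starts at zero, and the optional stored carry feedback is modelled as a flip-flop that keeps the carry-out of one addition and feeds it to the carry-in of the next. The function name and the `carry_feedback` flag are illustrative.

```python
def bit_serial_accumulate(bits, k, carry_feedback=False):
    """Add every response bit at the LSB of a k-bit accumulator.

    carry_feedback=True models (as an assumption) a stored carry that is
    fed back into the carry-in of the next addition.
    """
    mask = (1 << k) - 1
    acc = 0
    carry = 0  # stored carry flip-flop, only used when carry_feedback is set
    for b in bits:
        total = acc + b + (carry if carry_feedback else 0)
        carry = total >> k      # carry-out of the k-bit adder
        acc = total & mask      # k-bit sum stored back in the accumulator
    return acc


# With a wide accumulator the signature is simply the number of 1s.
ones_count = bit_serial_accumulate([1, 0, 1, 1], k=8)
# With a narrow accumulator it is the number of 1s modulo 2^k.
wrapped = bit_serial_accumulate([1, 1, 1, 1, 1], k=2)
```

Two response streams with the same number of 1s, such as 1010 and 0110, produce the same signature in this model, which is the weakness discussed next.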
A second bit-serial response compactor [6] can be derived based on the bit-parallel compactor presented in [...]. In the above schemes, if due to a fault the number of +1 bit errors equals the number of -1 bit errors, the fault will not be detected, since all the bits of the responses are added with the same weight. This can be alleviated if the i-th response bit is added at position i mod k of the accumulator, as suggested by [6]. A straightforward implementation of this scheme is given in Figure 2. In this scheme, after k bits of the response have been shifted into the shift register R1, the content of R1 is added to the content of R. The sum is stored in R. We will denote this response compactor as "serial-parallel accumulator", implying that the response of the CUT is first shifted into a k-bit wide register. Again, two different compactors, one with stored carry feedback and one without carry feedback, may be constructed in a way analogous to Figure 1. According to the analysis of [6], it is expected that these response compactors would perform better than those of Figure 1.
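The serial-parallel accumulator can be modelled in the same behavioural style. The sketch below makes several assumptions that the text does not fix: bit i lands at position i mod k of R1 (the actual shift direction may differ), R starts at zero, a trailing group of fewer than k bits is padded with zeros and added at the end, and the stored carry feedback is a flip-flop feeding the next addition. All names are illustrative.

```python
def serial_parallel_accumulate(bits, k, carry_feedback=False):
    """Shift k response bits into R1, then add R1 to the k-bit register R."""
    mask = (1 << k) - 1
    r = 0       # accumulator register R
    r1 = 0      # shift register R1
    carry = 0   # stored carry flip-flop (assumed), used only with feedback
    pending = 0
    for i, b in enumerate(bits):
        r1 |= b << (i % k)          # bit i is weighted by 2^(i mod k)
        pending += 1
        if pending == k:
            total = r + r1 + (carry if carry_feedback else 0)
            carry, r = total >> k, total & mask
            r1, pending = 0, 0
    if pending:                      # assumed handling of a partial last word
        total = r + r1 + (carry if carry_feedback else 0)
        r = total & mask
    return r


# Streams with equal numbers of 1s no longer collide when the 1s sit
# at different positions inside the k-bit word.
sig_good = serial_parallel_accumulate([1, 0, 1, 0], k=4)
sig_faulty = serial_parallel_accumulate([0, 1, 1, 0], k=4)
```

Here the two streams, which a plain count of 1s cannot tell apart, give different signatures because each bit position carries a different weight.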
However, the area overhead that the scheme of Figure 2 imposes depends on the existence of a second register. If such a register exists, the area overhead consists of the gates required for converting it into a shift register. If a second register is not available in the original system, a k-bit shift register needs to be added. Moreover, since an addition takes place only after k new response bits have been shifted into R1, a new clock signal must be devised with a period of k cycles of the shift clock S. Deriving such a signal requires the introduction of a ⌈log2 k⌉-bit wide counter.
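As a quick check of the counter width quoted above, the following sketch computes the number of bits of a counter that counts k shift-clock cycles, for the accumulator sizes used in the experiments. The function name is an illustrative assumption.

```python
def clock_divider_counter_bits(k):
    """Bits needed by a counter that cycles through k states (0 .. k-1)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # (k - 1).bit_length() equals ceil(log2(k)) for every k >= 1,
    # and avoids floating-point rounding.
    return (k - 1).bit_length()


widths = {k: clock_divider_counter_bits(k) for k in (8, 16, 32)}
```

For k = 8, 16 and 32 the counter needs 3, 4 and 5 bits respectively.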
Recently, test response compaction by an accumulator behaving as a multiple-input Non-Linear Feedback Shift Register has been proposed in [5]. The parallel scheme proposed in [5] can easily be modified into a bit-serial test response compaction scheme by restricting the number of output response bits that are processed at each clock cycle to one (see Figure 3). In test mode, during each clock cycle, the contents of the register are shifted by one position to the left and are added to the operand A of the adder and the content of the D flip-flop (denoted as X). The result of the addition is stored back in the register and the X flip-flop. The final content of the register constitutes the signature. Note that the least significant bit of A is the output response bit of the CUT, while the remaining k-1 bits are held at constant values during testing. Similarly, in our scheme we expect that, by applying a constant value with an irregular pattern of zeros and ones, we will achieve a smaller post-compaction fault coverage drop than by applying a regular pattern. The area overhead imposed by the proposed scheme is dominated by the multiplexer insertion and is equivalent to that needed for converting an existing register of the circuit into a shift register. Note that in the proposed compactor no extra clock signal is required, since a new addition is performed in every cycle of the shift clock S in a bit-serial fashion. Therefore, the area overhead for the modifications required by the proposed scheme is less than that of Figure 2, but greater than that of the response compactors of Figure 1. However, as we will show with experimental results, the compactors of Figure 1 are incapable of sustaining the post-compaction fault coverage at high levels.
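The update performed in each clock cycle can be sketched as follows. The model rests on assumptions that should be checked against Figure 3 and [5]: the bit shifted out of the most significant position is discarded, the X flip-flop captures the carry-out of the adder and is added in the next cycle, and the register and X start at zero. The function name and the `constant` parameter (an integer holding the k-1 constant bits of A) are illustrative.

```python
def nlfsr_accumulate(bits, k, constant=0):
    """Bit-serial compaction by shift-and-add, one response bit per cycle.

    constant -- integer whose low k-1 bits form the fixed upper bits of A
    """
    mask = (1 << k) - 1
    upper = (constant & ((1 << (k - 1)) - 1)) << 1  # bits 1 .. k-1 of operand A
    reg = 0   # k-bit register holding the running signature
    x = 0     # D flip-flop X (assumed to hold the adder carry-out)
    for b in bits:
        a = upper | b                        # LSB of A is the response bit
        total = ((reg << 1) & mask) + a + x  # shifted register + A + X
        reg = total & mask
        x = total >> k
    return reg


# Two streams with the same number of 1s, using the 0101... constant.
sig_good = nlfsr_accumulate([1, 0, 1, 0], k=8, constant=0b0101010)
sig_faulty = nlfsr_accumulate([0, 1, 1, 0], k=8, constant=0b0101010)
```

Because every earlier response bit is shifted before the next one is added, the position of each bit influences the signature, unlike in a plain count of 1s.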
Evaluation and Comparisons
In order to validate the effectiveness of the proposed scheme, we performed several simulations. We use a customized fault simulator that implements the various response compaction schemes and computes the signatures for all single stuck-at faults in the CUT.
For our experiments we use the non-redundant versions of the ISCAS'85 benchmark circuits. These circuits are combinational parts of datapaths and are therefore likely to be accompanied by an accumulator. We consider that the registers at the inputs and outputs of the circuits form a scan path.
We also consider two test sets:
• Deterministic compacted test sets derived using the Test Synthesis tools by Synopsys. We assume that the test vectors of these test sets are serially applied to the scan register of the circuit.
• Pseudorandom test sets produced by LFSRs. For each benchmark circuit we choose a primitive polynomial of degree m (m = 25 or 31 in our experiments) based on the guidelines given in [9] and construct the corresponding LFSR. We feed the scan register of the circuit with the output of the LFSR until we achieve the desired fault coverage (100%, or less in the case of circuits with random pattern resistant faults).
The number of vectors in each test set used as well as the pre-compaction fault coverage for each circuit are given in Table 1.
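The bit-serial pseudorandom source can be sketched as a Fibonacci LFSR. The sketch below is only an illustration: the polynomials actually used in the experiments are not listed in the text, so the example uses a small degree-4 maximal-length configuration, and the function name, tap convention and seed are assumptions.

```python
def lfsr_bits(taps, seed, n):
    """Generate n output bits from a right-shifting Fibonacci LFSR.

    taps -- feedback tap positions, the largest one being the LFSR degree
    seed -- non-zero initial state
    """
    degree = max(taps)
    state = seed & ((1 << degree) - 1)
    if state == 0:
        raise ValueError("the all-zero state locks up the LFSR")
    out = []
    for _ in range(n):
        out.append(state & 1)                  # serial output fed to the scan path
        feedback = 0
        for t in taps:
            feedback ^= (state >> (degree - t)) & 1
        state = (state >> 1) | (feedback << (degree - 1))
    return out


# Degree-4 example: the sequence repeats with the maximal period 2^4 - 1 = 15.
stream = lfsr_bits(taps=(4, 3), seed=1, n=30)
```

A degree-m LFSR built on a primitive polynomial cycles through all 2^m - 1 non-zero states, which is why large degrees such as 25 or 31 give sequences far longer than the test lengths needed here.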
At first we evaluate the bit-serial response compaction schemes proposed in [6], that is, the bit-serial accumulator with and without stored carry feedback and the serial-parallel accumulator with and without stored carry feedback. Table 2 presents results for three different datapath sizes (k = 8, 16 or 32).
From the results of Table 2 we can see that:
• The post-compaction fault coverage drop in the bit-serial accumulator scheme is very high in the case of compacted test sets for the ISCAS'85 benchmark circuits. It is smaller in the case of pseudorandom test sets but still remains at high levels.
• For small accumulators (k = 8), there are some cases where the bit-serial accumulator without carry feedback gives slightly better results than the bit-serial accumulator with stored carry feedback.
• The bit-serial accumulator with stored carry feedback gives the same results as the bit-serial accumulator without stored carry feedback for large accumulators (k = 16 and 32). This can be justified by the fact that these response compaction schemes implement a ones' count function, and therefore a carry is unlikely to occur with a few hundred or a few thousand test vectors. When the size of the accumulator is small [...].
• Furthermore, each one of the two schemes produces the same results for k = 16 and 32, indicating that an increase in the size of the accumulator will not lead to any better results.
• The serial-parallel schemes produce much better results compared to the bit-serial schemes. In this case, the scheme with stored carry feedback achieves better results than the scheme without stored carry feedback, and the results improve with larger accumulator sizes. However, in many cases the post-compaction fault coverage drop is more than 1%.
We conclude that the above-mentioned schemes are inadequate for providing a small post-compaction fault coverage drop. We now evaluate the proposed scheme. In order to evaluate the effect of the constant value that is selected for the k-1 bits that are added together with the CUT's output response bit, we present results in Table 3 for three different values: (a) 0000..., (b) 0101... and (c) a random value.
We can easily see that, in all cases, the proposed scheme achieves far better results than the other accumulator-based schemes. We can also see that the value 0000..., as expected, is not the best choice for the k-1 bits. The other two values produce zero fault coverage drop in almost all cases when k = 16 or 32, and a very small fault coverage drop for small accumulator sizes (e.g. k = 8).
The mean value of the post-compaction fault coverage drops measured on the ISCAS'85 circuits for the various examined accumulator sizes is presented in Figure 5.
Conclusions
BIST approaches are gaining increasing interest in today's complex integrated circuits. There are several cases of circuit or sub-circuit BIST in which bit-serial testing structures are more appropriate than their bit-parallel counterparts. Cores with an isolation ring and sub-circuits equipped with a scan path are such examples. If these testing structures can be derived by slight modifications of already existing hardware, then BIST can be added with a minimum increase in implementation area.
In this paper we have analyzed and evaluated the already known schemes for bit-serial test response compaction and the bit-serial version of the parallel test response compaction scheme proposed in [5].
Experimental results on the ISCAS'85 benchmark circuits using both deterministic and pseudorandom test sets show that the post-compaction fault coverage drop of the proposed bit-serial compactor is significantly lower than that of the already known bit-serial or serial-parallel response compaction schemes. Moreover, the area required for the accumulator modifications is very small.
