We describe an on-chip test generation scheme for synchronous sequential circuits that allows at-speed testing of such circuits. The proposed scheme is based on loading of (short) input sequences into an on-chip memory, and expansion of these sequences on-chip into test sequences. Complete coverage of modeled faults is achieved by basing the selection of the loaded sequences on a deterministic test sequence T 0 , and ensuring that every fault detected by T 0 is detected by the expanded version of at least one loaded sequence. Experimental results presented for benchmark circuits show that the length of the sequence that needs to be stored at any time is on the average 10% of the length of T 0 , and that the total length of all the loaded sequences is on the average 46% of the length of T 0 .
Introduction
On-chip generation of test sequences for synchronous sequential circuits allows at-speed test application under the normal operating conditions of the circuit. Under such a test application scheme, test sequences are applied to the primary inputs of the circuit at-speed. The flip-flops are driven by the combinational logic of the circuit and clocked at-speed, in the same way as during the normal operation of the circuit. At-speed testing is important for detecting defects that affect the timing behavior of a circuit [1], [2]. As the accuracy of automatic test equipment (ATE) falls behind the clock speeds of VLSI chips, on-chip generation and application of at-speed tests become an important alternative to ATE-based test application. On-chip generation of test sequences for synchronous sequential circuits without modifying the circuit flip-flops was considered in [3] and [4]. In [3], test sequences produced by an LFSR were modified by holding certain input vectors (i.e., by applying them repeatedly) for several time units. This was shown to improve the stuck-at fault coverage of the resulting test sequence compared to the test sequence produced by an LFSR without the hold option. In [4], special hardware was designed to generate test sequences that achieve high coverage of stuck-at faults. Both techniques rely solely on on-chip generation of test sequences, and neither guarantees that all the detectable stuck-at faults will be detected.
To achieve the goal of detecting every detectable fault, it is possible to load a test sequence that was generated off-chip (i.e., by a test generation procedure) into an on-chip memory, and then apply it to the circuit at-speed. This approach guarantees that every stuck-at fault detected by the off-chip test sequence will also be detected on-chip. However, the resulting storage requirements may be unacceptably high, and so may the time needed to load the test sequence into the on-chip memory. Partitioning the test sequence into subsequences and loading each subsequence separately may reduce the on-chip memory requirements; however, the total loading time of the subsequences remains the same as the loading time of the unpartitioned sequence. In addition, each subsequence may be long, either because the subsequences must together yield the same fault coverage as the unpartitioned sequence, or because fewer load cycles are desirable. Thus, the on-chip storage requirements may again be large. Other methods for generating test sequences on-chip encode a sequence to reduce memory requirements [5]. However, decoding the sequence typically precludes at-speed test application. If the requirement for at-speed testing can be relaxed, encoding can also be used to reduce the memory requirements of the scheme proposed here.
In this work, we propose an on-chip test generation scheme that combines the advantages of approaches such as [3] and [4] with the advantages of storing test sequences on-chip. Under the proposed approach, short input sequences are loaded and stored in an on-chip memory. Each input sequence is then expanded on-chip into a test sequence. Together, the expanded test sequences achieve the same fault coverage as a given test sequence that was generated off-chip. Next, we describe the proposed scheme and its relationship to existing approaches in more detail.

Permission to make digital/hard copy of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC 99, New Orleans, Louisiana. (c) 1999 ACM 1-58113-109-7/99/06...$5.00
It was observed in [6] that the fault coverage achieved by an LFSR applying test patterns to a combinational circuit can be increased by allowing the LFSR to apply a set of test patterns S, followed by the set S̄ consisting of the complement of every pattern s ∈ S. For synchronous sequential circuits, it was observed in [3] that holding certain input vectors for several time units can increase the fault coverage achieved by a test sequence produced by an LFSR. In [4], the test sequences produced by a built-in test-pattern generator were repeated, complemented and reversed to increase their fault coverage. The techniques of [3], [4] and [6] have in common the use of simple manipulations of a test set or test sequence S in order to obtain an expanded test set or test sequence S exp with higher fault coverage than S. Related approaches based on the manipulation of individual test vectors are described in [7], [8], [9] and [10]. In the arithmetic BIST approach of [7], an adder available as part of the circuit may be used to generate a new test pattern by adding a constant value to the test pattern currently applied to the circuit. The resulting set of patterns is comparable to that produced by a random pattern generator such as an LFSR. In [8], the next test vector to be applied to the circuit is obtained from the current test vector by complementing certain bits of it. The implementation described in [8] complements single bits; additional functions mentioned in [8] are complementation of multiple bits or of complete test vectors, and shifting and rotation of test vectors. The implementation of [8] stores on-chip a description of the function to be applied to each test vector. Each memory word describes the function that must be applied to one test vector in order to obtain the next one.
The number of functions that need to be stored in [8] is therefore one less than the number of vectors applied to the circuit. Only combinational circuits were considered in [8], since only for such circuits can the test set be reorganized so that consecutive test vectors are obtained from each other by simple functions that can be stored on-chip. Complete fault coverage is obtained by reorganizing a complete test set for the circuit. In [9] and [10], a built-in self-test method for scan designs was proposed. It applies a set of vectors, called centers, to the circuit through the scan chain, and derives additional vectors from the centers by randomly complementing some of their bits. Since the tests are applied through the scan chain, they may not be applicable at-speed unless the scan chain is designed to operate at the system clock speed.
In this work, we define a set of functions that can be applied to test sequences of a synchronous sequential circuit in order to obtain longer sequences with higher fault coverage. The functions we use are similar to those of [3], [4], [6], [7] and [8], and include repetition of the test sequence, complementation of the complete sequence, shifting of every vector, and reversal of the complete sequence. As part of our study, we propose a procedure for deriving a set of sequences Σ to which these functions are applied. We define the set Σ exp to consist of the sequences of Σ after the selected functions are applied to them: for every sequence S ∈ Σ, Σ exp contains the expanded version S exp of S. Each sequence in Σ exp is applied assuming that the circuit starts from an unknown state. Thus, the sequences in Σ exp are independent of each other.
An important feature of the set Σ computed here is that the fault coverage achieved by the sequences in Σ exp is the same as or higher than the fault coverage achieved by a given off-chip test sequence. To guarantee this property, Σ is derived from a test sequence T 0 that was generated off-chip and achieves the desired fault coverage. In our implementation, Σ consists of selected subsequences of T 0. Since Σ exp, and not Σ, is applied to the circuit, the total length of all the sequences in Σ may be smaller than the length of T 0, and the difference may be significant. This situation is illustrated by Figure 1. In the figure, three sequences, S 1, S 2 and S 3, are derived from a test sequence T 0, and an expanded version of S i is applied to the circuit for i = 1, 2, 3. The total length of S 1, S 2 and S 3 is smaller than the length of T 0. This reduction in length is possible because expanded sequences are used during test application. Another important consequence of using expanded sequences is that the maximum length of any sequence in Σ can be kept relatively low, which reduces the on-chip memory needed to store a sequence.
Figure 1: Sequences under the proposed scheme
The following test application scheme is implied by the discussion above. Each sequence in Σ is loaded into an on-chip memory at the tester speed. After each sequence S ∈ Σ is loaded, its expanded sequence S exp is produced on-chip and applied to the circuit at-speed. In this scheme, the size of the memory need only be large enough to hold the longest sequence contained in Σ. There is no need to hold the circuit state during the loading of a sequence, since the sequences are computed under the assumption that the circuit state is unknown before the application of each expanded sequence. The memory used for storing the loaded sequences may be a memory that already exists on the chip, or a memory added for test application.
Compared to the methods of [6], [7], [8], [9] and [10], which are aimed at combinational circuits, the proposed method targets synchronous sequential circuits. Compared to the methods of [3] and [4], the proposed method guarantees that all the faults detected by an off-chip test sequence T 0 are detected. Compared to loading the test sequence T 0 and applying it at-speed, the proposed method loads only some of the vectors included in T 0. In addition, at any given time, the proposed method needs to store only a relatively small number of vectors out of T 0. Consequently, it reduces both the loading time and the memory requirements. Another advantage of the proposed scheme is that it applies at-speed more test vectors than T 0 contains, and therefore potentially achieves better coverage of defects that affect circuit delays. Compared to schemes that partition T 0 and load each subsequence separately, the proposed procedure has the following advantages. (1) The total length of the sequences loaded by the proposed method is shorter than T 0, whereas partitioning includes every test vector of T 0 in at least one loaded subsequence. (2) When partitioning is used, the maximum subsequence length may be high because the subsequences must together achieve the same fault coverage as T 0. Compared to methods that encode off-chip test sequences, the proposed method allows at-speed testing of the circuit.
The proposed test application scheme is general purpose, and can be matched to any circuit. It is only necessary to adjust the size of the memory to the length of the longest sequence in Σ and to the number of circuit primary inputs. The structure of the hardware required to manipulate the test sequences is independent of the circuit.
The proposed method deals with the application of test sequences. For the output response of the circuit-under-test, output response compression can be used, and the compressed response of the circuit can be compared with a precomputed signature of the fault-free circuit. Care must be taken to synchronize the circuit before the application of every test sequence, so that unknown values do not enter the computation of the signature.
The paper is organized as follows. In Section 2 we define sequence manipulations that can be easily implemented in hardware or software, and can potentially increase the fault coverage achieved by a given sequence. In Section 3 we describe a procedure for selecting a set of sequences Σ based on a given test sequence T 0 . With the manipulations defined in Section 2, Σ exp achieves the same fault coverage as T 0 . In Section 4 we present experimental results and compare the size of Σ with the length of T 0 . Section 5 concludes the paper.
Sequence manipulations
In this section, we introduce several operations that allow us to obtain an expanded sequence S exp from a given sequence S, and discuss their hardware implementation. We assume that S is stored in an on-chip memory, and that S exp is applied to the circuit by using a counter (the memory address counter) that goes through the memory addresses in consecutive order. Additional hardware is needed depending on the required operation.

Repetition: The expanded sequence S^n is obtained from S by repeating the sequence S n times. For example, for the sequence S = (000, 111), we obtain S^2 = (000, 111, 000, 111), S^3 = (000, 111, 000, 111, 000, 111), and so on. Repetition can be implemented by adding a counter that is incremented by one every time the memory address counter returns to zero after going through all the memory addresses.
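The address sequencing behind repetition can be modeled in software. The sketch below is a hypothetical behavioral model in Python, not a hardware description: `address_stream` stands in for the memory address counter plus the added repetition counter, the `down` flag models the up/down counter used for reversal, and representing vectors as bit strings is an assumption made only for illustration.

```python
from typing import Iterator, List


def address_stream(mem_len: int, n: int, down: bool = False) -> Iterator[int]:
    """Hypothetical model of the memory address counter plus a repetition counter.

    Yields the memory addresses visited while the stored sequence is applied n times.
    """
    if mem_len < 1 or n < 1:
        return
    rep_counter = 0                          # incremented once per full pass
    addr = mem_len - 1 if down else 0        # memory address counter
    while rep_counter < n:
        yield addr
        if down:                             # down mode (used for reversal)
            if addr == 0:                    # pass finished at address 0
                rep_counter += 1
                addr = mem_len - 1
            else:
                addr -= 1
        else:                                # up mode: consecutive addresses
            addr = (addr + 1) % mem_len
            if addr == 0:                    # counter wrapped back to zero
                rep_counter += 1


def repeat(memory: List[str], n: int) -> List[str]:
    """Read the stored sequence S through the address stream to obtain S^n."""
    return [memory[a] for a in address_stream(len(memory), n)]


print(repeat(["000", "111"], 2))  # ['000', '111', '000', '111']
```

The repetition counter only observes wrap-arounds of the address counter, so the memory itself never holds more than one copy of S.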
Complementation: The sequence S̄ is obtained from S by complementing every vector included in S. For example, for the sequence S = (000, 111), we obtain S̄ = (111, 000). Complementation can be implemented by adding inverters on the memory outputs, and using multiplexers to select between the complemented and uncomplemented outputs.

Shifting: The sequence S << 1 is obtained from S by shifting every vector of S to the left by one position. A circular shift is used to prevent the rightmost bits from becoming all-0 or all-1. For example, for the sequence S = (001, 101), we obtain S << 1 = (010, 011). Shifting can be implemented by adding a multiplexer on every memory output. The multiplexer on output i is driven from output i (for non-shifted vectors) and from output (i + 1) mod m (for shifted vectors). Here, we assume that i = 0 corresponds to the most-significant output, and that the word size of the memory is m.

Reversal: The sequence rS is obtained from S by reversing the order of the vectors in S. For example, for the sequence S = (000, 001, 111), we obtain rS = (111, 001, 000). Reversal can be implemented by using an up/down counter, operated in the down mode, as the memory address counter.

The expanded test sequence: We combine all the operations above to obtain from S an expanded test sequence S exp as follows. We repeat the sequence S n times to obtain S′ exp = S^n. We complement S′ exp to obtain S′′ exp = S^n ⋅ S̄^n (here, ⋅ stands for concatenation). We shift S′′ exp to obtain S′′′ exp = S′′ exp ⋅ (S′′ exp << 1). Finally, we reverse S′′′ exp to obtain S exp = S′′′ exp ⋅ rS′′′ exp. Each of the last three steps doubles the length of the sequence, so a sequence S of length L is expanded into a sequence S exp of length 8nL.
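As a cross-check of the definitions, the following minimal Python sketch composes the four operations in the order described above. Vectors are modeled as bit strings whose leftmost character is output 0 (the most-significant output); this representation, like the function names, is an assumption for illustration only. Applied to T′ = (1011) with n = 1, it reproduces the expanded sequence listed for s27 in the sequence selection section.

```python
from typing import List

Vector = str  # one input vector as a bit string; index 0 = most-significant output


def complement(seq: List[Vector]) -> List[Vector]:
    """Complement every bit of every vector (S -> S-bar)."""
    flip = {"0": "1", "1": "0"}
    return ["".join(flip[b] for b in v) for v in seq]


def shift_left(seq: List[Vector]) -> List[Vector]:
    """Circular left shift by one position: new bit i = old bit (i + 1) mod m."""
    return [v[1:] + v[:1] for v in seq]


def reverse(seq: List[Vector]) -> List[Vector]:
    """Reverse the order of the vectors (S -> rS)."""
    return seq[::-1]


def expand(s: List[Vector], n: int) -> List[Vector]:
    """Build S exp from S with n repetitions."""
    s1 = s * n                    # S'   = S^n
    s2 = s1 + complement(s1)      # S''  = S^n . S-bar^n
    s3 = s2 + shift_left(s2)      # S''' = S'' . (S'' << 1)
    return s3 + reverse(s3)       # S exp = S''' . r S'''


print(expand(["1011"], 1))
# ['1011', '0100', '0111', '1000', '1000', '0111', '0100', '1011']
```

Because each of the last three steps concatenates the sequence with a manipulated copy, `len(expand(s, n))` is always `8 * n * len(s)`.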
Sequence selection
The selection of the set of sequences Σ to be expanded on-chip is based on a given test sequence T 0 that achieves the target fault coverage for the circuit. Our goal is to construct Σ such that the set of expanded sequences Σ exp derived from the sequences in Σ would achieve the same fault coverage as T 0 . In this section, we describe the selection of Σ. We first describe the basic procedure in Subsection 3.1. We then describe a postprocessing procedure to reduce the size of Σ in Subsection 3.2.
The basic procedure
Initially, we set Σ = φ. Subsequences for yet-undetected faults are extracted from T 0 and added to Σ, until the set of expanded sequences Σ exp detects all the faults detected by T 0 . A subsequence S added to Σ to detect a target fault f is not required to detect f , but rather, the expanded sequence S exp obtained from S is required to detect f . The sequence S is selected such that it is as short as possible, yet satisfies this condition. The overall structure of the procedure for constructing Σ is given as Procedure 1 below and described next. The given test sequence T 0 is first simulated. The set of faults detected by T 0 is denoted by F. For every fault f ∈ F, the first time unit where f is detected is denoted by u det ( f ). The set F targ contains all the faults in F which are not yet detected by the expanded versions of the sequences in Σ. Initially, Σ = φ, and F targ = F. In each iteration of Procedure 1, a fault f ∈ F targ is considered. A sequence S is constructed based on f by calling Procedure 2 (described below). Procedure 2 constructs S such that its expanded version S exp detects f . The sequence S is added to Σ. Thus, in each iteration of Procedure 1, at least one additional fault is detected, and the procedure is guaranteed to terminate with a set of sequences Σ such that Σ exp detects every fault in F.
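The loop of Procedure 1 can be sketched as below. Fault simulation is abstracted away: `construct` stands in for Procedure 2 and `simulate_exp` for fault simulation of an expanded sequence from the all-unspecified state. Both are hypothetical callables, and breaking ties in u det ( f ) by fault name is our own assumption; the procedure selects the target with the highest u det ( f ). Because the expanded sequences are independent of each other, each new S exp only needs to be simulated against the current F targ.

```python
from typing import Callable, Dict, List, Set

Fault = str
Vector = str


def select_sequences(
    u_det: Dict[Fault, int],
    construct: Callable[[Fault], List[Vector]],
    simulate_exp: Callable[[List[Vector], Set[Fault]], Set[Fault]],
) -> List[List[Vector]]:
    """Sketch of Procedure 1.

    u_det:        first detection time under T0 for every fault in F
    construct:    hypothetical stand-in for Procedure 2 (S exp must detect f)
    simulate_exp: hypothetical simulator returning the faults of the given set detected by S exp
    """
    sigma: List[List[Vector]] = []
    f_targ: Set[Fault] = set(u_det)                    # F_targ = F, Sigma = empty
    while f_targ:
        f = max(f_targ, key=lambda g: (u_det[g], g))   # highest u_det (ties by name)
        s = construct(f)
        sigma.append(s)
        detected = simulate_exp(s, f_targ)
        if f not in detected:
            raise RuntimeError(f"construct({f!r}) returned a sequence that misses it")
        f_targ -= detected                             # fault dropping
    return sigma


# Toy usage with made-up faults: the sequence built for "b" also detects "a".
coverage = {"a": {"a"}, "b": {"a", "b"}, "c": {"c"}}
print(select_sequences({"a": 1, "b": 3, "c": 2},
                       lambda f: [f],
                       lambda s, targ: coverage[s[0]] & targ))  # [['b'], ['c']]
```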
The fault f selected for the construction of the next sequence S is the one with the highest detection time u det ( f ) of all the faults in F targ. The reason for this choice is that faults with higher detection times tend to be more difficult to detect, and the test sequences that detect them tend to be longer and to detect a larger number of additional faults.

Procedure 1: The overall sequence selection procedure
(1) Simulate T 0 to find the set of detected faults F. For every f ∈ F, store in u det ( f ) the first time unit where f is detected. Set Σ = φ and F targ = F.
(2) Select the fault f ∈ F targ with the highest value of u det ( f ).
(3) Call Procedure 2 below to construct a sequence S based on f. Add S to Σ.
(4) Simulate the faults in F targ under the expanded version S exp of S. Drop from F targ every fault that is detected by S exp.
(5) If F targ ≠ φ, go to Step 2.

Next, we describe Procedure 2, which constructs a sequence S for a given fault f. We demonstrate the procedure on ISCAS-89 benchmark circuit s27 under the test sequence shown in Table 2, using n = 1 repetition to obtain expanded test sequences. In Table 2, we show for every time unit u the vector T 0 [u] of T 0 at time unit u, and the faults detected at time unit u (the faults with u det ( f ) = u). We consider the fault f 10 with u det ( f 10 ) = 9. This is the highest detection time of any fault, making f 10 the first fault selected by Procedure 1.
We use the following notation. The subsequence of T 0 that starts at time unit u 1 and ends at time unit u 2 is denoted by T 0 [u 1, u 2]. We say that f is detected by T 0 [u 1, u 2] if f is detected when T 0 [u 1, u 2] is applied with both the fault-free and the faulty circuits in the all-unspecified state. The goal of Procedure 2 is to find the shortest possible sequence T′ such that T′ exp detects f. Since T 0 detects f, T′ can be derived from T 0, as follows. Procedure 2 first finds a time unit u start such that the expanded version of T′ = T 0 [u start, u det ( f )] detects f. For this purpose, we start with u start = u det ( f ), and decrease u start until f is detected by T′ exp. Such a T′ always exists: in the worst case, u start = 0 and T′ = T 0 [0, u det ( f )] detects f, since it is the prefix of T 0 that ends at the detection time of f. Because T′ exp begins with T′, T′ exp also detects f in this case. For the fault f 10 of s27, we consider the sequences T′ = T 0 [9, 9] = (1011) corresponding to u start = 9, T′ = T 0 [8, 9] = (0000, 1011) corresponding to u start = 8, T′ = T 0 [7, 9] = (0000, 0000, 1011) corresponding to u start = 7, and so on. For every sequence T′, we compute T′ exp and simulate it. For illustration, the sequence T′ exp obtained for u start = 9 is (1011, 0100, 0111, 1000, 1000, 0111, 0100, 1011). The first (i.e., highest) value of u start for which T′ exp detects f 10 is u start = 6, with T′ = T 0 [6, 9] = (1001, 0000, 0000, 1011).
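The first phase of Procedure 2, the search for u start, grows a window of T 0 that always ends at u det ( f ). A minimal sketch follows; `exp_detects` is a hypothetical oracle that expands a candidate T′ and reports whether f is detected, and the toy sequence and oracle in the usage line are made up for illustration, not taken from Table 2.

```python
from typing import Callable, List, Sequence

Vector = str
ExpDetects = Callable[[List[Vector]], bool]  # hypothetical: does T' exp detect f?


def find_window(t0: Sequence[Vector], u_det: int, exp_detects: ExpDetects) -> List[Vector]:
    """Return T' = T0[u_start, u_det] for the highest u_start whose expansion detects f."""
    for u_start in range(u_det, -1, -1):          # u_start = u_det, u_det - 1, ..., 0
        candidate = list(t0[u_start:u_det + 1])   # T0[u_start, u_det], both ends inclusive
        if exp_detects(candidate):
            return candidate
    # Not reached when T0 detects f at u_det, because T' exp begins with T'.
    raise ValueError("expansion of T0[0, u_det] does not detect f")


# Toy oracle (hypothetical): f counts as detected whenever vector "10" is present.
print(find_window(["00", "01", "10", "11", "00"], 4, lambda s: "10" in s))
# ['10', '11', '00']
```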
Once T′ is found, we further reduce its length by omitting test vectors from it. The test vector at time unit u of T′, denoted by T′ [u], can be omitted if T′ exp still detects f after the omission. Procedure 2 considers the test vectors of T′ in a random order. If T′ exp detects f after T′ [u] is omitted from T′, the omission is accepted, T′ is redefined to be the sequence without the omitted vector, and all the time units of the shortened sequence are considered again. Otherwise, T′ [u] is restored into T′. The procedure terminates when all the time units have been considered without being able to omit any additional vector. For illustration, we compact T′ = (1001, 0000, 0000, 1011), obtained for f 10, by omitting test vectors in a random order. The test vectors of T′ are numbered 0, 1, 2 and 3 from left to right. Omitting the test vector at time unit 2 of T′, we obtain T′ = (1001, 0000, 1011). The corresponding T′ exp detects f 10, and the omission is accepted. We renumber the test vectors of T′ as 0, 1 and 2. Omitting the test vector at time unit 1, we obtain (1001, 1011). The corresponding T′ exp does not detect f 10, and we restore the omitted vector to obtain T′ = (1001, 0000, 1011) again. Omitting the test vector at time unit 2, we obtain T′ = (1001, 0000). The corresponding T′ exp detects f 10, and the omission is accepted. We renumber the test vectors of T′ = (1001, 0000) as 0 and 1. Omitting the test vector at time unit 1, we obtain (1001). The corresponding T′ exp does not detect f 10, and we restore the omitted vector to obtain T′ = (1001, 0000). Omitting the test vector at time unit 0, we obtain (0000). The corresponding T′ exp does not detect f 10, and we restore the omitted vector to obtain T′ = (1001, 0000). We have now considered both test vectors of T′ without being able to omit either of them.
Therefore, the procedure terminates with T′ = (1001, 0000). Procedure 2 is given next.

Procedure 2: Finding the sequence T′ for a fault f
(1) Set u start = u det ( f ).
(2) Set T′ = T 0 [u start, u det ( f )].
(3) Compute T′ exp and simulate f under it. If f is not detected, set u start = u start − 1 and go to Step 2.
(4) Set U = {0, 1, . . . , L − 1}, where L is the length of T′.
(5) If U = φ, stop: T′ is the required sequence.
(6) Randomly select a time unit u ∈ U. Remove u from U.
(7) Omit T′ [u] from T′.
(8) Compute T′ exp and simulate f under it. If f is not detected, restore T′ [u], and go to Step 5.
(9) Go to Step 4.

After the generation of T′ for f 10 as above, the application of Procedure 1 to s27 continues as follows. Simulating T′ exp for T′ = (1001, 0000), we find that 26 of the 32 faults of s27 are detected. The remaining undetected faults are f 13, f 14, f 18, f 19, f 25 and f 26. From Table 2, the undetected fault with the highest detection time under T 0 is f 13, with u det ( f 13 ) = 5, so the next sequence is constructed for f 13. The highest value of u start for which T′ exp detects f 13 is obtained for T′ = T 0 [3, 5] = (1001, 0100, 1011). After the omission of test vectors, we obtain T′ = (1001). Simulating T′ exp for T′ = (1001), we find that one additional fault, f 13, is detected. The next sequence is constructed for f 18, and its expanded version detects the remaining five faults.
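Steps 4 to 9 of Procedure 2, the random omission of vectors, can be sketched as follows. `exp_detects` is again a hypothetical oracle, the seeded `random.Random` only makes the random order reproducible, and the toy oracle in the usage example is chosen so that its outcome matches the s27 walk-through; it is not derived from s27 and is not a fault simulator.

```python
import random
from typing import Callable, List, Optional

Vector = str
ExpDetects = Callable[[List[Vector]], bool]  # hypothetical: does T' exp detect f?


def omit_vectors(t_prime: List[Vector], exp_detects: ExpDetects,
                 rng: Optional[random.Random] = None) -> List[Vector]:
    """Drop vectors from T' in random order while T' exp still detects f."""
    rng = rng or random.Random(0)
    seq = list(t_prime)
    while True:
        untried = list(range(len(seq)))        # Step 4: U = {0, ..., L - 1}
        rng.shuffle(untried)                   # Step 6: random selection order
        for u in untried:
            trial = seq[:u] + seq[u + 1:]      # Step 7: omit T'[u]
            # An empty sequence cannot detect f, so it is never accepted.
            if trial and exp_detects(trial):   # Step 8: is f still detected?
                seq = trial                    # accept the omission
                break                          # Step 9: back to Step 4
            # otherwise T'[u] stays in seq (restored) and the next u is tried
        else:
            return seq                         # Step 5: U is empty, stop


def toy(s: List[Vector]) -> bool:
    """Made-up oracle: detected iff both "1001" and "0000" remain."""
    return "1001" in s and "0000" in s


print(omit_vectors(["1001", "0000", "0000", "1011"], toy))  # ['1001', '0000']
```

Restarting the scan after every accepted omission mirrors Step 9; a single pass without restarts would be cheaper but could leave vectors that became removable only after a later omission.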
Postprocessing
In this subsection, we consider the possibility of reducing the number of sequences in the set Σ constructed by Procedure 1.
The sequences in Σ are simulated when they are first generated, and the faults they detect are dropped from the set of target faults. As with other test generation procedures, it is possible that all the faults detected by the expanded version of a sequence S i are also detected by the expanded versions of sequences S i+1, S i+2, . . . that were added to Σ after S i. In this case, S i can be eliminated from Σ.
We identify sequences S i that can be eliminated from Σ by simulating the expanded sequences in several different orders. In each simulation pass, we start from the set of all target faults, and drop the faults detected by each expanded sequence when it is simulated. If the expanded version of a sequence S i detects no new faults when it is simulated, S i is removed from Σ.
The sequences are simulated in the following orders. First, we simulate the sequences by increasing length. Since the longest sequences are simulated last, this helps drop any of them that has become unnecessary. Second, we simulate the sequences by decreasing length. Since the longer sequences are expected to detect more faults, some of the shorter sequences, which are now simulated last, are likely to be unnecessary; this pass drops them. Third, we simulate the sequences in the reverse order of their generation. This helps drop sequences that became unnecessary after additional sequences were added to Σ. Finally, we simulate the sequences by decreasing number of faults they detected during the previous simulation pass. The motivation for this step is that sequences that detect a small number of faults are likely to be identified as unnecessary if they are simulated at the end. We refer to this four-step process as static compaction of Σ.
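Because every sequence in Σ exp is applied from an unknown state, the set of faults detected by each expanded sequence can be computed once, and each simulation pass with fault dropping then reduces to set operations. The sketch below assumes these per-sequence detection sets are given (`detects[i]` for the expanded version of S i, a hypothetical input); the four passes follow the orders listed above, and ties keep their previous relative order because Python's sort is stable, which is our own tie-breaking choice.

```python
from typing import Dict, List, Sequence, Set, Tuple

Fault = str


def compaction_pass(order: Sequence[int], detects: Sequence[Set[Fault]],
                    targets: Set[Fault]) -> Tuple[List[int], Dict[int, int]]:
    """One fault-dropping simulation pass over the expanded sequences in `order`."""
    remaining = set(targets)
    kept: List[int] = []
    counts: Dict[int, int] = {}
    for i in order:
        newly = detects[i] & remaining     # still-undetected faults that S_i exp detects
        counts[i] = len(newly)
        if newly:                          # S_i contributes: keep it
            kept.append(i)
            remaining -= newly
        # otherwise S_i detects nothing new and is dropped from Sigma
    return kept, counts


def static_compaction(lengths: Sequence[int], detects: Sequence[Set[Fault]],
                      targets: Set[Fault]) -> List[int]:
    """Four passes; returns the indices (generation order) of the surviving sequences."""
    keep = list(range(len(lengths)))
    keep, _ = compaction_pass(sorted(keep, key=lambda i: lengths[i]), detects, targets)
    keep, _ = compaction_pass(sorted(keep, key=lambda i: -lengths[i]), detects, targets)
    keep, counts = compaction_pass(sorted(keep, reverse=True), detects, targets)
    keep, _ = compaction_pass(sorted(keep, key=lambda i: -counts[i]), detects, targets)
    return sorted(keep)


# Toy data: the longest sequence (index 0) is covered by the other two.
print(static_compaction([3, 1, 2], [{"a", "b"}, {"a"}, {"b", "c"}], {"a", "b", "c"}))
# [1, 2]
```

A sequence is dropped only when everything it detects was already detected earlier in the same pass, so the surviving sequences always cover all the target faults.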
Experimental results
We applied Procedure 1 followed by static compaction of Σ to ISCAS-89 benchmark circuits. The test sequences used as T 0 in the construction of Σ are the ones generated by STRATEGATE [11] and compacted by the static compaction procedure of [12]. In four separate experiments, we used n = 2, 4, 8 and 16 repetitions to obtain Σ exp. For each circuit, we report the results obtained with the best value of n. The best value of n is chosen by the following criteria, in this order of priority: the smallest maximum length of any sequence in Σ, the smallest total length of all the sequences in Σ, and the lowest run time. The results are reported in Table 3. After the circuit name, we show the total number of faults, the number of faults detected by the test sequence T 0, and the length of T 0. Next, we show the value of n, and the following information for Σ before static compaction. The number of sequences in Σ is shown under column |Σ|, the total length of all the sequences in Σ under column tot len, and the maximum length of any sequence in Σ under column max len. Under column after comp., we show the results after static compaction is used to drop unnecessary sequences from Σ.

In Table 4, we show the normalized run times of Procedure 1 and of the compaction procedure. Each run time is normalized by dividing it by the time to simulate T 0, which helps factor out inefficiencies of the implementation. Additional information is shown in Table 5. Table 5 includes the original sequence length, the number of repetitions n, and the number of sequences in Σ. The total length of the sequences in Σ is shown next, followed by the ratio of this total length to the length of the original sequence T 0. The maximum length of any sequence in Σ is shown next, followed by the ratio of this maximum length to the length of T 0. In the last column of Table 5, we show the total length of the sequences in Σ exp, i.e., the total length of all the test sequences applied to the circuit. With n repetitions, a sequence S of length L is expanded into a sequence of length 8nL: repetition yields a sequence of length nL, and each of the complementation, shifting and reversal steps doubles the length by concatenating the sequence with its manipulated copy. Thus, if the total length of the sequences in Σ is Λ, the total length of the sequences in Σ exp is 8nΛ. In the last row of Table 5, we show the average ratios of the total and maximum lengths to the original sequence length. On the average, the proposed scheme requires loading fewer than half the number of vectors included in T 0 (46%), and the sequence that needs to be stored at any given time is about a tenth of the length of T 0. Nevertheless, due to the use of expanded sequences, the fault coverage achieved is the same as that achieved by T 0.
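The quantities reported in Table 5, together with the memory size implied by the test application scheme (one memory word per vector of the longest sequence, one bit per primary input), follow directly from the lengths of the sequences in Σ. The helper below is an illustrative sketch: its name, inputs and the example numbers are hypothetical and are not taken from Tables 3 to 5.

```python
from typing import Dict, Sequence


def table5_row(seq_lengths: Sequence[int], t0_len: int, n: int,
               num_inputs: int) -> Dict[str, float]:
    """Derive load, storage and application figures from the lengths of the sequences in Sigma."""
    total = sum(seq_lengths)                  # Lambda: vectors loaded over the whole test
    longest = max(seq_lengths)                # memory words needed at any time
    return {
        "tot_len": total,
        "tot_ratio": total / t0_len,
        "max_len": longest,
        "max_ratio": longest / t0_len,
        "memory_bits": longest * num_inputs,  # word width = number of primary inputs
        "applied_len": 8 * n * total,         # total length of Sigma exp = 8 n Lambda
    }


print(table5_row([2, 1, 3], t0_len=12, n=4, num_inputs=4))
```

For these made-up lengths, 6 of 12 vectors are loaded (ratio 0.5), the memory holds 3 words of 4 bits, and 8 × 4 × 6 = 192 vectors are applied at-speed.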
Concluding remarks
We selected a function that can be used to expand a given sequence S into a longer sequence. The function consists of repetition of S, complementation of S, shifting of the vectors included in S, and reversal of the order of the test vectors in S. We then described a procedure to solve the following problem: given a test sequence T 0, find a set of subsequences Σ such that the expanded versions of the sequences in Σ achieve the same fault coverage as T 0, and the maximum length of any sequence in Σ is as small as possible. The proposed procedure was applied to at-speed, on-chip test generation for synchronous sequential circuits. Under the proposed scheme, the sequences in Σ are loaded into an on-chip memory, expanded into test sequences on-chip, and applied to the circuit-under-test at-speed. We presented experimental results that support the effectiveness of this scheme in achieving complete fault coverage for synchronous sequential circuits. The results showed that the total length of the sequences in Σ is shorter than the given test sequence T 0, which implies a reduction in loading time compared to loading T 0. In addition, the maximum length of any sequence in Σ was small compared to the length of T 0, which implies a reduction in memory requirements compared to storing T 0.
