The testability of basic DSP datapath structures using pseudorandom built-in self-test techniques is examined. The addition of variance-mismatched signals is identified as a testing problem, and the associated fault detection probabilities are derived in terms of signal probability distributions. A method of calculating these distributions is described, and it is shown how these distributions can be used to predict testing problems that arise from the correlation properties of test sequences generated using linear-feedback shift registers. Finally, it is shown empirically that variance matching using associativity transformations can reduce the number of untested faults by a factor of eight over variance-mismatched designs.
Introduction
Digital signal processing applications impose strict timing constraints on a test strategy. Often, even a single added layer of logic can significantly impair performance. In these applications, conventional built-in self-test techniques can result in unacceptable performance penalties. While the basic operations involved in high-performance signal processing (e.g., shifts, additions, multiplications, registers) are by themselves highly testable, their composition into larger systems often is not. Test insertion strategies that aim to improve the observability and controllability of these composite structures without knowledge of their underlying behavioral characteristics often result in unacceptable test overhead.
Here, we examine testability problems encountered in applying pseudorandom testing techniques to some fundamental DSP structures with the aim of identifying these problems early in the design process. A probabilistic analysis of the signal flow graph provides a means of locating pseudorandom testability problems prior to design synthesis. The designer can gauge the testability of a design early, allowing the possibility of restructuring the design in a way that improves its testability without impacting performance. Many of the test problems discussed here can be ameliorated by transformation of the signal flow graph, including width scaling and associative operator reordering.
From the perspective of the test designer, the analytical tools presented here enable more focused test insertion techniques by pinpointing the cause of a missed fault. In addition, these techniques can be used to examine the performance of linear-feedback shift-register (LFSR) based pseudorandom pattern generators (PRPGs) by modeling the correlation properties of the generated test sequences. This offers the possibility of objectively evaluating pseudorandom test generators based on their statistical and temporal properties. Furthermore, by providing insight into which tests are difficult to assert at the module level, it is possible to gauge the impact that the underlying gate-level fault representation has on reported fault coverage.
Over the past 20 years, the probabilistic behavior of logic circuits has been extensively studied [1], most recently in connection with estimation of power dissipation [2]. In linear networks, signal behavior can be characterized most efficiently through analysis of the signal flow graph, yielding insight into testing problems that is harder to discern at the gate level. For example, in DSP design it is common to encounter testing problems at the upper bits of adders. One reason for this is redundant sign bits: upper bits that always follow the most significant bit (MSB). The reason this is a test problem can be seen in Figure 1, which shows the top two full adders of a ripple-carry adder. The MSB and MSB-1 inputs to the adder are sign bits. If the output next-to-MSB bit always follows the output MSB, the carry logic of the next-to-MSB adder will be untestable. In some designs, the next-to-MSB output almost always follows the output MSB, but differs occasionally. In such designs the next-to-MSB adder carry logic is testable, but may be difficult to test using pseudorandom patterns.
Redundant sign bits can be identified using scaling [3, Sec. 6.9.2], a DSP design technique that is commonly used to adjust multiplier gains so that overflow is avoided. There is a close relationship between scaling and testability: scaling not only identifies redundant sign bits, but when these bits are removed, clears the path for other logic optimizations that remove redundant faults [4, 5]. Even after scaling, there may be upper adder bits that are "near-redundant", i.e., that almost always follow the MSB, and are correspondingly difficult to test. Furthermore, as the test signal passes through the datapath, it is altered in ways that may reduce its ability to assert some tests.
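As a rough illustration of how a worst-case range bound exposes redundant sign bits, the sketch below computes the output range of a weighted sum of input samples and reports how many upper bits of a fixed-width datapath can only repeat the sign. The function names, the restriction to integer coefficients, and the example widths are assumptions made for illustration; they are not taken from the designs studied here or from the scaling procedure of [3].

```python
def output_range(coeffs, input_bits):
    """Worst-case range of sum(c * x) over two's-complement inputs x."""
    x_min = -(1 << (input_bits - 1))
    x_max = (1 << (input_bits - 1)) - 1
    y_min = sum(min(c * x_min, c * x_max) for c in coeffs)
    y_max = sum(max(c * x_min, c * x_max) for c in coeffs)
    return y_min, y_max

def required_bits(y_min, y_max):
    """Smallest two's-complement width that holds [y_min, y_max]."""
    n = 1
    while not (-(1 << (n - 1)) <= y_min and y_max <= (1 << (n - 1)) - 1):
        n += 1
    return n

def redundant_sign_bits(coeffs, input_bits, datapath_bits):
    """Upper datapath bits that always equal the sign bit (hypothetical helper)."""
    needed = required_bits(*output_range(coeffs, input_bits))
    return max(datapath_bits - needed, 0)

# Illustrative values only: a 3-tap sum of 8-bit samples on a 16-bit datapath.
print(redundant_sign_bits([1, 2, 1], 8, 16))
```

In this example the sum never leaves the range of a 10-bit signal, so the six upper bits of the 16-bit adder are redundant sign bits and their carry logic cannot be exercised.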
In Section 3, the observations regarding redundant sign bits will be extended to identify in a general way adder tests that are difficult. The probability of asserting the difficult tests will be derived using signal probability distributions, and variance matching is proposed as a technique for improving the random-pattern testability of designs. Section 4 will discuss how signal distributions can be computed in general networks consisting of adders, shifters, and registers. Section 5 will then show how this analytical technique can be applied to identify test problems that arise out of the correlation properties of linear-feedback shift-registers. Section 6 describes experimental results, including fault simulation results for four filters. The effectiveness of design transformations that [...]

[Figure: a 12-bit input a11...a0 added to a sign-extended 7-bit input b6...b0, producing the sum s11...s0.]

[...] and $b_0$ is the MSB, $b_{N-1}$ is the LSB. Although signals take on only discrete values, their probability mass functions are plotted here for clarity as continuous density functions on the interval $[-1, 1)$. The area under density functions is normalized to 1.
The Variance Gap Problem
Many high-performance signal processing datapaths consist primarily of networks of shift, add, and delay elements. In our examination of large DSP datapaths constructed out of these primitives, we found that most of the difficult faults were located in the upper bits of the carry chains of adders and subtractors. This seems to be consistent with the anecdotal accounts of designers, who report difficulty in fully exercising the carry chains of adders. We found the problem to be most severe in adders where signals of greatly differing amplitudes are combined.
An example is shown in Figure 2, where a 7-bit number is added to a 12-bit number. The smaller B input is sign-extended to conform to the width of the adder, represented by the sxt operator in the register-transfer level (RTL) schematic. The B input may not be immediately recognizable as a 7-bit signal; for example, it might be a full 12-bit wide signal that only uses 7 bits of its dynamic range.
In an adder with a variance gap, it is difficult to generate the tests X10 and X01 at upper adder slices, where the cube specifies the values of the $a_i$, $b_i$, and carry inputs, respectively, and the B input is the lower-variance input. The upper bits of the B input follow the sign of B. If a particular test requires the carry input to an upper adder slice to differ from the sign of B, a number of the upper bits of the A input will be constrained. This is shown in Figure 3, where the test 001 is to be applied to the next-to-MSB full adder. The MSB adder is assumed to not generate a carry.
This test is difficult in the sense that it is dependent on the difference between the input widths, and requires fairly specific values to appear at the A input. If the A input values are uniformly distributed, [...] variance gap.
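The constraint on the A input can be checked by exhaustive enumeration for the 12-bit plus 7-bit example. The sketch below is a behavioral model of a ripple adder, not the gate-level circuit of the figures; the helper names are hypothetical, and both inputs are simply enumerated over all of their values.

```python
N = 12            # adder width, as in the example
B_BITS = 7        # width of the low-variance B input
SLICE = N - 2     # index of the next-to-MSB full adder

def sign_extend(value, from_bits, to_bits):
    """Re-encode a from_bits two's-complement pattern in to_bits bits."""
    if (value >> (from_bits - 1)) & 1:
        value |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return value

def slice_inputs(a, b, i):
    """Return (a_i, b_i, carry-in) seen by full adder i of a ripple adder."""
    low = (1 << i) - 1
    carry_in = ((a & low) + (b & low)) >> i
    return (a >> i) & 1, (b >> i) & 1, carry_in

def vectors_asserting(test):
    """Count the (A, B) pairs that assert `test` at SLICE and collect the
    values taken by bits B_BITS-1 .. SLICE of A in those pairs."""
    count, upper_patterns = 0, set()
    width = SLICE - (B_BITS - 1) + 1
    for b_raw in range(1 << B_BITS):
        b = sign_extend(b_raw, B_BITS, N)
        for a in range(1 << N):
            if slice_inputs(a, b, SLICE) == test:
                count += 1
                upper_patterns.add((a >> (B_BITS - 1)) & ((1 << width) - 1))
    return count, upper_patterns

count, patterns = vectors_asserting((0, 0, 1))
print(count, [format(p, "05b") for p in patterns])
```

Under these assumptions, test 001 is asserted by 4032 of the $2^{19}$ input pairs (about 0.77%), and only when bits 10 down to 6 of A equal 01111. Each additional bit of variance gap adds one more constrained bit of A.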
It is not always necessary to generate all the difficult tests, since there may exist an easier test for the failure mode in question, and not all difficult tests may be essential. For example, if a failure mode of a full-adder has the effect of removing the cube 11X from the on-set of the carry function, the easier test 111 can be chosen over the difficult test 110. Which tests are essential is dependent on the adder structure and fault model used. Under most gate-level single-stuck fault models of full-adders, testing the carry logic presents the greatest difficulty since the carry logic tends to have essential tests that are difficult, while the sum logic fault model generally includes an easier test among the alternative tests for each fault.
The difficult tests are shown circled in Figure 4, along with the logic function for the carry logic. The A input values required to assert each difficult test are shown in the accompanying table.
At a minimum, the failure modes of the carry logic typically include faults where the carry output on-set expands to include the cubes X1X or XX1. For these failure modes, the difficult tests 010 and 001 are essential. A gate-level model for such an adder is shown in Figure 5.
Two common full-adder gate-level models are shown in Figures 5 and 6. The first design is optimistic under the single-stuck fault model in that two of the four difficult tests are nonessential, and consequently it is likely that these tests will not be applied even if 100% fault coverage is reported by fault simulation. The table shows the essential tests for the carry logic (labeled "e"), and the test equivalence classes for test cubes that do not include an essential test (labeled "1" and "2"). The carry logic is fully tested if each of the three essential tests is applied and a test from each of the two equivalence classes is applied. The second design is more conservative from a testing perspective, in that all four difficult tests are required. The first circuit may be too optimistic for use as a general full-adder model in fault simulation; if test generation is stopped as soon as 100% coverage is reported, it is quite possible that some adders with large variance gaps will not have had the 110 and 101 tests applied, which may be required to detect real faults in fabricated devices.
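How the essential tests depend on the gate-level model can be explored with a small single-stuck fault enumeration. The sketch below assumes a two-level AND-OR carry, ab + ac + bc, with fault sites on every AND input, every AND output, and the OR output. This structure and the site naming are assumptions chosen for illustration; they are not claimed to match the circuits of Figures 5 and 6.

```python
from itertools import product

AND_INPUTS = [("a", "b"), ("a", "c"), ("b", "c")]   # assumed AND-OR carry

def carry(a, b, c, fault=None):
    """Evaluate the carry output; `fault` is (site, stuck_value) or None."""
    vals = {"a": a, "b": b, "c": c}
    terms = []
    for g, pins in enumerate(AND_INPUTS):
        ins = []
        for p, name in enumerate(pins):
            v = vals[name]
            if fault and fault[0] == ("in", g, p):
                v = fault[1]          # stuck AND-gate input pin
            ins.append(v)
        t = ins[0] & ins[1]
        if fault and fault[0] == ("and", g):
            t = fault[1]              # stuck AND-gate output
        terms.append(t)
    out = terms[0] | terms[1] | terms[2]
    if fault and fault[0] == ("out",):
        out = fault[1]                # stuck carry output
    return out

def essential_tests():
    """Tests (as 'abc' strings) that are the only test for some fault."""
    sites = [("in", g, p) for g in range(3) for p in range(2)]
    sites += [("and", g) for g in range(3)] + [("out",)]
    essential = set()
    for site in sites:
        for stuck in (0, 1):
            detecting = [t for t in product((0, 1), repeat=3)
                         if carry(*t) != carry(*t, fault=(site, stuck))]
            if len(detecting) == 1:
                essential.add("".join(map(str, detecting[0])))
    return sorted(essential)

print(essential_tests())
```

For this assumed structure, the essential set contains all four difficult tests (010, 001, 110, 101), which corresponds to the more conservative of the two situations described above.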
Thus, it is possible to identify, relatively independent of implementation structure, that (at least) the tests 010 and 001 will be problems in an adder where signals of widely differing amplitudes are combined.
The testing problem described here is compounded by another effect that commonly arises in DSP datapaths. Rather than the adder A input taking on a uniform distribution of values, samples tend to be more densely distributed around the mean. This is a consequence of the Central Limit Theorem [6]; at internal nodes signals will tend to take on a normal distribution. This is illustrated in Figure 7, where a continuous density function has been used to represent the discrete probability function of the A input. The figure shows the approximate locations of vectors that will assert the indicated difficult tests. As the variance gap is increased, the width of each of the eight gray regions narrows. Fortunately, the tests that would be asserted at the tails of the input distribution (010, 101) are also asserted by vectors that lie close to the mean. The problem is typically worse for the tests that lie around ±0.5, at least if the standard deviation (σ) of the signal is small compared to the full dynamic range available. The problem can be much worse for certain test signal generators, as will be shown in Section 5.
Fault detection probabilities
The previous section gave conditions required for the difficult tests to be asserted at the upper bit slices of a variance-mismatched adder.
These conditions, while necessary, are not sufficient to ensure that the tests are asserted. The additional conditions are on the sign of the B (low variance) input and the carry output of the lower bits, which will be referred to as the lower block (bits 0-5 in the example).
Specifically, the probability of a single vector asserting a difficult test at the next-to-MSB adder in an N-bit variance-mismatched adder with variance gap of M bits is

$$p_D = P\{a_{N-2} = k_1,\; b_{N-2} = k_2,\; c_{N-2} = k_3\}$$

where $k_1, k_2 \in \{0, 1\}$, $C_{lb}$ is the carry out of the lower block, $\mathrm{sign}(B)$ is the value of the sign bit of the B input, and $V(k_1, k_2)$ is a cube determined as follows:
If the B input is fed by a uniform distribution, such as that generated by an LFSR, $P\{\mathrm{sign}(B) = k_2\} = 0.5$. Approximating the distribution of the A vectors in the lower block by a uniform distribution, independent of the upper bits, the probability that the lower block generates a carry is the same as the probability that the sum of two uniformly distributed, unsigned L-bit binary numbers is greater than or equal to $2^L$, where L is the width of the lower block. This probability is 0.25 for L = 1, and rapidly increases towards its upper bound of 0.5 as L is increased.
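The lower-block carry probability is easy to check by direct counting. The sketch below enumerates all pairs of unsigned L-bit values and compares the result with the closed form $(2^L - 1)/2^{L+1}$. This closed form is obtained here by counting the pairs whose sum reaches $2^L$; it is offered as a consistency check against the values quoted above (0.25 at L = 1, approaching 0.5) rather than as the expression used in the original derivation.

```python
from fractions import Fraction

def carry_probability(L):
    """P{X + Y >= 2**L} for independent, uniform, unsigned L-bit X and Y."""
    size = 1 << L
    hits = sum(1 for x in range(size) for y in range(size) if x + y >= size)
    return Fraction(hits, size * size)

def carry_probability_closed_form(L):
    """Counting argument: 2**L * (2**L - 1) / 2 of the 2**(2L) pairs carry."""
    return Fraction((1 << L) - 1, 1 << (L + 1))

for L in range(1, 7):
    print(L, float(carry_probability(L)))
```

For the example's lower block width L = 6 the probability is 63/128, or about 0.49, already close to the bound.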
$p_D$ gives the probability of a test being applied at a given vector, representing a Bernoulli trial with probability of success $p = p_D$. The probability of first asserting the desired test at vector n follows a geometric distribution, $p_N(n) = q^{n-1} p$ (where $q = 1 - p$, $n = 1, 2, \ldots$). The expected number of vectors for first assertion of the test is $1/p$, with variance $q/p^2$.
For the earlier example, $N = 12$, $M = 5$, $L = 6$, $k_1 = 0$, and $k_2 = 0$. If the A input is uniformly distributed, $p_D = 0.25 \cdot 2^{-5}$, and the mean waiting time for application of the test is 128 vectors, with standard deviation σ = 127. In realistic applications, the uniform assumption breaks down, and it is necessary to either calculate or estimate the probability distributions of the inputs. Calculation of probability distributions will be discussed in Section 4.
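The waiting-time figures for the example follow directly from the geometric distribution. The sketch below evaluates the mean and standard deviation for the quoted uniform-input probability, and adds a test-length estimate for a chosen confidence level. The confidence calculation is an addition for illustration, and it relies on the same assumption of independent vectors as the Bernoulli model; the function names are hypothetical.

```python
from math import ceil, log, sqrt

def first_assertion_stats(p):
    """Mean and standard deviation of the index of the first vector that
    asserts a test, for per-vector assertion probability p."""
    q = 1.0 - p
    return 1.0 / p, sqrt(q) / p

def vectors_for_confidence(p, confidence):
    """Smallest n such that P{test asserted within n vectors} = 1 - q**n
    reaches the requested confidence (independent vectors assumed)."""
    return ceil(log(1.0 - confidence) / log(1.0 - p))

p_d = 0.25 * 2 ** -5          # uniform-A value quoted for the example
mean, std = first_assertion_stats(p_d)
n99 = vectors_for_confidence(p_d, 0.99)
print(mean, std, n99)
```

The mean is 128 vectors and the standard deviation is approximately 127.5, consistent with the value quoted above to within rounding. Because the standard deviation is nearly as large as the mean, a test length several times the mean is needed before assertion of the test becomes likely.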
Variance matching transformation
The variance gap problem can sometimes be reduced by restructuring additions using associativity. Ideally, the two smallest variance sources to be added are combined first, followed by the next two, and so on, in a manner analogous to the construction of a Huffman code tree. From a layout perspective this approach has drawbacks due to the potential irregularity of the resulting layout. Often, a linear chain of adders is preferred to a tree structure since this maximizes the regularity of the layout.
In large filters, it is common to have a high variance datapath into which smaller variance signals are added. In this case, one possible compromise between regularity and testability is to add small variance signals in their own chain before adding the result into the larger variance chain. This approach is illustrated in Figure 8. The top design will be referred to as the chain architecture, while the lower design will be referred to as the variance-matching architecture. The effect of this transformation on the pseudorandom testability of the design will be examined empirically in Section 6.
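The effect of addition order on the variance gap can be sketched numerically. The code below compares a linear chain with the idealized Huffman-like ordering, in which the two smallest-variance signals are always combined first. It assumes the sources are independent, so that variances add, and it measures the gap at each adder as half the base-2 logarithm of the variance ratio, i.e., in bits of standard deviation. Both the gap measure and the example variances are assumptions made for illustration.

```python
import heapq
from math import log2

def gap_bits(v1, v2):
    """Variance gap at one adder, in bits of standard deviation."""
    return abs(0.5 * log2(v1 / v2))

def chain_gaps(variances):
    """Add the sources one after another, in the order given."""
    gaps, acc = [], variances[0]
    for v in variances[1:]:
        gaps.append(gap_bits(acc, v))
        acc += v                      # independent sources: variances add
    return gaps

def matched_gaps(variances):
    """Repeatedly add the two smallest-variance signals (Huffman-like)."""
    heap = list(variances)
    heapq.heapify(heap)
    gaps = []
    while len(heap) > 1:
        v1, v2 = heapq.heappop(heap), heapq.heappop(heap)
        gaps.append(gap_bits(v1, v2))
        heapq.heappush(heap, v1 + v2)
    return gaps

sources = [1024, 1, 1, 1, 1]          # one large and four small signals
print(chain_gaps(sources))
print(matched_gaps(sources))
```

In this example every adder in the chain sees a gap of about five bits, while the matched ordering leaves a single adder with a four-bit gap and the rest with none.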
Computing Probability Distributions
For signal processing datapaths consisting of networks of shift, add, and delay elements, it is possible to efficiently compute the signal probability distribution at any adder output using the convolution property of discrete, lattice-type random variables (RVs) [6].
Given the probability distributions of two discrete, lattice-type independent RVs, X and Y, $p_X(n) = P\{X = n\}$, $p_Y(n) = P\{Y = n\}$, the distribution of the sum is given by the linear convolution of the distribution functions, $p_{X+Y}(n) = p_X(n) * p_Y(n)$.
In two's-complement arithmetic, the addition is performed using normal unsigned integer arithmetic, modulo $2^N$. To account for the effect of overflow, L-point circular convolution is used in place of linear convolution.
The circular convolution can be computed efficiently using the Discrete Fourier Transform (DFT) or the Fast Fourier Transform (FFT):

$$p_{X+Y}(n) = \mathrm{IDFT}\{P_X(k)\, P_Y(k)\}$$

where $P_X(k) = \mathrm{DFT}\{p_X(n)\}$, $P_Y(k) = \mathrm{DFT}\{p_Y(n)\}$.
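The equivalence between circular convolution and multiplication of transforms can be illustrated with a small example. The sketch below uses a plain, unoptimized DFT written out from its definition so that it stays self-contained; a practical implementation would use an FFT routine. Index n of each list stands for the value n modulo the number of points, i.e., the unsigned encoding of a two's-complement value. The 16-point size and the input distribution are assumptions chosen for illustration.

```python
import cmath

def circular_convolve(p, q):
    """Direct circular convolution of two equal-length distributions."""
    n = len(p)
    return [sum(p[m] * q[(k - m) % n] for m in range(n)) for k in range(n)]

def dft(x, inverse=False):
    """Discrete Fourier transform from its definition (O(n^2))."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
               for m in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out

def convolve_via_dft(p, q):
    """Circular convolution as the inverse DFT of the product of DFTs."""
    P, Q = dft(p), dft(q)
    return [v.real for v in dft([a * b for a, b in zip(P, Q)], inverse=True)]

# Two independent inputs, each uniform on -4..3, added in a 4-bit adder.
points = 16
p = [0.0] * points
for value in range(-4, 4):
    p[value % points] = 1 / 8

direct = circular_convolve(p, p)
fast = convolve_via_dft(p, p)
print(max(abs(a - b) for a, b in zip(direct, fast)))
```

The two results agree to within floating-point rounding, and the output distribution is the expected triangular shape wrapped onto the 16 available codes.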
Thus, when adder inputs are independent, the output distribution can be most efficiently computed in terms of the input distributions. This is applicable to acyclic networks where adder outputs do not reconverge.
A more general model assumes that the network is acyclic, but that adder inputs are not independent (e.g., adder outputs may reconverge). In this case, the impulse response is computed for the adder's output (assuming a linear network), and the output probability distribution is computed from this. This is done by generating a distribution corresponding to each non-zero component of the impulse response, and convolving the results (or, more efficiently, multiplying together the FFTs of each distribution and taking the inverse FFT of the product). The generated distributions are simply suitably scaled versions of the PRPG source distribution. This approach assumes a single PRPG source, but can be extended to the case of multiple independent PRPGs by convolving the distributions computed for each independent source.
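The impulse-response procedure can be sketched for a small example as follows. The code assumes an idealized source that produces statistically independent samples, integer impulse-response coefficients, and direct convolution in place of FFTs; the function names and the example values are hypothetical. A brute-force enumeration over all sample combinations is included only to check the result.

```python
from itertools import product

def scaled_distribution(src, coeff, n_points):
    """Distribution of coeff * X modulo n_points, where `src` maps the
    signed values of X to their probabilities."""
    out = [0.0] * n_points
    for value, prob in src.items():
        out[(coeff * value) % n_points] += prob
    return out

def circular_convolve(p, q):
    n = len(p)
    return [sum(p[m] * q[(k - m) % n] for m in range(n)) for k in range(n)]

def output_distribution(impulse_response, src, n_points):
    """Convolve one scaled source distribution per non-zero coefficient."""
    dist = [0.0] * n_points
    dist[0] = 1.0                     # distribution of the constant 0
    for coeff in impulse_response:
        if coeff != 0:
            scaled = scaled_distribution(src, coeff, n_points)
            dist = circular_convolve(dist, scaled)
    return dist

def brute_force_distribution(impulse_response, src, n_points):
    """Reference result: enumerate every combination of source samples."""
    taps = [c for c in impulse_response if c != 0]
    out = [0.0] * n_points
    for samples in product(src.items(), repeat=len(taps)):
        y = sum(c * value for c, (value, _) in zip(taps, samples))
        prob = 1.0
        for _, p in samples:
            prob *= p
        out[y % n_points] += prob
    return out

source = {v: 0.25 for v in range(-2, 2)}   # 2-bit uniform source
h = [1, 0, -2, 1]                          # illustrative impulse response
dist = output_distribution(h, source, 32)
```

Because each source sample enters the output exactly once through the impulse response, reconvergent paths inside the network do not violate the independence required by the convolution.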
In the case of cyclic networks, the distributions can generally be approximated by truncating infinite impulse responses at a point where the energy in the tail is small compared to the total energy of the impulse response.
For the experimental studies described in this paper, we have implemented the single-source, acyclic version of the algorithm that supports reconvergent adder outputs.
DFT resolution and computational considerations: For wide signals, there is an issue of how many bits to use in the DFT representation of the signal. For our tests, we have found that 8 to 10 bits of resolution show most of the detail in the distributions, and should be adequate for estimating fault detection probabilities at the upper bits of adders, where stubborn faults are typically found. For networks with no reconvergent adder outputs, the DFT only needs to be computed twice and the inverse DFT once for each adder. For adders on reconvergent paths, the impulse response approach is used, requiring one DFT for each non-zero component of the impulse response, followed by one inverse DFT. This is potentially computationally expensive if a filter has a long impulse response. For the large filt64 example that will be introduced in Section 6, the impulse response method was found to use 66 CPU seconds on a 486-66MHz processor for 256-point (8-bit resolution) FFTs.
Nonlinear operations: The discussion here assumes a linear network model. However, truncation is a common non-linear operation found in DSP applications. This can be handled under a linear network model by representing truncation (or, equivalently, right-shifting) as division by a power of two combined with a noise source. For the experiments here, we ignore the noise introduced by truncation.
Modeling LFSR Correlation Properties
Common pseudorandom pattern generators (PRPGs) based on LFSRs do not produce statistically independent test vectors; significant correlation exists between successive tests. A typical LFSR-based PRPG might shift its contents from LSB to MSB for each test, introducing a new bit at the LSB. For this type of PRPG, each output sample (interpreted as a two's-complement number) is closely related to twice the preceding sample. The sign-extended output of an N-bit LFSR can be expressed in terms of its previous output as
The first case occurs on average 50% of the time, with the other two cases each occurring 25% of the time (δ is the one-bit unsigned 0/1 signal shifted into the LFSR LSB). This representation shows why correlation effects might be a problem in a datapath structure: for example, if the datapath implements the function [...] where the typically small effect of δ has been neglected.
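The relationship between successive LFSR outputs can be observed with a short simulation. The sketch below assumes an 8-bit LFSR with external XOR feedback from taps 8, 6, 5, and 4, a commonly tabulated maximal-length configuration; the width, the taps, and the helper names are illustrative choices and are not taken from the experiments. It records the wrap term, i.e., the difference between each signed sample and twice the previous sample plus the inserted bit.

```python
N = 8
TAPS = (8, 6, 5, 4)     # assumed feedback taps (1 = LSB stage, N = MSB stage)

def lfsr_step(state):
    """Shift towards the MSB and insert the feedback bit at the LSB."""
    delta = 0
    for t in TAPS:
        delta ^= (state >> (t - 1)) & 1
    return ((state << 1) | delta) & ((1 << N) - 1), delta

def to_signed(u):
    """Interpret an N-bit pattern as a two's-complement number."""
    return u - (1 << N) if u >> (N - 1) else u

def wrap_histogram(seed=1):
    """Histogram of s(n) - (2*s(n-1) + delta) over one LFSR period."""
    counts, state = {}, seed
    while True:
        nxt, delta = lfsr_step(state)
        wrap = to_signed(nxt) - (2 * to_signed(state) + delta)
        counts[wrap] = counts.get(wrap, 0) + 1
        state = nxt
        if state == seed:
            return counts

print(wrap_histogram())
```

Over the 255-state period, the sample equals twice its predecessor plus the new bit in 127 cases, and differs from it by $+2^N$ or $-2^N$ in 64 cases each, in line with the 50%, 25%, and 25% proportions given above.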
For PRPGs generating statistically independent samples, signal distributions are typically smooth after a few signals dependent on different PRPG samples are combined. However, PRPG correlation effects can destroy this property, introducing a large amount of fine structure in the signal distributions. This sort of correlation effect can be examined using the analytical techniques described in Section 4, where the PRPG input signal is replaced by a linear model of an LFSR, shown in Figure 10. The input to the LFSR model, w(n), is an independent RV taking values 0 and -1, each with probability 0.5. The SHLEXT operator shifts its input left by the indicated amount, producing a signal wide enough to hold the shifted quantity.
For the datapath segment shown in Figure 9, an idealized PRPG producing statistically independent samples would produce the output distribution shown in Figure 11. By replacing the input to the circuit with the LFSR model of Figure 10, the curve labeled "LFSR theory" in Figure 12 is produced, in agreement with the histogram produced by simulating the actual LFSR sequence. When compared with the test zones indicated in Figure 7, it can be seen that this signal would not be able to effectively test an adder with even a small variance gap in the following filter stage, since tests 001 and 110 would not be activated.
[...] polynomial (at least for the class of LFSRs that use external XOR feedback networks). This result shows how signal distributions are able to provide insight into testing problems that cannot be identified with gross measures, such as signal variance or maximum signal range.
Experimental Results
Four filter specifications were selected from the literature: a 64-tap filter [7], a 60-tap filter [8], a 25-tap filter [8], and an 11-tap filter [9].
The design statistics are shown in Table 1, including the number of adders, the number of state registers, the input signal width, the coefficient width, and the output signal width. Ripple adder chain structures were used to implement the fixed-width datapath baseline designs, where all addition and subtraction operations are the width of the filter output. Scaling was then applied to remove redundant sign bits, shrinking portions of the datapath and enabling further redundancy elimination using logic optimizations [4, 5].
The scaling results are shown in Table 2, where the number of adders and registers scaled is shown, along with the number of adder and register bits removed via scaling. L1 scaling was used [3, Sec. 6.9.2], which guarantees that the behavior of the design is not changed. For comparison purposes, the designs were also constructed using the variance matching architecture described in Section 3.2. The scaling results for the variance matching architecture are shown in Table 3.
The resulting designs were fault simulated using LFSR-generated test vectors. The total number of adder faults simulated in each design is shown in Table 4. Registers in these designs are highly testable; consequently, their faults are excluded from consideration here. The fault simulation curves are plotted in Figures 13-16, showing the number of untested faults for the original (unscaled) chain architecture, the scaled chain architecture, and the scaled variance matching architecture. Variance matching results are not plotted for filt11 since it had too few adders per tap to allow significant optimization. All the designs were found to be highly testable in terms of percent fault coverage (all in the high 90s), but the original unscaled designs using the adder chain architecture have a significant number of untested faults even after several thousand test vectors (on average as many as 4 per adder in filt64). Redundancy elimination using scaling and logic optimization significantly reduced the number of untested faults in these designs, but a large number of stubborn faults remained in the cases of filt60 and filt64, as indicated by the high rate at which faults are still detected after 2000 vectors have been applied. For these filters, the variance matching architecture yielded significant gains in terms of reduced test time and fewer untested faults. In all designs, the best optimized design offered more than an order of magnitude reduction in the final number of untested faults over the unoptimized design. The results are summarized in Table 5 in terms of the average number of untested faults per adder after applying a maximum-length LFSR sequence (or 4k vectors for filt11).
Conclusion
The testability properties of large DSP datapath structures under a pseudorandom self-test paradigm have been examined. Addition of variance-mismatched signals was identified as a testing problem, and the probability of detecting difficult faults in these structures was derived in terms of signal probability distributions. A method of calculating these distributions was described, and its ability to predict testing problems associated with LFSR correlation properties was demonstrated. Variance matching was empirically shown to improve the testability of the two largest designs, reducing the number of untested faults by at least a factor of eight over the scaled and optimized designs that did not use variance matching.
In addition, test length was significantly reduced.
