I. INTRODUCTION
The fast Fourier transform (FFT) plays an important role in digital signal processing because it dramatically increases the efficiency of computing large discrete Fourier transforms (DFTs) [1].
One of the most obvious ways of implementing the $N$-point FFT in hardware is to construct $\log_2 N$ stages of $N/2$ two-input butterflies.
Such a circuit will hereafter be called the FFT network [2]. FFT networks are attractive because of their optimal performance (area · time²) and throughput.
Three different schemes have been proposed for fault-tolerant FFT networks. Choi and Malek proposed a fault tolerance scheme for FFT networks based on recomputation through an alternate path [3]. The throughput of this scheme is only 50% of that of a system with no fault tolerance. In [4], Jou and Abraham proposed an algorithm-based fault tolerance (ABFT) scheme for FFT networks. The hardware overhead of this scheme is approximately $2/\log_2 N$. Due to round-off errors, this scheme's fault coverage, throughput, or both may be unsatisfactory [5]. To deal with this problem, an encoding scheme that achieves higher fault coverage was suggested in [5]. Another scheme, with a much weaker fault model, was given in [6].
In this work, we propose a new algorithm-based concurrent error-detecting (CED) scheme for FFT networks, which has lower hardware overhead and achieves higher fault coverage than the schemes in [4], [5]. In the next section, we give background information needed in the later discussion. In Section III, we propose a new ABFT scheme for FFT networks and prove its correctness. Next, in Sections IV and V, we compare the performance and hardware overhead of this scheme with those of [4] and [5]. We conclude in Section VI.
II. PRELIMINARIES
In this section we present some preliminary concepts.
A. An FFT Network
The discrete Fourier transform (DFT) of a sequence $x(p)$, $0 \le p \le N-1$, is
$$X(k) = \sum_{p=0}^{N-1} x(p)\, W_N^{pk}, \qquad 0 \le k \le N-1, \qquad (1)$$
where $W_N = e^{-2\pi\sqrt{-1}/N}$ is the principal $N$th root of unity. To simplify the notation, where unambiguous, we hereafter write $W^k$ in place of $W_N^k$. The $N$-point FFT networks considered in this work consist of $(N/2)\log_2 N$ two-input butterflies, where $N = 2^n$. An 8-point FFT network is shown in Fig. 1, and the function of each butterfly is shown in Fig. 2. The upper and lower input ports of a butterfly module will be called port 0 and port 1, respectively. Each butterfly module with inputs $a$, $b$ and outputs $c$, $d$ performs the butterfly computation
$$c = a + W^k b, \qquad d = a - W^k b,$$
where $W^k$ is sometimes called a "twiddle factor." The output sequence $X(k)$, $0 \le k \le N-1$, is produced in bit-reversed order. In other words, $X(k)$ appears at output port $h$, where $h = BR(k)$ as defined next. Suppose the binary expansion of $k$, $0 \le k \le N-1$, is $k_1 k_2 \cdots k_n$ (that is, $k = k_1 2^{n-1} + k_2 2^{n-2} + \cdots + k_n$); then the bit-reverse of $k$, denoted $BR(k)$, is $k_n k_{n-1} \cdots k_1$.
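The bit-reversal map $BR(k)$ can be sketched in Python as follows; the function name `bit_reverse` is our own choice, not the paper's.

```python
def bit_reverse(k: int, n: int) -> int:
    """Return BR(k): the n-bit expansion k_1 k_2 ... k_n of k, read in reverse."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (k & 1)  # move the lowest remaining bit of k into r
        k >>= 1
    return r


# 8-point network (n = 3): X(k) appears at output port BR(k).
ports = {k: bit_reverse(k, 3) for k in range(8)}
print(ports)  # {0: 0, 1: 4, 2: 2, 3: 6, 4: 1, 5: 5, 6: 3, 7: 7}
```

Applying `bit_reverse` twice with the same `n` returns the original index, so the same function also maps an output port back to the DFT index it carries.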
The stages in an FFT network are labeled from 1 to $n$, and in each stage the butterflies are labeled from 0 to $N/2 - 1$, from top to bottom.
From now on, the $l$th butterfly in stage $i$ will be called butterfly $(i, l)$.
The twiddle factor associated with butterfly $(i, l)$ can be computed as follows. Let the binary expansion of $l$ be $l_1 l_2 \cdots l_{n-1}$. The following lemma follows from the description given in [7] (hereafter, we will sometimes use mixed binary-decimal representations of numbers for convenience).

Lemma 1: The twiddle factor associated with butterfly $(i, l)$ is $W^m$, where $m = (l_{n-1} l_{n-2} \cdots l_{n-i+1}) \times 2^{n-i}$ for $i > 1$, and $m = 0$ for $i = 1$.
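To make Lemma 1 concrete, the sketch below evaluates an FFT network stage by stage and compares it with a direct evaluation of (1). It rests on assumptions that are ours, not the paper's: every stage uses the wiring that the proof in Section III describes for a fault site (port $j$ feeds input port $j_1$ of butterfly $j_2 \cdots j_n$, and butterfly $l$ drives output ports $2l$ and $2l+1$), stage 1 uses $m = 0$, and $W = e^{-2\pi\sqrt{-1}/N}$. The helper names `twiddle_exponent`, `fft_network`, and `dft` are hypothetical.

```python
import cmath


def bit_reverse(k: int, n: int) -> int:
    """Return BR(k) for an n-bit index k."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def twiddle_exponent(i: int, l: int, n: int) -> int:
    """Exponent m of the twiddle factor W^m of butterfly (i, l), as in Lemma 1.

    With l = l_1 l_2 ... l_{n-1} (l_1 most significant), m is the bit string
    l_{n-1} l_{n-2} ... l_{n-i+1} (the i-1 low-order bits of l, reversed)
    shifted left by n-i. For i = 1 the bit string is empty, so m = 0.
    """
    if i == 1:
        return 0
    low_bits = l & ((1 << (i - 1)) - 1)
    return bit_reverse(low_bits, i - 1) << (n - i)


def fft_network(x: list) -> list:
    """Evaluate an N-point FFT network (N = len(x) = 2^n), stage by stage.

    Assumed wiring (same in every stage): port j feeds input port j_1 (the most
    significant bit of j) of butterfly j mod N/2, and butterfly l drives output
    ports 2l (c) and 2l + 1 (d).
    """
    N = len(x)
    n = N.bit_length() - 1
    W = cmath.exp(-2j * cmath.pi / N)
    y = [complex(v) for v in x]
    for i in range(1, n + 1):
        out = [0j] * N
        for l in range(N // 2):
            a, b = y[l], y[l + N // 2]  # port 0 and port 1 of butterfly (i, l)
            t = W ** twiddle_exponent(i, l, n) * b
            out[2 * l], out[2 * l + 1] = a + t, a - t  # c = a + W^m b, d = a - W^m b
        y = out
    return y


def dft(x: list) -> list:
    """Direct evaluation of (1): X(k) = sum_p x(p) W^(pk)."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(x[p] * W ** ((p * k) % N) for p in range(N)) for k in range(N)]


x = [complex(p, -p) for p in range(8)]
y, X = fft_network(x), dft(x)
# Output port BR(k) should hold X(k); any difference is floating-point noise.
print(max(abs(y[bit_reverse(k, 3)] - X[k]) for k in range(8)))
```

Under these assumptions the network output at port $BR(k)$ matches $X(k)$ for every power-of-two size, which is a useful sanity check on the reconstructed Lemma 1 before reasoning about error propagation.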
B. Fault Model
In this work, we assume the module-level fault model [4]. When a fault appears in a butterfly, the resulting error can be modeled as an additive error at one of the input or output ports of the module, for example, at input port $a$ or $b$, or at output port $c$ or $d$ of the butterfly in Fig. 2. Since in an FFT network each output port of a module is connected to only one input port of a next-stage module, faults on these two ports are indistinguishable and will be treated as the same fault in the following discussion. For an FFT network of size $N = 2^n$, a fault at output port $j$ of a stage-$i$ module is denoted by $f^i_j$, $1 \le i \le n$, while a fault at input port $j$ of the FFT network is denoted by $f^0_j$. Examples of faults in a network of size 8 are given in Fig. 1.
C. Data Path and Transfer Function
A general CED scheme is shown in Fig. 3. The inputs are encoded and the outputs are decoded separately, and the results are compared to decide whether the outputs are erroneous. Suppose a fault $f^i_j$ causes an additive error $e$. This error propagates through the FFT network and eventually causes an additive error $E_k$ at output port $k$ of the network (some $E_k$'s may be zero, depending on the location of the fault). The final output error $E$, which appears at the output of the output decoder, depends on all $E_k$'s as well as on the decoding scheme. The gain at output port $k$, denoted $G_k(f^i_j)$, is defined as $E_k/e$. The transfer function $S^i_j$ is defined as $E/e$.
Clearly, $S^i_j$ should not be zero for any $i, j$; otherwise, some faults may remain undetected. To compute each $E_k$ for a given fault $f^i_j$, we have to find the data path from the fault site to output port $k$. Let $W^{m_s(k)}$ denote the twiddle factor of the stage-$s$ module on this path; by Lemma 1, $m_s(k)$ is a multiple of $2^{n-s}$. Since the path enters the module at input port $j_{s-1}$ and leaves at output port $k_s$, it is easy to see that the error is amplified by a factor $(-1)^{j_{s-1} k_s} \left(W^{m_s(k)}\right)^{j_{s-1}}$ in this stage. Thus $E_k$ can be written as the product of the initial error $e$ and the gains of the successive stages after the fault site:
$$G_k(f^i_j) = \prod_{s=i+1}^{n} (-1)^{j_{s-1} k_s} \left(W^{m_s(k)}\right)^{j_{s-1}}.$$
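Because every stage multiplies the error by a factor of the form $\pm W^m$, the gains can also be obtained numerically by injecting a unit error at the fault site and propagating it, since the network is linear. The sketch below does this under the same assumptions as before (wiring as described for the fault sites in Section III, $m = 0$ in stage 1); `run_stages` and `gains` are hypothetical helper names. It also checks the port pattern that the proof in Section III attributes to Lemma 2: fault $f^i_j$ reaches exactly the output ports whose top $i$ bits equal the low $i$ bits of $j$.

```python
import cmath


def bit_reverse(k, n):
    r = 0
    for _ in range(n):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def twiddle_exponent(i, l, n):
    """Lemma 1, with m = 0 for stage 1."""
    if i == 1:
        return 0
    return bit_reverse(l & ((1 << (i - 1)) - 1), i - 1) << (n - i)


def run_stages(y, first, last, n):
    """Push vector y through stages first..last of the assumed 2^n-point network."""
    N = 2 ** n
    W = cmath.exp(-2j * cmath.pi / N)
    y = list(y)
    for i in range(first, last + 1):
        out = [0j] * N
        for l in range(N // 2):
            a, b = y[l], y[l + N // 2]
            t = W ** twiddle_exponent(i, l, n) * b
            out[2 * l], out[2 * l + 1] = a + t, a - t
        y = out
    return y


def gains(i, j, n):
    """G_k(f_j^i) for every output port k (i = 0: fault at network input j).

    The network is linear, so the error propagates independently of the data:
    injecting e = 1 at the fault site and running the remaining stages on an
    otherwise all-zero vector yields E_k / e = G_k directly.
    """
    e = [0j] * (2 ** n)
    e[j] = 1
    return run_stages(e, i + 1, n, n)


g = gains(1, 5, 3)  # fault f_5^1 in an 8-point network
print([k for k in range(8) if abs(g[k]) > 1e-9])  # [4, 5, 6, 7]
print([round(abs(g[k]), 12) for k in range(4, 8)])  # [1.0, 1.0, 1.0, 1.0]
```

Each affected port is reached by a unique path, so every nonzero gain is a product of unit-magnitude factors and has magnitude 1.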
III. THE CONCURRENT ERROR DETECTION SCHEME
A. The Basic Scheme
The discrete Fourier transform given in (1) can be written in matrix form. The input encoder then computes a weighted checksum $WCS_1$ of the input vector $x$, and the output decoder computes $WCS_2 = r_2 \cdot X^T$. $WCS_1$ and $WCS_2$ are then compared by the totally self-checking (TSC) comparator to decide whether they are equal. The scheme is illustrated in Fig. 4 for an 8-point FFT network.
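The exact encoder and decoder equations are not recoverable from this text, so the following is only a sketch under stated assumptions. We assume the decoder weights output port $h$ by $W_3^h$, which is what the sums $\sum_k G_k W_3^k$ in the proof below imply, and derive input weights from (1) so that $WCS_1 = WCS_2$ for a fault-free network. The wiring and Lemma 1 assumptions are as in the earlier sketches; `output_weights`, `input_weights`, and `checksums` are hypothetical names.

```python
import cmath

W3 = cmath.exp(-2j * cmath.pi / 3)  # cube root of unity used as the weight base


def bit_reverse(k, n):
    r = 0
    for _ in range(n):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def twiddle_exponent(i, l, n):
    if i == 1:
        return 0
    return bit_reverse(l & ((1 << (i - 1)) - 1), i - 1) << (n - i)


def run_stages(y, first, last, n):
    """Push vector y through stages first..last of the assumed 2^n-point network."""
    N = 2 ** n
    W = cmath.exp(-2j * cmath.pi / N)
    y = list(y)
    for i in range(first, last + 1):
        out = [0j] * N
        for l in range(N // 2):
            a, b = y[l], y[l + N // 2]
            t = W ** twiddle_exponent(i, l, n) * b
            out[2 * l], out[2 * l + 1] = a + t, a - t
        y = out
    return y


def output_weights(n):
    """Assumed decoder weights: output port h is weighted by W_3^h."""
    return [W3 ** (h % 3) for h in range(2 ** n)]


def input_weights(n):
    """Encoder weights that make WCS_1 equal WCS_2 for a fault-free network.

    Port h carries X(BR(h)), so sum_h W_3^h X(BR(h)) = sum_k W_3^BR(k) X(k)
    = sum_p x(p) * [sum_k W_3^BR(k) W^(pk)]; the bracket is the weight of x(p).
    """
    N = 2 ** n
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(W3 ** (bit_reverse(k, n) % 3) * W ** ((p * k) % N) for k in range(N))
            for p in range(N)]


def checksums(x, n, fault=None):
    """Return (WCS_1, WCS_2). fault = (i, j, e) adds error e at output port j of stage i."""
    wcs1 = sum(w * v for w, v in zip(input_weights(n), x))
    if fault is None:
        y = run_stages(x, 1, n, n)
    else:
        i, j, e = fault
        y = run_stages(x, 1, i, n)      # fault-free up to stage i
        y[j] += e                       # additive error at the fault site
        y = run_stages(y, i + 1, n, n)  # propagate through the remaining stages
    wcs2 = sum(w * v for w, v in zip(output_weights(n), y))
    return wcs1, wcs2


x = [complex(p % 3, 1 - p % 2) for p in range(8)]
w1, w2 = checksums(x, 3)
print(abs(w1 - w2))  # fault-free: floating-point noise only
w1, w2 = checksums(x, 3, fault=(2, 5, 0.5))
print(abs(w1 - w2))  # faulty: equals |e * S| for the fault's transfer function S
```

A fault leaves $WCS_1$ untouched and shifts $WCS_2$ by $e \cdot S^i_j$, which is why the detection argument reduces to showing that every $S^i_j$ is nonzero.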
The encoder and decoder can be obtained from the above equations. At the output side, note that
Proof: Since output port $k = k_1 k_2 \cdots k_n$ contains an error due to the fault $f^l_j$, by Lemma 2 we have $k_1 k_2 \cdots k_l = j_{n-l+1} j_{n-l+2} \cdots j_n$. Given that $j = j_1 j_2 \cdots j_n$ is even, $j_n$ is zero and thus $k_l$ is zero. Let $j' = j + 1$ and let $j'$ be represented as $j'_1 j'_2 \cdots j'_n$; then $j'_m = j_m$ for $1 \le m < n$, and $j'_n = 1$. By Lemma 2, port $k + 2^{n-l}$ will be affected by fault $f^l_{j+1}$. Let $k' = k + 2^{n-l}$ and let it be represented as $k'_1 k'_2 \cdots k'_n$; then we have $k'_i = k_i$ for all

The aforementioned scheme is analyzed next.
Base case: For a fault $f^n_j$, $S^n_j = W_3^t$, $t \in \{0, 1, 2\}$, which is not zero.
Induction step: Suppose that $S^l_j \neq 0$ for all $0 \le j \le N-1$, where $0 < l \le n$. We want to prove that $S^{l-1}_j \neq 0$ for all $0 \le j \le N-1$. The fault $f^{l-1}_j$ is associated with input port $j_1$ of butterfly module $j_2 j_3 \cdots j_n$ of stage $l$ (where $j_1 j_2 \cdots j_n$ is the binary expansion of $j$, as before), whose twiddle factor will be denoted $W^u$ for convenience ($u$ can be calculated from Lemma 1). Therefore, we have
$$S^{l-1}_j = W^{u j_1}\left(S^l_{2j} + (-1)^{j_1} S^l_{2j+1}\right). \qquad (6)$$
In (6), and from here on, the subscripts $2j$ and $2j+1$ are assumed to be taken modulo $N$.
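By Lemma 2, fault $f^l_{2j}$ affects output ports $k_0$ through $k_0 + 2^{n-l} - 1$ for some $k_0$, and fault $f^l_{2j+1}$ affects ports $k_0 + 2^{n-l}$ through $k_0 + 2^{n-l+1} - 1$, so
$$S^l_{2j+1} = \sum_{k=k_0+2^{n-l}}^{k_0+2^{n-l+1}-1} G_k(f^l_{2j+1})\, W_3^{k}.$$
From Lemma 4 we know that $G_{k+2^{n-l}}(f^l_{2j+1}) = c(l,2j) \cdot G_k(f^l_{2j})$. Thus,
$$S^l_{2j+1} = \sum_{k=k_0}^{k_0+2^{n-l}-1} c(l,2j)\, G_k(f^l_{2j})\, W_3^{k+2^{n-l}} = c(l,2j)\, W_3^{2^{n-l}} \sum_{k=k_0}^{k_0+2^{n-l}-1} G_k(f^l_{2j})\, W_3^{k} = c(l,2j)\, W_3^{2^{n-l}}\, S^l_{2j}.$$
Let $C = c(l,2j) \cdot W_3^{2^{n-l}}$. From Lemma 4, we know that $[c(l,2j)]^N = 1$, where $N = 2^n$. Since $2^{n-l} \bmod 3 \neq 0$, we have $W_3^{2^{n-l}} = W_3$ or $W_3^2$. In either case, $C \neq \pm 1$ (otherwise $W_3$ or $W_3^2$ would equal $\pm c(l,2j)^{-1}$, a $2N$th root of unity, which is impossible since $3 \nmid 2N$), and thus $1 \pm C \neq 0$.
Rewrite (6) as
$$S^{l-1}_j = W^{u j_1}\left(1 + (-1)^{j_1} C\right) S^l_{2j}.$$
Since $S^l_{2j} \neq 0$ for all $j$, we know that $S^{l-1}_j \neq 0$ for all $j$.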
□
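The key step above is that $1 \pm C \neq 0$ when $C$ is an $N$th root of unity times $W_3$ or $W_3^2$. The short check below enumerates every such $C$ for power-of-two $N$ and reports the smallest $|1 \pm C|$. The comparison value $2\sin(\pi/(3N))$ is our own observation (the angle of $C$ is a nonzero multiple of $2\pi/(3N)$ that avoids $0$ and $\pi$), not a result stated in the paper; the function name is hypothetical.

```python
import cmath
import math

W3 = cmath.exp(-2j * math.pi / 3)  # primitive cube root of unity


def min_one_plus_minus_c(N):
    """Smallest |1 + C| or |1 - C| over C = c * W_3^a with c^N = 1 and a in {1, 2}."""
    best = math.inf
    for r in range(N):
        c = cmath.exp(2j * math.pi * r / N)  # every N-th root of unity
        for a in (1, 2):
            C = c * W3 ** a
            best = min(best, abs(1 + C), abs(1 - C))
    return best


for n in range(1, 11):
    N = 2 ** n
    print(N, min_one_plus_minus_c(N), 2 * math.sin(math.pi / (3 * N)))
```

The minimum stays strictly positive for every $N$, as the proof requires, although it shrinks as $N$ grows.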
It is easy to verify that any single fault in either the input encoder or the output decoder has a nonzero transfer function. The following theorem follows from this fact and Theorem 1.
Theorem 2: Every single functional fault in an FFT network with our weighted checksum encoding scheme is detectable.
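Theorem 1 can also be spot-checked numerically for small networks by computing $S^i_j = \sum_k G_k(f^i_j)\, W_3^k$ for every fault site inside the network. The sketch reuses the assumptions of the earlier sketches (wiring as described for fault sites in Section III, $m = 0$ in stage 1, decoder weight $W_3^k$ on output port $k$); it does not cover encoder or decoder faults, and `transfer_functions` is a hypothetical name.

```python
import cmath

W3 = cmath.exp(-2j * cmath.pi / 3)


def bit_reverse(k, n):
    r = 0
    for _ in range(n):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def twiddle_exponent(i, l, n):
    if i == 1:
        return 0
    return bit_reverse(l & ((1 << (i - 1)) - 1), i - 1) << (n - i)


def run_stages(y, first, last, n):
    N = 2 ** n
    W = cmath.exp(-2j * cmath.pi / N)
    y = list(y)
    for i in range(first, last + 1):
        out = [0j] * N
        for l in range(N // 2):
            a, b = y[l], y[l + N // 2]
            t = W ** twiddle_exponent(i, l, n) * b
            out[2 * l], out[2 * l + 1] = a + t, a - t
        y = out
    return y


def transfer_functions(n):
    """S[i][j] = sum_k G_k(f_j^i) W_3^k for every fault site f_j^i, 0 <= i <= n."""
    N = 2 ** n
    S = []
    for i in range(n + 1):
        row = []
        for j in range(N):
            e = [0j] * N
            e[j] = 1  # unit error at output port j of stage i (i = 0: network input)
            g = run_stages(e, i + 1, n, n)
            row.append(sum(g[k] * W3 ** (k % 3) for k in range(N)))
        S.append(row)
    return S


for n in range(2, 6):
    S = transfer_functions(n)
    print(2 ** n, min(abs(s) for row in S for s in row))
```

The printed minima are nonzero, consistent with Theorem 1; for the last stage, $S^n_j = W_3^j$, which is the base case of the induction.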
IV. PERFORMANCE EVALUATION
Round-off errors are introduced into FFT networks because only a limited number of bits is available to represent a real number [9]. As a result, the two inputs of the TSC comparator may differ even if the FFT network is fault-free. To remedy this problem, we may allow a small difference $\eta$ between the two weighted checksums that are input to the comparator [4]. Whenever a difference larger than $\eta$ is observed, a retry with modified inputs is used to distinguish between round-off errors and functional errors. A small $\eta$ degrades the system throughput, since more retries are needed, but a large $\eta$ may let many errors escape detection. In this section, we discuss the throughput and fault coverage. The analyses are for fixed-point arithmetic; systems with floating-point arithmetic can be analyzed in the same way [4] and are less affected by round-off errors. The throughput of the system for different values of $\eta$ is given in Table I. It is the same as the throughput of the hardware redundancy schemes presented in [4], [5].
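The need for a threshold $\eta$ can be illustrated with a small simulation. The sketch below models fixed-point arithmetic simply by rounding every butterfly output to a chosen number of fractional bits (our simplification; it ignores overflow and scaling), computes the checksums exactly, and raises an alarm only when they differ by more than $\eta$. It assumes the same network wiring, Lemma 1 twiddles, and checksum weights as the earlier sketches; the retry step is not modeled, and all function names are hypothetical.

```python
import cmath
import random

W3 = cmath.exp(-2j * cmath.pi / 3)


def bit_reverse(k, n):
    r = 0
    for _ in range(n):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def twiddle_exponent(i, l, n):
    if i == 1:
        return 0
    return bit_reverse(l & ((1 << (i - 1)) - 1), i - 1) << (n - i)


def quantize(z, frac_bits):
    """Round real and imaginary parts to frac_bits fractional bits (None: exact)."""
    if frac_bits is None:
        return z
    s = 2 ** frac_bits
    return complex(round(z.real * s) / s, round(z.imag * s) / s)


def network_fixed_point(x, n, frac_bits):
    """FFT network in which every butterfly output is rounded."""
    N = 2 ** n
    W = cmath.exp(-2j * cmath.pi / N)
    y = [complex(v) for v in x]
    for i in range(1, n + 1):
        out = [0j] * N
        for l in range(N // 2):
            a, b = y[l], y[l + N // 2]
            t = W ** twiddle_exponent(i, l, n) * b
            out[2 * l] = quantize(a + t, frac_bits)
            out[2 * l + 1] = quantize(a - t, frac_bits)
        y = out
    return y


def checksum_difference(x, n, frac_bits):
    """|WCS_1 - WCS_2| for a fault-free network with rounded butterflies."""
    N = 2 ** n
    W = cmath.exp(-2j * cmath.pi / N)
    r1 = [sum(W3 ** (bit_reverse(k, n) % 3) * W ** ((p * k) % N) for k in range(N))
          for p in range(N)]
    wcs1 = sum(w * v for w, v in zip(r1, x))
    y = network_fixed_point(x, n, frac_bits)
    wcs2 = sum(W3 ** (h % 3) * y[h] for h in range(N))
    return abs(wcs1 - wcs2)


def alarm(x, n, frac_bits, eta):
    """Comparator with tolerance: flag an error only if the difference exceeds eta."""
    return checksum_difference(x, n, frac_bits) > eta


rng = random.Random(0)
x = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(16)]
for frac_bits in (None, 20, 10):
    print(frac_bits, checksum_difference(x, 4, frac_bits))
```

With exact arithmetic the difference is only floating-point noise, while coarser rounding makes it grow, so an exact equality test would raise false alarms on a fault-free network.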
B. Fault Coverage
The functional error $e$ caused by fault $f^i_j$ is modeled as additive white noise at the input or output of the module: the real and imaginary parts of the noise are assumed to be uncorrelated, and each has an amplitude density that is uniform between $-1$ and $1$ [4]. At the output $WCS_2$ of the output decoder, the functional error will be $e \times S^i_j$. The final error is the sum of the functional error and the round-off error $F$, and the fault alarm will be set if and only if the magnitude of the final error is larger than the threshold $\eta$. If, however, a fault occurs but $|e \times S^i_j + F|$ is less than $\eta$, the fault will be masked. The fault coverage for a given $f^i_j$
is given by
$$C(\eta, N, S^i_j) = \operatorname{Prob}\left(\eta < \left|e \times S^i_j + F\right|\right). \qquad (8)$$
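Equation (8) can be estimated by Monte Carlo simulation. In the sketch below, $e$ follows the fault model above (independent real and imaginary parts, uniform on $[-1, 1]$); modeling the round-off error $F$ as complex Gaussian noise with standard deviation `sigma_f` per component is our assumption, not the paper's, and `coverage` is a hypothetical name.

```python
import math
import random


def coverage(eta, S, sigma_f=0.0, trials=100_000, seed=1):
    """Monte Carlo estimate of (8): C(eta, N, S) = Prob(eta < |e * S + F|).

    e: real and imaginary parts independent and uniform on [-1, 1].
    F: round-off error, assumed complex Gaussian with independent parts of
       standard deviation sigma_f (sigma_f = 0 means no round-off error).
    """
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        e = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
        F = complex(rng.gauss(0, sigma_f), rng.gauss(0, sigma_f))
        if abs(e * S + F) > eta:
            detected += 1
    return detected / trials


# With F = 0 and |S| = 1, C = Prob(|e| > eta); for eta = 0.5 this is 1 - pi/16.
print(coverage(0.5, 1.0), 1 - math.pi / 16)
# With |S| = 0.25, |e * S| <= 0.25 * sqrt(2) < 0.5, so every fault is masked.
print(coverage(0.5, 0.25))
```

The two cases show the trade-off discussed next: for a fixed $\eta$, a small $|S^i_j|$ can drive the coverage of a fault site toward zero.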
For a given $\eta$, a CED scheme with larger transfer functions is likely to achieve higher fault coverage. In Table II, we list the distribution of $|S^i_j|$ for FFT networks of sizes from 128 to 1024 with the CED schemes proposed in [4] (JAH), in [5] (THC), and in this work (WJ).
Each entry in this table is the fraction of distinct fault sites whose transfer function $S^i_j$ falls in the indicated range.
In both the JAH and THC schemes, more and more transfer functions become small as the size of the network increases. In contrast, the scheme proposed in this paper has a better distribution of $|S^i_j|$, since the minimum transfer function magnitude, given in Table III for sizes from 128 to 1024, decreases only slightly as the network size increases. As a result, the fault coverage of this scheme is less sensitive to increases in network size. Also, since every $|S^i_j|$ is reasonably large, the probability of missing a functional error is small for reasonable values of $\eta$. The average fault coverage of an FFT network can be obtained from all $C(\eta, N, S^i_j)$. Since we assume that only one fault appears at a time, the average fault coverage can be computed as follows [5]:
However, it is difficult to calculate the fault coverage of an FFT network by using (9) directly. To estimate the actual fault coverage, a method for calculating upper and lower bounds on the fault coverage for a given $S^i_j$ was presented in [5, Theorem 3]. In Table IV, we list the upper and lower bounds on the fault coverage for 16-bit FFT networks. The fault coverage for the JAH scheme is lower than 75% for $N \ge 256$ in this

As the network size increases, the constant terms in the above expression become negligible. Suppose that all input signals are complex numbers. Let the hardware complexity of a butterfly be $B$. We now look at the hardware overhead for different number systems (floating-point or fixed-point).
Floating-Point System: The time and hardware complexities of multiplication and addition are comparable in this system. A complex multiplier consists of four real multipliers and two real adders, while a complex adder consists of two real adders. A butterfly module has one complex multiplier and two complex adders. If we use a real multiplier or a real adder as the unit, then the HOR for the three schemes can be computed from the above equations with both multiplier costs $M$ equal to 6, both adder costs $A$ equal to 2, $B = 10$, and the fact that the non-fault-tolerant FFT network has $B (N/2) \log_2 N$ units.
Fixed-Point System: In this system, the time and hardware complexities of multiplying two $b$-bit numbers are roughly the same as those of adding $b$ $b$-bit numbers. Since the contribution of adders to hardware complexity is relatively small, we count only the multipliers. Thus, the HOR can be computed with both multiplier costs $M$ equal to $B = 4$ and both adder costs $A$ equal to 0.
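The HOR expressions themselves are not reproduced in this text, so the sketch below covers only the unit-cost bookkeeping described for the two number systems and the size of the non-fault-tolerant baseline, $B(N/2)\log_2 N$. The function names are hypothetical.

```python
def unit_costs(system: str):
    """(complex multiplier, complex adder, butterfly) costs in the units used in the text."""
    if system == "floating":
        # Real multipliers and real adders each count as one unit.
        mult = 4 + 2  # complex multiplier: four real multipliers + two real adders
        add = 2       # complex adder: two real adders
    elif system == "fixed":
        # Only real multipliers are counted; adders are neglected.
        mult = 4
        add = 0
    else:
        raise ValueError(f"unknown number system: {system}")
    butterfly = mult + 2 * add  # one complex multiplier + two complex adders
    return mult, add, butterfly


def baseline_units(N: int, system: str) -> int:
    """Hardware of the non-fault-tolerant network: B * (N/2) * log2 N units."""
    n = N.bit_length() - 1
    return unit_costs(system)[2] * (N // 2) * n


for system in ("floating", "fixed"):
    print(system, unit_costs(system), [baseline_units(N, system) for N in (128, 256, 512, 1024)])
```

For real-valued inputs, multiplying a real number by a complex weight needs only two real multipliers, which is where the multiplier cost of 2 in the next paragraph comes from.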
If input signals have only real components, the input encoders can be greatly simplified in THC and WJ; the overhead of THC is then roughly the same as that of JAH, and WJ needs 40% to 50% less hardware redundancy than JAH. Readers can verify this by setting the input-encoder multiplier cost $M$ to 2 in the above computations.
VI. CONCLUSION
In this work, we presented a new CED scheme for FFT networks.
We have shown that the scheme can theoretically achieve 100% fault coverage and throughput with equal or lower hardware redundancy than other hardware redundancy schemes. When round-off error is taken into account, we have shown that the fault coverage of the proposed scheme is better than that of all known hardware redundancy schemes.
