The applicability of special purpose computers to fast Fourier transforms. by Adams, David Hugh
Calhoun: The NPS Institutional Archive
Theses and Dissertations Thesis Collection
1967-09























THE APPLICABILITY OF SPECIAL PURPOSE
COMPUTERS TO FAST FOURIER TRANSFORMS
by
David Hugh Adams
Captain, United States Marine Corps
B.E., Vanderbilt University, 1962
Submitted in partial fulfillment of the
requirements for the degree of





The Fast Fourier Transform is an algorithm for the computation
of Discrete Fourier Transforms in less time than allowed by any other
algorithm available. The use of special purpose digital machines
to reduce those times even further is of interest for real time
spectral analysis. The main principles of Fast Fourier Transforms
are presented. The design of a full-parallel eight sample processor
is presented as a point of reference for comparison with serial and
serial-parallel hybrid machines. Carry-Save Addition is introduced
and used as the primary arithmetic logic.




2. Parallel Processing




I Equations for Eight Sample Application
II Equations for Sixteen Sample Application
III Illustrative Example of the Discrete Fourier Transform and Its Inverse




1. The Flow Diagram for the Coefficients of the Eight Sample Problem
2. The Flow Diagram for the Even Coefficients of the Sixteen Sample Problem
3. The Flow Diagram for the Odd Coefficients of the Sixteen Sample Problem
4. The Characteristic Reduction of the Fast Fourier Transform
5. The Block Diagram of the Full-Parallel Eight Sample Processor
6. Computation Time vs Number of Samples
7. Complexity vs Number of Samples

TABLE OF SYMBOLS AND ABBREVIATIONS
Symbols     Definition
Couplets    The combination of two samples in the first pass
            of a FFT. Referred to as a sum, or a difference,
            couplet depending on the method of combining.
CSA         Carry-Save Adders or Carry-Save Addition
DFT         Discrete Fourier Transform
F(n)        The nth coefficient of a DFT
FFT         Fast Fourier Transform
N           The number of samples being processed to obtain
            the DFT
nsec.       Nanosecond; 10^-9 seconds
quartets    The combination of four samples in the second pass
            of a FFT
μsec.       Microsecond; 10^-6 seconds
Z(k)        The kth sample of a sampled time signal

1. Introduction
In presenting a procedure that will determine the complex Fourier
coefficients of a sampled signal, it is first necessary to introduce
the Discrete Fourier Transform (DFT),
\[ F(n) = \sum_{k=0}^{N-1} Z(k)\exp(-j2\pi nk/N), \qquad n = 0,1,2,\ldots,N-1 \qquad (1) \]
where F(n) is the nth coefficient of the DFT and Z(k) the kth sample
of an N sample signal. It is assumed that the N samples are equally
spaced and taken at a frequency that is at least twice the highest
frequency component of a bandlimited signal. Using the principles of
orthogonality it can be shown that the inverse of the DFT is
\[ Z(k) = \frac{1}{N}\sum_{n=0}^{N-1} F(n)\exp(j2\pi nk/N), \qquad k = 0,1,2,\ldots,N-1 \qquad (2) \]
Since the procedure uses successive reductions of a finite series'
length by a factor of two, the number of samples, N, must be an integer
power, p, of 2, giving N = 2^p. It is evident at this point that p
reductions or passes will be necessary to obtain a non-serial solution
for a given coefficient.
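As a point of reference for equations (1) and (2), the following is a minimal Python sketch of the transform pair evaluated directly, term by term. The function names and the test samples are illustrative assumptions and do not come from the thesis.

```python
import cmath


def dft(z):
    """Direct evaluation of equation (1): N terms for each of N coefficients."""
    n_samples = len(z)
    return [
        sum(z[k] * cmath.exp(-2j * cmath.pi * n * k / n_samples)
            for k in range(n_samples))
        for n in range(n_samples)
    ]


def inverse_dft(f):
    """Direct evaluation of equation (2), including the 1/N factor."""
    n_samples = len(f)
    return [
        sum(f[n] * cmath.exp(2j * cmath.pi * n * k / n_samples)
            for n in range(n_samples)) / n_samples
        for k in range(n_samples)
    ]


# Arbitrary test data (an assumption for illustration), N = 8 = 2^3.
samples = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, -2.0, 1.5]
recovered = inverse_dft(dft(samples))
max_error = max(abs(a - b) for a, b in zip(samples, recovered))
print(max_error)  # round-off only
```

Applying the inverse to the forward transform returns the original samples to within round-off, which is the orthogonality property stated above.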
Partitioning equation (1) at the halfway point, it appears as
\[ F(n) = \sum_{k=0}^{N/2-1} Z(k)\exp(-j2\pi nk/N) + \sum_{k=N/2}^{N-1} Z(k)\exp(-j2\pi nk/N) \qquad (3) \]
substituting k = m + N/2, in the second half of the equation yields
\[ F(n) = \sum_{k=0}^{N/2-1} Z(k)\exp(-j2\pi nk/N) + \sum_{m=0}^{N/2-1} Z(m+N/2)\exp[-j2\pi n(m+N/2)/N] \qquad (4) \]
and combining under a single summation
\[ F(n) = \sum_{k=0}^{N/2-1} \left\{ Z(k)\exp(-j2\pi nk/N) + Z(k+N/2)\exp[-j2\pi n(k+N/2)/N] \right\} \qquad (5) \]
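The common exponential multiplier may be factored and after reduction
\[ F(n) = \sum_{k=0}^{N/2-1} \left[ Z(k) + Z(k+N/2)\exp(-j\pi n) \right]\exp(-j2\pi nk/N) \qquad (6) \]
Since exp(-jπn) is +1 for n even and -1 for n odd,
\[ F(n) = \sum_{k=0}^{N/2-1} [Z(k)+Z(k+N/2)]\exp(-j2\pi nk/N), \qquad n \text{ even} \qquad (7) \]
\[ F(n) = \sum_{k=0}^{N/2-1} [Z(k)-Z(k+N/2)]\exp(-j2\pi nk/N), \qquad n \text{ odd} \qquad (8) \]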
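The first-pass result, equations (7) and (8), can be checked numerically. The sketch below is only an illustration under assumed names and test data: it forms the sum and difference couplets and confirms that a half-length summation reproduces every coefficient of the full DFT.

```python
import cmath


def dft(z):
    """Equation (1) evaluated directly, used here only as the reference."""
    n_samples = len(z)
    return [
        sum(z[k] * cmath.exp(-2j * cmath.pi * n * k / n_samples)
            for k in range(n_samples))
        for n in range(n_samples)
    ]


def first_pass(z):
    """Form the sum couplets Z(k)+Z(k+N/2) and difference couplets Z(k)-Z(k+N/2)."""
    half = len(z) // 2
    sums = [z[k] + z[k + half] for k in range(half)]
    diffs = [z[k] - z[k + half] for k in range(half)]
    return sums, diffs


def coefficient_after_first_pass(z, n):
    """Equation (7) for even n, equation (8) for odd n."""
    n_samples = len(z)
    sums, diffs = first_pass(z)
    couplets = sums if n % 2 == 0 else diffs
    return sum(couplets[k] * cmath.exp(-2j * cmath.pi * n * k / n_samples)
               for k in range(n_samples // 2))


samples = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 1.0, 3.0]  # arbitrary test data
reference = dft(samples)
worst = max(abs(coefficient_after_first_pass(samples, n) - reference[n])
            for n in range(len(samples)))
print(worst)  # round-off only
```

Each coefficient now needs N/2 terms instead of N, at the price of N/2 additions and N/2 subtractions performed once for the whole set.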
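Initiating a new pass, equations (7) and (8) are partitioned at the
half-range point
\[ F(n) = \sum_{k=0}^{N/4-1} [Z(k)+Z(k+N/2)]\exp(-j2\pi nk/N) + \sum_{k=N/4}^{N/2-1} [Z(k)+Z(k+N/2)]\exp(-j2\pi nk/N), \qquad n \text{ even} \qquad (9) \]
\[ F(n) = \sum_{k=0}^{N/4-1} [Z(k)-Z(k+N/2)]\exp(-j2\pi nk/N) + \sum_{k=N/4}^{N/2-1} [Z(k)-Z(k+N/2)]\exp(-j2\pi nk/N), \qquad n \text{ odd} \qquad (10) \]
substituting m = k - N/4, k = m + N/4, in the second summations
\[ F(n) = \sum_{k=0}^{N/4-1} [Z(k)+Z(k+N/2)]\exp(-j2\pi nk/N) + \sum_{m=0}^{N/4-1} [Z(m+N/4)+Z(m+3N/4)]\exp[-j2\pi n(m+N/4)/N], \qquad n \text{ even} \qquad (11) \]
\[ F(n) = \sum_{k=0}^{N/4-1} [Z(k)-Z(k+N/2)]\exp(-j2\pi nk/N) + \sum_{m=0}^{N/4-1} [Z(m+N/4)-Z(m+3N/4)]\exp[-j2\pi n(m+N/4)/N], \qquad n \text{ odd} \qquad (12) \]
Combining under a single summation and extracting the common exponentials
gives
\[ F(n) = \sum_{k=0}^{N/4-1} \left\{ [Z(k)+Z(k+N/2)] + [Z(k+N/4)+Z(k+3N/4)]\exp(-j\pi n/2) \right\}\exp(-j2\pi nk/N), \qquad n \text{ even} \qquad (13) \]
\[ F(n) = \sum_{k=0}^{N/4-1} \left\{ [Z(k)-Z(k+N/2)] + [Z(k+N/4)-Z(k+3N/4)]\exp(-j\pi n/2) \right\}\exp(-j2\pi nk/N), \qquad n \text{ odd} \qquad (14) \]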






























\[ F(n) = \sum_{k=0}^{N/4-1} \left\{ [Z(k)-Z(k+N/2)] + j[Z(k+N/4)-Z(k+3N/4)] \right\}\exp(-j2\pi nk/N), \qquad (n-1)/2 \text{ odd} \qquad (18) \]
General characteristics of each reduction are now apparent. They are:
1. For the uth pass a complex multiplier of the form exp(-j2πn/2^u)
is generated.
2. The limits of the summation after the uth pass are k=0 to
k=(N/2^u)-1.
3. For each reduction beginning with some combination of samples,
T(k), the multiplicand of the complex multiplier is T(k+N/2^u).






5. When u=p, N=2^p, the limits of the summation are k=0 to k=0;
the exponential exp(-j2πnk/N) becomes unity and the non-serial solution
of F(n) is explicit.
Now, continuing the procedure for the 3rd pass, u=3:
a. the exponential multiplier is
exp(-j2πn/2^3) = exp(-jπn/4);
b. the limits of summation are k=0 to k=(N/2^3)-1 = N/8-1;
c. the sample shift is N/2^3 = N/8;






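Recalling equations (15) through (18), the following substitutions
are made to simplify bookkeeping:
A(k) = Z(k) + Z(k+N/2)
B(k) = Z(k+N/4) + Z(k+3N/4)
C(k) = Z(k) - Z(k+N/2)
D(k) = Z(k+N/4) - Z(k+3N/4)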
Also, in any set of equations having common summations, the limits of
the summations will only be shown for the first equation of the set.
Rewriting (15-18) yields
\[ F(n) = \sum_{k=0}^{N/4-1} [A(k)+B(k)]\exp(-j2\pi nk/N) \]
\[ F(n) = \sum [A(k)-B(k)]\exp(-j2\pi nk/N) \]
\[ F(n) = \sum [C(k)-jD(k)]\exp(-j2\pi nk/N) \]
\[ F(n) = \sum [C(k)+jD(k)]\exp(-j2\pi nk/N) \]
After the third pass, the resulting equations are
\[ F(n) = \sum_{k=0}^{N/8-1} \left\{ [A(k)+B(k)] + \exp(-j\pi n/4)[A(k+N/8)+B(k+N/8)] \right\}\exp(-j2\pi nk/N), \qquad n/2 \text{ even} \qquad (19) \]
\[ F(n) = \sum \left\{ [A(k)-B(k)] + \exp(-j\pi n/4)[A(k+N/8)-B(k+N/8)] \right\}\exp(-j2\pi nk/N), \qquad n/2 \text{ odd} \qquad (20) \]
\[ F(n) = \sum \left\{ [C(k)-jD(k)] + \exp(-j\pi n/4)[C(k+N/8)-jD(k+N/8)] \right\}\exp(-j2\pi nk/N), \qquad (n-1)/2 \text{ even} \qquad (21) \]












+j               (n-2)/4 odd
-.707(1+j)       (n-3)/4 even
+.707(1+j)       (n-3)/4 odd







\[ F(n) = \sum \left\{ [A(k)+B(k)] - [A(k+N/8)+B(k+N/8)] \right\}\exp(-j2\pi nk/N), \qquad n/4 \text{ odd} \qquad (24) \]






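\[ F(n) = \sum \left\{ \left( C(k) - .707[C(k+N/8)-D(k+N/8)] \right) - j\left( D(k) - .707[C(k+N/8)+D(k+N/8)] \right) \right\}\exp(-j2\pi nk/N), \qquad (n-1)/4 \text{ odd} \qquad (26) \]
\[ F(n) = \sum \left\{ [A(k)-B(k)] - j[A(k+N/8)-B(k+N/8)] \right\}\exp(-j2\pi nk/N), \qquad (n-2)/4 \text{ even} \qquad (27) \]
\[ F(n) = \sum \left\{ [A(k)-B(k)] + j[A(k+N/8)-B(k+N/8)] \right\}\exp(-j2\pi nk/N), \qquad (n-2)/4 \text{ odd} \qquad (28) \]
\[ F(n) = \sum \left\{ \left( C(k) - .707[C(k+N/8)-D(k+N/8)] \right) + j\left( D(k) - .707[C(k+N/8)+D(k+N/8)] \right) \right\}\exp(-j2\pi nk/N), \qquad (n-3)/4 \text{ even} \qquad (29) \]
\[ F(n) = \sum \left\{ \left( C(k) + .707[C(k+N/8)-D(k+N/8)] \right) + j\left( D(k) + .707[C(k+N/8)+D(k+N/8)] \right) \right\}\exp(-j2\pi nk/N), \qquad (n-3)/4 \text{ odd} \qquad (30) \]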
Substitution for the functions A,B,C, and D will produce explicit statements
of the transform equations after three passes.
Further arithmetic development is avoided for brevity with the state-
ment of the transform equation sets for the 8 sample (N=8) and 16 sample
(N=16) cases in Appendices I and II, respectively. Appendix III, in turn,
is an illustrative problem in which the spectral lines of a signal are
determined and subsequently the originating signal reproduced via the
described transform and its inverse.
The procedure just presented is indeed a very simple example
of the Fast Fourier Transform (FFT), an algorithm that allows the
computation of a time series DFT with fewer actual arithmetic
operations than other algorithms available. An arithmetic operation
is defined, here, as a complex multiplication followed by a complex
addition.
Noting that the straightforward method of computing the DFT
requires N^2 operations, Cooley and Tukey [1] showed that less than
2N log2 N operations are required when the FFT algorithm is used.
They also showed that this algorithm requires no more data storage
than the storage required for the initial samples, assuming the
initial samples are complex. The particular method demonstrated
herein, however, was first shown by Sande [2] and later referred to
as "Decimation in Frequency" by Cochran, et al. [3] because of the
characteristic divisions of the F(n)'s after each pass.
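The repeated halving can be written as a short recursive routine. The Python sketch below is one common way of organizing decimation in frequency, in which each difference couplet is multiplied by its exponential factor before the half-length transform is taken; it is an illustration under assumed function names, not the author's FORTRAN program. The two counting functions simply evaluate the N^2 and 2N log2 N figures quoted above.

```python
import cmath
import math


def dft_direct(z):
    """Equation (1) evaluated term by term, for comparison only."""
    n_samples = len(z)
    return [
        sum(z[k] * cmath.exp(-2j * cmath.pi * n * k / n_samples)
            for k in range(n_samples))
        for n in range(n_samples)
    ]


def fft_dif(z):
    """Radix-2 decimation in frequency; len(z) must be a power of two."""
    n_samples = len(z)
    if n_samples == 1:
        return [complex(z[0])]
    half = n_samples // 2
    sums = [z[k] + z[k + half] for k in range(half)]
    # Difference couplets are scaled by exp(-j*2*pi*k/N) so that the odd
    # coefficients become an ordinary half-length transform.
    diffs = [
        (z[k] - z[k + half]) * cmath.exp(-2j * cmath.pi * k / n_samples)
        for k in range(half)
    ]
    result = [0j] * n_samples
    result[0::2] = fft_dif(sums)   # even-numbered coefficients
    result[1::2] = fft_dif(diffs)  # odd-numbered coefficients
    return result


def direct_operation_count(n_samples):
    return n_samples ** 2


def fft_operation_bound(n_samples):
    return 2 * n_samples * int(math.log2(n_samples))


samples = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 1.0, 3.0,
           -2.0, 0.25, 0.0, 1.0, -1.5, 2.5, 0.75, -0.25]  # arbitrary, N = 16
worst = max(abs(a - b) for a, b in zip(fft_dif(samples), dft_direct(samples)))
print(worst)
print(direct_operation_count(1024), fft_operation_bound(1024))
```

The split into even and odd coefficients at every level is the "characteristic division" that gives the method its name.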
In Fig. 1 a flow diagram for the equations of Appendix I, the
eight sample example, is shown. To be sure, this diagram is not
entirely general; the assumption of real inputs avoids continual
complex additions and allows absolute segregation of the real and
imaginary parts of the computed coefficients. Such an assumption
has many real world applications and succeeds in reducing the
computation time and circuitry by at least a factor of two.
A flow diagram for the computation of the even numbered coef-
ficients of the 16 sample example (equations of Appendix II) is
presented in Fig. 2. It is interesting to note that this is, in
fact, the same data flow as shown in Fig. 1, with the exception






































































































































































































































































































































































couplet in the first pass column in Fig. 2 were used to define the
couplet, the last statement becomes even more obvious. Figure 3
completes the flow diagram for the 16 sample example by presenting
the flow for the odd coefficients. Notably, this type of flow
does not take full advantage of the FFT properties.
The pre-multiplication of difference couplets by appropriate
exponential factors and then continuing through the next lower order
process was shown graphically in [3] for the eight sample case.
Figure 4 extends that representation to the 16 sample case and
illustrates the ease with which general purpose computers might
handle such computations. Comparing Figures 3 and 4, the inherent
trade-off between special purpose and general purpose implementations
of a particular function is illustrated. The use of recursive procedures
and the availability of complex arithmetic make general purpose
computation of the Fig. 4 process most advantageous, whereas the
elimination of complex multiplications and complex input data in the
Fig. 3 process produces a more cumbersome set of equations that
are, however, more easily adapted to special purpose computation.
It is the purpose of this work to investigate the applicabilities
of special purpose computers to Fast Fourier Transforms. The main
criterion of comparison between special purpose and general purpose
computations will be time versus circuit complexity, where complexity
includes the number of components, the size of memory, and the size
of the control unit.
A completely parallel, 8 "real" sample, 8 spectral line processor
will be used as a vehicle for the "fast" special purpose machine.
Extrapolations to greater numbers of sampled inputs and the subsequent
increase in spectral lines will be made with respect to the increased
complexity and cost. Further, an investigation of hybrid sequential-
parallel machines will be presented.
2. Parallel Processing
The data flow diagram of the eight sample example is shown in
Fig. 1 and the explicit set of equations is given in Appendix I.
As the word "parallel" has many connotations when referring
to computers, "full parallel" computation is taken here to mean the
simultaneous computation of all equations of the set. This does not, however,
mean that the computation of each equation in all passes of the
FFT must be independent. In fact, independent computation, when
applied to the particular set of equations being examined, would be
ridiculous. The computation in each pass is done in parallel.
Examining Fig. 1, six basic combinations of four samples (quartets)
and two basic couplets are found at the end of the second pass. Focusing
attention on the quartets, for the present, the principle of
Carry-Save Adders (CSA) is utilized. As shown in Appendix IV, CSA
techniques may be used to obtain exceptionally fast addition of
several arguments through logic rather than cyclic operations as in
conventional adders. It might be added that CSA may also be used to
implement extremely fast multiplication processes. Six 4-argument
CSA's will be used at the front of the proposed processor.
At this point, the real parts of F(2) and F(6) and the imaginary
part of F(6) are available. The addition of one more rank of CSA's
to the imaginary F(6) processor provides the imaginary part of F(2),
the negative of the imaginary part of F(6). F(0) and F(4) are obtained
by crossfeeding the results of the all even samples and all odd samples
summers into two additional two-argument CSA's. The odd-sum must be










































































































To obtain the odd coefficients, the results of the two remaining
quartet CSA's must be multiplied by the constant .70711. Multiplication
is a shift and add process, dependent on the position of
"ones" in the binary multiplier. Since .70711 (decimal) = .1011010100
(binary), the shifts would be to the right and five additions necessitated.
For example, the multiplication of an eight bit quantity M by
.70711 would be represented as:
    M7  M6  M5  M4  M3  M2  M1  M0
            M7  M6  M5  M4  M3  M2  M1  M0
                M7  M6  M5  M4  M3  M2  M1  M0
                        M7  M6  M5  M4  M3  M2  M1  M0
                                M7  M6  M5  M4  M3  M2  M1  M0
P7  P6  P5  P4  P3  P2  P1  P0  P-1 P-2 P-3 P-4 P-5 P-6 P-7 P-8
A CSA multiplier for this example would propagate the carries from the
-6 digit forward and require a maximum of five arguments (the P-1 bit).
The maximum number of delays for multiplication by the given constant
would then be 5+k for a k bit multiplicand.
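The shift and add process can be sketched in a few lines of Python. The bit string is the binary constant quoted above; the function names and the choice of an integer result scaled by 2^8 (eight fraction bits, matching P-1 through P-8) are assumptions made for illustration.

```python
MULTIPLIER_BITS = "1011010100"  # fraction bits of .70711, most significant first


def shifts_for_ones(bits):
    """Right-shift amounts, one for each 'one' in the binary multiplier."""
    return [position + 1 for position, bit in enumerate(bits) if bit == "1"]


def multiply_by_constant(m, bits=MULTIPLIER_BITS, fraction_bits=8):
    """Return m * (0.bits) as an integer scaled by 2**fraction_bits.

    Each 'one' contributes a copy of m shifted right by its bit position;
    working in units of 2**-fraction_bits keeps every partial product exact.
    """
    return sum(m << (fraction_bits - shift) for shift in shifts_for_ones(bits))


shifts = shifts_for_ones(MULTIPLIER_BITS)
scaled_product = multiply_by_constant(200)   # eight-bit multiplicand, M = 200
print(shifts)
print(scaled_product / 2 ** 8, 200 * 0.70711)
```

The five shift amounts correspond to the five staggered rows of the diagram, and the truncated ten-bit constant equals 181/256, or about .70703.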
Performing the multiplication in two Carry-Save Adder (CSA)
multipliers then allows the crossfeeding directly into four 3-argument
CSA adders. This gives the respective combination for the real parts
of the odd coefficients and the imaginary parts of F(3) and F(7).
Complementing the imaginary parts of F(3) and F(7) by an additional
CSA rank on each of the sums yields the imaginary parts of F(1) and
F(5).
Having outlined the complete processor verbally, the block diagram
is presented in Fig. 5. Note that the computation of the imaginary parts
of F(1) and F(5) is the most involved; therefore, the time analysis is
based on those computations. For the sake of real time estimation,
a 12-bit word and the availability of the 20 nsec adders mentioned in
Appendix IV are assumed. The number of delays to the final stage is
taken from Appendix IV and 11 ripple delays are assumed for the 12-bit
word.
The initial four argument processor requires three delays, the
multiplication four delays, the final three argument addition two
delays, and the complement one delay for a total of 10 delays to the
final stage. Adding 11 ripple delays gives 21 units of delay for a
420 nsec total time of computation for the eight spectral lines of
a signal.
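The delay budget above is simple enough to tabulate. The following Python sketch only restates the figures given in the text; the dictionary keys and variable names are illustrative.

```python
ADDER_DELAY_NSEC = 20  # assumed full adder delay, per Appendix IV

# Delays along the longest path (imaginary parts of F(1) and F(5)).
stage_delays = {
    "four-argument CSA": 3,
    "constant multiplication": 4,
    "three-argument CSA": 2,
    "complement": 1,
}
RIPPLE_DELAYS = 11  # carry ripple assumed for the 12-bit word

delays_to_final_stage = sum(stage_delays.values())
total_delays = delays_to_final_stage + RIPPLE_DELAYS
total_time_nsec = total_delays * ADDER_DELAY_NSEC
print(delays_to_final_stage, total_delays, total_time_nsec)
```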
A maximum of k - 1 full adders are required for each bit of a
k argument CSA. A measure of complexity, the total number of adders,
may be calculated. Thirty-six adders are required for each of the
six, 4-argument, 12-bit units for a total of 246 adders for all six
units. For each of the two argument summations and the three negation
operations, 12 adders are required for a total of 60. For each of the
four 3-argument units following the multipliers, 24 adders are required
for a total of 96. Finally, for each multiplier 45 adders are needed
giving a 90 adder total. The processor then requires a grand total
of 492 adders. A completely parallel computation of the eight Fourier
coefficients takes less than .5 μsec, using five hundred 20 nsec
full adders.
Since the initial information may be stored in the same location
as the eventual answers, 192 bits of immediate access memory are needed.
No control units other than a .5 μsec clock and memory read and write
logic are required. If J-K flip-flop memory is used, only one AND
gate is needed for control of the memory.
Extension of the methods of computation shown for the eight
sample example to a sixteen sample example may be accomplished with
one basic assumption; an average of five additions is required for
each constant multiplier.
Referring to Fig. 2, the simplest method of obtaining the even
coefficients would be to provide eight parallel adders in front of
an eight sample processor. For the specified 12-bit word, this
requires an added 96 adders to the basic 492 and would add 20 nsec
to the computation time.
Computation of the odd coefficients becomes slightly involved.
It is found that 1076 adders are required and an approximate compu-
tation time of 500 nsecs achieved (based on the multiplication
assumption stated above).
Thus a 16-sample processor would require a total of 1664 adders
and would produce the 16 complex spectral lines in 500 nsec. It is
quite evident that extension of the "full parallel" computation to
higher numbers of inputs becomes extremely expensive in terms of the
number of adders required. However, the computation times are
extremely fast. It is estimated that almost two million adders would
be required for the 1024-sample case. The computations would still
be accomplished in less than 2 μsec.
It must be remembered that the figures given in this section
are for real input values, not complex. To further extend the
processors to handle complex input signals would require an approximate
25% increase in adders and a 10% increase in processing time. Thus,
the eight and sixteen sample processors would require approximately




A FORTRAN IV program that computes Fast Fourier Transforms
on an IBM-360 system was used to give some indication of a fast serial
computation of the coefficients for various numbers of samples. The
average time to perform an arithmetic operation (complex addition
followed by complex multiplication) was estimated at 22 μsec. From
that average operation time, the maximum computation times for several
different numbers of samples were computed and are shown in Fig. 6.
The question of combining the speed of parallel processing with
the simplicity of serial operation must be answered with a compromise.
It is quite evident that for 16 samples or more, straight parallel
computation is unreasonably expensive. With a very low hypothetical
price of $10.00 per adder, a 16 sample processor's arithmetic unit
alone would cost $20,000.00. Consider, however, a unit which would
compute 16 coefficients according to the flow in Fig. 4 by using a
single full-parallel eight complex sample processor and serially
processing the basic couplet's arithmetic and complex multiplications.
A general 12 x 12 bit CSA multiplier that will give the product
in 360 nsec may be constructed with 132 adders and 156 AND gates.
With four such multipliers and two simple adders a complex multiplier
is constructed that will provide the product of two complex quantities
in 380 nsec. Since the basic couplets characteristically fall into
sum and difference couplets (see the first pass column in Fig. 2) it
seems reasonable to compute both at the same time. By adding a complex
subtractor (two words) before the multiplier and a parallel adder, the
sum couplet and the difference couplet with complex multiplication are
computed in 400 nsec. Such a unit would require approximately 575
adders and 624 AND gates. If 9 AND gates are considered equivalent
to one full adder, the above circuit is equivalent to 640 adders.
At this point the "hybrid" processor consists of one eight-complex-sample
processor and one arithmetic unit that computes
the sum couplet, the difference couplet, and multiplies the difference
couplet by a complex constant in a single 400 nsec operation.
For the 16 sample problem of Fig. 4, the arithmetic unit would
only be required to cycle through eight calculations (First Pass),
then the eight sample unit would cycle twice. Assuming a .25
μsec read and write time, one complete arithmetic operation could be
completed in less than 1.25 μsec. The eight arithmetic operations
would require 10 μsec and the two cycles of the eight sample unit
9 μsec (4.5 μsec per cycle) for a 19 μsec computation time. The
above calculation times have been extended for various numbers of
sampled inputs and the results are shown in Fig. 6.
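The 16 sample timing can be laid out explicitly. This Python sketch uses only the figures stated above and covers only the 16 sample case; the variable names are illustrative, and no attempt is made to reproduce the extension to other sample counts plotted in Fig. 6.

```python
ARITHMETIC_OPERATION_USEC = 1.25   # one couplet operation including read and write
EIGHT_SAMPLE_CYCLE_USEC = 4.5      # one cycle of the eight sample unit

first_pass_operations = 8          # eight couplet calculations for N = 16
eight_sample_cycles = 2            # even half and odd half

first_pass_usec = first_pass_operations * ARITHMETIC_OPERATION_USEC
eight_sample_usec = eight_sample_cycles * EIGHT_SAMPLE_CYCLE_USEC
total_usec = first_pass_usec + eight_sample_usec
print(first_pass_usec, eight_sample_usec, total_usec)
```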
The hybrid processor's arithmetic unit at this point consists of
a 1240 adder unit. However, the processor must have a control unit
and a control memory. The control memory requires a maximum of N/2
complex multipliers plus the control program. The control unit must
be able to index the pass numbers, the arithmetic operations, and the
eight sample process pass number. Assuming a single word in memory
is equivalent to a full adder in cost and complexity, a possible basis
of comparison is achieved. Since two words of memory are needed for
each multiplier there will be N words or, equivalently, N adders
required in the control memory of an N sample unit. An assumption of
a 200 adder equivalent control unit, including the control program
memory, is a reasonable estimate. For the range of sample inputs from
8 to 1024 this value should remain relatively constant. In general,
the control unit then requires the equivalent of N + 200 adders.
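The complexity estimate reduces to a one-line formula. In the Python sketch below the constants are those quoted in the text, while the function name is an assumption for illustration.

```python
ARITHMETIC_UNIT_ADDERS = 1240   # hybrid arithmetic unit, as stated in the text
CONTROL_UNIT_ADDERS = 200       # control unit and control program memory


def hybrid_adder_equivalent(n_samples):
    """Adder-equivalent complexity of the hybrid processor for N samples."""
    control_memory_adders = n_samples   # two words for each of N/2 multipliers
    return ARITHMETIC_UNIT_ADDERS + control_memory_adders + CONTROL_UNIT_ADDERS


for n_samples in (16, 64, 256, 1024):
    print(n_samples, hybrid_adder_equivalent(n_samples))
```

For N = 16 the formula gives 1456, or about 1450 in round figures, and it grows only linearly in N, which is why the hybrid curve stays comparatively flat.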
The 16 sample hybrid processor is the equivalent of 1450 full
adders. Figure 7 illustrates the relative complexities of the hybrid,
the parallel, and the serial processor. The basic requirements for
sample input and coefficient output memory are the same for all three



































FIGURE 7. COMPLEXITY Vs. NUMBER OF SAMPLES
4. Conclusions
Figure 6 graphically shows the superiority of the parallel
processor with respect to speed of computation while Fig. 7 demonstrates
the enormity of size and complexity required to obtain those
speeds. Conversely, Fig. 7 shows the serial processor holding the
complexity advantage while being extremely slow in computation, as
shown in Fig. 6.
The combining of serial and parallel operations succeeded in
incorporating the better qualities of each. Certainly, the complexity
of the hybrid system follows the same trend as the serial
processor in that it is relatively flat in the log-log plot. This
was to be expected, since the major nature of the hybrid system
is serial. The parallel characteristics are
found in the handling of several computations at one time and in reducing
substantially the arithmetic operation time by batch processing
through the eight sample full-parallel processor in the final stages.
On the log-log plot of Fig. 6 the slope of the hybrid curve is slightly
greater than that of the serial curve. However, at the last point
presented (1024 samples) the hybrid processor is still 500 times
faster than the serial unit.
It is evident from the computation times shown that definite
applications for the special purpose handling of FFT's do exist. In
fact, for many applications, real-time analysis is possible. A central
problem remains, that of analog to digital conversion of the data in
times that will be able to fully utilize the speed of the processors.
BIBLIOGRAPHY
1. J. W. Cooley and J. W. Tukey, "An Algorithm for the Machine
Calculation of Complex Fourier Series," Mathematics of
Computation, Vol. 19, No. 90, (1965), pp. 297-301.
2. W. M. Gentleman and G. Sande, "Fast Fourier Transforms for





, Vol. 29, Washington, D. C.: Spartan, 1966,
pp. 563-578.
3. W. T. Cochran, J. W. Cooley, D. L. Favin, H. D. Helms,
R. A. Kaenel, W. W. Lang, G. C. Maling, Jr., D. E. Nelson,
C. M. Rader, and P. D. Welch, "What is the Fast Fourier
Transform?", IEEE Transactions on Audio and Electroacoustics,
Vol. AU-15, No. 2, June 1967, pp. 45-55.
4. M. S. Schmookler, "Microelectronics Opens the Gate to
Faster Digital Computers," Electronic Design, Ed. 16,
July 5, 1966, pp. 52-57.
5. I. Flores, The Logic of Computer Arithmetic, Prentice-Hall,
Inc., Englewood Cliffs, New Jersey, 1963.
APPENDIX I
EQUATIONS FOR EIGHT SAMPLE APPLICATION
The following set of equations defines the DFT for N=8. The
equations are expressed in terms of the sample number (sample subscript)
rather than using the subscripted notation of the Introduction. In
this notation the couplet [Z(0)+Z(4)] would appear as (0+4), and
non-integer numbers are multiplicative constants.
F(0) = [(0+4)+(2+6)] + [(1+5)+(3+7)]
F(1) = {(0-4)+.707[(1-5)-(3-7)]} - j{(2-6)+.707[(1-5)+(3-7)]}
F(2) = [(0+4)-(2+6)] - j[(1+5)-(3+7)]
F(3) = {(0-4)-.707[(1-5)-(3-7)]} + j{(2-6)-.707[(1-5)+(3-7)]}
F(4) = [(0+4)+(2+6)] - [(1+5)+(3+7)]
F(5) = {(0-4)-.707[(1-5)-(3-7)]} - j{(2-6)-.707[(1-5)+(3-7)]}
F(6) = [(0+4)-(2+6)] + j[(1+5)-(3+7)]
F(7) = {(0-4)+.707[(1-5)-(3-7)]} + j{(2-6)+.707[(1-5)+(3-7)]}
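The eight sample equations can be exercised against a direct evaluation of the DFT. The Python sketch below is an illustration only: the function names and test data are assumptions, and the exact value 1/sqrt(2) is used in place of the rounded constant .707 so that the comparison holds to round-off.

```python
import cmath


def eight_sample_dft(z):
    """Eight-sample equations for real inputs, using sums and differences only."""
    s04, s26, s15, s37 = z[0] + z[4], z[2] + z[6], z[1] + z[5], z[3] + z[7]
    d04, d26, d15, d37 = z[0] - z[4], z[2] - z[6], z[1] - z[5], z[3] - z[7]
    c = 0.5 ** 0.5                 # exact counterpart of the constant .707
    p = c * (d15 - d37)            # .707[(1-5)-(3-7)]
    q = c * (d15 + d37)            # .707[(1-5)+(3-7)]
    return [
        complex((s04 + s26) + (s15 + s37), 0.0),   # F(0)
        complex(d04 + p, -(d26 + q)),              # F(1)
        complex(s04 - s26, -(s15 - s37)),          # F(2)
        complex(d04 - p, d26 - q),                 # F(3)
        complex((s04 + s26) - (s15 + s37), 0.0),   # F(4)
        complex(d04 - p, -(d26 - q)),              # F(5)
        complex(s04 - s26, s15 - s37),             # F(6)
        complex(d04 + p, d26 + q),                 # F(7)
    ]


def dft_direct(z):
    """Equation (1) of the Introduction, evaluated term by term."""
    n_samples = len(z)
    return [
        sum(z[k] * cmath.exp(-2j * cmath.pi * n * k / n_samples)
            for k in range(n_samples))
        for n in range(n_samples)
    ]


samples = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 1.0, 3.0]  # arbitrary real data
worst = max(abs(a - b)
            for a, b in zip(eight_sample_dft(samples), dft_direct(samples)))
print(worst)  # round-off only
```

Only two real multiplications by the constant are needed, and F(5), F(6) and F(7) are the conjugates of F(3), F(2) and F(1), which is the saving that real inputs provide.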
APPENDIX II
EQUATIONS FOR SIXTEEN SAMPLE APPLICATION
The set of equations beginning on the following page defines
the DFT for N=16. As in Appendix I, the equations are expressed






















[Equation set for F(0) through F(15): not recoverable from this copy.]
APPENDIX III
ILLUSTRATIVE EXAMPLE OF THE DISCRETE
FOURIER TRANSFORM AND ITS INVERSE
A time signal, Z(t), is sampled eight times at equally spaced
intervals. For convenience, the signal is a sinewave and the






[Figure: the sampled sinewave, with the time axis marked at π/4, π/2, 3π/4, π, 5π/4, 3π/2 and 7π/4.]






















Now, solving for the coefficients
F(0) = (0+0)+(0+0) = 0
F(1) = [1.414+.707(2-0)]-j[1.414+.707(2+0)] = 2.828(1-j)
F(2) = (0-0)-j(0-0) = 0
F(3) = [1.414-.707(2-0)]+j[1.414-.707(2+0)] = 0+j0 = 0
F(4) = (0+0)-(0+0) = 0
F(5) = [1.414-.707(2-0)]-j[1.414-.707(2+0)] = 0-j0 = 0
F(6) = (0-0)+j(0-0) = 0
F(7) = [1.414+.707(2-0)]+j[1.414+.707(2+0)] = 2.828(1+j)
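The worked values can be reproduced numerically. In the Python sketch below the samples are taken to be sin(kπ/4 + π/4), k = 0 to 7; that choice is an assumption, made because it gives the couplet values 1.414, 2 and 0 used above. Function names are illustrative.

```python
import cmath
import math

# Assumed sample set: a sinewave sampled eight times, starting at .707.
samples = [math.sin(k * math.pi / 4 + math.pi / 4) for k in range(8)]


def dft(z):
    """Forward transform, evaluated directly."""
    n_samples = len(z)
    return [
        sum(z[k] * cmath.exp(-2j * cmath.pi * n * k / n_samples)
            for k in range(n_samples))
        for n in range(n_samples)
    ]


def inverse_dft(f):
    """Inverse transform with the 1/N factor."""
    n_samples = len(f)
    return [
        sum(f[n] * cmath.exp(2j * cmath.pi * n * k / n_samples)
            for n in range(n_samples)) / n_samples
        for k in range(n_samples)
    ]


coefficients = dft(samples)
recovered = inverse_dft(coefficients)
for n, value in enumerate(coefficients):
    print(n, round(value.real, 3), round(value.imag, 3))
```

Only F(1) and F(7) are non-zero, and the inverse returns the eight samples to within round-off.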
If the foregoing procedure is correct, application of the inverse
transform should produce the original sample values. Noting that the only
non-zero coefficients are F(l) and F(7), the inverse is written
Z(k) = (1/8)[F(1)exp(jπk/4) + F(7)exp(j7πk/4)]
and the Z(k)'s are
Z(0) = (2.828/8)[(1-j)+(1+j)] = .707
Z(1) = (2.828/8)[(1-j)(.707)(1+j)+(1+j)(.707)(1-j)] = 1
Z(2) = (2.828/8)[(1-j)(j)+(1+j)(-j)] = .707




Z(7) = (2.828/8)[(.707)(1-j)^2+(.707)(1+j)^2] = 0
The above Z(k) match precisely the original data; thus showing the




Basically a full adder has three inputs (two addends and a
carry-in) and two outputs (sum and carry-out). Thus, it may be
looked upon as a reduction from three arguments to two.
Considering the "carry-in" as a third addend, it is apparent that
the addition of more than two quantities may be handled by cascading
the aforementioned reduction process. Such implementation of full
adders has led to both the process and the adders themselves being
called Carry-Save Adders (CSA).
If more than three addends are being summed more than one
initial CSA would be required. The figure below schematically shows














Schematic representation of a six argument Carry-Save Addition
Note that this is strictly a representation of the reduction process.
In the actual process the "carry-out" of each adder must go to the
next higher significant bit train. Its position, as shown in the
diagram, would be taken by the carries from the next lower significant
bit train. It is noted here that the CSA process generally requires
k-1 units for the reduction of k addends. The final stage (level) of
the CSA process is of interest since the propagation of the carries
is of a "ripple" nature.
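The three-to-two reduction can be modelled on whole words with bitwise operations. The Python sketch below is a behavioural illustration for non-negative integers, with assumed function names; it counts units in sequence and does not model the parallel arrangement or the timing of the hardware.

```python
def carry_save_add(a, b, c):
    """One CSA unit: reduce three addends to a sum word and a carry word.

    Each bit position acts as an independent full adder; the carry word is
    shifted left because a carry-out feeds the next higher bit train.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def csa_sum(addends):
    """Reduce the addends three-to-two until two remain, then add them.

    Returns the total and the number of CSA units used before the final,
    carry-propagating adder.
    """
    pending = list(addends)
    csa_units = 0
    while len(pending) > 2:
        a, b, c = pending.pop(), pending.pop(), pending.pop()
        pending.extend(carry_save_add(a, b, c))
        csa_units += 1
    return sum(pending), csa_units


total, units = csa_sum([3, 5, 7, 9])
print(total, units)
```

Four addends use two carry-save units plus the final adder, three units in all, in agreement with the k-1 rule.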
The following diagram presents a complete four-bit adder

































Block diagram of a four-bit, four addend,
Carry-Save Adder (an addition of four 4-bit words)
is strictly a logic function. With the assumption of a realistic
signal delay time through each logic level, the total computation
time may be calculated. Although pressing the state of the art
at this time, an assumption of a 20 nsec full adder is not unrealistic.
Thus, for the given adder, three levels of delay plus the final carries'
ripple through two delay levels gives a total of five delays, or a 100
nsec addition of four, 4-bit arguments.
With increasing word lengths the computation time increases
linearly for the CSA process. This characteristic is the result of
adding one more stage of carry ripple for each additional bit. For
a 12-bit version of the four argument adder, there would be 13 units
of delay and an addition time of 260 nsec.
The use of carry look ahead techniques can reduce the ripple
time by a factor of two per stage of delay. For the 12-bit example,
the three initial units of delay are summed with 10 half-units for
8 delays and a 160 nsec addition of four, 12-bit, quantities is
possible.
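The delay figures for the four argument adder follow a simple pattern. In the Python sketch below the ripple length of (word length - 2) stages is an inference from the two cases quoted, two stages for 4 bits and ten for 12 bits, and is an assumption rather than a stated rule; the names are illustrative.

```python
ADDER_DELAY_NSEC = 20   # assumed full adder delay
CSA_LEVELS = 3          # levels of delay for the four argument adder


def delay_units(word_bits, look_ahead=False):
    """Delay units for a four argument CSA addition of word_bits-bit words."""
    ripple_stages = word_bits - 2          # inferred: 2 for 4 bits, 10 for 12 bits
    ripple_units = ripple_stages / 2 if look_ahead else ripple_stages
    return CSA_LEVELS + ripple_units


for bits, look_ahead in ((4, False), (12, False), (12, True)):
    units = delay_units(bits, look_ahead)
    print(bits, look_ahead, units, units * ADDER_DELAY_NSEC)
```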
Finally, two's complement arithmetic is appropriate since it is
easily obtained by using the ones' complement and just adding one
in the least significant bit train.
The following table gives the number of delays as a function of
the number of arguments in a CSA process.


















3. Commandant of the Marine Corps (Code A03C) 1
Headquarters, U. S. Marine Corps
Washington, D. C. 22214
4. Professor Harold A. Titus 10
Department of Electrical Engineering
Naval Postgraduate School
Monterey, California 93940











DOCUMENT CONTROL DATA - R&D
(Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified)
1. ORIGINATING ACTIVITY (Corporate author)
Naval Postgraduate School
Monterey, California 93940

The Applicability of Special Purpose Computers to Fast Fourier Transforms
4. DESCRIPTIVE NOTES (Type of report and inclusive dates)
Master's Thesis - September 1967
5. AUTHOR(S) (Last name, first name, initial)
Adams, David H., Captain, U. S. Marine Corps
6. REPORT DATE
September 1967
7a. TOTAL NO. OF PAGES
45
7b. NO. OF REFS
5
8a. CONTRACT OR GRANT NO.
b. PROJECT NO.
9a. ORIGINATOR'S REPORT NUMBER(S)
9b. OTHER REPORT NO(S) (Any other numbers that may be assigned
this report)
10. AVAILABILITY/LIMITATION NOTICES
This document is subject to special export controls and each transmittal to foreign
government or foreign nationals may be made only with prior approval of the Naval
Postgraduate School.
11. SUPPLEMENTARY NOTES 12 SPONSORING MILITARY ACTIVITY
a) U. S. Marine Corps
b) Naval Weapons Center
China Lake, California
13. ABSTRACT
The Fast Fourier Transform is an algorithm for the computation of Discrete
Fourier Transforms in less time than allowed by any other algorithm available.
The use of special purpose digital machines to reduce those times even further
is of interest for real time spectral analysis. The main principles of Fast
Fourier Transforms are presented. The design of a full-parallel eight sample
processor is presented as a point of reference for comparison with serial and
serial-parallel hybrid machines. Carry-Save Addition is introduced and used
as the primary arithmetic logic.

















