A Modular Mixed Signal VLSI Design Approach for Digital Radar Applications by Brakus, Brian M.
Air Force Institute of Technology 
AFIT Scholar 
Theses and Dissertations Student Graduate Works 
3-7-2007 
A Modular Mixed Signal VLSI Design Approach for Digital Radar 
Applications 
Brian M. Brakus 
Recommended Citation 
Brakus, Brian M., "A Modular Mixed Signal VLSI Design Approach for Digital Radar Applications" (2007). 
Theses and Dissertations. 3098. 
https://scholar.afit.edu/etd/3098 
A Modular Mixed Signal
VLSI Design Approach
for
Digital Radar Applications
THESIS
Brian M. Brakus, Captain, USAF
AFIT/GCE/ENG/07-02
DEPARTMENT OF THE AIR FORCE
AIR UNIVERSITY
AIR FORCE INSTITUTE OF TECHNOLOGY
Wright-Patterson Air Force Base, Ohio
APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED.
The views expressed in this thesis are those of the author and do not reflect the
official policy or position of the United States Air Force, Department of Defense, or
the United States Government.
Presented to the Faculty
Department of Electrical and Computer Engineering
Graduate School of Engineering and Management
Air Force Institute of Technology
Air University
Air Education and Training Command
In Partial Fulfillment of the Requirements for the
Degree of Master of Science in Computer Engineering
Brian M. Brakus, B.S.Cp.E.
Captain, USAF
March 2007
Approved:
/signed/ 7 Mar 2007
Dr. Yong C. Kim (Chairman) date
/signed/ 7 Mar 2007
Lt Col James A. Fellows, PhD (Member) date
/signed/ 7 Mar 2007
Dr. Guna S. Seetharaman (Member) date
/signed/ 7 Mar 2007
Dr. Greg L. Creech (Member) date
/signed/ 7 Mar 2007
Dr. John M. Emmert (Member) date
Abstract
This study explores the idea of building a library of configurable VHDL components
for use in digital radar applications. Configurable components allow a designer
to choose which components he or she needs and to configure those components for a
specific application. This shortens design time for ASICs and FPGAs because the
components are already designed and tested. The idea is demonstrated with a
configurable, dynamic, pipelinable fast Fourier transform (FFT). Many FFT
implementations exist, but this implementation is both configurable and dynamic.
Pre-synthesis customization allows the FFT to be tailored to almost any DSP
application, and the dynamic property allows the FFT to calculate FFTs of different
lengths in real time. Three objectives will be accomplished: design and
characterization of the aforementioned FFT; analysis of the error involved in the FFT
calculation using different twiddle factor bit widths; and finally an analysis of all
the configurations for the synthesized design using a 90 nm technology library. Speeds
of up to 225 MHz have been simulated for a length-1024 FFT using the 90 nm technology.
Acknowledgements
First and foremost, I owe a large debt of gratitude to my wife for her understanding
and patience during this research. Additionally, I would like to thank Dr. Yong Kim
for his wisdom and helpfulness with the research and writing of this thesis. I would
also like to thank Dr. Marty Emmert and Dr. Greg Creech for allowing me to work
with AFRL/SND, and I hope the knowledge and data within this paper prove useful.
Lastly, I’d like to thank my fellow students for their knowledge and research efforts:
Jason Paul, Joe Pomager, and Jason Shirley.
Brian M. Brakus
Table of Contents
Abstract
Acknowledgements
List of Figures
List of Tables
List of Abbreviations
I. Introduction
  1.1 Specific Issue:
  1.2 Problem Statement:
  1.3 Scope and Assumptions:
  1.4 Thesis Organization:
  1.5 Chapter Summary:
II. Background
  2.1 Fourier Transform
  2.2 Discrete Fourier Transforms
  2.3 Fast Fourier Transforms
    2.3.1 Decimation-In-Time FFT
    2.3.2 Decimation-In-Frequency FFT
  2.4 VHDL
  2.5 Other FFT Implementations
  2.6 Chapter Summary
III. Methodology
  3.1 Overall Design
  3.2 Butterfly Modules
  3.3 Twiddle Factors
  3.4 Timing
  3.5 Stage 1
  3.6 Stage X
  3.7 Completed Design
  3.8 Testing Procedure
  3.9 Chapter Summary
IV. Analysis and Results
  4.1 Error Analysis
    4.1.1 Simple Cosine Curve
    4.1.2 Frequency Sweep
  4.2 Timing Analysis
  4.3 Total Area Analysis
  4.4 Chapter Summary
V. Conclusions
  5.1 Explanation of the Problem
  5.2 Summary of Background
  5.3 HDL Code Development: Significance, Limitations, and Further Research
    5.3.1 Expanding Beyond the 1024-point Limit
    5.3.2 Implementation of a Decimation-In-Time-Frequency Algorithm
Appendix A. FFT Synthesis Testings Scripts
Appendix B. FFT Error Analysis Testings Scripts
Bibliography
Vita
List of Figures
2.1 Fourier Transform Example
2.2 DIT - Stage 1
2.3 DIT - Stage 2
2.4 DIT - Length Two DFT
2.5 DIT - Full Decomposition
2.6 DIF - Stage 1
2.7 DIF - Full Decomposition
2.8 Overall Architecture of Reconfigurable FFT Processor
3.1 R2^2SDF FFT Architecture for N=1024
3.2 BF2I Module
3.3 BF2II Module
3.4 W3 Module for a 1024 point dynamic FFT
3.5 Timing Controller and pipeline for a 1024 point dynamic FFT
3.6 Stage 1 Flow Graph Comparison
3.7 Stage X of FFT
3.8 Layout of a 1024 length dynamic FFT
4.1 Average % Error
4.2 Length 8 Error Analysis
4.3 Length 16 Error Analysis
4.4 Length 32 Error Analysis
4.5 Length 64 Error Analysis
4.6 Length 128 Error Analysis
4.7 Length 256 Error Analysis
4.8 Length 512 Error Analysis
4.9 Length 1024 Error Analysis
4.10 Frequency Sweep Plot
4.11 Average Error in Frequency Sweep
4.12 Maximum Error in Frequency Sweep
4.13 Timing Analysis for 350 nm
4.14 Timing Analysis for 90 nm
4.15 Total Area Analysis for 350 nm
4.16 Total Area Analysis for 90 nm
5.1 Previous Implementation Comparison
5.2 DITF Flow Graph
List of Tables
2.1 Bit-reversed Order Example
2.2 Power and Area Results
3.1 Configurable parameters.
3.2 Inputs and Outputs.
3.3 Wx Modules.
4.1 Parameters for Timing Analysis
4.2 Parameters for Total Area
5.1 DITF Calculation Comparison
List of Abbreviations
EW      Electronic Warfare
DSP     Digital Signal Processing
ADC     Analog to Digital Converter
ASIC    Application Specific Integrated Circuit
RTL     Register Transfer Language
FPGA    Field Programmable Gate Array
DoD     Department of Defense
FFT     Fast Fourier Transform
TSMC    Taiwan Semiconductor Manufacturing Company
VHSIC   Very-High-Speed Integrated Circuit
VHDL    VHSIC Hardware Description Language
AFRL    Air Force Research Lab
DFT     Discrete Fourier Transform
DIT     Decimation In Time
DIF     Decimation In Frequency
CMOS    Complementary Metal-Oxide Semiconductor
DITF    Decimation-In-Time-Frequency
I. Introduction
Since at least the year 2000 the Pentagon has seen a need for highly advanced
electronic warfare (EW) aircraft. The Pentagon published a Kosovo after-action
report to Congress discussing how NATO forces had difficulty in targeting missile
sites [8]. A separate report said the problems included interference from other
aircraft's jammers with friendly targeting devices. These reports prompted Congress
to begin a study of ways to improve EW. Billions of dollars are spent researching and
developing newer and more advanced radar systems. In addition to the high costs,
design and development can take months or even years.
In most radar systems digital signal processing (DSP) is used extensively. DSP
is the study of signals in a digital representation and of the methods used to process
these signals. The main goal of DSP is to filter or measure real-time analog signals.
An analog-to-digital converter (ADC) is first used to transform the analog signals used
in radar communications into digital signals. Many types of filters and transforms are
used in DSP. These functions are implemented in some type of Application Specific
Integrated Circuit (ASIC). A general conventional design flow for ASICs is as follows:
1. Functional Specifications
2. Design Partitioning
3. Register Transfer Language (RTL) Design & Simulation
4. Functional Verification
5. Synthesis for Area & Timing Optimizations
6. Placement & Routing
7. Chip Fabrication
This design flow is limited by the length of time needed to make an ASIC.
The RTL design and simulation process (Step #3) alone can take many weeks or months
to complete, depending on the complexity of the design. The design flow is also
limited by the high costs associated with it and the ASIC's limited flexibility. To
solve this problem, a speedy and adaptable design flow will be proposed by placing
pre-defined modular components into a library. This library will consist of highly
customizable and configurable code for DSP functions that can target either ASICs
or field programmable gate arrays (FPGAs) to produce circuits suited to the intended
applications. The development of this library will be a time-consuming process in
itself, but once the library is complete all a designer must do is pick and choose
which components from the library he or she wants to use. Because the components are
configurable, a wide range of designs can be built from them. Performing this work
in-house will save the Department of Defense (DoD) from having to out-source to
companies such as Boeing or Raytheon, who could charge millions of dollars to produce
such a product.
1.1 Specific Issue:
DSP is an extremely important function in radar applications. The processing
of digital data must be performed as fast as possible so the warfighter has the
advantage in any combat situation. One such component in DSP is the Fast Fourier
Transform (FFT). The FFT is an algorithm for converting a digital signal in the
time domain to a signal in the frequency domain. One of the original uses of the
FFT was to distinguish between nuclear explosions and natural seismic events. These
two phenomena produce different frequency spectra. By converting the signals to the
frequency domain a distinction between the two events could be seen. Aircraft have
different radar signatures, so by using the FFT on the radar signals received the pilot
can see the aircraft's location and speed. In this research a configurable FFT will be
developed for the aforementioned library. In addition, this FFT will be dynamic; i.e.,
it will be able to calculate FFTs of different lengths in real time. This implementation
will use a 90 nm technology library from the Taiwan Semiconductor Manufacturing
Company (TSMC). Results from this library will be compared to those of the AMI
350 nm library from Oklahoma State University. The 90 nm technology will provide
faster speeds and lower power consumption than the 350 nm library.
1.2 Problem Statement:
The problem to be solved is the demonstration of a modular digital radar library
by designing and characterizing one possible component. The FFT being designed
will be both configurable and dynamic. The configurable parameters can be changed
pre-synthesis and the dynamic parameters can be changed at run-time. To keep the
chip size small and power consumption low, a minimal hardware approach will be
used. This will result in a longer design time for each component in the library but
will allow for the most efficient design.
1.3 Scope and Assumptions:
It is assumed that readers of this paper will have a basic understanding of digital
signal processing and more specifically FFTs. Additionally, strong knowledge
of the Very-High-Speed Integrated Circuit (VHSIC) Hardware Description Language
(VHDL) is required to understand the coding of the design. The software used for
this research includes Modelsim for circuit simulations, MATLAB for simulations and
error analysis, and the Cadence Encounter RTL Compiler for synthesis and power,
timing, and area analysis. A knowledge of simple digital logic components is also
assumed. Such components include muxes, adders, and pipeline registers.
1.4 Thesis Organization:
The next chapter of this research project will discuss background information
necessary to understand the scope of this project. A discussion of the mathematics
and algorithms for the FFT is included. Additionally, several current (within the past
5 years) FFT implementations will be analyzed and their results discussed. Chapter
III will consist of the theory involved in the design of the FFT architecture and the
methods of testing used. The results of the implementation and characterization
will be discussed in Chapter IV. Finally, a review and a look at future topics will be
presented in Chapter V. All VHDL code will be viewable in the appendices.
1.5 Chapter Summary:
The purpose of this research project is to characterize and implement an FFT
component for use at the Air Force Research Lab (AFRL). Pending a successful
demonstration of this component, the component will be included in a future library
of many configurable DSP functions, saving the U.S. military millions of dollars in
addition to many months of design time.
II. Background
This chapter provides an overview of the research involved in understanding the
scope of this thesis. Fourier transforms and specifically FFTs are reviewed.
Two FFT algorithms are discussed, as one will be used in the FFT implementation.
Many FFT implementations are available in the IEEE database. Several of the most
recent implementations and their claimed results are analyzed.
2.1 Fourier Transform
To understand the derivation and need for the Fast Fourier Transform, we will
first look at the Fourier Transform. The Fourier transform and series are named
after the French scientist and mathematician Joseph Fourier. The equation for the
Fourier Transform is given in (2.1). It is a generalization of the complex Fourier
Series [4]. This equation takes a signal in the time domain and transforms it into the
frequency domain. Information such as frequency range and energy can be obtained
from the frequency domain representation. Figure 2.1 shows an example of such a
transformation.
\[ X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt \tag{2.1} \]
Figure 2.1: Signal representation in (a) time domain (b) frequency domain
2.2 Discrete Fourier Transforms
The Fourier Transform works on continuous signals. In fields such as signal
processing, signals are usually sampled. These sampled signals are called discrete
signals. To calculate the Fourier Transform of a discrete signal, the Discrete Fourier
Transform (DFT) is used. James Tsui explains the two limitations of the continuous
Fourier transform in his book [21]:
First, the function in the time domain must be representable in closed
form so that the Fourier integral can be performed. Thus, unless the
input function can be written in closed form, it is impossible to evaluate
the integral. Second, even if the time function can be written in closed
form, it might also be difficult to find a closed-form solution to the integral.
The data to be transformed comes from an ADC, so it is digitized and the function
in the time domain is unknown. Unlike the Fourier transform, the DFT can be
performed on any kind of input data; therefore, its usage is unlimited [21]. Also, the
results from a DFT are an approximate solution.
The general definition of the DFT is as follows: let x(n), n = 0, 1, 2, ..., N − 1,
be an N-point sequence. From [18], the definition of its discrete Fourier transform is
\[ X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j\frac{2\pi}{N}nk}, \quad k = 0, 1, 2, \ldots, N-1 \tag{2.2} \]
For convenience, denote $e^{-j\frac{2\pi}{N}}$ by $W_N$, so equation (2.2) becomes:
\[ X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{kn}, \quad k = 0, 1, 2, \ldots, N-1 \tag{2.3} \]
which can be expanded into
\[ X(k) = x(0)W_N^{0} + x(1)W_N^{k} + x(2)W_N^{2k} + \cdots + x(N-1)W_N^{(N-1)k} \tag{2.4} \]
The $W_N$ term is an Nth root of unity, and its powers are known as "twiddle factors". This
term was coined by Gentleman and Sande in 1966, and has since become widespread
in the world of FFTs [9]. From equation (2.4), the calculation of each X(k) requires
N complex multiplications and N complex additions. Since X(k) is calculated for k from 0
to N − 1, the direct computation of the DFT requires on the order of N^2 multiplications
and N^2 additions. The complexity of this equation is O(N^2). For example, a 1024
point DFT would require approximately 2,097,152 operations (2 × 1024^2). Fortunately there
is an algorithm which will reduce the complexity from O(N^2) to O(N log2 N). For the
same 1024 point transform, only about 20,480 operations (2 × 1024 × 10) will be needed, a
large decrease which means a faster calculation. This algorithm is called the Fast Fourier
Transform.
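The operation counts above can be checked with a short sketch. The Python below is an illustration only and is not part of the thesis code; the function names are hypothetical. It evaluates equation (2.3) literally and tallies operations under the same assumption as the text, namely one multiplication and one addition per term.

```python
import cmath
import math


def direct_dft(x):
    """Evaluate equation (2.3) literally: X(k) = sum over n of x(n) * W_N^(k*n)."""
    n_points = len(x)
    w_n = cmath.exp(-2j * cmath.pi / n_points)  # W_N = e^(-j*2*pi/N)
    return [sum(x[n] * w_n ** (k * n) for n in range(n_points))
            for k in range(n_points)]


def dft_operation_count(n_points):
    """N^2 complex multiplications plus N^2 complex additions."""
    return 2 * n_points ** 2


def fft_operation_count(n_points):
    """N*log2(N) multiplications plus N*log2(N) additions (N a power of 2)."""
    return 2 * n_points * int(math.log2(n_points))


print(dft_operation_count(1024))  # 2097152
print(fft_operation_count(1024))  # 20480
```

For N = 1024 the direct form needs roughly one hundred times as many operations as the FFT.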
2.3 Fast Fourier Transforms
In 1805, Carl Friedrich Gauss described the critical factorization steps for the
FFT. About 160 years later, in 1965, James Cooley and John Tukey formally published
the algorithm for the FFT [7]. They exploited the symmetrical properties of complex
exponentiation, reducing the complexity to O(N log2 N). There are two variations of the
FFT algorithm, the Decimation-In-Time (DIT) FFT algorithm and the Decimation-In-Frequency
(DIF) FFT algorithm.
2.3.1 Decimation-In-Time FFT. Cooley and Tukey, using the Danielson-Lanczos
Lemma from 1942 [16], developed what is known as the decimation-in-time
FFT algorithm. This lemma is only applicable if the length is a power of 2. From
now on, we will assume N, the length of the transform, is a power of 2. The allowable
lengths in the VHDL implementation range from 4, 8, 16, ..., up to 1024. For the
DIT algorithm, x(n) is divided into two sequences, each of length N/2. The even-indexed
samples and odd-indexed samples are grouped separately. Equation (2.2) can
be rewritten as
\begin{align*}
X(k) &= \sum_{n=0}^{N-1} x(n)\, e^{-j\frac{2\pi nk}{N}} \\
     &= \sum_{n=0}^{\frac{N}{2}-1} x(2n)\, e^{-j\frac{2\pi (2n)k}{N}} + \sum_{n=0}^{\frac{N}{2}-1} x(2n+1)\, e^{-j\frac{2\pi (2n+1)k}{N}} \\
     &= \sum_{n=0}^{\frac{N}{2}-1} x(2n)\, e^{-j\frac{2\pi nk}{N/2}} + e^{-j\frac{2\pi k}{N}} \sum_{n=0}^{\frac{N}{2}-1} x(2n+1)\, e^{-j\frac{2\pi nk}{N/2}} \tag{2.5} \\
     &= \mathrm{DFT}_{\frac{N}{2}}[[x(0), x(2), \ldots, x(N-2)]] + W_N^{k}\, \mathrm{DFT}_{\frac{N}{2}}[[x(1), x(3), \ldots, x(N-1)]]
\end{align*}
The simplifications in equation (2.5) show that all frequency outputs X(k) can
be computed as the sum of the outputs of two length N/2 DFTs, using the even-indexed
and odd-indexed discrete samples respectively. The odd-indexed short DFT
is multiplied by a "twiddle factor" term, $W_N^k$. Because the samples are split into two
separate groups, this algorithm is called a "radix-2" algorithm. Other such algorithms
exist for radix-4 and radix-8, but will not be discussed in this paper. Since the time
samples are rearranged in alternating groups, this algorithm is called decimation in
time. Figure 2.2 shows how this process begins by breaking the inputs up into two
N/2 DFTs. The recombine stage shown in the figure is used to combine the samples
in the correct order. This process is covered later. Now, the two N/2 stages can
be broken down into four N/4-point DFTs, as shown in Figure 2.3. This process is
repeated until a series of two-point DFTs is reached. Figure 2.4 shows the flow
graph for a two-point FFT. This structure is also known as a butterfly.
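The even/odd split in equation (2.5) maps directly onto a recursive routine. The following Python sketch is illustrative only; it is a software model, not the hardware design described in this thesis, and the name fft_dit is hypothetical. It assumes the input length is a power of 2.

```python
import cmath


def fft_dit(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    n_points = len(x)
    if n_points == 1:
        return list(x)
    even = fft_dit(x[0::2])  # DFT of x(0), x(2), ..., x(N-2)
    odd = fft_dit(x[1::2])   # DFT of x(1), x(3), ..., x(N-1)
    half = n_points // 2
    out = [0j] * n_points
    for k in range(half):
        twiddle = cmath.exp(-2j * cmath.pi * k / n_points)  # W_N^k
        out[k] = even[k] + twiddle * odd[k]
        # W_N^(k + N/2) = -W_N^k, so the same product is reused with a sign flip.
        out[k + half] = even[k] - twiddle * odd[k]
    return out
```

Each pass through the loop body is one butterfly: two outputs are formed from the same pair of half-length results and a single twiddle multiplication.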
Figure 2.5 shows an example for a length of 8. Notice the “out-of-order” ordering
of the inputs. Actually, this is bit-reversed ordering, and is a natural process due to
the mathematics of the FFT. To obtain a bit reversed number simply take the binary
equivalent, reverse the order of the bits, and recalculate the decimal equivalent from
that. Table 2.1 shows how the numbers are bit-reversed.
This process also allows for in-place computation, which means the results of
the calculations at any stage can be stored in the same memory locations as those
of the input to that stage. This idea is illustrated in Figure 2.5. The calculations of
X(0) and X(4) require the same two inputs. Once this calculation is complete the
two inputs are no longer needed, so the calculated butterfly values of X(0) and X(4)
can be stored in the memory locations of X(0) and X(4). Because of this, only 2N
storage locations are needed.
Figure 2.2: Decimation-in-time of a length N DFT into two length N/2 DFTs
followed by a recombining stage. [18]
Figure 2.3: Decimation-in-time of a length N DFT into four length N/4 DFTs
followed by a recombining stage. [18]
Figure 2.4: Flow graph for computation of a two-point DFT. [18]
Figure 2.5: Decimation-in-time of a length 8 DFT. [18]
Table 2.1: Bit-reversed order for N=8.
Decimal    Binary            Bit-Reversed      Decimal
Number     Representation    Representation    Equivalent
0          000               000               0
1          001               100               4
2          010               010               2
3          011               110               6
4          100               001               1
5          101               101               5
6          110               011               3
7          111               111               7
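The bit-reversal rule in Table 2.1 can be reproduced in a few lines. This Python sketch is for illustration only; the helper name bit_reverse is hypothetical and does not come from the thesis code.

```python
def bit_reverse(index, num_bits):
    """Reverse the lowest num_bits bits of index."""
    reversed_index = 0
    for _ in range(num_bits):
        # Shift the result left and append the current least significant bit.
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index


# N = 8 uses 3-bit indices, matching Table 2.1.
print([bit_reverse(i, 3) for i in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```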
2.3.2 Decimation-In-Frequency FFT. Sande and Tukey developed the
decimation-in-frequency algorithm [19]. The DIF algorithm
works backward from the DIT algorithm. Instead of dividing the input sequence x(n)
into smaller subsequences, the output sequence X(k) is subdivided. The algorithm
consists of arranging the DFT into two parts: calculation of the even-numbered
frequency indices X(k) for k = 0, 2, 4, ..., N − 2 and calculation of the odd-numbered
frequency indices k = 1, 3, 5, ..., N − 1, or X(2r) and X(2r + 1), respectively. We have
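\begin{align*}
X(2r) &= \sum_{n=0}^{N-1} x(n)\, W_N^{2rn} \\
      &= \sum_{n=0}^{\frac{N}{2}-1} x(n)\, W_N^{2rn} + \sum_{n=0}^{\frac{N}{2}-1} x\!\left(n + \tfrac{N}{2}\right) W_N^{2r\left(n + \frac{N}{2}\right)} \\
      &= \sum_{n=0}^{\frac{N}{2}-1} x(n)\, W_N^{2rn} + \sum_{n=0}^{\frac{N}{2}-1} x\!\left(n + \tfrac{N}{2}\right) W_N^{2rn} \cdot 1 \tag{2.6} \\
      &= \sum_{n=0}^{\frac{N}{2}-1} \left( x(n) + x\!\left(n + \tfrac{N}{2}\right) \right) W_{\frac{N}{2}}^{rn} \\
      &= \mathrm{DFT}_{\frac{N}{2}}\!\left( x(n) + x\!\left(n + \tfrac{N}{2}\right) \right)
\end{align*}
and
\begin{align*}
X(2r+1) &= \sum_{n=0}^{N-1} x(n)\, W_N^{(2r+1)n} \\
        &= \sum_{n=0}^{\frac{N}{2}-1} \left( x(n) + W_N^{\frac{N}{2}}\, x\!\left(n + \tfrac{N}{2}\right) \right) W_N^{(2r+1)n} \\
        &= \sum_{n=0}^{\frac{N}{2}-1} \left( \left( x(n) - x\!\left(n + \tfrac{N}{2}\right) \right) W_N^{n} \right) W_{\frac{N}{2}}^{rn} \tag{2.7} \\
        &= \mathrm{DFT}_{\frac{N}{2}}\!\left( \left( x(n) - x\!\left(n + \tfrac{N}{2}\right) \right) W_N^{n} \right)
\end{align*}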
Notice only the odd-indexed frequencies are multiplied by the twiddle factors. Also
the frequency samples are computed separately in alternating groups, hence the
decimation in frequency designation. The inputs of the DIF FFT are in order and
the outputs are now in bit-reversed order, opposite of the DIT algorithm. It is for
this reason the DIF algorithm is chosen for the VHDL implementation. Either way,
re-ordering hardware is necessary to arrange the data before or after the FFT
calculation. Figure 2.6 shows the first stage with the FFT being split into two N/2 DFTs.
These N/2 DFTs are broken down until a length-two DFT is found. This is shown
in Figure 2.7.
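Equations (2.6) and (2.7) can be applied stage by stage, overwriting the data in place. The Python sketch below is a software illustration under the assumption of a power-of-2 length; it is not the VHDL design, and the name fft_dif_in_place is hypothetical. Inputs are taken in natural order and the results are left in bit-reversed order, as described above.

```python
import cmath


def fft_dif_in_place(x):
    """Radix-2 DIF FFT that overwrites the list x (length a power of 2).

    Input is in natural order; output is left in bit-reversed order.
    """
    n_points = len(x)
    span = n_points // 2
    while span >= 1:
        block = 2 * span  # length of each sub-DFT handled at this stage
        for start in range(0, n_points, block):
            for n in range(span):
                a = x[start + n]
                b = x[start + n + span]
                twiddle = cmath.exp(-2j * cmath.pi * n / block)  # W_block^n
                x[start + n] = a + b                      # feeds even outputs, eq. (2.6)
                x[start + n + span] = (a - b) * twiddle   # feeds odd outputs, eq. (2.7)
        span //= 2
    return x


data = [complex(v) for v in (1, 2, 3, 4)]
print(fft_dif_in_place(data))  # holds X(0), X(2), X(1), X(3), in that order
```

Because each butterfly writes its two results back to the locations it read from, no second buffer is needed; only the final ordering differs from the natural one.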
Figure 2.6: DIF of a length N DFT into two length N/2 DFTs. [18]
Figure 2.7: DIF of a length 8 DFT. [18]
2.4 VHDL
With the background and mathematics for an FFT in place, a vehicle to create
the FFT circuit will now be discussed. There are several high-level languages which
can be used to describe a digital circuit. VHDL is a popular design entry language for
FPGAs and ASICs. Another popular language is Verilog. One advantage VHDL has
over Verilog is the ability to use generate statements. Generate statements are used to
include many concurrent VHDL statements. In a modular design, generate statements
will be used heavily to create the module using the fewest transistors, thus
reducing power, timing, and cell area. Once the VHDL code has been tested for
errors and the simulations are correct, the next step is synthesis. Synthesis takes
the high-level description and produces a gate netlist. The gate netlist is generated
by the Cadence Encounter RTL Compiler software and uses cells from the TSMC 90
nm library.
2.5 Other FFT Implementations
By looking at other FFT implementations one can get an understanding of
what technologies were used, what the targeted results are, and any other novel
ideas in an FFT design. Performing a search in the IEEE Xplore online database,
the Opencores website (www.opencores.org), or Google on FFTs will show many
different implementations of FFTs in VHDL and/or Verilog. In [23] a reconfigurable
FFT which can compute lengths from 4 to 1024 is discussed. The author uses a
radix-2 FFT algorithm. An overall view of the architecture is shown in Figure 2.8.
Figure 2.8: Overall architecture of reconfigurable FFT processor [23]
The butterfly block (BB) carries out the butterfly calculations. Twiddle factors
are stored in memory, called the coefficient memory cluster (CMC). The module stores
512 coefficients, enough to satisfy the requirements of a length 1024 FFT. The 512
twiddle factors are divided into 64 smaller modules called coefficient memory modules
(CMM), with each module storing 8 values. Different coefficient sets are obtained by
combining various CMMs. One set of CMMs will provide twiddle factors for a length
16 FFT. For larger lengths, CMMs are combined together to form a larger memory.
The DMC, or data memory cluster, is composed of two 512x32-bit memories, giving
a total of 1024 memory locations. The address generation block (AGB) generates
addresses for both the DMC and CMC. These reconfigurable modules can compute
addresses for different FFT lengths. The data switch (DS) routes the butterfly
calculations to the correct DMC modules, using the addresses which are determined by
the address switch (AS). The CB, or control block, contains counters which generate
addresses and timing for the entire design. The author used the Verilog language
to design the processor, and the design was synthesized to the UMC 0.18 µm CMOS
standard cell library with the Synopsys Design Compiler [23]. Table 2.2 shows the
power and area results after synthesis. The area is constant because the same processor
computes FFTs of length 16 through 1024. With each increase in FFT
length, the consumed power increases. This is expected because more calculations
are performed with larger length FFTs.
Table 2.2: Power and Area Results [23]
FFT Size                  16     32     64     128    256    512    1024
Power Consumption (mW)    4.7    7.9    8.3    13.0   26.1   49.7   81.6
Area (mm^2)               2.9 (all sizes)
In articles [10], [11], and [20] a pipelinable FFT architecture is presented. This
type of architecture will ultimately be used in the design of the configurable/dynamic
FFT proposed in this research. As such, the overall design will not be discussed until
a later chapter. Pipelining the FFT processor allows for faster speeds to be achieved.
In [11] the author designs a length 1024 FFT in VHDL, and synthesizes with a 0.5 µm
Complementary Metal-Oxide Semiconductor (CMOS) technology. Speeds of about
20 MHz were achieved. This design is not configurable and can only calculate
a length 1024 FFT. Another design following the pipeline architecture is mentioned
in [20]. Again, this implementation is limited to a length of 1024. This author
uses Handel-C to implement the FFT processor. Handel-C is a direct C-to-hardware
language based on the C programming language, and can be synthesized directly to
high density FPGA devices from Altera or Xilinx [1]. The author reports a speed
of 82 MHz for a 1024-point FFT.
2.6 Chapter Summary
The background of the FFT was discussed, along with several algorithms used
to compute FFTs. Several current FFT implementations and their results were also
briefly examined. These results will be compared to the results of this research.
III. Methodology
The methodology for the dynamic configurable FFT will now be discussed. The
goal of this research is to show the feasibility of creating a library with many
configurable DSP modules. The design to be implemented and demonstrated is a
configurable dynamic FFT. The design for the FFT architecture is based on [10], [11],
and [20]. Information on twiddle factors is found in [3]. The overall design is discussed
first, followed by a detailed analysis of the major components found in the architecture.
Minor components such as muxes and shift registers are assumed to be known, so their
design will be omitted. One of the main reasons for creating configurable components
is to be able to take a generic component and conform it to a specific application.
These configurable parameters are processed before synthesis. The configurability
of the FFT is shown in Table 3.1.
Table 3.1: Configurable parameters.
Parameter      Description
input width    bit width of real and imaginary parts of input data
output width   bit width of real and imaginary parts of output data
tf width       bit width of real and imaginary parts of twiddle factors
log2N          maximum length of FFT (log2 of length); integer between 2 and 10
r1-r10         radix position of fixed rounders
c1-c10         clipping required for each fixed rounder
ri             input register; none or plr (pipeline register)
ro             output register; none or plr (pipeline register)
pl             pipeline FFT; yes or no
rt             register reset type; none, synch, or asynch
nc             number of different lengths; integer between 1 and 8
3.1 Overall Design
The inputs and outputs for the FFT are shown in Table 3.2. The FFT implementation
is based on the Radix-2² Single-path Delay Feedback (R2²SDF) architecture,
and uses the DIF algorithm. Input data is processed in-order and the output
is produced in bit-reversed order. The FFT receives $N/2^{size}$ complex inputs
sequentially and the first output sample appears after $N/2^{size} - 1$ samples. The size signal
controls the dynamic property of the FFT. Changing this input changes the current
size of the FFT. An overview of the architecture is shown in Figure 3.1. The BF2I
and BF2II are the butterfly modules. The BF2I is the typical module which was
described earlier, and the BF2II is essentially the same except it takes into account the
−j twiddle factors and computes them automatically. The boxes above each butterfly
module are shift registers, with the number inside the box describing how many shifts
it performs. The W1(n) through W4(n) variables are the twiddle factors. These are
multiplied with the output of the BF2II module and passed to the next BF2I module.
The twiddle factors are approximated and stored in a ROM. Each butterfly module
has control signals which determine the calculation the module performs. These
signals are generated from a timing controller. In addition to the control signals, the
timing controller generates the addresses for the twiddle factor ROMs. The length
of the FFT in Figure 3.1 is 256. If one wanted to compute a 128 length FFT, the
same architecture would be used. The differences would be that the shift registers
are halved (i.e. a 128 stage shift register becomes a 64 stage shift register) and the final
BF2II module is bypassed.
Table 3.2: Inputs and Outputs.
Parameter    Description
d in r       real part of complex data input
d in i       imaginary part of complex data input
q out r      real part of complex data output
q out i      imaginary part of complex data output
frame in     framing control signal; forces FFT calculation to begin on next input sample
frame out    framing control signal; next output is result of new FFT calculation
clock        rising edge sensitive clock control signal
reset n      active low control signal for reset
size         dynamic control signal; selects length of current FFT; length = $N/2^{size}$
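The relationship between the size signal, the current FFT length, and the feedback shift register depths can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the thesis VHDL: the function names are hypothetical, and the depths follow the usual single-path delay feedback pattern of length/2, length/4, ..., 1 (one register per butterfly module).

```python
def current_length(log2n, size):
    """Length selected by the size signal: N / 2**size, with N = 2**log2n."""
    length = (1 << log2n) >> size
    if length < 4:
        raise ValueError("size selects a length below 4")
    return length


def feedback_depths(length):
    """Assumed shift register depth in front of each butterfly module."""
    depths = []
    depth = length // 2
    while depth >= 1:
        depths.append(depth)
        depth //= 2
    return depths


# A length-256 FFT uses eight butterflies; a length-128 FFT uses the same
# registers at half depth and drops (bypasses) the final butterfly.
depths_256 = feedback_depths(256)
depths_128 = feedback_depths(128)
```

Under these assumptions, `depths_128` equals the length-256 depths halved with the last entry removed, which matches the description of halving the shift registers and bypassing the final BF2II module.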
Figure 3.1: R2²SDF FFT Architecture for N=1024 [20]
3.2 Butterfly Modules
Two butterfly modules are used, a BF2I and BF2II. On the first N/2 cycles,
the 2-to-1 multiplexors in the first butterfly module (BF2I ) are set to ’0’ and the
module is idle. The input data is shifted into the shift registers until they are filled.
On the next N/2 cycles, the butterfly module computes an N/2-point DFT with the
input data and the data stored in the shift registers. The following equations describe
the operations:
Z(n) = x(n) 0 ≤ n < N/2
Z(n + N/2) = x(n + N/2) 0 ≤ n < N/2
Z(n) = x(n) + x(n + N/2) N/2 ≤ n < N
Z(n + N/2) = x(n)− x(n + N/2) N/2 ≤ n < N (3.1)
A physical implementation of the BF2I module is shown in Figure 3.2.
Figure 3.2: BF2I Module [3]
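As a behavioral illustration of Equation 3.1, the following Python sketch models the two operating modes of the BF2I module. It is a software model only, not the VHDL implementation; the function names and the `butterfly` flag (standing in for the multiplexor select) are assumptions made for this example.

```python
def bf2i(x_n, x_n_half, butterfly):
    """One BF2I step per Equation 3.1.

    x_n       : x(n), the sample held in the feedback shift register
    x_n_half  : x(n + N/2), the sample arriving at the input
    butterfly : False during the first N/2 cycles (pass-through),
                True during the next N/2 cycles (2-point DFT)
    Returns (Z(n), Z(n + N/2)).
    """
    if not butterfly:
        return x_n, x_n_half
    return x_n + x_n_half, x_n - x_n_half


def bf2i_block(x):
    """Block-level view: all sums followed by all differences."""
    half = len(x) // 2
    sums, diffs = [], []
    for n in range(half):
        z_sum, z_diff = bf2i(x[n], x[n + half], butterfly=True)
        sums.append(z_sum)
        diffs.append(z_diff)
    return sums + diffs
```

For the block `[1, 2, 3, 4]`, the sketch pairs sample 1 with 3 and sample 2 with 4, giving the sums 4 and 6 followed by the differences -2 and -2.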
The BF2II module is similar in operation, except it takes into account the trivial
twiddle factor multiplication of −j. The following equations describe the operations:
Z(n) = x(n) 0 ≤ n < N/4
Z(n + N/2) = x(n + N/2)
Zreal(n) = xreal(n) + ximag(n + N/2) N/4 ≤ n < N/2
Zimag(n) = ximag(n) − xreal(n + N/2)
Zreal(n + N/2) = xreal(n) − ximag(n + N/2)
Zimag(n + N/2) = ximag(n) + xreal(n + N/2)
Z(n) = x(n) N/2 ≤ n < 3N/4
Z(n + N/2) = x(n + N/2)
Z(n) = x(n) + x(n + N/2) 3N/4 ≤ n < N
Z(n + N/2) = x(n) − x(n + N/2) (3.2)
A physical implementation of the BF2II module is shown in Figure 3.3.
Figure 3.3: BF2II Module [3]
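The −j case of Equation 3.2 amounts to swapping the real and imaginary parts of x(n + N/2) and negating the new imaginary part, so no multiplier is needed. The Python sketch below illustrates this; it is a behavioral model with hypothetical names (`bf2ii`, `butterfly`, `rotate`), not the thesis VHDL.

```python
def bf2ii(x_n, x_n_half, butterfly, rotate):
    """One BF2II step per Equation 3.2.

    butterfly : False -> pass both samples through unchanged
    rotate    : True  -> apply the trivial -j twiddle to x(n + N/2)
                         before the add/subtract
    Returns (Z(n), Z(n + N/2)).
    """
    if not butterfly:
        return x_n, x_n_half
    if rotate:
        # -j * (a + jb) = b - ja: swap the parts, negate the new imaginary part
        x_n_half = complex(x_n_half.imag, -x_n_half.real)
    return x_n + x_n_half, x_n - x_n_half


x, y = 1 + 2j, 3 + 4j
z_top, z_bottom = bf2ii(x, y, butterfly=True, rotate=True)
```

With these inputs the rotated sample is 4 − 3j, so `z_top` is 5 − 1j and `z_bottom` is −3 + 5j, which agrees with computing x ± (−j)y directly.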
3.3 Twiddle Factors
The generation of the twiddle factors, or complex roots of unity, for the FFT
will determine the amount of error between the VHDL implementation and real FFT
calculation in MATLAB. These values are precomputed and their binary representa-
tion is stored in a ROM. The bit width selected for the twiddle factors will control
the amount of error. Choosing a large bit width guarantees greater accuracy at a
cost of larger die area and higher power consumption. On the other hand, a small
bit width generates a smaller area and power consumption but larger error. In the
VHDL implementation, they are based on the module length M and the FFT length
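N. The equation to calculate these values is
$$W_p(n) = e^{-j 2\pi q(n)/N}, \quad 0 \le p \le \log_4(N) - 2 \qquad (3.3)$$
Expanding this into a trigonometric expression yields
$$W_p(n) = \cos\left(\frac{2\pi q(n)}{N}\right) - j\sin\left(\frac{2\pi q(n)}{N}\right), \quad 0 \le p \le \log_4(N) - 2 \qquad (3.4)$$
where
$$\begin{aligned}
q(n) &= 0 \cdot 4^{p} \cdot n \\
     &= 2 \cdot 4^{p} \cdot \left(n - \frac{M_p}{4}\right) \\
     &= 1 \cdot 4^{p} \cdot \left(n - \frac{M_p}{2}\right) \\
     &= 3 \cdot 4^{p} \cdot \left(n - \frac{3M_p}{4}\right)
\end{aligned} \qquad (3.5)$$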
The calculated twiddle factors range between 1 and −1. They are now scaled between
the max and min values based on the twiddle factor bit width. For example, if the
bit width is 10, the twiddle factor would be scaled to a value between 511 (2⁹ − 1) and
−512 (−2⁹). The real and imaginary calculations are summarized as:
$$\text{Real} = \text{round}\left(\cos(\theta) \cdot (2^{\text{TwiddleFactorBitWidth}-1} - 1)\right) \qquad (3.6)$$
$$\text{Imag} = \text{round}\left(\sin(\theta) \cdot (2^{\text{TwiddleFactorBitWidth}-1} - 1)\right) \qquad (3.7)$$
Table 3.3: Wx Modules.
Wx    Contains twiddle factors for lengths...
0     1024, 512
1     1024, 512, 256, 128
2     1024, 512, 256, 128, 64, 32
3     1024, 512, 256, 128, 64, 32, 16, 8
Upon storing the twiddle factors in the ROM, they must be offset by 3M/4 samples
to ensure they are aligned with the first sample in the block. If the FFT is configured
for dynamic sizes, all possible twiddle factor ROMs must be made available. For
example, if N=16 and nc=2, FFTs of length 16 and 8 can be calculated. The twiddle
factors for N=16 are different than those for N=8. In this case, the module W3
contains both ROMs and a multiplexor is used to select the correct twiddle factors
based on the currently selected size. This is the case for all dynamic lengths between
1024 and 8. Note, length-4 FFTs do not incorporate twiddle factors. The W3 module
is configurable based on N and nc, so the minimal logic is created. Table 3.3 shows
the Wx modules and the ROMs they may contain. Figure 3.4 shows the schematic
for a W3 module for a 1024 length FFT with the number of dynamic choices equal
to 8. The output of each ROM is passed to the output using the 8-to-1 mux with the
size signal performing the selection. Smaller dynamic choices use smaller muxes to
conserve chip space.
Figure 3.4: W3 Module for a 1024 point dynamic FFT
The twiddle factors are generated by running a simulation on the ROMGenerate.vhd
file. The configurable parameters are specified in the generic listing. This
code will generate the twiddle factors automatically and store them in a file called
TwiddleFactors.vhd. This file will need to be included in order for the entire design
to be elaborated and synthesized.
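The scaling in Equations 3.6 and 3.7 can be illustrated with a few lines of Python. This is a sketch only and does not reproduce ROMGenerate.vhd: the function names are hypothetical, rounding is assumed to be half away from zero, and the table is generated in natural angle order without the q(n) sequencing or the 3M/4 offset described in this section.

```python
import math


def round_half_away(value):
    """Round to the nearest integer, ties away from zero (assumed rounding mode)."""
    return int(math.copysign(math.floor(abs(value) + 0.5), value))


def quantize_twiddle(theta, bit_width):
    """Scale cos(theta) and sin(theta) per Equations 3.6 and 3.7."""
    scale = 2 ** (bit_width - 1) - 1
    real = round_half_away(math.cos(theta) * scale)
    imag = round_half_away(math.sin(theta) * scale)
    return real, imag


def twiddle_table(points, bit_width):
    """Quantized values for theta = 2*pi*k/points, k = 0 .. points-1."""
    return [quantize_twiddle(2 * math.pi * k / points, bit_width)
            for k in range(points)]


example = quantize_twiddle(math.pi / 4, 10)
```

For a bit width of 10 the scale is 511, so an angle of π/4 gives 0.70710678 × 511 ≈ 361.33, which is stored as 361 for both parts.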
3.4 Timing
Timing is essential for all the components to work correctly. The timing
controller is simply a log2N-bit up counter. The timing signals are passed on to the
butterfly modules and the twiddle factor ROMs. This component also generates the
frame out signal, which designates the completion of the FFT calculation and indicates
that the first result will appear on the q out line. The frame out signal is generated when
the counter “rolls over” to ’0’. Due to the dynamic portion of the hardware, the value at which
the counter rolls over depends on the length of the FFT and the value of the size
signal. If a 1024 point FFT is calculated, a 10-bit counter is used. The rollover value
would then be 1111111111. If the FFT is configured to be dynamic, the rollover value
is shifted to the right by the integer value of the size signal. For example, for a 1024
point FFT, a size signal of 001 would shift the rollover values to the right by one
producing a value of 0111111111 which corresponds to a 512 length FFT. Each stage
can handle two different FFT lengths, but because the control signals to each stage
are static, the timing controller will shift the count values left one bit based on the size
signal. If pipelining is enabled, the control signals and twiddle factor addresses will
also have to be pipelined. This is handled in a module called TimingPLR. Excluding
the first stage, pipeline registers are placed around the multipliers in each stage if
enabled. This placement of pipeline registers was chosen because the critical path,
or longest delay in a circuit, always passes through the multipliers. Placing pipeline
registers around the multipliers shortens the critical path, thus increasing the fre-
quency at which the FFT can operate. The timing signals for each subsequent stage
must then be delayed by two clock cycles to make sure they meet up with the correct
values in the computation, hence the two pipeline stages in the TimingPLR compo-
nent. Because the complex multipliers are sandwiched between pipeline registers, the
twiddle factor address must be delayed by one cycle initially, then by two cycles for
each subsequent stage. The TimingPLR module takes as an input the twiddle factor
address signal, but has two outputs for the address. The first output is delayed by
one cycle to be used in the current stage and the second is delayed by two cycles to
be used in the next stages. Figure 3.5 shows the timing controller and first set of
pipeline stages for a length-1024 FFT.
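The rollover behavior described above can be sketched in Python as follows. This is a behavioral illustration only: the function names are hypothetical, and the exact cycle on which frame out is asserted in the VHDL is not modeled, only the wrap of the counter.

```python
def rollover_value(log2n, size):
    """All-ones count for the maximum length, shifted right by the size signal."""
    if not 0 <= size < log2n:
        raise ValueError("size out of range for this counter width")
    return ((1 << log2n) - 1) >> size


def wrap_cycles(log2n, size, cycles):
    """Simulate the up counter and list the cycles on which it wraps to 0."""
    limit = rollover_value(log2n, size)
    count = 0
    wraps = []
    for cycle in range(cycles):
        if count == limit:
            count = 0
            wraps.append(cycle)
        else:
            count += 1
    return wraps
```

For a 10-bit counter, `rollover_value(10, 0)` is 1111111111 in binary and `rollover_value(10, 1)` is 0111111111 (511), so the counter wraps every 512 cycles when a length-512 FFT is selected.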
Figure 3.5: Timing Controller and pipeline for a 1024 point dynamic FFT
3.5 Stage 1
Figure 3.6(a) shows a diagram of the stage 1 component, and Figure 3.6(b)
shows the corresponding flow graph. The dashed lines correspond to complex data.
For this example an N/4 length such as 4, 16, 64, etc. is assumed. Data enters the
first butterfly module at the x(n + M/2) input. The first two values are passed into
the 2-stage shift register. The third and fourth data point are “butterflied” and the
output is passed onto the fixed rounder. These represent the top two lines in the
second stage shown in Figure 3.6(b). The outputs to go into the bottom two lines are
put back into the shift register and held for two cycles until they can be placed into
the second butterfly module. The −j multiplication is built into the BF2II module.
This module will compute the last two 2-point DFTs and pass the output to the final
fixed rounder and q out in bit-reversed order. If the length were an N/2 length such
as 8, 32, 128, etc., then only BF2I would be used to compute a 2-point DFT and the
BF2II would be bypassed. The sign extender extends the bit width by one because
the BF2I and BF2II modules automatically increase the bit width by one during
operation.
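The data flow of stage 1 for a length-4 block can be checked with a small Python model. The sketch below is an assumption-laden illustration rather than the VHDL: it ignores the shift registers, fixed rounders, and bit widths, and simply applies a BF2I step followed by a BF2II step (with the built-in −j) to four samples, then compares the result with a direct DFT.

```python
import cmath


def stage1_dft4(x):
    """4-point DFT as BF2I followed by BF2II; output in bit-reversed order."""
    x0, x1, x2, x3 = x
    # BF2I: butterflies across a distance of two samples
    a0, a1 = x0 + x2, x1 + x3
    b0, b1 = x0 - x2, x1 - x3
    # BF2II: the lower branch is multiplied by -j via a real/imaginary swap
    b1 = complex(b1.imag, -b1.real)
    return [a0 + a1, a0 - a1, b0 + b1, b0 - b1]  # X(0), X(2), X(1), X(3)


def direct_dft(x):
    """Reference DFT computed straight from the definition."""
    n_points = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
                for n in range(n_points))
            for k in range(n_points)]


samples = [1 + 1j, 2 - 1j, -3 + 0.5j, 0.25j]
result = stage1_dft4(samples)
reference = direct_dft(samples)
bit_reversed_reference = [reference[0], reference[2], reference[1], reference[3]]
```

The values in `result` agree with `bit_reversed_reference` to within floating point error, which illustrates why the output of the final stage appears in bit-reversed order.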
Figure 3.6: Stage 1 of FFT and Flow Graph Comparison
3.6 Stage X
Figure 3.7 shows a diagram of the rest of the stages. These generic stages are
essentially the same except for the configurable parameters of the fixed rounder, the
shift registers, and Wx(n) modules. Operation is the same as in the first stage, but
this time the twiddle factors and a complex multiplier are included. If the FFT
is configured for pipelining, then pipeline registers are placed before and after the
complex multiplier.
Figure 3.7: Stage X of FFT
3.7 Completed Design
Connecting the timing controller and the different stages together is not as easy
as it sounds. Due to the configurable and dynamic nature of the FFT, all possible
scenarios must be considered while keeping the hardware usage at a minimum. As an
example, a dynamic length-1024 FFT (nc = 8) is shown in Figure 3.8.
Figure 3.8: Layout of a 1024 length dynamic FFT
This configuration has data select logic to determine the stage at which the input data
should enter. For example, for a 1024 or 512 length the input data should enter stage
5. For a length 256 or 128, the data should bypass stage 5 and begin in stage 4.
In addition, the data select logic selects between using the d in data or the results
from the previous stage. This logic is essentially a mux (2 or 4 input mux with
the unused inputs grounded). The select bit(s) logic for the mux is generated using
Karnaugh maps and the size signal. If a stage is not to be used, then the inputs are
grounded to prevent usage of the elements within the stage, reducing power usage
and heat generation. The sign extender between each stage extends the bit width of
the input data, which is due to the bit growth of the butterfly modules and twiddle
factor multiplications. With a static 1024 length FFT, there would be no data select
logic blocks; instead the data would pass right through. Generate statements are
used heavily for this portion of FFT code. The structural definition of the code
begins by instantiating a timing controller. After that, each case is broken down by
the configured length. In each of these cases, all possible lengths (based on the nc
parameter) are broken down. Again, to prevent unnecessary chip space usage, generate
statements are used. If pipelining is configured, then pipeline registers are placed
to control the arrival time of the butterfly module control signals, twiddle factor
addresses, and the frame out signal. The logic for the pipeline controls whether the
pipeline is turned on or not. In the 1024/512 case, all pipeline registers should be
functioning. For the 256/128 case, the first set of pipeline registers should be turned
off so the signals arrive correctly. The logic for controlling this functionality is again
determined by Karnaugh maps.
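The stage selection described above can be summarized in a short Python sketch. It assumes, based on the examples in the text, that stages are numbered so that stage 5 is entered first for lengths 1024 and 512 and stage 1 is the final stage, and that each stage serves two adjacent lengths. The function names are hypothetical, and the actual select logic in the design is derived from Karnaugh maps rather than computed this way.

```python
def entry_stage(length):
    """Stage at which input data enters for a given power-of-two length."""
    log2_len = length.bit_length() - 1
    if length < 4 or (1 << log2_len) != length:
        raise ValueError("length must be a power of two and at least 4")
    return (log2_len + 1) // 2


def stage_usage(log2n, nc):
    """Map every selectable length (size = 0 .. nc-1) to its entry stage."""
    usage = {}
    for size in range(nc):
        length = 1 << (log2n - size)
        usage[length] = entry_stage(length)
    return usage


usage_1024 = stage_usage(10, 8)
```

For the 1024 length dynamic FFT with nc = 8, `usage_1024` maps 1024 and 512 to stage 5, 256 and 128 to stage 4, 64 and 32 to stage 3, and 16 and 8 to stage 2.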
3.8 Testing Procedure
There are two major areas on which testing will be performed. First, the Ca-
dence Encounter RTL compiler will be used to analyze the timing, power, and chip
area used by all the configurations of the FFT. For the timing analysis, all the pos-
sible lengths (4 to 1024) and all possible dynamic sizes (1 to 8) will be synthesized
for both the pipelined and non-pipelined versions. Power and cell usage analysis will
be performed with the same parameters, except only the pipelined version will be
examined. This will give an overall view of the configurable properties and the
effects they have. A PERL script, located in Appendix A, was written to perform these
tests. The script first will modify the configurable parameters in the FFT.vhd file.
Next, the Cadence RTL compiler is invoked via the command line with a synthesizing
script passed as an argument to the compiler. This script sets up the 90 nm library,
loads the VHDL files, synthesizes the design, and generates reports for the timing,
cell usage, and power consumption. This script is also located in Appendix A.
The next major area is error analysis. This will be performed using MATLAB
and Modelsim for simulations. The results from the MATLAB FFT function will be
compared to those of the VHDL FFT implementation. Even though there is negligible
error in the MATLAB results due to the IEEE 754 floating point format, the MATLAB
results will be the baseline for these tests. The twiddle factor bit width and its effect
on data will be analyzed. By varying the twiddle factor bit width, one can change the
amount of error in the VHDL FFT compared to that of the MATLAB FFT function.
The testing procedure for this is as follows: A cosine function is generated and the
data points are sampled and placed into a text file. The FFT VHDL testbench opens
this file and reads the data as inputs. The output is generated and placed into a
different text file. MATLAB then resumes and reads in the VHDL FFT output data.
A comparison of each output point is made between the MATLAB FFT function and
the VHDL FFT function. A stem plot is generated showing the % error between
both functions. In addition, statistics are generated for this set of data. This test is
performed on all FFT lengths with varying twiddle factor bit widths. The MATLAB
m-files which perform this testing are found in Appendix B. Additionally, a frequency
sweep test will be examined. A frequency ranging from 0 Hz to 2.5 GHz in steps
of 10 MHz will be applied to the FFT. For each frequency step, a length-256 FFT will
be calculated both in MATLAB and the VHDL FFT. The average and max percent
error for a range of twiddle factor bit-widths and input bit-widths will be discussed.
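The first step of the error test, generating sampled cosine data for the testbench, might look like the following Python sketch. The thesis performs this step in MATLAB (Appendix B); the function names, the choice of amplitude, and the one-pair-per-line text format here are all assumptions made purely for illustration.

```python
import math


def cosine_samples(length, cycles, input_width):
    """Sample a cosine and scale it to signed integers of the given bit width."""
    peak = 2 ** (input_width - 1) - 1
    return [int(round(peak * math.cos(2 * math.pi * cycles * k / length)))
            for k in range(length)]


def to_text_lines(samples):
    """Hypothetical file format: 'real imag' per line, imaginary part zero."""
    return ["%d 0" % sample for sample in samples]


stimulus = cosine_samples(8, 1, 10)
lines = to_text_lines(stimulus)
```

A testbench could then read one line per clock cycle, with the output file compared point by point against the reference FFT of the same samples.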
3.9 Chapter Summary
The design of a configurable/dynamic FFT processor was discussed in this chap-
ter. The overall design was introduced and then broken down into many smaller
components. The design analysis was performed on each of these components, along
with the different configurations of each. Each subcomponent was thoroughly tested
in order to reduce the possible errors when assembling the final design. Additionally,
testing procedures were developed to test both the error in the FFT calculations and
the physical attributes such as power and timing.
IV. Analysis and Results
This chapter analyzes the results from the testing procedures discussed in the
previous chapter. The data results will be examined first in the error analysis
section. Next, the area and timing results from synthesis will be evaluated.
4.1 Error Analysis
4.1.1 Simple Cosine Curve. Due to the digital nature of the FFT algorithm
and the use of approximate values for the twiddle factors, there will be error in the
VHDL results as compared to those of the MATLAB FFT function. A comparison
of twiddle factors with bit-widths of 6, 8, 10, and 12 is made. A summary is
shown in Figure 4.1, and Figures 4.2 to 4.9 show detailed plots of the error between
the VHDL and MATLAB FFT functions for each point of the N-length FFT. The
error is calculated using the equation
$$\text{error} = \frac{\text{VHDL} - \text{MATLAB}}{\text{MATLAB}} \times 100\%$$
The analysis is performed on absolute values of the real and imaginary data outputs.
The input bitwidth used is 10 bits. We will begin the analysis with the length-8 FFT as this is
the first length to use twiddle factors in the calculations. An FFT of length 4 does not
use twiddle factors, therefore there is no error between the VHDL and MATLAB data.
The stem plot in Figure 4.2 shows the percent error between the MATLAB
and VHDL values for each value of n. The results show relatively small error except
for n values of 6 and 7. This is due to the twiddle factor approximation. For the n=6
case, the exact twiddle factor would be $(-\cos(\pi/4), -j\sin(\pi/4))$. Expanding this out
yields a result of (−0.70710678, −0.70710678j). Because the twiddle factors in the
VHDL implementation are scaled integer values, the result is scaled from a range of
(−1, 1) up to a range of $(-(2^{tfbw-1}-1),\ 2^{tfbw-1}-1)$, where tfbw is the twiddle factor bit
width. For a twiddle factor bit width of 10, this range becomes (−511, 511). Scaling
the n=6 twiddle factors results in values of (−361.331565, −361.331565j). These
values are ultimately rounded to (−361, −361j). A negligible 2% error is introduced
with this rounding. Increasing the twiddle factor bit width will reduce this error, but
with an increase in chip area and power consumption. Figure 4.3 shows the plot of
the FFT data and error analysis for a length-16 FFT. The results are similar to those
of a length-8 FFT. With length-32 and length-64 FFTs, two sets of twiddle factors
are now used. Because of this, the error from the first set of twiddle factors is now
compounded by the second set of factors. This is evident in Figure 4.4, as the errors
for each twiddle factor bit width are now larger than in the length-8 and length-16 cases.
This trend continues with Figures 4.5 - 4.9. The longer-length FFTs generate more
error in the data than the shorter ones. A smaller twiddle factor bit width
leads to a larger average error. Increasing the bit width from 6 to 8 shows a
large decrease in average error. Increasing the bit width further to 10 yields better
results, but beyond that the decrease in error is negligible.
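The percent error calculation used in this section can be sketched in Python as follows. This is an illustration and not the MATLAB code from Appendix B: the function names are hypothetical, and the handling of reference values equal to zero (skipped here) is an assumption, since the text does not state how such points are treated.

```python
def percent_error(vhdl_value, matlab_value):
    """(|VHDL| - |MATLAB|) / |MATLAB| * 100, or None if the reference is zero."""
    reference = abs(matlab_value)
    if reference == 0:
        return None
    return (abs(vhdl_value) - reference) / reference * 100.0


def error_stats(vhdl_values, matlab_values):
    """Average and maximum error magnitude over points with a nonzero reference."""
    errors = []
    for vhdl_value, matlab_value in zip(vhdl_values, matlab_values):
        err = percent_error(vhdl_value, matlab_value)
        if err is not None:
            errors.append(abs(err))
    if not errors:
        return 0.0, 0.0
    return sum(errors) / len(errors), max(errors)


average_error, maximum_error = error_stats([102, 95, 7], [100, 100, 0])
```

In this toy data set the third point has a zero reference and is skipped, so the remaining errors of 2% and 5% give an average of 3.5% and a maximum of 5%.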
Figure 4.1: Average % Error
Figure 4.2: Percent error between MATLAB and VHDL FFT functions for N=8 and various twiddle factor bit-widths
Figure 4.3: Percent error between MATLAB and VHDL FFT functions for N=16 and various twiddle factor bit-widths
Figure 4.4: Percent error between MATLAB and VHDL FFT functions for N=32 and various twiddle factor bit-widths
Figure 4.5: Percent error between MATLAB and VHDL FFT functions for N=64 and various twiddle factor bit-widths
Figure 4.6: Percent error between MATLAB and VHDL FFT functions for N=128 and various twiddle factor bit-widths
Figure 4.7: Percent error between MATLAB and VHDL FFT functions for N=256 and various twiddle factor bit-widths
Figure 4.8: Percent error between MATLAB and VHDL FFT functions for N=512 and various twiddle factor bit-widths
Figure 4.9: Percent error between MATLAB and VHDL FFT functions for N=1024 and various twiddle factor bit-widths
[Figures 4.2 to 4.9 each contain four panels, for twiddle factor bit widths of 6, 8, 10, and 12; x-axis: n; y-axis: % Error]
4.1.2 Frequency Sweep. For the frequency sweep, input bit-widths of 8, 10,
12, 14, and 16 along with twiddle factor bit-widths of the same values will be analyzed.
The resulting frequency sweep produces a plot as shown in Figure 4.10. The average
percent error and max percent error data is shown in Figures 4.11 and 4.12. The
results for the maximum percent error show for a twiddle factor bit-width of 8 the
error remains the same no matter the input bit-width. This changes with twiddle
factor bit-widths of 12 and higher. With an increase in input bit-width, maximum
percent error drops for twiddle factor bit-widths of 10 - 16. The average percent error
decreases with an increase in twiddle factor bit-width, input bit-width, or both.
Figure 4.10: FFT Plot of Frequency Sweep for 0-2.5GHz [x-axis: Frequency (Hz); y-axis: |X(k)|]
Figure 4.11: Average Error in Frequency Sweep
Figure 4.12: Maximum Error in Frequency Sweep
4.2 Timing Analysis
For this section and the following sections, an analysis will be performed on the
physical effects of an FFT processor which can calculate one length (fixed) versus one
which can calculate several lengths (dynamic). The critical path is studied first.
A comparison between the non-pipelined and pipelined versions is shown, using both
a 350 nm and a 90 nm technology. Using two different technology libraries shows
how the timing scales. Table 4.1 shows the parameters of the analysis.
Figures 4.13 and 4.14 show the results of the analysis. The y-axis shows the speed
in MHz, while the x-axis is divided up by the different lengths (8 - 1024). Each
of these is subdivided by the number of allowable dynamic sizes. A trend with the
non-pipelined version shows maximum frequencies are similar for lengths using the
same stages. For example, a length-128 and length-256 FFT use the same hardware,
therefore the speeds are similar. Another trend is that the larger the length, the lower
the maximum speed. This trend is due to large adders and multipliers which occur
because the bit width of the data passing through the FFT is not rounded or clipped.
Additionally, if a dynamic FFT is needed, any value of nc > 1 produces the same
results. The pipelined version shows a doubling in maximum speed. For a length-8
FFT, speeds of approximately 450 MHz are obtained. On the other end of the
spectrum, a length-1024 FFT can run between 200 - 250 MHz. The variation in these
values is due to the different critical paths in the processor. The 90 nm version is approximately
6 times faster than the 350 nm version.
Table 4.1: Parameters for Timing Analysis
Parameter                   Value
log2N                       3 - 10
nc                          1 - 8 (based on log2N)
Input Width                 10
Twiddle Factor Bit Width    10
Pipelining                  Off/On
Software                    Cadence Encounter RTL Compiler
Cell Libraries              TSMC High Performance General Purpose 90 nm; AMI 350 nm
Figure 4.13: Maximum frequency for pipelined and non-pipelined FFTs with input bitwidth=10 and twiddle factor
bitwidth=10 using the 350 nm technology.
Figure 4.14: Maximum frequency for pipelined and non-pipelined FFTs with input bitwidth=10 and twiddle factor
bitwidth=10 using the 90 nm technology.
4.3 Total Area Analysis
Table 4.2 shows the parameters for this testing. The total area needed for
each configuration is now discussed, using both the 350 nm and 90 nm libraries.
Figures 4.15 and 4.16 show the results of the analysis. As expected, with each increase in
length the total area increases. Total area for lengths using the same stages is
similar, as in the case of the 32/64 lengths. The 90 nm version is approximately 44
times smaller in area than the 350 nm version.
Table 4.2: Parameters for Total Area
Parameter                   Value
log2N                       2 - 10
nc                          1 - 8 (based on log2N)
Input Bit Width             10
Twiddle Factor Bit Width    10
Pipelining                  On
Software                    Cadence Encounter RTL Compiler
Cell Libraries              TSMC High Performance General Purpose 90 nm; AMI 350 nm
Figure 4.15: Total area using 350 nm technology for log2N from (a) 2 to 8 (b) 9 to 10
Figure 4.16: Total area using 90 nm technology for log2N from (a) 2 to 8 (b) 9 to 10
4.4 Chapter Summary
This chapter analyzed the results of the design. The error analysis shows the
error resulting from different twiddle factor bit-widths compared to that of the FFT
function found in MATLAB. Longer-length FFTs generally encounter more error than
shorter lengths do. The frequency sweep shows how the input and twiddle factor bit-
widths affect maximum and average percent error. Additionally, the timing and total
area were analyzed for all possible configurations of the FFT processor. The additional
hardware for longer-length FFTs leads to an increase in total die area. Only small
changes are noticed as the dynamic size parameter nc increases.
V. Conclusions
5.1 Explanation of the Problem
The problem to be solved was to accomplish the characterization and
implementation of an FFT using a fast and portable design strategy. By developing this
type of strategy, a designer can create digital components for specific functions in a
matter of hours or days, as opposed to the conventional design flow, which could take
weeks or months. Developing standard components which are both configurable and
dynamic and storing them in a library will greatly decrease the development time for
producing VLSI components for digital radar applications.
5.2 Summary of Background
A review of a variety of book, article, and internet sources was performed in
order to understand, investigate, and verify the methods and previous technologies
that support this research. A brief overview of the Fourier Transform, DFT, and FFT
was given. Additionally, two algorithms for computing the FFT were examined.
One of the algorithms, the DIF, was used in the implementation. Hardware description
languages, namely VHDL and Verilog, were discussed along with some pros/cons of
each. Lastly, two previous implementations were analyzed.
5.3 HDL Code Development: Significance, Limitations, and Further
Research
This research successfully demonstrated the use of a modular mixed signal VLSI
design approach. An example component, the FFT, was developed and demonstrated
for many types of configurations. The 90 nm technology library allowed the design
to be synthesized with smaller area, lower power consumption, and higher speed.
In addition, the design was synthesized in a 350 nm library to show the scaling
between the two technologies. The maximum speed at which this FFT processor can
run is also greatly improved. In [11] the authors demonstrate a speed of 20 MHz
with a length-1024 FFT. In this research, speeds of approximately 225 MHz have
been simulated, roughly 11 times the speed reported in [11]. Compared to the
Handel-C version in [20], which attains a speed of 82 MHz, this design runs at
about 2.74 times the clock frequency (about 174% faster). Figure 5.1 shows a
comparison between this implementation and two other implementations [14] [15]
on FPGAs. Although the maximum frequencies on similar hardware are within about
+/- 10% of one another, the strength of this implementation is that it is modular
and portable. The VHDL is portable and self-contained, as the twiddle factors are
generated from a source VHDL file. Additionally, because this function will be
placed into a library, it is customizable and dynamic. Before synthesizing, a
designer can modify the FFT for use in any type of project. Also, the dynamic
properties of this FFT allow it to calculate different length FFTs at run-time
by changing a single signal.
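The speed comparisons in this section reduce to ratios of clock frequencies. The short Python sketch below works them out from the three figures quoted in the text; the function and constant names are illustrative and do not come from the thesis code.

```python
def speed_ratio(new_mhz, old_mhz):
    """Return how many times the old clock rate the new clock rate is."""
    return new_mhz / old_mhz


THIS_DESIGN_MHZ = 225.0   # simulated in this research (90 nm library)
HE_TORKELSON_MHZ = 20.0   # length-1024 FFT reported in [11]
HANDEL_C_MHZ = 82.0       # Handel-C implementation reported in [20]

for label, reference in (("[11]", HE_TORKELSON_MHZ), ("[20]", HANDEL_C_MHZ)):
    ratio = speed_ratio(THIS_DESIGN_MHZ, reference)
    # "percent faster" is the increase over the reference, i.e. (ratio - 1) * 100
    print(f"vs {label}: {ratio:.2f}x the clock rate, "
          f"{100 * (ratio - 1):.0f}% faster")
```

Note that "N times as fast" and "N percent faster" differ by one whole multiple: a ratio of 2.74 corresponds to an increase of 174%.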
Initially, designing configurable and dynamic components is a lengthy process.
The VHDL code for the entire design totals well over 11,000 lines. All possible
scenarios must be accounted for and tested. For this implementation, there are 44
possible combinations of maximum length and dynamic length. Adding the pipelining
option doubles this number to 88, which leads to a lengthy testing process. Once
this is complete, though, the design can be tailored to almost any specific need.
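The figure of 44 configurations can be reproduced from the loop bounds used in the synthesis script of Listing A.1, where log2N runs from 2 to 10 and nc runs from 1 to log2N - 1, capped at 8 when log2N is 10. The following Python sketch simply enumerates those combinations; the function names are illustrative.

```python
def max_nc(log2_n):
    """Upper bound on nc for a given log2N, as in Listing A.1."""
    return 8 if log2_n == 10 else log2_n - 1


def configurations(min_log2_n=2, max_log2_n=10):
    """List every (log2N, nc) pair visited by the synthesis loops."""
    return [(log2_n, nc)
            for log2_n in range(min_log2_n, max_log2_n + 1)
            for nc in range(1, max_nc(log2_n) + 1)]


configs = configurations()
print("without pipelining option:", len(configs))
print("with pipelining on/off:   ", 2 * len(configs))
```

The per-length counts are 1, 2, 3, 4, 5, 6, 7, 8, and 8, which sum to 44; doubling for the pipelining option gives 88.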
As with any type of computing device, there are several areas of research that
can be expanded on to improve the FFT implementation. One key area is expanding
the 1024-point length limit. For a fixed sample rate, a longer FFT gives finer
resolution in the frequency domain. In addition, a combination of the DIT and DIF
algorithms is briefly discussed, as it decreases the number of calculations
needed.
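One way to see the benefit of a longer transform is through the spacing between adjacent frequency bins, which for an N-point FFT is the sample rate divided by N. The Python sketch below tabulates this spacing; the 100 MHz sample rate is a hypothetical value chosen only for illustration and is not taken from this research.

```python
def bin_spacing_hz(sample_rate_hz, n_points):
    """Frequency spacing between adjacent FFT bins: fs / N."""
    return sample_rate_hz / n_points


SAMPLE_RATE_HZ = 100e6  # hypothetical sample rate, for illustration only

for n_points in (1024, 2048, 4096):
    spacing_khz = bin_spacing_hz(SAMPLE_RATE_HZ, n_points) / 1e3
    print(f"N = {n_points:4d}: bin spacing = {spacing_khz:.3f} kHz")
```

Each doubling of the length halves the bin spacing, so extending the design from 1024 to 4096 points would give four times finer spacing at the same sample rate.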
Figure 5.1: Previous implementation comparisons (panels (a) and (b))

5.3.1 Expanding Beyond the 1024-point Limit. Due to the modularity of
the design, extending the maximum N value past 1024 is not difficult. The
components that would need modification are listed below:
ROMGenerate.vhd The twiddle factors W0 to W3 were referenced with W0 being
the twiddle factor for the highest numbered stage (i.e. stage 5) in the design and
W3 being the factors for the second stage (the first stage does not use twiddle
factors). This arrangement of the variables is due to the way they are presented
in the DIT and DIF algorithms. To extend past 1024, the twiddle factor variables
must therefore be shifted. For example, if one wanted to use a max length of
4096, the twiddle factors would range from W0 to W4 with the new stage 6 using
W0. The only changes needed for the ROMGenerate.vhd code would be to the main
function. Here, one would change the Mp array variable, the p for loop, and the
Mp calculation.
FFT.vhd To add 2048 as a possible length, a new generate statement is needed
(i.e. Neq2048). The pattern is easy to see by following the existing size
implementations. All the possible nc choices must also be covered, so that only
the minimal components necessary are used for each value of nc. It is helpful to
create a drawing similar to the completed design layout shown in Chapter II when
implementing the necessary muxes and sign extenders. Karnaugh maps are very
useful here.
Components.vhd To follow the design by He and Torkelson [10], the twiddle
factors are numbered from 0 to 3 going from left to right in the design. To
accommodate a larger length, these values must be ’shifted’ to the left.
Renumbering the twiddle factors in this module and adding a new W4 component
will work.
All other components are modular and do not need any modifications.
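The renumbering described for ROMGenerate.vhd and Components.vhd follows a simple rule: the highest numbered stage uses W0, each earlier stage uses the next higher index, and the first stage uses no twiddle factors. The Python sketch below illustrates that rule only; it is not the logic of ROMGenerate itself, and the function name is hypothetical. The stage counts of 5 (for 1024) and 6 (for 4096) are taken from the text.

```python
def twiddle_assignment(num_stages):
    """Map each stage number to its twiddle factor set name.

    Stage 1 has no twiddle factors; the highest numbered stage uses W0.
    """
    assignment = {1: None}
    for stage in range(2, num_stages + 1):
        assignment[stage] = f"W{num_stages - stage}"
    return assignment


for max_length, stages in ((1024, 5), (4096, 6)):
    print(f"max length {max_length}:")
    for stage, name in twiddle_assignment(stages).items():
        print(f"  stage {stage}: {name if name else 'no twiddle factors'}")
```

With 5 stages the second stage maps to W3, and with 6 stages it maps to W4 while the new stage 6 takes W0, which matches the shift described in the list.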
5.3.2 Implementation of a Decimation-In-Time-Frequency Algorithm. Ali
Saidi developed an algorithm, the “Decimation-In-Time-Frequency (DITF) FFT
Algorithm,” which he claims reduces the number of real multiplications and
additions [17]. It reduces the arithmetic complexity while using the
same structure as the conventional Cooley-Tukey FFT algorithm. He extended the
algorithm to the radix-2 FFT implemented in this research. The author explains
that the heart of the DITF algorithm is the following observation: in the DIF
algorithm most of the calculations are performed in the early stages, while in
the DIT algorithm most of the calculations are done in the final stages [17].
The author proposes that starting with the DIT FFT algorithm and then switching
to the DIF FFT algorithm at some intermediate stage will decrease the number of
computations needed. The flow graph in Figure 5.2 illustrates a 32-point DITF FFT
algorithm. The cost of the transition from DIT to DIF and the savings due to this
transition vary depending on the stage at which the algorithms switch. The author
analyzes this trade-off in [17].
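The observation behind the DITF algorithm can be illustrated by counting, for each stage of a radix-2 FFT, the twiddle factor multiplications that are not by +1, -1, +j, or -j. The Python sketch below does this for the DIF ordering and reverses the result for DIT. It is only an illustration of where the work is concentrated: it counts complex twiddle multiplications rather than the real multiplies of Table 5.1, and it does not implement the DITF algorithm of [17].

```python
def nontrivial_twiddles_dif(n):
    """Per-stage count of non-trivial twiddle multiplications, radix-2 DIF.

    A twiddle W_M^k = exp(-2j*pi*k/M) is trivial when it equals
    +/-1 or +/-j, i.e. when 4*k is a multiple of M.
    """
    counts = []
    block = n                      # sub-transform size handled in this stage
    while block >= 2:
        per_block = sum(1 for k in range(block // 2) if (4 * k) % block != 0)
        counts.append(per_block * (n // block))   # n // block sub-transforms
        block //= 2
    return counts


def nontrivial_twiddles_dit(n):
    """Radix-2 DIT uses the same twiddle sets in the opposite stage order."""
    return list(reversed(nontrivial_twiddles_dif(n)))


print("DIF, N=32:", nontrivial_twiddles_dif(32))
print("DIT, N=32:", nontrivial_twiddles_dit(32))
```

For the 32-point case the DIF counts fall from the first stage to the last, while the DIT counts rise, which is the asymmetry the DITF approach is said to exploit.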
Table 5.1 shows the number of real multiplies for several lengths (N) for both the
Radix-2 Cooley-Tukey (CT) and the DITF algorithm, along with several other
algorithms. The data show that the DITF variants never require more real
multiplications than their Cooley-Tukey counterparts, and that the absolute
savings grow at larger lengths. Decreasing the number of operations necessary to
compute the FFT allows the overall calculation to be performed faster.
Figure 5.2: 32-Point DITF FFT Flow Graph [17]
Table 5.1: Number of real multiplies for complex FFT algorithms. [17]
(Size N = 2^M; CT = Cooley-Tukey)

 M     N    Split  Radix-2  Radix-2  Radix-4  Radix-4  Radix-8  Radix-8
            Radix       CT     DITF       CT     DITF       CT     DITF
 3     8        4        4        4      N/A      N/A      N/A      N/A
 4    16       24       28       24       24       24      N/A      N/A
 5    32       84      108       88      N/A      N/A      N/A      N/A
 6    64      248      332      248      264      264      248      248
 7   128      660      908      696      N/A      N/A      N/A      N/A
 8   256     1656     2316     1784     1800     1656      N/A      N/A
 9   512     3988     5644     4472      N/A      N/A     3992     3992
10  1024     9336    13324    10744    10248     9528      N/A      N/A
11  2048    21396    30732    25336      N/A      N/A      N/A      N/A
12  4096    48248    69644    58360    53256    49656    48280    47608
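The savings implied by Table 5.1 for the radix-2 case can be computed directly from the two Radix-2 columns. The Python sketch below does so. As a side check, it also compares the Radix-2 CT column against the expression 2N log2(N) - 7N + 12, which happens to reproduce every entry; that expression is an observation made here from the tabulated values and is not a formula quoted in this thesis.

```python
# Rows copied from Table 5.1: N -> (Radix-2 CT, Radix-2 DITF) real multiplies
RADIX2_REAL_MULTS = {
    8: (4, 4),
    16: (28, 24),
    32: (108, 88),
    64: (332, 248),
    128: (908, 696),
    256: (2316, 1784),
    512: (5644, 4472),
    1024: (13324, 10744),
    2048: (30732, 25336),
    4096: (69644, 58360),
}


def saving(ct, ditf):
    """Return (absolute, percent) reduction in real multiplies."""
    return ct - ditf, 100.0 * (ct - ditf) / ct


def ct_closed_form(n):
    """Expression that reproduces the Radix-2 CT column (observation only)."""
    log2_n = n.bit_length() - 1    # exact for powers of two
    return 2 * n * log2_n - 7 * n + 12


for n, (ct, ditf) in RADIX2_REAL_MULTS.items():
    absolute, percent = saving(ct, ditf)
    print(f"N={n:5d}  CT={ct:6d}  DITF={ditf:6d}  "
          f"saved={absolute:6d} ({percent:4.1f}%)")
```

For the 1024-point length used in this research, the table implies 2580 fewer real multiplies for DITF, a reduction of about 19.4%.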
Appendix A. FFT Synthesis Testing Scripts
Listing A.1: PERL Synthesis Script
#!/usr/bin/perl -w

# Make sure the files we need exist...
if (!-e "FFT.vhd") {
    print "Missing FFT.vhd\n";
    exit();
}
if (!-e "FFTComponents.vhd") {
    print "Missing FFTComponents.vhd\n";
    exit();
}
if (!-e "ROMGenerate.class") {
    print "Missing ROMGenerate.class\n";
    exit();
}
if (!-e "FFT.cmd") {
    print "Missing FFT.cmd\n";
    exit();
}

# For this set of tests, fix the input bitwidth and
# the twiddle factor bitwidth to 10 bits each.
$in_width = 10;
$tf_width = 10;

# Loop through all possible lengths...
for ($log2N = 2; $log2N <= 10; $log2N++) {
    # determine the max nc value.
    $maxNC = $log2N - 1;
    if ($log2N == 10) {
        $maxNC = 8;
    }
    # loop through all possible nc values
    for ($nc = 1; $nc <= $maxNC; $nc++) {
        print "Synthesizing $log2N $nc $tf_width $in_width\n";

        # calculate the output width, which is based on
        # log2N, input width, and tf_width
        $temp = $log2N + ($log2N % 2);
        $out_width = $in_width + $temp * 1 + ($temp/2 - 1) * $tf_width;

        # Copy FFT_template.vhd to FFT.vhd
        `cp FFT_template.vhd FFT.vhd`;

        # Modify the parameters of the FFT.vhd file...
        print "Modifying FFT.vhd...\n";
        `perl -pi -e 's/#in_width/$in_width/g' FFT.vhd`;
        `perl -pi -e 's/#out_width/$out_width/g' FFT.vhd`;
        `perl -pi -e 's/#tf_width/$tf_width/g' FFT.vhd`;
        `perl -pi -e 's/#nc/$nc/g' FFT.vhd`;
        `perl -pi -e 's/#logbase2N/$log2N/g' FFT.vhd`;
        `perl -pi -e 's/#pipeline/YES/g' FFT.vhd`;

        # Generate the twiddle factors
        print "Generating TwiddleFactors.vhd...\n";
        `~/bin/java ROMGenerate $log2N $nc $tf_width vhdl`;

        # If log2N = 2 (length 4) create a blank TwiddleFactors.vhdl
        # so Cadence won't choke
        if ($log2N == 2) {
            `touch TwiddleFactors.vhd`;
        }

        # execute RTL script...
        print "Running synthesis script...\n";
        `rc -files FFT.cmd`;

        # copy output files to specific file
        print "Copying results to synth_results directory...\n";
        $timing_filename = join '', 'timing_', $log2N, '_', $nc, '_PIPE.txt';
        $area_filename = join '', 'area_', $log2N, '_', $nc, '_PIPE.txt';
        $power_filename = join '', 'power_', $log2N, '_', $nc, '_PIPE.txt';
        `mv timing.txt synth_results/$timing_filename`;
        `mv area.txt synth_results/$area_filename`;
        `mv power.txt synth_results/$power_filename`;
    }
}
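The output width calculation in Listing A.1 can be checked by hand with a small Python transcription of the same arithmetic. One plausible reading of the formula, offered here as an interpretation rather than something the listing states, is one bit of growth per radix-2 stage (after rounding log2N up to an even number) plus tf_width bits for each twiddle factor multiplication. The function name is illustrative.

```python
def output_width(log2_n, in_width, tf_width):
    """Output bit width, transcribed from the formula in Listing A.1."""
    temp = log2_n + (log2_n % 2)          # round log2N up to an even number
    return in_width + temp + (temp // 2 - 1) * tf_width


# Widths for the fixed 10-bit input and 10-bit twiddle factors of Listing A.1
for log2_n in range(2, 11):
    print(f"log2N = {log2_n:2d} (N = {2 ** log2_n:4d}): "
          f"out_width = {output_width(log2_n, 10, 10)}")
```

For the length-1024 case with 10-bit inputs and 10-bit twiddle factors, the formula gives 10 + 10 + 4 * 10 = 60 bits. The same expression appears in Listing B.1 when the error analysis script reports the output_width to use in simulation.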
Listing A.2: Cadence Synthesis Script
# Cadence RTL Compiler (RC)
# version 05.20-p002 (32-bit) built Nov 28 2005
#
# Run with the following arguments:
#   -logfile rc.log
#   -cmdfile rc.cmd

# setup the library search path to the 90nm libraries from TSMC
set_attribute lib_search_path /home/afiten3/gce07m/bbrakus/libraries/TSMCHOME/digital/Front_End/...
timing_power/tcbn90ghp_150a

# setup the hdl search paths to the current directory
set_attribute hdl_search_path .

# load one of the 90nm libraries
set_attribute library tcbn90ghpbc.lib

# read all the vhdl files
read_hdl -vhdl TwiddleFactors.vhd FFTComponents.vhd FFT.vhd

# compile and check for errors
elaborate FFT

# synthesize the design
synthesize -to_mapped FFT

# create reports and save them to the current directory
report timing > timing.txt
report area > area.txt
report power > power.txt
quit
Appendix B. FFT Error Analysis Testing Scripts
Listing B.1: MATLAB Error Analysis Script
function [] = TestFFTerror(log2length, input_bitwidth)
%TestFFTerror Generates test input and FFT results data
%   TestFFTerror(log2length, input_bitwidth)
%   log2length = log base 2 of FFT length
%   input_bitwidth = bitwidth of input data
length = 2^log2length;
in_scale = 2^(input_bitwidth-1)-1;
n = [0:29];
data = cos(2*pi*n/10);
data = round(data*in_scale);

% open file to store input data
in_id = fopen('input_data.txt', 'wt');
if (in_id == -1)
    error('cannot open file for writing');
end

% store input data in file...
for j = 1:30
    fprintf(in_id, '%d\n', real(data(j)));
    fprintf(in_id, '%d\n', imag(data(j)));
end
for j = 31:length
    fprintf(in_id, '0\n');
    fprintf(in_id, '0\n');
end
fclose(in_id);

for tfbw = 6:2:12
    % scale factor
    tf_scale = 2^(tfbw-1)-1;

    % calculate FFT of data...
    matlab = (fft(data, length));

    % scale data based on length...
    if (length == 8 || length == 16)
        matlab = matlab * tf_scale^1;
    elseif (length == 32 || length == 64)
        matlab = matlab * tf_scale^2;
    elseif (length == 128 || length == 256)
        matlab = matlab * tf_scale^3;
    elseif (length == 512 || length == 1024)
        matlab = matlab * tf_scale^4;
    end

    % bit reverse data...
    rev = zeros(1, length);
    for j = 0:length-1
        bin_str = dec2bin(j, log2(length));
        bin_str = fliplr(bin_str);
        bit_rev = bin2dec(bin_str);
        rev(bit_rev+1) = matlab(j+1);
    end

    % open file to store matlab FFT data
    fft_filename = strcat('fft_matlab_', num2str(length), '_', num2str(tfbw), '.txt');
    fft_id = fopen(fft_filename, 'wt');
    if (fft_id == -1)
        error('cannot open file for writing');
    end

    % store FFT data in file...
    for j = 1:length
        fprintf(fft_id, '%d %d\n', round(real(rev(j))), round(imag(rev(j))));
    end
    fclose(fft_id);

    disp('Run the Modelsim simulator to generate VHDL data using the following parameters:');
    disp_str = strcat('input_width = ', num2str(input_bitwidth));
    disp(disp_str);
    temp = log2length + mod(log2length, 2);
    output_width = input_bitwidth + temp*1 + (temp/2 - 1)*tfbw;
    disp_str = strcat('output_width = ', num2str(output_width));
    disp(disp_str);
    disp_str = strcat('tfbw = ', num2str(tfbw));
    disp(disp_str);
    disp_str = strcat('log2N = ', num2str(log2length));
    disp(disp_str);
    disp(' ');
    disp('Press any key when done...');
    pause;

    % rename the generated VHDL FFT data
    vhdl_filename = strcat('fft_vhdl_', num2str(length), '_', num2str(tfbw), '.txt');
    movefile('fft_vhdl.txt', vhdl_filename);
end
CompareDataStem(log2length);
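The bit reversal step in Listing B.1 reorders the MATLAB reference FFT before it is written to file, presumably so that it lines up with the order in which the DIF hardware emits its results. A minimal Python sketch of the same index permutation follows; the function names are illustrative and the sketch is not part of the thesis scripts.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index (e.g. 0b011 -> 0b110)."""
    return int(format(index, f"0{bits}b")[::-1], 2)


def bit_reverse_permute(values):
    """Place values[j] at position bit_reverse(j), as Listing B.1 does."""
    n = len(values)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    out = [None] * n
    for j, value in enumerate(values):
        out[bit_reverse(j, bits)] = value
    return out


# Natural-order indices of an 8-point transform after reordering
print(bit_reverse_permute(list(range(8))))
```

Because reversing the bits twice returns the original index, applying the permutation a second time restores natural order.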
Listing B.2: MATLAB Stem Plot Script
function [] = CompareDataStem(log2N)
%CompareDataStem Compares error results and produces a stem plot
%   Detailed explanation goes here
n = 2^log2N;
for tfbw = 6:2:12
    matlabFilename = strcat('fft_matlab_', int2str(n), '_', int2str(tfbw), '.txt');
    vhdlFilename = strcat('fft_vhdl_', int2str(n), '_', int2str(tfbw), '.txt');
    matlabID = fopen(matlabFilename, 'r');
    diffFilename = strcat('fft_diff_', int2str(n), '_', int2str(tfbw), '.txt');
    vhdlID = fopen(vhdlFilename, 'r');
    diffID = fopen(diffFilename, 'wt');
    if (diffID == -1)
        error('cannot open file for writing');
    end
    matlabTHM = fscanf(matlabID, '%f %f', [2 inf]);
    vhdlTHM = fscanf(vhdlID, '%f %f', [2 inf]);
    matlabTHM = matlabTHM';
    vhdlTHM = vhdlTHM';
    matlabRE = (matlabTHM(:,1));
    matlabIM = (matlabTHM(:,2));
    vhdlRE = (vhdlTHM(:,1));
    vhdlIM = (vhdlTHM(:,2));
    n = length(matlabRE);
    reDiff = zeros(1, n);
    imDiff = zeros(1, n);
    for i = 1:n
        if (matlabRE(i) ~= 0 && matlabIM(i) ~= 0)
            reDiff(i) = (100*(vhdlRE(i)-matlabRE(i))/matlabRE(i));
            imDiff(i) = (100*(vhdlIM(i)-matlabIM(i))/matlabIM(i));
        else
            reDiff(i) = 0;
            imDiff(i) = 0;
        end
        fprintf(diffID, '%f %f\n', reDiff(i), imDiff(i));
    end
    reMaxError = max(reDiff);
    reMinError = min(reDiff);
    imMaxError = max(imDiff);
    imMinError = min(imDiff);
    title_string = strcat('Twiddle Factor Bit Width = ', int2str(tfbw));
    subplot(2, 2, (tfbw-4)/2);
    if ((tfbw-4)/2 == 1)
        stem([1:n], reDiff, 'bx'), xlim([0 n+1]);
        hold on
        stem([1:n], imDiff, 'r+'), title(title_string), xlabel('n'), ylabel('% Error'), xlim([0 n+1]);
    elseif ((tfbw-4)/2 == 2)
        stem([1:n], reDiff, 'bx'), xlim([0 n+1]);
        hold on
        stem([1:n], imDiff, 'r+'), title(title_string), xlabel('n'), ylabel('% Error'), xlim([0 n+1]);
    elseif ((tfbw-4)/2 == 3)
        stem([1:n], reDiff, 'bx'), xlim([0 n+1]);
        hold on
        stem([1:n], imDiff, 'r+'), title(title_string), xlabel('n'), ylabel('% Error'), xlim([0 n+1]);
    else
        stem([1:n], reDiff, 'bx'), xlim([0 n+1]);
        hold on
        stem([1:n], imDiff, 'r+'), title(title_string), xlabel('n'), ylabel('% Error'), xlim([0 n+1]);
    end
    fclose(matlabID);
    fclose(vhdlID);
    fclose(diffID);

    % open file to store error data
    error_filename = strcat('images\N', num2str(n), 'TF', num2str(tfbw), 'error.txt');
    error_id = fopen(error_filename, 'wt');
    if (error_id == -1)
        error('cannot open file for writing');
    end
    reMaxError = max(abs(reDiff));
    reAveError = average(reDiff);
    reStdDev = std(reDiff);
    imMaxError = max(abs(imDiff));
    imAveError = average(imDiff);
    imStdDev = std(imDiff);
    fprintf(error_id, 'Max real error = %f\n', reMaxError);
    fprintf(error_id, 'Average real error = %f\n', reAveError);
    fprintf(error_id, 'Standard deviation of real = %f\n', reStdDev);
    fprintf(error_id, 'Max imag error = %f\n', imMaxError);
    fprintf(error_id, 'Average imag error = %f\n', imAveError);
    fprintf(error_id, 'Standard deviation of imag = %f\n', imStdDev);
    fclose(error_id);
end
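The error metric in Listing B.2 is a per-bin percent difference between the VHDL result and the MATLAB reference, followed by summary statistics. The Python sketch below mirrors that calculation on a few made-up sample values; the sample numbers are hypothetical and are not results from this research. As in the listing, a bin is reported as zero error whenever either component of the reference is zero, which avoids division by zero but also skips bins where only one component is zero.

```python
import statistics


def percent_error(ref_re, ref_im, got_re, got_im):
    """Percent error of the real and imaginary parts, as in Listing B.2."""
    if ref_re != 0 and ref_im != 0:
        return (100.0 * (got_re - ref_re) / ref_re,
                100.0 * (got_im - ref_im) / ref_im)
    return (0.0, 0.0)


def summarize(errors):
    """Max absolute error, mean, and sample standard deviation."""
    return (max(abs(e) for e in errors),
            statistics.mean(errors),
            statistics.stdev(errors))   # N-1 normalization, like MATLAB std


# Hypothetical (reference, measured) pairs, for illustration only
reference = [(200, -100), (50, 0), (-400, 80)]
measured = [(202, -99), (51, 1), (-404, 80)]

pairs = [percent_error(rr, ri, gr, gi)
         for (rr, ri), (gr, gi) in zip(reference, measured)]
re_errors = [p[0] for p in pairs]
im_errors = [p[1] for p in pairs]
print("real:", summarize(re_errors))
print("imag:", summarize(im_errors))
```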
Bibliography
1. “Celoxica”. URL www.celoxica.com.
2. Arfken, G. Mathematical Methods for Physicists. Academic Press, Orlando, FL,
3rd edition, 1985.
3. Boeing. Boeing MSP Macro Databook.
4. Bracewell, R. The Fourier Transform and Its Applications. McGraw-Hill, New
York, NY, 3rd edition, 1999.
5. Brigham, E. O. The Fast Fourier Transform and Applications. Prentice Hall,
Englewood Cliffs, NJ, 1988.
6. Chinnery, DG and K. Keutzer. “Closing the gap between ASIC and custom:
an ASIC perspective”. Design Automation Conference, 2000. Proceedings 2000.
37th, 637–642, 2000.
7. Cooley, J.W. and J.W. Tukey. “An Algorithm for the Machine Calculation of
Complex Fourier Series”. Mathematics of Computation, 19(90):297–301, 1965.
8. Fabey, Michael. “Revolutionizing Electronic Warfare”. Defense News, 8 2005.
9. Gentleman, W.M. and G. Sande. “Fast Fourier transforms for fun and profit”.
Proc. AFIPS 1966 Fall Joint Comput. Conf., Vol. 29, 1966.
10. He, S. and M. Torkelson. “A New Approach to Pipeline FFT Processor”. Proc.
IEEE Int. Parallel Processing Symp, 766–770, 1996.
11. He, S. and M. Torkelson. “Design and implementation of a 1024-point pipeline
FFT processor”. Custom Integrated Circuits Conference, 1998., Proceedings of
the IEEE 1998, 131–134, 1998.
12. Heo, KL, JH Baek, MH Sunwoo, BG Jo, and BS Son. “New in-place strategy
for a mixed-radix FFT processor”. SOC Conference, 2003. Proceedings. IEEE
International [Systems-on-Chip], 81–84, 2003.
13. Lee, S.Y. and C.C. Chen. “VLSI implementation of programmable FFT archi-
tectures for OFDM communication system”. International Conference On Com-
munications And Mobile Computing, 893–898, 2006.
14. Melnyk, A., B. Dunets, I. Ltd, and U. Lviv. “FFT Processor IP Cores synthesis
on the base of configurable pipeline architecture”. CAD Systems in Microelec-
tronics, 2003. CADSM 2003. Proceedings of the 7th International Conference.
The Experience of Designing and Application of, 211–213, 2003.
15. Potipantong, P., T. Wiangtong, P. Sirisuk, and A. Worapishet. “A scaleable
FFT/IFFT kernel for communication systems using codesign approach”. Field-
Programmable Technology, 2005. Proceedings. 2005 IEEE International Confer-
ence on, 329–330, 2005.
16. Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical
Recipes in FORTRAN: The Art of Scientific Computing. Cambridge University
Press, Cambridge, England, 2nd edition, 1989.
17. Saidi, A. “Decimation-in-time-frequency FFT algorithm”. Acoustics, Speech, and
Signal Processing, 1994. ICASSP-94., 1994 IEEE International Conference on,
3, 1994.
18. Soliman, Samir S. and Mandyam D. Srinath. Continuous and Discrete Signals
and Systems. Prentice Hall, Englewood Cliffs, NJ, first edition, 1990.
19. Stoer, J. and R Bulirsch. Introduction to Numerical Analysis. Springer-Verlag,
New York, NY, 1980.
20. Sukhsawas, S. and K. Benkrid. “A high-level implementation of a high per-
formance pipeline FFT on Virtex-E FPGAs”. VLSI, 2004. Proceedings. IEEE
Computer society Annual Symposium on, 229–232, 2004.
21. Tsui, James. Digital Techniques for Wideband Receivers. Artech House, Norwood,
MA, second edition, 2001.
22. Uzun, IS, A. Amira, and A. Bouridane. “FPGA implementations of fast Fourier
transforms for real-time signal and image processing”. Vision, Image and Signal
Processing, IEE Proceedings-, 152(3):283–296, 2005.
23. Zhao, Y., AT Erdogan, and T. Arslan. “A novel low-power reconfigurable FFT
processor”. Circuits and Systems, 2005. ISCAS 2005. IEEE International Sym-
posium on, 41–44, 2005.
Vita
Captain Brian Brakus graduated from Hoover High School in North Canton, OH
in 1993. He attended the University of Akron as a Computer Engineering student, and
graduated in 1999 with a BS in Computer Engineering. After working several civilian
jobs for a few years, Captain Brakus joined the US Air Force and was commissioned
as an officer in September 2002 through the Officer’s Training School.
Capt. Brakus’ first assignment was at the National Air and Space Intelligence Center
(NASIC) at Wright-Patterson Air Force Base, OH. His following assignment was at the
Air Force Institute of Technology (AFIT) to complete a Master’s degree in Computer
Engineering. He will graduate in March of 2007.
REPORT DOCUMENTATION PAGE (Form Approved, OMB No. 0704–0188)
1. REPORT DATE (DD–MM–YYYY): 06–03–2007
2. REPORT TYPE: Master’s Thesis
3. DATES COVERED (From - To): Sept 2005 - Mar 2007
4. TITLE AND SUBTITLE: A Modular Mixed Signal VLSI Design Approach for
Digital Radar Applications
5a. CONTRACT NUMBER: DACA99–99–C–9999
5b. GRANT NUMBER:
5c. PROGRAM ELEMENT NUMBER:
5d. PROJECT NUMBER:
5e. TASK NUMBER:
5f. WORK UNIT NUMBER:
6. AUTHOR(S): Brian M. Brakus, Capt, USAF
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES):
Air Force Institute of Technology
Graduate School of Engineering and Management
2950 Hobson Way
WPAFB OH 45433-7765
8. PERFORMING ORGANIZATION REPORT NUMBER: AFIT/GCE/ENG/07-02
9. SPONSORING / MONITORING AGENCY NAME(S) AND ADDRESS(ES):
Dr. Greg Creech, (937)255-4831, email: Greg.Creech@afit.edu
AFRL/SND
2241 Avionics Circle
WPAFB, OH 45433
10. SPONSOR/MONITOR’S ACRONYM(S):
11. SPONSOR/MONITOR’S REPORT NUMBER(S):
12. DISTRIBUTION / AVAILABILITY STATEMENT:
Approval for public release; distribution is unlimited.
13. SUPPLEMENTARY NOTES:
14. ABSTRACT:
This study explores the idea of building a library of configurable VHDL
components for use in digital radar applications. Configurable components allow
a designer to choose the components he or she needs and to configure them for a
specific application. Design time for ASICs and FPGAs is shortened because the
components are already designed and tested. This idea is demonstrated with a
configurable, dynamic, pipelinable fast Fourier transform. Many FFT
implementations exist, but this implementation is both configurable and dynamic.
Pre-synthesis customization allows the FFT to be tailored to almost any DSP
application, and the dynamic property allows the FFT to calculate different
length FFTs at run-time. Three objectives are accomplished: design and
characterization of the aforementioned FFT; analysis of the error involved in
the FFT calculation using different twiddle factor bit widths; and an analysis
of all the configurations for the synthesized design using a 90 nm technology
library. Speeds of up to 225 MHz have been simulated for a length-1024 FFT using
the 90 nm technology.
15. SUBJECT TERMS: FFT, Twiddle Factor, Modular, Library, VHDL, ASIC
16. SECURITY CLASSIFICATION OF: a. REPORT: U; b. ABSTRACT: U; c. THIS PAGE: U
17. LIMITATION OF ABSTRACT: UU
18. NUMBER OF PAGES: 79
19a. NAME OF RESPONSIBLE PERSON: Dr. Yong C. Kim, PhD (ENG)
19b. TELEPHONE NUMBER (include area code): (937) 255–3636; email: yong.kim@afit.edu
Standard Form 298 (Rev. 8–98), prescribed by ANSI Std. Z39.18
