This paper presents the design and implementation of a digital receiver based on a large-point fast Fourier transform (FFT), suitable for electronic warfare (EW) applications. When implementing the FFT algorithm on field-programmable gate array (FPGA) platforms, the primary goal is to maximize throughput and minimize area. The algorithm adopts a two-dimensional, parallel, pipelined streaming mode and supports reconfiguration of the number of FFT points. Moreover, a double-sequence-separation FFT algorithm has been implemented to achieve faster real-time processing in broadband digital receivers. The performance of the hardware implementation of broadband digital receivers on FPGA platforms is analyzed in depth. The design meets the requirements of high-speed digital signal processing and illustrates how to design this kind of digital signal processing system on FPGA platforms.
Introduction
As ultra-deep-submicron technologies develop, the design process of digital systems becomes more complicated, especially for application-specific integrated circuits (ASICs). Compared with ASICs, FPGAs offer significant advantages at a suitably low cost. First, designers can modify the implementation at any time. Second, verifying a design mapped onto an FPGA is very simple. Finally, FPGAs provide better performance than general-purpose CPUs or DSPs, even though they are less efficient than ASICs in terms of performance, area, or power. This makes FPGAs an attractive choice for offloading computationally intensive digital signal processing functions from processors [1].
Electronic warfare digital receivers require wideband frequency coverage, high sensitivity and dynamic range, high probability of intercept, simultaneous signal detection, excellent frequency resolution, and real-time operation [2]-[6]. Wideband digital receivers, which are mainly based on the FFT, require intensive computation for real-time applications. High-speed, real-time digital signal processing can be realized on a single FPGA chip, where the throughput of many signal processing algorithms can be improved by exploiting concurrency in the form of parallelism and pipelining [7]-[11].
In this paper, we present an in-depth study of the hardware implementation of a large-point reconfigurable FFT. The pipelined and parallel architectures of this FFT are suitable for wideband EW digital receivers. The maximum FFT sequence length is 1M (2^20) points. Our purpose is to provide not only the hardware FFT architectures on a single FPGA chip, but also an analysis of the important parameters involved in hardware implementation on FPGAs, and to study the performance of the system. The design concept of this paper can be applied to other fields, such as intelligent transport systems [12], [13] and radar signal processing systems.
The FFT Algorithm

Holistic design principle
For wideband signal processing, processing speed is the first consideration. An array structure is too large to realize, and a recursive structure is too slow to meet the system's real-time processing demand. Although a fully parallel structure achieves high speed, it uses so many memory and arithmetic units that it cannot be realized on a single FPGA chip.
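Bergland presented a two-dimensional parallel FFT algorithm that transforms one long FFT into several short FFTs and thereby reduces the number of operations. For N = L×M, the N-point DFT to be computed is:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1$$
where X(k) is the frequency spectrum of the input x(n) and $W_N = e^{-j2\pi/N}$ is the twiddle factor of the DFT. Assuming that N is a composite number with N = L×M, n and k can be expressed as:
$$n = M n_1 + n_0, \qquad n_1 = 0, 1, \ldots, L-1, \quad n_0 = 0, 1, \ldots, M-1$$
$$k = L k_0 + k_1, \qquad k_1 = 0, 1, \ldots, L-1, \quad k_0 = 0, 1, \ldots, M-1$$
We can obtain the following formula:
$$X(L k_0 + k_1) = \sum_{n_0=0}^{M-1} \left\{ \left[ \sum_{n_1=0}^{L-1} x(M n_1 + n_0)\, W_L^{n_1 k_1} \right] W_N^{n_0 k_1} \right\} W_M^{n_0 k_0}$$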
From this formula we obtain a method for transforming a one-dimensional large-point FFT into two-dimensional small-point sub-FFTs. Suppose N = 1024×4^n, n = 0,1,2,3,4,5. Here we use the radix-4 FFT algorithm to transform the one-dimensional process into a two-dimensional process.
Step1: Rearrange the data. The N-point data are rearranged into the 1024×4^n format.
Step2: Perform a radix-4 FFT on every column. This means 4^n one-dimensional FFTs of 1024 points.
Step3: Multiply the output of the column transformation by the twiddle factors, then save the intermediate data to an array whose rows have 4^n elements and whose columns have 1024 elements.
Step4: Perform a radix-4 FFT on every row of the intermediate array. This means 1024 one-dimensional FFTs of 4^n points.
Step5: Output the sequence in natural order or reversed order.
Because the number of points is 1024×4^n, n = 0,1,2,3,4,5, the column transformation is an L = 1024-point FFT whose size is fixed, and the row transformation is an M = 4^n-point FFT whose size is variable.
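The five steps can be checked numerically with a short software model. The following Python sketch is an illustration only, not the FPGA implementation: it assumes the index mapping n = M·n1 + n0 and k = L·k0 + k1, replaces the radix-4 pipelines with a direct DFT helper, and uses small sizes (L = 16, M = 4) instead of L = 1024. The function names are hypothetical.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT, standing in for the radix-4 FFT units."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def two_dimensional_fft(x, L, M):
    """N = L*M point DFT built from M column DFTs of L points and L row DFTs of M points."""
    N = L * M
    if len(x) != N:
        raise ValueError("input length must equal L*M")
    # Step 1: rearrange. Column n0 holds x[M*n1 + n0] for n1 = 0..L-1.
    # Step 2: L-point transform of each of the M columns; cols[n0][k1].
    cols = [dft(x[n0::M]) for n0 in range(M)]
    # Step 3: multiply by the twiddle factors W_N^(n0*k1).
    for n0 in range(M):
        for k1 in range(L):
            cols[n0][k1] *= cmath.exp(-2j * cmath.pi * n0 * k1 / N)
    # Step 4: M-point transform along each of the L rows.
    X = [0j] * N
    for k1 in range(L):
        row = dft([cols[n0][k1] for n0 in range(M)])
        # Step 5: write the results back in natural order, k = L*k0 + k1.
        for k0 in range(M):
            X[L * k0 + k1] = row[k0]
    return X
```

Comparing `two_dimensional_fft(x, 16, 4)` against `dft(x)` for a 64-point input gives agreement to floating-point rounding, which confirms that the column transform, twiddle multiplication, and row transform together reproduce the full-length DFT.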
Based on the analysis above, the system is made up of two parts, as shown in Fig.1. The main function module of each part is an FFT unit. Besides the FFT module, each part contains a data fetch module and a repetition control module. The FFT module performs the butterfly operations of the column and row transformations. The data fetch module and the repetition control module handle address generation, data fetching, order reversal, point reconfiguration, and data output.
The design of fixed points FFT
The fixed-points FFT adopts a five-stage pipeline structure that uses the decimation-in-time (DIT) radix-4 algorithm. Each stage uses ping-pong memories to realize the pipeline. To avoid data overflow and obtain higher precision, we use a block floating-point algorithm. The radix-4 FFT comprises a pipeline unit, an address generation unit, a twiddle-factor address generation unit, a butterfly operation unit, and a block floating-point processing unit, as shown in Fig.2.
(3) The butterfly operation unit is a reusable unit in the design. Every pipeline stage can call it, which makes the program modular, easy to read, and easy to modify.
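The block floating-point idea mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's hardware unit: the 16-bit word length, the 3 guard bits reserved for growth in the next radix-4 stage, and the function name `block_scale` are all hypothetical choices. The whole block shares one exponent, which is increased whenever the largest sample would leave too little headroom.

```python
def block_scale(block, word_bits=16, guard_bits=3):
    """Scale a block of (re, im) integer pairs by a common power of two.

    Returns (scaled_block, shift). The shift is chosen so that the largest
    component magnitude stays below 2**(word_bits - 1 - guard_bits), leaving
    guard_bits of headroom for the next butterfly stage (assumed values).
    """
    limit = 1 << (word_bits - 1 - guard_bits)
    peak = max(max(abs(re), abs(im)) for re, im in block)
    shift = 0
    while (peak >> shift) >= limit:
        shift += 1
    # Arithmetic right shift of every sample; the shared exponent grows by `shift`.
    scaled = [(re >> shift, im >> shift) for re, im in block]
    return scaled, shift
```

A pipeline model would call `block_scale` on the output of each stage and accumulate the returned shifts into one block exponent, so that small blocks keep their full precision while large blocks are scaled just enough to avoid overflow.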
The design of variable points FFT
The variable-points FFT adopts a method similar to that of the fixed-points FFT. The difference between the two modules is that the variable-points FFT has a configuration logic unit that controls the operation of every stage.
For a 1024-point FFT, all five stage operation units are enabled. For a 256-point FFT, the first four stage operation units are enabled and the last one is not used. Other point counts are handled in the same way.
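The stage selection described above amounts to enabling the first log4(M) of the five radix-4 stages. The Python sketch below models that configuration logic as a list of enable flags; the function name and the flag representation are assumptions made for illustration, not the signals of the actual design.

```python
def enabled_stages(points, total_stages=5):
    """Return one enable flag per radix-4 stage for a FFT of `points` points.

    `points` must be a power of 4 no larger than 4**total_stages.
    """
    stages = 0
    remaining = points
    while remaining > 1:
        if remaining % 4 != 0:
            raise ValueError("number of points must be a power of 4")
        remaining //= 4
        stages += 1
    if points < 1 or stages > total_stages:
        raise ValueError("number of points out of range")
    # The first `stages` units run; the remaining ones are not used.
    return [i < stages for i in range(total_stages)]
```

For example, `enabled_stages(1024)` enables all five units, while `enabled_stages(256)` enables the first four and leaves the last one idle, matching the two cases given in the text.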
The variable-points FFT is mainly composed of five units: the data input unit, the 64-point pipeline processing unit, the twiddle-factor multiplication unit, the 4-point and 16-point pipeline processing unit, and the data output unit. Every unit uses ping-pong memories to guarantee real-time handling of the input data. The two sub-FFT modules adopt a parallel pipeline structure to improve the processing speed.
The variable-points FFT system is shown in Fig.3.
The twiddle-factor table for the 1M-point FFT is 1023×1023 words, so a single FPGA or ASIC cannot hold it. The common method is to store only a part of the twiddle factors and to compute the others using the symmetry of sine and cosine.
In this paper, based on the numerical characteristics of the twiddle factors, a linear interpolation method is presented that uses fewer stored twiddle factors to compute all 1M-point twiddle factors.
First, let us analyze the numerical curve of the twiddle factor. We can see from Eq. in Fig.1 because a higher compression ratio can be obtained with a larger P. For instance, when P=128 and the bit length of the twiddle factor is 18, the twiddle-factor storage of the 1024×1024-point FFT can be compressed from 2M×18 bits to 2k×32 bits while precision is still ensured.
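The following Python sketch illustrates the general idea of computing twiddle factors by linear interpolation. It is not the paper's exact scheme, which cannot be recovered from the text: it assumes that P is the spacing between stored twiddle factors, it stores a plain table without the additional sine and cosine symmetry savings, and all function names are hypothetical. It also measures the interpolation error so that it can be compared with the resolution of an 18-bit value, taken here as 2^-17 (an assumed signed fractional format).

```python
import cmath


def build_table(n, p):
    """Store every p-th twiddle factor W_n^k = exp(-2j*pi*k/n) for k = 0, p, 2p, ..., n."""
    return [cmath.exp(-2j * cmath.pi * m * p / n) for m in range(n // p + 1)]


def twiddle(table, p, k):
    """Approximate W_n^k by linear interpolation between the two stored neighbours."""
    m, r = divmod(k, p)
    return table[m] + (table[m + 1] - table[m]) * (r / p)


def max_interp_error(n, p, step=997):
    """Largest absolute error of the interpolated twiddle factor over sampled indices."""
    table = build_table(n, p)
    return max(abs(twiddle(table, p, k) - cmath.exp(-2j * cmath.pi * k / n))
               for k in range(0, n, step))
```

For N = 2^20 and P = 128 the stored points are 2π/8192 radians apart, so the chord error is about h²/8 ≈ 7×10^-8. This is far below 2^-17 ≈ 7.6×10^-6, which is in line with the statement that precision is preserved at this compression ratio.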
Double sequence separation
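Considering that the input data are real sequences in practical applications, this part introduces how to compute the FFTs of two N-point real sequences with one complex FFT. Assume that x_1(n) and x_2(n) are two real sequences of length N; the complex sequence x(n) is then defined as:
$$x(n) = x_1(n) + j\,x_2(n)$$
Because the DFT is linear, the DFT of x(n) can be expressed as:
$$X(k) = X_1(k) + j\,X_2(k)$$
x_1(n) and x_2(n) can be written as functions of x(n):
$$x_1(n) = \frac{x(n) + x^*(n)}{2}$$
$$x_2(n) = \frac{x(n) - x^*(n)}{2j}$$
So the DFTs of x_1(n) and x_2(n) are:
$$X_1(k) = \frac{1}{2}\left\{\mathrm{DFT}[x(n)] + \mathrm{DFT}[x^*(n)]\right\}, \qquad X_2(k) = \frac{1}{2j}\left\{\mathrm{DFT}[x(n)] - \mathrm{DFT}[x^*(n)]\right\}$$
Because
$$\mathrm{DFT}[x^*(n)] = \sum_{n=0}^{N-1} x^*(n)\, W_N^{nk} = X^*(N-k) \qquad (11)$$
we obtain:
$$X_1(k) = \frac{1}{2}\left[X(k) + X^*(N-k)\right], \qquad X_2(k) = \frac{1}{2j}\left[X(k) - X^*(N-k)\right]$$
To derive the formulas for FPGA implementation, the complex expressions above are expanded into their real representation.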
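The separation can be checked with a short Python model. The sketch below is an illustration rather than the FPGA implementation: it uses a direct DFT helper in place of the hardware FFT, and the real-arithmetic expansion in the loop is derived here from X_1(k) = [X(k) + X*(N−k)]/2 and X_2(k) = [X(k) − X*(N−k)]/(2j) by writing X(k) = X_R(k) + jX_I(k). The function names are hypothetical.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT, standing in for the hardware FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def two_real_dfts(x1, x2):
    """DFTs of two real sequences of equal length from one complex DFT."""
    n = len(x1)
    if len(x2) != n:
        raise ValueError("both sequences must have the same length")
    X = dft([complex(a, b) for a, b in zip(x1, x2)])
    X1, X2 = [], []
    for k in range(n):
        a = X[k]
        b = X[(n - k) % n]  # X(N-k), using the periodicity X(N) = X(0)
        # Real-arithmetic form: only additions, subtractions, and halving.
        X1.append(complex(0.5 * (a.real + b.real), 0.5 * (a.imag - b.imag)))
        X2.append(complex(0.5 * (a.imag + b.imag), -0.5 * (a.real - b.real)))
    return X1, X2
```

Because the separation needs only additions, subtractions, and a halving per output bin, it costs little compared with running a second FFT, which is the reason one complex transform can serve two real input channels.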
Performance Analysis of Receiver
It is important to consider the false alarm probability after the FFT. By studying threshold generation for spectrum detection of pulsed radar signals in a clutter background, we present a threshold generation algorithm based on the structure of the CMLD (Censored Mean Level Detector) CFAR detector and analyze its false alarm rate.
[Figure: structure of the CFAR detector. Blocks: spectrum input; square-modulus output; reference cells X(1), X(2), …, X(R/2), X(R/2+1), …, X(R); sort algorithm X(1) ≤ X(2) ≤ … ≤ X(R); censored algorithm with outputs X(1), X(2), …, X(R−r).]
σ is the mean square of the Gaussian distribution. The output X(k_i) follows a Rice distribution at every harmonic location. Assume that I is the power ratio of the signal to the background clutter before the DFT; after the DFT, NI is the power ratio of the carrier at frequency k_0 to the background clutter. The modulus of X(k_0) follows a Rice distribution, whose PDF (probability density function) is
When the SCR (signal-to-clutter ratio) is small, the Rice distribution above tends to a Rayleigh distribution, and when the SCR is large, it tends to a Gaussian distribution. Only the small-SCR case is discussed below.
After square-law detection is applied to X(k), the output follows an exponential distribution. μ is the total average power of the background clutter and thermal noise, and λ is the ratio of the target signal's average power to the clutter-plus-noise power. H_0 is the hypothesis that no target is present, and H_1 is the hypothesis that a target is present. When there is no target, the clutter noise is uniform: the X(k) are statistically independent and identically distributed.
If S = TZ is the detection threshold, it is itself a random variable, so we can use the statistical properties of S to compute the false alarm probability:
where,
( 1)( ), 0 
Considering the relationship between the PDF and the MGF (moment generating function), the PDF of Z is
(26)
where LT^{-1} denotes the inverse Laplace transform. Define a_i as:
Calculating the GO-CFAR and CMLD-CFAR thresholds separately, we choose the larger one:
$$S = \max\left(S_{\mathrm{GO}},\; S_{\mathrm{CMLD}}\right) \qquad (32)$$
Fig.5 shows the simulation of the method above. The simulation conditions are: two signals with frequencies 125.0MHz and 127.6MHz; Gaussian random noise with µ=0 and σ^2=5; an N=256K-point DFT; GO-CFAR threshold parameters: M=64 reference cells and 8 guard cells; CMLD-CFAR threshold parameters: R=64, r=8, and nominal factor T=0.125.
As shown in Fig.5, when the clutter background is large, the CMLD-CFAR threshold should be chosen. In frequency-domain detection, the harmonic frequency points are correlated with the signal and should not be counted in the noise power; CMLD-CFAR fulfills exactly this requirement. When the threshold is built by CMLD-CFAR and the nominal factor T is chosen, it follows from Eq.(31) that the false alarm rate P_f is constant. For GO-CFAR, at the same detection probability, the false alarm rate is lower than P_f, which helps detection. The algorithm obtains a stable estimate of the spectrum clutter background and provides the conditions under which it applies. The theoretical analysis and the simulation confirm the algorithm's validity.
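The threshold selection can be illustrated with a small Python sketch. It is a simplified software model under several assumptions, not the implemented detector: the reference window is split evenly into leading and lagging halves around the cell under test, the same cells feed both detectors, the GO-CFAR scale factor `t_go` is a hypothetical parameter that the text does not specify, and all function names are illustrative. The CMLD branch follows the structure described above: sort the R reference cells, censor the r largest, and scale the sum Z of the rest by T.

```python
def cmld_threshold(ref_cells, r, t):
    """CMLD-CFAR: sort the reference cells, drop the r largest, return T * Z."""
    kept = sorted(ref_cells)[:len(ref_cells) - r]
    return t * sum(kept)


def go_threshold(lead, lag, t_go):
    """GO-CFAR: scale the greater of the leading and lagging window sums."""
    return t_go * max(sum(lead), sum(lag))


def combined_threshold(power, k, half, guard, r, t, t_go):
    """Larger of the GO-CFAR and CMLD-CFAR thresholds for spectrum bin k.

    `power` is the square-law detected spectrum, `half` the number of
    reference cells on each side, `guard` the guard cells on each side.
    The caller must keep the whole window inside the spectrum.
    """
    lead = power[k - guard - half:k - guard]
    lag = power[k + guard + 1:k + guard + 1 + half]
    return max(cmld_threshold(lead + lag, r, t), go_threshold(lead, lag, t_go))
```

With 15 reference cells of power 1 and a single interfering cell of power 50, `cmld_threshold` with r = 2 and T = 0.125 censors the interferer and returns 0.125 × 14 = 1.75, whereas an uncensored sum would be inflated by the interfering harmonic. This is the property that makes the censored detector suitable when harmonic frequency points fall inside the reference window.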
FFT-Based Digital Receiver
Even though the basic implementation of wideband digital receivers is based on the FFT algorithm, many additional elements are required. This section is devoted to the analysis and design of the whole system, because the configuration and implementation of all the elements involved can significantly influence the final performance of the receiver. In this sense, the CFAR threshold, windowing, and frequency detection may play an important role in the system, because their implementation requires some FPGA resources. Fig.6 illustrates the FFT spectrum detection modules of the wideband digital receiver. They have been implemented on a single FPGA chip.
A test signal is made up of four channels of different pulsed radar signals, as shown in Tab.1. The signal is processed by the digital processing part of the receiver, and the FFT results are shown in Fig.7. The system detects the four strongest signals in the four-channel signal; the detection results are shown in Tab.2. The same four strongest signals can be found in the FFT results: the RF=98MHz signal from channel 1, the RF=153MHz signal from channel 2, the RF=196MHz signal from channel 4, and the RF=89MHz signal from channel 2.
Conclusion
We have presented a study of parallel pipelined architectures of the large-point reconfigurable FFT algorithm targeting FPGA devices for the implementation of wideband digital receivers. The in-depth exploration of the FFT architectures can also be used in other wideband digital systems. The FFT algorithm adopts two-dimensional signal processing, parallelism, and pipelining, which can be assessed jointly. The design includes a fixed-points unit and a variable-points unit to realize the configuration of the number of FFT points, which improves the flexibility of the system. From our analysis we conclude that miniaturizing the whole system eases the engineer's work, and that the performance meets the requirements for both signal processing capability and real-time operation.
