This paper presents a fully digital approach to pulse compression in chirp radars. The key idea is to cancel the quadratic phase term of the echo using a coordinate rotation digital computer (CORDIC). The approach has been implemented on a field-programmable gate array (FPGA), and the peak of the compressed output is 100 dB above the noise.
Introduction
The chirp (linear FM) waveform is the easiest waveform to generate in spread-spectrum systems. Pulse compression has traditionally been implemented with analog devices, e.g., surface acoustic wave (SAW) devices, whose impulse response is a quadratic phase function. With the development of digital processing devices, digital implementation of pulse compression has become preferable. The most popular approach is FFT + Multiply + IFFT, which exploits the fact that convolution in the time domain is equivalent to multiplication in the frequency domain [1, 2, 3, 4, 5, 6, 7, 8].
Despite the efficiency of the FFT and IFFT, this approach has some important drawbacks, especially when the time-bandwidth product TB (T is the pulse duration and B is the bandwidth) is large:
(1) The sample rate should be no less than 2B. It is preferable to sample the reference waveform at the same rate [4], so the frequency-domain version of the reference chirp requires storage of at least 2TB points.
(2) To accomplish the linear convolution of the echo and the reference waveform, the FFT length must be no less than the sum of their lengths minus one, which is usually achieved by padding zeros to the end of the sequences. The FFT length must therefore be at least 4TB − 1, and the same holds for the IFFT.
(3) Each multiplication between the complex FFT results requires four real multiplications.
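The three points above can be turned into numbers. The helper below is a back-of-the-envelope sketch (its name is hypothetical) that assumes real sampling at 2B, so the echo and the reference each span about 2TB samples. With the TB = 26285.5 reported in the Discussion, it reproduces the 105141-point minimum quoted there. That minimum rounds up to 131072 for a radix-2 FFT.

```python
import math


def fft_pulse_compression_cost(tb):
    """Rough resource figures for FFT + Multiply + IFFT pulse compression.

    Follows points (1)-(3): real sampling at 2B, so the echo and the reference
    each span about 2TB samples.
    """
    ref_points = math.ceil(2 * tb)              # stored frequency-domain reference (>= 2TB points)
    min_fft_len = 2 * ref_points - 1            # linear convolution length: 2TB + 2TB - 1 = 4TB - 1
    pow2_fft_len = 1 << (min_fft_len - 1).bit_length()  # next power of two for a radix-2 FFT
    real_mults = 4 * pow2_fft_len               # spectral product: 4 real multiplications per bin
    return ref_points, min_fft_len, pow2_fft_len, real_mults


print(fft_pulse_compression_cost(26285.5))  # (52571, 105141, 131072, 524288)
```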
The hardware complexity (memory size, computation and data rate) becomes significant as TB increases. We introduce another fully digital pulse compression implementation that overcomes these shortcomings and is built mainly around the CORDIC. CORDIC rotates a vector in a plane until it coincides with a target position by decomposing the rotation into a sequence of elementary rotations through predefined angles. It has two modes of operation, rotation mode and vectoring mode. We use the rotation mode here: the rotation angle is given, and the coordinates of the rotated vector are computed. The latency can be shortened by using redundant arithmetic [9] or the angle recoding algorithm [10]. When the number of iterations approaches the output word-length, the total error introduced by CORDIC is governed by the output word-length [11]. In our work, the internal word-length also benefits the pulse compression processing, as discussed later.
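The rotation mode described above can be illustrated with a small integer model in Python. This is a sketch and not the paper's pipelined FPGA design. The function name, the 24 iterations, the 28 fractional angle bits and the 16 guard bits are all assumptions; the guard bits stand in for a longer internal word-length. The quadrant pre-rotation and the final gain division are modelling conveniences. Inside the loop, only comparisons, shifts, additions and subtractions are used.

```python
import math


def cordic_rotate(x, y, theta, n_iter=24, frac_bits=28, guard_bits=16):
    """Rotate the integer vector (x, y) by theta radians with CORDIC in rotation mode.

    Inside the loop only comparisons, shifts, additions and subtractions are used.
    """
    theta = math.remainder(theta, 2 * math.pi)     # wrap into [-pi, pi]
    if theta > math.pi / 2:                         # pre-rotate by +pi/2: (x, y) -> (-y, x)
        x, y, theta = -y, x, theta - math.pi / 2
    elif theta < -math.pi / 2:                      # pre-rotate by -pi/2: (x, y) -> (y, -x)
        x, y, theta = y, -x, theta + math.pi / 2
    one = 1 << frac_bits
    atan_tab = [round(math.atan(2.0 ** -i) * one) for i in range(n_iter)]  # predefined angles
    x, y = x << guard_bits, y << guard_bits         # extra internal word-length
    z = round(theta * one)                          # residual angle in fixed point
    for i in range(n_iter):
        if z >= 0:                                  # elementary rotation by +atan(2^-i)
            x, y, z = x - (y >> i), y + (x >> i), z - atan_tab[i]
        else:                                       # elementary rotation by -atan(2^-i)
            x, y, z = x + (y >> i), y - (x >> i), z + atan_tab[i]
    # The CORDIC gain is a constant; hardware usually folds it into a fixed scale factor.
    gain = math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(n_iter))
    scale = gain * (1 << guard_bits)
    return round(x / scale), round(y / scale)


print(cordic_rotate(8192, 0, math.pi / 3))  # close to (4096, 7094)
```

Because the gain does not depend on the input, it costs nothing per sample in hardware once absorbed into a fixed scaling.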
The remainder of this paper is organized as follows: Section 2 introduces the proposed approach; Section 3 describes the hardware implementation and test; Section 4 discusses the test; finally, we conclude the paper.
Method
For simplicity, denote the transmitted signal (with chirp slope K) in base-band as
and the impulse response of the corresponding matched filter as
The echo can be written as a delayed replica of the transmitted signal
where τ is the delay and $f_c$ is the carrier frequency. The matched-filter output is obtained from the linear convolution of (2) and (3)
Consider what the matched filter does to the echo. The quadratic term $e^{-j\pi K\lambda^2}$ in the third line cancels the quadratic term $e^{j\pi K\lambda^2}$ of the echo in the second line, resulting in a single tone per point target. By mapping $f = Kt$ and $f_\tau = K\tau$, the integral in the fourth line is equivalent to analyzing the single tone with the Fourier transform (FT) when K is negative, or with the inverse FT (IFT) when K is positive. As a result, the output envelope is a delayed δ-function, and the peak location indicates the target range through R = cτ/2 (c is the speed of light).
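The cancellation can be written out under standard assumed forms, which are not necessarily identical to equations (1) to (4). Take the base-band chirp $s(t) = e^{j\pi K t^2}$, the matched filter $h(t) = s^*(-t)$, and the echo $s_\tau(t) = s(t-\tau)\,e^{-j2\pi f_c\tau}$. Expanding the exponent shows the quadratic terms in λ cancelling, which leaves a Fourier-type integral of a single tone:

```latex
\begin{aligned}
y(t) &= \int s_\tau(\lambda)\, h(t-\lambda)\, d\lambda
      = e^{-j2\pi f_c\tau} \int e^{j\pi K(\lambda-\tau)^2}\, e^{-j\pi K(t-\lambda)^2}\, d\lambda \\
     &= e^{-j2\pi f_c\tau}\, e^{j\pi K(\tau^2 - t^2)} \int e^{j2\pi K(t-\tau)\lambda}\, d\lambda \\
     &= e^{-j2\pi f_c\tau}\, e^{j\pi K(\tau^2 - t^2)} \int e^{-j2\pi f_\tau \lambda}\, e^{j2\pi f \lambda}\, d\lambda
      = e^{-j2\pi f_c\tau}\, e^{j\pi K(\tau^2 - t^2)}\, \delta(f - f_\tau),
      \qquad f = Kt,\ f_\tau = K\tau .
\end{aligned}
```

For K > 0, the kernel $e^{j2\pi f\lambda}$ is that of an inverse FT applied to the tone $e^{-j2\pi f_\tau\lambda}$. Since $\delta(K(t-\tau)) = \delta(t-\tau)/|K|$, the envelope peaks at $t = \tau$.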
If the transmitted waveform has finite duration, which is the case of particular interest, the output envelope of (4) has a sinc-like shape.
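As a worked case, assume a rectangular envelope of duration T (the pulse duration of the Introduction) and write $u = t - \tau$. The λ-integral then runs only over the overlap of the echo and the reference, which has length $T - |u|$. This gives the standard result:

```latex
|y(\tau + u)| = (T - |u|)\,\left|\operatorname{sinc}\!\big(K u\,(T - |u|)\big)\right|,
\qquad |u| \le T,\qquad \operatorname{sinc}(x) = \frac{\sin(\pi x)}{\pi x}.
```

Near the peak ($|u| \ll T$), this is approximately $T\,|\operatorname{sinc}(Bu)|$ with $B = KT$. The first null therefore lies at $u = 1/B$, which corresponds to the range resolution $c/(2B)$.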
Our idea is to decompose the real-valued echo $\mathrm{Re}\{s_{c,\tau}(\cdot)\}$ into two vectors, i.e., two rotators, in rectangular coordinates (unlike $s_\tau(\cdot)$, which is in base-band, $s_{c,\tau}(\cdot)$ includes the carrier)
where the sample period T is absorbed into n and $n_\tau$ (i.e., $n_\tau = \tau/T$). Denote the rotation performed by the CORDIC as $R(\theta)$
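In equation form, the decomposition and the rotation operator can be sketched as follows, where $s_{c,\tau}[n]$ denotes the sampled complex echo (notation assumed here). The real echo enters the CORDIC as the vector $(\mathrm{Re}\{s_{c,\tau}[n]\}, 0)$. In complex notation, $R(\theta)$ is multiplication by $e^{j\theta}$:

```latex
\mathrm{Re}\{s_{c,\tau}[n]\} = \tfrac{1}{2}\, s_{c,\tau}[n] + \tfrac{1}{2}\, s^{*}_{c,\tau}[n],
\qquad
R(\theta)\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
  \begin{bmatrix} x \\ y \end{bmatrix}
\;\Longleftrightarrow\; x + jy \mapsto e^{j\theta}(x + jy).
```

The two rotators $s_{c,\tau}$ and $s^*_{c,\tau}$ turn in opposite directions, and the matrix form makes $R(\alpha+\beta) = R(\alpha)R(\beta)$ immediate.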
We can rotate the echo in two steps: (1) through the angle $-2\pi f_0 n$, and (2) through the angle $-\pi \Delta f n^2$, where $f_0$ is the initial rotation frequency and $\Delta f$ is the delta rotation frequency. In a W-bit system, these values can be defined by the frequency tone word (FTW) and the delta frequency tone word (DFTW)
We set the FTW and DFTW as follows
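Equations (7) to (9) are not reproduced in this text. As a hedged illustration only, the generic DDS-style sketch below shows how an FTW and a DFTW generate the quadratic phase. The function names are hypothetical, and both accumulators are assumed to be W bits wide. The frequency word starts at FTW and grows by DFTW each clock, and the phase word accumulates the frequency word. The resulting angle is $2\pi\,\mathrm{phase}[n]/2^W = 2\pi(\mathrm{FTW}/2^W)\,n + \pi(\mathrm{DFTW}/2^W)(n^2 - n)$. In cycles per sample, this means $\Delta f = \mathrm{DFTW}/2^W$ and $f_0 = (\mathrm{FTW} - \mathrm{DFTW}/2)/2^W$.

```python
def accumulate_phase(ftw, dftw, n_samples, w_bits):
    """Phase words of a linearly swept rotation, built from two W-bit accumulators.

    The frequency word starts at FTW and grows by DFTW every clock; the phase word
    grows by the current frequency word.  Both wrap modulo 2**W.
    """
    mask = (1 << w_bits) - 1
    phase, freq = 0, ftw & mask
    phases = []
    for _ in range(n_samples):
        phases.append(phase)
        phase = (phase + freq) & mask    # phase advances by the current frequency word
        freq = (freq + dftw) & mask      # frequency word advances by DFTW every clock
    return phases


def closed_form_phase(ftw, dftw, n, w_bits):
    """Same phase word in closed form: n*FTW + DFTW*n*(n-1)/2 (mod 2**W)."""
    return (n * ftw + dftw * n * (n - 1) // 2) % (1 << w_bits)


# Hypothetical 16-bit example: the two descriptions agree sample by sample.
words = accumulate_phase(ftw=1234, dftw=7, n_samples=1000, w_bits=16)
assert all(words[n] == closed_form_phase(1234, 7, n, 16) for n in range(1000))
```

For the rotation through $-\theta[n]$, the negated (two's-complement) phase word would drive the CORDIC angle input.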
The first rotation converts $s_{c,\tau}(\cdot)$ to base-band, and the result is a sampled version of $s_\tau(\cdot)$ from (3); the second rotation cancels the quadratic phase term of that result. Since $R(\alpha + \beta) = R(\alpha)R(\beta)$, we combine the two rotations into one step
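For illustration, assume the sampled echo model $s_{c,\tau}[n] = e^{j[2\pi f_0 (n - n_\tau) + \pi\Delta f (n - n_\tau)^2]}$, with $f_0$ and $\Delta f$ in cycles per sample. The combined angle and its effect on the two rotators then work out as:

```latex
\begin{aligned}
\theta[n] &= -2\pi f_0 n - \pi\Delta f\, n^2, \\
e^{j\theta[n]}\, s_{c,\tau}[n] &= e^{\,j(\pi\Delta f\, n_\tau^2 - 2\pi f_0 n_\tau)}\; e^{-j2\pi \Delta f\, n_\tau n}, \\
e^{j\theta[n]}\, s^{*}_{c,\tau}[n] &= e^{-j\left[2\pi f_0 (2n - n_\tau) + \pi\Delta f\,\left(n^2 + (n - n_\tau)^2\right)\right]}.
\end{aligned}
```

The first result is a single tone at $-\Delta f\, n_\tau$ cycles per sample with a constant phase, so its frequency is proportional to the delay. The second has an instantaneous frequency of about $2f_0 + \Delta f(2n - n_\tau)$, so it sweeps twice as fast as the echo. This is the accelerated component that the decimator removes.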
After this rotation, $s_{c,\tau}(\cdot)$ is converted to a single tone, as shown in Fig. 1, while the angular velocity of $s^*_{c,\tau}(\cdot)$ is accelerated, as in (11.a). Because the frequency of the single tone is much smaller than the chirp bandwidth, the sequence can be decimated. A decimator consists of a low-pass filter and a switch (down-sampler), so the accelerated component is eliminated by the decimator. To extract the delay, an IFFT is applied (assuming K > 0), and the peak is located at $n = n_\tau$.
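The chain described above (one combined rotation, then decimation, then an inverse transform) can be checked numerically. The Python model below uses normalized, hypothetical parameters rather than the radar's, and the function names are illustrative. Complex multiplication by `cmath.exp` stands in for the CORDIC, the decimator is a plain block average, and a naive inverse DFT replaces the IFFT. With a swept bandwidth of BW cycles/sample over N samples, the dechirped tone lands in bin BW·n_τ, which is bin 80 for the values chosen.

```python
import cmath
import math


def chirp_phase(n, f0, df):
    """Phase in radians of a sampled chirp: 2*pi*f0*n + pi*df*n**2 (normalized units)."""
    return 2 * math.pi * f0 * n + math.pi * df * n * n


def dechirp_decimate_ifft(x, f0, df, factor):
    """Rotate, decimate and inverse-DFT a real echo (a floating-point model, not FPGA code)."""
    # One combined rotation by -(2*pi*f0*n + pi*df*n^2); cmath.exp stands in for CORDIC R(theta).
    rotated = [x[n] * cmath.exp(-1j * chirp_phase(n, f0, df)) for n in range(len(x))]
    # Decimator: block average (a crude low-pass filter) followed by keeping one sample per block.
    m_len = len(rotated) // factor
    dec = [sum(rotated[m * factor:(m + 1) * factor]) / factor for m in range(m_len)]
    # Naive inverse DFT (K > 0), kernel exp(+j*2*pi*k*m/M); an IFFT would be used in practice.
    return [sum(dec[m] * cmath.exp(2j * math.pi * k * m / m_len) for m in range(m_len)) / m_len
            for k in range(m_len)]


# Hypothetical parameters in cycles/sample: 8192 samples, 0.1 start frequency, 0.2 swept bandwidth.
N, FACTOR, F0, BW = 8192, 32, 0.1, 0.2
DF = BW / N                     # chirp slope in cycles/sample^2
N_TAU = 400                     # echo delay in samples
echo = [math.cos(chirp_phase(n - N_TAU, F0, DF)) if n >= N_TAU else 0.0 for n in range(N)]
spectrum = dechirp_decimate_ifft(echo, F0, DF, FACTOR)
peak_bin = max(range(len(spectrum)), key=lambda k: abs(spectrum[k]))
print(peak_bin)  # 80, i.e. BW * N_TAU
```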
In practice, the increase in rotation frequency between two consecutive clocks is often smaller than the frequency resolution (i.e., the increment corresponding to DFTW = 1 in (8)). The designer can solve this problem by increasing the internal word-length; another practical solution is to increase the rotation frequency only once every M clocks.
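The once-every-M-clocks option can be quantified. The per-clock frequency rise is K/fs Hz, and one LSB of a W-bit tuning word is fs/2^W Hz, so the rise is K·2^W/fs² LSB per clock. The sketch below evaluates this using the slope from Section 3 together with two assumed values: a 40 MHz sample clock and a 30-bit tuning word. The text states neither the sample rate nor the accumulator width. Under these assumptions, the rise is only about 0.1 LSB per clock, and adding DFTW = 1 every 10 clocks reproduces the stated slope. This is a consistency check, not a statement of the authors' settings.

```python
import math


def min_update_interval(k_hz_per_s, fs_hz, w_bits):
    """Smallest M so that the frequency rise accumulated over M clocks is >= 1 tuning-word LSB."""
    lsb_per_clock = k_hz_per_s * (1 << w_bits) / fs_hz ** 2   # fractional DFTW per clock
    return math.ceil(1.0 / lsb_per_clock), lsb_per_clock


def realized_slope(fs_hz, w_bits, m_clocks, dftw=1):
    """Chirp slope (Hz/s) obtained by adding DFTW to the tuning word once every M clocks."""
    return dftw * fs_hz ** 2 / ((1 << w_bits) * m_clocks)


# Assumed values: 40 MHz sample clock, 30-bit tuning word; slope taken from Section 3.
m, lsb = min_update_interval(149011.612, 40e6, 30)
print(m, round(lsb, 6))                      # 10 0.1
print(realized_slope(40e6, 30, m))           # about 149011.612 Hz/s
```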
Implementation and test

FPGA implementation
An FPGA is adopted here for its parallel architecture and implementation flexibility. The framework for the closed-loop test is shown in Fig. 3. The synchronous controller, the CORDIC and the decimator are implemented on the FPGA; the IFFT (or FFT) is performed on a personal computer (PC).
A fully pipelined architecture is employed for the CORDIC. The internal word-length is 30 bits, and the output is rounded to 14 bits. The decimator consists of three stages with decimation factors of 512, 8 and 8. In addition, windowing is applied to reduce the sidelobes.
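A three-stage decimator of this kind can be sketched as a cascade of "low-pass + switch" stages. The text does not specify the stage filters. The block average used here is an assumption chosen for brevity; a real design would typically use CIC or FIR stages with a proper stopband. The function names are hypothetical. The demonstration runs with small factors and checks only the total factor for (512, 8, 8).

```python
import math


def boxcar_decimate(x, factor):
    """One stage: average each block of `factor` samples (crude low-pass) and keep one value per block."""
    return [sum(x[k * factor:(k + 1) * factor]) / factor for k in range(len(x) // factor)]


def multistage_decimate(x, factors):
    """Cascade of stages, e.g. factors = (512, 8, 8) for a total factor of 32768."""
    for factor in factors:
        x = boxcar_decimate(x, factor)
    return x


print(math.prod((512, 8, 8)))                                              # 32768
print(multistage_decimate([1.0] * 64, (4, 2, 2)))                          # [1.0, 1.0, 1.0, 1.0]
print(multistage_decimate([(-1.0) ** n for n in range(64)], (4, 2, 2)))    # [0.0, 0.0, 0.0, 0.0]
```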
Test
The operating frequency range is 5.0 MHz to 5.06258462 MHz, and the sweep interval is 0.42 s. The range resolution is less than 2.5 km, and the slope is 149011.612 Hz/s. The DDS output level is −7.9 dBm, and the sweep delay is 400 μs. The result is shown in Fig. 4.
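The stated test parameters can be cross-checked with a few lines of arithmetic (the variable names are ours). The bandwidth follows from the frequency limits. From it come TB, the range resolution c/(2B), the range equivalent of the 400 μs sweep delay via R = cτ/2, and the dechirped tone frequency Kτ. The stated slope times the sweep interval reproduces B to within a few ppm.

```python
C = 299_792_458.0                         # speed of light, m/s
F_START, F_STOP = 5.0e6, 5.06258462e6     # operating band, Hz
T_SWEEP = 0.42                            # sweep interval, s
K = 149011.612                            # stated chirp slope, Hz/s
TAU = 400e-6                              # sweep delay, s

B = F_STOP - F_START                      # swept bandwidth: about 62584.62 Hz
TB = T_SWEEP * B                          # time-bandwidth product: about 26285.5
delta_R = C / (2 * B)                     # range resolution c/(2B): about 2395 m
R = C * TAU / 2                           # range for a 400 us delay: about 59958 m
f_tau = K * TAU                           # dechirped single-tone frequency: about 59.6 Hz

print(f"B={B:.2f} Hz  TB={TB:.1f}  dR={delta_R:.0f} m  R={R:.0f} m  f_tau={f_tau:.2f} Hz")
print(f"K*T/B - 1 = {K * T_SWEEP / B - 1:.1e}")
```

The roughly 60 Hz tone, about three orders of magnitude below the 62.6 kHz bandwidth, illustrates why heavy decimation is possible.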
Discussion
The IFFT (or FFT) length and the output data rate are reduced to 512 points per sweep, which is much smaller than TB (26285.5). No complex multiplication is needed. Only an FFT or an IFFT is required, rather than both as in the existing approach. No storage is needed for the reference waveform.
The most attractive features are the simplicity and flexibility of the hardware implementation. The main operations involved in CORDIC are comparison, shift, addition and subtraction, so it is hardware efficient. Four channels have been successfully implemented on one XC3S2000 (XILINX®).
If the FFT + Multiply + IFFT approach were employed, the FFT length would have to be at least 105141 (4TB − 1, as mentioned in the Introduction); even with an FFT length of 65536, the RAM utilization reaches 100% of the
