To satisfy the image registration requirement of a multi-band SAR system, a controlling unit with real-time phase difference detection was implemented in this paper. The phase difference detection was based on the DFT method combined with an undersampling technique. This paper also provides a comprehensive architecture of the time service recording and embedded subsystem in the controlling unit. Both ADC consistency and phase difference detection performance tests were conducted in the experiment to verify the phase difference resolution requirement of less than 10 ps.
Introduction
Multi-band SAR is an important trend in SAR imaging technology [1]. Comprehensive utilization of the information in multi-band SAR images requires accurate registration among different observations. Classical feature-based image registration mainly depends on targets or subjects in the image. However, the backscatter of different bands results in significant differences in texture, grey scale and contrast characteristics [2], which makes feature-based methods too time consuming to meet real-time, high-precision image registration requirements. Since the SAR system uses phase information to identify target objects, all the hardware is sourced from a common reference clock provided by the frequency generator (as shown in Fig. 1) for synchronization. In the airborne multi-band SAR system, each SAR is mounted on the same platform with a parallel flight path and is highly synchronized with the others. Recent multi-band SAR systems propose to detect the phase difference of the source clock at each band and to measure the residual motion error [3] in the system, which makes it possible to accomplish image registration along with the image reconstruction process.
The controlling unit was designed to detect the phase difference of the synchronous clock between bands. This parameter, together with the PRF (pulse repetition frequency) time data, is called metrology data and is used to aid image registration. In this paper, we propose a real-time phase difference detection method using an undersampling technique combined with the DFT algorithm. The undersampling technique considerably decreases the difficulty of hardware design. By achieving high phase consistency between ADC channels in hardware, the minimum phase difference that the FPGA can resolve is less than 10 ps. All the digital processing of the SAR controlling unit was implemented in a single Xilinx Kintex-7 325T FPGA.
Related Work
The measurement of phase difference is of great importance in electrical, navigation and radar applications [4]. Many phase difference measurement methods have been proposed, which fall mainly into four categories: classical zero-crossing based measurements, correlation based methods, modifications of the sine-wave fit algorithm, and DFT-based measurements.
The zero-crossing approach is based on detecting the instants at which both measured signals cross the zero level [5]. A universal counter is required to measure the interval between the two triggered signals, and nanosecond-level resolution has been the best achieved in practice. Besides, the zero-crossing method is sensitive to additive noise and to higher harmonic components in the signal. The correlation method [6] can be useful since there is little correlation between noise and signal, but coherent sampling is required. The method in [4] extended this approach to non-integer period sampling, which, however, increased the hardware complexity. The sine-wave fit technique [7] and its modifications are based on minimizing the least-square error (LSE) between the measured signal and an ideal sinusoid. This method has the best performance under non-coherent sampling and additive noise, but it is sensitive to harmonic distortion and its iterative process is time consuming. The benefit of the DFT-based method is that all processing is done in the digital domain, which is suitable for real-time processing in an FPGA or DSP. Also, DFT-based measurements are not sensitive to additive noise and harmonic distortion. However, spectrum leakage is inevitable under non-coherent sampling. There are two ways to minimize the spectrum leakage: applying a window in pre-processing or interpolating in the frequency domain [8]. Spectrum leakage does matter in phase estimation, but its effect can be diminished by the subtraction in phase difference detection, which we discuss in detail in the following sections.
Proposed Controlling Unit

System overview
In this paper, we propose a controlling unit with real-time phase difference detection based on the DFT, the block diagram of which is shown in Fig. 2. The design is divided into three modules: 1) The phase difference detection module. This module receives under-sampled source clock data from 8 ADC channels and then performs a 4096-point FFT (Fast Fourier Transform). Eight CORDIC cores for arc-tangent calculation are instantiated after the FFT to obtain the absolute phase of each ADC channel. 2) The PRF and time service module. A reference PRF is generated and distributed to six SAR subsystems for synchronization. In addition, the time service system is responsible for establishing the local time system and recording the time information when an echo PRF is received. 3) The embedded subsystem. A configurable soft processor provides auxiliary control and manages the DDR3 memory.
Phase difference detection module
This module used ISERDESE2 primitives and frame detection logic to interface with the LVDS outputs of the ADCs. The phase difference was detected by applying the FFT and arc-tangent calculation to the ADC raw data.
Phase difference detection algorithm based on DFT
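The phase difference relative to the reference can be estimated by the DFT as $\Delta\varphi = \varphi_2 - \varphi_1$, where $\varphi$ is the arc-tangent of the ratio of the imaginary part to the real part of the signal in the frequency domain. However, in many cases coherent sampling is hard to achieve, which means spectrum leakage becomes inevitable. The most common practice is to apply a window function before the DFT. Take the sampled sequence $x(n)$ with a window function $w(n)$ as follows:
$$x(n) = A\cos\left(2\pi \frac{f_0}{f_s} n + \theta\right) w(n) \tag{1}$$
where $n = 0, 1, \cdots, N-1$, $A$ is the amplitude, $\theta$ is the initial phase, $f_0$ is the signal frequency and $f_s$ is the sampling rate.
Its spectrum around $f_0$ can be expressed as:
$$X(f) = \frac{A}{2} e^{j\theta}\, W\left(\frac{2\pi (f - f_0)}{f_s}\right) \tag{2}$$
where $W(\omega) = \sum_{n=0}^{N-1} w(n) e^{-j\omega n}$ is the spectrum of the window.
Denote $\delta$ as the frequency deviation and $\Delta f = f_s / N$ as the frequency resolution, so that $f_0 = (k + \delta) \cdot \Delta f$, where $k = 0, 1, \cdots, N/2-1$, and assume that the negative frequency components are ignored. Then $X(k)$ can be written as:
$$X(k) = \frac{A}{2} e^{j\theta}\, W\left(-\frac{2\pi\delta}{N}\right) \tag{3}$$
For a real symmetric window, the phase of $W(-2\pi\delta/N)$ is $\delta\pi (N-1)/N \approx \delta\pi$. The phase of $X(k)$ can therefore be estimated as $\varphi = \theta + \delta\pi$, so the phase difference of two signals of the same frequency is the subtraction of the two phases:
$$\Delta\varphi = \varphi_2 - \varphi_1 = (\theta_2 + \delta\pi) - (\theta_1 + \delta\pi) = \theta_2 - \theta_1 \tag{4}$$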
Eq. (4) shows that the phase bias caused by the spectrum leakage can be eliminated by the subtraction. Although some research has pointed out that this would introduce phase noise, by applying a moving average filter to the phase difference results, the phase noise can be decreased.
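The cancellation stated in Eq. (4) can be checked numerically. The following is a minimal sketch, not the FPGA implementation: it assumes a Hann window, floating-point arithmetic, and a hypothetical phase offset equivalent to 10 ps at the 100 MHz clock. The helper names `dft_phase` and `wrap` are illustrative only.

```python
import numpy as np

def dft_phase(x, k):
    """Phase (rad) of DFT bin k of a Hann-windowed real sequence."""
    x = np.asarray(x, dtype=float)
    n = np.arange(x.size)
    w = np.hanning(x.size)  # symmetric window (assumed choice)
    X_k = np.sum(x * w * np.exp(-2j * np.pi * k * n / x.size))
    return np.angle(X_k)

def wrap(phi):
    """Wrap an angle to [-pi, pi)."""
    return (phi + np.pi) % (2 * np.pi) - np.pi

N, fs = 4096, 30e6          # FFT length and sample rate from the text
f0 = 10e6                   # aliased clock frequency, not on a bin centre
f_clk = 100e6               # clock frequency before undersampling
k = int(round(f0 * N / fs))         # nearest DFT bin
delta = f0 * N / fs - k             # fractional bin offset

theta_ref = 0.3                                         # arbitrary initial phase
theta_sig = theta_ref + 2 * np.pi * f_clk * 10e-12      # assumed 10 ps offset
n = np.arange(N)
x_ref = np.cos(2 * np.pi * f0 / fs * n + theta_ref)
x_sig = np.cos(2 * np.pi * f0 / fs * n + theta_sig)

# Each single-channel phase estimate carries the leakage bias (about delta*pi).
bias = wrap(dft_phase(x_ref, k) - theta_ref)
# The bias is common to both channels, so it drops out of the difference.
dphi = wrap(dft_phase(x_sig, k) - dft_phase(x_ref, k))
dt_ps = dphi / (2 * np.pi * f_clk) * 1e12
print(f"bias = {bias:.4f} rad, delta*pi = {delta * np.pi:.4f} rad, dt = {dt_ps:.3f} ps")
```

In this noise-free setting the single-channel estimate is biased by roughly $\delta\pi$, while the difference recovers the injected offset; with noisy data the moving average mentioned above would be applied to `dphi`.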
Undersampling technique
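Undersampling causes aliasing. The aliased frequency in the first Nyquist zone is determined by Eq. (5):
$$f_{alias} = \left| f_{in} - f_s \cdot \mathrm{round}\left(\frac{f_{in}}{f_s}\right) \right| \tag{5}$$
where $f_s$ is the sample rate and $f_{in}$ is the input signal frequency.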
Undersampling can be used in this design only because the sampled signal contains just the 100 MHz fundamental in its spectrum, so no spectral components overlap after aliasing. The undersampling technique behaves like a mixer or down converter in the receive chain. When the 8 channels of reference clocks are sampled simultaneously at 30 MHz, the 100 MHz signals appear at 10 MHz in the frequency domain without losing their relative phase difference information. This considerably decreases the difficulty of hardware implementation.
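The aliasing relation of Eq. (5) and the preservation of relative phase can be illustrated with a short simulation. This is a sketch under stated assumptions: ideal sampling, two hypothetical channel phases of 0.20 rad and 0.25 rad, and a record length of 3000 samples chosen so that the 10 MHz alias falls exactly on a DFT bin.

```python
import numpy as np

def alias_frequency(f_in, fs):
    """Aliased frequency in the first Nyquist zone, following Eq. (5)."""
    return abs(f_in - fs * round(f_in / fs))

fs, f_in = 30e6, 100e6
f_alias = alias_frequency(f_in, fs)

N = 3000                              # whole number of alias periods (assumed)
k = int(round(f_alias * N / fs))      # DFT bin of the aliased tone
n = np.arange(N)

thetas = [0.20, 0.25]                 # hypothetical channel phases (rad)
phases = []
for theta in thetas:
    # Sample the 100 MHz clock directly at 30 Msps (undersampling).
    x = np.cos(2 * np.pi * f_in * n / fs + theta)
    phases.append(np.angle(np.fft.fft(x)[k]))

dphi = phases[1] - phases[0]
print(f"alias = {f_alias / 1e6:.1f} MHz, measured phase difference = {dphi:.4f} rad")
```

Here 100 MHz lies in an odd Nyquist zone of the 30 Msps sampler, so the aliased tone keeps the sign of the phase; an input in an even zone would appear with inverted phase.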
FPGA implementation considerations
The 4096-point decimation-in-frequency (DIF) FFT core was developed using the Radix-4 butterfly architecture, and the FFT process was separated into three steps: load data, calculation and unload data. Re-arrangement logic puts the output frequency results in natural order. Each complex multiplication utilized 3 DSP48E1 slices. Block RAMs were used to store the intermediate results, and the 24-bit twiddle factors were stored in ROMs. Two switch matrices were placed before and after the Radix-4 butterfly to select the correctly positioned data for each calculation stage.
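The Radix-4 DIF decomposition can be sketched in software as follows. This is a floating-point, recursive illustration of the butterfly and twiddle structure only; it does not model the in-place memory scheme, switch matrices or 24-bit fixed-point twiddles of the core, and the function name `fft_radix4_dif` is hypothetical.

```python
import numpy as np

def fft_radix4_dif(x):
    """Recursive radix-4 decimation-in-frequency FFT (length must be a power of 4)."""
    x = np.asarray(x, dtype=complex)
    N = x.size
    if N & (N - 1) or (N.bit_length() - 1) % 2:
        raise ValueError("length must be a power of 4")
    if N == 1:
        return x
    q = N // 4
    a, b, c, d = x[:q], x[q:2 * q], x[2 * q:3 * q], x[3 * q:]
    W = np.exp(-2j * np.pi * np.arange(q) / N)   # twiddle factors W_N^n
    # Radix-4 butterfly followed by the twiddle multiplications.
    y0 = a + b + c + d
    y1 = (a - 1j * b - c + 1j * d) * W
    y2 = (a - b + c - d) * W ** 2
    y3 = (a + 1j * b - c - 1j * d) * W ** 3
    # Sub-transform r yields bins 4m + r; interleaving restores natural order.
    X = np.empty(N, dtype=complex)
    X[0::4] = fft_radix4_dif(y0)
    X[1::4] = fft_radix4_dif(y1)
    X[2::4] = fft_radix4_dif(y2)
    X[3::4] = fft_radix4_dif(y3)
    return X

# 4096 = 4**6, so six butterfly stages are needed.
tone = np.cos(2 * np.pi * 1365 * np.arange(4096) / 4096)
peak_bin = int(np.argmax(np.abs(fft_radix4_dif(tone))[:2048]))
```

The interleaved write-back plays the role of the re-arrangement logic: a DIF flow graph naturally produces digit-reversed output, which must be reordered before the bins are read out sequentially.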
The CORDIC core worked in vectoring mode to perform the arc-tangent function. That is, the real part and imaginary part of the FFT result are loaded as the initial CORDIC input coordinates (X, Y), and the initial rotation angle is set to Z = 0. By driving Y to 0 through shift-and-add rotations, the final Z is the arc-tangent result arctan(Y/X).
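The vectoring-mode iteration can be sketched as below. This is a floating-point model under stated assumptions: the hardware core uses fixed-point shifts and a small arc-tangent lookup table, and the quadrant pre-rotation shown here is one common way to extend the convergence range, not necessarily the one used in the core. The name `cordic_arctan` is illustrative.

```python
import math

def cordic_arctan(x, y, iterations=32):
    """Vectoring-mode CORDIC: drive y to 0 and accumulate the angle in z (rad)."""
    z = 0.0
    # Pre-rotate by +/-90 degrees so the vector starts in the right half plane.
    if x < 0:
        if y >= 0:
            x, y, z = y, -x, math.pi / 2
        else:
            x, y, z = -y, x, -math.pi / 2
    for i in range(iterations):
        shift = 2.0 ** -i                 # a right shift by i bits in hardware
        if y > 0:
            # Rotate clockwise by atan(2^-i) and add that angle to z.
            x, y, z = x + y * shift, y - x * shift, z + math.atan(shift)
        else:
            # Rotate counter-clockwise by atan(2^-i) and subtract it from z.
            x, y, z = x - y * shift, y + x * shift, z - math.atan(shift)
    return z

# Example: phase of the complex FFT bin 3 + 4j.
phase = cordic_arctan(3.0, 4.0)
```

Each iteration needs only shifts, additions and one table lookup, which is why the structure maps well onto FPGA fabric; the residual angle error shrinks by about one bit per iteration.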
PRF and time service module
Synchronization is paramount in the system. A common reference PRF ranging from 500 Hz to 7 kHz was generated in this module to allow all six SAR subsystems to transmit pulses simultaneously. When an echo PRF is received, the system needs to record the time information and the PRF count number. A local time system was established to provide time service using the GPS NMEA-formatted data package and the PPS (Pulse Per Second) signal. This time system was kept synchronous with GPS time and was able to maintain time service when the GPS signal was lost. A GPS data package, $GPGGA for example, provides the time of day (TOD) in hours, minutes and seconds. PPS is a digital signal that has a pulse on every second boundary. This signal is used to reset an internal clock counter in the local time system that counts the number of clock ticks since the UTC second boundary to indicate the time within the second. Fig. 3 shows the time service synchronization process.
The precision of the time service depends on two aspects: how close the PPS edge is to the actual UTC boundary, and the clock frequency error induced by the crystal oscillator. The GPS module has a precision within tens of nanoseconds, and with a crystal oscillator error of 100 ppm, this time service system can achieve a precision of 35 ns. The PPS signal may be lost or interfered with in a noisy electromagnetic environment. To overcome this, an update window that is valid for only 1 µs at the second boundary is applied when the system is synchronized with GPS. Another problem to consider is that GPS precision can vary due to many factors that cannot easily be measured, and these errors can change from one PPS to the next. A median filter is therefore applied to the clock counter to calibrate the actual count value when the PPS is lost or when software issues a calibration command.
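The time service logic described above can be sketched in software. This is one possible interpretation rather than the implemented logic: the median history depth, the tick values and the function names `parse_gga_time` and `calibrate_counts` are assumptions made for illustration, and the window is expressed in clock ticks.

```python
from collections import deque
from statistics import median

def parse_gga_time(sentence):
    """Extract (hours, minutes, seconds) from the UTC field of a $GPGGA sentence."""
    utc = sentence.split(",")[1]          # field 1 is hhmmss or hhmmss.ss
    return int(utc[0:2]), int(utc[2:4]), float(utc[4:])

def calibrate_counts(latched, nominal, window, history=5):
    """Estimate ticks per second from counter values latched at each PPS edge.

    latched: counter value at each expected PPS edge, or None if the PPS was lost.
    nominal: nominal ticks per second of the local clock.
    window:  half-width of the update window, in ticks.
    """
    recent = deque(maxlen=history)
    estimate = nominal
    estimates = []
    for count in latched:
        if count is not None and abs(count - estimate) <= window:
            recent.append(count)              # PPS edge inside the update window
            estimate = int(median(recent))    # median rejects single outliers
        # Otherwise the PPS is lost or spurious: hold the previous estimate.
        estimates.append(estimate)
    return estimates

tod = parse_gga_time("$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47")
# Hypothetical run: third PPS lost, fourth edge far outside the window.
ticks_per_second = calibrate_counts([1000, 1002, None, 1500, 1001], nominal=1000, window=5)
```

The time within the second for a recorded echo PRF would then be the latched tick count divided by the current estimate, which keeps timestamps usable through short PPS outages.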
Embedded subsystem
The embedded subsystem had two purposes: 1) control operations, which include parameter configuration, issuing instructions to the SAR subsystems, and power management; 2) DDR3 memory management, which supports data buffering in DDR3 and data transmission to the CPU board.
Xilinx's soft processor IP MicroBlaze was used to develop this embedded subsystem. As Fig. 2 depicts, the MicroBlaze processor and the DMA core act as bus masters. The block DMA was in charge of data transfer from the DDR3 memory to the PCI interface. The memory interface peripheral was connected directly to the external DDR3 memory on one side and provided a full AXI4 interface to the bus interconnect on the other side, which made it possible for the bus masters to access the external memory space. The 1 GB 32-bit DDR3 memory space was divided into 8 partitions to support multi-channel data buffering of phase data and PRF time packages. Each memory partition had a set of barrel read and write address pointers to constantly monitor the used buffer size. When the data gathered in a channel exceeded 1 MB, a DMA transfer was initiated.
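The pointer bookkeeping for one memory partition can be sketched as follows. This is a behavioural model under stated assumptions: the barrel pointers are taken to wrap around within the partition, the 1 GB space is assumed to be split into 8 equal 128 MB partitions, and the class and method names are hypothetical rather than taken from the firmware.

```python
class PartitionBuffer:
    """Wrap-around read/write pointers for one DDR3 partition."""

    def __init__(self, base, size, dma_threshold):
        self.base = base                    # start address of the partition
        self.size = size                    # partition size in bytes
        self.dma_threshold = dma_threshold  # start DMA above this fill level
        self.wr = 0                         # write pointer (offset)
        self.rd = 0                         # read pointer (offset)

    def used(self):
        return (self.wr - self.rd) % self.size

    def write(self, nbytes):
        """Advance the write pointer; return True when a DMA should start."""
        if self.used() + nbytes >= self.size:   # keep one byte free: full != empty
            raise OverflowError("partition overflow")
        self.wr = (self.wr + nbytes) % self.size
        return self.used() > self.dma_threshold

    def dma_descriptors(self):
        """Return (address, length) pairs covering all buffered data, then free it."""
        n = self.used()
        first = min(n, self.size - self.rd)     # contiguous run up to the wrap point
        descriptors = [(self.base + self.rd, first)]
        if n > first:
            descriptors.append((self.base, n - first))
        self.rd = (self.rd + n) % self.size
        return descriptors

MB = 1 << 20
channel0 = PartitionBuffer(base=0, size=128 * MB, dma_threshold=1 * MB)
start_dma = [channel0.write(512 * 1024), channel0.write(512 * 1024), channel0.write(1)]
```

In this model a transfer that crosses the end of the partition is split into two descriptors, so the DMA engine only ever sees contiguous address ranges.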
Besides, a user-defined bus slave peripheral with an AXI4-Lite interface was added to the system, and multiple accessible registers were implemented in it. All the control instructions and parameter configurations for the SAR subsystems were issued by writing these registers to drive the fabric logic outside the embedded subsystem. The interrupt controller manages all the incoming interrupts to the MicroBlaze. Every inbound data item to the embedded system is associated with a data valid signal. When data is ready, the valid signal triggers an interrupt to the processor, and at the same time the data is registered into the AXI slave. The processor responds to the corresponding interrupt request and fetches the data at the specific offset address of the AXI slave.
Experiment Results
Both ADC consistency tests and phase detection algorithm tests were conducted to verify the final technical specifications and the performance of the system.
ADC ENOB test
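One specification that can demonstrate an ADC's performance is the effective number of bits (ENOB). The ENOB was defined as:
$$\mathrm{ENOB} = \frac{\mathrm{SNR} - 1.76}{6.02}, \quad \text{where } \mathrm{SNR} = 10\log_{10}\left(\frac{P_s}{P_n}\right) \tag{6}$$
Fig. 4. ADC ENOB test result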
The powers of the signal ($P_s$) and noise ($P_n$) were calculated from the FFT of the ADC raw data, so coherent sampling was needed to avoid spectrum leakage. The raw data were captured from the FPGA through a logic analyzer. Fig. 4 shows the ENOB of the 8 ADC channels, which was tested under the following conditions: 30 Msps sampling and a 3.75 MHz input sine wave at -1 dBFS with a band-pass filter.
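The ENOB computation of Eq. (6) can be sketched as follows. This is a minimal illustration assuming coherent sampling with the tone on a single known bin and all remaining non-DC power counted as noise; the test vector is a hypothetical ideal 12-bit quantized sine, not measured data, and its bin is chosen odd so that the quantization error spreads across the spectrum.

```python
import numpy as np

def enob_from_samples(samples, signal_bin):
    """ENOB from the power spectrum of coherently sampled data, per Eq. (6)."""
    samples = np.asarray(samples, dtype=float)
    power = np.abs(np.fft.rfft(samples)) ** 2
    power[0] = 0.0            # exclude DC from the noise estimate
    power[-1] *= 0.5          # Nyquist bin has no mirror image (even length assumed)
    p_signal = power[signal_bin]
    p_noise = power.sum() - p_signal
    snr_db = 10 * np.log10(p_signal / p_noise)
    return (snr_db - 1.76) / 6.02

# Hypothetical test vector: ideal 12-bit quantizer, -1 dBFS sine on bin 511.
N, k, bits = 4096, 511, 12
n = np.arange(N)
amplitude = 10 ** (-1 / 20) * 2 ** (bits - 1)
codes = np.round(amplitude * np.sin(2 * np.pi * k * n / N + 0.1))
enob = enob_from_samples(codes, k)
print(f"ENOB of the ideal quantizer: {enob:.2f} bits")
```

Because the input is 1 dB below full scale, even an ideal 12-bit converter reports slightly less than 12 bits with this definition, which is worth keeping in mind when reading the measured figures.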
ADC consistency test
The phase difference resolution is closely related to the ADC consistency, so accurately measuring the inconsistency introduced by the hardware provides a good reference for data correction. The ADC consistency test covers DC bias, amplitude consistency and phase consistency. DC bias is defined as $\Delta_{DC} = \mathrm{mean}(x_2) - \mathrm{mean}(x_1)$, which reflects the DC characteristics between ADC channels. Amplitude consistency can be deduced from $\Delta A = 20\log_{10}(A_2 / A_1)$. An amplitude gain error decreases the usable dynamic range of the ADC. Phase consistency, or the relative phase delay under the same input, can have a large impact on digital signal processing, especially with the undersampling technique.
Paper [9] provides a comprehensive method for testing ADC consistency. We followed its test steps under the conditions of 30 Msps sampling and a 10 dBm, 100 MHz sine wave analog input. The final results, averaged over 64 groups of data, are shown in Table 1.
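The three consistency metrics can be computed from two channel records as sketched below. This is an illustration of the definitions only, not the procedure of [9]: it assumes coherent sampling, takes the amplitude and phase from a single DFT bin, and uses synthetic channels whose gain, offset and phase offset are hypothetical values. The function name `channel_consistency` is illustrative.

```python
import numpy as np

def channel_consistency(x1, x2, k, f_in):
    """Return (DC bias, amplitude ratio in dB, phase offset in ps) of x2 relative to x1."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    dc_bias = x2.mean() - x1.mean()
    X1 = np.fft.rfft(x1 - x1.mean())[k]
    X2 = np.fft.rfft(x2 - x2.mean())[k]
    amp_db = 20 * np.log10(abs(X2) / abs(X1))
    dphi = np.angle(X2 * np.conj(X1))            # wrapped phase of x2 minus x1
    offset_ps = dphi / (2 * np.pi * f_in) * 1e12 # phase expressed as time at f_in
    return dc_bias, amp_db, offset_ps

# Synthetic example: 100 MHz input seen as a 10 MHz alias at 30 Msps.
fs, f_in, N, k = 30e6, 100e6, 3000, 1000
n = np.arange(N)
phase_step = 2 * np.pi * f_in * 27.23e-12        # illustrative phase offset
ch1 = 1.0 * np.cos(2 * np.pi * k * n / N + 0.1)
ch2 = 1.1 * np.cos(2 * np.pi * k * n / N + 0.1 + phase_step) + 0.02
dc_bias, amp_db, offset_ps = channel_consistency(ch1, ch2, k, f_in)
```

Averaging these three values over many captured records, as done for Table 1, reduces the influence of noise on the correction terms.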
Phase difference detection performance
A testbench was designed to verify the functionality and performance of the FFT and CORDIC cores. Due to finite-precision arithmetic, noise is introduced during the transform. The error relative to a double-precision MATLAB simulation was 114 dB, with the RMS error averaged over five simulation runs. Both cores could run at a 200 MHz clock without timing violations. The FFT process had a latency of 72.35 µs, while the CORDIC had a latency of only 36 clock cycles, which satisfies the real-time data processing requirements. Table 2 gives the resource utilization report after place and route in the Vivado IDE.
Conclusion
The test results showed that the data sampling circuit achieved an effective number of bits of up to 11.2 and a phase delay down to 27.23 ps. These results provide a good reference for the final phase data error correction and compensation. With a total phase noise lower than 3 ps, this controlling unit had a phase difference resolution of less than 10 ps in practice.
