Abstract: This express presents a new architecture for a digital phase interpolation (PI) controller with clock and data loops. It greatly reduces the jitter of the recovered clock by lowering the probability of coarse-phase jumps, so that the clock is interpolated only among several fine phases. A demonstration design was implemented in 0.13 µm CMOS technology for verification. Simulation results show that the recovered clock has a peak-to-peak jitter of no more than 29 ps with 2.5 Gbps received data, which indicates that no coarse-phase dithering occurs. The area of the proposed PI controller is only 0.1 mm².
Introduction
High-speed serial links have gradually overtaken parallel links in modern communication systems [1]. High-speed serial transmission usually requires a clock and data recovery (CDR) circuit. CDR architectures can be classified according to the phase relationship between the received input data and the local clock at the receiver. A CDR based on a phase interpolator is commonly used when the recovered data rate is below 10 Gbps [2], and a growing number of hybrid PIs that benefit from digitalization have been presented [3, 4, 5, 6]. A block diagram of a conventional phase-interpolation-based CDR is shown in Fig. 1. The CDR circuit consists of delay elements, a phase interpolation core and a phase interpolation controller. The delay elements, implemented in either a PLL or a DLL, generate complementary multi-phase clocks, namely $CK_1$, $\overline{CK_1}$, …, $CK_N$ and $\overline{CK_N}$. Every clock cycle is thus divided into 2N phases, which are called coarse phases ψ in this express. The calibration state machine (CSM) selects a pair of adjacent coarse phases, which means that a span of 180°/N is interpolated. The phase steps produced by the PI delay cells are called fine phases φ in this express. The corresponding fine phases are selected by the bi-directional shift-registers (BDSR), which include W shift cells.
A coarse phase can be given as:
Therefore, the total recovery clock phase can be given as:
where ψ is 180°/N, and M, which comes from the W shift cells, means that M PI cells are selected, with K ≤ N and M ≤ W. Owing to the nonlinear binary phase detector (PD), data dithering, flip-flop metastability and quantization errors, the recovered clock may bang-bang among several fine phases, between two adjacent coarse phases, or among combinations of both. Worse still, the control path through the CSM and the additional N:2 MUX generally reaches the PI at a different time from the path through the BDSR module, because it is very difficult to match the delays of these two paths. The recovered clock may therefore jump between coarse phases with a comparatively high probability. When this happens, a large phase jitter, called the coarse-phase jitter, is generated [3, 5, 6, 7], which degrades the performance of the design.
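The phase expressions themselves are not reproduced in this text, so the following is only a minimal sketch of how the coarse and fine contributions might combine. It assumes an additive form (K coarse steps plus M fine steps) and a fine step of ψ/W; both are assumptions for illustration, and the function names are hypothetical.

```python
def coarse_step_deg(n_stages: int) -> float:
    """Coarse phase step psi = 180 deg / N, as stated in the text."""
    return 180.0 / n_stages


def recovered_phase_deg(k: int, m: int, n_stages: int, w_cells: int) -> float:
    """Assumed model: K coarse steps plus M fine steps.

    The fine step phi = psi / W is an assumption, not a value from the text.
    """
    if not (0 <= k <= n_stages and 0 <= m <= w_cells):
        raise ValueError("expected 0 <= K <= N and 0 <= M <= W")
    psi = coarse_step_deg(n_stages)
    phi = psi / w_cells  # assumed fine-phase step
    return k * psi + m * phi
```

With hypothetical values N = 4 and W = 8, one coarse step is 45° and, under the assumed fine step, K = 1 and M = 4 give 67.5°.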
This express proposes a novel architecture to resolve the problem described above. In this architecture, a Data Interpolator Loop (DIL) is employed to keep the recovered clock away from the coarse-phase boundary. The simulation results demonstrate that the new architecture greatly reduces the probability that large jitter occurs.
Topology description
As described in Section 1, the coarse-phase jitter is generated when the selected phases jump back and forth between two coarse phase edges, which also degrades the quality of the recovered data. This section first analyzes the cause of coarse-phase jitter and then presents the proposed architecture.
Cause of coarse-phase jitter
As shown in Fig. 1, the CSM chooses two adjacent differential signals, namely $\{CK_M, \overline{CK_M}, CK_{M+1}, \overline{CK_{M+1}};\ 1 \le M \le N-1\}$, and sends them to the phase interpolator module. After interpolation, four quadrature clock signals are sent to the PD module and compared with the data. The phase information, indicating whether the data leads or lags the clock, is then sent to the digital filter, whose output controls the BDSR module. The BDSR shifts one bit to the right when an E_F pulse is received, and one more fine phase is interpolated; it shifts one bit to the left when an L_F pulse arrives, and the PI decreases by one fine phase. When all BDSR controllers are at logic high, the module sends a carry pulse Incr; when all are at logic low, it generates a borrow pulse Dec. The generated {Incr, Dec} signals are fed into the CSM module. In this way, a negative feedback regulation mechanism is established. The coarse phase dithering is shown in Fig. 2(a) and (b).
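The shift-register behaviour described above can be sketched as a small behavioural model. This is an illustration under stated assumptions, not the authors' circuit: the register is treated as a thermometer code, a carry or borrow pulse is emitted only on the step in which the register reaches the all-high or all-low state, and the register saturates rather than re-initialising, since the text does not specify the pulse timing or the reload behaviour. The class name `BDSR` and its methods are hypothetical.

```python
class BDSR:
    """Behavioural sketch of a W-cell bi-directional shift register."""

    def __init__(self, width: int, ones: int = 0):
        self.width = width
        self.ones = ones  # number of cells currently at logic high

    def bits(self) -> str:
        """Register contents as a thermometer code, e.g. '1110'."""
        return "1" * self.ones + "0" * (self.width - self.ones)

    def step(self, e_f: bool, l_f: bool):
        """Apply one E_F (shift right) or L_F (shift left) pulse.

        Returns (incr, dec): carry and borrow pulses for this step.
        """
        prev = self.ones
        if e_f and not l_f:
            self.ones = min(self.width, self.ones + 1)  # add one fine phase
        elif l_f and not e_f:
            self.ones = max(0, self.ones - 1)  # remove one fine phase
        # Assumed pulse timing: fire only when the full/empty state is reached.
        incr = self.ones == self.width and prev != self.width
        dec = self.ones == 0 and prev != 0
        return incr, dec
```

In this sketch, a register sitting at the all-high state that receives alternating L_F and E_F pulses emits a new carry pulse on every return to the full state, which is the kind of boundary behaviour the text associates with coarse phase dithering.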
Let us define the total jitter as below:
where $T_f$ and $T_c$ represent the fine-phase and coarse-phase jitter, respectively. A case of coarse-phase jitter is described in Fig. 2. At a given instant, the W-bit controllers of the bi-directional shift-registers are all "ones" and the carry and borrow signals are all "zeros", as given below:
At the next instant, if a left-shift signal arrives, the state of the bi-directional shift-registers changes as follows:
At the following instant, however, if a right-shift signal arrives, the state of the bi-directional shift-registers changes as follows:
As mentioned above, a CDR based on a binary phase detector eventually bang-bangs among several phases rather than settling at one phase. If the above phenomenon continues, the jitter of the recovered clock therefore increases considerably. To resolve this problem, the patent [6] introduces a thermal code generator, and in the paper [3] Wei Xueming implemented redundant control logic. However, neither work analyzes the cause of coarse-phase jitter further, and both solutions focus on the clock path.
Proposed topology
The proposed topology is shown in Fig. 3. Compared with the conventional phase interpolator controller shown in Fig. 1, a Data Interpolator Loop (DIL), shown inside the box, is introduced to address the above-mentioned problem, and a simple finite state machine (FSM) replaces the complex CSM module. In addition, a clock management device (CMD) provides clocks of different frequencies to all the digital modules.
Fig. 4(a) and (b) illustrate an example of fine-phase jitter. When the CDR is locked, the carry or borrow signal {Incr, Dec} settles in one of the three states {{1, 0}, {0, 1}, {0, 0}}. In our analysis, we suppose that the signals are in the state {0, 0} and that the fine-phase controllers output M bits at the high value "1". The signal $C_k$ in Fig. 4(a) is one of the M-bit controllers. In contrast to Fig. 2, the final recovered clock swings only among several fine phases and avoids jumping between two coarse phases under certain conditions. Thus the total jitter can be reduced remarkably.
Circuits design
Improved PD
Bang-bang phase detectors are widely used in clock synchronization and data recovery, since they provide high gain, work at high speed and are less sensitive to process variation [8]. The half-rate Alexander configuration is one of the classical bang-bang PDs; it tracks a data signal and needs only half the clock frequency of a full-rate PD. The thesis [9] proposed a phase detector for a half-rate bang-bang CDR circuit, as shown in Fig. 5.
The phase detector uses four quadrature clocks to sample the data at 0°, 90°, 180° and 270°, and outputs the signals $D_0$, $D_{90}$, $D_{180}$ and $D_{270}$, respectively. Similar to a full-rate Alexander phase detector, the logic of the phase detector can be given as follows:
The phase detector adopts a completely symmetric design. However, when the PD operates at a very high bit rate, if the sum of the clock-to-output delay (CK-Q delay) of the D flip-flops and the delay of the XOR gates exceeds 1/4 clock cycle, an unpredicted glitch is generated in the instruction signals {UP, DN} [7, 10]. In other words, the PD generates wrong phase-difference instruction signals $\{UP_1, DN_1\}$ or $\{UP_2, DN_2\}$, which may deteriorate the CDR performance.
This express presents an improved PD, which eliminates the mismatch and also guarantees the reliability of the PD when it runs at very high speed. As shown in Fig. 6, four D flip-flops are added to resample $D_0$, $D_{90}$, $D_{180}$ and $D_{270}$, which ensures that the phase instruction signals $\{UP_1, DN_1\}$ occur at the same time, as do the signals $\{UP_2, DN_2\}$. Similarly, a selecting device includes four D flip-flops to sample the XOR output signals. The proposed PD therefore guarantees the timing sequence of all critical paths. According to the relationship between the input signal and the clock, when the clock lags the data, ideally the UP signal varies and the DN signal stays low. Suppose that the delay described in [7] occurs, with a delay time t = 300 ps (which exceeds 1/4 clock cycle, where one cycle period is 800 ps). Theoretical analysis shows that the half-rate Alexander PD produces glitches under this condition. Fig. 7 compares simulations of the half-rate Alexander PD and the improved PD, where the solid lines $\{UP_1, DN_1\}$ and $\{UP_2, DN_2\}$ are the ideal signals and the dashed lines are the actual signals: the conventional structure generates glitches, whereas the improved PD does not. It can be concluded that the improved structure prevents the generation of glitches.
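The quarter-cycle condition used in this example is simple arithmetic and can be checked with a short sketch. The helper names are hypothetical, and splitting the 300 ps total into a 200 ps CK-Q delay and a 100 ps XOR delay is an assumption made only for illustration; the text gives the total delay, not its breakdown.

```python
def quarter_cycle_margin_ps(clock_period_ps: float,
                            ck_q_delay_ps: float,
                            xor_delay_ps: float) -> float:
    """Slack between a quarter clock cycle and the CK-Q plus XOR delay."""
    return clock_period_ps / 4.0 - (ck_q_delay_ps + xor_delay_ps)


def may_glitch(clock_period_ps: float,
               ck_q_delay_ps: float,
               xor_delay_ps: float) -> bool:
    """True when the summed delay exceeds 1/4 clock cycle."""
    return quarter_cycle_margin_ps(clock_period_ps, ck_q_delay_ps, xor_delay_ps) < 0


# Values from the text: 800 ps period, 300 ps total delay.
# The 200 ps / 100 ps split is an illustrative assumption.
margin = quarter_cycle_margin_ps(800.0, 200.0, 100.0)
```

For the 800 ps period, a quarter cycle is 200 ps, so the 300 ps delay overshoots it by 100 ps and the conventional half-rate structure falls in the glitch-prone region described above.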
Finite state machine module
The FSM module consists of a pair of blocks: a Gray code counter and a decoder. Gray code is a binary numeral system in which two successive values differ in only one bit; it was originally designed to prevent spurious output from electromechanical switches. After the FSM module is initialized, the Gray code counter is set to {000}. The counter then receives the indicator signals {Incr, Dec} from the BDSR and changes one bit on their rising edge. If the signals {Incr, Dec} change from {0, 0} to {1, 0}, the counter adds one. If the signals change from {0, 0} to {0, 1}, the counter subtracts one. While the signals remain unchanged, the counter does not count. Note that the signals {Incr, Dec} cannot be {1, 1}, since the BDSR module avoids this state. The decoder is connected to the Gray code counter and produces the differential clock phase control signals according to the counter value. These signals are sent to the multiplexer to choose the phases of the PLL or DLL, and the chosen phases are then interpolated by the PI module. Table I lists the relation among the Gray code counter, the decoder sequence and the coarse phases; the complementary coarse phases are not listed.
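The counting rule above can be sketched as follows. This is a minimal behavioural model, not the authors' implementation: the 3-bit width follows the {000} reset value, while the modulo-8 wrap-around and the class name `GrayCounterFSM` are assumptions. The decoder of Table I is not modelled.

```python
def to_gray(n: int) -> int:
    """Standard binary-to-Gray conversion."""
    return n ^ (n >> 1)


class GrayCounterFSM:
    """Counts rising edges of Incr/Dec and exposes the value in Gray code."""

    def __init__(self, bits: int = 3):
        self.bits = bits
        self.count = 0        # binary position; reset state corresponds to {000}
        self.prev = (0, 0)    # previous {Incr, Dec} sample

    def step(self, incr: int, dec: int) -> str:
        if (incr, dec) == (1, 1):
            raise ValueError("{1, 1} is excluded by the BDSR module")
        modulus = 1 << self.bits  # assumed wrap-around
        if self.prev == (0, 0):
            if (incr, dec) == (1, 0):
                self.count = (self.count + 1) % modulus
            elif (incr, dec) == (0, 1):
                self.count = (self.count - 1) % modulus
        self.prev = (incr, dec)
        return format(to_gray(self.count), f"0{self.bits}b")
```

Because successive Gray codes differ in a single bit, including across the assumed wrap from the last state back to {000}, only one control bit toggles per coarse-phase step in this sketch.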
Data interpolator module
When the BDSR sends a carry or borrow signal, the coarse phase shifts by π/N radian. If the coarse phases have gone through a 2π phase rotation and the indicator signals continue to bang-bang after the reference clock is locked, we may conclude that the recovered clock has fallen onto the boundary of two coarse phases and cannot leave this state. The module then interpolates some delay into the data, which changes the initial phase difference between the reference clock and the received data and makes the recovered clock track the delayed data again. Based on the form of the signals {Incr, Dec} and the number of pulses, the module produces four control signals {CTR1, CTR2, CTR3, CTR4} to interpolate the corresponding delay into the data. The CDR goes through two periods while recovering the clock and data. In the first, the recovered clock tracks the phase of the data. In the second, when the CDR has stabilized, the recovered clock bang-bangs among several phases. The DIL must not operate during the first period, so that it does not affect normal operation, and two main methods have been adopted to ensure this. The first is that, in the design, the signal CTR1 takes effect about 5 µs later than the time at which the coarse phases have rotated by 2π after the reference clock (PLL or DLL) is locked. As described in Subsection 3.2, when a high pulse is generated on the signal Incr or Dec, the coarse phase rotates by π/N radian. Therefore it is easy to set a suitable starting time for the data delay control signals. The second is to set the time gaps between {CTR1, CTR2}, {CTR2, CTR3} and {CTR3, CTR4} longer than a full 2π coarse phase rotation, so that the loops have enough time to recover when the data phase changes. The DIL need not interpolate a very accurate delay into the data, because its main function is to prevent the CDR from falling into the dead zone; changing the initial condition allows the CDR to "jump" over it. In the example simulation waveform of Fig. 8, when the data interpolator control signals {CTR1, CTR2, CTR3} change from low to high, the corresponding delays {t1, t2, t3} are interpolated into the signal DATA_IN. In this case, the CDR stabilizes after the data has been delayed three times, so no further delay is needed and the signal CTR4 remains unchanged.
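The escalation of the control signals can be sketched as a pulse-counting model. This is a simplified illustration under stated assumptions: elapsed time is approximated by the number of Incr or Dec pulses seen after reference lock, with 2N pulses standing for one full 2π rotation, and the 5 µs guard time is not modelled. The class name `DataInterpolatorLoop` is hypothetical, and the default delay values are the ones reported for the demo design in the simulation section.

```python
class DataInterpolatorLoop:
    """Asserts CTR1..CTR4 one at a time while carry/borrow pulses persist."""

    def __init__(self, n_stages: int, delays_ps=(50, 70, 100, 130)):
        # Each pulse moves the coarse phase by pi/N, so 2N pulses span 2*pi.
        self.pulses_per_rotation = 2 * n_stages
        self.delays_ps = tuple(delays_ps)
        self.pulses = 0   # pulses counted since the last CTR change
        self.level = 0    # number of CTR signals currently asserted

    def ctr(self):
        """Current (CTR1, CTR2, CTR3, CTR4) values."""
        return tuple(1 if i < self.level else 0
                     for i in range(len(self.delays_ps)))

    def active_delays_ps(self):
        """Delay settings whose control signal is asserted."""
        return self.delays_ps[:self.level]

    def on_pulse(self):
        """Call once per Incr or Dec pulse observed after reference lock."""
        self.pulses += 1
        more_than_one_rotation = self.pulses > self.pulses_per_rotation
        if more_than_one_rotation and self.level < len(self.delays_ps):
            self.level += 1   # assert the next CTR signal
            self.pulses = 0   # leave a full rotation before the next one
        return self.ctr()
```

If the pulses stop because the CDR has stabilized, `on_pulse` is no longer called and the remaining control signals stay low, which mirrors the waveform example in which CTR4 remains unchanged.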
Simulation results
To verify the dual-loop PI controller architecture, a SerDes circuit has been implemented in a 0.13 µm CMOS process. The CDR is simulated with a 1.25 GHz reference clock provided by a PLL with four differential stages and with 2.5 Gbps received data, and various initial phase differences between the received data and the reference clock are studied. In the demo design, CTR1, CTR2, CTR3 and CTR4 control the interpolation of 50 ps, 70 ps, 100 ps and 130 ps of delay, respectively, into the received data.
The CDR system recovers the clock and data by comparing the phase difference between the reference clock and the data. However, the initial phase difference is uncertain, because the serial data streams are sent without an accompanying clock signal. As analyzed in the sections above, large dithering occurs when the CDR falls into the dead zone and jumps between coarse phases. To compare the presented CDR with the conventional structure, different initial phase differences are therefore studied. By stepping the initial phase in 50 ps increments and sweeping it over a full clock cycle, the cases of large dithering can be found. For example, suppose that the initial phase difference between the reference clock and the received data is 50 ps, and compare the result with that of a single-loop PI controller [11]. Fig. 10(a) and (b) plot the corresponding results. The simulation shows that the presented CDR causes only 28 ps of dithering, whereas the single-loop PI causes a peak-to-peak recovered clock jitter of 205.5 ps in the worst case, where coarse phase dithering evidently occurs. Furthermore, different initial phases over a full clock cycle are simulated in Fig. 11, where the x-coordinate represents the initial phase difference between the received data and the reference clock, and the y-coordinate represents the corresponding peak-to-peak jitter of the recovered clock. In Fig. 11, the dots denote the clock jitter of the single loop, and the triangles denote that of the dual loops. The figure shows that the maximum jitter is no more than 29 ps in our design, whereas for the single-loop PI there are four cases in which the peak-to-peak recovered clock jitter is about 200 ps under the worst conditions. As mentioned in this express, coarse phase dithering occurs in the CDR only when the system meets certain conditions. It can be concluded that the new architecture further reduces the probability of its occurrence.
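The sweep described above can be outlined in a few lines. This is only a sketch of the bookkeeping, not of the circuit simulation: the helper names are hypothetical, the 800 ps period follows from the 1.25 GHz reference clock, and peak-to-peak jitter is taken as the spread of the recovered clock edge deviations, which is an assumed definition.

```python
def sweep_offsets_ps(clock_period_ps: int = 800, step_ps: int = 50):
    """Initial phase differences covering one full clock cycle."""
    return list(range(0, clock_period_ps, step_ps))


def peak_to_peak_ps(edge_deviations_ps):
    """Peak-to-peak jitter as the spread of edge-time deviations."""
    if not edge_deviations_ps:
        raise ValueError("need at least one edge deviation")
    return max(edge_deviations_ps) - min(edge_deviations_ps)


# 16 sweep points: 0 ps, 50 ps, ..., 750 ps.
offsets = sweep_offsets_ps()
```

Each offset would be fed to a separate CDR simulation, and `peak_to_peak_ps` applied to the recovered clock edges of that run to obtain one point of the jitter-versus-offset plot.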
Conclusion
In conclusion, a dual-loop PI controller is presented in this express. It reduces the probability of coarse phase jumping, so that the recovered clock is interpolated only among several fine phases. The peak-to-peak jitter of the interpolated clock is thereby reduced. Since the proposed PI controller is based on digital logic operations, it can be used in many CDR architectures.
