Abstract - A 1.08-Gb/s CMOS half-rate burst-mode clock and data recovery (BMCDR) circuit with a novel jitter reduction technique is presented. Several discrete delay values in the programmable delay circuit (PDC) of the edge detector can be selected by five addressing inputs to create a "dynamic average" delay equal to half the data period (T_bit/2), which minimizes jitter accumulation. A prototype chip was designed in TSMC 0.18-μm CMOS 1P6M technology. The CDR occupies a die area of 0.99 × 0.97 mm², and its power consumption is 36 mW from a 1.8-V supply.
I. INTRODUCTION
Many communication systems adopt a phase-locked loop (PLL) to recover clock and data signals because of its excellent jitter suppression performance. However, burst-mode applications such as passive optical network (PON) systems require fast-locking clock and data recovery (CDR). Conventional PLL-based CDR circuits cannot be used in this application because of their long settling time.
A number of fast clock-acquisition techniques for burst-mode clock recovery have been proposed [1, 2]. Among them, the gated-oscillator-based CDR approach [3]-[5] provides instantaneous locking and has a simple structure. It is especially attractive for burst-mode applications such as local area networks (LANs) and PON [4]. However, the open-loop architecture used for fast locking may result in accumulated jitter that cannot be suppressed. When a long run of identical bits is encountered, the accumulated jitter can cause errors in data recovery.
In this paper, we propose a gated-oscillator-based burst-mode CDR with instantaneous locking and jitter reduction. The PLL in the CDR circuit operates at half the input data rate. With the proposed jitter reduction feature, the CDR circuit achieves lower jitter on the recovered clock edges.
II. GATED VCO BASED CLOCK RECOVERY
The block diagram of a traditional BMCDR architecture is shown in Fig. 1(a) [1]. The stoppable (gated) VCO, or GVCO, is the key to fast phase alignment.
The gating signal of the GVCO is produced by an edge detector that indicates whether a data transition occurs. The pulse width of the gating signal is critical, as shown in Fig. 1(b), because it determines the quality of the recovered clock. For the best CDR performance, the delay in the edge detector must be set to the optimum value of T_bit/2 [8]. As mentioned before, jitter generated in the open-loop path of BMCDR circuits cannot be suppressed effectively, so any mismatch between this delay and T_bit/2 results in jitter accumulation.
Jitter on the recovered clock is eliminated whenever a data transition occurs. However, when a long run of identical bits is encountered, the clock recovery circuit (CRC) operates as a free-running ring VCO. If the oscillator were noiseless, its zero crossings would be uniformly spaced in time. When noise, here the mismatch of T_bit/2, is injected into the loop, it causes phase fluctuations that give rise to errors in the zero-crossing times. These phase fluctuations build up in a ring VCO over time because the zero-crossing error introduced by each delay cell adds to all previous zero-crossing errors [9, 10]. As a result, the larger the mismatch, the worse the jitter performance. Fig. 2 shows how much jitter accumulates for a given mismatch of T_bit/2: the accumulation is dominated by two factors, the run length (or data density) of the input data and the amount of mismatch. A well-controlled half-data-period delay is therefore required. Unfortunately, with temperature and process variations it is impossible in practice to set the edge-detector delay exactly equal to T_bit/2. Recently published BMCDR papers adopt an inverter chain [2, 4, 5] or an external chip offering a digitally selectable step resolution [8] to minimize the mismatch of T_bit/2 in the edge detector. However, because of process variation, these methods may still leave a mismatch between T_bit/2 and the delay value, which again results in jitter accumulation in the recovered clock. To ensure minimum jitter accumulation, Section III therefore presents a "dynamic average" concept that produces the T_bit/2 delay precisely.
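The dependence on run length and mismatch can be illustrated with a deliberately simplified first-order model. This is an illustrative assumption, not the model used in the paper: each bit period without a rising edge is assumed to add a fixed timing error of `k` times the delay mismatch (`k` is a hypothetical scale factor), and a rising edge is assumed to re-align the gated oscillator and clear the error. The function name `accumulated_error_ps` is hypothetical.

```python
from typing import Sequence


def accumulated_error_ps(bits: Sequence[int], mismatch_ps: float, k: float = 1.0) -> float:
    """Worst-case timing error (ps) under a toy linear accumulation model.

    Hypothetical assumptions, for illustration only:
    - a rising edge (0 -> 1) re-aligns the gated oscillator and clears the error;
    - every other bit period adds k * |mismatch_ps| of timing error.
    """
    if not bits:
        return 0.0
    error = 0.0
    worst = 0.0
    prev = bits[0]
    for b in bits[1:]:
        if prev == 0 and b == 1:
            error = 0.0  # rising edge: phase re-aligned, error cleared
        else:
            error += k * abs(mismatch_ps)  # no rising edge: error builds up
        worst = max(worst, error)
        prev = b
    return worst


pattern = [0, 1, 1, 1, 1, 0, 1]  # four bit periods between the two rising edges
print(accumulated_error_ps(pattern, 10.0))  # 40.0
print(accumulated_error_ps(pattern, 20.0))  # 80.0
```

In this toy model the worst-case error scales with both the longest gap between rising edges and the mismatch, which mirrors the two factors identified in Fig. 2.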
III. METHODOLOGY FOR JITTER REDUCTION
As noted in [7], a mismatch of the half-data-period delay in the edge detector induces temporal phase errors on the recovered clock, and these errors are eliminated whenever a data transition occurs. When a long run of identical bits is encountered, the temporal phase errors accumulate and cause errors in data recovery. To reduce the phase error resulting from the mismatch, a programmable delay circuit (PDC), shown in Fig. 3 and controlled by five addressing inputs, is used in the edge detector to produce an accurate dynamic average delay of T_bit/2.
As shown in Table I, the PDC offers a digitally selectable resolution of approximately 77 ps. The PDC delay is set by the five addressing inputs (x0-x4). For example, setting x2 high adds a 308-ps delay to the previous delay. Therefore, the PDC delay can easily be adjusted through combinations of the five addressing inputs. However, if the PDC delay is fixed at one of the two values adjacent to T_bit/2, phase errors accumulate because of the mismatch whenever there is no data transition. To eliminate phase errors due to the difference between the PDC delay and T_bit/2, [8] switches periodically between the two adjacent values in equal proportion, so that the jitter does not exceed a certain value. However, the dynamic average of the two values does not necessarily fall on T_bit/2, as shown in Fig. 4(a), and the remaining mismatch also results in jitter accumulation. In this work, we propose selecting several values and weighting them so that their weighted average equals T_bit/2. Here, four values adjacent to the optimum value T_bit/2 are chosen, and the PDC delay is switched alternately among them to create a dynamic average that falls exactly on T_bit/2, as shown in Fig. 4(b). As a result, phase error accumulation can be eliminated and the recovered clock has lower jitter.
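The address decoding and the weighted-average idea can be sketched as follows. Table I is not reproduced here, so the sketch assumes binary weighting with a 77-ps LSB on x0, which is consistent with the example that x2 adds 308 ps (4 × 77 ps); `base_ps` stands for a hypothetical intrinsic delay whose value is not given. An "address" is taken to be the binary code x4...x0, and the weights (how often each address is selected per switching period) are illustrative values, not the ones used on the chip.

```python
from fractions import Fraction
from typing import Sequence

STEP_PS = 77.0  # approximate PDC resolution stated in the text


def pdc_delay_ps(x: Sequence[int], base_ps: float = 0.0) -> float:
    """PDC delay for addressing inputs x = [x0, x1, x2, x3, x4].

    Assumes binary weighting: x_i adds 2**i * STEP_PS. base_ps is a
    hypothetical intrinsic delay (not given in the text).
    """
    if len(x) != 5:
        raise ValueError("expected five addressing inputs x0..x4")
    code = sum(bit << i for i, bit in enumerate(x))
    return base_ps + code * STEP_PS


def dynamic_average(addresses: Sequence[int], weights: Sequence[int]) -> Fraction:
    """Average address when addresses[i] is selected weights[i] times per period."""
    total = sum(weights)
    return Fraction(sum(a * w for a, w in zip(addresses, weights)), total)


print(pdc_delay_ps([0, 0, 1, 0, 0]))                         # 308.0 (x2 high)
print(dynamic_average([5, 6], [1, 1]))                       # 11/2
print(float(dynamic_average([4, 5, 6, 7], [2, 9, 37, 2])))   # 5.78
```

`Fraction` keeps the average exact, so a target such as 5.78 is represented without floating-point rounding. Equal weighting of two adjacent values can only reach a half step, whereas weighting four values gives much finer control over where the average lands.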
IV. CIRCUIT DESCRIPTION
The architecture of the proposed burst-mode CDR circuit is illustrated in Fig. 5(a). The CDR circuit comprises a clock recovery circuit (CRC), a decision circuit, and a PLL. The two GVCOs are controlled by a control voltage generated by the PLL. The GVCO generates the recovered clock signal, which samples the input data and runs at the same frequency as the VCO in the PLL.
The edge detector used here consists of a PDC and two NMOS transistors, as shown in Fig. 5(b). It extracts pseudo-return-to-zero (PRZ) components from the input NRZ data stream, from which the CRC generates the recovered clock signal. With this rising-edge detector, the data can be sampled at the optimal point instantaneously. The half-rate clocking scheme relaxes the device speed requirement of the CDR circuit.
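For the 1.08-Gb/s rate used here, the half-rate clock frequency and the target edge-detector delay follow from simple arithmetic, sketched below (the helper name `half_rate_timing` is hypothetical).

```python
def half_rate_timing(data_rate_bps: float) -> dict:
    """Bit period, target T_bit/2 delay, and half-rate clock frequency."""
    t_bit_ps = 1e12 / data_rate_bps
    return {
        "t_bit_ps": t_bit_ps,               # one bit period
        "half_bit_ps": t_bit_ps / 2,        # ideal edge-detector delay, T_bit/2
        "clock_mhz": data_rate_bps / 2e6,   # half-rate: one clock period per two bits
    }


timing = half_rate_timing(1.08e9)
print(round(timing["t_bit_ps"], 1))     # 925.9
print(round(timing["half_bit_ps"], 1))  # 463.0
print(timing["clock_mhz"])              # 540.0
```

The 540-MHz result matches the clock frequency at which the recovered clock jitter is reported in Section V.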
When input data is applied to the CDR, the rising-edge detector (RED) generates an output pulse of width T_bit/2 at each input rising edge, forcing the oscillator into hold mode. The PDC creates an accurate "dynamic average" delay of T_bit/2 by combining several discrete delay values. In this way, the T_bit/2 delay in the edge detector is well controlled and the accumulated jitter is limited. When the input data stays high or low without a rising-edge transition, the CRC is in oscillation mode and acts as a free-running oscillator. When the CRC switches from hold mode to oscillation mode, the rising/falling edge of the recovered clock is instantly set to the middle of the data eye.
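The hold/oscillation behavior described above can be sketched as an idealized behavioral model: each 0-to-1 transition opens a hold window whose width equals the edge-detector delay, and the CRC free-runs elsewhere. The model ignores circuit delays and noise, the function names are hypothetical, and the round 1000-ps bit period in the example is chosen only for readability.

```python
from typing import List, Sequence, Tuple


def red_hold_windows(bits: Sequence[int], t_bit_ps: float,
                     delay_ps: float) -> List[Tuple[float, float]]:
    """Hold-mode windows (start_ps, end_ps) of an idealized rising-edge detector.

    A rising edge at the boundary between bit i-1 and bit i (time i * t_bit_ps)
    produces a pulse of width delay_ps that holds the gated oscillator;
    outside these windows the CRC is in oscillation mode.
    """
    windows = []
    for i in range(1, len(bits)):
        if bits[i - 1] == 0 and bits[i] == 1:
            start = i * t_bit_ps
            windows.append((start, start + delay_ps))
    return windows


def in_hold_mode(t_ps: float, windows: List[Tuple[float, float]]) -> bool:
    """True if time t_ps falls inside any hold window."""
    return any(start <= t_ps < end for start, end in windows)


w = red_hold_windows([0, 1, 1, 0, 1], t_bit_ps=1000.0, delay_ps=500.0)
print(w)                        # [(1000.0, 1500.0), (4000.0, 4500.0)]
print(in_hold_mode(1200.0, w))  # True
print(in_hold_mode(2500.0, w))  # False
```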
The CRC aligns the phase of its recovered clock to that of the incoming burst-mode data. The recovered clock drives a decision circuit that retimes the data into two sequences, D_out1 and D_out2. Fig. 5(c) shows a conceptual timing chart of the CDR.
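At the logical level, half-rate retiming splits the serial stream into two half-rate sequences. The sketch below assumes that even-indexed bits are sampled on rising edges of the recovered clock and go to D_out1, and odd-indexed bits are sampled on falling edges and go to D_out2; which edge feeds which output is an assumption, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def half_rate_demux(bits: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Split a serial bit stream into two half-rate outputs.

    Assumed mapping: even-indexed bits -> D_out1 (rising-edge samples),
    odd-indexed bits -> D_out2 (falling-edge samples).
    """
    return list(bits[0::2]), list(bits[1::2])


def interleave(d_out1: Sequence[int], d_out2: Sequence[int]) -> List[int]:
    """Re-serialize the two outputs (used to check the demux round trip)."""
    out = []
    for i in range(max(len(d_out1), len(d_out2))):
        if i < len(d_out1):
            out.append(d_out1[i])
        if i < len(d_out2):
            out.append(d_out2[i])
    return out


d1, d2 = half_rate_demux([1, 0, 1, 1, 0, 0])
print(d1, d2)  # [1, 1, 0] [0, 1, 0]
```

Each output carries half of the 1.08-Gb/s stream, which is why the decision circuit can run from the 540-MHz clock.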
V. PROTOTYPE CHIP AND EXPERIMENTAL RESULTS
This burst-mode CDR circuit has been fabricated in a 0.18-μm 1P6M CMOS process. Fig. 6 shows the die photo of the proposed BMCDR. The total area is 0.99 × 0.97 mm². The die is bonded onto a two-layer FR4 board for measurement. The total power consumption of the chip is 36 mW from a 1.8-V supply.
After the PLL settles, a PRBS pattern is applied to the chip, and the recovered clock jitter is measured for the delay settings adjacent to T_bit/2. As discussed above, the smaller the mismatch between T_bit/2 and the delay value, the better the jitter performance. Based on the results, the four delay-line settings with the lowest clock jitter are addresses 4, 5, 6, and 7, with corresponding recovered clock jitter of 165.5 ps, 150 ps, 130 ps, and 175 ps, respectively, as Table II shows. If the chosen value is too far from T_bit/2, as with address 11 in this case, the recovered clock jitter becomes extremely large.
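The step from Table II to bracketing T_bit/2 between two addresses can be written as a small search. The scoring rule (the sum of the jitter of two adjacent addresses) is an illustrative heuristic, not a procedure from the paper; the jitter numbers are those listed for addresses 4 to 7.

```python
from typing import Dict, Tuple

# Peak-to-peak recovered clock jitter (ps) per PDC address, as listed in Table II.
TABLE_II = {4: 165.5, 5: 150.0, 6: 130.0, 7: 175.0}


def bracket_optimum(jitter_by_address: Dict[int, float]) -> Tuple[int, int]:
    """Adjacent address pair with the lowest combined jitter.

    Heuristic: T_bit/2 is assumed to lie between the two adjacent
    addresses whose measured jitter is jointly lowest.
    """
    addrs = sorted(jitter_by_address)
    pairs = [(a, b) for a, b in zip(addrs, addrs[1:]) if b == a + 1]
    if not pairs:
        raise ValueError("need at least two adjacent addresses")
    return min(pairs, key=lambda p: jitter_by_address[p[0]] + jitter_by_address[p[1]])


print(bracket_optimum(TABLE_II))  # (5, 6)
```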
Second, because addresses 5 and 6 give the lowest recovered clock jitter, T_bit/2 is inferred to lie between these two values. A random pattern with an average delay value of 5.5 is then applied to the chip; the measured peak-to-peak recovered clock jitter is 123.5 ps at 540 MHz. Next, two random patterns with average delay values of 5.333 and 5.667 are applied, and the measured peak-to-peak recovered clock jitter is 124.0 ps and 121.2 ps at 540 MHz, respectively. This suggests that T_bit/2 lies between addresses 5.667 and 6, so two random patterns with average delay values of 5.7 and 5.78 are applied, giving measured peak-to-peak recovered clock jitter of 116.5 ps and 114.3 ps at 540 MHz, respectively. Fig. 7 shows the jitter histogram of the recovered clock with the jitter reduction technique enabled; the measured peak-to-peak recovered clock jitter is 114.3 ps. Fig. 8 summarizes the relationship between the dynamic average value and the recovered clock jitter: in general, the smaller the mismatch between T_bit/2 and the dynamic average delay, the better the jitter performance. The measurement results are summarized in Table III, and Table IV compares the performance with previous works.

VI. CONCLUSION
This paper describes a 1.08-Gb/s 0.18-μm CMOS half-rate burst-mode CDR that integrates the programmable delay circuit into the burst-mode CDR chip to realize jitter reduction, in contrast to reference [8], which uses an external chip. The measurement results show that the smaller the mismatch from T_bit/2, the better the jitter performance. With the jitter reduction technique disabled, the measured recovered clock jitter is 130 ps; with it enabled, the measured recovered clock jitter is 114.3 ps, a reduction of 15.7 ps, or about 12.1% of the 130-ps baseline.
Once the probable position of T_bit/2 is found, a calibration pattern can be fed into the PDC to minimize the recovered clock jitter.
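A calibration pattern with a fractional average address can be generated in several ways. The measurements above use random patterns; the sketch below substitutes a deterministic first-order error-feedback (delta-sigma style) accumulator, which switches only between the two addresses bracketing the target rather than among four values. The function name and the 50-entry period are illustrative assumptions.

```python
import math
from fractions import Fraction
from typing import List


def switching_sequence(target: Fraction, length: int) -> List[int]:
    """Integer PDC addresses whose average over `length` entries tracks `target`.

    First-order error feedback: the fractional part of the target is
    accumulated each step, and the higher address is emitted whenever the
    accumulator reaches 1.
    """
    base = math.floor(target)
    frac = target - base
    acc = Fraction(0)
    seq = []
    for _ in range(length):
        acc += frac
        if acc >= 1:
            acc -= 1
            seq.append(base + 1)
        else:
            seq.append(base)
    return seq


seq = switching_sequence(Fraction(289, 50), 50)  # target average 5.78
print(seq.count(5), seq.count(6))                # 11 39
print(Fraction(sum(seq), len(seq)))              # 289/50
```

Because the accumulator stays in [0, 1), the running average after n entries differs from the target by less than 1/n of an address step. Whether such a deterministic pattern outperforms a random one in this circuit is not established by the measurements above.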
The chip size is 0.99 × 0.97 mm². The power consumption of the CDR is 36 mW from a 1.8-V supply.
