Abstract-Two 10-Gb/s inductorless clock and data recovery (CDR) circuits using different gated digital-controlled oscillators (GDCOs) are presented. Digital frequency calibration is adopted to save power and chip area. Both circuits have been fabricated in a 0.18-μm CMOS process. Using the complementary gating technique, the first CDR circuit occupies an active area of 0.16 mm² and draws 36 mW from a 1.8-V supply. Its measured rms and peak-to-peak jitter are 8.5 ps and 42.7 ps, respectively. Using the quadrature gating technique, the second CDR circuit occupies an active area of 0.25 mm² and consumes 56 mW. Its measured rms and peak-to-peak jitter are 3.4 ps and 21.8 ps, respectively. The second CDR circuit consumes more power than the first, but its jitter is lower.
I. INTRODUCTION
THE phase-locked loop (PLL) is often used to realize the clock and data recovery (CDR) circuit. A PLL-based CDR circuit has a long settling time, which may not be acceptable for some applications. For example, passive optical networks (PONs) adopt the time division multiple access (TDMA) scheme to connect many optical network units (ONUs) with an optical line termination (OLT). The OLT has to receive sequential burst-mode data from the ONUs. As a result, the CDR circuit at the OLT must detect the input data within tens of bit times and recover the clock and data [1]. A conventional PLL-based CDR circuit can hardly settle within such a short time. Furthermore, a large number of serializers and deserializers (SerDes) are needed in high-speed data transceivers. To exchange data at several tens of Gb/s among several transceivers, CDR circuits with low power and small area are needed.
The open-loop CDR circuit [2]-[10] may provide a good solution to the above problems due to its wide bandwidth. Gated voltage-controlled oscillators (GVCOs) [2]-[10] are widely adopted in open-loop CDR circuits, especially at higher input data rates. Conventional GVCOs are usually controlled by an analog approach.
They can be roughly classified into two categories. One uses an edge detection circuit with a half bit-time delay line to trigger the GVCO [5], [6], [9]. The accuracy of the half bit-time delay line affects the jitter tolerance and bit-error rate. The other combines two GVCOs with a NOR gate [2]-[4] or a multiplexer [7] to realize a burst-mode CDR circuit. The complicated logic gates used in these GVCOs limit the bandwidth and consume a large amount of power. Moreover, the active area increases if passive inductors are needed [9]. For the GVCO-based CDR circuit in Fig. 1, a reference PLL generates a control voltage Vc to adjust both the main and replica GVCOs to the target frequency. The reference PLL is composed of a replica GVCO, a divider, a phase-frequency detector (PFD), and a charge pump (CP). The main GVCO then recovers the clock to retime the data. The open-loop CDR circuit in Fig. 1 works well for low-bit-rate applications, but its power and area increase severely when the data rate rises to several tens of gigabits per second. This is because the two GVCOs may need passive inductors and a lot of power to boost their bandwidth [9], [10].
In this paper, an open-loop CDR architecture with digital frequency calibration is presented. The gated digital-controlled oscillator (GDCO) with a digital frequency calibration loop (DFCL) is adopted. It saves a lot of power because no replica GDCO is needed. Since no analog loop filter in a reference PLL and no inductor are needed, it saves a considerable active area. Two CDR circuits using the complementary gating technique and quadrature gating technique are realized. The GDCO using the quadrature gating technique reduces the jitter. This paper is organized as follows. Section II introduces the two proposed GDCOs and their working principles. The proposed open-loop CDR circuit and its design considerations are given in Section III. The experimental results are given in Section IV and the conclusions are given in Section V. 
II. GATED DIGITAL-CONTROLLED OSCILLATORS
In this section, two inductorless half-rate GDCOs are introduced. The first one uses the complementary gating technique, which is modified from [10] . The second one uses the quadrature gating technique to improve the output jitter further.
A. The GDCO Using the Complementary Gating Technique
In Fig. 2(a), the first GDCO is composed of three gated multiplexers, M1-M3, and a 4-bit digital-controlled buffer. When the input data is high, the multiplexers M1 and M2 and the 4-bit digital-controlled buffer form an oscillator, and the multiplexer M3 outputs the clock B, which is the complement of the clock A. When the input data is low, the multiplexers M1 and M3 and the 4-bit digital-controlled buffer form another oscillator, and the multiplexer M2 outputs the clock A. Once the input data changes, the clock A or B tracks the data. Fig. 2(b)-(d) illustrates how the proposed GDCO adjusts its output phase when the clocks lag, lead, and lock to the data, respectively. In Fig. 2(b), when the clocks lag the data, the clocks A and B change their polarity before reaching the threshold voltage. This is equivalent to speeding up the clock C to compensate for the lagging phase. Similarly, in Fig. 2(c), when the clocks lead the data, the clocks A and B change after the threshold voltage to correct the phase. This is equivalent to slowing down the clock C to compensate for the leading phase. When the CDR circuit locks to the input data, the timing diagram is as shown in Fig. 2(d). In a conventional GVCO-based CDR circuit [2]-[7], a GVCO starts to oscillate when the input data is high, and stops and is latched when the input data is low. Serious amplitude variation occurs if the output is latched to the supply voltage or ground, which also slows down the oscillator. For the proposed GDCO in Fig. 2(a), the oscillating waveforms are never latched. Thus, the amplitude variation is reduced and the bandwidth requirement of the gated multiplexers is relaxed.
The building blocks of the proposed GDCO are shown in Fig. 3. The data-gated multiplexer M1 is realized in current-mode logic (CML). The data-gated multiplexers M2 and M3 are similar to M1, except that an additional cross-coupled pair is added. The inputs Data± in M1-M3 select one of the two differential inputs, in1± and in2±. The tail current source is also used to suppress the power supply noise. The reason for adding the cross-coupled pair is illustrated in Fig. 4. Because the two inputs of the multiplexers M2 and M3 are complementary, the condition they encounter at data transitions can be modeled as a differential pair with discontinuous inputs. As shown in Fig. 4, since Q1 experiences a higher voltage level than Q2, Q1 has a larger transconductance than Q2. This unbalances the amplitudes of out+ and out-, and thus increases the output jitter. Adding a cross-coupled pair (Q3 and Q4) makes the outputs more balanced, as shown in Fig. 4. To improve the frequency accuracy of the GDCO, a 4-bit digital-controlled buffer is used, which is also shown in Fig. 3. Based on simulation results, this 4-bit digital-controlled buffer provides a tuning range of 600 MHz around 5 GHz with a monotonic frequency step of no more than 60 MHz. The reason for choosing this frequency step is explained in the following section.
For a conventional GVCO [2], [3], [9], an edge detecting circuit and complex logic circuits, such as NAND or NOR gates, are needed. However, the bandwidth of CML NAND or CML NOR gates is much lower than that of CML multiplexers, because the cascoded or parallel transistors increase the parasitic capacitance of these gates. As shown in Fig. 3, the building blocks of the proposed GDCO are realized by CML multiplexers and CML differential pairs, which relaxes the bandwidth requirements for the high-speed GDCO.
B. The GDCO Using the Quadrature Gating Technique
The GDCO introduced in the previous section works well for a half-rate 10-Gb/s CDR circuit. However, its jitter performance can be improved further. As illustrated in Fig. 2(a)-(d), the previous GDCO selects the complementary signals from the multiplexers M1-M3. As a result, the GDCO is always disturbed at its zero crossings whenever a data transition occurs, which results in larger jitter.
To solve this problem, a modified GDCO using the quadrature gating technique is shown in Fig. 5(a) . For the GDCO in Fig. 5(a) , the data-gated multiplexers, M1-M3, are used, but the extra digital-controlled multiplexers, M4-M7, are inserted to generate the quadrature signals. This GDCO is gated by selecting the quadrature signals from the inputs of the multiplexers, M2 and M3. When the input data is high, the multiplexers, M1, M2, M4, and M5, form an oscillator. M6 and M7 are added to generate the replica quadrature signals.
In addition, M3 outputs the clock B, which is delayed by approximately 90° from the clock A. Note that the quadrature signals could be taken from M1, but this would introduce too much loading on M1's outputs. On the other hand, when the input data is low, the multiplexers M1, M3, M4, and M5 form an oscillator and M2 outputs the clock A, which is also delayed by approximately 90° from the clock B. Once the input data changes, the clock A or B tracks the data. Fig. 5(b)-(d) illustrates how the second GDCO adjusts its output phase when the clocks lag, lead, and lock to the data, respectively. As shown in Fig. 5(b), when the clocks lag the data, the clock A changes at a lower voltage than the threshold voltage. It slows the clock C down more rapidly. This is equivalent to speeding up the clock to compensate for the lagging phase. Similarly, in Fig. 5(c), when the clocks lead the data, the clock A changes at a higher voltage than the threshold voltage so as to correct the phase. When the CDR circuit locks to the input data, the timing diagram is as shown in Fig. 5(d). For the GDCO in Fig. 5(a), the oscillating waveforms are never latched either. Thus, the inter-symbol interference (ISI) due to the amplitude variations is reduced.
The building blocks of the second GDCO are shown in Fig. 6. The multiplexers M1-M3 are implemented with CML circuits, and cross-coupled pairs are added to balance the amplitude. The multiplexers M4-M7 are realized in CML with 4-bit digital tuning. To operate at 5 GHz, the frequency tuning is applied only to the multiplexers M4-M7 in order to maximize the bandwidth of the GDCO. The multiplexers M4-M7 also provide a tuning range of 600 MHz around 5 GHz with a monotonic frequency step of no more than 60 MHz. As shown in Fig. 5(d), the second proposed GDCO selects the quadrature signals from the inputs of the multiplexers. The output waveform of the GDCO is therefore disturbed only at its highest or lowest position, where it is less sensitive to noise.
To evaluate these effects, the following simulation results are given. The first proposed GDCO is simulated with a 10-Gb/s jitter-free data with 30% transition time. The simulated eye diagram of the recovered data is shown in Fig. 7(a) . The resulting peak-to-peak jitter is 0.06 UI. The second GDCO is also simulated with the same input data. The simulated eye diagram of the recovered data is shown in Fig. 7(b) . The resulting peak-to-peak jitter is 0.03 UI, which is only 50% of the first one.
III. PROPOSED OPEN-LOOP CDR ARCHITECTURE

A. The Open-Loop CDR Circuit With Digital Frequency Calibration
The proposed open-loop CDR architecture using the GDCOs is shown in Fig. 8. It has two operation modes: the frequency calibration mode and the data-recovering mode. The proposed CDR circuit is composed of a 4-bit digital-controlled GDCO, a DFCL, and D-flip-flops (DFFs). Initially, the CDR circuit operates in the frequency calibration mode. In this mode, the gating input of the GDCO is connected to a reference voltage, so the GDCO behaves like a traditional digital-controlled oscillator. Meanwhile, the de-multiplexing DFFs, DFF3 and DFF4, are turned off to alleviate the loading on the GDCO's output. The frequency of the GDCO is then divided by 8 and compared with a reference clock of 625 MHz by a conventional frequency detector (FD). The frequency calibration procedure is realized by a 4-bit successive approximation register (SAR) controller. The clock for this SAR controller is 1.22 MHz, obtained by dividing the 625-MHz reference clock by 512. The frequency calibration is accomplished within 4 μs. Once the frequency calibration mode is completed, all the digital blocks in the DFCL are turned off, except that the output digital codes of the SAR controller are retained. This minimizes the digital noise and power consumption. The CDR circuit then operates in the data-recovering mode, in which the GDCO is connected to the input data. The de-multiplexing DFFs, DFF3 and DFF4, are turned on to recover the input data into two 5-Gb/s data streams. Note that the four DFFs, DFF1-DFF4, are realized in CML, so the output loading of the GDCO is maintained in both operation modes. In these two prototype chips, the operation modes are switched manually.
In Fig. 8, the GDCO is interrupted whenever the DFCL works. To track the on-chip fluctuations of temperature and power supply automatically, this architecture can be modified slightly with an additional dummy GDCO, as shown in Fig. 9. A dummy GDCO is inserted to refresh the digital control codes periodically, so the DFCL calibrates the frequency without interrupting the main GDCO. However, a frequency offset between the GDCO and the dummy one still exists, and appropriate layout is used to reduce this effect. Note that the power consumption is not increased much, because the DFCL need not be turned on continuously. The chip area is not increased much either, because the GDCOs do not need passive inductors. Denote the power consumption of the GDCO and data-recovering DFFs as $P_{\mathrm{CDR}}$ and that of the DFCL as $P_{\mathrm{DFCL}}$. Also let the DFCL's refreshing cycle be $T_{\mathrm{refresh}}$ and let the DFCL take $T_{\mathrm{cal}}$ to accomplish the frequency calibration procedure. Then the total power consumption, $P_{\mathrm{total}}$, can be written as
$P_{\mathrm{total}} = P_{\mathrm{CDR}} + P_{\mathrm{DFCL}} \cdot T_{\mathrm{cal}} / T_{\mathrm{refresh}}$ (1)
Because the DFCL takes only about 4 μs to accomplish the frequency calibration procedure, the average power consumption of the DFCL, $P_{\mathrm{DFCL}} \cdot T_{\mathrm{cal}} / T_{\mathrm{refresh}}$, can be reduced greatly by choosing a moderate refreshing period. For example, if the refreshing rate is 10 kHz (i.e., $T_{\mathrm{refresh}}$ = 100 μs), the power consumption of the DFCL is reduced by 96%.
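The duty-cycling argument above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the DFCL is fully powered for the 4-μs calibration time once per refresh period and off otherwise. The function names are illustrative, and the DFCL power of 24 mW is a hypothetical value used only to show the calculation, not a figure reported in this paper.

```python
def dfcl_duty_cycle(t_cal_s: float, f_refresh_hz: float) -> float:
    """Fraction of time the DFCL is powered when it runs once per refresh period."""
    duty = t_cal_s * f_refresh_hz
    if not 0.0 <= duty <= 1.0:
        raise ValueError("calibration time must not exceed the refresh period")
    return duty


def average_total_power(p_main_w: float, p_dfcl_w: float,
                        t_cal_s: float, f_refresh_hz: float) -> float:
    """Main-path power plus the duty-cycled DFCL power."""
    return p_main_w + p_dfcl_w * dfcl_duty_cycle(t_cal_s, f_refresh_hz)


# 4-us calibration repeated at a 10-kHz refresh rate (values from the text).
duty = dfcl_duty_cycle(t_cal_s=4e-6, f_refresh_hz=10e3)
dfcl_saving = 1.0 - duty  # fraction of DFCL power saved versus an always-on DFCL

# p_dfcl_w is a HYPOTHETICAL DFCL power, chosen only for illustration.
p_avg = average_total_power(p_main_w=36e-3, p_dfcl_w=24e-3,
                            t_cal_s=4e-6, f_refresh_hz=10e3)
print(f"duty = {duty:.2%}, DFCL saving = {dfcl_saving:.0%}, "
      f"average power = {p_avg * 1e3:.2f} mW")
```

With these inputs the DFCL is active 4% of the time, which corresponds to the 96% reduction quoted in the text.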
B. The Tolerable Frequency Deviating Range for the GDCO
In Section II, the two GDCOs have a tuning range of 600 MHz around 5 GHz with a monotonic frequency step of no more than 60 MHz. We must therefore decide whether a frequency step of 60 MHz suffices for error-free data recovery in the presence of sampling offset and input data with modulated jitter. Because the DFF may sample incorrectly in the vicinity of data transitions, all the sampling non-idealities, such as the data transition time and sampling offset, are lumped into a single term expressed relative to the bit time, for a PRBS input and a half-rate sampling clock. Assume that the GDCO corrects its phase error whenever it experiences a data transition. As illustrated in Fig. 10, if the period of the GDCO is slightly longer than two bit times, the sampling edge deviates from the ideal sampling point and may cause incorrect sampling over a run of consecutive identical digits (CID). The accumulated deviation of the sampling time is given as (2). Furthermore, at the end of the CID, the maximum allowable deviated sampling time is given as (3).
For a PRBS with maximum CID and correct sampling, the accumulated deviation of the sampling time at the end of the CID should satisfy (4). Substituting (2) and (3) into (4) gives the upper bound for the period of the GDCO, (5). The condition in which the period of the GDCO is slightly shorter than two bit times can be derived in the same manner. Thus, the allowable period of the GDCO is given as (6).
The above analysis assumes jitter-free data and may be too optimistic. In practice, jitter at different modulation frequencies may reduce the maximum allowable deviated sampling time in (3). A jitter tolerance mask usually specifies the amount of jitter (in UIpp) that a system should tolerate at a given modulation frequency. Here we take the jitter tolerance mask of OC-192 as the design target. The data phases at the start and at the end of the CIDs are expressed as (7) and (8), respectively. Since data transitions occur at discrete times, these phases depend on the amplitude of the jitter modulation (in UI), the jitter modulation frequency of the input data (in rad/s), and the time index m. The maximum allowable deviated sampling time in (3) is then redefined as (9), which can be further written as (10). Considering the worst case, (10) can be simplified as (11). Eq. (11) indicates that the sampling margin decreases as the amplitude and the frequency of the jitter modulation go up. Substituting (11) into (4)-(6), the allowable period of the GDCO is rewritten as (12). For the jitter tolerance mask of OC-192, the worst-case jitter amplitude (in UI) and modulation frequency (in MHz) are taken from the mask. Hence, the maximum tolerable frequency offset versus the length of the PRBS can be calculated according to (12). The results are plotted in Fig. 11 for parameter values of 0.3, 0.5, and 0.7, respectively. For CDR circuits utilizing GVCOs or GDCOs, run-length limiting codes such as 8B/10B coding are usually adopted [7], [8], [10].
Hence, a PRBS is sufficient for normal operation. Assuming the sampling margin is only 30% of the input bit time, the maximum tolerable frequency offset is 100 MHz according to Fig. 11. As a result, a frequency step of less than 60 MHz is chosen for the GDCOs to ensure correct data reception.
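To make the trade-off between run length and tolerable frequency offset concrete, the sketch below evaluates a simplified jitter-free bound. It is an illustration under stated assumptions, not a reproduction of (2)-(12). It assumes that the phase error is reset at every data transition, that the sampling point then drifts by the difference between half the GDCO period and one bit time for each bit, and that sampling stays correct while the accumulated drift is within half of the sampling margin. The function name, the parameter names, and the run lengths are all hypothetical choices, and the jitter terms of the full analysis are neglected.

```python
def max_freq_offset_hz(f_osc_hz: float, margin_fraction: float, cid_length: int) -> float:
    """Jitter-free bound on the tolerable |frequency offset| (ASSUMED model).

    Assumed condition:  cid_length * |T_osc/2 - T_b| <= margin_fraction * T_b / 2
    which, for small offsets, gives |df| / f ~= margin_fraction / (2 * cid_length).
    """
    if cid_length < 1:
        raise ValueError("cid_length must be at least 1")
    return f_osc_hz * margin_fraction / (2 * cid_length)


# 5-GHz half-rate clock, sampling margin of 30% of the bit time (from the text);
# the run lengths below are illustrative assumptions.
for n in (5, 7, 11):
    offset_mhz = max_freq_offset_hz(5e9, 0.3, n) / 1e6
    print(f"CID = {n:2d}: tolerable offset ~ {offset_mhz:.1f} MHz")
```

Under these assumptions the tolerable offset shrinks in inverse proportion to the run length, which is why a bounded run length and a small frequency step are both needed. Jitter at the input can only tighten this bound.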
C. Design Considerations for the DFCL
The DFCL is designed with a SAR scheme [11]. The 4-bit SAR controller used in our work is shown in Fig. 12. Its operation principle is as follows. In the beginning, the signal CLR stays low to clear all the cells in the SAR controller. It is then switched to high when the calibration procedure starts. The calibration clock of 1.22 MHz is generated by dividing the reference clock of 625 MHz by 512. When the first rising edge of the calibration clock arrives, the most significant bit (MSB), bit3, is set to high. At the next clock edge, the MSB is either maintained or changed to low according to the comparison result from the FD, and the next bit, bit2, is set to high. The process repeats until the least significant bit (LSB), bit0, is determined. The SAR controller also generates a signal, "SAR_Stop", to finish the calibration procedure.
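The bit-by-bit procedure described above is a binary search over the 4-bit control code. The following behavioral sketch illustrates it under two assumptions that are not taken from the paper: the oscillator frequency increases monotonically with the code, and the FD simply reports whether the oscillator is faster than the target. The tuning curve `toy_gdco_hz` (4.70 GHz plus 40 MHz per code) is hypothetical and only mimics a 600-MHz range around 5 GHz.

```python
def sar_calibrate(freq_of_code, f_target_hz: int, n_bits: int = 4) -> int:
    """Successive-approximation search for the largest code not exceeding the target.

    freq_of_code: callable mapping a control code to a frequency in Hz;
                  ASSUMED to increase monotonically with the code.
    """
    code = 0
    for bit in range(n_bits - 1, -1, -1):       # MSB (bit3) first, LSB (bit0) last
        trial = code | (1 << bit)               # tentatively set the current bit
        if freq_of_code(trial) <= f_target_hz:  # FD decision: not too fast, keep it
            code = trial                        # otherwise the bit stays cleared
    return code


def toy_gdco_hz(code: int) -> int:
    """HYPOTHETICAL tuning curve: 4.70 GHz + 40 MHz per code (0..15)."""
    return 4_700_000_000 + 40_000_000 * code


target_hz = 4_976_500_000                       # half of the OC-192 data rate
code = sar_calibrate(toy_gdco_hz, target_hz)
residual_hz = target_hz - toy_gdco_hz(code)
print(f"code = {code:04b}, residual offset = {residual_hz / 1e6:.1f} MHz")
```

Four decisions are enough for a 4-bit code, and the residual offset is bounded by one frequency step, which is why the step size has to stay below the tolerable frequency offset.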
The period of the clock in the 4-bit SAR controller is an important parameter. If the clock period is shorter than the FD's response time, the calibration result may be incorrect and the BER increases. Conversely, if the clock period is too long, the total calibration time increases dramatically and the total power consumption in (1) increases. Hence, it is critical to find the FD's response time. A conventional FD in the DFCL is shown in Fig. 13(a). In Fig. 13(b), assume the feedback clocks (CKI and CKQ in Fig. 8) are faster than the input reference clock. The sampling edges of the input reference clock will drift rightward due to the frequency offset. Hence, the FD's longest acquisition time, in terms of the number of reference clock cycles $N_{\mathrm{acq}}$, is
$N_{\mathrm{acq}} = f_{\mathrm{ref}} / \Delta f$
where $f_{\mathrm{ref}}$ is the frequency of the reference clock (625 MHz) and $\Delta f$ is the frequency offset between the feedback clocks and the reference clock. In our design, the FD is required to detect a 10-MHz frequency offset for a GDCO of 5 GHz, which corresponds to $\Delta f$ = 1.25 MHz after the divide-by-8. Therefore, $N_{\mathrm{acq}}$ is calculated as 500, and the 4-bit SAR controller is triggered by a 1.22-MHz clock, obtained by dividing the 625-MHz reference clock by 512.
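The numbers in this subsection can be cross-checked with a short calculation. The sketch below assumes that the acquisition time in reference cycles equals the reference frequency divided by the frequency offset seen at the FD input, which is the offset at the GDCO output divided by 8. This assumption reproduces the figure of 500 quoted above. Rounding up to the next power of two to obtain the divide ratio of 512 is also an assumption about the design choice, and all names are illustrative.

```python
import math

F_REF_HZ = 625_000_000          # reference clock (from the text)
GDCO_DIVIDE_RATIO = 8           # GDCO output is divided by 8 before the FD
SAR_BITS = 4


def fd_acquisition_cycles(f_ref_hz: float, offset_at_gdco_hz: float,
                          divide_ratio: int) -> float:
    """Reference cycles needed for the FD to resolve the offset (ASSUMED relation)."""
    delta_f_hz = offset_at_gdco_hz / divide_ratio   # offset seen at the FD input
    return f_ref_hz / delta_f_hz


n_acq = fd_acquisition_cycles(F_REF_HZ, 10_000_000, GDCO_DIVIDE_RATIO)

# ASSUMPTION: the SAR clock divider is the next power of two >= n_acq.
sar_divider = 1 << (math.ceil(n_acq) - 1).bit_length()
sar_clock_hz = F_REF_HZ / sar_divider
calibration_time_s = SAR_BITS / sar_clock_hz

print(f"N_acq = {n_acq:.0f} cycles, divider = {sar_divider}, "
      f"SAR clock = {sar_clock_hz / 1e6:.2f} MHz, "
      f"calibration time = {calibration_time_s * 1e6:.2f} us")
```

Four SAR clock periods take about 3.3 μs, which fits within the calibration time of 4 μs stated for the frequency calibration mode.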
IV. EXPERIMENTAL RESULTS
Two CDR circuits using the first and the second GDCOs have been fabricated in a 0.18-μm CMOS process, as shown in Fig. 14(a) and (b), respectively. The core area is 0.16 mm² for the first one and 0.25 mm² for the second one. The CDR circuit using the first GDCO consumes 36 mW and 60 mW from a 1.8-V supply in the data-recovering mode and the frequency calibration mode, respectively. The CDR circuit using the second GDCO consumes 56 mW and 80 mW from a 1.8-V supply in the data-recovering mode and the frequency calibration mode, respectively.
The measured digital tuning curves of the two GDCOs are given in Fig. 15. Both GDCOs cover the desired frequency, and the maximum frequency step is less than 60 MHz. Fig. 16 gives the measured frequency calibration transient of the DFCL. The measurement is performed on the CDR circuit using the first GDCO. Because of the limited maximum detecting frequency of our modulation analyzer, the first GDCO's output was downconverted in advance by a 4-GHz tone with a mixer. The desired frequency is 4976.5 MHz, which is half of the data rate of OC-192. The measured calibration time is about 4 μs.
The measured 10-Gb/s data acquisition times with user-defined patterns for the CDR circuits using the first and the second GDCOs are shown in Fig. 17(a) and (b), respectively. Both CDR circuits require less than 5 bits (0.5 ns) to recover data. Fig. 18(a) and (b) shows the measured recovered half-rate data and clock of the CDR circuit using the first GDCO for a 10-Gb/s PRBS, respectively. The measured peak-to-peak jitter of the recovered clock is 42.7 ps. The measurement results for the CDR circuit using the second GDCO are shown in Fig. 19(a) and (b). The measured peak-to-peak jitter of the recovered clock is 21.8 ps, which is about 50% of that of the CDR circuit using the first GDCO. This agrees well with the simulation results in Section II. Both CDR circuits achieve a bit-error rate (BER) less than . We have tested 20 chips for both burst-mode CDR (BMCDR) circuits with longer PRBS lengths. For both BMCDR circuits, the measured BER is less than for a PRBS; however, the BER degrades for a PRBS. The eye diagrams of both BMCDRs for a PRBS are shown in Fig. 20(a) and (b). It is hard to determine accurate values for the parameters in (12), such as the data rise time inside the chip. Nevertheless, (12) still provides a worst-case prediction. Referring to Fig. 11 and (12), the tolerable PRBS length is 11 for both BMCDR circuits with a frequency offset of 60 MHz and . The performance summary of these two CDR circuits and the comparisons with previous works are given in Table I.
V. CONCLUSION
An open-loop CDR architecture with digital frequency calibration is presented. With this architecture, the power consumption and chip area are reduced, and the loop filter of the auxiliary PLL is no longer needed. Moreover, two inductorless half-rate GDCOs are proposed. The design considerations are derived in this paper, and the experimental results are demonstrated.
