Abstract Phase-Locked Loop (PLL) 
INTRODUCTION
This paper concentrates on Clock and Data Recovery (CDR) architectures used at the receiver end of a Serializer-Deserializer (SERDES) chain. A CDR circuit has to counter the amplitude and phase degradations induced by the transmitter, the channel and the receiver as it recovers the clock and retimes the data [1]. The channel can be a satellite link, a fiber-optic cable, a coaxial cable, a backplane copper trace or an integrated circuit (IC) interconnect over a semiconductor substrate, each with its own challenges and limitations [2]. With IC technologies undergoing constant feature-size reduction, it is in the interest of portability and reduced time-to-market that digital, adaptive and automatically synthesizable CDR architectures be used and re-used.
We will briefly discuss the filter-type CDR circuits to elaborate the point that a PLL is not necessarily required to implement a CDR circuit. Nevertheless, PLLs provide a cost-effective and well-studied CDR implementation alternative despite the numerous circuit- and system-design challenges associated with them. As shown in Fig. 1, one of the clock edges is aligned with the data edges using a PLL. The other clock edge samples the incoming data to implement a 2x-oversampling (2XO) CDR circuit with a maximum timing margin of 0.5 Unit Intervals peak-to-peak (UI p-p). A logical extension of this idea is to acquire more samples per period in order to improve CDR performance. We will review the mixed-signal, digital and all-digital implementations of 3x-oversampling (3XO) CDR circuits that have been presented in the literature during the last decade. We will then present a modified CDR architecture that improves one aspect of an existing 3XO design. Simulation results show the equivalence of the new architecture with the existing one. Using this result, some other research possibilities are also mentioned.
FILTER-TYPE CDR CIRCUIT
The main task of a CDR circuit is to recover the clock, which is not transmitted with the NRZ data in order to save power and avoid skew at the transmitter end. The block diagram of a filter-style or microwave-style CDR is shown in Fig. 2 [3]. The clock component has to be created in the received data spectrum using an edge-detector block (d/dt) followed by a non-linearity (x^2) and, subsequently, isolated using a high-Q Bandpass Filter (BPF) or an external Surface-Acoustic Wave (SAW) filter. A high-Q on-chip filter that is implemented using a PLL with a narrow loop bandwidth is also being actively researched in conjunction with novel receiver architectures [4]. A filter-type CDR circuit using MESFET-based off-the-shelf chips has been reported [5] with a bit rate of 35.1 Gb/s. Used as a simulation technique, however, the filter-type model allows a large number of CDR circuits in a system to be replaced by black boxes, which reduces simulation times.
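The role of the edge detector and the squaring non-linearity can be illustrated numerically. The following is a minimal sketch, not a model of any circuit cited above: it assumes an ideal rectangular NRZ waveform with 8 samples per bit, uses a first difference as a stand-in for d/dt, and evaluates a single DFT bin at the bit-rate frequency. All function names are hypothetical.

```python
import cmath
import random

def bit_rate_line(signal, samples_per_bit):
    """Magnitude of the single DFT bin located at the bit-rate frequency."""
    w = -2j * cmath.pi / samples_per_bit
    return abs(sum(x * cmath.exp(w * n) for n, x in enumerate(signal)))

def nrz(bits, samples_per_bit):
    """Ideal rectangular NRZ waveform with levels -1 and +1 (assumption)."""
    return [2 * b - 1 for b in bits for _ in range(samples_per_bit)]

def edge_detect_and_square(signal):
    """Discrete stand-in for d/dt followed by the x^2 non-linearity."""
    return [(b - a) ** 2 for a, b in zip(signal, signal[1:])]

rng = random.Random(0)
bits = [rng.getrandbits(1) for _ in range(256)]
wave = nrz(bits, 8)

raw_line = bit_rate_line(wave, 8)                            # NRZ data itself
clock_line = bit_rate_line(edge_detect_and_square(wave), 8)  # after d/dt and x^2
transitions = sum(a != b for a, b in zip(bits, bits[1:]))

print("bit-rate component of raw NRZ data:", raw_line)
print("bit-rate component after d/dt and x^2:", clock_line)
```

Under these idealized assumptions the raw NRZ waveform has essentially no component at the bit rate, whereas the processed signal has one whose magnitude grows with the number of data transitions. This is the component that the high-Q filter then isolates.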
PLL-BASED CDR ARCHITECTURE
In a conventional PLL-based CDR circuit shown in Fig. 3, the clock is recovered using a PLL. The PLL acts like a high-Q bandpass filter that can switch from one frequency to another electronically if there is a frequency divider (not shown) in the feedback loop. A D-type flipflop forms the Data Recovery (DR) or decision circuit. The operating principle for this CDR was introduced earlier in Fig. 1. In practice, one would have to carefully analyze jitter, systematic skews, the effect of long run-lengths and lock-acquisition problems [6]. PLL-type CDR circuits are roughly classified according to the type of Phase Detector (PD) used. The most important ones are the linear and the non-linear CDR structures. A linear CDR circuit uses a Hogge Phase Detector [7] or one of its variants [8], whereas a non-linear CDR circuit uses a Bang-Bang Phase Detector (BBPD). The BBPD is also known as the 'Alexander PD' [9] or an 'Early-Late PD'. Bang-Bang PDs provide high gain and therefore require no charge pump (CP) and no limiting pre-amplifier. They also offer automatic compensation of timing offsets (since these PDs use sampled data), the possibility of multi-bit sampling in one clock cycle to implement a reduced-rate architecture, and freedom from frequency drift during long run-lengths [10]. A sample-and-hold style PD that combines the advantages of a linear PD and a BBPD has been reported in [11]. For more information on modern PD architectures, the reader can refer to [12].
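The early-late decision made by a bang-bang PD can be sketched as a small truth function. This is a behavioural illustration only: the function name and the sign convention (+1 for a late clock, -1 for an early clock) are assumptions, and the three inputs stand for two consecutive data samples and the edge sample taken between them.

```python
def alexander_pd(prev_data, edge_sample, curr_data):
    """Return +1 (clock late), -1 (clock early) or 0 (no transition).

    prev_data and curr_data are consecutive data samples; edge_sample is
    taken by the clock edge that is meant to align with the data transition.
    """
    if prev_data == curr_data:
        return 0      # no data transition, so no phase information
    if edge_sample == prev_data:
        return -1     # edge sample still shows the old bit: clock is early
    return +1         # edge sample already shows the new bit: clock is late

# A 0 -> 1 transition seen by an early and by a late clock, then a long run.
print(alexander_pd(0, 0, 1), alexander_pd(0, 1, 1), alexander_pd(1, 1, 1))
```

The zero output during runs of identical bits mirrors the property mentioned above: in the absence of transitions the detector issues no correction, so it does not push the oscillator frequency in either direction.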
Disadvantages
Traditional PLL-based CDR circuits suffer from device speed limitations with increasing data rates, degradation of on-chip Q for inductors (if an LC-VCO is used), 50 percent duty-cycle problems, data feedthrough, increased VCO jitter (due to high VCO gain resulting from supply voltage reduction) and poor performance in the presence of asymmetric jitter. In order to achieve high data rates while maintaining an acceptable performance, reduced-rate architectures are employed [13]. A novel 1/8th-rate PD implementation is reported in [14].
Reduced-Rate CDR Architectures
In order to sample the full-rate data stream with a half-rate VCO, one has to use both edges of the recovered clock, as shown in Fig. 4, and later multiplex the two data streams labelled D+ve and D-ve. The PD could be either linear [15] or non-linear [16]. If further reduction in clock rate is sought, one can use additional equidistant phases provided by a locked oscillator. The main problems in such a case become the generation of equidistant phases and the modeling and capacitance issues associated with multiplexer and de-multiplexer circuit design.
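The half-rate sampling and re-multiplexing step can be sketched at the bit level. This is an idealized illustration that ignores timing altogether; the function names are hypothetical, and the assignment of even-numbered bits to the rising edge is an assumption.

```python
def half_rate_sample(bits):
    """Split a full-rate bit stream into the two half-rate streams."""
    d_pos = bits[0::2]   # bits captured on rising edges of the half-rate clock
    d_neg = bits[1::2]   # bits captured on falling edges
    return d_pos, d_neg

def remultiplex(d_pos, d_neg):
    """Interleave the two half-rate streams back into the full-rate stream."""
    out = []
    for i, bit in enumerate(d_pos):
        out.append(bit)
        if i < len(d_neg):
            out.append(d_neg[i])
    return out

stream = [1, 0, 0, 1, 1, 1, 0]
d_pos, d_neg = half_rate_sample(stream)
print(d_pos, d_neg, remultiplex(d_pos, d_neg) == stream)
```

Each of the two streams runs at half the bit rate, which is what relaxes the speed requirement on the VCO and the samplers.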
VARIABLE-INTERVAL OVERSAMPLING CDR ARCHITECTURE
A mixed-signal, variable-interval 3XO CDR circuit has been proposed in [17], [18]. This 3XO concept [17] is shown in Fig. 5. Using a VCO and a DLL combination, the architecture tracks the data edges and then puts the sampling strobe exactly in the centre of the two data edges. In this quarter-rate architecture operating at 5 Gb/s (shown in Fig. 6), four NRZ data bits are recovered every clock cycle. The ring oscillator locks to 1.25 GHz using a 'reference loop'. Once the frequency lock is signaled by the 'lock-detect' circuit, the control voltages for the DLL and the PLL become independent of each other. The DLL tracks the jitter probability density function (pdf) of the incoming data edges using the 'eye-measuring loop' and puts the sampling clock edge in the centre of the two edge-clocks. The BER is improved and a high-frequency jitter tolerance of 0.65 UI p-p in the presence of asymmetric jitter is achieved. As opposed to the 2XO architectures, this 3XO circuit does not acquire equidistant samples except at the end of the tracking stage, so it is not strictly a 3XO CDR circuit. The authors call it a 'variable-interval' oversampling circuit. It is possible to discard the extra information (or rather not collect it at all) due to the presence of three loops that track the data edges, so that three samples are collected only where they are most relevant.
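The benefit of centring the strobe between two edge-tracking clocks under asymmetric jitter can be shown with simple arithmetic. The sketch below is only an illustration and does not model the loops of [17]: it assumes that the two edge clocks settle at the inner boundaries of the observed edge positions, and it works in integer milli-UI. The function name and the sample values are hypothetical.

```python
def eye_centre(opening_edges_mui, closing_edges_mui):
    """Place the strobe midway between the two edge clocks (milli-UI).

    opening_edges_mui: observed crossing times at the start of the bit.
    closing_edges_mui: observed crossing times at the end of the bit.
    Returns (strobe position, measured eye opening).
    """
    left_clock = max(opening_edges_mui)    # latest opening edge seen
    right_clock = min(closing_edges_mui)   # earliest closing edge seen
    return (left_clock + right_clock) // 2, right_clock - left_clock

# Asymmetric jitter: edges spread from -100 to +300 milli-UI around nominal.
opening = [-100, 0, 300]
closing = [t + 1000 for t in opening]      # same spread one UI later
strobe, eye = eye_centre(opening, closing)
print(strobe, eye)
```

With this asymmetric spread the strobe sits at 600 milli-UI rather than at the nominal 500 milli-UI, leaving equal margins of 300 milli-UI on either side.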
PHASE-PICKING OR BLIND OVERSAMPLING CDR ARCHITECTURE
The blind oversampling [19] (also called 'phase-picking' [20]) concept is shown in Fig. 7. This oversampling concept has been in use since the early 1970s [21]. An odd number of samples (three or five) is acquired per bit. The data edges are detected using an XOR gate (and the missing ones are interpolated digitally). The sample picked using the center phase is declared as the correct data bit. Majority voting can also be employed but is inferior to center-picking [20]. The block diagram of the architecture appears in Fig. 8. Its main advantage is its all-digital nature, and its main disadvantages are the extra power and increased latency due to the DSP core and the algorithm. This operation was named 'phase-picking' [20], although it is the data bit acquired by a particular phase that is picked and not the phase itself. The ambiguity can be resolved by the context of the discussion.
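A minimal sketch of center-picking for 3x blind oversampling follows. It assumes a clean, jitter-free sample stream and omits the digital interpolation of missing edges; the function names are hypothetical and do not correspond to the design in [19] or [20].

```python
def pick_phase(samples, osr=3):
    """Return the sampling phase (0..osr-1) farthest from the detected edges."""
    edge_counts = [0] * osr
    for i in range(len(samples) - 1):
        if samples[i] ^ samples[i + 1]:        # XOR flags a data transition
            edge_counts[(i + 1) % osr] += 1    # first phase of the new bit
    boundary = edge_counts.index(max(edge_counts))
    return (boundary + osr // 2) % osr         # centre sample of each bit

def recover(samples, osr=3):
    """Keep only the samples taken by the picked (centre) phase."""
    return samples[pick_phase(samples, osr)::osr]

bits = [1, 0, 1, 1, 0]
# 3x oversampled stream that starts one sample before the first bit boundary.
samples = [0] + [b for b in bits for _ in range(3)]
print(pick_phase(samples), recover(samples))
```

In this example every transition falls just before phase 1, so phase 2 is the centre sample and the original five bits are recovered.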
EYE-TRACKING DR ARCHITECTURE
An all-digital DR circuit core has been presented in [22], as shown in Fig. 9. This is a 3XO architecture that tracks the data eye instead of the data edges. The edge-detection interval is realized using CMOS-style delay elements in the BBPD. The decisions are accumulated in a serial shift register, and a rotating clock phase pointer either accelerates or decelerates the phase in addition to providing an unlimited phase range to the DR circuit. The paper provides excellent design information and equations. Fig. 10 shows the 3XO concept for this DR circuit. One has to detect the edges and keep the recovered clock away from these edges. The sampling occurs at the centre of the eye and provides a low BER and a high-frequency jitter tolerance of 0.7 UI p-p. The architecture is compliant with the SFI-5 specification. Of all the architectures reviewed, this consumes the least power and area as well.
OTHER OVERSAMPLING CDR ARCHITECTURES
A true phase-picking or phase-selection CDR architecture would pick and use one of the phases directly [23], [24] or feed it back to another PLL in order to achieve further dithering of the jitter [23]. Some of the other important publications in this field are [25], [26], [27], [28]. One noteworthy entry is a quad 3.125 transceiver chip featuring an analog phase rotator [29]. Due to limited space, we now limit this overview and present our modified DR architecture.
MODIFIED DR ARCHITECTURE
In our previous paper [30], we swept three critical Digital PLL (DPLL) parameters for the eye-tracking architecture [22], namely the edge-detection interval of the BBPD, the number of clock phases and the phase update interval, in order to investigate their effects on the jitter tolerance of the DR circuit. The edge-detection interval is a critical parameter of this design. Its value has a conflicting influence on the high- and low-frequency jitter tolerance of the DR circuit. If the edge-detection window is too narrow, data edges cannot be detected effectively and the low-frequency tracking ability of the DPLL suffers. If the edge-detection window is too wide, the high-frequency jitter tolerance deteriorates but the tracking improves. The simulations were performed in Matlab/Simulink.
Our modified architecture ameliorates this problem by removing the CMOS-style delay blocks in the BBPD altogether. Instead, we rely on the presence of a DLL-based phase generator [31] that can provide equidistant phases with much less Process, Voltage and Temperature (PVT) dependence, perhaps for many on-chip data lanes. In addition, we use not just one rotating phase, as shown in Fig. 12(a) [22], but three rotating phases a fixed distance apart (as seen in Fig. 12(b)). The difference between this circuit and a conventional VCO-type circuit would be the unlimited phase range due to the three rotating phases and the fact that the phases generated by the DLL are not synchronized to the data stream, although they have to be close to the data frequency. The BBPD architecture shown in [22] (also see Fig. 9) is retained, but the delay elements are removed and the Early, Centre and Late phases shown in Fig. 11 are used to clock the three front-end flipflops. This implements a 3XO all-digital CDR architecture that does not suffer from PVT-induced jitter tolerance shifts due to CMOS-type delay blocks.
Fig. 10. Eye-Tracking 3XO concept
The phase resolution would depend on the speed of the technology and the design requirements for the DLL. For achieving finer resolutions with slower CMOS technologies, a phase interpolator could be used. With the improvement in the speed of the technology, the entire design could be implemented using digital-style delay cells in the DLL with sufficient resolution.
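The behaviour of three rotating phases a fixed distance apart can be sketched with a small behavioural model. This is an illustration under stated assumptions and not the Matlab/Simulink model used in this work: the eight phases are assumed to span one UI, the data edges are fixed at 0.3 UI with no jitter, the pointer moves by at most one phase step per update interval, and the pointer is kept as an unwrapped integer (hardware would wrap it modulo the number of phases). All names are hypothetical.

```python
import math

def prbs7(n_bits, state=0x7F):
    """PRBS7 test pattern (x^7 + x^6 + 1); runs are at most 7 bits long."""
    out = []
    for _ in range(n_bits):
        new = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new) & 0x7F
        out.append(new)
    return out

def sample(bits, n, phase, n_phases, edge_ui):
    """Value seen by clock phase `phase` in bit slot n; edges sit at edge_ui."""
    return bits[math.floor(n + phase / n_phases - edge_ui)]

def track(bits, centre, n_phases=8, sep=2, update=16, edge_ui=0.3):
    """Rotate the early/centre/late phases until no edge falls between them."""
    votes = 0
    for n in range(1, len(bits) - 1):
        early = sample(bits, n, centre - sep, n_phases, edge_ui)
        mid = sample(bits, n, centre, n_phases, edge_ui)
        late = sample(bits, n, centre + sep, n_phases, edge_ui)
        votes += (early != mid)   # edge between early and centre: move later
        votes -= (mid != late)    # edge between centre and late: move earlier
        if n % update == 0:       # apply one phase step per update interval
            centre += (votes > 0) - (votes < 0)
            votes = 0
    return centre

final_centre = track(prbs7(200), centre=3)
print(final_centre)
```

Starting with the centre phase at index 3 (0.375 UI), the early phase at 0.125 UI lies on the wrong side of the data edge, so the pointer is stepped later twice and then settles at index 5, where all three phases sample inside the same eye.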
Simulation results
Matlab simulation results available at the time were reported in [30]. In order to maintain continuity, we report the comparison of the jitter tolerance for our version of the reference architecture and our modified architecture using the Matlab/Simulink platform. The number of clock phases is eight. The phase update interval is 16 bits and the clock phases maintain a 0.25 UI separation. Fig. 13 shows that the two approaches produce equivalent results for the jitter tolerance. A typical simulation of 2 s is shown in Fig. 14 (5000 bits are not shown). The DR circuit uses three clock phases, a fixed distance (0.25 UI) apart, and comfortably tracks a 0.5 MHz, 8.5 UI p-p jitter sinusoid as it recovers the NRZ data at 2.5 Gb/s. The simulation is self-explanatory, except that any spike reaching a +1 threshold would have meant a bit error in Fig. 14(e) [30].
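A rough plausibility check of the tracking result can be made from the quoted parameters. The sketch below compares the worst-case phase drift of the jitter sinusoid over one update interval with the size of one phase step. It rests on two assumptions that are not stated in the text: the eight clock phases span exactly one UI (so one step is 0.125 UI, which makes the 0.25 UI separation two steps), and the pointer can move by one step per update interval.

```python
import math

BIT_RATE = 2.5e9        # b/s, from the text
JITTER_FREQ = 0.5e6     # Hz, from the text
JITTER_PP_UI = 8.5      # UI peak-to-peak, from the text
UPDATE_BITS = 16        # phase update interval in bits, from the text
N_PHASES = 8            # number of clock phases, from the text
PHASE_STEP_UI = 1.0 / N_PHASES   # ASSUMPTION: phases span exactly one UI

# A sinusoid (A_pp / 2) * sin(2*pi*f*t) has a peak slope of pi * f * A_pp.
max_slew_ui_per_s = math.pi * JITTER_FREQ * JITTER_PP_UI
jitter_per_update = max_slew_ui_per_s * UPDATE_BITS / BIT_RATE

print("worst-case drift per update interval (UI):", jitter_per_update)
print("phase step (UI):", PHASE_STEP_UI)
```

Under these assumptions the worst-case drift is roughly 0.085 UI per update interval, which is below the assumed 0.125 UI step, so a loop that moves one step per update can keep up with the jitter sinusoid.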
Future Research
If we take a closer look, the eye-tracking architecture and the jitter-tolerant variable-oversampling architecture can both be merged using the modified architecture. One would have to make two significant adjustments. These would be:
• The static distance between the early, center and late phases could become dynamic, similar to the one presented in [17], limited by the available phase resolution from the DLL circuitry. This would allow one to measure the eye-opening digitally. This information can be used to control equalizers in the overall SERDES architecture.
• The full-rate architecture could be changed to a half-rate or quarter-rate architecture with a multiplicity of flipflops in the BBPD clocked by DLL-generated phases. The multiple phases would still keep rotating.
This requires the addition of some digital circuitry and would produce a DR architecture that is all-digital and measures the eye-opening digitally with low power dissipation.
Fig. 13. Jitter Tolerance Comparison: Reference Design vs Modified Design
Conclusions
• Following the basic clock recovery concept behind a filter-type CDR circuit, a brief review of PLL-style 2x-oversampling CDR architectures was presented. Several 3x-oversampling CDR architectures were reviewed, including the eye-tracking, jitter-tolerant variable-oversampling and blind oversampling architectures.
• A modified architecture was presented that minimizes the shifts in jitter tolerance performance of the eye-tracking architecture using a 3XO architecture with three rotating phases instead of one rotating phase, and utilizing a DLL-based phase generator.
• The two architectures were found equivalent with respect to their jitter tolerance performance.
• Some possibilities for future research were also mentioned. Current work is being done using the Verilog-A platform, the added benefits of which will be described in another publication along with a pertinent comparison of the simulation results.
