A Phase Locked Loop (PLL) design based on a new phase detector (PD) is presented. It can be used as part of data/clock recovery (DCR) systems targeting Ethernet and optical fiber transceivers in the Gbit/s range (up to 3Gbit/s) in current semiconductor processes. A key component in the circuit is a new non-sequential PD that provides for very high speed operation. Using TSMC 0.25µm CMOS process device models and the HSPICE simulator, results show that the PLL can operate at 2.5GHz over process corners and a 0°C to 100°C temperature range. Total power dissipation is 40mW with a single 2.5V power supply.
I. INTRODUCTION
With the growing demand for faster data rates in LANs and SONET (SDH), gigabit Ethernet and high-speed optical fiber networks are becoming more popular because of their high throughput. Paralleling this demand is a booming market for high data rate transceivers.
For single-channel data rates in the gigahertz range, transceivers are often implemented in expensive GaAs, SiGe [1], bipolar [2] or BiCMOS processes. With the shrinking of gate length, less costly deep sub-micron CMOS technology can achieve faster operation, which makes the CMOS implementation of transceivers particularly attractive. There remain several substantial challenges that must be overcome to take full advantage of the high speed capability of sub-micron technology in the production of very high speed transceivers. One of the major bottlenecks is the speed limitation of existing PD architectures, which are invariably limited in speed by the flip-flops inherent in these structures.
In an attempt to circumvent this problem, most existing high speed PLL designs use a frequency divider between the VCO and the PD to reduce the speed requirements of the PD. Frequency dividers inherently introduce more jitter and slow down lock acquisition. An alternative is to find a better PD structure that not only increases the speed of the PD, but also eliminates the frequency divider and simplifies the overall PLL structure.
In this paper, a non-sequential 2.5GHz CMOS PLL is described.
The architecture and the loop design of the proposed PLL are described in Section II. Implementation details of each building block are provided in Section III. Simulation results are given in Section IV.
II. ARCHITECTURE AND THE LOOP DESIGN OF THE PLL
The architecture of the proposed PLL is shown in Fig. 1. It is a typical charge pump PLL structure with a second-order passive loop filter. The VCO has four delay stages. Instead of using the recovered clock "Clock", another signal from the VCO is extracted and fed into the PD (CLK-in).
The closed-loop transfer function of the PLL shows the standard low-pass characteristic. The loop will reject high-frequency phase noise from the input and reject low-frequency phase noise from the VCO. Since the VCO is a major contributor to the jitter in the recovered data, we need to make the bandwidth of the PLL large to minimize the impact of the VCO phase noise. This will not only suppress the phase noise of the VCO, but also increase the tracking speed of the PLL.
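The low-pass behaviour can be made explicit with the textbook linear model of a charge-pump PLL. The expressions below are a sketch under stated assumptions, not the authors' own derivation: $I_p$ (pump current), $K_{VCO}$ (VCO gain in rad/s/V) and the filter topology ($R_1$ in series with $C_1$, shunted by $C_2$) are assumed here for illustration.

$$
Z(s) = \frac{1 + sR_1C_1}{s\,(C_1+C_2)\left(1 + sR_1\dfrac{C_1C_2}{C_1+C_2}\right)},
\qquad
G(s) = \frac{I_p}{2\pi}\,Z(s)\,\frac{K_{VCO}}{s}
$$

$$
H(s) = \frac{\theta_{out}(s)}{\theta_{in}(s)} = \frac{G(s)}{1+G(s)},
\qquad
E(s) = \frac{\theta_{out}(s)}{\theta_{n,VCO}(s)} = \frac{1}{1+G(s)}
$$

At low frequencies $|G(s)| \to \infty$, so $H \to 1$ and $E \to 0$: input phase is tracked and VCO phase noise is suppressed. At high frequencies $|G(s)| \to 0$, so $H \to 0$ and $E \to 1$: input phase noise is rejected and VCO phase noise passes through. Widening the loop bandwidth therefore extends the range over which VCO noise is suppressed.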
III. BUILDING BLOCKS
A. Non-Sequential Phase Detector
The bottleneck for a high speed DCR system is the phase detector (or Phase Frequency Detector), because it must be capable of handling random NRZ data and recovering the clock signal associated with the data stream. Since the spectrum of NRZ data has no energy at its data rate, the task of data recovery is more difficult and places more severe restrictions on the performance of the phase detector. Often, a nonlinear operation is required at the front end of the phase detector circuit to generate some energy at the data rate frequency.
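The spectral null and the effect of a nonlinear front end can be checked numerically. The following is a minimal sketch, assuming ideal rectangular NRZ pulses and using a delay-and-XOR edge detector as the nonlinear operation; the helper names, the 16 samples per bit, and the half-bit delay are illustrative choices, not values from the paper.

```python
import cmath
import random


def tone_power(samples, cycles_per_sample):
    """Normalized power of `samples` at a single frequency (one-bin DFT)."""
    n = len(samples)
    mean = sum(samples) / n
    acc = sum((x - mean) * cmath.exp(-2j * cmath.pi * cycles_per_sample * k)
              for k, x in enumerate(samples))
    return abs(acc / n) ** 2


def nrz_waveform(bits, samples_per_bit):
    """Hold each bit for `samples_per_bit` samples (ideal rectangular NRZ)."""
    return [b for b in bits for _ in range(samples_per_bit)]


def edge_pulses(wave, delay):
    """XOR the waveform with a delayed copy: one pulse per data transition."""
    return [wave[k] ^ wave[k - delay] if k >= delay else 0
            for k in range(len(wave))]


random.seed(1)
SAMPLES_PER_BIT = 16                      # assumed oversampling factor
bits = [random.randint(0, 1) for _ in range(2000)]
nrz = nrz_waveform(bits, SAMPLES_PER_BIT)
edges = edge_pulses(nrz, SAMPLES_PER_BIT // 2)   # assumed half-bit delay

bit_rate = 1 / SAMPLES_PER_BIT            # data rate in cycles per sample
p_nrz = tone_power(nrz, bit_rate)         # essentially zero: spectral null
p_edges = tone_power(edges, bit_rate)     # clearly non-zero: recovered tone

print(f"power at data rate, raw NRZ:     {p_nrz:.3e}")
print(f"power at data rate, edge pulses: {p_edges:.3e}")
```

Each rectangular bit integrates to zero over one full period of the data-rate tone, which is why the raw NRZ stream carries no energy there, whereas the edge pulses all start at bit boundaries and add coherently.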
Several types of phase detectors that are applicable to random data have been reported. Probably the most widely used is Hogge's phase detector [5]. The Hogge phase detector can be used at speeds up to 1.6GHz in the referenced process, but its performance deteriorates rapidly at modestly higher frequencies. This performance limitation is due mainly to the inability of the flip-flops used in the circuit to settle fast enough.
The PD proposed here is specifically designed to operate at higher data rates than are achievable with the Hogge circuit. This should enable the data rate of the corresponding data recovery circuits to be increased. The proposed circuit is simple and easy to implement. The structure of this new PD is partly determined by the number of delay stages in the VCO.
Because there is a fixed phase relationship between the output signals of the stages in a VCO, we can take advantage of this property and introduce them into the PD to help find the phase difference.
One or two signals can be introduced from the VCO, depending on the number of delay stages in the VCO. If the VCO has an even number of delay stages, only one signal is needed. If it has an odd number of delay stages, two signals are needed.
Since we used four delay stages in the VCO, we need only one signal from the VCO, which is "CLK-in" in the structure of the PD shown in Fig. 2. We use two delay cells and two XOR gates to detect the edges of the input random data. The Up and Down signals that are required to drive the charge pump are generated from the signals CLK-in, E and F. When the PLL is in lock, the Up and Down signals are generated by using E and F to cut the signal CLK-in in half. Therefore, the Up and Down signals have the same duty cycle, and hence the loop filter, which filters the difference in the duty cycles of the Up and Down signals, will not be driven up or down when the PLL is in lock.
From the preceding description, we note that when the PLL is in lock, the CLK is not locked to Data, but to Data-delay1 instead. However, this should not affect the operation of the PLL because, instead of Data, Data-delay1 can be used with CLK to recover the incoming data.
Whenever there is a phase shift, with the data either leading or lagging the clock, the pulse width of either the Up or the Down signal changes according to the amount of phase shift.
In this phase detector, the Up and Down signals are generated only when there are transitions on the incoming data stream. This property guarantees its ability to handle random NRZ data.
The delay stages in the PD are simply implemented by a series connection of inverters. Because correct operation of the PD allows a large range of delay times, we do not need to worry about delay time variation due to process and temperature variations.
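The behaviour described in this subsection can be illustrated with a small timing model. This is a sketch of one plausible reading only: the gating used here (E = Data XOR Data-delay1, F = Data-delay1 XOR Data-delay2, Up = E AND CLK-in, Down = F AND CLK-in, with CLK-in high for half a period and centred on the Data-delay1 edge in lock) is an assumption made for illustration and is not confirmed by the text; `Data-delay2`, `pd_pulse_widths`, and the numeric values are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def pd_pulse_widths(phase_error, t_delay, period=400e-12):
    """Up/Down pulse widths for one data transition (hypothetical gating).

    Time origin: centre of the CLK-in high pulse.
    The Data-delay1 edge arrives at t = phase_error.
    period = 400 ps corresponds to a 2.5 GHz clock.
    """
    clk_start, clk_end = -period / 4, period / 4         # CLK-in high window
    e_start, e_end = phase_error - t_delay, phase_error  # assumed E pulse
    f_start, f_end = phase_error, phase_error + t_delay  # assumed F pulse
    up = overlap(e_start, e_end, clk_start, clk_end)     # assumed Up = E AND CLK-in
    down = overlap(f_start, f_end, clk_start, clk_end)   # assumed Down = F AND CLK-in
    return up, down


# In lock: E and F split the CLK-in pulse into two equal halves.
up_lock, down_lock = pd_pulse_widths(0.0, 150e-12)

# 20 ps phase error: one pulse widens, the other narrows.
up_err, down_err = pd_pulse_widths(20e-12, 150e-12)

# The same result is obtained for quite different cell delays.
up_short, down_short = pd_pulse_widths(20e-12, 120e-12)
up_long, down_long = pd_pulse_widths(20e-12, 200e-12)

print(up_lock, down_lock)
print(up_err, down_err, up_err - down_err)
```

Under these assumptions the net pulse-width difference is twice the phase error and does not depend on the cell delay, as long as the delay exceeds a quarter period plus the phase error. Which of the two pulses is labelled Up depends on the VCO control polarity, so the labels here are arbitrary.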
B. Current-Switching Charge Pump and Loop Filter
In order to achieve high resolution, we chose a current-switching charge pump. The simplified structure of the charge pump, together with the loop filter, is shown in Fig. 5. Several considerations in the charge pump design deserve mention:
To maximize the speed of the charge pump, the bias currents should always be ON. The transistor sizes should be chosen properly to minimize the spikes in the charge pump transient, which are due to charge injection effects. Relatively large transistors are used to minimize the effects of mismatch. The bias voltages should be set properly so that the switching transistors operate in the active region instead of the triode region when they are "ON".
A simple passive second-order loop filter was used. Together with the other blocks of the PLL, it forms a third-order loop. It reduces the ripple that is inherently present at the control voltage node in second-order loops. The values of C1, C2 and R1 are carefully chosen in order to maintain an adequate phase margin in the third-order loop and minimize the control voltage ripple.
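The trade-off between C1, C2 and R1 can be explored numerically. The following is a minimal sketch that computes the unity-gain frequency and phase margin of a third-order charge-pump loop, assuming the common topology of R1 in series with C1, shunted by C2. The pump current, VCO gain and component values are placeholders chosen for illustration, not the values used in this design.

```python
import cmath
import math


def open_loop_gain(freq_hz, i_pump, k_vco_hz_per_v, r1, c1, c2):
    """Open-loop gain of a charge-pump PLL with a passive R1-C1 || C2 filter."""
    s = 2j * math.pi * freq_hz
    c_series = c1 * c2 / (c1 + c2)
    z_filter = (1 + s * r1 * c1) / (s * (c1 + c2) * (1 + s * r1 * c_series))
    # PD/charge pump: i_pump / (2*pi) A/rad; VCO: 2*pi*k_vco / s rad/V.
    return (i_pump / (2 * math.pi)) * z_filter * (2 * math.pi * k_vco_hz_per_v) / s


def phase_margin_deg(i_pump, k_vco_hz_per_v, r1, c1, c2):
    """Return (unity-gain frequency in Hz, phase margin in degrees)."""
    lo, hi = 1e3, 1e11          # search bracket; |G| falls monotonically
    for _ in range(200):        # bisection on a logarithmic frequency axis
        mid = math.sqrt(lo * hi)
        if abs(open_loop_gain(mid, i_pump, k_vco_hz_per_v, r1, c1, c2)) > 1.0:
            lo = mid
        else:
            hi = mid
    f_c = math.sqrt(lo * hi)
    gain = open_loop_gain(f_c, i_pump, k_vco_hz_per_v, r1, c1, c2)
    return f_c, 180.0 + math.degrees(cmath.phase(gain))


# Placeholder values (assumptions, not taken from the paper).
f_c, pm = phase_margin_deg(i_pump=100e-6, k_vco_hz_per_v=1e9,
                           r1=2e3, c1=20e-12, c2=2e-12)
print(f"unity-gain frequency: {f_c / 1e6:.1f} MHz, phase margin: {pm:.1f} deg")

# A larger C2 smooths the control voltage more but costs phase margin.
_, pm_big_c2 = phase_margin_deg(100e-6, 1e9, 2e3, 20e-12, 8e-12)
print(f"phase margin with 4x larger C2: {pm_big_c2:.1f} deg")
```

The second call shows the tension named in the text: a larger C2 filters more ripple but moves the third pole closer to the crossover frequency and reduces the phase margin.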
C. Voltage Controlled Oscillator and Control Voltage Generator
The VCO is a ring-oscillator-based structure. One of the delay stages in the VCO and the control voltage generator are shown in Fig. 6. They are based on the topology in [3]. The delay stage contains a differential pair with symmetric resistive loads.
The control voltage generator is shown in Fig. 6(b). It produces the bias voltages Vcn and Vcp from Vcontrol. Its main function is to continuously adjust the bias currents of the delay stages, providing a tuning range wide enough to compensate for temperature and process variations.
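As a rough sanity check on the delay the bias currents must achieve, the standard ring-oscillator relation f = 1 / (2 N t_d) can be applied to the four-stage VCO. This is a back-of-the-envelope sketch, assuming identical stages; the function names are illustrative, and treating the reported 1.5G to 2.7G locking range as VCO frequencies in hertz is an assumption.

```python
def ring_oscillator_frequency(n_stages, stage_delay_s):
    """Oscillation frequency of an N-stage ring: f = 1 / (2 * N * t_d)."""
    return 1.0 / (2 * n_stages * stage_delay_s)


def required_stage_delay(n_stages, freq_hz):
    """Per-stage delay needed for a target oscillation frequency."""
    return 1.0 / (2 * n_stages * freq_hz)


N_STAGES = 4  # number of delay stages stated in the text

nominal_delay = required_stage_delay(N_STAGES, 2.5e9)
slow_delay = required_stage_delay(N_STAGES, 1.5e9)   # assumed lower lock bound
fast_delay = required_stage_delay(N_STAGES, 2.7e9)   # assumed upper lock bound

print(f"stage delay at 2.5 GHz: {nominal_delay * 1e12:.1f} ps")
print(f"stage delay range: {fast_delay * 1e12:.1f} ps to {slow_delay * 1e12:.1f} ps")
```

Under these assumptions each stage must settle in about 50 ps at 2.5 GHz, and the control voltage generator has to move the stage delay over roughly a factor of 1.8 to cover the locking range.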
IV. SIMULATION RESULTS
Using the HSPICE simulator and TSMC 0.25µm CMOS process device models, the PLL successfully locks to typical pseudo-random input data with a data rate of 2.5Gbit/s under nominal and extreme conditions.
All the schematic parasitics were included in the simulation, and additional 10fF capacitors were added at each connection node to model the interconnect parasitics. The 2.5Gbit/s input data was distorted by passing it through a cascaded string of inverters before going into the PLL. Initial conditions were set on the control voltage. Fig. 7 shows the locking characteristics of the control voltage in three situations. The simulations show that lock was successfully acquired. Additional simulation results showed that the locking range of the PLL is 1.5G-2.7G. The lower bound was obtained at the fast model corner at 0°C and the upper bound occurred at the slow model corner at 100°C. The power dissipation is about 40mW (nominal) with a 2.5V power supply, which is quite low.
V. SUMMARY
A 2.5Gbit/s CMOS PLL for data and clock recovery is described. It targets high speed transceivers that can achieve a 2.5Gbit/s data rate in a single channel in standard CMOS processes. A new non-sequential PD structure that is capable of operating at very high speeds was introduced. Simulation results indicate that it can operate at the 2.5Gbit/s data rate in a TSMC 0.25µm CMOS process, thus enabling the overall PLL to operate at higher frequencies than is practical with sequential phase detectors.
