Clock- and data-recovery IC with demultiplexer for a 2.5 Gb/s ATM physical layer controller

Hansen, Flemming; Salama, C.A.T.

Published in:
Proceedings of the IEEE International Symposium on Circuits and Systems

Link to article, DOI:
10.1109/ISCAS.1996.541949

Publication date:
1996

Document Version
Publisher's PDF, also known as Version of record

Citation (APA):
CLOCK- AND DATA-RECOVERY IC WITH DEMULTIPLEXER FOR A 2.5GB/S ATM PHYSICAL LAYER CONTROLLER

Flemming Hansen  
C. Andre T. Salama  

Center for Broadband Telecommunications  
Dep. of Electromagnetic Systems  
Technical University of Denmark  
Building 348  
DK-2800 Lyngby, Denmark  
fh@emi.dtu.dk  

VLSI Research Group  
Dep. of Electrical and Computer Engineering  
University of Toronto  
10 King's College Road  
M5S 1A4, Ontario, Canada  
salama@vrg.utoronto.ca  

ABSTRACT

A Clock- and Data-Recovery (CDR) IC for a Physical Layer Controller in an Asynchronous Transfer Mode (ATM) system operating at a bit rate of 2.488Gb/s is presented. The circuit was designed and fabricated in a 0.8μm BiCMOS process featuring 13GHz fT bipolar transistors. Clock-recovery is accomplished with a Phase-Locked Loop (PLL). The PLL uses a Phase- and Frequency Detector (PFD) to increase the pull-in range. No external components are required. A novel Voltage Controlled Oscillator (VCO) generating both in-phase and quadrature clocks, required by the PFD, is presented. The CDR includes a 1:8 demultiplexer with bit-rotation. Emitter Coupled Logic (ECL) is used in the PLL, data-regeneration and part of the demultiplexer, while the low-speed parts of the demultiplexer are implemented in dynamic CMOS using the True Single-Phased Clock (TSPC) approach.

1. INTRODUCTION

The Clock- and Data-Recovery circuit is part of a Physical Layer Controller for a 2.5Gb/s Asynchronous Transfer Mode (ATM) system. In optical fiber transmission only the data signal is transmitted, so the reference clock signal must be generated at the receiving end from the data signal. Two popular methods for extracting the clock information from the data signal are narrow-band filters [1] and phase-locked loops [2]. To reduce the bandwidth of the optical signal, Non Return to Zero (NRZ) coding is usually chosen, although clock extraction is harder for this coding scheme than for others, e.g. Return to Zero (RZ), since an NRZ signal contains no energy at the clock frequency. When using narrow-band filters it is therefore necessary to preprocess the NRZ signal in a nonlinear element before filtering the clock signal from the data. In addition to requiring preprocessing, a narrow-band filter is usually bulky and not suited for integration. A method better suited to integration is a Phase-Locked Loop (PLL), in which the phase of a local oscillator is aligned with the phase of the incoming data using a feedback loop. This solution and its implementation in a 0.8μm BiCMOS process featuring 13GHz fT bipolar transistors [3] are presented in this paper.
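
The claim that an NRZ signal carries no energy at the clock frequency, and that a nonlinear preprocessing step restores it, can be checked numerically. The sketch below is only an illustration: the oversampling factor, the random test pattern and the choice of a delay-and-XOR style edge detector as the nonlinear element are assumptions, not details taken from the paper.

```python
import cmath
import random

SAMPLES_PER_BIT = 8  # assumed oversampling factor for this illustration


def nrz_waveform(bits):
    """Hold each bit for one full bit period as +1 / -1 (NRZ)."""
    return [1.0 if b else -1.0 for b in bits for _ in range(SAMPLES_PER_BIT)]


def edge_pulses(bits):
    """Assumed nonlinear element: a half-bit-wide pulse after every data transition."""
    half = SAMPLES_PER_BIT // 2
    out = []
    prev = bits[0]
    for b in bits:
        pulse = 1.0 if b != prev else 0.0
        out.extend([pulse] * half + [0.0] * (SAMPLES_PER_BIT - half))
        prev = b
    return out


def clock_line_magnitude(samples):
    """Magnitude of the single DFT bin located at the bit-rate (clock) frequency."""
    acc = sum(x * cmath.exp(-2j * cmath.pi * n / SAMPLES_PER_BIT)
              for n, x in enumerate(samples))
    return abs(acc)


rng = random.Random(0)  # fixed seed so the run is repeatable
bits = [rng.randint(0, 1) for _ in range(256)]
nrz_line = clock_line_magnitude(nrz_waveform(bits))
edge_line = clock_line_magnitude(edge_pulses(bits))
print(f"NRZ clock line: {nrz_line:.3e}, after edge detection: {edge_line:.3f}")
```

Each NRZ bit spans exactly one clock period, so its contribution to the clock-frequency bin cancels. The half-bit pulses all add with the same phase, which is why a narrow-band filter has something to extract after preprocessing.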

2. CDR ARCHITECTURE

A block diagram of the CDR circuit is shown in Figure 1. The Phase and Frequency Detector (PFD) compares the arriving data with the clock generated by the Voltage Controlled Oscillator (VCO), generating an error-signal that is filtered in the Loop Filter (LF) and fed back to the VCO. A basic Master-Slave flip-flop is used to regenerate and synchronize the data, using the clock signal from the VCO.

Figure 1. CDR block diagram

The PFD used, illustrated in Figure 2, is a minor variation of the one presented by Pottbäcker et al. [2]. The PFD consists of two identical Phase-Detectors (PD), a Frequency-Detector (FD) and a Loop Filter Driver. In the two PDs the data is sampled with the in-phase and quadrature clocks respectively, resulting in two beat notes, at nodes Q1 and Q2, with a frequency equal to the difference between the data bit rate and the VCO clock frequency. By examining the phase relation between the two beat notes, the FD can determine whether the VCO frequency is higher or lower than the data bit rate.

0-7803-3073-0/96/$5.00 ©1996 IEEE

Figure 2. PFD block diagram

In previous work [2], the outputs from one PD and the FD are summed in the analogue domain, resulting in a ternary output from the PFD. To avoid this, and associated problems, the PFD outputs are processed digitally in the Loop Filter Driver.
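
The frequency detector's decision, taken from the phase relation between the two beat notes, can be modelled digitally in an idealized way. The sketch below assumes that Q1 and Q2 are perfect square waves at the difference frequency, exactly 90 degrees apart, and that the FD samples Q2 on rising edges of Q1. The function names and the sign convention (+1 meaning the VCO is too slow) are hypothetical and do not describe the circuit in Figure 2.

```python
import math


def beat_notes(f_data, f_vco, t):
    """Idealized beat notes at time t: two square waves in quadrature (assumption)."""
    phase = 2.0 * math.pi * (f_data - f_vco) * t
    return math.sin(phase) >= 0.0, math.cos(phase) >= 0.0


def frequency_error_sign(f_data, f_vco, beats=20, samples_per_beat=100):
    """Return +1 if the VCO is too slow, -1 if too fast, 0 if the frequencies match."""
    delta = abs(f_data - f_vco)
    if delta == 0.0:
        return 0
    dt = 1.0 / (delta * samples_per_beat)
    prev_q1, _ = beat_notes(f_data, f_vco, 0.0)
    votes = 0
    for k in range(1, beats * samples_per_beat):
        q1, q2 = beat_notes(f_data, f_vco, k * dt)
        if q1 and not prev_q1:          # rising edge of Q1
            votes += 1 if q2 else -1    # the level of Q2 reveals which note leads
        prev_q1 = q1
    return (votes > 0) - (votes < 0)


print(frequency_error_sign(2.488e9, 2.3e9))
print(frequency_error_sign(2.488e9, 2.7e9))
```

Which beat note leads depends only on the sign of the frequency difference, so a purely binary decision is enough to steer the loop.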

The acquisition procedure consists of two stages: frequency-acquisition and phase-acquisition. During frequency-acquisition, the PD output will have a duty cycle very close to 50%, and therefore cannot be used to drive the VCO towards lock. In this case the outputs from the FD, given in Table 1, are used to make sure that the VCO control voltage is pulled in the right direction. Due to the digital nature of the control loop, there is no steady-state for the VCO control voltage. The active area occupied by the PLL is 0.54mm² and the power dissipation is approximately 250mW.
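
The absence of a steady state in a loop driven by binary decisions can be illustrated with a toy model. The sketch below is a deliberate simplification, not the loop of the paper: it steps the VCO frequency directly instead of integrating a control voltage, it ignores phase entirely, and the step size and starting frequency are arbitrary assumptions.

```python
def bang_bang_loop(f_target_mhz, f_start_mhz, step_mhz, n_updates):
    """Step the VCO frequency up or down by a fixed amount on every binary decision."""
    f = f_start_mhz
    history = []
    for _ in range(n_updates):
        f += step_mhz if f < f_target_mhz else -step_mhz
        history.append(f)
    return history


# Integer MHz values keep the arithmetic exact in this illustration.
history = bang_bang_loop(f_target_mhz=2488, f_start_mhz=2400, step_mhz=5, n_updates=40)
print(history[-4:])
```

Once the target is reached, the model keeps toggling between the two nearest values instead of settling.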

Figure 3. VCO schematic

Table 1. Phase and Frequency Detector states

<table>
<thead>
<tr>
<th>State</th>
<th>Q1</th>
<th>Q2</th>
</tr>
</thead>
<tbody>
<tr>
<td>Freq. acq. (f_{VCO} &lt; f_{data})</td>
<td>Low</td>
<td>High</td>
</tr>
<tr>
<td>Freq. acq. (f_{VCO} &gt; f_{data})</td>
<td>High</td>
<td>Low</td>
</tr>
<tr>
<td>Phase acq. (f_{VCO} = f_{data})</td>
<td>High</td>
<td>High</td>
</tr>
</tbody>
</table>
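
Table 1 can be read as a small lookup from the two node levels to the loop state. The sketch below transcribes the table literally. Encoding High as `True`, the state strings and the fallback for the Low/Low combination (which the table does not list) are assumptions made for this example.

```python
# Keys are (Q1 high?, Q2 high?) exactly as listed in Table 1.
PFD_STATES = {
    (False, True): "frequency acquisition, f_VCO < f_data",
    (True, False): "frequency acquisition, f_VCO > f_data",
    (True, True): "phase acquisition, f_VCO = f_data",
}


def pfd_state(q1_high, q2_high):
    """Return the Table 1 state for the given levels, or a note if it is not listed."""
    return PFD_STATES.get((q1_high, q2_high), "not listed in Table 1")


print(pfd_state(False, True))
print(pfd_state(True, True))
```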

3. VOLTAGE CONTROLLED OSCILLATOR

The PFD used requires two clock signals with a 90-degree phase shift. If the phase shift is not exactly 90 degrees the PFD will still work, but with an asymmetric pull-in range. A VCO that generates quadrature clock signals with an accurate phase shift is shown in Figure 3. It is based on the VCO presented in [4] by Razavi and Sung.

In the process available to us, the VCO in [4] would give an oscillation frequency of approximately 5GHz. However, the target frequency of our work is 2.488GHz (STM-16/STS-48), and this factor of two can be used to get both in-phase and quadrature outputs from the VCO. To reduce the center frequency of the VCO to the required 2.5GHz, a 6-stage ring-oscillator is used, with two ring-stages between the inputs of transconductance amplifiers connected to the same common load. By connecting transconductance amplifiers to the remaining ring-stages, an exact quadrature output is achieved. The time between transitions on the clock is the delay through two ring-stages, which means that unequal loading of the ring-stages translates directly into jitter. To eliminate this cause of jitter, a layout with abutted cells was used, alternating between ring-stages and transconductance amplifiers and eliminating the need for interconnect in the ring. To generate the ring, we used the fact that any odd number of inversions in the ring will give oscillation, and that the transconductance stages can be made either inverting or non-inverting by interchanging the outputs.

The ring-stages are basic ECL buffers, with the VCO control voltage applied to the current sources in the emitter-followers to reduce variations in voltage swing. An additional control voltage for the current sources in the differential stage of the ECL buffers is available, and can be used to adjust the center frequency of the VCO. The VCO occupies an area of 0.11mm², with a power dissipation of approximately 150mW and a tuning range of 400MHz centered around 2.5GHz.
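
The timing argument can be followed with a little arithmetic. The sketch below takes from the text that the output toggles once per two ring-stage delays. It then assumes, as inferences rather than statements from the paper, that the VCO of [4] toggles once per single stage delay (consistent with the quoted factor of two) and that the quadrature output is taken one stage delay later than the in-phase output. All function names are hypothetical.

```python
def stage_delay_from_frequency(f_out_hz, stages_per_half_period):
    """Stage delay implied by an output that toggles every `stages_per_half_period` delays."""
    return 1.0 / (2 * stages_per_half_period * f_out_hz)


def output_frequency(stage_delay_s, stages_per_half_period):
    """Output frequency when one half period spans the given number of stage delays."""
    return 1.0 / (2 * stages_per_half_period * stage_delay_s)


def phase_offset_deg(stage_offset, stages_per_half_period):
    """Phase shift between two outputs taken `stage_offset` stage delays apart."""
    return 180.0 * stage_offset / stages_per_half_period


t_d = stage_delay_from_frequency(2.5e9, stages_per_half_period=2)
print(f"implied stage delay: {t_d * 1e12:.0f} ps")
print(f"one delay per half period: {output_frequency(t_d, 1) / 1e9:.1f} GHz")
print(f"one-stage offset: {phase_offset_deg(1, 2):.0f} degrees")
```

Under these assumptions a 2.5GHz output implies a stage delay of about 100ps, the same stage toggling once per delay would run near 5GHz, and an offset of one stage corresponds to a quarter period.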

4. DEMULTIPLEXER

The block diagram for the demultiplexer is shown in Figure 4. A basic tree-structure with Master-Slave (MS) and Phase-Shifted (PS) flip-flops is used, with data-regeneration performed by the first flip-flop. The clock can be supplied by either the on-chip PLL or an external clock generator.

The 1:4 operation is implemented in ECL, while the final stage, as well as the bit-rotation, is implemented in CMOS using the True Single-Phased Clock (TSPC) approach [5]. In the final stage positive and negative edge-triggered dynamic CMOS flip-flops are used. To synchronize the outputs, P-type latches are added to the outputs of the positive edge-triggered CMOS flip-flops. The size of the demultiplexer including the bit-rotation block is 0.42mm², and its power dissipation is approximately 400mW.
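
The tree structure can be modelled behaviourally as three cascaded 1:2 splits, each at half the rate of the previous one. The sketch below is only a functional model under that assumption: it ignores clocking, the MS/PS flip-flop distinction and the ECL/CMOS partitioning, and its function names are hypothetical.

```python
def demux_tree(bits, depth=3):
    """Split a serial stream with `depth` levels of 1:2 demultiplexing.

    Returns (offset, lane) pairs in tree order, where `offset` is the position
    of the lane's bits within each group of 2**depth serial bits.
    """
    lanes = [(0, list(bits))]
    stride = 1
    for _ in range(depth):
        next_lanes = []
        for offset, lane in lanes:
            next_lanes.append((offset, lane[0::2]))           # even-position bits
            next_lanes.append((offset + stride, lane[1::2]))  # odd-position bits
        lanes = next_lanes
        stride *= 2
    return lanes


def parallel_words(bits, depth=3):
    """Reorder the tree outputs by serial position and group them into words."""
    lanes = sorted(demux_tree(bits, depth), key=lambda pair: pair[0])
    n_words = len(bits) // len(lanes)
    return [[lane[k] for _, lane in lanes] for k in range(n_words)]


stream = list(range(16))  # labels 0..15 stand in for consecutive serial bits
print([offset for offset, _ in demux_tree(stream)])
print(parallel_words(stream))
```

In this model the leaves of the tree appear in bit-reversed order (0, 4, 2, 6, 1, 5, 3, 7), so the outputs have to be wired back into serial order before they form an octet.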

4.1. ECL-to-CMOS converter

The ECL-to-CMOS converter used here uses a current mode approach, to avoid having MOS transistors directly driven by low-swing nodes. A schematic for the ECL-to-CMOS converter is shown in Figure 5.

The converter consists of a Current Mode Logic (CML) stage, three CMOS current mirrors and a CMOS inverter for better driving ability. If the input A is high, the collector current of Q₁ is mirrored through the NMOS current mirror of M₂ and M₃, and further mirrored through the PMOS current mirror M₆ and M₇, pulling down the input to the inverter, thus resulting in a rising output. If the input is low, i.e. the complementary input Ā is high, the collector current of Q₂ is mirrored through M₃ and M₄, pulling the input to the inverter high, and thereby resulting in a low output.

4.2. Bit-rotation

The presented circuit is part of a Physical Layer Controller for cell-based ATM. During the cell-delineation procedure byte and cell boundaries must be determined, by scanning the bitstream for valid Header Error Control (HEC) fields [6].
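
A HEC check of the kind used in cell delineation can be sketched in a few lines. The example assumes the HEC definition of ITU-T I.432: a CRC-8 over the first four header octets with generator polynomial x^8 + x^2 + x + 1, with the pattern 01010101 added to the remainder. The paper itself only cites [6], and the function names are hypothetical.

```python
def crc8_atm(data):
    """Bitwise CRC-8 with generator x^8 + x^2 + x + 1 (0x07), MSB first, zero start value."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def hec(header4):
    """HEC octet for the first four header octets (CRC remainder XOR 0x55)."""
    return crc8_atm(header4) ^ 0x55


def header_is_valid(header5):
    """True if the fifth octet matches the HEC computed over the first four."""
    return len(header5) == 5 and hec(header5[:4]) == header5[4]


idle_header = bytes([0x00, 0x00, 0x00, 0x01])
print(hex(hec(idle_header)))
```

A delineation procedure would apply such a check at every candidate alignment until valid headers recur at the cell spacing.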

The logical operation of the demultiplexer is to grab eight consecutive bits from the serial bitstream, and present them to the subsequent circuitry in parallel. However, at start-up the eight bits will most likely not correspond to a single octet. During the cell delineation procedure, it is therefore necessary to be able to rotate the bits, corresponding to moving the octet-window in the serial bitstream. This operation is not unique to an ATM Physical Layer Controller, and is therefore implemented with the demultiplexer, to facilitate the use of the demultiplexer in other systems. A schematic for the bit-rotation circuit is shown in Figure 6.
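
Moving the octet-window can be modelled as selecting eight bits out of two consecutive demultiplexer outputs. The sketch below is a behavioural model only: the schematic of Figure 6 is not reproduced here, and the function name and the use of a previous-word/current-word pair are assumptions.

```python
def rotate_words(words, shift):
    """Re-align 8-bit words by `shift` bit positions (0..7) in the serial stream."""
    if not 0 <= shift < 8:
        raise ValueError("shift must be in the range 0..7")
    out = []
    for prev, cur in zip(words, words[1:]):
        window = prev + cur                 # 16 consecutive serial bits
        out.append(window[shift:shift + 8])
    return out


# Labels 0..23 stand in for consecutive serial bits, grouped as the demultiplexer sees them.
words = [list(range(0, 8)), list(range(8, 16)), list(range(16, 24))]
print(rotate_words(words, 3))
```

With a shift of 3, the first aligned octet starts at serial bit 3, so stepping the shift from 0 to 7 covers every possible octet boundary.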

5. FABRICATION

The presented circuit has been fabricated in a standard 0.8μm BiCMOS process featuring 13GHz fT bipolar transistors [3], at Northern Telecom, through Canadian Microelectronics Corporation (CMC). A chip photo is shown in Figure 7. The chip contains several independent test-structures, to facilitate testing and characterization of the individual blocks. In the lower right of Figure 7, the VCO can be seen, while the upper right shows the CDR block, corresponding to Figure 1. The left hand side of Figure 7 contains the full circuit with CDR and demultiplexer. The total area of the chip is 7.3mm².

Figure 7. Micrograph of CDR with demultiplexer

6. MEASUREMENT RESULTS

Since it is the determining factor in proving the feasibility of the new architecture and obtaining the required speed, preliminary measurements were concentrated on the VCO. Measurements on the performance of the full circuit will be presented at the conference.

Figure 8 shows the measured and simulated VCO oscillation frequency versus the applied control voltage. Simulation results are shown for worst, best and typical process corners. It is seen that the measured oscillation frequency is a bit lower than expected from simulations, but within what could be caused by process variations. Adjustments to the current sources in the CML stages of the VCO ring can easily be used to move the entire curve up or down, thereby centering the VCO tuning range around the target frequency of 2.5GHz.

7. CONCLUSION

In this paper a clock- and data-recovery circuit, combined with a demultiplexer and bit-rotation circuitry, was presented. This circuit is an important part of a fully integrated Physical Layer Controller for a 2.5Gb/s ATM system, but can be used in other systems operating at this bit rate. The circuit was implemented in BiCMOS, with the high-speed parts using Emitter Coupled Logic, while dynamic CMOS was used for the last demultiplexing stage and for the bit-rotation logic.

As part of the PLL, a novel VCO configuration, generating high-speed quadrature clock signals, was presented.

8. ACKNOWLEDGEMENTS

The authors would like to thank the Danish Research Academy (Forskerakademiet) and Micronet for financial support.

REFERENCES