## A 10 Gb/s bit interleaving CDR for low-power PON

C. Van Praet, G. Torfs, Z. Li, X. Yin, D. Suvakovic, H. Chow, X.-Z. Qiu and P. Vetter

A novel, low-power downstream clock and data recovery (CDR) decimator architecture is proposed for next-generation, energy-efficient 10 Gb/s optical network units (ONUs). The architecture employs a new time division multiplexing (TDM) bit-interleaving downstream concept for passive optical networks (Bi-PON), allowing early decimation of the incoming data and lowering of the processing speed to the user rate of the ONU, thus reducing the power consumption significantly.

*Introduction:* As the demand for broadband services keeps rising, data-rates in access networks keep increasing. Despite technological improvements, energy consumption in high-speed access networks is expected to rise inevitably. Considering the vast number of subscribers, the total power consumption in the ONUs is of major concern. Present-day ecological awareness, increasing energy costs and thermal concerns require drastic measures to be taken.

Because of the very low losses in optical fibers, PON in itself is a very energy-efficient technology. Standard downstream time division multiplexing PON (TDM-PON) protocols, however, are inherently energy-inefficient as they operate at frame level. This requires an ONU to process every frame header at the aggregated line-rate, even though the majority of the packets are intended for other users. Data within frames has to be processed at full line-rate speed until the ONU determines its designated receiver. In conventional TDM-PON, a burst-mode transmission operation is used in the upstream direction. Its operation is relatively more energy-efficient than the downstream direction because of its inherent load dependency.

Both ITU-T and IEEE have recently opened the discussion on the energy-saving potential of PONs [1–4], which led to intensive research to address this problem [2–5]. Thus far, attention has mostly been focussed on the possibility of introducing sleep-mode techniques into standard PON systems. Ideally, an ONU with sleep-mode support would only be awake during reception of its own payload. In practice, however, non-negligible sleep/wake transition times [2] and quality of service (QoS) requirements [3] limit the efficacy of these techniques.

Theoretically, the lower limit for the power-consumption in ONUs is dictated by the actual user-rate, which typically is a fraction of the aggregated line-rate. While conventional TDM-PON protocols are inherently operating at line-rate, the proposed bit-interleaving CDR (Bi-CDR) takes advantage of the line-rate – user-rate discrepancy. In every frame period, the Bi-CDR decimates and offsets the downstream payload reception based on Bi-PON frame header information. Because the interleaving takes place on bit-level, the bits intended for a specific user are uniformly spread in time. This allows ONUs to sample data at their own respective user rate. In this way, downstream bandwidth is dynamically allocated through header information, preceding the encapsulated bit-interleaved payload.
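
The bit-level interleaving described above can be illustrated with a minimal sketch. It assumes, for simplicity, four hypothetical users that share the line equally (decimation rate 4 each); the function names `interleave` and `decimate` are illustrative and not part of the Bi-PON specification, which supports different rates per user.

```python
from typing import List


def interleave(user_bits: List[List[int]]) -> List[int]:
    """Round-robin bit-interleave equally long per-user bit lists."""
    n_bits = len(user_bits[0])
    return [bits[i] for i in range(n_bits) for bits in user_bits]


def decimate(stream: List[int], rate: int, offset: int) -> List[int]:
    """Keep every `rate`-th bit of `stream`, starting at index `offset`."""
    if not 0 <= offset < rate:
        raise ValueError("offset must lie in [0, rate)")
    return stream[offset::rate]


# Four hypothetical users, three bits each (illustrative data only).
users = [[1, 1, 0], [0, 0, 0], [1, 0, 1], [0, 1, 1]]
line = interleave(users)

# The ONU of user 2 samples only its own bits, at a quarter of the line rate.
recovered = decimate(line, rate=4, offset=2)
```

Because a user's bits are evenly spaced on the line, the receiver only needs a sampling clock at the user rate with the correct phase; it never has to buffer or inspect the bits of other users.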

*Bit-Interleaving concept and Bi-CDR operation:* The Bi-PON protocol [6] allows TDM to work on a bit basis rather than a frame basis. As is common in other PON protocols, the payload is encapsulated in a frame and is preceded by a header, which informs ONUs about how to retrieve their respective data. Fig. 1 shows this dedicated Bi-PON frame format, which has a fixed length of 125 µs, and its different header fields: synchronization patterns, identification numbers (IDs), reserved bits and the bandwidth (BW) map.



Fig. 1. The Bi-PON protocol frame format

Inside the Bi-CDR, the incoming bits are first down-sampled by a factor of 256, yielding 256 separate channels, one of which is processed.

Every channel contains a synchronization pattern, followed by a channel identifier (0..255) and configuration information. Once a synchronization pattern has been found, the channel ID can be obtained. Comparison with the ONU ID yields the phase offset of the down-sampler. The system then adjusts the clock-phase of the down-sampler to lock on its proper channel, waits for the next frame, detects the synchronization pattern and confirms that now the channel ID indeed matches the ONU ID. Now the system is ready to read the appropriate configuration fields.
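
The locking procedure above amounts to a modular offset calculation. The sketch below assumes that channel IDs increase by one per line-rate bit position within the header, so that shifting the sampling phase by one bit period moves the sampler to the next channel; this ordering, and the names `observed_channel` and `phase_correction`, are assumptions made for illustration.

```python
LANES = 256  # header down-sampling factor stated in the text


def observed_channel(sampling_phase: int, frame_start: int, lanes: int = LANES) -> int:
    """Channel ID the down-sampler reads for a given sampling phase.

    `frame_start` is the (unknown to the receiver) bit position of channel 0.
    """
    return (sampling_phase - frame_start) % lanes


def phase_correction(channel_id: int, onu_id: int, lanes: int = LANES) -> int:
    """Number of bit periods to shift the sampling phase to reach `onu_id`."""
    return (onu_id - channel_id) % lanes


# First frame: the sampler starts on an arbitrary phase and reads some channel ID.
frame_start, phase, onu_id = 37, 5, 42
first_id = observed_channel(phase, frame_start)

# Adjust the clock phase of the down-sampler by the computed offset.
phase = (phase + phase_correction(first_id, onu_id)) % LANES

# Next frame: the channel ID read should now equal the ONU ID.
second_id = observed_channel(phase, frame_start)
```

In this model a single correction always suffices; the second frame serves only as confirmation, which matches the two-frame sequence described in the text.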

The field following the ID, as shown in Fig. 1, is "reserved" and offers the possibility to pass specific instructions from the optical line terminal (OLT) to an ONU, allowing implementation of centrally controlled sleep modes to further reduce energy waste in periods of low activity. Next, the bandwidth map field provides the particular payload decimation rate (8, 16, 32, 64, 128, 512 or 1024) and the payload offset (see Fig. 1), yielding all information required to configure the payload down-sampler for the current frame. Specific instructions, such as sleep modes, can also be passed on to the Bi-CDR itself to further reduce power consumption in network devices.

*Implementation of Bi-CDR:* Fig. 2 shows the building blocks of the Bi-CDR, namely a PLL, a clock generation block, and two separate data paths to process header and payload individually.

The data is first sent through a pre-amplifier, which restores the noise-impaired signal to logic levels. Thereafter, clock and data are recovered by means of a phase locked loop (PLL). A clock-divider circuit divides the acquired 10 GHz clock by 8 and hence generates 8 equidistant 1.25 GHz clock phases. Two of those are selected by the subsequent CMOS logic to resample the incoming data, effectively decimating the data by a factor of 8 for the first time and yielding a "header" path and a "payload" path.
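
The timing implied by the divide-by-8 stage can be worked out with a short sketch. It assumes that clock phase `k` is aligned with line-rate bit `k` of each group of eight; that alignment and the helper name `bits_sampled_by_phase` are illustrative assumptions.

```python
LINE_RATE_HZ = 10_000_000_000  # recovered 10 GHz clock
DIVIDER = 8                    # clock-divider ratio from the text
PS_PER_S = 10**12

bit_period_ps = PS_PER_S // LINE_RATE_HZ        # one line-rate bit period
phase_clock_hz = LINE_RATE_HZ // DIVIDER        # frequency of each clock phase
phase_period_ps = PS_PER_S // phase_clock_hz    # period of each clock phase

# Eight equidistant phases: neighbouring phases differ by one bit period.
phase_delays_ps = [k * bit_period_ps for k in range(DIVIDER)]


def bits_sampled_by_phase(k: int, n_samples: int) -> list:
    """Line-rate bit indices captured when resampling with clock phase k."""
    return [k + DIVIDER * i for i in range(n_samples)]
```

Selecting one phase therefore picks one of eight interleaved sub-streams, so choosing a different phase is equivalent to changing the decimation offset by whole bit periods.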



Fig. 2. The proposed Bi-CDR architecture

This early decimation reflects the strategy behind the novel bit-interleaving TDM-PON concept, i.e. to reduce the processing speed as early on in the architecture as possible. Only the pre-amplifier, PLL and clock-generation block operate at 10 Gb/s, the aggregated line-rate, and are implemented in current-mode logic (CML). The remaining building blocks operate at the much lower user-rate, enabling pure CMOS implementation.

The "header" path is further down-sampled by a factor of 32 in pure CMOS. This yields 256 separate channels, as explained before. The "sync detection" block reads one of those channels, looking for a synchronization pattern. Once found, the particular channel ID can be read and compared to the ONU ID. Calculation of the offset allows reconfiguration of the decimator, i.e. picking other clock phases for down-sampling. When the header of the next frame arrives, the synchronization pattern is detected again and the system confirms that the obtained channel ID now matches the ONU ID. The remainder of the header now needs descrambling, as both the downstream bandwidth map and the payload are scrambled using a frame-synchronous additive scrambling polynomial $(1 + x^{-18} + x^{-23})$. To realize descrambling at the decimated rate, without decimating the entire scrambling sequence, header and payload need separate descrambling blocks because both offset and down-sampling factor differ. The payload parser monitors the length of a frame and signals the sync-detection block to start looking for a new synchronization pattern.
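
Why descrambling can be done after decimation follows from the additive nature of the scrambler: XOR acts bit by bit, so the receiver only needs the scrambling bits at the positions it actually samples. The sketch below checks this by brute force, generating the full sequence and then decimating it, which is exactly what the hardware avoids. The all-ones seed, the frame length and the helper names are assumptions for illustration; the text does not specify the preset value or how the chip computes the decimated sequence directly.

```python
import random
from typing import List


def scrambler_sequence(length: int, seed: List[int]) -> List[int]:
    """Bits obeying s[n] = s[n-18] XOR s[n-23], continued from a 23-bit seed."""
    if len(seed) != 23 or not any(seed):
        raise ValueError("seed must be 23 bits and not all zero")
    s = list(seed)
    while len(s) < length + 23:
        s.append(s[-18] ^ s[-23])
    return s[23:23 + length]


def xor_bits(a: List[int], b: List[int]) -> List[int]:
    """Bitwise XOR of two equally long bit lists."""
    return [x ^ y for x, y in zip(a, b)]


rng = random.Random(0)
n_bits = 4096                                   # illustrative frame length
payload = [rng.randint(0, 1) for _ in range(n_bits)]
seq = scrambler_sequence(n_bits, seed=[1] * 23)  # assumed preset value
line = xor_bits(payload, seq)                    # scrambled line-rate stream

rate, offset = 16, 5                             # as read from the bandwidth map
received = line[offset::rate]                    # decimate first ...
local_seq = seq[offset::rate]                    # ... then use the matching bits
recovered = xor_bits(received, local_seq)
```

Since header and payload use different offsets and decimation factors, each needs its own decimated sequence, which is consistent with the separate descrambling blocks mentioned in the text.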

After decimation of the header, the "payload" path can be correctly configured, requiring both an offset and a decimation rate as the latter is configurable. The payload is descrambled by directly determining the decimated and offset scrambling sequence. The payload parser monitors the length of the frame, stops data output when necessary and signals the sync-detection block to wait for a new frame.

Table 1: Power consumption measurements

| Decimation rate | User rate | Power @ 1.2 V [mW] | Power @ 2.5 V [mW] |
|-----------------|-----------|--------------------|--------------------|
| 16              | 625 Mb/s  | 17.5               | 125                |
| 32              | 312 Mb/s  | 14.3               | 128                |
| 64              | 156 Mb/s  | 11.4               | 135                |
| 128             | 78 Mb/s   | 9.9                | 133                |
| 256             | 39 Mb/s   | 9.1                | 134                |
| 512             | 20 Mb/s   | 8.8                | 134                |
| 1024            | 10 Mb/s   | 8.7                | 134                |
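
The user-rate column follows from dividing the 10 Gb/s line rate by the decimation rate; the tabulated values appear to be rounded. The sketch below reproduces that arithmetic and adds the two supply columns per row. Treating this sum as an approximation of the total dissipation is an assumption made here, and the names `user_rate_mbps` and `supply_sum_mw` are illustrative.

```python
LINE_RATE_MBPS = 10_000  # 10 Gb/s aggregated line rate

# decimation rate -> (power at 1.2 V [mW], power at 2.5 V [mW]), from Table 1
TABLE_1 = {
    16: (17.5, 125),
    32: (14.3, 128),
    64: (11.4, 135),
    128: (9.9, 133),
    256: (9.1, 134),
    512: (8.8, 134),
    1024: (8.7, 134),
}


def user_rate_mbps(decimation: int) -> float:
    """Exact user rate in Mb/s for a given decimation rate."""
    return LINE_RATE_MBPS / decimation


def supply_sum_mw(decimation: int) -> float:
    """Sum of the two measured supply powers for one table row."""
    p_1v2, p_2v5 = TABLE_1[decimation]
    return p_1v2 + p_2v5


for decimation in TABLE_1:
    print(f"1/{decimation}: {user_rate_mbps(decimation):.1f} Mb/s, "
          f"{supply_sum_mw(decimation):.1f} mW")
```

The 1.2 V figure roughly halves between decimation rates 16 and 1024, while the 2.5 V figure stays within about 10 mW across all rows.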

*Performance and result:* The Bi-CDR has been fabricated in a 0.13 µm BiCMOS process. A die micrograph and a picture of the test board are shown in Fig. 3. The entire chip dissipates merely 146 mW. This number includes the pre-amplifier for the incoming 10 Gb/s data and low voltage differential signalling (LVDS) output buffers for both payload and double data-rate (DDR) clock. It assumes continuous operation. Depending on the clients' usage pattern, the average power consumption could be much lower if sleep-mode operation is employed. Table 1 gives more details about the power consumption as a function of the decimation rate for both supply voltages. The 1.2 V supply drives the CMOS logic and the 2.5 V supply drives the CML logic (PLL, clock phase generation, first sampling stages, clock phase multiplexers and LVDS output buffers).

As expected, the CML power consumption remains more or less constant, while the CMOS power consumption scales well with the user-rate. The CMOS power consumption has a lower bound because of the Bi-PON header processing logic which, unless sleep modes are issued, runs at every new frame and always at 1/256th of the line-rate. The clear scaling in the CMOS fraction of the power consumption will also impact the energy efficiency of subsequent blocks, such as the notably power-hungry forward error correction (FEC) block. More advanced processes (e.g. 65 nm or 45 nm CMOS) would allow logic to be moved from CML to CMOS, further decreasing power consumption and improving scaling.

The pre-amplifier can correctly (bit-error rate < $10^{-12}$) recover input signals with a differential amplitude down to 75 mVpp. The subsequent flip-flop, driven by the clock generated by the PLL, correctly samples the incoming signal for duty cycles between 0.25 and 0.75. These figures were measured by injecting a pseudo-random bit-sequence (PRBS, $2^{23} - 1$) and comparing the output of the CDR with the bits obtained by down-sampling that same PRBS. The jitter on the 800 ps clock amounts to 5.4 ps RMS and 32 ps peak-to-peak.



Fig. 3. Die micrograph and test-board

*Conclusion:* The Bi-PON protocol is a solution to increase energy efficiency in PONs, a problem requiring urgent attention. The protocol relies on dynamic data decimation on bit-level early in the ONU. The elegance of the Bi-PON solution is clearly reflected in the Bi-CDR architecture. The power consumption of Bi-PON compatible ONUs scales well with the effective momentary user-bandwidth, as the throughput can be fine-tuned on a frame basis (unlike sleep modes) without harming QoS or quality of experience (QoE) (unlike speed throttling).

Acknowledgment: This work has been supported by Alcatel-Lucent.

C. Van Praet, G. Torfs, Z. Li, X. Yin and X.-Z. Qiu (*IMEC/INTEC\_design*, 9000 Gent, Belgium)

E-mail: christophe.van.praet@intec.ugent.be

D. Suvakovic, H. Chow, P. Vetter (*Bell Labs, Alcatel-Lucent, Murray Hill, NJ 07974, USA*)

## References

- 1 ITU-T G. Sup45: GPON power conservation, 2009
- 2 Suzuki, N., Kobiki, K., Igawa, E., Nakagawa, J.: 'Dynamic Sleep-Mode ONU with Self-Sustained Fast-Lock CDR for Power Saving in 10G-EPON Systems', ECOC 2011, We.8.C.4
- 3 Kanonakis, K., Zhang, J., Cvijetic, N., Tomkos, I., Wang, T.: '1G/10G-EPON Compliant Scheme Combining ONU Energy Efficiency with QoS Performance', OFC'12, JTh2A.52
- 4 IEEE P802.3az Energy Efficient Ethernet Task Force
- 5 Shi, L., Mukherjee, B., Lee, S.-S.: 'Energy-efficient PON with sleep-mode ONU: progress, challenges, and solutions', *IEEE Network*, 2012, 26, pp. 36-41
- 6 Chow, H., Suvakovic, D., van Veen, D., Dupas, A., Boislaigue, R., Farah, R., Fai Lau, M., Galaro, J., Qua, G., N. Anthapadmanabhan, P., Torfs, G., Van Praet, C., Yin, X., Vetter, P.: 'Demonstration of Low-Power Bit-Interleaving TDM PON', *ECOC 2012*, Mo.2.B.1