Abstract-This paper presents a scalable, energy-efficient MAC/PHY protocol for building a metro/access network. The proposed cascaded bit-interleaving (CBI) protocol extends the previously reported bit-interleaving concept to a multi-level paradigm. Moreover, a 40Gb/s 3-level electrical duobinary based physical layer scheme is proposed for cost and energy saving, especially for end terminals. We compare two implementation approaches in terms of optical budget and transmission penalties. The initial estimate from the proof-of-concept full-custom ASIC design shows that an ultra-low power metro/access network can be realized.
I. INTRODUCTION
The evolution of cloud computing and cloud-based services offers service providers a source of revenue beyond their traditional business of providing connectivity. However, supporting these services for residential and business customers poses a new set of challenges for providers' metro and access networks. For example, these new services tend to require lower network latency and a wider, more dynamic range of bandwidth, depending on the type and duration of the service. Traditionally, service providers build their metro and access networks using carrier-class Ethernet switches/routers and passive optical networks (PONs), respectively. Although these technologies all have a certain degree of bandwidth control and quality of service (QoS) capability, their line rate and switching capacity are often fixed. To support a higher instantaneous bandwidth demand, service providers are forced to upgrade their networks with higher capacity line cards. However, because the power consumption of a network element increases with its line rate and capacity, higher capacity line cards will inevitably consume more power. For example, a 2-port and a 4-port 10GE line card from Cisco [1] consume about 269W and 371W, respectively. Energy is wasted when the utilization of such a network element is low. The use of a sleep-mode protocol [2] may help alleviate the problem, but it incurs additional network latency and packet delay. As reported previously [3], [4], the bit-interleaving protocol offers unique benefits for lowering the power consumption of a PON optical network terminal (ONT) by an order of magnitude. In this paper, we present an extension to this concept, called cascaded bit-interleaving (CBI), suitable for metro/edge/access network applications. CBI extends the original bit-interleaving protocol to a multi-level paradigm.
While the proposed CBI protocol can be applied to both mesh optical networks and passive optical networks, without loss of generality this paper uses the latter as a use case, namely CBI-PON, for discussion. Fig. 1 illustrates a 40Gb/s CBI-PON which consists of three stages of bit-interleaving PONs (Bi-PONs). In this example, the CBI-PON is served by a single optical line terminal (OLT), namely the CBI-Interleaver. Users (or user networks) are connected to the CBI-PON through the CBI-End-Terminals (eONTs). Lower level Bi-PONs are connected to their upper level network through the CBI-Repeaters. Notice that the CBI-PON is highly scalable and flexible: in principle, both CBI-eONTs and CBI-Repeaters can be mixed in one stage. Moreover, depending on the traffic loading condition or user demand, an individual Bi-PON may operate at different instantaneous line rates.
The remainder of the paper is organized as follows. Section II investigates the physical layer design of the 40Gb/s CBI-PON and proposes a 40Gb/s downstream scheme with an electrical 3-level duobinary modulation. Two different implementation approaches are compared in terms of receiver signal-to-noise ratio (SNR) requirement and transmission penalties. Section III introduces the CBI protocol and describes the downstream (DS) and upstream (US) frame structures. Section IV shows a proof-of-concept low-power CMOS ASIC implementation. Simulation results and a power consumption comparison are presented in Section V. Finally, Section VI concludes the paper.
II. 40Gb/s CBI-PON PHYSICAL LAYER CONSIDERATION
The proposed CBI-PON extends the coverage span and increases the number of customers by cascading Bi-PON sections. However, sharing a single CBI-PON among more customers reduces the sustained available bandwidth per customer, which requires increasing the serial data rate at the primary stage, e.g., to 40Gb/s DS and 10Gb/s US, to support business-class applications or other bandwidth-demanding services. To relax the chromatic dispersion (CD) and component speed requirements, especially for the avalanche photodiode (APD) at the receiver side, we propose a 3-level electrical duobinary modulation scheme operating in the O-band for the 40Gb/s DS in the CBI-PON. 3-level duobinary modulation relaxes the downstream channel bandwidth requirement to ~20 GHz and thus improves the CD tolerance significantly compared to the non-return-to-zero (NRZ) format. Unlike optical duobinary modulation (ODB), this allows 25-Gbit/s components (especially APDs) to be employed in CBI-Repeaters/eONTs, thus reducing the cost and power consumption.
There are two approaches to generating the 3-level duobinary signal: the receiver-generation approach and the transmitter-generation approach. The receiver-generation approach [5], [6] uses a receiver with a bandwidth of ¼ of the data rate, which further reduces the receiver bandwidth requirement to ~10 GHz, while the transmitter-generation approach employs a delay-and-add digital filter to create a 3-level duobinary signal at the transmitter side. For comparison, Fig. 2 illustrates the simplified block diagrams of both approaches. The following sub-sections compare the two approaches in terms of receiver SNR requirement and transmission performance.
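The delay-and-add filter of the transmitter-generation approach can be illustrated with a minimal sketch. The differential precoder and the modulo-2 decision rule below are the textbook companions of duobinary coding and are assumptions here; the text only specifies the delay-and-add step. All function names are hypothetical.

```python
def precode(bits, initial=0):
    """Differential precoder p[n] = b[n] XOR p[n-1] (assumed, not stated in the text)."""
    prev = initial
    out = []
    for b in bits:
        prev = b ^ prev
        out.append(prev)
    return out


def delay_and_add(bits, initial=0):
    """Delay-and-add filter y[n] = x[n] + x[n-1]; the output takes levels 0, 1 or 2."""
    prev = initial
    out = []
    for b in bits:
        out.append(b + prev)
        prev = b
    return out


def decode(levels):
    """With the precoder above, the data bit is the received level modulo 2."""
    return [lvl % 2 for lvl in levels]


data = [1, 0, 1, 1, 0, 0, 1]
levels = delay_and_add(precode(data))   # 3-level duobinary sequence
recovered = decode(levels)              # equals data when both initial states match
```

With the precoder, each symbol can be decided on its own, so a single level error does not propagate to the following bits.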
A. APD receiver SNR requirement
To determine the theoretical receiver SNR requirements for the 3-level duobinary modulation, we follow a procedure similar to that described in [7], but assume that the receiver noise is
where M and F are the multiplication factor and excess noise factor of the APD, respectively. The excess noise factor is a function of the ionization-coefficient ratio kA,
If we define the extinction ratio as the ratio between the high power level PS2 and the low power level PS0, i.e., re = PS2/PS0, we find the following relationship for finite extinction ratio re,
As shown in [7], the optimal threshold current is signal dependent and has a form of,
Based on those equations, we first evaluate the sensitivity performance numerically for the transmitter-generation approach. We assume that the 3dB bandwidth of the receiver is ~20 GHz, the input-referred rms noise current of the amplifier circuit is 2.5µA, and the responsivity R is 0.8A/W. For the APD, typical parameters of M=10 and kA=0.6 are used. An extra 2dB penalty margin has been included in all the calculations. Fig. 3 shows the calculated receiver sensitivity versus extinction ratio for various bit-error rate (BER) thresholds under the above assumptions. The sensitivity is about -19.7dBm at BER=1E-3 for a 9dB extinction ratio. This results in a power budget of ~29.7dB, assuming a +10dBm transmitter output power.
Fig. 4. APD receiver sensitivity versus extinction ratio when the 40-Gbit/s 3-level duobinary signal is generated at the receiver.
A similar numerical calculation for the receiver-generation approach is shown in Fig. 4. In this case, the sensitivity improves by 2.8dB, assuming a lower receiver bandwidth of ~10GHz and a smaller input-referred rms noise of 1.5µA. Another advantage is the potential for energy saving, as the opto-electronic front-ends operate at a much lower bandwidth than a full-rate receiver.
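The quoted figures can be cross-checked with a short sketch. It assumes the excess noise factor follows the widely used McIntyre expression F = kA·M + (1 - kA)(2 - 1/M), which may differ in notation from the form used in the paper, and it takes the power budget as transmitter power minus receiver sensitivity. The function names are hypothetical.

```python
def excess_noise_factor(m, k_a):
    """McIntyre excess noise factor (assumed form): F = kA*M + (1 - kA)*(2 - 1/M)."""
    return k_a * m + (1.0 - k_a) * (2.0 - 1.0 / m)


def power_budget_db(tx_power_dbm, sensitivity_dbm):
    """Optical power budget in dB: launch power minus receiver sensitivity."""
    return tx_power_dbm - sensitivity_dbm


# APD parameters quoted in the text.
f_apd = excess_noise_factor(10, 0.6)           # about 6.76

# Transmitter-generation approach: -19.7 dBm at BER = 1E-3, 9 dB extinction ratio.
tx_gen_sensitivity = -19.7
tx_gen_budget = power_budget_db(10.0, tx_gen_sensitivity)   # about 29.7 dB

# Receiver-generation approach: sensitivity 2.8 dB better.
rx_gen_sensitivity = tx_gen_sensitivity - 2.8                # about -22.5 dBm
rx_gen_budget = power_budget_db(10.0, rx_gen_sensitivity)    # about 32.5 dB
```

Under the same +10dBm launch power, the 2.8dB sensitivity gain of the receiver-generation approach carries over directly into the power budget.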
B. Dispersion penalty
The distortion induced by CD grows with the square of the bit rate. To investigate the CD tolerance of the proposed 40Gb/s link in the O-band, we have simulated the dispersion penalty using a system simulator built with RSoft Optsim and Matlab. The worst-case CD parameter in the O-band for standard single mode fiber (SSMF) [8], i.e., ~5.2ps/nm/km at 1360nm, has been used in the simulation. The simulated eye-diagrams of 40Gb/s 3-level duobinary transmission for both the receiver- and transmitter-generation approaches are shown in Fig. 5. In the back-to-back case, the delay-and-add filter gives a slightly better eye-diagram because the TX output eye has less ringing in the middle level. Both approaches show good CD tolerance: even after 40km transmission, 3-level signals with two distinct eyes can still be recognised in the eye-diagrams.
To better understand the difference in dispersion tolerance, we have simulated both approaches for various fiber lengths (0km, 8km, 16km, 24km, 32km, and 40km) and calculated the BER using Monte-Carlo techniques. We assumed a pre-FEC BER threshold of 1E-3 for the sensitivity simulation and computed the power penalties with respect to the back-to-back case of each approach. The resulting power penalties for the two approaches are listed in Table 1. For a typical uncompensated fiber length, e.g. ≤20km, the simulated power penalty is less than 3dB. The simulation results also show that the performance difference between the two implementation approaches is minimal for all transmission distances up to 40km.
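To put the simulated fiber lengths in perspective, the sketch below computes the accumulated dispersion at the worst-case O-band value quoted above and compares it with the 25ps bit period at 40Gb/s. It only illustrates the scaling; it does not reproduce the Optsim/Matlab simulation, and the 10Gb/s reference rate is an arbitrary choice made for the comparison.

```python
CD_PS_PER_NM_KM = 5.2  # worst-case O-band SSMF value quoted in the text


def accumulated_dispersion(length_km, cd=CD_PS_PER_NM_KM):
    """Accumulated chromatic dispersion in ps/nm over length_km of fiber."""
    return cd * length_km


def bit_period_ps(rate_gbps):
    """Bit period in picoseconds for a line rate given in Gb/s."""
    return 1000.0 / rate_gbps


def relative_cd_distortion(rate_gbps, ref_rate_gbps=10.0):
    """CD-induced distortion relative to a reference rate (square-law scaling)."""
    return (rate_gbps / ref_rate_gbps) ** 2


for km in (0, 8, 16, 24, 32, 40):
    print(f"{km:2d} km: {accumulated_dispersion(km):6.1f} ps/nm "
          f"(bit period {bit_period_ps(40):.0f} ps)")
```

The square-law scaling means a 40Gb/s signal sees 16 times the CD-induced distortion of a 10Gb/s signal over the same fiber, which is why the narrower duobinary spectrum matters.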
Fig. 5. Simulated eye-diagrams, panels (a) and (b), each shown back-to-back (B2B) and after 40km.
The US line rate of any Bi-PON is set to be a quarter of its operating DS line rate.
A CBI DS physical layer frame has a structure similar to that of the original Bi-PON protocol. It consists of a header section and a payload section. The header section is organized into a fixed number of lanes that are bit-interleaved together. Each header lane contains a 4-octet synchronization codeword (SYNC), a 2-octet repeater/eONT identifier (RNID) and a 76-bit bandwidth map/OAM (BWMAP) field. Immediately following the header section is a payload section. The number of header lanes and the size of the payload section scale proportionally with the DS line rate and the level of the Bi-PON, so that the DS frame time of a Bi-PON at any level remains 125µs. The lower level DS frames are interleaved together to form a higher level DS frame. Fig. 6 shows an example of a primary DS frame hierarchy and its composition.
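The frame arithmetic and the interleaving idea can be sketched as follows. The sketch assumes plain round-robin bit interleaving of equal-length lanes and uses a toy 3-bit lane length. The exact lane ordering and frame composition are defined by the protocol (Fig. 6) and are not reproduced here, and the function names are hypothetical.

```python
FRAME_US = 125                          # DS frame time in microseconds
HEADER_LANE_BITS = 4 * 8 + 2 * 8 + 76   # SYNC + RNID + BWMAP = 124 bits


def ds_frame_bits(rate_gbps):
    """Bits in one 125 µs DS frame (1 Gb/s carries 1000 bits per µs)."""
    return int(rate_gbps * 1000 * FRAME_US)


def interleave(lanes):
    """Round-robin bit interleaving of equal-length lanes (assumed ordering)."""
    if len({len(lane) for lane in lanes}) != 1:
        raise ValueError("all lanes must have the same length")
    return [bit for group in zip(*lanes) for bit in group]


def sample_lane(stream, n_lanes, lane_index):
    """Recover one lane by keeping every n_lanes-th bit of the stream."""
    return stream[lane_index::n_lanes]


lanes = [[1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 0]]   # toy lower-level data
stream = interleave(lanes)
lane2 = sample_lane(stream, n_lanes=4, lane_index=2)    # equals lanes[2]
```

Because a receiver only keeps every n-th bit, the logic behind the sampler can run at 1/n of the line rate, which is the source of the power saving claimed for bit-interleaving.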
A CBI-Repeater or eONT first detects the SYNC and matches its pre-configured identifier against the RNID. Then, it parses the BWMAP field, which carries information about (i) the sampling of the current payload data, (ii) the US bandwidth allocation of the next US frame and/or (iii) embedded OAM messages, if present. For all subsequent DS frames, a repeater simply samples the header and payload sections without performing any descrambling operation, while an eONT samples and descrambles only the payload section according to the DS map.
US transmission in the CBI-PON is a time-slot based burst transfer. A US PHY frame is 4 x 125µs long regardless of its transmission line rate. A US frame is divided into a number of fixed-size time slots, called quanta. The number of schedulable quanta depends on the US line rate of a Bi-PON. The CBI-Interleaver schedules all US transmissions in all Bi-PONs across the CBI-PON in terms of quanta. An eONT or repeater determines the start and duration of its transmission by decoding the allocation information from a US bandwidth map sub-field embedded in the BWMAP field of a DS PHY frame. A US allocation is an ordered pair (start, length), in which start indicates the quantum at which a CBI-Repeater/CBI-eONT shall begin its burst transfer, and length indicates how long (in number of quanta) the burst shall last. The first quantum (i.e., the beginning of a US frame) shall be synchronized with the beginning of the DS PHY frame in which the US bandwidth allocation is present. In other words, all US burst transfers in a CBI-PON are referenced to the beginning of the DS PHY frame that carries the US bandwidth allocation for the next US PHY frame. Fig. 7(a) illustrates the relationship between the DS and US transmission. As shown in Fig. 7(b), Ethernet frames are encapsulated in their native format (10-bit codewords, or TBCs) during US transfer so that no extra PCS processing overhead is added.
A CBI US burst transfer always begins with a train field. A train is a sequence of t codewords of /D21.5/, which is a 10-bit pattern of alternating "1" and "0". The train field is followed by a sequence of CBI-encapsulated Ethernet frames. All encapsulated Ethernet frames shall end with /T//R/ characters. The total number of usable TBCs in a US burst shall equal (length x 1944 - (t + g)), where g is the guard interval (in TBCs) at the end of a burst. During the guard interval, the transmitting laser shall be turned off. The CBI US transmission does NOT support Ethernet frame fragmentation. In other words, any unused time in the last allocated quantum shall be filled with IDLEs. A CBI-Repeater forwards the US bursts unmodified, but without the train and guard fields.
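The US allocation arithmetic can be sketched as follows. The sketch assumes that one quantum spans 1944 TBCs, inferred from the usable-TBC formula above, and that quanta are numbered from 0 at the start of the US frame. Neither is stated explicitly in the text, and the values of t and g in the example are hypothetical.

```python
TBC_BITS = 10              # one ten-bit codeword (TBC)
TBCS_PER_QUANTUM = 1944    # assumption: quantum size inferred from the usable-TBC formula
US_FRAME_US = 4 * 125      # US PHY frame duration in microseconds
D21_5 = "1010101010"       # /D21.5/: ten alternating bits, as described in the text


def quanta_per_us_frame(us_rate_gbps):
    """Whole quanta that fit in one US frame at the given US line rate."""
    frame_bits = int(us_rate_gbps * 1000 * US_FRAME_US)  # 1 Gb/s = 1000 bits per µs
    return frame_bits // (TBCS_PER_QUANTUM * TBC_BITS)


def burst_start_us(start, us_rate_gbps):
    """Offset in µs of quantum `start` from the US frame start (0-based, assumed)."""
    return start * TBCS_PER_QUANTUM * TBC_BITS / (us_rate_gbps * 1000)


def usable_tbcs(length, t, g):
    """Usable TBCs in a burst: length x 1944 - (t + g), as given in the text."""
    usable = length * TBCS_PER_QUANTUM - (t + g)
    if usable < 0:
        raise ValueError("train and guard exceed the allocated quanta")
    return usable


def train(t):
    """Train field: t repetitions of the /D21.5/ codeword."""
    return D21_5 * t


# Example allocation (start=10, length=3) on a 10 Gb/s US link, with t=16, g=8.
n_quanta = quanta_per_us_frame(10)     # 257 whole quanta in a 500 µs frame
offset = burst_start_us(10, 10)        # about 19.44 µs after the frame start
payload = usable_tbcs(3, t=16, g=8)    # 5808 TBCs left for encapsulated frames
```

Under these assumptions, a 2.5Gb/s US link offers 64 whole quanta per frame, which illustrates how the number of schedulable quanta follows the US line rate.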
IV. PROOF-OF-CONCEPT IMPLEMENTATION
To experimentally assess the performance of the proposed protocol and the 40Gb/s duobinary transmission, a proof-of-concept 40G CBI-PON platform has been designed. The CBI-Interleaver is implemented on an Altera Stratix V GX FPGA with an external 4:1 40GHz multiplexer. The CBI-Repeater and eONT are implemented on a full-custom ASIC, codenamed CABINET. Fig. 8 and Fig. 9 show the block diagrams of the major functional modules in the DS and US paths. The single-chip CABINET can be configured into either repeater mode or eONT mode. The CABINET shares a common DS CDR which is highly flexible and can be configured to operate at a 40GHz, 10GHz or 2.5GHz input. For operation in repeater mode, a configurable burst-mode US CDR is also included. Fig. 10 shows the layout of the CABINET, which measures 1.7mm x 1.7mm and is fabricated using the TSMC 40nm LP process.
2014 IEEE Online Conference on Green Communications (OnlineGreenComm)
V. ESTIMATED POWER CONSUMPTION
The CABINET design has been simulated extensively using Cadence design tools and ModelSim. The estimated worst-case power consumptions of the functional modules, listed in Table 2, include both static and dynamic power consumption. Dynamic power consumption is estimated using Synopsys Power Compiler with VCD stimulus extracted from post-layout logic simulation. As expected, the CDR modules and the I/O driver dominate the estimated total power consumption. The CBI protocol processing contributes only ~37mW, or 15% of the total power. This shows that the proposed protocol is extremely energy-efficient. Considering the protocol processing alone, the majority of the energy is consumed by the US SRAM module. This can be further reduced by shortening the US allocation and powering down a portion of the SRAM banks.
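As a quick consistency check on the quoted figures, the sketch below derives the total power implied by the protocol-processing share. It uses only the two rounded numbers given above (~37mW and 15%), so the result is approximate. The per-module values of Table 2 are not reproduced, and the function name is hypothetical.

```python
def implied_total_mw(part_mw, share):
    """Total power implied when part_mw accounts for the fraction `share`."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return part_mw / share


protocol_mw = 37.0       # CBI protocol processing, from the text
protocol_share = 0.15    # its share of the total, from the text

total_mw = implied_total_mw(protocol_mw, protocol_share)   # about 247 mW
non_protocol_mw = total_mw - protocol_mw                   # CDRs, I/O driver, etc.
```

The implied total of roughly 247mW is consistent with the ~250mW ASIC power quoted in the conclusion.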
VI. CONCLUSION
In this paper, we have presented an energy-efficient protocol, the so-called cascaded bit-interleaving (CBI) protocol, and a 40Gb/s 3-level electrical duobinary physical layer scheme for building a metro/access network. To analyze the system performance, two approaches for implementing the modulation scheme at 40Gb/s have been simulated and compared in detail. Assuming a pre-FEC BER threshold of 1E-3, both approaches can support uncompensated fiber up to 20km with a relatively small power penalty in the O-band.
The presented CBI protocol is scalable to large-scale applications due to its hierarchical nature. An ASIC that can perform the main functions of the CBI protocol has been designed and fabricated in a 40nm low-power CMOS technology. The simulated power consumption of the ASIC (~250mW) shows the potential for significant power savings in the proposed CBI-PON.
