A high-speed physical-layer architecture for next-generation higher-speed Ethernet for VSR (very short reach) and backplane applications was developed. VSR and backplane networks provide 100-Gb/s data transmission in "mega data centers" and blade servers, which represent a new and potentially broad market for LAN technologies. The architecture supports 100-Gb/s-throughput, high-reliability, and low-latency data transmission, making it well suited to VSR and backplane applications for intra-building and intra-cabinet networks. Its links comprise ten 10-Gb/s high-speed serial lanes. Payload data are transmitted by ribbon fiber cables for very short reach and by copper channels for the backplane board. Ten lanes synchronously convey 320-bit data (32 bits × 10 lanes) together with parity data of a forward-error-correction code (a newly developed (544, 512) code FEC), providing highly reliable (BER < 1E-22) data transmission with burst-error correction at low latency (31.0 ns on the transmitter (Tx) side and 111.6 ns on the receiver (Rx) side). A 64B/66B code-sequence-based skew compensation mechanism, which provides low-latency compensation for the lane-to-lane skew (less than 51 ns), is used for parallel transmission. Testing this physical-layer architecture in an ASIC showed that it can provide 100-Gb/s data transmission with a 772-kgate circuit, which is small enough for implementation in a single LSI.
Introduction
Data-center services and broadband networking are rapid-growth fields within the enterprise and local-area-network markets (Fig. 1). In particular, data-mining systems and thin-client systems are widely used in these fields. A data-mining system built on a high-performance computer and a large-scale blade server for thin clients both require high-throughput (100-Gb/s), high-reliability, and low-latency network performance. The high-performance computer consists of several small computers connected by a LAN using VSR (very short reach) links, namely, links shorter than 100 m. A large-scale blade server contains a local-area network in its rack or chassis, and the transmission distance between blades on the backplane board is ultra short (less than 1 m). These LAN-condensed high-speed links are a potential new frontier market for networks.
Aiming at the markets for VSR and backplane networks, we have developed a 100-Gb/s physical-layer architecture. We previously developed the 64B/66B code-sequence-based skew compensation used in this architecture [1]-[4].

†† The author is with the Faculty of Engineering and Technology, Tokyo University of Agriculture and Technology, Koganei-shi, 184-8588 Japan.
a) E-mail: hidehiro.toyoda.rt@hitachi.com
DOI: 10.1093/ietele/e90-c.10.1957
In the present study, we propose a forward error-correction mechanism based on the Fire code. We implemented this mechanism in the physical-layer architecture, which enables practical and reliable 100-Gb/s data transmission. We implemented both mechanisms (skew compensation and FEC) in HDL logic to evaluate their performance and estimate circuit size.

A typical metro-area network (MAN), on the other hand, currently requires high-speed data transmission at less than 10 Gb/s over maximum distances (i.e., edge-to-edge or edge-to-core connections) of 10 km. The rapid increase in the popularity and availability of broadband IP-network services has already made these MAN connections insufficient. The physical layer (PHY) of a typical MAN consists of gigabit-Ethernet (1000BASE-LX) or 10-Gb-Ethernet (10GBASE-LR/ER) links [5]. While this level of technology provides long-distance and low-cost data communication without protocol conversion (for example, Ethernet over SONET), the current requirement is for Ethernet links that are at least ten times faster, i.e., 100 Gb/s, and reasonably priced. Transmission at 100 Gb/s using a single-wavelength channel has already been reported [6]-[8]; however, the economic feasibility of a 100-Gb/s serial link has not yet been confirmed.
Both our main targets (VSR and backplane) and the MAN require high throughput (100 Gb/s) and high reliability at reasonable cost. We cost-optimized the physical interfaces for each market, i.e., VSR or MAN. However, to realize a pluggable and easy-to-use interface for users, the logical interface (i.e., the PCS (physical coding sublayer)) is unified as one structure. This 100-Gb/s physical-layer architecture provides high-performance, useful, and cost-effective data communication spanning a broad range from legacy markets (MAN and LAN) to new markets (VSR and backplane).

ernet (10 GBASE-ER, and -EW)). The target of 100-m data transmission is the VSR (very short reach) and backplane markets for mega data centers and HPCs (high-performance computers); that of 40-km data transmission is the MAN market.
High-Reliability and Low-Latency Transmission
High-reliability data transmission is important, especially in the case of data-center networks and HPCs. In these markets, a network is used for copying bulk data (memory copy, disk copy, etc.). This application requires low-error-rate (BER < 1E-15) and low-latency (< 100 ns) data transmission to provide communication with short time overhead and good MTTF (mean time to failure).
To provide high-throughput data transmission, decision-feedback equalization technologies are generally used. These technologies improve jitter tolerance but convert one-bit error to burst error (multi-bit errors) in their signal processing. To achieve high-reliability transmission, a burst-error correction mechanism is therefore additionally required.
PCS with Common Interface
Next-generation high-speed interfaces will use various optical and electrical physical I/Os. However, the coding sublayer, such as PCS (physical-coding sub-layer) in Ethernet, should be unified in one structure. One PCS provides a common interface attached to all physical I/Os in order to provide utility and cost effectiveness in regards to PCS LSIs. Accordingly, the logical structure in PCS should support all physical I/Os at once.
Solution Technologies
To meet the requirements listed above, we developed the "solution technologies" listed below.
Multi-Lane Synchronized Serial Data Transmission
To provide 100-Gb/s data transmission over short-haul distances, we developed multi-lane synchronized serial data transmission. For cost-effective short-haul data transmission (in which interface cost is dominant), multi-lane synchronized serial data transmission with low-cost LD/PD is better than high-speed (high-cost) serial data transmission. The developed multi-lane transmission technology uses 10-Gb/s data transmission in a 10-lane configuration (Fig. 2). This logical structure is also adaptable to other configurations, for example, 25-Gb/s transmission in a four-lane configuration.
The challenging issue regarding multi-lane synchronized serial data transmission is skew caused by differences in transmission delay between the fibers. To provide skew compensation, our transmitter outputs a specially designed data pattern for deskewing [1]-[4]. The receiver side then detects the skew from the phase difference between the received timings of the patterns, and compensates for the skew using the buffer-delay mechanism. The pattern is constructed with IDLE characters defined in 64B/66B code, so inserting the deskewing patterns between data frames incurs no bandwidth overhead.
The structure of the circuit for deskewing is described in the following. In the transmitter of the PCS, each bit of the 320-bit parallel input from the media access control (MAC) layer through a 100-gigabit media-independent interface (CGMII) arrives at 312.5 Mb/s; the CGMII, assumed here, between the MAC layer and PCS has 100-gigabit throughput. This data is divided into 32-bit units, and each unit is assigned to one of the ten lanes, as shown in Fig. 2 .
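The division of the 320-bit CGMII word into ten 32-bit units can be sketched as follows. This is a minimal illustration only: the function names and the assignment of the least-significant 32 bits to lane 0 are assumptions made for the example, since the actual lane ordering is defined by Fig. 2.

```python
LANES = 10
UNIT_BITS = 32
WORD_BITS = LANES * UNIT_BITS      # 320-bit parallel word from the MAC
CLOCK_HZ = 312.5e6                 # rate at which each bit of the word arrives
UNIT_MASK = (1 << UNIT_BITS) - 1


def stripe(word: int) -> list[int]:
    """Split one 320-bit word into ten 32-bit units, one per lane.

    Assumption: lane 0 carries the least-significant 32 bits.
    """
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word must fit in 320 bits")
    return [(word >> (UNIT_BITS * lane)) & UNIT_MASK for lane in range(LANES)]


def merge(units: list[int]) -> int:
    """Inverse of stripe(): rebuild the 320-bit word from the ten lane units."""
    word = 0
    for lane, unit in enumerate(units):
        word |= (unit & UNIT_MASK) << (UNIT_BITS * lane)
    return word


# Aggregate throughput of the parallel interface in bit/s.
throughput_bps = WORD_BITS * CLOCK_HZ
```

With these figures, 320 bits at 312.5 MHz gives the 100-Gb/s throughput assumed for the CGMII.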
An idle-pattern generator sequentially generates 1024-bit deskewing data; the current data is inserted into each of the lanes in 32-bit units over two cycles, for a total of 64 bits, as shown in Fig. 3 . Each deskewing data pattern is inserted into all ten lanes at the same time. An independent encoder produces the 64B/66B code for the lanes.
The PCS at the receiver decodes the 1024-bit deskewing data in each lane and then resynchronizes the decoded signals. Each received data bit is sent to the 64B/66B decoder. Each decoded data bit is stored in a 1024-bit deskew buffer, and sent to the skew detector, which detects the skew of the respective lanes. The delay controller compensates for the skew by controlling the read-out timing from the deskew buffer for each of the lanes. The receive side of the PHY has a 1024-bit FIFO buffer in each lane, and each buffer compensates for the skew according to the detected phase difference of the deskewing data patterns. The 1024-bit size is twice the maximum skew and is thus sufficient for skew compensation. The deskew buffer takes up most of the receive area.
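The relation between buffer size and compensable skew, and the read-out delay applied per lane, can be sketched as below. This is an illustrative model under stated assumptions: a nominal 10-Gb/s lane rate is used for the time conversion, arrival times are expressed in bit times, and the function names are hypothetical.

```python
LANE_RATE_BPS = 10e9    # nominal lane rate used for the estimate (assumption)
BUFFER_BITS = 1024      # deskew FIFO depth per lane
MAX_SKEW_BITS = BUFFER_BITS // 2   # buffer is twice the maximum skew


def span_ns(bits: int, rate_bps: float = LANE_RATE_BPS) -> float:
    """Time covered by a given number of bits on one lane, in nanoseconds."""
    return bits / rate_bps * 1e9


def read_delays(arrival_bits: list[int]) -> list[int]:
    """Per-lane read-out delay (in bit times) that aligns all lanes.

    arrival_bits[i] is the bit time at which lane i received the start of the
    same deskewing pattern. Every lane is delayed to match the latest one.
    """
    latest = max(arrival_bits)
    delays = [latest - a for a in arrival_bits]
    if max(delays) > MAX_SKEW_BITS:
        raise ValueError("skew exceeds the compensable range")
    return delays
```

Under these assumptions the 1024-bit buffer spans about 102.4 ns, so half of it corresponds to roughly 51 ns of lane-to-lane skew.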
The deskewing data patterns are constructed of 16 sets of 64B/66B codes, S0 to S15, as shown in Fig. 4. Each of the first four control characters, C0, C1, C2, and C3, is assigned /I/ or /K/, and the remaining four control characters are the same as the first four characters. Here, /I/ and /K/ are the idle characters of 64B/66B code (/K/ is the reserved control character in the standard 64B/66B code). In all, 16 sets of codes, S0 to S15, can be constructed from the combinations of /I/ and /K/ over the first four characters (2^4 = 16). The 16 sets for skew compensation are transmitted in the sequence S0, S1, . . . , S14, S15, S0, S1, . . . , during gaps between frames.
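The construction of the 16 sets can be sketched as follows. The mapping of a particular /I/-/K/ combination to a particular index S0 to S15 is an assumption made here for illustration (the actual assignment is the one in Fig. 4), and the function name is hypothetical.

```python
from itertools import cycle, islice, product


def deskew_sets() -> list[tuple[str, tuple[str, ...]]]:
    """Build 16 sets of eight control characters each.

    C0..C3 take every combination of 'I' and 'K'; C4..C7 repeat C0..C3.
    Assumption: sets are numbered in the order produced by itertools.product.
    """
    sets = []
    for index, first_four in enumerate(product("IK", repeat=4)):
        sets.append((f"S{index}", first_four + first_four))
    return sets


SETS = deskew_sets()

# Each set is one 64-bit block before 64B/66B encoding (8 characters x 8 bits),
# so the 16 sets together form the 1024-bit deskewing data.
TOTAL_BITS = len(SETS) * 8 * 8

# The transmitter repeats S0, S1, ..., S15, S0, ... during inter-frame gaps.
first_20_names = [name for name, _ in islice(cycle(SETS), 20)]
```

Because the sequence position of a received set is recoverable from its pattern, the receiver can compare positions across lanes to measure skew.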
Fire Code Based FEC
Large signal-propagation loss (−35 dB/m at 5 GHz) is one of the problems in electrical backplane data transmission. To compensate for this loss, an FFE (feed-forward equalizer) and a DFE (decision-feedback equalizer) are generally used in a high-speed serial interface. However, a DFE decides the received signal according to its correlation with previously received signals. Suppose that a one-bit error occurs on the transmission line. This one-bit error is then converted to a multi-bit error (burst error) by the DFE and decoding circuits (which is called bit-error propagation) [9].
Fire codes are cyclic codes for correcting burst errors [10]. In our PCS layer, FEC (forward error correction) encoders and decoders are based on the (42987, 42955) Fire code and are placed as close as possible to the I/O interfaces. This Fire code has 42987-bit encoded data with 32-bit parity data. The generator polynomial of this code is shown below.
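The code parameters can be checked against the standard Fire-code construction g(x) = (x^(2l−1) + 1)·p(x). The sketch below assumes that p(x) = x^11 + x^2 + 1, which yields the generator polynomial specified for 10GBASE-KR FEC in IEEE 802.3 Clause 74; this choice is an assumption based on the statement that the same algorithm is used, and the helper names are hypothetical.

```python
from math import gcd


def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials held as integers (bit i = coeff of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def order_of_x(p: int) -> int:
    """Smallest k > 0 with x^k = 1 (mod p), i.e. the period of p(x)."""
    degree = p.bit_length() - 1
    value, k = 0b10, 1            # start from x^1
    while value != 1:
        value <<= 1               # multiply by x
        if (value >> degree) & 1:
            value ^= p            # reduce modulo p(x)
        k += 1
    return k


BURST_LEN = 11                                 # correctable burst length l
FACTOR = (1 << (2 * BURST_LEN - 1)) | 1        # x^21 + 1
P = (1 << 11) | (1 << 2) | 1                   # assumed p(x) = x^11 + x^2 + 1
G = gf2_mul(FACTOR, P)                         # generator polynomial g(x)

exponents = [i for i in range(G.bit_length() - 1, -1, -1) if (G >> i) & 1]
period = order_of_x(P)
n = (2 * BURST_LEN - 1) * period // gcd(2 * BURST_LEN - 1, period)
parity_bits = G.bit_length() - 1
k = n - parity_bits
```

Under this assumption, g(x) = x^32 + x^23 + x^21 + x^11 + x^2 + 1, and the natural code length is lcm(21, 2047) = 42987 with 32 parity bits, which agrees with the (42987, 42955) parameters quoted above.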
This code enables correction of burst errors of up to 11 bits and detection of errors of up to 21 bits. The same algorithm is used in 10GBASE-KR FEC [9], so the correctable burst length is also the same. However, in our PHY layer, only 512 of the 42,955 bits are used as message bits (the other bits are treated as 0). This shortening simplifies the error-correction calculation and thus lowers latency. The code rate R is 0.94 (i.e., 512/544), so we call it "(544,512) code FEC." In 10GBASE-KR, 2080 of the 42,955 bits are used as message bits. The code rate R of our PHY is slightly worse than that of 10GBASE-KR FEC (i.e., R = 2080/2112 = 0.98). A short code length gives low-latency error correction, but it decreases the code rate (and thus the transmission efficiency). A trade-off therefore exists between code length and transmission efficiency. In our PHY layer, the physical bit rate is determined by Eq. (2), because the clock rate in a serial interface is technically limited to 11 Gb/s.

$$\text{bit rate} = 10.3125\ \text{Gb/s} \times \frac{544}{512} \approx 10.96\ \text{Gb/s} \tag{2}$$

The parity-insertion process of the (544,512) code FEC encoder is shown in Fig. 5. The codeword length of 544 bits is equal to 17 times 32 bits. The FEC circuit is thus designed in units of 32 bits. In the encoder, 512 bits of transmitted data are treated as message bits, and 32 parity bits are generated from the message bits.
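The code-rate trade-off can be worked through numerically. This is a short arithmetic sketch; the 10.3125-Gb/s base rate is the 64B/66B-coded lane rate quoted later in the evaluation, and the variable names are chosen for illustration.

```python
from fractions import Fraction

BASE_RATE_GBPS = Fraction(103125, 10000)   # 10.3125 Gb/s lane rate without FEC
SERDES_LIMIT_GBPS = 11                     # stated limit of the serial interface

rate_ours = Fraction(512, 544)             # (544,512) code
rate_kr = Fraction(2080, 2112)             # 10GBASE-KR FEC, (2112,2080) code

# Line rate needed to carry the same payload once 32 parity bits are added
# to every 512 message bits.
line_rate_with_fec = BASE_RATE_GBPS / rate_ours

print(f"R (ours)      = {float(rate_ours):.4f}")
print(f"R (10GBASE-KR) = {float(rate_kr):.4f}")
print(f"line rate     = {float(line_rate_with_fec):.4f} Gb/s")
```

The resulting line rate stays just below the 11-Gb/s limit, which is why the shorter codeword remains feasible despite its lower code rate.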
In the decoder, position and vector of burst error are calculated according to the received message bits and parity bits. The error position shows the first position of a burst error, and corrected data is calculated by ExORs with message bits and the vector of the burst error.
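The encode and correct steps can be illustrated with a small software model. This is a sketch under several assumptions, not the circuit described here: the generator polynomial is assumed to be the 10GBASE-KR one, bit i of a Python integer is taken as the coefficient of x^i with the parity in the low 32 bits, and the decoder uses generic error trapping rather than the 32-bit-parallel hardware procedure.

```python
# Assumed generator: g(x) = x^32 + x^23 + x^21 + x^11 + x^2 + 1
G = (1 << 32) | (1 << 23) | (1 << 21) | (1 << 11) | (1 << 2) | 1
PARITY_BITS = 32
MESSAGE_BITS = 512
CODEWORD_BITS = MESSAGE_BITS + PARITY_BITS   # 544
MAX_BURST = 11


def poly_mod(a: int, g: int = G) -> int:
    """Remainder of a(x) divided by g(x) over GF(2)."""
    dg = g.bit_length() - 1
    while a.bit_length() - 1 >= dg:
        a ^= g << (a.bit_length() - 1 - dg)
    return a


def encode(message: int) -> int:
    """Systematic encoding: message bits followed by 32 parity bits."""
    shifted = message << PARITY_BITS
    return shifted | poly_mod(shifted)


def decode(received: int):
    """Return (message, burst_position); position is None if no error.

    Error trapping: the syndrome is multiplied by x^-1 step by step until it
    fits in MAX_BURST bits; that value is the burst vector and the step count
    is its position. Returns (None, None) if nothing can be trapped.
    """
    trap = poly_mod(received)
    if trap == 0:
        return received >> PARITY_BITS, None
    for pos in range(CODEWORD_BITS):
        if trap < (1 << MAX_BURST) and pos + trap.bit_length() <= CODEWORD_BITS:
            corrected = received ^ (trap << pos)     # XOR with burst vector
            return corrected >> PARITY_BITS, pos
        # Multiply by x^-1 modulo g(x); valid because g(x) has constant term 1.
        trap = (trap ^ G) >> 1 if trap & 1 else trap >> 1
    return None, None
```

As in the description above, the decoder yields a burst position and a burst vector, and the corrected data is obtained by XORing the vector into the received bits at that position.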
In our method, the bit stream contains no synchronization bit that explicitly marks the parity position. The synchronizer therefore searches for the position that gives the best error rate and regards it as the codeword boundary (i.e., the synchronous point). While the synchronization state is not locked, the synchronizer monitors the results of error detection and moves the synchronous point to the position at which the error rate is near zero. Once the error rate becomes near zero and the synchronization state locks, the synchronous point is not moved until the error rate exceeds a defined threshold.
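The lock and unlock behavior of the synchronizer can be sketched as a small state machine. This is a simplified illustration: it counts consecutive clean or errored codewords instead of measuring an error rate, and the class name and both thresholds are hypothetical values, not figures from this work.

```python
from dataclasses import dataclass

CODEWORD_BITS = 544


@dataclass
class BoundarySync:
    lock_after: int = 4      # hypothetical: clean codewords needed to lock
    unlock_after: int = 8    # hypothetical: errored codewords needed to unlock
    offset: int = 0          # candidate codeword boundary, in bits
    locked: bool = False
    good_run: int = 0
    bad_run: int = 0

    def update(self, error_detected: bool) -> None:
        """Feed the error-detection result of one candidate codeword."""
        if self.locked:
            # Locked: keep the boundary until errors persist.
            self.bad_run = self.bad_run + 1 if error_detected else 0
            if self.bad_run >= self.unlock_after:
                self.locked = False
                self.good_run = 0
                self.bad_run = 0
        elif error_detected:
            # Not locked and the candidate fails: slip the boundary by one bit.
            self.good_run = 0
            self.offset = (self.offset + 1) % CODEWORD_BITS
        else:
            # Not locked and the candidate looks clean: count toward lock.
            self.good_run += 1
            if self.good_run >= self.lock_after:
                self.locked = True
                self.bad_run = 0
```

The hysteresis between the two thresholds keeps isolated channel errors from disturbing a boundary that has already been found.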
Evaluation

Performance Evaluation of FEC
In the PHY layer, a one-bit error is expanded to a burst error by the DFE in the receiver. To evaluate the FEC, its performance for one-bit error correction is simulated. The decoding-error probability with code length n is given by Eq. (3):

$$P_B \approx \frac{1}{n}\sum_{j=t+1}^{n} j\binom{n}{j}\,p^{j}(1-p)^{n-j} \tag{3}$$

where p is the BER of the uncoded channel, and t is the error-correcting capability. In the case of a single-bit error (t = 1), P_B can be approximated as Eq. (4).

$$P_B \approx (n-1)\,p^{2} \tag{4}$$

As shown in Fig. 6, a channel with a BER of 1E-12 can be improved to a BER of 5.4E-22 by the (544, 512) code. The coding gain is about 2.4 dB at a BER of 1E-12 (Fig. 7), namely, slightly better than that of 10GBASE-KR FEC (i.e., about 2.2 dB at the same BER).
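The quoted post-FEC BER can be reproduced numerically. The sketch below assumes the standard bounded-distance approximation for the decoded bit-error probability with t = 1; the function name and the truncation of the sum to its first few terms are choices made for this illustration.

```python
from math import comb


def post_fec_ber(n: int, p: float, t: int = 1, terms: int = 4) -> float:
    """Approximate decoded BER for an n-bit code correcting t random errors.

    Uses (1/n) * sum_{j>t} j * C(n, j) * p^j * (1-p)^(n-j), truncated to the
    first few terms, which dominate when p is very small.
    """
    total = 0.0
    for j in range(t + 1, t + 1 + terms):
        total += j * comb(n, j) * p**j * (1 - p) ** (n - j)
    return total / n


P_CHANNEL = 1e-12
ber_ours = post_fec_ber(544, P_CHANNEL)      # (544,512) code
ber_kr = post_fec_ber(2112, P_CHANNEL)       # 10GBASE-KR (2112,2080) code

# Leading-term approximation (n - 1) * p^2 for comparison.
ber_ours_leading = (544 - 1) * P_CHANNEL**2

print(f"(544,512):   {ber_ours:.2e}")
print(f"(2112,2080): {ber_kr:.2e}")
```

Under this approximation the shorter codeword gives the lower residual BER at the same channel BER, because fewer bits share each parity block.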
Logic Size, Area Size, and Latency
The proposed PHY architecture has been implemented as an HDL logic circuit and installed in an ASIC. In the 10-Gb/s by 10-lane configuration of the HDL circuit, the logic size is 168 kgates on the transmitter (Tx) side and 604 kgates on the receiver (Rx) side (the total is 772 kgates without built-in self-test circuits; one logic gate is equivalent to a 2-input NAND). Figure 8 shows the layout of the prototype LSI. In this LSI prototype, a 10-Gb/s by four-lane configuration with a four-lane skew-compensation mechanism is installed. The four-lane configuration results from the limited usable area in the prototype LSI. It is nevertheless sufficient to confirm the feasibility of skew compensation and to estimate circuit size, so the mechanism can also be applied to the 10-Gb/s by 10-lane configuration.
A 90-nm CMOS process is used. Clock speed when FEC is not used is 322.265625 MHz (that is, 10.3125 Gb/s divided by 32). When FEC is used, clock speed is 342.41 MHz (about 10.96 Gb/s divided by 32; see Eq. (2)). Because the PCS in this prototype LSI contains other test elements, it occupies 5.0 × 2.0 mm (area utilization (logic cell area/physical area): < 16%). However, the circuit size per lane is about 77 kgates, so the area can be reduced to 0.4 × 0.9 mm (area utilization of 64%; 77k gates × 3 µm²/gate / 360k µm²) by removing the other test elements and optimizing the layout.
Latency of each block is listed in Table 1. The block with the longest latency (74.4 ns) is the FEC decoder block (whose latency depends on the length of the Fire code). Total latency of the Tx block is 31.0 ns, and that of the Rx block is 111.6 ns (latency excludes SerDes and signal-line delay).
In particular, the FEC decoders of our PHY and 10GBASE-KR are compared here in terms of circuit size† and latency††. It is clear that the circuit size and latency of our FEC are smaller than those of the 10GBASE-KR FEC. These differences in circuit size and latency depend chiefly on the size of the codeword. When many lanes are installed in one LSI, as in VSR or backplane applications, a small codeword is effective for reducing circuit size, latency, and power consumption.

Fig. 8 Layout of LSI prototype.
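The clock and area figures follow from simple arithmetic, reproduced below as a check. The variable names are illustrative, and the 3 µm² per gate figure is the one quoted in the text.

```python
DATAPATH_BITS = 32                      # per-lane parallel datapath width
LINE_RATE_NO_FEC = 10.3125e9            # bit/s per lane without FEC
LINE_RATE_FEC = LINE_RATE_NO_FEC * 544 / 512   # parity expands the line rate

clock_no_fec_hz = LINE_RATE_NO_FEC / DATAPATH_BITS
clock_fec_hz = LINE_RATE_FEC / DATAPATH_BITS

GATES_PER_LANE = 77_000
AREA_PER_GATE_UM2 = 3                   # quoted figure for the 90-nm process
LANE_AREA_UM2 = 400 * 900               # 0.4 mm x 0.9 mm
utilization = GATES_PER_LANE * AREA_PER_GATE_UM2 / LANE_AREA_UM2

print(f"clock without FEC: {clock_no_fec_hz / 1e6:.6f} MHz")
print(f"clock with FEC:    {clock_fec_hz / 1e6:.2f} MHz")
print(f"area utilization:  {utilization:.0%}")
```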
Conclusion
The developed 100-Gb-PHY interface architecture is well suited to next-generation higher-speed Ethernet for VSR and backplane applications. In this architecture, a multi-lane high-speed serial interface achieves high-throughput transmission with a 10-Gb/s-by-10-lane parallel optical interface. A 64B/66B-based skew-compensation mechanism enables synchronized parallel data transmission in both the backplane (< 1 m) and long-haul (< 40 km) CWDM applications.
To achieve highly reliable data transmission, we developed the (544,512) code FEC, which has a sufficient coding gain of 2.4 dB. This performance improves the channel characteristics from a BER of 1E-12 to 5.4E-22, which is sufficient for data-center and HPC markets. This code also provides low-latency transmission, namely, 31.0 ns on the Tx side and 111.6 ns on the Rx side.

† The circuit size of the 10GBASE-KR FEC decoder is 7 kgates + 2211 bit RAM (2211 bit = 33 bit × (64 + 3)) [11]. When 1 bit RAM = 1 FF (=16 gates) is simply assumed, it is 57.4 kgates (= 7 kgates + 2211 bit × 16 gates).
†† The bit rate per lane is assumed to be 10.3125G according to the upper bound of 10G-SerDes. Therefore, the data rate becomes 9.41 Gb/s.
To evaluate this interface, we implemented the architecture in an HDL logic circuit. Circuit size per lane is 16.8 kgates on the Tx side and 60.4 kgates on the Rx side (168 kgates and 604 kgates, respectively, for ten lanes). Logic of this size is estimated to fit in an area of 0.4 mm × 0.9 mm (0.36 mm²) per lane. This result shows that 100-Gb Ethernet can support next-generation high-speed network services in VSR, backplane, and MAN/LAN applications with high reliability and low latency.
