We study the impact of using on-chip forward error correction (FEC) in two-dimensional optical data link (2D-ODL) applications. We demonstrate that FEC can reduce the required launch power for a vertical-cavity surface-emitting laser (VCSEL), improve the reliability of 2D-ODLs, reduce on-chip power consumption and relax the requirements of the optical system. Both analytical and simulation results are presented to support our arguments.
INTRODUCTION
Parallel optical interconnects (POIs) promise to deliver tremendous gains in bandwidth and interconnect density for applications such as massively parallel computing systems and telecommunication switches. At the core of any interconnect solution lies the fundamental problem of reliable transmission. Next-generation optical communication system designers are running into hard limits when it comes to increasing data transmission rates and reducing errors. These two factors are typically in opposing balance: minimize BER and data rates suffer, increase transmission rates and data integrity is compromised.
Our previous work [1], [2] helped us identify two types of errors in a 2D-ODL such as the one depicted in Fig. 1(a): random errors and systematic errors. Any electrical or optical interconnection is subject to random errors that are statistically unavoidable. They can be due to thermal noise in the receiver or to inter-symbol interference. Systematic errors are permanent sub-link failures and can be due to, for example, a dead VCSEL, a malfunctioning photodetector (PD), a dark fiber, an alignment problem or a fault on the ASIC. Sub-links subject to systematic errors cannot transmit any data and will be referred to as erasures. Large 2D-ODL arrays are likely to contain erasures due to their high sub-link count. Erasures could make some error-correction coding (ECC) schemes worthless if their error correction capability is not good enough. This is particularly true for automatic repeat request (ARQ) techniques [3], which rely on retransmissions to correct errors.
2D-ODLs are subject to process variations on the ASIC, the VCSEL and photodetector (PD) arrays, and to power throughput non-uniformity in the optical system caused by aberrations or misalignment. These effects can act on the sub-links of a 2D-ODL in different ways, resulting in a range of sub-link BERs. Sub-links whose BER is higher than the average BER are referred to as bad sub-links.
Szymanski [3] and Neifeld [4] have already demonstrated the benefit of using ECC in parallel optical interconnects. Szymanski suggested using an ECC scheme based on ARQ to optimize the bandwidth of ODLs while simultaneously decreasing the BER. Neifeld suggested the use of Reed-Solomon (RS) codes [5] in free-space optical interconnects (FSOIs) and demonstrated that they can facilitate an increase in both spatial density and data rate, resulting in FSOI capacity gains. Neifeld also demonstrated that ECC could help relax alignment accuracy, manufacturing uniformity, and other implementation tolerances. Our analysis builds on the work of Szymanski and Neifeld by adding erasures and bad sub-links to our 2D-ODL model. Our contribution is to demonstrate that FEC can 1) reduce the required VCSEL launch power to achieve a given BER, 2) keep the BER below the target BER even in the presence of erasures, 3) reduce the on-chip power consumption and 4) relax the optical system throughput requirement. As we will see in Sec. 3, all four objectives can be met using forward error correction (FEC), which differs from ARQ in that errors are corrected at the receiver without the need for retransmission.
Many long-haul optical systems use FEC to produce a coding gain. The coding gain is usually used to increase the distance between repeaters for a given BER. Two FEC approaches have been developed for long-haul optical systems [6] . Both have been submitted for ITU ratification. The first is an in-band SONET/SDH approach based on the Bose-Chaudhuri-Hocquenghem-3 code (BCH-3) [7] , [8] . The specific code utilized is a shortened version of a (8191, 8152) parent code, covering 4320 information bits and utilizing 39 redundant bits. BCH-3 codes can correct up to three errors. The second approach is a digital (out-of-band) wrapper approach based on the RS code, specifically an RS(255,239) code. The RS codes operate on symbols instead of bits. For this particular case, the symbol is an octet so the block is 255 octets in length. The payload is 239 octets, meaning that there are 16 redundant octets in the code. An RS(255, 239) code can correct up to 8 symbols in error and detect (but not correct) up to 16 symbols in error.
2D-ODLs differ considerably from long-haul optical systems. By transmitting data in parallel, the need for multiplexing and demultiplexing before and after transmission is reduced or eliminated. As stated in [9], conventional error correction techniques involve decoding in a time-sequential (i.e. serial) fashion (Fig. 2(a)). In a serial implementation, one symbol of the received vector is provided as input to the decoder during each clock cycle. However, the highly parallel nature of the data transmitted over a 2D-ODL requires an alternate solution, since such a serial decoding scheme can produce a severe bottleneck in high data rate applications. Moreover, the serial decoding scheme requires data multiplexing/demultiplexing. One way to increase decoding speed within the serial paradigm is to use an array of serial decoders operating in parallel (Fig. 2(b)). Each decoder then operates independently on separate code words. If there are m serial decoders in such an array, the aggregate decoding data rate is m times that of a single serial decoder. An alternate solution is to unfold the time-domain algorithm to produce a parallel pipeline decoder (Fig. 2(c)). This decoder receives an entire code word at each clock cycle. Such a decoding paradigm gives the decoder simultaneous access to all code word symbols and can therefore yield significant savings in implementation resources compared with the array of serial decoders [9].
The FEC techniques used in long-haul optical systems are not suitable for 2D-ODLs due to their serial nature and excessively long block lengths. The BCH-3 and the RS(255,239) codes require 4359 (4320 + 39) and 2040 (255 × 8) bits, respectively, whereas the densest 2D-ODL reported to date has only 540 ODLs [1]. The error correction capabilities of the BCH-3 and RS codes are not sufficient for the low-yield 2D-ODLs considered in this analysis. In Sec. 3, we will show that the Golay code is a more suitable FEC for 2D-ODLs.
Throughout this work, a transmitter will refer to a VCSEL and its driving circuit; a receiver will refer to the PD, the transimpedance amplifier (TIA) and the post-amplifier. Further, FEC will specifically refer to the use of the Golay code and the terms frequency and data rate (or bit rate) will be used interchangeably.
MODEL
Our analysis will be performed on 2D-ODLs such as the one shown in Fig. 1. For clarity, only 16 sub-links are shown in the picture. However, our analysis holds for any 2D-ODL size. Arrays of VCSELs and PDs are heterogeneously integrated with CMOS chips [2]. The imaging system can use either free-space optics or guided-wave optics, but the latter case will be assumed to simplify the analysis.
The FEC used in our model is the Golay code. The (24, 12) extended Golay code is a half-rate code that can correct up to 3 errors and detect up to 4 errors in a 24-bit codeword. The 12 extra bits are used at the receiver in an attempt to correct errors that may have occurred during transmission. The Golay code is linear and block-based. Efficient algorithms therefore exist for decoding the codewords. Due to its short codeword length, the Golay code is particularly suitable for packet data communications. The block-based Golay code does not require memory, unlike convolutional codes, which are commonly decoded with the Viterbi algorithm [10]. The Golay code does not offer the same coding gain as the powerful turbo codes [11], but its decoding does not require iterations, which is critical to minimize latency. Because latency is to be minimized, we considered 2D-ODLs for which FEC is performed on-chip. On-chip FEC also has the advantage of reducing the pin count, which may be a bottleneck for large optoelectronic-VLSI (OE-VLSI) chips. An area-efficient decoding algorithm proposed by Cao [12] was selected to model the FEC block on the OE-VLSI chip. Performance characteristics of the Golay encoder and decoder blocks will be given in Sec. 3.
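To make the (24, 12) structure concrete, the following Python sketch builds the extended Golay code from a commonly used generator polynomial of the (23, 12) Golay code, x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1, and checks the minimum distance by brute force. The function names and the systematic bit ordering are assumptions made for illustration; this is an encoder only and is not the area-efficient decoder of [12].

```python
# Illustrative sketch: systematic (24, 12) extended Golay encoder.
# Bit ordering and function names are assumptions, not taken from [12].
GOLAY_POLY = 0xC75  # x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1


def golay24_encode(data12: int) -> int:
    """Return the 24-bit codeword for a 12-bit information word."""
    if not 0 <= data12 < (1 << 12):
        raise ValueError("information word must fit in 12 bits")
    reg = data12 << 11
    # Polynomial division over GF(2): the remainder gives 11 check bits.
    for bit in range(22, 10, -1):
        if reg & (1 << bit):
            reg ^= GOLAY_POLY << (bit - 11)
    cw23 = (data12 << 11) | reg
    # One overall parity bit extends the (23, 12) code to (24, 12).
    parity = bin(cw23).count("1") & 1
    return (cw23 << 1) | parity


def minimum_distance() -> int:
    """Smallest Hamming weight over all nonzero codewords (linear code)."""
    return min(bin(golay24_encode(d)).count("1") for d in range(1, 1 << 12))


if __name__ == "__main__":
    d_min = minimum_distance()
    print("minimum distance:", d_min)
    print("guaranteed correctable errors:", (d_min - 1) // 2)
```

The minimum distance found this way is what determines how many errors can be corrected and detected per codeword.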
In our analysis, our target packet error rate (to be defined later) will be $10^{-15}$. The target aggregate bandwidth is 10.8 Gb/s and every 2D-ODL considered had enough sub-links and a sufficient data rate per sub-link to support this aggregate bandwidth (see Table I).
Bit-error-rate vs. signal-to-noise ratio
We will consider a 2D-ODL that uses the direct detection (DD) modulation scheme. In the DD scheme, the absence of optical power denotes a logical 0 and the presence of optical power denotes a logical 1. The basic BER of an optical data link that uses the DD modulation scheme, denoted $\mathrm{BER}_{DD}$, is given by [13]:
where the SNR is measured at the receiver output, as shown in Fig. 1(b). Next, following [3], [4], the dominant source of noise is taken to be the detector-amplifier thermal noise, characterized by a noise-equivalent power (NEP) referred to the optical domain [13]. We also assume that the optical transmission is through guided-wave optics, for which the optical crosstalk is negligible. Under these assumptions, the SNR at the receiver output can be approximated using the power hitting the photodiode [3]:

$$\mathrm{SNR} = \frac{P_o \, \eta_{optics}}{\mathrm{NEP}\,\sqrt{f}} \qquad (2)$$

where $P_o$ is the VCSEL launch power, $\eta_{optics}$ is the throughput of the optics, and $f$ is the bandwidth of the receiver. This definition of SNR ignores inter-symbol interference, which increases the noise at high frequencies. However, at the data rates considered (< 450 Mb/s), inter-symbol interference should not be an issue and was neglected. Throughout our discussion, we will assume an NEP of 0.3 nW/Hz$^{1/2}$. This value is compatible with simple receiver designs that can achieve an SNR of 10 with 50 µW of optical power over a bandwidth of 250 MHz in the absence of inter-symbol interference [3].
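As a numerical illustration, the sketch below assumes the common Gaussian-noise form BER_DD = ½ erfc(√SNR / (2√2)) for (1) and takes the SNR as P_o η_optics / (NEP √f). Both forms and all function names are assumptions of this sketch; they were chosen because they are consistent with the 50 µW example above, where the SNR comes out close to 10.

```python
import math

NEP = 0.3e-9  # W/sqrt(Hz), the value assumed throughout the text


def snr_linear(launch_power_w, throughput, bandwidth_hz, nep=NEP):
    """SNR from the optical power at the photodiode (assumed form of (2))."""
    return launch_power_w * throughput / (nep * math.sqrt(bandwidth_hz))


def ber_dd(snr):
    """Raw BER for direct detection (assumed Gaussian-noise form of (1))."""
    return 0.5 * math.erfc(math.sqrt(snr) / (2.0 * math.sqrt(2.0)))


def to_db(ratio):
    """Express a linear SNR in dB (10 log10 convention, an assumption)."""
    return 10.0 * math.log10(ratio)


if __name__ == "__main__":
    # 50 uW reaching the detector (throughput set to 1) over 250 MHz.
    snr = snr_linear(50e-6, 1.0, 250e6)
    print(f"SNR = {snr:.1f} ({to_db(snr):.1f} dB), BER = {ber_dd(snr):.2e}")
```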
Packet error rate vs. bit error rate
Because data is transmitted in parallel, the figure of merit that we will use to compare the performance of different 2D-ODLs is the packet error rate (PER), defined as

$$\mathrm{PER} = \sum_{i=t+1}^{n} P(i), \qquad P(i) = \binom{n}{i}\, p_e^{\,i}\, (1 - p_e)^{\,n-i} \qquad (3)$$

$P(i)$ in (3) gives the probability of finding $i$ errors in an $n$-bit packet given the raw BER of each individual sub-link. The raw BER is simply the probability of error ($p_e$) of a sub-link. The PER as defined in (3) states that the probability of a packet error using an FEC scheme capable of correcting $t$ errors is the sum of the probabilities of finding $t + 1$ to $n$ errors in the $n$-bit packet. This definition assumes that the errors are statistically independent and that all the sub-links, except the erasures, have the same raw BER. Our analysis will therefore consider random errors and systematic errors (erasures), but not bad sub-links.
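The PER definition translates directly into a binomial tail sum. The following minimal Python sketch evaluates it for an uncoded 12-bit packet and for a 24-bit packet with three correctable errors; the function name and the example raw BER are illustrative choices, not values from the analysis.

```python
import math


def packet_error_rate(p_e, n, t):
    """Probability of more than t errors in an n-bit packet.

    Assumes independent errors and the same raw BER p_e on every sub-link.
    """
    return sum(
        math.comb(n, i) * p_e**i * (1.0 - p_e) ** (n - i)
        for i in range(t + 1, n + 1)
    )


if __name__ == "__main__":
    p_e = 1e-5  # example raw BER, chosen only for illustration
    print("12-bit packet, no FEC:", packet_error_rate(p_e, n=12, t=0))
    print("24-bit packet, t = 3 :", packet_error_rate(p_e, n=24, t=3))
```

Summing the tail terms directly, rather than computing one minus the probability of a correct packet, avoids cancellation at the very small probabilities of interest here.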
On-chip power consumption
The following two formulas will be used to compare the on-chip power consumption of a 2D-ODL without FEC to that of a 2D-ODL with FEC.
Fig. 1(b) shows a schematic representation of the 2D-ODLs considered in our analysis with the variables in (4) and (5) included as annotations. We assume that the FEC block and the transceivers (transmitters + receivers) are implemented in a commercial 0.18 µm CMOS technology, for which $V_{DD}$ = 1.8 V. $I_B$ and $I_M$ model the bias and modulation currents of the VCSEL driver. $I_R$ models the current supplied to the receiver. The receiver current and the power consumption of the FEC block are both dependent on frequency, as will be seen in Sec. 3. $K$ represents the number of sub-links, each operating at a data rate $f$ necessary to achieve an aggregate bandwidth of $K \times f$ = 10.8 Gb/s. $R$ represents the information rate of the Golay code and has a value of 0.5. Using FEC therefore requires doubling the number of sub-links if the aggregate bandwidth is to be kept constant. $I_{M,\,no\,FEC}/\alpha(f)$ represents the (ideally smaller) modulation current required for a 2D-ODL with FEC to achieve a PER of $10^{-15}$. $\alpha(f)$ is therefore the factor by which the modulation current can be reduced as a result of using FEC.
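One plausible reading of the two power expressions, reconstructed only from the variable definitions above, is that each sub-link draws its bias, modulation and receiver currents from the supply, and that the FEC case uses K/R sub-links with a reduced modulation current plus the FEC block power. The Python sketch below encodes that reading. The exact form of (4) and (5) is an assumption here, and any current, α or FEC power values passed to these functions are placeholders rather than entries from Table I.

```python
V_DD = 1.8              # V, 0.18 um CMOS supply (from the text)
AGGREGATE_BW = 10.8e9   # b/s, target aggregate bandwidth (from the text)
CODE_RATE = 0.5         # R for the (24, 12) Golay code (from the text)


def sublinks_without_fec(rate_bps):
    """K: number of sub-links needed to carry the aggregate bandwidth."""
    return AGGREGATE_BW / rate_bps


def power_without_fec(rate_bps, i_b, i_m, i_r):
    """Assumed form of (4): K sub-links, each drawing I_B + I_M + I_R."""
    k = sublinks_without_fec(rate_bps)
    return k * V_DD * (i_b + i_m + i_r)


def power_with_fec(rate_bps, i_b, i_m_no_fec, i_r, alpha, p_fec):
    """Assumed form of (5): K/R sub-links, I_M reduced by alpha, plus P_FEC."""
    k = sublinks_without_fec(rate_bps)
    per_link = i_b + i_m_no_fec / alpha + i_r
    return (k / CODE_RATE) * V_DD * per_link + p_fec


if __name__ == "__main__":
    # Placeholder currents (A), alpha and P_FEC (W), for illustration only.
    rate = 225e6
    print("K =", sublinks_without_fec(rate))
    print("no FEC :", power_without_fec(rate, 1.5e-3, 4e-3, 2e-3), "W")
    print("FEC    :", power_with_fec(rate, 1.5e-3, 4e-3, 2e-3, 3.0, 0.05), "W")
```

Under this reading, FEC saves power only when the reduction in modulation current outweighs the doubled sub-link count and the FEC block itself.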
RESULTS AND ANALYSIS
In this section, we will present a few case analyses using the 2D-ODL model described in the previous section. To perform the analysis, typical values for all variables in (4) and (5) were required. In order to be as close to reality as possible, we synthesized a Golay encoder and decoder [12] using the Synopsys synthesis tool in a 0.18 µm CMOS target technology. For a decoder with a maximum propagation delay of 3.25 ns (corresponding to a data rate of approximately 300 Mb/s), the area predicted by Synopsys was 200 × 200 µm². For a maximum propagation delay of 2.16 ns (corresponding to a data rate of approximately 450 Mb/s), the predicted area doubled to 283 × 283 µm². The encoder is much simpler than the decoder and the design could be synthesized in an area of 42.72 × 42.72 µm² with a maximum propagation delay of 1.12 ns, corresponding to a data rate of approximately 900 Mb/s. Clearly, the decoder is the bottleneck in the encoder/decoder pair. For the data rates considered (see Table I), the maximum number of encoder/decoder pairs needed to sustain an aggregate bandwidth of 10.8 Gb/s is 18. Even with this many encoder/decoder pairs, the total area of the FEC block predicted by Synopsys is 867 × 867 µm². This is less than 1 mm² for an aggregate bandwidth of 10.8 Gb/s and we therefore believe that the Golay code is suitable for on-chip FEC. Table I shows the power consumption of the FEC block ($P_{FEC}$) at various data rates. Note that the power consumption of the encoder and decoder blocks taken individually follows the rule $P_d = C_L V_{DD}^2 f_p$ [14], where $P_d$ is the dynamic power consumption, $C_L$ is the capacitive load, $V_{DD}$ is the supply voltage and $f_p$ is the frequency of operation. On the other hand, $P_{FEC}$ is not proportional to the data rate because the number of encoder/decoder pairs is reduced as the data rate increases.
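The quoted total FEC area can be cross-checked with simple arithmetic. The sketch below sums the areas of 18 encoder/decoder pairs built from the 200 × 200 µm² decoder and the 42.72 × 42.72 µm² encoder and reports the side of the equivalent square. It assumes no routing or placement overhead, which is a simplification of this sketch, and the function name is illustrative.

```python
import math

DECODER_SIDE_UM = 200.0   # decoder synthesized for roughly 300 Mb/s
ENCODER_SIDE_UM = 42.72   # encoder side length
PAIRS = 18                # maximum number of encoder/decoder pairs


def fec_block_side_um(pairs, decoder_side, encoder_side):
    """Side of a square whose area equals the summed cell areas."""
    area_um2 = pairs * (decoder_side**2 + encoder_side**2)
    return math.sqrt(area_um2)


if __name__ == "__main__":
    side = fec_block_side_um(PAIRS, DECODER_SIDE_UM, ENCODER_SIDE_UM)
    print(f"equivalent square: {side:.1f} um per side")
    print(f"total area: {side**2 / 1e6:.3f} mm^2")
```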
We also simulated the single-ended receiver described in [2]. The simulated current drawn from the power supply ($I_R$) was used in our analyses. The receiver was designed in 0.35 µm CMOS and therefore the variable $I_R$ in (4) and (5) was slightly overestimated compared to a 0.18 µm design. Because we based our analysis on 0.18 µm CMOS, $V_{DD}$ was assumed to be 1.8 V. We assumed VCSELs with a threshold current of 1.5 mA and a bias current equal to the threshold current. Because the Golay code is a half-rate code, the variable $R$ in (5) was set to 0.5 throughout the simulations. Variable $K$ depends on the sub-links' data rate (see Table I).
SNR requirement to achieve a PER of $10^{-15}$
Using (1), the raw BER was plotted against the SNR (see Fig. 3); this is the first case. The PER for five other cases was plotted in the same figure using (3). The second case of interest is without FEC and with no erasure. At any given SNR, the PER is greater than the raw BER of each individual sub-link, because any one of the 12 sub-links can cause a packet error. The SNR required to achieve a PER of $10^{-15}$ is 24.3 dB (see Table II). In the third case, with FEC and no erasure, the required SNR drops to 18.4 dB, corresponding to a 5.9 dB coding gain at a PER of $10^{-15}$. In cases 4 to 6, a 24-bit ODL with 1, 2 or 3 erasures requires an SNR of 19.6, 21.4 and 24.4 dB, respectively. Note that at a given SNR, the PER for case 6 is slightly higher than in case 2. Only 21 bits are considered in the calculation of the PER in case 6: the decoder corrects the 3 erasures, so the PER becomes the probability of finding at least one error in a 21-bit packet, which is higher than the probability of finding at least one error in a 12-bit packet. The same discussion applies to cases 4 and 5, where the PER is calculated on 23-bit and 22-bit packets and the decoder can still correct 2 and 1 random errors, respectively.
VCSEL launch power requirement
Fig. 4 shows the VCSEL launch power required to achieve a PER of $10^{-15}$ under various conditions. The VCSEL launch power is tabulated in the figure as a function of data rate, optical loss and number of erasures. Cases 2 to 6 of the previous section are considered here. The cases are ordered from left to right according to their launch power requirement. For the same set of conditions, the 2D-ODL that requires the lowest VCSEL launch power is the one with FEC and no erasure. For example, consider a 2D-ODL operating at 225 Mb/s/sub-link and having an optical loss of -6 dB (which corresponds to an optical throughput of about 25%). The VCSEL launch powers for cases 2 and 3 are 4.87 mW and 1.24 mW, respectively. FEC therefore reduces the required VCSEL launch power by a factor of 3.9 when there are no erasures. Moreover, a forward error corrected 2D-ODL with as many as two erasures decreases the VCSEL launch power by a factor of almost 2 (4.87 mW vs. 2.49 mW) while still maintaining a PER of $10^{-15}$. Because the SNR requirement for cases 2 and 6 is almost the same (see Table II), the VCSEL launch powers are comparable.
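The SNR requirements quoted for cases 2 to 6 can be cross-checked numerically. The Python sketch below bisects on the SNR until the PER reaches the target. It assumes the Gaussian-noise form BER_DD = ½ erfc(√SNR / (2√2)) with the SNR expressed as 10 log10 in dB, and it models e erasures by shrinking the packet to 24 − e bits and the correction capability to 3 − e errors, following the description of case 6 above. These modeling choices and all names are assumptions of this sketch.

```python
import math

TARGET_PER = 1e-15


def ber_dd(snr):
    """Raw BER for direct detection (assumed Gaussian-noise form)."""
    return 0.5 * math.erfc(math.sqrt(snr) / (2.0 * math.sqrt(2.0)))


def packet_error_rate(p_e, n, t):
    """Probability of more than t independent errors in n bits."""
    return sum(
        math.comb(n, i) * p_e**i * (1.0 - p_e) ** (n - i)
        for i in range(t + 1, n + 1)
    )


def required_snr_db(n, t, target=TARGET_PER):
    """Bisect on the SNR (in dB) until the PER meets the target."""
    lo, hi = 0.0, 40.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        per = packet_error_rate(ber_dd(10.0 ** (mid / 10.0)), n, t)
        if per > target:
            lo = mid
        else:
            hi = mid
    return hi


# (packet length n, correctable errors t) for each case.
CASES = {
    "case 2: no FEC, no erasure": (12, 0),
    "case 3: FEC, no erasure": (24, 3),
    "case 4: FEC, 1 erasure": (23, 2),
    "case 5: FEC, 2 erasures": (22, 1),
    "case 6: FEC, 3 erasures": (21, 0),
}

if __name__ == "__main__":
    for label, (n, t) in CASES.items():
        print(f"{label}: {required_snr_db(n, t):.1f} dB")
```

Under these assumptions the results agree with the Table II values quoted in the text to within about 0.1 dB.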
The required VCSEL launch power increases with frequency (see Fig. 4). This can be explained by (2), from which it is clear that the VCSEL launch power must increase proportionally to the square root of frequency in order to maintain a constant SNR. Fig. 4 also shows that the launch power increases with optical loss. This is also in accordance with (2), which indicates that the VCSEL launch power must increase in inverse proportion to the optical throughput to maintain a constant SNR. At high data rates or for high optical losses, a point is reached where the required VCSEL launch power is no longer practical. For example, consider the 2D-ODLs of cases 2 and 5 in Fig. 4. Assuming that the 2D-ODLs use VCSELs having a maximum optical output power of 5 mW, Fig. 4 shows that there are certain data rates or optical losses that would require too large a VCSEL launch power to maintain the PER below $10^{-15}$. The 2D-ODL of case 2 (no FEC) will not be able to maintain a PER of $10^{-15}$ if the optical loss is -8 dB. However, provided that the data rate is 225 Mb/s or less, the 2D-ODL of case 5 (with FEC) will be able to maintain a PER of $10^{-15}$ under the same optical loss condition. Additionally, the 2D-ODL of case 5 is able to handle two erasures.
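The launch power figures and the 5 mW feasibility argument can be approximated with a few lines of Python. The sketch assumes SNR = P_o η_optics / (NEP √f) with the SNR in dB taken as 10 log10, uses the receiver bandwidth equal to the sub-link data rate, and takes the required SNR values (24.3 dB for case 2, 21.4 dB for case 5) from the text. The function name is illustrative and small differences from Fig. 4 are expected because the SNR inputs are rounded.

```python
import math

NEP = 0.3e-9     # W/sqrt(Hz), from the text
P_MAX_W = 5e-3   # example maximum VCSEL output power used in the text


def launch_power_w(snr_db, rate_bps, loss_db, nep=NEP):
    """Launch power needed for a given SNR (assumed form of (2))."""
    snr = 10.0 ** (snr_db / 10.0)
    throughput = 10.0 ** (loss_db / 10.0)  # loss_db is negative, e.g. -8
    return snr * nep * math.sqrt(rate_bps) / throughput


if __name__ == "__main__":
    for label, snr_db in (("case 2", 24.3), ("case 5", 21.4)):
        for rate in (225e6, 450e6):
            p = launch_power_w(snr_db, rate, loss_db=-8.0)
            verdict = "ok" if p <= P_MAX_W else "exceeds 5 mW"
            print(f"{label}, {rate / 1e6:.0f} Mb/s, -8 dB: "
                  f"{p * 1e3:.2f} mW ({verdict})")
```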
In summary, our analysis thus far suggests three things regarding the use of FEC. 1) It makes a 2D-ODL robust to erasures. 2) It allows the reduction of the VCSEL launch power. 3) It helps relax the requirements on the optical system.
On-chip power consumption
We saw in the previous section that FEC can significantly decrease the required VCSEL launch power to achieve a given PER. In this section, we will show that FEC can also decrease, in some cases, the on-chip power consumption. This is somewhat counterintuitive considering that FEC doubles the number of sub-links necessary to support a given aggregate bandwidth at a given data rate per sub-link, not to mention that the FEC block consumes power.
Using (4) and (5), the on-chip power consumption of 2D-ODLs with no erasures (cases 2 and 3) was plotted in Fig. 5. Table III presents a greater subset of the results. Fig. 5(a) shows that when the optical loss is -3 dB, a 2D-ODL with FEC consumes more power than a 2D-ODL without FEC at any data rate, i.e. the power saving is negative. From Table III, the power penalty is 464 mW, 185 mW and 96 mW for data rates of 100, 225 and 450 Mb/s/sub-link, respectively. A -3 dB optical loss is a relatively optimistic figure and therefore optical losses of -6, -8 and -10 dB were considered in Fig. 5(b-d), respectively. For a -6 dB optical power loss, FEC starts to give a power saving for data rates above approximately 150 Mb/s. From Table III, this power saving is 74 mW at 225 Mb/s and 87 mW at 450 Mb/s. As the optical loss increases, the break-even data rate decreases. For an optical loss of -8 dB or more, FEC reduces the power consumption of a 2D-ODL at all data rates considered (Fig. 5(c, d)).
In terms of PER, a 2D-ODL with no FEC is only as good as its worst sub-link. This is so because an error in any one of the sub-links transmitting in parallel will cause a packet error. A 2D-ODL with no FEC and only one erasure will therefore fail to provide a PER of $10^{-15}$. On the other hand, a 2D-ODL with FEC will be able to handle as many as 3 erasures out of a 24-bit packet before it fails to provide a PER of $10^{-15}$. We compared the power consumption of a 2D-ODL without FEC (case 2) to the power consumption of a 2D-ODL with FEC and one (case 4), two (case 5) or three (case 6) erasures. In case 2, a 2D-ODL with absolutely no defect is needed because no erasure can be tolerated.
In cases 4 to 6, a less than perfect 2D-ODL is acceptable since three erasures can be tolerated out of 24 sub-links. Fig. 6(a) shows that the 2D-ODL with one erasure gives a power saving relative to the 2D-ODL without FEC and no erasure, assuming a data rate greater than 150 Mb/s and an optical loss of -8 dB. For a -10 dB optical loss, the power saving is greater (see Fig. 6(b)). As can be seen in Fig. 6(c), there is no longer a power saving for the 2-erasure or 3-erasure cases (cases 5 and 6); only case 5 is shown in the figure. With two erasures and an optical loss of -10 dB, the on-chip power penalty is 503 mW and 321 mW (see Table III) at data rates of 225 and 450 Mb/s, respectively. We are willing to accept this power penalty, though, as FEC has made the 2D-ODL robust to erasures. The power penalty is even more justified considering that FEC reduces the required VCSEL launch power and relaxes the design requirements of the optics.
CONCLUSION
We have demonstrated that on-chip FEC offers many benefits to 2D-ODLs. In particular, we showed that FEC can 1) reduce the required VCSEL launch power to achieve a given BER, 2) keep the BER below the target BER even in the presence of erasures, 3) reduce the on-chip power consumption and 4) relax the optical throughput requirement. For certain low data rates or optimistic optical throughput, the on-chip power consumption of a 2D-ODL with FEC becomes higher than the power consumption of a 2D-ODL without FEC. For less than perfect 2D-ODLs containing erasures, we concluded that the power penalty was acceptable considering the many advantages that FEC offers.
[14] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, 2nd ed., Addison-Wesley, Massachusetts, 1994.
TABLES AND FIGURES
Table I. Values used for the variables in (4) and (5). K is obtained by dividing 10.8 Gb/s by the data rate. Dividing K by 12 (the number of information bits for the Golay code) gives the number of encoder/decoder pairs. The encoder and decoder blocks were synthesized in 0.18 µm CMOS. The receiver [2] was designed in 0.35 µm CMOS and $I_R$ has therefore been slightly overestimated in our analysis.
Fig. 4. VCSEL launch power requirement to achieve a PER of $10^{-15}$ as a function of data rate, optical loss and number of erasures. The corresponding SNR requirement for each case can be found in Table II.
