Abstract: In this paper, we demonstrate a low-cost, 48-channel, high-speed, flexible printed circuit (FPC)-based interconnect packaging concept for on-board optical modules. Owing to its good high-speed performance and low cost, the FPC board is used as the base carrier for both the transmitter and receiver modules. The on-board transmitter and receiver are based on a commercial 1-mm-pitch ISI HoLi pin grid array connector. Each module measures only 31.5 mm × 31.5 mm and, thanks to its compact design, offers a state-of-the-art bandwidth density of 0.483 Gb/s/mm². RF signal propagation on the FPC is investigated for design validation at 10 Gb/s, and, to further explore the potential of the proposed platform, differential pairs are simulated up to 30 GHz. The low-cost packaging approach requires only a few flip-chip bonding steps using industry-standard solder reflow and ultrasonic bonding processes. An 8 × 12-channel optical straight lens connector is used to couple the light from the optics into two 48-fiber multi-fiber push-on connectors with 8 × 12-channel MT ferrules. The fully assembled transmitter and receiver are tested at 10 Gb/s, demonstrating error-free operation with sensitivities comparable to those of commercial devices. Bit error rates for all 96 channels as well as representative eye diagrams at 10 Gb/s are reported.
I. INTRODUCTION
At present, the required optical interconnect bandwidth is growing exponentially to support the growth of traffic in data center networks. Parallel optical transmitter and receiver modules must be improved to keep up with the increase in both bandwidth and bandwidth density in the network [1]. Most commercially available transceivers today use pluggable front-panel modules such as SFP+ and QSFP+ [2]. The size of a pluggable transceiver module, which is mainly employed at the edge of the data center switchboard, and the area of the switchboard front panel eventually limit the total bandwidth supported by the switching system. This issue is referred to as the front panel bottleneck [3]. Also, the long transmission lines (TLs) connecting the optical transceiver modules at the edge of the board to the switch application-specific integrated circuit (ASIC) increase power consumption. As bit rates rise, the increased power consumption and electrical distortion become more significant. In addition, retimers need to be placed on the printed circuit board (PCB) to guarantee sufficient signal integrity, and these require additional power [4]. As a result, increasing research and design effort has been devoted to placing the optical transceiver modules closer to the switch ASIC [5]. On-board optics (OBO) transceivers have been proposed as an alternative to front-panel transceivers to overcome the front-panel bandwidth bottleneck, offering a small footprint and large bandwidth density [6]; placing the interconnects on board reduces power consumption by shortening the TLs and making the retimers redundant [4]. For high-speed optical transceiver modules, most of the effort is invested in the electronic and optical packaging processes to fulfill several requirements such as high bit rate, high bandwidth density, high channel count, low cost, and low power consumption. For example, cleanroom technology is often used in the interposer fabrication process [7], and wire bonding is used to make the connections from the interposer to the chip or between chips [8]. Such packaging processes are relatively expensive and complicated, or include serial fabrication steps that make them costly and hard to automate.
T. Li, C. Li, R. Stabile, and O. Raz are with the Department of Electrical Engineering, Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands (e-mail: t.li@tue.nl; chenhui.li@tue.nl; rstabile@tue.nl; o.raz@tue.nl).
S. Dorrestein is with TE Connectivity Ltd., 5222AR 's-Hertogenbosch, The Netherlands (e-mail: sander.dorrestein@te.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCPMT.2018.2852758
The performance of the optical transmitter is highly dependent on the signal integrity of the high-speed TLs. Therefore, low propagation loss and minimal unwanted reflections are the main criteria when choosing the substrate and designing the metal tracks for the optical transmitter module. Flexible printed circuit (FPC) boards made of polyimide have previously been demonstrated to be a promising platform for optical transceiver modules [9]. Compared with a rigid PCB, an FPC can offer higher fabrication resolution [10], and it has been shown to provide good performance at bit rates up to 40 Gb/s [9]. However, FPC assembly can be more challenging because the nonrigid substrate deflects when pressure is applied during the flip-chip assembly step.
In this paper, we report FPC-based 48 × 10-Gb/s on-board transmitter and receiver modules. To ensure easy and low-cost packaging, the electrical and optical components are attached to the FPC using a flip-chip machine. Both solder reflow and ultrasonic bonding processes are applied directly on the FPC, eliminating the need for additional interposers or cleanroom processes. We suggest that this process can be taken over by automated flip-chip machines in assembly lines, as it requires only low alignment accuracy [11]. Each module includes 48 channels at 10 Gb/s and offers a bandwidth density of 0.483 Gb/s/mm². An 8 × 12-channel optical straight lens connector is used to couple the light to/from the optical dies into two 48-fiber multi-fiber push-on (MPO) connectors with 8 × 12-channel MT ferrules. The ISI pin grid array (PGA) connector is used to connect the module to the field-programmable gate array (FPGA) board; this PGA connector has been shown to support up to 20-Gb/s signaling [12]. This paper is organized as follows. Section II presents the design concept for both the transmitter and receiver modules. In Section III, the TLs on the FPC are simulated and discussed. In Section IV, the fabricated FPC and the assembly processes for both the transmitter and receiver are described. In Section V, the test results for both modules are given, including bit error rate (BER) curves for all 96 channels and eye diagrams at 10 Gb/s for representative channels; crosstalk is also discussed in this section. In Section VI, the results of thermal simulations performed with COMSOL are given. Finally, conclusions are drawn in Section VII.
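The bandwidth density quoted above can be reproduced from the channel count, the line rate, and the module footprint. The sketch below assumes the density is defined as the aggregate single-direction data rate divided by the square 31.5 mm × 31.5 mm module area; the function name is illustrative.

```python
def bandwidth_density(channels: int, rate_gbps: float, side_mm: float) -> float:
    """Aggregate data rate divided by a square module footprint, in Gb/s/mm^2."""
    aggregate_gbps = channels * rate_gbps  # 48 x 10 Gb/s = 480 Gb/s
    area_mm2 = side_mm * side_mm           # 31.5 mm x 31.5 mm = 992.25 mm^2
    return aggregate_gbps / area_mm2


density = bandwidth_density(channels=48, rate_gbps=10.0, side_mm=31.5)
print(f"{density:.4f} Gb/s/mm^2")
```

The result, about 0.4837 Gb/s/mm², agrees with the 0.483 Gb/s/mm² stated in the text when truncated to three decimals.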
II. DESIGN CONCEPT
The schematic of the optical transmitter/receiver is shown in Fig. 1.
A low-loss polyimide with a dielectric constant of 3.2 and a dissipation factor of 0.004 at 1 MHz is used as the base material. The FPCs for both the transmitter and the receiver measure 3.15 cm × 3.15 cm × 90 μm (W × L × D) and consist of two copper layers sandwiching a thin insulating polymer film. Copper circuit patterns on both sides of the polymer layer provide the interconnections between the chips and the connection to the commercial ISI HoLi 556 PGA connector. Solder mask layers are typically applied on both sides to protect the metal tracks. This PGA connector was chosen for several reasons. First, its 1-mm pitch is the standard pitch of most packaged FPGA and switching ASIC chips. Second, the size of the commercially available optical lens interface (2.6 mm × 6.4 mm) sets the eventual minimum size of the module, and this PGA connector leaves enough space in the center for the 4 × 12-channel optical straight lens connectors and their fan-out fiber ribbons. In addition, the ground plane between the differential pairs reduces crosstalk between the channels, at the cost of a larger module size (see Fig. 2).
Although polyimide is an excellent material for high-speed applications, it deforms during the high-temperature reflow process, which can lead to misalignment during flip-chip bonding. To solve this problem, a plastic stiffener is attached to the FPC to provide mechanical support, which helps avoid deflection of the FPC and makes the assembly process easier.
In the center of the FPC, 50-μm-wide coplanar differential pairs for four separate 12-channel transmitters or receivers are routed compactly. Round pads with the standard 1-mm pitch for the PGA connector are located along the edge of the FPC. The PGA connector is machined with a hole in the middle to leave space for the electrical and optical components, as shown in Fig. 3. This PGA connector has been shown to support up to 20-Gb/s signaling [12].
III. SIMULATIONS OF FPCS
A. Digital Signal Transmission
The differential TLs, which are based on a coplanar waveguide design, carry the digital high-speed signals within the FPC from the PGA pads to the electrical chips. The differential impedance of these lines is calculated and optimized to be 100 Ω with Polar SI9000 and further simulated with Computer Simulation Technology (CST). As shown in Fig. 2, the differential pairs fan out from the 125-μm-pitch pads of the chip to the 1-mm-pitch pads of the PGA connector, and each chip is routed with 12 differential pairs. Because of the pad locations, the length of the differential pairs varies from 4.8 to 13.1 mm. The impedance-matched differential pairs are designed with 70-μm width, 125-μm pitch (between two differential pairs), and a 150-μm gap to the ground plane. The FPC layer stack for a differential pair is drawn in Fig. 4.
Fig. 4. Drawing of the layer stack of the FPC supporting 100-Ω differential TLs.
The S-parameters of the designed differential pair on the FPC are then simulated with the 3-D full-wave electromagnetic solver of CST. The same differential TL is simulated with and without the PGA pads. The results show that the size of the round pads (at 1-mm pitch) also affects signal transmission, especially at high frequencies. The transmission and reflection parameters of channel 9, which is 10.1 mm long, are shown in Fig. 5.
According to the simulation results, the TL itself has a reflection (S11) below −20 dB and a transmission loss (S21) better than −1.2 dB up to 30 GHz. The large pads for the PGA connector introduce additional reflections and propagation loss at high frequencies. However, a commercially available high-speed PGA connector with 0.34-mm-diameter round pads can support up to 30 GHz with a transmission (S21) of only about −2 dB and a reflection (S11) below −10 dB. The loss coefficient of the differential pair on the FPC is about 0.06 dB/mm at 30 GHz, so for a maximum transmission loss of 3 dB, the trace length should not exceed 5 cm.
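The length limit in this paragraph follows from dividing the loss budget by the loss per unit length. The sketch below assumes a linear (dB-additive) loss model that ignores pad and connector contributions; under that assumption, a 3-dB budget reached at 5 cm corresponds to a loss coefficient of about 0.06 dB/mm. The helper names are hypothetical.

```python
def max_trace_length_mm(loss_budget_db: float, loss_db_per_mm: float) -> float:
    """Longest trace whose insertion loss stays within the budget.

    Assumes loss grows linearly with length (constant dB/mm) and ignores
    pad, via, and connector contributions.
    """
    if loss_db_per_mm <= 0:
        raise ValueError("loss coefficient must be positive")
    return loss_budget_db / loss_db_per_mm


def implied_loss_coefficient(loss_budget_db: float, length_mm: float) -> float:
    """dB/mm at which a trace of length_mm uses up exactly the loss budget."""
    return loss_budget_db / length_mm


alpha = implied_loss_coefficient(3.0, 50.0)  # 3-dB budget over 5 cm
print(f"{alpha:.2f} dB/mm -> max length {max_trace_length_mm(3.0, alpha):.0f} mm")
# Estimated loss of the longest (13.1 mm) differential pair at that coefficient.
print(f"{13.1 * alpha:.2f} dB")
```

Because the model is linear, the same function also gives the remaining margin for any of the 4.8 to 13.1 mm fan-out traces.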
To analyze the high-speed performance of the differential pairs, a 28-Gbaud PAM4 signal is injected into the longest differential pair, channel 11 (13.2 mm), and picked up at the other end. This channel suffers the largest propagation loss and reflections because of its length. The PAM4 eye diagram and the back-to-back (BTB) performance are shown in Fig. 6. Because of the insertion loss of the probe and the propagation loss of the TL, the amplitude swing is reduced from 600 to 400 mV. The jitter also increases because of the small length mismatch between the two lines of the differential pair. Nevertheless, the eye is still clearly open.
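The figures in this paragraph can be checked with a few lines of arithmetic: 28-Gbaud PAM4 carries 2 bits per symbol, the 600-to-400-mV swing reduction is a voltage ratio expressed in dB, and, assuming equally spaced levels and ignoring noise and intersymbol interference, each of the three stacked PAM4 eyes gets one third of the swing. The function names are illustrative.

```python
import math


def swing_change_db(v_in_mv: float, v_out_mv: float) -> float:
    """Amplitude change in dB (voltage ratio, hence 20*log10)."""
    return 20.0 * math.log10(v_out_mv / v_in_mv)


def pam4_ideal_eye_height_mv(swing_mv: float) -> float:
    """Ideal height of each of the three PAM4 eyes, ignoring noise and ISI."""
    return swing_mv / 3.0


baud_gbd = 28.0
bit_rate_gbps = baud_gbd * math.log2(4)   # 2 bits per PAM4 symbol
loss_db = swing_change_db(600.0, 400.0)   # combined probe + TL attenuation
eye_mv = pam4_ideal_eye_height_mv(400.0)

print(f"{bit_rate_gbps:.0f} Gb/s, {loss_db:.2f} dB, {eye_mv:.0f} mV per eye")
```

The swing reduction corresponds to roughly 3.5 dB, which lumps together the probe insertion loss and the TL loss.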
B. Analog Signal Transmission
Based on the pad layout of the chips, the trace needs to be designed as a 125-μm-pitch ground-signal line to connect the anode and cathode pads of the 250-μm-pitch optical arrayed devices. Although a 50-Ω impedance is often desired for high-speed TLs, the impedance of optoelectronic devices (vertical-cavity surface-emitting lasers (VCSELs) and photodetectors) may be quite different. Therefore, the performance of the complete device, including both the CMOS and optoelectronic chips, depends strongly on this TL.
Several issues need to be considered in the design. Because the VCSEL parameters vary with bias current, the connection should be kept as short as possible, since a long trace can degrade the signal. Shorter connections also allow tight coupling between the anode and cathode, which can reduce lane-to-lane crosstalk [13]. As the length of the connection grows, attention must also be paid to impedance matching to reduce unwanted reflections [14].
The VCSELs used in this transmitter support up to 14 Gb/s and have a 65-Ω series resistance. Based on this resistance, the TL is designed as a 50-μm-wide signal trace and a 50-μm-wide ground trace with a 75-μm gap, whose characteristic impedance is calculated with Polar SI9000 to be 65 Ω to match the VCSELs. However, not all high-speed 850-nm VCSELs have a 65-Ω differential series resistance; their impedance depends not only on the device structure, which is tied to the target speed, but also on the bias current [15]. For example, 10-, 14-, and 25-Gb/s VCSELs have differential series impedances of 50 [16], 65 [17], and 80 Ω [18], respectively, at 6-mA bias current.
CST is also used to simulate the S-parameters of this signal-ground TL. In the simulation, the source impedance of the driver is set to 50 Ω, the length of the TL is 500 μm, and the load impedance is set to 50, 65, and 80 Ω in turn. The simulation results are shown in Fig. 7.
The S-parameter simulation results (Fig. 7) show that, owing to the short length and low material loss, there is little difference in transmission even when the impedance is not matched: for all three loads, the propagation loss is less than 0.25 dB up to 30 GHz. The reflection, however, varies considerably with the load because of the impedance mismatch. The TL is designed for 65 Ω to match the 14-Gb/s VCSELs. When the 50-Ω source drives a 10-Gb/s VCSEL of the same impedance through this TL, the reflection rises from −38 dB at 2 GHz to −16 dB at 30 GHz. This shows that matching the source and load impedances while ignoring the impedance of a short TL is feasible, especially at low frequencies. The results also show that matching the short TL to a 65-Ω load while leaving the driver source at 50 Ω is a good solution. Although this 65-Ω solution is not as good as the 50-Ω solution at low frequencies, it performs better beyond 26 GHz. Its reflection (S11) is very flat and stays below −17 dB from 2 to 30 GHz. If a load with an 80-Ω series resistance is driven by the 50-Ω source through the 65-Ω TL, the reflection is worse than for the other loads but, thanks to the short trace, still below −13 dB. Overall, we believe the designed TL between the driver chip and the VCSELs can support signaling speeds up to 28 Gb/s on our FPC platform.
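The trend in Fig. 7 can be reproduced qualitatively with the textbook input-impedance formula for a terminated transmission line, Z_in = Z0 (Z_L + j Z0 tan βl) / (Z0 + j Z_L tan βl), with S11 referenced to the 50-Ω driver. This is a minimal sketch, not the CST model: it assumes a lossless line and a hypothetical effective permittivity of 2.6 for the coplanar trace, so it will not reproduce the simulated values exactly. It does illustrate why a 500-μm line is nearly transparent at low frequency, where S11 reduces to the direct source-to-load mismatch.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def input_impedance(z0: float, z_load: complex, length_m: float,
                    freq_hz: float, eps_eff: float) -> complex:
    """Input impedance of a lossless TL of impedance z0 terminated in z_load."""
    beta = 2.0 * math.pi * freq_hz * math.sqrt(eps_eff) / C0  # phase constant, rad/m
    t = math.tan(beta * length_m)
    return z0 * (z_load + 1j * z0 * t) / (z0 + 1j * z_load * t)


def s11_db(z_in: complex, z_source: float = 50.0) -> float:
    """Reflection seen by the source, in dB (-inf for a perfect match)."""
    gamma = (z_in - z_source) / (z_in + z_source)
    mag = abs(gamma)
    return 20.0 * math.log10(mag) if mag > 0 else float("-inf")


# Hypothetical effective permittivity; the real value depends on the stack-up.
EPS_EFF = 2.6
for z_load in (50.0, 65.0, 80.0):
    row = []
    for f_ghz in (2, 10, 30):
        zin = input_impedance(65.0, z_load, 500e-6, f_ghz * 1e9, EPS_EFF)
        row.append(f"{s11_db(zin):6.1f}")
    print(f"Z_L = {z_load:.0f} ohm: S11 (dB) at 2/10/30 GHz =", " ".join(row))
```

Setting the frequency to zero collapses Z_in to Z_L, which is the low-frequency limit discussed in the text; a lossy model with the measured stack-up would be needed for quantitative agreement.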
Different driver designs may also affect the performance through their output impedance. For example, to reduce power consumption, the output resistance of the driver chip in [18] is designed to be relatively large compared with the VCSEL junction resistance.
Besides the factors discussed above, the length of the trace can also make a large difference to the high-speed performance. To evaluate this influence, traces of 500 μm and 1000 μm length have been designed in the four 12-channel transmitters.
Because of the low-cost process, the fabrication resolution is coarse and hard to control. To avoid potential short circuits, overetching is applied during fabrication. As a result, the traces on the FPC are 30 μm wide (nominal 50 μm) in the 125-μm-pitch ground-signal-ground (GSG) TL; the GSG trace is drawn in Fig. 8.
CST is used to resimulate this fabricated TL, and a vector network analyzer (VNA) is used to measure its S-parameters. In both simulation and measurement, the output resistance of the driver chip and the impedance of the VCSEL are both set to 50 Ω to verify the performance at 10 GHz. These measurement and simulation results are discussed further in Section V.
Unlike in the transmitter, the link between the photodiode and the transimpedance amplifier (TIA) chip cannot be optimized by adjusting the current, so high-speed signal degradation is clearly observed [7]. To overcome this potential problem, the photodetectors are placed close to the TIA chips and connected with traces only 300 μm long.
IV. ASSEMBLY
The FPC is fabricated with the analog and digital traces designed in Section III. The metal traces are copper, and the FPC base material is a flexible polyimide film with a tensile strength of 345 MPa. As a result, direct flip-chip bonding of chips on such a thin film is challenging. To overcome deflection during bonding, plastic stiffeners are glued on the FPC to increase its stiffness. The coefficient of thermal expansion (CTE) of the stiffener is close to that of the FPC to avoid CTE-mismatch-related issues during the high-temperature reflow processes. Four cavities are designed in this additional plastic layer to allow insertion of the optical straight lens connectors. The top view of the fabricated FPC is shown in Fig. 9.
A FINEPLACER Femto is used to flip-chip bond both the CMOS driver integrated circuits (ICs) and the 850-nm multimode VCSEL arrays through a SnAg solder-bump reflow process. To improve the reflow, formic acid vapor is introduced during the process to reduce the surface oxides on the solder bumps. To further stiffen the FPC substrate during die reflow, a special hotplate attachment has been machined with a relief that contacts the FPC inside the cavity during bonding [Fig. 10(a)]. With a standard flat hotplate, the air inside the cavity, which has a much lower thermal conductivity, would reduce the heat transfer from the hotplate to the pads on the FPC and the solder bumps on the chips. The specially designed attachment is made of aluminum, whose high thermal conductivity greatly improves the heat transfer between the hotplate and the pads on the FPC. After visual alignment, both the chip pickup holder (330°C) and the hotplate (300°C) are heated for 60 s to complete the solder reflow between the VCSEL array and the FPC [Fig. 10(b)]. Then, for the attachment of the CMOS dies, heat is applied only to the hotplate and the FPC; the reflow takes place at 300°C for 60 s [Fig. 10(c)] to complete the assembly process. The FPC is flipped upside down, and the 2.6 mm × 6.4 mm optical straight lens connectors are assembled through the cavities with …
The assembly process for the receiver module requires some modifications. Because no photodetector (PD) arrays with solder bumps were available, ultrasonic bonding is used to connect the PD array to the FPC. VCSELs and photodetectors are usually commercially available as dies with gold pads, and ultrasonic bonding is a good solution for these relatively robust GaAs-based optical chips.
For more fragile chips such as InP devices, a solder paste transfer process can be applied to their pads, which may prove compatible with reflow processes in industrial production lines. This means that both packaging approaches can be used for most III-V semiconductor-based optical chips. The fully assembled device with fiber ribbons attached is shown in Fig. 11, with the front [Fig. 11(a)] and back [Fig. 11(b)] sides of the 48-channel on-board transmitter.
Then, a solder reflow process at 250°C for 60 s is applied to bond the ISI connector. The fully assembled on-board optical receiver is shown in Fig. 12.
V. EXPERIMENTAL RESULTS
High-speed data transmission and detection are used to evaluate the high-speed performance of the transmitter and receiver modules. The experimental setup is shown in Fig. 13. A 1-mm-pitch differential RF probe is used to inject signals into a differential pair of PGA pads. A microcontroller controls the driver and TIA chips through a two-wire communication interface and provides a 3.3-V power supply. A standard MPO connector is connected to the optical straight lens connectors through the MT ferrules. Depending on the module under test, a commercial SFP+ module serves as the high-speed light source or detector.
A. Transmitter Testing
A 10-Gb/s differential electrical signal carrying a 2³¹ − 1 non-return-to-zero (NRZ) pseudorandom binary sequence (PRBS) is injected into the driver chips and VCSEL array on the FPC through a 1-mm-pitch RF differential probe. With all VCSEL channels turned ON, the measured output power ranges from −1.5 to −0.5 dBm. The receiver inside the SFP+ module detects the optical signals from the VCSELs, and its electrical output is connected to the oscilloscope and the error detector. All 48 channels of the transmitter module show error-free operation at 10 Gb/s. The BER of the 48 channels is measured one by one, and the results are shown in Fig. 14. The channels with 1000-μm-long TLs between the driver output and the VCSEL array (channels 37 to 48) perform worst, with an average sensitivity of −11.53 dBm at a BER of 10⁻⁹. The average sensitivity of the other channels, which use 500-μm-long TLs (channels 1 to 36), is −12.02 dBm at a BER of 10⁻⁹. All 48 channels of the transmitter module perform better than the SFP+ module, whose sensitivity is −11.1 dBm. In addition, the receiver sensitivity spread is within 1 dB at a BER of 10⁻⁹.
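Sensitivities here are quoted at a BER of 10⁻⁹. Under the common Gaussian-noise approximation for NRZ links, BER = ½ erfc(Q/√2), so that reference point corresponds to a Q factor of about 6. The sketch below inverts this relation by bisection and computes the average penalty of the 1000-μm-trace channels from the two averages reported above; the Gaussian model and the helper names are assumptions for illustration, not part of the measurement.

```python
import math


def ber_from_q(q: float) -> float:
    """Gaussian-noise estimate of NRZ bit error rate: BER = 0.5*erfc(Q/sqrt(2))."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def q_for_ber(target_ber: float, lo: float = 0.0, hi: float = 20.0) -> float:
    """Invert ber_from_q by bisection (BER decreases monotonically with Q)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ber_from_q(mid) > target_ber:
            lo = mid  # still too many errors: need a larger Q
        else:
            hi = mid
    return 0.5 * (lo + hi)


q_ref = q_for_ber(1e-9)
# Average penalty of the 1000-um-trace group relative to the 500-um group.
penalty_db = -11.53 - (-12.02)
print(f"Q at BER 1e-9 ~ {q_ref:.2f}; long-trace penalty ~ {penalty_db:.2f} dB")
```

A bisection is used instead of an inverse-erfc routine because the standard `math` module provides `erfc` but no inverse.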
One possible reason for the reduced performance of the transmitter chip with the longer traces is the difference in RF performance. To investigate this, CST is used to simulate the expected performance, and a VNA is used to measure the S-parameters of both the 500- and 1000-μm-long TLs. The reflection and transmission coefficients of the long and short traces are shown in Fig. 15. In both simulation and measurement, the driver source and load are set to 50 Ω. The long TLs cause more signal loss (about 0.5 dB) and higher reflections (by around 5 dB). This signal degradation reduces the current amplitude swing, which lowers the extinction ratio and ultimately worsens the receiver sensitivity. In addition, the parasitic capacitance of the bond pads [13] might affect the BER performance for longer PRBS sequences because of pattern effects.
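The causal chain in this paragraph (lower current swing, lower extinction ratio) can be illustrated with a simple VCSEL model whose optical power is proportional to the drive current above threshold. This is a minimal sketch: the 1-mA threshold and 6-mA modulation current are hypothetical, the 6-mA bias echoes the bias point quoted for the VCSEL data sheets, and the 0.5-dB extra loss of the long traces is applied as an amplitude reduction.

```python
import math


def extinction_ratio_db(i_bias_ma: float, i_mod_ma: float, i_th_ma: float) -> float:
    """ER for a VCSEL with a linear L-I curve above threshold (hypothetical model)."""
    p1 = i_bias_ma + i_mod_ma / 2 - i_th_ma  # '1' level, proportional to power
    p0 = i_bias_ma - i_mod_ma / 2 - i_th_ma  # '0' level, proportional to power
    if p0 <= 0:
        raise ValueError("'0' level below threshold: linear model not valid")
    return 10.0 * math.log10(p1 / p0)


def scale_by_db(amplitude: float, loss_db: float) -> float:
    """Reduce a current amplitude by loss_db (20*log10 convention for amplitudes)."""
    return amplitude * 10 ** (-loss_db / 20.0)


I_BIAS, I_TH, I_MOD = 6.0, 1.0, 6.0  # mA, hypothetical operating point
er_short = extinction_ratio_db(I_BIAS, I_MOD, I_TH)
er_long = extinction_ratio_db(I_BIAS, scale_by_db(I_MOD, 0.5), I_TH)
print(f"ER: {er_short:.2f} dB -> {er_long:.2f} dB with 0.5 dB extra trace loss")
```

In this toy operating point, the extra trace loss costs roughly 0.4 dB of extinction ratio; the real penalty depends on the VCSEL L-I curve and driver settings.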
According to the S-parameter simulation results (Fig. 15), all channels in the transmitter module have a propagation loss below 0.3 dB and a reflection below −12 dB at 10 GHz. The measured results are clearly worse than the simulated ones. This can be attributed to the loss factor of the material, which increases at high frequencies, and to additional impedance mismatch in the real devices. Compared with the simulation results in Section III, the fabricated trace suffers an additional 0.3-dB loss and 5-dB higher reflection relative to the theoretically designed trace.
The eye diagrams of representative channels are clearly open at 10 Gb/s, as shown in Fig. 14. Consistent with the BER results, the long connection between the driver chip and the VCSELs also leads to increased jitter in the eye pattern.
B. Receiver Testing
The same experimental setup is used to verify the high-speed performance of the receiver. The transmitter of the SFP+ module generates optical signals with a 10-Gb/s NRZ PRBS 2³¹ − 1 sequence. The optical signals are detected by the photodetector and amplified by the TIA chip. The differential electrical output signals are visualized on the oscilloscope, with one port terminated in 50 Ω, and are further connected to an error detector for BER measurements. The measured BER results of the receiver module are shown in Fig. 16. The measured receiver sensitivity should be corrected by an additional 0.6 dB, which is the minimum insertion loss of the optical straight lens connectors. After this correction, all 48 channels of the receiver outperform the SFP+ module in terms of BER. The receiver sensitivity of the 48 channels spreads over about 1.2 dB, which is attributed to misalignment of the optical connector in the vertical and horizontal directions and to deflection of the FPC surface.
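The correction described above shifts each sensitivity by the connector insertion loss: the power reaching the photodetector is 0.6 dB lower than the power measured at the connector input, so the sensitivity referred to the detector is 0.6 dB better (more negative). A minimal sketch with hypothetical helper names; the readings in the example are illustrative, not measured values from this work.

```python
def compensate_sensitivity(measured_dbm: list[float],
                           connector_loss_db: float = 0.6) -> list[float]:
    """Refer sensitivities from the connector input to the photodetector.

    The detector sees connector_loss_db less power than the connector input,
    so its sensitivity is better (more negative) by that amount.
    """
    return [p - connector_loss_db for p in measured_dbm]


def spread_db(sensitivities_dbm: list[float]) -> float:
    """Channel-to-channel spread: worst (highest) minus best (lowest) sensitivity."""
    return max(sensitivities_dbm) - min(sensitivities_dbm)


# Illustrative readings only (not measured values from this work).
measured = [-11.4, -11.9, -12.6]
corrected = compensate_sensitivity(measured)
print([round(p, 2) for p in corrected], f"spread = {spread_db(corrected):.1f} dB")
```

Note that a uniform correction does not change the spread; the 1.2-dB spread reported above reflects channel-to-channel variation, not the connector loss.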
The eye diagrams of selected channels are clearly open, as shown in Fig. 16. In this receiver design, 10- and 0.1-μF decoupling capacitors are placed next to the TIA chips. The eye diagrams before and after assembling the decoupling capacitors are shown in Fig. 17; the eye opening improves greatly once the capacitors are in place.
C. Crosstalk Testing
As mentioned in Section III, the receiver is more prone to signal degradation than the transmitter and more sensitive to crosstalk from adjacent channels. Therefore, crosstalk is measured on the receiver module. Two optical signals with a 10-Gb/s PRBS 2³¹ − 1 sequence, generated by a QSFP28 module, are fed to the inputs of channels 4 and 6 while the performance of channel 5 is tested with a differential-input error detector module. The BER curves of channel 5 for one-, two-, and three-channel operation are plotted in Fig. 18. The crosstalk reduces the receiver sensitivity of channel 5 by less than 0.4 dB. Feeding optical signals into other, nonadjacent channels did not cause any further measurable degradation. We conclude that the crosstalk-induced receiver sensitivity penalty per adjacent channel is less than 0.25 dB.
VI. THERMAL SIMULATION
One of the biggest concerns in compact designs is heat dissipation and the resulting thermal conditions to which the optical devices are exposed. According to the datasheet, a typical IC used in our packaging demonstration dissipates 1.02 W per 12-channel module when all channels operate simultaneously at 10 Gb/s. COMSOL is used to simulate the heat dissipation of this compact design.
The electrical chip is predominantly silicon, with a copper metal stack and a SiO₂ passivation layer on top. Its dimensions are 3870 μm × 2245 μm × 430 μm (W × L × H), and those of the optical chip are 3250 μm × 500 μm × 200 μm. As shown in Fig. 19, four 12-channel modules are placed on the FPC following the design concept in Section II. In the simulation, the ambient temperature is set to 20°C. The heat sink is attached directly to the top of the four electrical chips, and these four heat sources (the 12-channel modules) are assumed to operate at maximum power dissipation. A possible heat sink design is simulated. The suggested heat …
The result of the thermal simulation is shown in Fig. 21. With the heat sink, the maximum temperature is 34°C, which is 14°C above ambient. Based on typical VCSEL datasheets, we expect this elevated temperature to reduce the VCSEL output power by only 0.2 mW at a bias current of 6 mA.
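Several figures of merit follow directly from the numbers in this section: the total IC dissipation of four 12-channel modules, the IC energy per bit at 10 Gb/s per channel, and a single lumped thermal resistance derived from the simulated 14°C rise. The lumped resistance is a crude one-number summary of the COMSOL result, introduced here as an assumption for illustration rather than a quantity reported by the simulation; the function name is hypothetical.

```python
def thermal_summary(p_module_w: float, n_modules: int, channels_per_module: int,
                    rate_gbps: float, t_max_c: float, t_ambient_c: float) -> dict:
    """Simple power and thermal figures of merit for the four-module layout."""
    total_w = p_module_w * n_modules
    bits_per_s = channels_per_module * rate_gbps * 1e9  # per 12-channel module
    energy_pj_per_bit = p_module_w / bits_per_s * 1e12  # IC dissipation only
    r_th_k_per_w = (t_max_c - t_ambient_c) / total_w    # lumped hot spot to ambient
    return {
        "total_w": total_w,
        "energy_pj_per_bit": energy_pj_per_bit,
        "r_th_k_per_w": r_th_k_per_w,
    }


s = thermal_summary(1.02, 4, 12, 10.0, 34.0, 20.0)
print(f"{s['total_w']:.2f} W total, {s['energy_pj_per_bit']:.1f} pJ/bit, "
      f"{s['r_th_k_per_w']:.2f} K/W lumped")
```

The energy-per-bit figure covers only the IC dissipation quoted from the datasheet; VCSEL and photodetector power are not included.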
VII. CONCLUSION
Cost-effective on-board optical transmitter and receiver modules have been packaged by assembling the electrical and optical chips on an FPC. This assembly concept was used to package compact 48-channel transmitter and receiver modules that offer a larger bandwidth density than most commercially available OBO modules. The impedance-matched connections are designed in Polar SI9000 and simulated in CST. The design of the electrical and optical connections can be reused for next-generation optical transceivers operating at higher speeds. The electrical traces have been simulated up to 30 GHz and show good performance for future high-speed designs. All 96 channels of the transmitter and receiver are tested at 10 Gb/s, and the eye diagrams are clearly open for all channels. The BER results show that both the on-board transmitter and receiver reported in this paper perform better than a commercial SFP+ module.
In future designs, short interconnections are highly recommended to improve performance and achieve higher bit rates. We aim to build a large-scale, low-cost transceiver module using the standard 1-mm-pitch ball grid array layout, which is compatible with switching ASICs and FPGAs.
