An optical backplane bus based on a glass substrate with volume holographic gratings on its top surface is well suited to broadcasting information. This feature is used to build a bit-interleaved optical interconnect system. In this system, each daughter board sends only one bit per round, and the bit pulses from different boards cascade in a designed sequence when the transmitters are distributed appropriately. In this way, even slow electronic chips can be coordinated to generate an aggregate bandwidth of up to 10 Gbps, which is impossible to achieve with a multi-drop electrical bus. Besides the benefits of high data rate and low crosstalk, such a bit-interleaved architecture provides a secure data storage method: each daughter board stores only a quarter of the bits of any byte, so no single board holds the entire information and security is enhanced.
Introduction
In high performance computing (HPC) systems, the interconnect is becoming an increasingly dominant factor [1], and optical interconnects have been adopted in HPC systems because the product of distance and bandwidth has surpassed the capacity of electrical interconnects at the machine-to-machine hierarchical level. As the CPU clock frequency and the number of cores per CPU increase rapidly, the bandwidth demand among the boards and processors inside a box will soon reach a point where electrical interconnects are inadequate because of fundamental physical limitations such as the skin effect, crosstalk, and transmission line effects.
However, because of computer architecture design, optical interconnects are still not widely implemented inside the box of modern computers. Owing to the physics of DRAM and hard drive storage mechanisms and technology, the data rates that these storage modules can sustain are low compared with that of the CPU. Multi-channel interleaved memory technology can provide a higher data rate than a single memory module. Similarly, a RAID (Redundant Array of Inexpensive Disks) system can provide a higher data transfer rate to hard drives than a normal motherboard hard drive interconnect. The data rate enhancement comes from the fact that multiple storage modules are used in the system: low speed parallel data from multiple modules are multiplexed by central switches into high speed serial data, and therefore the low speed storage components can keep up with the high CPU speed.
Electrical interconnects cannot achieve a high data rate with a bus architecture, so point-to-point interconnects have been the main choice for high speed systems. As a result, wiring congestion is a concern for multi-channel interleaved technology. In contrast, an optical backplane permits different daughter boards to share a transmission channel while maintaining a high data rate, and therefore does not suffer from wiring congestion. With a bus architecture, system security and reliability are also improved since data can be distributed across different modules. We first successfully demonstrated an optical backplane bus using volume holographic gratings [2] with a data rate of 1.25 Gbps per channel. In this type of optical backplane, the bus architecture is retained to fulfill the tasks of broadcasting and transmission channel sharing, while the potential data rate is comparable to or even higher than that of an electrical point-to-point backplane, as long as high-speed electro-optical transceivers are available. The diffraction efficiency of the volume holographic grating at each fan-out was controlled so that the power delivered to each board was equalized [3]. Therefore, the need for photodetectors with a large dynamic range is greatly reduced.
To build a Bit-Interleaved Optical Bus (BIOB), alignment tolerance and crosstalk are critical issues that should be investigated first. Research has been carried out to understand the alignment tolerance in free-space optical interconnect systems [4], [5]. However, these efforts mostly focused on optical point-to-point interconnects. We describe the theoretical calculation of alignment tolerance and discuss the effect of volume holographic gratings on crosstalk suppression in Section 2, where experimental data are presented as supporting evidence. In Section 3, the bit-interleaved architecture is illustrated, experimental results are demonstrated, and potential limiting factors for the data rate are discussed. Finally, a summary is given in Section 4.
Fig. 1 illustrates the architectural concept of the optical backplane bus based on Volume Holographic Gratings (VHG). A distributor board is inserted into the central slot (#A), with daughter boards on both sides. The receivers on distributor board A collect the signal from any daughter board, and then broadcast the detected signal, or a signal from the CPU, to all the daughter boards. The electro-optical interface modules, including vertical cavity surface emitting lasers (VCSELs), photodetectors, and OE/EO driver circuits, are located over a glass substrate, which is the optical interconnect layer with VHGs on the top surface so that light can be guided into and out of the substrate.
Crosstalk and Alignment Tolerance Issue in Multi-Channel Optical Backplane Bus System Using Volume Holographic Gratings
As the first step in building a multi-channel optical backplane bus, we consider a system with only one distributor in the center and one daughter board on each side. To describe the alignment tolerance, we estimate the position range and orientation range within which a detector can receive enough optical power at a given bit error rate. In order to investigate crosstalk in an alignment tolerant system, a VCSEL with a large divergence angle is used so that the output beam can cover a larger area.
To use the limited space of the glass substrate effectively and minimize crosstalk, the VCSELs and detectors are interlaced as illustrated in Fig. 2. Detector A has a nearest detector B and second nearest detectors C and D. For the crosstalk analysis, the power that detector A receives from the VCSEL on another board designated for channel A is calculated and denoted $P_{A\to A}$. A portion of the optical signals emitted by the VCSELs on the other board designated for channels B, C, or D may also reach detector A; these are denoted $P_{B\to A}$, $P_{C\to A}$, and $P_{D\to A}$, respectively. Although a higher power at the detector is desired for a higher data rate, the overall system performance will not improve if the crosstalk from the adjacent channels increases as a consequence. Therefore, normalized power is used to describe crosstalk; e.g., $C_{B\to A} = P_{B\to A} / P_{A\to A}$ denotes the crosstalk from B to A. Achieving reduced crosstalk together with a higher power transmitter is desirable for a system with a higher data rate.
A simple way to reduce the crosstalk among adjacent channels is to apply polarizing films to the optical layer. For example, the polarizer used for the whole column in which detector B is located should have a polarization perpendicular to that of the columns in which detectors A, C, and D are located, because B is the nearest neighbor of A. If crosstalk from B is suppressed below the desired level, the channel spacing can then be reduced. Like the polarizer film, the volume hologram film also functions as a filter, so an accurate calculation of crosstalk needs to take the hologram diffraction efficiency into account. When light emitted from the VCSEL designated for detector A reaches the receiver slot, the beam can cover a range larger than the detector lens area because of beam divergence. Since the beam diverges, the incident angle θ at the receiver hologram varies over that range, as depicted in Fig. 2.
Therefore, the incident angle of the beam will have a 2.1° range for diffraction efficiency above -3 dB. The calculations of crosstalk with the hologram were based on Kogelnik's theory [6]. Fig. 3(a) shows the ideal hologram efficiency as a function of the beam incident angle. As a result, in a diverging beam, light with an incident angle away from the ideal condition experiences more loss. In the theoretical calculations, we assumed a VCSEL beam divergence angle of around 2°, a detector dome lens radius of around 2.3 mm, a hologram thickness $d$ from 20 µm to 200 µm, and an index modulation depth $\Delta n$ between 0.001 and 0.01. These parameters were chosen to describe realistic conditions. In Fig. 3(b), the output power density along the x axis is compared for a Gaussian beam with no hologram and for a Gaussian beam with the hologram filtering effect. The ratio of fan-out power along the x axis is defined as the beam fan-out function $f(r_{x,y})$. The beam waist is reduced along the x axis only, because this is the beam propagation direction, which is determined by the orientation of the hologram. Therefore, the fan-out beam profile becomes elliptical, as depicted in Fig. 3(c). In Eq. (2), $f(r_{x,y})$ is the beam fan-out function. In a real system, a receiver may have 4 or more nearest channels, so the crosstalk value should be multiplied by a factor of 4 or more.
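The angular filtering described above can be illustrated with a minimal sketch of Kogelnik's coupled-wave result for a lossless transmission phase grating. The sketch assumes an unslanted grating, a fixed coupling strength ν = π/2 (100% peak efficiency), and a grating period of 0.5 µm; none of these values come from the measurements reported here, and the function names are hypothetical. It is meant only to show that, under these assumptions, a thicker hologram accepts a narrower range of incident angles.

```python
import math

def kogelnik_efficiency(nu, xi):
    """Kogelnik efficiency of a lossless transmission phase grating.

    nu: coupling strength, pi * delta_n * d / (wavelength * cos(theta))
        for an unslanted grating.
    xi: dephasing parameter (zero at exact Bragg incidence).
    """
    root = math.sqrt(nu ** 2 + xi ** 2)
    return math.sin(root) ** 2 / (1.0 + (xi / nu) ** 2)

def dephasing(delta_theta, thickness_um, period_um):
    """Dephasing for a small angular detuning (radians) from the Bragg angle.

    Assumes an unslanted transmission grating, for which
    xi = pi * d * delta_theta / period.
    """
    return math.pi * thickness_um * delta_theta / period_um

def half_power_half_width(nu, thickness_um, period_um, step=1e-5, max_angle=0.5):
    """Scan the detuning until the efficiency falls below half of its peak."""
    peak = kogelnik_efficiency(nu, 0.0)
    delta_theta = 0.0
    while delta_theta < max_angle:
        xi = dephasing(delta_theta, thickness_um, period_um)
        if kogelnik_efficiency(nu, xi) < 0.5 * peak:
            return delta_theta
        delta_theta += step
    return max_angle

# Illustrative (assumed) values: nu fixed at pi/2, grating period 0.5 um.
NU = math.pi / 2
PERIOD_UM = 0.5
for d_um in (20.0, 200.0):
    width_deg = math.degrees(half_power_half_width(NU, d_um, PERIOD_UM))
    print(f"d = {d_um:5.0f} um: -3 dB half-width ~ {width_deg:.3f} deg")
```

Because the dephasing parameter scales linearly with thickness in this model, the -3 dB angular width shrinks roughly in inverse proportion to the hologram thickness.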
Final results of some crosstalk calculations are listed in Table 1. By choosing different hologram parameters, the crosstalk $C_{B\to A}$ can be reduced by 8 to 20 dB. Therefore, we can expect the channel density to improve by more than 70% if we reduce the channel spacing in the x dimension from $\sqrt{3}P$ to $P$ or less, where $P$ is the component spacing in the x dimension. The calculation shows that the thicker the hologram, the better the crosstalk suppression, because the output beam profile becomes more elliptical. However, if we consider the designated channel loss caused by beam reshaping, a thicker hologram may not always be a good choice, since a higher power VCSEL is needed to compensate for the higher loss. A system merit function can be defined as
to describe the impact of designated channel loss, in which $\Delta S_{nearest}$ is the crosstalk suppression from the nearest channels, $\Delta L_{designated}$ is the designated channel loss, and α is an economic factor to be determined by the system implementation. For the general case α = 1, the system merit function is plotted against hologram thickness in Fig. 5 to show the impact of designated channel loss. The system merit oscillates strongly at small thickness, so a smoothed curve is also plotted to illustrate the lower profile.
We measured the alignment tolerance for a system consisting of VCSELs with 2° beam divergence. Fig. 6(a) compares theoretically the power that a detector receives as it is moved around, with and without the hologram, and Fig. 6(b) gives the experimental result. Since the detector has a lens, the power vs. position plot has a flat top for small displacements. For large displacements, we obtain an additional loss of 8 to 9 dB with the hologram. This crosstalk suppression for the nearest channels is desirable because it reduces the channel pitch and supports a higher data density. The 3-dB incident angle bandwidth of the transmitter is measured to be about 2.3° FWHM, which matches the theoretical calculation if beam divergence is considered.
The hologram itself provides a wavelength bandwidth above 20 nm, which is equivalent to a frequency bandwidth in the terahertz range and has been experimentally verified [7]. Assuming that a state-of-the-art highest speed transmitter is used, the data rate is then limited by the optical power that can be delivered to the detector and by the detector active area. In practice, higher data rate transmission can be achieved using a high power transmitter and/or a better collimated transmission channel. Transmission at 10 Gb/s was tested using a collimator with a beam divergence angle of only 0.5°. A low insertion loss is preferred for a high speed system and was measured to be 2.7 dB. The bit error rate was estimated to be below $10^{-9}$ with a power budget above 5 dB. The eye diagram in Fig. 7 gives an indication of the Q factor. A signal density of 40 Gbps/cm² was achieved and was limited by the outer diameter of the VCSEL package, which is 5 mm. In the future, however, if smaller collimators can be integrated with extremely small form factor 10 Gb/s VCSEL and detector arrays, the signal density can be much higher.
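The two link-level figures quoted above can be cross-checked with a short sketch. It assumes the standard Gaussian-noise relation between Q factor and bit error rate, BER = ½·erfc(Q/√2), and a square channel grid whose pitch equals the 5 mm package diameter; both assumptions are ours for illustration, and the function names are hypothetical.

```python
import math

def ber_from_q(q):
    """Bit error rate for a given Q factor, assuming Gaussian noise."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))

def signal_density_gbps_per_cm2(rate_gbps, pitch_mm):
    """Aggregate data rate per square centimeter.

    Assumes channels on a square grid with the given pitch.
    """
    channels_per_cm2 = (10.0 / pitch_mm) ** 2  # 10 mm per cm
    return rate_gbps * channels_per_cm2

# Q = 6 is the usual threshold quoted for a BER of about 1e-9.
print(f"BER at Q = 6: {ber_from_q(6.0):.2e}")

# 10 Gb/s channels on a 5 mm pitch: 4 channels per cm^2.
print(f"Density: {signal_density_gbps_per_cm2(10.0, 5.0):.0f} Gbps/cm^2")
```

Under these assumptions, a BER below $10^{-9}$ corresponds to a Q factor of about 6 or more, and the 5 mm package pitch reproduces the reported 40 Gbps/cm².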
Bit-Interleaved Optical Bus Architecture
Dual-channel interleaved memory access technology has been applied in the PC industry to improve the memory-to-CPU data rate. For example, in a system with an 800 MHz front side bus, the chip set multiplexes data from dual 400 MHz DDR memory modules to keep up with the CPU front side bus. Similarly, if the data from the memory modules are converted into optical signals, the system can be called an Interleaved Optical Bus. The difference is that, in the optical bus scenario, there is no chip set to multiplex the signals from the memory modules and supply the multiplexed signal to the CPU. Instead, distributed multiplexers generate electrical pulses and apply them to optoelectronic transmitters. Optical pulses from each board are delayed by a specific time according to the board location and concatenated into a serial data stream. In this way, slow memory modules can provide an information stream at a much higher data rate, so a fast CPU does not have to wait or slow down.
Fig. 8 illustrates the BIOB architecture. In the demonstration, a 2.5 Gbps electrical multiplexer chip was assembled on the board to sharpen the signals from the memory modules into 400 ps long electrical pulses. The electrical pulses then experience different delays through electrical delay chips with 2.5 Gbps throughput, and the delayed electrical signals are applied to the VCSEL drivers. As long as the output laser light from different VCSELs does not interfere significantly, the powers of the optical pulses add together. The concatenated optical output signal is displayed in Fig. 9, showing the 2.5 Gbps data rate.
In a BIOB system, the potential limiting factors for the data rate come from different components: the signal multiplexing chip, the signal delaying chip, and the dispersion of the optical layer. To generate short electrical pulses, 10 Gbps multiplexing can be realized using current silicon semiconductor chips. Shorter electrical pulses, 10 ps to 20 ps long, can be produced using GaAs or InP circuits.
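The pulse concatenation used in the BIOB demonstration can be sketched as a simple timing model. The sketch assumes four daughter boards (consistent with each board holding a quarter of the bits of a byte) and one 400 ps slot per bit; the function names and the example bit patterns are hypothetical and serve only to illustrate the scheduling.

```python
def interleave(board_bits, slot_ps):
    """Merge per-board bit sequences into one serial stream.

    board_bits: one bit list per board, all of equal length.
    slot_ps: duration of one bit slot in picoseconds.
    Board k is delayed by k slots within each round, so the pulses
    from different boards land in consecutive slots.
    Returns a list of (time_ps, bit) tuples in transmission order.
    """
    n_boards = len(board_bits)
    n_rounds = len(board_bits[0])
    stream = []
    for round_index in range(n_rounds):
        for board_index, bits in enumerate(board_bits):
            slot = round_index * n_boards + board_index
            stream.append((slot * slot_ps, bits[round_index]))
    return stream

def aggregate_rate_gbps(slot_ps):
    """Serial data rate when every slot carries one bit."""
    return 1000.0 / slot_ps

# Four assumed boards, two rounds each.
boards = [[1, 0], [0, 1], [1, 1], [0, 0]]
for time_ps, bit in interleave(boards, slot_ps=400):
    print(f"t = {time_ps:5d} ps  bit = {bit}")

rate = aggregate_rate_gbps(400)  # 1000 / 400 ps = 2.5 Gbps
print(f"aggregate: {rate} Gbps, per board: {rate / len(boards)} Gbps")
```

With four boards sharing the bus, each board only needs to source one bit every four slots, which is how slow modules can jointly sustain the serial rate.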
The digital delay circuit used in the demonstration can tune the delay with an accuracy of 10 ps. This limits the deterministic jitter to within 10 ps, which can be tolerated in a 2.5 Gbps system. A 10 Gbps system might require the deterministic jitter to be less than 10 ps. Optical delay might therefore be more useful for higher speed systems, since passive optical delay lines can produce delays on the order of 1 ps. Dispersion of the optical layer was found not to be an issue, since the bandwidth can be as high as 2.5 THz.
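The jitter argument above can be made concrete by expressing the 10 ps delay accuracy as a fraction of the bit period. This is a minimal sketch; the helper name is hypothetical, and what fraction is acceptable depends on the receiver, which is not specified here.

```python
def jitter_fraction(jitter_ps, rate_gbps):
    """Deterministic jitter as a fraction of one bit period (unit interval)."""
    bit_period_ps = 1000.0 / rate_gbps
    return jitter_ps / bit_period_ps

for rate_gbps in (2.5, 10.0):
    fraction = jitter_fraction(10.0, rate_gbps)
    print(f"{rate_gbps} Gbps: 10 ps is {fraction:.1%} of the bit period")
```

The same 10 ps step takes up four times as much of the bit period at 10 Gbps as at 2.5 Gbps, which is why a finer delay mechanism becomes attractive at higher rates.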
Security is also a concern in modern computing systems, since different parts may be fabricated by different companies. When confidential data are stored separately across multiple storage devices, any one memory module or hard drive holds only partial information. Alternatively, hard drives can work as an array to provide redundant storage and enhance reliability. Therefore, the bit-interleaved approach not only enhances system speed but also improves security and reliability.
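The distributed storage idea can be sketched as follows, assuming four boards and a round-robin assignment in which board k holds bits k and k+4 of every byte. The actual bit-to-board mapping is not specified in this paper, so the mapping and the function names are illustrative assumptions.

```python
def split_byte(value, n_boards=4):
    """Distribute the 8 bits of a byte round-robin over the boards.

    Bit i goes to board i % n_boards (assumed mapping), so with four
    boards each share holds two bits of the original byte.
    """
    shares = [0] * n_boards
    for bit in range(8):
        bit_value = (value >> bit) & 1
        shares[bit % n_boards] |= bit_value << (bit // n_boards)
    return shares

def merge_shares(shares):
    """Rebuild the byte by reading the boards in interleaved order."""
    n_boards = len(shares)
    value = 0
    for bit in range(8):
        bit_value = (shares[bit % n_boards] >> (bit // n_boards)) & 1
        value |= bit_value << bit
    return value

secret = 0b10110001
shares = split_byte(secret)
print("per-board shares:", shares)
print("recovered:", bin(merge_shares(shares)))
```

Each share is a 2-bit value, so a single board on its own reveals only a quarter of each byte; all four shares are needed to recover the data.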
Conclusion
In this work, we used holograms and other optical filters to suppress crosstalk and verified the effect through both theoretical calculation and experiment. Higher power VCSELs are preferred for better alignment tolerance. In general, a hologram can suppress crosstalk from an adjacent channel by 8 to 20 dB. Although higher crosstalk suppression can be achieved with a thicker hologram, the penalty is that a higher power laser is required. We introduced the concept of a system merit function that accounts for the penalty of using a very thick hologram. Transmission at a data rate of 10 Gbps was tested with a better confined beam using collimators. A signal density of 40 Gbps/cm² was achieved and is mainly limited by the OE/EO component speed and power. A Bit-Interleaved Optical Bus was designed and implemented to allow slow storage devices to support the high data rate requirement of the CPU while also providing better security. Transmission of an optically interleaved 2.5 Gbps data channel was demonstrated.
This work is supported by DARPA and AFOSR.
