Abstract-High-speed optical data links enable local area networks (LANs) that operate at data rates above 10 Gb/s. Various network, protocol and switch architectures have been proposed that use these links. The optical network interface card (ONIC) is an important component for demonstrating efficient application of these architectures. In this paper, we describe the design of a programmable ONIC that interfaces a 12-channel gigabit parallel optical link module with a 64-bit/66-MHz PCI computer bus. Hardware programmability (using FPGAs) enables the ONIC to efficiently implement different communication protocols. For hardware testing, the ONIC hardware was programmed for bit error rate (BER) analysis. In continuous operation at 8 Gb/s for 30 days through a 1-m fiber, no errors occurred. For application testing, a custom ONIC software driver was developed. We used this driver to demonstrate message passing between applications running on two ONIC-equipped servers. The ONIC design provides a low-cost solution that can be readily adapted for application- and device-specific requirements. The use of the ONIC in a free-space optical switch system is also described.
I. INTRODUCTION
VARIOUS very short-reach (VSR) optical data links that operate at data rates of 10 Gb/s and beyond are now becoming available as commercial products [1]-[3]. Various network, protocol and switch architectures that utilize these links have been proposed [4]-[6]. An example network architecture, shown in Fig. 1, uses VSR optical data links to interconnect multiple compute nodes through a central switch. In order to efficiently utilize the increased bandwidth capability of VSR optical data links, these network architectures use new communication protocols rather than relying on existing standards such as ATM, Ethernet, or HIPPI [7]-[9]. The optical network interface card (ONIC) is an important instrument for demonstrating efficient application of these new architectures. The purpose of the ONIC is to interface a standard computing node, such as a workstation or an embedded processor, with a VSR optical data link. On the hardware front, the ONIC converts slow, wide-parallel and clock-synchronous data streams (typically used in chip-level computer interconnection) to narrow gigabit-speed data streams with embedded clock information (typically used in VSR optical data links). The ONIC hardware often contains first-in first-out (FIFO) memory storage to buffer incoming and outgoing network data for flow control. On the software front, the ONIC includes software drivers to provide communication and flow control between the hardware, the application running on the computing node, and the custom network protocol used by the network that is being demonstrated. In this paper, we describe the design of a programmable ONIC that interfaces a 12-channel, gigabit parallel optical link module [10], [11] with a 64-bit/66-MHz PCI bus [12]. The adoption of the PCI bus allows our ONIC design to be used in widely available PCI-based workstation and server computers. Hardware programmability is achieved using field-programmable gate array (FPGA) integrated circuits. This enables our ONIC design to efficiently implement different network protocols. Although the current ONIC design uses specific VSR optical data link hardware, it can be readily modified to support other types of optical data links. Our ONIC design was originally developed to demonstrate a specific network architecture that used free-space optical interconnection inside the switch fabric [4]. However, the novelty of our design is that it provides a low-cost network interface solution that can be readily modified by other researchers for network-protocol- and optical-device-specific requirements. The following paragraphs compare our approach with existing ONIC implementations and provide justification for our approach.
Commercially available ONICs use custom integrated circuits and/or network processor chips to implement specific network protocols [13], [14]. They cannot keep up with the 10-100 Gb/s data rates available with VSR optical data links. In addition, commercial ONICs use proprietary designs and cannot be modified to use new VSR optical link hardware. These attributes make commercial ONICs unusable for experimenting with new network architectures and VSR optical data links. On the other hand, an ONIC has been previously demonstrated that used custom-made integrated circuits to implement a specific network protocol [15]. While this approach successfully demonstrated the new network protocol proposed by its authors, it is difficult to modify this design because of the high cost and extensive knowledge required for the design of custom integrated circuits.
References [5] and [16] describe a network architecture demonstration that used an ONIC design similar to the one proposed in this paper. That ONIC design also used FPGAs and SERDES that can be reconfigured to support various network protocols. However, it employed a proprietary memory-bus interface to connect the optical link with the computing node. While a memory-bus interface permits higher bandwidth communication between the processor and the optical link, our PCI-based ONIC supports a broad range of computing hardware from a multitude of computer manufacturers. Future modifications of our ONIC design can use the emerging PCI bus extensions [17] to achieve higher communication bandwidth than is possible with the existing 64-bit/66-MHz PCI standard. Finally, we have done more work at the software driver level to demonstrate application-level message passing between computing nodes interconnected using ONICs.
We have built several ONIC prototypes and tested them for continuous error-free data transmission through the VSR optical data links. A software driver was developed and used to experimentally demonstrate application-level message passing between two server computers, each equipped with an ONIC, that were connected with 1 m of parallel fiber ribbon cable. As previously described, our motivation for designing the ONIC was to support the demonstration of a network architecture that uses free-space optical interconnects inside the switch fabric. The custom network protocol used in that architecture and its mapping onto the ONIC hardware are presented in this paper. All these results can be used as a starting point for researchers who want to develop their own network-protocol- and optical-device-specific ONIC.
The remainder of this paper is organized as follows. Section II describes the ONIC architecture and design. Section III describes the experimental tests performed with prototype ONICs and the results. Section IV describes the use of the ONIC in a new network architecture. Finally, Section V provides concluding remarks.
II. ONIC ARCHITECTURE AND DESIGN
The ONIC architecture is shown in Fig. 2. It is a PCI-based network interface card (NIC). It has a high-end FPGA that can be programmed with the network protocols. The network interface of the ONIC consists of two 12-channel gigabit parallel optical link modules, one to transmit and one to receive data. The FPGA passes parallel data between the compute server and the optical modules through a set of SERDES. The SERDES convert the parallel data to high-speed serial data suitable for the optical transmit module and recover the data and clock from the optical receive module. The following paragraphs describe the functions of the chips on the ONIC, the hardware programming steps, and the physical layout of the printed circuit board (PCB).
The schematic of the ONIC is shown in Fig. 3. The communication between the ONIC and the compute server is through the fast-wide 64-bit/66-MHz PCI interface. The theoretical maximum bandwidth of the bidirectional PCI interface is 4.22 Gb/s half-duplex. The FPGA is a high-performance VIRTEX XCV1000 from Xilinx with a capacity of more than one million system gates. The whole network protocol can be implemented on the FPGA, which gives great flexibility in modifying it. Part of the FPGA resources on the ONIC is devoted to Xilinx's PCI core.
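The 4.22 Gb/s figure follows directly from the bus width and clock rate. A minimal Python sketch of that arithmetic is given below; the function name is illustrative only, and the calculation assumes one 64-bit transfer per clock with no protocol overhead.

```python
def pci_peak_gbps(bus_width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak bus throughput in Gb/s (1 Gb = 1e9 bits).

    Assumes one full-width transfer per clock cycle and ignores
    arbitration, addressing phases, and wait states.
    """
    return bus_width_bits * clock_mhz * 1e6 / 1e9


peak = pci_peak_gbps(64, 66)   # 64-bit/66-MHz PCI
print(f"{peak:.3f} Gb/s")      # 4.224 Gb/s
```

Because the PCI bus is shared between reads and writes, this peak applies to the sum of traffic in both directions, which is why the text describes it as half-duplex.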
The data from the FPGA goes to three AMCC S2065 quad-channel serial backplane SERDES. Each SERDES has 32-bit low-speed input and output channels and four high-speed differential inputs and outputs that can operate between 0.7 and 1.3 Gb/s. Data passes between the compute server and the SERDES through the FPGA over the 32 data lines, each running between 70 and 130 Mb/s. The SERDES on the transmit side performs 8 b/10 b encoding of the data. It generates K28.5 synchronization characters to establish communication with the destination node.
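The relationship between the serial line rate and the parallel-side rate quoted above can be sketched as follows. This is an illustrative calculation only: it assumes that each high-speed channel carries one 8-bit byte per 10-bit 8 b/10 b code group, and the function names are not taken from any SERDES documentation.

```python
DATA_BITS = 8                # payload bits per 8b/10b code group
CODE_BITS = 10               # line bits per 8b/10b code group
CHANNELS_PER_SERDES = 4      # quad-channel device, per the text


def parallel_rate_mbps(line_rate_gbps: float) -> float:
    """Rate of each parallel data line: one bit per 10 serial line bits."""
    return line_rate_gbps * 1e3 / CODE_BITS


def payload_gbps(line_rate_gbps: float,
                 channels: int = CHANNELS_PER_SERDES) -> float:
    """User data rate across `channels` serial channels after 8b/10b coding."""
    return line_rate_gbps * channels * DATA_BITS / CODE_BITS


for rate in (0.7, 1.0, 1.3):
    print(rate, round(parallel_rate_mbps(rate), 1), round(payload_gbps(rate), 2))
```

Under these assumptions, the 0.7 to 1.3 Gb/s serial range maps to the 70 to 130 Mb/s range on each of the 32 parallel lines.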
The high-speed differential signals from the SERDES connect to the 12-channel gigabit parallel optical link driver and receiver modules from Honeywell Technology Center. Each module couples to a 12-channel parallel fiber link. Each channel of this link has been demonstrated at 1.06 Gb/s, giving the composite parallel link a full-duplex bandwidth of 24 Gb/s. With the 8 b/10 b encoding/decoding of the data done by the SERDES at the transmitting/receiving ONICs to maintain signal integrity, the real full-duplex data bandwidth is 24 Gb/s for each compute server. The data from the optical receive module is sent to the SERDES. The SERDES performs data decoding, clock recovery, and synchronization.
The design flow for the FPGA on the ONIC starts with a description of the protocol functional requirements. The functionality is then modeled using the very high speed integrated circuits (VHSIC) hardware description language (VHDL) [18]. The design is further converted to register-transfer level (RTL) code. A behavioral simulation of the VHDL model is performed to validate functionality. On completion of the simulation, the VHDL model is synthesized to a netlist using tools from Synopsys. During synthesis, the VHDL model is targeted toward the FPGA on the ONIC. This determines the resources that will be used on the FPGA and the timing. Once a netlist is generated, it is simulated to verify functionality. Fig. 4 shows the steps involved in programming the FPGA with the protocols and IP core. Upon validation, a bit file is generated for programming the FPGA. The bit file is loaded into a programmable read-only memory (PROM) on the ONIC through a Joint Test Action Group (JTAG) interface [19]. Once the PROM is loaded with the bit file, it can be used to program the FPGA multiple times with the same functionality. When new functionality needs to be implemented, the steps described above are repeated and the PROM is loaded with the new bit file.
The ONIC was fabricated using a standard copper and FR-4 PCB fabrication process. The PCB has eight layers of routing with full and split power planes. Fig. 5 shows a picture of the fabricated ONIC hardware. The ONIC draws more power than the PCI bus can sustain; the SERDES alone draw close to 10 W at peak performance. Hence, the ONIC is powered directly from the compute server power supply rather than through the PCI bus. This isolation of power from the PCI bus ensures reliable power from the compute server without affecting the ability to add other system components on the PCI bus. On-board dc/dc converters are used to regulate the power to various components on the ONIC.
The parallel optical driver and receiver modules are located on opposite sides of the ONIC. The transmit module is shown in Fig. 5. The receiver module is attached on the other side of the ONIC, and its fiber link comes through a cutout in the card. Differential transmission lines run from the SERDES to the optical transmit and receive modules. These lines are 50-Ω impedance matched and are routed only on the outer layer of the ONIC to avoid multiple vias. Fig. 6 is a snapshot of the high-speed traces on the ONIC.
III. TEST RESULTS
The ONIC hardware was tested in two phases: a link integrity test for bit-error-rate (BER) analysis, followed by a message passing application. In both tests, the transmitter sends synchronization characters followed by a digital signature that marks the starting point of the data, and then the real data.
A. Link Integrity Test
The link integrity test was performed on the ONIC hardware for BER analysis. Initially, an ONIC loop-back test was performed using a single ONIC. The parallel fiber ribbon from the transmitting optical module was looped back to the optical receiver module. The fiber ribbon was 1 m in length. The ONIC was plugged into the PCI bus of a compute server.
The FPGA of the ONIC was programmed with the test protocols and the PCI core. A 32-bit linear feedback shift register (LFSR) was used to generate a pseudorandom bit stream (PRBS). All three SERDES on the ONIC are controlled by the FPGA. Two of the three SERDES were run in sync with each other; the third SERDES was not used in the test. The K28.5 character for data synchronization between the transmitter and the receiver was sent on all four channels of each SERDES. After sending the K28.5 characters, an 8-bit digital signature was sent. This was followed by the same PRBS on both SERDES, delayed from one another by a clock cycle. This data is encoded by the SERDES and sent to the optical modules. The data is then sent through the fiber ribbon and is received by the receiver module. Fig. 7 shows the steps involved in the link integrity test.
The data goes to the receiver optical module, where it is converted to electrical signals and sent to the SERDES. The SERDES waits for the K28.5 characters and, when it recognizes them, synchronizes onto the incoming data. The SERDES decodes the data, recovers the clock from the data, and sends both to the FPGA. The FPGA establishes the word boundary in the data stream from the K28.5 characters. It then waits for the digital signature. As soon as the FPGA sees the digital signature, it starts an LFSR with the same seed as the transmit LFSR. It compares the incoming data with its LFSR data. A status bit is asserted when an error occurs. An error counter keeps track of the errors from each SERDES. The FPGA also reports an error on loss of synchronization in any of the SERDES. Fig. 8 shows a scope snapshot of the data through one of the high-speed channels at 1 Gb/s. The error count is communicated back to the compute server through the PCI interface and is continuously updated. The test was run for 30 days and we encountered no errors. The total bandwidth of optical data communication in this test was 8 Gb/s. The same test was repeated with two ONICs plugged into two compute servers. One ONIC acted as the transmitter and the second ONIC as the receiver. The protocols were separated: the transmitting ONIC was programmed with the PCI core and the transmission protocol, and the receiving ONIC was programmed with the PCI core and the receiver protocol. The length of the fiber ribbon used in this test was 1 m. The test was run for 2 h without any errors.
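The LFSR-based check described above can be sketched in Python as follows. This is a simplified software model under stated assumptions, not the VHDL used on the ONIC: the feedback polynomial, the seed, and the signature value are not given in the text and are chosen here for illustration, and the tagged-tuple stream stands in for the SERDES outputs. The last function estimates what a zero-error run implies, using the standard bound -ln(1 - confidence) / N for N error-free bits.

```python
import math
from typing import Iterator, List, Tuple

TAPS = 0x80200003    # assumed polynomial x^32 + x^22 + x^2 + x + 1 (not from the text)
SEED = 0xACE1ACE1    # assumed nonzero seed, identical at both ends
SIGNATURE = 0xA5     # hypothetical 8-bit start-of-data signature
K28_5 = 0xBC         # byte value of the K28.5 character before 8b/10b coding


def lfsr_words(seed: int, taps: int = TAPS) -> Iterator[int]:
    """Yield successive 32-bit states of a Galois LFSR."""
    state = seed & 0xFFFFFFFF
    if state == 0:
        raise ValueError("seed must be nonzero")
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= taps
        yield state


def transmit(n_sync: int, n_words: int) -> List[Tuple[str, int]]:
    """Sync characters, then the signature, then the pseudorandom words."""
    gen = lfsr_words(SEED)
    stream = [("sync", K28_5)] * n_sync + [("sig", SIGNATURE)]
    stream += [("data", next(gen)) for _ in range(n_words)]
    return stream


def check(stream: List[Tuple[str, int]]) -> int:
    """Count mismatching words received after the signature."""
    errors = 0
    ref = None
    for kind, value in stream:
        if ref is None:
            # Wait for the signature, then start the local reference LFSR.
            if kind == "sig" and value == SIGNATURE:
                ref = lfsr_words(SEED)
            continue
        if value != next(ref):
            errors += 1
    return errors


def ber_upper_bound(bits: float, confidence: float = 0.95) -> float:
    """Upper bound on BER after `bits` error-free bits at the given confidence."""
    return -math.log(1.0 - confidence) / bits


bits_30_days = 8e9 * 30 * 24 * 3600   # 8 Gb/s sustained for 30 days
print(check(transmit(4, 1000)), ber_upper_bound(bits_30_days))
```

Taking the 8 Gb/s figure at face value, 30 error-free days correspond to about 2.07e16 bits, so the bound at 95% confidence is roughly 1.4e-16.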
B. Message Passing Application
A message passing demonstration was performed to test application-level communication between two ONICs plugged into the PCI buses of two compute servers. The ONICs were connected to each other through a 12-channel fiber link. Fig. 9 shows a picture of the two servers with the ONIC hardware.
A custom software driver was written for the application to communicate with the PCI core on the ONIC FPGA through the PCI bus. The device driver was developed for Windows NT 4.0. The server uses the driver function calls to communicate with the ONIC.
The application itself was simply the transmission of characters typed on the keyboard of one compute server to another. To start the transmission, the compute server signals the ONIC to establish communication. The ONIC, in turn, sends K28.5 characters to establish the link and continues to send K28.5 characters to maintain the link. Each keystroke is recognized by the driver, assigned a unique address (even if the character is repeated), and sent to the ONIC through the PCI interface. The FPGA on the ONIC is programmed to recognize each character by its unique address. It sends the character repeatedly to the SERDES until the next character with a different address is received for transmission. The SERDES 8 b/10 b encodes the data and sends it over the fiber link to the second compute server. Since the data bandwidth is very small compared to the bandwidth available, only one SERDES was used in this test. Fig. 10 shows the steps involved in the message passing application test.
The SERDES on the ONIC of the receiving compute server 8 b/10 b decodes the received data. It recognizes the K28.5 characters sent to it and establishes a word boundary. The FPGA then waits for the data. As soon as it receives a new character with a unique address, the character is sent to the compute server. The ONIC continues to monitor the received data. It forwards the next character only when its address is different from that of the previous character received. Both the sent and the received messages were displayed on the respective display screens of the compute servers. With only one SERDES used in the test, the application was run at 4-Gb/s bandwidth using only four channels in the 12-channel fiber ribbon.
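The address-tagging scheme used in this demonstration can be modeled in a few lines of Python. The sketch below is a simplified illustration under assumptions: the (address, character) tuple layout and the fixed repeat count are hypothetical, and an unbounded counter replaces whatever finite address width the hardware uses.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Word = Tuple[int, str]   # (address tag, character); layout is hypothetical


def sender(keys: str, repeats: int = 3) -> Iterator[Word]:
    """Tag each keystroke with a fresh address and repeat it on the link."""
    for addr, ch in enumerate(keys):
        # A fixed repeat count stands in for "repeat until the next key arrives".
        for _ in range(repeats):
            yield (addr, ch)


def receiver(link: Iterable[Word]) -> str:
    """Forward a character only when its address differs from the last one seen."""
    last_addr: Optional[int] = None
    out: List[str] = []
    for addr, ch in link:
        if addr != last_addr:
            out.append(ch)
            last_addr = addr
    return "".join(out)


print(receiver(sender("hello")))   # hello
```

The example string contains a repeated letter on purpose: without a distinct address per keystroke, the receiver could not tell a second "l" from a retransmission of the first.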
IV. APPLICATION OF ONIC
VIVACE stands for Vertical-cavity surface-emitting lasers (VCSELs)-based Interconnects in VLSI Architectures for Computational Enhancement [20]. It is a Defense Advanced Research Projects Agency (DARPA)-funded Very Large Scale Integrated circuits (VLSI) Photonics program. The program is aimed at building a high-bisection-bandwidth, free-space optically interconnected switch and demonstrating it in a system of multiple compute servers in close physical proximity running a Message Passing Interface (MPI)-based distributed algorithm.
The VIVACE demonstrator system will consist of eight compute servers interconnected by a high-speed free-space optically interconnected (FSOI) switch fabric [4]. The system, however, can be scaled up to 128 nodes using the same design. Fig. 11 shows the VIVACE system concept. The heart of the VIVACE system is the FSOI switch built on a multichip module (MCM) that will consist of a 4 × 4 array of VLSI CMOS chips heterogeneously integrated with large-scale 2-D VCSEL/photodetector (PD) arrays. A prototype FSOI switch fabric was built and demonstrated as part of the Free-space Accelerator for Switching Terabit Networks (FAST-Net) system [21]. Fig. 12 shows the FAST-Net switch assembly.
The switch takes advantage of the tremendous bandwidth available through free-space for communication and utilizes high-density parallel gigabit optical links to communicate with the switch I/O ports. The capacity of the switch prototype is up to 128 ports. The bidirectional throughput of each port is from 2.12 Gb/s up to 12 Gb/s. The 128-port switch requires 16 2-D smart-pixel chips (each a 32 × 32 array of VCSEL/photodetector pairs) for a total switch matrix size of 128 × 128. This amounts to a total bisection bandwidth of 1.69 Tb/s for the free-space interconnected switch.
Each of the compute servers in the VIVACE system will be interconnected with the FSOI switch through the ONIC using the parallel gigabit optical links. Custom protocols are being developed to communicate between the compute servers and the switch fabric. Fig. 13 shows the schematic view of the VIVACE protocols.
The protocols were developed for communication from the compute server to the ONIC, from the ONIC to the switch, from the switch to the ONIC, and from the ONIC to the compute server. In anticipation of interconnecting more compute servers, communication between nodes on different switches was also considered. The protocols for intraswitch and interswitch communication are not explained in this paper. The data received by the ONIC from the server through the PCI bus arrives at the 64-bit/66-MHz data rate. The FPGA on the other end of the ONIC communicates with the three SERDES on a 96-bit-wide bus for data transmission and reception at a 100 Mb/s rate per line. The bandwidth discrepancy is resolved by using first-in first-out (FIFO) logic to convert the 64-bit data to 96-bit data and vice versa. A flow control algorithm is 
A. Communication Request
In phase 1, Node A wants to establish communication with Node B to send a message. Node A initiates this communication by sending an MPI call from the application. A control block with the MPI TAG and MPI priority, but without the address of the application data (address 0), is created in the MPI memory of Node A. MPI then sends header information with the message length, memory address, and destination address to the ONIC. The ONIC examines the header and determines that there is no application data. It then reads the control block created in the MPI memory. The ONIC builds a header with the destination address, the address where the message is to be written, the source switch ID, and the port on the source switch. It first transmits the header, followed by the control block with the cyclic redundancy check (CRC) checksum. This request message negotiates through the switch and arrives at the destination node. Fig. 15 gives a schematic view of the communication between the nodes and the data format.
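The request message of phase 1 (header, then control block, then CRC) can be illustrated with a short Python sketch. Everything about the byte layout here is an assumption made for illustration: the text does not give field widths, field order, or the CRC polynomial, so the sketch uses fixed-width big-endian fields and the CRC-32 from Python's zlib module as stand-ins.

```python
import struct
import zlib

# Hypothetical layouts; the text does not specify field widths or order.
HEADER = struct.Struct(">IIIHH")   # dest address, write address, length, switch ID, port
CONTROL = struct.Struct(">IHI")    # MPI tag, MPI priority, application data address


def build_request(dest: int, write_addr: int, switch_id: int, port: int,
                  tag: int, priority: int) -> bytes:
    """Header, then control block, then a CRC over both."""
    control = CONTROL.pack(tag, priority, 0)   # data address 0: no application data
    header = HEADER.pack(dest, write_addr, len(control), switch_id, port)
    body = header + control
    return body + struct.pack(">I", zlib.crc32(body))


def parse(frame: bytes):
    """Verify the CRC and return the (header, control block) field tuples."""
    body = frame[:-4]
    (crc,) = struct.unpack(">I", frame[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("CRC mismatch")
    header = HEADER.unpack(body[:HEADER.size])
    control = CONTROL.unpack(body[HEADER.size:HEADER.size + CONTROL.size])
    return header, control


frame = build_request(dest=2, write_addr=0x1000, switch_id=1, port=5,
                      tag=42, priority=0)
header, control = parse(frame)
print(header, control)
```

In this sketch, a data address of zero in the control block plays the role of the "no application data" marker, so a receiver could classify the message as a communication request from the control block alone.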
B. Communication Response
In phase 2, the ONIC on Node B receives the header followed by the control block. The ONIC examines the header and determines the message to be a communication request. It also performs a CRC check on the received data. It then writes the control block into the compute server MPI memory by direct memory access (DMA). The MPI determines that the message is a request and that a destination address is necessary for transmission. It keeps track of the control blocks received to determine whether the data associated with each has been received. The MPI on Node B then waits for its application to send a control block to match the received control block. The MPI creates a response control block that grants or denies permission to transmit the data for the request from Node A. It creates a header and sends it to the ONIC. The ONIC examines the header, determines that the message carries no data, and DMA reads the control block from the MPI memory. It then sends the header followed by the control block with a CRC checksum. Fig. 16 gives a schematic view of the communication acknowledgment protocol.
C. Message Transmission
During phase 3, the message negotiates through the switch and is received at the ONIC in the node that requested transmission, Node A. The ONIC examines the header to determine whether there is data following the header. If not, it DMA writes the control message following the header to the MPI memory of Node A. The MPI examines the control block and finds it to be a response message. It matches it with the initial control message sent for the transmission request. If the response is an acknowledgment of the transmission request with the address for the message, the MPI creates a new control block to send the long application message. It discards the previous control block and creates new header information with the address where the data needs to be written in the destination application memory. The MPI memory in Node A DMA writes the header and control block to the ONIC. The ONIC examines the header and modifies it to include the destination address, the address at which to write the data, the source switch ID, and the port on the source switch. It then sends the header followed by the application data, DMA read from Node A, along with the CRC checksum. The message arrives at the ONIC of Node B. The ONIC examines the header, and the MPI memory writes the data to the Node B application memory. The control block associated with this received message is discarded. Fig. 17 shows the communication protocols involved in phase 3. Node A waits for a predetermined time and, if it sees no response, assumes that the message reached the destination node.
V. CONCLUSION
This paper described the design of a programmable ONIC that interfaces a 12-channel, gigabit parallel optical link module [10], [11] with a 64-bit/66-MHz PCI computer bus [12]. This ONIC design was originally developed to demonstrate a specific network architecture that used free-space optical interconnection inside the switch fabric [4]. However, it can be readily modified to support new network protocols and optical data link technologies.
We have built several ONIC prototypes and tested them for continuous error-free data transmission through the VSR optical data links. A software driver was developed and used to experimentally demonstrate application-level message passing between two server computers, each equipped with an ONIC, that were connected with 1 m of parallel fiber ribbon cable. As previously described, our motivation for designing the ONIC was to support the demonstration of a network architecture that uses free-space optical interconnects inside the switch fabric. All these results can be used as a starting point for researchers who want to develop their own network-protocol- and optical-device-specific ONIC.
The selection of the PCI bus for the computer interface was driven by the need to make the ONIC compatible with a wide range of computing hardware. In addition, the use of the PCI bus simplified driver software development because of the extensive programming tools and support documentation available for PCI software driver development. Recently, higher-performance computer interfaces have become commercially available (e.g., INFINIBAND [22] and PCI-X [23]). The ONIC design can be modified to use these computer interfaces to provide higher communication bandwidth. For example, PCI-X doubles the 4 Gb/s maximum bandwidth available with 64-bit/66-MHz PCI to 8 Gb/s.
The ONIC design described in this paper uses a first-generation Xilinx VIRTEX FPGA device that was available when the ONIC design was initiated. Recently, Xilinx has started shipping third-generation VIRTEX FPGA devices that integrate up to 24 SERDES circuits that can operate at data rates from 622 Mb/s to 3.125 Gb/s. These new FPGA devices also integrate multiple POWERPC microprocessor cores that can be programmed for complex network protocol computations. Using these new FPGAs can greatly simplify the ONIC design because it eliminates the need for separate SERDES chips and the large amount of PCB wiring required to interconnect these SERDES with the FPGA.
Finally, newly developed network processors include software programmability and support up to 10 Gb/s network bandwidth [24]. These network processors offer sophisticated packet processing capabilities, but they cannot scale to the bandwidth offered by the latest FPGA devices [25]. They are also considerably more difficult to use due to their specialized nature and low distribution volume. Thus, the FPGA-based ONIC approach continues to be a good approach for demonstrating new network architectures and new VSR optical link technologies.
Kevin Driscoll, photograph and biography not available at the time of publication.
Brian Vanvoorst, photograph and biography not available at the time of publication.
