The design and development of a fault-tolerant fiber-optic backplane to reduce the effects of electromagnetic environments (EME) on flight-critical computing platforms is presented. The backplane was developed at the NASA Langley Research Center and is currently undergoing analysis, simulation, and testing. Simulation results for the backplane in the presence of simulated High Intensity Radiated Fields (HIRF)-induced faults are presented, and the fault recovery capability of the architecture is demonstrated.
INTRODUCTION
The development of an architecture capable of implementing a fault-tolerant, fiber-optic backplane [1] is presented in this paper. This architecture consists of a set of Bus Interface Units (BIUs) and Redundancy Management Units (RMUs) forming multichannel redundant fiber-optic backplanes [2]. Each channel, in turn, consists of a set of BIUs that are tied to an RMU via separate fiber-optic read and write buses (the actions of read and write are taken from the perspective of the BIUs), Figure 1. Fault tolerance is achieved by replicating BIUs on several channels and combining their outputs in the RMUs to mask any errors or failures before the data is placed on the read bus. In such a redundant system, the RMUs of different channels communicate with each other through separate fiber-optic write buses, Figure 2. The RMUs also provide global time synchronization across backplanes and timing control through the local read buses. All processing and bus accesses are controlled by time, i.e., all data appearing on the backplane can be uniquely identified by the time at which they become available. The BIUs on the backplane are time-division multiplexed onto the write bus, and the RMU is the only device that writes to the read bus. Because the RMU is fundamental to the backplane's operation, both the read bus and the RMU may be replicated to increase reliability. Finally, the RMU can be integrated with a gateway to a network, thus providing fault-tolerant access to remote processing nodes.
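Because every transfer on the backplane is tied to a time slot, a receiver can tell which BIU produced a word purely from when it arrives. The minimal Python sketch below illustrates this idea. The frame layout, slot start times, and cycle length are hypothetical values chosen for illustration. They are not parameters of the actual backplane.

```python
from bisect import bisect_right

# Hypothetical TDM frame: (start_time, biu_id) pairs in byte clocks.
# Each slot lasts until the next slot starts; the first slot must start at 0.
FRAME = [(0, 1), (40, 2), (80, 3), (120, 4)]
CYCLE_LENGTH = 160  # hypothetical length of one schedule cycle, in byte clocks


def write_bus_owner(t: int, frame=FRAME, cycle=CYCLE_LENGTH) -> int:
    """Return the BIU that owns the write bus at absolute time t."""
    offset = t % cycle                        # position within the current cycle
    starts = [start for start, _ in frame]
    index = bisect_right(starts, offset) - 1  # last slot starting at or before offset
    return frame[index][1]


# The same offset in every cycle maps to the same source BIU.
for t in (0, 45, 130, 205):
    print(t, write_bus_owner(t))
```

Under these assumptions, the time of arrival alone identifies the source BIU. No address field is needed on the write bus.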
DEVELOPMENT
Figure 1 shows the testbench architecture for a single working backplane channel. This testbench was designed, developed, and tested to demonstrate the functionality of a single channel. For testing purposes, a maximum of four BIUs proved sufficient to demonstrate full channel functionality. Therefore, the testbench consisted of four BIU testbenches and one RMU testbench. These testbenches were connected via a read bus and four write buses that were funneled to the RMU through a multiplexer.
Analysis of the behavior of the RMU revealed that by preserving the BIU interface, specifically the interface to the FIFOs, the RMU can be defined as a special case of a BIU. As a result, the BIU component has since been renamed BIU/RMU and is the core component of both the BIU and RMU testbenches. These testbenches are described in more detail in the following sections.
BIU Testbench
The BIU testbench, Figure 3, encompasses the BIU/RMU and all the components necessary for its normal operation on a separate PC board. The adjoining components are an EPROM/RAM that contains the scheduled events of operations, a FIFO for the input data, a FIFO for the output data, and a microprocessor (PC) with its associated input and output files that acts as the BIU front-end. A single external bit, BIU-OR-RMU, specifies whether the module functions as a BIU or an RMU.
RMU Testbench
The RMU testbench encompasses the BIU/RMU and all the components necessary for its normal operation on a separate PC board. The adjoining components are an EPROM/RAM that contains the scheduled events of operations, a single FIFO for both the input and output data, and a microprocessor (PC) with its associated input and output files that acts as the RMU front-end. The RMU
BIU/RMU
As mentioned in the previous section, analysis of the behavior of the RMU and BIU revealed that these modules have so much in common that the RMU should be treated as a special case of the BIU. In particular, the main functions of both the BIU and the RMU are transmission of data, reception of data, and execution of the scheduled instructions. The RMU's interpretation of the scheduled operations is, of course, slightly different from the BIU's. The only RMU-specific function is voting on the input data and masking out the faulty BIU(s). However, this function should be performed by an independent module that complements the BIU module functionality. The BIU and RMU can therefore be designed to have identical interfaces to the outside world, and the terms BIU and RMU are used interchangeably in the architectural sense. By accommodating their differences in interpreting the scheduled operations via the BIU-OR-RMU bit, the BIU/RMU architecture can be developed as a single module. Joint development of the BIU/RMU has the added advantages of requiring less development time and code maintenance. It also reduces the overall ASIC fabrication cost by 50%, since a single die would suffice. Therefore, in this section, unless specifically stated otherwise, all details and descriptions of this module apply to both the BIU and the RMU.
The BIU/RMU has two separate interfaces for two FIFOs: one for reading the input data and the other for writing the output data. Since this module can transmit and receive data simultaneously, two FIFO interfaces are necessary to handle the input and output data flow. Every instance of this module requires its own unique identifier, which is set externally via the BIU-ID parameter.
At power-on and upon reset, Figure 4, the BIU/RMU resets its internal counters, clears its registers, and resets its transmitter and receiver clocks. If the BIU-OR-RMU bit is set high, the module functions as a BIU and enters a wait state, where it awaits the Start-Cycle command from the RMU. Otherwise, the module functions as an RMU and begins reading the scheduled operations, taking the appropriate actions at the scheduled times. Upon receiving the Start-Cycle command, the BIU resets its internal counters, clears its registers, and resets its internal clocks. At this point, all BIUs are synchronized with respect to the RMU. The BIU then begins reading the scheduled operations and takes the appropriate actions at the scheduled times.
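The power-on and Start-Cycle behavior described above can be summarized as a two-state machine. The Python sketch below is a behavioral model only, built on several assumptions. The class name, the state names, and the use of plain attributes for counters, registers, and clocks are all hypothetical, and the actual module is implemented in VHDL.

```python
from enum import Enum, auto


class State(Enum):
    WAIT_START_CYCLE = auto()  # BIU idles until the RMU's Start-Cycle command
    RUN_SCHEDULE = auto()      # module reads and executes scheduled operations


class BiuRmu:
    """Hypothetical behavioral model of BIU/RMU power-on and reset."""

    def __init__(self, biu_or_rmu: bool):
        self.biu_or_rmu = biu_or_rmu  # True (high) = BIU, False (low) = RMU
        self.reset()

    def _clear(self) -> None:
        self.counter = 0     # internal counters
        self.registers = {}  # internal registers
        self.clock = 0       # transmitter/receiver clocks

    def reset(self) -> None:
        """Power-on or reset: clear state, then pick the role-dependent state."""
        self._clear()
        # A BIU waits for Start-Cycle; an RMU starts its schedule immediately.
        self.state = State.WAIT_START_CYCLE if self.biu_or_rmu else State.RUN_SCHEDULE

    def on_start_cycle(self) -> None:
        """Start-Cycle from the RMU re-synchronizes a BIU and starts its schedule."""
        if self.biu_or_rmu:
            self._clear()
            self.state = State.RUN_SCHEDULE


biu = BiuRmu(biu_or_rmu=True)
print(biu.state)   # a BIU waits for the RMU after reset
biu.on_start_cycle()
print(biu.state)   # and runs its schedule after Start-Cycle
```

Keeping the role selection in a single flag mirrors the single-module BIU/RMU design, where only the interpretation differs between roles.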
Data transmission requires reading a data packet from a FIFO, checking the data integrity by examining the packet header, and converting the data bytes into a continuous serial bit stream, Figure 5. If the packet header, specifically the Sync-Pattern, is not detected at the expected time, the error is registered and the transmission operation is aborted. In addition, the Command bit is examined to prevent the microprocessor from issuing commands to the RMU and to safeguard against any undesirable side effects. If the Command bit is set, the error is registered and the transmission operation is aborted. This, therefore,
of this module, then the Command bit is examined. If the Command bit is not set, the rest of the packet is treated as a data packet for this module and is routed to a FIFO. Otherwise, the packet is treated as a Command packet from the RMU and the proper action is taken. Reading the scheduled operations from the EPROM/RAM requires setting the appropriate address lines and issuing the read signal. The EPROM/RAM data is then loaded into the Delta-Time-Clock and Instruction-Buffer registers. The fetched instruction is then decoded and executed after the specified delta time.
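The transmit-side header checks and the receive-side routing can be sketched together in Python. This minimal sketch rests on explicit assumptions, because the packet layout is not specified here. The Sync-Pattern value (0xA5) is hypothetical. So is placing the Command bit at bit 7 of the second header byte, carrying the destination identifier in the low five bits of that byte, and serializing MSB-first. Registering an error is modeled by raising an exception.

```python
from typing import List

SYNC_PATTERN = 0xA5  # hypothetical Sync-Pattern value (header byte 0)
COMMAND_BIT = 0x80   # hypothetical Command-bit mask in header byte 1
ID_MASK = 0x1F       # hypothetical: destination ID in the low five bits of byte 1


class TransmitError(Exception):
    """Raised where the hardware would register an error and abort."""


def prepare_transmission(packet: bytes) -> List[int]:
    """Check a packet read from the FIFO and serialize it into a bit stream."""
    if len(packet) < 2 or packet[0] != SYNC_PATTERN:
        raise TransmitError("Sync-Pattern not detected; transmission aborted")
    if packet[1] & COMMAND_BIT:
        # The microprocessor must not issue commands to the RMU.
        raise TransmitError("Command bit set; transmission aborted")
    bits: List[int] = []
    for byte in packet:
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))  # MSB first
    return bits


def route_received(packet: bytes, my_id: int, input_fifo: list, commands: list) -> str:
    """Route a received packet to the input FIFO or to the command handler."""
    header = packet[1]
    if (header & ID_MASK) != my_id:
        return "ignored"             # assumed: not addressed to this module
    if header & COMMAND_BIT:
        commands.append(packet[2:])  # Command packet from the RMU
        return "command"
    input_fifo.append(packet[2:])    # ordinary data for this module
    return "data"
```

The same Command bit is rejected on the transmit side and honored on the receive side. This reflects the rule that only the RMU may issue commands.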
Voting on the data is an RMU-specific function and is performed by all RMUs in a redundant multichannel system to ensure fault-tolerance coverage of the backplane. In such a system, all RMUs broadcast their input data to all other RMUs as the data becomes available, so a BIU output is available to all RMUs at the same time. Each RMU then votes on the data it receives from the RMUs of other channels and on the data from the corresponding BIUs of its own channel. In case of any discrepancy, the faulty BIU is identified and the voted BIU output is broadcast on the local channels.
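The voting rule is not specified, so the Python sketch below assumes a simple majority vote over the copies of one BIU output received from each channel. The channel labels and data values are hypothetical. The function returns the voted value together with the channels whose copies disagreed, which is the information needed to identify a faulty BIU.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def majority_vote(copies: Dict[str, bytes]) -> Tuple[Optional[bytes], List[str]]:
    """Vote on copies of one BIU output, keyed by the channel they came from.

    Returns (voted_value, disagreeing_channels). If no value has a strict
    majority, the error cannot be masked and (None, []) is returned.
    """
    value, count = Counter(copies.values()).most_common(1)[0]
    if 2 * count <= len(copies):
        return None, []
    disagreeing = sorted(ch for ch, data in copies.items() if data != value)
    return value, disagreeing


voted, faulty = majority_vote({"A": b"\x12", "B": b"\x12", "C": b"\x7f"})
# voted == b"\x12"; the BIU behind channel C is flagged as faulty
```

With three or more channels, a single faulty copy is outvoted and masked. With only two channels, a disagreement can be detected but not resolved.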
The scheduled events and instructions are stored in an EPROM or a RAM using the format depicted in Figure 7. Each scheduled event is two 8-bit bytes long. The first byte is reserved for the delta time, which limits the interval between two consecutive events to at most 256 byte clocks. To extend this interval beyond 256 byte clocks, no-op instructions should be inserted between the actual events. In the second byte, the most significant bit, bit 7, indicates a transmission event and the next bit, bit 6, indicates a receiving event. The next bit, bit 5, in conjunction with bits 6 and 7, indicates whether the event is a status, data, or command event. The five least significant bits, bits 4 through 0, identify the RMU/BIU that is scheduled to take the appropriate action after the delta time has elapsed. Therefore, this format allows for a maximum of 32 (2^5) BIUs per channel.
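Decoding a scheduled event from its two bytes follows directly from the bit assignments above. In the Python sketch below, the field names are hypothetical. The text does not give the exact mapping of bits 5 through 7 onto status, data, and command events, so the three flag bits are returned individually rather than decoded into an event type.

```python
from typing import NamedTuple


class ScheduledEvent(NamedTuple):
    delta_time: int  # first byte: byte clocks to wait before acting
    transmit: bool   # bit 7 of the second byte
    receive: bool    # bit 6
    type_bit: bool   # bit 5; with bits 6 and 7 selects status/data/command
    module_id: int   # bits 4-0: RMU/BIU that acts on the event


def decode_event(first: int, second: int) -> ScheduledEvent:
    """Split one two-byte schedule entry into its fields."""
    return ScheduledEvent(
        delta_time=first & 0xFF,
        transmit=bool(second & 0x80),
        receive=bool(second & 0x40),
        type_bit=bool(second & 0x20),
        module_id=second & 0x1F,
    )


MAX_MODULE_IDS = 1 << 5  # a five-bit identity field distinguishes 32 modules
```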
When the MSB of the second byte is set high, the BIUs interpret it as a transmit instruction. The RMU, however, interprets it as a switch-channel instruction and uses the BIU identity field, bits 4 through 0, as the multiplexer select lines to switch to the appropriate BIU write bus.
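The role-dependent interpretation of the MSB can be sketched as follows. The function name and the returned action labels are hypothetical. The sketch shows only that the same schedule word yields a transmit action in the addressed BIU and a multiplexer select value in the RMU.

```python
from typing import Optional, Tuple


def interpret_msb(second: int, is_biu: bool, my_id: int) -> Optional[Tuple[str, int]]:
    """Interpret bit 7 of a schedule word for a BIU or for the RMU."""
    if not second & 0x80:
        return None                        # bit 7 clear: not handled here
    target = second & 0x1F                 # BIU identity field, bits 4-0
    if not is_biu:
        return ("switch_channel", target)  # RMU: value drives the mux select lines
    if target == my_id:
        return ("transmit", target)        # addressed BIU transmits in this slot
    return None                            # other BIUs take no transmit action
```

Because the RMU and the addressed BIU act on the same schedule word at the same time, the multiplexer is switched to the write bus that is about to carry data.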
RMU Commands
The only global command issued by the RMU to all BIUs is the Start-Cycle command.
Fault Injection
There are three methods of injecting a fault into this system.
These injected faults represent manifestations of HIRF-induced faults. The first is the brute-force method, in which a BIU is turned off. In an EME, this fault corresponds to a permanent latch-up and processor failure due to exposure to HIRF. Since the exact state and condition of the BIU at power-down are not known, this method of fault injection is random. In simulation, it can be accomplished by forcing the BIU to reset and wait in the idle state.
The second way of injecting a fault is through the schedule, by instructing the BIU to stop transmitting data at a specific time. In an EME, this fault corresponds to a processor malfunction due to exposure to HIRF. In effect, the BIU goes off line at the designated time. Because the fault is scheduled, its time of occurrence is predictable. This makes the method extremely helpful for examining the integrity of the system in the presence of a fault at different states of the system.
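A schedule-driven stop fault can be modeled by silencing every window the target BIU owns from the fault slot onward. The Python sketch below represents one schedule period as a list of BIU identifiers, one per transmission window, and uses None for a silent window. This representation is a hypothetical simplification of the EPROM/RAM schedule.

```python
from typing import List, Optional


def inject_stop(schedule: List[int], biu_id: int, fault_slot: int) -> List[Optional[int]]:
    """Silence biu_id's transmission windows at and after fault_slot."""
    return [None if owner == biu_id and slot >= fault_slot else owner
            for slot, owner in enumerate(schedule)]


# Example: BIU 1 goes off line at window 4 of the 1, 2, 3, 4, 1, 2, 1, 3 schedule.
observed = inject_stop([1, 2, 3, 4, 1, 2, 1, 3], biu_id=1, fault_slot=4)
```

Moving `fault_slot` is what lets the same fault be exercised at different states of the system.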
The third method also works through the schedule, by switching the channel to another BIU, preferably an unattached BIU. In an EME, this fault also corresponds to a processor malfunction due to exposure to HIRF. Even though all BIUs are functioning normally, switching to a bogus channel in effect disrupts proper routing of the intended BIU output to the target BIUs. This method can simulate data packet corruption on the write bus as well as BIU babbling.
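The bogus-channel fault can be illustrated by modeling the RMU multiplexer as a lookup from the select value to whatever the selected write bus carries. In the Python sketch below, the per-BIU payloads and the unattached identifier (31) are hypothetical. An unattached write bus is modeled as carrying no data.

```python
from typing import Dict, List, Optional


def forward(select_schedule: List[int], write_buses: Dict[int, str]) -> List[Optional[str]]:
    """Return what the RMU forwards to the read bus in each window."""
    return [write_buses.get(select) for select in select_schedule]


def switch_to_bogus(select_schedule: List[int], window: int, bogus_id: int) -> List[int]:
    """Return a copy of the schedule with one window switched to bogus_id."""
    faulty = list(select_schedule)
    faulty[window] = bogus_id
    return faulty


buses = {1: "BIU1 data", 2: "BIU2 data", 3: "BIU3 data", 4: "BIU4 data"}
normal = forward([1, 2, 3, 4], buses)
faulty = forward(switch_to_bogus([1, 2, 3, 4], window=1, bogus_id=31), buses)
# In window 1, BIU 2 transmits normally, but its output never reaches the read bus.
```

Selecting an attached but wrong BIU instead of identifier 31 would deliver another BIU's data in that window. That is how the same mechanism can mimic babbling.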
Fault Recovery
In the case of the brute-force method, where a BIU is powered down, the BIU can be reintroduced into the system once it is powered on, at the start of the next schedule cycle. At power-on, the BIU resets its internal registers and enters the idle state, Figure 4, awaiting the Start-Cycle command from the RMU before restarting its normal operations. This fault recovery method therefore lends itself to upgrading the system by taking the BIUs off line one at a time, without having to power down the whole system.
In all other cases, where a BIU is either babbling or not transmitting data, the BIU may recover from the fault provided that the fault is not persistent. In that case, the BIU may recover at the start of the next schedule cycle, upon receiving the Start-Cycle command from the RMU. If the fault persists for more than one schedule cycle, however, the BIU may never recover.
Reporting Faults
Regardless of the nature and timing of the faults, symptoms eventually show up on the read and write buses. When these symptoms are matched against the scheduled activities on the buses, the faulty BIU and the nature of the fault are identified. The symptoms indicate whether the faulty BIU is babbling or is not transmitting at the scheduled time. These errors are reported by setting their designated bits in the error register. More descriptive error reporting would require time-stamping the errors.
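Matching observed bus activity against the schedule and setting error-register bits can be sketched as below. The register layout, with two bits per BIU (one for a missed transmission and one for babbling), is a hypothetical choice made for illustration. The text states only that each error has a designated bit.

```python
from typing import Set

MISSED = 0    # hypothetical: bit 2*id   -> BIU did not transmit when scheduled
BABBLING = 1  # hypothetical: bit 2*id+1 -> BIU transmitted outside its window


def check_window(expected: int, observed: Set[int], error_register: int) -> int:
    """Compare one window's write-bus activity with the schedule."""
    if expected not in observed:
        error_register |= 1 << (2 * expected + MISSED)
    for biu in observed - {expected}:
        error_register |= 1 << (2 * biu + BABBLING)
    return error_register
```

Folding the check over every window of a period accumulates all detected faults in one register. Adding a time stamp per set bit would give the more descriptive reporting mentioned above.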
SIMULATION RESULTS AND CASE STUDIES
In this section, three case studies are presented to demonstrate the capabilities of the fiber-optic backplane. In the first case study, the system operations under ideal conditions are examined. In the second, the failure of a BIU due to power-down or reset is studied. Finally, in the third, BIU babbling and data corruption are investigated.
The single channel under study consists of one RMU and four BIUs, Figure 1. To examine the operation of the system under various conditions, a generic schedule is set up that encompasses all aspects of fault injection and recovery while exercising all BIUs. In these case studies, the schedule consists of transmission windows for the BIUs in the following order: 1, 2, 3, 4, 1, 2, 1, and 3.
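The case-study schedule could be encoded as sketched below. The sketch uses the event format described earlier for the schedule memory: delta time in the first byte, bit 7 as the transmit flag, and bits 4 through 0 as the BIU identifier. The uniform delta time of 40 byte clocks is a hypothetical value. The sketch also omits the receive, status, and command events that a complete schedule would contain.

```python
from collections import Counter

SCHEDULE = [1, 2, 3, 4, 1, 2, 1, 3]  # transmission window order from the case studies
DELTA = 40                           # hypothetical byte clocks between events


def encode_transmit(delta: int, biu_id: int) -> bytes:
    """Encode one transmit event: delta-time byte, then 0x80 | BIU id."""
    if not (0 <= delta <= 0xFF and 0 <= biu_id <= 0x1F):
        raise ValueError("delta must fit in 8 bits and biu_id in 5 bits")
    return bytes([delta, 0x80 | biu_id])


image = b"".join(encode_transmit(DELTA, biu) for biu in SCHEDULE)
windows_per_biu = Counter(SCHEDULE)  # every BIU gets at least one window
```

Giving BIU 1 three windows and BIU 4 one window lets a fault in BIU 1 be injected and observed at several points within a single period.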
Case 1. Ideal Case
The schedule for this case study is listed in Figure 8 . Figure 9 shows the typical activities of the BIUs during one scheduled period and in the absence of any faults. 
Case 2. Failing a BIU
In this case study, failure of a BIU due to latch-up, power-down, or reset is simulated by forcing the BIU to reset. The system starts with all BIUs functioning normally. BIU 1 is then forced to reset in the middle of a scheduled period. As a result, the BIU stops executing scheduled instructions and is taken off line, Figure 10, and BIU 1 is seen not to be transmitting for the rest of that schedule period. In the case of power-down, BIU 1 remains off line until it is powered on. After power-on, or in the case of reset, BIU 1 recovers at the start of the next scheduled period.
Case 3. BIU Babbling and Data Corruption
In this case study, BIU babbling and data corruption are examined. BIU 1 is scheduled to transmit before BIU 2 is finished. As a result, 1) BIU 2's transmission is ignored, 2) BIU 3 receives BIU 1's data without knowing it, and 3) BIU 2 receives BIU 1's data, i.e., BIU 2 is transmitting and receiving data at the same time. In other words, during this time, BIU 2 is babbling and BIU 3 receives a corrupted data packet. This case study also examines the independence of the transmitter and receiver modules of the BIU/RMU, Figure 11.
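The babbling condition arises because two transmission windows overlap on the write bus. A schedule checker that flags overlapping windows can be sketched in Python as below. The (BIU, start, end) window representation and the byte-clock values in the example are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Window = Tuple[int, int, int]  # (biu_id, start, end), end exclusive, in byte clocks


def find_overlaps(windows: List[Window]) -> List[Tuple[int, int]]:
    """Return pairs of BIUs whose transmission windows overlap in time."""
    return [(a, b)
            for (a, start_a, end_a), (b, start_b, end_b) in combinations(windows, 2)
            if start_a < end_b and start_b < end_a]


# BIU 1 starts at clock 30, before BIU 2's window ends at clock 40.
clashes = find_overlaps([(2, 0, 40), (1, 30, 70), (3, 70, 110)])
```

Checking every pair, rather than only neighbors in time, also catches a long window that overlaps several later ones. This is cheap for the handful of windows in a single-channel schedule.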
SUMMARY
A single-channel, fault-tolerant, fiber-optic backplane has been developed to study the feasibility of the architecture proposed in [1] and [2]. This backplane also assists in investigating the behavior of the architecture in the presence of faults. The particular implementation of the architecture presented here enables an RMU to connect to as many as 31 BIUs; however, for testing purposes, a maximum of four BIUs is sufficient to demonstrate full channel functionality. The architecture is designed, developed, and implemented using VHDL, and large portions of it are synthesized and implemented in hardware using Xilinx FPGAs on multiple PC boards.
The PC boards are designed so that they can be configured to function as either a BIU or an RMU. Analysis of the test cases shows the feasibility of the backplane, as well as its integrity in the presence of faults and its ability to recover from them.

