
INTRODUCTION
Spacecraft fault-tolerant computing is implemented in hardware and software at the part and system levels [1, 2]. Ideally, the additional hardware and software required for fault-tolerant features should have minimal impact on the computer system's size, speed, and power requirements, and should not decrease computational performance. However, the part-level approach often has an adverse impact on computational performance. This limitation will become more significant as spacecraft computational requirements increase. A fundamental problem is that radiation-hardened parts trail commercial parts in both performance and performance-to-cost ratio because of the long design cycle and small market for hardened parts. This requires comparing the output of the lock-stepped processors every cycle for possible errors.
When an error occurs, hardware causes a soft reset. All data within the processor are assumed bad, while main memory is unaffected. The software is written with checkpoints, and upon restart it rolls back to the most recent correctly written checkpoint. The soft restart operation takes about 10 msec, so the performance degradation, even in high-SEU environments, is negligible.
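The recovery scheme described above can be illustrated with a minimal sketch. It is not the flight software: the toy workload `step`, the `main_memory` dictionary, the `SoftReset` exception, and the `upset_at` fault-injection parameter are all hypothetical names introduced here. The sketch assumes only what the text states: outputs are compared every cycle, a mismatch discards processor state, and execution resumes from the last checkpoint held in main memory.

```python
class SoftReset(Exception):
    """Raised when the comparator sees the two processors disagree."""


def step(acc, i):
    """One unit of work: a toy accumulation standing in for the application."""
    return acc + i * i


def run_lockstep(n_steps, checkpoint_every, upset_at=()):
    """Run a toy workload on two lock-stepped 'processors'.

    upset_at lists the step indices at which a single upset is injected
    into the slave result (hypothetical fault-injection hook).
    Returns (final result, number of soft resets).
    """
    # "Main memory" survives a soft reset and holds the last good checkpoint.
    main_memory = {"checkpoint": {"i": 0, "acc": 0}}
    pending_upsets = set(upset_at)
    resets = 0

    while True:
        # Restart: reload processor state from the most recent checkpoint.
        state = dict(main_memory["checkpoint"])
        i, acc = state["i"], state["acc"]
        try:
            while i < n_steps:
                master = step(acc, i)
                slave = step(acc, i)
                if i in pending_upsets:
                    pending_upsets.discard(i)  # each upset happens only once
                    slave ^= 1                 # flip one bit of the slave result
                if master != slave:            # outputs compared every cycle
                    raise SoftReset
                acc, i = master, i + 1
                if i % checkpoint_every == 0:
                    main_memory["checkpoint"] = {"i": i, "acc": acc}
            return acc, resets
        except SoftReset:
            # Processor state is assumed bad and discarded; the outer loop
            # rolls back to the checkpoint in main memory.
            resets += 1
```

With two injected upsets, for example `run_lockstep(10, 3, upset_at=(4, 7))`, the sketch still returns the fault-free result and reports two resets, at the cost of repeating the steps since the last checkpoint.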
HARDWARE
Initial development involved the clock synchronization, compare, and reset circuits needed to link two R3000 processors running on commercially available development boards. This allowed software development for error recovery and test to proceed while the fault-tolerant processor board was designed and built. Communication is by a 9600 baud RS-232 serial interface. Test programs are downloaded into cache or main memory. The interface allows direct read access to main memory, including check bits, without EDAC. The direct read access allows checking for SEUs.
The phase-locked loop circuit shown in Figure 2 adjusts the input clocks to both processors to achieve clock synchronization. The skew between the two SYSOUT clocks shown in Figure 3 [...] machines are "PREEMPT," "MASTER," "TEST1," and "TEST2," as shown in Figure 4.
The PREEMPT state machine is triggered by a timer interrupt. Once triggered, it checks for data sent to it by the MASTER state machine. The data consist of x, y coordinate pairs and an iteration number. Using the iteration number modulo 4, the PREEMPT state machine calculates another pair of x, y coordinates and passes them back to the MASTER state machine. PREEMPT and MASTER use integer offsets to calculate the four corners of a square rotated by 45 degrees.
The MASTER state machine checks the iteration and calculation. Any lost message, or any failure of the PREEMPT routine to perform an operation to completion, is detected. Communication to and from the PREEMPT routine is by two ring structures of predetermined sizes. One ring is for incoming data and the other for outgoing data.
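The exchange between the two state machines can be sketched as follows. This is an illustration only: the offset table `OFFSETS`, the half-diagonal `D`, the ring size `RING_SIZE`, and the `drop_at` parameter are assumptions chosen for the example, not values from the test program. Python's `deque` stands in for the two fixed-size rings, and a direct function call stands in for the timer interrupt.

```python
from collections import deque

D = 10          # half-diagonal of the square (assumed value)
RING_SIZE = 8   # predetermined ring size (assumed value)

# Assumed integer offsets: each one steps from one corner to the next
# around a square rotated by 45 degrees, selected by iteration modulo 4.
OFFSETS = [(-D, D), (-D, -D), (D, -D), (D, D)]


def next_corner(x, y, iteration):
    """Apply the integer offset selected by the iteration number modulo 4."""
    dx, dy = OFFSETS[iteration % 4]
    return x + dx, y + dy


def preempt(to_preempt, to_master):
    """Stand-in for the timer-triggered PREEMPT state machine."""
    if to_preempt:
        x, y, iteration = to_preempt.popleft()
        nx, ny = next_corner(x, y, iteration)
        to_master.append((nx, ny, iteration))


def master(n_iterations, drop_at=None):
    """Stand-in for MASTER: send a request, check the reply, count errors.

    drop_at (hypothetical) names an iteration on which PREEMPT never runs,
    which models a lost message or an operation not run to completion.
    """
    to_preempt = deque(maxlen=RING_SIZE)  # ring for data going to PREEMPT
    to_master = deque(maxlen=RING_SIZE)   # ring for data coming back
    x, y = D, 0
    corners = [(x, y)]
    errors = 0
    for iteration in range(n_iterations):
        to_preempt.append((x, y, iteration))
        if iteration != drop_at:
            preempt(to_preempt, to_master)
        expected = next_corner(x, y, iteration)
        reply = to_master.popleft() if to_master else None
        if reply != (*expected, iteration):
            errors += 1          # lost message or wrong calculation detected
            to_preempt.clear()   # discard stale traffic before continuing
            to_master.clear()
        x, y = expected          # MASTER carries on with its own result
        corners.append((x, y))
    return corners, errors
```

Calling `master(4)` walks the four corners and returns to the starting point with no errors; `master(4, drop_at=2)` visits the same corners but counts one detected error.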
PROTON SEU TESTS
Prior to the radiation effects tests, the NOFAULT program underwent simulation testing to verify software recovery. Simulation testing done on a VAX used pseudo-random timer interrupts and routines to unwind the stack. Additional tests on a UNIX system used timer interrupts and software jump facilities. These two systems simulated recovery from both random events and events timed to cause upset during critical routines, such as during soft reboot.
The test application program utilizes cache memory for the stack and executes a variety of operations designed to detect error conditions. Each high-level state machine data structure includes locations for counting detected errors and recoveries. In addition, the data structures associated with the lower-level routines include locations for counting the number of times the routine executes its recovery code. The error and restart counts are monitored by having the program periodically print the numbers or by independently monitoring the memory of interest.
SEU tests were performed at the variable energy isochronous cyclotron housed at the Crocker Nuclear Laboratory at the University of California at Davis. The beam line is dedicated to proton SEU experiments and equipped with an automated beam current monitoring system. Current monitoring is done with a Faraday cup and a secondary electron emission monitor [4] .
Our tests were conducted with 60 MeV protons, and a 3.5 inch thick Delrin block was used to collimate the beam onto the slave processor.
The proton SEU tests were done at fluxes in the range of 10^8 to 10^9 protons/cm^2-s. This flux range gave an approximate error rate ranging from one every ten seconds to one per second. In comparison, the peak flux of protons in the South Atlantic Anomaly is about 10^3 protons/cm^2-s.
However, since the SEU rate is proportional to the flux, the SEU measurements are made at a higher flux in order to limit the time and expense of the testing. To ensure that there was no prob-
* Hung after 5th event and recovery
The SEU device cross section based on the test program is 6.82 x 10^-12 cm^2 ± 1.99 x 10^-12 cm^2. Comparing this cross section to previous LLNL experiments shows that it is smaller than the "Stress Test" device cross section of 1.36 x 10^-11 cm^2, and close to the "CPU Test" device cross section of 6.51 x 10^-12 cm^2 [6]. This is expected, since the "Stress Test" is a very extensive processor test that checks all 32 general registers and the Table Lookaside Buffer. The "CPU Test" is more limited: it checks only 6 of the general purpose registers and none of the Table Lookaside Buffer, similar to a typical application program. The upset cross section is useful for estimating upset rate. However, the actual upset rate is dependent upon the application code.
SEU and SEL predictions due to protons and ions for a spacecraft with 60 mils of aluminum shielding in a 500 km orbit at 60 degree inclination were calculated using "SPACERAD" [7]. Both the Galactic Cosmic Ray and trapped proton environments use the IGRF 85 magnetic field model. The trapped proton environment is solar minimum (Sawyer & Vette), while the Galactic Cosmic Ray environment is Adams' solar minimum. The proton upset rate for this orbit is based on the Bendel and Petersen "A" parameter for the processor [8].
Prior proton SEU tests performed for single IDT R3000A operation produced an upset cross section of 1.4 x 10^-11 cm^2, which implies an orbit upset rate of 5.7 x 10^-5 upsets per day [6].
The unrecoverable upset cross section is much lower for the dual lock-step configuration. Using the data from the last 4 runs that produced 12 successful recoveries for each potential upset, the upper bound estimate of the unrecoverable upset cross section is 5.7 x 10^-13 cm^2 (one unrecoverable upset in the total fluence of 1.76 x 10^12 p/cm^2). This upper bound estimate implies an orbit upset rate of 1.5 x 10^-6 upsets per day.
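The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted in the text; the function and variable names are introduced here for illustration. The orbit upset rates are taken as given, since they come from the environment model rather than from this calculation.

```python
def upper_bound_cross_section(upsets, fluence):
    """Cross section in cm^2 as upsets divided by fluence in p/cm^2."""
    return upsets / fluence


# One unrecoverable upset assumed in the total fluence of the last 4 runs.
total_fluence = 1.76e12  # p/cm^2
sigma_unrecoverable = upper_bound_cross_section(1, total_fluence)

# Orbit upset rates quoted in the text, in upsets per day.
single_processor_rate = 5.7e-5
dual_lockstep_rate = 1.5e-6
improvement = single_processor_rate / dual_lockstep_rate

print(f"upper bound cross section: {sigma_unrecoverable:.2e} cm^2")
print(f"rate ratio, single to dual lock-step: {improvement:.0f}")
```

The first value rounds to the quoted 5.7 x 10^-13 cm^2, and the ratio of the two quoted orbit rates is about 38.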
The limiting factor in testing and operation of the fault-tolerant computer is no longer the upset
CONCLUSION
The project successfully demonstrated that dual lock-step comparison of commercial RISC processors is a viable fault-tolerant approach to handling SEU in the space environment. The estimated on-orbit error rate of the fault-tolerant approach was 38 times lower than the single-processor error rate. The random nature of the upsets and their appearance in critical code sections show that it is essential to incorporate both hardware and software in the design and operation of fault-tolerant computers.
[1]
[2]
[3]
[4]
[5]
[6]
