SSC05-VIII-3
Low Power High-Speed Radiation Tolerant Computer
David Czajkowski, Praveen Samudrala, Manish Pagey, and David Strobel
Space Micro Inc.
9765, Clairemont Mesa Blvd, Suite A
San Diego, CA – 92124
(dcz, praveen, pagey, dstrobel)@spacemicro.com
ABSTRACT: This paper presents a new concept for building a high-performance, radiation-hardened computer from
Commercial-Off-The-Shelf (COTS) components. We discuss the underlying radiation mitigation technologies and
demonstrate their radiation tolerance by citing radiation test results. Hardness against Single Event
Upsets (SEUs) in COTS microprocessors is achieved using Time-Triple Modular Redundancy (TTMR). Single
Event Functional Interrupts (SEFIs) of the microprocessors are mitigated with an auxiliary Hardened Core
(H-Core) chip. The other on-board chips are a combination of space-qualified, radiation-hardened COTS parts.
A flight unit of the computer based on these technologies, the Proton 100k, was adapted for the Air
Force Research Laboratory's Space Vehicles Directorate RoadRunner satellite program.
INTRODUCTION

Commercial microprocessors can be hardened against
Single Event Upsets (SEUs) using a variety of methods,
including Triple Modular Redundancy (TMR) and time
redundancy. These methods use redundant resources
to detect and correct faults. However attractive, they
carry penalties in performance, power, and
area.

Current space-qualified computers use custom-built
microprocessors that employ a range of radiation
hardening techniques. However, such methods carry
substantial penalties in area, power,
and cost.
In this paper we present TTMR and H-Core techniques
to improve the radiation immunity of commercial
microprocessors. TTMR combines time and spatial
redundancy within a single microprocessor to detect
and correct SEUs. H-Core uses an external radiation
hardened circuit to monitor and manage a COTS
microprocessor during SEFI events. H-Core is
responsible for detecting the occurrence of SEFI events
and, in case of such an event, asserting a sequence of
signals until complete recovery of the microprocessor is
confirmed. In addition, the H-Core also provides
capabilities for application programs to restore their
states after recovery from a SEFI event.

TTMR Method
TTMR is a hybrid method based on both TMR and
time redundancy. TTMR exploits the redundant ALUs
of a VLIW processor to process identical instructions in
parallel. To increase the performance of the processor,
only two sets of instructions run first. The results of
these two sets are compared and, in case of a mismatch,
time redundancy is used to run a third set and the
results are voted on.
The flowchart shown in Figure 1 represents the steps
followed by the TTMR algorithm. The algorithm starts
by loading the first instruction of the program. “O”
and “M” represent the “Original” and “Mirror” copies of
the instruction. The O and M instructions are spatially
separated by executing them on different ALUs of
the VLIW microprocessor. This eliminates the common
points of failure that could corrupt both sets
identically, as can happen in a microprocessor that
relies on time redundancy alone. The next step of the
flowchart is to compare the results of the O and M
instructions. If the results match, the current state of
the program is saved; in other words, the uncorrupted
results of O and M are written back to main memory.

We briefly discuss the application of the Proton 100k
computer to the Air Force Research Laboratory's
(Space Vehicles Directorate) RoadRunner Onboard
Processing Experiment (ROPE) satellite program
(planned for launch in Spring 2005).
SEU MITIGATION USING TIME-TRIPLE
MODULAR REDUNDANCY (TTMR)

19th AIAA/USU
Conference on Small Satellites


Figure 1. TTMR Steps
However, a mismatch between the O and M results
indicates the occurrence of an SEU. At this point
another set of results is calculated. Either ALU of the
processor can be used to calculate the results of the
third, or “T”, instructions. The results of T and O are
then compared. If they match, the M results were
corrupted. The M value is overwritten with the O
results, and the process is repeated until the end of
the program.

responding. The effects can be much worse with
commercial, non-rad-hard microprocessors. The
state-of-the-art method of recovering the processor
from a SEFI is to power cycle the entire computer [7]
after detecting that the system is hung, and few
detection methods have been published. A serious
drawback of this solution is that the whole system is
inoperable for the entire duration from the SEFI,
through detection, until power is cycled and the
processor is rebooted. This is undesirable and in some
cases unacceptable.

If the O and T results do not agree, the T results and
M results are compared. If they match, the M results
are copied into the O results. The algorithm continues
with the execution of the other instructions of the
program. On the other hand, a disparity between T and
M implies that more than one SEU probably occurred
during the execution, resulting in an uncorrectable
error. The probability of such an event is, however,
very small and is treated as negligible.
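
The O/M/T comparison described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the names `ttmr_vote` and `make_adder` and the bit-flip fault injector are assumptions, and a sequential script cannot model the spatial separation of O and M on different ALUs.

```python
from typing import Callable, Tuple


def ttmr_vote(run: Callable[[str], int]) -> Tuple[int, str]:
    """Run the O and M copies; compute T only if they disagree."""
    o = run("O")  # Original copy
    m = run("M")  # Mirror copy (a different ALU on real hardware)
    if o == m:
        return o, "match"
    t = run("T")  # Third copy, computed only after a mismatch
    if t == o:
        return o, "M corrected"  # M was corrupted; keep the O result
    if t == m:
        return m, "O corrected"  # O was corrupted; keep the M result
    raise RuntimeError("uncorrectable SEU: O, M and T all differ")


def make_adder(a: int, b: int, flipped: Tuple[str, ...] = ()) -> Callable[[str], int]:
    """Hypothetical fault injector: copies named in `flipped` get bit 0 inverted."""
    def run(copy: str) -> int:
        result = a + b
        return result ^ 1 if copy in flipped else result
    return run


if __name__ == "__main__":
    print(ttmr_vote(make_adder(2, 3)))
    print(ttmr_vote(make_adder(2, 3, flipped=("M",))))
```

In the fault-free case only two copies execute, which is where the performance advantage over running three copies every time comes from.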
An input program can be converted into its TTMR
version by feeding it to Space Micro's
“Precompiler”. The precompiler inserts the O and M
copies and the compare statements into the program.
Precompiling, as the name suggests, is performed before
the compile stage.

Figure 2. H-Core
A Hardened Core (H-Core) built using a radiation-tolerant
process (such as Silicon On Insulator (SOI)) can be
used for SEFI mitigation.

HARDENED CORE (H-CORE)
Single Event Functional Interrupts (SEFIs) [1] in
microprocessors are a serious threat to space computers.
A SEFI is a Single Event Effect (SEE) that causes the
microprocessor to enter an unknown state and stop


The microprocessor sends its status through a dedicated
line (shown in Figure 2) to the H-Core. A SEFI in the
microprocessor results in a failure to send the status

signal. The H-Core chip detects the occurrence of the
SEFI and asserts interrupt signals in an escalating
fashion until the processor returns to a known state.
The H-Core is not merely hardware but a combination of
hardware and software. The effectiveness of the H-Core
chip depends on how quickly it detects a SEFI and
responds. The customized interrupt handlers that work
alongside the hardware are very important. These
handlers perform a set of routines that restore the
operation of the microprocessor.

were observed. The TTMR algorithm detected
and corrected all 4 SEUs, thus providing 100% coverage
of the induced SEUs.
SEFI Results
1. Intel Pentium III Test: During the Pentium test, 21
SEFI instances were observed. In each case the
processor was revived by activating one of the H-Core
signals. Table 1 lists the interrupts, the number of
times each interrupt signal was asserted to mitigate a
SEFI, and the number of times it succeeded in
recovering the processor.

RADIATION TESTS AND RESULTS
Table 1. Pentium SEFI Test Results
This section discusses the radiation tests performed on
microprocessors at the Crocker Nuclear Laboratory,
University of California, Davis. A beam of 63 MeV
energy was used for the tests. The goals of the tests
were:
1. Evaluating the performance of TTMR in mitigating
SEUs,
2. Determining the effectiveness of H-Core in recovering
the processor from a SEFI hang, and
3. Investigating the radiation characteristics of the
microprocessors under test.
Radiation tests have so far been performed on the Intel
Pentium III, Texas Instruments TMS320C6713, and
BSP-15 microprocessors. The basic test procedure is to
run a TTMR test code on the microprocessor while
irradiating it and monitoring for SEUs and SEFIs.
Detection and correction of SEUs is handled by the
test code, while SEFIs are detected when the board
stops transmitting the status signals of the program
execution. The H-Core setup is then used to revive the
processor from the hang.

A total of 21 SEFIs were observed during the test. As
seen from the table, the less severe interrupts such as
BINIT#, INIT#, and LINT0 were asserted 12 times
without any success. However, RESET and Hardware
Watchdog were very effective.
The H-Core successfully revived the processor in all
21 SEFI cases without powering down the board.

Radiation Test Results

2. Texas Instruments 320C6713 Radiation Test: For the
second radiation test, on the TI 320C6713 processor, we
used the CodeComposer software program to monitor
and control the H-Core signals. Hence, no SEFI switch
or related software was used for this test.

SEU and SEFI results obtained from the tests are
analyzed in this section.
SEU Results
A total of 3 SEUs were observed in TI's 320C6713
microprocessor during the test. CodeComposer
software was used to understand the manifestation of
the SEUs. Of the 3 SEUs induced, the TTMR algorithm
detected all 3 but was able to correct only 1.
Further analysis of this issue after the test led us
to a modified and more powerful version of the TTMR
algorithm.

Table 2 shows the effectiveness of the H-Core signals
in mitigating SEFIs. It can be seen from the table that
none of the interrupt signals had any effect on
restoring the health of the processor. We believe this
is because the CodeComposer software did not function
exactly as an H-Core, masking the potential results with
faulty handshaking between CodeComposer and the TI
DSP hardware.

The algorithm used in the previous tests was modified
and applied to the radiation test on the BSP-15
microprocessor. During the third radiation test 4 SEUs

Reset, however, had 100% SEFI coverage. The board
was brought back to its normal state in all 6 observed
SEFI cases without powering down the board.
Table 2. TI DSP H-Core Test Results

-- No Single Event Latchup, and
-- Total dose tolerance greater than 95 krad (Si).
To achieve the above performance and radiation
figures, the FP has been carefully designed to withstand
the radiation environments anticipated in space
applications. The radiation response of the processor
has been strengthened through the use of our H-Core
technology. In addition, the primary board functions
have been implemented using radiation-tolerant
Field-Programmable Gate Array (FPGA) circuits.
The major components of the FP are described below:
BSP-15 Processor: The radiation tests indicated that
the BSP-15 processor performs better under radiation
than the other processors tested. Hence, it is used on
the Proton 100k computer. As mentioned before, the
BSP-15 is based on a VLIW architecture and is capable
of withstanding a total dose of around 95 krad (Si).

3. Equator BSP-15 DSP Test: The custom board
provided the option of 3 interrupts, reset, and power
reset for the processor. Customized interrupt handlers
were used to acknowledge the reception of interrupts by
the processor. Table 3 shows the interrupt signals and
their success rates. A total of 26 SEFIs were observed
during the test.

ROMIO FPGA: ROMIO is one of the radiation-hardened
Actel 54SX antifuse FPGAs. It accommodates the H-Core
circuit, the UARTs, and the other interface circuits.
This FPGA is responsible for monitoring and mitigating
SEFIs and for communicating with the MSP and the IAU
ground station through two different UARTs.

Table 3. BSP-15 Test Results

EDAC FPGA: The function of the EDAC FPGA is to
monitor the SDRAM for errors. The EDAC used is a
single-error correcting, double-error detecting
(SEC-DED) Hamming code. Every word read from the
BSP-15 is checked and corrected (if necessary) before
writing it to the SDRAM. An additional SDRAM IC is
used by the EDAC FPGA to store the SEC-DED codes
with every word written to the SDRAM from the
BSP-15. Single, corrected errors are signaled by the
EDAC to be counted in the ROMIO, so that they can be
reported in the sub-system status on the down-link.
Uncorrectable errors cause an interrupt so that the
operations with those data or programs may be flushed
and restarted. When data is held over extended
durations, it is periodically read and rewritten back
into the memory (also called scrubbing) to clear any
errors that might have occurred since the datum was
last used by or as a program. The EDAC circuit is also
mapped onto an Actel 54SX series FPGA.
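
To illustrate how a SEC-DED Hamming code behaves, the sketch below implements an extended Hamming (8,4) code in Python. The word width and code layout of the actual EDAC FPGA are not stated in this paper; the 4-bit data word and the names `secded_encode` and `secded_decode` are assumptions chosen only to keep the example small.

```python
from typing import Optional, Tuple


def secded_encode(nibble: int) -> int:
    """Encode 4 data bits into an 8-bit extended Hamming codeword."""
    code = [0] * 8  # index 0: overall parity, 1..7: Hamming positions
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]  # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]  # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]  # parity over positions 4,5,6,7
    code[0] = sum(code[1:]) & 1            # makes the total parity even
    return sum(bit << i for i, bit in enumerate(code))


def secded_decode(word: int) -> Tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos        # XOR of the positions of all set bits
    overall = sum(code) & 1        # 1 if an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1        # single error; syndrome 0 is the parity bit
        status = "corrected"
    else:
        return None, "uncorrectable"  # even parity, nonzero syndrome: two errors
    data = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return data, status


if __name__ == "__main__":
    word = secded_encode(0b1011)
    print(secded_decode(word))               # clean read
    print(secded_decode(word ^ 0b00100000))  # one flipped bit
    print(secded_decode(word ^ 0b00100100))  # two flipped bits
```

Scrubbing, in these terms, amounts to periodically decoding each stored word and writing back the re-encoded result so that single-bit errors do not accumulate into uncorrectable double errors.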

Each of the 3 interrupts had an 11.5% success rate in
mitigating SEFIs. As in the previous cases, Reset had
100% coverage. H-Core was able to mitigate SEFIs
without powering down the board in all the test
sequences.
PROTON 100K DESIGN
As mentioned earlier, the Proton 100k, or Fusion
Processor (FP) (Figure 3), is built on TTMR and H-Core
technologies. Its primary goal is to manage the
imaging subsystem and to maintain the integrity of
received image data while processing it. The major
specifications of the FP are listed below:

PCI FPGA: The PCI FPGA provides the Fusion Processor
with the functional interface and control needed to
connect the SSB with the MSP. Like the ROMIO and
EDAC FPGAs, the PCI circuit is mapped onto an Actel
54SX series FPGA and hence is tolerant to radiation

-- Over 1200 MIPS performance,
-- Less than 1 x 10^-4 uncorrected SEUs per day,
-- SEFI mitigation using H-Core Chip,


Figure 3. RoadRunner Experiment
effects. The PCI FPGA is responsible for controlling
these operations:
1. SDRAM initialization.
2. PCI write access to SSB.
3. PCI read access from SSB.
4. MSP write access to SSB.
5. MSP read access from SSB.
6. RX processor write access to SSB.
Because the FP is tolerant to these radiation effects,
it processes the received data without corrupting it
and transmits it to the ground station.

Force RoadRunner satellite program. RoadRunner
is a 250 kg satellite being developed by the Air
Force Research Laboratory's (AFRL) Space Vehicles
Directorate under the Responsive Space Program
(RSP). Figure 3 shows the block diagram of the
RoadRunner Experiment system. The main
components of the RoadRunner system are: (1)
Integrated Avionics Unit (IAU), (2) Fusion Processor,
(3) Solid State Buffer (SSB), (4) Malleable Signal
Processor (MSP), and (5) Imager Board. The FP, the
MSP, the SSB, and the imager board are collectively
called the ROPE subsystem. The primary function of
the ROPE subsystem is to capture, process, and
transmit images during the mission when directed by
the ground station. The major components of the
RoadRunner system are briefly described below:

THE FLIGHT EXPERIMENT
As the results indicate, TTMR and H-Core technologies
can effectively mitigate SEUs and SEFIs
experienced by commercial microprocessors operating
in radiation environments. The application of these
technologies to COTS VLIW architecture processors
can provide an excellent combination of radiation
tolerance and high performance. The design of the
Proton 100k computer has been driven by the
application of these technologies and uses commercial
components to build a system capable of space
missions. This section describes the application of the
Proton 100k to an Air Force satellite system.

Integrated Avionics Unit (IAU): The ground station,
which oversees all the blocks of the satellite,
interfaces with the ROPE subsystem through the IAU.
The IAU communicates with the ROPE subsystem through
a high-speed UART and controls the operation of
ROPE by issuing various commands. The ROPE
subsystem uses the IAU to relay command responses as
well as health and status information to the ground
station in the form of telemetry messages.

The ROPE Subsystem
The first Proton 100k computer, also called the Fusion
Processor (FP), is currently being used in the US Air

Fusion Processor (FP): The FP is a version of the
Proton 100k computer being used as part of the ROPE
subsystem. It plays a pivotal role in the RoadRunner
experiment. The FP is the only communication
interface between the IAU and the ROPE subsystem.
The FP receives and processes the commands received
from the IAU and, in turn, may issue commands to
control the activities of other components of ROPE.
The FP is directly responsible for controlling the
actions of the MSP and the SSB depending on the
commands received from the IAU. In addition, it
indirectly controls the imager board and the camera via
the commands issued to the MSP. In particular, the FP
commands the MSP to capture and store images to the
SSB and to transfer the images from the SSB to the
ground. Finally, the FP also performs some image
processing tasks to manipulate and extract information
(for example, camera calibration data) from the
captured images.

describes the monitoring methods to be used by the FP
during the flight.
As shown in Figure 3, the ROMIO FPGA of the
FP/Proton 100k computer implements all the required
functions of the H-Core technology. The H-Core circuit
of the FP continuously monitors the operation of its
BSP-15 processor. The firmware executing on the
BSP-15 processor periodically interacts with the H-Core
circuit to indicate that it is operational. If the processor
suffers a SEFI, it will stop this periodic communication
with the H-Core circuit, prompting the H-Core circuit to
begin recovery procedures. In the case of the FP, after
detecting a SEFI event, the H-Core circuit asserts a set
of hardware interrupts in sequence to revive the CPU. If
the processor does not respond to any of the hardware
interrupts, the H-Core circuit asserts CPU reset. Finally,
the H-Core circuit can power-cycle the FP if it does not
recover from a SEFI through any of the previous
efforts. In all cases, after recovery from a SEFI, the FP
firmware records and reports each detection of a SEFI,
the signals that were asserted by the H-Core circuit in
response to the SEFI, and the signal that resulted in
recovery. This information is transmitted to the ground
station as part of the telemetry. As a result, a complete
record of the activities of the H-Core circuit will be
received as part of the telemetry during the RoadRunner
flight.
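
The escalating recovery sequence described above can be sketched as follows. This is a hedged illustration in Python, not the ROMIO logic itself: the signal names in `ESCALATION`, the choice of three interrupts (mirroring the BSP-15 test board), and the `SimulatedCpu` stand-in are all assumptions.

```python
from typing import Callable, List, Tuple

# Hypothetical escalation order, least to most severe.
ESCALATION = ["interrupt_1", "interrupt_2", "interrupt_3",
              "cpu_reset", "power_cycle"]


def recover_from_sefi(assert_signal: Callable[[str], None],
                      heartbeat_ok: Callable[[], bool]) -> Tuple[List[str], str]:
    """Assert signals in order until the processor's heartbeat resumes.

    Returns the list of signals asserted and the one that produced recovery,
    which is the information the firmware would report in telemetry.
    """
    asserted: List[str] = []
    for signal in ESCALATION:
        assert_signal(signal)
        asserted.append(signal)
        if heartbeat_ok():
            return asserted, signal
    raise RuntimeError("processor did not recover from SEFI")


class SimulatedCpu:
    """Toy stand-in for a hung processor that revives on one given signal."""

    def __init__(self, revives_on: str) -> None:
        self.revives_on = revives_on
        self.alive = False  # starts in the hung (SEFI) state

    def assert_signal(self, signal: str) -> None:
        if signal == self.revives_on:
            self.alive = True

    def heartbeat_ok(self) -> bool:
        return self.alive


if __name__ == "__main__":
    cpu = SimulatedCpu(revives_on="cpu_reset")
    print(recover_from_sefi(cpu.assert_signal, cpu.heartbeat_ok))
```

Trying the least disruptive signal first means that a SEFI which an interrupt can clear never costs a full reset, while the reset and power cycle remain available as fallbacks.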

Solid-State Buffer (SSB): The SSB is a 64-gigabit
solid-state data recorder board consisting of an array of
SDRAM chips. The SSB will be directly controlled by
the FP and will be primarily used to store the images
captured by the imager board. The images will be
transferred from the imager board to the SSB by the
MSP. In addition, the SSB will also be used to transfer
large sets of command data from the FP to the MSP.
Malleable Signal Processor (MSP): The MSP, built by PSI
Logic, Inc., interfaces with both the FP and the Imager
Board. It controls the operation of the imager board to
capture images through an on-board camera. The MSP may
compress/decompress the data received from the imager
board and transfer it to the SSB. Similarly, the MSP
can also read previously stored image data from the
SSB and transfer it to the ground via a high-bandwidth
communication link.

The operation and efficacy of the TTMR technique will
be monitored through a set of test loops executing on
the FP. A TTMR test loop will consist of several short
subroutines that will execute on the BSP-15 processor.
Each of these subroutines is designed to “activate” and
verify the operation of different sub-circuits of the
processor, including the ALUs, the memory controller,
the Flash ROM controller, the PCI controller, the DMA
controller, and the I/O bus. For example, a set of
arithmetic calculations will be used to activate the four
ALUs of the BSP-15 processor. It is
essential to use a collection of subroutines that utilize
different capabilities of the BSP-15 processor in order
to detect SEUs across the processor. This will help
validate the ability of the TTMR technique to correct
SEU-induced errors in all major sub-circuits of the
processor. The test loops will be executed on the FP
whenever it is idle, i.e., not processing any commands
from the IAU (Figure 3). Incoming commands from the
IAU will preempt any test loops in progress to ensure
that the TTMR test loops do not interfere with the
primary functions of the ROPE subsystem. The
individual subroutines will be implemented using the
TTMR technique and will record the number of times
the routines are executed, the number of SEUs detected
by the routines, and the number of SEUs corrected by
the routines. The results of the TTMR experiments

Imager Board: The imager board is responsible for
capturing images and transmitting them to the MSP.
The imager board is directly connected to an on-board
camera and controls the functions of the camera.
Proton-100k Radiation Experiments
The TTMR and H-Core methods have been tested
thoroughly by simulating the space environment on
the ground. The operation of the FP on the RoadRunner
flight will allow us to observe and validate the
operation of these technologies in a real space
environment. In addition to the primary functions of the
ROPE subsystem, the FP will also monitor the efficacy
of the TTMR and H-Core methods in detecting and
correcting radiation-induced errors. This section

performed by the FP will be reported periodically to the
ground as part of the telemetry during the RoadRunner
flight.
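
As a rough illustration of the bookkeeping that the TTMR test loops are described as performing, the Python sketch below counts executions, detected SEUs, and corrected SEUs, and stops as soon as a command is pending. The names `TtmrCounters` and `run_test_loop`, and the convention that each subroutine returns a `(detected, corrected)` pair, are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class TtmrCounters:
    executions: int = 0      # subroutines run to completion
    seus_detected: int = 0   # O/M mismatches seen
    seus_corrected: int = 0  # mismatches resolved by the T vote


def run_test_loop(subroutines: Iterable[Callable[[], Tuple[int, int]]],
                  command_pending: Callable[[], bool],
                  counters: TtmrCounters) -> bool:
    """Run one pass of the test loop; return False if preempted by a command."""
    for subroutine in subroutines:
        if command_pending():
            return False  # primary ROPE functions take priority
        detected, corrected = subroutine()
        counters.executions += 1
        counters.seus_detected += detected
        counters.seus_corrected += corrected
    return True


if __name__ == "__main__":
    counters = TtmrCounters()
    # Stand-in subroutines reporting (detected, corrected) counts.
    loop = [lambda: (0, 0), lambda: (1, 1), lambda: (2, 1)]
    finished = run_test_loop(loop, lambda: False, counters)
    print(finished, counters)
```

The counters are the values that would be packaged into telemetry; checking for a pending command before each subroutine keeps the preemption latency bounded by the length of one short subroutine.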

FPGAs have greater SEU susceptibility and hence are
prone to a larger number of SEUs.
Also, an SEU in the cells responsible for storing the
control data can lead to a SEFI. These SEFIs can lead to
interruption of service, which is undesirable in
most cases.
Specifically, Xilinx-based FPGAs suffer from the
following 3 SEE problems:
a. Data SEUs - Single event upset (SEU) of the data
being passed through the FPGA
b. Reconfiguration of the FPGA - SEU of the
reconfiguration area, causing part or all of the chip
to change function
c. FPGA SEFI - reconfiguration of the FPGA to a
point that the FPGA is “hung”

Xilinx has proposed many different solutions to
mitigate the effects of SEUs. These include the addition
of redundant logic and/or the use of other Virtex
features, such as TMR, readback and reconfiguration,
scrubbing, and majority and minority voter circuits.
TMR increases the power, area, and board
space of the device, along with the
number of required I/Os. The increase in I/Os is a
consequence of the way TMR is implemented in Xilinx
devices.

Figure 4. Proton 100k Single Board Computer
Figure 5 shows a picture of the upgraded Proton 100k
along with the Compact PCI (CPCI) card. This board is
to be used on the International Space Station (ISS) as a
medical equipment computer.

TTMR has been adapted for Xilinx FPGAs. An
embedded DSP system with TI’s TMS320C6713 DSP
and a Xilinx Virtex-II has been used to demonstrate
a working model of TTMR. The two redundant
modules of TTMR are mapped onto the FPGA and the
DSP is used as a voter. The voter is a TTMR’d C++
program running on the DSP. Because the program is
itself TTMR’d, it is tolerant to SEUs. If the results of
the modules do not match, they are recomputed and voted
on again to obtain the uncorrupted output.
TTMR has been shown to perform as proposed in the lab
by artificially introducing SEUs.
H-Core2 follows the same approach as H-Core. H-Core2
monitors Xilinx FPGAs for any of the three
SEFIs occurring in Xilinx Virtex devices. Power-on-Reset
(POR) SEFI, SelectMAP SEFI, and JTAG SEFI are the
three types of SEFIs (named after their sources of
trouble) occurring in Virtex. H-Core2 monitors for these
SEFIs and toggles the “PROG” pin to mitigate them.
H-Core2 also includes additional scrubbing circuitry that
can be used along with TTMR to improve SEU
tolerance.

Figure 5. ISS proton 100k with 3U CPCI
PORTING TTMR AND H-CORE2 FOR XILINX
VIRTEX FPGAs
The attractive feature of SRAM-based FPGAs is the
ability to reprogram them. This feature can be a major
issue under harsh radiation environments. The
reprogrammable SRAM cells in the FPGA are highly
sensitive to SEUs. Since almost all the functionality of
the FPGA is implemented using SRAM cells, more
“chip area” is susceptible to SEUs. Hence, Xilinx


CONCLUSIONS

This paper has shown that TTMR and H-Core perform
very well in mitigating radiation-induced effects in
commercial processors. We also conclude that the
Proton 100k is a powerful space
computer with several advantages over
state-of-the-art rad-hard computers. The application of
the Proton 100k to the RoadRunner experiment further
demonstrates its capabilities and its feasibility
for use in space applications.
REFERENCES
1. R. Koga et al., “Single Event Functional Interrupt
(SEFI) Sensitivity in Microcircuits,” Fourth European
Conference on Radiation and Its Effects on
Components and Systems (RADECS), Pages: 311-318,
September, 1997.
2. J. Howard and K. LaBel et al., “Total Dose and
Single Event Effects Testing of the Intel Pentium III (P3)
and AMD K7 Microprocessors,” IEEE Radiation
Effects Data Workshop, Pages: 38-47, July, 2001.
3. J. von Neumann, “Probabilistic logics and synthesis
of reliable organisms from unreliable components,” in
Automata Studies, Princeton, NJ, Princeton University
Press, 1956, Pages: 43-98.
4. N. Oh, “Software Implemented Hardware Fault
Tolerance,” Ph.D. Dissertation, Center for Reliable
Computing, Stanford University, December, 2000.
5. P. Shirvani, “Fault-Tolerant Computing for
Radiation Environments,” Ph.D. Dissertation, Center
for Reliable Computing, Stanford University, June,
2001.
6. D. R. Czajkowski et al., “Ultra Low-Power Space
Computer Leveraging Embedded SEU Mitigation,”
IEEE Aerospace Conference Proceedings, Vol. 5, Pages:
2315-2328, March 8-15, 2003.
7. G. M. Swift et al., “Single-event upset in the
PowerPC750 microprocessor,” IEEE Transactions on
Nuclear Science, Vol. 48, Issue: 6, Pages: 1822-1827,
December, 2001.
8. M. P. Pagey et al., “SEFI Mitigation Technique for
COTS Microprocessors: Demonstration Using Proton
Irradiation Experiments,” 2004 Military Aerospace
Programmable Logic Devices Conference, September,
2004.


