Non Radiation Hardened Microprocessors in Spaced Based Remote Sensing Systems by Estes, Robert F. et al.
 
 
The CALIPSO (Cloud-Aerosol Lidar and Infrared Pathfinder Satellite 
Observations) mission carries a comprehensive suite of active and passive sensors, 
including a 20 Hz, 230 mJ Nd:YAG lidar, a visible-wavelength Earth-looking camera and an 
imaging infrared radiometer. CALIPSO flies in formation with the Earth Observing System 
Post-Meridian (EOS PM) train, provides continuous, near-simultaneous measurements and 
has a planned 3-year mission life. CALIPSO was launched into a 98-degree sun-synchronous 
Earth orbit in April of 2006 to study clouds and aerosols, and it acquires over 5 gigabytes 
of data every 24 hours. Figure 1 shows the ground track of one CALIPSO orbit as well as 
high- and low-intensity South Atlantic Anomaly (SAA) outlines. CALIPSO passes through 
the SAA several times each day. 
 
 Space-based remote sensing systems that include multiple instruments and/or 
instruments such as lidar generate large volumes of data and require robust real-time 
hardware and software mechanisms and high-throughput processors. Due to onboard 
storage restrictions and telemetry downlink limitations, these systems must pre-process 
and reduce the data before sending it to the ground. This onboard processing and 
real-time requirement load may mean that newer, more powerful processors are needed even 
though acceptable radiation-hardened versions have not yet been released. CALIPSO’s 
single board computer payload controller processor is actually a set of four (4) voting 
non-radiation-hardened COTS PowerPC 603r processors built on a single-width VME card by 
General Dynamics Advanced Information Systems (GDAIS). 
 
 Significant radiation concerns for CALIPSO and other Low Earth Orbit (LEO) 
satellites include the South Atlantic Anomaly (SAA), the north and south poles and 
strong solar events. Over much of South America and extending into the South Atlantic 
Ocean (see figure 1) the Van Allen radiation belts dip to just 200-800 km, and spacecraft 
entering this area are subjected to high-energy protons and experience higher than normal 
Single Event Upset (SEU) and Single Event Latch-up (SEL) rates. Although the effect is 
less significant, spacecraft flying in the area around the poles experience similar upsets. 
Finally, powerful solar proton events in the range of 10MeV/10pfu to 100MeV/1pfu, as 
forecast and tracked by NOAA’s Space Environment Center in Colorado, can result 
in SEUs, SELs and permanent failures such as Single Event Gate Rupture (SEGR) in some 
technologies. Galactic Cosmic Rays (GCRs) are another source, especially for gate 
rupture. 
 
CALIPSO mitigates common radiation concerns in its data handling through the 
use of redundant processors, radiation-hardened Application Specific Integrated Circuits 
(ASICs), hardware-based Error Detection and Correction (EDAC), processor and memory 
scrubbing, redundant boot code and mirrored files. After presenting a system overview, 
this paper expands on each of these strategies. Where applicable, related on-orbit data 
collected since the CALIPSO initial boot on May 4, 2006 are noted.   
 
  
  
 
(Figure 1) South Atlantic Anomaly 
 
System Overview 
 
The CALIPSO Single Board Computer (SBC) is a VME single-slot General 
Dynamics Integrated Spacecraft Computer (GDISC) system. This board was chosen for 
several reasons, including processor performance and radiation tolerance. A functional 
diagram is shown in figure 2 and significant system characteristics are listed in table 1 
below. CALIPSO runs all four system processors, but at a reduced clock speed of 
160 MHz in order to support spacecraft platform power requirements. 
 
Processor             Main processor: PowerPC 603r (4), 240 MHz. 
                      CALIPSO runs at a reduced speed of 160 MHz 
Performance           480 peak MIPS at 240 MHz 
Non-volatile memory   16 MB Flash (EPROM), on-orbit programmable 
                      128 KB (2) EEPROM, storage for bootstrap 
                      code with SECDED; CALIPSO application 
                      supports verifying/updating on-orbit 
RAM system memory     64 MB SDRAM with Triple Error 
                      Correction/Quadruple Error Detection (TECQED) 
Interface memory      4 MB SDRAM with TECQED 
I/O                   MIL-STD-1553B 
                      Two RS-422 serial lines 
                      Three high-speed (10 Mbits/sec) serial lines 
 
(Table 1) CALIPSO Specific SBC Characteristics (Courtesy GDAIS) 
 
[Figure 2 block diagram. Blocks shown: fault-tolerant microprocessor array of four PowerPC 
603r processors with cache; Voter 1 and Voter 2 (V21 ASICs); memory controller (V21 ASIC); 
I/O controller (V20 ASIC); SDRAM array (128/64 MB); Flash EPROM (2 x 8 MB); EEPROM 
(2 x 128 KB); interface memory (4 MB RAM); RS-422 serial, 1553 and discretes interfaces; 
VME bus; PCI bus; CPU IOC; Ethernet adapter (EM only).] 
 
 
(Figure 2) SBC Functional Block Diagram (courtesy of GDAIS) 
 
CALIPSO application code, developed by Ball Aerospace in Boulder, includes 
more than 12 interrupt service routines and over 40 tasks, many of which run briefly 
and/or rarely. Several of the more processor-demanding CALIPSO flight software tasks 
are listed below in Table 2. The utilization values listed were acquired on-orbit while in 
nominal data acquisition mode. 
 
Task Name                              CPU % 
WFC_RECEIVE_TASK                        1.398 
IIR_RECEIVE_TASK                        0.709 
PMC_TARGET_RANGE_TASK                   2.080 
AD_SPPS_MON_TASK                        1.067 
LDR_LDP_PROCESS_532P_FRAME_TASK        22.097 
LDR_LDP_PROCESS_532S_FRAME_TASK        21.715 
LDR_LDP_PROCESS_1064_FRAME_TASK        18.363 
WFC_PROCESS_TASK                        0.151 
MC_MEM_SCRUB_TASK                       0.004 
… 
Idle Task                              29.725 

Total CPU Utilization                  70.28% 
(Table 2)  Significant CALIPSO Software Tasks 
 
 
 
Radiation Mitigation Strategies 
 
 
Redundant processors and radiation-hardened Application Specific Integrated Circuits 
 
As mentioned earlier, CALIPSO uses a General Dynamics VME Single Board 
Computer with four non-radiation-hardened COTS PowerPC 603r processors running in strict 
lockstep. Address, data, and control outputs to memory and I/O are majority voted. If one 
of the members has been upset and is not working correctly, a voting set of three still 
provides correct outputs. Processors that mis-compare are disabled until they can be 
reset. This design allows non-radiation-hardened processors to be used while still 
providing mitigation of cosmic ray or proton-induced upsets. As shown in figure 3 below, if 
a voting set of 4 loses a processor, it continues as a voting set of 3. If a voting set of 3 
loses a processor, it continues as a comparing set of 2. If a comparing set of 2 encounters 
a mis-compare, the computer will be reset and the software will restart. To ensure 
high-reliability operation, the voter and memory control logic are implemented in redundant 
radiation-hardened ASIC technology. The four-PowerPC voting design allows a reliable 
three-processor voting set to be maintained even if one of the PowerPCs permanently 
fails, thus increasing the SBC long-term reliability [1].   
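
The degradation sequence described above (4 voting, then 3 voting, then 2 comparing, then reset) can be sketched as a small state machine. This is an illustrative model only and not the GDAIS design: the class `VotingSet` and its method names are hypothetical, the real voting is performed in the radiation-hardened ASICs on every bus output, and treating a 2-against-2 split as a reset is an assumption based on the figure 3 labels.

```python
from collections import Counter


class VotingSet:
    """Toy model of a 4-processor lockstep voting set (hypothetical names)."""

    def __init__(self, n_cpus=4):
        self.active = set(range(n_cpus))   # processors currently voting
        self.disabled = set()              # mis-compared, awaiting resync
        self.resets = 0                    # full computer resets so far

    def vote(self, outputs):
        """Vote on one bus cycle; `outputs` maps CPU id -> output value."""
        values = [outputs[cpu] for cpu in sorted(self.active)]
        if len(self.active) == 2:
            # Comparing set of 2: a disagreement cannot be arbitrated.
            if values[0] != values[1]:
                self._reset()
                return None
            return values[0]
        winner, count = Counter(values).most_common(1)[0]
        if count <= len(values) // 2:
            # No strict majority (assumed to force a reset and reboot).
            self._reset()
            return None
        for cpu in list(self.active):
            if outputs[cpu] != winner:
                self.active.remove(cpu)    # disable the dissenting processor
                self.disabled.add(cpu)
        return winner

    def resync(self):
        """Bring disabled processors back into the voting set."""
        self.active |= self.disabled
        self.disabled.clear()

    def _reset(self):
        """Model a computer reset: all processors restart together."""
        self.resets += 1
        self.active |= self.disabled
        self.disabled.clear()
```

Calling `resync()` before a second processor is lost returns the model to the full voting set, which is the behavior the flight software polling described later is meant to achieve.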
 
 
 
[Figure 3 state diagram. States: 4 CPUs Agree (No CPUs Failed); 3 CPUs Agree (One CPU 
Failed); 2 CPUs Agree (Two CPUs Failed); Failed Computer. Transition labels: Disagree; 
2 Disagree; Resync; Resync Initiate; Reset and Reboot.] 
 
 
 
 
 
(Figure 3) SBC Processor Fault Detection and Reaction (courtesy of GDAIS) 
 
 
CALIPSO flight code uses a polling technique to detect and, as needed, reset any 
processors that mis-compare. This process significantly reduces, and can eliminate, system 
resets due to mis-compares. This technique, called “Processor Error Scrubbing”, is 
discussed in more detail in the “Processor and Memory Scrubbing” section below. 
  
 
Hardware-based Error Detection and Correction (EDAC) 
 
 The GD SBC implements hardware-based EDAC to ensure that each task 
processes correct data. EDAC runs continuously on all in-use SDRAM and on the boot 
EEPROM during system initialization. Two additional byte-wide columns of RAM 
memory chips are implemented for each 64-bit word. If a memory device fails, this spare 
memory can be used to replace the failed device, enhancing long-term reliability. This 
mechanism can even circumvent shorts on data lines [1]. Spare RAM, i.e. memory not 
currently being used, is not covered by EDAC. During operations, errors are corrected as 
the data are read but are not corrected in the memory array, i.e. the corrected value is 
not written back to the device. Therefore, bit errors in system and interface memory can 
build up over time. Permanent corrections occur only as part of the CALIPSO flight 
software Memory Error Scrubbing (MES) task, discussed in more detail in the “Processor 
and Memory Scrubbing” section below. In normal memory mode the SBC supports Single-bit 
Error Correction/Double-bit Error Detection (SECDED), but CALIPSO uses the optional 
mirrored memory mode, in which the two 64MB banks of RAM store the same image, 
thus supporting Triple-bit Error Correction and Quadruple-bit Error Detection 
(TECQED). The EDAC processing will therefore correct all single-, double-, and triple-bit 
errors, and will detect all quadruple errors within a single byte. 
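
To make the SECDED idea concrete, the following is a minimal textbook sketch using an extended Hamming code on a 4-bit value. It is not the SBC's EDAC code: the actual code, word layout and the mirrored-mode TECQED scheme are not described in this paper, and the function names `encode` and `decode` are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0 holds the overall parity bit
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # parity over positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the total parity even
    return code


def decode(code):
    """Return (nibble, status): status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of the positions of all set bits
    parity_ok = sum(code) % 2 == 0
    if syndrome == 0 and parity_ok:
        status = "ok"
    elif not parity_ok:
        # Odd parity: assume one flipped bit, located by the syndrome
        # (a syndrome of 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped, not locatable.
        return None, "uncorrectable"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status
```

As in the hardware described above, a decoder of this kind returns corrected data to the reader without repairing the stored word, which is why a separate scrubbing pass is needed to stop single-bit errors from pairing up into uncorrectable ones.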
 
 
 
 
Processor and Memory Scrubbing 
  
  
The sole purpose of the Memory Error Scrubbing (MES) task is to keep single-bit 
errors from accumulating to the point where uncorrectable multi-bit errors occur. The 
MES task uses DMA and thus the processor does not have to dedicate resources to this 
task. All of the used system and interface memory is scrubbed every 10 minutes 
(selectable via a software table value). Spare system memory is NOT scrubbed. The MES 
task for system memory scrubbing is illustrated in figure 4; interface memory scrubbing 
is similar. The 128MB System Memory is partitioned into two 64MB banks to support 
mirrored mode. For scrubbing, the 64MB banks are broken up into 128 512KB blocks. 
The MES task scrubs both 64MB banks together. Every 10 minutes this DMA process 
starts and, beginning at the top of memory, pulls every 32-bit word across the bus, 
stopping at the bottom of each 512KB block. When the bottom of a block is reached, the 
process checks an error status register and, if it is set, generates an interrupt. The 
Interrupt Service Routine (ISR) for this interrupt corrects the data at the memory address 
noted in the error register. Also, an entry is placed in the Mission Support Software (MSS) 
error log indicating that the MES task corrected an error [2]. CALIPSO flight code reads 
this log at a 1Hz rate and stores the data for transmission to the ground for analysis. If 
an error was detected/corrected, the MES task will re-scrub the same block starting at the 
address immediately following the address that was just corrected. The current version of 
CALIPSO flight code will repeat this “block-retry” process until 12 errors are 
detected/corrected in a block or until no errors are generated. Once the limit is reached or 
no errors are found, the MES task will move on to the next block in the 128-block series. 
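
The block-retry behavior described above can be sketched as follows. This is a simplified software model, not the flight code: memory is represented only by a set of addresses that currently hold a correctable error, the function names are hypothetical, and the model assumes the error register reports the lowest erroring address in the swept range, which the text does not specify.

```python
BLOCK_SIZE = 512 * 1024      # bytes per scrub block (from the text)
NUM_BLOCKS = 128             # 64 MB / 512 KB
WORD_SIZE = 4                # 32-bit words are pulled across the bus
MAX_FIXES_PER_BLOCK = 12     # block-retry limit in the current flight code


def scrub_block(bad_addresses, block_start, block_end, error_log):
    """Scrub one block; `bad_addresses` is a set of addresses with bit errors."""
    fixes = 0
    start = block_start
    while fixes < MAX_FIXES_PER_BLOCK:
        # Model the DMA sweep of [start, block_end) and the latched error
        # address (assumed here to be the lowest bad address in range).
        hits = [a for a in bad_addresses if start <= a < block_end]
        if not hits:
            break                      # clean sweep: move on to the next block
        addr = min(hits)
        bad_addresses.discard(addr)    # ISR writes the corrected word back
        error_log.append(addr)         # entry for the error log
        fixes += 1
        start = addr + WORD_SIZE       # resume just past the corrected word
    return fixes


def scrub_pass(bad_addresses, error_log):
    """One full scrub pass over all blocks of a 64 MB image."""
    total = 0
    for block in range(NUM_BLOCKS):
        base = block * BLOCK_SIZE
        total += scrub_block(bad_addresses, base, base + BLOCK_SIZE, error_log)
    return total
```

In this model a block holding more than 12 errors is only partly repaired in one pass, and the remainder is picked up on the next pass 10 minutes later.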
 
(Figure 4) CALIPSO System Memory Scrubbing Diagram 
 
 
                             Total SEUs in Time Interval 
                             1 SDRAM   2 SDRAMs   20 SDRAMs   22 SDRAMs 
Worst (peak) minute          0.057     0.11       1.14        1.254 
Worst hour                   0.95      1.89       18.94       20.834 
Worst 4.6 hours              4.36      8.71       87.1        95.81 
Worst 24 hours               22.7      45.5       454.6       500.06 
Quiet (peak) minute          0.005     0.010      0.101       0.1111 
Quiet hour (no SAA)          0.013     0.027      0.267       0.2937 
Quiet 4.6 hours (no SAA)     0.06      0.123      1.227       1.35 
Quiet 24 hours (no SAA)      0.26      0.520      5.200       5.72 
(Table 3) CALIPSO Pre-Launch SEU Rate Predictions 
  
  
(Figure 5) Memory Error Scrub Geo-location Map 
 
The locations of all system memory errors and processor mis-compares that the 
CALIPSO scrubbing software has fixed to date are shown in figure 5. The asterisks in the 
plot represent the location of CALIPSO when each error was fixed. Because flight software 
scrubs and fixes all memory errors every ten minutes, the actual location of CALIPSO 
when the error occurred is not known; the line trailing each asterisk therefore represents 
10 minutes of travel. 
 
 
CALIPSO pre-launch predicted orbital average SEU rate:  
(5 minutes * 1 peak * 0.1111 SEUs/min) + (94 minutes * 0.0049 SEUs/min) = 1.015 
SEUs/orbit or 0.0102/minute 
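
The orbital average above can be reproduced with a few lines of arithmetic. The two rates appear to come from the 22-SDRAM column of Table 3 (the quiet peak minute, and the quiet hour divided by 60); that reading, and the constant names, are assumptions made for this sketch.

```python
PEAK_MINUTES = 5             # minutes per orbit spent at the peak rate
QUIET_MINUTES = 94           # remaining minutes of the orbit
PEAK_RATE = 0.1111           # SEUs/min, quiet (peak) minute, 22 SDRAMs
QUIET_RATE = 0.0049          # SEUs/min, roughly 0.2937 per quiet hour / 60

seus_per_orbit = PEAK_MINUTES * PEAK_RATE + QUIET_MINUTES * QUIET_RATE
seus_per_minute = seus_per_orbit / (PEAK_MINUTES + QUIET_MINUTES)

print(f"{seus_per_orbit:.2f} SEUs/orbit, {seus_per_minute:.3f} SEUs/min")
```

The result is about 1.02 SEUs per orbit, or about 0.010 per minute, in line with the rounded values quoted above.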
 
 
 
The CALIPSO SBC currently uses 22 SDRAMs, including 20 for system memory and 2 
for interface memory. Table 3 above shows the pre-launch SEU predictions based on 
proton testing at GD, and table 4 below shows the MES SEU data acquired from the time 
CALIPSO was first powered on May 4, 2006 through July 29, 2006. 
 
                     Size    SEU Count    % of Total Scrub Events 
System Memory        64MB    461          93 
Interface Memory     4MB     32           7 
Totals                       493          100% 
(Table 4) CALIPSO SEU Data from the MES Error Log 
 
 
As of July 29, 2006 CALIPSO had been running for 87 days, and at 16 orbits per 
day the predictions indicate that the total SEU count should be 87*16*1.015 or 1412. 
CALIPSO is currently experiencing an SEU rate of approximately 0.345/orbit or 0.0034 
per minute, well below predictions. Based on the pre-launch SEU predictions, CALIPSO 
flight code was configured such that a 60K SEU scrubbing margin would be supported. 
 
 
12 SEUs (per block) * 512 (blocks) / 10 (minute scrub period) / 0.0102 = ~ 60K  
 
With the current SEU rate CALIPSO’s scrubbing margin is: 
12 SEUs (per block) * 512 (blocks) / 10 (minute scrub period) / 0.0035 = ~175K 
 
 It is expected that CALIPSO will experience this large margin for its entire 
planned 3 year life. 
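
The margin figures above are the ratio of scrub capacity (corrections per minute) to the SEU arrival rate (SEUs per minute). The sketch below reproduces them using the values exactly as they appear in the two formulas, including the block count of 512; the function and constant names are hypothetical.

```python
FIXES_PER_BLOCK = 12         # block-retry limit per scrub pass
BLOCKS = 512                 # block count as written in the margin formulas
SCRUB_PERIOD_MIN = 10        # minutes per full scrub pass


def scrub_margin(seu_rate_per_min):
    """Scrub capacity (fixes/min) divided by the SEU rate (SEUs/min)."""
    capacity_per_min = FIXES_PER_BLOCK * BLOCKS / SCRUB_PERIOD_MIN
    return capacity_per_min / seu_rate_per_min


predicted_margin = scrub_margin(0.0102)   # pre-launch predicted rate
observed_margin = scrub_margin(0.0035)    # on-orbit rate used in the text

# Observed rate per orbit from Table 4: 493 SEUs in 87 days at 16 orbits/day.
observed_per_orbit = 493 / (87 * 16)

print(round(predicted_margin), round(observed_margin))
print(f"{observed_per_orbit:.2f} SEUs/orbit observed")
```

The two margins come out near 60K and 175K, and the observed rate near 0.35 SEUs per orbit, roughly a third of the pre-launch prediction.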
 
The purpose of Processor Error Scrubbing (PES) is to prevent multiple CPUs 
from sitting in a disabled state. As noted earlier, processors that suffer radiation-induced 
upsets and do not vote with the majority are disabled until explicitly resynchronized. 
Multiple processors in this disabled state will lead to a system reset. CALIPSO flight 
software polls the SBC “re-sync pending” register bit at a rate of 1Hz and, if the bit is 
set, initiates a processor resynchronization. Per GD engineers, “re-syncing” time is 
approximately 1 ms. The pre-launch predictions indicated that CALIPSO may see 3 
processor mis-compares every week, i.e. every 168 hours. Table 5 shows the on-orbit 
mis-compare data acquired over the 87 days between May 4, 2006 and July 29, 2006.  
 
 
 
 
 
 
 
 
(Table 5) CALIPSO Processor Mis-Compare Data 
 
Processor Number    Mis-Compare Count    % of Total 
Processor (0)       2                    22 
Processor (1)       3                    33 
Processor (2)       3                    33 
Processor (3)       1                    11 
Total               9                    100 
 
 
Based on the data to date, CALIPSO is experiencing a processor mis-compare rate of 
approximately 0.726 per week, well below the prediction. With this relatively low 
mis-compare rate, the CALIPSO re-sync polling rate of 1Hz is more than adequate to prevent 
most if not all system resets due to mis-compared processors.  
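
A minimal sketch of one polling step and of the rate arithmetic follows. The callables passed to `poll_step` stand in for the SBC register read and the resynchronization call; they are hypothetical placeholders and not the GD API. The counts are those reported in Table 5.

```python
def poll_step(resync_pending, initiate_resync):
    """One 1 Hz poll: start a resync if the 're-sync pending' bit is set."""
    if resync_pending():
        initiate_resync()
        return True
    return False


MISCOMPARES = {0: 2, 1: 3, 2: 3, 3: 1}   # per-processor counts (Table 5)
DAYS_ON_ORBIT = 87
PREDICTED_PER_WEEK = 3.0

total = sum(MISCOMPARES.values())
observed_per_week = total / (DAYS_ON_ORBIT / 7)
fraction_of_prediction = observed_per_week / PREDICTED_PER_WEEK
shares = {cpu: round(100 * n / total) for cpu, n in MISCOMPARES.items()}

print(f"{observed_per_week:.2f} mis-compares/week")
print(f"{fraction_of_prediction:.0%} of the predicted rate")
```

With 9 mis-compares in 87 days the observed rate works out to about 0.72 per week, roughly a quarter of the predicted 3 per week. At that rate a disabled processor waits at most about one second for the next poll, so a second upset arriving in that window is unlikely.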
 
 
 
 
 
Redundant Boot Code 
   
 
The non-volatile EEPROM contains the bootstrap code, which is used 
upon power-up to initialize, configure, and verify the SBC hardware. Two 128KB 
EEPROM devices store identical boot code images. Power is applied to these devices 
only when needed, so these chips are powered off the majority of the time. A 
soft reset will automatically result in a switch to the redundant EEPROM. During start-up 
EDAC is performed on the active boot device. All single-bit errors are corrected, while 
multi-bit errors may result in a watchdog timeout and subsequent soft reset. This Single 
Error Correction/Double Error Detection (SECDED) feature requires an extra byte of 
check data for each 4-byte address, thus the 68KB boot image is 80KB when this 
SECDED information is added. The CALIPSO software team decided to implement 
on-orbit verification and, if necessary, update of the boot code. While on-orbit the boot 
images are routinely dumped to the ground and verified; if errors are 
observed the original image can be rewritten. If the device itself begins to fail, a new 
image can be built that bypasses failed memory addresses. As an operations note, loading 
a new boot image of approximately 80KB requires 15 minutes of spacecraft contact time, 
or two nominal contacts. The CALIPSO software teams at NASA Langley and Ball 
Aerospace have verified that they can rebuild a valid boot image from source code. As a 
developer's note, the GD Refresh Boot Memory (RBM) API was used to support rewriting 
EEPROM. As of 29 July, 2006 the onboard EEPROM devices have been dumped and 
examined three (3) times by the operations group at Langley and no errors have been 
identified.  
 
  
 
Mirrored files 
 
 
Application files are stored on redundant 8MB EPROM devices which, like the 
boot devices, are powered only when being accessed, for radiation reasons. These files are 
checked by operations staff on a regular basis via payload command and rewritten as 
necessary. Certain executable image files are mirrored, i.e. stored on both devices for 
added safety. CALIPSO maintains 2 operational images onboard, one on each device, 
and one “maintenance” image. This maintenance image is expected to be used only when 
neither of the operational images will boot, and it is the only file that is mirrored on 
both EPROM devices. To date no errors have been detected.   
 
 
 Conclusions 
 
Space-based remote sensing systems that include multiple instruments and/or 
instruments such as lidar generate large volumes of data and require robust real-time 
hardware and software mechanisms and high throughput processors. Due to onboard 
storage restrictions and telemetry downlink limitations these systems must pre-process 
and reduce the data before sending it to the ground. This onboard processing and real-
time requirement load may mean that newer more powerful processors are needed even 
though acceptable radiation-hardened versions have not yet been released.  
 
 Use of non-radiation-hardened systems requires that robust mitigation strategies 
be developed and employed. CALIPSO utilizes several mitigation techniques, including 
Error Detection and Correction (EDAC), memory and processor scrubbing, and device, file 
and processor redundancy. CALIPSO demonstrates that, with the right mix of software and 
hardware, COTS systems can be used effectively and efficiently in LEO.  
 
Acknowledgements:  
The authors wish to thank Kathy Powell of SAIC for data processing and plotting 
assistance; CALIPSO Technical Manager Carl Weimer and BATC Flight Software 
Development Lead Mike Wallner for their technical insight and assistance; NASA 
Langley Electronics Systems Engineer Dave Rosenbaum for his work on system rates 
and margins; and Doyle Lahti of General Dynamics for his work on predicted processor 
mis-compare rates. 
 
 
 
References: 
[1] General Dynamics Hardware Reference Manual, version 13233361vA 
 
[2] General Dynamics Integrated Spacecraft (ISC) Software Programmers Guide 
(SPG), version 13232438v7 
 
[3] M.N. Lovellette, K.S. Wood, D.L. Wood, J.H. Beall, P.P. Shirvani, N. Oh and E.J. 
McCluskey, “Strategies for Fault-Tolerant, Space-Based Computing: Lessons Learned 
from the ARGOS Testbed” 
 
[4] Doyle Lahti, Gary Grisbeck, and Phil Bolton, “ISC (Integrated Spacecraft Computer) 
Case Study of a Proven, Viable Approach to Using COTS in Spaceborne Computer 
Systems”, 14th Annual USU Conference on Small Satellites, pp IV-4.1-8, 2000. 
