Method and system for environmentally adaptive fault tolerant computing by Jeremy, Ramos et al.
(12) United States Patent
Copenhaver et al.
(54) METHOD AND SYSTEM FOR
ENVIRONMENTALLY ADAPTIVE FAULT
TOLERANT COMPUTING
(75) Inventors: Jason L. Copenhaver, Sarasota, FL
(US); Ramos Jeremy, Clearwater, FL
(US); Jeffrey M. Wolfe, Parrish, FL
(US); Dean Brenner, Largo, FL (US)
(73) Assignee: Honeywell International Inc.,
Morristown, NJ (US)
(*) Notice: Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
U.S.C. 154(b) by 524 days.
(21) Appl. No.: 11/202,467
(22) Filed: Aug. 12, 2005
(65) Prior Publication Data
US 2007/0022318 A1 Jan. 25, 2007
Related U.S. Application Data
(60) Provisional application No. 60/620,047, filed on Oct.
19, 2004.
(51) Int. Cl.
G06F 11/00	 (2006.01)
(52) U.S. Cl. 714/47; 714/10
(58) Field of Classification Search: None
See application file for complete search history.
(10) Patent No.: US 7,840,852 B2
(45) Date of Patent: Nov. 23, 2010
(56) References Cited
U.S. PATENT DOCUMENTS
4,970,644 A * 11/1990 Berneking et al. 702/6
5,022,027 A * 6/1991 Rosario 714/15
5,323,337 A * 6/1994 Wilson et al. 702/73
5,487,149 A * 1/1996 Sung 714/10
6,738,930 B1 * 5/2004 Medin et al. 714/30
7,080,290 B2 * 7/2006 James et al. 714/47
7,237,023 B2 * 6/2007 Menard et al. 709/224
2002/0046365 A1 * 4/2002 Avizienis 714/43
2004/0210800 A1 * 10/2004 Vecoven et al. 714/47
2005/0005203 A1 * 1/2005 Czajkowski 714/47
2007/0022318 A1 * 1/2007 Copenhaver et al. 714/11
OTHER PUBLICATIONS
Donohoe et al. `Adaptive Computing for space'. IEEE 1999.
Ramos et al. `Environmentally Adaptive Fault Tolerant Computing'. IEEE 2004.
Broderick et al. `Error Detection for Adaptive Computing Architectures in Spacecraft Applications'. IEEE 2001.*
Brebner et al. `Reconfigurable computing in Remote and Harsh Environment'. 1999.*
(Continued)
Primary Examiner: Yolanda L. Wilson
(74) Attorney, Agent, or Firm: Fogg & Powers LLC
(57)	 ABSTRACT
A method and system for adapting fault tolerant computing.
The method includes the steps of measuring an environmental
condition representative of an environment. An on-board pro-
cessing system's sensitivity to the measured environmental
condition is measured. It is determined whether to reconfig-
ure a fault tolerance of the on-board processing system based
in part on the measured environmental condition. The fault
tolerance of the on-board processing system may be recon-
figured based in part on the measured environmental condi-
tion.
22 Claims, 9 Drawing Sheets
[Representative drawing (labeled Figure 9) not reproduced. Recoverable labels: EAFTC controller 230; spacecraft ephemeris; SEU alarm sensor and flux measurements; history; deployment plan; health monitor; alert level generator; deployment generator; process deployment and configuration controllers; FPGA configuration controller; onboard target computer with CPU and FPGA; direction of data flow.]
US 7,840,852 B2
OTHER PUBLICATIONS
Gaisler Research. `Suitability of reprogrammable FPGAs in space
applications'. Sep. 2002. microelectronics.esa.int/techno/
fpga00201-0-4.pdf. *
Beahan, John et al., "Detailed Radiation Fault Modeling of the
Remote Exploration and Experimentation (REE) First Generation
Testbed Architecture," IEEE Aerospace Conference Proceedings
(2000) pp. 279-281.
Kalbarczyk, Zbigniew et al., "Application Fault Tolerance with
Armor Middleware," IEEE Internet Computing (Mar.-Apr. 2005) pp.
24-33.
Kars, Daniel S., "Computer 2003.pdf," Keywords: NASA, robot,
space exploration, computer.
Kim, K.H. (Kane), "Middleware of Real-Time Object Based Fault-
Tolerant Distributed Computing systems: Issues and Some
Approaches," Pacific Rim International Symposium on Dependable
Computing (2001) pp. 3-8.
Kim, K.H. (Kane), "A Middleware Architecture for Real-Time
Object-Oriented Adaptive Fault Tolerance Support," Paper Presen-
tation.
Lala, Jaynarayan H. et al., "A Dependability Architecture Framework
for Remote Exploration & Experimentation Computers" (1999).
Madeira, Henrique et al., "Experimental Evaluation of a COTS System for Space Applications," Proceedings of the International Conference on Dependable Systems and Networks (2002) pp. 325-330.
Sengupta, R. et al., "Software Fault Tolerance for Low-to-Moderate
Radiation Environments," Astronomical Data Analysis Software and
Systems X—ASP Conference Series, vol. 238 (2001) pp. 257-260.
Some, Raphael R., "High Performance Computing in Dependable Space Systems," Proceedings IEEE Pacific Rim International Symposium on Dependable Computing (Mar. 2004) p. 335.
Some, Raphael R. et al., "REE: a COTS-Based Fault Tolerant Parallel
Processing Supercomputer for Spacecraft Onboard Scientific Data
Analysis," AIAA/IEEE Digital Avionics Systems
Conference Proceedings (Oct. 1999) pp. 7.13.3-1-7.13.3.12.
Unknown Author, "Application Fault Tolerance with Armor
Middleware" PowerPoint Presentation, pp. 1-50.
Whisnant, K. et al., "An experimental Evaluation of the REE SIFT
Environment for Spaceborne Applications," Proceedings of the International Conference on Dependable Systems and Networks (2002)
pp. 585-594.
Whisnant, K. et al., "A System Model for Dynamically
Reconfigurable Software," IBM Systems Journal, vol. 42, No. 1
(2003) pp. 45-59.
Whisnant, Keith et al., "The Effects of an ARMOR-Based SIFT
Environment on the Performance and Dependability of User Appli-
cations," IEEE Transactions of Software Engineering, vol. 30, No. 4
(Apr. 2004) pp. 1-21.
* cited by examiner
[Drawing Sheets 1-9 (FIGS. 1-9) are not reproduced in this text. Recoverable labels: FIGURE 2, RHPPC SBC system controllers #1 and #2, APC data processors #1-#4, switch fabrics A and B, 28V power, spacecraft interface A/B, and SEU alarm/environmental sensors A/B. FIGURE 5, proton, ion, and COTS proton channels, each with a scintillator and photo detector producing threshold and output signals; alarm analog/digital electronics with control and data; an SSM controller FPGA; and a cPCI connector. FIGURE 6, EAFTC controller, FT system controller, FT system and application nodes, control and communication messaging middleware, PCI messaging middleware, RP middleware, system software components on Linux and VxWorks, an SEU sensor device, and a RIO switch device. Sheets 7-8 (FIGURE 8 label), hosted applications; monitoring services API (system capability, application, and RP component monitoring); schedule, data integrity, and process group services; system capability management; application services; reliable platform API; cluster services; local resource monitoring; application management; local services (scheduler, networking, OS services); and native hardware, operating system, and vendor device drivers. Sheet 9, a chart in MIPS per Watt.]
METHOD AND SYSTEM FOR ENVIRONMENTALLY ADAPTIVE FAULT TOLERANT COMPUTING
GOVERNMENT RIGHTS
The United States Government may have acquired certain rights in this invention pursuant to Contract No. NM0710209 awarded by NASA.
BACKGROUND
I. Field of the Invention
The present invention is directed to mitigating radiation induced faults. More particularly, the present invention is directed to a method and/or system for handling an inherent susceptibility of Commercial-Off-The-Shelf ("COTS") components to Single Event Upsets ("SEUs"). The invention is particularly useful in providing real time environmental sensing, utilizing a COTS based computer architecture that supports adaptable configuration levels of fault tolerance while also increasing performance and efficiency and maintaining reliable operation. However, aspects of the invention may be equally applicable in other scenarios as well.
II. Description of Related Art
Science and defense missions alike have increasing demands for data returns from their space-borne assets. In recent times, the capability of the instruments deployed in space has increased. Such an increase has been discussed, for example, in the following references, which are herein entirely incorporated by reference and to which the reader is directed for further information: "An Overview of Earth Science Enterprise," NASA Goddard Space Flight Center, FS-2002-3-040-GSFC, March 2002; Wallace M. Porter and Harry T. Enmark, "A System Overview of The Airborne Visible/Infrared Imaging Spectrometer (AVIRIS)," JPL, Pasadena, Calif.; and H. L. Huang, "Data Compression of High-spectral Resolution Measurements," Satellite Direct Readout Conference for the Americas, December 2002.
The typical approach of gathering, compressing, and transmitting data no longer appears sustainable, because it is difficult to transmit a vast amount of data via the available downlink channels in a reasonable period of time. One proposed solution is to reduce demand on the downlink by moving processing away from earth and onto the space-borne asset.
However, this approach has certain limitations. It is hampered by the limited capabilities of conventional on-board processors, and the cost of developing radiation hardened high-performance electronics can be prohibitive. Such issues are discussed in the references J. Marshall and R. Berger, "A Processor Solution for the Second Century of Powered Space Flight," Digital Avionics Systems Conferences, 2000. Proceedings. DASC. The 19th, Volume 2, 7-13 Oct. 2000, Pages: 8.A.21-8.A.28, and Gary R. Brown, "Radiation Hardened PowerPC 603e™ Based Single Board Computer," 20th Digital Avionics Systems, 2001, October 2001, herein entirely incorporated by reference and to which the reader is directed for further information.
Based in part on these perceived concerns, the relevant industry has considered the use of COTS components. Such considerations are described generally in the reference E. R. Prado et al., "A Standard Approach to Spaceborne Payload Data Processing," IEEE Aerospace Conference, March 2001, herein entirely incorporated by reference and to which the reader is directed for further information. Furthermore, the more recent adoption of silicon-on-insulator ("SOI") technology by COTS integrated circuit foundries has also resulted in devices with moderate space radiation tolerance. See, e.g., the references F. Irom et al., "Single-Event Upset in Evolving Commercial Silicon-on-Insulator Microprocessor Technologies," Nuclear and Space Radiation Effects Conference 2003, and Xilinx Corporation, "QPro Virtex 2.5V Radiation Hardened FPGA," Xilinx Web site http://www.xilinx.com/, November 2001, herein entirely incorporated by reference and to which the reader is directed for further information.
Despite such progress, COTS components continue to be highly susceptible to SEUs. One popular approach for mitigating such SEUs is to employ fixed component level redundancy. See, e.g., Daniel P. Siewiorek and Robert S. Swarz, Reliable Computer Systems Design and Evaluation, 3rd edition, MA: AK Peters Ltd., 1998, herein entirely incorporated by reference and to which the reader is directed for further information. However, fixed component level redundancy has the disadvantages of low efficiency and unrealized system capacity.
Certain conventional onboard processing computers consist mostly of radiation hardened components based on COTS equivalents. Though COTS compatibility offers certain perceived benefits, including adoption of commercial software, large amounts of Non-Recurring Engineering ("NRE") are typically required for an initial silicon implementation. Additionally, radiation hardened components often lag their commercial counterparts in overall performance and capability by at least 1 to 2 orders of magnitude. A number of factors contribute to this deficiency. One such factor is that radiation-hardening techniques for microelectronics require the use of fixed transistor or gate level redundancy, and this additional logic increases the power required to perform the same unit of computation.
One approach toward improvement is the use of true COTS microprocessors and Field Programmable Gate Arrays ("FPGAs"). Such an approach typically avoids the high cost and long development time associated with radiation hardened equivalents. However, true COTS devices are typically quite susceptible to SEUs. One popular SEU mitigation approach is to use component level N-module redundancy. However, N-module redundancy often results in low efficiency and low capacity because its overhead often approaches 2/3 or more; in triple modular redundancy, for example, two of every three modules perform redundant copies of the same work.
Furthermore, the level of redundancy is fixed and is often unnecessary. To overcome the deficiencies of fixed redundancy, two characteristics of space missions may be exploited: first, the variability of the space environment, and second, task level criticality. Most missions have a mix of processes with varying criticality, and this characteristic of mission processing can be exploited to increase a system's efficiency by applying redundancy at the task level. Furthermore, because the space environment itself varies, the necessary redundancy depends on time and orbital position.
There is, therefore, a general need for a method and/or system for mitigating radiation induced faults ("SEUs"). There is also a general need for a method and system that can utilize lower cost COTS components in space that exhibit acceptable overall Total Ionizing Dose ("TID") and latch-up characteristics but are still susceptible to SEUs. A further need exists for a system and/or method that facilitates the use of COTS components in SEU abundant environments while also maintaining adequate levels of system efficiency and capacity.
There is a further need for such systems and methods of
accomplishing such adequate levels of system efficiency and
capacity by adaptively configuring a level of fault tolerance in
a system as mandated by a mission environment and/or a
mission application. Consequently, there is a general need for
real time environmental sensing, utilizing a COTS based computer architecture that supports adaptable configuration levels of fault tolerance and optimizes performance and efficiency while maintaining reliable operation.
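The background's point about fixed N-module redundancy can be made concrete with a little arithmetic. In N-module redundancy, N - 1 of every N copies do redundant work, so the overhead is (N - 1)/N, which is 2/3 for triple modular redundancy; applying redundancy only to critical tasks lowers that figure. The Python sketch below computes both quantities; the function names and the task mix in the usage example are hypothetical and are not taken from the patent.

```python
from fractions import Fraction


def nmr_overhead(n: int) -> Fraction:
    """Share of executed work that is redundant under fixed N-module redundancy."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return Fraction(n - 1, n)


def task_level_overhead(tasks: list[tuple[int, int]]) -> Fraction:
    """Share of executed work that is redundant when each task has its own replica count.

    Each entry is (work_units, replicas); work units are arbitrary integers.
    """
    useful = sum(work for work, _ in tasks)
    executed = sum(work * replicas for work, replicas in tasks)
    return Fraction(executed - useful, executed)


if __name__ == "__main__":
    print(nmr_overhead(3))  # 2/3: triple modular redundancy on every task
    # Hypothetical mix: 20 units of critical work in TMR, 80 units run simplex.
    print(task_level_overhead([(20, 3), (80, 1)]))  # 2/7
```

With only a fifth of the work treated as critical in this hypothetical mix, the redundant share of executed work falls from 2/3 to about 0.29.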
SUMMARY
According to an exemplary embodiment, a method of adapting fault tolerant computing comprises the steps of measuring an environmental condition representative of an environment; analyzing an on-board processing system's sensitivity to the measured environmental condition; and determining whether to reconfigure a fault tolerance of the on-board processing system based in part on the measured environmental condition.
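The three steps of this method can be sketched in Python. The sketch assumes a common first-order model that the patent does not spell out: sensitivity is expressed as an expected upset rate equal to particle flux times a per-bit upset cross-section times the number of sensitive bits. The names `DeviceSensitivity`, `upset_rate`, and `decide`, the thresholds, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceSensitivity:
    cross_section_cm2_per_bit: float  # hypothetical per-bit SEU cross-section
    sensitive_bits: int               # bits exposed to upsets


def upset_rate(flux_per_cm2_s: float, dev: DeviceSensitivity) -> float:
    """Step 2: expected upsets per second for the measured flux."""
    return flux_per_cm2_s * dev.cross_section_cm2_per_bit * dev.sensitive_bits


def decide(rate: float, current_level: int, thresholds: list[float]) -> tuple[bool, int]:
    """Step 3: target level = number of ascending thresholds the rate meets or exceeds.

    Returns (reconfigure?, target_level); reconfigure only when the level changes.
    """
    target = sum(1 for t in thresholds if rate >= t)
    return target != current_level, target


if __name__ == "__main__":
    flux = 10.0  # step 1: measured particles/cm^2/s (hypothetical value)
    dev = DeviceSensitivity(cross_section_cm2_per_bit=1e-9, sensitive_bits=100_000_000)
    rate = upset_rate(flux, dev)  # about 1.0 upset per second
    print(decide(rate, current_level=0, thresholds=[1e-3, 1e-1]))  # (True, 2)
```

Returning a flag alongside the target level lets a caller skip reconfiguration when the environment has not moved the system into a different band.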
In an alternative embodiment, a system for environmen-
tally adaptive fault tolerant computing (EAFTC) comprises a
sensor that senses a characteristic of a dynamic environment
and generates an output signal based on the characteristic. A
system configuration controller receives the output signal, the
controller assessing a potential environmental threat to an
availability of the system based in part on the output signal. A
computing device receives an input from the controller. A
configuration of the computing device is adapted to effec-
tively mitigate the potential environmental threat to the sys-
tem's availability.
These as well as other advantages of various aspects of the
present invention will become apparent to those of ordinary
skill in the art by reading the following detailed description,
with appropriate reference to the accompanying drawings.
BRIEF DESCRIPTION OF DRAWINGS
An exemplary embodiment of the present invention is
described herein with reference to the drawings, in which:
FIG. 1 illustrates one arrangement of an EAFTC based
system incorporating aspects of the present invention;
FIG. 2 illustrates one arrangement of a target computer that
may be utilized with the EAFTC based system illustrated in
FIG. 1;
FIG. 3 illustrates one arrangement of an adaptive process-
ing computer that may be utilized with the target computer
illustrated in FIG. 2;
FIG. 4 illustrates one arrangement of a rapid I/O system
that may be utilized with the target computer illustrated in
FIG. 2;
FIG. 5 illustrates one arrangement of an alarm module that
may be utilized with the target computer illustrated in FIG. 2;
FIG. 6 illustrates a software framework for the target com-
puter illustrated in FIG. 2;
FIG. 7 illustrates an exemplary block diagram of the
EAFTC controller illustrated in FIG. 1;
FIG. 8 illustrates an exemplary block diagram of reliable
middleware that may be utilized with the EAFTC controller
illustrated in FIG. 1;
FIG. 9 illustrates one example of applying the EAFTC
system illustrated in FIG. 1.
DETAILED DESCRIPTION
A. General Overview of EAFTC System
FIG. 1 illustrates an exemplary block diagram of a first arrangement for an EAFTC based system 10. Preferably, EAFTC based system 10 employs system level fault tolerance based on historical and/or environmental conditions. EAFTC system 10 comprises an EAFTC controller 12, an environmental sensor suite 14, and a target computer 16. EAFTC controller 12 comprises history 18 and a deployment plan 20. Sensor suite 14 preferably comprises a plurality of sensors, including but not limited to an SEU alarm 22, an environment measurement 24, and the spacecraft 26. Other sensor suite arrangements are also possible.
A preferred process implemented by the arrangement illustrated in FIG. 1 includes the following steps. First, sensor suite 14 senses an environmental condition. For example, sensor suite 14 can provide an energy level indication 32 from SEU alarm 22, a sensor response 34 from environment measurement 24, or, alternatively, ephemeris 36 from spacecraft 26. Once such a signal or signals are received, EAFTC controller 12 evaluates the threat, if any, that the environmental condition poses to the availability of system 10. If EAFTC controller 12 determines that such an environmental threat exists, system 10 then adapts a configuration of target computer 16, if deemed necessary. In this manner, system 10 effectively and dynamically mitigates potential threats presented by the environment. As seen from FIG. 1, the direction of data flow 38 is from sensor suite 14 through EAFTC controller 12 to target computer 16.
In general, EAFTC controller 12 may be implemented to accept inputs from sensor suite 14 describing various environmental conditions that can induce faults in target computer 16, such as a payload computer system. However, for the particular arrangement discussed herein, environmental monitoring may be focused on measurements of high-energy particle flux, such as may be encountered by a space-borne asset. For example, where EAFTC system 10 is provided in a spacecraft, monitoring the flux of high-energy particles makes it possible to assess the system's overall susceptibility to SEUs. However, those of ordinary skill in the art will recognize that alternative measurement and system arrangements and/or alternative environmental inputs may also be utilized.
Returning to FIG. 1, sensor measurements (e.g., temperature, available power, etc.) and the state of health of target computer 16 are continuously monitored by EAFTC controller 12 via health signal 42. This information is combined with a mission-defined application task deployment plan. Preferably, the mission-defined application task deployment plan contains task level criticality requirements as well as other pertinent information used by EAFTC controller 12. Based on that input, EAFTC controller 12 determines whether the present environment in which the asset resides poses any reliability and/or availability threat to target computer 16 and, preferably, the level of such threat.
EAFTC controller 12, which acts as a system configuration controller, then generates the requisite signals by way of process deployment 40, which are sent to adapt target computer 16. In this manner, process deployment 40 counters a potentially hostile environmental threat to target computer 16. Based on the threat assessment, system configuration controller 12 reconfigures the fault tolerance of the on-board processing system to match the threat level. The on-board processing system preferably implements configurable fault tolerance that matches the variable threats the system will encounter. In response, target computer 16 employs the requested fault tolerant mechanism. This process is performed in real time and on-line as an integral part of the overall operation of system 10.
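The FIG. 1 flow, in which sensor inputs yield a threat level that is then combined with per-task criticality from the deployment plan, might be sketched in Python as below. The three-level scales, the additive scoring rule, the flux thresholds, and the task names are hypothetical illustrations; the patent does not specify how EAFTC controller 12 maps threat and criticality to a redundancy level.

```python
from enum import IntEnum


class Threat(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


class Criticality(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


def assess_threat(seu_alarm: bool, flux: float, medium_at: float, high_at: float) -> Threat:
    """Combine an SEU alarm flag and a flux measurement into a coarse threat level."""
    if seu_alarm or flux >= high_at:
        return Threat.HIGH
    if flux >= medium_at:
        return Threat.MEDIUM
    return Threat.LOW


def replicas_for(criticality: Criticality, threat: Threat) -> int:
    """Hypothetical policy: critical tasks in harsh environments get more copies."""
    score = int(criticality) + int(threat)
    if score >= 3:
        return 3  # triple modular redundancy
    if score == 2:
        return 2  # duplex with comparison
    return 1      # simplex


def deploy(plan: dict[str, Criticality], threat: Threat) -> dict[str, int]:
    """Map each task in the deployment plan to a replica count for the current threat."""
    return {task: replicas_for(crit, threat) for task, crit in plan.items()}


if __name__ == "__main__":
    plan = {"compress": Criticality.LOW, "telemetry": Criticality.HIGH,
            "analysis": Criticality.MEDIUM}  # hypothetical task names
    threat = assess_threat(seu_alarm=False, flux=50.0, medium_at=10.0, high_at=100.0)
    print(threat.name)           # MEDIUM
    print(deploy(plan, threat))  # {'compress': 1, 'telemetry': 3, 'analysis': 2}
```

When the SEU alarm fires, the same plan shifts to two copies of the low-criticality task and three of the others, which is the kind of threat-matched reconfiguration the text describes.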
Hardware Implementation
As may be seen from FIG. 1, EAFTC controller 12 receives
certain commands from target computer 16 by way of health
signal 42. In one preferred arrangement, hardware for target
computer 16 may comprise Honeywell's Integrated Payload
System. The Honeywell Integrated Payload System is essen-
tially a cluster computer consisting of a multitude of data
processors and one cluster manager.
FIG. 2 illustrates one arrangement of a target computer 50
that may be utilized with EAFTC system 10 illustrated in
FIG. 1. In this arrangement, target computer 50 comprises
various hardware elements, including system controllers
52a and 52b, a plurality of data processors 64a, 64b, 64c, and
64d, a first packet switched fabric 62a, a second packet
switched fabric 62b, and an environmental sensor suite 58. A
power supply 56 is also provided.
A. System Controller
System controller 52 for target computer 50 is preferably
implemented using redundant, Radiation Hardened Single
Board Computers. Such a reliable radiation hardened system
controller 52 provides a platform for deployment of critical
control software such as the EAFTC controller. For example,
in one arrangement, a potential candidate for a system con-
troller 52 may comprise a Honeywell radiation hardened
RHPPC Single Board Computer ("SBC"). See, for example,
the description as provided by Gary R. Brown, "Radiation
Hardened PowerPC 603e Based Single Board Computer,"
IEEE Aerospace Conference, 2001 (http://cism.jpl.nasa.gov/events/seminardocs/Big_sky080201.pdf).
In one preferred arrangement, the radiation hardened SBC is based on Motorola 603e microprocessor technology. Such a radiation hardened SBC is generally described in Gary R. Brown, "Radiation Hardened PowerPC 603e™ Based Single Board Computer," 20th Digital Avionics Systems, 2001, October 2001, herein entirely incorporated by reference and to which the reader is directed for further information. The use of an RHPPC SBC may be preferred for a number of reasons, some of which are summarized in Table 1 below:
TABLE 1
RHPPC SBC Features
Salient Features
3.3 V and 5.0 V Power
RHPPC delivering 100 MIPS
Peripheral Enhancement Component support chip
4 MB EEPROM with Single Error Correction and Double Error Detection
512 KB EEPROM
128 MB DRAM with SuperEDAC
6U x 220 mm Euro Card Form Factor
Max Power Draw 15 W
Mass >3 lbs
Redundant 1553 (interface to spacecraft computer)
32-bit 33 MHz PCI (interface to cluster and MIB electronics)
B. Data Processors
As illustrated in FIG. 2, target computer 50 further comprises a plurality of data processors 64. In this preferred arrangement, the plurality of data processors comprises first, second, third, and fourth data processors 64a, 64b, 64c, and 64d, respectively. In one preferred arrangement, these data processors are COTS based processors. More preferably, they are COTS based processors with a unique architecture herein referred to as an Adaptive Processing Computer ("APC"). The APC is a multi-mode device that combines COTS microprocessors and FPGAs on a single platform. In one arrangement, the APC employs a COTS IBM PowerPC 750FX microprocessor and a Xilinx VirtexII 6000 FPGA. The IBM 750FX and Xilinx VirtexII devices are suitable COTS devices for a flight experiment.
C. Adaptive Processing Computer
FIG. 3 illustrates one arrangement of an adaptive processing computer ("APC") 80 that may be utilized with target computer 50 illustrated in FIG. 2. APC 80 comprises a COTS compute resources portion 82 and a portion comprising a radiation hardened configuration manager 84 along with supporting functions. Configuration manager 84 handles various functions including, but not limited to, mode changes of APC 80, basic FPGA configuration, FPGA configuration memory scrubbing, low-level health monitoring, and power mode control.
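Configuration memory scrubbing, one of the functions listed for configuration manager 84, is commonly done by reading configuration frames back, checking them, and rewriting any corrupted frame from a stored golden image. The Python sketch below shows that generic readback-and-repair idea; the frame layout, the use of CRC-32 as the check, and the name `scrub_pass` are assumptions, not details of the APC or of any Xilinx configuration interface.

```python
import zlib


def scrub_pass(config: list[bytearray], golden: list[bytes], golden_crc: list[int]) -> list[int]:
    """One scrub pass over FPGA configuration frames.

    Each frame is read back and its CRC-32 compared with the stored CRC of the
    golden image; a mismatching frame is rewritten from the golden copy.
    Returns the indices of the frames that were repaired.
    """
    repaired = []
    for i, frame in enumerate(config):
        if zlib.crc32(frame) != golden_crc[i]:   # readback check
            config[i] = bytearray(golden[i])     # rewrite from the golden image
            repaired.append(i)
    return repaired


if __name__ == "__main__":
    golden = [bytes([i]) * 16 for i in range(4)]  # four hypothetical 16-byte frames
    crcs = [zlib.crc32(f) for f in golden]
    config = [bytearray(f) for f in golden]
    config[2][5] ^= 0x01                          # simulate an SEU flipping one bit
    print(scrub_pass(config, golden, crcs))       # [2]
    print(scrub_pass(config, golden, crcs))       # []
```

Running the pass periodically bounds how long an upset can persist in configuration memory; a second pass over already-clean frames repairs nothing.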
In one preferred arrangement, APC 80 may implement a plurality of operational modes. For example, APC 80 may implement a microprocessor mode, a custom processor mode, and a hybrid processor mode. The mode of operation may be determined by the active configuration of an FPGA labeled Processing Element/Processor Controller ("PE/PC") 88 in FIG. 3.
1. Microprocessor Mode
APC 80 may be configured in a microprocessor mode. In this mode, the APC's FPGA is configured as a Processor Controller and the microprocessor is enabled, so the APC behaves much like an SBC. The Processor Controller FPGA hosts all of the support functions for the PowerPC ("PPC"), including I/O, memory controller, interrupts, timers, etc.
2. Custom Processor Mode
In custom processor mode, the microprocessor is disabled and does not execute software. While APC 80 is in this mode, the FPGA of PE/PC 88 is configured as a Processing Element and hosts a full-custom application, including all I/O and processing logic. The processing logic in the Processing Element is defined by an image loaded into the FPGA's configuration memory by configuration manager 84. Configuration manager 84 receives commands from software on system controller 52 of target computer 50 (see FIG. 2).
3. Hybrid Mode
The third APC capability is hybrid mode operation. In hybrid mode, the FPGA hosts the processor controller for the microprocessor as well as application specific modules, so this mode can be likened to a co-processor system. The application specific modules could be Digital Signal Processing ("DSP") functions, data compression, vector processors, etc. As with the custom mode, the use of application specific modules may yield high efficiency and performance. Such efficiencies and performance yields are described generally in J. S. Donaldson, "Push the DSP Performance Envelope," Xilinx Xcell Journal, Spring 2003, herein entirely incorporated by reference and to which the reader is directed for further information. This third mode also offers additional flexibility by retaining a programmable microprocessor along with access to custom hardware. The APC is also capable of dynamically switching between these modes. For instance, if multiple data channels are part of the same payload, the APC's operating mode can be switched to better serve the needs of the active data channel.
The APC's flexibility allows one to adapt the target processor to a variety of mission level requirements. As just one example, enhanced efficiency may be achieved by using more custom hardware modules in the FPGA. Similarly, enhanced processing performance may also be realized in FPGA modules. However, for applications that require greater programmability, microprocessor mode might be more suitable. An APC can accommodate all of these needs, whereas such implementation alternatives are not typically available in conventional on-board processor modules. One example of the APC's flexibility is a processing situation with a mix of control flow and data flow processing on the same computer. Control flow applications are generally sequential, whereas data flow applications tend to be parallel. For sequential applications, a microprocessor may yield acceptable performance. Parallel applications, however, can make better use of the FPGA co-processor to accelerate their processing.
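The mode trade-offs described above can be summarized in a short Python sketch: each APC mode is recorded with the role of the PE/PC FPGA and whether the microprocessor runs, and a heuristic picks a mode from the share of sequential (control flow) work. The `Workload` fields, the 0.9 and 0.1 cut-offs, and the selection rule are hypothetical and only illustrate the reasoning in the text; they are not the APC's actual mode-switching logic.

```python
from dataclasses import dataclass
from enum import Enum


class ApcMode(Enum):
    # value = (role of the PE/PC FPGA, microprocessor enabled?)
    MICROPROCESSOR = ("processor controller", True)
    CUSTOM = ("processing element", False)
    HYBRID = ("processor controller plus application modules", True)

    @property
    def fpga_role(self) -> str:
        return self.value[0]

    @property
    def cpu_enabled(self) -> bool:
        return self.value[1]


@dataclass(frozen=True)
class Workload:
    sequential_fraction: float   # share of control flow (sequential) work, 0.0 to 1.0
    needs_programmability: bool  # must the algorithm stay changeable in software?


def select_mode(w: Workload, seq_hi: float = 0.9, seq_lo: float = 0.1) -> ApcMode:
    """Hypothetical heuristic: sequential work suits the microprocessor, parallel
    work suits FPGA logic, and mixed work suits the hybrid co-processor mode."""
    if w.sequential_fraction >= seq_hi:
        return ApcMode.MICROPROCESSOR
    if w.sequential_fraction <= seq_lo and not w.needs_programmability:
        return ApcMode.CUSTOM
    return ApcMode.HYBRID


if __name__ == "__main__":
    print(select_mode(Workload(0.95, True)).name)   # MICROPROCESSOR
    print(select_mode(Workload(0.05, False)).name)  # CUSTOM
    mode = select_mode(Workload(0.40, True))
    print(mode.name, mode.fpga_role, mode.cpu_enabled)
```

A parallel workload that must remain reprogrammable falls through to hybrid mode, reflecting the text's note that hybrid mode keeps a programmable microprocessor alongside custom hardware.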
Certain relevant features of a preferred APC, such as APC
80, are provided below in Table 2.
TABLE 2
APC Features
Features
750FX @ 650 MHz Delivering 1300 MIPS
VirtexII 6000 Processing Element/Processor Controller
PCI 32-bit 33 MHz
RapidIO
128 MB DRAM with Super EDAC
4 MB EEPROM with SECDED EDAC
Configuration Manager with support FPGA
SEU mitigation
PCI-to-PCI bridge facilitating a local PCI bus
Ethernet development interface
6 U x 220 mm Euro Card Form Factor
Mass <3 lbs
Max Power Draw 20 W
Returning to FIG. 2, target computer 50 further comprises a packet switched fabric A 60 and a packet switched fabric B 62. Preferably, the various modules comprising target computer 50 are interconnected via a packet switched fabric based on the RapidIO ("RIO") industry standard. For additional information on this industry standard, the reader is directed to the RapidIO Trade Association Web site at http://www.rapidio.org/, herein entirely incorporated by reference.
RIO is an industry standard and is generally recognized as one of the more popular conventional COTS interconnects. Certain conventional payload data processor interconnects are based upon multi-drop configurations, including but not limited to MODULE BUS, PCI, and VME. One characteristic of such multi-drop systems is that the available bandwidth is shared by every module on the bus. However, this sharing can create points of contention among participant nodes, often resulting in system level bottlenecks.
In contrast to such multi-drop systems, RIO implements a packet-switched, point-to-point interconnect. Such an interconnect has certain advantages. For example, packet-switched, point-to-point interconnects allow multiple full-bandwidth point-to-point links to be established simultaneously between end-nodes in a network. Another advantage of packet-switched, point-to-point interconnects is that they reduce contention while also delivering more bandwidth to an application.
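The contention argument can be made concrete with a small scheduling model. The sketch below is illustrative only: it assumes an idealized non-blocking switch in which each node owns a single link, ignores packet sizes and arbitration, and uses hypothetical node names loosely based on FIG. 4. It counts the time slots needed to complete a set of transfers on a shared multi-drop bus versus a packet-switched, point-to-point fabric.

```python
from typing import List, Set, Tuple

Transfer = Tuple[str, str]  # (source node, destination node)

def shared_bus_slots(transfers: List[Transfer]) -> int:
    # A multi-drop bus carries one transfer at a time, so transfers are fully serialized.
    return len(transfers)

def switched_fabric_slots(transfers: List[Transfer]) -> int:
    # Idealized non-blocking switch: transfers may share a time slot as long as no
    # node appears in two of them (each node is modeled with a single link).
    # Greedy first-fit, so the result is an upper bound, not necessarily optimal.
    slots: List[Set[str]] = []
    for src, dst in transfers:
        for busy in slots:
            if src not in busy and dst not in busy:
                busy.update((src, dst))
                break
        else:
            slots.append({src, dst})
    return len(slots)

# Hypothetical node names echoing FIG. 4.
traffic = [("sensor_116", "bulk_mem_110"),
           ("cpu_102", "gpio_114"),
           ("cpu_104", "nvm_112")]
print(shared_bus_slots(traffic), switched_fabric_slots(traffic))  # 3 1
```

Transfers that share an endpoint, such as two processors both addressing general purpose I/O, still serialize on the switched fabric; only disjoint transfers proceed concurrently.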
FIG. 4 illustrates one arrangement of a RapidIO ("RIO") system 100 that may be utilized with target computer 50 illustrated in FIG. 2. RIO system 100 comprises sensor data 116, two processors 102 and 104, a RapidIO switch 108, bulk memory 110, general purpose I/O 114, a backplane 106, and non-volatile memory 112.
RIO system 100 comprises essentially two building blocks: a RIO end-node 120 and a RIO switch 122. Each node 120, 122 in RIO system 100 comprises a RIO network interface. Each RIO network interface comprises a point-to-point link to shared RIO switch 108. RIO switch 108 receives and routes packets to the appropriate destination over backplane 106. The non-blocking nature of RIO allows concurrent routing of multiple packets. For example, sensor data 116 may be stored in bulk memory 110 at the same time as processors 102, 104 access general purpose I/O 114. By using multiple switches such as the one illustrated in FIG. 4 in the EAFTC system 10 of FIG. 1, topologies consisting of hundreds or thousands of nodes may be achieved.
In one preferred arrangement, RIO interfaces are based on LVDS signaling technology and can achieve bandwidths of up to 60 Gbits/s for each active link. A 16-bit RIO system with two active point-to-point links is capable of 120 Gbits/s, providing a >120x performance increase over a 33 MHz, 32-bit CompactPCI based system.
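The bandwidth comparison reduces to simple arithmetic. The sketch below recomputes it under the assumption that the PCI figure is the theoretical peak (one 32-bit word per 33 MHz clock). On that basis the ratio comes out at roughly 114x; sustained PCI throughput is normally below its theoretical peak, which would widen the gap.

```python
def bus_rate_gbps(width_bits: int, clock_mhz: float) -> float:
    # Peak rate of a parallel bus moving one word per clock: width x clock, in Gbit/s.
    return width_bits * clock_mhz / 1000.0

pci_peak_gbps = bus_rate_gbps(32, 33.0)   # 33 MHz, 32-bit PCI
rio_total_gbps = 2 * 60.0                 # two active links at the quoted 60 Gbit/s each
speedup = rio_total_gbps / pci_peak_gbps
print(f"{pci_peak_gbps:.3f} Gbit/s, {speedup:.1f}x")  # 1.056 Gbit/s, 113.6x
```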
One benefit of the RIO protocol is its error detection and recovery mechanism. By combining retry protocols, cyclic redundancy codes ("CRC") and single/multiple error detection, RIO handles all in-network errors without application intervention. This inherent error handling and recovery capability proves beneficial for certain applications that require a highly reliable interconnect, such as space applications.
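The retry-plus-CRC idea can be sketched as follows. This is a simplified model, not the RapidIO protocol itself: it appends a 16-bit CRC using the CCITT polynomial 0x1021 (initial value 0xFFFF) to each hypothetical packet, and a receiver that detects a mismatch causes the sender to retransmit. Control symbols, sequence numbers, and link-level buffering defined by the RapidIO specification are not modeled, and `flaky_link` is a made-up fault injector.

```python
from typing import Callable, Tuple

def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    # Bitwise CRC-16, polynomial x^16 + x^12 + x^5 + 1 (0x1021), MSB first, init 0xFFFF.
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def make_packet(payload: bytes) -> bytes:
    # Append the CRC so the receiver can check the packet end to end.
    return payload + crc16_ccitt(payload).to_bytes(2, "big")

def packet_ok(packet: bytes) -> bool:
    payload, received = packet[:-2], int.from_bytes(packet[-2:], "big")
    return crc16_ccitt(payload) == received

def send_with_retry(payload: bytes, link: Callable[[bytes, int], bytes],
                    max_attempts: int = 3) -> Tuple[int, bytes]:
    # Retransmit until the receiver's CRC check passes or the attempts run out.
    packet = make_packet(payload)
    for attempt in range(1, max_attempts + 1):
        received = link(packet, attempt)
        if packet_ok(received):
            return attempt, received[:-2]
    raise RuntimeError("link error persisted after retries")

def flaky_link(packet: bytes, attempt: int) -> bytes:
    # Hypothetical link model: flips one bit on the first attempt only.
    if attempt == 1:
        corrupted = bytearray(packet)
        corrupted[0] ^= 0x01
        return bytes(corrupted)
    return packet

print(hex(crc16_ccitt(b"123456789")))               # 0x29b1
print(send_with_retry(b"sensor frame", flaky_link))  # (2, b'sensor frame')
```

The application only ever sees the clean payload; the corrupted first copy is caught and replaced below it, which is the sense in which errors are handled "without application intervention".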
Environmental Sensor Suite
Returning to FIG. 2, target computer 50 further comprises an environmental sensor suite 58. EAFTC system 10 relies, to a certain extent, on an ability to sense its environment. As part of Physical Sciences Inc.'s (PSI's) Reconfigurable Environmentally-Adaptive Computing Technology (REACT), a miniature embedded radiation monitor, the SEU Alarm, has been developed. The SEU Alarm is based on certain flight-proven technology originally developed for PSI's radiation diagnostic instrumentation. General background information on this radiation diagnostic instrumentation may be obtained from the Physical Sciences Inc. Web site http://www.psicorp.com/index.shtml, herein entirely incorporated by reference. Advantages of the SEU Alarm over conventional sensors are its relatively small footprint and the fact that it is designed to support SEU rate predictions.
In one arrangement, the SEU alarm (shown as alarm 22 in FIG. 1) provides continuous monitoring of the proton and heavy-ion fluxes that cause single event upsets. In one preferred arrangement, the SEU alarm comprises a small block of scintillators coupled to a photo-detector. For example, FIG. 5 illustrates one such arrangement of an SEU alarm module 150. Module 150 comprises three sensors 152, 154, 156, respectively coupled to three sets of controller electronics 160, 162, and 164. Module 150 further comprises a controller 166 and a network interface 168. Controller 166 provides the control and interface registers through which software interfaces with the sensor modules. Software configures each sensor for a given application by setting alarm thresholds and refresh rates. Software can also access the alarm measurements for use in evaluating the threat to the system.
SEU alarm 150 thus monitors, by way of sensors 152, 154, 156, the proton and heavy-ion fluxes that cause single event upsets, each sensor being a small block of scintillators coupled to a photo-detector. In one preferred arrangement, a number of these devices can be consolidated onto a single module.
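The software view of the SEU alarm described in this section (per-sensor alarm thresholds and refresh rates, plus readable measurements) can be sketched as a small register model. The class names, the count-based threshold, and the conversion of counts to a rate are assumptions for illustration; the actual register layout of controller 166 is not given here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SensorChannel:
    threshold_counts: int      # hypothetical alarm threshold (counts per refresh interval)
    refresh_rate_hz: float     # how often the count is refreshed
    last_count: int = 0

@dataclass
class SeuAlarmModule:
    channels: Dict[int, SensorChannel] = field(default_factory=dict)

    def configure(self, sensor_id: int, threshold_counts: int, refresh_rate_hz: float) -> None:
        # Software sets the alarm threshold and refresh rate for one sensor.
        self.channels[sensor_id] = SensorChannel(threshold_counts, refresh_rate_hz)

    def latch(self, sensor_id: int, count: int) -> None:
        # Stands in for the controller latching a new scintillator event count.
        self.channels[sensor_id].last_count = count

    def count_rate(self, sensor_id: int) -> float:
        # Counts per second, assuming each latched count covers one refresh interval.
        ch = self.channels[sensor_id]
        return ch.last_count * ch.refresh_rate_hz

    def alarms(self) -> List[int]:
        # Sensors whose latest count meets or exceeds their configured threshold.
        return sorted(sid for sid, ch in self.channels.items()
                      if ch.last_count >= ch.threshold_counts)

module = SeuAlarmModule()
for sid in (152, 154, 156):
    module.configure(sid, threshold_counts=100, refresh_rate_hz=10.0)
module.latch(152, 40)
module.latch(154, 250)
module.latch(156, 99)
print(module.alarms(), module.count_rate(154))  # [154] 2500.0
```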
Software Framework
FIG. 6 illustrates a preferred software framework 180 for a
target computer, such as the target computer 16 illustrated in
FIG. 1. Software framework 180 comprises an operating sys-
tem/system software, fault tolerant system controller/node,
EAFTC controller 192, messaging middleware 200, and reli-
able platform middleware 216. One objective of the target
computer software framework is to provide system develop-
ers with a stable yet familiar software platform. In FIG. 6, the
software comprises mission specific payload control 196 and
communications hosted on system controller 194, and application processes distributed across data processor cluster 181.
These software components may be developed using COTS
environments and associated Application Program Interfaces
("APIs").
In one preferred arrangement, the proposed Operating Sys-
tems are VxWorks 202 for System Controller 194 and Linux
for Data Processor cluster 181. Information on the VxWorks operating system may be found at the Wind River Systems Web site http://www.windriver.com/, which is herein entirely incorporated by reference and to which the reader is directed for further information.
VxWorks OS 202 provides the capabilities necessary for
the deployment of real-time control processes such as those
implemented by EAFTC controller 192, fault tolerant system
controller 194, and payload control and communications 196.
VxWorks OS 202 also provides a familiar platform for devel-
opers of these types of applications. Data processor cluster
181, unlike system controller 194, is the domain of the sci-
ence application developer. In this case, Linux OS 220 is a
preferred OS due to its popularity in the scientific community.
To mitigate concerns associated with the interaction of het-
erogeneous operating systems, a COTS messaging middle-
ware 214 may also be introduced. For example, the messag-
ing component of GoAhead's SelfReliant Middleware
provides a common interface for communication between
Linux OS 220 and VxWorks OS 202 along with a variety of
practical messaging services such as publish-subscribe and replicated databases. See, e.g., the GoAhead Web site http://www.goahead.com/, which is herein entirely incorporated by reference and to which the reader is directed for further information.
Messaging within data processor cluster 181 may be accomplished via Reliable Platform (RP) Middleware 216, which is also responsible for the Software Implemented Fault Tolerance (SIFT) in the cluster. See, e.g., C. J. Walter, P. Lincoln and N. Suri, "Formally verified on-line diagnosis," IEEE Trans. on Software Engr., vol. 23, no. 11, pp. 684-721, November 1997, which is herein entirely incorporated by reference and to which the reader is directed for further information. Together, the OS and middleware provide the base platform on which other software may be implemented.
EAFTC and RP Middleware
In one preferred arrangement, EAFTC comprises essentially two software components: a Reliable Platform Middleware (RP) and an EAFTC controller.
1. EAFTC Controller or System Configuration Controller
The EAFTC controller, or system configuration controller, provides control of an EAFTC based system such as the one illustrated in FIG. 1. Since the integrity and dependability of the EAFTC system rely on this controller, its realization must be highly reliable. Hence, the EAFTC controller may be implemented as a software component hosted on a reliable system controller. One advantage of such an implementation is that it provides an enhanced level of flexibility for future use and adaptations.
FIG. 7 depicts an overview 230 of internal functions of a system controller 270 in the context of a characteristic system implementation. A general description of the various components comprising one preferred arrangement of a system controller is provided below.
In one arrangement, a system controller 270 comprises an Environmental Server 242, Alert Level Generator 244, Deployment Plan 250, Deployment Generator 252, FPGA Configuration Controller 254, Health Monitor 256, and CPU Configuration Controller 258. Given the variety of possible sensory inputs, a function has been defined to collect and organize sensor signals into abstract representations that may be shared with other EAFTC components. Environmental Server 242 encapsulates the low-level interfaces to each of the sensors in the system, including the sampling of each signal. In the arrangement illustrated in FIG. 7, these signals come from spacecraft 232 and SEU Alarm 234.
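One way to picture Environmental Server 242 is as an adapter layer that hides each sensor's low-level interface behind a single sample format. In the sketch below the source names, measured quantities, and reader callables are hypothetical stand-ins for the spacecraft and SEU Alarm interfaces; the injectable clock only serves to make the example deterministic.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class EnvSample:
    source: str       # which sensor interface produced the value
    quantity: str     # what was measured
    value: float
    timestamp: float  # seconds, from the server's clock

class EnvironmentalServer:
    """Collects raw readings from registered sensors into one uniform sample format."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._readers: Dict[str, Callable[[], Dict[str, float]]] = {}
        self._clock = clock

    def register(self, source: str, reader: Callable[[], Dict[str, float]]) -> None:
        # `reader` wraps one sensor's low-level interface and returns {quantity: value}.
        self._readers[source] = reader

    def sample_all(self) -> List[EnvSample]:
        # One sampling pass over every registered sensor, stamped with a common time.
        now = self._clock()
        return [EnvSample(source, quantity, float(value), now)
                for source, reader in self._readers.items()
                for quantity, value in reader().items()]

server = EnvironmentalServer(clock=lambda: 0.0)
server.register("spacecraft", lambda: {"orbit_phase_deg": 42.0})
server.register("seu_alarm", lambda: {"counts_per_s": 12})
samples = server.sample_all()
for s in samples:
    print(s)
```

Downstream components such as the Alert Level Generator then depend only on `EnvSample`, not on any particular sensor's interface.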
Health Monitor
Health Monitor 256 monitors a state 266 of each target system computer resource 236. Signals such as heartbeats, redundant output consistency mismatches, watchdog timeouts, etc., are collected via the Fault Tolerant Controller/Node components. These signals are then provided to Health Monitor 256. Given predefined policies, Health Monitor 256 determines the health of each Data Processor in an APC cluster, such as APC cluster 64 illustrated in FIG. 2 and APC cluster 181 in FIG. 6. This information is then shared with Deployment Generator 252, where it is used in determining the system's task deployment from Deployment Plan 250.
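The policy-driven health determination can be sketched as a function from collected signals to a per-node status. The three-state classification and the numeric thresholds below are assumptions, since the text only states that the policies are predefined.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict

class Health(Enum):
    HEALTHY = "healthy"
    SUSPECT = "suspect"
    FAILED = "failed"

@dataclass
class NodeSignals:
    seconds_since_heartbeat: float
    output_mismatches: int    # redundant-output consistency mismatches in the current window
    watchdog_timeouts: int

@dataclass
class HealthPolicy:
    heartbeat_timeout_s: float = 2.0   # hypothetical threshold
    max_mismatches: int = 3            # hypothetical threshold

    def assess(self, s: NodeSignals) -> Health:
        if s.watchdog_timeouts > 0 or s.seconds_since_heartbeat > self.heartbeat_timeout_s:
            return Health.FAILED
        if s.output_mismatches >= self.max_mismatches:
            return Health.FAILED
        if s.output_mismatches > 0:
            return Health.SUSPECT
        return Health.HEALTHY

def assess_cluster(policy: HealthPolicy, signals: Dict[str, NodeSignals]) -> Dict[str, Health]:
    # Per-node verdicts, to be shared with the deployment generator.
    return {node: policy.assess(sig) for node, sig in signals.items()}

status = assess_cluster(HealthPolicy(), {
    "dp1": NodeSignals(0.5, 0, 0),
    "dp2": NodeSignals(0.5, 1, 0),
    "dp3": NodeSignals(5.0, 0, 0),
})
print({node: h.value for node, h in status.items()})
# {'dp1': 'healthy', 'dp2': 'suspect', 'dp3': 'failed'}
```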
History Database
Although reacting to immediate sensory input may be adequate for certain applications, the ability to predict near-future threats to an EAFTC system provides certain advantages. In particular, adapting fault tolerance to address anticipated threats reduces the exposure of the system to faults. History Database 248 is a component of a predictive filter implemented in Alert Level Generator 244. As just one example, sensor measurements from a previous spacecraft orbit may be maintained in History Database 248 and subsequently retrieved by Alert Level Generator 244 for use by Deployment Generator 252.
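A History Database that lets the Alert Level Generator look up what was measured at the same point on earlier orbits might be organized by orbit phase. The sketch below assumes phase is reported in degrees and that a fixed number of orbits is retained per phase bin; the bin count and retention depth are illustrative choices, not values from the patent.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional

class HistoryDatabase:
    """Flux measurements binned by orbit phase, keeping the last few orbits per bin."""

    def __init__(self, bins_per_orbit: int = 36, orbits_kept: int = 3) -> None:
        self.bins_per_orbit = bins_per_orbit
        self._bins: Dict[int, Deque[float]] = defaultdict(lambda: deque(maxlen=orbits_kept))

    def _bin(self, phase_deg: float) -> int:
        # Wrap any phase into [0, 360) degrees and map it to a bin index.
        return int((phase_deg % 360.0) / 360.0 * self.bins_per_orbit)

    def store(self, phase_deg: float, flux: float) -> None:
        self._bins[self._bin(phase_deg)].append(flux)

    def history_at(self, phase_deg: float) -> List[float]:
        # Oldest-to-newest measurements previously recorded at this orbit position.
        return list(self._bins.get(self._bin(phase_deg), ()))

    def latest_at(self, phase_deg: float) -> Optional[float]:
        hist = self.history_at(phase_deg)
        return hist[-1] if hist else None

db = HistoryDatabase()
for orbit in range(4):
    db.store(orbit * 360.0 + 95.0, flux=10.0 + orbit)
print(db.history_at(95.0), db.latest_at(200.0))  # [11.0, 12.0, 13.0] None
```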
Alert Level Generator
The process of evaluating an environmental threat to an EAFTC system is implemented in Alert Level Generator 244. Given the current sensory input received from spacecraft 232 and/or SEU Alarm 234, History Database 248, and a set of system specific thresholds, Alert Level Generator 244 outputs a discrete threat level 245 for the EAFTC system. An important algorithm of Alert Level Generator 244 is an Adaptive Linear Predictive Filter, which generates a particle flux prediction. Based on this particle flux prediction, a series of user defined thresholds may be evaluated to determine a current system alert level to be used by a Deployment Generator in determining the EAFTC system's process deployment.
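The text names an Adaptive Linear Predictive Filter but does not give its update rule. As an assumption, the sketch below uses a normalized least-mean-squares (NLMS) one-step-ahead linear predictor, a common form of adaptive linear predictor, and maps its flux forecast to a discrete alert level by counting how many ascending user-defined thresholds the forecast meets. Filter order, step size, and threshold values are illustrative.

```python
from typing import List, Sequence

class NlmsPredictor:
    """One-step-ahead adaptive linear predictor trained with normalized LMS."""

    def __init__(self, order: int = 4, step: float = 0.5, eps: float = 1e-9) -> None:
        self.w = [0.0] * order
        self.recent: List[float] = [0.0] * order  # most recent sample first
        self.step = step
        self.eps = eps  # avoids division by zero while `recent` is still all zeros

    def predict(self) -> float:
        # Forecast of the next flux sample as a weighted sum of recent samples.
        return sum(wi * xi for wi, xi in zip(self.w, self.recent))

    def update(self, observed: float) -> float:
        # Adapt the weights on the prediction error, then shift `observed` into the window.
        error = observed - self.predict()
        norm = self.eps + sum(x * x for x in self.recent)
        self.w = [wi + self.step * error * xi / norm for wi, xi in zip(self.w, self.recent)]
        self.recent = [observed] + self.recent[:-1]
        return error

def alert_level(predicted_flux: float, thresholds: Sequence[float]) -> int:
    # Discrete level = number of ascending user thresholds the prediction meets or exceeds.
    return sum(1 for t in thresholds if predicted_flux >= t)

predictor = NlmsPredictor(order=4, step=0.5)
for _ in range(50):
    predictor.update(10.0)        # steady flux reading
forecast = predictor.predict()
print(round(forecast, 3), alert_level(forecast, thresholds=[5.0, 20.0, 80.0]))  # 10.0 1
```

With a steady input the forecast converges to the measured flux; in use, the filter would be fed each new SEU Alarm reading and could also draw on values retrieved from the History Database.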
Deployment Plan
The on-line behavior of an EAFTC controller may vary
based on a target environment, system level requirements,
target application, target system architecture, and other
implementation specific factors. This application specific
behavior may be captured as a user defined parameter set. In
particular, the Deployment Plan describes the desired system
dependability for a given spacecraft position, threat level, and
time. The Deployment Plan may be defined by the require-
ments of each individual application process.
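Because the Deployment Plan is a user-defined parameter set, it can be represented as plain data. The sketch below keys each application's required replica count to the alert level; the field names, applications, and values are hypothetical, and the spacecraft-position and time dimensions mentioned in the text are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Mapping

@dataclass(frozen=True)
class AppRequirement:
    cpu_load: float                        # hypothetical: fraction of one processor per replica
    replicas_by_level: Mapping[int, int]   # alert level -> replicas required

    def replicas(self, level: int) -> int:
        # Use the entry for the highest defined level not above `level` (default: simplex).
        defined = [lvl for lvl in self.replicas_by_level if lvl <= level]
        return self.replicas_by_level[max(defined)] if defined else 1

DeploymentPlan = Dict[str, AppRequirement]

plan: DeploymentPlan = {
    "image_compression": AppRequirement(0.5, {0: 1, 2: 2}),
    "payload_control": AppRequirement(0.25, {0: 2, 1: 3}),
}
print(plan["image_compression"].replicas(3), plan["payload_control"].replicas(0))  # 2 2
```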
Deployment Generator
Once the system threat level has been assessed, Deploy-
ment Generator 252 acts to counter the threat. Given a par-
ticular Deployment Plan 250, target system health 262, and
alert level 245, Deployment Generator 252 produces a new
system deployment. The process of generating a new deploy-
ment is primarily based on determining a lowest cost distri-
bution of application processes (including number of repli-
cas) across available target resources. The generated
deployment is then sent to each node in a cluster where local
actions implemented by Fault Tolerant Node software fulfill
the deployment requests. Specifically, in one arrangement,
Fault Tolerant Node collaborates with RP Middleware, as
discussed in greater detail below, to deploy fault tolerance as
requested.
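The text describes deployment generation as finding a lowest cost distribution of application processes, including replica counts, across available resources, without defining the cost function. The sketch below substitutes a simple greedy heuristic: place the heaviest applications first, put each replica on the least-loaded healthy node with spare capacity, and never place two replicas of one application on the same node. It is not guaranteed to be optimal, and the node names and load figures are hypothetical.

```python
from typing import Dict, List, Set, Tuple

def generate_deployment(required: Dict[str, Tuple[int, float]],
                        capacity: Dict[str, float],
                        healthy: Set[str]) -> Dict[str, List[str]]:
    """Greedy placement of replicas onto healthy nodes.

    required: app -> (replica count, load per replica as a fraction of one node)
    capacity: node -> usable load fraction
    """
    nodes = sorted(n for n in capacity if n in healthy)
    load = {n: 0.0 for n in nodes}
    placement: Dict[str, List[str]] = {}
    # Heaviest total demand first, so large applications are not left without room.
    for app, (replicas, per_replica) in sorted(required.items(),
                                               key=lambda kv: -kv[1][0] * kv[1][1]):
        hosts: List[str] = []
        for _ in range(replicas):
            # Replicas of one app go on distinct nodes, so one node fault hits one replica.
            candidates = [n for n in nodes
                          if n not in hosts and load[n] + per_replica <= capacity[n] + 1e-9]
            if not candidates:
                raise ValueError(f"cannot place {replicas} replicas of {app}")
            best = min(candidates, key=lambda n: (load[n], n))
            load[best] += per_replica
            hosts.append(best)
        placement[app] = hosts
    return placement

capacity = {"dp1": 1.0, "dp2": 1.0, "dp3": 1.0, "dp4": 1.0}
healthy = {"dp1", "dp2", "dp3"}          # dp4 reported failed by the health monitor
required = {"payload_control": (3, 0.25), "image_compression": (2, 0.5)}
deployment = generate_deployment(required, capacity, healthy)
print(deployment)
# {'image_compression': ['dp1', 'dp2'], 'payload_control': ['dp3', 'dp1', 'dp2']}
```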
Configuration Controllers
CPU Configuration Controller 258 is designed to interface
with a particular target system 236 and provide process
deployment 264. Where more than one Configuration Controller 258 is implemented, each Configuration Controller, given a new deployment, generates the low-level signals to effect
required changes in a targeted system. In a preferred arrange-
ment, two Configuration Controller types are implemented.
The first Configuration Controller is responsible for interac-
tion with APC nodes operating in microprocessor mode. The
second Configuration Controller interacts with APC nodes
operating in custom processor mode.
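The split into two Configuration Controller types suggests a simple dispatch on each node's APC mode. In the sketch below the controller classes and the strings they return are placeholders for the low-level signals a real controller would generate.

```python
from enum import Enum
from typing import Dict, List

class ApcMode(Enum):
    MICROPROCESSOR = "microprocessor"
    CUSTOM_PROCESSOR = "custom_processor"

class MicroprocessorConfigController:
    def apply(self, node: str, processes: List[str]) -> str:
        # Placeholder for starting/stopping software processes on the node's CPU.
        return f"{node}: run processes {sorted(processes)}"

class CustomProcessorConfigController:
    def apply(self, node: str, processes: List[str]) -> str:
        # Placeholder for loading the FPGA configuration that implements each process.
        return f"{node}: load FPGA configuration for {sorted(processes)}"

CONTROLLERS = {ApcMode.MICROPROCESSOR: MicroprocessorConfigController(),
               ApcMode.CUSTOM_PROCESSOR: CustomProcessorConfigController()}

def effect_deployment(deployment: Dict[str, List[str]], modes: Dict[str, ApcMode]) -> List[str]:
    # Route each node's share of a new deployment to the controller for that node's mode.
    return [CONTROLLERS[modes[node]].apply(node, procs) for node, procs in deployment.items()]

actions = effect_deployment({"dp1": ["b", "a"], "dp2": ["c"]},
                            {"dp1": ApcMode.MICROPROCESSOR, "dp2": ApcMode.CUSTOM_PROCESSOR})
for line in actions:
    print(line)
```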
Reliable Platform ("RP") Middleware
The role of WW Technology's RP in the overall EAFTC
solution is that of Software Implemented Fault Tolerance
(SIFT). SIFT is a fault tolerant technique that relies on software to provide redundancy at the process level. (See, e.g., Daniel P. Siewiorek and Robert S. Swarz, Reliable Computer Systems: Design and Evaluation, 3rd edition, MA: AK Peters Ltd., 1998, herein entirely incorporated by reference and to which the reader is directed for further information.) The RP
manages the fault tolerance of applications and services dis-
tributed across clusters of processors by establishing a con-
sistent framework and common context in which the system
operates.
In one preferred arrangement, RP consists of a set of ser-
vices that facilitate the implementation of reliable systems
through the dependable management of redundant/replicated
resources. RP addresses the needs of composing systems
utilizing COTS hardware and software components, as it
offers a software based solution that provides transparent
Fault Detection, Isolation and Removal ("FDIR") services,
enabling hosted applications to provide uninterrupted deliv-
ery of service in the presence of faults.
FIG. 8 illustrates an exemplary block diagram 300 of reliable platform middleware that may be utilized with the EAFTC controller illustrated in FIG. 1, depicting the RP and its relationship to other software elements of the system. The main RP framework components are described as follows.
Local Services 302 are services that are local to each processor in the distributed system. These services provide the local functionality a processor requires to perform useful work in a cluster. Examples of these types of services include but are not limited to networking, local scheduling, timing, and inter-process communications.
Cluster Synchronization 304 establishes a dependable distributed time base that is consistent across the entire system. This service is based on a message-passing technique and uses local physical clocks at each component to form a logical system clock. Preferably, Cluster Synchronization 304 is scalable and efficiently establishes the time base across processors. This time base may be used as a backbone for scheduling distributed operations across the cluster.
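The text does not specify RP's synchronization algorithm. For illustration only, the sketch below uses the fault-tolerant average, a well-known technique in which each node discards the f largest and f smallest observed clock offsets and corrects its clock by the mean of the rest; with more than 3f clocks, up to f faulty clocks cannot pull the correction outside the range of the good ones. Message delays are ignored and the node names are hypothetical.

```python
from typing import Dict, List

def fault_tolerant_average(offsets: List[float], f: int) -> float:
    # Drop the f largest and f smallest offsets, then average the remainder.
    if len(offsets) <= 3 * f:
        raise ValueError("need more than 3f clock readings")
    trimmed = sorted(offsets)[f:len(offsets) - f]
    return sum(trimmed) / len(trimmed)

def resync_round(local_clocks: Dict[str, float], f: int) -> Dict[str, float]:
    # One idealized round: every node reads every clock (no message delay) and
    # corrects its own clock by the fault-tolerant average of the observed offsets.
    corrected = {}
    for node, own in local_clocks.items():
        offsets = [other - own for other in local_clocks.values()]
        corrected[node] = own + fault_tolerant_average(offsets, f)
    return corrected

clocks = {"dp1": 100.0, "dp2": 101.0, "dp3": 99.0, "dp4": 1000.0}  # dp4's clock is faulty
print(resync_round(clocks, f=1))
# {'dp1': 100.5, 'dp2': 100.5, 'dp3': 100.5, 'dp4': 100.5}
```

The faulty reading is trimmed away by every node, so the shared logical clock settles between the readings of the correct clocks.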
System Configuration Services 306 establish and control the configuration of the cluster. The cluster configuration comprises the system's physical resources and logical capabilities. The System Configuration Service interacts directly with the EAFTC Fault Tolerant Node component, which in turn communicates with the Fault Tolerant Controller. The EAFTC controller sends its generated deployment via the Fault Tolerant Controller/Node to each processor's System Configuration Service, where the deployment changes are finally effected.
System Monitoring Services 314 supply the system with an ability to dynamically assess the health of the cluster and localize failed processors and application processes. Assessments are made with a cluster-wide perspective using distributed decision-making and integrated monitoring information from across the cluster. Failure notifications from this service may be forwarded to the EAFTC Health Monitor via the Fault Tolerant Controller/Node components.
Process Group Management: one preferred approach for enhancing the availability and dependability of payload applications relies on replication. The set of replicated instances is managed as a "process group." This is a peer-to-peer entity in which the support services of each replica constantly check the performance/behavior of the local replica against that of its remote peers.
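A process group's replica comparison can be sketched as a majority check over one execution cycle's outputs. This is a simplification: a single function stands in for the peer-to-peer exchange, on the assumption that every replica's support services see the same set of outputs and therefore reach the same verdict. Replica names and values are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple

def check_process_group(outputs: Dict[str, Hashable]) -> Tuple[Hashable, List[str]]:
    """Compare each replica's output for one cycle with its peers.

    Returns the majority output and the sorted names of replicas that diverged from it.
    Raises if no strict majority exists, since divergence cannot then be attributed.
    """
    if not outputs:
        raise ValueError("empty process group")
    value, votes = Counter(outputs.values()).most_common(1)[0]
    if votes * 2 <= len(outputs):
        raise RuntimeError("no strict majority among replicas")
    divergent = sorted(name for name, out in outputs.items() if out != value)
    return value, divergent

print(check_process_group({"replica_a": 42, "replica_b": 42, "replica_c": 41}))
# (42, ['replica_c'])
```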
Scheduling provides a scheduling mechanism that is available to the hosted applications. This mechanism initially provides indications to application processes as to when to perform their execution cycles and when interaction with other support services may be performed. This scheduling mechanism is based on the common time base established through cluster synchronization, so operations controlled by this scheduling service can be coordinated in time across all elements of the cluster.
Data Integrity 308 provides consistent data sets across replicas. A deviation from this consistent data by a replica is interpreted as an error by that replica. This capability also allows hosted applications to expose internal state data, facilitating warm starts of additional resources as they come online: additional replicas may join an established group by adopting the internal state of the existing replicas.
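The warm-start path described for Data Integrity 308 can be sketched as a state transfer in which a joining replica adopts the internal state the existing replicas agree on. The `Replica` type and the requirement that all existing states be identical before the transfer are assumptions for illustration.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class Replica:
    name: str
    state: Dict[str, Any] = field(default_factory=dict)

def warm_start(group: List[Replica], newcomer: Replica) -> None:
    # Let `newcomer` join `group` by adopting the state the existing replicas agree on.
    if not group:
        raise ValueError("no established group to join")
    reference = group[0].state
    if any(r.state != reference for r in group[1:]):
        # A replica deviating from the consistent data set is treated as in error.
        raise RuntimeError("existing replicas disagree; resolve before warm start")
    newcomer.state = copy.deepcopy(reference)   # independent copy, not a shared reference
    group.append(newcomer)

group = [Replica("r1", {"frame": 7, "accum": [1, 2]}),
         Replica("r2", {"frame": 7, "accum": [1, 2]})]
joiner = Replica("r3")
warm_start(group, joiner)
print(joiner.state, len(group))  # {'frame': 7, 'accum': [1, 2]} 3
```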
RP 312 offers its services in a flexible manner, supporting a distribution of applications that is not necessarily tied to the physical realization of the cluster. In one preferred arrangement, RP utilizes a clustering approach to manage a processor cluster. Application replicas are hosted on each RP-enabled resource via the RP Interface (RPI). This renders the application "unaware" of the fact that it has been replicated, or to what extent it has been replicated. RP works in the background to monitor application behavior and to recognize when a fault has resulted in application divergence. RP not only provides dependability to hosted applications, but is in-and-of itself dependable, capitalizing internally on the same techniques and properties conveyed to hosted applications.
The EAFTC system combines a set of innovative technologies to enable a system and/or method for the efficient use of high performance COTS processors while these processors operate in generally harsh space environments. An enhanced level of performance may be achieved while a required level of system availability is maintained. For example, FIG. 9 illustrates one example 400 of applying the EAFTC system illustrated in FIG. 1. On the left side of FIG. 9, a particular satellite's orbit 402 is illustrated as comprising a set of four regions: a first region 404, a second region 406, a third region 408, and a fourth region 410. Each region 404, 406, 408, and 410 has a different radiation environment associated with it. Although only four regions and four radiation environments are illustrated, those of ordinary skill in the art will recognize that more or fewer than four regions may be employed.
As the EAFTC system travels through its orbit from one region to the next, the system collects measurements of the SEU Alarm response to the radiation. This SEU Alarm response 414 is illustrated as a function of orbit position (404a-410a), fluctuating as the spaceborne craft traverses from one region to the next. The EAFTC system dynamically creates regions based on these measurements and based in part on the on-board processing system's sensitivity to radiation.
As the EAFTC system enters and leaves a particular region, the system dynamically configures the fault tolerance to match the environment. The overall result is an increase in the system's performance, as depicted by curve 420. Curve 420 represents the EAFTC system's instructions per unit of power, in this case Millions of Instructions Per Second per Watt ("MIPS/Watt"). When compared to a conventional system designed for a worst case scenario, illustrated as a first alternative line 422, the average performance of an EAFTC system, illustrated as a black dotted line 424, will be higher. Though the overall performance gain depends on the particular orbit and on the on-board processing system's sensitivity and adaptability, EAFTC provides a solution that is at least as good as the conventional approach.
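The MIPS/Watt argument of FIG. 9 can be reproduced with back-of-the-envelope arithmetic. The sketch below takes the 1300 MIPS and 20 W figures from Table 2, assumes that N-way replication delivers one processor's worth of useful work while all N processors draw full power, and uses an invented four-region orbit profile; the numbers illustrate the trend only and are not results from the patent.

```python
from typing import List, Tuple

RAW_MIPS = 1300.0   # per APC, from Table 2
POWER_W = 20.0      # maximum power draw per APC, from Table 2 (assumed constant)

def useful_mips_per_watt(replicas: int) -> float:
    # With N-way redundancy, N processors deliver one processor's worth of useful work.
    return RAW_MIPS / replicas / POWER_W

def orbit_average(regions: List[Tuple[float, int]]) -> float:
    # regions: (fraction of orbit time, replicas used there); fractions should sum to 1.
    return sum(frac * useful_mips_per_watt(n) for frac, n in regions)

# Hypothetical four-region orbit: mostly benign, with one high-radiation pass.
adaptive = orbit_average([(0.4, 1), (0.3, 1), (0.2, 2), (0.1, 3)])
worst_case = orbit_average([(1.0, 3)])   # triple redundancy everywhere
print(round(adaptive, 2), round(worst_case, 2))  # 54.17 21.67
```

Because the worst-case design pays for triple redundancy even in benign regions, any orbit that spends time in low-threat regions raises the adaptive average above the worst-case line.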
Therefore, the EAFTC system as illustrated in FIG. 1 mitigates faults, and in particular SEUs in COTS devices. Such fault mitigation is accomplished while also increasing the system's overall efficiency and capacity. EAFTC system 10 accomplishes this by optimally applying fault tolerance over the life of the mission as demanded by task criticality and environmental measurements.
The proposed EAFTC system results in a novel technology for on-board payload processing. The disclosed EAFTC is a COTS based computing system architecture and associated system control algorithms that together provide a reliable on-board processing platform. Applicants' EAFTC system senses an environment, assesses the fault threat presented by the environment, and adjusts the processing system's fault tolerance to effectively mitigate certain threats presented by the environment. In this manner, EAFTC optimally employs fault tolerance based on historical and environmental conditions. EAFTC can therefore also increase the overall system efficiency, in terms of units of computation per Watt.
Exemplary embodiments of the present invention have been described. Those skilled in the art will understand, however, that changes and modifications may be made to these embodiments without departing from the true scope and spirit of the present invention, which is defined by the claims.
We claim:
1. A method of adapting fault tolerant computing, said
method comprising the steps of: measuring an environmental
condition representative of an environment; analyzing an on-
board processing system for sensitivity to said measured
environmental condition; and determining whether to recon-
figure a fault tolerance of said on-board processing system,
based at least in part on said measured environmental condi-
tion, with a controller comprising: an alert level generator
configured to evaluate a potential environmental threat based
on said measured environmental condition, the alert level
generator comprising an adaptive linear predictive filter that
generates a particle flux prediction; a deployment plan that is
user definable to describe one or more application tasks and
system thresholds; and a deployment generator configured to
receive data from the alert level generator and the deployment
plan.
2. The method of claim 1 further comprising the step of
reconfiguring said fault tolerance of said on-board process-
ing system based in part on said measured environmen-
tal condition.
3. The method of claim 2 wherein said fault tolerance of
said on-board processing system is reconfigured to match
said environment.
4. The method of claim 2 wherein said fault tolerance of
said on-board processing system is reconfigured based in part
on historical data.
5. The method of claim 1 wherein said step of measuring
said environmental condition occurs during an orbit position
of said on-board processing system.
6. The method of claim 1 wherein said on-board processing
system is included in a space borne asset.
7. The method of claim 1 wherein said step of measuring
environmental conditions comprises the step of detecting
radiation conditions.
8. The method of claim 1 wherein said step of measuring
said environmental condition comprises monitoring flux of
high-energy particles that cause single event upsets.
9. The method of claim 1 further comprising providing an
alarm suite, said suite providing an alarm signal representa-
tive of an environment threat.
10. The method of claim 1 further comprising the step of
collecting measurements of an alarm in response to said envi-
ronmental condition on a space borne spacecraft.
11. The method of claim 10 wherein said step of collecting
measurements of an alarm in response to said environmental
condition further comprises the step of
sensing proton and heavy-ion fluxes in space.
12. The method of claim 10 further comprising the step of
generating a particle flux prediction based in part on said
collected measurements of said environmental condi-
tion.
13. The method of claim 1 further comprising the step of
operating said on-board processing system in a plurality of
operational modes, said on-board processing system com-
prising a COTS processor.
14. The method of claim 1 wherein said step of determining
whether to reconfigure said fault tolerance of said on-board
processing system further comprises the step of
predicting a near future threat of said system.
15. A system for environmentally adaptive fault tolerant
computing, said system comprising: a sensor that senses a
characteristic of a dynamic environment and generates an
output signal based on said characteristic; a system configu-
ration controller that is executable by a processor and that
receives said output signal, said controller assessing a poten-
tial environmental threat to an availability of said system
based at least in part on said output signal, said controller
comprising: an alert level generator configured to evaluate the
potential environmental threat, the alert level generator com-
prising an adaptive linear predictive filter that generates a
particle flux prediction; a deployment plan that is user defin-
able to describe one or more application tasks and system
thresholds; and a deployment generator configured to receive
data from the alert level generator and the deployment plan;
and a computing device that receives an input from said
controller; wherein a configuration of said computing device
is adapted to effectively mitigate said potential environmental
threat to the availability of said system.
16. The system of claim 15 wherein said sensor comprises
an environmental sensor suite.
17. The system of claim 15 wherein said sensor senses
radiation conditions.
18. The system of claim 15 wherein said computing device
is a payload computer system.
19. The system of claim 15 wherein said controller assess-
ing said potential environmental threat to said availability of
said system is based in part on said input and also in part on
previously measured environmental conditions.
20. The system of claim 15 wherein said controller oper-
ates in a plurality of operational modes.
21. The system of claim 15 wherein said controller com-
prises a COTS processor.
22. A system for environmentally adaptive fault tolerant computing, the system comprising: an environmental sensor suite comprising: a plurality of sensors including at least one single event upset alarm that provides an alarm signal representative of a potential environmental threat; a controller that is executable by a processor and in operative communication with the environmental sensor suite, the controller comprising: an environmental server configured to receive sensory input signals from the plurality of sensors; a history database configured to contain prior sensor measurements; an alert level generator configured to evaluate the potential environmental threat based on the sensory input signals and the prior sensor measurements, the alert level generator comprising an adaptive linear predictive filter that generates a particle flux prediction; a deployment plan that is user definable to describe one or more application tasks and system thresholds; a deployment generator configured to receive data from the alert level generator and the deployment plan; and a computer health monitor in operative communication with the deployment generator; and a computer comprising at least one data processor in operative communication with the controller and configured to effectively mitigate the potential environmental threat; wherein the deployment generator of the controller is configured to evaluate the potential environmental threat to the computer based on data from the deployment plan, the computer health monitor, and the alert level generator, and send a deployment signal to the computer to counter the potential environmental threat.
