A fault tolerant parallel computing architecture for remote sensing satellites by Lim, S
  
A thesis submitted in fulfilment of the requirements for 
the degree of Doctor of Philosophy    
Sharon Lim Siok Lin 
PhD   
School of Computer Science and Information 
Technology 
RMIT University 
August 2009
 
Declaration 
I certify that except where due acknowledgement has been made, the work is that of the 
author alone; the work has not been submitted previously, in whole or in part, to qualify for 
any other academic award; the content of the thesis is the result of work which has been 
carried out since the official commencement date of the approved research program; any 
editorial work, paid or unpaid, carried out by a third party is acknowledged; and, ethics
procedures and guidelines have been followed.
  
Sharon Lim Siok Lin 
31st August 2009
 
Acknowledgments 
I would like to thank my supervisor, Prof. Heiko Schröder, for all the professional advice and 
endless moral support that he has shown all these years. The whole research path has been 
enlightening and a very pleasant one too.
Special thanks to Prof. Ian McLoughlin and Dr Timo Bretschneider for their ideas and help in 
the early phase of the project. This work involved a great deal of cross-faculty research, and I 
thank my colleagues Mok Yin Liong, Jin Zhanli and Zheng Jialing for their technical 
support in some parts of this work. I would also like to express special gratitude to Philip 
Teng for proof-reading my thesis and for the valuable discussions we had. His excellent 
suggestions have helped make this thesis one of which I am truly proud.
 
I would like to thank my husband, Tay Shaw Wee, for his constant support, encouragement 
and patience all these years, which have been important in the undertaking of this research. 
Thanks to my parents for their moral support over the years, and to my lovely kids for giving 
me inspiration!
Publications Resulting from this Thesis 
Portions of the material in this thesis have previously appeared in the following 
publications and presentations: 
 
[1] I.V. McLoughlin, V. Gupta, S. Singh, S.L. Lim, T. Bretschneider, Fault Tolerance Through 
Redundant COTS Components for Satellite Processing Applications, Proceedings of the 
Fourth International Conference on Information, Communications & Signal Processing and 
IEEE Pacific-Rim Conference on Multimedia, Singapore, 2003.
[2] S.L. Lim, T. Bretschneider, I.V. McLoughlin, H. Schröder, Reconfigurable, Fault 
Tolerant and High Performance Payload for Space Missions, Proceedings of the 
International Conference on Military and Aerospace Programmable Logic Devices, 
Washington, 2003.
[3] C.Y. Chua, S.L. Lim, D.L. Douglas, High Performance, Reliable and Flexible Computing 
Payload for Space Missions, IEEE Tencon Conference, Chiangmai, 2004.
[4] S.L. Lim, H. Schröder, A Fault Tolerant Parallel Computing Architecture for Onboard 
Satellite Image Processing, RMIT Research Conference, 2004.
[5] S.L. Lim, H. Schröder, A COTS Parallel Computing Architecture for Space Operating in a 
Linux-based Platform, RMIT Research Conference, 2005.
[6] S.L. Lim, Y.L. Mok, Reliability Prediction of a Fault Tolerant COTS Parallel Computing 
Payload, Proceedings of the International Conference on Military and Aerospace 
Programmable Logic Devices, Annapolis, 2008.
[7] Internal XSat publication on XSat Reliability Modelling.
[8] Internal XSat publication on XSat Radiation Analysis.
Abbreviations Used  
CAN Controller Area Network  
COTS Commercial-off-the-shelf  
DSP Digital Signal Processors  
EAFTC Environmentally Adaptive Fault Tolerant Computing  
EDAC Error Detection and Correction  
EM Engineering Model  
FM Flight Model  
FTMPA Fault Tolerant Mesh Processing Array  
GCRs Galactic cosmic rays  
GIPs Giga-instructions per second  
GPS Global Positioning System  
ISRO Indian Space Research Organisation  
ITAR International Traffic in Arms Regulations   
LEO Low Earth Orbit  
LVDS Low-Voltage Differential Signalling  
MIMD Multiple-Instruction Multiple-Data  
MIPS Million Instructions Per Second  
MOPS Million Operations Per Second  
OBC Onboard Computer  
OS Operating System  
PN Processing Node  
PPC Power PC  
PPU Parallel Processing Unit  
QM Qualification Model  
QoS Quality-of-Service  
REE NASA Remote Exploration and Experimentation   
SDRAM Synchronous Dynamic Random Access Memory  
SEUs Single-event-upsets  
SIMD Single-Instruction Multiple-Data  
SSC Synchronous Serial Interface  
SSR Solid State Recorder  
SSTL Surrey Satellite Technology Limited  
TID Total Ionising Dosage  
TMR Triple Modular Redundancy  
VTGB Variable Time Global-slotted Backplane  
Table of Contents 
Declaration ................................................................................................................................................................ ii 
Acknowledgments ..................................................................................................................................................iii 
Publications Resulting from this Thesis ........................................................................................................ iv 
Abbreviations Used ................................................................................................................................................. v 
Table of Contents ...................................................................................................................................................vii 
Table of Figures .................................................................................................................................................... xiii 
List of Tables ...........................................................................................................................................................xvi 
Abstract ........................................................................................................................................................................ 1
1 Introduction ...................................................................................................................................................... 2
1.1 Motivation ................................................................................................................................................ 2
1.2 Research Test-Bed ................................................................................................................................ 3
1.3 XSat Mission Requirements .............................................................................................................. 5
1.3.1 Physical and Power Constraints ........................................................................................... 5
1.3.2 Reliability Requirements ......................................................................................................... 5
1.3.3 Re-configurability Requirements ......................................................................................... 5
1.3.4 Software Requirements ............................................................................................................ 6
1.3.5 Remote Sensing Mission Requirements ............................................................................ 6
1.4 Research Focus and Contribution of this Thesis ...................................................................... 6
1.4.1 First Research Question ........................................................................................................... 7
1.4.2 Second Research Question ...................................................................................................... 7
1.4.3 Third Research Question ......................................................................................................... 7
1.4.4 Fourth Research Question ....................................................................................................... 8
1.4.5 Fifth Research Question ........................................................................................................... 8
1.4.6 Main Contributions ..................................................................................................................... 9
1.5 Thesis Outline ...................................................................................................................................... 10
2 Literature Review ....................................................................................................................................... 12
2.1 Introduction ......................................................................................................................................... 12
2.2 Survey of Space Processing Platforms ...................................................................................... 12
2.2.1 Traditional Radiation-hardened Processor Roadmap .............................................. 12
2.2.2 COTS Processor Roadmap .................................................................................................... 14
2.2.3 Space Computing Platforms Based on COTS Processors ......................................... 15
2.2.4 Fault Tolerance and Fault Handling Techniques for COTS-Based Space 
Processor Boards ......................................................................................................................................... 17
2.3 Survey of Parallel Processing Architectures ........................................................................... 18
2.3.1 Parallel Processing Models .................................................................................................. 18
2.3.2 Communication Structures for Parallel Processing ................................................... 19
2.3.3 Fault Tolerant Networks ....................................................................................................... 20
2.4 Survey of COTS-based Parallel Computing Modules Developed for Space Missions ..... 20
2.4.1 Scalability .................................................................................................................................... 21
2.4.2 MIPS/Watt Performance....................................................................................................... 22
2.4.3 System Reliability..................................................................................................................... 23
2.4.4 Application Support Level .................................................................................................... 24
2.4.5 Cost Effectiveness .................................................................................................................... 25
2.5 Technology Gaps in Parallel Processing for Space ............................................................... 26
2.6 Conclusion ............................................................................................................................................ 27
3 Design Methodology of a Low Cost, High Performance and Fault Tolerant Payload ...... 28
3.1 Onboard Computing Platform Design Methodology ........................................................... 28
3.2 PPU Design Approach ...................................................................................................................... 28
3.3 PPU Architectural Tradeoffs and Design .................................................................................. 29
3.3.1 Parallelism Structure .............................................................................................................. 30
3.3.2 Software Reconfiguration ..................................................................................................... 30
3.3.3 Fault Tolerance ......................................................................................................................... 30
3.3.4 Software Development .......................................................................................................... 31
3.3.5 Conceptual Diagram ................................................................................................................ 31
3.4 PPU Component Selection Process ............................................................................................. 33
3.4.1 Step 1: Deriving a possible component set .................................................................... 34
3.4.2 Step 2: PCB component placement checks .................................................................... 40
3.4.3 Step 3: Power Operation Scenario Profiling ................................................................. 40
3.4.4 Step 4: Thermal analysis to check for component temperature violations ..... 42
3.5 PPU Radiation Analysis ................................................................................................................... 43
3.5.1 Radiation Effects and Analysis in the PPU ..................................................................... 44
3.5.2 Analysis of XSat Radiation Environment ........................................................................ 44
3.5.3 Implication of XSat Radiation Environment Analysis Results ............................... 49
3.5.4 Radiation Analysis on AX1000s Actel FPGA ................................................................. 50
3.5.5 Radiation Mitigation Techniques for AX1000.............................................................. 51
3.6 Part Reliability Modelling ............................................................................................................... 52
3.7 System Reliability Modelling and System Trade-off ........................................................... 53
3.8 Conclusion ............................................................................................................................................ 59
4 PPU Fault Tolerance and Fault Management ................................................................................... 60
4.1 Introduction ......................................................................................................................................... 60
4.2 PPU Hardware Redundancies and Reconfiguration ............................................................ 61
4.3 Communication Network Fault Handling ................................................................................ 62
4.3.1 PPU Communication Network Topology ........................................................................ 62
4.3.2 Introduction to the Variable Time Global-slotted Backplane (VTGB) Bus....... 63
4.3.3 VTGB Framing Structure ....................................................................................................... 65
4.3.4 VTGB Fault Tolerant Features ............................................................................................ 71
4.4 Processor Fault Handling ............................................................................................................... 73
4.5 Flash Chip Fault Handling .............................................................................................................. 75
4.6 Network Element Fault Handling ............................................................................................... 76
4.7 Reconfiguration of the Parallel Processing Architecture .................................................. 76
4.7.1 Fault Tolerant Mapping of Virtual-to-Physical Resources ...................................... 76
4.7.2 Selection of System Node ...................................................................................................... 77
4.7.3 Graceful Degradation .............................................................................................................. 77
4.7.4 Multiple Code Redundancy .................................................................................................. 78
4.7.5 Operational Trade-offs ........................................................................................................... 78
4.8 Software Fault Tolerance ............................................................................................................... 79
4.9 Verification of the Fault Tolerant Computing Architecture ............................................. 79
4.9.1 Verification of Quality of Service ....................................................................................... 79
4.9.2 Verification of Operation Scenarios.................................................................................. 81
4.10 Conclusion ............................................................................................................................................ 84
5 Fault Tolerant Mesh Processing Array ............................................................................................... 85
5.1 Introduction ......................................................................................................................................... 85
5.2 PPU Parallel Processing and Communication Needs .......................................................... 85
5.3 Concepts of a Fault Tolerant Mesh Processing Array (FTMPA) ..................................... 86
5.4 FTMPA Theoretical Framework .................................................................................................. 88
5.4.1 Part 1: The FTMPA Physical Network Graph................................................................ 88
5.4.2 Part 2: FTMPA Global Remapping Strategy................................................................... 89
5.4.3 Part 3: FTMPA Local Remapping Strategy..................................................................... 90
5.4.4 Global Mapping versus Local Mapping ........................................................................... 90
5.5 FTMPA Global Remapping Strategy ........................................................................................... 91
5.5.1 Working Processor Set ........................................................................................................... 91
5.5.2 Global Remapping Algorithm .............................................................................................. 92
5.5.3 Proof of FTMPA Global Remapping Algorithm ............................................................ 96
5.6 Local Remapping Strategy ...........................................................................................................103
5.6.1 Local Processor Mapping ....................................................................................................103
5.6.2 Local Interconnection Switch Network ........................................................................103
5.6.3 Local Switch Configuration Mapping .............................................................................105
5.7 Verification of the FTMPA ............................................................................................................108
5.7.1 Verification Approach ..........................................................................................................108
5.7.2 Verification Conditions ........................................................................................................109
5.7.3 Verification Cases ...................................................................................................................110
5.8 Conclusion ..........................................................................................................................................112
6 A Practical High Performance Computing Platform ...................................................................113
6.1 Introduction .......................................................................................................................................113
6.2 PPU Hardware Schematic Design ..............................................................................................113
6.2.1 FPGA Chip and Package Selection ...................................................................................113
6.2.2 FPGA Testing Design Considerations .............................................................................115
6.2.3 Meeting High Density Form Factor Requirement .....................................................116
6.2.4 Hardware Interface Design Consideration ..................................................................116
6.2.5 Power-up Consideration .....................................................................................................117
6.3 PPU PCB Routing Considerations ..............................................................................................118
6.4 PPU PCB Manufacturing Considerations ................................................................................118
6.5 FPGA Firmware Design .................................................................................................................119
6.5.1 Overview of FPGA Design ...................................................................................................119
6.5.2 Meta-Stability ...........................................................................................................................121
6.5.3 CAN Interface Protocol ........................................................................................................121
6.5.4 SSR Interface Protocol ..........................................................................................................122
6.5.5 Runtime Configuration of Data Distribution from SSR ...........................................123
6.5.6 Flash Interface Protocol ......................................................................................................124
6.5.7 SA-1110 Interface Protocol................................................................................................126
6.6 PPU Application Software Interface .........................................................................................127
6.6.1 Booting Process ......................................................................................................................127
6.6.2 SA-1110 Communication Mechanism with FPGA.....................................................129
6.6.3 Parallel Processing Paradigm ...........................................................................................130
6.7 Conclusion ..........................................................................................................................................131
7 PPU Qualification Process ......................................................................................................................132
7.1 Introduction .......................................................................................................................................132
7.2 Qualification Process Flow ...........................................................................................................132
7.3 PPU QM Module Level Functional Testing ............................................................................133
7.3.1 Introduction .............................................................................................................................133
7.3.2 Debugging Facility incorporated in the PPU Hardware .........................................134
7.3.3 Power Module Interface Test ............................................................................................136
7.3.4 CAN Link Interface Test .......................................................................................................137
7.3.5 LVDS Link Interface Test .....................................................................................................138
7.3.6 Environmental Stress Screening (ESS) Qualification..............................................139
7.4 Integration of the PPU QM into Flat Satellite .......................................................................140
7.5 Thermal Environmental Test (ISRO).......................................................................................142
7.6 Vibration Test (ISRO).....................................................................................................................144
7.7 Conclusion ..........................................................................................................................................145
8 Conclusion ....................................................................................................................................................146
8.1 Research Achievements ................................................................................................................146
8.1.1 Achieving a Feasible Design Approach for a Space Computing Payload for 
High Computational Performance and Reliability .......................................................................146
8.1.2 Achieving an Efficient Fault Tolerant Communication Network Implemented 
on a FPGA Platform...................................................................................................................................147
8.1.3 Achieving a Fault Tolerant Processing Array Design Specifically for Remote 
Sensing Missions ........................................................................................................................................147
8.1.4 Demonstrating a Practical and Feasible Implementation of a Computing 
Payload on a LEO Satellite mission ....................................................................................................148
8.1.5 Qualifying PPU Design and Implementation for Launch and Operation in 
Space Environment...................................................................................................................................149
8.2 Transiting Commercial components for Space-borne Implementation ...................150
8.2.1 The PPU as a Demonstration Test Bed ..........................................................................150
8.2.2 The PPU as a State of the Art .............................................................................................151
8.3 Future Work .......................................................................................................................................153
8.4 Final Conclusion ...............................................................................................................................155
9 Appendix A: Part Failure Rates for the PPU (in FITS) ................................................................156
10 References ...............................................................................................................................................160
 
Table of Figures 
Figure 1-1: XSat System Configuration ........................................................................................................... 4
Figure 3-1: Conceptual Diagram..................................................................................................................... 32
Figure 3-2: Component Selection Flow Chart ........................................................................................... 35
Figure 3-3: PPU Hardware Design ................................................................................................................. 39
Figure 3-4: Inter-cluster network configuration options .................................................................... 40
Figure 3-5: (a) Top Placement View (b) Bottom Placement View ........................................... 40
Figure 3-6: PPU PCB Component Zoning for Power/Thermal Analysis......................................... 41
Figure 3-7: The PPU Thermal Plot for Zone 2 ........................................................................................... 43
Figure 3-8: World distribution of Electron Flux encountered in the XSat orbit ......................... 45
Figure 3-9: Integral Electron and Proton Fluxes for Different Orbits ............................................. 46
Figure 3-10: Electron and Proton Fluxes for XSAT Orbit at different Altitudes ......................... 46
Figure 3-11: Solar Proton Fluences for XSAT............................................................................................ 47
Figure 3-12: Dose Depth Curve for Silicon Target behind Al Shielding.......................................... 48
Figure 3-13: Geometrical model of XSat using the SPENVIS Software .......................................... 49
Figure 3-14: Cross Section Vs LET Curve for AX1000 R-Cells ............................................................ 50
Figure 3-15: Functional Diagram of EDAC RAM....................................................................................... 52
Figure 3-16: The PPU Reliability Block Diagram plotted using Reliasoft software................... 54
Figure 4-1: Variable Time Global-slotted Backplane (VTGB) Bus spanning across all four 
processing clusters ............................................................................................................................................... 65
Figure 4-2: VTGB Short (a) and Long (b) Message packet formats.................................................. 66
Figure 4-3: Processor Fault Handling Routines in the FPGA .............................................................. 74
Figure 4-4: PPU Operation Flow ..................................................................................................................... 81
Figure 5-1: Mesh Processing Array ................................................................................................................ 87
Figure 5-2: A practical mapping of physical processors and FPGA resources to a logical mesh 
processor array ...................................................................................................................................................... 87
Figure 5-3: A parallel processing architecture with a physical processor network (a), from 
which a logical mesh processor network (b) is constructed. ............................................................. 89
Figure 5-4: Global Mapping Partitions ......................................................................................................... 90
Figure 5-5: FPGA Label ....................................................................................................................................... 94
Figure 5-6: (a) A mapping scenario where (A,B,C,D) = n2 (b) A mapping scenario that 
illustrates vertical partition line shift (c) Mapping path definition (d) A mapping scenario 
that illustrates both vertical and horizontal partition line shifts ...................................................... 94
Figure 5-7: (a) illustrates the mesh partition diagram for mapping scenario 0 < F < n (b) 
illustrates one possible mapping for this scenario ................................................................................. 98
Figure 5-8: (a) illustrates the mesh partition diagram for mapping scenario F = n (b) 
illustrates one possible mapping for this scenario ...............................................................................100
Figure 5-9: (a) illustrates the mesh partition diagram for mapping scenario n < F ≤ 2n (b) 
illustrates one possible mapping for this scenario ...............................................................................101
Figure 5-10: (a) illustrates the mesh partition diagram for mapping scenario 2n/3 F<0, (b) 
illustrates one possible mapping for this scenario ...............................................................................102
Figure 5-11: Local Mapping ............................................................................................................................103
Figure 5-12: Time-multiplexed Switch ......................................................................................................104
Figure 5-13: Inter-processor Communication Link Scanning Sequence......................................106
Figure 5-14: 4-bit Switch Configuration Identifier ...............................................................................107
Figure 5-15: Processor and Link Mappings in the Mesh.....................................................................109
Figure 6-1: FPGA CQ352 and FBGA484 Package Comparison .........................................................114
Figure 6-2: FPGA Prototype Board with AX FPGA Sockets................................................................115
Figure 6-3: FPGA Architectural Block Diagram......................................................................................120
Figure 6-4: SSC Interface with the C515C Microcontroller ...............................................................122
Figure 6-5: SSR Interface Definition............................................................................................................123
Figure 6-6: Header Frame Formats for (a) Image Distribution (b) Code Upload.....................124
Figure 6-7: FPGA interface to the triple flash chip ................................................................................125
Figure 6-8: Serial Flash VTGB Command Types.....................................................................................125
Figure 7-1: Debugging Facility on the PPU Main Electronic Board................................................134
Figure 7-2: PPU Daughterboard....................................................................................................................135
Figure 7-3: (a) Daughter-board connection to logic-analyser (b) Daughter-board connection 
to Silicon-explorer ..............................................................................................................................................136
Figure 7-4: NI PXI-6562 Instrument ...........................................................................................................139
Figure 7-5: Thermotron SE-360-2-2 Test Set-up ...................................................................................139
Figure 7-6: The PPU Temperature Cycling Profile ................................................................................140
Figure 7-7: Flat Satellite Test-bed ................................................................................................................141
Figure 7-8:  XSat Fully Assembled................................................................................................................141
Figure 7-9: ISRO Thermal Chamber Test Preparations ......................................................................142
Figure 7-10: Temperature sensor locations during Thermal Vacuum Test ...............................143
Figure 7-11: XSat Thermal Test Profile......................................................................................................143
Figure 7-12: Thermal Plot for the PPU.......................................................................................................144
Figure 7-13: Sine and Random Vibration Tests......................................................................................145
List of Tables 
Table 2-1: Radiation-hardened Processors ............................................................................................... 13
Table 2-2: COTS processors .............................................................................................................................. 14
Table 3-1: Component Selection Checklist ................................................................................................. 34
Table 3-2: Comparing the PPU Component Characteristics with Space Grade Options .......... 36
Table 3-3: Power Operation Scenario Profiling........................................................................................ 41
Table 3-4: The PPU Thermal Simulation Results for the various zones......................................... 42
Table 3-5:  Radiation Data for AX1000S...................................................................................................... 51
Table 3-6: Probability of N number of SA-1110s in Operation (duty cycle of 10%)................. 56
Table 3-7: Probability of N number of SA-1110s in Operation (duty cycle of 100%).............. 58
Table 4-1: VTGB Command Directives ......................................................................................................... 67
Table 4-2: VTGB Protocol Machine................................................................................................................ 68
Table 4-3: VTGB Event/Action Description ............................................................................................... 69
Table 4-4: Quality of Service Level ................................................................................................................ 80
Table 5-1: Global Clock Cycle Time-multiplexed Activities ...............................................................105
Table 5-2: Processor Switch Configuration Table .................................................................................107
Table 5-3: Global Link Switch Configuration Table ..............................................................................108
Table 6-1: FPGA Input / Output (I/O) Resource ....................................................................................114
Table 6-2: Chips with BGA/Leadless Package.........................................................................................116
Table 6-3: FPGA Pin Allocation......................................................................................................................117
Table 6-4: Entity Description .........................................................................................................................119
Table 6-5: SA-1110 Memory_mapped Read Registers in FPGA.......................................................126
Table 6-6:  SA-1110 Memory Mapped Write Registers in FPGA .....................................................127
Table 6-7: Flash Address Allocation ............................................................................................................128
Table 6-8: SDRAM Address Allocation .......................................................................................................128
Table 7-1: Power Interface Test Results ...................................................................................................137
Table 8-1: The PPU Specifications Summary ...........................................................................................151
Abstract 
This thesis is concerned with the design concepts of a fault tolerant, high performance 
parallel computing payload for remote-sensing missions. Current small satellite missions 
generally do not have high computational power onboard due to limitations of power, 
space, volume or budget. This thesis investigates a cost-effective way of designing space 
computing architectures that remain reliable despite the use of commercial-off-the-shelf 
(COTS) components.
The COTS-enabling technology from this work has achieved a high reliability figure for the 
PPU computing payload, designed using commercial grade processors, Field Programmable 
Gate Arrays (FPGAs), memory chips and serial flash chips. The optimal usage of resources in 
the PPU has made it a valuable high performance computing resource for small satellite 
missions. The PPU's computational power will enable a new class of space applications for 
small satellite missions. 
The computing payload proposed in this thesis is a parallel cluster of COTS processing 
nodes, interconnected using network elements that are based on COTS FPGAs. Part of this 
research work has been adopted for use in the Parallel Processing Unit (PPU) - a secondary 
payload onboard the XSat micro-satellite. The XSat is built by the Centre of Research for 
Satellite Technologies (CREST), and scheduled for launch in 2011. The satellite centre is 
located in Nanyang Technological University, Singapore. The author is a full-time project 
member in this centre, in charge of the PPU payload development. 
The computing payload uses parallelism of COTS processor nodes to achieve high 
computing performance, and fault tolerant schemes to maintain reliability. This thesis 
focuses on the provision of highly fault tolerant and reconfigurable networks that enable 
reliable communication not only among parallel processors, but also with memory chips 
and external interfaces. The provision of multiple communication schemes, consisting of an 
inter-cluster ring network and a mesh processor array that are both fault tolerant and 
reconfigurable, gives the payload a high probability of survival in the harsh space 
radiation environment. This is coupled with autonomous processor fault detection and 
recovery schemes. 
The PPU computing payload is also highly adaptive to changing reliability and computation 
needs, allowing a trade-off between the two at mission runtime. The PPU adopts industrial 
standards for part reliability computation and system reliability modelling. The PPU's
system reliability figure is a valuable check that the extent of fault tolerance is sufficient yet 
not over-catered. Over-catering of fault tolerant paths results in unnecessary wastage of 
valuable and expensive resources onboard the satellite.
  
1 Introduction 
1.1 Motivation 
This thesis investigates the architectural design of a high performance 
space computing platform suitable for small satellite missions. The motivation behind this 
work is the current trend towards increased autonomy on-board the satellite, as well as the 
usage of more complex and accurate onboard instruments. This trend has led to a rapid 
increase in the amount of data to be acquired, processed and stored onboard the spacecraft.  
The traditional approach of receiving and processing the data on the ground using powerful 
computers is now constrained by the limited downlink transmission bandwidth available. 
Thus, newer, high performance onboard processing and supercomputing capabilities are of 
interest to satellite builders and space agencies, to handle the massive volumes of data 
generated by onboard instruments before they are transmitted to the ground. 
At the same time, there is a growing trend in the use of small satellites [1,2] in space 
missions as alternatives to large conventional satellites. Rapid advancements in small 
satellite technologies have resulted in lower development cost and faster development time.  
Space access is more affordable now for many countries - especially with the entrance of 
new players in the satellite launch industry. However, as small satellites have limited 
resources of power, volume and mass, as well as lower cost budgets, there are challenges to 
designing computing platforms for them. The aim of the research is to propose enabling 
technologies for high performance computing onboard low cost small satellites. Such 
capability will add tremendous value to the small satellite missions as they move towards 
greater autonomy and intelligence, and perform tasks otherwise thought impossible for 
these satellites.
As onboard processing platforms need to survive the entire mission phase in a harsh 
radiation environment, they are traditionally implemented with space-grade components 
that are not susceptible to radiation damage in space. However, due to limited demand and 
high engineering costs, component vendors are not motivated to improve the technology of 
space-grade components at the same pace as conventional components [3]. Hence space 
grade components have generally failed to keep up in many aspects when compared to 
standard commercial-off-the-shelf (COTS) components. Besides, space-grade components 
have many drawbacks in terms of higher prices, increased size and larger power 
consumption. For example, COTS processors are generally at least one order of magnitude 
faster than their radiation-hardened counterparts, despite using fewer resources of power 
and space. Obviously, the usage of COTS components in space missions is advantageous if 
reliability and fault tolerance in the face of radiation and the harsh environment can be 
guaranteed. Therefore, an important research direction for this project is to explore 
computing architectures built using COTS components, which can achieve overall system 
reliability through a combination of fault tolerant techniques.
One possible approach is to adopt a design philosophy that combines the use of COTS 
components with ample component redundancy and faulty component replacement 
schemes, so that there is no single point of failure [4]. Parallelism of processors as a means 
of high performance computing and as a suitable platform to incorporate component 
redundancy is specifically researched in this project. 
This thesis combines the concepts of parallel computing and fault tolerant 
computing to propose a cost-effective design of a new generation of powerful onboard 
computers for space use. The thesis discusses the architecture, design process and 
methodology for the development of such onboard space computers. 
1.2 Research Test-Bed 
This research started as a collaboration between Nanyang Technological 
University (NTU) and the Royal Melbourne Institute of Technology (RMIT) to investigate 
various architectures for a space computing platform onboard NTU's first experimental 
micro-satellite, XSat. The result of this research was used in the actual development and 
implementation of a computing platform that will be flown as an experimental payload 
onboard XSat. This experimental payload is called the Parallel Processing Unit (PPU), and 
will serve as the research test-bed for the concepts proposed in this thesis.
XSat is a 120kg Polar and Sun-synchronous Low Earth Orbit (LEO) micro-satellite 
developed by CREST (Centre of Research for Satellite Technologies). CREST is a joint project 
team set up by Nanyang Technological University and the DSO National Laboratories of 
Singapore.  The XSat mission is a technology demonstration that will support three 
payloads, one primary and two secondary. The primary mission is to carry out image 
acquisition, storage and download of images from a multi-spectral camera payload, called 
IRIS. One secondary mission is to demonstrate high performance computing through the 
Parallel Processing Unit (PPU) payload, a platform capable of supporting the primary 
mission and other on-board applications that require high computing power. The other 
secondary mission is to assist Germany's Deutsches Zentrum für Luft- und Raumfahrt (DLR) 
in carrying out precision GPS navigation experiments, using the GPS receiver provided by 
DLR. The satellite will be launched from India in 2011 as a piggy-back payload on the 
Polar Satellite Launch Vehicle (PSLV) at Sriharikota.  The PSLV will deliver 
the spacecraft into a planned sun-synchronous orbit of 817km altitude.
 
Figure 1-1: XSat System Configuration
The Parallel Processing Unit (PPU) payload is a research payload used to demonstrate high 
performance computing in space.  It contains 20 StrongARM processors connected to 
perform parallel processing on the images captured by the IRIS camera. The XSat Space 
Segment system configuration is shown in Figure 1-1.  The PPU is connected to the Solid 
State Recorder (SSR) where the IRIS images are stored after acquisition.  The SSR is also 
used to store the processed image for subsequent downlink to the ground reception station.
As this project involved practical development of a computing payload for launch in the 
XSat mission, it had to consider mission-specific specifications and requirements.  It also 
had to follow space manufacturing processes, environmental testing of payload under 
simulated space conditions of temperature, vacuum, vibration etc, to ensure that it was fully 
qualified for space usage.  The research concepts and implementation had to fit within the 
payload development constraints, in terms of functionality and computational power.  
However, the research undertaken in the thesis proposed more functionality and 
computation power than required by the XSat mission.  It was also generalized with future 
missions in mind. As a research test bed, the implementation and launch of the PPU payload 
will demonstrate the practicality of the computing architectures proposed in the research.
1.3 XSat Mission Requirements 
As a secondary payload onboard XSat, the PPU payload has to meet resource constraints 
and requirements imposed by the XSat mission, and also has to meet XSat mission needs.
1.3.1 Physical and Power Constraints 
XSat adopts a tray structural housing for all its major electronic Printed Circuit Boards 
(PCBs). The PPU hardware has to conform to a tray size form factor of 36cm by 29cm and a 
mass not greater than 1.8kg. The XSat mission limits the PPU peak power consumption to 
within 40W. The temperature profile of the payload for 50 minutes of continuous operation 
at peak power should be verified to meet thermal constraints of components. The PPU 
budget for component and fabrication costs is about US$80000.
1.3.2 Reliability Requirements 
In the PPU, COTS components are used instead of radiation-hardened components that are 
typically used for space applications. Despite using COTS components, the design goal for 
the PPU is to achieve an overall system reliability of at least 0.9 for 1 Giga-instructions-per-
second (GIPS) of computation power. 
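The relationship between this reliability goal and processor redundancy can be illustrated with a simple k-out-of-n calculation. The sketch below is not the reliability model developed in Chapter 3. It assumes independent processor failures, treats 1 GIPS as 1000 Dhrystone MIPS, ignores failures of FPGAs, memory and interconnect, and uses a purely hypothetical per-processor survival probability. The figures of 20 processors and 4700 Dhrystone MIPS are taken from the contributions listed in Section 1.4.6.

```python
from math import ceil, comb


def k_out_of_n_reliability(n: int, k: int, p: float) -> float:
    """Probability that at least k of n independent units still work,
    when each unit works with probability p (binomial model)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


TOTAL_PROCESSORS = 20           # from the thesis
PEAK_MIPS = 4700                # from the thesis (Dhrystone 2.1 MIPS, all 20 processors)
TARGET_MIPS = 1000              # assumption: 1 GIPS taken as 1000 Dhrystone MIPS
ASSUMED_UNIT_RELIABILITY = 0.5  # hypothetical value, chosen only for illustration

mips_per_processor = PEAK_MIPS / TOTAL_PROCESSORS            # 235 MIPS each
processors_needed = ceil(TARGET_MIPS / mips_per_processor)   # 5 processors

system_reliability = k_out_of_n_reliability(
    TOTAL_PROCESSORS, processors_needed, ASSUMED_UNIT_RELIABILITY
)
print(f"{processors_needed} of {TOTAL_PROCESSORS} processors needed")
print(f"processor-only reliability under these assumptions: {system_reliability:.4f}")
```

Under these assumptions only 5 of the 20 processors must survive to deliver 1 GIPS, which is why a large pool of individually unreliable COTS processors can still meet a high system-level target.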
1.3.3 Re-configurability Requirements 
The PPU is envisioned to be a payload that can be used to test and evaluate new image 
processing algorithms and software fault tolerance schemes onboard the satellite. The 
results of the evaluation will be important for future satellite missions. The PPU payload 
supports uploading of new parallel processing applications that can be designed even after 
the satellite is launched. This provides users of the payload with flexibility in re-
programming and re-configuring the parallel processors to support new parallel processing 
algorithms and applications.  
All software components in the PPU including the type of operating system, the boot-
loading codes and software applications have to be up-loadable in orbit. That is, the PPU is 
totally software reconfigurable in every possible aspect.
1.3.4 Software Requirements 
There is a desire to make the PPU a generic computing platform for application users so that 
it can provide application developers with a familiar application environment. Since many 
image application developers use the Linux operating system (OS), which is an open-source OS, 
the PPU architecture focuses on supporting applications running on this operating system.
The use of the Linux OS ensures that any C or C++ software that can be compiled and run 
correctly on a desktop Linux system can be easily ported to execute in the PPU. Extensive 
software verification of the application codes is not needed, as long as the underlying 
operating system and its libraries have been correctly validated. 
Though the PPU architecture is flexible enough to run other operating systems, Linux is 
the main operating system currently used for all code development in the PPU [5].
1.3.5 Remote Sensing Mission Requirements 
One of the PPU's design requirements is to support and enable on-board processing of 
remote sensing data from an instrument (like an electro-optical camera).  Remote sensing 
instruments can generate huge amounts of data, especially when there are many spectral 
bands involved, and the collection duration is long.  The XSat mission, for example, which has a 
three-band multi-spectral optical camera with a swath width of 50km and a resolution of 
10m, generates 10 Mbytes/sec of imaging data. Hyper-spectral remote sensing missions, 
whose data sets generally comprise about 100 to 200 spectral bands of 
relatively narrow bandwidths (5-10 nm), generate even greater amounts of data.  The 
design and implementation of on-board computing systems for collecting, storing and 
analyzing remote sensing data onboard the satellite will have to take into consideration the 
high demands placed in terms of computational power and memory.
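The quoted data rate can be checked with a short back-of-envelope calculation. The swath width, resolution and number of bands below come from the text. The ground-track speed of roughly 6.6 km/s (typical for an orbit near 817km altitude) and the 1 byte per sample are assumptions made only for this sketch.

```python
SWATH_WIDTH_M = 50_000            # from the thesis: 50 km swath
RESOLUTION_M = 10                 # from the thesis: 10 m ground resolution
BANDS = 3                         # from the thesis: three-band camera
ASSUMED_BYTES_PER_SAMPLE = 1      # assumption: 8-bit samples
ASSUMED_GROUND_SPEED_M_S = 6_600  # assumption: approximate LEO ground-track speed


def imaging_data_rate(swath_m, resolution_m, bands, bytes_per_sample, ground_speed_m_s):
    """Raw data rate in bytes per second for a push-broom style imager."""
    pixels_per_line = swath_m / resolution_m          # samples across the swath
    lines_per_second = ground_speed_m_s / resolution_m  # image lines swept per second
    return pixels_per_line * lines_per_second * bands * bytes_per_sample


rate = imaging_data_rate(
    SWATH_WIDTH_M, RESOLUTION_M, BANDS,
    ASSUMED_BYTES_PER_SAMPLE, ASSUMED_GROUND_SPEED_M_S,
)
print(f"{rate / 1e6:.2f} MB/s")
```

With these assumed values the result is close to the 10 Mbytes/sec stated above. Replacing `BANDS` with 100 to 200 shows how quickly a hyper-spectral instrument would scale the same figure.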
Hence, to meet near real time processing requirements, images from the imaging 
payload are usually sub-divided into linear strips or square tiles for simultaneous 
processing in different processors. Sometimes, 
information of image boundaries has to be communicated across the different processing 
elements to collectively perform an algorithm such as detection of landscape changes from 
satellite images collected over different passes. Such an application requires an underlying 
inter-processor communication structure to support the real time, parallel processing of the 
image data across multiple processors. 
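The strip decomposition described above can be sketched as follows. This is a generic illustration rather than the PPU's image distribution scheme. The function name and the `halo` parameter (the number of boundary rows each processor needs from its neighbours) are hypothetical.

```python
def split_into_strips(num_rows: int, num_workers: int, halo: int = 1):
    """Divide image rows into contiguous strips, one per worker.

    Returns a list of (core_start, core_stop, read_start, read_stop) tuples.
    Rows core_start..core_stop-1 are owned by the worker; the wider read
    range adds `halo` boundary rows that must be obtained from neighbours.
    """
    base, extra = divmod(num_rows, num_workers)
    strips = []
    start = 0
    for worker in range(num_workers):
        # The first `extra` workers take one additional row each.
        stop = start + base + (1 if worker < extra else 0)
        read_start = max(0, start - halo)
        read_stop = min(num_rows, stop + halo)
        strips.append((start, stop, read_start, read_stop))
        start = stop
    return strips


for strip in split_into_strips(num_rows=10, num_workers=4, halo=1):
    print(strip)
```

The overlap between the read ranges of adjacent strips is exactly the boundary information that must travel over inter-processor links, which is why a linear or mesh interconnect suits this class of application.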
1.4 Research Focus and Contribution of this Thesis 
This project investigates an enabling technology for low cost small satellites to have high 
performance, fault tolerant and reconfigurable computing capability onboard. It aims to 
answer five research questions.
1.4.1 First Research Question  
The first research question is phrased as follows: 
What is a feasible design approach for a space computing payload that can achieve 
computational performance and reliability?
In answering this research question, the thesis will take into consideration the processing 
power offered by COTS components and study the reliability provided.  The contribution of 
the thesis is to investigate the unique approach of using parallel processing as a means to 
satisfy both performance and reliability requirements. The feasibility of such a design 
approach is established by evaluating the computing performance and reliability 
quantitatively. This research question is addressed primarily in Chapter 3.
1.4.2 Second Research Question  
The second research question is phrased as follows:
What implementations of reconfigurable communication network, fault detection and fault 
management schemes are possible and efficient on the FPGA platform? 
In this research question, the communication network and fault tolerance schemes are 
limited to FPGA implementation, as it is a platform of choice to provide the necessary 
interconnect between parallel entities in a cost-effective and flexible manner.  The focus of 
the research will be to explore an implementation that can support heterogeneous 
components, and achieve the necessary performance and reliability required for the 
mission. The techniques used ensure dynamic response to fault conditions at runtime, and 
are transparent to the end application user. Essentially this is to ensure the design of a high 
availability payload and ease of usage. This research question is addressed in Chapter 4.
1.4.3 Third Research Question  
The third research question is phrased as follows: 
What design of a fault tolerant processing array is possible and specifically catered for a 
remote sensing mission?  
Since onboard image analysis and compression constitute the primary application set in 
remote sensing, linear and mesh processing arrays are applicable parallelism techniques. As 
a linear array is a subset of a mesh, the research concentrates on the design methodology 
for a mesh processing array topology. The proposed architecture can be easily adapted for 
programs with linear structures [6]. It takes advantage of the greater mapping quality that 
arises when a program structure of a lower node degree and a parallel computing system 
graph of a higher node degree are considered [7].
In this processing technique, the image processing load can be divided over several 
processing elements to achieve processing speed close to real time.   However, in the face of 
faults, the processing network will have to be reconfigured.  The successful reconfiguration 
of a network is an NP-hard problem. In response to this research question, the objective will 
be to explore and propose a feasible scheme to reconfigure the parallel processing network 
in the face of faults.  A sample size of 4 network nodes and 20 processors is used, as this 
can be implemented within the XSat mission constraints.
A framework for an arbitrarily large mesh network that can be dynamically 
reconfigured in orbit is developed [8,9]. Reconfiguration is used not only to handle 
in-orbit processor faults, but also to adapt to the topological needs of different image 
processing applications. This is translated in hardware as an FPGA switch structure design 
that connects a large parallel network of COTS processors to form a logical mesh array 
topology. This network is configurable, and the specific mapping of the logical cell in the 
mesh to the actual physical processors can be changed in the presence of faults. 
Under no fault conditions, the mapping of the physical processors to the logical processor 
cells is optimised. But under faults, the state of the physical processor graph changes. The 
change in the physical processor graph makes the optimal remapping of the physical 
processor graph to the logical processor graph an NP-complete problem [10,11]. 
Hence the research topic considers heuristics to find mapping solutions for the fault 
tolerant mesh in the presence of faults. This research is unique as it proposes a 
reconfigurable mesh processing array that can span across different FPGAs. This research 
question is addressed in Chapter 5.
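To make the remapping problem concrete, the sketch below shows one simple greedy heuristic. It is not the mapping algorithm proposed in this thesis. It assumes a 4 by 4 logical mesh divided into four quadrants, each homed on one of the 4 FPGAs, with 5 processors per FPGA so that each FPGA holds one spare. The processor labels and the function name are hypothetical. A logical cell is mapped to a healthy processor on its home FPGA where possible, and only borrows a spare from another FPGA (incurring a costlier inter-FPGA link) when its home FPGA has run out.

```python
def greedy_remap(mesh_side, procs_by_fpga, faulty):
    """Map logical mesh cells to healthy processors, preferring the home FPGA.

    Assumes an even mesh_side and four FPGAs keyed 0..3, one per quadrant.
    Returns {(row, col): processor} or None if too few healthy processors.
    """
    half = mesh_side // 2
    free = {f: [p for p in procs if p not in faulty]
            for f, procs in procs_by_fpga.items()}
    mapping, deferred = {}, []

    # Pass 1: satisfy every cell from its own quadrant's FPGA where possible.
    for row in range(mesh_side):
        for col in range(mesh_side):
            home = (row // half) * 2 + (col // half)
            if free[home]:
                mapping[(row, col)] = free[home].pop(0)
            else:
                deferred.append((row, col))

    # Pass 2: borrow spares from the FPGA with the most healthy processors left.
    for cell in deferred:
        donor = max(free, key=lambda f: len(free[f]))
        if not free[donor]:
            return None  # not enough healthy processors for a full mesh
        mapping[cell] = free[donor].pop(0)
    return mapping


procs_by_fpga = {f: [f"P{f}{i}" for i in range(5)] for f in range(4)}
mapping = greedy_remap(4, procs_by_fpga, faulty={"P00", "P01"})
print(mapping[(1, 1)])  # this cell's home FPGA ran out, so it uses a borrowed spare
```

A heuristic of this kind runs in time linear in the number of cells, but it makes no attempt to minimise the total number of inter-FPGA links, which is the aspect that makes the optimal remapping hard.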
1.4.4 Fourth Research Question  
The fourth research question is phrased as follows: 
What are the practical hardware, software and protocol considerations in the design of a 
computing payload that is both practical and efficient, and that can be flown and operable 
on a small LEO satellite mission?
The practicality of the design methodology described in Chapter 3 has to be shown and 
illustrated through the actual development of the payload. The specific hardware and 
software considerations that impact the practicality of the design will be investigated in 
Chapter 6 of this thesis. The resource utilisation for the computing payload shall be an 
important assessment of the effectiveness of the proposed design architecture. The features 
that enhance ease of operability will also be investigated. 
1.4.5 Fifth Research Question  
The fifth research question is phrased as follows:
What is the approach to qualify a COTS-based computing payload for use in space, under 
simulated space environment, as well as under fault conditions?  
This research question focuses on the approach and means to test the payload for 
survivability in the harsh space environment. In space, the payload can be subjected to hot 
and cold conditions and frequent temperature cycles. It can also experience huge 
mechanical forces during launch, and material out-gassing in space.  Due to the use of COTS 
components, there is always a likelihood of component failures due to radiation or 
component weakness.  The validation of the computing platform will have to address fault 
scenarios.
Hence qualification of the computing payload for operation in space is a necessary and 
important step that can reveal design weaknesses and ultimately ensure payload reliability. 
This research question is addressed in Chapter 7.
1.4.6 Main Contributions 
The main contributions of this thesis are as stated below:
1. The PPU is the first computing payload to adopt the use of massive parallelism at every 
level (processor, memory, interconnect, interface) to overcome possible failures due to 
the use of COTS components in the harsh space environment.
2. This research results in a design approach that combines the use of reliability 
modelling, fault tolerance schemes and radiation analysis to achieve a feasible parallel 
architecture which interconnects heterogeneous elements to achieve a robust 
computing design.
3. The research has developed a communication network design that forms the basis of 
the fault tolerant scheme that spans across multiple network elements to support a 
scalable processor array topology.
4. The research has also targeted remote sensing applications that require communication 
intensive tasks.  The resultant array interconnect design enabled a new class of on-
board processing applications for this and future missions, to carry out intelligent 
remote sensing.
5. The research focussed on an innovative design for a dynamically reconfigurable fault 
tolerant mesh processing array that can span across multiple network elements. 
Current literature focuses on symmetric mesh array design implemented in a single 
network element, whereas this thesis focuses on an asymmetric mesh array design that 
spans across network elements. The mesh remapping algorithm takes into 
consideration the higher cost of inter network element links, as compared to links 
within one network element.
6. The PPU is the first and only technology demonstration of a massive COTS parallel 
computer in space on a LEO microsatellite, with 20 StrongARM processors providing up 
to 4700 Dhrystone 2.1 MIPS of peak computational power. 
7. This range of computation power far exceeds that of most existing satellite missions in the LEO 
micro-satellite domain, which has limitations of power, space and cost. A comparison of 
onboard processing platforms in the LEO micro-satellite domain was done by Surrey 
Space Centre and published in a conference paper. This paper was written by Sir Martin 
Sweeting, the Chairman of Surrey Satellite Technology Limited, together with members 
of the Surrey Space Centre [12]. The survey showed that the PPU has the highest 
onboard processing capabilities among all the computing payloads they compared. 
8. The research has gone beyond conceptualization, and has taken practical hardware and 
software considerations to realize a working flight module that has been validated 
according to standard space qualification processes.  The implementation of this PPU 
payload shows that the concepts are feasible and suitable for small satellite platforms. 
9. The PPU has changed the conventional economics for the provision of high performance 
onboard capabilities. It represents a technology demonstration of a powerful computing 
payload that is built within a budget of US$80000 for components and manufacturing, 
and within the power, volume and space constraints of a micro-satellite mission. 
1.5 Thesis Outline 
The rest of this thesis is organised as follows:
Chapter 2 provides a literature survey of high performance and COTS-based space 
computing modules that are currently available in the market and their fault tolerance 
approaches. The chapter also contains a literature review of parallel processing 
architectures, their communication structures and network topologies, and on-chip 
fault tolerant communication networks for parallel computing platforms.
Chapter 3 presents the design methodology for the PPU payload. It provides a design 
methodology of a computing module that achieves high performance through 
parallelism, and that achieves reliability through a fault tolerant connection of 
processing elements, memories and external interfaces.  It also analyses the XSat 
radiation environment, to assess the extent of the PPU's radiation exposure. With that, 
the possible effects of radiation on electronic components in the PPU can be estimated 
and mitigation techniques proposed. This helps establish feasibility of usage of COTS. 
This chapter also proposes a technique of accessing reliability through computing part 
reliability and system reliability; and of using reliability as a basis of architectural 
evaluation and improvements.  The design methodology is a systematic approach to 
ensure that the practical design can meet performance and reliability requirements. 
This chapter basically addresses the first research question.
Chapter 4 presents the implementation schemes of the reconfigurable communication 
network and fault management in the FPGA. Communication network features that 
achieve re-configurability and fault tolerance are studied. This study leads to a 
proposed communication network protocol structure that reliably interconnects the 
entities in the PPU.  Schemes to detect and autonomously handle faults are also 
proposed. This addresses the second research question.
Chapter 5 presents the architecture and implementation of the fault tolerant mesh 
processing array topology for reliable inter-processor communication.  The fault 
tolerant mesh that is proposed provides a reconfiguration algorithm to construct a 
logical mesh array out of a fault-laden physical processor array which has redundant 
processing elements. It is unique in that the logical mesh can span across four network 
elements for added scalability to deal with the limited connectivity of a single network 
element. This addresses the third research question.
Chapter 6 presents the unique aspects of the PPU's hardware, firmware and software 
design that are instrumental in enabling the PPU to meet its design requirements. It 
provides details on the design of the communication network features, the unique 
processor booting process, as well as the dynamic runtime configuration of PPU 
operation based solely on header frames coming from the Solid State Recorder (SSR). 
This chapter describes the various implementation structures that are critical to 
efficient payload operation and efficient utilisation of resources. This 
addresses the fourth research question.
Chapter 7 describes the PPU transition process from Engineering model development, 
to Qualification model and finally, to the flight model development. It describes the 
Qualification Model (QM) testing process which will qualify PPU for use in space. Hence 
it also includes environmental testing of PPU operation in various thermal-vacuum and 
vibration conditions. The results of this chapter demonstrate that the design 
methodology of the PPU yields a space qualified computing module and not just a 
prototype board. This addresses the fifth research question.
Chapter 8 presents the conclusions. It emphasises the main contributions of this thesis, 
especially in answering the six research questions. It summarises the design methodology 
that was proposed for the reliable and fault tolerant space computing platform, using 
parallelism as a means to achieve both performance and fault tolerance. The 
performance specifications of the computing payload, as well as the main characteristics 
of the fault tolerant communication network and fault tolerant mesh processing array 
are also given.
2 Literature Review 
2.1 Introduction 
Space processing platforms have come a long way, evolving from the simple IC based 
computer developed for the Apollo missions in the 1960s to the high performance 
flight computers and instrument processors used in recent missions.  The evolution has also 
seen a move from uni-processor systems to multi-processor systems using both 
radiation-hardened and COTS based processors.
The objective of this chapter is to provide a background on the high performance and COTS-
based space computing modules that are currently available in the market, and their fault 
tolerance methodologies.  Parallel processing architectures are also reviewed, with a 
detailed study of their fault tolerant features, communication structures and 
network topologies.
2.2 Survey of Space Processing Platforms 
This section describes the various space processing platforms, reviewing the onboard 
computing capabilities of different space missions and the types of processors used on these 
missions.
2.2.1 Traditional Radiation-hardened Processor Roadmap 
Radiation hardened processors are processors that are designed to be hardened against 
radiation effects.  They are fabricated with special processes in specialised semiconductor 
foundries.  They are extremely expensive because of the fabrication process and the low 
yield. 
Table 2-1 shows the characteristics of the onboard processing modules that are based on 
radiation-hardened processors. These modules are used onboard critical space missions 
that demand ultra-dependable onboard computers. These missions are high cost missions 
that require high availability. 
 
Table 2-1 : Radiation-hardened Processors 
Processor | MIPS | Year Introduced | Manufacturer | Unit Cost (US$) | Example Mission
RCA1802 | 1 | 1976 | Radio Corporation of America (RCA) | | Viking, Galileo
BAE MIL-STD-1750A | 1.5 | 1990s | BAE (British Aerospace Electronics) | | Cassini Spacecraft
RAD6000 | 35 | 1996 | BAE | 250000 | Sojourner (on Mars)
ERC32 (TSC695) | 20 | 2000 | Atmel | 25000 | SMART-1 Lunar Mission
RAD750 | 266 | 2001 | BAE | 200000 | Mars Reconnaissance Orbiter (MRO), WorldView-1 satellite
AT697F | 90 MIPS @ 100 MHz | 2009 | Atmel | 10000 |
BRE440 | 400 | 2009 | BRE | 750000 |
It is observed that the availability of space-grade processors is limited, and the computing 
capability of missions employing these traditional ultra-reliable processors is very 
constrained.  The progression of radiation-hardened processors in terms of computing 
performance is slow. Before 1990, onboard processors operated at only a few MIPS. 
Performance improved to tens of MIPS in the 1990s and to hundreds of MIPS in the last 
10 years (2000-2009).
Today, there are still many missions relying on radiation hardened processors to provide 
reliable computational resources to run attitude determination and control, satellite 
housekeeping and other tasks.  In order to handle payload data processing, additional ASICs 
or hardware based solutions have to be developed to carry out the necessary payload 
functions.  However, this is not ideal, as ASIC development is equally expensive and 
inflexible when mission requirements change.
2.2.2 COTS Processor Roadmap 
It is clear from Table 2-1 that the performance capabilities of radiation-hardened 
processors have been increasing at a slow rate over a period of 20 years.   The computing 
capabilities of commercial computing modules are illustrated in Table 2-2.  By comparison, 
the performance of space-grade processors lags that of commercial grade 
processors by about 10 years.  The recently developed Broad Reach Engineering BRE440 
chip is a fully radiation-hardened version of the PowerPC 440 processor core. It can operate 
at a speed of 200MHz and can achieve about 400 MIPS. This performance level was 
reached by some commercial processors several years earlier. 
Table 2-2: COTS processors
Processor | MIPS or MHz | Year introduced | Manufacturer
ARM2 | 12 MIPS @ 8 MHz | 1986 | VLSI Technology, Inc
ARM710 | 40 MHz | 1994 | VLSI Technology, Inc
PPC 603 (Generation 2 PowerPC) | 75 MHz | 1995 | IBM and Motorola (AIM alliance)
PPC 750 (Generation 3 PowerPC) | 233-400 MHz | 1997 | IBM and Motorola (AIM alliance)
StrongARM | 235 MIPS @ 266 MHz | 1999 | Intel
PPC 7400 | 450 MHz | 1999 | Motorola
PPC 750CX | 400 MHz | 2000 | IBM
PPC 7455 | 1 GHz | 2002 | Motorola
MPC7448 (Generation 4 PowerPC) | 1.5 GHz, 2.3 MIPS/MHz | 2005 | Freescale Semiconductor
MPC8610 | 3060 MIPS @ 1333 MHz | 2007 | Freescale Semiconductor
Hence the potential of using commercial processors is great, especially when applications 
require increasing amounts of onboard processing capability. Sometimes, in order to 
achieve a high level of processing capability, the use of high performance COTS processors 
is necessary. However, due to the extreme radiation and environmental conditions in space, 
commercial semiconductor technology cannot be readily used in space-borne applications.
The challenge lies in adapting commercial semiconductor technology for use in 
space applications. As space microelectronics represents a very insignificant part of the 
total world semiconductor output, it is not practical to expect semiconductor 
manufacturers to focus their energy on space electronics. Rather, space 
microelectronics technology may need to leverage heavily, and build incrementally, upon 
commercial technology. This is essential if space applications are to increase their 
onboard computing capabilities rapidly. 
Hence the question is whether a space electronics roadmap for space-borne systems can be 
based on components from the commercial semiconductor industry. Alternatively, can the 
space electronics roadmap be addressed with minimum changes to the commercial 
semiconductor roadmap? This is becoming a real possibility now that state-of-the-art COTS 
processors exhibit Total Ionizing Dose (TID) performance adequate for the radiation 
environment in orbit, and processors based on Silicon-on-insulator (SOI) technology are 
latch-up resistant [13]. Single event upsets (SEUs) caused by Galactic Cosmic Rays and 
Solar Protons will nevertheless continue to be a problem. However, the economic savings and 
performance improvements over radiation-hardened computing platforms are so great that 
many missions are finding COTS computing platforms attractive and 
unavoidable [14,15,16]. This is especially so with decreasing space budgets. Space 
computing platforms have had to trade off the conservative choice of well-known and 
reliable solutions against the need for miniaturisation and more power through 
the adoption of new technology [17], such as fault tolerant processors [18].
2.2.3 Space Computing Platforms Based on COTS Processors 
Space computing platforms based on COTS processors are a very recent development, a path 
that deviates from the traditional radiation-hardened processor based approach.  Missions 
usually employ COTS processor based computing for applications where a single error in the 
computing module will not prove catastrophic. These applications can tolerate 
occasional faults and allow time for recovery. 
In the last few years, however, there has been research into producing onboard computing 
modules that can achieve a reliability figure better than that of traditional space-grade 
computing modules.  The Maxwell SCS750 is one example of a COTS computer that is claimed to 
have reliability higher than that of a traditional space-grade computer [19].
The COTS enabling techniques employed in current missions or current space computing 
research are based on a broad spectrum of both software and hardware implemented fault 
tolerant measures. They range from software implemented fault tolerant techniques (SIFT) 
to hardware redundancy, and to a wide variety of voting or hardware reconfiguration 
schemes.  
Some examples of missions that employ COTS processor based computing platforms are 
described below.
The USAF Argos satellite which was launched in 1999 did an onboard experiment to 
compare the reliability performance of a radiation-hardened processor (RH-3000) and the 
COTS IDT-3081 processor. For the COTS computing module, the Control Flow Checking by 
Software Signatures (CFCSS) technique was used with the Error Detection by Duplicated 
Instructions (EDDI) technique to increase the fault coverage. Simulation results showed that 
running test applications on the IDT-3081 with EDDI greatly reduced the rate of undetected 
incorrect output [20]. 
The Remote Exploration and Experimentation (REE) program is a major program by the Jet 
Propulsion Laboratory (JPL) that is dedicated towards investigating COTS usage. Its 
objective is to use COTS hardware and software components to build a data processing 
payload for scientific missions. The project goal is to demonstrate that such a COTS payload 
can survive for at least 10 years in space. It also aims to provide good 
computational performance per watt of power. To protect its COTS hardware and software 
layers, it has a software implemented fault tolerance (SIFT) middleware layer that can 
provide a projected system availability of 0.95 over the life of the mission and at least 0.90 
within any 1 day period [21].  It also uses Algorithm Based Fault Tolerance (ABFT) for its 
applications.
Over the last 10 years, the Centre National d'Etudes Spatiales (CNES) has also been developing 
the expertise to support the use of COTS onboard satellites. This is motivated by 
the limited availability of space components, the improved reliability of commercial 
components, and the widening performance gap between space components and COTS 
components. CNES has developed two main fault tolerant architectures for its COTS 
missions. One is the Duplex Multiplexed Time (DMT) fault tolerant technique, which is a 
time replication and macro-granularity oriented approach. The other is the Dual Duplex 
Tolerant to Transients (DT2) approach, a mini-duplex structure based on 
macro-synchronisation [22].
The Proton100k space computer, launched by Space Micro Inc. onboard the TacSat-2 
satellite, is a radiation-hardened computer capable of 1,440 MIPS at a 
single-event-upset rate of 10^-5 unrecoverable errors per day using 8 watts of power.  It uses 
Space Micro's patented TTMR (Time-Triple Modular Redundancy) and H-Core (Hardened Core) 
technology, together with commercial processors (e.g. Equator Technologies BSP-15 DSPs). 
The Proton100k(TM) Radiation Hardened Single Board Computer (RH SBC) is the 
instrument computer of the Air Force Research Laboratory's (AFRL) RoadRunner small 
satellite [23]. The third generation Space Micro Inc. Proton 200k was claimed by the 
company in 2006 to be the world's fastest and most energy efficient radiation hardened 
computer utilizing state-of-the-art commercial processors [24].  
The SCS750 is Maxwell's high performance computing platform. It uses the latest 
Silicon-on-insulator (SOI) PowerPC750-FX processors and the latest radiation-tolerant FPGAs, 
together with the highest density and fastest memory (SDRAM and FLASH). It builds a highly 
reliable computing platform with an advanced set of mitigation schemes. These include 
Triple Modular Redundancy (TMR) in its processor configuration, and Reed-Solomon 
based error checking and bit-scrubbing on the memory to reduce radiation induced error 
rates. The SCS750 is able to achieve a high level of performance while guaranteeing radiation 
tolerance, through a careful combination of parts selection, fault injection testing and 
mitigation [19]. The SCS750 is the selected onboard computer for ESA's Gaia astronomy 
mission.
2.2.4 Fault Tolerance and Fault Handling Techniques for COTS-Based Space 
Processor Boards 
In order for COTS processors to be used on-board the processing platform, fault tolerance 
and fault handling schemes have to be incorporated [25]. The techniques used vary, 
depending on whether the mission is critical and the computer system has to be 
ultra-dependable with no allowance for failure, or whether it is designed for high 
performance with minimal downtime [26,27].   In the former, fault tolerance is important to 
mask the fault encountered and to ensure that there is no disruption of operations.  In 
the latter, the fault is handled and the system reconfigured to maintain the level of 
performance required.  Service can be disrupted during the reconfiguration process.
Fault tolerance can be implemented either in hardware or in software [28]. Hardware based 
schemes such as Error Detection and Correction (EDAC) logic circuits, triple modular 
redundancy or memory voting techniques work in the background and are efficient in 
terms of performance. However, because of the amount of hardware resources required, they 
might not always be applicable or feasible. The Proton100k and Maxwell's SCS750 are examples 
of COTS computing platforms that use hardware fault tolerance to achieve high availability 
with great success.
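The voting principle behind triple modular redundancy can be illustrated with a minimal Python sketch. It is a generic bitwise majority voter and is not taken from the Proton100k or SCS750 designs; the function names `tmr_vote` and `tmr_disagreement` are assumptions made for illustration.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Return the bitwise majority of three redundant copies of a word."""
    # A bit is set in the result when at least two of the three copies agree on 1.
    return (a & b) | (a & c) | (b & c)


def tmr_disagreement(a: int, b: int, c: int) -> int:
    """Return a mask of the bit positions where the copies are not unanimous."""
    return (a ^ b) | (a ^ c)


if __name__ == "__main__":
    good = 0b10110010
    upset = good ^ 0b00001000          # one copy suffers a single bit flip
    print(bin(tmr_vote(good, upset, good)))
    print(bin(tmr_disagreement(good, upset, good)))
```

A single corrupted copy is masked by the voter, while the disagreement mask can be used to trigger a repair of that copy. The sketch ignores faults in the voter itself.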
An alternative approach is to employ fault tolerance in software. Software fault tolerance 
can be employed in terms of redundancy of software execution, either by multiple 
executions in different processors or multiple executions in time in the same processor. 
Examples of such COTS Space computing systems are the CNES DMT-based computer and 
CNES DT2-based space computer. Software fault tolerance can also be employed at the 
algorithmic level [29,30], like in the NASA REE project; or by achieving redundancy in 
instruction execution using different sets of CPU registers, like the ARGOS COTS Computer. 
The above COTS computing systems are elaborated upon in Section 2.2.3.
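Fault tolerance at the algorithmic level can be sketched with a checksum-protected matrix multiplication, a commonly described form of ABFT. The Python sketch below is a generic illustration and not the REE implementation; the function names and the tolerance `tol` are assumptions.

```python
def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def abft_matmul(a, b):
    """Multiply a by b with a column-checksum row appended to a.

    The extra row holds the column sums of a, so the last row of the
    result must equal the column sums of the true product.
    """
    checksum_row = [sum(col) for col in zip(*a)]
    return matmul(a + [checksum_row], b)


def abft_check(c_full, tol=1e-9):
    """Return True if the checksum row matches the column sums of the data rows."""
    data, checksum_row = c_full[:-1], c_full[-1]
    return all(abs(sum(col) - chk) <= tol
               for col, chk in zip(zip(*data), checksum_row))


if __name__ == "__main__":
    c = abft_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(abft_check(c))     # result is consistent
    c[0][1] += 1             # simulate an upset in one element of the result
    print(abft_check(c))     # inconsistency is detected
```

The check costs one extra row of multiplication and one pass of additions, which is much cheaper than repeating the whole computation.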
Some fault-tolerant systems use diversity of design to achieve reliability. For example, 
Airbus uses redundant channels and processors together with diverse software. This has the 
potential to protect against unknown bugs or weaknesses in design.
Dynamic fault detection and reconfiguration schemes that support runtime fault handling 
will increase the availability of the computing payload.  Examples of such schemes include 
processor fault detection through watchdog time-out with automatic software reboot, and 
network reconfiguration to bypass processor or link faults.
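A watchdog time-out of the kind mentioned above can be sketched as follows. This is a minimal Python illustration under assumed names (`Watchdog`, `kick`, `poll`); time is passed in explicitly so that the behaviour is deterministic, and the reboot is only counted rather than performed.

```python
class Watchdog:
    """Detects a hung processor that stops sending periodic 'kicks'."""

    def __init__(self, timeout_s, now=0.0):
        self.timeout_s = timeout_s   # longest allowed silence, in seconds
        self.last_kick = now         # time of the most recent kick
        self.reboots = 0             # number of recovery actions triggered

    def kick(self, now):
        """Called by the monitored processor to show that it is alive."""
        self.last_kick = now

    def poll(self, now):
        """Return True, and count a reboot, if the processor missed its deadline."""
        if now - self.last_kick > self.timeout_s:
            self.reboots += 1
            self.last_kick = now     # restart the timer after the reboot
            return True
        return False


if __name__ == "__main__":
    wd = Watchdog(timeout_s=2.0)
    wd.kick(1.0)
    print(wd.poll(2.5))   # still within the timeout
    print(wd.poll(3.5))   # no kick since t = 1.0, so a reboot is triggered
```

In a real payload the reboot action and the choice of timeout depend on the boot time and the task period of the processor being monitored.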
However, fault tolerance comes at a cost. For example, the additional hardware logic 
resources required in the FPGA to support fault detection and handling are a cost to 
the design. Reducing the cost of fault tolerance may therefore call for 
innovative approaches to fault tolerance. 
2.3 Survey of Parallel Processing Architectures 
It is envisioned that science data processing will rapidly increase in complexity and 
that payload data processing will need to be performed onboard the spacecraft. This is 
especially true for real time space-borne image processing [31]. In fact, the next advances in 
fault-tolerant computing for space will be in parallel and scalable COTS computing 
platforms for space [32, 33].  
2.3.1 Parallel Processing Models 
Different parallel processing models are adopted by computing platforms for space.  They 
are broadly classified either as a Multiple-instruction-multiple-data (MIMD) or a Single-
instruction-multiple-data (SIMD) type of parallel computing platform [34,35]. A SIMD 
computing platform is a form of fine-grained parallelism usually using a large number of 
simple processing elements. It usually executes instructions broadcast from a central 
instruction server to every processing element. A MIMD represents a form of coarse-grain 
parallelism, having an array of processors where each processor runs its own independent 
code. A MIMD parallel computing platform might use a smaller number of processing 
elements as compared to SIMD, but they tend to be of much higher speed grades. 
A parallel computing platform can be further classified as a shared-memory or a 
distributed-memory platform. A shared-memory system is usually an array of high-speed 
processors, where each processor has access to a global shared memory, and 
inter-processor communication is done through the global shared memory. In 
distributed-memory systems, each processor has its own local memory into which a local 
program is loaded and run, and inter-processor communication is achieved through 
message passing between the processing elements of the machine. The MIT Alewife 
Machine is a classic example of a distributed memory parallel platform [36].
The REE COTS Parallel computer is an example of a distributed-memory MIMD platform 
developed by NASA [21].  The Convex SPP-1000  is an example of a MIMD platform by NASA 
Goddard Space Flight Center, which combines the shared-memory concept within a 
hypernode boundary, and message passing paradigm between hypernodes [37].  
Typically, MIMD parallel platforms are suitable as general purpose parallel computing 
platforms, as they can support the running of multiple algorithms on different subsets of 
processors, and impose no constraint for the data set to match the number of processors.  A 
MIMD platform can be based on the distributed-memory or shared-memory architecture. 
Unlike the shared-memory architecture, a distributed-memory MIMD platform does not need 
global address space management, a task that can get very complex as the number 
of processors increases. 
2.3.2 Communication Structures for Parallel Processing  
For most image processing applications, a data set is typically partitioned into regular 
structures, such as in linear strips or square tiles.  The image processing is distributed 
across several nodes but works collectively on the same data set, with each parallel task 
operating on a different data partition.  Image processing applications that require very 
little or no inter-processor communication are classified as embarrassingly parallel 
applications.  There are also image processing applications (e.g. image classification and 
feature detection) that require communication across the data partitions. For these 
problems, inter-processor communication is usually required to acquire knowledge of the 
neighbouring partitions. These types of applications have localised communication, largely 
confined to immediate neighbours. 
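The strip partitioning and neighbour communication described above can be made concrete with a short Python sketch. It assumes a row-wise split of the image and a filter window that needs `halo` rows from each neighbouring strip; the function names are hypothetical.

```python
def partition_rows(n_rows, n_procs):
    """Split n_rows image rows into n_procs contiguous strips.

    Returns a list of half-open (start, stop) row ranges whose sizes
    differ by at most one row.
    """
    base, extra = divmod(n_rows, n_procs)
    strips, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        strips.append((start, start + size))
        start += size
    return strips


def halo_rows(strips, p, halo=1):
    """Return the row ranges processor p must fetch from its two neighbours."""
    start, stop = strips[p]
    n_rows = strips[-1][1]
    above = (max(0, start - halo), start)        # rows owned by the strip above
    below = (stop, min(n_rows, stop + halo))     # rows owned by the strip below
    return above, below


if __name__ == "__main__":
    strips = partition_rows(10, 4)
    print(strips)
    print(halo_rows(strips, 1))
```

An empty range such as `(0, 0)` means that the strip lies on the image border and has no neighbour on that side. An embarrassingly parallel application corresponds to `halo=0`, where no rows are exchanged at all.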
The communication network topology of a parallel computing platform has to suit the type 
of data processing required by its applications.  In communication-intensive applications, 
the communications topology has to support the distribution of data across nodes.  A 
communication network that minimises the latency of communication between immediate 
neighbours (e.g. to a single global clock latency) might be useful.
The literature on parallel processing platforms largely shows the use of ring or hierarchical 
ring topologies for the interconnection network. Examples include the Convex SPP-1000 [37] and 
the KSR-1 [38].  This is because ring topologies are cost effective from the implementation 
point of view. A ring has a simple node interface with a degree of only 2.  Broadcast operations 
are also easily implemented on a ring.  
However scalability can be an issue for ring networks as network latency is proportional to 
the number of nodes in the network. This is sometimes mitigated through the use of 
hierarchical ring designs [38,39]. Multiple ring hierarchical topologies will increase
aggregate network bandwidth, while retaining the simple ring node interface.
For some parallel platforms, more symmetric networks like the linear, mesh, torus or 
hypercube interconnection topologies are used. These symmetric topologies are good for 
scalability. Hence, they are suitable for use in massive parallel computing platforms where 
latency must be kept low despite the size of the network,  and where the advantage of 
communication locality can be exploited [40]. 
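The scalability argument can be illustrated by comparing the worst-case hop count (network diameter) of each topology. The Python sketch below uses the standard graph-theoretic diameters and assumes bidirectional links and one hop per link; it does not model any particular platform.

```python
def ring_diameter(n):
    """Worst-case hops on a bidirectional ring of n nodes."""
    return n // 2


def mesh_diameter(rows, cols):
    """Worst-case hops on a rows x cols mesh without wrap-around links."""
    return (rows - 1) + (cols - 1)


def torus_diameter(rows, cols):
    """Worst-case hops on a rows x cols torus (mesh with wrap-around links)."""
    return rows // 2 + cols // 2


def hypercube_diameter(n):
    """Worst-case hops on a hypercube of n nodes; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("hypercube size must be a power of two")
    return n.bit_length() - 1


if __name__ == "__main__":
    # Compare the four topologies for a 64-node machine.
    print(ring_diameter(64), mesh_diameter(8, 8),
          torus_diameter(8, 8), hypercube_diameter(64))
```

The ring grows linearly with the number of nodes, the mesh and torus with its square root, and the hypercube with its logarithm, at the price of a higher node degree (2 for the ring, up to 4 for the mesh and torus, and log2 n for the hypercube).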
2.3.3 Fault Tolerant Networks 
A parallel computing platform depends largely on the reliability of its communication 
network to ensure a fault-tolerant connectivity between processing nodes. In space, on-chip 
communication networks might suffer from single event upsets due to radiation, causing 
corruption and possibly failure of the network.  Fault tolerance has to be built into the 
communication structure for the parallel processing platform [41,42,43]. 
A survey of the various fault tolerant network topologies was done.  For ring topology 
networks, several fault tolerant schemes have been suggested in the literature surveyed. A 
token ring network for example will need recovery mechanisms to handle the cases of a lost 
token, a duplicate token or an always busy token [44]. These fault events can be due to 
single-event upsets. For mesh topology networks [45], fault tolerant schemes usually 
include spare tracks, rows or columns of processors, together with a replacement path or 
strategy [46,47,48]. Fault tolerant meshes require a physical fault tolerant interconnection 
network grid or topology, as well as an appropriate reconfiguration strategy that can alter 
the network connectivity to bypass the various processor or link faults  that occur in the  
mesh network [49, 50, 51, 52]. 
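As a simple illustration of reconfiguration with spare columns, the Python sketch below maps a logical mesh onto a physical array by skipping faulty processors within each row. It is a generic row-shift scheme chosen for illustration, not the algorithm of any cited work or of the mesh proposed in this thesis; the function name and the data layout are assumptions.

```python
def build_logical_mesh(rows, cols, spares, faulty):
    """Map a rows x cols logical mesh onto a rows x (cols + spares) physical array.

    faulty is a set of (row, col) physical coordinates of failed processors.
    Returns a dict from logical to physical coordinates, or None when some
    row has more faults than spare processors.
    """
    mapping = {}
    for r in range(rows):
        # Healthy physical processors in this row, in left-to-right order.
        healthy = [c for c in range(cols + spares) if (r, c) not in faulty]
        if len(healthy) < cols:
            return None          # not enough spares left in this row
        for j in range(cols):
            mapping[(r, j)] = (r, healthy[j])
    return mapping


if __name__ == "__main__":
    # 3 x 3 logical mesh, one spare column, one failed processor.
    m = build_logical_mesh(3, 3, 1, {(1, 1)})
    print(m[(1, 1)], m[(1, 2)])
```

After a shift, vertically adjacent logical nodes may sit in different physical columns, so the physical interconnect must provide links that can bridge this offset. This is why such schemes need both a fault tolerant physical grid and a reconfiguration strategy.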
In the presence of faults, the speed of recovery in these communication networks depends 
on the characteristics of the network fault detection and recovery procedures [53,54]. 
Ideally, to ensure availability of the parallel computing modules, fault handling should be 
done dynamically at runtime for a faster response. A trade-off also needs to be made 
between a more centralised and a more distributed monitoring approach [55]. 
While a centralised approach might mean a faster response, a distributed approach allows 
for more scalability.  An example of centralised monitoring is the detection of a 
network failure by a global monitoring entity, which then activates a global reset to 
remove the network fault. On the other hand, the use of a timeout scheme in 
the various nodes to deal with suspected network failures and to activate network recovery 
action is an example of a distributed monitoring approach.
2.4 Survey of COTS based Parallel Computing Modules Developed for 
Space Missions 
In Section 2.2.3, a survey was done on COTS based computing modules meant for use in 
space, and the fault tolerant techniques that are commonly employed in those systems.  
Parallel processing computers are rarely used in space.
In this section, a survey of COTS parallel computing modules for space is carried out to 
assess the state of the art.  Parallel computing modules have additional evaluation metrics 
that have to be considered. By virtue of their multi-processor platform, inter-processor 
connection topologies and the overall system reliability versus individual component 
reliability have to be taken into account.  Their exhibited characteristics can be discussed in 
terms of the following parameters.
1. Scalability 
2. MIPS/Watt Performance
3. System Reliability 
4. Application Support 
5. Cost Effectiveness  
In the following sections, these characteristics are discussed, and a literature survey is 
conducted on a few state-of-the-art COTS parallel computing architectures that are 
currently researched or implemented by space agencies (e.g. NASA¹) for use in space 
missions. Essentially, four platforms are discussed:
1. Environmentally Adaptive Fault Tolerant Computing (EAFTC) by Honeywell 
research [56].
2. NASA's New Millennium Program (NMP) Space Technology 8 (ST8) Dependable 
Multiprocessor [13].
3. NASA Remote Exploration and Experimentation (REE) Project [21].
4. Intel iWarp, evaluated by the NASA Ames Research Center for space usage 
[57, 58].
2.4.1 Scalability 
Scalability refers to the ability of the architecture to be expanded, to include more 
processing elements or memory to support future missions that are more demanding. Many 
of the new instruments under development involve faster and higher resolution sensors 
and more complex processing requirements. Having scalability is important to take 
advantage of the heritage of proven designs and reusing the architecture for future 
missions.
EAFTC is based on a packet switched network fabric that connects the various Adaptive 
Processing Computers (APCs) together, providing point to point connection between APCs at 
any one time.  It is highly scalable, allowing a new APC to be added simply by increasing the 
number of switches in the fabric.  The number of switches supported is limited by the number 
of logic gates available to implement the switch fabrics.
¹ NASA - The National Aeronautics and Space Administration is an Executive Branch 
agency of the United States government, responsible for the nation's civilian space program 
and aeronautics and aerospace research.
NMP ST8 is a product of the Dependable Multiprocessor project by Honeywell, to build a 
COTS based, fault tolerant multiprocessor system.  The system is based on a redundant 
Gigabit Ethernet switch to interconnect multiple COTS nodes together.  Nodes can be added 
to the network, but bandwidth will be reduced as the system scales up.
The REE makes use of a COTS switch based, redundant Myrinet to connect processing nodes 
together.  Each REE node has a dual redundant PPC750 processor together with an FPGA 
node controller. Although Myrinet is a highly scalable switch network, the resource 
heavy REE node limits the capability to scale.   
The iWarp uses a 2D toroidal mesh network to connect the processing elements together.  
The original iWarp array consists of 8 iWarp cells, each containing a RISC processor, 
memory and an iWarp component handling communication. The proposal for flight 
evaluation of the iWarp suggested a fixed iWarp array of 8 cells. The iWarp 
communication agent supports a message passing service as well as a Programmed 
Communication Service (PCS). The latter requires the use of a statically defined network 
and is more complex. Both these factors will affect the ease of scalability of the iWarp if the 
PCS communication agent is used, as is usual for communication-bound applications. In 
addition, the iWarp's scalability can be limited by the complexity of the logic required to 
support the re-configurable network topology schemes. The iWarp can be reconfigured to 
form a mesh, torus or hypercube array.
While all parallel processing modules are scalable, the extent of scalability is limited directly 
or indirectly by power or implementation resources.   Although the interconnection network 
fabric (e.g. the switch fabric) of the parallel processors might be highly scalable, 
the feasibility of scaling is limited in practice by factors such as the logic resources 
available in the FPGA for implementing network fabrics, and the spacecraft's allocation of 
power, volume and weight for the onboard computing module.  From the available literature, 
these platforms have yet to demonstrate the ability to scale beyond 10 processors in an actual 
space mission; most space missions appear to have low-level scalability, supporting fewer 
than 10 processing elements. 
2.4.2 MIPS/Watt Performance
For space missions, computing performance is constrained by the power available to the 
system, which affects the maximum number of processing nodes that a computing module 
can have.  This is especially true for smaller missions, which have tight power margins.  
Thus, computing performance should be measured in terms of MIPS/watt or MOPS/watt.  
By relating the MIPS delivered to the power consumed, the 
efficiency and performance of the computing platform can be normalised across different 
architectures and designs. 
The EAFTC's APC, for example, draws up to 20 watts and provides up to 1300 MIPS (based 
on a 750FX operating at 650 MHz).  This works out to about 65 MIPS/watt per APC. 
The NMP ST8's data processor operates at 1 GHz, providing up to 1500 MOPS, using a PPC 
750FX without hardware accelerator. Based on a 1K complex FFT benchmark, a 75 
MOPS/watt performance is achieved.
The NASA REE first generation testbed operates at 366 MHz, and is expected to achieve 45 
MOPS/watt based on a typical maximum power of 32W. The REE project targets a 
computer that can scale up to 100 watts, which would amount to about 4500 MOPS for this 
REE architecture.
The iWarp's RISC processor operates at 20 MHz, providing 20 MIPS/MFLOPS.  No power 
consumption value was provided for the RISC processor. The iWarp is, however, claimed to 
provide massive computation power at low power and low cost.
For a given class or size of spacecraft, the maximum available power falls roughly in the 
same range. Hence it is the MIPS/watt performance, rather than the maximum MIPS of a 
single processor, that determines the total MIPS of a mission. The PowerPC 750 processors 
are widely used in space missions (e.g. REE, EAFTC, NMP ST8, Maxwell SCS750) not 
because they have the highest MIPS in the PowerPC family, but because they belong to its 
low-power series. Also, the commercial version of the PowerPC offers much higher 
performance than the rad-hard version. According to the paper on NMP ST8, a 
radiation-tolerant version of the PowerPC 750 achieves only 13 MOPS/watt on the 1K 
complex FFT benchmark, while the commercial PowerPC 750 achieves 75 MOPS/watt.  The 
commercial PowerPC 750 thus offers more than a five-fold improvement over the 
radiation-tolerant version of the processor. Low-power processors are generally used 
onboard spacecraft to avoid thermal hot spots during operation.
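The arithmetic behind these figures can be reproduced with a short sketch. The function names 
below are illustrative and not taken from any of the cited projects; the numbers are the ones 
quoted in this section. MIPS and MOPS figures come from different benchmarks, so they are only 
indicative when compared with each other.

```python
def per_watt(throughput, power_w):
    """Normalise a throughput figure (MIPS or MOPS) by the power drawn in watts."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return throughput / power_w


def throughput_for_budget(efficiency_per_watt, power_budget_w):
    """Total throughput available when a power budget is spent at a given efficiency."""
    return efficiency_per_watt * power_budget_w


# Figures quoted in this section.
eaftc_apc = per_watt(1300.0, 20.0)                # EAFTC APC, MIPS/watt
ree_at_100w = throughput_for_budget(45.0, 100.0)  # REE scaled to 100 W, MOPS
commercial_vs_radtol = 75.0 / 13.0                # PowerPC 750, 1K complex FFT

print(f"EAFTC APC: {eaftc_apc:.0f} MIPS/watt")
print(f"REE at 100 W: {ree_at_100w:.0f} MOPS")
print(f"Commercial vs radiation-tolerant PowerPC 750: {commercial_vs_radtol:.2f}x")
```

The last ratio is what supports the statement that the commercial part offers more than five 
times the efficiency of the radiation-tolerant one.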
2.4.3 System Reliability 
System reliability measures the probability that the parallel computing module performs 
its task as a whole, rather than the survival probability of each individual computing node. 
For example, the reliability of a fault tolerant triple-majority voted memory has to take 
the voter and its reliability into consideration.  For a parallel processing platform, system 
reliability modelling is important to establish the reliability of the platform and to justify 
its use in a particular mission.  However, as not all parallel computing modules have 
undergone reliability modelling, a quantitative comparison of system reliability figures 
might not be possible across the various parallel computing platforms. A qualitative 
assessment can nevertheless be done to look for the possibility and probability of single 
points of failure, as well as the presence of redundancies or replacement paths.
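The triple-majority voting example can be made concrete with the textbook reliability 
expression for a voted triplet. This is a minimal sketch, assuming three identical modules 
that fail independently and a single voter in series with them; the function name is 
illustrative.

```python
def tmr_reliability(r_module, r_voter=1.0):
    """Reliability of a triple-majority voted system.

    The system works if the voter works and at least two of the three
    identical, independent modules work:
        R = R_voter * (3 * R_m**2 - 2 * R_m**3)
    """
    for r in (r_module, r_voter):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliabilities must lie in [0, 1]")
    majority = 3.0 * r_module**2 - 2.0 * r_module**3
    return r_voter * majority


ideal_voter = tmr_reliability(0.9)       # perfect voter
weak_voter = tmr_reliability(0.9, 0.9)   # voter no better than one module
print(ideal_voter, weak_voter)
```

With a perfect voter, modules of reliability 0.9 give a system reliability of about 0.972. If 
the voter itself is only 0.9 reliable, the voted system (about 0.875) is worse than a single 
module, which is why the voter cannot be left out of the model.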
All state of the art COTS computing modules aim to achieve system reliability through 
several fault tolerant approaches.  The EAFTC employs a combination of hardware and 
software fault tolerance. It first imposes a proper component selection, testing and 
screening process. At the hardware level, there is redundancy to mitigate hardware faults. 
At the software level, faults not covered by the hardware fault tolerance are handled 
through techniques like algorithm-based fault tolerance. At the highest level is the EAFTC 
system level fault tolerant approach, which addresses faults not covered by the application. 
It provides software replication and system monitoring services (e.g. detection of failed 
processors or failed application processes) at the system level.
The NMP ST8 applies a generic fault tolerant framework followed by application-specific 
fault tolerance. The generic fault tolerance includes middleware services and a fault tolerant 
message passing interface for inter-processor communication, while the application-specific 
fault tolerance employs schemes like the algorithm-based fault tolerance approach.
Both the EAFTC and NMP ST8 platforms use radiation hardened system controllers in the 
architecture to perform the critical function of monitoring and configuring the parallel 
clusters. Hence, these platforms are not totally COTS modules.
The REE relies on a network of distributed nodes that are connected by a resilient network 
based on Myrinet network research from DARPA. In the architecture, each processing node 
has an FPGA node controller to interface the processing node to the Myrinet network. This 
network is supposed to have no single point of failure and to support graceful degradation.
The iWarp design is more focussed on performance than reliability. There is no mention of 
fault tolerance in its communication services or in the host interface to the iWarp array.
As can be seen in most systems, system reliability in parallel computing platforms is usually 
achieved through fault tolerant interconnection and communication among the parallel 
entities. Highly reliable operation of parallel clusters is usually achieved through both 
hardware and software fault tolerance, in order to provide a holistic approach to handling 
different categories of faults. This is very apparent in the EAFTC, NMP ST8 and REE 
parallel systems.
2.4.4 Application Support Level 
Application support level measures how well the computing architecture is able to 
support different types of parallel applications. Different applications possess different 
characteristics. Some are more computation-intensive, while others are more 
communication-intensive. The effectiveness of parallel processing tasks depends on how 
well the underlying communication networks support their communication needs.  Application 
support level is also enhanced when parallel computing modules are configurable to meet 
changing mission needs. For example, an architecture that allows the application user to 
trade off reliability for performance when executing less critical processes, and to increase 
the levels of redundancy or software replication when executing critical processes, will 
optimise resource usage. Application support level also measures the ease of code 
development and portability on the platform.
Some parallel computing modules support multiple communication network topologies that 
can enable efficient inter-processor communications for a wider range of applications. The 
iWarp project is an excellent example of an architecture that supports a wide range of 
communication topologies. The iWarp allows its parallel processors to operate either as a 
loosely-clustered MIMD machine or as a tightly-coupled systolic array.  The iWarp system 
consists of iWarp processor cells in a 2-dimensional array. Each iWarp cell has a 
communications agent that supports a range of k-ary processing arrays like the mesh, torus 
and hypercube. K-ary processing arrays are efficient for applications which exhibit spatial 
communication locality [59]. The iWarp can also effectively perform programmed scatter 
and gather or data forward operations - operations required in several parallel processing 
tasks.
As for parallel processing modules that provide adaptive configuration of fault tolerance, 
the only available literature was NASA's published research on the EAFTC module. The 
EAFTC architecture provides a means of trading off performance against reliability by 
using the software replication level as a tool. This approach allows the architecture to 
better suit the needs of different mission applications, and of different stages of the 
mission.
In the REE project, there is an emphasis on using COTS software, such as a COTS operating 
system, development and support tools, and libraries. Standard Application Programming 
Interfaces are employed to ensure seamless code portability between the workstation and 
the flight hardware, one of the main advantages of COTS software systems.
It can be observed that communication network design, platform flexibility as well as a 
friendly software development and porting environment, have been the primary concerns 
of the designers in providing an enhanced application support level for the above parallel 
processing platforms.
2.4.5 Cost Effectiveness 
For the technology of a parallel computing platform to be accessible to small satellite 
platforms, it has to be cost-effective. Cost is measured in terms of component cost as well 
as resources like volume, mass and power consumption [60].  The use of COTS components 
in space missions, for example, results in large cost savings. They tend to be significantly 
cheaper, and have smaller volume, mass and power consumption, than their space-grade 
counterparts.  The cost of different chip technologies also determines how widely they are 
used in smaller space missions. FPGAs, for example, are used more often than ASICs for 
implementing logic platforms in smaller space missions, due to cost considerations [61].
The use of commercial network technologies such as the RapidIO industry standard, Gigabit 
Ethernet and Myrinet has enabled very high communication performance levels in the 
EAFTC, NMP ST8 and REE projects. The use of a commercial operating system like Lynx 
and other commercial software suites has reduced the cost of development and eased 
porting, as well as providing a familiar environment for application users.
The EAFTC parallel cluster requires a radiation-hardened Motorola 603E microprocessor 
from Honeywell for its system controller [62]. This might reduce the feasibility of the 
EAFTC for missions with smaller budgets, or for those with no access to such high 
reliability processors due to US government export restrictions.
2.5 Technology Gaps in Parallel Processing for Space 
A detailed literature survey of the state-of-the-art parallel computing platforms reveals a 
few important observations. One is the importance of the MIPS/watt parameter. Another is 
that cost effectiveness can be explored further from a different perspective, namely the 
efficiency of resource utilisation. 
Prediction of reliability or availability to estimate the system's behaviour in the space 
environment is not a new concept [13,63]. However, applying the results of system 
reliability computation to iteratively assess and justify the component selection, architecture 
and redundant parallel paths of a parallel cluster is an innovative area that could be 
explored further. This approach can avoid over-catering for fault tolerance. Conventional 
fault mitigation techniques like Triple Modular Redundancy (TMR) usually consume a large 
resource overhead (200% for TMR). Hence, the option of converting fault tolerance 
overhead into useful computation power when the mission does not need the fault tolerance 
will save cost. 
Another potential cost-saving measure is to research the use of runtime adaptive fault-
tolerance approaches, to optimise the usage of power, computing resources and expensive 
hardware board space.
In addition, a design methodology that provides a similar mechanism for configuring the 
parallel cluster's levels of fault tolerance and computation power can be researched further.  
For example, a methodology could use parallelism both to achieve high performance and as 
a means of achieving fault tolerant connectivity in the parallel cluster. The objective is to 
avoid the fixed overhead cost that results from fault mitigation schemes, and to offer 
graceful degradation in the process.
The literature survey conducted for the mesh fault-tolerant reconfiguration schemes also 
shows that the reconfiguration schemes are not optimised for the scenario where the mesh 
is implemented across more than one hardware IC chip. A single hardware IC chip has 
limited connectivity. For scalability, a state-of-the-art technology will be a mesh fault 
tolerant array architecture that can be scaled across more than one hardware chip, in an 
optimised way. First of all, this ensures that a failure in one hardware IC chip will not bring 
down the entire mesh network. In fact, single point of failure in the parallel computing 
architecture is definitely not acceptable if reliability is the goal. Once the single point of 
failure (i.e. the single platform that implements the mesh fabric) is removed, the need for an 
ultra-reliable platform to implement the mesh is also removed.
2.6 Conclusion 
It is envisioned that science data processing will increase rapidly in complexity, such that it 
needs to be performed on COTS parallel processing cluster computers in space.  This vision 
seems truer today than ever. Hence many of the next advances in fault-tolerant computing 
for space will need to be applied to parallel and scalable COTS computing platforms. 
The literature survey shows that, through a combination of hardware and software 
fault-tolerant techniques, it is possible to develop a COTS computing module with reliability 
that exceeds that of a radiation-hardened processor board (e.g. Maxwell SCS750).
Ultimately, however, the objective is to design a COTS enabling technology for space 
platforms in a cost-effective fashion. The desired result is a high performance, fault 
tolerant computing platform that is effective in terms of cost and resource usage. This is 
to ensure that such technology is applicable and easily applied to low cost small satellite 
missions.
This thesis aims to combine the concepts of parallel computing and fault tolerant 
computing to propose a cost-effective design for a new generation of powerful onboard 
computers for space use. The focus is on the architecture, design process and methodology 
for the development of such an onboard space computer. 
3 Design Methodology of a Low Cost, High 
Performance and Fault Tolerant Payload 
3.1 Onboard Computing Platform Design Methodology 
Onboard computing platform design can be divided into two broad categories.  Some space 
computing platforms are mission-critical (e.g. the on-board computer system of the Space 
Shuttle Atlantis).  They have to be designed with ultra-reliable components, as they are 
expected to have a reliability of 0.9999 [64]. Other space computing platforms can tolerate 
some degree of unavailability or unreliability and accept a reliability figure in the region 
of 0.9. The two types of systems are designed with very different design methodologies.  
The former is based strictly on fault-avoidance, while the latter is based on fault-tolerance 
[65]. The latter can tolerate the occasional occurrence of faults, but is usually protected by 
fault masking techniques that allow the system to output correct results in the presence of 
faults.
While mission-critical systems have to be designed with conservative ultra-reliable space-
design concepts and use expensive space-grade parts, the design of other systems is more 
flexible. The second category provides more opportunities to leverage the huge existing 
technological investments made in commercial processors and commercial components. 
In pursuit of an on-board computing platform that can achieve performance and reliability 
in the most cost-effective way, a design approach for such systems is needed. This is what 
the first research question aims to address.   Such a design approach assures small satellite 
missions of a cost-effective computing platform, yet with the performance and reliability 
that only bigger satellites can afford. In this chapter, a design approach is proposed to 
answer the need for high performance, high reliability and low cost computing platform. 
3.2 PPU Design Approach 
The PPU design approach allows the analysis of the trade-off options between achieving 
reliability using quality components and through fault tolerant architectural design. The 
assumption is that the payload must survive three years of operation under the expected 
satellite space environment.
 
The process of determining the final PPU architecture consists of the following five steps.  
1) Architectural Tradeoffs and Design 
2) Component Selection Process 
3) Radiation Analysis and Mitigation 
4) Parts Reliability Modelling 
5) System Architecture Reliability Modelling 
In the first step, an onboard computing payload architecture that offers low-cost but high 
computation performance with fault tolerance is evaluated and proposed. The second step 
is to establish a component selection process to meet design goals within the resource 
limitations of volume, mass, power and cost. The third step is to analyse the satellite 
radiation environment to ensure parts selected can survive the radiation, and to propose 
radiation mitigation techniques for the design architecture. The fourth step proposes a 
technique for estimating the part reliability of all individual parts in the payload. The 
reliability figures of the individual parts serve as inputs to an architectural reliability 
modelling tool in the fifth step. This allows the overall system reliability of the payload to be 
computed.  The five steps can be iterated under different PPU operational scenarios to 
ensure that the reliability goal is met. 
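How the outputs of the fourth step feed the fifth can be illustrated with a small sketch. It 
is not the modelling tool used in this work: it assumes a constant failure rate (exponential 
model) for every part, independent failures, and simple series/parallel blocks. The failure 
rates and the block structure are placeholders chosen for illustration only.

```python
import math

HOURS_PER_YEAR = 8760.0
MISSION_HOURS = 3 * HOURS_PER_YEAR  # three-year survival assumption of Section 3.2


def part_reliability(fits, hours=MISSION_HOURS):
    """R(t) = exp(-lambda * t), with lambda = fits * 1e-9 failures per hour."""
    return math.exp(-fits * 1e-9 * hours)


def series(reliabilities):
    """All blocks must work."""
    return math.prod(reliabilities)


def parallel(reliabilities):
    """At least one block must work."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


# Placeholder part failure rates in FITS (not values from this thesis).
r_power = part_reliability(500.0)
r_network = part_reliability(200.0)
r_node = part_reliability(1000.0)

# Hypothetical architecture: dual power modules, dual network elements and four
# processing nodes of which at least one must survive.
system = series([
    parallel([r_power, r_power]),
    parallel([r_network, r_network]),
    parallel([r_node] * 4),
])
print(f"System reliability over three years: {system:.6f}")
```

Changing a part grade or a redundancy level changes one input of the model, which is what 
makes the iteration over the five steps quantitative.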
The above process allows us to identify design modifications or changes to overstressed 
parts to achieve reliability objectives and meet mission needs [66]. This design methodology 
is proposed by the author, to provide a quantitative means of evaluating tradeoffs between 
different design alternatives. Through reliability prediction, critical reliability items can be 
identified and controlled. 
The materials in this chapter are the sole contribution of the author, with the exceptions of 
sections 3.4.4, 3.5, 3.6 and 3.7. Section 3.4.4 was done with the help of the XSat thermal 
team, while Sections 3.5, 3.6 and 3.7 were done with the help of the XSat Quality Assurance 
Team.
3.3 PPU Architectural Tradeoffs and Design 
In this section, desirable architectural features for a proposed low cost, high performance 
and fault tolerant parallel computing payload for space applications are identified. These 
features are then used to derive a concept diagram for such a computing platform. The 
concept diagram will then be used for the part selection process in the second stage. 
3.3.1 Parallelism Structure 
The PPU architecture achieves high performance through parallelism of COTS processing 
elements. It is expected to support both data and functional parallelism, and a Multiple-
Instruction Multiple-Data (MIMD) parallel processing architecture, as described in Section 
2.3.1, is proposed for the PPU. 
A MIMD parallel platform is an array of high speed processors in which each processor 
runs its own independent code. MIMD supports the running of multiple algorithms on 
different subsets of processors, an important requirement for the PPU as a general purpose 
software payload. Such a MIMD computer does not require the data set to match the 
number of processing elements.  It is a form of coarse-grained parallelism with a small 
number of powerful processors, rather than a large number of less powerful ones as in the 
Single-Instruction Multiple-Data (SIMD) computer.  It requires processing elements with 
high MIPS. Communication between processing nodes in the PPU parallel cluster is to be 
based on the message-passing MIMD model rather than the shared-memory MIMD model, 
to avoid memory bus contention.
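The message-passing MIMD model can be sketched in a few lines. This is an illustration 
only, not PPU code: threads stand in for processing nodes, each node runs a different 
function, and nodes exchange data solely through message queues rather than through shared 
variables. All names are hypothetical.

```python
import queue
import threading


def node(name, task, inbox, outbox):
    """One processing node: run its own task on every message it receives."""
    while True:
        message = inbox.get()
        if message is None:  # sentinel: no more work for this node
            break
        outbox.put((name, task(message)))


def run_cluster(tasks, data):
    """Send the same data block to every node; each node applies its own task."""
    outbox = queue.Queue()
    inboxes = {name: queue.Queue() for name in tasks}
    threads = [
        threading.Thread(target=node, args=(name, task, inboxes[name], outbox))
        for name, task in tasks.items()
    ]
    for thread in threads:
        thread.start()
    for name in tasks:
        inboxes[name].put(data)
        inboxes[name].put(None)
    for thread in threads:
        thread.join()
    results = {}
    while not outbox.empty():
        name, value = outbox.get()
        results[name] = value
    return results


# Two nodes running different algorithms on the same data (multiple instruction
# streams), with no constraint tying the data size to the node count.
print(run_cluster({"PN1": sum, "PN2": max}, [3, 1, 4, 1, 5]))
```

Because each node owns its inbox, no node waits on a common memory bus to reach its data, 
which is the property the message-passing choice is meant to preserve.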
3.3.2 Software Reconfiguration 
For the PPU to support software reconfiguration, non-volatile memory is required onboard 
the PPU to store a common copy of the uploaded codes, before they are streamed to the 
distributed parallel processors at runtime. This makes the process of code upload easier, as 
only the copy in the shared non-volatile memory needs to be changed. The non-volatile 
memory must be large enough to store a copy of the operating system image, processor 
boot loader codes and application codes. 
Software reconfiguration also requires the networking element in the cluster to support the 
necessary network interconnections between entities. For example, there should be 
connections among the spacecraft 2Gbyte Solid State Recorder (SSR) data interface from 
which the codes are streamed, the distributed processing nodes and the non-volatile 
memory banks.
3.3.3 Fault Tolerance 
One important feature of the PPU is that it minimises the use of space grade components by 
employing redundancy at all levels. In fact, it aims for maximum use of COTS components, 
a solution to the resource constraints of power, mass and cost. But an important 
design rule is to have no single point of failure at the hardware component level. This 
implies redundant paths for all important functionalities like power, command interface, 
data interface, processors and network interconnecting elements. 
By eliminating single-point failures, the PPU design allows for graceful degradation from a 
fully working multi-processor system down to a single processor. Fault tolerant 
interconnection of processors and network elements is the key to ensure that the PPU can 
be flexibly reconfigured to handle various fault scenarios. 
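Graceful degradation can be quantified with a k-out-of-n model. The sketch below assumes 
identical processing nodes that fail independently, and it ignores the network elements and 
power modules; the function name and the example values are illustrative.

```python
from math import comb


def at_least_k_of_n(k, n, r):
    """Probability that at least k of n identical, independent nodes still work."""
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


# Hypothetical four-node cluster with a node reliability of 0.9.
full_performance = at_least_k_of_n(4, 4, 0.9)  # every node alive
degraded_mode = at_least_k_of_n(1, 4, 0.9)     # down to a single processor
print(full_performance, degraded_mode)
```

The gap between the two figures is the benefit of allowing the payload to keep operating at 
reduced performance instead of requiring all nodes to survive.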
3.3.4 Software Development 
As for software development, the aim is to minimise the effort required of software 
developers to develop applications for the payload. Hence a familiar software, simulation 
and debugging environment for application developers is desirable. Linux is the operating 
system (OS) chosen for all processors in the PPU, so that application developers can take 
advantage of the familiar Linux PC environment and its available suite of tools.  The PPU 
resembles a Linux Beowulf cluster, a parallel computing cluster based on the open source 
Linux OS, which supports the continuous evolution of software and drivers.  
With such a software architecture, software developed on a normal Linux cluster 
workstation can be ported to the PPU by a simple recompilation in the GCC 
environment. There is no need for revalidation of the execution environment, as long as the 
underlying OS and driver libraries have been correctly validated. This is cost-effective as it 
provides a fast and efficient software test-bed.
3.3.5 Conceptual Diagram 
With the above considerations and architectural methodology, the conceptual diagram 
representative for such a parallel computing platform is shown in Figure 3-1. The various 
elements in the architecture are described below:
1) A loosely coupled parallel cluster of N parallel processing nodes  
The number of processing nodes depends on the aggregate computation performance 
required of the payload application. Each processing node has its own volatile memory 
for running its own local version of the operating system and application software.   
These parallel processing nodes are thus loosely coupled, as in the MIMD parallel 
processing model described previously, each carrying out its own processing in a 
parallel manner without a need for lock-step execution.
2) Array of Interconnected Network Elements 
Network elements are logic gate devices for the purpose of interconnecting the various 
network entities such as the processing nodes, memory banks and external command 
and data bus. For the purpose of redundancy and taking into consideration the limited 
number of input/ output pins for connectivity to entities, the architectural diagram 
proposes multiple network elements.
Figure 3-1 : Conceptual Diagram 
3) Redundant Shared Non-volatile memory bank 
The non-volatile memory banks store the operating system image and application 
software that are uploaded to the PPU payload in orbit. The architecture proposes 
multiple banks of non-volatile memory for redundancy, at least one bank per network 
element. 
4) Redundant Power supplies 
The power supply module is critical to the payload and hence there will be at least two 
power modules for the entire parallel processing cluster. This is to ensure that when 
one power module fails, half of the processing clusters can still function. 
5) External Command interface (primary and secondary)
The payload command interface is the main link to the satellite spacecraft bus. The 
payload receives telecommands and sends telemetry data for health monitoring 
purposes through this link. Once this link is lost, the satellite loses its capability to 
command the payload. Hence there should be at least two redundant links for the 
payload command interface. Onboard XSat, this link is the link to the Controller Area 
Network (CAN).
6) External Data interface (primary and secondary)
The payload data interface is the high speed link through which the PPU receives 
payload data from the spacecraft solid state recorder (SSR) and transfers processed 
payload data to the SSR. It is also through this link that newly uploaded codes stored in 
the SSR are streamed to the PPU. There should be at least two redundant links for this 
data interface.
[Figure 3-1 block labels: processing nodes PN1, PN2, ..., PNn, each with distributed volatile 
memory; array of interconnected network elements; redundant data interface; redundant 
power modules; redundant non-volatile memory bank]
3.4 PPU Component Selection Process  
The PPU payload follows a parts selection methodology to meet the mission related 
requirements and the resource constraints of mass, volume, power and cost imposed by the 
XSat micro-satellite remote-sensing mission.  It requires iterations of the following four 
design steps.
Step 1: Derive a possible component set that can meet the design specifications.
Step 2: PCB component placement check, to confirm that the design fits a tray with a 
form factor of 36cm by 29cm. Total mass must also not exceed 1.8kg. 
Step 3: Power operation scenario profiling, to check that power consumption never 
exceeds 40W, the maximum power that can be supplied by the spacecraft power module.
Step 4: Thermal analysis, to ensure that continuous PPU operation of 50min is 
possible without violating the thermal limits of any component in the PPU.
Step 1 requires the selection of the main design components in the PPU.  Parts with space 
heritage or those with available manufacturer's test data are preferred, since an expensive 
parts testing and screening process is not feasible within this project's budget. Otherwise, 
parts from vendors with better manufacturing processes are chosen. Parts with 
manufacturer's radiation data to support performance in a radiation environment are 
considered a bonus. As the PPU needs to pack a lot of computation power within a standard 
size PCB, BGA packaged components are widely used to save volume and mass.
After component selection, step 2 involves using the footprint packages of the various IC 
chips for a PCB component placement check using Mentor Graphics Boardstation 
software [67].  If step 2 demonstrates the feasibility of the design in terms of space, step 3 
is performed. This step involves profiling the board's power consumption for different 
operation scenarios. The result of the power profiling is an Excel spreadsheet which shows 
the board's power consumption in different area zones, computed under nominal and peak 
operation scenarios. 
The power profile of the different PCB area zones is then used for board thermal analysis 
in step 4. The board thermal analysis involves deriving the estimated temperature reached 
in the various zones of the board. This is to verify that no component will be heated beyond 
the manufacturer specified ratings in its datasheet.
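The four checks can be summarised as a single pass/fail evaluation of a candidate design. 
The sketch below uses the limits quoted above (36cm by 29cm tray, 1.8kg, 40W); the data 
structure and field names are hypothetical and stand in for the outputs of the placement, 
power profiling and thermal analysis tools.

```python
from dataclasses import dataclass

# Limits quoted in this section.
TRAY_CM = (36.0, 29.0)
MAX_MASS_KG = 1.8
MAX_POWER_W = 40.0


@dataclass
class CandidateDesign:
    # All field names are hypothetical placeholders for tool outputs.
    meets_specifications: bool    # step 1 outcome
    board_cm: tuple               # (length, width) from the placement check
    mass_kg: float
    peak_power_w: float           # worst case over all operation scenarios
    min_thermal_margin_c: float   # smallest (rated limit - predicted temperature)
                                  # over all components after 50 min of operation


def failed_steps(design):
    """Return the step numbers (1 to 4) that the candidate design fails."""
    failed = []
    if not design.meets_specifications:
        failed.append(1)
    length, width = sorted(design.board_cm, reverse=True)
    if length > TRAY_CM[0] or width > TRAY_CM[1] or design.mass_kg > MAX_MASS_KG:
        failed.append(2)
    if design.peak_power_w > MAX_POWER_W:
        failed.append(3)
    if design.min_thermal_margin_c < 0.0:
        failed.append(4)
    return failed


candidate = CandidateDesign(True, (30.0, 20.0), 2.0, 45.0, 5.0)
print(failed_steps(candidate))  # this candidate is too heavy and draws too much power
```

A non-empty result sends the design back to step 1 with a revised component set, which is 
the iteration described above.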
3.4.1 Step 1: Deriving a possible component set 
Component assessment criteria are divided into the following four main categories of 
considerations. The checklist for each category is as given in Table 3-1. 
Table 3-1: Component Selection Checklist
Category                        Parameter                                     Candidate 1  Candidate 2  Candidate 3
Part No / Description           Part Description
Electrical, Mechanical and      Power Specifications
Performance Characteristics     Performance Specifications
                                Mechanical Specifications
Environmental Assessment        Package/Material Outgassing Specifications
                                Not a Prohibited Part
                                Temperature Range
Procurement Assessment          Lead Time
                                MOQ
                                Cost
                                ITAR Restriction
                                Manufacturer or Vendor Quality
Risk Assessment                 Space Heritage
                                Radiation Data
                                Component Grading and Failure Rate (FITS)
                                Computation
The checklist has to be completed before the suitability of a component for the project is 
assessed.  The four categories of considerations are:
1) Electrical, mechanical and performance specifications, e.g. power consumption, 
mechanical packaging, heat dissipation, clocking speed. 
2) Environmental assessment, e.g. operating temperature range or suitability of the 
package or material for the space environment.  This means meeting outgassing 
requirements of a Total Mass Loss (TML) of at most 1% and Collected Volatile 
Condensable Materials (CVCM) of at most 0.1%. Parts prohibited for use in the space 
environment include those with pure tin plating (for IC leads) because of tin 
whiskers, hollow core resistors and air capacitors (because of the vacuum 
environment) and unpassivated semiconductors (because of radiation exposure).
3) Procurement assessment, e.g. lead time, cost, Minimum Order Quantity (MOQ), 
presence of ITAR restrictions and manufacturer/vendor quality.  The 
International Traffic in Arms Regulations (ITAR) are a set of United States 
government regulations that control the export and import of defence-related 
articles and services on the United States Munitions List.
4) Risk assessment, e.g. the part failure rate analysis report (see section 3.6), space 
heritage or availability of radiation data for that part.  Part failure rate is expressed in 
FITS, the number of failures per billion hours. 
The flow chart in Figure 3-2 describes the part selection process to determine whether a 
part is accepted or rejected.  The part selected must meet design requirements and design 
constraints (power, mass, volume), procured reliably within budget and delivered within a 
reasonable time frame. 
   Figure 3-2: Component Selection Flow Chart
The part has to be assessed for risk. It must have either space heritage, supportive radiation 
data, or be used in a fault-mitigation or fault-masking fashion to protect against radiation-
induced faults. Its failure rate must also be comparable to other components in its class.
The approach is to implement fault tolerant techniques that allow acceptance of parts with 
lower reliability, but not compromising overall system reliability. Hence for COTS 
components that have neither space heritage nor available radiation data for analysis, 
radiation mitigation techniques or enhanced levels of redundancy are required (see Table 
3-2). Most selected components are COTS because they offer specifications that meet design 
requirements at a reasonable cost.
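The risk rule described above can be written down as a small decision function. This is a 
simplified reading of the rule, not a transcription of the flow chart in Figure 3-2: the 
function name and return strings are hypothetical, and the failure-rate comparison is left 
out.

```python
def risk_decision(space_heritage, radiation_data, mitigation_planned):
    """Classify a part by the evidence available for its behaviour in space."""
    if space_heritage or radiation_data:
        return "accept"
    if mitigation_planned:
        # No flight or radiation evidence: acceptable only because the
        # architecture masks or mitigates radiation-induced faults.
        return "accept with mitigation"
    return "reject"


# Heritage and radiation-data entries as listed in Table 3-2.
print(risk_decision(True, False, True))    # SA-1110: space heritage
print(risk_decision(False, True, True))    # AX1000-fg484: radiation data
print(risk_decision(False, False, True))   # AT45DB642D-CNU: relies on mitigation
```

In the PPU every part is also protected by some level of redundancy, so the third argument 
matters mainly for parts such as the serial flash that have neither heritage nor radiation 
data.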
For critical missions requiring higher reliability figures, up-screening or re-designing of 
COTS parts might be required. Selective re-designing of commercial-grade parts is NASA's 
approach to producing spacecraft electronic systems that are reliable enough for the tough 
space environment. But such costly processes are avoided in the PPU project.
Table 3-2: Comparing the PPU Component Characteristics with Space Grade Options
Component Name   Component Description  Space Heritage  Radiation Data  Radiation Mitigation Techniques                                    Part Failure Rate (FITS) and Component Grading
SA-1110          Microprocessor         Yes             No              Power Strobing / Multiple Redundant                                Commercial
AX1000-fg484     Actel Antifuse FPGA    No              Yes             Multiple Redundant; Partial TMR                                    Industrial
MTR283R3SF       DC-DC Converter        Yes             No              Dual Redundant                                                     Military
FMC-461          EMI Filter             Yes             No              Dual Redundant                                                     Military
K4S561632H       SDRAM                  No              Yes             Multiple Redundant; EDAC                                           Industrial
AT45DB642D-CNU   Serial Flash           No              No              Multiple Redundant; Triple-Majority-Read Flash Content Correction  Industrial
C515C            CAN microcontroller    Yes             No              Dual Redundant                                                     Industrial
TJA1054AT        CAN Transceiver Chip   Yes             No              Dual Redundant                                                     Industrial
DS90C031AWGQML   LVDS Driver            Yes             No              Dual Redundant                                                     Military
DS90LV032AWGQML  LVDS Receiver          Yes             No              Dual Redundant                                                     Military
3.4.1.1 Processing nodes 
The PPU computing payload uses the Intel® StrongARM SA-1110 processor as its main 
processing element for three reasons. Firstly, the SA-1110 has an impressive 
performance/ power and performance/ cost ratio. Secondly, it has space heritage in a couple 
of space missions. Its predecessor, the SA-1100, was used as the computer onboard the 
SNAP1 Nano-satellite - a Surrey Satellite Technology Ltd initiative [68]. This was followed 
by the use of SA-1110 on Nano-satellites by Tsinghua University. Thirdly, the SA-1110 chip 
occupies a small PCB footprint.
Each SA-1110 is rated at 235 Dhrystone 2.1 MIPS at a clock speed of 206 MHz and 
consumes a nominal power of 0.5 W, up to a maximum of 1 W. An array of 20 StrongARMs 
provides a combined peak computation power of 4700 Dhrystone 2.1 MIPS. This works out 
to be nominally 400 Dhrystone 2.1 MIPS/watt.
In comparison, XSat's main On Board Computer (OBC) uses a radiation-hardened ERC32 
processor that offers only 20 MIPS of computation power with 5 W of power consumption. 
Cost-wise, one SA-1110 is only about US$80 while one radiation-hardened ERC32 processor 
chip costs about US$16000.
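The comparison above can be cross-checked with simple arithmetic. The following is a minimal Python sketch; the function and constant names are illustrative, and only the MIPS, power and cost figures come from the text. Dividing the rated 235 MIPS by the 0.5 W nominal power gives 470 MIPS/W for the processor alone, of the same order as the nominal 400 MIPS/W quoted above (the difference may reflect rounding or overheads that are not itemised here).

```python
# Figures quoted in the text; names are illustrative only.
SA1110_MIPS, SA1110_NOMINAL_W, SA1110_USD = 235, 0.5, 80
ERC32_MIPS, ERC32_W, ERC32_USD = 20, 5.0, 16000
ARRAY_SIZE = 20  # number of StrongARM processors in the PPU


def mips_per_watt(mips, watts):
    """Performance/power ratio."""
    return mips / watts


def mips_per_dollar(mips, usd):
    """Performance/cost ratio."""
    return mips / usd


array_peak_mips = ARRAY_SIZE * SA1110_MIPS

print("Array peak MIPS:", array_peak_mips)
print("SA-1110 MIPS/W:", mips_per_watt(SA1110_MIPS, SA1110_NOMINAL_W))
print("ERC32   MIPS/W:", mips_per_watt(ERC32_MIPS, ERC32_W))
print("SA-1110 MIPS/US$:", mips_per_dollar(SA1110_MIPS, SA1110_USD))
print("ERC32   MIPS/US$:", mips_per_dollar(ERC32_MIPS, ERC32_USD))
```

On these figures the SA-1110 is about two orders of magnitude ahead of the ERC32 in performance per watt, and more than three orders of magnitude ahead in performance per dollar.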
3.4.1.2 Interconnecting elements for the processing cluster 
The network elements in the PPU architecture are based on FPGA logic 
devices. Antifuse FPGAs are preferred over SRAM-based FPGAs because they 
have better resilience to radiation and single-event-upsets (SEUs) [69]. The internal 
architecture of antifuse FPGAs will not be affected by SEUs since the cells are programmed 
by blowing physical fuses. SRAM-based FPGAs, on the other hand, require protection of their 
configuration bit-streams from corruption [70,71,72] and other advanced 
mitigation techniques [73,74]. The disadvantage of antifuse FPGAs is that they are 
not reprogrammable. However, the proposed architecture does not require the FPGAs to be 
reconfigurable. Flexible switch architectures implemented in the FPGAs still allow 
configuration of the network interconnection without reprogramming the FPGAs.
Among the various vendors of antifuse FPGAs, Actel is preferred because of its vast 
experience in developing space-grade FPGAs which have been flown on many NASA 
missions. Actel produces FPGAs of various grades (commercial, industrial, military, 
radiation-tolerant and radiation-hardened parts). However, parts that are radiation-tolerant 
or radiation-hardened are under ITAR control and are difficult to procure. Hence for ease of 
acquisition and to save cost, only non-radiation-hardened FPGAs from Actel are chosen. 
In particular, Actel AX1000-fg484 industrial grade FPGAs are chosen because they have 
available radiation data and share a similar core architecture with the RTAX1000 FPGAs 
(the radiation-tolerant version of the AX1000 FPGAs). One main difference is that the AX1000 does 
not have Triple Modular Redundancy (TMR) registers, which leaves it subject to radiation 
effects like any other non-radiation grade IC.
3.4.1.3 The DC-DC converter and the EMI filter in the power module  
The DC-DC converter and EMI filters used in the PPU power module are from Interpoint - a 
reputable manufacturer in the space industry. Though the MTR283R3SF is not a space-
grade part, it has a relatively low failure rate of about 25 failures per billion hours (or FITS), 
compared to 200 FITS for other DC-DC converters. Similarly, the FMC-461 EMI filter 
has a low failure rate of 4 FITS, which is expected since this is a passive part. Both the 
MTR283R3SF and the FMC-461 are screened to meet Mil-hdbk-338 screening requirements.
3.4.1.4 The volatile memory chips for each processing cluster 
For the distributed 64 Mbyte of SDRAM memory in every processing node, industrial grade 
SDRAM chips from Samsung are preferred. Several families of Samsung commercial 
SDRAM have good radiation data. In particular, Samsung K4S561632H SDRAM chips are 
chosen because radiation data on a predecessor, the K4S561632E-TL75, is available. 
The radiation test on this predecessor shows that it is fairly latch-up immune and has a Mean 
Time Before Failure (MTBF) of 27 days [75].
3.4.1.5 The non-volatile memory chips for each processing cluster 
Flash chips that meet design cost requirements and that have space heritage or available 
radiation data are uncommon. Hence the industrial grade Atmel serial flash chip AT45DB642D-
CNU, which has neither space heritage nor available radiation data, is chosen. It is chosen 
because of its small form factor and the small number of I/O pins required for 
interfacing to this chip. In terms of memory capacity, it provides 16Mbits of memory per 
chip, which is sufficient to store the PPU boot codes and application codes.
To compensate, a set of triple-redundant flash chips is connected to 
each FPGA network element and operated in a triple-voting manner. The triple-voting read 
access helps to mask out possible errors in the serial flash chips. The triple-majority read is 
also used by the software to check the flash contents and perform subsequent correction if 
errors are detected. In case of hardware failure of one or more flash chips, the FPGA 
network element can default to using a single flash chip in a non-voting fashion.
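The triple-majority read described above can be illustrated with a bitwise voter. This is a minimal Python sketch of the general technique, not the FPGA logic used in the PPU; the function names are hypothetical. Each output bit takes the value held by at least two of the three copies, and any copy that disagrees with the voted result is flagged for rewriting.

```python
def majority_vote(a: bytes, b: bytes, c: bytes) -> bytes:
    """Bitwise 2-out-of-3 vote across three copies of the same flash content."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("all three copies must have the same length")
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


def vote_and_find_faulty(a: bytes, b: bytes, c: bytes):
    """Return the voted content and the indices of copies that need correction."""
    voted = majority_vote(a, b, c)
    faulty = [i for i, copy in enumerate((a, b, c)) if copy != voted]
    return voted, faulty


good = bytes([0x12, 0x34, 0x56])
corrupted = bytes([0x12, 0x35, 0x56])   # one flipped bit in the second byte
voted, faulty = vote_and_find_faulty(good, corrupted, good)
print(voted.hex(), faulty)
```

The vote masks errors as long as no bit position is corrupted in more than one copy, which is why the correction step matters: rewriting the disagreeing copy prevents errors from accumulating across chips.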
3.4.1.6 Controller Area Network (CAN) Chip 
For the Controller Area Network (CAN) command interface, the PPU uses the industrial grade 
C515C-8E 8-bit microcontroller chip from Siemens. This microcontroller has an internal 
CAN controller to interface to the CAN bus. The C515C chip has space heritage as it has been 
flown on the Surrey UOSAT12 micro-satellite with no reported problems.
3.4.1.7 LVDS Transceiver Chip 
Low Voltage Differential Signalling (LVDS) transceiver chips from National Semiconductor are 
used for the high speed LVDS links with the SSR. The specific LVDS driver chip used is the 
DS90C031AWGQML, and the DS90LV032AWGQML is the LVDS receiver chip. These chips 
conform to a military grade of MIL-STD-883, and have space heritage in SATREC-I satellites.
3.4.1.8 Circuit and Interface Design 
With the selection of the main components in the PPU parallel cluster, the preliminary 
circuit and interface design is done. This translates the concept diagram to an actual 
hardware block diagram that meets design requirements. The final block diagram of the 
PPU architecture is shown in Figure 3-3. 
Figure 3-3 illustrates a parallel cluster of processing nodes interconnected via FPGA 
network elements that are arranged in a 2 x 2 square matrix configuration. Only two of the 
FPGA network elements have external interfaces to the onboard satellite command and data 
bus. Network elements with external interfaces are termed Master FPGAs, while the 
rest of the FPGAs are termed Slave FPGAs. The external interfaces include the interfaces to the 
CAN spacecraft command bus and to the 2 Gbyte SSR.
Figure 3-3: PPU Hardware Design 
To build a large parallel network, a ring network that spans across multiple FPGA network 
elements is proposed. This ring network is reconfigurable, to ensure that it can be 
reconstructed in the presence of one or more faults in the FPGA network elements. Figure 
3-4 shows the possible network configuration modes on a platform with four network elements. 
Figure 3-4:  Inter-cluster network configuration options 
3.4.2 Step 2: PCB component placement checks    
The above architecture is designed in hardware using Mentor Graphics to obtain 
preliminary PCB placement checks. 
     
Figure 3-5: (a) Top Placement View (b) Bottom Placement View
The top and bottom placement layouts are shown in Figure 3-5, designed on a PCB with a 
form factor of 36 cm by 29 cm. This verifies that the design can fit into the standard PCB 
dimensions allocated for the PPU payload onboard XSat.
3.4.3 Step 3: Power Operation Scenario Profiling  
The PPU component zoning represents a preliminary PCB hardware layout of the PPU 
components. The concept of zones is used to perform power profiling and thermal analysis
on the different sections of the board. The PPU zones are highlighted in Figure 3-6.
Figure 3-6: PPU PCB Component Zoning for Power/Thermal Analysis
Zone 1 consists of the main 28V DC-DC converter, which has an efficiency of only 75%. 
Hence 25% of the power input to the converter is converted to heat, and the rest of the 
power goes to the various electronics. The switch-mode regulators 
in zones 5 and 6 have an efficiency of 80%. Thus 20% of the power that goes into these 
regulators is converted to heat. Zone 2 contains the FPGA network elements, while zone 3 
contains the processing nodes and associated components. Zone 4 contains the chips 
related to the CAN and LVDS interfaces. Table 3-3 describes the PPU power consumption in 
the various zones of the board. The nominal performance scenario refers to the situation when the 
PPU payload is operating with 4 processing nodes simultaneously. The peak performance 
scenario refers to the simultaneous operation of 16 processing nodes. 
Table 3-3: Power Operation Scenario Profiling
Module/Component Item | Nominal power consumption (Watts) | Peak power consumption (Watts) | Efficiency (from power to heat)
Zone 1 | 8.25 | 10 | 0.25
Zone 2 | 0.5 | 0.65 | 1
Zone 3 | 2 | 2.4 | 1
Zone 4 | 0.675 | 0.81 | 1
Zone 5 | 2.25 | 2.7 | 0.2
Zone 6 | 0.625 | 0.75 | 0.2
As shown in Table 3-3, the input power to the DC-DC converter in Zone 1 is 10W at its 
peak. This is the peak power consumption for half of the processing clusters in the 
PPU. If both halves were to operate together, the total would be 20W. This is less 
than the 40W maximum that the spacecraft power module can supply.
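The power-to-heat bookkeeping behind Table 3-3 can be sketched as follows. This is an illustrative Python calculation using the table values; the data structure and function names are hypothetical. Each zone's heat load is its power consumption multiplied by the fraction that is converted to heat, and the total input power for both halves of the PPU is compared against the 40W supply limit.

```python
# (nominal W, peak W, fraction of power converted to heat), per Table 3-3
ZONES = {
    "Zone 1": (8.25, 10.0, 0.25),
    "Zone 2": (0.5, 0.65, 1.0),
    "Zone 3": (2.0, 2.4, 1.0),
    "Zone 4": (0.675, 0.81, 1.0),
    "Zone 5": (2.25, 2.7, 0.2),
    "Zone 6": (0.625, 0.75, 0.2),
}
SUPPLY_LIMIT_W = 40.0  # maximum the spacecraft power module can supply


def heat_load(power_w, heat_fraction):
    """Heat dissipated in a zone, in watts."""
    return power_w * heat_fraction


peak_heat = {zone: heat_load(peak, frac) for zone, (_, peak, frac) in ZONES.items()}
total_peak_heat_one_half = sum(peak_heat.values())
total_peak_input_both_halves = 2 * ZONES["Zone 1"][1]  # Zone 1 is the converter input

for zone, watts in peak_heat.items():
    print(f"{zone}: {watts:.2f} W of heat at peak")
print(f"Peak heat, one half of the PPU: {total_peak_heat_one_half:.2f} W")
print(f"Peak input, both halves: {total_peak_input_both_halves:.1f} W "
      f"(limit {SUPPLY_LIMIT_W:.0f} W)")
```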
3.4.4 Step 4: Thermal analysis to check for component temperature violations 
Thermal analysis is done for the entire XSat structure by the thermal team. The thermal 
model of the PPU was built using FEMAP (Finite Element Modelling and Post-processing). 
FEMAP is a finite element processor and modeller. The analysis is done using SINDA/G 
(Systems Improved Numerical Differencing Analyzer/Gaski) and NEVADA (Net Energy 
Verification and Determination Analyzer). 
Thermal analysis requires the simulation of the worst hot and cold cases, which are 
respectively the marginal hot and cold scenarios. The marginal hot scenario occurs when the 
satellite experiences the greatest heat flux from the environment and the longest sunlit 
period. The marginal cold scenario is the reverse and occurs when the satellite 
experiences minimum environmental heat flux and the longest eclipse. For the marginal hot 
case, it is assumed that all processing clusters of the PPU are operated simultaneously for 
50 minutes. For each of the marginal hot and cold scenarios, the maximum and minimum 
temperatures of the various zones in the PPU are shown in Table 3-4. The results obtained 
from the simulation have a +/-15°C margin added to them. The simulation results 
show that even with the added margin, none of the zones have temperature violations.
Table 3-4: The PPU Thermal Simulation Results for the various zones
PPU Module | Component/Zone Operating Temperature Min (°C) | Component/Zone Operating Temperature Max (°C) | QM Thermal Design Result Min, Marginal Cold (°C, with +/-15°C margin) | QM Thermal Design Result Max, Marginal Hot (°C, with +/-15°C margin)
Zone 1 | -55 | 125 | 12.2 | 66.3
Zone 2 | -40 | 85 | 12.3 | 68.3
Zone 3 | 0 | 70 | 12.5 | 62.4
Zone 4 | -20 | 80 | 12.1 | 59.7
Zone 5 | -40 | 85 | 12.2 | 66.3
Zone 6 | -40 | 85 | 12.2 | 68.6
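The violation check on Table 3-4 amounts to comparing each simulated temperature range with the rated operating range of the zone. A minimal Python sketch, using the table values and hypothetical names, is given below; it also reports which zone has the smallest remaining headroom on the hot side.

```python
# (rated min, rated max, simulated min, simulated max) in degrees Celsius,
# simulated values already include the +/-15 degree margin (Table 3-4)
THERMAL = {
    "Zone 1": (-55, 125, 12.2, 66.3),
    "Zone 2": (-40, 85, 12.3, 68.3),
    "Zone 3": (0, 70, 12.5, 62.4),
    "Zone 4": (-20, 80, 12.1, 59.7),
    "Zone 5": (-40, 85, 12.2, 66.3),
    "Zone 6": (-40, 85, 12.2, 68.6),
}


def temperature_violations(table):
    """Zones whose simulated range falls outside the rated operating range."""
    return [zone for zone, (lo, hi, sim_lo, sim_hi) in table.items()
            if sim_lo < lo or sim_hi > hi]


def tightest_hot_margin(table):
    """Zone with the least headroom between simulated max and rated max."""
    zone = min(table, key=lambda z: table[z][1] - table[z][3])
    return zone, table[zone][1] - table[zone][3]


print("Violations:", temperature_violations(THERMAL))
print("Tightest hot-side margin:", tightest_hot_margin(THERMAL))
```

On the table values no zone is in violation; Zone 3, which holds the commercial-grade processing nodes, has the least hot-side headroom at about 7.6°C.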
The temperature history of zone 2 in one orbit is shown in Figure 3-7. Note that the unit of 
the X axis is 5 seconds and that of the Y axis is °C. The temperature increases 
from 33.4°C to 53.3°C in 50 minutes during its operation. 
Figure 3-7: The PPU Thermal Plot for Zone 2 
3.5 PPU Radiation Analysis 
Radiation effects on components are not considered in the Part reliability computation in 
Section 3.6 because quantitative computation of their effects requires detailed radiation data 
from the component manufacturers, which is not common for COTS components. However, 
knowledge of the types of faults induced in electronics in a space radiation environment and 
their estimated probability of occurrence is important. Such information is useful in the 
selection of radiation effect mitigation techniques. This is important in the development of 
a COTS computing payload, which tends to be more susceptible to radiation.
Hence the first part of the radiation analysis feeds XSat orbit information into 
radiation environment models for an assessment of the XSat radiation environment. This 
helps estimate the PPU's exposure to radiation during its mission lifetime. The next part 
of the radiation analysis is to validate the radiation performance of the Actel AX1000 FPGA, the 
network element used in the PPU. The reliability of the AX1000 is crucial for the PPU as it 
contains the logic that controls all entities and interfaces in the PPU. Radiation analysis is 
possible for this component as it has available cross-sectional Linear Energy Transfer (LET) 
information. Hence, an analysis is done to predict the single-event-upset (SEU) rate for the 
AX1000 FPGA. The SEU rates for the FPGA system gates and its internal memories are 
considered when suggesting radiation mitigation techniques for this chip. 
3.5.1 Radiation Effects and Analysis in the PPU 
Essentially, high energy particles in space can interact with microelectronic devices and 
deposit energy in them by ionizing the surrounding silicon atoms [76]. The ability of a 
particle to deposit energy in the device is expressed by the 
Linear Energy Transfer (LET) parameter. The event in which a single particle deposits 
its energy in the device is called a single-event-effect (SEE). This event could result in a bit-
flip in logic states, which is termed a single-event upset (SEU). It could also cause a single-
event transient (SET) if the deposition of charge by a single particle into a combinatorial 
gate induces a transient spurious signal or voltage in the device. If this transient 
propagates through a combinatorial circuit path into a downstream register, it would cause 
the registering of a wrong state.
Other than single-event-upsets, the chips can also suffer from Total Ionizing Dose (TID) 
effects. These are due to the cumulative charge build-up caused by ionization in an IC device. The 
subsequent effects could be degradation in electrical parameters or even leakage currents 
in the device. Thus, the susceptibility of devices to TID limits the number of years the device 
can survive in the specified orbit.
State-of-the-art COTS components are generally sufficiently hard to Total Ionizing Dose 
(TID) to survive a three year mission in a LEO orbit. However, they are susceptible to 
single-event-upsets from the high energy particles in space, which result in bit-flips in the 
logic gates within the device.
Radiation analysis requires the use of established software packages such as Spacerad [77] 
or CREME96 [78]. The Cosmic Ray Effects on Microelectronics (CREME) model was developed 
by the Naval Research Laboratory and is available as part of CNES's OMERE or ESA's SPENVIS 
software tool. The SPENVIS software tool is used in the XSat Project. These 
software tools incorporate radiation environment models to assess the exposure of electronics 
to radiation based on the given satellite orbit information. Given the LET cross-sectional 
information of a particular component, the radiation effects on that component can be 
assessed for the particular orbit. The software tools can also compute the SEU probability of 
that component. The first part, assessing the XSat radiation environment, is done in 
Section 3.5.2, while the second part, analysing radiation effects on the AX1000 FPGA, is 
done in Section 3.5.4. 
3.5.2 Analysis of XSat Radiation Environment 
The orbit that a device is in largely determines the amount of radiation that the device is 
subjected to. XSat operates in a sun synchronous orbit at a nominal altitude of 817km, with 
an inclination of about 99 degrees. This gives approximately 14 orbits per day, where 30% 
of each orbit is in eclipse. 
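The figure of approximately 14 orbits per day follows from Kepler's third law. The sketch below is an illustrative Python check that assumes a circular orbit and uses standard values for the Earth's gravitational parameter and mean radius; the function names are hypothetical, and the 30% eclipse fraction is simply taken from the text.

```python
import math

MU_EARTH_KM3_S2 = 398600.4418   # Earth's gravitational parameter
EARTH_RADIUS_KM = 6371.0        # mean Earth radius
SECONDS_PER_DAY = 86400.0


def orbital_period_s(altitude_km):
    """Period of a circular orbit at the given altitude (Kepler's third law)."""
    semi_major_axis_km = EARTH_RADIUS_KM + altitude_km
    return 2.0 * math.pi * math.sqrt(semi_major_axis_km ** 3 / MU_EARTH_KM3_S2)


def orbits_per_day(altitude_km):
    return SECONDS_PER_DAY / orbital_period_s(altitude_km)


period_min = orbital_period_s(817.0) / 60.0
print(f"Period: {period_min:.1f} min")
print(f"Orbits per day: {orbits_per_day(817.0):.2f}")
print(f"Eclipse per orbit at 30%: {0.30 * period_min:.1f} min")
```

At 817 km this gives a period of roughly 101 minutes, hence a little over 14 orbits per day and about 30 minutes of eclipse per orbit.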
Based on XSat's orbit parameters, the SPENVIS space radiation analysis tool [79] is used to 
predict the exposure of XSat to the following three main sources of radiation. 
1. Particles trapped by the Earth's magnetic field (i.e. the trapped particle belts)
2. Solar flares, where particles (electrons, protons and heavy ions) are blasted from the 
Sun towards the Earth in the form of solar wind 
3. Cosmic rays, which originate outside our solar system 
3.5.2.1 Exposure to Trapped Particle Belts in XSat Orbit 
From the SPENVIS tools, the world distribution of electron flux encountered in the XSat 
orbit is obtained and shown in Figure 3-8. This gives a good indication of XSat's exposure to 
the trapped particle belts. The trapped radiation belt comprises belts of electrons and 
protons trapped in the magnetic field surrounding the Earth. The electron environment 
consists of two flux maxima; the inner zone extends to 2.5 RE (RE = Earth radius = 6371km) and the
outer zone from 3 to 12 RE. The outer zone envelops the inner zone, its contours extending 
towards the Earth in cusps of relatively high density flux. Electron energies are up to 7 MeV,
with the most energetic particles occurring in the outer zone.
Figure 3-8: World distribution of Electron Flux encountered in the XSat orbit 
The level of fluxes depends on a satellite's orbit inclination and altitude (see Figure 3-9
and Figure 3-10). Figure 3-9 shows that XSat is exposed to more electrons in the outer 
belt when passing the polar regions, as compared to LEO satellites at lower inclinations.
Figure 3-10 shows that as the satellite altitude increases from 600 km to 1000 km, the level 
of trapped particles increases by 10 times.
Figure 3-9: Integral Electron and Proton Fluxes for Different Orbits  
Figure 3-10: Electron and Proton Fluxes for XSAT Orbit at different Altitudes 
Shielding for Proton Differential Fluxes
3.5.2.2 Exposure to Solar Flares in XSat Orbit 
The exposure of XSat to solar flares is given in Figure 3-11. Solar flares are emitted by the 
Sun during solar disturbances and they vary with the solar cycle. The solar cycle consists of 
7 active years when significant solar-proton flux may be expected and 4 quiet years when 
only minor events occur. XSat's highly inclined orbit is exposed to solar flares even at low 
altitudes, whereas equatorial orbits are largely shielded from solar protons except at high 
altitudes.
Figure 3-11: Solar Proton Fluences for XSAT 
3.5.2.3 Exposure to Galactic Cosmic Rays (GCRs) in XSat Orbit
Galactic cosmic rays (GCRs) originate outside the solar system and comprise about 85% 
protons, 14% alpha particles and 1% heavier nuclei. GCRs are at their peak level during 
solar minimum and at their lowest level during solar maximum. XSAT is to be launched in 2011, and 
hence approaches the start of a solar maximum. 
The level of exposure to GCRs also varies with the spacecraft orbit. Satellites with higher 
inclination and altitudes have less geomagnetic shielding and higher exposures to GCR 
particles. Hence XSAT is exposed to the full range of GCR particles even though its altitude 
is comparable to that of other LEO satellites. 
Although GCR flux levels are low compared to trapped particles, their high energies make 
them penetrating and an important contributor to SEE. Satellite shielding is only effective 
for ions with energies below 1000 MeV/nucleon. Hence other mitigation measures (e.g. power
cycling and reset) need to be explored.
3.5.2.4 XSat Total Dosage Computation 
XSat TID analysis involved calculating radiation doses at the different locations of major 
satellite electronic units when the satellite is in orbit.  Firstly, a simplified geometrical 
model of the XSAT satellite configuration is constructed using simple shapes. This is to 
allow the simulation of a large number of rays, which are followed out through the 
geometry. The total aluminium (Al) shielding thickness along each ray is computed. Based 
on the shielding thickness at each point of the spacecraft, the corresponding total dosage is 
read off from a graph which plots the total absorbed dosage versus the Al shield depth 
curve.
          
Figure 3-12: Dose Depth Curve for Silicon Target behind Al Shielding.
The total dose versus Al shield-depth curve shown in Figure 3-12 is generated using the 
SHIELDOSE program in SPENVIS. It calculates the dose absorbed in different detector 
materials for protons and electrons incident on an aluminium sphere geometry. 
XSAT is modelled as a box with dimensions of approximately 0.85m x 0.6m x 0.6m. The 
structure of the body is assumed to be constructed from 10 mm Al honeycomb panels 
(density 50 kg/m3), with 0.4 mm Al face-sheets on both sides. The honeycomb panels can be 
treated as solid Al with an equivalent Al thickness of 0.97 mm since the density of solid Al is about 
2700 kg/m3. Similarly, the solar array is modelled as two panels with dimensions of 
0.6m x 0.8m and an equivalent Al thickness of 0.67 mm. The model of XSat is shown in Figure 
3-13.
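The equivalent-thickness figure can be approximated by scaling the honeycomb core by the ratio of densities and adding the two face-sheets. The Python sketch below is illustrative only: the function name is hypothetical, and the formula is the usual areal-density equivalence, which the text does not spell out. With the rounded densities quoted above it gives roughly 0.99 mm, close to the 0.97 mm used in the model.

```python
AL_DENSITY_KG_M3 = 2700.0  # approximate density of solid aluminium


def equivalent_al_thickness_mm(core_mm, core_density_kg_m3, face_sheet_mm,
                               n_face_sheets=2):
    """Solid-Al thickness with the same mass per unit area as a sandwich panel."""
    core_equivalent_mm = core_mm * core_density_kg_m3 / AL_DENSITY_KG_M3
    return core_equivalent_mm + n_face_sheets * face_sheet_mm


panel_mm = equivalent_al_thickness_mm(core_mm=10.0, core_density_kg_m3=50.0,
                                      face_sheet_mm=0.4)
print(f"Equivalent Al thickness: {panel_mm:.3f} mm")
```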
Figure 3-13: Geometrical model of XSat using the SPENVIS Software 
3.5.3 Implication of XSat Radiation Environment Analysis Results 
Due to its highly inclined polar orbit, XSAT will experience a worse radiation environment than a lower 
inclination LEO satellite. This is primarily due to its greater exposure to trapped electrons, 
solar particles and cosmic rays. Trapped electrons can be effectively shielded out, but 
trapped protons are a concern. Fortunately, protons do not cause SEE directly, and the 
PPU mainly operates in the equatorial region rather than the polar regions.
The radiation analysis of XSat also shows that as the altitude of the satellite increases from 
600km to 1000km, the level of trapped particles increases by 10 times and the dose over 
three years ranges from 3.15 to 12.03 krad under solar min or 2.76 to 15.21 krad under 
solar max. Hence the altitude of the XSat orbit should not be near 1000 km if the use of COTS 
components is to remain feasible. The expected XSAT orbit altitude is 817km.
Based on the range of TID experienced by components in the XSat orbit, the PPU 
components should preferably have a TID rating of at least 10 krad. Most COTS 
components can survive such levels of total dosage. Hence the use of COTS is possible for the 
PPU.
Analysing the radiation environment for the XSat orbit is necessary for estimating the 
amount of radiation that the components on the PPU are exposed to. The Total Ionising Dose 
(TID) experienced by the PPU module can be read off from the dose-depth curve (Section 
3.5.2.4) once the average Al shielding thickness is determined. Based on the XSat structural 
design, the shielding thickness for the PPU is about 3.52mm. Hence by reading off the dosage 
curve, the TID can be found. Assuming the worst case condition of solar max occurrence 
and an XSat orbit altitude of 875km, the TID experienced by the PPU will be 7.98 krad. A 
minimum TID rating of 10krad is thus set for components to be used in the PPU. Most COTS 
components can survive such levels of total dosage, making the use of COTS components for 
the PPU hardware feasible.
3.5.4 Radiation Analysis on AX1000s Actel FPGA
In this section, single event upset rates for the Actel AX1000 antifuse FPGA are computed 
because cross-sectional LET data is available for it and because of its key role in 
the PPU hardware fault tolerant architecture. The available cross-section curve for the AX1000
is based on SEE testing done by Actel [80] (see Figure 3-14). The susceptibility of this 
component to single-event-upsets (SEUs) will determine the extent of design measures needed to 
mitigate them. 
SEE rate prediction involves identifying the size of the sensitive volume, calculating the rate 
of ion hits and the consequent energy depositions. The subset of total ion hits that causes 
SEU has to be determined. The SEE rate for the AX1000 is assessed using the CREME model, 
which calculates the Linear Energy Transfer (LET) spectra of incident radiation on 
electronics inside XSAT. This is based on the orbit parameters, the shielding surrounding 
the electronics and the solar environment. 
Figure 3-14: Cross Section Vs LET Curve for AX1000 R-Cells 
The cross section and sensitive volume of the AX1000 are integrated over the LET spectrum 
to determine the SEU rate. The output files from CREME contain energy and LET spectra 
and single event response rates for the AX1000. Table 3-5 compares the single 
event response rates for the AX1000 against the XSat requirement. 
Table 3-5: Radiation Data for AX1000S
Radiation Specs | AX1000S FPGA | XSat Requirement
SEU LETTH (MeV cm2/mg) | 1.4 (for SRAM); 2.89 (for registers); 6.73 (for clock & control logic) | 37 (satellite industry benchmark)
SEU Rate (errors/flip-flop/day) | 8.9 x 10^-6 | 
SEL LETTH (MeV cm2/mg) | 120 | 40
Single-Event Transient (SET) | NA | NA
Total Ionizing Dose | 200 krad (Si) | 10 krad (Si)
3.5.5 Radiation Mitigation Techniques for AX1000 
From the analysis report, the AX1000 has a TID rating of 200 krad. The AX1000 does not show 
much change in its core supply current characteristics up to an accumulated total radiation 
dose of 200 krad. This is well beyond the 10 krad estimated total dosage for the XSat orbit. 
Hence, the AX1000 does not need additional shielding to protect against total dosage. 
Based on published radiation data, the AX1000 has a high linear energy transfer threshold 
(LETTH) for single event latch-up (SEL). At 120 MeV cm2/mg, it is considered latch-up 
immune. But the AX1000 single-event-upset (SEU) LETTH is much lower than the satellite 
industry's benchmark of 37 MeV cm2/mg [81]. The expected SEU rate for the AX1000 is 8.9 
x 10^-6 upsets/flip-flop/day. From the number of flip flops used in the FPGA logic design, there 
is likely to be 0.0267 upsets per day or 1 upset in 38 days.
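The device-level upset rate is the per-flip-flop rate multiplied by the number of flip-flops in use. The Python sketch below is illustrative; the flip-flop count of 3000 is not stated in the text but is inferred here from the two quoted figures (0.0267 / 8.9 x 10^-6), and the function names are hypothetical.

```python
SEU_RATE_PER_FF_PER_DAY = 8.9e-6   # from the CREME analysis quoted above
INFERRED_FLIP_FLOPS = 3000         # assumption: back-calculated, not from the text


def upsets_per_day(n_flip_flops, rate=SEU_RATE_PER_FF_PER_DAY):
    """Expected number of register upsets per day for the whole design."""
    return n_flip_flops * rate


def mean_days_between_upsets(n_flip_flops, rate=SEU_RATE_PER_FF_PER_DAY):
    return 1.0 / upsets_per_day(n_flip_flops, rate)


print(f"Upsets per day: {upsets_per_day(INFERRED_FLIP_FLOPS):.4f}")
print(f"Mean days between upsets: {mean_days_between_upsets(INFERRED_FLIP_FLOPS):.1f}")
```

This gives a mean interval of between 37 and 38 days, in line with the figure quoted above. Because the rate scales linearly with the register count, protecting only the critical registers reduces the number of unprotected upsets in direct proportion.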
Hence a mitigation technique against SEU is advisable for the AX1000. Employing triple 
majority voting logic for each register in the FPGA logic design is one method of making the 
registers resilient to SEUs. The inbuilt triple-majority-redundant (TMR) registers of the 
Actel RTAX FPGAs are what make that family of FPGAs radiation-tolerant. However, a 
complete TMR approach is costly as it requires three times the register resources. Hence, the 
best alternative is to adopt a partial TMR approach on critical portions of the circuit, 
such as the communication networks. Logic to power cycle the FPGA should also be present 
in case irrecoverable errors occur.
The internal SRAM memory in the AX1000 also has a low SEU LETTH. For internal SRAM 
memory that stores data sensitive to single bit errors, Actel Axcelerator's intellectual 
property (IP) core for an SRAM circuit with error detection and correction (EDAC) based 
on a class of linear block codes called the shortened Hamming codes is used. The core is 
accessed via Actel's SmartGen Macro Builder software [82]. This class of code is able to 
correct a single bit error and detect two bit errors. The functional diagram of this EDAC core 
is shown in Figure 3-15. The SRAM/EDAC core combination works effectively 
to address the effects of soft errors, achieving error rates better than 10^-10 errors/bit-
day.
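The single-error-correcting, double-error-detecting behaviour described above can be demonstrated with a small extended Hamming code. The Python sketch below uses a toy (8,4) code purely for illustration; it is not the Actel EDAC core, whose word width and code parameters are not given here, and the function names are hypothetical.

```python
def encode(nibble: int) -> int:
    """Encode 4 data bits as an 8-bit extended Hamming (SEC-DED) codeword."""
    bits = [0] * 8                       # bits[0] is the overall parity bit
    bits[3], bits[5], bits[6], bits[7] = ((nibble >> i) & 1 for i in range(4))
    bits[1] = bits[3] ^ bits[5] ^ bits[7]    # parity over positions 1,3,5,7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]    # parity over positions 2,3,6,7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]    # parity over positions 4,5,6,7
    bits[0] = sum(bits[1:]) & 1              # makes the total parity even
    return sum(b << i for i, b in enumerate(bits))


def decode(word: int):
    """Return (data, status); data is None for an uncorrectable double error."""
    bits = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for position in range(1, 8):
        if bits[position]:
            syndrome ^= position         # non-zero syndrome points at the bad bit
    parity_error = sum(bits) & 1
    if syndrome == 0 and not parity_error:
        status = "ok"
    elif parity_error:                   # odd number of flips: assume one, fix it
        bits[syndrome] ^= 1              # syndrome 0 means bits[0] itself flipped
        status = "corrected"
    else:                                # even parity but non-zero syndrome
        return None, "double-error"
    data = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return data, status


word = encode(0b1011)
print(decode(word))                      # clean read
print(decode(word ^ 0b00100000))         # one flipped bit is corrected
print(decode(word ^ 0b00100100))         # two flipped bits are only detected
```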
Figure 3-15: Functional Diagram of EDAC RAM
3.6 Part Reliability Modelling 
After the components in the PPU are selected, the failure rates of various individual 
components in the payload are derived. The results will be used to compute the overall 
system reliability of the payload in actual operation by constructing a Reliability Block 
Diagram, as elaborated in the next section [83]. 
In the Part Failure Analysis of the PPU, the component failure rates are derived from one of 
the methods below.
1. Manufacturer's High Temperature Operating Life (HTOL) test data
2. Estimation using empirical reliability models from MIL-HDBK-217F Notice 2, 
Reliability Prediction of Electronic Equipment
Life test failure rates are preferred if they are available, as they are presumably more 
accurate. Unfortunately, only a limited number of component manufacturers (e.g. Texas 
Instruments, National Semiconductor and Analog Devices) provide them. Hence the 
reliability computation for the rest of the parts is based on estimations using the Mil-Hdbk-217 
reliability models. Mil-Hdbk-217 reliability models incorporate many factors such as packaging 
considerations, electrical stress, thermal stress and environmental conditions. Each 
factor is determined from the information provided by the part's datasheet, the PPU 
hardware design schematic and the XSat operational conditions. To facilitate the analysis of 
each factor in the part failure rate computation, the Lambda Predict software from ReliaSoft 
is used. 
The results of the part failure analysis are given in Appendix A. Components with life 
test data constitute a larger portion of the IC components. For reliability prediction, the
operating temperature of XSat is taken as 55 degrees Celsius. In a space vacuum
environment, operating temperature refers to the maximum temperature of the mounting 
surface in contact with a part (e.g. heat sink). 
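A failure rate in FITS can be converted into a survival probability over the mission if a constant failure rate (exponential model) is assumed, as in MIL-HDBK-217. The Python sketch below illustrates this with the 25 FITS and 4 FITS figures quoted earlier for the DC-DC converter and EMI filter; the function names are hypothetical, and the calculation ignores duty cycle and dormancy.

```python
import math

HOURS_PER_YEAR = 8760.0


def fits_to_failures_per_hour(fits):
    """1 FIT = 1 failure per 10^9 device-hours."""
    return fits * 1e-9


def reliability(fits, years):
    """Survival probability under a constant failure rate: R(t) = exp(-lambda * t)."""
    hours = years * HOURS_PER_YEAR
    return math.exp(-fits_to_failures_per_hour(fits) * hours)


def series_fits(*part_fits):
    """Failure rates add for parts that must all work (series block)."""
    return sum(part_fits)


converter_fits, filter_fits = 25.0, 4.0
power_chain_fits = series_fits(converter_fits, filter_fits)
print(f"Converter, 3 years: {reliability(converter_fits, 3):.6f}")
print(f"Converter + filter in series, 3 years: {reliability(power_chain_fits, 3):.6f}")
```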
3.7 System Reliability Modelling and System Trade-off  
The goal in this section is to assess the inherent reliability of the proposed payload 
architecture against its reliability goals over three years of operation in the space 
environment. Reliability Block Diagram (RBD) modelling is used as a means to evaluate and 
refine the architectural and detailed circuitry design. RBD modelling displays a system as 
functional blocks, where the blocks are interconnected through series, parallel or complex 
configurations. Series blocks represent functionalities that depend on all members of the 
chain to function properly; that is, when one fails, the whole chain fails. Blocks in parallel 
represent redundant paths. Each block has a reliability figure that is calculated by 
summing the failure rates of the components that make up the block. The RBD for the PPU is 
built using ReliaSoft software [84] and is shown in Figure 3-16.
The RBD for the PPU is too complex for analytical computation of reliability. Hence, Monte 
Carlo simulation is used. In the RBD there is a start block and an end block, with 
intermediate nodes in between. There are different paths that can be taken from the start 
block to the first node in the RBD (Node A) and altogether 20 possible paths from Node A to 
Node B and to the end block. Each possible path from the start block to the end block represents 
a successful scenario path in the PPU operation. The RBD contains mirrored blocks - blocks 
with a tiny square at the top left corner. These blocks are used to place more than one 
instance of the same component in different locations without double counting during 
reliability computation.
From the start block to Node A, there are several possible parallel paths to increase the probability 
of reaching Node A. For example, the provision of a dual-redundant power module 
creates two main parallel paths from the start block to Node A. Each power module supplies 
only half of the processing clusters. 
Figure 3-16: The PPU Reliability Block Diagram plotted using Reliasoft software
A proposed interconnection network that can be dynamically configured to interconnect 
entities in one, two or all four processing clusters also improves reliability in 
several ways. Figure 3-4 illustrates the various network configuration options. Owing to this 
interconnectivity, cross-strapping of the CAN interfaces and LVDS interfaces is supported, as 
observed in the RBD cross links from CAN1 to LVDS2 and from CAN2 to LVDS1. This 
interconnection network also allows the triple voting serial flash memory in each network 
element to be accessible throughout the entire network. Hence any one of the four flash banks 
can be used. In addition, as long as two out of three serial flash chips survive in any of the 
flash banks, the processing nodes are able to boot up by reading the 
operating system code from that set of serial flash chips in a triple-voting read fashion. The 
cost of this flexibility is the considerable design effort required to implement 
the configurable network interconnection facility.
Next, let us study the 20 parallel paths from Node A to Node B. Each path depicts the 
series conditions required for a particular StrongARM processor to boot up healthily. For 
example, a StrongARM connected to Slave FPGA A is functional only if its power module 
(Pwr1) is healthy; hence the presence of the Pwr1 mirrored block in its path. It also 
requires both Master FPGA A and Slave FPGA A to be healthy, because among the 
interconnection network configurations that the architecture supports (see Figure 3-4) 
there is none in which a Slave FPGA does not route through its corresponding Master FPGA. 
In contrast, a StrongARM that is connected to Master FPGA B requires only Master FPGA B 
to be healthy for the network to be operational. 
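The path conditions above can be illustrated with a small Monte Carlo sketch. It is not the ReliaSoft model: the block reliabilities are placeholder values, only the power module, FPGA and processor blocks are modelled, and the assignment of the FPGA A pair to power module 1 and the FPGA B pair to power module 2, with five StrongARMs per FPGA, is an assumption made for illustration. Each shared block is sampled once per trial, which corresponds to the way mirrored blocks avoid double counting.

```python
import random

# Hypothetical block reliabilities (placeholders, not values from the thesis).
BLOCKS = {"pwr1": 0.99, "pwr2": 0.99,
          "masterA": 0.98, "slaveA": 0.98,
          "masterB": 0.98, "slaveB": 0.98,
          "sa": 0.97}

# Series conditions for a StrongARM attached to each FPGA (assumed mapping).
PATHS = {"masterA": ["pwr1", "masterA"],
         "slaveA": ["pwr1", "masterA", "slaveA"],
         "masterB": ["pwr2", "masterB"],
         "slaveB": ["pwr2", "masterB", "slaveB"]}
NODES_PER_FPGA = 5  # 20 processing nodes spread over four clusters


def healthy_nodes(rng, blocks):
    """Run one trial and return the number of StrongARMs that can boot."""
    # Shared blocks are sampled once per trial, so a block that appears in
    # several paths (a mirrored block) is never counted twice.
    up = {name: rng.random() < r for name, r in blocks.items() if name != "sa"}
    count = 0
    for path in PATHS.values():
        if all(up[b] for b in path):
            count += sum(rng.random() < blocks["sa"] for _ in range(NODES_PER_FPGA))
    return count


def prob_at_least(n, trials=20000, seed=1, blocks=BLOCKS):
    """Estimate the probability that at least n StrongARMs are healthy."""
    rng = random.Random(seed)
    hits = sum(healthy_nodes(rng, blocks) >= n for _ in range(trials))
    return hits / trials
```

Calling prob_at_least(16) and prob_at_least(1) with the same seed shows how the estimated probability falls as more healthy processors are demanded, which is the trend reported for the real model.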
An architecture that supports more parallel paths would improve reliability. However, 
each additional parallel path is an architectural feature that adds to design complexity 
and cost. Hence careful design decisions and trade-offs are made to balance the costs and 
benefits. 
Finally, by plotting the probability of N successful paths to the node, the probability of the 
PPU operating with N healthy StrongARM processors can be obtained (as shown in Table 
3-6). To account for the fact that redundant units, while switched off, are subjected to 
less stress than active units, the concept of a dormancy factor is introduced. The passive 
failure rate is assumed to be one-tenth of the active failure rate, implying a dormancy 
factor of 0.1. This is the factor commonly used by spacecraft manufacturers for 
non-operating units [85]. The duty cycle of the payload is taken to be 10%, which is the 
percentage of time in orbit during which the payload is in operation.
The following terms are used.
Active failure rate (AFR) - Failure rate when the subsystem or module is powered on 
100% of the time throughout the orbit.
Dormant failure rate (DFR) - Failure rate when the subsystem is inactive or powered off, 
as is the case when the module is used only in certain operating scenarios.
Effective failure rate (EFR) - Failure rate that depends on the duty cycle and is 
calculated as follows:
EFR = Duty Cycle * AFR + (1 - Duty Cycle) * DFR
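As a worked check of the formula, the sketch below computes the EFR from the AFR, the duty cycle and the dormancy factor of 0.1. The reliability function is an inference rather than something stated in this section: it assumes an exponential model, failure rates expressed per 10^6 hours and a three-year mission, chosen because they reproduce the reliability column of Table 3-6. The function names are illustrative.

```python
import math

DORMANCY_FACTOR = 0.1  # DFR = 0.1 * AFR, as stated in the text


def effective_failure_rate(afr, duty_cycle, dormancy=DORMANCY_FACTOR):
    """EFR = DC * AFR + (1 - DC) * DFR, with DFR = dormancy * AFR."""
    dfr = dormancy * afr
    return duty_cycle * afr + (1.0 - duty_cycle) * dfr


def reliability(efr, hours=3 * 365 * 24, scale=1e-6):
    """Exponential model R = exp(-EFR * scale * hours).

    Assumption: rates are per 10^6 hours and the mission lasts three years.
    Neither value is stated in this section; they are inferred from the
    tabulated figures.
    """
    return math.exp(-efr * scale * hours)


efr_1 = effective_failure_rate(0.9517, 0.10)  # AFR of the N = 1 row
print(round(efr_1, 4), round(reliability(efr_1), 4))
```

For the N = 1 row this gives approximately 0.1808 and 0.9953, in agreement with Table 3-6. At a duty cycle of 100% the EFR reduces to the AFR.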
The current reliability figures are computed by the ReliaSoft software using Monte Carlo 
simulation on the RBD, as shown in Figure 3-16.
Table 3-6: Probability of N number of SA-1110s in Operation (duty cycle of 10%)
N = number of healthy SA-1110s; AFR = active failure rate; DFR = dormant failure rate; 
DC = duty cycle (%); EFR = effective failure rate; Reliability = current design.
N     AFR       DFR       DC    EFR       Reliability
1     0.9517    0.0952    10    0.1808    0.9953
2     0.9750    0.0975    10    0.1853    0.9951
3     0.9790    0.0979    10    0.1860    0.9951
4     0.9907    0.0991    10    0.1882    0.9951
5     1.0260    0.1026    10    0.1949    0.9949
6     1.1980    0.1198    10    0.2276    0.9940
7     1.4030    0.1403    10    0.2666    0.9930
8     2.3990    0.2399    10    0.4558    0.9881
9     4.6487    0.4649    10    0.8833    0.9771
10    8.0418    0.8042    10    1.5279    0.9606
11    10.139    1.0139    10    1.9264    0.9506
12    10.283    1.0283    10    1.9538    0.9500
13    10.674    1.0674    10    2.0281    0.9481
14    11.330    1.1330    10    2.1527    0.9450
15    12.979    1.2979    10    2.4660    0.9372
16    16.994    1.6994    10    3.2289    0.9186
17    25.540    2.5540    10    4.8526    0.8803
18    42.198    4.2198    10    8.0176    0.8100
19    72.494    7.2494    10    13.7739   0.6963
20    128.80    12.8800   10    24.4720   0.5256
The advantage of using system modelling tools such as the RBD is that design decisions are 
backed by quantitative system reliability figures. Three operation states are defined, each 
representing a different computation performance level that can be derived from the 
payload, with access to the spacecraft command bus and payload data bus. The spacecraft 
command bus is a Controller Area Network (CAN) bus through which the payload receives its 
CAN telecommands from the satellite ground station and sends its CAN telemetry packets back 
to the ground station. The payload data bus is a high speed LVDS link to the onboard 
Solid State Recorder, a shared mass memory storage device with links to the onboard 
optical camera output and uplink data packets.
Survival State - availability of at least one processing node with access to the 
spacecraft command bus and payload data bus.
Nominal Performance State - availability of at least 4 processing nodes, all with 
access to the spacecraft command bus and payload data bus.
Peak Performance State - availability of at least 16 processing nodes, all with access 
to the spacecraft command bus and payload data bus.
From Table 3-6, the Survival State probability (at least one working SA-1110) is 
0.9953, the Nominal Performance State probability (at least 4 working SA-1110s) is 0.9951, 
and the Peak Performance State probability (at least 16 working SA-1110s) is 0.9186. The 
high Survival and Nominal Performance State probabilities are attributed mainly to the 
ample redundant paths in the hardware, and also to the low duty cycle of the PPU. As a fair 
means of comparison with the radiation-hardened ERC32 Onboard Computer on XSat, the 
reliability figures are recomputed for an assumed duty cycle of 100%. The resulting lower 
reliability figures are shown in Table 3-7.
The results show that the reliability figures remain very promising at a duty cycle of 
100%. The Survival State probability of the PPU is 0.9753. This is comparable with the 
reliability figure of the XSat Onboard Data Handling (OBDH) subsystem, which has a value 
of 0.976. The XSat OBDH consists of a dual-redundant ERC32 onboard computer and its I/O 
interface modules. As the reliability results for the PPU payload satisfy XSat mission 
needs, the hardware architecture of the PPU will be designed as planned.
    
Table 3-7: Probability of N number of SA-1110s in Operation (duty cycle of 100%)
N = number of healthy SA-1110s; AFR = active failure rate; DFR = dormant failure rate; 
DC = duty cycle (%); EFR = effective failure rate; Reliability = current design.
N     AFR       DFR       DC     EFR        Reliability
1     0.9517    0.0952    100    0.9517     0.9753
2     0.9750    0.0975    100    0.9750     0.9747
3     0.9790    0.0979    100    0.9790     0.9746
4     0.9907    0.0991    100    0.9907     0.9743
5     1.0260    0.1026    100    1.0260     0.9734
6     1.1980    0.1198    100    1.1980     0.9690
7     1.4030    0.1403    100    1.4030     0.9638
8     2.3990    0.2399    100    2.3990     0.9389
9     4.6487    0.4649    100    4.6487     0.8850
10    8.0418    0.8042    100    8.0418     0.8095
11    10.139    1.0139    100    10.1390    0.7661
12    10.283    1.0283    100    10.2830    0.7632
13    10.674    1.0674    100    10.6740    0.7554
14    11.330    1.1330    100    11.3300    0.7425
15    12.979    1.2979    100    12.9790    0.7110
16    16.994    1.6994    100    16.9940    0.6398
17    25.540    2.5540    100    25.5400    0.5111
18    42.198    4.2198    100    42.1980    0.3299
19    72.494    7.2494    100    72.4940    0.1488
20    128.80    12.8800   100    128.8000   0.0339
3.8 Conclusion 
In answering the research question of finding a feasible design approach that can achieve 
computational performance and reliability in a cost-effective manner, the PPU design 
approach is proposed and presented. This chapter describes a design approach that 
enables the rapid transfer of COTS processing technology for use in space to achieve 
performance without compromising reliability. 
The thesis has investigated a unique approach of using widespread parallelism and 
redundancy at all levels to satisfy performance, reliability and resource requirements. This 
is supported by a detailed part selection process that includes practical considerations 
such as thermal, power, mass, volumetric and component placement constraints.
For a design to be feasible for space use, radiation is an important consideration for the 
parts selected. For some components, radiation mitigation measures need to be put in 
place. Critical components like the FPGA have to be screened for susceptibility to 
radiation. The TID rating of the AX1000 FPGA, 200krad, is much higher than the minimum 
requirement of 10rad. However, the AX1000 is expected to experience some single event 
upsets in the internal memory and in the flip-flops. These upsets can be mitigated using 
the techniques mentioned in Section 3.5.5. For other components, an Al shielding thickness 
of 3.52mm is sufficient to keep the dosage level within the tolerable limit of the COTS 
components.
In the design approach, a parts reliability assessment of the COTS components is carried 
out so that system reliability can subsequently be measured quantitatively. Using such a 
system reliability assessment, different forms of parallel processing and parallel 
connection architectures can be compared. The impact of various design decisions can be 
assessed, and through iterative trade-offs and re-assessments, the most cost-effective 
architectural approach that can meet the design goals is derived [86]. 
4 PPU Fault Tolerance and Fault Management 
4.1 Introduction 
This chapter focuses on the second research question, which is how to achieve an efficient 
implementation of the reconfigurable network and fault management schemes in FPGA. 
This is necessary since the fault avoidance approach of using radiation-hardened parts is 
not adopted for the PPU. The use of commercial-off-the-shelf (COTS) or non-radiation-
hardened components for a space computing payload requires knowledge of the possible 
faults induced in electronics in a space radiation environment. Mitigation techniques have 
to be employed to mask their effects; examples of such mitigation techniques can be found 
in the literature [21, 87, 88, 89].
There are different principles for dealing with fault tolerance. One is the static approach 
of using passive hardware redundancy for fault masking. This approach uses triple modular 
redundancy, either by using three of the same parts or by executing the same software code 
three times on the same data set, followed by a voting process among the results. The 
other is the use of active hardware redundancy, which is a dynamic approach of detecting, 
containing and dealing with faults. The first approach is costly in terms of resources but 
does not allow faults to affect operation. The second approach tolerates faults but recovers 
from them through hardware reconfiguration; it is a flexible way of handling a wide range 
of faults at a moderate resource cost [90]. The PPU payload adopts a hybrid approach, where 
both static and dynamic redundancy are employed to reach reliability targets. 
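The voting step of the static approach can be sketched as a bitwise two-out-of-three majority function. This is a generic illustration of triple modular redundancy rather than the PPU implementation; the function names are hypothetical.

```python
def majority_vote(a, b, c):
    """Bitwise 2-out-of-3 majority of three integer words."""
    return (a & b) | (a & c) | (b & c)


def tmr_read(copies):
    """Vote over three redundant copies of a byte sequence.

    Returns the voted data and the indices of the bytes where the
    three copies did not all agree.
    """
    x, y, z = copies
    voted = bytes(majority_vote(p, q, r) for p, q, r in zip(x, y, z))
    mismatches = [i for i, (p, q, r) in enumerate(zip(x, y, z))
                  if not (p == q == r)]
    return voted, mismatches
```

A single corrupted copy is masked by the vote, while the mismatch list records that a fault occurred so that it can be reported or repaired.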
The PPU reliability modelling in Section 3.7 predicts the hardware survivability of the 
payload under different operation scenarios. The PPU achieves a high level of system 
reliability because of ample hardware redundancies as proposed from the reliability 
modelling. There are redundancies in the power module, processing nodes, network 
elements, volatile memory and non-volatile memory components and external interfaces to 
the command and data bus. The strength of the PPU design lies in the ability to be 
reconfigured in the event of component failures [91]. Some important characteristics of this 
reconfiguration are as follows.
1. The PPU can be reconfigured in the case of faults in the same way that it can be 
reconfigured for varying levels of power consumption or computation power. Hence 
reconfiguration is not only a means to handle faults, but also a means to trade off 
reliability, computational capability and power consumption.
2. The PPU employs dynamic reconfiguration techniques [92] rather than static 
reconfiguration. Dynamic reconfiguration can be done at any time in orbit, whereas 
static reconfiguration requires a ground pass and usually means re-programming the 
FPGA network element to remove faulty elements from the network. Hence dynamic 
reconfiguration enables runtime efficiency, although its implementation is more 
involved.
3. The PPU's architecture and operational concepts allow the internal 
reconfiguration to be hidden from external interface entities. This requires less 
monitoring and intervention from the main onboard computer on XSat.
4.2 PPU Hardware Redundancies and Reconfiguration 
In normal operations, the PPU requires a minimum number of processing nodes to work in 
parallel (depending on the computational demands of the application), as well as at least 
one working interface to both the Controller Area Network (CAN) and the onboard Solid 
State Recorder (SSR). The CAN bus is XSat's onboard commanding network. It is 
essential for the PPU operation because the PPU receives all its commands from the XSat 
onboard computer (OBC) through the CAN bus. The PPU health status is also sent to the OBC 
through the CAN bus, and transmitted as telemetry data to the ground station. The PPU's 
operational success depends heavily on its high speed communication link to the SSR. This 
is the link through which the PPU receives image data from the IRIS optical camera, as 
well as the PPU boot-loader and application codes from the ground.
Hence, one way of increasing the fault tolerance of the PPU is to provide sufficient 
redundancy in the elements that are identified as crucial for the PPU operation. The next 
step is to have an operational framework that performs runtime selection of healthy 
processing elements, network elements and interfacing entities to perform a task 
collectively. In summary, there are 20 processing nodes in the PPU, divided equally into 
four processing clusters. There are also four sets of triple-redundant serial flash chips 
and two sets of CAN and SSR interfaces.
For the purpose of redundancy, the four processing clusters in the PPU are divided into two 
symmetrical halves. Each half is powered by a separate DC-DC converter and switch-mode 
regulators. Hence failure of any one power module would affect only half of the total 
number of processing clusters. Each half has access to one CAN and one SSR interface. The 
communication network routes data through all entities in the PPU. Hence command or 
data packets received from the CAN or SSR interfaces can be transmitted to any entity on 
the PPU as long as the communication network services provided by the FPGA network 
elements are not broken. Similarly, code and static data stored in any non-volatile flash 
chip are accessible to the rest of the entities on the PPU.
To support the above hardware architecture with redundancies, the FPGA network 
elements detect faults and reconfigure the PPU in the event of faults. Different 
reconfiguration schemes are designed to respond to various hardware failures. Failures can 
occur in the processing nodes, processor booting process, communication network, flash 
chip or even in the FPGA network elements. There could also be failure in the 
communication path to interface entities or failure in the interface entities. The high 
reliability figure obtained in the reliability modelling in Section 3.7 assumes the presence of 
an underlying fault detection and handling framework, as well as a strong fault tolerant 
inter-communication network between entities.  These architectural features make those 
reliability estimates a reality.
In short, the hardware fault tolerance strategy of the PPU is to recover from faults in the 
hardware entities to ensure minimal downtime of the computation resources. However, 
during the occurrence of faults, the hardware reconfiguration process may result in loss of 
data and context information. 
The PPU incorporates several fault tolerance schemes and concepts for its processing 
clusters, so as to meet reliability requirements. The PPU fault tolerance mechanisms 
comprehensively deal with faults and failures in the following areas: 
Communication network  
Processor Nodes 
FPGA Network element  
Flash chip failure or data corruption 
External Communication Interfaces (e.g. SSR, CAN)
The various operational failure handling methods will be described in the following 
sections.  The concept of how virtual addressing helps to simplify fault handling procedures 
is described.
4.3 Communication Network Fault Handling 
4.3.1 PPU Communication Network Topology 
As mentioned in Section 2.3.2, the communication structure used for the parallel processor 
is very dependent on the characteristics of the application supported.  One important and 
most basic class of image processing applications is image compression.  For this class of 
application, the processing is very much data driven and requires minimal inter-processor 
communications.  Each processor runs the same execution code and works on its own 
allocated data partition.
In the PPU, a dedicated on-chip network is used for the distribution of execution codes and 
image data from the Solid State Recorder (SSR) to the various processors for processing and 
for streaming back the processed data from the processors to the SSR.  In addition, the 
network is also used for the broadcasting of the system states for global synchronization.
In the design of the communication network for the PPU, two important criteria have to be 
fulfilled.
1. Fault Tolerance - The communication network is required to handle faults as well 
as a dynamically changing configuration caused by faults in the processors.  To avoid a 
single point of failure in the on-chip communication network FPGA, the communication 
network has to support inter-FPGA clustering that is capable of reconfiguring the network 
route to avoid the failed FPGA.
2. Scalability - The communication network has to be capable of scaling and accommodating 
heterogeneous components on its network, interconnecting them to form the complete 
system.  Unlike a conventional network consisting of just processors, the communication 
network for the PPU has to interface with different types of network entities (e.g., 
processors, memories and interface chips). The communication network is an important 
component of the fault tolerance scheme, and is needed to fulfil the reliability goal of 
using parallel processing to overcome faults.
A ring topology is chosen for the communication network for several reasons.  Firstly, it is 
simple to implement and requires relatively few FPGA resources.  Secondly, in a typical 
data intensive processing application, the communication network is used to stream 
application data in at the beginning of an operation scenario, and to stream the processed 
data out at the end of the scenario. A ring network is suitable for serial streaming of 
the image data to all the nodes. Thirdly, the simplicity of the ring topology makes it 
easier to incorporate fault tolerance-related logic, such as virtual addressing, detection 
of missing messages and recovery of network transmission.  Fourthly, it is simpler to 
implement variable length messaging on the ring network and to support different 
interfaces with different data rates and volumes.
4.3.2 Introduction to the Variable Time Global-slotted Backplane (VTGB) Bus
The backbone of the PPU architecture is a communication network that interconnects all 
entities. This network is message-based and has a ring topology. It routes through all four 
processing clusters, and is internally referred to as the Variable Time Global-slotted 
Backplane (VTGB) bus. The original TGB is a simple fixed ring network proposed by Dr Ian 
McLoughlin [93].  The TGB is a fixed 32-bit frame slot network, where destination 
addressing requires the sender to have knowledge of the position of the destination node in 
the ring network.
VTGB, on the other hand, is a substantially modified version of the TGB. As the name 
suggests, VTGB is a variable length frame slot network that has a 16-bit bus width.  It 
contains useful command directives (see Table 4-1) and several fault detection and 
recovery capabilities (see Section 4.3.4), which make it significantly different from the 
generic TGB. The added features enhance the reliability of the communication network by 
allowing it to recover from bit errors that could occur due to radiation in space.
The VTGB bus supports both the point-to-point communication mode as well as the 
broadcast communication mode. For the point-to-point communication mode, sending 
entities and receiving entities are identified by the source and destination addresses in the 
VTGB message. A message sent by the sending entity circulates round the network till it 
reaches the specific destination entity where it is removed from the network. In the case of 
broadcast communication, the message circulates one round through the entire network till 
it reaches the sending entity again. The broadcast message can be grabbed by any bus node 
identified as having the same broadcast destination address as that specified in the VTGB 
message.
Figure 4-1 shows the entities connected onto the VTGB bus. The network comprises all the 
PNs, SSR, CAN and Flash interface entities, each connected to the VTGB interface logic. This 
network supports interfaces to entities of different communication bandwidth, data volume 
and transmission characteristics (continuous or in bursts). This network is intended for use 
as a command and data distribution network, as well as for transfer of data to and from the 
flash banks. It is not intended for intensive inter-processor communication for executing 
parallel algorithms. A second network is designed for this purpose, in the mesh array 
topology (see Sections 5.2 and 5.3). 
The VTGB packet slot can be used to send packets as small as 8 bytes and packets as long as 
2^24 half-words (each half-word is 16 bits long). The frame slot is variable in length to 
cater for entities with different data volumes. For example, the CAN interface entity 
typically communicates with data packets of 16 bytes, while the SSR or StrongARM interface 
entities can communicate with data packets in the Mbyte range. 
 
Figure 4-1: Variable Time Global-slotted Backplane (VTGB) Bus spanning across all 
four processing clusters (figure annotation: VTGB network loops through four FPGAs)
4.3.3 VTGB Framing Structure 
The VTGB framing structure is shown in Figure 4-2. A VTGB message contains an 8-bit 
destination address and an 8-bit source address. For both the source and destination 
addresses, the most significant bit differentiates between a physical and a logical 
(virtual) address. A physical address is a hard-coded address allocated to each 
entity when the FPGA is first programmed, whereas a logical (virtual) address is a 
dynamically allocated address determined at runtime by software routines. Every node can 
have both a physical and a virtual node address. The virtual address can also be a 
virtual broadcast group address. The second most significant bit of the destination 
address differentiates between the two communication modes (point-to-point and broadcast).
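The address fields described above can be decoded as in the following sketch. The text states which bits carry the flags but not their polarity, so the choice that a set bit means "virtual" or "broadcast" is an assumption, as are the class name and the treatment of the lower six bits as a node identifier.

```python
from dataclasses import dataclass

VIRTUAL_FLAG = 0x80    # assumed: MSB set means a logical (virtual) address
BROADCAST_FLAG = 0x40  # assumed: second MSB set means broadcast mode


@dataclass(frozen=True)
class VtgbAddress:
    """Hypothetical view of one 8-bit VTGB address field."""
    raw: int

    @property
    def is_virtual(self):
        return bool(self.raw & VIRTUAL_FLAG)

    @property
    def is_broadcast(self):
        # Only meaningful for destination addresses.
        return bool(self.raw & BROADCAST_FLAG)

    @property
    def node_id(self):
        # Assumed: the remaining six bits identify the node or group.
        return self.raw & 0x3F
```

For example, under these assumptions VtgbAddress(0xC5) is a virtual broadcast group address with identifier 5, whereas VtgbAddress(0x05) is the physical address of node 5.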
Figure 4-2: VTGB Short (a) and Long (b) Message packet formats
The description of the various fields is provided below:
Destination - Message destination address. 
Source - Message source address.  
Command - The most significant bit (MSB) determines if the message is of type long 
or short, while the lower seven bits of the command field differentiate up to 2^7 (128) 
command types.
Short Message Data Field - Data field for the short message type.
Long Message Counter Field - Specifies the number of half-words in the data field for 
a long message. 
Checksum - 16-bit cyclic redundancy checksum over the first 6 bytes of the 
packet fields. 
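A short message can be assembled and checked as sketched below. Several details are assumptions, since Figure 4-2 and the CRC polynomial are not reproduced here: the byte order, a 3-byte short data field (so that six bytes plus a 16-bit checksum give the 8-byte minimum packet), a set MSB meaning "long", and the CRC-16-CCITT polynomial provided by Python's binascii.crc_hqx.

```python
import binascii
import struct

LONG_FLAG = 0x80  # assumed: MSB of the command field set means long message


def build_short_packet(dest, src, cmd_type, data=b"\x00\x00\x00"):
    """Pack an 8-byte short message: 6 header/data bytes plus a 16-bit CRC."""
    if not 0 <= cmd_type < 128:
        raise ValueError("command type must fit in seven bits")
    if len(data) != 3:
        raise ValueError("short message data is assumed to be 3 bytes")
    # LONG_FLAG is left clear because this is a short message.
    body = bytes([dest, src, cmd_type & 0x7F]) + data
    crc = binascii.crc_hqx(body, 0xFFFF)  # CRC-16-CCITT, an assumed polynomial
    return body + struct.pack(">H", crc)


def check_packet(packet):
    """Return True if the trailing CRC matches the first 6 bytes."""
    body = packet[:6]
    (crc,) = struct.unpack(">H", packet[6:8])
    return binascii.crc_hqx(body, 0xFFFF) == crc
```

A receiver that recomputes the checksum in this way detects any single-bit upset in the first six bytes, which is the kind of radiation-induced error the checksum is intended to catch.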
4.3.3.1 VTGB Protocol Machine 
The VTGB is a synchronous ring network where at every clock cycle, a VTGB node sends a 3-
bit output command directive and 16-bit data to the clockwise node in the ring and receives 
a 3-bit input command directive and 16-bit data from the anti-clockwise node in the ring. 
The 3-bit command directive differentiates up to 8 command directives, out of which 6 are 
used. These commands specify if the half-word message received represents an empty slot, 
the start of a VTGB message packet, a continuous half word in the VTGB message packet or 
the last half word in the VTGB message packet. It can also represent a transmission abort 
command. 
Flow control logic is also incorporated in the VTGB to support entities of different 
communication bandwidth. A VTGB node receives a busy status input signal from the 
clockwise VTGB node and sends a busy status output signal to the anti-clockwise node. 
Logic high on its busy status input line indicates that the clockwise node is too busy to 
process input data, and hence an idle message instead of valid data is sent to the 
clockwise node. A VTGB node indicates whether it is busy by driving its busy status 
output line high (busy) or low (not busy). 
Table 4-1: VTGB Command Directives
Directive   Logic on        Directive Description
Name        Command lines
IDLE        000             Indicates that the 16-bit data line contains idle data.
ABORT       100             Indicates an abort request to end the current packet transmission.
FIRST       010             Indicates that the 16-bit data line contains the first half-word of the packet.
CONT        001             Indicates that the 16-bit data line contains continuous half-words of the packet.
LAST        011             Indicates that the 16-bit data line contains the last half-word of the packet.
EMP_SLOT    101             Indicates an empty packet slot that can be seized by a node wanting to transmit a new packet.
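The directive encodings in Table 4-1 map directly onto an enumeration. The sketch below uses the 3-bit codes from the table; the class and function names are illustrative, and treating the two unused codes as an error is an assumption.

```python
from enum import IntEnum


class Directive(IntEnum):
    """3-bit VTGB command directives, with codes taken from Table 4-1."""
    IDLE = 0b000
    CONT = 0b001
    FIRST = 0b010
    LAST = 0b011
    ABORT = 0b100
    EMP_SLOT = 0b101


def decode_directive(bits):
    """Map a 3-bit value to a directive; the two unused codes raise ValueError."""
    return Directive(bits & 0b111)
```

Rejecting the unused codes 110 and 111 gives a simple way to notice a corrupted command line.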
The VTGB bus operation begins when an empty frame slot starts to circulate in the ring 
network. This empty frame slot is hardcoded to be released by one of the entities.  In the 
idle state, this empty frame slot continues circulating round the ring network and can be 
seized by any entity that has something to transmit. The entity that is transmitting is termed 
the sending entity and the entity that is receiving is termed the receiving entity.
In VTGB command terms, a frame slot is inserted into the VTGB network by sending an 
EMP_SLOT input command directive to the designated starting node. This is done only once 
during the start of the network operation. A node seizes the slot by sending the FIRST 
output command directive, together with the 16-bit first half-word of a packet. It sends 
continuous half-words of the packet by sending the CONT output command directive, 
together with the continuous half words of the packet on its 16-bit data lines. The last half-
word of the packet is sent on the 16-bit data line together with the LAST output command 
directive. Finally the slot is released by sending the EMP_SLOT output command directive.
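The sequence of output directives for one packet can be sketched as a generator. This is a simplified illustration that ignores flow control and aborts; the function name is hypothetical, and the data value that accompanies EMP_SLOT is not specified in the text, so zero is assumed.

```python
def frame_packet(half_words):
    """Yield (directive, half_word) pairs for one packet, then release the slot."""
    if len(half_words) < 2:
        raise ValueError("a packet needs at least a first and a last half-word")
    yield ("FIRST", half_words[0])      # seize the slot
    for hw in half_words[1:-1]:
        yield ("CONT", hw)              # continuous half-words
    yield ("LAST", half_words[-1])      # final half-word of the packet
    yield ("EMP_SLOT", 0x0000)          # release the slot (data value assumed)
```

For the minimum 8-byte packet of four half-words, the generator produces FIRST, CONT, CONT, LAST and then EMP_SLOT.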
A VTGB node begins in the idle state and can transit to the passing, sending or grabbing 
state according to the events that occur. From the event that occurs and the current state 
of the node, the specific action function to be executed and the next state can be 
determined from the VTGB protocol machine described in Table 4-2. For example, in the 
passing state, when event E7 occurs, the node executes action function A3 and returns to 
the idle state. The VTGB protocol machine described in Table 4-2 has been simplified for 
ease of explanation. The events and action functions mentioned in the state machine are 
described in Table 4-3.
Table 4-2: VTGB Protocol Machine 
Events   Idle State            Passing State   Sending State   Grabbing State
E0       A1/Idle               A3/Passing      A4/Sending      A2/Grabbing
E1       A1/Idle or Sending    A3*/Idle        A4*/Idle        A2*/Idle
E2       A1/Grabbing           A3*/Idle        A4*/Idle        A2*/Idle
E3       A1/Passing            A3*/Idle        A4*/Idle        A2*/Idle
E4       A1*/Idle              A3/Passing      A4/Sending      A2/Grabbing
E5       A1*/Idle              A3/Idle         A4/Sending      A2/Idle
E6       NA                    NA              A4/Idle         NA
E7       A1*/Idle              A3/Idle         A4/Idle         A2/Idle
E8       NA                    NA              A4/Sending      NA
E9       NA                    NA              A4/Idle         NA
E10      NA                    NA              A4/Idle         NA
* refers to non-applicable (NA) paths. An error will be flagged if these paths are reached.
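The protocol machine in Table 4-2 can be expressed as a lookup table. The sketch below transcribes the table cells; the encoding of starred cells as an error flag, the has_data argument used to resolve the "Idle or Sending" cell, and the exception for NA cells are illustrative choices.

```python
STATES = ("Idle", "Passing", "Sending", "Grabbing")
ACTION = {"Idle": "A1", "Passing": "A3", "Sending": "A4", "Grabbing": "A2"}


def _row(idle, passing, sending, grabbing):
    return dict(zip(STATES, (idle, passing, sending, grabbing)))


# Each cell is the next state; a trailing "*" marks an error path and
# None marks a cell listed as NA in Table 4-2.
TABLE = {
    "E0": _row("Idle", "Passing", "Sending", "Grabbing"),
    "E1": _row("Idle|Sending", "Idle*", "Idle*", "Idle*"),
    "E2": _row("Grabbing", "Idle*", "Idle*", "Idle*"),
    "E3": _row("Passing", "Idle*", "Idle*", "Idle*"),
    "E4": _row("Idle*", "Passing", "Sending", "Grabbing"),
    "E5": _row("Idle*", "Idle", "Sending", "Idle"),
    "E6": _row(None, None, "Idle", None),
    "E7": _row("Idle*", "Idle", "Idle", "Idle"),
    "E8": _row(None, None, "Sending", None),
    "E9": _row(None, None, "Idle", None),
    "E10": _row(None, None, "Idle", None),
}


def step(state, event, has_data=False):
    """Return (action, next_state, error_flag) for one table lookup."""
    cell = TABLE[event][state]
    if cell is None:
        raise ValueError(f"{event} is not applicable in state {state}")
    error = cell.endswith("*")
    nxt = cell.rstrip("*")
    if nxt == "Idle|Sending":  # E1 in idle: seize the slot only if data is pending
        nxt = "Sending" if has_data else "Idle"
    return ACTION[state], nxt, error
```

For instance, step("Passing", "E7") returns ("A3", "Idle", False), which matches the example given in the text.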
Table 4-3: VTGB Event/Action Description  
Event Number   Description of Event
E0             Receive idle message
E1             Receive empty slot
E2             Receive start of packet addressed to node
E3             Receive start of packet addressed to other nodes
E4             Receive continuous half-words of a packet
E5             Receive last half-word of a packet
E6             Sending state timeout
E7             Receive abort request
E8             Send continuous half-words of a packet
E9             Send last half-word of a packet
E10            Send abort request

Action Function   Description of Action Function
A1                Action function in the idle state
A2                Action function in the grabbing state
A3                Action function in the passing state
A4                Action function in the sending state
Description of Action Function in idle state (A1)
If a VTGB node in the idle state receives an EMP_SLOT input command directive from the 
previous VTGB node, the VTGB node seizes the slot if its associated entity has a valid data 
packet to send. It seizes the slot by sending a FIRST output command directive to the next 
VTGB node together with the first half-word of the data packet on its 16-bit output data 
lines and transits to the sending state. If there is no packet to send, it releases the slot by 
sending the EMP_SLOT output command directive to the next VTGB node and remains in 
the idle state.
Upon receiving a FIRST input command directive, it checks the contents of its 16-bit data 
line, containing the first half-word of the received data packet for the destination and source 
address.  If the source address is its own address, the packet is discarded (since the 
destination node has not grabbed the message). Otherwise if the destination address is its 
own address, it grabs the first half-word and transits to grabbing state. Else, it passes the 
first half-word to the next node together with the FIRST command output directive and 
transits to passing state.  
However, if the next VTGB node is busy (busy status input = 1), the data is buffered and 
sent at a later time. In this situation, the VTGB node sends an IDLE output command 
directive to the next node. It also sets logic high on its busy status output line, to 
inform the previous node to stop sending. In the case of a received broadcast message, the 
contents are also passed to the next VTGB node.
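The decision an idle node makes on receiving a FIRST directive can be sketched as follows. The placement of the destination address in the upper byte and the source address in the lower byte of the first half-word is an assumption, and broadcast handling and flow control are omitted for brevity; the function name is hypothetical.

```python
def on_first(own_addr, first_half_word):
    """Decide what an idle node does with the first half-word of a packet.

    Assumed layout: destination in the upper byte, source in the lower byte.
    Returns one of "discard", "grab" or "pass".
    """
    dest = (first_half_word >> 8) & 0xFF
    src = first_half_word & 0xFF
    if src == own_addr:
        return "discard"  # own packet came back without being grabbed
    if dest == own_addr:
        return "grab"     # node then transits to the grabbing state
    return "pass"         # node then transits to the passing state
```

The source check comes first, as in the text, so that a packet returning to its sender is removed from the ring even when it was addressed to the sender itself.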
 
Description of Action Function in passing state (A3)
If a VTGB node in the passing state receives a CONT input command directive from the 
previous VTGB node and the next VTGB node is not busy (busy status input = 0), it passes 
the contents of the 16-bit data lines to the next VTGB node together with the CONT output 
command directive. If a VTGB node receives a LAST input command directive from the 
previous VTGB node and the next VTGB node is not busy (busy status input = 0), it passes 
the contents of the 16-bit data lines to the next VTGB node together with the LAST output 
command directive. In the latter case, it transits back to the idle state after sending 
the last half-word of the packet.
However, if the next VTGB node is busy (busy status input = 1), the half-word of the 
packet is buffered. In this situation, the VTGB node sends an IDLE output command 
directive to the next node. It also sets logic high on its busy status output line, to 
inform the previous node to stop sending.
In this state, if it receives the ABORT input command directive, it sends an ABORT output 
command directive to the next node. It then exits the passing state and transits back to 
the idle state. 
Description of Action Function in sending state (A4)
In the sending state, the VTGB node transmits a data half-word to the next VTGB node if 
that node is not busy (busy status input = 0). If the data half-word is a continuous 
half-word of a data packet, the CONT output command directive is sent together with the 
data. If the data half-word is the last half-word of a packet, the LAST output command 
directive is sent together with the data. 
However, if the next node is busy (busy status input = 1), it buffers the half-word and 
sends an IDLE output command directive to the next node. While waiting for transmit data 
from its associated entity, if the waiting time between half-words exceeds a certain time 
interval, an abort timeout occurs. In this case, the sending node sends an ABORT output 
command directive to the next node to terminate the packet transfer. 
Description of Action Function in grabbing state (A2)
If a VTGB node in the grabbing state receives a CONT or LAST input command directive 
from the previous VTGB node, it grabs the half-word on its 16-bit data lines and passes 
the half-word to its associated entity. If the message is a broadcast message and the next 
VTGB node is not busy (busy status input = 0), it also passes the contents of the 16-bit 
data lines to the next VTGB node together with the CONT or LAST output command 
directive. If the node receives a LAST input command directive from the previous VTGB 
node, it transits to the idle state after sending the last half-word to its associated 
entity.
However, if its associated entity is unable to process more data (i.e. when the entity input 
FIFO is almost full), or the next node is busy in the case of a broadcast message, the node 
buffers the half-word. It also sets its busy status output line to logic high to 
request the previous node to stop sending data half-words.
In this state, if it receives the ABORT input command directive, it exits grabbing state and 
transits back to idle state. If the message is a broadcast message, it sends an ABORT output
command directive to the next node. Otherwise, the IDLE output command directive is sent.
4.3.4 VTGB Fault Tolerant Features 
To achieve fault tolerance in the communications network, the VTGB network must be able 
to recover from the following error scenarios: 
1. Failure of the sending entity, causing the empty slot not to be released back into the 
network 
2. Failure of the receiving entity, causing the receiving FIFO not to be cleared
3. Circulating messages addressed to non-existent entities that are not grabbed, causing 
the empty slot not to be released back into the network.
4. Possible network failure due to protocol anomalies in the command lines, resulting from 
single-event upsets occurring in the FPGA gates.
In the following sections, the fault handling for each of the above error scenarios will be 
described.
4.3.4.1 Failure of the sending entity causing the message slot not to be released back 
into the network 
To prevent a sending entity from holding onto the message slot forever, a timeout must be 
imposed. The VTGB handler monitors how long an entity has been in the sending state. Upon 
timeout, the handler sends an abort message to the receiving entity to terminate the 
transfer, and releases an empty slot back into the network. The default timeout period is 
10 seconds. This is computed based on the speed of the VTGB network, the expected 
maximum packet length and the worst burst transmission interval. When operating 
conditions change, this timeout can be re-configured at runtime by sending CAN commands 
from the satellite's onboard computer.
4.3.4.2 Failure of the receiving entity causing the receiving FIFO not to be cleared 
Failure of the receiving entity would result in the receiving FIFO not being cleared. The 
receiving entity will remain in the busy state perpetually, causing the VTGB handler for the 
sending entity to time out after 10 seconds. The VTGB handler at the sending entity will 
then send an abort message to the receiving entity to terminate the transfer. When this 
happens, an empty slot is released back into the network. 
4.3.4.3 Circulating Messages addressed to non-existent entities that are not grabbed, 
causing the empty slot not to be released back into the network 
The VTGB handler in the sending entity constantly monitors its receive FIFO while it is in a 
sending mode.  Presence of messages in the receive FIFO could indicate an absence of a 
valid receiving entity in the case of a point-to-point message type; or it could be the return 
of a broadcast message. In either case, these messages have to be removed from the network 
by the sending entity.
4.3.4.4 Possible network failure due to protocol anomalies in the command lines, 
resulting from single-event upsets occurring in the FPGA gates
The PPU uses an industrial grade Actel Antifuse FPGA for its network element logic, thus it 
is not immune to single-event-upsets (SEUs) that can happen in the space radiation 
environment (refer to Section 3.5.2). Triple modular redundancy (TMR) is the most 
frequently used technique to mitigate errors due to SEU, but it is very expensive in terms of 
FPGA gate resource and power consumption. By trading off reliability and applying TMR 
selectively to parts of the FPGA design, the cost in terms of resources and power can be 
reduced. 
In the PPU, TMR is partially applied to VTGB communication network registers and to 
important registers storing critical information like processor states or configuration data. 
However, there are still sections of the FPGA protocol logic that might be susceptible to 
SEUs. SEUs in state machines can result in protocol anomalies.
To protect against such anomalies, the FPGA code first detects several types of VTGB 
protocol errors.  Some examples of these errors are given below:
• Receiving an empty frame slot in the midst of sending, indicating a duplicate frame 
  slot in the network.
• Receiving an invalid command on the 3-bit command lines.
• Not receiving an empty frame slot after a reasonable period of time.
Upon detecting such scenarios, the only way to recover the network is to perform a reset of 
the VTGB network. The FPGA provides a separate internal reset line for the VTGB network.  
Upon reset all registers are re-initialised and a new empty slot will be released into the 
network, resuming normal operation of the network.
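The three example checks listed above can be sketched as a single classification function. This is an illustration only: the 3-bit command encodings, the `EMPTY` command name, the cycle-count threshold and the function names are all hypothetical, since the text does not give the actual encodings or the length of the "reasonable period of time".

```python
from enum import Enum
from typing import Optional

# Hypothetical 3-bit encodings; codes 5 to 7 are left unused so that
# they model "invalid" commands. The real encodings are not given here.
COMMANDS = {0b000: "IDLE", 0b001: "EMPTY", 0b010: "CONT",
            0b011: "LAST", 0b100: "ABORT"}


class ProtocolError(Enum):
    INVALID_COMMAND = 1
    DUPLICATE_EMPTY_SLOT = 2
    EMPTY_SLOT_TIMEOUT = 3


def check_protocol(cmd_bits: int, sending: bool,
                   cycles_without_empty_slot: int,
                   max_cycles: int) -> Optional[ProtocolError]:
    """Classify one observation of the command lines; None means no anomaly."""
    name = COMMANDS.get(cmd_bits & 0b111)
    if name is None:
        return ProtocolError.INVALID_COMMAND
    if name == "EMPTY" and sending:
        # An empty slot arriving while this node still holds the slot
        # indicates a duplicate slot in the network.
        return ProtocolError.DUPLICATE_EMPTY_SLOT
    if name != "EMPTY" and cycles_without_empty_slot > max_cycles:
        return ProtocolError.EMPTY_SLOT_TIMEOUT
    return None


def vtgb_reset_required(cmd_bits: int, sending: bool,
                        cycles_without_empty_slot: int,
                        max_cycles: int) -> bool:
    """Any detected anomaly leads to a reset of the VTGB network."""
    return check_protocol(cmd_bits, sending,
                          cycles_without_empty_slot, max_cycles) is not None
```

In this sketch every detected anomaly maps to the same recovery action, a VTGB reset, which matches the statement above that a reset is the only way to recover the network.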
4.4 Processor Fault Handling 
The FPGA logic in the PPU network element implements several schemes for processor fault 
detection and handling for autonomous recovery of faulty processors. These dynamic fault 
handling schemes require minimal intervention from the XSat Onboard Computer (OBC). 
Hence there is no computation overhead or added complexity to the OBC software. Logic is 
incorporated to monitor the health of each processing node and control their operational 
states. 
The PPU hardware is divided into two separate power planes, each sourced by its own DC-
DC converter and switch-mode regulators. Each power plane has two processing clusters 
and each processing cluster consists of one FPGA network element, one triple-redundant 
flash chip bank, together with its associated SA-1110 processors and memory. 
Each FPGA network element has individual power and reset control lines to each of its 
connected StrongARM SA-1110 processors. The FPGA can power-cycle or shutdown a 
processor when it hangs in execution or fails. Figure 4-3 shows the details of the fault 
tolerant control logic for powering up, booting and health checking of the SA-1110s during 
operation.
When a CAN command is received to power up the StrongARM SA-1110 nodes, the FPGA 
attempts to start up the nodes sequentially, in an order based on the physical address of 
the processing node. The SA-1110 requires VDDX (3.3 V nominal) to be powered up 
before VDDI (1.8 V nominal). The SA-1110 enforces the required sequencing by holding its 
PWR_EN output pin de-asserted until the 3.3 V supply is sufficiently high [94]. The FPGA 
logic uses the PWR_EN output to control the voltage power-on sequencing. The SA-1110 is 
then held in reset for at least 2 ms.
Once the SA-1110 reset is released, SA-1110 starts its booting process, which is completed 
only when its operating system is up and running. The boot-up software has to write to the 
health register in the FPGA to indicate that it is alive upon successful boot-up of its 
operating system. This action resets the 16-second boot-up timer. If the boot-up timer is 
not reset and a timeout occurs, the StrongARM is power-cycled to restart the booting process. 
Three boot attempts are allowed to boot from the same flash bank. Each processing cluster 
has its own default flash bank where it retrieves its boot-loading and operating system 
code. After three failed attempts to boot from the default flash bank, the booting restarts 
using an alternative flash bank in the network. The alternative flash bank is designated as 
the flash bank residing in the same power plane as the primary flash bank. Only one reboot 
attempt is allowed via the alternative flash chip bank.  
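The boot retry policy described above (three attempts from the default flash bank, then one attempt from the alternative bank) can be sketched as follows. This is a minimal model, not the FPGA logic: `attempt_boot` is a hypothetical callback that stands for one power-cycle followed by a wait of up to 16 seconds for the health-register write, and the bank labels are illustrative.

```python
from typing import Callable, List, Optional

BOOT_TIMEOUT_S = 16            # boot-up timer, from the text
DEFAULT_BANK_ATTEMPTS = 3      # attempts from the default flash bank
ALTERNATIVE_BANK_ATTEMPTS = 1  # attempts from the alternative flash bank


def boot_plan() -> List[str]:
    """Ordered list of flash banks to try, one entry per boot attempt."""
    return (["default"] * DEFAULT_BANK_ATTEMPTS
            + ["alternative"] * ALTERNATIVE_BANK_ATTEMPTS)


def boot_processor(attempt_boot: Callable[[str], bool]) -> Optional[str]:
    """Return the bank that booted successfully, or None if every attempt failed.

    attempt_boot(bank) is a hypothetical hook: it should power-cycle the
    processor, boot from the given bank, and return True only if the health
    register was written before BOOT_TIMEOUT_S elapsed.
    """
    for bank in boot_plan():
        if attempt_boot(bank):
            return bank
    return None  # caller would mark the processor faulty and switch it off
```

A caller could treat a `None` result as the point at which the processor is recorded as faulty and powered off.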
Figure 4-3: Processor Fault Handling Routines in the FPGA
If booting succeeds, the processor has to periodically write its health status to the health 
register in the FPGA to reset the watchdog timer. If this is not done within 1 second, the 
processor is deemed to have hung in operation and is rebooted. If the reboot count exceeds 
the maximum allowed, the processor is deemed faulty and is switched off.
The FPGA maintains a state table for each processor. At any one time, each processor is 
classified under one of the following four states. 
• Powered off and healthy: the node is switched off, with no previous 
  record of failure to boot since the last FPGA reset. 
• Powered off and faulty: the node is switched off, with a previous record of 
  failure to boot since the last FPGA reset. 
• Powered on and ready: the node is switched on and the operating system 
  booting is complete. 
• Powered on and not ready: the node is switched on and the operating 
  system booting is still in progress. 
Processor nodes can request the PN state table information from the various network 
elements for use in their software routines.
4.5 Flash Chip Fault Handling 
The non-volatile flash chip memory in the PPU serves an important role. It stores the entire 
software package for the software configurable processor nodes. This includes a first stage 
boot-loader code, a compressed second stage boot-loader code, a compressed Linux kernel 
image and compressed Linux RAMDISK image. Other than the first stage boot-loader code 
that is loaded into the internal FPGA memory upon boot-up and directly mapped to the SA-
1110 boot start memory address space, the rest of the codes are transferred to the SA-1110 
via the VTGB network.
The PPU uses the industrial grade Atmel serial flash chip AT45DB642D-CNU for its 
non-volatile flash memory. This chip supports a fast synchronous serial interface that 
can be clocked at up to 66 MHz. To ensure reliability in the SA-1110 booting process, the 
PPU has three sets of AT45DB642D-CNU chips per FPGA, which can operate in triple-voting 
memory read mode. Each AT45DB642D-CNU serial flash chip has a capacity of 8 Mbyte. Hence, 
there is a total of 32 Mbyte of triple-voted serial flash memory for the four processing 
clusters in the PPU.
In order to check, maintain and correct the contents of the serial flash, the FPGA provides 
flexibility for the StrongARM processor nodes to perform either read access from a single 
flash chip or read access in a triple-majority-voting fashion from the three serial flash chips. 
This allows a StrongARM processor node to choose its level of reliability, as well as to run 
routines to compare the contents of the three flash chips. The FPGA also provides the ability 
to write to each individual flash chip. Hence the StrongARM processor nodes can run 
periodic software routines to correct the contents of the serial flash chip. 
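A periodic correction routine of the kind described above might look like the following sketch. It assumes a bitwise majority vote over three equal-length copies, which is a common way to realise triple voting but is not confirmed by the text as the FPGA's exact method; the function names and the in-memory `bytearray` stand-ins for the flash chips are hypothetical.

```python
from typing import List


def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority of three values: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def voted_read(copies: List[bytearray]) -> bytes:
    """Triple-majority-voted read over three equal-length flash images."""
    first, second, third = copies
    return bytes(majority(x, y, z) for x, y, z in zip(first, second, third))


def scrub(copies: List[bytearray]) -> List[int]:
    """Rewrite every copy that differs from the voted image.

    Returns the indices of the chips that were corrected.
    """
    voted = voted_read(copies)
    repaired = []
    for index, chip in enumerate(copies):
        if bytes(chip) != voted:
            chip[:] = voted  # stands in for a write to one individual flash chip
            repaired.append(index)
    return repaired
```

Comparing each single-chip read against the voted read identifies which chip has drifted, and the per-chip write access then allows only that chip to be rewritten.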
4.6 Network Element Fault Handling 
For reliability, the VTGB network can be configured to span across one, two or four 
processing clusters depending on operational requirements and the health of the various 
clusters. The ability to reconfigure the VTGB network without the need to perform static 
programming of the FPGA is a salient feature that enables the PPU to degrade gracefully in 
the scenario where network elements fail. 
The network can be configured to route through a single FPGA (Single Cluster Mode), two 
FPGAs (Double Cluster Mode), three FPGAs (Triple Cluster Mode) or four FPGAs (Quadruple 
Cluster Mode). The different configurations are shown in Figure 3-4. Note that 
in the Single Cluster Mode, the network can only be routed within the Master FPGAs and 
not the Slave FPGAs. This is because the Slave FPGAs have no communication 
interfaces to the SSR or the CAN. Hence, Slave processing clusters are detached from the 
onboard bus command and data inputs when they operate without the Master processing 
clusters.
Because the PPU Master 1 and Slave 1 processing clusters are on a different power plane 
from the Master 2 and Slave 2 processing clusters, there is a possibility that half is on and 
the other half is off. Hence the default network configuration is the Double Cluster mode, 
the mode where the VTGB network spans across two processing clusters. If both power 
planes are switched on, commands can be sent from the onboard computer to reconfigure 
the network switch to the Quadruple Cluster Mode. This is the mode where the VTGB 
network spans across all four clusters. The network mode can also be changed to bypass 
failures in specific network elements. 
When the FPGA receives a command from the OBC to change to a network configuration 
different from its current configuration, three steps are performed:
Step 1: VTGB reset is asserted to reset all VTGB registers. 
Step 2: The physical VTGB network structure is configured by setting the select 
signals to the multiplexer network switch.
Step 3: VTGB reset is de-asserted.
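The three-step sequence above can be sketched as a small model of the network switch. The class, field and log-entry names are hypothetical; the only behaviour taken from the text is the default Double Cluster Mode, the ordering of the three steps, and the fact that they run only when the requested configuration differs from the current one.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ClusterMode(Enum):
    SINGLE = 1
    DOUBLE = 2
    TRIPLE = 3
    QUADRUPLE = 4


@dataclass
class NetworkSwitch:
    """Hypothetical model of the multiplexer network switch in one FPGA."""
    mode: ClusterMode = ClusterMode.DOUBLE  # default configuration per the text
    vtgb_reset: bool = False
    log: List[str] = field(default_factory=list)

    def reconfigure(self, new_mode: ClusterMode) -> bool:
        """Apply the three-step sequence; return False if no change is needed."""
        if new_mode is self.mode:
            return False
        self.vtgb_reset = True                   # Step 1: assert VTGB reset
        self.log.append("assert_reset")
        self.mode = new_mode                     # Step 2: set multiplexer selects
        self.log.append("select:" + new_mode.name)
        self.vtgb_reset = False                  # Step 3: de-assert VTGB reset
        self.log.append("deassert_reset")
        return True
```

Holding the VTGB in reset while the multiplexer selects change means no packet is in flight while the physical ring is being rewired.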
4.7 Reconfiguration of the Parallel Processing Architecture  
4.7.1 Fault Tolerant Mapping of Virtual-to-Physical Resources   
Communication between entities in the PPU VTGB network can be specified using either 
physical or virtual address identifiers (refer to Section 4.3.3). This is a unique feature 
supported by the VTGB network (See Section 4.3.1). The virtual address of the processing 
entities is a property that is dynamically determined at runtime. 
Upon successful boot-up of each processor, the FPGA network element allows the processor 
software driver to define its own virtual address configuration logic. The processor then 
writes the computed virtual address of the entity into an FPGA register. The virtual address 
is used to configure the VTGB handler associated with that entity. 
An application process can use this virtual address to identify its logical function and role in 
the parallel processing platform. They are logical addresses assigned to parallel nodes in an 
application task. A virtual address can indicate the logical position of a processor node in a 
processor array topology like the mesh or linear array. It can also refer to a master 
processor for a parallel algorithm based on the master and slave processing topology. 
Virtual addressing is a means of hardware abstraction. The applications need not be 
concerned with how the logical processing nodes are mapped to actual physical nodes. 
Runtime fault tolerant processor reconfiguration and mapping are transparent to the 
application processes. Pre-prepared ground station command scripts are also based on the 
virtual addressing scheme. For example, a ground station command script sends a command 
to a logical processor node using its virtual address. At runtime, the PPU software driver 
selects a healthy physical processor to be that logical node and allocates it the designated 
virtual address. Hence virtual addressing allows the PPU to internally map logical entities to 
physical entities without the need to change ground station command scripts. This is crucial 
since ground scripts are usually hardcoded and are insensitive to changes in the actual 
logical-to-physical mapping process in the PPU. These ground scripts are uploaded to the 
satellite during a satellite ground pass.
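The logical-to-physical mapping can be illustrated with a short sketch. The allocation policy used here (virtual addresses are handed out in list order to healthy physical nodes in ascending physical address order) is an assumption made for the example, and the function name and the address values in the usage note are hypothetical; the text does not specify the driver's actual policy.

```python
from typing import Dict, List, Tuple


def map_virtual_to_physical(virtual_addrs: List[int],
                            healthy_physical: List[int]
                            ) -> Tuple[Dict[int, int], List[int]]:
    """Assign each virtual address to one healthy physical node.

    Returns (mapping, unmapped), where unmapped lists the virtual addresses
    left without a physical node because too few healthy nodes exist.
    """
    # zip stops at the shorter list, so surplus virtual addresses stay unmapped.
    mapping = dict(zip(virtual_addrs, sorted(healthy_physical)))
    unmapped = [v for v in virtual_addrs if v not in mapping]
    return mapping, unmapped
```

With three requested virtual addresses and only two healthy physical nodes, one virtual address remains unmapped; packets addressed to it would not be grabbed by any entity, which is the degraded case discussed in Section 4.7.3.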
4.7.2 Selection of System Node 
External entities like the Onboard Computer (OBC) communicate with the PPU via the CAN 
bus using virtual addresses for reasons explained in Section 4.7. Communication through 
the SSR link to the PPU also uses virtual addresses. At runtime, the PPU software driver 
on each processing node runs an algorithm that collectively determines which physical 
processor node shall be the "system node". The driver selects a healthy physical 
processor to be the system node and allocates it 0x80 as its virtual address. 
External entities communicate with the PPU by specifying the system node virtual 
destination address in the CAN messages or in the SSR data packets.
The provision for runtime selection of the system node adds reliability to the PPU. As long 
as there is a working processor in any processing cluster, the system node will be in 
operation. Hence reliability in communication with external entities is established. 
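One way in which every node could arrive at the same system node without a central arbiter is for each driver to apply the same deterministic rule to the same processor state table. The sketch below assumes such a rule (the ready node with the lowest physical address wins); this rule and the function names are assumptions for illustration, as the text does not describe the actual selection algorithm. Only the 0x80 virtual address and the "powered on and ready" state come from the text.

```python
from typing import Dict, Optional

SYSTEM_NODE_VIRTUAL_ADDRESS = 0x80  # from the text
READY = "powered on and ready"      # state name from Section 4.4


def elect_system_node(state_table: Dict[int, str]) -> Optional[int]:
    """Return the physical address of the system node, or None if no node is ready.

    Assumed rule: the ready node with the lowest physical address is chosen.
    Every node evaluating the same table reaches the same answer.
    """
    ready = [addr for addr, state in state_table.items() if state == READY]
    return min(ready) if ready else None


def virtual_address_for(my_physical: int,
                        state_table: Dict[int, str]) -> Optional[int]:
    """Virtual address this node should claim under the assumed rule, if any."""
    if elect_system_node(state_table) == my_physical:
        return SYSTEM_NODE_VIRTUAL_ADDRESS
    return None
```

Under this assumed rule, a failure of the current system node simply causes the next ready node to claim 0x80 the next time the table is evaluated.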
4.7.3 Graceful Degradation 
When the number of healthy processors in the PPU has degraded to the extent where 
there are insufficient physical processing entities to map to all the virtual processing 
entities, there will be at least one virtual address that is not mapped to a physical entity. 
There is a likelihood that command and data packets generated from its interfaces might 
contain non-existent virtual addresses. Those packets placed on the VTGB bus will not be 
grabbed by any entity. In this situation, these packets will be automatically removed from 
the VTGB network. The VTGB network incorporates logic to remove packets at the 
sending entity if they are not grabbed by any receiving entity in the network. However, the 
application data associated with these non-existent virtual processing nodes will be lost. 
The ability to degrade gracefully in the PPU is important. The PPU can operate in this 
degraded mode till the next ground station pass, where new application codes are uploaded 
to the spacecraft. The new set of codes can be programmed to use less processing resource, 
at the expense of longer execution times. Thus, there is a trade-off between computational 
power and speed. 
4.7.4 Multiple Code Redundancy  
Fault tolerant schemes in hardware recover processors by replacing faulty processors with 
healthy ones. However, they do not guarantee recovery of application data associated with 
the faulty processing element - data that has been earlier streamed and processed in the 
faulty processing element. 
The PPU's VTGB thus implements the virtual group broadcasting functionality which allows 
application users to define virtual broadcast group addresses for a processor group of 
varied size. Each processor in the same virtual group will receive the same set of application 
data, process them using the same algorithm and send them back.   Thus, the final output of 
the PPU will contain multiple copies of the same computed data.
By having multiple copies of the processed data output sent to the ground station, a scheme 
like triple-majority voting can be used to obtain a more reliable result for the final user.
4.7.5 Operational Trade-offs 
In the PPU software architecture, there is always the option of trading off computation 
capability as well as performance to power ratio. The PPU can be adapted to different 
mission needs to meet reliability goals because of its re-configurability. 
The application user pre-decides the amount of computational power needed for a parallel 
task. This translates to the number of processing clusters required to operate, as well as the 
number of processing nodes per cluster. The application user also decides how to use the 
processing resource - whether to provide maximum computation power or to operate in 
maximum reliability mode. 
The PPU can also be adapted easily to different operation scenarios with different power 
constraints. If the maximum power allocated to the payload is low, the payload can reduce 
power consumption by being programmed to use fewer computation nodes to perform a 
parallel task. This feature might be useful towards the satellite's end of life, when 
some of the battery strings may have failed.
4.8 Software Fault Tolerance 
Other than tolerance towards electronic hardware failures, the PPU fault handling schemes 
also need to take into account the soft errors that might be caused by radiation (e.g. bit flips 
in registers or bit errors in memory components). Occasional single-event upsets in these 
COTS memory and processor components are a common problem.  Effective software fault 
handling schemes can be employed to deal with the random bit-flip errors [95]. 
This will increase the availability of the payload and minimise payload downtime during 
occurrences of faults [96,97]. Traditionally, hardware-based Error Detection and Correction 
(EDAC) schemes are employed to protect memory devices, as they are very reliable.  There 
is no additional overhead or constraints on the software developers. However the PPU 
utilises software EDAC schemes (e.g. Reed-Solomon) because of its many processing 
elements and distributed memory components. Hardware-based EDAC schemes are 
impractical in the PPU as they require too many FPGA I/O pins, which increases the PPU 
board complexity. 
4.9 Verification of the Fault Tolerant Computing Architecture 
The verification of the fault tolerance scheme in PPU is carried out by defining operating 
scenarios and simulating the hardware and software faults that could exist in the various 
entities connected together by the communication network, as well as faults in the 
communication network itself.
4.9.1 Verification of Quality of Service 
An operation scenario describes a high level task that is to be performed in space (e.g. code 
upload, image compression or image feature detection).  For each operation scenario, one or 
more Quality-of-Service (QoS) levels can be associated according to the level of degradation 
allowed for the operation to be successful.  This QoS level is defined by the level of 
parallelism performance required versus the number of processors available for the task.  
The number of processors available for the task is in turn dependent on the number of 
processors that are turned on.  It is also dependent on the successful reconfiguration of the 
computing platform to provide the required performance service, under different simulated 
fault conditions.  Table 4-4 provides examples of the Quality-of-Service levels, taking into 
consideration the number of PNs turned on and made available to the application.
The QoS table can be interpreted in such a way that the likelihood of achieving the required 
performance increases with the number of PNs turned on, and decreases with the number of 
PNs requested by the application.  However, the power required to support the operation 
increases if more redundancy is employed.
Table 4-4: Quality of Service Level 

Operation Scenario                    PNs Requested    PNs that are   No. of simulated   Quality of Service Level
                                      by Application   turned on      faulty PNs
------------------------------------  ---------------  -------------  -----------------  ------------------------------
Low Computation Mode (LCM)            1                1              0                  No redundancy, No degradation
(e.g. Housekeeping operations)        1                4              3                  400% redundancy
                                      1                2              2                  100% degradation

Normal Computation Mode (NCM)         4                4              0                  No redundancy, No degradation
(e.g. Code upload)                    4                8              3                  100% redundancy
                                      4                4              2                  50% degradation

Peak Computation Mode (PCM)           16               16             0                  No redundancy, No degradation
(e.g. Image Compression and           16               20             2                  125% redundancy
feature detection)                    16               20             8                  25% degradation

To validate that the PPU architecture supports the selection of healthy PNs at runtime to be 
used in the applications and allocates them virtual addresses, the operation scenarios are 
tested under varying conditions of processor redundancy. This validates the fault tolerance 
procedures in an environment where more processing nodes are powered on than are 
requested by the application. 
The behaviour of the hardware during degraded performance mode is also to be recorded. 
Degraded performance is the mode where the number of healthy processors is less than 
that requested by the application. A processor fault in the PPU is simulated in hardware by 
removing its power jumper which removes its power source. Degraded performance results 
in data loss. For example, in the case of parallel JPEG compression, the image partition that 
is associated with a faulty processor is lost. However if the application is designed to work 
in triple-redundancy mode where multiple processors are used to run the same application 
on the same data set, then the data is recoverable.  In a similar manner, QoS level can also be 
defined in terms of the number of Boot ROMs available to the operating scenario, or the 
number of working interfaces.  In short, the QoS level represents the readiness of the 
computing architecture to handle the task at hand.
The PPU's fault tolerance features have been verified under many operating scenarios, 
operating at various QoS levels.  The strength of the fault tolerant architecture proposed in 
this chapter lies in the redundant paths available to complete the operating 
scenarios with the level of QoS expected.  As new processing tasks are defined, new 
operating scenarios have to be created and the expected QoS levels have to be set.  The 
fault tolerant architecture can scale up in terms of hardware, by adding 
more parallel entities, or increase power consumption to meet the QoS required for the 
operating scenario.
4.9.2 Verification of Operation Scenarios 
In the verification of operation scenarios, each operation scenario is split into several 
procedures or steps of interaction, with various parallel entities connected to the network. 
After powering up the PPU, the payload waits for the CAN telecommand (TC) TC_PPU_INIT 
to come from either of the CAN interfaces. The explicit initialization command TC_PPU_INIT 
specifies which PN nodes should be started up. Figure 4-4 shows a typical PPU operation 
flow scenario which occurs after power-up and initialisation phase.
[Figure 4-4 is a message sequence diagram. Labels recovered from the figure, grouped by phase:
 OBC/OIM send (LVDS data transfer between SSR and PPU): TC_PPU_RECEIVE_DATA, TC_ACK_PPU_RECEIVE_DATA, TC_STATUS_PPU_RECEIVE_DATA, TC_PPU_COMPLETE_DATA, TC_ACK_PPU_COMPLETE_DATA.
 Application execution (OBC/OIM and PPU): TC_PPU_EXEC_APPL, TC_ACK_PPU_EXEC_APPL, TC_STATUS_PPU_EXEC_APPL, TC_PPU_TERMINATE_APPL, TC_ACK_PPU_TERMINATE_APPL.
 OBC/OIM receive (LVDS data transfer between SSR and PPU): TC_PPU_TRANSMIT_DATA, TC_ACK_PPU_TRANSMIT_DATA, TC_STATUS_PPU_TRANSMIT_DATA, TC_PPU_COMPLETE_DATA, TC_ACK_PPU_COMPLETE_DATA.]
Figure 4-4: PPU Operation Flow
In the operation flow, there is a data download phase where data from the SSR is streamed 
to the PPU. This data could be image data, code data or other forms of data required as inputs 
to the application. The second phase is application execution and termination. CAN TC 
command TC_PPU_EXEC_APPL is sent to start application execution. The parameters in the 
CAN command indicate the identifier of the application to execute as well as application 
arguments. The application is terminated by sending the TC_PPU_TERMINATE_APPL CAN 
command to the PPU. The data output from the application, if present, will have to be sent 
back to the SSR. Hence the last operation process flow is the transfer of processed 
application data back to the SSR.
The processes described in the PPU operation flow diagram are present in different 
combinations for each operation scenario. A few of these operation scenario tests are 
listed below.
1. The PPU Checkout Verification during early orbit 
2. Code Upload 
3. Parallel Image Compression 
4. Flash Validate  
1. The PPU Checkout Operation Scenario
The PPU Checkout verification is done at early orbit phase of the satellite in the few days 
after launch, to check the health state of the processors. Each StrongARM processor in the 
PPU takes its turn to perform the same checkout routine, as described in the following 
procedural steps. 
• Power on the PPU.
• Send the TC_PPU_INIT CAN command to power up a StrongARM.
• Send the TC_AP CAN command to request that the StrongARM run the application to 
  compress a small image (256 × 256 pixels) that is pre-programmed in the SSR and 
  to compare the compression output with a pre-programmed output benchmark.
• Send TC_STATUS_EXEC_APPL telemetry status to indicate the application execution 
  status.
• The status code in TC_STATUS_EXEC_APPL states success or failure.
• Send the TC_PPU_INIT CAN command to power off the StrongARM.
• Send the TC_PPU_INIT CAN command to power on another StrongARM and repeat 
  the above steps.
2. Code Upload Operation Scenario
The PPU code upload is performed whenever the boot-load or application code in the flash 
chip needs to be updated. The steps to this operation scenario are as follows:
• Power on the PPU.
• Send the TC_PPU_INIT CAN command to power up a StrongARM.
• Transfer the SSR configuration header frame from the SSR to the PPU to prepare the 
  PPU to receive code data (type 1 configuration information) as well as define the 
  code data structure (type 3 configuration information).
• Transfer code data frames from the SSR to the PPU.
• Send the TC_AP CAN command to request a StrongARM application to program the 
  flash chip with the received code data. 
• Send TC_STATUS_EXEC_APPL telemetry status to indicate the application execution 
  status.
• Send TC_PPU_INIT CAN command to power off the StrongARM.
3. Parallel JPEG Image Compression
The PPU JPEG compression is performed on image data before it is down-linked over the 
X-band communications downlink, to optimise data throughput.  The steps in this operation 
scenario are as follows:
• Power on the PPU.
• Send the TC_PPU_INIT CAN command to power up multiple StrongARMs.
• Transfer the SSR configuration header frame from the SSR to the PPU to prepare 
  entities in the PPU to perform parallel JPEG image compression (type 1 
  configuration information); as well as to define the image data distribution to the 
  various entities in the PPU (type 2 configuration information).
• Transfer image data frames from the SSR to the PPU.
• Send the TC_AP CAN command to request that the StrongARMs run the JPEG 
  compression application to compress the data and later to terminate the application.
• Send TC_STATUS_EXEC_APPL telemetry status to indicate the application execution 
  status.
• Transfer compressed data back to SSR.
• Send the TC_PPU_INIT CAN command to power off the StrongARM.
4. Flash Validate Operation Scenario
The objective of this operation scenario is to verify the flash chip contents on the PPU with 
the version uploaded from the ground station. The steps to this operation scenario are as 
follows:
• Power on the PPU.
• Send the TC_PPU_INIT CAN command to power up a StrongARM.
• Transfer the SSR configuration header frame from the SSR to the PPU to prepare 
  entities in the PPU to receive a copy of the flash content data for validation from the 
  ground (type 1 configuration information); as well as to define the flash content 
  distribution to the desired entity in the PPU (type 2 configuration information).
• Transfer flash data frames from the SSR to the PPU.
• Send the TC_AP CAN command to request the StrongARM to run the application to 
  read from the flash chip.
• The PPU sends TC_STATUS_EXEC_APPL telemetry status to indicate the flash read 
  status.
• Send the TC_AP CAN command to request the StrongARM to run the application to 
  validate the flash chip contents with the version received from the ground. 
• Send TC_STATUS_EXEC_APPL telemetry status to indicate the flash validate 
  application execution status, whether it is success or failure.
• Send the TC_PPU_INIT CAN command to power off the StrongARM.
4.10 Conclusion 
In this chapter, a reconfigurable communication network architecture is proposed. It 
addresses research question two, which aims to design a communication network with fault 
detection and fault management schemes that are efficient and possible to implement on 
the FPGA platform. 
The proposed VTGB communication network is an FPGA-based interconnect network that 
forms the backbone of the PPU. It provides not just the on-chip communications function 
but also the fault tolerance mechanism to support the integration of heterogeneous 
processing and memory elements. The VTGB offers different quality-of-service levels to 
different types of applications. 
The unique design of the VTGB communication network and its configurable routing 
schemes through the various clusters are key innovative features that make the PPU 
architecture resilient to a wide range of faults, from processor node failures and FPGA 
failures to memory failures. Fault recovery and selection of healthy processing nodes for use at 
runtime are transparent to external entities because of the virtual addressing scheme that 
was implemented. 
The definition of virtual resources by the user, and the mapping of those virtual resources to 
actual physical hardware entities only at runtime, allow runtime replacement of faulty 
components, enabling reliable operation. The PPU ensures reliability through a combination 
of static and dynamic redundancy. Static redundancy schemes include 
triple-majority voted flash banks and configuration of links. Otherwise, mostly dynamic 
redundancy schemes are employed. These include selection of healthy resources for use at 
bootup, processor fault detection and handling routines, and network reconfiguration.
5 Fault Tolerant Mesh Processing Array 
5.1 Introduction 
The PPU was designed to handle image processing applications that benefit from a 
linear or mesh processing array topology.  However, since it is also meant to operate in the 
harsh space environment, it is expected to be fault tolerant in order to maintain the 
quality of service required.  The third research question investigates possible 
topologies, reconfigurability and fault tolerance features of such a fault tolerant mesh 
processing array.  
The proposed Fault Tolerant Mesh Processing Array (FTMPA) aims to achieve a high 
survival probability for the COTS processing array in the harsh space radiation 
environment. Variants of this custom FTMPA architecture have been analysed and 
described in other publications by the author [98,99]. The current proposed FTMPA 
architecture is scalable, enabling the mesh processor array to be implemented across 
multiple network elements to support a higher level of processor connectivity in the 
network.
5.2 PPU Parallel Processing and Communication Needs  
The primary purpose of the PPU is to enhance the IRIS multi-spectral EO payload's mission 
value by performing value added processing like image classification, segmentation and 
compression. The image streamed in from the IRIS multi-spectral camera is divided into 
strips or tiles and distributed to multiple processing nodes for processing. Partitioning of 
images and processing them simultaneously in various processing nodes offers huge 
performance speedup. However, in some image processing applications like feature 
detection or change detection, exchanges of information on image partition boundaries 
among processors may be required. For such applications, the inter-processor 
communication network efficiency achieved when processing the image partitions is a 
critical measure of the PPU's performance. 
A parallel application is executed on multiple nodes. It can be described as a collection of 
data processing activities and inter-processor communication activities necessary for 
coordinating the distribution, exchange or gathering of data among processing nodes.
Parallel applications can be classified according to the dependencies between their 
subtasks, the need to synchronise and the frequency of communication with each other. 
Depending on the extent of communication, they can be described as fine-grained², 
coarse-grained³ or embarrassingly parallel⁴ applications. 
The performance of communication intensive applications is greatly dependent on 
communication network efficiency. If the PPU is operated in data parallelism mode where 
essentially embarrassingly parallel applications are executed, the VTGB ring communication 
network is more than sufficient. The VTGB network can stream data to and from the various 
processing nodes efficiently at a data rate of 160 Mbps. However, if the PPU is to execute 
communication intensive parallel tasks, then the VTGB network has to be complemented 
with an additional network for inter-processor communications. 
It is observed that the division of images into strips or tiles results in the need for a linear or 
mesh array communication topology for transferring image boundary data across various 
processors. These boundary data are required to carry out image processing across parallel 
processors, with each processor working on its local set of image data.  In a ring topology, 
these boundary data will have to be transferred from node to node, which is not efficient 
when the number of processing nodes increases.  In a bus topology, while broadcasting will 
allow these boundary data to be sent simultaneously to all processing nodes, the 
transmission is time shared and bandwidth will be limited. On the other hand, symmetric 
communication topologies like the mesh processing array offer low latencies for 
communication patterns that exhibit spatial locality [40]. Hence the decision is for the 
architecture to support the mesh communication network as it suits the communication 
topological needs of the PPU parallel applications [100,101]. A linear array is naturally 
embedded in a mesh array and automatically supported when mesh topology is adopted. 
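As a rough illustration of this argument, the sketch below counts the hops needed to pass boundary data between neighbouring image tiles. It assumes a k x k grid of tiles placed on a bidirectional ring in row-major order; the function names and the row-major placement are illustrative assumptions and are not taken from the PPU design.

```python
def ring_hops(src, dst, num_nodes):
    """Shortest hop count between two nodes on a bidirectional ring."""
    distance = abs(src - dst)
    return min(distance, num_nodes - distance)


def worst_neighbour_hops_on_ring(k):
    """Worst-case hops between adjacent tiles of a k x k grid laid out
    row-major on a ring (assumed placement, for illustration only)."""
    num_nodes = k * k
    worst = 0
    for row in range(k):
        for col in range(k):
            node = row * k + col
            if col + 1 < k:  # tile to the east
                worst = max(worst, ring_hops(node, node + 1, num_nodes))
            if row + 1 < k:  # tile to the south
                worst = max(worst, ring_hops(node, node + k, num_nodes))
    return worst


def worst_neighbour_hops_on_mesh(k):
    """Adjacent tiles are directly linked in a mesh, so one hop suffices."""
    return 1 if k > 1 else 0
```

Under these assumptions the worst-case neighbour distance on the ring grows with k, while it stays at one hop on the mesh.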
5.3 Concepts of a Fault Tolerant Mesh Processing Array (FTMPA)
A mesh processing array is a logical array of processor cells, such that each logical 
processor cell is physically connected to up to four other logical processor cells in its north, 
south, east and west directions. This is shown in Figure 5-1.  These four immediate 
communication neighbours of the cell are termed the cell's north, south, east and west 
neighbours respectively. 
                                                            
2 An application exhibits fine-grained parallelism if its subtasks have a high frequency of communication during 
execution.
3 An application exhibits coarse-grained parallelism if the subtasks have a low frequency of communication 
during execution.
4 An application is embarrassingly parallel if the subtasks have minimal dependency and rarely or never have to 
communicate.
Figure 5-1: Mesh Processing Array
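The neighbour relation of a logical mesh cell can be sketched as follows. This is a minimal sketch that assumes cells are addressed as (row, column) with row 0 at the north edge; the function name and addressing convention are hypothetical.

```python
def mesh_neighbours(row, col, rows, cols):
    """Return the north/south/east/west neighbours of cell (row, col)
    that actually exist inside a rows x cols logical mesh."""
    candidates = {
        "north": (row - 1, col),
        "south": (row + 1, col),
        "east": (row, col + 1),
        "west": (row, col - 1),
    }
    return {
        direction: (r, c)
        for direction, (r, c) in candidates.items()
        if 0 <= r < rows and 0 <= c < cols
    }
```

An interior cell has all four neighbours, while edge and corner cells have three and two respectively, which is why a cell is connected to up to four others.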
The construction of a mesh processing array in hardware requires each working cell in the 
logical mesh processing array to be mapped to a working physical processor in the physical 
processor array. This process is termed the physical-to-logical mapping of the mesh 
processing array. An example of a practical physical processor network is shown in Figure 
5-2(a), and its corresponding logical mesh processing array of working cells mapped to 
actual physical processors is shown in Figure 5-2(b).  PN denotes a processing node in the 
figures.
Figure 5-2: A practical mapping of physical processors and FPGA resources to a 
logical mesh processor array 
The hardware needs to incorporate a physical network that can establish the processor 
communication links as required by the logical mesh processing array. When a physical 
processor mapped to a working cell in the mesh processing array fails, it is replaced by 
another physical processor. Several network architectures that support the 
presence of spare processors and their reconfiguration have been proposed [102,103]. The 
specific architectural approach proposed in this thesis caters for a reconfigurable 
mesh network that spans multiple network elements and is optimised for 
implementation on an FPGA platform.  The mesh communication network is reconfigurable 
so that the replacement processor can establish mesh links with its adjacent processors in the 
logical mesh. For a logical mesh processing array to be constructed out of a physical 
processor array with faulty processing nodes, the following have to be provided in the mesh 
array for realisation in hardware:
1. Firstly, it requires a flexible physical network design for the processing nodes. This 
network contains switches that can be configured to create the communication links 
between processing nodes as required by the logical mesh processing array.
2. Secondly, it requires a remapping mechanism to produce the set of switch 
configurations for the communication network.
5.4 FTMPA Theoretical Framework   
The Fault Tolerant Mesh Processing Array (FTMPA) is designed to construct a logical mesh 
processing array out of a physical processor graph containing faulty processors.  The 
FTMPA architecture consists of three parts: the FTMPA physical network graph, 
the global remapping strategy and the local remapping strategy.
5.4.1 Part 1: The FTMPA Physical Network Graph
The FTMPA graph structure defines the number of FPGA network elements in the FTMPA 
graph. In the graph, n is defined to be the order of the graph, where the number of physical 
processors connected to each FPGA network element and the number of inter-FPGA links, 
are both expressed as a function of n. This is shown in Figure 5-3.
The FTMPA network structure can be represented as a 2x2 sub-graph array. Each sub-graph 
represents one FPGA interfaced to a maximum of (n² + n) locally connected processors, with 
(n+2) interconnection links to adjacent sub-graphs. 
[Figure 5-3 diagram: (a) a physical processor network of four FPGAs (FPGA00, FPGA01, FPGA10 and FPGA11), each with n² + n processors and n + 2 links to each adjacent FPGA, is converted by the remapping algorithm into (b) a 2n x 2n logical mesh network.]
Figure 5-3: A parallel processing architecture with a physical processor network (a), 
from which a logical mesh processor network (b) is constructed.
The FTMPA network structure and remapping strategies are valid for any integer 
value of n greater than or equal to 2. The PPU is a specific example of an FTMPA implementation 
with n=2. The selection of n is dependent on user requirements, which are usually 
influenced by the following two factors:
1. The number of parallel processors required by the application user.
2. The size of the hardware board, which limits the number of parallel processors that 
can be placed on the board.
Based on the above two considerations, it is decided that an FTMPA parallel network of n=2 
is sufficient. For the case of n=2, there can be a maximum of (n² + n) = 6 locally connected 
processors per FPGA. However, due to PPU hardware space constraints, only five processors 
are connected instead of six in the practical hardware realisation. The global remapping 
strategy in this chapter applies when the number of connected processors per FPGA in the 
FTMPA ranges from n² to (n² + n), for the construction of the 2n x 2n mesh array.
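The quantities that depend on the order n can be tabulated with a short sketch. The function and dictionary key names below are hypothetical; the formulas are the ones stated in this section.

```python
def ftmpa_parameters(n, processors_per_fpga=None):
    """Derive the FTMPA graph quantities for order n.

    processors_per_fpga defaults to the maximum of n^2 + n; it may be
    lowered to any value down to n^2 (for example, 5 for the PPU)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    max_local = n * n + n
    local = max_local if processors_per_fpga is None else processors_per_fpga
    if not n * n <= local <= max_local:
        raise ValueError("processors per FPGA must lie in [n^2, n^2 + n]")
    mesh_side = 2 * n
    mesh_cells = mesh_side * mesh_side
    total = 4 * local  # four FPGA network elements
    return {
        "max_processors_per_fpga": max_local,
        "links_between_adjacent_fpgas": n + 2,
        "mesh_side": mesh_side,
        "mesh_cells": mesh_cells,
        "total_processors": total,
        "spare_processors": total - mesh_cells,
    }
```

For n=2 with five processors per FPGA, as in the PPU, this gives 20 processors for a 4 x 4 mesh, leaving four spares.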
5.4.2 Part 2: FTMPA Global Remapping Strategy
The global remapping strategy defines the process of obtaining a sub-graph of the logical 
mesh processor graph for each FPGA network element. The sub-graph allocated for each 
FPGA is also termed as a mesh partition. The specific mapping depends on the number of 
working processors in each processing cluster. With the number of working processors, the 
global mapping algorithm performs mesh partitioning. Figure 5-4 illustrates an example of 
the output of the mesh partitioning.
Essentially the division of the logical mesh processor graph into sub-graphs creates inter-
FPGA communication boundaries between processing nodes. The global remapping 
algorithm is a systematic way of performing the mesh partitioning such that the maximum 
number of inter-FPGA communication boundaries is optimised and bounded. 
Figure 5-4: Global Mapping Partitions
Inter-FPGA logical mesh connections are eventually mapped to inter-FPGA physical 
connection resources. By limiting the maximum inter-FPGA communication links, the 
communication overheads in terms of internal FPGA gate resources and the number of 
input/output pins required to implement those links are reduced.
5.4.3 Part 3: FTMPA Local Remapping Strategy
Local mapping is a process for realising the local mesh partition in hardware. The local 
remapping strategy defines the following three main areas:
1. Firstly, it defines the local processor mapping within a local mesh partition or FPGA 
sub-graph.
2. Secondly, it defines the local switch network that establishes the inter-processor 
connections required by the local mesh partition.
3. Thirdly, it derives the switch configuration mapping for all the inter-processor 
communication links, both internal (between the processors in the same FPGA) as 
well as external (between processors in different FPGAs). 
For all processing clusters executing the same strategy to independently derive the same 
switching configuration, a systematic mapping of the inter-FPGA processor communication 
links to the physical global link resource is required.
5.4.4 Global Mapping versus Local Mapping 
In the FTMPA architecture, global mapping creates the mesh partitions while local mapping 
achieves a suitable mapping within a partition. Faults originating within the partition are 
handled by the local remapping strategy. The local remapping strategy fails when the 
physical resources within the local partition are insufficient to map to the logical working 
cells in the partition. In this case, the global remapping strategy is reapplied to create new 
partitions across the four network elements, taking into consideration the current number 
of working processors in each processing cluster.
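The division of responsibility between the two strategies can be summarised in a small decision sketch. The function name, the dictionary layout and the returned labels are assumptions made for illustration; the "smaller_mesh" outcome reflects the fallback described in Section 5.5.1 when there are too few healthy processors overall.

```python
def choose_remapping(partition_sizes, healthy_counts):
    """Decide which strategy can absorb the current set of faults.

    partition_sizes: logical cells assigned to each FPGA, keyed by FPGA name.
    healthy_counts:  healthy processors currently available in each FPGA.
    """
    if sum(healthy_counts.values()) < sum(partition_sizes.values()):
        # Not enough healthy processors in total for the full mesh.
        return "smaller_mesh"
    for fpga, cells in partition_sizes.items():
        if healthy_counts[fpga] < cells:
            # One local partition cannot be filled: repartition globally.
            return "global"
    # Every partition still has enough local processors.
    return "local"
```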
5.5 FTMPA Global Remapping Strategy 
5.5.1 Working Processor Set 
The global remapping algorithm defines the four mesh partition boundaries. It determines 
which locations in the logical mesh are mapped to processors from each of the local 
processing clusters. Execution of the global remapping algorithm is done in software 
running in the processor designated as the system node. The role of a system node was 
described in Section 4.7.2. Before executing the global remapping algorithm, the working 
processor set in each processing cluster has to be decided. The algorithm to decide this is 
given below. The objective of the algorithm is to create spare processing capacity within the 
local mesh partition so that processor failures can be handled by changing only the local 
mapping configuration. 
Algorithm Start:
Step 1: 
Determine the healthy processor set, which is defined as {N00, N01, N10, N11}, where N00, 
N01, N10 and N11 represent the number of healthy processors in FPGA00, FPGA01, FPGA10 and 
FPGA11 respectively. This information is obtained during the runtime boot-up phase of the PPU.
Step 2: 
Determine the working processor set, denoted as {W00, W01, W10, W11}. W00, W01, W10 and 
W11 represent the actual number of healthy processors in FPGA00, FPGA01, FPGA10 and 
FPGA11 selected for mapping to the logical mesh. The values of W00, W01, W10 and W11 are 
determined using the following iterative computation, which aims to create spare 
processing capability within the local partition. The computation method is described in the 
following sub-steps.
Sub-step1:
Compute the total number of healthy processors in the combined clusters and 
denote it as NTotal. NTotal is the summation of N00, N01, N10 and N11.
 
Sub-step2:
Determine the desired size of the 2n x 2n mesh array. The total size of the array is (2n)² 
cells. For the global mapping algorithm to work, NTotal must be at least equal to (2n)². If NTotal 
< (2n)², then the FTMPA can operate using only a smaller size mesh as there are 
insufficient healthy processors to be mapped to the logical mesh.
Sub-step3:
If NTotal ≥ (2n)², obtain the working processor set {W00, W01, W10, W11} by first 
setting the initial value of {W00, W01, W10, W11} to be equal to {N00, N01, N10, 
N11} and then running the following five iterative steps: 
Iterative Step 1:
Compute the total number of working processors in the working processor 
set and denote it as WTotal. WTotal is the summation of W00, W01, W10 and W11.
Iterative Step 2:
If WTotal > (2n)², set W00 = W00 - 1 and WTotal = WTotal - 1, and go to iterative step 3. Else, go to Step 3.
Iterative Step 3:
If WTotal > (2n)², set W01 = W01 - 1 and WTotal = WTotal - 1, and go to iterative step 4. Else, go to Step 3.
Iterative Step 4:
If WTotal > (2n)², set W10 = W10 - 1 and WTotal = WTotal - 1, and go to iterative step 5. Else, go to Step 3.
Iterative Step 5:
If WTotal > (2n)², set W11 = W11 - 1 and WTotal = WTotal - 1, and go to iterative step 1. Else, go to Step 3.
Step 3:
Execute the FTMPA Global Remapping Algorithm using the newly derived working 
processor set {W00, W01, W10 and W11}. The FPGA with the minimum number of working 
processors is denoted as FPGAa in the global remapping algorithm. FPGAs in the anti-clockwise 
direction of FPGAa are sequentially labelled as FPGAb, FPGAc and FPGAd.
Algorithm End:
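The round-robin reduction in Step 2 can be sketched in a few lines. This is a minimal sketch rather than the flight software: it assumes the target size is the (2n)² cells of the 2n x 2n mesh (as in Theorem 1), recomputes the total after every decrement, and skips a cluster that has no processors left; the function name and list ordering are hypothetical.

```python
def working_processor_set(healthy, n):
    """Reduce the healthy set [N00, N01, N10, N11] to a working set
    [W00, W01, W10, W11] whose total equals the mesh size.

    Returns None when there are too few healthy processors for the
    full 2n x 2n mesh."""
    target = (2 * n) ** 2          # assumed mesh size, as in Theorem 1
    working = list(healthy)
    if sum(working) < target:
        return None
    index = 0
    while sum(working) > target:
        # Take one processor from each cluster in turn (00, 01, 10, 11),
        # skipping clusters that are already empty (an added safeguard).
        if working[index] > 0:
            working[index] -= 1
        index = (index + 1) % len(working)
    return working
```

Processors removed from the working set remain healthy, so each cluster keeps them as local spares.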
5.5.2 Global Remapping Algorithm 
The FTMPA global remapping algorithm divides the logical mesh processing array into four 
partitions. The logical cells in each mesh partition are mapped to physical processors in one 
of the four network elements. The global remapping algorithm attempts to minimise the 
number of logical mesh connections required between FPGAs by bounding it to a maximum, 
as specified in the FTMPA network graph. Hence the link structure between FPGA network 
elements in the FTMPA network graph illustrates the worst case inter-FPGA link resource 
required.
Constructing a mesh out of a faulty FTMPA graph is an NP-complete problem, and is therefore 
done through remapping heuristics. These remapping heuristics guarantee a set of worst 
case physical connection resources required among the FPGAs, as expressed in Theorem 1 
below. FPGAs are referred to as network elements (NEs) in the theorem to illustrate the 
possibility of using routing platforms other than the FPGAs on which the PPU is based. For the 
purpose of generalisation, assume that an arbitrarily large 2n x 2n logical mesh array is 
maintained, where n ≥ 2. The PPU represents a specific case where n = 2.
Theorem 1:
Given an architecture with a 2 x 2 array of NEs, where there are (n² + n) PNs interfaced to 
each NE, the theorem states that as long as there are at least (2n)² working PNs, the logical 
connectivity in the 2n x 2n mesh can be constructed with at most (n+2) physical 
connections between adjacent NEs.  The theorem assumes that a logical connection 
between two diagonal NEs can be routed via two inter-NE physical connections.
Algorithm:
Let boundary conditions represent the maximum number of physical connections allowed to 
maintain the mesh network.  As stated in the theorem, the boundary conditions are (n+2)
physical connections between adjacent NEs. Every physical-to-logical processor mapping 
configuration results in a specific set of inter-NE communication edges which need to be 
resolved into actual physical connections between the NEs. To prove that Theorem 1 is 
valid, it has to be proven that the boundary condition is not exceeded for all possibilities of 
physical-to-logical processor mapping scenarios.
Before the physical-to-logical processor mapping can be performed, the working processor set 
has to be chosen from the 4(n² + n) PNs in the network.  Let us label the FPGA with 
the least number of working processor nodes as FPGAa, and the FPGAs in the anti-clockwise 
direction of FPGAa sequentially as FPGAb, FPGAc and FPGAd. This is shown in Figure 5-5. Let 
A, B, C and D represent the number of working PNs in FPGAa, FPGAb, FPGAc and FPGAd respectively. A is 
the minimum of A, B, C and D since FPGAa is labelled as the FPGA with the least number of 
working PNs. 
The logical mesh partitioning algorithm starts processor assignment using the PNs from the 
FPGA with the least number of working processor nodes, which is FPGAa. It then continues 
with the next FPGA in the anti-clockwise direction. This strategy reduces possible mapping 
scenarios and aids proving of the algorithm. 
Figure 5-5: FPGA Label
Figure 5-6: (a) A mapping scenario where (A,B,C,D) = n²  (b) A mapping scenario that 
illustrates vertical partition line shift (c) Mapping path definition (d) A mapping 
scenario that illustrates both vertical and horizontal partition line shifts    
The mesh partitioning algorithm states that:
If A ≥ n², the logical mesh partitioning algorithm need not execute since each FPGA 
can provide an n x n sub-mesh, and these are pieced together to achieve a 2n x 2n mesh.
This is shown in Figure 5-6(a). 
However, if A < n², the logical mesh partitioning algorithm uses the sums (A+B) 
and (C+D) to decide the position of the vertical partition line which divides the mesh 
into two partitions. The left vertical partition is mapped to PNs from FPGAa and 
FPGAb, while the right vertical partition is mapped to PNs from FPGAc and FPGAd.  If 
either (A+B) or (C+D) is less than 2n², the vertical partition line has to be shifted in 
the following manner: The vertical partition line is moved by one step towards the 
left for |F| rows, where F is computed as [2n² - (A+B)].  F can be negative, which 
implies a shift of the vertical partition line towards the right for |F| rows. Figure 
5-6(b) illustrates an example for F > 0.
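The shift rule can be checked numerically with the sketch below. It assumes the working set already totals (2n)² processors and reports only how many rows are shifted and how wide the left partition is in those rows; which particular rows are shifted is defined by the figures and is not modelled here. The function and key names are hypothetical.

```python
def vertical_partition_shift(n, a, b, c, d):
    """Compute F = 2n^2 - (A + B) and the resulting left-partition widths."""
    if a + b + c + d != (2 * n) ** 2:
        raise ValueError("working set must total (2n)^2 processors")
    f = 2 * n * n - (a + b)
    if abs(f) > 2 * n:
        raise ValueError("|F| cannot exceed the 2n rows of the mesh")
    if f > 0:
        direction, shifted_width = "left", n - 1
    elif f < 0:
        direction, shifted_width = "right", n + 1
    else:
        direction, shifted_width = "none", n
    shifted_rows = abs(f)
    # Cells left of the partition line: shifted rows plus unshifted rows.
    left_cells = shifted_rows * shifted_width + (2 * n - shifted_rows) * n
    return {
        "F": f,
        "direction": direction,
        "shifted_rows": shifted_rows,
        "left_width_in_shifted_rows": shifted_width,
        "left_cells": left_cells,
    }
```

The returned `left_cells` always equals A + B, which is the purpose of the shift: the left partition ends up with exactly as many cells as FPGAa and FPGAb can supply.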
The next step is the physical-to-logical assignment of the logical mesh positions. The logical 
mesh partitioning follows a static assignment path as depicted by the numbered arrows in 
Figure 5-6(c). It starts by mapping mesh locations along assignment path 1 to working 
processors in FPGAa. Hence the first A positions along the sequentially numbered 
assignment paths are mapped to PNs from FPGAa. This is followed by the mapping of mesh 
locations along the sequentially numbered assignment paths to working processors in 
FPGAb, FPGAc and FPGAd. Hence, the next B positions are mapped to PNs from FPGAb, the 
next C positions are mapped to PNs from FPGAc and the process continues till all working 
processors in FPGAd are mapped. The above assignment steps define the horizontal 
partition line which further divides the mesh into four segments, where each represents an 
assigned mesh partition to working processors in the respective FPGAs.
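The sequential assignment can be sketched as follows. The actual assignment path is the one defined in Figure 5-6(c) and is not reproduced here; the sketch takes the path as an input, and the function name and argument layout are hypothetical.

```python
def assign_along_path(path, counts):
    """Assign logical mesh cells to FPGAs in path order.

    path:   mesh cells listed in the order of the assignment path.
    counts: (label, number_of_working_PNs) pairs in the order a, b, c, d.
    """
    if sum(count for _, count in counts) != len(path):
        raise ValueError("working PNs must exactly cover the path")
    assignment = {}
    position = 0
    for label, count in counts:
        # The next `count` cells on the path go to this FPGA.
        for cell in path[position:position + count]:
            assignment[cell] = label
        position += count
    return assignment
```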
In Figure 5-6(c), it is observed that arrows of the row-wise assignment paths originate from 
the centre of the mesh to the edge of the mesh in the top left partition, but change direction 
in the bottom left partition. Arrows continue in the same direction in the bottom right 
partition but change direction again in the top right partition. The above assignment 
heuristics for the logical mesh partitioning ensure that inter-FPGA processor 
communication links present in the derived mapping configuration are resolvable using the 
existing network links. 
Let us define the term boundary condition between FPGAx and FPGAy, denoted as Bxy, to 
represent the number of PN communication pairs between them.  Figure 5-6(d) 
illustrates a mapping solution for a particular mapping scenario, showing the boundary 
conditions for the various inter-FPGA communication interfaces (Bab, Bbc, Bcd, Bda, Bac and 
Bbd). A Bab of n denotes that there are n PN communication pairs between FPGAa and FPGAb. 
Observe that the boundary conditions between adjacent FPGAs (Bab, Bbc, Bcd and Bda) never 
exceed the limit of (n+2), and that the two diagonal links between FPGAb and FPGAd (Bbd = 2) 
can be resolved by routing one diagonal communication channel through FPGAa, and the 
other through FPGAc.
5.5.3 Proof of FTMPA Global Remapping Algorithm 
This section provides a proof that the FTMPA Remapping Algorithm satisfies the boundary 
conditions for all mapping scenarios and for n ≥ 2. Firstly, let us derive the following 4 
equations based on the rules of the mesh partitioning algorithm:
$(C + D) = 2n^2 + F$ --------------- (1)
$(A + B) = 2n^2 - F$ --------------- (2)
$\min(A, B, C, D) = A$ --------------- (3)
$0 \le (A, B, C, D) \le n^2 + n$ --------------- (4)
From equations (1) to (4), the minimum and maximum values of A, B, C and D are derived in 
equations (5) to (7) for the case F ≥ 0. 
Amin occurs when B is maximum at (n² + n). 
Amin is computed as (n² - n - F) from equation (2) if (n² - n - F) ≥ 0. Otherwise, Amin = 0.
Amax occurs when B = A as B must be at least A. 
Amax is computed as (n² - F/2) from equation (2). 
Cmin occurs when D is maximum at (n² + n).
Cmin is computed as (n² - n + F). 
Cmax occurs when Dmin occurs. 
Dmin = Cmin and Cmax = Dmax.
For F ≥ 0,
$n^2 - n - F \le A \le n^2 - \frac{F}{2}$ --------------- (5)
$n^2 - \frac{F}{2} \le B \le n^2 + n$ --------------- (6)
$n^2 - n + F \le C, D \le n^2 + n$ --------------- (7)
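As a sanity check on the bounds in equations (5) to (7), the sketch below enumerates every (A, B, C, D) permitted by equations (1) to (4) for small n and confirms that each lies within the stated ranges. It is a brute-force illustration under the reading of the equations given in the surrounding text, not part of the proof, and the function names are hypothetical.

```python
from itertools import product


def feasible_sets(n, f):
    """Enumerate every (A, B, C, D) satisfying equations (1) to (4)."""
    upper = n * n + n  # equation (4): at most n^2 + n processors per FPGA
    found = []
    for a, b, c, d in product(range(upper + 1), repeat=4):
        if (c + d == 2 * n * n + f          # equation (1)
                and a + b == 2 * n * n - f  # equation (2)
                and a == min(a, b, c, d)):  # equation (3)
            found.append((a, b, c, d))
    return found


def within_bounds(n, f, a, b, c, d):
    """Check equations (5) to (7) for F >= 0.

    Terms involving F/2 are doubled so that only integers are compared."""
    a_ok = max(0, n * n - n - f) <= a and 2 * a <= 2 * n * n - f
    b_ok = 2 * n * n - f <= 2 * b and b <= n * n + n
    cd_ok = all(n * n - n + f <= x <= n * n + n for x in (c, d))
    return a_ok and b_ok and cd_ok
```

For example, `all(within_bounds(2, f, *s) for s in feasible_sets(2, f))` can be evaluated for each F from 0 to 2n.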
For the case of F < 0, the minimum and maximum values of A, B, C and D are derived in 
equations (8) to (10).  The value of Cmin is however changed to Amin because the value of C 
when D is maximum at (n² + n) is smaller than A, which does not satisfy equation (3).  Dmax is 
correspondingly computed as (n² + n - 2|F|).  Equation (11) states the maximum value of |F|, 
which is computed through the inequality Cmax ≥ Cmin.
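For F < 0,
$n^2 - n + |F| \le A \le n^2 + \frac{|F|}{2}$ --------------- (8)
$n^2 + \frac{|F|}{2} \le B \le n^2 + n$ --------------- (9)
$n^2 - n + |F| \le C, D \le n^2 + n - 2|F|$ -------------- (10)
$|F| \le \frac{2n}{3}$ --------------- (11)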
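The limit on |F| in equation (11) can be cross-checked by exhaustive search. The sketch below, with a hypothetical function name, looks for the largest |F| with F < 0 for which at least one (A, B, C, D) satisfies equations (1) to (4); for small n the result is expected to match the integer part of 2n/3.

```python
from itertools import product


def max_negative_shift(n):
    """Largest |F| (with F < 0) admitting some (A, B, C, D) that
    satisfies equations (1) to (4)."""
    upper = n * n + n  # equation (4)
    largest = 0
    for a, b, c, d in product(range(upper + 1), repeat=4):
        f = 2 * n * n - (a + b)              # equation (2), solved for F
        if (f < 0
                and c + d == 2 * n * n + f   # equation (1)
                and a == min(a, b, c, d)):   # equation (3)
            largest = max(largest, -f)
    return largest
```

For n=2 the search finds, for instance, (A, B, C, D) = (3, 6, 3, 4) with F = -1, and no valid set with F = -2.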
To depict all possible boundary conditions for the various mapping scenarios, a mesh 
partition diagram with the following six mapping regions is used to illustrate each scenario. 
Four mapping regions depict the minimum mapping regions for FPGAa, FPGAb, FPGAc and 
FPGAd. The remaining two depict the possible boundaries between FPGAa and FPGAb, and 
between FPGAc and FPGAd respectively. These regions are obtained from the minimum and 
maximum values of A, B, C and D. This visualisation will facilitate the proving of the 
algorithm - that the boundary conditions (Bab, Bbc, Bcd and Bda) never exceed the limit of 
(n+2), and that any diagonal links can be resolved by routing through unused adjacent links.
Now, let us analyse the boundary conditions systematically for the various mapping 
scenarios, depending on the range of F. The boundary conditions state the number of 
maximum adjacent and diagonal links between FPGAs for the different mapping scenarios. 
Boundary conditions are considered met if the number of physical link resources of (n + 2) 
is sufficient to resolve or meet the boundary conditions for that particular mapping 
scenario. Theorem 1 of FTMPA Global Remapping Algorithm is proven true if boundary 
conditions are met for all mapping scenarios.
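The notion of boundary conditions being met can be expressed as a small feasibility check. The sketch assumes, as Theorem 1 does, that each diagonal channel consumes one physical link on each of two adjacent interfaces: a b-d channel is routed via FPGAa (links ab and da) or via FPGAc (links bc and cd), and an a-c channel via FPGAb (links ab and bc) or via FPGAd (links cd and da). The function name and argument layout are hypothetical.

```python
def boundary_conditions_met(n, adjacent, diagonal_bd=0, diagonal_ac=0):
    """Return True if all adjacent and diagonal channels fit within
    n + 2 physical links per adjacent FPGA pair.

    adjacent: dict with keys 'ab', 'bc', 'cd', 'da' giving the number of
              adjacent PN communication pairs on each interface.
    """
    limit = n + 2
    # Try every split of the diagonal channels over their two routes.
    for bd_via_a in range(diagonal_bd + 1):
        bd_via_c = diagonal_bd - bd_via_a
        for ac_via_b in range(diagonal_ac + 1):
            ac_via_d = diagonal_ac - ac_via_b
            load = {
                "ab": adjacent["ab"] + bd_via_a + ac_via_b,
                "bc": adjacent["bc"] + bd_via_c + ac_via_b,
                "cd": adjacent["cd"] + bd_via_c + ac_via_d,
                "da": adjacent["da"] + bd_via_a + ac_via_d,
            }
            if all(links <= limit for links in load.values()):
                return True
    return False
```

A mapping scenario passes when some routing keeps every interface at or below (n+2) links, which is the condition examined case by case in the remainder of this section.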
  
Case 1: 0 ≤ F < n
Figure 5-7 illustrates the mesh partition diagram for mapping scenario 0 ≤ F < n for n ≥ 2. The 
following conditions can be observed for all possible mappings: 
(Bab)max ≤ n+1
(Bbc)max ≤ n+1
(Bcd)max ≤ n+1
(Bda)max ≤ n+1
(Bac)max ≤ 1
(Bbd)max ≤ 1
Figure 5-7: (a) illustrates the mesh partition diagram for mapping scenario 0 < F < n 
(b) illustrates one possible mapping for this scenario
The boundary conditions between any adjacent FPGA pairs are all within (n+1) since 
[(Bab)max, (Bbc)max, (Bcd)max and (Bda)max] ≤ n+1. Given a link resource limit of (n+2), there is at 
least one spare link between any two adjacent FPGAs. These spare links can resolve 
up to a maximum of two diagonal mesh communication channels. One diagonal mesh 
communication channel (FPGAbd /FPGAac) can be resolved via a spare FPGAab and FPGAda 
link, while a second diagonal mesh communication channel can be resolved using a spare 
FPGAbc and FPGAcd link.
For any mapping scenario, there is either a maximum of one FPGAbd diagonal mesh 
communication channel ((Bbd)max = 1) or one FPGAac diagonal mesh communication channel 
((Bac)max = 1), but never both. Hence the maximum total number of diagonal mesh 
communication channels in any scenario is only 1. This is less than the maximum of 2 which 
can be resolved by the spare links between adjacent FPGAs. 
This completes the proof that adjacent and diagonal links can always be resolved for 
mapping scenario 0 ≤ F < n, based on an available link resource of (n+2).
Case 2: F = n
Figure 5-8 illustrates the mesh partition diagram for mapping scenario F = n for n ≥ 2. The 
following conditions can be observed for all possible mappings. 
(Bab)max ≤ n 
(Bbc)max = n 
(Bcd)max ≤ n+1
(Bda)max ≤ n 
(Bac)max = 0
(Bbd)max ≤ 3
The boundary conditions between any arbitrary adjacent FPGAs ((Bab)max, (Bbc)max, (Bcd)max 
and (Bda)max) are all within the link resource limit of (n+2). Since [(Bab)max, (Bda)max] ≤ n, two 
diagonal mesh communication channels can be resolved by two spare FPGAab and FPGAda 
links. In addition, since [(Bbc)max, (Bcd)max] ≤ n+1, one diagonal FPGAbd mesh communication 
channel can be resolved using a spare FPGAbc and FPGAcd link. From the above analysis, a 
total of 3 diagonal mesh communication channels can be resolved by the spare adjacent 
links. 
Figure 5-8: (a) illustrates the mesh partition diagram for mapping scenario F= n (b) 
illustrates one possible mapping for this scenario 
For any mapping scenario, there is a maximum of 3 FPGAbd diagonal channels ((Bbd)max = 3), 
and no FPGAac diagonal mesh communication channel ((Bac)max = 0). Hence the maximum 
total number of diagonal mesh communication channels in any scenario is only 3. This is 
equal to the maximum of 3 which can be resolved by the spare links between adjacent 
FPGAs. This completes the proof that adjacent and diagonal links can always be 
resolved for mapping scenario F = n, based on an available link resource of (n+2).
Case 3: n < F ≤ 2n
Figure 5-9 illustrates the mesh partition diagram for mapping scenario n < F ≤ 2n for n ≥ 2. The 
following conditions can be observed for all possible mappings. Note that the Amin partition 
in Figure 5-9(a) illustrates the case where n ≥ 3. When n=2, Amin = 0. It is stated earlier that 
Amin is computed as (n² - n - F) from equation (2) only if (n² - n - F) ≥ 0. Otherwise, it takes the 
value of 0.
(Bab)max ≤ n 
(Bbc)max = n+1
(Bcd)max ≤ n+2
(Bda)max ≤ n 
(Bac)max = 0
(Bbd)max ≤ 2
The boundary conditions between any arbitrary adjacent FPGAs ((Bab)max, (Bbc)max, (Bcd)max 
and (Bda)max) are all within the link resource limit of (n+2). Since [(Bab)max, (Bda)max] ≤ n, two 
diagonal mesh communication channels can be resolved by two spare FPGAab and FPGAda 
links. As there are no spare links in FPGAcd, no diagonal mesh communication channel can 
be resolved via the FPGAcd path. From the above analysis, a total of 2 diagonal mesh 
communication channels can be resolved by the spare adjacent links. 
Figure 5-9: (a) illustrates the mesh partition diagram for mapping scenario n < F ≤ 2n 
(b) illustrates one possible mapping for this scenario
For any mapping scenario, there is a maximum of 2 diagonal FPGAbd channels ((Bbd)max = 2), 
and no FPGAac diagonal mesh communication channel ((Bac)max = 0). Hence the maximum total 
number of diagonal mesh communication channels in any scenario is only 2. This is 
equal to the maximum of 2 which can be resolved by the spare links between adjacent 
FPGAs. This completes the proof that adjacent and diagonal links can always be 
resolved for mapping scenario n < F ≤ 2n, based on an available link resource of (n+2).
Case 4: -2n/3 ≤ F < 0
Figure 5-10 illustrates the mesh partition diagram for the mapping scenario when F is negative, 
and within the range of -2n/3 ≤ F < 0 for n ≥ 2. The following conditions can be observed for all 
possible mappings: 
(Bab)max ≤ n+1
(Bbc)max ≤ n+1
(Bcd)max ≤ n+1
(Bda)max ≤ n 
(Bac)max = 0
(Bbd)max ≤ 1 
Figure 5-10: (a) illustrates the mesh partition diagram for mapping scenario 
-2n/3 ≤ F < 0, (b) illustrates one possible mapping for this scenario
The boundary conditions between any arbitrary adjacent FPGAs ((Bab)max, (Bbc)max, (Bcd)max 
and (Bda)max) are all within the link resource limit of (n+2). In particular, [(Bab)max, (Bbc)max, 
(Bcd)max and (Bda)max] ≤ n+1. By similar reasoning to the earlier mapping scenarios, there are 
sufficient spare adjacent links to resolve a maximum of 2 diagonal mesh communication 
channels.
From the boundary conditions, there is a maximum of only 1 diagonal FPGAbd mesh 
communication channel ((Bbd)max = 1) and no FPGAac diagonal mesh communication 
channel ((Bac)max = 0). Hence the maximum total number of diagonal mesh communication 
channels in any scenario is only 1. This is less than the maximum of 2 which can be resolved 
by the spare links between adjacent FPGAs. This completes the proof that adjacent 
and diagonal links can always be resolved for mapping scenario -2n/3 ≤ F < 0, based on an 
available link resource of (n+2).
5.6 Local Remapping Strategy 
The primary objective of the local remapping strategy is to realise the global mesh partition in 
hardware by mapping the logical mesh cells to actual physical processors, and by 
establishing the mesh communication network links.
5.6.1 Local Processor Mapping 
The first step of the local remapping strategy is the assignment of actual physical healthy 
processors to the local mesh partition. An illustration of the physical to logical mapping of 
the mesh is shown in Figure 5-11. Four local FPGA mesh partitions are obtained.
These processors are assigned to the local mesh partition in accordance with the global 
mapping path definition in Figure 5-6(c). Runtime working processors with the lowest 
physical address identifiers are allocated first. 
Figure 5-11: Local Mapping
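The allocation rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the list of healthy processor identifiers and the order of the mesh cells are assumptions, since the actual cell order is set by the global mapping path of Figure 5-6(c). The example inputs are chosen so that the result agrees with the assignments listed later in Table 5-2.

```python
def map_local_partition(healthy_ids, partition_cells):
    """Assign healthy physical PNs of one FPGA to its logical mesh cells.

    healthy_ids: physical address identifiers of the working PNs (assumed input).
    partition_cells: logical (x, y) cells of this FPGA's partition, listed in
        the order of the global mapping path (the order used here is assumed).
    Returns a dict {physical_id: (x, y)}.
    """
    if len(healthy_ids) < len(partition_cells):
        raise ValueError("not enough healthy processors for this partition")
    # Working processors with the lowest physical identifiers are used first;
    # any remaining healthy processors stay unassigned as spares.
    ordered = sorted(healthy_ids)
    return dict(zip(ordered, partition_cells))


# Hypothetical example: four healthy PNs, three cells in the local partition.
mapping = map_local_partition([4, 0, 1, 5], [(0, 0), (1, 1), (1, 0)])
```

Here PN 5 is healthy but left unassigned, because only three cells belong to the partition.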
5.6.2 Local Interconnection Switch Network 
In the logical mesh, every PN has four input/output ports. Every PN writes to its north, 
south, east and west output ports in the FPGA, and reads from its north, south, east and west 
input ports. Though each PN is interfaced to four input/output ports in the FPGA, the FPGA 
logic time-multiplexes the input and output ports to make more effective use of 
resources. 
Figure 5-12 shows the multiplexer structure for the time-multiplexed switch. For an FTMPA 
graph where n=2, the number of PNs per network element is six. There are four global links 
to each of the clockwise (right adjacent) FPGA and the anticlockwise (left adjacent) FPGA. 
There are 14 multiplexed switches in each FPGA network element. Six of 
the switches configure the time-multiplexed input for each of the six PNs. The remaining 
eight switches configure the time-multiplexed input for each of the outbound global 
inter-FPGA links. There are four outbound inter-FPGA links to the clockwise FPGA, and four 
outbound inter-FPGA links to the anticlockwise FPGA.
Figure 5-12: Time-multiplexed Switch  
With time multiplexing, every PN requires only one multiplexed port switch to connect to 
its logical neighbours in the network. The FPGA logic works in four phases of the global 
clock. Data from a PN's north, south, east and west output ports is written to the 
time-multiplexed output port once every four global clock phases. Similarly, data for a PN's 
north, south, east and west input ports is read from the time-multiplexed input port once 
every four global clock phases. Thus a PN outputs to its north port, or samples from its 
north port, once every four FPGA global clock cycles. The actions at each of the four clock 
phases are described in Table 5-1.
Table 5-1: Global Clock Cycle Time-multiplexed Activities 
Synchronous Global Clock Cycle Activities | 1st Cycle | 2nd Cycle | 3rd Cycle | 4th Cycle
Time-multiplexed output | North | South | East | West
Time-multiplexed input | East | South | North | West
Multiplexing switch select configuration* | South | North | West | East
* This selects the time-multiplexed input for sampling in the next cycle.
As an example, the actions for a specific PN mesh handler are described for the 2nd cycle. In 
the 2nd cycle, the FPGA outputs the contents of the PN's south port onto the PN's 
time-multiplexed output port. It also samples the time-multiplexed input for the PN's south port 
at the rising edge of the 2nd cycle of the global clock. In the same cycle, the switch select signal is 
configured to select the time-multiplexed input from its north neighbour. This will allow the 
FPGA to sample the input from the PN's north neighbour in the 3rd cycle.
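The four-phase schedule of Table 5-1 can be written down as a small lookup, which also allows the relationship between its rows to be checked. The sketch below is illustrative only; the constant and function names are assumptions. The first check follows directly from the table footnote (the select set in one cycle chooses the input sampled in the next). The second check, that a word driven towards one side is sampled on the opposite port one cycle later, is an observation about the table values rather than a statement made in the text.

```python
OPPOSITE = {"North": "South", "South": "North", "East": "West", "West": "East"}

# Rows of Table 5-1, indexed by clock phase 0..3 (1st to 4th cycle).
MUX_OUTPUT = ("North", "South", "East", "West")
MUX_INPUT = ("East", "South", "North", "West")
MUX_SELECT = ("South", "North", "West", "East")


def phase_actions(cycle):
    """Return the time-multiplexed activities for a clock cycle counted from 0."""
    phase = cycle % 4
    return {
        "output": MUX_OUTPUT[phase],
        "input": MUX_INPUT[phase],
        "select": MUX_SELECT[phase],
    }


def schedule_is_consistent():
    """Check the two row relationships described in the lead-in."""
    for phase in range(4):
        nxt = (phase + 1) % 4
        # The select configured in this phase picks the input sampled next.
        if MUX_SELECT[phase] != MUX_INPUT[nxt]:
            return False
        # Observation: the output direction is sampled on the opposite port
        # in the following phase.
        if OPPOSITE[MUX_OUTPUT[phase]] != MUX_INPUT[nxt]:
            return False
    return True
```

For the 2nd cycle (index 1), `phase_actions(1)` gives output South, input South and select North, matching the worked example above.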
Between every pair of FPGAs, there are four outbound global links and four inbound global 
links. The interface between FPGAs is via asynchronous FIFO queues to enable data to be 
written and read in different clock domains. This is important as each FPGA has 
its own local clock, and the asynchronous FIFO interface prevents signals crossing clock 
domains from having meta-stability problems. 
5.6.3 Local Switch Configuration Mapping 
To ensure that FPGA network elements in different mesh partitions assign the same 
inter-FPGA global link to the same inter-processor communication link, there is a systematic 
scanning process for the inter-processor communication links in the local mesh. The 
scanning sequence for the communication links follows the sequence numbering shown in 
Figure 5-13. Each FPGA network element needs to consider only the communication links in its local 
partition. The four inter-FPGA global links between two adjacent FPGAs are numbered from 
0 to 3. The clockwise inter-FPGA global link with the smallest identifier is assigned to the 
right inter-FPGA communication link associated with the smallest sequence number. The 
same rule applies to the anticlockwise inter-FPGA global links.
Figure 5-13: Inter-processor Communication Link Scanning Sequence 
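A minimal sketch of this assignment rule is given below. It assumes that each FPGA already knows, for every communication link leaving its partition, the scan sequence number of that link and whether it goes to the left or right adjacent FPGA. The function name and the sequence numbers in the example are hypothetical; the real numbering is that of Figure 5-13.

```python
def assign_global_links(crossing_links, links_per_side=4):
    """Assign inter-FPGA global link numbers in scan-sequence order.

    crossing_links: iterable of (sequence_number, side) pairs, where side is
        "left" or "right" as seen from this FPGA (assumed representation).
    Returns {sequence_number: (side, global_link_number)}.
    """
    assignment = {}
    for side in ("left", "right"):
        sequence_numbers = sorted(seq for seq, s in crossing_links if s == side)
        if len(sequence_numbers) > links_per_side:
            raise ValueError("more crossing links than global links on " + side)
        # Smallest sequence number receives the smallest link identifier.
        for link_number, seq in enumerate(sequence_numbers):
            assignment[seq] = (side, link_number)
    return assignment


# The same two links seen from two adjacent FPGAs (hypothetical numbers):
# a right link for one FPGA is a left link for its neighbour.
seen_from_a = assign_global_links([(7, "right"), (2, "right")])
seen_from_b = assign_global_links([(2, "left"), (7, "left")])
```

Because both FPGAs sort by the same sequence numbers, they arrive at the same global link number for each communication link without exchanging any information.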
From the results of the global mesh partition as well as the global link allocation, the local 
switch configuration table can be derived for each FPGA network element. The information 
from the switch configuration table drives the select signals of the time-multiplexed 
switches. An example switch configuration table is shown in Table 5-2. This table is derived 
for the local mesh partition in FPGA A, for the specific instance of mesh partitioning as 
shown in Figure 5-11.  Table 5-3 shows the corresponding switch configuration for the 
outbound global links for the mesh partition in FPGA A.
The fields in the 4-bit switch configuration identifier are described in Figure 5-14. The most 
significant bit of the switch configuration identifier specifies if the inter-processor 
communication link is internal (logic '0') or external (logic '1'). Figure 5-14(a) describes the 
case where the inter-processor communication link is internal, between processors from the 
same FPGA network element. In this case, the next 3 bits of the switch configuration 
represent the identifier of the local PN. For external communication links, the second bit of 
the switch configuration identifier denotes if the link is connected to the left adjacent FPGA 
network element (logic '0') or to the right adjacent FPGA network element (logic '1'); and 
the last 2 bits denote the global link number, which ranges from binary '00' to '11' for the 
four inter-FPGA global links between adjacent FPGAs.
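The field layout just described can be illustrated with a small decoder. This is a sketch for illustration only; the function name and the dictionary it returns are assumptions, while the bit positions follow the description above.

```python
def decode_switch_config(code):
    """Decode a 4-bit switch configuration identifier into its fields."""
    if not 0 <= code <= 0b1111:
        raise ValueError("switch configuration must fit in 4 bits")
    if code & 0b1000 == 0:
        # Internal link: lower 3 bits hold the local PN identifier.
        return {"kind": "internal", "pn": code & 0b0111}
    # External link: bit 2 selects left (0) or right (1) adjacent FPGA,
    # bits 1..0 hold the global link number.
    side = "right" if code & 0b0100 else "left"
    return {"kind": "external", "side": side, "link": code & 0b0011}


internal_example = decode_switch_config(0b0100)
external_example = decode_switch_config(0b1101)
```

For instance, the value 0100 decodes to an internal link to local PN 4, and 1101 decodes to an external link on global link 1 of the right adjacent FPGA.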
Figure 5-14: 4-bit Switch Configuration Identifier 
Table 5-2: Processor Switch Configuration Table  
(The four port columns give the 4-bit switch configuration for the PN time-multiplexed input port.)
Physical Processor (3-bit PN Identifier) | Logical Working Cell (Logical Mesh Location) | North Port | South Port | East Port | West Port
000 | (0,0) | Disabled | 0100 | 1100 | Disabled
001 | (1,1) | 1101 | 1001 | 1110 | 0100
010 | ------- | Disabled | Disabled | Disabled | Disabled
011 | ------- | Disabled | Disabled | Disabled | Disabled
100 | (1,0) | 0000 | 1000 | 0001 | Disabled
101 | ------- | Disabled | Disabled | Disabled | Disabled
110 | ------- | Disabled | Disabled | Disabled | Disabled
  
Table 5-3: Global Link Switch Configuration Table 
3-bit Global Link Identifier | Port Direction | 4-bit Switch Configuration for the Outbound Global Links
000 | South | 0100
001 | South | 0001
010 | Disabled | -----
011 | Disabled | -----
100 | East | 0000
101 | North | 0001
110 | East | 0001
111 | Disabled | -----
5.7 Verification of the FTMPA 
5.7.1 Verification Approach 
The proposed mesh remapping strategies have been verified to be able to construct the 
communication links of the N x N mesh array correctly for different simulated values of N and 
different working processor sets (Section 5.5.1). The working processor set consists of the 
healthy N² working processors that are chosen out of the 4(N²/4 + N/2) physical processors 
that are connected to the four FPGAs. The working processor set represents the processors 
that are to be part of the constructed N x N mesh processor array.
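As a quick worked check of these counts, the short sketch below evaluates the expressions for a given N. The function name is hypothetical, and N is assumed to be even so that the per-FPGA count is a whole number.

```python
def processor_budget(N):
    """Evaluate the processor counts for an N x N mesh built from four FPGAs."""
    if N <= 0 or N % 2 != 0:
        raise ValueError("N is assumed to be a positive even number")
    per_fpga = N * N // 4 + N // 2      # physical processors per FPGA
    physical = 4 * per_fpga             # 4(N^2/4 + N/2) = N^2 + 2N
    working = N * N                     # processors in the working set
    return {
        "per_fpga": per_fpga,
        "physical": physical,
        "working": working,
        "spare": physical - working,    # equals 2N
    }


budget = processor_budget(4)
```

For N = 4 this gives 6 physical processors per FPGA, 24 in total, 16 working and 8 spare. The per-FPGA figure agrees with the six PNs per network element quoted for n = 2 in Section 5.6.2, on the assumption that N = 2n.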
In exercising the mesh remapping algorithm, the global remapping heuristics select the 
healthy PNs for mapping into the mesh array, while the local remapping configures and 
assigns the network switches. In verifying the mesh remapping heuristics, the sufficient 
condition is good enough to establish the feasibility of the heuristics, since the construction 
of a mesh from a faulty FTMPA graph is NP-hard.
The derived switch configuration tables for the four mesh partitions are checked 
to confirm that the constructed mesh links correctly represent the communication 
linkages of a complete mesh. The presence of a complete mesh under different simulated 
mesh array sizes and numbers of faulty processors is a validation of the scheme.
5.7.2 Verification Conditions  
The successful verification of the remapping strategy is based on fulfilling the verification 
conditions defined.
Let us denote the size of the mesh array as N x N. Each logical mesh location is expressed as 
(x, y), where x represents the row of the mesh, and y represents the column of the mesh.  
The values of x and y range from 0 to N-1. A physical processor node (PN) mapped to logical 
mesh location (x, y) is denoted as PN(x, y).  The switch configuration table stores the 3-bit 
physical PN local identifiers of PN(x, y) that are mapped to the respective logical mesh 
locations. It also stores the 4-bit switch configuration for the north, south, east and west 
ports of PN(x, y), denoted respectively as PN(x, y)(North), PN(x, y)(South), PN(x, y)(East) and PN(x, y)(West). 
Figure 5-15 shows a diagrammatic representation of these parameters, for a small portion 
of the logical mesh. For this example, the mesh links are verified by showing that both 
PN(2,2)(East) and PN(2, 3)(West) are referring to the same physical link; and  that both PN(3, 2)(North) 
and PN(2, 2)(South) are referring to the same physical link.
Figure 5-15: Processor and Link Mappings in the Mesh
In general terms, this implies that every processor node's north, south, east and west 
communication links can only be verified as correct if conditions 1 and 2 are true:
Verification Condition 1:
East neighbour of PN(x, y) = West neighbour of PN(x, y+1),
where 0 ≤ x < N and 0 ≤ y < N-1
Verification Condition 2:
South neighbour of PN(x, y) = North neighbour of PN(x+1, y),
where 0 ≤ x < N-1 and 0 ≤ y < N 
Essentially, the links have to be checked for conformance throughout the array. To do this, 
the information about the switch configurations is extracted and stored in 2-dimensional 
N x N sized arrays, denoted generically as K. Each entry in an array K stores a byte value and 
is represented by K[i][j], where i represents the row index and j represents the column index.
KP[x][y] stores the 3-bit physical identifier of PN(x, y) in the lower 3 bits, 
where 0 ≤ x < N and 0 ≤ y < N. 
KE[x][y] stores the switch configuration value of PN(x, y)(East) in the lower 4 bits, 
where 0 ≤ x < N and 0 ≤ y < N-1. 
KW[x][y] stores the switch configuration value of PN(x, y)(West) in the lower 4 bits, 
where 0 ≤ x < N and 0 < y < N. 
KN[x][y] stores the switch configuration value of PN(x, y)(North) in the lower 4 bits, 
where 0 < x < N and 0 ≤ y < N. 
KS[x][y] stores the switch configuration value of PN(x, y)(South) in the lower 4 bits, 
where 0 ≤ x < N-1 and 0 ≤ y < N. 
5.7.3 Verification Cases 
The verification cases have to take into consideration whether the PNs on both sides of the 
network link are located in the same FPGA network element. This is because the global 
remapping heuristics can assign PNs from different network elements into the mesh 
network.
Case 1: If both PN(x, y) and PN(x, y+1) belong to the same FPGA network element
This is the case when the east-west communication link is local within the same FPGA. For 
internal links, the most significant bit (bit 3) of the switch configuration for both the east port 
of PN(x, y) and the west port of PN(x, y+1) will be '0' (see Figure 5-14 for field definitions of 
the 4-bit switch configuration).
In this case, for condition 1 to be true, the following two equations must be true. The AND 
operation with hexadecimal value 0x07 in these equations is a mask that selects 
bits 2 to 0 of the switch configuration for comparison. 
Equation 1:
3-bit local PN identifier field of PN(x, y+1)(West) = 3-bit local PN identifier of PN(x, y) 
KW[x][y+1] & 0x07 = KP[x][y] & 0x07
where 0 ≤ x < N and 0 ≤ y < N-1
Equation 2:
3-bit local PN identifier field of PN(x, y)(East) = 3-bit local PN identifier of PN(x, y+1) 
KE[x][y] & 0x07 = KP[x][y+1] & 0x07
where 0 ≤ x < N and 0 ≤ y < N-1
Case 2: If PN(x, y) and PN(x, y+1) belong to different FPGA network elements
This case implies that the communication link is an inter-FPGA link. For external links, the 
most significant bit (bit 3) of the switch configuration for both the east port of PN(x, y) and 
the west port of PN(x, y+1) will be '1'. 
In this case, for condition 1 to be true, the 2-bit Global Link Number field of PN(x, y)(East) must 
be identical to the 2-bit Global Link Number field of PN(x, y+1)(West). Thus the corresponding 
entries in array K pointing to these two parameters must be identical. The AND operation 
with hexadecimal value 0x03 is a mask that selects bits 1 to 0 of the switch 
configuration, and the AND operation with hexadecimal value 0x04 is a mask that 
selects bit 2 of the switch configuration.
In addition, the most significant bits of the 3-bit global link identifier for PN(x, y)(East) and 
PN(x, y+1)(West) must be opposite in value, as the global link will be a right link from the 
perspective of one FPGA, but a left link from the perspective of the other FPGA. 
Hence for condition 1 to be true, Equation 3 and Equation 4 must be verified true.
Equation 3:
Global Link Number field of PN(x, y)(East) = Global Link Number field of PN(x, y+1)(West)
KE[x][y] & 0x03 = KW[x][y+1] & 0x03
where 0 ≤ x < N and 0 ≤ y < N-1
Equation 4:
Bit 2 of Global Identifier of PN(x, y)(East) != Bit 2 of Global Identifier of PN(x, y+1)(West)
KE[x][y] & 0x04 != KW[x][y+1] & 0x04
where 0 ≤ x < N and 0 ≤ y < N-1
By a similar analysis, it can be shown that South neighbour of PN(x, y) = 
North neighbour of PN(x+1, y) for all verification cases. 
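The east-west check of Equations 1 to 4 can be sketched as a short routine over the arrays KP, KE and KW. This is an illustrative sketch and not the verification program used in this work. The function name is hypothetical, and one assumption is made that the text does not state: the internal or external case is chosen from the most significant bit of the two switch configurations, and a pair whose two sides disagree on that bit is reported as a failure.

```python
def check_east_west_links(KP, KE, KW, N):
    """Return the (x, y) positions where Verification Condition 1 fails."""
    failures = []
    for x in range(N):
        for y in range(N - 1):
            east = KE[x][y]          # configuration of PN(x, y)(East)
            west = KW[x][y + 1]      # configuration of PN(x, y+1)(West)
            east_external = bool(east & 0x08)
            west_external = bool(west & 0x08)
            if east_external != west_external:
                ok = False           # the two sides disagree on the link type
            elif not east_external:
                # Case 1 (internal link): Equations 1 and 2.
                ok = ((west & 0x07) == (KP[x][y] & 0x07)
                      and (east & 0x07) == (KP[x][y + 1] & 0x07))
            else:
                # Case 2 (inter-FPGA link): Equations 3 and 4.
                ok = ((east & 0x03) == (west & 0x03)
                      and (east & 0x04) != (west & 0x04))
            if not ok:
                failures.append((x, y))
    return failures


# Hypothetical 2 x 2 example: row 0 uses an internal link, row 1 an
# inter-FPGA link on global link 1. Unused entries are set to 0.
KP = [[0, 1], [2, 3]]
KE = [[0b0001, 0], [0b1101, 0]]
KW = [[0, 0b0000], [0, 0b1001]]
failures = check_east_west_links(KP, KE, KW, 2)
```

An empty list indicates that every east-west link in the array satisfies condition 1. The south-north check would follow the same pattern using KS and KN.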
5.8 Conclusion 
In this chapter, a framework for an arbitrarily large mesh network that can be dynamically 
reconfigured in orbit is developed. This addresses research question three. Mesh 
reconfiguration is used not only to handle in-orbit processor faults, but also to adapt to the 
topological needs of different image processing applications. The theoretical framework is 
translated into hardware as an FPGA switch structure design that connects a large parallel 
network of COTS processors to form a logical mesh array topology.
In the hierarchical mesh network, processors are connected locally into processor clusters, 
and the processor clusters are interconnected using multiple network elements to form 
the final network. The FTMPA global remapping strategy uses a set of heuristics to divide the 
mesh into the various physical partitions in a fashion that minimises inter-FPGA 
communication links, while the local remapping strategy provides a practical scheme for an 
efficient FPGA switch network structure. 
By building a parallel processor network out of multiple FPGAs, there will not be complete 
failure in the situation where one FPGA network element fails. There is a possibility that the 
processor network can still work in a gracefully degraded mode with fewer processing 
nodes and a smaller logical mesh. Hence, the FTMPA is designed with high reliability in 
mind.  
The specially designed parallel processing platform with four FPGAs also resolves the 
limitation in the number of physical processors that one FPGA network element can connect 
to. In practice this number ranges from about 4 to 10 due to the limited number of 
input/output pins of one FPGA. By expanding to four FPGAs, a 4x4 mesh array is supported 
for the PPU. This is one method of scaling up the size of the parallel processor network. 
Though the work done on the FTMPA is based on a 2x2 network element array, the concept can be 
adapted to larger arrays.
6 A Practical High Performance Computing Platform 
6.1 Introduction 
In the realisation of this computing platform, there are several hardware and software 
considerations to make this platform practical and efficient in operation.  A computing 
payload that can maximise its value to the mission and provide ease in usage to the 
application user will have to incorporate some necessary hardware and software features.  
As such this chapter explores the various considerations that are necessary for a practical 
high performance computing platform, as raised by the fourth research question. 
It covers three aspects of the design process: hardware PCB schematic design, routing and 
fabrication; FPGA logic block structures and interface protocol definitions; and the PPU 
processor software code structure. Some of the detailed design documentation has been 
captured in the XSat PPU Critical Design Review Document [104] and is referenced here.
6.2 PPU Hardware Schematic Design 
The PPU hardware board is a complex board that packs twenty processors, four FPGAs, 
forty SDRAM memory chips, twelve flash chips, two 20W DC-DC converters, two EMI Filters 
and several other IC chips on a PCB of size 36 cm x 29 cm. It is a high density board with 
about 100 IC chips on the board, many of which will be operating in parallel at any one time. 
The following schematic design considerations had to be catered for in this board.
6.2.1 FPGA Chip and Package Selection  
Initial estimation of the total size of the FPGA logic in terms of gates showed that half-a-
million system gate FPGAs should be sufficient. For the reasons mentioned earlier in Section 
3.4.1.2, antifuse FPGAs from Actel were the preferred FPGA choice. Hence AX500 and 
AX1000 were possible FPGA chips for selection. An FPGA package that is pin-compatible across 
both the AX500 and AX1000 was used so that the hardware design allows the 
interchangeable use of both chips [105]. This is to allow for code expansion. This early 
consideration proved important in a later stage of the project when it was decided to 
migrate from the AX500 to the AX1000 in order to take advantage of the larger number of 
memory blocks in the AX1000.
For package selection, package size and the number of user input/output pins were 
important factors to consider. The initial schematic design estimated that at least 260 I/O 
pins were required for use in each FPGA design. From Table 6-1, FG484 and CG624 are 
possible FPGA packages. However, the CG624 package comes only in military grade, which is 
costly. Hence the FG484, which is a fine ball grid array package, was the final choice. 
A package like the FG896 might have additional input/output pins to interface to more 
StrongARM processing nodes. However, the routing complexity would be high, which might 
translate to more routing layers [106]. Hence the FG484 is a balanced choice. It offers a high 
number of input and output pins in a comparatively small form factor [107]. Figure 6-1 
shows the form factor for the FG484 package and the ceramic quad flat pack (CQ352) 
package. The PPU design required four FPGA network elements. In accordance with the 
physical dimensions in millimetres given, having four CQ352 chips on the board would 
occupy 20% of the PCB area. In contrast, four FG484 chips on the board occupy only 
2.8% of the board area.
Table 6-1: FPGA Input / Output (I/O) Resource
      
Figure 6-1: FPGA CQ352 and FBGA484 Package Comparison
6.2.2 FPGA Testing Design Considerations 
As antifuse FPGAs are one-time-programmable (OTP), using them in the early design phase 
can be costly, and this might translate to a limited number of hardware design test cycles. 
Incorporating a reprogrammable FPGA in the same hardware for use during the prototype 
stage was therefore useful, as it aids development and eases constraints on the testing 
process. The hardware PCB had to cater for eight FPGAs on the same board, 
four programmable and four antifuse OTPs. For each processing cluster, either the 
programmable or the OTP antifuse FPGA was used at any one time.
It was practical to search for the programmable FPGA within the Actel ProASIC Flash-based 
family of FPGAs due to the similarity of package and internal design resources with the 
Axcelerator FPGA series. Based on the requirement of half-a-million gates, the APA600, 
which has 600,000 system gates, was suitable as it was also available in the FG484 package. 
The high level of pin compatibility between the APA600-FG484 and AX500-FG484 simplifies 
PCB routing and pin mapping.
Using the same package had the advantage that the PPU board could be designed with the 
APA600-FG484 on the bottom side and the AX500-FG484 completely aligned on the top 
side of the PCB. The input/output pins (I/Os) of both FPGAs were mapped to the same set 
of signals and had the same external interface definitions. Having both chips aligned in the 
same horizontal axis was ideal for PCB routing as tracks were routed to the same external 
interfaces from the same locality. 
The AX500-FG484 chip was placed on the top side to cater for the use of FPGA sockets 
during development. The FPGA sockets, which were bulky, had to be placed on the top side for 
easy removal and insertion of FPGA chips. The use of FPGA BGA sockets 
was important as AX500-FG484 chips are one-time-programmable and the sockets allowed 
for easy replacement of the FPGA in testing different FPGA hardware design iterations. 
Figure 6-2 shows the top side of the PPU hardware board with the FPGA sockets mounted.
Figure 6-2: FPGA Prototype Board with AX FPGA Sockets
6.2.3 Meeting High Density Form Factor Requirement 
In order to achieve the high component placement density, the PPU relied heavily on the use 
of Ball Grid Array (BGA) packages. The very small form factor of BGA chips allows the PPU to 
be designed with high computation power per unit area. This was a critical decision that 
directly affects the PPU board computing performance specifications. However, there were 
manufacturing concerns associated with the use of BGA chips that will be addressed later in 
this section [108]. Table 6-2 shows the BGA chips or leadless chips used in the PPU 
hardware.
Table 6-2: Chips with BGA/Leadless Package
Part Number | Part Description | BGA Package
AX1000 | Actel Antifuse FPGA | FG484
APA600 | Actel Flash FPGA | FG484
AT45DB642D-CNU | Atmel Serial Flash Chip | Leadless Package
SA-1110 | Intel StrongARM Processor | 256-Mini-BGA
6.2.4 Hardware Interface Design Consideration  
The AX500-FG484 FPGA has a maximum of 317 I/O pins. These pins had to be carefully 
allocated for use by the various interfaces. After several iterations, the final FPGA pin 
allocation chart is as shown in Table 6-3. 
Interface pin count was an important consideration in interface design. For example, the 
SA-1110 to FPGA interface was defined using a half-word data interface (16 data lines) 
instead of a full-word interface (32 data lines) to conserve FPGA I/O pins. Serial bit 
interfaces were preferred over parallel interfaces because they use fewer I/O pins. Hence the 
interface to the C515C CAN controller was based on synchronous serial communication. The 
SSR interface used the serial LVDS signalling scheme, which allows data to be transferred at 
high data rates over a serial line. The use of LVDS avoids the need for parallel interfaces 
with large data width to achieve the same data rate. In addition, in such a high density 
board, parallel interfaces are prone to electrical interference, and more care has to be put into 
the routing to avoid such problems. With the use of high speed serial interfaces, this effect was 
minimised.
In addition, AT45DB642D serial flash chips were used instead of parallel flash chips. The 
AT45DB642D was available in a small package (AT45DB642D-CNU). It provides a 
Serial Peripheral Interface (SPI) compatible interface to access data sequentially, 
instead of the parallel access in normal flash chips. With sequential access, the number of 
active pins is reduced dramatically and this facilitates hardware PCB layout. 
Table 6-3: FPGA Pin Allocation
Interface | Interface Description | Number of Pins | Pin Description
CAN Interface | SSC interface | 5 | Synchronous interface consisting of these signals: SCLK, STO, nSLS, INT0, SRI (see Section 6.5.3)
SSR Interface | Synchronous Duplex LVDS Link | 9 | One enable pin and eight synchronous pins, four for each direction (see Section 6.5.4)
SA-1110 interface | Memory-mapped interface | 33 x 5 | Each SA-1110 interface to the FPGA consists of 10 address lines, 16 data lines and 7 control lines. The control signals include two interrupts, one reset line, one chip select (nCS), one output enable (nOE), one write enable (nWE) and one reset out.
SA-1110 power-on switch interface | 1.8V and 3.3V switch control for 5 StrongARMs | 2 x 5 | Two pins are used to control the 1.8V and 3.3V transistor switch for each SA-1110.
One Flash Bank (3 flash chips) | Synchronous interface | 8 | Synchronous interface consisting of these signals: SCLK, nCS1, nCS2, nCS3, SI, SO1, SO2 and SO3 (see Section 6.1.1)
Debugging Pins | Used to route internal signals of the FPGA | 32 | 32 FPGA debugging pins for observing internal FPGA states.
Right FPGA | Inter-FPGA interface | 25 | Synchronous inter-FPGA interface. Time-multiplexed interface across the various communication network channels.
Left FPGA | Inter-FPGA interface | 25 | Synchronous inter-FPGA interface. Time-multiplexed interface across the various communication network channels.
Total No of Pins Used | | 279 |
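The pin budget in Table 6-3 can be cross-checked with a few lines of arithmetic. The sketch below simply re-adds the table entries and compares the total with the 317 I/O pins quoted for the AX500-FG484; the dictionary keys and function names are chosen for illustration only.

```python
# Pin counts taken from Table 6-3; each FPGA serves five SA-1110 processors.
PIN_ALLOCATION = {
    "CAN interface": 5,
    "SSR interface": 9,
    "SA-1110 interface": 33 * 5,            # 10 address + 16 data + 7 control
    "SA-1110 power-on switch interface": 2 * 5,
    "Flash bank": 8,
    "Debugging pins": 32,
    "Right FPGA": 25,
    "Left FPGA": 25,
}
MAX_IO_PINS = 317  # maximum I/O pins of the AX500-FG484


def pins_used(allocation):
    """Total number of FPGA I/O pins consumed by all interfaces."""
    return sum(allocation.values())


def spare_pins(allocation, max_pins=MAX_IO_PINS):
    """Pins left over, raising an error if the budget is exceeded."""
    used = pins_used(allocation)
    if used > max_pins:
        raise ValueError("pin allocation exceeds the package I/O count")
    return max_pins - used


total = pins_used(PIN_ALLOCATION)
margin = spare_pins(PIN_ALLOCATION)
```

The total comes to 279 pins, in agreement with the table, which leaves a margin of 38 pins on the 317-pin package.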
6.2.5 Power-up Consideration 
The PPU hardware consists of many processing elements and IC chips, each with its own 
power sequencing requirements. To ensure that the board powers up to a deterministic 
state each time, it was important to ensure a correct boot-up sequence and timing of the 
various hardware entities. The PPU board ensured that the FPGAs are powered on first, 
before the internal CAN module. The weak pull-ups on the transistor switches controlling 
the power of the StrongARMs ensured that the StrongARMs remain in a fully powered-off state 
until the FPGAs are fully powered up. 
Power sequencing control also ensured that when Actel APA600 FPGAs are used, the FPGA 
core voltage of 2.5V was provided before the input/output port voltage of 3.3V. The 
StrongARM requires that the I/O port voltage of 3.3V is provided before the 1.8V core voltage. Each 
StrongARM has individual switch control for its core voltage and I/O voltage to support 
power-on sequencing. In addition, each FPGA powers on its five processing nodes 
sequentially to avoid power spikes and distribute traffic on the VTGB.
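The ordering constraints above can be summarised as an ordered list of steps for one processing cluster. The sketch below is purely illustrative: the function name and step labels are invented, no timing is modelled, and the position of the CAN module step relative to the processing nodes is an assumption, since the text only states that the FPGAs are powered before the CAN module and that the StrongARMs stay off until the FPGAs are up.

```python
def power_up_sequence(num_nodes=5, use_flash_fpga=True):
    """Return an ordered list of power-up steps for one processing cluster."""
    steps = []
    if use_flash_fpga:
        # APA600: core voltage is provided before the I/O voltage.
        steps += ["FPGA core 2.5V on", "FPGA I/O 3.3V on"]
    else:
        # Rail order for the antifuse FPGA is not given in the text.
        steps.append("FPGA power on")
    steps.append("CAN module on")  # assumed to precede the processing nodes
    # Nodes are powered one after another; each gets 3.3V I/O before 1.8V core.
    for node in range(num_nodes):
        steps.append(f"PN{node} I/O 3.3V on")
        steps.append(f"PN{node} core 1.8V on")
    return steps


sequence = power_up_sequence()
```

Listing the steps in this way makes it straightforward to check that no StrongARM rail is enabled before the FPGA steps and that each node completes its two rails before the next node starts.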
6.3 PPU PCB Routing Considerations 
The schematic design and routing of the PPU PCB were done using the Mentor 
Graphics PCB design tool. To ease the high density PCB routing demands, the PCB was 
designed with twelve layers. There were altogether six signal layers, three power 
layers and three ground layers. The PCB was designed with sufficient power and ground 
planes to enhance capacitive effects for better performance.
The four FPGAs were connected in a square matrix array, where each FPGA has a 25-line 
network interface with two other FPGAs. An inter-FPGA VTGB communication network was 
designed to route through the four functioning FPGAs through these 25-line synchronous 
network interfaces. For this interface to carry high speed signals of up to 100MHz, these 25 
lines were routed with equal length (within 10% tolerance) on the PCB. 
To ease routing complexity, the PPU hardware board was designed with symmetry in mind. 
This was important to make use of a feature in the Mentor Graphics tool that allows tracks 
between similar entity constructs to be duplicated. Hence the routing within one processing 
cluster can be duplicated for the remaining three processing clusters. This results in 
a substantial saving of PCB routing time.
6.4 PPU PCB Manufacturing Considerations 
The PPU hardware board was fabricated by a design house that produces high quality 
multi-layer boards and a silkscreen layer which meets out-gassing requirements. The component 
mounting was performed by a team qualified by ISRO (the Indian Space Research 
Organisation) to perform space grade soldering. 
The reliable use of BGA components requires:
1. A high quality reflow process for BGA component mounting.  
2. A detailed inspection process of the BGA component solder joints onto the pad, 
based on X-ray reports.
To ensure reliable mounting of the BGAs on the PCB, the reflow had to be well controlled. 
Firstly, profiling was done on a spare PCB to calibrate the reflow process. This is to ensure 
that the temperature across the board is well controlled at about 220 ± 5 degrees 
Celsius (the temperature required for eutectic solder). Secondly, the reflow process for the 
BGA uses tacky flux instead of solder paste. Tacky flux is able to spread more consistently 
across the BGA pad to ease component alignment. It also ensures wetting between the BGA 
ball and pad and reduces air voids in the BGA pads. Thirdly, eutectic solder was used as it 
offers the best reliability in the industry in terms of pad wetting.
The X-ray images of the BGA pads were inspected to ensure that the solder paste (which 
shows up as dark areas on the pad) covers the pad sufficiently. Air voids, which are 
common on the pad, must not cover more than 50% of the pad. In accordance with XSat 
imposed board fabrication standards, the number of permitted reworks was limited to only 
2 per component and capped at 6 for the whole board. The board also had to be conformal 
coated.
6.5 FPGA Firmware Design  
6.5.1 Overview of FPGA Design 
The PPU FPGA architectural block diagram is shown in Figure 6-3. The FPGA network 
element implements the communication networks and entity interface logic. The 
description of the various entity blocks is provided in Table 6-4.
Table 6-4: Entity Description  
FPGA Entity Name | Entity Description
VTGB Handler | This entity contains the protocol mechanism for the Variable Time Global-slotted Bus (VTGB) communication network implemented in the FPGA.
SA-1110 Handler | This entity implements the memory-mapped interface between the FPGA and the SA-1110.
Address Translator | This entity reads the hardware address identifier of the FPGA network element. This identification is used by internal FPGA logic functions to derive physical addresses of various entities in the four FPGAs, and to compute the execution logic for that specific FPGA.
SSR Handler | This entity implements the logic required to receive and transmit data from the SSR.
Data Protocol Handler | This entity processes the data received from the SSR handler and distributes the data to the respective network entities according to the configuration information specified in the configuration header frame.
CAN Handler | The CAN handler implements the logic to receive and transmit data from the C515C Microcontroller Synchronous Serial Communication (SSC) interface.
Command Handler | This entity processes commands addressed to the FPGA via the VTGB. These commands come from CAN or from internal entities and serve to configure the PPU operation parameters.
Mesh Handler | This entity contains the multiplexer switch structure for the Fault Tolerant Mesh Processing Array. Select signals to each multiplexer switch configure a particular processor input port to connect with a particular processor output port.
Power Handler | This entity controls the power-up sequence for the processor and stores the current state of all processors in its registers.
Boot Handler | This entity implements the logic to load SA-1110 first stage boot-loader code from the serial flash memory into the FPGA internal SRAM blocks. This logic operates immediately after power-up of the FPGA and before power-up of processors.
[Figure 6-3 block diagram. Logic blocks in the FPGA: VTGB handlers, Data Protocol, CAN
Handler, SSR Handler, Flash Handler, Power Handler, Boot Flash Read, Address Translator, Mux
Logic and the Mesh Switch Fabric. Physical hardware entities interfaced to the FPGA I/O
pins: SA-1110 processor nodes (PN) with their switches, the flash chips, the SSR and the
C515C CAN microcontroller.]
Figure 6-3: FPGA Architectural Block Diagram
6.5.2 Meta-Stability 
Each entity interface in the PPU operates with its own local clock. For example, the SA-1110
runs on a 206MHz clock, the CAN C515C microcontroller runs on its 8MHz oscillator
and the SSR receiving interface runs on the SSR transmit clock. Each of the four FPGAs in
the PPU has its own local oscillator. Communication between entities therefore involves
signals that cross clock domains, which makes them susceptible to meta-stability problems.
A meta-stability problem occurs when a signal enters a clocked circuit element (e.g. a
flip-flop) too close to the clock edge [109]. The clocked circuit element then does not
settle to a known value immediately. It is critical that the output signal from the flip-flop
is not used until it has settled. To ensure that signals cross clock domains reliably, the
PPU uses asynchronous FIFOs at its entity interfaces.
An asynchronous FIFO supports a read clock that is different from the write clock. The
critical signals that have to cross clock domains are the FIFO read and write pointers.
These pointers have to be transferred reliably across the clock domains before the FIFO
empty and full signals can be generated accurately. As each pointer is a multi-bit signal,
there is a danger that its bits lose correlation when crossing clock domains, because
different bits take different amounts of time to settle when meta-stability occurs. To
resolve this issue, the FIFO uses Gray encoded pointers, which change only one bit at a
time. These pointers are synchronized into the other clock domain before the "FIFO full" or
"FIFO empty" conditions are tested. The FIFO design is documented in a paper published by
Sunburst Design [110].
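The single-bit-change property of Gray encoded pointers can be checked with a short Python sketch. It is illustrative only and is not the PPU firmware; the 4-bit pointer width and the function names are assumptions chosen for the example.

```python
def bin_to_gray(value: int) -> int:
    """Convert a binary pointer value to its Gray-code equivalent."""
    return value ^ (value >> 1)


def gray_to_bin(gray: int) -> int:
    """Convert a Gray-code value back to plain binary."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def bits_changed(a: int, b: int) -> int:
    """Count the bit positions in which two values differ."""
    return bin(a ^ b).count("1")


ADDR_BITS = 4            # hypothetical pointer width, not taken from the PPU design
DEPTH = 1 << ADDR_BITS

# Every pointer increment, including the wrap-around, flips exactly one Gray bit.
for ptr in range(DEPTH):
    nxt = (ptr + 1) % DEPTH
    assert bits_changed(bin_to_gray(ptr), bin_to_gray(nxt)) == 1
    assert gray_to_bin(bin_to_gray(ptr)) == ptr

# A plain binary pointer can flip several bits at once (7 -> 8 flips four bits),
# which is the case that risks a mis-sampled pointer in the other clock domain.
print(bits_changed(7, 8), bits_changed(bin_to_gray(7), bin_to_gray(8)))
```

Because only one bit changes per increment, a pointer sampled during a transition resolves to either the old or the new value, never to an unrelated one.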
6.5.3 CAN Interface Protocol 
The CAN handler is the entity in the FPGA which implements the logic to receive and 
transmit CAN data packets from the C515C Microcontroller. The CAN Handler interfaces to 
the VTGB Handler via asynchronous input and output FIFOs; and interfaces to the C515C 
microcontroller via a Synchronous Serial Communication (SSC) interface facility available in 
the C515C. The data transferred between the PPU Electronic Board and the C515C in either 
direction is based on a fixed frame length of 16 bytes. This fixed frame is a short VTGB data 
network packet that consists of an 8-byte VTGB header and an 8-byte CAN data field. 
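The fixed 16-byte frame can be sketched in Python as follows. The VTGB header is treated as an opaque 8-byte value because its field layout is not given in this section; the function names and the zero padding of short CAN payloads are assumptions of the sketch.

```python
from typing import Tuple

VTGB_HEADER_LEN = 8                           # bytes, from the text
CAN_DATA_LEN = 8                              # bytes, from the text
FRAME_LEN = VTGB_HEADER_LEN + CAN_DATA_LEN    # 16 bytes


def build_ssc_frame(vtgb_header: bytes, can_data: bytes) -> bytes:
    """Concatenate an opaque 8-byte VTGB header and an 8-byte CAN data field."""
    if len(vtgb_header) != VTGB_HEADER_LEN:
        raise ValueError("VTGB header must be exactly 8 bytes")
    if len(can_data) > CAN_DATA_LEN:
        raise ValueError("CAN data field holds at most 8 bytes")
    # Zero padding of short payloads is an assumption, not stated in the thesis.
    return vtgb_header + can_data.ljust(CAN_DATA_LEN, b"\x00")


def split_ssc_frame(frame: bytes) -> Tuple[bytes, bytes]:
    """Split a received 16-byte frame back into header and CAN data."""
    if len(frame) != FRAME_LEN:
        raise ValueError("frame must be exactly 16 bytes")
    return frame[:VTGB_HEADER_LEN], frame[VTGB_HEADER_LEN:]


header = bytes(range(8))                      # placeholder header contents
frame = build_ssc_frame(header, b"\xAA\xBB")
print(len(frame), split_ssc_frame(frame)[1].hex())
```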
The C515C SSC interface is a four-line synchronous interface (SCLK, STO, nSLS and 
SRI).  The C515C microcontroller is the master in the SSC interface while the FPGA CAN 
Handler is the slave. The digital lines are described below and the interface waveforms are 
specified in Figure 6-4. The master generates the SCLK clock and transmits on the STO data 
line during data transfer.  The slave transmits on the SRI data line. The C515C sets the logic 
on the nSLS line to indicate whether it is transmitting to the slave or receiving data frames 
from the slave. The INT0 interrupt line from the FPGA to the C515C allows the slave to send 
a transmit interrupt request to the master.
Digital Signal Description
SCLK synchronous clock generated by the C515C microcontroller 
STO C515C transmit data line
nSLS slave transmit enable control signal generated by the C515C
INT0 positive-edge triggered interrupt line generated by the CAN Handler as a 
transmit request to the C515C microcontroller
SRI C515C receive data line
Figure 6-4: SSC Interface with the C515C Microcontroller
6.5.4 SSR Interface Protocol 
The SSR handler implements the logic required to receive data from and transmit data to the
SSR. There are four digital lines between the SSR handler and the LVDS transmit driver and
another four digital lines between the SSR handler and the LVDS receiver. The transmit and
receive interface waveforms are similar and are depicted in Figure 6-5. The interface uses a
fixed frame length. The SSR handler interfaces to the VTGB handler via asynchronous FIFOs.
Digital Signal Description
SYNC This is the start-of-frame indicator, which is high for the first 2 clock cycles 
of a frame.
CLK This is the synchronous clock generated by the sender. The sender sends data 
at the rising edge of this clock and the receiver samples the data at the falling edge 
of the clock. 
DATA(1:0) This is the 2-bit data interface, where data(0) denotes the less 
significant bit and data(1) denotes the more significant bit.
[Figure 6-5 waveform: CLK, SYNC and DATA[1:0] traces. Each data byte is transferred as four
2-bit groups in the order [7:6], [5:4], [3:2], [1:0]. Annotations: sync signal of 2 clks at
the start of each frame; fixed data frame of length F bytes, number of clock cycles = F x 4;
idle cycles before start of frame (optional); minimum 2 idle cycles between frames; vertical
transfer. The clock is not a free running clock and, depending on the data arrival times,
might have missing clock edges.]
Figure 6-5: SSR Interface Definition
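The relation between frame length and clock cycles (F x 4) follows from sending each byte as four 2-bit symbols. A minimal Python sketch is given below; it assumes the symbol order [7:6], [5:4], [3:2], [1:0] suggested by the labels in Figure 6-5, models only the data symbols (not SYNC or idle cycles), and uses hypothetical function names.

```python
def byte_to_symbols(byte: int) -> list:
    """Split one byte into four 2-bit symbols, bits [7:6] first (assumed order)."""
    return [(byte >> shift) & 0b11 for shift in (6, 4, 2, 0)]


def frame_to_symbols(frame: bytes) -> list:
    """Serialise a fixed-length frame into the DATA(1:0) symbol stream."""
    symbols = []
    for byte in frame:
        symbols.extend(byte_to_symbols(byte))
    return symbols


def symbols_to_frame(symbols: list) -> bytes:
    """Reassemble bytes on the receiving side from groups of four symbols."""
    if len(symbols) % 4 != 0:
        raise ValueError("symbol count must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(symbols), 4):
        byte = 0
        for sym in symbols[i:i + 4]:
            byte = (byte << 2) | (sym & 0b11)
        out.append(byte)
    return bytes(out)


frame = bytes([0xA5, 0x3C])           # F = 2 bytes, illustrative data only
stream = frame_to_symbols(frame)
assert len(stream) == len(frame) * 4  # number of data clock cycles = F x 4
assert symbols_to_frame(stream) == frame
print(stream)
```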
6.5.5 Runtime Configuration of Data Distribution from SSR 
The data protocol handler is the entity responsible for processing the data received from 
the SSR handler. To maintain flexibility in operation, this entity implements a configuration 
mechanism that dynamically defines the data distribution structure from the SSR at 
runtime. Performing data distribution in the FPGAs is highly efficient because of pipelining
and the FPGAs' large internal data bus.
A header frame is sent from the SSR prior to the start of a data transfer frame. This header 
frame specifies how data packets are to be distributed in the VTGB network. There are 
altogether three header frame configuration types. Their descriptions are provided below. 
Header Frame Type 1 Configuration contains the VTGB messages to be 
sent to the various entities in the network to prepare them for the data processing 
phase.
Header Frame Type 2 Configuration states how many nodes will receive 
the application data and exactly how many frames or blocks to each node.
Header Frame Type 3 Configuration states code upload information (code 
size, the flash chip that this code is to be written to and its location).
One header configuration frame can contain multiple types of information (type 1, 2 and 3). 
Figure 6-6(a) shows a header frame that configures the PPU to prepare for reception of image 
frames for parallel image processing. It specifies type 1 configuration in the first part and 
type 2 configuration in the second part. The third part is padding, which is required 
since the header frame is a fixed-length frame of 18102 bytes. In Figure 6-6(b) the header 
frame specifies type 1 configuration in the first part and type 3 configuration in the second 
part. This is a standard header frame used to prepare the PPU for code upload 
in the data phase.
Figure 6-6: Header Frame Formats for (a) Image Distribution (b) Code Upload
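The fixed-length property of the header frame can be illustrated with a small Python sketch that concatenates configuration parts and pads the result to 18102 bytes. The internal field layout of each configuration type is not reproduced here, so the parts are opaque placeholders; the zero padding value and the function name are assumptions.

```python
HEADER_FRAME_LEN = 18102   # bytes, from the text
PAD_BYTE = b"\x00"         # assumed padding value, not stated in the thesis


def build_header_frame(*parts: bytes) -> bytes:
    """Concatenate configuration parts and pad to the fixed header frame length."""
    body = b"".join(parts)
    if len(body) > HEADER_FRAME_LEN:
        raise ValueError("configuration parts exceed the fixed header frame length")
    return body + PAD_BYTE * (HEADER_FRAME_LEN - len(body))


# Placeholder contents only; the real type 1 and type 2 layouts are in Figure 6-6.
type1_part = b"\x01" * 64
type2_part = b"\x02" * 32
frame = build_header_frame(type1_part, type2_part)
print(len(frame))
```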
6.5.6 Flash Interface Protocol 
The Flash Handler is the entity in the FPGA that implements the flash protocol interface
logic to the serial flash bank, which consists of three ATMEL AT45DB642 serial flash chips
used in a triple-redundant arrangement. The interface signals to the three chips are listed
below. The schematic circuit design is shown in Figure 6-7. 
Digital Signal Description
SCLK synchronous clock generated by FPGA 
nCS1, nCS2, nCS3 chip select lines to each of the three flash chips 
SI FPGA common data output line to the three flash chips.
SO1, SO2, SO3 flash chip data output lines to the FPGA 
Figure 6-7: FPGA interface to the triple flash chip
The StrongARM processor communicates with the flash handler via the VTGB. The three VTGB 
command messages to the serial flash chips are status register read, main memory read 
and main memory write. The command type and specific configuration are differentiated using
the lowest four bits of the 7-bit command identifier in the VTGB message protocol (see
Figure 6-8).  
Figure 6-8: Serial Flash VTGB Command Types
As shown, a main memory read can be requested from serial flash chip 1, 2 or 3 of a certain 
flash bank. It can also be done in a triple majority-read fashion from three chips in a bank.
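A triple majority-read can be sketched in Python as a bitwise two-out-of-three vote over the data returned by the three chips. The thesis does not state the granularity of the vote, so the bitwise scheme and the function names below are assumptions made for illustration.

```python
def majority_byte(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 vote: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def majority_read(chip1: bytes, chip2: bytes, chip3: bytes) -> bytes:
    """Combine the same region read from three flash chips into one voted result."""
    if not (len(chip1) == len(chip2) == len(chip3)):
        raise ValueError("all three reads must have the same length")
    return bytes(majority_byte(x, y, z) for x, y, z in zip(chip1, chip2, chip3))


good = bytes([0x12, 0x34, 0x56])
copy2 = bytes([0x12 ^ 0x08, 0x34, 0x56])   # single bit fault in byte 0
copy3 = bytes([0x12, 0x34, 0x56 ^ 0x01])   # single bit fault in byte 2
assert majority_read(good, copy2, copy3) == good
```

With this scheme a bit fault is masked as long as the same bit position is not corrupted in two of the three chips.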
6.5.7 SA-1110 Interface Protocol
The SA-1110 handler is the entity implementing the memory-mapped interface between the 
StrongARM processors and the FPGAs. The FPGA maps read/write registers into the static 
memory space of each SA-1110 from 0x000 to 0x4FF. Memory address space from 0x000 to 
0x3FF is interpreted as SA-1110 first stage boot code, while 16-bit register accesses from 
0x400 to 0x40E are interpreted as special registers. The definition of these registers is given 
in Table 6-5 and Table 6-6. 
Table 6-5: SA-1110 Memory Mapped Read Registers in FPGA
Read Register Number  Register Name  Address  Description
R0 GLOBAL 0x400 Contains an SA-1110 physical network address and network error flags.
R1 VTGB_QTY 0x402 Indicates available space for VTGB write FIFO and read FIFO
R2 INT_SER0 0x404 Read this register to reset the interrupt GPIO_0 pin to 0
R3 INT_SER1 0x406 Read this register to reset the interrupt GPIO_1 pin to 0
R4 COMM 0x408 Contains VTGB/Mesh read FIFO empty and write FIFO full status. It also
contains information on whether the half-word in the VTGB/Mesh FIFO is the start,
continuous or end half-word of a packet.
R6 MESH_READ_DATA 0x40C Read this register to obtain next half-word in the Mesh Read FIFO
R7 VTGB_READ_DATA 0x40E Read this register to obtain next half-word in the VTGB Read FIFO
   
Table 6-6:  SA-1110 Memory Mapped Write Registers in FPGA
Write Register Number  Register Name  Address  Description
R1 VTGB_DATA_START 0x402 Write the first half-word of VTGB message to this register
R2 VTGB_DATA_CONT 0x404 Write continuous half-words of VTGB message to this register
R3 VTGB_DATA_END 0x406 Write the last half-word of the VTGB message to this register
R4 VTGB_ABORT 0x408 Write to this register to abort VTGB service during a packet
transmission.
R5 MESH_DATA_START 0x402 Write the first half-word of MESH message to this register
R6 MESH_DATA_CONT 0x404 Write continuous half-words of MESH message to this register
R7 MESH_DATA_END 0x406 Write the last half-word of the MESH message to this register
R8 HEALTH 0x410 Each PN has to write to the Health Register upon power up to indicate the
status of the PN.
R9 VIR_ADD 0x412 A PN writes its virtual address to this register
R10 MESH_ABORT 0x414 Write to this register to abort MESH service during a packet
transmission.
6.6 PPU Application Software Interface 
This section defines the interaction mechanism between the StrongARM processor and the 
FPGA and software operation characteristics.
6.6.1 Booting Process 
The SA-1110 processor booting process can be described in three stages. The first stage 
begins immediately after the processor powers up, when the processor retrieves its first
stage boot-load code from its static memory address space at 0x00. The SA-1110 assumes this
address space to be ROM at boot time. This boot ROM space is memory mapped to the FPGA. Upon
power-up, the FPGA reads the SA-1110 first stage boot-load code from its serial flash chip
and stores it in its internal SRAM memory. The FPGA then powers up the processor and
provides the op-code to the SA-1110 in accordance with the address that the processor
places on its address bus. The boot ROM data width for this interface is configured to be
16 bits via the logic level on the processor ROM_SEL hardware pin.
Before the first stage boot-load code terminates, the SA-1110 sends VTGB messages to read 
the second stage boot-load code from the serial flash chip. The second stage boot-load code 
is loaded into the SA-1110's SDRAM memory. The boot-loader then jumps to the start of the 
SDRAM memory space and starts executing the 2nd stage boot-load code from there.
The 2nd stage boot-load code is used to retrieve the Linux kernel image and RAMDISK 
partition from the serial flash chip via the VTGB network. These are loaded to specific
locations in the SDRAM memory. Table 6-7 shows the addresses of the various codes in the
FPGA serial flash chip. 
Table 6-8 shows the SDRAM addresses where the 2nd stage boot-loader code, kernel and 
RAMDISK are stored. The kernel and RAMDISK images are verified before the Linux kernel is
booted.
Table 6-7: Flash Address Allocation 
Name Starting address in FLASH 
The 1st stage boot-loader (blob-start) Page 8000, byte 0. The size of blob-start is limited to 1024 bytes.
2nd stage boot-loader (blob-rest) Page 0, byte 0
OS Kernel Image Page 34, byte 0
RAMDISK Page 1039, byte 0
Table 6-8: SDRAM Address Allocation
Name Starting address in SDRAM 
Blob-rest 0xC0000000
Linux Kernel 0xC0400000
RAMDISK 0xC0800000
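The allocations in Table 6-7 and Table 6-8 imply upper bounds on the size of each image, which can be worked out with a short Python sketch. The page size of 1056 bytes is an assumption (the default page size of the AT45DB642 family, not stated in this section), and the dictionary and function names are hypothetical.

```python
PAGE_SIZE = 1056  # bytes per flash page; assumed, not stated in this section

# Starting pages from Table 6-7.
FLASH_START_PAGE = {"blob-rest": 0, "kernel": 34, "ramdisk": 1039, "blob-start": 8000}

# Starting SDRAM addresses from Table 6-8.
SDRAM_START = {"blob-rest": 0xC0000000, "kernel": 0xC0400000, "ramdisk": 0xC0800000}


def flash_byte_offset(page: int, byte: int = 0) -> int:
    """Linear byte offset of a (page, byte) location in one flash chip."""
    return page * PAGE_SIZE + byte


def flash_capacity(name: str):
    """Bytes available before the next image starts; None for the last image."""
    start = FLASH_START_PAGE[name]
    later = [page for page in FLASH_START_PAGE.values() if page > start]
    if not later:
        return None
    return (min(later) - start) * PAGE_SIZE


def sdram_window(name: str):
    """Bytes between this image's SDRAM address and the next one; None for the last."""
    start = SDRAM_START[name]
    later = [addr for addr in SDRAM_START.values() if addr > start]
    if not later:
        return None
    return min(later) - start


for image in ("blob-rest", "kernel"):
    print(image, flash_capacity(image), sdram_window(image))
```

Under the assumed page size, the flash allocation for both blob-rest and the kernel is smaller than the 4 MiB SDRAM window reserved for each, so the flash layout is the tighter constraint.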
6.6.2 SA-1110 Communication Mechanism with FPGA
Each SA-1110 has several memory-mapped registers implemented in the FPGA (see Table 
6-5 and Table 6-6). The registers are used for the following functions:
Health Status Write 
Communication with VTGB 
Communication with Mesh Network 
Global Register Read 
Virtual Address Register Write 
The SA-1110 software writes to the HEALTH register when the Linux operating system is 
first booted up. Subsequently, it writes to the HEALTH (address 0x410) register at 1 sec
intervals to indicate that it is still alive. 
To send a packet to the VTGB network, the software writes the first half-word of 
the VTGB packet to the VTGB_DATA_START (address 0x402) register, the continuous 
half-words of the VTGB packet to the VTGB_DATA_CONT (address 0x404) register, and the 
final half-word to the VTGB_DATA_END (address 0x406) register. It reads from the VTGB input
FIFO by reading the VTGB_READ_DATA register and aborts a VTGB packet transmission by
writing to the VTGB_ABORT register (address 0x408). The mechanism for interaction with the
Mesh network is similar. The registers involved are MESH_DATA_START, MESH_DATA_CONT, 
MESH_DATA_END, MESH_ABORT and MESH_READ_DATA.
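The START/CONT/END write sequence can be sketched in Python against a stand-in for the memory-mapped register window. The register addresses come from Table 6-6; the RegisterLog class, the function name and the minimum packet length of two half-words are assumptions of this sketch, since the behaviour for a single half-word packet is not described.

```python
VTGB_DATA_START = 0x402   # addresses from Table 6-6
VTGB_DATA_CONT = 0x404
VTGB_DATA_END = 0x406


class RegisterLog:
    """Stand-in for the memory-mapped FPGA window; records each 16-bit write."""

    def __init__(self):
        self.writes = []

    def write16(self, address: int, value: int) -> None:
        self.writes.append((address, value & 0xFFFF))


def send_vtgb_packet(bus: RegisterLog, halfwords: list) -> None:
    """Write a packet as first half-word, continuous half-words, then last half-word."""
    if len(halfwords) < 2:
        # Single half-word packets are not described in the text; rejected here.
        raise ValueError("this sketch needs at least two half-words")
    bus.write16(VTGB_DATA_START, halfwords[0])
    for halfword in halfwords[1:-1]:
        bus.write16(VTGB_DATA_CONT, halfword)
    bus.write16(VTGB_DATA_END, halfwords[-1])


bus = RegisterLog()
send_vtgb_packet(bus, [0x1111, 0x2222, 0x3333, 0x4444])
for address, value in bus.writes:
    print(hex(address), hex(value))
```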
The mechanism to indicate VTGB FIFO empty and FIFO full status to the SA-1110 is as 
follows: Each time the VTGB input FIFO changes from empty to not empty, the FPGA asserts 
the GPIO_0 interrupt line to the SA-1110. The processor services the interrupt by reading 
the INT_SER0 (address 0x404) register. This action resets the GPIO_0 interrupt line.  The 
GPIO_1 interrupt line to the SA-1110 is set when the VTGB output FIFO changes from full to 
not full. This GPIO_1 interrupt is reset when the processor reads INT_SER1 (address 0x406) 
register. 
The change in Mesh FIFO empty and FIFO full status also triggers the same set of interrupt 
lines. The processor is required to read the COMM (address 0x408) register for all its FIFO 
full and empty status values, to figure out the source of the interrupt.
The processor obtains its physical address on the VTGB network from the GLOBAL (address 
0x400) register. The processor uses this physical address to specify the source address
in all the VTGB data packets it generates. The processor is responsible for writing 
its virtual address into the VIR_ADD (address 0x412) register. The VTGB handler uses this 
information to check whether an incoming VTGB packet addressed to a virtual node matches
the virtual address of this entity.
6.6.3 Parallel Processing Paradigm 
Applications in the PPU were designed using the master-slave parallel processing 
paradigm, where the system node behaves as the master. Each processor runs an 
algorithm that independently detects whether it is the system node, based on a health status
enquiry of all the physical nodes from the FPGA. Currently the system node is the healthy
processor node having the smallest physical address in the VTGB network. 
Each application process in the PPU performs a different role in the parallel processing 
paradigm depending on its associated virtual address. If it discovers that it is the system
node, it performs the role of master as specified in the application code; otherwise it
behaves as a slave node. A slave waits for the master node to inform it of its virtual
address.
The system node takes a virtual address of 0x80 and is responsible for the following:
Compute the virtual addresses of the rest of the nodes in the network.
Send VTGB messages to inform each node of its virtual address.
Process VTGB messages generated by the FPGA CAN Handler (only the system node 
receives CAN frames from the VTGB).  The CAN frames from the C515C CAN 
interface are injected into the PPU VTGB network using virtual address 0x80 as the 
destination address. Hence CAN frames are always addressed to the system node 
regardless of the physical processor that the system node is mapped to at runtime.
Co-ordinate the parallel processes. (E.g. controlling the sequence in which the nodes 
send data back to the SSR.)
The concept of virtual address is important as it gives a node an identification based on 
runtime roles and functions. Pre-determined addresses do not have the flexibility of 
responding dynamically to runtime fault conditions and to diverse application needs.  
Therefore, the parallel processing application uses virtual addressing, as opposed to 
absolute physical addressing, to specify how the application data is to be processed.  Each 
node that is configured as a slave has to wait for the system node to allocate data to it 
and to coordinate the transfer of data according to the sequence specified by the system node.
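The system node selection rule and the role of virtual addresses can be sketched in Python as follows. The rule (healthy node with the smallest physical address) and the system node virtual address 0x80 come from the text; numbering the slave nodes from 0x81 upward is purely a hypothetical choice for illustration, as the thesis does not give the slave address assignment here.

```python
SYSTEM_NODE_VADDR = 0x80   # from the text


def elect_system_node(health: dict) -> int:
    """Return the smallest physical address among the healthy nodes."""
    healthy = [addr for addr, ok in health.items() if ok]
    if not healthy:
        raise RuntimeError("no healthy processor node available")
    return min(healthy)


def assign_virtual_addresses(health: dict) -> dict:
    """Map physical to virtual addresses; slave numbering (0x81, ...) is hypothetical."""
    system = elect_system_node(health)
    vaddr = {system: SYSTEM_NODE_VADDR}
    next_vaddr = SYSTEM_NODE_VADDR + 1
    for addr in sorted(a for a, ok in health.items() if ok and a != system):
        vaddr[addr] = next_vaddr
        next_vaddr += 1
    return vaddr


# Physical node 0 has failed, so node 1 becomes the system node at runtime.
health = {0: False, 1: True, 2: True, 3: False, 4: True}
print(elect_system_node(health), assign_virtual_addresses(health))
```

Because every processor evaluates the same rule on the same health information, each reaches the same conclusion independently, and traffic addressed to 0x80 reaches whichever physical node currently holds the system role.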
6.7 Conclusion 
This chapter has addressed the fifth research question by considering the hardware, 
software and protocol design factors that made the computing payload practical and 
efficient enough to be flown and operated within the resource limitations of a small LEO 
satellite mission. The practicality of the design methodology described in the chapter is 
demonstrated by the actual development of the PPU payload. 
The choice of a simple VTGB ring network topology to interconnect all entities and the 
time-multiplexing of the mesh network helped reduce FPGA gate resource utilisation. The PPU
firmware was implemented within a 1 million gate Actel Axcelerator FPGA.
The processor booting process in the PPU is unique. The first stage processor boot-code is 
provided by the FPGA after loading the code into its internal memory from the flash 
memory. The second stage of processor booting is done via the VTGB network. This
innovative booting scheme enables all stages of boot-loading to be software re-configurable 
in space, enhancing reliability. 
At the software level, the concept of a system node is introduced. The system node is a node 
selected at runtime to perform all external communication functions. Because of the virtual 
addressing schemes proposed, this node can be dynamically selected at runtime after 
ensuring the healthy status of the node. 
At the operation level, the PPU hardware is designed to handle the dynamic runtime 
configuration of the data distribution process. Hardware control of image division into
tiles and distribution to the various parallel processing nodes is faster and more
efficient than a software controlled process. The SSR runtime configuration is also the key 
mechanism that enables direct code upload to flash memory without processor 
intervention. This enables code to be uploaded even when no processor is in operation.
With the above hardware and software considerations and specially defined protocols, the 
PPU computing platform has been designed to be a practical high performance computing 
platform. The PPU is more than just a concept. It is both reconfigurable and efficient in 
terms of implementation, and has been shown to be suitable for small satellite applications. 
 
7 PPU Qualification Process 
7.1 Introduction 
In preparing the PPU computing payload for space use, the computing platform has to 
undergo rigorous module level testing before integration into the satellite and be qualified 
under simulated vibration and vacuum-thermal conditions.
This chapter elaborates on the qualification process of the PPU board as well as the test 
scenarios and procedures it has to undergo before it is qualified for space use. This 
addresses the fifth research question, which relates to the qualification process to validate 
the feasibility of the PPU design. The qualification process ensures that the PPU design 
results in a space qualified computing module and not just a ground prototype board,
justifying the choice of components and architectural design as described in Section 3.4.
This chapter also describes the debugging facilities incorporated in the PPU which enable 
tests to be performed during the qualification process. These debugging facilities support 
fault injection, data input simulation and data output logging during operation scenario 
testing.
7.2 Qualification Process Flow 
The XSat project adopts a 3-model development philosophy, namely Engineering Model 
(EM), Qualification Model (QM) and Flight Model (FM). The objective of the EM development
was to produce an electronic design that met functional requirements and specifications
and that integrated properly with the complete satellite on the flat satellite (a term for
an integrated test-bed). The QM had to be resilient enough to survive the launch and to
perform in a vacuum environment. It was manufactured following the European Cooperation for
Space Standardization (ECSS) standards ECSS-Q-ST-70-38C and ECSS-Q-ST-70-08C, as the QM
hardware had to be put through environmental tests (thermal-vacuum, vibration, etc.). The FM
was the final model, built and launched into space after testing. 
The transition from EM to QM and finally to FM was a long process, and the PPU's use of COTS 
components simplified it.  In PPU development, the EM, QM and FM boards were highly similar
because actual flight parts could be used for testing early in the prototyping design 
phase, which was possible only because cost was manageable with the use of COTS.  For other
modules, prohibitive cost meant that the most expensive space grade parts were reserved for
the FM. Having non-identical EM and QM hardware carried some development risks. The
advantage of using COTS components in the PPU was apparent in the PPU's smooth transition
from EM to QM and finally to FM. Of course, the lower margins provided by COTS components,
such as temperature and vibration tolerances, had to be carefully taken into consideration
during the design phase.
The qualification process of the PPU started with standalone module level functional 
testing. The PPU module level test involved intensive laboratory bench testing, with the use 
of hardware/software simulators to generate different test inputs. The PPU also underwent
module level Environmental Stress Screening (ESS) [111], which subjected 
the board to temperature extremes specified by component manufacturers.
At the system level, integration tests of the PPU with the rest of the satellite modules on
the flat satellite test-bed were conducted. When all the modules on the satellite had been
verified at the system level, the team proceeded with system assembly.  The completed
satellite assembly was then subjected to system qualification testing (e.g. vibration and
thermal-vacuum tests).
Qualifying a payload for launch in space was a tedious process [112], but it was definitely 
essential to ensure its fitness for space usage. 
7.3 PPU QM Module Level Functional Testing 
7.3.1 Introduction 
The objective of the PPU module level functional testing was to qualify the payload for 
operations in space. The following activities were carried out to verify the functions and 
performance of the PPU system.
1. Interface Verification 
2. Operation Scenario Verification 
3. Environmental Stress Screening (ESS)
Interface verification tests the PPU's interface specification with other onboard modules. 
Operation scenario verification tests the payload under several expected operating 
conditions and its capability of executing different applications and tasks. This was done at 
different levels of parallelism and computational power, as well as under simulated fault 
conditions. 
The PPU board also had to be placed in a thermal chamber to undergo Environmental Stress 
Screening (ESS), where it was subjected to repeated temperature cycling between hot and cold 
temperature extremes. ESS tests the ability of the payload to function at both the minimum 
and maximum temperature that the payload is designed for. The repeated temperature 
cycling of the board would also reveal board workmanship problems and weaknesses in 
components, if present. 
7.3.2 Debugging Facility incorporated in the PPU Hardware 
The PPU is a complex parallel processing board. To facilitate testing, extensive effort 
was put into developing a range of hardware/software simulators and building debugging
facilities into the hardware board and extension daughter-boards.  Figure 7-1 shows the main
PPU Electronic Board, and Figure 7-2 shows the PPU daughter-board. The debugging features 
proved useful because the PPU has many entities working in parallel, which makes it
difficult to test.
Each processing cluster in the PPU has 50-pin sockets that can be connected to the 
daughter-board. Up to four daughter-boards can be simultaneously connected, one for each 
processing cluster. These daughter-boards provide an extensive hardware/software 
simulation facility and also provide visibility into internal hardware states during testing
and operation.  
The main PPU Electronic Board has jumpers to switch the power to each PN on and off. These
jumpers can be used to isolate faulty processors or to simulate a processor at fault.  
Switches are present to multiplex the flash chip signals between the FPGA and the PC
interface. This allows the Linux PC software simulator to interface to and program the flash 
directly, without going through the FPGA. Thus software code in the flash can be changed 
easily and bit faults can also be injected into the flash chip contents.  
Figure 7-1: Debugging Facility on the PPU Main Electronic Board
Legend:
1. Jumper to enable power to each PN
2. Switch to select flash interface option
3. Two 50-pin daughterboard sockets
4. Switch to select CAN interface option
5. Jumper to enable power from switch-mode regulator to PPU processing clusters
There are also switches for selecting whether the synchronous CAN interface signals from 
the FPGA go to the C515C microcontroller or to the PC CAN simulator. Hence the PC CAN 
simulator can be used to generate CAN data frames for operation scenario testing. For the 
purpose of load testing and voltage level testing on the PPU power module there are 
jumpers to isolate the power lines.
On the daughter-board there is also a 25-pin parallel port connector to the PC for 
interfacing to the SSR data simulator.  Logic implemented in the FPGA can switch 
the SSR interface signals between the SSR data simulator and the actual SSR interface. The 
SSR simulator is a convenient way of generating data packets on the VTGB communication 
network. 
 
Figure 7-2: PPU Daughterboard
Each PPU daughter-board services the debugging facility for one processing cluster. On each 
of the daughter-boards, there are two connectors to the logic analyser probe (see Figure 
7-3(a)) for hardware waveform inspection. One connector outputs 32 debugging I/O signals 
from the FPGA network element; while the other outputs flash and CAN hardware signals. 
The logic analyser can sample signals based on the rising edge of a synchronous clock. The 
clock input to the logic analyser can be selected using the jumpers on the daughterboard. 
For additional runtime probing of Actel AX1000 FPGA registers, Silicon Explorer [113] can 
be interfaced to the FPGA JTAG probing signals on the daughter-board (see Figure 7-3(b)). 
Legend for Figure 7-2:
1. Mictor connector for logic analyser probe connection
2. AX1000 JTAG probing signals to Silicon Explorer
3. Two 50-pin sockets to PPU Electronic Board
4. Jumper to select the desired clock signal frequency to be connected to the logic
analyser probe
5. Jumper for serial port selection
6. 25-pin connector to Linux PC running CAN/Flash software simulator
7. 25-pin connector to Linux PC running SSR software simulator
8. Four serial ports
Figure 7-3: (a) Daughter-board connection to logic-analyser (b) Daughter-board 
connection to Silicon-explorer 
7.3.3 Power Module Interface Test 
The PPU's main electronic board accepts a 28V supply from the spacecraft power 
distribution module. This 28V supply is down-converted to various voltages (3.3V, 1.5V, 
1.8V, 2.5V) that power various components on the PPU electronics board.  There is also a 
direct 5V voltage input to power the components in the internal CAN module (e.g. C515C 
microcontroller, CAN transceiver, buffer chips etc.). 
The voltage outputs of the regulated distribution lines were checked against
specifications. The power module was tested at maximum current loading using the Agilent
Electronic Load (AEL). The AEL was used to provide a passive current load to test the DC-DC
regulator at 20W output load. The power loading test was performed by first isolating the
main PPU electronic board from the internal power module by removing the shorting jumpers.
This test verified the power module design. It was not done on the final board used for QM 
satellite assembly because the jumpers were removed in that hardware iteration. 
Output voltage was measured using a multimeter, and the peak-to-peak spikes and 
peak-to-peak ripple of the converter/regulator output were measured using the
alternating-current mode of an oscilloscope with a voltage probe.  The results are shown in
Table 7-1.
Table 7-1: Power Interface Test Results
DC/DC converters      Output Voltages                Output Ripples (peak to peak) Output Spikes (peak to peak)
and regulators        Expected     Results (V)       Expected    Results (mV)      Expected    Results (mV)
                                   18.4V    30.5V                18.4V    30.5V                18.4V    30.5V
3.3V DC/DC            3.3V 66mV    3.390    3.390    165mV       32.8     25.2     330mV       109      204
1.5V Regulator        1.5V 30mV    1.501    1.501    75mV        33.2     6.60     150mV       59.2     49.7
1.8V Regulator        1.8V 36mV    1.797    1.797    90mV        32.4     6.21     180mV       59.6     45.4
2.5V Regulator        2.5V 50mV    2.536    2.536    125mV       38.0     8.60     250mV       66.0     64.1
CAN 1 input voltage   5V 100mV     5.080    5.079    N.A.        N.A.     N.A.     N.A.        N.A.     N.A.
CAN 2 input voltage   0V 50mV      0.002    0        N.A.        N.A.     N.A.     N.A.        N.A.     N.A.
7.3.4 CAN Link Interface Test 
There are two CAN nodes on the PPU Electronic Board: a CAN node on the PPU_A 
(termed PPU_CANA) and a CAN node on the PPU_B (termed PPU_CANB). Internal circuitry on the
PPU Power Board ensures that PPU_CANA is switched on only 
when PPU_A is turned on, and that PPU_CANB is switched on only when PPU_B is on. 
The PPU CAN node consists of the C515C 8051-based microcontroller with an integrated 
CAN controller and its associated electronics. The C515C interfaces to the CAN bus via a 
TJA1054 CAN transceiver. The CAN high and CAN low logic signalling lines can switch 
between the primary and secondary CAN bus via a relay. 
  
Several PPU CAN interface tests were done to ensure that:
- Different entities in the PPU received the CAN telecommand (CAN_TC) 
  frames from the onboard CAN bus.
- CAN positive telecommand acknowledgements (TC_ACK) and telecommand 
  negative acknowledgements (TC_NCK) from the PPU were properly received 
  by the OBC.
- CAN telemetry request (TM_REQ) frames sent from the onboard computer 
  to the PPU were properly received by entities in the PPU; and the 
  corresponding telemetry data packets were received by the OBC.
- CAN status replies (TC_STATUS or TC_ERROR) from the PPU in response to 
  some dedicated commands were received by the OBC.
- The CAN node relay control logic is such that it switches between the 
  primary and secondary CAN bus once there is no activity on that particular 
  CAN bus for a stipulated period. 
During the tests, a PC-based CAN card together with CAN monitor software was used to 
simulate CAN frames from the onboard computer. The CAN bus monitoring software 
provided an interface menu that allowed the user to load a CAN command script for 
transmission on the CAN bus. It had a watch window to view the telecommand 
acknowledgements and status replies from the PPU. The CAN monitor software also 
generated the CAN heartbeat frame that was sent to the PPU CAN node every second so 
that the CAN node did not switch between the primary and secondary CAN bus. Specific 
CAN commands were tested under various operation scenarios, which are described in 
the next section.
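The interaction between the heartbeat frame and the relay control logic can be illustrated with a minimal sketch. It is a toy model only, not the C515C firmware: the class name, the millisecond time base and the 5-second timeout are assumptions made for illustration, since the text gives only "a stipulated period" for the switchover.

```python
from dataclasses import dataclass

PRIMARY, SECONDARY = "primary", "secondary"


@dataclass
class CanBusSelector:
    """Toy model of relay logic that swaps CAN buses after a silent period."""
    timeout_ms: int              # hypothetical value; the text says "a stipulated period"
    active_bus: str = PRIMARY
    last_activity_ms: int = 0

    def on_frame(self, now_ms: int) -> None:
        """Any frame seen on the active bus (e.g. a heartbeat) resets the timer."""
        self.last_activity_ms = now_ms

    def tick(self, now_ms: int) -> str:
        """Switch to the other bus if the active bus has been silent too long."""
        if now_ms - self.last_activity_ms >= self.timeout_ms:
            self.active_bus = SECONDARY if self.active_bus == PRIMARY else PRIMARY
            self.last_activity_ms = now_ms   # restart the timer on the new bus
        return self.active_bus


def run(heartbeat_period_ms: int, duration_ms: int, timeout_ms: int, step_ms: int = 100):
    """Simulate the selector; a heartbeat period of 0 means no heartbeat is sent."""
    selector = CanBusSelector(timeout_ms=timeout_ms)
    switches = 0
    for now in range(0, duration_ms + 1, step_ms):
        if heartbeat_period_ms and now % heartbeat_period_ms == 0:
            selector.on_frame(now)
        before = selector.active_bus
        if selector.tick(now) != before:
            switches += 1
    return selector.active_bus, switches


if __name__ == "__main__":
    print(run(heartbeat_period_ms=1000, duration_ms=20000, timeout_ms=5000))
    print(run(heartbeat_period_ms=0, duration_ms=6000, timeout_ms=5000))
```

With a heartbeat every second the model stays on the primary bus, while removing the heartbeat causes a switchover once the assumed timeout elapses, which is the behaviour the heartbeat was used to suppress during testing.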
7.3.5 LVDS Link Interface Test 
LVDS is a low voltage, low power differential technology used for data transfer between the 
PPU Electronic Board and the SSR. The PPU implements a dual redundant LVDS interface 
with the SSR, hence there are two DB25 male connectors connecting the PPU to the SSR. 
Both LVDS links have to be verified to ensure that data is transferred in accordance with 
the frame formatting and protocol handshaking defined in Sections 6.5.4 and 6.5.5.  
To simulate high speed data transfer of image data from the SSR to the PPU, the National 
Instruments (NI) PXI-6562 (see Figure 7-4) was used to generate data in the form of LVDS 
waveforms. The NI PXI-6562 was also used to log the high speed LVDS data output from the 
PPU to the SSR. The NI instrument simulated data streams that tested the PPU's capability 
to process correct frames, to generate error signals on reception of erroneous frames (e.g. 
receiving a new frame start when the previous frame was still in reception), and to 
resynchronise to a new start of frame after an error. As there is a limit to the size of the data 
file that can be loaded to the NI instrument for transmission, repeated transmission of the 
data file was used to test the link capability for data sizes totalling at least 32 Mbytes. 
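The receiver behaviour exercised by these data streams (accept correct frames, flag a new frame start that arrives while a frame is still in reception, then resynchronise) can be sketched as a small state machine. This is an illustrative model only: the "SOF" and "EOF" tokens and the function name are hypothetical stand-ins, and the actual frame format and handshaking of Sections 6.5.4 and 6.5.5 are not reproduced here.

```python
def receive(stream):
    """Return (frames, errors) for a token stream of 'SOF', 'EOF' or data words.

    'SOF' and 'EOF' are hypothetical markers standing in for the real
    start-of-frame and end-of-frame signalling.
    """
    frames, errors = [], 0
    current = None                 # None means "hunting for a start of frame"
    for token in stream:
        if token == "SOF":
            if current is not None:
                errors += 1        # new start while the previous frame is still open
            current = []           # resynchronise on this start of frame
        elif token == "EOF":
            if current is None:
                errors += 1        # end of frame with no frame open
            else:
                frames.append(current)
                current = None
        elif current is not None:
            current.append(token)  # payload word inside an open frame
        # payload words outside a frame are ignored while hunting
    return frames, errors


if __name__ == "__main__":
    stream = ["SOF", 1, 2, "EOF", "SOF", 3, "SOF", 4, 5, "EOF"]
    print(receive(stream))
```

In the example stream the second frame is interrupted by a new start of frame, so it is discarded and counted as one error, and the receiver recovers the following frame intact.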
Figure 7-4: NI PXI-6562 Instrument 
7.3.6 Environmental Stress Screening (ESS) Qualification
At the module level, the PPU was subjected to Environmental Stress Screening (ESS). The 
ESS subjected the hardware to multiple hot and cold cycles at the temperature limits. These 
tests were done early to allow latent defects to surface before the System Level 
Environmental Qualification Test. The PPU was also operated at the high and low 
temperature limits to detect temperature-dependent failures. ESS was carried out using the 
Thermotron (SE-300-2-2) thermal chamber located in the satellite laboratory (see Figure 
7-5). The temperature cycling profile is shown in Figure 7-6.
Figure 7-5: Thermotron SE-360-2-2 Test Set-up 
Figure 7-6: The PPU Temperature Cycling Profile  
Temperature cycling (TC) is effective in revealing PCB de-lamination, solder joint and plated 
through-hole problems, as well as part defects that escaped incoming inspection. The ESS 
test precipitates hidden defects due to weak parts and poor workmanship. It precipitates 
latent defects such as damage due to electrical overstress, excessive heating during soldering 
or material-induced flaws. ESS can also precipitate problems in areas where there is 
insulation damage, inadequate wire stress relief or poor contact termination.
7.4 Integration of the PPU QM into Flat Satellite 
Satellite integration was first carried out on the flat satellite test-bed, where all electronic 
modules were integrated together on a laboratory bench in the clean room (see Figure 7-7). 
The flat satellite integration test verified all system interfaces between modules. Satellite 
operation scenarios were conducted on this test bed to verify that the satellite could 
operate as designed. Automated test scripts were used to put the satellite under different 
modes of operation.   
Figure 7-7: Flat Satellite Test-bed 
(a) Open view                (b) Closed view
Figure 7-8:  XSat Fully Assembled 
Upon successful completion of all operation scenario tests on the flat satellite test-bed, the 
whole satellite was assembled into a 3-dimensional frame structure. Figure 7-8 shows the 
XSat satellite fully assembled. Operation scenario testing was repeated after the satellite 
was assembled. This served as the final verification before the whole satellite structure 
model was shipped to India for environmental tests at ISRO.
 
7.5 Thermal Environmental Test (ISRO)
The XSat thermal vacuum test was conducted on the XSat Qualification Model in a 2-metre 
Thermovac chamber at ISRO in December 2008. The objective of the thermal vacuum test 
was to subject the entire satellite to qualification temperature limits in a vacuum and 
determine whether there was any heat build-up that affected satellite performance.  In a 
vacuum, heat is transferred purely by conduction or radiation; convection is not possible. 
The XSat satellite (without deployable solar panels) 
was mounted onto the platform of the thermal vacuum chamber via an interface jig as 
shown in Figure 7-9.
Figure 7-9: ISRO Thermal Chamber Test Preparations
Sixteen T-type thermocouples were placed at the different temperature points for the 
purpose of temperature monitoring during the thermal vacuum test. The thermocouple 
locations were as indicated in Figure 7-10.
During the test, the shroud temperatures in the chamber were set at qualification limits of 
-40ºC and +25ºC, within ±1ºC. At each temperature limit, the chamber temperature was 
maintained for at least 8 hours, i.e. either a "hot soak" or a "cold soak".  The vacuum was 
maintained at 1.8 x 10^-6 mbar. The thermal test took 118 hours to complete, for a total of 5 
thermal cycles. 
Simulated payload operations were performed during each hot and cold soak. The PPU 
operation test was performed during the third thermal cycle as shown in Figure 7-11.
Figure 7-10: Temperature sensor locations during Thermal Vacuum Test  
Figure 7-11: XSat Thermal Test Profile 
Figure 7-12: Thermal Plot for the PPU 
The PPU operations conducted in the thermal chamber included a code upload test, parallel 
image compression and flash read/write operations. Figure 7-12 shows the minimum and 
maximum temperatures reached by the PPU during the hot and cold cycles of the 
thermal-vacuum test. The temperatures of the PPU hardware were kept within the PPU board 
temperature specification limits of 0 ºC to 70 ºC. All the PPU operations were verified to 
run without any failures in the thermal chamber.
7.6 Vibration Test (ISRO)
Vibration tests were not carried out on the PPU at module level as it is part of the tray stack 
and is protected from the launcher's shock and vibration by the structural frame.  Thus the 
PPU only underwent system level qualification, which was sufficient since it did not contain 
any components sensitive to vibration or shock.
Sinusoidal and random vibration tests were conducted on the XSat 3-dimensional stack-up 
in December 2008. The vibration tests verified that the satellite structure and the electronic 
boards (PPU included) could withstand the vibration levels experienced during launch 
along different axes, and operate without failures after the vibration tests. 
Both the sinusoidal and random vibration tests were performed along the longitudinal and 
lateral axes (see Figure 7-13). The purpose of sinusoidal vibration testing was to 
demonstrate the ability of the satellite to withstand low frequency excitation from the 
launcher, while the random vibration testing was to demonstrate the ability to withstand 
random excitation produced by the launcher. The former is important to the 
structure of the satellite and the latter is relevant to the electronics modules. After 
conducting the vibration tests on each axis, a functional check was carried out on every 
electronic module including the PPU. Hence the PPU has been proven to survive the 
qualification vibration levels.
(a) Lateral Axis                (b) Longitudinal Axis
Figure 7-13: Sine and Random Vibration Tests
7.7 Conclusion 
The debugging facilities and test procedures are key components of the qualification test 
setup for the PPU computing payload, specially designed to facilitate module level testing, 
system level integration and environmental testing for space.  They allow the computing 
payload to be tested under various simulated scenarios, including fault simulation 
scenarios. The comprehensive set of debugging facilities supports remote monitoring of the 
payload, verifying the performance of the payload under the various operation scenarios. 
The qualification process that the PPU payload underwent established its readiness for 
launch. The PPU EM hardware was completed in December 2006. The design was proven to 
be feasible and to meet the proposed payload requirements. The PPU QM hardware was 
completed in October 2007. After the completion of the QM flat satellite test in October 
2008, the satellite was fully assembled into a 3-dimensional structural frame and shipped to 
ISRO for environmental testing. The PPU QM, together with the rest of the electronic 
modules, successfully cleared the environmental tests. With that, the PPU FM hardware was 
manufactured in 2009. It is currently ready for the final FM acceptance test at ISRO before 
it is launched.
8 Conclusion 
8.1 Research Achievements 
The PPU design methodology described in this thesis enables the use of COTS components 
in space, changing the conventional economics for the provision of high performance and 
reliable onboard computing capabilities.
The research has addressed the research questions raised at the start of the thesis.  
The following summarizes the achievements.
8.1.1 Achieving a Feasible Design Approach for a Space Computing Payload for High 
Computational Performance and Reliability 
The first research question in this thesis is concerned with the development of a design 
approach for a computing platform that achieves high computation power through 
parallelism of COTS processing elements.  To achieve this, parallelism is exploited at all 
levels to enable maximum usage of COTS components, which are configured in a way that 
achieves the desired reliability. In this design approach, massive parallelism is exploited not 
only to achieve high computational performance but also to overcome the high probability 
of failure in space.  Parallelism is not confined to processors, but is extended to the 
network, interfaces, storage and almost every other aspect of the design. 
The design process is supported by a detailed part selection process that includes practical 
considerations like thermal, power, mass, volumetric and component placement 
constraints. It factors in radiation analysis to assess COTS technology for space use and puts 
radiation mitigation measures in place to keep radiation upsets at reasonable levels. The 
exposure of the computing module in terms of radiation total dosage level is kept within the 
tolerable limits of the COTS components through carefully computed amounts of radiation 
shielding.
In the design approach, a parts reliability assessment of the COTS components is carried 
out so that system reliability can subsequently be measured quantitatively. An accurate 
reliability assessment serves as a quantitative platform for comparing 
architecture baselines and performing architectural tradeoffs. Through an iteration of this 
entire design process, the final fault tolerant computing architecture and its redundant 
paths are derived and reliability figures computed for the computing platform. The high 
system-wide reliability figures computed for the COTS-based computing platform signify a 
successful transfer of COTS high-performance computing technology for space use, in a 
rapid and cost-effective fashion. The goal of achieving a reasonable system-wide reliability 
figure at reasonable cost has been met.  Essentially, a reliable platform has been designed 
without expensive over-provisioning for fault tolerance. 
8.1.2 Achieving an Efficient Fault Tolerant Communication Network Implemented 
on an FPGA Platform 
The second research question of this thesis is concerned with achieving efficient 
implementation of the reconfigurable network and fault management schemes in an FPGA. 
This is necessary since the fault avoidance approach of using radiation-hardened parts is 
not adopted for the PPU. The question was resolved through the innovative design of the 
VTGB, a resource-efficient and reconfigurable communication network. The VTGB 
communication network is an FPGA-based interconnect for a set of heterogeneous 
processing and memory elements. It can be configured to route through different 
combinations of the parallel processing clusters.  It serves as the backbone of the PPU, 
providing on-chip communication facilities between the various network elements as well 
as fault tolerance mechanisms.
The VTGB is resilient to a wide range of faults, from processor node failures and FPGA 
failures to memory failures. The VTGB's unique virtual addressing scheme enables virtual 
resources to be mapped to selected healthy processing nodes at runtime, hence dynamically 
avoiding faulty elements without user intervention. The proposed VTGB network 
has resolved the problem of interconnecting heterogeneous parallel elements through its 
support of variable-length packets, which enables peripherals of diverse data rates to be on 
the network.  The unique capability of the communications network is its ability to 
interconnect boot ROM elements, processors and interface chips on the same communications 
bus, thus creating a reliable redundant path for every component in the system. 
The PPU can be configured to operate at different quality levels of service to the 
applications by determining the level of spare redundancy required at runtime. For 
instance, four banks of triple-majority flash memory can be shared among 20 processors 
over the communications network, providing a multitude of possible reconfiguration 
options for setting up the processor flash memory access modes. Such flexibility ensures 
that no one single failure will bring down the system. The VTGB supports both static and 
dynamic redundancy and allows faults to be detected, contained and overcome through 
network re-configuration.
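The triple-majority flash memory mentioned above relies on majority voting across three copies of the same data. The voter design used in the PPU is not described in this section, so the following is only a generic sketch of bitwise two-out-of-three voting; the function names are hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 vote: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def read_voted(copy_a: bytes, copy_b: bytes, copy_c: bytes) -> bytes:
    """Vote byte by byte across three equally long copies of the same data."""
    return bytes(majority_vote(x, y, z) for x, y, z in zip(copy_a, copy_b, copy_c))


if __name__ == "__main__":
    good = bytes([0x12, 0x34, 0xAB])
    upset = bytes([0x12 ^ 0x08, 0x34, 0xAB])   # one copy with a single flipped bit
    print(read_voted(good, upset, good) == good)
```

A single corrupted copy is masked at every bit position; the vote fails only where two copies are wrong in the same bit, which is why the spare banks and network reconfiguration options remain useful alongside the voter.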
8.1.3 Achieving a Fault Tolerant Processing Array Design Specifically for Remote 
Sensing Missions 
The third research question of this thesis is concerned with an architecture for a Fault 
Tolerant Mesh Processing Array (FTMPA), which greatly increases the parallel efficiency of 
applications that require inter-processor communication.  The FTMPA is built on a parallel 
processing platform with four FPGAs to solve the interface limitations of a single FPGA. As 
such, it can support a sufficiently large parallel network. Building a parallel processor 
network out of multiple FPGAs also means that when one FPGA network element fails, the 
processor network can still work. However, it might be operating in a gracefully degraded 
mode based on fewer processing nodes and FPGA network elements and a smaller logical 
mesh.  
The concept of FTMPA clustering, where processors are connected locally into processor 
clusters, and where the processor clusters are interconnected using multiple network 
elements, forms the basis of the high reliability mesh.  The FTMPA architecture consists of a 
topology of 2x2 FPGA network elements. The FTMPA remapping algorithms ensure that 
when there are at least 4n^2 healthy processors in the physical mesh, a 2n x 2n logical mesh 
communication network can always be constructed as long as there are at least (n+2) 
physical interconnection links between adjacent FPGA network elements.
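The two conditions stated above (at least 4n^2 healthy processors and at least n+2 links between adjacent FPGA network elements) can be turned into a simple feasibility check. The sketch below only evaluates these stated conditions; it does not implement the remapping algorithms themselves, and the function name and example link counts are assumptions for illustration.

```python
import math


def guaranteed_mesh_side(healthy_processors: int, links_between_adjacent_fpgas: int) -> int:
    """Side length 2n of the largest 2n x 2n logical mesh guaranteed by the stated
    conditions (4n^2 <= healthy processors and n + 2 <= links), or 0 if none."""
    n_by_processors = math.isqrt(healthy_processors // 4)   # largest n with 4n^2 <= healthy
    n_by_links = links_between_adjacent_fpgas - 2            # largest n with n + 2 <= links
    n = max(0, min(n_by_processors, n_by_links))
    return 2 * n


if __name__ == "__main__":
    # 20 healthy processors with 4 links (hypothetical link count): 4 x 4 mesh
    print(guaranteed_mesh_side(20, 4))
    # 15 healthy processors with the same link count: 2 x 2 mesh
    print(guaranteed_mesh_side(15, 4))
```

With all 20 processors healthy the check yields a 4 x 4 logical mesh of 16 processors, provided enough inter-FPGA links exist; once fewer than 16 processors remain healthy the guaranteed mesh falls back to 2 x 2, which corresponds to the gracefully degraded mode described earlier.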
The flexible FPGA switch structure of the FTMPA, which connects the large parallel network 
of COTS processors to form the logical mesh, is configurable. Hence, the specific mapping of 
each logical cell in the mesh to an actual physical processor can be changed in the presence 
of faults.  The specially designed FTMPA mesh remapping algorithm centres on a Global 
Remapping Strategy that divides the mesh into the various physical partitions. This bounds 
the number of required inter-FPGA communication links to a known value. The Local 
Remapping Strategy, in turn, maps each local mesh partition to physical processors 
and defines the FPGA switch structure. The FPGA switch network structure makes 
efficient use of resources through its design of time-multiplexed input and 
output ports for inter-processor communication.  
In short, the FTMPA mesh array has provided efficient communication between processors 
assigned to different logical mesh cells. This has increased the parallel processing efficiency 
of communication intensive image processing applications.
8.1.4 Demonstrating a Practical and Feasible Implementation of a Computing 
Payload on a LEO Satellite mission 
The fourth research question of this thesis is concerned with the specific hardware, 
software and protocol considerations that impact the practicality of the design. This has 
been investigated in Chapter 6 of this thesis. The COTS enabling technology and fault 
tolerant strategies of the PPU meant that space-efficient COTS BGA-packaged IC chips for 
the Actel FPGAs and COTS StrongARM processors could be used. The large savings in 
PCB space enabled several high performance IC chips to be packed onto the board. The 
use of StrongARM processors, which are state of the art in terms of MIPS/watt, has helped 
the payload achieve a computing performance of 4700 Dhrystone 2.1 MIPS and operate 
within a power limit of 20W.
The fault tolerant logic in the FPGA firmware was designed to be highly gate-efficient. 
The fault tolerant logic for handling five processors in a local processing cluster was 
implemented within a 1 million gate Actel Accelerator FPGA. The choice of a simple 
VTGB ring network topology to interconnect entities and the innovative time-multiplexing 
scheme of the mesh network both helped reduce FPGA gate resource 
utilisation. 
The unique architecture for booting processors via the network has allowed a single code 
bank to be shared among all the processing elements. This is an example of efficient use of 
resources, which frees resources for redundancy purposes. Direct code upload from SSR 
memory to flash memory is possible without the involvement of the processors. Code can 
thus be uploaded even when processors are not in operation, making code reconfiguration 
an easier and more reliable task. Hardware distribution of image tiles to the various 
processing nodes enabled higher image data transfer rates, saving communication time.  
Hence an efficient hardware and software design coupled with efficient operation concepts 
has helped achieve a high performance and reliable computing architecture. This is despite 
the low usage of resources like board space, volume, weight, power and cost. The 
practicality of the design methodology described in Chapter 3 has been shown and 
illustrated through the actual development of the payload. The resource utilisation for the 
computing payload, as summarised in Section 8.2.1 is an important assessment of the 
effectiveness of the proposed design architecture. 
8.1.5 Qualifying PPU Design and Implementation for Launch and Operation in Space 
Environment  
The fifth research question of this thesis is concerned with qualifying PPU design for launch 
and operation in space. As such, the PPU was designed with ample verification facilities.  
The verification setup can support proper operation scenario testing and collection of 
results, even during environmental testing in a thermal-vacuum or vibration chamber. This 
allowed operation scenario tests like parallel compression to be conducted to demonstrate 
operability when the hardware board is repeatedly thermal cycled within the minimum and 
maximum temperature range in the thermal chamber, or after the PPU is subjected to the 
effects of the lateral forces in a vibration chamber. 
During qualification, the PPU went through rigorous module level testing before it 
was integrated with the rest of the satellite bus.   Both normal functions and fault handling 
functions were tested.  Using the debugging facilities, fault injection was carried 
out to simulate failure modes and demonstrate that the PPU remains functional even in the 
presence of faults.  At system level, the PPU was integrated and various operation scenarios 
were exercised to simulate real operations in space.
The success of the thermal cycling and vibration tests demonstrates the high quality of 
board manufacturing and fabrication. The environmental tests show that the PPU is a 
computing payload that fulfils the rigorous requirements of a qualified space module.  The 
qualification process has demonstrated the PPU to be a practical high performance 
computing payload that attains the same level of robustness as other parts of the satellite.  
As the only COTS-based computing payload qualified to be carried onboard XSat, it is on its 
way to demonstrating COTS computing in space.
8.2 Transiting Commercial components for Space-borne 
Implementation 
8.2.1 The PPU as a Demonstration Test Bed 
The PPU was developed as a demonstration test bed to test out the various concepts 
proposed in this thesis.  It is an output of the research work to validate the feasibility of the 
methodology and architecture advocated.  Within this project, the PPU was demonstrated to 
be implemented within the limited resources of power and space, typical of a satellite 
onboard computing module.  Through this test bed, the mission value of a high 
performance payload based on COTS components is realized.
As mentioned in Section 1.1, commercial-off-the-shelf (COTS) parts are widely available and 
are superior in performance to space-grade parts. However the process of transiting 
commercial technology components to a space-borne implementation of a computing 
platform is challenging.  The Parallel Processing Unit (PPU) was designed according to the 
design methodology proposed in Chapter 3, with a design goal of achieving high reliability 
and high probability of survival in the 3 years of satellite life-span.  It contains redundant 
elements at all levels, with the flexibility of reconfiguration in the usage of these elements.    
The outcome of the architectural design is a high performance computing board, with 
twenty 206MHz StrongARM processors interconnected via four FPGA network elements. 
There are two communication networks provided in the PPU. The VTGB network 
interconnects all entities within the PPU and interfaces the PPU to the external system 
interfaces (spacecraft CAN command bus/ SSR memory storage bank). The Mesh 
communication network is designed for inter-processor communication, to speed up 
parallel image processing requiring intensive inter-processor communication.
Besides computation power, the PPU satisfies the reliability requirement of an onboard 
computing module by having no single point of failure in the design and by being able to 
degrade gracefully in performance in the event of hardware failure. It meets a reliability 
statement specifying a probability of 0.9 that the PPU offers 3760 Dhrystone 2.1 MIPS of 
computation power within the mission lifespan of three years. This is due to its 
fault tolerant mesh processing array, strong redundancy in its processing elements, its 
configurability and flexibility of operation.
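The relationship between the 3760 MIPS figure and the processor count can be illustrated with a generic k-out-of-n calculation. This is not the reliability model of Section 3.7, which also covers the network elements, memories and interfaces; it considers processors only, assumes independent failures and an equal MIPS share per processor, and the per-processor survival probability used in the example is purely hypothetical.

```python
from math import ceil, comb

PEAK_MIPS = 4700      # peak Dhrystone 2.1 MIPS stated for the PPU
PROCESSORS = 20       # number of StrongARM processors stated in the text
TARGET_MIPS = 3760    # figure used in the reliability statement


def processors_needed(target_mips, peak_mips=PEAK_MIPS, processors=PROCESSORS):
    """Processors required if each one contributes an equal share of the peak MIPS."""
    per_processor = peak_mips / processors
    return ceil(target_mips / per_processor)


def k_out_of_n(k, n, p):
    """Probability that at least k of n independent units survive, each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    k = processors_needed(TARGET_MIPS)   # 16 of the 20 processors
    p = 0.95                             # HYPOTHETICAL per-processor survival probability
    print(k, k_out_of_n(k, PROCESSORS, p))
```

Under the equal-share assumption, 3760 MIPS corresponds to 16 of the 20 processors, so the target is met as long as no more than four processors are lost. The k-out-of-n probability is always at least as high as the probability that all 20 survive, which is the basic reason spare processors raise the reliability of the reduced-performance mode.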
The characteristics of the PPU payload are summarised in Table 8-1.
Table 8-1: The PPU Specifications Summary 

Parameter                          Specifications
Processing Power                   4700 Dhrystone 2.1 MIPS
Distributed SDRAM Memory           1.28 GBytes
Form Factor                        36 cm by 29 cm
Cost                               < US $80000
Operating Temperature Range        0 to 70 degrees Celsius
Reliability Figure*                Survival Probability 0.9953
                                   Normal Computation Probability 0.9951
                                   Peak Computation Probability 0.9186
Software Configurable Components   Boot-load code, Operating System Kernel,
                                   Kernel RAMDISK, Application Code
Maximum Power                      20W

* See Section 3.7 for definition and derivation of reliability probabilities  
8.2.2 The PPU as a State of the Art 
The PPU is a highly scalable processing board for space use that is scaled up in connectivity 
by having a 2 by 2 array of FPGA network elements, enabling interface capability in the 
range of tens of processors. This interface capability can be expanded further through a 
possible iterative application of the remapping heuristics. 
The PPU's scalability extends to the ability to add and integrate heterogeneous components 
to the network, something yet to be demonstrated in other projects.  The VTGB's 
variable-length and message-based communication paradigm is critical in enabling this 
feature, making signal interface transparent to the VTGB network.  
VTGB provides the framework to share network resources. In the case of the PPU, 4 sets of 
TMR boot-roms are shared among 20 StrongARM processors, for example.  The cost of 
adding new processors to the VTGB is minimised as no additional boot-rom is required to 
support the newly interfaced processors.  The same applies to interfaces like the CAN 
command bus and SSR data links, where processors can have accesses to these resources 
once they are connected to the network.  In the future, a design involving an expanded 
functionality of a PPU involving COTS DSPs can also be interfaced to the same network and 
similar fault tolerance schemes applied to them.
The PPU is designed with low power consumption in mind to make it suitable for 
microsatellite applications.  The power consumption is kept very low compared to the 
amount of computational power generated.  At its peak computation performance, the PPU 
can operate at 235 MIPS/watt (4700 Dhrystone 2.1 MIPS at 20W).  This is significantly 
higher than most other parallel computing platforms surveyed in Section 2.4.  
The use of low power and high MIPS/watt StrongARM processors serves to achieve the 
computation power target without creating thermal issues. The approach in the PPU design 
was not to select the most powerful processors on the market, but rather to place more 
energy efficient processors on the same board to achieve the required total system MIPS.  
Designing with low power parts avoids exceeding the temperature range of the 
commercial parts during operation and prevents thermal hot spots, since these processors 
dissipate less heat.  Commercial parts are qualified only to a small temperature 
range of 0 to 70 degrees Celsius. 
Parallel processing platforms that appear scalable in their architecture might in fact be 
highly limited by the high power consumption and thermal heat dissipation of the 
processing nodes.  In the PPU, it has been demonstrated that 20 StrongARMs operating 
under various scenarios on the same board show no signs of thermal or power issues even 
under the most severe environment conditions conducted during system qualification 
testing, such as thermal-vacuum cycling.
The PPU has demonstrated the use of parallel processing as a means to achieve system 
reliability.  It does this through the provision of fault tolerance at both the local and global 
levels; local static fault tolerance schemes are coupled with global reconfiguration options 
and redundancy paths. It offers dynamic response to a wide category of faults, such as faults 
in the memories, processors and the network elements. Several of the PPU fault tolerant 
schemes are autonomous. This increases payload availability as there is no need to wait for 
a ground pass for ground operators to upload a new command script each time a fault 
occurs. 
Through successful provision of parallel redundancy paths for all components, the PPU 
approach to fault tolerance totally eliminates the need for radiation tolerant/ hardened 
components to ensure system reliability.  For example, there is no need for a radiation-
hardened processor to act as the system monitor or controller as in the case of EAFTC and 
NMP ST8.
The PPU's strength lies in its multiple communications schemes that suit the needs of both 
communication-intensive and computationally-intensive parallel processing 
applications. One salient feature of the PPU network is the virtual addressing concept 
(see Section 4.7.1). When applications request a virtual resource, the PPU internally maps 
that virtual resource to a physical resource at runtime. Runtime mapping of resources 
allows the PPU to adapt dynamically to the operational health status of the network and 
processor arrays. As external entities communicate with the PPU via virtual addressing, the 
PPU's internal runtime configuration is transparent to external parties. It is also transparent 
to the user applications running in the processing nodes.
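The idea of resolving a virtual resource to a healthy physical resource at runtime can be sketched as follows. This is a software illustration only, not the VTGB addressing scheme of Section 4.7.1: the class, its methods and the node names are hypothetical, and the first-healthy-free-node policy is an assumption.

```python
class VirtualResourceMap:
    """Toy runtime mapping from virtual resource ids to healthy physical nodes."""

    def __init__(self, physical_nodes):
        self.health = {node: True for node in physical_nodes}
        self.mapping = {}                      # virtual id -> physical node

    def mark_failed(self, node):
        """Record a node failure and drop any mapping that pointed at it."""
        self.health[node] = False
        self.mapping = {v: p for v, p in self.mapping.items() if p != node}

    def resolve(self, virtual_id):
        """Return the physical node for a virtual id, assigning a healthy free node if needed."""
        node = self.mapping.get(virtual_id)
        if node is None:
            in_use = set(self.mapping.values())
            for candidate, healthy in self.health.items():
                if healthy and candidate not in in_use:
                    node = candidate
                    break
            else:
                raise RuntimeError("no healthy spare node available")
            self.mapping[virtual_id] = node
        return node


if __name__ == "__main__":
    ppu = VirtualResourceMap(["P0", "P1", "P2"])
    print(ppu.resolve("V0"), ppu.resolve("V1"))   # P0 P1
    ppu.mark_failed("P0")
    print(ppu.resolve("V0"), ppu.resolve("V1"))   # P2 P1
```

The caller keeps using the same virtual id before and after the failure, which mirrors the transparency to external entities and user applications described above.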
Software re-configurability enables different levels of Quality of Service, required for the 
tasks at hand. It also makes the PPU adaptable to varying spacecraft power conditions, 
mission reliability and computation needs.  If the application requires fault tolerance, 
redundancy can be enabled through software configuration.  The number of PPU processing 
nodes to be turned on is configured at runtime. These processors can work in parallel to 
provide peak computation power; or can work in a redundant mode to optimise reliability. 
A new PPU configuration is achieved by sending different CAN commands or by uploading a 
new set of application codes. The new set of application codes can be written into the flash 
chip for permanent storage or dynamically loaded to StrongARM volatile memory at 
runtime. The latter is lost once the PPU powers down. This flexibility allows for both highly 
critical tasks and high performance tasks to be run on the same system through mix-and-
match of processors configuration to each task.
Like other COTS-based processor developments, the PPU provides a highly 
familiar environment for applications to run.  It uses entirely COTS software and the Linux 
operating system, which gives application developers the support of a wide range 
of COTS development tools and easy portability to the flight software environment.
8.3 Future Work 
The scope of the thesis has been to focus on the development of a fault tolerant parallel 
computing architecture for remote sensing missions.  This has been fulfilled with the 
concepts and methodology proposed in the thesis, with the PPU developed as a test-bed. 
Following its development, the PPU is awaiting launch.  In-orbit testing will be carried 
out to validate the operation of the payload and realize the mission value of the PPU in 
space.  
As the PPU is designed to be a computing payload that can be flown without any user 
software applications at launch, different types of image processing applications can be 
uploaded. Hence, the next step will be to design applications that can make full use of the 
fault tolerant and parallel processing features in the PPU and demonstrate the capability 
increase due to such a high performance computing payload.  
While sensor data volume has been increasing rapidly over the years, onboard storage and 
downlink capacity have grown at a much slower pace. Onboard processing will help 
reduce the data volume through different methods of pre-processing. At a maximum rate 
of 50 Mbit/s, the data that can be downloaded to the CRISP ground station within a 
10 min ground pass is only about 3.75 Gbytes of image data. This translates to 
about 50 scenes of 5000 x 5000 pixels for 3 spectral bands. Onboard compression can be 
applied to increase the number of downloadable image scenes; or advanced image 
classification and segmentation techniques can be performed to extract useful image 
features and reduce the amount of data for download.
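
The downlink figures above can be checked with a short calculation. The sketch assumes 8 bits per pixel per band and decimal units (1 Gbyte = 10^9 bytes), which is what reproduces the quoted numbers; the function name and the `compression_ratio` parameter are illustrative.

```python
def downlink_budget(pass_seconds, rate_bits_per_s, width, height, bands,
                    bytes_per_sample=1, compression_ratio=1):
    """Return (bytes per pass, bytes per raw scene, whole scenes per pass)."""
    capacity_bytes = pass_seconds * rate_bits_per_s // 8
    scene_bytes = width * height * bands * bytes_per_sample
    scenes = capacity_bytes * compression_ratio // scene_bytes
    return capacity_bytes, scene_bytes, scenes


capacity, scene, scenes = downlink_budget(
    pass_seconds=10 * 60,          # 10 min ground pass
    rate_bits_per_s=50_000_000,    # 50 Mbit/s downlink
    width=5000, height=5000, bands=3,
)
print(capacity, scene, scenes)     # 3750000000 75000000 50
```

Under the same assumptions, a hypothetical 4:1 onboard compression would raise the count to 200 scenes per pass.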
The XSat primary IRIS payload is a multi-spectral imager in the Near-Infrared (0.76 µm to 
0.9 µm), Red (0.63 µm to 0.69 µm), and Green (0.52 µm to 0.62 µm) bands. Its spectral and 
spatial resolution is similar to Landsat TM. Based on these band characteristics, several 
candidate applications have been identified. The Near-Infrared and Red bands are very 
useful for detecting water [114]; hence flood detection is one suitable application onboard 
XSat. Another possible application is cloud detection, since the Singapore sky is frequently 
covered with clouds. An onboard cloud detection algorithm that masks out cloud pixels from 
further processing will lead to greater efficiency and better utilisation of the 
communication bandwidth. Onboard compression such as content-based JPEG2000, if 
implemented, can achieve compression ratios higher than standard JPEG2000 compression. 
The algorithm utilises a compression map that assigns adjustable weights to different 
regions in the image data. These weights are calibrated according to each region's 
usefulness to the user-defined mission, so regions-of-interest can be preserved to a much 
higher degree.
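
As a rough illustration of how a cloud mask reduces the data that needs further processing or download, the sketch below flags a pixel as cloud when it is bright in all three bands. The brightness test and the threshold of 200 are placeholders chosen for the example, not the algorithm intended for XSat; an operational detector would be considerably more elaborate.

```python
def cloud_mask(green, red, nir, threshold=200):
    """Return a 2-D list of booleans, True where a pixel is treated as cloud.

    A pixel counts as cloud when all three band values reach the
    (hypothetical) brightness threshold.
    """
    return [
        [g >= threshold and r >= threshold and n >= threshold
         for g, r, n in zip(g_row, r_row, n_row)]
        for g_row, r_row, n_row in zip(green, red, nir)
    ]


def clear_fraction(mask):
    """Fraction of pixels that remain after cloud pixels are masked out."""
    total = sum(len(row) for row in mask)
    cloudy = sum(sum(row) for row in mask)
    return (total - cloudy) / total


# Tiny 2 x 2 example scene with 8-bit band values.
green = [[250, 40], [220, 90]]
red = [[245, 35], [210, 80]]
nir = [[240, 60], [230, 120]]

mask = cloud_mask(green, red, nir)
print(clear_fraction(mask))  # 0.5
```

Only the clear fraction of the scene would then be passed on to compression or classification.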
With onboard intelligence, a satellite can perform complex automated tasks like change 
detection. Such a mission requires the satellite to be loaded with algorithms that 
automatically detect events like floods, oil spills or fires, and to store only those images 
where change detection suggests that an event has occurred or that features have changed. 
The PPU payload will help the XSat Earth observation mission extend its capabilities to 
this new class of applications.
For this research, the communication structure was developed for use with image 
processing of multi-spectral images. Future work can include exploring communication 
structures that are suitable for highly computation intensive operations, such as the 
processing of hyper-spectral imaging data.  This new class of application may require 
generalising the dynamic remapping procedures for an enlarged array of FPGA-connected 
clusters to achieve higher processor scalability. 
The PPU is also a good experimental platform to test out new fault tolerant algorithms (e.g. 
EDAC schemes). It can also be used to collect data on radiation upset occurrences. The 
success of the payload will validate the potential use of a COTS payload for performing more 
critical tasks for future XSat missions. 
  
8.4 Final Conclusion 
In conclusion, the thesis has proposed several innovative concepts for designing and 
implementing a high performance and fault tolerant parallel processing unit, a secondary 
payload onboard the XSat micro-satellite.  The PPU is one of the few payloads in the LEO 
micro-satellite domain with high speed onboard parallel processing capabilities that can 
support an automated and intelligent satellite mission. Its use of COTS components instead 
of expensive space-grade parts changes the conventional economics of providing such 
onboard capabilities. 
9 Appendix A: Part Failure Rates for the PPU (in FITS)
Parent ID | Part Number | Name | Qty | Failure Rate | Failure Rate (FITS)
PPU ELN Flash (2/3) | AT45DB642D-CNU | 64Mbit Serial Flash | 2 | 0.1274 | 127.4
PPU ELN Flash (2/3) | C0805C104K5RAC-TU | Ceramic Chip Capacitor, 100nF, 50V | 11 | 0.002840648 | 2.840648
PPU ELN FPGA (1/4) | AX1000 FG484 | Actel Axcelerator FPGA | 1 | 0.0401583 | 40.1583
PPU ELN FPGA (1/4) | TPSB475K035R0700 | Tantalum Capacitor, 4.7uF, 35V.1 | 5 | 7.91588E-05 | 0.07915881
PPU ELN FPGA (1/4) | C0805C103K1RAC | Ceramic Chip Capacitor, 10nF, 50V | 112 | 0.002298707 | 2.298707
PPU ELN FPGA (1/4) | C0805C104K5RAC-TU | Ceramic Chip Capacitor, 100nF, 50V | 246 | 0.002840648 | 2.840648
PPU ELN FPGA (1/4) | CRCW080520K0 | Thick Film Chip Resistor, 20K, 0.125W | 1 | 7.14931E-05 | 0.07149309
PPU ELN FPGA (1/4) | VCC1_B3D_100M000B | FPGA 100MHZ Oscillator | 1 | 0.0187462 | 18.7462
PPU LVDS interface | DS90LV031AWGQML | LVDS Quad Differential Line Driver | 2 | 0.000364 | 0.364
PPU LVDS interface | DS90LV032AWGQML | LVDS Receiver | 2 | 0.00364 | 3.64
PPU LVDS interface | CRCW0805100R | Thick Film Chip Resistor, 100 ohm, 0.125W | 4 | 0.000185513 | 0.185513
PPU CAN | C0805C104K5RAC-TU | Ceramic Chip Capacitor, 100nF, 50V, ±10% | 11 | 0.002840648 | 2.840648
PPU CAN | C0805C105K4RAC-TU | Ceramic Capacitor, 1uF, 16V, ±10% | 1 | 0.003970138 | 3.970138
PPU CAN | C1210C107M9PAC-TU | SMD Ceramic Chip Capacitor, 100uF, 6.3V, ±20% | 1 | 0.00527045 | 5.27045
PPU CAN | CRCW0805220RFKEA | Thick Film Chip Resistor, 220ohm, 0.125W | 1 | 0.000194029 | 0.1940285
PPU CAN | CRCW0805100R | Thick Film Chip Resistor, 100 ohm, 0.125W | 1 | 7.11538E-05 | 0.07115376
PPU CAN | CRCW0805330R | Thick Film Chip Resistor, 330 ohm, 0.125W | 1 | 9.51263E-05 | 0.09512625
PPU CAN | CRCW08051K 1% 100 ET1 | Thick Film Chip Resistor, 1kohm, 0.125W | 10 | 0.00021375 | 0.2137495
PPU CAN | CRCW1206 2K 1% 100 ET1 | Thick Film Chip Resistor, 2K, 0.25W | 2 | 7.18589E-05 | 0.07185894
PPU CAN | CRCW0805100KFKEA | Thick Film Chip Resistor, 100Kohm, 0.125W, 1% | 3 | 7.13078E-05 | 0.07130775
PPU CAN | CRCW08054K70 | Thick Film Chip Resistor, 4.7K, 0.125W | 6 | 7.45612E-05 | 0.07456124
PPU CAN | CRCW08050000 | Thick Film Chip Resistor, 0 ohm, 0.125W | 8 | 7.1151E-05 | 0.07115104
PPU CAN | AD780BRZ | High Precision 2.5 V/3.0 V Bandgap Voltage Reference | 1 | 0.00182 | 1.82
PPU CAN | SN74LVT245BDW | Octal buffer driver/mixed mode | 2 | 0.00418 | 4.18
PPU CAN | VCC1-A3D-8M000 | Surface Mount Crystal Oscillator, Single-output Module, 8MHz | 1 | 0.01048634 | 10.48634
PPU CAN | TJA1054AT | Fault tolerant CAN transceiver | 2 | 0.008809243 | 8.809243
PPU CAN | MMBT2907A | PNP Transistor, Small Signal | 2 | 0.008372 | 8.372
PPU CAN | ER422D-5A/SQ | Miniature 2 coil latching Relay | 1 | 0.06485964 | 64.85964
PPU CAN | SAB-C515C-8E | C515C 8-Bit Single Chip Microcontroller | 2 | 0.1248973 | 124.8973
PPU CAN | MM74HCT573WM | Octal D type Transparent Latches | 1 | 0.00219 | 2.19
PPU Power Supply | EL7532 | Monolithic 2A Step-Down Regulator | 2 | 0.228 | 228
PPU Power Supply | EL7554 | Monolithic 4 Amp DC-DC Step-Down Regulator | 1 | 0.06 | 60
PPU Power Supply | IRF5210PBF | Power Mosfet | 1 | 0.06080007 | 60.80007
PPU Power Supply | FMC-461F/883 | EMI Filter, 28V, 2.7A | 1 | 0.0775 | 77.5
PPU Power Supply | MTR283R3SF/883 | DC-DC Converter, 3.3V, 6.06A, 20W | 1 | 0.248 | 248
PPU Power Supply | CRCW080549K9 | Thick Film Chip Resistor, 49.9K, 0.125W | 2 | 7.12356E-05 | 0.07123562
PPU Power Supply | CRCW08055K11 | Thick Film Chip Resistor, 5.11K, 0.125W | 2 | 7.3908E-05 | 0.07390799
PPU Power Supply | C1206C224K3RAC-TU | Ceramic Chip Capacitor, 0.22uF, 25V | 4 | 0.00155822 | 1.55822
PPU Power Supply | T510X106K050ATE090 | Tantalum Capacitor, 10uF, 50V | 4 | 9.41709E-05 | 0.09417091
PPU Power Supply | CRCW080512K7 | Thick Film Chip Resistor, 12.7Kohm, 0.125W | 2 | 7.12003E-05 | 0.07120033
PPU Power Supply | RN73C2A10K2BTG | High Precision Resistor, 10.2 Kohm, 0.125W | 2 | 7.12002E-05 | 0.07120017
PPU Power Supply | C0805C183K5RAC-TU | Ceramic Chip Capacitor, 18nF, 50V | 4 | 0.0024344 | 2.4344
PPU Power Supply | C0805C221J1GAC | Ceramic Chip Capacitor, 220pF, 100V | 2 | 0.001630428 | 1.630428
PPU Power Supply | CRCW1206 2K32 1% 100 ET1 | Thick Film Chip Resistor, 2.32K, 0.25W | 2 | 0.00021375 | 0.2137495
PPU Power Supply | T495D476K035ATE300 | Tantalum Chip Capacitor, 47uF, 35V | 2 | 0.000134431 | 0.134431
PPU Power Supply | C0805C473K1RAC | Ceramic Chip Capacitor, 47nF, 35V | 2 | 0.002642062 | 2.642062
PPU Power Supply | CRCW0805100R | Thick Film Chip Resistor, 100 ohm, 0.125W | 1 | 0.000185513 | 0.185513
PPU Power Supply | C0805C104K5RAC-TU | Ceramic Chip Capacitor, 100nF, 50V | 4 | 0.002840648 | 2.840648
PPU Power Supply | CRCW0805 10K 5% 200 ET1 | Thick Film Chip Resistor, 10K, 0.125W | 4 | 7.13078E-05 | 0.0713078
PPU Power Supply | C0805C222K5RAC-TU | Ceramic Chip Capacitor, 2.2nF, 50V | 1 | 0.002006692 | 2.006692
PPU Power Supply | CRCW080521K5 | Thick Film Chip Resistor, 21.5K, 0.125W | 1 | 7.13251E-05 | 0.0713251
PPU Power Supply | T495D226K035ATE300 | Tantalum Capacitor, 22uF, 35V | 1 | 0.000112895 | 0.1128948
PPU Power Supply | PCB | PCB | 1 | 0.006833253 | 6.833253
PPU Power Supply | CRCW08050000 | Thick Film Chip Resistor, 0 ohm, 0.125W | 6 | 7.1151E-05 | 0.0711510
PPU Power Supply | CRCW120611K5FKEA | Thick Film Chip Resistor, 11.5Kohm, 0.125W | 2 | 7.17465E-05 | 0.0717465
PPU Power Supply | IRF7220 | P Channel Mosfet Switch | 40 | 0.005797115 | 5.797115
PPU SDRAM+StrongARM Processors | CRCW0805 10K 5% 200 ET1 | Thick Film Chip Resistor, 10K, 0.125W | 74 | 7.13077E-05 | 0.0713077
PPU SDRAM+StrongARM Processors | CRCW080522R0 | Thick Film Chip Resistor, 22ohm, 0.125W | 5 | 0.00021375 | 0.2137495
PPU SDRAM+StrongARM Processors | CRCW0805330R | Thick Film Chip Resistor, 330 ohm, 0.125W | 1 | 9.51263E-05 | 0.0951263
PPU SDRAM+StrongARM Processors | C0805C104K5RAC-TU | Ceramic Chip Capacitor, 100nF, 50V | 10 | 0.002840648 | 2.840648
PPU SDRAM+StrongARM Processors | C0805C103K1RAC | Ceramic Chip Capacitor, 10nF, 50V | 4 | 0.002298707 | 2.298707
PPU SDRAM+StrongARM Processors | K4S561632H-TUI/P75 | 256Mbit SDRAM | 2 | 0.33 | 330
PPU SDRAM+StrongARM Processors | SA-1110 | StrongARM Processor | 1 | 0.5105202 | 510.5202
PPU SDRAM+StrongARM Processors | AEL 3.6864 (HC49S SMD) | AEL 3.6864MHZ Crystal | 20 | 0.008774693 | 8.774693
PPU SDRAM+StrongARM Processors | FX135 | FOX 32.768KHz Crystal | 20 | 0.002961194 | 2.961194
PPU SDRAM+StrongARM Processors | DSUB25 | Connector, rectangular, D sub, plug, 25 way, contact size 20, right angled | 2 | 0.04995178 | 49.95178
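
As an illustration of how such part-level figures are commonly combined, the sketch below sums quantity times FIT for the two flash-block rows of the table and converts the total to an MTBF. It rests on several assumptions not stated in the table: that 1 FIT means one failure per 10^9 device-hours, that the listed values are per part, that the parts form a series system with constant failure rates, and that redundancy is ignored. It should not be read as the reliability model used for the PPU.

```python
import math

HOURS_PER_FIT = 1e9  # assumed definition: 1 FIT = 1 failure per 1e9 device-hours

# (part number, quantity, FIT per part), copied from the flash-block rows.
FLASH_BLOCK = [
    ("AT45DB642D-CNU", 2, 127.4),
    ("C0805C104K5RAC-TU", 11, 2.840648),
]


def total_fit(parts):
    """Series-system total: failure rates of all parts simply add up."""
    return sum(qty * fit for _, qty, fit in parts)


def mtbf_hours(fit):
    """Mean time between failures for a constant failure rate."""
    return HOURS_PER_FIT / fit


def reliability(fit, hours):
    """Probability of surviving 'hours' under an exponential failure model."""
    return math.exp(-fit * hours / HOURS_PER_FIT)


block_fit = total_fit(FLASH_BLOCK)          # 2*127.4 + 11*2.840648
three_year = reliability(block_fit, 3 * 365 * 24)
```

The mission duration of three years in the last line is a placeholder for the example, not a figure from the thesis.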
10 References 
                                                            
[1] Nancy Leon and Lynn Chandler, "New Satellites Push Technological Boundaries", NASA 
website, http://www.nasa.gov/vision/universe/solarsystem/st5.html, last accessed 2009.
[2] Yury Zaitsev, "Russian Satellites: Smaller, Lighter, Cheaper", 
http://www.spacemart.com/reports/Russian_Satellites_Smaller_Lighter_Cheaper_999.html, 
last accessed 2009, IKN Space Research Institute.
[3] John Rhea, "The challenges of space on the new COTS frontier", Military & Aerospace 
Electronics, May 1997.
[4] I.V. McLoughlin, V. Gupta, S. Singh, S.L. Lim, T. Bretschneider, "Fault Tolerance Through 
Redundant COTS Components for Satellite Processing Applications", Proceedings of the 
Fourth International Conference on Information, Communications & Signal Processing and 
IEEE Pacific-Rim Conference on Multimedia, vol. 1, pp. 296-299, Singapore, 2003.
[5] I. McLoughlin, B. Ramesh, T. Bretschneider, "First Beowulf Cluster in Space", Linux 
Journal, vol. 2005, issue 137, pp. 9, 2005.
[6] D.W. Hammerstrom and D.P. Lulich, "Image processing using one-dimensional processor 
arrays", Proceedings of the IEEE, vol. 84, issue 7, pp. 1005-1018, July 1996.
[7] Mikhail S. Tarkov, Youngsong Mun, Jaeyoung Choi and Hyung-Il Choi, "Mapping Parallel 
Programs onto Distributed Computer Systems with Faulty Elements", Proceedings of the 
International Conference on Computational Science-Part II, pp. 148-157, May 2001.
[8] Sharon Lim Siok Lin, Ian McLoughlin, Timo Bretschneider, Heiko Schröder, 
"Reconfigurable, Fault Tolerant and High Performance Payload for Space Missions", 
Proceedings of the International Conference on Military and Aerospace Programmable Logic 
Devices, Washington, 2003.
[9] Sharon Lim Siok Lin and Heiko Schröder, "A Fault Tolerant Parallel Computing 
Architecture for Onboard Satellite Image Processing", RMIT Research Conference 2004.
[10] W-K. Chen and E. F. Gehringer, "A Graph-Oriented Mapping Strategy for a Hypercube", 
Proceedings of the third Conference on Hypercube Concurrent Computers and Applications, 
vol. 1, pp. 200-209, January 1988.
[11] Lenwood S. Heath, Arnold L. Rosenberg and Bruce T. Smith, "The Physical Mapping 
Problem for Parallel Architectures", Journal of the ACM, vol. 35, issue 3, pp. 603-634, 
1988.
[12] S. Yuhaniz, T. Vladimirova and M. N. Sweeting, "Embedded Intelligent Imaging On-Board 
Small Satellites", Asia-Pacific Computer Systems Architecture Conference, ACSAC 2005, 
IEEE, pp. 90-103, Oct 2005, Singapore.
[13] John R. Samson, J. Ramos, A. George, M. Patel and R. Some, "Technology Validation: 
NMP ST8 Dependable Multiprocessor Project", Proceedings of the 2006 IEEE Aerospace 
Conference, March, 2006. 
[14] M. Pigno, "Very High-Performance Embedded Computing Will Allow Ambitious Space 
Science Investigation", Accession Number: ADA445276, http://handle.dtic.mil/100.2/, 
2005, last accessed in June 2010.
[15] T. Sterling, D.S. Katz and L. Bergman, "High-Performance Computing Systems for 
Autonomous Spaceborne Missions," International Journal, High-Performance Computing 
Applications, vol. 15, no. 3, pp. 282-296, 2001.
[16] D.S. Katz and R.R. Some, "NASA Advances Robotic Space Exploration", Computer, vol.36, 
issue 1, pp. 52-61, January 2003. 
[17] O. Torheim, K. Bronstad, K. Heerlein, et al., "Development of an Embedded CPU-Based 
Instrument Control Unit for the SIR-2 Instrument Onboard the Chandrayaan-1 Mission to 
the Moon," Geoscience and Remote Sensing, IEEE Transactions, vol.47, no.8, pp.2836-2846, 
Aug. 2009.
[18] E. Touloupis, J.A. Flint, V.A. Chouliaras, D.D. Ward, "Study of the Effects of SEU-Induced 
Faults on a Pipeline Protected Microprocessor," Computers, IEEE Transactions, vol.56, 
no.12, pp.1585-1596, Dec. 2007.
[19] R. Hillman, G. Swift, P. Layton, M. Conrad, C. Thibodeau, F. Irom, "Space Processor 
Radiation Mitigation and Validation Techniques for an 1800 MIPS Processor Board", 
Proceedings of the Radiation Effects on Components and Systems, RADECS 2003, pp. 347-352, 
2003.
[20] N. Oh, P.P. Shirvani and E.J. McCluskey, "Error Detection by Duplicated Instructions In 
Super-scalar Processors", IEEE Transactions on Reliability, Sep. 2001. 
[21] R.R. Some, D.C. Ngo, "REE: A COTS-based fault tolerant parallel processing 
supercomputer for spacecraft onboard scientific data analysis", Digital Avionics Systems 
Conference, 1999, 18th Proceedings, vol.2, pp. 7.B.3-1 - 7.B.3-12, 1999.
[22] M. Pignol, "DMT and DT2: Overview of two CNES Fault-Tolerant Architectures Intended 
for Electronic COTS Components in Space Applications", IEEE Proc. Dependable System and 
Network (DSN), Supplemental Volume Fast Abstract, pp. B34-B35, 2003.
[23] D. Czajkowski, and M. McCartha, "Ultra Low-Power Space Computer Leveraging 
Embedded SEU Mitigation", IEEE Proc. of Aerospace Conf., vol. 5, pp. 2315-2328, 
http://www.spacemicro.com, 2003.
[24] Space Micro, "Space Micro Proton200K(TM) Computer, Part of the Winning Solution 
for Air Force Angels Nano-satellite Program", http://www.spacemicro.com/news/pr/2006, 
August 2006.
[25] M. Al-Kuwaiti, N. Kyriakopoulos, S. Hussein, "A comparative analysis of network 
dependability, fault-tolerance, reliability, security, and survivability", IEEE Communications 
Surveys and Tutorials, vol. 11, issue 2, pp. 106-124, 2009.
[26] A.K. Somani, N.H. Vaidya, "Understanding Fault Tolerance And Reliability", Computer, 
vol. 30, issue 4, pp. 45-50, 1997.
[27] J. Gray, D.P. Siewiorek, "High Availability Computer Systems", Computer, vol.24, issue 9, 
pp. 39-48, 1991.
[28] P. Bernardi, L. Bolzani, M.S. Reorda, "A Hybrid Approach to Fault Detection and 
Correction in SoCs," On-Line Testing Symposium, 2007. IOLTS 07. 13th IEEE International, 
pp.107-112, 8-11 July 2007.
[29] C. Braun, H.J. Wunderlich, "Algorithm-based fault tolerance for many-core 
architectures," 15th IEEE European Test Symposium (ETS), pp.253-253, 24-28 May 2010.
[30] C. Zizhong, J. Dongarra, "Algorithm-Based Fault Tolerance for Fail-Stop Failures," 
Parallel and Distributed Systems, IEEE Transactions, vol.19, no.12, pp.1628-1641, Dec. 2008.
[31] G. Klimeck, F. Oyafuso, M. McAuley, R. Deen, G. Yagi, E. DeJong, T.A. Cwik, "Near 
real-time parallel image processing using cluster computers", JPL Technical Report Server, 
http://hdl.handle.net/2014/7147, 2003, last accessed in June 2010.
[32] J. Wall and W.C. Fang, "Electronics brain design for advanced microspacecraft", Acta 
Astronautica, IAA International Symposium on Small Satellites for Earth Observation, vol. 39, 
issues 9-12, pp. 815-822, Nov, 1996. 
[33] P. Stolorz, P. Cheeseman, "Onboard Science Data Analysis: Applying Data Mining to 
Science-Directed Autonomy," IEEE Intelligent Systems, vol. 13, no. 5, pp. 62-68, Sep-Oct. 
1998.
[34]  L.N. Bhuyan, D.P. Agrawal, "Applications of SIMD computers in signal processing",  
Proceedings of the June National Computer Conference, AFIPS 82 Conference, pp. 135 - 142,  
New York, 1982.
[35] Jurgen Friedrich, "Spatial Modeling in Natural Sciences and Engineering", ISBN 3-540-
20877-1, Springer-Verlag Berlin Heidelberg, New York Publication, 2004.
[36] A. Agarwal, D. Chaiken, K. Johnson, et al., "The MIT Alewife Machine: A Large-Scale 
Distributed-Memory Multiprocessor", in M. Dubois and S.S. Thakkar, editors, Scalable Shared 
Memory Multiprocessors, Kluwer Academic Publishers, pp. 239-261, 1992.
[37] T. Sterling, D. Savares, P. MacNeice, K. Olson, C. Mobarry, B. Fryxell, P. Merkey, "A 
Performance Evaluation of the Convex SPP-1000 Scalable Shared Memory Parallel 
Computer", Proceedings of the IEEE/ACM SC95 Conference, pp. 55, 1995.
[38] H. Burkhardt, S. Frank, B. Knobe, and J. Rothnie, "Overview of the KSR 1 computer 
system", Technical Report KSR-TR-9202001, Kendall Square Research, Boston, MA, Feb. 
1992.
[39] M. Holliday, M.Stumm, "Performance evaluation of hierarchical ring-based shared 
memory multiprocessors", IEEE Transactions on Computers, vol.43, issue 1, pp.52-67, 
January 2004. 
[40] Hu Kai, Wang Zhe, "An Analytical Model of k-Ary n-Cube under Spatial Communication 
Locality", IEEE 24th WAINA International Conference, pp. 24-29, 2010.
[41] P. Cholda, A. Mykkeltveit, B.E. Helvik, O.J. Wittner, A. Jajszczyk, "A survey of resilience 
differentiation frameworks in communication networks," Communications Surveys & 
Tutorials, IEEE, vol.9, no.4, pp.32-55, 2007.
[42] A. Kohler, G. Schley, M. Radetzki, "Fault Tolerant Network on Chip Switching With 
Graceful Performance Degradation," Computer-Aided Design of Integrated Circuits and 
Systems, IEEE Transactions, vol.29, no.6, pp.883-896, June 2010.
[43] C. Weidong, X. Wenjun, B. Parhami, "Swapped (OTIS) Networks Built of Connected 
Basis Networks Are Maximally Fault Tolerant," Parallel and Distributed Systems, IEEE 
Transactions, vol.20, no.3, pp.361-366, March 2009. 
[44] W. Bux, F. Closs, K. Kuemmerle, H. Keller, H. Mueller, "Architecture and Design of a 
Reliable Token-Ring Network", IEEE Journal on Communications, vol. 1, issue 5, 
pp. 756-765, 1983.
[45] E. Cota, F.L. Kastensmidt, M. Cassel, M. Herve, P. Almeida, et al., "A High-Fault-Coverage 
Approach for the Test of Data, Control and Handshake Interconnects in Mesh 
Networks-on-Chip", IEEE Transactions on Computers, vol. 57, issue 9, pp. 1202-1215, 2008.
[46] Nobuo Tsuda, Tatsuyuki Shimizu, "Reconfigurable Mesh-Connected Processor Arrays 
Using Row-Column Bypassing and Direct Replacement", ISPAN '00 Proceedings of the 2000 
International Symposium on Parallel Architectures, Algorithms and Networks, 2000.
[47] N.R. Mahapatra, S. Dutt, "Hardware-efficient and highly-reconfigurable 4- and 2-track 
fault-tolerant designs for mesh-connected multicomputers", Proceedings of Annual 
Symposium on Fault Tolerant Computing, pp. 272-281, 1996.  
[48] Wu Jigang, T. Srikanthan, Han. Xiaogang, "Preprocessing and Partial Rerouting 
Techniques for Accelerating Reconfiguration of Degradable VLSI Arrays," Very Large Scale 
Integration (VLSI) Systems, IEEE Transactions, vol.18, no.2, pp.315-319, Feb. 2010.
[49] B. Beresford-Smith and H. Schroder, "Effective reconfiguration in fault tolerant 
mesh-connected networks", Australian Computer Journal, vol. 21, no. 2, pp. 79-84, May 1989.
[50] R. Mazzaferri and H. Schroder, "A superior class of networks for reconfigurable 
meshes", in Proc. 6th Int. Parallel Processing Symp., IEEE Computer Society, pp. 437-442, 
Mar. 23-26, 1992.
[51] P. Zipf, "Applying Dynamic Reconfiguration for Fault Tolerance in Fine-Grained Logic 
Arrays," Very Large Scale Integration (VLSI) Systems, IEEE Transactions, vol.16, no.2, pp.134-
143, Feb. 2008.
[52] V. Puente, J.A Gregorio, "Immucube: Scalable Fault-Tolerant Routing for k-ary n-cube 
Networks," Parallel and Distributed Systems, IEEE Transactions, vol.18, no.6, pp.776-788, 
June 2007.
[53] O. Lysne, J.M. Montanana, J. Flich, J. Duato, et al., "An Efficient and Deadlock-Free 
Network Reconfiguration Protocol," IEEE Transactions on Computers, vol.57, no.6, 
pp.762-779, June 2008.
[54] A. Haider, R. Harris, "Recovery Techniques in Next Generation Networks," 
Communications Surveys & Tutorials, IEEE, vol.9, no.3, pp.2-17, 2007.
[55] J. Morris, D. Kroening, P. Koopman, "Fault tolerance tradeoffs in moving from 
decentralized to centralized embedded systems", 2004 International Conference on 
Dependable Systems and Networks, IEEE Computer Society, pp. 377-386, July, 2004.
[56] J. Ramos and D. Brenner, "Environmentally-Adaptive Fault Tolerant Computing 
(EAFTC): An Enabling Technology for COTS based Space Computing", Proceedings of the 
2005 IEEE Aerospace Conference, pp. 1-10, March, 2005.
[57] B. Hine and T. W. Fong, "Evaluation of the Intel iWarp Parallel Processor for Space 
Flight Applications", AIAA Aerospace Design Conference, February, 1993.
[58] S. Borkar, R. Cohn, G. Cox, S. Gleason, T. Gross, H.T. Kung, et al., "iWarp: An Integrated 
Solution to High-Speed Parallel Computing", Proceedings of Supercomputing 88, ACM/IEEE, 
pp. 330-339, 1988. 
[59] K. Hu, Z. Wang, "An Analytical Model of k-Ary n-Cube under Spatial Communication 
Locality", IEEE 24th International Conference on AINA Workshops, pp. 24-29, 2010.
[60] Raphael R. Some, Daniel S. Katz, "NASA Advances Robotic Space Exploration," 
Computer, vol. 36, no. 1, pp. 52-61, Jan. 2003.
[61] M. Pigno, "Very High-Performance Embedded Computing Will Allow Ambitious Space 
Science Investigation", http://handle.dtic.mil/100.2/ADA445276, 2005, last accessed June 
2010.
[62] Gary R. Brown, "Radiation Hardened PowerPC 603e Based Single Board Computer", 
20th Digital Avionics Systems Conference, Oct 2001.
[63] R. E. Harper, "Reliability Analysis of Parallel Processing Systems", 8th Digital Avionics 
Systems Conference, pp.183-186, 1988.
[64] C. Gebhardt, "NASA Reviews COPV Reliability Concerns for Final Program Flights", 
July 18th, 2010, 
http://www.nasaspaceflight.com/2010/07/nasa-reviews-copv-for-final-program-flights/, 
last accessed 9th Sep 2010. 
[65] D.P. Siewiorek, R.S. Swartz, "Reliable Computer Systems: Design and Evaluation", Third 
edition, Digital Press, Burlington, MA, 1998.
[66] J.A. Rivers, P. Kudva, "Reliability Challenges and System Performance at the 
Architecture Level," Design & Test of Computers, IEEE, vol.26, no.6, pp.62-73, Nov.-Dec. 2009.
[67] Mentor Graphics BoardStation XE, www.mentor.com/products/pcb-system-design, 
last accessed June 2010.
[68] L. Richard, "An Optical Remote Inspection System for the Surrey Nanosatellite 
Applications Program", thesis submitted for the Master of Science Degree in University of 
Surrey 2001, 
http://www.carrotworks.com/documents/20010822-1555-surrey-msc-thesis.pdf.
[69] J. Wang, B. Cronquist, B. Sin, J. Moriarta, R. Katz, "Antifuse FPGA for Space Applications", 
RADECS Workshop Record, IEEE, pp. 11, 1997. 
[70] M. Wirthlin, E. Johnson, N. Rollins, M. Caffrey, P. Graham, "The Reliability of FPGA 
Circuit Designs in the Presence of Radiation Induced Configuration Upsets", Proceedings of 
the 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 
FCCM, IEEE Computer Society, Washington, DC, April 2003.
[71] R. Katz, J.J. Wang, R. Koga, K. A. LaBel, J. McCollum, R. Brown, R. A. Reed, B. Cronquist, S. 
Crain, T. Scott, W. Paolini, and B. Sin, "Current Radiation Issues for Programmable Elements 
and Devices," IEEE Transactions on Nuclear Science, vol. 45, no. 6, pp. 2600-2609, 1998.
[72] H. Quinn, K. Morgan, P. Graham, et al., "Domain Crossing Errors: Limitations on Single 
Device Triple-Modular Redundancy Circuits in Xilinx FPGAs," Nuclear Science, IEEE 
Transactions, vol.54, no.6, pp.2037-2043, Dec. 2007.
[73] C.A. Hulme, H.H. Loomis, A.A. Ross, R. Yuan, "Configurable fault-tolerant processor 
(CFTP) for spacecraft onboard processing", Aerospace Conference, IEEE Proceedings, vol. 4, 
pp. 2269-2274, March 2004.
[74] J.M. Emmert, C.E. Stroud, M. Abramovici, "Online Fault Tolerance for FPGA Logic 
Blocks," Very Large Scale Integration (VLSI) Systems, IEEE Transactions, vol.15, no.2, 
pp.216-226, Feb. 2007.
[75] C.K. Kouba, K. Nguyen, P. O'Neill, C. Bailey, "Proton Radiation Test Results on 
COTS-Based Electronic Devices for NASA-Johnson Space Center Spaceflight Projects", 
Radiation Effects Data Workshop, IEEE, pp. 26-36, July 2005.
[76] NASA Preferred Reliability Practices: Space Radiation Effects on Electronic 
Components in Low-Earth Orbit, http://engineer.jpl.nasa.gov/practices.html, last accessed 
9th Sep 2010.
[77] Spacerad home page: https://www.spacerad.com, last accessed June 2010.
[78] CREME96 home page: https://creme96.nrl.navy.mil, last accessed June 2010.
[79] ESA's Space Environment System, http://www.spenvis.oma.be/, last accessed May 
2010.
[80] J.J. Wang, W. Wong, S. Wolday, B. Cronquist, J. McCollum, R. Katz, I. Kleyner, "Single 
event upset and hardening in 0.15 µm antifuse-based field programmable gate array", IEEE 
Transactions on Nuclear Science, vol.50, issue 6, pp. 2158-2166, December 2003.
[81] Ken O'Neill, "Antifuse FPGA Technology: Best Option for Satellite Applications", Journal 
of Military Electronics and Computing, http://www.cotsjournalonline.com/, December 
2003.
[82] Actel, "Using EDAC RAM for RadTolerant RTAX-S FPGAs and Axcelerator FPGAs", 
Application Note AC273, Actel, July 2006.
[83] Reliasoft, "System Analysis: Reliability, Availability and Optimization Online 
Reference", ReliaSoft's eTextbook for System Analysis, 2007.
[84] ReliaSoft Blocksim, http://www.reliasoft.com/products.htm, last accessed Sep 2010.
[85] C.K. Hansen, "Reliability prediction and simulation for a communications-satellite 
fleet", Reliability and Maintainability Symposium, Annual Proceedings, IEEE, pp. 152-158, 
Jan 1995.
[86] A. T. Tai, S. N. Chau, L. Alkalai, "COTS-Based Fault Tolerance in Deep Space: Qualitative 
and Quantitative Analyses of a Bus Network Architecture", Proceedings of the 4th IEEE 
International Symposium on High Assurance Systems Engineering (HASE '99), pp. 97, 
Washington, DC, Nov 1999.
[87] T. Sterling, D.S. Katz, L. Bergman, "High Performance Computing Systems for 
Autonomous Spaceborne Missions", International Journal of High Performance 
Computing Applications, SAGE Publication, vol. 15, no. 3, pp. 282-296, 2001.
[88] J. Rohr, "Software-Implemented Fault Tolerance for Supercomputing in Space", 
International Fault-Tolerant Computing Symposium, IEEE, Germany, 1998.
[89] M.N. Lovellette, K.S. Wood, D.L. Wood, J.H. Beall, P.P. Shirvani, N. Oh, E.J. McCluskey, 
Naval Res. Lab, "Strategies for fault-tolerant, space-based computing: Lessons learned from 
the ARGOS testbed", IEEE Aerospace Conference Proceedings, vol.4, pp. 5-2109 - 5-2119, 
Washington, DC, USA, 2002.
[90] Richard Katz, "Advanced Design: Designing for Reliability", MAPLD International 
Conference, NASA, 2001.
[91] A.L. Benjamin, J.H. Lala, "Advanced fault tolerant computing for future manned space 
missions", Digital Avionics Systems Conference, 16th DASC., AIAA/IEEE, vol.2, pp. 8.5-26 - 
8.5-32, October, 1997.
[92] A.L. Gehin, M. Staroswiecki, "Reconfiguration Analysis Using Generic Component 
Models," Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions, 
vol.38, no.3, pp.575-583, May 2008.
[93] I.V. McLoughlin, T. Bretschneider, "Achieving Low-Cost High Reliability Computation 
Through Redundant Parallel Processing", ICOCI Conference, IEEE, pp.1-6, Kuala Lumpur, 
June 2006.
[94] Intel® StrongARM* SA-1110 Microprocessor Developer's Manual, June 2000.
[95] J. Mathew, A.M. Jabir, D.K. Pradhan, "Design Techniques for Bit-Parallel Galois Field 
Multipliers with On-Line Single Error Correction and Double Error Detection," On-Line 
Testing Symposium,  IOLTS '08. 14th IEEE International, pp.16-21, 7-9 July 2008.
[96] G.A. Reis, J. Chang, N. Vachharajani, R. Rangan, D.I. August, and S.S. Mukherjee, 
"Software-controlled fault tolerance", ACM Trans. Archit. Code Optim., vol. 2, issue 4, 
pp. 366-396, December, 2005. 
[97] M. Rebaudengo, M. Sonza Reorda, M. Torchiano, M. Violante, "Soft-error Detection 
through Software Fault-Tolerance techniques", DFT '99: IEEE International Symposium on 
Defect and Fault Tolerance in VLSI Systems, Austin (USA), pp. 210-218, Nov 1999.
[98] S.L. Lim, T. Bretschneider, I.V. McLoughlin, and H. Schroder, "Reconfigurable, Fault 
Tolerant and High Performance Payload for Space Missions", Proceedings of the 
International Conference on Military and Aerospace Programmable Logic Devices, 
Washington, NASA, 2003.
[99] C.Y. Chua, S.L. Lim, D.L. Douglas, "High Performance, Reliable and Flexible Computing 
Payload for Space Missions", IEEE TENCON Conference, Chiangmai, vol.4, issue 21-24, pp. 
427-430, 2004.
[100] R. Miller, V.K. Prasanna-Kumar, D.I. Reisis, Q.F. Stout, "Parallel computations on 
reconfigurable meshes", IEEE Transactions on Computers, vol. 42, issue 6, pp. 678-692, Jun 
1993.
[101] S. Hambrusch, X. He, and R. Miller, "Parallel algorithms for gray-scale image 
component labeling on a mesh-connected computer," Proceedings of the Fourth Annual ACM 
Symposium on Parallel Algorithms and Architectures, California, United States, June 1992.
[102] I. Takanami, "Built-in Self-Reconfiguring Systems for Mesh-Connected Processor 
Arrays with Spares on Two Rows/Columns," Proceedings of the 15th IEEE International 
Symposium on Defect and Fault-Tolerance in VLSI Systems, DFT. IEEE Computer Society, 
Washington, DC, pp. 213-221, October 2000.
[103] T. McDonald, H. Schröder, "C3 - a Powerful Connection Network for Fault Tolerance," 
in Proceedings of Pacific Rim International Symposium on Fault Tolerant Systems, IEEE 
Computer Society Press, pp. 102-107, September 1991.
[104] S.L. Lim, J.L. Zheng, "XSat PPU CDR Design Document," CREST internal 
publication, 2006.
[105] Actel Application Notes, "Actel Antifuse Package Selector Guide," 2009.
[106] Charles Pfeil, "BGA Breakouts and Routing, Effective Design Methods for Very Large 
BGA," Mentor Graphics Corporation, 2008.
[107] Actel Application Notes, "Package Mechanical Drawings," Revision 39, 
http://www.actel.com/documents/PckgMechDrwngs.pdf, 2010.
[108] K. Bonner, S. Walton, "Qualification of Ball Grid Array Assemblies for Space Flight 
Applications," Jet Propulsion Laboratory Technical Report, Feb 1997.
[109] Altera Corporation, "Understanding Metastability in FPGAs," Altera Website, 
www.altera.com/literature/wp/wp-01082-quartus-ii-metastability.pdf, July 2009.
[110] C.E. Cummings and P. Alfke, "Simulation and Synthesis Techniques for Asynchronous 
FIFO Design with Asynchronous Pointer Comparisons," Synopsys Users Group Conference, 
SNUG 2002, Section TB2, 3rd paper, www.sunburst-design.com/papers, San Jose, CA, 
March 2002.
[111] "The Environmental Stress Screening Handbook," by Thermotron Industries, 1988.
[112] European Cooperation for Space Standardisation (ECSS) standard on Space 
Engineering Testing, ECSS-E-10-03A, published by ESA, Feb 2002.
[113] Actel Application Note AC132, "Using the Silicon Explorer for System-level Debug," 
1997.
[114] T. Vladimirova, S. Yuhanuz, M. Meerman, P. Stephens, D. Hodgson, "Intelligent Imaging 
on Board Small Observation Satellites," IEEE International Conference on GeoScience and 
Remote Sensing Symposium, pp. 3939-3942, Aug 2006.
