Single event upset hardened embedded domain specific reconfigurable architecture by Baloch, Sajid
Single Event Upset Hardened Embedded 
Domain Specific Reconfigurable Architecture 





Thesis submitted for the degree of Doctor of Philosophy. 
The University of Edinburgh 
January 2007 
Declaration of Originality 
I declare that this thesis is my original work except where stated. 
Student Name: Sajid Baloch 
ACKNOWLEDGMENT  
This thesis is the result of three years of work during which I have been accompanied and 
supported by many people. I am pleased to now have the opportunity to express my 
gratitude to all of them. 
The first person I would like to thank is my supervisor, Professor Tughrul Arslan. I have 
worked on his project since 2003, when I started my MSc project. During these years I have 
known Prof. Arslan as a sympathetic and principle-centred person. His enthusiastic and 
integral view on research and his mission for providing 'only high-quality work and not 
less' have made a deep impression on me. Prof. Arslan has inspired and challenged me to 
push the boundaries of what is possible in the field of Re-Configurable Computing and 
Single Event Upsets. 
I owe special thanks to Dr. Adrian Stoica of the Jet Propulsion Laboratory (JPL), NASA, 
USA, who was always available when I needed his advice and who provided an endless supply 
of new ideas and technical support, both for this research and for other relevant technical areas. 
I would also like to thank Mr. Sami Khawam (University of Edinburgh), Kay-Chuan 
Benny Tan and Dr. Ahmet Erdogan for providing me with advice and technical 
support throughout this research. 
I extend sincere thanks to all staff members of the Institute for System Level Integration 
(ISLI), UK, who helped me by providing the environment and technical facilities to 
finish this task. 
I would especially like to thank Miss Elaine McCrohon, who took the time to read earlier 
versions of this thesis and provide valuable comments. 
I would like to thank my family for their support during my student life, and for their 
patience and encouragement while we have been separated for many years. 
Lastly, I dedicate this thesis to my beloved late father and my family. 
Publications 
Submitted Journals 
S. Baloch, T. Arslan, A. Stoica, "Modelling and Eradication of Single-Event 
Disruptions in Digital Microelectronics", submitted to the IEEE Transactions on 
Nuclear Science; initial review done and resubmitted with corrections. 
Refereed Conferences 
S. Baloch, T. Arslan, A. Stoica, "Radiation Hardened Coarse-Grain Reconfigurable 
Architecture for Space Applications", accepted as a regular paper for the 
14th Reconfigurable Architectures Workshop (RAW 2007), to be held in March 2007 
at Long Beach, California, USA, 6 pages. 
S. Baloch, T. Arslan, A. Stoica, "An Efficient Fault Tolerance Scheme for 
Preventing Single Event Disruptions in Reconfigurable Architectures", IEEE Field 
Programmable Logic and Applications, 2005, on 24-26 Aug. 2006, Madrid, Spain, 
pp. 618-621. 
S. Baloch, T. Arslan, A. Stoica, "Design of a single event upset (SEU) mitigation 
technique for programmable devices", IEEE Quality Electronic Design, 2006. 
ISQED 06. 7th International Symposium on, 27-29 March 2006, 4 pages. Digital 
Object Identifier 10.1109/ISQED.2006.46. 
S. Baloch, T. Arslan, A. Stoica, "Design of a Novel Soft Error Mitigation Technique 
for Reconfigurable Architectures", IEEE Aerospace Conference, 2006 IEEE, 04-11 
March 2006, pp. 1-9. 
S. Baloch, T. Arslan, A. Stoica, "An Efficient Technique for Preventing Single 
Event Disruptions in Synchronous and Reconfigurable Architectures", Adaptive 
Hardware and Systems, 2006. AHS 2006. First NASA/ESA Conference on, 15-18 
June 2006, pp. 292-295. 
S. Baloch, T. Arslan, A. Stoica, "Probability Based Partial Triple Modular 
Redundancy Technique for Reconfigurable Architectures", IEEE Aerospace 
Conference, 2006 IEEE, 04-11 March 2006, pp. 1-7. 
S. Baloch, T. Arslan, A. Stoica, "Embedded Reconfigurable Array Fabrics for 
Efficient Implementation of Image Compression Techniques", First NASA/ESA 
Conference on Adaptive Hardware and Systems, 2006. AHS 2006, pp. 15-18. 
S. Baloch, T. Arslan, A. Stoica, "Efficient Error Correcting Codes for On-Chip 
DRAM Applications for Space Missions", IEEE Aerospace Conference, 2005 IEEE, 
5-12 March 2005, pp. 1-9. 
S. Baloch, T. Arslan, A. Stoica, "Low power domain-specific reconfigurable array 
for discrete wavelet transforms targeting multimedia applications", Field 
Programmable Logic and Applications, 2005. International Conference on, 24-26 
Aug. 2005, pp. 618-621. 
S. Baloch, T. Arslan, A. Stoica, "Domain-specific reconfigurable array targeting 
discrete wavelet transform for system-on-chip applications", IEEE Parallel and 
Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International, 4-8 
April 2005, 4 pages. 
Industrial & Refereed Conferences 
S. Baloch, T. Arslan, A. Stoica, "Design of a 'Single Event Effect' Mitigation 
Technique for Reconfigurable Architectures", 2005 MAPLD International 
Conference, Ronald Reagan Building and International Trade Center, Washington, 
D.C., September 7-9, 2005, 6 pages. 
Abstract 
In recent years, reconfigurable computing machines have introduced a new set of 
alternatives for hardware and system designers. Reconfigurable system-on-chip offers 
great room for innovation in system architecture because of increasing device densities 
and the combination of software-targeted tasks with runtime reconfiguration of the system. 
Owing to the ability to customize hardware modules, it is possible to optimize control, 
data-path and interconnections according to specific algorithm requirements. Domain specific 
reconfigurable systems, in particular, can achieve outstanding benefits from this paradigm, 
adapting to the instantaneous needs of an application. With the advance in VLSI digital 
technology, many high-throughput, high-performance imaging and video applications have 
emerged and increased in usage. At the core of these imaging and video applications is 
image and video compression technology based on the discrete wavelet transform (DWT). DWT 
processes are by nature very computationally intensive and power consuming. 
Therefore, this thesis sets out to realize an embedded, synthesisable, domain specific 
reconfigurable core for DWT algorithms and investigates whether optimizing at both the 
algorithmic and architectural levels will yield high-performance hardware. A novel 
implementation scheme for the realization of a low power JPEG-2000 lifting based DWT 
is proposed and presented in this thesis. The scheme aims to reduce the switched 
capacitance by reducing the number of computational steps and the data-path/arithmetic 
hardware through the manipulation of configurable logic blocks and interconnects. This 
resulted in a novel DWT specific reconfigurable architecture that is more efficient than a 
generic reconfigurable core. 
The thesis also investigates single event effects on the proposed reconfigurable core. 
The research resulted in a number of single event upset (SEU) mitigation schemes for 
the proposed core. The thesis introduces a novel approach to eradicate single event 
disruptions in sequential and configuration bit storage circuits. Two novel SEU 
mitigation schemes for combinational circuits are also proposed in this thesis. These 
are based on partial triple modular redundancy and on dual hardware redundancy with 
comparison. The proposed schemes are implemented on the proposed reconfigurable 
core and evaluated in terms of performance overheads, and the results proved the 







1.2 Motives and Objectives 
1.2.1 Reconfigurable Computing 
1.2.2 Role of COTS Technology in Aerospace Industry 
1.2.3 Single Event Upsets 
1.3 Contributions 
1.4 Thesis Overview 
2 LITERATURE REVIEW 
2.1 Introduction 
2.2 Reconfigurable Architectures 
2.2.1 General Purpose Reconfigurable Architectures 
2.2.1.1 Fine Grain General Purpose Reconfigurable Cores 
2.2.1.2 Medium Grain General Purpose Reconfigurable Cores 
2.2.1.3 Coarse Grain General Purpose Reconfigurable Cores 
2.2.2 Domain Specific Reconfigurable Architectures 
2.2.2.1 Domain Specific Reconfigurable Cores for DWT 
2.3 Radiation Effects 
2.3.1.1 Cumulative Effects 
i) Displacement Damage 
ii) Total Ionizing Dose (TID) 
2.3.1.2 Single Event Effects (SEE) 
i) Static Single Event Effects 
ii) Transient Single Event Effects 
iii) Permanent Single Event Effects 
2.3.2 Single Event Upsets 
2.3.2.1 Brief History 
2.3.2.2 Physical Origins of SEUs and SETs 
2.4 Conventional SEU/SET Mitigation Techniques 
2.4.1 Process Technology 
2.4.2 Hardened Memory Cells 
Hardened Gate Resistor Memory Cell 
IBM Memory Cell 
NASA Memory Cell 
2.4.3 Error Detection and Correction Techniques 
2.4.4 Hardware Redundancy 
2.5 Hardening Techniques for General Purpose Reconfigurable Architectures 
2.5.1 SEU Hardening for SRAM Based FPGAs 
2.5.1.1 Module Redundancy 
2.5.1.2 Device Redundancy 
2.5.1.3 Correcting SEU through Partial Configuration 
2.5.2 SEU Hardening for Anti-fuse Based FPGAs 
2.5.3 SEU Hardening for EPLDs 
2.6 Summary 
3 RECONFIGURABLE FABRIC FOR DISCRETE WAVELET TRANSFORM 
3.1 Introduction 
3.2 JPEG-2000 
3.2.1 JPEG-2000 Compression 
3.3 Discrete Wavelet Transform (DWT) 
3.4 DWT Implementation Techniques 
3.4.1 Direct Form Structure 
3.4.2 Polyphase Structure 
3.4.3 Lifting Based Structure 
3.5 Reconfigurable SoC Platform 
3.6 Reconfigurable Fabric For DWT 
3.7 Reconfigurable Logic Blocks 
3.7.1 Add-Subtract Cluster 
3.7.2 Coefficient Multiplier Cluster 
3.7.3 Configurable Buffer Cluster 
3.8 Programmable Interconnects 
3.8.1 Configurable Switch 
3.8.2 Switch Boxes 
3.8.3 Connection Boxes 
3.8.4 Tracks 
3.9 Placement and Routing 
3.9.1 Circuit Net-list 
3.9.2 Reconfigurable Core's Architecture 
3.9.3 Placement Of Clusters 
3.10 DWT Implementations 
3.10.1 Data Extension for Image Compression 
3.10.2 DWT Implementation-1 
3.10.3 DWT Implementation-2 
3.10.4 DWT Implementation-3 
3.11 Performance Evaluations 
3.11.1 Area Comparison 
3.11.2 Power Comparison 
3.11.3 Area & Power Distribution Among Clusters 
3.12 Summary 
4 ERROR DETECTION AND CORRECTION CODES 
4.1 Introduction 
4.2 Background 
4.2.1 Hamming Code Definition 
4.2.1.1 Error Detection and Correction 
4.3 The Proposed Error Correcting Code 
4.3.1 Case Example-1: (An Error in Data Bits) 
4.3.2 Case Example-2: (An Error in Parity Bits) 
4.3.3 Encoder Design 
4.3.4 Decoder Design 
4.3.5 Performance 
4.4 Summary 
5 SEU MITIGATION SCHEME FOR SEQUENTIAL CIRCUIT ELEMENTS 
5.1 Introduction 
5.2 SEU Mechanism 
5.3 The Proposed Model 
5.3.1 Temporal Sampling 
5.3.2 Weighted Voting Circuitry 
5.4 Case Examples 
5.4.1 Single Fault Recovery 
5.4.2 Multiple Fault Recovery (Double Fault) 
5.5 The Proposed Mitigation Technique 
5.6 SEU / SET Mitigation Process 
5.6.1 Static Latch SEU 
5.6.2 Data SET 
5.6.3 Clock SET 
5.6.4 Majority Voting 
5.6.5 Size Trade-Off 
5.6.6 Static Data Storage 
5.7 SEU / SET Simulator 
5.7.1 Scenario 1 
5.7.2 Scenario 2 
5.7.3 Scenario 3 
5.7.4 Scenario 4 
5.8 Summary 
6 PARTIAL TRIPLE MODULAR REDUNDANCY BASED SEU/SET 
PROTECTION OF COMBINATORIAL LOGIC 
6.1 Introduction 
6.2 Related Work 
6.3 Model Description 
6.3.1 SEU Sensitive Gate 
6.3.2 Dominant Value 
6.3.3 Threshold Probability 
6.4 Signal Probability Estimations 
6.5 An Illustrative Example 
6.6 Experimental Flow 
6.7 Evaluations 
6.8 Summary 
7 SEU/SET MITIGATION WITH DUAL HARDWARE REDUNDANCY 
7.1 Introduction 
7.2 Background 
7.3 Dual Hardware Redundancy With Comparison 
7.3.1 Normal Operation - No Fault (06Hex) 
7.3.2 Transient Fault in Block-A (03Hex, 0AHex) 
7.3.3 Transient Fault in Block-B (05Hex, 0CHex) 
7.3.4 Permanent Fault in Block-A/B (09Hex, 0FHex) 
7.3.5 Transient Fault in Detection (00Hex) 
7.4 Voting Logic 
7.4.1 Scenario-1: Normal Operation (No Fault) 
7.4.2 Scenario-2: Transient Fault (Block-A) 
7.4.3 Scenario-3: Transient Fault (Block-B) 
7.4.4 Scenario-4: Transient / Permanent Fault (Syndrome Generator) 
7.4.5 Scenario-5: Permanent Fault (Block-A / Block-B) 
7.5 Performance Evaluation 
7.6 Summary 
8 PERFORMANCE EVALUATION OF THE PROPOSED ARCHITECTURE 
8.1 Introduction 
8.2 Performance Evaluations 
8.3 Evaluations: Level-1 
8.3.1 Evaluation (Process Completion Time) 
8.3.1.1 Analysis 
8.3.2 Area Comparison 
8.4 Evaluations: Level-2 
8.5 Evaluations: Level-3 
8.6 Evaluation: Level-4 
8.7 Evaluations: Level-5 
8.8 Summary 
9 CONCLUSIONS 
9.1 Summary 
9.2 Future Work 
9.3 Final Comments 
References 
List of Figures 
2.1: Different Types of Radiation Effects on Devices 
2.2: Commonly Used Basic RAM Memory Cells [RAB-096] 
2.3: Effect of a Charged Particle on a Basic Memory Cell [RAB-096] 
2.4: Physical Phenomenon of SEU [BES-093] 
2.5: Gate Resistor Based SRAM Memory Cell [WEA-087] 
2.6: Transistor Diagram of IBM Memory Cell [ROC-092] 
2.7: Mux Failure Example 
2.8: PIP Failure Modes 
2.9: Buffer Failure Modes 
2.10: Module Redundancy Technique [ALF-098] 
2.11: Module Partitioning 
2.12: Device Redundancy-Triple Device Redundancy 
2.13: C-Module and S-Module by Actel [ACT-097] 
2.14: Actel TMR Implementation [ACT-097] 
3.1: Comparison of JPEG and JPEG 2000 compressed images [PAN-000] 
3.2: JPEG-2000 Encoder Block Diagram [JPG-200] 
3.3: Two level Decomposition showing sub-bands [ABO-001] 
3.4: Dyadic Decomposition [ABO-001] 
3.5: Direct Form Structure for a Two Dimension DWT [MAR-099] 
3.6: Polyphase structure of DWT [MAR-099] 
3.7: Lifting Based Structure [SWE-096] 
3.8: Reconfigurable System-on-Chip 
3.9: 2-D DWT Mechanism 
3.10: Design Flow Diagram of Domain Specific Reconfigurable Fabric 
3.11: Design of a Reconfigurable Add/Subtract Block 
3.12: Reconfigurable Coefficient Multiplier Block 
3.13: Reconfigurable Shifter Block 
3.14: Proposed Buffer CLB 
3.15: The proposed Reconfigurable Buffer Cluster with configurable Switches 
3.16: The Reconfigurable Switch 
3.17: Simple Switch Box 
3.18: Switch Box Flexibility and Direction [DEP-098] 
3.19: Routing Switch Connections with transistor Sizes [BET-099] 
3.20: Disjoint Switch Box [ROS-090] 
3.21: Required Configurable Switches per Switch Box 
3.22: A Typical Connection Box 
3.23: Flexibility Definition of a Connection Box (C-Box) 
3.24: Cluster with S-Boxes and C-Boxes 
3.25: Outcome of Architecture File 
3.26: Outcome of Architecture File 
3.27: Reconfigurable Array for DWT Computations 
3.28: JPEG-2000 Symmetric Data Extension 
3.29: JPEG-2000 Symmetric Data Extension 
3.30: Data Dependence Graph of JPEG-2000 Data Extension [BEN-004] 
3.31: Data Dependence Graph of JPEG-2000 Data Extension [BEN-004] 
3.32: Input Generator Block for DWT implementation 
3.33: Realization of DWT on the Proposed Reconfigurable Array 
3.34: DWT processing through Proposed Reconfigurable Array 
3.35: Realization of Lian et al. [LIA-001] in terms of Reconfigurable Array 
3.36: DWT Filtering Through the Proposed Architecture 
3.37: 9/7 Integer Fast DWT Implementation [DAN-002] 
3.38: Percentage Power Saving 
3.39: Average % Relative power consumption of different clusters with respect to 
Coefficient multiplier cluster 
3.40: Area Distribution of Add-Subtract cluster of the Reconfigurable Array 
3.41: Average Power Distribution of Add-Subtract cluster 
4.1: DRAM with Error Correction 
4.2: The Hamming Code composition 
4.3: The Check Bits in The Hamming Code 
4.4: Error Correcting code based encoder circuitry 
4.5: Power comparison of different encoders 
4.6: Proposed (13,8) code based syndrome generator 
4.7: Kazéminéjad's (13,8) [KAZ-001] code based syndrome generator circuitry 
4.8: Syndrome Decoder & Correcting Circuitry 
4.9: Area comparison of encoder based on different error correction codes 
4.10: Overall comparison of encoder and decoder based on different error correction codes 
in terms of area 
5.1: Critical Transient Width Vs Feature Size for Un-attenuated Propagation 
5.2: SEU in Signal Routing Path 
5.3: General TMR Scheme 
5.4: SEU Sensitive Configuration Bit Storage Circuits 
5.5: Typical Sequential Circuit Topology and SEU 
5.6: Temporal Relationship for Latching a Data SET as an Error 
5.7: Functional Equivalence of Flip-Flop 
5.8: Temporal Data Sampling 
5.9: Clocking Scheme for the proposed Architecture 
5.10: Clocking Scheme for the proposed Architecture 
5.11: Proposed SEU/SET Mitigation Technique with Self Correction Mechanism 
5.12: Simulation Results 
5.13: Sample Node values for One Computation Cycle 
5.14: Sample Node values for One Computation Cycle 
5.15: Proposed Temporal Data Sampling 
5.16: Clocking Scheme for the proposed Architecture 
5.17: Self-Scrub Mechanism for Static Data Storage 
5.18: The Proposed technique with reduced clock signals 
5.19: The proposed scheme with reduced clock signals and latches 
5.20: Optimized proposed SEU mitigation scheme 
5.21: Error evaluation of the proposed technique 
5.22: Fault injection process of the designed SEU simulator 
5.23: Percentage area saving through the proposed technique 
5.24: Area Comparison of the proposed technique 
6.1: The Approach Using delayed signal and Buffer Circuit [MON-003] 
6.2: The Proposed Model 
6.3: Algorithm to Determine SEU Sensitive Gates 
6.4: Algorithm to Determine Dominant Value 
6.5: The Logic Functionality for AND, OR Gates 
6.6: Proposed Model, SEU sensitive and insensitive gates 
6.7: (b) Partial TMR applied to only SEU Sensitive Gates 
6.8: The Test Circuit realizing Z= ABCDE + FGHI 
6.9: The Graph for Z = X5 + X4 - X9 
6.10: A Test Circuit to Compute Signal Probabilities 
6.11: Simple Algorithm for probability estimations 
6.12: Algorithm to Determine Mark Tuple 
6.13: Algorithm for Inference Rules 
6.14: Logical Circuit with SEU Sensitive Gates 
6.15: Example Combinational Circuit with SEU Sensitive Gates 	 153 
7.1: Dual Hardware Redundancy with Comparison 	 163 
7.2:Voter circuit for the proposed scheme 	 165 
7.3: Implementation of the proposed scheme 	 168 
7.4: Hardened Voting Circuit for the Proposed Scheme 	 168 
7.5: Percentage Area Overhead comparison 	 170 
7.6: Power Comparison of the Proposed Scheme 	 172 
8.1: Performance Evaluation Flow Diagram for the Proposed Reconfigurable 
Architecture 	 176 
8.2: Power consumption graph of various DWT algorithms on the proposed 
reconfigurable fabric for the processing of a single frame of test image. Lenna. 
Barbra and Pepper at image size of 64x64. Test conducted at clock frequency 
of 40MHz 178 
8.3: Power consumption graph of various DWT algorithms on three different 
platforms for the processing of a single frame of test image, Lenna, Barbra and 
Peper at image size of 64x64. Test conducted for 329000ns at clock frequency 
of 40MHz 179 
8.4: Graphical representation of the average power consumed by the proposed 
reconfigurable fabric for the processing of one level of DWT of a single frame 
of test image 64x64 at its process completion time of 102368ns at clock 
frequency of 40MHz 181 
8.5: Graphical representation of the average power consumed by the proposed 
reconfigurable fabric for the processing of one level of DWT of a single frame 
of test image 64x64 for the test duration of 1 37488ns at clock frequency of 
40MHz 182 
8.6: Percentage power overhead comparison of TMR technique. The results are based 
on 64x64 Lenna, Barbra and Pepper test images implemented through various 
DWT algorithms (DWT.-1, DWT-2, DWT-3) 	 186 
8.7: Power consumption distribution among Level-1, Level-2 and Level-3 	 188 
8.8: Percentage power consumption overhead with respect to ASIC 	 189 
8.9: Percentage power consumption overhead with respect to ASIC 	 191 
List of Tables 
2.1: FPGA Technology Characteristics 34
2.2: Commercial FPGA/PLD Characteristics 34
2.3: Comparison of SEU hardened memory cells 50
3.1: 9/7 Filter coefficients in CSD form 72 
3.2: Comparison of DWT algorithms based upon utilization of proposed array 91 
3.3: Comparison of Map and Place & Route Reports. 92 
3.4: Performance Evaluation of DWT Implementations (clock frequency 75MHz) 93 
4.1: Error Correction and Corresponding Correction Bits 104
4.2: Comparison of different Encoders 110
4.3: Comparisons of syndrome generators. 111 
5.1: Single Fault Recovery Process 122 
5.2: Multiple Fault Recovery Process 122 
5.3: Power requirement and analysis for SEU immunity for ISCAS89 circuits 133 
5.4: %age SEU Power Overhead for ISCAS89 with respect to standard circuits 134 
5.5: Power saving through the proposed scheme with respect to ISCAS89 circuits 135 
5.6: Area requirement for SEU immunity for ISCAS89 bench mark circuits 135 
5.7: Area requirement for SEU immunity for ISCAS89 bench mark circuits 136 
6.1: Signal Probabilities calculation formulas 147
6.2: Input signal Probability (Figure-6.10) 149 
6.3: Probability Calculations for Reference Circuit Figure-6.14 153 
6.4: Probability Calculation Reference Circuit Figure-6.7 154 
6.5: Simulation Results with probability threshold = 0.45 and SEU width = 3ns 156 
6.6: Simulation Results for %age Area Savings SEU Width = 3ns 157 
7.1: The Proposed Syndrome Generation and Analysis 164 
7.2: I/O Pin Comparisons of the DHRC with other SEU Mitigation Techniques 169
7.3: Area Comparison Results of the DHRC with other SEU Mitigation Techniques 170 
7.4: Comparison Results of the DHRC with other SEU Mitigation Techniques 171 
8.1: Power consumption and comparison figures of various DWT algorithms on three 
different platforms for the processing of a single frame of test image, Lenna, 
Barbra and Pepper at image size of 64x64. Test conducted at clock frequency of
40MHz 177 
8.2: Power consumption and comparison figures of various DWT algorithms on three 
different platforms for the processing of a single frame of test image, Lenna, 
Barbra and Pepper at image size of 64x64. Test conducted at clock frequency of
40MHz 178 
8.3: Power consumption and comparison figures of various DWT algorithms on three 
different platforms for the processing of a single frame of test image, Lenna, 
Barbra and Pepper at image size of 128x128. Test conducted at clock frequency
of 40MHz 179 
8.4: Process completion time for various implementation platforms for a single frame 
of test image at image size of 64x64 at clock frequency of 40MHz 180
8.5: Power consumption and comparisons for various implementation platforms for a 
single frame of test image at image size of 64x64 at clock frequency of 40MHz. 
Test conducted for 137488ns. 180
8.6: The idle period power consumption for various implementation platforms for a 
single frame of test image at image size of 64x64 at clock frequency of 40MHz. 
Test conducted for 137488ns. 182
8.7: Comparison of Map and Place & Route Reports (Xilinx Virtex-E). 184
8.8: Comparison of ASIC synthesis results. 185
8.9: Area Comparisons. 185
8.10: Full TMR Power consumption figures of various DWT algorithms on three 
different platforms for the processing of a single frame of test image, Lenna, 
Barbra and Pepper at image size of 64x64. Test conducted at clock frequency of 
40MHz 185
8.11: Mavis et al. [MAV-002] Power consumption figures of various DWT
algorithms on three different platforms for the processing of a single frame of
test image, Lenna, Barbra and Pepper at image size of 64x64. Test conducted at
clock frequency of 40MHz 187
8.12: Area Comparisons of Mavis et al. [MAV-002]. 188
8.13: Lima et al. [LIM-003] Power consumption figures of various DWT algorithms
on three different platforms for the processing of a single frame of test image,
Lenna, Barbra and Pepper at image size of 64x64. Test conducted at clock 
frequency of 40MHz 189 
8.14: Power consumption figures and comparisons of various DWT algorithms on 
different platforms for the processing of a single frame of test image, Lenna, 
Barbra and Pepper at image size of 64x64. Test conducted at clock frequency of
40MHz 190 
8.15: Percentage Power saving figures of the proposed techniques for various DWT 
algorithms on the proposed reconfigurable fabric for the processing of a single 
frame of test image, Lenna, Barbra and Pepper at image size of 64x64. Test 
conducted at clock frequency of 40MHz 191 
8.16: SEU immunity comparison of the proposed techniques for various DWT 
algorithms on the proposed reconfigurable fabric for the processing of a single 
frame of test image (Lenna) at image size of 64x64. 192 
8.17: SEU immunity comparison of the proposed techniques for various DWT
algorithms on the proposed reconfigurable fabric for the processing of a single 
frame of test image (Lenna) at image size of 64x64. 192 
Acronyms and Abbreviations 
1-D One-Dimensional
2-D Two-Dimensional 
ALU Arithmetic Logic Unit 
ASIC Application Specific Integrated Circuit 
C-Box Connection Box 
CLB Configurable Logic Block 
CMOS Complementary Metal Oxide Semiconductor 
COTS Commercially-Off-The-Shelf 
CPU Central Processing Unit 
DCT Discrete Cosine Transform 
DFF D-Flip-Flop 
DHRC Dual Hardware Redundancy with Comparison 
DICE Dual Interlocked Cell 
DSP Digital Signal Processor 
DWT Discrete Wavelet Transform 
DWT-1 Lian et. al. based implementation on the proposed RA (Chapter-8) 
DWT-2 Integer Fast DWT implementation on the proposed RA (Chapter-8) 
DWT-3 Proposed Lifting DWT implementation on the proposed RA (Chapter-8) 
EDAC Error Detection And Correction 
EEPROM Electrically Erasable Programmable Read Only Memory
epi Epitaxial 
FPGA Field Programmable Gate Array 
FPL Field Programmable Logic 
FT Fourier Transform 
HBD Hardening by Design 
HPF High Pass Filter 
HS Half Sample Extension 
IC Integrated Circuit 
IMP-1 The Proposed Lifting Based DWT Implementation (Chapter-3)
IMP-2 Lian et. al. Lifting Based DWT Implementation (Chapter-3) 
IMP-3 9/7 Integer Fast based DWT Implementation (Chapter-3) 
JPEG Joint Photographic Experts Group 
LAB Logic Array Blocks 
LPF Low Pass Filter 
NMOS N-channel Metal Oxide Semiconductor 
PIP Programmable Interconnect Point 
PLD Programmable Logic Devices 
PMOS P-channel Metal Oxide Semiconductor 
PMOS P-channel Metal Oxide Semiconductor field-Effect Transistor 
PTMR-1 Partial Triple Modular Redundancy with proposed Model (Chapter-6)
PTMR-2 Partial Triple Modular Redundancy with Table-6.1 (Chapter-6) 
RA Reconfigurable Array 
RAM Random Access Memory 
RC Reconfigurable Computing 
RTL Register Transfer Level 
S-BOX Switch Box 
SEB Single Event Burnout 
SEE Single Event Effects 
SEFI Single Event Functional Interrupt
SEL Single Event Latch-up
SET Single Event Transient 
SEU Single Event Upset 
SoC System on Chip 
SOI Silicon On Insulator
SRAM Static Random Access Memory 
STFT Short Time Fourier Transform
TID Total Ionizing Dose
TMR Triple Modular Redundancy
VLSI Very Large Scale Integration 
WS Whole Sample Extension 





The development in Very Large Scale Integration (VLSI) digital technology during the 
last decade has led to the development of systems with both high throughput and 
performance capabilities. These capabilities have given systems, devices and applications 
the ability to perform complex and computationally intensive tasks in real time that were 
never before possible. The huge commercial market has encouraged both designers and
manufacturers to develop state of the art electronic circuits, whereas the aerospace and
military market could not keep pace with the commercial market due to limited production
volumes and intensive manufacturing constraints. Among these state of the art commercial
devices, reconfigurable architectures and more specifically Field Programmable Gate 
Arrays (FPGA) are replacing traditional logic circuits by offering the advantages of high 
integration, small size, low power and high flexibility. Applications using reconfigurable 
architectures represent an opportunity for new innovations in the embedded and high 
performance computing industry. Commercially-off-the-shelf (COTS) based 
reconfigurable architectures can provide a clear pathway to technology independence and 
the reduction of spiral development efforts for complex defence- and Aerospace-oriented 
electronics systems. 
As the microelectronics industry has advanced, Integrated Circuit (IC) designs in general 
and FPGA designs in particular have experienced dramatic increases in both density and 
speed, largely due to the decreasing feature sizes with which these devices can be 
manufactured. These advances are not without serious implications for microelectronics 
when used in space applications where ICs are subjected to a total ionizing dose as well as 
single event effects (SEE). Of these effects, SEUs represent the radiation-induced hazard 
most difficult to avoid in space-borne microelectronic systems. In describing SEE
mitigation techniques, we consider CMOS device technologies and their response to the 
cosmic ray environment of space. In particular, we address the following issues: 
• The impact of shrinking device sizes regarding SEUs in space-borne electronics 
• The importance of SET in the combinatorial logic of reconfigurable architectures 
Chapter 1: Introduction 
• The importance of SETs in the configuration bits of reconfigurable architectures
The aim of this thesis is to review upset effects and their mechanisms and to determine the 
implications for present day space-borne FPGAs. We distinguish between two distinct 
upset mechanisms. The traditional SEU mechanism is related to logic state changes in 
storage cells (flip-flops, memories, latches, etc.). An emerging upset mechanism, known as
SET, is beginning to impact the operation of space systems fabricated in deep submicron CMOS
and submicron EEPROM technologies. Several conventional and commonly used SEU
and SET mitigation techniques have been developed, so one of the aims of this research is
to analyse these schemes and their inherent limitations. Finally, a novel circuit
approach is proposed which is inherently immune at any technology feature size to both 
present day upset mechanisms and to emerging upset mechanisms. It not only addresses 
upsets in latches, but also addresses upsets caused by transients in combinatorial logic, 
global clock signals, and global control lines. When applied with special latches, the 
approach can also eliminate upsets caused by a single cosmic ray simultaneously striking 
two sensitive junctions. 
Image and video processing is one of the computationally intensive application areas in
demand across the aerospace/defence, automotive, communications, consumer electronics,
education, and medical electronics industries. With the advancement in VLSI digital
technology, many new high performance and high throughput image and video applications
have emerged, and the demand for them has increased in recent years. At the core of these
applications is image compression technology. The discrete wavelet transform (DWT) is one
of the algorithms that have been developed for the compression of image/video data. DWT is
becoming popular because of its many inherent features, such as progressive image
transmission by quality or resolution and, above all, the ease of manipulating compressed
image data. DWT has been selected as the target application for the proposed architecture
because of its importance across these fields.
The rest of the chapter is organised as follows. Section 1.2.1 advocates the importance of
reconfigurable architectures. Section 1.2.2 establishes the requirement for these state of the
art COTS centric reconfigurable fabrics to be used in the aerospace and military industry.
Section 1.2.3 looks into the challenges faced when incorporating these high performance
devices into the highly demanding aerospace industry. Section 1.3 outlines the main contributions
of the thesis and section 1.4 describes the overall structure of the thesis and chapter
contents. A summary of this chapter is given in section 1.5.
1.2 Motives and Objectives 
FPGAs are an attractive hardware design option for many aerospace related computing 
applications. They can be reprogrammed while in orbit to adapt to changing mission 
requirements and allow design modifications. Although these COTS reconfigurable 
architectures offer several advantages for aerospace based operations, they are sensitive to 
single event upsets. How to take advantage of these COTS reconfigurable architectures,
especially in aerospace applications, is therefore an active research topic.
Several techniques have been proposed in the literature to make these designs more reliable in
the presence of radiation. However, radiation hardening of existing COTS reconfigurable
architectures comes at the cost of extra area, power, or speed. Therefore, it is
imperative to explore new techniques to make these state of the art technologies more
reliable so that they can be used in aerospace or mission critical applications. The main
aims and objectives of this research can be classified as follows:
• Knowledge and comparative study of existing reconfigurable architectures. 
• Knowledge and study of DWT theory and different implementation techniques.
• Knowledge and study of radiation effects. 
• Knowledge and comparative study of existing radiation mitigation techniques. 
• To propose a domain specific reconfigurable architecture for DWT domain. 
• To propose and implement design flow for the proposed architecture. 
• To develop the proposed architecture as a synthesizable soft core. 
• To propose SEU/SET mitigation techniques for sequential circuit elements. 
• To propose SEU/SET mitigation schemes for combinatorial logic circuits. 
• To propose and design an SEU/SET simulator for verification of the proposed
techniques.
• To implement and verify the proposed techniques on the proposed reconfigurable 
architecture. 
• To measure and compare the performance (area, speed, power consumption and 
SEU/SET resiliency) of the proposed SEU/SET hardened reconfigurable 
architecture. 




1.2.1 Reconfigurable Computing 
Various reasons have led to a growing importance of reconfigurable systems. For
instance, they opened up the opportunity to migrate from instruction processing to data 
processing, and thus offer a chance to overcome the classical Von Neumann bottleneck. 
Another partial reason is that reconfigurable devices form a perfect basis for multi-
protocol/multi-standard chips, with which the exploding mask cost can be amortized much 
better. Reconfigurable hardware is nowadays available in different granularities. Most of 
these variants have in common that they can be configured more than once, even at 
runtime. Dynamically reconfigurable systems (DRS) therefore open up a new dimension 
of using chip area. Adaptive systems can be realized, which can react to the needs of the 
running application. Also, aspects of performance efficiency (area, speed, power etc.) or 
fault tolerance can be considered [CAR-001]. Thus, this technology is a perfect match for 
the vision of smart systems for mission critical applications, for example, Aerospace 
related projects. 
When we talk about reconfigurable computing we're usually talking about FPGA-based 
system designs. Unfortunately, that doesn't qualify the term precisely enough. Field 
Programmable Gate Arrays are programmable semiconductor devices that are based on a 
matrix of configurable logic blocks (CLBs). These CLBs are connected through 
programmable interconnects. In contrast to Application Specific Integrated Circuits 
(ASICs) where the device is custom built for a particular design, FPGAs can be 
programmed according to the desired requirements of application or functionality. 
Although one-time programmable (OTP) FPGAs are also available, the dominant type is 
SRAM based which can be reprogrammed as the design evolves [BET-099]. 
System designers use FPGAs in many different ways. One of the most common uses of 
FPGAs is for prototyping the design of an ASIC [COM-099]. In this scenario, the FPGA is
present only on the prototype hardware and is replaced by the corresponding ASIC in the 
final production system. This use of FPGAs has nothing to do with reconfigurable 
computing. However, many system designers are choosing to leave the FPGAs as part of 
the production hardware. Lower FPGA prices and higher gate counts have helped drive 
this change. Such systems retain the execution speed of dedicated hardware but also have 
a great deal of functional flexibility. The logic within the FPGA can be changed if or when 
it is necessary, which has many advantages. For example, hardware bug fixes and 
upgrades can be administered as easily as their software counterparts. 
Reconfigurable computing involves manipulation of the logic within the reconfigurable 
core at run-time. In other words, the design of the hardware may change in response to the 
demands placed upon the system while it is running. It acts as an execution engine for a 
variety of different hardware functions, some executing in parallel, others in serial, much
as a CPU acts as an execution engine for a variety of software threads. Reconfigurable 
computing allows system designers to execute more hardware than they have gates to fit, 
which works especially well when there are parts of the hardware that are occasionally 
idle. Through reconfigurable computing, it is possible to design systems that do more, cost 
less, and have shorter design and implementation cycles. Reconfigurable computing has
several advantages. Firstly, it is possible to achieve greater functionality with a simpler
hardware design. As all of the logic does not need to be present in the reconfigurable core at
all times, the cost of supporting additional features is reduced to the cost of the memory
required to store the logic design. Another advantage is lower system cost, which does not
manifest itself exactly as one might expect. On a low-volume product, there will be some 
production cost savings, which result from the elimination of the expense of ASIC design 
and fabrication. However, for higher-volume products, the production cost of fixed 
hardware may actually be lower [COM-099]. We have to think in terms of lifetime system
costs to see the savings. Here, technical obsolescence drives up the cost of systems based
on fixed-hardware designs. Systems based on reconfigurable computing are upgradable in
the field. Such changes extend the useful life of the system, thus reducing lifetime costs. 
The main advantage of reconfigurable computing is reduced time-to-market. There are no
chip design and prototyping cycles, which eliminates a large amount of development effort.
In addition, the logic design remains flexible right up until (and even after) the product is
shipped, which allows for an incremental design flow. This even allows shipping a product
that meets the minimum requirements and adding features after deployment.
The main aim of this thesis is to design an efficient reconfigurable architecture. The thesis 
also investigates the advantages and limitations of the architecture if it is tailored for a 
particular domain and whether it can provide the required flexibility while maintaining the 
performance advantage over generic reconfigurable cores.
1.2.2 Role of COTS Technology in Aerospace Industry 
In the commercial sector, consumer electronics are driving the need for state of the art 
processors, sensors and digital multimedia semiconductor designs. The vast market 
opportunity and tremendous unit volumes associated with the consumer segment garner
the attention of the largest COTS technology providers. This widespread use and growing
list of applications, for example mobile phones, digital media players, personal digital
assistants and portable computers, dictates a very efficient and flexible solution.
Consumer and commercial applications and their host devices are changing very rapidly.
These changes are mainly attributed to the improvement and development of new
algorithms, advances in technology and consumers' changing demands and wishes. The
typical usable life of a given consumer electronics device can easily be as short as one
buying season and rarely exceeds a few years. These factors have pushed the
consumer/commercial industry towards reconfigurable architectures. Formerly, FPGA
solutions had a reputation for being costly, due to long development cycles and high
development costs compared to traditional software-based solutions. Today's FPGA
devices, however, have a short time-to-market due to high consumer demand and advances in
technology. In addition, they offer low power and greater flexibility, which simplifies adding
a developer's intellectual property (IP) to any system.
The solid state electronics industry has grown in parallel with the jet airplane industry. 
Both were "invented" in the 1940's, saw their first significant applications in the 1950's 
and have grown to maturity since then. In the early days, military and commercial 
aerospace manufacturers depended on a well-developed military electronic components 
and specifications infrastructure to assure long-term availability of components to meet 
their needs. This was possible because the military market sector comprised about 25% of 
the total market; it was responsible for a good deal of the device innovation, and therefore 
"owned" many device designs. As a result, military and commercial aerospace electronic 
design, manufacturing, procurement, operation, maintenance, and support decisions have 
been based on two assumptions: 
• The supply of electronic components will be available for a limited time.
• Component designs will remain stable for long periods of time.
These assumptions are no longer true. The entire aerospace industry (including both 
commercial and military) now consumes less than one per cent of the electronic 
components produced. The major component markets are computers, consumer 
electronics, and others, which do not have the demanding environmental or long 
production life cycle requirements of aerospace products. This means that the availability 
of components specified for aerospace applications is decreasing. Since 1992, at least 12 
major manufacturers of electronic components, including Motorola, Intel, and Philips, 
have left the military market [DSP-iOU]. Therefore, space-borne microelectronics
typically lag behind their commercial counterparts by one or two generations because of
more complicated fabrication steps and low market volume. The limited market results in 
higher unit prices and a longer time-to-market but at the same time requires efficient and 
state of the art technology. Although electronic components and systems are not the 
largest cost elements in military or commercial aerospace vehicles, they are ubiquitous: 
electronic components are to be found in almost every system, including those that are 
primarily mechanical, hydraulic and pneumatic. This has resulted in the steady growth in 
importance of aerospace electronics since the beginning of the jet age. 
Each year, a lot of money is spent and allocated specifically for upgrading and maintaining 
aging and obsolete systems in aerospace and military industry. Therefore it is recognized 
that these obsolete systems should be replaced with a technology which can incorporate 
future changes and upward trends of the technology. 
The obvious choice is reconfigurable architectures, which provide the performance and
flexibility required for mission critical applications. COTS based integrated circuit
technology in general, and reconfigurable architectures in particular, have become a primary
staple within aerospace research and development (R&D) and production teams.
Researchers are aware of how the defence/aerospace market segments can take
advantage of the benefits of COTS reconfigurable technology while still addressing the
performance limitations and obsolescence challenges at hand. There is a natural
progression in computing technology that can serve to address obsolescence issues while
delivering the performance required by complex electronics systems.
Therefore, Programmable Logic Devices (PLD) and more specifically FPGAs are 
replacing traditional logic circuits by offering the advantages of high integration (small 
size, low power, and high reliability) without the disadvantages of custom ASICs (high 
non-recurring engineering cost and high risk, especially in limited production volume). 
SRAM based FPGAs offer an additional unprecedented advantage: they can be
reprogrammed an unlimited number of times, even in the end-user's system.
Reconfigurable architectures are becoming increasingly popular with space related design 
engineers because of the benefits discussed above. The salient feature of
reconfigurable architectures is that they are inherently flexible enough to meet
multiple requirements and offer significant performance and cost savings for critical
applications. As the microelectronics industry has advanced, Integrated Circuit (IC) designs
and reconfigurable architectures (FPGAs, reconfigurable SoCs, etc.) have experienced
a dramatic increase in density and speed.
Advances in System-on-Chip (SoC) based multi-processing architectures will enable 
designers of complex aerospace systems to retain the measurable gains realized through 
the historic exploitation of COTS. Reconfigurable SoC architectures can address the 
growing need to add low power processing capacity while mitigating long term 
obsolescence problems on the front end of the design process. 
Despite the benefits of COTS reconfigurable architectures, reliability issues limit
their widespread use in safety or mission critical applications. Although radiation hardened
reconfigurable devices are available, they are much more expensive than standard
devices, and thus are not affordable when cost and flexibility are major concerns.
Moreover, the majority of SEU immune FPGAs are based on antifuse technology, which does
not allow re-programmability.
1.2.3 Single Event Upsets 
SEU is defined by NASA as "radiation-induced errors in microelectronic circuits caused
when charged particles (usually from the radiation belts or from cosmic rays) lose energy 
by ionizing the medium through which they pass, leaving behind a wake of electron-hole 
pairs." [NAS-WEB]. 
In the last decade, two major factors have increased the importance of SEEs. One
was the dramatic decrease in the number of manufacturers offering radiation-hardened (or 
more particularly to our purposes here, SEU-hardened) digital ICs. This (among other 
factors) led to the increased usage of commercial electronics in spacecraft systems. Many 
system designers at that time embraced the use of modern commercial ICs because of the
increased functionality and performance. Their relative sensitivity to SEE presented 
significant challenges towards maintaining system reliability. The second development 
was the continued progression in fabrication technologies toward smaller IC feature sizes 
and higher speeds and more complex circuitry. These advances typically increase 
sensitivity to SEE, even for terrestrial applications in a benign desktop environment, and 
may also lead to new failure mechanisms. 
As we enter the 21st century, the increased sensitivity to SEU is expected to continue, both 
in memories and core logic. Upsets in terrestrial electronics are a serious reliability threat 
for commercial manufacturers. In fact, single-event vulnerability has become a mainstream
product reliability metric for all facets of the integrated circuit industry, as outlined by the
National Industry Association Roadmap [NTR-099]. Single event upsets on devices have
been constantly magnified due to the continuous technology evolution that has led to more 
and more complex architectures with an immense amount of embedded memories 
followed by an amazing scaling down process of transistor dimensions (Moore's Law) 
[MOO-075]. Semiconductor process technology is approaching the ultimate limits of
silicon in terms of transistor geometry shrinking, power supply, speed and density [NTR-094].
By approaching these limits, the circuits are becoming more and more sensitive to
noise coming from magnetic fields, signal couplings and radiation fields. The need to
protect them has become more and more essential. Terrestrial applications that are
regarded as critical, such as bank servers, telecommunication servers and avionics,
increasingly require the use of fault tolerant techniques to assure reliability. Although many
techniques have been developed in the quest to avoid an SEU, efficient fault tolerant 
solutions are still a challenge for the future generation semiconductor industry, especially 
due to the complexity of the new architectures. This thesis investigates different SEU 
hardening techniques and proposes a number of novel SEU mitigation techniques for 
synchronous and combinatorial elements of microelectronic circuits. The analysis of SEU 
effects on integrated circuits and the development of SEU mitigation techniques are 
strongly associated with the target device architecture. For each different circuit, a
different SEU mitigation solution is most suitable. Consequently, in order to
suggest an SEU mitigation solution, it is necessary to investigate the architecture first. In
the past years the integrated circuit industry has designed complex architectures in order to
improve performance and logic density and to reduce cost. Proposing an SEU
solution for these complex architectures requires a thorough insight into these systems,
which results in more design time and design effort. Therefore, this thesis
investigates reconfigurable architectures and proposes novel SEU mitigation techniques
for these fabrics. Circuit designs that are inherently radiation resistant [known as
hardening by design (HBD)] are receiving considerable attention [FAC-099]. Therefore,
the aim of this thesis is also to propose an efficient reconfigurable architecture for
aerospace applications.
1.3 Contributions 
The aim of this thesis is to realise an efficient reconfigurable architecture which can
mitigate the disruptions caused by single event upsets when used in a hostile environment,
for example aerospace related critical missions. As such, this work has led to the
development of a novel and efficient implementation scheme for the 5/3 and 9/7
JPEG2000 lifting based DWT. This scheme is built around common computational blocks
which allow flexibility to incorporate different innovative algorithms.
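To make the lifting idea concrete, the following is a minimal software sketch of one level of the reversible 5/3 lifting transform as it is commonly formulated for JPEG2000. It is not the hardware scheme proposed in this thesis; the function names `dwt53_forward` and `dwt53_inverse` and the boundary handling (whole-sample symmetric extension on an even-length signal) are assumptions made for illustration only.

```python
def dwt53_forward(x):
    """One level of the reversible 5/3 lifting DWT on an even-length list of ints.

    Illustrative sketch only: names and boundary handling are assumptions,
    not the architecture described in the thesis.
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0
    half = n // 2
    # Predict step: detail (high-pass) coefficients from the odd samples.
    d = []
    for i in range(half):
        left = x[2 * i]
        # Whole-sample symmetric extension at the right edge: x[n] = x[n - 2].
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        d.append(x[2 * i + 1] - ((left + right) >> 1))
    # Update step: approximation (low-pass) coefficients from the even samples.
    s = []
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]  # symmetric extension: d[-1] = d[0]
        s.append(x[2 * i] + ((prev + d[i] + 2) >> 2))
    return s, d


def dwt53_inverse(s, d):
    """Undo dwt53_forward by running the two lifting steps in reverse order."""
    half = len(s)
    x = [0] * (2 * half)
    # Undo the update step to recover the even samples.
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - ((prev + d[i] + 2) >> 2)
    # Undo the predict step to recover the odd samples.
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < 2 * half else x[2 * i]
        x[2 * i + 1] = d[i] + ((left + right) >> 1)
    return x
```

Because each lifting step only adds or subtracts a value computed from samples that the step leaves untouched, the inverse recovers the input exactly in integer arithmetic. Both steps use only additions and shifts, which is one reason lifting is attractive for array-based hardware.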
This work has led to the development of a novel embedded reconfigurable architecture
that can be incorporated into SoC design flow. The architecture is based around 
heterogeneous computing array elements which makes it efficient and flexible for DWT 
domain. 
This work has also led to the development of a number of novel SEU mitigation
techniques for synchronous and combinational elements. The work related to synchronous 
elements of reconfigurable architectures is based on hardware redundancy and temporal 
sampling to mitigate the effects of SEUs and SETs. The efforts regarding combinatorial 
parts of microelectronic circuits have produced two novel SEU mitigation schemes. One is 
a partial TMR based on input signal probability model and the second is based on dual 
hardware redundancy with comparison. 
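The two forms of hardware redundancy named above can be illustrated with a generic sketch. The Python fragment below shows textbook triple modular redundancy (a bitwise majority vote over three copies) and textbook duplication with comparison (two copies and a mismatch flag). The function names are hypothetical, and the sketch does not reproduce the partial TMR or temporal sampling schemes proposed in this thesis.

```python
def tmr_vote(copy_a, copy_b, copy_c):
    """Bitwise majority vote: each output bit takes the value held by at least two copies."""
    return (copy_a & copy_b) | (copy_b & copy_c) | (copy_a & copy_c)


def dwc_compare(copy_a, copy_b):
    """Duplication with comparison: return copy_a and a flag raised when the copies disagree.

    Two copies can detect an upset but cannot tell which copy is wrong,
    so the flag would normally trigger a recomputation (not modelled here).
    """
    return copy_a, copy_a != copy_b


# A single upset in one copy (bit 2 of the third copy) is masked by the voter.
assert tmr_vote(0b1010, 0b1010, 0b1110) == 0b1010
# The same upset is detected, but not corrected, by duplication with comparison.
assert dwc_compare(0b1010, 0b1110) == (0b1010, True)
```

The trade-off visible in the sketch is the usual one: triplication corrects a single upset at roughly three times the hardware cost, while duplication only detects it at roughly twice the cost.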
Finally, the proposed DWT domain specific reconfigurable array fabric together with SEU 
hardening schemes was implemented. The power consumption and area of these 
architectures were evaluated and compared with one another. 
1.4 Thesis Overview 
This thesis is divided into nine chapters, of which this chapter is the first. The
remaining chapters are organised as follows:
• Chapter 2 gives an overview of two different areas. Firstly, general purpose and
domain specific reconfigurable computing are discussed, and the reconfigurable
design strategies and techniques researched and developed in the past few decades are
presented. Secondly, the physical origins of disruptions are presented along with the
work done in the past. Mitigation schemes already employed for general and
domain specific reconfigurable architectures are reviewed and their merits and
limitations are discussed.
• Chapter 3 introduces the theory and scope of the Discrete Wavelet Transform (DWT).
Different types of DWT along with different VLSI implementation schemes are 
discussed. The chapter presents the proposed embedded reconfigurable fabric for 
DWT. Different building blocks for the proposed reconfigurable architecture are 
discussed. In addition the design flow is presented. Finally, hardware architecture 
comparison in terms of area and power consumption is made and presented. 
• Chapter 4 gives an overview of error detection and correction codes (EDAC). A brief 
introduction of different types of EDACs is presented. The details of the proposed 
EDAC codes are discussed and performance is evaluated in terms of area and power 
consumption. 
• Chapter 5 presents the proposed SEU mitigation scheme for synchronous elements of 
the reconfigurable architecture. Various concepts and different blocks of the proposed 
scheme are discussed along with the design flow. In addition, the proposed software 
SEU simulator is introduced. Different possible optimisations for the proposed 
mitigation scheme are discussed. Finally, a power and area evaluation of hardware
performing DWT computations is presented.
• Chapter 6 gives the detailed architecture of the proposed techniques for combinatorial
logic. Two schemes are proposed: a partial TMR approach based on input signal
probability, and dual hardware redundancy with comparison. This chapter
presents the details of the partial TMR scheme. Finally, the performance of
hardware implementing these two schemes for DWT computations is evaluated and
the results are discussed.
• Chapter 7 presents the proposed SEU mitigation scheme for combinational elements 
based on dual hardware redundancy with comparison for the reconfigurable 
architecture. Related work has been explored and the chapter introduces various 
concepts. Different blocks of the proposed scheme are discussed along with the design 
flow. Finally, a power and area evaluation of the hardware realization is presented.
• Chapter 8 details the implementation of the proposed schemes on the proposed 
reconfigurable fabric. Design flow along with comparative evaluation among different 
schemes is discussed. 
• Chapter 9 presents a summary of all the findings of the contributing chapters, 
reiterates the main contributions and gives a summary of the thesis. Finally some 





During recent years, a number of research efforts have focused on the design of new
reconfigurable systems for general purpose and for particular areas of application. The 
driving force behind such growth in the number of research activities is the potential of 
reconfigurable computing to greatly accelerate a wide variety of applications. The work in 
this area flows in two major directions. The first hardware oriented direction is geared 
toward designing new hardware architectures or optimizing the current architectures. The 
second software oriented sub-area of research is focused on the investigation of new 
placement, routing, and mapping methods that tackle the dynamic reconfiguration 
challenges. 
The work in the area of developing new reconfigurable architectures covers the research
on coarse-grained and fine-grained reconfigurable architectures. Additionally, some of
these architectures are built from more than one reconfigurable chip ([MIR-096], [RAD-098],
and [VIL-098]). These architectures (or rather systems) are called Custom
Computing Machines (CCM). In this section of the dissertation the focus will be mainly
on coarse-grained single-chip architectures, since they are related to this work.
The chapter gives details of related previous work done in the fields of reconfigurable
architectures and their hardening techniques. Section 2.2 looks into the different features of
reconfigurable architectures along with their limitations. Sub-sections 2.2.1, 2.2.1.1
and 2.2.1.2 give details of different general purpose reconfigurable architectures based on
their granularity. Section 2.3 highlights the space related issues that arise when these state of the
art microelectronic circuits are used in harsh environments. A brief study of different
basic concepts is presented. A theoretical background of the SEU phenomenon and methods
of mitigating radiation effects is presented. The most commonly used recent reconfigurable
Chapter 2: Literature Review 
products are discussed with their advantages and limitations in terms of SEU 
susceptibility. 
2.2 Reconfigurable Architectures 
The reconfigurable architectures can be classified on the basis of different criteria, for
example granularity (fine grain and coarse grain), re-configurability (dynamically
reconfigurable and statically reconfigurable) and structure (two dimensional and one
dimensional structures). The criterion adopted in this thesis is based on the intended use of
the reconfigurable architectures, or target application. Therefore, the reconfigurable
architectures are classified into two different types, which are:
• General purpose reconfigurable architectures 
• Domain specific reconfigurable architectures 
General purpose reconfigurable architectures mainly consist of FPGA and some other 
general purpose architectures. These are discussed in the next sections. 
2.2.1 General Purpose Reconfigurable Architectures 
FPGAs are mainly classified as general purpose reconfigurable cores. FPGAs are available 
in different granularities by many commercial vendors. The devices are inherently flexible 
to meet multiple requirements and offer significant cost and flexibility advantages. As 
FPGAs are re-programmable, data can be sent after launch to correct errors or to improve 
the performance of spacecraft. Therefore, Field Programmable Gate Arrays are becoming 
increasingly popular with spacecraft electronic designers as they fill a critical niche 
between discrete logic devices and the mask programmed gate arrays. 
The architecture of a programmable device is based on an array of logic blocks whose
interconnections can be programmed to implement different designs. An FPGA logic
block can be as simple as a small logic gate or as complex as clusters composed of many
gates. The routing architecture incorporates wire segments of various lengths, which can
be interconnected via electrically programmable switches. The distribution of different
length wire segments affects the density and the performance of the FPGA. There are
mainly three types of such programmable switch technologies currently in use:
• SRAM, where the programmable switch is a pass transistor controlled by the state
of a configuration bit.
• Anti-fuse, where an electrically programmable switch forms a low resistance path
between two metal layers.
• EPLD/EEPLD, based on an EPROM, EEPROM or FLASH cell, where the switch is a
floating gate transistor that can be turned off by injecting charge into the floating
gate.
Each technology has a particular architecture and type of logic block in its matrix. Table-2.1 and
Table-2.2 summarize the salient features of each technology.
Table-2.1: FPGA Technology Characteristics
TECHNOLOGY   VOLATILE   RE-PROGRAMMABLE
SRAM         Yes        In circuit
Anti-Fuse    No         No
FLASH        No         In circuit
EPROM        No         Out of circuit
EEPROM       No         In circuit
Table-2.2: Commercial FPGA/PLD Characteristics
TECHNOLOGY   ARCHITECTURE       COMPANY   LOGIC BLOCK    EXAMPLES
SRAM         Symmetric Array    Xilinx    LUT            XC4000, Virtex
Anti-Fuse    Row Based Array    Actel     Mux            SX, MX Series
SRAM         Symmetric Array    Altera    LUT            Flex 8K, 10K etc.
EEPROM       Hierarchical PLD   Altera    OR-AND Array   MAX7000 etc.
2.2.1.1 Fine Grain General Purpose Reconfigurable Cores 
Most reconfigurable hardware is based upon a set of symmetric/repeated computation 
structures to form an array. These structures, commonly called logic blocks or cells, vary 
in complexity from a very small and simple block that can calculate a function of only 
three inputs, to a structure that is essentially a 16-bit ALU. Some of these block types are 
configurable - the actual operation is determined by a set of loaded configuration data. 
Other blocks are fixed structures, and the configurability lies in the connections between 
them. Granularity refers to the size and complexity of the computing blocks. 
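The idea that a block's operation is determined by loaded configuration data can be made concrete with a look-up table (LUT), the logic block listed for the SRAM devices in Table-2.2. The following Python sketch models a 4-input LUT as a 16-bit configuration word indexed by the inputs. The class and function names are hypothetical, and the model ignores the flip-flop, carry logic and routing found in real devices.

```python
class Lut4:
    """Behavioural model of a 4-input LUT: 16 configuration bits, one per input pattern."""

    def __init__(self, config_word):
        self.config_word = config_word & 0xFFFF

    def evaluate(self, a, b, c, d):
        # The four inputs form an address into the configuration word.
        index = (a & 1) | ((b & 1) << 1) | ((c & 1) << 2) | ((d & 1) << 3)
        return (self.config_word >> index) & 1


def config_from_function(func):
    """Build the 16-bit configuration word that makes a Lut4 implement func(a, b, c, d)."""
    word = 0
    for index in range(16):
        a, b, c, d = index & 1, (index >> 1) & 1, (index >> 2) & 1, (index >> 3) & 1
        if func(a, b, c, d):
            word |= 1 << index
    return word


# The same hardware model becomes a 4-input AND or a parity generator
# purely by loading different configuration data.
and4 = Lut4(config_from_function(lambda a, b, c, d: a & b & c & d))
parity4 = Lut4(config_from_function(lambda a, b, c, d: a ^ b ^ c ^ d))
```

Because the function lives entirely in the configuration word, an upset in any of those 16 bits changes the implemented logic until the configuration is rewritten.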
An example of a very fine-grained logic block can be found in the Xilinx 6200 series of
FPGAs [XIL-096]. Although this series is no longer available, it is the
finest grain architecture and is a useful product for comparisons. The functional unit from
one of these cells can implement any two-input function and some three-input functions. 
Although this type of architecture is useful for very fine-grained bit manipulation, it is too 
fine grained to efficiently implement many types of circuits, such as multipliers. Similarly, 
finite state machines are frequently too complex to easily map to a reasonable number of 
very fine-grained logic blocks. However, finite state machines are also too dependent upon 
single bit values to be efficiently implemented in a very coarse-grained architecture. This 
type of circuit is more suited to an architecture that provides more connections and 
computational power per logic block, while still providing sufficient capability for bit-level
manipulation. The logic cell in the Altera FLEX 10K architecture [ALT-098] is a
fine-grained structure that is somewhat coarser than the Xilinx 6200. This logic cell
mainly consists of a single 4-input LUT with a flip-flop. Also, there is specialized carry
chain circuitry that helps to accelerate addition, parity, and other operations that use a 
carry chain. These types of logic blocks are useful for bit-level manipulation of data, 
which is frequently found in encryption and image processing applications. Because the 
cells are fine-grained, computation structures of arbitrary bit widths can be created, which 
allows the implementation of data-path circuits that are based on data widths not 
implemented on the host processor (5 bit multiply, 21 bit addition, etc). Reconfigurable 
hardware can not only take advantage of small bit widths, but also large data widths. 
When a program uses bit widths in excess of what is normally available in a host 
processor, the processor must perform the computations using a number of extra steps to 
accommodate the full data width. A fine-grained architecture can implement the full bit 
width in a single step, without the fetching, decoding, and execution of additional
instructions, provided enough logic cells are available. 
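The extra steps a host processor needs for operands wider than its native word can be sketched as follows. The Python function below adds two operands one word at a time and propagates the carry between steps, counting how many word-sized additions are required. The function name and the default widths are assumptions chosen for illustration; a fine-grained fabric with enough logic cells could instead build a single adder of the full width.

```python
def add_on_narrow_host(a, b, operand_width=64, word_width=32):
    """Add two operand_width-bit values using only word_width-bit additions.

    Returns the sum (modulo 2**operand_width) and the number of word-sized
    add steps the narrow host had to perform.
    """
    word_mask = (1 << word_width) - 1
    carry = 0
    result = 0
    steps = 0
    for shift in range(0, operand_width, word_width):
        # Add the next pair of words plus the carry from the previous step.
        partial = ((a >> shift) & word_mask) + ((b >> shift) & word_mask) + carry
        result |= (partial & word_mask) << shift
        carry = partial >> word_width
        steps += 1
    return result & ((1 << operand_width) - 1), steps


# A 64-bit addition on a 32-bit host takes two dependent add steps.
total, steps = add_on_narrow_host(0xFFFFFFFF, 1)
```

Each step depends on the carry of the previous one, so the steps cannot simply be run in parallel on the host.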
2.2.1.2 Medium Grain General Purpose Reconfigurable Cores 
A number of reconfigurable systems use a medium-grained logic block [XIL-094][HAU-097]
[HAY-098] [LUC-098] [MAR-99a]. The Garp logic block [HAU-097] [CAL-000] is designed to
perform a number of different operations on up to four 2-bit inputs. The Xilinx Virtex FPGA
is an example of a medium-grained architecture because its blocks are relatively coarser than those of the
Xilinx 6200 series [XIL-000]. The CHESS architecture [MAR-99a] is another medium-grained
structure that was designed to be embedded inside a general-purpose FPGA to
implement multipliers of a configurable bit-width [HAY-098]. The logic block used in the
multiplier FPGA is capable of implementing a 4x4 multiplication, or can be cascaded into
larger structures. The CHESS architecture also operates on 4-bit values, with each cell 
acting as a 4-bit ALU. Medium-grained logic blocks can implement data-path circuits of 
varying bit widths, similar to the fine-grained structures. The ability to perform more 
complex operations on a greater number of inputs permits this structure to efficiently
implement a wider variety of operations. 
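Cascading 4x4 multiplier blocks into a wider multiplier follows from splitting each operand into 4-bit halves and summing the shifted partial products. The Python sketch below builds an 8x8 multiplication from four 4x4 multiplications in this way. The function names are hypothetical, and the sketch shows only the arithmetic decomposition, not the cell-level structure of any particular device.

```python
def mul4x4(a, b):
    """Stand-in for one medium-grained block: a 4-bit by 4-bit multiplication."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b


def mul8x8(a, b):
    """8-bit by 8-bit multiplication assembled from four 4x4 blocks.

    With a = a_hi*16 + a_lo and b = b_hi*16 + b_lo:
    a*b = a_lo*b_lo + (a_lo*b_hi + a_hi*b_lo)*16 + a_hi*b_hi*256
    """
    assert 0 <= a < 256 and 0 <= b < 256
    a_lo, a_hi = a & 0xF, a >> 4
    b_lo, b_hi = b & 0xF, b >> 4
    low = mul4x4(a_lo, b_lo)
    middle = mul4x4(a_lo, b_hi) + mul4x4(a_hi, b_lo)
    high = mul4x4(a_hi, b_hi)
    return low + (middle << 4) + (high << 8)
```

The same decomposition can be applied again to reach 16x16 and beyond, at the cost of four times as many blocks per doubling of the operand width.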
2.2.1.3 Coarse Grain General Purpose Reconfigurable Cores 
Very coarse-grained architectures are used primarily to implement word-width data-path 
circuits. They perform these operations much more quickly (and consume less chip area) 
than a set of smaller cells connected to form the same type of structure because the logic 
blocks used are optimized for large computations. However, their composition is static
and they cannot leverage optimizations in the size of operands. The RaPiD-I architecture
[EBE-096] and the Chameleon architecture [CHA-WEB] are examples of very coarse-grained
designs. Each of these architectures is composed of word-sized adders, multipliers,
and registers. Even when adding numbers smaller than the full word size, all of the bits in 
the full word size are computed, which can result in unnecessary area and speed 
overheads. However, these coarse-grained architectures are much more efficient than fine-
grained architectures for implementing functions closer to their basic word size. 
An alternate form of a coarse-grained system consists of logic blocks that are very small 
processors, potentially each with its own instruction memory and/or data values. The 
REMARC architecture [MIY-098] is composed of an 8x8 array of 16 bit processors. Each 
of these processors uses its own instruction memory in conjunction with a global program 
counter. This style of architecture closely resembles a single-chip multiprocessor with 
much simpler component processors, as the system is meant to be coupled with a host 
processor. The RAW project [MOR-098] is another example of a reconfigurable 
architecture based on a multi-processor design. 
The Morphosys architecture is an example of a coarse-grain parallel reconfigurable system
on chip (SoC). The main design motive of the Morphosys architecture was to speed up general
purpose computation intensive applications [LU-099]. It is mainly targeted at
applications with inherent data-parallelism and high regularity. However, it can be used
for general purpose computing due to the included processor. Some examples of the
application areas for the Morphosys architecture are video compression (DCT, ME),
data encryption and DSP [LU-099]. These applications exhibit inherent parallelism and are
highly regular. The Morphosys architecture is a combination of a RISC processor with an
8x8 array of coarse-grain reconfigurable cells [LU-099].
The Colt architecture [BIT-097] is implemented by configuring pipelines or parts of
pipelines. The concept is similar to the PipeRench architecture [GOL-000][HER-002][GOL-099].
However, the Colt architecture is based on Wormhole Run Time Reconfiguration
(RTR) as an execution model [BIT-097]. The pipelines are used to process data streams 
whereby the data flow graph is reduced to a set of interacting pipelines. The configuration 
data holds the information about the routing and the functionality of all processing 
elements [BIT-097]. 
The KressArray [HAR-098] is a two dimensional array of 32-bit processing elements. 
These processing elements are classified as reconfigurable Data Processing Units (rDPUs). 
The KressArray is designed for general use in contrast to a domain specific reconfigurable 
solution. The KressArray supports partial dynamic reconfiguration [HAR-098].
The Pleiades is an example of reconfigurable computing fabrics that do not follow the two 
dimensional array topology. The Pleiades (Ultra-Low-Power Hybrid and Configurable 
Computing) was designed for low power and high performance for multimedia computing 
applications [WAN-000] [RAB-097] [ZHA-099]. 
The PipeRench architecture is a coarse grain reconfigurable architecture [GOL-000][HER-002]
[GOL-099]. The fabric is an example of architectures based on a linear array. The
PipeRench is specifically designed to speed up the reconfiguration process through a 
unique technique called pipeline reconfiguration [HER-002]. The technique allows one 
stage of the pipeline path to be configured in every cycle, while concurrently executing all 
other stages. 
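The overlap of configuration and execution described above can be pictured with a toy schedule. The Python sketch below assumes a round-robin mapping of virtual pipeline stages onto a smaller number of physical stages, with exactly one physical stage being configured per cycle while the remaining loaded stages execute. The round-robin policy, the function name and the data layout are illustrative assumptions and are not taken from the PipeRench publications.

```python
def pipeline_reconfiguration_schedule(num_virtual_stages, num_physical_stages):
    """Toy schedule: in each cycle one physical stage is configured, the others execute.

    Returns one dict per cycle with the (physical, virtual) pair being configured
    and the list of (physical, virtual) pairs that are executing.
    """
    resident = [None] * num_physical_stages  # virtual stage held by each physical stage
    schedule = []
    for cycle in range(num_virtual_stages):
        target = cycle % num_physical_stages  # assumed round-robin placement
        resident[target] = cycle              # load virtual stage `cycle`
        executing = [
            (physical, virtual)
            for physical, virtual in enumerate(resident)
            if virtual is not None and physical != target
        ]
        schedule.append(
            {"cycle": cycle, "configuring": (target, cycle), "executing": executing}
        )
    return schedule


# Five virtual stages time-multiplexed onto three physical stages.
schedule = pipeline_reconfiguration_schedule(5, 3)
```

In cycle 3 of this example, physical stage 0 is reloaded with virtual stage 3 while physical stages 1 and 2 continue to execute, which is the overlap the technique relies on.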
Some of the coarse grain reconfigurable architectures are discussed above. Surveys
of academic and commercial reconfigurable computing platforms
can be found in [VIL-098] [HAR-001] [ABI-WEB] [HAR-01A] [COM-099].
Some of the information on these architectures is not disclosed in the
literature because it is proprietary. The best known and most relevant
general purpose reconfigurable architectures are presented above on the basis of published
information.
2.2.2 Domain Specific Reconfigurable Architectures 
Domain specific reconfigurable architectures are an alternative to general purpose
reconfigurable computing. The idea of domain specific architectures is inspired by high performance
application specific reconfigurable architectures. The coarse-grained general purpose
reconfigurable architectures were developed to overcome the performance limitations of
fine-grained reconfigurable architectures. The coarse-grained reconfigurable architectures
can be classified as domain-generic reconfigurable architectures. For example, the Morphosys
architecture is better suited than fine grain general purpose reconfigurable
architectures to all application domains that have inherent data parallelism and are
highly regular. Within this wide spectrum of domains, however, some applications achieve
better performance than others on a particular coarse grain general
purpose reconfigurable architecture. The reconfigurable instruction cell array (RICA) can be
classified as a domain generic reconfigurable architecture [RICA-01]. Specific cores are not
restricted to a particular domain, and can run any code; however, power consumption will
be lowest if they are used for their targeted application [RICA-01]. RICA is under development,
with three cores targeting networking applications, speech
synthesis, and software defined radio [RICA-01]. Another example of this type of
reconfigurable architecture is the Totem project [SCO-03]. The architecture is based on the
RaPiD coarse grain architecture, and interconnects are refined through programmable array
logic (PAL), programmable logic array (PLA) and complex programmable logic devices
(CPLD). The domain specifications are defined at the higher abstraction level, appropriate
functional blocks are selected from the design library and re-configurability is added
through the RaPiD coarse grain reconfigurable architecture. The architecture proposed by
Arthur et al. is another example of domain specific reconfigurable architectures [ART-096].
The architecture is based on a heterogeneous array of satellite processors and a control
processor to configure the array of processors.
The concept of domain specific reconfigurable computing has gained
considerable attention from the research community in the last few years. Some domain specific cores have
been developed around coarse grain general purpose architectures. Carl et al. [CAR-004]
developed an OFDM receiver based on the RaPiD coarse grain reconfigurable architecture.
As discussed earlier, RaPiD is based on a linear array with no register file and no crossbar
interconnects. Therefore, the OFDM core inherits the limitations of the underlying
architecture.
Commercial solutions for telecommunication and wireless applications are being 
developed by Chameleon Systems [SMI-099][CHA-WEB] and MorphICs [MOR-WEB].
Chameleon Systems Inc. announced the CS2000 family of multi-protocol multi-
application reconfigurable platforms for telecommunication and data communication. 
2.2.2.1 Domain Specific Reconfigurable Cores for DWT 
In the published work reviewed, no architecture was designed specifically for DWT.
Some work has been done to add configurability at the algorithmic level for the DWT domain, but
these algorithms were implemented on general purpose FPGAs. For example, Georgi et al.
implemented lifting based DWT on Xilinx cores and Sarin et al. proposed an adaptive
reconfigurable encoder based on Xilinx FPGAs [GEO-001][SAR-001]. Additionally, a
general architecture that performs well for all applications is very hard (if not
impossible) to design [HAR-096]. Therefore, tailored or domain specific
architectures for different areas of application are currently necessary. Presently, no formal
methods or guidelines have been devised for reconfigurable architecture design. Today,
image processing applications are of critical importance. Therefore, it is important to
develop models and rules for designing reconfigurable architectures for these applications.
2.3 Radiation Effects 
Microelectronic devices experience different kinds of radiation effects when exposed to
radiation. Some of them are destructive while others are non-destructive in nature.
Figure-2.1 shows the different types of radiation effects on devices. A brief definition of these
effects is presented.
[Figure-2.1 is a classification tree. Radiation effects on devices divide into Single Event
Effects and Cumulative Effects. Single Event Effects comprise Transient SEE, Permanent SEE
(Single Event Burnout, Single Event Latch-up, Single Event Gate Rupture) and Static SEE
(Single Event Upsets, Single Event Functional Interrupt). Cumulative Effects comprise
Total Ionization Dose and Displacement Damage.]
Figure-2.1: Different Types of Radiation Effects on Devices.
2.3.1.1 Cumulative Effects 
Cumulative effects are caused by different particles, for example electrons, protons,
neutrons, alpha particles, heavy ions, and gamma rays. These effects are long-lived.
The accumulation of defects induced by the deposited energy alters the material properties
permanently, which results in changes in device parameters. These effects are further
divided into two categories depending on the process through which the radiation deposits
its energy in the material.
Displacement Damage 
Displacement damage is mainly due to neutrons, protons, alpha particles, heavy ions, and
gamma rays with very high energy. These high energy particles lose their energy by
displacing atoms from their original positions. The deposited energy that contributes to
displacement damage does not cause ionization in the material. This energy loss is called
Non Ionizing Energy Loss (NIEL). The dislocated atoms create additional energy levels
(defect states). These defect states allow trapping, recombination, and intermediate
thermal excitation to the conduction band. These in turn change the carrier lifetime,
mobility, and concentration, and therefore the device parameters. Specific examples are
gain decrease in bipolar devices, threshold voltage shift in Junction Field Effect
Transistors (JFET), and leakage current in PIN (p-type intrinsic n-type) diodes.
Total Ionizing Dose (TID)
High energy radiation deposits energy by causing ionization in the material. The
ionization can change the charge excitation, charge transport, bonding, and decomposition
properties of the material, and therefore the device parameters. An important example is
charge sheet build-up near the Si-SiO2 interface in MOS (Metal Oxide Semiconductor)
structures and also the creation of interface charge states. This causes threshold voltage shifts
and leakage currents in MOS transistors.
2.3.1.2 Single Event Effects (SEE) 
These effects can be classified into three sub-categories: 
i) Static Single Event Effects
Static effects are classified as non-destructive in nature. These are further divided into two
categories:
a) Single Event Upset (SEU): a SEU is a change of state or transient induced by
an energetic particle, such as a cosmic ray or proton, in a device. These are "soft
errors": a reset or rewriting of the device restores normal device behaviour
thereafter.
b) Single Event Functional Interrupt (SEFI): a SEFI is a severe SEU in
which an upset in the device's control circuitry places the device into a test mode,
halt, or undefined state. The SEFI halts normal operation of the device, and
requires a power reset to recover.
ii) Transient Single Event Effects
Charge collection from an ionization event creates a spurious signal that propagates in the
circuit. These are classified as soft errors.
iii) Permanent Single Event Effects
These are classified as hard errors and may be destructive in nature. These effects can
cause failure of some parts of a logic device or complete device failure. These permanent
single-event effects are further divided into three sub-categories:
a) Single Event Latch-up (SEL): a condition which causes loss of device
functionality due to a single-event-induced high current state. A SEL may or may
not cause permanent device damage, but requires power strobing of the device to
resume normal device operation.
b) Single Event Burnout (SEB): a condition which can cause device
destruction due to a high current state in a power transistor.
c) Single Event Gate Rupture (SEGR): a single ion induced condition in
power MOSFETs that may result in the formation of a conducting path in the gate
oxide.
2.3.2 Single Event Upsets 
SEE are caused when highly energetic particles present in the natural space environments 
(e.g. protons, neutrons, alpha particles or any other heavy ions) strike sensitive regions of a 
microelectronics circuit. The outcome of the strike depends on many factors, therefore the 
strike may cause: 
• No observable effect,
• A transient disruption of circuit operation,
• A change in logic state,
• Permanent damage to the device or integrated circuit (IC).
In this section, we examine the basic physical mechanisms causing SEE in digital
microelectronics for space-borne applications. Reflecting their relative
importance in the commercial marketplace, we concentrate on silicon CMOS devices
and digital ICs. The scope of this thesis is limited to non-destructive SEEs. We
begin with a brief historical overview of the discovery of SEU in space and terrestrial
systems. We then discuss the mechanisms and characteristics of non-destructive SEE
in detail, with particular emphasis on SEU in DRAM and SRAM and SET in logic. The
traditional SEU mechanism is related to changes in the logic state of storage cells (e.g.
flip-flops, memories, or latches). We then cover several conventional and commonly
used techniques to mitigate SEU and SET at different levels (e.g. device, circuit, and
system), along with their inherent limitations. Finally, we look at different industrial
reconfigurable architectures and the techniques employed by industry to make their
products radiation hardened.
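The traditional SEU mechanism, a change in the logic state of a storage cell, can be modelled in software as a single bit flip in a stored word. The Python sketch below injects such a flip, shows that a stored parity bit detects it, and restores the word by rewriting it. The function names are hypothetical, and the single parity bit is used only as the simplest possible detector; it is not one of the mitigation schemes proposed in this thesis.

```python
def inject_seu(stored_word, bit_position):
    """Model a SEU as the inversion of one stored bit."""
    return stored_word ^ (1 << bit_position)


def parity_of(word):
    """Parity of a non-negative word: 1 if it holds an odd number of ones, else 0."""
    return bin(word).count("1") % 2


golden_word = 0b10110010                    # value originally written to the register
stored_parity = parity_of(golden_word)      # parity bit stored alongside it

upset_word = inject_seu(golden_word, 3)     # particle strike flips bit 3
upset_detected = parity_of(upset_word) != stored_parity

# A soft error leaves no permanent damage: rewriting the cell restores it.
restored_word = golden_word
```

A single parity bit detects any odd number of flipped bits but cannot locate them, which is why correction requires either redundancy or a stronger code.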
2.3.2.1 Brief History 
Surprisingly, the first paper to deal with the issue of SEU was not a paper on the use
of electronics in the space environment, but a paper assessing scaling trends in terrestrial
microelectronics [WAL-062]. In this paper the authors forecast the eventual occurrence of
SEU in microelectronics due to terrestrial cosmic rays and deduced that the
minimum volume of semiconductor devices would be limited to about 10 µm on a side due
to these upset mechanisms. However, the first confirmed report of cosmic-ray induced
upsets in space was presented in 1975 by Binder et al. [BIN-075]. In that paper, four
upsets were reported in 17 years of satellite operation, observed in bipolar J-K flip-flops.
It took a few years before the importance of SEU was fully recognized. Significant SEU-
related research was not published until 1978-1979. The soft errors in terrestrial 
microelectronics manifested themselves shortly after the first observations of SEU in 
space [MAY-079]. This important discovery by Intel engineers revealed a significant error 
rate in DRAMs as integration density increased up to 64K, spurring a flurry of terrestrial 
SEU-related work in the late 1970s [ZIE-078]. The primary cause of soft errors at the 
ground level was attributed to alpha-particle contaminants in packaging materials [MAY-079].
By using low-activity materials for IC fabrication and on-chip shielding coatings
[MAY-79A][PIC-078], the terrestrial soft error problem essentially disappeared for several
years and has only recently become of serious concern again. In the late 1970s the first
model to predict system error rates was formulated [WAY-079].
The term "Single-Event-Upset" was introduced by Guenzer et al. [GUE-079] and
immediately adopted by the community to describe upsets caused by both direct and
indirect ionization. The first watershed report on single-event latch-up (SEL), describing
the potentially destructive nature of the failure mode, was published in 1979 [KOL-079].
By 1979, the occurrence of soft errors due to proton and neutron indirect ionization was
well established [GUE-079][KOL-079]. This was a very important discovery due to
the presence of large numbers of protons as compared to heavy ions in the natural space
environment. In fact, proton-induced SEE often dominates the single event response of
commercial ICs operating in low earth orbits [GUE-079].
In the early 1980s, research on SEU continued to expand, and methods for hardening ICs
against SEU were widely developed and used throughout the decade [GID-085][DIE-084].
Much of the research in the 1980s was focused on errors observed in latch circuitry, such as
DRAMs, SRAMs, non-volatile memories, latches, and registers. There were, however, a
few studies in the 1980s addressing another emerging and potentially troublesome single-event
issue: errors due to single events in combinational or embedded core logic. Other
studies of combinational logic were done in the late 1980s (e.g., [FRI-085][MOO-088]), but
the work on combinational logic was mainly overshadowed by the huge volume of work
addressing memory upset.
In the 1990s, two major factors contributed to the increased importance of SEEs. One was
the dramatic decrease in the number of manufacturers offering radiation-hardened (or 
more particularly to our purposes here, SEU-hardened) digital ICs. This among other 
factors led to the increased usage of commercial electronics in spacecraft systems. Many 
system designers at that time embraced the use of modern commercial ICs because of the 
increased functionality and performance. Their relative sensitivity to SEE presented 
significant challenges towards maintaining system reliability. The second development 
was the continued advancement in fabrication technologies toward smaller IC feature sizes 
and the higher speeds and more complex circuitry. These advances typically increase 
sensitivity to SEE, even for terrestrial applications in a benign desktop environment, and 
may also lead to new failure mechanisms. These two factors, or the developments of the 
past decade, have led to an interesting convergence of applications from two different 
communities within the integrated circuit field: 
As we enter the 21st century, semiconductor process technology is approaching the 
ultimate limits of silicon in terms of transistor size, power supply, speed and density 
[LIM-001][MAV-002]. These advantages are certainly not without serious concerns, as circuits 
are becoming more and more sensitive to noise coming from magnetic fields, signal 
coupling and radiation fields. The requirement to protect them has become more and 
more vital [McI-002]. Although many techniques have been developed in the quest to 
avoid SEU, efficient fault tolerant solutions remain a challenge for the future 
generation semiconductor industry due to the complexity of the new architectures. 
2.3.2.2 Physical Origins of SEUs and SETs 
The effects of technology scaling on the single-event response of microelectronics are a 
direct result of the physics of energy loss, charge collection, and upset due to a cosmic ray 
striking a junction in an IC device. The review here is brief and qualitative. Many good 
summaries exist [PAT-097][MAS-097][SEX-092] which review these concepts in more 
detail. 
When an energetic ion passes through any material it loses energy through interactions 
with the bound electrons, causing ionization of the material and the formation of a 
dense track of electron-hole pairs. The rate at which the ion loses energy is known as the 
stopping power (dE/dx). The incremental energy dE is usually measured in units of MeV, 
while the material thickness is usually measured as a mass thickness in units of mg/cm² 
[LAB-099][BRY-098]. The radiation effects community has adopted the term LET 
(Linear Energy Transfer) for the stopping power. An ion with an LET of 100 MeV-cm²/mg 
deposits approximately 1 pC of electron-hole pairs along each micron of its track 
through silicon [SEX-092]. In the presence of electric fields, these electron-hole pairs 
quickly separate as they drift in opposite directions in the field and are quickly collected 
by the respective voltage sources. This phenomenon produces a current transient. In bulk 
CMOS designs, such electric fields are present across every pn junction in the device. If a 
heavy ion strikes a junction connected to a signal node, a current transient is subsequently 
observed on the signal. The initial prompt current pulse is short lived, lasting on the order 
of only 100 × 10⁻¹² to 200 × 10⁻¹² sec [MAV-098]. High energy protons and neutrons also 
produce similar effects indirectly through nuclear reactions within the silicon. In these 
cases, a heavy ion recoil reaction by-product passes through a junction and produces a 
similar charge collection current pulse. In space, high energy protons primarily originate 
from the trapped proton radiation belts and from solar flares. For high-altitude aircraft, 
both high energy neutrons and protons are encountered as reaction by-products found in 
cosmic ray showers formed when an energetic heavy ion from space undergoes a nuclear 
reaction in the atmosphere. Sometimes, terms like "Radiation Hardness" and "Radiation 
Tolerance" are used to define the SEU resiliency of a particular device. For example, the 
minimum value of the LET threshold for radiation tolerance is 40 MeV-cm²/mg, whereas for 
radiation hardness it is 80 MeV-cm²/mg [NIC-099]. However, these terms are used 
interchangeably for SEU-hardening by design, and the distinction is not a key parameter for 
space worthiness in this particular case [NIC-099]. 
Over roughly the last 15 years, these induced currents have been mainly responsible for the 
SEUs observed in airborne circuits, typically in static latches and SRAMs [DOD-095]. The 
effect of these currents on a circuit depends on the response of the circuit to the charge 
collected on the particular signal node. Basically, the capacitance C of the signal node 
determines how large a voltage swing dV results from the collection of a charge dQ, 
according to dV = dQ/C. For latches and SRAMs, positive gain feedback loops result in a 
data bit flip once the collected charge reaches a critical value (Qcrit), sufficient to drive a 
node voltage over the switching voltage. SEU in static latches and SRAMs became an 
important issue once feature sizes dropped below 10 microns and the critical charge for 
upsetting a circuit dropped below 1 pC (roughly corresponding to a particle with an LET of 
50 MeV-cm²/mg and a collection depth of 2 microns) [SEX-092][DOD-095]. Static latch 
SEU vulnerability has been calculated and measured as a function of technology feature 
size to establish a relationship between the critical charge required to upset a circuit and the 
technology feature size [PAT-082][SEX-092]. Experimentally observed LET thresholds for 0.8 
micron standard cell latch designs have been as low as 5 MeV-cm²/mg and as high as 20 
MeV-cm²/mg [DOD-095][PAT-082]. 
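The relationships quoted above (about 1 pC per micron at an LET of 100 MeV-cm²/mg, and dV = dQ/C) can be checked with a short calculation. The following is a minimal sketch, not part of the original work: it assumes textbook values for silicon (3.6 eV per electron-hole pair, density 2.33 g/cm³), and the function names and the node capacitance used in the usage lines are hypothetical.

```python
# Illustrative sketch only. Constants are textbook values for silicon;
# function names and the example node capacitance are hypothetical.
E_PAIR_EV = 3.6              # mean energy to create one electron-hole pair (eV)
SI_DENSITY_MG_CM3 = 2330.0   # density of silicon (mg/cm^3)
Q_ELECTRON_C = 1.602e-19     # elementary charge (C)


def charge_per_micron_pc(let_mev_cm2_mg):
    """Charge (pC) deposited along one micron of track in silicon."""
    # LET * density gives MeV per cm of track; 1 micron = 1e-4 cm.
    mev_per_micron = let_mev_cm2_mg * SI_DENSITY_MG_CM3 * 1e-4
    pairs = mev_per_micron * 1e6 / E_PAIR_EV
    return pairs * Q_ELECTRON_C * 1e12


def collected_charge_pc(let_mev_cm2_mg, depth_um):
    """Charge (pC) collected over a given collection depth."""
    return charge_per_micron_pc(let_mev_cm2_mg) * depth_um


def voltage_swing_v(dq_pc, node_cap_ff):
    """Voltage swing dV = dQ / C for a charge in pC and a capacitance in fF."""
    return (dq_pc * 1e-12) / (node_cap_ff * 1e-15)


def causes_upset(let_mev_cm2_mg, depth_um, q_crit_pc):
    """True if the collected charge reaches the critical charge."""
    return collected_charge_pc(let_mev_cm2_mg, depth_um) >= q_crit_pc


if __name__ == "__main__":
    print(charge_per_micron_pc(100.0))    # about 1.04 pC per micron
    print(collected_charge_pc(50.0, 2.0)) # about 1.04 pC over 2 microns
    print(voltage_swing_v(1.0, 500.0))    # swing on a hypothetical 500 fF node
```

With these constants, an LET of 50 MeV-cm²/mg over a 2 micron collection depth yields just over 1 pC, which agrees with the critical charge figure quoted in the text.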
2.4 Conventional SEU/SET Mitigation Techniques 
In this section the emphasis is on mitigation techniques for memory due to its importance 
in reconfigurable architectures. Chapter-6 looks in detail at measures specific to logic 
design. Several obvious hardening approaches can be used to help mitigate static latch 
upsets as well as transient pulses in combinatorial logic and clock lines. The most 
straightforward approach is to use high drive transistors with increased node capacitance 
[WHI-093]. High drive transistors dissipate the transient charge more quickly, and high node 
capacitance reduces the voltage excursions resulting from the transient charge. This 
obviously results in physically larger circuits with longer propagation times, due to the 
high node capacitance. These conventional approaches have successfully produced 
circuits with SEU LET thresholds above 30 MeV-cm²/mg for designs fabricated with 
technologies greater than 0.5 micron [PAT-097]. This approach is no longer very practical, 
because technology feature sizes are reaching the deep submicron level, where it is nearly 
impossible to design latches with LET thresholds exceeding 3-5 MeV-cm²/mg [PAT-097]. The 
first SEU mitigation solution, used for many years in the spacecraft industry, was 
shielding. It reduces the particle flux but does not eliminate it completely. One example 
of shielding is aluminium, which can stop electrons but is inefficient at stopping cosmic 
rays with energies of MeV to GeV. Shielding is also inefficient against neutrons 
[MAV-098][WHI-093]. Additional techniques are therefore needed to avoid SEU. The primary 
SEU mitigation techniques used nowadays are based on the following considerations. 
2.4.1 Process Technology 
One solution for SEU is through process technology. The epitaxial (epi) bulk CMOS process 
may avoid latch-up completely, but it comes at the price of increasing the device 
cross-section. The "epi" technology can ensure sufficient reliability for some applications. 
Silicon on Insulator (SOI) technology, on the other hand, is characterized by placing a thin 
layer of silicon on top of an insulator during the chip manufacturing process. The 
transistors are built on top of this thin layer, which helps in reducing capacitance [MUS-001]. 
The chip performance can thus be enhanced, and this is the main advantage of SOI 
technology. The isolation of each transistor makes SOI technology essentially latch-up 
free. The results presented in [COL-001] show that SOI with body ties has reduced the 
error rate by a factor of 50 to 500. In short, appropriate use of SOI can increase the 
reliability of the device in the presence of SEU, but it does not completely mitigate the 
effects of single events. 
2.4.2 Hardened Memory Cells 
Memory cells are considered the most SEU-sensitive elements in any microelectronic 
design, so a solution that protects memory cells against radiation is the obvious 
choice for improving reliability in space and military applications. Different circuits 
(microprocessors, memories, ASICs, programmable circuits and others) can be protected 
against SEU by replacing all original memory cells with a radiation hardened version of the 
memory cell. The memory hardening technique is based on three basic rules: 
• Technology specific design of the SEU hardened memory cell 
• The instantiation of the new memory cell 
• Performance evaluation (timing constraints) 
Different SEU hardened memory cells have been presented in [CAR-096][VEL-094] 
[KAT-001]. All of them are based on either duplication or triplication using a restore 
feedback approach. They differ from one another in the number of transistors, 
performance and level of immunity against SEU. The most important characteristic is that 
almost none of them accumulate upsets, because by construction the bit 
flip is completely avoided. The main drawbacks are area overhead and performance 
penalty. Three different standard memory cells without SEU tolerance are presented in 
Figure-2.2 [RAB-096]. Each cell has a different number of transistors: circuits (a), 
(b) and (c) have 4, 5 and 6 transistors, respectively. The memory cell shown in Figure-2.2(b) 
is used in Xilinx FPGAs. However, the circuit presented in Figure-2.2(c) is the most commonly 
used. 




Figure-2.2: Commonly Used Basic RAM Memory Cells, panels (a), (b) and (c) [RAB-096] 
In the last 10 years some design hardened memory elements have been developed 
[BES-093][CAL-096][LIU-092][WHI-091][WIS-093]. In the next subsections some of these 
memory cells will be presented. 
A charged particle affects a memory cell by inducing a current pulse in the drain of an 'OFF' 
transistor, which can flip the memory data. Figure-2.3 presents the design of a typical 
6-transistor memory cell and shows how it can be affected by a charged particle. For 
example, if 'Bit' is at logic "0", transistors P2 and N1 are turned 'ON' while P1 and 
N2 are 'OFF'. If a heavy ion hits the drain of an 'OFF' transistor (P1 or N2), that transistor 
will start conducting. For example, if P1 is upset by radiation, N2 turns 'ON' and 
node A changes its logic state from its initial value to "1". 
Figure-2.3: Effect of a Charged Particle on a Basic Memory Cell [RAB-096] 
SEUs are produced when a single charged particle strikes an integrated circuit. As stated 
before, the SEU targets the drain of an 'OFF' transistor. When a single charged particle 
strikes an integrated circuit element, it loses energy through the production of 
electron-hole pairs. This results in a dense ionized track in the local region. This ionization 
causes a transient current pulse [BES-093]. Figure-2.4 illustrates this phenomenon. 
Figure-2.4: Physical Phenomenon of SEU: (a) CMOS Transistor, (b) Capacitor [BES-093] 
i) Hardened Gate Resistor Memory Cell 
This solution uses gate resistors to protect the memory cell data from SEUs [WEA-087]. 
In this case, resistors are introduced into the feedback paths. These feedback 
resistors, in conjunction with the gate capacitance, create a low pass filter that suppresses 
the effects of SEU induced transients. The time constant must be on the order of a nanosecond 
or so to avoid logic state changes due to particle strikes [JOH-095]. The deliberate 
introduction of delay degrades the performance of the memory element, especially when 
the tolerance of the resistive element is taken into account. Figure-2.5 presents the resistor 
based SRAM memory cell. 
Figure-2.5: Gate Resistor Based SRAM Memory Cell [WEA-087] 
The drawbacks of this design technique are its high temperature sensitivity, the resulting 
variation of performance with temperature, and the extra processing steps (an extra mask) 
required in the fabrication process. 
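The filtering effect of the feedback resistors can be illustrated with a first-order RC model. The sketch below is an assumption-laden illustration, not the analysis of [WEA-087]: it treats the SEU transient as a rectangular voltage pulse, and the resistor, capacitance, supply and switching threshold values are hypothetical.

```python
import math

# Illustrative first-order model; all component values are hypothetical.


def rc_time_constant_s(r_ohm, c_farad):
    """Time constant of the feedback resistor and gate capacitance."""
    return r_ohm * c_farad


def peak_after_pulse_v(v_pulse, width_s, tau_s):
    """Peak voltage reached at the filter output for a rectangular pulse."""
    return v_pulse * (1.0 - math.exp(-width_s / tau_s))


def pulse_is_suppressed(v_pulse, width_s, tau_s, v_switch):
    """True if the filtered pulse stays below the switching threshold."""
    return peak_after_pulse_v(v_pulse, width_s, tau_s) < v_switch


if __name__ == "__main__":
    tau = rc_time_constant_s(100e3, 10e-15)  # 100 kOhm and 10 fF give 1 ns
    # A 200 ps, 3.3 V transient against a 1.65 V switching threshold.
    print(pulse_is_suppressed(3.3, 200e-12, tau, 1.65))
    # The same transient with a ten times smaller time constant.
    print(pulse_is_suppressed(3.3, 200e-12, tau / 10.0, 1.65))
```

In this model a time constant near one nanosecond keeps a 200 ps transient well below the threshold, while a much smaller time constant lets it through. The same time constant is also the delay penalty paid on every legitimate write.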
ii) IBM Memory Cell 
A hardened memory cell design was first proposed by IBM in a standard CMOS 
technology process [ROC-092]. It uses 6 transistors to build the memory part and 6 
more p-channel transistors to provide SEU immunity to the latch circuitry. It is shown in 
Figure-2.6. The design has two data state-control transistors, shown as PA and PB in 
the figure. The PC and PD transistors are pass-transistors, and PE and PF are cross-coupled 
transistors. The sensitive nodes (A, B, C and D) are highlighted in the figure. 






Figure-2.6: Transistor Diagram of IBM Memory Cell [ROC-092] 
When a particle hits node A, the node instantly goes low and the cell is momentarily unstable, 
with both nodes A and B at a relatively low potential. Transistor PD momentarily turns 'ON', 
but node D cannot go low enough to turn PB fully 'ON' since transistor PF remains 'ON'. 
However, the fully 'ON' PA transistor reinforces the pre-hit relatively positive data state at 
node A and restores node A without logic upset. If a particle hits node B, node B 
instantaneously goes high, turning transistor PC 'OFF' and momentarily isolating node C 
at its relatively low potential. The gates of transistors P1 and N1 are connected to node B, 
which results in a data feedback response. This causes node A to go low. However, 
transistor PA, being 'ON', reinforces the pre-existing high state at node A, so node A 
maintains its high logic state. Node B therefore eventually returns to its pre-hit low logic 
state after the momentary disturbance: transistor N2 pulls node B down, and it 
recovers from the logic upset. The same can be explained for node C. The main 
advantages of the IBM memory cell are its low static power and its SEU immunity. However, 
its main limitations are the large transistor count and large transistor sizes. 
iii) NASA Memory Cell 
The NASA memory cell is also known as the 'Whitaker' memory cell [WHI-091]. It is based 
on three fundamental concepts: 
• Redundancy in the information storage. 
• Non-corrupted feedback to recover the lost data after a particle strike. 
• The current induced by a particle must flow from n-type to p-type 
diffusion. 
The main advantage of this approach is that transistors do not need to be designed in 
special sizes. One of the drawbacks of this cell is its high static power dissipation. The 
weak devices are not driven fully into cut-off by the degraded levels, which causes a race 
condition and in turn results in high static power dissipation. Based on this observation, a 
revision of the NASA memory cell was introduced [LIU-092]. A comprehensive 
comparison of some of the most common and popular memory cells is presented in 
Table-2.3. 
Table-2.3: Comparison of SEU Hardened Memory Cells 

IBM Memory Cell [ROC-092] 
  Salient Features: Total of 16 transistors with different sizes; 6 transistors for the 
    memory part, 6 p-channel transistors for SEU immunity and 4 transistors for read/write. 
  Advantages: Technology independent. Low static power dissipation. SEU immunity 
    (LET up to 74 MeV-cm²/mg). 
  Disadvantages: Large number of transistors (16). 

NASA-1 Memory Cell [WHI-091] 
  Salient Features: Total of 16 transistors with different sizes. It has two parts, one with 
    p-channel transistors and the other with only n-channel transistors. 
  Advantages: Technology independent. SEU immunity (LET up to 120 MeV-cm²/mg). 
  Disadvantages: Large number of transistors (16). Static power dissipation. 

NASA-2 Memory Cell [LIU-092] 
  Salient Features: Improved version of Whitaker's SEU hardened CMOS memory cell 
    with 14 transistors. 
  Advantages: No static power dissipation. Reduced number of transistors (14). 
  Disadvantages: The transistor size. Latch-up faults above 30 MeV-cm²/mg. 

Canaris Memory Cell [WIS-093] 
  Salient Features: Composed of AND-NOR and OR-NAND SEU immune cells with 8 
    transistors for each cell. 
  Advantages: Applicable to both combinational and sequential logic. SEU immunity 
    (LET up to 120 MeV-cm²/mg). 
  Disadvantages: Long recovery time. Large leakage current problems. 

HIT-1 & HIT-2 Memory Cells [BES-093][VEL-094] 
  Salient Features: Total of 12 transistors arranged as two storage structures with 
    feedback paths. 
  Advantages: Small number of transistors. SEU immunity (LET 52 MeV-cm²/mg). 
  Disadvantages: Long recovery time. Inherently slow design. 

SGS Thomson Memory Cell [CAL-096a] 
  Salient Features: Based on Logic Redundancy (LR) cells known as DICE (Dual 
    Interlocked CEll). Total of 12 transistors with a symmetric structure. 
  Advantages: Small number of inverters. Low power dissipation. SEU immunity 
    (LET up to 50 MeV-cm²/mg). 
  Disadvantages: Large transistor size. Fails with multiple bit upsets. 
2.4.3 Error Detection and Correction Techniques 
EDAC techniques [PET-080] have been in use for years to protect digital data against 
errors in storage cells and in transmission channels. The technique is based on an encoding 
and a decoding algorithm that together restore the correct value. The encoding and decoding 
of data can be done in hardware, in software or in a combination of both. There are 
several types of codes. The Hamming code [Hamming] is a suitable solution at both the 
circuit design and the software level. When a Hamming code is used as an SEU mitigation 
solution in the design of a circuit, extra logic blocks are required to encode and decode the 
values held in elements such as registers and internal memory. The main limitations of EDAC 
techniques are area overhead, performance penalties, faults in the encoding/decoding 
circuitry itself and the necessity of refreshing in order to avoid accumulation of upsets. The 
topic is discussed in detail in Chapter-4. 
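As a concrete illustration of encoding and decoding, the following sketch implements the standard Hamming(7,4) code, which corrects any single bit error in a 7-bit codeword. It is a generic textbook example and is not taken from the implementation discussed in Chapter-4; the function names are hypothetical.

```python
# Textbook Hamming(7,4) code; function names are illustrative only.
# Codeword positions 1..7 hold: p1, p2, d1, p4, d2, d3, d4.


def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, syndrome); a single flipped bit is corrected."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4   # 1-based position of the bad bit, or 0
    if syndrome:
        c[syndrome - 1] ^= 1          # correct the upset bit
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1                      # emulate an SEU in position 5
    print(hamming74_decode(word))     # data recovered, syndrome names position 5
```

Note that a second upset in the same codeword before the first is corrected defeats this code, which is why the text stresses periodic refreshing to avoid accumulation of upsets.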
2.4.4 Hardware Redundancy 
Redundancy techniques such as duplication and triplication are commonly used for 
designing reliable systems to ensure high dependability and data integrity. Triple module 
redundancy (TMR) was first introduced in the 1980s [McI-080]. TMR is a circuit 
hardening approach that does not rely on any increased capacitance in the layout, and is 
therefore suitable for deep-submicron designs. Basically, three redundant latch paths 
(with a common input) are voted on to provide the correct result. The voting is usually 
performed on a bit-by-bit basis [McI-080]. The voter is the single point of failure in the 
TMR design because it is a combinational logic circuit. For this reason, the voter logic can 
be built with transistors that are sized large enough to have a high tolerance of 
environmental conditions. The main drawbacks are area overhead and power consumption, 
which are very significant for space applications. The TMR approach does not provide any 
inherent immunity to errors induced through SETs in RESET, control lines, and/or the clock 
line. Clock SET induced faults are particularly problematic for commercial design approaches 
where both the clock and its complement (both needed for CMOS implementations) are 
buffered locally within the latch and thus introduce very low capacitance nodes susceptible 
to SETs at very low LETs. 
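Bit-by-bit majority voting can be written in one line of Boolean logic. The sketch below is a generic illustration in which integers stand for the outputs of the three redundant paths; the function name is hypothetical and the code models only the voter, not the latches or the clock and reset lines.

```python
# Generic bitwise majority voter; the function name is illustrative only.


def majority_vote(a, b, c):
    """Return the bitwise majority of three redundant words."""
    return (a & b) | (b & c) | (a & c)


if __name__ == "__main__":
    good = 0b10101100
    upset = good ^ 0b00010000              # one copy suffers a single bit flip
    print(majority_vote(good, upset, good) == good)
    # Two copies upset in the same bit position defeat the voter.
    print(majority_vote(upset, upset, good) == good)
```

The second case shows the limit of the scheme: the voter masks one faulty copy per bit position, so upsets must not be allowed to accumulate in two copies.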
A relatively newer approach, developed in the mid 1990s, effectively replaces TMR with 
special latches that can only be upset if two critical nodes are simultaneously struck by a 
heavy ion. The first such latch was developed by Dooley [DOO-090] and required four 
times the number of transistors of a conventional latch. A subsequent design, the DICE 
(Dual Interlocked Cell) latch [CAL-096], only requires twice the number of transistors of a 
normal latch. Although the original motivation of these designs was the same as TMR (to 
reduce static latch upset rates), the DICE-based approach can go beyond TMR techniques 
and also mitigate several SET induced error modes. Among other limitations, the most 
severe limitation of this method concerns the SET width (TW): if it exceeds the delay time 
(TD) by even the smallest amount, all transient immunity is abruptly lost. Because of the 
inherent problems associated with this technique, special circuit techniques are employed 
to ensure that the delay (TD) is always long enough. 
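The abrupt loss of immunity once TW exceeds TD can be pictured with a simple discrete-time model. The sketch below assumes a delay-and-compare filter, in which the output follows the input only when the input agrees with a copy of itself delayed by TD and otherwise holds its state. This behavioural model is an assumption made for illustration; it is not the circuit of [CAL-096], and the function name and sample values are hypothetical.

```python
# Behavioural sketch of a delay-and-compare transient filter (assumed model).


def delay_filter(samples, td):
    """Output follows the input only when input and delayed input agree."""
    out = []
    state = samples[0]
    for t, now in enumerate(samples):
        delayed = samples[t - td] if t >= td else samples[0]
        if now == delayed:
            state = now          # both copies agree: accept the value
        out.append(state)        # otherwise hold the previous state
    return out


def transient(width, before=5, after=10):
    """A quiet logic-0 line carrying one logic-1 pulse of the given width."""
    return [0] * before + [1] * width + [0] * after


if __name__ == "__main__":
    td = 3
    print(1 in delay_filter(transient(3), td))   # TW equal to TD is filtered
    print(1 in delay_filter(transient(4), td))   # TW one step longer gets through
```

In this model a pulse no wider than TD never overlaps its delayed copy and is rejected, whereas a pulse only one time step wider overlaps it and propagates.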
2.5 Hardening Techniques for General Purpose 
Reconfigurable Architectures 
The majority of recent radiation hardening techniques have been developed for 
commercial general purpose reconfigurable cores (FPGAs). As discussed in earlier 
sections, FPGAs can be classified into three basic types according to their underlying 
technology. The most common SEU hardening techniques for each type of FPGA are 
discussed in the following sections. 
2.5.1 SEU Hardening for SRAM Based FPGAs 
SRAM based FPGAs are fast programmable reconfigurable fabrics with a configuration 
bit-stream. The overall functionality can be changed at any time by loading a new 
bit-stream. This feature makes them the most appropriate choice for space applications (for 
example satellites, spacecraft, airplanes, etc.), where the reconfigurable capability of the 
architecture can be very useful to solve problems and enhance performance. 
Xilinx is currently the key player in the field of SRAM based FPGAs. The Virtex family 
[CAM-099] is one of the most popular and fastest FPGA families. Virtex has become a 
common ASIC replacement in commercial markets due to its density, performance, and 
wide range of capabilities. Altera is another successful company that 
fabricates SRAM-based FPGAs, and one of its popular products is called FLEX. 
FPGAs have many SEU-induced failure modes that conventional circuits do not have. For 
example, a wire might no longer connect the same two endpoints, or an input may suddenly 
be coming from somewhere else. There are many possible ways to categorize the failure 
modes. One way to look at this problem is to classify the errors as 'routing 
errors' vs. 'logic errors'. On the other hand, the classification can be done on the basis of 
configuration data, as each configuration bit has a specific function, such as turning off a 
buffer. Multiplexers constitute a large part of the routing network of the Xilinx Virtex family. 
Among many other signals, all circuit inputs and outputs are multiplexed. Multiplexers are 
very sensitive to SEUs because any change in their select lines will cause a different 
routing configuration. An example of Mux select failure is shown in Figure-2.7. 
Figure-2.7: Mux Failure Example: (a) Before SEU, (b) After SEU 
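The multiplexer failure mode can be reproduced in a few lines. In the sketch below the select lines are modelled as configuration bits and the routed nets as strings; the net names, the two-bit select width and the function names are hypothetical and do not describe the actual Virtex routing structure.

```python
# Illustrative model of a routing multiplexer driven by configuration bits.
# Net names and function names are hypothetical.


def mux(inputs, select_bits):
    """Return the input chosen by the select bits (most significant first)."""
    index = 0
    for bit in select_bits:
        index = (index << 1) | bit
    return inputs[index]


def flip_bit(bits, position):
    """Return a copy of the configuration bits with one bit upset."""
    upset = list(bits)
    upset[position] ^= 1
    return upset


if __name__ == "__main__":
    nets = ["net_a", "net_b", "net_c", "net_d"]
    select = [0, 1]                        # configured to route net_b
    print(mux(nets, select))
    print(mux(nets, flip_bit(select, 0)))  # after the SEU a different net is routed
```

The logic of the design is untouched in this example; only the routing changes, which is why such faults persist until the configuration memory is rewritten.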
Figure-2.8: PIP Failure Modes: (a) Normal Operation, Originally Disconnected; 
(b) Abnormal Operation, Short Circuit Failure; (c) Normal Operation, Originally Connected; 
(d) Abnormal Operation, Open Circuit Failure 
The programmable interconnect point (PIP) is another main component of the Virtex routing 
network. A PIP is a pass transistor between two wires that can have two states ('ON' or 
'OFF'). It is used to connect or disconnect two wires. PIPs can cause a few different 
kinds of SEU-induced failures. These are illustrated in Figure-2.8, which explains the 
'open circuit failure' and the 'short circuit failure' of a PIP. As explained before, a PIP is 
essentially a pass transistor and is therefore bi-directional. The other component of the 
Virtex routing network is the buffer. A buffer in this context is a driver which can be 
turned either 'ON' or 'OFF'. As shown in Figure-2.9, the buffer has two failure modes 
associated with it, and they are very similar to the PIP failures. The main difference here is 
that, instead of a pass transistor, the failure is caused by an active driver and is 
therefore uni-directional. 
Figure-2.9: Buffer Failure Modes: (a) Normal Operation, Originally Disconnected; 
(b) Abnormal Operation, Short Circuit Failure; (c) Normal Operation, Originally Connected; 
(d) Abnormal Operation, Open Circuit Failure 
The next paragraphs discuss very briefly some solutions to mitigate single event upsets in 
SRAM based FPGAs. Detailed discussion can be found in [KAT-094][KAT-097][XIL-98b] 
[CAM-099]. 
2.5.1.1 Module Redundancy 
The most common approach to mitigating the effects of SEU is module redundancy. 
It is a simple method which replicates instances of an entire module and 
mitigates the error effects at the final outputs of the modules using a voter, as illustrated in 
Figure-2.10. 
The advantage of this method is that it may be a single chip solution, which is a factor in 
cost saving. The obvious disadvantage is the limitation on the design size. Moreover, most 
SRAM based logic devices cannot reliably implement the voter function, because the 
voting circuit itself would have to be implemented in SRAM cells just as any other 
Boolean function would be, and is therefore equally sensitive to upsets. The Virtex 
architecture provides Tri-State Buffers (BUFTs) that are composed of a hard-wired 
AND-OR logic structure and are considered reliable, as shown in Figure-2.10 [ALF-098]. 




Figure-2.10: Module Redundancy Technique [ALF-098] 
• Logic Partitioning and Redundancy: This technique is employed for designs that are 
approximately 1/3 of the device size. The design is partitioned into sub-modules small 
enough to be replicated and mitigated within a single device, and these are spread across 
several devices. This solution is presented in Figure-2.11. The partitioning may affect system 
performance due to the extra interconnections at device level. 
Figure-2.11: Module Partitioning 
• Logic Duplication: An alternative to logic partitioning is logic duplication. This 
technique is used where the design is less than approximately 1/2 the size of the total 
device. The modules are duplicated in two FPGAs. The obvious disadvantage of this 
method is the use of multiple FPGAs. The approach is also essentially an SEU detection 
technique, and is not sufficient for SEU correction. 
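The reason duplication can only detect an upset, and not correct it, is visible in a minimal comparison model. The sketch below is illustrative; the function name is hypothetical and integers stand for the outputs of the two module copies.

```python
# Illustrative duplicate-and-compare check; the function name is hypothetical.


def duplicate_and_compare(out_a, out_b):
    """Return (value, error_detected) for two redundant module outputs."""
    if out_a == out_b:
        return out_a, False
    # The copies disagree, but nothing indicates which one is correct.
    return None, True


if __name__ == "__main__":
    print(duplicate_and_compare(0x5A, 0x5A))   # agreement: value is accepted
    print(duplicate_and_compare(0x5A, 0x5B))   # mismatch: error flagged, no value
```

With only two copies a mismatch carries no information about which copy was hit, so the system has to fall back on some other recovery step, whereas a third copy allows a majority vote.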
2.5.1.2 Device Redundancy 
A commonly known method for SEU mitigation is "triple module redundancy with 
voting." As explained in the previous sections, this mitigation scheme uses three identical 
logic circuits performing the same task in tandem, with corresponding outputs compared 
through a majority vote circuit. Triple device redundancy and mitigation is to date the 
most robust mitigation method for SRAM based FPGAs. This is shown in Figure-2.12. 
Figure-2.12: Device Redundancy: Triple Device Redundancy 
2.5.1.3 Correcting SEU through Partial Configuration 
An efficient SEU mitigation technique can be defined as one which can detect the effects 
of upsets as well as correct the upsets. In some systems, SEU detection and correction 
through partial configuration can achieve an acceptable level of reliability [XIL-002]. A 
configuration bit-stream can be read and written through the configuration interfaces. 
"Read-back" is a post-configuration read operation and "Partial Reconfiguration" is a 
post-configuration write operation to the configuration memory [XIL-000]. Read-back and Partial 
Reconfiguration allow a system to detect and correct faults caused by SEUs in the 
configuration memory without disrupting normal FPGA operation. Scrubbing is another 
technique, in which the entire CLB frame segment is reloaded at a chosen interval 
[XIL-002]. This method substantially reduces the overhead but requires the 
configuration logic to be in write mode for a large percentage of the time. 
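The difference between read-back with partial reconfiguration and scrubbing can be sketched with a toy model of the configuration memory. In the sketch below the memory is a list of frame values and the golden copy is an identical list held elsewhere; the function names and frame values are hypothetical and do not correspond to any Xilinx interface.

```python
# Toy model of configuration memory maintenance; all names are hypothetical.


def readback_and_repair(config_frames, golden_frames):
    """Read every frame, rewrite only those that differ; return their indices."""
    repaired = []
    for i, (frame, golden) in enumerate(zip(config_frames, golden_frames)):
        if frame != golden:
            config_frames[i] = golden    # partial reconfiguration of one frame
            repaired.append(i)
    return repaired


def scrub(config_frames, golden_frames):
    """Rewrite every frame without reading it back; return the write count."""
    for i, golden in enumerate(golden_frames):
        config_frames[i] = golden
    return len(golden_frames)


if __name__ == "__main__":
    golden = [0xA5, 0x3C, 0xFF, 0x00]
    config = list(golden)
    config[2] ^= 0x10                    # emulate an SEU in frame 2
    print(readback_and_repair(config, golden))
    print(scrub(config, golden))
```

Read-back needs comparison logic but writes only the damaged frames, while scrubbing needs no detection logic at the cost of writing every frame at each interval.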
2.5.2 SEU Hardening for Anti-fused based FPGAs 
Anti-fused FPGAs implement user defined logic blocks and the routing through 
open/closed anti-fuses. These anti-fuses are considered fairly immune to radiation 
upsets. However, the latches and flip-flops in anti-fused based devices are as sensitive 
to radiation induced upsets as those in SRAM based FPGAs [KAT-098]. 
The combinatorial portion of the user defined design in anti-fused based FPGAs has 
an advantage in terms of SEU mitigation when compared with SRAM based FPGAs, as it 
is realized in the form of anti-fuses instead of latches (LUT). The best known 
company in anti-fused based FPGAs is Actel [ACT-097]. The interconnection matrix is 
composed of rows of logic blocks and routing channels. There are two kinds of logic 
blocks in the Actel architecture: combinational logic blocks (C-modules) and sequential 
logic blocks (S-modules). These are shown in Figure-2.13. 












Figure-2.13: C-Module and S-Module by Actel [ACT-097] 
The S-module can implement the same combinational logic as the C-module, and it also 
contains a flip-flop that can be configured in different modes to add flexibility to the 
design. This flip-flop is called the SFF. There are two hardened families from Actel: RH1280 
and RH1020. The most sensitive elements of the Actel FPGA are the flip-flops, which 
must be protected to avoid upsets. Actel proposes two techniques for SEU mitigation. 
The first is to avoid the use of the flip-flops in the FPGA 
matrix; in other words, the SFF in the S-module must be avoided. In this solution, 
each flip-flop in the system is instead implemented with two logic blocks, using only their 
combinational logic parts. This solution is known as bypass S-module. The 
second technique proposed by Actel is to triplicate the implementation of each flip-flop and 
to vote on the correct output. This solution is basically TMR. Figure-2.14 illustrates this 
scheme. 
Figure-2.14: Actel TMR Implementation [ACT-097] 
2.5.3 SEU Hardening for EPLDs 
EPLDs are electrically programmable logic devices that are programmed through EEPROM. 
They differ from standard FPGAs in terms of 
interconnections. The logic structure is based on arrays of OR/AND logic cells. Their 
performance and density are usually lower than those of SRAM-based FPGAs. Altera 
is one of the companies in the market producing EPLDs. The most commonly used 
family from Altera is MAX. There are no hardened families of EPLDs proposed by 
Altera [ALT-098]. SEU mitigation for the EEPROM element relies on 
technology based mitigation: the foundry must use a specific process to avoid or 
reduce the transient current generated by a charged particle hit. 
There is no example of a domain specific reconfigurable architecture designed 
specifically for SEU resiliency. Some domain generic architectures, like RICA, have 
been introduced with fault tolerance, but their details have not been published. 
2.6 Summary 
This chapter introduces reconfigurable computing and its use in different applications. 
The literature review has been done with respect to general purpose and domain-specific 
reconfigurable architectures. Various reconfigurable architectures are discussed in this 
chapter along with their advantages and inherent limitations. The chapter has also looked at 
the concept of domain generic reconfigurable architectures. 
There are special issues with the use of microelectronic circuits in general, and 
reconfigurable cores in particular, in the aerospace industry. Different radiation based 
faults are discussed. Conventional SEU mitigation techniques are discussed along with 
their design features. The chapter introduces various available options for making general 
purpose reconfigurable architectures resilient to radiation. This research mainly 
deals with single event upsets, and the entire literature review has been done from this 
perspective. There is no SEU hardened domain specific reconfigurable architecture for 




RECONFIGURABLE FABRIC FOR 
DISCRETE WAVELET TRANSFORM 
3.1 Introduction 
Multimedia processing on hand-held devices, such as mobile phones, requires significant
computational power to execute specific computations such as the DWT. Digital Signal
Processors (DSPs) provide a hardware solution, but their high operating frequency
results in high power consumption. The alternative is dedicated hardwired logic.
This solution reduces the power consumption considerably, at the expense of
flexibility in the hardware. The specifications for such algorithms change continually over
time, hence it is important to have a flexible architecture that can accommodate
such changes and be integrated within a system-on-chip (SoC) design flow. A
possible solution is an FPGA, which provides high flexibility and low cost but at the
expense of increased power consumption and large integration costs [COM-099].
During recent years, a number of research efforts have been focused on the design of new 
reconfigurable systems for general purpose and for particular areas of application. The 
main motive behind such growth in the number of research activities is the potential of 
reconfigurable computing to greatly accelerate a wide variety of applications. The work in 
this area flows in two major directions. The first hardware oriented direction is geared 
toward designing new hardware architectures or optimizing the current architectures. The 
second software oriented sub-area of research is focused on the investigation of new 
placement, routing, and mapping methods that tackle the dynamic reconfiguration 
challenges [COM-099].
Domain-specific arrays are less flexible than generic FPGAs, but for computations within their
target domain they are more efficient than a generic reconfigurable fabric.
Furthermore, image compression techniques such as DWT have a number of different 
Chapter 3: Reconfigurable Fabric for Discrete Wavelet Transform 
algorithms each with different advantages in terms of quality, power-consumption and 
processing time. Reconfigurable arrays provide an efficient platform for mapping a 
number of these possible implementations and switch between them dynamically 
depending on overall system requirements. There are two types of DWT: the 5/3 transform for
lossless image compression and the 9/7 transform, which is lossy. Both the 5/3 and the 9/7 DWT
can be computed through different algorithms, which are based on either
convolution, lifting or integer based DWT [DAU-092][ANT-092]. Each has its own
merits and limitations depending upon quality and application. The Consultative
Committee for Space Data Systems (CCSDS) data compression working group has 
recently adopted a recommendation for image data compression [PEN-005]. The 
algorithm adopted in the recommendation consists of a two dimensional discrete wavelet 
transform of the image, followed by progressive bit-plane coding of the transformed data 
[WHI-091]. The algorithm can provide both lossless and lossy compression [PEN-005].
As mentioned, the CCSDS recommendation is based on the 2-D DWT, which further justifies the selection of the DWT
as the target application for this research.
In this chapter, we present a novel domain specific reconfigurable fabric. The details of the 
design consideration are elaborated. Discrete Wavelet Transforms are chosen as a domain 
for these architectures due to numerous reasons as discussed above and later in the 
chapter. A brief theory of wavelet transforms is discussed with different implementation 
options. The architectures are novel in the sense that currently there are no reconfigurable 
fabrics which can efficiently implement different DWT algorithms such as Lifting and 
Integer based DWT, etc. The implementations have different characteristics in terms of 
array usage (Area usage and power consumption) which are explored in this chapter. The 
new architecture demonstrates the ability to support these implementations which are well 
suited for high-flexibility, demanding applications such as JPEG-2000. Later in the 
chapter, the design of reconfigurable architecture is discussed along with the design flow. 
At the end, performance evaluation is performed and the results are compared with 
different implementation options. 
3.2 JPEG-2000 
The heart of the JPEG 2000 standard is wavelet-based compression, which has several 
advantages over the Discrete Cosine Transform (DCT), used by its predecessor (JPEG). 
The DCT compresses a still image into 8x8 blocks and places them consecutively in the 
output file. Since the blocks are compressed individually and without reference to 
adjoining blocks, the compressed image has a distinctive "blocky" quality - more apparent 
in highly compressed images. High compression levels imply that only the most important 
information is kept to convey the essentials of the image. However using this technique, 
most of the subtlety that makes for a pleasing, continuous image is lost [ABO-001]. 
Figure-3.1 shows a comparison of the compressed image through JPEG-2000 and JPEG. 
Figure-3.1: Comparison of JPEG and JPEG 2000 compressed images [PA-000].
Wavelet compression works by converting the image into a series of wavelets, which can 
be stored more efficiently than pixel blocks. Images compressed through Wavelet 
transform have rougher edges, but are better rendered since the "blockiness", which is 
apparent using DCT based compression, is eliminated. Thus a JPEG-2000 image 
compressed using wavelets will have smoother colour toning and clearer edges in regions 
of sharp colour, as compared to a JPEG image with the same level of compression using 
the DCT [ABO-001]. 
3.2.1 JPEG-2000 Compression 
The image compression is accomplished using the JPEG-2000 encoder, similar to most 
transform based coding schemes, as shown below in Figure-3.2. 
Figure-3.2: JPEG-2000 Encoder Block Diagram [JPG-200]
The DWT is first applied to the source image data. The transform coefficients are then
quantized and entropy coded before forming the output. The decoder is the reverse of the
encoder. Unlike other coding schemes, JPEG-2000 can be "lossy" or "lossless" depending
on the wavelet transform and quantization [ABO-001]. The image is decomposed using
DWT into a series of decomposition levels, each containing a number of sub-bands. The 
sub-bands describe the horizontal and vertical characteristics of the original image 
(Figure-3.3). A 1-dimensional (1D) sub-band is decomposed into a set of low-pass and 
high-pass samples. Low-pass samples represent a smaller low-resolution version of the 
original. The high-pass samples represent a smaller residual version of the original and are 
necessary for perfect reconstruction of the original image from the low-pass samples. 
Figure-3.3: Two-level decomposition showing sub-bands [ABO-001].
All of the wavelet transforms employed by the JPEG-2000 compression method are
fundamentally one-dimensional in nature. A one-dimensional transform applied first in the
horizontal and then in the vertical direction forms a two-dimensional transform. This
decomposes the original image into four smaller image blocks:
• One with low vertical & low horizontal resolution (LL),
• One with high vertical & low horizontal resolution (HL),
• One with low vertical & high horizontal resolution (LH),
• One with high vertical & high horizontal resolution (HH).
This process of applying the one-dimensional transform in both directions is repeated a 
number of times (as required) on the low-resolution (LL) image block. This procedure is 
called dyadic decomposition, as shown below in Figure-3.4. 
Figure-3.4: Dyadic Decomposition [ABO-001]
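The row-then-column procedure and its repetition on the LL block can be sketched in a few lines of Python. This is an illustration only, not the transform used in the thesis: it assumes a simple integer Haar-style average/difference step as a stand-in for the JPEG-2000 filters, image dimensions divisible by 2 at every level, and hypothetical function names. The sub-band labels follow the list above (HL = high vertical, low horizontal).

```python
def haar_step(samples):
    """One 1-D analysis step on an even-length list: (low-pass, high-pass)."""
    pairs = [(samples[i], samples[i + 1]) for i in range(0, len(samples), 2)]
    low = [(a + b) // 2 for a, b in pairs]   # coarse half-resolution version
    high = [a - b for a, b in pairs]         # residual needed for reconstruction
    return low, high


def transpose(matrix):
    """Swap rows and columns so the same 1-D step can run vertically."""
    return [list(column) for column in zip(*matrix)]


def filter_rows(image):
    """Run the 1-D step over every row; return the low and high half-images."""
    low_rows, high_rows = [], []
    for row in image:
        low, high = haar_step(row)
        low_rows.append(low)
        high_rows.append(high)
    return low_rows, high_rows


def decompose_level(image):
    """Horizontal pass, then vertical pass: returns LL, HL, LH, HH."""
    h_low, h_high = filter_rows(image)                       # horizontal direction
    ll, hl = (transpose(b) for b in filter_rows(transpose(h_low)))
    lh, hh = (transpose(b) for b in filter_rows(transpose(h_high)))
    return ll, hl, lh, hh


def dyadic_decompose(image, levels):
    """Repeat the 2-D step on the LL block only (dyadic decomposition)."""
    details = []
    ll = image
    for _ in range(levels):
        ll, hl, lh, hh = decompose_level(ll)
        details.append({"HL": hl, "LH": lh, "HH": hh})
    return ll, details
```

For an 8x8 input and two levels, the first level yields 4x4 detail blocks and the second level 2x2 blocks, with a 2x2 LL block remaining.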
3.3 Discrete Wavelet Transform (DWT)
Wavelet Theory (WT) is applicable to various problems in signal processing, for example
sub-band coding leading to efficient data compression, multi-resolution signal processing
for pattern recognition, and many more. Wavelet theory allows a very general and
flexible description for transforming signals from the time domain to a time-frequency-like
domain, the so-called time-scale domain. This representation is a very useful alternative to
the Window Fourier Transform or Short Time Fourier Transform (STFT) for the analysis
of non-stationary signals, because for many applications it provides a more suitable
resolution in the time-frequency plane. Compared to the STFT, the WT uses short
windows for high frequencies, resulting in good time resolution, and longer windows for
low frequencies, giving good frequency resolution.
The wavelet transform gives a time-frequency representation of a signal - similar to the 
Fourier transform. It was developed to overcome the shortcomings of the Short Time 
Fourier Transform (STFT). While the STFT gives a constant resolution at all frequencies, 
wavelet transforms utilise multi-resolution techniques, analysing different frequencies at 
different resolutions [ALI-099]. Wavelet analysis is similar to STFT analysis in the sense 
that the signal to be analysed is multiplied by the wavelet function and the transform 
computed for each segment. However, for the continuous wavelet transform, the width of 
the wavelet function changes for each spectral component. As a result, the wavelet 
transform gives a good time but a poor frequency resolution at high frequencies, and a 
good frequency but poor time resolution at low frequencies. The equation for the 
(continuous) wavelet transform (CWT) is given below [VAL-004]. 
	
\[ \mathrm{CWT}_x^{\Psi}(\tau, s) = \frac{1}{\sqrt{|s|}} \int x(t)\, \Psi^{*}\!\left(\frac{t - \tau}{s}\right) dt \qquad (3.1) \]
where x(t) is the signal to be analysed and Ψ(t) is the mother wavelet or basis
function. The parameter τ is a translation parameter and determines the position of the
wavelet with respect to the signal, providing the time information, while s is the scaling
parameter, providing the frequency information. The scaling factor either expands or
compresses a signal: large scales expand the signal, giving detailed information
within the signal, and small scales compress the signal [VAL-004].
3.4 DWT Implementation Techniques 
Most DWT video compression is implemented as a dyadic sub-band coding process. Large
parts of these processes are implemented as FIR filters for both the high and low band
processing. The large amount of resources needed to realise the DWT may seem
inevitable, given both the dyadic sub-band coding and the FIR filtering
involved. However, there are many redundancies in the DWT system which can
be eliminated or minimized. The following sub-sections give an overview of
the conventional DWT implementation techniques. As the theory of the DWT and the lifting
based scheme is well documented [SWE-096][DAU-098][VAL-004][SID-098], it will not
be covered comprehensively in this chapter. However, the relevant parts of the transform
implementations will be presented in the following sub-sections. As most DWTs are
implemented with FIR filters, the effect of using an FIR filter for the implementation will be
considered briefly. However, the theory of FIR filters will not be covered in this thesis as
it is well established and documented [OPP-097][PRO-096].
3.4.1 Direct Form Structure 
The direct form structure (also known as convolution based) consists of a set of low-pass 
(LPF) and high-pass (HPF) filters followed by decimators. This is known as the analysis 
stage. The output of the analysis stage is processed based on the application. The original 
image can be recovered by the synthesis stage. The synthesis (reconstruction) stage 
consists of up-samplers followed by similar high-pass and low pass filters [VAL-004]. 






Figure-3.5: Direct Form Structure for a Two Dimension DWT [MAR-099].
Conventional realisations of the DWT are direct implementations. The conventional
convolution-based DWT is computationally intensive and wasteful of area, as all the image
pixels are put through all six filtering processes for just a single level of 2-D
decomposition or reconstruction. This makes the traditional implementation of the DWT
unattractive, as it is power hungry. The conventional DWT wastes computing power and
memory space on processing and storing data vectors, half of which are dropped
during the down-sampling (sub-sampling) later in the process, as can be seen from
Figure-3.5. Removing these redundancies algorithmically yields great
savings.
3.4.2 Polyphase Structure 
In the Direct Form Structure discussed previously, the image (signal) is first filtered and 
then down-sampled. Thus half of the filtered samples are redundant. If the down-sampling 
is done before the filtering, it will reduce the number of computations and thus improve 
the efficiency (by 50%). The input signal is split into odd and even samples (resulting in 
automatic decimation by a factor of 2). The filter coefficients are also split accordingly 
into odd and even components. The two phases (odd/even) are added after filtering to 




Figure-3.6: Polyphase structure of DWT [MAR-099]
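The saving obtained by moving the down-sampling ahead of the filtering can be checked numerically. The following minimal Python sketch is an illustration only, not the thesis implementation. It assumes the 5/3 analysis low-pass taps (-1, 2, 6, 2, -1)/8 scaled by 8 to stay in integers, a causal filter, and zero padding at the signal borders (JPEG-2000 itself uses symmetric extension). The function names are hypothetical.

```python
H8 = [-1, 2, 6, 2, -1]  # 5/3 analysis low-pass taps, scaled by 8 (assumed example)


def sample(x, n):
    """Zero-padded access: samples outside the signal are taken as 0."""
    return x[n] if 0 <= n < len(x) else 0


def direct_form(x, h):
    """Filter every sample, then keep only the even-indexed outputs."""
    mults = 0
    full = []
    for n in range(len(x)):
        acc = 0
        for k, tap in enumerate(h):
            acc += tap * sample(x, n - k)
            mults += 1
        full.append(acc)
    return full[::2], mults          # half of the computed outputs are discarded


def polyphase(x, h):
    """Split signal and taps into even/odd phases, filter, then add the phases."""
    x_even, x_odd = x[0::2], x[1::2]
    h_even, h_odd = h[0::2], h[1::2]
    mults = 0
    out = []
    for m in range((len(x) + 1) // 2):
        acc = 0
        for j, tap in enumerate(h_even):
            acc += tap * sample(x_even, m - j)
            mults += 1
        for j, tap in enumerate(h_odd):
            acc += tap * sample(x_odd, m - j - 1)
            mults += 1
        out.append(acc)
    return out, mults
```

Under these assumptions both functions return identical outputs; for an 8-sample input the direct form performs 40 multiplications and the polyphase form 20, which is the 50% reduction described above.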
3.4.3 Lifting Based Structure 
The lifting scheme was developed by Sweldens et al. [SWE-096] and allows for very
efficient implementations of the DWT using integer wavelet and scaling coefficients
instead of floating-point coefficients. The drawbacks of the conventional convolution-based
implementation are overcome by the lifting-based DWT algorithm (Figure-3.7). The
lifting scheme was devised not only to reduce the computational requirement of the DWT;
it can also be used to obtain a DWT filter pair that is invertible or reversible. However, the
details of how to find the reversible filter pair will not be covered here. The lifting-based
DWT reduces computational and area redundancy by:
• Firstly, taking into account the redundancy of down-sampling and avoiding
computation of the output vectors that will eventually be dropped.
• Secondly, exploiting the similarities between the low-pass filter (LPF) and
high-pass filter (HPF) to further reduce redundancies.
By exploiting these computational redundancies, the lifting-based scheme can reduce
the computational requirement by more than 50% compared to a conventional DWT. The
lifting scheme reduces redundancy and utilises similarities within the transform by
factorising the original LPF and HPF equations into common factors. These common
factors are then efficiently recombined to obtain the final sub-sampled
outputs. The common factors are incorporated into two sets (predict and update) that are
responsible for producing direct down-sampled low pass (odd terms) outputs and high pass
(even terms) outputs.
The source data is first split into odd and even samples. Then a predict step followed by an 
update step is performed. The predict step basically tries to predict the odd samples using 
the even samples (or vice-versa) with a prediction operator. This is possible because the 
consecutive samples are highly correlated (especially at the start of a stream). Thus the 
high-pass sub-band is lifted using the low-pass sub-band. The update step consists of 
lifting the low-pass sub-band using the high-pass sub-band, and is done in order to 
maintain the statistical properties of the original input stream (e.g. average) within the 







Figure-3.7: Lifting Based Structure [SWE-096]
A distinct advantage of the lifting structure is that the transformed data can occupy the same
memory locations as the input data. Thus the transform can be performed without the need for
additional memory, which is very useful for embedded or SoC applications that must minimise
expensive memory space [VAL-004]. However, its limitation is that the filtering units cannot
operate in parallel, as each filtering unit depends on results from the previous filtering unit
[BEN-002].
The mathematics behind the wavelet transforms is fairly complex and the proof required to 
reduce the Continuous Wavelet Transform (CWT) to the relatively simple Lifting Based 
DWT is beyond the scope of this thesis. 
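The split, predict and update steps, and the in-place property, can be illustrated with the reversible 5/3 integer lifting steps d[n] = x[2n+1] - floor((x[2n] + x[2n+2]) / 2) and s[n] = x[2n] + floor((d[n-1] + d[n] + 2) / 4). The Python sketch below is a software illustration only, not the hardware mapping proposed in this thesis. It assumes an even-length input, symmetric extension at the borders, and hypothetical function names. High-pass coefficients end up at the odd indices and low-pass coefficients at the even indices of the same list.

```python
def lift53_forward(x):
    """In-place forward 5/3 integer lifting on an even-length list of ints."""
    n = len(x)
    # Predict: each odd sample is replaced by its prediction error (high-pass).
    for i in range(1, n, 2):
        right = x[i + 1] if i + 1 < n else x[i - 1]   # symmetric extension
        x[i] -= (x[i - 1] + right) >> 1
    # Update: each even sample is corrected using neighbouring details (low-pass).
    for i in range(0, n, 2):
        left = x[i - 1] if i - 1 >= 0 else x[i + 1]   # symmetric extension
        x[i] += (left + x[i + 1] + 2) >> 2
    return x


def lift53_inverse(x):
    """In-place inverse: undo the update step, then undo the predict step."""
    n = len(x)
    for i in range(0, n, 2):
        left = x[i - 1] if i - 1 >= 0 else x[i + 1]
        x[i] -= (left + x[i + 1] + 2) >> 2
    for i in range(1, n, 2):
        right = x[i + 1] if i + 1 < n else x[i - 1]
        x[i] += (x[i - 1] + right) >> 1
    return x
```

Because each step only adds or subtracts a quantity computed from samples that the step itself does not modify, the inverse recovers the input exactly, and no buffer beyond the input list is needed. A constant input such as `[5] * 8` gives zero detail coefficients, i.e. `[5, 0, 5, 0, 5, 0, 5, 0]`.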
3.5 Reconfigurable SoC Platform
The proposed overall system contains processors and a DSP along with a number of
embedded reconfigurable arrays (RAs), as shown in Figure-3.8. Each RA can be specific to
a particular computation such as the DWT. The system also allows for a
combined array that may target multiple computations. The RA can be easily incorporated




Figure-3.8: Reconfigurable System-on-Chip
The RA is a synthesizable core mapped onto 0.18 µm CMOS technology. The RA itself
consists of programmable clusters which are interconnected through configurable switches
(discussed later in the chapter). The clusters define the functionality of the array. Each RA
is heterogeneous and contains different types of clusters, with each cluster specific to one
operation. The clusters that constitute the RA can be chosen at design time according to
the requirements and constraints. The domain and the degree of flexibility of an RA can be
set through the choice of the clusters and interconnects used. The RA is provided as a soft
core which can be simulated, synthesized and routed as a normal ASIC core. This allows
RAs to be incorporated into the design flow of the full SoC, making the design and
verification of the system easier. In addition, the RA can be configured by a processor or a
DSP. This can be done dynamically to allow the RA's operation to be adjusted at run
time. Data is written to and read from the array by the processor over the on-chip bus.
Figure-3.9: 2-D DWT Mechanism
The RA computes the 1-D DWT. The output of the DWT array is stored in the main
memory and then transposed and fed back into the same RA to obtain the 2-D DWT output;
the reconfigurable array is thus reused for the second pass. Figure-3.9 illustrates
the process. The controller can be dedicated hardware or can be implemented in
software.
3.6 Reconfigurable Fabric For DWT 
Various steps are involved in the design process. The design process starts with the
decision on the target application (domain) for the proposed architecture. It is worth
mentioning that the design engineer will have the domain information before starting the
design work. As explained before, the DWT is selected as the domain for this research work.
The first step in the design process is a careful study of the different implementation
options. These implementation options can be limited, depending on the nature of the
application, or unlimited. If the options are limited, then maximum area and power
savings can be achieved while maintaining 100% of the required flexibility. If the options are
unlimited, then a comprehensive study of all the options is required to incorporate
maximum flexibility while maintaining the performance edge over a generic FPGA. The
design flow of the proposed architecture is presented in Figure-3.10.
The study of a particular application (literature review) helps to determine the essential 
blocks required for the array. Obviously, these blocks are domain specific. After 
determining the basic logic blocks, the configurability of these blocks is decided according 
to the required flexibility. The flexibility can be defined as: 
• Flexibility of each basic element
• Granularity of each basic element
The flexibility determines the different functional configurations of the logic block, while
granularity refers to the flexibility in physical implementation for different bit-widths. The
basic elements are realized in a hardware description language. The interconnect scheme is based on




The different clusters of the proposed architecture are connected through interconnects
and then synthesized using Synopsys Design Compiler. The netlist of the synthesized
RA is placed and routed using the VPR tool. The output of VPR, in the form of
configuration bits for the proposed architecture, is used in post-synthesis simulation
along with the netlist file. Active-HDL is used to verify the synthesized design. The details of






Figure-3.10: Design Flow Diagram of Domain Specific Reconfigurable Fabric
3.7 Reconfigurable Logic Blocks 
The clusters for the reconfigurable arrays targeting DWT computations were designed for 
different DWT algorithms including 5/3 and 9/7, with varying performance requirements 
of speed, power and area. The following clusters were identified as common reusable
blocks and arranged in the proposed reconfigurable array (RA). The clusters are designed
so that the RA can be used, fully or partly, for general-purpose
computations when it is not being used in the DWT domain. This feature makes it an
attractive choice for hand-held multimedia applications.
3.7.1 Add-Subtract Cluster 
The add-subtract cluster is the basic module in the proposed array and performs different basic
arithmetic operations [BALO-06c]. The add-subtract cluster can be configured in the
following ways:
• Adder/subtractor
• A-B and B-A operation
• Accumulator
The proposed CLB can be programmed for the above-mentioned functionality with the
help of configuration bits (Figure-3.11). The proposed block has four different types of
configuration bits: ADD/SUB, Accumulator, INT/EXT and A-B/B-A. The
ADD/SUB configuration bit selects addition or subtraction as the basic
operation of the configurable block. The Accumulator and INT/EXT bits are used for the
addition and accumulation process. The basic module/CLB is 8 bits wide; four modules
are grouped into a cluster, and configurable switches (discussed later in the chapter) are
provided between them to support cascading for wider bit ranges (up to 32 bits). Even
wider bit ranges are possible for different operations by cascading multiple clusters




Figure-3.11: Design of a Reconfigurable Add/Subtract Block
The different configurations of Add-subtract clusters can be selected through configuration 
bits reserved for the cluster. The precision of the calculation can be selected by 
configuring the cluster depending upon how many modules are required (basic module is 
8-bits wide). 16 bits are required for the loss-less calculation based upon 5/3 [BEN-004], 
which can be achieved by incorporating two basic modules through configuration 
switches. Flexibility in selecting the bit-widths makes the architecture more versatile. 
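The cascading of 8-bit modules into wider units can be modelled behaviourally. The Python sketch below is an assumption-laden illustration, not the RTL of the proposed cluster: it models only the ADD/SUB and A-B/B-A configuration bits, assumes a ripple carry passed between modules through the configurable switches, and uses hypothetical function and parameter names. The Accumulator and INT/EXT bits are not modelled.

```python
def addsub8(a, b, carry_in, subtract):
    """One 8-bit module: returns (8-bit result, carry_out)."""
    a &= 0xFF
    b &= 0xFF
    if subtract:
        b = ~b & 0xFF            # two's complement: invert here, +1 arrives via carry_in
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def cluster_addsub(a, b, modules=2, subtract=False, swap=False):
    """Cascade 1 to 4 modules into an 8- to 32-bit adder/subtractor.

    subtract models the ADD/SUB bit; swap models the A-B/B-A bit.
    """
    if not 1 <= modules <= 4:
        raise ValueError("a cluster groups at most four 8-bit modules")
    if swap:
        a, b = b, a
    carry = 1 if subtract else 0
    result = 0
    for i in range(modules):
        byte, carry = addsub8(a >> (8 * i), b >> (8 * i), carry, subtract)
        result |= byte << (8 * i)
    return result                # wraps modulo 2**(8 * modules)
```

With `modules=2` the model gives the 16-bit precision mentioned above for the lossless 5/3 calculation; a negative difference wraps to its 16-bit two's complement value.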
3.7.2 Coefficient Multiplier Cluster 
The second basic module in the proposed array is the coefficient multiplier cluster
[BALO-06c]. As explained earlier, the DWT is basically a filter operation that requires a
multiplier for multiplying different filter coefficients with the input samples. These filter
coefficients within the DWT array are multiplied through reconfigurable coefficient
multiplier clusters. Generally, multiplication is the most expensive operation in terms of
power consumption and time, and many research efforts have been devoted to
finding optimum solutions.
It is necessary to study the specific domain in order to define the design specifications and constraints
for the logic block. The array has been proposed for the DWT domain for JPEG-2000 and it
targets different types of DWT implementations. JPEG-2000 defines two types of
DWT operations (5/3 and 9/7). The 5/3 is simple to implement in terms of the multiplier
because all the coefficients are powers of two and can be realized in hardware through
appropriate shifting operations. However, the 9/7 based DWT has floating point
coefficients. The JPEG-2000 standard defines the 9/7 DWT for lossy image compression
[JPG-200]. Mahesh et al. [MAH-098] proposed a technique for an area-efficient implementation
of multiplier-less FIR filters. The technique is based on representing a floating point filter
coefficient in canonical-signed-digit (CSD) representation [MAH-098]. Filter
coefficients for the 9/7 lifting based DWT are expressed in CSD form in
Table-3.1.
Table-3.1: 9/7 Filter coefficients in CSD form
        VALUE         12-BIT CSD REPRESENTATION
Alpha   1.586134342   = 1.5861816
Beta    0.052980118   2^-4 - 2^-7 - 2^-9 + 2^-12 = 0.0529785
Gamma   0.882911076   2^0 - 2^-3 + 2^-7 = 0.8828125
Delta   0.443506852   2^-1 - 2^-4 + 2^-7 - 2^-9 + 2^-12 = 0.4436035
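The CSD entries of Table-3.1 can be reproduced with a short Python sketch. It is an illustration under stated assumptions rather than the procedure used in the thesis: each coefficient is rounded to 12 fractional bits and converted with the standard non-adjacent-form recoding, and the function names are hypothetical. The last function shows how a CSD coefficient turns a multiplication into shifts and add/subtract operations only.

```python
def to_csd(value, frac_bits=12):
    """Return CSD terms [(sign, exponent), ...] for a positive coefficient.

    The coefficient is first rounded to `frac_bits` fractional bits.
    """
    n = round(value * (1 << frac_bits))
    terms = []
    exponent = -frac_bits
    while n != 0:
        if n & 1:
            digit = 2 - (n & 3)      # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= digit               # guarantees the next digit is zero
            terms.append((digit, exponent))
        n >>= 1
        exponent += 1
    return terms[::-1]               # most significant term first


def csd_value(terms):
    """Numerical value represented by a list of CSD terms."""
    return sum(sign * 2.0 ** exponent for sign, exponent in terms)


def csd_multiply(x, terms, frac_bits=12):
    """Multiply integer x by a CSD coefficient using shifts and adds only."""
    acc = 0
    for sign, exponent in terms:
        acc += sign * (x << (exponent + frac_bits))
    return acc >> frac_bits          # drop the fractional bits (floor)
```

For example, `to_csd(0.052980118)` yields the terms +2^-4, -2^-7, -2^-9 and +2^-12, matching the Beta row, so multiplying a sample by Beta needs four shifted copies of the input and three add/subtract operations.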
Figure-3.12 illustrates the internal structure of the reconfigurable coefficient multiplier 
block incorporating canonical-signed-digit representation. The reconfigurable block has 
three internal sub-modules which are: 
• Configurable Shifter Module
• Add-Subtract (add-sub) Module
• Multiplexer Module
Each internal sub-module can be configured through individual configuration bits (cfg) for 










Figure-3.12: Reconfigurable Coefficient Multiplier Block
These sub-modules (programmable shifter, add-subtract and multiplexer) accommodate a
wide range of coefficients suitable for different DWT implementations. The cluster can
handle up to 32-bit operations to provide the required precision. The configurable shifter
performs multiplication or division, depending upon the algorithm. It can be
configured to multiply or divide by any even integer value between 2 and 32. The
add-subtract module is the same as explained before. The multiplexer is configured to select one of its
inputs based upon the DWT algorithm. The decision to incorporate adders and a multiplexer
inside the cluster was made to keep the interconnect load to a minimum. Because the reconfigurable
array is designed for a specific domain, incorporating these modules inside the
multiplier cluster gives better performance while keeping the required
flexibility.
The configurable shifter block can perform either shift-right or shift-left operations to
accommodate division and multiplication. Five reconfigurable bits are reserved to
determine the amount of shift required. Figure-3.13 explains the logic along with the
required configuration bits. The Mult/Divide input selects between division
and multiplication, and the enable input passes the required result to the output. The output is
set to zero if the block is not used.
Figure-3.13: Reconfigurable Shifter Block
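The behaviour described for the shifter block can be captured in a small Python model. This is a behavioural sketch under assumptions, not the actual circuit: it treats the data as unsigned, uses a logical right shift for division, assumes a 32-bit data path, and the function and parameter names are hypothetical.

```python
def shifter_block(value, shift_cfg, mult, enable, width=32):
    """Behavioural model of the configurable shifter.

    shift_cfg : five configuration bits giving the shift amount (0..31)
    mult      : True selects multiplication (shift left), False division (shift right)
    enable    : when False the block is unused and drives zero
    """
    if not enable:
        return 0
    shift = shift_cfg & 0b11111
    mask = (1 << width) - 1
    if mult:
        return (value << shift) & mask   # multiply by 2**shift, truncated to width
    return (value & mask) >> shift       # divide by 2**shift, rounding down
```

For instance, a shift amount of 3 with the multiply setting turns 5 into 40, and the same shift amount with the divide setting turns 40 back into 5.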
3.7.3 Configurable Buffer Cluster 
The third basic cluster is the configurable buffer cluster. The cluster can be programmed for
different bit-widths [BALO-06c]. The basic operation of the cluster is shown in
Figure-3.14. The cluster has been designed for functional and architectural flexibility. The
cluster can be reconfigured to perform three basic functions required in most DWT
calculations. These are:
• Buffer
• 2-to-1 multiplexer
• Straight connection between two wires
Figure-3.14: Proposed Buffer CLB
Figure-3.15 presents the cluster and demonstrates the process of interconnecting
different sub-modules for larger bit-width operations. Moreover, the operation of the
cluster itself can also be reconfigured through the configuration bits, which allows
complete reusability of the cluster and gives more flexibility to incorporate different types
of DWT algorithms for various applications.
The cluster is composed of four sub-modules (configurable blocks). These modules are
connected through reconfigurable switches. The switches are controlled through
configuration bits for connection. The cluster can be programmed for (2x1), (3x1), (4x1)
and (5x1) multiplexers. Figure-3.15 presents the cluster with configurable switches for
different implementation options. Internal configuration bits are used to configure each
block (CLB-1, CLB-2, CLB-3 and CLB-4) separately. The cfg-1, cfg-2, cfg-3 and cfg-4 bits
are used to connect outputs and inputs of different CLBs for different implementation
options.
Figure-3.15: The proposed Reconfigurable Buffer Cluster with configurable Switches
3.8 Programmable Interconnects 
Generally, programmable routing cells comprise 80% of the configurable logic block
tile and consequently are the main concern [KAF-003]. The routing is composed of switch
boxes, connection boxes, configurable switches and tracks. The internal design of these
reconfigurable switches and interconnect elements affects the overall flexibility and power
consumption of the array. The flexibility of a switch or connection box is determined by
the number of possible programmable connections. The flexibility of these boxes affects
the overall flexibility of the array (hence routability) as well as other characteristics such as
area and power consumption. Several research efforts have focused on designing optimal boxes
[ROS-090]. The characteristics and implementation of the proposed interconnect
hierarchy using these boxes are described below.
3.8.1 Configurable Switch 
The main element used in building the reconfigurable interconnects is a reconfigurable
switch, as shown in Figure-3.16. The switch is unidirectional: if the configuration bit (cfg)
is high, it makes the connection and the value is copied from a to b; if the configuration
bit is low, output b is held in a high-impedance state. One configuration bit is required per
switch. These switches are used within clusters to
cascade different sub-modules for larger data lengths. This feature gives reconfigurability
to the proposed logic blocks at the architecture level. A bidirectional switch is implemented




Figure-3.16: The Reconfigurable Switch
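The behaviour of the unidirectional switch can be modelled in a few lines of Python. This is a behavioural sketch only: high impedance is represented by `None`, the check for several enabled switches driving one track is an added assumption about how a correct configuration should behave, and the function names are hypothetical.

```python
def switch(a, cfg):
    """Unidirectional switch: copy a to b when cfg is 1, else high impedance (None)."""
    return a if cfg else None


def resolve_track(driver_outputs):
    """Value seen on a shared track driven by several switches.

    At most one switch may be enabled; otherwise the track is in contention.
    """
    driven = [value for value in driver_outputs if value is not None]
    if len(driven) > 1:
        raise ValueError("bus contention: more than one switch enabled")
    return driven[0] if driven else None
```

With one configuration bit per switch, selecting which of several sources drives a track amounts to setting exactly one cfg bit high and leaving the others low.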
3.8.2 Switch Boxes 
The switch block is a programmable interconnect block that is found at the intersection of
each horizontal and vertical routing channel. Figure-3.17 shows a simple switch box.
Figure-3.17: Simple Switch Box
The flexibility of a switch matrix (Fs) is defined as the number of outgoing tracks to which
each incoming track can connect (Figure-3.18(a, b)). Clearly, the flexibility
of each switch block is key to the overall flexibility and routability of the device. The
other important term, W, is defined as the number of connection directions, as represented
in Figure-3.18(c).
Figure-3.18: Switch Box Flexibility and Direction [DEP-098]
The switch block can be realized through different components, for example multiplexers,
pass transistors or tri-state buffers. Figure-3.19 shows two examples of connection
elements, the pass transistor and the tri-state buffer.
a) Pass Transistor   b) Tri-State Buffer 
Figure-3.19: Routing Switch Connections with Transistor Sizes [BET-099] 
Since the transistors in the switch block add capacitance and load each track, the 
switch block has a significant impact on the speed of each routable connection and hence 
on the speed of the overall reconfigurable architecture. In addition, because such a large 
portion of a reconfigurable fabric is devoted to routing, the chip area required by each 
switch block has a large effect on the achievable logic density of the device. Thus, 
the design of an efficient hardened switch block is of the utmost importance. Most of the 
research has been targeted towards non-synthesizable circuit designs and their routability 
[LEM-002][OCH-098]. Kafafi et al. [KAF-003] presented synthesizable interconnects for 
small arrays. These interconnects are based on a directional block which allows data flow 
in only one direction. The proposed switch box is a disjoint-type switch box 
composed of tri-state buffers. There are two primary reasons for choosing tri-state buffers: 
• They are synthesizable 
• They perform better than multiplexers [SAM-004] 
Figure-3.20: Disjoint Switch Box [ROS-090] 
Figure-3.20 presents the disjoint type of switch box used for the proposed array, with W = 
6 and flexibility = 3. A number of routing experiments showed that flexibility = 3 gives 
better performance in terms of area and power. Twelve configuration bits 




Figure-3.21: Required Configurable Switches per Switch Box 
If a switch block is placed at an intersection of 12 one-bit and 12 eight-bit tracks, then the 
total number of configuration bits can be calculated as: 
Configuration bits per track = 12 
Total configuration bits for 12 tracks of 1-bit = 144 (12 x 12) 
Total configuration bits for 12 tracks of 8-bits each = 144 (12 x 12) 
Total Configuration bits per Switch Box = 288 
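The count above can be reproduced with a few lines of Python. This is a sketch using only the figures stated in the text (12 configuration bits per track, 12 one-bit tracks and 12 eight-bit tracks); the function name is hypothetical. It assumes, as the totals above imply, that an 8-bit track is switched as a whole bus, so its configuration cost does not depend on its width.

```python
def switch_box_config_bits(bits_per_track, one_bit_tracks, eight_bit_tracks):
    """Configuration bits for one switch box.

    Every track, whatever its width, costs `bits_per_track` bits because
    a bus is assumed to be switched as a single unit.
    """
    return bits_per_track * (one_bit_tracks + eight_bit_tracks)


total = switch_box_config_bits(bits_per_track=12, one_bit_tracks=12, eight_bit_tracks=12)
print(total)  # 288
```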
3.8.3 Connection Boxes 
The role of the connection box is to connect the pins of the logic block to the tracks. 
Usually one pin is assigned to one track, as is shown in Figure-3.22. However, this can be 
reduced or increased to change the flexibility of the array. 
Figure-3.22: A Typical Connection Box 
J. Rose et al. studied different connection boxes and showed that the connection block 
should have high flexibility, because otherwise it is unlikely that a path will be able to 
connect at its terminating point [ROS-090]. 
Figure-3.23: Flexibility Definition of a Connection Box (C-Box) 
Flexibility, as explained before, is the key to efficient use of the routing area. 
Figure-3.23 illustrates the flexibility of a connection box (Fc). The C-Box shown has Fc = 
2 because each output from the logic blocks can be connected to either of two of the 
four tracks. 
The proposed array is routed successfully with a channel width of 16. The connection 
boxes are designed with 100% flexibility, as explained and demonstrated by J. Rose et al. 
[ROS-090]. The connection switches are realized through tri-state buffers, which makes the 
array synthesizable. 
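The notion of connection-box flexibility can be made concrete with a small sketch. The pin names and the particular tracks chosen in `example` are hypothetical; only the values Fc = 2 (of four tracks) and 100% flexibility at a channel width of 16 come from the text.

```python
def connection_box_flexibility(pin_to_tracks):
    """Return Fc, the number of tracks each pin can be connected to."""
    counts = {len(set(tracks)) for tracks in pin_to_tracks.values()}
    if len(counts) != 1:
        raise ValueError("pins do not all have the same flexibility")
    return counts.pop()


def full_connection_box(pins, channel_width):
    """Build a C-Box with 100% flexibility: every pin reaches every track."""
    return {pin: list(range(channel_width)) for pin in pins}


# Fc = 2: each output can reach two of the four tracks (track choice is illustrative).
example = {"out0": [0, 2], "out1": [1, 3]}

# 100% flexibility at channel width 16, for three illustrative pins.
full = full_connection_box(["in0", "in1", "out0"], channel_width=16)
switch_count = sum(len(tracks) for tracks in full.values())
```

With one configuration bit per switch, `switch_count` also gives the number of configuration bits such a connection box would need, which is the cost paid for full flexibility.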
3.8.4 Tracks 
Tracks carry information from the inputs to the configurable blocks, from one 
block to another, or from a block to any of the outputs. The elements of the array are 
interconnected through symmetrical configurable switches. Sixteen 8-bit tracks and 
sixteen 1-bit tracks are provided for the data and control lines. Mixed tracks (8-bit and 
1-bit) were chosen to reduce the number of configuration bits and hence the configuration 
memory. The 8-bit tracks are used for data transfer and the 1-bit tracks for control 
signals. Hence, the data ports of the clusters are connected (through a connection box) to the 
tracks. Connections are made at the 8-bit level, i.e. an 8-bit port can only be connected to one of the 
available tracks (or to multiple tracks at the same time). For 16-bit values, two 8-bit tracks 
are used simultaneously. Tracks are simple wires connecting two segments through a 
configurable switch. The VPR tool [VPR-001] is used to determine the minimum channel 
width (number of tracks) required to route the proposed array successfully. Figure-3.24, a 
snapshot from the VPR tool [VPR-001], shows a cluster with C-Boxes for its pin connections 
to tracks along with S-Boxes for inter-track connections. 
Figure-3.24: Cluster with S-Boxes and C-Boxes created using VPR Tool [VPR-001] 
3.9 Placement and Routing 
VPR (Versatile Place and Route) [VPR-001] is an FPGA placement and routing tool and 
was used for the placement and routing of the proposed array. VPR requires certain 
information regarding the design before it can place and route: 
• Net-list File 
• Architecture File 
• Placement File 
The net-list (design.net) contains the information about the design to be placed and/or 
routed, while the architecture file (design.arch) describes the reconfigurable 
architecture in which the circuit is to be realized. 
VPR can be run in one of two basic modes. In its default mode, VPR places a circuit on a 
reconfigurable fabric and then repeatedly attempts to route it in order to find the minimum 
number of tracks required by the specified architecture. If a routing is unsuccessful, VPR 
increases the number of tracks in each routing channel and tries again; if a routing is 
successful, VPR decreases the number of tracks before trying to route it again. Once the 
minimum number of tracks required to route the circuit is found, VPR exits. The other 
mode of VPR is invoked when a user specifies a channel width for routing. In this 
case, VPR places a circuit and attempts to route it only once, with the specified channel 
width. If the circuit will not route at the specified channel width, VPR simply reports that it 
is not routable. 
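The default mode described above is essentially a search over channel widths. The sketch below illustrates the idea only and is not VPR's code: `try_route` is a hypothetical callback that returns True when routing succeeds at a given width, and the doubling and bisection steps are assumptions (the text only states that the width is increased after a failure and decreased after a success). It also assumes that a circuit routable at some width is routable at every larger width.

```python
def find_min_channel_width(try_route, start_width=8, max_width=1024):
    """Return the smallest channel width for which try_route(width) succeeds."""
    width = start_width
    failed = 0  # largest width known to fail (0 means none tried yet)
    while not try_route(width):  # routing failed: widen the channel
        failed = width
        width *= 2
        if width > max_width:
            raise RuntimeError("circuit not routable within max_width")
    routed = width  # smallest width known to succeed so far
    while routed - failed > 1:  # close the gap between failure and success
        mid = (failed + routed) // 2
        if try_route(mid):
            routed = mid
        else:
            failed = mid
    return routed


# Stand-in router for illustration: pretend the circuit needs 16 tracks.
minimum = find_min_channel_width(lambda width: width >= 16)
```

The second VPR mode corresponds to a single call of `try_route` at the user-specified width.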
3.9.1 Circuit Net-list 
There are three basic circuit elements available in any reconfigurable design: 
• input pads 
• output pads 
• logic blocks 
These are specified using the keywords .input, .output, and .clb, respectively. The format 
is shown below. 
.input/.output/.clb blockname 
pinlist: net_a net_b net_c 
# Only needed if a clb 
subblock: subblock_name pin_num1 pin_num2 ... #BLE0 
[subblock: subblock_name pin_num1 pin_num2 ...] #BLE1 
The file gives the name of each logic block and lists the names of 
the nets connected to each pin of the logic block or pad. Input and output pads (.input and 
.output) have only one pin, while CLBs have as many pins as specified in 
the architecture file. The first net listed in the pin-list connects to pin 0 of a CLB, and so 
on. If a pin of a CLB is to be left unconnected, the corresponding entry in the pin-list 
is the reserved word 'open' instead of a net name. Logic blocks (.clb) must also 
specify their internal contents with sub-block lines. The net-list 
file is generated by an automated software tool which was developed based on the 
information specified by the VPR tool [VPR-001]. 
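A generator for such net-list entries might look like the following sketch. It is not the tool developed for this work: `format_block` and its arguments are hypothetical, the block and net names are illustrative, and using 'open' for an unconnected sub-block pin is an assumption carried over from the pin-list rule.

```python
def format_block(kind, name, pins, subblocks=()):
    """Format one net-list element in the layout described above.

    kind      -- ".input", ".output" or ".clb"
    pins      -- net names in pin order; None marks an unconnected pin
    subblocks -- (subblock_name, pin_numbers) pairs, used only for a .clb
    """
    if kind not in (".input", ".output", ".clb"):
        raise ValueError(f"unknown element type: {kind}")
    if kind != ".clb" and len(pins) != 1:
        raise ValueError("pads have exactly one pin")
    pin_list = " ".join("open" if net is None else net for net in pins)
    lines = [f"{kind} {name}", f"pinlist: {pin_list}"]
    if kind == ".clb":
        for sub_name, sub_pins in subblocks:
            # Assumption: unconnected sub-block pins are also written as 'open'.
            sub = " ".join("open" if p is None else str(p) for p in sub_pins)
            lines.append(f"subblock: {sub_name} {sub}")
    return "\n".join(lines)


pad = format_block(".input", "in0", ["in0"])
clb = format_block(".clb", "add0", ["a", None, "c"], [("ble0", [0, None, 2])])
```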
3.9.2 Reconfigurable Core's Architecture 
The architecture file (design.arch) describes the architecture of the reconfigurable fabric 
in which the circuit is to be realized. The file contains information on all the 
pins along with their pin numbers. Each pin is allocated a class (input, output, etc.) and 
the location of the pin is specified. The outcome of the architecture file for a design is 
shown in Figure-3.25. 
Figure-3.25: Outcome of Architecture File 
3.9.3 Placement Of Clusters 
The placement of the different reconfigurable clusters, along with the input and output 
pins, is carried out through the placement file. VPR accepts a user-defined placement file, 
or it can generate one with its own placement algorithms. The first line of the placement 
file lists the net-list file and the architecture description file used by VPR to create the 
placement. The second line of the placement file gives the size of the logic block array 
(e.g. 20 x 20 logic blocks). Each remaining line has the format: 
block_name x y subblock_number 
The block name is the name of the block, which is the same as that given in the input 
netlist; x and y are the row and column in which the block is placed, respectively. The 
subblock number is used only for pads. Since two pads can be placed in a row or a column 
(see the FPGA architecture description), the subblock number specifies which of the 
possible pad locations (either location 0 or location 1) in row x and column y contains this 
pad. Figure-3.26 shows the coordinate system used by VPR for a 2 x 2 logic block array. 
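Reading one block entry of a placement file can be sketched as below. The `Placement` type and `parse_placement_line` are hypothetical helpers written for illustration, the block name is made up, and the sketch covers only the four fields named in the text (block name, x, y and sub-block number).

```python
from typing import NamedTuple


class Placement(NamedTuple):
    block_name: str
    x: int
    y: int
    subblock: int  # only meaningful for pads (location 0 or 1)


def parse_placement_line(line):
    """Parse one 'block_name x y subblock_number' entry."""
    name, x, y, subblock = line.split()[:4]
    return Placement(name, int(x), int(y), int(subblock))


entry = parse_placement_line("add0    2   1   0")
```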
Figure-3.26: Outcome of Architecture File 
The placement is carried out so as to keep the interconnect load as low as possible. It was 
determined by examining a variety of DWT algorithms to establish the sequence of 
the different logical/arithmetic operations. Clusters which share more information with 
each other are placed next to each other, and the inputs and outputs are placed close to 
where they are used. Hence, the clusters were arranged in the array to keep the configuration bits 
and channel track width to a minimum while maintaining the flexibility of the newly designed 
array. Figure-3.27 shows the placement of the different programmable blocks in the proposed 
array. 
Figure-3.27: Reconfigurable Array for DWT Computation 
The VPR tool confirmed that the array is routable with the minimum channel width. VPR 
is an open-source tool, and it was modified to output the configuration bits for a 
particular routing. These configurations were used to configure the array for different 
DWT implementations. The tool was modified in such a way that only active bits are 
given for a particular implementation option [BALO-06c]. All the routing resources are 
first programmed with non-active bits; this is termed initialization. All 
routing elements are made addressable, and active bits are then loaded to configure the fabric 
for a particular implementation. The decision to make each element addressable was made 
to provide partial reconfiguration functionality and, later on, to support single event upset 
mitigation (discussed in later chapters). 
3.10 DWT Implementations 
Different DWT algorithms were implemented on the proposed 
architecture. The input data is an image (Lenna [512x512]). Different data extension 
strategies were studied to implement the different DWTs. The data extension and storage were 
carried out outside the array and are explained in the next section along with the different 
implementations. 
3.10.1 Data Extension for Image Compression 
Discrete Wavelet Transform operations are basically filtering operations. A perfect linear 
filter would have an infinite number of taps, and the data on which the filter operates would 
also be infinite in length. In practice, however, neither the filter taps nor the pixels in 
an image are infinite. As a result, the output of a practical filter not only has a 
different number of samples than the original data/image but also inevitably 
contains distortions or artefacts, especially at the edges of the image. 
These artefacts become worse at low-bit-rate compression due to the aggressive quantisation 
which follows the transformation. If a system has no resource 
constraints, the wavelet transform can be performed on the whole image. In this way, the 
artefacts are made less obvious to the observer since they appear at the edge of the image. 
However, when limited memory is available, the image has to be tiled and the tiles processed 
independently; the artefacts are then especially disturbing along the boundaries of the tiles 
due to the discontinuity. These problems can be reduced or eliminated by using several 
different techniques. One effective way of reducing edge artefacts is symmetric 
extension. Such extension also ensures that a DWT, at the edge/boundary of an image, has 
enough data to produce a single output. The work on symmetric extension was 
first presented by Smith et al. [SMI-090]. Symmetric extension not only keeps the 
number of analyzed or decomposed wavelet transform coefficients the same as the number 
of pixels in the original input image, it also results in symmetrically decomposed 
coefficients. There are two types of symmetric extension [MAR-093]: 
• Half sample (HS) extension 
• Whole sample (WS) extension 
Examples of a HS and WS extension are shown in Figure-3.28 along with the original 
input data. 
a) Original Data Samples   b) Half Sample Extension (HS)   c) Whole Sample Extension (WS) 
Figure-3.28: JPEG2000 Symmetric Data Extension 
The data extension scheme adopted by the JPEG-2000 standard is the WS 
symmetric extension shown in Figure-3.29. Here i_0 is the index of the first input data 
sample and i_1 the index of the last input data sample. 
D C B | A B C D E F G H | G F E 
Extension                 Extension 
Figure-3.29: JPEG-2000 Symmetric Data Extension 
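The two extension types can be sketched in a few lines of Python. The function names are hypothetical, and this is the conventional, explicit form of the extension rather than the embedded scheme discussed in this section. The sketch assumes the requested extension is shorter than the signal itself.

```python
def hs_extend(x, left, right):
    """Half-sample symmetric extension: the boundary sample is repeated."""
    n = len(x)
    head = [x[i] for i in range(left - 1, -1, -1)]
    tail = [x[n - 1 - i] for i in range(right)]
    return head + list(x) + tail


def ws_extend(x, left, right):
    """Whole-sample symmetric extension: mirror about the boundary sample
    without repeating it."""
    n = len(x)
    head = [x[i] for i in range(left, 0, -1)]
    tail = [x[n - 2 - i] for i in range(right)]
    return head + list(x) + tail


samples = "ABCDEFGH"
ws = "".join(ws_extend(samples, 3, 3))
hs = "".join(hs_extend(samples, 3, 3))
```

For the samples A to H, the whole-sample case yields the pattern of Figure-3.29 (D C B to the left, G F E to the right), while the half-sample case repeats A and H at the boundaries.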
The standard also specifies the number of samples to extend to the left/right depending on 
whether the first/last sample has an odd or even index. There are several ways of 
implementing the JPEG-2000 data extension. A conventional, 
straightforward data extension is both area and operation intensive. For DWT 
compression to be viable for wide usage, the power consumption of the DWT process 
needs to be reduced. An embedded data extension algorithm was designed and developed 
to reduce both the redundant power and the area required by the data extension process 
[BEN-004]. Both area and power reduction are achieved by combining and embedding the data 
extension into the main DWT algorithm at both the start and the end of the transformation 
process. The embedded data extension algorithm can be applied to both the 
analysis/decomposition and the synthesis/re-composition process [BEN-004]. The algorithm 
embeds the data extension into the main DWT by simplifying the 
main lifting steps, taking advantage of the repetition of some terms in the extended 
data. Equation 3.2 describes the implementation of the 5/3 lifting based forward (analysis) 
DWT and is used for the computation of the odd and the even coefficients. The 
odd coefficients have to be computed first, followed by the 
even coefficients. The relationship between the output Y and its 
extended input data sample X is shown in Figure-3.30. 
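\[
\begin{aligned}
HP:\; Y(2n+1) &= X_{ext}(2n+1) - \left\lfloor \frac{X_{ext}(2n) + X_{ext}(2n+2)}{2} \right\rfloor \\
LP:\; Y(2n) &= X_{ext}(2n) + \left\lfloor \frac{Y(2n-1) + Y(2n+1) + 2}{4} \right\rfloor
\end{aligned}
\qquad (3.2)
\]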
a) Odd numbered starting and Even numbered ending term 
b) Even numbered starting and Odd numbered ending term 
Figure-3.30: Data Dependence Graph of JPEG-2000 Data Extension [BEN-004] 
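The order of computation (odd coefficients first, then even coefficients) can be illustrated with a short Python sketch of the 5/3 lifting steps. It is not the hardware implementation of this work: the function names are hypothetical, the lifting steps are assumed to be the standard reversible JPEG-2000 5/3 steps with floor rounding, the boundaries are handled by whole-sample symmetric index reflection, and the input is assumed to have even length and to start at an even index.

```python
def _mirror(i, n):
    """Reflect index i into 0..n-1 (whole-sample symmetric extension)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def dwt53_forward(x):
    """One level of the 5/3 lifting DWT; returns (low, high) coefficient lists."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("sketch assumes an even-length signal")
    half = n // 2
    # Odd (high-pass) coefficients first: Y(2k+1).
    high = [x[2 * k + 1] - (x[2 * k] + x[_mirror(2 * k + 2, n)]) // 2
            for k in range(half)]
    # Even (low-pass) coefficients next: Y(2k); by symmetry Y(-1) equals Y(1).
    low = [x[2 * k] + (high[max(k - 1, 0)] + high[k] + 2) // 4
           for k in range(half)]
    return low, high


def dwt53_inverse(low, high):
    """Undo dwt53_forward exactly by reversing the lifting steps."""
    half = len(low)
    n = 2 * half
    x = [0] * n
    for k in range(half):  # recover the even samples first
        x[2 * k] = low[k] - (high[max(k - 1, 0)] + high[k] + 2) // 4
    for k in range(half):  # then the odd samples
        x[2 * k + 1] = high[k] + (x[2 * k] + x[_mirror(2 * k + 2, n)]) // 2
    return x


signal = [10, 12, 14, 200, 3, 7, 9, 1]
low, high = dwt53_forward(signal)
restored = dwt53_inverse(low, high)
```

Because the inverse subtracts exactly the integer terms that the forward transform added, `restored` equals `signal`, which is the property that makes the 5/3 filter suitable for lossless compression.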
As such, both data flow-graphs, Figure-3.30[a] and Figure-3.30[b], can be collapsed 
and redrawn as shown in Figure-3.31[a, b] [BEN-004]. The arrow with a line across it, 
in both Figure-3.31[a] and Figure-3.31[b], represents doubling the proportion of the 
originating terms in the contribution to the resulting coefficient terms. 
a) Odd numbered starting and Even numbered ending term 
b) Even numbered starting and Odd numbered ending term 
Figure-3.31: Data Dependence Graph of JPEG2000 Data Extension [BEN-004] 
The scheme proved to be power and area efficient for the implementation of the DWT 
[BEN-004]. The same scheme is used for the data extension of the images. The data extension 
part is implemented outside the reconfigurable array. The decision was motivated by the fact that 
the data extension is a one-off process and can therefore be implemented in the memory 
controller. The reconfigurable array works on the extended data along with the original 
image. This design choice makes the array usable with any type of data extension 
algorithm, as the extension is carried out separately. 
3.10.2 DWT Implementation-1 
The 5/3 lifting based DWT has many advantages over other DWT algorithms. The 5/3 filter 
allows lossless image compression and has a short filter length for both the low-pass and 
high-pass filters compared to the other JPEG-2000 specified DWT filter, i.e. the Daubechies 9/7 
filter [DAU-092]. A 5/3 filter has only one set of lifting steps, compared to the 9/7, which has 
two. 
The proposed implementation scheme is based on three input samples [BALO-06c]. 
The implementation is efficient and distinctive in that it requires fewer elements 
than other implementation techniques [LIA-001][LIA-002]. An independent block 
was designed in Verilog outside the array for parsing and combining the input 
samples. The raw image is fed into the input generator block, which produces the 
three pixels required for each DWT calculation. The three pixels are selected on the principle 
explained in section 3.10.1. Figure-3.32 explains the process. 
Figure-3.32: Input Generator Block for DWT implementation 
The implementation scheme is used for both the 5/3 and 9/7 DWT. This gives an added 
advantage, as there is no need to reconfigure the whole array; only the multiplier block is 
reconfigured for the 5/3 and 9/7 DWT calculations. The proposed implementation uses only 
one buffer and gives output DWT coefficients on every clock cycle, while [LIA-001][LIA-002] 
requires 6 pipelines and 4 pipeline registers and initially requires more than one clock 
cycle to give the first output. The hardware realization of the proposed implementation is 
shown in Figure-3.33 and was used to benchmark the performance of the proposed array 
against standard FPGAs. A test image of Lenna (512 x 512), shown in Figure-3.34[a], was 
processed through the proposed architecture. The output from the proposed architecture 
was processed in MATLAB and the original picture was reconstructed (Figure-3.34[b]). 
Figure-3.33: Realization of DWT on the Proposed Reconfigurable Array 
The coefficient multiplier cluster is configured in such a way that it performs the required 
coefficient multiplication. The unused modules in the proposed array are kept off with 
the help of the configuration bits. This helps to reduce the overall power consumption. The 
important feature of this implementation is that the configuration for the RA is the same 
for the 5/3 and 9/7 lifting based DWT. This allows the array to be dynamically 
reconfigurable for 5/3 and 9/7 without changing the array configuration bits. 
a) DWT Process Lenna [512x512]   b) Reconstructed Image (MATLAB) 
Figure-3.34: DWT processing through Proposed Reconfigurable Array 
3.10.3 DWT Implementation-2 
Lian et al. introduced an architecture for the implementation of the 5/3 and 9/7 lifting based 
DWT [LIA-001]. The architecture is efficient in terms of reusability. This 
scheme was implemented on the proposed reconfigurable architecture. The coefficient 
multiplication was performed through a coefficient multiplier cluster based on 
canonic-signed-digit arithmetic [MAH-098]. The main difference between Lian's scheme and the 
technique proposed earlier is the number of input pixels. Lian's scheme [LIA-001] is based on 
two input samples and has extra pipeline stages to enhance the throughput. The hardware 






Figure-3.35: Realization of Lian et al. [LIA-001] in terms of Reconfigurable Array 
The bridge image shown in Figure-3.36, was processed through the proposed 
reconfigurable array with Lian's implementation technique. 
Figure-3.36: DWT Filtering Through the Proposed Architecture 
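The canonic-signed-digit idea behind the coefficient multiplier can be illustrated in software. The sketch below is not a model of the cluster itself: `to_csd` and `csd_multiply` are hypothetical names, and the coefficient is assumed to be a non-negative integer. It shows how a constant is rewritten with digits from {-1, 0, +1}, no two adjacent digits being non-zero, so that the multiplication reduces to shifts combined by additions and subtractions.

```python
def to_csd(value):
    """Return the CSD digits of a non-negative integer, least significant first."""
    if value < 0:
        raise ValueError("sketch handles non-negative coefficients only")
    digits = []
    while value:
        if value & 1:
            digit = 2 - (value & 3)  # +1 if value % 4 == 1, -1 if value % 4 == 3
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x, coefficient):
    """Multiply x by a constant using only shifts, additions and subtractions."""
    return sum(digit * (x << shift)
               for shift, digit in enumerate(to_csd(coefficient)))


digits_of_7 = to_csd(7)        # 7 = 8 - 1
product = csd_multiply(5, 7)   # (5 << 3) - 5
```

Writing 7 as 8 - 1 needs one subtraction instead of the two additions required by the plain binary form 4 + 2 + 1, which is the kind of saving that makes CSD attractive for fixed filter coefficients.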
3.10.4 DWT Implementation-3 
An integer fast DWT implementation was carried out on the proposed array to check 
the flexibility of the array. The previous two implementations were lifting based DWTs. 
The integer based DWT is an efficient approach based on computing power-of-two 
wavelet coefficients, which are derived directly from the roots of the half-band filter 
[DAN-002]. 
This technique is efficient and fast as it does not use any multipliers. The scheme 
only computes the 9/7 DWT. The DWT is computed through the following equation. 
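A = N*C*S*L*I 	 (3.3) 
'A' denotes the outputs of the two-channel filter bank system; the other 
components of equation-3.3 are defined as: 
\[
A = \begin{bmatrix} a_0(n) \\ a_1(n) \end{bmatrix}, \qquad
C = \begin{bmatrix} 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 \end{bmatrix}, \qquad
L = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0
\end{bmatrix}
\]
\[
I = [\, x(n-4) \;\cdots\; x(n-1) \;\; x(n) \;\cdots\; x(n+4) \,]
\]
N is a 2 x 2 diagonal matrix of power-of-two scaling factors and S is a 5 x 5 matrix of 
signed power-of-two coefficients, as given in [DAN-002]. 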
The proposed reconfigurable array is flexible to incorporate the integer Fast DWT and 













Figure-3.37: 9/7 Integer Fast DWT Implementation [DAN-002] 
DWT algorithms for 5/3 and 9/7 were implemented on our proposed array. The 
hardware/cluster utilisation of different DWT algorithms is presented in Table-3.2. The 
RA has been specially tailored in terms of size due to the enhanced functionality of the 
clusters. This helps in reducing the overall power consumption of the improved 
customized domain-specific array. The results are included in a later section. The choice 
of the different cluster positions in the array is inspired by the study of different DWT 
algorithms and their implementations. For example, addition/subtraction operations are 
required frequently in the computations and, for this reason, the add-subtract cluster is 
distributed all over the array. 
Table-3.2: Comparison of DWT algorithms based upon utilization of proposed array 
IMPLEMENTATIONS               CONFIGURABLE          CONFIGURABLE    CONFIGURABLE 
                              COEFFICIENT           BUFFER          ADD-SUBTRACT 
                              MULTIPLIER CLUSTER    CLUSTER         CLUSTER 
Proposed DWT Architecture 
  5/3 lifting based DWT       2                     1               5 
  9/7 lifting based DWT       2                     1               5 
Lian et al. [LIA-001] 
  5/3 lifting based DWT       2                     8               4 
  9/7 lifting based DWT       2                     8               4 
Dang et al. [DAN-002] 
  9/7 lifting based DWT       5                     2               8 
3.11 Performance Evaluations 
Three different VLSI implementations (discussed above) of the 1-D DWT were implemented 
on the proposed reconfigurable array. The same DWTs were also implemented on a general 
reconfigurable architecture. The Xilinx Virtex-E XCV50E device was chosen as the target 
device [XIL-DAT]. All these systems use 0.18µm CMOS technology and run at 1.8V. 
The values are measured for a single frame of the Lenna image (128 x 128). Active HDL 5.1 
was used for RTL simulations. Cadence's Verilog-XL was used for post simulation to 
verify the designs. Power analysis of the proposed architecture was carried out with 
Synopsys Prime Power. Xilinx ISE 6.2i was used to obtain the performance figures in terms of 
area, timing and power. 
3.11.1 Area Comparison 
The area estimates for the Xilinx Virtex-E XCV50E are presented in Table-3.3. The Xilinx 
FPGA was selected because it is based on 0.18µm CMOS technology. The 
proposed reconfigurable architecture is also synthesized on the same technology for the sake 
of fair comparison. The area figures are an estimate (Virtex-E CLB area ≈ 30 µm², each 
Virtex-E CLB consists of two slices) provided by Chipworks [CHI-WEB][SAM-004]. It is 
worth mentioning that the routing and mapping of the proposed array are done with the VPR 
tool and hence the results are less optimized than those of the Virtex-E, which benefits from 
commercial intelligent routing and placement algorithms. The area results of the proposed 
array can be improved further by employing more sophisticated tools for routing and 
placement, as careful placement of logic blocks can optimise the performance of the circuit 
in terms of speed as well as power consumption by reducing the routing requirements. It 
can be seen that an area saving of 7% to 12% is achieved through the proposed 
architecture. As mentioned earlier, the area saving can be increased through dedicated 
and sophisticated placement and routing algorithms. 
Table-3.3: Comparison of Map and Place & Route Reports. 
LIAN 





SLICES (of 768)                         102 (13%)    66 (8%)      425 (55%) 
SLICE FFs (of 1,536)                    95 (6%)      41 (2%)      188 (12%) 
4 input LUTs (of 1,536)                 128 (8%)     109 (7%)     558 (36%) 
  used as Logic                         115          96           413 
  used as route-thru                    13           13           145 
Total gate count                        2,355        1,809        9,159 
JTAG gate count for IOBs                3,216        3,984        5,032 
TIMING 
Avg Connection Dly                      1.259 ns     1.042 ns     1.163 ns 
Avg Connection Dly for 10 worst nets    2.704 ns     2.567 ns     2.868 ns 
Max pin delay                           5.121 ns     5.045 ns     5.874 ns 
3.11.2 Power Comparison 
The power consumption values for our array for the different implementations were 
obtained from post-routing simulation with switching activity and accurate parasitic and 
load information. The tools used for power evaluation were Cadence's Silicon Ensemble 
and Synopsys's Prime Power. In the case of the Virtex-E FPGA, the power figures are 
obtained from the typical estimates provided by Xilinx [XIL-DAT]. The estimates for the 
Virtex-II device are based on a study of the average power consumption of more than 500 
designs, which is reported as 5.9 µW/MHz per CLB [XIL-DAT]. The values provided 
include the power consumed by the configuration circuit and configuration memory. Our 
proposed array consumes between 42% and 54% less power than the Virtex-E. This is 
mainly because fewer interconnects and larger clusters are used in our array. 
It is evident from the results presented in Table-3.4 that the proposed array is 
efficient in speed and power consumption for all the DWT implementations. This is 
because the clusters are customized for one type of computation, as compared to the generic 
small clusters of generic FPGAs. The customization of clusters for one domain and the 
arrangement of clusters help to reduce the number of clusters required for the desired 
operation, and hence the C-Boxes and S-Boxes (which are the main cause of power 
consumption [SAM-004]). 




















                   5/3     9/7     MHz  |  5/3     9/7     MHz  |  9/7     MHz 
Xilinx Virtex-E    8.29    14.01   118  |  11.5    16.59   112  |  27.13   102 
The Proposed RA    4.9     7.61    140  |  6.03    8.83    135  |  14.35   130 
The percentage power saving is shown in Figure-3.38. As stated before the proposed array 












Figure-3.38: Percentage Power Saving 
3.11.3 Area & Power Distribution Among Clusters 
Power and area measurements of the different clusters were performed. The coefficient 
multiplier cluster uses more area and power than the buffer and add-subtract clusters, as 
illustrated in Figure-3.39. This is expected, as the coefficient multiplier consists of multiple 
shifter blocks (6) and add-subtract logic blocks (5). 
The coefficient multiplier cluster uses approximately 95% more power than the 
reconfigurable buffer cluster and approximately 88% more power than the add-subtract 
cluster. Figure-3.40 shows the area overhead incurred to make the hardware reconfigurable. 
The add-subtract cluster occupies only 6% of the total area while the C-Box and S-Box switches 
occupy 54% and 46% respectively. As can be seen from the graph, these area values 
include the area occupied by the configuration registers, which represents a large 
percentage of the area of the boxes. The total area can be reduced considerably if the 
flexibility of the C- and S-boxes is reduced. 
Figure-3.39: Average Percentage Relative power consumption of different clusters with respect to 







Cluster   C-Box   S-Box 
Figure-3.40: Area Distribution of Add-Subtract cluster of the Reconfigurable Array 
Figure-3.41 shows the average power consumption of one add-subtract cluster and its 
associated C-Boxes and S-Boxes. The cluster itself consumes only approximately 11% of the 
total power, while the C-Box consumes more than 50% and the S-Box approximately 41%. This is 
expected for two reasons: 
• High number of configurable switches in the signal path 
• Long Routing Paths 
This is a trade-off between flexibility and power consumption. The power consumption can be 
lowered by reducing the flexibility of the routing matrix (S-Boxes, C-Boxes). The proposed 
array offers a power advantage without losing much flexibility, as the arrays are 
tailored for a specific domain. 
S-Box   C-Box 
Figure-3.41: Average Power Distribution of Add-Subtract cluster 
The proposed reconfigurable array has a maximum frequency 18% to 27% higher than the 
Virtex-E (Table-3.4). This is due to a smaller interconnect load. As explained earlier, 
the transistors in each switch block add capacitance to each track, which has a significant 
effect on the speed of each routable connection and hence on the speed of the reconfigurable 
architecture. Our proposed reconfigurable array has fewer interconnects because more 
functionality is placed inside the reconfigurable clusters, and it is therefore faster in terms of 
operating speed than the commercial FPGA. 
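The quoted frequency range can be checked against the MHz columns of Table-3.4. The short sketch below does this arithmetic; labelling the three column groups as Implementation-1 to Implementation-3, in table order, is an assumption, and the variable and function names are illustrative.

```python
# Maximum frequencies in MHz, read from Table-3.4 (column labels assumed).
virtex_e_mhz = {"Implementation-1": 118, "Implementation-2": 112, "Implementation-3": 102}
proposed_mhz = {"Implementation-1": 140, "Implementation-2": 135, "Implementation-3": 130}


def frequency_gain_percent(baseline, improved):
    """Relative increase of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


gains = {name: frequency_gain_percent(virtex_e_mhz[name], proposed_mhz[name])
         for name in virtex_e_mhz}
for name, gain in gains.items():
    print(f"{name}: {gain:.1f}% higher")
```

The three gains come out at roughly 18.6%, 20.5% and 27.5%, in line with the 18% to 27% range stated above.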
3.12 Summary 
This chapter has described a new customized domain-specific reconfigurable array for the 
implementation of different DWT algorithms. Each implementation has different 
advantages in terms of power consumption and the time needed to complete the 
computations, as well as the quality and precision of the output. The results demonstrate the 
flexibility provided by the arrays in allowing the mapping of a range of different DWT 
algorithms while maintaining a performance advantage over generic commercial 
counterparts. The new flexible arrays provide a better alternative to generic FPGA solutions for 




ERROR DETECTION AND 
CORRECTION CODES 
4.1 Introduction 
Applications such as DSP are known to be arithmetic-intensive. These 
computation-intensive applications are served by different technologies such as ASICs, 
general-purpose DSP microprocessors, application specific standard products (ASSP), FPL 
devices (which include FPGAs) and general or application-specific reconfigurable SoC 
architectures. Reconfigurable architectures and FPGAs have comparatively weak arithmetic 
capabilities: for example, an M x M-bit multiplier or multiply-accumulate unit (MAC) 
implemented on reconfigurable fabric is inferior to a well designed ASIC arithmetic logic 
unit (ALU) in terms of both speed and area [DIC-000]. In addition, the deficiencies of 
reconfigurable devices increase geometrically with word-length (precision). These 
limitations have led designers to look for alternatives. The most 
popular and common technique in practice is distributed arithmetic (DA) [WHI-089]. The 
DA technique reduces an algorithm to a set of sequential look-up tables (LUTs). 
These LUTs are memories which contain all the possible inner products. For 
critical applications, such as space-related applications, the contents of these memories are 
vital for successful operation. High-energy ions or radiation can 
corrupt, or induce errors in, the contents of these memories. 
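As a rough illustration of the DA idea described above, the following Python sketch computes an inner product of fixed coefficients with unsigned input words using a single look-up table and bit-serial shift-and-accumulate. The function names, the example coefficients and the restriction to unsigned inputs are assumptions made for this sketch; they are not taken from the cited work.

```python
def build_da_lut(coeffs):
    """Precompute every possible partial sum of the coefficients.

    Entry `addr` holds the sum of coeffs[i] for every bit i set in `addr`.
    """
    n = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << n)]


def da_inner_product(coeffs, xs, width):
    """Bit-serial inner product sum(c * x) for unsigned x of `width` bits."""
    lut = build_da_lut(coeffs)
    acc = 0
    for b in range(width):
        # Gather bit b of every input word to form the LUT address.
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> b) & 1) << i
        # Shift-and-accumulate replaces the multipliers.
        acc += lut[addr] << b
    return acc
```

Because every partial sum lives in the LUT, a single flipped bit in one LUT entry changes the result for every input pattern that addresses it, which is why the memory contents need protection.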
The main design constraints faced during the development of space-qualified 
electronics are low power consumption, low area overhead, remote re-configurability and 
fault tolerance with respect to possible upsets due to cosmic radiation. In addition, 
in current and next-generation electronic components and systems, transient 
and intermittent faults are becoming more prominent due to high-energy radiation particles. At sea 
level, the energy of these radiation particles is not enough to drastically affect the 
operation of today's integrated circuits (ICs). However, it is predicted that, because 
device sizes are shrinking and power supply voltages are being reduced, even particles with this much 
energy will create soft errors at ground level. The situation 
becomes worse at flight altitudes. The disintegration of radioactive isotopes contained in 
the materials of electronic systems produces alpha particles, which are becoming another 
cause of increased SEUs in space-related electronics. The basic reason for the 
increased sensitivity to SEUs produced by either cosmic radiation or alpha particles is the 
reduction in device size and in power supply voltage. 
High reliability normally comes with extra hardware/software, which in turn costs more in 
area and power consumption. Achieving reliability is a process of exhaustive testing and of constantly 
improving the designs, either because of shortcomings of the design or because of advances in 
technology. These improvements add design time and cost. In fact, a design error (as 
trivial as an inverted signal) in a one-time programmable (OTP) device makes the device 
unusable, and a new device has to be programmed. This "trial and error" approach is costly 
in terms of development time and also in terms of wasted devices and printed circuit 
boards. 
In the design of space-related systems, reconfigurable architectures offer the 
advantage of a considerable reduction in development time and cost. FPL devices, which 
include FPGAs, are well documented and commercially widespread, and represent one of the 
most flexible architectures available today. One of the main problems related to the use of 
reconfigurable architectures in onboard systems is their sensitivity to SEU. These generic 
architectures also suffer from disadvantages such as large area and high power consumption. 
Several hardware/software approaches have been developed to bring these architectures 
up to the standards required by the space environment. 
In circuits containing reconfigurable architectures, an SEU can corrupt memory and 
logic nodes. The memory can be configuration memory or data memory. Corruption of the 
configuration memory can change the overall behaviour of the circuit, and corruption of 
the data memory can cause the circuit to malfunction and lose precision. 
One of the most common approaches to mitigating SEU is TMR with voter logic. The obvious 
disadvantage of this solution is that it triples the gate count, which in turn may triple 
area and power consumption. 
This area has been researched for some time in search of fault-tolerant architectures 
which can meet all the space-related qualifications and which can shorten the design 
cycle. The manner in which reconfigurable logic can be used to make a satellite vehicle 
less susceptible to hardware and software faults is a very broad, daunting and 
encompassing topic. 
In this chapter, we propose new single-error-correcting code based circuits (Encoder, 
Syndrome Generator and Decoder circuits, the latter comprising Syndrome Decoder and 
Error-correcting logic circuits), which not only reduce the complexity of both encoder and 
decoder circuits at all stages but are also more efficient in terms of area and power than all 
previously proposed circuits reported in the literature. 
4.2 Background 
Coding theory is the umbrella term used to cover the study of two related, but 
distinguishable, topics: "error-correcting codes" and "the mathematics behind reliable 
communication in the presence of noise". Error-correcting codes are collections of 
sequences of elements from a finite set (for example, words composed from a finite alphabet) 
in which any two words in the collection disagree on many coordinates. Thus the theory of 
error-correcting codes could be considered an area of combinatorial mathematics (with 
some algorithmic tasks that arise naturally). The theory of communication in the 
presence of noise, on the other hand, often leads to information theory and/or statistics. The 
two aspects of coding theory have enjoyed a symbiotic relationship since the days of their origin. 
Coding theory owes its origins to two roughly concurrent seminal works by Hamming 
[HAM-050] and Shannon [SHA-048] in the late 1940s. 
Hamming studied information storage devices and designed simple 
schemes to protect the information from corruption using the minimum possible 
number of bits. Hamming realized the explicit need to consider sets of words ("code-words") 
in which every pair differs in a large number of coordinates. Hamming extracted the 
combinatorial underpinnings of the theory of error-correcting codes. He defined the notion 
of distance between two sequences over finite alphabets, which is now known as the 
Hamming distance. Hamming also constructed an explicit family of codes, and his work 
was eventually published in 1950. 
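The notion of Hamming distance mentioned above can be sketched in a few lines of Python. The helper names below are illustrative assumptions; the second function computes the minimum pairwise distance of a small code, which is the quantity that determines how many errors the code can detect or correct.

```python
def hamming_distance(a, b):
    """Number of coordinates in which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(code):
    """Smallest pairwise Hamming distance over a collection of code-words."""
    words = list(code)
    return min(hamming_distance(u, v)
               for i, u in enumerate(words) for v in words[i + 1:])
```

For example, the two-word repetition code {"000", "111"} has minimum distance 3, which is enough to correct any single-bit error.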
Slightly prior to Hamming's publication, Shannon (1948) wrote a treatise of monumental 
impact formalizing the mathematics behind the theory of communication. In this treatise 
he developed probability and statistics to formalize a notion of information. He then 
applied this notion to study how a sender can communicate efficiently with a receiver over different 
media or, more generally, channels of communication. The channels under 
consideration were of two distinct types: Noiseless or Noisy. In the Noiseless case, the 
main goal is to compress the information at the sender's end in order to minimize the total 
number of symbols sent while allowing the receiver to recover the transmitted information 
correctly and efficiently. The goal in the Noisy case is to add some redundancy to the 
message being sent so that a few erroneous symbols at the receiver's end still allow the 
receiver to recover the sender's intended message. Shannon's work showed, contrary to 
popular belief at the time, that when transmitting information at a fixed feasible rate, longer 
messages were more likely to be recovered (completely correctly) than short ones. 
Shannon's and Hamming's works were chronologically and technically deeply intertwined. 
Technically these works complement each other perfectly. Hamming focussed on the 
mathematical properties behind these combinatorial objects, while Shannon created the 
perfect setting for their application. Shannon based his theory on probabilistic models (of 
error, message, etc.). Both works, however, were immediately seen to be of monumental 
impact. Error-correcting code based circuits have been used for years for tasks such as 
improving the reliability of communication and random access memory. However, most of 
the conventional error-correcting codes, such as Reed-Solomon codes or 
Bose-Chaudhuri-Hocquenghem (BCH) codes, are mainly used in digital communications, as they are not very 
efficient for on-chip DRAM applications [MUZ-093]. Encoder and decoder circuits 
designed around these codes suffer from very high access delays due to the use of a linear 
feedback shift register (LFSR) [MUZ-093][BOY-097]. For this reason, Hamming codes or 
extended Hamming codes are normally used for such applications. A general scheme 
for memory with error correction is shown in Figure-4.1. 
Figure-4.1: DRAM with Error Correction 
Kazéminéjad et al. [KAZ-001][KAZ-01a] proposed an improvement to the extended 
Hamming codes and showed that the resulting codes are efficient in terms of area, speed and 
power compared to the conventional extended Hamming code based encoder/decoder 
circuits. However, this improvement comes with the penalty of one extra bit. The idea of 
this SEU hardening technique is to identify that an error has occurred in a latch, flip-flop, 
register or memory and to correct the error when the stored value is used. It is necessary to 
use extra logic structures to correct the errors, according to the number and the class of 
storage cells located in the circuit. 
The Hamming Code is an error-detecting and error-correcting binary code that can 
correct all single-bit errors and, when an overall parity check bit is added, also detect all 
double-bit errors. This coding method is 
recommended for systems with a low probability of multiple errors in a single data 
structure (e.g., only a single bit in error in a byte of data). 
4.2.1 Hamming Code Definition 
A Hamming code satisfies the relation: 
2^k >= m+k+1, 
where m+k is the total number of bits in the coded word, m is the number of information 
bits in the original message, and k is the number of check bits in the coded word. The 
Hamming code can correct all single-bit errors in n-bit words (n = m+k) and detect double-bit errors 
when an overall parity check bit is used. The check bits are placed in the coded word at 
positions 1, 2, 4, ..., 2^(k-1). For example, for 8-bit data, 4 check bits (P1, P2, P3 and P4) are 
required so that the Hamming code can correct a single-bit error. Figure-4.2 shows a 
12-bit coded word (m=8 and k=4) with the check bits P1, P2, P3 and P4 located at 
positions 1, 2, 4 and 8 respectively. With the help of the check bits, it is possible to obtain the 
position of the erroneous bit. 
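The number of check bits can be found by direct search. The sketch below assumes the relation stated above is the standard Hamming condition 2^k >= m + k + 1; the function names are illustrative only.

```python
def check_bits_needed(m):
    """Smallest k with 2**k >= m + k + 1 for m information bits."""
    k = 1
    while 2 ** k < m + k + 1:
        k += 1
    return k


def check_bit_positions(m):
    """Positions 1, 2, 4, ..., 2**(k-1) occupied by the check bits."""
    return [2 ** i for i in range(check_bits_needed(m))]
```

For m = 8 this search yields k = 4 and the positions 1, 2, 4 and 8, in agreement with the 12-bit coded word used in the text.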
Position    1   2   3   4   5   6   7   8   9   10  11  12 
Check Bits  P1  P2      P3              P4 
Figure-4.2: The Hamming Code composition 
The check bit P1 creates even parity for the bit group {1, 3, 5, 7, 9, 11}. The check bit 
P2 creates even parity for the bit group {2, 3, 6, 7, 10, 11}. Similarly, P3 creates even 
parity for the bit group {4, 5, 6, 7, 12}. Finally, the check bit P4 creates even parity for 
the bit group {8, 9, 10, 11, 12}, as shown in Figure-4.2. 
Figure-4.3: The Check Bits in The Hamming Code 
An example of an SEU mitigation technique using the Hamming Code is presented in 
[COT00], [LIM00]. The following example explains the Hamming code in detail: 
                          D1  D2  D3  D4  D5  D6  D7  D8 
(Original Data Message)   1   1   0   0   1   1   1   0 
(D1, D2, ..., D8) represent the original data bits along with their positions in the original 
message. The Hamming Code with the check bit positions can be written as: 
P1 P2 D1 P3 D2 D3 D4 P4 D5 D6 D7 D8 
and the check bits (P1, P2, P3 and P4) are calculated through the following equations: 
P1 = D1 ⊕ D2 ⊕ D4 ⊕ D5 ⊕ D7    (4.1) 
P2 = D1 ⊕ D3 ⊕ D4 ⊕ D6 ⊕ D7    (4.2) 
P3 = D2 ⊕ D3 ⊕ D4 ⊕ D8         (4.3) 
P4 = D5 ⊕ D6 ⊕ D7 ⊕ D8         (4.4) 
P1 = 0,   P2 = 1,   P3 = 1,   P4 = 1 
Data = {11001110}   →   Hamming Code = {011110011110} 
Let us assume that one bit, D7, has been corrupted through SEU. The stored code word before 
and after the upset is then as follows: 
Data stored without SEU 
    = {011110011110} 
Data after SEU (D7 flipped) 
    = {011110011100} 
4.2.1.1 Error Detection and Correction: 
The modified data is read and the decoder re-computes the check bits through Equations 4.1 - 
4.4. The newly calculated and old check bits are as follows: 
P1 = 1,   P2 = 0,   P3 = 1,   P4 = 0   (Newly Calculated Check Bits) 
P1 = 0,   P2 = 1,   P3 = 1,   P4 = 1   (Original Check Bits) 
The error location in the Hamming code can now be determined. The error correcting logic 
determines the position of the rotten bit by XORing the old and new check bits: 
P4   P3   P2   P1 
1    1    1    0    (old) 
0    1    0    1    (new) 
1    0    1    1    (old ⊕ new: Bit Position 11 is Rotten) 
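The encode and locate steps of this example can be sketched in Python as follows. The parity groups are transcribed from Equations 4.1 - 4.4 and the codeword layout from the text; the function names and the list-based bit representation are assumptions made for this sketch, not the circuit described in the thesis.

```python
# Data-bit groups transcribed from Equations 4.1-4.4 (indices refer to D1..D8).
PARITY_GROUPS = {
    1: (1, 2, 4, 5, 7),
    2: (1, 3, 4, 6, 7),
    3: (2, 3, 4, 8),
    4: (5, 6, 7, 8),
}
# Codeword layout by position: P1 P2 D1 P3 D2 D3 D4 P4 D5 D6 D7 D8.
DATA_POSITIONS = (3, 5, 6, 7, 9, 10, 11, 12)


def check_bits(data):
    """Return [P1, P2, P3, P4] (even parity) for a list of data bits D1..D8."""
    return [sum(data[i - 1] for i in PARITY_GROUPS[p]) % 2 for p in (1, 2, 3, 4)]


def encode(data):
    """Build the 12-bit codeword as a list indexed by (position - 1)."""
    word = [0] * 12
    for bit, pos in zip(data, DATA_POSITIONS):
        word[pos - 1] = bit
    for i, p in enumerate(check_bits(data)):
        word[(1 << i) - 1] = p          # check bits sit at positions 1, 2, 4, 8
    return word


def syndrome(word):
    """Position (1..12) of a single flipped bit, or 0 if all checks agree."""
    data = [word[pos - 1] for pos in DATA_POSITIONS]
    stored = [word[(1 << i) - 1] for i in range(4)]
    recomputed = check_bits(data)
    return sum((s ^ r) << i for i, (s, r) in enumerate(zip(stored, recomputed)))


def correct(word):
    """Flip the bit named by the syndrome, if any, and return the new word."""
    fixed = list(word)
    pos = syndrome(fixed)
    if pos:
        fixed[pos - 1] ^= 1
    return fixed
```

Encoding the data message of the example and flipping position 11 (D7) gives a syndrome of 11, and `correct` restores the stored word.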
In recent years, there has been an increase in demand for efficient and reliable digital data 
transmission and storage systems. A major concern is to control errors so that reliable 
reproduction of the data can be obtained. 
Efficient codes are designed by keeping the ratio of parity bits to data bits, and 
the processing time involved in encoding and decoding the data stream, as low as 
possible. Conventional (12,8) single error correction codes require 4 parity bits to correct 
a single error in 8 data bits [BOY-097]. Kazéminéjad [KAZ-001] introduced systematic 
(13,8) codes and showed that, with the penalty of only one extra bit, the decoding 
complexity can be reduced to a minimum. These codes were designed on the basis of the 
Hamming weight being equal to two [MUZ-093]. The parity check matrix of 
Kazéminéjad's systematic (13,8) code is as follows [KAZ-001]: 
      M1 M2 M3 M4 M5 M6 M7 M8 
      1  1  0  1  0  0  1  0    1  0  0  0  0 
      1  0  1  0  1  0  0  1    0  1  0  0  0 
A =   0  1  1  0  0  1  0  0    0  0  1  0  0 
      0  0  0  1  1  1  0  0    0  0  0  1  0 
      0  0  0  0  0  0  1  1    0  0  0  0  1 
      Parity Check              Identity Matrix 
Let Mi denote the data bits and Pi denote the parity check bits. Note that the data bits start 
from data bit-1 rather than zero. The parity check bits for (13,8) code are computed as 
follows [KAZ-001]: 
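P1 = M1 ⊕ M2 ⊕ M4 ⊕ M7    (4.5) 
P2 = M1 ⊕ M3 ⊕ M5 ⊕ M8    (4.6) 
P3 = M2 ⊕ M3 ⊕ M6         (4.7) 
P4 = M4 ⊕ M5 ⊕ M6         (4.8) 
P5 = M7 ⊕ M8              (4.9) 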
The above parity check bits can correct any single error in a 13-bit codeword. 
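A single-bit error produces a syndrome equal to the corresponding column of the parity check matrix, so single errors can be told apart when every column is non-zero and no two columns are equal. The Python sketch below checks this property for matrix A. The matrix is transcribed from the printed rows above (an assumption where the print is hard to read), and the helper names are illustrative.

```python
# Parity check matrix A, transcribed row by row:
# columns M1..M8 followed by the 5 x 5 identity part.
A = [
    [1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1],
]


def columns(matrix):
    """Return the columns of a row-major matrix as tuples."""
    return [tuple(row[j] for row in matrix) for j in range(len(matrix[0]))]


def column_weights(matrix):
    """Hamming weight (number of ones) of each column."""
    return [sum(col) for col in columns(matrix)]


def single_errors_distinguishable(matrix):
    """True when every column is non-zero and no two columns are equal."""
    cols = columns(matrix)
    return all(any(col) for col in cols) and len(set(cols)) == len(cols)
```

For this matrix the eight data columns all have Hamming weight two and the five identity columns have weight one, and all thirteen columns are distinct.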
4.3 The Proposed Error Correcting Code 
The proposed new single error correcting code is constructed on the basis of Hamming 
weight. The Hamming weight is defined as the number of ones in a column or row of 
the parity check matrix [KAZ-001]. The code has been designed so that the 
maximum Hamming weight of any column is 2, but at least one column 
must have a Hamming weight equal to 1, chosen in such a way that the row containing that 
one has a Hamming weight equal to 2. We have constructed new (13,8) error correcting codes based upon the 
above rule [BALO-05a]. This is further explained in the following section. As with 
Kazéminéjad's codes, the proposed code also suffers from the penalty of one bit more than 
the conventional (12,8) Hamming error correcting code, but it gives minimal 
encoder/decoder complexity and lower power dissipation than all previously introduced 
techniques in the literature. The parity check matrix for the new (13,8) code is as follows: 
      M1 M2 M3 M4 M5 M6 M7 M8 
      1  1  1  1  0  0  0  0    1  0  0  0  0 
      1  0  0  0  1  1  0  0    0  1  0  0  0 
B =   0  1  0  0  1  0  0  1    0  0  1  0  0 
      0  0  1  0  0  1  0  1    0  0  0  1  0 
      0  0  0  1  0  0  1  0    0  0  0  0  1 
      Parity Check              Identity Matrix 
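Mi represents the bits of the input data message. The parity check bits are given by 
P1 = M1 ⊕ M2 ⊕ M3 ⊕ M4    (4.10) 
P2 = M1 ⊕ M5 ⊕ M6         (4.11) 
P3 = M2 ⊕ M5 ⊕ M8         (4.12) 
P4 = M3 ⊕ M6 ⊕ M8         (4.13) 
P5 = M4 ⊕ M7              (4.14) 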
The parity bits mentioned above can correct any single error in the 8-bit data/message. The total 
length of the codeword is 13 (Data [8] + Parity [5]). Mni denotes the changed/corrupted 
data bits and Pni are the parity check bits calculated on the basis of the corrupted data. All Pni 
bits are calculated in the same manner as explained in Equations [4.10 - 4.14]. The 
correction bits for any single data-bit error are computed as: 
C1 = Pn1 ⊕ P1    (4.15) 
C2 = Pn2 ⊕ P2    (4.16) 
C3 = Pn3 ⊕ P3    (4.17) 
C4 = Pn4 ⊕ P4    (4.18) 
C5 = Pn5 ⊕ P5    (4.19) 
Ci represents the correction bits and M'i the corrected data bits, which are computed through 
the following expressions: 
M'1 = (C1·C2) ⊕ Mn1     (4.20) 
M'2 = (C1·C3) ⊕ Mn2     (4.21) 
M'3 = (C1·C4) ⊕ Mn3     (4.22) 
M'4 = (C1·C5) ⊕ Mn4     (4.23) 
M'5 = (C2·C3) ⊕ Mn5     (4.24) 
M'6 = (C2·C4) ⊕ Mn6     (4.25) 
M'7 = (¬C1·C5) ⊕ Mn7    (4.26) 
M'8 = (C3·C4) ⊕ Mn8     (4.27) 
The following table explains the error correction process for the different data bits. 
Table-4.1: Error Correction and Corresponding Correction Bits 
CORRECTION BITS    CORRECTED DATA BIT 
C1 & C2            Data bit-1 
C1 & C3            Data bit-2 
C1 & C4            Data bit-3 
C1 & C5            Data bit-4 
C2 & C3            Data bit-5 
C2 & C4            Data bit-6 
!C1 & C5           Data bit-7 
C3 & C4            Data bit-8 
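The encoding and data-bit correction steps of the proposed code can be sketched in Python as below. The parity rows are read from matrix B and the correction terms from Table-4.1; the dictionary-based bit representation and the function names are assumptions made for illustration, not the gate-level circuits described later in this chapter.

```python
# Data bits feeding each parity bit, read from the rows of matrix B
# (an assumption where the printed equations are not legible).
PARITY_ROWS = {
    1: (1, 2, 3, 4),
    2: (1, 5, 6),
    3: (2, 5, 8),
    4: (3, 6, 8),
    5: (4, 7),
}


def parity_bits(data):
    """data maps bit index 1..8 to 0/1; returns {1..5: parity bit}."""
    return {p: sum(data[i] for i in bits) % 2 for p, bits in PARITY_ROWS.items()}


def correction_bits(stored_parity, data_read):
    """Ci = (parity recomputed from the data read back) XOR (stored parity)."""
    recomputed = parity_bits(data_read)
    return {i: recomputed[i] ^ stored_parity[i] for i in range(1, 6)}


def correct_data(data_read, c):
    """Apply the correction terms of Table-4.1 to the data read back."""
    flip = {
        1: c[1] & c[2],
        2: c[1] & c[3],
        3: c[1] & c[4],
        4: c[1] & c[5],
        5: c[2] & c[3],
        6: c[2] & c[4],
        7: (1 - c[1]) & c[5],   # the "!C1 & C5" term
        8: c[3] & c[4],
    }
    return {i: data_read[i] ^ flip[i] for i in range(1, 9)}
```

Flipping any one of the eight data bits and passing the result through `correction_bits` and `correct_data` returns the original data. The sketch exercises only single data-bit upsets and the parity-bit case worked through in the examples that follow.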
4.3.1 Case Example-1: (An Error in Data Bits) 
Let's assume that Morg is the original data, as shown below: 
          D8  D7  D6  D5  D4  D3  D2  D1 
Morg  =   1   0   1   0   1   0   1   0 
Note that the least significant bit is referenced as D1 rather than D0. Now assume that 
the first bit (D1) is changed from 0 to 1 due to an SEU; Mnew represents the data after the single 
event upset: 
          D8  D7  D6  D5  D4  D3  D2  D1 
Mnew  =   1   0   1   0   1   0   1   1 
Equations [4.10 - 4.14] give the parity check bits for the original data (Morg) and for the 
corrupted data (Mnew). Porgi represents the parity check bits for the original data (Morg) and 
Pcali represents the parity check bits calculated for the corrupted data (Mnew). The 
correction bits are calculated by using these parity check bits (Porgi and Pcali) in Equations 
[4.15 - 4.19]: 
Porg1 = 0    Pcal1 = 1    C1 = 1 
Porg2 = 1    Pcal2 = 0    C2 = 1 
Porg3 = 0    Pcal3 = 0    C3 = 0 
Porg4 = 0    Pcal4 = 0    C4 = 0 
Porg5 = 1    Pcal5 = 1    C5 = 0 
The correction bits (Ci), along with the data corrupted by the SEU (Mnew), are used in 
Equations [4.20 - 4.27] to correct the single-bit fault. In this particular example D1 (least 
significant bit) is corrected as follows: 
M'1 = (1·1) ⊕ 1 = 0        M'2 = (1·0) ⊕ 1 = 1 
M'3 = (1·0) ⊕ 0 = 0        M'4 = (1·0) ⊕ 1 = 1 
M'5 = (1·0) ⊕ 0 = 0        M'6 = (1·0) ⊕ 1 = 1 
M'7 = (¬1·0) ⊕ 0 = 0       M'8 = (0·0) ⊕ 1 = 1 
Note that D1 is corrected while all other bits are unchanged. 
4.3.2 Case Example-2: (An Error in Parity Bits) 
Let's assume that an SEU corrupts one of the parity bits stored in the memory. Let's take M 
as the data, shown below: 
Equations [4.10 - 4.14] give the parity check bits (Porgi) for the data (M). These parity bits are stored 
in memory along with the data bits and are shown below: 
          Porg5  Porg4  Porg3  Porg2  Porg1 
Porg  =   1      0      0      1      0 
Let's assume that Porg1 has been corrupted by an SEU. Pnew represents the parity check 
bits after the SEU: 
          Pnew5  Pnew4  Pnew3  Pnew2  Pnew1 
Pnew  =   1      0      0      1      1 
Pcali are calculated through Equations [4.10 - 4.14] for the data stored in the memory. It is 
clear that Porgi and Pcali will be the same, as the data has not been changed. The 
correction bits are calculated from the parity check bits (Pcali and Pnewi) 
through Equations [4.15 - 4.19]: 
Pcal1 = 0    Pnew1 = 1    C1 = 1 
Pcal2 = 1    Pnew2 = 1    C2 = 0 
Pcal3 = 0    Pnew3 = 0    C3 = 0 
Pcal4 = 0    Pnew4 = 0    C4 = 0 
Pcal5 = 1    Pnew5 = 1    C5 = 0 
The correction bits (Ci), along with the data (M), are used in Equations [4.20 - 4.27] to 
correct single-bit faults. In this particular example the data is not corrupted and only 
one of the parity check bits is corrupted. M'i represents the corrected data, which is the same as 
the original data message M. 




M'5 = (0·0) ⊕ 0 = 0 
	




As this example shows, an upset in the stored parity check bit does not disturb the data bits, and the proposed 
error correcting code can eliminate the fault without disturbing the original data. The code 
introduced in this chapter can correct any single data bit among 8 data bits. Normally, 
error correcting codes which are quite efficient for large memories are not that efficient 
in terms of area and power. We have devised new error correcting codes for small (4-bit 
wide) memories on the basis of the same principle. The results show that these codes are 
even better than previous approaches for small memories. 
4.3.3 Encoder Design 
An encoder converts the incoming data/message into a codeword. The codeword is then 
stored in a memory along with the original contents. This memory could undergo the 
effects of a high-energy ion or radiation that may corrupt the data/message. The decoder 
attempts to convert the codeword back to the original data/message. Each encoder/decoder 
can handle a set of possible error conditions. With some schemes, the decoder can only 
detect an error and output an "error" signal. With other schemes, the decoder can not only 
detect an error but also correct it to reproduce the original data/message. The 
encoder for the proposed code is shown in Figure-4.4. Two-input XOR gates (XOR2) 
are used for the implementation of the encoder circuitry [BALO-05a]. 
a): Proposed (13,8) code based encoder circuitry. 	b): Kazéminéjad's (13,8) [KAZ-001] based 
encoder circuitry. 
Figure-4.4: Error Correcting code based encoder circuitry. 
For comparison, an encoder based on Kazéminéjad's [KAZ-001] codes is presented in 
Figure-4.4[b]. 
Figure-4.5 shows the power comparison of encoder circuits based upon different error 
correction codes. The encoder based on the proposed error correction code requires 
approximately 22% less power than the Hamming code based encoder. Compared to 
Kazéminéjad's [KAZ-001], our 
code-centric encoder is approximately 9% more efficient in terms of power consumption 
(Figure-4.5). 
Figure-4.5: Power comparison of different encoders 
4.3.4 Decoder Design 
A decoder consists of a syndrome generator, syndrome decoder and error correcting logic 
[KAZ-01]. The syndrome generator computes the parity check bits based upon the data 
received and compares them with the received parity check bits. Based upon this comparison, 
the corrector logic corrects any single error that occurs in the 13-bit codeword. Figure-4.6 
shows the implementation of the syndrome generator based upon the proposed new 
(13,8) code, while Figure-4.7 shows the syndrome generator based on Kazéminéjad's 
[KAZ-001] codes. 
Note that the Ci check bits are used in the decoder's error correcting logic to correct any single 
error in the codeword. It is evident that the proposed code gives a simpler 
solution for the syndrome generator's circuitry than all the previous implementations in 
terms of gate count. 
Figure-4.6: Proposed (13,8) code based syndrome generator 







Figure-4.7: Kazéminéjad's (13,8) [KAZ-001] code based syndrome generator circuitry. 
This improvement makes these code-centric circuits most suitable for applications 
which are power and area critical. The decoder's error correcting section based upon the 
proposed new (13,8) code is presented in Figure-4.8[b]. Note that M'i represents the 
corrected data, Mi denotes the data stored in memory (which may be corrupted by SEU) and 
Ci represents the correction bits in Figure-4.8. 
The decoder's error correcting circuitry for the proposed (13,8) code uses one more 
inverter than the circuitry based on Kazéminéjad's systematic (13,8) code [KAZ-001]; 
however the overall complexity and power dissipation of the decoder (shown in Figure-4) 








a): Kazéminéjad's (13,8) [KAZ-001] 	 b): Proposed (13,8). 
Figure-4.8: Syndrome Decoder & Correcting Circuitry. 
4.3.5 Performance 
For overall comparison, we have implemented the circuits using 0.18 µm technology and the 
results (in terms of area) are shown in Figure-4.9. The area consumed is a good measure 
of complexity, and it can be seen that the overall complexity of the circuits based upon 
the proposed code has been reduced by approximately 27% as compared to the Hamming 
code based implementations [BALO-05c]. The area saving is mainly due to the fact that 
the encoder uses a smaller number of gates than the Hamming code and other 
implementations. This is explained in Table-4.2 and Table-4.3. 
A circuit's complexity and power dissipation are measured in terms of total gate count. As 
explained before, hardware realization is carried out using two-input XOR gates. 
Table-4.2 provides a comparison between previous implementations of the encoder and 
ours. It is evident from the results that the implementation of the encoder based upon our 
technique is better in terms of complexity and power dissipation than the previous 
techniques. Note that the complexity and power dissipation of the newly proposed encoder are 
reduced by one XOR gate compared to the previously improved technique. 
Figure-4.9: Area comparison of encoder implementations based on different error correction codes 
Table-4.2: Comparison of different Encoders 
ENCODER       (12,8) HAMMING [4]    (13,8) KAZEMINEJAD'S [KAZ-001]    PROPOSED 
Complexity    13 XOR2               11 XOR2                           10 XOR2 (REDUCED) 
Delay         3 XOR2                2 XOR2                            2 XOR2 
Table-4.3 presents the comparison of two syndrome generators based upon complexity and 
power dissipation. Note that the complexity and power dissipation are reduced by one 
2-input XOR (XOR2) gate compared with previous implementations. 







SYNDROME GENERATOR    (13,8) KAZEMINEJAD'S [KAZ-001]    PROPOSED 
Complexity            16 XOR2                           15 XOR2 (REDUCED) 
Power Dissipation     16 XOR2                           15 XOR2 (REDUCED) 
Delay                 3 XOR2                            3 XOR2 
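The XOR2 counts and delays for the two (13,8) codes in Table-4.2 and Table-4.3 can be cross-checked with a short Python sketch. It assumes each parity bit is built as an independent balanced tree of two-input XOR gates with no gates shared between trees, and that the syndrome generator adds the stored parity bit as one more input to each tree. The row contents are read from the parity check matrices A and B; these modelling choices are assumptions of the sketch rather than statements about the actual layouts.

```python
from math import ceil, log2

# Data bits feeding each parity bit, read from the rows of matrices A and B.
KAZEMINEJAD_ROWS = [(1, 2, 4, 7), (1, 3, 5, 8), (2, 3, 6), (4, 5, 6), (7, 8)]
PROPOSED_ROWS = [(1, 2, 3, 4), (1, 5, 6), (2, 5, 8), (3, 6, 8), (4, 7)]


def encoder_cost(rows):
    """(XOR2 gate count, XOR2 levels on the longest path) for the encoder."""
    gates = sum(len(r) - 1 for r in rows)          # n inputs need n - 1 XOR2
    depth = max(ceil(log2(len(r))) for r in rows)  # balanced tree depth
    return gates, depth


def syndrome_generator_cost(rows):
    """Same measure when the stored parity bit is an extra input per tree."""
    gates = sum(len(r) for r in rows)
    depth = max(ceil(log2(len(r) + 1)) for r in rows)
    return gates, depth
```

Under these assumptions the sketch gives 11 and 10 XOR2 gates with a delay of 2 for the two encoders, and 16 and 15 XOR2 gates with a delay of 3 for the two syndrome generators.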
As mentioned earlier, the area consumed is a good measure of complexity, and it can 
be seen that the overall complexity of the circuits (encoder, syndrome generator, syndrome 
decoder and error correcting logic) based upon the proposed code has been reduced by 
approximately 18% as compared to the Hamming code based implementations. As 
compared to Kazéminéjad's [KAZ-001] approach, our proposed code based circuits 
(encoder and decoder) use 8% to 10% less area, which shows that the proposed code 
based circuits are less complex. This is shown in Figure-4.10. 
Figure-4.10: Overall comparison of encoder and decoder process based on different error correction 
codes in terms of area. 
4.4 Summary 
We have presented a new, high performance and low power single error correcting code 
and its associated circuits. We have shown that the proposed code and code-centric circuits 
are not only better than the conventional Hamming code based circuits but also better than 
recent work done in this area such as Kazéminéjad's systematic code [KAZ-001]. The 
added advantage of this technique overcomes the limitations of the currently employed 
techniques and extends the use of the proposed code based circuits to other applications such 
as those which are space related. 
CHAPTER 5 
SEU MITIGATION SCHEME FOR 
SEQUENTIAL CIRCUIT ELEMENTS 
5.1 Introduction 
In this chapter, we will look at the SEU mechanism very briefly and discuss some of 
the related previous work in this area. We will then discuss in detail the proposed 
SEU / SET mitigation technique for the sequential elements of microelectronic circuits. 
Some modifications/optimizations of the proposed technique are suggested for different 
applications. An SEU Simulator design, which has been proposed to test the 
efficacy of the proposed mitigation scheme, is discussed. Finally, some comparison results are 
discussed along with the performance index of the proposed scheme. 
5.2 SEU Mechanism 
Microelectronics has experienced a dramatic increase in density and speed due to the decrease 
in the feature sizes with which these devices are manufactured. The effects of scaling on 
the single event response of microelectronics are a direct result of the physics of energy 
loss, charge collection, and upset due to a cosmic ray striking a junction in an IC device. 
Further to the brief introduction of the subject in Chapter 2, many good summaries exist 
[PET-082][PET-097][MAS-093][SEX-092] that review these concepts in more detail. 
SEU in static latches and SRAMs became an important issue once feature sizes dropped 
below 10 microns and the critical charge for upsetting a circuit dropped below 1 pC 
(roughly corresponding to a particle LET of 50 MeV·cm²/mg and a collection depth of 2 
microns) [SEX-092]. Static latch SEU vulnerability has been calculated and measured as a 
function of technology feature size [DOD-095][PET-082]. This establishes the relationship 
between the critical charge required to upset the circuit and the technology feature size. 
According to this relation, if electronics feature sizes decrease from 0.8 micron to 0.18 
micron, then the critical charge decreases by nearly a factor of 20 [PET-082]. 












Figure-5.1: Critical Transient Width Vs Feature Size for Un-attenuated Propagation [MAV-098] 
The curve in Figure-5.1 is the result of SPICE [NAG-073] simulations performed for 
various technologies (feature sizes shown by the dots on the curve) between 1.2 microns 
(1200 nm) and 0.13 microns (130 nm). A generic set of SPICE model parameters was 
used with known model parameters for technology sizes between 1.2 microns and 0.7 
microns, inclusive [MAV-098]. The constant field scaling rules were applied to the generic 
model and to the transistor sizes to predict model parameters at the smaller feature sizes. 
The scaled values of various critical parameters (VDD, VTH, and TOX) were consistent 
throughout these projections and were published in the National Technology Roadmap for 
Semiconductors [HIR-001]. The solid curve in Figure-5.1 simply connects the simulation 
points while the dashed curve simply extrapolates the points to 0.05 micron (50 nm), the 
projected future feature size of commercial technologies. 
The FPGA's configuration bits (bit-stream) are used to configure both the logic elements 
(clusters) and the routing fabric. An upset of a configuration bit in an FPGA is much more 
serious than a conventional data bit upset. If a control bit controlling a logic element 
changes its logic state, then the logic functionality of the FPGA is altered. If a control bit 
of the routing fabric experiences an upset, then the FPGA essentially becomes rewired. In 
either case, the programmed circuit function is no longer what was intended. This upset 
can change either the destination of a signal or the source of a signal, as shown in 
Figure-5.2(a, b). These effects may cause a short circuit or open circuit situation. For these 
reasons, the configuration storage of the FPGA must be totally immune to SEU. 
a) Upset Mechanism in a Reconfigurable Fabric 	 b) SEU in Switch Box 
Figure-5.2: SEU in Signal Routing Path 
Radiation which results in a soft error (SEU or SET) can cause permanent faults in 
reconfigurable architectures, for example when it hits the configuration memory of the 
reconfigurable fabric. The configuration memory contents, as stated earlier, control the 
overall functionality of the architecture through the routing of signals, as shown in Figure-5.2. 
A bit-flip due to SEU or SET remains in effect until the bit-stream (configuration 
data) is reloaded (the scrubbing technique) or corrected by dedicated hardware. The results 
from bit-stream fault injection [LIM-001] and radiation ground testing [CAR-001] have 
confirmed the efficacy of a TMR structure combined with scrubbing in recovering from upsets 
in reconfigurable architectures. However, TMR with scrubbing has its 
own limitations: it incurs area overhead and does not cover SET faults or voting circuit 
faults. Figure-5.3 illustrates the general TMR scheme. 
Figure-5.3: General TMR Scheme 
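The general TMR scheme can be modelled behaviourally in a few lines of Python. In the sketch below three copies of a module are evaluated and a bitwise two-out-of-three majority vote selects the output; the `upset_mask` argument, which XORs a fault into each copy's output, is a hypothetical device for injecting upsets and is not part of the scheme in the figure.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 vote over three integer words."""
    return (a & b) | (a & c) | (b & c)


def tmr(module, x, upset_mask=(0, 0, 0)):
    """Evaluate three copies of `module` on x and vote on their outputs.

    upset_mask holds one XOR mask per copy; a non-zero mask models an
    upset in that copy's output.
    """
    outputs = [module(x) ^ mask for mask in upset_mask]
    return majority(*outputs)
```

With a single faulty copy the vote masks the upset. If the same bit is upset in two copies, the vote follows the faulty majority, which illustrates why plain TMR only covers single faults and why the voter itself remains a point of vulnerability.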
This research work proposes a novel design technique to cope with both SEU and SET 
faults. The design technique is based upon unique temporal data sampling with weighted 
voting. This design technique not only gives 100% fault recovery from SEU but also gives 
100% fault recovery from SET, dual event faults and 50% recovery from triple event 
faults. The technique has an auto correction mechanism and does not need scrubbing, 
hence improves the overall performance and speed of the system. 
In the case of reconfigurable architectures, the problem of finding an efficient technique in 
terms of area, performance and power is very challenging due to the high complexity of 
the architectures. An SEU is classified as a soft error but has permanent effects in
reconfigurable architectures. For example, an SEU can affect the routing of a signal by
upsetting the configuration memory, and can affect the user's synchronous circuits by upsetting the memory
elements and combinational circuits (Figure-5.4). M in Figure-5.4 represents the memory
associated with the different components of a reconfigurable architecture. The consequences
of an SEU cannot be handled through standard ASIC fault tolerant schemes such as EDAC.
TMR is an attractive scheme for reconfigurable architectures because it provides full
hardware redundancy, including the user's combinational and sequential circuits, the routing and
the I/O pads. However, TMR has limitations like area overhead, I/O pad limitations, power




Figure-5.4: SEU Sensitive Configuration Bit Storage Circuits 
Redundancy techniques such as duplication and triplication are, as stated before, commonly
used for designing reliable systems to ensure high dependability and data integrity. The
proposed scheme is based on hardware and temporal redundancy, which gives immunity
against both SEU and SET.
5.3 The Proposed Model 
While conventional SEU error rates are independent of the chip clock frequency, SET
induced error rates increase in direct proportion to the operating frequency. This error rate
relation actually compounds the SET problem as technology feature sizes shrink, since 
smaller feature sizes result in smaller gate delays that permit circuits to be operated at 
higher clock frequencies. Not only does each combinatorial gate in the circuit contribute 
to the SET error rate (because transients are no longer attenuated), but the probability of 
storing any given transient error will also increase (because of the higher clock 
frequencies). Shrinking device sizes have had serious, if not grave, implications for FPGA 
devices used in cosmic-ray environments. As reported [BUC-097][BAZ-097], for typical 
FPGA designs, SET induced error rates may actually exceed the SEU rates of unhardened 
latches as clock speeds approach 100 MHz for CMOS designs. 
Figure-5.5 illustrates the circuit topology found in nearly all sequential circuits. In 
FPGAs, the sequential circuitry represents the routing switches and logic elements that 
precede each D-Flip-Flop on the chip. The data from the first latch (L1) is typically
released to the combinatorial logic on a clock edge, at which time logic operations are
performed through gates. The output of the combinatorial logic reaches the second latch
(L2) sometime before the next clock edge. At this clock edge, whatever data is present at
its input (and meeting the setup and hold times) is stored within the respective latch. 
[Figure: Data in, Combinational Logic Circuits, Data out]
Figure-5.5: Typical Sequential Circuit Topology and SEU
If a heavy ion strikes within the combinatorial logic block, and the logic is fast enough by
virtue of advanced technology to propagate the induced transient, then the SET
eventually appears at the input of the second latch in Figure-5.5 where it may be 
interpreted as a valid signal. Whether or not the SET gets stored as real data depends on 
the temporal relationship between its arrival time and the latching edge of the clock. 
Additional invalid transients can occur at the combinatorial logic outputs as a result of 
SETs generated within the signal lines that control the function of the logic. 
[Figure: clock waveform with setup time and hold time marked, and four SET arrival cases: (a) Non Latching SET, (b) Earliest Latching SET, (c) Late Latching SET, (d) Non Latching SET]
Figure-5.6: Temporal Relationship for Latching a Data SET as an Error.
An example of this is an SET generated within the configuration storage circuitry that 
controls either the FPGA logic functionality or the routing connectivity. This is a case 
where true/actual data is low and a positive SET appears at the input of the latch. The
transient is incorrectly interpreted as valid data, and subsequently stored in the latch, if the
SET arrives at or before the setup time and lasts long enough to meet the hold time requirements.
Figure-5.6 shows four times at which an SET can arrive: (a) and (d) satisfy a non-latching
condition, while (b) and (c) correspond to the earliest and latest arrival times for a latching
condition. Similar errors can occur from transients that might appear on the clock line.
References [BUC-097] and [BAZ-097] give several examples of clock SET induced 
errors. 
5.3.1 Temporal Sampling 
The first key step in the proposed technique is 'Temporal Data Sampling'. This sampling
technique eliminates all SETs in the clock and data signals [BALO-06a]. The next section
explains the basis of the proposed technique.
A level triggered latch exhibits two distinct modes depending on the state of its clock 
signal. Each level sensitive latch is in sampling mode at the high level of its clock signal 
and is in blocking mode when its clock signal is low. In blocking mode the latch holds the 
data and the data changes at the input are blocked. In sampling mode the latch behaves 
transparently. A conventional D-Flip-Flop can be constructed with the help of two level
sensitive latches. Figure-5.7 illustrates the functional equivalent of a D-Flip-Flop.
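The latch behaviour described above can be illustrated with a minimal behavioural sketch. The class names `LevelLatch` and `NegEdgeFlipFlop` are hypothetical and are introduced only for illustration; the model ignores setup and hold times and is not taken from the thesis implementation.

```python
class LevelLatch:
    """Level-sensitive latch: transparent while clk is 1, holds while clk is 0."""

    def __init__(self, q=0):
        self.q = q

    def update(self, clk, d):
        if clk == 1:       # sampling mode: the output follows the input
            self.q = d
        return self.q      # blocking mode: the stored value is kept


class NegEdgeFlipFlop:
    """Two latches in series; the second is clocked by the inverted clock."""

    def __init__(self):
        self.master = LevelLatch()
        self.slave = LevelLatch()

    def update(self, clk, d):
        m = self.master.update(clk, d)
        # The inverter on the clock makes the slave transparent when clk is 0,
        # so the output only changes after a falling clock edge.
        return self.slave.update(1 - clk, m)
```

While the clock is high the master latch tracks the input and the output is unchanged; when the clock falls, the value held by the master appears at the output.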






a) -ve edge triggered Flip-Flop    b) +ve edge triggered Flip-Flop
Figure-5.7: Functional Equivalence of Flip-Flop
The complement of a clock signal is formed through an inverter. Therefore for SEU 
hardening, there is no need to route complementary clock signals over the chip. As 
explained later, radiation induced transients on the clock signals will not affect the SEU 
immunity of the proposed technique. The latch based functional equivalence of D-Flip-
Flop can be used with different clocks in TMR fashion to add parallelism. A simple 
embodiment of the Temporal Data Sampling is shown in Figure-5.8 [BALO-006]. 




Figure-5.8: Temporal Data Sampling 
The circuit consists of six level sensitive latches (3 at the input stage and 3 at the output
stage), illustrated in Figure-5.8 as Primary and Secondary sections. The three pairs of latches
(L1, L2), (L3, L4), (L5, L6) operate in parallel and sample data at different time intervals.
These samples are used in weighted voting logic to eliminate single event upsets. Three
different clocks (Clk-A, Clk-B, Clk-C) are used. These three clocks are derivatives of the
main clock and have a phase shift and 25% duty cycle to cope with the SETs, as shown in
Figure-5.9. If an upset is observed on any one of the clock lines, the phase shift in the
remaining clock signals will help the respective sets of latches to store the correct data at
different time intervals, hence voiding the effect of a spurious glitch on the clock line due
to radiation. Any transient due to radiation lasts for a short period of time, and if it happens
at the negative edge of any one of the clock signals then it will die out before the other
temporal latches start their operation, due to the phase-shift difference in the clock signals. So
this clocking scheme will help to cope with all the single event transients either in the Data
line or any one of the clock signals. The SEU/SET elimination process is explained in
greater detail in the next sections. Mavis et al. proposed a scheme based on nine level
triggered latches and four different clocks [MAV-002]. The speed and area overhead of the
Mavis et al. scheme, due to its extra components, makes it less attractive than the scheme
proposed in this chapter. Comparison results in terms of area, power and speed are
presented in Chapter 8 and at the end of this chapter.
[Figure: timing diagram of the clocks CLk-A, CLk-B and CLk-C over 2 clock cycles]
Figure-5.9: Clocking Scheme for the proposed Architecture
The insertion of the two extra clock phases (CLK-B and CLK-C) is required for the
additional temporal sampling [BALO-006]. The net computation cycle is from the
negative edge of CLK-C to the next negative edge of CLK-C. In this case the effective on-chip
computational frequency is exactly one half the frequency of the master clock. This is
because of the phase shifts between the different clock signals. As established earlier, SEU immunity
cannot be achieved without extra cost. This extra cost can be in terms of speed, area and
power. The most important goal is to find an SEU solution which gives the best
SEU immunity with very little overhead. We will see in a later section that this latency can
be improved through different optimizations. Clock generation from the main clock is fairly
straightforward. There are many methods in textbooks and the literature to generate different
clocks which have a temporal relationship with the main system clock. A
simple method for the clock generation is presented in Figure-5.10.
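As a hedged illustration of the clock relationship described above (not the circuit of Figure-5.10), the three phases can be modelled as a one-hot decode of a counter that advances every half cycle of the master clock. The function name `phase_clocks` and the assignment of Clk-A, Clk-B and Clk-C to slots 0, 1 and 2 are assumptions made for this sketch.

```python
def phase_clocks(num_half_cycles):
    """Return (clk_a, clk_b, clk_c), one sample per master-clock half cycle.

    Assumption: one frame spans 4 half cycles (2 master clock cycles), so each
    derived clock has a 25% duty cycle and neighbouring phases are 90 degrees
    apart.
    """
    clk_a, clk_b, clk_c = [], [], []
    for t in range(num_half_cycles):
        slot = t % 4                      # position inside the current frame
        clk_a.append(1 if slot == 0 else 0)
        clk_b.append(1 if slot == 1 else 0)
        clk_c.append(1 if slot == 2 else 0)
    return clk_a, clk_b, clk_c


if __name__ == "__main__":
    for name, wave in zip(("Clk-A", "Clk-B", "Clk-C"), phase_clocks(8)):
        print(name, "".join(str(v) for v in wave))
```

Because each phase repeats every two master clock cycles, the computation cycle in this model is half the master clock frequency, in line with the text.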
Figure-5.10: Clocking Scheme for the proposed Architecture
5.3.2 Weighted Voting Circuitry 
Majority voting is commonly used in TMR systems. The proposed mitigation technique is
based upon unique 'Weighted Majority Voting'. Each node has been assigned a voting
weight based on a simple rule: "Voting weight of a node is inversely proportional
to the probability of the node being disturbed by the radiation in that particular data path"
[BALO-005]. Each output node (2, 4, 6) has two memory/storage elements in its data path,
so its probability of being hit by radiation is double that of each output node (1, 3, 5), which
has only one memory/storage element in its data path. The output nodes (1, 3, 5) of the
latches (L1, L3, L5) shown in Figure-5.8 therefore have twice the voting weight of the output nodes (2,
4, 6) of the latches (L2, L4, L6). If P(n) is the probability of node n being hit by an SET/SEU,
then the voting weight of node n, VW(n), is defined as:
VW(n) = 1/P(n)    (6.1)
Let us assume the probability of nodes (1, 3, 5) is 0.5. The voting weight of nodes (1, 3, 5)
is then calculated as 2 through equation-6.1, and the voting weight of nodes (2, 4, 6) turns out as 1.
Figure-5.11 shows the process of the weighted majority voter. All outputs from the latches
(shown as Temp Latch in Figure-5.11) are fed to a simple majority voter. As explained
before, the three primary section nodes represented in Figure-5.8 have a voting weight equal
to two whereas the three secondary section nodes have a voting weight of one. So, the total possible voting
weight for any logic value is 9 (2+2+2+1+1+1). The output of the simple majority voter is
fed into a correction unit, shown in Figure-5.11 as the recovery unit. The unit checks the output
against all the inputs to detect voter circuit faults and to override the output in case of a voter fault. The
majority voter circuit gives one additional output, called recovery. The recovery output is only
asserted when more than one SEU/SET fault is detected and corrected. The recovery
output may request the main microprocessor/DSP of the reconfigurable architecture for
scrubbing. As the scrubbing request is only sent when more than one SEU is detected
and corrected, this improves the system performance, as the scrubbing frequency is half that of
previously introduced SEU mitigation schemes [LIM-089] [LIM-003] [CAR-001].
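The weighted vote can be sketched in a few lines. This is a behavioural illustration only: the names `WEIGHTS`, `voting_weight` and `weighted_vote` are hypothetical, and modelling the recovery output as "more than one node disagrees with the voted value" is an assumption based on the description above, not the synthesised logic of Figure-5.11.

```python
# Voting weights from the text: primary nodes (1, 3, 5) weigh 2,
# secondary nodes (2, 4, 6) weigh 1, giving a total of 9.
WEIGHTS = {1: 2, 3: 2, 5: 2, 2: 1, 4: 1, 6: 1}


def voting_weight(p):
    """Equation 6.1: VW(n) = 1 / P(n)."""
    return 1.0 / p


def weighted_vote(node_values):
    """Return (output, votes_for_1, votes_for_0, recovery) for a node->bit map."""
    votes_one = sum(w for n, w in WEIGHTS.items() if node_values[n] == 1)
    votes_zero = sum(WEIGHTS.values()) - votes_one
    output = 1 if votes_one > votes_zero else 0
    # Assumed recovery rule: flag when more than one node was out-voted.
    dissenting = sum(1 for n in WEIGHTS if node_values[n] != output)
    return output, votes_one, votes_zero, dissenting > 1
```

With all nodes at logic '1' and node 2 flipped, the call returns an output of 1 with 8 votes against 1 and no recovery request; flipping nodes 3 and 5 instead gives 5 votes against 4 and raises the recovery flag.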







Figure-5.11: Proposed SEU/SET Mitigation Technique with Self Correction Mechanism produced
by Synplify ASIC 3.0.4 [SYNP-00]
The weighted voter circuitry has the capability to recover from internal faults. This unique
feature enhances the reliability of the proposed scheme. Reloading the
bit-stream is no longer required for single event upsets, due to the auto
correction feature of the proposed technique. The auto correction mechanism is activated
through the recovery command. This feature makes the design very flexible and enhances the
overall system performance. The complete process of SEU/SET mitigation is shown in
Figure-5.11, which is a synthesised logic diagram of the proposed scheme generated
through Synplify ASIC 3.0.4.
5.4 Case Examples 
Let us assume that the input data is '1', which is at the input of all the latches L1, L3 and L5.
At the negative edge of the relevant clock signal the same data will appear at nodes 2, 4 and 6.
In the following sections, we present different fault recovery scenarios, keeping
the initial conditions the same as described above.
5.4.1 Single Fault Recovery 
A single fault can occur at any node. Let us assume that node 2 (Figure-5.8) is
disturbed by the radiation and the data value flips from logic '1' to '0'. Total votes are
calculated on the basis of the weights assigned to each node. As discussed earlier, the weight
of each primary section node is 2. Table-5.1 explains the mechanism. It can be seen that
8 of the 9 votes are for logic '1', so the weighted voter eliminates the SEU fault.
Table-5.1: Single Fault Recovery Process
NODE   NODE LOGIC VALUE          VOTING WEIGHTS
       Before SEU   After SEU    For Logic '1'   For Logic '0'
1      1            1            2               -
3      1            1            2               -
5      1            1            2               -
2      1            0            -               1
4      1            1            1               -
6      1            1            1               -
Total Votes                      8               1
5.4.2 Multiple Fault Recovery (Double Fault) 
A double fault can occur at any two nodes. Different permutations of double faults are
possible, for example, radiation hitting two nodes that both have the same weight (2 or
1), or one node with weight equal to 1 and the other with weight equal to 2. Let us take the case
where both nodes (node-3, node-5) with voting weight 2 are hit by radiation and the
values are flipped (from logic '1' to '0' in this case). The proposed scheme will vote for
the correct value even if two nodes are disturbed, as illustrated by the table below:
Table-5.2: Multiple Fault Recovery Process
NODE   NODE LOGIC VALUE          VOTING WEIGHTS
       Before SEU   After SEU    For Logic '1'   For Logic '0'
1      1            1            2               -
3      1            0            -               2
5      1            0            -               2
2      1            1            1               -
4      1            1            1               -
6      1            1            1               -
Total Votes                      5               4
A software SEU simulator has been designed and coded to verify the mitigation technique. 
Post synthesis simulations were carried out in Active HDL and results are shown in 
Figure-5.12. These simulation results show recovery from voting circuit faults, SEUs and 
SETs. The simulator design is discussed in later sections. 
The weighted voting along with the temporal sampling gives 100% SEU and double event
upset (DEU) immunity. It gives 50% recovery in the case of triple event upsets, depending
on which nodes are disturbed by radiation.
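These recovery figures can be cross-checked by enumerating every combination of faulty nodes under the weights given in Section 5.3.2. The sketch below is an illustrative check, not the thesis simulator; `recovers` and `recovery_rate` are hypothetical helper names, and a fault is modelled simply as a node voting for the wrong logic value.

```python
from itertools import combinations

# Voting weights from Section 5.3.2 (primary nodes 2, secondary nodes 1).
WEIGHTS = {1: 2, 3: 2, 5: 2, 2: 1, 4: 1, 6: 1}


def recovers(faulty_nodes):
    """True when the weight behind the correct value exceeds the flipped weight."""
    flipped = sum(WEIGHTS[n] for n in faulty_nodes)
    return flipped < sum(WEIGHTS.values()) - flipped


def recovery_rate(num_faults):
    """Fraction of all fault combinations of the given size that are out-voted."""
    cases = list(combinations(WEIGHTS, num_faults))
    return sum(recovers(c) for c in cases) / len(cases)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, "fault(s):", recovery_rate(k))
```

Of the 20 possible triple faults, the 10 that involve at most one primary node flip a total weight of 4 or less and are therefore corrected, which accounts for the 50% figure.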
Figure-5.12: Simulation Results Generated by Active HDL Simulator [AHDL-00] 
5.5 The Proposed Mitigation Technique 
The circuit presented in Figure-5.8 can be optimized at two levels. The first level is to
improve synchronization and the second is to improve the scheme in terms of
performance. As discussed earlier, the data is sampled at different time intervals,
and it is necessary to synchronize the data so that the voting logic can operate on the latest
data samples. Figure-5.13 shows a simple synchronization scheme which removes
the two inverters and synchronizes the data. The three latches of the secondary stage can
be utilized as data release latches, and they all work on the negative edge of Clk-C.







Figure-5.13: sample Node values for One Computation Cycle 
The proposed technique is based on temporal sampling and weighted voting as explained 
in the previous section. Figure-5.14 shows a logic waveform response of all the nodes 
(Figure-5.5) subject to a particular data input for one computation cycle [BALO-006]. 
Figure-5.14 helps to show that though we are sampling data at different clocks, we are not
sampling data at different discrete times for all of the nodes. The timing diagrams for Node-2
& 3 and Node-4 & 5 are the same. Node-2 and Node-3 sample data at the same time
through different clocks, and so do Node-4 and Node-5. The different sampling clocks were
introduced to get SET and SEU immunity regarding data and clock signals. If we can
retain the same level of immunity while eliminating this redundancy in data sampling, as
discussed above, then we can appreciably save area and power.











Figure-5.14: sample Node values for One Computation Cycle 
Figure-5.15 introduces the proposed scheme for complete SEU and SET
immunity. The circuit consists of five 'level-sensitive latches'. Each latch operates in
'Sampling Mode' when its respective clock signal is in the high state and in 'Blocking Mode'
when its clock signal is low. This optimization helps to synchronize the outputs while
removing one latch as compared to the basic proposed scheme [BALO-006]. The details of
SEU/SET immunity along with the circuit operation are explained in the next section.
Figure-5.15: Proposed Temporal Data Sampling
The Temporal Sampling stage helps to store data samples at different time intervals.
These samples are used in voting logic to eliminate single event upsets. Three different
clocks (CLk-A, CLk-B & CLk-C), as discussed, are used. These three clocks have a 90-degree
phase shift and 25% duty cycle to cope with the SETs.
The proposed design technique has two stages: the data sampling stage and the data release stage. The
latches L1, L3 and L5 constitute the data sampling stage while L2, L4 and L5 constitute
the data release stage of the proposed technique. The latch L5 is common to both stages. The
latches of the sampling stage capture data at different time intervals based on their
respective clock signals. The data release stage serves the purpose of synchronizing the
different samples which are stored through the data sampling stage. CLk-C serves as a
sampling clock as well as a sample release clock. For any given data, two samples of data
are stored at different time intervals (CLk-A, CLk-B). The third data sample is stored at time
(CLk-C), and at the same time the previously stored samples are released to the majority
voting logic along with this data sample.
Figure-5.16 illustrates the new clocking scheme. Compared with the previously discussed
clocking scheme, the computation cycle has been shortened from 2 system clock cycles
to 3/2 system clock cycles, which is an improvement of 25%, as shown in
Figure-5.16.
[Figure: timing diagram of CLOCK, CLk-A, CLk-B and CLk-C]
Figure-5.16: Clocking Scheme for the proposed Architecture
The operation of the circuit of Figure-5.15 with the clocking sequence of Figure-5.16 is
most easily explained if we start at the beginning of a computational cycle, which begins at
the rising edge of Clk-C. At this time the sample release latches (L2, L4, L5) pass their
input data to the majority gate, where it subsequently appears at the output node. Clk-C
subsequently goes low, the sample release latches (L2, L4, L5) enter into a hold state, and
this original data remains asserted on their respective outputs for the remainder of the
computational cycle. This output data is then processed by combinatorial logic before it
appears at the input to the next temporal sampling latch. The data must arrive at the input
to the next stage before the rising edge of Clk-A, at which time the data is stored in the
latch L1. Clk-B then goes high to sample the input. Whatever the data is at the input
when Clk-B goes back low is then stored in latch L3. Finally, Clk-C toggles high and low
to sample and hold the input data in the latch and release the data at the same time. At this
time another computational cycle begins.
5.6 SEU / SET Mitigation Process 
The two key factors, temporal and spatial redundancy of the proposed scheme alleviate the 
upsets. Three distinct upset scenarios are considered here to explain the fault immunity 
procedure of the proposed design. 
5.6.1 Static latch SEU 
This scenario is caused when radiation flips the data at the output of a latch which is
operating in the "Blocking State". The proposed scheme is composed of five latches. These
latches, as discussed earlier, work on different clock signals and sample the data at 
different times. So, the data flip will be observed at the output node of only one latch due 
to temporal sampling of the data and the rest of the latches will keep the correct data. The 
majority voter circuitry works on these four correct values along with one faulty value and 
ensures that the correct data value is asserted on the output node. 
5.6.2 Data SET 
This phenomenon is caused when a high energy particle hits a combinational circuit in 
such a way that a transient is observed which is strong enough to travel from an incident 
node to the input of a storage element as a normal data signal. The temporal relationship 
between data sampling latches ensures that only one latch stores this unwanted data. This 
false data is only stored in one of the latches if it is synchronized with the respective input 
clock (Clk-A, Clk-B, Clk-C) due to the temporal relationship between data sampling 
latches, as the technique is not based on one clock but is based on three different clocks. 
Again, the majority voter circuitry works on these four correct values along with one 
faulty value and ensures that the correct data value is asserted on the output node. 
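The argument that a transient can corrupt at most one temporal sample can be illustrated with a small model. It is a sketch under stated assumptions: time is measured in arbitrary integer units, the three sampling instants stand for the closing edges of Clk-A, Clk-B and Clk-C, and the SET is assumed to be shorter than the spacing between consecutive sampling instants. The function names are hypothetical.

```python
def sample_with_set(true_value, sample_times, set_start, set_width):
    """Sample a constant data line that is inverted during the SET pulse."""
    samples = []
    for t in sample_times:
        hit = set_start <= t < set_start + set_width   # SET present at time t
        samples.append(1 - true_value if hit else true_value)
    return samples


def majority(samples):
    """Simple majority of an odd number of one-bit samples."""
    return 1 if sum(samples) * 2 > len(samples) else 0


if __name__ == "__main__":
    times = [0, 10, 20]            # assumed sampling instants of the three clocks
    for start in range(-10, 30):   # sweep the SET arrival time
        samples = sample_with_set(1, times, start, set_width=5)
        assert samples.count(0) <= 1 and majority(samples) == 1
    print("every SET arrival time was out-voted")
```

If the pulse were wider than the spacing between sampling instants it could corrupt two samples, which is why the phase shift between the clocks has to exceed the expected transient width.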
5.6.3 Clock SET 
An SET in a clock signal is caused when radiation strikes the nodes in the clock circuitry.
The proposed technique uses three clock signals (Clk-A, Clk-B, and Clk-C). Clk-A and
Clk-B are sampling clocks whereas Clk-C samples and releases the data previously stored
by the rest of the clock signals. We will consider two distinct behaviours regarding clock
SETs. If a clock is low, an SET will result in a rising edge followed by a falling edge, and
if the clock is high then an SET will produce a falling edge followed by a rising edge. In the
latter case, old data is stored in the respective latch and the true falling edge will then store the
correct data. The spatial redundancy in the proposed technique will help the majority voter
to assert the correct value at the output. A special case arises when an SET overlaps the true falling
edge. This may shift the falling edge earlier in time by an amount less than or equal to the
width of the SET. Again, the spatial redundancy ensures that only one output node is disturbed,
which can be corrected at the majority voting stage.
5.6.4 Majority Voting 
Majority voting is commonly used in TMR systems. The proposed mitigation technique is
based upon simple majority voting. All the data samples which were stored at different
time intervals are fed into the majority voter circuitry. The data samples from the 'sample
release stage' are compared with each other. Data is considered 'Fault Free' if no
disagreement is found. On the other hand, if disagreement is found, the data samples from
the 'sampling stage' are considered together with the voter output to evaluate the 'Fault Free' output.
The samples from the 'sampling stage' provide more data values to compare when evaluating
the fault free value. This unique feature helps to eliminate all SEUs and SETs. The majority
voting circuitry is equipped with a watch-dog circuit which checks for voter faults. This is
called the voter fault recovery unit. It constitutes an extra level of security on the
voter calculations. In case of voter circuit faults, the over-ride functionality corrects the
output by taking into account the original inputs.
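One possible reading of this two-step comparison is sketched below. It is an interpretation rather than the thesis circuit: the function `two_stage_vote` is hypothetical, the release stage is assumed to supply three samples (L2, L4, L5) and the sampling stage two further samples (L1, L3), and the fallback is modelled as a plain majority over all five values.

```python
def majority(bits):
    """Simple majority of an odd number of one-bit values."""
    return 1 if sum(bits) * 2 > len(bits) else 0


def two_stage_vote(release_samples, sampling_samples):
    """Return (value, disagreement_found).

    Step 1: if the release-stage samples all agree, the data is fault free.
    Step 2 (assumed): otherwise the sampling-stage samples are added to the
    comparison and the majority of all samples decides the output.
    """
    if len(set(release_samples)) == 1:
        return release_samples[0], False
    combined = list(release_samples) + list(sampling_samples)
    return majority(combined), True
```

For example, `two_stage_vote([1, 0, 1], [1, 1])` falls through to the second step and still returns logic '1', because four of the five samples agree.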
5.6.5 Size Trade-Off 
The proposed scheme suggests the replacement of each D-Flip-Flop in a circuit with the latch
based scheme. It is obvious that the proposed latch based design will take
more area than the D-Flip-Flop. However, the area will not grow 4 times the total area
because of the following reasons:
A D-Flip-Flop (DFF) is normally composed of two level sensitive latches. The proposed
scheme is based on 5 latches and a majority voter, which suggests that the area will not
exceed 5 times that of the DFF based design. The area increase applies only
to the percentage of the total area of an IC which is composed of D-Flip-Flops. As this
scheme only targets the synchronous elements of the circuit, the area increase will be attributed
to the hardening of only these elements. We did some work on different latch intensive
circuits to see the impact of D-Flip-Flops in terms of total area. We used ISCAS89 benchmark
circuits and the analysis shows that only 24% of the total area in the case of the s386
circuit is utilized by DFFs. Only 27% and 29% of the total area is used by DFFs in the
case of the s344 and s349 benchmark circuits.
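The effect of these percentages on total area can be worked through with a short calculation. This is a back-of-the-envelope sketch: `total_area_factor` is a hypothetical helper, and the cell ratio of 5 is the upper bound quoted above for the hardened cell relative to a DFF, not a measured value.

```python
def total_area_factor(dff_fraction, cell_ratio):
    """Total area after hardening, relative to the original circuit.

    Only the fraction of the area occupied by DFFs is scaled by cell_ratio;
    the remaining combinational area is left unchanged.
    """
    return (1.0 - dff_fraction) + dff_fraction * cell_ratio


if __name__ == "__main__":
    # DFF area fractions reported in the text for three ISCAS89 circuits.
    for name, fraction in [("s386", 0.24), ("s344", 0.27), ("s349", 0.29)]:
        print(name, round(total_area_factor(fraction, 5.0), 2))
```

Even with the pessimistic ratio of 5, the total area of these three circuits roughly doubles (factors of about 1.96, 2.08 and 2.16) rather than growing four-fold.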
There is a fair chance that the synchronous elements in an electronic design already
contain some level of SEU immunity. This could be due to transistor sizing,
technology, etc. In this case the area increase is not as assumed; in some cases there
would be no increase at all, for example if the design is based on the DICE [DOO-090] latch cell, which is
roughly 2.5 times greater in size than the proposed design. On the other hand, it is also
possible that only certain blocks are important and SEU immunity is required only for
these blocks. In that case, the area increase will be due only to the radiation hardening of those
particular blocks. So, it is safe to say that the area increase is dependent on the design and on the
application. There may be cases, due to the factors discussed earlier, where rather than
incurring any area overhead, some area can be saved by incorporating the proposed scheme.
5.6.6 Static Data Storage 
The FPGA's configuration bits (bit-stream) are used to configure both the logic elements 
and the routing switches. An upset of a programming bit in a FPGA is much more serious 
than a conventional data bit upset. If a logic element control bit changes its state, then the 
logic functionality of the FPGA is altered. If a control-bit of routing switch experiences an 
upset, then the FPGA essentially becomes rewired. In either case, the programmed circuit 
function is no longer what was intended. For these reasons, the configuration storage of 
the FPGA must be totally immune to SEU. As the data in this configuration
memory is static, it is necessary to scrub the errors periodically before they start
accumulating.
A shift register is proposed in Figure-5.17a for the technique which could also be used to 
store the configuration of a reprogrammable fabric. The configuration bit-stream is stored 
at the beginning of the execution cycle of an application and the scrubbing clock would 
maintain data integrity over long periods of time (Figure-5.17b). The configuration signal 
shown in Figure-5.17b ensures that the original bit-stream data is loaded into the shift register, and
once configuration is done the self-scrub mechanism corrects any errors to avoid error
accumulation. The scrubbing frequency is much lower than the master clock frequency
and is dependent on the system and application requirements.
a) Self-Scrub in chain of shift registers 
	
b) Scrub mechanism for configuration storage 
Figure-5.17: Self-Scrub Mechanism for Static Data Storage 
Figure-5.18, Figure-5.19 and Figure-5.20 suggest some optimizations of the proposed scheme for static data
storage. These optimizations help to reduce the number of clocks and storage elements.
These optimized circuits can be used for the configuration memory of a reconfigurable
fabric where data is written at boot time only. Figure-5.18 and Figure-5.19 suggest schemes for
configuration storage mechanisms that incorporate a golden data value. The input data to the
flip-flops serves as a golden data value, as shown in Figure-5.20. Flip-flops L1 and L2 help
to capture two data samples at the rising edges of the clock signals (Clk-A and Clk-B),
which are 180 degrees out of phase and have a 50% duty cycle. It is important to note that




Figure-5.18: The Proposed technique with reduced clock signals 
These features make the proposed technique even better than the most commonly used 
TMR scheme where three copies of the storage element are used along with voting 
circuitry. We have compared the proposed technique with different SEU mitigation 
techniques and the results are analyzed in the results section. The proposed technique can 
be optimized for one clock signal as well for time critical applications [BALO-006]. This 
optimization is achieved by capturing the data sample at the rising and falling edge of the 
same clock. This is illustrated in Figure-5.20. 
[Figure: Golden Value and Data inputs, latch L2 clocked by CLk-B, and the majority voter]
Figure-5.19: The proposed scheme with reduced clock signals and latches
Figure-5.20: Optimized proposed SEU mitigation scheme 
We first elaborate the experimental flow which is used for validating the proposed
scheme. Then, we discuss the SEU simulator, which we have developed to insert faults
representing SEUs. We also discuss the functional testing procedure employed for
assessing the SEU immunity of the proposed technique, and then we analyze the results obtained by
applying our technique.
5.7 SEU / SET Simulator
The proposed technique is implemented in a 'C' program, which takes the VERILOG net-list of the
circuit under test as input [BALO-006]. VERILOG net-lists are obtained through the
SynplifyASIC software. D-flip-flops are identified and structural modifications are made
to the original circuit by modifying the net-list. The modified net-list is then fed into the
software simulator to analyse the behaviour of the circuit under SEUs/SETs. The SEU
immune net-list can be mapped onto any ASIC/reconfigurable architecture (FPGA, etc.)
through a suitable software tool. As said before, an SEU simulator is designed to create a
realistic scenario for the SEU-induced faults to be injected into the circuit under test.
The final step in a design flow is to calculate Error. The circuit under test is introduced 
with SEU faults through SEU simulator. Figure-5.21 shows the process of error 
calculation. The functional operation of the proposed technique is compared with the 
original circuit without SEU faults. A disparity between two circuits indicates that the 






[Figure: the SEU simulator drives the SEU hardened circuit; the outputs are compared and errors are reported]
Figure-5.21: Error evaluation of the proposed technique
The SEU simulator is designed for the purpose of fault injection. The SEU simulator has
these three main design considerations:
• An SEU can occur on any line of the circuit. The fault can flip the logic value at any
node. The SEU simulator is designed to randomly inject an SEU fault on any
node/signal (0 to 1 or 1 to 0).
• An SEU can be of variable duration. When an SEU occurs at any node, it inverts
the value on that line. The simulator allows the variation of SEU duration. The
duration of SEU represents the period of fault injection.
• An SEU can occur at any instant during the functional operation of the
application. The SEU introduces a fault on a line randomly in time.
The SEU simulator also has a special feature: it can induce multiple faults in
addition to single event faults.
Let us take a case where an SEU has to be induced on line A. The line/node is assigned 
to one of the outputs of the simulator. The simulator takes the original value on line A as 
the fault-free value. A logic '1' on the simulator output driving line A inverts the original 
value during simulation. The inverted value is denoted as the fault value, as shown in 
Figure-5.22. The fault value can be inserted for any number of time units and at any 
point in time. 
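The injection mechanism described above can be sketched in a few lines of Python. This is only an illustrative model, not the 'C' simulator used in this work: the names Fault, random_faults and inject are hypothetical, the waveform is a plain list with one bit per time unit, and the inversion is modelled as an XOR between the fault-free value and the simulator output. A fixed random seed is assumed as one way of replaying the same fault list.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    node: str      # name of the line that is hit (hypothetical naming)
    start: int     # first time unit of the upset
    duration: int  # number of time units the line stays inverted


def random_faults(nodes, sim_length, max_duration, count, seed=0):
    """Draw `count` faults with a random node, start time and duration."""
    rng = random.Random(seed)  # fixed seed so the same fault list can be replayed
    return [
        Fault(rng.choice(nodes), rng.randrange(sim_length), rng.randint(1, max_duration))
        for _ in range(count)
    ]


def inject(fault_free, fault):
    """Value seen on the line: fault-free value XOR simulator output."""
    end = fault.start + fault.duration
    return [
        bit ^ (1 if fault.start <= t < end else 0)  # simulator drives '1' in the window
        for t, bit in enumerate(fault_free)
    ]


line_a = [0, 0, 1, 1, 0, 1, 1, 0]                       # fault-free waveform on line A
print(inject(line_a, Fault("A", start=2, duration=3)))  # [0, 0, 0, 0, 1, 1, 1, 0]
```

Calling random_faults twice with the same seed returns the same list, which is one simple way to apply identical faults to several test scenarios.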
[Figure: timing diagram showing the fault-free value on Line A, the logic '1' output of the SEU simulator, and the final value on Line A, plotted against time] 
Figure-5.22: Fault injection process of the designed SEU simulator 
Experimental results are derived for the ISCAS89 benchmark circuits [BEN-MRK]. These 
circuits contain a combination of synchronous and combinatorial elements. We injected 
1000 random SEU and SET faults to verify the proposed scheme. These random fault 
vectors are stored so that the same fault vectors can be applied to all of the test scenarios, 
which are explained later. Faults were injected through the SEU simulator. The proposed 
scheme handles both SEUs and SETs, and it enhances the system performance because no 
extra hardware/software is required for SETs in the clock. The synthesis tool used for 
technology mapping and optimization in this work is Synplify ASIC 3.0.4. The 
technology used is a 0.18 micron cell library. Post-synthesis simulations to verify the 
proposed technique are performed with Verilog-XL, and the toggle activity of 
each node is captured. The power figures are calculated with Synopsys Design 
Compiler with the global operating voltage set to 1.8 V. The results for all benchmark 
circuits are assembled in four different scenarios. 
5.7.1 Scenario 1: 
Standard ISCAS89 benchmark circuits [BEN-MRK] without any level of immunity to 
SEUs and SETs are synthesized and the power figures are computed with the help of the 
afore-said tools. These figures help to estimate the area and power overhead incurred to 
achieve radiation immunity for different mitigation schemes. 
5.7.2 Scenario 2: 
Standard ISCAS89 benchmark circuits [BEN-MRK] with full standard TMR are 
synthesized and power figures are computed. The figures help to compare the proposed 
scheme's performance with respect to the standard TMR scheme. 
5.7.3 Scenario 3: 
There were no power or area figures available in the literature for Lima's [LIM-003] scheme 
for the ISCAS89 benchmark circuits [BEN-MRK]. For the sake of a fair comparison, 
Lima's [LIM-003] scheme was implemented on the benchmark circuits and the same set of 
input vectors and random faults was applied. These results, shown in Table-5.3, are used 
to demonstrate the performance advantage of the proposed scheme in terms of area and 
power consumption. 
5.7.4 Scenario 4: 
The last scenario is where the proposed scheme is implemented on ISCAS89 benchmark 
circuits [BEN-MRK] and area/power figures are calculated. 
It is evident that a power saving of approximately 67% can be achieved over the standard 
TMR scheme with the help of the proposed scheme. Table-5.3 presents the power 
consumption of the different SEU mitigation techniques along with that of the standard 
benchmark circuits without any SEU immunity. As discussed earlier, SEU immunity cannot 
be achieved without extra hardware or time cost. Table-5.4 presents the overhead of the 
different schemes in terms of power consumption. The power saving varies from 
circuit to circuit. 
Table-5.3: Power requirement and analysis for SEU immunity for ISCAS89 benchmark circuits 
Circuit   Standard Circuit   TMR     Proposed Scheme   Lima et al. [LIM-003] 
          (mW)               (mW)    (mW)              (mW) 
S298      0.98               2.65    2.31              2.48 
S344      1.165              3.27    2.71              2.99 
S349      1.165              3.28    2.69              3.01 
S382      1.077              3.04    2.5               2.86 
S386      0.926              2.86    2.437             2.617 
S420      1.135              3.10    2.67              2.83 
S444      1.187              3.40    2.68              3.12 
S641      2.423              6.53    4.903             5.76 
S713      2.37               6.23    4.84              5.69 
S838      1.865              4.95    3.81              4.235 
Table-5.4: Percentage SEU power overhead for ISCAS89 benchmark circuits with respect to 
standard circuits without SEU immunity 
(all columns: power overhead w.r.t. standard circuit) 
Circuit   TMR       Proposed Scheme   Lima et al. [LIM-003] 
S298      170.4%    135.71%           153.06% 
S344      180.7%    132.61%           156.65% 
S349      182.1%    130.90%           158.36% 
S382      182.6%    132.12%           165.55% 
S386      209.7%    163.17%           182.61% 
S420      173.2%    135.24%           149.33% 
S444      186.5%    125.77%           162.84% 
S641      169.8%    102.35%           137.72% 
S713      162.8%    104.21%           140.08% 
S838      165.5%    104.28%           127.07% 
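As a worked check of how the overhead percentages relate to the measured power, the short Python sketch below recomputes the overheads for S298 from the Table-5.3 values. It assumes the overhead is defined as the extra power relative to the standard circuit; the function name overhead_pct is hypothetical and only used for this illustration.

```python
def overhead_pct(scheme_mw, standard_mw):
    """Extra power of a hardened circuit, as a percentage of the standard circuit."""
    return 100.0 * (scheme_mw - standard_mw) / standard_mw


# S298 power figures in mW: standard circuit, TMR, proposed scheme, Lima et al.
standard, tmr, proposed, lima = 0.98, 2.65, 2.31, 2.48

for name, value in [("TMR", tmr), ("Proposed", proposed), ("Lima et al.", lima)]:
    print(f"{name}: {overhead_pct(value, standard):.1f}%")
```

Under this assumption the three results agree with the S298 overheads listed in Table-5.4 to within rounding.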
Power consumption is critical for space-related applications, and the power saving over 
standard TMR makes the proposed scheme very promising for critical space applications. 
As mentioned earlier, TMR is the most commonly used scheme in the majority of 
architectures used for space-related applications. We have compared the 
proposed solution against standard TMR for the ISCAS89 benchmark circuits [BEN-MRK]. 
The results are presented in Table-5.5. The percentage power saving of the proposed 
scheme over standard TMR is very encouraging: a saving of up to approximately 67% 
can be achieved, depending on the type of circuit. We also evaluated the proposed 
scheme against other schemes in the literature. For example, Lima's scheme has an 
advantage of up to approximately 38% over standard TMR in terms of percentage power 
saving for the ISCAS89 benchmark circuits, which is smaller than the saving of the 
proposed scheme. This indicates that the proposed scheme is power efficient when 
compared with Lima's scheme. The advantage of the proposed scheme over the previous 
work by Lima et al. [LIM-003] is shown in Table-5.5 and Table-5.6, which support the 
efficacy of the proposed scheme over Lima's scheme in terms of power consumption. A 
percentage power saving of up to approximately 37% over Lima's scheme [LIM-003] is 
observed. 
Table-5.5: Power saving through the proposed scheme with respect to ISCAS89 [BEN-MRK] 
benchmark circuits 
Circuit   Proposed:               Lima:                   Proposed: power saving 
          power saving over TMR   power saving over TMR   over Lima et al. [LIM-003] 
S298      34.69%                  17.34%                  17.34% 
S344      48.15%                  24.12%                  24.03% 
S349      51.24%                  23.77%                  27.46% 
S382      50.51%                  17.08%                  33.42% 
S386      46.54%                  27.10%                  19.43% 
S420      37.97%                  23.87%                  14.09% 
S444      60.74%                  23.67%                  37.06% 
S641      67.47%                  32.10%                  35.36% 
S713      58.64%                  22.78%                  35.86% 
S838      61.28%                  38.49%                  22.78% 
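The savings listed in Table-5.5 appear to be differences between the power overheads of Table-5.4, expressed in percentage points. This reading is inferred from the numbers and is not stated explicitly in the text, so the sketch below should be taken as a plausibility check only; the function name saving_points is hypothetical.

```python
def saving_points(reference_overhead_pct, scheme_overhead_pct):
    """Saving of one scheme over a reference, as a difference of overhead percentages."""
    return reference_overhead_pct - scheme_overhead_pct


# S641 power overheads from Table-5.4 (percent w.r.t. the standard circuit)
tmr, proposed, lima = 169.8, 102.35, 137.72

print(f"Proposed over TMR:  {saving_points(tmr, proposed):.2f}")
print(f"Lima over TMR:      {saving_points(tmr, lima):.2f}")
print(f"Proposed over Lima: {saving_points(lima, proposed):.2f}")
```

The three values land within a few hundredths of the S641 row of Table-5.5, the small differences being consistent with rounding of the tabulated overheads.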
It is vital to compare the area overhead because silicon usage has a major impact on 
performance and cost. The area required by each scheme is reported in Table-5.6 along 
with the original circuit area. 
Table-5.6: Area requirement and analysis for SEU immunity for ISCAS89 [BEN-MRK] benchmark 
circuits 
Circuit   Standard   TMR     Lima et al. [LIM-003]   The Proposed Scheme 
S298      2040       6325    5978                    4728 
S344      2162       6760    5642                    4648 
S349      2169       6752    5747                    4771 
S382      2744       8517    7436                    6119 
S386      1601       5130    4337                    3506 
S420      2314       7065    5923                    4836 
S444      3057       9338    7978                    6465 
S641      2882       8879    8040                    7060 
S713      4023       12109   10258                   8649 
S838      4532       13627   12463                   9290 
When compared with the standard TMR scheme, the proposed technique is significantly 
more efficient in terms of area. Standard TMR has an overhead of approximately 
200% to achieve the first level of SEU immunity for a given circuit. We compared 
the percentage area overhead for the ISCAS89 benchmark circuits and achieved an area 
saving of approximately 70% over standard TMR. 
Table-5.7: Area requirement and analysis for SEU immunity for ISCAS89 [BEN-MRK] benchmark 
circuits 
Circuit   Area Overhead of    Area Overhead of   Area Overhead of 
          Proposed Scheme     TMR                Lima et al. [LIM-003] 
S298      131%                210%               193% 
S344      115%                212%               161% 
S349      120%                211%               165% 
S382      123%                210%               171% 
S386      119%                220%               170% 
S420      109%                205%               156% 
S444      111%                205%               161% 
S641      145%                208%               179% 
S713      115%                201%               155% 
S838      105%                200%               175% 
Figure-5.23 illustrates the percentage area saving of the proposed scheme over different 
schemes. 
[Figure: bar chart of the percentage area saving of the proposed scheme over TMR and over Lima et al. for the ISCAS89 benchmark circuits S298, S344, S349, S382, S386, S420, S444, S641, S713 and S838] 
Figure-5.23: Percentage area saving through the proposed technique 
The complexity of a circuit after an SEU hardening procedure can be attributed to the 
increase in the number of circuit elements. Figure-5.24 presents the area results in terms of 
the total number of flip-flops of 2x2, 8x8, and 16x16 bit multipliers, using no tolerance 
scheme (standard), TMR, the Mavis et al. [MAV-002] scheme and the proposed scheme. 
The Mavis et al. [MAV-002] scheme, described in an earlier chapter, is based on four clock 
signals and uses 9 level-triggered latches to implement SEU immunity, whereas the 
proposed scheme uses only 5 latches (without optimizations) and 3 clock signals. 
Figure-5.24 shows that the proposed scheme is more efficient than previously proposed schemes. 
The proposed scheme has a saving of 33% in terms of the total number of flip-flops over 





[Figure: bar chart comparing the Standard, TMR, Mavis et al. and Proposed schemes] 
Figure-5.24: Area Comparison of the proposed technique 
5.8 Summary 
This chapter introduced the proposed technique for the sequential elements of a circuit. 
The scheme is based on sampling data at different times and comparing the samples to 
obtain a fault-free output. The scheme was implemented on various ISCAS89 benchmark 
circuits. The results were compared with full hardware redundancy and other 
techniques. The efficacy of the scheme was demonstrated in terms of area and power 
consumption against schemes that are already in use. 
CHAPTER 6 
PARTIAL TRIPLE MODULAR 
REDUNDANCY BASED SEU/SET 
PROTECTION OF COMBINATORIAL 
LOGIC 
6.1 Introduction 
Technology scaling, shrinking geometries into the deep sub-micron regime, lower supply 
voltages, higher operating frequencies, and higher density circuits have all had a negative 
impact on reliability. The number of occurrences of transient faults has increased 
dramatically. One major transient fault type is the soft error, caused by two main sources: 
• Secondary cosmic rays, especially atmospheric neutrons. 
• Alpha particles emitted by decaying radioactive impurities in packaging and 
interconnect materials. 
These highly energetic particles induce SETs in digital circuits. The amount of charge 
injected may be sufficient to invert the logical state at a node, hence introducing a soft 
error. The Soft Error Rate (SER) per chip is projected to increase four times with decreasing 
feature size [HAZ-000]. 
Traditionally, soft errors were tackled within the context of memory cells. Today, error 
detection and correction circuits are widely used to protect memory arrays. Combinational 
logic circuits, on the other hand, have been found to be less susceptible to SEUs in 
equivalent device technologies due to the naturally occurring logical, electrical and 
latching-window masking effects [LID-094]. However, these effects are diminishing 
as feature size decreases and circuits move to higher operating frequencies. Recent studies 
predict that the soft error rate (SER) per chip of logic circuits will increase exponentially 
by year 2011, at which point it will be comparable to the SER per chip of unprotected 
memory elements [SHI-002]. 
For an SET induced in a combinational logic circuit to result in a soft error, three 
conditions have to be satisfied: 
• An active path must exist between the hit node and the output of the circuit. 
• The pulse-width must be enough to overcome inertial delay through subsequent 
gates, and survive electrical attenuation along the active path. 
• The pulse should arrive within the setup and hold time of a latch element to be 
captured and cause a soft fault. 
In this chapter, we propose a novel design technique to cope with SEU-related faults in 
combinational circuits. The technique applies partial triple modular redundancy to the 
combinational circuit and is based on a projection model of input signal probabilities. SEU 
sensitive gates are identified on the basis of their input signal probabilities: the approach 
uses the inputs of a gate to determine whether the gate is SEU sensitive. A heuristic 
algorithm is used to calculate the signal probabilities [MUS-097]. The chapter first 
discusses, under the related work heading, some of the related work on SEU tolerance for 
combinational circuits from the perspective of signal probabilities and partial 
duplication/triplication. The proposed technique is then discussed together with the basic 
concepts and definitions of the proposed model. This is followed by some case examples 
that elaborate the scheme. Results are provided to demonstrate the advantage of the 
proposed technique over previously developed schemes. The overall performance of the 
technique based on our proposed reconfigurable architecture is discussed in the last 
chapter. 
6.2 Related Work 
Several earlier efforts have been made to quantify the contributions of SETs/SEUs to the 
overall upset rate in a microelectronic circuit [LID-094][YAN-092][CHA-096][CHA-093]. 
Some of these researchers used a simple model for the charge collection and simply 
injected fixed amounts of charge into a sensitive node. Circuit simulations were performed 
to evaluate the transient voltage pulse that would result from injecting a fixed amount of 
charge. This voltage disturbance was then introduced into a gate-level logic simulation of 
the entire circuit to determine whether a transient fault would occur. Such mixed-mode 
simulations do not consider that pulse stretching in logic gates varies for different inputs 
and input states, and they can require many simulation runs to evaluate the various logic 
states and propagation paths in the circuit of interest. 
Several approximation strategies have been developed in the past because of the high 
computational complexity involved in computing signal and fault detection probabilities 
[JON-095][WUN-085][PAT-093][SAV-000]. The cutting algorithm [SAV-000] computes 
lower bounds of fault detection probabilities by propagating signal probability values. 
However, this algorithm delivers loose bounds, which may lead to unacceptable test 
lengths. Lower bounds of fault detection probability were also derived from controllability 
and observability measures [PAT-093]. This method gave poor lower bounds because 
it cannot account for the component of fault detection probability due to 
multiple path sensitizations. The above-mentioned methods are satisfactory only for faults 
that have a single sensitizing path for fault propagation to an output, and hence will not 
provide good results for highly re-convergent fan-out circuits that have multiple path 
sensitizations. 
PREDICT [SET-085] is a probabilistic graphical method which estimates circuit testability 
by computing node controllabilities and observabilities using Shannon's expansion. The 
time complexity of exact analysis by this method is exponential in the circuit size. 
PROTEST [WUN-085], a tool for probabilistic testability analysis, calculates 
fault detection probabilities and optimum input signal probabilities for random test 
patterns by modelling the signal flow. The fault detection probabilities, which are computed 
from signal probability values, do not take multiple path sensitization into account. Another 
method, CACOP [JON-095], is a compromise between the full-range cutting algorithm and 
linear-time testability analysis, such as the controllability and observability program. 
However, this method does not give the exact fault detection probability. 
The algorithm proposed by Chakravarty et al. uses gate decomposition to compute exact 
fault detection probabilities of large circuits [CHA-090]. PLATO (Probabilistic Logic 
Analyzing Tool) is another tool that computes exact fault detection probabilities using 
reduced ordered binary decision diagrams (ROBDD) [KRE-093]. The space requirement for 
constructing the ROBDD of large circuits is very large. Shannon decomposition and 
divide-and-conquer strategies are used to reduce large circuits into small sub-circuits. 
The computing complexity of these decomposition methods is quite high. Another BDD-based 
algorithm is proposed by Farhat et al. to compute exact random pattern detection 
probabilities. However, this algorithm could not be used for large circuits because of its 
large space and time requirements [FAR-093]. Thara et al. proposed a stuck-at-fault model 
to compute accurate detection probabilities using Bayesian networks as the data structure 
[THA-005]. 
The work carried out by K. J. Hass et al. [HAS-000] determines the probability of upsets 
that originate from transients in combinational logic by examining the nature of the circuit 
rather than by simulating all possible event scenarios. The probabilistic analysis consists 
of three phases. The first analysis phase evaluates the radiation effects. Specifically, an 
understanding of the cosmic particle environment, the orbital parameters of the spacecraft, 
and the physical dimensions of the circuit elements leads to a probability 
distribution for the amount of charge deposited on a sensitive circuit node. The second 
analysis phase involves characterizing the logic elements used to construct a functional 
circuit. For each gate, the probability distribution of voltage transients as a function of 
their pulse width is determined by considering direct particle strikes to the gate output as 
well as those transients that may originate elsewhere and propagate through the gate in 
question. 
Diehl et al. [Die-083] presented a technique to determine the error rate of the SEUs that 
may occur in combinational logic. The technique may be modified to find the 
probability of a combinational logic being disturbed at the output of a flip-flop by an SEU. 
A typical circuit can give internal logic value probabilities that are significantly different 
from the value of 0.5 proposed by Diehl et al. The main limitation of this technique is that 
it is computationally complex, and in the case of re-convergence the probability 
propagation becomes very complex. 
Since the occurrence of SEUs is random, it cannot be determined what input 
pattern must be applied at the primary inputs to avoid the effects of an SEU on the logic 
circuit. Hence, K. Holland et al. [HOL-091] proposed a model to calculate the probability of 
latching SEU errors in VLSI logic circuits. It is based on the assumption that any input 
pattern is equally likely. The probability building and propagation are based on 
this assumption. The probability building is used to find the logic value probabilities on 
each node in the circuit. This involves the following two steps: 
• All possible primary input patterns are propagated through the circuit and a count 
is kept for logic '1' and logic '0' at each node. 
• Each count is divided by the total number of patterns applied to get the logic 
'0' and the logic '1' probabilities for each node. 
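The two steps of the probability building can be sketched as follows. This is a minimal Python illustration of exhaustive pattern counting under the equal-likelihood assumption, not the implementation of [HOL-091]; the function name build_probabilities, the net-list representation and the small example circuit (n1 = a AND b, z = n1 OR c) are all hypothetical.

```python
from itertools import product


def build_probabilities(inputs, gates):
    """Return P(node = 1) for every node by exhaustive input enumeration.

    gates: list of (output_name, function, operand_names) in topological order.
    """
    ones = {name: 0 for name in inputs}
    ones.update({out: 0 for out, _, _ in gates})
    patterns = 0
    for values in product((0, 1), repeat=len(inputs)):
        state = dict(zip(inputs, values))
        for out, fn, operands in gates:          # step 1: propagate the pattern
            state[out] = fn(*(state[o] for o in operands))
        for name, bit in state.items():          # ...and count the logic '1's
            ones[name] += bit
        patterns += 1
    # step 2: divide each count by the number of patterns applied
    return {name: count / patterns for name, count in ones.items()}


# Hypothetical example circuit: n1 = a AND b, z = n1 OR c
gates = [
    ("n1", lambda x, y: x & y, ("a", "b")),
    ("z", lambda x, y: x | y, ("n1", "c")),
]
p_one = build_probabilities(["a", "b", "c"], gates)
p_zero = {name: 1 - p for name, p in p_one.items()}
print(p_one["n1"], p_one["z"])  # 0.25 0.625
```

The number of patterns grows as 2^n with the number of primary inputs n, which is why this exhaustive approach is only practical for small circuits.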
141 
Chapter 6: Partial Triple Modular Redundancy Scheme 
Based on these useful ideas, some researchers geared their efforts towards finding a 
suitable solution to the SEU problem in space-borne systems. The work carried out by 
K. Mohanram et al. [MOH-003] suggested a partial duplication of the logic circuit based 
on the overall soft error susceptibility and the soft error failure rate of the design. The 
work is based on the gate-level soft error failure rate and results in computationally 
recursive and complex algorithms. 
The work by P. Mongkolkachit et al. [MON-003] takes advantage of the transient 
nature of the spurious pulse generated by radiation. The approach is shown in Figure-6.1. 
The delays incorporated by logic gates help to provide time redundancy. The main 
limitation of the approach is that if the transient is produced within the delay path as a 
result of a direct hit, it will void the whole scheme. 
Figure-6.1: The Approach Using Delayed Signal and Buffer Circuit [MON-003] 
6.3 Model Description 
The proposed scheme is described in detail in the following sections with an illustrative 
example. This probabilistic model calculates the soft error probability of any output node 
in a combinational circuit, based on logical masking principles. The proposed approach 
differs from the ones found in the literature in three important ways [BALO-06c]: 
• This model assumes soft error hits at individual nodes, and not on the gate as a 
whole; this makes the model more realistic and accurate. Figure-6.2 shows the 
proposed model, where an SEU hit at the input nodes is taken into account, in contrast 
to the schemes where SEU effects on the whole gate are considered for SEU 
immunity. 
• The model accounts for input probabilities, i.e. it can accommodate unbalanced 
input vectors; this allows the designer to estimate soft error resiliency for any 
specific input patterns, as well as random input patterns. 
• TMR is applied only to SEU sensitive gates, which helps to reduce the cost associated 
with TMR. 




[Figure: two panels, the approach mostly used and the proposed model] 
Figure-6.2: The Proposed Model 
It is useful to define some assumptions and other key terms which will be used in the 
proposed technique. 
6.3.1 SEU Sensitive Gate 
A gate is SEU sensitive only if the logical state of its output changes when one of its 
inputs is complemented [BALO-06c]. The pseudo code is presented in 
Figure-6.3. 
Gate_Sensitivity(Gate, dominant_value, sensitive) 
output: Gate sensitivity towards SEU TRUE/FALSE 
inputs: Gate, dominant_value 
count = 0 
for each input i of Gate 
    if i has dominant_value 
        then count = count + 1 
end 
if count = 0           (all inputs have non-dominant values) 
    then return TRUE 
elseif count = 1       (only one input has the dominant value) 
    then return TRUE 
else return FALSE 
end 
Figure-6.3: Algorithm to Determine SEU Sensitive Gates 
6.3.2 Dominant Value 
The input value that controls the state of the gate output is defined here as the dominant 
value. The dominant value is gate dependent. This can be explained through the truth 
tables of the basic gates, and it is calculated through the pseudo code presented in Figure-6.4. 
Dominant_Value(Gate, Prob_Threshold, Dominant_value) 
output: dominant_value 
inputs: Gate, 0 < Probability_Threshold < 1 
Determine type of Gate and dominant value 
if (Gate is AND or NAND) 
    dominant_value = FALSE; 
elseif (Gate is OR or NOR) 
    dominant_value = TRUE; 
return dominant_value 
Figure-6.4: Algorithm to Determine Dominant Value 
It can be seen in Figure-6.5 that the dominant value for AND and NAND gates is logic '0'. 
If any one of the inputs is at logic '0' then the output is independent of the logic state of 
the other inputs. Likewise, logic '1' is the dominant value for OR and NOR gates. Since we 
have signal probabilities rather than actual test vectors, in order to use the above 
definitions, we define a threshold probability as follows: 
6.3.3 Threshold Probability 
The logic value is assumed to be logic '0' if its signal probability is less than the threshold 
probability, and logic '1' otherwise. This threshold can be specified by the end user 
depending upon the nature of the application (radiation levels, etc.). 
[Figure: gate symbols with their truth tables (inputs A and B, output C/F)] 
Figure-6.5: The Logic Functionality for AND, OR Gates 
An SEU sensitive gate can be defined in terms of the dominant value. A gate 
with two or more inputs is SEU sensitive in either of the following cases: 
• Only one of the inputs has the dominant value. 
• All inputs have non-dominant values. 
For a given threshold probability, logic values are assigned to each input according to the 
criteria defined above. The gate's sensitivity is then determined accordingly. The 
algorithm Dominant_Value, shown in Figure-6.4, is employed to find the dominant value 
of the gate inputs. If a gate has one or more sensitive inputs, then that gate is considered 
sensitive to SEUs. 
Consider a 3-input OR gate with input signal probabilities of 0.7, 0.2 and 0.3, as shown 
in Figure-6.6[a]. The threshold probability is taken as 0.5 for the sake of realistic 
comparison, as there is an equal probability for any primary input to be at either logic '0' 
or '1'. According to the definitions described above, the logic values (1, 0, 0) are 
obtained for the corresponding input signal probabilities. The dominant value is assigned 
to the gate inputs depending on the gate type through the algorithm described before 
(Figure-6.4). For this case the dominant value is logic '1'. Figure-6.6 presents two 
scenarios each for AND and OR gates. Green and red colours illustrate SEU sensitive and 
insensitive gates respectively. Let us now assume that a fault due to an SEU occurs on the 
input which happens to be at logic '1' (Figure-6.6[a]). All the other gate inputs have 
non-dominant values and only one input has the dominant value. The fault propagates through 
the gate, and the output will be flipped if the SEU strikes the logic '1' input. 
Figure-6.6[a] and [d] illustrate SEU sensitive gates, while Figure-6.6[b] and [c] are examples of 
SEU insensitive gates. 
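The classification rule can be illustrated with a short Python sketch that follows the definitions above: probabilities are thresholded to logic values, the dominant value is looked up from the gate type, and the gate is marked sensitive when at most one input carries the dominant value. The names DOMINANT, logic_values and is_seu_sensitive are hypothetical and are used only for this illustration.

```python
# Dominant value per gate type, as defined in Section 6.3.2
DOMINANT = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}


def logic_values(probabilities, threshold=0.5):
    """Logic '0' if the signal probability is below the threshold, else logic '1'."""
    return [0 if p < threshold else 1 for p in probabilities]


def is_seu_sensitive(gate_type, probabilities, threshold=0.5):
    """Sensitive if only one input, or no input, has the dominant value."""
    dominant = DOMINANT[gate_type]
    values = logic_values(probabilities, threshold)
    return values.count(dominant) <= 1


print(logic_values([0.7, 0.2, 0.3]))            # [1, 0, 0]
print(is_seu_sensitive("OR", [0.7, 0.2, 0.3]))  # True: a single dominant input
print(is_seu_sensitive("OR", [0.7, 0.8, 0.3]))  # False: two dominant inputs mask a flip
```

The threshold is kept as a parameter because the text allows the end user to choose it according to the application.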
[Figure: four example gates, each shown with its input signal probabilities, the threshold, the resulting logic values and the gate output, labelled as SEU sensitive or SEU insensitive] 
Figure-6.6: Proposed Model, SEU sensitive and insensitive gates 
In other words, a fault on any input propagates to the output of the gate only when the 
other inputs have non-dominant values. This can be defined for an AND gate in terms of input 
probabilities as 'an SEU on one of the inputs of a gate has a higher probability of upsetting 
its output only if the signal probability of all other inputs being at the non-dominant value 
is greater than or equal to the threshold probability'. Hence, the gate is assumed to be 
sensitive to SEUs on its inputs. This is shown in Figure-6.6[d]. The gates shown in 
Figure-6.6[b] & [c] are not sensitive gates because two or more inputs have dominant values. 
Once the SEU sensitive gates in a logic circuit are identified, TMR is applied only to 
these SEU sensitive gates and to the gates at the output stage of the logic circuit, 
irrespective of whether the latter are sensitive or not. Hence, the partial TMR process has 
the following steps: 
• Step-1: Determine the signal probability at each node of the circuit. 
• Step-2: Determine the dominant value for each gate in the circuit under test. 
• Step-3: Determine the logic value of each node based on the threshold probability. 
• Step-4: Determine the SEU sensitivity of each gate depending on its input logic values. 




Let us assume that G2 and G3 (Figure-6.7[a]) are identified as SEU sensitive gates 
through the probability projection model. The gates at the output (G4 & G7) are also 
classified as SEU sensitive, as stated before. Once these SEU sensitive gates are 
identified, TMR is applied only to these gates, with a simple majority circuit. If 
two SEU sensitive gates are next to each other in a data path, then the 
simple majority circuit after the first SEU sensitive gate can be eliminated, as presented in 
Figure-6.7(a & b). This technique works very well and saves area and power because only 
the SEU sensitive gates are triplicated. However, the efficiency of the scheme depends on 
how accurately the signal probabilities are calculated for each node in the logic circuit. 
The signal probability of any output is calculated through a heuristic algorithm for 
estimating signal and detection probabilities [KEN-096]. 
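The selection of gates for triplication and the placement of majority circuits can be sketched as below. The gate names follow the example in the text, but the fan-out map is a purely hypothetical topology, since the connectivity of Figure-6.7 is not reproduced here; the function names majority, select_tmr_gates and voters_needed are likewise assumptions made for illustration.

```python
def majority(a, b, c):
    """2-of-3 majority vote over three redundant copies of a signal."""
    return (a & b) | (b & c) | (a & c)


def select_tmr_gates(sensitive, outputs):
    """Gates to triplicate: SEU sensitive gates plus all output-stage gates."""
    return set(sensitive) | set(outputs)


def voters_needed(tmr_gates, fanout):
    """Triplicated gates that still need a majority circuit at their output.

    The voter is skipped when every gate driven by g is itself triplicated,
    mirroring the removal of the voter between adjacent sensitive gates.
    """
    need = set()
    for g in tmr_gates:
        driven = fanout.get(g, [])
        if not driven or any(d not in tmr_gates for d in driven):
            need.add(g)
    return need


sensitive = {"G2", "G3"}   # from the probability projection model (as in the text)
outputs = {"G4", "G7"}     # output-stage gates, always triplicated
# HYPOTHETICAL connectivity, for illustration only
fanout = {"G1": ["G2"], "G2": ["G4"], "G3": ["G5"], "G5": ["G7"], "G6": ["G7"]}

tmr = select_tmr_gates(sensitive, outputs)
print(sorted(tmr))                         # ['G2', 'G3', 'G4', 'G7']
print(sorted(voters_needed(tmr, fanout)))  # ['G3', 'G4', 'G7']
print(majority(1, 0, 1))                   # 1: a single upset copy is outvoted
```

In this assumed topology G2 feeds only the triplicated gate G4, so its majority circuit is dropped, while G3 feeds a non-triplicated gate and keeps its voter.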
[Figure: (a) a logic circuit with input probabilities; (b) the same circuit with partial TMR applied] 
Figure-6.7: (a) A Logic Circuit with Input Probabilities (b) Partial TMR applied to only SEU Sensitive Gates 
As discussed above, the probability of each node determines whether the node is SEU 
sensitive or not. So, it is important that the signal probability procedure gives exact or 
close to exact signal probabilities. The formulas presented in Table-6.1 compute signal 
probabilities, but the estimates are not reliable for circuits which 
have re-converging fan-out [MUS-097]. 
Table-6.1: Signal Probabilities calculation formulas [MUS-097] 
Gate Type   Output Probability 
AND         $\prod P_i$ 
NAND        $1 - \prod P_i$ 
OR          $\sum P_i - \prod P_i$ 
NOR         $1 - (\sum P_i - \prod P_i)$ 
XOR         $\sum P(i)\,(1 - P(j))$ 
XNOR        $1 - \sum P(i)\,(1 - P(j))$ 
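The limitation of these formulas under re-convergent fan-out can be seen on a very small example. The Python sketch below applies the AND and OR formulas node by node and compares the result with exhaustive enumeration. The circuit is hypothetical (a = i1 AND i2, b = a OR i3, z = a AND b, so that node a fans out and re-converges at z), and the OR formula is written as 1 - prod(1 - P_i), which for two inputs equals the P_1 + P_2 - P_1 P_2 form of the table.

```python
from itertools import product
from math import prod


def p_and(ps):
    return prod(ps)


def p_or(ps):
    # 1 - prod(1 - p); equals P1 + P2 - P1*P2 for a two-input gate
    return 1 - prod(1 - p for p in ps)


def estimated(p_in=0.5):
    """Node-by-node estimate that treats all gate inputs as independent."""
    a = p_and([p_in, p_in])
    b = p_or([a, p_in])
    z = p_and([a, b])  # a and b are correlated, so this product is not exact
    return {"a": a, "b": b, "z": z}


def exact():
    """Exact signal probabilities by enumerating all 8 input patterns."""
    ones = {"a": 0, "b": 0, "z": 0}
    for i1, i2, i3 in product((0, 1), repeat=3):
        a = i1 & i2
        b = a | i3
        z = a & b
        ones["a"] += a
        ones["b"] += b
        ones["z"] += z
    return {name: count / 8 for name, count in ones.items()}


print(estimated())  # {'a': 0.25, 'b': 0.625, 'z': 0.15625}
print(exact())      # {'a': 0.25, 'b': 0.625, 'z': 0.25}
```

Nodes a and b are estimated exactly, but the estimate at the re-convergence point z is 0.15625 against an exact value of 0.25, because z = a AND (a OR i3) reduces to a.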
To tackle this daunting task we looked into some of the existing work and some SEU 
mitigation techniques which take input signal probabilities into account. Some of the main 
techniques are explained in the related work section of this chapter. We concentrated on a 
heuristic algorithm for probability calculation, which is explained in the next section. 
6.4 Signal Probability Estimations 
As discussed above, the probability of detecting a fault is highly dependent upon the 
accuracy of the input signal probability of each node. This can be further explained with a 
simple example. Let us consider the circuit shown in Figure-6.8. The output function 
is Z = ABCDE + FGHI. 




Figure-6.8: The Test Circuit realizing Z = ABCDE + FGHI 
The probability of detecting a fault (stuck-at-0) at the output-Z is simply P(Z=1) in the 
case of randomly chosen input vectors. This can be explained by the following expression 
which is calculated through Table-6.1. 
P(Z)= P(ABCDE) + P(FGHI) - P(ABCDEFGHI) 	 (6.2) 
Let the input test vectors be random and have equal probability such that: 
P(A=1) = P(B=1) = P(C=1) = ... = P(I=1) = X 
By taking the input probability into account, Equation-6.2 can be expressed as: 
$P(Z) = X^5 + X^4 - X^9$ 	 (6.3) 
Figure-6.9 is a plot of the output probability versus the input probability. It illustrates 
that the output probability can be estimated precisely if the input probability is known. In 
other words, the probability of detecting a fault is highly dependent on the value of X. It 
can be shown that many methods of producing test vectors from a list of random numbers 
inherently fix X = 0.5, and this leads to P(Z) = 0.09. 
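The dependence of the output probability on X can be checked numerically. The following Python sketch simply evaluates Equation-6.3 for a few values of X; the function name p_z and the chosen sample points are arbitrary.

```python
def p_z(x):
    """Output probability of Z = ABCDE + FGHI when every input is 1 with probability x."""
    return x**5 + x**4 - x**9


for x in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"X = {x:.1f}  P(Z) = {p_z(x):.4f}")
```

At X = 0.5 the result is approximately 0.0918, in line with the value of 0.09 quoted above, and the output probability rises steeply as X approaches 1.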




Figure-6.9: The Graph for $Z = X^5 + X^4 - X^9$ 
It is obvious in this instance why purely random test generation can be inefficient. The 
test efficiency can be increased by controlling the value of X. This leads to the interesting 
fact that the selection of X can control the reliability of the whole scheme. S. Kathoori et 
al. proposed an SEU mitigation technique in which the input signal probabilities were 
obtained by profiling the input environment and then applying the simple formulas 
(Table-6.1) to estimate the probability of each node in the circuit [KAT-004]. As we have 
seen, these input signal probabilities may play a major role in the efficiency and reliability 
of the whole mitigation technique. In contrast, our proposed model is based on input 
signal probabilities equal to 0.5. This value is selected to make the model more realistic and 
general, as each input has an equal probability of being 1 or 0. 
The detection probability of a given fault is the probability that a randomly chosen input
vector detects the fault. Computing the detection probability of a given fault involves
computing signal probabilities in the network. The signal probability of a line L in a
network is the probability that line L has the logic value '1' on a randomly selected
vector. In fact, the computation of the detection probability of a fault can be reduced to
the computation of the signal probability of an auxiliary gate whose output is logic '1' if
the sensitizing conditions of that fault are satisfied [SAV-084]. Consequently, the problem
of computing signal probabilities is of central importance in random pattern testability
analysis. The major difficulty in computing signal probabilities is re-convergent fan-out.
If a circuit has no re-convergent fan-outs, the independent formulas (Table-6.1) compute the
exact signal probabilities in linear time. However, if a node has a fan-out greater than 1
and its branches re-converge at a gate input, the signal probabilities computed by the
simple algorithm for all nodes driven by that gate are likely to deviate from the exact
values. Let us take the example circuit shown in Figure-6.10 and compute the signal
probability of each node.
Figure-6.10: A Test Circuit to Compute Signal Probabilities
I1, I2 and I3 are the three inputs. Z is the circuit output and A, B, C and D are internal
nodes. Table-6.2 gives the exact and calculated probability values for each node, the latter
obtained through the simple formulas shown in Table-6.1. Node-A is connected to the inputs
of two gates, which means that it has a fan-out greater than 1. Table-6.2 shows that the
nodes that depend on Node-A deviate from the exact probability values (shown in red
colour). On the other hand, these basic formulas work perfectly well for gates with
fan-out = 1, where there is no discrepancy between the exact and calculated probabilities
(Table-6.2).




Node  Exact  Calculated
I1    1/2    1/2
I2    1/2    1/2
I3    1/2    1/2
A     3/4    3/4
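The effect of re-convergent fan-out can be reproduced on a small circuit. The sketch below does not use the circuit of Figure-6.10, whose netlist is not given in the text; it uses a hypothetical three-input circuit (A = I1 OR I2, B = A AND I3, C = A OR I3, Z = B AND C) in which node A fans out to two gates that re-converge at Z. The AND/OR probability formulas are the usual independent-input formulas, assumed here to match Table-6.1.

```python
from itertools import product


def exact_probabilities():
    """Exact signal probabilities by enumerating all equally likely input vectors."""
    counts = {"A": 0, "B": 0, "C": 0, "Z": 0}
    vectors = list(product((0, 1), repeat=3))
    for i1, i2, i3 in vectors:
        a = i1 | i2          # A = I1 OR I2 (fans out to two gates)
        b = a & i3           # B = A AND I3
        c = a | i3           # C = A OR I3
        z = b & c            # Z = B AND C (branches of A re-converge here)
        for name, value in (("A", a), ("B", b), ("C", c), ("Z", z)):
            counts[name] += value
    return {name: n / len(vectors) for name, n in counts.items()}


def simple_probabilities(p_in=0.5):
    """Signal probabilities from the independent-input formulas (assumed Table-6.1)."""
    def p_and(x, y):
        return x * y

    def p_or(x, y):
        return 1 - (1 - x) * (1 - y)

    a = p_or(p_in, p_in)
    b = p_and(a, p_in)
    c = p_or(a, p_in)
    z = p_and(b, c)          # treats B and C as independent, which they are not
    return {"A": a, "B": b, "C": c, "Z": z}


if __name__ == "__main__":
    exact, simple = exact_probabilities(), simple_probabilities()
    for node in ("A", "B", "C", "Z"):
        print(f"{node}: exact = {exact[node]:.4f}, simple = {simple[node]:.4f}")
```

In this hypothetical circuit the two methods agree for A, B and C, whose inputs are independent, and differ only at Z, where the two branches of A meet again.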





As the proposed technique depends on the signal probabilities, it is very important that
the probabilities are calculated exactly, or as close to exact as possible, in a reasonable
time. As discussed before, the discrepancy in the probability calculations can be
attributed to these causes:
1. A re-convergence at a gate
2. A re-convergence at a gate in its cone of influence
3. A combination of 1 and 2.
The possibilistic algorithm distinguishes between these causes and uses a proper inference
rule to reduce the error in every case [BALO-06c]. The technique for estimating signal
probabilities [MUS-097] for every output gate is given in the following steps.
Step-1:
• Estimate the signal probability of each output node (S) through the Simple Algorithm.
The Simple Algorithm is explained below:
Simple Probability Algorithm
Inputs: n primary inputs I1, I2, I3, ..., In
Output: Estimated Signal Probability of output nodes
assign 0.5 signal probability to all circuit inputs
calculate output probabilities based on Figure-6.1
return output probabilities
Figure-6.11: Simple Algorithm for probability estimations
Step-2:
Perform the following procedure on each primary input I_i of the logic circuit:
• Compute the signal probability of each output node by setting input I_i to 0 while all
other primary inputs I_j (1 ≤ j ≤ n, j ≠ i) are set to 1/2. The result is denoted by
SPA(I_i = 0).
• Repeat the above step with I_i set to 1. The result is denoted by SPA(I_i = 1).
• Compute the average of the two and denote the result as the P-tuple element (P_i):
P_i = [SPA(I_i = 0) + SPA(I_i = 1)] / 2
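Step-2 can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses a hypothetical three-input circuit (A = I1 OR I2, B = A AND I3, C = A OR I3, Z = B AND C) rather than a circuit from this chapter, and the helper names `simple_algorithm` and `p_tuple` are invented for the sketch.

```python
def simple_algorithm(input_probs):
    """Propagate probabilities with the independent-input formulas (Step-1 style).

    Hypothetical circuit: A = I1 OR I2, B = A AND I3, C = A OR I3, Z = B AND C.
    Returns the estimated signal probability of the output Z.
    """
    i1, i2, i3 = input_probs
    a = 1 - (1 - i1) * (1 - i2)
    b = a * i3
    c = 1 - (1 - a) * (1 - i3)
    return b * c


def p_tuple(n_inputs=3):
    """One P-tuple element per primary input: average of SPA(I_i=0) and SPA(I_i=1)."""
    elements = []
    for i in range(n_inputs):
        spa = []
        for forced in (0.0, 1.0):
            probs = [0.5] * n_inputs      # all other inputs stay at 1/2
            probs[i] = forced             # input I_i is fixed to 0, then to 1
            spa.append(simple_algorithm(probs))
        elements.append((spa[0] + spa[1]) / 2)
    return elements


if __name__ == "__main__":
    print("S (simple estimate):", simple_algorithm([0.5, 0.5, 0.5]))
    print("P-tuple:", p_tuple())
```

For this circuit the element obtained by conditioning on I3 equals the exact output probability (0.375), while the plain simple estimate is 0.328, which hints at why the P-tuple carries useful correction information.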




Step-3:
Perform the following procedures, starting from the primary inputs and moving towards the
primary outputs:
• Compute the Expected-Tuple (E-Tuple) of each output gate by using the P-Tuples (P_i) as
gate inputs in the Figure-6.1 formulas.
• Run the Mark-Tuple Algorithm as shown below.
• Compute the no-dependent (ND) value by using the independent formulas (Table-6.1) with
the estimated signal probabilities from the previous level as the gate inputs.
• Apply the inference rules listed below.
Mark_Tuple Algorithm
output: M-Tuple (Signal Probability)
inputs: S    Probability calculated in Step-1
        P_i  P-Tuple
        E_i  E-Tuple
for 1 ≤ i ≤ n
    if P_i == S
        then M_i = 0;
    elseif P_i == E_i
        then M_i = 1;
             cntrl_1 = cntrl_1 + 1;
    else
        M_i = 2;
        cntrl_2 = cntrl_2 + 1;
    endif
end
Figure-6.12: Algorithm to Determine Mark Tuple
Inference_Rules(cntrl_1, cntrl_2, P, S, ND)
output: Signal Probability
inputs: cntrl_1,
        cntrl_2,
        S   (probability from Table-6.1)
        ND  No-Dependent value
if cntrl_1 == 0 && cntrl_2 == 0
    then P = S
elseif cntrl_1 != 0 && cntrl_2 == 0
    then P = ND
elseif cntrl_1 == 0 && cntrl_2 != 0
    if cntrl_2 == 1 then P = p_i
    else
        coeff = Σ(p_i - e_i)
        P = ND + coeff + sign_of_coeff * coeff / cntrl_2
elseif cntrl_1 != 0 && cntrl_2 != 0
    coeff = Σ(p_i - e_i)
    P = ND + coeff + sign_of_coeff * (S - ND) * coeff / S
endif
end
Figure-6.13: Algorithm for Inference Rules
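The two figures above can be combined into one small routine. The sketch below is a reading of Figure-6.12 and Figure-6.13 under several assumptions, not the author's implementation: equality is tested with a tolerance `tol` (because tabulated values are rounded), the sum in `coeff` runs over all tuple elements, `sign_of_coeff * coeff` is taken as the absolute value of `coeff`, and in the `cntrl_2 == 1` case the returned p_i is the single element marked 2. The two coefficient branches follow the figure as printed and have not been checked against an independent source.

```python
import math


def mark_tuple(p, e, s, tol=0.01):
    """Return (marks, cntrl_1, cntrl_2) for P-tuple p, E-tuple e and simple estimate s."""
    marks, cntrl_1, cntrl_2 = [], 0, 0
    for p_i, e_i in zip(p, e):
        if math.isclose(p_i, s, abs_tol=tol):
            marks.append(0)                 # agrees with the simple estimate
        elif math.isclose(p_i, e_i, abs_tol=tol):
            marks.append(1)                 # agrees with the expected value
            cntrl_1 += 1
        else:
            marks.append(2)                 # agrees with neither
            cntrl_2 += 1
    return marks, cntrl_1, cntrl_2


def inference_rules(p, e, s, nd, tol=0.01):
    """Pick the signal probability estimate following the cases of Figure-6.13."""
    marks, cntrl_1, cntrl_2 = mark_tuple(p, e, s, tol)
    if cntrl_1 == 0 and cntrl_2 == 0:
        return s
    if cntrl_1 != 0 and cntrl_2 == 0:
        return nd
    coeff = sum(p_i - e_i for p_i, e_i in zip(p, e))
    if cntrl_1 == 0:
        if cntrl_2 == 1:
            return p[marks.index(2)]        # the single deviating element
        return nd + coeff + abs(coeff) / cntrl_2
    return nd + coeff + abs(coeff) * (s - nd) / s


if __name__ == "__main__":
    # Tuple values taken from the rows for nodes A and B of Table-6.3.
    print(inference_rules([0.75] * 4, [0.75] * 4, s=0.75, nd=0.75))
    print(inference_rules([0.25, 0.38, 0.38, 0.38], [0.38] * 4, s=0.375, nd=0.375))
```

With the tuple values listed for nodes A, B and C in Table-6.3, this reading returns 0.75, 0.25 and 0.25, which agrees with the "P" column of that table. The `nd` argument is not used on those paths, so the value passed for it in the usage example is a placeholder.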
6.5 An Illustrative Example 
Consider the example circuit shown in Figure-6.14. The nodes are marked a, b, ..., h. The
proposed scheme is applied to this circuit, and its immunity against SEU is analysed in the
results section. The circuit has some basic gates, four inputs and two outputs. First, the
signal probability of each node is calculated through the algorithm explained above. The
signal probabilities, along with the E-tuples and P-tuples, are shown in Table-6.3 to
clarify the complete process. The table also presents the exact probability of each node
and the probability calculated through the simple formulas (Table-6.1) to show the efficacy
of the proposed technique.




Figure-6.14: Logical Circuit with SEU Sensitive Gates 
Logic values are assigned to the nodes based on a threshold probability. The probability
threshold is assumed to be 0.45 for this particular example. Dominant values are determined
for each gate depending on the gate type, and the SEU sensitive gates are identified.
The sensitive gates are encircled in Figure-6.14. The identification of SEU sensitive gates
depends purely on the probability of each node and on the probability threshold. As
discussed before, it is important that the probabilities are as close as possible to the
exact values, so that both the correct number of gates and the correct gates are
identified. Gates G6 and G7 (Figure-6.14) are not detected as sensitive gates if the
probability is calculated through the simple formulas of Figure-6.1 (most commonly used in
the literature). These two gates may affect the overall SEU immunity of the circuit.
Table-6.3 presents the probability figures for node-D. With the simple formulas, they
translate into incorrect logic values for the given probability threshold, which in turn
identifies gates G6 and G7 as SEU insensitive. These gates are, however, SEU sensitive if
the exact probability of node-D is taken into account. The proposed technique detects these
gates as SEU sensitive. As stated before, the simple formulas work fine for small and
non-convergent circuits. When compared with the exact probability of each node, the
probability figures calculated by the proposed technique are very close or identical for
the different nodes (Table-6.3). This comparison supports the validity of the proposed
technique.
Table-6.3: Probability Calculations for Reference Circuit Figure-6.14
Node  P-Tuple                   E-Tuple                   P     E      Simple Formulas
A     {0.75, 0.75, 0.75, 0.75}  {0.75, 0.75, 0.75, 0.75}  0.75  0.75   0.75
B     {0.25, 0.38, 0.38, 0.38}  {0.38, 0.38, 0.38, 0.38}  0.25  0.25   0.375
C     {0.38, 0.25, 0.38, 0.38}  {0.38, 0.38, 0.38, 0.38}  0.25  0.25   0.375
D     {0.44, 0.58, 0.44, 0.44}  {0.61, 0.61, 0.39, 0.39}
E     {0.25, 0.25, 0.25, 0.25}  {0.25, 0.25, 0.25, 0.25}  0.25  0.25   0.25
F     {0.11, 0.11, 0.09, 0.09}  {0.11, 0.1, 0.09, 0.09}   0.02  0.025  0.097
G     {0.05, 0.05, 0.04, 0.04}  {0.05, 0.05, 0.03, 0.03}  0.03  0.03   0.038
H     {0.33, 0.33, 0.29, 0.29}  {0.33, 0.33, 0.32, 0.32}  0.41  0.4    0.323
P = Proposed technique 	E = Exact Probability Model
Simple Formulas = Calculated through Figure-6.1
The above example shows that an ordinary probability calculation method may lead to a
situation where SEU sensitive gates are not detected. This can have drastic effects on
system performance and can cause the whole application to fail.
Figure-6.15: Example Combinational Circuit with SEU Sensitive Gates
We have presented an example where SEU sensitive gates are not detected unless an efficient
and proper scheme for probability calculation is employed.
Now let us take a scenario where some gates are identified as SEU sensitive although they
are not sensitive to SEU at all. This situation costs extra area and power without
improving the SEU immunity of the circuit. To illustrate this, let us take the example
combinational circuit shown in Figure-6.15. The SEU mitigation scheme is applied to the
test circuit by first calculating the probability of each node. The probability threshold
is assumed to be 0.45. The SEU sensitive gates are identified based on the logical values
of each gate and its dominant value. For the sake of comparison, the probability is
calculated by three different techniques and then the proposed technique is applied. All
the probability figures are shown in Table-6.4. Gate G6 is deduced to be sensitive through
the probability projection method (Table-6.1), whereas this gate is insensitive to SEU
according to both the proposed technique and the exact probability figures. The circuit in
Figure-6.15 is a good example to show that an ordinary probability method may flag as SEU
sensitive some gates that are inherently insensitive to SEU. This can result in an area
overhead without any real advantage.







Node  P       E     Simple Formulas
a     0.75    0.75  0.75
b     0.25    0.25  0.375
c     0.25    0.25  0.375
d     0.525   0.5
e     0.160   0.25  0.147
f     0.867   0.8   0.79
g     0.9749  0.95  0.88
TMR is applied only to part of the circuit, as only the sensitive gates are triplicated.
The PTMR circuits are analysed through an SEU simulator designed in Verilog. The simulator
can inject faults of any duration at any time, which helps to fully analyse the proposed
scheme. The SEU simulator is discussed in detail in Chapter-5.
6.6 Experimental Flow 
The proposed technique is coded as a C program, which takes a net-list of the circuit
under test as its input. The signal probability of each node is calculated and the
SEU-sensitive gates are identified. Based on the SEU-sensitive gates, the necessary
structural modifications are made to the original circuit. The modified netlist is then fed
into the software SEU simulator to analyse the behaviour of the circuit under SEUs/SETs.
The netlist can be mapped onto any reconfigurable architecture (FPGA etc.) through a
suitable software tool (Xilinx Foundation Tool etc.).
6.7 Evaluations
The experimental results are derived for medium-sized MCNC benchmark circuits [BEN-MRK]
and the test circuits given in Figure-6.14 and Figure-6.15. To verify the proposed scheme,
we injected random SEU and SET faults of variable duration (1ns, 3ns, 5ns, 9ns, 11ns).
Faults were injected through the SEU simulator. The synthesis tool used for technology
mapping and optimization is Synplify® ASIC 3.0.4. The technology used is a 0.18 micron
cell library. Post-synthesis simulations are performed with Verilog-XL® to verify the
proposed technique.
The results are obtained for four different implementation scenarios and then compared to
demonstrate the advantage of the proposed solution. In the first scenario, the original
circuits (the benchmark circuits plus the test circuits of Figure-6.14 and Figure-6.15) are
implemented on 0.18 micron technology without any level of SEU immunity. The resulting
technology-mapped circuits were injected with 1000 random upsets, and the vectors were
stored so that the simulation could be repeated for the other implementations. The number
of times the circuit under test was upset was recorded. The second scenario applies the
proposed technique to the same set of circuits with the same test vectors. This scenario is
denoted PTMR-1 (implementation-1) in all later references. The upsets recorded for these
simulations are shown in Table-6.5. The third set of simulations was carried out with the
proposed mitigation scheme, but with the probability estimates calculated through the
Table-6.1 formulas. This is denoted PTMR-2 (implementation-2) in all later references. The
simulation results, with the area overhead in terms of gate count, are presented in
Table-6.6. The final simulations were done with the full TMR scheme for the same circuits.
The results indicate that the proposed scheme is more efficient than the full TMR scheme in
terms of area overhead (gate count). The results also show, as explained before, that the
general probability estimation technique may fail to enhance system performance. It is also
evident from the simulation results that the proposed technique gives better SEU immunity
than implementation-2.
Table-6.6 provides an estimate of the area overhead in terms of gate count, expressed as
the percentage area saving of both implementations with respect to full TMR of the circuit
under test. The area overheads of implementation-1 and implementation-2 are also compared
and reported in Table-6.6.
The circuit Cm138a is an example where the overhead is the same for PTMR-1 and PTMR-2.
This would seem to imply that there is no advantage in terms of area in using the proposed
scheme. A close analysis reveals, however, that although the number of sensitive gates
found by both implementations is the same, different gates are identified. Hence the area
overhead is the same but the level of SEU immunity is different. This is confirmed by the
results, as implementation-1 has fewer upsets than implementation-2 with the same test
vectors (Table-6.5).
The simulation results for circuits Cm163a, Cm152a and Cc indicate that PTMR-1 requires
more gates than implementation-2. This is because PTMR-2 could not identify all the
SEU-sensitive gates of the circuit, which indicates the limitation of PTMR-2 in identifying
SEU-sensitive gates correctly. For this reason implementation-2 gets more upsets than
PTMR-1. This effect appears as a negative area saving in Table-6.6. The probability
threshold (0.45) is chosen to demonstrate the area overhead of the proposed scheme in the
worst-case scenario.




Table-6.5
Columns: Circuit, Full TMR (gates), Original Circuit (upsets, gates), Proposed Scheme
PTMR-1 (upsets, gates), Proposed Scheme PTMR-2 (upsets, gates)
Figure-6.14 24 14 8 2 20 3 16
Figure-6.15 21 9 7 1 17 1 19
C1355 1698 14 566 0 1102 3 1126
C499 618 11 206 2 395 3 398
C880 954 9 318 1 711 2 728
Alu2 1008 4 336 0 604 0 622
C8 486 21 162 3 202 5 334
Cc 168 10 56 0 155 3 148
Cm138a 63 1 2 21 0 37 1 37
Cm150a 174 2 58 0 56 0 60
Cm151a 90 9 30 2 58 2 64
Cm152a 69 16 23 1 49 6 43
Cm162a 132 9 44 0 74 3 80
Cm163a 129 21 43 1 81 7 79
Cm42a 48 15 16 0 42 0 48
Cm82a 54 19 18 2 51 5 52
Upset Induced Faults = total number of times the circuit is upset with SEU
The rest of the circuits (Cm150a, Cm162a, C8 etc.) indicate that implementation-2
identifies some gates that are not actually SEU-sensitive. As shown earlier, this overhead
does not improve SEU immunity, and it is clear that these circuits, despite their higher
overheads, suffer more upsets than with the proposed technique. The false identification is
due to re-convergence and the inability of the Table-6.1 estimation formulas to calculate a
precise probability for the individual nodes in the network.
The simulations are repeated with different probability thresholds. The results are shown
in Table-6.6. As stated before, the SEU/SET pulse is generally short-lived, with a duration
of 100 - 200ps. A greater pulse width means a higher probability that the spurious pulse
will not be filtered by the inherent propagation delays and will be interpreted as a valid
signal. The selection of the probability threshold is important and may depend on the scope
of the target application. For systems where high reliability is important, the probability
threshold can be set lower than for a system that is less critical in terms of reliability.
Table-6.6: Simulation Results for Percentage Area Saving, SET Width =
Circuit Probability
             A     B     C     A     A     A
Figure-6.14  17%   -25%  33%   16%   15%   11%
Figure-6.15  19%   11%   10%   17%   18%   15%
C1355        35%   2%    34%   33%   32%   31%
C499         36%   1%    36%   35%   34%   30%
C880         25%   2%    24%   25%   24%   19%
Alu2         40%   3%    38%   39%   40%   38%
C8           58%   40%   31%   58%   51%   55%
Cc           8%    -5%   12%   8%    4%    0%
Cm138a       41%   0%    41%   39%   42%   33%
Cm150a       68%   7%    66%   66%   60%   51%
Cm151a       36%   9%    29%   35%   33%   24%
Cm152a       29%   -14%  38%   28%   22%   15%
Cm162a       44%   8%    39%   43%   45%   29%
Cm163a       37%   -3%   39%   36%   21%   18%
Cm42a        13%   13%   0%    11%   7%    5%
Cm82a        6%    2%    4%    5%    2%    0%
A = Area Saving of PTMR-1 with respect to Full TMR Scheme
B = Area Saving of PTMR-1 with respect to PTMR-2
C = Area Saving of PTMR-2 with respect to Full TMR Scheme
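The A, B and C figures can be recomputed from gate counts. The sketch below assumes that the gate-count columns of Table-6.5 are, in order, full TMR, original circuit, PTMR-1 and PTMR-2 (an interpretation inferred from the numbers, since the column headers did not survive), and that a saving is the relative reduction in gate count rounded to the nearest percent. The function name `area_savings` is illustrative.

```python
def area_savings(tmr_gates: int, ptmr1_gates: int, ptmr2_gates: int):
    """Return the percentage savings (A, B, C) as defined under Table-6.6."""
    a = 100 * (tmr_gates - ptmr1_gates) / tmr_gates      # PTMR-1 vs full TMR
    b = 100 * (ptmr2_gates - ptmr1_gates) / ptmr2_gates  # PTMR-1 vs PTMR-2
    c = 100 * (tmr_gates - ptmr2_gates) / tmr_gates      # PTMR-2 vs full TMR
    return round(a), round(b), round(c)


if __name__ == "__main__":
    # (full TMR, PTMR-1, PTMR-2) gate counts as read from Table-6.5
    circuits = {
        "Figure-6.14": (24, 20, 16),
        "C1355": (1698, 1102, 1126),
        "C8": (486, 202, 334),
        "Cm152a": (69, 49, 43),
    }
    for name, gates in circuits.items():
        print(name, area_savings(*gates))
```

Under this reading the computed triples match the first three columns of Table-6.6 for these circuits, including the negative B values where PTMR-1 uses more gates than PTMR-2.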
The results indicate that the proposed scheme has an area saving of approximately 14 to
65% for medium-sized circuits. This area saving can be more significant for large circuits.
The comparisons reported in Table-6.5 and Table-6.6 confirm the following points:
• The proposed scheme is more SEU immune than the schemes based on the simple formulas
(Table-6.1).
• The proposed scheme can be more costly in terms of area than PTMR-2, but is better than
the most commonly used full TMR scheme.
• The proposed scheme gives approximately 98% SEU immunity.
The proposed scheme is applied to an ASIC and to the proposed reconfigurable architecture
for a more complex application (Discrete Wavelet Transform). The power and area results are
compared with those of the other techniques and analysed in Chapter-8.
6.8 Summary 
This chapter has described a new concurrent SEU/SET mitigation technique for
combinational circuits. The scheme is based on input signal probabilities, which are used
to identify SEU sensitive gates. The proposed scheme gives significant immunity against all
single faults. Results show that 95% to 98% SEU immunity can be achieved with an area
saving of up to approximately 65% over standard TMR for medium-sized circuits. The proposed
technique can be employed in any commercial FPGA and reconfigurable SoC with minimum speed
and area overhead. The scheme is implemented for the proposed domain specific architecture
for DWT and the results are discussed in Chapter-8. The proposed scheme is also implemented
for ASIC implementations of DWT, and the power and area results are reported in Chapter-8.
CHAPTER 7 
SEU/SET MITIGATION WITH DUAL 
HARDWARE REDUNDANCY 
7.1 Introduction 
Reliability is generally attributed to immunity to hard failures such as electro-migration,
hot carrier effects, or dielectric breakdown. However, transient faults or soft errors due
to radiation-induced upsets can also affect the reliability of circuits. Therefore, designs
need to be made more robust.
Our focus in this chapter is on radiation-induced errors, particularly those in
combinational circuits resulting from high-energy particle strikes. The novel scheme
proposed in this chapter is based on hardware redundancy. The scheme addresses
radiation-induced transient faults in a combinational circuit by duplicating the hardware
and then comparing the results from the two redundant logic blocks.
The chapter introduces the problem in hand and some related work in this area. The proposed
scheme is then discussed together with its hardware implementation. Finally, the scheme is
compared with schemes already in use and the results are discussed.
7.2 Background 
A SET can cause a soft error if it propagates to a primary output (PO) and is finally
captured by an output flip-flop (FF). However, there are three factors that may protect a
logic circuit from the influence of a soft error. These are listed below:
1. Logically Masked: The logic state of the other inputs of a particular gate determines
whether the SET propagates to the gate's output (discussed in Chapter-6).
2. Electrically Masked: The SET propagation depends on the electrical properties of the
gates in the propagation path.
3. Latching-Window Masked: The temporal relationship between a SET arriving at the input
of an FF and its clock edge can eradicate the effects of the transient [SHI-002].
As discussed in the previous chapter, the critical charge Q_crit is defined as the smallest
amount of deposited charge required to create a SET pulse that is sufficient (that is, not
masked by the factors explained above) to result in a soft error. The charge deposited is
directly related to the energy of the striking particle. The soft error rate (SER)
increases exponentially with a decrease in Q_crit [HAZ-000]. Soft errors cause increasing
reliability problems in deep submicron technologies because:
• Smaller and faster transistors void the effects of electrical masking [HAZ-000].
• Reduced source/drain capacitances and supply voltages result in a lower critical
charge (Q_crit) [HAR-0111 HAZ-000].
• Higher clock frequencies void the effects of latching-window masking [HAZ-000].
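The exponential dependence of the soft error rate on the critical charge is often written in the literature in the following empirical form. The thesis itself only states the trend, so the symbols below (particle flux F, sensitive area A and charge collection efficiency Q_s) are assumptions introduced for illustration:

```latex
\mathrm{SER} \;\propto\; F \cdot A \cdot \exp\!\left(-\frac{Q_{crit}}{Q_{s}}\right)
```

In this form, lowering Q_crit by one unit of Q_s raises the soft error rate by a factor of e, which is why reduced node capacitance and supply voltage have such a strong effect.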
Several techniques have been proposed to mitigate SEU effects in digital electronics, as
discussed in the previous chapters. A mitigation scheme can be accomplished through a
variety of redundancy techniques. The redundancy is provided through extra resources, which
can be:
1. Extra components (Hardware Redundancy)
2. Extra execution time (Time Redundancy)
3. A combination of 1 and 2.
Hardware redundancy can be carried out either at module level or at device level. Each
scheme has its own advantages and limitations. No matter which scheme is adopted, SEU
mitigation comes at a price: there will always be a compromise between area, speed, power
dissipation and fault tolerance efficiency.
The problem of finding an optimum solution in terms of overall performance for a
reconfigurable fabric is very challenging due to the complexity of the architecture. As
discussed in the previous chapters, when a user-defined combinational circuit is realized
through a reconfigurable fabric, it may experience a single event upset with a peculiar
effect which is not common in ASICs. The upset has a transient effect followed by a
permanent effect. The upset can affect the combinational circuits, the memory structures or
the routing matrix of the reconfigurable architecture. A general FPGA can experience the
effects of radiation through a number of programming and routing resources. If any one of
them is affected, the functionality of the architecture changes.
S. Buchner et al. [BUC-097] established a relationship between technology feature size and
the critical transient pulse-width required to propagate without attenuation through an
infinitely long chain of inverters. It is established that at pulse-widths smaller than the
critical width, the inherent inertial delay of the gate causes the transient to be
attenuated, and the pulse dies out after passing a few gates. At pulse-widths equal to or
larger than the critical width, the transients propagate through the gate just like a
normal circuit signal.
This phenomenon can be generalized as:
• Transients of greater width than the critical width propagate through any number
of gates without being attenuated.
• Transients of half the critical width or less are attenuated at the first gate.
• Transients of intermediate width propagate through a variable number of stages.
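The three cases above amount to a simple classification by pulse width. The sketch below is a minimal illustration; the function name, the use of picoseconds and the example critical width of 200 ps are assumptions made for the example, not values taken from [BUC-097].

```python
def classify_transient(width_ps: float, critical_width_ps: float) -> str:
    """Classify a transient pulse by its width relative to the critical width."""
    if width_ps >= critical_width_ps:
        # Equal to or wider than the critical width: behaves like a normal signal.
        return "propagates unattenuated"
    if width_ps <= critical_width_ps / 2:
        # Half the critical width or less: filtered by the first gate.
        return "attenuated at first gate"
    return "propagates through a variable number of stages"


if __name__ == "__main__":
    for width in (80, 150, 250):
        print(width, "ps:", classify_transient(width, critical_width_ps=200))
```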
Traditional techniques to provide soft error tolerance rely on TMR. TMR is based on
triplicating the original logic and then employing a majority voter to determine the final
output. However, this technique involves a high overhead (normally > 200%) in terms of area
and cost. This overhead limits its usage to reliability-critical applications. Various
ideas based on time redundancy for soft error tolerance were presented in [NIC-099]. The
time domain majority voter presented in [NIC-099] has a performance overhead, since the
sampling is started after the longest path in the circuit settles. Hence, an online error
detection and retry procedure was considered better [ANG-000]. Online or concurrent error
detection can be achieved by using self checking circuits [LO-093] [MET-000] or by
exploiting temporal redundancy of the signals [ANG-000]. Self checking circuits may require
high hardware cost for arbitrary logic functions. Online error detection and retry may
affect performance (throughput) and cannot be used in real-time systems to overcome
transient faults due to electrical noise or external radiation. Another technique, called
partial error masking, corrects errors with lower overhead than traditional TMR techniques
by utilizing the difference in soft error vulnerabilities of gates. However, it masks SEU
errors only in CLBs and has a higher overhead than the technique presented in this work
[MOH-03a].
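The majority voting at the heart of TMR can be stated in one line. The following sketch is a generic bitwise two-out-of-three voter, shown only to make the mechanism concrete; it is not taken from any of the cited schemes.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three redundant module outputs."""
    return (a & b) | (b & c) | (a & c)


if __name__ == "__main__":
    good = 0b1010
    upset = 0b0110          # one module output corrupted in two bit positions
    print(bin(majority_vote(good, good, upset)))
```

A single faulty module is outvoted by the other two, which is what the extra area (two additional copies plus the voter) pays for.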
7.3 Dual Hardware Redundancy With Comparison 
The TMR technique is a suitable solution for FPGAs because it provides full hardware
redundancy and changes only the high level design description. It does not result in any
changes at the mask level. On the other hand, as it does not change the reconfigurable
design itself, it has some limitations. The main limitations are listed below:
• It is expensive in terms of I/O pads, as only one third of the pads are available to
designers.
• It is expensive in terms of silicon area, as the whole design is multiplied by at least
a factor of 3. The design includes both combinational and sequential circuit elements,
with three voters and multiplexers for each component.
• It is expensive in terms of overall system speed because of the delays due to voters.
• It is expensive in terms of power dissipation, which is increased by at least a factor
of 3.
These limitations may be acceptable for many applications, but not for performance-critical
systems. In conclusion, TMR is a suitable solution for most FPGAs and reconfigurable
architectures because it provides full hardware redundancy to combinational and synchronous
circuits, input pads, the routing matrix and output pads. However, this full hardware
redundancy comes with penalties in area, I/O pad availability and power dissipation. The
amount of reliability required for critical applications such as space is normally
accomplished through extra hardware or execution time. It is always important to make
architectures more reliable with the minimum of extra components. In order to reduce the
overheads associated with TMR and at the same time cope with transient and permanent
upsets, we present a new technique based on dual hardware redundancy with comparison (DHRC)
to detect faults in the programmable matrix (SEU in the programmable elements of a
reconfigurable architecture). The upset detection and voting sections are implemented in
hardware to eradicate faults and to identify the fault free block (correct value), allowing
continuous operation of the circuit. The main objective is to overcome the drawbacks of
full hardware redundancy (TMR). Moreover, the technique can save area for designs composed
of large combinational logic structures. The scheme is based on these design objectives
[BALO-066]:
1. To design a simple SEU mitigation scheme for combinational circuits
2. To overcome the limitations of Full Hardware Redundancy
3. To enhance overall system performance
Figure-7.1: Dual Hardware Redundancy with Comparison
Figure-7.1 shows the details of the proposed scheme. Time redundancy in itself can only
detect transient faults, and dual redundancy with comparison can only give an indication
of a fault. However, combining the temporal redundancy scheme with dual hardware redundancy
makes it possible to detect and correct faults by identifying the faulty block. As
illustrated in Figure-7.1, there are two redundant blocks (A and B). These two blocks are
essentially the same, and four values are captured through auxiliary exclusive OR gates.
These four values are used to eliminate the faulty block and to identify the correct, fault
free block. Two samples from each block are used in the proposed scheme: original and
complement [BALO-066]. The details of these signals are as follows:
• 	HRC-A: Hardware redundancy comparator from Block-A.
• 	HRC-B: Hardware redundancy comparator from Block-B.
• 	HRC-OAB: Hardware redundancy comparator using original values
from Blocks A and B.
• 	HRC-CAB: Hardware redundancy comparator using complemented
values from Blocks A and B.
The four signals result in sixteen permutations. Five different scenarios can be detected
by analyzing the sixteen possible output combinations of A, !A, B and !B. The different
syndromes are recognized and presented in Table-7.1.
These syndromes help to identify five distinct situations. The syndrome analysis part is
implemented as a simple finite state machine based voting circuit. The state machine parses
the syndromes, and the appropriate action is taken to rectify the fault. The details of the
different syndromes are as follows:
Table-7.1: The Proposed Syndrome Generation and Analysis
HRC-OAB  HRC-A  HRC-B  HRC-CAB  Syndrome
0        0      0      0        Transient Fault in detection
0        0      1      1        Transient Fault in Block-A
1        0      1      0        Transient Fault in Block-A
1        0      0      1        Permanent Fault in Block-A / B
0        1      0      1        Transient Fault in Block-B
0        1      1      0        No Fault
1        1      1      1        Permanent Fault in Block-A / B
1        1      0      0        Transient Fault in Block-B
1        1      0      0        Transient Fault in Block-B
1        1      1      1        Permanent Fault in Block-A / B
0        1      1      0        No Fault
0        1      0      1        Transient Fault in Block-B
1        0      0      1        Permanent Fault in Block-A / B
1        0      1      0        Transient Fault in Block-A
0        0      1      1        Transient Fault in Block-A
0        0      0      0        Transient Fault in detection
7.3.1 Normal Operation - No Fault 	(06Hex)
The syndrome (6Hex) identifies that both copies of the combinational circuit are working
normally and that there is no fault in either block. The outputs from both blocks are
identical, and one of the two outputs is selected as the true output for further processing
in the data path.
7.3.2 Transient Fault in Block-A 	(03Hex, 0AHex)
Both syndromes (3Hex, AHex) indicate that Block-A has a transient fault. The syndrome 3Hex
identifies that the original outputs of both blocks and the complement of Block-B are
correct values. Hence, it is simple to single out the faulty Block-A. The syndrome AHex, on
the other hand, identifies that the outputs of the two blocks are different while their
complements are the same. The logical explanation in terms of single event upsets is that
Block-A has a transient fault. Alternatively, Block-B could be assumed to be stuck at '1'
while its actual value is '0', but this can only happen in the case of multiple upsets. This
scheme is for single event upsets, and multiple upsets are outside the scope of this
research. It is therefore deduced that both syndromes are generated when Block-A is faulty.
The voting circuitry rectifies this fault condition by choosing Block-B as the fault free
block for further processing in the data path.
7.3.3 Transient Fault in Block-B 	(05Hex, 0CHex)
These two syndromes are generated on the same principle as discussed in the case of a
transient fault in Block-A.
7.3.4 Permanent Fault in Block-A/B 	(09Hex, 0FHex) 
These syndromes are generated when both combinational blocks and their 
complements have different outputs. This can be due either to multiple radiation strikes or 
to a permanent fault in either of the blocks. 
7.3.5 Transient Fault in Detection 	(00Hex) 
This unique syndrome is generated if both blocks are functioning correctly and the radiation 
hits the syndrome generation circuitry. The syndrome identifies this fault scenario and the 
fault-free output from either block is used in later processing down the data path. 
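The syndrome analysis above amounts to a small lookup table. The following is a minimal sketch of that table, assuming the bit order (HRC-OAB, HRC-A, HRC-B, HRC-CAB) of Table-7.1 with HRC-OAB as the most significant bit; the names `SYNDROME_TABLE`, `pack_syndrome` and `classify` are illustrative and are not part of the implemented state machine.

```python
# Sketch of the syndrome classification of Section 7.3.
# Assumption: the 4-bit syndrome is ordered (HRC-OAB, HRC-A, HRC-B, HRC-CAB), MSB first.
SYNDROME_TABLE = {
    0x6: "no fault",
    0x3: "transient fault in Block-A",
    0xA: "transient fault in Block-A",
    0x5: "transient fault in Block-B",
    0xC: "transient fault in Block-B",
    0x9: "permanent fault in Block-A / Block-B",
    0xF: "permanent fault in Block-A / Block-B",
    0x0: "transient fault in detection",
}


def pack_syndrome(hrc_oab, hrc_a, hrc_b, hrc_cab):
    """Pack the four single-bit signals into one 4-bit syndrome value."""
    return (hrc_oab << 3) | (hrc_a << 2) | (hrc_b << 1) | hrc_cab


def classify(syndrome):
    """Return the fault class for a syndrome, or flag it as not listed."""
    return SYNDROME_TABLE.get(syndrome, "syndrome not listed in Table-7.1")


if __name__ == "__main__":
    for bits in [(0, 1, 1, 0), (0, 0, 1, 1), (1, 1, 0, 0), (0, 0, 0, 0)]:
        print(bits, "->", classify(pack_syndrome(*bits)))
```

Syndromes outside the eight listed values are reported as unlisted rather than guessed, since the text does not describe them.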
7.4 Voting Logic 
As discussed before, an upset in redundant Block-A, indicated by the syndromes 0011 and 1010, is 
characterized as a transient variation in its output with no change in the Block-B outputs. 
An upset in Block-B is recognized in an equivalent way. The syndromes (1001 
& 1111) help to identify the permanent effect due to a single event transient in either of the 
blocks. Analyzing these syndromes (0000, 1001), it is not possible to conclude which 
redundant block has the correct value. The voter circuitry is designed to address this 
scenario and helps to continue the circuit operation in a known state. 
Figure-7.2: Voter circuit for the proposed scheme 
The unique syndrome (0000) helps to trace transient faults in the syndrome generation 
circuitry. This feature gives the proposed scheme an edge over already proposed hardware 
redundancy schemes. The voter circuitry is illustrated in Figure-7.2 which has been 
designed for the proposed scheme. 
The voter circuitry is implemented in hardware (Figure-7.2). The voter takes into account 
the different syndromes to evaluate fault free output in case of transient faults in either of 
the redundant blocks or in the syndrome generating logic. The voting logic is composed of 
the following hardware blocks: 
• Two 4x1 multiplexers 
• Two 2-input AND gates 
• One memory register (flip-flop) 
• Two 2x1 multiplexers 
• Counter 
The multiplexers are employed to select the appropriate fault-free output. The memory 
register stores the previous known fault free output of either block. The operation of the 
voting logic can be explained in different distinct fault scenarios as follows: 
7.4.1 Scenario-1: Normal Operation (No Fault) 
This is the scenario where both redundant blocks are fault free. The signals HRC-A and 
HRC-B select a fault-free output from either of the blocks. It is irrelevant which block is 
selected as both blocks are fault free. 
7.4.2 Scenario-2: Transient Fault (Block-A) 
This is a case where Block-A has a transient fault while Block-B has no fault. The voting 
logic corrects this fault scenario by selecting output from Block-B. The select lines of 
mux-2 select the appropriate input as fault free output. 
7.4.3 Scenario-3: Transient Fault (Block-B) 
This is a case where Block-B has a transient fault while Block-A has no fault. The voting 
logic corrects this fault scenario by selecting output from Block-A. The select lines of 
mux-2 select the appropriate input as fault free output. 
7.4.4 Scenario-4: Transient / Permanent Fault (Syndrome Generator) 
This is a unique scenario where it is deduced that both blocks are fault free and radiation 
has caused an upset in the syndrome generation circuitry. This fault can be either transient or 
permanent in nature. In both cases two actions are taken once this unique scenario is 
detected by the voting logic: 
• The output from either redundant block is selected to ensure continued fault-free 
operation of the system 
• A counter is incremented to confirm the syndrome circuit fault. 
Let us assume that the fault is transient in nature. In this case the output from either of the 
redundant blocks is selected through the appropriate muxes and the system continues working 
normally. The counter is reset if any other syndrome is generated, and appropriate 
action is taken to ensure that normal operation of the system continues. In the case 
where the fault is permanent in nature, the fault-free output is again selected and the counter is 
incremented. As the fault is permanent, the same syndrome is generated again. 
Once the counter has reached a count of two (user defined based on the type of 
application), a fault signal is generated to let the system know that there is a permanent 
fault in the syndrome generation circuitry. 
7.4.5 Scenario-5: Permanent Fault (Block-A / Block-B) 
This is the case where it is not possible to conclude which block has the correct value. The 
voting circuit is equipped with a memory buffer that holds the previous correct 
value. Once this scenario is detected, the previous value is selected. The previous 
fault-free output is selected for two reasons. First, the system remains in a 
known working state. Second, if the fault goes away, the system resumes its normal 
operation, and for the duration of the fault the previous fault-free value keeps the 
system in a legitimate state. 
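The five scenarios can be summarised as a behavioural model. The sketch below is not the multiplexer-level circuit of Figure-7.2. It assumes the syndrome encoding of Table-7.1, latches the fault signal once it is raised, and also refreshes the memory register in Scenario-4. The class name `Voter`, the method `vote` and the parameter `fault_threshold` are illustrative.

```python
class Voter:
    """Behavioural sketch of the voting logic (Scenarios 1 to 5)."""

    def __init__(self, fault_threshold=2):
        self.previous = 0              # memory register: last known fault-free output
        self.counter = 0               # consecutive 0000 syndromes seen so far
        self.fault_threshold = fault_threshold
        self.generator_fault = False   # assumption: latched once raised

    def vote(self, syndrome, out_a, out_b):
        if syndrome in (0x9, 0xF):
            # Scenario-5: cannot tell which block is correct, hold previous value.
            self.counter = 0
            return self.previous
        if syndrome == 0x0:
            # Scenario-4: upset in the syndrome generator, both blocks are fine.
            self.counter += 1
            if self.counter >= self.fault_threshold:
                self.generator_fault = True
            result = out_a
        else:
            self.counter = 0           # any other syndrome resets the counter
            if syndrome == 0x6:            # Scenario-1: no fault, either block
                result = out_a
            elif syndrome in (0x3, 0xA):   # Scenario-2: Block-A faulty, use Block-B
                result = out_b
            elif syndrome in (0x5, 0xC):   # Scenario-3: Block-B faulty, use Block-A
                result = out_a
            else:
                raise ValueError("syndrome not listed in Table-7.1")
        self.previous = result
        return result


if __name__ == "__main__":
    voter = Voter()
    print(voter.vote(0x6, 1, 1))   # fault free
    print(voter.vote(0x3, 0, 1))   # Block-A upset, Block-B selected
    print(voter.vote(0x9, 0, 1))   # permanent fault, previous value held
```

A threshold of two follows the count quoted in the text; as noted there, the value is user defined and depends on the application.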
The proposed scheme is based on hardware redundancy and involves duplication rather 
than triplication, which is the case for TMR. The components are duplicated only to detect 
the fault, and the voting logic then corrects the fault depending on which syndrome is 
generated by the syndrome generator. Therefore, all the output and input pads will only 
be double the size of the original circuit. This duplication saves a lot of resources and 
helps to reduce pin count and area overhead. The implementation of the proposed scheme 
is represented in Figure-7.3. 
Figure-7.3: Implementation of the proposed scheme 
Apart from area overhead, the main limitation of hardware redundant mitigation schemes is 
their inability to cope with voting circuit faults. The whole scheme can fail due to a fault in 
the voting logic. The proposed scheme addresses this issue with a unique syndrome for 
syndrome generator faults. As discussed before, the voting logic takes appropriate action when 
this transient fault is detected. The voting logic circuit, which is composed of a memory 
buffer and multiplexers, also has to be protected. Therefore, it is proposed that the 
memory element is hardened using the scheme proposed by this thesis in Chapter-5. This will 
eliminate single-event faults in the memory element. The multiplexers can be hardened 
through full hardware redundancy as shown in Figure-7.4. 
Figure-7.4: Hardened Voting Circuit for the Proposed Scheme 
7.5 Performance Evaluation 
The experimental results are derived from medium sized ISCAS benchmark circuits. We 
injected random SET faults with variable duration (1ns, 3ns, 5ns, 9ns, 11ns) to verify the 
proposed scheme. Faults were injected through an SEU simulator (discussed in Chapter-5). 
The synthesis tool used for technology mapping and optimization is Synplify® ASIC 
3.0.4. The technology used is a 0.18 micron cell library. Post synthesis simulations are 
performed with the help of Verilog-XL® to verify the proposed technique. The standard 
benchmark circuits were mapped on 0.18 micron technology. The SEU simulator was used 
to inject transient faults with variable pulse-widths (1ns, 3ns, 5ns, 9ns, 11ns). The different 
pulse-widths were used to test the proposed scheme in all scenarios of SETs, especially 
worst case scenarios. The results are shown as an average of all these pulse-widths. 
The first comparison was carried out on the basis of I/O pin resources. The results are 
shown in Table-7.2. The column under the 'Standard' heading represents the original input and 
output pins required. The full hardware redundancy technique is then applied to these circuits 
and the I/O pin requirement is shown in the table as 'TMR Technique'. The I/O pins required 
in the case of the proposed scheme and Lima et al. [LIM-003] are listed in the table for 
comparison. Lima et al. [LIM-003] proposed a scheme that involves dual hardware and 
temporal redundancy. The syndrome generator used in Lima's scheme is based on three 
sequential and combinational elements. The syndrome generator design is not fault 
tolerant, and the sequential elements in the design make it more prone to SEU faults. Lima et 
al. [LIM-003] proposed the scheme mainly for Xilinx® architectures, and full TMR is 
employed at the output stage. The TMR at the output requires three 
times the original output pins, which leaves no advantage over the standard TMR scheme 
in terms of output pins. In contrast, the proposed scheme needs the same number of output 
pins as the original circuit, so there is no output overhead. The input pins are doubled due 
to dual hardware redundancy, but this is still 33% less than full TMR. 

Table-7.2: I/O pin requirements (In = input pins, Out = output pins) 
                   Standard      TMR Technique The Proposed  Lima et al. [LIM-003] 
Circuit            In     Out    In     Out    In     Out    In     Out 
s298               3      6      9      18     6      6      6      18 
s344               9      15     27     45     18     15     18     45 
s349               9      11     27     33     18     11     18     33 
s382               3      6      9      18     6      6      6      18 
c432               36     7      108    21     72     7      72     21 
c1908              33     25     99     75     66     25     66     75 
c2670              233    140    699    420    466    140    466    420 
c3540              50     22     150    66     100    22     100    66 
c7552              207    108    621    324    414    108    414    324 
2x2 Multiplier     4      4      12     12     8      4      8      12 
8x8 Multiplier     16     16     48     48     32     16     32     48 
16x16 Multiplier   32     32     96     96     64     32     64     96 
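The pin counts in Table-7.2 follow simple multiplication rules. The sketch below restates them, assuming the multipliers implied by the discussion and by the table rows (inputs and outputs tripled for TMR, inputs doubled for the proposed scheme, inputs doubled and outputs tripled for Lima et al.). The function name `io_pins` is illustrative.

```python
def io_pins(inputs, outputs):
    """Return (input pins, output pins) per scheme for a given circuit.

    Assumption: the multipliers are inferred from Table-7.2, not taken from
    the cited implementations.
    """
    return {
        "standard": (inputs, outputs),
        "tmr": (3 * inputs, 3 * outputs),
        "proposed": (2 * inputs, outputs),
        "lima": (2 * inputs, 3 * outputs),
    }


def input_saving_vs_tmr(inputs):
    """Percentage of input pins saved by duplication relative to triplication."""
    return (1 - (2 * inputs) / (3 * inputs)) * 100.0


if __name__ == "__main__":
    print(io_pins(3, 6))                      # s298 row of Table-7.2
    print(round(input_saving_vs_tmr(3), 1))   # about 33 percent
```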
The area overhead for some of the ISCAS benchmark circuits is shown in Table-7.3. It can 
be seen that the area overhead is reduced appreciably in the case of the proposed scheme. 
The area figures are based on 0.18 micron technology and are measured in µm² units. The 
area overhead figures of Lima et al. [LIM-003] lie between the proposed scheme and 
the TMR scheme. The main reason for this is that the syndrome generator is based on a 
relatively larger circuit with sequential components, and the output stage is basically 
the same as TMR. 
Table-7.3: Area Comparison Results of the DHRC with other SEU Mitigation Techniques 
(µm²) 
Circuit   Standard   TMR     The Proposed   Lima et al. [LIM-003] 
s298      2040       6325    4728           5978 
s344      2162       6760    5002           6278 
s349      2169       6752    4989           6203 
s382      2744       8517    6047           6931 
Figure-7.5 presents the percentage area overhead of the proposed technique with respect 
to the original circuit (without any SEU immunity). It can be seen that TMR, as 
expected, costs approximately 200% in terms of area, while Lima's technique is better 
than full TMR but still considerably more expensive than the proposed scheme. 
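The percentage overheads can be recomputed from the areas in Table-7.3. The following sketch assumes that overhead is defined as the extra area relative to the unprotected circuit; the helper name `overhead_percent` is illustrative.

```python
# Areas in square microns copied from Table-7.3:
# (standard, TMR, proposed, Lima et al.)
AREAS = {
    "s298": (2040, 6325, 4728, 5978),
    "s344": (2162, 6760, 5002, 6278),
    "s349": (2169, 6752, 4989, 6203),
    "s382": (2744, 8517, 6047, 6931),
}


def overhead_percent(hardened, standard):
    """Extra area of a hardened circuit relative to the unprotected one."""
    return (hardened - standard) / standard * 100.0


if __name__ == "__main__":
    for name, (std, tmr, proposed, lima) in AREAS.items():
        print(name,
              round(overhead_percent(tmr, std), 1),
              round(overhead_percent(proposed, std), 1),
              round(overhead_percent(lima, std), 1))
```

For s298 this gives roughly 210% for TMR, 132% for the proposed scheme and 193% for Lima et al., in line with the ordering discussed above.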
Figure-7.5: Percentage Area Overhead comparison 
The next results present the comparison of SEU immunity of the schemes discussed before. 
The circuits were tested through the SEU simulator. Table-7.4 shows the total number of 
faults induced and the number of faults detected through different schemes. The results 
shown in Table-7.4 highlight that the proposed scheme detects 100% of the injected faults. 
The scheme proposed by Lima et al., in contrast, has approximately 98% to 100% fault 
detection. This can be attributed to circuit type and the undetected faults in the syndrome 
generator circuitry. Moreover, Lima's scheme is based on three different clocks (clk0, clk1, 
clk2) which have an inherent synchronization problem. There is no mechanism proposed 
to synchronize different data samples collected at different time intervals. The proposed 
scheme in this chapter has an average of 99.9% SEU immunity. This is fairly good 
keeping in view the savings of area and I/O pins. 
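The detection range quoted for Lima et al. can be checked against Table-7.4. The sketch below assumes that the first figure listed for that scheme in each row is the number of detected faults out of 1000 injected; the name `coverage_percent` is illustrative.

```python
INJECTED = 1000

# Assumed "faults detected" figures for Lima et al., one per circuit in Table-7.4.
LIMA_DETECTED = [982, 991, 1000, 999, 987, 988, 992, 994, 997, 1000, 1000, 997]


def coverage_percent(detected, injected=INJECTED):
    """Detected faults as a percentage of injected faults."""
    return detected / injected * 100.0


coverages = [coverage_percent(d) for d in LIMA_DETECTED]
lowest = min(coverages)
highest = max(coverages)
average = sum(coverages) / len(coverages)

if __name__ == "__main__":
    print(round(lowest, 1), round(highest, 1), round(average, 2))
```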
Figure-7.6 presents power consumption results. The proposed scheme is better than a full 
hardware redundant solution by approximately 24% in terms of power consumption and 
approximately 15% better than Lima et al. [LIM-003]. As discussed before, TMR 
triplicates all the components, whereas in the dual redundancy scheme of Lima et al. 
[LIM-003] there are three extra flip-flops in the syndrome generator circuitry, in addition to the 
fact that the output stage of Lima's scheme relies on TMR. This power saving, along with 
the area and I/O pin savings, makes the proposed scheme an attractive choice for mission 
critical applications. 
Table-7.4: Comparison Results of the DHRC with other SEU Mitigation Techniques 
Circuit            Faults     TMR           The Proposed  Lima et al. [LIM-003] 
                   injected 
s298               1000       1000   0      1000   1      982    18 
s344               1000       1000   0      1000   0      991    23 
s349               1000       1000   0      1000   2      1000   0 
s382               1000       1000   0      1000   0      999    12 
c432               1000       1000   0      1000   0      987    14 
c1908              1000       1000   0      1000   1      988    23 
c2670              1000       1000   0      1000   0      992    25 
c3540              1000       1000   0      1000   1      994    32 
c7552              1000       1000   0      1000   1      997    18 
2x2 multiplier     1000       1000   0      1000   0      1000   0 
8x8 multiplier     1000       1000   0      1000   1      1000   0 
16x16 multiplier   1000       1000   0      1000   0      997    15 
Figure-7.6: Power Comparison of the Proposed Scheme 
7.6 Summary 
In this chapter, a novel SEU mitigation scheme has been introduced for combinational 
circuits. The scheme is based on dual hardware redundancy with a comparison technique. 
The design concepts of the proposed scheme are discussed. The technique is built around a 
syndrome generator, which generates a syndrome for each fault scenario, including faults 
in the syndrome generator itself. The proposed scheme is compared with 
commonly used schemes in terms of area and power performance. The SEU 
immunity of the scheme is tested through the SEU simulator and the results indicate that the 
proposed architectures are most suited to the mission critical applications where radiation 




PERFORMANCE EVALUATION OF 
THE PROPOSED ARCHITECTURE 
8.1 Introduction 
Efficient domain specific reconfigurable architectures for Discrete Wavelet Transform 
algorithms were introduced in Chapter-3. The performance of these novel architectures in 
terms of speed, area and power consumption was compared with a generic FPGA, and the 
efficacy of the proposed fabric was demonstrated through different results. Moreover, some 
novel SEU/SET mitigation techniques were proposed in Chapter-5, Chapter-6 and 
Chapter-7. In this chapter, the proposed schemes are implemented on the newly introduced 
reconfigurable architectures. The performance of the proposed reconfigurable fabric is 
evaluated in terms of area, power consumption and SEU immunity. This chapter explains 
the test setup, and the results are compared with different techniques already in use. 
8.2 Performance Evaluations 
This section deals with the performance evaluation of the proposed reconfigurable 
architecture and its immunity against single event faults (SEU and SET). The newly 
introduced architecture has a wide spectrum in terms of architecture, operations, flexibility 
and capability to withstand the after effects of radiations therefore, the performance is 






8: Performance Evaluation Of The Proposed Architecture 
The proposed reconfigurable architecture is a domain specific reconfigurable fabric which 
deals with the efficient implementation of DWT algorithms. Three different discrete 
wavelet transform algorithms were selected as target applications: 
• The proposed (3-input pixels) lifting based DWT algorithm 
• Lian et al. lifting based DWT [LIA-001] 
• Fast integer based DWT [DAN-002] 
The decision to have three different implementations was motivated by several reasons. 
The main ones were: firstly, to prove the performance of the proposed architecture over 
a range of target application algorithms; secondly, to prove the flexibility of the 
reconfigurable core to handle different discrete wavelet transforms; thirdly, to prove the 
efficacy of the SEU/SET mitigation schemes in a variety of application oriented algorithms; 
and lastly, to have more test space to evaluate and analyse the comparison results. 
The performance evaluation is carried out at five different levels as shown in Figure-8.1. The 
levels are chosen to compare different implementation platforms and SEU/SET mitigation 
techniques comprehensively and to cover every aspect of the proposed architectures in terms 
of performance. The proposed SEU/SET simulator (discussed in Chapter-5) is used to 
inject faults at different circuit nodes with different pulse-widths. The details of the five 
levels are: 
• LEVEL-1: compares the performance of the proposed reconfigurable architecture 
with two other platforms, i.e. ASIC and generic FPGA (Xilinx Virtex-E). The 
evaluation establishes the performance advantage of the proposed fabric over 
generic reconfigurable architectures. The ASIC implementation helps to compare 
the area and power overheads of the proposed architecture. The results quantify the 
cost of having flexibility in the architecture. Three different DWT algorithms, as 
discussed earlier, are implemented on the three platforms without any level of SEU or 
SET immunity. The implementation results are used to compare and analyse the 
cost of radiation hardening through different schemes, including the proposed 
schemes, at the different levels of performance evaluation. 
• LEVEL-2: Full hardware redundancy is the most commonly used technique in the 
aerospace industry. The three DWT algorithms were implemented on ASIC and 
on the proposed architecture with full TMR and the results are compared. The 
performance was compared with the previous level to determine the overheads 
associated with the TMR scheme. 
• LEVEL-3: SEU/SET immunity was evaluated for ASIC and the proposed 
reconfigurable fabric using the mitigation scheme of Mavis et al. [MAV-002]. The 
implementation was used to benchmark the system performance in three different 
DWT scenarios. The results were compared with Level-1 to determine the 
overheads associated with the Mavis et al. technique. 
• LEVEL-4: Lima et al. proposed a scheme based on dual hardware redundancy 
[LIM-003]. The technique is implemented on both platforms. Again, the results 
were compared and analysed against the Level-1 results. 
• LEVEL-5: This thesis has introduced three different novel SEU/SET mitigation 
schemes (Chapter-5, Chapter-6 and Chapter-7). The proposed schemes were 
implemented on the proposed reconfigurable architecture. The results were 
compared with all previous levels to demonstrate the efficacy of the proposed fabric 
when used in harsh environments. 
Level-1 establishes the performance edge of the reconfigurable core over a generic FPGA 
architecture. Level-2 to Level-4 serve the following two purposes: 
• To determine the performance of the already introduced radiation hardening 
schemes 
• To provide performance figures to benchmark the mitigation schemes proposed 
in earlier chapters 
Therefore, with the help of these comparisons, the advantage of the proposed 
reconfigurable fabric is demonstrated along with the efficacy of the proposed radiation hardening 
schemes for the proposed reconfigurable architecture. 
[Flow diagram. Inputs: three DWT algorithm implementations and the three proposed SEU 
mitigation schemes (temporal redundancy scheme, Chapter-5; partial TMR (PTMR) scheme, 
Chapter-6; dual hardware redundancy scheme, Chapter-7). Level-1: without radiation 
hardening. Level-2: with full hardware redundancy (TMR). Level-3: with Mavis et al. 
[MAV-002]. Level-4: with Lima et al. [LIM-003]. Level-5: with proposed schemes 1 to 3. 
Each level is evaluated for area, power and SEU/SET immunity.] 
Figure-8.1: Performance Evaluation Flow Diagram for the Proposed Reconfigurable Architecture 
All VLSI implementations (discussed above) targeting 1-D DWT were implemented on the 
different platforms (discussed earlier). The Xilinx Virtex-E XV50E device was chosen as the 
generic FPGA target device for comparisons [XIL-DAT]. All these systems use 0.18µm 
CMOS technology and run at 1.8V. The values are measured for a single frame of three 
images (Lenna, Barbra and Pepper) of size 64x64 and 128x128. All the implementations 
were fed with extended images as discussed in Chapter-3. The input data contains '000H' 
after the image information. Active-HDL 5.1™ and ModelSim 5.8b™ were used for 
Register Transfer Level (RTL) simulations in Verilog Hardware Description Language 
(HDL). Cadence's Verilog-XL™ was used for post-synthesis simulation to verify the designs. 
Power analysis was carried out through the Synopsys PrimePower tool. Xilinx ISE 6.2i was 
used to obtain performance figures in terms of area, timing and power for the generic FPGA. 
8.3 Evaluations: Level-1 
Level-1 evaluations were performed to demonstrate the performance advantages of the proposed 
domain specific reconfigurable architecture over a generic FPGA core. The results are 
based on three images (Lenna, Barbra and Pepper). A single frame of each image was 
taken and the simulations were performed for image sizes of 64x64 and 128x128. Three 
different DWT algorithms were implemented as explained earlier in this chapter and are 
referred to as DWT-1, 2 and 3 in later sections. The details of these implementations are: 
DWT-1: 9/7 Lifting based Lian et al. DWT [LIA-001] 
DWT-2: 9/7 Integer Fast based DWT [DAN-002] 
DWT-3: 9/7 Proposed lifting based 3-input DWT 
The results are based on an operating frequency of 40MHz. The three 9/7 DWT algorithms 
were implemented on three different platforms: 
• Application Specific Integrated Circuits (ASIC) 
• The proposed reconfigurable architecture 
• A generic FPGA architecture (Xilinx Virtex-E) 
Table-8.1: Power consumption and comparison figures of various DWT algorithms on three 
different platforms for the processing of a single frame of test image, Lenna, Barbra and Pepper at 
image size of 64x64. Test conducted at clock frequency of 40MHz 
IMAGE   IMAGE    TYPE     ASIC POWER    VIRTEX-E POWER   PROPOSED POWER   POWER OVERHEAD     POWER OVERHEAD 
SIZE    TYPE     OF DWT   CONSUMPTION   CONSUMPTION      CONSUMPTION      DIFFERENCE OF      DIFFERENCE OF 
                          (MILLI-WATT)  (MILLI-WATT)     (MILLI-WATT)     VIRTEX-E W.R.T.    PROPOSED CORE 
                                                                          ASIC               W.R.T. XILINX 
64x64   Lenna    DWT-1    0.89          6.82             3.59             5.93               3.23 
64x64   Lenna    DWT-2    0.93          11.3             5.91             10.37              5.39 
64x64   Lenna    DWT-3    0.78          6.13             2.88             5.35               3.25 
64x64   Barbra   DWT-1    0.93          7.01             3.64             6.08               3.37 
64x64   Barbra   DWT-2    0.99          11.9             5.99             10.91              5.91 
64x64   Barbra   DWT-3    0.85          6.61             2.97             5.76               3.64 
64x64   Pepper   DWT-1    0.90          6.93             3.60             6.03               3.33 
64x64   Pepper   DWT-2    0.94          11.4             5.96             10.46              5.44 
64x64   Pepper   DWT-3    0.79          6.22             2.90             5.43               3.32 
The image extension algorithms were implemented separately from the DWT core and the 
results are not considered for the sake of fair comparisons among different architectures. 
Table-8.1 shows power consumption figures for different implementations on three 
different platforms. It can be seen that the proposed reconfigurable core is better than a 
generic FPGA in terms of power consumption. The results were used to analyse power 
overhead of the proposed reconfigurable fabric over ASIC implementations. 
The break-down of the power consumption of the proposed reconfigurable fabric into net 
internal switching and logic internal switching is graphically represented in Figure-8.2. 
Figure-8.2: Power consumption graph of various DWT algorithms on the proposed reconfigurable 
fabric for the processing of a single frame of test image, Lenna, Barbra and Pepper at image size of 
64x64. Test conducted at clock frequency of 40MHz 
Table-8.2: Power consumption and comparison figures of various DWT algorithms on three 
different platforms for the processing of a single frame of test image, Lenna, Barbra and Pepper at 
image size of 64x64. Test conducted at clock frequency of 40MHz 
Image    Image    Type of   % Power Overhead of   % Power Overhead of   % Power Overhead of 
size     type     DWT       Generic FPGA          Proposed Fabric       Generic 
64x64    Lenna    DWT-1     666.2%                303.3%                89.9% 
64x64    Lenna    DWT-2     1115.1%               535.4%                91.2% 
64x64    Lenna    DWT-3     685.8%                269.2%                112.8% 
64x64    Barbra   DWT-1     653.7%                291.3%                92.5% 
64x64    Barbra   DWT-2     1102%                 505%                  98.6% 
64x64    Barbra   DWT-3     677.6%                249.4%                122.5% 
64x64    Pepper   DWT-1     670%                  300%                  92.5% 
64x64    Pepper   DWT-2     1112.7%               534%                  91.2% 
64x64    Pepper   DWT-3     687.3%                267%                  114.4% 
The ASIC, as anticipated, gives the best performance in terms of power consumption but, 
as explained in Chapter-3, lacks the flexibility to incorporate future trends in 
technology and algorithms. Table-8.2 presents the cost factor in percentage that is incurred 
to make the architecture flexible. It can be seen that the percentage power cost associated 
with a generic FPGA structure is approximately between 650% and 1150%, depending on 
the target application. On the other hand, this cost factor is reduced to approximately 
260% to 540% with the proposed reconfigurable fabric. Figure-8.3 shows the 
power consumption graphically for different DWT algorithms on various implementation 
platforms. It can be noted that the selection of image plays very little role in the 
power consumption. The power consumption is mainly determined by the DWT algorithm and the 
selection of the hardware platform. It can also be seen that the proposed architecture lies 
approximately in between a pure ASIC and a generic FPGA design in terms of power 
consumption. 
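The percentage cost factors can be recomputed from the absolute power figures of Table-8.1. The sketch below assumes that each percentage is the power difference divided by the power of the reference platform. The helper name `overhead_percent` is illustrative, and small differences from the tabulated values are expected from rounding.

```python
# Lenna, 64x64, power in mW from Table-8.1: (ASIC, Virtex-E, proposed fabric)
LENNA_POWER_MW = {
    "DWT-1": (0.89, 6.82, 3.59),
    "DWT-2": (0.93, 11.3, 5.91),
    "DWT-3": (0.78, 6.13, 2.88),
}


def overhead_percent(power, reference):
    """Extra power relative to a reference platform, in percent."""
    return (power - reference) / reference * 100.0


if __name__ == "__main__":
    for dwt, (asic, fpga, proposed) in LENNA_POWER_MW.items():
        print(dwt,
              round(overhead_percent(fpga, asic), 1),       # generic FPGA vs ASIC
              round(overhead_percent(proposed, asic), 1),   # proposed fabric vs ASIC
              round(overhead_percent(fpga, proposed), 1))   # generic FPGA vs proposed
```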
Figure-8.3: Power consumption graph of various DWT algorithms on three different platforms for 
the processing of a single frame of test image, Lenna, Barbra and Pepper at image size of 64x64. 
Test conducted for 329000ns at clock frequency of 40MHz 
Table-8.3 presents the power consumption and comparison figures for the three 
implementation platforms for test images of size 128x128. 
Table-8.3: Power consumption and comparison figures of various DWT algorithms on three 
different platforms for the processing of a single frame of test image, Lenna, Barbra and Pepper at 
image size of 128x128. Test conducted at clock frequency of 40MHz 
IMAGE   IMAGE    TYPE     ASIC POWER    VIRTEX-E POWER   PROPOSED POWER   POWER OVERHEAD     POWER OVERHEAD 
SIZE    TYPE     OF DWT   CONSUMPTION   CONSUMPTION      CONSUMPTION      DIFFERENCE         DIFFERENCE 
                          (MILLI-WATT)  (MILLI-WATT)     (MILLI-WATT)     W.R.T. ASIC        W.R.T. XILINX 
128x128 Lenna    DWT-1    0.89          6.82             3.59             5.93               3.23 
128x128 Lenna    DWT-2    0.93          11.3             5.91             10.37              5.39 
128x128 Lenna    DWT-3    0.78          6.13             2.88             5.35               3.25 
128x128 Barbra   DWT-1    0.93          7.01             3.64             6.08               3.37 
128x128 Barbra   DWT-2    0.99          11.9             5.99             10.91              5.91 
128x128 Barbra   DWT-3    0.85          6.61             2.97             5.76               3.64 
128x128 Pepper   DWT-1    0.90          6.93             3.60             6.03               3.33 
128x128 Pepper   DWT-2    0.94          11.4             5.96             10.46              5.44 
128x128 Pepper   DWT-3    0.79          6.22             2.90             5.43               3.32 
8.3.1 Evaluation (Process Completion Time) 
Although all three implementation platforms process about the same number of data 
samples, the time each platform takes to complete one level of the DWT process is 
different. The proposed reconfigurable architecture has a shorter process time than the generic 
FPGA architecture. The process times for the three implementation platforms are given 
in Table-8.4. It can be seen that the proposed reconfigurable fabric is better in terms of 
process time for one level of DWT algorithms. 
Table-8.4: Process completion time for various implementation platforms for a single frame of test 
image at image size of 64x64 at clock frequency of 40MHz 
IMAGE    PROCESS COMPLETION TIME (ONE LEVEL-DWT) 
SIZE     ASIC       Generic FPGA   The Proposed Reconfigurable Fabric 
64x64    35072nS    137488nS       102368nS 
The proposed fabric is 1.3 times faster than the generic FPGA. This can be 
explained by the fact that the generic FPGA uses more routing resources than the domain 
specific architecture, and each routing resource adds capacitance to the track, which in 
turn slows down the whole architecture. On the other hand, power is a function of time, and the 
difference in completion time has an impact on the power consumption of the architecture. 
Consequently, power evaluations of each individual platform were carried out with respect 
to the process completion time. A single frame of the Lenna image of 64x64 size was 
processed for one level of DWT computations at a clock frequency of 40MHz for 
137488nS. The comparison results are given in Table-8.5. 
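The speed-up factor follows directly from the completion times in Table-8.4. A minimal sketch, with illustrative constant and function names:

```python
# Process completion times for one level of DWT on a 64x64 frame (Table-8.4), in ns.
T_ASIC_NS = 35072
T_FPGA_NS = 137488
T_PROPOSED_NS = 102368


def speedup(slower_ns, faster_ns):
    """How many times faster the second platform completes the same task."""
    return slower_ns / faster_ns


if __name__ == "__main__":
    # Proposed fabric relative to the generic FPGA.
    print(round(speedup(T_FPGA_NS, T_PROPOSED_NS), 2))
    # ASIC relative to the proposed fabric, for reference.
    print(round(speedup(T_PROPOSED_NS, T_ASIC_NS), 2))
```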
Table-8.5: Power consumption and comparisons for various implementation platforms for a single 
frame of test image at image size of 64x64 at clock frequency of 40MHz. Test conducted for 
137488nS. 
IMAGE   IMAGE   TYPE     ASIC POWER     VIRTEX-E POWER   PROPOSED POWER   THE PROPOSED FABRIC 
SIZE    TYPE    OF DWT   CONSUMPTION    CONSUMPTION      CONSUMPTION      % POWER INCREASE 
                         (MILLI-WATT)   (MILLI-WATT)     (MILLI-WATT)     W.R.T. TABLE-8.1 
64x64   Lenna   DWT-1    1.81           6.82             6.103            70% 
It can be seen that the power consumption has actually increased for the proposed and 
ASIC platforms as compared to Table-8.1. The increase for the proposed fabric 
is approximately 70% and the power consumption figure is now 6.103mW. The percentage 
advantage of the proposed architecture is reduced to 11% as compared to 89% (Table-8.2). 
The increase in the power consumption results is due to the difference in power 
consumption during the actual image processing period and during the architecture's idle 
period. 
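Both percentages can be reproduced from the figures in Table-8.1 and Table-8.5. The sketch below assumes that the advantage is expressed as the extra power of the generic FPGA relative to the proposed fabric, as in the last column of Table-8.2; the function name `percent_change` is illustrative.

```python
P_PROPOSED_TABLE_8_1_MW = 3.59    # proposed fabric, Lenna DWT-1 (Table-8.1)
P_PROPOSED_TABLE_8_5_MW = 6.103   # proposed fabric over 137488 ns (Table-8.5)
P_FPGA_MW = 6.82                  # Virtex-E, Lenna DWT-1 (both tables)


def percent_change(value, reference):
    """Relative difference of value with respect to reference, in percent."""
    return (value - reference) / reference * 100.0


increase = percent_change(P_PROPOSED_TABLE_8_5_MW, P_PROPOSED_TABLE_8_1_MW)
advantage = percent_change(P_FPGA_MW, P_PROPOSED_TABLE_8_5_MW)

if __name__ == "__main__":
    print(round(increase, 1), round(advantage, 1))
```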
8.3.1.1 Analysis 
In the previous section, the two sets of the power evaluation results were presented. The 
results showed an approximate 70% power consumption increase in the proposed 
architecture when all the platforms were evaluated for the same processing time. The 
discrepancies in the power consumption results are discussed in this section. All the power 
consumption profiles, used in this section are the proposed architecture's power 
consumption figures when it processes a 64x64 Lenna image. The power consumption 
figures and process times are presented in the previous section. The power profile of the 
proposed reconfigurable fabric is shown in Figure-8.4 at its process completion time 
102368nS at clock frequency of 40MHz. 
Figure-8.4: Graphical representation of the average power consumed by the proposed 
reconfigurable fabric for the processing of one level of DWT of a single frame of test image 64x64 
at its process completion time of 102368ns at clock frequency of 40MHz 
E1 is the energy consumed, P1 is the power consumption of 3.59mW and T1 is the process 
completion time of 102368ns (Figure-8.4). The power P1 shown in the graph is the same 
value as in Table-8.1. Under the usual power evaluation assumption, this power consumption 
value would remain the same for the remaining time after 102368ns. This assumption 
does not hold: the architecture runs in an idle state and the power consumption goes down. 
(The image is extended with 0000H at the end to incorporate this test.) The power 
consumption figure fell from 3.59mW to 2.513mW (P2). The reduction in power is 
due to the averaging effect of the difference in power consumption during the actual image 
processing period and during the architecture's idle time. The graphical representation of 
the power consumption profile is shown in Figure-8.5. 







Figure-8.5: Graphical representation of the average power consumed by the proposed
reconfigurable fabric for the processing of one level of DWT of a single frame of test image 64x64
for the test duration of 137488 ns at a clock frequency of 40 MHz
T1 is the actual time involved in image processing and T2 is the time the architecture runs
in idle mode. E2 is the energy associated with P2. Therefore, a more accurate
representation of the power consumption profile has two distinct power consumption time
regions, as shown in Figure-8.5. The idle period power consumption figures shown in
Table-8.6 were obtained by running the proposed architecture with null input vectors over
a time of 137488 ns at a clock frequency of 40 MHz.
Table-8.6: The idle period power consumption for various implementation platforms for a single
frame of test image at image size of 64x64 at a clock frequency of 40 MHz. Test conducted for
137488 ns.
IMAGE SIZE   IMAGE TYPE   TYPE OF DWT   TYPE OF POWER CONSUMPTION   PROPOSED POWER CONSUMPTION
64x64        Lenna        DWT-1         Logic Internal Switching    1.759 mW
                                        Net Switching               0.75 mW
                                        Total                       2.513 mW
To explain the power consumption figures of Table-8.1 in terms of the averaging effect of
the two regions, the following calculation uses the two energy values (E1 and E2) to obtain
the power consumption of the proposed reconfigurable architecture. In the following
calculations, the power consumption figure of each region is converted into an energy
figure. The idle period power of the proposed reconfigurable architecture, shown in
Table-8.6, is used in the calculations. The area under the power curve is the energy
consumed by the processor hardware. The following expression calculates the energy
consumed.
$E = \int P(t)\,dt$    (8.1)
Hence, the energy consumed by the proposed reconfigurable fabric at its power consumption
of 3.59 mW (P1) [Figure-8.4] for a period of 102368 ns is shown in Equation-8.2.
$E_1 = P_1 \times T_1$
$= [3.59\ \mathrm{mW}] \times [102368\ \mathrm{ns}]$
$= 367501.12\ \mathrm{pJ}$
The idle time period T2 is calculated as:
$T_2 = T_3 - T_1$
$= 137488\ \mathrm{ns} - 102368\ \mathrm{ns}$
$= 35120\ \mathrm{ns}$
The energy E2 can be calculated as:
$E_2 = P_2 \times T_2$
$= [2.513\ \mathrm{mW}] \times [35120\ \mathrm{ns}]$
$= 88256.56\ \mathrm{pJ}$
The total energy can be calculated as:
$E_3 = E_1 + E_2$
$= 367501.12\ \mathrm{pJ} + 88256.56\ \mathrm{pJ}$
$= 455757.68\ \mathrm{pJ}$
Finally, the average power P3 consumed by the proposed reconfigurable architecture when
processing the test image (Lenna) over the full test duration T3 is calculated as follows.
$P_3 = E_3 / T_3$
$= 455757.68\ \mathrm{pJ} / 137488\ \mathrm{ns}$
$\approx 3.31\ \mathrm{mW}$
The calculated power consumption value P3 of 3.31 mW in the above calculation is close
to the value of 3.59 mW shown in Table-8.1. The calculation supports the power
consumption profile in Figure-8.5 and shows that the reduced power is due to the averaging
effect of the energy in the two distinct power consumption time regions.
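The two-region averaging described above amounts to a time-weighted mean of the power levels. The following Python sketch illustrates the calculation with the figures quoted in this section; the function name and the representation of a region as a (power, duration) pair are illustrative assumptions, not part of the evaluation tool flow.

```python
def average_power_mw(regions):
    """Time-weighted average power for (power_mW, duration_ns) regions.

    mW multiplied by ns gives pJ, so the energy is accumulated in pJ and
    divided by the total duration in ns, which yields mW again.
    """
    total_energy_pj = sum(power * duration for power, duration in regions)
    total_time_ns = sum(duration for _, duration in regions)
    return total_energy_pj / total_time_ns

# Figures quoted in the text for the proposed fabric (64x64 Lenna, 40 MHz).
P1_MW, T1_NS = 3.59, 102368   # active image processing region
P2_MW = 2.513                 # idle region power
T3_NS = 137488                # total test duration
T2_NS = T3_NS - T1_NS         # idle region duration

p3_mw = average_power_mw([(P1_MW, T1_NS), (P2_MW, T2_NS)])
print(f"T2 = {T2_NS} ns, average power P3 = {p3_mw:.2f} mW")
```

The same function accepts any number of regions, so a profile with several active and idle phases could be averaged in the same way.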
8.3.2 Area Comparison 
The area estimates were performed for a generic FPGA, an ASIC and the proposed
reconfigurable core. The synthesis was performed on 0.18 µm technology, as stated
earlier. The findings are given below.
The area estimates for the Xilinx Virtex-E XCV50E are presented in Table-8.7. The area
figures are an estimate (Virtex-E CLB area ≈ 30 µm^2; each Virtex-E CLB consists of two
slices) provided by Chipworks [CHI-WEB][SAM-004]. It is worth mentioning that the
routing and mapping of the proposed array is done through the VPR tool and hence the
results are less optimized than those of Virtex-E, which benefits from commercial
intelligent routing and placement algorithms. The area results of the proposed array can be
improved further by employing more sophisticated tools for routing and placement, as
careful placement of logic blocks can optimise the performance of the circuit in terms of
speed as well as power consumption by reducing the routing requirements. It can be seen
that an area saving of 7% to 12% is achieved through the proposed architecture.
All three DWT cores were also put through the ASIC implementation at the same feature
length as the Virtex-E FPGA (0.18 µm). The "umc18u250t2_typ" library was used along
with Synplify ASIC to perform the synthesis.
Table-8.7: Comparison of Map and Place & Route Reports (Xilinx Virtex-E).
                                        DWT-1         DWT-3         DWT-2
MAP REPORT
  SLICES (of 768)                       102 (13%)     66 (8%)       425 (55%)
  SLICE FFs (of 1,536)                  95 (6%)       41 (2%)       188 (12%)
  4 input LUTs (of 1,536)               128 (8%)      109 (7%)      558 (36%)
  Total gate count                      2,355         1,809         9,159
  JTAG gate count for IOBs              3,216         3,984         5,032
AREA (Xilinx)                           1.64 mm^2     1.197 mm^2    6.95 mm^2
The ASIC implementation results are shown in Table-8.8. As anticipated, these area 
estimates are much lower than a generic FPGA core. 
Table-8.8: Comparison of ASIC synthesis results.
                                DWT-1       DWT-3       DWT-2
Cell Usage Count                431         270         1397
Area (µm^2)                     19140       12188       46715
Fanout Report   NET Count       547         402         1754
                Total Fanout    786         546         3470
Timing          Max Freq.       252 MHz     211 MHz     194 MHz
The area comparison results are presented in Table-8.9. The proposed reconfigurable
core can be made more area efficient through efficient placement and routing algorithms.
Table-8.9: Area Comparisons.
          ASIC             PROPOSED RECONFIGURABLE CORE   VIRTEX-E
DWT-1     19140 µm^2       1530039 µm^2                   1647732 µm^2
DWT-2     46715 µm^2       6241385 µm^2                   6953199 µm^2
DWT-3     12188 µm^2       1100356 µm^2                   1197635 µm^2
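The area saving of the proposed core relative to Virtex-E can be derived directly from the figures in Table-8.9. The Python sketch below assumes that the second numeric column of the table is the proposed reconfigurable core and the third is Virtex-E (the latter matches the Xilinx areas of Table-8.7); the dictionary layout and function name are illustrative.

```python
# Areas in square micrometres, as listed in Table-8.9.
# The column assignment (proposed core vs. Virtex-E) is an assumption.
AREAS_UM2 = {
    "DWT-1": {"proposed": 1530039, "virtex_e": 1647732},
    "DWT-2": {"proposed": 6241385, "virtex_e": 6953199},
    "DWT-3": {"proposed": 1100356, "virtex_e": 1197635},
}

def area_saving_percent(proposed, reference):
    """Area saved by `proposed` relative to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference

savings = {
    name: area_saving_percent(a["proposed"], a["virtex_e"])
    for name, a in AREAS_UM2.items()
}

for name, value in savings.items():
    print(f"{name}: {value:.1f} % area saving over Virtex-E")
```

The same helper can be reused for the ASIC column by passing the ASIC area as the reference, in which case the result is negative because the ASIC is smaller.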
8.4 Evaluations: Level-2 
Level-2 performance evaluations were performed to estimate the effect of the full 
hardware redundancy on the proposed reconfigurable architecture and other hardware 
platforms. As discussed in earlier chapters, TMR is the most common technique to 
improve a circuit's immunity against single event effects. The results are derived for all 
three DWT implementation algorithms as explained before. A single frame of test images 
of Lenna, Barbra and Pepper of image size 64x64 was used during this level of 
evaluations. 
Table-8.10: Full TMR power consumption figures of various DWT algorithms on three different
platforms for the processing of a single frame of test image, Lenna, Barbra and Pepper at image size
of 64x64. Test conducted at a clock frequency of 40 MHz
IMAGE SIZE   IMAGE TYPE   TYPE OF DWT   ASIC POWER    VIRTEX-E POWER   PROPOSED POWER
                                        CONSUMPTION   CONSUMPTION      CONSUMPTION
64x64        Lenna        DWT-1         2.79 mW       21.37 mW         11.04 mW
                          DWT-2         2.83 mW       35.1 mW          18.19 mW
                          DWT-3         2.49 mW       19.8 mW          9.68 mW
             Barbra       DWT-1         2.68 mW       20.2 mW          10.82 mW
                          DWT-2         2.91 mW       34.97 mW         16.99 mW
                          DWT-3         2.58 mW       20.14 mW         9.56 mW
             Pepper       DWT-1         2.60 mW       20.5 mW          10.84 mW
                          DWT-2         2.89 mW       35.04 mW         18.66 mW
                          DWT-3         2.5 mW        19.68 mW         9.77 mW
The TMR technique triplicates all the circuit components, and majority voter circuitry is
used to obtain a fault free output. Consequently, the area overhead of the TMR technique
is more than 200%. Figure-8.6 presents the percentage power overhead associated with the
TMR scheme. The power consumption figures increase by approximately 180% to 210% as
compared to the original circuit (without TMR). This power increase makes TMR a less
attractive choice for mission critical applications. Moreover, faults in the voting
circuitry can also degrade the SEU immunity. Level-5 based evaluation deals with these
issues.











Figure-8.6: Percentage power overhead comparison of TMR technique. The results are based on 
64x64 Lenna, Barbra and Pepper test images implemented through various DWT algorithms 
(DWT-1, DWT-2, DWT-3) 
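As an illustration of how the percentage power overhead of TMR is obtained, the following Python sketch uses the two figures available in this chapter for the proposed fabric with the 64x64 Lenna image and DWT-1: the unprotected figure of 3.59 mW (Table-8.1) and the full TMR figure of 11.04 mW (Table-8.10). Pairing these two values is an assumption made for the example; the function name is illustrative.

```python
def overhead_percent(protected_mw, baseline_mw):
    """Power overhead of a protected design over its baseline, in percent."""
    return 100.0 * (protected_mw - baseline_mw) / baseline_mw

BASELINE_MW = 3.59   # proposed fabric without TMR (Table-8.1, Lenna, DWT-1)
TMR_MW = 11.04       # proposed fabric with full TMR (Table-8.10, Lenna, DWT-1)

tmr_overhead = overhead_percent(TMR_MW, BASELINE_MW)
print(f"TMR power overhead = {tmr_overhead:.1f} %")
```

For this pair of values the overhead falls inside the 180% to 210% range stated in the text.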
8.5 Evaluations: Level-3 
Level-3 performance evaluations were performed to estimate the effect and overhead of
the Mavis et al. [MAV-002] technique on the proposed reconfigurable architecture and
other hardware implementation platforms. The technique takes multiple data samples at
distinct times, using four different clocks and voting logic to determine the correct
output. However, the technique does not propose any scheme for faults in combinational
elements or in the voting circuit. Moreover, the scheme has a performance overhead of
nine latches and voting logic. The scheme was implemented to obtain the power and area
figures for different implementation platforms. These performance figures were used to
benchmark the proposed techniques later at Level-5.
The test was conducted on the test images Lenna, Barbra and Pepper. All test images were
a single frame of 64x64 size. Table-8.11 presents the power consumption of the Mavis et al.
scheme. The voting circuit was implemented on a simple majority basis for three inputs
(A, B and C) and the following equation was used:
$(A \oplus B)\,C + AB$    (8.2)
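The three-input majority function used for the voter can be checked exhaustively. The Python sketch below assumes that the voter expression has the common form (A XOR B)·C + A·B (the XOR operator is an assumption, since it is not legible in the printed equation) and compares it with the sum-of-products form AB + BC + AC and with a direct "at least two of three" count. The function names are illustrative.

```python
from itertools import product

def vote_xor_form(a, b, c):
    """Majority of three bits written as (A XOR B)·C + A·B."""
    return ((a ^ b) & c) | (a & b)

def vote_sum_of_products(a, b, c):
    """Majority of three bits written as AB + BC + AC."""
    return (a & b) | (b & c) | (a & c)

# Exhaustive truth table over all eight input combinations.
truth_table = []
for a, b, c in product((0, 1), repeat=3):
    expected = int(a + b + c >= 2)   # at least two inputs are 1
    truth_table.append((a, b, c, vote_xor_form(a, b, c),
                        vote_sum_of_products(a, b, c), expected))

for row in truth_table:
    print(row)
```

A single flipped input never changes the voted output as long as the other two inputs agree, which is the property the voter relies on.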
Table-8.11: Mavis et al. [MAV-002] power consumption figures of various DWT algorithms on
three different platforms for the processing of a single frame of test image, Lenna, Barbra and
Pepper at image size of 64x64. Test conducted at a clock frequency of 40 MHz
IMAGE SIZE   IMAGE TYPE   TYPE OF DWT   ASIC POWER    PROPOSED POWER
                                        CONSUMPTION   CONSUMPTION
64x64        Lenna        DWT-1         1.61 mW       6.43 mW
                          DWT-2         1.75 mW       10.99 mW
                          DWT-3         1.479 mW      5.89 mW
             Barbra       DWT-1         1.66 mW       7.02 mW
                          DWT-2         1.75 mW       11.52 mW
                          DWT-3         1.58 mW       6.59 mW
             Pepper       DWT-1         1.6 mW        6.38 mW
                          DWT-2         1.69 mW       10.61 mW
                          DWT-3         1.47 mW       5.85 mW
Figure-8.7 presents a graphical view of the power distribution among Level-1, Level-2 and
Level-3. Although it is not entirely fair to compare TMR with the Mavis et al. scheme,
because TMR also protects combinational circuits, Figure-8.7 gives an overview of
the schemes with respect to the original circuits without any SEU immunity.


















Figure-8.7: Power consumption distribution among Level-1, Level-2 and Level-3
The SEU immunity of the Mavis et al. scheme is evaluated at Level-5 along with the
proposed schemes. The area estimates are presented in Table-8.12. It is evident that an
area increase of 1.26 times in ASIC and 4.4 times in the proposed reconfigurable
architecture is associated with the Mavis et al. scheme. The increase factor for the
proposed fabric exceeds that of the ASIC by 3.14 because the fabric has more memory
elements (configuration memory) than the ASIC design. This is understandable, as the
Mavis et al. scheme only works on the synchronous/memory elements of the circuits, as
explained in Chapter-5, and the area increase is very much circuit dependent. ASIC
circuits have no configuration memory, which is why the change is only 1.26 times.
Table-8.12: Area Comparisons of Mavis et al. [MAV-002].
          ASIC           THE PROPOSED           AREA INCREASE   AREA INCREASE
                         RECONFIGURABLE CORE    (ASIC)          (THE PROPOSED
                                                                RECONFIGURABLE CORE)
DWT-3     15436 µm^2     4937694 µm^2           1.26            4.4
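The area increase factors in Table-8.12 follow from dividing the hardened area by the corresponding unhardened area. The Python sketch below uses the DWT-3 areas of Table-8.9 as the unhardened baseline, assuming that the second numeric column of that table is the proposed reconfigurable core; the names are illustrative.

```python
# DWT-3 areas in square micrometres.
BASELINE_UM2 = {"asic": 12188, "proposed": 1100356}   # Table-8.9 (column order assumed)
MAVIS_UM2 = {"asic": 15436, "proposed": 4937694}      # Table-8.12

def increase_factor(hardened, baseline):
    """Ratio of hardened area to baseline area."""
    return hardened / baseline

factors = {
    platform: increase_factor(MAVIS_UM2[platform], BASELINE_UM2[platform])
    for platform in BASELINE_UM2
}

for platform, factor in factors.items():
    print(f"{platform}: area grows by a factor of {factor:.2f}")

# Difference between the two factors, as discussed in the text.
print(f"difference = {factors['proposed'] - factors['asic']:.2f}")
```

The ratios come out at roughly 1.27 for the ASIC and 4.49 for the proposed core, in line with the tabulated 1.26 and 4.4.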
8.6 Evaluation: Level-4 
Level-4 performance evaluations were performed to estimate the effect and overhead of
the Lima et al. [LIM-003] technique on the proposed reconfigurable architecture and other
hardware implementation platforms. The technique is based on dual hardware redundancy
with comparison. The syndrome generator is based on storing the values in a memory
element, and the output stage is inherently based on TMR. Lima's work was mainly on
Xilinx architectures, and the output TMR is based on the Xilinx majority voting scheme.
Lima proposed three different clocks to capture data samples. The scheme was evaluated in
terms of power, area and SEU immunity, and the results were compared with the proposed
alternatives.
Table-8.13: Lima et al. [LIM-003] power consumption figures of various DWT algorithms on
three different platforms for the processing of a single frame of test image, Lenna, Barbra and
Pepper at image size of 64x64. Test conducted at a clock frequency of 40 MHz
IMAGE SIZE   IMAGE TYPE   TYPE OF DWT   ASIC POWER    PROPOSED POWER
                                        CONSUMPTION   CONSUMPTION
64x64        Lenna        DWT-1         2.23 mW       8.99 mW
                          DWT-2         2.28 mW       14.48 mW
                          DWT-3         2.01 mW       7.42 mW
             Barbra       DWT-1         2.29 mW       8.96 mW
                          DWT-2         2.31 mW       13.97 mW
                          DWT-3         2.11 mW       7.37 mW
             Pepper       DWT-1         2.19 mW       8.76 mW
                          DWT-2         2.3 mW        14.58 mW
                          DWT-3         2.05 mW       7.5 mW
Figure-8.8 presents the percentage power consumption overhead of the schemes.












Figure-8.8: Percentage power consumption overhead with respect to ASIC
8.7 Evaluations: Level-5 
Level-5 performance evaluations were performed to estimate the effect and overhead of
the three proposed techniques (Chapter-5, Chapter-6 and Chapter-7) on the proposed
reconfigurable architecture and other hardware implementation platforms. The technique
introduced in Chapter-5 is for synchronous circuits, while the schemes introduced in
Chapter-6 and Chapter-7 are for combinational circuits. Each of the two combinational
circuit schemes was implemented together with the synchronous element scheme
(Chapter-5) for the processing of one level of DWT. The aforementioned test images were
used at 64x64 size for a single frame. The partial TMR scheme (Chapter-6) combined with
the Chapter-5 scheme is referred to as implementation one (Imp-1) for the rest of the
discussion, and the dual hardware redundancy scheme (Chapter-7) combined with the
Chapter-5 scheme is referred to as implementation two (Imp-2).
Table-8.14 presents the power consumption results of the Imp-1 and Imp-2
implementations for the ASIC and the proposed reconfigurable core. The tests were
conducted for various types of DWT algorithms on the test images.
Figure-8.9 presents the percentage power overhead of the various mitigation schemes for
the proposed reconfigurable core. It can be seen that the proposed mitigation schemes are
better in terms of power consumption than the techniques previously proposed in the
literature. The Mavis et al. [MAV-002] scheme has lower power consumption than the
proposed Imp-2, which was anticipated as Mavis's technique only covers the
synchronous elements of the circuits.
Table-8.14: Power consumption figures and comparisons of various DWT algorithms on different
platforms for the processing of a single frame of test image, Lenna, Barbra and Pepper at image size
of 64x64. Test conducted at a clock frequency of 40 MHz
IMAGE SIZE   IMAGE TYPE   TYPE OF DWT   IMP-2 POWER      IMP-1 POWER      IMP-2 POWER         IMP-1 POWER
                                        CONSUMPTION      CONSUMPTION      CONSUMPTION FOR     CONSUMPTION FOR
                                        FOR ASIC CORE    FOR ASIC CORE    PROPOSED CORE       PROPOSED CORE
64x64        Lenna        DWT-1         2.11 mW          1.82 mW          7.39 mW             8.52 mW
                          DWT-2         2.12 mW          1.86 mW          11.8 mW             13.5 mW
                          DWT-3         1.945 mW         1.76 mW          6.49 mW             7.3 mW
             Barbra       DWT-1         2.13 mW          1.874 mW         7.55 mW             7.99 mW
                          DWT-2         2.135 mW         1.88 mW          11.3 mW             13.01 mW
                          DWT-3         1.97 mW          1.775 mW         6.2 mW              6.78 mW
             Pepper       DWT-1         1.958 mW         1.77 mW          7.08 mW             7.88 mW
                          DWT-2         2.24 mW          1.905 mW         12.0 mW             14.3 mW
                          DWT-3         1.95 mW          1.765 mW         6.47 mW             7.15 mW




Figure-8.9: Percentage power consumption overhead with respect to ASIC
Table-8.15 presents the comparison of the proposed schemes in terms of percentage
power savings. The proposed schemes are compared with full hardware redundancy
(TMR) and Lima's scheme. The comparison of the two proposed schemes with each other
is also presented in Table-8.15.
Table-8.15: Percentage power saving figures of the proposed techniques for various DWT
algorithms on the proposed reconfigurable fabric for the processing of a single frame of test image,
Lenna, Barbra and Pepper at image size of 64x64. Test conducted at a clock frequency of 40 MHz
IMAGE SIZE   IMAGE TYPE   TYPE OF DWT   % OVERHEAD      IMP-1 % POWER   IMP-2 % POWER   IMP-1 % POWER   IMP-2 % POWER
                                        OF IMP-1        SAVING OVER     SAVING OVER     SAVING OVER     SAVING OVER
                                        W.R.T. IMP-2    LIMA SCHEME     LIMA SCHEME     TMR             TMR
64x64        Lenna        DWT-1         15.9%           -5.3%           -18.3%          -32.2%          -53.3%
                          DWT-2         13.9%           -7.01%          -18.4%          -36.3%          -55.2%
                          DWT-3         10.5%           -3.23%          -12.4%          -28.02%         -41.47%
             Barbra       DWT-1         13.6%           -6.9%           -18.1%          -26.3%
                          DWT-2         13.8%           -7.3%           -18.6%          -35.09%         -54.7%
                          DWT-3         10.9%           -6.6%           -15.8%          -31.4%          -45.9%
             Pepper       DWT-1         10.7%           -10.5%          -19.1%          -32.6%          -46.89%
                          DWT-2         17.5%           -2.67%          -17.1%          -29.01%         -51.7%
                          DWT-3         10.48%          -4.87%          -13.9%          -28.2%          -41.64%
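The kind of relative comparison tabulated above can be sketched in a few lines of Python. The example uses the proposed-core power figures for the Lenna image with DWT-1 (Imp-1 and Imp-2 from Table-8.14, Lima's scheme from Table-8.13) and a plain relative difference; this formula is an assumption about how the percentages were derived, so the results are indicative only and may differ slightly from the tabulated values. The names are illustrative.

```python
def relative_difference_percent(value_mw, reference_mw):
    """Signed difference of `value_mw` from `reference_mw`, in percent.

    A negative result means `value_mw` consumes less than the reference.
    """
    return 100.0 * (value_mw - reference_mw) / reference_mw

# Proposed reconfigurable core, 64x64 Lenna, DWT-1 (mW).
IMP1_MW = 8.52   # Table-8.14
IMP2_MW = 7.39   # Table-8.14
LIMA_MW = 8.99   # Table-8.13

imp1_vs_imp2 = relative_difference_percent(IMP1_MW, IMP2_MW)
imp1_vs_lima = relative_difference_percent(IMP1_MW, LIMA_MW)
imp2_vs_lima = relative_difference_percent(IMP2_MW, LIMA_MW)

print(f"Imp-1 overhead w.r.t. Imp-2: {imp1_vs_imp2:+.1f} %")
print(f"Imp-1 relative to Lima:      {imp1_vs_lima:+.1f} %")
print(f"Imp-2 relative to Lima:      {imp2_vs_lima:+.1f} %")
```

The sign convention matches the table: a negative value indicates that the proposed implementation draws less power than the scheme it is compared with.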
Table-8.16 presents the results for the SEU immunity achieved through the various
mitigation schemes. SEU faults were injected through the SEU simulator (Chapter-5). The
input test vectors were the same as in the previous tests (test images). A total of 1000 SEU
faults were injected at different time intervals. The pulse width of the faults was set to 3 ns.
The threshold probability for Imp-1 was chosen as 0.5. The functional outputs of
the implementations for the various DWT algorithms were compared with a standard
implementation without any faults, and the number of functional upsets was calculated for
each mitigation scheme.
Table-8.16: SEU immunity comparison of the proposed techniques for various DWT algorithms on
the proposed reconfigurable fabric for the processing of a single frame of test image (Lenna) at
image size of 64x64.
IMAGE SIZE   IMAGE TYPE   TYPE OF DWT   IMP-1     IMP-2     LIMA [LIM-003]   TMR       MAVIS [MAV-002]
                                        UPSETS    UPSETS    UPSETS           UPSETS    UPSETS
64x64        Lenna        DWT-1         4         0         7                0         12
                          DWT-2         6         0         9                0         15
                          DWT-3         4         0         7                0         14
The column heading 'upsets' in the table represents the number of times the circuit is
functionally upset by the induced SEUs. It can be seen that an SEU immunity of
99.9 percent can be achieved through the proposed scheme (Imp-2). The Mavis et al. based
scheme failed 100% in terms of voting circuit and combinational circuit faults. However,
the Mavis et al. based implementation detected all faults in the clock signals and the
synchronous/memory elements of the various DWT implementations. The Lima et al. based
implementation on the proposed fabric failed for syndrome generator faults, and therefore
its SEU immunity is lower than that of the proposed schemes.
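The upset counts can be turned into an immunity percentage by relating them to the number of injected faults. The Python sketch below uses the DWT-1 row of Table-8.16 and 1000 injected faults; defining immunity as the share of injected faults that did not cause a functional upset is an assumption made for illustration, and the names are illustrative.

```python
INJECTED_FAULTS = 1000

# Functional upsets for DWT-1 (Lenna, 3 ns pulse width), from Table-8.16.
UPSETS_DWT1 = {"Imp-1": 4, "Imp-2": 0, "Lima": 7, "TMR": 0, "Mavis": 12}

def immunity_percent(upsets, injected=INJECTED_FAULTS):
    """Share of injected faults that did not cause a functional upset."""
    return 100.0 * (injected - upsets) / injected

immunity = {scheme: immunity_percent(n) for scheme, n in UPSETS_DWT1.items()}

for scheme, value in immunity.items():
    print(f"{scheme:6s}: {value:.1f} %")
```

With this definition, schemes with zero recorded upsets reach 100% for the injected fault set, and each additional upset lowers the figure by 0.1 percentage points.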
The second test was conducted with faults of pulse width 10 ns. The proposed
implementation scheme (Imp-1) experiences more functional upsets. This is due to the
partial TMR of the design: if the upset occurs in a non-protected region of the circuit,
there is a high probability that the spurious pulse will travel as a normal signal until it is
latched. The Mavis et al. based scheme experiences more upsets because an upset pulse of
longer duration has a higher probability of being sampled by two or more sampling
clocks. The comparison results are shown in Table-8.17.
Table-8.17: SEU immunity comparison of the proposed techniques for various DWT algorithms on
the proposed reconfigurable fabric for the processing of a single frame of test image (Lenna) at
image size of 64x64.
IMAGE SIZE   IMAGE TYPE   TYPE OF DWT   IMP-1     IMP-2     LIMA [LIM-003]   TMR       MAVIS [MAV-002]
                                        UPSETS    UPSETS    UPSETS           UPSETS    UPSETS
64x64        Lenna        DWT-1         6         0         7                0         13
                          DWT-2         8         0         9                0         17
                          DWT-3         5         0         7                0         15
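The effect of pulse width on a temporal sampling scheme can be illustrated with a small model. The Python sketch below counts how many sampling instants fall inside a transient pulse, for a pulse of 3 ns and of 10 ns. The three sampling instants and their 4 ns spacing are hypothetical values chosen for the example; they are not the clock offsets used by the Mavis et al. or Lima et al. schemes, and the function names are illustrative.

```python
def samples_hit(pulse_start_ns, pulse_width_ns, sample_times_ns):
    """Number of sampling instants that fall inside the transient pulse."""
    pulse_end_ns = pulse_start_ns + pulse_width_ns
    return sum(pulse_start_ns <= t < pulse_end_ns for t in sample_times_ns)

# Hypothetical sampling instants, spaced 4 ns apart.
SAMPLE_TIMES_NS = (20.0, 24.0, 28.0)

def worst_case_hits(pulse_width_ns):
    """Largest number of corrupted samples over a sweep of pulse start times."""
    starts = [0.5 * step for step in range(80)]   # 0.0 ns to 39.5 ns
    return max(samples_hit(s, pulse_width_ns, SAMPLE_TIMES_NS) for s in starts)

for width in (3.0, 10.0):
    hits = worst_case_hits(width)
    outvoted = hits >= 2   # a majority of the three samples is corrupted
    print(f"pulse width {width:4.1f} ns: worst case {hits} samples hit, "
          f"majority corrupted: {outvoted}")
```

In this model a pulse shorter than the sample spacing can corrupt at most one sample and is outvoted, whereas a pulse longer than twice the spacing can corrupt all three.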
It can be seen that the proposed scheme (Imp-2) has 100% SEU immunity, and all the
single event faults in the data and clock tree were detected and corrected by the proposed
scheme. TMR also has 100% fault recovery, but the advantage in power
and area saving over TMR makes the proposed scheme more attractive for performance
critical systems.
8.8 Summary 
This chapter presented performance evaluations of the proposed domain (DWT) specific
reconfigurable architecture in terms of area and power consumption. The performance was
compared with that of a generic FPGA and an ASIC core. The proposed SEU mitigation
techniques were implemented on the proposed reconfigurable core for performance
evaluation. The SEU immunity of the proposed hardware was compared with different
available options. The results showed the efficacy of the proposed architecture along with
the proposed SEU mitigation techniques.





In recent years, reconfigurable computing machines have introduced a new set of
alternatives for hardware and systems designers. The reconfigurable SoC (system-on-chip)
offers great room for innovation in system architecture because of increasing device
densities and the combination of software targeted tasks with the runtime reconfiguration
of the system. Due to the ability to customize hardware modules, it is possible to optimize
control, data-path and interconnections according to specific algorithm requirements.
Domain specific reconfigurable systems in particular can achieve outstanding benefits
from this paradigm, adapting to the instantaneous needs of an application. Reconfigurable
architectures make it possible to combine the post-fabrication programmability of
processors with the performance (area savings and power consumption) of
application specific integrated circuits (ASICs). As a mid-range solution between generic
(software) and specific (hardware) approaches, domain specific reconfigurable
architectures are a valuable tool for design space exploration.
This final chapter summarizes the work presented in the main body of this thesis,
evaluates the extent to which the original objectives have been accomplished and shows
the contribution to research on the efficient realization of a JPEG-2000 DWT architecture
which can withstand single event effects.
Contemporary multimedia and telecommunication applications bring new challenges to 
designers. Besides the high performance demands of various complex algorithms, the 
following two design aspects are very important: 
• Power consumption per function 
• Hardware flexibility to reconfigure for specific tasks 
Chapter 9: Summary 
The success of many embedded systems is directly related to a low-power design that
maximizes the device autonomy. The flexibility of reconfigurable systems is also a key
factor. It allows the modification of the implemented tasks/products through the addition
of new services or the redefinition of algorithm functionality. These
features result in important commercial advantages that are not found in ASICs.
Multimedia operations include algorithms that require heavy real-time processing. Image 
analysis and machine vision solutions are important in many industrial and military 
applications such as robotics, security, medical imaging, and scene inspection. These 
applications have regular characteristics and an inherent level of parallelism that can be 
exploited by reconfigurable systems. 
The theme of this work and thesis has been to propose an efficient reconfigurable
architecture which can withstand the effects of radiation. We looked at different existing
reconfigurable architectures and found that most of them offer flexibility which
may not be required for the target application. This extra flexibility comes at the expense
of power consumption, area and speed. Therefore, it was deduced that if a reconfigurable
architecture is tailored for a particular domain, this extra cost can be saved. It is worth
mentioning that the design engineer will have the domain information
before starting the design work. As explained before, DWT was selected as the domain for
this research work. We studied the existing algorithms in the DWT domain and looked into
VLSI implementation techniques for these algorithms to identify the required
hardware resources. These implementation options can be limited, depending on the nature
of the application, or unlimited. If the options are limited, then maximum area and
power savings can be achieved while maintaining 100% of the required flexibility. If the
options are unlimited, then a comprehensive study of all the options is required to
incorporate maximum flexibility while maintaining the performance edge over a generic
FPGA. We proposed a reconfigurable architecture for the DWT domain on the basis of the
above-mentioned studies. We implemented different DWT algorithms to verify the desired
flexibility of the proposed architecture. We implemented the same algorithms on a
general purpose FPGA and compared the results. The results confirmed the
performance edge of the proposed architecture over a general purpose FPGA. We achieved
a 42-52% power saving with an 18-27% speed gain over a general purpose Xilinx Virtex-E
FPGA. The main reason for this performance edge is the design of more specific coarse
grain CLBs for the DWT domain, which helped to reduce the interconnect load. Power
consumption due to interconnects in a general purpose FPGA is approximately 70% of the
total power consumption. Therefore, tailoring the architecture for a particular domain
resulted in a comprehensive performance gain.
After developing a reconfigurable architecture, we looked into different SEU/SET
mitigation techniques. The majority of SEU/SET techniques are based on hardware
redundancy. We combined temporal and hardware redundancy to gain a performance edge
over existing mitigation techniques. The combination of temporal and hardware redundancy
helped to mitigate the effects of SETs in clocks and control signals. The temporal
redundancy helped to analyse different data samples at different time intervals without
any extra hardware. We incorporated weighted voting logic, which helped to mitigate more
than one fault. In addition, we incorporated self checking mechanisms for the voting logic
to mitigate voter circuit faults. This self checking feature, along with temporal sampling,
makes the proposed SEU/SET technique power and area efficient. We implemented the
proposed mitigation techniques on our reconfigurable architecture and found that a 29-36%
power saving can be achieved. The results therefore indicate that the proposed schemes,
along with the proposed reconfigurable architecture, are more efficient in terms of
performance and well suited for mission critical applications.
9.2 Future Work 
Although this thesis has striven to give a rigorous investigation of the research objectives
outlined in Chapter-1, there are still areas which can potentially add to the knowledge
already gained from the research presented.
An efficient automated tool for placement and routing can improve the performance of
the proposed reconfigurable array. An investigation of possible alternatives to VPR will
benefit the design in terms of area and power consumption.
Efforts to develop an efficient hardware mapping tool for the proposed schemes will
improve the performance and mapping time. Investigation and development of algorithms
to efficiently realize SEU hardened hardware from a C-like description would
shorten the design cycle.
In Chapter-8 the inadequacy of the current and conventional power evaluation and
modelling tools was highlighted. Research into better power evaluation methods and the
development of new tools that provide energy consumption figures and a profile log are
required.
9.3 Final Comments 
Efficient SEU hardened reconfigurable design and engineering is an important area of
research, as it is one of the main factors that decide the viability of an application for
widespread usage in aerospace and in commercial industry. As transistor sizes
shrink, SEU effects are becoming more and more prominent. Researchers are therefore
working towards making electronics more reliable.
The literature review presents details of three areas: firstly, reconfigurable computing and
related work; secondly, single event upsets; and lastly, the discrete wavelet transform.
The motivation of this research is to realize an efficient and reliable domain specific
JPEG-2000 discrete wavelet transform based reconfigurable core. It is concluded that the
use of algorithmic and architectural level optimization significantly reduces the power
consumption of the JPEG-2000 DWT based reconfigurable architecture. It is shown that
an architecture can be made more efficient in terms of area, speed and power by designing
it for a particular domain while keeping the required flexibility for that domain. It is also
concluded that the proposed SEU mitigation techniques are more power and area efficient
than the most commonly used techniques.
References 
[ABI-WEB] R. Abielmona, "Alphabetical List of Reconfigurable Computing
Architectures," http://www.site.uottawa.ca/~rabielmo/personal/rc.html
[ABO-001] Prof. E Abou-fadel, "JPEG 2000: The Next Compression Standard using
wavelet technology". Grand Valley State University, 2001.
http://www.gvsu.edu/math/wavelets/student_work/EF/index.html
[ACT-097] ACTEL. Design Techniques for Radiation-Hardened FPGAs. Application 
Note. http://www.actel.com  (Sep. 1997). 
[AHDL-00] Active-HDL, Complete FPGA Verification Environment by ALDEC, 
www.aldec.com . 
[ALF-098] Alfke, P; Padovani, R. "Radiation Tolerance on High-density FPGAs". 
http://www.xilinx.com  (Oct. 1998). 
[ALT-098] ALTERA Corporation. Data Sheet. http://www.altera.com, Nov. 1998.
[ALT-BUK] Altera data book sheets, http://www.altera.com/products/prd-index.html.  
[AND-083] J. L. Andrews, J. E. Schroeder, B. L. Gingerich, W. A. Kolasinski, R.
Koga, and S. E. Diehl, "Single event error immune CMOS RAM," IEEE
Transaction on Nuclear Science, vol. 29, pp. 2040-2043, Dec. 1982.
[ANG-000] Anghel and M. Nicolaidis, "Cost reduction and evaluation of a 
temporary faults detecting technique," in Proc. Design Automation and 
Test Europe, 2000, pp.  672-678. 
[ANT-092] Anthoni, M Barlaud, P Mathieu, I Daubechies, "Image coding using 
wavelet transform". IEEE transaction on image processing, Vol. 1, 1992. 
pp. 280-288. 
[ART-096] Arthur Abnous, J. Rabey, "Ultra low power domain specific multimedia
processors", Proceedings of IEEE VLSI signal processing workshop,
USA, 1996, pp. 341-345.
[BALO-007] S. Baloch, T. Arslan, A. Stoica, "Radiation Hardened Coarse-Grain
Reconfigurable Architecture for Space Applications", accepted
as a regular paper for the 14th Reconfigurable Architectures Workshop
(RAW 2007), to be held in March 2007 at Long Beach, California, USA, 6
pages.
[BALO-066] S. Baloch, T. Arslan, A. Stoica, "An Efficient Fault Tolerance Scheme for
Preventing Single Event Disruptions in Reconfigurable Architectures",
IEEE Field Programmable Logic and Applications, 2005, on 24-26 Aug.
2006, Madrid, Spain, pp. 618-621.
[BALO-006] S. Baloch, T. Arslan, A. Stoica, "Design of a single event upset (SEU)
mitigation technique for programmable devices", IEEE Quality Electronic
Design, 2006. ISQED '06. 7th International Symposium on, 27-29 March
2006, 4 pages. Digital Object Identifier 10.1109/ISQED.2006.46.
[BALO-06a] S. Baloch, T. Arslan, A. Stoica, "Design of a Novel Soft Error Mitigation
Technique for Reconfigurable Architectures", IEEE Aerospace
Conference, 2006 IEEE, 04-11 March 2006, pp. 1-9.
[BALO-06b] S. Baloch, T. Arslan, A. Stoica, "An Efficient Technique for Preventing
Single Event Disruptions in Synchronous and Reconfigurable
Architectures", Adaptive Hardware and Systems, 2006. AHS 2006. First
NASA/ESA Conference on, 15-18 June 2006, pp. 292-295.
[BALO-006c] S. Baloch, T. Arslan, A. Stoica, "Probability Based Partial Triple Modular
Redundancy Technique for Reconfigurable Architectures", IEEE
Aerospace Conference, 2006 IEEE, 04-11 March 2006, pp. 1-7.
[BALO-06d] S. Baloch, T. Arslan, A. Stoica, "Embedded Reconfigurable Array Fabrics
for Efficient Implementation of Image Compression Techniques", First
NASA/ESA Conference on Adaptive Hardware and Systems, 2006. AHS
2006, pp. 15-18.
[BALO-05a] S. Baloch, T. Arslan, A. Stoica, "Efficient Error Correcting Codes for On-
Chip DRAM Applications for Space Missions", IEEE Aerospace, 2005
IEEE Conference, March 2005, pp. 1-9.
[BALO-05b] S. Baloch, T. Arslan, A. Stoica, "Low power domain-specific
reconfigurable array for discrete wavelet transforms targeting multimedia
applications", Field Programmable Logic and Applications, 2005.
International Conference on, 24-26 Aug. 2005, pp. 618-621.
[BALO-05c] S. Baloch, T. Arslan, A. Stoica, "Domain-specific reconfigurable array
targeting discrete wavelet transform for system-on-chip applications",
IEEE Parallel and Distributed Processing Symposium, 2005. Proceedings.
19th IEEE International, 4-8 April 2005, 4 pages.




[BAZ-097] Baze, M.P. and S.P. Buchner, "Attenuation of Single Event Induced
Pulses in CMOS Combinatorial Logic," IEEE Transactions on Nuclear
Science, Vol. 44, No. 6, December 1997, pp. 2217-2223.
[BEC-002] Becker J., "Configurable systems-on-chip," Proceedings, 15th
Symposium on Integrated Circuits and Systems Design, 2002, vol. 7, pp.
379-384.
[BEN-002] K. C. B. Tan and T. Arslan, "An Embedded Extension Algorithm for the
Lifting based Discrete Wavelet Transform in JPEG2000," 2002 IEEE
International Conference on Acoustics, Speech, and Signal Processing
(ICASSP 2002), vol. 4, pp. 3513-3516, May 2002.
[BEN-004] K. C. B. Tan and T. Arslan, "Shift-accumulator ALU Centric JPEG2000
5/3 Lifting based Discrete Wavelet Transform Architecture," 2003 IEEE
International Symposium on Circuits and Systems (ISCAS 2003), vol. 5,
pp. 161-164, May 2003.
[BEN-MRK] http://www.eecs.umich.edu/~jhayes/iscas/benchmark.html
[BES-093] Bessot, Denis. Conception de Deux Points Memoire Statiques CMOS
durcis Contre L'effet des Aleas Logiques Provoques par L'environment
Radiatif Spatial. INPG, France, November 1993, pp. 98-106.
[BET-099] Betz, Vaughn; Rose, Jonathan, "FPGA Routing Architecture:
Segmentation and Buffering to Optimize Speed and Density," in IEEE
conference on FPGA, Feb. 1999, pp. 133-139.
[BIN-075] D. Binder, E. C. Smith, and A. B. Holman, "Satellite anomalies from
galactic cosmic rays," IEEE Transactions on Nuclear Science, vol. 22, pp.
2675-2680, Dec. 1975.
[BIT-097] Ray Bittner and Peter Athanas, "Wormhole Run-time Reconfiguration,"
Proc. of FPGA international conference, Monterey, CA, February 1997,
pp. 67-73.
[BOY-097] Boyarinov, I., Farrell, P., Markarian, G., and Honary, B., "Random
Double-Bit Error Correcting Generalized Array Codes," Electronics Letters,
1997, pp. 382-383.
[BRY-098] O'Bryan, Martha; LaBel, Kenneth A.; Reed, Robert A.; Barth, Janet;
Seidleck, C.; Marshall, Paul; Marshall, C.; Carts, M., "Single Event Effect
and Radiation Damage Results For Candidate Spacecraft," in IEEE
NSREC Conference, 1998, pp. 187-191.
[BUC-097] Buchner, S., M. Baze, D. Brown, D. McMorrow, and J. Melinger,
"Comparison of Error Rates in Combinatorial and Sequential Logic",
IEEE Transactions on Nuclear Science, Vol. 44, No. 6, December 1997, 
pp. 2209-2216. 
[CAL-000] Callahan, T.J.; Hauser, J.R.; Wawrzynek, J., "The Garp architecture and C
compiler," IEEE Computer, Volume 33, Issue 4, April
2000, pp. 62-69, Digital Object Identifier 10.1109/2.839323.
[CAL-096] Calin, T., M. Nicolaidis, and R. Velazco, "Upset Hardened Memory
Design for Submicron CMOS Technology," IEEE Transactions on
Nuclear Science, Vol. 43, No. 6, December 1996, pp. 2874-2878.
[CAL-96a] Calin, T.; Velazco, R.; Nicolaidis, M.; Moss, S.; Lamondiere, S. D.; Tran,
V. T.; Koga, R., "Topology-Related Upset Mechanisms in Design Hardened
Storage Cells," Proceedings of NSREC Conference, 1996, pp. 33-37.
[CAL-96b] Calin, T.; Nicolaidis, M.; Velazco, R., "Upset Hardened Memory Design
for Submicron CMOS Technology," in IEEE Transactions on Nuclear
Science, vol. 43, pp. 1387-1393, December 1996.
[CAM-099] Carmichael, C.; Fuller, E.; Blain, P.; Caffrey, M., "SEU Mitigation
Techniques for Virtex FPGAs in Space Applications,"
http://www.xilinx.com (Sep. 1999).
[CAR-001] Carmichael, C., Fuller, E., Fabula, J., Lima, F., "Proton Testing of SEU
Mitigation Methods for the Virtex FPGA," Proc. of MAPLD, 2001, USA,
pp. 117-121, www.klabs.org/richcontent/MAPLDCon01/.
[CAR-004] C. Ebeling, C. Fisher, G. Xing, M. Shen, H. Liu, "Implementing an
OFDM Receiver on the RaPiD Reconfigurable Architecture," IEEE
Transactions on Computers, Vol. 53, No. 11, pp. 233-239, November 2004.
[CAR-011] Carmichael, C., "Triple Module Redundancy Design techniques for Virtex
FPGA," Xilinx Application Notes, p. 197, vol. 1.0, March 2001.
[CAR-096] Carro, L.; Pereira, G.; Suzin, A., "Prototyping and Re-engineering of
Microcontroller-Based Systems," in IEEE Rapid Systems Prototyping
Workshop, Proceedings, June 1996, USA, pp. 21-27.
[CHA-090] S. Chakravarty and H. B. Hunt, III, "On computing signal probability and
detection probability of stuck-at faults," IEEE Transactions on Computers,
vol. 39-11, Nov. 1990, pp. 1369-1377.
[CHA-093] H. Cha and J. H. Patel, "A logic-level model for α-particle hits in CMOS
circuits," in Proceedings of the International Conference on Computer
Design, pp. 538-542, Oct. 1993.
[CHA-094] 	H. Cha and J. Patel, "Latch design for transient pulse tolerance," in Proc. 




[CHA-096] H. Cha, E. M. Rudnick, J. H. Patel, R. K. Iyer, and G. S. Choi, "A gate-
level simulation environment for alpha-particle-induced transient faults,"
IEEE Trans. on Computers, vol. 45, pp. 1248-1256, Nov. 1996.
[CHA-WEB] Chameleon Systems Corp., CS2000 Advance Product Specification.
Chameleon Systems, Inc., San Jose, CA, 2000.
http://www.chameleonsystems.com/
[CHI-WEB] Chipworks, Inc., "Xilinx XC2V1000 Die Size and Photograph,"
Chipworks, Inc., Ottawa, Canada, 2002. www.chipworks.com
[COM-099] K. Compton and S. Hauck, "Configurable Computing: A Survey of
Systems and Software," IEEE transaction on computer, 1999, vol. 87, pp.
182-208.
[CRO-092] Cronquist, Brian, Richard B. Katz, Jib-Jong Wong, John McCollum, Igor
Kleyner, Ingrid Brill, W. Parker, Kenneth A. LaBel, "Radiation-
Hardened/High-Reliability Programmable Logic Using Modified
Commercial-off-the-Shelf Technology," Proceedings MAPLD
Conference, 2000, pp. 114-120.
www.klabs.org/richcontent/MAPLDCon00/.
[DAN-002] Dang, P.P.; Chau, P.M., "Integer fast wavelet transform and its VLSI
implementation for low power applications," IEEE Workshop on Signal
Processing Systems, 2002 (SIPS02), 16-18 Oct. 2002, pp. 93-98.
[DAU-098] I. Daubechies and W. Sweldens, "Factoring wavelet transform into lifting
steps," Journal of Fourier Analysis and Applications, vol. 4, pp. 247-269,
1998.
[DeH-004] DeHon A., Wawrzynek J., "Reconfigurable Computing: What, Why, and
Implications for Design Automation," Proceedings of the Design,
Automation and Test in Europe Conference and Exhibition Designers'
Forum (DATE'04), 1530-1591/04, 2004, pp. 610-615.
[DEP-098] J. Depreitere, H. Van Marck and J. Van Campenhout, "Evaluation of
FPGA Switch Matrices using a Monte Carlo Approach," in 12th DCE
conference, 1998, pp. 54-60. http://www.elis.rug.ac.be/~jdp/DCE98.pdf
[DIC-000] Dick, C., "FPGA: The High-End Alternative For DSP Applications,"
International symposium on field programmable logic, FPL-2000, pp. 112-118.
[DIE-083] Diehl, S.E.; Jaeger, R.C.; Gaensslen, "An Efficient Numerical Algorithm
for Simulation of MOS Capacitance," Computer-Aided Design of
Integrated Circuits and Systems, IEEE Transactions, Volume 2, Issue 2,
April 1983, pp. 111-116.
[DIE-084] S. E. Diehl-Nagle, J. E. Vinson, and E. L. Peterson, "Single event upset
rate predictions for complex logic systems," IEEE Transactions on
Nuclear Science, vol. 31, pp. 1132-1138, Dec. 1984.
[DOD-095] Dodd, P.E. and F.W. Sexton, "Critical Charge Concepts for CMOS
SRAMs," IEEE Transactions on Nuclear Science, Vol. 42, No. 6,
December 1995, pp. 1764-1771.
[DOO-090] Dooley, J.G., "SEU-Immune Latch for Gate Array, Standard Cell, and
other ASIC Applications," United States Patent Number 5,311,070.
[DSP-JOU] DSP journal May/June 2000 Inaugural Issue,
"www.dsp.dla.mil/newsletters/journal/DSP-05-00.pdf"
[DUP-002] Dupont, D., Nicolaidis, M., Rohr, P., "Embedded Robustness IPs for
Transient-Error-Free ICs," 8th IEEE Design and Test of Computers, May
2002, pp. 180-186.
[EBE-096] C. Ebeling, D. C. Cronquist, P. Franklin, "RaPiD - Reconfigurable
Pipelined Datapath," IEEE symposium on Field-Programmable Logic,
Berlin, Germany, pp. 126-135, 1996.
[FAC-099] F. Faccio, K. Kloukinas, A. Marchioro, T. Calin, J. Cosculluela, M.
Nicolaidis, and R. Velazco, "Single event effects in static and dynamic
registers in a 0.25 µm CMOS technology," IEEE Transactions on
Nuclear Science, vol. 46, pp. 1434-1439, Dec. 1999.
[FAR-093] H. Farhat, A. Lioy and M. Pocino, "Computation of exact random pattern
detection probability," Proceedings of IEEE Custom Integrated Circuits
conference, vol. 9-12, May 1993, pp. 2671-2674.
[FRI-085] A. L. Friedman, B. Lawton, K. R. Hotelling, J. C. Pickel, V. H. Strahan,
and K. Loree, "Single event upset in combinational and sequential current
mode logic," IEEE Trans. Nuclear Science, vol. 32, pp. 4216-4217, Dec.
1985.
[GEO-001] Georgi K., Bahman Z., Prarthana S., Stamatis V., "Reconfigurable DWT
unit based on lifting," 12th Annual Workshop on Circuits, Systems, and
Signal Processing (ProRISC), April 2002, pp. 219-230.
[GID-085] A. E. Giddings, F. W. Hewlett, R. K. Treece, D. K. Nichols, L. S. Smith,
and J. A. Zoutendyk, "Single event upset immune integrated circuits for
Project Galileo," IEEE Transactions on Nuclear Science, vol. 32, pp.
4159-4163, Dec. 1985.
[GOL-000] 	S. Goldstein et al. "PipeRench: A Reconfigurable Architecture and 




[GOL-099] S. C. Goldstein et al., "PipeRench: A Coprocessor for Streaming
Multimedia Acceleration," IEEE proceedings of ISCA '99, Atlanta, May
2-4, 1999, vol. 9, pp. 1180-1186.
[GUE-079] C. S. Guenzer, E. A. Wolicki, and R. G. Allas, "Single event upset of
dynamic RAM's by neutrons and protons," IEEE Transactions on Nuclear
Science, vol. 26, pp. 5048-5053, Dec. 1979.
[HAM-050] R. W. Hamming, "Error Detecting and Error Correction Codes," Bell
System Technical Journal, vol. 29:147-160, April 1950, pp. 113-135.
[HAR-001] R. Hartenstein, "A Decade of Reconfigurable Computing: A Visionary
Retrospective," Proc. DATE 2001 Conference, Munich, Germany, pp.
642, Mar. 2001.
[HAR-011] S. Hareland, J. Maiz, M. Alavi, K. Mistry, S. Walsta, and C. Dai, "Impact
of CMOS process scaling and SOI on soft error rates of logical
processes," in Symposium on VLSI Technology Digest of Technical
Papers, IEEE, 2001, pp. 73-74.
[HAR-01A] R. Hartenstein, "Coarse Grain Reconfigurable Architectures," Coarse
Grain Reconfigurable Architecture ASP-DAC 2001, Asia and South
Pacific Design Automation Conference 2001, Jan. 2001, Yokohama,
Japan, pp. 564-569.
[HAR-096] R. Hartenstein et al., "A Novel Machine Paradigm to Accelerate Scientific
Computing," Special issue on Scientific Computing of Computer Science
and Informatics Journal, Computer Society of India, vol. 2, 1996,
pp. 243-248.
[HAR-098] R. Hartenstein, "Using The KressArray for Reconfigurable Computing,"
2nd Conf. on Configurable Technology and Applications, Boston, Nov.
1998, pp. 310-318.
[HAS-000] K. Joe Hass, "Probabilistic Estimates of Upset Caused by Single Event
Transients," IEEE Transactions on Nuclear Science, vol. 37,
pp. 1007-1013, Dec. 2000.
[HAS-098] K. Hass, J. Gambles, B. Walker, and M. Zampaglione, "Mitigating single
event upsets from combinational logic," in Proceedings of 7th NASA
Symposium on VLSI design, USA, 1998, pp. 152-156.
[HAU-097] J. R. Hauser, J. Wawrzynek, "Garp: A MIPS Processor with a
Reconfigurable Coprocessor," IEEE Symposium on Field-Programmable
Custom Computing Machines, pp. 12-21, March 1997.
[HAY-098] 	S. D. Haynes, P. Y. K. Cheung, "A Reconfigurable Multiplier Array For 
Video Image Processing Tasks, Suitable for Embedding in an FPGA 
Structure", IEEE Symposium on Field-Programmable Custom Computing 
Machines, pp. 226-234, March 1998.
[HAZ-000] P. Hazucha and C. Svensson, "Impact of CMOS Technology Scaling on the
Atmospheric Neutron Soft Error Rate," IEEE Transactions on Nuclear
Science, Vol. 47, No. 6, pp. 2586-2594, Dec. 2000.
[HER-002] Herman, S. et al., "Piperench: A virtualized programmable data path in
0.18 micron technology," Proceedings of the IEEE Custom Integrated
Circuits Conference, May 2002, pp. 63-66.
[HIR-001] Hiroshi I., "Direction of Silicon Technology from Past to Future,"
Proceedings of 8th IPFA 2001, Singapore, pp. 650-654, 0-7803-6675-1/01.
[HOL-091] K.C. Holland, J.G. Tront, "Probability of Latching Single Event Upset
errors in VLSI Circuits," 7-10th April 1991, pp. 109-113, vol. 1,
Digital Object Identifier 10.1109/SECON.1991.147715.
[JON-095] W-B. Jone and S. R. Das, "CACOP-a random pattern testability
analyzer," IEEE Transactions on Man and Cybernetics, vol. 25-5, pp.
865-871, 1995.
[JPG-200] ISO/IEC 15444-1, "An information Technology-JPEG-2000 image coding
system-Part 1: Core design-System," http://www.jpeg.org/JPEG2000.html
[KAF-003] Kafafi N., Bozman K., Wilton S.J.E., "Architectures and Algorithms for
Synthesizable Embedded Programmable Logic Cores," ACM
International Symposium on FPGA, Monterey, CA, Feb. 2003, pp. 78-82.
[KAT-094] Katz, R.; Barto, R.; McKerrachier, P.; Carkhuff, B.; Koga, R., "SEU
Hardening of Field Programmable Gate Arrays (FPGAs) for Space
Application and device characterization," NSREC Conference, 1994, USA,
pp. 167-171.
[KAT-096] Katz, R. et al., "An SEU-Hard Flip-Flop for Antifuse FPGAs," Washington
DC, USA, MAPLD, 2001, pp. 61-67.
www.klabs.org/richcontent/MAPLDCon01/.
[KAT-097] Katz, R.; LaBel, K.; Wang, J.; Cronquist, B.; Koga, R.; Penzin, S.; Swift,
G., "Radiation Effects on Current Field Programmable Technologies," in
NSREC Conference, 1997, pp. 234-238.
[KAT-098] Katz, R.; Wang, J.; LaBel, K.; McCollum, J.; Brown, R.; Reed, R.;
Cronquist, B.; Cram, S.; Scott, T.; Paolini, W.; Sin, B., "Current Radiation
Issues for Programmable Elements and Devices," IEEE NSREC
Conference, 1998, pp. 23-28.
[KAZ-001] Kazeminejad, A., "Fast, Minimal Decoding Complexity, Systematic
(13,8) Single-Error-Correcting Codes for on-chip DRAM Applications,"
IEEE Electronics Letters, 2001, pp. 1208-1209.
[KAZ-01a] Kazeminejad, A. and B. Eric, "Fast, Minimal Decoding Complexity,
System Level, Binary Systematic (41,32) Single-Error-Correcting Codes
for on-chip DRAM Applications," Proceedings of IEEE international
symposium on defect and fault tolerance in VLSI, 2001, pp. 122-126.
[KEN-096] Kenneth P. Parker and Edward J. McCluskey, "Analysis of Logic Circuits
with Faults Using Input Signal Probabilities," IEEE 0-8186-7150-5/96,
1996 IEEE Proceedings of FTCS-25, Volume III, pp. 1.8-1.12.
[KOL-099] W. A. Kolasinski, J. B. Blake, J. K. Anthony, W. E. Price, and E. C.
Smith, "Simulation of cosmic-ray induced soft errors and latchup in
integrated-circuit computer memories," IEEE Trans. Nuclear Science,
vol. 26, pp. 1434-1439, Dec. 1999.
[KRI-093] R. Krieger, B. Becker and R. Sinkovic, "A BDD-based Algorithm for
computation of exact fault detection probabilities," Digest of Papers,
Fault-Tolerant Computing, vol. 22-24, June 1993, pp. 186-195.
[LAB-099] LaBel, K., "Commercial Microelectronics Technologies for Applications
in the Satellite Radiation Environment,"
http://flick.gsfc.nasa.gov/radhome.htm (Nov. 1999).
[LEM-002] Lemieux G., Lewis D., "Circuit Design of Routing Switches," ACM
International Symposium on FPGA, Monterey, CA, Feb. 2002, pp. 71-77.
[LIA-001] C-J Lian, K-F Chen, H-H Chen and L-G Chen, "Lifting based discrete
wavelet transform architecture for JPEG-2000," IEEE international
symposium on circuits and systems, 2001, pp. 12-18.
[LIA-002] C-J Lian, C. Chakrabarti and T. Acharya, "A VLSI architecture for lifting
based forward and inverse wavelet transforms," IEEE transaction on
signal processing, Vol. 50, pp. 330-338.
[LIU-092] M. Whitaker, "Low Power SEU immune CMOS Memory Circuits,"
NSREC Workshop, 1992, vol. 47, pp. 17-21.
[LID-094] P. Liden, P. Dahlgren, R. Johansson, and J. Karlsson, "On latching
probability of particle induced transients in combinational networks," in
Proceedings of the 24th International Symposium on Fault-Tolerant
Computing, pp. 340-349, June 1994.
[LIM-00I1 C.; Fuller, E.; Fabula, J.; Lima, F. Proton "Designing and testing fault- 
tolerant techniques for SRAM-based FPGAs", Proceedings of the 1st 
conference on Computing frontiers, Special session on reconfigurable
computing, pp. 419-432, 2004, ISBN: 1-58113-741-9.
[LIM-001] Lima, F., Carmichael, C., Fabula, J., Padovani, R., Reis, R., "A Fault
Injection Analysis of Virtex® FPGA TMR Design Methodology,"
Proceedings of IEEE RADECS, Sept. 2001, vol. 34, pp. 1120-1128.
[LIM-003] Lima, F., Carro, L., Reis, R., "Designing fault tolerant systems into
SRAM-based FPGAs," IEEE Design Automation Conference (DAC)
2003, pp. 315-321.
[LIM-03a] Lima, F., Carro, L., Reis, R., "Reducing Pin and Area overhead in
Fault-Tolerant FPGA-based Designs," IEEE FPGA conference 2003, California,
USA, pp. 76-85.
[LO-093] J. Lo, "A novel area-time efficient static CMOS totally self-checking
comparator," IEEE Journal of Solid-State Circuits, vol. 28, pp. 165-168,
Feb. 1993.
[LU-099] G. Lu et al., "The Morphosys Parallel Reconfigurable System,"
Proceedings of Euro-Par 99, Toulouse, France, Sept. 1999, vol. 56, pp.
988-996.
[MAR-093] S. A. Martucci and R. M. Mersereau, "The symmetric convolution
approach to the nonexpansive implementations of FIR filter banks for
images," in 1993 IEEE International Conference on Acoustics, Speech,
and Signal Processing, ICASSP93, vol. 5, pp. 65-68, April 1993.
[MAR-99a] A. Marshall, T. Stansfield, I. Kostarnov, J. Vuillemin, B. Hutchings, "A
Reconfigurable Arithmetic Array for Multimedia Applications,"
ACM/SIGDA International Symposium on FPGAs, pp. 135-143, 1999.
[MAR-099] Martinez-Peiro, M.; Valls, J.; Sansaloni, T.; Pascual, A.P.; Boemo, "A
comparison between lattice, cascade and direct form FIR filter structures
by using a FPGA bit-serial distributed arithmetic implementation,"
Electronics, Circuits and Systems, 1999. Proceedings of ICECS 1999, pp.
106-112.
[MAS-000] L. W. Massengill, A. E. Baranski, D. O. V. Nort, J. Meng, and B. L.
Bhuva, "Analysis of Single-Event Effects in Combinational Logic -
Simulation of the AM2901 Bit slice Processor," IEEE Trans. on Nuclear
Science, vol. 47, pp. 2609-2615, Dec. 2000.
[MAS-093] Massengill, L., "SEU Modeling and Prediction Techniques," IEEE
Nuclear and Space Radiation Effects Conference, 1993, pp. 345-351.
[MAV-002] Mavis, D.G., Eaton, P.H., "SEU and SET mitigation techniques for
FPGA circuit and configuration bit storage Design," MAPLD International
conference, Washington DC, USA, 2002, pp. 16-22.
[MAV-098] Mavis, D., B. Cox, D. Adams, and R. Greene, "A Reconfigurable,
Nonvolatile, Radiation Hardened Field Programmable Gate Array (FPGA)
for Space Applications," Proceedings MAPLD Conference, 1998,
pp. 88-96.
[MAY-079] T.C. May, M.H. Woods, "Alpha-particle-induced soft errors in dynamic
memories," IEEE Trans. on Electron Devices, vol. 26, no. 1, pp. 2-9, Jan.
1979.
[MAY-79a] T. C. May, "Soft errors in VLSI: Present and future," IEEE Trans.
Components, Hybrids, Manuf. Tech., vol. 2, pp. 377-387, Dec. 1979.
[McI-002] McIver, G.W., J.R. Marum, and J.B. Cho, "Triple Redundant
Fault-Tolerant Register," United States Patent Number 5,031,180.
[MET-000] C. Metra, M. Favalli, and B. Ricco, "Self-checking detection and
diagnosis of transient, delay, and crosstalk faults affecting bus lines,"
IEEE Transactions on Computers, vol. 49, pp. 560-574, June 2000.
[MIC-002] Michael Bedford Taylor et al., "The RAW microprocessor: A
computational fabric for software circuits and general-purpose programs,"
IEEE Micro 22 (2002), no. 2, pp. 25-35.
[MIR-096] E. Mirsky and A. DeHon, "MATRIX: A Reconfigurable Computing
Architecture with Configurable Instruction Distribution and Deployable
Resources," IEEE Symposium on FPGAs for Custom Computing
Machines, April 1996, Napa, CA, pp. 157-166.
[MIY-098] T. Miyamori, K. Olukotun, "A Quantitative Analysis of Reconfigurable
Coprocessors for Multimedia Applications," IEEE Symposium on
Field-Programmable Custom Computing Machines, pp. 2-11, 1998.
[MOH-003] K. Mohanram and N. A. Touba, "Partial error masking to reduce soft error
failure rate in logic circuits," in Proc. International Symposium on Defect
and Fault Tolerance in VLSI Systems, 2003, pp. 433-440.
[MOH-03a] K. Mohanram and N. A. Touba, "Cost-Effective Approach for Reducing
Soft Error Failure Rate in Logic Circuits," International Test Conference,
2003, pp. 893-901.
[MON-003] P. Mongkolkachit and B. Bhuva, "Design Technique for Mitigation of
Alpha-Particle-Induced Single-Event Transients in Combinational Logic,"
Digital Object Identifier 10.1109/TDMR.2003.816568, IEEE Transactions
on Device and Material Reliability, Vol. 3, No. 3, pp.  1065-1075, Sep 
2003. 
[MOO-075] Moore, G. E., "Progress in Digital Integrated Electronics," Technical
Digest of the IEEE IEDM, issue 2, 1975, pp. 32-36.
[MOO-088] Moore, G. E., "Progress in Digital Integrated Electronics," Technical
Digest of the IEEE IEDM, issue 5, 1988, pp. 43-49.
[MOR-098] C. A. Moritz, D. Yeung, A. Agarwal, "Exploring Optimal Cost
Performance Designs for Raw Microprocessors," IEEE Symposium on
Field-Programmable Custom Computing Machines, pp. 12-27, 1998.
[MOR-WEB] MorphICs web site, http://www.morphics.com.
[MUK-003] S. Mukherjee, C. Weaver, J. Emer, S. Reinhardt, and T. Austin, "A
systematic methodology to compute the architectural vulnerability factors
for a high-performance microprocessor," in International Symposium on
Micro-architecture, Dec. 2003, pp. 338-342.
[MUS-001] Museau, Oliver; Ferlet-Cavois, "Silicon-on-Insulator Technology:
Radiation Effects," in IEEE NSREC conference, 2001, pp. 276-288.
[MUS-097] Musaed A. Al-Kharji, Sami A. Al-Arian, "A New Heuristic Algorithm for
Estimating Signal and Detection Probabilities," IEEE, glsvlsi, 7th Great
Lakes Symposium on VLSI, 1997, pp. 243-249.
[MUZ-093] Mazumder, P., "Design of a Fault-Tolerant Three-Dimensional Systematic
random-access Memory with on-chip Error-Correcting Circuit," IEEE
Transactions on Computers, 1993, 42, (12), pp. 1453-1468.
[NAG-073] Nagel, L.W. and D.O. Pederson, Simulation Program with Integrated
Circuit Emphasis (SPICE), Electronics Research Laboratory, Technical
Report Number ERL-M382, pp. 110-114, University of California,
Berkeley, 1973.
[NAS-WEB] http://www.sti.nasa.gov/thesfrmi.htm
[NIC-099] M. Nicolaidis, "Time redundancy based soft-error tolerance to rescue
nanometer technologies," Proceedings of International VLSI Test
Symposium, 1999, pp. 872-992.
[NOR-096] Normand, E., "Single Event Upset at Ground Level," in IEEE
Transactions on Nuclear Science, Vol. 43, pp. 771-781, December 1996.
[NTR-094] 	SIA Semiconductor Industry Association. The National Technology 




[NTR-099] SIA Semiconductor Industry Association. The National Technology
Roadmap for Semiconductors. 1999, 2nd Ed. Prentice Hall, Chapter-5, pp.
101-119.
[OCH-098] E. S. Ochotta, P. J. Crotty, C. R. Erickson, C.-T. Huang et al., "A novel
predictable segmented FPGA routing architecture," ACM International
Symposium on FPGA, Monterey, CA, Feb. 1998, pp. 221-227.
[OPP-097] A. V. Oppenheim and R. W. Schafer, "Digital signal processing," 3rd Ed.,
"Chapter-2 - Transform Theory," pp. 28-39, Prentice-Hall, 1997.
[PAN-000] Pandora Inc., "The Seven Sisters at Various Sizes and Quality Rates,"
http://public.migrator2000.org/pandorademo.
[PAT-082] Peterson, E.L., P. Shapiro, J.H. Adams Jr., and E.A. Burke, "Calculations
of Cosmic Ray Induced Soft Upsets and Scaling in VLSI Devices," IEEE
Transactions on Nuclear Science, Vol. 29, No. 6, December 1982, pp.
2055-2063.
[PAT-093] R. Pathak, "A generalized algorithm for bounding fault detection
probabilities in combinational circuits," Auto Test conference,
Proceedings of IEEE Systems Readiness Technology Conference, vol. 20-23,
Sep. 1993, pp. 683-689.
[PAT-097] Peterson, E.L., "Single-Event Analysis and Prediction," IEEE Nuclear
and Space Radiation Effects Conference, 1997, pp. 451-463.
[PEN-005] Pen-Shu Yeh; Armbruster, P.; Kiely, A.; Masschelein, B.; Moury, G.;
Schaefer, C.; Thiebaut, C., "The new CCSDS image compression
recommendation," Aerospace, 2005 IEEE Conference, 5-12 March 2005,
pp. 4138-4145.
[PET-090] Peterson, W. Wesley. Error-correcting codes. 2nd Ed. Cambridge: The
MIT Press, 1980, pp. 560-586. ISBN 0262160390.
[PHI-091] P. Philips, "On computing the detection probability of stuck-at faults in
a combinational circuit," IEEE system readiness Technology Conference,
vol. 24-26, Sep. 1991, pp. 301-305.
[PIC-078] J. C. Pickel and J. T. Blandford, Jr., "Cosmic ray induced errors in MOS
memory cells," IEEE Transactions on Nuclear Science, vol. 25,
pp. 1166-1171, Dec. 1978.
[PRO-096] J. G. Proakis and D. G. Manolakis, Digital signal processing: Principles,




[RAB-097] Rabaey, "Reconfigurable Processing: The Solution to Low Power
Programmable DSP," Proceedings of 1997 ICASSP Conference, Munich,
Vol. 1, pp. 275-278, April 1997.
[RAB-097] Rabaey J., "Reconfigurable Processing: The Solution to Low-Power
Programmable DSP," Proceedings of ICASSP conference 1997, Munich,
April 1997, vol. 1, pp. 318-322.
[RAD-098] B. Radunovic and V. Milutinovic, "A survey of Reconfigurable
Computing Architectures," Proc. of FPL'98 Eighth International
Workshop on Field Programmable Logic and Application, Tallinn, Estonia,
Sept. 1998, pp. 877-887.
[RICA-011] http://www.electronicsweekly.com/ARTICLES/2005/06/14/35611/Reconfigurable/chip/speaks/C.html
[ROC-092] Rockett, L., "SEU Hardened Scaled CMOS SRAM Cell Design Using
Gate Resistors," in IEEE Transactions on Nuclear Science, October
1992, vol. 23, pp. 434-442.
[ROS-090] Rose J., Brown S., "Flexibility of interconnection structures for
field-programmable gate arrays," IEEE Journal of Solid-State Circuits, Vol. 26,
issue 3, 1990, pp. 277-282.
[SAM-004] S. Khawam, T. Arslan, F. Westall, "Synthesizable Reconfigurable Array
Targeting Distributed Arithmetic for System-on-Chip Applications,"
Proceedings of the international parallel and distributed processing
Symposium, IPDPS, 2004, pp. 611-614.
[SAR-001] Sarin G., Joseph B. Evans, "A FPGA Implementation of an adaptive
reconfigurable image encoder,"
http://www.ittc.ku.edu/projects/ACS/documents/hWcOO.pdf
[SAV-084] J. Savir, G. Ditlow, and P.H. Bardell, "Random Pattern Testability," IEEE
Trans. on Computers, vol. C-33, no. 1, pp. 79-90, Jan. 1984.
[SCO-003] K. Compton, S. Hauck, "Totem: Custom Reconfigurable Array
Generation," IEEE Symposium on FPGAs for Custom Computing
Machines, 2001, pp. 112-118.
[SET-085] S. C. Seth, L. Pan, and V. D. Agrawal, "Predict - probabilistic estimation
of digital circuit testability," IEEE International Symposium on
Fault-Tolerant Computing, Jun. 1985, pp. 220-225.
[SEX-092] Sexton, F.W., "Measurement of Single-Event Phenomena in Devices and




[SHA-048] C. E. Shannon, "A Mathematical Theory of Communications," Bell System
Technical Journal, vol. 27, pp. 379-423, 1948.
[SHI-002] P. Shivakumar, M. Kistler, S. W. Keckler, D. Burger, and L. Alvisi,
"Modeling the effect of technology trends on the soft error rate of
combinational logic," in Proc. ACM International Conference on
Dependable Systems and Networks, June 2002, pp. 389-398.
[SID-098] C. Sidney Burrus, Ramesh A. Gopinath and Haitao Guo. Introduction to
Wavelets and Wavelet Transforms - A Primer. Prentice-Hall, 2nd Ed., pp.
312-339, 1998.
[SMI-090] M. J. T. Smith and S. L. Eddins, "Analysis/synthesis techniques for
subband image coding," in IEEE Transactions on Circuits and Systems for
Video Technology, vol. 38, no. 8, pp. 1446-1456, August 1990.
[SMI-099] G. Smit et al., "Chameleon Reconfigure-ability in Hand-held Multimedia
Computers," Proc. First International Symposium on Handheld and
Ubiquitous Computing, international HUC conference proceedings 99,
Karlsruhe, Germany, September 1999, pp. 650-654.
[SUT-098] R. Sutton, V. Srini, and J. Rabaey, "A Multiprocessor DSP System Using
PADDI-2," Proc. of IEEE Design Automation Conference (DAC), 1998,
San Francisco, CA, pp. 62-65.
[SWE-096] W. Sweldens, "The lifting scheme: A new philosophy in biorthogonal
wavelet constructions," in Proceedings of IEEE SPIE, pp. 68-79, 1995.
[SYNP-00] Synplicity, Inc., 600 West California Avenue,
Sunnyvale, CA 94086, USA, www.synplicity.com
[THA-005] Thara R. and Sanjukta B., "An Accurate Probabilistic Model for Error
Detection," Electrical Engineering, University of South Florida, Tampa,
Florida, USA, Proceedings of the 18th International Conference on VLSI
Design held jointly with 4th International Conference on Embedded
Systems Design (VLSID'05), pp. 244-248, 1063-9667/05.
[VAL-004] C. Valens, "A Really Friendly Guide to Wavelets," 2004,
http://perso.wanadoo.fr/polyvalens/clemens/wavelets/wavelets.html
[VEL-094] Velazco, R.; Bessot, D.; Duzellir, S.; Ecoffet, R.; Koga, R., "Two Memory
Cells Suitable for the Design of SEU-Tolerant VLSI Circuits," in IEEE
Transactions on Nuclear Science, Vol. 41, No. 6, pp. 180-188, December
1994.
[VIL-098] J. Villasenor, "The Flexibility of Configurable Computing," IEEE Signal
Processing Magazine, Vol. 15, No. 5, Sept. 1998, pp. 67-84.
[VPR-001] VPR and T-VPack: Versatile Packing, Placement and Routing for FPGAs
[WAI-097] 	E. Waingold, M. Taylor., "Baring it all to software: raw machines, trans 
on IEEE Computer (1997), vol 32: pp.  86-93. 
[WAL-062] J.T. Walimark, S.M. Marcus, "Minimum size and maximum packaging 
density of non-redundant semiconductor devices," Proceedings on IRE, 
1962, vol. 50, pp.  286-298, March 1962. 
[WAN-000} 	M. Wan, et al. "Design Methodology of a Low-Energy Reconfigurable 
Single-Chip DSP System". Journal of VLSI Signal Processing, Mar. 2000, 
vol. 9, pp.  671-678. 
[WAY-0791 	R. C. Wyatt, P. J. McNulty, P. Toumbas, P. L. Rothwell, and R. C. Filz. 
"Soft errors induced by energetic protons," IEEE Transactions on Nuclear 
Science.. vol. 26, pp.  4905-4910. Dec. 1979. 
[WEA-087] 	Weaver. H. "An SEU Tolerant Memory Cell Derived from Fundamental 
Studies of SEU Mechanisms in SRAM'. IEEE Transactions on Nuclear 
Science, December 1987, vol. 34, No. 6, pp.  1102-1110. 
[WHI-089] 	White, S. A., 'Applications of Distributed Arithmetic to Digital Signal 
Processing: A Tutorial Review', IEEE Acoustics, Speech and Signal 
Processing Magazine, issue 34, pp.  419. 1989. 
[WHI-091] 	Whitaker, S.; Canaris, J.; Liu, K. SEU Hardened Memory Cells for 
CCSDS REED Solomon Encoder. In: IEEE Transactions on Nuclear 
Science. VOL. 38, NO. 6, pp.  1710-1718, December, 1991. 
[W-HI-0931 	White, M., B. Bartholet, and M. Baze, "Automated Radiation Hardened 
ASIC Design Tool", 5th NASA Symposium on VLSI Design, 1993, pp. 
1141-1148. 
[WIS-093] 	Wiseman, D.; Canaris, J.; Whitaker, S.; Venbrux, J.; Cameron, K.; Arave, 
K.; Arave, L.; Liu, N.; Liu, K. "Design and Testing of SEU / SEL Immune 
Memory and Logic Circuits in a Commercial CMOS Process," IEEE 
NSREC Conference, 1993, p. 287. 
[WUN-085] 	H. J. Wunderlich, "PROTEST: A Tool for Probabilistic Testability 
Analysis," Proceedings of the 22nd IEEE Design Automation Conference, 
vol. 14-3, 1985, pp. 204-211. 
[XIL-000] 	Xilinx, "Virtex architecture guide", Tech. Rep., Xilinx, San Jose, CA, 
September 2000. 
[XIL-001] 	Xilinx, "Virtex 2.5V field programmable gate arrays", Data Sheet 
DS003-1, Xilinx, San Jose, CA, April 2001. 
[XIL-001] 	Xilinx, Inc., Virtex™ 2.5 V Field Programmable Gate Arrays: Product 
Specification. Xilinx, Inc., San Jose, CA, 1999. 
[XIL-002] 	Xilinx, Inc., Virtex™-II Platform FPGAs: Detailed Description. Xilinx, 
Inc., San Jose, CA, 2002. 
[XIL-03a] 	Xilinx, Inc., Virtex-II Pro™ Platform FPGAs: Advance Product 
Specification. Xilinx, Inc., San Jose, CA, 2003. 
[XIL-03b] 	Xilinx, Inc., PicoBlaze 8-Bit Microcontroller for Virtex-E and 
Spartan-II/IIE Devices. Xilinx, Inc., San Jose, CA, 2003. 
[XIL-03c] 	Xilinx, Inc., PicoBlaze 8-Bit Microcontroller for Virtex-II Series Devices. 
Xilinx, Inc., San Jose, CA, 2003. 
[XIL-03d] 	Xilinx, Inc., MicroBlaze Processor Reference Guide. Xilinx, Inc., San 
Jose, CA, 2003. 
[XIL-094] 	Xilinx, Inc., The Programmable Logic Data Book. Xilinx, Inc., San Jose, 
CA, 1994. 
[XIL-096] 	Xilinx, Inc., XC6200: Advance Product Specification. Xilinx, Inc., San 
Jose, CA, 1996. 
[XIL-BUK] 	Xilinx data book sheets, http://www.xilinx.com/partinfo/databook.htm.  
[XIL-DAT] 	Xilinx, The Programmable Logic Data Book, Xilinx, Inc., 2001. 
[YAN-092] 	F. L. Yang, R. A. Saleh, "Simulation and analysis of transient faults in 
digital circuits," IEEE Journal of Solid-State Circuits, vol. 27, 
pp. 258-264, Mar. 1992. 
[ZHA-099] 	H. Zhang, M. Wan, V. George, and J. Rabaey, "Interconnect Architecture 
Exploration for Low-Energy Reconfigurable Single-Chip DSPs," 
Proceedings of the IEEE Computer Society Workshop on VLSI 99, pp. 
2-8, 1999. 
[ZIE-078] 	J. F. Ziegler, H. W. Curtis, H. P. Muhlfeld, C. J. Montrose, B. Chin, M. 
Nicewicz, C. A. Russell, W. Y. Yang, L. B. Freeman, P. Hosier, L. E. 
LaFave, J. L. Walsh, J. M. Orro, G. J. Unger, J. M. Ross, T. J. O'Gorman, 
B. Messina, T. D. Sullivan, A. J. Sykes, H. Yourke, T. A. Enger, V. Tolat, 
T. S. Scott, A. H. Taber, R. J. Sussman, W. A. Klein, and C. W. Wahaus, 
"IBM experiments of soft failures in computer electronics (1978-1994)," 
IBM J. Res. Develop., vol. 40, no. 1, pp. 3-18, 1996. 
