Radiation Mitigation and Power Optimization Design Tools for Reconfigurable Hardware in Orbit by Larchev, Gregory et al.
Radiation Mitigation and Power Optimization Design Tools for 
Reconfigurable Hardware in Orbit 
Matthew French1, Paul Graham2, Michael Wirthlin3, Li Wang1 and Gregory Larchev4
1University of Southern California, Information Sciences Institute, Arlington, VA
2Los Alamos National Laboratory, Los Alamos, NM
3Brigham Young University, Provo, UT
4QSS Group Inc., NASA Ames Research Center, Moffett Field, CA
Abstract- The Reconfigurable Hardware in Orbit (RHinO)
project is focused on creating a set of design tools that facilitate
and automate design techniques for reconfigurable computing in
space, using SRAM-based field-programmable-gate-array
(FPGA) technology. In the second year of the project, design
tools that leverage an established FPGA design environment
have been created to visualize and analyze an FPGA circuit for
radiation weaknesses and power inefficiencies. For radiation, a
single event upset (SEU) emulator, persistence analysis tool, and
a half-latch removal tool for Xilinx Virtex-II devices have been
created. Research is underway on a persistence mitigation tool
and multiple bit upsets (MBU) studies. For power, synthesis
level dynamic power visualization and analysis tools have been
completed. Power optimization tools are under development and
preliminary test results are positive.

I. INTRODUCTION
SRAM-based FPGAs have a promising future in processing on
space-based payloads. They offer features that anti-fuse FPGAs
do not, such as reprogramability, embedded multipliers, and
embedded processors, while also offering 5-10x more logic
gates. These features allow SRAM-based FPGAs to address
resource multiplexing, fault tolerance, mission obsolescence and
design flaws in on-orbit payloads that directly impact design
cost and mission risk, while also providing better processing
performance.
However, a significant barrier to developing space-ready
SRAM-based FPGA applications is the difficulty in designing
for the rigorous constraints mandated by the operational
environment. Two main issues limit the use of conventional
FPGAs in such designs: 1) SRAM-based FPGAs are sensitive
to radiation effects, namely, total ionizing dose (TID), single
event latchup (SEL), and single-event-upsets (SEUs), because
of their high proportion of memory structures; and 2) SRAM-based
FPGA design tools optimize for throughput at the expense of
power.
Current foundry process technology for FPGA devices provides
enough tolerance for a large number of orbits for total dose and
latch-up (no destructive latchups have been reported); however,
the SEU presence is a major design/operational issue. The large
amount of static memory within SRAM-based FPGAs, such as
look-up tables, routing switch tables, etc., makes them sensitive
to SEUs. While traditional hardware redundancy techniques
improve the reliability of FPGA designs (at the expense of
increases in hardware, power, etc.), novel FPGA-specific
techniques are required to address the unique vulnerabilities of
SRAM-based FPGA architectures, while incurring less hardware
overhead. Therefore, design automation tools for evaluating and
assessing the reliability of FPGA designs, inserting appropriate
redundant hardware, and manipulating the low-level structures
of the FPGA design are needed for robust operation and SEU and
latch-up tolerance.
Available FPGA synthesis tools optimize for speed or area, but
not for real-time power consumption. Limited power estimation
tools are available, such as Xilinx's XPower; however, these are
difficult to use and have limited utility to the actual FPGA
design process. Accurate power estimates are only achievable
after completing an entire iteration of the design cycle and
provide no power optimization guidance. To make effective use
of FPGAs in space, tools providing accurate power estimation
and dynamic power optimization, operating on the FPGA's gate
logic or on individual configurable logic blocks (CLBs), are
needed; specifically: 1) to monitor power consumption early in
the design process at a useful granularity (e.g., at CLB); 2) to
aid in the design analysis that captures data-dependent
transients as well as overall power consumption; and 3) to
perform automated dynamic power optimization.
Both the radiation-induced and power consumption effects are
currently handled through manual intervention or, at best,
through ad-hoc in-house tools. There is a real need in the
community for validated design tool automation to raise the
technology readiness level (TRL) of SRAM-based FPGA user
designs. The RHinO project is leveraging an established,
open-source tool suite that accepts output from commercially
available synthesis tools to create tools that allow the
developer of a space-based FPGA application to automatically
analyze and optimize a Xilinx Virtex-II FPGA circuit for both
space radiation effects and power utilization.
In the second year of this effort, the space radiation effects
and power analysis tools were refined and leveraged to build
optimization tools. The final year of this effort will focus on
validating and improving the optimization tools in the relevant
environment. This paper will outline the JHDL tool suite and
extensions made to it for the RHinO toolkit in Section II. SEU
effects tools are discussed in Section III, and power utilities
are discussed in Section IV. Synergy with evolvable algorithms
for mitigating SEL effects is discussed in Section V. Section VI
will summarize the progress to date and draw conclusions.

II. The JHDL Tool Suite
A. Background
The RHinO tool suite is built upon the open-source JHDL [2]
FPGA design environment. The tool
suite, shown in Figure 1, contains a digital circuit simulator, a
circuit hierarchy browser, FPGA library primitives, and tools 
for exporting user designs into EDIF and VHDL. JHDL 
provides an open API into the circuit structure to facilitate the
creation of application-specific design aids for viewing,
revising, manipulating, or interacting with a user design. The
integrated design aids, circuit API, and flexibility of JHDL
make it an ideal tool for aiding the development of radiation- 
hardened and power-aware space-based FPGA designs. A 
variety of application-specific tools can be created to analyze 
and improve the reliability of FPGA circuits. 
Figure 1. JHDL Tool Suite 
Under this effort, RHinO is devising new features for
JHDL, specific to space environments, which would enable
SRAM-based FPGA payload developers to confidently
manage the limiting on-board spacecraft design constraints
for power, radiation effects, fault-tolerance, reliability, etc. A
key goal of the effort is interoperation with existing
commercial tool flows based on VHDL/Verilog, through
seamless JHDL-EDIF translation. Alternatively, the user can
work entirely in the JHDL design environment, using the
RHinO power and SEU tools in concert with the normal
JHDL features for simulation, netlisting, and runtime control,
all within a single user interface.
B. RHinO Enhancements
During the first year of this effort the JHDL infrastructure
was enhanced to support the desired SEU mitigation and
power tool functionality. A GUI event API was developed to
support intercommunication and interoperability with other
modules, or tools that could be dropped into JHDL. As
shown in Figure 2, this has led to the development of multiple
tool modules that are able to leverage the core JHDL
capabilities.
Recently, considerable effort was spent enhancing the
EDIF netlist tool, originally created to support importing 3rd
party IP. The EDIF netlist parser and data structure software
provides the central design database for both the RHinO
power analysis tools and the RHinO design reliability and
mitigation tools. These tools provide two important
capabilities for the RHinO tool suite. First, these tools
provide the capability of importing an FPGA design created
with a third-party tool into the RHinO infrastructure. Second,
these tools provide a consistent circuit database for each of
the tools created in the RHinO project.
The relationship between the EDIF tools and other RHinO
tools is shown below in Figure 2. An FPGA design is loaded
into the RHinO suite through the EDIF parser and into the
EDIF data structure. At this point, the design can be
manipulated or analyzed using one of several RHinO tool
components. For example, power estimates of the design can
be made by using the JHDL/RHinO power estimator tool
chain. In this mode, a dynamic simulation of the design is
created in JHDL to obtain the activity rates of design
components and nets. The power estimation and viewer tools
are available for browsing and viewing the results of this
design simulation. Alternatively, the design reliability
analysis tools may be invoked from the EDIF data structure.
With these tools, the reliability of the design can be analyzed
and presented to the user. As the project matures, design
mitigation and power optimization techniques can be applied
to improve the design over its original specification.
Figure 2. Tool Infrastructure
Naming and Correlation
An important capability required within this design
infrastructure is the ability to correlate design resources
between the various tool stages and with the COTS tool flow.
For example, in power analysis, design resources are
represented in three different design databases. First, design
resources are specified in the original EDIF source and are
captured within the EDIF design environment. Second, the
EDIF design is translated to the JHDL environment for dynamic
simulation. Third, the EDIF design is translated to the FPGA
technology specific netlist after technology mapping. The
power capacitive loading values are made available at this
level, but need to be relayed to higher levels to make power a
first class design constraint.
To properly analyze and estimate the power consumption
of FPGA designs, design resources must be correlated
between the three design representations. Figure 3 below
represents the relationship between these design
representations. The difficulty in name correlation occurs due
to the difference in the way that the vendor specific tools
and the JHDL simulation environment interpret names. The
vendor-specific "Xilinx" naming scheme uses the EDIF
"original" name for naming each of these resources. These
names are the names chosen for design resources by the
original design synthesis tool. JHDL uses the "valid" EDIF
name to represent each of its design resources. A name
management resource was added (represented by the red
arrow to the right) to provide a fast matching capability
between the names used in JHDL and the
capacitive loading resources defined by Xilinx.
Figure 3. Signal Naming Relationships
These tool enhancements were critical to better
interoperation with the commercial-off-the-shelf (COTS)
synthesis, placement and routing (PAR) tools, as well as to
provide what until now has been low-level bit-stream
information at higher levels. Bringing this information to
higher levels allows the tools outlined in Sections III and IV
to operate on smaller, faster files and databases, and more
easily relay information to the user at the design entry level.
III. SEU Radiation Effects
A. Background
To further advance the TRL of Virtex-II FPGAs for
space applications, the RHinO project has a goal of
improving the reliability of user designs in the presence of
SEUs. SEUs are the main radiation concern since these
FPGAs have been shown to have acceptable tolerance to TID
as well as to SEL for low earth orbits (LEO). SEUs can
occur in several memory structures on these SRAM-based
FPGAs [3, 4], namely in the support and control logic, the
user design state, the programming memory (often called the
configuration memory), and half-latches. Upsets in the
support and control logic can have a range of effects, from
fairly benign to totally erasing the contents of the FPGA
configuration memory. Upsets in the user design state, such
as in flip-flops and on-chip memories, may cause faults to
occur in the user design's operation, while upsets in the
configuration memory may change the user's design directly
by changing the connectivity of logic or changing the logic
functions themselves.
This year, the SEU radiation research completed the
development of the SEU emulator and half-latch mitigation
tool and focused on two specific areas, error persistence and
multiple-bit upsets (MBUs). Error persistence refers to the
length of time an error persists once a circuit experiences an
SEU. Circuits with feedback paths or containing state machines
may experience a significant degree of error persistence. For
some applications, it may be enough to apply triple-modular
redundancy (TMR) to the parts of the circuit exhibiting the
most error persistence, drastically reducing the amount of
overhead usually associated with TMR. MBUs due to a single
charged particle are important since they could potentially
affect multiple modules in TMR and produce incorrect circuit
values.
B. SEU Analysis and Emulation
To analyze an FPGA design for SEU robustness for half-
latches, persistence, and MBUs, an effective SEU emulation
platform, the Virtex-II SEU Emulator (V2SE), was completed
for characterizing individual designs for their particular set of
SEU sensitivities. The V2SE is a combination of COTS
hardware, custom software, and a single, low-cost (<$150)
custom printed circuit board (PCB).
Figure 4. COTS-based SEU Emulator
The V2SE has been validated using a series of different 
tests and designs. Further, a graphical interface was 
developed to allow the user to better visualize the results of a 
particular execution of the V2SE on a design. Since the 
results of the SEU emulation sessions have the same 
floorplan as the user FPGA designs themselves, the 
visualization tools also suggest that the SEU emulator is
performing well.
The object of SEU emulation is to provide a cost-effective 
and quick alternative to accelerator testing for understanding 
user FPGA designs’ SEU sensitivities. To validate this goal, 
the cost of upsetting 1.5 million programming bits through 
irradiation was compared against the costs of the V2SE. The 
V2SE, including the cost of a Linux PC to control the boards, 
costs about $6000. The V2SE can inject 1.5 million bit 
upsets in about 30 minutes given the above criteria. At an 
accelerator, a similar number of upsets with the same ability 
to distinguish the cause of output errors would take about 11 
days of test time (assuming about 1.5 SEUs per second). 
Further, the V2SE provides considerably more control over 
what gets upset and when than the accelerator. 
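The test-time comparison above can be checked with a few lines of arithmetic. The following is a minimal sketch using only the figures quoted in the text (1.5 million upsets, about 1.5 SEUs per second at the accelerator, about 30 minutes on the V2SE); the function names are illustrative and are not part of the V2SE software.

```python
SECONDS_PER_DAY = 86_400.0


def accelerator_days(num_upsets: int, upsets_per_second: float) -> float:
    """Days of beam time needed to accumulate num_upsets at a given upset rate."""
    return num_upsets / upsets_per_second / SECONDS_PER_DAY


def speedup(accel_days: float, emulator_minutes: float) -> float:
    """Ratio of accelerator test time to emulator test time."""
    return accel_days * 24.0 * 60.0 / emulator_minutes


# Figures quoted in the text.
days = accelerator_days(1_500_000, 1.5)   # roughly 11.6 days
ratio = speedup(days, 30.0)               # emulator is several hundred times faster

print(f"accelerator time: {days:.1f} days, emulator speedup: {ratio:.0f}x")
```

Under these numbers the emulator covers the same number of upsets several hundred times faster, which is consistent with the "about 11 days" versus "about 30 minutes" comparison.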
C. Half-Latch Mitigation
To meet the goals for automated SEU mitigation of
Virtex-II user designs, the V2SE was employed in validating
the RadDRC-II tool, which was developed to eliminate half-
latch SEU sensitivities from user FPGA designs. Unlike the
results of using SEU emulation for Virtex, SEU emulation for
Virtex-II does not seem to upset half-latches as frequently. In
fact, there was no noticeable difference between unmitigated 
Virtex-II designs and those mitigated by RadDRC-II. 
The V2SE was then used for a proton radiation test at 
Crocker Nuclear Laboratory to further validate the effects of 
the RadDRC-II. The initial results with 63 MeV protons 
suggest that RadDRC-II does improve the reliability of 
designs, but more test data is required to confirm this. 
More specifically, two hard lockups of the unmitigated
designs were observed, lockups that were typical of half-
latch behavior with the earlier Virtex family. During one test,
the unmitigated design also showed significant sequences of
design lockups (>10 consecutive lockups). The mitigated
designs did not show either of these behaviors.
D. Persistence 
Characterizing the error persistence within FPGA designs 
[5, 6] is of interest because it can lead to a way of gaining
design robustness without extensive redundancy. For 
instance, for a certain test signal processing design, SEU 
emulation was performed on the design (i.e., intentionally 
injecting faults into the FPGA’s programming data) and the 
FPGA programming bits that caused persistent errors were 
characterized. For this particular design, about 10% of the 
FPGA’s configuration bits caused an output error while only 
about 0.24% of the configuration bits caused a persistent 
error, almost two orders of magnitude fewer bits (see Figure
4). This suggests that to have a design that continues to 
operate in the presence of SEUs without the need of a design 
reset, a significantly smaller amount of redundancy may be 
needed than full triple modular redundancy for the design. 
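To make the distinction between sensitive and persistent configuration bits concrete, the sketch below classifies fault-injection results. It is an illustration under stated assumptions, not the V2SE's actual data format or algorithm: the `InjectionResult` record, its field names, and the `flush_samples` window are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InjectionResult:
    """Hypothetical record for one injected configuration-bit upset."""
    bit_address: int
    error_while_upset: bool              # any output mismatch while the bit was upset
    mismatches_after_repair: List[bool]  # one entry per output sample after the bit is repaired


def classify(result: InjectionResult, flush_samples: int) -> str:
    """Label a bit as benign, non-persistent, or persistent.

    A bit is persistent if output errors continue after the design has been
    given flush_samples output samples to flush the error following repair.
    """
    if not result.error_while_upset:
        return "benign"
    tail = result.mismatches_after_repair[flush_samples:]
    return "persistent" if any(tail) else "non-persistent"


def summarize(results: List[InjectionResult], total_bits: int,
              flush_samples: int) -> Dict[str, float]:
    """Fractions of all configuration bits that are sensitive and persistent."""
    counts = {"benign": 0, "non-persistent": 0, "persistent": 0}
    for r in results:
        counts[classify(r, flush_samples)] += 1
    sensitive = counts["non-persistent"] + counts["persistent"]
    return {
        "sensitive_fraction": sensitive / total_bits,
        "persistent_fraction": counts["persistent"] / total_bits,
    }


results = [
    InjectionResult(0, False, [False] * 8),
    InjectionResult(1, True, [True, True] + [False] * 6),  # error flushes out
    InjectionResult(2, True, [True] * 8),                  # error never flushes
]
summary = summarize(results, total_bits=3, flush_samples=4)
```

Shrinking `flush_samples` reclassifies bit 1 as persistent, so in this sketch the persistent fraction depends on how long errors are given to flush out.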
During multiple tests at Crocker Nuclear Laboratory using
63 MeV protons, the SEU emulation predicting SEU error
persistence for designs was verified. Four designs were
tested, including the test digital signal processing (DSP)
design (as mentioned above), a pipelined array of multipliers
and adders, a design with an array of counters, and a
TMR version of the counter array design.
The bitstream SEU sensitivity observed at the accelerator
matched the predicted sensitivity using SEU emulation quite
well. For the unmitigated designs, greater than 90% of the
bits that caused output errors in the accelerator testing also
caused output errors during SEU emulation. With the TMR
design we tested, only 60% of the bits that caused an output
error seen by the accelerator were predicted by SEU
emulation, but this was not a big concern since the design had
very few problem bits to begin with at the accelerator (a total
of 5), so the actual difference is only a few bits.
With regards to error persistence, the proton tests
confirmed that SEU emulation is effective at predicting the
statistics of the persistent errors. For the four test designs, the
accelerator's results were within 15% of what was predicted
through SEU emulation. Further, it was confirmed that there
are no "hidden" structures in the FPGA (other than
unmitigated half-latches) that will cause this persistence
behavior.
The results also point out, though, that the prediction of a
particular persistent failure can be very difficult due to the
large state space for significant digital designs and the
variable amount of time required for certain errors to be
flushed out of the system. In other words, the persistent
failures often depend heavily on when an SEU occurs, what
was upset, and what data was being processed at the very
moment of the upset. Further, for some designs, the longer
one waits for values to flush out of the system, the fewer
upsets will be classified as persistent since more and more will
eventually flush out of the system. In other words,
persistence as tested through SEU emulation and sampled at
the accelerator is always relative to the amount of time
allowed for errors to flush out of the system.
Figure 4. Comparison of Total Sensitive Cross-section
(middle) to Persistent Cross-section (right) for a test DSP
design (left)
E. MBU Analysis
The main concern with respect to MBUs is whether or not
MBUs can affect user FPGA designs that employ TMR or
other forms of redundancy. To better understand the
frequency of multi-bit upsets due to individual strikes by
charged particles, Xilinx FPGAs from the Virtex, Virtex-II,
and Virtex-4 families were irradiated with 63 MeV protons
during multiple visits to Crocker Nuclear Laboratory. For
these experiments, the occurrence of SEUs in these FPGAs
was sampled and then the sampled SEUs were clustered
based on their physical locations on the devices. An 8-bit
neighborhood was used around each bit to determine
adjacency. It was observed that 0.045% of the upset events
experienced by Virtex XCV1000 FPGAs were MBUs while
about 1.07% of all upset events experienced by Virtex-II
XC2V250 and XC2V1000 FPGAs were MBU events, an
increase in frequency of about 24 times between the FPGA
families. As expected by Xilinx, Virtex-II exhibited a
significant bias in the location of MBUs (8% of the MBUs
were within the same configuration data frame), while Virtex
MBUs were within the same row of configuration
data (adjacent data frames with the same bit
offsets). The Virtex-4 data has not been completely analyzed,
but initial real-time feedback provided at the test suggested
that about 1% of the events were MBUs.
For test data verification, the probability of causing false
MBUs randomly was calculated and then compared against
Monte Carlo simulations to verify these predictions. The
results of this analysis are shown in Table 1. In the worst
case, the actual observed MBU rates were more than 25 times
the predicted false rates, suggesting that an actual
MBU phenomenon has been observed.
The RHinO team has also started analyzing heavy ion
data for Virtex-II taken in conjunction with the Radiation Test
Consortium. The early analysis of the data has
been for LETs of 1.5-60 MeV/mg/cm2. For these LETs, 1%
to 35% of the events observed are MBUs. A more detailed
report on MBUs in SRAM FPGAs can be found in [7].
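The clustering step used to separate single-bit from multi-bit upset events can be sketched as a connected-components search over an 8-neighborhood. This is a minimal illustration, not the analysis code used for the tests: it assumes each sampled upset is given as a hypothetical (row, column) coordinate in the configuration memory, and that each sample is the list of upsets seen in one readback interval.

```python
from typing import Iterable, List, Set, Tuple

Bit = Tuple[int, int]  # hypothetical (row, column) position of an upset bit


def cluster_upsets(upsets: Iterable[Bit]) -> List[Set[Bit]]:
    """Group upsets whose positions touch in the 8-neighborhood of each other."""
    remaining = set(upsets)
    clusters: List[Set[Bit]] = []
    while remaining:
        seed = remaining.pop()
        cluster = {seed}
        frontier = [seed]
        while frontier:
            row, col = frontier.pop()
            # Visit the 8 surrounding positions (the centre is never in `remaining`).
            for d_row in (-1, 0, 1):
                for d_col in (-1, 0, 1):
                    neighbor = (row + d_row, col + d_col)
                    if neighbor in remaining:
                        remaining.remove(neighbor)
                        cluster.add(neighbor)
                        frontier.append(neighbor)
        clusters.append(cluster)
    return clusters


def mbu_fraction(samples: Iterable[Iterable[Bit]]) -> float:
    """Fraction of upset events (clusters) that contain more than one bit."""
    events = 0
    mbus = 0
    for upsets in samples:
        for cluster in cluster_upsets(upsets):
            events += 1
            if len(cluster) > 1:
                mbus += 1
    return mbus / events if events else 0.0


samples = [
    [(0, 0), (1, 1), (5, 5)],  # one 2-bit event plus one single-bit event
    [(10, 10)],                # one single-bit event
    [(3, 3), (3, 5)],          # two separate single-bit events
]
fraction = mbu_fraction(samples)
```

Clustering by adjacency can also merge two unrelated upsets that happen to land next to each other, which is the false-MBU effect that the text checks against random and Monte Carlo predictions.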
Table 1. Predicted false MBU rates compared with observed MBU rates
IV. Power
A. Background
Current SRAM-based FPGA design tools have been
created with only speed or area optimizations as their goal
and only recently have accurate power measurement tools
become commercially available. These tools, such as the
Xilinx XPower tool, are limited in the content of the power
information they provide, readability, and their entry point in
the FPGA design flow. Pre-place and route estimates are
currently performed by manually entering and estimating
device utilization, toggling rates, and routing interconnect,
and automated power measurements are only available after
going through a complete iteration of the entire design flow.
This process can be ad hoc and time consuming, as a designer
needs to interact with multiple tools that generate multiple
memory-hungry intermediary files. At this level, the machine-
generated signal names are difficult to resolve with their
functional level counterparts in order to make optimizations.
In other words, very little guidance is given to the designer on
how to optimize if the power specifications were not met.
The goal of the power analysis and optimization tools is
to make power a first-class design constraint by moving
power analysis and optimization closer to the design entry
point, with the help of the tool improvements outlined in
Section II. Whereas the first year of this effort focused on
developing accurate time-based power simulations of placed
and routed circuits and enabling capabilities to rapidly sort,
find, and cross probe signals and components, the second year
used this infrastructure to develop more sophisticated power
modeling to do power estimation and analysis at the post
synthesis level, without the need for going through place and
route.
B. Post Synthesis Power Modeling
Modeling at the post synthesis level has the following
advantages: 1) early power feedback in the design flow, 2)
power results are displayed at a high level, closer to the
logical design entry point, and 3) bulky, low-level timing
accurate simulation and stimulus files are eliminated. These
three aspects allow a designer to quickly and easily generate
power estimates, relate the results back to their original
logical level design entry, and explore design trade-off
scenarios. The results presented here were derived using
Xilinx Virtex-II FPGAs and tool suites; however, the
techniques apply to all FPGAs.
The technical approach can be broken into two parts:
developing a tool infrastructure to support synthesis level
simulation and circuit queries, and developing a synthesis-
level power model. The tool infrastructure discussed in
Section II was used to develop power tools that support
querying and assigning capacitance values, and tracking wire
toggle rates during simulation. Additionally, interoperability
with the Xilinx tool flow was added, allowing power reports and
.ncd files to be imported for detailed analysis of a placed and
routed circuit's power characteristics. This environment
allows a synthesized circuit to be directly compared against
its placed and routed form to track what factors at the
synthesis level lead to high power consumption at the placed
and routed level. This environment allows the development
of, and experimentation with, several power models, from generic
'toggle rate only' models to the exact timing-level placed and
routed circuit. After the power models have been created and
verified, the end user can import their design into the JHDL
environment using the EDIF import tool and simulate using
the Synthesis Power Model (SPM) to generate synthesis
level power reports, as depicted in Figure 5.
Figure 5. Power Modeling Tool Infrastructure
For generation of the SPM, placed and routed circuit 
power consumption was analyzed and then compared with the 
available information at the synthesis level. Dynamic power 
consumption is described as: 
$$\text{Power} = \sum_i F_i \cdot T_i \cdot C_i \cdot V_i^2 \qquad (1)$$
where F is the frequency, T is the toggling rate, C is the
capacitance and V is the voltage of the ith component. For
our modeling, we will assume the voltage will be fixed as per
device specifications. The frequency and toggling rate of each
net can be tracked within the JHDL simulation environment,
allowing multiple subcomponents of the design to be tracked
and the window of time that the power is averaged over to be
changed dynamically during simulation, as in Figure 6.
The remaining term, the capacitance of each component, can be
broken into two parts, the capacitance of the FPGA logic
resource (i.e., LUT, BlockRAM, multiplier, etc.) and of the
interconnect route that it drives. Some capacitance
information of FPGA resources has been published and other
resources' capacitance values can be extracted from the
power reports to create a complete resource capacitance
model for any Xilinx device.
Figure 6. Power Tool Simulation 
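Equation 1 can be evaluated directly once toggle rates have been collected from simulation. The sketch below is illustrative only: the `Component` record, the sample values, and the default 1.5 V supply are assumptions for the example and do not come from the JHDL/RHinO tools.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class Component:
    """Hypothetical per-component inputs to Equation 1."""
    name: str
    frequency_hz: float    # F_i
    toggle_rate: float     # T_i, fraction of cycles on which the signal toggles
    capacitance_f: float   # C_i, in farads (logic resource plus driven interconnect)
    voltage_v: float = 1.5  # V_i; assumed fixed supply voltage for this example


def toggle_rate(samples: Sequence[int]) -> float:
    """Fraction of consecutive sample pairs on which a simulated signal changes."""
    if len(samples) < 2:
        return 0.0
    transitions = sum(1 for a, b in zip(samples, samples[1:]) if a != b)
    return transitions / (len(samples) - 1)


def dynamic_power(components: Iterable[Component]) -> float:
    """Sum of F_i * T_i * C_i * V_i^2 over all components, in watts."""
    return sum(
        c.frequency_hz * c.toggle_rate * c.capacitance_f * c.voltage_v ** 2
        for c in components
    )


nets = [
    Component("a", 100e6, 0.125, 10e-12),
    Component("b", 100e6, toggle_rate([0, 1, 1, 0, 0]), 2e-12),
]
total_watts = dynamic_power(nets)
```

Restricting the list passed to `dynamic_power` to one sub-block, or computing `toggle_rate` over a shorter slice of the simulation trace, corresponds to the per-subcomponent tracking and adjustable averaging window described above.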
The final and perhaps most difficult piece is modeling the
routing interconnect. While an unoptimized list of the
resources is known after synthesis, the exact routing
interconnect, which can account for 50-70% of the dynamic
power consumption of a circuit, is unknown until placement
and routing is performed. In the Xilinx Virtex-II device,
for example, there are 4 types of interconnect (direct connect,
doubles, hex, and long lines), each with a different
capacitance, and they can be joined in any number of
combinations. Slight logic changes or different random seed
inputs to the Xilinx place and route tools can yield large
differences in a particular net's length, as the router is purely
timing constraint driven.
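One way to approximate the unknown interconnect capacitance at the synthesis level is to fit it against a quantity that is known after synthesis, such as net fanout. The sketch below uses an ordinary least-squares straight line as a stand-in; the linear form, the function names, and the sample data are assumptions for illustration and are not the SPM's actual model or measured values.

```python
from typing import Sequence, Tuple


def fit_fanout_model(fanouts: Sequence[float],
                     capacitances_pf: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of capacitance = intercept + slope * fanout."""
    n = len(fanouts)
    if n < 2 or n != len(capacitances_pf):
        raise ValueError("need at least two (fanout, capacitance) pairs")
    mean_x = sum(fanouts) / n
    mean_y = sum(capacitances_pf) / n
    sxx = sum((x - mean_x) ** 2 for x in fanouts)
    if sxx == 0:
        raise ValueError("fanout values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(fanouts, capacitances_pf))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_capacitance(model: Tuple[float, float], fanout: float) -> float:
    """Estimated net capacitance (pF) for a net with the given fanout."""
    intercept, slope = model
    return intercept + slope * fanout


# Made-up profile of placed-and-routed nets: (fanout, capacitance in pF).
model = fit_fanout_model([1, 2, 4, 8], [5.0, 7.0, 11.0, 19.0])
estimate = predict_capacitance(model, 16)
```

A model of this kind would be fitted on placed-and-routed profiles and then applied to synthesized netlists, where only fanout is available.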
Using the JHDL power infrastructure, several designs on
a Virtex-II 6000 varying in functionality and utilization, as
shown in Table 2, were profiled, and relationships between
wire capacitance and fanout, wire length, number of switch
boxes, etc., were explored. Fanout proved to be not only a
strong predictor of net capacitance; it also correlates well
from the placed and routed circuit to the synthesized circuit.
Though wire length and the number of switch boxes also
correlated well with placed and routed net capacitance, this
information is not available at the synthesis level. Figure 7
depicts the capacitance versus fanout plot for the nets in the
Counter circuit. The fanout relationship was aggregated over
the designs in the circuit pool to develop the
SPM. It should be noted that it is not expected to achieve precise
power estimation at the post synthesis level, but rather to be
accurate enough to guide the user to design-level hot-spots in
the circuit. Single net outliers from the fanout behavior are
best left to lower level analysis and optimization in XPower
and FPGA Editor.
Figure 7. Net Capacitance vs. Fanout
After the SPM was developed, we modeled the power
consumption of the circuits in Table 2 and compared them to
the values obtained by using the corresponding placed and
routed circuit and XPower results. For the counter circuit a
mean power error of 3.2% per net and load, with a standard
deviation of 1.6%, is achieved. Further enhancements to the
model, such as predicting routing congestion based on total
device utilization, are also being evaluated.
C. Power Optimization
Dynamic power consumption can be reduced by reducing
any term in equation 1; however, as the voltage is fixed and
the operating frequency and toggling rates are largely
application driven, the most robust approach is to lower the
switching capacitance. Dynamic power comprises I/O power,
logic component power, signal power, and clocking
power. The focus of the power optimization tools
is to reduce the signal and
clocking power of a circuit, as I/O power is determined more
by external, off-chip interfacing requirements, and logic
components, such as CLBs, multipliers,
and DLLs/DCMs, are fixed
during synthesis.
At the level in the COTS tool flow that the RHinO tools
are currently capable of addressing, the biggest power
variable we can address is length, and therefore capacitance,
of the signal interconnects. In the following sections, two
methods for optimizing signal power within the context of the
COTS PAR tools without altering FPGA design functionality
are discussed. Both of these methods work by
shortening signal routing paths to reduce signal
capacitances.
Figure 8 depicts the power optimization tool flow on the
left hand side, and the verification tool flow on the right hand
side. The power optimization tools for both methods generate
user constraint files, which are input to the Xilinx PAR tools.
With these added constraints, the COTS tools are able to
create more power efficient circuits. The first power
optimization method presented is to add timing constraints on
signals. The second method is to add location constraints on
flip-flops. Both methods have the effect of over constraining
power sensitive parts of the circuit, forcing the PAR tools to
use shorter, lower capacitance interconnects.
Figure 8. Power Tool Flow
Timing Constraint Based Power Optimization 
In the timing constraint power optimization approach, 
timing constraints are added to signal wires to essentially 
translate power interconnect specifications into timing 
constraint specifications that the Xilinx PAR tools can work 
with. By raising timing constraints above the minimum 
operating frequency for high capacitance signals in a design, 
the PAR tools work to achieve shorter, or more power 
efficient, routes. From VLSI interconnect technology [8], the 
delay of a wire is proportional to wire capacitance C, so 
minimizing delays has the added benefit of minimizing 
capacitance and therefore power. In this way, power 
optimization is achieved by over-constraining the timing on 
signal wires to get lower signal capacitances. 
Currently, the power optimization tools provide the user 
with a wire table, which can be sorted by simulated power 
consumption, load, fanout, etc. This helps the user to identify 
the most power critical wires, and rapidly create timing 
constraints on every net in a design if desired. For 
preliminary results to verify our methodology, we used a 
memory diagnostic and EDAC circuit for testing which 
contained 10,259 signals. The average toggling rate per wire 
during simulation was 12.5%. Testing was done on a variety 
of cases such as putting constraints on all toggling nets, 
constraints on nets with fanout of 10 or less, the top 25% 
most power hungry nets, etc. Preliminary results showed 
better placed and routed circuits with up to a 12% decrease in 
dynamic power consumption without changing the 
functionality of the circuit. Experiments are also underway to 
determine the effects of how much timing delay tightening 
should or can be applied. 
Location Constraint Power Optimization 
Another way to affect the commercial PAR tools is 
through specified location primitives that either define 
relative placement to other macros in a circuit or absolute 
placement within the device. By defining the placement with 
an eye towards power optimization, potentially high 
capacitance signal lines with high fanout or high toggle rates 
can be grouped together to minimize routing interconnect. 
Also, another important effect of this approach is that it can 
pare down the clock distribution tree, further reducing power. 
Global clock signals in FPGAs have dedicated low-skew 
nets with short delays. In the Xilinx Virtex-II FPGAs, the 
clock nets are distributed like a tree into 4 clock zones: 
northwest, northeast, southwest, and southeast [9]. 
Furthermore, each clock zone quadrant branches further into 
sub-zones. In the chip investigated here, the Virtex-II 6000, 
each clock zone had 6 clock sub-zones, 88 x 16 slices in each 
sub-zone. The main clock trunk travels in the north south 
direction across the middle of the chip, with clock branches 
extending out from the trunk to the west and east into the 
sub-clock zones and clock sink cells. Clock sink cells include 
flip-flops, which dominate clock sink components, block RAMs, 
block multipliers, etc. Unused clock branches remain static, so 
moving logic to trim the clock tree reduces power 
consumption. Figure 9 depicts two different placements of 
two flip-flops. On the right hand side, it can be seen that by 
placing two flip-flops of the same clock domain near each 
other an entire quadrant of the clock tree can be trimmed, 
reducing power. 
Figure 9. Logic Placement Clock Tree Effects 
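The clock tree trimming effect can be illustrated with a small model that reports which clock quadrants and sub-zones a placement keeps active. This is a minimal sketch, not part of the RHinO tools. The slice grid (two quadrants across, two up, each 88 slices wide and 6 x 16 slices tall) is inferred from the figures quoted above, and the coordinate origin, the orientation of the sub-zones, and all function names are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

# Assumed slice grid, inferred from the figures quoted in the text:
# 4 quadrants, each with 6 sub-zones of 88 x 16 slices.
QUAD_WIDTH = 88
SUBZONE_HEIGHT = 16
SUBZONES_PER_QUAD = 6
QUAD_HEIGHT = SUBZONE_HEIGHT * SUBZONES_PER_QUAD  # 96 slice rows

Region = Tuple[str, int]  # (quadrant name, sub-zone index within the quadrant)


def clock_region(x: int, y: int) -> Region:
    """Map a slice coordinate to (quadrant, sub-zone) under the assumed layout."""
    if not (0 <= x < 2 * QUAD_WIDTH and 0 <= y < 2 * QUAD_HEIGHT):
        raise ValueError(f"slice ({x}, {y}) is outside the assumed grid")
    north_south = "N" if y >= QUAD_HEIGHT else "S"
    east_west = "E" if x >= QUAD_WIDTH else "W"
    return north_south + east_west, (y % QUAD_HEIGHT) // SUBZONE_HEIGHT


def active_regions(placements: Iterable[Tuple[int, int]]) -> Set[Region]:
    """Clock sub-zones that must be driven for the given flip-flop placements."""
    return {clock_region(x, y) for x, y in placements}


def active_quadrants(placements: Iterable[Tuple[int, int]]) -> Set[str]:
    """Clock quadrants that must be driven for the given flip-flop placements."""
    return {quadrant for quadrant, _ in active_regions(placements)}


# Two flip-flops of one clock domain, placed far apart and then side by side.
spread = [(10, 10), (170, 180)]
packed = [(10, 10), (11, 10)]
print(sorted(active_quadrants(spread)))
print(sorted(active_quadrants(packed)))
```

In this model the spread placement keeps two quadrants active, while the packed placement needs only one sub-zone of one quadrant, which mirrors the two cases in Figure 9.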
The power optimization tools for placement currently 
provide the mechanisms to help users decide where to put 
flip-flops and how many FPGA slices are needed to 
accommodate a group of flip-flops. The tool generates 
location constraints automatically. For the same test design 
used in the timing constraint experiments, there are 4 clocks, 
with fanout ranging from 488 to 2825. Preliminary test results 
were obtained for two cases. In the first case, the circuit was 
treated as a module, where I/O logic was free to be moved. In 
this case, dynamic power was reduced by up to 23%. For the 
second case, the I/O logic was left in specified IOBs, as in a 
typical full chip design. In this case the power was reduced by 
up to 11%. 
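The slice-count and constraint-generation steps described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the RHinO tool itself: it relies on a Virtex-II slice holding two flip-flops, the packing order (row by row within a rectangular block) is arbitrary, the function names are hypothetical, and the UCF-style `INST ... LOC` text is an assumed output format.

```python
import math
from typing import List, Sequence, Tuple

FFS_PER_SLICE = 2  # a Virtex-II slice holds two storage elements


def slices_needed(n_flip_flops: int) -> int:
    """Minimum number of slices that can hold n_flip_flops flip-flops."""
    if n_flip_flops < 0:
        raise ValueError("flip-flop count must be non-negative")
    return math.ceil(n_flip_flops / FFS_PER_SLICE)


def loc_constraints(ff_names: Sequence[str],
                    origin: Tuple[int, int],
                    block_width: int) -> List[str]:
    """Pack flip-flops two per slice into a block that is block_width slices
    wide, filling row by row from origin, and emit one LOC line per flip-flop.
    The constraint text format is an assumption of this sketch."""
    if block_width < 1:
        raise ValueError("block_width must be at least 1")
    x0, y0 = origin
    lines = []
    for index, name in enumerate(ff_names):
        slice_index = index // FFS_PER_SLICE
        x = x0 + slice_index % block_width
        y = y0 + slice_index // block_width
        lines.append(f'INST "{name}" LOC = SLICE_X{x}Y{y};')
    return lines


# Hypothetical flip-flop instance names, for illustration only.
for line in loc_constraints(["ff_a", "ff_b", "ff_c"], origin=(4, 8), block_width=2):
    print(line)
```

As a rough guide, if every sink of the smallest clock in the test design (fanout 488) were a flip-flop, `slices_needed(488)` gives 244 slices, which is well within a single 88 x 16 clock sub-zone.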
D. Current Status and Future Direction 
The preliminary power optimization results are 
encouraging. It should be noted that these tools currently are 
at a level where user guidance is still required for 
optimization and only a few of the possible optimization 
techniques have been experimented with. In the upcoming 
year, the research will focus on automating the optimization 
process through various algorithmic techniques. Also, the 
timing constraint approach and the placement approach are 
not mutually exclusive, so the effects of combining these two 
methods will also be explored. 
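As one possible starting point for this automation, the net-selection cases tried in the timing constraint experiments (the top 25% most power hungry nets, or toggling nets with a fanout of 10 or less) can be written as simple filters over the wire table. The sketch below is hypothetical: the `Net` fields, the `tighten` factor, the use of a baseline routed delay, and the UCF-style `NET ... MAXDELAY` text are all assumptions, and the amount of tightening that is useful remains an open question in this work.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Net:
    """One row of a hypothetical wire table."""
    name: str
    power_mw: float     # simulated dynamic power attributed to the net
    fanout: int
    delay_ns: float     # routed delay from a baseline place-and-route run
    toggle_rate: float  # fraction of cycles in which the net toggles


def top_power_fraction(nets: Sequence[Net], fraction: float = 0.25) -> List[Net]:
    """Return the given fraction of nets with the highest simulated power."""
    ranked = sorted(nets, key=lambda n: n.power_mw, reverse=True)
    return ranked[:math.ceil(len(ranked) * fraction)]


def low_fanout_toggling(nets: Sequence[Net], max_fanout: int = 10) -> List[Net]:
    """Return nets that toggle at all and drive at most max_fanout loads."""
    return [n for n in nets if n.toggle_rate > 0 and n.fanout <= max_fanout]


def maxdelay_constraints(nets: Sequence[Net], tighten: float = 0.8) -> List[str]:
    """Emit one over-constrained delay target per net (assumed text format)."""
    return [f'NET "{n.name}" MAXDELAY = {n.delay_ns * tighten:.2f} ns;'
            for n in nets]


# Made-up wire table entries, for illustration only.
table = [
    Net("data_bus<0>", 5.0, 3, 2.0, 0.20),
    Net("enable", 1.0, 40, 4.0, 0.10),
    Net("tie_low", 3.0, 8, 1.0, 0.00),
    Net("parity", 0.5, 2, 3.0, 0.50),
]
for line in maxdelay_constraints(top_power_fraction(table)):
    print(line)
```

Keeping selection and constraint emission separate makes it straightforward to compare selection policies against the same baseline run.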
Finally, the JHDL module generators will be expanded to 
include power (and SEU tolerance) as a design entry 
constraint, in addition to traditional throughput and precision 
constraints. The module generators can then use the known 
performance and capacitance information of FPGA 
components to create optimal modules for a particular design. 
V. Evolvable Techniques 
A. Background 
Though SEUs are the primary concern when dealing with 
SRAM-based FPGAs, for certain orbits and life-spans, TID 
and SEL do become a concern. Since it is virtually impossible 
to replace spacecraft components in-situ, there is a clear 
opportunity for fault-tolerant FPGA circuits. Evolutionary 
algorithm (EA) methods hold promise in their ability to 
search across the space of FPGA configurations for those that 
can function in the presence of certain types of faults. Since 
SRAM-based FPGAs are fully reprogrammable, it is possible 
to restore the functionality of the compromised FPGA by 
rerouting a circuit around corrupted resources, a property 
which the RHinO team is exploring. 
Autonomous repair can either provide an alternative to or 
supplement redundancy as a means of restoring lost 
capability. Some circuit configurations are more responsive 
to evolutionary repair than others. If a particular circuit has 
been shown to respond well to evolutionary repair, then EAs 
can be relied on as a primary source of fault tolerance. This 
allows the engineers to avoid the increased size, weight, and 
power consumption traditionally associated with providing 
redundant spares. In cases when EAs have difficulties 
producing fully functional repairs, it is still possible to use 
these methods alongside traditional redundancy techniques. 
By repairing each individual triplet of a triple-modular- 
redundant system, it is possible to improve the performance 
of each triplet by a large enough margin so that the majority 
output is 100% correct (even if each individual output is not). 
The objective of the work described here was to 
investigate how various small circuits (some of which are 
commonly used in spacecraft electronics design) respond to a 
pre-determined level of simulated radiation damage. One 
sequential and three combinational circuits were tested. The 
sequential circuit was the quadrature decoder (a 4-state state 
machine). The combinational circuits were a 3-by-3 
multiplier, a 3-by-3 bit adder and a 4-to-7 bit decoder (used 
to control the individual segments of a 7-segment LED 
display). The circuits were subject to a number of simulated 
faults, where at least 10% of the circuit's LUTs would be set 
to a constant 0 or 1 (an output short to power or ground). In 
addition, between 1.5% and 2% of all the LUT bits were 
"hard-wired" to 0 or 1. Such a fault scheme was undertaken in 
order to try and take into account the actual logic vs. routing 
transistor distribution on an FPGA. 
With the specified fault penetration, the average repair 
rate (percentage of runs that achieved 100% functionality) 
ranged from 0% (for the multiplier) to 90% (for the 4-to-7 bit 
decoder). However, the average improvement in circuit 
performance over the course of each run ranged between 
12.6% and 22.8%. What these results tell us is that while 
some circuits respond better to evolutionary repair than 
others, the EA methods result in noticeable improvement in 
performance in all the circuits tested. The chance of 
successful repair usually depends on the size and complexity 
of the circuit as well as the number and location of the faults; 
but even if the EA approach cannot completely repair each 
individual circuit, there is a good chance that it can restore 
100% overall functionality when used in conjunction with 
redundancy techniques. 
C. Current Status and Future Directions 
The evolutionary algorithm is being further refined to 
handle larger, more sophisticated circuits using the Virtex-II 
FPGA architecture, with the goal of showing that the 
algorithm is robust enough to solve SELs in the project's 
benchmark 3x3 image convolution kernel and other real-life 
applications. When a fault occurs in a large, complex circuit, 
the plan is to isolate the fault to a simpler component and 
then to re-evolve the component. Operating on full-sized 
applications on FPGA hardware will be a major thrust of this 
effort. 
All of the team's FPGA evolution work to date uses 
bitstring chromosomal representation. This representation is 
the simplest but also the least efficient one. As larger circuits 
are considered, the shortcomings of bitstring representation 
will become more apparent. Therefore, work is underway to 
change the algorithm to a generative (tree-like) 
representation. By being more conducive to component reuse, 
generative representation shortens the chromosome length 
and makes the evolution of the larger circuits more 
manageable. 
In the upcoming year, the algorithm will be proven on 
larger application circuits and integrated with the rest of the 
RHinO toolkit. Ultimately, the evolutionary algorithm will 
evolve circuits which are not only immune to existing SELs 
but also use guidance from the rest of the RHinO toolkit to 
create circuits that are SEU tolerant as well. 
VI. Conclusions 
In the second year of this effort, the baseline tool 
infrastructure was built upon, enabling breakthroughs in 
radiation analysis tools and power analysis tools. The JHDL 
infrastructure was enhanced to allow better GUI development 
and fully support EDIF files from a variety of commercial 
synthesis tools, which allowed multiple tools to be added to 
the JHDL backbone. In the radiation arena, the SEU emulator 
was completed, the half-latch tool was verified, and work has 
begun on persistence and MBU analysis. Finally, the power 
analysis modeling tools were completed, allowing a user to 
obtain accurate power estimations in the design flow, and 
power optimization tools have been developed, yielding 
promising initial returns. The next year's efforts seek to 
leverage the infrastructure further, validating and verifying 
the tools and measuring the power trade-offs of various 
radiation mitigation schemes. 
References 
[1] French, et al., “Design Tools for Reconfigurable Hardware in Orbit,” 
Earth Science Technology Conference 2004, Palo Alto, California, June 2004. 
[2] “Adaptive Computing Systems”; 1997-2003 DARPA effort; see 
www.jhdl.org 
[3] Carl Carmichael, Earl Fuller, Phil Blain, and Michael Caffrey, “SEU 
Mitigation Techniques for Virtex FPGAs in Space Applications,” 
Proceedings of the Military and Aerospace Programmable Logic Devices 
International Conference (MAPLD), Sept. 1999, Laurel, MD, pp. C2.1-8. 
[4] Michael Caffrey, Paul Graham, Michael Wirthlin, Eric Johnson, and 
Nathan Rollins, “Single-Event Upsets in SRAM FPGAs,” Proceedings of the 
5th Annual International Conference on Military and Aerospace 
Programmable Logic Devices (MAPLD), Sept. 2002, pp. P8.1-6. 
[5] E. Johnson, K. Morgan, M. Wirthlin, M. Caffrey, and P. Graham, 
“Persistent Errors in SRAM-based FPGAs,” MAPLD ’04, Washington, 
D.C., September 2004. 
[6] K. Morgan, E. Johnson, B. Pratt, M. Wirthlin, M. Caffrey, and P. Graham, 
“SEU Induced Error Propagation in FPGAs,” IEEE NSREC ’05, Seattle, 
WA, July 2005, to be presented. 
[7] P. Graham, H. Quinn, J. Krone, M. Caffrey, and S. Rezgui, “Radiation- 
Induced Multi-Bit Upsets in SRAM-Based FPGAs,” IEEE NSREC ’05, 
Seattle, WA, July 2005, to be presented. 
[8] www.ece.utexas.edu/-adnan/vlsi-O~Aec6lnterconnect.u~t 
[9] http://www.xilinx.com/bvdocs/publications/ds031.pdf, page 36. 
