# **SEE Test and Data Analysis for Complex FPGA Systems**





Melanie Berg<sup>1</sup>, Michael Campola<sup>2</sup>

Melanie.D.Berg@NASA.gov; Melanie.Berg@SSAI.com
1. SSAI in support of NASA/GSFC
2. NASA/GSFC

#### **Acronyms**



- Application specific integrated circuit (ASIC)
- Block random access memory (BRAM)
- Combinatorial logic (CL)
- Configurable Logic Block (CLB)
- Device under test (DUT)
- Edge-triggered flip-flops (DFFs)
- Field programmable gate array (FPGA)
- High speed serial interface (GTX)
- Input output (I/O)
- Intellectual property (IP)
- INV (inverter)
- Linear energy transfer (LET)
- Look up table (LUT)
- Mean fluence to failure (MFTF)

- One time programmable (OTP)
- Operational frequency (fs)
- Power on reset (POR)
- Place and Route (PR)
- Representative Tactical Design (RTD)
- Reprogrammable (RP)
- Single event functional interrupt (SEFI)
- Single event effects (SEEs)
- Single event failure (SEF)
- Single event latch-up (SEL)
- Single event transient (SET)
- Single event upset (SEU)
- Single event upset cross-section (σ<sub>SEU</sub>)
- Static random access memory (SRAM)
- Static timing analysis (STA)
- System on a chip (SOC)

#### **Problem Statement**






- Data are extrapolated into survivability calculators.
- Generic SEU data are used across all designs.
- Assumption: the need for testing is reduced.
- However, the fidelity of generic SEU data extrapolation to tactical designs is questionable.

Better to use representative tactical designs (RTD) for SEU analysis:

- Data are a better fit for characterizing tactical behavior.
- However, this requires SEU testing for every design.

How do we provide SEU data for survivability calculations of tactical systems while reducing the need to test every design? The trade-off is generic testing versus Test-As-You-Fly.

#### FPGA SEU Cross Section Model





FPGA internal elements (figure labels):

- Configurable logic blocks (CLBs)
- Block random access memory (BRAM)
- Intellectual property (IP), e.g., microprocessors, digital signal processor (DSP) blocks, embedded state machines
- Global routes (GR)
- Analog circuits

SEU cross-sections for a mapped design ($\sigma_{SEF}$) are based on the FPGA's internal elements and the mapped design's topology.

Dominant mechanisms of failure will drive $\sigma_{SEF}$.

#### **Embedded View of Mapped Logic**



FPGA configuration and user logic are different types of embedded components.



Modern FPGAs have hundreds of millions of configuration bits and hundreds of thousands of logic cells.



Designs only map into a portion of the configuration and only use a portion of the user fabric logic gates.

#### Generic Test Structures: Shift Register





User logic: lookup table (LUT)



In an SRAM-based FPGA, each design uses more logic than assumed, which makes extrapolating SEU data from simple test structures to tactical designs unreliable.

Generic Xilinx implementation (the LUT can differ by family)

# Closer Look: Shift Register with Manufacturer Inserted Routing Matrix (Hidden Logic)





Simple test structures will not capture the impact of a tactical design's hidden logic, so their data cannot be extrapolated. Hence the drive toward testing RTD structures.

# Representative Tactical Design (RTD) Test Structures and MFTF Test Strategies



- RTDs are based on tactical designs and might contain the following:
  - Embedded processors
  - High-speed serial (GTX)
  - Embedded SRAM (BRAM)
  - Global routes
- Obey tactical design strategy:
  - Synchronous design
  - Routing/floorplanning specifics
- Piecemeal tests, yet use complex structures:
  - Increases visibility
  - Study trends
  - Have at least one full RTD (as close as possible to the tactical design)
- MFTF testing requires an increase in the number of experiments (statistics).
- MFTF testing will be driven by dominant mechanisms of failure in the design (given proper testing and visibility into failure).
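Because MFTF is a statistic over repeated runs, the number of experiments affects how well it is known. The following is a minimal sketch, assuming each run records the fluence accumulated at the first observed failure. The run values and function names are illustrative and are not measured data; the standard error of the mean is used here as one simple indicator of statistical quality, not as a prescribed method.

```python
import math
import statistics


def mean_fluence_to_failure(fluences_at_failure):
    """Average the fluence (particles/cm^2) accumulated at failure over all runs."""
    if len(fluences_at_failure) < 2:
        raise ValueError("at least two runs are needed for a spread estimate")
    return statistics.fmean(fluences_at_failure)


def standard_error(fluences_at_failure):
    """Standard error of the mean: sample standard deviation / sqrt(number of runs)."""
    return statistics.stdev(fluences_at_failure) / math.sqrt(len(fluences_at_failure))


# Hypothetical fluence-to-failure values from three runs (particles/cm^2).
runs = [1.0e6, 2.0e6, 3.0e6]

mftf = mean_fluence_to_failure(runs)
mftf_error = standard_error(runs)

print(f"MFTF = {mftf:.3e} particles/cm^2 +/- {mftf_error:.1e} (standard error)")
```

Adding runs shrinks the standard error roughly as one over the square root of the run count, which is one way to see why MFTF testing needs more experiments than a single-shot measurement.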

# RTD Test Structures and MFTF Strategies: Not a Simple Task



### The following expertise is required:

- Professional design techniques
- Complex test system development
- The ability to create visibility into test structures for proper MFTF measurement
- Knowledge of test facilities







# Data Analysis: Easing the process of SEU test and analysis for tactical-design survivability prediction.

The following slides apply only to Xilinx SRAM-based FPGA devices with no embedded or user-inserted mitigation.

# Configuration, Mask, and Essential Bits



**Configuration Total (fixed for each FPGA type)**

|          | Essential bits | Total bits | Masked bits | Unmasked bits |
|----------|----------------|------------|-------------|---------------|
| Design A | 13326446       | 115522848  | 8853590     | 147819850     |
| Design B | 10334231       | 115522848  | 27727958    | 128945482     |
| Design C | 6515993        | 115522848  | 8857942     | 147815498     |

Masked Total: calculated by the manufacturer and not under user control (design and device dependent).

Essential Bit Total: number of configuration bits used by the design mapping, calculated by the manufacturer upon user directive (design and device dependent).
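As a quick worked example of how small a share of the configuration each design occupies, the sketch below divides the essential-bit counts by the configuration total, both taken from the table. The function and variable names are illustrative.

```python
# Values copied from the table above.
TOTAL_BITS = 115_522_848
essential_bits = {
    "Design A": 13_326_446,
    "Design B": 10_334_231,
    "Design C": 6_515_993,
}


def essential_fraction(essential, total):
    """Fraction of the device configuration used by the design mapping."""
    if total <= 0:
        raise ValueError("total must be positive")
    return essential / total


fractions = {
    name: essential_fraction(bits, TOTAL_BITS)
    for name, bits in essential_bits.items()
}

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of configuration bits are essential")
```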

#### **SEU Cross-Sections**





- Cross-section Categorization:
  - Across all configuration cells (device)
  - Per configuration cell (device-bit)
  - Across essential-bits (Design + device)
  - Design specific

Generally, configuration cross-sections are readily available from generic device investigations.

$$\sigma(LET)_{configuration\_Device} = \frac{\#errors}{\#Particles/cm^2}$$

$$\sigma(LET)_{configuration\_bit} = \frac{\#errors}{(\#Particles/cm^2) \times (\#unmasked\_configuration\_bits)}$$

$$\sigma(LET)_{Essential\_bit} = Essential\_bits \times \sigma(LET)_{configuration\_bit}$$

$$\sigma(LET)_{SEF} = \frac{1}{MFTF} = \frac{1}{(FailureTime - BeamStartTime) \times AverageFlux}$$
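The four cross-section categories can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: fluence is in particles/cm², flux is in particles/cm²/s, times are in seconds, and all numeric inputs are hypothetical round numbers chosen for readability, not test results. The function names are not from any library.

```python
def sigma_configuration_device(errors, fluence):
    """Configuration cross-section across the whole device (cm^2/device)."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return errors / fluence


def sigma_configuration_bit(errors, fluence, unmasked_bits):
    """Configuration cross-section per unmasked configuration bit (cm^2/bit)."""
    if unmasked_bits <= 0:
        raise ValueError("unmasked_bits must be positive")
    return sigma_configuration_device(errors, fluence) / unmasked_bits


def sigma_essential_bit(essential_bits, sigma_bit):
    """Essential-bit cross-section: per-bit value scaled by the design's essential bits."""
    return essential_bits * sigma_bit


def sigma_sef(failure_time, beam_start_time, average_flux):
    """Design-specific cross-section as the reciprocal of fluence to failure."""
    fluence_to_failure = (failure_time - beam_start_time) * average_flux
    if fluence_to_failure <= 0:
        raise ValueError("fluence to failure must be positive")
    return 1.0 / fluence_to_failure


# Hypothetical inputs for illustration only.
sigma_device = sigma_configuration_device(errors=1000, fluence=1.0e7)
sigma_bit = sigma_configuration_bit(errors=1000, fluence=1.0e7, unmasked_bits=100_000_000)
sigma_essential = sigma_essential_bit(essential_bits=10_000_000, sigma_bit=sigma_bit)
sigma_design = sigma_sef(failure_time=150.0, beam_start_time=50.0, average_flux=1.0e4)

print(f"device:        {sigma_device:.2e} cm^2")
print(f"per bit:       {sigma_bit:.2e} cm^2/bit")
print(f"essential-bit: {sigma_essential:.2e} cm^2")
print(f"SEF:           {sigma_design:.2e} cm^2")
```

For a single run, the fluence to failure stands in for MFTF; with several runs, the mean over runs would be used instead.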

Which cross-sections do we use for survivability analysis?

Must consider mission requirements.

#### **Mission Driven Data Analysis**



- Assuming configuration SEU cross-sections are strict upper-bounds: does the survivability prediction using the per-device configuration SEU cross-sections satisfy mission requirements?
  - If mission requirements are satisfied, the analysis can stop here and readily available configuration SEU cross-sections can be used.
  - Additional testing might be required to investigate device anomalies.
- Assuming essential-bit SEU cross-sections are strict upper-bounds: do the essential-bit SEU cross-sections satisfy mission requirements?
  - In most cases, this will still be a strict upper-bound on a design's SEU susceptibility; however, the assumption should be verified by test.
  - Requires configuration read-back tests.
  - Requires RTD-MFTF testing.
- If MFTF SEU results are not mission compliant, is mitigation necessary?
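The questions above can be read as a decision sequence. The sketch below is one possible encoding, assuming the mission requirement has been reduced to a single maximum tolerable cross-section at the LET of interest; a real survivability analysis also folds in the radiation environment, so this is a simplification. All names are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    CONFIGURATION_BOUND_SUFFICIENT = "use per-device configuration cross-section"
    ESSENTIAL_BIT_BOUND_SUFFICIENT = "use essential-bit cross-section (verify by test)"
    RTD_MFTF_TEST_NEEDED = "perform RTD-MFTF testing"
    MEASURED_SEF_SUFFICIENT = "use measured SEF cross-section"
    CONSIDER_MITIGATION = "evaluate user-inserted mitigation"


def mission_driven_choice(sigma_device, sigma_essential, sigma_required, sigma_sef=None):
    """Walk from the loosest upper-bound to the tightest data set.

    sigma_required is the assumed maximum tolerable cross-section (cm^2).
    sigma_sef is the RTD-MFTF result, or None if no such test has been run.
    """
    if sigma_device <= sigma_required:
        return Outcome.CONFIGURATION_BOUND_SUFFICIENT
    if sigma_essential <= sigma_required:
        return Outcome.ESSENTIAL_BIT_BOUND_SUFFICIENT
    if sigma_sef is None:
        return Outcome.RTD_MFTF_TEST_NEEDED
    if sigma_sef <= sigma_required:
        return Outcome.MEASURED_SEF_SUFFICIENT
    return Outcome.CONSIDER_MITIGATION


# Hypothetical values: the device bound is too loose, the essential-bit bound passes.
choice = mission_driven_choice(
    sigma_device=1.0e-4, sigma_essential=1.0e-5, sigma_required=5.0e-5
)
print(choice.value)
```

Ordering the checks from the most readily available data to the most expensive test reflects the goal of testing only when the cheaper upper-bounds fail to meet requirements.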

# If Upper-bounds Satisfy Mission Reliability/Survivability Requirements, Then No Mitigation is Required.





### Xilinx SEU Test and Analysis: What Can the Manufacturer Provide?



#### **Front-end Proof of Concept**

 $\sigma(LET)_{\text{Essential\_bit}} = Essential\_bits \times \sigma(LET)_{configuration\_bit}$ 

- Goal is to determine if generic data can be extrapolated to characterize complex tactical designs.
- Generic DFF, CLB, and LUT test data cannot be extrapolated to tactical designs.
  - Topology effects are non-linear, and such data do not include hidden logic.
- An alternative is to prove that $\sigma(LET)_{Essential\_bit}$ is an upper-bound on $\sigma(LET)_{SEF}$.



Manufacturer performs a variety of tests (benchmarks) to compare $\sigma(LET)_{Essential\_bit}$ to $\sigma(LET)_{SEF}$.



Manufacturer provides generic data: configuration, BRAM, and embedded logic cross-sections.



Manufacturer performs additional testing to investigate potential SEFIs and other device SEU susceptibilities (global routes and SEL).

### Xilinx SEU Test and Analysis: What Does The End-User Do with The Data?



#### **Application of Concept**


- If  $\sigma(LET)_{Essential\_bit}$  proves to be a satisfactory upper-bound, the  $\sigma(LET)_{configuration\_bit}$  data and the tactical design's calculated essential-bits can be used by development teams for survivability analysis.
- In the past, $\sigma(LET)_{Essential\_bit}$ has been assumed (by some) to be adequate for survivability prediction. However, as technology shrinks, the need for RTD-MFTF testing and proof of concept is growing:
  - Mixed-signal circuitry, global-routes, and hidden logic (embedded IP cores) will have more impact on  $\sigma(LET)_{\rm SEF}$  at low LETs.



Compare your design to the manufacturer's benchmark tests. Use $\sigma(LET)_{Essential\_bit}$ for survivability calculations if $\sigma(LET)_{Essential\_bit} > \sigma(LET)_{SEF}$.



If manufacturer data show anomalies or your tactical design has untested complexities, additional RTD testing will be needed.



The end-user should not piece together results from fine-grained components (e.g., CLBs) for survivability analysis, because of hidden logic and topological non-linearities.
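The comparison against benchmark data amounts to checking the upper-bound condition at every tested LET. A minimal sketch, assuming both data sets are available as mappings from LET (MeV·cm²/mg) to cross-section (cm²); the data values and the function name are hypothetical.

```python
def bound_violations(sigma_essential_by_let, sigma_sef_by_let):
    """Return the LET values where the essential-bit value is not above the SEF value."""
    violations = []
    for let, sigma_sef in sorted(sigma_sef_by_let.items()):
        if let not in sigma_essential_by_let:
            raise KeyError(f"no essential-bit cross-section at LET {let}")
        # The bound holds only if the essential-bit value is strictly larger.
        if sigma_sef >= sigma_essential_by_let[let]:
            violations.append(let)
    return violations


# Hypothetical benchmark data for illustration only.
sigma_essential_by_let = {1.0: 2.0e-6, 5.0: 8.0e-6, 20.0: 3.0e-5}
sigma_sef_by_let = {1.0: 4.0e-7, 5.0: 9.0e-6, 20.0: 1.0e-5}

violations = bound_violations(sigma_essential_by_let, sigma_sef_by_let)
if violations:
    print(f"Upper-bound fails at LET values: {violations}")
else:
    print("Essential-bit cross-section bounds SEF at every tested LET")
```

Any LET reported here would be a candidate for additional RTD testing rather than for use of the essential-bit bound.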

#### **Kintex UltraScale SEU Cross-Sections**





#### $\sigma_{\text{Essential\_bit}} > \sigma_{\text{SEF}}$

Implies  $\sigma_{Essential\_bit}$  can be used to predict survivability (non-mitigated design).

More testing will be performed to investigate whether SEFIs occur and whether the upper-bound holds across complex designs (e.g., embedded processors) and at higher LETs.

#### **Mitigation Analysis**



- If the survivability analysis proves the design implementation does not satisfy mission requirements, user-inserted mitigation might be necessary.
  - This will change the design and its essential-bit count.
  - Essential-bit upper-bounds cannot be used to measure the survivability of applications with embedded mitigation.
    - Mitigation requires additional logic
    - Additional logic will increase the essential-bit count and consequently increase the estimated $\sigma_{\text{SEF}}$.
  - RTD-MFTF testing is required to measure the efficacy of the inserted mitigation; it cannot be assumed that mitigation performs as expected.
  - Requires the development team to perform SEU testing.
- Analyze the design with and without mitigation (when possible); the comparison serves as another metric for the fidelity of the inserted mitigation.



#### **Summary**




- Purpose of the work is to improve SEU data-sets used for survivability analysis.
- Generic SEU data obtained from testing simple structures (e.g., shift registers) are no longer adequate for SEU characterization of FPGA designs.
- An approach is presented that combines investigating simple and complex test structures:
  - Investigates the efficacy of using configuration SEU data with design specific information for survivability analysis.
  - Goal is to reduce the necessity of performing SEU testing on every design.
  - MFTF testing of complex structures is required to validate the approach (per SRAM-based FPGA family of devices).
- Xilinx Kintex UltraScale data are presented:
  - Data suggest that essential-bit SEU cross-section might be a reliable dataset for survivability analysis.
  - Additional testing by Xilinx is required and will be performed; initial results are promising.
  - Eventually, this approach can reduce the need for testing by the end-user.
- If mitigation is required, RTD-MFTF testing to measure $\sigma(LET)_{\rm SEF}$ must be performed or orchestrated by the end-user.