An Analysis of Heavy-Ion Single Event Effects for a Variety of Finite State-Machine



Melanie Berg, AS&D Inc. in support of NASA/GSFC Melanie.D.Berg@NASA.gov

**Kenneth LaBel: NASA/GSFC** 

Hak Kim, Anthony Phan, Christina Seidleck: AS&D Inc.

### **Acronyms**



- Device Under Test (DUT)
- Edge-triggered flip-flops (DFFs)
- Error Detection and Correction (EDAC)
- Finite state machine (FSM)
- Field programmable gate array (FPGA)
- Input output (I/O)
- Linear energy transfer (LET)
- Localized triple modular redundancy (LTMR)
- Low cost digital tester (LCDT)
- Probability of logic masking (P<sub>logic</sub>)
- Radiation Effects and Analysis Group (REAG)
- Single event transient (SET)
- Single event upset (SEU)
- Single event upset cross-section (σ<sub>SEU</sub>)

# FSMs Implemented in FPGAs Targeted for Critical Applications

- FSMs are used to control operational flow in FPGA devices.
- Because of their ease of interpretation, FSMs simplify the design and verification process and consequently are significant components in a synchronous design.
- By definition, the current state of an FSM is stored in DFFs.
- Significance: an SEU in one of an FSM's DFFs can change the FSM's state, which can be detrimental to system operation.



# Motivation: FSM Mitigation and SEU Testing



- Techniques have been applied to FSMs that either:
  - correct the current state of an FSM,
  - detect an incorrect state transition, or
  - auto-transition to a new state if an unmapped state is reached ("safe state-machine" which is very UNSAFE).
- Currently no heavy-ion or proton SEU studies have been performed that measure the efficacy of any of these mitigation approaches.

#### **Overview**



- Define FSMs and various mitigation strategies that can be applied to them.
- Discuss the goal of SEU testing: to investigate mitigation efficacy while varying frequency and giving attention to global route SEEs.
- Discuss a scheme that can be used to test the efficacy of SEU FSM mitigation strategies and provide corresponding SEU test data.

We used the Microsemi ProASIC3 and the Virtex-5QV as DUTs. Data presented is from the ProASIC3 SEU testing.

#### Synchronous FSMs and SEUs



- A synchronous FSM is designed to deterministically transition through a pattern of defined states.
- A synchronous FSM utilizes DFFs to hold its current state, transitions to a next state controlled by a clock edge and combinatorial logic, and only accepts inputs that have been synchronized to the same clock.
- FSM SEUs can occur from:
  - Caught data-path SETs
  - DFF SEUs
  - Clock/Reset SETs



#### **Mapping States into DFFs**



- Each state of an FSM must be mapped into some type of encoding (pattern of bits) stored in DFFs
- Once the FSM state is mapped into a DFF state, it is considered a defined (legal) state
- Based on the number of DFFs used (N), the total number of available DFF state mappings is 2<sup>N</sup>
- Unmapped DFF states are considered illegal states

2<sup>3</sup>=8 available DFF states

5 out of the 8 states are mapped

3 out of the 8 states are unused

 Other encoding schemes can be employed that use more than 3 DFFs.
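
The counts above can be reproduced with a short sketch. It assumes that state *i* is stored as the 3-bit binary value of *i* (the `mapped` dictionary is an illustrative mapping, not taken from the design files), and lists which DFF patterns remain unmapped.

```python
from itertools import product

N_DFFS = 3      # number of DFFs holding the state
N_STATES = 5    # number of FSM states to map

# Assumed binary encoding: state i is stored as the 3-bit value of i.
mapped = {f"State {i}": format(i, f"0{N_DFFS}b") for i in range(N_STATES)}

# Every pattern the DFFs can physically hold: 2**N of them.
all_codes = ["".join(bits) for bits in product("01", repeat=N_DFFS)]

# Patterns with no FSM state assigned are the illegal (unmapped) states.
unmapped = [code for code in all_codes if code not in mapped.values()]

print(len(all_codes), "available DFF states")
print(len(mapped), "mapped:", sorted(mapped.values()))
print(len(unmapped), "unmapped:", unmapped)
```

Under this assumed mapping, the unmapped patterns are the three highest binary values.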



#### 5-State FSM Binary Encoding Example



Example of an FSM used to control a 5-state peripheral device.

[Figure: 5-State Finite State-Machine, Binary Encoding (state diagram showing States 0 through 4)]

5-State FSM with each state encoded as binary numbers.

An SEU can change the current state and cause a catastrophic event.
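
To make the risk concrete, the following sketch flips each DFF of each state once and checks where the FSM lands. It assumes the 5 states are encoded as the binary values 000 through 100; `LEGAL` and `seu_outcomes` are illustrative names, not part of the tested design.

```python
N_DFFS = 3
LEGAL = {0b000, 0b001, 0b010, 0b011, 0b100}  # assumed binary encoding of states 0..4

def seu_outcomes(state: int) -> dict:
    """Return {flipped_bit: (new_code, lands_in_legal_state)} for each single-bit upset."""
    outcomes = {}
    for bit in range(N_DFFS):
        upset = state ^ (1 << bit)          # an SEU inverts exactly one DFF
        outcomes[bit] = (upset, upset in LEGAL)
    return outcomes

# Tally all 5 states x 3 DFFs = 15 possible single-bit upsets.
to_legal = sum(
    is_legal
    for state in sorted(LEGAL)
    for (_code, is_legal) in seu_outcomes(state).values()
)
to_illegal = len(LEGAL) * N_DFFS - to_legal

print("upsets landing in another legal state:", to_legal)
print("upsets landing in an unmapped state:  ", to_illegal)
```

With this encoding, most single-bit upsets move the FSM into a different legal state, so a check that only reacts to unmapped states would not see them.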

## **EDAC: Corrective FSM Mitigation**

- Corrective FSM mitigation (as defined in this presentation) is a scheme that masks and corrects SEUs so that incorrect FSM state transitions do not occur
- Scope of presentation focuses on two corrective mitigation approaches:
  - Localized triple modular redundancy (LTMR)
  - Hamming Code-3
- Auto-transitioning ("safe state-machine") is a reaction to a small subset of incorrect transitions (those that reach unmapped states). Such schemes do not protect against incorrect transitioning and are outside the scope of this presentation.

## **Adding Corrective Mitigation**



- LTMR: Triplicate each DFF and use a majority voter.
  - The triplication + voter is treated as one DFF
  - Encoding doesn't change
  - Resultant FSM has 3 times as many DFFs as the original encoding scheme.
  - Combinatorial logic (not including the voters) does not change
- Hamming Code-3: requires a new encoding scheme.
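
The majority voter used by LTMR can be sketched as a bitwise 2-of-3 function over the three copies of the state word. This is a behavioral illustration only; the function name `majority` is an assumption and does not come from the tested design.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote: each output bit takes the value held by at least two copies."""
    return (a & b) | (a & c) | (b & c)

# Three copies of state 011; an SEU flips the top bit of the third copy.
good = 0b011
upset_copy = good ^ 0b100

voted = majority(good, good, upset_copy)
print(format(voted, "03b"))
```

Because the vote is taken per bit, the encoding and the next-state logic stay the same as in the unmitigated FSM.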

#### **Binary versus LTMR FSMs**





LTMR implementation: the only change is that each DFF is triplicated. A majority voter is used across the triplication.

#### Synchronous LTMR FSMs and SEUs


- Triplication plus majority voter protects against SEUs in DFFs
- There is no mitigation in the data-path; consequently, data-path SETs can still get caught by the DFFs



- If global routes (clocks and resets) are not hardened, then SETs can globally affect DFF states
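
The difference between a DFF SEU and a caught data-path SET can be illustrated with a small behavioral model. It is a sketch under stated assumptions: the class `LtmrStateRegister`, the 5-state `next_state` sequence, and the choice to reload all three copies from the shared next-state logic on every clock edge are illustrative, not taken from the tested design.

```python
class LtmrStateRegister:
    """Three copies of the state word plus a bitwise majority voter."""

    def __init__(self, state: int) -> None:
        self.copies = [state, state, state]

    def voted(self) -> int:
        a, b, c = self.copies
        return (a & b) | (a & c) | (b & c)

    def upset_dff(self, copy: int, bit: int) -> None:
        """Model an SEU in one DFF of one copy."""
        self.copies[copy] ^= 1 << bit

    def clock(self, next_value: int) -> None:
        """All three copies capture the same next-state value on the clock edge."""
        self.copies = [next_value] * 3


def next_state(state: int) -> int:
    """Hypothetical next-state logic: step through 5 states in order."""
    return (state + 1) % 5


reg = LtmrStateRegister(2)

# Case 1: SEU in a single DFF. The voter masks it, and the next clock edge
# reloads all three copies from the voted (correct) state.
reg.upset_dff(copy=0, bit=2)
masked = reg.voted()
reg.clock(next_state(masked))

# Case 2: data-path SET. The unprotected next-state value is corrupted at the
# capturing clock edge, so all three copies store the same wrong value.
expected = next_state(reg.voted())
reg.clock(expected ^ 0b001)
after_set = reg.voted()

print("after DFF SEU, voted state:", masked)
print("after data-path SET, voted state:", after_set, "expected:", expected)
```

In the second case the three copies agree with each other, so the voter has nothing to correct.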

# FSM Fault Tolerance: 5-State Conversion to a Hamming Code-3 FSM





Hamming Code-3 FSM Diagram for a 5 Base-State FSM: Would need 5\*7=35 FSM states to be represented... 6 DFFs

A closer look at a base-state (state 0) and its companion-states
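
The base-state and companion-state idea can be sketched as follows. The parity equations in `encode` are one possible distance-3 code over 6 bits, chosen for illustration; the code actually used in the presentation is not given in the text, so the exact bit patterns here are assumptions.

```python
N_DFFS = 6
BASE_STATES = range(5)

def encode(state: int) -> int:
    """Append 3 parity bits to a 3-bit state number (an assumed distance-3 code)."""
    d0, d1, d2 = state & 1, (state >> 1) & 1, (state >> 2) & 1
    p0, p1, p2 = d0 ^ d1, d0 ^ d2, d1 ^ d2
    return (state << 3) | (p2 << 2) | (p1 << 1) | p0

# Map each base codeword and each of its 6 single-bit-flip companions
# back to the base state it belongs to.
owner = {}
for s in BASE_STATES:
    code = encode(s)
    owner[code] = s
    for bit in range(N_DFFS):
        owner[code ^ (1 << bit)] = s

def correct(dff_value: int) -> int:
    """Return the base state for a codeword or companion (KeyError otherwise)."""
    return owner[dff_value]

companions_of_0 = sorted(c for c, s in owner.items() if s == 0 and c != encode(0))

print("states represented:", len(owner))
print("companions of state 0:", [format(c, "06b") for c in companions_of_0])
```

Because the codewords are at least 3 bit-flips apart, no companion is shared between two base states, which is why the table holds 5 × 7 distinct entries. Patterns outside the table correspond to multi-bit upsets, which a distance-3 code cannot correct.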



# SEU Testing of FSMs: Efficacy of mitigation while investigating how frequency and global routing affect FSM σ<sub>SEU</sub>s

LETs lower than 10 MeV·cm²/mg are used. Otherwise, global route SEUs dominate.

# ProASIC3 SEU Heavy-Ion Test Structures:



- No error detection and correction: 8-bit Binary Encoding:
  - 256 FSM states total
  - Binary: 1 DFF per bit requires 8 DFFs
- Localized triple modular redundancy (LTMR): 8-bit Binary Encoding:
  - 256 FSM states total
  - LTMR: 3 DFFs per bit requires 24 DFFs
- Hamming Code-3: 5-bit encoding:
  - 32 FSM states total
  - Hamming Code-3 must represent all states plus their companion states and requires 9 DFFs
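
The DFF counts in this list can be checked with a short calculation. The sketch assumes that "Hamming Code-3" denotes a single-error-correcting Hamming code (minimum distance 3), for which the number of parity bits r is the smallest value satisfying 2^r ≥ data bits + r + 1. The function names are illustrative.

```python
def hamming_parity_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error-correcting Hamming code)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r

def dff_count(scheme: str, state_bits: int) -> int:
    """DFFs needed to hold the state for each mitigation scheme."""
    if scheme == "binary":
        return state_bits                      # 1 DFF per bit
    if scheme == "ltmr":
        return 3 * state_bits                  # 3 DFFs per bit
    if scheme == "hamming3":
        return state_bits + hamming_parity_bits(state_bits)
    raise ValueError(f"unknown scheme: {scheme}")

for scheme, bits in [("binary", 8), ("ltmr", 8), ("hamming3", 5)]:
    print(f"{scheme:9s} {bits}-bit: {2 ** bits:3d} FSM states, {dff_count(scheme, bits)} DFFs")
```

The same formula gives 6 DFFs for a 3-bit (5-state) encoding, matching the 5 base-state example shown earlier.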

For statistical analysis, a large number of each of these FSMs are implemented.

## ProASIC3 FSM Heavy-Ion SEU General Test Structure Diagram

The REAG Counter Array concept is used. FSMs replace counters.




#### **ProASIC3 Heavy-Ion FSM SEU Testing**



SEU cross-sections per FSM. Scale is log-linear.

SEU cross-sections for global routes (clocks and resets). Scale is linear-linear.

#### **Novelty of SEU FSM Results**



- Previous EDAC+FSM studies demonstrated efficacy by means of theory or by fault injection in soft-configuration SRAM-based FPGAs. Problems:
  - Theory doesn't take into account data-path SETs and global route upsets.
  - EDAC implementations with FSMs are not worthwhile schemes in soft-configuration devices. This cannot be uncovered using fault injection because global route SETs and frequency response cannot be fully investigated with fault injection.
  - In general, previous studies disregard LET (size of SET), global routes, and frequency of operation.
- This is the first study to investigate FSM SEU response to heavy-ions while taking into account frequency, SETs, and global routing effects.

#### **Conclusions**



- The Snap-Shot test scheme has proven to be a reliable approach for investigating FSM SEEs.
- Analysis of non-mitigated FSM data shows that it cannot be assumed that the FSM-σ<sub>SEU</sub>s will increase across frequency.
  - Well-mitigated (e.g., LTMR and Hamming-3) FSM-σ<sub>SEU</sub>s increase across frequency
  - Non-mitigated FSM-σ<sub>SEU</sub>s decrease across frequency
- Well-mitigated FSM-σ<sub>SEU</sub>s will be lower than non-mitigated FSM-σ<sub>SEU</sub>s
- Global routing:
  - A trade-off analysis should be performed before deciding whether to use mitigation, because the global routing SEUs may be significant enough to erase the gains from additional mitigation circuitry
  - At lower frequencies, mitigation will reduce global routing σ<sub>SEU</sub>s