Functional Failure Rate Due to Single-Event Transients in Clock Distribution Networks
Thomas Lange * † , Maximilien Glorieux * , Dan Alexandrescu * , Luca Sterpone †
Abstract-With technology scaling, lower supply voltages, and higher operating frequencies, clock distribution networks become increasingly vulnerable to transient faults. These faults can cause circuit-wide effects and thus contribute significantly to the functional failure rate of the circuit. This paper proposes a methodology to analyse how the functional behaviour is affected by Single-Event Transients in the clock distribution network. The approach is based on logic-level simulation and thus only uses the register-transfer level description of a design. To this end, a fault model is proposed which implements the main effects of radiation-induced transients in the clock network. This fault model enables the computation of the functional failure rate caused by Single-Event Transients for each individual clock buffer, as well as for the complete network. Further, it allows the identification of the flip-flops most vulnerable to Single-Event Transients in the clock network.
I. INTRODUCTION
Today's reliability standards and customers' expectations set tough targets for the quality of electronic devices and systems. Among other reliability threats, transient faults, such as Single-Event Upsets (SEUs) in sequential/state logic and Single-Event Transients (SETs) in combinatorial logic, are known to contribute significantly to the overall failure rate of the system, possibly exceeding the set reliability targets. As an example, standard flip-flops and SRAM memories manufactured in relatively recent technologies (down to the latest CMOS bulk processes) exhibit error rates of hundreds of FITs (events per billion working hours) per megabit [1], [2]. Complex circuits using such cells can easily overshoot the 10 FIT target mandated by ISO 26262 for an automotive ASIL D application.
Circuits' susceptibility to transient faults/single events is caused by faults occurring in the circuit's cells and their subsequent propagation in the system, possibly causing observable effects (failures) at the system level. The impact of Single-Event Upsets and Single-Event Transients in individual state and combinatorial cells has been extensively studied and, for many applications, is the leading contributor to the overall event rate exhibited by the circuit. However, due to technology scaling, lower supply voltages and higher operating frequencies, other circuit features such as the clock distribution network (CDN), reset circuitry, etc. also become more vulnerable to transient faults [3] - [6] and could cause circuit-wide effects that are more difficult to mitigate and to correct. Indeed, clock buffers in clock distribution networks have a high fan-out and very few masking mechanisms; Single-Event Transients occurring in these cells can potentially reach many sequential cells and state elements and thus contribute significantly to the overall functional failure rate.
A. Objective of Our Methodology
So far, only a few works have studied the impact of SETs in clock networks. To determine the sensitivity of clock buffer cells to these events, some studies performed accelerated radiation tests of dedicated test chips [5], [7]. Other approaches computed a static failure rate by performing circuit simulation at the electrical level, thus obtaining the Electrical De-Rating per clock buffer as well as the upset rate of the sequential logic due to SETs in the clock network. This upset rate was combined with the functional failure rate due to SEUs in the sequential logic, obtained from an SEU fault injection campaign [8], [9]. However, these SET fault injection simulations used only static inputs and thus do not reflect any dynamic behaviour during the runtime of the circuit. Hence, [10] extended this method by injecting SETs into the clock distribution network during a dynamic electrical simulation, thus obtaining the faulty latching activity of the sequential logic.
Nonetheless, the previous works do not analyse the impact of SETs on the functional behaviour of the circuit and, furthermore, are all based on electrical simulations. As the complexity of today's circuits keeps increasing, a dynamic simulation of the full circuit at the electrical level is no longer feasible. Thus, contrary to the previous work, the fault model proposed in this paper is based on logic-level simulation and only requires the register-transfer level description of a design. This enables a faster analysis of the circuit. The proposed method is evaluated by applying it to a practical example and performing a fault injection campaign.
B. Organisation of the Paper
The remainder of this paper is organised as follows: Section II summarises the definition of Single-Event Effects and the different de-rating mechanisms and relates them to the context of SETs in clock distribution networks. The proposed methodology and the dedicated fault model are described in Section III. In Section IV the proposed method is validated on a practical example, and the functional failure rates for each clock buffer and for the whole network are computed. Further, the flip-flops most vulnerable to transients in the clock network are identified. Section V rounds off this paper with concluding remarks and prospects for future work.
II. SINGLE-EVENT EFFECT MECHANISM WITH REGARD TO CLOCK DISTRIBUTION NETWORKS
Erroneous data in one of the memory or logic points of a circuit can be produced by the propagation of a Single-Event Transient (SET) or Single-Event Upset (SEU). SETs are the result of the collection of charge deposited by ionising particles on combinatorial logic cells. SEUs are changes of the logic state of a discrete sequential element, such as a latch, a flip-flop or a memory cell.
In the data path between flip-flops, four de-rating mechanisms are distinguished [11], [12]: Electrical, Temporal, Logical and Functional De-Rating. Even if a fault is latched, its impact on the function of the circuit can vary, and in many cases it is benign. Thus, considering the faults at an applicative level, the de-rating depends on the criteria defining the acceptable behaviour of the circuit during the execution of an application and on the fault classifications (correctable, uncorrectable, not detected by the hardware but detected by the software, whether a retry is possible, whether there is a time limit to receive the correct result, etc.). The structural de-rating mechanisms are used to evaluate the probability of the propagation of a fault during the clock cycle of its occurrence. They are usually estimated using probabilistic algorithms and simulation-based approaches.
For a transient in a clock distribution network (CDN), Logical De-Rating and Temporal De-Rating are limited. An SET may potentially be logically masked by a clock-gating cell or by an enable pin of a flip-flop. Temporal De-Rating is limited because the clock input of a flip-flop is by definition asynchronous.
In [13] two main effects due to transients in the clock network are identified: radiation-induced jitter and radiation-induced race. Jitter occurs if a transient causes the clock edge to move forward or backward, causing a timing violation. A race condition occurs if a transient causes a flip-flop that is closed to become open, allowing data to "race" through to the next stage.
The objective of this paper is to present a methodology to compute the functional failure rate of a circuit with regard to Single-Event Transients in the clock network. To this end, the described radiation-induced effects are implemented in a fault model based on logic-level simulation, which is presented in the next section.
III. METHODOLOGY
To analyse how the functional behaviour is affected by SETs in the clock distribution network (CDN), the main radiation-induced effects are implemented in a fault model. In order to cope with the complexity of today's circuits, the proposed fault model is based on logic-level simulation, which enables a faster analysis than simulations at the electrical level. By using this fault model in a fault simulation campaign, the functional failure rate for each clock buffer and for the whole network can be calculated. Further, the vulnerability of the sequential logic to these events can be computed.
A. Fault Model
The proposed fault model, which implements the effects of Single-Event Transients propagating along the clock network, is illustrated in Fig. 1a. It is based on logic-level simulation and thus only uses the register-transfer level (RTL) description of a design. To emulate an SET in the clock network, first, a clock buffer is selected as the injection target. Second, all flip-flops which are connected to the end-points of the selected clock buffer are identified. Then, during the RTL simulation, the signal value at the output of each identified flip-flop is modified at the injection time. The SET-induced clock pulse is imitated by copying the signal value from the flip-flop input signal D_in to its output signal Q_out, as shown in Fig. 1b. Thus, only flip-flops which would have changed their state in the following clock cycle are impacted by the transient, and the others remain unchanged (as shown in Fig. 1c).
The proposed fault model does not take any electrical or timing behaviour into account and thus represents a worst-case scenario. However, it can be combined with measured cross-sections of the clock buffer cells obtained during radiation experiments, as shown in [7], or with Electrical De-Rating factors obtained from electrical-level simulations (without taking the runtime behaviour into account), as described in [8].
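The injection step of the fault model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the simulator state is represented by two plain dictionaries, and the names `inject_clock_set`, `d_in`, `q_out` and the flip-flop names are hypothetical, not taken from the authors' tooling.

```python
from typing import Dict, Iterable, List, Tuple


def inject_clock_set(
    d_in: Dict[str, int],
    q_out: Dict[str, int],
    reached: Iterable[str],
) -> Tuple[List[str], List[str]]:
    """Emulate an SET-induced clock pulse on the flip-flops in `reached`.

    Every reached flip-flop latches its current input (Q_out := D_in).
    Returns the flip-flops whose state changed and those that did not.
    """
    changed, unchanged = [], []
    for ff in reached:
        if q_out[ff] != d_in[ff]:
            q_out[ff] = d_in[ff]  # the spurious clock pulse captures D_in
            changed.append(ff)
        else:
            unchanged.append(ff)  # D_in equals Q_out, so no visible effect
    return changed, unchanged


# Hypothetical snapshot of three flip-flops at the injection time.
d_in = {"ff_a": 1, "ff_b": 0, "ff_c": 1}
q_out = {"ff_a": 0, "ff_b": 0, "ff_c": 0}

# Only ff_a and ff_b are driven by the (hypothetical) selected clock buffer.
changed, unchanged = inject_clock_set(d_in, q_out, reached=["ff_a", "ff_b"])
```

In this example `ff_a` changes state, `ff_b` is reached but keeps its value, and `ff_c` is untouched because it is not driven by the selected buffer.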
B. Virtual Clock Network
The proposed method relies on the RTL model of a design. Typically, these models do not provide a clock distribution network. In general, the clock network is obtained by performing a clock network synthesis during the physical design stage of a chip. In this paper this step is simplified by generating a virtual clock network. The generation of a virtual clock network enables an analysis of the circuit in earlier design stages with regard to clock network issues (such as SETs) and allows the evaluation of different clock network features, such as the fan-out, layout or topology. In fact, recent work has shown that topology and gate load play a significant role in the overall SET sensitivity of clock networks [14].
In the most common implementation of clock distribution networks, buffers are inserted at the clock source and/or along the clock path, forming a tree structure. The most widely used topology is the symmetric H-tree, which can also be seen as a binary tree topology [15] (as illustrated in Fig. 1a). This network can be generated in a recursive manner: The root clock buffer (stage 1) is assigned to the full set of available flip-flops. The second stage has two clock buffers which are driven by the buffer of the first stage. The full set of flip-flops is split into two disjoint sets of half the size, which are assigned to the two different clock buffers. For the next stage of clock buffers these sets are again divided in half and assigned to separate clock buffers which are driven by the clock buffers of the previous stage. This process is repeated until the defined minimum fan-out of the clock buffers is reached. The example shown in Fig. 1a consists of a set of 9 flip-flops and a minimum fan-out of 2, which results in 3 levels of clock buffers. Due to the odd number of flip-flops, the actual fan-out of the clock buffers ranges from 2 to 3.
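The recursive construction described above can be sketched as follows. The sketch assumes that a set is split only while both halves still meet the minimum fan-out; this stopping rule, the `ClockBuffer` class and the function names are illustrative assumptions rather than the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ClockBuffer:
    stage: int                     # 1 for the root buffer
    flip_flops: List[str]          # all flip-flops reached through this buffer
    children: List["ClockBuffer"] = field(default_factory=list)


def build_virtual_cdn(
    flip_flops: Sequence[str], min_fanout: int, stage: int = 1
) -> ClockBuffer:
    """Recursively build a binary-tree virtual clock network."""
    buf = ClockBuffer(stage=stage, flip_flops=list(flip_flops))
    half = len(flip_flops) // 2
    # Assumed stopping rule: split only if the smaller half keeps min_fanout.
    if half >= min_fanout:
        buf.children = [
            build_virtual_cdn(flip_flops[:half], min_fanout, stage + 1),
            build_virtual_cdn(flip_flops[half:], min_fanout, stage + 1),
        ]
    return buf


def all_buffers(root: ClockBuffer) -> List[ClockBuffer]:
    """Flatten the tree into a list of buffers (pre-order)."""
    out = [root]
    for child in root.children:
        out.extend(all_buffers(child))
    return out


# The 9 flip-flop example with a minimum fan-out of 2.
root = build_virtual_cdn([f"ff{i}" for i in range(9)], min_fanout=2)
buffers = all_buffers(root)
leaves = [b for b in buffers if not b.children]
print(len(buffers), max(b.stage for b in buffers),
      sorted(len(b.flip_flops) for b in leaves))
# prints: 7 3 [2, 2, 2, 3]
```

Under this stopping rule the example yields 3 stages and leaf fan-outs of 2 to 3, in line with the description of Fig. 1a. Passing a shuffled or name-sorted flip-flop list changes which flip-flops share a buffer without changing the tree shape.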
C. Fault Injection Simulation Campaign
With the previously described fault model, a fault injection campaign based on logic-level simulation can be performed. This requires the RTL model of the considered design and a testbench. The testbench makes it possible to verify the correct behaviour of the circuit. This can be done, for example, by monitoring and recording all outputs of the circuit. The record can be used as the golden reference, and any difference is considered a functional failure.
In the fault injection campaign, faults are injected into the clock buffers of the clock network at a random point in time according to the described fault model. During each fault injection the changed and unchanged flip-flops are captured and stored. After the injection, the simulation is continued. The circuit output is monitored during the whole simulation and compared to the golden reference. If, according to the monitored output, no failure on the functional level is observed, the injected fault was masked and the correct function is verified. If the functional behaviour differs from the reference, the fault is considered a functional failure. Thus, the Functional De-Rating factor of SETs in a clock buffer and in the complete clock network can be computed. Further, by tracking the flip-flop changes which lead to a functional failure, the vulnerability of the sequential logic can be calculated and the most vulnerable flip-flops can be identified. This information can provide guidelines to the circuit designer for improving the robustness of the clock distribution network. For example, techniques for selectively hardening the most critical clock buffers are shown in [6] and [16]. Further, the ∆-TMR technique can be used, which hardens the sequential logic against SEUs but also introduces delays into the data path in such a way that the logic is protected against SETs in the clock signal [17].
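The bookkeeping of such a campaign can be sketched as below. The sketch assumes that the Functional De-Rating factor is estimated as the fraction of injections that end in a functional failure, and that the network-level value is the unweighted fraction over all injections; the record type `InjectionResult`, the buffer names and the sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class InjectionResult:
    buffer: str          # clock buffer that received the SET
    changed: List[str]   # flip-flops whose state changed at injection time
    failed: bool         # True if the outputs differ from the golden reference


def functional_derating(
    results: Iterable[InjectionResult],
) -> Tuple[Dict[str, float], float]:
    """Return (per-buffer failure fraction, network-wide failure fraction)."""
    injections: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for r in results:
        injections[r.buffer] += 1
        failures[r.buffer] += int(r.failed)
    per_buffer = {b: failures[b] / injections[b] for b in injections}
    total = sum(injections.values())
    network = sum(failures.values()) / total if total else 0.0
    return per_buffer, network


# Hypothetical campaign with two buffers and two injections each.
results = [
    InjectionResult("buf_1", ["ff_a"], True),
    InjectionResult("buf_1", [], False),
    InjectionResult("buf_2", ["ff_b", "ff_c"], False),
    InjectionResult("buf_2", ["ff_b"], False),
]
per_buffer, network = functional_derating(results)
```

With equal injection counts per buffer, the network value is simply the mean of the per-buffer values; a campaign with unequal counts would need an explicit weighting choice.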
IV. FAULT INJECTION CAMPAIGN
In this section the presented methodology and the implemented fault model are demonstrated on a practical example. First, the circuit under test and the corresponding testbench are described. Afterwards, the functional failure rate for each clock buffer and for the complete network is computed. Additionally, the flip-flops most vulnerable to SETs in the clock distribution network (CDN) are identified.
A. Test Circuit, Testbench and Clock Distribution Network
For this case-study the Ethernet 10GE MAC Core from OpenCores is used. The circuit implements the Media Access Control (MAC) functions for 10 Gbps operation as defined in the IEEE 802.3ae standard. The 10GE MAC core has a 10 Gbps interface (XGMII TX/RX) to connect it to different types of Ethernet PHYs and one packet interface to transmit and receive packets to/from the user logic [18] . The circuit consists of control logic, state machines, FIFOs and memory interfaces. It is implemented at the Register-Transfer Level (RTL) and is publicly available on OpenCores.
The corresponding testbench writes several packets to the 10GE MAC transmit packet interface. As packet frames become available in the transmit FIFO, the MAC calculates a CRC and sends them out to the XGMII transmitter. The XGMII TX interface is looped back to the XGMII RX interface in the testbench. The frames are thus processed by the MAC receive engine and stored in the receive FIFO. Eventually, the testbench reads frames from the packet receive interface and prints out the results [18]. During the simulation all packets sent to and received from the core are monitored and recorded. This record is used as the golden reference for the fault injection campaign.
By performing a simple logic translation of the design, 1233 flip-flops have been identified and matched with the corresponding RTL signal names. One virtual clock network was generated which groups flip-flops together according to their signal names and connects them to the same clock buffer. Additionally, 50 virtual clock networks were generated which connect the flip-flops to randomly selected clock buffers. The clock networks have a minimum fan-out of 16 flip-flops, which results in 7 stages and a total of 127 buffers with an actual fan-out of 19 to 20 flip-flops.
B. Results for SETs in the Clock Distribution Network
A fault injection campaign was performed to analyse the functional failure rate of SETs in the clock distribution network (CDN). To this end, 170 SETs were injected into each of the 127 clock buffers of the different virtual clock networks. The faults were injected only during the active phase of the simulation, when packets are sent and received through the user packet interface.
The overall results of the SET fault injection campaign are summarized in Table I and Table II. Table I presents the results for the clock distribution network (CDN) which groups and connects flip-flops together based on their signal names. The numbers of reached, changed and unchanged flip-flops are listed for the entire campaign as well as averaged per injection. Further, the number of injections which led to a functional failure is shown. Table II presents the same metrics averaged over the 50 different random virtual clock networks. It was noted that the numbers of changed and unchanged flip-flops are identical across the networks. This can be explained by the fact that the pseudo-random number generator always generates the same values to determine the injection times for each fault injection campaign. Thus, in each campaign the faults are injected at the same injection times and reach the same flip-flops (via different buffers), which results in the same state changes. However, the functional failure rate varies among the different random clock networks, and compared to the name-based (non-random) clock network it differs by a factor of 2.
The flip-flops most vulnerable to SETs in the clock network are obtained by tracking the flip-flops which were reached by an injected event, consequently changed their state and thus led to a functional failure. Fig. 2 shows the most critical 5 % of the flip-flops for one of the randomly created clock networks, ranked by their individual functional failure rate. In case of selective mitigation these flip-flops should be considered for hardening with the ∆-TMR technique [17].
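A ranking of this kind can be sketched as follows. The exact definition of the individual functional failure rate is not spelled out here, so the sketch assumes it to be the fraction of injections that changed a given flip-flop and ended in a functional failure; the function name `rank_flip_flops` and the sample runs are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def rank_flip_flops(
    runs: Iterable[Tuple[List[str], bool]], top_fraction: float = 0.05
) -> List[Tuple[str, float]]:
    """Rank flip-flops by an assumed individual functional failure rate.

    Each run is (changed flip-flops, failed flag). The assumed rate is
    failing runs that changed the flip-flop / all runs that changed it.
    """
    changed_count: Counter = Counter()
    failure_count: Counter = Counter()
    for changed, failed in runs:
        for ff in changed:
            changed_count[ff] += 1
            if failed:
                failure_count[ff] += 1
    rates = {ff: failure_count[ff] / n for ff, n in changed_count.items()}
    # Highest rate first; ties are broken by name for a stable order.
    ranked = sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))
    keep = max(1, math.ceil(top_fraction * len(ranked))) if ranked else 0
    return ranked[:keep]


# Hypothetical campaign log: (changed flip-flops, functional failure?).
runs = [
    (["ff_a", "ff_b"], True),
    (["ff_a"], True),
    (["ff_b"], False),
    (["ff_c"], False),
]
top = rank_flip_flops(runs, top_fraction=0.5)
```

Because several flip-flops change in the same injection, this attribution cannot tell which of them actually caused the failure; it only highlights flip-flops that are frequently involved in failing runs.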
C. Results for SEUs in the Sequential Logic
The functional failure rate caused by SEUs in the flip-flops is obtained by a classical full flat statistical fault injection campaign. The SEU is emulated by modifying the stored value of a flip-flop at a random point in time during the simulation. Similar to the SET fault injection campaign, any difference in the sent or received packets is considered a functional failure of the application.
[Figure residue: bar charts ranking individual flip-flops by Functional Failure Rate. The recovered axis labels name CRC registers (e.g. rx_eq0.crc32_d64, rx_eq0.crc32_d8, tx_dq0.crc32_d64, tx_dq0.crc32_d8), state registers (e.g. tx_dq0.curr_state_enc, tx_dq0.curr_state_pad) and FIFO pointers (e.g. tx_data_fifo0.fifo0.ctrl0.rd_ptr, tx_hold_fifo0.fifo0.ctrl0.wr_ptr, rx_hold_fifo0.fifo0.ctrl0.wr_ptr).]
The most critical flip-flops for the two effects show almost no overlap. This is particularly important when selective hardening of the sequential logic is considered. For example, if there is a limited budget which can be used to harden flip-flops with the ∆-TMR technique, different flip-flops need to be taken into account in order to lower the functional failure rate related to both effects.
Furthermore, the functional failure rate due to SEUs in the sequential logic is compared to the functional failure rate due to SETs in the clock network. To this end, Table IV summarizes the average Functional De-Rating factor per element. The Functional De-Rating factors are of the same order of magnitude but can differ by up to a factor of two depending on the layout of the clock network. However, the number of sequential elements is about 10 times higher, and thus the SEUs in flip-flops are the leading contributor to the overall functional failure rate of the circuit.
To account for further physical effects, the Functional De-Rating factor can be combined with a FIT rate obtained from a characterized standard cell library. In [19] FIT values for the NanGate FreePDK45 Open Cell Library [20] were obtained by using dedicated tools and results from radiation testing. The average values for D flip-flops and clock buffers show that the FIT value for the sequential logic is about 3 times higher, which further lowers the relative effect of SETs in the clock network.
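How these factors combine can be illustrated with a short calculation. The sketch assumes the common first-order estimate in which a population's contribution is the number of elements times the raw FIT per element times the Functional De-Rating factor. Only the element counts come from the case study; the FIT and de-rating values below are placeholders, not measured results.

```python
def functional_failure_rate(
    n_elements: int, fit_per_element: float, fdr: float
) -> float:
    """First-order contribution of one cell population (assumed model)."""
    return n_elements * fit_per_element * fdr


# Element counts are taken from the case study; every other number is a
# PLACEHOLDER chosen only to make the arithmetic visible.
N_FLIP_FLOPS = 1233
N_CLOCK_BUFFERS = 127
FIT_BUFFER = 1.0       # hypothetical, normalised unit
FIT_FLIP_FLOP = 3.0    # hypothetical, reflecting "about 3 times higher"
FDR_SEU = 0.10         # hypothetical Functional De-Rating factor
FDR_SET = 0.10         # hypothetical Functional De-Rating factor

seu_rate = functional_failure_rate(N_FLIP_FLOPS, FIT_FLIP_FLOP, FDR_SEU)
set_rate = functional_failure_rate(N_CLOCK_BUFFERS, FIT_BUFFER, FDR_SET)
ratio = seu_rate / set_rate  # how strongly the SEU contribution dominates
print(f"SEU/SET contribution ratio: {ratio:.1f}")
```

With equal placeholder de-rating factors the ratio is driven purely by the element count and the FIT ratio, which is why the sequential logic dominates in this sketch.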
Nonetheless, if a fully SEU-hardened circuit is considered, the Functional De-Rating of the sequential logic is lower. Depending on the implementation of the hardened cells, the sensitivity is usually about one order of magnitude lower than that of un-hardened cells. Taking this into account, the Functional De-Rating factor would be lowered by the same amount, and thus the functional failure rate due to SEUs would move closer to the failure rate due to SETs. This would mean that SETs in the clock network are almost as significant as SEUs in the sequential logic.
V. CONCLUSION
This paper proposes a methodology to analyse how Single-Event Transients (SETs) in the clock distribution network impact the functional behaviour of a circuit. A methodology and a fault model were presented which implement the main radiation-induced effects in clock networks. The method enables the computation of the functional failure rate in a logic-level simulation based on the register-transfer level description of the design. Thus, a faster evaluation can be performed than by simulating at the electrical level.
The approach was applied in a practical example. SETs were injected into the clock network of the circuit under test in a fault injection campaign. Thus, the functional failure rates of the clock network and the individual clock buffers were determined. Further, the most vulnerable flip-flops have been identified, which can be considered for selective mitigation techniques.
The proposed method uses a Virtual Clock Network which has the advantage that different clock network features can be evaluated with regard to Single-Event Effects in the clock network in an early design stage. However, the presented method can also be used in later design stages when the real clock network is available. This remains a topic for future work. In this paper two different types of clock network layouts were created. It has been noted that the layout can have a significant impact on the functional failure rate. Therefore, for future work further layouts and topologies of real clock distribution networks should be evaluated.
Finally, the functional failure rate due to SETs in the clock network has been compared to SEUs in the sequential logic. It was noted that there is almost no overlap looking at the most critical flip-flops. Further, the discussion has shown that the contribution of SETs in the clock network can be quite significant, if the circuit's sequential logic is only hardened against SEUs.
