# A Timing-Monitoring Sequential for Forward and Backward Error-Detection in 28 nm FD-SOI

Andrea Bonetti\*, Jeremy Constantin\*, Adam Teman<sup>†</sup>, and Andreas Burg\*

\* École Polytechnique Fédérale de Lausanne (EPFL), Switzerland; <sup>†</sup> Bar-Ilan University, Israel

andrea.bonetti@epfl.ch, adam.teman@biu.ac.il, andreas.burg@epfl.ch

Abstract—The increasing impact of variability on near-threshold nanometer circuits calls for tighter online monitoring and control of the available timing margins. Error-detection sequentials are widely used together with error-correction techniques to operate digital designs with such carefully controlled, far-below-worst-case margins, ensuring their correct operation even in the presence of uncertainties and variations. However, these registers are often designed either to detect setup timing violations or to measure the available positive timing slack within a small detection window, but not both. In this paper, we propose a timing-monitoring sequential that provides both timing-monitoring modes, which can be selected at run-time depending on the desired timing-monitoring strategy. As the detection window of the presented circuit depends on the duty-cycle of the clock, either slow paths or fast paths can be monitored and measured with wide timing windows. The performance of this timing-monitoring sequential is evaluated in a 28 nm FD-SOI process with post-layout simulations, which show that the circuit is able to monitor a positive timing slack as small as 140 ps or to measure a path delay as short as 50 ps. The proposed circuit is applied to a digital multiplier that was fabricated in a test chip, and measurements show that the timing-monitoring sequentials are able to measure the critical path of the multiplier with 1% accuracy and without incurring any timing violation.

## I. INTRODUCTION

As portable devices have become more popular, minimizing the power consumption of digital circuits through voltage scaling has become a key requirement. However, the reliability of circuits in nanometer nodes is significantly degraded under voltage scaling, as both process variations and dynamic variations, such as changes in temperature, aging, and voltage noise, have a more severe impact on circuits operated in the near-threshold regime, resulting in very high timing uncertainty [1]. For this reason, in conventional high-volume manufacturing, costly guard-bands are generally added on top of the nominally required supply voltage to ensure reliable operation under any expected environment [2].

However, not all variations change rapidly; for example, temperature and aging vary relatively slowly. Therefore, the system often operates under non-critical conditions, in which the timing constraints are met with a small guard-band or even with none. Several techniques have been proposed to trim the guard-bands at run-time using timing-monitoring circuits, such as replica circuits [2] and error-detection sequentials (EDSs) [3]. These circuits track changes in the critical-path delay of the design, allowing the system to adapt and thus maximize the power savings while ensuring reliable operation.

In the context of EDSs, many circuit solutions have been proposed in the literature [4]–[6], mainly to reduce the design overhead of the error-detecting part of the registers [4]. These *in-situ* timing monitors are usually implemented at the end-point of the critical path and of paths that are close to being critical. The basic principle of these registers is to sample the input data twice: once at the active clock edge and once during an error-detection time-window that starts with the active clock edge and generally coincides with part of the high phase of the clock. If the two sampled values differ, a late-arriving transition on the data input of the register is detected, which signals a setup timing-violation. Besides late-arriving signals, however, short paths ending at the same end-point may toggle the data input of the register within the error-detection window, because new data launched at the same active clock edge can arrive before the window closes, i.e., well before the next active clock edge. Such an event erroneously signals a violation in the timing monitor. Common approaches to overcome this short-path problem are to add hold buffers to short paths or to reduce the length of the error-detection window [7].
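The double-sampling principle and the short-path problem can be illustrated with a small timing model. The Python sketch below is a simplification and is not taken from any of the cited designs: it assumes an ideal detection window that opens at the active clock edge and lasts for the clock high phase, ignores setup and hold times, and uses hypothetical function names and delay values.

```python
def in_detection_window(t_after_edge_ps: float, t_high_ps: float) -> bool:
    """True if a data transition occurring t_after_edge_ps after the active
    clock edge falls in an ideal detection window [0, t_high_ps)."""
    return 0.0 <= t_after_edge_ps < t_high_ps


def late_arrival_ps(path_delay_ps: float, period_ps: float) -> float:
    """Arrival time, relative to the capturing edge, of data launched one
    clock period earlier; a positive value means a setup violation."""
    return path_delay_ps - period_ps


def hold_padding_ps(contamination_ps: float, t_high_ps: float) -> float:
    """Extra delay a short path needs so that data launched at the active
    edge reaches the end-point only after the detection window has closed."""
    return max(0.0, t_high_ps - contamination_ps)


PERIOD_PS, T_HIGH_PS = 1000.0, 500.0  # hypothetical 1 GHz clock, 50% duty

# A long path that misses the capturing edge by 100 ps: a true error.
print(in_detection_window(late_arrival_ps(1100.0, PERIOD_PS), T_HIGH_PS))  # True
# A short path (200 ps) launched at the same edge: a false error.
print(in_detection_window(200.0, T_HIGH_PS))  # True
# Hold buffering that would remove the false error for that short path.
print(hold_padding_ps(200.0, T_HIGH_PS))  # 300.0
```

Shrinking `T_HIGH_PS` reduces the padding returned by `hold_padding_ps`, which is the trade-off between the two mitigation approaches: a shorter window needs fewer hold buffers but covers a smaller range of late arrivals.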

A limitation of many EDSs is that they only generate an error signal after a timing violation has already occurred, usually to activate an error-correction mechanism, which comes with a severe overhead (e.g., pipeline stalling). As an alternative, a timing-fault sensor (TMFLT-S) that measures the available positive timing slack of a path was proposed in [8].

With these two alternatives (EDS vs. TMFLT-S), the choice between monitoring setup timing-violations and measuring the available timing slack of certain paths is made at design time. Sequentials that allow choosing either of these timing-monitoring modes would make it possible to select the most suitable timing-monitoring strategy at run-time; however, to the best of our knowledge, such sequentials have not yet been proposed. Moreover, the TMFLT-S only allows monitoring paths that are *close* to becoming critical. As the critical path of a design is not always excited, the activated paths might have a significantly shorter delay than the critical path, depending, for example, on the instruction executed by a microprocessor core [9]. For this reason, the available timing slack needs to be measured over a wide detection window, so that these timing margins can be exploited at run-time, either by increasing the operating frequency or by applying dynamic voltage scaling, depending on the instruction executed in the core [10].

*Contribution:* in this paper, we propose a timing-monitoring sequential (TMS) designed in 28 nm FD-SOI that is capable of detecting transitions on the input data during either the high or the low phase of the clock, here defined as forward-detection mode (FDM) and backward-detection mode (BDM), respectively. FDM provides detection of late-arriving transitions, as in conventional EDSs [4]. In addition, the BDM functionality of the TMS enables measurement of the available timing slack, as in the TMFLT-S [8], without affecting the operation of the circuit. The sizes of the detection windows in FDM and BDM are set by the high and the low phases of the clock, respectively; therefore, the length of these windows can be adjusted by controlling the duty-cycle of the clock. Furthermore, by adjusting the low phase of the clock in BDM, the TMS can measure the timing slack of paths that are far from being critical. This capability can provide insightful information either to exploit the available timing margins [9], [10] or for offline diagnostics. Results from post-layout simulations are provided to evaluate the performance of the TMS, and both FDM and BDM are verified with measurements from a test chip fabricated in a 28 nm FD-SOI process technology, which includes a 16-bit digital multiplier implemented with TMS cells.

Fig. 1. Circuit schematic of the TMS.

*Outline:* the rest of this paper is organized as follows: the proposed TMS circuit is presented in Section II. The timing analyses enabled by the TMS are discussed in Section III. Section IV presents post-layout simulation results of the TMS and silicon measurements of a 16-bit multiplier implemented with TMS cells. Finally, the conclusions are reported in Section V.

## II. PROPOSED TIMING-MONITORING SEQUENTIAL

The proposed TMS is built by augmenting a resettable D-flip-flop (DFF) with a two-stage error-detector that provides two modes of operation: the TMS detects any input transition occurring during either the high phase or the low phase of the clock, depending on whether it is set to forward-detection mode (FDM) or backward-detection mode (BDM), respectively. The circuit schematic of the TMS is shown in Fig. 1. Its input ports are the clock (CK), the input data (D), the active-low reset of the DFF (RN), the active-low reset of the error output (RN\_ERR), and the error-detection mode select (M); its output ports are the output data (Q) and the error flag (ERR). Assigning logic-0 or logic-1 to input M sets the operation mode of the TMS to FDM or BDM, respectively. Whenever a transition on the input data D is detected in the active time-window, the output ERR rises and remains high. This error flag is cleared by asserting the active-low RN\_ERR input, which prepares the TMS to detect any transition in the next detection window.

The error-detection circuitry of the TMS is shown inside the gray dotted lines in Fig. 1. First, the input data is delayed and inverted by a delay line (D0), which provides a delay margin ensuring that any transition of the input data occurring at the beginning of the active time-window is correctly detected as an error. The error-detector is divided into two stages. The first stage (M0–M11 and I2–I5) generates two internal signals, INTA and INTB, which indicate whether the input data D has taken both logic states or only one of them during the error-detection window, i.e., whether or not a transition on D has occurred. The second stage (M13–M19 and I6–I8) receives INTA and INTB and, if a transition on D has been detected during the active window, drives the output error signal ERR to logic-1. As the internal nodes INTA, INTB and ERR\_N are driven by dynamic logic (M0–M19), latches (I2–I7) are added to each of these nodes to retain their logic state even when there is no activity on the input ports of the TMS. Dynamic logic was used to reduce the transistor count of the error-detector while also minimizing the delay of the error generation.

The complete truth table of the TMS is provided in Table I. For each case, the table specifies whether the error-detector is active, and a case number is assigned for reference. The error-detector generates an error (ERR=1) when one of the following conditions is met:

- In FDM (M=0), when INTA=0 and INTB=1.
- In BDM (M=1), when INTA=1 and INTB=0.

Considering the FDM, during the low phase of the clock INTA and INTB are forced to logic-1 and logic-0, respectively, regardless of the input data value. Therefore, no transition on D generates an error (cases #2 and #3 in Table I). In the subsequent high phase of the clock, the condition on INTA and INTB for generating an error is met only if both cases #0 and #1 occur within the same high phase of the clock, which corresponds to a transition on D during the detection window. Note that the error is generated irrespective of the polarity of the transition (rising or falling). The same concept applies to BDM, where an error is generated only if both cases #6 and #7 occur within the same low phase of the clock, which corresponds to a transition on D during the detection window. Examples of error detection in both FDM and BDM are shown with voltage waveforms in Fig. 2.

TABLE I Truth table of the internal error signals in the TMS

| Case Number | M | CK | D | INTA     | INTB     | Error-Detector |
|-------------|---|----|---|----------|----------|----------------|
| 0           | 0 | 1  | 0 | 0        | previous | active         |
| 1           | 0 | 1  | 1 | previous | 1        | active         |
| 2           | 0 | 0  | 0 | 1        | 0        | idle           |
| 3           | 0 | 0  | 1 | 1        | 0        | idle           |
| 4           | 1 | 1  | 0 | 0        | 1        | idle           |
| 5           | 1 | 1  | 1 | 0        | 1        | idle           |
| 6           | 1 | 0  | 0 | previous | 0        | active         |
| 7           | 1 | 0  | 1 | 1        | previous | active         |
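The rules of Table I can be captured in a small behavioral model. The Python sketch below is an abstraction under stated assumptions, not the transistor-level circuit of Fig. 1: it treats INTA, INTB and the latched ERR flag as ideal state variables, ignores the D0 delay line, and evaluates one (CK, D) sample at a time. The class and method names are hypothetical.

```python
class TMSModel:
    """Behavioral sketch of the TMS error-detector rules in Table I.

    m=0 selects FDM (detector active while CK=1);
    m=1 selects BDM (detector active while CK=0).
    INTA, INTB and ERR are ideal state variables; D0 is not modeled.
    """

    def __init__(self, m: int) -> None:
        self.m = m
        self.err = 0
        self._idle()

    def _idle(self) -> None:
        # Values forced in the idle phase: cases #2/#3 (FDM), #4/#5 (BDM).
        self.inta, self.intb = (1, 0) if self.m == 0 else (0, 1)

    def reset_err(self) -> None:
        """Models asserting the active-low RN_ERR input."""
        self.err = 0

    def step(self, ck: int, d: int) -> int:
        """Apply one (CK, D) sample and return the latched ERR flag."""
        active = (ck == 1) if self.m == 0 else (ck == 0)
        if not active:
            self._idle()
        elif self.m == 0:
            # FDM, case #0 (D=0 clears INTA) and case #1 (D=1 sets INTB).
            if d == 0:
                self.inta = 0
            else:
                self.intb = 1
        else:
            # BDM, case #6 (D=0 clears INTB) and case #7 (D=1 sets INTA).
            if d == 0:
                self.intb = 0
            else:
                self.inta = 1
        error_state = (0, 1) if self.m == 0 else (1, 0)
        if (self.inta, self.intb) == error_state:
            self.err = 1  # ERR stays high until reset_err() is called
        return self.err


# FDM: D toggles 0 -> 1 while CK is high, so the transition is flagged.
fdm = TMSModel(m=0)
print([fdm.step(ck, d) for ck, d in [(0, 0), (1, 0), (1, 1), (0, 1)]])  # [0, 0, 1, 1]

# BDM: the same toggle during CK high is ignored.
bdm = TMSModel(m=1)
print([bdm.step(ck, d) for ck, d in [(1, 0), (1, 1), (0, 1), (1, 1)]])  # [0, 0, 0, 0]
```

Because the error condition is a pair of "seen D=0" and "seen D=1" flags, the model, like the circuit description, flags both rising and falling transitions.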

## III. TIMING ANALYSES ENABLED BY THE TMS

The use of the TMS together with the control of the clock duty-cycle enables three different modes for timing analysis. Any one of these modes can be enabled at run-time by choosing the TMS operating mode together with setting the clock duty-cycle and frequency. The three timing-analysis modes enabled by the TMS presented in this paper are shown in Fig. 3 and described hereafter.

*Setup-violation monitoring (SVM):* When operated in FDM, the TMS detects any transition during the high phase of the clock, thereby providing conventional detection of late-arriving transitions [4], as shown in Fig. 3(a). Since the TMS detects any setup timing violation, this timing-monitoring mode can be combined with previously published error-recovery techniques.

*Timing-slack monitoring (TSM):* When BDM is selected and the high phase of the clock is set to be as long as, or comfortably longer than, the critical path of the design, the TMS can detect any increase in the critical-path delay without incurring an actual setup timing-violation, as depicted in Fig. 3(b): a path whose delay grows beyond the high phase produces a transition during the low phase, before the next active clock edge. The error-detection window is equal to the low phase of the clock; therefore, it can be adjusted by controlling the duty-cycle of the clock. The main drawback of this timing-monitoring mode is its speed penalty: the low phase of the clock acts as a timing margin that must be added to the critical-path delay when setting the clock period, which lowers the maximum operating frequency. In addition, the lower bound of this margin is given by the minimum low phase of the clock at which the TMS still functions correctly.
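The TSM budget can be made concrete with first-order arithmetic. The Python sketch below assumes that the high phase is set exactly equal to the critical-path delay and that setup time, clock skew and the D0 delay are folded into `t_cp_ps`. The 140 ps default for the low phase is the minimum reported from post-layout simulation in Section IV-A (0.9 V, typical corner, 25 °C); the function names and the 1260 ps critical path are hypothetical.

```python
def tsm_clock(t_cp_ps: float, t_low_ps: float = 140.0) -> tuple[float, float]:
    """Clock period [ps] and frequency [MHz] for TSM: the high phase covers
    the critical-path delay, the low phase is the detection window."""
    period_ps = t_cp_ps + t_low_ps
    return period_ps, 1e6 / period_ps  # 1/ps expressed in MHz


def tsm_flags(path_delay_ps: float, t_cp_ps: float, t_low_ps: float = 140.0) -> bool:
    """True if a path delay has grown into the low phase: the slack is
    consumed, but the next active edge has not been reached yet."""
    return t_cp_ps <= path_delay_ps < t_cp_ps + t_low_ps


def tsm_frequency_penalty(t_cp_ps: float, t_low_ps: float = 140.0) -> float:
    """Relative frequency loss versus clocking at 1 / t_cp."""
    return t_low_ps / (t_cp_ps + t_low_ps)


# Hypothetical critical path of 1260 ps.
print(tsm_clock(1260.0))              # period 1400 ps, about 714 MHz
print(tsm_flags(1300.0, 1260.0))      # True: delay grew by 40 ps, no violation yet
print(tsm_frequency_penalty(1260.0))  # 0.1
```

The penalty `t_low / (t_cp + t_low)` shrinks as the critical path gets longer, which is why the minimum supported low phase matters most for fast designs.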

*Fast-path measurement (FPM):* The timing slack of paths that are significantly faster than the critical path can be measured using the TMS in BDM. In this case, the duty-cycle of the clock is adjusted until the high phase of the clock matches the propagation delay of the path under analysis, as shown in Fig. 3(c). For this operating mode, the frequency of the clock can be set according to the critical path of the design to ensure an execution free of any timing violations. The analysis of paths that are far from being critical can give insightful information about different sub-blocks in the system, and it can enable adaptive dynamic voltage and frequency scaling whenever the critical path is not excited [9], [10].

Fig. 3. Timing-monitoring modes enabled by the TMS.
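The FPM procedure amounts to sweeping the clock high phase at a fixed, safe period. The Python sketch below assumes an idealized BDM rule (an error is flagged exactly when the path's transition lands in the low phase; no setup time, D0 delay or jitter) and lowers a hypothetical high-phase setting in fixed steps, returning the largest high phase that still produces an error as the delay estimate. The 50 ps lower bound is the minimum high phase reported in Section IV-A; the step size, function names and path delay are hypothetical.

```python
def bdm_error(path_delay_ps: float, t_high_ps: float, period_ps: float) -> bool:
    """Idealized BDM rule: an error is flagged when the path's transition
    lands in the low phase, i.e. at or after the falling clock edge."""
    return t_high_ps <= path_delay_ps < period_ps


def measure_fast_path(path_delay_ps: float, period_ps: float,
                      step_ps: float = 5.0, t_high_min_ps: float = 50.0):
    """Lower the high phase in steps of step_ps at a fixed period and return
    the first (largest) high phase that produces a BDM error. The true delay
    then lies in [result, result + step_ps). Returns None if no error is seen
    down to the minimum supported high phase."""
    t_high = period_ps - step_ps
    while t_high >= t_high_min_ps:
        if bdm_error(path_delay_ps, t_high, period_ps):
            return t_high
        t_high -= step_ps
    return None


# Hypothetical 812 ps path, clocked at a safe 2500 ps period (400 MHz).
print(measure_fast_path(812.0, 2500.0))  # 810.0
```

The resolution of the estimate equals the duty-cycle step, so a finer duty-cycle control directly improves the measurement accuracy; a binary search over the same setting would reach that resolution with fewer test runs.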

## IV. SIMULATION RESULTS AND MEASUREMENTS

The performance of the proposed TMS designed in 28 nm FD-SOI technology is evaluated with post-layout simulations and the results are presented in this section. The FDM and BDM of the TMS are also verified with measurements of a 16-bit multiplier that was fabricated in the same technology.

### A. Post-Layout Simulations of the TMS

The operation of the timing-monitoring modes presented in Section III requires control over the duty-cycle of the clock. However, the range of the duty-cycle is limited by the TMS, as by any sequential cell, because correct operation is guaranteed only above minimum durations of the low and high phases of the clock. For this reason, the minimum low phase and the minimum high phase of the clock that allow correct operation of both the DFF and the error-detector in the TMS for each timing-monitoring mode have been determined with post-layout simulations at 0.9 V, considering the typical process corner and $25^{\circ}$C. The results are presented hereafter.

For the conventional SVM mode, in which timing errors should be detected, a short high phase of the clock is desirable to limit the extra hold buffering required by the short-path problem that typically affects EDSs [7]. In this scenario, the TMS is able to flag late-arriving transitions for a high phase of the clock as short as 90 ps. For the TSM mode, which enables monitoring of the available timing slack, the detection window (the low phase of the clock) represents an additional timing margin on the clock period and therefore has to be minimized. Simulations show that the TMS is capable of correctly sampling the data as well as detecting transitions for a low phase of the clock as short as 140 ps. The fastest path that can be measured in FPM mode is given by the minimum high phase of the clock for which the TMS still samples the data correctly and detects a transition during the low phase of the clock. For this mode, the TMS provides correct operation for a high phase of the clock as short as 50 ps.

TABLE II Comparison between a baseline flip-flop and the TMS

|                                                        | D-flip-flop | TMS   | Variation |
|--------------------------------------------------------|-------------|-------|-----------|
| Area [µm²]                                             | 3.75        | 12.56 | 3.35×     |
| Leakage current [nA]                                   | 1.61        | 4.21  | 2.62×     |
| Clock-to-Q delay [ps]                                  | 53.45       | 69.79 | 1.31×     |
| $E_{\rm c}$ without activity [fJ]                      | 3.57        | 16.66 | 4.67×     |
| $E_{\rm c}$ with 100% activity [fJ]                    | 7.61        | 19.40 | 2.55×     |
| $E_{\rm c}$ with 100% activity and detected error [fJ] | N/A         | 27.07 | N/A       |
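The minimum phase durations above can be collected into a simple feasibility check for a requested clock configuration. The Python sketch below hard-codes the post-layout limits quoted in this section (0.9 V, typical corner, 25 °C) and checks only the limit that each mode depends on; the dictionary and function names are hypothetical, and the limits would have to be re-characterized for other voltages and corners.

```python
# Minimum phase durations from post-layout simulation in this section
# (0.9 V, typical corner, 25 C); other conditions need re-characterization.
MIN_PHASE_PS = {
    "SVM": ("high", 90.0),   # FDM: high phase is the detection window
    "TSM": ("low", 140.0),   # BDM: low phase is the detection window
    "FPM": ("high", 50.0),   # BDM: high phase sets the shortest measurable delay
}


def clock_phases_ps(freq_mhz: float, duty_high: float) -> tuple[float, float]:
    """Split the clock period into (high, low) phase durations in ps."""
    period_ps = 1e6 / freq_mhz
    t_high = period_ps * duty_high
    return t_high, period_ps - t_high


def config_ok(mode: str, freq_mhz: float, duty_high: float) -> bool:
    """Check only the mode-specific limit quoted above; other phase limits
    of the DFF are not covered by this sketch."""
    phase, t_min_ps = MIN_PHASE_PS[mode]
    t_high, t_low = clock_phases_ps(freq_mhz, duty_high)
    return (t_high if phase == "high" else t_low) >= t_min_ps


# Hypothetical 1 GHz clock (1000 ps period).
print(config_ok("SVM", 1000.0, 0.10))  # True: 100 ps high phase >= 90 ps
print(config_ok("TSM", 1000.0, 0.90))  # False: 100 ps low phase < 140 ps
print(config_ok("FPM", 1000.0, 0.04))  # False: 40 ps high phase < 50 ps
```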

The overhead of the error-detector in *in-situ* timing monitors is a well-researched topic, and very low-complexity circuits have been proposed [4]. Although the proposed TMS was not optimized for area, speed, or power, its overhead compared to a baseline DFF has been evaluated with post-layout simulations for completeness. The results are reported in Table II, where $E_c$ denotes the energy per cycle, reported for 0% and 100% activity factors on the data input D, as well as with and without an error-detection event. It is worth mentioning that, in a large design, this overhead only affects the registers that are actually replaced with TMS cells. Therefore, the actual overhead due to the use of TMS cells depends strongly on the considered design and on the number of monitored end-points.
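Because only the monitored end-points are replaced, a first-order estimate of the register-area overhead scales with the fraction of flip-flops converted to TMS cells. The Python sketch below uses the cell areas of Table II and deliberately ignores combinational logic, clock-tree and routing changes, so it bounds only the sequential area; the function name and the register counts in the example are hypothetical.

```python
DFF_AREA_UM2 = 3.75   # Table II: baseline D-flip-flop
TMS_AREA_UM2 = 12.56  # Table II: TMS cell


def register_area_overhead(n_registers: int, n_monitored: int) -> float:
    """Relative increase of the total register area when n_monitored of
    n_registers flip-flops are replaced by TMS cells (logic, clock tree
    and routing are ignored)."""
    if not 0 <= n_monitored <= n_registers:
        raise ValueError("n_monitored must lie in [0, n_registers]")
    baseline = n_registers * DFF_AREA_UM2
    extra = n_monitored * (TMS_AREA_UM2 - DFF_AREA_UM2)
    return extra / baseline


# Hypothetical design: 1000 registers, 32 monitored end-points.
print(f"{register_area_overhead(1000, 32):.1%}")  # 7.5%
```

Replacing every register reproduces the 3.35× cell ratio of Table II (an overhead of about 235%), while monitoring only a few end-points keeps the sequential overhead in the single-digit percent range.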

### B. Measurements of a Multiplier Using TMS Cells

A test structure consisting of a $16 \times 16$-bit radix-4 Booth-recoded digital multiplier, with TMS cells sampling its outputs, was implemented in a 28 nm FD-SOI test chip to verify the proposed *in-situ* timing monitor. Unfortunately, as the available internal clock generator did not provide control over the duty-cycle of the clock, only a few basic checks could be performed. For this reason, the presented measurements were conducted using a 50% duty-cycle.

At 0.7 V, the operation of the fabricated multiplier is verified up to a maximum frequency of $f_{\text{max}}$ = 398 MHz. When operating in SVM mode, the TMS cells start generating error signals as soon as the operating frequency exceeds $f_{\text{max}}$. Conversely, when operating at a frequency close to, but not larger than, $f_{\text{max}}$, no error signal is generated, confirming the correct operation of both the FDM and the SVM mode. It is worth noting that, for the presented design, when operating *close* to $f_{\text{max}}$, the short-path problem [7] does not occur, as the contamination delay of the multiplier is longer than half of the clock period (i.e., the high phase). However, as expected, false error signals start to be generated once the operating frequency is reduced far enough below $f_{\text{max}}$ that half of the clock period exceeds the contamination delay.
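The frequency at which false errors appear can be estimated to first order. Assuming the 50% duty-cycle used in these measurements, neglecting hold time and the D0 delay, and denoting the clock period by $T$, its high phase by $T_{\mathrm{H}}$, and the contamination delay of the multiplier by $t_{\mathrm{cd}}$ (symbols introduced here for illustration), new data launched at an active edge stays outside the FDM detection window only if

$$
t_{\mathrm{cd}} \ge T_{\mathrm{H}} = \frac{T}{2}
\quad\Longleftrightarrow\quad
f = \frac{1}{T} \ge \frac{1}{2\,t_{\mathrm{cd}}},
$$

so false error signals are expected once the clock frequency drops below $1/(2\,t_{\mathrm{cd}})$.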

The BDM of the TMS, in which input transitions are detected during the low phase of the clock, was also verified with measurements. For this case, the test consisted of initially operating the multiplier at a relaxed clock frequency. This ensured that all the input transitions of the TMS occurred during the high phase of the clock, thereby avoiding the generation of any error signal. The clock frequency was subsequently increased up to the point where the excitation of the critical path of the multiplier resulted in a transition at its end-point during the low phase of the clock, which was captured by the corresponding TMS. Based on the reached frequency and considering the 50% duty-cycle, it was possible to derive the maximum operating frequency of the multiplier *without incurring an actual timing violation*. With this test, the maximum frequency of the multiplier was measured with an error of less than 1%.
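The derivation of the maximum frequency from this BDM test can be sketched under idealized assumptions (setup time, clock skew and the D0 delay neglected). At the frequency $f_{\mathrm{onset}}$ at which the first BDM error appears, the critical-path transition coincides with the falling clock edge, so with a 50% duty-cycle $t_{\mathrm{cp}} = T/2$ and $f_{\max} \approx 1/t_{\mathrm{cp}} = 2 f_{\mathrm{onset}}$. The Python function below generalizes this to an arbitrary duty-cycle; the function names and the onset frequency in the example are hypothetical and are not measured values from the test chip.

```python
def fmax_from_bdm_onset(f_onset_mhz: float, duty_high: float = 0.5) -> float:
    """Estimate the maximum violation-free frequency from the frequency at
    which the TMS in BDM first flags an error while the clock is sped up.

    At onset the critical-path delay equals the high phase,
    t_cp = duty_high / f_onset, hence f_max = 1 / t_cp = f_onset / duty_high.
    Setup time, clock skew and the D0 delay line are neglected.
    """
    if not 0.0 < duty_high < 1.0:
        raise ValueError("duty_high must lie in (0, 1)")
    return f_onset_mhz / duty_high


def relative_error(estimate: float, reference: float) -> float:
    """Relative deviation of an estimate from a reference value."""
    return abs(estimate - reference) / reference


# Hypothetical onset at 150 MHz with a 50% duty-cycle.
print(fmax_from_bdm_onset(150.0))           # 300.0
# Compared against a hypothetical reference f_max of 302 MHz.
print(relative_error(300.0, 302.0) < 0.01)  # True
```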

## V. CONCLUSION

A timing-monitoring sequential (TMS) capable of detecting transitions during either the high or the low phase of the clock was presented. By controlling the duty-cycle of the clock, the proposed circuit enables timing monitoring of all paths that are close to being critical, where either timing violations are detected or reductions in the available timing slack are monitored. When the detection window is chosen to coincide with the low phase of the clock, the delay of short paths can also be measured. The desired timing-analysis mode can be chosen at run-time, and simulations show the TMS to operate with a high or a low clock phase as short as 90 ps and 140 ps, respectively. Both detection modes of the proposed TMS were verified on a test chip fabricated in a 28 nm FD-SOI technology. The silicon measurements enabled run-time measurement and configuration of the maximum frequency with an error below 1% using the TMS cells, without incurring any timing violation.

## ACKNOWLEDGMENTS

This project is partially funded by the IcySoC project of Nano-Tera.ch with Swiss Confederation financing. The authors would like to thank STMicroelectronics for chip fabrication.

## REFERENCES

- [1] R. G. Dreslinski *et al.*, "Near-Threshold Computing: Reclaiming Moore's Law Through Energy Efficient Integrated Circuits," *Proceedings of the IEEE*, vol. 98, no. 2, pp. 253–266, Feb 2010.
- [2] M. Cho *et al.*, "Postsilicon Voltage Guard-Band Reduction in a 22 nm Graphics Execution Core Using Adaptive Voltage Scaling and Dynamic Power Gating," *IEEE Journal of Solid-State Circuits*, vol. 52, no. 1, pp. 50–63, Jan 2017.
- [3] W. Jin *et al.*, "In Situ Error Detection Techniques in Ultralow Voltage Pipelines: Analysis and Optimizations," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 25, no. 3, pp. 1032–1043, March 2017.
- [4] Y. Zhang *et al.*, "iRazor: Current-Based Error Detection and Correction Scheme for PVT Variation in 40-nm ARM Cortex-R4 Processor," *IEEE Journal of Solid-State Circuits*, vol. PP, no. 99, pp. 1–13, 2017.
- [5] K. A. Bowman *et al.*, "A 45 nm Resilient Microprocessor Core for Dynamic Variation Tolerance," *IEEE Journal of Solid-State Circuits*, vol. 46, no. 1, pp. 194–208, Jan 2011.
- [6] R. N. Tadros *et al.*, "A Low-Power Low-Area Error-Detecting Latch for Resilient Architectures in 28-nm FDSOI," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 63, no. 9, pp. 858–862, Sept 2016.
- [7] S. Kim *et al.*, "Variation-Tolerant, Ultra-Low-Voltage Microprocessor With a Low-Overhead, Within-a-Cycle In-Situ Timing-Error Detection and Correction Technique," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 6, pp. 1478–1490, June 2015.
- [8] E. Beigné *et al.*, "A 460 MHz at 397 mV, 2.6 GHz at 1.3 V, 32 bits VLIW DSP Embedding F<sub>MAX</sub> Tracking," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 1, pp. 125–136, Jan 2015.
- [9] J. Constantin *et al.*, "Exploiting dynamic timing margins in microprocessors for frequency-over-scaling with instruction-based clock adjustment," in *2015 Design, Automation & Test in Europe Conference & Exhibition (DATE)*, March 2015, pp. 381–386.
- [10] J. Constantin *et al.*, "DynOR: A 32-bit microprocessor in 28 nm FD-SOI with cycle-by-cycle dynamic clock adjustment," in *ESSCIRC Conference 2016: 42nd European Solid-State Circuits Conference*, Sept 2016, pp. 261–264.