This paper presents a measurement circuit structure for capturing SET pulse-width while suppressing pulse-width modulation and within-die process variation effects. To mitigate pulse-width modulation while maintaining area efficiency, the proposed circuit uses massively parallelized short inverter chains as a target circuit. Moreover, pulse-width calibration is performed for each inverter chain on each die. In measurements, narrow SET pulses ranging from 5 ps to 215 ps were obtained. We confirm that the pulse-width may be overestimated when die-to-die and within-die variations of the measurement circuit are ignored. Our evaluation results thus point out that calibration for within-die variation, in addition to die-to-die variation, of the measurement circuit is indispensable.
Introduction
As circuit integration advances to a very large scale, neutron-induced soft errors are becoming a real concern even at sea level. Especially in combinational logic, single event transients (SETs) threaten circuit reliability. After propagating through several combinational logic gates, a neutron-induced SET pulse finally arrives at a memory element and may be captured depending on the pulse-width and clock timing [1]. Therefore, characterizing the probability distribution of SET pulse-width is important for calculating the SET-induced error rate.
Recently, several measurement circuits for obtaining SET pulse-width have been proposed. Figure 1 shows a popular SET measurement structure with a pulse-width measurement circuit. It is implemented on a chip with a target circuit where SETs are expected to occur, which is often realized by a combinational gate chain. Because irradiation time is limited, an area-efficient implementation of the measurement and target circuits is necessary. For this purpose, conventional measurements often adopted a very long gate chain as a target circuit to increase the area ratio of the target circuit to the overall measurement circuit. However, such a long combinational chain involves pulse-width modulation due to propagation induced pulse broadening (PIPB) [2] and performance mismatch between rise and fall delays [3], which may excessively increase the pulse-width or make the pulse vanish. In this case, the measured pulse-width distribution becomes totally different from that in actual circuits. Furthermore, process variation causes the performance of the measurement circuit to fluctuate. As technology advances, within-die process variation becomes significant, which means that measurement circuits even on the same die do not have the same performance, especially in low voltage operation. Therefore, minimization of pulse-width modulation and elimination of process variation effects are necessary to obtain an accurate SET pulse-width distribution.
Manuscript received September 12, 2013. Manuscript revised January 9, 2014.
† The authors are with the Department of Information Systems Engineering, Osaka University, Suita-shi, 565-0871 Japan.
†† The authors are with JST, CREST, Tokyo, 102-0075 Japan.
††† The author is with the School of Systems Engineering, Kochi University of Technology, Kami-shi, 782-8502 Japan.
a) E-mail: harada.ryo@ist.osaka-u.ac.jp
b) E-mail: mitsuyama.yukio@kochi-tech.ac.jp
c) E-mail: hasimoto@ist.osaka-u.ac.jp
d) E-mail: onoye@ist.osaka-u.ac.jp
DOI: 10.1587/transfun.E97.A.1461
In this work, to overcome the above pulse-width modulation while maintaining area efficiency, we present a measurement circuit structure with parallelized shallow inverter chains. We introduce OR cones to converge the outputs of the inverter chains into a pulse-width measurement circuit. In addition, to calibrate the within-die performance variation and performance mismatch of the measurement circuit including the converging paths, we embed a calibration circuit that can exercise each chain and obtain the response of the pulse-width measurement circuit. We confirm that the proposed structure can achieve higher area efficiency by increasing the number of parallelized chains. Note that [4] presented a similar target circuit structure with some parallel short gate chains, but the number of chains in parallel is much larger in this work. In addition, [4] measured the relation between clock frequency and SET-induced errors captured by a latch, and did not measure the SET pulse-width distribution directly. Therefore, the calibration of the measurement circuit is not explicitly considered in [4]. Experimental results of neutron irradiation tests using 65 nm test chips show that the SET pulse-widths observed in our 512-parallelized 10-stage inverter chains are statistically narrower than results previously reported using a long combinational chain. Calibration results indicate that the pulse-width distribution may be overestimated when die-to-die and within-die variations of the measurement circuit are not considered. Calibration results also show that the response of the measurement circuit is influenced not only by die-to-die variation but also by within-die variation, and clearly point out the importance of eliminating within-die variation.
Copyright © 2014 The Institute of Electronics, Information and Communication Engineers
The remainder of this paper is organized as follows. Section 2 introduces the proposed measurement structure. Section 3 presents the irradiation experimental results and calibration results of the 65 nm test chip. Section 4 discusses the impact of process variation in the measurement circuit on SET pulse-width measurements. Section 5 concludes this paper.
Proposed Measurement Structure
Figure 2 illustrates the proposed circuit structure for measurement. We use parallelized inverter chains as a target circuit and bundle them to a single pulse-width measurement circuit. With this structure, we can suppress pulse-width modulation by using short inverter chains while increasing the area ratio of the target circuit to the measurement circuit. Among circuits previously proposed for pulse-width measurement, we selected a two-stage measurement circuit that consists of a pulse-to-time converter and a time-to-digital converter [7]. In this work, a Vernier delay line (VDL) is used as the time-to-digital converter.
The VDL is composed of two buffer chains and a D-type latch chain, as illustrated in Fig. 3. Two step signals (START and STOP) with time difference T are given to this circuit, and T is to be measured by the VDL. The buffer delay of the chain for the START signal, t_1, is larger than that for the STOP signal, t_2. The START chain gives clock signals and the STOP chain provides data signals to the latches. The START and STOP signals race, and the STOP signal finally overtakes the START signal. Each time the START and STOP signals propagate through one stage, the time difference between them, which was initially T at the input, is reduced by t_r = t_1 − t_2. Latches at which the time difference becomes 0 or below store 1 and the others store 0. Letting N denote the number of latches storing 0, the time difference T is estimated from N, t_r, and t_s, where t_s is the setup time of a latch.
Figure 4 shows the circuit configuration of the pulse-to-time converter. The output of the lower FF first changes from low to high at the rising edge of the SET pulse, and subsequently that of the upper FF transitions at the falling edge. Thus, START and STOP signals separated by the time interval T are generated.
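As a rough illustration of the VDL race described above, the following sketch models an idealized VDL in Python. The function name, the parameter names, and the numeric values are assumptions for illustration and are not taken from the test chip. In particular, the way the setup time t_s shifts the capture threshold is an assumed simplification, and the estimating expression used by the authors is not reproduced here.

```python
def vdl_zero_count(T_ps, t1_ps, t2_ps, n_stages, t_setup_ps=0.0):
    """Idealized VDL model (assumption): return N, the number of latches storing 0.

    After k stages the lead of START over STOP is T - k * (t1 - t2).
    A latch is assumed to store 1 once that lead is at or below -t_setup.
    """
    t_r = t1_ps - t2_ps  # time resolution per stage
    if t_r <= 0:
        raise ValueError("START buffer delay t1 must exceed STOP buffer delay t2")
    zeros = 0
    for k in range(1, n_stages + 1):
        lead = T_ps - k * t_r
        if lead > -t_setup_ps:
            zeros += 1  # STOP has not yet overtaken START at this stage
    return zeros


# Hypothetical numbers: 5 ps resolution, 230 stages, 102 ps input interval.
n = vdl_zero_count(T_ps=102.0, t1_ps=55.0, t2_ps=50.0, n_stages=230)
estimate_ps = n * (55.0 - 50.0)  # quantized estimate of T, ignoring t_s
print(n, estimate_ps)
```

With these hypothetical values the 102 ps interval is quantized to a multiple of t_r, which shows why the per-stage resolution directly bounds the measurement accuracy.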
Using these circuits, an SET is first converted to two step signals whose interval is equal to the SET pulse-width T, as shown in Fig. 2, and then the time interval T is digitized by the VDL. To guide SETs occurring in the parallelized chains into the VDL, we insert two convergent cones of OR gates, one for the first and one for the second step signal, between the target circuit and the VDL. Note that while SETs generated in the target circuit trigger the VDL, an SET invoked inside one of the OR cones delivers only a single step signal to the VDL and hence cannot trigger it, which means that SETs occurring in OR gates are discarded. Hereafter, the path between the output of the target circuit and the VDL is called the converging path.
Die-to-die variation changes the pulse-width modulation of the OR cones and the time resolution of the VDLs from chip to chip. In addition, due to within-die variation, the pulse-width modulation of the OR cones differs for each converging path even within a chip. As for the VDL, within-die variation causes non-uniform time resolution across VDL stages even within a chip. To obtain a precise SET pulse-width distribution by overcoming die-to-die and within-die variations of the measurement circuit, the following two requirements must be satisfied.
R1: chain-by-chain pulse-width calibration
We need to calibrate the fluctuation of VDL time resolution for every VDL stage to eliminate within-die variation in the VDL. This VDL calibration is performed for every converging path to eliminate within-die variation in the converging paths. Die-to-die variations in the VDL and converging paths can be excluded by performing these calibrations for every chip. This means that every converging path and VDL in every chip must be calibrated.
R2: indication of a chain where an SET occurs
To apply the appropriate calibration result from those prepared for every converging path, we need to know where the observed SET occurred.
Figure 5 explains the proposed calibration scheme for each inverter chain to satisfy R1. An on-chip pulse generator injects a pseudo SET pulse, whose width can be precisely and finely tuned, into the output of an inverter chain, and its width is measured by the VDL. The chain selector determines the inverter chain where a pseudo SET is injected. The on-chip pulse generator needs to satisfy the following: 1) the time resolution of the pulse-width is finer than that of the VDL, so that the fluctuation of VDL time resolution can be calibrated for every VDL stage, 2) the pulse-width range that can be generated covers the SET pulse and VDL time ranges, and 3) the generated pulse-width can be assessed in another way.
The on-chip pulse generator used for the implementation is mainly composed of an XOR gate, selectable delay elements, and a bit counter. In this circuit, an input transition is converted into a pulse whose width is equal to the propagation delay of the delay element (t_d1), as shown in Fig. 6(a). By selecting a delay element and adjusting its power supply voltage, we can continuously change the pulse-width, which means that 1) and 2) are satisfied.
Meanwhile, t_d1 is influenced by manufacturing variability, so t_d1 should be measured after fabrication to assess the width of the generated pulse. The assessment process of the delay element is as follows. The oscillating period of the path without the delay element (Fig. 6(b)) is calculated from the counts in the large-bit counter while the path is configured to oscillate. Similarly, the oscillating period of the path including t_d1 (Fig. 6(c)) is calculated. Based on these two oscillating periods, we can obtain t_d1. This satisfies 3).
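The text does not give the formula that links the counter values to t_d1. The following is a minimal sketch under assumed conventions: the counter records full oscillation periods during a known gate time, and the delay element is traversed twice per period (once per signal polarity) with equal rise and fall delays. All function names, parameters, and numbers are hypothetical.

```python
def oscillation_period_ps(gate_time_ps, count):
    """Period of the oscillating path from a counter value (assumed counting scheme)."""
    if count <= 0:
        raise ValueError("counter must have recorded at least one period")
    return gate_time_ps / count


def delay_element_ps(period_with_ps, period_without_ps, traversals_per_period=2):
    """Delay of the inserted element from the two oscillation periods.

    traversals_per_period=2 assumes a ring in which the signal passes the
    element once for each polarity; this is an assumption, not a chip detail.
    """
    return (period_with_ps - period_without_ps) / traversals_per_period


# Hypothetical measurement with a 1 us gate time (1e6 ps).
p_without = oscillation_period_ps(1_000_000, 2000)  # path of Fig. 6(b)
p_with = oscillation_period_ps(1_000_000, 1250)     # path of Fig. 6(c)
t_d1 = delay_element_ps(p_with, p_without)
print(p_without, p_with, t_d1)
```

Because only the difference of the two periods enters the result, delay contributions common to both oscillating paths cancel, which is presumably why two configurations are measured.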
In addition, to identify and record the inverter chain where an SET occurs, i.e. for R2, we add a pulse detector to the output of each inverter chain. Based on the VDL outputs obtained in the calibration for the inverter chain where an SET occurs under irradiation, we can compute the pulse-width at the output of the target circuit. With this procedure, the performance mismatch of the converging paths and the variation of VDL time resolution due to within-die and die-to-die variations can be eliminated. Note that the mismatch of the converging paths is also caused by design differences, for example wire length differences, in addition to within-die variation. The presented procedure can eliminate both of them, though the design difference will not be explicitly discussed in the rest of this paper.
Irradiation Experiment and Calibration
To confirm the operation of the proposed circuit, a test chip was fabricated in a 65 nm process. A micrograph of the test chip is shown in Fig. 7. The test circuit is composed of 512-parallelized 10-stage inverter chains, a 230-stage VDL, and other calibration circuits. In a circuit whose clock frequency is high, SET pulses are more likely to be captured in FFs [4]. In such high speed circuits, the number of gate stages in combinational circuits is quite small (e.g. 10). To precisely evaluate the SET-induced error rate in high speed circuits, a 10-stage inverter chain is selected as the target circuit. The inverter chains consist of standard-size inverters, and tap cells are placed at 10 μm intervals. The 230-stage VDL was designed to achieve 4.8 ps time resolution. The number of stages in the VDL is determined so that the measurable range of SET pulse-width can cover the pulse-widths reported in the literature [7], [8]. The upper bound of measurable pulse-width is about 10 ns. In this implementation, larger-sized buffers are used for the VDL to suppress the within-die variation. They are 12 times larger than the standard buffer. The number of parallelized inverter chains is determined from the available silicon area. Figure 8 shows the area ratio of the target circuit to the overall measurement circuit when the number of inverter chains in parallel is varied. Here, the overall measurement circuit includes all the circuits necessary for measurement, namely the target, pulse-width measurement, and calibration circuits. The area ratio of the test chip configuration is 16%, and we can raise it by increasing the number of inverter chains in parallel.
Fig. 9 An example of calculating a VDL time resolution by using two regression lines through the assessed pulse-width (which equals t_d1) and the calibrated VDL outputs.
Neutron irradiation tests were performed at the Research Center for Nuclear Physics (RCNP), Osaka University. The average flux density of the neutron beam used was 2.17 × 10^9 cm^−2 h^−1. Reference [9] reported that the neutron beam of RCNP reproduces the neutron energy spectrum at sea level well. Therefore, the distribution of SET pulse-width at sea level can be measured in this accelerated test. We obtained 63 SETs by measuring 26 of 31 fabricated test chips in 0.8 V operation for 18 hours.
We performed calibration by injecting a pseudo pulse to the VDL through the convergent OR cones. Using the calibration scheme explained in the previous section, we can obtain the precise non-uniform time resolution of the 230-stage VDL for each of the 512 inverter chains in 26 test chips. However, for the sake of simplicity, we here calibrate only the 63 pairs of VDL and inverter chain in which SETs occurred and obtain 63 values of VDL time resolution. Figure 9 exemplifies the calculation of VDL time resolution used in this paper. We evaluated VDL outputs by sweeping the supply voltage VDD_d of the on-chip pulse generator and injecting a pulse generated at each VDD_d to the VDL through a selected converging path, and obtained the relation between the number of flips in the VDL latch chain and VDD_d (Fig. 9(a)). Furthermore, we assessed the propagation delay of the delay element t_d, which equals the width of the injected pseudo pulse, at different voltages (Fig. 9(b)). We then obtained two regression lines: VDL outputs versus VDD_d (regression line 1) and t_d (which equals the pulse-width) versus VDD_d (regression line 2). By eliminating VDD_d from these two regression lines, we obtained the relation between the assessed pulse-width and the calibrated VDL outputs (Fig. 9(c)), and finally calculated the average VDL time resolution as the slope of this linear function for a converging path on a chip. Note that the non-uniform time resolution within a VDL, which appears as the error residue in the regressions in Figs. 9(a), (b) and is caused by within-die variation, is ignored in this calibration procedure. A calibration that explicitly considers this non-uniformity is left for future work.
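The elimination of VDD_d from the two regression lines can be sketched as follows. If regression line 1 is N = a1·VDD_d + b1 and regression line 2 is t_d = a2·VDD_d + b2, then t_d = (a2/a1)·N + (b2 − (a2/a1)·b1), so the average time resolution is the slope a2/a1. The function names and the sweep data below are hypothetical and are not measured values from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def vdl_calibration(vdd, vdl_bits, delay_ps):
    """Combine regression line 1 (bits vs VDD_d) and line 2 (t_d vs VDD_d).

    Eliminating VDD_d gives t_d = resolution * bits + offset.
    """
    a1, b1 = fit_line(vdd, vdl_bits)   # regression line 1
    a2, b2 = fit_line(vdd, delay_ps)   # regression line 2
    resolution = a2 / a1               # ps per VDL bit
    offset = b2 - resolution * b1      # ps at zero bits
    return resolution, offset


# Hypothetical sweep of the pulse-generator supply voltage.
vdd = [0.8, 0.9, 1.0, 1.1]
bits = [40, 35, 30, 25]
delay = [200.0, 175.0, 150.0, 125.0]
resolution, offset = vdl_calibration(vdd, bits, delay)
width_ps = resolution * 32 + offset  # pulse width for a VDL reading of 32 bits
print(resolution, offset, width_ps)
```

In this sketch the residuals of both fits are discarded, which mirrors the statement that non-uniform resolution within a VDL is ignored by the averaging procedure.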
This calibration, i.e. the computation of the average VDL time resolution, was carried out for the 63 pairs of converging path and VDL on different chips where SET pulses were observed in the irradiation test, as mentioned earlier. This means that die-to-die variation both in the VDL and the converging paths and within-die variation in the converging paths can be eliminated with this calibration. While 63 SET samples may not be enough to discuss the range of SET pulse-width, the impact of the presented calibration on the distribution of SET pulse-width can be estimated and the average pulse-width can be roughly compared.
Figure 10 depicts the calibrated pulse-width distribution in 10 ps time resolution. The time range and average pulse-width of the measured SETs are 5 ps to 215 ps and 67 ps, respectively. The measured SET widths are clearly narrower than those of several former results that used a long combinational logic chain; e.g. the average pulse-width is about 600 ps in [8], which used a chain of 1,000 minimum-drive-strength inverters in a 90 nm process. This tendency is consistent with recent results [3], [10]. Pulse-width modulation must be carefully eliminated for SET pulse-width characterization.
On the other hand, Fig. 11 shows the pulse-width distribution calculated with the uniform VDL time resolution obtained by circuit simulation. The time range and average pulse-width of the measured SETs are 9 ps to 126 ps and 54 ps, respectively. Clearly, the calibrated pulse-widths of Fig. 10 are more widely distributed than the non-calibrated ones of Fig. 11. While the difference in average pulse-width between the two situations is not significant, the difference in maximum pulse-width may cause unexpected reliability degradation. For example, suppose that a delay element of 126 ps is inserted into BISER [11], which can eliminate SETs whose pulse-width is narrower than the propagation delay of the delay element, with the aim of filtering out all SETs. Then 5 measured SETs cannot be filtered out and actually propagate to FFs, which may cause failures. This demonstrates the importance of calibration for the SET pulse-width distribution.
Figure 12 shows the histogram of VDL time resolution after calibration, which eliminated within-die and die-to-die variations in the converging paths and die-to-die variation in the VDL. As mentioned above, the within-die variation in the VDL is not calibrated, and hence it is not discussed in this paper. The time resolution varies between 1.88 ps and 13 ps, and the average is 5.66 ps. A possible reason for such a large variation in VDL time resolution is the vulnerability of latches under within-die variation reported in [12], since the VDL was operating at 0.8 V while the nominal voltage suggested by the foundry is 1.2 V. To achieve more robust VDL time resolution, the variation-tolerant FF design presented in [12] is necessary.
Discussion
Because of die-to-die variation in the VDL and die-to-die and within-die variations in the converging paths, we may misestimate the pulse-width distribution without an appropriate calibration. Let us show how the pulse-width distribution may vary depending on the calibration approach. Figure 13 shows three pulse-width distributions obtained with the different calibration approaches listed below.
Fig. 13(a): within-die and die-to-die variations in the converging paths and die-to-die variation in the VDL are eliminated using the calibration in the previous section. Thus, this figure is identical to Fig. 10.
Fig. 13(b): the largest time resolution of 13 ps among the 63 calibration results (Fig. 12) is selected for calibration. In this case, within-die and die-to-die variations in the converging paths and die-to-die variation in the VDL remain.
Fig. 13(c): die-to-die variation in the converging paths and the VDL is eliminated. For each chip, a representative converging path is selected and its calibration result is used for the other converging paths on the same chip. In this experiment, the largest time resolution on the chip is chosen as the worst case. With this calibration approach, within-die variation in the converging paths remains.
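The three approaches listed above can be compared on the same raw VDL readings with a short sketch. It assumes that a pulse-width is obtained as the VDL bit count multiplied by a time resolution, with any calibration offset ignored. The record layout, the function name, and the events are hypothetical and are not the measured data.

```python
from collections import defaultdict


def widths_under_three_calibrations(events):
    """events: list of dicts with keys chip, bits, resolution_ps (per-path calibration).

    Returns pulse widths (ps) for (a) per-path, (b) single global worst-case,
    and (c) per-chip worst-case time resolution. Offsets are ignored here.
    """
    global_worst = max(e["resolution_ps"] for e in events)
    chip_worst = defaultdict(float)
    for e in events:
        chip_worst[e["chip"]] = max(chip_worst[e["chip"]], e["resolution_ps"])

    per_path = [e["bits"] * e["resolution_ps"] for e in events]
    one_for_all = [e["bits"] * global_worst for e in events]
    per_chip = [e["bits"] * chip_worst[e["chip"]] for e in events]
    return per_path, one_for_all, per_chip


# Hypothetical events, not measured data.
events = [
    {"chip": 1, "bits": 10, "resolution_ps": 4.0},
    {"chip": 1, "bits": 20, "resolution_ps": 6.0},
    {"chip": 2, "bits": 15, "resolution_ps": 8.0},
]
a, b, c = widths_under_three_calibrations(events)
print(a, b, c)
```

Because approaches (b) and (c) substitute a worst-case resolution, their widths can only be equal to or larger than those of approach (a), which is the overestimation discussed in this section.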
Figure 13(b) shows the widest possible pulse-width distribution when only one pair of converging path and VDL is calibrated on one chip and the result is applied to the other pairs, even on different chips. The time range and average pulse-width of this distribution are 20 ps to 371 ps and 156 ps, respectively. This result clearly indicates the need for post-fabrication calibration.
Next, Fig. 13(c) represents the widest possible pulse-width distribution when the calibration is performed only for a single converging path per chip. The time range and average pulse-width of this distribution are 11 ps to 218 ps and 80 ps, respectively. Even after eliminating die-to-die variation, the average pulse-width could be overestimated by 19%. Furthermore, when looking at the overestimation of individual SET measurements, the maximum overestimation reaches 67 ps, for example from 110 ps to 177 ps.
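The two figures quoted above follow from simple arithmetic, taking the 67 ps average of the fully calibrated distribution (Fig. 10) as the reference:

```latex
\frac{80\,\mathrm{ps} - 67\,\mathrm{ps}}{67\,\mathrm{ps}} \approx 0.19,
\qquad
177\,\mathrm{ps} - 110\,\mathrm{ps} = 67\,\mathrm{ps}.
```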
Finally, we show how the number of flips in the VDL varies chip by chip even when an identical pulse is given. Figure 14 shows the VDL response of 23 chips when an identical 200 ps pulse was injected. For each chip, about 3 converging paths were calibrated, and the averages and standard deviations of their VDL bits are presented. Here, die-to-die variation changes the average, and within-die variation causes the standard deviation. We can see that the ±σ range of chip 3 (26.5 bit) is comparable with, or rather larger than, the die-to-die ±σ range of all the chips (19.6 bit). This means that calibration for die-to-die variation, which was carried out in [3], is not sufficient, and the effect of within-die variation should be eliminated as well.
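The separation into die-to-die and within-die components described above can be sketched as follows. The sketch uses the sample standard deviation of the per-chip means as a stand-in for the die-to-die σ; whether the authors computed it this way is an assumption. The function name and the VDL readings are hypothetical.

```python
from statistics import mean, stdev


def variation_summary(bits_by_chip):
    """bits_by_chip: {chip_id: [VDL bits for each calibrated converging path]}.

    Returns per-chip (mean, sample std) and the sample std of the chip means,
    used here as a stand-in for the die-to-die sigma.
    """
    per_chip = {}
    for chip, bits in bits_by_chip.items():
        sigma = stdev(bits) if len(bits) > 1 else 0.0
        per_chip[chip] = (mean(bits), sigma)
    chip_means = [m for m, _ in per_chip.values()]
    die_to_die_sigma = stdev(chip_means)
    return per_chip, die_to_die_sigma


# Hypothetical VDL readings for an identical injected pulse.
readings = {1: [30, 34, 38], 2: [40, 41, 42], 3: [35, 35, 35]}
per_chip, d2d_sigma = variation_summary(readings)
print(per_chip, d2d_sigma)
```

In this hypothetical data set the within-die spread of chip 1 exceeds the spread of the chip means, which is the kind of situation in which a per-chip calibration alone is insufficient.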
Conclusion
In this paper, we proposed a measurement circuit structure for capturing SET pulse-width while suppressing pulse-width modulation and within-die variation of the measurement circuit. The proposed circuit uses parallelized short inverter chains as a target circuit to mitigate pulse-width modulation while maintaining area efficiency. Furthermore, each converging path and time-to-digital converter on each fabricated chip can be calibrated. Under neutron irradiation, we observed narrow SET pulses whose widths range from 5 ps to 215 ps. Calibration results also show that a highly overestimated pulse-width distribution may be obtained when both variations are ignored, and point out that calibration for within-die variation of the measurement circuit is indispensable. In future work, we will evaluate the non-uniform time resolution within a VDL and estimate its impact on the SET pulse-width distribution.
