Abstract: Special synchronizers exist for special clock relations such as mesochronous, multi-synchronous and ratiochronous clocks, while variants of N-flip-flop synchronizers are employed when the communicating clocks are asynchronous. N-flip-flop synchronizers are also used in all special cases, at the cost of longer latency than when using specialized synchronizers. The reliability of N-flip-flop synchronizers is expressed by the standard MTBF formula. This paper describes cases of coherent clocks that suffer from a higher failure rate than predicted by the MTBF formula; that formula assumes a uniform distribution of data edges across the sampling clock cycle, but coherent clocking leads to drastically different situations. Coherent clocks are defined as derived from a common source, and phase distributions are discussed. The effect of jitter is analyzed, and a new MTBF expression is developed. An optimal condition for maximizing MTBF and a circuit that can adaptively achieve that optimum are described. We show a case study of metastability failure in a real 40nm circuit and describe guidelines used to increase its MTBF based on the rules derived in the paper.
INTRODUCTION
Recently, an SoC product exhibited an alarmingly high rate of random failures in operation. Analysis showed that the problem was located in a clock domain crossing based on a two-flip-flop synchronizer. While such synchronizers are designed to bridge asynchronous clock domains, it turned out that the domains in question were coherent, resulting in an increased failure rate.
The usual classification [1] [2] [4] sorts the relationship between two clocks based on their frequency and phase relations, as in the upper part of Figure 1. No frequency or phase relationship is assumed for two asynchronous clocks, and various relations exist in the loosely synchronous class. That class is further divided into mesochronous, plesiochronous and heterochronous groups. The latter group is further subdivided into ratiochronous and nonratiochronous [5] [6] clocks. We employ a different classification, based on clock sources, as shown in the lower part of Figure 1. Clocks are non-coherent when they are sourced from different references and coherent when they share a common reference clock. The coherent case is further divided into two subcases depending on the nature of the phase distribution, uniform and non-uniform. The dotted circle groups the cases where the phase distribution is uniform, whether coherent or not. Note that some cases of the loosely synchronous clocks may be either coherent or non-coherent, and hence it is impossible to unify the two different classifications.
It is widely believed that loosely synchronous clock domains may be synchronized using either a special-purpose synchronizer designed for each of the special cases (e.g., [7]-[11]), or a brute-force $N$-flip-flop synchronizer of the type designed for asynchronous domains [2]. In the latter case, the reliability of the synchronizer is given by the estimate of mean time between failures (MTBF):
$$\mathrm{MTBF} = \frac{e^{S/\tau}}{T_W \, f_C \, f_D} \qquad (1)$$
where $f_C$, $f_D$ and $S$ denote the frequency of the clock, the rate of the incoming data signal and the settling time allowed for synchronization between the clock domains, and $\tau$ and $T_W$ are the metastability resolution time constant and its window of vulnerability. Significant research has focused on the improvement and enhancement of such synchronizers [12] [13] [14]. However, we have realized that in certain cases of coherent clocks that expression does not apply. Underlying (1) is the assumption that the probability distribution of data edges along the sampling clock period is uniform [2] [15]. We show that a uniform distribution cannot be assumed in coherent clock domains. Rather, the common clock source leads to particular non-uniform phase relations that may result in significantly higher failure rates than predicted by (1).
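As a minimal sketch of how the standard estimate behaves, the snippet below evaluates the usual form MTBF = exp(S/τ) / (T_W · f_C · f_D). All numeric values (`TAU`, `T_W`, `SETTLING`) are hypothetical and chosen only for illustration; the clock and data rates reuse the 150 MHz and 125 MHz figures that appear later in the text.

```python
import math

def synchronizer_mtbf(settling_time, tau, t_w, f_clock, f_data):
    """Standard MTBF estimate in seconds: exp(S / tau) / (T_W * f_C * f_D)."""
    return math.exp(settling_time / tau) / (t_w * f_clock * f_data)

# Hypothetical device and timing parameters (not taken from the paper).
TAU = 20e-12       # metastability resolution time constant [s]
T_W = 20e-12       # window of vulnerability [s]
F_CLOCK = 150e6    # sampling clock frequency [Hz]
F_DATA = 125e6     # data rate [Hz]
SETTLING = 1.0e-9  # settling time allowed for resolution [s]

base = synchronizer_mtbf(SETTLING, TAU, T_W, F_CLOCK, F_DATA)
# Allowing one extra time constant of settling multiplies MTBF by e.
longer = synchronizer_mtbf(SETTLING + TAU, TAU, T_W, F_CLOCK, F_DATA)
```

The exponential dependence on S/τ is why margins are usually expressed as a number of extra time constants of settling time.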
The paper is organized as follows. Section 2 defines coherent clocks and discusses the resulting phase relations. Section 3 describes jitter noise and its influence on clock phases. In Section 4 we develop a formula for MTBF in a general coherent clock case and an optimality condition for minimum failure rate. In Section 5 we show the conditions for achieving that minimum and explain why previous publications [7]-[11] on adaptive synchronization do not provide such optimality. Section 6 presents the case study of a synchronization failure in an SoC, as discussed at the beginning of this introduction, showing solutions to achieve the optimal condition, and Section 7 concludes the work.
COHERENT CLOCKS
Synchronization in multiple-clock-domain SoCs can be sorted into two major categories, coherent and non-coherent clocking. The coherent clock scenario is illustrated in Figure 2. Two clock domains are fed from two different PLLs that are referenced from a common source and apply rational frequency multipliers $M_1$, $M_2$. The clock frequencies of domains 1 and 2 are $f_1$ and $f_2$, respectively. A data signal sourced in domain 1 is sampled by a flip-flop in domain 2. This is the case when clock domains are referenced from a single on-board oscillator or crystal that provides the reference to all domains. In the general case, no assumption is made on the values of $f_1$ and $f_2$, and every ratio is permitted according to the programmed values of the PLL multipliers. This case is similar to Globally-Ratiochronous, Locally-Synchronous (GRLS) in [6]. The non-coherent scenario is illustrated in Figure 3, and corresponds to the case where the communicating clock domains are sourced from different references. This is the case when more than one oscillator is present in the circuit, or when synchronizing asynchronous inputs into the system. Both coherent and non-coherent cases may be present in large SoCs. In the following we deal mostly with the coherent clock scenario; non-coherent clocks are discussed briefly at the end of Section 3.
In Figure 2, data is generated at rate $f_1$. The aim is to analyze the distribution of the phase difference between the data leading edge and its sampling clock in domain 2. We denote by $\varphi(n)$ the time difference in the $n$-th cycle of clock $f_1$ (this time difference is henceforth expressed as a phase difference). Assuming $f_2 > f_1$, the phase is bounded by $0 \le \varphi(n) \le T_2$. Figure 4 describes this scenario; the leading edge of data is represented by the rising edge of clock 1.
Figure 4. Relative phases of two clocks.
Because both clocks are derived from a common reference, the ratio of their frequencies is rational,
Following (2) and the waveform diagram of Figure 4, an equation describing the evolution of the phase at cycle $n$ can be derived [16].
where $K$ can take only two possible values, $K = \lfloor\eta\rfloor$ or $K = \lfloor\eta\rfloor + 1$. Equation (3) has been studied in the context of communication systems [17] and the solution is given by:
where $\rho = \eta - \lfloor\eta\rfloor = P/Q$ and $\varphi(0)$ is the phase at time zero.
An interesting property of (4) is that $\varphi(n + Q) = \varphi(n)$, which means that $\varphi(n)$ is periodic with period $Q$ and can take at most $Q$ different values.
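The periodicity property can be checked with exact rational arithmetic. The sketch below does not reproduce (3) or (4); it assumes the simple model that the phase, normalized to $T_2$, is the time from each data edge to the next sampling edge, i.e. $\varphi(n)/T_2 = (\varphi(0)/T_2 - n\eta) \bmod 1$ with $\eta = f_2/f_1$. The sign convention, the function name `phase_sequence` and the initial phase of 1/10 are assumptions made for illustration.

```python
from fractions import Fraction

def phase_sequence(f1, f2, n_cycles, phi0=Fraction(0)):
    """Normalized phases phi(n)/T_2 under the assumed model
    phi(n) = (phi(0) - n * T_1) mod T_2, with T_1/T_2 = f_2/f_1."""
    eta = Fraction(f2) / Fraction(f1)
    return [(phi0 - n * eta) % 1 for n in range(n_cycles)]

# Example with f_1 = 125 MHz and f_2 = 150 MHz, so eta = 6/5.
eta = Fraction(150, 125)
Q = eta.denominator                       # reduced denominator of the ratio
phases = phase_sequence(125, 150, 3 * Q, phi0=Fraction(1, 10))

distinct_phases = sorted(set(phases))     # at most Q distinct values
# Each upward jump marks the start of a new decreasing subsequence.
ascents = sum(1 for n in range(Q) if phases[n + 1] > phases[n])
```

Under this model the sequence repeats every $Q$ cycles and the number of upward jumps per period equals the numerator of $\rho$.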
An exhaustive analysis [16] of the solution (4) shows that $\varphi(n)$ is composed of $P$ monotonically decreasing subsequences, as shown in Figure 8. Based on the derived results, a general expression for the probability density function (pdf) of the phase $\varphi(n)$ can be obtained:
where $\delta(x)$ is the Dirac delta function. Figure 9 shows a diagram of the pdf for the phases.
Figure 9. Probability density function diagram of $\varphi(n)$.
Substituting, we obtain the expression:
Small perturbations in clock frequencies
From (7), $\tilde{Q}$ can be at most $Q \cdot \theta$, and since $\theta$ is a large number, the number of possible phases $\omega(f_1, f_2 + \varepsilon) = Q \cdot \theta$ is very large. In some cases, the numerator and denominator in (7) may have common divisors, lowering the value of $\omega$. In summary, while for certain frequencies $f_1$, $f_2$ the number of possible phases is limited to a small value ($Q$), a small perturbation of those frequencies may drastically increase the number of possible phases.
To illustrate the above argument, we consider a case where $f_1 = 125$ MHz and $f_2 = 151.5$ MHz, meaning $\varepsilon = 1.5$ MHz and $\theta_N/\theta_D = 0.01$, representing a 1% deviation from $f_2 = 150$ MHz. Figure 10 and Figure 11 show $\varphi(n)$ and its histogram. The number of possible phases increases from 5 to 250. The histogram in Figure 11 resembles a continuous uniform distribution, since it is composed of a large number of delta functions.
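The jump from 5 to 250 phases follows from reducing the frequency ratio to lowest terms. The sketch below does this with exact fractions; frequencies are given in kHz so that they are integers, and the helper name `phase_count` is hypothetical.

```python
from fractions import Fraction

def phase_count(f1, f2):
    """Return (P, Q) with rho = eta - floor(eta) = P/Q in lowest terms,
    where eta = f2/f1. Q is the number of possible ideal phases."""
    eta = Fraction(f2) / Fraction(f1)
    rho = eta - (eta.numerator // eta.denominator)
    return rho.numerator, eta.denominator

# Frequencies in kHz to keep them integral.
nominal = phase_count(125_000, 150_000)      # 150/125 = 6/5
perturbed = phase_count(125_000, 151_500)    # 151.5/125 = 303/250

# Upper bound Q * theta_D with theta_N/theta_D = 1/100; the actual count
# is lower here because numerator and denominator share a common divisor.
upper_bound = nominal[1] * 100
```

For the perturbed case the reduced ratio is 303/250, so 250 phases appear even though the bound $Q \cdot \theta_D$ would allow 500.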
CLOCK PHASE PROBABILITY DISTRIBUTION
The preceding analysis ignores noise. In periodic electronic signals, noise manifests as phase jitter. To understand the effect of jitter on the values of $\varphi(n)$, we assume the noise is independent, time invariant and additive [18]. Then,
$$\varphi_N(n) = \varphi(n) + \nu(n) \qquad (8)$$
where $\varphi_N(n)$ describes the jittered phase at cycle $n$, $\nu(n)$ is the jitter component, assumed to have a normal distribution $N(0, \sigma^2)$, and $\varphi(n)$ are the ideal phase values described in the previous section. Figure 12 and Figure 13 show the effect of noise on the phase positions for the $f_1 = 125$ MHz, $f_2 = 150$ MHz case. As expected, instead of delta-like phase positions, a Gaussian-like distribution is obtained around each of the peaks. Figure 14 and Figure 15 show a similar example for the case of a slight deviation from the desired frequencies. Since the number of possible phase positions increases drastically, the final result is almost a continuous uniform distribution over all possible phases.
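The effect of additive Gaussian jitter on the phase histogram can be reproduced with a short Monte Carlo sketch. Phases are normalized to $T_2$; the five ideal phases, the 2% jitter, the bin count and the seed are illustrative assumptions rather than values taken from the figures.

```python
import random

def jittered_phase_histogram(ideal_phases, sigma, n_samples, n_bins, seed=0):
    """Histogram of (phi(n) + nu(n)) mod 1 with nu ~ N(0, sigma^2).
    Phases are normalized to the sampling period T_2."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    for n in range(n_samples):
        ideal = ideal_phases[n % len(ideal_phases)]
        jittered = (ideal + rng.gauss(0.0, sigma)) % 1.0
        # Clamp guards against a float result that rounds up to 1.0.
        bin_index = min(int(jittered * n_bins), n_bins - 1)
        counts[bin_index] += 1
    return counts

# Q = 5 ideal phases (the 125 MHz / 150 MHz case) with an assumed offset.
IDEAL = [0.1, 0.3, 0.5, 0.7, 0.9]
counts = jittered_phase_histogram(IDEAL, sigma=0.02, n_samples=10000, n_bins=50)
peak_count = counts[5]     # bin starting at an ideal phase (0.10)
valley_count = counts[10]  # bin starting midway between two phases (0.20)
```

With well-separated ideal phases, bins near a peak collect hundreds of events while bins in a valley stay almost empty, which is the non-uniform situation the text describes.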
In the non-coherent scenario (Figure 3), the ratio of the two clock frequencies cannot in general be expressed as a rational number. This is true even when the two reference clocks are specified to the same nominal frequency. Rather, this ratio is modeled as a rational number plus a small perturbation. Hence, based on the analysis of small perturbations explained above, the relative phases span a wide range in a manner close to a uniform distribution. This situation persists even when adding noise.
(Figure annotation: $P = 53$, $Q = 250$.)
So far we have shown that the different clock scenarios can be classified based on their clock reference. In the non-coherent case the phase is distributed uniformly, while in the coherent case the phase distribution may become non-uniform depending on the clock frequencies. When the phase is non-uniformly distributed, the MTBF calculated using (1) is invalid; a new expression for the coherent non-uniform case follows in the next section.
COHERENT CLOCK MTBF
A synchronizer failure occurs when the data-clock separation falls inside the metastability window of vulnerability. The synchronizer failure probability can then be expressed as
where $\delta$ is the theoretical phase separation that causes the synchronizer output to settle at the metastability voltage ($V$) [15]. The parameter $\delta_W$ is the metastability window around $\delta$ such that for $\varphi_N(n)$ values outside the window, the voltage at the output of the synchronizer takes defined valid values within bounded time and there is no risk of further metastability propagation. However, if $\varphi_N(n)$ lies inside the window, the synchronizer output is delayed, generating intermediate voltages at its output at the system sampling time, which may propagate metastability to the synchronous domain and lead to a failure (Figure 16). $\delta_W$ is assumed symmetric for ease of derivation, while in real circuits it may be asymmetric around $\delta$.
From (8), and since the probability density function of the sum of two independent random variables is the convolution of their separate density functions, the pdf of $\varphi_N(n)$ can be written as
The value of $\sigma$ is the standard deviation of the jitter noise in the circuit. The probability density can be regarded as periodic with period $T_2$. The resulting pdf takes a form similar to the diagrams of Figure 17.
From (10) and Figure 17, we identify two different scenarios. When $T_2 > 2Q\sigma$, (10) represents a non-uniform distribution as in Figure 17(a), having maxima and minima similar to the example of Figure 13. This happens because the distance between the ideal phase positions ($T_2/Q$) is larger than twice the standard deviation ($\sigma$) of the noise, so the maxima are well separated. When $T_2 < 2Q\sigma$, the summation in (10) produces a mixture that can be approximated by a continuous uniform distribution, as in Figure 17(b). This is because the Gaussian mean locations ($\varphi(i)$) are uniformly distributed through the clock period and the distance between phase positions is shorter than twice the standard deviation of the noise. An alternative analysis in the Fourier domain yields a similar criterion for the uniformity of the overall distribution. Using (10), the failure probability can be rewritten as:
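The two regimes can be compared numerically. The sketch below assumes that the jittered phase density is an equal-weight mixture of $Q$ Gaussians centered on the ideal phases and wrapped with period $T_2$; this form is an assumption inferred from the description of the density as a sum of Gaussians with means $\varphi(i)$. The function names and the phase offset of zero are hypothetical.

```python
import math

def mixture_pdf(phi, centers, sigma, period=1.0):
    """Equal-weight Gaussian mixture, wrapped over neighbouring periods."""
    norm = 1.0 / (len(centers) * sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for c in centers:
        for wrap in (-1, 0, 1):
            d = phi - (c + wrap * period)
            total += math.exp(-d * d / (2.0 * sigma * sigma))
    return norm * total

def is_nonuniform(t2, q, sigma):
    """Criterion from the text: peaks are well separated when T_2 > 2*Q*sigma."""
    return t2 > 2 * q * sigma

def peak_to_valley(q, sigma, t2=1.0):
    """Ratio of the density on an ideal phase to the density midway
    between two adjacent ideal phases."""
    centers = [i * t2 / q for i in range(q)]
    peak = mixture_pdf(centers[1], centers, sigma, t2)
    valley = mixture_pdf(centers[1] + t2 / (2 * q), centers, sigma, t2)
    return peak / valley

ratio_q5 = peak_to_valley(5, 0.02)      # T_2 > 2*Q*sigma: strongly peaked
ratio_q250 = peak_to_valley(250, 0.02)  # T_2 < 2*Q*sigma: essentially flat
```

With 2% jitter, $Q = 5$ gives a density that differs by several orders of magnitude between peak and valley, while $Q = 250$ is indistinguishable from uniform.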
which is the usual result for uniform phase distribution [15] .
Assuming the data rate is given by $f_1$, a new general MTBF expression for the coherent clock scenario can be derived from (11):
Evidently, coherent clocking may lead to a different MTBF than expression (1) .
The MTBF ratio reaches its extremes when $\delta$ lies on a peak or in a valley of the probability density function. In those cases, the MTBF ratio is given by:
In common cases, jitter represents a few percent of the clock period. Taking $\sigma/T_2 = 0.02$ (2%) and $Q = 3$, the MTBF ratio becomes 4160, meaning MTBF may increase or decrease by more than three orders of magnitude. This MTBF variation should be added to other design margins by increasing the settling time by $9\tau$ ($\ln(4160) \approx 8.3$, rounded up to 9), which in modern technologies (especially LP) can add up to 0.5-1 ns of latency. A similar scenario is shown in [20] by means of a special feedback setup that creates metastable events in almost every cycle. When the jitter is extremely low, the MTBF ratio becomes very high (e.g., for 0.5% jitter the ratio exceeds $10^{10}$), which is a considerable improvement in MTBF.
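The arithmetic behind the settling-time margin can be sketched as follows. The closed form $\exp(T_2/(2Q\sigma))$ used here is an assumption: it is not taken from the surrounding text and is adopted only because it reproduces the quoted value of 4160 for $\sigma/T_2 = 0.02$ and $Q = 3$. The value of `TAU` is hypothetical.

```python
import math

def assumed_mtbf_ratio(sigma_over_t2, q):
    """Assumed closed form exp(T_2 / (2 * Q * sigma)); chosen because it
    matches the quoted example, not because the text states it."""
    return math.exp(1.0 / (2.0 * q * sigma_over_t2))

ratio = assumed_mtbf_ratio(0.02, 3)

# Number of extra time constants needed to absorb the ratio, rounded up.
extra_settling_in_tau = math.ceil(math.log(ratio))

TAU = 100e-12  # hypothetical resolution time constant [s]
extra_latency = extra_settling_in_tau * TAU
```

With a time constant in the range of roughly 55 to 110 ps, nine extra time constants correspond to the 0.5-1 ns of added latency mentioned above.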
In most cases the MTBF uncertainty caused by coherent clocks should be compensated by an additional settling time margin. This margin should be added to the other synchronizer design margins for process, voltage and temperature (PVT) variations.
MAXIMIZING MTBF
Many synchronizers have been proposed for different types of coherent clock relations, such as mesochronous, multi-synchronous, plesiochronous and periodic clocks. However, in this section we consider maximizing the MTBF of N-flip-flop synchronizers when employed between coherent clock domains.
Because (10) can be non-uniform as described previously, we aim at optimizing the synchronization setup in order to maximize MTBF. Since $\delta$ and $\delta_W$ are intrinsic parameters of the flip-flops and $\sigma$ is related to the jitter of the clock network (basically the jitter of the reference clock from which both $f_1$ and $f_2$ are derived), we focus our optimization on the phases $\varphi(i)$. The absolute phases are a function of interconnect and internal delays, which determine the value of $\varphi_N(0)$, while the relative phase spacing is independent of any delay and is given by $T_2/Q$. Internal flip-flop delays depend on the circuit design, and hence the only parameter available from a system-level perspective is the interconnect delay, which shifts all $\varphi(i)$ by a common offset.
To find the optimum value of ߮(݅) that yields maximum MTBF, one should solve equation (14):
Since the MTBF function is monotonic, it follows
Since (15) does not have a closed-form solution, an approximation is given by:
A graphical representation of the solution is shown in Figure 18. The approximate solution matches the intuitive approach: the interconnect delay is adjusted so that the point $\delta$ lies between two adjacent peaks of the phase distribution, located at the $\varphi(i)$ values.
It is possible to build a circuit that produces the optimal MTBF condition derived in (16). Since $\delta$ is usually not known to the system designer, a method for adaptive delay learning is implemented. Previous works on adaptive synchronization [7]-[11] do not take $\delta$ into consideration and consequently may be unable to achieve the maximum MTBF.
Figure 18. Graphical representation of the solution of (15).
The principle of the adaptive delay unit is shown in Figure 19. It consists of a variable delay and a delay control block that are independent of the synchronizer. The delay control receives both $f_1$ and $f_2$ and the output of the first flip-flop in the synchronizer, and generates a control signal (set) that sets the delay value. The output of the first synchronizer stage is critical in order to generate the condition shown in (16). Once the delay is found, the control unit locks the value. This procedure is triggered after a reset of the clock domains, and the delay is kept locked until any of the frequencies is changed.
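The search performed by such a delay control can be illustrated in software. The sketch below is a behavioral model only, not the circuit of Figure 19: it sweeps a set of hypothetical delay taps and locks the one that minimizes the phase density at $\delta$. In hardware $\delta$ is unknown and the metric would have to be observed at the output of the first flip-flop; here it is computed from an assumed equal-weight Gaussian mixture. All names and values are illustrative.

```python
import math

def pdf_at_delta(delta, centers, sigma, period=1.0):
    """Assumed phase density at delta: equal-weight wrapped Gaussian mixture."""
    total = 0.0
    for c in centers:
        for wrap in (-1, 0, 1):
            d = delta - (c + wrap * period)
            total += math.exp(-d * d / (2.0 * sigma * sigma))
    return total / (len(centers) * sigma * math.sqrt(2.0 * math.pi))

def lock_delay(delta, centers, sigma, taps, period=1.0):
    """Return the delay tap that minimizes the density at delta."""
    def metric(delay):
        shifted = [(c + delay) % period for c in centers]
        return pdf_at_delta(delta, shifted, sigma, period)
    return min(taps, key=metric)

# Worst-case start: delta sits exactly on an ideal phase (normalized to T_2).
CENTERS = [0.1, 0.3, 0.5, 0.7, 0.9]
TAPS = [i * 0.01 for i in range(20)]   # hypothetical delay taps, 0 to 0.19
best = lock_delay(delta=0.3, centers=CENTERS, sigma=0.02, taps=TAPS)
```

The selected tap shifts the peaks by half of the peak spacing $T_2/Q$, leaving $\delta$ midway between two adjacent peaks.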
CASE STUDY
In this section we present a real circuit that was found a posteriori to exhibit metastability failures. The circuit was part of a commercial SoC in a 40nm technology. This presentation aims to achieve two goals: first, to demonstrate that the phase distribution may be non-uniform in coherent clock circuits, as shown in previous sections; second, to describe techniques useful for detection and analysis of random metastability failures.
The relevant portion of the circuit is shown in Figure 20. In order to locate the failure, infrared emission analysis (IREM) was used. It identified an area of the SoC that exhibited irregular emissions correlated with the failure event. Figure 22 shows the IREM image during normal system operation and just prior to failure. In normal operation, only one emission spot was visible. Prior to a failure, additional emission spots were observed. Multiple signals in the vicinity of the culprit location were examined by adding FIB micro-probes. Figure 21 shows the synchronizing clocks and the waveform at the output of the synchronizer (signal S1), with an unexpectedly short pulse caused by a late output transition generated by metastability. Logic analysis determined that this event caused the failure.
Since the system employed coherent clocks, we studied the phase relation of the micro-probed clock and the data feeding into the suspected synchronizer. Using the value of $\delta$ and the phase histogram, we calculated the probability of failure as the number of events in a window around $\delta$ divided by the overall number of events in the histogram. Table 1 shows the resulting failure probability for different values of the $f_2/f_1$ ratio. The failure probability changes by very large factors. We then validated these findings by measuring failure probabilities of the SoC for each of the tabulated ratios. Finally, we directed the SoC user to use only the ratios that are highlighted in the table, since they lead to significantly reduced failure probabilities.
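The histogram-based estimate described above can be sketched as a simple counting procedure. The sample phases, the value of `delta` and the window width below are hypothetical and serve only to show the computation; the window is assumed to be symmetric around $\delta$ and distances are taken circularly over one sampling period.

```python
def failure_probability(phase_samples, delta, window, period=1.0):
    """Fraction of measured phases that fall inside a symmetric window of
    total width `window` centred on delta (circular distance over `period`)."""
    inside = 0
    for p in phase_samples:
        d = abs(p - delta) % period
        d = min(d, period - d)
        if d <= window / 2:
            inside += 1
    return inside / len(phase_samples)

# Hypothetical measured phases, normalized to the sampling period.
samples = [0.10, 0.11, 0.30, 0.31, 0.50, 0.70, 0.90, 0.29]
p_fail = failure_probability(samples, delta=0.30, window=0.04)
```

Repeating the computation for histograms measured at different $f_2/f_1$ ratios gives one entry per ratio, in the manner of Table 1.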
We note that the proposed solution did not fix the problem completely, but increased the MTBF by two orders of magnitude, which resulted in an acceptable solution for the specific application. In the future, a similar SoC may employ a circuit like the one in Figure 19 to dynamically adjust the delay relative to $\delta$ and achieve an even better improvement in MTBF.
The synchronizer in Figure 20 was poorly designed, and the case is presented here to illustrate the coherent clock phase distribution and not as a method to solve metastability issues.
Table 1. Failure probability for different $f_2/f_1$ ratios.
CONCLUSIONS
We have proposed a new classification of CDC synchronization based on the source of the clock references involved. Coherent and non-coherent clock scenarios are introduced. In the non-coherent clock scenario the clock phase distribution is shown to be uniform as assumed in previous publications. In contrast, coherent clock synchronization is shown to present non-uniform phase distribution in some cases depending on clock frequencies.
A condition for non-uniformity versus uniformity in coherent clocks is developed. A new formula for MTBF in the general coherent clock scenario is developed and an expression for optimum MTBF is found. A general block diagram of an adaptive synchronization scheme that can maximize MTBF is proposed. A real case of synchronization failure in a coherent clocking SoC is presented, demonstrating a measured non-uniform phase distribution and also illustrating how random metastability failures can be detected and localized in real chips.
ACKNOWLEDGEMENT
The work of Salomon Beer was supported in part by HPI Institute for scalable computing. The authors would like to thank the anonymous reviewers for their wise comments that helped improve the quality of this publication.
