Abstract-The main objective of this work is to detect multiple faults and to minimize clock skew. Clock skew minimization, an important issue in very large scale integration design, has become difficult because of process, voltage, and temperature (PVT) variations. The postsilicon skew tuning (PST) technique, which can tolerate PVT variations even after a chip is manufactured, has generated considerable discussion. Most previous works focus on the implementation and performance issues of a PST architecture and address the detection of only one fault; this paper focuses on the testing issues of a PST architecture and on detecting more than one fault. Testing the variation tolerance ability of the PST architecture is difficult because clock skew does not directly affect the functionality of a design. In this paper, we propose an efficient fault model that considers the physical limitations of the devices in the PST architecture. In addition, we propose novel structures to detect manufacturing faults and increase the robustness of a PST architecture. Our experiments show that robustness can be achieved with little overhead.
I. INTRODUCTION
Clock skew has a tremendous effect on the performance of a design. To address the clock skew minimization problem, many previous works [1]-[8] propose buffer-sizing and wire-sizing methodologies. However, under the influence of process, voltage, and temperature (PVT) variations [9] in advanced technologies, traditional sizing techniques may satisfy the clock skew constraints under certain variation conditions but not under others. PVT variations are prone to cause errors in crypto-architectures, and these variations may be lethal to the parity check of the design. Thus, a good methodology to reduce the effect of PVT variations is needed for these architectures. Recently, the post-silicon skew tuning (PST) technique, which can tolerate PVT variations, has attracted a lot of attention [17]-[24]. The main idea of the PST architecture is to adjust the clock skew dynamically even after a chip has been manufactured. There are two important devices in the PST architecture: adjustable delay buffers (ADBs) and phase detectors (PDs). An ADB is a delay buffer whose delay can be modified through its control inputs. A PD detects the arrival time (phase) difference between two clock signals. Fig. 1 shows how a PST architecture works. First, a PD, such as PD1, samples the clock skew between two flip-flops (FFs), FF1 and FF3. Using the skew information from PD1, the controller modifies the delay of ADB1 to minimize the clock skew between FF1 and FF3. Unlike previous works that mainly focus on the design and implementation issues of a PST architecture, this paper discusses the testing of PST designs. Although the PST architecture is used to tolerate PVT variations of clock skew, variations can also affect the tolerance ability or even the functionality of ADBs and PDs. Therefore, testing the PST architecture is important. However, the traditional testing methodology for logic gates is not suitable for testing ADBs and PDs, for several reasons.
First, clock skew variation does not directly affect the functionality of a design, so the traditional stuck-at fault test methodology cannot be applied. Second, it is difficult to create various PVT environments in a testing situation. Even if a manufactured chip passes the traditional testing flow, we cannot guarantee that the PST architecture can dynamically minimize the clock skew under different environmental conditions. The contributions of this paper are as follows. First, we propose a general fault behavior model for the PST architecture. Based on the modeled faulty behavior, we propose efficient ways to diagnose the PST architecture. In addition, we present a robust architecture that properly inserts redundant devices so that the PST architecture can recover from one faulty ADB or PD. Our experimental results show that with very little overhead, we can increase the robustness of the PST architecture. The rest of this paper is organized as follows. In Section II, we introduce the design of the ADB and the PD, as well as the idea of combining them into an integrated device. In Section III, we present our fault model. Then, in Sections IV and V, we show the basic idea, the algorithm, and the implementation of our fault detection method. Finally, we propose our fault-tolerance architecture in Section VI and present the results of our testing.
II. PRELIMINARIES
In this section, we first discuss the designs of ADBs and PDs. Then, we propose a fault model for our testing mechanism and review a basic PST design from [19].
A. ADB and PD Designs
In a PST architecture, the ADBs and PDs are the two major components. A PD samples the arrival times of two clock signals and indicates which clock signal arrives earlier. The delay of an ADB can be adjusted through its control inputs. The control inputs of an ADB form a value called the delay value. Normally, an ADB is designed so that adjacent delay values are separated by an equal delay step, called the resolution step.
B. PST Architecture
Various ways of constructing a PST design have been proposed in previous works. The main difference among PST designs is the locations of the ADBs and the corresponding PD networks. A simplified PST design is shown in Fig. 2. We now use Fig. 2 to demonstrate the alignment process for the PST architecture. In the first step, the process determines a reference clock with which all other clock sinks should be aligned. In Fig. 2, the reference clock is FF7. In the second step, ADBs A3, A6, A8, and A12 adjust their delays to minimize the clock skew between the reference clock, FF7, and the clock sinks FF3, FF6, FF8, and FF12, based on the skew information from PDs PD3, PD6, PD8, and PD12, respectively. Once these clock sinks are aligned with the reference clock, the arrival times of the clock sinks {FF2, FF5, FF9, FF13} are aligned with the clock sinks {FF3, FF6, FF8, FF12} by adjusting ADBs {A2, A5, A9, A13}. Finally, the process uses ADBs {A1, A4, A10, A14, A15} to align clock sinks {FF1, FF4, FF10, FF11, FF14, FF15} with the clock sinks {FF2, FF5, FF9, FF13}. After the alignment process, all clock sinks are aligned with the reference clock directly or indirectly. In this paper, all clock sinks are assigned a level as follows. First, we assign the reference clock, FF7, to level 0. The clock sinks directly aligned with the level-0 clock sink are placed at level 1. For example, FF3, FF6, FF8, and FF12 are assigned to level 1. Similarly, each clock sink is set to level X+1 if it is directly aligned with a level-X clock sink. For example, FF1, FF4, FF10, FF11, FF14, and FF15 are assigned to level 3.
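The level assignment described above amounts to a breadth-first traversal that starts at the reference clock sink. The following Python sketch illustrates it. The `ALIGNED_WITH` map is an assumption made for illustration: the text lists which groups of sinks are aligned with one another, but not every individual pairing in Fig. 2.

```python
from collections import deque

# Hypothetical alignment map (clock sink -> the sink it is aligned with).
# The individual pairings are assumed; only the groups are given in the text.
ALIGNED_WITH = {
    "FF3": "FF7", "FF6": "FF7", "FF8": "FF7", "FF12": "FF7",
    "FF2": "FF3", "FF5": "FF6", "FF9": "FF8", "FF13": "FF12",
    "FF1": "FF2", "FF4": "FF5", "FF10": "FF9", "FF11": "FF9",
    "FF14": "FF13", "FF15": "FF13",
}


def assign_levels(aligned_with, reference):
    """Give the reference level 0 and every other sink its target's level + 1."""
    children = {}
    for sink, target in aligned_with.items():
        children.setdefault(target, []).append(sink)

    levels = {reference: 0}
    queue = deque([reference])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            levels[child] = levels[current] + 1
            queue.append(child)
    return levels


LEVELS = assign_levels(ALIGNED_WITH, "FF7")
```

Under the assumed pairings, `LEVELS` places FF7 at level 0 and the six leaf sinks at level 3, which agrees with the levels stated in the text.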
C. Synchronizer
To simplify the control logic of a PST architecture, [19], [21], [22], and [28] use only one PD's skew information to determine the delay value of an ADB. As a result, each ADB has a corresponding PD. In this paper, an ADB and its corresponding PD form a new module called a synchronizer. Fig. 3(a) shows the block diagram of a synchronizer. For example, in Fig. 3, the PD and the ADB form a synchronizer, S1, where the PD gathers the arrival times of FF1 and FF2. The controller, which uses the information from the PD, is implemented as a finite state machine (Fig. 4). As with the clock sinks, we assign each synchronizer to the same level as its FFtarget signal. For example, S3, S6, S8, and S12 are assigned to level 1, whereas S1, S4, S10, S11, S14, and S15 are assigned to level 3, the leaf level.
III. FAULT MODELING OF THE SYNCHRONIZERS
ADBs and PDs may exhibit various faulty behaviors due to manufacturing defects and process variations. According to [1] and [29], there are at least eight locations that may have manufacturing defects in an ADB design. In addition, parameter problems [25], such as the linearity of the delay, should be considered. Testing all possible faulty (analog) behaviors of each individual ADB and PD can be very cumbersome because of the large number of ADBs and PDs in a PST architecture. However, since an ADB, its PD, and the controller are integrated into a synchronizer, our approach considers the faulty behavior of the synchronizer instead of the individual faulty behaviors of ADBs and PDs. We say that a synchronizer has a synchronization fault if the clock skew between its FFtarget signal and its FFref signal cannot be aligned. The synchronization fault model is a very general fault model. For simplicity of discussion, we assume there is at most one synchronization fault in a design. Traditionally, the testing of a design uses a stuck-at fault model to verify its correctness. However, the stuck-at fault model can hardly verify all possible faulty behaviors of the synchronizer. In addition, unlike faulty logic gates that may directly cause functional failure, a synchronization fault affects only the timing rather than the functionality of the design. Because a synchronization fault affects only clock skews, we need a way to control and observe the relative arrival times of clock signals.
IV. BASIC PST TESTING FLOW AND THE TESTING ARCHITECTURE
In this section, we first discuss our PST testing flow for finding the faulty synchronizer. Then, we discuss how to add design-for-test elements to realize the PST testing flow. Our basic idea for testing a faulty behavior is to modify the delay of an ADB and observe the corresponding change in the PST architecture. If an unexpected change is observed, we conclude that there is a synchronization fault in the design. Since we cannot add delays after manufacturing, we assume the only way to manually modify clock arrival times is to change the delay values in a synchronizer. To change and observe the delay values of synchronizers, we assume that a scan-chain structure is inserted into all synchronizers to scan the delay values in and out. Testing is thus transformed into scanning appropriate delay values into synchronizers and then scanning out the delay values of synchronizers for verification. To implement the PST testing flow, we need to insert design-for-test devices into a PST design. Fig. 5 shows the proposed PST architecture, which includes a scan chain and one additional ADB, ADBref, added to the reference clock sink. With the scan chain, we can manually adjust the delay values of synchronizers. If all synchronizers work properly, certain synchronizers should react to the change by properly adjusting their delay values. After that, we try to manually increase the arrival time of FF3 by adding k additional units of delay in the synchronizer S3. Then, we manually fix the delay of S3 so that its delay value will not be changed in the subsequent alignment process. Finally, we try to align FF2 to FF3. The delay value of S2 should react to the change of S3 and increase by k units. Otherwise, there is a synchronization fault in either S2 or S3. Given a PST design, we now describe the steps of our basic PST testing flow. In the first step, we perform the clock alignment.
After the alignment, all synchronizers are assigned certain delay values. In the second step, we scan out the delay values of the synchronizers. In the third step, we scan appropriate delay values into certain synchronizers to perturb the clock arrival times. Then, in the fourth step, we align the PST design. After alignment, the delay values of the synchronizers are updated because of the arrival time perturbation in the previous step. In the final step, we check the updated delay values of the synchronizers. From the updated values, we can determine whether there is a faulty synchronizer. To fully test the same synchronizer, we need to iterate from the third step through the fifth step. Let us first assume that a PST design is working properly. In the first step, we align all clock sinks and suppose the delays of synchronizers S2 and S3 settle to certain values. In the second step, we scan out the delay values of all synchronizers and let the delay values of S2 and S3 be value j and value k, respectively. In the third step, we scan in and fix the delay value of S3 by one resolution 
A. Scan-Chain Architecture and the Proposed Algorithm
Recall that in Section IV, the third step of our testing flow requires us to change and fix the delay values of certain synchronizers. A testing algorithm decides 1) which synchronizers and which values are used in the third step and 2) the order in which synchronizers are processed. The pseudocode of our testing algorithm is shown in Fig. 6. For simplicity of discussion, we use an example to illustrate the algorithm. Our basic idea is to modify the values of synchronizers in the higher level to test the functionality of synchronizers in the immediate lower level. Note that the level of a clock sink is defined in Section II-B. As a result, all synchronizers in the lower level will be modified at the same time. In this example, we have three levels of synchronizers and one ADB for the reference clock sink. The ADB for the reference clock sink is ADBref, which is a newly inserted redundant ADB for the purpose of testing. The synchronizers in the first level are {S3, S4, S7}, those in the second level are {S2, S5, S8}, and those in the leaf level are {S1, S6, S9}. In the first iteration of our testing flow, we modify the ADB {ADBref} and observe the behavior of the first-level synchronizers, {S3, S4, S7}. Let the maximum delay value be Maxsyn, the minimum delay value be Minsyn, and the default delay value be Dsyn. We also assume the scan-out value of ADBref is Dsyn after the first alignment. To test all possible behaviors of synchronizers {S3, S4, S7} in the first level, we perform a total of (Maxsyn−Minsyn) iterations of the testing flow. In each iteration, we scan in the delay value of ADBref, align, and check the delay values of S3, S4, and S7 as described in Section IV. The values are scanned into ADBref in such a way that the value is increased by one from the previous iteration until it reaches the maximum (Maxsyn). For example, let us assume Maxsyn, Minsyn, and Dsyn of the synchronizers are 7, 0, and 4, respectively.
We scan in 5 (= 4 + 1) as the delay value of ADBref for one testing flow. Then, in the subsequent two testing flows, we scan in 6 and 7 to ADBref. After that, the values scanned into ADBref are decreased one at a time until the value reaches the minimum (Minsyn). The values we scan into ADBref in the next five iterations are 4, 3, 2, 1, and 0. In summary, we scan into ADBref the values in the order {5, 6, 7, 4, 3, 2, 1, 0}. In the final step of a testing flow, we scan out the values of the ADBs. From the scanned-out values, we can now determine whether a fault is present. When the delay value of ADBref is increased (decreased) by one, the scan-out values of the first-level synchronizers {S3, S4, S7} should also be increased (decreased) by one unless the value has reached Maxsyn (Minsyn). Otherwise, the corresponding synchronizer is deemed to be faulty. Once synchronizers {S3, S4, S7} in the first level are tested correctly, we use them to test the second level following the same mechanism. For example, S4 is used to test S5. The process continues until we have tested all synchronizers in the design. In this paper, we define this method, which uses the parent synchronizer (the synchronizer in the higher level) to test its children synchronizers (the synchronizers in the immediate lower level), as forward fault detection. To test a PST architecture, we insert a scan chain and a reference ADB, ADBref, into the design. During the testing in Fig. 6, ADBref is assumed to be correct in order to test the others. However, ADBref also needs to be tested.
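The scan-in order and the pass/fail check of forward fault detection can be sketched in Python as follows. This is an illustrative reading of the description above, not the pseudocode of Fig. 6: the function names, the `observe` hook that stands in for "scan in, align, scan out", and the simulated chip are all assumptions.

```python
def scan_in_sequence(d_syn, min_syn, max_syn):
    """Values scanned into the parent: step up to the maximum, then down to the minimum."""
    up = list(range(d_syn + 1, max_syn + 1))
    down = list(range(d_syn, min_syn - 1, -1))
    return up + down


def expected_child_value(child_base, parent_base, parent_value, min_syn, max_syn):
    """A fault-free child follows the parent's change until it saturates."""
    shifted = child_base + (parent_value - parent_base)
    return max(min_syn, min(max_syn, shifted))


def find_faulty_children(parent_base, child_bases, observe, min_syn=0, max_syn=7):
    """Return the children whose scanned-out value ever differs from the expected one.

    observe(parent_value) is a hypothetical hook: it scans the value into the
    parent, re-aligns, and returns {child_name: scanned_out_delay_value}.
    """
    faulty = set()
    for value in scan_in_sequence(parent_base, min_syn, max_syn):
        scanned_out = observe(value)
        for child, base in child_bases.items():
            expected = expected_child_value(base, parent_base, value, min_syn, max_syn)
            if scanned_out[child] != expected:
                faulty.add(child)
    return faulty


# Hypothetical chip model: S3 tracks ADBref correctly, S4 never changes its value.
BASES = {"S3": 3, "S4": 4}


def simulated_chip(parent_value):
    return {
        "S3": expected_child_value(BASES["S3"], 4, parent_value, 0, 7),
        "S4": BASES["S4"],
    }


SEQUENCE = scan_in_sequence(4, 0, 7)
FAULTY = find_faulty_children(4, BASES, simulated_chip)
```

With Maxsyn = 7, Minsyn = 0, and Dsyn = 4, `SEQUENCE` reproduces the order {5, 6, 7, 4, 3, 2, 1, 0} given in the text, and the stuck synchronizer in the simulated chip is the only one reported.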
B. Timing Analysis
The overall test time of the proposed mechanism depends on the number of cycles required for scanning the delay values in and out and on the number of times the scanning operation is performed. The number of cycles required to scan in the values for all synchronizers is $r \times k$, where r is the number of bits of the delay signal on each synchronizer and k is the number of synchronizers. For example, if a design contains 200 synchronizers and the tuning range of each synchronizer is eight, then it takes $200 \times \log_2 8 = 600$ cycles to scan in all values in the worst case. Once all desired values are scanned in, we start the synchronization process. With $2^r$ possible delay values, the synchronization process may take r cycles to determine the correct delay values using binary search. Then, we need to scan out the delay values of all synchronizers. By carefully arranging the scan chain, it is possible to overlap the scan-in and scan-out processes within $r \times k$ cycles. Now, we determine how many rounds of scan-in are required to verify all synchronizers. According to Fig. 6, the synchronization processes of synchronizers in the same level are independent of each other. Thus, we can perform the fault detection process simultaneously on all synchronizers in the same level. For example, in Fig. 8, we can verify clk1, clk4, clk10, clk11, clk14, and clk15 at the same time since they are in the leaf level. As a result, we need to scan in the values for n − 1 iterations, where n is the number of levels in the PST architecture. In sum, the overall test time is around $rk(n - 1)$ cycles.
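The estimate above can be checked with a few lines of Python. This is a minimal sketch of the $rk(n - 1)$ approximation only; it ignores the r alignment cycles per round, as the text does. The function names are illustrative, and the four-level design used in the test is an assumed example.

```python
def delay_bits(tuning_range):
    """Number of control bits r needed to encode tuning_range delay values."""
    return (tuning_range - 1).bit_length()


def scan_cycles(num_synchronizers, tuning_range):
    """Cycles to scan one set of delay values into all synchronizers (r * k)."""
    return delay_bits(tuning_range) * num_synchronizers


def total_test_cycles(num_synchronizers, tuning_range, num_levels):
    """Approximate overall test time r * k * (n - 1)."""
    return scan_cycles(num_synchronizers, tuning_range) * (num_levels - 1)
```

For the example in the text, `scan_cycles(200, 8)` gives the 600 cycles quoted for one scan-in pass.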
C. Effect of the Over-Range Scenario
In a practical ADB design, each ADB has a fixed adjustable range. In this paper, we define the tuning range of a synchronizer as its number of resolution steps. Suppose the ADB in a synchronizer has three control inputs. This ADB has $2^3 = 8$ resolution steps, so the tuning range of the synchronizer is eight. The finite tuning range can limit the testing capability of our algorithm shown in Fig. 6. The key idea of the pseudocode in Fig. 6 is to use the synchronizer on the higher level, i.e., the parent synchronizer, to test the synchronizers on the lower level, i.e., its children synchronizers. With a finite tuning range, only certain values are available for the parent synchronizer to adjust to, depending on its current delay value. As a result, only limited values can be used to test its children synchronizers. The traditional single stuck-at fault model cannot completely verify the functionality of the ADB and PD because clock skew variation affects the timing but does not change the functionality of the data path. Furthermore, a delay testing fault model cannot be applied either, since even if a chip has Using the methodology to verify each ADB or PD can be too costly since our architecture consists of many PDs and ADBs. In our model, the synchronization fault can verify the correctness of the ADB and the PD at the same time. It focuses only on the correctness of aligning the clock skew. Thus, the synchronization fault is easier to detect. In addition, our synchronization fault model can verify the correctness of the PST architecture. Similarly, the cover ratio provides a measure of fault coverage and identifies how many possible faults can be tested through the proposed fault detection methodology. Compared with the fault coverage in the stuck-at fault model, which can only identify the number of digital faults on a data path, the cover ratio can identify the number of faults of the analog synchronizer on the clock path.
Consider testing the synchronizer S2 in Fig. 7(a) using the synchronizer described in the previous paragraph, with the timing interval of each resolution step being 20 ps. Before the alignment, suppose the arrival times of FF1 and FF2 are 200 and 240 ps, respectively. In our test algorithm, we scan in all possible delay values of S1 to verify the correctness of S2. However, we cannot test the correctness of S2 if the arrival time of FF2 is greater than 340 ps, since S1 cannot be adjusted higher than 340 ps. Thus, we can guarantee that S2 works properly only when its delay value is in {0, 1, 2, 3, 4, 5}, but we do not know whether S2 is faulty when the delay value is in {6, 7}. In that case, we can test only six of the eight possible delay values of S2. Consider another extreme case, shown in Fig. 7(c). After alignment, let us assume the control values of S1 and S2 are 7 and 0, respectively. In this extreme case, we cannot test any value of S2 since the clock arrival times of S1 and S2 deviate considerably in the testing environment. Therefore, S1 and S2 have to stretch to their maximum and minimum values, respectively, to balance the clock skew. To measure the effect of the over-range scenario, we define the number of testable resolution steps as the cover range. Thus, the cover range of S2 is six in the first case, whereas it is zero in the extreme case. We define the cover ratio of a synchronizer as the percentage of testable delay values, which equals its cover range divided by its tuning range. The cover ratio of S2 in the first case is 75%, meaning that only 3/4 of the resolution steps can be tested. We also define the cover ratio of a circuit as the average of the cover ratios of all its synchronizers. In this paper, we use the term over-range scenario for cases where the proposed fault detection algorithm cannot fully verify a synchronizer.
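The cover range and cover ratio of the example can be reproduced with a short Python sketch. It assumes, for illustration, that the quoted arrival times correspond to delay value 0 on both synchronizers and that a child delay value is testable only when the parent can reach the same arrival time. The function names and the 360 ps arrival time used for the extreme case are hypothetical.

```python
def cover_range(parent_arrival_ps, child_arrival_ps, step_ps=20, tuning_range=8):
    """Count the child's delay values whose arrival time the parent can also reach."""
    parent_max_ps = parent_arrival_ps + (tuning_range - 1) * step_ps
    return sum(
        1
        for value in range(tuning_range)
        if parent_arrival_ps <= child_arrival_ps + value * step_ps <= parent_max_ps
    )


def cover_ratio(parent_arrival_ps, child_arrival_ps, step_ps=20, tuning_range=8):
    """Cover range divided by tuning range."""
    covered = cover_range(parent_arrival_ps, child_arrival_ps, step_ps, tuning_range)
    return covered / tuning_range


def circuit_cover_ratio(ratios):
    """Average of the cover ratios of all synchronizers in a circuit."""
    return sum(ratios) / len(ratios)
```

With FF1 at 200 ps and FF2 at 240 ps, `cover_range(200, 240)` counts the six testable values {0, ..., 5}; a child whose earliest arrival already exceeds 340 ps has a cover range of zero.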
VI. FAULT-TOLERANCE ARCHITECTURE
In this section, we first discuss our fault-tolerance architecture that inserts redundant PDs into the design. Then, we describe how to insert sufficient redundant PDs to tolerate any possible faulty synchronizers. In addition, we introduce a way to use the fault-tolerance architecture to improve the fault detection method.
A. Fault-Tolerance Mechanism
Before performing our fault-tolerance procedure, we assume the faulty synchronizer has been identified by the fault detection procedure. For simplicity, we use an example to illustrate our basic idea of fault tolerance. During the alignment process, several clocks are first aligned to the reference clock FF7. To describe the alignment relation between clock sinks, we can transform the design into a new directed graph called the synchronization graph. A node in the synchronization graph represents a clock sink. In addition, a directed edge from FF1 to FF2 represents an assertion that the clock sink FF1 is aligned with the clock sink FF2 by synchronizer S1.
If there is a faulty synchronizer, conceptually, the corresponding edge in the synchronization graph should be removed because a faulty synchronizer cannot align its two clock sinks. For example, assume the synchronizer S3, which aligns FF3 and FF7 in Fig. 9, is faulty. After removing edge S3, the synchronization graph is separated into two disjoint synchronization graphs. The synchronization graph SG1 consists of the reference clock sink, FF7, and all the clock sinks aligned to FF7. The synchronization graph SG2 consists of all clock sinks aligned to clock sink FF3. In this example, two clock sinks, FF1 and FF2, are aligned to FF3 but are not aligned to the reference clock FF7. To perform fault tolerance on the above example, our basic process is as follows. If we can find the skew difference between FF7 and FF3, we can scan in additional skew differences to all the clock sinks in one of the disjoint synchronization graphs to align all clocks. For the same example, suppose we can obtain the information that FF7 arrives k units earlier than FF3. We scan in k additional units of delay to all clock sinks in SG1 so that all clock sinks in SG1 are aligned with FF3. To calculate the skew difference, we now describe our fault-tolerance architecture and methodology. Our fault-tolerance architecture first inserts several redundant PDs. For the same example, we insert a redundant PD between FF1 and FF10. The redundant PD shares the ADB of the synchronizer that controls FF1 (or FF10). In the fault-tolerance process, we first fix all synchronizers except the new synchronizer. Then, we perform an alignment process, using the new synchronizer to align FF1 with FF10. Once the arrival times of FF1 and FF10 are aligned, we scan out the delay values and compare them with the previous ones. The difference between the new delay value and the previous value is the skew difference between FF1 and FF10.
Since all clock sinks within each of the synchronization graphs SG1 and SG2 are already aligned, the clock skew difference between FF1 and FF10 is the same as that between FF3 and FF7.
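The graph split and the compensation step described above can be sketched in Python as follows. The alignment map is an assumed stand-in for the synchronization graph (the individual leaf pairings are hypothetical), and each synchronizer is identified by its target clock sink, so that "FF3" denotes the edge of S3.

```python
# Hypothetical synchronization graph: clock sink -> the sink it is aligned with.
ALIGNED_WITH = {
    "FF3": "FF7", "FF6": "FF7", "FF8": "FF7", "FF12": "FF7",
    "FF2": "FF3", "FF5": "FF6", "FF9": "FF8", "FF13": "FF12",
    "FF1": "FF2", "FF4": "FF5", "FF10": "FF9", "FF11": "FF9",
    "FF14": "FF13", "FF15": "FF13",
}


def split_on_fault(aligned_with, reference, faulty_target):
    """Remove the faulty synchronizer's edge and return (SG1, SG2) as sets of sinks.

    SG1 holds the sinks still aligned with the reference; SG2 holds the rest.
    """
    def reaches_reference(sink):
        while sink != reference:
            if sink == faulty_target:
                return False  # the path to the reference uses the removed edge
            sink = aligned_with[sink]
        return True

    sinks = set(aligned_with) | {reference}
    sg1 = {sink for sink in sinks if reaches_reference(sink)}
    return sg1, sinks - sg1


def compensation(sg1, sg2, k_units):
    """Extra delay units per sink when FF7 arrives k units earlier than FF3."""
    extra = {sink: k_units for sink in sg1}
    extra.update({sink: 0 for sink in sg2})
    return extra


SG1, SG2 = split_on_fault(ALIGNED_WITH, "FF7", "FF3")
```

With S3 faulty, `SG2` contains FF3 together with FF1 and FF2, which matches the example in the text, and `compensation` adds the k units to every sink in `SG1`.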
B. Rules of Selecting Redundant PDs
In this section, we describe how to insert sufficient redundant PDs to tolerate any possible single synchronization fault. Since the faulty synchronizer separates the original synchronization graph into two disjoint synchronization graphs, we need to insert one redundant PD to connect these two disjoint synchronization graphs. We make the following three observations. First, it is not efficient to use a redundant PD to connect clock sinks that are not in the leaf level, since we can always find a better clock sink in the leaf level. In this paper, we define a clock sink FFi as being on the transitive path of clock sink FFj if FFi is reachable from FFj through the directed edges in the synchronization graph. For example, FF3 is on the transitive path of FF1. The redundant PD PD1, connecting FF1 and FF10, can be used to tolerate the faulty synchronizer S3. We say that S3 can be tolerated by PD1. Consider another redundant PD, PD2, connecting FF3 and FF10, where FF3 is not in the leaf level. We can show that if a faulty synchronizer can be tolerated by PD2, it can also be tolerated by PD1. As a result, PD1 is more efficient than PD2. Therefore, we should consider adding redundant PDs that connect clock sinks only in the leaf level, since they can recover from a faulty synchronizer on their transitive paths. Second, every clock sink in the leaf level must be sampled by at least one redundant PD. A clock sink in the leaf level without a redundant PD clearly cannot be corrected if the corresponding synchronizer is faulty. Third, a redundant PD should not connect two clock sinks that would both be affected by the same synchronizer. If the synchronizer of FF8 is faulty and we use a redundant PD to connect FF10 and FF11, then we cannot detect the clock skew since both FF10 and FF11 are aligned to FF8. The above three observations can be summarized as three rules. Fig. 10 shows the synchronization graph and a possible redundant PD connection of Fig. 8. The dotted lines in Fig. 10 indicate the redundant PD connections. In this example, we add four redundant PDs. Note that each level of the PST architecture consists of more than one synchronizer, so the third rule can be satisfied for all sinks in the leaf level. Redundant PDs only connect leaf clock sinks that do not share the same clock sink in the first level. In other words, redundant PDs that are created based on these two rules can guarantee that every clock sink is in at least one loop. In addition, rule 2 guarantees that all clock sinks in the leaf level will be selected.
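One way to read the observations above is that a redundant PD between two sinks tolerates exactly those synchronizers that lie on the transitive path of one of the two sinks but not of both. The Python sketch below encodes this reading; it is an interpretation rather than the paper's formal rule set, the alignment map is an assumed example, and a synchronizer is named after its target sink.

```python
# Hypothetical synchronization graph: clock sink -> the sink it is aligned with.
ALIGNED_WITH = {
    "FF3": "FF7", "FF6": "FF7", "FF8": "FF7", "FF12": "FF7",
    "FF2": "FF3", "FF5": "FF6", "FF9": "FF8", "FF13": "FF12",
    "FF1": "FF2", "FF4": "FF5", "FF10": "FF9", "FF11": "FF9",
    "FF14": "FF13", "FF15": "FF13",
}
REFERENCE = "FF7"


def transitive_path(sink):
    """Synchronizers (named by target sink) between a sink and the reference."""
    path = []
    while sink != REFERENCE:
        path.append(sink)
        sink = ALIGNED_WITH[sink]
    return path


def tolerated_faults(sink_a, sink_b):
    """Synchronizers whose failure separates sink_a from sink_b.

    A fault shared by both paths moves both sinks together, so the redundant
    PD sees no skew; only synchronizers on exactly one path are tolerated.
    """
    return set(transitive_path(sink_a)) ^ set(transitive_path(sink_b))


PD1 = tolerated_faults("FF1", "FF10")   # leaf to leaf
PD2 = tolerated_faults("FF3", "FF10")   # FF3 is not in the leaf level
BAD = tolerated_faults("FF10", "FF11")  # both sinks depend on FF8
```

In this example `PD2` is a subset of `PD1`, which reflects the first observation, and `BAD` does not contain FF8, which reflects the third.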
C. Construction of the Redundant PD Structure
The previous section gives the rules for inserting a sufficient number of PDs to form a fault-tolerance PST architecture. However, there are many possible ways to insert PDs. A design may need only three redundant PDs rather than four while still satisfying the three rules, so we seek the structure that minimizes the total number of inserted redundant PDs. Our idea is to transform this problem into an integer linear programming (ILP) problem.
Let G = (V, E) be an undirected graph where V = {Vi} is the set of clock sinks in the leaf level and E = {eij} is the set of all candidate edges connecting redundant PDs between two vertices, Vi and Vj. To reduce the area overhead, we intend to select the minimum number of candidate edges so that the three rules are satisfied. We transform these conditions as follows. First, according to rule 1, every vertex in V must be connected by at least one candidate edge.
Second, we use an $n \times C^n_2$ constraint matrix M to guarantee that rule 3 is satisfied, where n is the number of clock sinks in the leaf level.
D. Improved Fault-Tolerance Methodology
The previous section introduces three rules for constructing the redundant PD structure. In this section, we provide an alternative way to construct the redundant PD structure. With extra computing effort, we can increase optimality by relaxing the three rules.
Because of the relaxation, we can improve our redundant PD structure for better wire routing. The three rules guarantee that all possible single synchronization faults are tolerated. However, these three rules are too restrictive and result in a loss of optimality. For example, Fig. 13 shows the same design as Fig. 12 with a different redundant PD structure. It can be easily shown that the redundant PD structure in Fig. 13 can also tolerate all possible faults. As shown in the figure, the redundant PD R1 can tolerate faults on S2, S3, and S5. The second redundant PD, R2, can tolerate faults on S1, S2, S4, S6, and S7, while the last redundant PD, R3, can tolerate faults on S7 and S8. In addition, the total wire length of the redundant PD structure is much shorter. Rule 3 is therefore not a necessary condition for tolerating all possible faults, and we can relax it to increase the flexibility of constructing redundant PD structures. According to the proof of Theorem 1, to tolerate any single faulty synchronizer, we need to guarantee that every clock sink is in one loop created by redundant PDs. Based on the proof, we relax rule 3 as follows. Rule 3': Each edge in the synchronization graph must be contained in at least one loop created by redundant PD edges. Then, we modify our ILP formulation according to the new rule to tolerate all possible single faults. Let G = (V, E) be an undirected graph where V = {Vi} represents all leaf-level clock sinks and E = {eij} is the set of edges connecting clock sinks Vi and Vj. We also use a $k \times C^n_2$ constraint matrix M to guarantee that the original requirement is satisfied, where k is the number of edges in the synchronization graph and n is the number of leaf-level clock sinks in V. We have $M_{ij} = 1$ if the jth candidate edge is connected to the ith vertex and the edge satisfies rule 3'; otherwise $M_{ij} = 0$.
The modified ILP formulation is as follows: $\min CS$ subject to $MS \ge Z$, $s_{ij} \in \{0, 1\}$, where C represents the cost factor of selecting each edge and M is the modified constraint matrix. Compared with the original ILP method in Section VI-C, the modified ILP method has more flexibility but requires much more run time.
The extra run time arises because we now need to consider adding a redundant PD between every pair of clock sinks. In addition, we need to make sure that all clock sinks are included in at least one loop created by redundant PDs. These operations increase the complexity of the constraint matrix and the computation time of the result.
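For small designs, the selection problem behind the ILP can be illustrated with an exhaustive search in Python. This is a sketch that swaps the ILP solver for brute-force enumeration. It assumes unit cost for every candidate edge (C all ones), requires each synchronization-graph edge to be covered at least once (Z taken as all ones), and treats an edge as lying in the loop of a redundant PD when it is on the transitive path of exactly one of the two connected leaves. The alignment map is a hypothetical example.

```python
from itertools import combinations

# Hypothetical synchronization graph: clock sink -> the sink it is aligned with.
ALIGNED_WITH = {
    "FF3": "FF7", "FF6": "FF7", "FF8": "FF7", "FF12": "FF7",
    "FF2": "FF3", "FF5": "FF6", "FF9": "FF8", "FF13": "FF12",
    "FF1": "FF2", "FF4": "FF5", "FF10": "FF9", "FF11": "FF9",
    "FF14": "FF13", "FF15": "FF13",
}
REFERENCE = "FF7"
LEAVES = ["FF1", "FF4", "FF10", "FF11", "FF14", "FF15"]


def path_edges(sink):
    """Synchronization-graph edges (named by target sink) up to the reference."""
    edges = set()
    while sink != REFERENCE:
        edges.add(sink)
        sink = ALIGNED_WITH[sink]
    return edges


def loop_edges(candidate):
    """Edges contained in the loop closed by a redundant PD between two leaves."""
    leaf_a, leaf_b = candidate
    return path_edges(leaf_a) ^ path_edges(leaf_b)


def min_redundant_pds(leaves):
    """Smallest set of leaf pairs whose loops contain every edge (exhaustive search)."""
    candidates = list(combinations(sorted(leaves), 2))
    required = set(ALIGNED_WITH)  # one edge per non-reference sink
    cover = {candidate: loop_edges(candidate) for candidate in candidates}
    for size in range(1, len(candidates) + 1):
        for chosen in combinations(candidates, size):
            if set().union(*(cover[c] for c in chosen)) >= required:
                return list(chosen)
    return None


SELECTED = min_redundant_pds(LEAVES)
```

On this assumed tree, three redundant PDs suffice. The search is exponential in the number of candidate edges, so an ILP solver would be the practical choice for larger designs.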
E. Improved Fault Detection Method With the Fault-Tolerance Architecture
The fault-tolerance architecture can also be applied to improve the cover ratio described in Section V-C. The basic idea is to use the redundant PD to perform fault detection so the cover ratio can be improved.
For ease of discussion, we illustrate our basic idea using the example in Fig. 13. In the example, the synchronizer S3 can only be verified by its parent synchronizer, S2. However, with the redundant PD R1, we can use the skew information from the redundant PD to verify S3 with S5. Because of the additional flexibility of testing S3 using a different synchronizer, the cover ratio of S3 increases when the redundant architecture is adopted. Assume the tuning range and the resolution step of all synchronizers are 8 and 20 ps, respectively, and the arrival times of FF3, FF4, and FF6 are 120, 270, and 260 ps, respectively. The cover ratio without the redundant PD structure is 12.5%, whereas that with the redundant PD structure is 100%.
VII. EXPERIMENTAL RESULTS
We constructed our fault-tolerance PST architecture on several industrial designs. This section shows the experimental results. We implemented our algorithm and applied it to three large industrial circuits, including a commercial crypto-processor [11]. We placed and routed the synthesized circuits and obtained the physical information from a commercial tool, SoC Encounter. With that physical information, we applied the PST architecture as in [19]. Then, we inserted the design-for-test (DFT) structures for testing the PST architecture. The DFT overhead included one reference ADB, the scan-chain architecture, and control logic.
Output 2: A larger number of faults is detected.
Finally, we added the redundant PD connections to form a fault-tolerance PST architecture. Column two shows the gate counts of the design.
VIII. REFERENCES
[1] Mac Y. C. Kao, Kun-Ting Tsai, and Shih-Chieh Chang, "Fault detection and tolerance architecture for postsilicon skew tuning," in Proc. IEEE/ACM, July 2015.
