Abstract: This paper presents a novel approach to test generation and test scheduling for multi-clock domain SoCs. A concurrent hybrid BIST architecture is proposed for testing the cores. Furthermore, a heuristic is proposed for selecting the cores to be tested concurrently and for ordering the application of test patterns. Experimental results show that the proposed heuristics provide an optimized method for multi-clock domain SoC testing compared with previous works.
I. INTRODUCTION
As the complexity of system-on-chip (SoC) designs, the amount of user-defined logic, and the number of cores in SoCs increase, achieving a reasonable test time becomes very important. Therefore, different approaches have been used to reduce the total SoC test application time. One approach is based on allocating optimal test access mechanisms (TAMs) and test scheduling algorithms for testing cores [1] [2] [3] [4] [14] [23]. One of the most important factors in improving test application time (TAT) in SoC testing is a suitable test scheduling algorithm based on an appropriate test architecture [4] [5] [6] [7] [24]. In [7, 11, 26] the authors proposed an algorithm based on algebra and a genetic algorithm to find an optimal test schedule. On the other hand, because of the increased switching activity during the test process, the power consumption of a chip under test is higher than in the normal mode of operation, which can affect the reliability of testing [8]. Thus, power-constrained and thermal-aware SoC testing have become important during concurrent testing [9, 10, 12, 13, 25].
Many test scheduling schemes and algorithms have been proposed to minimize SoC test time using concurrent approaches that take power limitations into account. In [14, 15, 27] the authors presented a heuristic method that minimizes test application time by allocating an optimal number of TAM wires to each core and choosing the best test schedule. In [16, 20] the authors presented a power-aware bus architecture and an appropriate algorithm that use a functional bus instead of a TAM to test cores concurrently. The main idea is to use memory buffers to apply deterministic test patterns (DTPs) to cores concurrently when the speed of the functional bus is higher than the speed at which test patterns are injected into the cores. In [21] an architecture and a test scheduling method for testing the cores of a multi-clock domain SoC are presented. Moreover, when concurrency is the goal, built-in self-test (BIST) is a beneficial method for achieving it. In [17] the authors proposed an algorithm that combines BIST with external testing (called hybrid BIST or CBET) to obtain an optimal result for the test time minimization problem. In [10] the authors proposed an algorithm based on solving a rectangle packing problem for power-constrained concurrent hybrid BIST in SoC testing. In this paper we propose a hybrid BIST structure and a concurrent test scheduling method for multi-clock domain SoCs. Furthermore, a novel algorithm for finding the optimal numbers of pseudo-random test patterns (PRTPs) and deterministic test patterns for each core is presented.
Section II discusses the test generation process in hybrid BIST. A hybrid BIST architecture for a multi-clock domain SoC is proposed in Section III. In Section IV, a test scheduling graph for modeling power-aware core testing is presented. Section V presents a test scheduling algorithm based on this graph, together with heuristics for selecting the sets of cores to be tested concurrently and for determining the sets and the order of the deterministic and pseudo-random test patterns applied to each core. The results obtained by the proposed methods are presented in Section VI. Finally, Section VII concludes our work.
II. CALCULATING NUMBER OF PSEUDO RANDOM TESTS
This section discusses an approach for calculating the number of pseudo-random tests to be applied by the hybrid BIST. As mentioned, this BIST uses DTPs (deterministic test patterns) and PRTPs (pseudo-random test patterns), each of which has advantages and disadvantages. DTPs are generated for random-pattern-resistant faults and increase fault coverage more than PRTPs. However, in many cases, automatic test equipment (ATE) is needed to apply these patterns with a global clock and through scan chains, so using DTPs increases the TAT. PRTPs, on the other hand, can be produced by BIST architectures with local clocks [17]. Some BIST architectures do not need scan chains, which reduces the number of cycles needed to apply each test pattern. In general, PRTPs are applied (through a BIST architecture) faster than DTPs (through an external ATE). We refer to $S_{Bi} = F_{Bi}/AC_{Bi}$ and $S_{Ei} = F_{Ei}/AC_{Ei}$ as the BIST speed and the external speed of core i, respectively, where F is the clock frequency and AC is the number of cycles needed to apply each test pattern.
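As a small numeric illustration of the two speed definitions, the sketch below computes S = F/AC and the resulting time to apply a given number of patterns. The helper names are ours, and the clock frequencies and cycle counts in the usage lines are hypothetical values chosen for illustration, not data from this paper.

```python
def pattern_speed(freq_hz: float, cycles_per_pattern: int) -> float:
    """Patterns applied per second, S = F / AC."""
    if cycles_per_pattern <= 0:
        raise ValueError("AC must be a positive number of cycles")
    return freq_hz / cycles_per_pattern


def application_time(n_patterns: int, speed: float) -> float:
    """Seconds needed to apply n_patterns at the given speed."""
    return n_patterns / speed


# Hypothetical core: BIST applies one pattern per local clock cycle,
# while the external ATE shifts each pattern through a 50-bit scan path.
s_b = pattern_speed(200e6, 1)    # BIST speed S_B
s_e = pattern_speed(100e6, 50)   # external speed S_E
print(application_time(1000, s_b), application_time(1000, s_e))
```

Even with a lower tester clock, the scan shifting cost (AC) dominates the external speed in this example, which is why PRTPs are usually the cheaper patterns per unit of time.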
In a hybrid BIST method, both the speed of PRTPs and the quality of DTPs are used to achieve the minimum TAT. In the common hybrid BIST test generation flow, PRTPs are first generated to detect the easy-to-detect faults, and DTPs are then generated for the remaining (random-pattern-resistant) faults. Thus, at the end of the test generation process, DTPs are generated for specific faults. These vectors are of very high quality, and they must be applied to the CUT to achieve full coverage. If these vectors are instead applied at the start of test generation, a large number of faults are detected, which leaves relatively few faults for the PRTPs to detect and thus reduces the number of PRTPs significantly. In our method, PRTP generation is therefore sandwiched between two DTP phases, as discussed below. In general, the test generation process has three stages.
The first stage of test generation is phase 1 of deterministic test generation (DTPs-phase1). These test vectors are generated for very-hard-to-detect faults and should be applied early in the test process. A large number of faults (both easy-to-detect and hard-to-detect) are also detected by these high-quality tests.
Elaheh Sadredini, Mohammad Hashem Haghbayan, Mahmood Fathy, and Zainalabedin Navabi es9bt@virginia.edu, hashem@cad.ut.ac.ir, mahfathy@iust.ac.ir, navabi@cad.ut.ac.ir
The second stage of test generation is the pseudo-random one. PRTPs are generated by linear feedback shift registers (LFSRs) and detect the remaining easy-to-detect faults. The problem is to find the number of PRTPs that minimizes the TAT for the remaining faults. A simple binary-search-like algorithm finds this optimal number of PRTPs (Figure 1); it is discussed later in this section.
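The following is a minimal software model of how an LFSR produces PRTPs: a Fibonacci-style register shifts left and feeds back the XOR of its tap bits. The function name, the 4-bit width, and the taps at bits 3 and 2 (polynomial x^4 + x^3 + 1) are illustrative assumptions; a real core would use a register as wide as its input vector and a primitive polynomial of matching degree.

```python
from typing import Iterator, Sequence


def lfsr_patterns(seed: int, taps: Sequence[int], width: int, count: int) -> Iterator[int]:
    """Yield `count` pseudo-random patterns from a left-shifting Fibonacci LFSR."""
    mask = (1 << width) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("an all-zero seed locks the LFSR")
    for _ in range(count):
        yield state
        feedback = 0
        for t in taps:                      # XOR of the tapped bits
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask


# With taps (3, 2) the 4-bit register visits all 15 nonzero states before repeating.
print(list(lfsr_patterns(seed=1, taps=(3, 2), width=4, count=15)))
```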
The third stage of test generation is phase 2 of deterministic test generation (DTPs-phase2). These tests are generated for the faults that remain undetected, so that full fault coverage is achieved at the end of this stage.
Applying the above three stages of test generation decreases the total TAT compared with hybrid BIST tests in which deterministic tests follow pseudo-random tests. Figure 2 shows the test application time of the proposed test generation method (checkered and dark-hatched bars) and of the work presented in [22] (gray and white bars). PRTP_PM is the optimal number of PRTPs in our proposed method, and PRTP_PW is the optimal number of PRTPs in the previous work [22].
Calculation of TAT in hybrid BIST:
The overall TAT is the sum of the PRTP application time and the DTP (phase 1 and phase 2) application time. N_PRTP denotes the number of PRTPs and N_DTP the number of DTPs.
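Using the speeds $S_{Bi}$ and $S_{Ei}$ defined above, the sum described in this paragraph can be written for core i as follows. This is a reconstruction under the assumption that the PRTPs and both DTP phases of a single core are applied one after another; $N_{DTP}^{(1)}$ and $N_{DTP}^{(2)}$ are our labels for the phase 1 and phase 2 pattern counts.

```latex
TAT_i = \frac{N_{PRTP}}{S_{Bi}} + \frac{N_{DTP}}{S_{Ei}},
\qquad N_{DTP} = N_{DTP}^{(1)} + N_{DTP}^{(2)}
```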
Finding the optimal N_PRTP in hybrid BIST: The process of finding the optimal number of PRTPs is illustrated in Figure 1. The function in this figure searches for the optimal N_PRTP between a minimum of zero and a maximum N_PRTP, which are its inputs (Line 1). First, the TAT is calculated for the minimum and the maximum N_PRTP. Then, the TAT is calculated for three values of N_PRTP between the minimum and the maximum (Lines 6-10) to determine whether the optimal N_PRTP lies to the right or to the left of the midpoint of the interval. The search then continues on that half by making the midpoint the new minimum or maximum N_PRTP, until the optimal N_PRTP is found. The algorithm is very similar to binary search for finding the minimum TAT, so the number of iterations grows as O(log n), where n is the maximum N_PRTP used at the start of the algorithm.
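The search described above can be sketched as follows. This is our own minimal version, not the code of Figure 1: it assumes that TAT as a function of N_PRTP is unimodal (it falls while extra PRTPs remove DTPs faster than they cost time, then rises), and it probes the quarter, middle, and three-quarter points of the interval, which is one way to realize the three N_PRTP values of Lines 6-10. The `tat` callable is a hypothetical stand-in for whatever TAT model is used; in practice each call involves fault simulation and ATPG.

```python
from typing import Callable


def find_optimal_n_prtp(tat: Callable[[int], float], n_max: int) -> int:
    """Return the N_PRTP in [0, n_max] that minimizes a unimodal TAT function."""
    lo, hi = 0, n_max
    while hi - lo > 3:
        q1 = lo + (hi - lo) // 4
        mid = lo + (hi - lo) // 2
        q3 = lo + 3 * (hi - lo) // 4
        t1, tm, t3 = tat(q1), tat(mid), tat(q3)
        if t1 < tm:          # TAT already rising at mid: optimum lies left of mid
            hi = mid
        elif t3 < tm:        # TAT still falling after mid: optimum lies right of mid
            lo = mid
        else:                # mid is no worse than either probe: optimum near mid
            lo, hi = q1, q3
    # The interval is now at most 4 wide; check the remaining candidates directly.
    return min(range(lo, hi + 1), key=tat)


# Toy TAT model with its minimum at N_PRTP = 37 (illustrative only).
print(find_optimal_n_prtp(lambda n: (n - 37) ** 2 + 5, n_max=1000))
```

Each iteration roughly halves the interval, so the loop runs O(log n) times with three TAT evaluations each. Because TAT evaluations are expensive, caching previously evaluated points is a natural refinement.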
III. CONCURRENT HYBRID BIST IN SOCS
The architecture of the concurrent hybrid BIST used in this paper is shown in Figure 3. Each core in the SoC of Figure 3 has a BIST architecture and access to a TAM or a functional bus [16, 20] that can be used to apply external test patterns. We use this bus for the deterministic tests, while pseudo-random tests are generated inside the core. If RS denotes the k-th set of PRTPs for the built-in self-test part of core i, and RS denotes the k-th set of DTPs for the external test part of core i, then:
In the above equations, RS is the set of test patterns applied to core i, and a corresponding set holds the faults detected by RS.
RS is the test time that remains for applying the remaining vectors in the test set of core i. In other words, the remaining test time covers the PRTPs and DTPs that have not yet been applied.
IV. PEAK POWER LIMITATION AND TEST SCHEDULING
Much work has been done on power-constrained SoC test scheduling and on SoC hybrid BIST. The main contribution of this paper with respect to previous work is a new test generation algorithm for a hybrid BIST architecture, together with a test scheduling algorithm based on the generated test patterns. This section presents an algorithm for selecting the cores that can be tested concurrently, and a test scheduling graph.
As mentioned before, in a hybrid BIST architecture the total peak power limit must be respected while cores are tested concurrently. Power consumption varies over time, but to simplify the test scheduling process, the power consumption of each core is assumed to equal its peak power consumption throughout the test process [19]. For the rest of the paper, the peak power consumption of core i is denoted by P_mi, and P_max is the maximum power limit of the SoC.
To present the test scheduling algorithm, we use the example of five cores shown in Table 1. This table lists the characteristics of the five cores that must be available at the start of the algorithm that calculates the sets of cores that can be tested simultaneously. The parameters of this algorithm are the peak power consumption of core i (P_mi), the time for applying its DTPs (T(v_d)), and the total time for applying its PRTPs (T(v_p)).
For example, Core 1 in Table 1 has a test peak power of 100 µW, needs 300 µs to apply all of its deterministic test patterns using the full TAM width, and needs 200 µs to apply its PRTPs through the BIST architecture. We will use this example for some definitions. To apply the proposed algorithm, we propose a test scheduling graph model in this section. The following definitions are used in the presentation of the scheduling graph. The algorithm of Figure 4 finds M_SoC for an SoC. It is a recursive algorithm whose inputs are the power characteristics of each core of the SoC, the peak power upper bound of the SoC, and an N_k set. First, the algorithm initializes M_SoC with an empty set. Then, core i with peak power P_mi is added to N_k if the total power of N_k does not exceed P_max. The algorithm then calls itself with the new peak power upper bound (P_max - P_mi) and N_k. The cores in each N_k set are tested concurrently. When the BIST or the External test part of a core, such as N_i, in a group of cores is complete, that core is released from its BIST or External test part. Each node is labeled by its time duration, T_node. T_I corresponds to the time of the incomplete node I.
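A minimal recursive sketch of the M_SoC computation described above follows. It is our reading of Figure 4, not the paper's code: it assumes, as the later example with {2, 3} and {2, 3, 5} suggests, that M_SoC keeps only maximal N_k sets, i.e. sets to which no further core can be added without exceeding P_max. The function name is hypothetical, and only Core 1's 100 µW comes from Table 1; the other power values in the usage line are invented.

```python
from typing import Dict, FrozenSet, List


def find_m_soc(power: Dict[int, float], p_max: float) -> List[FrozenSet[int]]:
    """Enumerate maximal sets of cores whose total peak power stays within p_max."""
    cores = sorted(power)
    m_soc: List[FrozenSet[int]] = []

    def extend(n_k: FrozenSet[int], start: int, budget: float) -> None:
        # Try adding each later core that still fits in the remaining budget;
        # the recursive call receives the reduced bound (budget - P_mi).
        for idx in range(start, len(cores)):
            core = cores[idx]
            if power[core] <= budget:
                extend(n_k | {core}, idx + 1, budget - power[core])
        # Keep N_k only if no core outside it fits any more (maximal set).
        if not any(power[c] <= budget for c in cores if c not in n_k):
            m_soc.append(n_k)

    extend(frozenset(), 0, p_max)
    return m_soc


# Hypothetical peak powers in µW (Core 1 as in Table 1, the rest invented).
print(find_m_soc({1: 100, 2: 60, 3: 50, 4: 80, 5: 30}, p_max=200))
```

The recursion visits every power-feasible subset exactly once, so its cost grows exponentially with the number of cores; that is tolerable for SoCs with a modest number of cores.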
The test scheduling graph is built by considering the power consumption of each core. Each node handles the timing of the BIST or External test parts of the cores selected based on N_k. Whether the BIST or the External test part of a core is used in a node depends on the core's T(v_p) and T(v_d), as discussed in the next section. This example only shows the transitions from one node to another.
According to the test scheduling graph shown in Figure 5, first, the BIST part of Core 1 According to the proposed architecture in Section III, at most one External test part can be active at a time, since the functional bus is used to apply deterministic test patterns (cores tested externally are underlined in Figure 5). Because pausing an External test part carries a high time penalty (due to state saving), we choose not to pause a core's external test while it is being tested through the TAM or functional bus. In the following sections, based on the proposed architecture and assumptions, we discuss our algorithm for finding an optimal test scheduling graph. We use this algorithm to reduce the TAT as much as possible. Figure 6 shows the proposed test scheduling algorithm based on the test scheduling graph. The algorithm takes the core characteristics and the peak power consumption of an SoC as inputs. First, test generation according to the algorithm of Figure 1 produces the optimal DTPs and PRTPs for each core (Line 1). After that, M_SoC is generated according to the algorithm of Figure 4 (Line 2).
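The two rules a node of the test scheduling graph must obey, the peak power bound and at most one External test part on the shared bus at a time, can be checked with a small helper like the one below. Representing a node as a mapping from each core to the part being applied ("BIST" or "EXT"), and assuming a core draws P_mi whichever part is active, are our own modeling choices; the function name is hypothetical.

```python
from typing import Dict, Literal

Part = Literal["BIST", "EXT"]


def is_valid_node(node: Dict[int, Part], power: Dict[int, float], p_max: float) -> bool:
    """Check the peak-power bound and the single-External-test-part rule for one node."""
    total_power = sum(power[core] for core in node)
    n_external = sum(1 for part in node.values() if part == "EXT")
    return total_power <= p_max and n_external <= 1


power = {1: 100, 2: 60, 3: 50}   # µW; Core 1 as in Table 1, others hypothetical
print(is_valid_node({1: "EXT", 2: "BIST"}, power, p_max=200))
print(is_valid_node({1: "EXT", 2: "EXT"}, power, p_max=200))
```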
V. ALGORITHM FOR TEST SCHEDULING
The algorithm sorts all N_k sets in M_SoC by assigning a weight to each of them. This weight helps decide which N_k set should be selected first (Line 3 in Figure 6). Based on the weighted N_k sets, i.e., Weighted_M_SoC in Figure 6, the algorithm selects N_k sets, starting from the highest weight, to build the test scheduling graph (TSG). This continues until the BIST parts and deterministic parts of all cores are covered (Line 4). After building the TSG, the algorithm adds some cores to incomplete nodes so that they are tested by their BIST parts, and generates new DTPs based on the PRTPs added to their BIST parts (Lines 5-10). For example, consider again I_1 in Figure 5. As mentioned, {2, 3} does not belong to M_SoC, but {2, 3, 5} does. We can therefore add a BIST part for Core 5, by generating more PRTPs for Core 5, and include this BIST part in I_1. Because extra PRTPs are added for Core 5, new DTPs must be generated; adding PRTPs to test a core decreases the number of deterministic test patterns it needs. After that, the algorithm updates the weights of the N_k sets based on the newly generated DTPs and PRTPs. This process continues until all incomplete nodes are exhausted. The following heuristics help assign a weight to each N_k set (W_Nk).
Heuristic 1. When selecting N_k sets for a test scheduling graph, sets of cores with longer External test parts are given a higher priority. So the weight of an N_k set depends on the time of the longest External test part among its cores, $\max_{j \in N_k} T(v_{d,j})$.
Reasoning. Pausing an External test part incurs a larger time overhead on the global ATE than pausing a BIST part.
Heuristic 2. As mentioned, when selecting an N_k set, the set of cores with the longest External test part should preferably be selected first (Heuristic 1). In addition, the average length of the BIST parts of the other cores in the set should be the largest among all available sets. So, if e_k denotes the core with the longest External test part in N_k, the weight of the N_k set depends on the average of $T(v_{p,j})$ over all $j \in N_k$ with $j \neq e_k$.
Reasoning. A combination of long BIST parts with a long External test part in an N_k set can result in more concurrency. Therefore, the average of the BIST parts in each N_k set should be high.
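Heuristics 1 and 2 can be turned into a sortable weight as sketched below. Since the exact way the two heuristics are combined into W_Nk is not spelled out in this text, we assume a lexicographic order: the longest External test part in N_k decides first (Heuristic 1), and the average BIST time of the other cores breaks ties (Heuristic 2). Within the chosen set, the core e_k with the longest External test part is taken as the one tested externally. Core 1's times (300 µs external, 200 µs BIST) come from Table 1; the function names and all other numbers are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple


def weight(n_k: FrozenSet[int], t_ext: Dict[int, float],
           t_bist: Dict[int, float]) -> Tuple[float, float]:
    """Weight of an N_k set: (longest External part, mean BIST part of the other cores)."""
    e_k = max(sorted(n_k), key=lambda c: t_ext[c])            # Heuristic 1
    others = [c for c in n_k if c != e_k]
    avg_bist = sum(t_bist[c] for c in others) / len(others) if others else 0.0  # Heuristic 2
    return (t_ext[e_k], avg_bist)


def select_node(m_soc: Iterable[FrozenSet[int]], t_ext: Dict[int, float],
                t_bist: Dict[int, float]) -> Tuple[FrozenSet[int], int]:
    """Pick the highest-weight N_k set and the core to be tested externally in it."""
    best = max(m_soc, key=lambda n_k: weight(n_k, t_ext, t_bist))
    external_core = max(sorted(best), key=lambda c: t_ext[c])
    return best, external_core


t_ext = {1: 300, 2: 150, 3: 250, 5: 100}    # T(v_d) in µs
t_bist = {1: 200, 2: 400, 3: 100, 5: 300}   # T(v_p) in µs
m_soc = [frozenset({1, 3}), frozenset({2, 3, 5}), frozenset({1, 2})]
print(select_node(m_soc, t_ext, t_bist))
```

After each selection, the remaining test times would be updated and the weights recomputed; that bookkeeping is left out of this sketch.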
Based on the above heuristics we have: The set with the highest weight will be selected as the node of the test scheduling graph. Within that node, the core with the highest value of T(v d ) will be selected to be tested externally. After selecting an N k , the weight of all remaining sets should be reevaluated according to the test parts covered in the selected N k . If all External test parts of all cores are completed, or if it is not allowed to pause the External test part of the core that is being tested externally in the recently selected N k set, calculation of 
VI. EXPERIMENTAL RESULTS
For the experimental results, we used the cores of MCDS 1 from [21], which is a version of d695 from the ITC02 benchmarks. A difficulty in comparing our results with available methods is the availability of gate-level details of the cores. Hence, d695 is the best choice, because its cores are taken from the ISCAS benchmarks. The characteristics of the SoC (power, frequency of each domain, etc.) are exactly the same as those of MCDS 1 [21]. The ATALANTA test generator is used to determine the DTPs for the External test part of our hybrid BIST.
The optimal N_PRTP and N_DTP obtained by the proposed algorithm of Figure 1 for several d695 cores are shown in Table 2. This information is used to initialize the SoC test scheduling process. We used ATALANTA to generate DTPs, an LFSR to generate PRTPs, and a parallel fault simulator. The clock frequency of the External test part is 100 MHz. Each PRTP is applied in one clock cycle. Because DTPs are applied through scan chains, the sum of the numbers of primary inputs and pseudo-primary inputs determines the number of clock cycles needed to apply each DTP. Finally, the number of test cycles of our proposed method is computed from Equation 6.1.
In this equation, PMDV is the total number of selected deterministic test vectors and opt_PRTP is the number of pseudo-random tests. PIs is the number of primary inputs and PPIs is the number of pseudo-primary inputs of the circuit. The numbers of sets in M_SoC obtained for some SoCs by the algorithm of Figure 4 are shown in Table 3. We used the d695 benchmark, MCDS 1 from [21], and hCAD01 from [20] to show the number of sets and the CPU time of the proposed algorithm. The TAT results of the proposed method are shown in Table 4 and Table 5, using a fixed ATE clock and different peak power and TAM width limits. For the comparison, there was a mismatch between the number of test patterns generated by ATALANTA for full coverage and the number of test patterns reported in the ITC02 benchmarks for each core. For example, for core "c6288" (the first core of "d695"), ATALANTA generates 33 DTPs for full coverage, while 12 TPs are reported in the ITC02 benchmark. So, we implemented the method proposed in [21] with the numbers of test patterns generated by ATALANTA for full coverage.
The results are categorized by peak power limit, TAM width, and a fixed or flexible number of scan chains for each core. The fixed numbers of scan chains were taken from those reported in the ITC02 benchmark. With a flexible number of scan chains, no limitation is imposed on the number of scan chains of each core. As shown, the proposed method obtains a considerable TAT reduction compared with [21] and [22].
VII. CONCLUSIONS AND FUTURE WORK
In this paper, a concurrent hybrid BIST method for reducing SoC test time is proposed. The most important constraint is the power limit of the chip. An algorithm to find the most suitable cores to be tested concurrently is proposed. During the test process, Deterministic Test Patterns and Pseudo Random Test Patterns can be applied together to reduce the test time. Experimental results show that this method provides a considerable reduction in test application time compared with previous methods for power constrained
