ABSTRACT Shared-memory systems enable parallel computing for automatic test pattern generation (ATPG). Although existing parallel ATPG techniques reach near-linear speedup, test inflation has become a common problem that limits their practicality. Therefore, this paper proposes a multi-threaded test pattern generation called MT-TPG that simultaneously suppresses test inflation and accelerates fault processing to retain high parallelism. To suppress test inflation, hard-fault shuffling (HFS) and concurrent-fault interruption (CFI) avoid repeated detection of the same fault by different threads. To accelerate fault processing, potentially-droppable-fault removal (PDFR) and single-pattern parallel-fault simulation (SPPFsim) collectively drop not-yet-detected faults as early as possible, shortening the overall execution time of ATPG. According to our experimental results, HFS and CFI suppress test inflation to < 4% on 17 benchmark circuits, and PDFR and SPPFsim achieve an average 13.7X speedup using 16 threads. As a result, MT-TPG is proven effective at unleashing parallelism with minimal test inflation on shared-memory systems.
I. INTRODUCTION
Continued growth in the size and complexity of very-large-scale integration (VLSI) systems is fueling demand for faster automatic test pattern generation (ATPG) for production test. The conventional ATPG algorithm running on a single processor has now become a bottleneck in generating the tests required for modern designs. The rapid development of multi-core processors has opened the door to parallel computing as a solution for scaled designs. Depending on their communication protocol, parallel computing architectures are classified into shared-memory systems and message-passing systems. Both types provide additional computing power, which has been used in [1]-[5] to accelerate ATPG.
Common strategies used in parallel ATPG include fault parallelism, heuristic parallelism, search-space parallelism, algorithmic parallelism, and circuit parallelism. Fault parallelism [6] divides all of the faults among the available processors, and each processor generates tests for its assigned faults. In heuristic parallelism [7], each processor employs a different ATPG algorithm to generate a test for the same fault. Search-space parallelism [7] lets the processors work collectively on finding a test for a single fault by dividing the search space into discrete pieces that the processors explore simultaneously. In algorithmic parallelism [8], the ATPG algorithm is divided into fine-grained tasks so that processors can work on these tasks in parallel. In circuit parallelism [9], the circuit is divided into disjoint sub-circuits, and each processor performs ATPG operations on its respective part.
Although a variety of parallelization approaches have been proposed, the core of ATPG does not vary and involves three operations: (1) test pattern generation (TPG), (2) fault compaction (FC), and (3) fault simulation (FS). In [10], Krishnaswamy et al. parallelized fault simulation but disregarded test generation. Yeh et al. [2] parallelized both test generation and fault simulation and achieved sub-linear speedup; however, their solution is prone to severe test inflation (i.e., an increase in pattern count). Wolf et al. [1] parallelized ATPG completely; however, they did not implement test compaction to reduce pattern counts. Aguado et al. [11] implemented serial compaction to avoid excessive tests; however, that approach is inapplicable to parallel structures. To tame the test inflation inherent to parallel ATPG, fault broadcasting is commonly used [14], [15]: information about generated test patterns or newly detected faults is broadcast to all processors to reduce the probability of generating test patterns for identical faults. In [3], a novel parallel ATPG with parallelized test compaction was proposed, in which depth-first-search (DFS) compaction and dynamic fault partitioning achieved near-linear speedup. Ku et al. [12] utilized fanin-cone fault ordering and proposed ripple search for fault compaction to alleviate test inflation while improving acceleration.
Despite these successes in parallel ATPG, test inflation remains a serious problem that has yet to be solved. As shown in Fig. 1(a), one parallel ATPG [12] executing sixteen threads still suffered an average of 35.5% inflation in pattern count on four circuits: mem_ctrl, pci_bridge32, bench4, and ethernet. Similarly, the work [3] from Synopsys reports > 15% test inflation in certain cases. According to [16], a 5.9% increase in pattern count can lead to a 100% increase in test cost per unit (in the worst case), not to mention the additional time and related presilicon effort. Moreover, test inflation also limits acceleration. Fig. 1(b) shows that the best speedup reaches only 8.5X with 16 threads on ethernet. As another example, the speedup on mem_ctrl using 16 threads is lower than that using 12 threads. These facts suggest that generating unnecessary test patterns limits the acceleration of a parallel ATPG system.
To overcome the problems of test inflation and limited speedup in parallel ATPG, we propose generating two distinct fault lists (a primary-fault list for test pattern generation and a secondary-fault list for dynamic compaction) and develop a multi-threaded TPG (MT-TPG) that retains high parallelism with minimal test inflation. MT-TPG pursues two key objectives: (1) suppressing test inflation and (2) accelerating fault processing. To suppress test inflation, hard-fault shuffling (HFS) and concurrent-fault interruption (CFI) are proposed to reduce the pattern count of MT-TPG. HFS keeps different threads from taking the same fault as the primary fault during test generation by giving each thread a different fault order. CFI is a smart inspection that avoids unnecessary detection work on a fault once its status changes to ''detected''. To accelerate fault processing, potentially-droppable-fault removal (PDFR) and single-pattern parallel-fault simulation (SPPFsim) are used to shorten the total runtime. PDFR uses the result of random-pattern simulation to remove droppable faults from the fault list used during compaction, so the number of secondary faults drops quickly at the beginning of ATPG. In addition, traditional ATPG spends time waiting for a fixed number of patterns (commonly 32) to be ready before performing parallel-pattern single-fault simulation (PPSFsim). In MT-TPG, however, each thread processes different faults concurrently, so fault simulation can be performed as soon as a pattern is generated, without waiting for other threads. SPPFsim is therefore chosen in MT-TPG to eliminate this idle time.
The aforementioned approaches are implemented in an ATPG engine, PODEM-X [17]. MT-TPG achieves average speedups of 3.9X, 7.1X, 9.8X, and 13.7X using 4, 8, 12, and 16 threads, respectively, on 17 benchmark circuits. Moreover, compared with another parallel TPG [12], which achieves sub-linear speedup using 16 threads but suffers 35.5% test inflation in certain cases, MT-TPG results in only 1.1%, 1.9%, 2.6%, and 3.9% test inflation on 4, 8, 12, and 16 threads, respectively, on 17 benchmark circuits. For particular circuits, MT-TPG even reduces pattern counts with respect to single-threaded TPG. Taking ethernet as an example, pattern counts decrease by 1.0%, 1.3%, 1.2%, and 1.4% when using 4, 8, 12, and 16 threads, respectively.
The remainder of this paper is organized as follows: Section II presents the background information, including the fundamentals of ATPG and parallel architectures, as well as the problem of test inflation in MT-TPG. In Section III, we outline the architecture of MT-TPG. Section IV introduces two techniques (i.e. HFS and CFI) for suppressing test inflation, whereas Section V details two techniques (i.e. PDFR and SPPFsim) for accelerating fault processing. Experimental results are presented in Section VI. Finally, conclusions are drawn in Section VII.
II. BACKGROUND AND MOTIVATION
In this section, we review background information on automatic test pattern generation (ATPG) and parallel architectures. The concept and flowchart of ATPG are introduced first. Next, dynamic fault compaction is described briefly. Then, we outline the parallel architecture used in the proposed MT-TPG. Two different types of parallel ATPG are discussed in Section II-D. Finally, the motivation behind this study, namely the problem of test inflation on a multi-threaded system, is described in Section II-E.
A. AUTOMATIC TEST PATTERN GENERATION
Imperfect manufacturing processes can introduce defects during fabrication, resulting in chips that may malfunction. The objective of test pattern generation is to produce a set of tests capable of uncovering defects in a chip. Fig. 2 illustrates the high-level concept of test pattern generation for detecting defects. The circuit under test (CUT) at the top is defect-free. For any defective circuit that is functionally different, there must be at least one input pattern that distinguishes it from the defect-free circuit at the outputs. Generating effective test patterns for circuits is the goal of automatic test pattern generation (ATPG).
Generating tests that target every possible defect that could occur during manufacturing is expensive. Instead, automatic test generators use abstract representations of defects, referred to as faults. One popular fault model is the single stuck-at fault model. It assumes that only one fault is present in the circuit, which simplifies the problem of test generation.
In the single stuck-at fault model, a fault denotes that a node is permanently tied to logic 1 or logic 0. Fig. 3 presents a circuit with a single stuck-at fault in which signal d is tied to logic 0 (denoted by d/0). To create a difference between the fault-free (or good) circuit and the faulty circuit, a logic 1 must be applied from the primary inputs of the circuit to node d. To observe the effect of the fault, a logic 1 must also be applied to signal c, so that fault d/0 (if it exists) can be detected at output e. Test generation attempts to generate a test pattern for every possible fault in the circuit. In this example, faults such as a/1, b/1, and c/1 are also targeted by the test generator.
Moreover, some faults in the circuit can be logically equivalent, such as c/0 and d/0, meaning that no test can distinguish between them. Equivalent fault collapsing identifies equivalent faults a priori to reduce the number of faults that must be targeted [18]-[20]. Thus, ATPG is concerned only with generating test patterns for each fault in the collapsed fault list.
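To make the stuck-at fault model and fault equivalence concrete, the following Python sketch enumerates all input patterns of a small circuit. For a given fault, it reports which patterns detect it. The gates of Fig. 3 are not stated in the text, so the netlist below (d = a AND b, e = c AND d) is a hypothetical stand-in chosen to be consistent with the d/0 example. The names `simulate`, `detecting_tests`, and `collapse` are illustrative only and are not part of PODEM-X.

```python
from itertools import product

# Hypothetical stand-in for the circuit of Fig. 3 (gates assumed, not from the text):
#   d = a AND b,  e = c AND d;  primary inputs a, b, c; primary output e.
LINES = ("a", "b", "c", "d", "e")


def simulate(a, b, c, fault=None):
    """Return output e. `fault` is (line, stuck_value), or None for the good circuit."""
    def force(line, value):
        # A stuck-at fault overrides whatever value the line would otherwise carry.
        return fault[1] if fault is not None and fault[0] == line else value

    a, b, c = force("a", a), force("b", b), force("c", c)
    d = force("d", a & b)
    return force("e", c & d)


def detecting_tests(fault):
    """All input patterns (a, b, c) on which the good and faulty outputs differ."""
    return [p for p in product((0, 1), repeat=3)
            if simulate(*p) != simulate(*p, fault=fault)]


def collapse(faults):
    """Group faults that share exactly the same test set (i.e., equivalent faults)."""
    groups = {}
    for f in faults:
        groups.setdefault(tuple(detecting_tests(f)), []).append(f)
    return list(groups.values())


all_faults = [(line, v) for line in LINES for v in (0, 1)]
print(detecting_tests(("d", 0)))  # [(1, 1, 1)]: d must be 1 and c must be 1
print(detecting_tests(("c", 0)) == detecting_tests(("d", 0)))  # True: equivalent
print(len(all_faults), "faults collapse to", len(collapse(all_faults)), "classes")
```

Exhaustive enumeration is only feasible for toy circuits. An ATPG algorithm such as PODEM searches for a test instead of trying all 2^n input patterns.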
Next, a typical ATPG flow is introduced briefly. As illustrated in Fig. 4, the flow consists of three phases: (A) test pattern generation (TPG), (B) fault compaction (FC), and (C) fault simulation (FS). In phase A, a primary fault f_a is first picked from the fault list and targeted for test generation. If a test for f_a is found, pattern p_a is generated and the serial flow enters phase B. Otherwise, it picks the next primary fault for TPG. In phase B, fault compaction uses p_a as a constraint to check whether a secondary fault f_b can be compacted into it. This procedure repeats until all secondary faults are processed. Finally, the typical ATPG performs fault simulation for fault dropping. Note that parallel-pattern single-fault simulation (PPSFsim) is commonly used once enough patterns have been generated by the previous phases.
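The three-phase flow described above can be sketched as a simple loop. This is a toy model built on several assumptions:

- Each fault is represented by a single precomputed test cube over {0, 1, X}, which stands in for the result of a TPG engine. `None` marks a fault for which TPG fails.
- A fully specified pattern detects a fault exactly when it matches every specified bit of that fault's cube.
- For brevity, each pattern is simulated right after it is randomly filled, rather than batching patterns for PPSFsim.

The names `serial_atpg` and the cube encoding are hypothetical.

```python
import random


def compatible(pattern, cube):
    """True if no bit is specified to opposite values in the two cubes."""
    return all(p == "X" or c == "X" or p == c for p, c in zip(pattern, cube))


def merge(pattern, cube):
    """Specify the X bits of `pattern` that `cube` requires."""
    return "".join(c if p == "X" else p for p, c in zip(pattern, cube))


def detects(pattern, cube):
    """Toy detection model: the pattern matches every specified bit of the cube."""
    return cube is not None and all(c == "X" or p == c for p, c in zip(pattern, cube))


def serial_atpg(cubes, seed=0):
    """cubes: {fault: test cube or None}; returns the generated patterns."""
    rng = random.Random(seed)
    undetected = list(cubes)
    patterns = []
    while undetected:
        primary = undetected.pop(0)          # Phase A: pick a primary fault
        pattern = cubes[primary]             # stand-in for TPG on the primary fault
        if pattern is None:
            continue                         # TPG failed; move to the next fault
        for fb in undetected:                # Phase B: compact secondary faults
            if cubes[fb] is not None and compatible(pattern, cubes[fb]):
                pattern = merge(pattern, cubes[fb])
        # Fill the remaining X bits randomly to obtain a fully specified pattern
        filled = "".join(rng.choice("01") if b == "X" else b for b in pattern)
        patterns.append(filled)
        # Phase C: fault simulation drops every fault the new pattern detects
        undetected = [f for f in undetected if not detects(filled, cubes[f])]
    return patterns


cubes = {"f1": "1XX", "f2": "X0X", "f3": "0XX", "f4": "XX1", "f5": None}
print(serial_atpg(cubes))
```

Here f2 and f4 are compacted into the test for f1, while f3 conflicts with f1 on the first bit and needs its own pattern.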
B. DYNAMIC FAULT COMPACTION
ATPG uses dynamic fault compaction to reduce pattern count by filling the unspecified primary inputs (PIs) of a single test. During this phase, the remaining faults are treated as secondary faults. A test is extended for secondary faults if the test for the primary fault still has some unspecified bits (i.e., Xs). The specified bits of the current test become constraints during subsequent modification, and a test for a target secondary fault can be derived only if it satisfies all existing constraints. Imposing constraints on the secondary faults limits the search space for test solutions and speeds up the ATPG process. Any additional bits specified for a secondary fault become new constraints. The search repeats until all undetected faults are exhausted, which completes one round of test generation for the primary fault and the subsequent compatible faults.
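The constraint-driven behavior of dynamic compaction can be sketched as follows. As a simplifying assumption, the ATPG search for each secondary fault is replaced by a hypothetical list of alternative test cubes. A secondary fault is compacted only if one of its cubes agrees with every bit that is already specified. The bits that cube newly specifies then become constraints for the faults that follow. The function names are illustrative only.

```python
def compatible(pattern, cube):
    """A cube satisfies the constraints if it never contradicts a specified bit."""
    return all(p == "X" or c == "X" or p == c for p, c in zip(pattern, cube))


def add_constraints(pattern, cube):
    """Bits the cube specifies in X positions become new constraints."""
    return "".join(c if p == "X" else p for p, c in zip(pattern, cube))


def dynamic_compaction(primary_cube, secondary_alternatives):
    """
    primary_cube: test cube of the primary fault, e.g. "1XXX".
    secondary_alternatives: {fault: [cube, ...]}, a hypothetical stand-in for the
        tests an ATPG search could find for each secondary fault.
    Returns the compacted cube and the secondary faults it now targets.
    """
    pattern, compacted = primary_cube, []
    for fault, alternatives in secondary_alternatives.items():
        if "X" not in pattern:
            break  # no unspecified bits remain to accommodate more faults
        for cube in alternatives:
            if compatible(pattern, cube):
                pattern = add_constraints(pattern, cube)
                compacted.append(fault)
                break  # the first test satisfying all constraints is accepted
    return pattern, compacted


secondaries = {"f2": ["0XXX", "X1XX"], "f3": ["X0XX"], "f4": ["XX0X", "XXX1"]}
print(dynamic_compaction("1XXX", secondaries))  # ('110X', ['f2', 'f4'])
```

The primary fault's constraint rejects the first alternative for f2. f3 is then rejected because the bit newly specified for f2 now constrains it.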
C. PARALLEL ARCHITECTURE
MT-TPG uses a shared-memory system, shown in Fig. 5, in which all processors access all of the memory as a global address space. Multiple processors operate independently but share the same memory resources, and a change to a memory location made by one processor is visible to all other processors. Although a parallel architecture provides powerful computing capability, it also introduces a problem in our application. Parallel ATPG can target multiple faults simultaneously, but doing so leads to test inflation. Thus, this work aims to address test inflation while retaining high parallelism for ATPG on a multi-threaded architecture.
VOLUME 6, 2018
D. DETERMINISM AND NON-DETERMINISM
Parallel ATPG can be classified into deterministic and non-deterministic approaches. A deterministic parallel ATPG such as [4] and [5] generates the same test set regardless of processing time and the number of threads. To keep the result identical, a synchronization procedure is used to avoid race conditions. However, this often incurs non-negligible synchronization or communication overhead. In contrast, a non-deterministic parallel ATPG such as [2], [3], and [12] does not require a consistent test set and can sometimes even obtain a better result (i.e., a lower pattern count). However, a non-deterministic parallel ATPG frequently runs into test inflation caused by race conditions among multiple threads. This work therefore proposes a non-deterministic parallel ATPG called MT-TPG, which incorporates particular techniques for suppressing test inflation by reducing repeated detection of the same fault.
E. TEST INFLATION ON MULTI-THREADED SYSTEM
We further examine the test-inflation problem in multi-threaded TPG and analyze its causes in Fig. 6. TPG, FC, and FS denote the three phases of test pattern generation, fault compaction, and fault simulation, and T and f represent a thread and a target fault, respectively. Fig. 6(a) shows the first case, in which TPG runs for fault f_1 on threads T_1 and T_2 simultaneously. Fig. 6(b) shows the second case, in which the target fault f_6 of TPG on thread T_2 is concurrently detected by FC on another thread, T_1. Fig. 6(c) shows the third case, in which the target fault f_5 of TPG on thread T_2 is concurrently detected by FS on another thread, T_1. Without particular effort, cases such as TPG(f_1) in Fig. 6(a), TPG(f_6) in Fig. 6(b), or TPG(f_5) in Fig. 6(c) may produce additional patterns, leading to serious test inflation in parallel ATPG.
Based on these observations, we propose two techniques for suppressing test inflation: hard-fault shuffling (HFS) and concurrent-fault interruption (CFI). HFS mainly prevents different threads from generating tests for the same fault simultaneously, as shown in Fig. 6(a). CFI prevents test generation from running concurrently with fault compaction or fault simulation of the same fault, as shown in Fig. 6(b) and Fig. 6(c). More details are provided in Section IV.
III. MULTI-THREADED TEST PATTERN GENERATION (MT-TPG)
In this section, MT-TPG is elaborated with two objectives: (1) suppressing test inflation and (2) accelerating fault processing. For (1), hard-fault shuffling (HFS) and concurrent-fault interruption (CFI) work together to reduce repeated detection of the same fault. For (2), potentially-droppable-fault removal (PDFR) and single-pattern parallel-fault simulation (SPPFsim) are applied to shorten the total runtime.
To the best of our knowledge, all previous parallel TPG systems maintain the same fault list for test pattern generation (TPG) and fault compaction (FC). MT-TPG is the first work that divides the initial fault list into a primary-fault list and a secondary-fault list for each thread. During pre-processing, several techniques are applied to generate the corresponding fault lists for each thread. The overall framework of MT-TPG is shown in Fig. 7. Fig. 7(a) illustrates that the initial fault list requires particular pre-processing before MT-TPG starts. After random-pattern simulation, all faults are classified as either hard faults or easy faults. To generate the primary-fault list (pf-list) for the TPG phase, hard-fault shuffling (HFS) assigns a different fault order to each thread by shuffling the hard faults. To generate the secondary-fault list (sf-list) for the FC phase, potentially-droppable-fault removal (PDFR) deletes droppable faults from the initial fault list. Fig. 7(b) details the complete flow of MT-TPG. In phase A, a thread first picks a primary fault f_a from its pf-list for test pattern generation. If a test for f_a is found, pattern p_a is stored and MT-TPG proceeds to phase B. Otherwise, the thread picks another primary fault for TPG. In phase B, fault compaction is applied to p_a to check whether the picked secondary fault f_b from the sf-list can be compacted. This procedure iterates until all secondary faults in the sf-list are exhausted. After fault compaction, MT-TPG assigns all unspecified values in the pattern by random filling (either 0 or 1). It then performs fault simulation in phase C to drop other not-yet-detected faults. If all faults in the pf-list have been checked, TPG is complete; otherwise, MT-TPG returns to phase A for the next f_a.
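The per-thread flow of Fig. 7(b) can be sketched with Python threads in a toy model, under the following assumptions:

- Each fault has one hypothetical test cube over {0, 1, X}.
- A filled pattern detects a fault exactly when it matches every specified bit of the fault's cube.
- A lock-protected set stands in for the shared fault status held in memory.
- Each thread walks its own pf-list, while all threads share one sf-list.

`mt_tpg` and its arguments are illustrative. Beyond skipping faults already marked as detected, the sketch does not model HFS ordering, CFI interruption, or PDFR.

```python
import random
import threading


def compatible(pattern, cube):
    return all(p == "X" or c == "X" or p == c for p, c in zip(pattern, cube))


def merge(pattern, cube):
    return "".join(c if p == "X" else p for p, c in zip(pattern, cube))


def covers(pattern, cube):
    return all(c == "X" or p == c for p, c in zip(pattern, cube))


def mt_tpg(cubes, pf_lists, sf_list, seed=0):
    """cubes: {fault: cube}; pf_lists: one primary-fault order per thread."""
    detected, patterns = set(), []
    lock = threading.Lock()  # guards the shared fault status and pattern set

    def worker(tid, pf_list):
        rng = random.Random(seed + tid)
        for fa in pf_list:
            with lock:
                if fa in detected:
                    continue  # already dropped by some thread
            pattern = cubes[fa]  # Phase A: stand-in for TPG on f_a
            for fb in sf_list:  # Phase B: compact faults from the shared sf-list
                with lock:
                    skip = fb == fa or fb in detected
                if not skip and compatible(pattern, cubes[fb]):
                    pattern = merge(pattern, cubes[fb])
            # Random filling of the remaining X bits, then Phase C fault simulation
            filled = "".join(rng.choice("01") if b == "X" else b for b in pattern)
            with lock:
                patterns.append(filled)
                detected.update(f for f, c in cubes.items() if covers(filled, c))

    threads = [threading.Thread(target=worker, args=(t, pl))
               for t, pl in enumerate(pf_lists)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return patterns, detected


cubes = {"f1": "1XXX", "f2": "X0XX", "f3": "0XXX", "f4": "XX1X", "f5": "XXX0"}
pf_lists = [["f1", "f2", "f3", "f4", "f5"], ["f3", "f4", "f5", "f1", "f2"]]
patterns, detected = mt_tpg(cubes, pf_lists, sf_list=list(cubes))
print(len(patterns), sorted(detected))
```

The pattern count can vary between runs. Two threads may still target the same fault before either marks it as detected, which is exactly the race that HFS and CFI address.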
IV. SUPPRESSING TEST INFLATION
To suppress test inflation, MT-TPG incorporates two core techniques: (1) hard-fault shuffling (HFS) and (2) concurrent-fault interruption (CFI). HFS generates a different fault order for each thread so that threads avoid taking the same fault in TPG. CFI detects whether a fault is being processed concurrently by two different threads and terminates the redundant computation early when possible. In the following, we explain how each technique suppresses test inflation and show their effectiveness on four benchmark circuits.
A. HARD-FAULT SHUFFLING
On a multi-threaded system, different threads may coincidentally pick the same fault as the primary fault in TPG. Repeated detection of a fault leads to test inflation and also limits speedup. Hard-fault shuffling (HFS) is proposed to minimize the likelihood that threads pick the same primary fault. Random simulation is a simple and fast way to classify faults. Hard(-to-detect) faults [21] can be identified by a multiple-detection approach, in which the top 10% of faults with the fewest detections are considered hard and are shuffled in the primary-fault list. Fig. 8 uses four threads to illustrate HFS, where T, H, E, and DR represent the thread index, hard faults, easy faults, and the detection rate computed by random simulation, respectively. First, the faults are divided into a hard-fault set and an easy-fault set according to a threshold α on the detection rate (DR). A fault whose DR is smaller than α is classified as hard, and a fault whose DR is larger than or equal to α is classified as easy. With four threads, the hard-fault set is partitioned into four subgroups (H_1, H_2, H_3, and H_4). After partitioning, we shuffle the hard faults to generate a new order for each thread, so that the same fault is not picked as the primary fault by different threads. Unlike traditional fault partitioning, HFS does not assign a distinct subset of the faults to each thread. Instead, all easy faults are appended after the shuffled hard faults, because the status (detected or undetected) of the easy faults cannot be confirmed even after all hard faults are processed. Thus, although every pf-list contains all faults for TPG, the order differs for each thread. Fig. 9 presents an example of HFS with four threads, where the hard faults are f_1 to f_8 and the easy faults are f_9 to f_15. After applying HFS, each thread obtains its own pf-list.
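A minimal sketch of HFS follows. Faults are first split into hard and easy sets by the threshold α on the random-simulation detection rate, and the hard set is partitioned into one contiguous subgroup per thread. Thread T_(t+1) then starts from subgroup H_(t+1) and wraps around through the remaining subgroups, with the easy faults appended unchanged. The text does not fully specify how the subgroups are reordered for each thread, so this rotation scheme is an assumption, as are the function names.

```python
def classify(detection_rate, alpha):
    """Split faults into hard (DR < alpha) and easy (DR >= alpha) lists."""
    hard = [f for f, dr in detection_rate.items() if dr < alpha]
    easy = [f for f, dr in detection_rate.items() if dr >= alpha]
    return hard, easy


def partition(items, n):
    """Split items into n contiguous subgroups of (almost) equal size."""
    size = -(-len(items) // n)  # ceiling division
    return [items[i * size:(i + 1) * size] for i in range(n)]


def hfs_pf_lists(hard, easy, num_threads):
    """Build one primary-fault list per thread (the rotation scheme is an assumption)."""
    groups = partition(hard, num_threads)
    pf_lists = []
    for t in range(num_threads):
        rotated = groups[t:] + groups[:t]  # thread t begins with its own subgroup
        pf_lists.append([f for g in rotated for f in g] + easy)
    return pf_lists


hard = [f"f{i}" for i in range(1, 9)]    # f1..f8, as in the Fig. 9 example
easy = [f"f{i}" for i in range(9, 16)]   # f9..f15
for t, pf in enumerate(hfs_pf_lists(hard, easy, 4), start=1):
    print(f"T{t}:", " ".join(pf[:8]), "| easy:", " ".join(pf[8:]))
```

Every thread still sees every fault, but the four threads begin with different hard faults (f1, f3, f5, and f7 here).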
Since all four threads receive the complete fault list in different orders, the probability that they target the same fault during TPG is reduced.
B. CONCURRENT-FAULT INTERRUPTION
As mentioned previously, many researchers have used fault broadcasting to deal with test inflation [2], [3]. However, this approach is insufficient when the status of a fault is changed by multiple threads, as illustrated in Fig. 6. Fault broadcasting only passes along information; it does not stop ongoing test generation for newly detected faults. This scenario is therefore encountered frequently in parallel ATPG, resulting in a serious increase in pattern counts. Thus, we propose concurrent-fault interruption (CFI) to reduce the occurrence of repeated detection. When a fault is detected by one thread, CFI not only informs the other threads of the detection but also interrupts any test pattern generation (TPG) or fault compaction (FC) for the same fault on those threads.
In an ATPG system, test pattern generation (TPG), fault compaction (FC), and fault simulation (FS) may all change the status of faults. When ATPG moves from a single thread to multiple threads, nine (= 3×3) combinations of these phases may cause duplicate detection. However, interrupting FS is costly, since FS is the last phase of ATPG and has the strongest power to drop faults. Therefore, CFI never interrupts FS and considers only six cases in MT-TPG: (a) TPG interrupts TPG, (b) FC interrupts TPG, (c) FS interrupts TPG, (d) TPG interrupts FC, (e) FC interrupts FC, and (f) FS interrupts FC. Fig. 10 presents the six cases of CFI with two threads, where T and f denote a thread and a target fault, respectively. Fig. 10(a) illustrates the first case, in which the target fault f_5 is processed by TPG on T_1 and by TPG on T_2 simultaneously. Fig. 10(b) illustrates the second case, in which the target fault f_7 is processed by FC on T_1 as well as by TPG on T_2. Fig. 10(c) illustrates the third case, in which the target fault f_9 is processed by FS on T_1 as well as by TPG on T_2. In these three cases, TPG on T_2 is redundant and can be terminated once the target fault is detected by T_1. The interrupted thread terminates its current computation immediately and starts test pattern generation for a new fault. Instantly aborting test generation for newly detected faults effectively reduces both the pattern count and the computation time of ATPG.
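The interruption check of CFI can be sketched as a poll inside the search loop. This is an assumption rather than the PODEM-X implementation. A shared table holds one `threading.Event` per fault; a C/C++ version might use an array of atomic flags instead. The TPG routine checks the flag of its target fault every few decision steps and aborts as soon as another thread has marked the fault as detected. `FaultStatus`, `tpg_with_cfi`, and the step loop standing in for PODEM decisions and backtracks are all hypothetical.

```python
import threading
import time


class FaultStatus:
    """Shared fault-status table visible to every thread (hypothetical)."""

    def __init__(self, faults):
        self._detected = {f: threading.Event() for f in faults}

    def mark_detected(self, fault):
        self._detected[fault].set()

    def is_detected(self, fault):
        return self._detected[fault].is_set()


def tpg_with_cfi(fault, status, max_steps, step_time=0.001, check_every=8):
    """Stand-in for a PODEM search; each iteration models one decision/backtrack."""
    for step in range(max_steps):
        # CFI: abandon the search once another thread has detected the target fault
        if step % check_every == 0 and status.is_detected(fault):
            return "interrupted", step
        time.sleep(step_time)  # placeholder for real search work
    return "test_found", max_steps


status = FaultStatus(["f5", "f7"])
result = {}
# T2 runs TPG on f5 while the main thread plays T1, which detects f5 (e.g., by FS).
t2 = threading.Thread(target=lambda: result.update(f5=tpg_with_cfi("f5", status, 10_000)))
t2.start()
time.sleep(0.01)
status.mark_detected("f5")
t2.join()
print(result["f5"][0])  # interrupted
```

The same check can guard the compaction of each secondary fault in phase B, which covers the FC-interruption cases. As in the text, FS itself is never interrupted.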
The other three cases, in which FC is interrupted, are shown in Fig. 10(d), Fig. 10(e), and Fig. 10(f). Fig. 10(d) illustrates the fourth case, in which the target fault f_5 is processed by TPG on T_1 and by FC on T_2 concurrently. Fig. 10(e) illustrates the fifth case, in which the target fault f_9 is processed by FC on T_1 as well as by FC on T_2. Fig. 10(f) illustrates the sixth case, in which the target fault f_7 is processed by FS on T_1 as well as by FC on T_2. Again, in these three cases FC on T_2 is redundant. After FC on T_2 is terminated, the interrupted thread can perform fault compaction for the next fault. Instantly aborting fault compaction for newly detected faults also avoids unnecessary compaction on a pattern. As a result, the pattern keeps more unspecified primary inputs (PIs) for compacting other secondary faults later.
Fig. 11 shows the test-inflation suppression achieved by HFS and CFI under 4, 8, 12, and 16 threads on four benchmark circuits. Ori., HFS, CFI, and HFS+CFI denote, respectively, the original parallel TPG without any technique, hard-fault shuffling, concurrent-fault interruption, and hard-fault shuffling combined with concurrent-fault interruption. The percentage of test inflation is computed by comparing the pattern count of the parallel ATPG without any particular technique with the pattern counts obtained by the respective techniques under different numbers of threads. With HFS, the average test inflation is 3.1%, 6.1%, 9.4%, and 12.6% using 4, 8, 12, and 16 threads, respectively. This shows that HFS effectively keeps different threads from picking the same fault during test pattern generation. With CFI, the average test inflation is 2.9%, 6.0%, 7.6%, and 9.8% under 4, 8, 12, and 16 threads, respectively, indicating that CFI reduces computation spent on duplicate detection. Moreover, HFS+CFI outperforms HFS alone and CFI alone on average in MT-TPG.
With HFS+CFI, the average test inflation is only 1.0%, 2.0%, 3.2%, and 3.4% under 4, 8, 12, and 16 threads, respectively. These results indicate that HFS effectively removes redundant tests for the same faults. CFI, in turn, prevents unnecessary fault compaction and reserves more space for compacting later faults.
Note that the runtime overhead of CFI is almost negligible for all benchmark circuits in MT-TPG. In fact, CFI even reduces the runtime of circuit ethernet by 5.4%, 9.6%, 10.2%, and 13.3% under 4, 8, 12, and 16 threads, respectively.
Other benchmark circuits show similar results, and the runtime improvement grows as the number of threads increases. In summary, CFI successfully suppresses test inflation in MT-TPG and saves more runtime as the number of threads increases.
V. ACCELERATING FAULT PROCESSING
This section describes two core techniques for accelerating fault processing in MT-TPG: (1) potentially-droppable-fault removal (PDFR) and (2) single-pattern parallel-fault simulation (SPPFsim). First, PDFR is invoked to save computational effort on faults that can be dropped early by fault simulation. Then, SPPFsim is performed to drop more faults as soon as a new pattern is generated. As before, we explain how each technique accelerates fault processing and demonstrate its effectiveness on four benchmark circuits.
A. POTENTIALLY-DROPPABLE-FAULT REMOVAL (PDFR)
As mentioned in Section IV-A, all faults can be divided into hard faults and easy faults based on the result of random-pattern simulation. Moreover, some faults can be dropped easily by the later fault simulation. Potentially-droppable-fault removal (PDFR) removes these faults from the secondary-fault list. PDFR pre-defines a threshold β on the detection rate (DR). Here, DR is the number of detections of a fault divided by the total number of patterns used in random simulation. If the DR of a fault is greater than β, the fault is marked as potentially droppable, because such faults are likely to be dropped during fault simulation later.
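PDFR reduces to a threshold test on DR. The sketch below estimates DR by random-pattern simulation in a toy model. In that model, each fault has a hypothetical test cube over {0, 1, X}, and a random pattern detects the fault when it matches every specified bit of the cube. The sketch then removes faults with DR > β from the secondary-fault list. The default β = 0.1 mirrors the value discussed with Fig. 12. The function names and the cube model are assumptions.

```python
import random


def covers(pattern, cube):
    """Toy model: a pattern detects a fault if it matches every specified cube bit."""
    return all(c == "X" or p == c for p, c in zip(pattern, cube))


def detection_rates(cubes, num_patterns=1000, seed=0):
    """DR = (# random patterns detecting the fault) / (total # random patterns)."""
    rng = random.Random(seed)
    width = len(next(iter(cubes.values())))
    counts = dict.fromkeys(cubes, 0)
    for _ in range(num_patterns):
        pattern = "".join(rng.choice("01") for _ in range(width))
        for fault, cube in cubes.items():
            if covers(pattern, cube):
                counts[fault] += 1
    return {f: n / num_patterns for f, n in counts.items()}


def pdfr(sf_list, dr, beta=0.1):
    """Drop potentially droppable faults (DR > beta) from the secondary-fault list."""
    return [f for f in sf_list if dr[f] <= beta]


cubes = {"f_easy": "1XXXXXXX", "f_mid": "11XXXXXX", "f_hard": "10110110"}
dr = detection_rates(cubes)
print(pdfr(list(cubes), dr))  # ['f_hard']
```

Faults needing few specified bits (expected DR about 0.5 and 0.25 here) are left to fault simulation. The fault needing all eight bits (expected DR about 1/256) stays in the sf-list for compaction.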
Four benchmark circuits and 10 runs with 16 threads are used in the PDFR experiments. Fig. 12 compares the pattern counts for different threshold values (i.e., βs). For mem_ctrl, the pattern count decreases by only 1% as β changes from 0.01 to 0.02. For bench4, the pattern count decreases by 30% over the same change. However, the curves become flat once β exceeds 0.1. Therefore, potentially droppable faults (DR > 0.1) can be safely removed in MT-TPG without increasing pattern counts. Fig. 13 compares the number of faults in the sf-list for each pattern between MT-TPG with and without PDFR. In Fig. 13, the impact of PDFR is greater when the pattern index is smaller than 10. For example, at pattern index 1, PDFR reduces the number of faults in the sf-list by 2233, 8327, 2337, and 31385 on mem_ctrl, pci_bridge32, bench4, and ethernet, respectively. In contrast, when the pattern index is greater than 5, the number of secondary faults is not reduced. In summary, PDFR quickly reduces the number of secondary faults at the beginning of ATPG.
B. SINGLE-PATTERN PARALLEL-FAULT SIMULATION (SPPFSIM)
Fig. 14 shows the two types of fault simulation in a parallel ATPG, where T and pttn denote a thread and a pattern, respectively. In general, a typical parallel ATPG performs parallel-pattern single-fault simulation (PPSFsim) once a fixed number of patterns has been generated, as Fig. 14(a) shows. Synchronization is required before PPSFsim, so threads T_1, T_2, and T_3 idle until the last pattern is generated by T_4. This waiting time is wasted in parallel computing. Furthermore, fault simulation is effective for dropping faults. If a parallel ATPG postpones fault simulation until enough patterns have accumulated, it may miss the chance to drop faults earlier. For better fault dropping, we propose replacing parallel-pattern single-fault simulation with single-pattern parallel-fault simulation (SPPFsim) in MT-TPG. Fig. 14(b) shows that each thread performs SPPFsim immediately after each pattern is generated. As one can see, considerable time is saved after generating 8 test patterns when a parallel ATPG uses SPPFsim.
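The idea behind single-pattern parallel-fault simulation can be sketched with bit-parallel words. One pattern is applied, bit 0 of each signal word carries the good machine, and bit k carries the machine with the k-th fault injected. A single pass over the netlist therefore simulates every fault for that pattern. The netlist format, the small AND-gate circuit, and `sppf_sim` are hypothetical. In C/C++, the words would be fixed-width machine words, for example 63 faults plus the good machine per 64-bit word, whereas Python integers simply grow as needed.

```python
from functools import reduce
import operator

# Hypothetical netlist in topological order: (output line, gate, input lines).
INPUTS = ("a", "b", "c")
OUTPUTS = ("e",)
NETLIST = [("d", "AND", ("a", "b")), ("e", "AND", ("c", "d"))]


def sppf_sim(pattern, faults):
    """Simulate ONE pattern against ALL faults; faults are (line, stuck_value)."""
    ones = (1 << (len(faults) + 1)) - 1  # bit 0: good machine; bit k: fault k-1

    def inject(line, word):
        # Force bit k to the stuck value of every fault located on this line.
        for k, (fline, v) in enumerate(faults, start=1):
            if fline == line:
                word = word | (1 << k) if v else word & ~(1 << k)
        return word

    value = {name: inject(name, ones if bit else 0)
             for name, bit in zip(INPUTS, pattern)}
    for out, gate, ins in NETLIST:
        words = [value[i] for i in ins]
        if gate == "AND":
            w = reduce(operator.and_, words, ones)
        elif gate == "OR":
            w = reduce(operator.or_, words, 0)
        elif gate == "NOT":
            w = ~words[0] & ones
        else:
            raise ValueError(f"unsupported gate {gate}")
        value[out] = inject(out, w)

    detected = set()
    for o in OUTPUTS:
        good = ones if value[o] & 1 else 0  # replicate the good value to all bits
        diff = value[o] ^ good               # bit k set: fault k-1 flips this output
        detected |= {f for k, f in enumerate(faults, start=1) if diff >> k & 1}
    return detected


faults = [(line, v) for line in "abcde" for v in (0, 1)]
print(sorted(sppf_sim((1, 1, 1), faults)))  # all five stuck-at-0 faults
print(sorted(sppf_sim((0, 1, 1), faults)))  # [('a', 1), ('d', 1), ('e', 1)]
```

Because the routine needs only the single pattern a thread has just filled, each thread can call it immediately, with no barrier across threads. This is the property the text relies on to avoid the idle time of PPSFsim.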
In the following, we compare the pattern counts and runtimes of PPSFsim and SPPFsim. Table 1 first compares pattern counts on four benchmark circuits. The pattern reduction is computed with respect to the pattern count of MT-TPG using PPSFsim under different numbers of threads. Experimental results indicate that the best circuit, pci_bridge32, achieves about 5% pattern reduction on average.
Last, we investigate the runtime impact of PPSFsim versus SPPFsim. Table 2 compares the runtime reduction of the two types of fault simulation in MT-TPG under different numbers of threads. Experimental results demonstrate that a parallel ATPG using SPPFsim runs faster than one using PPSFsim in all cases. These experiments show that SPPFsim generates fewer patterns than PPSFsim. Moreover, SPPFsim drops undetected faults much earlier and shortens the overall execution time. As a result, SPPFsim is more suitable than PPSFsim for parallel ATPG on a multi-threaded system.
VI. EXPERIMENTAL RESULTS
MT-TPG is implemented in C/C++ with Pthreads and runs on a Linux machine with 20 processors and 64GB of RAM. Experiments are conducted on 17 benchmark circuits: seven from ISCAS'89, five from IWLS'05, and five from the Industrial Technology Research Institute of Taiwan (ITRI) [22]. Table 3 lists information on each benchmark circuit. Columns 2 and 3 list the number of gates (#gate) and the number of stuck-at faults (#SA-fault). Columns 4 to 9 list the pattern count (#pttn), the fault coverage (FC (%)), and the runtime (sec), obtained from a commercial tool and from MT-TPG, respectively.
MT-TPG incorporates PODEM-X, an in-house implementation of the Path-Oriented Decision-Making (PODEM) algorithm. In particular, the two techniques for suppressing test inflation (HFS and CFI) and the two techniques for accelerating fault processing (PDFR and SPPFsim) are applied on the PODEM-X engine. We also examine how the number of threads affects the speedup and test inflation of MT-TPG.
A. COMPARISON OF PATTERN COUNTS UNDER DIFFERENT NUMBERS OF THREADS
Four benchmark circuits and 100 runs for each thread count are used in this experiment. Fig. 15 compares the pattern counts under different numbers of threads. The bottom and top of each box are the first and third quartiles of the pattern counts, and the band inside the box is the median of the 100 runs. The ends of the whiskers represent the minimum and maximum pattern counts of the 100 runs. For mem_ctrl and pci_bridge32, the average pattern counts decrease under 4 threads but increase as the number of threads grows.
After applying PDFR and SPPFsim, MT-TPG achieves average speedups of 4.0X, 7.4X, 10.1X, and 13.7X using 4, 8, 12, and 16 threads, respectively. In particular, the speedup in some cases (such as b15 with 16 threads) even becomes super-linear (i.e., the speedup is larger than the number of threads). These results demonstrate the effect of fault dropping in parallel ATPG, which makes super-linear acceleration more likely to occur in MT-TPG. In addition, the fault-coverage variation (< 0.05%) caused by random filling of the generated patterns is negligible, demonstrating the stability of MT-TPG.
VII. CONCLUSIONS
Runtime acceleration and test inflation are two critical issues in parallel ATPG for modern VLSI testing, and they cannot be solved individually. Duplicate detection in parallel ATPG leads to test inflation, and the generation of additional patterns in turn limits runtime acceleration. Therefore, in this work, we proposed MT-TPG to reduce test inflation and retain high parallelism simultaneously. MT-TPG pursues two objectives: (1) suppressing test inflation and (2) accelerating fault processing. To suppress test inflation, hard-fault shuffling (HFS) and concurrent-fault interruption (CFI) are proposed to reduce pattern counts. HFS generates different fault orders for different threads to prevent a fault from being targeted concurrently. CFI interrupts test generation or fault compaction when the target fault is processed by other threads. To accelerate fault processing, potentially-droppable-fault removal (PDFR) and single-pattern parallel-fault simulation (SPPFsim) are applied to shorten the total runtime. PDFR speeds up ATPG at the beginning by removing potentially droppable faults before dynamic fault compaction. SPPFsim saves much idle time and drops faults more efficiently than parallel-pattern single-fault simulation (PPSFsim). Compared to the test inflation (> 15%)
