INTRODUCTION
With the advancement of semiconductor technology, the complexity of Very-Large-Scale Integration (VLSI) designs is growing and the cost of testing is increasing. External testing, with its time-consuming scan-in and scan-out of test data and expensive test equipment, has become a bottleneck because of the limited input/output channels, bandwidth and speed of external automated test equipment (ATE). The problems of ATE can be overcome by integrating a built-in self-test (BIST) technique into the circuit [1]. In this technique, test pattern generators (TPGs) are added to the primary inputs (PIs) and response analysers (RAs) are added to the primary outputs (POs). The circuit can thus generate test patterns internally and compact the output responses. However, adding these TPGs and RAs is not sufficient to achieve high fault coverage (even for single stuck-at faults) if the circuit contains loops in its structure. Therefore, several BIST techniques have been proposed [1].
Basically, selected registers in the circuit under test (CUT) are modified into test registers in order to improve the CUT's connectivity, because a test register can generate test patterns or compact test responses in test mode. A test register can be a BILBO (built-in logic block observer) [2], a CBILBO (concurrent BILBO), a MISR (multiple-input signature register) or an LFSR (linear feedback shift register). Generally, BIST schemes are classified into test-per-scan and test-per-clock. In the test-per-scan scheme, the test registers generate test patterns and compact test responses over a few clock cycles, based on scan-in and scan-out processes through a scan path [4, 5, 6]. The test-per-scan scheme requires low area overhead. However, it has a long test application time.
In the test-per-clock scheme, the test registers generate test patterns and compact test responses in one clock cycle. The main advantage of this scheme is low test application time. However, test-per-clock requires higher hardware overhead than the test-per-scan scheme.
BIST techniques in the test-per-clock scheme [7, 8, 9, 10] can be classified into two categories. The first category tests all combinational blocks simultaneously, whereas the second category tests each module independently of the other modules. Both categories add a TPG at every primary input and an RA at every primary output.
Wunderlich tested all combinational blocks simultaneously [7]. The DFT method augments a given circuit with BILBOs or CBILBOs so that the circuit becomes easily testable. To this end, every loop of the circuit must contain one CBILBO or two BILBOs so that both TPG and RA can operate in the same test session. The advantage of this scheme is short test application time, since all combinational blocks are tested simultaneously. However, this scheme has high area overhead, since several internal registers are modified to become BILBOs and CBILBOs.
In order to reduce the hardware overhead of [7], Masuzawa proposed a BIST method based on hierarchical test, where each combinational module is tested independently of the other modules [8]. The DFT method for this approach is called single control testability. This scheme does not require extra registers; it only adds extra combinational logic to augment a given data path so that it is easily testable. Hence, the area overhead is lower than that of the approach in [7]. The disadvantage of this scheme is long test application time, because only a single combinational module is tested at a time.
Yamaguchi [9, 10] proposed a new testability called concurrent single control testability to remedy the long test application time of [8]. They extended the concept of testability by introducing concurrent testing, so that multiple combinational modules can be tested at the same time. This scheme has a shorter test application time than the approach in [8] but suffers from area overhead.
Nicolici proposed a BIST method that extracts structural information from the data path [11]. Modules of the same type are grouped into test compatibility classes. Each class has more than one LFSR used as a test pattern generator. More than one test session is required to test the whole circuit. In addition, an extra comparator is added to check the output responses. Therefore, the method needs more area overhead on top of the overhead incurred by the LFSRs and MISRs.
Nassar and Salama proposed a method that aims at minimum area overhead and a minimum number of test sessions [12]. They introduced a register allocation and binding algorithm, with which CBILBOs and BILBOs are applied in the CUT using a minimum number of test sessions. Extra multiplexers are added to provide interconnection between the modules. Based on the analysis result, the algorithm uses CBILBOs, BILBOs, MISRs, TPGs and multiplexers to make the CUT easily testable. As a result, it suffers from high area overhead. More than one test session is required, depending on the CUT design and its characteristics.
In this paper, we propose a BIST technique that tests all combinational blocks simultaneously using the test-per-clock scheme. The simulation results are compared with approaches that also use the test-per-clock scheme, and not with scan BIST, because scan BIST requires a long test application time. We classify circuits into three categories based on the BIST type and the BIST insertion method used to make the circuit BIST-able. They are called primitive BIST-able RTL circuit, concurrent BIST-able RTL circuit and reduced BIST-able RTL circuit (the proposed BIST). No DFT element is added to the primitive BIST-able RTL circuit. The concurrent BIST-able RTL circuit uses the DFT method of [7], which obtains a short test application time at the price of high area overhead. We improve the approach in [7] by keeping its advantage of short test application time while alleviating its disadvantage of area overhead. In order to reduce area overhead, we replace the CBILBO with a MISR in each loop, because a MISR can also generate test patterns and compact test responses simultaneously, similar to a CBILBO. The registers to be modified into MISRs are selected such that the area overhead is minimum. We also obtain fault coverage similarly high to that of the approach in [7].
This paper is organized as follows. In Section 2, we present the classification of BIST-able RTL circuits whose combinational blocks are all tested simultaneously. In Section 3, we define the DFT method for the reduced BIST-able RTL circuit. In Section 4, we present the simulation results, and Section 5 concludes the paper.
METHODOLOGY
In this section, we classify RTL circuits into three categories and introduce the DFT method.
Classification of BIST-able RTL Circuits
The RTL circuits are classified into three categories based on BIST type and BIST insertion method. They are called primitive BIST-able RTL circuit, concurrent BIST-able RTL circuit and reduced BIST-able RTL circuit. Each category uses an LFSR as the test pattern generator at each PI and a MISR as the response analyser at each PO. For the primitive BIST-able RTL circuit, no internal register is modified into a test register. For the concurrent BIST-able RTL circuit, the original registers in each loop are augmented into CBILBOs. For the reduced BIST-able RTL circuit, the original circuit is augmented using our proposed DFT, which ensures that each loop contains at least one MISR or transparent MISR. The MISR operates similarly to a CBILBO, generating test patterns and compacting test responses simultaneously. To facilitate our discussion, we define the primitive BIST-able RTL circuit, concurrent BIST-able RTL circuit and reduced BIST-able RTL circuit as follows.
Definition 1 An RTL circuit is called primitive BIST-able if its primary inputs and primary outputs are modified into TPGs and RAs, respectively. In this example, no DFT element is inserted internally into the primitive BIST-able RTL circuit. This approach obtains low fault coverage due to the untestable faults caused by the loops in the circuit. Note that no register in any loop is modified into a test register.
The approach in [7] extends the primitive BIST-able circuit by inserting two BILBOs or one CBILBO in each loop to guarantee high fault coverage. The result is known as a concurrent BIST-able RTL circuit. In order to define the concurrent BIST-able RTL circuit, the transparent CBILBO or transparent MISR is defined as follows.
Original register R2 is modified into a CBILBO (T2) and original register R3 is modified into a CBILBO (T3). There is another solution to make this original circuit self-testable: inserting just one transparent CBILBO (T4) on the path shared by both loops, as shown in Figure 2(c). Regarding the hardware cost, a CBILBO is more expensive than a BILBO, and inserting a transparent test register is more expensive than modifying an existing register into a CBILBO.
Building on the approach in [7], our approach replaces the CBILBO with a MISR to break each loop. In our approach, the MISR not only compacts test responses but also generates test patterns at the same time. Our method achieves high fault coverage with low area overhead. We call the result a reduced BIST-able RTL circuit, which is defined as follows.
Figure 3 shows two types of reduced BIST-able RTL circuit. Figure 3(b) shows a reduced BIST-able circuit with a MISR (T5 and T6) for each loop, and Figure 3(c) shows a reduced BIST-able circuit with a transparent MISR (T7). Note that in Figure 3(b) original register R2 is modified into a MISR (T5) and original register R3 is modified into a MISR (T6). Alternatively, we can insert a single transparent MISR (T7) on the path shared by both loops to make the circuit self-testable, as shown in Figure 3(c). Regarding the cost, the DFT method chooses the cheaper of these two types. The concurrent BIST-able RTL circuit and the reduced BIST-able RTL circuit break a loop in the circuit in different ways.
The DFT for the concurrent BIST-able RTL circuit breaks a loop with a CBILBO, while the DFT for the reduced BIST-able RTL circuit breaks a loop with a MISR. A CBILBO costs more area than a MISR. Because of the reduced area overhead obtained by using a MISR, we call our approach the reduced BIST-able RTL circuit. In addition, we obtain a low test application time and fault coverage comparable to that of the approach in [7].
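The dual role of a MISR described above (compacting responses while its state serves as the next test pattern) can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the 3-bit width, the feedback taps, the seed and the response values are hypothetical choices for illustration and are not taken from the circuits in this paper.

```python
def misr_step(state, data_in, taps=(0, 2)):
    """Advance an n-bit MISR by one clock cycle.

    state, data_in: tuples of 0/1 bits with the same length.
    taps: indices of state bits XORed into the feedback (assumed choice).
    """
    feedback = 0
    for t in taps:
        feedback ^= state[t]
    # Shift by one position and feed the feedback bit into stage 0.
    shifted = (feedback,) + state[:-1]
    # XOR the parallel inputs (test responses) into every stage.
    return tuple(s ^ d for s, d in zip(shifted, data_in))


def run_misr(seed, responses, taps=(0, 2)):
    """Return the list of MISR states, starting with the seed."""
    states = [seed]
    for response in responses:
        states.append(misr_step(states[-1], response, taps))
    return states


# With all-zero inputs the MISR degenerates into a plain LFSR.
zero = (0, 0, 0)
lfsr_states = run_misr((1, 0, 0), [zero] * 7)

# With non-zero responses, the next pattern depends on the responses
# of the preceding block (hypothetical response values).
responses = [(0, 1, 0), (1, 0, 0), (0, 0, 1)]
misr_states = run_misr((1, 0, 0), responses)

for cycle, state in enumerate(misr_states):
    print(cycle, state)
```

Each state in `misr_states` is both the running signature and the pattern applied to the following combinational block, which is why the generated patterns depend partly on the responses of the preceding block.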
Definition 2

Definition 4
DFT Method for Reduced BIST
This section describes the DFT method that augments a given RTL circuit into a reduced BIST-able RTL circuit. Figure 4 illustrates the DFT method for the reduced BIST-able RTL circuit. Each benchmark circuit is written in VHDL, which represents the RTL design. The RTL design uses a finite state machine with data path (FSM-D) model. First, the extended R-graph model is derived from the VHDL description. Next, the extended MFVS is derived. The information from the extended R-graph and the extended MFVS is then used to modify the original RTL design with DFT, which yields the reduced BIST-able RTL circuit.
Extended R-Graph
Similar to the R-graph [13], the extended R-graph is defined to represent an RTL circuit; it contains the original registers (nodes), the test registers (nodes) and the connections between them (arcs). The difference is that the extended R-graph has an extra kind of vertex called a dummy vertex, D. The purpose of a dummy vertex is to represent new hardware added as a transparent test register. The extended R-graph is defined as follows. A read node is an input node (register) while a write node is an output node (register). Properties i, ii, iii, iv and v explain the connections between vertices in the extended R-graph. Arcs are generated in several ways, not only between a read node and a write node: they can also go from a dummy node to a dummy node, from a read node to a dummy node, or from a write node to a dummy node. Property vi describes the size of a vertex.
Definition 5 An extended R-graph of an RTL is a directed graph G=(V,A,w)
Area Cost
The cost of each vertex, which represents the corresponding register, is included in the extended R-graph. Every vertex is assigned a cost except the primary input vertices and the primary output vertices. One unit of area is equal to the size of a NAND gate as used in Synopsys Design Vision. The cost p(v) of each original vertex and each dummy vertex is defined in the extended R-graph as follows.
i. The cost of each original vertex is 5wr.
ii. The cost of each dummy vertex is 12wd.
BILBO is used as MISR in the reduced BIST-able RTL circuit. The area cost added to augment an original vertex into a MISR is 5 per bit. It is 5 because it is the total area of the additional logic, namely one AND gate, one NOR gate and one XOR gate, as shown in Figure 5. The figure shows that the area costs of the AND gate, NOR gate and XOR gate are 1, 1 and 3, respectively, so the total area cost to insert a 1-bit MISR is 5. On the other hand, the area cost added to augment a dummy vertex into a transparent test register is shown in Figure 6. Inserting a new transparent MISR in the circuit requires additional logic of one XOR gate, one multiplexer and one flip-flop. The areas of the multiplexer, XOR gate and flip-flop are 3, 3 and 6, respectively. Therefore, the total area cost to insert a 1-bit transparent MISR is 12. Figure 7 shows that each vertex except the primary input vertex and the primary output vertex is assigned a cost. The cost indicates the area overhead of the corresponding register when it is modified into a MISR or transparent MISR test register. R1 and R2 are original vertices while D1, D2 and D3 are dummy vertices. Thus, the cost of each of R1 and R2 is 5, and the cost of each of D1, D2 and D3 is 12.
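The per-bit costs above follow directly from the gate areas quoted in the text. The arithmetic can be sketched as below; the gate areas and cell compositions come from the text, while the names `GATE_AREA`, `cost_per_bit` and `vertex_cost` are hypothetical and the width argument is assumed to be the register width in bits.

```python
# Gate areas in NAND-gate units, as quoted in the text.
GATE_AREA = {"AND": 1, "NOR": 1, "XOR": 3, "MUX": 3, "FF": 6}

# Extra logic per bit when an existing register becomes a MISR.
MISR_CELL = ("AND", "NOR", "XOR")
# Extra logic per bit when a new transparent MISR is inserted.
TRANSPARENT_MISR_CELL = ("XOR", "MUX", "FF")


def cost_per_bit(cell):
    """Sum the areas of the gates added for one bit of a test register."""
    return sum(GATE_AREA[gate] for gate in cell)


def vertex_cost(kind, width):
    """Area cost of a vertex: 5*w for 'original', 12*w for 'dummy'."""
    if kind == "original":
        return cost_per_bit(MISR_CELL) * width
    if kind == "dummy":
        return cost_per_bit(TRANSPARENT_MISR_CELL) * width
    raise ValueError(f"unknown vertex kind: {kind!r}")


print(vertex_cost("original", 1))  # 1-bit MISR
print(vertex_cost("dummy", 1))     # 1-bit transparent MISR
```

Under this model, a 3-bit original register would cost 15 units when modified into a MISR.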
Extended MFVS
The MFVS algorithm in [14] is used to identify the vertices that belong to the minimum feedback vertex set (MFVS). In this paper, this algorithm is modified, and the modified algorithm is called the extended MFVS. The extended MFVS is introduced in order to find a set of test registers such that the area overhead incurred (area cost) is minimum. A test register can be either a MISR or a transparent MISR. Note that a MISR test register is generated from an original vertex and a transparent MISR from a dummy vertex. The extended MFVS is defined as follows.
Definition 6
An extended MFVS is a set of vertices V such that:
i. Every loop is broken by a vertex in the set, v∈V.
ii. The summation of the cost over the set of vertices that break the loops, ∑_{v∈V} cost(v), is minimized.
Two sets that break the loops are identified based on Definition 6: an original vertex set and a dummy vertex set. The set with the minimum total cost is chosen as the set of test registers. Figure 7 shows the identification of the vertex sets. Referring to Figure 7, two MFVS sets are identified: the original vertex set and the dummy vertex set. The MFVS of the original vertex set is {R1, R2} and the MFVS of the dummy vertex set is {D2}. The cost of each of R1 and R2 is 5 and the cost of D2 is 12. The total costs of the original vertex set and the dummy vertex set are 10 and 12, respectively. As a result, the original vertex set is selected as the set of test registers, because its total cost is lower than that of the dummy vertex set. Thus, R1 and R2 are modified into MISRs to make the circuit testable. If the dummy vertex set were selected instead, the test register would be a transparent MISR.
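The selection described above can be sketched with a small exhaustive search. This is not the algorithm of [14] or the authors' implementation: it simply tries every subset of each vertex class, which is only practical for small graphs. The arc list below is a hypothetical graph chosen to reproduce the sets {R1, R2} and {D2} from the example; the actual topology of Figure 7 is not given in the text.

```python
from itertools import combinations


def has_cycle(arcs, removed):
    """Return True if a directed cycle remains after deleting `removed`."""
    graph = {}
    for src, dst in arcs:
        if src in removed or dst in removed:
            continue
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    state = {}  # 1 = on the DFS stack, 2 = fully explored

    def visit(v):
        state[v] = 1
        for nxt in graph[v]:
            if state.get(nxt) == 1:
                return True
            if nxt not in state and visit(nxt):
                return True
        state[v] = 2
        return False

    return any(v not in state and visit(v) for v in graph)


def cheapest_breaking_set(arcs, candidates, cost):
    """Cheapest subset of `candidates` that breaks every loop, or None."""
    best = None
    for k in range(len(candidates) + 1):
        for subset in combinations(sorted(candidates), k):
            if not has_cycle(arcs, set(subset)):
                total = sum(cost[v] for v in subset)
                if best is None or total < best[0]:
                    best = (total, set(subset))
    return best


# Hypothetical extended R-graph: two loops sharing dummy vertex D2.
arcs = [("PI", "R1"), ("R1", "D2"), ("D2", "D1"), ("D1", "R1"),
        ("R2", "D2"), ("D2", "D3"), ("D3", "R2"), ("D3", "PO")]
cost = {"R1": 5, "R2": 5, "D1": 12, "D2": 12, "D3": 12}

original_set = cheapest_breaking_set(arcs, {"R1", "R2"}, cost)
dummy_set = cheapest_breaking_set(arcs, {"D1", "D2", "D3"}, cost)
# Pick the cheaper of the two candidate sets, as in the text.
chosen = min(original_set, dummy_set, key=lambda pair: pair[0])
print(original_set, dummy_set, chosen)
```

On this assumed graph the original vertex set costs 10 and the dummy vertex set costs 12, so the original vertex set is chosen, matching the worked example.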
RESULTS AND DISCUSSION
Simulation Setup
Each RTL circuit is converted to its reduced BIST-able circuit using the method in Section 3. Next, the reduced BIST-able RTL circuit is synthesized to its gate-level netlist using Design Vision, which also reports the area. Fault simulation in TetraMax is used to obtain the fault coverage. The pseudo-random test patterns are generated using C++. The results for each circuit are analysed and compared in terms of area overhead, fault coverage and test application time to show the effectiveness of the proposed method compared to the previous method, the concurrent BIST-able RTL circuit. The simulation has been conducted on the ITC'99 benchmark circuits [15]. The characteristics of the benchmark circuits are shown in Table 1. In Table 1, FF represents the number of flip-flops while PI/PO represents the number of inputs/outputs of the circuit. The area column represents the area of the circuit, where one unit of area is equal to the size of a NAND gate.
Simulation Results
The effectiveness of the reduced BIST method is measured in terms of fault coverage, area overhead and test application time. The single stuck-at fault model is considered during the simulation. The proposed method is compared with the concurrent BIST-able RTL circuit [7] and the primitive BIST-able RTL circuit.
Figure 8 shows the percentage of area overhead for the concurrent BIST-able RTL circuit (con circuit) and the reduced BIST-able circuit (red circuit). Since no DFT element other than the TPGs and RAs is inserted into the primitive BIST-able RTL circuit, no area overhead is considered for it. One unit of area is equal to the size of a NAND gate in the netlist. The area overhead of the reduced BIST-able circuit comes from the new gates added to the circuit during DFT insertion. In circuit b02, only one register, the 3-bit register stato, is modified into a MISR or a CBILBO. Modifying register stato into a MISR or a CBILBO incurs an area overhead of about 22% and 51%, respectively. In addition, the original register is modified into a MISR at RTL, so that the area overhead becomes low.
Next, the fault coverage is measured. The graphs of fault coverage versus clock cycles for the primitive BIST-able RTL circuit, the concurrent BIST-able RTL circuit and the reduced BIST-able circuit are shown in Figures 9 to 17. The number of clock cycles is determined first, and the fault coverage is then measured within that pre-determined number of clock cycles. The fault coverage of the proposed method is slightly lower than that of the concurrent BIST-able RTL circuit for all the circuits. Since the test patterns generated by a MISR for a combinational block partly depend on the test responses from the preceding combinational block, some of the random test patterns cannot be generated.
Because the MISR misses few useful random test patterns, the fault coverage of our method is only slightly lower than that of the concurrent BIST-able RTL circuit. Figure 9 shows the fault coverage for circuit b01. The fault coverage of the reduced BIST-able circuit and the concurrent BIST-able RTL circuit is 15% and 20%, respectively, after 20 clock cycles. When the number of clock cycles reaches 60, the fault coverage of our method increases to 83%, compared to 88% for the concurrent BIST-able RTL circuit. For circuit b09 in Figure 13, the reduced BIST-able RTL circuit obtains 82% while the concurrent BIST-able RTL circuit achieves 86% after 7000 clock cycles. Circuit b09 needs more clock cycles of random test patterns to achieve satisfactory fault coverage because it is larger than circuit b01. This is supported by the results for other larger circuits such as b10, b11, b13 and b14, which need more than 1000 test patterns to achieve high fault coverage.
Circuits b11, b13 and b14 need more than 50,000 test patterns to achieve high fault coverage. In these cases, the MISR has 10 bits instead of 3 bits, so long sequences of test patterns must be applied to detect a fault in the circuit, which increases the number of test patterns needed in the simulation.
CONCLUSION
A new BIST technique for RTL circuits has been introduced in this paper. A given RTL circuit augmented by our DFT method is called a reduced BIST-able RTL circuit. The DFT method modifies the original registers into MISRs or inserts transparent MISRs to make the circuit easily testable. On the ITC'99 benchmark circuits, the proposed BIST method has about 32.9% lower area overhead than the previous method, concurrent BIST, while achieving comparably high fault coverage.
