A new BIST methodology for RTL data paths is presented. The proposed BIST methodology takes advantage of the structural information of the RTL data path and reduces the test application time by grouping same-type modules into test compatibility classes (TCCs). During testing, 
Introduction
Register transfer level (RTL) is the abstraction level in the behavioral domain of the very large scale integration (VLSI) design flow where an integrated circuit is seen as sequential logic consisting of registers and functional units that compute the next state given the current memory state. The functional units which compute the next state logic are arithmetic logic units (ALUs), multipliers or complex multi-functional library modules. The complexity of modern digital circuits requires automated synthesis and optimization techniques that can explore a wide class of implementation choices using computer-aided design (CAD) tools [1]. High-level synthesis is the process of generating an RTL structure from a behavioral description [2]. The modules (functional units) allocated by high-level synthesis algorithms are generated by module generators which are able to synthesize the layout of modules with high performance and device density.
The modules are placed in module libraries and have identical physical information. Given the complexity of modern digital circuits, testability must be addressed at RTL, where the number of elements is smaller than at the gate level, which makes the test synthesis and test scheduling problems more tractable.
Previous work
Two main approaches have been proposed to enhance the testability of digital circuits at RTL.
The first approach is aimed at minimizing the complexity of automatic test pattern generation (ATPG). In [3] the complexity of ATPG for scan-based design for testability (DFT) techniques is reduced by an efficient selection of scan flip-flops using RTL information. The high test application time associated with scan-based techniques is overcome by using scan chain reconfiguration to reduce shifting time [4] and by partial scan design of RTL circuits [5]. However, a significant disadvantage of the scan-based technique is that at-speed testing with the complete test set is not possible (i.e., not all test patterns can be applied at the operational speed of the circuit). To solve at-speed testability, non-scan DFT techniques applicable to RTL data paths were proposed in [6]. Instead of selecting flip-flops to make controllable/observable, as in conventional scan-based techniques, execution units are selected using an execution unit graph.
Achieving at-speed testability and reducing test area overhead come at the cost of an expensive test pattern generation phase. To reduce the costs of test pattern generation, an algorithm that adds minimal test hardware in order to ensure that all the embedded modules in the circuit are hierarchically testable was presented in [7]. In [8] a technique for extracting functional (control/data flow) information from the RTL controller/data path is presented, thus avoiding the use of high-level information [7]. Recently, in [9], a testability analysis methodology for modular designs was introduced which extracts a set of justification and propagation requirements based on the cone-of-logic of each input and output. However, despite reducing both area overhead and ATPG complexity, the test application time and the volume of output data are still high.
The second approach to enhance testability of RTL circuits is built-in self-test (BIST) [10] .
While scan BIST [11] eliminates the use of ATPG, it still requires the high test application time and volume of output data associated with scan-based design. On the other hand, parallel BIST reduces both test application time and volume of output data [10]. From now onwards, BIST hardware synthesis refers to parallel BIST test hardware insertion for RTL data paths. BIST hardware synthesis at RTL can be further subdivided into functional-based and structural-based BIST hardware synthesis. Functional-based BIST hardware synthesis based on an algorithmic and deterministic BIST scheme was presented in [12]. This algorithm uses a high-level cell fault model, and data paths are assumed to be composed of only specific adders/subtracters and multipliers. Combining different BIST schemes and reusing pre-existing modules of the data path for functional-based BIST hardware synthesis under heterogeneous test schemes was proposed in [13]. Another functional-based BIST hardware synthesis technique [14] uses the controller netlist to extract the test control/data flow and derive a set of symbolic justification and propagation paths. In [15], regular expression-based high-level symbolic testability analysis further reduces test area overhead under delay constraints by carefully selecting a small subset of registers to serve as test pattern generators and output response analyzers. Recently, redundancy identification and testability improvement of digital filter data paths was proposed in [16]; it is restricted to circuits described as a network of shift, add, delay, sign-extension and truncation elements. All the previous functional-based BIST hardware synthesis techniques [12] [13] [14] [15] [16] depend strongly on the functional information of data path modules and/or high-level control/data flow.
On the other hand, structural-based BIST hardware synthesis inserts test registers by analysing interconnections between registers and modules in a given RTL netlist, without using the functional information of data path modules or high-level control/data flow. This makes structural-based BIST hardware synthesis more suitable at RTL than functional-based BIST hardware synthesis when only the structural interconnection of data path modules and registers is given. An early structural-based BIST hardware synthesis algorithm at RTL was presented in [17], without taking into account the test application time. Another structural-based BIST hardware synthesis algorithm that minimizes test application time and BIST area overhead was proposed in [18]. The algorithm, however, explores the testable design space inefficiently due to fixed test resource allocation, which means that the test hardware is allocated before the test scheduling process. Furthermore, the optimization algorithm limits the number of test plans to only four per module, leading to a limited number of explored testable designs.
To overcome the fixed test resource allocation, simultaneous test hardware insertion and test scheduling was proposed in [19]. While previous test scheduling algorithms [20] [21] [22] assumed fixed test resource allocation, the work in [19] presented an incremental test scheduling procedure which overcomes the limited testable design space exploration encountered with fixed test resources. Despite its good performance, the algorithm in [19] cannot handle complex designs such as a 32-point discrete cosine transform (DCT) in low computational time, since a branch and bound-based algorithm is employed to explore the testable design space. A recent approach which explores the testable design space during high-level synthesis has been proposed in [23]. However, the same test length is considered for all data path modules, which leads to unnecessarily long test application time.
Motivation and objectives
Up to this point, the described structural-based BIST hardware synthesis algorithms have assumed the BIST embedding methodology, where every module port is embedded between a test pattern generator and a signature analysis register. This methodology is inefficient due to the following four problems:
a. to achieve low test application time, a high number of test registers is required, which leads to large BIST area overhead and performance degradation.
b. since every module belongs to a different BIST embedding, aliasing can occur for every module tested separately, leading to an increase in fault-escape probability for the entire data path.
c. the increased number of signature analysis registers yields a large volume of output data and increases the overall test application time due to the time required to shift out the test responses.
d. the huge size of the testable design space, where test synthesis and test scheduling are strictly interrelated, leads to long computational time for efficient testable design space exploration.
To overcome the large number of test registers in the BIST embedding methodology (problem a), a methodology based on chaining modules into test paths was described in [24, 25] .
Randomness and transparency of data path modules [26] are used to guide the simultaneous test path generation and test scheduling. Despite reducing the performance degradation, the large number of test patterns for each test path, which are no longer truly pseudorandom, increased the test application time. The test path generation algorithm lacked a global view of the design space, and the suboptimal solution depends on the order in which the modules are processed. Furthermore, the pipelined test scheduling for multiple-clock-cycle test paths increases the complexity of the BIST controller as the design complexity grows. Concurrent checkers [27, 28] have been used for reducing fault-escape probability (problem b) during off-line self-test. While solutions with large BIST area overhead based on duplicate circuitry realized in complementary form are described in [27], the results presented in [28] show that the extra test hardware required to achieve low fault-escape probability, if designed as a combination of a concurrent checker and signature analysis registers, is more cost-effective than a design using only signature analysis registers. Recently, a different approach which combines mutual and signature testing schemes [29] has been proposed for reducing fault-escape probability. This approach uses test registers that combine equality comparators and signature analysis registers, which also reduces the volume of output data (problem c). However, due to the large number of test registers when maximum test concurrency is targeted, the problems of BIST area overhead and performance degradation are not solved. The previous approaches [24] [25] [26] [27] [28] [29] proposed separate solutions, each solving only one of the problems (a)-(c) at the expense of the other problems of the BIST embedding methodology.
Furthermore, the interrelation between test synthesis and test scheduling, which leads to the huge size of the testable design space (problem d), was not solved efficiently by the previously described approaches [17] [18] [19] [20] [21] [22], which trade off the quality of the final solution against computational time.
The aim of this paper is to introduce a new BIST methodology for RTL data paths using a new concept called test compatibility classes which reduces test application time with comparable or even lower BIST area overhead when compared to the traditional BIST embedding methodology. The proposed BIST methodology which targets data flow intensive application domains, like digital signal processing, communications and graphics, overcomes the performance degradation, fault-escape probability and volume of output data associated with the BIST embedding methodology. Furthermore efficient heuristics for testable design space exploration produce high quality of the final solution in low computational time. The paper is organized as follows. Section 2 introduces the TCC grouping methodology. BIST hardware synthesis for TCC grouping is given in section 3. Experimental results of benchmark and complex hypothetical data paths are presented in section 4. Finally, concluding remarks are given in section 5.
New BIST methodology for RTL data paths
This section motivates the key ideas presented in this paper through examples and gives formal concepts and definitions of the proposed BIST methodology. First the shortcomings of the traditional BIST embedding methodology are identified and benefits of the proposed BIST methodology are outlined using a detailed example. Then the formal definition of test compatibility classes is given.
An illustrative example
The traditional BIST embedding methodology embeds every module port between a test pattern generator and a signature analysis register. This may lead to conflicts between different test resources when maximum test concurrency is targeted. Furthermore, the number of test resources needed for low test application time is extremely high, leading to both high BIST area overhead and performance degradation. The proposed BIST methodology takes advantage of the structural information of the RTL data path and reduces the test application time by grouping same-type modules into test compatibility classes (TCCs). Two modules are of the same type if they are two different instances of the same module library prototype and hence have identical physical and structural information. Because of this, the fault sets of two same-type modules have the same detection probability profile [30]. Thus, the same test pattern generators can be used simultaneously (with no need to schedule the tests at different test times) for two or more same-type modules without decreasing the fault coverage. On the other hand, fault sets of different-type modules have different detection probability profiles, and hence different test pattern generators and different test application times are needed to satisfy the required fault coverage. It should be noted that the use of hard macro implementations of library modules which have identical physical and structural information can significantly improve the final design [31]. Furthermore, design methodologies which use regular elements and identify similarity need to be incorporated in state-of-the-art CAD tools [31, 32]. Therefore, the proposed BIST methodology targets design flows that use a few pre-designed module types with identical physical and structural information, and it exploits the regularity of the data path to reduce test application time and BIST area overhead, as explained in the following example.
Example 1
To give insight into the proposed BIST methodology consider the simple data path shown in Figure 1(a) Figure 1 fault-escape probability since faulty output responses which map into fault-free signatures in the BIST embedding methodology will be detected by the comparators. And thirdly, the number of signatures is reduced, which has the following two implications. On one hand, the volume of output data is reduced, which leads to less storage required for test data. On the other hand, the overall test application time is reduced because fewer clock cycles are needed to shift out the test responses. For example, given a data path width of 8 bits, the time required to shift out the output responses stored in MISR 7, MISR 8, and MISR 9 (Figure 1(a)) is 24 clock cycles, compared to only 8 clock cycles required to shift out the output response stored in MISR 7 (Figure 1(b)).
Solutions using comparators described in [27] k test patterns are required to test an n-input k-bit comparator. Any portion of the data path not tested by the proposed BIST methodology is tested using a small global set of functional patterns. Since comparators check the responses of same-type modules, which are inherently different cones-of-logic, the small global set of functional patterns can be generated easily using the justification/propagation techniques [8, 9]. The small global set of functional patterns is applied in a preliminary phase and has no impact on the overall test application time.
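The shift-out arithmetic in the example above can be checked with a minimal sketch. It assumes that every signature register is as wide as the data path and that signatures are shifted out serially, one bit per clock cycle; the function name `shift_out_cycles` is hypothetical and not part of the original work.

```python
def shift_out_cycles(num_signature_registers: int, width_bits: int) -> int:
    """Clock cycles needed to shift out all signatures serially.

    Assumption (not stated explicitly in the text): one bit leaves the
    chain per clock cycle and all registers share a single scan-out path.
    """
    return num_signature_registers * width_bits


# BIST embedding in Figure 1(a): three MISRs (MISR 7, MISR 8, MISR 9), 8 bits each.
embedding_cycles = shift_out_cycles(3, 8)

# TCC grouping in Figure 1(b): a single MISR (MISR 7), 8 bits.
tcc_cycles = shift_out_cycles(1, 8)

print(embedding_cycles, tcc_cycles)  # prints "24 8"
```

Under these assumptions the shifting time scales linearly with the number of signature analysis registers, which is why reducing the number of signatures shortens the overall test application time.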
Definition of Test Compatibility Classes
An RTL data path consists of n reg registers, n mod two-input modules of n res module-types, and multiplexers. Before test compatibility class concept is introduced, it is necessary to present the following preliminary definitions. The test registers used to perform TPGF are: LFSRs, built-in logic block observers (BILBOs) and concurrent BILBOs (CBILBOs). If for each input port l (IP l ) of every data path module, of different bit width are tested using different bit-width for test registers and n-input k-bit comparators. Moreover, the proposed methodology can handle both several modules chained together without any registers between them and particular cases when logic/RTL synthesis tools transform different instances of the same module type into different implementations by considering them as new module-types with new detection probability profile.
Example 2 To illustrate Definitions 1-3 consider the data path example of Figure 2 , where 
. Similarly, the ORS(TCC 0 1 ) is
Any of these two registers R 11 and R 12 can be configured as signature analysis register for TCC 0 1 . The procedure that chooses the best signature analysis register is presented in section 3.3. For data path example in Figure 2 the chosen signature analysis register for TCC 0 1 is R 12 whilst both TCC 0 0 and TCC 1 0 use R 7 as signature analysis register at different test times.
New BIST hardware synthesis algorithm for TCC grouping
Having described the TCC grouping methodology, a BIST hardware synthesis algorithm is now considered. As outlined in section 2, the BIST embedding methodology is a particular case of the TCC grouping methodology where each TCC consists of a single module. Therefore, the testable design space for the TCC grouping methodology is much larger and more complex than the testable design space for the BIST embedding methodology. The previous approaches [17] [18] [19] [20] [21] [22], which trade off the quality of the final solution against the computational time, are unsuitable for the size and complexity of the TCC grouping methodology. This section presents a new and efficient testable design space exploration which combines the accuracy of incremental test scheduling algorithms [19] with the exploration speed of test scheduling algorithms based on fixed test resource allocation [20] [21] [22]. Section 3.1 outlines the general framework of tabu search-based testable design space exploration. Section 3.2 presents the generation of new solutions and speed up techniques for local neighborhood search. Finally, in section 3.3 an incremental TCC scheduling algorithm for each solution is proposed.
Tabu search-based testable design space exploration
Tabu search [33] was proposed as a general combinatorial optimization technique. Tabu search falls under the larger category of move-based heuristics, which iteratively construct new candidate solutions based on the neighborhood that is defined over the set of feasible solutions and on the history of the optimization. The neighborhood is implicitly defined by a move that specifies how one solution is transformed into another solution in a single step. The philosophy of tabu search is to derive and exploit a collection of principles of intelligent problem solving. Tabu search controls uphill moves and stimulates convergence toward global optima by maintaining a tabu list of its r most recent moves, where r is a prescribed constant called the tabu tenure. Occasionally, it is useful to override the tabu status of a move when the move is aspirated (i.e., it improves the search and does not produce cycling near a local minimum). Tabu search-based heuristics are simple to describe and implement. Furthermore, a well-defined cost function and the use of topological information of the design space lead to an intelligent search for high quality solutions in very low computational time. Before the proposed tabu search-based testable design space exploration is described, it is necessary to present the following definition.
Definition 4 A solution in the testable design space is a partially testable data path PT-DP where test pattern generators are allocated for each data path module. A fully testable data path FT-DP is generated by allocating signature analysis registers for each test compatibility class of the partially testable data path.
The proposed tabu search-based testable design space exploration is summarized in Figure 3 .
The algorithm starts with an initial solution which is a partially testable data path PT-DP init obtained by randomly assigning a single test pattern generator to each input port of every module from the data path as shown from lines 1 to 4. During the optimization process (lines 5 to 21) for each current solution PT-DP current , a number of n reg neighbor solutions are generated as described in section 3.2. Test application time T x and BIST area overhead A x are computed after a fully testable data path FT-DP x and a test schedule S x are generated using the algorithms from section 3.3, as shown from lines 8 to 12. The optimization process is guided towards the objective of minimal test application time design by a cost function which is defined as follows.
Definition 5 The cost function is a 2-tuple
where T x is the test application time, A x is the BIST area overhead and the following relations are defined:
The main objective of the cost function is test application time, with BIST area overhead used as a tie-breaking mechanism among many possible solutions with the same test application time. It should be noted that the minimization of the other parameters outlined in section 2 (performance degradation, volume of output data, overall test application time and fault-escape probability) is a by-product of the proposed optimization using the previously defined cost function. Based on the value of the cost function and on the tabu status of a move, a new solution is accepted or rejected as described from lines 14 to 19 in Figure 3.
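The interplay between the two-component cost function, the tabu list and the aspiration criterion described above can be illustrated with a minimal, generic sketch. It is not the algorithm of Figure 3: the function `tabu_search`, the toy neighborhood and the toy cost are hypothetical, and the only properties taken from the text are that cost is a pair (test application time, BIST area overhead) compared with the first component dominating, that the r most recent moves are tabu, and that a tabu move is still accepted when it improves on the best solution found so far.

```python
from collections import deque


def tabu_search(initial, neighbors, cost, tenure=5, iterations=50):
    """Generic tabu search; cost(solution) returns a (T, A) tuple.

    Python compares tuples lexicographically, so T decides first and A
    only breaks ties, mirroring the tie-breaking role of area overhead.
    """
    current = best = initial
    best_cost = cost(initial)
    tabu = deque(maxlen=tenure)  # the r most recent moves
    for _ in range(iterations):
        candidates = []
        for move, solution in neighbors(current):
            c = cost(solution)
            aspirated = c < best_cost  # aspiration: better than best so far
            if move not in tabu or aspirated:
                candidates.append((c, move, solution))
        if not candidates:
            break
        # Take the best admissible neighbor, even if it is an uphill move.
        c, move, current = min(candidates, key=lambda item: item[0])
        tabu.append(move)
        if c < best_cost:
            best, best_cost = current, c
    return best, best_cost


# Toy problem (purely illustrative): solutions are integers, a move is
# identified by the integer it leads to.
def toy_neighbors(x):
    return [(x - 1, x - 1), (x + 1, x + 1)]


def toy_cost(x):
    return ((x - 7) ** 2, abs(x))  # stands in for (T_x, A_x)


best, best_cost = tabu_search(0, toy_neighbors, toy_cost)
print(best, best_cost)  # prints "7 (0, 7)"
```

Representing the cost as a tuple keeps the comparison rule in one place: a solution with lower test application time always wins, and area overhead matters only between solutions with equal test application time.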
Generation of new solutions and speed up techniques for local neighborhood search
The neighborhood of the current solution in the testable design space PT-DP current is defined with n reg feasible neighbor solutions. For each data path register there is a single neighbor solution. Each of the n reg solutions is provided by an independent subroutine designed to identify better configuration of test registers based on two new metrics. Due to the huge size and complexity of the testable design space, speed up techniques for efficient exploration are required.
Before defining the neighbor solution for each register two new metrics and a theorem used for reducing the testable design space are presented.
Definition 6 The current spatial sharing degree $C_{SSD}$ of module-type $j$ is the number of modules of type $j$ for which register $R_x$ performs the test pattern generation function (TPGF) for input port $IP_k$ in the current partially testable data path.
Definition 7 The maximum spatial sharing degree $M_{SSD}$ of module-type $j$ is the number of modules of type $j$ for which register $R_x$ can perform TPGF for input port $IP_k$. The value of $M_{SSD}$ is the cardinality of the set of modules of module-type $j$ whose $IP_k$ is connected to $R_x$ through only multiplexers.
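The two metrics can be computed directly from their definitions, as the following minimal sketch suggests. The data structures are assumptions made for illustration only: `module_type` maps each module to its type, `mux_reachable` maps each (module, input port) pair to the set of registers connected to that port through only multiplexers, and `tpg_assignment` maps each (module, input port) pair to the register currently performing TPGF for it.

```python
def max_ssd(reg, mtype, port, module_type, mux_reachable):
    """Maximum spatial sharing degree: modules of type `mtype` whose input
    port `port` is connected to `reg` through only multiplexers."""
    return sum(
        1
        for (module, p), regs in mux_reachable.items()
        if p == port and module_type[module] == mtype and reg in regs
    )


def current_ssd(reg, mtype, port, module_type, tpg_assignment):
    """Current spatial sharing degree: modules of type `mtype` for which
    `reg` currently performs TPGF on input port `port`."""
    return sum(
        1
        for (module, p), r in tpg_assignment.items()
        if p == port and module_type[module] == mtype and r == reg
    )


# Hypothetical data path fragment: three multipliers and one adder.
module_type = {"M1": "mul", "M2": "mul", "M3": "mul", "M4": "add"}
mux_reachable = {
    ("M1", 0): {"R1"},
    ("M2", 0): {"R1", "R2"},
    ("M3", 0): {"R1"},
    ("M4", 0): {"R1"},
}
tpg_assignment = {("M1", 0): "R1", ("M2", 0): "R2", ("M3", 0): "R1", ("M4", 0): "R1"}

m = max_ssd("R1", "mul", 0, module_type, mux_reachable)
c = current_ssd("R1", "mul", 0, module_type, tpg_assignment)
print(m, c)  # prints "3 2"
```

In this fragment R1 could generate patterns for port 0 of all three multipliers but currently does so for only two of them, so the gap between potential and actual use of R1 for this module type is one module.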
The following theorem presents a very important theoretical result which has two implications on speeding up local neighborhood search. See [34] for the proof.
Theorem 1 Consider two current solutions, PT-DP 1 current and PT-DP 2 current , with different C SSD
for given R x , j and IP k . In PT-DP 1 current the current spatial sharing degree is 0
, whilst in PT-DP 2 current the current spatial sharing degree is
. Then PT-DP 2 current has at most as many TCCs as PT-DP 1 current.
The first implication of the theoretical result of Theorem 1 reduces the total testable design space to the representative testable design space. The total testable design space consists of partially testable data paths with all the possible values 0 C SSD i. The first phase computes:
x is a metric that measures the difference between the potential and actual use of R x as a test pattern Let j max be the index of module-type and k max be the index of input port for which max is maximum. Let
be the set of modules of module-type j max whose IP k max is connected to R x through only multiplexers. Before the move, R x performs TPGF for
. After the move, R x performs TPGF for by reduction in the size of the testable design space to be explored.
Incremental TCC scheduling algorithm
So far the testable design space to be explored was reduced with respect to the number of test registers required for test pattern generation using the speed up techniques for local neighborhood search. The algorithms outlined in this section further shrink the size of the testable design space by considering simultaneous TCC scheduling and signature analysis registers allocation for each partially testable data path generated by local neighborhood search. Firstly the assignment of every data path module to test compatibility classes to maximize test concurrency is summarized. Secondly the algorithm for simultaneous TCC scheduling and signature analysis registers allocation is described.
To achieve maximum test concurrency it is required that a large number of different-type test compatibility classes are compatible. Following the second property of TCCs (Definition 3-(ii)) a high number of incompatible modules are sought to be merged in a small number of in- , it is checked whether R k belongs to the used test register set and the R k with the maximum fanin is chosen; this choice will allow R k to be reused at a later test time.
ii. when the shortest currently active test TCC i j is completed, the test register R k that has served as signature analysis register is removed from the busy register set B and added to the used register set U .
iii. after the completion of test scheduling all the registers from the used register set U are modified to signature analysis registers; the algorithm returns a test schedule S and a fully testable data path FT-DP which are used to compute test application time and BIST area overhead in the tabu search testable design space exploration (Figure 3 ). 
Experimental results
The BIST hardware synthesis for the TCC grouping methodology has been implemented on a SUN SPARC 20 workstation using 6000 lines of C++ code. To give insight into the efficiency of testability achieved using the presented approach Table 1 So far the reductions in TAT and BIST area overhead achieved by the TCC grouping methodology when compared to the BIST embedding methodology were outlined. Table 2 Table 1 For most of the designs the proposed TCC grouping reduces TAT when compared to BIST embedding. However, when both BIST embedding and TCC grouping achieve low TAT, the reductions in BIST area overhead, number of test registers (impact on performance degradation), volume of output data and overall test application time are substantial. For example, in the case of EX-9, reductions of 50% in TAT, 23% in BAO, 47% in TR, 94% in VOD and 61% in overall TAT are achieved. Furthermore, the computational time for obtaining high quality solutions is still very low relative to the size of the testable design space. For example, it took less than 600 s to find high quality solutions for data paths with 45 modules and up to 115 registers.
Finally, Figure 5 shows how the proposed TCC grouping methodology decreases the fault-escape probability when compared to the BIST embedding methodology. The experiments were done for a data path module with $10^6$ possible error sequences, where the number of aliasing error sequences, for a given characteristic polynomial of the signature analysis register, varies from 10 to 90. The fault-escape probability of a module varies from $P_m = 0.01\%$ to $P_m = 0.09\%$. As can be seen from Figure 5(a), in the case of the BIST embedding methodology the fault-escape probability for a group of modules increases as the number of modules tested simultaneously increases.
On the other hand, in the case of the TCC grouping, the fault-escape probability decreases exponentially with the number of modules tested simultaneously as shown in Figure 5 (b). This is due to the fact that a fault is not detected in the TCC grouping methodology only when initially the n-input k-bit comparator fails to detect the fault and subsequently the signature of a TCC also fails to detect the fault. A previous work on reducing fault-escape probability at the expense of increased area overhead, performance degradation, and volume of output data was presented in [40] . Note that the proposed methodology does not introduce any area overhead, nor performance degradation, whilst the reduction in fault-escape probability is exponential.
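The increasing trend for BIST embedding can be reproduced with a simple probabilistic sketch. The closed-form expressions behind Figure 5 are not given in the text, so the model below is an assumption: each module is tested by its own signature analysis register, aliasing events are independent, and a fault escapes for the group if it escapes in at least one module. The function name `group_escape_probability` is hypothetical, and no model is attempted for the TCC grouping curve.

```python
def group_escape_probability(p_module: float, n_modules: int) -> float:
    """Probability that at least one of n independently tested modules
    aliases, assuming the same per-module escape probability p_module."""
    return 1.0 - (1.0 - p_module) ** n_modules


p_m = 0.0001  # 0.01%, the lower end of the per-module range quoted in the text

# 1 to 8 modules tested simultaneously, as in Figure 5.
values = [group_escape_probability(p_m, n) for n in range(1, 9)]
for n, value in enumerate(values, start=1):
    print(n, f"{value:.6%}")
```

For small per-module probabilities the result grows almost linearly with the number of modules (approximately n times the per-module value), which agrees qualitatively with the trend described for Figure 5(a).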
Conclusion
This paper has addressed the testability of RTL data paths. It has been shown that an improvement in terms of test application time, BIST area overhead, performance degradation, volume of output data, overall test application time (the sum of test application time and shifting time required to shift out test responses) and fault-escape probability is achieved using the newly introduced test compatibility classes-based methodology. Furthermore the proposed BIST hardware synthesis algorithm achieves high quality of the final solution in low computational time.
The proposed methodology and the BIST hardware synthesis algorithm have been successfully integrated in a high-level synthesis design flow [36], leading to a shorter design cycle by considering testability at higher levels of abstraction than the gate level. This reinforces the conclusion reached recently by other researchers [6] [7] [8] that testability of digital circuits is best explored and optimized at the register transfer level. Since the proposed methodology targets RTL data paths of data flow intensive designs, future work will investigate integrated controller/data path testing for both data flow and control flow intensive circuits.
Figure 5: Comparison in fault-escape probability when 1 to 8 same-type modules are tested simultaneously in BIST embedding and TCC grouping methodologies. (b) Decrease in fault-escape probability for TCC grouping.