Vector reordering is an essential task in testing VLSI systems because it affects the test process from two perspectives: power consumption and correlation among data. Power consumption is crucial: if it is not properly controlled during testing, it may result in permanent failure of the device-under-test (DUT). Correlation is also important because coding schemes exploit it to compress test data efficiently, which eases the memory requirements of Automatic Test Equipment (ATE), reduces the volume of data, and lowers the test application time. Reordering, however, is NP-complete. This paper presents an evaluation of different heuristic techniques for vector reordering using ISCAS85 and ISCAS89 benchmark circuits in terms of time and quality. For this application, it is shown that the best heuristic technique is not the well-known Christofides or Lin-Kernighan, but the Multi-Fragment technique.
INTRODUCTION
Recently, there has been tremendous growth in the development and application of intellectual property (IP) cores [1]. These cores are provided by third-party vendors and are often shipped with test data so that the core integrator can apply the data to the design after manufacturing to ensure its correct operation. As the complexity of these Sea Of Cores (System On Chip) systems increases, testing has become a significant bottleneck.
During testing, all cores must be tested to ensure that they work properly; this requires considerable power consumption, which may often be higher than for normal operation [2]. A higher power consumption (i.e. more energy is drawn from the supply) means that more energy is dissipated through the substrate of the circuit, resulting in increased heat dissipation which may burn devices. Several techniques have been studied to address this problem. A test generation technique for low power has been discussed in [3]. Another approach is based on scheduling so that, during testing, maximum power consumption is kept under a certain threshold to avoid burning the DUT [4]. Previous works ([5] [6] [7]) have also suggested reordering test vectors so that the Hamming distance between adjacent vectors is minimal. They have shown empirically that a minimal Hamming distance translates into lower activity, thus reducing the power consumption of the DUT.
Additionally, the volume of test data can also increase rapidly. This increase affects testing in two ways: 1) test time increases in proportion to data volume and 2) the memory requirement of the test equipment also increases. To cope with the increasing volume, many works have suggested compressing test data [8] [9]. This technique relies on pre-processing vectors by reordering them (i.e. minimizing the Hamming distance between adjacent vectors), differencing each vector with its successor, and finally applying a coding scheme (e.g. Run-Length coding) to compress the processed data. Reordering is thus an essential step for extracting the correlation among test data for compression.
Because of the above two issues, vector reordering is a critical task for manufacturing test of VLSI systems. Vector reordering, however, translates into a traveling salesman problem (TSP), which is a known NP-complete problem. The TSP instance for vector reordering is represented by a complete graph with a large number of nodes. Therefore it is essential to find a good heuristic solution in terms of time and quality. The objective of this work is to evaluate a number of different heuristic approaches for this problem in terms of execution time and quality, and to present the one that results in the best overall performance for the vector ordering application. The rest of this paper is organized as follows. In Section 2, basic concepts and definitions are presented. Section 3 examines a number of heuristic approaches and presents their time complexity and quality. Section 4 presents the simulation results for a number of heuristics used in reordering vectors for a number of benchmark circuits, followed by conclusions in Section 5.
BASIC DEFINITIONS
Consider a test set for a combinational (or full-scan sequential) circuit given by V = {v1, v2, ..., vn} where |V| = n. Each vector vi is formed by a fixed ordered set of bits bj, i.e. vi = (b1, b2, ..., bm). The Hamming distance between two vectors vi = (b_k) and vj = (c_k) is defined as the number of 1s obtained from the operation vi XOR vj, where XOR is the bitwise exclusive-or operation, i.e.

$$HD(v_i, v_j) = \sum_{k=1}^{m} b_k \oplus c_k$$
The total Hamming distance of a given ordering (sequence) is calculated by summing the Hamming distance between each adjacent pair of vectors, starting from the first vector. For an ordering π of the n vectors, this is

$$\sum_{i=1}^{n-1} HD(v_{\pi(i)}, v_{\pi(i+1)})$$
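As a minimal sketch of these two definitions, the following Python fragment computes the Hamming distance of a vector pair and the total Hamming distance of an ordering. The function names and the four-vector test set are illustrative assumptions and are not taken from the benchmark data.

```python
from typing import Sequence


def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def total_hamming_distance(vectors: Sequence[str], order: Sequence[int]) -> int:
    """Sum of Hamming distances between adjacent vectors in the given order."""
    return sum(
        hamming_distance(vectors[order[i]], vectors[order[i + 1]])
        for i in range(len(order) - 1)
    )


# Hypothetical 4-vector test set (illustration only).
vectors = ["0000", "1111", "0001", "1110"]
original = total_hamming_distance(vectors, [0, 1, 2, 3])
reordered = total_hamming_distance(vectors, [0, 2, 1, 3])
print(original, reordered)  # 11 5
```

For this small set, moving each vector next to its closest neighbor reduces the total from 11 to 5 bit flips.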
The reordering problem consists of finding an ordering of vectors that gives a minimal total Hamming distance. To solve this problem, a graph is generated by assigning a vertex to each vector and an edge between every pair of vertices (vectors). The fully connected graph is weighted; each edge weight is equal to the Hamming distance between the two vectors it connects [7] [8]. This graph is undirected because the XOR operation is commutative. Figure 1 shows a graph constructed for the sample test data.
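The graph construction can be sketched as a symmetric weight matrix, assuming the vectors are stored as bit strings. The helper name build_weight_matrix and the sample vectors are hypothetical.

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def build_weight_matrix(vectors):
    """Complete-graph edge weights: entry [i][j] is HD(vectors[i], vectors[j])."""
    n = len(vectors)
    return [
        [hamming_distance(vectors[i], vectors[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical sample test data (illustration only).
vectors = ["0000", "1111", "0001", "1110"]
weights = build_weight_matrix(vectors)
for row in weights:
    print(row)
```

The matrix is symmetric with a zero diagonal, which reflects the undirected nature of the graph. Its size grows quadratically with the number of vectors, so a full matrix is only a convenient representation for moderate test-set sizes.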
If a cycle is found that visits every node of this graph exactly once, and the sum of its edge weights is minimal over all such cycles (a minimum Hamiltonian cycle), then the cycle also gives the optimal order of the test vectors. This corresponds to the traveling salesman problem (TSP), in which a traveling salesperson visits all cities and returns to the original city along the shortest route. The ordering found from TSP directly affects the so-called correlation [8] among test vectors for compression. Since the total Hamming distance is minimal and vectors are bitwise XORed, long runs of 0s appear in the test data. These long runs are used to compress the test data by employing a coding technique such as Run-Length or Golomb [8] [9].
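To illustrate why a small total Hamming distance produces long runs of 0s, the sketch below XORs each vector with its neighbor and lists the lengths of the zero runs. It makes several assumptions that are not from the original work: the first vector is kept unchanged, each later vector is XORed with its predecessor, and a run is recorded as the number of 0s preceding each 1, with a final entry for trailing 0s. Actual coding schemes differ in these details.

```python
def xor_bits(a, b):
    """Bitwise XOR of two equal-length bit strings."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def difference_stream(vectors):
    """Keep the first vector, then XOR every vector with its predecessor."""
    diffs = [vectors[0]]
    diffs += [xor_bits(vectors[i - 1], vectors[i]) for i in range(1, len(vectors))]
    return "".join(diffs)


def zero_run_lengths(bits):
    """Count of 0s before each 1, plus one final entry for trailing 0s."""
    runs, count = [], 0
    for bit in bits:
        if bit == "0":
            count += 1
        else:
            runs.append(count)
            count = 0
    runs.append(count)
    return runs


# Hypothetical vectors, already placed in a low-Hamming-distance order.
ordered = ["0000", "0001", "1111", "1110"]
stream = difference_stream(ordered)
print(stream)                    # 0000000111100001
print(zero_run_lengths(stream))  # [7, 0, 0, 0, 4, 0]
```

The same four vectors in the order 0000, 1111, 0001, 1110 give a difference stream with 11 ones instead of 5, so fewer and shorter zero runs are available to the coder.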
The ordering found by solving the TSP indirectly affects power consumption during testing. Applying vectors to a DUT in such an order triggers minimal activity on the primary inputs of the circuit; in general, however, it does not guarantee minimal activity over the internal nodes of the DUT. It has been shown empirically that the ordering with minimal total Hamming distance also produces an efficient low-power manufacturing test [7]. For better results, edge weights can be set according to the total switching activity that the application of a vector pair (vi, vj) triggers inside the chip. This results in two arcs per edge because the activity for (vj, vi) may be different from that for (vi, vj). As added complexity, logic simulation of the circuit is required for every possible vector pair to obtain the edge weights. This is computationally more expensive than calculating Hamming distances.
Assume that the DUT has a total of N nodes including primary inputs, internal nodes, and primary outputs. The switching activity triggered over the DUT when two vectors (vi, vj) are applied in that order is defined as

$$SA(v_i, v_j) = \sum_{k=1}^{N} Node(k, v_i) \oplus Node(k, v_j)$$

where the Node(k, vi) function returns a boolean value for node k when vector vi is applied. Then, the total switching activity for a given ordering π is calculated as:

$$\sum_{i=1}^{n-1} SA(v_{\pi(i)}, v_{\pi(i+1)})$$
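The following sketch illustrates the idea on a toy example. It reads the definition as a count of nodes whose logic value differs between the two vectors, and it uses a zero-delay simulation of a hypothetical three-input circuit (an AND gate feeding an OR gate). Both the circuit and the function names are assumptions made for illustration; under a zero-delay model this weight is symmetric, whereas a general delay model may produce different values for the two directions.

```python
def node_values(vector):
    """Zero-delay logic values of all nodes of a hypothetical 3-input circuit."""
    a, b, c = (bit == "1" for bit in vector)
    n1 = a and b    # internal node: AND of the first two inputs
    out = n1 or c   # primary output: OR of the internal node and third input
    return [a, b, c, n1, out]


def switching_activity(vi, vj):
    """Number of nodes whose value changes when vj is applied after vi."""
    return sum(x != y for x, y in zip(node_values(vi), node_values(vj)))


def total_switching_activity(vectors, order):
    """Sum of switching activity over adjacent vector pairs of an ordering."""
    return sum(
        switching_activity(vectors[order[i]], vectors[order[i + 1]])
        for i in range(len(order) - 1)
    )


# Hypothetical vectors (illustration only).
vectors = ["000", "110", "001"]
print(switching_activity("110", "000"))              # 4
print(switching_activity("110", "001"))              # 4
print(total_switching_activity(vectors, [0, 1, 2]))  # 8
```

The pair (110, 000) has Hamming distance 2 but toggles 4 nodes, which shows how activity on internal nodes can exceed activity on the primary inputs.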
In general, the directed graph generated by mapping vectors to vertices and SA() to arcs is still not a precise model for measuring the power consumption of a given ordering π, because different nodes may have different capacitances. However, it is better than using an undirected graph in which the edge weights are given only by the HD() function.
Graph Model and Complexity Bounds
In this section, we study the characteristics of the graph obtained for the reordering problem and discuss several different graph models. This study is useful in analyzing the complexity and quality of a few possible heuristic solutions to TSP. Consider the minimum tour length for a TSP instance I.
Many TSP instances are representative of real applications and have the property of triangularity (footnote 2). This means that the shortest path between two vertices is always the direct edge between the two, i.e. given three arbitrary vertices vi, vj, vk, the following condition holds [11], where d(·,·) denotes the edge weight:

$$d(v_i, v_j) \le d(v_i, v_k) + d(v_k, v_j)$$

For example, the Christofides heuristic [14] (which is discussed in more detail in the next section) guarantees a tour whose length exceeds the minimum by at most half of the minimum tour length (a worst-case ratio of 1.5).
Another important class of TSP instances is referred to as Euclidean instances, in which the vertices are located in a plane and their distance is described using the so-called l2 norm, i.e., for two vertices i, j located at geometric positions (xi, yi) and (xj, yj) respectively, the distance is given by

$$d(i, j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$

Footnote 1: Assuming P ≠ NP.
Footnote 2: Some authors call this the metric property.
The Euclidean TSP instances are a subset of the triangular TSP instances, i.e. a Euclidean TSP instance is a triangular TSP instance, but not vice versa [11]. This is shown graphically in Figure 3.
The TSP category for vector reordering, with edge weights given by the HD() function, is characterized in [5] as follows:
Theorem 3. The TSP instance constructed for test vector reordering (assuming edge weights are established by the HD() function), is triangular.
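The intuition behind the theorem is that, in any bit position where two vectors differ, a third vector must differ from at least one of them, so the detour through the third vector can never be shorter. As an informal check (not a proof), the sketch below verifies the triangle condition exhaustively over all 4-bit vectors. The function names are illustrative assumptions.

```python
from itertools import product


def hamming_distance(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def is_triangular(vectors):
    """True if HD(a, c) <= HD(a, b) + HD(b, c) for every triple of vectors."""
    return all(
        hamming_distance(a, c) <= hamming_distance(a, b) + hamming_distance(b, c)
        for a, b, c in product(vectors, repeat=3)
    )


# All 16 four-bit vectors, i.e. 4096 ordered triples.
all_4bit = ["".join(bits) for bits in product("01", repeat=4)]
print(is_triangular(all_4bit))  # True
```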
When edge weights are instead set by the SA() function, i.e. according to the transitions over all internal nodes in order to model the power consumption of the DUT, the graph may be either triangular or non-triangular, depending upon the delay model used in the analysis. Under the general delay model, the graph may not be triangular [5].
SOLUTIONS TO THE TSP
Many heuristic criteria are available for solving TSP. Two classes of heuristics for TSP can be identified as given below [11]: 1) Tour Construction Heuristics: The heuristic criterion gradually constructs an ordering (possibly for a minimal tour); 2) Local Search Heuristics: The heuristic criterion starts from an initial ordering (tour) and gradually refines it into a new and possibly better ordering. There are a number of algorithms in each category. Table 1 shows sample algorithms which fall in each category [11] [12]. For example, the Nearest-Neighbor [11] approach is in the class of Tour Construction heuristics, in which an ordering is gradually built by adding edges from the TSP graph. In this algorithm, the salesperson starts from any city (graph node), moves to the nearest unvisited city, and repeats this rule until all cities have been visited, then returns to the initial city.
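A minimal sketch of the Nearest-Neighbor rule applied to vector ordering is given below. It assumes bit-string vectors, Hamming-distance edge weights, and ties broken by the lower vector index; the tie-breaking rule and the function names are assumptions of this sketch. The return to the initial city is omitted because only the order of the vectors is needed.

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor_order(vectors, start=0):
    """Visit vectors by repeatedly moving to the closest unvisited one."""
    unvisited = set(range(len(vectors)))
    unvisited.remove(start)
    order = [start]
    while unvisited:
        last = vectors[order[-1]]
        # Closest unvisited vector; ties are broken by the lower index.
        nxt = min(unvisited, key=lambda i: (hamming_distance(last, vectors[i]), i))
        unvisited.remove(nxt)
        order.append(nxt)
    return order


# Hypothetical test set (illustration only).
vectors = ["0000", "1111", "0001", "1110"]
print(nearest_neighbor_order(vectors))  # [0, 2, 1, 3]
```

Each step scans all unvisited vectors, so this straightforward version performs on the order of n squared distance evaluations for n vectors.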
Execution Time Complexity
In general, the execution time complexity of local search heuristics is worse than that of tour construction heuristics. An analysis of the worst-case time complexity of a number of TSP heuristics is reported in Table 2, where n is the number of nodes in the TSP graph. These results are valid assuming the TSP instance is triangular [11] [12].
Tour Quality
In addition to time complexity, another performance measure is quality. Quality is related to the tour length that the heuristic produces. The heuristic tour length is always greater than or equal to the minimum tour length, hence quality is defined as the length ratio between the heuristic tour and the minimum tour. The quality of the algorithms for TSP is summarized in Table 2 [11] [12]. For example, the Nearest-Neighbor heuristic algorithm [11] for a TSP of 1000 nodes guarantees that the heuristic solution length is less than or equal to 0.5(⌈log2 1000⌉ + 1) = 0.5(⌈9.966⌉ + 1) = 5.5 times the minimum tour length. No better guarantee is possible. The best tour construction heuristic, which was proposed by Christofides [14], guarantees a worst-case tour length of 1.5 times the minimum tour length. The importance of the Christofides algorithm is that its quality is independent of the number of nodes. Together, these two measures (time and quality) can lead to a fair comparison among heuristics.
Table 2: Worst case quality and execution time complexity of TSP heuristics
EVALUATING HEURISTIC CRITERIA
We have considered two well-known heuristics, one from the class of tour construction and one from the class of local search algorithms. The first heuristic is Christofides, which has been used for vector ordering to reduce power consumption [5]. The second heuristic is Lin-Kernighan, which has been used in [15] for vector ordering in data compression. Additionally, we have considered a number of other heuristics including Nearest-Addition, Nearest-Neighbor, Clarke-Wright, and Multi-Fragment [11] [12]. We generated fully-specified vectors using HITEC [16] for each of the ISCAS85 and (full-scan version) ISCAS89 circuits and created a triangular graph model for each vector set.
The GNU TSP solver program (tsp-solve) was compiled on an Alpha workstation. Two heuristics (Nearest-Neighbor and Multi-Fragment) were also implemented in C. The first is not implemented in the GNU TSP solver; the second was implemented to check the results of the Multi-Fragment heuristic against the TSP solver. The original source code of the TSP solver does not cover instances with n > 2400, so we have modified the code to remove this limit and allow benchmark circuit s38417 to be examined.
The graph model of each benchmark circuit was provided to the TSP solver and the C program to find the execution time and quality of the solution. Table 3 shows the results. The results for the Nearest-Neighbor heuristic are from the C program and not the TSP solver. There are two sets of results for the Multi-Fragment heuristic: one from the TSP solver (Multi-Fragment1) and one from the C implementation (Multi-Fragment2). The quality in the table is measured as the tour length obtained by the heuristic divided by a lower bound on the tour length (obtained by solving the problem as a relaxed integer linear program). This substitution is needed because it is not possible to obtain the optimum tour and follow the definition of quality presented earlier.
The Multi-Fragment heuristic performs very close to the minimal tour with a very short execution time. In terms of quality, the Multi-Fragment heuristic performs better than all heuristics but one, i.e. Lin-Kernighan. However, the Multi-Fragment execution time is significantly smaller than that of Lin-Kernighan. Table 4 shows the average execution time and quality of each heuristic over the 18 circuits of Table 3.
The Multi-Fragment heuristic starts by sorting all edges in the TSP graph in ascending order of length. A tour is then constructed by selecting safe edges in that order. An edge is safe if adding it to the currently constructed tour does not create a loop of length less than n (where n is the number of nodes) and does not create a node of degree 3 (the degree of a node is the number of edges incident upon it).
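The Multi-Fragment steps described above can be sketched in Python as follows. This is not the C implementation or the tsp-solve code used in the experiments; it is an illustrative version that assumes bit-string vectors, Hamming-distance weights, and a union-find structure to detect premature loops. It stops after n - 1 safe edges, which yields a vector ordering (a path); adding the single remaining edge between the two end vectors would close the tour.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def multi_fragment_order(vectors):
    """Greedy edge selection: shortest safe edges first, then read off the path."""
    n = len(vectors)
    if n < 2:
        return list(range(n))
    # All edges of the complete graph, sorted by ascending weight.
    edges = sorted(
        (hamming_distance(vectors[i], vectors[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))  # union-find forest over the fragments

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    neighbors = [[] for _ in range(n)]
    accepted = 0
    for _, i, j in edges:
        if len(neighbors[i]) >= 2 or len(neighbors[j]) >= 2:
            continue  # unsafe: would create a node of degree 3
        root_i, root_j = find(i), find(j)
        if root_i == root_j:
            continue  # unsafe: would close a loop shorter than n
        parent[root_i] = root_j
        neighbors[i].append(j)
        neighbors[j].append(i)
        accepted += 1
        if accepted == n - 1:
            break
    # Walk the resulting path starting from one of its two end vectors.
    start = next(v for v in range(n) if len(neighbors[v]) < 2)
    order, prev = [start], None
    while len(order) < n:
        cur = order[-1]
        nxt = next(v for v in neighbors[cur] if v != prev)
        order.append(nxt)
        prev = cur
    return order


# Hypothetical test set (illustration only).
vectors = ["0000", "1111", "0001", "1110"]
print(multi_fragment_order(vectors))  # [1, 3, 0, 2]
```

For this example the ordering 1111, 1110, 0000, 0001 has a total Hamming distance of 5. Sorting all edges dominates the cost of this simple version, which is one reason the heuristic remains fast on large vector sets.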
CONCLUSION
An experimental evaluation of test vector ordering heuristics has been presented using quality and execution time as figures of merit. It has been shown that the Multi-Fragment heuristic performs better than the Christofides and Lin-Kernighan heuristics in terms of time using realistic benchmark vector sets. The Multi-Fragment heuristic also outperforms the Christofides heuristic in terms of quality and achieves performance very close to Lin-Kernighan. We recommend that ordering algorithms use the Multi-Fragment heuristic to obtain near-minimal ordered sets of vectors that result in both reduced power consumption and an enhanced data compression ratio.
