Abstract
Introduction
Advances in semiconductor manufacturing technology are triggering new design and test methodologies, which are necessary to cope with the increased chip complexity [1]. For example, system-on-a-chip (SOC) design using reusable intellectual property (IP) cores is emerging as a new implementation paradigm. IP cores are pre-designed and pre-verified by core providers; however, SOC composition, design verification and manufacturing test fall within the duties of the system integrator [2]. The latter, including the test of the cores and of the entire SOC, requires special test access mechanisms (TAMs).
To enable both core reuse and easy test access, the embedded cores are connected to TAMs using special interfaces called core wrappers [3] . While the use of core wrappers guarantees high test quality by facilitating the test of individual cores, it also influences the cost of test. This is because the core wrapper design influences both the test time and the volume of test data, which are two essential factors that determine the cost of SOC test.
A common solution to reduce the test time in core-based systems is to use multiple scan chains. Due to various design constraints (e.g., routing overhead, scan path length), the multiple scan chains are not always balanced (i.e., not all the scan chains have equal length). In order to reduce on-chip control when feeding multiple scan chains, the test vectors are augmented with "don't care" bits to account for the differences between the scan chains' lengths. This is exemplified in Figure 1, where for three scan chains of lengths 3, 6 and 4 the test tools "pad" the scan patterns with "don't cares" to make them of equal length. These "don't cares" are shown as Xs in the figure. Therefore, due to unbalanced scan chains, the test data comprises useful test data (the scan chain data) and useless test data (the padded data). For example, based on experimental data, when minimum test time is attained for core Module6 of system p93791, from the ITC02 benchmarks [4], the amount of test data is 9154.2k, of which useful test data is 5170.6k while useless test data is 3983.6k. Hence, the useless test data represents 44% of the total amount of test data. Since this useless test data is explicitly allocated within the automatic test equipment (ATE) memory, it affects the ATE memory requirements, and it is therefore referred to as useless memory allocation (UMA). The UMA for one test vector represents the number of bits required to make the scan chains of equal length.
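The padding arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function name `pad_scan_patterns` is hypothetical, and the Module6 figures are read here as 9154.2k and 3983.6k.

```python
def pad_scan_patterns(chain_lengths):
    """Return (useful_bits, padded_bits) for one test vector."""
    depth = max(chain_lengths)                 # every chain is padded to the longest one
    useful = sum(chain_lengths)                # real scan chain data
    padded = depth * len(chain_lengths) - useful  # "don't care" bits, i.e. UMA per vector
    return useful, padded


# Scan chains of lengths 3, 6 and 4, as in the Figure 1 example.
useful, padded = pad_scan_patterns([3, 6, 4])
print(useful, padded)  # 13 5

# Share of useless test data for the quoted Module6 figures (in k).
total_k, useless_k = 9154.2, 3983.6
print(round(100 * useless_k / total_k))  # 44
```

For the three chains of Figure 1, 5 of the 18 stored bits per vector are padding.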
The volume of test data (VTD) is an emerging concern for testing complex SOCs [2, 5] since it directly influences the ATE memory requirements, and hence, the cost. A recently advocated solution to the VTD problem is test data compression. However, the approaches which compress the VTD (e.g., [6-10]) will inherently compress the useless test data as well. This may adversely influence the compression ratio obtained by these approaches.
The objective of this paper is to reduce the UMA by exploiting the memory management support of the new generation ATEs. Memory management support comes with ATEs that implement "sequencing-per-pin" [11], i.e., the capability of controlling a pin, or a group of pins, individually. The relevant feature of a sequencing-per-pin tester is the ability to make a larger number of transfers on a group of pins while others remain unchanged. The minimum number of pins in a group is referred to as pin-group granularity. For example, if a sequencing-per-pin ATE has 64 pins and the pin-group granularity is 32, then it can control separately the number of transfers on two groups of 32 pins.
While sequencing-per-pin is an expensive extension for functional testers, the recently advocated design-for-test (DFT) testers present the same feature, however, with the advantage of reduced cost [12]. This is because DFT testers do not need all the functional sequencing-per-pin tester's features "behind" each pin. In this paper, this ATE feature is referred to as reconfigurable memory pool (RMP) [12]. Note that this is contrary to conventional ATEs, which are capable of performing only sequencing-per-vector, i.e., all the ATE channels transfer data at the same rate. With reference to the previous example, a sequencing-per-vector ATE will transfer data on all the 64 pins. Throughout the paper "control overhead" will be used with reference to an ATE with sequencing per vector (i.e., how many groups have to be controlled).
Hence, the greater the number of groups, the greater would be the control overhead on the ATE.
It will be shown in this paper how the UMA problem scales from multiple scan chain designs to core wrapper designs, and how the UMA can be reduced in core-based SOCs by efficiently exploiting the memory management capabilities of the new generation ATEs. In the following section the relevant previous approaches to TAM and core wrapper design, which influence the memory requirements, are discussed, and Section 1.2 summarizes the contributions of this paper.
Previous work
Recently a number of approaches have addressed the core wrapper design [3, 13-15] and the TAM design [13-21] issues. With respect to core wrapper design, the work in [13] proposes a "test collar" as a core wrapper for SOC test. While the method uses variable-width buses for test data and control, widening these buses may have a negative impact on the routability of the design [13]. Marinissen et al. [14] proposed a "TestShell" wrapper which is the basis for the IEEE P1500 [22] core wrapper.
The TestShell is scalable and supports the operating modes required by IEEE P1500. Since the initial approach [14] presents the disadvantage of unbalanced wrapper scan chains (i.e., the scan chains formed by the internal scan chains and the core's inputs/outputs tend to have unequal lengths), heuristics have been proposed in [3] and [15] to balance the wrapper scan chains with focus on test time; however, the VTD has not been addressed. Recently, a reconfigurable core wrapper design, where the core wrapper can dynamically change between different configurations, has been proposed in [23].
With respect to TAM design, the work in [19] has addressed the TAM design problem for minimizing test time by considering problem formulations ranging from fixed width test buses to the design of the entire TAM under given maximum test bus width constraint, using the core wrapper design from [14] .
In [20] , place and route and the power dissipation constraints were also considered in the TAM design problem with primary focus on test time minimization. In [15] , the approach from [3] was generalized such that the wrapper design, TAM and test time minimization were combined into a unified problem formulation. In [21] , by managing the number of bridges (e.g., multiplexers, controllable buffers and bypass routes) between the cores, the TAM was designed for minimum test time and bridge area overhead. In the context of reconfigurable core wrappers, in [23] the associated TAM algorithms have been introduced. Based on "TestRail" [14] , a flexible test data mechanism, in [24] , the TR-ARCHITECT algorithm has been proposed.
Recently, the problem of VTD reduction and TAM design for complex SOCs has been addressed [25, 26] . In these approaches, the VTD was taken into account in the cost function which drives the TAM design heuristic. However, as illustrated in [26] , targeting test time and VTD minimization will produce a TAM design which provides a trade-off between the two. This, as will be shown in this paper, can be also attributed to the inherent trade-off between VTD and TAM width in core wrapper design caused by useless test data.
Motivation and contributions
As illustrated in Figure 1, the UMA is a result of the unequal length of the scan chains. While attempts have been made to equalize the scan chains' lengths [27, 28], these require information about the core's internal structure, and they do not take into account the inputs and the outputs of the core. As the IEEE P1500 standard requires that each core has a wrapper, and, since depending on the business model the system integrator is often restrained from modifying the core's internal structure [2], the approaches [27, 28] the one-to-one association between the WSCs and the ATE channels, the number of transfers on each partition (the depth of the corresponding ATE channels) is given by the length of the maximum WSC in the partition, also referred to throughout the paper as the partition's length. In addition, while the RMP feature allows a different number of transfers on each partition, the greater the number of partitions, the more complex the control on the ATE. Hence, to efficiently exploit the RMP ATE feature, the depth of the ATE channel, the number of partitions and the pin-group granularity have to be considered.
Having illustrated the link between WSC partitioning and the RMP ATE feature, the following section exemplifies the relationship between UMA and core wrapper design. Section 2.2 illustrates the control requirements for the ATE when WSC partitioning is considered with the core wrapper design. As noted previously, in order to efficiently exploit the ATE RMP features, the number of partitions and the pin-group granularity have to be considered. These two are illustrated using Example 1. In addition, as will be illustrated in Example 2, the case when the number of outputs is greater than the number of inputs will also influence the UMA. 4, 8, 16. In the best case scenario per-pin granularity is available; however, if this is not the case, the pin-group granularity (g) can also affect the UMA. To illustrate this, consider Figure 4(b) where, in order to reduce the UMA, two partitions of size 2 were created. Hence, a tester pin-group granularity smaller than, or equal to, 2 is required. If the tester pin-group granularity is greater (e.g., 4), then no partitioning is possible, hence the memory requirements cannot be reduced. Therefore, with the increase in ATE pin-group granularity, the UMA tends to increase.
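The effect of pin-group granularity on the UMA can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the paper: the WSCs are assumed to be sorted by length and grouped into consecutive blocks of exactly g channels, each block stored at the depth of its longest WSC. The names `ate_memory` and `uma`, and the WSC lengths, are hypothetical and do not reproduce Figure 4.

```python
def ate_memory(wsc_lengths, g):
    """Bits stored per test vector when channels are controlled in groups of g pins."""
    lengths = sorted(wsc_lengths)
    if len(lengths) % g != 0:
        raise ValueError("test bus width must be a multiple of g")
    total = 0
    for start in range(0, len(lengths), g):
        group = lengths[start:start + g]
        total += g * max(group)  # every channel in a group gets the depth of its longest WSC
    return total


def uma(wsc_lengths, g):
    """Useless bits per test vector for pin-group granularity g."""
    return ate_memory(wsc_lengths, g) - sum(wsc_lengths)


wscs = [4, 4, 8, 8]  # hypothetical WSC lengths for a 4-bit test bus
for g in (1, 2, 4):
    print(g, uma(wscs, g))
# 1 0
# 2 0
# 4 8
```

With these lengths, two partitions of size 2 remove the padding entirely, while a granularity of 4 forces a single partition and the padding returns.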
It should be noted that for the wrapper designs discussed in this paper, all the partitions are loaded in parallel using the same clock and the ATE deploys data on the partitions at different moments. This will be further detailed later on in this section. Hence, if the number of inputs is greater than, or equal to, the number of outputs, then the ATE memory will have to account only for the UMA caused by the input scan chains as explained in Example 1 and Figure 3. However, if the number of inputs is smaller than the number of outputs, then $wsc_i$ can be smaller than $wsc_o$ and, in order to ensure that all the data has been shifted out from the output WSCs, the ATE memory has to account for the difference between $wsc_i$ and $wsc_o$. This is another source of UMA, as explained in the following example. This problem could be easily solved by using the repeat fill feature of some ATEs and adding special "scan op-codes" to account for the repeat [30]. However, if the number of repeats is not considerable, then adding the extra scan op-codes does not provide a viable solution [30], as it increases the memory requirements instead of reducing them. Furthermore, in order to provide a uniform solution for both cases, when $wsc_i$ there is only one test bus that connects the core under test to the TAM and only one clock is used to drive all the WSCs, storing the test data for the longest WSC in the ATE memory will be satisfactory to load/unload the data from all the WSCs in the core under test. Therefore, reducing UMA when the number of the core's outputs is greater than the number of the core's inputs requires that the number of outputs is considered explicitly in the design of the input as well as the output WSCs. It should be noted that previous core wrapper design algorithms [3, 15] make a clear distinction between the input and output WSC design phases.
Note that, since the test time is a function of $\max(wsc_i, wsc_o)$, considering the number of outputs to drive the input WSCs construction will not affect the test time.
UMA and ATE test vector deployment
The UMA reduction, illustrated with Examples 1 and 2, is due to partitioning of the WSCs and division of the test set according to the WSC partitions. As the lengths of the two partitions differ, the ATE will have to account for this difference when deploying the two test sets. This is illustrated next, for the test sets shown in Figure 4 cycles, data from both $TS_1$ and $TS_2$ is loaded onto partitions $p_1$ and $p_2$ respectively. It should be noted that, having only one clock driving all the WSCs, for the first 4 clock cycles the data loaded on the test bus lines corresponding to partition $p_1$ represents "don't cares". This is allowed since valid test data is required at the input WSCs of $p_1$ only after the 4th clock cycle.
Since the core wrapper design is an intermediate step in SOC test, the proposed approach does not incur any extra overhead. Hence, the modifications on the ATE are the only changes implied by the proposed approach. This can be achieved at the expense of an external ATE module [31] to support custom ATE behavior employed when IEEE P1500 compliant SOCs are tested.
Novel test methodology for UMA reduction
It was illustrated in the previous section that WSC partitioning in conjunction with the ATE deployment procedure leads to UMA reduction. In this section a new test methodology is given which comprises two components: It should be noted that throughout the paper it is assumed that the core test language (CTL) [32], describing the core test information, contains the scan chain lengths. In addition, in order to provide a generic solution to the UMA problem, no specific test pattern information (i.e., the content of the test patterns) has been considered. When test pattern information is available, the proposed methodology can be used in conjunction with other solutions to reduce the vector memory, such as ATE repeat fill [30].
Wrapper design algorithm for reducing UMA
Prior to providing the new core wrapper design problem, which accounts for UMA, two recently proposed approaches [3, 15] are analyzed. Since the core wrapper design problem was shown to be NP hard, several heuristics have been proposed such as: Largest Processing Time (LPT), MultiFit and Combine in [3] , and Best Fit Decreasing (BFD) [15] . Both, the Combine and MultiFit heuristics [3] employ the First Fit Decreasing (FFD) heuristic [3] to assign scan chains to WSCs. The FFD [3] assigns a scan chain to the first WSC which will not lead to an overflow on the maximum WSC capacity.
Hence, it tends to unequally distribute the WSC lengths, thus leading to UMA. The BFD heuristic [15] aims to equalize all the WSC lengths such that the minimum number of WSCs is used; however, since test time minimization is the primary design objective, it does not explicitly target the reduction of UMA. For example, when applied to the core considered in Example 1, both algorithms lead to the core wrapper design shown in Figure 2(a). Thus, since these heuristics do not target minimum number of WSC partitions and minimum UMA, they lead to the UMA marked in Figure 3(b). There are two interesting conclusions described in [3] and [15] It should be noted that the mUMA problem is NP-hard. This can be easily shown by assuming that the number of partitions equals the test bus width. In this particular case, there is no UMA and the problem reduces to the core wrapper design problem as presented in [3] and [15], which was shown to be NP-hard. However, as illustrated in Example 1 and as shown later in Section 3.2, the number of partitions influences the complexity of the ATE program. Hence, finding the minimum number of partitions is important. Therefore, in the following, a new core wrapper design algorithm is proposed which accounts for minimum number of WSC partitions, minimum test time and minimum UMA. In contrast to previous heuristics [3, 15], which always aim at minimizing the test time taking into account only the number of inputs or only the number of outputs for WSC construction, in order to reduce the UMA, the proposed algorithm uses the number of outputs to drive the design of both the input and the output WSCs (see Example 2 in Section 2). The proposed heuristic can be divided into two parts, an algorithm which manages the WSC partitioning, and an algorithm which constructs the WSCs for each partition. In general, for a test bus of width $w$, there are $2^{w-1}$ distinct partitions [34].
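The size of the partition search space can be checked with a short enumeration. The sketch below assumes that a "partition" of the test bus is an ordered split of its w lines into consecutive, non-empty groups (a composition of w), which is consistent with a count of 2^(w-1). The function name `compositions` is hypothetical.

```python
def compositions(w):
    """Yield every ordered split of w test bus lines into partition sizes."""
    if w == 0:
        yield ()
        return
    for first in range(1, w + 1):
        for rest in compositions(w - first):
            yield (first,) + rest


w = 4
all_splits = list(compositions(w))
print(len(all_splits))                          # 8, i.e. 2 ** (w - 1)
print([p for p in all_splits if len(p) == 2])   # [(1, 3), (2, 2), (3, 1)]
```

The exponential growth in w is one reason to start the search from a small number of partitions.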
For each P, an algorithm to generate the WSCs, called mA (Algorithm 2), is applied to the internal scan chains (step 4), the outputs (step 5) and the inputs (step 8) of core C. If the number of outputs is greater than the number of inputs, the maximum capacity cap is computed (step 6). This will be used to drive the construction of the input WSCs, hence contributing to UMA reduction as shown in Example 2 (see Section 2). The UMA for the newly designed wrapper is computed using equation (3). If the UMA is 0 (step 10), the algorithm is halted; otherwise, the UMA is recorded. When all partitions from set P for a given $np$ have been processed and no solution with $UMA = 0$ was found, then the number of partitions is increased. The solution with the minimum UMA and minimum number of partitions is chosen. The UMA is computed using equation (3) (see Section 4). For the remainder of the paper $mUMA(np)$ will denote the mUMA heuristic when applied for $np$ partitions.
Considering the ATE pin-group granularity as a constraint, in the above algorithm, implies filtering the partitions set P such that each partition's length is divisible by the ATE pin-group granularity. Alternatively, one could generate the partitions P such that each partition's length is divisible by $g$. was not assigned to partition $p_k$ then the next partition is chosen. If $S_i$ was not assigned to any partition (step 11), then it is assigned to the WSC with the minimum length ($WSC_{min}$). After every scan chain is assigned, the WSCs are sorted in ascending order (step 12).
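The two ways of honouring the pin-group granularity mentioned above (filtering the partition set, or generating only admissible partitions) can be sketched as follows. This is an illustration under assumptions: a partition's "length" is interpreted here as the number of test bus lines it contains, and the helper names `splits`, `filter_by_granularity` and `generate_by_granularity` are hypothetical.

```python
from itertools import combinations


def splits(w, np_):
    """All ordered splits of w test bus lines into np_ non-empty partitions."""
    for cuts in combinations(range(1, w), np_ - 1):
        bounds = (0,) + cuts + (w,)
        yield tuple(b - a for a, b in zip(bounds, bounds[1:]))


def filter_by_granularity(w, np_, g):
    """Enumerate every split, then keep those whose sizes are multiples of g."""
    return [p for p in splits(w, np_) if all(size % g == 0 for size in p)]


def generate_by_granularity(w, np_, g):
    """Build admissible splits directly by splitting w // g pin groups."""
    if w % g:
        return []
    return [tuple(g * size for size in p) for p in splits(w // g, np_)]


print(filter_by_granularity(8, 2, 2))  # [(2, 6), (4, 4), (6, 2)]
print(filter_by_granularity(8, 2, 2) == generate_by_granularity(8, 2, 2))  # True
```

Generating directly avoids enumerating splits that the filter would discard, which matters as the test bus width grows.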
It is important to note that Algorithm 2 aims at generating a WSC representation like the one given in Figure 4(b) , such that the control overhead on ATE is minimum. To achieve this the partitions are iterated in reversed order, i.e., firstly partition p 2 and then partition p 1 (see Figure 4) , and when the first WSC assignment is found (step 8 in Algorithm 2) the next scan chain is selected. While alternative algorithms for designing the core wrapper, aiming at minimum UMA, are possible, care must be taken to ensure that reducing the trade-off between the UMA and test bus width will not result into a trade-off between UMA and ATE control.
The complexity of Algorithm 2 is given by $O(s \cdot w + s \cdot w \log(w))$, i.e., in the worst case scenario there are $w$ partitions, and each scan chain has to be assigned to one; in addition the reordering step is performed for each assignment. As illustrated in Algorithm 1, the mA algorithm is used first for the internal scan chains of the core (step 4), then for the outputs (step 5) and then for the inputs (step 8). The inputs and the outputs are considered as scan chains of length 1. Hence, the complexity of Algorithm 1 is given by $O(2^{w-1}$
To achieve reduction in memory requirements by exploiting WSC partitioning, ATEs need memory management support. ATE test vector deployment methods which account for this requirement are detailed in the following section.
Test vector deployment procedure for reduced UMA
This section illustrates two possible implementations of the proposed test methodology when different ATE features are considered. Firstly, an ATE test vector deployment procedure is given for the particular case of $np = 2$ partitions, and secondly, the "Split Timing Mode" architecture [35] is examined.
In order to fully exploit the new core wrapper design, the initial test set is divided into a number of test sets equal to the number of partitions. The ATE program will have to deploy test vectors from the different test sets at separate times. Hence, the increase in the number of partitions will lead to a more complex ATE program (see Example 1 in Section 2). However, if the number of partitions is limited to 2, the changes on the ATE are minor. The pseudo-code for the ATE program for this particular case is discussed in the following. For $max_{wsc}$ clock cycles, the test data from $TS_2$ is loaded onto the test bus. Since the first partition is smaller than the second one, the ATE will read the test data for $TS_1$ only after $diff$ clock cycles. It should be noted that since all the WSCs are driven by the same clock, the data loaded into the WSCs corresponding to the first partition represents don't cares for the first $diff$ clock cycles. This is allowed since valid test data is required in this partition only after $diff$ clock cycles (see Example 1 in Section 2).
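The deployment order described above for two partitions can be sketched as a small simulation. This is only a sketch of the described behaviour, not the ATE program itself: the function name `deploy_vector` and the data layout (one list entry per clock cycle and partition) are assumptions made for illustration.

```python
def deploy_vector(ts1_bits, ts2_bits):
    """Yield, per clock cycle, the (p1, p2) data placed on the test bus.

    ts1_bits / ts2_bits hold the per-cycle data of one test vector for the
    shorter / longer partition (hypothetical layout).
    """
    max_wsc = len(ts2_bits)           # length of the longer partition
    diff = max_wsc - len(ts1_bits)    # cycles during which p1 needs no valid data
    for cycle in range(max_wsc):
        # TS_1 is read from memory only after `diff` cycles; before that,
        # the lines of p1 carry don't cares ("X") that are not stored.
        p1 = ts1_bits[cycle - diff] if cycle >= diff else "X"
        yield p1, ts2_bits[cycle]


ts1 = ["a0", "a1"]
ts2 = ["b0", "b1", "b2", "b3"]
for cycle, data in enumerate(deploy_vector(ts1, ts2)):
    print(cycle, data)
# 0 ('X', 'b0')
# 1 ('X', 'b1')
# 2 ('a0', 'b2')
# 3 ('a1', 'b3')
```

In this toy case the memory holds 6 entries per vector instead of the 8 that a padded, single-depth layout would need.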
It is important to note that the three parameters suffice to characterize any core wrapper designed with mUMA for $np = 2$. It is shown in Section 5 that even though for $np = 2$ the UMA is not always 0, this particular case leads to a good solution from the UMA standpoint, with the benefit of keeping the extra ATE requirements simple.
The three parameters provide the benefit of independence between the test control and the test data, which is also the view put forth by the CTL [32] developed in parallel with the IEEE P1500 standard.
For a given core and its CTL description, after core wrapper design using two WSC partitions, the initial core can be seen as two virtually independent cores with their own CTL descriptions. One case in which the above scenario can be used is detailed below for the case when the Split Timing Mode (STM) [35] architecture is available. The STM architecture has been used in [35] for dual-frequency test. The basic idea behind this architecture is to configure a tester as two independent virtual test systems using the same master clock [35], but providing data to the chip under test at two different frequencies. The feature of interest in the investigated scenario is the fact that each virtual test system has its own memory and pattern generator [35]. This feature can be exploited, in the case of the proposed approach, as follows. When the difference ($diff$) between the two partitions' lengths is considerable, the test set corresponding to the shorter partition can be augmented with scan op-codes for repeat fill [30].
These will, then, automatically generate the padded data for the shorter partition. Hence, the test vector deployment procedure is no longer needed as the deployment information is already included within the first test set.
Analyzing wrapper scan chain partitioning trade-offs
Having illustrated the proposed test methodology, in this section a theoretical analysis of WSC partitioning is given and the WSC partitioning, VTD and test time trade-offs are examined.
Theoretical analysis
Consider that $WSC_j$ represents the length of the WSC corresponding to test bus line $j$, and $w$ represents the test bus width. Similar to multiple scan chain designs, WSCs also have different lengths; hence, the memory depth of the corresponding ATE channels will also differ. As illustrated in Figure 1 (see Section 1) for multiple scan chain cores, the UMA for one test vector represents the number of bits required to make the scan chains of equal length. For wrapped cores (i.e., cores for which the WSCs have already been determined) this translates into:
i.e., the number of bits required to equal the WSCs for a given test bus width. Basically, it is the difference between the maximum and the minimum memory requirements. 
i.e., the number of bits required to equal the WSCs from a given partition. Hence, the total UMA is:
Starting with an initial ad-hoc partitioning with np partitions, the number of partitions can be further increased through:
(i) iterative partitioning, when one of the partitions is further divided; or (ii) repartitioning, when a new partitioning with more partitions, independent of the existing one, is performed.
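The verbal definitions above (the UMA of a partition is the number of bits required to equalize its WSCs, and the total UMA is the sum over all partitions) can be sketched as follows, together with the effect of iterative partitioning. The function names and the WSC lengths are hypothetical, and per-pin granularity is assumed.

```python
def partition_uma(wsc_lengths):
    """Bits needed to pad every WSC of one partition to the partition's longest WSC."""
    longest = max(wsc_lengths)
    return sum(longest - length for length in wsc_lengths)


def total_uma(partitions):
    """Total UMA of a wrapped core: the sum of the per-partition UMAs."""
    return sum(partition_uma(p) for p in partitions)


one = [[2, 3, 7, 8]]        # a single partition (hypothetical lengths)
two = [[2, 3], [7, 8]]      # the single partition split once
three = [[2], [3], [7, 8]]  # the first of the two partitions split again
print(total_uma(one), total_uma(two), total_uma(three))  # 12 2 1
```

Splitting a partition cannot increase the total in this model, because the longest WSC of each sub-partition is no longer than the longest WSC of the partition it came from.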
With respect to iterative partitioning, the following lemma holds. heuristic performs repartitioning and aims at selecting the solution with minimum UMA. Hence, the above relation holds for the proposed mUMA heuristic as also illustrated in the following section.
Lemma 1 For a wrapped core of test bus w and np disjunctive partitions such that

Volume of test data and test application time trade-offs
As illustrated with equation (1) in Section 4.1, the memory requirements are dependent on the test bus width. This implies that there is a trade-off between the VTD and test bus width, and consequently, there is a trade-off between the VTD and test time. These trade-offs are analysed next.
The trade-off between VTD and test bus width is illustrated in Figure 8, where the memory requirements for mUMA with $np = 1, 2$ and $3$ (see Figure 8(a)), and the test time (see Figure 8(b)) when the test bus width is varied between 1 and 31 are given for Module26 of SOC p22810 from the ITC02 benchmark circuits [4]. As can be seen in Figure 8 trade-off between the VTD and the test bus width is reduced for $np = 2$, and eliminated in most of the cases for $np = 3$, as can be seen in the figure. In order to keep the figure simple, the plot for $np = 4$ is not shown. In this case, however, there is no more trade-off between test bus width and VTD. It can be seen in Figure 8(b) that the test time steadily decreases with the increase in test bus width. However, due to the trade-off between the test bus width and the VTD, for $np = 1$, there is a trade-off between the VTD and the test time. Since increasing $np$ leads to reducing the trade-off between the test bus width and the VTD, it also leads to reducing the trade-off between the VTD and the test time. It is interesting to note that the reduction in the variation of volume of test data is considerable when the number of partitions increases from $np = 1$ to $np = 2$. When $np > 2$, the reduction is small. Therefore, while using the number of partitions as a constraint can diminish the effectiveness of the proposed algorithm, as long as at least two partitions are allowed, the UMA reduction can be significant. From the above example, the following can be derived:
Observation 1 Minimizing the memory requirements and minimizing the test time can be viewed as orthogonal problems if WSC partitioning is considered with the core wrapper design.
Thus, if the RMP feature is available, using WSC partitioning in the core wrapper design will allow simultaneous reduction in both test time and ATE memory requirements. Hence, considering WSC partitioning could also reduce the trade-off between test time and VTD in TAM designs.
Next, the relation between the test time obtained using the proposed core wrapper design and the one obtained using the previously proposed BFD [15] This observation is justified by the following. The BFD [15] heuristic tries to equalize the WSCs by assigning a scan chain to the WSC such that the length of the resulting WSC is closest to the maximum WSC length. Hence, it exploits the scan chain to WSC assignment process "horizontally". This is done to yield a minimum bus width core wrapper. The mUMA heuristic (Algorithm 1) exploits the scan chain to WSC assignment process both "vertically" and "horizontally", i.e., it tries to minimize the difference between the maximum and the minimum memory requirements for a partition (see equation (2)). As shown in Algorithm 2 (step 7), a scan chain is assigned to the first WSC such that UMA is minimized without an overrun on the maximum WSC. However, considering only 1 partition, the UMA will be the same regardless of the WSC to which the scan chain is assigned. Since the WSCs are sorted after each run (see Algorithm 2, step 12), assigning a scan chain to the first WSC such that no overrun on the maximum WSC occurs is equivalent to assigning a scan chain to a WSC such that the length of the resulting WSC is closest to the maximum WSC length, the latter being the strategy used in BFD [15]. Therefore, mUMA with $np = 1$ and BFD [15] will generate the same core wrapper. The same reasoning is applicable for $np = w$. In this case, there is no UMA. Hence, assigning a scan chain to the first WSC such that no overrun on the maximum WSC occurs will yield the same core wrapper design as the BFD [15] heuristic. Note that when n It is important to note that, in general, the values for the length of the maximum WSC, which directly influence the test time of the core, are comparable to the ones obtained by the BFD heuristic [15].
This is because, in both approaches, the scan chains are assigned to WSCs such that the current maximum WSC length is never exceeded. For the case illustrated in Figure 8 , Table 1 gives the test time for the BFD heuristic [15] and the mUMA(2) for different test bus widths. It can be observed from the table that they are equal. Therefore, considering WSC partitioning in the core wrapper design algorithm has small or no penalty in test time at the great benefit of significant reduction in memory requirements as it will be shown in Section 5.
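The two selection rules referred to in the text (first fit and best fit, both applied to scan chains in decreasing order of length) can be contrasted with a simplified sketch. It is not the implementation of [3] or [15]: the capacity is taken as a fixed input, the fallback steps of the published heuristics are omitted, and the function name `assign` and the scan chain lengths are hypothetical.

```python
def assign(scan_chains, w, capacity, best_fit):
    """Assign scan chains (longest first) to w WSCs without exceeding capacity."""
    wscs = [0] * w
    for length in sorted(scan_chains, reverse=True):
        fitting = [j for j in range(w) if wscs[j] + length <= capacity]
        if not fitting:
            raise ValueError("capacity too small for this test bus width")
        if best_fit:
            # Best fit: the WSC whose resulting length is closest to the capacity.
            j = max(fitting, key=lambda k: wscs[k])
        else:
            # First fit: the first WSC that does not overflow.
            j = fitting[0]
        wscs[j] += length
    return wscs


chains = [5, 4, 3, 2]  # hypothetical internal scan chain lengths
print(assign(chains, w=3, capacity=7, best_fit=False))  # [7, 7, 0]
print(assign(chains, w=3, capacity=7, best_fit=True))   # [7, 7, 0]
```

In both variants no WSC exceeds the given capacity, so the longest WSC, which drives the test time, stays within the same bound; in this small case the two rules also produce the same wrapper.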
Experimental results
The experimental analysis has been performed on a Pentium II 366 MHz Linux workstation with 128 MB of RAM using the largest ISCAS89 [29] and ITC02 [4] benchmark circuits. Exploiting wrapper scan chain (WSC) partitioning for reducing useless test data requires an ATE with reconfigurable memory pool (RMP). As illustrated with Lemma 1, the UMA can be reduced by increasing the number of partitions; however, this will then increase the control overhead on the ATE. In addition, the ATE pin-group granularity may also influence the effectiveness of WSC partitioning. Using the cores' specifications detailed in Section 5.1, the above issues are investigated with the following three experiments: It should be noted that for the first two experiments a per-pin granularity is assumed.
Core specifications
For the ISCAS89 benchmark circuits, we considered the specifications as given in Table 2. The table lists the circuit, the number of inputs/outputs (n/m), the number of internal scan chains (s), the total number of internal scan cells (FFs), the number of test vectors ($n_v$) and the minimum memory required to store the test set computed as mem = (
It should be noted that $w_{max}$, as computed above, will guarantee minimum test time; however, it will not always represent the minimum test bus width for which the minimum test time is obtained. The test time given in the table is obtained for $w_{max}$ as computed above.
From the ITC02 benchmark circuits [4] we considered the systems p22810, p34392 and p93791.
While all the ITC02 benchmark systems have been taken into account in our experiments, only these three are reported as they better exemplify the variation in memory requirements. This is mainly due to the large number of scan chains and the scan chain length distribution. It should be noted, however, that the results for the other systems are within the range of the reported results in this section. For each system, the three modules with the largest memory requirements were considered. The specifications for the circuits used in our experiments are given in Table 3; the detailed specification can be found at [4]. In addition to the information given for the cores in Table 2, in Table 3 the number of bidirectional pins (q) is given as well. It should be noted that, for the core wrapper design, the bidirectional pins (q) were added to both inputs and outputs as suggested in [3]. For the two benchmark sets ISCAS89 (see Table 2) and ITC02 (see Table 3), the results are reported in Table 4 and Table 5 respectively. The tables list the length of the partitions, the memory requirements, the UMA, and the execution time ($E_t$ in seconds) needed to complete the mUMA algorithm for a test bus width of $w_{max}$, for both the general case (columns 3-6) and the particular case with only two partitions (columns 7-10). It is interesting to note that even though for two partitions the UMA is not zero in all of the cases, it is still very small. For example, in the case of core s38584 (see Table 4), the increase is 0.27%, while in the case of Module26 from SOC p22810 (see Table 5) the increase is 4.04%. On average, the increase in memory requirements for the particular case of $np = 2$ is less than 5%. This justifies the usage of the proposed heuristic for the particular case with two partitions, since minimum or close to minimum memory requirements are obtained with minor changes on the ATE (see Section 3.2).
[Table 5. mUMA for $w_{max}$ with ITC02 [4] benchmark circuits]
The execution time (E_t) is insignificant: for the general case it is up to 4 seconds, and for the particular case of two partitions it is under 1 second. Having shown that the particular case of two partitions yields minimum or close to minimum UMA, this particular case is used for the remaining comparisons.

In the following, the overall performance of the proposed test methodology is compared with the case when a conventional ATE is used. For the conventional ATE, two core wrapper design algorithms (First Fit Decreasing (FFD) [3] and Best Fit Decreasing (BFD) [15]) have been used, while for the proposed methodology mUMA(2) has been employed. To provide a common ground for the comparison, the test time has been kept the same in all cases, and the test bus width has been varied between 4 and w_max. As noted in Observation 2, BFD [15] and mUMA(1) obtain the same test time. In addition, for the performed experiments, it was found that the test time obtained using mUMA(2) equals the test time obtained using BFD. This can be explained by the fact that both approaches assign the scan chains to WSCs such that the current maximum WSC length is never exceeded. Therefore, there are no test time penalties when compared to [15].

To also ensure that the test time obtained with FFD [3] equals the one obtained with BFD, the FFD algorithm was run with its capacity set to the maximum WSC length determined with BFD. It should be noted that, although this might give the impression of a disadvantage with respect to [3], it actually leads to a reduction in memory requirements when the FFD heuristic is compared to BFD. This is because, in some cases, the BFD heuristic requires more WSCs to obtain the same test time as the FFD heuristic. Hence, discarding the empty WSCs from the core wrapper design produced by FFD reduces the memory requirements.

Table 6. Memory requirements comparison for ISCAS89 [29] (column groups: FFD [3], BFD [15], mUMA)
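The capacity-constrained FFD run described above can be sketched as follows. This is a textbook-style illustration, not the implementation of [3] or [15]: the function names, the tie-breaking rules and the example chain lengths are all assumptions introduced here.

```python
def bfd_wrapper(chain_lengths, w):
    """Best-fit-decreasing style assignment of scan chains to w WSCs.

    Assumption: a chain goes to the fullest WSC that can take it without
    exceeding the current longest WSC, otherwise to the shortest WSC.
    """
    wscs = [0] * w
    for length in sorted(chain_lengths, reverse=True):
        longest = max(wscs)
        fits = [i for i in range(w) if wscs[i] + length <= longest]
        if fits:
            target = max(fits, key=lambda i: wscs[i])  # tightest fit
        else:
            target = min(range(w), key=lambda i: wscs[i])  # shortest WSC
        wscs[target] += length
    return wscs


def ffd_with_capacity(chain_lengths, capacity):
    """First-fit-decreasing: open a new WSC only when no existing one fits."""
    wscs = []
    for length in sorted(chain_lengths, reverse=True):
        for i, used in enumerate(wscs):
            if used + length <= capacity:
                wscs[i] += length
                break
        else:
            wscs.append(length)
    return wscs


chains = [8, 4, 4, 2, 2]           # hypothetical internal scan chain lengths
bfd = bfd_wrapper(chains, w=4)
ffd = ffd_with_capacity(chains, capacity=max(bfd))
print(bfd, ffd)
```

Because FFD receives the longest BFD WSC as its capacity, both designs share the same longest WSC in this example, and FFD only creates the WSCs it actually needs, so no empty WSC has to be padded.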
It is important to note that, due to the variation of w between 4 and w_max, the entire core wrapper design solution space is explored, and therefore the test time can be considered as a reference point in the comparison.
As illustrated in Section 4.2, different test bus widths lead to different memory requirements.
Therefore, the three core wrapper designs have been employed while w has been varied between 4 and w_max (w = 4, ..., w_max), and their minimum (Min), maximum (Max) and average (Avg) memory requirements over all TAM widths have been computed. It should be noted that, for mUMA(2), the two-partition WSC solution which leads to minimum UMA has been chosen for each test bus width. The results are reported for the three core wrapper design methods in Table 6 for the ISCAS89 benchmark circuits [29], and in Table 7 for the ITC02 benchmark circuits [4].

In the case of the ISCAS89 benchmark circuits, the minimum, maximum, and average memory requirements over all test bus widths for the FFD and BFD approaches are given in columns 2-4 and 5-7 of Table 6 respectively. The results for mUMA(2) are reported in columns 8-10 of the same table. Note that the difference between Max and Min is considerably greater in the case of the FFD and BFD methods than in the case of the proposed core wrapper design algorithm. For example, for core s13207, in the case of both FFD and BFD, the maximum memory requirements are 32.47% greater than the minimum memory requirements, hence the trade-off between the volume of test data (VTD) and the test bus width. In contrast, with the proposed approach the increase is only 1.42%, which reduces this trade-off. Based on the information provided in Table 6, the reduction in minimum, maximum and average memory requirements over the two previous approaches, FFD and BFD, can also be determined. For example, in the case of circuit s38584, the maximum memory requirement is reduced by 28.18% when compared to FFD and by 42.08% when compared to BFD. The average memory requirement for s38584 is reduced by 18.73% when compared to FFD and by 21.80% when compared to BFD.

Table 7. Memory requirements comparison for ITC02 benchmark circuits [4] (column groups: FFD [3], BFD [15], mUMA)
Overall, the proposed test methodology achieves average and maximum memory requirement reductions of up to 22.05% and 45.86% respectively.
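The Min, Max and Avg columns and the percentages quoted from them follow from simple arithmetic over the per-width memory requirements. The sketch below shows that arithmetic; the function names and the memory values are hypothetical placeholders, not data from Table 6 or Table 7.

```python
from statistics import mean


def summarize(memory_by_width):
    """Min, Max, Avg over all test bus widths, plus the Max-over-Min increase in percent."""
    values = list(memory_by_width.values())
    lo, hi = min(values), max(values)
    return {
        "min": lo,
        "max": hi,
        "avg": mean(values),
        "spread_pct": 100.0 * (hi - lo) / lo,
    }


def reduction_pct(baseline, proposed):
    """Reduction of the proposed value relative to a baseline, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Hypothetical memory requirements (bits) for test bus widths 4, 5 and 6.
baseline = summarize({4: 1000, 5: 1200, 6: 1100})
print(baseline["spread_pct"])                 # Max is this many percent above Min
print(reduction_pct(baseline["max"], 900))    # reduction if the proposed Max were 900
```

A small spread means the memory requirements barely depend on the chosen test bus width, which is how the reduced trade-off is read from the tables.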
For the ITC02 benchmark circuits [4], the results are reported in Table 7. Once again, note the difference between Max and Min in the case of the FFD and BFD methods. For example, for core Module20 from SOC p93791, in the case of FFD (columns 2-4 in Table 7) the maximum memory requirements are 36.30% greater than the minimum memory requirements. Similarly, for BFD (columns 5-7 in Table 7), the maximum memory requirements are 36.68% greater than the minimum memory requirements, hence the trade-off between the test bus width and the memory requirements. When the reductions in minimum, maximum and average memory requirements over the two previous approaches, FFD and BFD, are analyzed, a considerable reduction in maximum and average memory requirements can be observed. For example, in the case of core Module27 from SOC p93791, the maximum memory requirements are reduced by 38.03% and 39.76% when compared to FFD and BFD respectively. The reduction in average memory requirements over all test bus widths is 15.31% and 18.60% when compared to the FFD and BFD heuristics. Overall, the reduction in maximum memory requirements is up to 46.67%, while the reduction in average memory requirements is up to 26.13%. Based on the above results, it can be clearly seen that considering the ATE memory requirements during core wrapper design reduces the trade-off between test bus width and memory requirements and consequently, as also illustrated in Section 4.2, between memory requirements and test time.

Figure 9. ATE pin-group granularity and WSC partitioning (panel (d): Module 27 of SOC p93791 [4] with g = 8)
Experiment 3: ATE pin-group granularity constrained WSC partitioning
In this section two issues are investigated: firstly, the implications of the pin-group granularity for the performance of the proposed mUMA, and secondly, the importance of considering WSC partitioning within the core wrapper design algorithm. It should be noted that, in the framework of the proposed test methodology, WSC partitioning has been considered as a step within the core wrapper design algorithm. However, WSC partitioning could also be seen as a post-processing step. To provide a comparison for these two cases, WSC partitioning has been considered as a post-processing step for the FFD [3] and BFD [15] core wrapper design algorithms.

With respect to the first issue, the influence of the ATE pin-group granularity on the performance of the mUMA algorithm, it can be seen in the figures that the influence is small. For example, in

Having illustrated the influence of ATE pin-group granularity on the performance of the proposed algorithm, in the following the difference between considering WSC partitioning as a post-processing step and as a step within the core wrapper design is illustrated. As noted previously, for this purpose the FFD and BFD core wrapper algorithms have been extended with a post-processing WSC partitioning step. Throughout the performed experiments it has been observed that considering WSC partitioning as a post-processing step yields memory requirements that are never lower than those obtained when WSC partitioning is performed within the core wrapper design (mUMA). This is best illustrated with Figure 9
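A post-processing partitioning step under a pin-group granularity constraint might look like the sketch below. It is an illustration only: the function name is hypothetical, the WSC lengths are invented, and two assumptions are made that this section does not spell out, namely that a partition boundary may only fall on a multiple of the granularity g, and that each partition is padded to its own longest WSC.

```python
def best_two_way_cut(wsc_lengths, g=1):
    """Return (bits per vector, cut position) for the cheapest two-way split.

    The WSCs are sorted by decreasing length and only contiguous cuts are
    tried; for two partitions an optimal split can always be found this way.
    Assumption: the cut position must be a multiple of the granularity g.
    """
    ordered = sorted(wsc_lengths, reverse=True)
    w = len(ordered)

    def padded(group):
        # Every WSC in a group is padded to the group's longest WSC.
        return len(group) * max(group) if group else 0

    best = (padded(ordered), w)  # fallback: a single partition, no cut
    for cut in range(g, w, g):
        cost = padded(ordered[:cut]) + padded(ordered[cut:])
        best = min(best, (cost, cut))
    return best


lengths = [10, 10, 9, 3, 3, 2]        # hypothetical WSC lengths, 37 useful bits
print(best_two_way_cut(lengths))       # unconstrained cut
print(best_two_way_cut(lengths, g=2))  # cut restricted to multiples of 2
```

With these invented lengths the unconstrained cut stores 39 bits per vector and the cut restricted to multiples of 2 stores 46, against 60 for a single partition. Since such a step can only regroup WSCs whose lengths are already fixed, it cannot change the lengths themselves, which is where a partition-aware wrapper design has additional freedom.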
Conclusions
This paper analyzed the test memory requirements for core-based SOCs and identified unequal-length scan chains as one source of useless test data, which leads to a trade-off between test bus width and volume of test data in cores with multiple scan chains. A new test methodology has been proposed which, by employing wrapper scan chain partitioning in core-based designs and exploiting ATE memory management features, can obtain a considerable reduction in useless memory. Extensive experimental analysis on the ISCAS89 and ITC02 benchmark circuits has been conducted to evaluate the proposed methodology. Thus, the work presented in this paper demonstrates that, with the advent of the new generation of ATEs, which allow greater flexibility and provide memory management capabilities, methodologies complementary to test data compression can be used to reduce the volume of test data, and hence the cost of testing complex SOCs.
