Testing of three-dimensional (3D) stacked ICs (SICs) is starting to receive considerable attention in the semiconductor industry. Since the die-stacking steps of thinning, alignment, and bonding can introduce defects, there is a need to test multiple subsequent partial stacks during 3D assembly. We address the problem of test-architecture optimization for 3D stacked ICs to minimize overall test time when either the complete stack only, or the complete stack and multiple partial stacks, need to be tested. We show that optimal test-architecture solutions and test schedules for multiple test insertions are different from their counterparts for a single final stack test. In addition, we present optimization techniques for the testing of TSVs and die-external logic in combination with the dies in the stack.
Introduction
As the semiconductor industry continues to scale CMOS technology to smaller feature sizes, long interconnects have emerged as the dominant contributor to circuit delay and a major cause for high power consumption. Three-dimensional (3D) integrated circuit (IC) technology provides many benefits for chip design [1, 2], including a reduction in the average interconnect length and alleviation of many of the problems caused by long global interconnects [3, 4, 5]. Since die can be stacked in a 3D environment, on-chip data bandwidth can be increased as well. Furthermore, since 3D ICs can scale "up" instead of "out", higher packing density and smaller footprint can be achieved.
Advances in manufacturing have led to direct stacking and bonding of die on die, with short interconnects running between die. In this work, we focus on 3D stacked ICs (3D-SICs) that use through-silicon via (TSV) vertical interconnects, as this approach offers the promise of high vertical interconnect density. 3D-SICs can be created by placing multiple device layers together through wafer or die stacking, and connecting metal layers from die to die using TSVs [5]. A 3D-SIC using TSVs allows for short, high-density vertical interconnects between thinned dies. Memories have already been successfully manufactured with this technology [6], and stacks that include memory stacked on logic die [7] or multiple stacked logic die [8] are likely to be seen in the near future.
TSV-based 3D-SICs are likely to make a significant impact on core-based system-on-chip (SOC) design. The use of embedded cores in SOCs provides numerous benefits for both design and test, so it is likely that embedded cores will be seen in 3D-SICs as well. Testing of core-based dies in 3D-SICs introduces new challenges [9, 10]. In order to test the dies in a stack, the embedded cores, and the TSVs, a test access mechanism (TAM) must be included on the die to transport test data to the cores, and a 3D TAM is needed to transfer test data to the die from the stack input/output pins. In a 3D-SIC, a test architecture must be able to support testing of individual dies as well as testing of partial and complete stacks [10]. Furthermore, test-architecture optimization must not only minimize the test time, but also minimize the number of dedicated test TSVs used to route the 3D TAM, as each TSV has area costs associated with it and is a potential source of defects. Test-bandwidth constraints due to a limited number of package pins, which are available only at the lowest layer in a 3D stack, must also be considered.
Compared to two-dimensional ICs that typically require two test insertions, namely wafer test and package test, 3D stacking introduces a number of natural test insertions [10]. Since the die-stacking steps of thinning, alignment, and bonding can introduce defects, there is a need to test multiple subsequent (partial) stacks during assembly. Figure 1 shows an example manufacturing and test flow for a 3D stack. First, wafer test (i.e., pre-bond test) can be used to test die prior to stacking to ensure correct functionality, as well as to match die in a stack for power and performance. Next, Die 1 and Die 2 are stacked, and then tested again. This is likely to be the first time the TSVs between Die 1 and Die 2 will be tested due to technology limitations that make pre-bond test of TSVs infeasible [9]. This step also ensures that we can detect defects in the stack due to additional 3D manufacturing steps such as alignment and bonding. Then the third die is added to the stack and all die in the stack, including all TSV connections, are retested. Finally, the "known good stack" is packaged and the final product is tested.
In this paper, we present optimization methods for 3D-SICs with hard dies, in which a test architecture already exists on each die. We minimize the test time considering all possible stack tests and the complete stack, while considering die-external tests as well. These optimization methods allow us to efficiently generate multiple options for testing a 3D-SIC. While it is possible to have multiple dies on a given layer in a stack, we only consider one die per layer. Also, a core is considered to be part of a single die only, i.e., we do not consider "3D cores" as they are not likely in the immediate future of 3D-SICs [10].
The key contributions of this paper are as follows:
• Generalized and rigorous optimization methods to minimize test time for multiple test insertions. These methods also provide optimal solutions for several additional problem instances of interest, e.g., final stack test only, and testing of any subset of partial stacks.
• Methods to support multiple test schedules for a test-access architecture that is defined to be globally optimal for multiple test insertions.
• Optimization techniques that consider die-external and die-internal tests during test-time minimization.
• Optimization methods that are compatible with a recently proposed die-wrapper architecture [11] , instead of requiring unrealistic assumptions on a specific wrapper for each case.
The rest of this paper is organized as follows. Section 2 provides an overview of related prior work. Section 3 uses simple examples to motivate this work and outlines how a 3D TAM can be designed and optimized. Section 4 presents integer linear programming (ILP) models for three TAM optimization problems: one considering multiple test insertions, and two considering stack TSV testing. Section 5 presents experimental results for various stacks constructed from several SOCs from the ITC 2002 SOC test benchmarks [14]. Finally, Section 6 presents conclusions drawn from this work. An appendix with a complete ILP model for the first problem is included at the end of this paper.
Prior Work
As interest in 3D-SICs has increased, many papers on the topic of 3D-SIC testing have been published. Heuristic methods for designing core wrappers in 3D ICs were developed in [15]. ILP models for test-architecture design for each die in a stack are presented in [16]. While these ILP models take into account some of the constraints related to 3D-SIC testing such as the TSV limit, this approach does not consider the reuse of die-level TAMs, multiple test insertions, or TSV tests. A TAM wire-length minimization technique based on simulated annealing is presented in [17]. While this work allows both pre-bond and post-bond tests, TAMs can start and end on any stack tier, which is unlikely in a 3D-SIC. Heuristic methods for reducing weighted test cost while taking into account the constraints on test pin widths in pre-bond and post-bond tests are described in [18]. [...] [11], and they do not make any unrealistic assumptions on die wrappers or the 3D TAM.
In [19] , the authors presented an expanded wrapper architecture for 2D ICs using modified wrapper cells in which each wrapper cell can be connected to two TAMs. As opposed to the 1500-standard wrapper (referred to in the rest of this paper as a "thin" or 1500-like wrapper), this expanded wrapper architecture, or "fat" wrapper, allows for core-external test (EXTEST) and core-internal test (INTEST) to be run in parallel. We consider both types of wrappers; in particular, the use of fat wrappers in this work is a natural extension of die-level wrappers to allow for die-external tests (TSV tests) and die-internal tests in parallel.
In our recent work, we introduced optimization techniques for minimizing the test time for final stack test [20]. A global limit was set on the number of dedicated TSVs to be used for test access, and constraints were imposed on test bandwidth due to a limited number of test pins on the lowest die in the stack. A drawback of [20] is that it does not consider multiple test insertions for testing partial stacks. Another limitation is that the test time for TSVs and die-external logic is ignored in the optimization framework. Finally, in [20], TSV limits for the 3D TAM apply to the complete stack, and not on a per-layer basis.
In this paper, we generalize the optimization models from [20] to allow for multiple test schedules and optimization for any number of or all post-bond stack tests. We maintain the test-bandwidth constraints but use more realistic constraints on dedicated test TSVs by considering a maximum number of TSVs per die, as opposed to a global limit. We further extend previous work by minimizing test time for die-internal and die-external tests using both fat and thin wrappers.
Problem Definitions
In a 3D-SIC, the lowest die is usually directly connected to chip I/O pins; therefore, it can be tested using package pins. To test the other dies in the stack, TAMs that enter the stack from the lowest die should be provided. To transport test data up and down the stack, "test elevators" need to be included on each die except for the highest die in the stack [10]. The number of test pins and test elevators, as well as the number of TSVs used, affect the total test time for the stack.
Many more manufacturing steps are needed for the production of 3D-SICs than for 2D ICs, including TSV creation, wafer thinning, alignment, and bonding. These steps can introduce defects that do not arise for 2D ICs [9]. Such defects include incomplete fill of TSVs, misalignment, peeling and delamination, and cracking due to back-side grinding. It is difficult to carry out pre-bond testing of TSVs due to limitations in probe technology and the need for contactless probing. Thus, post-bond partial-stack and in-stack TSV testing are needed to detect these new defects and reduce defect escapes.
In this paper, our optimization methods consider the use of IEEE 1500-style die-level wrappers such as those proposed in [11]. We choose 1500-style wrappers for several reasons. They can be easily extended to the die level to provide a uniform interface for each die. They make no assumptions on the on-die TAM, and they allow for greater flexibility. The test modes associated with a 1500-style wrapper, such as bypass and EXTEST, map naturally to the tests required for a 3D stack. The proposed optimization techniques, while requiring some means for test access on the lowest die, do not make any assumptions on the specific test methods being utilized. Thus, the proposed optimization methods can be used with any appropriate standard of choice.
It is important to note the costs associated with multiple test insertions for a 3D stack. Each additional test insertion increases the cumulative test time for the 3D stack, generally with a greater increase associated with a more complete stack. Multiple test insertions also require a single chip to be placed on a tester multiple times during stacking, which takes up valuable test resources. Our optimization methods do not explicitly take into account the increased costs associated with multiple test insertions. Instead, they allow the designer to optimize a 3D test architecture for whichever test insertions have been deemed appropriate based on cost models [12, 13].
Figure 2 shows an example of a 3D test-access architecture for a 3D stack with 3 dies, with the die on the left being the lowest die in the stack. In this example, 1149.1-style test access is used on the bottom die. As can be seen, a 3D TAM routes test data to and from each die in the stack, and a 2D TAM exists on each die for internal test data routing. Each die-level wrapper contains its own instruction register for selecting between test modes such as bypass, INTEST, and EXTEST.
The following example highlights the limitations of optimization techniques that are oblivious to multiple test insertions. In [20] , optimization decisions were made considering only the final stack test after all die have been bonded. These models cannot be directly applied to optimize for multiple test insertions, as shown in Figure 3 .
Example: We attempt to optimize for two test insertions, the first with three die on the stack as in Figure 3(a) and the second with all four die as in Figure 3(b). [...] We then attempt to add the fourth die while preserving the previous architecture, but this leads to a violation of the TSV limit, since we now need 100 TSVs (the test elevators marked in the figure exceed the mandated upper limit). [...]
[...] see Figure 4. Figure 4(a) shows a three-die stack with a test architecture using fat wrappers to allow TSV tests to take place in parallel with module testing on the dies. Each die has a separate TAM for die-external tests, both to higher and lower die in the stack. Each TAM has its own width and utilizes different test pins. Below we present the problem definitions for these two variants, where "||" refers to parallel and "−−" refers to serial.
[...] for the functional TSVs between die i − 1 and die i (for i > 1). The goal is to determine an optimal TAM design and a test schedule for the stack for die-internal and die-external tests such that the total test time is minimized and the number of test TSVs used per die does not exceed the specified limit. □
All three problems presented above are NP-hard by "proof by restriction" [21], as they can be reduced using standard techniques to the rectangle packing problem, which is known to be NP-hard [22]. For example, for the first problem, if we remove the constraints related to the maximum number of TSVs and consider only the final stack test insertion, each die can be represented as a rectangle whose width is its number of test pins and whose height is its test time. These rectangles must then be packed into a bin with width equal to the total number of test pins and height equal to the total test time for the stack, which needs to be minimized. Despite the NP-hard nature of these problems, they can be solved optimally since the number of layers in a 3D-SIC is expected to be limited, e.g., up to four layers have been predicted for logic stacks [23].
Test-Architecture Optimization
In this paper, we use ILP to solve the problems defined in the previous section. The problem instances in practice are relatively small for realistic stacks with anywhere from two to eight dies; therefore, ILP methods are good candidates for solving these optimization problems.
ILP Formulation for Problem 1 (Multiple Test Insertions)
To create an ILP model for this problem, we need to define the set of variables and constraints. We first define a binary variable x_{ijk}, which is equal to 1 if die i is tested in parallel with die j for a test insertion when there are k die in the stack, and 0 otherwise. There are |M| − 1 test insertions, where |M| is the number of dies in the complete stack, one for each additional die added to the stack, such that k ranges from 2 to |M|. Constraints on variable x_{ijk} can be defined as follows:
The first constraint indicates that every die is always considered to be tested with itself for every test insertion. The second constraint states that if die i is tested in parallel with die j for insertion k, then die j is also tested in parallel with die i for insertion k. The last constraint ensures that if die i is tested in parallel with die j for insertion k, then it must also be tested in parallel with all other dies that are tested in parallel with die j for insertion k.
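The three conditions described above amount to requiring that "tested in parallel" be reflexive, symmetric, and transitive within each insertion. One linear form consistent with that description is sketched below. The symbol x_{ijk} and the index names i, j, q, and k are notation assumed here for illustration, and the exact form used in the complete model may differ.

```latex
\begin{align}
x_{iik} &= 1 && \forall\, i, k \\
x_{ijk} &= x_{jik} && \forall\, i, j, k \\
1 - x_{ijk} \;\ge\; x_{iqk} - x_{jqk} &\;\ge\; x_{ijk} - 1 && \forall\, i \ne j \ne q,\; k
\end{align}
```

When x_{ijk} = 1, the third line forces x_{iqk} = x_{jqk}; when x_{ijk} = 0, it is slack, because the difference of two binary variables always lies between −1 and 1.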
Next, we define a second binary variable y_{ik}, which is equal to 0 if die i is tested in parallel with die j on a lower layer (i > j) for insertion k, and 1 otherwise. The total test time for the stack is the sum of test times of all dies that are tested in series plus the maximum of the test times for each of the sets of parallel tested dies for all test schedules at every test insertion. Using variables x_{ijk} and y_{ik}, the total test time for all test insertions with the set of dies M can be defined as follows.
It should be noted that Equation (4.4) is non-linear and is linearized using standard techniques. The linearized function for total test time can be written as follows.
As the number of test pins used for parallel testing of dies should not exceed the given number of test pins across all test schedules for every test insertion, a constraint on the total number of pins used to test all dies in a parallel set in any given test insertion can be defined as follows for all k.
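The test-time objective and the pin constraint described above can be evaluated directly for a single insertion once a schedule is fixed. The following is a minimal sketch under stated assumptions: the function names, the 0-based matrix layout, and all numbers are hypothetical and chosen only for illustration; they are not taken from the benchmark data.

```python
def insertion_test_time(x, y, t):
    """Test time of one insertion.

    x[i][j] is 1 if dies i and j are tested in parallel (0-based, die 0 lowest),
    y[i] is 1 unless die i is tested in parallel with some lower die,
    t[i] is the test time of die i in cycles.
    """
    n = len(t)
    # Only the lowest die of each parallel set (y[i] == 1) contributes,
    # and it contributes the longest test time within its set.
    return sum(
        y[i] * max(x[i][j] * t[j] for j in range(i, n))
        for i in range(n)
    )


def pins_within_limit(x, w, w_max):
    """True if no parallel set needs more than w_max test pins (w[i] = pins of die i)."""
    n = len(w)
    return all(sum(x[i][j] * w[j] for j in range(n)) <= w_max for i in range(n))


# Hypothetical three-die insertion: Die 1 and Die 2 in parallel, then Die 3.
x = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
y = [1, 0, 1]
t = [100, 80, 50]   # made-up cycle counts
w = [10, 8, 6]      # made-up TAM widths

print(insertion_test_time(x, y, t))   # 150
print(pins_within_limit(x, w, 20))    # True
print(pins_within_limit(x, w, 15))    # False
```

Summing `insertion_test_time` over every insertion of interest gives the cumulative test time that the model minimizes.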
Similarly, the total number of used TSVs should not exceed the given TSV limit (TSVmax) for each die face across all test insertions. It should be noted that TSVmax_2 is the limit for the upper face of Die 1 and the lower face of Die 2, TSVmax_3 is for the upper face of Die 2 and the lower face of Die 3, and so forth. The number of TSVs used to connect layer i to layer i − 1 is the maximum of the number of pins required by the layer at or above layer i that takes the most test pin connections, and the sum of parallel tested die at or above layer i in the same parallel tested set across all test insertions. Based on this, we can define the constraint on the total number of TSVs used in a test architecture as follows.
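The TSV count on one die interface, as described above, can be sketched as a maximum over all insertions and all parallel sets of the pins that must pass through that interface. The function name, the dictionary layout, and the TAM widths below are assumptions made for illustration only.

```python
def tsvs_used(schedules, w, layer):
    """Test TSVs needed on the interface just below `layer`.

    schedules maps each insertion (number of dies in the stack) to a list of
    parallel sets; w maps die number -> test pins; dies are numbered from 1.
    """
    # Every die at or above `layer` that is tested at the same time needs its
    # own pins routed through this interface, so their widths add up.
    return max(
        sum(w[d] for d in group if d >= layer)
        for groups in schedules.values()
        for group in groups
    )


w = {1: 10, 2: 8, 3: 6}            # made-up TAM widths
schedules = {
    2: [[1, 2]],                   # two-die stack: both dies in parallel
    3: [[1, 2], [3]],              # full stack: Dies 1 and 2 in parallel, then Die 3
}
print(tsvs_used(schedules, w, 2))  # 8
print(tsvs_used(schedules, w, 3))  # 6

# Testing Dies 2 and 3 in parallel instead pushes more pins through interface 2.
print(tsvs_used({3: [[1], [2, 3]]}, w, 2))  # 14
```

A schedule is feasible only if this value stays within the TSV limit of every interface.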
We can linearize the above set of constraints by representing the max function by a new variable. Finally, to complete the ILP model for this problem, we must define constraints on binary variable y_{ik} and the relationship between binary variables y_{ik} and x_{ijk}. For this purpose, we first define a constant C that approaches but is less than 1. We then define y_{ik} as follows:
The first equation forces y_{1k} to 1, since the lowest layer cannot be tested in parallel with any layer lower than itself. Constraint (4.9) defines y_{ik} for the other layers. To understand this constraint, we first make the observation that the objective function (as shown in Equation (4.4)) would be minimized if each y_{ik} is zero. This would make the objective function value equal to 0, which is an absolute minimum test time. Thus, we only need to restrict y_{ik} to 1 where it is absolutely necessary, and then we can rely on the objective function to assign a value 0 to all unrestricted y_{ik} variables. This equation considers the range of values that the sum of x_{ijk} can take. The fraction in the equation normalizes the sum to a value between 0 and 1 inclusive, while the summation considers all possible cases for a die being in parallel with die below it. The complete ILP model is shown in Figure 11 in the Appendix.
[...] The TSV limit is set to 20, such that there can be no more than 20 dedicated test TSVs between any two die (this limits the TSVs per die to 40). There are two test insertions, the first when Die 1 and Die 2 are stacked and the second for the complete stack. In the first test insertion (k = 2), the optimal solution is to test Die 1 and Die 2 in parallel. As such, x_{1,1,2}, x_{1,2,2}, x_{2,1,2}, and x_{2,2,2} are all equal to 1. Die 2 is tested in parallel with a die below it (Die 1), so y_{2,2} is 0 and y_{1,2} is 1. The test time for this test insertion is 1333098 cycles, since the term for Die 1 is 1333098 and the term for Die 2 is 0.
For the second test insertion (k = 3), the optimal solution is to test Die 1 and Die 2 in parallel, and then test Die 3. Since Die 1 and Die 2 are tested in parallel again, x_{1,1,3}, x_{1,2,3}, x_{2,1,3}, and x_{2,2,3} [...]
The above ILP model is a generalization of the special case presented in [20], in which test time was minimized only for the final stack test. If the variable k is constrained to take only one value, namely |M|, then optimization will produce a test architecture and test schedule that minimizes test time only for the final stack, i.e., the objective in [20]. We refer to optimization for only the final stack test as [...] and [...] for hard and soft die, respectively. Therefore, an advantage of the optimization model proposed here is that it is flexible: it can be easily tailored to minimize test time for any number of stack tests, from one final test to two or more intermediate test insertions. Multiple options can be automatically generated for testing the stack. For example, suppose we are interested in two test insertions: after the second die is bonded to the first die, and the final stack test. By allowing k to now take two values, namely 2 and |M|, we can minimize the test time for the stack considering only these two insertions.
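The flexibility in choosing which insertions to optimize for can be illustrated on a toy stack. The sketch below swaps the ILP solver for plain exhaustive search over all ways of grouping dies into parallel sets, which is only practical for very small stacks. All names and numbers are hypothetical, and a single TSV limit is applied to every interface as a simplifying assumption.

```python
def set_partitions(items):
    """Yield every way of splitting a list into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Put `first` into each existing group in turn ...
        for idx in range(len(part)):
            yield part[:idx] + [[first] + part[idx]] + part[idx + 1:]
        # ... or give it a group of its own.
        yield [[first]] + part


def best_schedule(k, t, w, w_max, tsv_max):
    """Exhaustively find the fastest feasible schedule for a k-die stack."""
    best = None
    for part in set_partitions(list(range(1, k + 1))):
        # Pin limit: each parallel set must fit into the available test pins.
        if any(sum(w[d] for d in g) > w_max for g in part):
            continue
        # TSV limit: pins routed through the interface below each upper die.
        if any(
            max(sum(w[d] for d in g if d >= i) for g in part) > tsv_max
            for i in range(2, k + 1)
        ):
            continue
        # Parallel sets run one after another; each takes its longest die test.
        time = sum(max(t[d] for d in g) for g in part)
        if best is None or time < best[0]:
            best = (time, part)
    return best


def total_test_time(insertions, t, w, w_max, tsv_max):
    """Sum of the best test times over the chosen insertions."""
    return sum(best_schedule(k, t, w, w_max, tsv_max)[0] for k in insertions)


t = {1: 100, 2: 80, 3: 50}   # made-up test times (cycles)
w = {1: 10, 2: 8, 3: 6}      # made-up TAM widths

print(total_test_time([2, 3], t, w, w_max=20, tsv_max=10))  # 250
print(total_test_time([3], t, w, w_max=20, tsv_max=10))     # 150
```

Passing only the final stack size reproduces the single-insertion case, while passing several stack sizes accumulates the test time of every selected insertion.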
ILP Formulations for the Parallel (||) and Serial (−−) TSV-Test Variants
[...] The above expression begins by taking the floor of [...]/4. We assume that, between two die, there are any number of functional TSVs that are used for either input to the die or output from the die. The TSV tips are latched and the corresponding flip-flops are connected to form one or more scan-chains. Thus, the scan-flops are also assumed to be unidirectional. In order to quickly test all TSVs, we must allow for shift-in, shift-out, and capture on both sides of the TSVs simultaneously. This requires 4 test pins for each TSV scan-chain, i.e., an input and output pin for each scan-chain for both sides of the TSVs, as seen in Figure 6(a). As can be seen from Figure 6(b), there is no reduction in test time for stack tests if we consider unequal numbers of TSV scan-chains for die on either side of the TSVs, as the bottleneck in test time would then be the die with the fewest TSV scan-chains (TSV testing requires use of TSV scan-chains on both sides of the TSVs in parallel). At least four more test pins must be added for a reduction in test time, as shown in Figure 6(c). Thus, we evenly divide test-pin use among the die for TSV tests, and the number of scan-chains on either side of the TSVs is ⌈[...]/⌊([...]/4)⌋⌉. We multiply this by the number of patterns required for TSV testing plus one, to accommodate shift-in and shift-out operations. Without loss of generality, we do not consider the number of test pins required for control signals to a die-level wrapper for TSV testing.
Figure 7 shows the difference between fat and thin wrappers. In Figure 7(a), the die-internal and die-external TAMs utilize different test pins. Thus, EXTEST can be performed in parallel with either or both INTEST of Die 1 and Die 2. This is representative of fat wrappers. For thin wrappers, Figure 7(b) shows that the same test pins are utilized for die-internal and die-external tests, and as such test data must be multiplexed to the correct TAM. In this case, the INTEST of each die can be performed in parallel, but EXTEST cannot be performed in parallel with INTEST on Die 1 or Die 2 (though it can be in parallel with INTEST of other dies in the stack).
Figure 7: Simplified illustration of (a) fat wrappers and (b) thin wrappers.
The ILP formulation for 3D-SICs with hard cores including TSVs is derived in a similar manner as the 3D-SIC with hard cores in Problem 1. We begin by removing the subscript from all variables that accounts for multiple test insertions, since we only consider the final stack test. As stated before, we consider a set of TSVs between two layers to be a virtual die for purposes of optimization, such that Die 1 is the lowest die in the stack, Die 2 represents the TSVs between Die 1 and Die 3, Die 3 is the second die in the stack, Die 4 represents the TSVs between Die 3 and Die 5, and so on. In this way, odd-numbered die are actual die and even-numbered die represent the TSVs between two odd-numbered dies.
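The TSV test-time reasoning in this section (four test pins per TSV scan-chain, and a cycle count proportional to the number of patterns plus one) can be sketched as follows. This is one reading of the description, and it rests on an assumption: the rounded-up quotient is taken here to be the number of functional TSVs divided by the number of scan-chains, i.e., the scan-chain length. The function name is hypothetical; the values of 10,000 TSVs and 20 patterns are those used in the experimental section.

```python
def tsv_test_cycles(num_tsvs, test_pins, patterns):
    """Estimated cycles to test the TSVs on one interface.

    Assumes four test pins per scan-chain (input and output on both sides)
    and equal numbers of scan-chains on both sides of the TSVs.
    """
    chains = test_pins // 4                 # floor(pins / 4) scan-chains
    if chains == 0:
        raise ValueError("at least four test pins are needed for one TSV scan-chain")
    chain_length = -(-num_tsvs // chains)   # integer ceiling division
    # One extra shift covers the final shift-out after the last pattern.
    return chain_length * (patterns + 1)


print(tsv_test_cycles(10_000, 4, 20))   # 210000
print(tsv_test_cycles(10_000, 7, 20))   # 210000
print(tsv_test_cycles(10_000, 8, 20))   # 105000
```

Under this reading, going from four to seven pins changes nothing, and the estimate only drops once a full group of four additional pins allows another scan-chain, which matches the observation made for Figure 6.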
We must add variables and constraints in order to accurately model the dies representing TSVs. We begin by defining a binary variable that is equal to 1 if a die representing TSVs can be tested in parallel with the dies below and above it, and 0 otherwise. In the case of fat wrappers, we leave this variable as a decision variable, and in the case of thin wrappers we force it to 0 for all die representing TSVs. For actual (odd-numbered) die, it is always 1. If a die representing TSVs can be tested in parallel with the die around it, then the number of test pins given to TSV testing reduces the number of pins available to the die for testing. Otherwise, the TSVs can use all the test pins that are utilized by the dies around it. We define a set that contains all odd-numbered die, i.e., the actual die, in the stack. We then place the following restrictions on this variable for fat wrappers (the first constraint applies for thin wrappers as well):
This restricts to 0 if all pins in a die are shared for TSV tests, and otherwise leaves the variable open to optimization. We use to accurately represent the number of TSVs and test pins used in the stack and use these constraints to further restrict the variable . We redefine the variable as follows:
Thus, we tally test elevator use only between actual die, but take into account the number of extra test elevators used for the TSV layer considering parallel testing. For thin wrappers, the last constraint on is removed. For Die 1, the last constraint is instead ≥ + +1 ⋅ +1 . Test pin constraints therefore reduce to:
The constraint (4.20) accounts for the combined test-pin use by dies and TSVs, when they are tested in parallel or they are tested serially. This constraint is removed for thin wrappers, and for the first die instead reads
It is further necessary to update the variable as follows:
For the serial (−−) variant, we further add the constraint:
Experimental Results
In this section, we present experimental results for the ILP models given in Section 4. As benchmarks, we have handcrafted two 3D-SICs (as shown in Figure 8) using several SOCs from the ITC'02 SOC Test Benchmarks as dies inside the SICs. These are equivalent to two of the benchmark 3D-SICs in [20]. The SOCs used are d695, f2126, p22810, p34292, and p93791. In SIC 1, the most complex die (p93791) is placed at the bottom, with die complexity decreasing as one moves up the stack. The order is reversed in SIC 2. To determine the test architecture and test time for a given die (SOC) with a given TAM width, we have used the TAM design method in [24] for daisy-chain TestRail architectures [25]. For problem instances with hard dies, the test times (cycles) and TAM widths for different dies are listed in Table 1. Note that test pins were assigned to dies based on their sizes in order to avoid very large test times for any individual die.
For a fixed value of [...] and a range of values of [...], Table 2 presents results for [...] for the two benchmark SICs. They are compared against optimized results for only the final post-bond stack test (referred to as [...]). The ILP models were run and optimal results obtained using XPRESS-MP [26]. The CPU times for the experiments on an AMD Opteron 250 with four gigabytes of memory were in the range of a few seconds to eight minutes. From Table 2 we can see that the proposed method can provide considerable reductions in test time over the optimization methods of [20]. [...]
For a different number of TSVs, Figure 9(a) and Figure 9(b) [...] for a given [...] does not always decrease the test time. These Pareto-optimal points can easily be seen in Figure 9 for both benchmarks. Furthermore, we see that optimizing for the final stack test does not always reduce test time when we consider multiple test insertions. In fact, we often see that the test time is higher when we increase the values of the test-pin and TSV limits. It should be noted that for all optimizations with the same values of these limits, the stack configuration (SIC 2) with the largest die at the highest layer and the smallest die at the lowest layer is the best for reducing test time while using the minimum number of TSVs. This is because the most complex dies with the longest test times are tested in the fewest insertions. For example, the die at the top of the five-die stack is only tested once, while the die at the bottom is tested four times.
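The effect of die ordering on cumulative test time can be illustrated with a small calculation. The sketch below counts how many post-bond stack tests include each layer and, as a simplifying assumption, treats every die as tested serially in every insertion; the function names and test times are hypothetical.

```python
def insertions_tested(layer, num_dies):
    """Number of post-bond stack tests that include the die on `layer` (1 = lowest)."""
    first = max(layer, 2)   # the first stack test happens once two dies are bonded
    return num_dies - first + 1


def serial_cumulative_time(times_bottom_to_top):
    """Cumulative test time over all insertions if every die is tested serially."""
    n = len(times_bottom_to_top)
    return sum(
        t * insertions_tested(layer, n)
        for layer, t in enumerate(times_bottom_to_top, start=1)
    )


print([insertions_tested(layer, 5) for layer in range(1, 6)])  # [4, 4, 3, 2, 1]

# Made-up test times: longest test at the bottom versus longest test at the top.
print(serial_cumulative_time([50, 40, 30, 20, 10]))  # 500
print(serial_cumulative_time([10, 20, 30, 40, 50]))  # 340
```

Placing the longest tests on the upper layers lowers the cumulative figure in this toy case because those dies take part in fewer insertions.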
For the parallel (||) variant, two extra test pins were added to the hard die for each EXTEST TAM on a die. For the highest and lowest dies in the stack, this implies an addition of two test pins, while for other dies it implies an addition of four test pins. For our problem instances, we assumed, without loss of generality, that each die had 10,000 functional TSVs, requiring 20 patterns. It has been reported that the number of tests for TSVs is likely to grow logarithmically with TSV count [10]. In order to make comparisons between the serial (−−) and parallel (||) variants, either two or four test pins were added to the total TAM width of the hard dies. As can be seen in Figure 10, there is a large dependence of test length on both the test-pin limit and the TSV limit. These parameters and the hard TAM widths determine whether serial testing or parallel testing of TSVs with the dies is adopted to lower test times. For a majority of values of these limits and for the TAM widths chosen, parallel testing leads to lower test times. We note here that these optimizations are only for the final stack test, so SIC 2 results in longer test times than SIC 1, as expected from previous work [20].
Conclusions
We have presented generalized optimization methods to minimize test time for a 3D-SIC, either for the final stack test or for any number of multiple test insertions during bonding. These methods provide us with optimized test schedules for each test insertion. We have presented optimization techniques that consider both die-internal and die-external test for both fat and thin die wrappers. The proposed optimization methods incorporate constraints on both test bandwidth and the number of dedicated test elevators per die. Results have been presented for two different stack configurations with die made up of five SOCs taken from the ITC'02 SOC Test Benchmarks. These results show that optimization methods that only consider the final stack test provide significantly suboptimal results (higher test times) if multiple test insertions are carried out. Optimizations for hard die that take into account die-internal and die-external tests show that in general fat wrappers reduce test time more than thin wrappers.
