Abstract-Built-in self-test (BIST) is a well-known design technique in which part of a circuit is used to test the circuit itself. BIST plays an important role for embedded memories, which do not have pins or pads exposed toward the periphery of the chip for testing with automatic test equipment. With the rapidly increasing number of embedded memories in modern SOCs (up to hundreds of memories in each hard macro of the SOC), product designers incur substantial costs of test time (subject to possible power constraints) and BIST logic physical resources (area, routing, power). However, only limited previous work addresses the physical design optimization of BIST logic; notably, Chien et al. [7] optimize BIST design with respect to test time, routing length, and area. In our work, we propose a new three-step heuristic approach to minimize test time as well as test physical layout resources, subject to given upper bounds on power consumption. A key contribution is an integer linear programming (ILP) framework that determines optimal test time for a given cluster of memories using either one or two BIST controllers, subject to test power limits and with full comprehension of available serialization and parallelization. Our heuristic approach integrates (i) generation of a hypergraph over the memories, with test time-aware weighting of hyperedges, along with top-down, FM-style min-cut partitioning; (ii) solution of an ILP that comprehends parallel and serial testing to optimize test scheduling per BIST controller; and (iii) placement of BIST logic to minimize routing and buffering costs. When evaluated on hard macros from a recent industrial 28nm networking SOC, our heuristic solutions reduce test time estimates by up to 11.57% with strictly fewer BIST controllers per hard macro, compared to the industrial solutions.
I. INTRODUCTION
In modern SOCs, embedded memories (normally, SRAM blocks) can account for more than 50% of die area [25]. Since a defect in embedded memory can make the entire chip fail, design for test (DFT) techniques for embedded memory are essential. Built-in self-test (BIST) is an increasingly effective and necessary DFT technique in which part of a circuit is used to test the circuit itself [1]. In particular, BIST is now ubiquitous for embedded memories, which do not have pins or pads exposed for testing with automated test equipment (ATE).
Memory BIST affects design quality and chip cost in several basic ways.
• BIST controller logic occupies silicon real estate, and contributes to die area, leakage power, and routing congestion. All else being equal, the fewer BIST controller blocks, the better.
• The widths and depths of embedded memories assigned to a given BIST controller must be "packed" into a feasible test schedule that minimizes test time subject to maximum power constraints. The test time directly impacts product cost and is a first-class design consideration, especially in a design with many memories.
• The physical placement of a BIST controller logic block relative to its associated memory blocks affects not only routability, but also the signal delay between the controller and the memories. Larger distances force the use of more buffering and lower-V_T devices to meet timing and electrical constraints; this costs more power.
From these considerations, it is apparent that the co-optimization of physical design resources, test power, leakage power, and test time falls between front-end DFT groups and back-end physical design groups. On the one hand, a floorplan-oblivious partitioning of memories to BIST controllers might force use of low-V_T (LVT) cells to meet timing requirements. On the other hand, the physical design (PD) engineer's suggested partitioning may lead to less congestion, routing cost, and signal delay between memories and BIST logic, but with dramatically increased test time.
In this paper, we describe a heuristic optimization that smooths the interactions between front-end DFT and back-end PD, reducing iterations and schedule costs. Our heuristic minimizes test time as well as test physical layout resources, subject to given upper bounds on power consumption. A new integer linear program (ILP) formulation finds the optimal test time for a given cluster of memories using either one or two BIST controllers, taking full advantage of any available serialization and parallelization of the memory self-test. When evaluated on hard macros from a recent industrial 28nm networking SOC, our heuristic solutions reduce test time estimates by up to 11.57% with strictly fewer BIST controllers per hard macro, compared to the industrial solutions.
978-3-9815370-2-4/DATE14/ © 2014 EDAA
Our main contributions can be summarized as follows.
• We propose a weighted hypergraph construction that allows use of top-down min-cut partitioning of memories into clusters that have good physical design and test scheduling attributes.
• We propose an ILP that comprehends parallel and serial testing of a given group of memories as it finds a minimum-test time solution with one or two BIST controllers.
• We use the above two elements, along with bottleneck matching to find BIST logic placement locations, in a heuristic that simultaneously reduces both BIST logic and test time costs in hard macros from a recent 28nm networking SOC.
In the remainder of this paper, Section II briefly reviews related works in the areas of test scheduling and memory BIST. Section III describes our ILP formulation to minimize test time taking advantage of available serialization and parallelization. Section IV presents our heuristic approach, and Section V gives experimental results with industrial testcases. Section VI describes directions of ongoing work and concludes the paper.
II. RELATED WORKS
In this section, we broadly classify related literature as dealing with (1) test scheduling and (2) BIST controller optimizations.
A. Test Scheduling
Test time reduction has long been a basic goal of DFT research, since test time is directly related to test cost. Parallel (simultaneous) testing reduces test time but is constrained by power and bandwidth (pin count) limits. Works such as that of Yao et al. [20] formulate and solve the test scheduling problem to minimize total test time while satisfying such constraints. Iyengar et al. [12] [13] adapt a rectangle packing problem formulation to test scheduling; they co-optimize test access mechanism (TAM) architecture and test wrapper, while designating a group of tests. Zou et al. [21] formulate SOC test scheduling as two-dimensional bin packing under given pin constraints, and use simulated annealing to search for a heuristically optimal test schedule by perturbing an initial solution.
Other researchers have applied integer linear programming (ILP) to find optimal test schedules under constraints [5] [6] [8] [15]. Chakrabarty [5] proposes test access architectures that incorporate place-and-route constraints arising from interconnections. The work of [6] uses mixed integer-linear programming (MILP) to optimize test schedules for core-based systems; a heuristic algorithm efficiently solves larger problem instances for which the MILP approach has excessive runtime. Liu et al. [15] apply an ILP formulation to NOC instances. Chin and Nourani [8] propose a flexible ILP-based test scheduling environment with many user options.
Wang et al. [19] develop a test scheduling algorithm based on elements of the March algorithm for memory BIST; the objective is to minimize overall testing time under a power constraint.
Unlike previous works, we study the minimization of total test time in the context of a mixture of serial and parallel testing, with multiple memory BIST controllers, by considering physical information of memories.¹ We note that most of the previous literature on test scheduling addresses scheduling for logic cores, where the testing is mainly performed by scan chain techniques. By contrast, we address embedded memory testing using multiple memory BIST controllers, where the memories have different sizes, test times, and test power values.
B. Design Optimizations for Memory BIST Controllers
Most works in the memory BIST literature focus on architectural and testing aspects, even though design optimization of memory BIST can provide substantial benefits to the entire chip design and to test quality. To our knowledge, relatively few works exist in the realm of (physically-aware) design optimization of memory BIST.² A memory grouping method for sharing memory BIST logic is proposed by Miyazaki et al. in [17]. Area overhead reductions are achieved by the grouping of memories for parallel and serial testing. Devanathan et al. [9] propose a physically-aware memory BIST datapath synthesis framework, wherein a hierarchical synthesis approach achieves correct-by-construction, area-efficient memory BIST solutions. Devanathan et al. demonstrate the benefits from strategic approaches to physically-aware BIST in [11] and built-in self-repair (BISR) design optimization methods in [10]: such techniques mitigate the difficulties of physical design closure such as congestion and timing closure, even as the numbers of memory instances and BIST controllers in complex SOCs continue to increase. The authors of [10] [11] also note that their methods enable designers to apply more effective tests and reduce verification cycle times.
Chien et al. [7] propose a memory BIST design optimization method to minimize test time, wire length and total area while considering several practical design constraints. To our knowledge, [7] is the first published work considering aspects of physical design for memory BIST controllers. The authors adopt an integer linear programming (ILP) formulation for the assignment of memories to controllers. They then apply legalization and refinement steps to meet user-specified constraints and to further improve the quality of their solution. Although [7] is the previous work that is closest to ours, we observe that it makes a number of simplifications that we avoid, e.g., (i) all memory instances in a BIST cluster are tested in parallel (leading to an unrealistic test time estimate); and (ii) only one cluster is tested at a time (preventing exploitation of parallel testing with multiple BIST controllers).³
III. ILP FORMULATION
We develop an integer linear program (ILP) to solve the memory test scheduling problem when using multiple BIST controllers. Note that our ILP formulation is very different from those of [5] [6] [8] [15] since we use logical constraints to define parallel and serial testing. Table I defines notations used in our discussion. The objective is to minimize total test time, i.e., to minimize the maximum test completion time over all memories.
We assume that a memory has test time proportional to its depth [17] and test power proportional to the square root of its size. Based on our studies, we see that allowing both serial and parallel testing of memories can reduce test time, as illustrated in Figure 1.
Indicator whether m_i is tested with BIST controller
Indicator whether m_i and m_j belong to the same BIST controller
Indicator whether m_i and m_j are tested in parallel with the same BIST controller
Indicator whether m_i is tested before starting test of m_j with the same BIST controller
Indicator whether
Positive and very small real number, 0 < ε ≪ 1
The ILP constraints are as follows.
Maximum power constraint. We use E_MAX to denote an upper bound on maximum available test power. The instantaneous testing power E(t_q) cannot exceed E_MAX, as indicated by constraint (2). E(t_q) is the sum of test power consumption for all memory instances m_i being tested at time t_q, as shown in Equation (3), where U_i(t_q) indicates whether m_i is being tested at time t_q, and E(m_i) is the test power of m_i. Constraint (4) ensures that every memory is tested, which is required for a valid solution.
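For reference, the two power relations described in this paragraph can be written out as follows. This is a reconstruction from the prose only, using the symbols E_MAX, E(t_q), U_i(t_q) and E(m_i) named there; the numbering follows the constraint numbers cited in the text, and the typesetting of the original equations may differ.

```latex
\begin{align}
  E(t_q) &\le E_{MAX} \qquad \forall\, t_q \tag{2} \\
  E(t_q) &= \sum_{i} U_i(t_q)\, E(m_i) \tag{3}
\end{align}
```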
BIST assignment constraint. We use the constraint (5) to ensure that each memory is uniquely assigned to a BIST controller for testing. 
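As an illustration of the power, coverage, and assignment constraints described above, the following Python sketch checks a candidate schedule rather than solving the ILP. It is a sketch under stated assumptions: the `Memory` class, the `check_schedule` function, the proportionality constants (both set to 1), and the reading of "size" as width × depth are all hypothetical and not taken from the paper. It does not model the logical constraints that define serial and parallel testing within one controller.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Memory:
    name: str
    width: int  # bits per word
    depth: int  # number of words

    @property
    def test_time(self) -> float:
        # Text assumption: test time is proportional to depth (constant 1 here).
        return float(self.depth)

    @property
    def test_power(self) -> float:
        # Text assumption: test power is proportional to sqrt(size).
        # Treating size as width * depth is our own assumption.
        return sqrt(self.width * self.depth)


def check_schedule(memories, schedule, e_max):
    """Check a schedule {memory name: (controller id, start time)}.

    Returns (True, total test time) if every memory is scheduled and the
    summed test power never exceeds e_max, else (False, None).
    """
    # One dict entry per memory: each memory is tested once and is
    # assigned to exactly one controller.
    if set(schedule) != {m.name for m in memories}:
        return False, None
    by_name = {m.name: m for m in memories}
    intervals = [
        (start, start + by_name[n].test_time, by_name[n].test_power)
        for n, (_, start) in schedule.items()
    ]
    # Instantaneous power only rises at test start times, so it is
    # enough to evaluate the power sum at each start time.
    for t, _, _ in intervals:
        power = sum(p for s, e, p in intervals if s <= t < e)
        if power > e_max + 1e-9:
            return False, None
    return True, max(e for _, e, _ in intervals)


mems = [Memory("a", 32, 1024), Memory("b", 16, 512), Memory("c", 16, 512)]
# "a" alone first, then "b" and "c" in parallel on a second controller.
mixed = {"a": (0, 0.0), "b": (1, 1024.0), "c": (1, 1024.0)}
feasible, total_time = check_schedule(mems, mixed, e_max=200.0)
```

With these illustrative numbers, starting all three memories at time 0 would exceed the power bound, while the mixed schedule stays within it.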
IV. CO-OPTIMIZATION OF TEST SCHEDULING AND MEMORY BIST LOGIC PLACEMENT
We now describe our heuristic methodology for the co-optimization of test scheduling and memory BIST logic placement. Modern semiconductor chips contain hundreds of embedded memories scattered across the entire die. These memories can have various widths and depths, and can belong to different clock and logic hierarchies. Both the number and complexity of memory instances make the test scheduling problem extremely hard. We utilize a "divide-and-conquer" approach to develop a three-step heuristic method that (1) initially partitions all memories based on physical information using MLPart [4] [24]; (2) solves the test scheduling problem using an ILP formulation, followed by additional partitioning for better test time optimization; and (3) places memory BIST logic for each partition to minimize wirelength between memory BIST logic and memories. The goals of our heuristic approach are (1) minimization of test time, (2) reduction of number of partitions (i.e., number of BIST controllers), and (3) minimization of wirelength between each BIST and memories. We have developed a solver that uses command-line options as shown in Table II. Algorithms 1-3 outline our heuristic modeling approach.
A. Memory Partitioning
Memory partitioning is the "divide" step in our heuristic approach. We divide memory instances into k partitions using MLPart [4] [24], a min-cut hypergraph partitioner based on the multilevel Fiduccia-Mattheyses hypergraph partitioning [24] algorithm. The input to MLPart is a hypergraph G, where each node in G corresponds to a memory in the design (Algorithm 2). We define edge weights based on parameters such as memory shape, depth, power, location, etc. We expect that partitioning memories that have the same shape or depth into one group leads to higher opportunity to minimize test time. This is because memories with the same depth can be tested in parallel and memories with the same power can be tested in serial, which minimizes idle space in test time and power. In addition, we assign larger weights to edges when memories are closer.
Algorithm 1 (excerpt)
    end if // partition p_j is the input for the next bipartitioning
13: for criterion index r = 1 to 6 do
    ...
15: G ← GenerateHypergraph(p_j, r, G);
16: end for
17: ...
18: end for
    ...
20: if D_max > maxD then
    ...
22: if n == numMaxP then
    ...
Algorithm 2 (excerpt)
    ...
    add node v_i to G;
5: visited(v_i) ← false;
6: end for
7: for i = 0 to |p| − 1 do
8: V_conn ← ∅; e ← null;
9: V_conn ← {v_i}; // v_i is reference node
10: visited(v_i) ← true;
11: for j = 0 to |p| − 1 do
12: if i ≠ j then
13: if (visited(v_j) == false) || (r ≥ 4) then
14: if v_i and v_j satisfy criterion crit_r then
15: ...
    add (hyper)edge e to G;
25: end if
26: end for
27: return G;
Algorithm 3 (excerpt)
    ...
    numAddBIST ← numAddBIST − 1;
20: end while
21: P_out ← P_k;
Table III summarizes the edge weights used in G, each corresponding to one criterion. crit_1, crit_2, and crit_3 are the criteria of hyperedges between memories that have the same shape (crit_1), depth (crit_2) and test power (crit_3), respectively. In addition, crit_4, crit_5, and crit_6 specify the criteria of edges between pairs of memories with distances ≤ longD (crit_4), ≤ (longD + shortD)/2 (crit_5), and ≤ shortD (crit_6), respectively. In our implementations, we set longD and shortD to 1000μm and 250μm,⁴ respectively. The weights of hyperedges (respectively, edges) are additive, e.g., memories that have the same shape are connected by hyperedges with weight K_1 + K_2 + K_3 since memories having the same shape also have the same depth and power.⁵
In Algorithm 1, we loop through a number of partitions ranging from numMaxP down to numMinP, in order to obtain a k-way partitioning result that satisfies the given constraints of maxMemP and maxD. Since MLPart only returns bipartitions, we execute MLPart k − 1 times to obtain a k-way partitioning (Line 3 in Algorithm 1). At each iteration, we choose one next partition as the input to MLPart (Lines 5-12 in Algorithm 1) based on the following criteria, in order of priority.
1) The partition that violates the maxMemP constraint, which is defined as twice the current average number of memories per partition.
2) The partition that violates the maxD constraint on diameter, where the diameter of a partition is defined as the half-perimeter of the bounding box of the memory blocks in that partition.
3) Otherwise, both the partition with the maximum number of nodes and the partition with the largest diameter are bipartitioned using MLPart, and the bipartition with the smaller cut is selected.
We define the size of a partition as the number of memories in the partition. The above criteria result in partitions that have similar sizes. We also specify a tolerance in MLPart to further promote balanced partition sizes.
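The priority order above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: memories are reduced to hypothetical (x, y) points, the function names are our own, and when several partitions violate a constraint we assume the worst offender is chosen, which the text does not specify. For the third criterion the sketch only returns both candidates, since comparing cut sizes requires running the partitioner.

```python
def diameter(partition):
    """Half-perimeter of the bounding box of a partition's memory locations."""
    xs = [x for x, _ in partition]
    ys = [y for _, y in partition]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def candidates_for_split(partitions, max_d):
    """Return indices of the partition(s) to bipartition next."""
    # maxMemP is twice the current average number of memories per partition.
    max_mem_p = 2 * sum(len(p) for p in partitions) / len(partitions)

    # Priority 1: a partition that holds too many memories.
    too_big = [i for i, p in enumerate(partitions) if len(p) > max_mem_p]
    if too_big:
        return [max(too_big, key=lambda i: len(partitions[i]))]

    # Priority 2: a partition whose diameter exceeds maxD.
    too_wide = [i for i, p in enumerate(partitions) if diameter(p) > max_d]
    if too_wide:
        return [max(too_wide, key=lambda i: diameter(partitions[i]))]

    # Priority 3: try both the largest and the widest partition; the caller
    # would bipartition each and keep the result with the smaller cut.
    largest = max(range(len(partitions)), key=lambda i: len(partitions[i]))
    widest = max(range(len(partitions)), key=lambda i: diameter(partitions[i]))
    return sorted({largest, widest})


parts = [[(0, 0), (100, 50)], [(0, 0), (3000, 2000), (10, 10)], [(5, 5)]]
next_to_split = candidates_for_split(parts, max_d=4000)
```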
Fewer partitions result in a larger solution space for scheduling, which in turn leads to less test time. Therefore we minimize the number of partitions with respect to the maximum diameter and maximum size constraints. In Algorithm 1, we keep reducing the number of partitions as long as the diameter of all partitions is ≤ maxD (Lines 20-27 in Algorithm 1). If one of n partitions violates the maxD constraint, we end Algorithm 1 and return an (n+1)- or n-way partitioning result by comparing n to numMaxP.
In Algorithm 2, we construct the weighted hypergraph. After mapping each m_i in p into v_i (Lines 2-6 in Algorithm 2), we collect the set V_conn of all nodes that satisfy crit_r with respect to the reference node v_i. If |V_conn| ≥ 2, we connect all nodes in V_conn as hyperedge e and add e to hypergraph G (Lines 7-26 in Algorithm 2).
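The hypergraph construction described above can be sketched in Python as follows. This is one possible reading of the description, not the authors' code: the `Mem` record, the `satisfies` helper, the use of Manhattan distance, and the numeric weights are all assumptions made for illustration. For the distance criteria (r ≥ 4) every reference node contributes its own neighborhood, so the same node set can appear more than once.

```python
from collections import namedtuple

# Hypothetical record: shape is (width, depth); (x, y) is the memory location.
Mem = namedtuple("Mem", "width depth power x y")


def satisfies(a, b, r, long_d, short_d):
    """Check criterion crit_r between two memories."""
    dist = abs(a.x - b.x) + abs(a.y - b.y)  # Manhattan distance (assumed)
    if r == 1:
        return (a.width, a.depth) == (b.width, b.depth)  # same shape
    if r == 2:
        return a.depth == b.depth
    if r == 3:
        return a.power == b.power
    if r == 4:
        return dist <= long_d
    if r == 5:
        return dist <= (long_d + short_d) / 2
    if r == 6:
        return dist <= short_d
    raise ValueError("criterion index must be 1..6")


def generate_hypergraph(mems, r, edges, weight, long_d=1000.0, short_d=250.0):
    """Append the hyperedges for criterion r to edges as (node set, weight)."""
    visited = [False] * len(mems)
    for i, ref in enumerate(mems):
        conn = [i]  # the reference node
        visited[i] = True
        for j, other in enumerate(mems):
            if i == j:
                continue
            # Equivalence criteria (r < 4) skip nodes already placed in a
            # hyperedge; distance criteria (r >= 4) consider every node.
            if (not visited[j] or r >= 4) and satisfies(ref, other, r, long_d, short_d):
                conn.append(j)
                if r < 4:
                    visited[j] = True
        if len(conn) >= 2:
            edges.append((frozenset(conn), weight))
    return edges


mems = [Mem(32, 1024, 10, 0, 0), Mem(32, 1024, 10, 100, 0), Mem(16, 1024, 5, 5000, 0)]
edges = []
for r, k_r in [(1, 3.0), (2, 2.0)]:  # illustrative weights only
    generate_hypergraph(mems, r, edges, k_r)
```

Because parallel hyperedges over the same node set add up in the cut cost, emitting one hyperedge per criterion has the same effect on min-cut partitioning as a single hyperedge carrying the summed weight.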
B. Test Scheduling
After partitioning, we solve the ILP described in Section III to obtain a test schedule. The number of extra BIST controllers (numAddBIST ) is calculated as the difference between the current number of partitions (k) and numMaxP. Utilizing extra memory BIST controller resources can reduce the overall test time. Figure 2 illustrates an example showing how the test time can be reduced by utilizing additional memory BIST controllers.
For further test time reduction with extra BIST controllers, we try splitting each partition. SolveMBIST ILP(n) returns the solution (Sol) of the ILP for a given number n of memory BIST controllers. The solution (Sol) contains the test cost (Sol.cost) for the corresponding partition. Our heuristic allows for at most two memory BIST controllers (i.e., n = 2) for each partition.⁶ When numAddBIST is larger than zero, we run SolveMBIST ILP(n) with both one and two memory BIST controllers to calculate the benefit (Gain_{p_i}) of splitting the partition p_i. Since the two partitions are generated by SolveMBIST ILP(2) such that the given power constraint is satisfied, both of the split partitions can be tested simultaneously, enabling us to achieve a test time reduction.
We observe that most memories that have the same shape are scheduled in parallel with the same memory BIST controller; we can therefore pre-group those memories to reduce the runtime of the ILP solver without significantly affecting solution quality. We group memories that have the same shape (width × depth) in each partition and treat each group as a single large memory, which improves runtime by reducing the number of ILP constraints (GroupMemories(s_j), Line 5 in Algorithm 3).⁷ The number of memories in a group (s_j) can be decided by the maximum power constraint. After solving the ILP (Lines 6-7 in Algorithm 3), these grouped memories are tested in parallel with the same memory BIST controller. Since s_j affects the solution of the ILP, we try different values of s_j in GroupSizes to obtain the best solutions. In Algorithm 3, Sol_{p_i,s_1,best} gives the minimum test cost with one BIST logic by s_1, and Sol_{p_{i1},s_2,best} and Sol_{p_{i2},s_2,best} give the minimum test cost with two BIST logics by s_2 (Lines 9-10).
At the last stage in this procedure (Lines 13-19 in Algorithm 3), we identify the partition that has the largest Gain_{p_i} in the overall test scheduling from splitting with additional memory BIST controllers. The selected partition is divided into two partitions (p_a and p_b) to be mapped to the additional memory BIST controller. When all the available memory BIST controllers are consumed, the procedure ends and returns the partition and test scheduling result. We exit the while loop when the largest Gain_{p_i} is zero (Lines 14-16 in Algorithm 3).
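The gain-driven splitting loop described in this subsection can be sketched in Python as follows. This is a sketch under stated assumptions: `toy_solve` is a deliberately simple stand-in for SolveMBIST ILP(n) that tests memories serially on each controller, ignores the power limit and the memory grouping step, and balances two controllers with a longest-first greedy rule. It is not the ILP of Section III. Each partition is represented only by a list of per-memory test times.

```python
def toy_solve(times, n):
    """Toy stand-in for SolveMBIST ILP(n): returns (test cost, groups).

    Assumes serial testing per controller and no power limit, which the
    real ILP does not assume.
    """
    groups = [[] for _ in range(n)]
    for t in sorted(times, reverse=True):
        min(groups, key=sum).append(t)  # give to the least-loaded controller
    return max(sum(g) for g in groups), groups


def greedy_split(partitions, num_add_bist, solve=toy_solve):
    """Spend extra BIST controllers on the partitions with the largest gain."""
    partitions = [list(p) for p in partitions]
    while num_add_bist > 0:
        best_gain, best_idx, best_groups = 0, None, None
        for idx, p in enumerate(partitions):
            cost_one, _ = solve(p, 1)
            cost_two, groups = solve(p, 2)
            gain = cost_one - cost_two  # benefit of splitting this partition
            if gain > best_gain:
                best_gain, best_idx, best_groups = gain, idx, groups
        if best_idx is None:  # largest gain is zero: stop early
            break
        # Replace the chosen partition by its two halves.
        partitions[best_idx:best_idx + 1] = best_groups
        num_add_bist -= 1
    return partitions


result = greedy_split([[4, 3, 3], [10], [2, 2]], num_add_bist=2)
```

In this toy instance the first extra controller goes to the partition with test times 4, 3 and 3, because splitting it saves the most time; a single-memory partition never gains from a split and is left alone.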
C. Memory BIST Logic Placement
In the memory BIST logic placement step, we first define grids that cover the entire design. Any grid square that does not intersect memories is a possible location for BIST logic placement. We calculate the diameter from a grid square to all memories in a partition, and use this as a cost parameter. By calculating this cost parameter for all grid squares and all partitions, we generate a two-dimensional cost matrix for each grid square and memory partition. We then use this cost matrix to formulate and solve a min-weight maximum-matching problem in a bipartite graph, which is efficiently solvable using the Hungarian algorithm [23] . The resulting matching heuristically addresses timing criticality in paths between BIST logic and memories.
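The placement step can be illustrated with a small Python sketch. The assumptions are ours: candidate grid squares and memories are reduced to hypothetical (x, y) points, the cost of pairing a square with a partition is taken to be the half-perimeter of the bounding box enclosing both, and the matching is found by brute-force enumeration, which is only practical for tiny instances. The paper uses the Hungarian algorithm for this step; the brute-force search here is a stand-in that returns a minimum-total-cost assignment in the same sense.

```python
from itertools import permutations


def placement_cost(site, partition):
    """Half-perimeter of the box enclosing a candidate site and a partition."""
    xs = [site[0]] + [x for x, _ in partition]
    ys = [site[1]] + [y for _, y in partition]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def assign_sites(sites, partitions):
    """Return, for each partition, the index of its assigned site.

    Requires at least as many free sites as partitions.
    """
    # Cost matrix: rows are partitions, columns are candidate sites.
    cost = [[placement_cost(s, p) for s in sites] for p in partitions]
    best = min(
        permutations(range(len(sites)), len(partitions)),
        key=lambda perm: sum(cost[i][s] for i, s in enumerate(perm)),
    )
    return list(best)


sites = [(0, 0), (100, 100), (50, 0)]  # free grid squares (illustrative)
partitions = [[(90, 90), (110, 110)], [(0, 10), (10, 0)]]
assignment = assign_sites(sites, partitions)
```

Here the first partition is matched to the site at (100, 100) and the second to the site at (0, 0), since that pairing gives the smallest summed cost.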
V. VALIDATION AND EXPERIMENTAL RESULTS
Our heuristic implementation is developed in C++ and compiled with g++ 4.8.0. All experiments are run on a 2.5GHz Intel Xeon E5-2640 Linux workstation with 128GB memory and 12 hyperthreaded CPU cores. In the partitioning step, we apply MLPart [24] on hypergraphs generated using Algorithm 2 above. In the scheduling step, we use CPLEX 12.5.1 [22] as our ILP solver to schedule testing of memories in each partition. Last, we solve the min-weight maximum matching problem in a bipartite graph [23] to assign BIST logic placement locations to partitions. (To our understanding, the turnaround time of our heuristic is not critical, and resynthesis of memory BIST logic after memory grouping takes only a few hours [26].) Table II presents command-line options in our implementation. In all of our experiments, we set the power constraint E_MAX to 200, since the maximum E(m_i) over the memories in our testcases lies between 150 and 200.
To validate our heuristic methodology, we use six industrial testcases, each derived from a separate hard macro in a recent 28nm networking SOC product. Parameters of these testcases are given in Table IV. The number of memories in each testcase ranges from 124 to 160 and the number of partitions ranges from 7 to 13. Maximum and minimum number of memories, and maximum diameters without BIST logic, are also presented in Table IV.

TABLE IV
Testcase  #Memories  #Partitions  Max. #memories  Min. #memories  Max. diameter (w/o BIST logic)
TC1       143        13           26              1               3900
TC2       150        11           28              2               4500
TC3       124        8            22              8               2200
TC4       160        13           30              1               3400
TC5       137        7            26              11              3200
TC6       148        12           25              1               4100

Table V compares industrial results and our results. We achieve up to 11.57% improvement in estimated test time, a strictly smaller number of partitions (i.e., number of memory BIST controllers), and reduced maximum diameter with respect to BIST logic placement location, compared to the industrial results. Considering that test time is directly related to test cost and that fewer memory BIST logic blocks lead to smaller die area, we believe this is a significant improvement. Furthermore, the smaller maximum diameter of each memory partition (as shown in Figure 3) indicates better timing, which allows at-speed testing with smaller gate sizes and higher-V_T cell instances.
VI. CONCLUSIONS
In this work, we propose a heuristic methodology to co-optimize partitioning, test scheduling and memory BIST logic placement to minimize test time. Our heuristic approach generates hypergraphs over memories with test time-aware weighting of hyperedges, along with top-down, FM-style min-cut partitioning. Our ILP formulation comprehends parallel and serial testing for test time optimization with respect to power constraints. Further, we place the BIST logic to minimize the maximum diameter for each BIST group, which minimizes routing and buffering costs and improves timing. On hard macros from a recent industrial 28nm networking SOC, our results achieve up to 11.57% reduction in test time compared to the industrial solutions, using strictly fewer BIST controllers.
Our ongoing work pursues three main directions. (1) First, recall that we construct the weighted hypergraph instance for top-down partitioning independently of any map of placement density or routing congestion. We currently do not evaluate our memory partitioning and BIST logic placement solutions after placement and routing, and signoff timing analysis. To bridge this gap, we seek to integrate our partitioning and BIST logic placement optimizations into a (production) physical implementation flow. (2) Second, the fact that we apply SolveMBIST ILP(2) to optimally schedule the testing of a large cluster of memories using two BIST controllers means that the hypergraph construction at some point leads min-cut partitioning "away" from good memory clusters. Thus, we seek improved hypergraph construction and weighting such that top-down min-cut partitioning more directly produces a multi-way clustering that achieves minimum test time with k BIST controllers. (3) Third, recall that an initial motivation for this work is the disconnect between front-end DFT teams and back-end PD teams. We plan to enable the use of our tool by a PD team in a production SOC design environment, to validate the accuracy and schedule impact of (i) early feedback on timing and need for LVT devices in the BIST logic, and (ii) understanding of feasible memory groupings in light of test schedule and power constraints.
