Abstract-As the number of frequency domains aggressively grows in today's systems-on-chip (SoCs), the delivery of high delay test quality across numerous frequency domains while meeting test budgets assumes crucial importance. This paper proposes a method to explore the delay test quality tradeoffs across these domains, determining an optimal distribution of the test time budget across all domains while minimizing the overall SoC delay defect escape level. Satisfaction of this goal necessitates not only consideration of fault coverage but also of the distinct characteristics of each domain, such as frequency, path length distribution, scan length, and shift speed, as well as full utilization of concurrent test support while remaining within the constraints of power thresholds to provide a reliable test environment. An optimization formulation as well as efficient test time allocation methods based on convexity and fast concurrent test planning algorithms are provided.
I. INTRODUCTION
TODAY, systems-on-chip (SoCs) integrate numerous cores, ranging from multiple microprocessors, digital signal processing cores, and graphics processing units to multimode modems. The higher level of integration, empowered by the unceasing process scaling, increases, however, the susceptibility to manufacturing defects, particularly delay defects that alter the timing of the design. It is not uncommon to have hundreds of domains in state-of-the-art SoCs with various frequencies, ranging from very low speeds to GHz levels. It is therefore crucial to apply delay tests in order to ensure that every single frequency domain meets its respective speed target, steadily expanding the share of delay test in overall test time.
Since the emergence and wide acceptance of fault-model-based structural testing, fault coverage, such as stuck-at coverage, has been used as a primary metric to assess the quality of the test sets. As the numerous years of experience in structural testing show, fault coverage is indeed an effective metric to assess test quality for static defects that only require logical activation and propagation of erroneous behavior. However, the increasing importance of parametric attributes such as delay highlights the limitations of fault-coverage-based test quality measurements. In order to detect a delay defect, not only is it required that a logical fault detection condition be satisfied by generating a transition at the faulty location and propagating it to an observation point, but it is also imperative that the faulty behavior be propagated to an observation point within a time period determined by the target frequency. Evidently, actual signal propagation time depends on path lengths and the additional delay incurred by the delay defect. Although fault coverage considers the satisfaction of the logical detection condition, parametric attributes, such as domain frequency, path lengths, and defect size, that have a direct effect on test quality go unnoticed.
Although parametric attributes also affect other tests such as quiescent power-supply current, delay test requires particular emphasis, not only because of its increasing importance in overall test quality but also because of the absence of a well-accepted answer to the question of how to effectively test for delay defects, particularly with a constantly increasing number of frequency domains within SoCs. It is not uncommon to stop at an arbitrary fault coverage level in the range of 80%-90% for each domain with commonly used transition fault models, based on test time budget or coverage limitations. Even when the test cost constraint is relaxed, the questions of effective test resource usage, including whether to utilize test resources to achieve the highest possible coverage for each domain or to provide more emphasis on a particular set of domains, remain largely unanswered.
As test time is a precious commodity that directly impacts product cost, we examine the question of the most effective use of the available delay test resources (i.e., delay test time) in this paper. Since there are hundreds of domains in a typical state-of-the-art SoC with a diverse set of attributes such as domain frequency, path lengths, scan chain length, and shift speed that directly influence the delay test quality and test time, a carefully crafted allocation of overall delay test resources to these domains based on their distinct characteristics is necessary to extract the highest benefit from the allocated resources. For example, a faster frequency domain with narrow path slacks has a lower tolerance to delay defects than a slower frequency domain with wider path slacks. Consequently, a larger fraction of all possible delay defects can cause an observable failure in the faster frequency domain, effectively resulting in a higher level of defect escapes in a faster domain. Furthermore, the existence of a larger set of observable defects implies that the additional test time assigned to a faster domain would reduce the overall defect escape level at a faster pace (i.e., deliver an increased rate of overall test quality improvement) than it would in a slower domain. It should be noted that not only the design-induced domain characteristics, such as frequency and path lengths, but also the design-for-test (DFT)-induced domain characteristics, such as the longest scan chain length and scan shift speed, affect the rate of overall test quality improvement as additional test time is allocated. A slower shift frequency or a longer maximum scan chain length increases the application time of an identical test set, spreading out the same test quality improvement delivered by this particular test set over a larger duration of test time, thus reducing the rate of overall test quality improvement for each unit of test time.
0278-0070 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
It is therefore essential to consider the effect of the design- and DFT-induced distinct characteristics of each domain on the rate of overall test quality improvement as additional test resources are allocated while exploring the delay test quality tradeoff points across these domains. It is clear that the typical fault-coverage-driven approach used in industry to select delay test sets for each frequency domain fails to offer a medium for full exploration of the appropriate delay test tradeoff points among distinct frequency domains. Effective test resource allocation necessitates an estimation of test quality based on domain characteristics and algorithmic advances to efficiently identify optimal allocation of resources in order to maximize test quality. Evidently, the effective test resource allocation process does not necessitate the use of a specific delay test quality estimation method. Any delay test quality estimation method that considers domain characteristics can be equally utilized. A presilicon test quality estimation technique is employed in this paper as an example. This method utilizes a metric that combines delay defect size distribution information with each domain's frequency, path lengths, and fault coverage-test pattern count relationship to provide a simulation-based solution. By considering numerous domain characteristics, test quality estimation delivers a metric that correlates test time to the test quality per domain, thus introducing the possibility of framing an optimization problem to maximize the overall test quality within a test time budget. To explore the test quality tradeoffs, we first formulate the overall optimization problem, which we subsequently simplify to a computationally tractable convex optimization problem.
While the allocation of test time resources based on distinct domain characteristics opens up the possibility of enabling so far unexplored test quality maximization techniques within a test cost budget, its application in the context of conventional test architectures used in industry necessitates the consideration of numerous practical constraints. Primary among them stand test architectural techniques that enable concurrent delay testing of a variety of domains within a core. The exploitation of test concurrency necessitates the identification of the optimal schedule in regard to the domains to be tested in parallel, an approach capable of significantly reducing test time yet requiring the consideration of synchronization constraints among simultaneously tested domains. Concurrency not only maximizes test time exploitation but also imposes in turn a further constraint on simultaneous power utilization, which may necessitate judicious execution of test time/power tradeoffs.
The considerable mathematical and algorithmic challenges inherent in resolving the best allocation of test resources to various frequency domains are thus exacerbated by concurrent domain testing considerations under these additional synchronization and power constraints. While overall optimality can be delivered in the case of the fundamental optimization problem definition that we develop in this paper, the scope of optimization shifts to the development of computationally tractable algorithms aimed at the minimization of the deviation from the optimal solutions. The minimization of deviation ensures negligible error introduction into the more challenging optimization formulations, enabling an efficient exploration of the multidimensional tradeoff space of optimal test resource allocations while determining the schedule of the concurrently tested domains in the face of concurrency-induced synchronization and power constraints.
Although a slew of test scheduling techniques [7]-[9], [24]-[28] that focus on the effective use of a test access mechanism across the cores within an SoC have been previously proposed, scheduling in the context of this paper fundamentally differs, targeting instead the identification of which domains within a core are to be concurrently tested while simultaneously considering the optimal test resource allocation for each domain within and across the cores to maximize test quality. Conventional test scheduling focuses on the effective use of a test bandwidth for a predetermined test set for each core to minimize overall test time. However, it is not concerned with whether each core is allocated a proper test time, assuming that the test time allocation and test planning (such as concurrent domain testing schedule) within the core have already been performed. The main focus of the proposed technique in this paper, on the other hand, consists of addressing the challenging goal of the identification of proper test time allocation for each domain (i.e., variable test set) while simultaneously considering the issue of domain scheduling under architectural and power constraints to maximize test quality within a test time budget. It should be noted that the conventional scheduling techniques could be potentially utilized in conjunction with the simultaneous test time allocation and domain scheduling method proposed in this paper. Effectively, the proposed methodology first identifies the optimal test time allocation and domain test plan, feeding the resulting delay test set for each core to a conventional test scheduling method in order to identify the final test schedule of all cores under various SoC level constraints.
Section II reviews the previous work and an overview of the proposed methods follows in Section III. Section IV presents the delay test quality estimation and the optimal test time allocation framework for sequential domain testing is provided in Section V. The simultaneous test time allocation and domain scheduling with architectural and power constraints under support for concurrent domain testing are subsequently presented in Section VI. Section VII discusses the experimental results and the conclusion is drawn in Section VIII.
II. PREVIOUS WORK
The transition fault model, which does not impose a restriction on fault activation and propagation paths, and the path delay fault model, which specifies the exact path to be tested, are the most commonly used delay fault models [1]. Due to the sheer volume of possible paths in a circuit, only a small set of critical paths can typically be targeted by the path delay test, aiming at identifying the chips at the tail end of the process variation distribution. Transition test remains the main test used to target random delay defects due to its topological coverage of the design and its compact size, although its effectiveness suffers at small delay defect detection as test generation tends to favor short activation and propagation paths.
As the delay defect occurrence frequency increases with continuous process scaling, various new delay test generation techniques have been recently proposed to improve transition delay test quality. Timing-aware delay test methods [2]-[4], [19], [20] utilize timing information during test generation to test the faults through the longer paths, increasing small delay defect coverage. Faster-than-at-speed delay test methods [5], [22], [23] run the tests at a speed higher than the target frequency to reduce the timing margins, consequently increasing the small delay defect coverage. Pattern selection methods [6], [21] start with a large set of patterns and select a subset of higher quality patterns from the initial test set to deliver a high test quality with a compact test set. Although these techniques aim at higher test quality, they do not consider the delay test quality tradeoffs among various frequency domains. The method we propose complements these techniques and could be used with a test set generated by any of these techniques to maximize overall test quality across frequency domains.
III. OVERVIEW
As the level of integration and, consequently, design size aggressively grow, core-based test architectures are increasingly becoming the norm for testing large SoCs. In the core-based SoC test, instead of having a single test mode to test the full chip, the design is partitioned into cores and the testing of each core in isolation through core wrappers is enabled. Although numerous methods are being proposed to develop configurable core wrappers with different test channel bandwidth requirements and to achieve an efficient sharing of common test access mechanisms [7]-[9], [24]-[28], a typical core-based SoC test architecture in an industry setting supports a dedicated test mode for each core with the cores being tested sequentially as depicted in Fig. 1. In this architecture, each core utilizes the full test channel bandwidth to receive the test stimulus and output the test responses. Each core is composed of multiple frequency domains depicted as $F_{i,j}$ in Fig. 1. An at-speed clock generation circuitry resides on chip which can be configured to generate launch and capture clocks for delay testing at various speeds based on the target frequency of the domain under test.
The test mode of a core configures its scan chains and routes input/output test channels to this particular core. Since each frequency domain under a core is tested in this particular core scan configuration, the maximum scan chain length and scan shift frequency, and consequently the test time for a single pattern, are identical for each frequency domain of a core. However, the maximum scan chain lengths and scan shift frequency can differ across the cores.
The capabilities of the on-chip clock generation circuitry determine the level of concurrency achievable while performing parallel delay testing of various frequency domains. If at-speed launch and capture can be generated for one frequency domain at a time as in a typical system, the delay test of frequency domains is performed sequentially. If a more complex at-speed circuit enables concurrent generation and routing of N frequencies, up to N frequency domains under a core can be tested in parallel as illustrated in Fig. 2, wherein the length of the bars represents the test time for the corresponding frequency domain. N is the architectural concurrency limit in this particular setting and the goal is to complete the testing as quickly as possible by careful scheduling of the domains to be concurrently tested.
In this paper, we assume the use of the core-based SoC test architecture described above when developing our delay test quality optimization techniques. Ideas and techniques proposed in this paper extend to other core-based test architectures, albeit with appropriate modifications based on the particularities of the architecture.
Although the SoCs include the test features necessary to enable manufacturing testing, the efficient delay testing of hundreds of frequency domains within an acceptable test time as increasingly higher levels of integration are reached remains a challenge. Although test concurrency provides an effective method to slow down the growth of test time of SoCs, the attainable concurrency level is constrained not only by architectural limits, but also by the power constraints due to the increased switching activity as a result of higher levels of concurrent testing. As illustrated in Fig. 3, wherein the width and length of the bars represent the switching activity level and test time for each corresponding frequency domain, the domains that can be tested concurrently are determined not only by the test architecture concurrency limit, denoted as N, but also by the power limit as a result of the cumulative switching activity level, denoted as P. Although the architecture supports the concurrent testing of three domains in this example, the three domains in core 3 cannot all be tested concurrently due to power limits.
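The interplay of the two limits can be illustrated with a small planning sketch. This is not the scheduling algorithm developed later in the paper; it is a simple greedy first-fit heuristic under simplifying assumptions that are ours: domains of a core are grouped into sessions, a session lasts as long as its longest member, and the `Domain` fields, the numeric values, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    test_time: float  # time needed to apply the domain's delay test set
    power: float      # switching-activity level while the domain is tested


def plan_sessions(domains, n_limit, p_limit):
    """Greedy first-fit: longest domains first, each placed in the first
    session with a free clock slot (N) and enough power headroom (P)."""
    sessions = []
    for d in sorted(domains, key=lambda dom: dom.test_time, reverse=True):
        if d.power > p_limit:
            raise ValueError(f"{d.name} alone exceeds the power limit")
        for s in sessions:
            fits_n = len(s) < n_limit
            fits_p = sum(x.power for x in s) + d.power <= p_limit
            if fits_n and fits_p:
                s.append(d)
                break
        else:
            # No existing session can host this domain; open a new one.
            sessions.append([d])
    return sessions


def total_time(sessions):
    # Assumption: a session lasts as long as its longest member.
    return sum(max(d.test_time for d in s) for s in sessions)


# Hypothetical core with three domains; N = 3 would allow full concurrency,
# but the cumulative power of all three (12) exceeds P = 10.
core3 = [Domain("F3,1", 8.0, 5.0), Domain("F3,2", 6.0, 4.0), Domain("F3,3", 3.0, 3.0)]
sessions = plan_sessions(core3, n_limit=3, p_limit=10.0)
print([[d.name for d in s] for s in sessions], total_time(sessions))
```

In this hypothetical instance the power limit, not the architectural limit, forces the third domain into a separate session.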
The large set of domains in SoCs with diverse frequencies has numerous other characteristics that affect delay test quality and delay test time in addition to architectural and power-related concurrency constraints. Functional and test slack distributions of each domain affect the set of delay defects that can cause a system failure and that can escape delay test, respectively. The maximum scan chain length and the scan shift speed of each core determine the test time of a single pattern for a frequency domain in this particular core. Additionally, the size and fault coverage curve as a function of the number of test patterns for each domain determine the number of faults tested and the test quality improvement trend as the number of patterns increases. Since these parameters vary for each frequency domain based on the design and the test set, we propose a methodology that considers all these parameters and analyzes the tradeoffs across the frequency domains with the goal of obtaining the lowest overall delay defect escape rate for an SoC for a given test time budget. This analysis determines the optimal allocation of the test time budget to each domain and simultaneously pinpoints the optimal scheduling of the domains to be concurrently tested under the constraints of power and test architecture imposed concurrency limits.
Although the proposed method is presented in the context of quality maximization within a test budget, the method can be utilized to efficiently boost delay test quality further by enabling an effective use of the improved delay test generation techniques such as timing-aware or even simple N-detect test generation as well. Timing-aware and N-detect delay test methods, for example, can generate test sets with lower defect escape rates, yet the cost of the substantial growth in test set size for all domains can be prohibitive with even generously relaxed test cost constraints. The proposed methodology can be utilized to judiciously allocate test resources for timing-aware or N-detect testing of the critical domains, delivering test quality levels unattainable with regular transition test while avoiding cost-prohibitive timing-aware/N-detect testing of all domains.
We start in Section IV with an overview of delay test quality estimation while providing a presilicon delay test quality metric as an example. The proposed test quality optimization and domain scheduling method are subsequently presented in Sections V and VI.
IV. DELAY TEST QUALITY ESTIMATION
The quality of a test set in delay defect detection is correlated with the length of the test paths. To be able to detect a delay defect, the delay of the test path with the additional delay from the defect should exceed the period of the functional clock. Therefore, the slack of the test path ($T_{\mathrm{test}}$) as illustrated in Fig. 4 determines the size of the smallest delay defect that can be detected. When the delay defect size exceeds the slack of the test path, it is detected by this particular test; otherwise, it remains undetected. However, if the size of a delay defect is smaller than the slack of the longest functional path ($T_{\mathrm{func}}$) passing through the defect location, this particular defect does not result in a malfunction of the circuit at the system frequency and is therefore deemed redundant.
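The three outcomes described above can be written as a short classification routine. This is a minimal sketch: the function name, the label strings, and the numeric values are illustrative assumptions, and the test path is assumed to be no longer than the longest functional path through the site, so that the test slack is at least the functional slack.

```python
def classify_defect(defect_size, t_func, t_test):
    """Classify a delay defect of a given size at one fault site.

    t_func: slack of the longest functional path through the site.
    t_test: slack of the test path that exercises the site
            (assumed here to satisfy t_test >= t_func).
    """
    if defect_size < t_func:
        # Too small to cause a failure at the system frequency.
        return "redundant"
    if defect_size > t_test:
        # Large enough to push the test path past the clock period.
        return "detected"
    # Can fail in the system but is not caught by this test.
    return "escape"


# Hypothetical slacks (same time unit as the defect sizes).
for size in (0.1, 0.3, 0.7):
    print(size, classify_defect(size, t_func=0.2, t_test=0.5))
```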
The distribution of the size of delay defects is not uniform. As reported in [2] and [10]-[12], the occurrence frequency of the delay defects of smaller size is higher than that of the delay defects of larger size, and it can be modeled as an exponential distribution as in the following equation, wherein the parameters $a$, $b$, and $\lambda$ can be obtained through a least squares fitting of the curve to the empirically collected data [2], [10].
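A form consistent with the three parameters named above is an exponential decay with a constant offset. The exact expression is stated here as an assumption, with $s$ denoting the delay defect size and $F(s)$ its occurrence frequency:

```latex
F(s) = a \, e^{-\lambda s} + b
```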
The proposed test resource allocation method necessitates the use of delay test quality estimation techniques that capture the aforementioned attributes contributing to the delay test quality. We subsequently discuss a presilicon delay test quality estimation technique as an example. It should be evident that any metric that captures the attributes that affect delay test quality could be used equally well.
The presilicon method utilizes a metric known as statistical delay quality level (SDQL) [2], [11]. SDQL is a delay test quality metric that incorporates the functional and test path lengths and delay defect size distribution information, wherein a smaller SDQL indicates a better test quality and consequently fewer test escapes [2], [11]. SDQL divides the delay defect distribution into three regions as depicted in Fig. 5. The delay defects with a size larger than $T_{\mathrm{func}}$ but smaller than $T_{\mathrm{test}}$ are not detected by the test set; the area thus indicated between these two boundaries in Fig. 5 is denoted as SDQL and calculated as in the following equation. $T_{\mathrm{func}}$ and $T_{\mathrm{test}}$ are calculated by utilizing the design timing information.
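Read as an area under the defect size distribution between the two boundaries, the description above corresponds to the following integral for a fault that is targeted by the test set. The symbol $F(s)$ for the delay defect size distribution and the subscript "tested" are labels assumed here:

```latex
\mathrm{SDQL}_{\mathrm{tested}} = \int_{T_{\mathrm{func}}}^{T_{\mathrm{test}}} F(s) \, ds
```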
If a fault is not targeted by any test pattern, any defect with a size larger than $T_{\mathrm{func}}$ is undetected and can cause system failure. The SDQL for untested faults can be calculated as in the following equation:
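Since every defect larger than the functional slack escapes in this case, the corresponding area extends over all larger defect sizes. With $F(s)$ again denoting the delay defect size distribution (the symbol and the subscript "untested" are assumed labels), this reads:

```latex
\mathrm{SDQL}_{\mathrm{untested}} = \int_{T_{\mathrm{func}}}^{\infty} F(s) \, ds
```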
The SDQL is calculated for all faults, including both tested and untested faults, and the total of SDQL estimations of all faults denotes the quality level of the test set. Since the functional and test slacks are derived from the frequency and the functional and test path lengths, the effect of these parameters on test quality is taken into account in the SDQL metric. Since the total SDQL is obtained by adding the SDQL of each detected and undetected fault, it inherently considers the design size and the fault coverage as well.
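The summation over tested and untested faults can be sketched numerically. The sketch below assumes a defect size distribution of the form F(s) = a·exp(−λs) + b, integrates it in closed form, and replaces the unbounded upper limit for untested faults with a hypothetical cap `s_max` (an unbounded integral diverges whenever b > 0). The data layout, parameter values, and function names are illustrative assumptions rather than the tool flow of [2]-[4].

```python
import math


def defect_area(lo, hi, a, lam, b):
    """Integral of the assumed F(s) = a*exp(-lam*s) + b over [lo, hi]."""
    if hi <= lo:
        return 0.0
    return (a / lam) * (math.exp(-lam * lo) - math.exp(-lam * hi)) + b * (hi - lo)


def total_sdql(faults, a, lam, b, s_max):
    """Sum the per-fault SDQL.

    Each fault is a pair (t_func, t_test); t_test is None for a fault that
    no pattern targets. s_max is a hypothetical cap on the defect size that
    stands in for the infinite upper limit of an untested fault.
    """
    total = 0.0
    for t_func, t_test in faults:
        upper = s_max if t_test is None else min(t_test, s_max)
        total += defect_area(t_func, upper, a, lam, b)
    return total


# Hypothetical slacks: the third fault is untested at first and is
# covered by an added pattern afterwards.
params = dict(a=1.0, lam=2.0, b=0.0, s_max=10.0)
before = total_sdql([(0.2, 0.5), (0.1, 0.1), (0.3, None)], **params)
after = total_sdql([(0.2, 0.5), (0.1, 0.1), (0.3, 0.4)], **params)
print(before, after)
```

Covering the previously untested fault shrinks its contribution from the full tail beyond the functional slack to the narrow band between the functional and test slacks, which is why the total drops as patterns are added.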
Previously published methods and tools [2]-[4] exist for SDQL calculation. The normalized value of SDQL that has been calculated for a frequency domain, $F_{i,j}$, of an industrial design by a commercial automatic test pattern generator (ATPG) tool is depicted in Fig. 6. As more patterns are added, SDQL improves as expected. Once the incremental SDQL is calculated as a function of the additional test patterns, a curve can be fitted to represent SDQL as a function of the number of patterns for this domain, $F_{i,j}$, denoted as $Q_{i,j}(x_{i,j})$ in the figure. Our goal is the minimization of the cumulative SDQL across all frequency domains within a set test time budget.
Since the redundant area in Fig. 5 depends on the functional slack of the target fault location, and is hence fixed for a given design, the undetected area (i.e., SDQL) shrinks as the detected area expands; consequently, the goal of SDQL minimization can be achieved by maximizing the detection level as well. The detection level and the curve fitted to it, denoted as $D_{i,j}(x_{i,j})$, can be seen in Fig. 6 for the same domain and are inversely correlated with SDQL as expected.
V. OPTIMAL TEST TIME ALLOCATION
The test quality estimated as a function of test patterns per domain in Section IV implicitly captures numerous distinct design-induced characteristics of the domain such as frequency, domain size, functional and test path lengths, and the fault coverage improvement as more patterns are added. It also allows us to explore test quality tradeoffs among the frequency domains. For example, if it is assumed that all characteristics except for the frequencies of two domains are identical, the faster domain will have tighter functional and test slacks, resulting in inferior test quality (i.e., higher defect escape level or lower defect detection level) for the same number of test patterns as illustrated in Fig. 7. The reduced tolerance of the higher frequency domain to delay defects implies that a higher fraction of all possible delay defects can cause system failure, resulting in the higher frequency domain having a higher defect escape level as can be seen in Fig. 7. Additionally, since this pair of domains is assumed to be identical except for the frequencies, the high-frequency domain results in a higher defect escape level for an identical test set (i.e., same number of test patterns and fault coverage) as highlighted by the red dashed lines in Fig. 7 for an arbitrarily selected test set size. However, the rate of defect escape level reduction (the slope of the test escape level curve) is steeper for the high-frequency domain in comparison to the slow frequency domain at an identical fault coverage and test pattern count point. Similarly, the defect detection rate is, as expected, higher for the high-frequency domain as can be seen from the defect detection curves in Fig. 7. It is consequently beneficial to allocate more test time to the high-frequency domain. Other design-induced domain characteristics offer tradeoffs similar to the frequency.
While consideration of these design-induced characteristics can help the identification of the most beneficial test time allocation among the domains, one should not overlook the variable test cost side of the equation, namely the test time of each pattern, which is affected by DFT-induced domain characteristics such as the maximum scan chain length and scan shift speed. If we assume that the test time of each pattern is twice as long for the low-frequency domain in comparison to the high-frequency domain due to a combination of longer scan chains and slower shift speed in the example of Fig. 7, a delivery of the same defect escape level reduction (i.e., defect detection level increase) necessitates twice as much test time for the low-frequency domain as can be observed from the change in defect escape (detection) level curves of the low-frequency domain in Fig. 8. Since the identical defect escape level reduction for the slow frequency domain is spread out over a larger duration of test time, the rate of the defect escape level reduction (i.e., the slope of defect escape level curve of the slow frequency domain in Fig. 8) is lowered for each unit of test time.
Consequently, the influence of not only design-induced but also DFT-induced domain characteristics on the rate of defect escape level reduction should be considered while exploring tradeoff points among numerous domains with a goal of test quality maximization within a test time budget. For example, for the domains whose defect escape and detection curves are depicted in Fig. 7, in a conventional fault-coverage-based test resource allocation, test sets with the same number of test patterns (i.e., fault coverages) could have been initially selected for each domain as exemplified by the dashed line in the figure. Since the low-frequency domain is assumed to have a slower test application speed due to the longer scan chains and slower shift speed in this example, it would take twice as much test time to apply the selected test set to the lower frequency domain as can be seen in Fig. 8. The optimization goal is defined as the minimization (maximization) of overall defect escape (detection) level while utilizing a test time budget that is identical to the total test time initially allocated to these two domains by the conventional fault-coverage-based method.
The origin (i.e., point 0 at the left edge of the horizontal axis) in Fig. 9 shows the overall defect escape (detection) level delivered by the initial test time allocation, calculated as a sum of defect escape (detection) levels of the domains at the test resource allocation point marked by the dashed line. A close examination of this particular test resource allocation point reveals that the high-frequency domain has a higher rate of defect escape (detection) level change than the low-frequency domain. It would consequently be beneficial to allocate more test time to the high-frequency domain. A fixed test time budget, however, dictates the removal, from the low-frequency domain, of the same amount of test time. The horizontal axis in Fig. 9 shows the total test time shifted to the high-frequency domain from the low-frequency domain, starting from the initial test time allocation of these domains based on fault coverage while keeping the overall test time constant throughout the process. As can be observed in Figs. 7 and 8, the rate of defect escape (detection) level gain for the high-frequency domain keeps monotonically decreasing as more test time is allocated and, inversely, the rate of defect escape (detection) level loss for the slow frequency domain keeps monotonically increasing as more test time is removed. Consequently, as more test time is shifted to the high-frequency domain, the overall defect escape (detection) level initially improves as long as the rate of defect escape (detection) level gain for the high-frequency domain is higher than the rate of defect escape (detection) level loss for the slow frequency domain as depicted in Fig. 9. However, once the rate of defect escape (detection) level gain for the high-frequency domain is surpassed by the rate of defect escape (detection) level loss for the slow frequency domain, the overall test quality (defect escape or detection level) starts to deteriorate. This particular crossover point marks the optimal test time allocation point as shown by the arrows in Fig. 9.
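The two-domain tradeoff can be reproduced with a small numerical sweep. The escape-level curves below are hypothetical exponential decays chosen only for illustration (they are not the fitted curves of Figs. 7 to 9); the slow domain's curve is stretched by a factor of two in time to mimic its longer per-pattern test time, and all names and constants are assumptions.

```python
import math


def escape_fast(t):
    # Hypothetical escape level of the high-frequency domain vs. test time.
    return 10.0 * math.exp(-0.5 * t)


def escape_slow(t):
    # Hypothetical escape level of the low-frequency domain; the decay per
    # unit time is halved because each pattern takes twice as long to apply.
    return 6.0 * math.exp(-0.25 * t)


# Initial allocation: same pattern count, so the slow domain uses twice the time.
T_FAST0, T_SLOW0 = 2.0, 4.0


def overall_escape(shift):
    """Overall escape level after moving `shift` units of test time from the
    slow domain to the fast domain (total test time stays constant)."""
    return escape_fast(T_FAST0 + shift) + escape_slow(T_SLOW0 - shift)


# Sweep the shifted test time in steps of 0.01 over the feasible range.
shifts = [i / 100 for i in range(401)]
best_shift = min(shifts, key=overall_escape)
print(best_shift, overall_escape(0.0), overall_escape(best_shift))
```

For these assumed curves the overall escape level first drops and then rises again as more time is shifted, and the minimum sits where the marginal gain of the fast domain equals the marginal loss of the slow domain.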
The goal of the identification of the optimal delay test time allocation in an SoC necessitates the simultaneous exploration of the tradeoff highlighted in this example across all domains in the SoC while taking into consideration various architectural constraints as discussed in the rest of this paper.
An SoC test architecture, depicted in Fig. 1, that only generates at-speed launch and capture pulses at a single frequency forces the domains to be tested sequentially. In this particular architecture, the sole challenge consists of the identification of optimal test time allocation per domain as the schedule of the domains does not affect the overall delay test quality. If the test flow stops at first failure, the user may, after test time allocation, rank the testing of frequency domains based on their defect escape (detection) level to minimize the average test time for the defective devices. We proceed in this section with the discussion of the optimal test time allocation formulation for SoCs that restrict themselves to sequential testing of frequency domains only. Test quality optimization when an SoC admits concurrent domain testing necessitates the simultaneous optimization of test time allocation and test scheduling as discussed in the next section.
Upon obtaining the test quality function, Q i,j (x i,j ), for each frequency domain, F i,j , the optimal test pattern count, x i,j , per domain that maximizes test quality within a test time budget can be identified. The test time budget can be set to the total test time utilized by conventional fault-coverage-based delay test set selection so as to target the maximization of test quality while consuming in total a test time identical to that of conventional delay tests. Alternatively, the test time budget can be determined by product and test cost. The test time allocation can be easily cast as an optimization problem by incorporating the maximum scan chain length, SC i,j , and the period of the scan clock, SP i,j , for each domain. SC i,j and SP i,j are known values for each frequency domain and are expected to be identical for all frequency domains of a particular core. Given c cores, f i frequency domains in core i, and a test time budget of TT target , the optimal test time allocation per domain can be determined as follows:
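The display equation is not reproduced in this text. Based on the definitions above, and assuming that the test time of one pattern is approximated by the product of the maximum scan chain length and the scan clock period, the formulation can plausibly be written as follows (a reconstruction, with x i,j implicitly nonnegative):

```latex
\begin{aligned}
\text{minimize}\quad & \sum_{i=1}^{c}\sum_{j=1}^{f_i} Q_{i,j}(x_{i,j}) \\
\text{subject to}\quad & \sum_{i=1}^{c}\sum_{j=1}^{f_i} x_{i,j}\, SC_{i,j}\, SP_{i,j} \le TT_{target}
\end{aligned}
```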
This formulation corresponds to an integer nonlinear optimization problem, as the number of patterns per domain, x i,j , is an integer variable. However, since a test suite for an SoC consists of thousands of scan patterns, the addition or removal of an individual scan pattern for a domain has a minuscule effect on test time and test quality. We therefore relax the integer restriction on x i,j to simplify the problem, rounding to the closest integer once the noninteger solution is obtained. Additionally, the test setup sequence overhead is ignored in the formulation, as it proves to be negligible in comparison to the scan shift time.
This particular nonlinear optimization problem, wherein the objective is a convex function and the constraint is an affine function, belongs to the class of convex optimization problems [13]. In convex optimization, any local minimum of the problem is guaranteed to be a global minimum as well, enabling the use of effective algorithms in the search for an optimal solution. The user can select any of a set of well-developed methods, such as gradient descent and interior-point methods [13], to solve this convex optimization problem efficiently.
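As a minimal sketch of how such a problem can be solved, the code below assumes a hypothetical escape-level model Q(x) = a·exp(−b·x) per domain (the paper obtains Q from an ATPG tool; the exponential form and the names `a`, `b`, `cost` are illustrative assumptions). For this model the optimality condition equalizes the quality gain per unit of test time across domains, so a bisection on the Lagrange multiplier suffices in place of a general-purpose solver.

```python
import math

def allocate(domains, budget, iters=200):
    """Relaxed (noninteger) pattern counts minimizing the summed escape level.

    domains: list of (a, b, cost); escape level is a*exp(-b*x) (assumed model)
             and cost = max scan chain length * scan clock period per pattern.
    budget:  total test time, in the same units as cost.
    Assumes the budget is small enough to be binding.
    """
    def patterns(lam):
        # Stationarity: a*b*exp(-b*x) = lam*cost, clipped at x = 0.
        return [max(0.0, math.log(a * b / (lam * cost)) / b)
                for a, b, cost in domains]

    def used(lam):
        return sum(x * c for x, (_, _, c) in zip(patterns(lam), domains))

    lo = 1e-12                                    # tiny multiplier: huge usage
    hi = max(a * b / c for a, b, c in domains)    # all counts zero
    for _ in range(iters):
        mid = math.sqrt(lo * hi)                  # bisection on a log scale
        if used(mid) > budget:
            lo = mid
        else:
            hi = mid
    return patterns(hi)
```

The returned values would then be rounded to integers, as the text describes.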
If the defect detection level function, D i,j (x i,j ), is available instead of the defect escape level function, Q i,j (x i,j ), the same formulation can be used by replacing Q i,j (x i,j ) with D i,j (x i,j ) and changing the objective to the maximization of the detection level instead of minimizing the defect escape level. The defect detection level function is a concave function. The problem of a concave function maximization can be easily reformulated as the equivalent problem of convex function minimization, enabling the use of the same efficient convex optimization techniques.
VI. OPTIMAL TEST TIME ALLOCATION AND SCHEDULING
An at-speed launch and capture clock generation circuitry that enables the concurrent testing of up to N frequency domains provides an opportunity for significant test quality improvement/cost reduction. An effective use of concurrency support necessitates a careful identification of the schedule of which domains are concurrently tested while performing test resource allocation. Yet, the simultaneous power utilization of the concurrently tested domains should adhere to the imposed power limit, P, to ensure reliable test application.
The proposed methodology therefore strives to identify both the domain test scheduling and the optimal allocation of test time to each domain while adhering to the test time limit, TT target , on the horizontal dimension and the architectural concurrency limit, N, and power limit, P, on the vertical dimension, as illustrated in Fig. 3. We start by addressing the test time allocation and scheduling problem with only a test time budget and an architectural concurrency limit, as illustrated in Fig. 2. Subsequently, the power limit is introduced and the complete solution is provided.
A. Test Time Allocation and Scheduling With Test Time and Concurrency Constraints
The test time allocation and scheduling problem with an overall test time budget of TT target and a test architecture supporting concurrent testing of N frequency domains in a core is significantly more complicated than the test time allocation problem for sequential testing, presented in Section V. We initially provide an optimization formulation for this problem and follow this up with an efficient solution.
The test time allocation and scheduling problem has to not only consider the distribution of the test time budget, TT target , but also determine when and where to test each frequency domain, effectively forming a domain test schedule. In order to create the optimization formulation for this problem, the available test resources are partitioned into horizontal and vertical channels as modeled in Fig. 10. On the horizontal dimension, since the test architecture supports the concurrent testing of N domains, it is modeled as N distinct horizontal channels to which each frequency domain can be assigned. On the vertical dimension, the optimization model for each core i makes available V i vertical channels, matching exactly the number of domains in core i, denoted as f i . Each slot at a horizontal/vertical channel crosspoint can accommodate only a single domain.
The objective is to determine a test time allocation for each domain and assign these domains to the available slots in order to maximize test quality while only using a total test time budget TT target . As the observant reader would notice, not all vertical channels need to be used. The optimization process identifies the minimum number of vertical channels needed while the rest are not allocated any test time at all. Let x i,j , SC i,j , and SP i,j denote the optimal test pattern count allocation, the maximum scan chain length and the period of scan clock for domain j in core i, respectively. To keep track of the location assigned to each domain, the binary variable, y i,j,h,v , is used to denote whether domain j in core i is assigned to the slot in the hth horizontal and vth vertical channel crosspoint. TT target is the available test time budget and TT i,v is the test time allocated on the vth vertical channel to core i. Assuming that c cores exist, the optimal test time allocation and scheduling problem can be posed as the following optimization formulation:
In the formulation shown above, the first constraint ensures that each domain is assigned to only a single slot and the second constraint ensures that each slot contains at most one domain. The third constraint determines the test time assigned to each vertical channel and the fourth constraint keeps the overall test time usage within the test time budget while pursuing the objective of defect escape level (i.e., SDQL) minimization. The problem could be easily formulated as a detection level, D i,j (x i,j ), maximization problem as discussed in Section V. Even if the number of patterns, x i,j , is allowed to be noninteger, this is still a highly complex mixed-integer nonlinear optimization problem that incurs extensive computational cost. A tractable solution of this problem necessitates a decomposition that exploits the underlying properties of the space to deliver an efficient solution while minimizing any possible deviation from optimality. We therefore propose in the subsequent parts of this section a decomposition of this formulation into an efficient, incremental delay test quality optimization method to identify the test time allocation and scheduling solution.
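The formulation referred to above is not reproduced in this text. The following is one plausible rendering of the four constraints, reconstructed from the prose rather than taken from the original. The index ranges (h = 1, ..., N and v = 1, ..., V i with V i = f i ) follow the channel model of Fig. 10; the product form of the third constraint and the summation over cores in the fourth (i.e., cores tested one after another) are assumptions.

```latex
\begin{aligned}
\text{minimize}\quad & \sum_{i=1}^{c}\sum_{j=1}^{f_i} Q_{i,j}(x_{i,j}) \\
\text{subject to}\quad
& \sum_{h=1}^{N}\sum_{v=1}^{V_i} y_{i,j,h,v} = 1 && \forall i,j \\
& \sum_{j=1}^{f_i} y_{i,j,h,v} \le 1 && \forall i,h,v \\
& TT_{i,v} \ge y_{i,j,h,v}\, x_{i,j}\, SC_{i,j}\, SP_{i,j} && \forall i,j,h,v \\
& \sum_{i=1}^{c}\sum_{v=1}^{V_i} TT_{i,v} \le TT_{target} \\
& y_{i,j,h,v} \in \{0,1\}
\end{aligned}
```

The product of the binary variable y and the pattern count x in the third constraint is what makes the problem mixed-integer and nonlinear even after the integrality of x is relaxed.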
While the formulation of Section V provides an optimal solution to the test effectiveness problem of various domains by equalizing the marginal utility of the final test vector in each test set, the addition of the concurrency constraints may necessitate a deviation from this test time allocation to ensure adherence to the synchronization constraints that mark the start of each domain group, composed of the domains to be tested concurrently. It should be evident that minimizing the deviation from the optimal test times in a domain group is a worthwhile goal. Minimizing this deviation is not the only concern, however: inefficiencies of the fit within the synchronization constraints of the domain groups may bloat the test time beyond the user-specified test time budget. Shrinking the time back to the specified test time constraint requires revisiting the marginal utility of the test set vectors, albeit this time no longer in the context of a single test set but rather in the context of a domain group.
Our proposed algorithm starts off by introducing an optimal approach for test scheduling under concurrency constraints that minimizes synchronization-induced test set deviations. It then rebalances the allocated time by equalizing the utility of the vectors across domain groups, thus delivering a computationally effective approach that maximizes the utility of the allocated test resources.
In more detail, we initially disregard the concurrency consideration to perform a test resource allocation, referred to as initial test time allocation, that delivers the optimal delay test quality without any concurrency constraint, thus setting an upper bound on the attainable quality. The concurrency constraint is subsequently introduced while targeting the minimization of the deviation from the delay test quality level obtained by the initial test time allocation. The minimization of the deviation from the attained delay test quality level is ensured by a subsequent two-step process. First, an optimal schedule of the concurrently tested domains is identified based on the initial test time allocation. This scheduling step, referred to as domain scheduling, minimizes the unutilized test time slacks that potentially exist due to an unequal allocation of the test time to the domains. Furthermore, since the initial test time allocation is performed to enable the full utilization of the test time budget, the unutilized time slack in the schedule can result in test budget overflows. Subsequently, the slack avoidance and final allocation step applies the optimal test time allocation process once again yet this time on the concurrently tested domain groups determined during the domain scheduling phase, instead of on individual domains. This final step eliminates the unutilized timing slacks and keeps the test time within budget while ensuring the delivery of the highest test quality under the concurrency consideration.
1) Initial Test Time Allocation:
Subsequent to the test generation and test quality estimation for the individual frequency domains, an initial test time allocation is performed, assuming all test resources can be freely used. Since N horizontal channels exist, the total test time available is N × TT target , each channel providing TT target test time. Alternatively, if T test time is allocated to a domain, since N domains can be concurrently tested, this particular domain is effectively using 1/N × T test time resources from the overall test time budget, TT target . Since the test time is allocated with no restriction in this step, the optimization formulation is similar to the one in Section V while taking into account the increased total test time available due to concurrency as follows:
TT i denotes the total test time assigned to core i. As can be noted, the first constraint considers the concurrent testing of the domains by obtaining the effective test time usage which corresponds to 1/N of the total test time. The second constraint ensures that a single domain does not exceed the total test time assigned to a core. This constitutes a convex optimization problem, similar to the one in Section V with minor modifications, and can be efficiently solved.
2) Domain Scheduling: The initial test time allocation is followed up by the domain scheduling phase to determine the domains to be tested in parallel. Domain scheduling creates domain groups with up to N domains each while minimizing the unutilized test time slacks to ensure the smallest deviation from the initial test time allocation. Optimal test scheduling of all n frequency domains, wherein n is the total number of frequency domains in the SoC, can be performed with a complexity of O(nlog(n)). The schedule, however, may leave unutilized test time slacks, as not all domains in the same group are allocated an identical test time by the initial test time allocation. The test time slacks can subsequently result in a deviation from the test time budget, although the deviation is minimized with the optimal scheduling algorithm.
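The scheduling procedure itself is not spelled out in this text. One approach consistent with the stated O(nlog(n)) complexity, shown here purely as an assumed sketch, is to sort the domains of a core by allocated test time and place consecutive runs of N domains in the same group, so that domains with similar test times share a start time and the slack within each group stays small. The function and variable names are hypothetical.

```python
def schedule_core(times, n):
    """Group the domains of one core into concurrently tested groups.

    times: dict mapping domain name -> allocated test time.
    n:     architectural concurrency limit N.
    """
    ranked = sorted(times, key=times.get, reverse=True)  # longest first
    return [ranked[k:k + n] for k in range(0, len(ranked), n)]

def schedule_length(groups, times):
    """Each group lasts as long as its longest member."""
    return sum(max(times[d] for d in group) for group in groups)

def total_slack(groups, times):
    """Unutilized test time summed over all groups."""
    return sum(max(times[d] for d in g) - times[d] for g in groups for d in g)
```

For a core with allocated times 10, 4, 9, and 3 and N = 2, the sketch pairs the two long tests and the two short tests, giving a schedule length of 14 with a total slack of 2.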
An example of a domain schedule for a design with eight frequency domains, four per core, and the ability to concurrently test two domains is shown in Fig. 11(a) . The start times of the concurrently tested domains are aligned as expected and the overall test time is minimized. However, since the allocated test times for the concurrently tested domains in this particular example are not equal during the domain scheduling, unutilized slacks abound in the schedule, resulting in the overall test time exceeding the budget. In order to eliminate the unutilized test time slacks and keep the total test time within the overall budget, a slack avoidance and final allocation step follows the domain scheduling.
3) Slack Avoidance and Final Allocation: This final step starts with obtaining a cumulative test quality function for each frequency domain group. For example, F 1,3 and F 1,1 form one of these domain groups in Fig. 11(a) . If the individual test sets generated in the first step can be merged, the user can simply merge the tests of the domains of each group and add the individual test quality functions to obtain the cumulative test quality function per domain group. In practice, however, since scan compression is a de facto standard nowadays, it may not be possible to merge the test sets due to the scan compression constraints. In that case, a new test set for each domain group is generated and the cumulative test quality for each group is estimated.
Once the delay test quality functions per domain group are obtained, the same convex optimization method discussed in Section V is applied to identify the new test time allocation for each domain group. The only change in the optimization formulation is the use of domain groups and their corresponding test quality functions instead of individual domains. The final test time allocation and domain schedule obtained from the initial schedule in Fig. 11(a) can be seen in Fig. 11(b). No unutilized test time slacks remain, while the total test time is kept within the test time budget.
B. Test Time Allocation and Scheduling With Test Time, Concurrency, and Power Constraints
In addition to the architectural concurrency limit modeled in Section VI-A, power also plays a crucial role in constraining the possible domain schedules. Excessive switching activity generated during testing can result in thermal and power supply conditions that exceed those of the functional mode of operation. Elevated peak current draw and power dissipation due to the excessive switching activity can cause large instantaneous voltage droop and an increased temperature, increasing the circuit delays beyond those of functional mode during delay testing and subsequently resulting in unnecessary yield loss [14] . Evidently, test time allocation and domain scheduling have to consider power limits to create reliable test conditions.
As illustrated in Fig. 3 , the concurrently tested domains at any particular moment should remain within the power envelope. The power constraint, P, is modeled in this paper as the switching activity level, and represented as the ratio of active circuit nodes to the total circuit nodes. Similarly, switching activity level for domain j in core i, p i,j , is the active node level when this particular domain is singly tested. Although we utilize the widely used switching activity level in this paper, any other test mode power estimation method could be easily substituted.
The test time allocation and domain scheduling process has to consider not only the test time budget and the architectural concurrency limits on the horizontal and vertical dimensions, respectively, as depicted in Fig. 10, but also an additional power constraint on the vertical dimension. Given a power level for each domain, p i,j , and an overall limit, P, the cumulative power level of all domains assigned to any vertical channel in Fig. 10 is constrained to be within the overall power limit, P. Optimal test time allocation and domain scheduling with the power constraint can be posed as the following optimization problem:
In this optimization formulation, the first constraint ensures that each domain is assigned to only a single available slot and the second constraint ensures that each slot contains at most one domain. The third and fifth constraints limit the overall test time usage to the test time budget. The fourth constraint identifies the domains assigned to each vertical channel and ensures that the total power level of these domains does not exceed the overall power budget. This particular power constraint further increases the computational complexity of this optimization problem. As in the optimal test time allocation and scheduling problem with only an architectural concurrency constraint, we propose an efficient, incremental optimization method to identify the test resource allocation with the additional power constraint. It is even more imperative to develop an efficient methodology for the solution of this problem that minimizes the deviation from test effectiveness optimality.
As one can recollect, the solution to the multidimensional tradeoff problem outlined in the previous section relied on an initial partitioning of test sets into domain groups that delivered a schedule minimizing test set deviations. Yet the power constraint can invalidate such optimal solutions, as they may violate the power budget. Not only can the power constraint invalidate such solutions, but it also calls into question the initial test time allocation itself, as power budgets that are more stringent than the architectural concurrency limit may result in reduced initial test time allocations.
The proposed algorithm starts off by assessing the relative stringencies of the power and concurrency constraints, possibly resulting in reduced test times as stringent power constraints could introduce unutilized horizontal channels in addition to test time deviation induced slacks. This formulation is followed up by a test scheduling approach that again minimizes test time deviations in domain groups, yet this time under power constraints. The optimal algorithm introduced in the previous section needs to be modified to handle the twin constraints of test time deviation minimization and power constraints in each domain group. A rebalancing of the test times across domain groups eradicates the concurrency and power induced test time bloats, while equalizing the marginal utility of the vectors in each domain group.
The optimal test time allocation and domain scheduling with the total test time budget, the architectural concurrency and the power constraints are performed in three incremental steps which we define as follows. The initial test time allocation step not only considers the architectural concurrency limit but also imposes a power limit during test time allocation to the domains. Similarly, domain scheduling does not simply assign domains to an available slot, but also considers the power limit in the vertical dimension while minimizing the deviation from the quality level obtained by the initial test time allocation. After the initial test time allocation and domain scheduling, the slack avoidance and final allocation step follows the same convex optimization solution presented in Section VI-A. The new initial test time allocation and domain scheduling steps with additional power constraint are performed as follows.
1) Initial Test Time Allocation:
A domain allocated test time of T, as up to N domains can be concurrently tested, is effectively using 1/N × T test time resource from the test time budget, TT target , as previously discussed. The power constraint, however, can hinder the efficient use of all available test time. When a domain with p i,j power level is assigned to a vertical channel, it effectively uses p i,j /P of the available power budget. If p i,j /P exceeds 1/N, it endangers the full utilization of all available horizontal channels. The initial test time allocation leverages this observation and uses a more stringent utilization ratio based on both architectural and power limits for each domain to guide the test time allocation
The first constraint uses an architectural and power based normalization factor, n i,j , to estimate the effective test resource. This formulation is a convex optimization problem as well, similar to the initial test time allocation formulation in Section VI-A, and can be efficiently solved.
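The definition of the normalization factor is not reproduced in this text. A natural reading of the preceding discussion, offered here as an assumption, is that each domain is charged the more stringent of its concurrency share, 1/N, and its power share, p i,j /P. The sketch below computes this factor and the resulting effective test time usage; the function names are hypothetical.

```python
def normalization_factor(p_ij, power_limit, n_concurrent):
    """Assumed form: the larger of the concurrency share 1/N and the
    power share p_ij / P of a single domain."""
    return max(1.0 / n_concurrent, p_ij / power_limit)

def effective_time(x, sc, sp, p_ij, power_limit, n_concurrent):
    """Share of the test time budget charged to a domain with x patterns,
    max scan chain length sc, and scan clock period sp."""
    return normalization_factor(p_ij, power_limit, n_concurrent) * x * sc * sp
```

For example, with N = 4 and P = 0.5, a domain with p i,j = 0.1 is charged a factor of 0.25 (the concurrency share dominates), whereas a domain with p i,j = 0.3 is charged 0.6, reflecting that fewer than four such domains fit under the power limit.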
2) Domain Scheduling: Using the test time allocations of the previous step, domain scheduling aims at minimizing the overall test time usage (horizontal direction) by determining which domains are tested concurrently. In addition to the architectural limit on the number of domains that can be tested concurrently, the schedule has to consider the power usage of each test, p i,j , represented as the width of the tests in Fig. 3. Consequently, not only should the number of domains in any vertical channel not exceed N, but also the cumulative width of all tests in a vertical channel should not surpass the power limit, P.
This problem is a revised version of a problem known as 2-D strip packing [16]. In the 2-D strip packing problem, a set of rectangular items, similar to the delay test representations of each domain in Fig. 3, is packed into a rectangular strip of a fixed width and an infinite length with the objective of packing all items on the strip while minimizing the length of the rectangular strip. This is known to be an NP-hard problem [16], with numerous heuristics proposed in the literature.
Our domain scheduling problem is similar to the 2-D strip packing problem but with two additional constraints. First, the number of total items (i.e., domains) assigned to any vertical dimension is limited to N. Second, the start time of all domains in a vertical channel needs to be identical. Many of the 2-D strip packing heuristics align the item start positions on the vertical dimension (referred to as levels), thus automatically satisfying the second constraint; the incorporation of the first constraint is straightforward as we illustrate later on.
A commonly used 2-D strip packing heuristic is the first-fit decreasing length (FFDL) algorithm [16]. Items are sorted in decreasing order by their length and processed in this order. Each item is placed at the first level where it fits within the width limit of the strip. The same heuristic with a limit of N on the number of items placed on each level is utilized in this paper. The complexity of this algorithm is O(nlog(n)), wherein n is the number of items (domains). Strip packing remains an active research area. Any of the more advanced recent heuristics proposed in the literature could be adapted as well.
Domain scheduling with FFDL:
i) Rank the domains in decreasing order of their allocated test times.
ii) For each domain (starting from the first one on the list):
a) if there is a domain group (vertical channel) for the corresponding core with fewer than N domains assigned and with a sufficient power margin to accommodate the current domain, assign this particular domain to this domain group;
b) otherwise, start a new group for this core and assign this domain to it.
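The steps above can be sketched for a single core as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the tuple layout and the dictionary fields are hypothetical, and a domain whose own power level exceeds P is still given a group of its own rather than rejected.

```python
def ffdl_schedule(domains, n_max, power_limit):
    """First-fit decreasing length scheduling for the domains of one core.

    domains:     list of (name, test_time, power) tuples.
    n_max:       architectural concurrency limit N.
    power_limit: overall power limit P.
    Returns a list of groups; each group's duration is that of its first
    (longest) member because domains are processed longest first.
    """
    groups = []
    ranked = sorted(domains, key=lambda d: d[1], reverse=True)  # step i)
    for name, time, power in ranked:                            # step ii)
        for group in groups:
            fits_count = len(group["members"]) < n_max
            fits_power = group["power"] + power <= power_limit
            if fits_count and fits_power:                       # step a)
                group["members"].append(name)
                group["power"] += power
                break
        else:                                                   # step b)
            groups.append({"members": [name], "power": power, "time": time})
    return groups
```

In this sketch the sort costs O(nlog(n)), while the first-fit scan over open groups is written as a simple linear search for clarity.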
VII. EXPERIMENTAL RESULTS
The proposed delay test quality optimization method is evaluated in this section. The first set of experiments focuses on pairs of frequency domains and shows the test quality tradeoff that is discussed in Section V and illustrated in Figs. 7-9 as test time is reallocated between a pair of domains. The second set of experiments explores larger SoC configurations, ranging from 8 to 128 domains. This set of experiments shows the overall test quality improvement as the total test time is optimally allocated across all domains in an SoC while considering architectural and power limits for concurrent domain testing.
We used an industrial circuit in our experiments as a baseline for frequency domains and combined numerous variations of this industrial circuit to create various core and SoC configurations as discussed later in this paper. The industrial circuit is a 65 nm design with 1.3 million transition faults and a single clock and voltage domain. The default max scan chain length and the scan chain shift speed are set at 200 and 20 MHz, respectively. A commercial ATPG tool is utilized to generate 6500 transition fault patterns and to calculate the corresponding SDQL and defect detection level versus test pattern count curves for the domains while utilizing the design standard delay format files for timing information. The maximum scan chain length and scan shift speed are used to find the corresponding test time as the pattern count varies. We set the delay defect distribution to the following equation in our experiments, which is provided in [2] based on the data in [12] :
Numerous domain variations are generated from the aforementioned industrial circuit by altering various properties of the circuit that affect the test quality and test time, namely the domain frequency, path lengths, maximum scan chain length, and scan shift speed. The domain variations are combined to create cores with multiple domains and subsequently SoCs with multiple cores. In the first set of experiments, we take a close look at the delay test quality tradeoff across the domains, which is illustrated in Figs. 7-9, by restricting our focus to a pair of domains instead of a large set of domains in order to observe the trend of overall test quality change as we alter the test time distribution between the domains. Three different domain pairs with an increasing level of variation between domains are analyzed as defined in Table I. The defect detection level as a function of test patterns is computed for each domain in Table I. The proposed delay test quality optimization technique is applied to each domain pair in the table to analyze the effect on overall test quality as the test resource allocation between the domain pairs is altered.
The domains, denoted as config 1 in Table I, are identical to each other with the exception of their maximum scan chain lengths. Consequently, this pair of domains has an identical test quality (i.e., defect escape and detection level) versus test pattern count curve, delivering identical test quality with the same number of test patterns. However, the domain with longer scan chains requires more test time to deliver the same test quality, resulting in a lower rate of test quality gain per unit of test time in domain B. Fig. 12 depicts the overall defect detection level change as the test time is reallocated between domain pairs. In a manner similar to Fig. 9, the origin point shows the cumulative defect detection level for the domain pair with their original fault-coverage-based test time allocation (i.e., an equal transition fault coverage for both domains A and B). The defect detection levels reported in Fig. 12 are normalized values (the value at the origin is normalized to 1) in order to show the overall test quality increase in comparison to the starting point for these configurations. The horizontal axis shows the additional test time allocated to domain A while removing an identical amount of time from domain B in order to keep the overall test time usage (i.e., test cost budget) intact. As the allocation of the overall test time is altered by giving more weight to domain A, the overall defect detection level starts to increase, as can be seen in Fig. 12. The rate of the additional quality gain, however, diminishes as increasingly more test resources are allocated to domain A, eventually reaching a peak point in the overall test quality. The peak points are the optimal test allocation points for the domain pairs, as seen in Fig. 12.
In the second pair of domains, denoted as config 2 in Table I, in addition to distinct scan chain lengths, the libraries (i.e., path lengths) vary as well, boosting the variation between the domains. The increased path slacks in domain B due to shorter paths magnify this particular domain's tolerance to delay defects, further reducing the rate of test quality gain per unit of test time. Consequently, more test time can be allocated to domain A before reaching an equilibrium point, obtaining a higher overall test quality improvement in comparison to config 1, as can be seen from the overall defect detection level curve of config 2 in Fig. 12.
In the third pair of domains, denoted as config 3 in Table I, the frequency of domain B is lowered in addition to the scan chain length and library, boosting further the variation between the domains. The additional increase in path slacks due to the lower frequency reduces the rate of test quality gain per unit of test time in domain B even further, enabling a higher test quality gain by the proposed method. In the case of config 3, a test quality improvement of almost 30% is attained by just altering the test time allocation between domains A and B with no increase at all in the original fault-coverage-based test time.
In the second set of experiments, three different SoC configurations composed of 8, 32, and 128 domains are utilized. The domains in these SoCs are obtained by changing four different domain properties (i.e., frequency, library, scan length, and shift speed) as discussed earlier. The test quality of the domains (test sets selected based on fault coverage only), as they are sequentially tested, is used as the baseline for comparison. The method in Section V is applied to the same domains to reallocate the test time based on various domain characteristics, and the resulting test sets are sequentially applied. The improvement in defect detection level for the sequential testing of all domains can be seen in Fig. 13, denoted as "four different frequencies." As the number of domains within an SoC increases, the variation among the domains and, consequently, the opportunity for test quality improvement by test resource reallocation increase, as can be seen from the higher test quality gain in Fig. 13 as the number of domains increases. An improvement of around 15% is observed for the SoC with 128 domains while maintaining the original fault-coverage-based test time. In an effort to increase the variation among domains and analyze the effect on the test quality improvement, the frequencies of all 500 and 750 MHz domains in these configurations are changed to 250 MHz and 1 GHz, respectively. As the differences between domains increase, the test quality improvement is further boosted as expected. The improvement surpasses the 20% level as depicted in Fig. 13, denoted as "two different frequencies."
In the final set of experiments, test quality improvement with the concurrent testing of domains under architectural and power constraints is evaluated. The 128-domain SoC configuration that is divided into 16 cores is utilized. Initially, the domain test sets that have been selected based solely on fault coverage considerations are scheduled for concurrent testing by applying the scheduling algorithm developed in the proposed method in Section VI. Essentially, the proposed optimization flow is applied with no variable test resource allocation. The total test time and test quality for the optimally scheduled fault-coverage-based test sets are used as the baseline in this stage of the experiments. Therefore, the reported results understate the scheduling benefit of the proposed method and highlight how variable test time allocation can improve test quality for the same test time budget.
We start the evaluation with no consideration of power concerns during concurrent testing. The defect detection improvement delivered by the proposed method for the architectural concurrency limits, N, of 2 and 4 is depicted in Fig. 14, denoted as "N = 2, P = N/A" and "N = 4, P = N/A," respectively. A substantial improvement, exceeding 20%, in test quality is observed while consuming test time identical to the original test set. The quality improvement is slightly lower during the concurrent testing of four domains as expected due to the additional constraints of an increased concurrency level.
In the remaining part of this set of experiments, the proposed method is evaluated with a power constraint, P, in addition to the architectural constraint, N, during the concurrent testing. The power consumption of each domain is assumed to be proportional to its frequency, with the highest power consumption among all domains denoted as p_max. Initially, the overall power limit, P, is set to a very large number, effectively ensuring that the architectural constraint, N, remains the only restriction on the attainable level of concurrency. As can be seen in Fig. 14, the quality improvement for "N = 4, P = Large" is identical to the results reported for "N = 4, P = N/A," as expected. Subsequently, the overall power limit, P, is set to p_max, 1.5 * p_max, and 2 * p_max, while keeping the architectural limit, N, at 4. The delay test quality improvement levels with these particular constraints are reported in Fig. 14, denoted as "N = 4, P = p_max," "N = 4, P = 1.5 * p_max," and "N = 4, P = 2 * p_max," respectively. A substantial level of test quality improvement, reaching up to 15%, is observed even with strict power limits.
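The interplay between the architectural limit, N, and the power limit, P, can be sketched as follows. This is only an illustrative feasibility check, not the scheduling algorithm of Section VI. The domain frequencies, the unit proportionality constant between frequency and power, and the function names are all assumptions made for the example.

```python
def power(freq_mhz, k=1.0):
    """Domain power, assumed proportional to frequency (k is hypothetical)."""
    return k * freq_mhz

def is_feasible(group_mhz, n_limit, p_limit):
    """A group may be tested concurrently if it respects both N and P."""
    return (len(group_mhz) <= n_limit
            and sum(power(f) for f in group_mhz) <= p_limit)

def max_concurrency(freqs_mhz, n_limit, p_limit):
    """Largest number of domains that can be tested together.

    Adding domains in order of increasing power maximizes the group size,
    since any group of a given size draws at least as much power as the
    group made of the lowest-power domains.
    """
    total, count = 0.0, 0
    for f in sorted(freqs_mhz):
        if count == n_limit or total + power(f) > p_limit:
            break
        total += power(f)
        count += 1
    return count

# Hypothetical domains; p_max is the power of the fastest one.
freqs_mhz = [250, 500, 750, 1000]
p_max = max(power(f) for f in freqs_mhz)
limits = {
    "P = p_max": p_max,
    "P = 1.5 * p_max": 1.5 * p_max,
    "P = 2 * p_max": 2 * p_max,
    "P = Large": float("inf"),
}
concurrency = {name: max_concurrency(freqs_mhz, 4, p)
               for name, p in limits.items()}
```

For these assumed values, a tight power limit caps the achievable concurrency below N = 4, whereas a very large limit leaves N as the only binding constraint, which matches the behavior described for the "P = Large" case.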
It is interesting to compare the results of this paper with optimal solvers. A comparison shows that the complexity of the problem overwhelms optimal solvers for a realistic set of constraints, resulting in the optimal solvers having to resort to heuristics themselves yet with results substantially inferior to those reported by our algorithms. As this paper's focus is on benefits and methodology for test resource allocation, we refrain from a discussion focused on the merits of the various heuristics for mixed integer/nonlinear optimization solvers, for which we refer the reader to [29].
As the power limit and the variation among domains increase, the attainable level of delay test quality improvement delivered by the proposed method increases, as expected. Evidently, the continued integration of an increasingly large number of domains with distinct characteristics in state-of-the-art SoCs can be expected to boost the quality improvement level delivered by the proposed delay test resource allocation method.
VIII. CONCLUSION
Today's SoCs are composed of hundreds of frequency domains with distinct characteristics that affect delay test quality. In order to extract the highest value from the available test resources, a carefully crafted allocation of the test resources to each domain based on the domain characteristics as well as a full utilization of the concurrent test support while adhering to power limits is necessary. This paper proposes a technique for the identification of the test resource allocation and the schedule of the concurrently tested domains in order to maximize overall delay test quality within the limits of available test resources and power. The delay test quality of each domain while considering distinct domain characteristics is initially estimated. Subsequently, optimization formulations as well as efficient test time allocation and concurrent test scheduling methods based on convexity and fast scheduling algorithms are provided.
Experimental results show that the proposed method can deliver a substantial delay test quality improvement in comparison to the conventional, fault-coverage-driven approach while consuming identical test time. As the level of integration and complexity of SoCs increases, the test quality improvement delivered can only be expected to grow.
