SUMMARY
The rapid advancement of VLSI technology has made it possible for chip designers and manufacturers to embed the components of a whole system onto a single chip, called System-on-Chip or SoC. SoCs make use of pre-designed modules, called IP-cores, which provide faster design time and quicker time-to-market. Furthermore, SoCs that operate at multiple clock domains and very low power requirements are being utilized in the latest communications, networking and signal processing devices. As a result, the testing of SoCs and multi-clock domain embedded cores under power constraints has been rapidly gaining importance. In this research, a novel method for designing power-aware test wrappers for embedded cores with multiple clock domains is presented. By effectively partitioning the various clock domains, we are able to increase the solution space of possible test schedules for the core. Previous methods were limited to testing all the clock domains concurrently; we remove this limitation by making use of bandwidth conversion, multiple shift frequencies and properly gated clock signals to control the shift activity of various core logic elements. The combination of these techniques gives us greater flexibility when determining an optimal test schedule under very tight power constraints. Furthermore, since it is computationally intensive to search the entire expanded solution space for the possible test schedules, we propose a heuristic 3-D bin packing algorithm to determine the optimal wrapper architecture and test schedule while minimizing the test time under power and bandwidth constraints.
Introduction
The recent popularity of advanced technologies such as broadband Internet, 3-G cellular phones and high-speed workstations is due to many factors, one of which is the rapid advancement in the design and production of VLSI chips. More importantly, it has now become possible to put an entire system onto a single chip, commonly known as a System-on-Chip (SoC). Currently, SoCs are widely used in devices intended for telecommunications, networking and digital signal processing. Moreover, they are increasingly being utilized in mobile on-the-field devices, which increases the demand for highly reliable, defect-free chips that have very low power requirements.
To ensure that SoCs and other VLSI chips operate as intended, every chip must be tested. As production capabilities improve, clock rates rise exponentially and transistor density increases dramatically, the cost of testing newer VLSI chips also becomes higher. More specifically, the increased complexity of the circuitry in SoCs also means an increase in the amount of test data, which usually results in longer test application time. Furthermore, test access becomes a problem since the cores cannot be directly accessed from the I/O pins of the chip. Moreover, modern IP cores operate at various frequencies internally, which has advantages such as reduced power and silicon area. These multi-clock domain cores present clock skew and at-speed testing problems. Clock skew problems arise from unsynchronized clock sources, such as two signals of the same frequency but different clock trees that arrive out of sync at their destinations, thereby causing data corruption. Furthermore, power consumption and heat during test have become a major issue because of high switching activity during scan-shift operations, as well as the possibility of more components being active than during normal operation.
The most common Design-for-Testability (DFT) approach used for SoCs with multiple cores is the design of a test data delivery method, more commonly known as a Test Access Mechanism (TAM), and the use of wrappers, which enable independent core testing. The wrapper isolates a core during test and provides an interface to apply and collect test data from it. More recently, the IEEE 1500 standard for embedded core test has been approved to provide guidelines for core wrapper design and interfacing to TAMs. Furthermore, several approaches to optimize wrapper designs for single-frequency embedded core testing [1], [2] as well as wrapper and TAM co-optimization algorithms [3], [4], [11], [12] have already been suggested. Still, these approaches do not directly address the problem of testing multi-clock domain cores.
To address this problem, we propose an IEEE 1500 compliant, power-aware multi-clock domain core wrapper that partitions the IP core into smaller sub-groups and uses gated clocks to control the start times of scan-shift operations. Compared to previous methods, this enables a more flexible and efficient use of the external bandwidth under a power constraint. A heuristic 3-D rectangular bin packing algorithm, which forms the basis of the proposed wrapper design method, is also introduced.
Related Work
Most at-speed multi-clock domain core testing techniques [...]

[...] Grouping the wrapper scan chains according to their clock domains prevents the effects of clock skew during this phase. To avoid clock skew at the capture phase, the test enters this phase before or after the last bit of test data is scanned in. In the capture phase, all the scan chains are driven by their functional clocks to simulate normal operation. For multi-clock domain cores, a capture window similar to that proposed in [8], [9], as shown in Fig. 3, is used during this phase, and the P-vc's are activated in such a way that ensures inter-domain and intra-domain signals are prop- [...]
2. The total power dissipation of all active P-vc's at any time t cannot exceed the maximum allowed power dissipation.
Since all the P-vc's become active and inactive independently, the total scan-shift time for one test pattern can be computed from the P-vc with the latest end time as shown below: [...]

[...] the packed P-vc's while minimizing the total height. Since the restricted 3-D bin packing problem has been shown to be NP-hard [12], we propose the following heuristic algorithm to solve the problem.
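Before turning to the heuristic itself, the conditions that any packed schedule has to meet can be illustrated with a small check. The following is only a minimal sketch, not the authors' implementation: the class and function names, the time unit and the example values are all hypothetical, and the bandwidth limit is assumed from the summary's mention of power and bandwidth constraints. It computes the scan-shift time of one pattern as the latest P-vc end time and verifies that neither limit is exceeded at any instant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledPvc:
    """One packed P-vc; all field names are illustrative assumptions."""
    name: str
    start: float      # time at which its gated clock starts shifting (us)
    duration: float   # scan-shift time t_ip at its chosen frequency (us)
    width: int        # external bandwidth w_ip it occupies (bits)
    power: float      # power p_ip it dissipates while shifting


def scan_shift_time(schedule):
    """Total scan-shift time for one pattern: the latest end time."""
    return max(p.start + p.duration for p in schedule)


def is_feasible(schedule, w_max, p_max):
    """Check the bandwidth and power limits at every instant.

    Usage only rises when a P-vc starts, so checking each start time
    is enough. A P-vc is treated as active on [start, start + duration).
    """
    for t in sorted({p.start for p in schedule}):
        active = [p for p in schedule if p.start <= t < p.start + p.duration]
        if sum(p.width for p in active) > w_max:
            return False
        if sum(p.power for p in active) > p_max:
            return False
    return True


# Hypothetical schedule, not taken from the paper's tables.
schedule = [
    ScheduledPvc("P-vc_1", start=0.0, duration=1.5, width=2, power=450),
    ScheduledPvc("P-vc_2", start=0.0, duration=3.0, width=1, power=225),
    ScheduledPvc("P-vc_3", start=1.5, duration=1.0, width=2, power=300),
]
print(scan_shift_time(schedule))
print(is_feasible(schedule, w_max=3, p_max=700))
```

In this picture each P-vc is a cuboid whose sides are width, power and time, which is why the scheduling task can be treated as a 3-D packing problem whose height is the scan-shift time.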
Initialization: Cube Creation and Ordering
To illustrate the various steps in the algorithm, the benchmark multi-clock domain IP core hTCADT01, first introduced in [9] and shown in Table 1, is used throughout the following sections. This IP core has seven clock domains. In the table, sd# denotes the sub-domain number, while P is the power dissipation of the sub-domain when shifting at 100 MHz; to simplify the setup, P is made equal to the sum of all the scan chain lengths Lscij belonging to that sub-domain. Further details are given in Sect. 6.

P-vc's can be created to represent any combination of sub-domains belonging to the same clock domain. If a domain Di has Nsi sub-domains, then the total number of possible P-vc's from Di is just the sum of all the possible combinations of Sij. [...]

(5)

where f1 is the maximum allowed shift frequency. In this paper, we set f1 = fATE, and Pipmax is the power dissipation at f1 as expressed below:
where bij = 1 when sub-domain Sij belongs to Gip and bij = 0 if not, and pij is the power dissipation of Sij at fATE. The minimum test time tip can be computed as follows:
The [...] Fig. 7(b), so the fip of P-vc_2 is decreased to 50 MHz. This leads to a decrease in pip to 225, while wip remains the same and tip doubles to 3.00 μsec. Figure 7 shows the finished schedule separated into two 2-D graphs [...]

[...] gain of around 4% across all power constraints and a maximum gain of 14.25% at Pmax = 3000. Small differences in time (0-1%) are attributed to rounding discrepancies among the programs used and are considered negligible, so our algorithm matches or exceeds [10] in 90% of the cases.
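The cube-creation step and the frequency reduction applied to P-vc_2 can be illustrated with a short sketch. It is a hedged illustration only: the function names and the sub-domain power values are hypothetical, the power of a P-vc at the maximum frequency is assumed to be the sum of the powers of the sub-domains it contains (as the definition of bij suggests), and power is assumed to scale linearly with shift frequency while shift time scales inversely. The starting values for P-vc_2 (100 MHz, power 450, 1.50 μsec) are inferred from the halved and doubled figures in the text, not stated in it.

```python
from itertools import combinations


def enumerate_pvcs(sub_domain_power):
    """List every non-empty combination of sub-domains of one clock domain.

    sub_domain_power maps a sub-domain name to its power at the maximum
    shift frequency. Returns (members, summed_power) pairs, so a domain
    with n sub-domains yields 2**n - 1 candidate P-vc's.
    """
    names = sorted(sub_domain_power)
    candidates = []
    for size in range(1, len(names) + 1):
        for members in combinations(names, size):
            power = sum(sub_domain_power[s] for s in members)
            candidates.append((members, power))
    return candidates


def scale_to_frequency(p_at_fmax, t_at_fmax, f_max, f):
    """Assumed model: power scales with f, shift time with 1/f."""
    ratio = f / f_max
    return p_at_fmax * ratio, t_at_fmax / ratio


# Hypothetical two-sub-domain clock domain (values are not from Table 1).
candidates = enumerate_pvcs({"sd1": 200, "sd2": 250})
print(len(candidates))  # 3 candidates: (sd1), (sd2), (sd1, sd2)

# Halving the shift frequency of a P-vc, as in the P-vc_2 example.
power, time_us = scale_to_frequency(450, 1.5, f_max=100, f=50)
print(power, time_us)  # 225.0 3.0
```

Lowering the shift frequency therefore trades a longer cuboid for a shallower one in the power dimension, which is what lets a P-vc fit under a tight power limit.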
In [8], [9], the area increase due to the demultiplexing-multiplexing circuitry, scan-control modules and other logic necessary to implement the multi-clock domain wrapper was stated to be less than 10% of the area taken by the IEEE 1500 wrapper and other scan logic. Since our approach only requires a slight modification of the scan control circuitry in [8], [9], it is safe to assume that the added overhead would be minimal. Furthermore, as manufacturing processes become smaller and transistor counts become higher, the probable DFT overhead becomes more and more negligible in light of the possible gains in test application time. Also, the added flexibility of domain partitioning and partitioned test scheduling would be of greater benefit as designers start to re-use older generation multi-clock domain circuits as IP-cores in newer, more complex designs.

Hideo Fujiwara received the B.E., M.E., and Ph.D. degrees in electronic engineering from Osaka University, Osaka, Japan, in 1969, 1971, and 1974. He was with Osaka University from 1974 to 1985 and Meiji University from 1985 to 1993, and joined Nara Institute of Science and Technology in 1993. Presently he is a Professor at the Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan. His research interests are logic design, digital systems design and test, VLSI CAD and fault tolerant computing, including high-level/logic synthesis for testability, test synthesis, design for testability, built-in self-test, test pattern generation, parallel processing, and computational complexity. He is the author of Logic Testing and Design for Testability (MIT Press, 1985
