Abstract
Introduction
As the test set size of a scan design grows, reduction of test application time is a critical issue and has been widely investigated. A straightforward method to reduce the test application time is via parallel scan chains [1] . However, each scan chain requires a pair of scan-in and scan-out pins, and the number of available I/O pins often limits the number of possible parallel scan chains. Many sophisticated methods have been proposed to address the problem [2] - [6] . For example, Rajski et al. presented a hardware scheme based on linear finite state machine [6] . The external input pins receive compressed test data from the automated test equipment (ATE), and the state machine decompresses the data before they are applied to parallel scan-in chains. Similarly, the test responses from the scan-out chains are compacted in space before being placed on the external output pins for observation. All existing methods, in essence, compress scan-in data before applying them to external input pins and decompress them before application to scan chain inputs; the reverse operation applies for scan-out data. Recently, we have investigated a new approach for the dual use of power pins for data communications as well as for delivery of power [7] , and this paper aims to extend the method for parallel scan design.
The number of pins on a VLSI chip seems to grow boundlessly with the advancement of VLSI into deeper and deeper submicron technology. The increasing pin count results not only from additional signal pins but also from additional power and ground pins. For example, the Intel Pentium 4 processor in the 775-land package has a total of 775 pins (lands), of which 226 pins are for power and 274 pins for ground. In this paper, we present the possibility of using power pins to scan in test data while they simultaneously serve their intended purpose, the supply of power. We also present a new scheme to multiplex scan-out data using pulse position modulation (PPM) and code division multiple access (CDMA). The dual use of power pins for scan-in test data and of the PPM/CDMA scheme for scan-out test data addresses the bottleneck on I/O pins for parallel scan design and would make parallel scan design even more attractive.
In fact, the use of power lines for communications is a new approach only in the IC world. Power line communications (PLC), patented in the early 1920s, has been considered mainly by utility companies for remote metering and control [9] - [10] . Recently, it has been revived for broadband internet access over existing power lines [11] . To the best of our knowledge, the dual use of power lines at the IC level, or even at the PCB level, has not been investigated outside the authors' research group. It faces a totally different set of technical challenges from that of traditional PLC, as described below.
Most importantly, the noise characteristics as well as the communications channel of the power distribution network are completely different from those of a traditional PLC. Power lines forming a power distribution network are very noisy due to the so-called IR drop, Ldi/dt, and thermal noise [12] . Second, the signal power level on a power distribution network should be sufficiently small not to disturb the correct operation of the circuit, which makes the recovery of data from a noisy power line difficult. Such noise is not as much of a problem for a traditional PLC. Third, power pins and lines are electrically connected, which precludes the direct application of parallel scan chain data on multiple power pins. Lastly, the power pins of contemporary advanced packages such as flip-chip BGA (Ball Grid Array), when combined with on-chip/off-chip decoupling capacitors, form a high-quality lowpass filter to suppress high frequency noise terms [13] - [15] . Therefore, a high frequency data signal may not be able to propagate through power pins.
Paper 21.3 INTERNATIONAL TEST CONFERENCE
It is insightful to realize that cellular CDMA wireless communications face essentially the same problems except the last one; the channel is noisy, the signal level from a cellular phone is low, and the multiple users share the same spectrum [16] . A close examination reveals that essentially the same scheme adopted for CDMA wireless communications can also be applied to address the technical problems faced by the dual use of power lines in ICs. Specifically, we propose to use DS-CDMA (Direct Sequence Code Division Multiple Access) combined with ultra wideband (UWB) signaling. DS-CDMA is an effective method to mitigate the high noise level while providing a method for multiple scan data channels [16] . Owing to its wide bandwidth, UWB offers high data rate with average power near the noise level [17] . Fortunately, the use of UWB signals also addresses the last technical problem: the formation of a lowpass filter by the package. Our initial research findings show that, in fact, such a filter fails to behave as a lowpass filter at an extremely high frequency band over 1 GHz due to parasitics associated with power and ground planes and low self resonant frequencies (SRFs) of decoupling capacitors.
The remainder of this paper is organized as follows. Section 2 provides preliminaries on DS-CDMA techniques, UWB, power distribution networks, and IC packages. Section 3 and Section 4 describe our proposed approach for scan design and its feasibility study, respectively. Section 5 discusses the limitations of the proposed method and a potential solution to be investigated. Section 6 draws conclusions and suggests future work.
Preliminaries
In this section, we briefly review the relevant background on DS-CDMA, UWB, power distribution networks, and IC packages. 
DS-CDMA technology
The value is also 0 for users A and C. Hence, it is possible that only user B can recover the logic value, while the received signal behaves as noise for the other three users. In reality, a de-spread value deviates from the expected value due to such factors as clock skew and propagation delay of the received signals, which necessitates a threshold circuit to decide the logic value of a received codeword.
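The de-spreading behavior described above can be illustrated with a minimal sketch. The length-4 Walsh-Hadamard codes, their assignment to users A through D, and the threshold value of 2 are assumptions made for illustration only; they are not taken from the paper's figure.

```python
# Assumed length-4 Walsh-Hadamard codes; the user-to-code assignment is illustrative.
CODES = {
    "A": (1, 1, 1, 1),
    "B": (1, -1, 1, -1),
    "C": (1, 1, -1, -1),
    "D": (1, -1, -1, 1),
}


def spread(symbol, code):
    """Spread one data symbol (+1 or -1) into a sequence of chips."""
    return [symbol * chip for chip in code]


def despread(chips, code):
    """Correlate the received chips with a user's own code."""
    return sum(r * c for r, c in zip(chips, code))


def decide(value, threshold=2):
    """Threshold decision (threshold is an assumed value).

    Returns +1 or -1 when the de-spread magnitude reaches the threshold,
    and None when the received signal looks like noise to this user.
    """
    if value >= threshold:
        return 1
    if value <= -threshold:
        return -1
    return None


if __name__ == "__main__":
    received = spread(1, CODES["B"])  # data intended for user B only
    for user, code in CODES.items():
        value = despread(received, code)
        print(user, value, decide(value))
```

Under these assumed codes, only the user whose code matches the transmitted one sees a de-spread value of full magnitude; the orthogonal codes correlate to 0, which the threshold circuit treats as no data.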
The advantage of the spreading operation can be quantified as the processing gain, which is defined as $10 \log_{10}(\text{spreading factor})$ in dB. For example, the spreading factor of 4-bit codewords is 4, which yields a processing gain of 6 dB. This means that the signal-to-noise ratio (SNR) increases by 6 dB while the signal power stays at the same level. So the DS-CDMA technique not only enables multiple scan channels over power lines, but also helps the recovery of data over noisy power lines. The cost of these benefits is a lower data rate for the same clock speed, which is acceptable for scan testing. In practice, a scan clock could be as slow as one-tenth of the system clock [8] . One relevant practical issue for CDMA technology is the so-called near-far problem [16] . The near-far problem occurs when one user's signal is stronger than the others and dominates the demodulator, so that the other users' signals are lost.
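As a quick check of the processing gain figure quoted above, the definition can be evaluated directly. The function name is hypothetical and the spreading factors other than 4 are arbitrary illustrative inputs.

```python
import math


def processing_gain_db(spreading_factor):
    """Processing gain in dB: 10 * log10(spreading factor)."""
    return 10.0 * math.log10(spreading_factor)


if __name__ == "__main__":
    for sf in (1, 4, 16):
        print(f"SF={sf}: {processing_gain_db(sf):.2f} dB")
```

A spreading factor of 1 (no spreading) gives no gain, and each fourfold increase in the spreading factor adds roughly 6 dB.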

Ultra Wideband (UWB)
Since the FCC's allocation of a UWB spectrum in the range of 3.1 GHz to 10.6 GHz in 2002, UWB has gained phenomenal interest in academia and industry [17] . Compared to traditional narrowband communication systems, UWB has several advantages such as high data rate, low average power, and simple RF circuitry. Shannon's theorem states that the channel capacity C is given as $C = B \log_2(1 + \mathrm{SNR})$ [18] , where B is the bandwidth. As the bandwidth B is much larger (on the order of several GHz) for UWB than for a narrowband signal, the SNR can be much smaller for UWB to achieve the same data rate. Therefore, UWB is often able to recover data even if the signal power is close to the noise level. In other words, the power level of UWB signals could be at the noise level of power lines and thus have little impact on the power integrity.
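The bandwidth versus SNR trade-off implied by Shannon's theorem can be made concrete with a short sketch. The target rate of 1 Gb/s and the 100 MHz narrowband bandwidth are assumed values chosen only for illustration; the 7.5 GHz figure is the width of the 3.1 GHz to 10.6 GHz allocation mentioned above.

```python
import math


def capacity_bps(bandwidth_hz, snr_linear):
    """Shannon capacity C = B * log2(1 + SNR), with SNR as a linear ratio."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)


def required_snr(capacity, bandwidth_hz):
    """Linear SNR needed to reach a given capacity over a given bandwidth."""
    return 2.0 ** (capacity / bandwidth_hz) - 1.0


def to_db(ratio):
    """Convert a linear power ratio to dB."""
    return 10.0 * math.log10(ratio)


if __name__ == "__main__":
    target = 1e9          # assumed target data rate, bits per second
    narrowband = 100e6    # assumed narrowband bandwidth, Hz
    uwb = 10.6e9 - 3.1e9  # width of the FCC UWB allocation, Hz
    for name, bw in (("narrowband", narrowband), ("UWB", uwb)):
        snr = required_snr(target, bw)
        print(f"{name}: required SNR = {to_db(snr):.1f} dB")
```

For these assumed numbers, the narrowband channel needs an SNR far above 1, whereas the UWB channel reaches the same capacity with a signal below the noise level, which is the property the proposed method relies on.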
UWB signaling can be carrier-based or impulse-based, and impulse UWB is more suitable for the proposed application due to its simple hardware [17] . Impulse UWB is based on a train of narrow pulses (which are typically a few hundred picoseconds wide). The most popular pulse shapes used for impulse UWB are Gaussian pulses and their derivatives [19] . Various modulation schemes such as on-off keying, pulse amplitude modulation, pulse position modulation (PPM), and binary phase shift keying (BPSK) are available for UWB. An m-ary PPM scheme has $2^m$ distinct time positions, and one pulse carries m bits of information. We adopt BPSK for scan-in test data due to its efficient performance, and we adopt PPM for scan-out test data due to its low hardware complexity [17] , [20] , [21] .
Power Distribution Networks
A power distribution network (PDN) consists of power and ground wires over multiple metal layers and forms a grid. Figure 1 illustrates a power grid over four metal layers, in which vias make connections between two different metal layers. Power sources (from power pins) are connected to bumps distributed on the top metal layer in the figure, but some package types allow connections only along the perimeter of the top metal layer. Current sinks in the figure represent loads connected to the power grid.
Major issues with the PDN design include signal transmission models, noise models, and extraction of the necessary parasitics for the models. A lumped model of a PDN consisting of lumped R, L, and C components is simple, but it suffers from inaccuracy. For example, modeling a distributed RC network as a lumped RC network can result in up to 50% error [23] . We adopted the N-stage π-type distributed RLC model suggested in [24] for our simulations.
Major noise sources for a PDN are IR drop, Ldi/dt noise, and thermal noise [12] . An IR drop is caused by the parasitic resistances of power/ground wires and vias. Ldi/dt noise is due to the inductive parasitics of bond wires or C4 bumps and the simultaneous switching of the circuit, and it is the major source of noise for a PDN [25] - [30] . Thermal noise at resistors is proportional to the absolute temperature and the noise bandwidth [31] . For the simulations in our feasibility study, the IR drop was implicitly included in the PDN model, and we embedded Ldi/dt noise and thermal noise in the current sinks distributed over the PDN.
Extraction of accurate R, L, and C parasitics is a necessary ingredient for accurate modeling of a PDN and has been the subject of intense research [32] , [33] . To obtain the parasitic resistance of a metal wire, we need to know its length, width, and sheet resistance; the sheet resistance is often available from the technology library of the particular technology in use. Capacitive parasitics result from the routing conductors on the substrate, and the parallel-plate capacitor model (C = εS/d) is most widely used, where S is the area of the parallel-plate capacitor, d is the insulator thickness, and ε is the permittivity of the insulating material between the two plates. We extracted the parasitic resistance per unit length and the capacitance per unit area from a layout and computed the necessary values for our simulations. It is difficult to model the inductance accurately because of non-locality effects and the unknown return path. We relied on the formula suggested in [32] and [33] for our simulations.
Packages
A wire-bonded BGA package is an area array I/O package, which employs a connection via bond wires from the solder ball to the I/Os of a die [13] - [15] . Since this type of package uses bond wires for the electrical and physical connections between the pins and pads, its inductive and resistive parasitics are larger than those of flip-chip packages [35] . A flip-chip BGA package, which is widely used for complex chips, provides a short connection between the package I/Os and the die (chip) I/Os, and thus it reduces the parasitics associated with the package. Figure 2 shows the structural difference between the wire-bonded BGA and the flip-chip BGA. Note that the difference between the two packages lies in the length of the electrical contact. This difference affects the inductive loss, and thus the voltage drop of an integrated circuit. We considered a wire-bonded BGA package for our simulations.
Modeling of wire-bonded BGA packages is relatively simple, whereas modeling of the signal, power, and ground planes of a flip-chip BGA package is more complex.
Proposed Approach for Parallel Scan Design
We propose to use power pins to scan in test data based on UWB and DS-CDMA communication techniques. We also propose a new scheme to multiplex scan-out data based on CDMA and PPM. The two schemes are explained in this section.
Scan-in Test Data
Figure 3 illustrates the proposed method on a package with 44 power pins. The circuit diagram shows the i-th scan data recovery block, SDR_i. One bit of scan-in data is spread over multiple narrow UWB pulses called chips. BPSK modulation is adopted for our method, in which a chip value of 1 (-1) is represented by a positive (negative) pulse. UWB pulses are observed at the output of the power pad for the best performance, although, in theory, the signals can be observed at any point on the power line. A simple comparator such as the one given in [36] or a simple correlator samples the signal on the power line and produces a logic value [18] . The digital circuit that follows then performs de-spreading with its own code. Note that logic value 1 (0) represents -1 (+1) of a bit of a codeword for the exclusive-OR operation. An orthogonal code is assigned to each pin, and all the scan data signals other than the intended one behave as noise. For example, the codeword (1, 1, -1, -1) is assigned to pin 1, while the codeword (1, -1, -1, 1) is assigned to pin 2. Ideally, the data signals applied to other pins do not interfere with the recovery of the intended signal. However, several factors such as noise and differences in propagation delays cause imperfect orthogonality, which would limit the number of scan channels and the locations of data recovery blocks. Finally, a correlator performs better than a comparator, but at the cost of higher circuit complexity.
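The exclusive-OR de-spreading in a scan data recovery block can be sketched as follows, using the two codewords quoted above and the stated mapping of logic 1 (0) to -1 (+1). The ideal noise-free comparator and the way the result is reported (number of 0s minus number of 1s) are assumptions made for this illustration; the paper does not specify them.

```python
# Codewords quoted in the text for pin 1 and pin 2.
CODE_PIN1 = (1, 1, -1, -1)
CODE_PIN2 = (1, -1, -1, 1)


def to_logic(value):
    """Map a bipolar value to a logic value: -1 -> 1, +1 -> 0."""
    return 1 if value == -1 else 0


def transmit(data_symbol, code):
    """BPSK chips for one data symbol (+1 or -1) spread with a pin's code."""
    return [data_symbol * chip for chip in code]


def comparator(chips):
    """Ideal comparator (assumed): negative pulse -> logic 1, else logic 0."""
    return [1 if chip < 0 else 0 for chip in chips]


def despread_xor(logic_chips, code):
    """XOR the sampled chips with the code and return (#zeros - #ones).

    Under the logic mapping above, XOR plays the role of multiplication,
    so the result equals the bipolar correlation value.
    """
    code_logic = [to_logic(chip) for chip in code]
    xored = [r ^ c for r, c in zip(logic_chips, code_logic)]
    ones = sum(xored)
    return (len(xored) - ones) - ones


if __name__ == "__main__":
    sampled = comparator(transmit(-1, CODE_PIN1))
    print("pin 1 block:", despread_xor(sampled, CODE_PIN1))
    print("pin 2 block:", despread_xor(sampled, CODE_PIN2))
```

In this idealized setting the block that owns the code obtains a full-magnitude result, while the block with the orthogonal code obtains 0, which is why the other pin's data behaves as noise.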
Scan-out Test Data
Use of a PDN for both scan-in and scan-out data is problematic, as it causes the near-far problem faced in wireless CDMA communications. Simply speaking, a UWB signal applied to a PDN from an internal transmitter (to send out scan-out data) dominates the signal applied from an external pin (to send in scan-in data). We therefore propose a new scheme, based on PPM and CDMA, to multiplex parallel scan outputs and reduce the number of scan-out pins.
The process is illustrated in Figure 4 . Each scan-out data stream s1, s2, …, s4 is spread with a unique orthogonal code (assigned to the scan chain) in the horizontal direction at the chipping clock rate Ci. In this example, each bit is spread onto a code that is four chips long. Then, the chip values are added in the vertical direction, and the sum decides the pulse position in the time slot. For example, the sum of the first chip values is 2. Hence, the position of the pulse is 2 in the time slot C1, so the pulse is displaced by four positions from the reference. Similarly, the sum of the second chip values is -2, so the pulse is displaced by two positions. The scan-out data modulated in position is applied to the data pin. Note that there are N+1 distinct positions in a time slot for N scan chains (the sum of N chip values of ±1 ranges from -N to +N in steps of 2), which is five for this example. It should be noted that, in practice, the pulse repetition rate is the same as the chipping rate, which is the system clock rate.
Now, let us consider recovery of the scan-out data, which is performed at the ATE side or by some dedicated external hardware. The ATE demodulates the PPM data for each code, i.e., it detects the positions of pulses for the code. Then, the demodulated PPM data is de-spread with the individual codes of the scan-out channels. For the example in Figure 4 , the demodulated pulse sequence for the code is (2, -2, -2, -2). The scan-out data for s1 is obtained by correlating the sequence with its assigned code (-1, -1, -1, -1). The correlated value is +4, which is interpreted as logic 1. Similarly, the assigned code for s2 is (-1, 1, -1, 1) (which is the complement of the de-spread value), and the correlated value is -4, which is interpreted as logic 0.
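The scan-out multiplexing and recovery steps above can be reproduced with a minimal sketch. The codes for s1 and s2 are those quoted in the text; the codes for s3 and s4, the data values on s3 and s4, and the mapping from a chip sum to a position index are assumptions chosen so that the chip sums match the sequence (2, -2, -2, -2) of the example.

```python
# s1 and s2 codes are quoted in the text; s3 and s4 codes are assumed.
CODES = {
    "s1": (-1, -1, -1, -1),
    "s2": (-1, 1, -1, 1),
    "s3": (1, 1, -1, -1),
    "s4": (1, -1, -1, 1),
}


def multiplex(bits):
    """Spread each scan-out bit with its code and add the chips per time slot."""
    n_chips = len(next(iter(CODES.values())))
    sums = [0] * n_chips
    for name, bit in bits.items():
        symbol = 1 if bit else -1  # logic 1 -> +1, logic 0 -> -1
        for k, chip in enumerate(CODES[name]):
            sums[k] += symbol * chip
    return sums


def pulse_position(chip_sum, n_chains):
    """Assumed index mapping: sums -N, -N+2, ..., +N -> positions 0..N."""
    return (chip_sum + n_chains) // 2


def demultiplex(sums):
    """Correlate the demodulated chip sums with each code to recover the bits."""
    recovered = {}
    for name, code in CODES.items():
        correlation = sum(s * c for s, c in zip(sums, code))
        recovered[name] = 1 if correlation > 0 else 0
    return recovered


if __name__ == "__main__":
    scan_out = {"s1": 1, "s2": 0, "s3": 1, "s4": 1}  # s3, s4 values assumed
    sums = multiplex(scan_out)
    print("chip sums:", sums)
    print("positions:", [pulse_position(s, len(CODES)) for s in sums])
    print("recovered:", demultiplex(sums))
```

With four chains, the chip sum can only take the values -4, -2, 0, 2, and 4, so five pulse positions per time slot suffice.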
Feasibility Study
In order to investigate the feasibility of the proposed method for dual use of power pins, we modeled an IC package and a PDN, and then simulated the proposed system. The IC package model accounts for wire bonds and lead frames, and the power pad model includes a diode for electrostatic discharge (ESD) protection.
Package and Power Pad Model
A widely adopted model for an IC package and a power pad is shown in Figure 5 . We considered an Amkor flexBGA 144 pin package for our simulation due to the availability of its parasitic values in the public domain [34] , and the following parameters are used.
• The capacitive parasitics C 1 of a power pad are based on the TSMC 0.25 μm deep submicron process. The resistive parasitics R 1 are calculated from the sheet resistance model [12] . The capacitive parasitics of the ESD protection diode are extracted from the layout of the power pad.
Power Distribution Network Model
We adopt an N-stage π-type distributed RLC model, and Figure 6 illustrates the model for N=3 [24] . The number of stages N is set to 10 for our model, as it is sufficiently accurate for the delay estimation [37] . The current sinks model the current consumed by the devices, and they are evenly distributed over the entire network. Finally, the PowerPC processor has 93 power/ground pins out of 360 pins, and the total number of power pins is set to 44 for our simulation.
Pulse Propagation and Detection
Gaussian UWB pulses with a peak amplitude of 0.25 V, which is 10 percent of the supply voltage, and a width of 1 ns were considered for our simulation. The maximum chipping rate, i.e., the pulse repetition rate, is set to 1 GHz.
The first and most critical experiment is to examine whether UWB pulses propagate through power pins. We applied UWB pulses only at power Pin 1 for the single channel environment (refer to Figure 3), while the remaining 43 power pins are at VDD = 2.5 V. Note that perfect impedance matching between the UWB transmitter and the power pins is assumed. Figure 7(a) shows the waveforms at the input of the power pin and at the output of the power pad, i.e., at the input of the scan-in data recovery block.
The waveforms indicate that the attenuation of the peak of the pulses due to the package parasitics is in the range of 40 to 48 percent, and the propagation delay at the peak is about 100 ps. Although the peak is attenuated, the pulses do propagate through the pins, which opens the possibility of using UWB pulses on power pins. The peak of the second output pulse is higher than that of the first output pulse. This is because the tail of the first pulse causes inter-symbol interference, which can be mitigated by DC offset cancellation or simply by a longer pulse repetition time.
The next experiment is to examine the interference level from a UWB pulse applied on an adjacent pin. We applied two consecutive positive pulses to Pin 5 and a positive pulse followed by a negative pulse to Pin 7. Note that Pin 6 is the center power pin on a side. Figure 7(b) shows the simulation results, and the output waveform is shown only for Pin 5. As expected, when the two pulses add constructively, the output increases. When the two pulses have opposite polarity, as in the case of the second pulses, they may add destructively and cause inter-channel interference. However, the attenuation due to the inter-channel interference is negligible, as shown in the figure. So the experiments indicate that the interference from other pins does not pose a problem. It is important to note that as a scan data recovery block moves away from the power pad toward the central point of a PDN, it will experience a higher level of inter-channel interference, which limits the number of parallel scan chains. In such a case, the power levels of individual UWB transmitters can be adjusted to maximize the number of scan chains.
Next, we added both Ldi/dt and thermal noise to each current sink, i.e., load. The Ldi/dt noise has a rising slope of 108 mA/ns for 50 ps followed by a symmetrical falling slope. The thermal noise has a Gaussian distribution limited to 5 percent of the average current at each current source. We set the level of the thermal noise intentionally high to account for variations of the Ldi/dt noise in real circuits. The waveforms under the two scan channels (Pin 5 and Pin 7) are shown in Figure 8 , and the output waveform is shown only for Pin 5. The Ldi/dt noise causes a notch slightly after the peak of the input, and the thermal noise distorts the output waveform for the entire period. The output waveform indicates that the sampling time of the output is critical when a comparator is used for a scan data recovery block. Note that it is not a problem for a correlator, which integrates the waveform over an entire clock period.
The next experiment examines the performance of the overall system in terms of BER (bit error rate). We considered a comparator (rather than a correlator) for the data recovery block in our simulation. We experimented with a single scan chain, in which we applied data to only one pin under the Ldi/dt and thermal noise conditions mentioned above. We then compared the recovered data with the original data. After applying 100,000 scan-in data bits, we did not observe any errors, even without spreading, i.e., with SF (spreading factor) = 1. Since the BER is 0 for 100,000 scan data bits under SF=1, there is no need to experiment with a higher spreading factor.
We then experimented with 44 scan channels, in which scan data is applied to all 44 pins. Again, we did not observe any errors even under SF=1. Although we have not experimented with a sufficiently large number of data bits (such as on the order of $10^{11}$) due to the long simulation time, we believe that the proposed method could be used for parallel scan chains without employing the spreading operation.
Finally, in order to see the impact of spreading, we experimented with an artificially inflated (and hence unrealistic) level of thermal noise. We set the thermal noise level to 10 percent of the average current and considered two different spreading factors, SF=1 and SF=4, under four scan channels. We adopted OVSF (Orthogonal Variable Spreading Factor) codes, which are employed in Third Generation CDMA standards [39] . The center power pin on each side is selected to maximize the distance between pins. With 10,000 data bits applied, we obtained a BER of $1.1 \times 10^{-3}$ for SF=1. When the spreading factor increases to 4, we did not observe any error. In other words, increasing the spreading factor fourfold reduces the observed BER to 0.
Before we close this section, we describe the method of determining the sampling instant of the received signals for the data recovery blocks.
The sampling instant may significantly affect the performance of the proposed system, especially when a comparator is used for a data recovery block. An eye diagram is a useful tool to determine the effects of the sampling instant for the received signals. Unlike wireless communication systems, the UWB signal generators on the ATE side and the data recovery blocks for the proposed system are synchronized through a dedicated clock signal line. So, clock jitter has insignificant impact on an eye diagram for the proposed system. The major sources of error in the eye diagram for the proposed system are Ldi/dt and thermal noise. Figure 9 shows the eye diagram for 100 data bits under a single scan channel with the Ldi/dt and the thermal (5 percent) noise mentioned above. As shown in the figure, the optimal sampling instants are different for a data bit 1 and a data bit 0. The sampling instant at 0.52Ts, where Ts is the symbol duration, is the best for a bit 0, but the worst for a data bit 1. It is the opposite for the sampling instant at 0.63Ts. As a compromise, we set the sampling instant to 0.60Ts in our simulations. 
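The caveat raised earlier in this section about the limited number of simulated bits can be quantified with a short sketch. It assumes that bit errors are independent, which the paper does not claim, and the function names are hypothetical. The error count of 11 is inferred from the reported BER of 1.1 × 10^-3 over 10,000 bits.

```python
import math


def ber_estimate(errors, bits):
    """Observed bit error rate."""
    return errors / bits


def ber_upper_bound(bits, confidence=0.95):
    """Largest BER consistent with observing zero errors in `bits` trials.

    Assumes independent bit errors: solves (1 - p)**bits = 1 - confidence.
    """
    return -math.expm1(math.log1p(-confidence) / bits)


if __name__ == "__main__":
    print("observed BER, SF=1, inflated noise:", ber_estimate(11, 10_000))
    print("95% bound after 100,000 error-free bits:", ber_upper_bound(100_000))
    print("95% bound after 1e11 error-free bits:", ber_upper_bound(10**11))
```

Under this assumption, an error-free run of 100,000 bits bounds the BER only at roughly 3 divided by the number of bits, which illustrates why a far longer run would be needed to demonstrate very low error rates.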
Issues and Applications
A major issue for actual deployment of the proposed method in the real world lies with the load boards of ATE.
All power pins on a load board are tied together with large bulk capacitors as well as individual bypass capacitors at each power pin. The power planes handle a large amount of current on the order of 100 Amps. Breaking up power planes into individual power pins, while maintaining equal potential, is challenging. Further investigation on load boards is necessary to address this problem.
The power distribution network is ubiquitous across the chip as seen by the internal logic, i.e., a power line is accessible to any internal node. This suggests the possibility of monitoring the logic value of any internal node through the power line by attaching a transmitter, ideally a small one, to the node. Routing the data through a power line avoids preplanned routing of a data path from the node to an external data pin. This is a highly attractive feature for testing, as the routing of data paths for testing alone is considered by industry to be too expensive in design time as well as in silicon area [8] .
The ability to monitor internal node values without routing data paths opens a new set of applications in testing such as fault diagnosis, monitoring of transient logic values during built-in self test, and on-line testing.
Conclusion
In existing scan designs, each scan chain requires a pair of scan-in and scan-out pins, and the number of available I/O pins often limits the number of possible parallel scan chains. We proposed a new approach to address this problem. The key idea employed for our approach is the dual use of power pins for scan-in data communications as well as for delivery of power. Specifically, we proposed to use DS-CDMA combined with UWB signaling to avoid degrading power delivery. We also proposed a new scheme to multiplex scan-out data using PPM and CDMA schemes.
Our SPICE simulations indicate that the proposed method is indeed feasible for parallel scan design. Scan-in test data are recovered correctly at power pads for 44 scan channels under a realistic PDN and a realistic noise model. However, to deploy our method in the real world, further study is necessary, especially into the load boards of ATE. Finally, the proposed method offers the possibility for monitoring the internal logic values through a PDN and power/data pins, which opens many new applications in testing and diagnosis.
