Abstract-This paper presents a chip-package hybrid clock distribution network and delay-locked loop (DLL) with which to achieve extremely low jitter clock delivery. The proposed hybrid clock distribution network and DLL provide digital noise-free and low-jitter clock signals by utilizing lossless package layer interconnections instead of lossy on-chip global wires with cascaded repeaters. The lossless package layer interconnections become high-frequency waveguides and provide a repeater-free clock distribution network; thus, the clock signal becomes free of on-chip power supply noise. The proposed chip-package hybrid clock scheme has demonstrated a 78-ps peak-to-peak jitter at 500 MHz under a 240-mV on-chip simultaneous switching noise condition versus a conventional clock scheme, which produced a 172-ps peak-to-peak jitter under the same condition. Moreover, the proposed scheme has demonstrated an 80-ps long-term jitter with a 300-mV DC voltage drop test condition, contrasted with the 380-ps long-term jitter of a conventional clock scheme. Finally, the proposed hybrid clock scheme has a confirmed delay of 1.47 ns versus a conventional clock scheme delay of 2.85 ns.
I. INTRODUCTION

IN MODERN high-speed and high-current digital ICs and system-in-packages (SiPs), the distribution of stable power supply voltages and the suppression of simultaneous switching noise (SSN) at the power supply network are important design issues [1]-[3]. These design challenges are becoming more serious as CMOS device technology advances rapidly into the nanoscale in high-performance digital integrated chips. The SSN severely affects on-chip clock signal waveforms, and it eventually produces an unacceptable amount of clock jitter and delay at the clock distribution network. Since the clock jitter reduces the timing margin of high-speed data transmission systems, it should be limited to the lowest possible amount. There are numerous noise coupling paths of the SSN to noise-sensitive circuits, including clock generation circuits, such as phase-locked loops (PLLs) or delay-locked loops (DLLs). Since they have either a voltage-controlled oscillator (VCO) or a voltage-controlled delay line (VCDL), both of which are highly sensitive to the power supply noise, the clock signal already contains a small amount of jitter even immediately after being generated. However, the PLL or DLL circuits can be localized on an area of a chip, and the power supply to the PLL or DLL can be separated from other power supplies of noisy digital logic blocks. If these steps are taken, then the noise coupling through a resistive substrate or a shared power supply network can be effectively reduced [4]. In addition, a triple-well process can provide more isolation from a resistive substrate coupling noise [4], [5]. Using these isolation techniques, the jitter of the clock generation circuits can be suppressed to below tens of picoseconds, regardless of the SSN. Another noise coupling path of the SSN is the clock distribution network.
In conventional on-chip clock distribution networks, the clock signals are distributed throughout the entire chip surface area by routing a net of global on-chip interconnections with sufficient cascaded repeaters. The cascaded repeaters are employed to overcome on-chip interconnection loss and delay. However, the clock repeater then becomes a major contributor to clock jitter and delay because of the power supply fluctuations of the repeaters caused by the SSN. Therefore, even though the clock generation circuit provides an extremely low-jitter clock signal, a significant amount of the jitter is present while the clock signal travels over the distribution network. Moreover, the scaling of the CMOS process technology causes more loss on a global on-chip interconnection such as a clock distribution network [6] , [7] . While the advanced CMOS process technology can provide higher speed transistors for clock generation circuits, it would require more repeaters in the clock distribution network to overcome the increased loss. An increased number of repeaters would result in increased noise sources of clock jitter. Therefore, a clock distribution network can be even more disadvantageous than a clock generation circuit as the CMOS process technology becomes nanoscale.
Numerous attempts have been made to solve jitter problems, including the adoption of copper and low-dielectric materials, the optimization of the repeaters, the utilization of small swing signaling, and the implementation of a standing-wave oscillator on a chip [6]-[11]. These schemes have been designed to overcome on-chip interconnection loss and delay to enable high-speed clock distribution with low jitter and delay. However, the aforementioned approaches have still been affected by the on-chip power supply noise because each of these approaches is implemented inside the chip where the clock distribution network remains vulnerable to the SSN.
In this paper, we propose a new chip-package hybrid clock scheme to achieve a digital noise-free clock distribution with both extremely low jitter and delay at the I/O clock receivers of a chip. In the proposed clock scheme, the interconnections of the clock distribution network and the DLL replica loop are placed on a ball grid array (BGA) package substrate in a form of a lossless high-frequency waveguide. By using the lossless microstrip-line structure on the BGA-type package substrate, nearly digital noise-free clock signals can be achieved with minimal jitter and delay from the DLL circuits to the clock receivers at the I/O circuits. There have been previous attempts to bypass the lossy on-chip global interconnections by using lossless interconnections on the package substrate [12] , [13] . In those attempts, however, the concept of a package layer clock distribution was considered from a clock delay perspective under an ideal power supply environment. Our scheme advances the concept by using a realistic noisy environment. Therefore, the package layer clock distribution can be considered not only from a clock delay perspective, but also from a clock jitter perspective. Moreover, there was no real chip implemented utilizing the package layer clock distribution. In this paper, we demonstrate an actual operating chip with the chip-package hybrid clock distribution, which even includes a DLL replica loop on the package layer. In doing so, our study provides both simulation results and measurement results for the chip-package hybrid clock scheme.
The clock signal from the DLL circuits exits the silicon chip, runs through a microstrip line net on the BGA package substrate, and then returns to the same silicon chip. The clock distribution network is located on the package substrate, rather than on the chip, and does not require clock repeaters on the BGA substrate. In contrast with conventional schemes, most repeaters can be eliminated using the lossless package layer interconnection by which the global on-chip wire loss and delay can be avoided. Meanwhile, the DLL replica loop is designed to be placed on the top layer of the package substrate. The DLL replica loop should be an exact copy of the clock distribution network so that it can replicate the delay variation of the clock distribution network. However, even with identical repeaters and on-chip interconnection length, it is not easy to make an exact copy of a noise environment such as SSN in the clock distribution network. Since the DLL replica loop in the proposed hybrid clock scheme is implemented on the BGA package substrate together with a clock distribution network, the clock signal is free of the on-chip power supply noise. Therefore, the proposed hybrid clock scheme can minimize the discrepancy between a clock distribution network and the replica loop even in a severe on-chip power supply noise environment.
The simulation models included on-chip interconnection lines, microstrip lines on the BGA package substrate, and bonding wires for the electrical link of the chip to the package substrate. In addition to the interconnection line models, on-chip, package, and PCB level power supply network models were incorporated into the simulation to estimate the amount of SSN inside the chip. The package and PCB power supply network models had an S-parameter data format, while the on-chip power supply network model was a distributed RC net. After explaining the interconnection structure of the proposed hybrid clock scheme in Section II, the interconnection line models and the power-supply network models are described in Section III. In Section IV, the hybrid I/O clock transmission test chip is tested against a conventional I/O clock chip to verify the improved jitter and delay performance of the proposed hybrid clock scheme. Concluding remarks are provided in Section V.
II. PROPOSED HYBRID I/O CLOCK DISTRIBUTION SCHEME
The three-dimensional graphic drawings in Fig. 1 illustrate the difference between a conventional I/O clock scheme and the proposed hybrid I/O clock scheme. In this study, the conventional I/O clock used is generated on a silicon chip via a clock generation circuit DLL, and is distributed on the same silicon chip by lossy on-chip wires to each I/O buffer, which is located at the periphery of the chip. This on-chip clock distribution network requires sufficient cascaded repeaters to compensate for the heavy wire loss. Moreover, the feedback loop of the DLL requires the same number of repeaters to duplicate the identical clock distribution network. In contrast, as shown in Fig. 1(b) , the proposed hybrid clock scheme eliminates the repeaters, which are indispensable to the conventional clock scheme. In the hybrid clock scheme, the clock signal is also generated from a silicon chip by a clock generation circuit DLL, and I/O buffer circuits are still present on the same silicon chip. However, the electrical interconnections between the clock generation circuit and the I/O buffer circuits are implemented onto a BGA package substrate to form a lossless microstrip line instead of the silicon chip. Subsequently, the repeaters, which are the main noise sources of the clock jitter, are removed both from the clock distribution network and the DLL feedback loop. Hence, the SSN generated on the silicon chip is isolated both from the clock distribution network and the DLL feedback loop. The resulting clock signal is expected to have extremely low jitter without being degraded by the SSN. Moreover, the clock delay of the hybrid clock distribution scheme is much shorter than that of an on-chip clock distribution scheme. 
If a clock signal has a rising time shorter than 2.5 times the flight time of the microstrip line, then the clock delay of the hybrid network is fundamentally determined both by the length of the microstrip line net on the BGA substrate and by the phase velocity of the electromagnetic wave at the microstrip line. However, if a clock signal has a long rising time and does not satisfy the condition above, then the clock delay is mainly determined by driver impedance and total capacitance of the hybrid clock network, because the microstrip line can be considered as a lumped capacitance. In either case, the delay of the hybrid clock network is independent of the on-chip wire process technology. On the other hand, on-chip delay is mainly determined by the resistance and capacitance of on-chip wires. The delay of a conventional clock network increases as CMOS process technology is scaled down to nanodimensions because the more advanced technology entails lossier on-chip wires. Therefore, the proposed hybrid clock network will offer both a lower amount of jitter and a shorter propagation delay time than those of a conventional clock network. Fig. 2 (a) shows a physical diagram of the proposed chip-package hybrid I/O clock scheme using chip-to-package bonding wire interconnections. As shown in this diagram, parasitic capacitance and inductance are present at the chip-package interface due to bond pads and ESD circuits for the wire bonding and the ESD protection. These extra pads and ESD circuits increase loading capacitance on the hybrid clock network. The total capacitance of the hybrid clock network should be kept as low as possible to maintain the maximum possible clock signal amplitude. In addition to the parasitic capacitance, the effects of the bonding wire inductance should be carefully considered. 
The inductance effects on the clock jitter become negligible when the driver impedance and the total capacitance of the hybrid clock network satisfy the following relation described in (1):

$$\zeta = \frac{R}{2}\sqrt{\frac{C}{L}} > 1 \tag{1}$$

In (1), $\zeta$ is the damping factor of the hybrid clock network and $R$ is the output impedance of the hybrid clock driver. The terms $C$ and $L$ refer to the total capacitance and the total inductance, respectively, of the hybrid clock network, including the extra pads, the ESD circuits, and the bonding wires, as well as the interconnections for the clock distribution on the BGA substrate. Equation (1) is the condition for the overdamped mode of a second-order RLC network [14]. When the hybrid clock network meets this condition, the inductance of the bonding wires has no effect on the slew rate of the hybrid clock driver. Accordingly, the impact of the parasitic inductance on the clock jitter becomes negligible.
The hybrid clock network in this paper has a driver output impedance $R$ of 67 Ω, a total capacitance $C$ of 7.9 pF, and a total inductance $L$ of 6.16 nH. The total capacitance consists of a junction capacitance of a driver (72 fF), a gate capacitance of eight receivers (576 fF), a 12-mm microstrip line capacitance (1008 fF), a capacitance of nine extra pads (2700 fF), and a capacitance of eight ESD circuits (3200 fF). The total inductance consists of an inductance of bonding wire (1.4 nH) and a 12-mm microstrip line inductance (4.56 nH). Therefore, the damping factor $\zeta$ is 1.2, and the condition of (1) is satisfied. Under this condition, the inductance variation no longer affects the slew rate of the hybrid clock driver, as mentioned above, which indicates that the clock jitter is strongly dependent on driver impedance and total capacitance rather than bonding wire inductance. For the same reason, the slight differences between bonding wire inductance values at each receiver node do not affect the clock skew under this overdamping condition. On the other hand, replacing the bonding wire interconnections with flip-chip interconnections makes it easier to fulfill the condition of (1), because a flip-chip bond has a lower parasitic inductance than a bonding wire. With flip-chip interconnection technology, lower driver impedance and lower total capacitance than those of bonding wire interconnection technology can be adopted, thus guaranteeing a larger signal amplitude at a given receiver node than that provided by bonding wire technology. Therefore, flip-chip technology can contribute to enhancing the operating frequency range of the hybrid clock network. Fig. 2(b) shows a schematic diagram of the proposed chip-package hybrid I/O clock scheme, which comprises a DLL circuit, I/O buffers, and a digital noise generator on a silicon chip. The DLL has capacitor-loaded inverters as a delay cell, a charge pump loop filter, and a phase frequency detector.
The hybrid clock driver, which is a part of the DLL circuits, drives 16 I/O buffers and a DLL feedback loop through the microstrip line structure on the package substrate. However, a 5-mm line of a total 17-mm interconnection is still routed on the silicon chip with two cascaded inverters to recover the magnitude of the clock signal before it reaches each I/O buffer circuit. The DLL feedback loop also has two cascaded inverters before reaching the phase detector. The required number of repeaters in conventional clock schemes is increasing as clock frequencies are increasing and as silicon process technology is advancing into the nanoscale. In this paper, the conventional I/O clock network has ten repeaters to drive 16 I/O buffers, which are 17 mm apart from the DLL output driver.
The intersymbol interference (ISI) caused by parasitic capacitance and inductance at the chip-package interface can degrade the clock signal waveform in the hybrid clock channel. However, since the clock signal has a periodic pattern, the ISI simply degrades the voltage margin of the clock signal instead of generating timing jitter. Fig. 3 shows the measurement results of the ISI effect for different signal patterns. While the timing jitter of a periodic signal is free from the ISI, the timing jitter of random data suffers from the effects of ISI, resulting in deterministic jitter, as shown in the jitter histogram. The magnitude of this graph is attenuated by 10 dB because of the on-chip probe characteristic. The signal attenuation caused by the ISI is recovered at the receiver circuit, which is composed of cascaded inverters in this design. Since the slew rate of the receivers is much larger than that of the driver circuit, the timing jitter caused by the cascaded inverters is not a significant problem for the hybrid clock network. Once the condition of (1) is satisfied, keeping the total capacitance as low as possible is the principal means to control timing jitter of the hybrid clock network.
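The overdamping check of (1) can be reproduced numerically. The following is a minimal sketch, assuming that (1) is the usual series-RLC damping factor, zeta = (R/2)·sqrt(C/L), which is consistent with the quoted result of 1.2. The function and variable names are illustrative and do not come from the paper.

```python
import math

def damping_factor(r_ohm, c_farad, l_henry):
    """Damping factor of a series RLC network: zeta = (R/2) * sqrt(C/L)."""
    return (r_ohm / 2.0) * math.sqrt(c_farad / l_henry)

# Totals quoted in the text for the hybrid clock network.
R_DRIVER = 67.0      # ohm, driver output impedance
C_TOTAL = 7.9e-12    # F, quoted total capacitance
L_TOTAL = 6.16e-9    # H, quoted total inductance

# Itemized contributions listed in the text (fF and nH).
cap_components_fF = {
    "driver junction": 72,
    "eight receiver gates": 576,
    "12-mm microstrip": 1008,
    "nine extra pads": 2700,
    "eight ESD circuits": 3200,
}
ind_components_nH = {
    "bonding wire": 1.4,
    "12-mm microstrip": 4.56,
}

zeta = damping_factor(R_DRIVER, C_TOTAL, L_TOTAL)
itemized_c_pF = sum(cap_components_fF.values()) / 1000.0
itemized_l_nH = sum(ind_components_nH.values())

print(f"zeta from quoted totals: {zeta:.2f}")
print(f"itemized capacitance:    {itemized_c_pF:.3f} pF")
print(f"itemized inductance:     {itemized_l_nH:.2f} nH")
print("overdamped" if zeta > 1.0 else "not overdamped")
```

The itemized contributions add up to slightly less than the quoted totals (7.556 pF versus 7.9 pF, and 5.96 nH versus 6.16 nH). The text does not state where the remainder comes from, so the sketch uses the quoted totals for the damping factor.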
III. MODELING AND SIMULATION
To verify the performance of the proposed chip-package hybrid clock network, we have incorporated models of the chip, package, and PCB interconnections into the simulations simultaneously. There are two types of models: the interconnection model and the power supply network model.
The interconnection models used in this simulation are shown in Fig. 4. Fig. 4(a) shows an on-chip wire model extracted from a 0.35-μm four-metal CMOS process. For a metal-4 layer of 1-μm linewidth, the on-chip wire can be modeled using a distributed RC circuit model with a 2.5-Ω resistance and a 13-fF capacitance per unit of 100-μm line length. A conventional clock distribution of 17-mm line length is simulated using the on-chip wire model with repeaters at every length period of 1700 μm. In contrast, a microstrip line on the BGA package substrate has a 120-μm linewidth, a 12-μm metal thickness, and a dielectric constant of 4; this microstrip line is modeled using a distributed RLC circuit model with 380-pH inductance and 84-fF capacitance per unit length of 1 mm. In this model, a 5-Ω resistance is included in series with the capacitance to represent the skin-effect loss and dielectric loss efficiently. This simple distributed RLC model is adequate to simulate the hybrid clock distribution network. Fig. 4(b) shows a comparison of the characteristics of the distributed RLC model with the characteristics of a 3-D full-wave simulation for up to 5 GHz. The characteristic impedance of the microstrip line is set to 67 Ω. The reason for setting the characteristic impedance above 50 Ω is to reduce the I/O driver strength, which should also be matched with the impedance of the microstrip line. By reducing the driver strength, the total current consumption of the hybrid clock network can be decreased. In our design, the 67-Ω driver impedance enables the hybrid clock network chip to operate with a power consumption less than or equal to that of the conventional clock network chip under comparison. Fig. 4(c) illustrates the bonding wire model acquired from an S-parameter extraction using a 3-D full-wave simulation by fitting the extracted S-parameters with the distributed RLC circuit model up to 5 GHz. In addition to the bonding wire model, bonding pads and ESD circuits are included in the chip and package interface model with a total capacitance of 1 pF. Using these interconnection models and chip-package interface models, we simulated the hybrid clock distribution for a 17-mm line length.
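The per-unit-length values of the microstrip model determine its characteristic impedance and flight time. The sketch below, which treats the line as lossless (the loss resistance in the model is ignored here), derives both from the quoted 380 pH/mm and 84 fF/mm, and applies the rise-time criterion from Section II (rise time shorter than 2.5 times the flight time) to decide whether the line behaves as a transmission line or as a lumped capacitance. The function name `is_distributed` and the example rise times are illustrative assumptions.

```python
import math

# Per-unit-length microstrip parameters quoted in the text.
L_PER_MM = 380e-12   # H/mm
C_PER_MM = 84e-15    # F/mm
LENGTH_MM = 12.0     # package-layer clock line length

# Lossless transmission-line relations.
z0 = math.sqrt(L_PER_MM / C_PER_MM)            # characteristic impedance, ohm
delay_per_mm = math.sqrt(L_PER_MM * C_PER_MM)  # propagation delay, s/mm
flight_time = delay_per_mm * LENGTH_MM         # one-way flight time, s
phase_velocity = 1e-3 / delay_per_mm           # m/s

# Totals for the 12-mm line, matching the values itemized in Section II.
total_l = L_PER_MM * LENGTH_MM   # 4.56 nH
total_c = C_PER_MM * LENGTH_MM   # 1008 fF

def is_distributed(rise_time_s, flight_time_s, factor=2.5):
    """True if the edge is fast enough that the line acts as a transmission line."""
    return rise_time_s < factor * flight_time_s

print(f"Z0             = {z0:.1f} ohm")
print(f"flight time    = {flight_time * 1e12:.1f} ps over {LENGTH_MM:.0f} mm")
print(f"phase velocity = {phase_velocity:.3e} m/s")
for t_rise in (100e-12, 500e-12):  # hypothetical rise times for illustration
    regime = "distributed" if is_distributed(t_rise, flight_time) else "lumped"
    print(f"rise time {t_rise * 1e12:.0f} ps -> {regime}")
```

The computed impedance of about 67 Ω agrees with the value to which the line is set, and the 12-mm totals agree with the 4.56 nH and 1008 fF itemized for the damping-factor budget.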
The power supply network is designed to have less than 250 mV of SSN with a 2.5-V operating voltage. Since the digital switching noise generator was designed to have a peak current of 250 mA, the impedance of the power supply network was kept below 0 dB for the entire frequency range of interest. Fig. 5(a) depicts an on-chip power supply network model. Metal-3 and metal-4 layers of 30-μm linewidth are used for the on-chip power supply network. The power/ground lines are placed every 400 μm and form a grid structure. On-chip decoupling capacitors with a total of 2-nF capacitance are inserted in the power supply network model. The on-chip decoupling capacitors of 2-nF capacitance limit the impedance of the on-chip power supply network to below 0 dB when operating above 80 MHz. On the other hand, at frequency ranges below 80 MHz, the decoupling schemes and power supply network at the package and the PCB dominantly determine the on-chip SSN. Fig. 5(b) and (c) show the power/ground plane cavity models of the multilayer package and the PCB. If the package and PCB power plane cavities are modeled using the distributed RLC circuit models, too many nodes are generated and the computation time is significantly increased in the transient simulations. To conduct an efficient chip, package, and PCB co-simulation, the plane cavity models of the package and the PCB power supply network are transformed to an S-parameter data format. Fig. 5(b) and (c) present how the package and PCB power plane cavities are modeled using a transmission-line matrix (TLM) model to extract the required S-parameters [15]. Ten decoupling capacitors with a total capacitance of 220 nF are inserted in the PCB power plane model to achieve a power supply network impedance of below 0 dB for a frequency range below 80 MHz. Fig. 6(a) shows the power supply network impedance at the PCB acquired from both the simulation and measurement.
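The impedance target and the 80-MHz crossover quoted above follow from simple arithmetic. The sketch below assumes that "0 dB" means 0 dB relative to 1 Ω (an interpretation, though one consistent with 250 mV divided by 250 mA) and that the on-chip decoupling capacitance behaves as an ideal capacitor. All names are illustrative.

```python
import math

V_NOISE_MAX = 0.250   # V, SSN budget quoted in the text
I_PEAK = 0.250        # A, peak current of the noise generator
C_ONCHIP = 2e-9       # F, total on-chip decoupling capacitance

# Target impedance: the largest impedance that keeps I_PEAK * Z within budget.
z_target = V_NOISE_MAX / I_PEAK              # ohm
z_target_db = 20.0 * math.log10(z_target)    # dB relative to 1 ohm (assumed)

def capacitor_impedance(freq_hz, c_farad):
    """Magnitude of an ideal capacitor's impedance, 1 / (2*pi*f*C)."""
    return 1.0 / (2.0 * math.pi * freq_hz * c_farad)

# Frequency above which the on-chip capacitance alone meets the target.
f_corner = 1.0 / (2.0 * math.pi * C_ONCHIP * z_target)

print(f"target impedance = {z_target:.2f} ohm ({z_target_db:.1f} dB)")
print(f"corner frequency = {f_corner / 1e6:.1f} MHz")
print(f"|Z| at 80 MHz    = {capacitor_impedance(80e6, C_ONCHIP):.3f} ohm")
```

The corner frequency comes out just under 80 MHz, which matches the statement that the 2-nF on-chip capacitance holds the impedance below the target above 80 MHz, while the package and PCB decoupling must cover the range below it.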
A two-port self-impedance measurement method was used to obtain an accurate and reliable measurement of the milliohm-scale power supply network impedance of the PCB [16], [17]. First, the impedance follows the impedance curve of a discrete-type decoupling capacitor of 2.2 μF on the PCB. Then, at 2.7 MHz, a series resonant peak is observed, which is produced by the 2.2-μF capacitance and the 1.6-nH equivalent series inductance (ESL) of the on-PCB decoupling capacitor. Above 2.7 MHz, the ESL of the decoupling capacitor becomes dominant in determining the impedance slope of the curve. Furthermore, the ESL curve meets a capacitance curve of the PCB power plane and then a parallel resonant peak occurs at around 700 MHz. At around 900 MHz, another series resonant peak occurs, where the resonance is produced by the power plane capacitance and the power inductance of the PCB plane cavity. The next resonant frequencies in the impedance curve in Fig. 6(a) follow a cavity resonant equation expressed as

$$f_{mn} = \frac{1}{2\pi\sqrt{\mu\varepsilon}}\sqrt{\left(\frac{m\pi}{a}\right)^{2} + \left(\frac{n\pi}{b}\right)^{2}} \tag{2}$$

In (2), $a$ and $b$ are the sizes of the PCB power plane in the $x$ and $y$ directions, respectively, while $m$ and $n$ are the mode numbers of the standing waves in the $x$ and $y$ directions, respectively [18]. Here, $\mu$ and $\varepsilon$ denote the permeability and permittivity of the dielectric between the planes. A resonant peak of a cavity mode appears at 1.18 GHz.
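The resonances described above can be estimated with two short formulas. The sketch below first reproduces the series resonance of the 2.2-μF capacitor with its 1.6-nH ESL, and then evaluates the standard rectangular-cavity resonance formula, which is assumed here to be the form of (2) for a nonmagnetic dielectric. The plane dimensions and dielectric constant in the cavity example are hypothetical placeholders, since the text does not give the PCB plane size.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s

def series_resonance_hz(c_farad, l_henry):
    """Series LC resonance: f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))

def cavity_resonance_hz(m, n, a_m, b_m, eps_r):
    """Resonance of mode (m, n) in an a-by-b parallel-plane cavity.

    Assumes a nonmagnetic dielectric, so that
    f = c / (2*sqrt(eps_r)) * sqrt((m/a)^2 + (n/b)^2).
    """
    return C0 / (2.0 * math.sqrt(eps_r)) * math.hypot(m / a_m, n / b_m)

# Decoupling capacitor values quoted in the text.
f_series = series_resonance_hz(2.2e-6, 1.6e-9)
print(f"series resonance: {f_series / 1e6:.2f} MHz")

# Hypothetical 100 mm x 100 mm plane with eps_r = 4 (NOT from the paper).
A_M, B_M, EPS_R = 0.100, 0.100, 4.0
for mode in ((1, 0), (1, 1), (2, 0)):
    f_mode = cavity_resonance_hz(*mode, A_M, B_M, EPS_R)
    print(f"mode {mode}: {f_mode / 1e6:.1f} MHz")
```

The series resonance evaluates to about 2.68 MHz, in line with the 2.7 MHz reported for Fig. 6(a). The cavity frequencies printed for the placeholder plane are only an illustration of how the mode numbers enter the formula and should not be compared with the measured 1.18-GHz peak.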
However, these resonant peaks generated by the cavity mode of Fig. 6(a) are screened out by the impedance-lowering effect of the on-chip decoupling capacitors, as noted in Fig. 6(b). Fig. 6(b) shows the simulated power supply network impedance at the on-chip power line. A 2-nF on-chip decoupling capacitor eliminates the resonant peaks over 80 MHz, as shown in Fig. 6(b). It is also found that the parallel resonant frequency at 700 MHz in Fig. 6(a) is shifted to 100 MHz in Fig. 6(b). In Fig. 6(b), the ESL line of the discrete-type decoupling capacitor at the PCB meets the capacitance line of the on-chip decoupling capacitor, rather than the capacitance line of the PCB power plane. Therefore, the impedance curve of the power supply network in the hybrid clock simulation model stays below 0 dB in the entire frequency range, except around the parallel resonant frequency. In our study, since a 1-GHz clock signal is generated and distributed through the clock distribution network, the resonant peak does not affect the signal integrity of the clock signal. As a result, the impedance of the power distribution network is well controlled, and it is verified by simulation that the SSN inside the chip is confined to within 240 mV, as shown in Fig. 6(c).
Using the interconnection and power supply network models, we compared the clock jitter and the delay of the proposed hybrid clock network with those of a conventional clock network. Two different types of clock jitter, called "short-term jitter" and "long-term jitter," are compared in this paper. Fig. 7 illustrates the definition of each jitter. Short-term jitter comes from the SSN generated by switching logic blocks. Since the SSN causes variation in the time from one rising/falling edge to the next for the clock signal, it generates a cyclical jitter, which we call short-term jitter in this paper. On the other hand, long-term jitter usually arises when a chip changes its operating mode. For example, when a DRAM chip changes its operating mode from a normal read status with just a single toggling output into a dual bank burst read status with all toggling outputs, there is an average voltage drop, as shown in Fig. 7. The clock signal then has a positive phase offset because the voltage drop causes an additional delay on the clock distribution network. When the chip changes its operating mode in the opposite direction, the clock signal has a negative phase offset caused by a reduced delay on the clock distribution network. Of course, the DLL will correct the delay offset via its locking process. However, even though the long-term jitter does not occur every cycle, it should also be considered in the timing budget of high-speed I/O systems. Fig. 8 presents the simulation results of short-term clock jitter. Under a 240-mV SSN environment, a 25-ps clock jitter is expected in the hybrid clock network, while a 95-ps clock jitter is predicted in the conventional clock network. Fig. 9 shows the simulation results of long-term clock jitter, which is caused by an abrupt average voltage drop or a recovery in the power supply.
The hybrid clock network is found to be robust against this kind of jitter, and it exhibits a 90-ps clock jitter when a 300-mV voltage drop occurs. In contrast, a conventional clock network has a 280-ps clock jitter with an equal voltage drop. Finally, Fig. 10 shows the simulation results of the propagation delay. The conventional I/O clock network has a 1.645-ns delay to drive 16 loads with 17-mm line length from the DLL output driver with a 2.5-V power supply. Meanwhile, the hybrid clock network shows a 620-ps delay to drive these same loads.
These simulation results were obtained based on the interconnection and power supply network models. The simulation results were verified through a series of measurements, which are shown in Section IV.
IV. MEASUREMENT AND DISCUSSION
To validate the effectiveness of the hybrid clock network as compared to a conventional clock network with repeaters, a hybrid clock network chip and package were fabricated and tested against a conventional clock network chip and package using a 0.35-μm four-metal CMOS process and a plastic BGA package process. The two chips were compared under the same conditions of switching noise. The test chip for the hybrid clock transmission is presented in Fig. 11, along with the fabricated BGA package; the chip has a total area of 4 mm × 4 mm. In the conventional clock network chip, a clock signal runs through a distribution line that is 17 mm in length with 12 repeaters, and it drives 16 I/O buffers. In contrast, in the hybrid clock network chip, the clock distribution line of 12 mm is routed through the BGA package layer. In addition, for the DLL replica loop, the line of 12-mm length is routed through the off-chip BGA package layer. Accordingly, a total of ten repeaters are eliminated in the hybrid clock network chip. The BGA package has a four-layer stack-up and the top layer is used for the clock signal and the DLL replica routings. Fig. 12 shows the measured results of short-term clock jitter. For a 500-MHz clock signal distribution, the proposed hybrid clock network shows a 78-ps peak-to-peak jitter and a 14-ps rms jitter. The jitters were measured under a 240-mV power supply noise with a 2.5-V power supply and an accumulation of 1270 clock cycles. The conventional clock network exhibits a 172-ps peak-to-peak jitter and a 32-ps rms jitter under the same conditions. The reduction of the jitters is enabled by the removal of the ten repeaters from the on-chip clock distribution interconnection.
It was also confirmed that the proposed hybrid clock network offers a noticeable improvement in long-term clock jitter. The long-term clock jitter was measured under a 300-mV average voltage drop environment. For the measurement, we provided two different external power supplies that had a 300-mV gap, and we measured the phase shift at each power supply separately. Fig. 13 shows the measured long-term clock jitter. It was found that the hybrid clock network generates an 80-ps long-term clock jitter, while the conventional clock network generates a 380-ps long-term clock jitter. Since the hybrid clock distribution network consists of a lossless interconnection, instead of a lossy on-chip wire and a number of repeaters, the clock delay becomes insensitive to a power supply voltage drop, resulting in a significant reduction of the long-term clock jitter. Fig. 14 shows the measured clock propagation delay. In the hybrid clock network, the clock signal runs through just 5 mm of on-chip wire and 12 mm of line on the lossless package layer. In contrast, in the conventional clock network, the clock signal runs through a 17-mm on-chip line with ten repeaters. As a result, the hybrid clock network has a 1.47-ns delay in driving 16 loads with a 2.5-V power supply, whereas the conventional clock network has a 2.85-ns delay to drive an equal number of loads.
The differences between the simulation results and measurement results are caused by the measurement environment. First, a clock signal generator and additional traces to make probing pads on the board, which were not considered in the simulation, increase the short-term clock jitter in the measurement. For the long-term clock jitter, the measurement result of the on-chip clock distribution shows more delay variation than that of the simulation. This is mainly caused by a process variation, which creates a small discrepancy between the transistor model parameters and the actual chip that we measured. Each repeater, which has 10 ps more of delay variation, causes an approximately 100-ps difference for the clock distribution network, which comprised ten repeaters. For the clock propagation delay, the simulation result was obtained by investigating the DLL output to the receiver input, while the measurement result was obtained by probing the board from a clock signal generator to the probe pad of output signals, and this probing causes an additional delay path to the measurement result. However, both the simulation and measurement data consistently demonstrated that the hybrid clock distribution scheme reduces short-term clock jitter, long-term clock jitter, and clock propagation delay, as compared to a conventional on-chip clock distribution scheme.
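The simulated and measured figures reported in Sections III and IV can be placed side by side to see how the improvement ratios compare. The sketch below only tabulates numbers quoted in the text and repeats the per-repeater argument given above (ten repeaters, each with about 10 ps of additional delay variation). The dictionary layout and names are illustrative.

```python
# (hybrid, conventional) pairs in picoseconds, taken from the text.
SIMULATED = {
    "short_term_jitter": (25, 95),
    "long_term_jitter": (90, 280),
    "delay": (620, 1645),
}
MEASURED = {
    "short_term_jitter": (78, 172),
    "long_term_jitter": (80, 380),
    "delay": (1470, 2850),
}

def improvement_ratio(hybrid_ps, conventional_ps):
    """How many times larger the conventional figure is than the hybrid one."""
    return conventional_ps / hybrid_ps

sim_ratios = {k: improvement_ratio(*v) for k, v in SIMULATED.items()}
meas_ratios = {k: improvement_ratio(*v) for k, v in MEASURED.items()}

# Per-repeater explanation of the long-term jitter gap in the conventional chip.
REPEATERS = 10
EXTRA_VARIATION_PER_REPEATER_PS = 10
predicted_conventional_long_term = (
    SIMULATED["long_term_jitter"][1] + REPEATERS * EXTRA_VARIATION_PER_REPEATER_PS
)

for name in MEASURED:
    print(f"{name:18s} sim x{sim_ratios[name]:.2f}  meas x{meas_ratios[name]:.2f}")
print(f"simulated + repeater spread = {predicted_conventional_long_term} ps")
```

Adding the estimated 100 ps of repeater spread to the simulated 280 ps gives 380 ps, the measured long-term jitter of the conventional network. In both simulation and measurement the hybrid network is better on all three metrics, although the ratios differ because of the measurement-path effects described above.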
The chip-package hybrid design scheme still requires on-chip interconnections for wafer-level tests. However, the wafer-level tests are not used to verify high-speed operations; they are used to verify the functionality of a chip. Therefore, through the parallel routing of on-chip and on-package layer interconnections, the chip-package hybrid chip can be tested at the wafer level. The parallel routing has little effect on the clock jitter because the main propagation path of the clock signal is still the lossless package layer. For the clock delay, the parallel routing causes an additional delay of approximately 100 ps because of the increased capacitance of the on-chip routing; however, the proposed scheme is still much faster than a conventional design scheme [19]. Besides timing jitter and delay, power consumption is also an important design issue for clock distribution networks. As mentioned in Section III, the hybrid clock driver was designed not only to correspond to the microstrip-line impedance, but also to keep the total current consumption of the hybrid clock network from exceeding the current consumption of a conventional clock distribution network. As shown in Table I, the hybrid clock network consumes 113 mA at 500 MHz with a 2.5-V power supply, while the conventional clock network consumes 110 mA under the same operation conditions. This shows that we can obtain the benefits of the hybrid clock network, including low jitter and low delay characteristics, without excessive power consumption.
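The current figures quoted from Table I translate into power as follows. This is a minimal arithmetic sketch that assumes the quoted currents are average supply currents drawn from the 2.5-V rail; the variable names are illustrative.

```python
SUPPLY_V = 2.5            # V, supply voltage quoted in the text
I_HYBRID_A = 0.113        # A, hybrid clock network at 500 MHz
I_CONVENTIONAL_A = 0.110  # A, conventional clock network at 500 MHz

# Power drawn from the supply, assuming the currents are averages.
p_hybrid_mw = SUPPLY_V * I_HYBRID_A * 1e3
p_conventional_mw = SUPPLY_V * I_CONVENTIONAL_A * 1e3

# Relative overhead of the hybrid network compared with the conventional one.
overhead_pct = (I_HYBRID_A - I_CONVENTIONAL_A) / I_CONVENTIONAL_A * 100.0

print(f"hybrid:       {p_hybrid_mw:.1f} mW")
print(f"conventional: {p_conventional_mw:.1f} mW")
print(f"overhead:     {overhead_pct:.1f} %")
```

Under this assumption the hybrid network draws roughly 3% more current than the conventional one, which is the sense in which its power consumption is described as not excessive.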
V. CONCLUSION
In this paper, we have proposed a new chip-package hybrid clock distribution network and DLL with which to achieve an efficient I/O clock distribution with minimal jitter and minimal delay. The performance improvements of the proposed scheme were obtained by combining the on-chip interconnection lines and the microstrip interconnection lines on a BGA package layer, thereby eliminating repeaters from the clock distribution network. The on-chip repeaters are a major noise coupling path from the digital switching circuits to the clock distribution network.
To verify the advantages of the proposed scheme, we have conducted simulation and measurement. For the simulation of the hybrid clock network, the interconnection and power supply models were extracted at the chip, package, and PCB levels and were incorporated into the simulation as a distributed RLC circuit form and an S-parameter data format. The measurement data demonstrated that the proposed clock distribution network has a 78-ps peak-to-peak jitter and a 1.47-ns delay at a 500-MHz clock distribution. In contrast, the compared conventional on-chip clock distribution chip has a 172-ps peak-to-peak jitter and a 2.85-ns delay under the same test conditions. The reduction of the jitter and the delay is achieved by reducing the number of the on-chip repeaters, which are indispensable to a conventional on-chip clock distribution.
