Abstract-This paper presents an injection-locked clock (ILC) distribution system with a new active deskew mechanism based on the built-in phase tuning of injection-locked oscillators (ILOs). The proposed technique removes the deskew delay lines required in conventional active deskew schemes, along with their associated power dissipation, clock latency and jitter accumulation. A test chip was fabricated in a standard 0.18 μm digital CMOS process to demonstrate this new technique. Working at a 3.5GHz clock frequency, the ILOs in the ILC achieved a 40ps deskew range with a step size of 1.25ps. The deskew loop successfully reduced the skew from a preset value of 16ps to 2ps. The cycle-to-cycle jitter degradation from clock input to clock output is measured to be only 0.04ps.
I. INTRODUCTION
Injection-locked clocking (ILC), which uses injection-locked oscillators (ILOs) as local clock generators, has been proposed for multi-GHz clock distribution [1]. Compared to conventional clocking based on buffered trees and grids, it can achieve lower jitter while consuming less power [2]. Compared to other resonance-based clocking schemes proposed recently [3], [4], ILC's jitter performance is not limited by the quality factor Q of the on-chip resonator, which explains why injection locking has recently been adopted for resonant clocking [5]-[7].
Skew, however, remains a major challenge in these new clocking schemes, including ILC. Balanced H-tree and grid structures have been used in conventional clock distribution to reduce skew [8]. Because of load mismatch and process, voltage and temperature (PVT) variations, it is becoming increasingly difficult to control skew effectively with this method in a multi-GHz system. More seriously, because the load capacitance and the temperature profile of the microprocessor vary as it runs different tasks, the induced skew changes with time. Passive skew reduction techniques cannot predict or control such dynamic skew variations, so active deskew is needed.
Conventional active deskew methods [9], [10] compensate the skew by adding tunable delays to different clock paths. They are designed to reduce the clock skew after chip fabrication, and are capable of tracking skew variations dynamically. The tunable delay is typically implemented with active delay lines that are loaded with switched-capacitor arrays [9] or built with current-starved buffers [10]. These approaches have proved effective in conventional clocking and have been applied to resonant clocking [7]. However, adding active delay lines has several disadvantages. First, it consumes extra power; second, it increases the clock latency substantially because of the delay tuning requirement; most importantly, the extra active delay line tends to degrade the clock jitter significantly. This is because power supply noise coupled through clock buffers is the main contributor to jitter accumulation in conventional clock distribution [11], and adding active delay lines for deskew further increases the length of the buffer chain in the clock signal path.
In a multi-GHz clock distribution system, jitter degradation due to deskew circuitry must be minimized, considering the ever-decreasing timing margin as clock frequency increases [13]. Therefore, we propose a new active deskew scheme that uses the phase tuning capability of ILOs to compensate the skew [1], [15]. This deskew scheme does not require additional deskew delay lines, and hence incurs no penalty in clock latency or jitter while consuming minimal extra power. In this paper, we demonstrate such an active deskew method in an ILC system for the first time.
II. ILO-BASED ACTIVE DESKEW
To avoid the jitter degradation of the conventional active deskew scheme, the active deskew scheme for ILC is based on the ILO's phase tuning capability, as shown in Fig. 1. The ILO at the input of each local clock domain works as both a local clock regenerator and a deskew buffer. Clock skews between different clock domains, or between each clock domain and a global reference, are measured by phase detectors and sent to the deskew control logic blocks (DSKs) associated with each ILO. The DSKs, under the coordination of the microprocessor core, then control the phase shift of each ILO to compensate the skew and accomplish the deskew function. It is worth noting that skew can also be inserted intentionally for skew scheduling, with the desired skew information loaded into the microprocessor core.
The key to such an active deskew scheme is the phase tuning capability of ILOs, which has been demonstrated in our previous work [1], [14], [15]. The phase shift of an ILO is determined by the difference between the input frequency, $\omega$, and the free-running oscillation frequency, $\omega_0$, of the ILO, which means that the ILO phase can be tuned by varying its free-running oscillation frequency $\omega_0$, which is set by its resonator (Fig. 2a). This phase tuning characteristic of an ILO is plotted in Fig. 2b, which shows the phase shift with respect to frequency tuning at different injection ratios [15]. The phase shift tuning characteristic is approximately linear in the middle of the tuning range for all injection ratios, which suits a linear deskew implementation. Also, since the delay tuning of the ILOs is centered around zero phase shift, there is no clock latency increase as in conventional active deskew approaches. Frequency tuning of an ILO can be implemented as in a generic voltage-controlled oscillator, by varactors [1] or a switched-capacitor array [15] in its resonator. The switched-capacitor array approach is more suitable for digital deskew control. Thanks to the digitized control signals, it exhibits better immunity to power supply and substrate noise, which is a serious threat to analog circuits in a noisy microprocessor environment.
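The dependence of phase shift on detuning can be illustrated with the classic weak-injection (Adler) steady-state relation, $\sin\varphi = (\omega_0 - \omega)/\omega_L$, where $\omega_L$ is the one-sided locking range. This is a textbook approximation adopted here as an assumption; it is not necessarily the model behind Fig. 2b or [15]. The function names, the sign convention, and the 100 MHz locking range below are hypothetical and chosen only for illustration.

```python
import math


def ilo_phase_shift(f_inj_hz: float, f_free_hz: float, lock_range_hz: float) -> float:
    """Steady-state ILO phase shift in radians under Adler's weak-injection model.

    lock_range_hz is the assumed one-sided locking range. The sign convention
    (positive when the free-running frequency is above the input) is arbitrary.
    """
    detune_hz = f_free_hz - f_inj_hz
    if abs(detune_hz) > lock_range_hz:
        raise ValueError("detuning exceeds the locking range: the ILO loses lock")
    return math.asin(detune_hz / lock_range_hz)


def phase_to_delay_ps(phase_rad: float, f_inj_hz: float) -> float:
    """Convert a phase shift at the injected frequency into a time shift in ps."""
    return phase_rad / (2.0 * math.pi * f_inj_hz) * 1e12


if __name__ == "__main__":
    f_inj = 3.5e9        # clock frequency used in the paper
    lock_range = 100e6   # hypothetical one-sided locking range
    for detune_mhz in (-40, -20, 0, 20, 40):
        phi = ilo_phase_shift(f_inj, f_inj + detune_mhz * 1e6, lock_range)
        delay = phase_to_delay_ps(phi, f_inj)
        print(f"detune {detune_mhz:+4d} MHz -> {math.degrees(phi):+7.2f} deg, {delay:+6.2f} ps")
```

Under this model the arcsine is close to linear for small detuning and passes through zero at zero detuning, which is consistent with the near-linear mid-range behavior and the zero-centered delay tuning described above.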
Unlike conventional deskew delay lines, an ILO has fewer paths through which to pick up power supply noise. Even when noise manages to couple into the ILO, it is largely high-pass filtered, because an ILO behaves like a PLL with a large loop bandwidth. Thus the jitter degradation of this ILO-based deskew scheme is reduced compared to conventional deskew schemes. Interestingly, it was proposed in [7] that injection locking, with its low-pass filtering characteristic, would reduce the jitter introduced by conventional deskew delay lines.
Since the deskew control logic is digital, it is critical to verify the ILO's dynamic behavior, and in particular how an ILO responds to deskew control steps. Fig. 3 shows the simulated time-domain response of the ILO to a full-scale step of the control word (from 0 to 31), in which all the switched capacitors in the ILO resonator are turned on. The waveform shows that the transient settles within several clock cycles, even with a pessimistic rise time on the control signal. The transition also does not disrupt the ILO output phase.
Fig. 4 shows the schematic of the test chip used to demonstrate the proposed ILO-based active deskew in ILC. The input clock signal is distributed by a passive H-tree to each clock domain, where it injection-locks an ILO. Each ILO drives a 2pF clock load, which models the local clock load in real processors, through a differential to single-ended buffer ($u_1$). The ILOs use a newly developed transformer-direct-injection topology [15], as shown in the blow-up of Fig. 4. A 5-bit binary-weighted switched-capacitor array is implemented in the LC resonator of each ILO for phase tuning. The 5-bit binary-coded digital control is generated by a 5-bit bidirectional counter built into the deskew control logic (DSK). The DSK algorithm and an example of the deskew sequence are shown in Fig. 5a and Fig. 5b.
When the counter counts UP, the ILO free-running frequency decreases and the ILO phase tuning increases, and vice versa. The counter value can also be preset externally, as shown in Fig. 5a. This enables manual adjustment of the ILO delays for test purposes. The clock phases from adjacent clock domains are compared by a digital phase detector, and the 1-bit skew information, labeled $n$ in Fig. 5, is fed into the deskew control logic. The skew information for the two previous cycles is also stored in two registers, $R_1$ and $R_2$, to implement the ringing detection and prevention algorithm in the deskew control logic. Once ringing is detected, the DSK forces the counter into a stop state until an external start signal restarts the deskew control logic.
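The counter-based loop described above can be sketched as a small behavioral model. This is a minimal sketch under stated assumptions, not the DSK logic of Fig. 5: the class and function names are hypothetical, the phase detector is modeled as an ideal sign comparator with no offset, the skew is assumed to change by exactly one 1.25 ps step per counter LSB, and the ringing rule (stop when the phase-detector bit has flipped on two consecutive cycles) is an assumed interpretation of how the two stored bits might be used.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

STEP_PS = 1.25   # phase-tuning step per counter LSB, as quoted in the text
CODE_MAX = 31    # a 5-bit counter covers codes 0..31


@dataclass
class DeskewControl:
    code: int = 16               # counter value (can be preset externally)
    r1: Optional[int] = None     # phase-detector bit from one cycle ago
    r2: Optional[int] = None     # phase-detector bit from two cycles ago
    stopped: bool = False        # stop state entered once ringing is detected

    def update(self, n: int) -> int:
        """Consume one phase-detector bit n (1 = local clock early) and return the code."""
        if self.stopped:
            return self.code
        # Assumed ringing rule: the decision flipped on two consecutive cycles.
        ringing = self.r2 is not None and n != self.r1 and self.r1 != self.r2
        if ringing:
            self.stopped = True
        elif n == 1:
            self.code = min(CODE_MAX, self.code + 1)   # count UP: more delay
        else:
            self.code = max(0, self.code - 1)          # count DOWN: less delay
        self.r2, self.r1 = self.r1, n
        return self.code


def simulate(initial_skew_ps: float, preset_code: int = 8,
             max_cycles: int = 64) -> Tuple[List[float], DeskewControl]:
    """Run the loop and return the skew seen on each deskew-clock cycle."""
    dsk = DeskewControl(code=preset_code)
    trace: List[float] = []
    for _ in range(max_cycles):
        skew = initial_skew_ps + STEP_PS * (dsk.code - preset_code)
        trace.append(skew)
        if dsk.stopped:
            break
        dsk.update(1 if skew < 0 else 0)   # ideal 1-bit phase detector
    return trace, dsk


if __name__ == "__main__":
    trace, dsk = simulate(-16.0)
    print("skew per cycle (ps):", trace)
    print("final code:", dsk.code, "stopped:", dsk.stopped)
```

With the assumed preset code of 8 and an initial skew of -16 ps, this toy loop counts up until the skew crosses zero, reverses once, and then freezes with a residual inside one 1.25 ps step. It ignores phase-detector offset and noise, so it should not be read as a prediction of the measured residual skew.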
III. TEST CHIP IMPLEMENTATION
The test chip was fabricated in a standard 0.18 μm digital CMOS technology. The clock frequency is set at 3.5GHz, representative of state-of-the-art processor speeds. The transformers and inductors in the ILOs are all built with the 0.35 μm-thick top metal layer. The transformers have a k factor of 0.77 at 3.5GHz. The symmetric spiral inductors have an inductance of 2.8nH and a quality factor of 4.1 at 3.5GHz. The die photo of the test chip, which measures 2mm by 2mm, is shown in Fig. 6.
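As a rough sanity check on these values, the ideal LC resonance formula $f = 1/(2\pi\sqrt{LC})$ gives the total capacitance that would resonate with 2.8nH at 3.5GHz. The sketch below assumes that the 2.8nH inductor alone sets the tank resonance, which ignores the transformer, parasitics, and losses, so the result is only an order-of-magnitude estimate. The function names are hypothetical.

```python
import math


def tank_capacitance_f(f_res_hz: float, inductance_h: float) -> float:
    """Capacitance (F) that resonates with inductance_h at f_res_hz in an ideal LC tank."""
    return 1.0 / ((2.0 * math.pi * f_res_hz) ** 2 * inductance_h)


def resonant_frequency_hz(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency (Hz) of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


if __name__ == "__main__":
    c_tank = tank_capacitance_f(3.5e9, 2.8e-9)   # values quoted in the text
    print(f"ideal tank capacitance: {c_tank * 1e12:.3f} pF")
```

Under this idealization the tank capacitance comes out at roughly 0.74 pF, which the switched-capacitor array and device parasitics would have to share.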
IV. MEASUREMENT RESULTS
The test chip is measured on a probe station. The locking range of the ILC network is measured to be up to 6.5% with an input amplitude of 0.6V (Fig. 7). Time-domain clock waveforms from each clock domain are measured on a 50GHz sampling oscilloscope to study the clock timing. When comparing the timing of different clock domains, the connector and cable delay mismatch is first characterized and then used to calibrate the measured results.
The free-running frequency tuning and locked-state phase tuning of the ILOs are first characterized to show the deskew capability of the ILC network, as shown in Fig. 8. The measured ILO free-running frequency tuning is approximately linear with a step size of 2.5MHz. This linear frequency tuning produces a linear phase tuning for the ILOs in the locked state, with a range of 40ps and a step size of 1.25ps.
The dynamics of the deskew loop are then measured, as shown in Fig. 9. An initial skew of -16ps is preset between the two clock domains before the deskew loop starts. The deskew loop reduces the skew to a final residual value of 2ps within 15 cycles of the deskew clock, with a small overshoot. The residual skew can be attributed to the deskew step size limitation and the phase detector offset.
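The quoted step sizes can be turned into a few derived figures with simple arithmetic. The sketch below uses only numbers stated in the text (5-bit control, 1.25ps and 2.5MHz steps, 3.5GHz clock) and assumes perfectly uniform steps, which the measurement only approximates. The function name and dictionary keys are hypothetical.

```python
def tuning_summary(f_clk_hz: float = 3.5e9, n_bits: int = 5,
                   phase_step_ps: float = 1.25, freq_step_mhz: float = 2.5) -> dict:
    """Derive span and sensitivity figures from the quoted per-LSB step sizes."""
    n_steps = 2 ** n_bits - 1            # transitions between the 32 codes
    period_ps = 1e12 / f_clk_hz          # clock period in ps
    span_ps = n_steps * phase_step_ps
    return {
        "steps": n_steps,
        "span_ps": span_ps,
        "span_deg": 360.0 * span_ps / period_ps,
        "step_deg": 360.0 * phase_step_ps / period_ps,
        "freq_span_mhz": n_steps * freq_step_mhz,
        "ps_per_mhz": phase_step_ps / freq_step_mhz,
    }


if __name__ == "__main__":
    for key, value in tuning_summary().items():
        print(f"{key}: {value}")
```

Counting the 31 transitions between 32 codes gives a 38.75ps span, close to the quoted 40ps (which equals 32 steps of 1.25ps), and the two step sizes together imply a sensitivity of about 0.5ps of delay per MHz of free-running frequency shift.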
The phase noise of the ILC output clock is measured and compared with that of the input clock source and the free-running ILO (Fig. 10a). The ILC output tracks the phase noise of the source clock up to 10MHz, and shows up to 60dB improvement over the free-running case near an offset of 10kHz. The cycle-to-cycle jitter of both the ILC output and the input clock is measured with a self-triggered method, and the two are compared in Fig. 10b. The jitter accumulated in the ILC is negligible (0.04ps).
Each ILO in the test chip consumes 12mA from a 1V 
