Abstract-Resonant clock distribution with distributed LC oscillators is promising for reducing clock power and jitter. Yet the difficulty of integrating on-chip inductors still limits its application in practice. This paper addresses this key issue with sub-50 µm magnetic inductors that are fully compatible with the CMOS process. These inductors leverage soft magnetic coils to achieve inductances up to 4 nH and a Q-factor of 3 at 1 GHz with a device diameter of only 30-50 µm, resulting in area savings of nearly 100X compared to conventional designs. The latency and noise performance of the resonant clock network is demonstrated to be comparable to that of networks using conventional inductors without soft magnetic materials. In addition, inductors with integrated magnetic materials significantly reduce mutual coupling and eddy current loss in the power grid below the clock network. These design advantages enable a high density of on-chip distributed oscillators, providing better phase averaging, lower power and superior noise characteristics compared to traditional buffer-tree based clock networks.
I. INTRODUCTION
The clock distribution network in a microprocessor traditionally employs a grid driven by a tuned and balanced clock tree. With process scaling and an increasing number of cores on a chip, meeting the skew and jitter requirements under process variations has become a major design challenge. Additionally, the clock distribution network can take up to a quarter of the overall chip power budget [1]. Resonant clock distribution schemes that recycle the charge of the load capacitors using inductive reactance provide a low-noise, low-power alternative to conventional tree-based clock distribution strategies. The two principal resonant clocking techniques are standing-wave [2] and traveling-wave [3] clock generation, both of which utilize the inductance of transmission lines. Although both schemes offer low skew, low jitter and reduced power consumption, standing-wave clocks suffer from spatially varying amplitude, while traveling-wave clocks have non-uniform phase across the distribution.
To overcome these issues, an H-tree clock using spiral inductors in resonance with on-chip capacitance, which generates a signal of uniform phase and uniform amplitude, was proposed [4]. However, this scheme relies on a large on-chip decoupling capacitance to establish a mid-rail DC voltage around which the clock network operates. Additionally, the clock signal is sinusoidal, resulting in slower transition times. These shortcomings are resolved by a global clock network of distributed differential LC oscillators injection-locked to an external reference [5]. Implementation of large on-chip inductors, however, poses unique design challenges. In addition to the area penalty, it has been noted that the inductors have a very low quality factor due to eddy currents in the power grid, and that the inductance drops due to mutual coupling [4]. In this paper, we propose a resonant clock distribution network based on distributed LC oscillators that employ scaled magnetic inductors. However, these extremely scaled on-chip spiral inductors have a low quality factor due to eddy current losses in the conductive magnetic material. The impact of low-Q inductors on the resonant clock distribution and the trade-offs involved are discussed in this paper. The key performance results obtained with scaled magnetic inductors are:
• 4 nH inductance, 2 GHz cut-off frequency, and a peak quality factor of 3 at 1 GHz, with an inductor footprint of 30 × 30 µm².
• 10 ps and 13 ps peak period jitter for an injected low frequency power-supply noise of 150 mV and 300 mV respectively.
• For similar performance, the scaled magnetic inductor occupies 900 µm², an area savings of 87X compared to a bare inductor occupying 78,400 µm².
• 72% less power dissipation than a non-resonant clock network.
• Negligible mutual coupling and very low eddy current loss in the metal layers underneath the spiral inductor.
Fig. 1 shows optical microscope images of a bare spiral inductor and a spiral inductor wrapped with a magnetic ring. On-chip spiral inductors with integrated magnetic materials have been demonstrated to achieve an inductance increase of up to 16X [6].
The remainder of the paper is organized as follows. Section II presents a detailed overview of the simulation results and fabrication of scaled magnetic inductors. Section III discusses the distributed oscillator based clocking scheme and its key advantages, and sets up a comparative study of the scaled magnetic inductor and the bare inductor as components of a resonant clock network. Section IV discusses the results, and conclusions are presented in Section V.
II. SCALED MAGNETIC INDUCTOR
On-chip inductors occupy a very large area on an integrated circuit while providing relatively low inductance values. Physically scaling inductors while maintaining similar performance metrics is one of the major challenges today. Various research groups have implemented inductors using magnetic films [7], [8], [9], but with only nominal inductance improvements ranging from 17% to 130%, because of low-permeability magnetic material or ineffective structures. An inductance increase of 16X using magnetic vias was reported in [6], but that design suffers from poor frequency response (100 MHz) and a low quality factor.
A. Magnetic Inductor Optimization
Spiral inductors with top and bottom magnetic layers (Permalloy, Ni80Fe20, µr = 600) are used to achieve high inductance. Multiple simulations were run in Ansoft HFSS to identify the optimum magnetic inductor structure. Since permalloy is conductive, breaking the continuous film into stripes minimizes the eddy current loss. Connecting the top and bottom magnetic layers with a magnetic via and using laminated magnetic material results in a more than 10X increase in inductance, as shown in Fig. 2(b). This magnetic ring structure enhances the inductance while maintaining a high frequency response, allowing the device footprint to be reduced to 30 × 30 µm². Fig. 2(a) shows the magnetic inductor simulation setup in HFSS. Figs. 2(b) and 2(c) show the inductance and quality factor of inductors with magnetic rings of different laminations (varying conductivity) from 100 MHz to 5 GHz. The DC inductance is 4 nH, the cut-off frequency is approximately 2 GHz, and the peak quality factor is 3 at 1 GHz.
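The motivation for laminating the conductive permalloy can be illustrated with a rough skin-depth estimate. The sketch below is not taken from the HFSS simulations in this paper. It uses the classical skin-depth formula, treats the quoted µr = 600 as frequency independent, and assumes a permalloy conductivity of 5e6 S/m, which is an illustrative value that is not reported in this paper.

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability in H/m


def skin_depth(freq_hz, mu_r, sigma_s_per_m):
    """Classical skin depth: delta = sqrt(2 / (omega * mu * sigma))."""
    omega = 2 * math.pi * freq_hz
    return math.sqrt(2.0 / (omega * mu_r * MU_0 * sigma_s_per_m))


MU_R = 600           # relative permeability quoted in the text
SIGMA_ASSUMED = 5e6  # S/m, ASSUMED conductivity (not from the paper)

delta_1ghz = skin_depth(1e9, MU_R, SIGMA_ASSUMED)
print(f"estimated skin depth at 1 GHz: {delta_1ghz * 1e9:.0f} nm")
```

Under these assumptions the skin depth at 1 GHz comes out at roughly 0.3 µm, well below the 1 µm total film thickness but larger than the 50 nm permalloy laminations described in the fabrication process, which is consistent with the use of laminations to limit eddy currents.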
B. Fabrication Process
The fabrication process for the complete magnetic ring inductor is shown in detail in Fig. 3. The process started with a clean quartz substrate. The bar shapes were patterned by E-beam lithography (EBL), with a length of about 20 µm and widths of about 1 µm and 2 µm. A 1 µm permalloy layer was then deposited by magnetron sputter deposition (Fig. 3(a)). To obtain the high permeability of this magnetic material, laminations were fabricated by alternately depositing 50 nm layers of permalloy and 5 nm layers of chromium. To isolate the magnetic material from the conductor lines, a thin polyimide layer (1.5 µm) was used as the insulation material and structure holder. After spin coating, the polyimide was first soft baked at 80 °C and then cured at 250 °C for 2 hours in an argon atmosphere. The magnetic via was then defined by EBL, and the polyimide was removed by oxygen plasma etching (Fig. 3(a)). The inductor was also defined by EBL, followed by deposition and lift-off of 1 µm thick copper (Fig. 3(b)). To isolate the conductor lines from the top magnetic layer, a second layer of polyimide (1.5 µm) was spin coated, followed by RIE etching of a smaller opening for the second magnetic via structure (Fig. 3(c)). Chemical mechanical polishing (CMP) was avoided in order to take advantage of the spacer layer of polyimide formed by spin coating. Upon completion of the final magnetic layer, the carefully aligned second magnetic bar patterns were defined by EBL. The top permalloy layer was deposited using the same process as described above, completing the magnetic ring wrapped inductor (Fig. 3(c)). Before each metal deposition, the sample was dipped in hydrochloric acid to remove the oxide formed in the via regions during oxygen plasma etching, in order to obtain the best contact between metal layers.
Fig. 1 shows a top-down optical microscope photograph of the fabricated inductor, which consists of the sputtered magnetic ring, spiral conductor lines, and polyimide as the insulation material. A bare inductor was fabricated in parallel as a reference. The conductor lines are 1 µm wide and 1 µm thick, with a spacing of 1.5 µm. The magnetic ring structure has a width of 1 µm and a thickness of 1 µm.
In spite of additional steps required to fabricate the magnetic inductors, the associated process cost in an industrial setup will not be significant since it requires simple modifications to the layout mask. Even though permalloy is not used in a CMOS process, CMOS compatible magnetic inductors using CoZrTa alloys have been demonstrated to achieve similar improvements in inductance [6] .
III. RESONANT CLOCK DESIGN USING SCALED MAGNETIC INDUCTORS
A distributed differential LC resonant clock distribution scheme was proposed in [5] . As shown in Fig. 4 , in this scheme, resonant oscillators are coupled together using interconnects to form a clock network. Each oscillator consists of a spiral inductor resonating with the load capacitor of the local buffers and negative differential transconductor to maintain the oscillations. The local buffers convert the differential sinusoidal output signal of the oscillators to single-ended full-swing square waves. A test chip implemented in 0.18µm CMOS technology demonstrated an order of magnitude less power-supply-induced jitter and power dissipation compared to a conventional tree-driven-grid clock network [5] .
The on-chip spiral inductor in the demonstrated prototype design has an outer diameter of 280 µm, giving an inductance of 6.4 nH and a quality factor of 6. Large-scale implementation of these inductors throughout the chip poses unique design challenges. It has been shown that the actual inductance and quality factor drop significantly due to eddy current losses in the power grid lines running in the lower metal layers. This can degrade the noise performance of the oscillators and shift the resonant frequency. Cutting the power grid lines to reduce eddy current losses increases local IR drops and affects the on-chip transmission line properties [10]. Based on the results in Section II, we study the performance of the resonant clock distribution network when bare spiral inductors are replaced with scaled magnetic inductors. Since the diameter of these inductors is scaled down to 30-50 µm with comparable inductance, area savings is one of the obvious advantages.
A. Simulation Setup
As shown in Fig. 2, scaled magnetic inductors suffer from a relatively low quality factor (Q=3 at 1 GHz) due to eddy current loss in the conductive magnetic material. This loss is modeled as a series resistance $R_s = \omega L_s / Q$, which gives 8.37 Ω for the scaled magnetic inductor at 1 GHz. To compare the performance of bare and magnetic spiral inductors in the distributed oscillator based resonant clocking scheme, a single unit (differential oscillator and local clock buffer) was designed and simulated in SPICE using 0.18 µm CMOS model files. The quality factor of the bare inductor is assumed to be 6 [4], giving a series resistance of 4.18 Ω. For an operating frequency of 1 GHz, the load capacitance of the local buffers is found to be 6 pF. Using these values, the transistors of the negative differential transconductor are sized and biased to obtain sinusoidal oscillation and a full-swing square wave clock at the output of the local buffer. The circuit schematic of the oscillator, the output buffer and the equivalent inductor model used in simulations is shown in Fig. 5.
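The series resistance values quoted above can be reproduced with a short calculation. The sketch below assumes that the same 4 nH inductance is used for both the magnetic and the bare inductor model (the quoted 4.18 Ω is consistent with that assumption) and uses the ideal single-tank formula for the resonant frequency, which is a simplification of the actual differential oscillator. Function and variable names are illustrative only.

```python
import math


def series_resistance(freq_hz, inductance_h, q_factor):
    """Series loss resistance from Q = omega * L / R_s."""
    return 2 * math.pi * freq_hz * inductance_h / q_factor


def resonant_frequency(inductance_h, capacitance_f):
    """Ideal LC tank resonance: f0 = 1 / (2 * pi * sqrt(L * C))."""
    return 1.0 / (2 * math.pi * math.sqrt(inductance_h * capacitance_f))


F_CLK = 1e9      # operating frequency from the text, in Hz
L_S = 4e-9       # inductance in H (ASSUMED identical for both models)
C_LOAD = 6e-12   # local buffer load capacitance from the text, in F

rs_magnetic = series_resistance(F_CLK, L_S, 3)  # Q = 3, magnetic inductor
rs_bare = series_resistance(F_CLK, L_S, 6)      # Q = 6, bare inductor
f0 = resonant_frequency(L_S, C_LOAD)

print(f"R_s (Q=3): {rs_magnetic:.2f} ohm")
print(f"R_s (Q=6): {rs_bare:.2f} ohm")
print(f"ideal tank resonance: {f0 / 1e9:.2f} GHz")
```

The computed resistances agree with the quoted 8.37 Ω and 4.18 Ω to within rounding, and the ideal resonance of a 4 nH, 6 pF tank falls close to the 1 GHz operating frequency.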
To obtain jitter statistics, power supply noise of 150 mV and 300 mV is injected into the oscillator $V_{DD}$ nodes. The noise frequency is 100 MHz, one-tenth of the clock frequency. It has been found that noise at this frequency is caused by the die and package LC and severely impacts the local and surrounding speed paths in a microprocessor [11]. To extract the jitter histogram, period jitter is calculated as $\text{Period Jitter} = t_{\text{measured}} - t_{\text{ideal}}$, which gives the deviation of the measured clock period from the ideal clock period. The rms period jitter is calculated as given in [5].
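As an illustration of the period jitter definition above, the following sketch computes per-cycle period jitter from a list of rising-edge timestamps. The timestamps are hypothetical and are not simulation results from this paper. The rms value here is taken as the root mean square of the per-cycle deviations, which is an assumption, since the exact expression is deferred to [5].

```python
import math


def period_jitter(edge_times, t_ideal):
    """Per-cycle period jitter: measured period minus ideal period."""
    periods = [b - a for a, b in zip(edge_times, edge_times[1:])]
    return [p - t_ideal for p in periods]


def peak_jitter(jitter):
    """Largest absolute deviation from the ideal period."""
    return max(abs(j) for j in jitter)


def rms_jitter(jitter):
    """Root mean square of the deviations (ASSUMED definition)."""
    return math.sqrt(sum(j * j for j in jitter) / len(jitter))


# Hypothetical rising-edge timestamps in ps for a 1 GHz clock (1000 ps period)
edges_ps = [0.0, 1004.0, 1998.0, 3001.0, 4000.0]
jitter_ps = period_jitter(edges_ps, 1000.0)

print("period jitter (ps):", jitter_ps)
print("peak period jitter (ps):", peak_jitter(jitter_ps))
print(f"rms period jitter (ps): {rms_jitter(jitter_ps):.2f}")
```

A histogram of the per-cycle values returned by `period_jitter` corresponds to the kind of jitter histogram described in the text.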
3D field solver simulations of bare inductor (outer diameter 280 µm) and scaled magnetic inductor (outer diameter 30 µm) with metal grid layer beneath them are run in Ansoft HFSS, to observe the effect of mutual coupling and eddy current loss in the presence of power grid analogous to an actual microprocessor circuit.
IV. RESULTS AND DISCUSSION
A. Effect of bias current and transistor sizes
Fig. 6 shows the clock waveform at the output of the differential oscillator and the local buffer. The output swing of the oscillator is determined by the bias current in the oscillator and the transistor sizes. The low-swing sinusoidal output of the oscillator is converted to a rail-to-rail square wave by the local output buffers.
A differential oscillator using the scaled magnetic inductor (Q=3) requires more current to compensate for the higher resistive loss in the inductor and to sustain oscillations. To obtain the same waveform envelope as the bare inductor (Q=6) based design, nearly twice the current is required, as shown in Fig. 6. Low-swing signals are more susceptible to power supply noise. Hence, there is a direct trade-off between low power and noise immunity when designing the clock network with magnetic inductors.
Fig. 7 shows the combinations of transistor area and bias current required to sustain oscillations. The design with the scaled magnetic inductor requires nearly three times more transistor area than the design with the bare inductor to sustain oscillations at the same bias current. The rms period jitter extracted from simulation decreases with increasing bias current, since a larger voltage swing at the oscillator output is more immune to power-supply noise. Hence, for the resonant clock using magnetic inductors to operate at low bias currents (low power) with better noise immunity, there is an area penalty due to larger transistors.
Fig. 8 shows the period jitter histograms obtained from simulations of the differential oscillator using bare and magnetic inductors. Both designs have transistors of the same size. For the design using the scaled magnetic inductor, the peak jitter is more than twice that of the design with the bare inductor. With injected power supply noise of 150 mV and 300 mV, the peak period jitter is 25 ps and 34 ps, respectively, for the Q=3 design, compared with 8 ps and 11 ps for the Q=6 design. As noted in Fig. 7, the transistor sizes for the Q=3 case can be increased to match the output waveform envelope of the Q=6 case. This makes the circuit more immune to power supply noise, and the jitter statistics become comparable to those of the Q=6 case.
For the resized design, the peak jitter reduces to 10 ps and 13 ps for 150 mV and 300 mV power supply noise, respectively. Additional improvement can be achieved by increasing the bias current of the oscillator.
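The roughly twofold current requirement of the Q=3 design is consistent with a first-order start-up estimate. The sketch below is a textbook small-signal approximation, not the SPICE analysis used in this paper. It converts the series loss into an equivalent parallel tank resistance R_p = R_s(1 + Q²) and assumes that the negative transconductance needed to sustain oscillation scales as 1/R_p. Any topology-dependent constant cancels in the ratio, and the mapping from transconductance to bias current depends on the device operating region, so the result is only indicative.

```python
import math


def parallel_resistance(freq_hz, inductance_h, q_factor):
    """Equivalent parallel tank resistance, R_p = R_s * (1 + Q^2)."""
    r_s = 2 * math.pi * freq_hz * inductance_h / q_factor
    return r_s * (1 + q_factor ** 2)


F_CLK = 1e9  # operating frequency from the text, in Hz
L_S = 4e-9   # inductance in H (ASSUMED identical for both designs)

rp_magnetic = parallel_resistance(F_CLK, L_S, 3)  # scaled magnetic inductor
rp_bare = parallel_resistance(F_CLK, L_S, 6)      # bare inductor

# Required negative transconductance scales as 1 / R_p (first-order estimate)
gm_ratio = rp_bare / rp_magnetic

print(f"R_p (Q=3): {rp_magnetic:.1f} ohm")
print(f"R_p (Q=6): {rp_bare:.1f} ohm")
print(f"required transconductance ratio (Q=3 vs Q=6): {gm_ratio:.2f}")
```

Under these assumptions the Q=3 tank needs close to twice the transconductance of the Q=6 tank, in line with the simulated observation that nearly twice the current is required for an equal waveform envelope.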
B. Jitter Statistics and Power Dissipation
The scaled magnetic inductor design has a power dissipation of 5.9 mW/pF, more than twice that of the bare inductor design (2.3 mW/pF). However, this still gives 72% power savings compared to the non-resonant network, which has a power dissipation of 21 mW/pF as reported in [5]. The area overhead for similar performance is about 160 µm² in transistor sizing. Bare inductors occupy 78,400 µm², while scaled magnetic inductors occupy 900 µm², resulting in an area savings of 87X.
Additionally, it should be noted that in the complete design, the output of the differential oscillators is injection locked to a reference frequency for better phase averaging. An injection-locked oscillator is more immune to power supply induced jitter as well as data-dependent clock jitter than a free-running oscillator [12] .
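The power and area figures quoted in this subsection follow from simple ratios of the reported values. The short sketch below reproduces that arithmetic; the variable names are illustrative, and the inductor areas are taken as the square footprints implied by the 280 µm and 30 µm outer dimensions.

```python
def percent_savings(baseline, value):
    """Relative savings of `value` with respect to `baseline`, in percent."""
    return 100.0 * (1.0 - value / baseline)


# Power dissipation per unit load capacitance, in mW/pF (values from the text)
P_MAGNETIC = 5.9       # resonant design with scaled magnetic inductor
P_BARE = 2.3           # resonant design with bare inductor
P_NON_RESONANT = 21.0  # non-resonant network reported in [5]

# Inductor footprints in square micrometres (values from the text)
AREA_BARE = 280.0 * 280.0    # 78,400 um^2
AREA_MAGNETIC = 30.0 * 30.0  # 900 um^2

power_savings = percent_savings(P_NON_RESONANT, P_MAGNETIC)
power_ratio = P_MAGNETIC / P_BARE
area_ratio = AREA_BARE / AREA_MAGNETIC

print(f"power savings vs non-resonant: {power_savings:.0f}%")
print(f"power ratio, magnetic vs bare: {power_ratio:.2f}x")
print(f"inductor area ratio, bare vs magnetic: {area_ratio:.0f}x")
```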
C. Eddy current loss
Eddy current losses in the power grid running below the inductor structure are significant, reducing the inductance and quality factor of the spiral through mutual coupling. Results from 3D field solver simulations (Fig. 9) show the magnetic field lines of the spiral penetrating into the lower metal layers of the microprocessor, especially the power grid. This is directly manifested as a change in the inductance value of the spiral, as shown in Fig. 11.
In the case of scaled magnetic inductors (Fig. 10), most of the magnetic flux is confined inside the magnetic ring structure, which significantly reduces eddy current losses in the power grid. There is negligible change in the inductance of the spiral, since the magnetic ring reduces the effect of mutual coupling, as shown in Fig. 11. Eddy currents form only in a very small area of the grid owing to the reduced dimensions of the magnetic inductor. Thus, scaled magnetic inductors can be easily integrated onto a microprocessor without any detrimental effect on the rest of the circuit due to magnetic coupling.
V. CONCLUSION
Resonant clock distribution schemes are an attractive alternative to conventional balanced H-tree based clocking since they offer low power and better immunity to noise. However, implementing large on-chip spiral inductors on the top metal layer of microprocessor circuits can be a tedious design procedure with unwanted implications. To overcome these issues, we propose implementing the resonant clock network using scaled magnetic inductors. A comprehensive simulation study shows that the clock network possesses noise immunity similar to that of clock networks implementing bare spiral inductors, at the cost of higher power dissipation. Area savings of nearly two orders of magnitude per inductor can be achieved. The magnetic ring structure gives better insulation from mutual coupling with other circuit elements. The area savings can be used to implement a higher density of distributed oscillators, allowing better phase averaging and a uniform clock throughout the chip. Thus, scaled magnetic inductors pave the way for practical implementation of high-signal-integrity resonant clock distribution networks in large-scale multi-core microprocessor circuits.
VI. ACKNOWLEDGEMENT
