A standing wave oscillator (SWO) is an attractive clock source which can be used to produce a high frequency clock signal with low skew and high reliability. However, it is difficult to tune an SWO over a wide range of frequencies. We introduce a frequency tunable SWO which uses an inversion mode metal-oxide-semiconductor (IMOS) field-effect transistor as a varactor, and give simulation results for the frequency tuning range and power dissipation. Based on the frequency tunable SWO, a new phase locked loop (PLL) architecture is presented. This PLL can be used not only as a clock source, but also as a clock distribution network to provide high quality clock signals. The PLL achieves an approximately 50% frequency tuning range when designed in Global Foundry 65 nm 1P9M complementary metal-oxide-semiconductor (CMOS) technology, and can be used directly in a high performance multi-core microprocessor.
Introduction
Great attention has been focused on global clock distribution of high performance microprocessors due to the continuous increase of chip size and frequency. The prevailing methodology to generate and distribute a clock is to use a phase locked loop (PLL) and hierarchical clock buffers. According to this methodology, up to 50% of the total power dissipation is attributed to the clock distribution network, and the reflection and the capacitive load make it harder to achieve higher frequencies. These trends point to a need for a novel clocking methodology.
A recently proposed resonant global clock distribution scheme has the potential to reduce global clock power and clock skew. As described in Wood et al. (2001), Chan et al. (2003), and Drake et al. (2004), among the different types of resonant clocks, standing wave oscillator (SWO) technology provides clock signals with a constant phase and varying magnitude. As a result, it is easier to recover constant phase, constant magnitude, and low skew clock signals from an SWO architecture, which is why SWOs have recently attracted increasing attention. Andress and Ham (2004) listed the types of SWO architecture. O'Mahony (2003) and O'Mahony et al. (2003) summarized the theory of SWOs and discussed the main issues in designing them. Andress and Ham (2005) described how to use a varactor to change the frequency of an SWO. Cordero and Khatri (2008) contributed a rotary Mobius architecture to cover larger areas or obtain higher frequencies. Mandal et al. (2011) provided a methodology to enlarge the SWO rotary area without decreasing the frequency.
Two problems remain to be addressed when SWO technology is used in chip design. The first is how to tune the SWO's frequency to meet different requirements. The second is how the SWO can be used without greatly changing the whole design methodology. In this paper, a new PLL architecture based on a frequency tunable SWO is proposed. This architecture can be used not only as a clock source but also as part of a clock distribution network.
Tunable SWO design
In this study, we use inversion mode metal-oxide-semiconductor (IMOS) field-effect transistors as varactors to change the SWO's frequency, and investigate the relationship between the position of the varactors and the frequency response of the SWO. In Global Foundry 65 nm 1P9M technology, the SWO's frequency can be tuned from 4 GHz to 5 GHz with the parameters listed in this section. Based on this tunable SWO, we can design a new clock system which changes its frequency automatically according to a reference clock.
Simulation environment
Fig. 1 shows the simulation architecture of a λ/4 SWO. We use the U-element, a built-in element of HSPICE (a circuit simulation tool from Synopsys), to model the transmission line, and use a cross coupled inverter pair (CCIP) as the current source. This architecture has one CCIP and 17 equally sized U-elements, and the parameters are listed in Table 1. The simulation result shows that the base frequency is 5.3 GHz with a power dissipation of 12.2 mW.
Varactor architecture
In this study, we use IMOS to model the behavior of the varactor. The IMOS architecture is illustrated by a cross-section of the device in Fig. 2. The varactor's basic structure is an n-channel metal-oxide-semiconductor (NMOS) transistor. The drain and source are shorted to form one of the capacitor terminals, while the polysilicon gate forms the other. This structure has a capacitance that varies almost monotonically with the voltage difference between the source and gate (V_sg).
We simulate the capacitances of four types of NMOS transistors under different V_sg settings. These four types of transistors have different threshold voltages (Fig. 3). In Fig. 3, 'hvtfet' stands for high threshold voltage NMOS, 'rvtfet' for regular threshold voltage NMOS, 'lvtfet' for low threshold voltage NMOS, and 'zvtfet' for zero threshold voltage NMOS. We found that the minimum capacitance decreases along with the threshold voltage while the maximum capacitance remains mostly unchanged. In this study, we choose zvtfet to design the varactor to obtain a wider frequency tuning range.
SWO tuning architecture
We simulate the two structures described in Andress and Ham (2005). The first uses lumped varactors (Fig. 4a), and the second uses distributed varactors (Fig. 4b). Each structure has 16 varactor stages, each composed of two back-to-back IMOS devices and one control signal. The control signal connects to the source and drain of these two IMOS devices. The two polysilicon gates are connected to the differential transmission line. When V_sg is set to 0 V, we say that the varactor is turned on. Turning a varactor on changes its capacitance and hence the oscillation frequency.
The lumped varactor architecture tunes the frequency by changing the boundary condition, while the distributed varactor architecture tunes the frequency by modifying the wave velocity on the transmission line. We choose the parameters listed in Table 1 for these two architectures to compare the tuning results and the power dissipations.
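The difference between the two tuning mechanisms can be illustrated with an idealized, lossless, single-ended model of a short-ended quarter-wave line. This is only a sketch under stated assumptions, not the HSPICE setup used in this study: the line length, wave velocity, characteristic impedance, and capacitances are hypothetical placeholders (the Table 1 values are not used), and the function names are introduced for illustration. A lumped capacitor C at the open end changes the boundary condition, so the resonance satisfies ω·C·Z0·tan(ω·l/v) = 1, whereas an added capacitance per unit length lowers the wave velocity and the resonance stays at v/(4l).

```python
import math


def loaded_quarter_wave_freq(length_m, velocity_mps, z0_ohm, c_load_f):
    """Fundamental resonance (Hz) of a short-ended line with a lumped
    capacitor at the open end (boundary-condition tuning).

    Solves omega * C * Z0 * tan(omega * l / v) = 1 by bisection.
    """
    f_unloaded = velocity_mps / (4.0 * length_m)
    if c_load_f <= 0.0:
        return f_unloaded

    def residual(f_hz):
        omega = 2.0 * math.pi * f_hz
        theta = omega * length_m / velocity_mps
        return omega * c_load_f * z0_ohm * math.tan(theta) - 1.0

    # The residual rises monotonically from -1 at f = 0 towards +infinity
    # as f approaches the unloaded quarter-wave frequency.
    lo, hi = 0.0, f_unloaded * (1.0 - 1e-9)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def distributed_quarter_wave_freq(length_m, velocity_mps, z0_ohm, c_var_per_m):
    """Resonance (Hz) when varactor capacitance is spread uniformly along
    the line (wave-velocity tuning)."""
    l_per_m = z0_ohm / velocity_mps          # from Z0 = sqrt(L/C), v = 1/sqrt(LC)
    c_per_m = 1.0 / (z0_ohm * velocity_mps)
    v_loaded = 1.0 / math.sqrt(l_per_m * (c_per_m + c_var_per_m))
    return v_loaded / (4.0 * length_m)


# Hypothetical values, chosen only to land in the GHz range.
LENGTH_M, VELOCITY_MPS, Z0_OHM = 7e-3, 1.5e8, 50.0
for c_load in (0.0, 50e-15, 100e-15, 200e-15):
    f = loaded_quarter_wave_freq(LENGTH_M, VELOCITY_MPS, Z0_OHM, c_load)
    print(f"lumped C = {c_load * 1e15:5.0f} fF -> {f / 1e9:.3f} GHz")
for c_var in (0.0, 10e-12, 20e-12):
    f = distributed_quarter_wave_freq(LENGTH_M, VELOCITY_MPS, Z0_OHM, c_var)
    print(f"distributed C' = {c_var * 1e12:4.0f} pF/m -> {f / 1e9:.3f} GHz")
```

In both cases a larger varactor capacitance lowers the oscillation frequency; the model ignores losses, the CCIP loading, and the differential structure, so it should not be expected to reproduce the simulated numbers.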
Simulation results
We simulate the frequency and power of these two architectures when different numbers of control signals are turned on. Figs. 5 and 6 show the simulation results. The maximum frequencies are 5.0 GHz and 5.3 GHz for the lumped and distributed varactor architectures, respectively; the minimum frequencies are 4.0 GHz and 4.9 GHz, respectively. The power of the lumped varactor SWO is about 12.8 mW at 5 GHz, and the power of the distributed varactor SWO is about 12.2 mW at 5.3 GHz. Both increase when more varactors are turned on. The voltage amplitude decreases along the transmission line from the CCIP toward the short end. When a varactor is placed near the short end, its gate voltage (V_g) decreases and V_sg increases, so, according to Fig. 3, its maximum capacitance decreases. Varactors placed farthest from the CCIP therefore raise the maximum frequency but give only a narrow tuning range. In contrast, the power of the varactors is larger near the CCIP because the leakage current is larger at a high V_g.
Summary
According to the simulation results, the lumped varactor architecture achieves a 20% tuning range (4.0-5.0 GHz), while the distributed varactor architecture achieves only 7.5% under the same conditions (4.9-5.3 GHz). However, the former dissipates about 5% more power (12.2 mW and 12.8 mW for the distributed and lumped architectures at their maximum frequencies, respectively).
There are a few other differences between these two architectures when used in chip design. First, the varactors in a distributed architecture should be uniform to keep the line's parameters consistent and to avoid harmonics. Second, distributed varactors need more control signals, and these signals must be routed over long distances. These requirements make it hard to design a flexible and reliable clock system using the distributed varactor architecture, so the lumped varactor architecture is selected for chip clock design in this study.
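As a quick cross-check of these figures, the following minimal sketch recomputes the tuning ranges and the power difference from the reported frequencies and powers. The helper name and the `reference` argument are introduced here for illustration only. The quoted 20% and 7.5% are reproduced when the frequency span is normalized to the maximum frequency; other normalizations are shown for comparison.

```python
def tuning_range_pct(f_min_ghz, f_max_ghz, reference="max"):
    """Tuning range in percent, normalized to the chosen reference frequency."""
    span = f_max_ghz - f_min_ghz
    base = {
        "max": f_max_ghz,
        "min": f_min_ghz,
        "center": 0.5 * (f_min_ghz + f_max_ghz),
    }[reference]
    return 100.0 * span / base


# Frequencies and powers as reported in the simulation results.
lumped_pct = tuning_range_pct(4.0, 5.0)        # lumped varactors
distributed_pct = tuning_range_pct(4.9, 5.3)   # distributed varactors
power_penalty_pct = 100.0 * (12.8 - 12.2) / 12.2

print(f"lumped:      {lumped_pct:.1f}% of f_max")
print(f"distributed: {distributed_pct:.1f}% of f_max")
print(f"lumped power penalty: {power_penalty_pct:.1f}%")
print(f"lumped, relative to center: {tuning_range_pct(4.0, 5.0, 'center'):.1f}%")
```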
SWO based PLL
There are still some problems to be resolved when using an SWO as a clock source in a chip. The first is how the frequency can be locked to a needed frequency with acceptable performance. The second is how to use the transmission line as part of the clock distribution network to decrease the clock power and skew. The third is how to cover the whole chip area with this new architecture.
In this study we design a new PLL using a frequency tunable SWO. This PLL has the same architecture and work flow as a normal voltage-controlled oscillator (VCO) based PLL, and can provide the needed output frequency within a proper frequency range. Most importantly, this PLL can be used not only as a normal clock source but also as part of a clock distribution network. Fig. 7 shows the diagram of an SWO based PLL prototype and Fig. 8 details the SWO architecture. This new PLL includes a phase frequency detector (PFD), a charge pump, a loop filter, a frequency divider, an SWO, and a clock recovery block. As for the SWO, we choose a lumped varactor architecture to obtain a wide tuning range, and the output of the charge pump is distributed locally at the same time. This PLL has one reset signal, a few configuration signals for frequency dividers, and a few control signals for the SWO.
SWO based PLL architecture
In Fig. 7 'ref_clk' is the reference clock of the PLL, 'test_clk' is the output of the frequency divider, 'cfg[*]' are the configuration signals for the frequency divider, and 'EN[*]' are the preset control signals for the varactors. The PFD compares the 'test_clk' and 'ref_clk', and outputs 'up' and 'down' signals. These two signals are connected to a charge pump to control its output voltage level. The output signal of the charge pump is 'ctrl', which is connected to the varactor of the SWO as the control signal.
There are two types of varactors in an SWO, as shown in Fig. 8: varactors switched by the preset 'EN[*]' signals, and a varactor driven by the 'ctrl' output of the charge pump. Following the tuning behavior in Fig. 5, the preset signals can quickly initialize the SWO to a frequency close to the needed frequency. Although this method shortens the clock locking process, it leaves some frequency holes in the tuning range because of the nonlinear capacitance change depicted in Fig. 3.
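The notion of frequency holes can be made concrete with a small sketch. It assumes, purely for illustration, that each preset setting fixes a starting frequency and that the charge pump controlled varactor can then raise the frequency by at most a fixed span; any gap between the band reachable from one preset and the next preset is a hole. The function name, the preset frequencies, and the fine tuning span are hypothetical and are not taken from the simulations.

```python
def find_frequency_holes(preset_freqs_mhz, fine_span_mhz):
    """Return (start, end) frequency gaps that no preset can reach.

    Assumption: from a preset frequency f, fine tuning covers [f, f + fine_span_mhz].
    """
    if not preset_freqs_mhz:
        return []
    bands = sorted((f, f + fine_span_mhz) for f in preset_freqs_mhz)
    holes = []
    covered_to = bands[0][1]
    for start, end in bands[1:]:
        if start > covered_to:
            # Nothing reaches the interval between the two bands.
            holes.append((covered_to, start))
        covered_to = max(covered_to, end)
    return holes


# Hypothetical, unevenly spaced presets (nonlinear C-V steps) with 250 MHz fine tuning.
print(find_frequency_holes([4000, 4200, 4500], 250))  # [(4450, 4500)]
```

Under this model, holes disappear when the fine tuning span is at least as large as the widest step between adjacent presets.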
Simulation results
Before the reset signal is removed, the control signals 'EN[*]' and configuration signals 'cfg[*]' are set to a proper initial state according to the relationship between the reference clock frequency and the target frequency. At this point, the frequency of the PLL output clock is close to the target frequency, so the locking process is fast. When the reset signal is removed, the output of the charge pump can tune the frequency little by little to achieve the target frequency according to the output of the PFD.
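The locking sequence described above can be sketched with a very simplified behavioral model. This is not the circuit simulated in this study: the PFD, charge pump, and loop filter are collapsed into a single integrating update of the 'ctrl' voltage, the SWO is assumed to respond linearly to 'ctrl', and the preset frequency, tuning gain, loop gain, and voltage limit are hypothetical values chosen only so that the loop settles.

```python
def simulate_lock(f_ref_mhz, n_div, f_preset_mhz, k_swo_mhz_per_v,
                  v_max=1.0, gain_v_per_mhz=0.02, steps=2000):
    """Frequency-locking sketch; returns a list of (ctrl_v, f_swo_mhz) per step.

    f_preset_mhz    : SWO frequency fixed by the preset signals at ctrl = 0 V
    k_swo_mhz_per_v : assumed linear tuning gain of the ctrl-driven varactor
    """
    ctrl = 0.0  # ctrl is held at 0 V until reset is removed
    history = []
    for _ in range(steps):
        f_swo = f_preset_mhz + k_swo_mhz_per_v * ctrl
        test_clk = f_swo / n_div              # frequency divider output
        error = f_ref_mhz - test_clk          # > 0 acts like 'up', < 0 like 'down'
        ctrl += gain_v_per_mhz * error        # charge pump + loop filter, idealized
        ctrl = min(v_max, max(0.0, ctrl))     # ctrl cannot leave the supply range
        history.append((ctrl, f_preset_mhz + k_swo_mhz_per_v * ctrl))
    return history


# Hypothetical example: preset close to the target, so only a small ctrl swing is needed.
final_ctrl, final_f = simulate_lock(300.0, 16, 4656.0, 300.0)[-1]
print(f"ctrl = {final_ctrl:.3f} V, f_swo = {final_f:.1f} MHz")

# If the preset is too far from the target, ctrl saturates and the loop cannot lock.
sat_ctrl, sat_f = simulate_lock(300.0, 16, 4000.0, 300.0)[-1]
print(f"ctrl = {sat_ctrl:.3f} V, f_swo = {sat_f:.1f} MHz (saturated)")
```

The second call illustrates why the preset signals must bring the SWO close to the target before the reset is removed: the fine tuning range alone is limited.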
We use the same parameters in Table 1 for the SWO to simulate the locking process. The total width of the varactors controlled by the 'EN[*]' signals is 399 μm, and the width of the varactor controlled by the output of the charge pump is selected as 57 μm. The PLL can work between 3.2 GHz and 4.8 GHz depending on the setting of the 'EN[*]' signals. In our simulation, the divider factor N is set to 16 through the 'cfg[*]' signals, and all the 'EN[*]' control signals are set to 1 V. The frequency of the reference clock is set to 300 MHz, so the target frequency is 4.8 GHz.
As shown in Fig. 9, the 'ctrl' signal is 0 V at the beginning, the frequency of 'test_clk' is 291 MHz, and the frequency of the SWO is about 4.66 GHz (291×N MHz). After the 'reset' signal is removed, the frequency of 'test_clk' increases slowly with the voltage of the 'ctrl' signal and is finally locked at 300 MHz, at which point the frequency of the SWO is locked at 4.8 GHz (300×N MHz). The total power of this PLL is 14.6 mW at 4.8 GHz, and the power of the SWO is 13.1 mW at 4.8 GHz. To analyze the clock jitter of the PLL, we measure the period of the recovered clock over 10 000 cycles. The maximum period is 213.2 ps, the minimum is 203 ps, the average is 208.1 ps, and the jitter is less than 5% of the clock period.
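The numbers in this paragraph can be cross-checked with simple arithmetic. The sketch below uses the reported divider factor (N = 16), 'test_clk' frequencies, and period statistics; the function names are introduced for illustration, and since the jitter definition is not stated, both a peak-to-peak figure and a worst-case deviation from the average period are computed as assumed interpretations.

```python
def swo_freq_mhz(test_clk_mhz, n_div):
    """SWO frequency implied by the divided clock."""
    return test_clk_mhz * n_div


def ideal_period_ps(f_ghz):
    """Ideal clock period in picoseconds for a frequency in GHz."""
    return 1000.0 / f_ghz


def jitter_summary(max_ps, min_ps, avg_ps):
    """Peak-to-peak and worst-case period deviation, relative to the average period."""
    peak_to_peak = max_ps - min_ps
    worst_dev = max(max_ps - avg_ps, avg_ps - min_ps)
    return {
        "peak_to_peak_ps": peak_to_peak,
        "peak_to_peak_pct": 100.0 * peak_to_peak / avg_ps,
        "worst_deviation_pct": 100.0 * worst_dev / avg_ps,
    }


print(swo_freq_mhz(291.0, 16))   # initial SWO frequency in MHz
print(swo_freq_mhz(300.0, 16))   # locked SWO frequency in MHz
print(f"ideal period at 4.8 GHz: {ideal_period_ps(4.8):.1f} ps")

summary = jitter_summary(max_ps=213.2, min_ps=203.0, avg_ps=208.1)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under either interpretation the result stays below 5% of the period, and the reported 208.1 ps average is close to the ideal period at 4.8 GHz.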
Discussion
This new PLL architecture has a few performance limitations which should be resolved in the future. For example, the tuning range of this prototype is not as wide as that of a normal PLL. Although adding a larger varactor can enlarge the frequency tuning range, the parameters need to be carefully selected to make the SWO work steadily. Selecting the correct varactor architecture could be challenging if the size of the tuned varactor is set too wide; the tuning range can then be too narrow to cover the frequency 'holes'.
In our simulations, the parameters of the charge pump and the loop filter are copied directly from a normal PLL. When the PLL is locked, there is a small voltage variation in the 'ctrl' signal of the varactor, which increases jitter in the output clock signal. The details of a practical architecture should be subject to further study.
Proposed usage
Clock buffer
As described in O'Mahony (2003), a special clock buffer should be used to convert the differential sinusoids to digital levels. This clock buffer is connected between the SWO and a conventional low level clock network (Fig. 10).
This clock buffer will consume more power than a normal clock inverter and also introduce a few picoseconds of skew according to where it is connected to the SWO as depicted in O'Mahony (2003).
Clock distribution
Many papers have discussed how to construct a clock grid using multiple SWOs. This architecture can distribute a low skew clock signal across a whole chip that requires a globally synchronized clock signal.
More and more high performance chips currently use a globally asynchronous locally synchronous architecture to eliminate the need for a global clock distribution network. In these chips, each processing core has its own PLL to provide a clock signal within the local area; therefore, the skew between different cores does not need to be controlled. So, there is no need to construct a large global clock grid, and a few coupled SWOs can meet this requirement. Fig. 11 illustrates a typical clock network architecture of a four-core CPU using an SWO based PLL as described in this study. In Fig. 11, the CPLL is the PLL for the core, PPLL is the PLL for the PCIE unit, DPLL is the PLL for the DDR unit, and IPLL is the PLL for the interconnect unit. These units use asynchronous schemes to connect to each other, so the skews between these units can be neglected. Thus, there is no need to construct a coupled SWO grid to control skew over a large area.
Conclusions
In this paper, we analyzed the frequency tunable SWO architecture and presented a novel SWO based PLL architecture which can be used not only as a clock source but also as part of a global clock network. The new PLL works between 3.2 GHz and 4.8 GHz, and achieves an approximately 50% frequency tuning range with acceptable jitter in 65 nm technology. This PLL can be used directly in globally asynchronous locally synchronous chips.
There are several practical aspects of the SWO based PLL which require further investigation. The most important is the issue of facilitating low frequency operations such as scan and burn-in. Unlike a VCO based PLL, the SWO based PLL cannot directly bypass a low frequency clock. One way to resolve this problem is to modify the SWO design using the switched shorts described in O'Mahony (2003). The second issue is the loading of the clock buffer. The effects of the distribution of capacitive loading should be studied carefully to avoid harmonics and clock skews.
