I. INTRODUCTION
CLOCKING at gigahertz rates requires generators with low skew and low jitter to avoid synchronous timing failures. The notion of a "clocking surface" becomes untenable at gigahertz rates [1], frequently mandating that large VLSI chips be subdivided into multiple clock domains and/or use skew-tolerant multiphase circuit design techniques [2].
Techniques such as distributed phase-locked loops (PLLs) [3] and delay-locked loops (DLLs) [4] can control systematic skew to within 20 ps, but are complex, introduce random skew (i.e., jitter), and have area penalties. H-tree distribution systems, while simple, are difficult to balance and can use upwards of 30% of a chip's total power budget [5] . All these systems are inherently single-phase, induce large amounts of simultaneous switching noise, and can be highly susceptible to this noise.
Researchers have therefore looked to alternative oscillator mechanisms for better phase stability and lower power consumption. Previous transmission-line systems such as salphasic distribution [6], distributed amplifiers [7], and adiabatic LC resonant clocks [8] provide only a sinusoidal or semisinusoidal clock, making fast edge rates difficult to achieve. This paper introduces the rotary traveling-wave oscillator (RTWO), a differential LC transmission-line oscillator which produces gigahertz-rate multiphase (360°) square waves with low jitter. Extension of the RTWO to rotary oscillator arrays (ROAs) offers a scalable architecture with the potential for low-power low-skew clock generation over an arbitrary chip area without resorting to clock domains. Simulations predict rise and fall times of 20 ps on a 0.25-µm process and a maximum frequency limited only by the $f_T$ of the integrated circuit technology used.
Experiments show that although the RTWO operates differentially, careful attention is required to guard against magnetic field couplings between the clock conductors and other structures if the potential performance of these oscillators is to be realized.
II. CONCEPT OF THE ROTARY CLOCK OSCILLATOR

A. Fundamentals and Structures
The basic ROA architecture is shown in Fig. 1. A representative multigigahertz rotary clock layout has 25 interconnected RTWO rings placed onto a 7 × 7 array grid. Each ring consists of a differential line driven by shunt-connected antiparallel inverters distributed around the ring. This arrangement produces a single clock edge in each ring which sweeps around the ring at a frequency dependent on the electrical length of the ring. Pulses are synchronized between rings by hard wiring which forces phase lock.
Fig. 2 illustrates the theory behind the individual RTWO. Fig. 2(a) depicts an open loop of differential transmission line (exhibiting LC characteristics) connected to a battery through an ideal switch. When the switch is closed, a voltage wave begins to travel counterclockwise around the loop. Fig. 2(b) shows a similar loop, with the voltage source replaced by a cross-connection of the inner and outer conductors to cause a signal inversion. If there were no losses, a wave could travel on this ring indefinitely, providing a full clock cycle every other rotation of the ring (the Möbius effect).
In real applications, multiple antiparallel inverter pairs are added to the line to overcome losses and give rotation lock. Rings are simple closed loops and oscillation occurs spontaneously upon any noise event. Unbiased, startup can occur in either rotational sense, usually in the direction of lowest loss. Deterministic rotation biasing mechanisms are possible, e.g., directional coupler technology or gate displacement [9]. Once a wave becomes established, it takes little power to sustain it, because unlike a ring oscillator, the energy that goes into charging and discharging MOS gate capacitance becomes transmission-line energy, which is recirculated in the closed electromagnetic path. This offers potential power savings as losses are not related to $CV^2f$ but rather to $I^2R$ dissipation in the conductors, where $R$ can be reduced, e.g., by adoption of copper metallization. Very large distributed transistor widths give substantial capacitive loading to the lines, thus lowering velocity to give a reasonably low clock rate from a compact oscillator structure. In application, up to 75% of this capacitance can come from load capacitance, reducing the size of the drive transistors accordingly.
0018-9200/01$10.00 © 2001 IEEE
B. Waveforms
The upper traces of Fig. 3 show the simulated voltage waveforms on the differential line at points labeled A0, B0. The lower traces show the current in the conductors to be 200 mA, while the supply current is simulated at 84 mA with 4.5 mA of ripple. This clearly illustrates that energy is recycled by the basic operation of the RTWO. Just driving the 34 pF of capacitance present would require 275 mA at this frequency (from $I = CVf$).
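The energy-recycling argument above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the charging current follows I = C·V·f and assuming a 2.5-V swing (the supply of the prototype process; the swing used for Fig. 3 is not stated here). The variable and function names are illustrative only.

```python
def charging_current(c_farads, v_swing, f_hz):
    """Average current needed to charge C through V once per cycle: I = C*V*f."""
    return c_farads * v_swing * f_hz


# Values quoted in the text.
C_LINE = 34e-12       # total capacitance on the line, farads
I_CVF = 275e-3        # current a conventional C*V*f drive would need, amperes
I_SUPPLY = 84e-3      # simulated RTWO supply current, amperes

# Assumption (not stated for Fig. 3): full 2.5-V swing.
V_SWING_ASSUMED = 2.5

# Frequency implied by the quoted 275 mA under that assumption.
implied_freq_hz = I_CVF / (C_LINE * V_SWING_ASSUMED)

# Fraction of the conventional drive current actually drawn from the supply.
supply_fraction = I_SUPPLY / I_CVF

if __name__ == "__main__":
    print(f"implied frequency: {implied_freq_hz / 1e9:.2f} GHz")
    print(f"supply current / CVf current: {supply_fraction:.1%}")
```

Under these assumptions the quoted figures imply an operating frequency a little above 3.2 GHz, with the supply delivering roughly 30% of the current a non-recycling driver would need.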
C. Phase Locking
Interconnected rings, as in Fig. 1(a), will run in lockstep, ensuring that the relative phase at all points of an ROA is known. It is possible to use a large array of interconnected rings to distribute a clock signal over a large die area with low clock skew. For example, referring to Fig. 1(a), all the points marked with the equals sign have the same relative phase as that arbitrarily marked as 0°. At any point along the loop, the two signal conductors have waveforms 180° out of phase (two-phase nonoverlapping clock). A full 360° is measured along the complete closed path of the loop. In principle, an arbitrary number of clock phases can be extracted. Phase advances or retards depend on the direction of rotation, and Fig. 4 shows the current-voltage relationships for clockwise and counterclockwise rotation.
D. Network Rules
Although the square-ring shape is convenient to show diagrammatically, it is only one example of a more general network solution which requires ROAs to conform closely to the following rules.
1) Signal inversion must occur on all (or most) closed paths.
2) Impedance should match at all junctions.
3) Signals should arrive simultaneously at junctions.
From 1) above, any odd number of crossovers is allowed on the differential path, and regular crossovers forming a braided or "twisted pair" effect can dramatically reduce the unwanted coupling to wires running alongside the differential line.
The differential lines would typically be fabricated on the top metal layer of a CMOS chip where the reverse-scaling trend of VLSI interconnect offers increasingly high performance [10] . Fig. 5 illustrates a three-dimensional section of the ring structure connected to a pair of CMOS inverters expanded to show the four individual transistors. The main current flow in the differential conductors is shown by solid arrows, the magnetic field surrounding these conductors by dashed loops, and the capacitance charge/signal-boost current flowing through the transistors by dashed lines.
E. Fields and Currents
An important feature of differential lines is the existence of a well-defined "go" and "return" path which gives predictable inductance characteristics in contrast to the uncertain return-current path for single-ended clock distribution [11] .
Capacitance arises mainly from the transistor gate and depletion capacitances; interconnect capacitance does not dominate.
indicates intrinsic gate resistance, i.e., the ohmic path through which the gate charge flows. The term implies a parasitic gate term, but in reality, most of this resistance is in the series circuit of the channel under the gate electrode. This is shared by the D-S channel, as illustrated by the triangular region (shown with transistors operating in the pinchoff region). Fig. 6 is an expanded view of a short section of transmission line with three sets of back-to-back inverters shown. It is assumed that startup is complete and the rotating wave is sweeping left to right. For this analysis, we view the inverter pairs as discrete latch elements.
F. Coherent Amplification, Rotation Locking
Each latch switches in turn as the incident signal, traveling on the low-impedance transmission line, overrides the ON resistance of the latch and its previous state. This "clash" of states occurs only at the rotating wavefront and therefore only one region is in this cross-conduction condition at any one time. The transmission-line impedance is of the order of 10 Ω and the differential on-resistance of the inverters is in the 100-Ω to 1-kΩ range, depending on how finely they are distributed throughout the structure.
Once switched, each latch contributes for the remainder of the half cycle, adding to the forward-going signal. Coherent buildup of switching events occurs in this forward direction only. An equal amount of energy is launched in the reverse direction, but the latches in that direction cannot be switched further into the state to which they have already switched. The reverse-traveling components simply reduce the amount of drive required from those latches.
Importantly, it is the nonlinear latching action which is responsible for the self-locking of direction (a highly linear amplifier has no such directionality).
To clarify the above statements, Fig. 7 demonstrates how a large CMOS latch responds to an imposed differential signal. The curve trace shows a central differential-amplification region bounded by two absorptive ohmic regions (shaded) corresponding to the two latched states. Except at the wavefront location where amplification takes place, the ring structures will be terminated ohmically to the supplies.
The four-transistor "full-bridge" circuit minimizes supply current ripple to the cross-conduction period.
G. Frequency and Impedance Relations
In simulation models (and indeed as fabricated), the RTWO transmission line is built up from multiple RLC segments, and therefore, these primary line constants must be identified. (1) where interconnect capacitance for the line AB; gate overlap and Miller-effect feedback capacitance; total channel capacitance; drain depletion capacitance to bulk (substrate); load capacitance added to a line.
(Note that the is used to convert the in-parallel "to ground" values into in-series differential values of capacitance.)
is usually a small part of total capacitance and accurate formulas are available [12] if needed.
To calculate the per-unit-length differential inductance, i.e., accounting for mutual coupling, we use [13] , expressed below.
(2) 
Transmission line characteristics dominate over RC characteristics when [14] (6)
H. Bandwidth and Power Consumption
Seen from an RF perspective, Fig. 8(a) shows the RTWO to be two push-pull distributed amplifiers folded on top of each other. Distributed amplifiers exhibit very wide bandwidth because parasitic capacitances are "neutralized" by becoming part of the transmission-line impedance [15]. Performance is limited by the carrier transit time of the MOSFETs [16], not by the traditional digital inverter propagation time, which is not applicable where gates and drains are driven cooperatively by an imposed low-impedance signal, and where the load capacitance is hidden in the transmission line.
Operation of the RTWO is largely adiabatic when the voltage drop required to charge the capacitances is developed mainly across the inductance: (7) and when the intrinsic gate resistance is low relative to the reactance of the gate capacitance. Edges become faster and cross-conduction losses are reduced when the structure is more distributed. Table I lists characteristic changes with  , where  with , and held constant.
The most significant power loss mechanism for the RTWO is power dissipated in the interconnect, given by
Most of the remaining losses in Table I are attributed to crossconduction and parasitic losses. is a real loss mechanism for gigahertz signals, and RTWO rise/fall times can be doubled by this phenomenon. In newer CMOS processes, improves with shorter channel length.
III. MORE DETAILED CONSIDERATIONS
A. Skew Control
Interconnected RTWO loops offer the potential to control skew in spite of relatively large open-loop time-of-flight mismatches. Functionally, phase averaging occurs by pulse combination at the junction of multiple transmission lines. For a four-port junction, the normal operating mode will see two pulses arriving at the junction simultaneously. These two sources will feed two output ports and signal flow will be unimpeded by reflections if impedance is matched. This amounts to a situation similar to that described in [17] , [18] , although for ROAs, the mechanism is LC transmission-line energy combination, not ohmic combination of CMOS inverter outputs.
Where there exists a time-of-flight mismatch, one pulse arrives at the junction before the other. Fig. 9(a) depicts the operation of a four-port junction between two interwired but velocity-mismatched RTWO loops. Each of these rings has been divided into numbered segments (each as in Fig. 8). Four rings are wired together (similar to Fig. 16, shown later). Only the junction between two of the rings is considered here, the second of which has the higher open-loop operating frequency. From simulation, two pulse-combination effects appear to be present, the simplest of which is the impedance match effect where the first signal to arrive at a junction must try to drive three transmission lines. If all ports have equal impedance, the three lines in parallel present one third of the line impedance, so the junction can only reach half of the full signal value and a reflection occurs, driving an inverted signal back down the incident port [Fig. 9(b)]. The initially detrimental effects on signal fidelity arising from this reflection are overcome when the other pulse arrives, whereupon the pulses combine and branch into the output ports, as shown in Fig. 9(c).
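The size of the junction reflection follows from the standard lossless transmission-line reflection coefficient. The sketch below assumes ideal, equal-impedance ports and ignores the inverter loading; the function name and the 10-Ω value are illustrative, not taken from the simulation deck.

```python
def junction_coefficients(z0, n_outgoing):
    """Reflection and junction-voltage coefficients for one pulse arriving
    on a line of impedance z0 that must drive n_outgoing identical lines.

    The outgoing lines appear in parallel, so the load is z0 / n_outgoing.
    Returns (gamma, junction_fraction), where junction_fraction = 1 + gamma
    is the junction voltage as a fraction of the incident step.
    """
    z_load = z0 / n_outgoing
    gamma = (z_load - z0) / (z_load + z0)
    return gamma, 1.0 + gamma


# First pulse arriving alone at a four-port junction: it sees three lines.
gamma_early, fraction_early = junction_coefficients(z0=10.0, n_outgoing=3)

# A single matched continuation, for comparison: no reflection.
gamma_matched, fraction_matched = junction_coefficients(z0=10.0, n_outgoing=1)

if __name__ == "__main__":
    print(f"early pulse: gamma = {gamma_early:+.2f}, junction = {fraction_early:.2f} of incident")
    print(f"matched:     gamma = {gamma_matched:+.2f}, junction = {fraction_matched:.2f} of incident")
```

The early pulse therefore sees a reflection coefficient of -1/2: the junction rises to half the incident step and an inverted half-amplitude wave returns down the incident port, until the second pulse arrives and the two sources share the load.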
The second pulse-combination effect is believed to be due to nonlinear MOSFET drain capacitance, which can modulate the velocity of the line. Reflections can drive the MOSFETs from the ohmic state into the low-capacitance pinchoff region, locally increasing velocity.
Quantitative Results From Simulation: Fig. 10 presents the results of a SPICE simulation of the above situation with an extreme condition of velocity mismatch. A 50% variation of oxide thickness is modeled across a small 2.4 × 2.4 mm chip having four interconnected rings. Thick-oxide (lower-capacitance) devices are on the right side of the chip, giving a 22.5% phase velocity increase relative to the left side.
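The quoted 22.5% figure is consistent with a simple scaling argument. The sketch below assumes that the line capacitance is entirely gate-oxide capacitance (so it scales as the inverse of oxide thickness), that inductance is unchanged, and that phase velocity scales as the inverse square root of capacitance. These are simplifying assumptions for illustration, not the model used in the SPICE run.

```python
import math


def velocity_ratio(oxide_thickness_ratio):
    """Phase-velocity ratio for a given oxide-thickness ratio.

    Assumes C is proportional to 1/t_ox and v is proportional to 1/sqrt(L*C)
    with L fixed, so v is proportional to sqrt(t_ox).
    """
    return math.sqrt(oxide_thickness_ratio)


# 50% thicker oxide on the right-hand side of the chip.
velocity_increase = velocity_ratio(1.5) - 1.0

if __name__ == "__main__":
    print(f"phase velocity increase: {velocity_increase:.1%}")
```

Under these assumptions a 50% thicker oxide gives a velocity increase of about 22.5%, matching the value stated above.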
Looking at these results with reference to Fig. 9 reveals that the first pulse arrives from ring and passes point A at time ps and begins its rise time. Within this rise time, the leading edge reaches the nearby junction, where negative reflections bounce back to momentarily prevent A passing through the 1.5-V level. The second pulse arrives from the slower left-hand ring , reaching point B at approximately ps. It then combines with the first pulse at the junction to branch into the two output ports without further reflections.
By ps, the signals have reached points A and B and are essentially coincident; forward progress of the waves in rings and is now synchronized. The phase-locking phenomenon occurs at every junction of the array (not just the junction considered here) and twice per oscillation cycle, which accounts for the smaller than expected initial skew seen between the rings.
Simulations of typical arrays show that lockup is achieved within a few nanoseconds from powerup after signals settle into the lowest-energy state of coherent mesh.
B. Coupling Issues Related to Layout
The induced magnetic fields from the rotary clock structures can be strong. This is because $dI/dt$ is relatively high (square waves). The magnetic coupling coefficient, however, depends on the angle between source and victim and falls to zero when the angle becomes 90°. Fig. 11 illustrates a 90° layout technique to minimize inductive coupling problems. The top metal M5 (running left to right) is used to create the differential RTWO, while orthogonal M4 is used as a routing resource for busses into and out of areas bounded by the clock transmission line.
For capacitive coupling, fast rise and fall times imply high displacement currents and a potentially aggressive noise source. Differential transmission lines tend to mitigate such effects [19] , and in Fig. 11 , the total capacitive coupling area between each of the transmission-line conductors and any M4 conductor is balanced. If the clock source were ideally differential, no net charge would be coupled to the M4 wires. For the RTWO, distributed inverters force the waveforms to be substantially differential and nonoverlapping, keeping glitches below the sensitivity of a typical gate.
For the five-metal test chip (Section V), a 45% utilization of M4 was used for the 90° routing pattern immediately underneath the RTWO rings. This coverage allows the M4 to act as both a routing resource and as an electrostatic shield similar to [20], preventing electrostatic coupling to signal lines further below. Magnetic fields are not attenuated much by this configuration, because the spaces between the thin perpendicular M4 lines break up the circulating currents which could repel a magnetic field. Substrate magnetic fields [21] are, therefore, to be expected.
Coupling to co-parallel (0°) victim conductors is potentially much more problematic (discussed later in Section IV-C).
C. Tapoff Issues and Stub Loadings
It is possible to "tap into" the ROA structure (Fig. 11) anywhere along its length and extract a locally two-phase signal with known phase relationship to the rest of the network. This signal can then be routed via a fast differential transmission line to other circuits and will generally represent a capacitive stub on the RTWO ring.
For minimum signal distortion, the round-trip time-of-flight (forward and backward along the stub) must be much less than the rise time and fall time of the clock waveform:
$$\frac{2\,\ell_{stub}}{v_{stub}} \ll t_r,\ t_f \qquad (11)$$
where $\ell_{stub}$ is the stub length and $v_{stub}$ is the signal velocity on the stub. When the above condition is met, the capacitance can be taken as being effectively lumped on the main RTWO ring at the tap point for the purposes of predicting oscillator frequency and ring impedance.
Although not immediately apparent, this condition is achievable in practice due to three factors. The first factor is that the tap line velocity is relatively fast for SiO2 dielectric. It is approximately 0.5c, while the main RTWO oscillator ring might be operating at perhaps 0.07c. The second factor is that the tap length only has to be long enough to reach within a single RTWO ring. The third factor is that it requires two signal rotations on the RTWO to complete a clock cycle. These three factors work together to make the RTWO rings physically small compared to the expected speed-of-light dimensions. The distances to be spanned by the fast tap wires are therefore short enough that transmission-line effects on these lines are unimportant, certainly at the clock fundamental frequency and even at higher harmonics.
This can be illustrated by reference to a specific 3.4-GHz RTWO, 3200 µm long with 20-ps rise/fall times. Within one of these rise or fall periods, a stub transmission line with velocity 0.5c is able to communicate a signal over a distance of 3 mm. For a stub length of 400 µm (to reach the center of the ring), this equates to 3.75 round-trip times along the stub. Fig. 12 shows simulated waveforms with 2 pF of total to-ground capacitance at the end of one such stub. Reflected energy gives rise to the ringing which is evident with this level of capacitance. The line resistance of the stubs must be low to maintain reflective energy conservation.
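The stub example can be reproduced directly from the quoted numbers. This is a minimal sketch; the stub velocity and ring velocity are derived here from the stated distances and times rather than assumed, and all variable names are illustrative.

```python
C_LIGHT = 299_792_458.0   # speed of light in vacuum, m/s

# Values quoted in the text for the 3.4-GHz example ring.
EDGE_TIME = 20e-12        # rise/fall time, seconds
REACH_IN_EDGE = 3e-3      # distance a stub signal covers in one edge, metres
STUB_LENGTH = 400e-6      # stub length to the ring centre, metres
RING_LENGTH = 3200e-6     # ring length, metres
RING_FREQ = 3.4e9         # oscillation frequency, hertz

# Stub velocity implied by "3 mm within one 20-ps edge".
stub_velocity = REACH_IN_EDGE / EDGE_TIME

# Number of stub round trips that fit inside one edge.
round_trips_per_edge = REACH_IN_EDGE / (2 * STUB_LENGTH)

# Ring velocity: one clock cycle takes two rotations of the ring.
ring_velocity = 2 * RING_LENGTH * RING_FREQ

if __name__ == "__main__":
    print(f"stub velocity: {stub_velocity / C_LIGHT:.2f} c")
    print(f"round trips per edge: {round_trips_per_edge:.2f}")
    print(f"ring velocity: {ring_velocity / C_LIGHT:.3f} c")
```

The result is 3.75 stub round trips per edge, as stated, with the stub roughly seven times faster than the wave on the loaded ring.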
The ratiometric factors outlined above between ring length, frequency, rise/fall time, and stub lengths are expected to hold as ROAs are scaled to higher frequencies and smaller ring lengths without requiring special stub tuning measures.
Capacitive Loading Limits: Substantial total-chip capacitive loading can be tolerated by the RTWO relative to conventionally resonant systems [8] , [22] , [23] . However, the loading effects of interconnect, active, and stub capacitances cannot be increased without limit. The consequential lowering of line impedance increases circulating currents until losses become a concern. Eventually, the impedance becomes so low relative to the loop resistance that the relation (6) cannot be maintained, whereupon oscillation ceases altogether. 
D. Frequency/Impedance Adjustment
Rewriting (4) in the form below shows that frequency is set only by the total inductance and capacitance of the RTWO loop. (12) Total loop inductance is proportional to RingLen and varies strongly as a function of the width and pitch of the top metal differential conductors. This allows a coarse frequency selection through the top-metal mask definition. Unit-to-unit inductance variation is expected to be small because of the good lithographic reproduction of the relatively large clock conductors and the weak sensitivity of inductance to metal thickness variations.
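As a hedged sketch of the relation described at the start of this subsection, the dependence of frequency on total loop inductance and capacitance can be derived from the lossless-line phase velocity and the fact, stated earlier, that one clock cycle takes two rotations of the ring. The symbols $L_0$, $C_0$ (per-unit-length differential inductance and capacitance), $L_{total}$, and $C_{total}$ are notation assumed here and may differ from the paper's own.

```latex
\begin{align}
v_p &= \frac{1}{\sqrt{L_0 C_0}}, \\
f_{osc} &= \frac{v_p}{2\,\mathrm{RingLen}}
         = \frac{1}{2\,\mathrm{RingLen}\sqrt{L_0 C_0}}
         = \frac{1}{2\sqrt{L_{total}\,C_{total}}},
\end{align}
\text{where } L_{total} = L_0\,\mathrm{RingLen}
\text{ and } C_{total} = C_0\,\mathrm{RingLen}.
```

In this form the ring length cancels, which is why only the total inductance and total capacitance set the frequency.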
Total capacitance for the RTWO is the sum of all lumped capacitances connected to the loop (1). It tends to be dominated by gate-oxide capacitance from the drive FETs and the clock load FETs.
Gate-oxide capacitance is inversely proportional to gate-oxide thickness, which on a modern CMOS SiO2 process is controlled to approximately 5% variation over extended wafer lots [24]. Drain depletion capacitances exist on bulk CMOS where the active transistors connect to the ring.
During the VLSI layout phase, a CAD tool (expected release: Q1 2002) can target a fixed operating frequency. The tool will be able to correct impedance discontinuities caused by lumped load capacitance by the addition of dummy "padding" capacitance elsewhere around the loop, and postcompensate an overly capacitive-loaded clock network by reducing the differential inductances through pitch reduction, hence restoring velocity and thus frequency. Alternatively, at the expense of using more metallization, a new layout with more numerous, shorter-length rings could be used. The tool will need to simultaneously solve impedance matching issues [refer to Section II-A, (5)]. By manipulation of both inductance and capacitance simultaneously, it is possible to control impedance and velocity independently, as shown diagrammatically in Fig. 13. For example, velocity can be reduced by increasing both inductance and capacitance by the same factor to cancel the effect on impedance. These adjustments can support arbitrary branch-and-combine networks (at least in theory).
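The independence of impedance and velocity can be illustrated with the standard lossless-line relations Z0 = sqrt(L/C) and v = 1/sqrt(L·C). The per-unit-length values below are hypothetical, chosen only to give an impedance near 10 Ω; they are not taken from the test chip.

```python
import math


def line_parameters(l_per_m, c_per_m):
    """Characteristic impedance (ohms) and phase velocity (m/s) of a
    lossless line with the given per-unit-length inductance and capacitance."""
    z0 = math.sqrt(l_per_m / c_per_m)
    velocity = 1.0 / math.sqrt(l_per_m * c_per_m)
    return z0, velocity


# Hypothetical per-unit-length values (illustrative only).
L0 = 4.5e-7   # H/m
C0 = 4.5e-9   # F/m

z_base, v_base = line_parameters(L0, C0)

# Scale inductance and capacitance by the same factor k:
# impedance is unchanged, velocity drops by 1/k.
K = 2.0
z_scaled, v_scaled = line_parameters(K * L0, K * C0)

if __name__ == "__main__":
    print(f"base:   Z0 = {z_base:.2f} ohm, v = {v_base:.3e} m/s")
    print(f"scaled: Z0 = {z_scaled:.2f} ohm, v = {v_scaled:.3e} m/s")
```

Scaling both quantities together moves velocity (and hence frequency) while leaving the junction impedance match untouched; scaling them in opposite directions would do the reverse.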
Post fabrication, adding together the sources of variation and given that frequency is related to and , a 5% initial tolerance of operating frequency between parts is expected. Matching within a die should be better, but temperature gradients and transistor size variations as they affect capacitance will lead to phase velocity changes requiring correction by the Skew Control mechanism (described in Section III-A).
Temperature can alter frequency through variation of and . Inductance variation is assumed to be negligible compared to capacitance variation and is not considered. Gateoxide thickness variation could potentially affect , but for SiO dielectric, with properties similar to quartz, this can be ignored. More significant are temperature variations of drain depletion capacitance and of transistor . To tune an ROA clock to an exact reference frequency, allowing limited "speed-binning" and reduced internal phase mismatches, closed-loop control of distributed switched capacitors [9] or varactors [25] is envisaged.
E. Active Compensation for Interconnect Losses
Resistive interconnect losses make it difficult to communicate high-frequency clock signals over a large chip without waveshape distortion and attenuation, which impacts on the practicality of reflective energy conservation schemes [6] , [22] , [23] . The skin effect loss mechanism has been evident in clock tree conductors for some time [26] and is frequency dependent. High-speed H-trees tend to use hierarchical buffers within the trees to maintain amplitude and edge rates.
Active compensation of VLSI differential transmission lines to overcome clock attenuation was shown by Bußmann and Langmann [27] to be applicable to sine-wave signals. Shunt-connected negative impedance convertors (NICs) were used with linear compensation to prevent oscillations.
The distributed inverters used within RTWOs afford active compensation for transmission-line losses, raising the apparent $Q$ of the resonant rings and helping to maintain a uniformly high clock amplitude around the structure.
F. Logic Styles
Two-phase latched logic [28] is the style most compatible with RTWO. It is highly skew tolerant and through dataflow-aware placement [27] offers the potential to exploit the full 360° of clock phase to reduce clock-related surging [29], which in future systems could exceed 500 A [30]. Conventional single-phase D-latch designs can be driven where timing improvements through skew scheduling [31] might be possible. A locally four-phase system to support domino logic [2] could be implemented by wrapping two loops of RTWO line around the region to clock. Unfortunately, all of these techniques are beyond the capability of current logic synthesis tools.
IV. SIMULATED PERFORMANCE
A. Approach
To enable rapid "what-if" evaluation of potential RTWO structures, a simulation/visualization program known as Rotary Explorer [32] has been developed. Rotary Explorer is GUI driven and parametrically creates a SPICE deck of macromodels linking to FASTHENRY subcircuits [33] for multipole magnetic analysis of skin, proximity, and LR coupling effects in the time domain. MOSFETs are modeled using BSIM3v3 nonquasi-static model with an external resistor added to model (Fig. 8 ). The BSIM4 model [34] , which properly accounts for as a D-S channel component, was not available. With the Rotary Explorer program, it is possible to simulate RTWO rings independently or as interlocked arrays. The effects of tap loads, oxide thickness variations, and magnetically induced "victim" noise can be evaluated.
As a visualization aid, Rotary Explorer gives a "live" display of color-coded SPICE voltages projected onto a scaled image of the ROA structure being simulated. This aids in the intuitive understanding of reflections and how the structure achieves a steady-state phase-locked operation.
B. Results
Two very important performance metrics for any oscillator are its sensitivity to changes in temperature and supply voltage. Simulations of these effects on a nominally 3.34-GHz rotary clock resulted in the data given in Tables II and III.
Supply Induced Jitter: Following on from the above and in light of the RTWO's time-of-flight oscillation mechanism, it is inferred that such voltage sensitivity will also apply to phase modulation versus voltage, i.e., jitter, at least at low supply-noise frequencies. For a single RTWO ring, the power-supply induced jitter will be related to and the power-supply rejection ratio (PSRR) by (13) where , because of the distributed nature of the oscillator, is the mean supply voltage deviation as experienced along the path of an edge as it travels two complete rotations. To improve PSRR, plans are in place to add voltage-dependent capacitance to the structure to give first-order compensation.
From simulations, we see that jitter reduces for multiple ring structures due to averaging effects.
C. Coupling II-Simulated Coupling
The Rotary Explorer program makes it easy to simulate coupled noise between an RTWO ring and user defined victim trace (drawn with the aid of a mouse). Simulated results are shown in Table IV for a 3.4-GHz RTWO configured to have 20 ps rise and fall times, and with geometry as shown in Fig. 14 .
Peak coupling magnitude occurs at 60-µm victim length. A trace longer than this will see a coupling cancellation effect that approaches zero for each pitch of the braiding it traverses. Fig. 15 illustrates a notably strong coupled signal waveform at victim distance m, with no loading on the victim trace and one end connected to ground. Note the more sensitive noise scale.
The absolute maximum coupling occurs if victim distance is allowed to go to zero. In this case, mutual coupling between aggressor and victim is 100% with no cancellation effects from the other differential trace. As a numerical example, it follows that a 2.5-V signal with a rise time of 20 ps on a transmission line with a velocity of approximately $2.15 \times 10^7$ m/s has the 2.5-V gradient over 430 µm of length (Fig. 4 illustrates the concept). Over the 60-µm length discussed above, this equates to 348 mV. Slower edge rates, faster transmission lines, and lower supply voltages reduce this figure proportionally.
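The worst-case figure above is a linear-gradient estimate and can be reproduced as follows. The sketch assumes a linear voltage ramp along the edge and 100% coupling, as the text describes; the velocity is derived from the quoted 430-µm edge length rather than assumed, and the names are illustrative.

```python
# Values quoted in the text.
V_SWING = 2.5            # signal swing, volts
RISE_TIME = 20e-12       # rise time, seconds
EDGE_LENGTH = 430e-6     # physical length occupied by the edge, metres
VICTIM_LENGTH = 60e-6    # victim length at peak coupling, metres

# Line velocity implied by an edge that spans 430 um in 20 ps.
line_velocity = EDGE_LENGTH / RISE_TIME

# Voltage difference across the victim length for a linear ramp
# spread over the edge length (worst case, 100% coupling).
worst_case_coupled_v = V_SWING * VICTIM_LENGTH / EDGE_LENGTH

if __name__ == "__main__":
    print(f"implied line velocity: {line_velocity:.3e} m/s")
    print(f"worst-case coupled voltage: {worst_case_coupled_v * 1e3:.1f} mV")
```

The estimate comes out just under 349 mV, in line with the 348 mV quoted. Because the result scales with swing and victim length and inversely with edge length, the proportional reductions named in the text follow directly.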
Long-range inductive noise coupling from the differential transmission line is expected to be small, since (from a distance) the 'go' and 'return' currents are equal and opposite.
Potential problems exist in short-range magnetic coupling to wiring in the vicinity of the clock lines. Inductance is lowered by coupling to any highly conductive structure in which eddy currents can flow to decrease and distort the inducing field. Couplings to less conductive circuits such as the substrate give a loss mechanism which can be modeled as a shunt term in the transmission-line equations. LC resonance in the small-scale coupled structures is unlikely because of the high resonant frequencies. All of the coupling mechanisms mentioned are edge-rate dependent, and this can limit the achievable rise and fall times of the RTWO by attenuating the high-frequency signal components.
Full RLC layout extraction is essential in the neighborhood of the clock lines if routing is allowed in these areas. An alternative proposal under investigation is to predefine a VLSI structure combining clock and power distribution into the same grid to give consistent characteristics and shielding.
V. SOME EXPERIMENTAL RESULTS
Fig. 16 shows a die photograph of a prototype built using a 0.25-µm 2.5-V CMOS process with 1-µm Al/Cu top metal M5. The conductors are relatively wide in order to minimize resistive losses of the rather thin M5. The available top-metal area consumed by the transmission lines was 15%. A general feature of the RTWO and ROA is that power can be reduced by increasing the metal area devoted to clock generation. The simple substitution of copper metallization could halve the width of the lines for the same power consumption.
The prototype features a large ring independent of four interconnected smaller rings. The 12 000-µm outer ring uses 60-µm conductors on a 120-µm pitch, with 128 62.5-µm/25-µm inverter pairs distributed along its length.
For the large ring, simulations predicted a clock frequency of approximately 925 MHz. Measurements of the actual performance versus simulated with V are shown in Fig. 17 . The oscillation frequency was 965 MHz. Jitter was measured at 5.5 ps rms using a Tektronix 11 801A oscilloscope with an SD-26 sampling head.
The slower than simulated rise-time discrepancy is believed to be due to the large extrinsic gate electrode resistance on the Pch FETs. At design time, the importance of this parameter was overlooked. Transistors are now laid out according to RF design rules with the gate driven from both sides of the device. Fig. 18 shows that the oscillation frequency versus is quite flat over a large . We calculate from the measured slope that PSRR is approximately 34 dB for oscillators fabricated on this process. The oscillator was seen to be functional down to 0.8-V supply voltage, although 1.1 V was required to initiate startup.
The test chip incorporates 15 pF of on-chip decoupling capacitance per ring. No off-chip decoupling was required. Effectively, the equivalent of ten single-ended lines each having 10-Ω impedance were active, but simultaneous switching surges are low because of the distributed switching times of the inverters.
The quad of inner rings each have the following characteristics:
Fig. 19 shows the measured waveform from one of the 3.4-GHz rings. The oscillation frequency is 3.38 GHz versus a simulated frequency of 3.42 GHz. However, the waveshape is disappointingly distorted, the amplitude is low, and even-mode artifacts are visible.
Investigation of the fault identified a "co-parallel" (0°) inductive coupling problem between the clock signal lines and the supply traces running directly beneath on M3 for the complete loop length. Only when a complete FASTHENRY analysis was performed including these power traces was it apparent that induced current loops (circulating through the decoupling capacitors) were strongly attenuating the rotary signal. In this condition, the latching action (Fig. 7) does not fully develop and the rings support linear amplification of noise signals, hence the problematic multimode action. (This effect was much less severe on the large 965-MHz ring because the lines were much closer to the magnetically neutral center line of the transmission line.) The problem can be mitigated by use of braided transmission lines (as detailed in Section IV-C).
Analysis of the test chip showed that 90° coupling between M5 and the orthogonal thin M4 lines is not a significant problem, making it possible to route power and signals between regions bounded by the rotary clock structures.
VI. CONCLUSION AND FURTHER WORK PLANNED
This paper has described the rotary traveling-wave oscillator (RTWO) and its potential application to gigahertz-rate VLSI clocking. The oscillator is unique for a resonant-style LC-based oscillator in that it produces square waves directly and can be hardwired to form rotary oscillator arrays (ROAs). Being LC-based, the oscillator is stable and jitter is low.
The formulas presented here give practical adiabatic oscillator designs suitable for VLSI fabrication. The structure and operation of the RTWO is fundamentally simple and amenable to analysis. We find that agreement between simulation and measurement is good.
We need to demonstrate skew control (believed to be inherent) to fully establish that the simulated performance of multiring ROAs is realizable, and to measure susceptibility to induced high-frequency noise. Further work is planned to establish firm mathematical/analytical foundations for the prediction of both jitter and skew and to determine exact stability criteria for arrayed oscillators. Currently, a test chip using braided transmission line design to minimize coupling and incorporating varactors to control frequency is awaiting packaging and test.
Looking to the future, our simulations predict that the oscillator scales well. On a more modern 0.18-µm copper process, 10.5-GHz square-wave oscillator/distributors should be realizable consuming less than 32 mA per ring using slimmer 10-µm conductors. From simulation, the RTWO also appears to be viable on SOI processes.
