I. INTRODUCTION
Clock distribution is a widely demanded task, but fundamental issues remain to be solved, especially when reducing timing skews becomes critical. For example, in systems with a large number of digitization channels, the timing skews of the clock signals driving the digitization devices in different modules often must be controlled within a small range. Even on a single printed circuit board, multiple devices may need to be driven by clocks with identical phases.
The most commonly used clock distribution scheme between modules is the parallel fan-out scheme shown in Fig. 1(a). The master clock is input to a fan-out module that produces multiple copies of the clock. The connections between the fan-out module and the receiving modules are point-to-point. They are usually terminated at both the transmitting and receiving ends to match the cable impedance, which eliminates reflections due to impedance mismatch. This scheme has several issues. First, the timing skews of the buffers and connecting cables in different output channels must be well matched at the design stage. However, these skews may vary with temperature during operation or when cables are replaced with spares. Second, the power consumption of this scheme is relatively high.
An alternative scheme is to send the clock into a multi-tap cable. The receiving modules pick up the clock at the taps using high-impedance buffers, as shown in Fig. 1(b). Usually, the cable is terminated at both ends. Power saving is an obvious advantage of this scheme, but the skews between modules are large due to the propagation delay in the cable.
The author is with Fermi National Accelerator Laboratory, Batavia, IL 60510 USA (phone: 630-840-8911; fax: 630-840-2950; e-mail: jywu168@fnal.gov).
If the termination at the far end of the cable is removed, the clock signal is reflected back from the cable end. The reflected signals carry useful information about the cable length and can be used to compensate the timing skews. In Reference [1], narrow pulses are sent into an open-ended transmission line and reflected.
The mean-times of the transmitting and reflecting pulses are identical at all the taps. A circuit called an Interval-Halving PLL is used to extract the mean-time of each pulse pair. The cable delay can also be compensated by adjusting the internal delays of each node [2]. Alternatively, the times of the edges of the pulse pairs may be measured using time-to-digital converters (TDCs) and processed digitally [3] [4].
All these schemes require a relatively complex circuit in each receiving node, which is inconvenient for module designers.
In this paper, we describe a novel clock distribution method: trapezoidal clocking in an open-ended transmission line. The cable connection is similar to that of the regular multi-tap clock distribution shown in Fig. 1(b). The difference is that the impedance is matched only at the transmitting buffer and not at the far end of the cable. The receiving circuit is simply a high-impedance comparator that uses the DC level of the clock pulses as the reference threshold. Trapezoidal clock pulses with sufficiently long rising and falling ramps are fed into such a transmission line. The transmitting and reflecting signals then superpose, and this naturally compensates the timing skews at different taps due to propagation delays.
II. LINEAR RAMPING IN OPEN-ENDED TRANSMISSION LINES
The inspiration for the trapezoidal clocking scheme came from the long-abandoned analog mean-timer schemes [5] [6]. To find the mean-time of two pulses in an analog mean-timer, a voltage ramps at a slope of 1 unit when the first pulse arrives and at a slope of 2 units when the second pulse arrives. The same process occurs naturally in an open-ended cable, without any circuit, when a trapezoidal pulse is fed into the cable. This principle is illustrated in Fig. 2. Consider a tap of the open-ended cable with a timing separation x from the cable end. The transmitting trapezoidal pulses arrive at the tap first (solid line). They then arrive at the cable end (dashed line) after a time delay x, as shown in the top traces of Fig. 2. Since the cable is not terminated, the pulses are reflected back immediately with the same polarity. The voltage measured at the cable end is still a train of trapezoidal pulses, except that the amplitude doubles. If the cable is lossless and the ramping time of the pulses is longer than 2x, the voltage measured at any tap in the cable also reaches a doubled peak amplitude. The most attractive feature is that the traces from all taps overlap in a section near the zero-crossing point. (We define the DC level of the pulses as the zero-crossing reference threshold.)
The timing skew is compensated naturally.
Assume that the ramping is linear, and define the zero crossing at the cable end as the time reference point (t = 0). At a tap with time delay x from the cable end, the transmitting and reflecting ramps with slope h can be written:
$$V_T(t,x) = h\,(t + x), \qquad V_R(t,x) = h\,(t - x).$$
During the time interval in which both the transmitting and reflecting ramps are present at the tap, their superposition is:
$$V(t,x) = V_T(t,x) + V_R(t,x) = 2ht.$$
The sum voltage does not depend on x, so the sum voltages at all the cable taps have the same values near the zero-crossing point. Each tap has a comparator connected to the sum voltage, with the DC level of the pulses as its threshold. All the comparators therefore flip at the same time, and the skews due to cable delay are cancelled.
The sum voltage at a tap first ramps at a slope h due to the transmitting signal. After the reflecting signal arrives, it ramps at a slope 2h. This is the same process as in the analog mean-timers, except that it happens naturally in the cable.
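The isochronal zero crossing can be reproduced numerically. The following is a minimal Python sketch, not the measurement setup of Fig. 3. It assumes an ideal lossless cable and a unit-amplitude symmetric trapezoid with the 200 ns period and 50 ns ramps used in this paper. The taps are placed at 0, 4, 8 and 12 ns from the open end, mirroring the 4 ns cable segments. The helper names are illustrative. The sketch compares the rising-edge zero crossings with the cable terminated (transmitting wave only) and open (transmitting plus reflecting wave).

```python
PERIOD_NS = 200.0   # 5 MHz clock
RAMP_NS = 50.0      # rising and falling ramp time
AMPLITUDE = 1.0     # ramps run from -AMPLITUDE to +AMPLITUDE (DC level = 0)


def trapezoid(t):
    """Symmetric trapezoid whose rising-edge zero crossing is at t = 0."""
    flat = PERIOD_NS / 2 - RAMP_NS
    u = (t + RAMP_NS / 2) % PERIOD_NS          # phase measured from the start of the rising ramp
    if u < RAMP_NS:                            # rising ramp
        return -AMPLITUDE + 2 * AMPLITUDE * u / RAMP_NS
    if u < RAMP_NS + flat:                     # flat top
        return AMPLITUDE
    if u < 2 * RAMP_NS + flat:                 # falling ramp
        return AMPLITUDE - 2 * AMPLITUDE * (u - RAMP_NS - flat) / RAMP_NS
    return -AMPLITUDE                          # flat bottom


def crossing(v, lo=-20.0, hi=20.0):
    """Bisection for the rising zero crossing of a non-decreasing v(t) in [lo, hi]."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if v(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def terminated(x):
    # Only the transmitting wave, which reaches a tap x ns before the cable end.
    return crossing(lambda t: trapezoid(t + x))


def open_ended(x):
    # Transmitting wave (x ns early) plus reflecting wave (x ns late).
    return crossing(lambda t: trapezoid(t + x) + trapezoid(t - x))


for x in (0.0, 4.0, 8.0, 12.0):
    print(f"x = {x:4.1f} ns  terminated: {terminated(x):7.3f} ns  open: {open_ended(x):7.3f} ns")
```

With the termination, the crossing at each tap is advanced by its delay x. With the end open, every tap crosses at t = 0, provided the ramp time exceeds 2x.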
Oscilloscope traces of trapezoidal clock pulses inside a coaxial cable are shown in Fig. 3. The clock frequency used in this paper is 5 MHz, which is the lower limit for many phase-locked loop (PLL) circuits in today's FPGA devices. The leading and trailing ramps of the clock pulses each have a nominal duration of 50 ns. Since the ramp time must exceed 2x, this allows a maximum cable length of 25 ns, which is sufficient for clock synchronization between modules within a few meters.
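The timing budget above can be reproduced with a few lines of arithmetic. The velocity factor of 0.66 is an assumption (a typical value for solid-polyethylene coax such as RG-174), not a figure from this paper. With it, a 25 ns delay corresponds to roughly 5 m of cable, consistent with "a few meters".

```python
CLOCK_HZ = 5e6
RAMP_NS = 50.0
VELOCITY_FACTOR = 0.66          # assumption: typical for solid-polyethylene coax
C_M_PER_NS = 0.299792458        # speed of light in vacuum, m/ns

period_ns = 1e9 / CLOCK_HZ                          # 200 ns clock period
max_delay_ns = RAMP_NS / 2                          # ramp time must exceed 2x
delay_per_m_ns = 1 / (VELOCITY_FACTOR * C_M_PER_NS) # about 5 ns per meter
max_length_m = max_delay_ns / delay_per_m_ns

print(f"period = {period_ns:.0f} ns, max tap-to-end delay = {max_delay_ns:.0f} ns")
print(f"{delay_per_m_ns:.2f} ns/m -> max cable length ~{max_length_m:.1f} m")
```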
In Fig. 3, trapezoidal clock pulses are fed into typical RG-174/U cables, 4 ns per segment. The cables are connected through T connectors plugged into the oscilloscope inputs, which are AC coupled at high impedance. The signal passes through channels 1, 2, 3 and 4 of the oscilloscope to emulate modules in a system. The cable is either terminated with a 50-ohm resistor at channel 4 or left open.
As shown in the top traces of Fig. 3, when the cable is terminated, the reflected wave is eliminated. All channels then receive clock pulses of the same shape, with a time delay of approximately 4 ns between adjacent channels.
If the termination resistor is removed, as shown in the bottom traces of Fig. 3, the waveforms seen in different channels differ due to the addition of the reflecting waves. However, the zero-crossing times in all channels are identical.
When the termination is removed, the voltage ramping slope at the zero-crossing doubles, which helps reduce timing jitter. For a given clock buffer drive strength, the open-cable approach also uses the signal energy more efficiently by reusing the reflected signals.
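To first order, a comparator converts voltage noise into timing jitter through the slope at the threshold: σ_t ≈ σ_v / (dV/dt). The sketch below assumes a hypothetical 1 mV RMS noise at the comparator input and a hypothetical incident slope of 2 V per 50 ns. Under these assumptions, doubling the slope from h to 2h halves the jitter contribution.

```python
def jitter_ps(noise_v_rms, slope_v_per_ns):
    """First-order jitter estimate: RMS voltage noise divided by the slope at the threshold."""
    return 1e3 * noise_v_rms / slope_v_per_ns


NOISE_V = 1e-3          # hypothetical RMS noise at the comparator input (V)
H = 2.0 / 50.0          # hypothetical incident slope: 2 V over a 50 ns ramp (V/ns)

terminated = jitter_ps(NOISE_V, H)       # only the transmitting ramp, slope h
open_ended = jitter_ps(NOISE_V, 2 * H)   # transmitting + reflecting ramps, slope 2h
print(f"terminated: {terminated:.1f} ps, open-ended: {open_ended:.1f} ps")
```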
The pulse need not be an isosceles trapezoid, nor do the ramp and flat times need to divide the pulse period evenly. Other shapes with linear ramps, such as triangle and sawtooth waves, exhibit a similar isochronal property. (These shapes are simply special cases of the trapezoid.) A relatively symmetric shape is chosen to avoid rapid voltage changes at the taps, which minimizes noise from high-frequency components.
Actual systems deviate from ideal transmission lines. For example, the input impedance of each channel is not infinite, which is associated with the non-flat tops of the waveforms in the lower traces. Some of these effects cancel due to symmetry. The non-flat tops of the waveform may change the charge integrals of the positive and negative portions of the pulse. Because of symmetry, however, the charge variations in the positive and negative portions are the same, so they cancel each other. The DC level of the clock pulses is therefore unchanged, and the isochronal property of the zero-crossings in all channels is maintained.
III. HIGHER ORDER EFFECT DUE TO CABLE LOSS
For small systems, such as a few modules within a crate, the total length of the connecting cables can be kept short. If high-quality cables are chosen, the resistive loss of the cables is then negligible. In this case, a simple trapezoidal clock pulse with linear ramps, as described in the previous section, provides a sufficiently well-synchronized clock to all modules.
In practical systems, cables are never lossless, and relatively long cables are often needed. With linear ramps, the resistive loss of the cables leaves the timing skews incompletely cancelled, and this section addresses that effect. This effect is shown in Fig. 4. The setup is the same as in Fig. 3, except that the pulse amplitude is changed to 2 Vpp. When the cable is terminated, as shown in the reference traces, the zero-crossing times of adjacent channels differ by about 4 ns. When the termination is removed, the timing skews are mostly cancelled, but some residual skew remains. For example, the timing difference between channel 1 and channel 4 is about 950 ps. This is much smaller than the 12 ns observed with the terminated cable, but it is still too large for some applications.
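To understand the effect of the cable loss, we write a generic ramping function f(t) as the sum of an even term $f_+(t)$ and an odd term $f_-(t)$:
$$f(t) = f_+(t) + f_-(t), \qquad f_+(-t) = f_+(t), \qquad f_-(-t) = -f_-(t).$$
We also assume that the ramping function passes through 0 at t = 0.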
The cable loss can be modeled as exponential signal attenuation, with an attenuation constant a per unit of time delay. For convenience, the cable end is used as the reference point for both time and cable length. At any tap with time delay x from the cable end, the cable loss makes the transmitting signal larger than it is at the cable end and the reflecting signal smaller. The sum voltage of the transmitting and reflecting signals can be written:
$$V(t,x) = e^{+ax} f(t+x) + e^{-ax} f(t-x),$$
$$V(0,x) = e^{+ax} f_-(x) + e^{+ax} f_+(x) + e^{-ax} f_-(-x) + e^{-ax} f_+(-x) = e^{+ax} f_-(x) + e^{+ax} f_+(x) - e^{-ax} f_-(x) + e^{-ax} f_+(x) \qquad (5)$$
Suppose the cable is lossless (a = 0) and the ramping voltage is any odd function f(x), not just a linear ramp. The time skews are then cancelled at every tap, i.e., V(0,x) = 0 regardless of the value of x. A very attractive odd ramping function is the sine function, which is discussed in Section V.
When the cable loss is not negligible, the ramping function should contain a small even term to maintain isochronal zero-crossings. The functional form of the even term is determined by the odd term, as given in the following.
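Grouping the terms of Eq. (5) by parity makes this explicit. Using $f_-(-x) = -f_-(x)$ and $f_+(-x) = f_+(x)$:

$$V(0,x) = \left(e^{ax}-e^{-ax}\right) f_-(x) + \left(e^{ax}+e^{-ax}\right) f_+(x) = 2\sinh(ax)\,f_-(x) + 2\cosh(ax)\,f_+(x).$$

For $a = 0$ this reduces to $2f_+(x)$, which vanishes for every $x$ when $f$ is odd. For $a \neq 0$, requiring $V(0,x) = 0$ at every tap gives $f_+(x) = -\tanh(ax)\,f_-(x)$, which is Eq. (6).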
$$f_+(x) = -\tanh(ax)\, f_-(x) \qquad (6)$$
Since the cable attenuation is usually small, the hyperbolic tangent can be approximated as $\tanh(ax) \approx ax$. Therefore, if the odd term of the ramping voltage is linear, $f_-(x) = hx$, the even term has the simple quadratic form $f_+(x) \approx -ahx^2$.
Therefore, to further cancel the time skews due to cable loss, simply add a second order term to the linear ramping voltage. The ratio between the linear and the second order terms is to be adjusted according to the cable attenuation. Note that adjusting this ratio is only needed for different types of the cables, not the lengths of the cables to be used in the system.
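The second-order correction can be checked against the loss model of Eq. (5). For a purely linear ramp, the crossing at tap x solves $e^{ax}(t+x) + e^{-ax}(t-x) = 0$, i.e. $t = -x\tanh(ax) \approx -ax^2$. The residual skew therefore grows quadratically with the tap's distance from the cable end. The sketch below makes several assumptions:

- a purely illustrative attenuation a = 0.005 per ns, not a measured RG-174 value;
- ramps of unit slope that are long enough to overlap at every tap;
- taps up to 24 ns from the cable end.

Under these assumptions, the sketch compares a linear ramp, the approximate correction f(t) = t - a t², and the exact form of Eq. (6).

```python
import math

A = 0.005  # illustrative attenuation constant per ns of cable delay (assumption)


def sum_voltage(f, t, x, a=A):
    """Eq. (5) model: transmitting wave (x ns early, gain e^{+ax}) plus
    reflecting wave (x ns late, gain e^{-ax}) at a tap x ns from the cable end."""
    return math.exp(a * x) * f(t + x) + math.exp(-a * x) * f(t - x)


def zero_crossing(f, x, lo=-5.0, hi=5.0):
    """Bisection for the time at which the sum voltage at tap x crosses 0."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if sum_voltage(f, mid, x) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def linear(t):
    return t                          # odd term only


def corrected(t):
    return t - A * t * t              # odd term plus approximate even term -a*t^2


def exact(t):
    return t - math.tanh(A * t) * t   # even term taken from Eq. (6)


for x in (4.0, 8.0, 12.0, 24.0):
    print(f"x = {x:4.1f} ns  linear: {1e3 * zero_crossing(linear, x):9.3f} ps  "
          f"corrected: {1e3 * zero_crossing(corrected, x):7.3f} ps  "
          f"exact: {1e3 * zero_crossing(exact, x):7.3f} ps")
```

With the assumed a, the linear ramp leaves a skew of about -0.72 ns at x = 12 ns. The quadratic correction reduces the skew by more than two orders of magnitude, and the tanh form of Eq. (6) removes it entirely within the model. The actual numbers depend entirely on the assumed a.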
The effect of adding a small second-order term to the linear ramp can be seen in Fig. 5. The clock pulse used in this test is shown in the upper trace of Fig. 5(a). It is the sum of the linear-ramp trapezoid and a small second-order term, which are shown in the middle and lower traces of Fig. 5(a). With the pulses sent into the open-ended cable, the zero-crossing time skews between channels are now well below 50 ps. This is shown in the traces in the right portion of the screen in Fig. 5(b). For comparison, the zero-crossing time skews of the linear ramps are displayed in the middle of the screen. The timing jitter of these ramps was also measured, with RMS values of around 20-30 ps.
Another interesting waveform is the exponential ramp generated with a simple RC circuit. The exponential ramps, the trapezoidal pulses, and their sum are shown in Fig. 6(a). The exponential ramp contains linear, second-order, and higher-order terms, and it is summed with the trapezoidal pulse. If the ratio between the linear and second-order components is adjusted to the correct value for the cable loss, very good skew cancellation can be achieved, as shown in Fig. 6(b). A convenient feature of the exponential ramps is how the ratio is fine-adjusted. This is done by changing the phase relationship between the exponential ramps and the trapezoidal pulses while keeping the amplitudes of both unchanged. This is especially convenient when both waveforms are derived from logic-level outputs of the FPGA.
IV. HARDWARE CONSIDERATIONS
The philosophy behind the trapezoidal clocking scheme is to keep the clock receivers simple. As described in the previous sections, clock pulses with appropriate shapes are generated in the clock driver module.
Each module receiving the clock needs only a simple zero-crossing comparator to reproduce an isochronal clock. This section discusses the hardware for the clock driver and receivers in detail.
A. The Clock Receiver Circuit
The clock pulse voltage should be picked up in each module with minimum perturbation to the transmission line. The input connector is connected directly to a high-impedance buffer through the shortest possible trace. The buffer output is sent to a comparator to produce clock pulses at logic levels. The comparator threshold level is as crucial as the clock pulse shape, since it significantly affects the time skew of the reproduced clock.
Ground can serve as both the common reference and the comparator threshold if the clock driving module and the clock receiving module both have positive and negative supply voltages. In systems with single-polarity power supplies, the clock pulses are transmitted in the coaxial cable with a DC offset. The comparator threshold is then a mid-rail voltage, and the clock pulse is AC coupled to the comparator input. A possible circuit is shown in Fig. 7.
Fig. 7. A possible clock receiver circuit (labeled components: C1, R1, and the Vcc/2 threshold reference).
Usually, the time constant R1·C1 is chosen to be at least 1000 times the clock period. It is recommended that the mid-rail reference voltage be bypassed to ground at the comparator threshold pin.
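The rule of thumb above translates into component values as follows. This is a sketch with a hypothetical R1 = 10 kΩ. The 1000× ratio and the 200 ns period come from the text. The resulting high-pass corner of the AC coupling lies far below the 5 MHz clock frequency.

```python
import math

CLOCK_PERIOD_S = 200e-9      # 5 MHz clock used in this paper
RATIO = 1000                 # R1*C1 >= 1000 x clock period
R1_OHM = 10e3                # hypothetical choice of R1

tau_min_s = RATIO * CLOCK_PERIOD_S             # minimum time constant: 200 us
c1_min_f = tau_min_s / R1_OHM                  # corresponding minimum C1: 20 nF
f_corner_hz = 1 / (2 * math.pi * tau_min_s)    # high-pass corner of the AC coupling

print(f"tau >= {tau_min_s * 1e6:.0f} us, C1 >= {c1_min_f * 1e9:.0f} nF, "
      f"corner ~{f_corner_hz:.0f} Hz")
```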
B. The Clock Pulse Generator
It is tempting to use a digital-to-analog converter (DAC) to generate signals with complicated shapes. In our case, however, generating the entire clock pulse with a DAC would require a high-speed, high-precision device. It is recommended instead to generate the trapezoidal pulse and the second-order term in separate circuits and sum them, as shown in Fig. 8. Because the second-order term is relatively small, it is first scaled down by a voltage divider before being added to the trapezoidal pulse. This relaxes the speed and precision requirements on the DAC or other circuit used to generate the second-order term. The second-order term can also be generated using analog circuits, such as cascaded integrators. If the exponential ramps described in the previous section are used, they can be generated with a simple RC circuit.
The two outputs from the FPGA have opposite logic levels to reduce digital noise in the ground plane. (However, they are assigned to LVCMOS or another low-impedance voltage-source output standard, not to LVDS or other differential output standards.) The outputs are driven in a 1, Z, 0, Z sequence (Z = high impedance) with 50 ns per step. One FPGA output charges the integrator, while the other sets the integrator's reference voltage level. Symmetrical trapezoidal pulses are thus generated through the charging and discharging processes. The time constant RC is determined by the required output voltage swing. The resistor R1 and capacitor C1 have large values and establish the appropriate DC offset in normal operation.
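The 1, Z, 0, Z drive pattern can be illustrated with an idealized behavioral model. This is not a circuit simulation, and it makes several assumptions:

- The integrator is ideal.
- Its input current is set by the full voltage difference between the two oppositely driven FPGA outputs across R.
- No current flows while both outputs are tri-stated (Z).
- The sign inversion of a practical inverting stage is ignored.
- The 3.3 V drive and RC = 82.5 ns are hypothetical values, chosen so that the swing V_DRIVE × 50 ns / RC is 2 V.

```python
STEP_NS = 50.0
SEQUENCE = ("1", "Z", "0", "Z")      # drive pattern, one state per 50 ns step
V_DRIVE = 3.3                        # hypothetical voltage across R while driven (V)
RC_NS = 82.5                         # hypothetical RC; swing = V_DRIVE * STEP_NS / RC_NS
DT_NS = 0.5                          # simulation time step


def integrator_output(n_periods=2):
    """Ideal integrator: ramps while the pins drive 1 or 0, holds while they are Z."""
    rates = {"1": +V_DRIVE / RC_NS, "0": -V_DRIVE / RC_NS, "Z": 0.0}  # V per ns
    steps_per_state = int(round(STEP_NS / DT_NS))
    v, samples = 0.0, []
    for _ in range(n_periods):
        for state in SEQUENCE:
            for _ in range(steps_per_state):
                v += rates[state] * DT_NS
                samples.append(v)
    return samples


wave = integrator_output()
swing = max(wave) - min(wave)
print(f"period = {len(SEQUENCE) * STEP_NS:.0f} ns, swing = {swing:.3f} V")
```

Each 50 ns "1" or "0" step produces one ramp, and each Z step produces one flat section. The sequence therefore yields a symmetric trapezoid with a 200 ns (5 MHz) period.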
C. Cable Detection and Characterization
Since the clock pulse is not terminated at the cable end but is instead reflected back, useful information about the cable length can be extracted in the clock driver module. The zero-crossing time is measured before and after the series impedance-matching resistor. The difference between the two is approximately the cable length, expressed as a one-way delay. This time difference provides continuous monitoring of the cable status. Cable delay variations due to temperature changes can be measured and used for high-precision offline calibration. The measurement will also detect loose connectors or resistance changes due to oxidation of the metal contact points.
The zero-crossing time is measured with a TDC implemented in the FPGA. Our previous work [3] has shown that TDCs in today's FPGA devices can make time measurements with a resolution better than 20 ps.
It is useful for a clock driver to detect whether a cable is plugged into its output. If the clock driver detects no cable at the initialization stage, the output can be turned off to reduce power consumption and noise. The detection is simple. If no cable is plugged into the output, no current flows through the impedance-matching resistor, so the zero-crossing time difference across it is 0.
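Why the crossing-time difference equals the cable delay can be shown with a small model. The sketch assumes an ideal buffer output, a series resistor equal to the cable impedance, a lossless cable, negligible loading by the taps, and a ramp time longer than twice the cable delay. The node after the resistor sees half the buffer waveform plus the reflection returned after one round trip. Its zero crossing therefore sits at the mean time, one cable delay after the buffer's own crossing. The function names are illustrative.

```python
PERIOD_NS, RAMP_NS = 200.0, 50.0


def trapezoid(t):
    """Unit-amplitude symmetric trapezoid with its rising-edge zero crossing at t = 0."""
    flat = PERIOD_NS / 2 - RAMP_NS
    u = (t + RAMP_NS / 2) % PERIOD_NS
    if u < RAMP_NS:
        return -1.0 + 2.0 * u / RAMP_NS
    if u < RAMP_NS + flat:
        return 1.0
    if u < 2 * RAMP_NS + flat:
        return 1.0 - 2.0 * (u - RAMP_NS - flat) / RAMP_NS
    return -1.0


def crossing(v, lo=-20.0, hi=40.0):
    """Bisection for the rising zero crossing of a non-decreasing v(t) in [lo, hi]."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if v(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)


def measured_delay(cable_delay_ns):
    """Zero-crossing time after the series resistor minus the time before it."""
    before = crossing(trapezoid)  # buffer output, before the resistor
    # After the matched resistor: half the buffer wave plus its reflection,
    # which returns after a round trip of 2 * cable_delay_ns and is absorbed there.
    after = crossing(lambda t: 0.5 * trapezoid(t) + 0.5 * trapezoid(t - 2 * cable_delay_ns))
    return after - before


for delay in (0.0, 4.0, 12.0):
    print(f"cable delay {delay:4.1f} ns -> measured {measured_delay(delay):.3f} ns")
```

A zero cable delay gives a zero difference, which matches the cable-detection idea described in the next paragraphs.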
The cable attenuation can be measured if the FPGA can change the comparator threshold: it can then be calculated from the slopes of the transmitting and reflecting ramps. However, there is a more straightforward way to measure the effect of the cable loss. The cable end is routed back to the clock driver module, and the timing skew between the first tap and the cable end is measured directly, as shown in Fig. 8. In the initialization stage, the FPGA can fine-adjust the second-order term so that this skew is fully cancelled. Additional symmetry is obtained by feeding the same clock pulses into both ends of the cable so that signals propagate in both directions. This configuration can be viewed as two multi-tap cable branches connected at their ends. The length of each branch is halved, which potentially improves the overall performance. Both branches can be used to distribute the clock to the receiving modules.
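The initialization step can be viewed as a one-dimensional search. The sketch below is a hypothetical model, not the author's firmware:

- It reuses the loss model of Eq. (5) with an illustrative attenuation.
- It writes the ramp as f(t) = t + b t².
- It bisects b until the skew between the first tap (delay L from the cable end) and the looped-back cable end vanishes.
- `measure_skew` is a hypothetical stand-in for the TDC measurement.
- The adjustment range [-0.01, 0] per ns is also an assumption.

Within this model, the calibrated b stays close to -a for both cable lengths. This is in line with the remark in Section III that the ratio depends on the cable type rather than its length.

```python
import math

A_TRUE = 0.005   # illustrative attenuation per ns; unknown to the calibration loop


def tap_crossing(b, x, lo=-5.0, hi=5.0):
    """Zero crossing at a tap x ns from the end for f(t) = t + b*t**2 (loss model of Eq. (5))."""
    def f(t):
        return t + b * t * t

    def v(t):
        return math.exp(A_TRUE * x) * f(t + x) + math.exp(-A_TRUE * x) * f(t - x)

    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if v(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)


def measure_skew(b, cable_delay_ns):
    """Hypothetical TDC stand-in: first-tap crossing minus looped-back cable-end crossing."""
    return tap_crossing(b, cable_delay_ns) - tap_crossing(b, 0.0)


def calibrate(cable_delay_ns, b_lo=-0.01, b_hi=0.0, iterations=60):
    """Bisect the second-order coefficient until the measured skew vanishes.

    A more negative b moves the first-tap crossing later, so the skew decreases as b increases.
    """
    for _ in range(iterations):
        b = 0.5 * (b_lo + b_hi)
        if measure_skew(b, cable_delay_ns) < 0:
            b_hi = b      # first tap still early: b must become more negative
        else:
            b_lo = b
    return 0.5 * (b_lo + b_hi)


for delay in (12.0, 24.0):
    print(f"cable delay {delay:4.1f} ns -> b = {calibrate(delay):.6f} per ns")
```

In this model the skew vanishes exactly at b = -tanh(aL)/L, which approaches -a for short cables.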
In this configuration, the "cable end" is actually the middle point of the cable.
D. External Clock Alignment
In very large systems, the clock signals must be distributed in several cascaded stages. In this situation, the clock driver receives clock signals from the previous stage and reproduces them, with the same phase, for the subsequent stages.
In Fig. 8, when an external clock is detected, the phase-locked loop (PLL) in the FPGA switches its clock source to follow the external clock and drives the pulse generator circuits. Because FPGA internal delays are very difficult to control, a measure-and-adjust approach is taken. When the generated clock becomes stable, the zero-crossing times of the external clock and the generated clock are measured with the TDCs.
The output clock phase from the PLL can be dynamically adjusted in today's FPGA devices. The FPGA will adjust the phase of the generated clock so that the time difference between the generated clock and the external clock vanishes.
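The measure-and-adjust approach can be sketched as a simple loop. This is an illustrative model only, with several assumptions:

- The unknown FPGA internal delay is fixed.
- The TDCs report the generated-minus-external time difference exactly.
- The PLL phase can be shifted in steps of a hypothetical 100 ps.
- Phase wrap-around at the clock period is ignored.

The function name `align` and its parameters are hypothetical.

```python
PHASE_STEP_PS = 100.0        # hypothetical PLL dynamic phase-shift resolution


def align(external_ps, internal_delay_ps, max_iters=100):
    """Shift the PLL phase one step at a time until the TDC-measured offset between the
    generated and external clocks is within half a phase step."""
    shift_ps = 0.0
    for _ in range(max_iters):
        generated_ps = internal_delay_ps + shift_ps   # unknown FPGA delay + applied shift
        diff_ps = generated_ps - external_ps          # what the TDCs would report
        if abs(diff_ps) <= PHASE_STEP_PS / 2:
            return shift_ps, diff_ps
        shift_ps += -PHASE_STEP_PS if diff_ps > 0 else PHASE_STEP_PS
    raise RuntimeError("phase alignment did not converge")


print(align(external_ps=0.0, internal_delay_ps=1234.0))   # -> (-1200.0, 34.0)
```

The loop needs no knowledge of the internal delay, because it only reacts to the measured difference.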
E. Pulse Width Coded Clock
A complete clock distribution task involves synchronization on both a microscopic and a macroscopic scale. The phase alignment described earlier is the microscopic synchronization. The macroscopic synchronization, i.e., determining which clock cycle is the 0-th cycle, is equally important. The macroscopic synchronization is usually done by distributing a reset signal over a separate set of cables. It is more convenient if the reset signal can be distributed with the clock pulses.
For most clock receiving devices, only the timing of the leading edge is critical, and the devices are insensitive to the trailing-edge time. It is therefore possible to carry information in the clock pulses by changing the widths of some pulses. To maintain DC balance, the numbers of wide and narrow pulses must be the same.
For example, the reset command can be distributed as a wide-narrow clock pulse sequence. The clock receivers issue the reset signal after seeing the wide-narrow sequence. This reset distribution method is a special case of the "clock-command combined carrier coding" (C5) scheme [4]. With the C5 scheme, the clock pulses can carry more information and commands, such as register settings and triggers.
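The wide-narrow reset idea can be illustrated with a toy encoder and decoder. This is a hypothetical sketch, not the C5 format of [4]. Each clock cycle is represented only by its high time; the leading edges keep the nominal 200 ns spacing. The nominal 100 ns high time, the ±25 ns width code, and the decoder tolerance are assumed values.

```python
NOMINAL_NS, DELTA_NS = 100.0, 25.0   # hypothetical: 200 ns period, +/-25 ns width code
WIDE, NARROW = NOMINAL_NS + DELTA_NS, NOMINAL_NS - DELTA_NS


def encode(n_cycles, reset_at):
    """High time of each clock cycle. A reset is a wide pulse followed by a narrow one,
    so the average high time (and hence the DC level) is unchanged."""
    widths = [NOMINAL_NS] * n_cycles
    widths[reset_at], widths[reset_at + 1] = WIDE, NARROW
    return widths


def decode(widths, tolerance_ns=5.0):
    """Cycle indices at which a receiver issues the reset (the cycle completing the pair)."""
    resets = []
    for i in range(1, len(widths)):
        if abs(widths[i - 1] - WIDE) < tolerance_ns and abs(widths[i] - NARROW) < tolerance_ns:
            resets.append(i)
    return resets


widths = encode(n_cycles=16, reset_at=5)
print(decode(widths))              # -> [6]
print(sum(widths) / len(widths))   # -> 100.0 (average high time is unchanged)
```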
V. SINE WAVE CLOCKING
As discussed earlier, any odd function can be used to generate isochronal crossings in open-ended cables. The sine function is an interesting example that deserves further study.
The most useful feature of the sine function is that it contains a single frequency. Real cables used in systems may have frequency-dependent characteristics such as attenuation and wave velocity.
These frequency-dependent characteristics change the waveform shape as a wave propagates in a cable.
Only single-frequency sinusoidal waves remain sinusoidal, at the same frequency, after long propagation in cables. The sine wave is therefore suitable for clocking over relatively long cables. An example is given in Fig. 10.
The length of each cable segment used in Fig. 10 is 8 ns, compared with 4 ns in the tests described in the previous sections. When the termination is removed, the voltages at all the taps have a common crossing point.
If the cable is lossless, the transmitting and reflecting waves add to form a standing wave. The voltage amplitudes differ from tap to tap, but all taps oscillate with an identical phase.
Due to cable loss, the common isochronal crossing point is not at the zero voltage level. It is slightly higher for the rising ramp and slightly lower for the falling ramp. This can be roughly understood as the need for non-zero even components in the ramping function once cable loss is considered, as indicated in Equation (6).
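The sine-wave behavior can be made explicit in the notation of Eq. (5). Take a wave $\sin\omega t$ that crosses zero at the cable end at $t = 0$, and an attenuation constant $a$ at the clock frequency:

$$V(t,x) = e^{ax}\sin\omega(t+x) + e^{-ax}\sin\omega(t-x) = 2\cosh(ax)\cos(\omega x)\,\sin\omega t + 2\sinh(ax)\sin(\omega x)\,\cos\omega t.$$

For a lossless cable ($a = 0$), only the first term survives. Every tap oscillates as $\sin\omega t$ with amplitude $2\cos(\omega x)$. This is a standing wave with a common phase, as long as the tap is within a quarter period of the cable end ($\omega x < \pi/2$). With loss, expanding to second order in $x$ gives

$$V(t,x) \approx 2\sin\omega t + x^2\left[(a^2-\omega^2)\sin\omega t + 2a\omega\cos\omega t\right].$$

The $x$-dependent part vanishes when $\tan\omega t = 2a\omega/(\omega^2 - a^2)$. For $a \ll \omega$, this common point lies at $\omega t \approx 2a/\omega$ with $V \approx 4a/\omega > 0$ on the rising edge. On the falling edge it lies at $V \approx -4a/\omega$. To second order in $x$, this matches the slightly raised and lowered isochronal crossings described above.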
For clock synchronization between modules using coaxial cables, the comparator in each clock receiving module will be DC coupled to the cable tap. The clock driver may send the sine wave with a small DC offset so that the thresholds of the comparators in all modules are set at the isochronal crossing level for either the rising or falling edge. Power supplies with both positive and negative voltages may be needed for the clock driver and clock receiving modules.
The most suitable application for sine wave clocking is synchronization over differential cables. In each module, the two comparator input pins are connected directly to the two wires of the cable, as shown in Fig. 11(a). The clock driver sends the waveform into the two conductors of the differential cable with opposite polarities. The DC levels of the two waveforms are slightly different to compensate for the cable loss. If an appropriate voltage difference is chosen for a given type of cable, the isochronal condition is fulfilled for one edge, as shown in Fig. 11(b). (Note that one set of traces is higher and the other set, with opposite polarity, is lower.)
The common-mode DC level can be chosen so that the entire waveform is in the positive voltage range. This allows both the clock driving and receiving modules to operate with single-polarity power supplies. In Fig. 11(b), a DC offset of 1.25 V is arbitrarily chosen to show this feature.
It is relatively difficult to produce a pure sine wave using digital outputs from an FPGA. An alternative is to build sine-like functions from other functional forms as an approximation. In Fig. 11(b), the actual function driving the cable is the integral of an isosceles triangle wave, which can be produced from FPGA digital outputs using cascaded integrators. The DC level for cable loss compensation differs slightly between the sine wave and this sine-like function. However, the two exhibit similar properties, since the first- and second-order terms are cancelled similarly in both waveforms.
The disadvantage of sine or sine-like wave clocking is that it is difficult to stretch or shrink the pulse width to code information into the clock pulses. However, when differential clock signals are transmitted over ribbon cables or backplanes, additional differential pairs are usually available for transmitting reset or other fast signals. This disadvantage is therefore less significant than in systems using coaxial cables.
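The sine-like waveform can be illustrated numerically. This sketch assumes ideal integrators and a ±1 square wave standing in for a digital output. It shows the waveform family only and is not the circuit used for Fig. 11(b). Integrating the square wave once gives the isosceles triangle wave, and integrating again gives a piecewise-parabolic wave. This wave is odd about its zero crossings, the property Eq. (5) relies on in a lossless cable. Analytically, it stays within about 5.6% of the peak amplitude from a pure sine.

```python
import math

N = 20_000                       # samples per clock period
dtheta = 2 * math.pi / N

# Square wave from a digital output: -1 for the first half-period, +1 for the second.
square = [-1.0 if k < N // 2 else 1.0 for k in range(N)]

# First integrator: triangle wave, started at its positive peak (pi/2) so its mean is zero.
tri, acc = [], math.pi / 2
for s in square:
    tri.append(acc)
    acc += s * dtheta

# Second integrator: piecewise-parabolic, sine-like wave starting at zero.
para, acc = [], 0.0
for v in tri:
    para.append(acc)
    acc += v * dtheta

peak = max(para)
sine_like = [p / peak for p in para]             # normalize to unit amplitude
deviation = max(abs(s - math.sin(k * dtheta)) for k, s in enumerate(sine_like))
print(f"max deviation from sin: {deviation:.4f} of full amplitude")
```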
VI. CONCLUSIONS
Trapezoidal clock pulses in open-ended lossless cables produce isochronal crossings at the cable taps. This phenomenon can be used for clock synchronization, with timing skews cancelled naturally. Variations of trapezoidal clocking have been studied for cable loss compensation. Timing skew cancellation well below 50 ps has been achieved with typical cables.
