Clock distribution and generation circuitry forms a critical component of current synchronous digital systems. A digital system's clocks must have not only low jitter and low skew, but also a well-controlled duty cycle in order to facilitate versatile clocking techniques. In high-speed CMOS clock buffer design, the duty cycle of a clock is liable to change when the clock passes through a multistage buffer because the circuit is not purely digital [8]. In this paper, we propose a pulsewidth control loop, referred to as MPWCL (modified pulsewidth control loop), that adopts the same architecture as the conventional PWCL, but with a new pulse generator and a new charge pump circuit, the latter being a constituent of the duty cycle detector. Thanks to the new building blocks, the proposed pulsewidth control loop can control the duty cycle over a wide range and, more importantly, it remains operative in the saturation region too, which provides a condition for fast locking. For a 1.2 µm double-metal double-poly CMOS process with V dd = 5 V and an operating frequency of 133 MHz, SPICE simulation results show that the duty cycle can be well controlled in the range from 20 % up to 80 % if the loop parameters are properly chosen.
Introduction
With the rapid advances in deep-submicron CMOS processes, digital systems operating from hundreds of kilohertz up to a few gigahertz have been successfully developed over the past several years, such as high-speed high-performance superscalar and VLIW microprocessors, network processors, double data rate SDRAM, and so forth. Since more and more functional blocks are integrated on the same chip, as guided by the concepts of system-on-a-chip and system-on-silicon, the skew, jitter, and asymmetric duty cycle of the clock signal become bottlenecks in realizing high-speed and high-performance digital systems [1].
In order to minimize the negative effects caused by skew and jitter of clock signals, phase locked loops (PLLs) and delay locked loops (DLLs) are used [2]. In applications where frequency multiplication is required, PLLs are good candidate design solutions. On the other hand, where no clock synthesis is required, DLLs offer an attractive alternative to PLLs due to their better jitter performance, inherent stability, and simpler design [3].
In systems that adopt double data rate technology, both the rising and the falling edges of the clock are used to sample input data. These systems require that the duty cycle of the clock be precisely maintained at 50 %. Therefore, generating a clock with a precise 50 % duty cycle for high-speed operation is an important issue [4]. Automatic control techniques, such as the pulsewidth control loop (PWCL), have been widely used for several years to adjust the output duty cycle of multistage drivers [4, 6, 8].
In this paper, we present a new approach to achieving a fast-locking PWCL architecture that can be used to control the pulsewidth in a multistage clock buffer. The architectures and principles of operation of three different types of PWCL, already well known from the literature, are described in Section 2. Section 3 describes the structure of the proposal, referred to as the modified pulsewidth control loop (MPWCL). Simulation results are presented in Section 4. Section 5 concludes the paper with a summary.
Related work
Clocking is one of the single most important decisions facing the designer of a digital system [5] . The clock signal is used to synchronize different parts of a digital system, and the quality of the clock signal, including frequency, amplitude, phase, and duty cycle, undoubtedly influences the system performance [6, 7] . Currently, the PLLs and DLLs are mainly used for aligning frequency and clock phase, while the PWCLs are intended to control the duty cycle of the clock signal generated from a multistage driver [6] .
In high-speed design, a multistage clock buffer implemented as a long inverter chain is often needed to drive a heavy capacitive load. For such designs, it is difficult to keep the clock duty cycle at 50 %. When the clock signal passes through a multistage buffer, the pulsewidth may change due to the imbalance between the N-channel and P-channel transistors in the long buffer. This imbalance is introduced by many factors, such as process deviations, temperature changes, or design mismatch. As a consequence, the clock duty cycle will wander away from 50 % and, in the worst case, the clock pulse may disappear inside the clock buffer as the pulsewidth becomes too narrow or too wide [8].
To overcome these problems, PWCLs have been proposed in [4], [6] and [8]. In the sequel we give a short review of the architecture and principles of operation of all three types of PWCL, referred to as the conventional PWCL [8], the fixed-phase PWCL [6], and the fast-locking PWCL [4].
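The pulsewidth drift described above can be illustrated with a minimal behavioral sketch. It assumes that each inverter stage is fully described by two hypothetical output delays, `t_rise` and `t_fall` (in picoseconds), and it ignores slew and loading effects. The function name and all numeric values are illustrative assumptions, not taken from the paper.

```python
def propagate_high_width(width, period, stages):
    """Return the clock high-time after a chain of inverters, or None if the pulse vanishes.

    width, period: high-time and period of the input clock (ps).
    stages: list of (t_rise, t_fall) output delays per inverter (ps); hypothetical values.
    """
    for t_rise, t_fall in stages:
        # An input rising edge produces an output falling edge after t_fall,
        # and an input falling edge produces an output rising edge after t_rise,
        # so the output low-time equals the input high-time plus (t_rise - t_fall).
        low_out = width + (t_rise - t_fall)
        width = period - low_out
        if width <= 0 or width >= period:
            return None  # pulse too narrow or too wide: it disappears in the buffer
    return width


PERIOD_PS = 7500  # roughly a 133 MHz clock

# Two identical stages: the skew added by the first stage is cancelled by the second.
matched = propagate_high_width(3750, PERIOD_PS, [(300, 200), (300, 200)])

# Mismatched stages: the skew is no longer cancelled and the duty cycle drifts.
mismatched = propagate_high_width(3750, PERIOD_PS, [(300, 200), (250, 250)])

print(matched / PERIOD_PS, mismatched / PERIOD_PS)
```

In this simplified model a chain of perfectly identical stages restores the pulsewidth every second stage, so the drift comes entirely from stage-to-stage mismatch, which is consistent with the factors listed above.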
Conventional PWCL
A schematic diagram of the conventional PWCL [8] is pictured in Fig. 1. As can be seen from Fig. 1, the conventional PWCL is realized as a system with a feedback loop. The feedback loop functionally consists of:
a) Pseudo-Inverter Control Stage (PICS) - chosen to be the first stage of the clock buffer, it functions as a voltage-controlled pulse generator. By changing the control voltage, V ctrl , the pulsewidth of the output clock can be adjusted. PICS is implemented as a simple inverter. Here, the mark "*" indicates the transistor being controlled;
b) Clock Buffer (CB) - a long inverter chain or buffer which acts as a multistage driver. The number of stages in the clock buffer must be such that correct feedback is guaranteed. When the clock buffer has an even (odd) number of stages the PWCL is configured as in Fig. 1a (1b);
c) Charge Pump 1 (CP1) - converts pulsewidth into a current which charges or discharges capacitor C. At its output, CP1 creates a control voltage V c , i.e. it steers current by the clock pulse in order to detect changes of pulsewidth;
d) Charge Pump 2 (CP2) - an identical charge pump that creates a reference bias voltage, V ref , by being connected to a reference clock with 50 % duty cycle;
e) Amplifier (Amp) - characterized by its gain A and realized as a single-stage operational transconductance amplifier with differential inputs. It is intended to provide a certain gain in the loop at low frequency;
f) Reference Pulse (RP) - two-stage inverter buffers used to drive CP2 with reference clock pulses of 50 % duty cycle;
g) Loop Filter (LF) - the output resistance of Amp and capacitor C 2 form a first-order low-pass filter. The input signal of the LF is a current and the output is the control voltage, V ctrl .
As can be seen from Fig. 1, differential charge pumps are used in order to reduce the noise coming from the environment. The PWCL also needs differential signals at its amplifier inputs. In the proposal of Fig. 1, two identical single-ended charge pumps are used. One of them detects the pulsewidth of the clock being controlled, and the other is connected to a standard clock with 50 % duty cycle to generate the bias V ref , which is taken as the reference voltage of amplifier Amp. The differential input reduces the difficulty of designing a perfect charge pump and bias circuit: process dependence and temperature influence are cancelled because they appear as common-mode signals. The charge pumps, CP1 and CP2, and the differential amplifier, Amp, together act as a duty cycle detector (comparator) that generates the control voltage, V ctrl , for the pulse generator PICS.
The pulsewidth of CB is controllable. This means that if the CB's clock output deviates from 50 % duty cycle, the control voltage, V ctrl , will change so that the offset is removed. When the loop is stable the CB output is adjusted to 50 % duty cycle, and the controllable dynamic range covers the range of possible offset.
The conventional PWCL [8] is a nonlinear feedback loop. The control voltage, V ctrl , must be quiet enough to ensure a precise duty cycle when the loop is closed. In order to follow duty cycle variations precisely, the loop gain must be kept low; however, with low gain the loop may take a long time to settle. This long settling time reduces the timing budget for other function blocks in a system [4].
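The trade-off between loop gain and settling time can be illustrated with a deliberately simplified sketch. It assumes a linear, first-order, discrete-time approximation in which each clock cycle removes a fixed fraction (`loop_gain`) of the remaining duty cycle error. The real loop is nonlinear, and the function name, starting duty cycle, and tolerance below are hypothetical.

```python
def cycles_to_lock(loop_gain, d_start=0.30, d_target=0.50, tol=0.005, max_cycles=100_000):
    """Count clock cycles until the duty cycle is within tol of the target.

    First-order approximation (an assumption, not the paper's model):
    each cycle corrects a fraction loop_gain of the remaining error.
    Returns None if the loop does not settle within max_cycles.
    """
    duty = d_start
    for n in range(max_cycles):
        if abs(d_target - duty) <= tol:
            return n
        duty += loop_gain * (d_target - duty)
    return None


low_gain_cycles = cycles_to_lock(0.01)   # quiet control voltage, slow settling
high_gain_cycles = cycles_to_lock(0.10)  # faster settling, but noisier control voltage
print(low_gain_cycles, high_gain_cycles)
```

The sketch only captures the settling-time side of the trade-off; the ripple on V ctrl that forces the gain to be kept low is not modeled.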
Fixed-phase PWCL
In [6], an architecture similar to the conventional PWCL [8] is described, but with the building blocks replaced by new circuits. Namely, a new duty cycle detector and a new voltage-controlled pulse generator are implemented, which enable higher-frequency operation at low voltage compared with [8]. The voltage-controlled pulse generator consists of a NAND gate and two inverter chains. The first chain has a fixed delay, while the second, realized as a shunt-capacitor delay line, has a variable delay. The new duty cycle detector is actually a push-pull charge pump. In the proposed fixed-phase PWCL, see Fig. 2, the clock buffer can include PLL/DLL and PWCL in order to perform phase locking and pulsewidth adjustment simultaneously. The same problem concerning precise duty cycle generation as in [8] is typical for the fixed-phase PWCL, too.
Fast-locking PWCL
A 500 MHz-1.25 GHz fast-locking PWCL with presettable duty cycle, realized in 0.35 µm CMOS technology, is described in [4] (see Fig. 3). The fast-locking mechanism is realized by the building blocks enclosed in the dashed line. It consists of a voltage-difference-to-digital converter (VDDC) and a pair of switched charge pump (SCP) circuits. The VDDC is used to detect the linear and nonlinear regions of the transient process, while the SCP circuits provide different charge pump currents according to the control codes from the VDDC and the external codes which are used to preset the duty cycle of CLK out . Compared with the conventional PWCL, the proposed circuit can reduce the lock time by a factor of 2.58. The duty cycle of the output clock can range from 35 % to 70 % in steps of 5 %.
With respect to the structure and principles of operation of the conventional PWCL [8], there are two new building blocks implemented in the MPWCL. The first novelty relates to the PICS and the second to the charge pumps CP1 and CP2. The other constituents, pictured in Fig. 4, are of identical (or almost identical) architecture to those described in [8], so their analysis is omitted in the text that follows.
Pseudo-inverter control stage (PICS)
An electrical scheme of the proposed PICS is pictured in Fig. 5a. It consists of three N-channel transistors, N 1 , N 2 and N 3 , and three P-channel transistors, P 1 , P 2 and P 3 . The equivalent electrical scheme of the PICS is given in Fig. 5b. Transistors P 1 and P 2 act as constant and variable current sources, J 1 and J 2 , while transistors N 1 and N 2 operate as constant and variable current sinks, I 1 and I 2 , respectively. Transistors P 3 and N 3 form the switching part of the CMOS inverter. Capacitor C L corresponds to the parasitic capacitive load. The constant current source (sink) J 1 (I 1 ) provides the nominal time delay of the leading (trailing) pulse edge at the output PICS out . The bias voltage V bp1 (V bn1 ) is used for correct polarization of transistor P 1 (N 1 ). In addition, the variable current source (sink) J 2 (I 2 ) introduces a variable time delay of the leading (trailing) pulse edge. This configuration allows us to achieve a controllable time delay for both the leading and the trailing pulse edges. Waveforms generated at the output PICS out for different values of the control voltage V ctrl are given in Fig. 6. As can be seen from Fig. 6, the leading (trailing) pulse edge can vary in the range from t 1 (t 3 ) up to t 2 (t 4 ). The time delay variation of the leading (trailing) pulse edge as a function of the control voltage V ctrl is presented in Fig. 7. For V ctrl = V dd /2 = 2.5 V the time delay of both edges is identical. This means that good symmetry in the geometry of the P-channel and N-channel transistors is achieved. If the control voltage, V ctrl , decreases, the time delay of the trailing edge increases and the time delay of the leading edge decreases, and vice versa.
In Fig. 8, the range of duty cycle variation as a function of V ctrl is shown. Again, for V ctrl = V dd /2 = 2.5 V a 50 % duty cycle is achieved. When V ctrl decreases the duty cycle increases; conversely, when V ctrl increases the duty cycle decreases.
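The dependence of edge delay and duty cycle on V ctrl can be sketched with a first-order current-starved model. It assumes that each edge delay is the time needed to move C L by V dd /2 at a constant current, that the variable currents J 2 and I 2 depend linearly on V ctrl , and that the input clock has a 50 % duty cycle. The capacitance and current values are hypothetical and are not taken from the simulated circuit.

```python
V_DD = 5.0          # supply voltage (V), as in the paper
PERIOD = 1 / 133e6  # clock period (s) at 133 MHz
C_LOAD = 50e-15     # assumed parasitic load C_L (F)
I_CONST = 50e-6     # assumed constant currents J1 = I1 (A)
I_VAR_MAX = 200e-6  # assumed maximum of the variable currents J2 and I2 (A)


def edge_delays(v_ctrl):
    """Return (leading, trailing) edge delays in seconds for a given control voltage."""
    j2 = I_VAR_MAX * (1.0 - v_ctrl / V_DD)  # P2 sources more current as V_ctrl falls
    i2 = I_VAR_MAX * (v_ctrl / V_DD)        # N2 sinks more current as V_ctrl rises
    swing = V_DD / 2                        # output must reach the switching threshold
    t_lead = C_LOAD * swing / (I_CONST + j2)
    t_trail = C_LOAD * swing / (I_CONST + i2)
    return t_lead, t_trail


def duty_cycle(v_ctrl):
    """Output duty cycle for a 50 % input clock: a later trailing edge widens the pulse."""
    t_lead, t_trail = edge_delays(v_ctrl)
    return 0.5 + (t_trail - t_lead) / PERIOD


for v in (1.5, 2.5, 3.5):
    print(v, round(duty_cycle(v), 3))
```

Under these assumptions the model reproduces the qualitative behavior described above: equal delays and a 50 % duty cycle at V ctrl = V dd /2, a wider pulse for lower V ctrl , and a narrower pulse for higher V ctrl .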
Charge pump
The second novelty implemented in the MPWCL relates to the charge pump (CPx). An electrical scheme of the CPx is given in Fig. 9. The CPx consists of a current source transistor P 2c , a current sink transistor N 2c , and two complementary switches P 1c and N 1c . The shaded block is used for correct polarization of the current source (sink) and belongs to the building block BC (see Fig. 4). Note that CP1 and CP2 from Fig. 4 are of identical structure.
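The way such a charge pump turns pulsewidth into a voltage can be sketched as a per-cycle charge balance. The sketch assumes ideal switches, constant source and sink currents, and that the sink N 2c conducts while the clock is high and the source P 2c conducts while it is low. The function name and all component values are hypothetical.

```python
def delta_vc_per_cycle(duty, period, i_source, i_sink, cap):
    """Net change of the charge pump output voltage over one clock cycle (V).

    Assumption: the sink discharges the capacitor during the high phase and
    the source charges it during the low phase, both with ideal switches.
    """
    t_high = duty * period
    t_low = period - t_high
    return (i_source * t_low - i_sink * t_high) / cap


PERIOD = 7.5e-9   # approximately 133 MHz
I_PUMP = 20e-6    # assumed equal source and sink currents (A)
C_PUMP = 1e-12    # assumed load capacitor (F)

balanced = delta_vc_per_cycle(0.5, PERIOD, I_PUMP, I_PUMP, C_PUMP)
too_wide = delta_vc_per_cycle(0.6, PERIOD, I_PUMP, I_PUMP, C_PUMP)
too_narrow = delta_vc_per_cycle(0.4, PERIOD, I_PUMP, I_PUMP, C_PUMP)
print(balanced, too_wide, too_narrow)
```

With matched currents the output voltage is stationary only at a 50 % duty cycle; a wider pulse pulls the voltage down and a narrower pulse pushes it up, which is the error information the amplifier compares against V ref .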
MPWCL implementation and simulation results
MPWCL is a nonlinear feedback loop. In order to design and analyze the loop behavior, its transient mechanism must be investigated.
The mechanism of the MPWCL when the CB has an even (odd) number of stages is as follows. When the pulsewidth of PICS out (Fig. 5) is wide, the pulsewidth of CLK out is wide too (narrow). The transistor N 2c (P 2c ) therefore has more time to discharge (charge) the capacitor C 11 , and V c drops (rises). Then the output V ctrl rises and, as a consequence, the pulsewidth of PICS out is narrowed. This means that when the CB has an even (odd) number of stages the Amp is implemented as an inverting (non-inverting) amplifier. The gain of the amplifier is chosen so that loop stability is ensured.
SPICE simulation results for a 1.2 µm double-metal double-poly CMOS process, with V dd = 5 V and an operating frequency of 133 MHz, are presented in Fig. 10. In this particular case the clock buffer has 7 stages, with a tapering factor of 1.
The waveforms at the top are the curves of V ref and V c . We start our analysis from the instant when the system is powered on (t = t 0 ). This implies that both charge pump load capacitors, C 11 and C 12 , as well as the low-pass filter capacitor C 2 , are discharged. Due to the input offset voltage difference, the output of the Amp is, in this particular case, set to the upper voltage limit (i.e. near V dd ). Since C 12 charges faster than C 11 , at instant t 0 ' the voltage V ref becomes higher than V c and the output of Amp switches rapidly to the lower voltage limit, 0 V. The second waveform in Fig. 10 corresponds to the control voltage, i.e. the error signal, V ctrl . According to the transient response of V ctrl , the following three regions in the operation of the feedback loop can be identified:
a) From t 0 up to t 1 the loop operates in the saturation (nonlinear) region. The voltage difference between V ref and V c is very large; specifically, from t 0 ' to t 1 the amplifier Amp is saturated and the control voltage V ctrl is 0 V. Under this condition, pulses of minimal pulsewidth are generated at PICS out (transistor P 2 is switched on, i.e. J 2 is maximal, and transistor N 2 is switched off, i.e. I 2 is zero, see Fig. 5b). Note that, contrary to the proposals described in [4], [6] and [8], where the PWCL is inoperative in the saturation region, i.e. CLK out is blocked, in the MPWCL pulses of minimal duty cycle are generated at CLK out . This provides a condition for fast loop locking.
b) As the input voltage difference becomes small enough, the amplifier Amp enters the linear region, which corresponds to the time interval from t 1 up to t 2 . The circuit model for this region can be described by a second-order transfer function.
c) The steady-state region characterizes stable loop operation and corresponds to the time interval after t 2 . During this period, variations of V ctrl are less than ±25 mV, i.e. 1.8 % with respect to V ctrl (1.5 V).
The lower two waveforms in Fig. 10 depict CLK out pulses in the saturation region and in the steady-state region, respectively. As can be concluded from Fig. 10, the duty cycle of CLK out is 20 % in the saturation region and 51 % in the steady-state region.
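The distinction between the saturation and linear regions can be sketched with an idealized clipped amplifier. The sketch assumes an amplifier that is non-inverting with respect to V c (the odd-stage case), with output limited to the range 0 V to V dd . The gain, the mid-level output voltage, and the function names are hypothetical; the steady-state region is not distinguished here because it depends on the ripple of V ctrl rather than on the amplifier alone.

```python
def amp_output(v_c, v_ref, gain=50.0, v_mid=2.5, v_dd=5.0):
    """Idealized amplifier output (V), clipped to the supply rails.

    gain and v_mid are illustrative assumptions, not values from the paper.
    """
    raw = v_mid + gain * (v_c - v_ref)
    return min(max(raw, 0.0), v_dd)


def loop_region(v_c, v_ref, gain=50.0, v_mid=2.5, v_dd=5.0):
    """Classify the operating region from whether the amplifier output is clipped."""
    out = amp_output(v_c, v_ref, gain, v_mid, v_dd)
    return "saturation" if out in (0.0, v_dd) else "linear"


print(loop_region(1.0, 2.0))    # large difference between V_ref and V_c
print(loop_region(2.00, 2.01))  # small difference between V_ref and V_c
```

With V ref well above V c the output is pinned at 0 V, matching the saturated interval described in a); once the difference shrinks, the output leaves the rail and the loop behaves linearly.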
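To put the reported duty cycles into absolute timing terms, the following short calculation converts them into pulse high-times at the 133 MHz operating frequency. The helper name `high_time_ns` is an illustrative assumption.

```python
def high_time_ns(freq_hz, duty):
    """Clock high-time in nanoseconds for a given frequency and duty cycle (0..1)."""
    return duty / freq_hz * 1e9


FREQ_HZ = 133e6  # operating frequency used in the simulation

period_ns = high_time_ns(FREQ_HZ, 1.0)      # full period
saturation_ns = high_time_ns(FREQ_HZ, 0.20)  # 20 % duty cycle (saturation region)
steady_ns = high_time_ns(FREQ_HZ, 0.51)      # 51 % duty cycle (steady-state region)

print(f"period {period_ns:.2f} ns, saturation {saturation_ns:.2f} ns, steady state {steady_ns:.2f} ns")
```

At this frequency the period is about 7.52 ns, so the 20 % pulses are roughly 1.5 ns wide and the 51 % pulses roughly 3.8 ns wide.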
Conclusion
Numerous methods for distributing a clock within a VLSI IC have been discussed in the research literature over the years [9], from the more obvious solution of using asynchronous communication between locally clocked regions [10] to more elaborate methods such as distributing a standing wave on the clock wire across the whole chip [11]. However, most of today's research is targeted at reducing clock skew and jitter and at achieving a symmetrical duty cycle by improving current clock distribution methods. The clock distribution tree within VLSI ICs is so large and carries so much capacitance that buffers need to be inserted just to be able to drive the clock tree with a reasonable clock waveform. When the clock passes through a multistage buffer, its duty cycle changes. In order to obtain satisfactory duty cycle correction, a fast-locking MPWCL was proposed. The MPWCL adopts an almost identical architecture to the conventional PWCL [8], but with two building blocks (charge pump and pulse generator) replaced by new circuits of simple structure. With the new building blocks, the duty cycle can be controlled over a wider range than with the conventional PWCL and, what is also important, fast
