I. INTRODUCTION
The insufficient resolution obtained in digital pulsewidth modulators (DPWMs) has been one of the main obstacles to the expansion of digital control in the field of switching-mode power supplies. DPWM resolution is a problem mainly for two reasons. The first is that high DPWM resolution is needed in order to avoid limit cycling [1], [2]. In fact, DPWM resolution needs to be higher than analog-to-digital converter (ADC) resolution to avoid this effect. Therefore, DPWM resolution indirectly limits the precision with which the output voltage can be measured. The second reason is that DPWM resolution is inversely proportional to the switching frequency. This has made the use of digital control impractical for high switching frequencies (over 1 MHz).
Given this interest, many papers have appeared in the last few years studying ways to increase the resolution of DPWMs. A first group of studies focuses on the minimum time step that the DPWM can obtain, by changing the hardware architecture of the DPWM [3]-[15]. A classification of these DPWMs has appeared [16], probably the most important criterion being the time quantization scheme: single-element time quantization (e.g., counter-based DPWM), 2^N-element time quantization (e.g., delay-line-based DPWM), and multiple-element time quantization (e.g., hybrid DPWM). The second group of studies tries to increase the effective duty cycle resolution by changing the pattern used in the generation of the output signal. Digital dither [1], [13], sigma-delta [7], [12], [17], and slight frequency changes [18] belong to this group. The present study is part of the first group, proposing a new architecture for reducing the minimum time step. To be precise, it is a hybrid DPWM. The proposed DPWM can be used with any of the techniques of the second group for increased resolution, as both groups are complementary. For example, the proposed DPWM can be enhanced using digital dither or sigma-delta techniques. The rest of the letter is organized as follows. Section II describes the proposed DPWM architecture. Section III shows the experimental results. Finally, Section IV gives the conclusions.
II. PROPOSED DPWM ARCHITECTURE

A. Basic DPWM Structure
The proposed DPWM takes advantage of the capability of field-programmable gate arrays (FPGAs) to shift the phase of the clock in small increments. Nowadays, almost every FPGA includes this feature in its internal clock management system, which is used for synchronization with external memories among other applications. A Virtex-5 device from Xilinx has been used in the experimental results of this letter. The resolution obtained is an order of magnitude beyond the resolution of other state-of-the-art proposals.
In Xilinx FPGAs, the block that shifts the phase of the clock is called the digital clock manager (DCM). Apart from other features, such as frequency multiplication or division, the DCM can shift the phase of the output clock with respect to the input clock using some additional control signals (see Fig. 1), basically a request-acknowledge protocol. This advanced FPGA feature, called fine phase shift, is the basis of the proposed DPWM. Its resolution is 1/256 of the input clock cycle or one tap delay of its internal delay-locked loop (DLL), whichever is greater. According to the Virtex-5 datasheet, the delay of a tap is between 7 and 30 ps. This resolution is, to the knowledge of the authors, finer than those of other state-of-the-art DPWMs [3]-[15]. Most of the previous results are on the order of 1 ns, with a few exceptions: 488 ps in [10], 390 ps in [12], and 255 ps in [11]. The resolution shown in the experimental results, 19.5 ps, goes well beyond these numbers.
0885-8993/$26.00 © 2010 IEEE
Phase shifting in a DCM works as follows. The input clock is used as a time reference. Using the appropriate control signals, the relative phase of the input clock CLKIN and the output clock CLK0 can be changed in steps of 1/256 of the clock period (see Fig. 1, variable fine phase shift). The DCM uses delay taps for adjusting the relative phase. In fact, the relative phase is adjusted to the number of delay taps that approximates the requested phase as closely as possible. Any resulting phase from 0° to 360°, obtained by single increment steps, is possible, as long as the clock period is below the total delay of all the delay taps together.
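The two statements above (a step of 1/256 of the clock period or one tap delay, whichever is greater, and a tap count chosen to approximate the requested phase) can be sketched numerically. This is only an illustrative model, not the internal algorithm of the DCM: the function names are hypothetical, the rounding rule is an assumption, and the tap delay is a free parameter within the 7 to 30 ps range quoted from the datasheet.

```python
def fine_phase_step_ps(clk_period_ps: float, tap_delay_ps: float,
                       steps: int = 256) -> float:
    """Effective phase-shift step: period/steps or one tap delay,
    whichever is greater."""
    return max(clk_period_ps / steps, tap_delay_ps)


def taps_for_step(k: int, clk_period_ps: float, tap_delay_ps: float,
                  steps: int = 256) -> int:
    """Number of delay taps that best approximates a phase of k/steps of a
    clock period (nearest-integer rounding is an assumption of this sketch)."""
    return round(k * clk_period_ps / steps / tap_delay_ps)


if __name__ == "__main__":
    period_ps = 5000.0  # illustrative value: a 200 MHz clock
    for tap_ps in (7.0, 30.0):  # datasheet bounds for one tap delay
        print(tap_ps, fine_phase_step_ps(period_ps, tap_ps))
```

With a 5 ns period the nominal step is about 19.5 ps, so the tap delay only dominates when it exceeds that value.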
The basic structure of the proposed architecture is shown in Fig. 2 . It is a hybrid architecture in which the MSBs of the duty cycle are handled in a synchronous block and the LSBs in an asynchronous block.
The synchronous block manages the MSBs. It is in fact a counter-based DPWM. If this block manages M bits, the internal counter ranges from 0 to 2^M − 1. Whenever the M MSBs are below the value of the counter, the main output of this block [synchronous high-side MOSFET (Sync HSM)] is high; otherwise, it is low. This block is also in charge of the dead-times, which is why a second output [synchronous low-side MOSFET (Sync LSM)] is included. If there were no dead-times, Sync LSM would simply be the inverse of Sync HSM. Dead-times are included to manage topologies with more than one switch, such as the synchronous buck converter. It is also important to note that the synchronous block uses the input clock CLKIN and not the shifted clock generated by the DCM. The reason will become apparent in the following.
The asynchronous block manages the LSBs. It uses 8 bits because the DCM of Xilinx FPGAs has 256 steps of fine phase shift. The rest of the asynchronous block uses the clock CLK0, which is shifted with respect to CLKIN over the whole possible range, from 0° to 360°, depending on the eight LSBs. A simplified version of this block is shown in Fig. 2; it is valid for explaining how the block works, but it is not the complete circuit (see Fig. 5).
The DPWM output for the HSM is generated as follows (see Fig. 3). The MSBs determine the number of clock cycles that the output has to be active through Sync HSM, which is also the set signal of the reset-set (RS) latch at the output. The eight LSBs define the fraction of a clock cycle to be added at the output. These bits go to the block in charge of controlling the fine phase-shift process, which is a finite-state machine that shifts the CLK0 clock by single steps until the phase between CLKIN and CLK0 is equal to the fraction of a clock cycle to be added. The RS latch is reset when the CLK0 signal is active, but only after the set signal is already inactive (AND gate and inverter). Therefore, the output signal (HSM) is active for an integer number of clock cycles (determined by the MSBs) plus a fraction of a clock cycle (determined by the LSBs).
A symmetric technique is used for generating the LSM output, using CLK0 for the set of LSM and the synchronous signal Sync LSM (synchronized with CLKIN) for the reset.
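The generation of the HSM on-time described above can be summarized with a short arithmetic sketch. It assumes ideal behavior (no path delays or jitter) and eight LSBs. The function names are hypothetical, and the 5000 ps period is only an illustrative value corresponding to a 200 MHz clock.

```python
LSB_BITS = 8  # the DCM offers 256 fine phase-shift steps


def split_duty(duty: int, lsb_bits: int = LSB_BITS) -> tuple[int, int]:
    """Split the duty-cycle command into (MSBs, LSBs).

    The MSBs go to the synchronous (counter) block and the LSBs select
    the fine phase shift of CLK0 in the asynchronous block.
    """
    return duty >> lsb_bits, duty & ((1 << lsb_bits) - 1)


def ideal_on_time_ps(duty: int, clk_period_ps: float,
                     lsb_bits: int = LSB_BITS) -> float:
    """Ideal HSM on-time: an integer number of clock cycles (MSBs) plus a
    fraction of one clock cycle (LSBs / 2**lsb_bits)."""
    msbs, lsbs = split_duty(duty, lsb_bits)
    return (msbs + lsbs / (1 << lsb_bits)) * clk_period_ps


if __name__ == "__main__":
    # Consecutive duty commands differ by 1/256 of the clock period.
    print(ideal_on_time_ps(0x501, 5000.0) - ideal_on_time_ps(0x500, 5000.0))
```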
B. Complete DPWM Structure
This technique, as represented in Fig. 2, would not work for phases above 180°. In these cases, the reset signal would also be active at the beginning of the clock cycle, resetting the output earlier than expected. This problem is shown in Fig. 4, which also shows the desired behavior as a dashed line. In order to avoid this problem, the real architecture is more complex than the one presented before. Fig. 5 shows the final architecture used in the asynchronous block. The synchronous block remains unchanged and is not repeated for the sake of simplicity. As can be seen, a register active at the rising edge of CLK0 drives the reset signal of HSM, avoiding the problem.
It could seem that the register that drives the reset signal of HSM is the only necessary one in Fig. 5, and that the multiplexer and the rest of the registers are not necessary. This simplistic solution would be valid for ideal behavior, without taking delays into account. However, delays play a critical role in this circuit. The typical delay of a path inside an FPGA is on the order of 1 ns, which is much greater than the expected DPWM resolution (19.5 ps). Therefore, registers have been added for two reasons. One is to avoid the problem shown in Fig. 4 for phases near 0° or 180°, since the path delays can easily violate the setup time of two registers with similar phases. For instance, CLKIN and CLK0 have similar phases when the phase is near, but not equal to, 0°. That is why different paths are used for phases under and above 180°. Each path includes a register driven by the opposite clock edge; therefore, the setup time is not violated. The second reason is to enable accurate delay equalization. By manually placing the registers shown in Fig. 5, very accurate path equalization is obtained. The timing diagram of the proposed complete structure is shown in Fig. 6.
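The failure of the simplified circuit for phases above 180° can be reproduced with a small behavioral model. This is only a sketch under idealized assumptions that are not stated in the letter: CLK0 has a 50% duty cycle, there are no path delays, and the set signal falls exactly at the start of a CLKIN cycle. The function names are hypothetical. The latch is reset at the first instant at which CLK0 is high after the set signal falls.

```python
STEPS = 256  # fine phase-shift steps per clock period


def clk0_is_high(offset: int, phase: int) -> bool:
    """Level of CLK0 at `offset` steps into a CLKIN cycle, for a CLK0 that
    lags CLKIN by `phase` steps (50% duty cycle assumed)."""
    return (offset - phase) % STEPS < STEPS // 2


def simplified_extra_steps(phase: int) -> int:
    """Extra on-time, in steps, produced by the simplified circuit: the
    first offset after the set signal falls at which CLK0 is high."""
    for offset in range(STEPS):
        if clk0_is_high(offset, phase):
            return offset
    raise AssertionError("unreachable: CLK0 is high for half of every cycle")


if __name__ == "__main__":
    for phase in (0, 64, 128, 129, 192):
        print(phase, simplified_extra_steps(phase))
```

In this model the extra on-time equals the requested phase up to 128 steps (180°). Above that, CLK0 is already high at the beginning of the cycle, so the output is reset immediately. This is the behavior that the additional register in Fig. 5 is intended to avoid.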
C. Possible Improvements
The main drawback of the proposed DPWM architecture is the phase-shift update time. The phase of the output clock of the DCM must be changed each time that any of the eight LSBs of the duty cycle changes. This is done through a request-acknowledge protocol in steps of 1/256 of the clock period. It has been measured that this protocol can take up to 700 ns for a Virtex-5 FPGA using the prototype of the experimental results. The eight LSBs are managed by changing the fine phase shift; therefore, in the worst case, 255 successive steps would be necessary (a change from 0° to almost 360°). If the update time is critical, a way of decreasing it is to use more than one clock. These clocks must have fixed phase differences among them. In this way, the maximum phase shift is 360°/(number of clocks). Fig. 7 shows the basic structure when using four clock lines. The asynchronous block would be the one shown in Fig. 5, but using CLK_SHIFTED instead of CLK0. In that case, the maximum phase shift would be 90° and only 64 fine phase steps would be necessary. Taking into account that each DCM in Xilinx FPGAs includes four clock outputs phase-shifted by 0°, 90°, 180°, and 270°, and that most FPGAs have between two and eight DCMs, the maximum phase shift can be drastically reduced if necessary.
[Figure caption: Step response obtained by connecting the PWM signal to a driver feeding an RC filter; 50 µs/division and 10 mV/division. (a) 1-unit step. (b) 8-unit step.]
The phase-shift update time affects only the transient behavior. During a transient, the maximum error can be up to the clock period divided by the number of clocks used. In any case, the MSBs are changed immediately; therefore, the transient error is relatively small.
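The effect of using several fixed-phase clocks can be illustrated with a few lines of arithmetic. This is a minimal sketch, assuming 256 fine steps per period and a number of clocks that divides 256. The function name is hypothetical, and the 5000 ps period is an illustrative value for a 200 MHz clock.

```python
def multi_clock_limits(num_clocks: int, clk_period_ps: float,
                       steps: int = 256) -> tuple[float, int, float]:
    """Return (maximum phase shift in degrees, span of fine phase steps,
    maximum transient error in ps) when `num_clocks` equally spaced
    fixed-phase clocks are available."""
    max_phase_deg = 360.0 / num_clocks
    fine_steps_span = steps // num_clocks
    max_transient_error_ps = clk_period_ps / num_clocks
    return max_phase_deg, fine_steps_span, max_transient_error_ps


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, multi_clock_limits(n, 5000.0))
```

With four clocks the fine phase shift only has to cover 90° (64 steps), and the transient error bound drops to one quarter of the clock period.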
III. EXPERIMENTAL RESULTS
The proposed architecture has been designed in very-high-speed integrated circuit hardware description language (VHDL) and implemented in a Virtex-5 (XC5VFX30T-1FFG665) Xilinx FPGA. The Virtex families are formed by high-end FPGAs. Virtex-5 has been chosen to show the best currently achievable performance. The target application has been a four-phase synchronous buck converter. Therefore, the DPWM has been replicated four times, once for each phase. Almost identical results have been obtained in the four phases, thus showing that the proposed DPWM can be accurately replicated. For the sake of simplicity, all the results are shown for a single phase, in order to avoid repeating the same graphs four times.
An external clock at 100 MHz is internally multiplied to 200 MHz. The switching frequency is 6.25 MHz, obtaining 13 bits of resolution (five in the synchronous block and eight in the asynchronous one). Fig. 8 shows the experimental waveforms with a duty cycle of 0.5 for the HSM (upper signal). The signal for the LSM (lower signal) has a duty cycle under 0.5 because of the dead-time. The time step resolution is 1/256 of the internal 200 MHz clock period, i.e., 19.5 ps. In order to obtain the same resolution with a counter-based DPWM, a 51.2-GHz clock would be necessary.
It must be taken into account that the obtained resolution is very similar to usual values of clock jitter. For instance, the oscillator used in the experiments (F4105-1000) has a cycle-to-cycle jitter of 20 ps, while the DCM of Virtex-5 guarantees a maximum clock synthesis period jitter of 120 ps. Therefore, the obtained resolution of 19.5 ps must be understood as a mean value, i.e., the mean duty cycle changes by about 19.5 ps at every single step. However, cycle-to-cycle changes that significantly differ from 19.5 ps must be expected due to clock period inaccuracies.
In order to show the achieved resolution, the pulsewidth modulation (PWM) signal has been connected to a UCC37324 driver powered at 12 V, whose output feeds an RC filter (1 kΩ, 22 nF). Fig. 9(a) shows a step of 1 unit in the duty cycle command. The expected change in the output voltage would be 12/2^13 V, i.e., 1.46 mV, similar to the experimental result (about 1.5 or 2 mV), a very high resolution in spite of having a switching frequency of 6.25 MHz. It can be appreciated that the mean value changes as expected, but there are also other minor fluctuations that can be attributed to multiple jitter sources. Fig. 9(b) shows a step of 8 units in the duty cycle command; therefore, the expected step would be eight times larger, i.e., 11.72 mV, very similar to the experimental result of about 12 mV.
Fig. 10 shows the results of the proposed DPWM over the whole duty cycle range. The horizontal axis represents the duty cycle expressed as an integer number instead of a fraction: with 13 bits of resolution, the duty cycle goes from 0 to 2^13 − 1 (8191). The vertical axis represents the ON-time of HSM in picoseconds. The upper part of Fig. 10 shows the complete duty cycle range. Very high linearity is observed, as expected, because the five MSBs are used in a counter, which is a highly linear component. It is more interesting to see the detailed behavior when changing the eight LSBs, where the small 19.5 ps steps can be appreciated. The lower part of Fig. 10 shows the ON-time when the duty cycle changes from 1279 (4FF in hexadecimal) to 1536 (600 in hexadecimal), which is a complete clock cycle (500 to 5FF in hexadecimal) plus its adjacent values. High linearity is also observed, but the resolution is so high (19.5 ps) that it is difficult to make precise measurements. A 4-GSa/s oscilloscope (Agilent 54831D) has been used, taking the mean value of at least one thousand measurements at each point.
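The main figures quoted in this section follow from the clock frequency, the bit split, and the driver supply voltage. The following sketch reproduces that arithmetic. The constant and function names are hypothetical, and the calculation assumes ideal components (no jitter, and an ideal averaging filter for the voltage step).

```python
F_CLK_HZ = 200e6    # internal clock after multiplication
SYNC_BITS = 5       # MSBs handled by the counter
ASYNC_BITS = 8      # LSBs handled by the fine phase shift
V_DRIVER = 12.0     # driver supply voltage, in volts


def prototype_figures() -> dict[str, float]:
    """Derive the headline numbers of the prototype from its parameters."""
    total_bits = SYNC_BITS + ASYNC_BITS
    return {
        "total_bits": total_bits,
        # One switching period spans 2**SYNC_BITS clock cycles.
        "f_sw_hz": F_CLK_HZ / 2**SYNC_BITS,
        # Time step: 1/256 of the clock period.
        "time_step_ps": 1e12 / (F_CLK_HZ * 2**ASYNC_BITS),
        # Clock a pure counter-based DPWM would need for the same step.
        "equiv_counter_clk_hz": F_CLK_HZ * 2**ASYNC_BITS,
        # Mean output-voltage change for a step of 1 unit in the duty command.
        "v_step_mv": V_DRIVER / 2**total_bits * 1e3,
    }


if __name__ == "__main__":
    for name, value in prototype_figures().items():
        print(f"{name}: {value:g}")
```

The result is a 6.25 MHz switching frequency, a time step of about 19.5 ps, an equivalent counter clock of 51.2 GHz, and a voltage step of about 1.465 mV per unit (about 11.72 mV for 8 units).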
IV. CONCLUSION
A hybrid DPWM based on fine clock phase shifting has been proposed. The architecture uses the advanced clock management capabilities of FPGAs to obtain a very high resolution. The time step is so small that manual placement is necessary in the asynchronous part of the design. The DPWM has been described in VHDL and implemented in a Virtex-5 FPGA. The experimental results show the feasibility of the method, obtaining good linearity with a time step of 19.5 ps.
