Abstract-
I. INTRODUCTION
Digital control of switched-mode power supplies (SMPS) has attracted considerable research attention due to its now well-known advantages [1] [2] [3], such as programmability, advanced control algorithms, reduced component count, low sensitivity to external factors and aging, and ease of design and prototyping. However, its disadvantages are also well known, the two most important being the processing/sampling delay and the limited resolution [4]. Regarding the latter, resolution is limited mainly by the ADC and the PWM. However, ADC resolution is becoming a less important problem thanks to the windowed ADC technique [3] and because the PWM resolution needs to be higher than the ADC resolution in order to avoid limit cycling [4] [5].
Traditional digital PWMs (DPWMs) are based on counters (see Section II.B). The advantage of these DPWMs is that they are very simple and achieve high linearity. However, their resolution cannot be very high, as the minimum time step is equal to the clock period of the counter. Furthermore, their power consumption is proportional to the clock frequency, so obtaining a relatively high resolution results in high power consumption. Delay-lines can be used to increase DPWM resolution. Given the interest in high-resolution DPWMs, many different architectures have been proposed in recent years [6] [7] [8] [9] [10] [11], and even a classification of them has appeared [12]. Although these architectures differ from each other, almost all of them make use of delay-lines. The main advantages of delay-line DPWMs are high resolution and low power consumption. However, they have lower linearity and, in some cases, even non-monotonic behavior. Hybrid architectures have also been proposed, seeking a trade-off between the advantages of counter-based and delay-line DPWMs. By changing the weight of the synchronous (counter-based) and asynchronous (delay-line) blocks, a trade-off between linearity and power consumption is obtained, while the high resolution of delay-line DPWMs is kept, as this block is usually in charge of the least significant bits (LSBs).
This paper proposes a new hybrid DPWM architecture. The synchronous block is counter-based. The asynchronous block, however, does not implement a delay-line but instead uses internal FPGA resources. Taking advantage of these resources, it obtains an excellent trade-off between linearity and time resolution. The proposed DPWM is, in principle, only intended for FPGA implementation, but it is very simple to design and can be implemented even in the lowest-cost FPGAs, as shown in the experimental results.
The paper is organized as follows: Section II describes the proposed DPWM architecture, while Section III shows the experimental results. Finally, Section IV gives the conclusions.
II. PROPOSED DPWM ARCHITECTURE

A. DLL block
The key to the proposed DPWM architecture is that it takes advantage of the advanced DLL features available in almost every FPGA nowadays. Digital devices such as FPGAs include dedicated blocks for managing clock signals: DLLs (Delay-Locked Loops) or PLLs (Phase-Locked Loops). Using these DLLs or PLLs, it is possible to multiply or divide the clock frequency. Many of these blocks can also generate four phase-shifted clocks (shifted 0°, 90°, 180° and 270°) directly (Spartan and Virtex families of Xilinx) or allow phase-shifted versions of the clock to be generated (Cyclone and Stratix families of Altera; ProASIC3, Fusion or Axcelerator families of Actel).
The first DLL feature used in the proposed DPWM is clock frequency multiplication. Its advantage is that a high clock frequency can be used internally in the DPWM while a lower external frequency is generated and used in the rest of the digital controller. There are two main reasons for doing so. First, it is difficult to drive the package pin and the PCB trace at very high frequencies due to their size, which is orders of magnitude larger than that of internal chip connections (mm or even cm instead of µm). Second, size also affects power consumption: the parasitic capacitance of each element is proportional to its size, and power consumption is in turn proportional to the parasitic capacitance. Internal clock multiplication is a well-known and widespread digital technique (e.g. no GHz signal can be found on the PCB of a laptop or desktop computer, because such signals are generated inside the microprocessor). This clock multiplication is proposed for the DPWM. In the implementation used for the experimental results, a 32 MHz external clock is multiplied by 4 to obtain a 128 MHz internal clock in the DPWM. Using the multiplied clock, the time resolution improves from 31.25 ns to 7.81 ns. However, the rest of the controller modules (e.g. the control algorithm) can work at the low clock frequency, which decreases power consumption and eases the design (longer critical paths are acceptable in these blocks), as shown in Fig. 1. This is very useful, given that a counter-based DPWM is very simple and can work at high frequencies, whereas other blocks in the controller are usually more complex and need lower frequencies. Therefore, two clock frequencies are proposed in the architecture: the high clock frequency is used for the DPWM and the low clock frequency for the rest of the controller.
However, the main contribution of the proposed DPWM comes from another DLL feature. Most DLLs in FPGAs also generate phase-shifted versions of the output clock. In many of these DLLs, four clocks shifted 0°, 90°, 180° and 270° are available. This allows the time resolution to be multiplied by 4 (2 additional bits) beyond the maximum resolution achievable with a counter-based technique. Therefore, it is necessary to use both synchronous and asynchronous techniques, which are explained in more detail in the next sections.
B. Synchronous block
The synchronous block is a counter-based DPWM that uses the most significant bits (MSBs) of the duty cycle, d[n-1, 2], where n is the total number of bits. As can be seen in Fig. 2, the synchronous block is based on a counter and comparator structure. Its functionality is the following: while the duty cycle command is above the counter value, the output is in the on-state, and when the counter reaches the duty cycle command the output is turned off. This is a simple block and, therefore, it can work at high clock frequencies.
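In the synchronous block, resolution is given by both the clock and the switching frequencies. Expressed as the number of available duty cycle values, this resolution can be obtained as:
$$N_{sync} = \frac{f_{clk}}{f_{sw}} \qquad (1)$$
where $f_{clk}$ is the clock frequency and $f_{sw}$ is the switching frequency.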
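The counter-and-compare behavior described above can be illustrated with a short behavioral model. This is only a sketch in Python, not the VHDL used in the paper; the function names `sync_levels` and `counter_dpwm` and the example duty command of 128 are assumptions chosen for illustration, while the 128 MHz and 250 kHz values are those of the experimental setup.

```python
def sync_levels(f_clk_hz, f_sw_hz):
    """Number of counter states (duty cycle steps) in one switching period."""
    return round(f_clk_hz / f_sw_hz)


def counter_dpwm(duty_msbs, levels):
    """Output of a counter-based DPWM over one switching period.

    The output is 1 while the duty command is above the counter value and
    0 once the counter has reached the command.
    """
    return [1 if duty_msbs > count else 0 for count in range(levels)]


levels = sync_levels(128e6, 250e3)      # clock and switching frequencies
waveform = counter_dpwm(duty_msbs=128, levels=levels)
on_clocks = sum(waveform)               # clock periods with the output high
```

With these values the counter has 512 states per switching period, so the synchronous block alone resolves 9 bits.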
C. Asynchronous block
The two least significant bits (LSBs), d[1, 0], are used in the asynchronous block. These two bits select between the four phase-shifted clocks generated by the FPGA's DLL. In fact, these clocks are combined using AND gates (see Fig. 3) in order to obtain four other phase-shifted signals that are high for only a quarter of a cycle instead of half a cycle (see Fig. 4). The basic idea behind these four phase-shifted signals is to obtain four possible switching instants during each clock cycle. Therefore, resolution is multiplied by four. In general, using m asynchronous bits, the total resolution is calculated as:
$$N_{total} = 2^{m} \cdot \frac{f_{clk}}{f_{sw}} \qquad (2)$$
where m is the number of asynchronous bits, $f_{clk}$ is the clock frequency and $f_{sw}$ is the switching frequency.
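One possible way to derive the quarter-cycle signals from the four phase-shifted clocks is sketched below in Python. The exact gating of Fig. 3 is not reproduced here: the pairing used (the selected clock ANDed with the clock that leads it by 90°) is an assumption, as are the names `clock_level` and `quarter_cycle`. The model is ideal and ignores all propagation delays.

```python
PHASES_DEG = (0, 90, 180, 270)


def clock_level(phase_deg, t):
    """Level of an ideal 50 % duty clock shifted by phase_deg.

    t is the time expressed in clock periods.
    """
    return 1 if (t - phase_deg / 360.0) % 1.0 < 0.5 else 0


def quarter_cycle(lsbs, t):
    """Quarter-period pulse starting at the rising edge of the selected clock.

    Assumed gating: selected clock AND the clock leading it by 90 degrees.
    """
    selected = PHASES_DEG[lsbs]
    leading = PHASES_DEG[(lsbs - 1) % 4]
    return clock_level(selected, t) & clock_level(leading, t)


# Sample one clock period at the mid-point of each eighth of the period.
samples = [(i + 0.5) / 8 for i in range(8)]
pulses = {lsbs: [quarter_cycle(lsbs, t) for t in samples] for lsbs in range(4)}
```

In this ideal model each value of the two LSBs yields a pulse covering a different quarter of the clock period, which is what provides four switching instants per clock cycle.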
D. Functionality of the DPWM architecture
The proposed DPWM, as shown in Fig. 3, is composed of two blocks: a synchronous block and an asynchronous block, as explained in the previous sections.
The functionality of the proposed hybrid DPWM is as follows (see Fig. 3). The output of the synchronous block (the counter-based block clocked at four times the external frequency) sets the output of the DPWM depending on the MSBs of d (the duty cycle command). The synchronous block uses only the 0° shifted clock, while the asynchronous block needs all four phase-shifted clocks. The proposed DPWM architecture is intended for a synchronous multiphase buck converter, so it creates driving signals for both the high-side MOSFETs (HSM) and low-side MOSFETs (LSM), as shown in Fig. 3. It can, however, be easily adapted to any other SMPS topology. The idea is that the turn-on instant of the HSM always coincides with a 0° clock edge, while the turn-off instant can coincide with an edge of any of the four clocks, depending on the LSBs. The opposite is done for the LSM: it is turned on with an edge of any of the four clocks and turned off with the 0° shifted clock. As can be seen in Fig. 3, the asynchronous block generates the signal named QuarterCycle, which is high for a quarter of the clock cycle, starting at the rising edge of one of the four clocks, depending on the value of the 2 LSBs of the duty cycle (see Fig. 4). For the HSM, the output is reset when the synchronous block is already off and QuarterCycle arrives. Therefore, the output is active for an integer number of clock cycles (as generated by the synchronous block using the n-2 MSBs) plus 0 to 3 quarters of a cycle, depending on the 2 LSBs.
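The resulting on-time of the high-side signal can be written as a small ideal model: whole clock periods from the MSBs plus quarter periods from the LSBs. The Python sketch below is illustrative only; the function name `hsm_on_time_ns` is an assumption, and the model ignores dead times and propagation delays.

```python
def hsm_on_time_ns(duty, t_clk_ns, m=2):
    """Ideal high-side on-time for an n-bit duty cycle command.

    The MSBs give whole clock periods (synchronous block); the m LSBs add
    0 .. 2**m - 1 fractions of a clock period (asynchronous block).
    """
    whole_clocks = duty >> m              # d[n-1, 2] when m = 2
    fraction = duty & ((1 << m) - 1)      # d[1, 0] when m = 2
    return (whole_clocks + fraction / (1 << m)) * t_clk_ns


T_CLK_NS = 1e3 / 128.0                    # 128 MHz internal clock: 7.8125 ns
on_times = [hsm_on_time_ns(d, T_CLK_NS) for d in range(8)]
```

Consecutive duty cycle commands differ by a quarter of the 7.8125 ns clock period in this model, so the ideal characteristic is a monotonic staircase.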
However, using asynchronous techniques involves delay problems, and in this case the signal QuarterCycle suffers from them. When it corresponds to the 270° clock, it also has a short on-time at the beginning of the 0° clock (see Fig. 4). This problem can produce non-monotonic behavior (duty cycle commands ending in "11" would behave similarly to commands ending in "00"). In order to avoid the non-monotonic behavior, some modifications to the DPWM architecture are proposed. These modifications of the basic DPWM architecture shown in Fig. 3 are explained in the next section.
E. Modifications for avoiding non-monotonic behavior
As explained before, the basic DPWM shown in Fig. 3 can exhibit non-monotonic behavior. In order to guarantee monotonicity, some additional resources are needed to avoid the problems caused by asynchronous delays. The proposed block diagram is shown in Fig. 5. In order to avoid the non-monotonic behavior, the reset of the output when the 2 LSBs of d are "11" is only allowed after the arrival of the 90° clock. For that reason, additional RS registers and multiplexers are added (see Fig. 5).
In conclusion, the proposed DPWM achieves high resolution thanks to the use of 4 phase-shifted clocks, while maintaining high linearity and monotonic behavior without using delay-lines. Resolution is increased 4 times compared with purely counter-based DPWMs.
III. EXPERIMENTAL RESULTS
The whole DPWM (both synchronous and asynchronous blocks) has been designed in VHDL (the same can be done in Verilog). This allows the DPWM to be implemented in any FPGA. The only part not described in VHDL is the DLL, which is provided by the manufacturer as a black box. This is the only limitation for implementing this DPWM in an IC, although a similar DLL could be designed for a possible IC implementation. However, using the FPGA's DLL drastically reduces design effort and time, while a well-known and tested design provided by the FPGA manufacturer increases reliability.
An important objective is to be able to use the automatic place & route tools for converting the VHDL code into a physical implementation. This poses no problems in the synchronous block, but different delays in each data path of the asynchronous block decrease the linearity of the proposed DPWM. In order to keep linearity as high as possible, manual place & route can be used in the asynchronous block. However, as easy design is one of the objectives of the proposed DPWM, this can be avoided by manually placing only some key elements, such as the RS output registers and the QuarterCycle multiplexer. Placing the generation of QuarterCycle at similar distances from the different RS output registers helps the automatic place & route tool to do the rest of the placement and routing while maintaining high linearity.
A. Simulation results
The software used for simulation was ModelSim. In order to ensure a correct VHDL description of the proposed design, behavioral simulations (which do not include delays) were run. However, in order to check linearity and monotonic behavior, post-place and route simulations were carried out. These simulations include the delays of each signal. Using these simulations, non-monotonic behavior was identified in the design of Fig. 3. Once it was changed to the design of Fig. 5, further post-place and route simulations confirmed the monotonic behavior of the corrected design. Furthermore, these simulations made it possible to check how linearity changed with different placements and synthesis options. This kind of simulation is necessary because the delays are a key part of the design in the asynchronous block. Fig. 6 shows the HSM and LSM driving signals of the 4 phases of the multiphase converter also used in the experimental results (including dead times for the 4 phases).
B. Experimental results
The proposed DPWM has been implemented in a Spartan-3 FPGA (Xilinx), occupying only 22410 equivalent gates for a 4-phase buck converter (including the DLLs). The Spartan families are the low-cost, low-speed FPGAs of Xilinx, so even better time resolution can be obtained using Virtex FPGAs. However, the objective is to show the feasibility of the method with low-cost devices. In fact, as these FPGAs can cost even less than $2, they could be considered for final implementation and not only for prototyping. The exact device was an XC3S200FT256-4. The suffix -4 denotes a low-speed, low-cost device; even within the same family, higher speeds can be obtained using -5 devices.
The experimental results have been obtained using an external clock of 32 MHz, which is multiplied by four using the DLLs (called Digital Clock Managers or DCMs in this family of FPGAs). These DCMs also generate the four phase-shifted clocks. Hence the internal clock frequency of the DPWM is 128 MHz. The final resolution of this DPWM can be calculated as a time step using:
$$\Delta t = \frac{T_{clk4x}}{2^{m}} \qquad (3)$$
where $T_{clk4x}$ is the period of the 128 MHz clock (7.81 ns), and m is the number of asynchronous bits (2 in this case). Therefore, the final resolution is a quarter of the 128 MHz clock period, that is, 1.95 ns. As the switching frequency was 250 kHz for these experimental results, the total resolution can be calculated using (2), which gives 2048 different duty cycle values. This is equivalent to 11 bits of resolution, 9 for the synchronous block and 2 for the asynchronous block.
However, the number of bits of resolution changes from application to application depending on the switching frequency. For instance, a 1 MHz converter would obtain 9 bits of resolution, and a 4 MHz converter would obtain 7 bits. What remains the same in all cases is the resolution measured as a time step, which is 1.95 ns. That is the resolution used in the experimental results, although the place & route tool report and the post-place and route simulations assure proper operation up to an internal frequency of 171.2 MHz, which would lead to a time step of 1.46 ns if a 171.2 / 4 = 42.8 MHz external clock were used instead of the 32 MHz clock that was employed. These results are obtained for a low-cost Spartan-3 device. Of course, higher resolutions can be achieved using faster devices. Synthesis results for other FPGAs are shown in Table I. All these results were obtained using XST (Xilinx Synthesis Tool) and Xilinx ISE v8.1. These results are comparable to or even better than those of other state-of-the-art DPWMs based on delay-lines [6] [7] [8] [9] [10] [11] [12].
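The figures quoted above can be checked with a few lines of arithmetic. The Python sketch below assumes that the time step is the internal clock period divided by 2^m and that the equivalent number of bits is the base-2 logarithm of 2^m times the ratio of clock to switching frequency; the helper names `time_step_ns` and `resolution_bits` are illustrative.

```python
import math


def time_step_ns(f_clk_hz, m=2):
    """Time resolution: internal clock period divided by 2**m, in ns."""
    return 1e9 / f_clk_hz / (1 << m)


def resolution_bits(f_clk_hz, f_sw_hz, m=2):
    """Equivalent bits: log2 of the number of duty cycle steps per period."""
    return math.log2((1 << m) * f_clk_hz / f_sw_hz)


# Internal clock of 128 MHz at the three switching frequencies mentioned.
bits = {f_sw: resolution_bits(128e6, f_sw) for f_sw in (250e3, 1e6, 4e6)}
step_128 = time_step_ns(128e6)      # clock used in the experiments
step_171 = time_step_ns(171.2e6)    # maximum reported by place & route
```

The time step depends only on the internal clock, whereas the number of bits falls by one each time the switching frequency doubles.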
The experimental results have been obtained using a 4-phase buck converter, which requires most of the DPWM structure to be replicated 4 times. The results have been almost the same in all the phases, showing that the automatic place & route process is valid (4 asynchronous blocks were generated and routed). The HSM and LSM driving signals were generated including dead times for the 4 phases of the multiphase converter. Fig. 7 shows the experimental HSM and LSM driving signals of the 4 phases. Linearity and monotonic behavior have also been tested. Table II shows the measured on-time for different duty cycles. These results are close to the theoretical ones (1.95 ns steps), showing the feasibility of the proposed DPWM architecture. Furthermore, the accumulated step every four duty cycle values (discarding the 2 LSBs) shows even higher linearity, as it is determined by the synchronous block, which is highly linear.
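A monotonicity and linearity check of this kind can be scripted once the on-times are available. The Python sketch below is a generic illustration and not the procedure used by the authors; the helper names are assumptions, and the `example` list contains hypothetical placeholder values, not the measurements of Table II.

```python
def steps_ns(on_times_ns):
    """Differences between on-times of consecutive duty cycle codes."""
    return [b - a for a, b in zip(on_times_ns, on_times_ns[1:])]


def is_monotonic(on_times_ns):
    """True if every code increment lengthens the on-time."""
    return all(step > 0 for step in steps_ns(on_times_ns))


def dnl_lsb(on_times_ns, ideal_step_ns):
    """Differential nonlinearity of each step, in units of the ideal step."""
    return [step / ideal_step_ns - 1.0 for step in steps_ns(on_times_ns)]


# Hypothetical on-times in ns for five consecutive codes; NOT measured data.
example = [100.0, 102.1, 103.9, 105.8, 107.8]
monotonic = is_monotonic(example)
worst_dnl = max(abs(x) for x in dnl_lsb(example, ideal_step_ns=1.953125))
```

Applying `steps_ns` to every fourth code isolates the contribution of the synchronous block, which corresponds to the accumulated step mentioned above.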
IV. CONCLUSIONS
A new hybrid counter-asynchronous DPWM architecture has been proposed. This DPWM, which is easy to design, is intended for FPGA implementation, as it takes advantage of the internal DLL available in almost every FPGA nowadays. The DLL raises the resolution of the DPWM in two ways. The external clock frequency is internally multiplied for a higher resolution of the counter-based block of the DPWM. Once the maximum possible resolution is achieved in the synchronous block, it is multiplied by four using four phase-shifted clock outputs of the DLL. The proposed DPWM has been verified through experimental results using a low-cost FPGA implementation (Spartan-3), which shows the feasibility of the method not only for prototyping purposes but also for final products. A time resolution under 2 ns has been obtained while keeping high linearity and monotonic behavior.
