A 1.2GHz delayed clock generator that adjusts its clock phase according to the input clock frequency has been developed. It is a fully digital CMOS circuit, which makes it a simple, robust, and portable IP. Its one-cycle lock time enables clock-on-demand circuit structures. The delayed clock generator tile, implemented in 0.13µm CMOS technology, occupies only 0.004mm² and operates at input frequencies ranging from 625MHz to 1.2GHz.

Introduction

Most microprocessors use PLLs as clock generators. However, a PLL is not suitable for low-supply-voltage operation or for clock-on-demand applications that require a fast lock time. To achieve an operating speed higher than the internal main clock frequency, multiphase clocks can be used in large SoCs. However, distributing all of the multiphase clock signals across the entire die requires a large area, and skew and power consumption become problematic. Generating multiphase clocks locally, wherever they are needed, overcomes these problems [2]. Although the mostly digital DLL in [1] and the all-digital multiphase clock generator in [3] offer several advantages of digital design, they take up to 2.9µs and 128 cycles to lock, respectively, which is not fast enough for clock-on-demand operation. All-digital clock multiplication with a fast lock time of 1.3 cycles has also been proposed [4]. Small area, low power consumption, and fast lock time are important design parameters for local multiphase clock generators.

In this work, a very compact delayed clock generator is presented. The proposed delayed clock generator tile has two important features: an open-loop, non-PLL/DLL-based design and an all-digital, static-circuit-based design. The latter suits portable IP and fast time-to-market designs. The former enables simple clock-on-demand schemes thanks to its one-cycle lock time, and offers smaller area, lower power consumption, no jitter accumulation, and lower-voltage operation than PLL-based counterparts.
Once the 1-to-0 transition detector finds the position of the π phase, the position of the π/2 phase is easily determined: it lies at half the number of delay cells within the π phase. If the number of delay cells within the π phase is even, the π/2 position falls directly on a delay-cell output. If it is odd, fine-grained phases (φ'_0.5, φ'_1.5, …, φ'_(n-0.5)) are needed; these are produced by a phase interpolator. The select signal must also be adjusted to compensate for the delay between the delay-cell output and the final output clock, as shown in Fig. 2. If the select signal points three delay cells earlier, the delay of the interpolator, multiplexer, and buffer is offset.

If the delay of a delay cell is very long compared with the cycle time, the number of logic ones within the π phase may fall below 6 at the slow corner, and the selection process reaches its limit. To solve this problem, repeated occurrences of the same clock phase are used for selection, as shown in Fig. 2(b). When the cell delay becomes very long compared with the cycle time, multiple repeated phases (φ_i, φ_3i) appear along the delay chain, and one of them can be chosen using S_i and S_3i. This provides more design flexibility and accuracy. In this example, since phase φ_4 can be mapped to one of {φ_11, φ_12, φ_13}, one of these phases provides an accurate result. If φ_4 is mapped to φ_12 as shown in Fig. 2(b), φ_10 is chosen instead of φ_2, and φ_7 is finally selected after the three-cell delay compensation.

Implementation

Using the proposed delayed clock generator, π/2 and 3π/2 phase generator tiles (108.5µm×36.67µm) are implemented in a 130nm CMOS process, as shown in Fig. 3 (top). The tiles are placed at 14 locations in a 1.2GHz RISC microprocessor, as shown in Fig. 3 (bottom).
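The tap-selection procedure described above (1-to-0 transition detection, halving, three-cell compensation, and fallback to a repeated phase) can be illustrated with a small behavioral sketch. This is not the authors' circuit: the function names are hypothetical, and the model assumes the delay-chain taps are captured as a list of bits that starts with a run of ones, so that the index of the first 1-to-0 transition equals the number of delay cells spanning half a cycle. A fractional result such as 3.5 stands for a phase interpolated between two adjacent taps.

```python
# Behavioral sketch only; names and the sampling model are assumptions,
# not taken from the paper's implementation.

COMPENSATION_CELLS = 3  # select "three earlier" to offset interpolator/mux/buffer delay


def one_to_zero_positions(bits):
    """Return every tap index k where bits[k-1] == 1 and bits[k] == 0."""
    return [k for k in range(1, len(bits)) if bits[k - 1] == 1 and bits[k] == 0]


def select_quarter_phase_tap(bits, compensation=COMPENSATION_CELLS):
    """Pick the tap position that yields a quarter-cycle delayed clock.

    The first 1-to-0 transition gives the number of cells in half a cycle.
    Half of that is the quarter-cycle position; a value ending in .5 means
    the output must be interpolated between two neighbouring taps.
    """
    transitions = one_to_zero_positions(bits)
    if not transitions:
        raise ValueError("no 1-to-0 transition found on the delay chain")

    half_cycle_cells = transitions[0]
    target = half_cycle_cells / 2 - compensation

    if target < 0:
        # Too few cells per half cycle: fall back to the repeated phase,
        # one full cycle further down the chain.
        if len(transitions) < 2:
            raise ValueError("chain too short to reach a repeated phase")
        cells_per_cycle = transitions[1] - transitions[0]
        target += cells_per_cycle

    return target
```

With the pattern from the example in the text (four ones, four zeros, repeated), the transitions are found at taps 4 and 12, the uncompensated quarter-cycle tap 2 is replaced by tap 10, and `select_quarter_phase_tap([1]*4 + [0]*4 + [1]*4 + [0]*4)` returns 7.0 after the three-cell compensation.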
The multiphase generator tiles enable a pseudo-pipeline technique in the 1.2GHz microprocessor, which lets many blocks of the microprocessor achieve 2.4GHz-like performance with only a small increase in power consumption. The output load of each tile, extracted from the full-chip layout, is typically 400fF. Because the delayed clock generator is designed for operating frequencies ranging from 625MHz to 1.2GHz, seventeen delay cells are used to cover all PVT variations. Because tile1, the 0.13µm version, is integrated in the RISC processor and has no dedicated pin, a second test vehicle, tile2, running at 600MHz, was fabricated in a 0.18µm process to obtain more precise test results for the proposed clock generator.
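The relation between operating frequency, cell delay, and chain length can be checked with a rough calculation. The sketch below is a back-of-envelope aid, not the authors' sizing method. It assumes a 50% duty cycle and that the chain must span at least half a cycle at the lowest frequency so that the first 1-to-0 transition is visible. The function names are hypothetical and the paper does not report per-cell delays.

```python
# Back-of-envelope sketch; the sizing criterion is an assumption, not the paper's.


def half_period_s(freq_hz):
    """Half of the clock period in seconds, assuming a 50% duty cycle."""
    return 1.0 / (2.0 * freq_hz)


def cells_in_half_cycle(freq_hz, cell_delay_s):
    """Number of delay cells (possibly fractional) that fit in half a cycle."""
    return half_period_s(freq_hz) / cell_delay_s


def min_cell_delay_s(freq_hz, n_cells):
    """Smallest per-cell delay for which n_cells still span half a cycle."""
    return half_period_s(freq_hz) / n_cells
```

Under these assumptions, half a cycle at 625MHz is 0.8ns, so seventeen cells span it only if each cell contributes at least about 47ps. At 1.2GHz, half a cycle is about 417ps, so keeping six or more cells within it requires a cell delay of roughly 69ps or less. Beyond that point the repeated-phase selection is needed.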
Measurement Results and Conclusion

Waveforms of tile1 are shown in Fig. 4. The upper waveforms show the original clock and its delayed phases. The middle waveforms are the 1-to-0 detector output, the select signal from the 1-to-0 select signal generator, and the selected phase (φ'_i). The final output clock is generated with a π/2 phase shift and a one-cycle lock time. The phase errors are within ±2% of the cycle time.

Figure 5 shows the measurement results from tile2, the test vehicle in a 0.18µm process. The left plot shows that the final output clock is generated with a π/2 phase shift and a one-cycle lock time. Running at 500MHz, tile2 has a peak-to-peak jitter of 7.6ps, as shown in Fig. 5 (right). Figure 6 shows the measured shmoo plot of tile2. The clock generator operates from 200MHz to 600MHz within a ±2% phase error. At a halved supply voltage of 0.9V, it operates at up to 150MHz and consumes 0.5mW.

Table 1 summarizes several digital multiphase clock generators. It shows that the proposed clock generator (tile1) occupies only 0.004mm², which enables local multiphase clock generation with only a small area and power overhead. A one-cycle lock time is essential for returning from power-down mode to active mode in clock-on-demand operation, so the generator can be applied to voltage-scaling microprocessors for low-power operation.
