Abstract.-One of the most important sources of switching noise and power consumption in large VLSI circuits is the clock generation and distribution tree. This paper analyzes how an asymmetric clock can reduce the switching noise generated by the global clock with very little degradation in performance and reliability. Suitable sizing of the clock generators and the design of asymmetric clock tree cells show the benefits of the proposed technique, which is validated through a design example where a 50% noise reduction is achieved with a 10% loss in operating frequency and no penalty, or even a saving, in power consumption.
I. INTRODUCTION
Clocking is one of the most important issues in very deep submicron VLSI synchronous circuits: clock signals are typically loaded with the greatest fanout, travel over the longest distances, and operate at the highest speeds of any signal within the entire system [1]. The combination of large capacitive loads and a continuous demand for higher clock frequencies has led to an increasingly large proportion of the total power of a system being dissipated within the clock distribution network, one of the most important sources of power consumption in current VLSI systems [1] [2] [3] [4]. On the other hand, the need to reduce clock skew for timing closure makes clocked flip-flops and latches change state together, producing a large cumulative current spike flowing through parasitic resistances and inductances. Noise injected into the substrate and/or propagated through the supply network limits the performance of the sensitive analog and RF portions of the design, and can even cause malfunction [5] [6] [7] [8]. Thus, the distribution of a global clock throughout the whole chip must be planned so that the effects of generating and distributing that clock are reduced as much as possible.
In this paper we propose a technique to reduce the switching noise generated by the clock buffering circuitry, based on the generation of an asymmetric clock. This special clock has different rise and fall times, increasing the transition time of the clock edge that is inactive for the operation of the flip-flops loaded by the clock. The proposed clock circuitry does not significantly degrade the timing performance of the clock circuitry, yielding a reduction in power consumption and switching noise generation of both the clock and the clocked logic, in a reduced area.
This work has been sponsored by the Spanish MEC TEC2004-01509 DOC and the Junta de Andalucía TIC2006-635 Projects.
The paper is organized as follows: Section II introduces the influence of clocking on switching noise; Section III presents the proposal on asymmetric clocking; Section IV covers the design and characteristics of the proposed clock buffers; in Section V the proposal is validated with HSPICE simulations of design examples. Finally, the conclusions are presented.
II. INFLUENCE OF CLOCKING IN SWITCHING NOISE
There are different solutions in the literature for clock distribution and buffering networks [1] [2] [3] [4]. As a first approach, two basic schemes can be considered. The scheme in Fig. 1a is a single driver or tapered buffer generator, the easiest way of generating a clean clock signal because of its simplicity. On the other hand, the more complex balanced distribution buffer network in Fig. 1b reduces crosstalk, power and noise, since the size of the buffers is calculated depending on the clock net load.
The schemes in Fig. 1 provide different timing and power performance but, in order to obtain clean clock signals in the sense of a high slope and signal integrity, a very high instantaneous current consumption is needed. Considering also the high capacitive load of the clock node, the generation of switching noise or di/dt noise in the clocking circuitry is considerable, making it the most important source of switching noise in the IC, except for the I/O circuitry.
There are different approaches to reduce the clock noise. The most common are those based on modifying clock latencies by inserting or properly sizing clock buffers when designing the clock network [9] [10] [11] [12] [13]. For instance, [10] proposes a methodology for clock noise reduction based on an error-driven optimization of the clock-tree latencies using supply current profiles, taking timing constraints and the clock skew into account. Other techniques are based on shaping the clock: for instance, in [9] a reduction of the high-frequency content of the clock is proposed by increasing its transition time, but a special flip-flop is needed to operate correctly. It has also been demonstrated that the use of multiple-level clocking schemes or asynchronous styles brings benefits in terms of switching noise generation [8]. However, it is also clear that VLSI designers prefer a single-phase clocking scheme, provided that the clock-skew reduction problems are affordable.
The proposed solution is valid for both schemes in Fig. 1 and is especially suited for clocked circuits operating with edge-sensitive logic, with the only restriction that all the flip-flops in the circuit should be driven by the same edge (either rising or falling). Likewise, it is desirable that all flip-flops and clock buffers are driven by the same clock signal and that this signal drives only such clocked elements. This is not a limitation, but a usual requirement for VLSI synchronous design.
III. ASYMMETRIC CLOCKING AND LOGIC ISSUES
The proposed technique is based on generating an asymmetric clock, such as those shown in Fig. 2. The transition time, both rising and falling, influences the average power consumption and the peak in supply current for both the clock driver and the clocked logic, as discussed in this section.
The charge and discharge of the clock load network requires current from the supply source. As the transition time increases, the peak in supply current is reduced, since the same amount of charge is supplied over a longer time. Some HSPICE results showing this dependence for a 0.13 µm CMOS inverter are depicted in Fig. 3. The results in the table of Fig. 3 show the increase in propagation delay (t_pLH), fall time (t_fall) and short-circuit current (I_P) as the input slope decreases. A gradual reduction in the peak and an increase in the average of the total current (I_N) are also observed. Regarding average power consumption, short-circuit power increases when the transition time increases, due to the simultaneous operation of the NMOS and PMOS transistors in linear and saturation mode, respectively, for the output falling transition. By managing these effects, a trade-off between peak current reduction and the degradation of average power and timing performance can be found if a suitable value for the input slope is chosen.
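The charge argument above can be illustrated with a first-order estimate. The sketch below is not taken from the HSPICE data of Fig. 3: it assumes an ideal linear voltage ramp on a purely capacitive load, so the supply current during the transition is C·ΔV/t, and it ignores short-circuit current. The 1 pF load and the transition times are hypothetical values chosen only for illustration.

```python
def peak_current(c_load, v_swing, t_transition):
    """First-order peak supply current (A) for a linear ramp on a capacitor.

    Assumes I = C * dV/dt with a constant slope; short-circuit current
    and any resistive or inductive parasitics are ignored.
    """
    if t_transition <= 0:
        raise ValueError("transition time must be positive")
    return c_load * v_swing / t_transition


C_LOAD = 1e-12  # hypothetical clock load: 1 pF
VDD = 1.2       # supply voltage of the 0.13 um process used in the paper

for t in (100e-12, 200e-12, 400e-12):  # hypothetical transition times
    i_peak = peak_current(C_LOAD, VDD, t)
    print(f"t = {t * 1e12:.0f} ps -> I_peak = {i_peak * 1e3:.1f} mA")
```

Under these assumptions, doubling the transition time halves the estimated peak current, which is the qualitative trend the simulated waveforms are described as showing.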
The effect of reducing the clock slope on peak current reduction is twofold: i) the clock driver generates a reduced current peak, reducing the noise injected into the circuit; ii) a lower clock slope produces a smaller current peak in the clocked logic within the circuit. However, reducing the clock slope has the following drawbacks: a) reduced timing performance; b) extra short-circuit power consumption; c) the possibility of race conditions and setup violations in memory elements; and d) a potential reduction in the reliable operation of the whole system due to clock-skew failures. For these reasons, the clock signal has traditionally been designed with clean, high-slope edges. The use of an asymmetric clock such as the one proposed in Fig. 2 overcomes problems c) and d), because they concern the leading "useful" edge, and the proposed asymmetric clock has a typical high-slope leading edge triggering the flip-flops. The influence of the slow trailing edge on clock-skew failures and race conditions is therefore expected to be negligible. Thus, if properly designed, the clock driver for the asymmetric clock generates a lower peak in supply current during the slow transition edge at the cost of a small degradation in timing performance.
The influence of the asymmetric clock on the clocked logic is reduced if the selected fast edge triggers the flip-flops. With this restriction, the slow trailing edge has a minimal effect on the performance of the circuit.
There are recent approaches to logic design with asymmetric gates, but with a rather different viewpoint. The monotonic circuits [14] and several skewed logic families (SCSL [15], S2L [16]) are good examples of this, where the asymmetry is obtained by using dual threshold voltage transistors or by sizing the transistors, as is done in the technique used in this paper. DTPS logic [17] and other skewed logics can be applied to improve the propagation delay and power consumption in tapered buffers and datapaths, but without effects on the transition time of the outputs. The technique proposed in this paper deals with modifications in the transition time of outputs, mainly clock signals, to achieve an important reduction in switching noise, which is the main novelty of this work.
Figure 3. Influence of the input slope on output voltage and current waveforms for a CMOS inverter.
IV. DESIGN OF THE ASYMMETRIC CLOCK BUFFER
To generate an asymmetric clock, we can either use an on-chip clock generator or take an existing symmetric clock and convert it into an asymmetric one. The latter option is better because it does not impose any restriction on any other circuit in the system driven by the same clock. The clock drivers or the buffered distribution tree, such as those shown in Fig. 4, can be used as symmetric-to-asymmetric clock converters if suitably designed. The asymmetric clock generator is basically the serial connection of alternating inverters shown in Fig. 4. The arrow inside each symbol indicates the fast transition at its output, the other transition being slow, following the notation used in [15]. It is clear that both kinds of asymmetric clock in Fig. 2 can be obtained by selecting the input or the output of a specific inverter.
An immediate way to modify the transition time of a clock signal is the suitable sizing of the pull-up and pull-down transistors in the clock drivers. By reducing the W/L ratio of the pull-up (pull-down) transistor in an inverter of the clock driver, its load-driving capability for the rising (falling) output transition is reduced. In this way, the transition time for rising (falling) transitions is increased and, correspondingly, the current peak and the switching noise generated are reduced, as suggested in Fig. 3. However, the W/L reduction increases the delay and the possibility of malfunction in flip-flops and clocked circuitry. A solution to the latter problem is to keep the slope values within a range that guarantees the correct operation of CMOS logic gates and flip-flops. The degradation of propagation delay is a design concern [15], but a trade-off between transition and propagation times can be found.
The library inverter INVx (x: driving capability) is used as the reference. The pull-up and pull-down sizes of INVx are 2Kx and Kx, respectively, where K is the recommended minimum geometry for the selected technology and x is 1, 2, 3, 4 or 8 for the standard cell library. For instance, K = 0.5 µm / 0.12 µm for the 0.13 µm UMC technology. In the asymmetric inverters, the size of the pull-up or of the pull-down is divided by a scaling factor, depending on whether the slow transition is the rising or the falling edge. A summary of the scaling is shown in Table I.
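The sizing rule for INVx can be written as a small helper. This is only a sketch: the scaling factors of Table I are not reproduced here, so the `divisor` argument is a hypothetical placeholder, and the code assumes that the slow edge refers to the inverter's own output (a weaker pull-up slows the rising output, a weaker pull-down slows the falling output).

```python
K_WIDTH_UM = 0.5    # minimum-geometry width K quoted in the text (um)
K_LENGTH_UM = 0.12  # channel length quoted in the text (um)


def inverter_widths(x, slow_edge=None, divisor=1.0):
    """Return (pull_up_width, pull_down_width) in um for an INVx-based cell.

    slow_edge: None for the symmetric library cell, "rise" to weaken the
    pull-up, "fall" to weaken the pull-down (assumed convention).
    divisor: hypothetical scaling factor; the actual values are in Table I.
    """
    pull_up = 2 * K_WIDTH_UM * x   # library pull-up size 2Kx
    pull_down = K_WIDTH_UM * x     # library pull-down size Kx
    if slow_edge == "rise":
        pull_up /= divisor
    elif slow_edge == "fall":
        pull_down /= divisor
    elif slow_edge is not None:
        raise ValueError("slow_edge must be None, 'rise' or 'fall'")
    return pull_up, pull_down


print(inverter_widths(1))                          # symmetric INV1
print(inverter_widths(4, "fall", divisor=2.0))     # INV4 with a slow falling edge
```

The channel length is kept at its minimum value in this sketch, so only the widths change between the symmetric and asymmetric variants.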
V. DESIGN EXAMPLES AND BENCHMARKS
To assess the proposed technique, a clock driver based on a tapered buffer is proposed to generate an asymmetric clock like the one in Fig. 2a. A classical symmetric implementation and a novel asymmetric implementation, shown in Fig. 5a and Fig. 5b, respectively, have been designed in the 0.13 µm, 1.2 V UMC technology and simulated with HSPICE. The input is a 500 MHz symmetric clock with a transition time of 100 ps. The tapering factor is 3, meaning that each inverter is 3 times wider than the preceding one. The dimensions of the pull-up (top) and pull-down (bottom) transistors of the inverters are also shown in Fig. 5. The output clock drives M (M = 10, 100 and 1000) logic inverters of minimum size, to check the influence of the output load on the clocking schemes.
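The tapering rule can be sketched as follows. The number of stages and the first-stage widths used here are hypothetical, since the actual dimensions are those given in Fig. 5; only the tapering factor of 3 comes from the text.

```python
def tapered_widths(first_pull_up_um, first_pull_down_um, stages, taper=3):
    """Return a list of (pull_up, pull_down) widths in um for a tapered buffer.

    Each stage is `taper` times wider than the preceding one.
    """
    if stages < 1:
        raise ValueError("at least one stage is required")
    widths = []
    for i in range(stages):
        scale = taper ** i
        widths.append((first_pull_up_um * scale, first_pull_down_um * scale))
    return widths


# Hypothetical 4-stage symmetric chain starting from a minimum inverter.
for n, (w_p, w_n) in enumerate(tapered_widths(1.0, 0.5, stages=4), start=1):
    print(f"stage {n}: pull-up {w_p:.1f} um, pull-down {w_n:.1f} um")
```

An asymmetric chain would start from the same progression and then reduce the pull-up or pull-down width of each stage according to the edge that is meant to be slow.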
As a first comparison result, a gain of 21% in area is expected when using the proposed scheme of Fig. 5b, due to the reduction in transistor size, indicating that the proposed scheme does not introduce any penalty in hardware resources. The parameters measured (Table II) are the propagation delay, transition time, energy per transition and noise (peak of supply current) of the clock driver. To check the influence of the clock signal on the clocked logic, the energy per transition and the noise induced in the logic driven by the clock are included in the right columns of Table II. The measured ratio between the fall and the rise transition times is about 0.9 for the symmetric solution and 1.7 for the asymmetric one, a ratio that standard flip-flops can tolerate. Thus, no special flip-flops are needed, unlike in [9].
As can be seen from the results of Table II, the higher transition times in the trailing edge lead to a reduction in peak current, without a significant penalty in performance on the more critical leading edge. In particular, the selected dimensions for the clock buffer introduce a reduction in timing performance, quantified as a 50% degradation in fall time, set as the design target, but with an improvement of up to 10% in rise time.
The 40% degradation in high-to-low propagation delay is practically compensated by a gain of up to 30% in low-to-high delay, resulting in a loss of around 10% in operating frequency.
Concerning average power consumption, the energy per transition results in Table II indicate that there is an important reduction of up to 28% in the clock driver for the proposal. However, this gain is offset by an increase in the power consumed by the clocked logic, especially under high output load conditions. In the case of a reduced capacitive load, the gain in the whole system due to the asymmetric clock is up to 20%, but for high load conditions the total power of the clock driver and the clocked logic increases by approximately 16%.
Regarding peak power, as an indirect measurement of switching noise, the overhead in the rising (active) edge is negligible, and the results of both schemes are comparable. However, the gain from using the asymmetric proposal in the falling (inactive) edge is quite significant, as expected. Simulated data indicate that the peak in supply current in the falling transition is between 48% and 54% lower for the asymmetric buffer. Likewise, the peak induced in the clocked logic is reduced for the asymmetric buffer by between 4% and 29% for light and heavy loading conditions, respectively.
CONCLUSIONS
One of the most important sources of switching noise in large VLSI circuits is the clock-driven circuitry and the clock generation and distribution logic. In this paper, a novel proposal for clock generation and buffering has been presented. The main idea is to increase the transition time of the trailing edge of the clock, converting the clock into an asymmetric signal. Since all the requirements for timing closure and limited, or zero, skew concern the leading edge, the reduction in the reliability of the synchronization is negligible.
The increase in the transition time of the trailing edge reduces the switching noise of the unused transition (48%-54%), producing an additional reduction of switching noise in the clocked logic (4%-29%). The proposal has been designed and characterized for a design example in a 0.13 µm technology, showing a significant reduction in switching noise. The variation in power consumption strongly depends on the load conditions, ranging from a 20% power reduction in light load conditions to a 16% power increase for heavy load conditions. The penalty in timing performance is limited to an average reduction of about 10% in operating frequency. The proposal also presents a significant reduction in hardware resources.
As future work, the integration of a complex VLSI mixed-signal SoC with a clock tree based on the proposal is needed to assess the advantages of the proposed clocking scheme.
