Abstract Presented in
Introduction
In recent years, the need to implement highly complex algorithms within a limited power budget has forced researchers to explore various methods for power reduction [1]. These methods are applicable at various levels of the VLSI design methodology, including the architectural, logic and circuit levels. However, the interaction between these techniques is highly complex, and there exists a need for a unifying basis for power reduction. A desirable characteristic of such a basis is that it should be applicable at all levels of the design methodology, including the algorithmic level.
Most of the work done in the area of low-power design can be placed into the following two broad categories: 1.) development of power-reduction techniques and 2.) investigation of the lower bounds on power dissipation. Work done in category 1 can be viewed as a collection of techniques to reduce power at all possible levels of the VLSI design methodology. These techniques include pipelining and parallel processing [1-7].
Some related work includes determining the lower bounds on the achievable power dissipation [8-9]. In [8], the lower bound on power dissipation per pole for analog circuits was presented. This bound is based upon the desired signal-to-noise ratio (SNR). Empirical lower-bound estimates for digital circuits were also presented in [8], based upon a desired SNR, estimates of gate complexity and energy per operation. The theory presented in this paper can provide the desired SNR for a given algorithm. Employing thermodynamic arguments [9], the thermal noise power spectrum was determined to be kT and the minimum energy required for a logic change was shown to be lower bounded by approximately 4kT. The lower bounds calculated via the proposed theory are also a function of the noise power spectrum of the implementation media.
In this paper, we employ an information-theoretic approach to develop a mathematical basis for determining the lower bounds on power dissipation for DSP algorithms. Information-theoretic measures have been employed in the past to define a measure of computational work of a Boolean transformation [10-11]. This measure of computational work is closely related to the area [12] occupied by an implementation of the Boolean function. Elegant approaches to power estimation at the register-transfer level (RTL), which incorporate concepts such as entropy, have been proposed [13-14].
Our work differs significantly in that we begin with an implementation-independent view of an algorithm and then proceed to determine the lower bounds on power dissipation for a given architecture. This is done via the introduction of the information transfer rate R (an implementation-independent quantity) and the channel capacity C (an architecture- and technology-dependent quantity). The proposed basis has two advantages: 1.) it allows us to derive lower bounds on the power dissipation in DSP algorithms and 2.) it enables us to unify existing power-reduction techniques under a common framework. We will illustrate 1.) via examples of simple digital filters and 2.) by determining the lower bounds on power dissipation for adiabatic logic [4-5]. However, it must be mentioned that, with respect to 2.), the proposed theory has been employed to provide a common basis for architectural techniques such as pipelining and parallel processing [15]. In order to provide the necessary background, we will review some basic information-theoretic concepts in this section.
Entropy and Mutual Information
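Consider a discrete source generating symbols from the set $S_X = \{X_0, X_1, \ldots, X_{L-1}\}$ according to a probability distribution $\Pr(X)$. A measure of the information content of this source is given by its entropy $H(X)$, which is defined as follows:

$$H(X) = -\sum_{i=0}^{L-1} \Pr(X_i) \log_2 \Pr(X_i). \tag{2.1}$$

If the source output $X$ is passed through a channel with output $Y$, the mutual information $I(X;Y)$ is given by

$$I(X;Y) = H(X) - H(X|Y), \tag{2.2}$$

where $H(X|Y)$ is the conditional entropy of $X$ conditioned on $Y$. The mutual information $I(X;Y)$ can be viewed as the reduction in uncertainty in $X$ due to the knowledge of $Y$.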
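The two quantities above can be sketched numerically as follows. This is a minimal illustration, not part of the original paper: the function names and the joint-probability-table representation are assumptions, and the standard definitions $H(X) = -\sum_i \Pr(X_i)\log_2 \Pr(X_i)$ and $I(X;Y) = H(X) - H(X|Y)$ are used.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y) in bits, where joint[i][j] = Pr(X = i, Y = j).

    The table layout is a hypothetical convenience chosen for this sketch.
    """
    p_x = [sum(row) for row in joint]          # marginal of X
    p_y = [sum(col) for col in zip(*joint)]    # marginal of Y
    h_x_given_y = 0.0
    for j, py in enumerate(p_y):
        if py > 0:
            # Distribution of X conditioned on Y = j
            cond = [row[j] / py for row in joint]
            h_x_given_y += py * entropy(cond)
    return entropy(p_x) - h_x_given_y

# A fair bit passed through a noiseless identity channel: Y reveals X fully.
noiseless = [[0.5, 0.0], [0.0, 0.5]]
# A fair bit whose output is independent of the input: Y reveals nothing.
independent = [[0.25, 0.25], [0.25, 0.25]]
i_noiseless = mutual_information(noiseless)
i_independent = mutual_information(independent)
```

For the noiseless table the mutual information equals the source entropy, while for the independent table it vanishes, which matches the reading of $I(X;Y)$ as a reduction in uncertainty.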
Information Transfer Rate
The reduction in uncertainty (by an amount $I(X;Y)$) in $X$ is due to the information transferred from the input of the channel to its output. Thus, the information transfer rate $R$ is defined as

$$R = f_{op}\, I(X;Y), \tag{2.3}$$

where $f_{op}$ is the rate at which the symbols are generated by the source.
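As a worked instance, assume the product form $R = f_{op}\, I(X;Y)$ suggested by the definitions in this section. The symbol rate of 100 MHz and the equiprobable 1-bit source are illustrative values only. For a noiseless channel $H(X|Y) = 0$, so that

$$I(X;Y) = H(X) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1 \text{ bit}, \qquad R = 10^{8}\ \text{symbols/s} \times 1\ \text{bit} = 100\ \text{Mb/s}.$$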
Channel Capacity
In his seminal work [16], Shannon showed that the capacity $C$ of a channel band-limited to frequency $W$ is given by

$$C = \int_0^W \log_2\left(1 + SNR(f)\right) df, \tag{2.4}$$

where $C$ is in bps. From (2.4), it is clear that the capacity $C$ depends upon the SNR and the transmission bandwidth $W$.
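The dependence of $C$ on $W$ and on the SNR can be checked numerically. The sketch below assumes the integral form $C = \int_0^W \log_2(1 + SNR(f))\,df$ of Shannon's capacity formula. The function names, the midpoint-rule integration and the example values are hypothetical choices made for illustration.

```python
import math

def capacity_bps(snr_of_f, bandwidth_hz, steps=10_000):
    """Approximate C = integral_0^W log2(1 + SNR(f)) df with the midpoint rule.

    snr_of_f is a callable returning the (linear, not dB) SNR at frequency f.
    """
    df = bandwidth_hz / steps
    total = sum(math.log2(1.0 + snr_of_f((k + 0.5) * df)) for k in range(steps))
    return total * df

def flat_capacity_bps(snr, bandwidth_hz):
    """Closed form for a frequency-independent SNR: C = W * log2(1 + SNR)."""
    return bandwidth_hz * math.log2(1.0 + snr)

# Illustrative values: a flat SNR of 3 over 100 MHz gives 2 bits per Hz.
c_numeric = capacity_bps(lambda f: 3.0, 100e6)
c_closed = flat_capacity_bps(3.0, 100e6)
```

For a flat spectrum the numerical integral and the closed form agree, and both scale linearly with $W$ but only logarithmically with the SNR.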
A Fundamental Basis for Power-Reduction
In this section, we will first show that all logic transformations have an inherent information transfer rate requirement R associated with them. Next, we will present a theorem which allows one to determine the lower bounds on the power dissipation for a given architectural implementation.
Logic Transformation
We can represent any noisy logic transformation $\Gamma'$ as shown in Fig. 1, where the noise could have many sources, including the implementation media itself. Without any loss of generality, we assume that the inputs and the outputs are latched synchronously.
The definition of $\Gamma'$ is shown in Fig. 2, where the input space $S_X$ is mapped onto the output space $S_{Y'}$.
The dark dots in the set $S_X$ represent the discrete values that the input X can assume, while the ones in the set $S_Y$ denote the values that the output can assume if the noise power were zero.
Assuming that the noise probability density function is identical for all possible noiseless outputs Y, we can represent the system in Fig. 1 as shown in Fig. 3, with the corresponding mapping for $\Gamma$ as shown in Fig. 4. In this figure, all the noise in $\Gamma'$ has been referred to the output, and we now have a noiseless transformation $\Gamma$ mapping the input space $S_X$ to a noiseless output space $S_Y$. In this paper, we will assume that the model in Fig. 3 can be employed to represent any noisy transformation $\Gamma'$. Note that the computation model in Fig. 3 is quite similar to a generic digital communications system. The input and output latches can be viewed as transceivers (transmitter-receiver) and the clock as providing the correct sampling epoch (the output of a timing-recovery block). In a digital communications system, the transformation $\Gamma$ is usually an identity transformation. Thus, the model in Fig. 3 allows us to apply results from digital communications theory [16] to determine the information transfer rate requirements for a given DSP algorithm. Hence, all digital transformations, in particular linear finite-precision DSP algorithms, have an inherent information transfer rate requirement R given by (3.1). This requirement is an inherent property of the transformation and is independent of the implementation media or the architecture.
Lower Bounds on Power Dissipation
It is well known that there can be many different digital architectures which achieve the same functionality. In the present context, we view each of these architectures as a communication network with a certain capacity C. The capacity C for a point-to-point link is given by (2.4), where it can be seen that C is a function of 1.) the transmission bandwidth W and 2.) the signal-to-noise ratio $SNR(f)$. In the context of a digital system, a connection between two logic gates can be viewed as a point-to-point link. This connection would have a certain resistance $R_L$ and capacitance $C_L$ (inductance L can also be considered if necessary). Hence, to a first-order approximation, the transmission bandwidth W is given by $1/(R_L C_L)$. The signal-to-noise ratio $SNR(f)$ is a function of the supply voltage $V_{dd}$ and the predominant sources of noise in the system.
A general digital network consists of numerous point-to-point links, each connecting either combinational logic gates or synchronously clocked latches. We will not attempt to compute the capacity of such a general network, as this is currently an open problem in the information-theoretic literature. However, employing the abstraction indicated in Fig. 3, we can obtain an equivalent noise source, an equivalent resistance $R_L$ and an equivalent capacitance $C_L$, which are lumped at the output of the digital system. For example, the equivalent $C_L$ could be the sum of the capacitances of the critical path of the architecture. Therefore, the transmission bandwidth W, the signal-to-noise ratio $SNR(f)$ and hence the capacity C can be determined.
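The lumping step described above can be sketched as follows. This assumes, as in the example in the text, that the equivalent $C_L$ is the sum of the critical-path capacitances and that $W = 1/(R_L C_L)$ to first order. The function name and the component values are hypothetical.

```python
def lumped_bandwidth_hz(r_l_ohm, path_caps_farad):
    """First-order bandwidth W = 1 / (R_L * C_L).

    C_L is taken as the sum of the capacitances along the critical path,
    which is one possible lumping choice, not the only one.
    """
    c_l = sum(path_caps_farad)
    return 1.0 / (r_l_ohm * c_l)

# Illustrative values only: 1 kOhm and three path capacitances totalling 10 pF.
w = lumped_bandwidth_hz(1e3, [2e-12, 3e-12, 5e-12])
```

With these values the lumped bandwidth comes out at about $10^8$, which can then be combined with an SNR estimate to obtain a capacity from (2.4).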
Clearly, from [16] , the capacity C should be greater than or equal to R for a meaningful computation to take place. We exploit this result to formally present the following theorem.
Theorem 2: For
Note that Theorem 2 does not provide us with the technique to achieve the lower bound. This is not surprising, given that Theorem 2 is derived from Shannon's joint source-channel coding theorem [16], which in turn provides a proof of achievability but not the method.
Lower Bound Calculations
In this section, we present four examples of the application of Theorem 1 and Theorem 2. The noise source in Fig. 3 needs to be determined for the system. However, for conventional digital systems the noise is mainly due to the phenomenon of ground bounce. Just for the sake of demonstration, and without any loss of generality, we assume that the implementation technology is CMOS with a flat noise spectrum with average power $\sigma_n^2 = V^2$ over a usable bandwidth of W = 100 MHz. This is consistent with the value of ground bounce in a typical sub-micron CMOS technology. Furthermore, we can approximate the transmit pulse as a square wave with an amplitude of $\pm V_{dd}/2$ volts for transmitting a '1' and a '0', respectively. The signal power $\sigma_s^2$ (or the variance) is therefore equal to $V_{dd}^2/4$. The function F for static CMOS is defined as follows:

$$F = C_L V_{dd}^2 W. \tag{4.1}$$

Figure 5: A degenerate case.
Degenerate Case
Consider an ideal low-pass filter with a cutoff frequency $f_c$ lower than the lower edge of the input signal spectrum (see Fig. 5). From Theorem 1 and (3.1), we get $H(Y) = 0$ and $R = 0$. Next, employing Theorem 2, we find that the lower bound on power dissipation $P_{D,min}$ is equal to zero. Clearly, the output of the low-pass filter is equal to zero. One particular implementation of such a system is an output line connected to ground. Other than non-idealities, such a system will not consume any power.
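A minimal sketch of how such a bound might be evaluated numerically is given below. It rests on assumptions that are not the paper's stated derivation: the bound is taken to be the minimum of $F = C_L V_{dd}^2 W$ subject to $C \geq R$, the SNR is taken to be flat and equal to $\sigma_s^2/\sigma_n^2 = V_{dd}^2/(4\sigma_n^2)$, and hence $C = W \log_2(1 + SNR)$. The function names and the values $\sigma_n = 0.1$ V and $C_L = 1$ pF are hypothetical.

```python
import math

def min_supply_voltage(rate_bps, bandwidth_hz, sigma_n_volts):
    """Smallest V_dd with W * log2(1 + V_dd^2 / (4 * sigma_n^2)) >= R.

    Assumes a flat SNR of V_dd^2 / (4 * sigma_n^2); solving C = R for V_dd
    gives V_dd = 2 * sigma_n * sqrt(2^(R/W) - 1).
    """
    return 2.0 * sigma_n_volts * math.sqrt(2.0 ** (rate_bps / bandwidth_hz) - 1.0)

def min_power_watts(rate_bps, bandwidth_hz, sigma_n_volts, c_l_farad):
    """Evaluate F = C_L * V_dd^2 * W at the minimum supply voltage."""
    v_min = min_supply_voltage(rate_bps, bandwidth_hz, sigma_n_volts)
    return c_l_farad * v_min ** 2 * bandwidth_hz

# Illustrative values only (not taken from the paper).
p_one_bit = min_power_watts(100e6, 100e6, 0.1, 1e-12)   # R = W
p_degenerate = min_power_watts(0.0, 100e6, 0.1, 1e-12)  # R = 0
```

Under these assumptions a requirement of $R = 0$, as in the degenerate case, yields a zero supply voltage and zero power, while any non-zero $R$ forces a non-zero bound that grows exponentially in $R/W$.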
Filter 1
Let us consider an FIR filter with the following transfer function (4.2). Further, assume that the sampling rate $f_s = 100$ MHz and that the input $x(n)$ to the filter is a 1-bit word, which is equally likely to be a '1' or a '0'. From
Filter 2
In this example, we consider the following transfer function (4.5)
Assume all the other parameters to be the same as in Filter 1. Comparing (4.3) and (4.6), we find that the lower bound on the supply voltage for Filter 2 is lower than that of Filter 1. This is due to the fact that Filter 1 has an information transfer rate requirement that is higher than that of Filter 2. The lower bound on the power dissipation is given by $P_{D,min}$ = 7.32 pW. A comparison of (4.4) and (4.7) shows that Filter 2 has a smaller lower bound than Filter 1. Note that this is a counter-intuitive result, given that there are two non-zero coefficients in a direct-form implementation of (4.5) as compared to one non-zero coefficient in the corresponding direct-form implementation of (4.2). However, by applying the associativity transformation [17], (4.5) can be implemented with one multiplier and one adder. The addition would be done first, followed by one multiplication. Hence, the number of hardware units for a direct-form implementation of both filters would be identical. This example shows that the proposed theory does indicate the possibility of finding an architecture that would dissipate less power than the existing one. Hence, it can be employed to compare the power dissipation characteristics of two algorithms under similar implementation constraints.
Adiabatic Logic
Adiabatic computation [4-5] and pulsed power-supply CMOS logic [18] work in a unique fashion. In particular, these techniques are based upon the fact that power dissipation can be minimized by ensuring that the voltage across any resistor is kept as small as possible. This is shown in Fig. 6, where the capacitor $C_L$ is charged up by applying a voltage source $V(t)$ whose rise time $T_r \gg R_L C_L$. Under this condition, the power dissipation can approach zero. A lower bound on the energy dissipation for adiabatic logic has been calculated in [4-5] for a given value of $T_r$. In this subsection, we show that $T_r$ itself is a function of the supply voltage $V_{dd}$ and the information transfer rate R. Hence, there is an upper limit for $T_r$ and therefore a lower bound on the power dissipation. In practice, $R_L$ is the source-to-drain resistance of a MOSFET, which is a function of the current. However, for the sake of simplicity and without any loss of generality, we will assume that $R_L$ is a constant.

Figure 6: Adiabatic computation.

The power-reduction property of adiabatic computation [4-5] and pulsed power-supply CMOS logic [18] can be explained via the information-theoretic framework presented in this paper. By applying a time-varying voltage source $V(t)$, adiabatic computation involves reducing the bandwidth of the transmit pulse and hence the transmission bandwidth W. From (2.4), we see that reducing W also reduces the capacity C of the channel. Furthermore, it can be shown that the power dissipated in an adiabatic computation is a function of W. For a desired information transfer rate R and supply voltage $V_{dd}$, there is a lower limit on W and hence on the power dissipation. The limit to which the power dissipation can be reduced is computable from Theorem 2. First assume a sinusoidal supply waveform.

$$V_c(t) = V_{dd} \sin\left(2\pi W t + \tan^{-1}(2\pi W R_L C_L)\right). \tag{4.10}$$

The power dissipation in Fig. 6 is given by

A lower bound on W implies a lower bound on $P_D$, which is given by (4.13). Unlike other methods of power reduction, the lower bound given by (4.13) is easily approachable. This is due to the fact that we have included all implementation constraints in the derivation of (4.13).
Note that (4.13) is not an absolute lower bound, as it is a function of the supply voltage $V_{dd}$. In order to compute the absolute lower bound, we first substitute (4.14) into (4.13) to obtain an expression for $P_{D,min,adb}$.
Clearly, from (4.17) the power dissipation for adiabatic logic will be zero only if the desired information transfer rate R = 0.
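The claim in this subsection that adiabatic dissipation depends on W can be illustrated with a standard steady-state result for a series $R_L$-$C_L$ branch driven by a sinusoid. The sketch below is a textbook phasor calculation, not a reconstruction of the paper's equations (4.11) to (4.17). The function name, the choice of $V_{dd}$ as the source amplitude and the component values are assumptions.

```python
import math

def rc_sinusoid_power(v_amp, freq_hz, r_l_ohm, c_l_farad):
    """Average power dissipated in R_L for a source v_amp * sin(2*pi*f*t).

    Phasor analysis of a series R-C branch gives a current amplitude of
    v_amp * w * C / sqrt(1 + (w*R*C)^2), so that
    P = 0.5 * v_amp^2 * (w*C)^2 * R / (1 + (w*R*C)^2), with w = 2*pi*f.
    """
    omega = 2.0 * math.pi * freq_hz
    x = omega * r_l_ohm * c_l_farad          # dimensionless w*R*C
    return 0.5 * v_amp ** 2 * omega * c_l_farad * x / (1.0 + x * x)

# Illustrative values only: 1 V amplitude, 1 kOhm, 1 pF.
p_slow = rc_sinusoid_power(1.0, 1e6, 1e3, 1e-12)
p_fast = rc_sinusoid_power(1.0, 100e6, 1e3, 1e-12)
```

Lowering the drive frequency lowers the dissipation, which vanishes only as the frequency goes to zero. Since a non-zero R places a lower limit on W, this is consistent with the conclusion that the dissipation reaches zero only for R = 0.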
Conclusions
A mathematical basis for power-reduction in digital VLSI circuits was presented in this paper. The proposed basis was applied to determine the lower bounds on power dissipation for DSP filtering algorithms and also for adiabatic logic. This indicates that the proposed approach can traverse a wide range of design abstractions, from algorithms to circuits. For the sake of simplicity, we have assumed in this paper that a constant value of ground bounce voltage is the major noise contributor. Clearly, ground bounce is a function of signal power and the architecture. Hence, future work is being directed towards an accurate characterization of this and other noise sources in digital circuits. Approaching the lower bounds presented here requires a cohesive application of power-reduction techniques at all levels of the design hierarchy: algorithms, architectures, logic and circuits. We are currently in the process of developing such a strategy employing the framework proposed in this paper.
