power high performance digital systems design, asynchronous design, self-timed digital system design and STEM education. As a result of her work, she has numerous peer-reviewed journal and conference publications. She recently authored a book entitled "Low Power Self-Timed Size Optimization for an Input Data Distribution," which explores innovative techniques to reduce power consumption in portable electronic devices.
INTRODUCTION
Power conservation has become an increasingly important issue among modern digital circuit designers. As the evolution of digital technology takes us into the 21st century with groundbreaking system performance, the power consumed by these circuits is at record highs. In fact, power dissipation, or energy loss in the form of heat, is reaching power densities comparable to those of nuclear reactors. The negative effects of this power dissipation compromise and, in many cases, impair chip reliability and life expectancy.
Over the past decade, research in this area has eased but not solved this power issue. Many solutions involved increasing chip parameter sizes to ease the chip density that has led us to this hot spot. However, as the demand for portable electronic devices rises, technology scaling forces us to deal with this problem. If research does not produce a technique to break through the "power wall", advancements in circuit technology will reach their limits. New design techniques with energy-delay characteristics superior to those of the synchronous timing and control approach are needed today, because the throughput of systems realized with that approach is limited by the power dissipation of nanometer-scale devices and by the power management strategies developed to ensure that they do not exceed device thermal constraints. As the demand for low power electronics rises, power-aware techniques are being applied to all levels of circuit and system design.
A novel circuit device sizing approach that is based on the circuit input data distribution is proposed in this paper. I have chosen to optimize at the transistor level by using a probability-based input data distribution for a given function and sizing transistors according to the likelihood of path activity. Transistors along pathways with frequent activity are sized larger than transistors on inactive pathways. Although this is a work in progress, preliminary results show promise. The analysis is based on the Logical Effort RC model [5] of a ripple-carry adder. The model was extracted from SPICE simulation for the TSMC 0.18 µm process. The performance and energy dissipation of circuits implemented with this approach are 7% and 13% better, respectively, than those of circuits designed with previously proposed approaches. This methodology can be extremely useful when teaching low power system design techniques.
MODERN ELECTRONIC CIRCUIT DESIGN TECHNIQUES
Electronic system designers must choose a technique that complements the circuit's logic and system design when building a circuit. For the most part, circuit designs in industry are built with synchronous logic: small blocks of combinatorial logic separated by synchronously clocked registers. Figure 1 gives an illustration of a synchronous system. As its name suggests, a synchronous circuit uses a clock to synchronize each transition. In other words, changes in the circuit happen at the same rate and occur at the same time. The biggest advantage of this logic style is the ease of determining the maximum clock frequency of a design by finding and calculating the longest delay path between registers in a circuit. Another advantage of synchronous design is hazard avoidance. Static logic can introduce hazards through spurious transitions, meaning that some flip-flops have internal metastable transitions before they settle to their final logic state. If the signal is used before it reaches its final logic state, the wrong value may be forwarded. Synchronous logic eliminates this hazard because the clock ensures that these glitches have been worked out before the transition to the next state.
One major disadvantage of synchronous design is the unused clock cycle time. Even if a gate has finished transitioning, the signal cannot go to the next state until the clock signals the transition. More power is used because the clock consumes energy whether gates transition or not. Clock skew is another problem that synchronous systems encounter. This is the difference in the time at which the clock signal arrives at different parts of the circuit. It is exaggerated even further as we scale systems, because wire delay does not scale at the same rate as transistor switching speed. Because synchronous systems have dominated the circuit design industry, only a small number of CAD tools are available for the design, simulation and testing of asynchronous circuits.
However, as the semiconductor industry wrestles with mounting problems in trying to achieve higher performance and lower power consumption without significant increases in fabrication costs, developers are turning to asynchronous alternatives to solve these problems. Over the past few years, universities and established asynchronous companies have focused their research on developing Electronic Design Automation (EDA) tools and design flows that can be integrated into the custom and semi-custom methods now used by the industry for synchronous design.
This paradigm shift has opened the door for unprecedented advances in the circuit design industry. References [6, 7, 8, 9] all investigate the possible benefits of self-timed system design. Asynchronous logic works extremely well for reducing power dissipation. At 40% activity, an asynchronous system will dissipate 50% less power than its synchronous counterpart [3]. Asynchronous circuits have several other possible benefits. Speed is an area where these circuits shine. The timing of an asynchronous circuit depends on the structure of the transistor network, the delay of its signals and the length of the signal paths. The worst-case performance of traditional synchronous systems is replaced by average-case performance, since performance depends only on the currently active path. Asynchronous circuits also have better technology migration potential and adapt automatically to physical properties such as fabrication, temperature and power supply voltage. The absence of clock skew, the difference in arrival time of clock signals at different parts of the circuit, is another benefit: since asynchronous circuits have no clock, there is no clock skew.
Modern synchronous digital systems are limited by the power dissipation of nanometer-scale devices and by the power management strategies developed to ensure that they do not exceed circuit thermal constraints. Traditional optimization techniques are based on synchronous digital systems that use a global clock network, which consumes a considerable amount of the system's power: 50% of dynamic power is consumed by clock circuitry [10]. Furthermore, significant power can be wasted in transitions within blocks, even when their output is not needed. Global clock signals are particularly affected by technology scaling in that the long interconnect wires have increasingly different arrival times, which must be managed to produce valid output. System designers have dealt with the power challenges by clock gating, which saves power by adding logic gates to a circuit in order to disable portions of the clock tree when they are not needed. Even though clock gating reduces power dissipation, it is more effectively implemented at the macro level than at the circuit level.
The handshake protocol shown in Figure 2 regulates the flow of information through the self-timed pipeline. Input arrives and a Request to F1 is raised. If F1 is inactive, it transfers the data and acknowledges this fact to the input buffer, which can then fetch the next input. Next, F1 is enabled by raising the Start signal. The Done signal goes high after the completion of the computation. A Request is then issued to F2. If it is free, an Acknowledgement is raised and the output value is sent to R2, after which the process can repeat itself. We have chosen to implement the design using asynchrony to minimize power dissipation because of the inherent "power down" nature of asynchronous circuits.
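The order of the handshake signals described above can be illustrated with a small software model. The following Python sketch is only an illustration under simplifying assumptions: the class `Stage` and the functions `handshake` and `run_pipeline` are hypothetical names, the stages are evaluated one after another rather than concurrently, and the functions computed by F1 and F2 are arbitrary placeholders.

```python
from typing import Callable, List, Tuple


class Stage:
    """One self-timed functional block (e.g. F1 or F2) in the pipeline."""

    def __init__(self, name: str, func: Callable[[int], int]) -> None:
        self.name = name
        self.func = func
        self.busy = False


def handshake(stage: Stage, data: int, log: List[str]) -> int:
    """Run one Request/Acknowledge/Start/Done cycle and return the stage output."""
    log.append(f"Request {stage.name}")
    if stage.busy:
        # A real stage would hold Request until the block is free.
        # In this sequential sketch the block is always free here.
        raise RuntimeError(f"{stage.name} is still busy")
    stage.busy = True
    log.append(f"Acknowledge {stage.name}")  # sender may fetch its next input
    log.append(f"Start {stage.name}")        # local clock enables the computation
    result = stage.func(data)
    log.append(f"Done {stage.name}")         # completion has been detected
    stage.busy = False
    return result


def run_pipeline(inputs: List[int], stages: List[Stage]) -> Tuple[List[int], List[str]]:
    """Push each input through every stage in order and record the signal sequence."""
    log: List[str] = []
    outputs: List[int] = []
    for value in inputs:
        for stage in stages:
            value = handshake(stage, value, log)
        outputs.append(value)
    return outputs, log


if __name__ == "__main__":
    f1 = Stage("F1", lambda x: x + 1)  # placeholder function for F1
    f2 = Stage("F2", lambda x: x * 2)  # placeholder function for F2
    outs, events = run_pipeline([1, 2], [f1, f2])
    print(outs)
    print(events)
```

The model makes one point visible: a stage only does work between its own Start and Done events, and no event occurs at all when no input arrives, which is the "power down" behavior referred to above.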
LOGIC GATE DELAY
Now that we understand how self-timed circuits are realized, let's review how we model the timing process. The delay of a logic gate is determined by the topology of the gate (fan-in) and the capacitive load that the logic gate drives (fan-out). Logical effort, a term coined by Ivan Sutherland and Bob Sproull in 1991, is a method used to model the delay of a single logic gate. The logical effort method provides a technique for determining the most efficient transistor sizing on the critical path to minimize the delay, as well as an estimate of that delay. The delay of a logic gate using logical effort is given as:
$$d = f + p$$
where p is the parasitic delay, which is the intrinsic delay of the gate driving no load, and f is the stage effort. The stage effort is defined as:
$$f = g h$$
where g is the logical effort, which is the ratio of the input capacitance of a given gate to that of an inverter capable of delivering the same output current, and h is the electrical effort (effective fan-out), $C_{out}/C_{in}$. The dependency is demonstrated in Figure 3, which plots delay as a function of electrical effort for an inverter and for a two-input NAND gate. The slope of each line is the logical effort and the y-intercept is the parasitic delay. As shown, we can adjust the total delay by adjusting the electrical effort or by choosing a logic gate with a different logical effort [11]. The tables below give the logical effort for static gates and dynamic gates. Clearly, the dynamic logic style allows for smaller sizing, which partially explains why dynamic gates are faster than static gates. In static gates, much of the input capacitance is wasted on slow PMOS transistors that are not even used during a falling transition [1]. From Table 1 we see that the dynamic inverter has a logical effort of 1/3 less than the static inverter. Since logical effort is used for sizing estimations of each component, I have included the table below, where N is the number of inputs.
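As a worked illustration of the delay model above, the following Python sketch evaluates d = gh + p for an inverter and a two-input NAND gate. The values g = 1, p = 1 for the inverter and g = 4/3, p = 2 for the NAND gate are the commonly quoted textbook values for static CMOS and are assumed here; they are not taken from the tables of this paper. The function names are hypothetical.

```python
from fractions import Fraction


def stage_effort(g, h):
    """Stage effort f = g * h (logical effort times electrical effort)."""
    return g * h


def gate_delay(g, h, p):
    """Normalized gate delay d = f + p, in units of the process delay tau."""
    return stage_effort(g, h) + p


# Assumed textbook values for static CMOS gates (not from this paper's tables).
GATES = {
    "inverter": {"g": Fraction(1), "p": Fraction(1)},
    "nand2": {"g": Fraction(4, 3), "p": Fraction(2)},
}

if __name__ == "__main__":
    for name, params in GATES.items():
        for h in (1, 2, 3, 4):
            d = gate_delay(params["g"], h, params["p"])
            print(f"{name:8s} h={h}  d={float(d):.2f}")
```

Each gate traces a straight line in h whose slope is g and whose intercept is p, which is the relationship that the delay-versus-electrical-effort plot expresses.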
Logical Effort

TRANSISTOR SIZING USING INPUT DATA PROBABILITY
Modern electronic system designers should consider non-traditional levels of abstraction, such as input data probability profiling, to achieve high performance and manage power loss. Since the switching activity of a logic gate is a strong function of the input signal statistics, system designers can use this knowledge to exploit the power-delay capabilities of a circuit. In this paper, a pipelined architecture is used in which the timing is a function of both the circuit itself and the data that it is processing. Using input data probability statistics to increase self-timed circuit performance and decrease energy dissipation is novel because the timing is determined locally, as a function of the circuit and the input data. One major advantage of the proposed technique is decreased circuit area: when the probability of a path being used is very low, the transistors on that path are sized smaller. Average circuit performance also increases, because with data profiling, performance is even better than with self-timing alone. Since energy is only consumed when an event happens, the average energy dissipation also decreases. The decrease in circuit noise is caused in part by the fact that fewer transistors are used, which decreases circuit activity. Local clock distribution removes the need for a power-hungry global clock network and avoids the hazards that can be introduced by clock skew. This technique is also less sensitive to process variation because timing is generated locally.
A novel circuit device sizing approach is presented in this section. It optimizes circuit device sizes for a specified input distribution in order to minimize the circuit's average completion time. Circuits realized with this approach outperform previously proposed self-timed circuits for the specified input distribution. This is due in part to the fact that previously proposed circuits do not use the circuit input distribution to size circuit devices. The device sizing approach presented here is based on the Newton-Raphson algorithm [11] and generally converges rapidly for a given circuit and input distribution. A self-timed full adder is used in this section to demonstrate the proposed device sizing approach. Figure 4 gives a graphical illustration of the circuit path activation probability of a one-bit self-timed ripple-carry adder, with eight different inputs (0-7) and four different activation, or critical, paths illustrated by the different colors along the paths.
Figure 4: Circuit Path Activation Probability
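To make the idea of probability-driven sizing concrete, the following Python sketch applies Newton-Raphson iteration to a deliberately simplified toy problem. It is not the objective function used in this paper. It assumes two paths whose delays are inversely proportional to their device sizes, k/x1 and k/x2, a fixed total size budget x1 + x2 = area, and activation probabilities w1 and w2. The function name `newton_size` and all parameter names are hypothetical.

```python
def newton_size(w1: float, w2: float, area: float, k: float = 1.0,
                tol: float = 1e-10, max_iter: int = 50) -> float:
    """Return the size x1 that minimizes T(x1) = w1*k/x1 + w2*k/(area - x1).

    T is the probability-weighted average delay of the toy model.
    Newton-Raphson is applied to the stationarity condition T'(x1) = 0.
    """
    x = area / 2.0  # start from equal sizing of both paths
    for _ in range(max_iter):
        grad = -w1 * k / x**2 + w2 * k / (area - x)**2          # T'(x)
        hess = 2.0 * w1 * k / x**3 + 2.0 * w2 * k / (area - x)**3  # T''(x) > 0
        x_new = x - grad / hess
        # Keep the iterate strictly inside (0, area) so both sizes stay positive.
        x_new = min(max(x_new, 1e-9 * area), area * (1.0 - 1e-9))
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


if __name__ == "__main__":
    x1 = newton_size(w1=0.8, w2=0.2, area=3.0)
    print(f"frequent path size: {x1:.4f}, rare path size: {3.0 - x1:.4f}")
```

In this toy model the optimum satisfies x1/x2 = sqrt(w1/w2), so the path that is active 80% of the time receives twice the device size of the path that is active 20% of the time. The iteration settles within a few steps, consistent with the rapid convergence noted above.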
For this simple adder, there are four critical paths, which depend on the data input, as opposed to synchronous circuits, which have one critical path: the worst-case delay. Figure 4 also shows two distributions: the Gaussian, or normal, distribution and the binomial distribution, which applies to the discrete numbers of a digital system. We see that the binomial distribution is very similar to the Gaussian, which is by definition continuous. The distributions show the probability that a given value appears at the input of the ripple-carry adder. If we assume the given distribution, then three is more likely to occur at the input, and zero and six are less likely to occur. Therefore, transistors on the green path would be sized larger and transistors on the purple and red paths would be sized smaller. Let us take a closer look at the fundamental principles of this proposed approach.
The time between the rising transition of the Start signal (i.e. the self-timed circuit's local clock) and the rising transition on the Done node in Fig. 4 is defined as the completion time of the adder. It is a function of the execution time of the self-timed circuit/system functional block. It depends on the circuit inputs, and therefore the average completion time is the average of all the active critical path delays over the circuit input space. The active critical path delay is the propagation delay along the longest signal path for a given circuit input, taken over the $2^n$ valid input combinations of a self-timed circuit with n primary input bits. The circuit in Fig. 4 contains four active critical paths. These paths, from the primary inputs (i.e. the two operand bits and the carry-in) to the output of the completion detection circuit (i.e. node Done), are shown in Fig. 4 with the respective inputs that activate them. The bits that define the numbers in Fig. 4 are organized with the two operand bits first, followed by the carry-in, so that the first operand bit is the MSB.
DEVICE SIZING
The propagation delay along the active critical path associated with inputs 000 and 111 can be defined in four different ways. This is because both the NAND and the NOR gate (i.e. gates 1 and 2 for input 000 and gates 3 and 4 for input 111) are on for these inputs. The active critical path delay associated with these inputs is equal to the propagation time of the path with the minimum value. The four combinations of active critical path delay values are shown in Table 1. Let:
$W_0$ = probability that the circuit input is 000,
$W_1$ = probability that the circuit input is 001,
$W_2$ = probability that the circuit input is 010,
$W_3$ = probability that the circuit input is 011,
$W_4$ = probability that the circuit input is 100,
$W_5$ = probability that the circuit input is 101,
$W_6$ = probability that the circuit input is 110,
$W_7$ = probability that the circuit input is 111.
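The quantities defined above can be combined into an average completion time, taken here as the probability-weighted sum of the active critical path delays. The following Python sketch is a minimal illustration under that assumption. The function names are hypothetical, and all probabilities and delay values in the example are invented placeholders, not results from this paper.

```python
import math
from typing import Dict, List


def active_delay(candidates: List[float]) -> float:
    """Active critical path delay for one input.

    When several paths are on for the same input (as for 000 and 111),
    the path with the minimum propagation time sets the delay.
    """
    return min(candidates)


def average_completion_time(weights: Dict[str, float],
                            path_delays: Dict[str, List[float]]) -> float:
    """Probability-weighted average of the active critical path delays."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("input probabilities must sum to 1")
    return sum(w * active_delay(path_delays[code]) for code, w in weights.items())


if __name__ == "__main__":
    # Placeholder probabilities W_0..W_7, keyed by the three-bit input code.
    weights = {"000": 0.05, "001": 0.10, "010": 0.15, "011": 0.30,
               "100": 0.15, "101": 0.10, "110": 0.10, "111": 0.05}
    # Placeholder delays in arbitrary units; 000 and 111 have two candidate paths.
    delays = {code: [6.0] for code in weights}
    delays["000"] = [4.0, 5.0]
    delays["111"] = [4.5, 5.5]
    print(average_completion_time(weights, delays))
```

Reducing the delay of a path that carries a large weight lowers this average more than the same reduction on a rarely used path, which is the motivation for sizing devices according to the input distribution.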
RESULTS
The adder device sizing information computed using logical effort for this distribution is shown in Table 3. The device sizing used in previously proposed realizations is also in the table. Smaller transistors are used for data paths that are not being used, and larger transistors are used for data paths that have a higher probability of being used. This, in essence, allows the circuit designer to boost performance and save power at the same time.
CONCLUSIONS
The performance and the energy given off in the form of heat by self-timed circuits/systems depend on the circuit's transistor-level implementation, device sizing and input distribution. The device sizing approach used in previously proposed self-timed circuits is identical to that used for synchronous realizations. Therefore, it is only optimized to minimize the propagation delay of all circuit signal paths. In the proposed approach, the average completion time and energy dissipation of a self-timed circuit are optimized, with respect to device sizing, for a given input distribution. Both are lower than in realizations that do not consider this feature of the input space. This design process causes the active critical path delay of the circuit paths with the highest probability of being active to be less than the path delay in a realization that does not use input data. This design technique also generates paths with larger propagation delay than in previously proposed self-timed circuit designs for paths that are rarely used, i.e. paths associated with low probability. Both the average completion time and the energy dissipation of self-timed circuits are reduced if the device sizing is optimized for the input distribution.
