Abstract
Introduction
Arithmetic operators are often the major building blocks and the performance-limiting factor for application-specific Digital Signal Processing (DSP) and other numerical data processing hardware. It is generally believed that asynchronous arithmetic operators are slower and occupy a larger chip area than their synchronous counterparts, and this is supported by several published studies [9]. We contend, however, that with careful selection and design of control and data paths, asynchronous operators can be designed with performance equal to that of equivalent synchronous ones. When a degradation factor (typically 50 % or greater [2]) is applied to the clock speed to allow for temperature, supply voltage and process variations, asynchronous designs can exhibit a significant performance advantage. Here we present the design of a sixteen bit fixed point parallel multiplier which achieves a performance level similar to that of non-derated synchronous designs.
Asynchronous design techniques
Asynchronous digital logic design removes the global timing constraints of a clocked system. The flow of data is dictated by local timing considerations. This attribute is becoming increasingly important as feature size is reduced and chip complexity increases. Other potential advantages of asynchronous design include lower power consumption, simplified system level design, and greater product longevity. There are many approaches to the design of asynchronous systems; Hauck in [4] provides an excellent summary. Here we briefly review some methods used for reported multiplier designs which we will use for comparison. We have chosen to use a bundled data path with four-cycle handshaking and a delay model in the control path.
The four-cycle signalling protocol provides a return-to-zero phase which can be used with precharged logic.
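The four-cycle protocol can be illustrated with a minimal behavioural sketch. The function name, the tuple-based trace and the choice of which party drives each edge are assumptions made for illustration; they are not taken from the multiplier's control circuit.

```python
def four_phase_transfer(data):
    """Trace the (req, ack) wire states for one bundled-data transfer.

    Hypothetical model: the sender drives req, the receiver drives ack.
    """
    req, ack = 0, 0
    trace = [(req, ack)]          # idle: both wires low

    bundle = data                 # data is placed on the bundle before req rises
    req = 1
    trace.append((req, ack))      # 1. sender raises request

    latched = bundle              # receiver captures the bundled data
    ack = 1
    trace.append((req, ack))      # 2. receiver raises acknowledge

    req = 0
    trace.append((req, ack))      # 3. sender lowers request (return-to-zero begins)

    ack = 0
    trace.append((req, ack))      # 4. receiver lowers acknowledge, back to idle

    return latched, trace
```

Steps 3 and 4 form the return-to-zero phase; in a precharged design this interval is a natural place to restore the logic to its precharged state.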
Overall structure
The multiplier uses sixteen stages of sixteen bit carry save adders (CSA) and a combination of Manchester mm' for the chosen prototype fabrication process (Orbit Semiconductor tiny chip) [7].
The floorplan of the multiplier is shown in Figure 1. The X operand is input at the top of the array, while Y is fed in from the right. The array of carry save adders and the final Manchester adder stages occupy most of the chip area. Pipeline register stages for the Y input and lower order outputs are on the right, while control circuitry occupies a vertical strip between the input pipeline and the adder array. Precharged logic with a pull down evaluation tree of n-channel MOSFETs is used for all computational logic blocks.
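The arithmetic performed by an array of carry save adders followed by a final carry-propagating adder can be sketched as follows. This is a word-level behavioural model only: the names `csa` and `multiply` are hypothetical, and the model uses unbounded integers rather than the shifted sixteen bit rows and pipeline registers of the actual layout.

```python
WIDTH = 16  # operand width in bits

def csa(a, b, c):
    """Bitwise 3:2 carry save adder: returns (s, carry) with a + b + c == s + carry."""
    s = a ^ b ^ c                                  # per-bit sum without carry propagation
    carry = ((a & b) | (a & c) | (b & c)) << 1     # per-bit majority, weighted one place up
    return s, carry

def multiply(x, y):
    """Multiply two unsigned WIDTH-bit operands using WIDTH carry save stages."""
    assert 0 <= x < (1 << WIDTH) and 0 <= y < (1 << WIDTH)
    s, c = 0, 0
    for i in range(WIDTH):
        partial = (x << i) if (y >> i) & 1 else 0  # partial product for bit i of Y
        s, c = csa(s, c, partial)                  # no carry ripples inside a stage
    return s + c                                   # final carry-propagate addition
```

Because each stage only compresses three words into two, its delay is independent of the word length; the carry propagation is deferred to the final adder.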
Progressive evaluation
To ensure that the NMOS pull down tree evaluates correctly it is essential that evaluation of a given stage does not commence until all its inputs are correct and stable. One 
use for this technique is Progressive Evaluation (PE).
This technique is similar to a multi-phase synchronously clocked precharged logic system. Timing for the precharge and evaluation phases is derived from taps in the delay model for the overall computational stage as shown in Figure 2. The string of buffers is a delay line which models the delay of the adder stages. co-ordinate data transfer between pipeline stages. When an input request is generated, and the stage is empty, the latch signal (LO) goes high, latching the data into the input register. The output of the C-element is then driven high, starting the evaluation of the first CSA (signal P1 changes from the precharge state to evaluate). The signal progresses through the inverter chain, causing signals P2 to P4 to go high in succession. Finally a latching signal is generated for the next pipeline register and all evaluation stages are returned to the precharged state. A SPICE simulation for one pipeline stage of four carry save adders is shown in Figure 3.
Circuit elements
Carry save adder
The circuit used for the carry save adder stage is shown in Figure 4. Complementary inputs and outputs are used to eliminate the need for inverter stages, thus minimizing computation time. Simulations using normal transistor process models show that both sum and carry outputs which evaluate to a low state reach 50 % of rail voltage in 0.6 ns and 25 % within 0.8 ns. Because no pull-up transistors are active during the evaluation phase, the evaluation of the following stage can be commenced once any low level inputs have settled below the threshold of the NMOS evaluation tree. Allowing 0.8 ns between successive evaluations gives a safety margin of 50 % with a typical half rail threshold.
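The timing figures above can be checked with a short calculation. The sketch assumes that the 50 % margin is measured in voltage (how far the settled input lies below the half rail threshold) and that the four carry save adders of one pipeline stage are started at uniform 0.8 ns intervals; both readings are assumptions, and all names are hypothetical.

```python
STAGE_SPACING_NS = 0.8    # interval between successive evaluations (from the text)
LEVEL_AT_SPACING = 0.25   # falling output, as a fraction of rail voltage, at 0.8 ns
THRESHOLD = 0.50          # typical half rail threshold of the next stage
CSA_PER_PIPELINE_STAGE = 4

def tap_times(stages=CSA_PER_PIPELINE_STAGE, spacing=STAGE_SPACING_NS):
    """Start time in ns of each evaluation signal, assuming uniform spacing."""
    return [round(i * spacing, 3) for i in range(stages)]

def voltage_margin(level=LEVEL_AT_SPACING, threshold=THRESHOLD):
    """Fraction by which the settled input lies below the threshold."""
    return (threshold - level) / threshold

print(tap_times())        # [0.0, 0.8, 1.6, 2.4]
print(voltage_margin())   # 0.5
```

Under these assumptions an input that has fallen to 25 % of rail sits at half of the 50 % threshold, which reproduces the quoted 50 % margin.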
Manchester carry adder
The final pipeline stage of the multiplier resolves the high order sixteen bits of the result. This is achieved using a combination of Manchester carry adders and carry select adders.
The result is evaluated and latched in approximately 5 ns, which matches the time taken for the previous pipeline stages.
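For readers unfamiliar with the Manchester carry chain, its logical behaviour can be sketched with the standard generate and propagate recurrence. This is a behavioural model only: it says nothing about the transistor-level circuit or its timing, it does not model the carry select portion, and the function name and default width are assumptions for illustration.

```python
def manchester_add(a, b, width=16, carry_in=0):
    """Add two unsigned words bit by bit; returns (sum, carry_out)."""
    carry = carry_in
    result = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        generate = ai & bi             # this bit position creates a carry
        propagate = ai ^ bi            # this bit position passes the incoming carry
        result |= (propagate ^ carry) << i
        carry = generate | (propagate & carry)   # carry into the next position
    return result, carry
```

A carry select arrangement would evaluate blocks like this for both possible incoming carries and choose between the two results once the true carry is known.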
Pipeline registers
The pipeline registers for the X and Y inputs and the output results use nine transistor single-phase positive edge triggered D flip flops [1]. The circuit, shown in Figure 5(a), can be implemented with fewer transistors than micropipeline style transition registers [10] and exhibits near zero data hold time. An input inverter is used to give a non-inverting latch, resulting in eleven transistors for each latch element. Simulations indicate that, when implemented in the 1.2 micron CMOS process used for our design, the latch has a setup time of 0.8 ns and a total delay of 1 ns under normal operating conditions.
Carry and sum outputs are latched using the circuit shown in Figure 5(b). This is a latched form of a Muller C-element. When the latch signal (L) is high and D is not equal to -D, the output will equal D. When D equals -D (i.e. when precharged) the output will be held. When L is low the output will also be held. When used with the latching signals shown in Figure 3, the latch is enabled while the latch signal is high and assumes the correct value when the precharged signals evaluate. Simulation results show a delay of 0.Y ns from input to output. Figure 6. Operational results should be available for presentation at the conference.
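The hold and update rules of the latched C-element used for the carry and sum outputs can be captured in a small behavioural model. The class and argument names are hypothetical, and `d_bar` stands for the complementary input written -D in the text.

```python
class LatchedCElement:
    """Behavioural model: update only when enabled and the inputs are complementary."""

    def __init__(self, initial=0):
        self.q = initial

    def update(self, latch, d, d_bar):
        # Inputs that are equal indicate the precharged state, so the output holds.
        # A low latch signal also holds the output.
        if latch and d != d_bar:
            self.q = d
        return self.q


cell = LatchedCElement()
cell.update(latch=1, d=1, d_bar=1)   # precharged inputs: output held at 0
cell.update(latch=1, d=1, d_bar=0)   # evaluated inputs: output becomes 1
cell.update(latch=0, d=0, d_bar=1)   # latch low: output held at 1
```

Because equal inputs never change the output, the latch can be enabled before the precharged stage has finished evaluating and will still capture only a valid result.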
