INTRODUCTION
In this work, we combine the advantages of silicon-on-insulator (SOI) technology with asynchronous design techniques to assess the overall benefits in reducing power. A 16-bit self-timed adder is employed as the demonstrator circuit. At the technology level, fully-depleted SOI offers advantages compared with bulk CMOS. These arise from a lower subthreshold slope, a lower vertical electric field and hence enhanced channel mobility, and reduced junction capacitance. The superior off-state current of SOI enables the threshold voltage to be lowered, enhancing the current drive of SOI technology and enabling further power reduction without loss of performance (1). At the architectural level of design, the adoption of asynchronous timing rather than a global clock reduces power. Here, the synchronous clock is replaced by local handshake signals between blocks. Although asynchronous control circuitry tends to be larger than in synchronous systems, significant power savings should result, as clock generation, drivers, and distribution consume around one third of the power in large, complex, high-performance systems (2).
ASYNCHRONOUS PROTOCOL
An asynchronous approach encourages a modular design, whereby circuit blocks operate independently of and concurrently with other blocks at their fastest natural rate. Fig. 1 shows the commonly used bundled data method of communicating between blocks. Valid input data to the block is indicated by a 'Request In' signal. The data remains valid until the block signals 'Acknowledge In' to the driving block. After allowing the block to operate, 'Request Out' signals the readiness of the output data, and this data must remain constant until 'Acknowledge Out' is received from the receiving block. A four-phase protocol, in which the activation of Acknowledge causes the Request line to be lowered, which in turn causes the Acknowledge line to be deactivated, is highly suited to an ALU block. Here, the Request In signal can act as a Start signal for the arithmetic operation, while Completion is used to form Request Out. In generating a Request Out signal, it is necessary to detect when the block has completed its operation. Techniques commonly used include a matched delay, a --bit data path or self-timing. The latter is possible in blocks such as the adder, where the completion time is data dependent.
Simulations were conducted with aspect ratios of all devices set at unity for maximum power reduction. Further simulations were conducted after a degree of optimisation for power/throughput, whereby aspect ratios of the transistors that were not in the critical time delay path were fixed at unity and other blocks were 'speeded up' by a judicious increase in the appropriate aspect ratios. The simulations were conducted with 'carry in' equal to logic 1, all 16 bits of input A equal to logic 1 and all 16 bits of input B equal to logic 0. This represents a worst-case condition. The delay through the adder was taken from the time when the inputs, A and B, were correct to the complete signal going high, with the 50% points as the reference.
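To illustrate why this input pattern is the worst case for a self-timed adder, the following is a minimal behavioural sketch in Python. It is not the authors' circuit: the function name `longest_carry_chain` and the bit-level model (a carry either ripples through a bit where A and B differ, or is generated or killed locally where they are equal) are assumptions made purely for illustration.

```python
def longest_carry_chain(a: int, b: int, carry_in: int, width: int = 16) -> int:
    """Return the longest run of consecutive bits that a live carry ripples through.

    Hypothetical behavioural model, not the adder circuit from the paper.
    """
    longest = 0
    run = 0
    carry = carry_in & 1
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        if a_i ^ b_i:
            # Propagate bit: the carry out equals the carry in.
            if carry:
                run += 1
                longest = max(longest, run)
        else:
            # Equal inputs: the carry is generated (1,1) or killed (0,0) locally,
            # so any ripple arriving from lower bits ends here.
            carry = a_i & b_i
            run = 0
    return longest


if __name__ == "__main__":
    # Stimulus from the text: carry in = 1, A = all ones, B = all zeros.
    print(longest_carry_chain(0xFFFF, 0x0000, 1))
    # A shorter chain: a carry generated at bit 0 ripples through bits 1 to 7.
    print(longest_carry_chain(0x00FF, 0x0001, 0))
```

With the stimulus described above, the carry in must ripple through all 16 bit positions, so completion is signalled as late as this model allows; most other operand pairs give shorter chains and hence earlier completion.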
The energy per operation refers to the energy consumed by the adder during the longest carry operation. The energy consumed was logged through the use of an integrator circuit (7) and was measured between the time when the carry in and A0-15 were all going high, B0-15 were going low, and the complete signal was switching from low to high. The energy required to reset the adder so that it is ready for its next operation was also included in the calculation.
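The integrator accumulates the supply current over the measurement window; for a constant supply voltage this amounts to E = Vdd multiplied by the integral of i(t) dt. A minimal numerical sketch of that calculation is given below. The function name `supply_energy`, the trapezoidal rule and the sample waveform are assumptions for illustration only; the waveform values are invented and are not data from this work.

```python
def supply_energy(times, currents, vdd):
    """Trapezoidal estimate of E = vdd * integral(i(t) dt), in joules.

    times    -- sample times in seconds, increasing
    currents -- supply current samples in amperes, same length as times
    vdd      -- supply voltage in volts, assumed constant over the window
    """
    if len(times) != len(currents):
        raise ValueError("times and currents must have the same length")
    charge = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        # Area of one trapezoid between consecutive samples.
        charge += 0.5 * (currents[k] + currents[k - 1]) * dt
    return vdd * charge


if __name__ == "__main__":
    # Invented triangular current pulse: 0 -> 10 mA -> 0 over 2 ns at 5 V.
    t = [0.0, 1e-9, 2e-9]
    i = [0.0, 10e-3, 0.0]
    print(f"{supply_energy(t, i, 5.0) * 1e12:.1f} pJ")
```

To include the reset energy as described above, the sampled window would simply be extended to cover the reset phase of the adder.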
RESULTS & ANALYSIS
As expected, the unity aspect ratio adder on SOI technology offers reduced energy consumption and superior time delay over bulk, as shown in Fig. 7. The SOI adder has a delay of 5ns at 5V compared to bulk's 14ns, a factor of 3 difference. For a reduced supply voltage of 1.5V the SOI adder maintains its relative performance advantage over the bulk (16ns while the bulk is 47ns). The SOI adder uses 40% less energy than bulk at 5V and this increases to 52% at 1.5V. For the optimised bulk adder the delay reduces and the energy consumed increases relative to the unity adder (Fig. 8), a delay of 14ns at 5V before optimisation being reduced to 8.1ns. The SOI delay improves with supply voltage because delay is proportional to load capacitance and inversely proportional to the transistor drive. The SOI enhancement NMOS transistor drive advantage over bulk reduces with decreasing supply voltage, however, because the accompanying reduction in the vertical field has more leverage on bulk mobility than on that of SOI. The contribution of bulk conduction in the case of accumulation-mode SOI PMOS is thought to result in the enhanced current drive at lower voltages.
The delay and energy results of the two adders were compared. The delay of the bulk adder was reduced by 40-45% (supply voltage range 1.5V to 5V) due to optimisation, while the energy increased by 10-15% over the same voltage range (Figs. 9 and 10). The SOI adder's delay is reduced by 20-25% while it uses 20-25% more energy.
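The effect of these percentage changes on the energy-delay product can be checked with simple arithmetic. The sketch below assumes the quoted ranges are 40 to 45% and 10 to 15% for bulk, and 20 to 25% for both quantities for SOI; since the text does not say which end of each range belongs to which supply voltage, it takes the minimum and maximum over all pairings. The function name `edp_factor_range` is hypothetical.

```python
from itertools import product


def edp_factor_range(delay_reductions, energy_increases):
    """Return (min, max) of the optimised/unity energy-delay product ratio.

    Each ratio is (1 - delay_reduction) * (1 + energy_increase), evaluated over
    every pairing of the given fractional changes (an assumption, because the
    pairing of range endpoints is not stated in the text).
    """
    factors = [
        (1.0 - d) * (1.0 + e)
        for d, e in product(delay_reductions, energy_increases)
    ]
    return min(factors), max(factors)


if __name__ == "__main__":
    bulk = edp_factor_range([0.40, 0.45], [0.10, 0.15])
    soi = edp_factor_range([0.20, 0.25], [0.20, 0.25])
    print("bulk EDP ratio: %.3f to %.3f" % bulk)
    print("SOI  EDP ratio: %.3f to %.3f" % soi)
```

Under these assumptions the bulk energy-delay product falls to roughly 0.6 to 0.7 of its unity aspect ratio value, whereas the SOI ratio stays between about 0.9 and 1.0, i.e. optimisation trades delay for energy almost one for one in SOI.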
There is more leverage to optimise the bulk design than the SOI design. The energy-delay data for the 4 versions of the adder can be seen in Fig. 11. As the plots have no "dip", the optimum supply voltage for the adder has still not been reached. The energy-delay product for bulk is higher than for SOI, while the SOI optimised and unity aspect ratio adders have nearly identical energy-delay products.
The energy consumed by the adder was divided into 3 categories and is summarised in Fig. 12: firstly, the energy dissipated in the first XOR gate in each bit ('XOR'); secondly, the energy consumed by the adder when there is carry ripple in the circuit ('carry'); and thirdly, the energy required to reset the adder ('reset'). The energy required to reset the adder for the two supply voltages stated consumes about a fifth of the total energy for both SOI and bulk adders. The energy used by the first XOR gate in the SOI adder is a higher proportion of the total energy compared to bulk (approx. 40% for SOI compared to approx. 30% for bulk). It is worth noting that only 40% of the energy used by the SOI adder (50% for bulk) is actually being used for the carry ripple and validation circuitry. The results for the SOI unity aspect ratio adder at N are a little confusing, as there is an XOR gate and 2 inverters in the XOR circuit, while there is an XOR, a multiplexer, an inverter and two NAND gates in the Carry circuit.
When all the A inputs switch at the same time (A0-A15 in Fig. 14), a large current spike is apparent in the current drawn from the supply. The magnitude of this spike is greater in SOI than in bulk due to the higher drive of the SOI transistor, 14mA compared to 9mA (Fig. 14), although the energy dissipated during switching is 67pJ for bulk and 56pJ for SOI. The magnitude of this current spike for the SOI case is much reduced during the carry-ripple phase of the adder: 1.25mA compared to 14mA.
The above demonstrates a need for SOI circuits to be capable of handling large current spikes, which may not occur in bulk circuits. This will relate to wider metal tracks and more vias and contacts than required in bulk CMOS design. Hence, the SOI designer needs to be extra careful when using global signals that activate many circuits at the same time. This is a good case for using asynchronous rather than synchronous design for SOI.
