The design limitations that synchronous clocking schemes impose on RSFQ circuits make asynchronous clocking attractive. We critically review and compare a wide range of asynchronous methodologies that have been proposed in the published literature. Circuits are optimized using a Genetic Algorithm and are then compared in terms of yield, critical margins and latency. A full-adder is also designed and simulated for each methodology to obtain information regarding implementation above the primitive cell level.
Introduction
A large number of asynchronous design methodologies have been proposed in the published literature in the roughly 15 years since RSFQ started gaining popularity. This poster presentation reviews and compares many of these methodologies, with emphasis on digital logic design. They are, in no particular order, Data-Driven Self-Timed (DDST), Dual-Rail RSFQ (DR-RSFQ), RSFQ Asynchronous Timing (RSFQ-AT) and Delay Insensitive Dual-Rail (DI). Cells were optimized using a Genetic Algorithm with yield, critical margin, leakage current and latency as objectives. Full-Adders for all methodologies were designed and compared with regard to junction count and latency. β_c was taken as 1 for all damped junctions.
References

Conclusion
Genetic algorithm test setup and RSFQ primitives
A multi-objective Genetic Algorithm (Fig. 1) was developed for circuit optimization, using an aggregate method to combine the objectives into a single fitness value. A variable mutation rate was implemented so that the mutation rate is high at the start of the algorithm and decreases steadily as the algorithm progresses. Parameters governing the algorithm are given in Table 1, whilst the aggregate weight assignments are shown in Table 2.
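The aggregate fitness and the decaying mutation rate described above can be illustrated with a minimal sketch. It is not the optimizer used for this work: the linear decay schedule, the weights, the population settings and the `evaluate` function (a stand-in for a circuit simulation) are all assumptions chosen for illustration, and the actual values are those of Tables 1 and 2.

```python
import random

def mutation_rate(generation, n_generations, start=0.5, end=0.01):
    """Linearly decay the mutation rate from `start` to `end` (assumed schedule)."""
    frac = generation / max(1, n_generations - 1)
    return start + (end - start) * frac

def aggregate_fitness(objectives, weights):
    """Weighted sum that collapses several objectives into one fitness value."""
    return sum(weights[name] * value for name, value in objectives.items())

def evaluate(params):
    """Hypothetical stand-in for a circuit simulation.

    Returns normalized scores in (0, 1] that peak when every parameter is 1.0.
    """
    err = sum((p - 1.0) ** 2 for p in params) / len(params)
    score = 1.0 / (1.0 + err)
    return {"yield": score, "margin": score, "leakage": score, "latency": score}

def optimize(n_params=4, pop_size=20, n_generations=30, seed=0):
    rng = random.Random(seed)
    # Assumed weights; the real assignments are listed in Table 2.
    weights = {"yield": 0.4, "margin": 0.3, "leakage": 0.2, "latency": 0.1}
    pop = [[rng.uniform(0.5, 1.5) for _ in range(n_params)] for _ in range(pop_size)]
    fitness = lambda ind: aggregate_fitness(evaluate(ind), weights)
    for gen in range(n_generations):
        rate = mutation_rate(gen, n_generations)
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]   # uniform crossover
            # Each gene is perturbed with probability `rate`, which shrinks over time.
            child = [g * (1 + rng.gauss(0, 0.1)) if rng.random() < rate else g
                     for g in child]
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best)
```

Calling `optimize()` returns the best parameter vector found and its aggregate fitness. The weighted sum is what makes the search single-objective: changing the weights shifts the trade-off between yield, margin, leakage current and latency.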
All circuits were analysed and optimized using WRSpice or JSpice3. Standard 250 µA JTLs were used as load cells on all inputs and outputs. Process variations for the Monte Carlo simulations were based on the 1 kA/cm² Hypres process, as shown in Table 3 [1]. Yield was calculated using 100 Monte Carlo simulations per circuit. It should be noted that very little emphasis was placed on latency, which explains the slow gate propagation times reported throughout this paper. Latencies are calculated as the time between the arrival of the clock pulse and the output pulse. For gates with no clock input, the last input received serves as the reference for the output. RSFQ-AT and DDST both use standard RSFQ primitives to construct their respective circuits modularly, so the standard RSFQ library had to be optimized first. Parameters for the optimized general RSFQ gates are shown in Table 4.
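The yield and latency definitions above can be written out as a short sketch. The Gaussian spread model and the `works` predicate (a hypothetical pass/fail check standing in for one circuit simulation) are assumptions; the actual spreads are those of Table 3.

```python
import random

def latency(output_time, input_times, clock_time=None):
    """Output delay relative to the clock, or to the last input if the gate is unclocked."""
    reference = clock_time if clock_time is not None else max(input_times)
    return output_time - reference

def monte_carlo_yield(nominal, spreads, works, runs=100, seed=0):
    """Fraction of randomly perturbed circuits that still operate correctly.

    nominal: dict of parameter name -> nominal value
    spreads: dict of parameter name -> relative standard deviation (assumed Gaussian)
    works:   hypothetical predicate that simulates one sample and returns True/False
    """
    rng = random.Random(seed)
    passed = 0
    for _ in range(runs):
        sample = {k: v * (1 + rng.gauss(0, spreads[k])) for k, v in nominal.items()}
        if works(sample):
            passed += 1
    return passed / runs
```

With 100 runs the yield is resolved in steps of 1%, so small yield differences between gates should be read with that resolution in mind.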
RSFQ-AT
RSFQ-AT [2] is a simple yet functional way to implement asynchronous circuits. For the circuits to function asynchronously, a timing pulse is sent in parallel with each data pulse. Physically, this takes the form of a clock input for every data input of the logic gate, as shown in Fig. 2. The clock signals are fed into an RSFQ Muller-C element that releases an output pulse only when all input pulses have arrived. The output of the Muller-C element is split: one branch drives the clock input of the RSFQ logic gate and the other serves as the clock signal for the next RSFQ-AT gate. A delay element, denoted by a buffer in Fig. 2, synchronizes the timing of the output data and clock signals. For safe operation of the circuit, all timing pulses must arrive with, or after, the data pulses. Because of this limitation, RSFQ-AT cannot be classified as delay insensitive. Parameters for the optimized RSFQ-AT gates are shown in Table 5. RSFQ-AT has been used to design an asynchronous CPU [3]. The RSFQ-AT gates were constructed from optimized RSFQ primitives and were therefore not optimized as a unit. This could be why the RSFQ-AT AND and NOT gates suffer from poor stability. Whole-gate optimization would minimize leakage currents between elements and have an overall positive effect on yield and critical margin.
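The timing behaviour described above can be captured in a small behavioural sketch. It models only pulse arrival times, not junction dynamics; the delay values are arbitrary placeholders, and the function names are illustrative rather than taken from [2].

```python
def muller_c(arrival_times, delay=0.0):
    """Muller-C element: fires once, after the last input pulse has arrived."""
    return max(arrival_times) + delay

def rsfq_at_gate(data_times, timing_times, c_delay=5.0, gate_delay=10.0):
    """Return (data_out_time, timing_out_time) for one RSFQ-AT gate.

    data_times[i] and timing_times[i] are the arrival times of the data pulse
    and the accompanying timing pulse on input i (same time unit throughout).
    """
    for d, t in zip(data_times, timing_times):
        if t < d:
            # Violates the constraint that timing pulses arrive with or after data.
            raise ValueError("timing pulse arrived before its data pulse")
    clock = muller_c(timing_times, c_delay)   # clocks the underlying RSFQ gate
    data_out = clock + gate_delay
    timing_out = clock + gate_delay           # delay element matched to the gate delay
    return data_out, timing_out
```

For example, `rsfq_at_gate([0, 3], [1, 4])` gives `(19.0, 19.0)`: the Muller-C element waits for the later timing pulse at 4, and the delay element keeps the outgoing timing pulse aligned with the output data. The explicit check shows why the scheme is not delay insensitive, since correctness depends on the relative arrival order of two separate pulses.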
DUAL-RAIL
Dual-Rail circuits [4] use two inputs for every logical input to a gate: one 'True' and the other 'False' (Fig. 3). The simulated parameters of this methodology are presented in Table 6. The benefit of optimizing whole gates can clearly be seen by comparing the parameters of this methodology with those of RSFQ-AT.
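The dual-rail encoding can be illustrated at the purely logical level. The sketch below is an assumption-level model of the encoding only; it says nothing about how the gates of [4] are built from junctions.

```python
def encode(bit):
    """Map a logical bit to (true_rail, false_rail); exactly one rail carries a pulse."""
    return (1, 0) if bit else (0, 1)

def decode(rails):
    """Recover the logical bit, rejecting words where zero or both rails pulsed."""
    t, f = rails
    if t == f:
        raise ValueError("invalid dual-rail word: exactly one rail must pulse")
    return bool(t)

def dual_rail_and(a, b):
    """AND: the True rail pulses if both inputs are True, the False rail otherwise."""
    (at, af), (bt, bf) = a, b
    return (at & bt, af | bf)

def dual_rail_not(a):
    """NOT is a simple swap of the two rails."""
    return (a[1], a[0])
```

Because every valid input produces a pulse on exactly one output rail, the arrival of the output pulse itself signals that the result is ready, which is what removes the need for a separate clock.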
The Delay-Insensitive methodology appears to be more robust, faster and less space-consuming than its counterparts. Experiments with more complex circuits are needed to verify these conclusions.
For small applications, RSFQ-AT delivers reasonable speed with a relatively small layout. The logic gates will have to be optimized as a whole to raise circuit yield and critical margin. For large systems, process variations might make the clock-follow-data constraint difficult to satisfy.
Dual-Rail methodologies appear to be the most beneficial for large-scale integration. The logic gates, if designed and optimized correctly, are intrinsically stable with regard to process variations. Although DR-RSFQ has a stable design, it lacks cell variety. DDST can thus be very useful with arbitrary RSFQ circuits, although, as stated before, optimization will have to be performed on complete circuits.
Leakage current effects on larger systems appear to be substantial, as was shown with RSFQ-AT and DDST. More emphasis will have to be placed on leakage current as RSFQ enters the LSI arena.
DATA-DRIVEN SELF-TIMED
Data-Driven Self-Timed is also a Dual-Rail methodology [4] (Fig. 4). Like RSFQ-AT, this methodology uses mostly existing RSFQ primitives to implement asynchronous circuits. DDST gates are constructed modularly. The arrival of data pulses on both Dual-Rail inputs of a logical circuit is detected using a Muller-C element. The output of this element is then split and used as the clock signal for a logic gate, as well as for the complementary D-type Flip-Flop that stores the logic gate's result. The clock signal to the D Flip-Flop must be delayed so that the Flip-Flop is not clocked before the result has entered it. Table 7 contains the parameters of DDST.
Once again the penalty of not optimizing whole gates is apparent. DDST is also the largest methodology tested and is therefore the most susceptible to leakage currents. These gates can be optimized further for junction count, as many RSFQ primitives contain buffer junctions that are not needed in this implementation. This would improve all gate parameters.
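The requirement that the D Flip-Flop clock be delayed until the logic result has arrived can be expressed as a simple timing check. This is a behavioural sketch with hypothetical parameter names and arbitrary delay values, not a model of the actual DDST cells.

```python
def ddst_timing(input_arrivals, c_delay, gate_delay, dff_clock_delay):
    """Check the internal timing of one DDST gate (all times in the same unit).

    input_arrivals:  arrival times of the data pulses seen by the Muller-C element
    c_delay:         propagation delay of the Muller-C element
    gate_delay:      clock-to-output delay of the clocked logic gate
    dff_clock_delay: extra delay inserted in the clock path to the D Flip-Flop
    """
    clock = max(input_arrivals) + c_delay      # Muller-C fires after the last input
    result_ready = clock + gate_delay          # logic result reaches the Flip-Flop
    dff_clock = clock + dff_clock_delay        # delayed clock reaches the Flip-Flop
    return {
        "result_ready": result_ready,
        "dff_clock": dff_clock,
        "safe": dff_clock >= result_ready,     # result must be stored before read-out
    }
```

For instance, `ddst_timing([0.0, 2.0], 5.0, 10.0, 12.0)` reports a safe gate, whereas reducing `dff_clock_delay` to 8.0 clocks the Flip-Flop before the result has arrived. Under process variations both delays shift, which is one reason whole-gate optimization matters here.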
DELAY-INSENSITIVE-RSFQ
The delay-insensitive methodology proposed in [5] does not make use of existing RSFQ primitives. Instead, a set of universal primitives was introduced that uses Dual-Rails for its timing requirements. Two of the universal primitives are shown in Fig. 5. An m x n join can be described as having m row inputs, n column inputs and an m x n matrix that contains a corresponding output for each input pair. Question marks denote inputs, while outputs are represented by exclamation marks. The simulated parameters of the methodology are shown in Table 8.
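The m x n join described above can be modelled behaviourally as follows. The class and method names are illustrative assumptions, and the model ignores timing and junction-level detail: it only records which row and column inputs have pulsed and emits the matching matrix output once both are present.

```python
class Join:
    """Behavioural model of an m x n join (no timing, no junction-level detail)."""

    def __init__(self, m, n):
        self.m, self.n = m, n
        self.row = None   # index of the pending row input, if any
        self.col = None   # index of the pending column input, if any

    def _fire(self):
        # An output is produced only once one row AND one column have pulsed.
        if self.row is None or self.col is None:
            return None
        out = (self.row, self.col)
        self.row = self.col = None   # reset for the next pair of pulses
        return out

    def pulse_row(self, i):
        if not 0 <= i < self.m:
            raise IndexError("row input out of range")
        self.row = i
        return self._fire()

    def pulse_col(self, j):
        if not 0 <= j < self.n:
            raise IndexError("column input out of range")
        self.col = j
        return self._fire()
```

With a 2 x 2 join, placing the two rails of one operand on the rows and the two rails of the other on the columns means each of the four outputs identifies one input combination, regardless of which input arrives first.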
Methodologies
Full Adders
Full-Adders for RSFQ-AT, Dual-Rail and DDST were all implemented using a general design (see Fig. 6). DI-RSFQ, being a wholly different methodology, required a different implementation (Fig. 7). Latency for all circuits was calculated from last input to last output. Table 9 contains the parameters for the Full-Adder implementations.
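As a reference for the comparison, the Full-Adder function and the last-input-to-last-output latency definition can be sketched as below. The pulse times in the usage note are made-up illustrative numbers, not values from Table 9.

```python
def full_adder(a, b, carry_in):
    """Return (sum, carry_out) for one-bit inputs."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out

def full_adder_latency(input_times, output_times):
    """Latency measured from the last input pulse to the last output pulse."""
    return max(output_times) - max(input_times)
```

For example, with inputs arriving at times 0, 4 and 2 and the sum and carry pulses leaving at 30 and 41, `full_adder_latency([0, 4, 2], [30, 41])` is 37. Measuring from the last input keeps the figure independent of input skew, which makes the methodologies comparable.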
