A fast control wrapper for a micropipeline with two-phase control is presented.
datapath. Control logic performance is also important when a micropipeline stage has finished a computation and is waiting on an acknowledgement from a successor stage before it can latch the new result and provide it to that successor. Acknowledgements propagate backwards through the pipeline and thus have no delay elements in their path.
A Fast Two-Phase Wrapper: Figure 1 shows the two-phase micropipeline control wrapper used in the design of a five-stage pipelined MIPS-compatible processor [5]. Each bundled data input i consists of a group of data lines data_bundl_i and its associated control line Cin_i. Each predecessor stage (fanin) provides a data bundle, and each successor stage (fanout) provides an acknowledgement signal. The control is two-phase, so in any given cycle the Cin inputs and acknowledgements all transition in the same direction, either low-to-high or high-to-low. Once all Cin inputs and acknowledgements have transitioned, the C-element output makes the corresponding transition. The XOR gate and the Cout loopback signal generate a high pulse on the GC signal when the C-element output changes state, latching the new outputs. The delay elements on the Cin inputs match the delay of the control path to that of the compute function path.
A 0.13 µm standard cell library from Artisan was used to implement the processor presented in [5]. The C-element was mapped to standard cells using the approach in [4], as the Artisan standard cell library did not have an integrated C-element. Pre-layout simulations of the processor, using the Verilog gate-level netlist generated by the Synopsys synthesis tool, indicated that the control logic path was the limiting performance factor in several blocks, either because the compute function delay was small or because the block was triggered by the arrival of an acknowledgement.
The C-element and XOR gate were subsequently replaced by the logic shown in Figure 2. This removed the XOR gate from the critical path of the control logic and also reduced the delay of the arrival detection logic. The non-inverting delay in the multiplexer select path is used to increase the high pulse width of the GC signal.
Table 1 would only occur if the micropipeline was using a form of delay-insensitive dual-rail signaling between micropipeline stages.
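The behaviour of the original C-element and XOR wrapper described above can be illustrated with a small zero-delay behavioural model in Python. This is only a sketch under stated assumptions, not the circuit from Figure 1: the names `c_element` and `TwoPhaseWrapper` are hypothetical, acknowledgements are assumed to have the same polarity as the Cin inputs, and the matched delay elements are not modelled.

```python
from dataclasses import dataclass


def c_element(inputs, prev):
    """Muller C-element: follow the inputs when they all agree, else hold."""
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return prev


@dataclass
class TwoPhaseWrapper:
    """Hypothetical zero-delay model of the C-element + XOR wrapper."""
    c_out: int = 0        # C-element output
    cout: int = 0         # latched control output, looped back to the XOR
    data_out: tuple = ()  # latched data outputs

    def step(self, cins, acks, data_in):
        """Apply the current control levels; return True if GC pulsed."""
        # Assumption: acknowledgements share the polarity of the Cin inputs.
        self.c_out = c_element(list(cins) + list(acks), self.c_out)
        # GC is high while the C-element output differs from the loopback.
        gc = self.c_out ^ self.cout
        if gc:
            self.data_out = tuple(data_in)  # GC pulse latches new outputs
            self.cout = self.c_out          # loopback update ends the pulse
        return bool(gc)


wrapper = TwoPhaseWrapper()
print(wrapper.step([1, 0], [0], [5]))  # not all inputs have arrived yet
print(wrapper.step([1, 1], [1], [5]))  # all transitioned: GC pulses
print(wrapper.step([0, 0], [0], [7]))  # opposite phase: GC pulses again
```

In this model a GC pulse occurs exactly once per phase, after the last Cin or acknowledgement transition arrives, which is the property that makes acknowledgement arrival part of the control critical path.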
Figure 3 gives the path detail for the original logic in the case of 8 control inputs and 256 data outputs, while Figure 4 gives the path detail for the new wrapper logic using the same test case. The standard cell naming convention is gtype_k_X_n, where k is the number of inputs for gate type gtype, and n is the drive strength. Figures 3 and 4 show that the new wrapper logic has a faster critical path, and that the XOR gate in the original design contributes a substantial portion of the total delay in this case.
Conclusion: This paper introduces a fast two-phase control wrapper for a micropipeline block.
The wrapper is intended for efficient mapping to a commercial standard cell library that does not have specialized support cells such as C-elements for asynchronous design. 