In this paper, we present a novel synchronization approach to support data flow in clockless designs using single-rail encoding. This approach is based on self-resetting stage logic, in which a pipeline stage resets itself before starting the next execution cycle. As such, a stage goes through a reset phase, when its output is null, and an evaluate phase, when its output is the result of the evaluation of its inputs. To ensure correct operation, a pipeline stage absorbs inputs from the previous stage only while it is in the reset phase. As a result, data flows from one stage to another when the preceding stage is in the evaluate phase while the following stage is in the reset phase. To support this data flow, a latch-based synchronization mechanism is proposed. This mechanism yields a simple and efficient uni-directional handshaking scheme between stages that is easy to implement. This handshaking scheme is extended to handle the joins and forks of data flows encountered in non-linear pipelines. A concept design of a four-bit 16-stage pipeline is presented to illustrate the inner workings of self-resetting stage logic and its data-flow synchronization mechanism. The pipeline performance is examined through a detailed signal timing analysis. This analysis shows that the duration of the evaluate phase gradually increases, while the durations of the reset phase and the latch enable gradually decrease, toward the left stages of the pipeline. This gradual decrease in the duration of the enable of the latches between stages is used to derive a bound on the maximum possible depth of the pipeline.
INTRODUCTION
In digital design, asynchronous approaches can be characterized by a specific data encoding and a precise signaling protocol. Early attempts in asynchronous design focused on implementing signaling protocols at the gate level using static CMOS. Recently, other efforts began to focus on designing circuit logic families, such as dynamic CMOS, that are better suited to support low-power, fast asynchronous logic. However, few tools are available to synthesize and verify large architectures implemented in these circuit families [1]. At most, these efforts show proofs of concept using small data-path modules, without providing any insight into how they would perform in the context of a large architecture. Our approach is to develop a pipeline stage model with a specific signaling protocol for large asynchronous designs that can be easily synthesized and verified using existing synthesis and verification tools, yet is also amenable to optimal circuit implementation. As a proof of concept in the early stages of development of this approach, we propose a clockless pipeline in which data flows between stages using self-resetting logic gates embedded in each stage. For some time, a special family of dynamic circuits, known as self-resetting logic, has been exploited successfully in RAM cell design. While our self-resetting approach is similar to the one proposed in [2], it is quite generic and does not require any technology-specific synthesis tools. Since our logic is implemented using standard CMOS cell libraries, it is free of the device sizing issues that often must be resolved to ensure correct operation. In addition, no stringent timing requirements are imposed on inputs and outputs, as opposed to the approaches in [2, 3].
SELF-RESETTING STAGE LOGIC
In self-resetting stage logic, a stage consists of two networks: a reset logic network and a combinational logic network. In Figure 1, the combinational logic computes an output Z and its complement Z'. Only the line labeled Output is visible to the outside environment. The inputs are represented in single-rail encoding. When the output of the NOR gate is 1, both AND gates are enabled and the Z value is sent to the outside environment. This step represents the evaluation (Evaluate) phase. Since Z and Z' are complements, the inputs of the NOR gate will be 0 and 1, in which case the output of the NOR gate switches to 0. This in turn disables both AND gates, and the output to the outside environment becomes 0. This step represents the resetting (Reset) phase. As the NOR output switches back and forth, a stage alternates between a reset phase and an evaluate phase, which together form a single execution cycle. Based on this oscillation, a stage is ready to absorb inputs when it is in the reset phase. While the inputs are traveling along the critical path of the combinational network, the NOR output is similarly traveling along the matching delay shown in Figure 1. This delay buffer delays the reset phase long enough to allow the outputs of the combinational network to stabilize. To initialize the stage properly, a reset signal connected to the NOR gate is used to force its output to 0, which triggers the reset phase of the stage.
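The oscillation between the reset and evaluate phases can be mimicked with a small behavioral model. The following is a minimal sketch, not the circuit of Figure 1: it assumes discrete unit time steps, a combinational result Z that is already stable, and a matching delay modeled as a shift register of `delay` steps. The names `nor`, `simulate_stage`, and their parameters are hypothetical.

```python
def nor(*bits):
    """NOR gate: 1 only when every input is 0."""
    return 0 if any(bits) else 1


def simulate_stage(z, delay, steps, reset_steps=0):
    """Return a list of (phase, output) pairs, one per time step.

    z: stable single-rail result of the combinational network (assumption)
    delay: matching delay in time steps, at least 1 (assumption)
    steps: number of time steps to simulate
    reset_steps: number of initial steps during which reset is asserted
    """
    line = [0] * delay  # NOR values in flight through the matching delay
    trace = []
    for t in range(steps):
        phase = line[-1]            # value leaving the matching delay
        out = z & phase             # AND gate on Z (the visible Output)
        out_bar = (1 - z) & phase   # AND gate on Z'
        reset = 1 if t < reset_steps else 0
        # The NOR output enters the matching delay; a 1 on reset forces it to 0.
        line = [nor(out, out_bar, reset)] + line[:-1]
        trace.append((phase, out))
    return trace


if __name__ == "__main__":
    for phase, out in simulate_stage(z=1, delay=2, steps=8):
        print("evaluate" if phase else "reset   ", out)
```

In this idealized model an isolated stage spends the same number of steps in each phase, and the visible output is 0 throughout every reset phase regardless of Z.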
Figure 2 shows how the reset network controls the multi-bit outputs of a combinational network. Specifically, the reset network is connected only to output bit 0 of the combinational network. The remaining output bits go through additional AND gates that are enabled by the phase line, which thus controls the phase of all output bits of the combinational network. As a result, the output bits reset and evaluate at the same time.
LINEAR PIPELINING
As shown in Figure 3, each stage consists of a combinational and a reset network. In a linear pipeline, data flows from one stage to the next through a latch. To ensure proper data flow across stages, data is transferred from the current stage to the next one only if the current stage is in the evaluate phase while the next stage is in the reset phase. Hence, the latch separating two stages is enabled when the stage on its left is in the evaluate phase and the stage on its right is in the reset phase. At any cycle of execution, the latch on the left side of a stage in the reset phase will be enabled, while the latch on its right side will be disabled. The latter will be enabled only when the stage enters its evaluate phase. At any execution cycle, every other stage will be in the reset phase while the remaining stages will be in the evaluate phase. A cycle later, the stages that were in the reset phase start their evaluate phase, while the stages that were in the evaluate phase start their reset phase.
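The latch-enable rule for a linear pipeline can be illustrated with a short sketch. It is an idealization under stated assumptions: all stages are taken to switch phase at the same instant, and the names `latch_enables` and `next_cycle` are hypothetical.

```python
EVALUATE, RESET = 1, 0


def latch_enables(phases):
    """Enable bit of the latch between each pair of adjacent stages.

    phases[i] is the phase of stage i, listed from left to right.
    A latch is enabled when its left stage evaluates and its right stage resets.
    """
    return [int(cur == EVALUATE and nxt == RESET)
            for cur, nxt in zip(phases, phases[1:])]


def next_cycle(phases):
    """Idealized phase swap: every stage moves to the opposite phase."""
    return [1 - p for p in phases]


if __name__ == "__main__":
    phases = [EVALUATE, RESET, EVALUATE, RESET]
    print(latch_enables(phases))              # latches around resetting stages
    print(latch_enables(next_cycle(phases)))  # one cycle later
```

With alternating phases, the latch to the left of every resetting stage is enabled and the latch to its right is disabled; after the swap the pattern shifts by one latch.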
NON-LINEAR PIPELINING
While linear pipelines can be used in many applications, complex systems require data to flow in divergent and convergent directions. Such systems can be realized as non-linear pipelines, which support additional primitives such as the fork and join operations.
The Join Operation
Figure 4 shows the pipeline structure for the join operation. Inter-stage data flow is similar to the data flow in a linear pipeline. Data is transferred from stage A to stage C when the former is in the evaluate phase while the latter is in the reset phase. Similarly, data flows from stage B to stage C when the former is in the evaluate phase while the latter is in the reset phase. When these conditions are true, the latches separating stages A and B from stage C are activated to latch the outputs of stages A and B and feed them to the inputs of stage C. This scheme works well when the stage delays are comparable.
The Fork Operation
Figure 5 shows the pipeline structure for the fork operation. [...] forcing the output of the AND gate to be 0. This action, in turn, allows the completion of the reset phase of stage A. The role of the AND gate is to wait until both stages, B and C, are in the same evaluate phase before allowing stage A to proceed with its reset phase. This approach works well when all stages, A, B, and C, have comparable individual delays.
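The join and fork conditions can be written as simple predicates. This is a sketch under assumptions: stage names A, B, and C follow Figures 4 and 5, the function names are hypothetical, and the fork predicate only captures the stated waiting condition (both successors evaluating), not the signal polarity of the AND gate in Figure 5.

```python
EVALUATE, RESET = 1, 0


def join_latch_enables(phase_a, phase_b, phase_c):
    """Enables of the A->C and B->C latches in a join.

    Each latch is enabled when its source stage is evaluating
    while the joining stage C is resetting.
    """
    c_ready = phase_c == RESET
    return (phase_a == EVALUATE and c_ready,
            phase_b == EVALUATE and c_ready)


def fork_successors_evaluating(phase_b, phase_c):
    """True once both successors of the forking stage A are evaluating.

    Per the text, stage A may proceed with its reset phase only then.
    """
    return phase_b == EVALUATE and phase_c == EVALUATE


if __name__ == "__main__":
    print(join_latch_enables(EVALUATE, EVALUATE, RESET))
    print(fork_successors_evaluating(EVALUATE, RESET))
```

If stage A evaluates well before stage B, the two join latches are not enabled together for long, which is one way to read the requirement that the stage delays be comparable.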
PIPELINE PERFORMANCE
Let $d(E_i)$ and $d(R_i)$ be the durations of the evaluate and reset phases in stage i, respectively. Also, let $P_i$ be the period of stage i, which is the delay between the arrival of an input at stage i and the arrival of the next input at the same stage. $P_i$ represents a single cycle of execution in a stage, consisting of a reset and an evaluate phase.
The Reset and Evaluate Phase
Assume there are n stages in the pipeline. Since the evaluate phase of stage n does not depend on the reset phase of another stage, its reset and evaluate phases tend to have the same duration:
$$d(E_n) = d(R_n) \tag{1}$$
However, this is not true for stages located on the left side of the pipeline. The equal duration of the reset and evaluate phases on the right side of the pipeline can be seen for stage 4 in Figure 3, where the path traversed by $N_4 = 0$ on the reset network loop (i.e., stage 4 in the reset phase) is the same as the one traveled by $N_4 = 1$ (i.e., stage 4 in the evaluate phase). However, the evaluate phase of stage n-1 has to wait for the reset phase of stage n to arrive at the latch-enabling AND gate before data can flow from the former to the latter. This has the effect of stretching the duration of the evaluate phase of stage n-1:
$$d(E_{n-1}) > d(E_n) \tag{2}$$
Since stage n starts its reset phase somewhat earlier, it also tends to complete this phase earlier, thus causing the reset phase of stage n-1 to be somewhat shorter:
$$d(R_{n-1}) < d(R_n) \tag{3}$$
The increase in the evaluate phase and the decrease in the reset phase of stage n-1 with regard to the phases of stage n are exactly the same:
$$d(E_{n-1}) - d(E_n) = d(R_n) - d(R_{n-1}) = \delta \tag{4}$$
This has the effect of keeping the period equal for both stages n and n-1:
$$P_{n-1} = d(E_{n-1}) + d(R_{n-1}) = d(E_n) + d(R_n) = P_n \tag{5}$$
In brief, this dependence propagates toward the left side of the pipeline, causing the durations of the evaluate and reset phases to gradually increase and decrease, respectively, with each stage toward the left of the pipeline, without changing the duration of a single cycle.
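The propagation described above can be tabulated with a few lines of code. This is a sketch under an explicit assumption: the shift between adjacent stages is taken to be the same δ everywhere, so that stage i differs from stage n by (n - i)δ. The function name and the numeric values are hypothetical and are not measurements from the pipeline.

```python
def phase_durations(n, d_eval_n, d_reset_n, delta):
    """Return {stage: (d_eval, d_reset, period)} for stages 1..n.

    Assumes a constant per-stage shift delta (an assumption of this sketch):
    the evaluate phase grows and the reset phase shrinks by delta
    for each stage moved to the left of stage n.
    """
    table = {}
    for i in range(1, n + 1):
        shift = (n - i) * delta
        d_eval = d_eval_n + shift
        d_reset = d_reset_n - shift
        table[i] = (d_eval, d_reset, d_eval + d_reset)
    return table


if __name__ == "__main__":
    # Illustrative values in picoseconds, chosen only for the example.
    for stage, row in phase_durations(4, 600, 600, 50).items():
        print(stage, row)
```

Every row has the same period, because the amount added to the evaluate phase is exactly the amount removed from the reset phase.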
Effect of δ on the Latch Enable
Based on equation (4) shown in the previous section with regard to stages n-1 and n, δ also affects the duration of the latch enable. Let $d(L_i^+)$ be the duration of the enable of the latch at logic level 1 between stages i-1 and i. Since the latch between stages n-1 and n is enabled when the former is in the evaluate phase and the latter is in the reset phase, the duration of the latch enable depends primarily on that of the reset phase of stage n, since this reset phase is shorter than the evaluate phase of stage n-1, as shown in equation (10). As a result, as the duration of the reset phase of each stage decreases by moving to stages on the left side of the pipeline, so does the duration of the latch enable, which stays close to that of the reset phase: $d(L_i^+) \approx d(R_i)$.
Each latch used in the implementation of the pipeline also requires its enable to remain at logic level 1 for a certain duration. Any latch in the pipeline will operate correctly if $d(L_i^+)$ is no shorter than this required duration. Given this requirement, one can predict the maximum number of stages that the pipeline can accommodate by solving this condition for the variable n, starting from stage 1.
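The depth bound can be worked out numerically. The sketch below rests on assumptions that are not taken verbatim from the text: the latch enable at stage 1 is approximated by $d(R_n) - (n-1)\delta$, and the required enable duration of the latch is given the hypothetical name `d_min`. Under these assumptions the requirement at stage 1 gives $n \le 1 + (d(R_n) - d_{\min})/\delta$. Integer time units are used to avoid rounding issues.

```python
def max_pipeline_depth(d_reset_n, delta, d_min):
    """Largest n for which the stage-1 latch enable is still >= d_min.

    d_reset_n: reset-phase duration of the last stage, d(R_n)
    delta: per-stage decrease of the reset phase (assumed constant)
    d_min: enable duration required by the latch (hypothetical name)
    All arguments are integers in the same time unit.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    if d_reset_n < d_min:
        return 0  # even a single stage would violate the requirement
    return 1 + (d_reset_n - d_min) // delta


if __name__ == "__main__":
    # Illustrative values in picoseconds, chosen only for the example.
    print(max_pipeline_depth(d_reset_n=1000, delta=50, d_min=200))
```

With the illustrative values, a 17-stage pipeline leaves exactly 200 ps of enable at stage 1, while an 18th stage would reduce it to 150 ps and violate the requirement.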
PIPELINE IMPLEMENTATION
A 16-stage four-bit linear pipeline was modeled in structural VHDL, where each stage contains a four-bit ripple-carry adder. The corresponding netlist was generated using Synopsys Design Compiler based on a 0.25 µm CMOS library [4, 5]. Cadence Silicon Ensemble was used to place and route the pipeline. The pipeline fits into a frame of 159,254.25 µm², yielding a total latency of 197.72 nanoseconds and a throughput of 84.31 MHz. The layout of this pipeline contains 2492 standard cells connected by 2564 nets and 166 I/O pins. Four parameters were measured in layout simulations of the pipeline, namely the period of each stage ($P_i$), the duration of the evaluate phase ($d(E_i)$), the duration of the reset phase of each stage ($d(R_i)$), and the duration of the enable of each latch ($d(L_i^+)$). Figure 6 shows the duration of the latch enable, reset phase, evaluate phase, and the period of each stage. As shown in this figure, δ (labeled as Delta Delay) and the period are constant across stages. However, the reset phase gradually decreases from right to left, while the evaluate phase gradually increases from right to left, across the stages of the pipeline, as predicted by equations (7) and (8). This gradual increase in the evaluate phase and decrease in the reset phase are attributed to the propagation of δ, based on the explanation proposed in Section 5. Furthermore, the duration of the latch enable is almost equal to that of the reset phase of a stage, which shows how the former is closely tied to the latter, as derived in equation (12).
CONCLUSION
In this paper, we presented a novel synchronization approach to support clockless designs based on self-resetting stage logic. This approach is illustrated with the design and implementation of a four-bit 16-stage pipeline. The performance of this pipeline is analyzed through a detailed signal timing analysis. It was observed that the duration of the evaluate phase gradually increases, while the durations of the reset phase and the latch enable gradually decrease, toward the left stages of the pipeline. This gradual decrease in the duration of the latch enable can be used to determine an upper bound on the degree of pipelining that can be achieved using self-resetting stage logic.
