Abstract: Phased logic is proposed as a solution to the growing problem of timing complexity in digital design. It is a delay-insensitive design methodology that seeks to restore the separation between logical and physical design by eliminating the need to distribute low-skew clock signals and carefully balance propagation delays. However, unlike other methodologies that avoid clocks, phased logic supports the cyclic, deterministic behavior of the synchronous design paradigm. This permits the designer to rely chiefly on existing experience and CAD tools to create phased logic systems. Marked graph theory is used as a framework for governing the interaction of phased logic gates that operate directly on Level-Encoded two-phase Dual-Rail (LEDR) signals. A synthesis algorithm is developed for converting clocked systems to phased logic systems and is applied to benchmark examples. Performance results indicate that phased logic tends to be tolerant of logic delay imbalances and has predictable worst-case timing behavior. Although phased logic requires additional circuitry, it has the potential to shorten the design cycle by reducing timing complexities.
The mechanics of phase changes are completely decoupled from the logical behavior of the gates; instead, phasing is determined solely by the circuit topology. Phased logic does not use Petri net theory to model a system's logical behavior; rather, it uses the theory to establish the phasing activity that ferries data values from gate to gate without overloading some gates or starving others. By decoupling the phasing activity from the logical behavior, phased logic avoids the complication of modeling how logical choices steer signal events. Because it decouples the temporal control of data transmission from the data values being transmitted, phased logic can be viewed as an extension of an asynchronous datapath method to encompass the less regular circuitry of control. A useful consequence is that data and control signals can be handled uniformly and freely mixed.
The price of reducing sensitivity to delays and eliminating global clocks is nearly always additional circuitry that increases component cost. Phased logic is no exception, so one near-term application would be to help rapidly develop complex computers that must be verified by running end-user programs. If such a computer is designed as a conventional clocked system, the first implementation of its gate networks can be a phased logic prototype that executes test applications much faster than simulation can. Because of its delay-insensitivity, this prototype could evolve through various mixtures of software, programmable hardware, and custom ICs without component changes triggering an avalanche of timing problems.
Section II of this paper describes phased logic gate primitives and discusses the application of Petri net theory as a framework for governing gate interactions. Section III develops a synthesis algorithm that converts a specification for a conventional clocked system into a phased logic system. Section IV lists the important behavioral and performance properties of phased logic, and Section V discusses implementation issues.
II. Phased Logic Concepts
The network of gates in Fig. 1 is a prototypical clocked system that will be used to illustrate the phased logic methodology. The gates may be primitives such as AND gates and D flip-flops, or they may represent larger subsystems such as an ALU. These conventional gates rely on a clock to regulate their interaction. The designer assumes that the outputs of all the sequential gates change (almost) simultaneously in response to a clock event. The sequential gates align the computations of the combinational gates, whose outputs may change at any time in response to events on their inputs. Note that the clocked system in Fig. 1 uses a single ungated clock and is "closed," i.e., all gate inputs and outputs connect to other gates in the network. These restrictions are imposed by the phased logic synthesis algorithm. Requiring the system to be closed does not mean that it must be isolated from the environment. Rather, the regular inputs and outputs should be grouped into logical interfaces that are represented by interface "gates." Fig. 1 shows an example of a single combinational gate acting as an interface, but multiple combinational and sequential gates may serve as interfaces in a given system. With the I/O organized this way, the synthesis algorithm can use the topology of the gate network to guarantee critical properties of the final phased logic system.
[Fig. 1: prototypical clocked system; labels include sequential gate, combinational gate, clock generator, external environment, and interface "gate"]
The foundation of phased logic's delay-insensitivity is that the new gate primitives substituted into a system like that of Fig. 1 communicate via signals that carry timing and value information simultaneously. Timing and value information are combined using a dual-rail encoding. There are many dual-rail approaches, such as four-phase encoding and transition signaling, and any of these could be used. However, this paper describes a particular method based on the Level-Encoded two-phase Dual-Rail (LEDR) scheme [8, 15].
Unlike four-phase encoding, LEDR needs no resetting transitions that consume time and power, and unlike transition signaling, it maintains a fixed relationship between the last logical value transmitted and the current subsignal values. [...] signal s using the LEDR encoding. The two subsignals are given subscripts v for "value" and t for "timing." A feature of the LEDR encoding that makes interfacing to phased logic easy is that the v subsignal always carries the logical value of the LEDR signal. When the value does not change but a timing transition is needed, the code word representing the current value in the opposite phase is transmitted. This creates a transition on the t subsignal. The prominence of the even and odd phases in the LEDR encoding is the source of the name "phased logic."
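The encoding rule above can be made concrete with a short sketch. It assumes (the text does not fix this) that the even phase corresponds to code words with v XOR t = 0, the odd phase to v XOR t = 1, and that a signal resets to an even code word; the function names are illustrative only. The v subsignal always carries the value, and t is chosen so that the phase alternates, which forces exactly one subsignal to toggle per transmitted value.

```python
def ledr_phase(word):
    """Phase of an LEDR code word (v, t): 0 = even, 1 = odd (assumed convention)."""
    v, t = word
    return v ^ t


def ledr_encode(values, initial_value=0):
    """Return the LEDR code words (v, t) for a sequence of logical values.

    The first word is the assumed reset word: initial_value in the even phase.
    Each later word carries the next value in the opposite phase.
    """
    words = [(initial_value, initial_value)]       # v ^ t == 0: even phase
    for value in values:
        next_phase = ledr_phase(words[-1]) ^ 1     # phases alternate even/odd
        words.append((value, value ^ next_phase))  # v = value; t sets the phase
    return words
```

For example, sending 1, 1, 0 after reset toggles v, then t (the value repeats, so only a timing transition is needed), then v again.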
Recent proofs show that, regardless of how signals are encoded, conventional gates cannot be used to build complete delay-insensitive systems (especially their control circuitry) [16, 17]. To escape the limitations of conventional gates, phased logic uses primitives that operate directly on LEDR signals. Other researchers [8, 15] have proposed similar primitives, but they assume a different timing convention for multiple outputs, and they do not apply marked graph theory to govern the interaction of gates.
Fig. 3 shows an example phased logic AND gate with two LEDR inputs and one LEDR output. At time zero, the even gate phase indicates that the gate is waiting for its inputs to change to even. By time two, both inputs are even and the gate is enabled. After a propagation delay, the gate fires at time three. Causality arcs A and B emphasize that the gate does not fire until the inputs reach the proper phase. When the firing occurs, the gate phase toggles to indicate that the inputs must change to odd before the gate can fire again, and the output phase toggles to show that a new output value is available. The toggling of the output phase is also an external sign that the input values are no longer needed. Until this phase change, the environment of the gate must guarantee that the inputs remain stable. The inputs change again at times four and five, and arcs C and D emphasize that these events must occur after the gate fires. In this case, the inputs change soon after the firing, but, in general, a gate must be able to wait an indefinite time before it is reenabled.
The value subsignals in Fig. 3 display the gate's logical behavior. For example, s_1v and s_2v are both 1 at the first firing, so s_3v toggles to 1 in accordance with the AND function of the gate. If the function were not AND and s_3v needed to stay at 0, then s_3t would toggle instead of s_3v to produce the phase change at the first firing. There are no restrictions on the logical function a phased logic gate implements, except that the function must not depend on the timing of input events.
More precisely, the gate should be a delay-insensitive component [18, 19].
Note that a phased logic gate's outputs are held constant while its inputs transition between phases. For example, the output is constant between times four and five in Fig. 3, even though it is already known that the output value should go to 0. Thus, in addition to calculating a logical function like a combinational gate, a phased logic gate also has some characteristics of a sequential gate. A global reset signal is needed to fix a phased logic gate's initial output values and phases. However, unlike a global clock, this reset signal is not required to have low skew.
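The behavior of the AND gate in Fig. 3 can be imitated in a few lines. This is a minimal behavioral sketch, not a circuit design: the class PhasedGate and its methods are hypothetical names, the phase convention phase = v XOR t is an assumption, and the gate is evaluated only when try_fire is called rather than continuously. It fires only when every input phase matches the gate phase, holds its output otherwise, and on each firing toggles both its output phase and its own phase.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Word = Tuple[int, int]  # LEDR code word (v, t)


def phase(word: Word) -> int:
    v, t = word
    return v ^ t  # assumed convention: 0 = even, 1 = odd


@dataclass
class PhasedGate:
    func: Callable[..., int]  # logical function of the input values
    out: Word = (0, 0)        # assumed reset output: value 0, even phase
    gate_phase: int = 0       # even: waiting for the inputs to become even

    def enabled(self, inputs: List[Word]) -> bool:
        # A token is present on an input when its phase matches the gate phase.
        return all(phase(w) == self.gate_phase for w in inputs)

    def try_fire(self, inputs: List[Word]) -> bool:
        if not self.enabled(inputs):
            return False  # output held constant while inputs change phase
        value = self.func(*(v for v, _ in inputs))
        new_phase = phase(self.out) ^ 1
        # Either v changes (new value) or only t changes (same value, new phase).
        self.out = (value, value ^ new_phase)
        self.gate_phase ^= 1  # now wait for the inputs to reach the other phase
        return True
```

With gate = PhasedGate(lambda a, b: a & b), a call in which only one input has moved to the new phase leaves gate.out unchanged, which is the holding behavior described above for Fig. 3.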
Like the clocked system it emulates, a phased logic system will have a closed topology, and it must interact with the environment through interface gates. To the phased logic system, an interface gate appears to be a normal phased logic gate that fires when enabled. To the external environment, an interface gate appears to be an I/O interface that must be specially treated. That is, the environment must wait for a set of inputs to arrive, perform a logical computation, and then output values with the appropriate phase. Of course, the environment must drive all of the interface gates through continuous phase changes in order to keep the phased logic system operating.
A single output of a phased logic gate may be connected to several gate inputs, so single-output gates are sufficient to build a system. However, a phased logic gate is not restricted to one output, and Fig. 4 illustrates important details of the behavior of a gate with multiple inputs and outputs. Between times one and two, the inputs change from odd to even and enable the gate. The gate fires at time three, and causality arc A indicates that the inputs can start to change phase as soon as any output changes phase. This is an important difference between the timing convention of phased logic and that of previous LEDR applications. The synthesis method discussed in Section III assumes that the firing of a gate is an atomic event. Any output change is an indication that the firing has occurred and that the current inputs are no longer needed. Previous timing conventions have required all outputs to change before any input could change. Thus, in phased logic, precautions must be taken in a gate's design to latch values and/or balance delays as necessary to ensure that the inputs may change any time after an output changes. This is a special concern if a large amount of circuitry is encapsulated in one phased logic "gate" having many output signals.
However, to simplify the gate designer's job, the environment of the gate must guarantee that all the outputs have changed before the gate is reenabled. This is indicated by causality arc B .
The timing convention of a phased logic gate, called the firing rule, can be summarized as two constraints:
• Internal Constraint: The gate fires if and only if it is enabled. This is illustrated by causality arcs A and B in Fig. 3. The firing is defined to occur when the first output change is observed, as suggested by causality arc A in Fig. 4.
• External Constraint: The phase of each input and output toggles exactly once between the nth and (n + 1)th firings of the gate. In particular, after the gate is enabled, the inputs do not change phase again until the gate fires, as suggested by causality arcs C and D in Fig. 3. After the gate fires, all the outputs change phase before the gate is reenabled, as illustrated by causality arc B in Fig. 4.
The internal constraint is a requirement on gate design that specifies when a gate should perform a computation. The external constraint is a requirement on system design that guarantees a gate will have sufficient time to complete a computation. The firing rule represents a contract between the gate designer and the system designer that must be upheld for proper operation. Showing how to enforce the external constraint is the focus of the remainder of this section and Section III.
When the phase of an input signal matches the phase of the receiving gate, a token is said to be present at the input. A token "carries" a new data value that has not yet been used in a computation by the gate. Fig. 5 [...] information. However, the proper interpretation is that the token indicates that the receiving gate has not used the signal's information yet. In token diagrams, multiple output signals may represent multiple gate outputs or, as shown in case three, they may indicate that one output is connected to multiple gates. This suits the representation of a phased logic system as a directed graph.
As the signal and gate phases change, the tokens appear to flow through gates and circulate around the system. Fig. 6 illustrates this dynamic behavior by showing the time steps leading to the first firing of the AND gate in Fig. 3. An alternative view of the firing rule's internal constraint is that a gate acts like a data flow operator [20], i.e., it fires when there are tokens on all its inputs. This is equivalent to saying that all the input phases match the gate's phase.
A closed network of phased logic gates starts at reset with an initial assignment of phases to the gates and signals called the initial phase marking. However, it is easier to work with tokens and specify an initial token marking. An initial phase marking can be found for any initial token marking if phase inversion of gate outputs is used. The default condition, shown in [...] A buffer gate could be added on input s, but that might require more circuitry than g_1 itself. This is a safeness problem, and it represents a violation of the firing rule's external constraint at g_1. Examining the actual phase changes shows that the phase on input s of g_1 toggles twice before g_1 fires for the first time. Gate g_1 might output an erroneous value, or it might believe it is not enabled at all. Although the firing sequence {g_2, g_4, g_2} could cause a system failure, the sequence {g_2, g_4, g_1, g_2} would not. Either sequence is possible with the appropriate gate and wire delays, and thus the system would not be delay-insensitive.
[Fig. 7: (a) initial token marking with a liveness problem; (b) token marking after gate g_2 fires]
For phased logic to be useful, it must be possible to automatically determine initial token markings that are free of liveness and safeness problems. Fortunately, graphs with token markings like those shown in Fig. 7 have already been studied as a special case of Petri nets called marked graphs [11, 21]. Marked graphs are also called decision-free systems because they cannot model behavior involving a choice in token movement. However, using marked graphs does not restrict the logical behavior of a phased logic system, because logical choices can still control the values carried by the tokens even if the token movements always follow the same pattern.
In marked graphs, the vertices are called transitions and the edges are called the flow relation. The initial marking of a marked graph evolves through transition firings. When a transition fires, a token is removed from each input edge and a token is added to each output edge. A marking M_i is said to be reachable from marking M_j if the token distribution of M_i can be created from the token distribution of M_j through some sequence of firings. A marked graph is live if, for any marking M reachable from the initial marking, every transition is enabled in some marking reachable from M. A marked graph edge is safe if no more than one token can appear on the edge in any marking reachable from the initial marking. A marked graph is safe if all of its edges are safe.
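The token game just defined can be written down directly. In this sketch (the class name MarkedGraph is an illustrative choice), the marking is a token count per edge; a transition is enabled when all of its input edges hold a token, and firing removes one token from each input edge and adds one to each output edge. Edges are kept as a list so that parallel edges and self-loops are allowed.

```python
class MarkedGraph:
    """Token game for a marked graph: vertices are transitions, edges carry tokens."""

    def __init__(self, edges, initial_tokens):
        self.edges = list(edges)            # list of (source, destination)
        self.tokens = list(initial_tokens)  # tokens[k] = tokens on edges[k]

    def enabled(self, t):
        # Enabled when every input edge of t holds at least one token.
        return all(n > 0 for (s, d), n in zip(self.edges, self.tokens) if d == t)

    def fire(self, t):
        if not self.enabled(t):
            raise ValueError(f"transition {t!r} is not enabled")
        for k, (s, d) in enumerate(self.edges):
            if d == t:
                self.tokens[k] -= 1  # remove a token from each input edge
        for k, (s, d) in enumerate(self.edges):
            if s == t:
                self.tokens[k] += 1  # add a token to each output edge
```

Removing before adding keeps a self-loop's count unchanged, as a firing should.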
In phased logic, the marked graph associated with a phased logic system is required to be live and safe. With this restriction, the problems seen in Fig. 7 cannot occur.
When a phased logic system's marked graph is safe, the firing rule's external constraint is satisfied at every gate, because a token appearing on a signal corresponds to the phase of that signal toggling. Consider the input and output signals of some gate g_i. Safeness guarantees that a token appearing on an input signal is removed by a firing of g_i before another token appears, so the input's phase toggles once between firings of g_i. Suppose an output signal of g_i is received by gate g_j. Safeness also guarantees that after g_i fires, a firing of g_j consumes the token on the output before g_i fires again. That is, the system topology guarantees that the reenabling of g_i is postponed until g_j observes the phase change on the output and fires. For a self-loop, the phase change must certainly occur before g_i is reenabled. Thus, the output's phase toggles once between firings of g_i.
A live and safe associated marked graph also guarantees delay-insensitivity. It can be shown that, assuming liveness and safeness, each gate receives the same sequence of input values and generates the same sequence of output values regardless of gate and wire delays [3].
Let a directed circuit in a graph be a closed sequence of edges and vertices in which no edge is repeated, and let the token count of a directed circuit be the total number of tokens on all the edges of the circuit. The following theorems [21] will be very useful in Section III.
Theorem 1:
A marked graph is live if and only if the initial token marking places at least one token on each directed circuit.
Theorem 2:
A live marked graph is safe if and only if every edge belongs to some directed circuit with a token count of one in the initial token marking. Such a circuit is called a synchronizing loop [22].
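For small graphs, Theorems 1 and 2 can be checked mechanically by enumerating elementary directed circuits (no repeated vertex) and summing their initial tokens. Restricting attention to elementary circuits is enough: every directed circuit splits into edge-disjoint elementary circuits, so a token-free circuit contains a token-free elementary one, and the elementary piece through a given edge of a synchronizing loop has at most one token (exactly one under liveness). The brute-force enumeration below is exponential in the worst case and is only a sketch; the function names are hypothetical.

```python
def elementary_circuits(edges):
    """Yield every elementary directed circuit once, as a list of edge indices.
    Each circuit is reported from its lowest-ranked vertex (brute-force DFS)."""
    verts = sorted({v for e in edges for v in e})
    rank = {v: i for i, v in enumerate(verts)}
    out = {v: [] for v in verts}
    for k, (s, d) in enumerate(edges):
        out[s].append(k)

    def dfs(start, v, path, visited):
        for k in out[v]:
            d = edges[k][1]
            if d == start:
                yield path + [k]  # closed the circuit back at its start
            elif rank[d] > rank[start] and d not in visited:
                yield from dfs(start, d, path + [k], visited | {d})

    for v in verts:
        yield from dfs(v, v, [], {v})


def live_and_safe(edges, tokens):
    """Theorem 1: live iff every circuit has a token.
    Theorem 2 (for live graphs): safe iff every edge lies on a one-token circuit."""
    circuits = [(c, sum(tokens[k] for k in c)) for c in elementary_circuits(edges)]
    live = all(n >= 1 for _, n in circuits)
    safe = live and all(any(k in c and n == 1 for c, n in circuits)
                        for k in range(len(edges)))
    return live, safe
```

On a three-gate ring, one initial token gives (True, True), no token gives (False, False), and two tokens give (True, False), mirroring the liveness and safeness problems of Fig. 7.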
The initial marking shown in Fig. 8 satisfies the conditions of both theorems. The marking in part (a) of Fig. 7 has a liveness problem because directed circuit C_1 has no token. The marking in part (c) has a safeness problem because s belongs only to C_2, and this directed circuit has two tokens.
III. Phased Logic Synthesis
This section describes a synthesis algorithm that takes the topology of a clocked system like that of Fig. 1 and generates a phased logic system with the same logical behavior. To ensure that the phased logic system operates continuously like the clocked system, the algorithm guarantees liveness, i.e., every directed circuit of the phased logic system has at least one signal with an initial token. To ensure that the external constraint of the firing rule is satisfied, the algorithm also guarantees safeness, i.e., every phased logic signal belongs to a synchronizing loop. To make the phased logic system emulate the clocked system's synchronous behavior, the algorithm enforces a third constraint on the initial token marking that will be discussed shortly. This constraint significantly limits the freedom of token placement. In fact, meeting all three constraints imposed on the initial token marking by the liveness, safeness, and synchronous requirements usually forces the synthesis algorithm to supplement the topology of the clocked system with additional signals and gates. Intuitively, these additions act like "local clocks" that compensate for the loss of the global clock. The additional signals and gates are overhead that the algorithm seeks to minimize. [...] "aligned" in the phased logic system so that it emulates the synchronous behavior of the clocked system. Fig. 10 shows the initial token marking for the example's original phased logic system.
An important feature of this type of marking is that the initial tokens in the original phased logic network lie on the outputs of barrier gates. A barrier gate is one that has initial tokens on all its output signals, and these gates figure prominently in the synthesis algorithm. They are also the key to hierarchical connections of separately synthesized phased logic subsystems [3].
[Fig. 9 (synthesis algorithm), partial listing:
  for each gate g_j found in the backward depth-first search:
    perform a forward depth-first search along clear paths starting at g_j to determine which, if any, signals are covered if a feedback signal is added from g_i to g_j;
    calculate the score for a feedback signal from g_i to g_j;
    if the score is the best seen so far for any g_i, save the feedback signal as the current best.
  Add the best feedback signal seen to the original phased logic network and mark all the signals that are covered by it.]
[...] acknowledge signals of static data flow [20] but are handled in a more general manner. Note that feedback signals can be treated as normal phased logic signals for which the value is constant.
That is, the v subsignal in the LEDR encoding is a constant 0 or 1 while the t subsignal is toggled each time the phase changes. Since the value subsignal is constant, it can be dropped. This saves a wire and makes a feedback signal half as expensive to implement as an LEDR signal.
The synthesis algorithm may require any gate to generate or receive feedback signals.
Generating a feedback output is easy because a toggling internal state variable is needed to keep track of the phase of the gate. This can be brought out as the feedback signal generated by the gate. There are several options for receiving feedback inputs. It may be possible to connect a feedback signal to an unused LEDR input by tying the v subsignal to a constant that removes it from the computation (such as 1 for an AND gate). Another possibility is to use a parts library with versions for each gate having zero, one, or more dedicated feedback inputs. If many feedback inputs are needed, a Muller C-element [10] can act as a feedback concentrator.
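A Muller C-element's output switches to the common value of its inputs only when all inputs agree; otherwise it holds its previous value. The sketch below models this at the level of phases, assuming each feedback signal is represented by its current phase bit (0 for even, 1 for odd). Under that assumption the concentrator's output changes phase only after every feedback input has changed phase, so a gate with a single dedicated feedback input can wait on many. The class name is illustrative, and the model ignores the element's internal timing.

```python
class MullerCElement:
    """Phase-level model of a Muller C-element used as a feedback concentrator
    (hypothetical class; each input is the phase bit of one feedback signal)."""

    def __init__(self, initial_phase: int = 0):
        self.out = initial_phase

    def update(self, phases):
        # Switch only when every input agrees; otherwise hold the previous output.
        if phases and all(p == phases[0] for p in phases):
            self.out = phases[0]
        return self.out
```

For three feedback inputs, the output stays even while only one or two of them have toggled to odd, and it moves to odd once the third one does.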
The second step of the synthesis algorithm in Fig. 9 begins by adding special gates to solve problems associated with the connection of feedback signals to barrier gates. Consider the unsafe signal s_7 in Fig. 10 connecting the barrier gates g_2 and g_7. Because of the restriction discussed above, a feedback signal without an initial token cannot be added from g_2 to g_7 to cover s_7, since g_2 is a barrier gate. No other feedback signal can create a synchronizing loop through s_7. Since all initial tokens in the original phased logic network lie on barrier gate outputs, this problem can be avoided by not allowing direct connections between barrier gates. Thus, step two begins by inserting splitter gates into signals that directly connect barrier gates. A splitter gate simply copies its input value to its output, but it also serves as a connection point for feedback.
Consider the problem of finding all signals that lie on paths connecting two gates. Variants of this problem arise in the remainder of the algorithm, and they can be solved using depth-first searches. A forward depth-first search marks every vertex that the starting vertex can reach via a directed path. A backward depth-first search marks every vertex that can reach the starting vertex via a directed path. To determine the signals that lie on paths connecting an output of gate g_i to an input of gate g_j, a backward depth-first search is performed starting at g_j, followed by a forward depth-first search starting at g_i. If a signal's destination gate is found by the backward search and its source gate is found by the forward search, then that signal lies on a path from g_i to g_j.
After inserting splitter gates, the next activity in step two of the synthesis algorithm is to mark the signals that are already safe in the original phased logic network. To find the safe signals, each token can be examined to determine all the signals that lie in circuits containing the token but not passing through other signals with tokens. Since all initial tokens are on the output signals of barrier gates, it is easy to examine each token by scanning through the barrier gates. For each barrier gate, the signals that lie on paths starting and ending at the gate but not passing through other barrier gates are marked as safe. For example, signal s_1 in Fig. 10 is safe because of a directed circuit that starts at barrier gate g_2 and passes through gates g_3 and g_1. As outlined in Fig. 9, the signals meeting this criterion can be found with the depth-first search method discussed above if the searches are not allowed to recurse through barrier gates.
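The depth-first search method for finding signals on connecting paths, and its use for marking already-safe signals, can be sketched as follows. The names reach, signals_on_paths, and mark_safe and the blocked parameter are hypothetical; blocked models the rule that the searches do not recurse through barrier gates. mark_safe assumes, as in the original phased logic network, that initial tokens lie only on barrier gate outputs and that every directed circuit passes through at least one barrier gate.

```python
def reach(adj, start, blocked=frozenset()):
    """Depth-first search: vertices reachable from start along edges in adj,
    never entering a vertex in blocked (start itself is always included)."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj.get(v, ()):
            if w not in seen and w not in blocked:
                seen.add(w)
                stack.append(w)
    return seen


def signals_on_paths(signals, gi, gj, blocked=frozenset()):
    """Signals (u, w) on a directed path from gi to gj whose interior avoids
    blocked: u is found by a forward search from gi, w by a backward search from gj."""
    succ, pred = {}, {}
    for u, w in signals:
        succ.setdefault(u, []).append(w)
        pred.setdefault(w, []).append(u)
    forward = reach(succ, gi, blocked)
    backward = reach(pred, gj, blocked)
    return {(u, w) for u, w in signals if u in forward and w in backward}


def mark_safe(signals, barriers):
    """Signals on circuits that start and end at one barrier gate without
    passing through any other barrier gate."""
    safe = set()
    for b in barriers:
        safe |= signals_on_paths(signals, b, b, blocked=set(barriers) - {b})
    return safe
```

In the small example used to exercise this sketch (gates named after, but not copied from, Fig. 10), the direct signal from barrier gate g2 to barrier gate g7 lies only on a circuit through both barrier gates, so it is left unmarked, just as s_7 is unsafe in the text.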
After marking the signals that are already safe, step two's main activity, adding feedback signals to cover unsafe signals, can finally begin. Recall that a feedback signal without an initial token can only be connected from a non-barrier gate to a barrier gate. Because of this restriction, the synthesis algorithm only tries to add feedback signals between gates that are connected by at least one clear path. A clear path is a sequence of gates and signals that avoids barrier gates, except possibly for a single barrier gate at either the beginning or the end of the path. If no clear paths connect a feedback signal's destination gate to its source gate, then any new directed circuits created by the feedback signal will have at least two initial tokens. However, if clear paths exist from the destination to the source, then the feedback signal covers all of the signals on these clear paths. To understand why, consider the three types of feedback signals associated with clear paths: the source and destination are not barrier gates, the source is a barrier gate but not the destination, and the destination is a barrier gate but not the source. Table 1 shows an example feedback signal of each type that could be added to the network in Fig. 10. [...] Table 1, only two signals would be counted as covered since s_1 was already safe. If the number of covered signals is zero, the feedback signal is useless and is not considered further. Otherwise, its score is determined by the following equation:
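$$\mathrm{score} = (\text{number of covered signals}) - (\text{number of feedback inputs}) - 0.1 \times (\text{length})$$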
The two negative terms are meant to influence the choice between feedback additions that cover roughly the same number of signals. The "number of feedback inputs" is the number of other feedback signals that are already inputs to the potential feedback signal's destination gate. This term reduces the score in an attempt to distribute the required feedback inputs across the available gates. There is also a small reduction in the score for the "length" of the feedback signal (the 0.1 [...] with smaller cycle times. Cycle times are discussed in Section IV(B).
The while loop in Fig. 9 adds feedback in a greedy manner until all signals are covered.
During each iteration, the gates are scanned to determine the feedback signal with the best score, taking into account the effects of feedback added on previous iterations. For each gate [...] Some circuits had a few gates that needed a large number of feedback inputs (>20). However, this was associated with gates having unusually large fanouts (>100). Some optimization would clearly be needed on these circuits before synthesis. Table 2 shows that the added feedback signals increased the wiring by an average of 9.9% over the normal factor of two needed by the LEDR encoding. This increase takes into account that a feedback signal requires only one wire.
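The greedy loop can be sketched as a set-cover style selection. This is a hypothetical simplification: each candidate feedback signal (source, destination) is assumed to come with a precomputed set of signals it would cover (in the algorithm these come from the clear-path searches), and the score is assumed to be the number of still-unsafe signals covered, minus the number of feedback inputs already at the destination, minus 0.1 times an assumed length value. Only still-unsafe signals are counted, in line with the Table 1 example in which s_1 is not counted because it was already safe.

```python
def greedy_feedback(unsafe, candidates, length):
    """Hypothetical sketch of the greedy feedback loop.

    unsafe:     set of signals not yet on a synchronizing loop
    candidates: {(source_gate, dest_gate): set of signals the feedback would cover}
    length:     {(source_gate, dest_gate): assumed 'length' of that feedback signal}
    Returns the chosen feedback signals in the order they were added.
    """
    uncovered = set(unsafe)
    feedback_inputs = {}  # destination gate -> feedback inputs added so far
    chosen = []
    while uncovered:
        best, best_score = None, float("-inf")
        for fb, covers in candidates.items():
            newly = len(covers & uncovered)  # already-safe signals do not count
            if newly == 0:
                continue  # useless candidate, not considered further
            # Assumed score form: covered - feedback inputs - 0.1 * length.
            score = newly - feedback_inputs.get(fb[1], 0) - 0.1 * length[fb]
            if score > best_score:
                best, best_score = fb, score
        if best is None:
            raise ValueError("remaining signals cannot be covered by any candidate")
        chosen.append(best)
        uncovered -= candidates[best]
        feedback_inputs[best[1]] = feedback_inputs.get(best[1], 0) + 1
    return chosen
```

Recomputing every score on each pass is what makes the basic loop expensive; caching the candidates whose scores cannot have changed is the kind of enhancement described next.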
If [...] is the number of signals and [...] is the number of gates, then a worst-case bound on the time complexity of the basic algorithm is [...]. This bound can be reduced with various enhancements. For example, the best feedback signals originating at most gates will be unchanged in an iteration and can be cached instead of recalculated. The number of feedback signals considered for addition can also be reduced by looking for candidate destination gates only at certain levels in the backward depth-first search. This may miss better candidates, but after the first four levels, trying gates only at levels equal to powers of two seems to work well. With these enhancements, the complexity of the synthesis algorithm is closer to [...].
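The level-sampling heuristic can be expressed as a filter on search depth. In this sketch the level of a gate is taken to be its breadth-first distance from the source gate in the backward direction, which is an assumption (a depth-first search does not assign levels uniquely); levels 1 through 4 are always tried, and beyond that only powers of two. The function names and the predecessor dictionary are illustrative.

```python
from collections import deque


def is_candidate_level(level: int) -> bool:
    # The first four levels are always tried; after that, only powers of two.
    return level <= 4 or (level & (level - 1)) == 0


def candidate_destinations(preds, start):
    """Gates that can reach start, kept only at sampled backward levels.
    preds maps each gate to the gates driving its inputs."""
    level = {start: 0}
    queue = deque([start])
    found = []
    while queue:
        g = queue.popleft()
        for p in preds.get(g, []):
            if p not in level:
                level[p] = level[g] + 1
                if is_candidate_level(level[p]):
                    found.append(p)
                queue.append(p)
    return found
```

On a chain of 20 gates feeding the source gate, only the gates at levels 1, 2, 3, 4, 8, and 16 are tried as destinations.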
IV. Phased Logic Properties

A. Synchronous Behavior
Without a clock network, a phased logic system cannot guarantee that the values for all its synchronous signals are generated simultaneously. However, simultaneous generation is not [...] where the values at reset are considered to be the result of the "zeroth" evaluation. A phased logic system is said to "display synchronous behavior" if it preserves this critical characteristic, i.e., the nth evaluations of synchronous functions use the output values from the (n - 1)th evaluations. The phased logic synthesis process must ensure that, even with the clock removed, a synchronous function does not compute too quickly and get ahead of another function.
To explain how phased logic can display synchronous behavior even though it is delay-insensitive, cycle numbers are defined. The cycle number of a phased logic gate indicates the number of times the gate has fired since reset, i.e., the number of synchronous function evaluations in which it has participated. Gate cycle numbers are zero at reset. The cycle number of a token indicates the synchronous function evaluation for which the token's value should be used.
This number is assigned according to the following rules:
• If a token is on a synchronous signal, it represents a new output value of a synchronous function, and the cycle number of the token is one greater than the cycle number of the gate that generated the token. An initial token on a synchronous signal has a cycle number of one.
• If a token is on an ordinary signal, it represents a new intermediate value of a synchronous function evaluation, and the token's cycle number is the same as the cycle number of the gate that generated the token. An initial token on an ordinary signal has a cycle number of zero.
• If a token is on a feedback signal, no value is associated with it, and it does not affect the logic of function evaluations. Thus, its cycle number is left undefined.
If the tokens are visualized as flowing through the system, the synchronous signals appear to increment the token cycle numbers. The tokens generated as synchronous function outputs of the (n - 1)th evaluation are given cycle number n to indicate their use in the nth evaluation of the synchronous functions. Therefore, a phased logic system displays synchronous behavior if the non-feedback signal tokens that enable the nth firing of a gate always have cycle number n.
The key to the synchronous behavior of phased logic is the link between the cycle numbers of connected gates. Let g_i and g_j be phased logic gates, and let signal s join an output of g_j to an input of g_i. Since the marked graph associated with a phased logic system must be safe, when g_j fires and places a token on s, g_i must fire before g_j fires again. Likewise, once g_i fires and consumes the token on s, the firing rule does not allow g_i to fire again until g_j fires and places another token on s. If the initial token marking places a token on signal s at reset, then g_i must fire first to remove the token from s before g_j fires, so g_i reaches cycle one while g_j is still in cycle zero. Since the cycle number of g_j is always less than or equal to the cycle number of g_i, g_j is said to be behind g_i or, equivalently, g_i is ahead of g_j. If the initial token marking does not place a token on signal s at reset, then the cycle number of g_j is always greater than or equal to the cycle number of g_i, and g_j is ahead of g_i. Fig. 13 illustrates the ahead and behind relationships for a more complex situation. In this example, g_2 is ahead of g_1 and g_3 is behind g_1.
Note that the initial token at time zero in Fig. 13 is placed on a synchronous signal and that the $n$th firing of $g_1$ is always enabled by tokens with cycle number $n$. This is no coincidence. An arbitrary initial token marking may not yield synchronous behavior even if the associated marked graph is live and safe. However, the next theorem gives sufficient, but not necessary, conditions for synchronous behavior. Proof: Let $g_i$ be a gate in a phased logic system. Suppose that $g_i$ is enabled and ready to fire for the $n$th time. At this point, $g_i$ has a cycle number of $n-1$ and has tokens on all of its input signals. It must be shown that the non-feedback signal tokens enabling $g_i$ are cycle-$n$ tokens. Let signal $s$ be one of $g_i$'s non-feedback input signals, and let $g_j$ be the gate driving $s$. If signal $s$ is an ordinary signal, no token was placed on $s$ in the initial marking and $g_j$ is ahead of $g_i$. Because $s$ now holds a token, $g_j$ has fired once more than $g_i$, so gate $g_j$ is in cycle $n$ and, by the rules for token cycle numbers, the current token on $s$ has cycle number $n$. If signal $s$ is a synchronous signal, a token was placed on $s$ in the initial marking and $g_j$ is behind $g_i$. Because $s$ holds a token, $g_j$ has fired exactly as often as $g_i$, so $g_j$ is in cycle $n-1$ and, by the rules for token cycle numbers, the token on $s$ has cycle number $(n-1)+1 = n$. Hence, the non-feedback signal tokens that enable the $n$th firing of a gate always have cycle number $n$.
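The token-numbering argument in the proof can be exercised on a small example. This Python sketch assumes the numbering rules as they can be read off the proof: a token placed on a synchronous signal by a gate that has just entered cycle $k$ carries cycle number $k+1$, a token placed on an ordinary signal carries $k$, and a reset token on a synchronous signal carries 1. It treats every signal with a token at reset as synchronous and leaves out feedback signals, which the proof exempts. The function name and the three-gate topology are hypothetical.

```python
import random

def check_token_cycles(gates, signals, steps, seed=0):
    """Fire gates in random order, asserting that the n-th firing of each gate
    is enabled only by tokens with cycle number n.

    signals: list of (driver, receiver, kind), kind "sync" (token at reset) or
    "ordinary" (no token at reset). Assumed numbering: a token placed by a gate
    that has just entered cycle k carries k + 1 on a sync signal and k on an
    ordinary one; a reset token on a sync signal carries 1.
    """
    rng = random.Random(seed)
    cycle = {g: 0 for g in gates}
    # slot[k] is None while signal k is empty, else its token's cycle number.
    slot = [1 if kind == "sync" else None for (_, _, kind) in signals]
    for _ in range(steps):
        enabled = [g for g in gates
                   if all(slot[k] is not None
                          for k, (_, dst, _) in enumerate(signals) if dst == g)]
        if not enabled:
            raise RuntimeError("deadlock: the marking is not live")
        g = rng.choice(enabled)
        n = cycle[g] + 1                   # this is the n-th firing of g
        for k, (_, dst, _) in enumerate(signals):
            if dst == g:
                assert slot[k] == n, f"{g} fired with a cycle-{slot[k]} token"
                slot[k] = None             # consume the input token
        cycle[g] = n
        for k, (src, _, kind) in enumerate(signals):
            if src == g:
                assert slot[k] is None, "the marking is not safe"
                slot[k] = n + 1 if kind == "sync" else n
    return cycle

# Hypothetical example: g3 is enabled by a synchronous signal from g1 (which is
# behind it) and an ordinary signal from g2 (which is ahead of it).
signals = [("g1", "g2", "sync"), ("g1", "g3", "sync"),
           ("g2", "g1", "ordinary"), ("g3", "g1", "ordinary"),
           ("g2", "g3", "ordinary")]
print(check_token_cycles(["g1", "g2", "g3"], signals, steps=30))
# {'g1': 10, 'g2': 10, 'g3': 10}
```

Gate g3 exercises both cases of the proof: its synchronous input comes from a gate one cycle behind it, its ordinary input from a gate level with it after firing, and both tokens still carry the same cycle number.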
B. Performance
Although phased logic is delay-insensitive, the actual performance is still critically important. The cycle time of a concurrent system is defined as the average time separation between equivalent events in the system [24] . For phased logic, the equivalent events of interest are the consecutive firings of a gate. The cycle time for systems described by marked graphs has been studied by several researchers [24] [25] [26] , and applying their results indicates that the cycle time of a phased logic system is determined by the delays of signals and gates that form elementary directed circuits. Let the cycle time of an elementary directed circuit be defined as the total delay around the circuit divided by the token count of the circuit at reset [26] . Then, the cycle time of a phased logic system is the maximum cycle time of all elementary directed circuits.
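The cycle-time definition translates directly into code. This Python sketch enumerates every elementary directed circuit by brute force (exponential in general, so only suitable for small examples; large systems would call for a dedicated maximum cycle-ratio algorithm) and returns the largest ratio of total delay to reset token count. It assumes integer delays and lumps each signal's wire delay together with the delay of the gate it drives. The six-gate ring is hypothetical: forward signals hold tokens at reset on two edges, and each forward signal is paired with a reverse, acknowledging signal holding the complementary token count, so every signal lies on a one-token circuit and the marking is safe. Whether this matches the feedback insertion of Section III is an assumption.

```python
from fractions import Fraction

def cycle_time(edges):
    """Maximum over elementary directed circuits of (total delay / tokens).

    edges: list of (u, v, delay, tokens) with integer delays.
    Node names must be mutually comparable (e.g. all ints or all strings).
    """
    succ = {}
    for u, v, d, m in edges:
        succ.setdefault(u, []).append((v, d, m))
        succ.setdefault(v, [])
    best = None

    def extend(start, node, on_path, delay, tokens):
        nonlocal best
        for v, d, m in succ[node]:
            if v == start:                        # closed an elementary circuit
                if tokens + m == 0:
                    raise ValueError("circuit without tokens: the marking is not live")
                ratio = Fraction(delay + d, tokens + m)
                best = ratio if best is None else max(best, ratio)
            elif v > start and v not in on_path:  # root each circuit at its smallest node
                on_path.add(v)
                extend(start, v, on_path, delay + d, tokens + m)
                on_path.remove(v)

    for s in sorted(succ):
        extend(s, s, {s}, 0, 0)
    return best

# Hypothetical ring: unit gate delay + unit wire delay = 2 per signal.
edges = []
for i, m in enumerate([1, 0, 0, 0, 1, 0]):
    j = (i + 1) % 6
    edges.append((i, j, 2, m))       # forward signal i -> j
    edges.append((j, i, 2, 1 - m))   # reverse, acknowledging signal j -> i
print(cycle_time(edges))  # 6
```

The two-gate circuits have a ratio of 4, the reverse ring 12/4 = 3, and the forward ring 12/2 = 6, which is the system's cycle time.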
The cycle time is a measure of the best average performance the system can achieve.
However, it would be useful to find a bound on the minimum performance. The authors have used the analysis approach of [25] to derive a worst-case performance bound for phased logic [3] .
Although the problem seems quite nonlinear, equations describing a phased logic system can be written in a special max-plus algebra and solved in a manner analogous to linear systems.
Let a synchronous path be one that starts with a synchronous signal and then takes any number (including zero) of ordinary or feedback signals to its destination. The algebraic analysis yields an upper bound on the firing times of gates expressed in terms of the maximum delay along any synchronous path in the system. Let $t_i(n)$ be the time of the $n$th firing of gate $g_i$, let $D$ be the maximum synchronous path delay, and define $T$ to be a constant greater than or equal to the latest time for the first firing of any gate. Then the bound can be stated formally as

$$t_i(n) \le T + (n-1)\,D. \tag{2}$$
The constant accounts for possible delays in the first firings. Delays may result if the inactivation of the reset does not reach all gates simultaneously: a gate could be ready to fire but held back by a lingering active reset. The constant should be set to a value greater than the maximum synchronous path delay and large in comparison to any expected skew in the reset. Although the bound in Eq. (2) requires each gate firing to occur no later than the specified time, the firings can occur earlier, and particular intervals between firings can be smaller or larger than the maximum synchronous path delay.
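One way to see the "linear" character of the max-plus analysis is to write the earliest firing times as a recurrence. The Python sketch below is a standard timed-marked-graph formulation, not necessarily the derivation of [3], under these assumptions: each signal's delay lumps its wire delay with the delay of the gate it drives; the reset acts as a zeroth firing of every gate at time 0; and every signal that holds a token at reset starts a synchronous path. Splitting the signals by reset token count into matrices $A_0$ and $A_1$ gives $x(n) = A_0 \otimes x(n) \oplus A_1 \otimes x(n-1)$, whose least solution is $x(n) = B \otimes x(n-1)$ with $B = A_0^{*} \otimes A_1$; the star is finite because a live system has no token-free circuit. Entry $B_{vw}$ is then the longest synchronous path delay from gate $w$ to gate $v$, so with $D$ the largest finite entry of $B$ (the maximum synchronous path delay) each firing is at most $D$ after the latest previous firing, and induction gives $t_i(n) \le T + (n-1)D$ with $T$ the latest first firing. The ring is the same hypothetical construction used for the cycle-time sketch.

```python
NEG = float("-inf")  # max-plus "zero": no connection

def mp_mul(A, B):
    """Max-plus matrix product: (A ⊗ B)[i][j] = max_k (A[i][k] + B[k][j])."""
    return [[max(A[i][k] + B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mp_star(A):
    """Kleene star I ⊕ A ⊕ A^2 ⊕ ... ⊕ A^(n-1); exact when A has no circuits."""
    n = len(A)
    S = [[0 if i == j else NEG for j in range(n)] for i in range(n)]
    P = S
    for _ in range(n - 1):
        P = mp_mul(P, A)
        S = [[max(S[i][j], P[i][j]) for j in range(n)] for i in range(n)]
    return S

def transition_matrix(n_gates, signals):
    """B = A0* ⊗ A1, so that x(n) = B ⊗ x(n-1). signals: (u, v, delay, tokens)."""
    A = {m: [[NEG] * n_gates for _ in range(n_gates)] for m in (0, 1)}
    for u, v, d, m in signals:
        A[m][v][u] = max(A[m][v][u], d)   # keep the slowest parallel signal
    return mp_mul(mp_star(A[0]), A[1])

def firing_times(B, count):
    """Earliest times of firings 1..count, taking the reset as firing 0 at t = 0."""
    x, times = [0] * len(B), []
    for _ in range(count):
        x = [max(B[v][w] + x[w] for w in range(len(x))) for v in range(len(B))]
        times.append(x)
    return times

def check_bound(B, times):
    """Check t_v(n) <= T + (n - 1) * D for every gate v and firing n."""
    D = max(b for row in B for b in row if b != NEG)  # max synchronous path delay
    T = max(times[0])                                 # latest first firing
    for n, x in enumerate(times, start=1):
        assert all(t <= T + (n - 1) * D for t in x)
    return D, T

# Hypothetical six-gate ring, 2 time units per signal, with reverse
# acknowledging signals so that the marking is safe.
ring = []
for i, m in enumerate([1, 0, 0, 0, 1, 0]):
    j = (i + 1) % 6
    ring += [(i, j, 2, m), (j, i, 2, 1 - m)]
B = transition_matrix(6, ring)
times = firing_times(B, 20)
print(check_bound(B, times))        # (8, 8)
print([t[4] for t in times[:6]])    # [8, 12, 20, 24, 32, 36]
```

Gate 4 fires at 8, 12, 20, 24, and so on: individual intervals alternate between 4 and 8, so the average interval is 6, below the spacing of 8 that the bound guarantees. This is the kind of averaging effect discussed for Fig. 14 below, although that figure's topology is not reproduced here.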
A nice feature of this bound is that the maximum synchronous path delay is closely associated with performance measures for clocked systems. When a phased logic system is synthesized from a specified clocked system using the algorithm of Section III, paths between sequential gates in the clocked system become synchronous paths in the phased logic system. Also, since feedback signals are only added between gates that are connected by at least one clear path, they cannot introduce synchronous paths longer than the one corresponding to the longest path between sequential gates in the clocked system. Thus, the performance analysis tools used to optimize the length of paths between sequential gates in a clocked system can be applied to phased logic.
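On the clocked side, the corresponding quantity is the longest register-to-register delay, which static timing analysis obtains with a single longest-path pass over the acyclic combinational logic. The Python sketch below illustrates that pass under simple assumptions: one fixed delay per gate, no setup or clock-skew terms, and a hypothetical netlist format of `(driver, receiver)` pairs. It relies on `graphlib`, available in the standard library since Python 3.9.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def longest_register_path(delays, wires, registers):
    """Longest combinational delay from a register output to a register input.

    delays:    dict gate -> propagation delay (clock-to-output, or 0, for registers)
    wires:     list of (driver, receiver) connections
    registers: non-empty set of sequential gates; paths start and end at these
    """
    preds = {g: [] for g in delays}
    for u, v in wires:
        preds[v].append(u)
    # Registers start new paths, so they are sorted as if they had no inputs;
    # a combinational loop makes static_order() raise graphlib.CycleError.
    graph = {g: ([] if g in registers else preds[g]) for g in delays}
    arrival = {}
    for g in TopologicalSorter(graph).static_order():
        start = 0 if g in registers else max((arrival[u] for u in preds[g]), default=0)
        arrival[g] = start + delays[g]
    # A path ends where a register samples its input.
    return max(max((arrival[u] for u in preds[r]), default=0) for r in registers)

# Hypothetical netlist: R1 -> a -> b -> R2 (2 + 3) and R2 -> c -> R1 (1).
delays = {"R1": 0, "R2": 0, "a": 2, "b": 3, "c": 1}
wires = [("R1", "a"), ("a", "b"), ("b", "R2"), ("R2", "c"), ("c", "R1")]
print(longest_register_path(delays, wires, {"R1", "R2"}))  # 5
```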
Note that the average performance of a phased logic system may exceed the worst-case performance bound because of the interaction of synchronous path delays. Recall that the average time between consecutive gate firings is the maximum elementary circuit cycle time. For a given phased logic system, the synchronous path having the longest delay may be combined in elementary directed circuits with synchronous paths having shorter delays. Since each synchronous path in a directed circuit has one synchronous signal, and thus adds one token to the circuit, the maximum synchronous path delay is averaged with the delays of the shorter paths.
An example of this is shown in Fig. 14 where each gate and signal is assumed to have unit delay. Suppose this elementary directed circuit has the maximum cycle time and contains the
longest synchronous path. The worst-case bound of Eq. (2) would only require gate firings to occur before time barriers spaced every eight units, but the average time between gate firings would actually be six units. This averaging effect can also be seen in the synthesis benchmarks. Table 2 shows that the average performance is sometimes 30% faster than the worst-case performance predicted by the maximum synchronous path delay. In a clocked system, one long delay path between sequential gates can force a clock cycle increase. This can lead to extra design effort as an attempt is made to precisely balance all of these paths. However, the averaging effect for phased logic delays could diminish this effort since only rough balances would be needed.
Because of the predictable worst-case performance of phased logic, a system can be a combination of clocked and phased subsystems. In this case, the maximum synchronous path delay must be less than or equal to the clock period. Also, the reset of the clocked subsystem must be inactivated after its first inputs have arrived and after the first-firing constant of Eq. (2) has elapsed. The advantage of such a mixture is that, although the phased logic gate and wire delays have to be considered in the design process, the difficulty of distributing the clock over the phased subsystem is eliminated.
V. Phased Logic Implementation Issues
Other researchers [8, 15] Any of the inputs can be programmed to accept feedback input signals, and the toggling internal gate phase is exposed as a feedback output signal, $s_f$. The regular structure of the RAM design makes it possible to assume that the internal delays of circuits with similar geometries are reasonably matched, and this simplifies the circuitry without sacrificing adherence to the firing rule.
Explicit delay elements meant to time the operation of other components are avoided. To help satisfy the internal constraint of the firing rule, each decoder waits until its inputs match the phase of the gate before generating a minterm output. For example, Fig. 16 shows the row decoder cell that detects a combined logical value of 2 on $s_1$ and $s_2$ (where $s_2$ is considered the most significant bit for the row). Signals po and pe from the control logic indicate the gate phase. When the gate phase is odd, po is high and enables the $ro_2$ circuitry, while pe is low and precharges the $re_2$ circuitry. Under these conditions, $ro_2$ goes high if the combined logical value of $s_1$ and $s_2$ is 2 and both signals are in the odd phase. When the gate phase switches to even, po and pe toggle: the $re_2$ circuitry is enabled while the $ro_2$ circuitry is precharged to prepare for the next odd gate phase. The column decoders operate in a similar manner.

Fig. 17 shows the memory cell used in the RAM array. It is similar to those in two-port static RAMs, except that the two "ports" serve the two phases instead of two simultaneous data requests. By separating the even and odd phases, the signals associated with one phase can be precharged while the stored value is read by the signals associated with the other phase. For example, when the gate phase is odd, circuitry in the multiplexer is precharging the $e_x$ and $en_x$ signals for all columns while the $o_x$ and $on_x$ signals are ready to discharge. When the $ro_x$ signal for the selected row goes high, each cell in the row discharges either the $o_x$ or the $on_x$ signal in its column. The $o_x$ signal is discharged when the cell's stored value is 1, while the $on_x$ signal is discharged when it is 0. The multiplexer then passes the values from the proper column to the control logic, where the actual output signal is generated.
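The decoder's phase-matching behavior can be summarized at the logic level. This Python sketch is a behavioral model only: it ignores precharging and the separate odd and even circuits, and it assumes the common LEDR convention that a signal's phase is the XOR of its value and repeat rails, odd when they differ. The paper's exact rail assignment may differ, and the function names are hypothetical.

```python
def ledr_phase(value_rail, repeat_rail):
    """Phase of a LEDR signal: 1 (odd) when the rails differ, 0 (even) when equal."""
    return value_rail ^ repeat_rail

def row_minterms(s1, s2, gate_phase):
    """Minterm outputs of a 2-to-4 row decoder for one gate phase.

    s1, s2:     LEDR signals as (value_rail, repeat_rail); s2 is the MSB.
    gate_phase: 1 while po is high (odd phase), 0 while pe is high (even phase).
    All outputs stay low until both inputs have reached the gate's phase.
    """
    if ledr_phase(*s1) != gate_phase or ledr_phase(*s2) != gate_phase:
        return [0, 0, 0, 0]           # inputs still in the previous phase: wait
    row = 2 * s2[0] + s1[0]           # combined logical value of s2 s1
    return [int(k == row) for k in range(4)]

# Odd gate phase: s1 = 0 and s2 = 1, both already odd, so minterm 2 (ro_2) is high.
print(row_minterms((0, 1), (1, 0), gate_phase=1))   # [0, 0, 1, 0]
# s2 has not yet switched to the odd phase, so every output is held low.
print(row_minterms((0, 1), (1, 1), gate_phase=1))   # [0, 0, 0, 0]
```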
Note that $ro_x$ and $re_x$ are only used to read the cell. The $w_x$ and $d_x$ signals are write and data lines, respectively, that would be part of a grid covering an entire FPGA composed of these gates. The $w_x$ and $d_x$ signals initialize the configuration data in the RAM cells and would be driven from the periphery of the FPGA using low-frequency, clocked circuitry.
To determine realistic values for the speed and area of the programmable gate, a VLSI layout was done in 2 µm CMOS. Using nominal transistor parameters, the average delay of the gate is 3.6 ns, which is comparable to clocked versions. Nominal transistor parameters are used instead of worst-case parameters because one of the advantages of delay-insensitive systems is their ability to run at typical, and not worst-case, speeds. In the future, global clocks are likely to become performance bottlenecks that may give phased logic an opportunity to gain a speed advantage. Of course, in the short run, the area increase for phased logic may decrease performance.
The programmable gate requires 3.8 times the area of a clocked lookup table. This increase in gate area is comparable to the expected factor-of-four increase in wiring area. A factor of four is expected because the number of horizontal and vertical wiring tracks is roughly doubled due to the dual-rail encoding. In a VLSI system composed of relatively small phased logic primitives, a large amount of dual-rail interconnect is needed. In this case, an overall factor-of-four area increase is about the lowest that can be hoped for, and the programmable gate is compact enough to avoid further expansion. Other researchers have found similar area increase factors in the range of two to six for various circuits using dual-rail encodings [15, 27] .
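The factor of four in the wiring estimate is simple track arithmetic. Assuming, as a simplification, that a routing region's area grows with the product of its horizontal and vertical track counts $N_h$ and $N_v$ (symbols introduced here), doubling both counts for dual-rail signals gives

$$A_{\text{dual-rail}} \propto (2N_h)(2N_v) = 4\,N_h N_v \propto 4\,A_{\text{single-rail}}.$$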
Of course, the area increase could be reduced by using larger macro blocks that encapsulate more complex timing constraints inside their boundaries while still behaving externally like a phased logic gate. The greater use of timing assumptions inside the larger blocks would lower their area overhead compared to clocked versions, and the amount of dual-rail wiring between blocks would be reduced. Candidate blocks would include datapaths and memories. The choice of block size is an economic tradeoff between the increased time needed to design the more complex blocks and the increased area required to wire more primitive gates.
Power is a potential problem in a phased logic system due to the increase in switching activity. During an evaluation of the synchronous functions, one subsignal in every LEDR signal undergoes a transition. Thus, the probability of a transition associated with a phased logic signal is 1.0. In a clocked microprocessor, the switching activity ranges from only 0.09 [28] at control nodes to 0.5 [29] at data nodes. However, in addition to this activity, the entire clock network must be charged and discharged every cycle, and it can consume half the power of a chip, as much as all the logic circuitry combined. Significant power is also wasted in clocked circuitry due to spurious transitions caused by logic hazards [30]. More research is needed to determine how the various factors influence the final power usage, but it seems reasonable to say that the power dissipated by a phased logic system will be commensurate with its increase in circuitry.
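The claim that every LEDR signal makes exactly one transition per evaluation can be checked directly. This Python sketch assumes a common LEDR convention (phase equals value rail XOR repeat rail, reset in the even phase with both rails low); the convention, function names, and random test data are assumptions, not taken from the paper. It compares the per-phase rail activity with that of a single wire carrying the same bits.

```python
import random

def ledr_encode(bits):
    """LEDR codewords (value_rail, repeat_rail), one bit per phase.

    Assumed convention: reset is (0, 0) in the even phase, phases then alternate
    odd, even, ..., and the repeat rail is chosen so that value XOR repeat
    equals the phase (1 = odd).
    """
    return [(b, b ^ (step % 2)) for step, b in enumerate(bits, start=1)]

def rail_transitions(codewords, reset=(0, 0)):
    """Number of rails that toggle at each phase change."""
    counts, prev = [], reset
    for cur in codewords:
        counts.append(int(prev[0] != cur[0]) + int(prev[1] != cur[1]))
        prev = cur
    return counts

rng = random.Random(1)
bits = [rng.randint(0, 1) for _ in range(1000)]
ledr = rail_transitions(ledr_encode(bits))
single = [int(a != b) for a, b in zip([0] + bits, bits)]  # one wire, same data
print(sum(ledr) / len(ledr))          # 1.0: exactly one rail toggles per phase
print(sum(single) / len(single))      # data-dependent activity of a single wire
```

Even a constant input stream toggles one rail per phase, since the repeat rail must change whenever the value does not, which is why the LEDR activity is 1.0 regardless of the data.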
VI. Conclusion
Phased logic seeks to combine elements of both the clocked and asynchronous approaches. Like the clocked approach, phased logic supports the synchronous design paradigm.
This allows designers to build on their experience and continue to use familiar CAD tools. Like asynchronous approaches, phased logic eliminates global clocks. The integration of the synchronous design paradigm with delay-insensitivity helps restore the separation between the logical specification and the physical implementation of a digital system. This has the potential to simplify the design process, but this simplification comes at the cost of increased circuitry. Extra circuitry can result in greater component cost because of its area and power requirements, but digital designers are often ready to trade component cost for a shorter, more flexible design process.
For example, standard cells are usually chosen for ASIC design over a fully custom approach.
Both methodologies require the same manufacturing fixed cost, but standard cells have a significantly shorter design cycle, and a short design cycle is becoming increasingly important to the success of products. Its characteristics make phased logic suited for rapid prototyping, emulation of complex computer systems, and custom computing machines.
For future research, phased logic could be extended to support more complex cyclic behavior. For example, freezing one part of a phased logic system while other parts do computations would save power. A performance improvement is also possible with this approach since a long synchronous path could be frozen while the rest of the system cycles at a faster rate. A more efficient synthesis algorithm and a greater variety of phased logic gate implementations are additional areas for future research.
