Abstract-A key element (one is tempted to say the heart) of most digital systems is the clock. Its period determines the rate at which data are processed, and so should be made as small as possible, consistent with reliable operation.
I. INTRODUCTION
VIRTUALLY all contemporary computers and other digital systems rely on clock pulses to control the execution of sequential functions. A number of different general schemes are used, along with several different types of flip-flops or similar storage elements. Despite the deceptively simple outward appearance of the clocking system, it is often a source of considerable trouble in actual systems. The number of parameters involved, particularly in 2-phase systems, is large, and a close analysis reveals a surprising degree of conceptual complexity.
If one is not particularly interested in maximizing performance, then a 2-phase system with nonoverlapping clocks, or a 1-phase system with edge-triggered FF's, is not difficult to design. However, if minimizing the clock period is a prime issue, then the problem becomes far more complex. In that case, significant performance gains are possible by carefully choosing the clocking parameters (period, pulse-widths, overlap), and further gains may be achieved by using well-designed latches.
In this study we analyze these types of systems in a manner that makes possible intelligent tradeoffs between speed maximization (period minimization) and the difficulty of satisfying constraints on the logic path delays. We begin with discussions of the state devices considered, the nature of imprecision in clock-pulse generation and distribution systems, logic block delays, and the design goals. We then analyze the simple case of the 1-phase system using edge-triggered FF's. After this warm-up, we proceed to treat the 1-phase system using latches, a considerably more complicated case. An extension of the methodology used in that section is then applied to the case of 2-phase systems using latches. Some overall conclusions are then presented in the final section.
A. State Devices and Their Parameters
The state devices (or storage elements) treated here are:
The latch [2], [6], [1] (sometimes referred to as the polarity hold latch). This is a device with inputs C and D, and output Q (often Q', the complement of Q, is also generated), such that, ideally, while C = 0, Q remains constant (regardless of the value of D), and while C = 1, Q = D, changing whenever D changes (see Fig. 1). (For real latches, as is explained below, there are nonzero delays in the response times, and there must be constraints on the behavior of the inputs.) The C and D inputs are usually referred to as the clock and data inputs, respectively. Although it is not, in general, necessary to do so, in the applications treated here, the system clock signals are indeed fed to the C inputs of the latches. A variety of implementations of latches are known, differing in such factors as suitability for various technologies, load driving ability, and relative values of the parameters to be discussed subsequently. Latches with logic hazards have been used in some systems. In order to eliminate the possibility of malfunction due to those hazards, the complement of the C signal is distributed independently to the latches with its edges carefully controlled relative to the corresponding edges of the C signals. We do not discuss such systems here, where it is assumed that the latches are free of hazards.
The edge-triggered D-flip-flop (ETDFF) [2], [6] has the same inputs and outputs as the latch, but Q responds to changes in D only on one edge of the C pulse (see Fig. 2). That is, Q can change only at the time that C changes from 0 to 1 (the rising edge of the C signal), and then only if necessary to assume the same value that D has at that time. (There are also ETDFF's that change state on the negative-going edge of the C signal. Furthermore, it is possible to build a double-edge-triggered D-FF [9] that will respond on both edges of the C pulse.)
1) Latch Parameters: The significant parameters for latches are listed below, with rough definitions (illustrated in Fig. 3). These definitions are then refined to take into account dependencies that exist among the parameters.
CWm: Minimum clock-pulse width, the minimum width of the clock pulse such that the latch will operate properly even under worst case conditions, and such that widening the C pulse further by making its leading edge occur earlier will not affect the values of DDQ, U, or H, as defined below.
DCQ: Propagation delay from the C terminal to the Q terminal, assuming that the D signal has been set early enough relative to the leading edge of the C pulse.
DDQ: Propagation delay from the D terminal to the Q terminal, assuming that the C signal has been turned on early enough relative to the D change.
U: The setup time, the minimum time between a D change and the trailing edge of the C pulse such that, even under worst case conditions, the Q output will be guaranteed to change so as to become equal to the new D value, assuming that the C pulse is sufficiently wide.
H: The hold time, the minimum time that the D signal must be held constant after the trailing edge of the C signal so that, even under worst case conditions, and assuming that the most recent D change occurred no later than U prior to the trailing edge of C, the Q output will remain stable after the end of the clock pulse. (It is not unusual for the value of this parameter to be negative.)
Note that DDQ, for example, may vary significantly depending on whether the latch output is being changed from 0 to 1 or vice versa. A similar situation exists for DCQ. Where appropriate it is useful to add subscripts R or F to these parameters to distinguish between the rising and falling output cases. This will not be done here. Instead, we shall confine ourselves to using overall maximum and minimum values, as indicated below.
Appending an M or m to the subscript of DDQ or DCQ denotes the maximum or minimum value, respectively, of that parameter. These are the extremes with respect to variations in the parameters of the components from which the latches are constructed, the directions of signal changes, and the destinations (Q or Q') of the signals.
In the definition of DCQ, it is assumed that D has assumed its proper value early enough. We can make this concept more precise by requiring that the change in D occurs sufficiently early so that making it appear any earlier would have no effect on when Q changes. For any real latch it is always possible to define such an interval. Similarly, when defining DDQ, it is assumed that the leading edge of C appears sufficiently early so that turning C on any earlier would not make Q change any sooner. Again this is possible for any real latch. Now we state an important postulate regarding propagation delays:
Suppose that C goes on at time tC, and that D changes, making D different from Q, at time tD. Then we postulate that the time tQ at which Q changes is, at the latest:
tQ = max(tC + DCQM, tD + DDQM). (1)
Although for some latches there are higher order effects, depending on the technology, that may cause tQ to be larger when the difference between the arguments of the max is small, the error is small enough to justify our postulate for most practical purposes. Refining the model to take such effects into account is left for further research.
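The postulate can be illustrated with a short sketch. It assumes the postulate takes the form tQ = max(tC + DCQM, tD + DDQM), which is inferred from the parameter definitions and from the reference to "the arguments of the max"; the class and function names are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatchDelays:
    """Worst-case (maximum) latch propagation delays; names are illustrative."""
    dcq_max: float  # DCQM: maximum delay from the C terminal to Q
    ddq_max: float  # DDQM: maximum delay from the D terminal to Q


def latest_q_change(t_c: float, t_d: float, latch: LatchDelays) -> float:
    """Latest time at which Q changes, under the assumed form of postulate (1).

    t_c is the time at which C goes on; t_d is the time at which D changes.
    The output cannot settle before both the clock path and the data path
    have had their worst-case delay, hence the max of the two candidates.
    """
    return max(t_c + latch.dcq_max, t_d + latch.ddq_max)


latch = LatchDelays(dcq_max=3.0, ddq_max=2.0)
# D changes early: the clock path dominates.
early_d = latest_q_change(t_c=0.0, t_d=0.5, latch=latch)
# D changes late: the data path dominates.
late_d = latest_q_change(t_c=0.0, t_d=4.0, latch=latch)
```

The higher order effects mentioned above, which arise when the two arguments of the max are nearly equal, are deliberately not modeled here.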
A related assumption about latch behavior is that, provided that the setup, hold-time, and minimum pulse-width constraints are observed, the propagation delay will not be affected by the clock-pulse going off before the output changes in response to a D change. An examination of a variety of latch designs appears to justify this assumption.
There are other possibilities for refining our results, by using more complex definitions of latch parameters. If we define the actual interval between the occurrence of a D change and the trailing edge of C as u (note that proper operation requires that u ≥ U), then, for many latch designs it will be found that the hold time H is, over some range of values of u, a decreasing function of u. There are also possibilities for reducing the clock-pulse width below CWm (within limits), usually at a cost of increasing propagation delays and/or setup and hold times. For the sake of making the analysis more tractable, we shall not consider these alternatives, but instead shall assume that there is a fixed, consistent set of latch parameters, as described above.
In summary, we assume that the minimum clock-pulse width is large enough so that further increases cannot reduce any of the other latch parameters, that U is minimal, that H is minimal given U, and that the postulate stated above regarding propagation delays is valid.
2) Edge-Triggered-D-FF Parameters: The significant parameters for an ETDFF are defined below (see also Fig. 4) .
U: The setup time, the minimum time that the D signal must be stable prior to the triggering edge of the C pulse.
H: The hold time, the minimum time that the D signal must be held constant after the triggering edge of the C pulse. (The value of H may be 0 or even negative for some ETDFF's.)
CWm: Minimum clock-pulse width, the minimum width of the clock pulse such that the ETDFF will operate properly even under worst case conditions.
DCQ: Propagation delay from the C terminal to the Q terminal, assuming that the D signal has been set up sufficiently far in advance as specified by the setup time constraint.
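The setup and hold definitions for an ETDFF amount to a forbidden window around the triggering edge in which D must not change. The following is a minimal sketch of that reading; the function name and the convention that a change exactly on a window boundary is acceptable are assumptions made for illustration.

```python
def d_change_is_safe(t_change: float, t_edge: float,
                     setup: float, hold: float) -> bool:
    """Check one D change against the setup/hold window of an ETDFF.

    D must be stable from (t_edge - U) until (t_edge + H), so a change is
    acceptable only if it occurs at or before the start of that window or
    at or after its end. H may be zero or negative, as noted in the text.
    """
    return t_change <= t_edge - setup or t_change >= t_edge + hold


# Triggering edge at t = 10 with U = 1 and H = 0.5 (arbitrary units).
ok_before = d_change_is_safe(9.0, 10.0, setup=1.0, hold=0.5)    # at the setup boundary
bad_setup = d_change_is_safe(9.5, 10.0, setup=1.0, hold=0.5)    # inside the setup window
bad_hold = d_change_is_safe(10.2, 10.0, setup=1.0, hold=0.5)    # inside the hold window
ok_after = d_change_is_safe(10.5, 10.0, setup=1.0, hold=0.5)    # at the hold boundary
```

This check treats the edge time as known exactly; in the worst-case analysis the edge tolerances widen the window accordingly.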
B. Clock-Pulse Edge Deviation
In any real-world system there are limits to the precision with which events can be timed. Our concern here is with synchronous systems with clock-pulses distributed to a multitude of devices for the purpose of coordinating events. The intent is to have certain clock-pulse edges occur simultaneously at all devices (in some cases fixed displacements may be specified for corresponding signals at different devices). In designing clocking schemes, it is necessary to take into account the extent to which this goal cannot be fully attained.
The approach taken here is to assume that, at each significant clock-pulse edge, there is a specified tolerance range, within which we can assume the errors will be confined. This is, essentially, a "worst case" approach. No attempt will be made to exploit statistical information that could make possible more precise estimates of errors, nor will any effort be made to consider the effects of correlations between errors or between delays.
The most elaborate situation that we deal with is that of 2-phase systems using latches as storage elements. Here both the leading and trailing edges of both clock-pulses are of interest (although the analysis makes it clear that certain edges are more significant than others). We define tolerances for all 4 edges, designating them as T1L, T1T, T2L, and T2T, corresponding to the leading and trailing edges of C1 and C2, respectively. Assume that, for example (see Fig. 13), the leading edge of the C1 pulse for some period would have arrived at every latch at time t (which we refer to as its nominal arrival time) if there were no inaccuracies in timing. Then, in the actual system, this edge is received at every latch somewhere in the time interval (t - T1L, t + T1L). Corresponding assumptions of course apply for the other three edges. Our goal is to design our systems so that if this assumption, and corresponding assumptions about the precision of the components used, are valid, then there will be no failures due to timing, even if some malicious demon is, in each case, permitted to choose the extreme deviations most likely to cause trouble. Of course in 1-phase systems we need only define two edge tolerances, TL and TT.
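As a small illustration of the worst-case approach, the sketch below computes the arrival interval of an edge and the narrowest clock pulse the "malicious demon" could produce in a 1-phase system. The function names are hypothetical, and placing the nominal leading edge at t = 0 and the nominal trailing edge at t = W is an assumption made for the example.

```python
def edge_window(nominal: float, tolerance: float) -> tuple[float, float]:
    """Interval within which an edge with the given nominal time may arrive."""
    return (nominal - tolerance, nominal + tolerance)


def worst_case_width(width: float, tol_leading: float, tol_trailing: float) -> float:
    """Narrowest pulse seen at any device for nominal width W.

    The worst case has the leading edge as late as possible and the
    trailing edge as early as possible, giving W - TL - TT.
    """
    _, lead_late = edge_window(0.0, tol_leading)
    trail_early, _ = edge_window(width, tol_trailing)
    return trail_early - lead_late


window = edge_window(5.0, 1.0)
narrowest = worst_case_width(width=10.0, tol_leading=1.0, tol_trailing=0.5)
```

Requiring the narrowest pulse to be at least the minimum clock-pulse width of the storage device leads directly to a lower bound on W.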
We are lumping together in these edge tolerances all sources of imprecision in clock timing and distribution. These are principally in the circuits used to determine the clock-pulse widths, often called "shapers," and in the networks used to distribute the pulses to the individual latches (or other similar devices). This latter factor is generally referred to as clock-pulse skew. In the case of 2-phase systems, it is also necessary to consider the circuits that determine the phase relationship between the C1 and C2 clocks.
Relative to other sources of error, the precision with which the clock frequency can be maintained, at least in high-performance systems, is so great (due to the use of crystal controlled oscillators) that we can safely neglect this factor. (If this assumption is not justified in any particular case, it is not difficult to introduce a tolerance factor on the clock period, which can be superposed on our basic results.)
By representing all of the timing deviations in terms of the edge tolerances, we simplify our analysis, making it easier to treat, as a separate issue, the mechanisms whereby precision is lost.
The precision with which clock-pulse widths can be controlled is generally a function of how precisely delay elements can be specified. The same factor usually is involved in controlling the phase between the C1 and C2 pulses of a 2-phase system. The ratio of 2 delays on the same chip can be specified with much greater precision than is the case for delays on different chips. Usually one edge of the output of a shaper can be controlled more precisely than the other. In the 2-phase case, there are techniques for minimizing the edge tolerances for particular pairs of edges. As is shown in the sequel, T2L and T1T are usually more significant. They should therefore be kept smaller, relative to the other two edge tolerances.
Several factors contribute to clock-pulse skew. Despite all efforts to equalize conduction path lengths between the clock source and each clock-pulse "consumer," differences inevitably occur in both off-chip wiring and in paths on chips. Since it is usually necessary to provide amplifiers in the distribution paths, variations in the delays encountered in such devices along different paths produce significant amounts of skew.
Another contribution to skew results from the fact that pulse edges are never vertical as shown in our ...
II. OPTIMUM PARAMETERS FOR 1-PHASE CLOCKING WITH ETDFF'S
For 1-phase systems using ETDFF's, the clocking parameters to be determined (see Fig. 4) are the period P and the clock-pulse width W. A block diagram of the systems under consideration is shown as Fig. 5.
We develop a set of constraints, such that if all are satisfied, and if the D signals arrive on time for the first cycle, then they will also arrive on time for the next cycle and will remain stable long enough to ensure that the FF's react properly. By induction, it follows that, for all succeeding cycles, the FF inputs are also stable over the appropriate intervals, so that the system will behave according to specifications.
For any clock-pulse period, proper operation requires that the D signals become stable at least U prior to the earliest possible occurrence of the triggering edge. (It is assumed here that this is the positive-going edge. Precisely the same arguments apply where the triggering edge is negative-going or even if the FF's trigger on both edges.) If we assume that t = 0 coincides with the nominal time of the leading edge of the current clock pulse, then the earliest possible occurrence time ...
Simplifying and rearranging terms yields the basic constraint that defines DLmB, the lower bound on the short-path delays:
In addition to constraints (4) and (6) on the period and short-path delays, it is necessary to add a third constraint to ensure that the minimum pulse-width specification for the FF's is satisfied. Since, under worst case assumptions, skew might make the leading edge late and the trailing edge early, the minimum width specification for the clock pulses is
W ≥ TL + TT + CWm. (7)
The procedure for choosing optimum clocking parameters for 1-phase systems using ETDFF's is usually very straightforward. We simply set W at any convenient value satisfying constraint (7) and set P to satisfy constraint (4) with equality. In most cases, it will be found that the constraint on the short-path bound given by (6) is not difficult to meet. In the unlikely event that this is not the case, it may be necessary to insert delay pads at the outputs of the FF's.
III. OPTIMUM PARAMETERS FOR 1-PHASE CLOCKING WITH LATCHES
... Since the D signal must arrive at least U prior to this edge, we have for the latest permissible arrival time for D, tDLArr:
tDLArr ≤ W - TT - U. (8)
Assume now that the above constraint is satisfied for the first clock cycle.
A. Preventing Late Arrivals of D Signals
The latest (under worst case conditions) arrival time of D signals for the next cycle is designated as tDLArrN. The maximum permitted value of tDLArrN is found by simply adding P to the right side of (8):
tDLArrN ≤ W - TT - U + P (9)
(see Fig. 9(a)). ... using postulate (1) for determining the latest time at which ... (The discussions pertaining to the left and right parts, respectively, of the max expression are illustrated by Fig. 9(b) ...
Combining (9) ... (This discussion is illustrated by Fig. 10(a).) ... (The discussion involving the left part of the max is illustrated in Fig. 10(b).) ...
DLm > DLmB = TL + TT + H + W - DCQm. (17)
The above expression gives us the lower bound DLmB on the short-path delay. Satisfying this bound is necessary and sufficient to ensure against the premature arrival of a D signal.
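The short-path bound for 1-phase systems with latches, DLmB = TL + TT + H + W - DCQm, can be evaluated directly. The sketch below is only an illustration with hypothetical function and argument names and arbitrary numbers; it also shows that widening the clock pulse raises the bound one-for-one.

```python
def short_path_lower_bound(tl: float, tt: float, hold: float,
                           width: float, dcq_min: float) -> float:
    """DLmB = TL + TT + H + W - DCQm for a 1-phase system using latches."""
    return tl + tt + hold + width - dcq_min


def short_path_ok(dl_min: float, bound: float) -> bool:
    """The minimum logic delay DLm must exceed the bound DLmB."""
    return dl_min > bound


bound_narrow = short_path_lower_bound(tl=1.0, tt=1.0, hold=0.5, width=6.0, dcq_min=2.0)
bound_wide = short_path_lower_bound(tl=1.0, tt=1.0, hold=0.5, width=8.0, dcq_min=2.0)
# A logic block whose fastest path takes 7 units passes with the narrow
# pulse but fails once the pulse is widened by 2 units.
passes_narrow = short_path_ok(7.0, bound_narrow)
passes_wide = short_path_ok(7.0, bound_wide)
```

Because W enters the bound with a positive sign, there is a direct tradeoff between wide clock pulses and the difficulty of meeting the short-path requirement.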
C. Consequences of the Constraints
The basic constraints derived in the previous subsections are reproduced below.
DLm > DLmB = TL + TT + H + W - DCQm. (17)
To these we must add one more to ensure that, even under worst case conditions, the clock-pulse width at any latch input meets the minimum clock-pulse width specifications of the latches. This is:
W ≥ TL + TT + CWm.
W in (11) cannot usefully be increased beyond the point where the right side of (11) would, if equality held, violate (12), which of course also represents a lower bound on P. Note that it is undesirable to increase W gratuitously, since this would, as indicated by (17), raise the lower bound on the short-path delays. To find the maximum useful value of W, treat (11) and (12) ...
If the value of the lower bound on the short-path delays given by the above relation is attainable, then the minimum P value of (12) ...
The relations developed here are the basis for the optimization procedure of the next subsection. First, however, we must consider a possible variation of the development thus far.
The initial assumption in the discussion of 1-phase systems was that the D signals must appear at latch inputs no later than U prior to the trailing edges of the clock pulses. In what followed, this constraint was consistently observed. But what if we had made a stronger assumption, i.e., that the D changes must appear even earlier, say at U + r (r > 0) prior to the trailing edges of the clock pulses? Is it possible that there might be some advantages to this?
The key to analyzing this question is to observe that the proposal is exactly equivalent to assuming a larger value of the setup time U. The effect of this can be determined by looking at those constraints and derived relations that involve U, namely (11), (19), (20), and (21). The value of DLmB necessary to achieve the minimum P increases with U. So does the minimum value of P for any value of DLmB in the range for which (21) ...
Other procedures based on the constraints developed here may be useful under special circumstances.
IV. OPTIMUM PARAMETERS FOR 2-PHASE CLOCKING WITH LATCHES
Fig. 12 is a general block diagram of the 2-phase clocked systems treated here. Clock signals (shown in Fig. 13) go directly to the C inputs of the latches. Facilities for scan-in and scan-out are not included as they do not affect the basic arguments.
The strategy to be followed is based on the assumption that if the D inputs to all of the latches are valid in the intervals specified by the U and H parameters, then the system will operate as specified. A set of constraints will be ...
Let tC1LL be the latest arrival time of the leading edge of a C1 pulse. Then, recalling (1) about latch propagation delays, the latest time when the output of an L1 latch changes (an alternate description of tD2LArr) is as follows (the left side of the max is illustrated by Fig. 14(c) and the right side by Fig. 14(b)) ...
(... Fig. 15(a).) The upper bound on the latest arrival time tD1LArrN of a D1 signal during the next cycle is obtained from (23), which gives the latest permissible arrival time for the first cycle, by simply adding the period P to the right side. This gives us
tD1LArrN ≤ P + V - U1 - T1T.
Now consider how long it might take a signal to get through an L1 latch, through the following L2 latch, and through the logic to reach an L1 latch input in time for the next C1 pulse.
(See Fig. 12.) In terms of the latest arrival time at an L2 input tD2LArr and the latest possible occurrence of a C2 leading edge tC2LL, (1) ...
There are three factors restricting the propagation of signals through the two latches: propagation through the D inputs of both L1 and L2 latches, propagation from the C inputs of the L1 latches (involving the location of the C1 leading edge) through the D inputs of L2 latches, and propagation from the C inputs of the L2 latches (involving the location of the C2 leading edge). These are all accounted for in the above expression. They are illustrated in Fig. 15(b), (c), and (d) ...
(... Fig. 16(a).) The earliest arrival time tD1EArrN of such "short-path" signals for the next cycle must be later than H1 after the latest possible occurrence of a C1 trailing edge; that is
tD1EArrN > V + T1T + H1.
(34)
The earliest time that a D1 signal can change as a result of signal changes generated during the same clock period getting around the loop is arrived at analogously to the way (29) was produced; the same three categories of constraints must be considered. Now, however, since we seek the minimum delays, we use minimum values for the delays within the max expressions, and the earliest times for the critical clock-pulse edges.
With tC2EL as the earliest occurrence time of a C2 pulse leading edge, and with tD2EArr as the earliest arrival time of a D2 input change, postulate (1) indicates that the earliest output from an L2 latch can occur at tQ2E, given by 
(The first 2 parts of the max are illustrated in Fig. 16(b) and (c), respectively.) Now we show that, for a system that operates properly even under worst case conditions, (34) is valid if, and only if, it is valid when the value used for tD1EArrN is that of (37) with the third part of the max deleted. The "if" part of this assertion is obviously true.
To prove necessity (the "only if" part), let us assume the contrary, namely that (34) is valid and that neither of the first 2 parts of the max of (37) exceeds the right side of (34).
Then, since tD1EArrN must satisfy (34), it follows that the third part of the max must do so. Therefore, it must exceed each of the first two parts, both of which can therefore be deleted from (37), reducing it to
tD1EArrN = tD1EArr + D1DQm + D2DQm + DLm. (38)
But, from (31) it is clear that
P > D1DQm + D2DQm + DLm.
UNGER AND TAN: CLOCKING SCHEMES FOR HIGH-SPEED DIGITAL SYSTEMS
Adding tD1EArr to both sides gives us
tD1EArr + P > tD1EArr + D1DQm + D2DQm + DLm.
From the above and from (38) we have
tD1EArrN < tD1EArr + P.
But this means that, for each cycle (in the worst case), D1 arrives earlier and earlier relative to the trailing edge of C1. Therefore, even if tD1EArr is comfortably above the minimum for the first cycle, it will eventually violate the hold-time constraint, so that the system would not operate properly. Hence, by contradiction, we have completed our argument.
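The drift used in this argument can be made concrete with a small simulation. This is only a sketch under stated assumptions: arrival times are measured relative to the start of the cycle in which they occur, the loop delay stands for D1DQm + D2DQm + DLm, and the hold limit stands for the right side of (34). All names and numbers are hypothetical.

```python
def hold_margins(first_arrival: int, loop_min_delay: int, period: int,
                 hold_limit: int, cycles: int) -> list[int]:
    """Margin (arrival minus hold limit) of the earliest D1 change per cycle.

    If only the third part of the max governs, each new arrival occurs
    loop_min_delay after the previous one. Measured against a clock that
    advances by `period` each cycle, the arrival shifts by
    (loop_min_delay - period), which is negative whenever P exceeds the
    minimum loop delay.
    """
    margins = []
    arrival = first_arrival
    for _ in range(cycles):
        margins.append(arrival - hold_limit)
        arrival += loop_min_delay - period
    return margins


margins = hold_margins(first_arrival=5, loop_min_delay=8, period=10,
                       hold_limit=2, cycles=4)
# Index of the first cycle whose margin is no longer positive.
first_violation = next(i for i, m in enumerate(margins) if m <= 0)
```

However comfortable the initial margin, it shrinks by a fixed amount every cycle, so a violation is only a matter of time, which is the contradiction the proof relies on.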
Thus, we can replace tD1EArrN in (34) with the right side of (37), omitting the third part of the max (and factoring out ...
While it is conceivable that a system might exist for which the right side of (40) is less than the right side of (39), an examination of the 2 expressions suggests that this is very unlikely. Hence, in most cases it is constraint (39) that should be relied upon.
D. Premature Changes of D2 Signals
Now consider how to ensure that the D2 signals, once on, remain stable long enough for proper operation, i.e., that the hold-time constraints for the L2 latches are satisfied. It is necessary to ensure that tD2EArrN, the time of the earliest change in a D2 signal resulting from a signal passed by the next C1 pulse, satisfies the following relation, where tC2LT is the latest occurrence time of the trailing edge of C2:
tD2EArrN > tC2LT + H2.
The latest appearance of the trailing edge of C2, tC2LT, occurs at W2 + T2T. (Refer now to Fig. 17(a).) ...
The left and right parts of the max of (44) are illustrated in Fig. 17(b) and (c), respectively. Relation (44) can be expressed as the following pair of relations, at least one of which must be satisfied:
These may be more conveniently expressed, respectively, 
They constitute necessary and (along with the other constraints developed above) sufficient conditions for ensuring that the inputs to the L2 latches will remain on for a sufficiently long time relative to the trailing edges of the C2 pulses. Under most circumstances, it would appear that (46) is much more likely to be satisfied than is (45).
Our objective is to choose the clock parameters (widths, period, and overlap) so as to maximize the speed of the system (clearly this is achieved when the period P is minimized), while making it as insensitive as possible to parameter variations. That is, we would like to make the tolerances as large as possible. We often start out with a desired value for the maximum logic delay DLM in a logic path (the long-path delay) as this is largely determined by the given technology and the desired maximum number of stages of logic. The crucial factor determining feasibility with known tolerances for delay per logic stage is then the minimum delay in a logic path DLm or short-path delay. If the required lower bound on the short-path delay is too large compared to the long-path delay, then the system may be difficult or impossible to realize reliably.
We therefore define the problem as that of finding the minimum value of P such that the lower bound on the short-path delay (DLmB) is acceptable (not too large). It is assumed that we are given all of the latch parameters, the clock-pulse edge tolerances, and the long-path delay DLM.
The key constraint on DLm is almost always (39). Hence, we set DLmB equal to the right side of that constraint and solve for V:
V = DLmB - H1 - T1T - T2L + D2CQm. (52)
Now substitute the above right side for V in (33), which is the key constraint on P, to obtain an expression for the minimum value of P as a function of the short-path delay:
P = H1 + U1 + D2CQM - D2CQm + DLM - DLmB + 2(T1T + T2L).
(53)
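The tradeoff expressed by (52) and (53) can be explored numerically. In the sketch below, the overlap and period formulas follow (52) and (53) as given in the text; the floor on P is an assumed form of (31), taken from the statement in the conclusions that the period can be reduced to the sum of the maximum propagation delays through the L1 and L2 latches (from the D inputs) and the logic. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoPhaseParams:
    h1: int        # H1: hold time of the L1 latches
    u1: int        # U1: setup time of the L1 latches
    d1dq_max: int  # D1DQM
    d2dq_max: int  # D2DQM
    d2cq_max: int  # D2CQM
    d2cq_min: int  # D2CQm
    dl_max: int    # DLM: long-path logic delay
    t1t: int       # T1T: tolerance of the C1 trailing edge
    t2l: int       # T2L: tolerance of the C2 leading edge


def overlap_for_bound(p: TwoPhaseParams, dlmb: int) -> int:
    """Overlap V corresponding to a chosen short-path bound, per (52)."""
    return dlmb - p.h1 - p.t1t - p.t2l + p.d2cq_min


def period_from_bound(p: TwoPhaseParams, dlmb: int) -> int:
    """Minimum P as a function of DLmB, per (53)."""
    return (p.h1 + p.u1 + p.d2cq_max - p.d2cq_min + p.dl_max
            - dlmb + 2 * (p.t1t + p.t2l))


def absolute_min_period(p: TwoPhaseParams) -> int:
    """Assumed form of (31): P cannot go below D1DQM + D2DQM + DLM."""
    return p.d1dq_max + p.d2dq_max + p.dl_max


def min_period(p: TwoPhaseParams, dlmb: int) -> int:
    """(53) is valid only while it does not violate the floor of (31)."""
    return max(period_from_bound(p, dlmb), absolute_min_period(p))


params = TwoPhaseParams(h1=1, u1=2, d1dq_max=3, d2dq_max=3, d2cq_max=4,
                        d2cq_min=2, dl_max=20, t1t=1, t2l=1)
period_small_bound = min_period(params, dlmb=2)  # governed by (53)
period_large_bound = min_period(params, dlmb=5)  # clamped by the assumed (31)
overlap_small_bound = overlap_for_bound(params, dlmb=2)
```

Raising the acceptable short-path bound shortens the period one-for-one until the floor is reached, after which a larger bound buys nothing.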
This expression is valid provided that the value of P obtained does not violate (31). Thus, to find the maximum value of DLmB beyond which no further reductions in P are possible, we must first find the maximum value of V for which (33) is valid (i.e., the value for which (31) is not violated). We do this by substituting the right side of (31) for P in (33) and, treating the resulting expression as an equality, solving for V:
There is clearly nothing to be gained by making the overlap any larger than the value given in (54), since the effect would be to increase the lower bound on the short-path delay without reducing P beyond the absolute minimum given by (31).
...
First observe that U1 appears in (26), (32), and (33), as well as in (54) for the maximum useful overlap, in (55) for the value of DLmB corresponding to the absolute minimum bound on P, and in (53) for the minimum value of P as a function of the lower bound on the short-path delay. The direct effects of increasing U1 are detrimental in all cases except that corresponding to (26). That is, the period would have to be increased and/or DLmB would have to be increased (various tradeoffs are possible), both of which are bad, but the lower bound on the width of the C2 pulse would be relaxed, a benefit, but seldom one that is needed.
The U2 term appears only in (26) and (27), and in (49) for the end of the unstable period for the outputs of L2. In the first two cases it tightens (by increasing) the lower bounds on the pulse widths, which is mildly bad, and in the last case it increases the interval during which the Q2 signals are stable, which might conceivably be advantageous in some situation.
It therefore does not seem useful to consider requiring the D inputs to the latches to arrive earlier than necessary, unless a very special circumstance should make important one of the factors discussed above. An interesting and perhaps useful added conclusion from the above discussion is that the setup time for the L2 latches is of less importance with respect to speed and tolerances than is the setup time for the L1 latches.
G. Computing Optimum Clock Parameters
Let Dmax LmB be the largest lower bound that we can enforce on the short-path delays. To ...
The procedure given above is intended as a general guide to the use of the constraints developed here. In particular cases alternative procedures may be more appropriate.
V. CONCLUSIONS
As is evident from the length of the corresponding section, the task of determining optimum clocking parameters for systems using ETDFF's is relatively simple. The clock-pulse width is not critical, and the constraint on the short-path delays is seldom stringent. The price paid for this is that the minimum clock period is the sum, not only of the maximum delays through the logic and the FF's, but also of the setup time and twice the edge tolerance. No tradeoffs are possible to reduce this quantity.
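The ETDFF case is simple enough to state in a few lines. The sketch below assumes that the "edge tolerance" in the statement above is the tolerance TL of the triggering (leading) edge and that the delay through the FF's is DCQM; the width bound is the one given as constraint (7). The function names and numbers are hypothetical.

```python
def etdff_min_period(dcq_max: float, dl_max: float,
                     setup: float, tl: float) -> float:
    """Minimum period: max FF delay + max logic delay + setup + 2 * TL."""
    return dcq_max + dl_max + setup + 2 * tl


def etdff_min_width(cw_min: float, tl: float, tt: float) -> float:
    """Minimum nominal pulse width per constraint (7): W >= TL + TT + CWm."""
    return tl + tt + cw_min


period = etdff_min_period(dcq_max=3.0, dl_max=20.0, setup=1.0, tl=1.0)
width = etdff_min_width(cw_min=4.0, tl=1.0, tt=0.5)
```

Every term in the period enters with a positive sign and none can be traded against the short-path bound, which is why no tradeoffs are available in this case.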
For 1-phase systems using latches, it may be possible to make the period as small as the sum of the maximum delays through a latch (from the D input) and the logic. In order to do this, the clock-pulse width must be made sufficiently wide (usually past the point where the leading edge of the clock pulse precedes the appearance of the D signals). Wider clock pulses imply increased values of DLmB, the lower bound on the short-path delays. If this bound is not to become unreasonably high, it is necessary to keep the edge tolerances small. It is also helpful if the difference between the maximum and minimum values of the propagation delays from the C inputs of the latches is small.
The 2-phase system with latches is inherently more complex in that more variables are involved. As ... is reduced to the sum of the maximum propagation delays through the L1 and L2 latches (from the D inputs) and the logic. Again it is possible to absorb the effect of edge tolerances in terms of short-path rather than long-path problems.
An important advantage of 2-phase over 1-phase systems is that, for every 2-phase system, simply by varying the overlap (i.e., the phasing between the C1 and C2 clock pulses), DLmB can be varied continuously from zero to the highest useful value (with the minimum P of course changing in the opposite direction). On the other hand, for 1-phase systems, the range of variation of DLmB possible by varying the clock-pulse width is often much smaller, particularly at the low end. As illustrated in the graph of Fig. 11 ...
With only one latch in each feedback path, the lower limit on the clock period is lower for 1-phase systems, although this factor is somewhat attenuated by the fact that some latches in 1-phase systems will have both inputs from sources that fan out to other latches, and outputs that fan out to many gate inputs. Both of these are factors that reduce speed. But in 2-phase systems each L1 latch feeds only one other device (an L2 latch), and each L2 latch receives its D input from a source (an L1 latch) feeding no other device. Hence, all other things being equal, we would expect the delays through the two latches in the feedback paths of 2-phase systems to be less than twice the delay of the one latch in the feedback path of a 1-phase system. An advantage of 2-phase systems over both of the other types considered here is that they are somewhat more compatible with the LSSD concept for system testing [1], [2].
It appears that all three types of systems have their places. Where there is a willingness to exert great efforts to suppress skew (e.g., by hand-tuning the delays in clock distribution paths), and to control other related factors very precisely, the 1-phase system may be the best choice, as in the case of the CRAY 1 machine. In other cases of high-performance machines, 2-phase clocking may be more suitable. Use of ETDFF's seems to have advantages for less aggressive designs.
The results presented here in such precise-looking relations obviously depend heavily on the precision with which the parameters of those relations can be determined. Realistic figures must be obtained that take into account such matters as power supply and temperature variations, as well as data-sensitive loading considerations.
The relations developed here may be useful in determining what latches to use in certain situations and how to modify latch designs so as to improve system performance. For example, an examination of the constraints developed in Section III-C for 1-phase systems with latches suggests that the minimum value of DDQ is of no importance, whereas the minimum value of DCQ is important in that the larger it is, the less stringent is the constraint on short-path delays.
In the 2-phase case, minimizing (D2CQM - D2CQm) is clearly helpful. It relaxes the requirement on DLmB imposed by (55), which, if it can be satisfied, allows P to be set to the minimum value given by (31).
It is clear from the results developed here that minimizing clock edge tolerances is of considerable importance in high-performance digital systems. In 2-phase systems, a special effort is warranted to minimize T1T and T2L, which appear in several key constraints. Unfortunately, technology trends are such as to emphasize factors that cause skew. For example, as the dimensions of logic elements on chips shrink, the ratio of wiring delays to gate delays grows. A high priority must therefore be given in wiring algorithms to the clock distribution system. Off-chip wiring forming part of the clock distribution network must be carefully controlled. In some cases, the insertion of adjustable delays in these paths may be warranted. It is quite likely that the continuation of the trends that exacerbate the skew problem will soon make it worthwhile to consider systems that do not use clock pulses or that use clock pulses only locally. Discussions of such asynchronous, self-timed, or speed-independent systems are in [4] and [8].
Logic designers and those developing computer aids for logic design customarily pay a great deal of attention to minimizing long-path delays. It is also important to consider techniques for increasing short-path delays. In line with this, there is a need for circuit designers to develop techniques for introducing precisely controlled delay elements where needed. At present, in many technologies, logic designers are forced to cascade inverters to produce delays. This is wasteful in terms of both chip area and power. In general, the idea that greater speed may result from better delay elements should be conveyed to those developing digital technology.
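The cost of padding a short path with cascaded inverters can be estimated with a rough sketch such as the following. The function, its parameters, and the delay figures are hypothetical. It assumes that inverters are added in pairs so that signal polarity is preserved, and that only the minimum inverter delay can be counted on toward the short-path bound. Delays are integer picoseconds.

```python
def inverter_padding(short_path_min, required_min, inv_delay_min, inv_delay_max):
    """Return (inverter_count, added_max_delay) needed to raise a short
    path's minimum delay to at least required_min.

    Assumptions (illustrative): inverters are inserted in pairs to keep
    polarity, and each contributes at least inv_delay_min and at most
    inv_delay_max.
    """
    deficit = required_min - short_path_min
    if deficit <= 0:
        return 0, 0  # the path already meets the short-path bound
    # Ceiling division: number of inverter pairs covering the deficit
    # even when every inverter is as fast as possible.
    pairs = -(-deficit // (2 * inv_delay_min))
    count = 2 * pairs
    # Worst-case delay the padding adds; it must still fit within the
    # long-path budget of the same path.
    return count, count * inv_delay_max


count, added_max = inverter_padding(
    short_path_min=300, required_min=800, inv_delay_min=60, inv_delay_max=100
)
print(count, added_max)
```

The wider the spread between the minimum and maximum inverter delay, the more worst-case delay the padding adds for a given guaranteed minimum, which is one reason precisely controlled delay elements would be preferable.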
Further developments along the lines developed here would include the use of statistical rather than worst-case analyses, which would allow us to choose clocking parameters such that the likelihood of a timing failure is very small, but not zero. This usually implies shorter clocking periods. In using this approach it is important to be able to take into account correlations among delay values, skew, etc., in various parts of the system [5], [7]. It is also possible to speed up systems by exploiting detailed knowledge of the logic paths. There may be, for example, constraints on the sequencing of signals through certain combinations of paths that allow us to consider consecutive pairs, triples, etc., of cycles together and thereby realize that shorter periods are feasible than would be the case if each period were considered separately. Research along this line is being conducted by K. Maling [3].
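The difference between a worst-case and a statistical choice of period can be illustrated with a simple Monte Carlo sketch. Everything in it is an assumption made for illustration: the function names, the delay ranges, the uniform distributions, and above all the independence of the latch and logic delays. The text above notes that correlations matter in practice, and this sketch ignores them.

```python
import random


def worst_case_period(latch_max, logic_max):
    """Period that can never be violated: the sum of the maximum delays."""
    return latch_max + logic_max


def statistical_period(latch_range, logic_range, fail_prob,
                       trials=20000, seed=1):
    """Estimate a period that is exceeded in roughly fail_prob of trials.

    Assumes independent, uniformly distributed delays (illustrative only).
    """
    rng = random.Random(seed)
    totals = sorted(
        rng.uniform(*latch_range) + rng.uniform(*logic_range)
        for _ in range(trials)
    )
    # Pick the sample at the (1 - fail_prob) quantile of the sorted totals.
    index = min(trials - 1, int((1.0 - fail_prob) * trials))
    return totals[index]


# Hypothetical delay ranges in nanoseconds.
worst = worst_case_period(latch_max=3.0, logic_max=20.0)
stat = statistical_period((1.0, 3.0), (10.0, 20.0), fail_prob=0.001)
print(worst, stat)
```

Because every sampled total lies at or below the sum of the maxima, the statistical period can never exceed the worst-case one. How much shorter it is depends entirely on the assumed distributions.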
An earlier presentation of the work discussed here, in a different form with different notation, was issued by the authors several years ago [10], [11]. The idea that clocked systems could be speeded up by permitting the D inputs to latches to lag behind the leading edges of the clock pulses and by allowing the C1 and C2 clock pulses to overlap is not new. These ideas are included in the very interesting book on digital systems design by Langdon [2], and were pointed out by D. Chang of IBM's Poughkeepsie Laboratories a number of years ago in at least one internal memorandum. Other pertinent work, in connection with pipelining, is by Kogge [12] and Cotten [13].
