Perfectly synchronizing an asynchronous digital signal in bounded time is known to be impossible, since all bistable devices exhibit a region of metastability. In practice, "reliable synchronization" means the achievement of a synchronization failure rate comparable to hardware failure rates. Since metastable state decay times are exponentially distributed, an arbitrarily low synchronization failure rate can be achieved by performing the synchronization with a shift register of 
sufficient length. Unfortunately, this leads to a trade-off between failure rate and propagation delay. Low synchronization failure rates imply long propagation delays, which can seriously degrade the performance of delay-sensitive systems. This paper describes a synchronizer that exhibits an arbitrarily low failure rate with a short 
It is impossible to synchronize an asynchronous digital signal in bounded time, with probability zero of failure to synchronize, for the same reason that no upper bound can be placed on the time required for a pencil perfectly balanced on its point to achieve a stable (e.g., horizontal) state. Knife-edge decision-making implies the existence of metastable states, states that persist until sufficiently perturbed by statistical phenomena such as Johnson noise.
Many researchers have directly observed metastability in bistable logic devices with both ordinary and sampling oscilloscopes [1, 2, 5]. Others have adduced statistical proof of the existence of metastability [3, 4]. Furthermore, a general proof exists that all bistable devices, electrical, mechanical, hydraulic or otherwise, must exhibit a region of metastability [6]. (Nevertheless, as recently as 1977, articles claiming to present metastability-free synchronization schemes have been published in reputable journals [7].) As a practical matter, it is straightforward to construct a simple finite-state machine with an asynchronous input that fails regularly unless explicit steps are taken to avoid metastability. Such a machine is described in section 3.1.
Current practice recognizes that metastable state decay times are exponentially distributed, since metastable state decay is a Poisson process [5, 8]. Therefore, arbitrarily low synchronization failure rates can be achieved by allowing synchronizing flip-flops sufficient time to settle. A ubiquitous circuit embodying this principle is the shift-register synchronizer, which consists of n cascaded D flip-flops, n = 2 being the most common case. If the clock to which the asynchronous input is to be synchronized has a period of T, the shift-register synchronizer allows (n - 1)T seconds for metastable states to settle. Although input transitions are delayed by a constant (n - 1)T seconds, the shift-register synchronizer exhibits a throughput of 1/T, which is as fast as can be. As a concrete example, a clocked sequential circuit constructed with low-power Schottky TTL components and operating at a clock frequency of 10 MHz would require a single-stage shift-register synchronizer to synchronize an asynchronous input, 100 ns being more than sufficient to achieve a synchronization failure rate comparable to hardware failure rates, presumably a tolerable level.
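The trade-off between settling time and failure rate can be illustrated numerically. The Python sketch below uses the commonly quoted estimate MTBF = exp(t / tau) / (T_w * f_c * f_d), which follows from exponentially distributed decay times. The function names and every parameter value (tau, the vulnerable window T_w, the input transition rate) are hypothetical and chosen only for illustration; they are not measurements from this paper.

```python
import math

def settle_time(n_stages, clock_period_s):
    """Settling time (n - 1) * T allowed by an n-stage shift-register synchronizer."""
    return (n_stages - 1) * clock_period_s

def synchronizer_mtbf(settle_s, tau_s, window_s, clock_hz, data_hz):
    """Estimated mean time between synchronization failures, in seconds."""
    return math.exp(settle_s / tau_s) / (window_s * clock_hz * data_hz)

# Hypothetical device and traffic parameters (assumptions, not measured values).
TAU_S = 2e-9       # metastability resolution time constant
WINDOW_S = 1e-10   # width of the vulnerable sampling window
CLOCK_HZ = 10e6    # 10 MHz clock, so T = 100 ns
DATA_HZ = 1e6      # average rate of asynchronous input transitions

for n in (2, 3, 4):
    t = settle_time(n, 1.0 / CLOCK_HZ)
    mtbf = synchronizer_mtbf(t, TAU_S, WINDOW_S, CLOCK_HZ, DATA_HZ)
    print(f"n = {n}: settle = {t * 1e9:.0f} ns, estimated MTBF = {mtbf:.3g} s")
```

Each additional stage multiplies the estimated MTBF by exp(T / tau) while adding a full clock period of delay, which is the trade-off the rest of the paper sets out to soften.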
An Interesting Special Case
The canonical example of an asynchronous event in the life of a computer is the depressing of a teletype key by a user. It is neither necessary nor possible to improve upon the performance of the shift-register synchronizer in such a case. Consider instead a pair of finite state machines that are in communication with one another. Each machine has its own clock. These clocks may or may not be of the same nominal frequency, and even if they are, the phase relationship between them is unspecified. A pair of independently clocked processors competing for a shared resource is an example of such a system. A common strategy is to require that each machine treat the current state of the other as a totally asynchronous input, capable of changing at any time. To do so, however, is to ignore the fact that a clocked sequential circuit cannot change state except immediately after ticks of its clock. If each FSM were provided with the clock signal of the other, the metastable-state settling-time penalty need be paid only when absolutely necessary.
How it works
A scheme for constructing a minimum-average-latency synchronizer is sketched in the following sections.
Definitions
LCLK: the local clock; i.e., a locally-generated signal consisting of periodic, positive-going edges.
FCLK: a foreign clock; i.e., any signal consisting of periodic, positive-going edges.
GO: an aperiodic signal synchronous with FCLK; i.e., all transitions of GO are guaranteed to occur between 0 and $\tau_f$ seconds after a positive-going edge of FCLK, where $\tau_f \ll T_f$, the period of FCLK.
Problem
Synchronize GO with LCLK, delaying GO only as necessary to avoid arbitration failure. The synchronizer is susceptible to metastability just when the LCLK and FCLK window signals overlap, since then and only then is it possible for GO to be in transition when it must not be. Because the LCLK and FCLK window signals are periodic, it is possible to predict which local clock pulses will be accompanied by overlapping LCLK and FCLK windows and to protect the synchronizer from metastability by disabling it at the proper time. Specifically, a pulse detector must monitor the logical AND of advance copies of the LCLK and FCLK window signals, closing a latch inserted between GO and the synchronizer as necessary to prevent changes in GO from causing synchronizer metastability.
Since the degree of overlap between the windows can vary continuously from none to total, ANDing advance copies of the two window signals can produce runt pulses. Therefore, the advance copies must be sufficiently in advance to allow the pulse detector to recover from metastable states.
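Because both window signals are periodic, the set of local clock pulses whose windows collide with a foreign window can be enumerated ahead of time. The Python sketch below illustrates the idea under simplifying assumptions that are not taken from the paper: each window is modelled as a half-open interval beginning at its clock edge, all times are integer nanoseconds, both windows are shorter than the foreign clock period, and the function name and arguments are hypothetical.

```python
def colliding_local_edges(t_local, t_foreign, w_local, w_foreign, phase, n_edges):
    """Return the indices k of local clock edges whose window overlaps a foreign window.

    Local window k is modelled as [k*t_local, k*t_local + w_local); foreign window m
    as [phase + m*t_foreign, phase + m*t_foreign + w_foreign). Assumes both window
    widths are smaller than t_foreign.
    """
    hits = []
    for k in range(n_edges):
        l_start = k * t_local
        l_end = l_start + w_local
        # Index of the last foreign window starting at or before l_start.
        m = (l_start - phase) // t_foreign
        for candidate in (m, m + 1):
            f_start = phase + candidate * t_foreign
            f_end = f_start + w_foreign
            if f_start < l_end and l_start < f_end:  # the two intervals intersect
                hits.append(k)
                break
    return hits

# Illustrative values in nanoseconds: 100 ns local clock, 125 ns foreign clock.
print(colliding_local_edges(100, 125, 30, 29, 0, 10))
```

With these illustrative values the collision pattern repeats every 500 ns, the least common multiple of the two periods, which is what makes advance prediction possible.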
Performance
Assuming that all possible phase relationships between FCLK and LCLK are equally likely, the probability that any randomly chosen FCLK window will collide with an LCLK window is
[equation illegible in source]
Hence the synchronization process can take at most one cycle, and the average number of cycles required to recognize a change in GO is just
[equation illegible in source]
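As an illustration of the uniform-phase assumption, the Python sketch below computes a collision probability for one simple model. It treats each window as a half-open interval, in which case a foreign window of width w_f overlaps a local window of width w_l, repeating with period T_l, with probability (w_l + w_f) / T_l. This is a standard geometric result offered as a sketch; it is not necessarily the expression that appeared in the original, and the function names and numeric values are assumptions.

```python
from fractions import Fraction

def collision_probability(w_local, w_foreign, t_local):
    """Closed-form overlap probability for a uniformly distributed phase."""
    return min(Fraction(1), Fraction(w_local + w_foreign, t_local))

def swept_collision_fraction(w_local, w_foreign, t_local, steps):
    """Fraction of equally spaced phases at which the two windows overlap.

    Assumes w_local + w_foreign <= t_local, so a foreign window can touch
    only the local window at 0 or the next one at t_local.
    """
    hits = 0
    for i in range(steps):
        offset = Fraction(i * t_local, steps)          # start of the foreign window
        overlaps_current = offset < w_local            # hits local window starting at 0
        overlaps_next = offset + w_foreign > t_local   # runs into the next local window
        if overlaps_current or overlaps_next:
            hits += 1
    return Fraction(hits, steps)

# Illustrative widths and period in nanoseconds (assumed values).
p_exact = collision_probability(30, 29, 100)
p_swept = swept_collision_fraction(30, 29, 100, 1000)
print(float(p_exact), float(p_swept))
```

The swept estimate approaches the closed form as the number of phase steps grows, which is a quick check that the interval model is implemented consistently.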
TTL Implementation
This section describes TTL realizations of two previously mentioned devices:
* the synchronization failure detector of section 1, and
* the minimum-average-latency synchronizer of section 2.
Experimental results produced by the metastability detector are presented, and the performance of the minimum-average-latency synchronizer is computed.
Synchronization failure detector
Synchronizer metastability will cause random behavior in all finite-state machines for which the value of a single input variable affects more than one state variable. Therefore, a machine susceptible to errors due to synchronization failure need have only two state variables. Such a machine is described by the following state-transition diagram:
Note that except for a start-up transient, this FSM simply shuttles back and forth between the two states marked 00 and 11 depending on the value of the input. 
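The failure mode can be mimicked in software. The Python sketch below is a hypothetical model, not the machine in the paper's diagram: both state variables simply load the input, and a metastable sample is modelled by letting each flip-flop resolve independently to 0 or 1. The function names and the probability parameter are assumptions made for illustration.

```python
import random

def step(x, metastable, rng):
    """Next state (q1, q0) of a hypothetical FSM whose two state bits both load input x."""
    if not metastable:
        return (x, x)
    # A marginal input level is resolved independently by each flip-flop.
    return (rng.randint(0, 1), rng.randint(0, 1))

def count_illegal_states(n_cycles, p_metastable, seed=0):
    """Count clock cycles that end in state 01 or 10, which a clean input never produces."""
    rng = random.Random(seed)
    illegal = 0
    for _ in range(n_cycles):
        x = rng.randint(0, 1)
        state = step(x, rng.random() < p_metastable, rng)
        if state in ((0, 1), (1, 0)):
            illegal += 1
    return illegal

print(count_illegal_states(10_000, 0.0))   # clean sampling: only 00 and 11 occur
print(count_illegal_states(10_000, 0.01))  # occasional marginal samples
```

Because a single input feeds two state variables, any disagreement between the flip-flops lands the machine in a state outside the 00/11 pair, which is what makes such a machine usable as a failure detector.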
Minimum-average-latency synchronizer
Given local and foreign clocks characterized by

$$T_l = 100\ \text{ns} \tag{3.4}$$

$$T_f = 125\ \text{ns} \tag{3.5}$$

the task is to implement the minimum-average-latency synchronizer described in section 2.2.
Circuit topology
The schematic of a Schottky TTL implementation of the synchronizer appears in Appendix 2; 
Delay line values
The effective set-up time $t_s'$ of the 74LS175 D flip-flop is equal to its intrinsic set-up time $t_s$ plus the propagation delay $t_d$ of the 74S373 D latch that precedes its D input:

$$t_s' = t_s + t_d = 10\ \text{ns} + 7\ \text{ns} = 17\ \text{ns} \tag{3.6}$$

Since the required minimum data hold-time of the 74LS175 is 0 ns, 
Taking everything into consideration, the delay lines that determine the foreign and local clock window widths should have the following minimum values:

$$\tau_f' = \tau_f + t_{g1} + \varepsilon = 22\ \text{ns} + 4\ \text{ns} + 3\ \text{ns} = 29\ \text{ns} \tag{3.9}$$

$$\tau_l' = \tau_l + t_{g2} + t_{g3} + \varepsilon = 17\ \text{ns} + 3\ \text{ns} + 7\ \text{ns} + 3\ \text{ns} = 30\ \text{ns}$$
Given the above, it is theoretically possible for GO to change as much as

$$t_s' + t_g + \varepsilon + t_p = 17\ \text{ns} + 7\ \text{ns} + 3\ \text{ns} + 22\ \text{ns} = 49\ \text{ns}$$

If $f(t)$ is to be advanced by $\Delta$ seconds, then

$$f'(t) = f(t + \Delta) = f(t + \Delta - nT) = f(t - D) \tag{3.13}$$

where

$$D = nT - \Delta \ge 0, \qquad n \ge \frac{\Delta}{T} \tag{3.14}$$

is the amount by which $f(t)$ must be delayed, and conversely. The use of a two-stage pulse detector to monitor NO requires local and foreign clock window overlaps to be predicted between one and two local clock periods in advance. Therefore, LCLKWIN is effectively advanced by

$$2T_l - t_s' - t_{g2} = 200\ \text{ns} - 17\ \text{ns} - 3\ \text{ns} = 180\ \text{ns} \tag{3.15}$$

and the smallest non-negative amount by which FCLKWIN can be delayed is 250 ns - 180 ns = 70 ns.
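The conversion from an advance to an equivalent delay of a periodic signal is plain arithmetic, sketched below in Python. The function name is hypothetical and times are taken as integer nanoseconds; the example reproduces the 180 ns advance and 125 ns foreign clock period used in the text.

```python
def delay_for_advance(advance_ns, period_ns):
    """Return (n, D): the smallest whole number of periods n and the non-negative
    delay D = n * period - advance that is equivalent to the requested advance."""
    if period_ns <= 0:
        raise ValueError("period must be positive")
    if advance_ns < 0:
        raise ValueError("advance must be non-negative")
    n = -(-advance_ns // period_ns)  # ceiling division on integers
    return n, n * period_ns - advance_ns

# Advancing a 125 ns periodic signal by 180 ns equals delaying it by 70 ns.
print(delay_for_advance(180, 125))  # (2, 70)
```

Choosing the smallest n keeps the delay line as short as possible, which matters when the delay is realized with physical delay-line components.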
Performance
The average latency exhibited by the synchronizer described in the previous section can be no greater than 0.32 local clock cycles, or 32 ns.
The latency of a two-stage shift-register synchronizer is 1.00 local clock cycles, or 100 ns.
Experimental results
Experiments were conducted with the synchronization failure detector whose schematic appears 
Conclusions
With a significant increase in complexity and with considerable attention to detail, it is possible to better the performance of a shift-register synchronizer, provided that the input to be synchronized is synchronous with some periodic signal to which the synchronizer has access.
Applications
The synchronization strategy discussed in this paper applies whenever two or more independently-clocked synchronous sequential circuits must communicate and latency is an issue. For example, consider that the limiting performance factor in many computer systems is memory access time. If a number of processors must contend for the use of an asynchronous data bus, metastable-state settling time puts an absolute floor underneath memory access time. Even if the bus is synchronous, it may be inconvenient or impossible for all potential bus masters to operate on clocks that are phase-locked to a global timing signal. Particularly if the peak-to-average utilization ratio of the bus is high, overall performance may be degraded by enforcing global synchrony. True synchronization may be impractical because the system is too widely distributed.
Concrete examples of multiple-processor systems that exhibit one or more of these characteristics include the Nu Machine, a personal work-station developed by the M.I.T. Laboratory for Computer Science, and Concert, a general-purpose multiprocessing system under construction by the same group.
Further research
A difficulty with the current design is that it requires a priori knowledge of the local and foreign clock frequencies. Ideally, the synchronizer would automatically adjust itself to any clock frequencies within a reasonable range. Current speculation centers around the use of a voltage-controlled oscillator contained within a phase-locked loop as an alternative to a fixed delay line to produce the necessary phase shift in the foreign clock window signal. 
