3 research outputs found
Implications of Clock Distribution Faults and Issues with Screening Them During Manufacturing Testing
Based on real process data of a reference microprocessor, fault models are derived for the manufacturing defects most
likely to affect signals of the clock distribution network. Their
probability is estimated with Inductive Fault Analysis performed on the actual layout of the reference microprocessor. The effects of the most likely faults have been evaluated by electrical level simulations.
We have found that, contrary to common assumptions, only a small
percentage of such faults result in catastrophic failures easily
detected during manufacturing testing. On the contrary, the majority of such faults lead to local failures not likely to be detected during manufacturing testing, despite their possibly compromising the microprocessor operation and reliability. In particular, we have found that the clock faults can be detected during manufacturing testing in only 12 percent of cases. Even more surprisingly, we have also found that, in 10 percent of cases, the undetected clock faults also invalidate the testing procedure itself
Fault-tolerant Algorithms for Tick-Generation in Asynchronous Logic: Robust Pulse Generation
Today's hardware technology presents a new challenge in designing robust
systems. Deep submicron VLSI technology introduced transient and permanent
faults that were never considered in low-level system designs in the past.
Still, robustness of that part of the system is crucial and needs to be
guaranteed for any successful product. Distributed systems, on the other hand,
have been dealing with similar issues for decades. However, neither the basic
abstractions nor the complexity of contemporary fault-tolerant distributed
algorithms match the peculiarities of hardware implementations. This paper is
intended to be part of an attempt striving to overcome this gap between theory
and practice for the clock synchronization problem. Solving this task
sufficiently well will allow to build a very robust high-precision clocking
system for hardware designs like systems-on-chips in critical applications. As
our first building block, we describe and prove correct a novel Byzantine
fault-tolerant self-stabilizing pulse synchronization protocol, which can be
implemented using standard asynchronous digital logic. Despite the strict
limitations introduced by hardware designs, it offers optimal resilience and
smaller complexity than all existing protocols.Comment: 52 pages, 7 figures, extended abstract published at SSS 201