Today's hardware technology presents a new challenge in designing robust
systems. Deep submicron VLSI technology introduced transient and permanent
faults that were never considered in low-level system designs in the past.
Still, robustness of that part of the system is crucial and needs to be
guaranteed for any successful product. Distributed systems, on the other hand,
have been dealing with similar issues for decades. However, neither the basic
abstractions nor the complexity of contemporary fault-tolerant distributed
algorithms match the peculiarities of hardware implementations. This paper is
intended to be part of an attempt striving to overcome this gap between theory
and practice for the clock synchronization problem. Solving this task
sufficiently well will allow to build a very robust high-precision clocking
system for hardware designs like systems-on-chips in critical applications. As
our first building block, we describe and prove correct a novel Byzantine
fault-tolerant self-stabilizing pulse synchronization protocol, which can be
implemented using standard asynchronous digital logic. Despite the strict
limitations introduced by hardware designs, it offers optimal resilience and
smaller complexity than all existing protocols.Comment: 52 pages, 7 figures, extended abstract published at SSS 201