6 research outputs found

    Towards a Systematic Design of Fault-Tolerant Asynchronous Circuits

    Get PDF
    I. MOTIVATION
    Accommodating billions of transistors on a single die, VLSI technology has reached a scale where principal physical limitations have a strong impact on design principles. Among the particular challenges are maintaining the synchronous clock abstraction in settings where wiring delays dominate over switching delays, and coping with increasing transient failure rates. In an attempt to address some of these challenges, we recently developed a clocking scheme called DARTS [1]. The cornerstone of this approach is a distributed fault-tolerant Tick Generation (TG) unit that implements an adaptation of a fault-tolerant clock synchronization algorithm originally developed in the distributed computing context. The implementation of the TG units [3] required a design process that was quite different from the traditional one: First, the TG circuitry dedicated to generating the chip's clock signals naturally mandated an implementation in asynchronous logic. Second, our fault-tolerance requirements made it necessary to cope with lost/faulty clock signal transitions, which rendered a delay-insensitive approach impossible. And last but not least, we had to bridge the gap between the high-level distributed algorithm's view with its mathematical proofs, and the low-level VLSI implementation with its tool-based verification techniques. This paper (resp. its extended version
    II. FAULT-TOLERANT ASYNCHRONOUS CIRCUITS
    Asynchronous circuits [5] employ communication based on explicit handshaking, rather than strictly time-driven communication based on a common clock, and hence allow the design of "self-clocking" systems. Probably the most elegant paradigm for modelling asynchronous circuits is transition signaling.
    A. Incorporating fault-tolerance
    In delay-insensitive asynchronous circuits, causal precedence is the only meaningful relation between events: in typical implementations, the flow control is entirely based on a REQ/ACK handshake between the sender and the receiver of a data item. This handshaking also extends transitively to pairs of senders and receivers that are not directly connected. As a consequence, a "self-oscillating" mutual cause/effect relationship (REQ ⇒ ACK ⇒ REQ ⇒ ...) is established, which explicitly synchronizes sender and receiver and, when employed in a feedback loop, leads to the desired self-clocking property. Unfortunately, this strict causality is too restrictive for fault-tolerant asynchronous (FASY) circuits, like threshold gates. Backed up by distributed computing results like the impossibility of consensus in asynchronous systems
    B. Verification and proofs
    The properties to be achieved by the DARTS clocks are specified via two measures, namely precision and accuracy, in the presence of at most f arbitrary (Byzantine) faulty TG units among the n ≥ 3f + 2 TG units present on the chip. On some reasonably high abstraction level, we have developed mathematical proofs that the underlying distributed tick generation algorithm indeed satisfies those properties.
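    The self-clocking behaviour created by the REQ/ACK feedback loop can be illustrated with a tiny simulation of 2-phase (transition) handshaking. The sketch below is generic background, not the DARTS TG circuitry; the function name and the fixed number of rounds are illustrative assumptions.

        # Minimal sketch of 2-phase (transition) signaling: each transition of
        # REQ or ACK carries meaning, and the mutual REQ => ACK => REQ ...
        # dependency yields a self-oscillating "clock" when closed into a loop.
        def transition_handshake(rounds: int = 4) -> int:
            req = ack = 0
            ticks = 0
            for _ in range(rounds):
                if req == ack:      # previous item acknowledged: sender toggles REQ
                    req ^= 1
                if req != ack:      # pending request: receiver toggles ACK
                    ack ^= 1
                ticks += 1          # one completed REQ/ACK round trip = one tick
            return ticks

        print(transition_handshake())  # -> 4 ticks from 4 handshake rounds

    A single lost or spurious transition stalls such a loop permanently, which is why, as noted above, strict delay insensitivity is too restrictive once faulty clock signal transitions must be tolerated.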

    Counter Attack on Byzantine Generals: Parameterized Model Checking of Fault-tolerant Distributed Algorithms

    Full text link
    We introduce an automated parameterized verification method for fault-tolerant distributed algorithms (FTDA). FTDAs are parameterized by both the number of processes and the assumed maximum number of Byzantine faulty processes. At the center of our technique is a parametric interval abstraction (PIA) where the interval boundaries are arithmetic expressions over parameters. Using PIA for both data abstraction and a new form of counter abstraction, we reduce the parameterized problem to finite-state model checking. We demonstrate the practical feasibility of our method by verifying several variants of the well-known distributed algorithm by Srikanth and Toueg. Our semi-decision procedures are complemented and motivated by an undecidability proof for FTDA verification which holds even in the absence of interprocess communication. To the best of our knowledge, this is the first paper to achieve parameterized automated verification of Byzantine FTDAs.
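    To make the idea of parametric interval abstraction more concrete, the following sketch maps a process's message counter to one of finitely many intervals whose boundaries are expressions over the parameters n (number of processes) and t (assumed fault bound). The thresholds t+1 and n-t are typical of Srikanth-Toueg-style broadcasting; the function and its interval labels are illustrative assumptions, not the tool's actual abstract domain.

        # Hedged sketch of a parametric interval abstraction (PIA) domain:
        # concrete counter values collapse into intervals with parametric
        # bounds, so the abstract system becomes finite-state for model checking.
        def pia_abstract(counter: int, n: int, t: int) -> str:
            if counter < 1:
                return "[0, 1)"        # nothing received yet
            if counter < t + 1:
                return "[1, t+1)"      # below the relay threshold
            if counter < n - t:
                return "[t+1, n-t)"    # enough to relay, not enough to accept
            return "[n-t, inf)"        # enough messages to accept

        print(pia_abstract(counter=3, n=7, t=2))   # -> "[t+1, n-t)"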

    How to speedup fault-tolerant clock generation in VLSI systems-on-chip via pipelining

    Get PDF
    Fault-tolerant clocking schemes become inevitable when it comes to highly reliable chip designs. Because of the additional hardware overhead, existing solutions are considerably slower than their non-reliable counterparts. In this paper, we demonstrate that pipelining is a viable approach to speed up the distributed fault-tolerant DARTS clock generation approach introduced in (Függer, Schmid, Fuchs, Kempf, EDCC'06), where a distributed Byzantine fault-tolerant tick generation algorithm has been used to replace the traditional quartz oscillator and highly balanced clock tree in VLSI Systems-on-Chip (SoCs). We provide a pipelined version of the original DARTS algorithm, termed pDARTS, together with a novel modeling and analysis framework for hardware-implemented asynchronous fault-tolerant distributed algorithms, which is employed for rigorously analyzing its correctness and performance. Our results, which have also been confirmed by the experimental evaluation of an FPGA prototype implementation, reveal that pipelining indeed makes it possible to entirely remove the adverse effect of large interconnect delays on the achievable clock frequency, and demonstrate again that methods and results from distributed algorithms research can successfully be applied in the VLSI context.
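    A back-of-the-envelope comparison shows why pipelining can hide interconnect delay; the delay values and pipeline depth below are made-up illustrative numbers, not measurements or formulas from the paper.

        # Illustrative arithmetic only (assumed numbers): in a non-pipelined
        # scheme every tick waits for the full round trip through gates and
        # wires, whereas with several tick "waves" in flight the wire delay is
        # amortized and the local gate delay becomes the limiting factor.
        gate_delay = 1.0     # ns, local tick-generation logic (assumption)
        wire_delay = 4.0     # ns, long interconnect between TG units (assumption)
        stages     = 4       # number of pipelined tick waves in flight (assumption)

        period_plain     = gate_delay + wire_delay
        period_pipelined = max(gate_delay, (gate_delay + wire_delay) / stages)

        print(f"non-pipelined tick period: {period_plain:.2f} ns")     # 5.00 ns
        print(f"pipelined tick period:     {period_pipelined:.2f} ns")  # 1.25 ns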

    Fault-tolerant Algorithms for Tick-Generation in Asynchronous Logic: Robust Pulse Generation

    Full text link
    Today's hardware technology presents a new challenge in designing robust systems. Deep submicron VLSI technology introduced transient and permanent faults that were never considered in low-level system designs in the past. Still, robustness of that part of the system is crucial and needs to be guaranteed for any successful product. Distributed systems, on the other hand, have been dealing with similar issues for decades. However, neither the basic abstractions nor the complexity of contemporary fault-tolerant distributed algorithms match the peculiarities of hardware implementations. This paper is part of an attempt to overcome this gap between theory and practice for the clock synchronization problem. Solving this task sufficiently well will make it possible to build a very robust high-precision clocking system for hardware designs like systems-on-chip in critical applications. As our first building block, we describe and prove correct a novel Byzantine fault-tolerant self-stabilizing pulse synchronization protocol, which can be implemented using standard asynchronous digital logic. Despite the strict limitations introduced by hardware designs, it offers optimal resilience and smaller complexity than all existing protocols.
    Comment: 52 pages, 7 figures, extended abstract published at SSS 201
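    As general background on the resilience claim (standard Byzantine fault-tolerance arithmetic, not a figure taken from this abstract), optimal resilience usually means tolerating any f Byzantine nodes as long as n > 3f:

        \[
          n > 3f \;\Longleftrightarrow\; f \le \left\lfloor \tfrac{n-1}{3} \right\rfloor,
          \qquad \text{e.g. } n = 4 \Rightarrow f \le 1,\; n = 7 \Rightarrow f \le 2.
        \]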

    Fault-Tolerant Distributed Clock Generation in VLSI Systems-on-Chip

    No full text