3,496 research outputs found
Characterization and Scaling of MOS Flip Flop Performance in Synchronizer Applications
The measured and calculated values of t he Flip Flop parameters needed to specify synchronizer reliability are presented for 3 different depletion-load,
silicon gate, NMOS, R-S Flip Flop circuits with gate lengths ranging from 6μm to 4.2μm. Estimates of the probability of synchronizer failure to resolve within allowed or desired times can be determined from these
parameters
Formal Design of Asynchronous Fault Detection and Identification Components using Temporal Epistemic Logic
Autonomous critical systems, such as satellites and space rovers, must be
able to detect the occurrence of faults in order to ensure correct operation.
This task is carried out by Fault Detection and Identification (FDI)
components, that are embedded in those systems and are in charge of detecting
faults in an automated and timely manner by reading data from sensors and
triggering predefined alarms. The design of effective FDI components is an
extremely hard problem, also due to the lack of a complete theoretical
foundation, and of precise specification and validation techniques. In this
paper, we present the first formal approach to the design of FDI components for
discrete event systems, both in a synchronous and asynchronous setting. We
propose a logical language for the specification of FDI requirements that
accounts for a wide class of practical cases, and includes novel aspects such
as maximality and trace-diagnosability. The language is equipped with a clear
semantics based on temporal epistemic logic, and is proved to enjoy suitable
properties. We discuss how to validate the requirements and how to verify that
a given FDI component satisfies them. We propose an algorithm for the synthesis
of correct-by-construction FDI components, and report on the applicability of
the design approach on an industrial case-study coming from aerospace.Comment: 33 pages, 20 figure
Fault-Tolerant Distributed Services in Message-Passing Systems
Distributed systems ranging from small local area networks to large wide area networks like the Internet composed of static and/or mobile users have become increasingly popular. A desirable property for any distributed service is fault-tolerance, which means the service remains uninterrupted even if some components in the network fail. This dissertation considers weak distributed models to find either algorithms to solve certain problems or impossibility proofs to show that a problem is unsolvable. These are the main contributions of this dissertation:
• Failure detectors are used as a service to solve consensus (agreement among nodes) which is otherwise impossible in failure-prone asynchronous systems. We find an algorithm for crash-failure detection that uses bounded size messages in an arbitrary, partitionable network composed of badly- behaved channels that can lose and reorder messages.
• Registers are a fundamental building block for shared memory emulations on top of message passing systems. The problem has been extensively studied in static systems. However, register emulation in dynamic systems with faulty nodes is still quite hard and there are impossibility proofs that point out scenarios where change in the system composition due to nodes entering and leaving (also called churn) makes the problem unsolvable.
We propose the first emulation of a crash-fault tolerant register in a system with continuous churn where consensus is unsolvable, the size of the system can grow without bound and at most a constant fraction of the number of nodes in the system can fail by crashing. We prove a lower bound that states that fault-tolerance for dynamic systems with churn is inherently lower than in static systems.
• We then extend the results in the crash-fault tolerant case to a dynamic system with continuous churn and nodes that can be Byzantine faulty. It is the first emulation of an atomic register in a system that can withstand nodes continually entering and leaving, imposes no upper bound on the system size and can tolerate Byzantine nodes. However, the number of Byzantine faulty nodes that can be tolerated is upper bounded by a constant number. Although the algorithm requires that there be a constant known upper bound on the number of Byzantine nodes, this restriction is unavoidable, as we show that it is impossible to emulate an atomic register if the system size and maximum number of servers that can be Byzantine in the system is unknown
Monitoring Partially Synchronous Distributed Systems using SMT Solvers
In this paper, we discuss the feasibility of monitoring partially synchronous
distributed systems to detect latent bugs, i.e., errors caused by concurrency
and race conditions among concurrent processes. We present a monitoring
framework where we model both system constraints and latent bugs as
Satisfiability Modulo Theories (SMT) formulas, and we detect the presence of
latent bugs using an SMT solver. We demonstrate the feasibility of our
framework using both synthetic applications where latent bugs occur at any time
with random probability and an application involving exclusive access to a
shared resource with a subtle timing bug. We illustrate how the time required
for verification is affected by parameters such as communication frequency,
latency, and clock skew. Our results show that our framework can be used for
real-life applications, and because our framework uses SMT solvers, the range
of appropriate applications will increase as these solvers become more
efficient over time.Comment: Technical Report corresponding to the paper accepted at Runtime
Verification (RV) 201
A Prescription for Partial Synchrony
Algorithms in message-passing distributed systems often require partial synchrony to tolerate crash failures. Informally, partial synchrony refers to systems where timing bounds on communication and computation may exist, but the knowledge of such bounds is limited. Traditionally, the foundation for the theory of partial synchrony has been real time: a time base measured by counting events external to the system, like the vibrations of Cesium atoms or piezoelectric crystals.
Unfortunately, algorithms that are correct relative to many real-time based models of partial synchrony may not behave correctly in empirical distributed systems. For example, a set of popular theoretical models, which we call M_*, assume (eventual) upper bounds on message delay and relative process speeds, regardless of message size
and absolute process speeds. Empirical systems with bounded channel capacity and bandwidth cannot realize such assumptions either natively, or through algorithmic
constructions. Consequently, empirical deployment of the many M_*-based algorithms risks anomalous behavior.
As a result, we argue that real time is the wrong basis for such a theory. Instead, the appropriate foundation for partial synchrony is fairness: a time base measured
by counting events internal to the system, like the steps executed by the processes. By way of example, we redefine M_* models with fairness-based bounds and provide algorithmic techniques to implement fairness-based M_* models on a significant subset of the empirical systems. The proposed techniques use failure detectors — system
services that provide hints about process crashes — as intermediaries that preserve the fairness constraints native to empirical systems. In effect, algorithms that are correct in M_* models are now proved correct in such empirical systems as well.
Demonstrating our results requires solving three open problems. (1) We propose the first unified mathematical framework based on Timed I/O Automata to specify empirical systems, partially synchronous systems, and algorithms that execute within the aforementioned systems. (2) We show that crash tolerance capabilities of popular distributed systems can be denominated exclusively through fairness constraints. (3) We specify exemplar system models that identify the set of weakest system models to implement popular failure detectors
Fairness Properties of the Trusting Failure Detector
In 1985 it was shown by Fischer et al. that consensus, a fundamental problem in distributed computing, was impossible in asynchronous distributed systems in the presence of even just one process failure. This result prompted a search for alternative system models that were capable of solving such problems and culminated in the development of two helpful constructs: partially synchronous system models and failure detectors. Partially synchronous system models seek to solve the problem of identifying process crashes by constraining the real-time behavior of the underlying system. In the resulting models, crashed processes can be detected indirectly through the use of timeouts. Failure detectors, on the other hand, address process crashes by directly providing (potentially inaccurate) information on failures. As a result, failure detectors were viewed as abstractions of real-time information. Pike et al. proposed a different perspective on failure detectors; as abstracting fairness properties. Fairness in a system imposes bounds on the relative frequencies of communication and execution between processes in a system, and it was shown that four frequently-used failure detectors from the Chandra-Toueg hierarchy (P, ♢P, S, ♢S) encapsulate these fairness properties. This discovery suggests that failure detectors may be better understood as abstractions of fairness rather than real-time properties as well as demonstrates the possibility to communicate results between systems augmented with failure detectors and partially synchronous system models. In this thesis, we will be discussing an extension of the Pike et al. result to the trusting failure detector. The trusting failure detector is the weakest failure detector to implement the problem of fault-tolerant mutual exclusion: a fundamental primitive for distributed computing
Runtime Verification Based on Executable Models: On-the-Fly Matching of Timed Traces
Runtime verification is checking whether a system execution satisfies or
violates a given correctness property. A procedure that automatically, and
typically on the fly, verifies conformance of the system's behavior to the
specified property is called a monitor. Nowadays, a variety of formalisms are
used to express properties on observed behavior of computer systems, and a lot
of methods have been proposed to construct monitors. However, it is a frequent
situation when advanced formalisms and methods are not needed, because an
executable model of the system is available. The original purpose and structure
of the model are out of importance; rather what is required is that the system
and its model have similar sets of interfaces. In this case, monitoring is
carried out as follows. Two "black boxes", the system and its reference model,
are executed in parallel and stimulated with the same input sequences; the
monitor dynamically captures their output traces and tries to match them. The
main problem is that a model is usually more abstract than the real system,
both in terms of functionality and timing. Therefore, trace-to-trace matching
is not straightforward and allows the system to produce events in different
order or even miss some of them. The paper studies on-the-fly conformance
relations for timed systems (i.e., systems whose inputs and outputs are
distributed along the time axis). It also suggests a practice-oriented
methodology for creating and configuring monitors for timed systems based on
executable models. The methodology has been successfully applied to a number of
industrial projects of simulation-based hardware verification.Comment: In Proceedings MBT 2013, arXiv:1303.037
- …