7,990 research outputs found

    Resilience in Numerical Methods: A Position on Fault Models and Methodologies

    Full text link
    Future extreme-scale computer systems may expose silent data corruption (SDC) to applications, in order to save energy or increase performance. However, resilience research struggles to come up with useful abstract programming models for reasoning about SDC. Existing work randomly flips bits in running applications, but this only shows average-case behavior for a low-level, artificial hardware model. Algorithm developers need to understand worst-case behavior with the higher-level data types they actually use, in order to make their algorithms more resilient. Also, we know so little about how SDC may manifest in future hardware, that it seems premature to draw conclusions about the average case. We argue instead that numerical algorithms can benefit from a numerical unreliability fault model, where faults manifest as unbounded perturbations to floating-point data. Algorithms can use inexpensive "sanity" checks that bound or exclude error in the results of computations. Given a selective reliability programming model that requires reliability only when and where needed, such checks can make algorithms reliable despite unbounded faults. Sanity checks, and in general a healthy skepticism about the correctness of subroutines, are wise even if hardware is perfectly reliable.Comment: Position Pape

    Active Virtual Network Management Prediction: Complexity as a Framework for Prediction, Optimization, and Assurance

    Full text link
    Research into active networking has provided the incentive to re-visit what has traditionally been classified as distinct properties and characteristics of information transfer such as protocol versus service; at a more fundamental level this paper considers the blending of computation and communication by means of complexity. The specific service examined in this paper is network self-prediction enabled by Active Virtual Network Management Prediction. Computation/communication is analyzed via Kolmogorov Complexity. The result is a mechanism to understand and improve the performance of active networking and Active Virtual Network Management Prediction in particular. The Active Virtual Network Management Prediction mechanism allows information, in various states of algorithmic and static form, to be transported in the service of prediction for network management. The results are generally applicable to algorithmic transmission of information. Kolmogorov Complexity is used and experimentally validated as a theory describing the relationship among algorithmic compression, complexity, and prediction accuracy within an active network. Finally, the paper concludes with a complexity-based framework for Information Assurance that attempts to take a holistic view of vulnerability analysis

    Production of Reliable Flight Crucial Software: Validation Methods Research for Fault Tolerant Avionics and Control Systems Sub-Working Group Meeting

    Get PDF
    The state of the art in the production of crucial software for flight control applications was addressed. The association between reliability metrics and software is considered. Thirteen software development projects are discussed. A short term need for research in the areas of tool development and software fault tolerance was indicated. For the long term, research in format verification or proof methods was recommended. Formal specification and software reliability modeling, were recommended as topics for both short and long term research

    Comparing Experiments to the Fault-Tolerance Threshold

    Full text link
    Achieving error rates that meet or exceed the fault-tolerance threshold is a central goal for quantum computing experiments, and measuring these error rates using randomized benchmarking is now routine. However, direct comparison between measured error rates and thresholds is complicated by the fact that benchmarking estimates average error rates while thresholds reflect worst-case behavior when a gate is used as part of a large computation. These two measures of error can differ by orders of magnitude in the regime of interest. Here we facilitate comparison between the experimentally accessible average error rates and the worst-case quantities that arise in current threshold theorems by deriving relations between the two for a variety of physical noise sources. Our results indicate that it is coherent errors that lead to an enormous mismatch between average and worst case, and we quantify how well these errors must be controlled to ensure fair comparison between average error probabilities and fault-tolerance thresholds.Comment: 5 pages, 2 figures, 13 page appendi

    Causality and Temporal Dependencies in the Design of Fault Management Systems

    Get PDF
    Reasoning about causes and effects naturally arises in the engineering of safety-critical systems. A classical example is Fault Tree Analysis, a deductive technique used for system safety assessment, whereby an undesired state is reduced to the set of its immediate causes. The design of fault management systems also requires reasoning on causality relationships. In particular, a fail-operational system needs to ensure timely detection and identification of faults, i.e. recognize the occurrence of run-time faults through their observable effects on the system. Even more complex scenarios arise when multiple faults are involved and may interact in subtle ways. In this work, we propose a formal approach to fault management for complex systems. We first introduce the notions of fault tree and minimal cut sets. We then present a formal framework for the specification and analysis of diagnosability, and for the design of fault detection and identification (FDI) components. Finally, we review recent advances in fault propagation analysis, based on the Timed Failure Propagation Graphs (TFPG) formalism.Comment: In Proceedings CREST 2017, arXiv:1710.0277

    Economic Small-World Behavior in Weighted Networks

    Get PDF
    The small-world phenomenon has been already the subject of a huge variety of papers, showing its appeareance in a variety of systems. However, some big holes still remain to be filled, as the commonly adopted mathematical formulation suffers from a variety of limitations, that make it unsuitable to provide a general tool of analysis for real networks, and not just for mathematical (topological) abstractions. In this paper we show where the major problems arise, and how there is therefore the need for a new reformulation of the small-world concept. Together with an analysis of the variables involved, we then propose a new theory of small-world networks based on two leading concepts: efficiency and cost. Efficiency measures how well information propagates over the network, and cost measures how expensive it is to build a network. The combination of these factors leads us to introduce the concept of {\em economic small worlds}, that formalizes the idea of networks that are "cheap" to build, and nevertheless efficient in propagating information, both at global and local scale. This new concept is shown to overcome all the limitations proper of the so-far commonly adopted formulation, and to provide an adequate tool to quantitatively analyze the behaviour of complex networks in the real world. Various complex systems are analyzed, ranging from the realm of neural networks, to social sciences, to communication and transportation networks. In each case, economic small worlds are found. Moreover, using the economic small-world framework, the construction principles of these networks can be quantitatively analyzed and compared, giving good insights on how efficiency and economy principles combine up to shape all these systems.Comment: 17 pages, 10 figures, 4 table

    Stochastic Analysis of a Churn-Tolerant Structured Peer-to-Peer Scheme

    Full text link
    We present and analyze a simple and general scheme to build a churn (fault)-tolerant structured Peer-to-Peer (P2P) network. Our scheme shows how to "convert" a static network into a dynamic distributed hash table(DHT)-based P2P network such that all the good properties of the static network are guaranteed with high probability (w.h.p). Applying our scheme to a cube-connected cycles network, for example, yields a O(logN)O(\log N) degree connected network, in which every search succeeds in O(logN)O(\log N) hops w.h.p., using O(logN)O(\log N) messages, where NN is the expected stable network size. Our scheme has an constant storage overhead (the number of nodes responsible for servicing a data item) and an O(logN)O(\log N) overhead (messages and time) per insertion and essentially no overhead for deletions. All these bounds are essentially optimal. While DHT schemes with similar guarantees are already known in the literature, this work is new in the following aspects: (1) It presents a rigorous mathematical analysis of the scheme under a general stochastic model of churn and shows the above guarantees; (2) The theoretical analysis is complemented by a simulation-based analysis that validates the asymptotic bounds even in moderately sized networks and also studies performance under changing stable network size; (3) The presented scheme seems especially suitable for maintaining dynamic structures under churn efficiently. In particular, we show that a spanning tree of low diameter can be efficiently maintained in constant time and logarithmic number of messages per insertion or deletion w.h.p. Keywords: P2P Network, DHT Scheme, Churn, Dynamic Spanning Tree, Stochastic Analysis
    corecore