Resilience in Numerical Methods: A Position on Fault Models and Methodologies
Future extreme-scale computer systems may expose silent data corruption (SDC)
to applications, in order to save energy or increase performance. However,
resilience research struggles to come up with useful abstract programming
models for reasoning about SDC. Existing work randomly flips bits in running
applications, but this only shows average-case behavior for a low-level,
artificial hardware model. Algorithm developers need to understand worst-case
behavior with the higher-level data types they actually use, in order to make
their algorithms more resilient. Also, we know so little about how SDC may
manifest in future hardware, that it seems premature to draw conclusions about
the average case. We argue instead that numerical algorithms can benefit from a
numerical unreliability fault model, where faults manifest as unbounded
perturbations to floating-point data. Algorithms can use inexpensive "sanity"
checks that bound or exclude error in the results of computations. Given a
selective reliability programming model that requires reliability only when and
where needed, such checks can make algorithms reliable despite unbounded
faults. Sanity checks, and in general a healthy skepticism about the
correctness of subroutines, are wise even if hardware is perfectly reliable.
Comment: Position Paper
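The sanity-check idea above can be sketched in a few lines of Python. The fault model, fault rate, and bound used here are purely illustrative assumptions, not taken from the paper: a cheap a-priori bound on the result excludes unbounded perturbations, and a (selectively) reliable recomputation is triggered only when the check fails.

```python
import random

def unreliable_sum(xs):
    """Simulated faulty summation: occasionally returns a wildly
    perturbed result, standing in for silent data corruption (SDC)."""
    s = sum(xs)
    if random.random() < 0.1:          # hypothetical fault rate
        s *= 1e30                      # unbounded perturbation
    return s

def checked_sum(xs):
    """Sanity-checked summation: a cheap bound excludes corrupted
    results; on failure, fall back to a reliable recomputation."""
    bound = len(xs) * max(abs(x) for x in xs)   # |sum| <= n * max|x|
    s = unreliable_sum(xs)
    if abs(s) > bound:                 # sanity check failed: SDC suspected
        s = sum(xs)                    # recompute in "reliable mode"
    return s

data = [1.0, -2.5, 3.25, 0.5]
print(checked_sum(data))
```

Note the check only *bounds* the error; small in-range corruptions pass undetected, which is exactly why the abstract argues for worst-case reasoning about where reliability is genuinely needed.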
Active Virtual Network Management Prediction: Complexity as a Framework for Prediction, Optimization, and Assurance
Research into active networking has provided the incentive to re-visit what
has traditionally been classified as distinct properties and characteristics of
information transfer such as protocol versus service; at a more fundamental
level this paper considers the blending of computation and communication by
means of complexity. The specific service examined in this paper is network
self-prediction enabled by Active Virtual Network Management Prediction.
Computation/communication is analyzed via Kolmogorov Complexity. The result is
a mechanism to understand and improve the performance of active networking and
Active Virtual Network Management Prediction in particular. The Active Virtual
Network Management Prediction mechanism allows information, in various states
of algorithmic and static form, to be transported in the service of prediction
for network management. The results are generally applicable to algorithmic
transmission of information. Kolmogorov Complexity is used and experimentally
validated as a theory describing the relationship among algorithmic
compression, complexity, and prediction accuracy within an active network.
Finally, the paper concludes with a complexity-based framework for Information
Assurance that attempts to take a holistic view of vulnerability analysis.
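The relationship between algorithmic compression and complexity can be illustrated with a standard trick: using compressed length (here via Python's zlib) as a computable proxy for the uncomputable Kolmogorov complexity. This is a generic illustration of the idea, not the paper's experimental setup; data that a short program can generate compresses far better than incompressible noise, and is correspondingly more predictable.

```python
import random
import zlib

def complexity(data: bytes) -> int:
    """Compressed length as a practical stand-in for Kolmogorov
    complexity (which is uncomputable)."""
    return len(zlib.compress(data, 9))

rng = random.Random(42)
regular = b"ab" * 500                                    # output of a tiny program
noisy = bytes(rng.randrange(256) for _ in range(1000))   # essentially incompressible

print(complexity(regular), complexity(noisy))
```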
Production of Reliable Flight Crucial Software: Validation Methods Research for Fault Tolerant Avionics and Control Systems Sub-Working Group Meeting
The state of the art in the production of crucial software for flight control applications was addressed. The association between reliability metrics and software is considered. Thirteen software development projects are discussed. A short-term need for research in the areas of tool development and software fault tolerance was indicated. For the long term, research in formal verification or proof methods was recommended. Formal specification and software reliability modeling were recommended as topics for both short- and long-term research.
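As one illustration of the software reliability modeling the abstract recommends, here is the classic Goel-Okumoto NHPP growth model in Python. The choice of this particular model and the parameter values are assumptions of mine, not drawn from the meeting report.

```python
import math

def expected_failures(t, a=100.0, b=0.05):
    """Goel-Okumoto mean-value function m(t) = a(1 - e^{-bt}):
    expected faults found by test time t. Here a (total expected
    faults) and b (detection rate) are illustrative values."""
    return a * (1.0 - math.exp(-b * t))

def reliability(t, x, a=100.0, b=0.05):
    """P(no failure in (t, t+x]) = exp(-(m(t+x) - m(t)))
    for a nonhomogeneous Poisson process."""
    return math.exp(-(expected_failures(t + x, a, b)
                      - expected_failures(t, a, b)))

# Reliability over a 1-hour mission improves as testing proceeds.
print(reliability(0.0, 1.0), reliability(100.0, 1.0))
```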
Comparing Experiments to the Fault-Tolerance Threshold
Achieving error rates that meet or exceed the fault-tolerance threshold is a
central goal for quantum computing experiments, and measuring these error rates
using randomized benchmarking is now routine. However, direct comparison
between measured error rates and thresholds is complicated by the fact that
benchmarking estimates average error rates while thresholds reflect worst-case
behavior when a gate is used as part of a large computation. These two measures
of error can differ by orders of magnitude in the regime of interest. Here we
facilitate comparison between the experimentally accessible average error rates
and the worst-case quantities that arise in current threshold theorems by
deriving relations between the two for a variety of physical noise sources. Our
results indicate that it is coherent errors that lead to an enormous mismatch
between average and worst case, and we quantify how well these errors must be
controlled to ensure fair comparison between average error probabilities and
fault-tolerance thresholds.
Comment: 5 pages, 2 figures, 13-page appendix
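The average/worst-case mismatch the abstract describes can be reproduced for a toy coherent error. The sketch below compares the average gate infidelity of a small Z over-rotation with an operator-norm proxy for worst-case error; using the operator norm here is my illustrative choice, not the paper's diamond-norm calculation. For this coherent error the worst-case measure scales like the *square root* of the average infidelity, so it is orders of magnitude larger.

```python
import numpy as np

def avg_infidelity(U):
    """Average gate infidelity r of a unitary error U (vs. identity):
    r = 1 - (|Tr U|^2 + d) / (d^2 + d)."""
    d = U.shape[0]
    return 1.0 - (abs(np.trace(U))**2 + d) / (d**2 + d)

def worst_case(U):
    """Operator-norm distance ||U - I||, a rough stand-in for the
    worst-case (diamond-distance-like) error of a coherent fault."""
    return np.linalg.norm(U - np.eye(U.shape[0]), ord=2)

theta = 1e-3                                   # small over-rotation angle
U = np.diag([1.0, np.exp(1j * theta)])         # coherent Z over-rotation
r, D = avg_infidelity(U), worst_case(U)
print(r, D)   # D grows like sqrt(r), far exceeding the average error
```

For this channel one can check analytically that D = 2 sin(theta/2) = sqrt(6 r) exactly, which is the sqrt scaling that makes benchmarking numbers look deceptively good for coherent errors.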
Causality and Temporal Dependencies in the Design of Fault Management Systems
Reasoning about causes and effects naturally arises in the engineering of
safety-critical systems. A classical example is Fault Tree Analysis, a
deductive technique used for system safety assessment, whereby an undesired
state is reduced to the set of its immediate causes. The design of fault
management systems also requires reasoning on causality relationships. In
particular, a fail-operational system needs to ensure timely detection and
identification of faults, i.e. recognize the occurrence of run-time faults
through their observable effects on the system. Even more complex scenarios
arise when multiple faults are involved and may interact in subtle ways.
In this work, we propose a formal approach to fault management for complex
systems. We first introduce the notions of fault tree and minimal cut sets. We
then present a formal framework for the specification and analysis of
diagnosability, and for the design of fault detection and identification (FDI)
components. Finally, we review recent advances in fault propagation analysis,
based on the Timed Failure Propagation Graphs (TFPG) formalism.
Comment: In Proceedings CREST 2017, arXiv:1710.0277
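The notion of minimal cut sets can be made concrete with a toy fault tree. The tree encoding and event names below are invented for illustration: an OR gate fails if any child's cut set occurs, an AND gate requires one cut set from each child, and minimization discards any cut set that strictly contains another.

```python
from itertools import product

def cut_sets(node):
    """Cut sets of a fault tree node, encoded as ('AND'|'OR', children...)
    with basic events as strings."""
    if isinstance(node, str):
        return [frozenset([node])]
    op, *kids = node
    kid_sets = [cut_sets(k) for k in kids]
    if op == 'OR':                       # any child's cut set suffices
        return [cs for sets in kid_sets for cs in sets]
    # AND: combine one cut set from each child
    return [frozenset().union(*combo) for combo in product(*kid_sets)]

def minimal(sets):
    """Keep only cut sets that contain no smaller cut set."""
    return [s for s in sets if not any(t < s for t in sets)]

# Hypothetical top event: system fails if power fails, or both pumps fail.
tree = ('OR', 'power', ('AND', 'pump1', 'pump2'))
mcs = minimal(cut_sets(tree))
print(sorted(sorted(s) for s in mcs))   # [['power'], ['pump1', 'pump2']]
```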
Economic Small-World Behavior in Weighted Networks
The small-world phenomenon has already been the subject of a huge variety of
papers, showing its appearance in a wide range of systems. However, significant
gaps remain, as the commonly adopted mathematical formulation suffers from
limitations that make it unsuitable as a general tool for analyzing real
networks, rather than just mathematical (topological) abstractions. In this
paper we show where the major problems arise, and why a reformulation of the
small-world concept is therefore needed. Together with an analysis of the variables involved,
we then propose a new theory of small-world networks based on two leading
concepts: efficiency and cost. Efficiency measures how well information
propagates over the network, and cost measures how expensive it is to build a
network. The combination of these factors leads us to introduce the concept of
{\em economic small worlds}, that formalizes the idea of networks that are
"cheap" to build, and nevertheless efficient in propagating information, both
at global and local scale. This new concept is shown to overcome the
limitations of the previously adopted formulation, and to provide an adequate
tool for quantitatively analyzing the behaviour of complex networks in the real
world. Various complex systems are analyzed, ranging from neural networks to
social sciences, to communication and transportation networks. In each case,
economic small worlds are found. Moreover, using the economic small-world
framework, the construction principles of these networks can be quantitatively
analyzed and compared, giving good insights into how efficiency and economy
principles combine to shape all these systems.
Comment: 17 pages, 10 figures, 4 tables
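The efficiency and cost measures described above can be computed directly for a small graph. The sketch below uses breadth-first search for shortest paths, with the usual normalizations (global efficiency as the mean of inverse distances over ordered pairs, cost as the fraction of possible links actually built); these conventions are assumed, not quoted from the paper.

```python
from collections import deque

def efficiency(n, edges):
    """Global efficiency: E = (1/(n(n-1))) * sum_{i != j} 1/d_ij,
    taking 1/d = 0 for disconnected pairs."""
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b); adj[b].add(a)
    total = 0.0
    for s in range(n):
        dist = {s: 0}                      # BFS from s over unweighted links
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(1.0 / d for v, d in dist.items() if v != s)
    return total / (n * (n - 1))

def cost(n, edges):
    """Fraction of the n(n-1)/2 possible links actually built."""
    return len(edges) / (n * (n - 1) / 2)

ring = [(i, (i + 1) % 6) for i in range(6)]    # 6-node ring network
print(efficiency(6, ring), cost(6, ring))
```

A network is an economic small world when efficiency is high at both global and local scale while cost stays low; the ring above is cheap but only moderately efficient.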
Stochastic Analysis of a Churn-Tolerant Structured Peer-to-Peer Scheme
We present and analyze a simple and general scheme to build a churn
(fault)-tolerant structured Peer-to-Peer (P2P) network. Our scheme shows how to
"convert" a static network into a dynamic distributed hash table(DHT)-based P2P
network such that all the good properties of the static network are guaranteed
with high probability (w.h.p). Applying our scheme to a cube-connected cycles
network, for example, yields a degree connected network, in which
every search succeeds in hops w.h.p., using messages,
where is the expected stable network size. Our scheme has a constant
storage overhead (the number of nodes responsible for servicing a data item)
and an overhead (messages and time) per insertion and essentially
no overhead for deletions. All these bounds are essentially optimal. While DHT
schemes with similar guarantees are already known in the literature, this work
is new in the following aspects:
(1) It presents a rigorous mathematical analysis of the scheme under a
general stochastic model of churn and shows the above guarantees;
(2) The theoretical analysis is complemented by a simulation-based analysis
that validates the asymptotic bounds even in moderately sized networks and also
studies performance under changing stable network size;
(3) The presented scheme seems especially suitable for maintaining dynamic
structures under churn efficiently. In particular, we show that a spanning tree
of low diameter can be efficiently maintained in constant time and logarithmic
number of messages per insertion or deletion w.h.p.
Keywords: P2P Network, DHT Scheme, Churn, Dynamic Spanning Tree, Stochastic
Analysis
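The idea of keys surviving node churn in a DHT can be illustrated with a toy consistent-hashing ring; this sketch is a generic illustration and is not the paper's cube-connected-cycles construction. Each key is served by the first node clockwise from its hash, so a join or leave only remaps the keys near the affected node.

```python
import bisect
import hashlib

def h(s: str) -> int:
    """Hash a name onto the identifier ring."""
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

class Ring:
    """Toy DHT ring: lookups survive churn because responsibility
    shifts only to the departing node's successor."""
    def __init__(self):
        self.ids, self.names = [], {}
    def join(self, node):
        i = h(node)
        bisect.insort(self.ids, i)       # keep identifiers sorted
        self.names[i] = node
    def leave(self, node):
        i = h(node)
        self.ids.remove(i)
        del self.names[i]
    def lookup(self, key):
        # First node clockwise from the key's hash (wrapping around).
        i = bisect.bisect(self.ids, h(key)) % len(self.ids)
        return self.names[self.ids[i]]

ring = Ring()
for n in ("n1", "n2", "n3"):
    ring.join(n)
owner = ring.lookup("some-key")
ring.leave(owner)                        # churn: the responsible node departs
print(ring.lookup("some-key"))           # key transparently remaps
```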