Spectra: Robust Estimation of Distribution Functions in Networks
Distributed aggregation allows the derivation of a given global aggregate property from many individual local values at the nodes of an interconnected network system. Simple aggregates such as minima/maxima, counts, sums, and averages have been thoroughly studied in the past and are important tools for distributed algorithms and network coordination. Nonetheless, such aggregates may not be expressive enough to characterize skewed data distributions or data containing outliers, making the case for richer estimates of the values in the network. This work presents Spectra, a distributed algorithm for the estimation of distribution functions over large-scale networks. The estimate is available at all nodes, and the technique exhibits important properties: robustness to high levels of message loss, fast convergence, and fine precision of the estimate. It can also dynamically cope with changes of the sampled local property without requiring algorithm restarts, and it is highly resilient to node churn. The proposed approach is experimentally evaluated and compared against a competing state-of-the-art distribution aggregation technique.
Comment: Full version of the paper published at the 12th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS), Stockholm (Sweden), June 201
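As a concrete illustration of the kind of aggregate involved, the sketch below estimates an empirical distribution function by gossip-based averaging of per-threshold indicators. It is only a minimal sketch of the general idea, not the Spectra algorithm itself; the thresholds, topology, and round count are arbitrary assumptions.

```python
# Illustrative sketch only: gossip-based CDF estimation over a network.
# This is NOT the Spectra algorithm; it shows how a richer aggregate (an
# empirical CDF) can be derived from repeated pairwise averaging.
# Thresholds, topology, and round count below are arbitrary assumptions.

import random

def gossip_cdf(values, edges, thresholds, rounds=200):
    """Each node averages, per threshold t, the indicator [local value <= t].
    After convergence every node holds an estimate of the global CDF at t."""
    n = len(values)
    # state[i][k] starts as node i's local indicator for threshold k
    state = [[1.0 if values[i] <= t else 0.0 for t in thresholds] for i in range(n)]
    for _ in range(rounds):
        i, j = random.choice(edges)          # pick a random communicating pair
        for k in range(len(thresholds)):     # pairwise averaging preserves the mean
            avg = (state[i][k] + state[j][k]) / 2.0
            state[i][k] = state[j][k] = avg
    return state  # state[i] ~ empirical CDF evaluated at the thresholds, at node i

if __name__ == "__main__":
    random.seed(1)
    vals = [3, 8, 1, 9, 4, 7]                          # one sample per node
    ring = [(i, (i + 1) % len(vals)) for i in range(len(vals))]
    print(gossip_cdf(vals, ring, thresholds=[2, 5, 8])[0])
    # roughly [1/6, 1/2, 5/6] at every node
```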
Statistical Reliability Estimation of Microprocessor-Based Systems
What is the probability that the execution state of a given microprocessor running a given application is correct, in a certain working environment with a given soft-error rate? Trying to answer this question using fault injection can be very expensive and time consuming. This paper proposes the baseline for a new methodology, based on microprocessor error probability profiling, that aims at estimating fault-injection results without the need for a typical fault-injection setup. The proposed methodology is based on two main ideas: a one-time fault-injection analysis of the microprocessor architecture to characterize the probability of successful execution of each of its instructions in the presence of a soft error, and a static and very fast analysis of the control and data flow of the target software application to compute its probability of success. The presented work goes beyond the dependability evaluation problem; it also has the potential to become the backbone for new tools able to help engineers choose the best hardware and software architecture to structurally maximize the probability of a correct execution of the target software.
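A minimal sketch of the second ingredient, under the strong assumption that instruction outcomes are independent: given a per-instruction success probability (which the paper obtains from a one-time fault-injection characterization) and a dynamic instruction profile, the probability of a correct run of a straight-line trace is the product of the per-instruction probabilities. All names and numbers below are invented for illustration.

```python
# Minimal sketch under strong simplifying assumptions: if each executed
# instruction survives a soft-error-prone environment with an independently
# characterized probability, the probability that a straight-line trace
# completes correctly is the product of those probabilities.
# The per-opcode probabilities and the instruction mix are hypothetical.

def program_success_probability(instruction_counts, p_success_per_instruction):
    """instruction_counts: {opcode: times executed}
    p_success_per_instruction: {opcode: P(correct result under soft-error exposure)}"""
    p = 1.0
    for opcode, count in instruction_counts.items():
        p *= p_success_per_instruction[opcode] ** count
    return p

# Hypothetical per-opcode characterization and a hypothetical dynamic profile.
p_ok = {"add": 0.9999, "load": 0.9995, "store": 0.9996, "branch": 0.9998}
profile = {"add": 1200, "load": 800, "store": 400, "branch": 600}

print(program_success_probability(profile, p_ok))
```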
Fault-Tolerant Aggregation: Flow-Updating Meets Mass-Distribution
Flow-Updating (FU) is a fault-tolerant technique that has proved to be
efficient in practice for the distributed computation of aggregate functions in
communication networks where individual processors do not have access to global
information. Previous distributed aggregation protocols, based on repeated
sharing of input values (or mass) among processors, sometimes called
Mass-Distribution (MD) protocols, are not resilient to communication failures
(or message loss) because such failures yield a loss of mass. In this paper, we
present a protocol which we call Mass-Distribution with Flow-Updating (MDFU).
We obtain MDFU by applying FU techniques to classic MD. We analyze the
convergence time of MDFU showing that stochastic message loss produces low
overhead. This is the first convergence proof of an FU-based algorithm. We
evaluate MDFU experimentally, comparing it with previous MD and FU protocols,
and verifying the behavior predicted by the analysis. Finally, given that MDFU
incurs a fixed deviation proportional to the message-loss rate, we adjust the
accuracy of MDFU heuristically in a new protocol called MDFU with Linear
Prediction (MDFU-LP). The evaluation shows that both MDFU and MDFU-LP behave
very well in practice, even under high rates of message loss and even when the input values change dynamically.
Comment: 18 pages, 5 figures. To appear in OPODIS 201
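For readers unfamiliar with FU, the sketch below is a simplified, synchronous rendering of the flow-updating idea for distributed averaging; it is not MDFU or MDFU-LP, and the topology and round count are arbitrary assumptions.

```python
# Simplified, synchronous rendering of the Flow-Updating (FU) idea for
# distributed averaging; not the MDFU/MDFU-LP protocol of the paper.
# Unlike mass-distribution protocols, nodes never hand over "mass": each
# estimate is always re-derived from the unchanged local input and the flows,
# which is why message loss cannot permanently destroy the aggregate.

def flow_updating(values, neighbors, rounds=50):
    n = len(values)
    # f[i][j]: flow node i accounts as sent towards neighbor j
    f = [{j: 0.0 for j in neighbors[i]} for i in range(n)]
    est = list(values)
    for _ in range(rounds):
        # every node sends (flow towards j, current estimate); delivery is reliable here
        msgs = [{j: (f[i][j], est[i]) for j in neighbors[i]} for i in range(n)]
        for i in range(n):
            for j in neighbors[i]:
                flow_from_j, _ = msgs[j][i]
                f[i][j] = -flow_from_j                  # adopt the symmetric view of the flow
            e_i = values[i] - sum(f[i].values())        # estimate re-derived from input and flows
            recv_est = [msgs[j][i][1] for j in neighbors[i]]
            a = (e_i + sum(recv_est)) / (len(neighbors[i]) + 1)   # local average of estimates
            for j, e_j in zip(neighbors[i], recv_est):
                f[i][j] += a - e_j                      # nudge each neighbor's estimate towards a
            est[i] = a
    return est

if __name__ == "__main__":
    vals = [10.0, 0.0, 4.0, 6.0]
    nbrs = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # 4-node ring
    print(flow_updating(vals, nbrs))   # all estimates approach the true average 5.0
```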
Transient analysis of manufacturing system performance
Includes bibliographical references (p. 28-34). Supported by the INDO-US Science and Technology Fellowship Program. Y. Narahari, N. Viswanadham.
Approximate performability and dependability analysis using generalized stochastic Petri Nets
Since current-day fault-tolerant and distributed computer and communication systems tend to be large and complex, their corresponding performability models will suffer from the same characteristics. Therefore, calculating performability measures from these models is a difficult and time-consuming task.
To alleviate the largeness and complexity problem to some extent, we use generalized stochastic Petri nets to describe the models and to automatically generate the underlying Markov reward models. Still, however, many models cannot be solved with current numerical techniques, although they are conveniently and often compactly described.
In this paper we discuss two heuristic state-space truncation techniques that allow us to obtain very good approximations of the steady-state performability while only assessing a few percent of the states of the untruncated model. For a class of reversible models we derive explicit lower and upper bounds on the exact steady-state performability. For a much wider class of models a truncation theorem exists that allows one to obtain bounds on the error made in the truncation. We discuss this theorem in the context of approximate performability models and comment on its applicability. For all the proposed truncation techniques we present examples showing their usefulness.
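As a small numerical illustration of steady-state performability and of what state-space truncation does (not the paper's GSPN-based tooling), the sketch below solves a machine-repairman CTMC exactly and then again with the unlikely "many failures" states removed; the model and rates are invented for illustration.

```python
# Illustration only: steady-state performability of a small machine-repairman
# CTMC, computed exactly and from a heuristically truncated state space.
# Model structure, rates, and truncation point are invented assumptions.

import numpy as np

def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for a CTMC generator matrix Q."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1); b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def machine_repair_generator(n_machines, fail_rate, repair_rate):
    """State k = number of failed machines, 0..n_machines (a birth-death chain)."""
    n = n_machines + 1
    Q = np.zeros((n, n))
    for k in range(n_machines):
        Q[k, k + 1] = (n_machines - k) * fail_rate   # one more machine fails
        Q[k + 1, k] = repair_rate                    # the single repairman fixes one
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

N, lam, mu = 8, 0.05, 1.0
Q = machine_repair_generator(N, lam, mu)
reward = np.array([N - k for k in range(N + 1)])     # operational machines in state k
exact = steady_state(Q) @ reward

K = 4                                                # keep only states with <= 3 failed machines
Qt = Q[:K, :K].copy()
np.fill_diagonal(Qt, 0)
np.fill_diagonal(Qt, -Qt.sum(axis=1))                # re-normalize rows after truncation
approx = steady_state(Qt) @ reward[:K]

print(exact, approx)   # the truncated model assesses far fewer states yet stays close
```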
On cost-effective reuse of components in the design of complex reconfigurable systems
Design strategies that benefit from the reuse of system components can reduce costs while maintaining or increasing dependability—we use the term dependability to tie together reliability and availability. D3H2 (aDaptive Dependable Design for systems with Homogeneous and Heterogeneous redundancies) is a methodology that supports the design of complex systems with a focus on reconfiguration and component reuse. D3H2 systematizes the identification of heterogeneous redundancies and optimizes the design of fault detection and reconfiguration mechanisms, by enabling the analysis of design alternatives with respect to dependability and cost. In this paper, we extend D3H2 for application to repairable systems. The method is extended with analysis capabilities allowing dependability assessment of complex reconfigurable systems. Analysed scenarios include time-dependencies between failure events and the corresponding reconfiguration actions. We demonstrate how D3H2 can support decisions about fault detection and reconfiguration that seek to improve dependability while reducing costs via application to a realistic railway case study
Techniques for the Fast Simulation of Models of Highly Dependable Systems
With the ever-increasing complexity and requirements of highly dependable systems, their evaluation during design and operation is becoming more crucial. Realistic models of such systems are often not amenable to analysis using conventional analytic or numerical methods. Therefore, analysts and designers turn to simulation to evaluate these models. However, accurate estimation of dependability measures of these models requires that the simulation frequently observes system failures, which are rare events in highly dependable systems. This renders ordinary simulation impractical for evaluating such systems. To overcome this problem, simulation techniques based on importance sampling have been developed, and are very effective in certain settings. When importance sampling works well, simulation run lengths can be reduced by several orders of magnitude when estimating transient as well as steady-state dependability measures. This paper reviews some of the importance-sampling techniques that have been developed in recent years to estimate dependability measures efficiently in Markov and non-Markov models of highly dependable systems.
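The sketch below illustrates one such importance-sampling idea, simple failure biasing, on an invented three-component repairable system; it is meant only to convey the variance-reduction mechanism and is not taken from the paper.

```python
# Illustration of importance sampling with simple "failure biasing" on an
# invented repairable system: estimate the probability that all 3 components
# are down before the system returns to the all-up state, starting right
# after the first failure. Rates and biasing factor are arbitrary assumptions.

import random

LAM, MU, N = 0.01, 1.0, 3     # per-component failure rate, repair rate, components

def p_fail_next(k):
    """Embedded jump chain from state k = number of failed components."""
    fail = (N - k) * LAM
    return fail / (fail + MU)             # probability the next event is a failure

def run(biased):
    """One cycle from state 1; return the importance-sampling weight
    (0.0 if the system returns to the all-up state first)."""
    k, weight = 1, 1.0
    while 0 < k < N:
        p = p_fail_next(k)
        q = 0.5 if biased else p          # failure biasing: make failures likely
        if random.random() < q:
            weight *= p / q
            k += 1
        else:
            weight *= (1 - p) / (1 - q)
            k -= 1
    return weight if k == N else 0.0

def estimate(biased, samples=100_000):
    return sum(run(biased) for _ in range(samples)) / samples

random.seed(0)
print("crude Monte Carlo:  ", estimate(False))   # mostly zeros: the event is rare
print("importance sampling:", estimate(True))    # same mean, far lower variance
```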