47,316 research outputs found
Recommended from our members
Assessing Asymmetric Fault-Tolerant Software
The most popular forms of fault tolerance against design faults use "asymmetric" architectures in which a "primary" part performs the computation and a "secondary" part is in charge of detecting errors and performing some kind of error processing and recovery. In contrast, the most studied forms of software fault tolerance are "symmetric" ones, e.g. N-version programming. The latter are often controversial, the former are not. We discuss how to assess the dependability gains achieved by these methods. Substantial difficulties have been shown to exist for symmetric schemes, but we show that the same difficulties affect asymmetric schemes. Indeed, the latter present somewhat subtler problems. In both cases, to predict the dependability of the fault-tolerant system it is not enough to know the dependability of the individual components. We extend to asymmetric architectures the style of probabilistic modeling that has been useful for describing the dependability of "symmetric" architectures, to highlight factors that complicate the assessment. In the light of these models, we finally discuss fault injection approaches to estimating coverage factors. We highlight the limits of what can be predicted and some useful research directions towards clarifying and extending the range of situations in which estimates of coverage of fault tolerance mechanisms can be trusted
Fault-tolerance techniques for hybrid CMOS/nanoarchitecture
The authors propose two fault-tolerance techniques for hybrid CMOS/nanoarchitecture implementing logic functions as look-up tables. The authors compare the efficiency of the proposed techniques with recently reported methods that use single coding schemes in tolerating high fault rates in nanoscale fabrics. Both proposed techniques are based on error correcting codes to tackle different fault rates. In the first technique, the authors implement a combined two-dimensional coding scheme using Hamming and Bose-Chaudhuri-Hocquenghem (BCH) codes to address fault rates greater than 5. In the second technique, Hamming coding is complemented with bad line exclusion technique to tolerate fault rates higher than the first proposed technique (up to 20). The authors have also estimated the improvement that can be achieved in the circuit reliability in the presence of Don-t Care Conditions. The area, latency and energy costs of the proposed techniques were also estimated in the CMOS domain
Recommended from our members
Fault tolerance via diversity for off-the-shelf products: A study with SQL database servers
If an off-the-shelf software product exhibits poor dependability due to design faults, then software fault tolerance is often the only way available to users and system integrators to alleviate the problem. Thanks to low acquisition costs, even using multiple versions of software in a parallel architecture, which is a scheme formerly reserved for few and highly critical applications, may become viable for many applications. We have studied the potential dependability gains from these solutions for off-the-shelf database servers. We based the study on the bug reports available for four off-the-shelf SQL servers plus later releases of two of them. We found that many of these faults cause systematic noncrash failures, which is a category ignored by most studies and standard implementations of fault tolerance for databases. Our observations suggest that diverse redundancy would be effective for tolerating design faults in this category of products. Only in very few cases would demands that triggered a bug in one server cause failures in another one, and there were no coincident failures in more than two of the servers. Use of different releases of the same product would also tolerate a significant fraction of the faults. We report our results and discuss their implications, the architectural options available for exploiting them, and the difficulties that they may present
Toward the assessment of the susceptibility of a digital system to lightning upset
Accomplishments and directions for further research aimed at developing methods for assessing a candidate design of an avionic computer with respect to susceptability to lightning upset are reported. Emphasis is on fault tolerant computers. Both lightning stress and shielding are covered in a review of the electromagnetic environment. Stress characterization, system characterization, upset detection, and positive and negative design features are considered. A first cut theory of comparing candidate designs is presented including tests of comparative susceptability as well as its analysis and simulation. An approach to lightning induced transient fault effects is included
Application of advanced technology to space automation
Automated operations in space provide the key to optimized mission design and data acquisition at minimum cost for the future. The results of this study strongly accentuate this statement and should provide further incentive for immediate development of specific automtion technology as defined herein. Essential automation technology requirements were identified for future programs. The study was undertaken to address the future role of automation in the space program, the potential benefits to be derived, and the technology efforts that should be directed toward obtaining these benefits
- …