How Errors in Component Reliability Affect System Reliability
This paper studies how sampling variation in component reliability estimates propagates into system reliability computations that use those estimates as input. Results show that the relative bias in system reliability grows quadratically with the number of components to which a single component reliability estimate is applied, whereas the corresponding coefficient of variation grows linearly with that number. When the components are in parallel, the reuse understates system reliability; in series, it overstates it. The paper describes resampling schemes that eliminate the bias without increasing the dominant variance term.
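The direction of both effects follows from Jensen's inequality, which a minimal Monte Carlo sketch can make concrete (all parameters below are illustrative assumptions, not the paper's experiment): one component reliability is estimated from n pass/fail trials, and the same estimate is plugged in for all k components of a series and of a parallel system.

```python
import numpy as np

# Illustrative sketch: reuse a single estimated reliability p_hat
# for all k components of a system. p_true, n, k are assumed values.
rng = np.random.default_rng(0)
p_true, n, k, reps = 0.6, 20, 4, 200_000

p_hat = rng.binomial(n, p_true, size=reps) / n   # estimate from n pass/fail trials
plug_series = p_hat ** k                         # series: all k must work
plug_parallel = 1 - (1 - p_hat) ** k             # parallel: at least one works

print(f"series   true {p_true**k:.4f}  plug-in mean {plug_series.mean():.4f}")        # overstated
print(f"parallel true {1-(1-p_true)**k:.4f}  plug-in mean {plug_parallel.mean():.4f}") # understated
```

Because x^k and (1-x)^k are both convex, the plug-in series reliability is biased upward and the plug-in parallel reliability downward, matching the abstract's claim.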
Cross-layer system reliability assessment framework for hardware faults
System reliability estimation during early design phases facilitates informed decisions about integrating effective protection mechanisms against different classes of hardware faults. When not all system abstraction layers (technology, circuit, microarchitecture, software) are factored into such an estimation model, the resulting reliability reports are bound to be excessively pessimistic and thus lead to unacceptably expensive, over-designed systems. We propose a scalable, cross-layer methodology and a supporting suite of tools for accurate yet fast estimation of computing system reliability. The backbone of the methodology is a component-based Bayesian model, which calculates system reliability from the masking probabilities of individual hardware and software components while taking their complex interactions into account. Our detailed experimental evaluation across different technologies, microarchitectures, and benchmarks demonstrates that the proposed model delivers very accurate reliability estimates (FIT rates) compared to statistically significant but slow fault-injection campaigns at the microarchitecture level.
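As a toy illustration of the cross-layer derating idea (an assumed structure with made-up numbers, not the authors' Bayesian model or tool): each component's raw technology-level FIT rate is derated by the probability that a fault is masked at each successive layer, and the surviving contributions are summed.

```python
# Assumed, illustrative numbers throughout; component names are hypothetical.
raw_fit = {"alu": 120.0, "regfile": 300.0, "cache": 900.0}  # technology-level FIT
masking = {  # probability a raw fault is masked at each layer
    "alu":     {"circuit": 0.40, "uarch": 0.60, "software": 0.50},
    "regfile": {"circuit": 0.30, "uarch": 0.80, "software": 0.40},
    "cache":   {"circuit": 0.20, "uarch": 0.90, "software": 0.70},
}

def effective_fit(component):
    derate = 1.0
    for layer_masking in masking[component].values():
        derate *= (1.0 - layer_masking)  # only unmasked faults propagate upward
    return raw_fit[component] * derate

system_fit = sum(effective_fit(c) for c in raw_fit)
print(f"system FIT estimate: {system_fit:.1f}")
```

Ignoring any of the masking layers (dropping a factor above) can only raise the estimate, which is the over-pessimism the abstract warns about.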
Assessing Asymmetric Fault-Tolerant Software
The most popular forms of fault tolerance against design faults use "asymmetric" architectures, in which a "primary" part performs the computation and a "secondary" part is in charge of detecting errors and performing some kind of error processing and recovery. In contrast, the most studied forms of software fault tolerance are "symmetric" ones, e.g. N-version programming. The latter are often controversial; the former are not. We discuss how to assess the dependability gains achieved by these methods. Substantial difficulties have been shown to exist for symmetric schemes, but we show that the same difficulties affect asymmetric schemes; indeed, the latter present somewhat subtler problems. In both cases, knowing the dependability of the individual components is not enough to predict the dependability of the fault-tolerant system. We extend to asymmetric architectures the style of probabilistic modeling that has been useful for describing the dependability of "symmetric" architectures, to highlight the factors that complicate the assessment. In the light of these models, we finally discuss fault-injection approaches to estimating coverage factors. We highlight the limits of what can be predicted and some useful research directions towards clarifying and extending the range of situations in which estimates of the coverage of fault-tolerance mechanisms can be trusted.
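The flavor of the probabilistic argument can be sketched as follows (all numbers are assumptions for illustration): under a naive independence assumption, the probability of an undetected failure is the primary's failure probability times the checker's miss rate; but if the demands that defeat the primary are also the ones on which the checker's coverage is worse, the naive estimate can be off by an order of magnitude.

```python
# Illustrative sketch of an asymmetric primary/checker model; all
# probabilities are assumed values, not estimates from the paper.
p_primary_fails = 1e-3   # probability the primary produces a wrong result
coverage = 0.99          # average probability the checker catches a wrong result

# Naive prediction: failure goes undetected only when the checker misses it,
# assuming the checker behaves the same on all demands.
p_undetected_naive = p_primary_fails * (1.0 - coverage)

# Dependence: "difficult" demands tend to defeat both parts, so coverage
# conditional on the primary failing may be lower than average coverage.
coverage_given_failure = 0.90  # assumed conditional coverage
p_undetected_dependent = p_primary_fails * (1.0 - coverage_given_failure)

print(p_undetected_naive, p_undetected_dependent)  # 1e-05 vs 1e-04
```

This is exactly why component dependabilities alone do not determine system dependability: the conditional coverage, not the average coverage, is what a fault-injection campaign would need to estimate.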
A research review of quality assessment for software
Measures were recommended to assess the quality of software submitted to the AdaNet program. The quality factors that are important to software reuse are explored, and methods of evaluating those factors are discussed. The quality factors important to software reuse are: correctness, reliability, verifiability, understandability, modifiability, and certifiability. Certifiability is included because documenting many factors about a software component, such as its efficiency, portability, and development history, constitutes a class of factors that are important to some users, not important at all to others, and impossible for AdaNet to distinguish between a priori. The quality factors may be assessed in different ways. There are a few quantitative measures that have been shown to indicate software quality. However, many factors are believed to indicate quality yet have not been empirically validated, owing to their subjective nature. These subjective factors are characterized by the way in which they support the software engineering principles of abstraction, information hiding, modularity, localization, confirmability, uniformity, and completeness.
Improving Software Performance in the Compute Unified Device Architecture
This paper analyzes several aspects of improving software performance for applications written in the Compute Unified Device Architecture (CUDA). We address an issue of great importance when programming a CUDA application: the Graphics Processing Unit's (GPU's) memory management, through transpose kernels. We also benchmark and evaluate the performance of progressively optimizing a matrix-transposing application in CUDA. One particular interest was to research how well the optimization techniques applied to software applications written in CUDA scale to the latest generation of general-purpose graphics processing units (GPGPUs), such as the Fermi architecture implemented in the GTX480, compared with the previous architecture implemented in the GTX280. Lately, there has been a lot of interest in the literature in this type of optimization analysis, but none of the works so far (to the best of our knowledge) has tried to validate whether the optimizations apply to a GPU from the latest Fermi architecture and how well the Fermi architecture scales to these performance-improving techniques.
Keywords: Compute Unified Device Architecture, Fermi Architecture, Naive Transpose, Coalesced Transpose, Shared Memory Copy, Loop in Kernel, Loop over Kernel
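For reference, the coalesced-transpose pattern named in the keywords looks roughly like the sketch below. It uses Numba's CUDA backend rather than the authors' CUDA C code, and the 32-wide tile with a +1 padding column (which avoids shared-memory bank conflicts) follows the standard technique rather than the paper's exact configuration.

```python
import numpy as np
from numba import cuda, float32

TILE = 32  # threads per block edge; a common choice, not necessarily the paper's

@cuda.jit
def transpose_coalesced(a, out):
    # Stage a tile in shared memory; the extra column avoids bank conflicts.
    tile = cuda.shared.array(shape=(32, 33), dtype=float32)
    x = cuda.blockIdx.x * TILE + cuda.threadIdx.x
    y = cuda.blockIdx.y * TILE + cuda.threadIdx.y
    if y < a.shape[0] and x < a.shape[1]:
        tile[cuda.threadIdx.y, cuda.threadIdx.x] = a[y, x]
    cuda.syncthreads()
    # Swap block indices so both the global read above and the global
    # write below touch consecutive addresses within a warp (coalesced).
    x = cuda.blockIdx.y * TILE + cuda.threadIdx.x
    y = cuda.blockIdx.x * TILE + cuda.threadIdx.y
    if y < out.shape[0] and x < out.shape[1]:
        out[y, x] = tile[cuda.threadIdx.x, cuda.threadIdx.y]

a = np.arange(1024 * 1024, dtype=np.float32).reshape(1024, 1024)
d_out = cuda.device_array((1024, 1024), dtype=np.float32)
transpose_coalesced[(32, 32), (TILE, TILE)](cuda.to_device(a), d_out)
assert np.array_equal(d_out.copy_to_host(), a.T)
```

The naive transpose reads rows and writes columns directly, so one of the two global accesses is always strided; staging through shared memory is what lets both sides be coalesced.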
A compositional method for reliability analysis of workflows affected by multiple failure modes
We focus on reliability analysis for systems designed as workflow-based compositions of components. Components are characterized by their failure profiles, which take into account possible multiple failure modes. A compositional calculus is provided to evaluate the failure profile of a composite system, given the failure profiles of its components. The calculus is described as a syntax-driven procedure that synthesizes a workflow's failure profile. The method is intended as a design-time aid that can help software engineers reason about system reliability in the early stages of development. A simple case study is presented to illustrate the proposed approach.
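A minimal sketch of what one such compositional step could look like (the representation, the operator, and the component names are assumptions for illustration, not the paper's calculus): a failure profile maps each failure mode to its occurrence probability, and sequential composition scales the second component's modes by the first component's success probability.

```python
# Illustrative sketch: failure profiles as mode -> probability dicts;
# the remaining probability mass is correct completion.

def seq(p1, p2):
    """Profile of 'run component 1, then component 2'.

    Component 2 only runs if component 1 completed correctly, so its
    mode probabilities are scaled by component 1's success probability.
    """
    ok1 = 1.0 - sum(p1.values())
    modes = set(p1) | set(p2)
    return {m: p1.get(m, 0.0) + ok1 * p2.get(m, 0.0) for m in modes}

fetch = {"timeout": 0.02, "wrong_value": 0.005}   # assumed profiles
store = {"timeout": 0.01, "data_loss": 0.001}

profile = seq(fetch, store)
print(profile)                                    # per-mode probabilities
print("reliability:", 1 - sum(profile.values()))  # 0.975 * 0.989
```

Other workflow constructs (choice, parallel branches, retries) would each get their own operator over profiles, which is what makes the procedure syntax-driven.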
- …