670 research outputs found

    Software dependability techniques validated via fault injection experiments

    Get PDF
    The present paper proposes a C/C++ source-to-source compiler able to increase the dependability properties of a given application. The adopted strategy is based on two main techniques: variable duplication/triplication and control flow checking. The validation of these techniques is based on the emulation of fault appearance by software fault injection. The chosen test case is a client-server application in charge of calculating and drawing a Mandelbrot fracta

    Validation of a software dependability tool via fault injection experiments

    Get PDF
    Presents the validation of the strategies employed in the RECCO tool to analyze a C/C++ software; the RECCO compiler scans C/C++ source code to extract information about the significance of the variables that populate the program and the code structure itself. Experimental results gathered on an Open Source Router are used to compare and correlate two sets of critical variables, one obtained by fault injection experiments, and the other applying the RECCO tool, respectively. Then the two sets are analyzed, compared, and correlated to prove the effectiveness of RECCO's methodology

    Software dependability modeling using an industry-standard architecture description language

    Full text link
    Performing dependability evaluation along with other analyses at architectural level allows both making architectural tradeoffs and predicting the effects of architectural decisions on the dependability of an application. This paper gives guidelines for building architectural dependability models for software systems using the AADL (Architecture Analysis and Design Language). It presents reusable modeling patterns for fault-tolerant applications and shows how the presented patterns can be used in the context of a subsystem of a real-life application

    Reliability Analysis of Concurrent Systems using LTSA

    Get PDF
    The analysis for software dependability is considered an important task within the software engineering life cycle. However, it is often impossible to carry out this task due to the complexity of available tools, lack of expert personnel and time-to-market pressures. As a result, released software versions may present unverified dependability properties subjecting customers to blind software reliability assessment. In particular, concurrent systems present certain behaviour that require a more complex system analysis not easily grasped at system design and architecture level

    Software dependability in the Tandem GUARDIAN system

    Get PDF
    Based on extensive field failure data for Tandem's GUARDIAN operating system this paper discusses evaluation of the dependability of operational software. Software faults considered are major defects that result in processor failures and invoke backup processes to take over. The paper categorizes the underlying causes of software failures and evaluates the effectiveness of the process pair technique in tolerating software faults. A model to describe the impact of software faults on the reliability of an overall system is proposed. The model is used to evaluate the significance of key factors that determine software dependability and to identify areas for improvement. An analysis of the data shows that about 77% of processor failures that are initially considered due to software are confirmed as software problems. The analysis shows that the use of process pairs to provide checkpointing and restart (originally intended for tolerating hardware faults) allows the system to tolerate about 75% of reported software faults that result in processor failures. The loose coupling between processors, which results in the backup execution (the processor state and the sequence of events) being different from the original execution, is a major reason for the measured software fault tolerance. Over two-thirds (72%) of measured software failures are recurrences of previously reported faults. Modeling, based on the data, shows that, in addition to reducing the number of software faults, software dependability can be enhanced by reducing the recurrence rate

    Principles of Antifragile Software

    Full text link
    The goal of this paper is to study and define the concept of "antifragile software". For this, I start from Taleb's statement that antifragile systems love errors, and discuss whether traditional software dependability fits into this class. The answer is somewhat negative, although adaptive fault tolerance is antifragile: the system learns something when an error happens, and always imrpoves. Automatic runtime bug fixing is changing the code in response to errors, fault injection in production means injecting errors in business critical software. I claim that both correspond to antifragility. Finally, I hypothesize that antifragile development processes are better at producing antifragile software systems.Comment: see https://refuses.github.io

    Hierarchical Simulation to Assess Hardware and Software Dependability

    Get PDF
    This thesis presents a method for conducting hierarchical simulations to assess system hardware and software dependability. The method is intended to model embedded microprocessor systems. A key contribution of the thesis is the idea of using fault dictionaries to propagate fault effects upward from the level of abstraction where a fault model is assumed to the system level where the ultimate impact of the fault is observed. A second important contribution is the analysis of the software behavior under faults as well as the hardware behavior. The simulation method is demonstrated and validated in four case studies analyzing Myrinet, a commercial, high-speed networking system. One key result from the case studies shows that the simulation method predicts the same fault impact 87.5% of the time as is obtained by similar fault injections into a real Myrinet system. Reasons for the remaining discrepancy are examined in the thesis. A second key result shows the reduction in the number of simulations needed due to the fault dictionary method. In one case study, 500 faults were injected at the chip level, but only 255 propagated to the system level. Of these 255 faults, 110 shared identical fault dictionary entries at the system level and so did not need to be resimulated. The necessary number of system-level simulations was therefore reduced from 500 to 145. Finally, the case studies show how the simulation method can be used to improve the dependability of the target system. The simulation analysis was used to add recovery to the target software for the most common fault propagation mechanisms that would cause the software to hang. After the modification, the number of hangs was reduced by 60% for fault injections into the real system

    Reliability demonstration for safety-critical systems

    Get PDF
    This paper suggests a new model for reliability demonstration of safety-critical systems, based on the TRW Software Reliability Theory. The paper describes the model; the test equipment required and test strategies based on the various constraints occurring during software development. The paper also compares a new testing method, Single Risk Sequential Testing (SRST), with the standard Probability Ratio Sequential Testing method (PRST), and concludes that: ‱ SRST provides higher chances of success than PRST ‱ SRST takes less time to complete than PRST ‱ SRST satisfies the consumer risk criterion, whereas PRST provides a much smaller consumer risk than the requirement
    • 

    corecore