    Software dependability in the Tandem GUARDIAN system

    Based on extensive field failure data for Tandem's GUARDIAN operating system, this paper discusses evaluation of the dependability of operational software. Software faults considered are major defects that result in processor failures and invoke backup processes to take over. The paper categorizes the underlying causes of software failures and evaluates the effectiveness of the process pair technique in tolerating software faults. A model to describe the impact of software faults on the reliability of an overall system is proposed. The model is used to evaluate the significance of key factors that determine software dependability and to identify areas for improvement. An analysis of the data shows that about 77% of processor failures initially attributed to software are confirmed as software problems. The analysis shows that the use of process pairs to provide checkpointing and restart (originally intended for tolerating hardware faults) allows the system to tolerate about 75% of reported software faults that result in processor failures. The loose coupling between processors, which results in the backup execution (the processor state and the sequence of events) being different from the original execution, is a major reason for the measured software fault tolerance. Over two-thirds (72%) of measured software failures are recurrences of previously reported faults. Modeling, based on the data, shows that, in addition to reducing the number of software faults, software dependability can be enhanced by reducing the recurrence rate.

    Dependability in Federated Cloud Environments

    Cloud Computing has emerged as a large-scale distributed system model for utility computing, whereby services are supplied on demand. It has been proposed that Clouds are in the process of evolving from single, monolithic Clouds, such as EC2 or Microsoft Azure, serving many consumers to a federation of autonomous Clouds. However, there remain a number of research challenges in building dependable and robust Clouds, a critical research problem that has yet to be fully understood. This paper discusses the issues and challenges surrounding Cloud dependability, and outlines research areas of opportunity for improving the dependability and robustness of federated Clouds.

    Reproducibility of environment-dependent software failures: An experience report

    We investigate the dependence of software failure reproducibility on the environment in which the software is executed. The existence of such dependence is established in the literature, but it has not yet been fully characterized. In this paper we pinpoint some of the environmental components that can affect the reproducibility of a failure and show this influence through an experimental campaign conducted on the MySQL Server software system. The set of failures of interest is drawn from MySQL's failure reports database and an experiment is designed for each of these failures. The experiments expose the influence of disk usage and level of concurrency on MySQL failure reproducibility. Furthermore, the results show that high levels of disk usage and concurrency increase the probability of reproducing a failure.

    Automated Derivation of Application-Aware Error Detectors Using Compiler Analysis

    Coordinated Science Laboratory was formerly known as Control Systems Laboratory. National Science Foundation / NSF ACI CNS-040634 and NSF CNS 05-24695; Gigascale Systems Research Center; Motorola Corp.

    Safety Implications of Robotic Surgery: A Study of 13 Years of FDA Data on da Vinci Surgical Systems

    Robotic surgical systems are intended to enable surgeons to perform minimally invasive operations with increased vision, precision, dexterity, and control, and to reduce the rate of injuries, blood loss, length of hospital stay, and post-operative complications. Recently, concerns regarding the safety and effectiveness of robot-assisted surgeries have heightened as an increased number of adverse events associated with the surgical robots have been reported to the U.S. Food and Drug Administration (FDA). Our study focuses on the analysis of the adverse events and recalls of da Vinci surgical systems, collected by the FDA over a period of 13 years from 2000 to 2012. We use the data on deaths, injuries, and robot malfunctions, combined with the technical problems and corresponding recovery actions taken by the company (provided by the recalls), together with systematic accident analysis using a tool called CAST. Using an automated natural language parsing tool trained with domain-specific dictionaries and part-of-speech and negation taggers, we extracted valuable information on the potential causes of robotic accidents in order to understand the effectiveness of using robotic devices for different minimally invasive procedures. We found that despite the increasing number of procedures being done with the da Vinci surgical system, a significant number of malfunctions and system downtimes with potentially adverse impacts on patients are being experienced. We provide insights on the use of existing state-of-the-art technologies for enhancing safety in future robotic surgical systems. National Science Foundation (NSF CNS10-18503 CISE); IBM Corporation; Infosys Ltd