692 research outputs found

    Developing a distributed electronic health-record store for India

    The DIGHT project addresses the problem of building a scalable and highly available information store for the Electronic Health Records (EHRs) of India's more than one billion citizens.

    Evaluation of Resiliency in a Wide-area Backup Protection System via Model Checking

    Modern civilization relies heavily on access to reliable power sources. Recent history has shown that present-day protection systems are not adequate. Numerous backup protection (BP) systems have been proposed to mitigate the impact of primary protection system failures. Many of these novel BP systems rely on autonomous agents communicating via wide-area networks. These systems are highly complex, and their control logic is based on distributed computing. Model checking has been shown to be a powerful tool for analyzing the behavior of distributed systems. In this research, the model checker SPIN is used to evaluate the resiliency of an agent-based wide-area backup protection (WABP) system. All combinations of WABP system component malfunctions that lead to system failure are identified and classified. The results indicate that the WABP system evaluated is more resilient to component malfunctions than previously reported. Possible improvements to the WABP system are also introduced.

    Enhancing Program Soft Error Resilience through Algorithmic Approaches

    The rising count and shrinking feature size of transistors within modern computers are making them increasingly vulnerable to various types of soft faults. This problem is especially acute in high-performance computing (HPC) systems used for scientific computing, because these systems include many thousands of compute cores and nodes, all of which may be utilized in a single large-scale run. The increasing vulnerability of HPC applications to errors induced by soft faults is motivating extensive work on techniques to make these applications more resilient to such faults, ranging from generic techniques such as replication or checkpoint/restart to algorithm-specific error detection and tolerance techniques. Effective use of such techniques requires a detailed understanding of how a given application is affected by soft faults to ensure that (i) efforts to improve application resilience are spent in the code regions most vulnerable to faults, (ii) the appropriate resilience technique is applied to each code region, and (iii) the understanding is obtained in an efficient manner. This thesis presents two tools: FaultTelescope helps application developers view the vulnerability of routines and whole applications to soft errors, while ErrorSight helps perform modular fault characteristics analysis for more complex applications. This thesis also illustrates how these tools can be used in the context of representative applications and kernels. In addition to providing actionable insights into application behavior, the tools automatically select the number of fault injection experiments required to efficiently generate error profiles of an application, ensuring that the information is statistically well-grounded without performing unnecessary experiments.