
    Producing Scheduling that Causes Concurrent Programs to Fail

    A noise maker is a tool that seeds a concurrent program with conditional synchronization primitives (such as yield()) to increase the likelihood that a bug manifests itself. This work explores the theory and practice of choosing where in the program to induce such thread switches at runtime. We introduce a novel fault model that classifies locations as "good", "neutral", or "bad", based on the effect of a thread switch at the location. Using the model, we explore the terms in which an efficient search for real-life concurrent bugs can be carried out. We accordingly justify the use of probabilistic algorithms for this search and gain deeper insight into the work done so far on noise-making. We validate our approach by experimenting with a set of programs taken from a publicly available multi-threaded benchmark. Our empirical evidence demonstrates that real-life behavior is similar to what our model predicts.
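    The idea of probabilistically inducing thread switches at chosen program locations can be illustrated with a minimal sketch. It is not the tool described in the abstract: the `NoiseMaker` class, the location name, and the per-location probabilities are all hypothetical, and `time.sleep(0)` stands in for yield() as a hint that lets other threads run.

```python
import random
import threading
import time
from collections import Counter


class NoiseMaker:
    """Hypothetical noise maker: may force a thread switch at named locations."""

    def __init__(self, probabilities, default=0.0, seed=None):
        # Assumed model: each location has its own probability of inducing a switch.
        self.probabilities = dict(probabilities)
        self.default = default
        self.rng = random.Random(seed)
        self.switches = Counter()
        self._lock = threading.Lock()  # protects the shared RNG and the counter

    def maybe_yield(self, location):
        """Yield the processor at `location` with that location's probability."""
        p = self.probabilities.get(location, self.default)
        with self._lock:
            fire = self.rng.random() < p
            if fire:
                self.switches[location] += 1
        if fire:
            time.sleep(0)  # give other threads a chance to run
        return fire


# A read-modify-write with an instrumented location in the middle.
noise = NoiseMaker({"between_read_and_write": 1.0}, seed=0)


def racy_increment(shared):
    value = shared["count"]
    noise.maybe_yield("between_read_and_write")  # a switch here can expose a lost update
    shared["count"] = value + 1
```

    In the abstract's terms, a location like the one between the read and the write would be a candidate "good" location, since a switch there makes a lost update more likely when several threads call `racy_increment` concurrently. How probabilities should be assigned to locations is exactly what the paper studies and is not settled by this sketch.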

    Tools for distributed application management

    Distributed application management consists of monitoring and controlling an application as it executes in a distributed environment. It encompasses such activities as configuration, initialization, performance monitoring, resource scheduling, and failure response. The Meta system, a collection of tools for constructing distributed application management software, is described. Meta provides the mechanism, while the programmer specifies the policy for application management. The policy is manifested as a control program, which is a soft real-time reactive program. The underlying application is instrumented with a variety of built-in and user-defined sensors and actuators. These define the interface between the control program and the application. The control program also has access to a database describing the structure of the application and the characteristics of its environment. Some of the more difficult problems for application management occur when pre-existing, nondistributed programs are integrated into a distributed application for which they may not have been intended. Meta allows management functions to be retrofitted to such programs with a minimum of effort.

    A project to investigate mechanisms and methodologies for the design and construction of communicating concurrent processes in real-time environments

    Research undertaken in 1979 into effective and appropriate mechanisms to aid in the design and construction of software for use in NASA's flight research programs is presented.

    Synthesis and Stochastic Assessment of Cost-Optimal Schedules

    We present a novel approach to synthesize good schedules for a class of scheduling problems that is slightly more general than the scheduling problem FJm,a|gpr,r_j,d_j|early/tardy. The idea is to prime the schedule synthesizer with stochastic information more meaningful than performance factors, with the objective of minimizing the expected cost caused by storage or delay. The priming information is obtained by stochastic simulation of the system environment. The generated schedules are in turn assessed by simulation. The approach is demonstrated by means of a non-trivial scheduling problem from lacquer production. The experimental results show that our approach achieves better results than the extended processing times approach in all considered scenarios.

    Pregelix: Big(ger) Graph Analytics on A Dataflow Engine

    There is a growing need for distributed graph processing systems that are capable of gracefully scaling to very large graph datasets. Unfortunately, this challenge has not been easily met due to the intense memory pressure imposed by the process-centric, message-passing designs that many graph processing systems follow. Pregelix is a new open source distributed graph processing system based on an iterative dataflow design that is better tuned to handle both in-memory and out-of-core workloads. As such, Pregelix offers improved performance characteristics and scaling properties over current open source systems (e.g., we have seen up to 15x speedup compared to Apache Giraph and up to 35x speedup compared to distributed GraphLab), and makes more effective use of available machine resources to support Big(ger) Graph Analytics.

    Building Resilient Cloud Over Unreliable Commodity Infrastructure

    Cloud Computing has emerged as a successful computing paradigm for efficiently utilizing managed compute infrastructure such as high-speed rack-mounted servers connected with high-speed networking and reliable storage. Usually such infrastructure is dedicated, physically secured, and has reliable power and networking infrastructure. However, much of our idle compute capacity is present in unmanaged infrastructure like idle desktops, lab machines, physically distant server machines, and laptops. We present a scheme to utilize this idle compute capacity on a best-effort basis and provide high availability even in the face of failure of individual components or facilities. We run virtual machines on the commodity infrastructure and present a cloud interface to our end users. The primary challenge is to maintain availability in the presence of node failures, network failures, and power failures. We run multiple copies of a Virtual Machine (VM) redundantly on geographically dispersed physical machines to achieve availability. If one of the running copies of a VM fails, we seamlessly switch over to another running copy. We use Virtual Machine Record/Replay capability to implement this redundancy and switchover. So far, we have implemented VM Record/Replay for uniprocessor machines over Linux/KVM and are currently working on VM Record/Replay on shared-memory multiprocessor machines. We report initial experimental results based on our implementation. Comment: Oral presentation at IEEE "Cloud Computing for Emerging Markets", Oct. 11-12, 2012, Bangalore, India

    Study of fault-tolerant software technology

    Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the implications for computer architecture and design (hardware, operating systems, and programming languages, including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research state. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault-tolerance.