3,528 research outputs found

    CSP methods for identifying atomic actions in the design of fault tolerant concurrent systems

    Get PDF
    Limiting the extent of error propagation when faults occur and localizing the subsequent error recovery are common concerns in the design of fault tolerant parallel processing systems, Both activities are made easier if the designer associates fault tolerance mechanisms with the underlying atomic actions of the system, With this in mind, this paper has investigated two methods for the identification of atomic actions in parallel processing systems described using CSP, Explicit trace evaluation forms the basis of the first algorithm, which enables a designer to analyze interprocess communications and thereby locate atomic action boundaries in a hierarchical fashion, The second method takes CSP descriptions of the parallel processes and uses structural arguments to infer the atomic action boundaries. This method avoids the difficulties involved with producing full trace sets, but does incur the penalty of a more complex algorithm

    A development framework for artificial intelligence based distributed operations support systems

    Get PDF
    Advanced automation is required to reduce costly human operations support requirements for complex space-based and ground control systems. Existing knowledge based technologies have been used successfully to automate individual operations tasks. Considerably less progress has been made in integrating and coordinating multiple operations applications for unified intelligent support systems. To fill this gap, SOCIAL, a tool set for developing Distributed Artificial Intelligence (DAI) systems is being constructed. SOCIAL consists of three primary language based components defining: models of interprocess communication across heterogeneous platforms; models for interprocess coordination, concurrency control, and fault management; and for accessing heterogeneous information resources. DAI applications subsystems, either new or existing, will access these distributed services non-intrusively, via high-level message-based protocols. SOCIAL will reduce the complexity of distributed communications, control, and integration, enabling developers to concentrate on the design and functionality of the target DAI system itself

    Improving a data-acquisition software system with abstract data type components

    Get PDF
    Abstract data types and object-oriented design are active research areas in computer science and software engineering. Much of the interest is aimed at new software development. Abstract data type packages developed for a discontinued software project were used to improve a real-time data-acquisition system under maintenance. The result saved effort and contributed to a significant improvement in the performance, maintainability, and reliability of the Goldstone Solar System Radar Data Acquisition System

    An occam Style Communications System for UNIX Networks

    Get PDF
    This document describes the design of a communications system which provides occam style communications primitives under a Unix environment, using TCP/IP protocols, and any number of other protocols deemed suitable as underlying transport layers. The system will integrate with a low overhead scheduler/kernel without incurring significant costs to the execution of processes within the run time environment. A survey of relevant occam and occam3 features and related research is followed by a look at the Unix and TCP/IP facilities which determine our working constraints, and a description of the T9000 transputer's Virtual Channel Processor, which was instrumental in our formulation. Drawing from the information presented here, a design for the communications system is subsequently proposed. Finally, a preliminary investigation of methods for lightweight access control to shared resources in an environment which does not provide support for critical sections, semaphores, or busy waiting, is made. This is presented with relevance to mutual exclusion problems which arise within the proposed design. Future directions for the evolution of this project are discussed in conclusion

    A distributed camera system for multi-resolution surveillance

    Get PDF
    We describe an architecture for a multi-camera, multi-resolution surveillance system. The aim is to support a set of distributed static and pan-tilt-zoom (PTZ) cameras and visual tracking algorithms, together with a central supervisor unit. Each camera (and possibly pan-tilt device) has a dedicated process and processor. Asynchronous interprocess communications and archiving of data are achieved in a simple and effective way via a central repository, implemented using an SQL database. Visual tracking data from static views are stored dynamically into tables in the database via client calls to the SQL server. A supervisor process running on the SQL server determines if active zoom cameras should be dispatched to observe a particular target, and this message is effected via writing demands into another database table. We show results from a real implementation of the system comprising one static camera overviewing the environment under consideration and a PTZ camera operating under closed-loop velocity control, which uses a fast and robust level-set-based region tracker. Experiments demonstrate the effectiveness of our approach and its feasibility to multi-camera systems for intelligent surveillance

    Analysis of backward error recovery for concurrent processes with recovery blocks

    Get PDF
    Three different methods of implementing recovery blocks (RB's). These are the asynchronous, synchronous, and the pseudo recovery point implementations. Pseudo recovery points so that unbounded rollback may be avoided while maintaining process autonomy are proposed. Probabilistic models for analyzing these three methods under standard assumptions in computer performance analysis, i.e., exponential distributions for related random variables were developed. The interval between two successive recovery lines for asynchronous RB's mean loss in computation power for the synchronized method, and additional overhead and rollback distance in case PRP's are used were estimated

    Process Management in Distributed Operating Systems

    Get PDF
    As part of designing and building the Amoeba distributed operating system, we have come up with a simple set of mechanisms for process management that allows downloading process migration, checkpointing, remote debugging and emulation of alien operating system interfaces.\ud The basic process management facilities are realized by the Amoeba Kernel and can be augmented by user-space services: Debug Service, Load-Balancing Service, Unix-Emulation Service, Checkpoint Service, etc.\ud The Amoeba Kernel can produce a representation of the state of a process which can be given to another Kernel where it is accepted for continued execution. This state consists of the memory contents in the form of a collection of segments, and a Process Descriptor which contains the additional state, program counters, stack pointers, system call state, etc.\ud Careful separation of mechanism and policy has resulted in a compact set of Kernel operations for process creation and management. A collection of user-space services provides process management policies and a simple interface for application programs.\ud In this paper we shall describe the mechanisms as they are being implemented in the Amoeba Distributed System at the Centre for Mathematics and Computer Science in Amsterdam. We believe that the mechanisms described here can also apply to other distributed systems
    corecore