2,131 research outputs found

    An object based algebra for specifying a fault tolerant software architecture

    Get PDF
    AbstractIn this paper we present an algebra of actors extended with mechanisms to model crash failures and their detection. We show how this extended algebra of actors can be successfully used to specify distributed software architectures. The main components of a software architecture can be specified following an object-oriented style and then they can be composed using asynchronous message passing or more complex interaction patterns. This formal specification can be used to show that several requirements of a software system are satisfied at the architectural level despite failures. We illustrate this process by means of a case study: the specification of a software architecture for intelligent agents which supports a fault tolerant anonymous interaction protocol

    Survey of Distributed Decision

    Get PDF
    We survey the recent distributed computing literature on checking whether a given distributed system configuration satisfies a given boolean predicate, i.e., whether the configuration is legal or illegal w.r.t. that predicate. We consider classical distributed computing environments, including mostly synchronous fault-free network computing (LOCAL and CONGEST models), but also asynchronous crash-prone shared-memory computing (WAIT-FREE model), and mobile computing (FSYNC model)

    Deceit: A flexible distributed file system

    Get PDF
    Deceit, a distributed file system (DFS) being developed at Cornell, focuses on flexible file semantics in relation to efficiency, scalability, and reliability. Deceit servers are interchangeable and collectively provide the illusion of a single, large server machine to any clients of the Deceit service. Non-volatile replicas of each file are stored on a subset of the file servers. The user is able to set parameters on a file to achieve different levels of availability, performance, and one-copy serializability. Deceit also supports a file version control mechanism. In contrast with many recent DFS efforts, Deceit can behave like a plain Sun Network File System (NFS) server and can be used by any NFS client without modifying any client software. The current Deceit prototype uses the ISIS Distributed Programming Environment for all communication and process group management, an approach that reduces system complexity and increases system robustness

    The Impact of RDMA on Agreement

    Full text link
    Remote Direct Memory Access (RDMA) is becoming widely available in data centers. This technology allows a process to directly read and write the memory of a remote host, with a mechanism to control access permissions. In this paper, we study the fundamental power of these capabilities. We consider the well-known problem of achieving consensus despite failures, and find that RDMA can improve the inherent trade-off in distributed computing between failure resilience and performance. Specifically, we show that RDMA allows algorithms that simultaneously achieve high resilience and high performance, while traditional algorithms had to choose one or another. With Byzantine failures, we give an algorithm that only requires n2fP+1n \geq 2f_P + 1 processes (where fPf_P is the maximum number of faulty processes) and decides in two (network) delays in common executions. With crash failures, we give an algorithm that only requires nfP+1n \geq f_P + 1 processes and also decides in two delays. Both algorithms tolerate a minority of memory failures inherent to RDMA, and they provide safety in asynchronous systems and liveness with standard additional assumptions.Comment: Full version of PODC'19 paper, strengthened broadcast algorith

    Using mobility and exception handling to achieve mobile agents that survive server crash failures

    Get PDF
    Mobile agent technology, when designed and used effectively, can minimize bandwidth consumption and autonomously provide a snapshot of the current context of a distributed system. Protecting mobile agents from server crashes is a challenging issue, since developers normally have no control over remote servers. Server crash failures can leave replicas, instable storage, unavailable for an unknown time period. Furthermore, few systems have considered the need for using a fault tolerant protocol among a group of collaborating mobile agents. This thesis uses exception handling to protect mobile agents from server crash failures. An exception model is proposed for mobile agents and two exception handler designs are investigated. The first exists at the server that created the mobile agent and uses a timeout mechanism. The second, the mobile shadow scheme, migrates with the mobile agent and operates at the previous server visited by the mobile agent. A case study application has been developed to compare the performance of the two exception handler designs. Performance results demonstrate that although the second design is slower it offers the smaller trip time when handling a server crash. Furthermore, no modification of the server environment is necessary. This thesis shows that the mobile shadow exception handling scheme reduces complexity for a group of mobile agents to survive server crashes. The scheme deploys a replica that monitors the server occupied by the master, at each stage of the itinerary. The replica exists at the previous server visited in the itinerary. Consequently, each group member is a single fault tolerant entity with respect to server crash failures. Other schemes introduce greater complexity and performance overheads since, for each stage of the itinerary, a group of replicas is sent to servers that offer an equivalent service. In addition, future research is established for fault tolerance in groups of collaborating mobile agents

    Programming with process groups: Group and multicast semantics

    Get PDF
    Process groups are a natural tool for distributed programming and are increasingly important in distributed computing environments. Discussed here is a new architecture that arose from an effort to simplify Isis process group semantics. The findings include a refined notion of how the clients of a group should be treated, what the properties of a multicast primitive should be when systems contain large numbers of overlapping groups, and a new construct called the causality domain. A system based on this architecture is now being implemented in collaboration with the Chorus and Mach projects

    A software architecture for consensus based replication

    Get PDF
    Orientador: Luiz Eduardo BuzatoTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Esta tese explora uma das ferramentas fundamentais para construção de sistemas distribuídos: a replicação de componentes de software. Especificamente, procuramos resolver o problema de como simplificar a construção de aplicações replicadas que combinem alto grau de disponibilidade e desempenho. Como ferramenta principal para alcançar o objetivo deste trabalho de pesquisa desenvolvemos Treplica, uma biblioteca de replicação voltada para construção de aplicações distribuídas, porém com semântica de aplicações centralizadas. Treplica apresenta ao programador uma interface simples baseada em uma especificação orientada a objetos de replicação ativa. A conclusão que defendemos nesta tese é que é possível desenvolver um suporte modular e de uso simples para replicação que exibe alto desempenho, baixa latência e que permite recuperação eficiente em caso de falhas. Acreditamos que a arquitetura de software proposta tem aplicabilidade em qualquer sistema distribuído, mas é de especial interesse para sistemas que não são distribuídos pela ausência de uma forma simples, eficiente e confiável de replicá-losAbstract: This thesis explores one of the fundamental tools for the construction of distributed systems: the replication of software components. Specifically, we attempted to solve the problem of simplifying the construction of high-performance and high-availability replicated applications. We have developed Treplica, a replication library, as the main tool to reach this research objective. Treplica allows the construction of distributed applications that behave as centralized applications, presenting the programmer a simple interface based on an object-oriented specification for active replication. The conclusion we reach in this thesis is that it is possible to create a modular and simple to use support for replication, providing high performance, low latency and fast recovery in the presence of failures. We believe our proposed software architecture is applicable to any distributed system, but it is particularly interesting to systems that remain centralized due to the lack of a simple, efficient and reliable replication mechanismDoutoradoSistemas de ComputaçãoDoutor em Ciência da Computaçã
    corecore