Search CORE

2,131 research outputs found

An object based algebra for specifying a fault tolerant software architecture

Author: Dragoni Nicola
Gaspari Mauro
Publication venue: Elsevier Inc.
Publication date: 01/01/2005
Field of study

AbstractIn this paper we present an algebra of actors extended with mechanisms to model crash failures and their detection. We show how this extended algebra of actors can be successfully used to specify distributed software architectures. The main components of a software architecture can be specified following an object-oriented style and then they can be composed using asynchronous message passing or more complex interaction patterns. This formal specification can be used to show that several requirements of a software system are satisfied at the architectural level despite failures. We illustrate this process by means of a case study: the specification of a software architecture for intelligent agents which supports a fault tolerant anonymous interaction protocol

Elsevier - Publisher Connector

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Survey of Distributed Decision

Author: Feuilloley Laurent
Fraigniaud Pierre
Publication venue
Publication date: 14/06/2016
Field of study

We survey the recent distributed computing literature on checking whether a given distributed system configuration satisfies a given boolean predicate, i.e., whether the configuration is legal or illegal w.r.t. that predicate. We consider classical distributed computing environments, including mostly synchronous fault-free network computing (LOCAL and CONGEST models), but also asynchronous crash-prone shared-memory computing (WAIT-FREE model), and mobile computing (FSYNC model)

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Deceit: A flexible distributed file system

Author: Birman Kenneth
Marzullo Keith
Siegel Alex
Publication venue
Publication date: 01/11/1989
Field of study

Deceit, a distributed file system (DFS) being developed at Cornell, focuses on flexible file semantics in relation to efficiency, scalability, and reliability. Deceit servers are interchangeable and collectively provide the illusion of a single, large server machine to any clients of the Deceit service. Non-volatile replicas of each file are stored on a subset of the file servers. The user is able to set parameters on a file to achieve different levels of availability, performance, and one-copy serializability. Deceit also supports a file version control mechanism. In contrast with many recent DFS efforts, Deceit can behave like a plain Sun Network File System (NFS) server and can be used by any NFS client without modifying any client software. The current Deceit prototype uses the ISIS Distributed Programming Environment for all communication and process group management, an approach that reduces system complexity and increases system robustness

NASA Technical Reports Server

eCommons@Cornell

The Impact of RDMA on Agreement

Author: Aguilera Marcos K.
Ben-David Naama
Guerraoui Rachid
Marathe Virendra
Zablotchi Igor
Publication venue
Publication date: 03/03/2020
Field of study

Remote Direct Memory Access (RDMA) is becoming widely available in data centers. This technology allows a process to directly read and write the memory of a remote host, with a mechanism to control access permissions. In this paper, we study the fundamental power of these capabilities. We consider the well-known problem of achieving consensus despite failures, and find that RDMA can improve the inherent trade-off in distributed computing between failure resilience and performance. Specifically, we show that RDMA allows algorithms that simultaneously achieve high resilience and high performance, while traditional algorithms had to choose one or another. With Byzantine failures, we give an algorithm that only requires

n \geq 2f_P + 1

processes (where

f_P

is the maximum number of faulty processes) and decides in two (network) delays in common executions. With crash failures, we give an algorithm that only requires

n \geq f_P + 1

processes and also decides in two delays. Both algorithms tolerate a minority of memory failures inherent to RDMA, and they provide safety in asynchronous systems and liveness with standard additional assumptions.Comment: Full version of PODC'19 paper, strengthened broadcast algorith

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Using mobility and exception handling to achieve mobile agents that survive server crash failures

Author: Pears Simon
Publication venue
Publication date: 01/01/2005
Field of study

Mobile agent technology, when designed and used effectively, can minimize bandwidth consumption and autonomously provide a snapshot of the current context of a distributed system. Protecting mobile agents from server crashes is a challenging issue, since developers normally have no control over remote servers. Server crash failures can leave replicas, instable storage, unavailable for an unknown time period. Furthermore, few systems have considered the need for using a fault tolerant protocol among a group of collaborating mobile agents. This thesis uses exception handling to protect mobile agents from server crash failures. An exception model is proposed for mobile agents and two exception handler designs are investigated. The first exists at the server that created the mobile agent and uses a timeout mechanism. The second, the mobile shadow scheme, migrates with the mobile agent and operates at the previous server visited by the mobile agent. A case study application has been developed to compare the performance of the two exception handler designs. Performance results demonstrate that although the second design is slower it offers the smaller trip time when handling a server crash. Furthermore, no modification of the server environment is necessary. This thesis shows that the mobile shadow exception handling scheme reduces complexity for a group of mobile agents to survive server crashes. The scheme deploys a replica that monitors the server occupied by the master, at each stage of the itinerary. The replica exists at the previous server visited in the itinerary. Consequently, each group member is a single fault tolerant entity with respect to server crash failures. Other schemes introduce greater complexity and performance overheads since, for each stage of the itinerary, a group of replicas is sent to servers that offer an equivalent service. In addition, future research is established for fault tolerance in groups of collaborating mobile agents

Durham e-Theses

OpenGrey Repository

Programming with process groups: Group and multicast semantics

Author: Birman Kenneth P.
Cooper Robert
Gleeson Barry
Publication venue
Publication date: 29/01/1991
Field of study

Process groups are a natural tool for distributed programming and are increasingly important in distributed computing environments. Discussed here is a new architecture that arose from an effort to simplify Isis process group semantics. The findings include a refined notion of how the clients of a group should be treated, what the properties of a multicast primitive should be when systems contain large numbers of overlapping groups, and a new construct called the causality domain. A system based on this architecture is now being implemented in collaboration with the Chorus and Mach projects

NASA Technical Reports Server

eCommons@Cornell

A software architecture for consensus based replication

Author: Vieira Gustavo Maciel Dias
Publication venue: [s.n.]
Publication date: 17/08/2018
Field of study

Orientador: Luiz Eduardo BuzatoTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Esta tese explora uma das ferramentas fundamentais para construção de sistemas distribuídos: a replicação de componentes de software. Especificamente, procuramos resolver o problema de como simplificar a construção de aplicações replicadas que combinem alto grau de disponibilidade e desempenho. Como ferramenta principal para alcançar o objetivo deste trabalho de pesquisa desenvolvemos Treplica, uma biblioteca de replicação voltada para construção de aplicações distribuídas, porém com semântica de aplicações centralizadas. Treplica apresenta ao programador uma interface simples baseada em uma especificação orientada a objetos de replicação ativa. A conclusão que defendemos nesta tese é que é possível desenvolver um suporte modular e de uso simples para replicação que exibe alto desempenho, baixa latência e que permite recuperação eficiente em caso de falhas. Acreditamos que a arquitetura de software proposta tem aplicabilidade em qualquer sistema distribuído, mas é de especial interesse para sistemas que não são distribuídos pela ausência de uma forma simples, eficiente e confiável de replicá-losAbstract: This thesis explores one of the fundamental tools for the construction of distributed systems: the replication of software components. Specifically, we attempted to solve the problem of simplifying the construction of high-performance and high-availability replicated applications. We have developed Treplica, a replication library, as the main tool to reach this research objective. Treplica allows the construction of distributed applications that behave as centralized applications, presenting the programmer a simple interface based on an object-oriented specification for active replication. The conclusion we reach in this thesis is that it is possible to create a modular and simple to use support for replication, providing high performance, low latency and fast recovery in the presence of failures. We believe our proposed software architecture is applicable to any distributed system, but it is particularly interesting to systems that remain centralized due to the lack of a simple, efficient and reliable replication mechanismDoutoradoSistemas de ComputaçãoDoutor em Ciência da Computaçã

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio da Producao Cientifica e Intelectual da Unicamp

Design of a Flexible and Extensible Fault Injector for Testing Concurrent and Distributed Applications

Author: Pedro Fernando Moreira da Silva Antunes
Publication venue
Publication date: 07/12/2022
Field of study

Repositório Aberto da Universidade do Porto