4,071 research outputs found

    Measuring the BDARX architecture by agent oriented system a case study

    Get PDF
    Distributed systems are progressively designed as multi-agent systems that are helpful in designing high strength complex industrial software. Recently, distributed systems cooperative applications are openly access, dynamic and large scales. Nowadays, it hardly seems necessary to emphasis on the potential of decentralized software solutions. This is because the main benefit lies in the distributed nature of information, resources and action. On the other hand, the progression in multi agent systems creates new challenges to the traditional methodologies of fault-tolerance that typically relies on centralized and offline solution. Research on multi-agent systems had gained attention for designing software that operates in distributed and open environments, such as the Internet. DARX (Dynamic Agent Replication eXtension) is one of the architecture which aimed at building reliable software that would prove to be both flexible and scalable and also aimed to provide adaptive fault tolerance by using dynamic replication methodologies. Therefore, the enhancement of DARX known as BDARX can provide dynamic solution of byzantine faults for the agent based systems that embedded DARX. The BDARX architecture improves the fault tolerance ability of multi-agent systems in long run and strengthens the software to be more robust against such arbitrary faults. The BDARX provide the solution for the Byzantine fault tolerance in DARX by making replicas on the both sides of communication agents by using BFT protocol for agent systems instead of making replicas only on server end and assuming client as failure free. This paper shows that the dynamic behaviour of agents avoid us from making discrimination between server and client replicas

    Fault-Tolerant Adaptive Parallel and Distributed Simulation

    Full text link
    Discrete Event Simulation is a widely used technique that is used to model and analyze complex systems in many fields of science and engineering. The increasingly large size of simulation models poses a serious computational challenge, since the time needed to run a simulation can be prohibitively large. For this reason, Parallel and Distributes Simulation techniques have been proposed to take advantage of multiple execution units which are found in multicore processors, cluster of workstations or HPC systems. The current generation of HPC systems includes hundreds of thousands of computing nodes and a vast amount of ancillary components. Despite improvements in manufacturing processes, failures of some components are frequent, and the situation will get worse as larger systems are built. In this paper we describe FT-GAIA, a software-based fault-tolerant extension of the GAIA/ART\`IS parallel simulation middleware. FT-GAIA transparently replicates simulation entities and distributes them on multiple execution nodes. This allows the simulation to tolerate crash-failures of computing nodes; furthermore, FT-GAIA offers some protection against byzantine failures since synchronization messages are replicated as well, so that the receiving entity can identify and discard corrupted messages. We provide an experimental evaluation of FT-GAIA on a running prototype. Results show that a high degree of fault tolerance can be achieved, at the cost of a moderate increase in the computational load of the execution units.Comment: Proceedings of the IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications (DS-RT 2016

    Fault Tolerant Adaptive Parallel and Distributed Simulation through Functional Replication

    Full text link
    This paper presents FT-GAIA, a software-based fault-tolerant parallel and distributed simulation middleware. FT-GAIA has being designed to reliably handle Parallel And Distributed Simulation (PADS) models, which are needed to properly simulate and analyze complex systems arising in any kind of scientific or engineering field. PADS takes advantage of multiple execution units run in multicore processors, cluster of workstations or HPC systems. However, large computing systems, such as HPC systems that include hundreds of thousands of computing nodes, have to handle frequent failures of some components. To cope with this issue, FT-GAIA transparently replicates simulation entities and distributes them on multiple execution nodes. This allows the simulation to tolerate crash-failures of computing nodes. Moreover, FT-GAIA offers some protection against Byzantine failures, since interaction messages among the simulated entities are replicated as well, so that the receiving entity can identify and discard corrupted messages. Results from an analytical model and from an experimental evaluation show that FT-GAIA provides a high degree of fault tolerance, at the cost of a moderate increase in the computational load of the execution units.Comment: arXiv admin note: substantial text overlap with arXiv:1606.0731

    The Impact of RDMA on Agreement

    Full text link
    Remote Direct Memory Access (RDMA) is becoming widely available in data centers. This technology allows a process to directly read and write the memory of a remote host, with a mechanism to control access permissions. In this paper, we study the fundamental power of these capabilities. We consider the well-known problem of achieving consensus despite failures, and find that RDMA can improve the inherent trade-off in distributed computing between failure resilience and performance. Specifically, we show that RDMA allows algorithms that simultaneously achieve high resilience and high performance, while traditional algorithms had to choose one or another. With Byzantine failures, we give an algorithm that only requires n2fP+1n \geq 2f_P + 1 processes (where fPf_P is the maximum number of faulty processes) and decides in two (network) delays in common executions. With crash failures, we give an algorithm that only requires nfP+1n \geq f_P + 1 processes and also decides in two delays. Both algorithms tolerate a minority of memory failures inherent to RDMA, and they provide safety in asynchronous systems and liveness with standard additional assumptions.Comment: Full version of PODC'19 paper, strengthened broadcast algorith

    A computer scientist looks at game theory

    Full text link
    I consider issues in distributed computation that should be of relevance to game theory. In particular, I focus on (a) representing knowledge and uncertainty, (b) dealing with failures, and (c) specification of mechanisms.Comment: To appear, Games and Economic Behavior. JEL classification numbers: D80, D8

    Computer Science and Game Theory: A Brief Survey

    Full text link
    There has been a remarkable increase in work at the interface of computer science and game theory in the past decade. In this article I survey some of the main themes of work in the area, with a focus on the work in computer science. Given the length constraints, I make no attempt at being comprehensive, especially since other surveys are also available, and a comprehensive survey book will appear shortly.Comment: To appear; Palgrave Dictionary of Economic
    corecore