4,071 research outputs found
Measuring the BDARX architecture by agent oriented system a case study
Distributed systems are progressively designed as multi-agent systems that are helpful in designing high strength complex industrial software. Recently, distributed systems cooperative applications are openly access, dynamic and large scales. Nowadays, it hardly seems necessary to emphasis on the potential of decentralized software solutions. This is because the main benefit lies in the distributed nature of information, resources and action. On the other hand, the progression in multi agent systems creates new challenges to the traditional methodologies of fault-tolerance that typically relies on centralized and offline solution. Research on multi-agent systems had gained attention for designing software that operates in distributed and open environments, such as the Internet. DARX (Dynamic Agent Replication eXtension) is one of the architecture which aimed at building reliable software that would prove to be both flexible and scalable and also aimed to provide adaptive fault tolerance by using dynamic replication methodologies. Therefore, the enhancement of DARX known as BDARX can provide dynamic solution of byzantine faults for the agent based systems that embedded DARX. The BDARX architecture improves the fault tolerance ability of multi-agent systems in long run and strengthens the software to be more robust against such arbitrary faults. The BDARX provide the solution for the Byzantine fault tolerance in DARX by making replicas on the both sides of communication agents by using BFT protocol for agent systems instead of making replicas only on server end and assuming client as failure free. This paper shows that the dynamic behaviour of agents avoid us from making discrimination between server and client replicas
Fault-Tolerant Adaptive Parallel and Distributed Simulation
Discrete Event Simulation is a widely used technique that is used to model
and analyze complex systems in many fields of science and engineering. The
increasingly large size of simulation models poses a serious computational
challenge, since the time needed to run a simulation can be prohibitively
large. For this reason, Parallel and Distributes Simulation techniques have
been proposed to take advantage of multiple execution units which are found in
multicore processors, cluster of workstations or HPC systems. The current
generation of HPC systems includes hundreds of thousands of computing nodes and
a vast amount of ancillary components. Despite improvements in manufacturing
processes, failures of some components are frequent, and the situation will get
worse as larger systems are built. In this paper we describe FT-GAIA, a
software-based fault-tolerant extension of the GAIA/ART\`IS parallel simulation
middleware. FT-GAIA transparently replicates simulation entities and
distributes them on multiple execution nodes. This allows the simulation to
tolerate crash-failures of computing nodes; furthermore, FT-GAIA offers some
protection against byzantine failures since synchronization messages are
replicated as well, so that the receiving entity can identify and discard
corrupted messages. We provide an experimental evaluation of FT-GAIA on a
running prototype. Results show that a high degree of fault tolerance can be
achieved, at the cost of a moderate increase in the computational load of the
execution units.Comment: Proceedings of the IEEE/ACM International Symposium on Distributed
Simulation and Real Time Applications (DS-RT 2016
Fault Tolerant Adaptive Parallel and Distributed Simulation through Functional Replication
This paper presents FT-GAIA, a software-based fault-tolerant parallel and
distributed simulation middleware. FT-GAIA has being designed to reliably
handle Parallel And Distributed Simulation (PADS) models, which are needed to
properly simulate and analyze complex systems arising in any kind of scientific
or engineering field. PADS takes advantage of multiple execution units run in
multicore processors, cluster of workstations or HPC systems. However, large
computing systems, such as HPC systems that include hundreds of thousands of
computing nodes, have to handle frequent failures of some components. To cope
with this issue, FT-GAIA transparently replicates simulation entities and
distributes them on multiple execution nodes. This allows the simulation to
tolerate crash-failures of computing nodes. Moreover, FT-GAIA offers some
protection against Byzantine failures, since interaction messages among the
simulated entities are replicated as well, so that the receiving entity can
identify and discard corrupted messages. Results from an analytical model and
from an experimental evaluation show that FT-GAIA provides a high degree of
fault tolerance, at the cost of a moderate increase in the computational load
of the execution units.Comment: arXiv admin note: substantial text overlap with arXiv:1606.0731
The Impact of RDMA on Agreement
Remote Direct Memory Access (RDMA) is becoming widely available in data
centers. This technology allows a process to directly read and write the memory
of a remote host, with a mechanism to control access permissions. In this
paper, we study the fundamental power of these capabilities. We consider the
well-known problem of achieving consensus despite failures, and find that RDMA
can improve the inherent trade-off in distributed computing between failure
resilience and performance. Specifically, we show that RDMA allows algorithms
that simultaneously achieve high resilience and high performance, while
traditional algorithms had to choose one or another. With Byzantine failures,
we give an algorithm that only requires processes (where
is the maximum number of faulty processes) and decides in two (network)
delays in common executions. With crash failures, we give an algorithm that
only requires processes and also decides in two delays. Both
algorithms tolerate a minority of memory failures inherent to RDMA, and they
provide safety in asynchronous systems and liveness with standard additional
assumptions.Comment: Full version of PODC'19 paper, strengthened broadcast algorith
A computer scientist looks at game theory
I consider issues in distributed computation that should be of relevance to
game theory. In particular, I focus on (a) representing knowledge and
uncertainty, (b) dealing with failures, and (c) specification of mechanisms.Comment: To appear, Games and Economic Behavior. JEL classification numbers:
D80, D8
Computer Science and Game Theory: A Brief Survey
There has been a remarkable increase in work at the interface of computer
science and game theory in the past decade. In this article I survey some of
the main themes of work in the area, with a focus on the work in computer
science. Given the length constraints, I make no attempt at being
comprehensive, especially since other surveys are also available, and a
comprehensive survey book will appear shortly.Comment: To appear; Palgrave Dictionary of Economic
- …