1,641 research outputs found

On designing dependable services with diverse off-the-shelf SQL servers

Author: A. Avizienis
A. Vaysburd
B. Kemme
C. Babbage
D. Powell
F. Pedone
F. Schneider
I. Gashi
J. Gray
J.C. Laprie
M. Patino-Martinez
M. Weismann
P. Popov
P.A. Bernstein
P.E. Ammann
P.J. Traverse
P.M. Chen
R. Jimenez-Peris
S. Chandra
S. Poledna
T. Anderson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

City Research Online

Crossref

Complexity of Multi-Value Byzantine Agreement

Author: Liang Guanfeng
Vaidya Nitin
Publication date: 01/01/2010

In this paper, we consider the problem of maximizing the throughput of Byzantine agreement, given that the sum capacity of all links between nodes in the system is finite. We propose a highly efficient Byzantine agreement algorithm on values of length l > 1 bits. The algorithm uses error-detecting network codes to ensure that fault-free nodes never disagree, together with a routing scheme that adapts to the result of error detection. It has a bit complexity of n(n-1)l/(n-t), which gives a linear cost (O(n)) per bit agreed upon and overcomes the quadratic lower bound (Omega(n^2)) in the literature. Such linear per-bit complexity had previously been achieved only by allowing a positive probability of error; our algorithm achieves it while guaranteeing that agreement is reached correctly even in the worst case. We also conjecture that our algorithm can be used to achieve agreement throughput arbitrarily close to the agreement capacity of a network when the sum capacity is given.
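
The bit-complexity expression n(n-1)l/(n-t) quoted in this abstract can be checked numerically. The following is only a minimal sketch of that arithmetic, not the authors' algorithm: the function names are hypothetical, and the choice t = floor((n-1)/3) is an assumption (the classical t < n/3 resilience bound), which the abstract itself does not state. Under that assumption n - t > 2n/3, so the cost per agreed bit stays below 1.5(n-1), which is linear in n.

```python
from fractions import Fraction


def total_bits(n: int, t: int, l: int) -> Fraction:
    """Total bits sent, using the expression n(n-1)l/(n-t) from the abstract."""
    if not 0 <= t < n:
        raise ValueError("need 0 <= t < n")
    return Fraction(n * (n - 1) * l, n - t)


def bits_per_agreed_bit(n: int, t: int) -> Fraction:
    """Cost per bit agreed upon: the total divided by the value length l."""
    if not 0 <= t < n:
        raise ValueError("need 0 <= t < n")
    return Fraction(n * (n - 1), n - t)


if __name__ == "__main__":
    for n in (4, 7, 10, 100):
        t = (n - 1) // 3  # assumption: largest t satisfying t < n/3
        per_bit = bits_per_agreed_bit(n, t)
        print(f"n={n:3d} t={t:2d} per-bit cost={float(per_bit):8.2f} "
              f"bound 1.5(n-1)={1.5 * (n - 1):8.2f}")
```

Exact rational arithmetic (`Fraction`) is used so that the comparison against the 1.5(n-1) bound is not affected by floating-point rounding.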

arXiv.org e-Print Archive

CiteSeerX

An approach to rollback recovery of collaborating mobile agents

Author: Bargiela A
Osman T
Wagealla W
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/02/2004

Fault tolerance is one of the main problems that must be resolved to improve the adoption of the agent computing paradigm. In this paper, we analyse the execution model of agent platforms and the significance of the faults affecting their constituent components on the reliable execution of agent-based applications, in order to develop a pragmatic framework for fault tolerance in agent systems. The framework deploys a communication-pairs independent checkpointing strategy to offer a low-cost, application-transparent model for reliable agent-based computing. It covers all possible faults that might invalidate reliable agent execution, migration and communication, and it maintains the exactly-once execution property.

Crossref

Nottingham Trent Institutional Repository (IRep)

Fault tolerance via diversity for off-the-shelf products: A study with SQL database servers

Author: Gashi I.
Popov P. T.
Strigini L.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/10/2007

If an off-the-shelf software product exhibits poor dependability due to design faults, then software fault tolerance is often the only way available to users and system integrators to alleviate the problem. Thanks to low acquisition costs, even using multiple versions of software in a parallel architecture, a scheme formerly reserved for a few highly critical applications, may become viable for many applications. We have studied the potential dependability gains from these solutions for off-the-shelf database servers. We based the study on the bug reports available for four off-the-shelf SQL servers plus later releases of two of them. We found that many of these faults cause systematic noncrash failures, a category ignored by most studies and standard implementations of fault tolerance for databases. Our observations suggest that diverse redundancy would be effective for tolerating design faults in this category of products. Only in very few cases would demands that triggered a bug in one server cause failures in another one, and there were no coincident failures in more than two of the servers. Use of different releases of the same product would also tolerate a significant fraction of the faults. We report our results and discuss their implications, the architectural options available for exploiting them, and the difficulties that they may present.
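
One way to picture how diverse redundancy could mask the noncrash failures described in this abstract is a majority-vote adjudicator placed in front of several replicas. The sketch below is an illustration under stated assumptions, not the architecture studied by the authors: the function name `adjudicate` is hypothetical, each replica's answer is assumed to be already normalised into a hashable value (for example a tuple of rows), and `None` is assumed to mark a replica that crashed or timed out.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def adjudicate(responses: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Return the answer given by a strict majority of all replicas.

    A None entry stands for a replica that crashed or timed out
    (an assumption of this sketch). If no answer reaches a strict
    majority, None is returned so the caller can signal an error.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        return None
    value, count = Counter(answered).most_common(1)[0]
    # Strict majority over ALL replicas, including the silent ones.
    return value if count > len(responses) / 2 else None


if __name__ == "__main__":
    # Three hypothetical replicas; the third returns a wrong (noncrash) result.
    print(adjudicate([(1, "a"), (1, "a"), (9, "z")]))
```

With three replicas, a single wrong or missing answer is outvoted, which matches the abstract's observation that coincident failures across servers were rare. A real deployment would also have to normalise result ordering and data types across SQL dialects before comparing answers.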

City Research Online

Crossref

Better Sooner Rather Than Later

Author: Durand Anaïs
Raynal Michel
Taubenfeld Gadi
Publication date: 20/09/2023

This article unifies and generalizes fundamental results related to $n$-process asynchronous crash-prone distributed computing. More precisely, it proves that for every $0 \leq k \leq n$, assuming that process failures occur only before the number of participating processes bypasses a predefined threshold that equals $n-k$ (a participating process is a process that has executed at least one statement of its code), an asynchronous algorithm exists that solves consensus for $n$ processes in the presence of $f$ crash failures if and only if $f \leq k$. In a very simple and interesting way, the "extreme" case $k=0$ boils down to the celebrated FLP impossibility result (1985, 1987). Moreover, the second extreme case, namely $k=n$, captures the celebrated mutual exclusion result by E.W. Dijkstra (1965) that states that mutual exclusion can be solved for $n$ processes in an asynchronous read/write shared memory system where any number of processes may crash (but only) before starting to participate in the algorithm (that is, participation is not required, but once a process starts participating it may not fail). More generally, the possibility/impossibility stated above demonstrates that more failures can be tolerated when they occur earlier in the computation (hence the title).

Comment: 10 pages

arXiv.org e-Print Archive