A simple proof of the necessity of the failure detector $\Sigma$ to implement a register in asynchronous message-passing systems by Bonnet, François & Raynal, Michel
HAL Id: inria-00392450
https://hal.inria.fr/inria-00392450
Submitted on 8 Jun 2009
To cite this version:
François Bonnet, Michel Raynal. A simple proof of the necessity of the failure detector Σ to implement
a register in asynchronous message-passing systems. [Research Report] PI 1932, 2009, pp.8. inria-00392450
Publications Internes de l’IRISA
ISSN: pending
PI 1932 – June 2009
A simple proof of the necessity of the failure detector Σ
to implement a register in asynchronous message-passing systems
François Bonnet* , Michel Raynal**
francois.bonnet@irisa.fr, raynal@irisa.fr
Abstract: This paper presents a simple proof that the quorum failure detector class (denoted Σ) is the weakest failure detector class
required to implement an atomic read/write register in an asynchronous message-passing system prone to an arbitrary number of process
crashes.
Key-words: Asynchronous message-passing system, Atomic register, Necessity proof, Process crash, Weakest failure detector.
*, ** Projet ASAP: joint team of INRIA, CNRS, Université Rennes 1 and INSA de Rennes
© IRISA – Campus de Beaulieu – 35042 Rennes Cedex – France – +33 2 99 84 71 00 – www.irisa.fr
1 Introduction
Atomic register. Among the objects that allow concurrent processes to exchange information and cooperate toward a common goal, the
atomic register is certainly the most fundamental. Such an object (let us denote it REG) provides the processes with two operations,
REG.read() and REG.write(v). The read operation provides the invoking process with the value of the object, while the write operation
associates a new value v with the object.
Atomicity [10, 11] means that the read and write operations issued on a register appear as if they had been executed sequentially,
and that this “witness sequence” (1) is legal (a read returns the value written by the closest write that precedes it in this sequence) and (2)
respects the real-time occurrence order of the operations (if an operation op1 terminates before an operation op2 starts, op1 appears
before op2 in the witness sequence).
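The two conditions can be made concrete with a small checker. The following is only a sketch: the `Op` record, its `start` and `end` timestamps, and the function names are illustrative assumptions, not part of the paper. It takes a candidate witness sequence and tests legality and real-time order.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Op:
    """One register operation (hypothetical record used only for this sketch)."""
    kind: str    # "read" or "write"
    value: Any   # value written, or value returned by the read
    start: int   # invocation time, as seen by an external observer
    end: int     # response time, as seen by an external observer


def is_legal(witness: List[Op], initial: Any = None) -> bool:
    """Condition (1): each read returns the value of the closest preceding write."""
    current = initial
    for op in witness:
        if op.kind == "write":
            current = op.value
        elif op.value != current:
            return False
    return True


def respects_real_time(witness: List[Op]) -> bool:
    """Condition (2): if b ended before a started, b must not be placed after a."""
    for pos, a in enumerate(witness):
        for b in witness[pos + 1:]:
            if b.end < a.start:
                return False
    return True


def is_atomic_witness(witness: List[Op], initial: Any = None) -> bool:
    return is_legal(witness, initial) and respects_real_time(witness)
```

For instance, a write of 1 that terminates before a read starts forces the read to appear after the write, so a read returning the initial value cannot be ordered before it.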
Simulating a register in an asynchronous system. In an asynchronous message-passing system, the processes communicate by sending
and receiving messages through channels, and no assumption is made on the speed of processes or on message transmission
delays.
If the system is reliable, it is easy to build an atomic register on top of an asynchronous message-passing system. This is no longer
the case if processes can crash. Let n be the number of processes that compose the system and t be a model parameter that defines an
upper bound on the number of processes that may crash. Algorithms that build an atomic register object despite asynchrony and up to
t < n/2 process crashes are described in [1, 2].
An important result is also shown in [2], namely, there is no algorithm implementing an atomic register in asynchronous
message-passing systems where t ≥ n/2. The intuition that underlies this impossibility is that, due to asynchrony and the fact that t ≥ n/2, the
system can appear as being partitioned, in such a way that each partition considers that the processes in the other partition have crashed
(while they actually have not).
The failure detector approach to circumvent the “t ≥ n/2” impossibility. The failure detector approach [4] has been introduced to
circumvent impossibility results. It consists in enriching each process of an unreliable asynchronous system with an additional device
(sometimes called “oracle”) that provides it with hints on process failures. According to the type and the quality of these hints, several
classes of failure detectors can be defined.
The class of quorum failure detectors, denoted Σ, was introduced by Delporte-Gallet, Fauconnier and Guerraoui in [5]. It is
shown in [5, 6] that Σ is the weakest class of failure detectors that allows building an atomic register object in asynchronous
message-passing systems despite any number of process crashes (i.e., in systems where t = n−1). “Weakest” means that Σ captures the minimal
information on failures that has to be known by the processes in order to implement a register. The definition of Σ is given below. (A
quorum is a set of processes. Quorums were first introduced by Gifford [9].)
Content of the paper. Showing that Σ is the weakest class of failure detectors to build a register amounts to showing two things, namely,
that Σ is sufficient and that it is necessary.
On the “sufficiency” part, designing a Σ-based algorithm that builds a register is relatively easy. It consists in replacing the “majority
of non-faulty processes” assumption used in algorithms such as the ones described in [2, 3, 12] by a Σ quorum. From an operational
point of view, this amounts to replacing the statement “wait for messages from a majority of processes” by the statement “wait for messages
from a quorum”. Such Σ-based algorithms are described in [5, 8].
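The change of wait statement can be pictured with two predicates. This is a hedged sketch only: the function names and the `read_sigma` callback (standing for a read of the local variable Σ_i) are assumptions made for illustration, and the exact waiting rules of the algorithms in [5, 8] are not reproduced here.

```python
from typing import Callable, FrozenSet, Set


def majority_reached(received_from: Set[int], n: int) -> bool:
    """Majority-based rule: messages from more than n/2 processes have arrived."""
    return len(received_from) > n // 2


def quorum_reached(received_from: Set[int],
                   read_sigma: Callable[[], FrozenSet[int]]) -> bool:
    """Quorum-based rule: every member of the current output of Sigma_i has answered.

    Sigma_i is re-read on each evaluation, since its value may change over time.
    """
    return read_sigma() <= received_from
```

A process would re-evaluate the predicate each time a new message arrives; with the quorum rule, the liveness property of Σ is what prevents it from waiting forever on a crashed process.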
The difficult part is the “necessity” part. Let D be any failure detector that allows building a register in an asynchronous
message-passing system despite any number of process crashes, and A be any D-based algorithm that builds a register. The proof of the
“necessity” part consists in showing that, given any D-based algorithm A, it is possible to build a failure detector of the class Σ (we say
that it is possible to “extract” Σ from A). The first such extraction algorithm appeared in [5].
We present in this paper a new proof of the “necessity” part, that is particularly simple. This proof is based on a technique totally
different from the one used in [5]. Interestingly and in addition to its simplicity, the proposed extraction algorithm does not use sequence
numbers and requires only a bounded local memory at each process.
2 Computation model and definitions
2.1 Asynchronous message-passing Systems
As already indicated, the computation model consists of n asynchronous processes (denoted p_1, ..., p_n) that communicate by exchanging
messages through reliable point-to-point asynchronous channels. The integer i is the identity of p_i. Let Π = {1, . . . , n}. Up to t = n−1
processes can crash. A crash is a premature halt: the process stops executing. Until it crashes, a process correctly executes the code
of its algorithm. Given a run, a process that crashes is said to be faulty in that run. Otherwise, it is correct. In the following, C denotes the
set of correct processes.
The underlying time model is the set of integers. This notion of time is not accessible to the processes. It can only be used from the
point of view of an external observer to state or prove properties. Time instants are denoted by τ, τ′, etc.
2.2 The failure detector class Σ
As indicated previously, the quorum failure detector class has been introduced and investigated in [5]. Each process p_i is provided with
a local variable (denoted Σ_i) that it can only read. At any time, such a variable contains a set of process identities (a quorum). Let $\Sigma_i^{\tau}$
be the value of Σ_i at time τ. The class Σ contains all the failure detectors that satisfy the following properties (C denotes the set of
identities of the processes that are correct in the considered run):
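• Intersection property. $\forall i, j \in \{1, \ldots, n\}: \forall \tau, \tau': \Sigma_i^{\tau} \cap \Sigma_j^{\tau'} \neq \emptyset$.
• Liveness property. $\exists \tau: \forall \tau' \geq \tau: \forall i \in C: \Sigma_i^{\tau'} \subseteq C$.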
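The two properties can be tested on a recorded, finite trace of failure detector outputs. The sketch below is illustrative only: the `Trace` layout (`trace[i][tau]` is the value of Σ_i at time τ) and the function names are assumptions. Since liveness is an eventual property, a finite trace can only show that it holds from a given instant up to the end of the recording.

```python
from itertools import product
from typing import Dict, List, Set

# trace[i][tau] = value of Sigma_i at time tau (hypothetical recording format)
Trace = Dict[int, List[Set[int]]]


def intersection_holds(trace: Trace) -> bool:
    """Any two quorums, taken at any process and at any time, intersect."""
    samples = [q for outputs in trace.values() for q in outputs]
    return all(q1 & q2 for q1, q2 in product(samples, repeat=2))


def liveness_holds_from(trace: Trace, correct: Set[int], tau: int) -> bool:
    """From time tau on, the quorums of correct processes contain only correct processes."""
    return all(q <= correct for i in correct for q in trace[i][tau:])
```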
The first property states that the values of any two quorums taken at any times do intersect. This property prevents partitioning and is
consequently used to maintain the consistency of the atomic register. The second property states that a quorum cannot block the process
that uses it. Because two majorities always intersect, it is easy to see that Σ can be implemented in systems where t < n/2. In contrast,
it cannot be implemented in pure asynchronous systems when t ≥ n/2.
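The claim that two majorities always intersect follows from a counting argument (two sets of more than n/2 identities taken among n cannot be disjoint). As a sanity check, the following sketch verifies it exhaustively for small n; the function names are illustrative assumptions.

```python
from itertools import combinations
from typing import List, Set


def majorities(n: int) -> List[Set[int]]:
    """All subsets of {1, ..., n} that contain more than n/2 processes."""
    procs = range(1, n + 1)
    return [set(c)
            for size in range(n // 2 + 1, n + 1)
            for c in combinations(procs, size)]


def all_majorities_intersect(n: int) -> bool:
    ms = majorities(n)
    return all(a & b for a in ms for b in ms)
```

By contrast, with n = 4 the sets {1, 2} and {3, 4} each contain exactly n/2 processes and are disjoint, which matches the partitioning intuition behind the t ≥ n/2 impossibility.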
3 Σ is necessary to build a register
3.1 Principle
Aim. The aim is to design an algorithm that emulates the output of Σ at each process p_i. This algorithm uses as a subroutine any
algorithm A and failure detector D such that A is a D-based algorithm that implements an atomic register in an asynchronous
message-passing system prone to any number of process crashes.
A simple task. Q being any non-empty set of processes, let us consider an array of n atomic registers REG_Q[1..n], initialized to
[⊥, . . . , ⊥], and the task denoted WR_Q where each process p_i such that i ∈ Q executes the following algorithm (where reg_i[1..n] is an
array local to p_i):
algorithm WR_Q:
    REG_Q[i].write(⊤);
    for each x ∈ {1, ..., n} do reg_i[x] ← REG_Q[x].read() end for.
The process p_i first writes the value ⊤ in its entry of the array REG_Q, and then asynchronously reads all the entries of that array. The
REG_Q[i].write(⊤) and REG_Q[x].read() operations are provided to the processes by the previous algorithm A. (Let us notice that
the value obtained by a read is irrelevant. As we will see, what is important is whether REG_Q[x] has been written or not.) A
corresponding run of WR_Q is denoted E_Q. In that run, no process outside Q sends or receives messages related to the task WR_Q (see footnote 1).
Let us observe that, as the underlying failure detector-based algorithm A that builds a register is correct, if the set Q contains all the
correct processes (i.e., C ⊆ Q), E_Q is such that every correct process terminates the task WR_Q. In the other cases, i.e., for the tasks
WR_Q such that ¬(C ⊆ Q), E_Q is such that a process of Q either terminates WR_Q, or blocks forever, or crashes (this depends on the
actual failure pattern, the outputs of the underlying failure detector D used by the algorithm A, and the code of A).
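The task WR_Q itself is short enough to be written out. In the sketch below, the registers are plain in-memory objects (class `LocalRegister`, a stand-in assumed for illustration); in the paper they are implemented over message passing by the algorithm A, which is precisely what may make a process block when ¬(C ⊆ Q). The names `TOP`, `BOTTOM` and `run_wr` are likewise hypothetical.

```python
from typing import Any, Dict

TOP = "T"      # stands for the written value (the top symbol in the paper)
BOTTOM = None  # stands for the initial value (the bottom symbol in the paper)


class LocalRegister:
    """In-memory stand-in for a register that algorithm A would implement."""

    def __init__(self) -> None:
        self._value: Any = BOTTOM

    def write(self, value: Any) -> None:
        self._value = value

    def read(self) -> Any:
        return self._value


def run_wr(i: int, reg_q: Dict[int, LocalRegister]) -> Dict[int, Any]:
    """Task WR_Q for process p_i: write its own entry, then read every entry."""
    reg_q[i].write(TOP)
    return {x: reg_q[x].read() for x in sorted(reg_q)}
```

Running the task for p_2 and then for p_1 on the same array shows the behavior the proof relies on: the process that writes second reads the first writer's entry after it has been written.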
Running concurrently 2^n − 1 tasks. The extraction algorithm considers the 2^n − 1 distinct tasks WR_Q where Q is a non-empty element
of 2^Π. This means that each process p_i manages 2^(n−1) threads, one for each subset Q such that i ∈ Q. Let us notice that the crash of a
process p_i entails the crash of all its threads.
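The two counts can be checked by direct enumeration. This is a small sketch with illustrative function names: there are 2^n − 1 non-empty subsets of Π, and exactly 2^(n−1) of them contain a given identity i.

```python
from itertools import combinations
from typing import FrozenSet, List


def nonempty_subsets(n: int) -> List[FrozenSet[int]]:
    """All non-empty subsets Q of {1, ..., n}, i.e., one per task WR_Q."""
    procs = range(1, n + 1)
    return [frozenset(c)
            for size in range(1, n + 1)
            for c in combinations(procs, size)]


def tasks_of(i: int, n: int) -> List[FrozenSet[int]]:
    """The subsets Q in which process p_i participates (i in Q)."""
    return [q for q in nonempty_subsets(n) if i in q]
```

The exponential number of tasks makes the construction a proof device rather than a practical implementation.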
Let us finally recall that each register of an array REG_Q[1..n] is implemented by the algorithm A executed by |Q| threads associated
with the processes of Q. Due to the correctness of A, each REG_Q[x] is an atomic register.
3.2 The extraction algorithm
The algorithm that extracts Σ is described in Figure 1. Let us recall that the aim is to provide each process p_i with a local variable Σ_i
such that the (Σ_x)_{1≤x≤n} variables satisfy the intersection and liveness properties defined in Section 2.2.
To that end, each process p_i manages two local variables: a set of sets of process identities, denoted quorum_sets_i, and a queue
denoted queue_i. The aim of quorum_sets_i is to contain all the sets Q such that p_i terminates WR_Q (task T1), while queue_i is managed
in such a way that eventually the correct processes appear in it before the faulty processes (tasks T2 and T3).
Footnote 1: When we consider the underlying failure detector-based algorithm A that implements the registers REG_Q[1..n], as the processes that are not in Q do not participate
in WR_Q, the messages sent by the processes of Q to these processes are never received, or are delayed for an arbitrarily long period. Alternatively (as in [7]), we could
say that, in WR_Q, the processes of Q “omit” sending messages to the processes that are not in Q.
The idea is to select one of the sets of quorum_sets_i as the current output of Σ_i. As we will see in the proof, given any pair of processes p_i
and p_j, any quorum in quorum_sets_i has a non-empty intersection with any quorum in quorum_sets_j, thereby supplying the required
intersection property.
The main issue is to ensure the liveness property of Σ_i (eventually Σ_i has to contain only correct processes) while preserving
the intersection property. This is realized with the help of the local variable queue_i as follows: the current output of Σ_i is the set
(quorum) of quorum_sets_i that appears as being the “first” in queue_i. The formal definition of “first set of quorum_sets_i with respect
to queue_i” is stated in the task T4. To make it easy to understand, let us consider the following example. Let quorum_sets_i =
{{3, 4, 9}, {2, 3, 8}, {4, 7}}, and queue_i = <4, 8, 3, 2, 7, 5, 9, · · ·>. The set S = {2, 3, 8} is the first set of quorum_sets_i with respect
to queue_i because each of the other sets {3, 4, 9} and {4, 7} includes an element (9 and 7, respectively) that appears in queue_i after the
elements of S. (In case several sets are “first”, any of them can be selected.)
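The selection rule of the example can be replayed mechanically. The sketch below is illustrative (the function name `first_quorum` is an assumption): it ranks each identity by its position in the queue and picks a set whose largest rank is minimal. Only the prefix of queue_i shown in the example is used, which is enough here because every identity of the three sets appears in that prefix.

```python
from typing import FrozenSet, Iterable, List


def first_quorum(quorum_sets: Iterable[FrozenSet[int]],
                 queue: List[int]) -> FrozenSet[int]:
    """Return a set whose worst-ranked member is as close to the head as possible."""
    rank = {x: pos for pos, x in enumerate(queue)}
    return min(quorum_sets, key=lambda q: max(rank[x] for x in q))


example_sets = [frozenset({3, 4, 9}), frozenset({2, 3, 8}), frozenset({4, 7})]
example_queue = [4, 8, 3, 2, 7, 5, 9]  # visible prefix of queue_i in the example
selected = first_quorum(example_sets, example_queue)
```

With these inputs the largest ranks are 6 for {3, 4, 9}, 3 for {2, 3, 8} and 4 for {4, 7}, so `selected` is {2, 3, 8}, as stated in the text.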
Init: quorum_sets_i ← {{1, ..., n}}; queue_i ← <1, ..., n>;
      for each Q ∈ (2^Π \ {∅, {1, ..., n}}) do
          if (i ∈ Q) then launch a thread associated with the task WR_Q end if
      end for.
      % Each process p_i participates concurrently in all the tasks WR_Q such that i ∈ Q %
Task T1: when p_i terminates in the task WR_Q: quorum_sets_i ← quorum_sets_i ∪ {Q}.
Task T2: repeat periodically broadcast ALIVE(i) end repeat.
Task T3: when ALIVE(j) is received: suppress j from queue_i; enqueue j at the head of queue_i.
Task T4: when p_i reads Σ_i:
      let m = min_{Q ∈ quorum_sets_i} (max_{x ∈ Q} (rank[x])), where rank[x] denotes the rank of x in queue_i;
      return (a set Q ∈ quorum_sets_i such that max_{x ∈ Q} (rank[x]) = m).
Figure 1: Extracting Σ from a failure detector-based algorithm A that implements a register (code for p_i)
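The local state handled by tasks T1, T3 and T4 can be sketched as a small class. This is an illustration under assumptions, not the authors' code: the class and method names are hypothetical, and the threads running the tasks WR_Q (Init) as well as the periodic broadcast of T2 are left to the caller, since they depend on the algorithm A and on the network.

```python
from typing import FrozenSet, Iterable, List, Set


class SigmaExtractor:
    """Local state of process p_i in Figure 1 (tasks T1, T3 and T4 only)."""

    def __init__(self, i: int, n: int) -> None:
        self.i = i
        # Init: the full set {1, ..., n} is always a member of quorum_sets_i.
        self.quorum_sets: Set[FrozenSet[int]] = {frozenset(range(1, n + 1))}
        self.queue: List[int] = list(range(1, n + 1))

    def on_wr_terminated(self, q: Iterable[int]) -> None:
        """Task T1: p_i has terminated WR_Q, so Q becomes a candidate quorum."""
        self.quorum_sets.add(frozenset(q))

    def on_alive(self, j: int) -> None:
        """Task T3: move j to the head of queue_i."""
        self.queue.remove(j)
        self.queue.insert(0, j)

    def read_sigma(self) -> FrozenSet[int]:
        """Task T4: return a candidate whose worst-ranked member is minimal."""
        rank = {x: pos for pos, x in enumerate(self.queue)}
        m = min(max(rank[x] for x in q) for q in self.quorum_sets)
        return next(q for q in self.quorum_sets
                    if max(rank[x] for x in q) == m)
```

Note that no set is ever removed from `quorum_sets` and that `queue` always holds each identity exactly once, which is what keeps the local memory bounded.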
Remark. Initially quorum_sets_i contains the set {1, . . . , n}. As no set of processes is ever withdrawn from quorum_sets_i (task T1),
quorum_sets_i is never empty. Moreover, it is not necessary to launch the task WR_{1,...,n} in which all the processes participate. This
is because, as the underlying failure detector-based algorithm A (that implements a register) is correct, it follows that all the correct
processes terminate the task WR_{1,...,n}. This case is directly taken into account in the initialization of quorum_sets_i (thereby saving
the execution of the task WR_{1,...,n}).
3.3 The necessity theorem
Theorem 1. Let A be any failure detector-based algorithm that implements an atomic register in an asynchronous message-passing
system prone to any number of process crashes. Given A, the algorithm described in Figure 1 is a bounded construction of a failure
detector of the class Σ.
Proof
Proof of the intersection property. The proof is by contradiction. Let us first observe that the set Σ_i returned to a process p_i is a set
of quorum_sets_i (that contains the set {1, . . . , n}, its initial value, plus all the sets Q such that p_i terminates WR_Q). Let us assume that
there are two sets Q1 and Q2 such that (1) Q1, Q2 ∈ ⋃_{1≤j≤n} quorum_sets_j, and (2) Q1 ∩ Q2 = ∅. The first item means that Q1 and
Q2 can be returned to some processes as their local value for Σ.
Let p_i be a process that terminates WR_{Q1} and p_j a process that terminates WR_{Q2} (due to the “contradiction” assumption, such
processes do exist). Using the fact that the message-passing system is asynchronous, let us construct the runs E_{Q1} and E_{Q2} associated
with WR_{Q1} and WR_{Q2} as follows. If any (see footnote 1), the messages sent by the processes of Q1 to the processes of Q2, when
they execute A to implement each register of the array REG_{Q1}, are delayed for an arbitrarily long period (until p_i has added Q1 to
quorum_sets_i and p_j has added Q2 to quorum_sets_j). And similarly for the messages sent by the processes of Q2 to the processes of
Q1 when they execute A for each register of the array REG_{Q2}.
Let us observe that, in the concurrent runs E_{Q1} and E_{Q2}, the algorithm A, which is executed (1) only by the processes of Q1 in E_{Q1}
to build the registers REG_{Q1}[1..n], and (2) only by the processes of Q2 in E_{Q2} to build the registers REG_{Q2}[1..n], is fed with the same
outputs of the underlying failure detector D. Due to the fact that (if any) the messages from Q1 to Q2 and from Q2 to Q1 are delayed,
we have that p_i reads ⊥ from REG_{Q1}[j] in E_{Q1}, and p_j reads ⊥ from REG_{Q2}[i] in E_{Q2}.
Let us construct a run E_{Q12}, where Q12 = Q1 ∪ Q2, that is a simple merge of E_{Q1} and E_{Q2} defined as follows. In this run, the
algorithm A (that involves only the processes in Q12 and implements the array of registers REG_{Q12}[1..n]) is fed with the same failure
detector outputs as the ones supplied to the concurrent runs E_{Q1} and E_{Q2}. Moreover, the messages from Q1 to Q2 and from Q2 to Q1
are delayed as in E_{Q1} and E_{Q2}. So, p_i (resp., p_j) receives the same messages and the same outputs from the underlying failure detector
in E_{Q12} and E_{Q1} (resp., E_{Q2}).
• On the one side, we have the following. As the process p_i receives the same messages and the same failure detector outputs in E_{Q12}
as in E_{Q1}, the arrays REG_{Q1}[1..n] and REG_{Q12}[1..n] contain the same values. Consequently, p_i reads ⊥ from REG_{Q12}[j].
Similarly, p_j reads ⊥ from REG_{Q12}[i].
• On the other side, we have the following. In E_{Q12}, the process p_i writes ⊤ into REG_{Q12}[i] and the process p_j writes ⊤ into
REG_{Q12}[j]. Moreover, one of these operations terminates before the other. Without loss of generality, let us assume that the write
by p_i terminates before the write by p_j. Consequently, p_j reads REG_{Q12}[i] after it has been written. Due to the atomicity of that
register, it follows that p_j obtains the value ⊤ when it reads REG_{Q12}[i].
The second item contradicts the first one. It follows that the conjunction of the initial assumptions (existence of a failure detector-based
algorithm A that builds a register, Q1, Q2 ∈ ⋃_{1≤j≤n} quorum_sets_j, and Q1 ∩ Q2 = ∅) is false. As A exists by hypothesis, we conclude
that at least one of the assertions Q1, Q2 ∈ ⋃_{1≤j≤n} quorum_sets_j and Q1 ∩ Q2 = ∅ is false, which completes the proof of the
intersection property (Corollary 1, stated below, is an immediate consequence of that property).
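The atomicity argument of the second item can be checked by brute force on a simplified model. The sketch below rests on an assumption made for illustration: each operation is treated as a single instantaneous step (its linearization point), which atomicity allows. It enumerates every interleaving in which each of two processes first writes its own entry and then reads the other's, and confirms that the two reads never both miss the other's write.

```python
from itertools import permutations
from typing import Dict, Iterator, Tuple

Step = Tuple[str, str]  # (process, action), with action in {"write", "read"}
STEPS = [("i", "write"), ("i", "read"), ("j", "write"), ("j", "read")]


def interleavings() -> Iterator[Tuple[Step, ...]]:
    """All total orders in which each process writes before it reads."""
    for order in permutations(STEPS):
        if (order.index(("i", "write")) < order.index(("i", "read"))
                and order.index(("j", "write")) < order.index(("j", "read"))):
            yield order


def run(order: Tuple[Step, ...]) -> Dict[str, bool]:
    """Return, for each process, whether its read saw the other's write."""
    written = {"i": False, "j": False}
    seen: Dict[str, bool] = {}
    for proc, action in order:
        other = "j" if proc == "i" else "i"
        if action == "write":
            written[proc] = True
        else:
            seen[proc] = written[other]
    return seen


def someone_always_sees_the_write() -> bool:
    return all(any(run(order).values()) for order in interleavings())
```

Both reads returning ⊥ would require each read to precede the other process's write, which, combined with the write-then-read order of each process, yields a cycle; this is the contradiction used in the proof.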
Proof of the liveness property. As far as the liveness property is concerned, let us consider the task WR_C (recall that C is the
set of correct processes). As the underlying failure detector-based algorithm A that implements the registers REG_C[1..n] is correct
(assumption), each correct process p_i terminates its REG_C[i].write(⊤) and REG_C[x].read() operations in E_C. Consequently, in the
extraction algorithm, the variable quorum_sets_i of each correct process p_i eventually contains the set C.
Moreover, after some finite time, each correct process p_i receives ALIVE(j) messages only from correct processes. This means that,
at each correct process p_i, all the correct processes eventually precede the faulty processes in queue_i. Due to the definition of “first
set of quorum_sets_i with respect to queue_i” stated in the task T4, it follows that, from the time C has been added to quorum_sets_i, the
quorum Q selected by the task T4 is always such that Q ⊆ C, which proves the liveness property of Σ_i.
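The queue argument can be replayed on a small scenario. In this sketch (function names are illustrative assumptions), `deliver_alive` applies task T3, and `correct_form_prefix` tests whether the correct processes occupy the first |C| positions of the queue. Once they do, the largest rank in C is |C| − 1, whereas any set containing a faulty process has a member of rank at least |C|, so task T4 can only select subsets of C once C belongs to quorum_sets_i.

```python
from typing import List, Set


def deliver_alive(queue: List[int], j: int) -> None:
    """Task T3: remove j from the queue and put it back at the head."""
    queue.remove(j)
    queue.insert(0, j)


def correct_form_prefix(queue: List[int], correct: Set[int]) -> bool:
    """True when the correct processes precede all the faulty ones in the queue."""
    return set(queue[:len(correct)]) == set(correct)


# Scenario: n = 5, C = {2, 4}; processes 1, 3 and 5 send a last ALIVE, then crash.
queue = [1, 2, 3, 4, 5]
for j in (5, 1, 3):
    deliver_alive(queue, j)
before = correct_form_prefix(queue, {2, 4})
for j in (4, 2, 4):  # only correct processes keep sending ALIVE messages
    deliver_alive(queue, j)
after = correct_form_prefix(queue, {2, 4})
```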
The construction is bounded. A simple examination of the extraction algorithm shows that (1) both the variables queue_i and
quorum_sets_i are bounded, and (2) messages carry bounded values, from which it follows that the construction is bounded. □ (Theorem 1)
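The bounds behind this examination can be written out explicitly. The following is a worked restatement derived from Figure 1, not a formula from the paper; the encoding of a process identity on ⌈log₂ n⌉ bits is an assumption made for illustration.

```latex
\begin{align*}
|\mathit{quorum\_sets}_i| &\le 2^{\,n-1}
  && \text{(only sets $Q$ with $i \in Q$ are ever added, and none is removed)}\\
|Q| &\le n
  && \text{(each $Q \in \mathit{quorum\_sets}_i$ is a subset of $\{1,\dots,n\}$)}\\
|\mathit{queue}_i| &= n
  && \text{(task T3 removes $j$ and re-inserts it at the head)}\\
|\textsc{alive}(j)| &= \lceil \log_2 n \rceil \text{ bits}
  && \text{(a message carries a single process identity)}
\end{align*}
```

None of these quantities grows with the length of the run, which is consistent with the absence of sequence numbers noted in the introduction.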
The proof of the intersection property shows that it is not possible to have two sets Q1 and Q2 such that Q1 ∩ Q2 = ∅ and at least one
process of Q1 terminates WR_{Q1} and at least one process of Q2 terminates WR_{Q2}. Hence the following corollary.
Corollary 1. Let Q1 and Q2 be two sets such that Q1 ∩ Q2 = ∅. Then, no process of Q1 terminates WR_{Q1} or no process of Q2 terminates
WR_{Q2} (or both).
References
[1] Attiya H., Efficient and Robust Sharing of Memory in Message-passing Systems. Journal of Algorithms, 34(1):109-127, 2000.
[2] Attiya H., Bar-Noy A. and Dolev D., Sharing Memory Robustly in Message Passing Systems. Journal of the ACM, 42(1):121-132, 1995.
[3] Attiya H. and Welch J., Distributed Computing: Fundamentals, Simulations and Advanced Topics (2nd Edition). Wiley-Interscience, 414 pages,
2004.
[4] Chandra T. and Toueg S., Unreliable Failure Detectors for Reliable Distributed Systems. Journal of the ACM, 43(2):225-267, 1996.
[5] Delporte-Gallet C., Fauconnier H. and Guerraoui R., Shared Memory vs Message Passing. Tech Report IC/2003/77, EPFL, Lausanne, December
2003.
[6] Delporte-Gallet C., Fauconnier H., Guerraoui R., Hadzilacos V., Kouznetsov P. and Toueg S., The Weakest Failure Detectors to Solve Certain
Fundamental Problems in Distributed Computing. Proc. 23rd ACM Symposium on Principles of Distributed Computing (PODC’04), ACM Press,
pp. 338-346, 2004.
[7] Delporte-Gallet C., Fauconnier H., Guerraoui R. and Tielmann A., The Weakest Failure Detector for Message Passing Set-Agreement. Proc. 22nd
Int’l Symposium on Distributed Computing (DISC’08), Springer-Verlag LNCS #5218, pp. 109-120, 2008.
[8] Friedman R., Mostefaoui A. and Raynal M., Asynchronous Bounded Lifetime Failure Detectors. Information Processing Letters, 94(2):85-91,
2005.
[9] Gifford D.K., Weighted Voting for Replicated Data. Proc. 7th ACM Symposium on Operating System Principles (SOSP’79), ACM Press, pp.
150-172, 1979.
[10] Herlihy M.P. and Wing J.M., Linearizability: a Correctness Condition for Concurrent Objects. ACM Transactions on Programming Languages
and Systems, 12(3):463-492, 1990.
[11] Lamport L., On Interprocess Communication. Part I: Formalism. Part II: Algorithms. Distributed Computing, 1-2(2):87-103, 1986.
[12] Lynch N.A., Distributed Algorithms. Morgan Kaufmann Pub., San Francisco (CA), 872 pages, 1996.
