Wait-free consensus in "in-phase" multiprocessor systems by Papatriantafilou, M. & Tsigas, P.
Waitfree Consensus in Inphase Multiprocessor Systems
Marina Papatriantalou
MaxPlanckInstitut fur Informatik
Im Stadtwald  Saarbrucken Germany




Im Stadtwald  Saarbrucken Germany
tsigasmpisbmpgde
Abstract
In the consensus problem in a system with n processes each process starts with a private
input value and runs until it chooses irrevocably a decision value which was the input value
of some process of the system moreover all processes have to decide on the same value
This work deals with the problem of waitfreefully resilient to processor crash and napping
failuresconsensus of n processes in an inphase multiprocessor system It proves the
existence of a solution to the problem in this system by presenting a protocol which ensures
that a process will reach decision within at most n n   steps of its own in the worst
case or within n steps if no process fails




CR Subject Classication  B B B C C C D D D
Keywords 	 Phrases Agreement Consensus Distributed Computation FaultTolerance
Multiprocessor Systems Napping Failures Processor Crahses Shared Memory WaitFree
Synchronization
Note Partially supported by the EC ESPRIT II BRA ALCOM II contr  




In the consensus problem in a system with n processes each process starts with a private
input value and runs until it chooses irrevocably a decision value which has to be valid
ie to equal the input value of some process and consistent ie it has to be the same
value for all processes Whereas this is no problem in an ideal failurefree environment it
imposes certain constraints on the capabilities of an actual system which is viable only if
it permits protocols tolerant to failures In a system with failures the consensus problem
becomes a central issue of multiprocessor synchronization and coordination Solutions which
guarantee that each process decides after a certain number of its own steps regardless of the
other processes relative speeds are called waitfree Waitfreedom is a desirable property
in concurrent systems since it helps in taking advantage of the inherent parallelism in the
system by ensuring that no process may be blocked by others which might be slow preempted
swapped out delayed without warning by interrupts moreover it implies maximumtolerance
to processor crash and napping failures
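The two requirements on decision values can be phrased as a check over the inputs and the decisions of an execution. The following Python sketch is only an illustration; the function `check_consensus` and the dictionary representation of inputs and decisions are assumptions introduced here and do not appear in the protocol.

```python
def check_consensus(inputs, decisions):
    """Check validity and consistency of a set of decisions.

    inputs:    dict mapping process id -> private input value
    decisions: dict mapping process id -> decided value, for the
               processes that decided (crashed ones may be absent)
    Returns a pair (valid, consistent).
    """
    input_values = set(inputs.values())
    # Validity: every decision equals the input value of some process.
    valid = all(d in input_values for d in decisions.values())
    # Consistency: all processes that decided chose the same value.
    consistent = len(set(decisions.values())) <= 1
    return valid, consistent
```

A process that never decides (for instance because it crashed) simply does not appear in `decisions`, so it cannot violate either requirement in this sketch.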
As expected such a fundamental problem received the attention of many researchers as a
result many faces of the problem have been studied In 	
 and 	
 it has been proven that
in completely asynchronous systems message passing and shared memory respectively not
even one processor crash can be tolerated by a deterministic consensus protocol In 
 the
result is generalized for message passing systems several critical parameters are identied
and it is examined how they aect the number of faults that can be tolerated by a consensus
protocol In 
 shared memory data objects are partially classied according to the number




[...] concurrently and independently have proven a conjecture, first
stated in [...], that even in the case when the agreement condition is weakened so that the
decision values produced may differ, there is no protocol to tolerate k failures, where k is the
maximum number of distinct values that may be chosen as decisions. Of particular interest
was the introduction of algebraic and combinatorial topology in the study of these problems
[...]. On the other hand, since in the asynchronous model the fault-tolerant
consensus problem cannot be solved deterministically, solutions that have been given employ
randomization or assume some form of synchrony. For a survey and for detailed references
to those works, cf. [...].

To sum up, from the theoretical point of view we have many surprising
negative results, while the interest in the problem remains high. This is because, on the one
hand, it is interesting to develop a thorough understanding of the borders and relations
between classes of objects with respect to their synchronization power; on the other hand,
it is interesting to study more up-to-date architectures, which provide more fundamental
synchronization primitives than just atomic reads and writes. Besides, it is easily noticeable
that there is an important middle ground between the completely asynchronous and completely
synchronous extremes; this middle ground is reasonable for modeling real concurrent systems.
As a result, there has been increasing interest in the past few years in research towards defining
and designing new architectures (e.g., the transactional memory [...]) or exploiting the
properties of already existing ones (which are not all present in the theoretical models)
in order to implement wait-free shared data objects ([...], to mention but a few).
   Results and comparison with previous work
Following the direction mentioned above, in this work we consider the wait-free consensus
problem in an in-phase multiprocessor system [...]. In the same system model, the
wait-free clock synchronization problem has been studied earlier in [...]. Since with today's
technology multiprocessor computers have large numbers of processors, and since the probability
of a crash increases with the number of processors in the system, it is vitally important to
design multiprocessor systems that tolerate faults.
In an in-phase multiprocessor system, processors share a common clock pulse; in the
duration of a pulse a processor reads the shared data of one processor, does some local
computation, and updates its own shared data. It should be pointed out that a processor
cannot modify the contents of registers owned by other processors. It is possible that
processes in this system operate at very different speeds, i.e., miss pulses, because of preemption,
interrupts, page faults, or even processor crashes. So, although in a step a process atomically
reads and writes, and therefore 2-process wait-free consensus is solvable [...], due to pulse
misses and possible processor crashes it is not obvious whether the n-process consensus
problem can be solved wait-free in the system. This work presents a solution for the wait-free
consensus problem with n processes in an in-phase multiprocessor system, thus answering
an open question stated in [...] and showing that this system model/architecture is strong
enough to support deterministic n-process fault-tolerant agreement. The protocol ensures
that a process will reach decision within nn  	  steps of its own in the worst case or
within n    steps when no process misses a pulse
To the best of our knowledge, no solution to this problem has been given before. Previous
results that could serve as solutions can be found in [...]. Those protocols are for the





n expected number of steps respectively but at the
cost of randomization On the other hand the n process waitfree consensus protocols
presented in 
 require some form of multiwriter readmodifywrite or more sophisticated
primitives augmented queue memorytomemoryswap the system model studied here
does not provide that directly Besides the three protocols presented in 
 as part of
a thorough analysis of the cases when consensus is solvable in messagepassing systems
cannot be translated into solutions for our system model Protocols E	 and E of 

assume synchronous processes no napping faults and totally ordered messages not just
FIFO channels respectively Protocol E relies much on the nature of synchronous message
communication ie that a process which had a long napping failure receives all the messages
sent to it during that time interval as soon as it resumes execution In our model a process
has to make n    steps to learn about shared variable modications during that time it
might suer a new napping fault and this might be repeated unbounded many times
The Computation Model
The system consists of n processes, which are identified by distinct identity numbers and denoted
by $P_1, \ldots, P_n$. The processes communicate via a set of single-writer multi-reader atomic
registers. Each process owns a subset of these registers. The owner of a register can write the
register, while all the other processes can read it. A step by a process $P_k$ consists of the
following actions: (i) a read by $P_k$ of the shared registers owned by one process,
(ii) a transition of $P_k$'s local state (program counter, local variables), and (iii) an update of its
own shared registers.
We consider inphase multiprocessor systems in which all processors share a common
clock pulse Each pulse is a possibly empty set of process names the set of processes that
make a step in the pulse Each process can make at most one step in one pulse if it does
not make a step in some pulse it will be said to miss that pulse A conguration is a tuple of









of alternating pulses denoted by 
x
 and congurations denoted by c
x
 consecutive pulses
are indexed with consecutive integers Each conguration c
i
in a system execution is derived
from its directly preceding conguration c
i
by the state transitions and the shared variable
updates of the processes that make a step in pulse 
i
 the reads of shared registers that occur
in pulse 
i
return the respective values of c
i
 while the updates of the shared registers in
the same pulse take place in unison to derive c
i
 For any pulse 
j
in any system execution E






 denote the number of steps that P
k
made from the
beginning of E pulse 

 until and including pulse 
j
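The derivation of a configuration from its predecessor can be illustrated with a small simulation. The Python sketch below is not part of the protocol; the function `run_pulse`, the callback `step_fn` and the dictionary layout are hypothetical names chosen for illustration. It assumes, as in the model above, that a step reads the registers of one process, that reads return the values of the configuration preceding the pulse, and that all updates of a pulse take effect together. Raising an error on a second read within a step is a modelling choice of the sketch.

```python
def run_pulse(registers, local_states, pulse, step_fn):
    """Apply one pulse and return the next configuration.

    registers:    dict pid -> shared register contents (single writer: pid)
    local_states: dict pid -> local state
    pulse:        set of pids that make a step in this pulse (may be empty)
    step_fn:      hypothetical protocol hook,
                  step_fn(pid, local_state, read) -> (new_local, new_register),
                  where read(target) returns target's register contents.
    """
    old = dict(registers)            # configuration before the pulse
    new_registers = dict(registers)
    new_locals = dict(local_states)
    for pid in sorted(pulse):
        reads = []

        def read(target, _reads=reads):
            # A step reads the shared data of one process only.
            if _reads:
                raise RuntimeError("at most one read per step")
            _reads.append(target)
            return old[target]       # reads see the previous configuration

        new_local, new_reg = step_fn(pid, local_states[pid], read)
        new_locals[pid] = new_local
        new_registers[pid] = new_reg  # a process writes only its own register
    # All updates of the pulse take effect together.
    return new_registers, new_locals
```

For instance, if two processes each copy the other's register in the same pulse, the two register values are swapped, since neither read observes the other's update. A process that is not in `pulse` misses the pulse and keeps both its local state and its register.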

In the consensus problem each process P
k
is given an input value v
k
and is required to
return an output value v
 
k





makes its return step in some pulse 
j
 it makes no more steps in subsequent pulses in
the execution if at some pulse in an execution P
k
has not made its return step yet it is
called undecided in that pulse A waitfree consensus protocol should satisfy the following
requirements for every system execution













  T 
Validity For each process P
k
 its output value v
 
k





























Description of the Protocol
The protocol is described in C-like pseudocode in Figures 1 and 2. We have adopted several
conventions, such as using upper-case names for shared variables and upper-case boldface for calls
that read or write shared registers. The following paragraphs describe the protocol more intuitively.
Each process $P_k$ ($k \neq 1$) first plays a game with $P_1$






g and writes that information on its DOM
k
shared register Subsequently
the nal decision is reached through stages in which partial decisions are made inductively
Let D
proc
denote the decision that would be taken if the system consisted only of processes
P

     P
proc
 The protocol tries to follow the inductive rule D
n













 the input value of P


Each process tries to nd D
proc





 it writes the value v decided and the process identity proc on its DEC
k


















  valtype boolean n valtype 	






with input value val 

var proc d set d set p  n 	 
 Initially     

dom dom p rech boolean 	 
 Initially     

dec dec p val p valtype 	






s registers as rst action in a step	 check for decisions 

begin









if d set p  i then d set  d set p	 dec  dec p 	 




 Update own vars as last action of step 

begin










proc recheck l  n dec tmp valtype
var i  n 	
begin
for  i  	 i  l	 i  do
if i  k do
READcheck i 	
if d set  l then proc  i  d set 	 
 advance current process ptr 






Figure 1: Protocol DECIDE (a): shared variables and auxiliary procedures
and $D\_SET_k$ shared registers, respectively. Let $D_k(proc) = v$ denote that mapping, which






will nally return its D
k
n Since processes might
miss pulses and are therefore asynchronous deviations from the rule for deciding D
proc
are allowed so that a fast process P
k
 








to nd out D
proc










proc   Since this is done by a fast process
ie early enough that information is available in the respective shared registers for the slow














k dominance in fP
 
     P
k




if d set   then proc  d set 	
else if val p  null then dom   	
if dom and k   then d set  	 dec  val 	
UPDATE  	
for    proc	 proc  k	 proc  do
READcheck proc 	
if d set  proc then proc  d set 	 
 advance current process ptr 

else if  dom then d set  proc 	
if dom p then dec  val p 	 







if proc  k    and d set  k then d set  proc  k 	






 Main body of procedure DECIDE 

if k   then dec  val 	 
 Step S

k announce presence 

UPDATE  	
if k   then SafePhase k 	
for    proc	 proc  n	 proc do
READcheck proc 	
if d set  proc then proc  d set 	 
 advance current process ptr 

else if dom p then
if k   then d set  proc	 dec  val p 	
else dec tmp  val p	 rech   	 
 should recheck for deviate decisions 

else d set  proc 	 
 consider proc not dominant  deviate decision 

UPDATE  	




Figure 2: Protocol DECIDE (b): procedure SafePhase(k) and main body of DECIDE
processes to nd out about the deviation from the rule and therefore decide consistently
Naturally each process in each one of its steps checks whether a nal decision or a more
advanced than the one it knows so far partial decision is reached and adopts it thus
advancing its process scanning pointer ie local variable proc
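The adoption rule can be sketched as follows. This is an illustrative Python fragment, not the code of the figures: the type `Partial`, the functions `adopt_if_more_advanced` and `advance_pointer`, the convention that `d_set = 0` means that no partial decision is known yet, and the strict comparison are all assumptions made here; the exact tests used in the protocol may differ.

```python
from typing import NamedTuple, Optional


class Partial(NamedTuple):
    d_set: int            # decision covers processes 1..d_set (0: none yet, assumed)
    dec: Optional[str]    # value decided for that set


def adopt_if_more_advanced(own: Partial, seen: Partial) -> Partial:
    """Keep whichever partial decision covers the larger prefix of processes."""
    return seen if seen.d_set > own.d_set else own


def advance_pointer(proc: int, known: Partial) -> int:
    """Skip every process already covered by the known partial decision."""
    return max(proc, known.d_set + 1)
```

Under these assumptions a process that reads a decision for the set {1, ..., 4} while its pointer is at 3 adopts that decision and continues scanning from process 5.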
The game between $P_1$ and an arbitrary $P_k$ is played as follows. Each process, including
$P_1$, as its first action of the protocol (its announcement step) simply announces its
participation in the game by writing its input value on its $VAL_k$ register. In its next step, each
$P_k$ ($k \neq 1$) reads the register of $P_1$ and becomes dominant only if it reads that $P_1$ has not made
its announcement step yet, i.e., if $VAL_1 = null$; in all other cases $P_k$ loses the game.





[...] $\ldots, P_n\}$, one at a step and in that order (except when advancing
its local variable proc), by checking whether $P_{proc}$ is dominant; if not, $P_{proc}$ will never become
dominant, since it will read that $P_1$ has already made its announcement step. Thus, in that















safely for reasons that become clear later in this











k   follows a protocol of 	 stages rst using procedure SafePhasek
it estimates the partial decisionD
k
and then continues forD
k
     D
n
 After making
an observation we rst describe the second phase to give the intuition in a more clear way
Observation  If a process P
k
is not dominant and reads DOM
k
 




will not become dominant in that execution because P

has made its announcement step
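Combined with the rule that reads in a pulse return the values of the preceding configuration, the game has a simple outcome in terms of pulse indices. A minimal Python sketch, assuming that pulses are indexed by consecutive integers; the function `is_dominant` and its parameters are hypothetical names introduced here:

```python
def is_dominant(read_pulse, p1_announce_pulse):
    """Does P_k (k != 1) win the game against P_1?

    read_pulse:        index of the pulse in which P_k reads P_1's register
    p1_announce_pulse: index of the pulse in which P_1 writes its input to
                       VAL_1, or None if P_1 never takes a step
    """
    if p1_announce_pulse is None:
        return True
    # A read in a pulse returns the value from the configuration before
    # that pulse, so an announcement made in the same pulse is not seen.
    return p1_announce_pulse >= read_pulse
```

In particular, under these assumptions a process that reads in the very pulse in which $P_1$ announces still becomes dominant, whereas any later read finds $VAL_1$ set and loses the game.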
Suppose that $P_k$ has an estimation, from procedure SafePhase(k), of the partial decision
$D_k$. Then, for each $proc = k+1, \ldots, n$, it estimates $D_{proc}$ in that order (except for
advances of the local variable proc) as follows: if it reads from $DOM_{proc}$ that $P_{proc}$ is
not dominant, then only if $P_k$ itself is not dominant can it safely make a conclusion, namely
that $P_{proc}$ will never become dominant (by the Observation); otherwise it cannot conclude




to make a step anything
about this future Therefore it arbitrarily considers P
proc





proc    by leaving its DEC
k
shared register unchanged and updating
only its D SET
k





 proc which will
possibly later read that P
proc
is dominant before feeling free to decide D
k
 
proc  V AL
proc

it will have to recheck for earlier deviate decisions about D
proc
 among processes with
identity P
x
 st x  proc ie among processes that might have had to decide about that
with insucient information Note that P

need not recheck for deviations in decisions for
any D
proc
if it reads that P
proc
is dominant if such a decision was made this would have
happened before the rst step of P
proc
 therefore before the step of P

itself by a process
P
x
 st x  proc Thus P

would have read that decision before reading P
proc
s registers
Now lets see what the rst phase is In procedure SafePhasek if P
k
becomes dominant




 it only has to check whether a fast process in
fP

     P
k




is not dominant it has
to estimate D
k
 In both cases it suces for P
k
to scan once the shared registers of each
one of the processes in fP

     P
k





it knows that P
proc





proc   Otherwise if P
proc
is dominant it decides D
k
proc  V AL
proc
 It does not
have to recheck for earlier deviate decisions for D
proc
 because if there was any P
k
would
have read it before reading P
proc
s registers by an argument similar to the one in the previous
paragraph
 Correctness and Performance of the Protocol
For what follows in this section, we need some auxiliary notation and terminology. Consider an
arbitrary system execution E. A process $P_k$ maps a value v to the set $\{1, \ldots, proc\}$ in E
if there exists a configuration such that $DEC_k = v$ and $D\_SET_k = proc$; we denote this mapping
by $D_k(proc)$ and say that $P_k$ decides this value v for the set $\{1, \ldots, proc\}$. This might
happen either because $P_k$ copies that decision from the shared register of some other process in
READcheck, or because $P_k$ computes that decision in SafePhase or in the main body of DECIDE.
Note that $P_k$, in its return step in E, returns $D_k(n)$. In analogy with consistency for the final
output value, we say that the decisions for a set $\{1, \ldots, proc\}$ are consistently made in E by the












 If s and s
 
are steps by processes s  s
 





that s either precedes or is concurrent with s
 
in E the latter is equivalent with s
 
 s
The step of P
k
in which it reads P
proc
s shared registers for the rst time not during procedure








 A process P
k
is dominant in E if in the conguration after S

k and henceforth in all
subsequent congurations in E it is DOM
k
 
Lemma  Waitfreedom In a system with n processes in any execution each process P
k
makes its return step after having made at most nn  	   steps




k in k steps of its own for each D
k
x
k  x  n it needs in the worst case ie if recheck is necessary  x   steps of its
own Summing this all up we have that in the worst case P
k





x  	  nn  	  kk   	
steps of its own The largest value is when k  	 because P

never rechecks and terminates
in at most n steps and equals nn   	   steps  
Lemma  Validity In each execution each process which makes a return step outputs a
value that equals the input value of some process in the execution
Proof Sketch A process P
k
that terminates returns the value D
k
n that decides and
holds in its DEC
k







shared variable or from an assignment to DEC
k
of the input value V AL
proc
of a process P
proc
 In the latter case the lemma is straightforward in the former case if we
trace back the origin of the value held in DEC
k
 
 by the same argument we will nd that it
is an input value of some process in that execution  

Lemma  Consistency In a system with n processes the decisions for all sets f     pg
  p  n are consistently made by the processes of the set fP

     P
n
g in every system
execution
Proof Sketch This can be proven by induction on the number p
Consider an arbitrary execution E in a system with n processes For p   the lemma




     P
n
g in order to decide D
proc





 which implies that the input value of P

is available in its V AL

shared
variable therefore for each P
proc





Assume the lemma holds for all $p \le k$; we will show that it also holds for $p = k + 1$. From
the induction hypothesis we know that the decisions for $\{1, \ldots, k\}$ are consistently made by
the processes of the set $\{P_1, \ldots, P_n\}$. In order to prove that the decisions for $\{1, \ldots, k+1\}$
are also consistently made by the processes of the set $\{P_1, \ldots, P_n\}$

















proc in E reads DOM
k
  and therefore can only possibly decide D
proc





itself will also decide D
k
k    D
k
k Since the decisions
for f     kg are consistently made by the processes in fP

     P
n
g the same follows in this













     P
n
g that executes S
k
proc and decides for
f     k  g reads DOM
k
  and therefore sets D
proc







k    VAL
k
 Therefore in this case the decisions
for f     k k g are consistently made by the processes in fP








     P
k
g which executes S
k
proc and decides for f     k  g
reads DOM
k
  Clearly proc   and proc  k   otherwise the dominance of
P
k




k    D
proc





k   P
k
































k    D
proc





will copy that decision from P
proc





k   steps respectively ie before the steps in which they would have to






     P
k




and decides for f     k  g





  and therefore sets D
proc
 

















values   respectively it holds that














 frecheckk  VAL
k












k the above imply that P
proc
 
will copy that decision from P
proc






     P
n




and decides for f     kg
we have the following a if P
proc
  
is dominant in E since k    proc
  
 from
the protocol it follows that P
proc
  
may only copy D
proc
  
k   from a process in
fP

     P
k
g b if P
proc
  
is not dominant in E and reads DOM
k




k    D
proc
  
k if it reads DOM
k
  then we have the following about





k   P
k









































k   D
proc












Since the decisions for $\{1, \ldots, k\}$ are consistently made by the processes in $\{P_1, \ldots, P_n\}$,
the same follows in this case for the decisions for $\{1, \ldots, k+1\}$.
 
Considering the requirements for a solution to the wait-free consensus problem, the
previous lemmas imply the following theorem.
Theorem. The DECIDE protocol correctly implements a wait-free solution to the consensus
problem in an inphase multiprocessor system with n processes with T  nn   	  	
Conclusions
For the nprocess consensus problem we have given a solution that tolerates processor crash
and napping failures in an inphase multiprocessor system thus showing that this system
model has a nice property which is useful for faulttolerant multiprocessor coordination and
synchronization Besides faulttolerant consensus and clock synchronization it is interesting
to solve other problems eciently and faulttolerant in this system model moreover it would
be useful to implement these algorithms
References
 K Abrahamson On Achieving Consensus Using a Shared Memory Proceedings of
PODC  pp 		
	 Juan Alemany and Edward W Felten Performance Issues in NonBlocking
Synchronization on Shared Memory Multiprocessors Proceedings of PODC 
 pp
	
 James Aspnes Maurice Herlihy Waitfree data structures in the Asynchronous
PRAM Proceedings of SPAA 
 James Aspnes Maurice Herlihy Fast Randomized Consensus Using Shared Memory
Journal of Algorithms  pp  
 James Aspnes Orli Warts Randomized Consensus in Expected On log

n
Operations Per Processor Proceedings of FOCS 
 pp 
 Greg Barnes A Method for Implementing LockFree Shared Data Structures
Proceedings of SPAA  pp 		
 Elisabeth Borowsky Eli Gafni Generalized FLP Impossibility Results for tresilient
Asynchronous Computations Proceedings of STOC  pp 
 Soma Chaudhuri Agreement Is Harder Than Consensus Set Consensus Problem in
Totally Asynchronous System Systems Proceedings of PODC  pp 	
 Danny Dolev Cynthia Dwork Larry Stockmeyer On the Minimal Synchronism
Needed for Distributed Consensus Journal of the ACM vol  No  January 
pp 
 Shlomi Dolev Jennifer L Welch Waitfree clock synchronization Proceedings of
PODC  pp 
 MJ Fischer The Consensus Problem in Unreliable Distributed Systems A Brief
Survey YALEUDCSRR
 June 
	 Michael Fischer Nancy Lynch and Michael Paterson Impossibility of
Distributed Consensus with One Faulty Process Journal of ACM Vol 	 No 	 April
 pp 	
 Maurice Herlihy A Methodology for Implementing Highly Concurrent Data Objects
Proceedings of ACM PPoPP  pp 	
 Maurice Herlihy WaitFree Synchronization ACM Transactions on Programming
Languages and Systems Vol  No  January  pp 	
 Maurice Herlihy JEB Moss Transactional Memory Architectural Support for
LockFree Data Structures Proceedings of ISCA 
 Maurice Herlihy and Sergio Rajsbaum Set Consensus Usibg Arbitrary Objects
Proceedings of PODC  pp	
 Maurice Herlihy and Nir Shavit The Asynchronous Computability Theorem for
tresilent Tasks Proceedings of STOC  pp 	
 Maurice Herlihy and Nir Shavit The Asynchronous Computability Theorem for
References 
Waitfree Computation Proceedings of STOC 
 K Hwang Advanced Computer Architectures Parallelism Scalability
Programmability McGrawHill Inc 
	 Michael C Loui and Hosame H AbuAmara Memory Requirements for Agreement
among Unreliable Asynchronous Processes Advances in Computing Research Vol 
 pp 
	 Nancy A Lynch and Isaac Saias Distributed Algorithms Lecture Notes
MITLCSRSS  Research Seminar Series
		 Michael Saks and Fotios Zaharoglou WaitFree kset Agreement is impossible
The topology of Public Knowledge Proceedings of STOC  pp 
