In the consensus problem in a system with n processes, each process starts with a private input value and has to choose irrevocably a decision value, which was the input value of some process of the system; moreover, all processes have to decide on the same value. This work deals with the problem of wait-free (fully resilient to processor crash and napping failures) consensus of n processes in an "in-phase" multiprocessor system. It proves the existence of a solution to the problem in this system by presenting a protocol which ensures that each process will reach a decision within at most n(n-3)/2 + 3 steps of its own in the worst case, or within n steps if no process fails.
Introduction
In the consensus problem in a system with n processes, each process starts with a private input value and has to choose irrevocably a decision value, which should be valid, i.e. it should equal the input value of some process, and consistent, i.e. it should be the same value for all processes. Whereas this is no problem in an ideal, failure-free environment, it imposes certain constraints on the capabilities of an actual system, which is viable only if it permits protocols tolerant to failures. In a system with failures the consensus problem becomes a central issue of multiprocessor synchronization and coordination. Solutions which guarantee that each process decides after a certain number of its own steps, regardless of the other processes' relative speeds, are called wait-free. Wait-freedom is a desirable property in concurrent systems, since it helps in taking advantage of the inherent parallelism in the system by ensuring that no process may be blocked by others which might be slow, preempted, swapped out, or delayed without warning by interrupts.
As expected, such a fundamental problem has received much attention; as a result, many facets of the problem have been studied. In [12] and [22] it has been proven that in completely asynchronous systems (message passing and shared memory, respectively) not even one processor crash can be tolerated by a deterministic consensus protocol. In [9] the result is generalized for message passing systems; several critical parameters are identified and it is examined how they affect the number of faults that can be tolerated by a consensus protocol. In [14] shared memory data objects are partially classified according to the number of processes that can reach consensus in a wait-free manner using them.
Recently, three groups, in [7], [17] and [25], have concurrently and independently proven a conjecture first stated in [8]: even when the agreement condition is weakened so that the decision values produced may differ, there is no protocol that tolerates k failures, where k is the maximum number of distinct values that may be chosen as decisions. Of particular interest was the introduction of algebraic and combinatorial topology in the study of these problems ([17, 25, 16, 18]). On the other hand, since in the asynchronous model the fault-tolerant consensus problem cannot be solved deterministically, solutions that have been given employ randomization or assume some form of synchrony. For an introduction and more references cf. [11], [23].
To sum up, from the theoretical point of view we have many surprising negative results, while the interest in the problem remains high. This is because, on one hand, it is interesting to develop a thorough understanding of the borders and relations between classes of objects with respect to their synchronization power; on the other hand, it is interesting to study more up-to-date architectures, which provide more fundamental synchronization primitives than just atomic reads and writes. Besides, there is clearly an important middle ground between the completely asynchronous and the completely synchronous extremes; this middle ground is reasonable for modeling real concurrent systems. As a result, there has been increasing interest in the past few years in research towards defining and designing new architectures (e.g. the transactional memory [15]) or towards exploiting the properties of already existing ones (those properties which were not all present in the theoretical models), in order to implement wait-free shared data objects ([13, 2, 6], to mention but a few).
Results and comparison with previous work
Following the direction mentioned above, in this work we consider the wait-free consensus problem in an "in-phase" multiprocessor system. In this system processors share a common clock pulse; in the duration of a pulse a processor reads the shared data of one processor, does some local computation and updates its own shared data. (It should be pointed out that a processor cannot modify the contents of registers owned by other processors.) It is possible that processes in this system operate at very different speeds because of pulse misses due to preemption, interrupts, page faults, or even processor crashes. So, although in a step a process atomically reads and writes, and, therefore, 2-process wait-free consensus is solvable ([14]), due to pulse misses and possible processor crashes it is not obvious whether the n-process consensus problem can be solved wait-free in the system.
In the same system model the wait-free clock synchronization problem has been studied earlier in [10, 24]. Since with today's technology multiprocessor computers have large numbers of processors, and since the probability of a crash increases with the number of processors in the system, it is vitally important both to study which multiprocessors can support protocols which tolerate faults and to design such fault-tolerant protocols for them.
This work presents a solution for the wait-free consensus problem with n processes in an "in-phase" multiprocessor system, thus answering an open question stated in [10] and showing that this system model/architecture is strong enough to support deterministic n-process fault-tolerant agreement. The protocol ensures that a process will reach a decision within n(n-3)/2 + 3 steps of its own in the worst case, or within n-1 steps when no process misses a pulse.
To the best of our knowledge no solution to this problem has been given before. Previous results that could serve as solutions can be found in [1, 4, 5]. Those protocols are for the asynchronous model, which supports only atomic reads or writes, and guarantee decision in exponential, $O(n^4)$ and $O(n \log^2 n)$ expected number of steps, respectively, but at the cost of randomization. On the other hand, the n-process wait-free consensus protocols presented in [14] require some form of multi-writer read-modify-write or more sophisticated primitives (augmented queue, memory-to-memory swap); the system model studied here does not provide that directly. Besides, the three protocols presented in [9], as part of a thorough analysis of the cases when consensus is solvable in message-passing systems, cannot be translated into solutions for our system model. Protocols E2 and E3 (of [9]) assume synchronous processes (no napping faults) and totally ordered messages (not just FIFO channels), respectively. Protocol E1 relies heavily on the nature of synchronous message communication, i.e. that a process which had a long napping failure receives all the messages sent to it during that time interval as soon as it resumes execution. In our model it can happen that a process has to make n-1 steps to learn about shared variable modifications; during that time it might suffer a new napping fault, and this might be repeated unboundedly many times.
The Computation Model
The system consists of n processes which are identified by distinct identity numbers, denoted by $P_1, \ldots, P_n$. The processes communicate via a set of single-writer, multi-reader atomic registers. Each one owns a subset of these registers. The owner of a register can write the register while all the other processes can read it. A step by a process $P_k$ consists of the following actions: reading the shared registers of one process, performing some local computation, and updating its own shared registers.
We consider "in-phase" multiprocessor systems, in which all processors share a common clock pulse. Each pulse is a (possibly empty) set of process names: the set of processes that make a step in the pulse. Each process can make at most one step in one pulse; if it does not make a step in some pulse it is said to miss that pulse. A configuration is a tuple of process states and values of the shared variables. A system execution is a sequence $c_0 \pi_1 c_1 \pi_2 \ldots$ of alternating pulses (denoted by $\pi_i$) and configurations (denoted by $c_i$); consecutive pulses are indexed with consecutive integers. Each configuration $c_i$ in a system execution is derived from its directly preceding configuration $c_{i-1}$ by the state transitions and the shared variable updates of the processes that make a step in pulse $\pi_i$; the reads of shared registers that occur in pulse $\pi_i$ return the respective values of $c_{i-1}$, while the updates of the shared registers in the same pulse take place in unison to derive $c_i$.
This system can be viewed as modeling either a PRAM (cf. [20, 21]) with faults or a multiprocessor synchronous system (cf. [19]) in which scheduling of the processes in different processors is done independently. Pause intervals can be interpreted as periods during which some process is not scheduled in a processor or is delayed without warning by interrupts, preemption, faults in the connections of the pausing processor or even processor crashes.
For any pulse $\pi_i$ in any system execution E and for any process $P_k$, let $work(P_k, \pi_i)$ denote the number of steps that $P_k$ made from the beginning of E (pulse $\pi_1$) until (and including) pulse $\pi_i$.
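The pulse semantics and the work count can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: each process owns a single integer register, and the names `run_pulse`, `run`, `work`, `target` and `update` are hypothetical helpers introduced here, not part of the paper's protocol. It shows that all reads in a pulse see the configuration before the pulse, while all writes take effect together.

```python
from typing import Callable, Dict, List, Set

Registers = Dict[int, int]  # process id -> value of the one register it owns (assumption)


def run_pulse(regs: Registers, active: Set[int],
              target: Callable[[int], int],
              update: Callable[[int, int], int]) -> Registers:
    """Apply one pulse: every active process reads from the old configuration."""
    old = dict(regs)  # configuration before the pulse
    new = dict(regs)  # configuration after the pulse
    for k in active:
        seen = old[target(k)]          # read one process's register (old value)
        new[k] = update(old[k], seen)  # write only the register owned by k
    return new


def run(regs: Registers, pulses: List[Set[int]],
        target: Callable[[int], int],
        update: Callable[[int, int], int]) -> List[Registers]:
    """Return the list of configurations, starting with the initial one."""
    history = [dict(regs)]
    for pulse in pulses:
        history.append(run_pulse(history[-1], pulse, target, update))
    return history


def work(pulses: List[Set[int]], k: int, i: int) -> int:
    """Number of steps process k made in pulses 1..i (inclusive)."""
    return sum(1 for pulse in pulses[:i] if k in pulse)


# Toy run: three processes, each reads its cyclic successor and keeps the maximum.
pulses = [{1, 2}, {3}, set(), {1, 2, 3}]
history = run({1: 10, 2: 20, 3: 30}, pulses,
              target=lambda k: k % 3 + 1, update=max)
```

In the first pulse process 1 reads the old value of process 2's register, even though process 2 updates it in the same pulse; process 3 misses that pulse, and the third pulse is empty.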
In the consensus problem each process $P_k$ is given an input value $v_k$ and is required to return an output value $v'_k$; we call the step when this happens the return step of $P_k$. If $P_k$ makes its return step in some pulse $\pi_i$, it makes no more steps in subsequent pulses in the execution; if at some pulse in an execution $P_k$ has not made its return step yet, it is called undecided in that pulse. A wait-free consensus protocol should satisfy the following requirements for every system execution:
Wait-freedom There should be a bounded number T such that there is no process $P_k$ and pulse $\pi_i$ such that $P_k$ is undecided in $\pi_i$ and $work(P_k, \pi_i) > T$.
Validity For each process $P_k$, its output value $v'_k$ should equal the input value $v_{k'}$ of some process $P_{k'}$ of the system.
Consistency For any processes $P_k$ and $P_{k'}$ with output values $v'_k$ and $v'_{k'}$, it should be $v'_k = v'_{k'}$.
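The three requirements can be checked mechanically on a summary of a finished (or interrupted) execution. The following is a minimal sketch, not part of the protocol: the function name `check_requirements` and its arguments are hypothetical, and it assumes one particular reading of wait-freedom, namely that a process is no longer undecided once it has made its return step, so a decided process may have made at most T + 1 steps.

```python
from typing import Dict


def check_requirements(inputs: Dict[int, int], outputs: Dict[int, int],
                       steps: Dict[int, int], T: int) -> Dict[str, bool]:
    """Check wait-freedom, validity and consistency on an execution summary.

    inputs  : process id -> input value
    outputs : process id -> output value (only processes that returned)
    steps   : process id -> number of own steps made so far
    T       : the wait-freedom bound
    """
    undecided = [k for k in inputs if k not in outputs]
    # Assumed reading: an undecided process has made at most T steps, and a
    # decided process returned in at most its (T + 1)-th step.
    wait_free = (all(steps.get(k, 0) <= T for k in undecided)
                 and all(steps.get(k, 0) <= T + 1 for k in outputs))
    # Every output must be the input of some process.
    validity = all(v in inputs.values() for v in outputs.values())
    # All returned outputs must be equal.
    consistency = len(set(outputs.values())) <= 1
    return {"wait_free": wait_free, "validity": validity,
            "consistency": consistency}


# Toy summary for n = 4 with T = 4; process 3 is slow and still undecided.
example = check_requirements(
    inputs={1: 7, 2: 3, 3: 7, 4: 9},
    outputs={1: 3, 2: 3, 4: 3},
    steps={1: 4, 2: 5, 3: 2, 4: 3},
    T=4,
)
```

A slow or crashed process such as process 3 in the toy summary does not violate any requirement, because the bound counts only a process's own steps.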
Description of the Protocol
The protocol is described in C-like pseudocode in figures 1 and 2. We have adopted several conventions, like using capital case names for shared variables and capital boldface for calls to read/write shared registers. The following paragraphs describe the protocol more intuitively.
Each process $P_k$ ($k \neq 1$) first plays a game with $P_1$. If it wins it is called dominant in the set $\{P_1, \ldots, P_k\}$ and writes that information on its $DOM_k$ shared register (we also write "dominant" as a shorthand for "dominant in the set $\{P_1, \ldots, P_k\}$"). Subsequently, the final decision is reached through stages in which partial decisions are made inductively. Consider an arbitrary execution E and let $D_{1..proc}$ denote the decision that would be taken in the execution E' derived from E by considering only processes $P_1, \ldots, P_{proc}$ (i.e. if the steps of the other processes in E are eliminated).
The protocol tries to follow the inductive rule:
$$D_{1..proc} = \begin{cases} VAL_{proc} & \text{if } P_{proc} \text{ is dominant} \\ D_{1..proc-1} & \text{otherwise} \end{cases}$$
where $VAL_k$ equals the input value of $P_k$. Each process tries to find $D_{1..proc}$ for $proc = 1, \ldots, n$. After a process $P_k$ makes a partial decision for $D_{1..proc}$, it writes the value v decided and the process identity proc on its $DEC_k$ and $D\_SET_k$ shared registers, respectively. Let $D_k(proc) = v$ denote that mapping, which is the estimate of $P_k$ for $D_{1..proc}$; $P_k$ will finally return its $D_k(n)$. Since processes might miss pulses and are, therefore, asynchronous, deviations from the rule for deciding $D_{1..proc}$ are allowed, so that a fast process $P_{k'}$ can meet the wait-free requirement when $P_1$ and $P_{proc}$ are so slow that the result of the game between them is not known at the time that $P_{k'}$ needs to find out $D_{1..proc}$. The deviation is that the fast process (arbitrarily) considers that $P_{proc}$ is not dominant and sets $D_{k'}(proc) = D_{k'}(proc-1)$. Since this is done by a fast process, i.e. early enough, that information is available in the respective shared registers for the slow processes to find out about the deviation from the rule and, therefore, decide consistently. Naturally, each process, in each one of its steps, checks whether a final decision, or a partial decision more advanced than the one it knows so far, has been reached and adopts it, thus advancing its process scanning pointer, i.e. its local variable proc.
The game between $P_1$ and an arbitrary $P_k$ is played as follows: Each process (including $P_1$), as its first action of the protocol (the announcement or 0-step), simply announces its participation in the game by writing its input value on its $VAL_k$ register. In its next step, [...] Otherwise, if it sees that $P_{proc}$ is dominant, $P_k$ decides $D_k(proc) = VAL_{proc}$, without having to recheck for earlier, deviating decisions for $D_{1..proc}$, because if there were any, $P_k$ would have read it before reading $P_{proc}$'s registers (by an argument similar to the one in the previous paragraph).
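The idealized inductive rule (a dominant process contributes its own input value, otherwise the previous partial decision is kept) can be sketched as a simple left-to-right pass. This is an illustration only, under stated assumptions: the function `ideal_decisions` is a hypothetical name, the dominance outcomes are given as input rather than produced by the game, the base case takes the input of the first process (as validity requires when only that process is considered), and the deviations that fast processes are allowed to make are not modeled.

```python
from typing import Dict


def ideal_decisions(vals: Dict[int, int],
                    dominant: Dict[int, bool]) -> Dict[int, int]:
    """Partial decisions for the sets {1..proc}, proc = 1..n, without deviations.

    vals     : process id (1..n) -> input value
    dominant : process id (2..n) -> whether it won its game against process 1
    """
    n = len(vals)
    decisions = {1: vals[1]}  # base case: only process 1 is considered
    for proc in range(2, n + 1):
        if dominant.get(proc, False):
            decisions[proc] = vals[proc]            # dominant: take its input
        else:
            decisions[proc] = decisions[proc - 1]   # otherwise: keep the previous decision
    return decisions


# Toy instance with four processes; only process 2 is dominant.
vals = {1: 10, 2: 20, 3: 30, 4: 40}
partial = ideal_decisions(vals, {2: True, 3: False, 4: False})
final_decision = partial[4]
```

Under this idealized rule the final decision is the input of the highest-numbered dominant process, or the input of the first process when no process is dominant.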
Observation 1 If a process Pk

Correctness and Performance of the Protocol
For what follows in this section, we introduce some auxiliary notation/terminology. Consider an arbitrary system execution E:
A process $P_k$ maps a value v to the set $\{1, \ldots, proc\}$ in E if there exists a configuration such that $DEC_k = v$ and $D\_SET_k = proc$; we denote this mapping by $D_k(proc)$ and say that $P_k$ decides this value v for the set $\{1, \ldots, proc\}$.
This might happen either because $P_k$ copies that decision from the shared register of some other process (in READ&check) or because $P_k$ computes that decision (in SafePhase or in the main body of DECIDE). Note that $P_k$ in its return step in E returns $D_k(n)$. In analogy with consistency for the final output value, we say that the decisions for a set $\{1, \ldots, proc\}$ are consistently made in E by the processes of a set $\mathcal{P}$, if for any two $P_k, P_{k'} \in \mathcal{P}$ that decide for $\{1, \ldots, proc\}$, it holds that $D_k(proc) = D_{k'}(proc)$.
If $s$ and $s'$ are steps by processes, $s \rightarrow s'$ denotes that $s$ precedes $s'$, while $s \dashrightarrow s'$ denotes that $s$ either precedes or is concurrent with $s'$ in E; the latter is equivalent to $\neg(s' \rightarrow s)$. The step of $P_k$ in which it reads $P_{proc}$'s shared registers for the first time (if ever) and not during procedure
steps of its own. Since $P_1$ never rechecks and, hence, terminates in at most n steps, the largest value of the above expression is for k = 2 and equals n(n-3)/2 + 3 steps.
Lemma 2 (Validity) In each execution each process which makes a return step outputs a value that equals the input value of some process in the execution.
Proof (Sketch) A process $P_k$ that terminates returns the value $D_k(n)$ that it decides and holds in its $DEC_k$ shared variable. That value came either from a copy of some process $P_{k'}$'s $DEC_{k'}$ shared variable, or from an assignment to $DEC_k$ of the input value $VAL_{proc}$ of a process $P_{proc}$. In the latter case the lemma is straightforward; in the former case, if we trace back the origin of the value held in $DEC_{k'}$, by the same argument, we will find that it is an input value of some process in that execution.
[...] Since the decisions for $\{1, \ldots, k\}$ are consistently made by the processes in $\{P_1, \ldots, P_n\}$, the same follows in this case for the decisions for $\{1, \ldots, k+1\}$. [...] or that $s_{k+1}(1) \rightarrow s_{proc}(k+1)$. In the former [...] Moreover, from the protocol we have that [...]
Considering the requirements for a solution to the wait-free consensus problem, the previous lemmas imply the following theorem:
Theorem 1 The DECIDE protocol correctly implements a wait-free solution to the consensus problem in an in-phase multiprocessor system with n processes, with T = n(n-3)/2 + 2.
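To get a feeling for the size of the bound, the two expressions can be evaluated for small n. The sketch below is plain arithmetic on the formulas stated in this paper; the function names `worst_case_steps` and `wait_freedom_bound` are hypothetical. Integer division is exact here because n and n-3 always have opposite parity, so n(n-3) is even.

```python
def worst_case_steps(n: int) -> int:
    """Worst-case number of own steps before a decision: n(n-3)/2 + 3."""
    if n < 2:
        raise ValueError("the bound is stated for systems with at least 2 processes")
    return n * (n - 3) // 2 + 3  # n(n-3) is always even, so // is exact


def wait_freedom_bound(n: int) -> int:
    """The bound T of the theorem: n(n-3)/2 + 2."""
    return worst_case_steps(n) - 1


# Worst-case steps for n = 2..6.
table = {n: worst_case_steps(n) for n in range(2, 7)}
```

T is one less than the worst-case step count, which fits a reading in which a process may still be undecided after T steps and returns in its next step at the latest.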
Conclusions
In this paper it is shown that the in-phase multiprocessor system model is strong enough to support wait-free (i.e. tolerating processor crash and napping failures) solutions to the n-process consensus problem. Previously it has been shown that the same architecture can support wait-free and self-stabilizing clock synchronization protocols. Both the consensus protocol and the clock synchronization protocols have quadratic time complexity; since both problems are central issues in fault-tolerant multiprocessor coordination, it is interesting either to prove that this is also the lower bound or to find more efficient solutions.
