Though it is common practice to treat synchronization primitives for multiprocessors as abstract data types, they are in reality machine instructions on registers. A crucial theoretical question with practical implications is the relationship between the size of the register and its computational power. We wish to study this question and choose as a first target the popular compare&swap operation (which is the basis for many modern multiprocessor architectures). Our main results are:
Introduction
We consider an asynchronous concurrent system consisting of $n$ processes that communicate via shared memory. It is well known that the type of operations allowed on the shared memory cells greatly affects the kind of tasks that the $n$ processes can solve. The first results of this type [7, 9, 12, 15] proved that if the only operations supported by the hardware are atomic reads and writes of memory cells (registers), then the system cannot implement a wait-free solution to the consensus problem, even if $n = 2$. (An algorithm is wait-free if each process finishes the algorithm in a finite number of steps regardless of the number of faults and the speed of other processes.) However, if the hardware also supports atomic test-and-set operations on single bits in the shared memory (as some old IBM machines do, and as some modern machines such as Encore's Multimax, Sequent's Symmetry, DEC's Firefly and 6380 Corollary support), then 2 processes can solve the consensus problem among them but 3 processes cannot [9, 12, 15]. In his seminal paper, Herlihy defined a hierarchy on abstract operation types, classifying them according to the number of processes among which these operations can solve consensus [9]. More specifically, an operation type has consensus number $k$ if any system supporting that operation type and the read/write operation type on registers of arbitrary size and number can be used to solve consensus among $k$ processes, but cannot be used to solve consensus among $k + 1$ processes. At the bottom level of the hierarchy are the weakest operation types, with consensus number 1, e.g. atomic read/write of registers, while at the top are operation types such as compare&swap, whose consensus number is $\infty$.
Herlihy and Plotkin showed that any wait-free synchronization task can be solved with any operation in the top level of the hierarchy [12, 18]. That is, they presented a universal construction for any sequentially-specified wait-free task. Jayanti and Toueg later presented simple and bounded universal constructions [13].
In this paper we define the space complexity of synchronization registers as the number of different values that the registers can hold (which is exponential in the number of bits in the registers). We define the space complexity of an arbitrary synchronization object as the number of states in its sequential specification. We study the effect of the space complexity of the registers on the efficiency with which synchronization tasks are solved.
We demonstrate our results by considering solutions to the leader election problem with compare&swap registers. In the leader election task each process proposes its own identity as its input and all processes decide on one unique identity as their output decision value. Validity requires that the elected identity be one that has been proposed. The compare&swap register type, which is supported by two contemporary machines (486-based Corollary and 68030-based NEWS), is in the top level of Herlihy's hierarchy. The operation c&s($a \to b$) on register $r$ is defined as follows:

    c&s(a → b)(r): return(v)
        prev := r;
        if prev = a then r := b;
        return(prev)

Throughout the paper we assume that compare&swap registers are initialized (before any algorithm starts using them) to $\bot$. Also, in all the consensus algorithms throughout the paper, the first step of a process is to write its identity in a read/write register (thus notifying others that it is active in the algorithm).
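The semantics of the operation can be illustrated with a small simulation. The following is a minimal sketch, not a hardware model: the class name `CompareAndSwapRegister`, the use of `None` for the initial value, and the lock that stands in for hardware atomicity are all assumptions made for illustration.

```python
import threading

BOTTOM = None  # assumed stand-in for the initial register value


class CompareAndSwapRegister:
    """Simulated compare&swap register; a lock plays the role of atomicity."""

    def __init__(self, initial=BOTTOM):
        self._value = initial
        self._lock = threading.Lock()

    def cas(self, a, b):
        """c&s(a -> b): store b only if the register holds a; return the old value."""
        with self._lock:
            prev = self._value
            if prev == a:
                self._value = b
            return prev


r = CompareAndSwapRegister()
first = r.cas(BOTTOM, 0)   # register held the initial value, so 0 is stored
second = r.cas(BOTTOM, 1)  # register now holds 0, so nothing is stored
```

An operation succeeded exactly when the returned value equals its first argument, which is how a caller distinguishes the two outcomes.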
We begin the paper with a presentation of several previously known leader election algorithms, analyzing the time-space tradeoff of each. We then present an algorithm that uses a compare&swap of space complexity $k$ (and atomic read/write registers) to solve leader election among $(k-1)!$ processes, in $k-1$ accesses to the compare&swap. Finally, we show a matching lower bound proving that this algorithm is optimal in its time-space tradeoff. The lower bound on the tradeoff is proved by a simulation technique that reduces a compare&swap leader-election algorithm with limited resources (time and space) to a set-consensus algorithm which uses only read/write registers.
Related work: Since the first impossibility proof of asynchronous agreement in a fail-stop distributed system, by Fischer, Lynch and Paterson [7], there have been many papers extending and generalizing the proof method [4, 5, 6, 13]. Those papers extend it to deal with different models of communication and different models of failure. Some of the recent papers [3, 4, 10] addressed the number of failures that shared memory objects can withstand. This issue of $t$-resiliency has not been addressed in our paper. However, in [3] Borowsky and Gafni introduced a simulation technique to address the power of various shared objects (without restriction on their space complexity). In their technique each simulating process tries to simulate all the codes of the simulated algorithm, while in our technique we divide the codes among the simulators, each simulating several codes. Burns, Cruz and Loui [1] consider the effect of the size of registers on their ability to solve leader election. However, Burns et al. make two strong assumptions: (1) that each read-modify-write register may be written at most once, and (2) that the system is equipped only with read-modify-write registers (there are no read/write registers). Thus, the lower bound proof of Burns et al. is simpler, since the state of the system is rendered by the state of the strong registers. Under these assumptions, Burns et al. prove that a $k$-value read-modify-write register can elect a leader among at most $k-1$ processes (compared with the $O(k!)$ in our model), and in general, if there are several such registers, then the number of processes is the product of the register sizes (where the size of a register is the number of values it can hold).
The paper proceeds as follows: Model and Definitions are in Section 2, previous work is presented in Section 3, the algorithm is presented in Section 4, and the time-space tradeoff proof is in Section 5. Conclusions and discussion are in Section 6.
Model and Definitions
We use the same model and notation as in [9]. A consensus protocol is a system of $n$ processes where each process starts with an input value from some domain $D$. The processes communicate with one another by applying operations to the shared memory and eventually agree on a common input value and halt. A consensus protocol is required to be: (a) Consistent: distinct processes never decide on distinct values, (b) Valid: the common decision value is the input to some process, and (c) Wait-free: each process decides after a finite number of steps.
The sequential specification of a consensus object is that all decide operations return the argument value of the first decide [11, 18]. A wait-free linearizable implementation of a consensus object is called a consensus protocol.
A leader election (LE) protocol is a consensus protocol where the domain $D$ is the set of process names and the input to process $i$ is its own id $i$. Throughout the paper we assume that the ids are from the range $1, \ldots, n$. This assumption is based on the existence of wait-free renaming algorithms that use only SWMR shared atomic registers. Such algorithms can be used before our algorithm to reduce the range of the ids to $2n$ ($n$ being the number of processes). Note, however, that the complexity of the algorithm will change according to the renaming algorithm we use.
The $k$-set consensus problem is a generalization of the consensus problem. Informally, a $k$-set consensus protocol is a system of $n$ processes where each process starts with an input value from some domain $D$. The processes communicate with one another by applying operations to the shared memory registers and eventually each decides on a value from a set $D' \subseteq D$ where $|D'| \le k$. A $k$-set consensus protocol is required to be: (a) Consistent: $|D'| \le k$, (b) Wait-free: each process decides after a finite number of steps, and (c) Valid: the decision value of any process is the input to some process.
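The consistency and validity conditions can be checked mechanically on the outcome of a finished run. The sketch below assumes the outcome is given as two dictionaries mapping process ids to inputs and to decisions; the function name `is_valid_k_set_outcome` is hypothetical, and wait-freedom is a property of runs that this check does not cover.

```python
def is_valid_k_set_outcome(inputs, decisions, k):
    """Check the Consistent and Valid conditions for one finished run.

    inputs:    {process id: input value}
    decisions: {process id: decided value} for the processes that decided
    """
    decided = set(decisions.values())
    consistent = len(decided) <= k          # |D'| <= k
    valid = decided <= set(inputs.values())  # every decision is some input
    return consistent and valid


inputs = {1: "a", 2: "b", 3: "c"}
decisions = {1: "a", 2: "b", 3: "a"}
ok_for_two = is_valid_k_set_outcome(inputs, decisions, 2)
ok_for_one = is_valid_k_set_outcome(inputs, decisions, 1)
```

With $k = 1$ the check coincides with the consistency and validity conditions of ordinary consensus.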
Informally, an atomic SWMR register is an object that supports a read operation of the register by any process, but can be updated by only one specific process.
A compare&swap-(k) object is a compare&swap as defined in the introduction, whose register can hold $k$ different values, from the set $C = \{\bot, 0, 1, \ldots, k-2\}$. The space complexity of a compare&swap-(k) is $k$. A compare&swap operation is said to succeed if the operation has changed the register's value. An algorithm is said to have time complexity $t$ if there is a run of the algorithm where at least one process performs $t$ operations on the compare&swap object.
Previous Results
Consider the following sequence of three leader election algorithms with decreasing space complexity and increasing time complexity:
1. To elect a leader among $n$ processes using a compare&swap register that can hold $n +$ [...] in the read/write memory and iterates. However, if it fails, it scans the memory for the largest recorded iteration number, adopts that process's candidate id and iteration number and iterates (using the new id and iteration number). In this $O(\log n)$ space algorithm each process performs at most $O(\log n)$ accesses to the compare&swap register.

This sequence of algorithms poses the following questions:

1. What is the optimal space complexity of a system of compare&swap registers that can solve leader election (with unbounded read/write memory)? That is, what is the smallest compare&swap register needed for solving LE among $n$ processes?
2. What is the optimal number of accesses to the compare&swap register in a leader election algorithm if the register space complexity is $k$ (in other words, what is the optimal tradeoff)?

An algorithm which has optimal time-space tradeoff is presented in the following section. We conjecture that its space complexity is optimal.

Leader Election for $(k-1)!$ processes using compare&swap-(k)

A leader election (LE) algorithm among $n$ processes, using one compare&swap-($\log n / \log\log n$) (i.e. one compare&swap that can hold $\log n / \log\log n$ different values) is presented in this section.
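To get a feel for how slowly the required register size grows with $n$, one can compute the smallest $k$ for which $(k-1)! \ge n$. This is a small worked sketch; the function name `min_register_size` is an illustrative assumption, and the computation gives the exact threshold rather than the asymptotic $\log n / \log\log n$ form.

```python
from math import factorial


def min_register_size(n):
    """Smallest k such that (k-1)! >= n, i.e. enough permutations for n ids."""
    k = 2
    while factorial(k - 1) < n:
        k += 1
    return k


sizes = {n: min_register_size(n) for n in (1, 6, 7, 24, 120, 10**6)}
```

For instance, 6 processes fit in a register with 4 values, while a million processes need only 11 values, since $10! = 3628800$.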
W.l.o.g. assume that $n = (k-1)!$. Let $P$ be the set of the $(k-1)!$ different permutations of $\{0, 1, \ldots, k-2\}$. The algorithm uses a one-to-one mapping $F : \{1, 2, \ldots, n\} \to P$ from the process ids to $P$. With each id $i$ of process $P_i$ we associate a permutation $L(i)$ of $C$ (the possible values of the compare&swap) as follows: $L(i) = \bot \| F(i)$. Since we assumed $n = (k-1)!$, each process executes at most $k-1$ iterations of the code. If there are fewer processes, fewer iterations are needed (the number of iterations, $i$, is such that $\frac{(k-1)!}{(k-1-i)!} \ge n$).

In the algorithm, processes agree on the identity of the leader by iteratively reaching consensus on the sequence of symbols $L(\mathit{leader})$ that is associated with the leader identity. In each phase one more symbol of the sequence is agreed upon. The consensus is reached by using the compare&swap register. While in solution 3 of the previous section the compare&swap register holds both an index and a binary bit, in this algorithm it holds only one value, the symbol that processes have most recently agreed upon.

In each phase a process performs the following four steps. First, according to its phase number and candidate id, it tries to c&s() the last symbol agreed upon with the next symbol of its candidate. If the c&s() was successful, the process records its success (phase number and value swapped in) in the shared memory and continues to the next phase. If the c&s() operation failed, then the process scans the shared memory to see if the other processes have already agreed upon a symbol for the current, or a more advanced, phase. If so, the process adopts the phase number and candidate id of the process with the most advanced phase number. However, if no phase number larger than the current one is found in the memory, then it must be that the value returned by the failed c&s() operation is the symbol agreed upon in the current phase. The process then adopts as a candidate id a process whose permutation prefix equals the sequence of symbols agreed so far, and continues to the next phase.

The key point in this algorithm is that processes reuse the same compare&swap register to reach agreement in all the phases, despite the fact that processes may be in different phases at the same time. This is made possible because each process id is represented by a non-repeating permutation of symbols. This ensures that delayed c&s($a \to b$) operations from previous phases fail, since $a$ does not repeat in the permutation.
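The association of ids with non-repeating symbol sequences can be made concrete as follows. This is a sketch under assumptions: the text only requires $F$ to be one-to-one, so the lexicographic order used here, the string `"⊥"` for the initial value, and the name `build_labels` are illustrative choices.

```python
from itertools import permutations

BOTTOM = "⊥"  # assumed representation of the initial register value


def build_labels(k):
    """Map ids 1..(k-1)! to L(id) = BOTTOM followed by a permutation of 0..k-2."""
    perms = permutations(range(k - 1))  # any one-to-one F would serve
    return {pid: (BOTTOM,) + perm for pid, perm in enumerate(perms, start=1)}


labels = build_labels(4)  # k = 4 values, so (k-1)! = 6 ids
```

In phase $p$ a process whose candidate has label $L$ attempts c&s($L[p-1] \to L[p]$) (0-based indexing); because no symbol occurs twice in $L$, the expected value of an operation from an earlier phase can never reappear in the register along the agreed sequence.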
The structure of the algorithm (given in Figure 1) is similar to that of algorithm 3 from Section 3.
The algorithm uses one compare&swap object, csk, that can hold $k$ values, $(k-1)! = n$, and which is initialized to $\bot$. Each process $P_i$ has one SWMR atomic register, $R_i$, that it can write and all others can read. Each such atomic register has the following two fields: (Phase$_i$, CandidateId$_i$), $1 \le i \le n$. Each process starts the algorithm by posting its id in its SWMR atomic register. In the first phase each process $P_i$ proposes the first symbol of $L(i)$ (= Symbol$(1, i)$).

Operation Leader-Election(id: value) returns(value)
The LE operation accepts as parameter the process's id and returns as output the id of one of the participating processes. All participating processes receive the same output value.
shared $R_i$, $1 \le i \le n$: $n$ atomic registers, each consisting of a pair (Phase$_i$, CandidateId$_i$).
CandidateId$_i$ holds the current candidate of process $i$ for being the consensus value. It is initialized to $i$. Phase$_i$ holds the number of symbols that process $i$ knows to have been agreed upon. It is initialized to 1.
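The prose description of the algorithm can be turned into a small threaded simulation. This is a sketch reconstructed from the description above and is not the code of Figure 1: the names `CasRegister`, `leader_election` and `run_election`, the 0-based label indexing, the stopping condition Phase $= k-1$ (taken from Corollary 1), and the use of a plain collect of the registers are all assumptions.

```python
import threading
from itertools import permutations

BOTTOM = "⊥"  # assumed representation of the initial register value


class CasRegister:
    """Simulated compare&swap register (lock stands in for atomicity)."""

    def __init__(self):
        self._value, self._lock = BOTTOM, threading.Lock()

    def cas(self, a, b):
        with self._lock:
            prev = self._value
            if prev == a:
                self._value = b
            return prev


def leader_election(pid, k, labels, cs, R):
    phase, cand = 1, pid              # one symbol (BOTTOM) is agreed initially
    R[pid] = (phase, cand)            # announce participation
    while phase < k - 1:
        a, b = labels[cand][phase - 1], labels[cand][phase]
        prev = cs.cas(a, b)
        if prev == a:                 # success: our candidate's symbol was agreed
            phase += 1
        else:
            seen = [r for r in list(R) if r is not None]   # collect
            top = max(seen)           # most advanced (phase, candidate) pair
            if top[0] > phase:
                phase, cand = top     # adopt the most advanced process
            else:                     # prev is the symbol agreed in this phase
                prefix = labels[cand][:phase] + (prev,)
                cand = next(c for _, c in seen
                            if labels[c][:phase + 1] == prefix)
                phase += 1
        R[pid] = (phase, cand)        # record progress before the next c&s
    return cand


def run_election(k):
    perms = permutations(range(k - 1))
    labels = {pid: (BOTTOM,) + p for pid, p in enumerate(perms, start=1)}
    n = len(labels)
    cs, R, out = CasRegister(), [None] * (n + 1), {}

    def worker(pid):
        out[pid] = leader_election(pid, k, labels, cs, R)

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(1, n + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


results = run_election(4)  # 6 processes sharing one register of 4 values
```

Every process records its (phase, candidate) pair before its next c&s(), which is what allows a process whose c&s() failed to find either a more advanced process or a candidate matching the returned symbol.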
For a given $k$ we show LE among up to $(k-1)!$ processes.
If some process is in a more advanced phase then copy the data of the most advanced such process.

Corollary 1 All the processes whose Phase equals $k-1$ have elected a unique leader.

Proof of Claim: In any state of the algorithm, call any process that satisfies Part 1 of the claim a for-runner. We prove the claim by induction on the length of the run. Let $\alpha_l = \pi_1 \cdots \pi_l$ be a prefix run of the algorithm. For $\alpha_0 = \emptyset$ the claim trivially holds. Assume the claim is correct for $\alpha_l$; we prove correctness for $\alpha_{l+1} = \alpha_l \pi_{l+1}$.
Let $P_i$ be the process executing the operation $\pi_{l+1}$. Clearly, if $\pi_{l+1}$ is not an assignment to $R_i$ or to the compare&swap, the operation does not affect the correctness of the claim. Four cases remain to be verified:

1. Assume $\pi_{l+1}$ is the c&s() operation in Line 5, $a$ the symbol that was in the compare&swap before the operation and $b$ the symbol that is in the compare&swap after the operation. If $\pi_{l+1}$ fails, the claim is correct by induction. Assume otherwise ($a \ne b$). Since the Phase$_i$+1 symbol in CandidateId$_i$ is $b$, which is the current symbol in the compare&swap, $P_i$ is a for-runner and thus Part 1 of the claim follows. As there is no change of any atomic registers, Parts 2 and 3 of the claim are correct by induction. Since $P_i$ is a for-runner, there is no process $P_j$ such that Phase$_i + 1 \le$ Phase$_j$. Therefore Part 4 of the claim is not relevant for $P_i$ and follows from the induction hypothesis.

2. Assume $\pi_{l+1}$ is the assignment operation $R_i :=$ (Phase$_i$+1, CandidateId$_i$) by process $P_i$, in Line 7.
Let $\pi_x$ be the last successful c&s() operation of $P_i$, c&s($a \to b$), where $a$ was the symbol in the compare&swap before the operation and $b$ the symbol in the compare&swap after the operation. That is, the run prefix ending with the current operation is $\pi_1 \cdots \pi_x \cdots \pi_{l+1}$. Below, $\alpha_m$ denotes the run prefix $\pi_1 \cdots \pi_m$. We consider two cases: Case 1, in which $P_i$ is a for-runner at the end of $\alpha_l$, and Case 2, in which $P_i$ is not a for-runner at the end of $\alpha_l$.

Case 1: The only way in which the first part of Claim 1 does not hold at $\alpha_{l+1}$ is if the value in the compare&swap register at the end of $\alpha_l$ is not $b$. This means some process, $P_j$, updated the compare&swap between $\pi_x$ and $\pi_{l+1}$. Thus, when $P_j$ executed the update operation, the Phase$_j$ symbol in CandidateId$_j$ was $b$. Assume the value in Phase$_j$ at the end of $\alpha_x$ was $v_1$. As $P_i$ is a for-runner after $\alpha_l$, Phase$_i \ge$ Phase$_j \ge v_1$. From Parts 2 and 3 of the induction hypothesis it follows that the $v_1$ symbol of CandidateId$_i$ is $b$. But we know that $b$ is the Phase$_i$+1 symbol of CandidateId$_i$, a contradiction since the same symbol does not appear twice in an id. This means that $P_i$ stays a for-runner at the end of $\alpha_{l+1}$ and thus Part 1 of the claim holds. Obviously, Part 2 of the claim is not affected and remains true. Part 3 holds since any process $P_j$ that satisfies Part 3 of the induction hypothesis (Phase$_j \le$ Phase$_i$ at the end of $\alpha_l$) satisfies Part 3 of the claim at the end of $\alpha_{l+1}$. Since $P_i$ is a for-runner at the end of $\alpha_l$, this is true for all processes and so Part 3 holds. Part 4 of the claim follows by the same argument.

Case 2: $P_i$ is not a for-runner at the end of $\alpha_l$. Let $P_{lead}$ be a for-runner process at the end of $\alpha_l$. This case describes the behavior of a process that updated the compare&swap and was then suspended for a long time, so that other processes continued with the leader election without it. We claim that at the end of $\alpha_l$, Phase$_{lead} >$ Phase$_i$. Assume to the contrary that Phase$_{lead} =$ Phase$_i$. Then the symbol in the compare&swap cannot be $b$ (because we assume $P_i$ is not a for-runner). This means that some process, say $P_j$, successfully updated the compare&swap between $\pi_x$ and $\pi_l$. Assume that the value in Phase$_j$ just before the update operation was $v_2$. Then, from Parts 2 and 3 of the induction hypothesis, it follows that at the end of $\alpha_l$ the $v_2$ symbol of CandidateId$_{lead}$ must be $b$. Since Phase$_{lead} =$ Phase$_i$, it follows that the $v_2$ symbol in CandidateId$_i$ is $b$. Since we know that the Phase$_i$+1 symbol of CandidateId$_i$ is also $b$, and since the same symbol does not appear twice in the same id, we reach a contradiction, and therefore Phase$_i <$ Phase$_{lead}$ at the end of $\alpha_l$. Hence $P_{lead}$ stays a for-runner after $\alpha_{l+1}$ and thus Part 1 of the claim follows. Part 2 is not affected and therefore holds. Part 3 holds since any process $P_j$ that satisfies Part 3 of the induction hypothesis (Phase$_j \le$ Phase$_i$ at the end of $\alpha_l$) satisfies Part 3 of the claim at the end of $\alpha_{l+1}$. Part 3 of the claim holds because all other processes (for which Phase$_j >$ Phase$_i$) satisfy Part 4 of the induction hypothesis at the end of $\alpha_l$. Since $P_i$ executed the update of the shared memory register, Part 4 of the claim is no longer relevant for $P_i$ and follows by induction.

3. Assume $\pi_{l+1}$ is the assignment $R_i := R'_k$ in Line 11. Since before the assignment Phase$_i <$ Phase$'_k$, we know $P_i$ was not a for-runner. Therefore, any process that was a for-runner at the end of $\alpha_l$ stays a for-runner at the end of $\alpha_{l+1}$ and Part 1 of the claim follows. Since before the assignment [...] phase of $P_i$ before the assignment. Also, let $\pi_x$ be the last (unsuccessful) c&s() operation of $P_i$, that is, $\alpha_{l+1} = \pi_1 \cdots \pi_x \cdots \pi_{l+1}$. This scenario occurs if some process $P_s$, with phase equal to Phase$_i$ and a CandidateId which has the same first Phase$_i$ symbols as CandidateId$_i$ but a different Phase$_i$+1 symbol, updated the compare&swap to its Phase$_i$+1 symbol but had not yet increased its phase when $P_i$ executed the last collect (Line 9). Assume the current value in the compare&swap at the end of $\alpha_l$ is $b$.
We distinguish between two cases (here $\alpha_m$ denotes the run prefix $\pi_1 \cdots \pi_m$ and $\pi_x$ the last c&s() operation of $P_i$):

(a) Assume that at the end of $\alpha_l$ there was some process with phase strictly larger than Phase$_i$. Then there must have been a for-runner process, $P_{lead}$, with phase strictly larger than Phase$_i$. $P_{lead}$ stays a for-runner after $\alpha_{l+1}$, which means Part 1 of the claim holds. Part 2 of the claim holds since we chose $P'_k$ to be such that the first Phase$_i$ symbols of CandidateId$'_k$ (local copy!) are the same as the first Phase$_i$ symbols of CandidateId$_i$. To prove Part 3, it is necessary to show that the first Phase$_i$+1 symbols of CandidateId$'_k$ are also the first Phase$_i$+1 symbols of $P_s$, the process that successfully updated the compare&swap to $b$.

(b) Assume that after $\alpha_l$ all processes were in phase less than or equal to Phase$_i$. We claim that the value in the compare&swap at the end of $\alpha_{l+1}$ must be $b$ and therefore $P_i$ is a for-runner. By way of contradiction, assume the value in the compare&swap at the end of $\alpha_l$ is not $b$. Then between $\pi_x$ and $\pi_{l+1}$ some process, say $P_j$, changed the value in the compare&swap. This means that the Phase$_j$ symbol in CandidateId$_j$ just before the update operation must have been $b$. Assume that Phase$_j = v_1$. Then it follows that at the end of $\alpha_l$ the $v_1$ symbol of CandidateId$_j$ is $b$. From Part 3 of the assumption, the $v_1$ symbol of CandidateId$_i$ must be $b$, and since $b$ appears only once in CandidateId$_i$, and it is the Phase$_i$+1 symbol, Phase$_j = v_1 =$ Phase$_i$+1, which contradicts the assumption that Phase$_i$ is bigger than all other phases. Thus, $P_i$ is a for-runner and therefore Part 1 of the claim holds. Part 2 of the claim holds since, from Part 3 of the induction hypothesis, the first Phase$_i$ symbols of CandidateId$_i$ are also the first Phase$_i$ symbols of CandidateId$'_k$. Part 3 of the claim holds because at the end of $\alpha_l$, for every process $P_j$, Phase$_j \le$ Phase$_i$, and so the first Phase$_j$ symbols of $P_j$ are also the first Phase$_j$ symbols of CandidateId$_i$. Therefore (as we already proved Part 2) they are also the first Phase$_j$ symbols of CandidateId$_i$ at the end of $\alpha_{l+1}$. Since we proved Part 2 of the claim, and because for any process $P_j$, Phase$_j \le$ Phase$_i$, Part 4 of the claim holds by induction.
Claim 2 For every process $P_i$, Phase$_i$ increases with every iteration of the while loop.
Proof: In each iteration exactly one of Lines 7, 11, 13 must execute. In all of them Phase$_i$ increases.
However, it still needs to be shown that, in Line 12, there must exist such a $P_k$. Following the notation of Part 4 of the proof of Claim 1, we note that $P_i$ failed to update the compare&swap, returning $b$ as a result of the unsuccessful update. This means that some other process, $P_j$, with phase larger than or equal to Phase$_i$ updated the compare&swap (because there must be at least one for-runner). But, during the collect, no process with Phase larger than Phase$_i$ was found, and therefore the phase of the process that updated the compare&swap is equal to Phase$_i$. This means that $b$ is its Phase$_i$+1 symbol and so at least one such $P'_k$ must be found in Line 12.
Corollary 2 The Leader Election algorithm is wait-free.
The next theorem follows from the two claims.

Theorem 1 The algorithm in Figure 1 is a wait-free algorithm that uses a compare&swap register of $k$ values to solve leader election among at most $(k-1)!$ processes in at most $k-1$ iterations per process.

The time complexity of LE with compare&swap-(k)
Theorem 2 Let $B$ be an algorithm for LE among $n$ processes that share only one compare&swap-(k), csk, and any number of atomic registers. Then there must be a run of $B$ in which at least one process performs $\Omega(\log_{k-1} n)$ operations on csk. We begin with an outline of the proof structure (the following two paragraphs). All the details follow thereafter.
Proof outline: The intuition of the proof is as follows. Each emulator emulates the run of its virtual processes in $B$ until one of them reaches a decision state, at which point the emulator adopts that decision value. The emulation proceeds as follows. Each emulator iteratively emulates the front ends of its virtual processes one by one until each is about to perform a successful c&s() operation (some c&s() operations can be immediately identified as unsuccessful and emulated by internally returning the current value assumed to be in the compare&swap). At this point it chooses one of its virtual processes to succeed and advances all others by failing them in the c&s() operation (because their operation "takes place" right after the successful one, which changes the value in the compare&swap and thus causes the others to fail). Thus, the emulator assumes a specific value as the next value of csk, the compare&swap register. After emulating the successful operation of the chosen virtual process, the emulator emulates no more steps of this virtual process, effectively emulating a fail-stop failure of this process. If several emulators choose the same "next value" at about the same time, then effectively only one virtual process from all of them succeeds in its c&s() operation. Although the emulators do not know which is the successful one, it does not matter since they are all marked dead. The problem arises when two or more emulators choose different "next values" for csk. This problem is solved by allowing each of the emulators to assume a different "next value" in csk, thus proceeding in different runs of $B$. The main idea is that at this point virtual processes of emulators that chose different "next values" proceed to emulate different runs of $B$. In each run it is assumed that a different value was successfully written in csk. While this is not a legal run for all of the virtual processes together, it is legal for each group that chose the same sequence of "next values". The virtual processes of each group of emulators "assume" that all the virtual processes of the other groups have fail-stopped at the splitting point (where one run departs from the other). Roughly speaking, this entails that each operation on the compare&swap register might cause each group of emulators to break into at most $k-1$ disjoint subgroups, each continuing a different run of $B$.
The rest of the proof, given after this paragraph, details the bookkeeping necessary for the above emulation process. The essence of it is that each emulator keeps a history variable that records the sequence of values it believes csk had. Each emulated write to an atomic register by a virtual process is tagged by the value of the history of the corresponding emulator at the time of the write.
Proof: The reduction actually emulates a full-information version of algorithm $B$ [3, 8, 10, 19]. Every atomic register $A$ is replaced by a list that holds all the values that have ever been written to the register (single-writer!). Each value written is tagged with the history of the writing emulator and is appended to the register. As old histories of an emulator are always a prefix of its newer histories, and since we use SWMR registers, the histories in a register list are prefixes of each other as well.
Each read verifies that it reads a value from a process which was together with it in the same run at the time it wrote the value, by observing the history marks of the values in the read register (which might force it not to take the most recent value of that register).
The history variable of each emulator is a sequence $a_1, a_2, \ldots$, where $a_i \in C$. Initially all histories contain the singleton $\bot$, which is the initial value of csk. The history describes the sequence of values that were written in csk. When an emulator emulates an operation of a virtual process on csk, it first snapshots all the history variables, and then executes an internal function, calc_h*(), on the snapshot. The function calc_h*() chooses from all the history variables one history (called $h_{calc}$) which is maximal in the sense that it is not a prefix of any other history, and such that the emulator's own history variable is a prefix of (or equal to) $h_{calc}$. Then the emulation returns the last symbol of $h_{calc}$ as the result of the virtual process operation on csk. If the emulated c&s() is assumed successful, then the emulator appends to $h_{calc}$ the new value (that was assigned by the c&s() operation) and writes the result to its history variable. A crucial point in our emulation is that among all the emulators that concurrently wrote the same new history, only one emulated process actually succeeds and we do not know which one. But, as the emulated processes all fail-stop immediately after the operation, it does not matter which one of them succeeded. If the c&s() operation failed, the emulator writes $h_{calc}$ as is to its history variable.
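The selection rule of the history function can be sketched as follows. This is an illustration under assumptions: histories are modeled as tuples, the snapshot is assumed to contain the emulator's own history, ties between several maximal extensions are broken by length (the text leaves the choice open), and the names `is_prefix`, `calc_h_star` and `emulate_cas` are hypothetical.

```python
BOTTOM = "⊥"  # assumed representation of the initial value of csk


def is_prefix(p, h):
    """True if history p is a prefix of (or equal to) history h."""
    return len(p) <= len(h) and h[:len(p)] == p


def calc_h_star(own, snapshot):
    """Choose a history extending `own` that is no proper prefix of another."""
    extending = [h for h in snapshot if is_prefix(own, h)]
    maximal = [h for h in extending
               if not any(g != h and is_prefix(h, g) for g in snapshot)]
    return max(maximal, key=len)  # tie-break is an arbitrary choice


def emulate_cas(own, snapshot, a, b, succeed):
    """Return (value handed to the virtual process, new history variable)."""
    h_calc = calc_h_star(own, snapshot)
    result = h_calc[-1]  # last symbol = assumed current value of csk
    if succeed and result == a:
        return result, h_calc + (b,)  # successful c&s appends the new value
    return result, h_calc            # failed c&s writes h_calc unchanged


snapshot = [(BOTTOM,), (BOTTOM, 0), (BOTTOM, 0, 1), (BOTTOM, 2)]
chosen = calc_h_star((BOTTOM,), snapshot)
result, new_history = emulate_cas((BOTTOM,), snapshot, a=1, b=2, succeed=True)
```

An emulator whose own history is $(\bot, 2)$ can only choose $(\bot, 2)$ here, which is how emulators that assumed different next values stay in different runs.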
When an emulator emulates a read from register $A$, it looks for the entry in $A$ whose history tag is the largest prefix of the emulator's history (or equal to it) and returns the value of that entry. However, if there is an entry in the list such that the current history of the emulator is a prefix of that entry's history, then the emulator chooses the largest (latest) such entry, updates its history to the history of that entry and returns the value of that entry. Obviously, the new history of the emulator is an extension of its previous history.
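The read rule can be sketched in the same style. The representation is an assumption: a register is modeled as a list of (history tag, value) pairs in the order written, so that tags form a chain of prefixes; the function name `emulated_read` and the `None` result for a register with no compatible entry are illustrative choices.

```python
BOTTOM = "⊥"  # assumed representation of the initial value of csk


def is_prefix(p, h):
    """True if history p is a prefix of (or equal to) history h."""
    return len(p) <= len(h) and h[:len(p)] == p


def emulated_read(register, history):
    """Return (value read, possibly extended history of the reading emulator).

    register: list of (tag, value) pairs in write order, tags forming a
    chain of prefixes because a single emulator writes the register.
    """
    extending = [(t, v) for t, v in register if is_prefix(history, t)]
    if extending:
        tag, value = extending[-1]   # latest entry extending our history
        return value, tag            # the reader adopts the longer history
    compatible = [(t, v) for t, v in register if is_prefix(t, history)]
    if compatible:
        return compatible[-1][1], history  # largest prefix of our history
    return None, history             # nothing written in our run (assumed)


reg = [((BOTTOM,), "x"), ((BOTTOM, 0), "y"), ((BOTTOM, 0, 1), "z")]
same_run = emulated_read(reg, (BOTTOM, 0))
other_run = emulated_read(reg, (BOTTOM, 2))
```

A reader in the run $(\bot, 2)$ skips the two most recent values, since they were written in a run that departed from its own.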
Each virtual process is in one of two states, alive or dead. Initially all are alive. A virtual process that is marked dead is assumed to have fail-stopped in the emulation, and thus no more steps of its front-ends will be emulated.
Emulator $q$ executes the front-ends of its alive virtual processes one by one. In each it proceeds until the front-end reaches either a decision or a c&s() operation that succeeds (assuming the csk value is the last symbol in $h_{calc}$).
In the former case, $q$ terminates, returning the same decision value as the virtual process. In the latter case, a successful c&s(), it does not execute it (meaning it neither returns the result to the virtual process nor updates the history variable) but switches to any other alive virtual process whose next operation is not a successful c&s(), and continues to emulate it. In this procedure the emulator might return to an alive virtual process more than once, because its history may change while executing other virtual processes, thus making some c&s() non-successful. Note that when returning to a previously stopped c&s() the snapshot is re-executed.
If the next step of all the alive virtual processes of the emulator is a successful c&s(), then the emulator performs the following procedure: (1) one virtual process, $v_x$, whose next operation is, say, c&s($a \to b$), is arbitrarily chosen, (2) $v_x$ is marked dead, (3) $b$ is appended to the end of $h_{calc}$ and the result is written to the history variable, (4) the next operation of all alive virtual processes (which is of the form c&s($a \to \cdot$)) is emulated by assuming the c&s() failed and returning $b$ to the calling virtual processes (no reads or writes of the history variables are needed in this step as we use $h_{calc} \| b$).
The procedure described in the last three paragraphs is repeated until the emulator reaches a decision state with one of its virtual processes. In any run of B, each emulator might kill at most d virtual processes (as there are at most d c&s() operations). Since each emulator has at least d + 1 virtual processes, the emulators must reach a decision state.
We now show that the emulation is correct, that is, that it implements a ($k-$ [...]. Also, note that $h_{calc}$ is a computation of $h_k$ at a point in the run. We denote a list of symbols from $C$ (the legal values of the compare&swap) as $\gamma$.
Some of the operations in every R 0 are bookkeeping operations of the emulation. Others, are operations that directly correspond to operations of the front ends of the virtual processes.
The operations of B 0 that are operations of the front ends of B are called virtual operations. There are three types of virtual operations:
1. Read operations of shared memory variable x into some internal memory r, 0 = (r := read(x)) that corresponds to a read operation in the front end of the virtual process, r := read(x).
2. Write operations of value v to shared memory variable x, 0 = (write x (h 0 ; v)) that correspond to a write operation in the front end of the virtual process, write x (v).
3. History operations. There are two kinds of history operations: a history read, 0 =(r:=SNAPSHOT), where SNAPSHOT is an atomic snapshot of all the history variables, and a history write, 0 =write h 0 (v) where h 0 is the emulator's history variable. A history read and write pair corresponds to a c&s() operation of the type r :=c&s(a ! b) (in case of a successful c&s() operation the pair of history read and write corresponds to several c&s() operations, one for each active virtual process). Each emulator rst executes a history read and then calculates h calc using calc h*() as de ned before. If the last symbol of h calc is not a (h calc = m; m 6 = a) then h calc is written to the history variable by the history write ( 0 =write h 0 (h calc )) and the pair of history read and write corresponds to a failed c&s(a ! b) operation. If the last symbol of h calc is a (h calc = a) and there is another virtual process whose next operation is not c&s(a ! ) then the emulator stops executing the current virtual process and continues with the other virtual process. Note that if the next operation of the new virtual process is c&s(), we do not need to re-execute a history read but can continue the run as in the failure case. The pair of history read (performed while emulating the previous virtual process) and history write (performed while emulating the new virtual process) are mapped to a single failed c&s(). If however, the next operation of the new virtual process was not a c&s() operation then the history read is degenerate (as it does not have a matching history write) and is skipped over. The last case is when the next operation of all virtual processes is c&s(a ! ). In this case, h calc jjb is written by the history write ( 0 =write h 0 (h calc jjb)) and the virtual process is marked dead (thus emulating a fail-stop). Also, the next operation of all other virtual processes is emulated by returning b as the current value of the compare&swap. 
The pair of history read and write corresponds in this case to all the c&s() operations of all the virtual processes where the first operation is c&s($a \to b$).

Assume the claim holds for $R'_k$, $h_k$. We will prove the claim for $R'_{k+1} = R'_k \sigma'_{k+1}$, with $h_{k+1}$ a maximal history in $R'_{k+1}$ which is also an extension of $h_k$ (or equal to it).
If $\sigma'_{k+1}$ is not a virtual operation then $R'_{k+1}|h = R'_k|h$ for any $h$. Furthermore, if for some $h_{k+1}$ (a maximal history in $R'_{k+1}$), $h'_{k+1}$ is not a prefix of $h_{k+1}$ then, by definition, $R'_{k+1}|h_{k+1} = R'_k|h_{k+1}$. Thus, it remains to prove the inductive step for the cases in which, for some maximal history $h_{k+1}$ of $R'_{k+1}$, $h'_{k+1}$ is a prefix of $h_{k+1}$. We begin the proof by considering the cases where $\sigma'_{k+1}$ corresponds either to a virtual read operation or to a virtual write operation.
Since $R'_k|h_{k+1}$ is legal ($h_{k+1}$ is an extension of $h_k$, but $h_k$ was maximal in $R'_k$ and so no new operations are added to the runs), the invocation part of $\sigma'_{k+1}$ is legal (because it is decided by the emulator using the same protocol that the real processes use in $B$). For that, all the following shared operations of that emulator will not be part of any $R'_n|h_n$ where $R'_n = R'_{k+1} \sigma'_{k+2} \ldots \sigma'_n$ and $h_n$ is an extension of $h_{k+1}$. Such a condition corresponds to a fail-stop in $B$ of all the processes that the emulator emulates immediately after $\sigma_{k+1}$. As such a condition is legal, the run $R'_{k+1}|h_{k+1}$ is legal and so point 1 of the claim holds. Note that it does not matter what the response of $\sigma'_{k+1}$ was because in any $R'_n|h_n$ the process emulating $\sigma_{k+1}$ fail-stopped and, $\sigma_{k+1}$ being a read operation, its response does not affect any extension of $R'_{k+1}|h_{k+1}$. Although the value in the history variable of the emulator executing $\sigma'_{k+1}$ has changed, $h_{k+1} = h_k$ and so (as the compare&swap did not change) point 2 of the claim also holds.

Two points are of interest. One is that although for any run that is defined by $h$ the process $p_i$ that executed $\sigma_{k+1}$ has fail-stopped, the emulation of $v_i$ continues (but in a different run, one that corresponds to, say, $h'$). The second is that there might be an "illegal" run where all processes $p_i$ in some run $R'|h$ fail-stopped. But, as the emulations continue in other runs, this case does not influence the integrity of the emulation.

Now, we consider the case of history operations. We start by checking the case of history read operations that are not skipped over. Assume $\sigma'_{k+1} = (r := \mathrm{SNAPSHOT})$. As noted before, $\sigma'_{k+1}$ corresponds to $\sigma_{k+1} = (r :=$ c&s($a \to b$)$)$. Such a case occurs when $h_{calc} = \cdots m$, $m \neq a$ (as computed by calc_h*() over the snapshot result). From the assumption we know that the value in the compare&swap after $R'_k|h_k$ is the last value in $h_k$.
1. If $h_{calc}$ is not a prefix of $h_{k+1}$ then no new operation of this emulator will be part of any extension run of $R'_k|h_{k+1}$, which means that the virtual processes fail-stopped after $\sigma_{k+1}$.
2. If $h_{calc}$ is a prefix of $h_{k+1}$ then it must be equal to it (by the definition of $h_{calc}$) and therefore we return the correct value.
These points mean that point 1 of the claim is correct. Point 2 is immediately correct since $h_{k+1} = h_k$ and the value in the compare&swap did not change.
The case of history write operations that are not skipped over is probably the most complicated. Let $\sigma'_f = (r := \mathrm{SNAPSHOT})$ be the matching history read (the one executed by the emulator just before executing the history write).
1. Assume that $h_{calc} = h_k$. In such a case $\sigma'_{k+1}$ corresponds to $\sigma_{k+1} = (r :=$ c&s($a \to b$)$)$, $\sigma_{k+2} = (r :=$ c&s($a \to c$)$)$, and so on. Since we mark the virtual process that executed $\sigma_{k+1}$ as dead, its return value is not important and so can be assumed successful. The last symbol in $h_{calc}$ is $a$ (otherwise we would have skipped over the write operation). This means that the last symbol in $h_k$ is also $a$. From the assumption we know that $a$ is the value in the compare&swap, and so the value in the compare&swap changes to $b$. Since all other c&s() operations return $b$ and fail, point 1 is correct. Since $h_{k+1} = h_k \| b$, point 2 is also correct.
2. If $h_{calc}$ does not equal $h_k$ then it can only be smaller (by definition). In such an event, there are two possible cases:
(a) Assume $h_{calc} \| b$ is not a prefix of $h_k$. Then all the next operations of the emulator will not be part of $B'|h_k$, and so the virtual process is assumed to fail-stop after $\sigma_{k+1}$. From this it follows that the claim is correct (see 2b).
(b) Assume $h_{calc} \| b$ is a prefix of $h_k$. Let $\sigma_f$ be the first successful compare&swap operation mapped after skipping over $\sigma'_f$ (the corresponding history read). We know one such operation occurred since $h_{calc} \| b$ is a prefix of $h_k$. We also know that it was a c&s($a \to b$). Assume $\sigma'_{k+1}$ corresponds to c&s($a \to b$), c&s($a \to c_1$), $\ldots$, c&s($a \to c_{x-1}$). We map $\sigma_{k+1}$ in the following way. First we shift all operations of $R'_k|h_k$ that are executed after $\sigma_f$ by $x$ places (that is, for each $y > f + x$, $\sigma_y = \sigma_{y-x}$). Then, we map $\sigma'_{k+1}$ to the vacant places: $\sigma_{f+1} =$ c&s($a \to b$), $\sigma_{f+2} =$ c&s($a \to c_1$), $\ldots$, $\sigma_{f+x} =$ c&s($a \to c_{x-1}$). Since $\sigma_f, \ldots, \sigma_{f+x}$ all fail and since there are no operations of the emulator in $\sigma_{f+x+1}, \ldots, \sigma_{k+x}$, the insertion into $R'_k|h_k$ does not affect the legality of the run. As the invocation parts of $\sigma_f, \ldots, \sigma_{f+x}$ are legal (they all occur after the history read) and as the response is correct (the first virtual process fail-stops and all others return $b$), point 1 is correct. Point 2 is correct because $h_{k+1} = h_k$ and the value in the compare&swap did not change.

To conclude this case we remark that if a history read operation is skipped over then the matching history write is not, and vice versa. This means that every pair is mapped only once to exactly one c&s(). As ignoring operations is obviously legal, the claim is correct in all the cases.
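The shift-and-insert mapping of case 2(b) can be illustrated with a minimal sketch. It assumes a run is modelled as a plain Python list of operation labels and that `f` is the 0-based index of the first successful c&s; the function name `insert_after_success` and the label strings are hypothetical and serve only as illustration.

```python
def insert_after_success(run, f, new_ops):
    """Return a new run in which new_ops occupy positions f+1 .. f+x.

    Every operation that previously sat at an index y > f is shifted
    to index y + x, where x = len(new_ops). The input run is not modified.
    """
    new_ops = list(new_ops)
    return run[: f + 1] + new_ops + run[f + 1 :]


# Hypothetical run: the operation at index 1 is the successful c&s(a -> b).
run = ["read x", "c&s(a->b) ok", "write y", "read z"]
late_ops = ["c&s(a->b) fail", "c&s(a->c1) fail"]
mapped = insert_after_success(run, 1, late_ops)
```

In this sketch the inserted operations land immediately after the successful c&s, and the relative order of all earlier operations is preserved, which is the property the legality argument relies on.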
A point of interest is that although in the protocol the emulation of a successful c&s($a \to b$) operation occurs only after all the virtual processes of some emulation were about to execute a c&s($a \to *$) operation, this is not a requirement for correctness.
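The emulator's handling of a history read, as described in the protocol, can be summarized in a small sketch. It is an illustration under stated assumptions, not the authors' code: histories are modelled as tuples of symbols, operations as tuples such as `("cas", "a", "b")`, and the names `VirtualProcess`, `is_cas_from` and `history_step` are hypothetical. Returning `b` to the other virtual processes in the success case is not modelled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class VirtualProcess:
    next_op: Tuple       # hypothetical encoding, e.g. ("cas", "a", "b") or ("read", "x")
    dead: bool = False   # set when the emulator marks the process as fail-stopped


def is_cas_from(op, a):
    """True if op is a c&s whose expected value is a, whatever its new value."""
    return op[0] == "cas" and op[1] == a


def history_step(h_calc, a, b, current, procs):
    """Decide the emulator's move after a history read yielded h_calc while
    emulating procs[current], whose next operation is c&s(a -> b).

    Returns (action, payload):
      ("write_fail", h_calc)           write h_calc back; a failed c&s
      ("switch", j)                    continue with virtual process j
      ("write_success", h_calc + (b,)) write h_calc || b; mark current dead
    """
    if h_calc[-1] != a:
        return ("write_fail", h_calc)
    for j, vp in enumerate(procs):
        if j != current and not vp.dead and not is_cas_from(vp.next_op, a):
            return ("switch", j)
    procs[current].dead = True
    return ("write_success", h_calc + (b,))


mixed = [VirtualProcess(("cas", "a", "b")), VirtualProcess(("read", "x"))]
all_cas = [VirtualProcess(("cas", "a", "b")), VirtualProcess(("cas", "a", "c"))]
fail = history_step(("c", "m"), "a", "b", 0, mixed)
switch = history_step(("c", "a"), "a", "b", 0, mixed)
success = history_step(("c", "a"), "a", "b", 0, all_cas)
```

The success branch is deliberately the last resort here, mirroring the protocol's choice to delay a successful c&s until every active virtual process is about to execute a c&s from the same value.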
To complete the proof of the claim, we first note that every emulator decides on the value that its first virtual process decides upon. Also, in every run $R'$ all virtual processes whose operations correspond to the same $R'|h$ decide on the same value. But, as there are at most $(k-1)^d$ different $h$ and as every virtual process must belong to some such $h$, the emulation implements a $(k-1)^d$-set consensus among n d+1 emulators.
Conclusions
This paper addresses the dependency between the size of a shared memory object and its ability to efficiently solve consensus algorithms. We presented an algorithm for LE using a compare&swap and proved that it has an optimal time-space tradeoff. Moreover, we conjecture that, given a compare&swap-(k), there is no algorithm that can solve LE among more than $O(k!)$ processes (that is, that our algorithm is computability optimal). The computability problem was addressed in our paper [21], where we succeeded only in showing a much higher bound. We showed that given a compare&swap-(k) object, there is no algorithm that can solve LE among $O(2^{k^2})$ processes.
In reaching our results, we developed a new method for reducing leader election algorithms among n processes that share strong synchronization objects to l-set consensus algorithms among e < n processes that share only read/write registers (which immediately implies that e = l). Our use of the reduction method exemplifies its flexibility.
Several extensions come to mind. For example, all our results can be generalized to RMW objects and in fact, perhaps to any linearizable object. Herein, we focus on algorithms that use one copy of the strong register. We claim that one copy of any shared memory register of $k$ values is as strong as several copies of the same object of $k_i$ values each, where $k \geq \prod_i k_i$. This result is immediate for a general RMW object. The generalization of our register reduction method to systems with several strong objects is one direction to proceed in (perhaps by managing a separate set of history variables for each strong register). Extending our complexity results to k-set algorithms is straightforward.
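One way to see why the claim about several small registers is immediate for a general RMW object is a mixed-radix encoding. The sketch below assumes the small registers hold values in `range(k_i)` and that the large register has at least as many values as the product of the `k_i`; the function names `pack`, `unpack` and `rmw_component` are hypothetical. The atomicity of the read-modify-write itself is assumed to come from the hardware and is not modelled here.

```python
from math import prod


def pack(values, sizes):
    """Encode one value per small register into a single integer below prod(sizes)."""
    code = 0
    for v, k_i in zip(values, sizes):
        assert 0 <= v < k_i
        code = code * k_i + v
    return code


def unpack(code, sizes):
    """Inverse of pack: recover the list of small-register values."""
    values = []
    for k_i in reversed(sizes):
        code, v = divmod(code, k_i)
        values.append(v)
    return values[::-1]


def rmw_component(code, sizes, i, f):
    """The function a single RMW on the large register would apply in order to
    emulate an RMW with function f on small register i.
    Returns (new_code, old_value_of_register_i)."""
    values = unpack(code, sizes)
    old = values[i]
    values[i] = f(old)
    return pack(values, sizes), old


sizes = [2, 3, 4]              # three small registers, so 24 combined states
code = pack([1, 2, 3], sizes)
new_code, old = rmw_component(code, sizes, 1, lambda v: (v + 1) % 3)
```

Because each emulated operation touches the large register exactly once, the linearization order of the large register directly induces one for the small registers.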
