The "wait-free hierarchy" classifies multiprocessor synchronization primitives according to their power to solve consensus. The classification is based on assigning a number n to each synchronization primitive, where n is the maximal number of processes for which deterministic wait-free consensus can be solved using instances of the primitive and read-write registers. Conditional synchronization primitives, such as Compare-and-Swap and Load-Linked/Store-Conditional, can implement deterministic wait-free consensus for any number of processes (they have consensus number ∞), and are thus considered to be among the strongest synchronization primitives; Compare-and-Swap and Load-Linked/Store-Conditional have consequently become the synchronization primitives of choice, and have been implemented in hardware in many multiprocessor architectures.
INTRODUCTION
A wait-free implementation of a concurrent object is one that guarantees that any process can complete an operation in a finite number of its own steps. Wait-freedom implies lock-freedom; it thus also provides better robustness, since processes coordinate without using locks, and so the inherent problems of locks, such as deadlock, convoying, and priority inversion, are avoided. In 1991, Herlihy introduced his influential "wait-free hierarchy" [7], which classifies multiprocessor synchronization primitives according to their power to solve consensus. The classification is based on assigning a number n to each synchronization primitive, where n is the maximal number of processes for which deterministic wait-free consensus can be solved using instances of the primitive and read-write registers. The strongest synchronization primitives according to this classification are those that can solve consensus for any number of processes, and hence have a consensus number of ∞.
Synchronization primitives such as Compare-and-Swap (denoted CAS) and Load-Linked/Store-Conditional (denoted LL/SC) have consensus number ∞ and are thus among the strongest primitives according to Herlihy's hierarchy; to some extent because of that, they have become the synchronization primitives of choice, and have been implemented in hardware in many multiprocessor architectures. (For example, CAS has been implemented in the Motorola 680x0, IBM 370 and SPARC architectures; LL/SC has been implemented in the MIPS, PowerPC and DEC Alpha architectures.) CAS and LL/SC belong to a class of primitives known by the slightly misleading name conditional synchronization primitives (or simply conditionals). Conditionals are primitive operations that modify the value of the register r on which they operate only if its value, just before the operation starts, is a specific value, v_w, that depends on the input, w, of the conditional operation. As an example, if some process issues a CAS(addr, old, new) operation, the operation changes the contents of the register addr to new only if addr's value just before the operation starts equals old; otherwise, the operation is not visible to other processes (except that it might have an adverse effect on memory contention and process latency).
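To make the conditional semantics concrete, the following is a minimal Python sketch of a CAS register, together with the standard way in which a single CAS register solves consensus for any number of processes. The names CasRegister and decide, and the use of a lock to model the atomicity of the hardware primitive, are assumptions made for illustration only; proposals are assumed to differ from the initial value None.

```python
import threading


class CasRegister:
    """A shared register supporting read and compare-and-swap (simulated)."""

    def __init__(self, initial=None):
        self._value = initial
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def read(self):
        with self._lock:
            return self._value

    def cas(self, old, new):
        """Write `new` only if the register currently holds `old`."""
        with self._lock:
            if self._value == old:
                self._value = new
                return True
            return False  # register unchanged: nothing for others to observe


def decide(register, proposal):
    """Consensus sketch: the first successful CAS fixes the decision."""
    register.cas(None, proposal)  # succeeds only while the register is empty
    return register.read()


if __name__ == "__main__":
    reg = CasRegister()
    decisions = []
    threads = [
        threading.Thread(target=lambda v=v: decisions.append(decide(reg, v)))
        for v in range(8)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(set(decisions)))  # number of distinct decided values
```

Every call to decide returns the value installed by the one CAS that found the register empty, which is why the number of participating processes does not matter.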
Several researchers have previously obtained results suggesting that conditionals are no stronger than read-write registers in the context of mutual exclusion. Cypher [4] has obtained a lower bound of Ω(log log n / log log log n) remote memory references for n-process mutual exclusion based on read-write registers and registers that support conditional operations. This bound was later improved by Anderson and Kim [1] to Ω(log n / log log n) remote memory references. In their survey of shared-memory mutual exclusion [2], Anderson and Kim comment on the "...unexpected weakness" of conditionals in general, and CAS in particular, which is "...still widely regarded as being the most useful of all primitives to provide in hardware". Our work shows that conditionals are relatively weak not just in the context of mutual exclusion time-complexity. Rather, we show that conditionals are inefficient in terms of memory space for implementing many widely used distributed objects. Collectively, these results imply that basing multiprocessor synchronization solely on conditional synchronization primitives might not be the best design choice.
In the article that introduced covering arguments [3], Burns and Lynch obtain a lower bound of n on the number of read-write registers required to solve mutual exclusion. In the context of mutual exclusion, our work extends their work for starvation-free mutual exclusion to obtain bounds for implementations that use conditionals. To do this, we use generalized covering arguments that can be applied to conditionals. Some of the ideas that we use in our proofs for long-lived objects are also similar to those used in their lower bound proof.
Fich, Herlihy, and Shavit [6] obtain a lower bound of Ω(√n) on the space complexity of randomized wait-free implementations of n-process consensus that use historyless objects. These are objects, such as read-write registers, test&set, or swap, whose state depends only on the last nontrivial operation (for example, write) that was applied to them. Jayanti, Tan, and Toueg [10] show time and space lower bounds of n − 1 for lock-free implementations of several objects (e.g. counters and k-valued CAS) from historyless objects and resettable consensus. Note that test&set is a historyless conditional.
A long-lived object is one whose operations may be called by a process any number of times. In contrast, the operations of one-time objects may be called by a process only once. For one-time objects, we show tradeoffs between memory space and the number of writes, and between memory space and the number of memory stalls incurred by high-level operations. The concept of memory stalls was defined by Dwork, Herlihy, and Waarts, in a paper [5] that introduced a formal model to capture the phenomenon of memory contention in shared memory multiprocessors.
Our Results
Our results apply to implementations of any object that supports some operation that must perform a "visible" write before it terminates. This is a large class of concurrent objects that includes well-known objects such as counters (both linearizable and non-linearizable), stacks, queues, swap, fetch-and-add, and single-writer snapshot. For all such long-lived objects, we show the following.
• Any n-process wait-free implementation that uses either only read-write registers or registers that support only conditional operations, requires at least n registers;
• Any n-process wait-free implementation that uses registers on which both read-write and conditional operations can be applied, requires at least n/2 such registers.
For all such one-time objects, we show the following tradeoffs for any wait-free implementation that uses m < n registers that support only read, write and/or conditional operations.
• Either m > √n, or the amortized number of writes performed by high-level operations is in Ω(√n);
• Either m > n^{2/3}, or the amortized number of memory stalls incurred by high-level operations is in Ω(n^{2/3}).
All of the above results apply also to starvation-free implementations of mutual exclusion. These results apply to both deterministic and randomized implementations, even if they use registers of unbounded size.
The rest of the paper is organized as follows. In Section 2, we present our model of shared-memory systems, and provide a formal definition of conditional primitives. In Section 3 we prove space lower bounds for long-lived objects. In Section 4 we prove time/space tradeoffs for implementations of one-time objects. Section 5 concludes with a discussion of the results.
PRELIMINARIES

Shared Memory System Model
Our model of an asynchronous shared memory system is largely based on the model described by Cypher in [4], which is based, in turn, on the model given by Merritt and Taubenfeld [11]. Shared objects are specified as in [7]. An object is an abstract data type. It is characterized by a set of possible values and by a set of operations that provide the only means to manipulate it. A shared memory protocol provides a specific data-representation for the object, and a specific implementation for each of its operations. An n-process shared memory protocol consists of a non-empty set E of executions, a set P of n processes, a set R of shared memory registers, and a function I that assigns an initial value to each register in R. Any register may support one or more operations including Read, Write, and various types of atomic Read-Modify-Write operations. No bound is assumed on register size (i.e. the number of different possible values the register can have). An execution-fragment is a sequence (either finite or infinite) of events, where an event is an operation that a specific process applies to a specific register. An execution is an execution-fragment that starts from the initial state. An event e can have one of the following three forms:
• Read(p,r ): indicates that process p atomically reads the value of register r ;
• Write(p,r,w ): indicates that process p atomically writes the value w to register r ;
• RMW(p,r,w,g,h): indicates that the code shown in Figure 1 (a) is atomically performed on behalf of process p. The write function, g, and the return-value function, h, both receive two parameters, v, the value of register r when event e starts to execute, and w, the input value to the RMW event. The type of a RMW operation is specified by the ordered pair <g,h> consisting of its write and return-value functions.
As an example, the write and return-value functions of the CAS operation are shown in Figure 1 (b). We next define the class of conditional RMW primitive operations we investigate in this paper. 
Let Op = <g,h> be a conditional operation and let e = RMW(p, r, w, g, h) be an event of type Op. We call the single value v_w such that g(v_w, w) ≠ v_w the change-point of e. We call any other value v, v ≠ v_w, a fixed-point of e.
In other words, a RMW operation is conditional if, for every input w, there is exactly one value v_w such that g(v_w, w) ≠ v_w. If g and w are such that g(v, w) = v for all values of v, then the event RMW(p,r,w,g,h) is essentially an event that never writes. For any event e = RMW(p, r, w, g, h), the value v_w is the only value of r that will be changed by e. We observe that CAS and Test-and-Set are conditional operations by Definition 2.1. Following [9], we model the Load-Linked (called LL) and Store-Conditional (called SC) operations as follows. The state of every shared register is a structure, r, that contains two fields: r.value and r.processSet. When some process, p, performs an LL operation on some register r, p is added to r.processSet and LL returns r.value. When p performs an SC(v) operation on r, then there are the following two possibilities: (1) If p ∈ r.processSet, the SC operation sets r.processSet ← ∅, sets r.value ← v and returns (true, v). In this case we say the SC was successful. (2) Otherwise, SC returns (false, r.value), in which case we say the SC was unsuccessful. Modelled this way, it is easily seen that LL is equivalent to a Read and SC is a conditional by Definition 2.1.

Footnote 1: Our definition of a RMW operation-type is a generalization of the definition of comparison primitives that appears in Section 5 of [1].
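The RMW event form and the LL/SC model described above can be sketched in Python as follows. This is an illustrative, single-threaded rendering under stated assumptions: the dictionary representation of a register, the helper changed_values (which inspects only a finite sample of values, whereas the model allows unbounded registers), and all function and class names are hypothetical and do not come from the paper.

```python
def rmw(register, w, g, h):
    """Apply an RMW event of type <g, h> with input w to `register`."""
    v = register["value"]        # value of r when the event starts
    register["value"] = g(v, w)  # write function
    return h(v, w)               # return-value function


# CAS as a <g, h> pair; the input w is the pair (old, new).
def cas_g(v, w):
    return w[1] if v == w[0] else v


def cas_h(v, w):
    return v == w[0]


# Test-and-set on a binary register: only the value 0 is ever changed.
def tas_g(v, w):
    return 1 if v == 0 else v


# Fetch-and-add changes every value, so it is not conditional.
def faa_g(v, w):
    return v + w


def changed_values(g, w, domain):
    """Values v in a finite sample `domain` for which g(v, w) != v."""
    return [v for v in domain if g(v, w) != v]


class LLSCRegister:
    """LL/SC register modelled with the fields r.value and r.processSet."""

    def __init__(self, value):
        self.value = value
        self.process_set = set()

    def ll(self, p):
        self.process_set.add(p)
        return self.value

    def sc(self, p, v):
        if p in self.process_set:      # successful SC
            self.process_set = set()
            self.value = v
            return (True, v)
        return (False, self.value)     # unsuccessful SC


if __name__ == "__main__":
    r = {"value": 3}
    print(rmw(r, (3, 7), cas_g, cas_h), r["value"])
    print(changed_values(cas_g, (3, 7), range(10)))
    print(changed_values(faa_g, 1, range(5)))
```

For CAS with input (3, 7), the only value in the sample that the write function changes is 3, which is the change-point of the event; every other sampled value is a fixed-point.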
We call a register that supports only Read and Write operations a read-write register ; we call a register that supports only conditional operations a conditional register ; we call a register that supports Read, Write, and one or more conditional operations a read-write-conditional register.
We denote the empty execution by ε. Given any event e, we say that e is a writing event if e is a Write event or if e is a RMW event; if e is a Read event or if e is a RMW event, we say that read(e) holds.
The memory register accessed (read and/or written) by e is denoted mem(e). The process that executes e is denoted proc(e). For any two execution-fragments E and E′, where E is finite, the execution-fragment E • E′ denotes the concatenation of E and E′. For any execution-fragment E, we define procs(E) to be the set of processes that perform some event in E. Let r ∈ R be a memory register and let E be a finite execution of the protocol; then value(E, r) denotes the last value written by an event in E to r, or I(r) if there was no such event. Given an execution E and any subset P′ ⊆ P of the processes, we let proj(E, P′) denote the subsequence of E consisting only of those events that were performed by processes in P′. If P′ = {p}, we also use the notation proj(E, p) instead of proj(E, {p}). Let E be a finite execution of the protocol, and let e be an event. If E • e is also an execution of the protocol, we say that e is enabled after E. An execution-fragment E is fair, if any process that has an enabled event just before E starts is in procs(E). An execution is fair if all of its suffixes are fair.
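As an illustration of this notation, the following Python sketch represents a finite execution as a list of events and implements procs, proj and value. The flat Event encoding (in particular the field `written`, holding the value an event left in its register) is an assumption made here for convenience and is not part of the paper's model.

```python
from collections import namedtuple

# kind is "read", "write" or "rmw"; `written` is the value the event left in
# the register (None for reads). This encoding is an illustrative assumption.
Event = namedtuple("Event", ["proc", "reg", "kind", "written"])


def procs(E):
    """Set of processes that perform some event in E."""
    return {e.proc for e in E}


def proj(E, P):
    """Events of E performed by processes in P (a set, or a single process)."""
    members = P if isinstance(P, (set, frozenset)) else {P}
    return [e for e in E if e.proc in members]


def value(E, r, initial):
    """Last value written to r by an event in E, or I(r) if there is none."""
    for e in reversed(E):
        if e.reg == r and e.kind != "read":
            return e.written
    return initial[r]


if __name__ == "__main__":
    I = {"r1": 0, "r2": 0}
    E = [
        Event("p", "r1", "write", 5),
        Event("q", "r1", "read", None),
        Event("q", "r2", "rmw", 7),
    ]
    print(procs(E), proj(E, "q"), value(E, "r1", I))
```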
High Level Operations
The execution of a high-level object operation involves, in general, both private- and shared-memory events; in this paper we only deal with shared-memory events. Thus, we view an execution of a high-level operation Op as consisting of a sequence of one or more atomic shared memory events, each of which can be either Read, Write, or RMW. Let Op be an execution of a high-level operation performed by some process in some execution E. We assume processes execute at most a single high-level operation at any given time. We denote by proc(E, Op) the process that executes Op in E; we denote by events(E, Op) the sequence of memory-events performed by proc(E, Op) while executing Op in E. Whenever E is clear from the context, we simply write proc(Op) and events(Op). We say that a process p is active after E, and write active(E, p), if, after E, p is in the middle of executing some high-level operation Op, i.e. p performed at least one event of an instance of Op, but has not performed the last event of this instance of Op. If p is not active after E, we say that p is idle after E, and write idle(E, p).

For presentation simplicity, we often refer to an underlying execution as a state. Thus, for example, instead of saying that a high-level operation Op is enabled after execution E, sometimes we say that after E the system is in a state where Op is enabled. If S is the system state after E, we also use the notation read_solo(S, Op) instead of read_solo(E, Op).
LONG-LIVED OBJECTS
In this section, we obtain lower bounds on the memory requirements of wait-free implementations of long-lived objects in a class of objects that we call Visible(n), when implementations use registers that only support Read, Write and/or conditional operations. We also consider starvation-free implementations of mutual exclusion from such registers.
First, we define the key concepts of invisible and visible events. Informally, an invisible event is a writing event by some process that cannot be observed by other processes. An invisible event was termed an obliterated event in [3] . It was defined there only for read-write registers. We extend this notion to conditional RMW events.
The following two definitions formally define the concepts of invisible RMW and Write events.
We say that e is invisible in E, and write invisible(E, e), if at least one of the following holds:
• g(value(E1, r), w) = value(E1, r);
• e′ is a Write event and mem(e′) = r.
In other words, a RMW event e is invisible, if either the value of the register on which e operates is a fixed-point of e, or if e is immediately followed by a Write event on the same register. We note that e would also be made invisible to other processes in an execution where it is followed by a sequence of events on other registers that terminates with a Write event applied to r; however our proofs do not require this broader definition.
• e′ is a Write event and mem(e′) = r;
In other words, a Write event e, issued by some process p, is invisible in an execution E, if it is overwritten in E by another Write event before it is read by another process. Note that whereas Write events can only be made invisible by subsequent Write events, RMW events can be made invisible either by preceding writing events, or by an immediately following Write event. If e is a Write or RMW event that is not invisible in E, we say that e is a visible event in E.
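The informal characterizations above can be sketched as two Python predicates. This is only an approximation under stated assumptions: it follows the "in other words" descriptions rather than the formal definitions, events are encoded as hypothetical (kind, register, process) triples, and the function names are not from the paper.

```python
def cas_g(v, w):
    """Write function of CAS; the input w is the pair (old, new)."""
    return w[1] if v == w[0] else v


def rmw_invisible(value_before, g, w, next_event, r):
    """An RMW event on register r is invisible if the register holds a
    fixed-point of the event, or if the event is immediately followed by a
    Write to r. `next_event` is a (kind, register, process) triple or None."""
    if g(value_before, w) == value_before:
        return True
    return (
        next_event is not None
        and next_event[0] == "write"
        and next_event[1] == r
    )


def write_invisible(later_events, r, writer):
    """A Write to r by `writer` is invisible if another Write to r occurs
    before any other process reads r (Read and RMW events both read)."""
    for kind, reg, proc in later_events:
        if reg != r:
            continue
        if kind == "write":
            return True
        if proc != writer:
            return False
    return False


if __name__ == "__main__":
    print(rmw_invisible(4, cas_g, (3, 7), None, "r"))
    print(write_invisible([("read", "r", "q"), ("write", "r", "q")], "r", "p"))
```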
We next define the class Visible(n), to which our results apply. It is easily proven that many key concurrent objects are in Visible(n): counters, stacks, queues, swap, fetch-and-add, and single-writer snapshot are some examples. As one example, we prove in the following that concurrent counting is in Visible(n).
A concurrent counter object supports a single fetch-and-increment (FAI) operation. The counter values returned by different FAI operations in an execution are required to be distinct natural numbers. It is also required that, in any quiescent state, the values returned by the counter constitute a contiguous range of natural numbers.

Proof. Assume otherwise by way of contradiction. Let P be a wait-free protocol implementing a concurrent counter. By assumption, there is at least one execution E of P such that some FAI operation F ⊂ E, performed by some process p, terminates in E and does not perform a visible write. We assume without loss of generality that E is quiescent. (Otherwise, as P is wait-free, we can extend E by letting all active processes run until they finish their high-level operations.) Let V be the number of FAI operations completed in E (including F). Then the counter must have returned the values 1, . . . , V. We now extend E by a solo execution-fragment, E1, of another FAI operation, F1, by process q ≠ p. F1 must return value V + 1. However, by assumption, there is another execution of P, E′, such that p has no events in E′ and the following holds:
Consequently, there exists an execution E2 = E′ • F1 and F1 returns V + 1 also in E2. Note that in E′ the counter has returned only V − 1 values; it follows that, after E2 terminates, we are in a quiescent state and the values returned by the counter do not constitute a contiguous range, contradicting the definition of a counter. The proof of Lemma 3.1 applies for both linearizable and non-linearizable concurrent counters, thus both counter types are in Visible(n).

Footnote 2: Linearizable concurrent counters are also required to be linearizable [8], i.e. if FAIi and FAIj are two executions of the FAI operation, and FAIi entirely precedes FAIj in an execution E, then result(FAIi, E) < result(FAIj, E) must hold.
The following definition is key to our proofs. 
. . . • e_σ(i_|I|), ej)
We call A a k-levelled-sequence in S.
The organization of the proofs for long-lived objects is as follows. We first prove that all wait-free implementations of objects in Visible(n) and all starvation-free implementations of mutual exclusion can be brought to an n-levelled state. We then prove that any protocol that uses only read-write registers or only conditional registers, and reaches a k-levelled state, uses at least k such registers, and that any protocol that uses only read-write-conditional registers, and reaches a k-levelled state, uses at least k/2 such registers. Combining these results, we obtain the space lower bounds for long-lived objects.
The following lemma proves that wait-free implementations of objects in Visible(n) can be made to reach n-levelled states.
Lemma 3.2. Let P be a wait-free protocol implementing some object O ∈ Visible(n). Then P can be brought to an n-levelled state.
Proof. In the following, we denote the execution of a high-level operation Op by process pi, 1 ≤ i ≤ n, as Opi. We construct an execution E incrementally, in n phases. In phase i, we extend the execution so that pi is on the verge of performing the first visible writing event (within its current high-level operation) that cannot be made invisible by processes pi+1, . . . , pn. Figure 2 shows pseudo-code describing the construction of E. We next prove that the following claims hold for each phase i, 1 ≤ i ≤ n. The proof proceeds by induction on the phase number, i. Clearly, before the first phase all of the above claims hold vacuously. We now assume the claims hold after the termination of phase i − 1, and prove for phase i. We first show that the construction of phase i terminates. From wait-freedom, the read-solo execution by process pi (line 4) is finite. By induction hypothesis, before phase i begins, pi is idle, and so, as O is in Visible(n), Opi must perform a visible write in the course of phase i. Consequently, the read-solo execution by pi must terminate with a pending writing event, which we denote ei. Next, we show that the phase construction eventually reaches line 9. There are two cases to consider:
Figure 2: The construction of E (pseudo-code, truncated).
1: E ← ε // the empty execution
2: for (i = 1; i ≤ n; i++) // Construct E in n phases.
3: { construct-read-solo: . . .

• If there is no extension E1 • ei • E2 of E, with procs(E1), procs(E2) ⊆ {pi+1, . . . , pn} that makes ei invisible, then the construction reaches line 9 immediately.
• Otherwise, there is an extension E1 • ei • E2 (possibly with an empty E1 and/or an empty E2), with
and jump to construct-read-solo. As P is wait-free, and as every jump to construct-read-solo implies that Opi performs at least one additional event, we can only jump to construct-read-solo a finite number of times, after which we must reach line 9.
Having reached line 9, if i < n, we extend E with the execution-fragment E3, where in E3 all of the active processes in {pi+1, . . . , pn}, if any, complete their high-level operations. The existence of such a finite execution-fragment E3 is guaranteed from the wait-freedom of P ; thus, claims 1 and 3 are proven for phase i. Claim 2 follows directly from the construction and from the induction hypothesis. Let S be the state after the execution of E. By construction, for every 1 ≤ i ≤ n, visible(S • E1 • ei • E2, ei) holds for any execution-fragments E1, E2 consisting of any ordering of any disjoint subsets of the events ei+1, . . . , en. Thus A = e1, . . . , en is an n-levelled-sequence in S, with ei having level i, and S is an n-levelled state.
Next we show the same result for starvation-free mutual exclusion. We model mutual exclusion as follows. A mutual exclusion protocol supports a single operation, called Mutex, which has the structure shown in Figure 3. The non-critical section is assumed to take place outside of Mutex operations. It is assumed that no process halts while executing the Mutex operation. The requirements from a mutual exclusion implementation are the following:
• Exclusion: at most one process executes its critical section at any time.
Mutex() {
  Entry Section
  Critical Section
  Exit Section
}
Figure 3: Structure of the mutual-exclusion operation.
• Starvation-freedom (also known in the literature as lockout-freedom): in any fair execution, if some process is in its entry section, then that process eventually executes its critical section.
• Finite exit: in any fair execution, if a process is in its exit section, then it eventually exits the Mutex operation.
We need the following technical lemma.
Lemma 3.3. Let P be a starvation-free mutual exclusion protocol, and let E be an execution of P in which some process p enters the critical section. Then E contains a visible writing event by p.
Proof. Assume otherwise by way of contradiction. Then there is some execution E after which p is in the critical section, and E does not contain any visible write by p. Consequently, there exists another execution, E′, that does not contain any events by process p, such that:
From starvation-freedom, there is some extension E″ of E′, such that some process q ≠ p enters the critical section in E′ • E″ and p ∉ procs(E′ • E″). Thus, q enters the critical section also in the execution E • E″, and exclusion is violated, a contradiction.
Lemma 3.4. Let P be a long-lived starvation-free mutual exclusion protocol. Then P can be brought to an n-levelled state.
The proof of Lemma 3.4 is almost identical to that of Lemma 3.2. An execution that results in an n-levelled state is constructed in exactly the same way. The only differences between the two proofs are that here we use starvation-freedom instead of wait-freedom and we use Lemma 3.3 instead of membership in Visible(n).
We next prove that any protocol that reaches a k-levelled state and uses registers that support only Read, Write and/or conditional operations, uses Ω(k) such registers.
Lemma 3.5. Assume that a protocol P can be brought to a k-levelled state S for some k > 0. Let SPACE(P) denote the number of registers used by P.
(1) If P uses only read-write registers, then SPACE(P) ≥ k.
(2) If P uses only conditional registers, then SPACE(P) ≥ k.
(3) If P uses only read-write-conditional registers, then SPACE(P) ≥ k/2.
Proof. Let S be a k-levelled state with a k-levelled sequence A = e1 · · · ek. Let pi = proc(ei) and let R = {mem(ei) | i = 1, . . . , k}. Note that no two Write events can be pending on the same register because any of these two events makes the other invisible. Thus, a lower-level event can be made invisible by a higher-level event, contradicting our assumption that A is a k-levelled sequence of S. This proves (1).
Next we show that no two conditional RMW events can be pending on the same register. Assume otherwise. Let ei = RMW(pi, r, wi, gi, hi) and ej = RMW(pj, r, wj, gj, hj) be two conditional RMW events pending on the same register r ∈ R. Since A is a k-levelled sequence in S, both visible(S • ei, ei) and visible(S • ej, ej) hold. Consequently value(S, r) is the only change-point of both ei and ej. This implies that either of the events ei or ej can make the other invisible, contradicting our assumption that A is a k-levelled sequence of S. This proves (2).
No two Write events can be pending on the same register and no two conditional RMW events can be pending on the same register. It follows that at most two events can be pending on any single register r ∈ R: one Write and one conditional RMW. This proves (3).
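The counting step behind the three cases is a pigeonhole argument: k pending events must be spread over registers that can each hold a bounded number of them. A minimal Python sketch of that arithmetic follows; the table PENDING_CAPACITY and the function min_registers are hypothetical names, and rounding up to an integer is an addition made here (the lemma itself is stated with k and k/2).

```python
import math

# Maximum number of events of a k-levelled sequence that can be pending on a
# single register, per register kind, as argued in the proof above.
PENDING_CAPACITY = {
    "read-write": 1,              # no two pending Writes on one register
    "conditional": 1,             # no two pending conditional RMWs
    "read-write-conditional": 2,  # one Write plus one conditional RMW
}


def min_registers(k, register_kind):
    """Pigeonhole bound on the registers needed to host k pending events."""
    return math.ceil(k / PENDING_CAPACITY[register_kind])


if __name__ == "__main__":
    for kind in PENDING_CAPACITY:
        print(kind, min_registers(8, kind))
```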
Our main result follows directly from the above lemmata.
Theorem 3.6. Let P be an n-process protocol that is a starvation-free implementation of mutual exclusion or a wait-free implementation of an object in Visible(n). Let SPACE(P) denote the number of registers used by P.
• If P uses only read-write registers, then SPACE(P) ≥ n.
• If P uses only conditional registers, then SPACE(P) ≥ n.
• If P uses only read-write-conditional registers, then SPACE(P) ≥ n/2.
Proof. Immediate from Lemmata 3.2, 3.4 and 3.5.
ONE-TIME OBJECTS
A one-time object is an object whose operations can be called by a process only once. It is easily seen that the memory bounds proven in Section 3 for long-lived objects do not hold, in general, for one-time objects. As an example, a one-time wait-free counter shared by n processes can be implemented by using a single register that supports the CAS operation. However, by Theorem 3.6, a long-lived wait-free counter requires at least n such registers. For one-time objects, we prove time-space tradeoffs for wait-free implementations that use only read-write-conditional registers. Our proofs apply for a class of objects defined as follows. It is easily seen that one-time counters, stacks, queues, swap, fetch-and-add, and single-writer snapshot objects are in Visible′(n). We note that the following tradeoffs also apply to starvation-free implementations of one-time mutual exclusion.
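One way to realize the single-register one-time counter mentioned above is a read-then-CAS retry loop, sketched below in Python. The sketch is an assumption about how such a counter might look, not the authors' construction: CasRegister and one_time_fai are hypothetical names, and a lock stands in for the atomicity of hardware CAS. Because each process calls the operation once, a CAS can fail only when another process's CAS succeeded in the meantime, which can happen at most n − 1 times, so every call finishes within n attempts.

```python
import threading


class CasRegister:
    """A register supporting read and CAS (atomicity simulated by a lock)."""

    def __init__(self, initial=0):
        self._value = initial
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def cas(self, old, new):
        with self._lock:
            if self._value == old:
                self._value = new
                return True
            return False


def one_time_fai(register):
    """Fetch-and-increment, to be called at most once per process."""
    while True:
        v = register.read()
        if register.cas(v, v + 1):
            return v + 1  # counter values start at 1


if __name__ == "__main__":
    n = 8
    reg = CasRegister()
    returned = []
    threads = [
        threading.Thread(target=lambda: returned.append(one_time_fai(reg)))
        for _ in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sorted(returned))
```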
We first need the following lemma, which states that when multiple conditional events and Write events are pending on the same register, they can always be scheduled such that at most two events are visible.

Let L1 contain all Write events in L, and let L′ = L \ L1 be the non-empty set of conditional events in L. Let cur be the value of r in state S, and consider any event e ∈ L′, e = RMW(p, r, w, g, h). As <g, h> is conditional, there is only a single value v such that g(v, w) ≠ v; we denote this value by old(e). We partition the events of L′ into the following two disjoint sets: L′ = L2 ∪ L3, where L2 = {ej ∈ L′ | old(ej) ≠ cur}, and L3 = L′ \ L2. In any schedule where the events of L2 are performed first (in any order) followed by all the events of L3 (in any order), we have at most a single visible event (namely the first event scheduled from L3, if any). If L1 is non-empty, we schedule the events in it last (in any order), and the last of these to be scheduled may also be visible.
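The scheduling argument can be simulated in a few lines of Python. The sketch below is illustrative only: the function names, the encoding of a conditional event as a (g, w) pair, and the simplified visibility test (an event counts as visible if it changes the register and is not immediately followed by a Write; a Write counts as visible if it is not immediately followed by another Write) are assumptions made here.

```python
def cas_g(v, w):
    """Write function of CAS; the input w is the pair (old, new)."""
    return w[1] if v == w[0] else v


def schedule(cur, conditionals, writes):
    """Order the pending events: conditionals for which cur is a fixed-point
    first, then conditionals whose change-point is cur, then all Writes."""
    fixed_at_cur = [e for e in conditionals if e[0](cur, e[1]) == cur]
    change_at_cur = [e for e in conditionals if e[0](cur, e[1]) != cur]
    order = [("rmw", e) for e in fixed_at_cur + change_at_cur]
    return order + [("write", v) for v in writes]


def count_visible(cur, order):
    """Count visible events on a single register under the simplified test."""
    visible, value = 0, cur
    for i, (kind, payload) in enumerate(order):
        followed_by_write = i + 1 < len(order) and order[i + 1][0] == "write"
        if kind == "rmw":
            g, w = payload
            new = g(value, w)
            if new != value and not followed_by_write:
                visible += 1
            value = new
        else:
            if not followed_by_write:
                visible += 1
            value = payload
    return visible


if __name__ == "__main__":
    pending = [(cas_g, (0, 5)), (cas_g, (5, 6))]
    naive = [("rmw", e) for e in pending]
    print(count_visible(0, naive), count_visible(0, schedule(0, pending, [])))
```

In the usage example, running CAS(0, 5) before CAS(5, 6) makes both events visible, whereas the schedule described in the proof runs CAS(5, 6) first, while 0 is still a fixed-point of it, and leaves a single visible event.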
The following theorem states a tradeoff between the number of read-write-conditional registers used by a protocol and the number of write events it may have to perform.

Theorem 4.2. Let O be an object in Visible′(n), and let P be an n-process wait-free implementation of O that uses only read-write-conditional registers. If P uses at most m < n registers, then P has an execution E in which the total number of writing events performed is in Ω(n²/m).
Proof. For simplicity, we assume that 2·m divides n. As O is in Visible′(n), the high-level operations performed by the n processes cannot terminate before they perform a visible write. We construct an execution E in phases. In each phase, we bring all processes to be on the verge of writing. As P can only apply Writes and conditional RMW operations to registers, from Lemma 4.1, an adversarial scheduler can arrange the writing events on each register r so that at most two of these events are visible in E at r. Consequently, at most 2m processes perform a visible write during the phase. We let these processes run to completion. In the second phase there remain n − 2m processes that did not perform a visible write. The same argument can be applied to them. We continue in this manner, until all processes have performed a visible write. Denoting the total number of writes made in E by W(P), we get:
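The phase count in this argument is easy to check numerically. The Python sketch below is a hedged illustration, not the paper's derivation: it assumes that exactly 2m processes complete in every phase and that every remaining process performs exactly one writing event per phase, and the function name total_writes is hypothetical. Under these assumptions the sum of n − 2mk over the n/2m phases equals n²/(4m) + n/2, so the amortized number of writes per process is n/(4m) + 1/2.

```python
def total_writes(n, m):
    """Writing events in the phased execution, assuming 2*m divides n."""
    if n % (2 * m) != 0:
        raise ValueError("this sketch assumes that 2*m divides n")
    remaining, total = n, 0
    while remaining > 0:
        total += remaining   # every remaining process performs one write
        remaining -= 2 * m   # at most 2m of those writes are visible
    return total


if __name__ == "__main__":
    for n, m in [(16, 2), (12, 3), (64, 4)]:
        closed_form = n * n // (4 * m) + n // 2
        print(n, m, total_writes(n, m), closed_form)
```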
The results of Theorems 4.2 and 4.3 can be strengthened, by counting not only the number of writes, but also the number of memory stalls caused by write contention. In all shared-memory systems, when multiple processes attempt to write to the same memory location simultaneously, the writes are serialized, and waiting operations incur memory stalls. The concept of memory stalls was introduced in [5]. We use the following definition of memory stalls, which is stricter than that of [5], as it counts only stalls caused by contention in writing. We obtain a tradeoff between space complexity and the number of memory stalls incurred by a wait-free implementation of any object in Visible′(n) that uses only read-write-conditional registers.

Proof. For simplicity, and without loss of generality, we assume that 2 · m divides n. We consider the same type of phased execution E we constructed in Theorem 4.2. We denote by MS(E) the total number of memory stalls incurred by processes during the execution of E; we denote by M_k(E), for 1 ≤ k ≤ n/2m, the total number of memory stalls incurred by processes on account of their writing events in phase k, and for 1 ≤ i ≤ m we denote by Writers(E, k, i) the number of processes that write to register ri in phase k of E. We get:
It is easily seen that M_k(E) is minimized in the equations above when the writes are equally distributed among registers (i.e. every register is written by (n − 2·k·m)/m processes), thus we get:
Summing over all phases, we get:
From Theorem 4.4 we get the following, more specific, tradeoff between space and number of memory stalls:
DISCUSSION
Conditional synchronization primitives are among the strongest primitives according to Herlihy's wait-free hierarchy, as they can be used to implement deterministic wait-free consensus for any number of processes. This paper shows that conditional synchronization primitives are relatively inefficient in terms of memory space for wait-free implementations of most non-trivial objects. Our results apply to starvation-free implementations of mutual exclusion and to wait-free implementations of objects in Visible(n), a class that contains objects supporting an operation that must perform a visible write before it terminates. In contrast, starvation-free mutual exclusion and some of the key objects in Visible(n) can be implemented using O(1) registers that support other synchronization primitives, such as fetch-and-increment.
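As an illustration of the contrast drawn above, the well-known ticket lock obtains starvation-free mutual exclusion from two registers that support read and fetch-and-increment. The Python sketch below is one such construction under stated assumptions and is not necessarily the implementation the authors have in mind: FaiRegister and TicketLock are hypothetical names, a lock simulates the atomicity of the hardware primitive, and the busy-wait loop yields the processor only to keep the simulation responsive.

```python
import threading
import time


class FaiRegister:
    """A register supporting read and fetch-and-increment (simulated)."""

    def __init__(self, initial=0):
        self._value = initial
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def read(self):
        with self._lock:
            return self._value

    def fetch_and_increment(self):
        with self._lock:
            v = self._value
            self._value += 1
            return v


class TicketLock:
    """Mutual exclusion from two fetch-and-increment registers."""

    def __init__(self):
        self.next_ticket = FaiRegister()
        self.now_serving = FaiRegister()

    def acquire(self):  # entry section
        my_ticket = self.next_ticket.fetch_and_increment()
        while self.now_serving.read() != my_ticket:
            time.sleep(0)  # busy-wait, yielding to other threads

    def release(self):  # exit section
        self.now_serving.fetch_and_increment()


if __name__ == "__main__":
    lock = TicketLock()
    shared = {"count": 0}

    def worker():
        for _ in range(50):
            lock.acquire()
            shared["count"] += 1  # critical section
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(shared["count"])
```

Tickets are served in the order in which they were drawn, so a process in its entry section waits only for the finitely many processes that drew earlier tickets.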
Several researchers have previously obtained results that indicate the weakness of conditional primitives. Anderson and Kim [1], continuing the work of Cypher [4], proved an Ω(log n / log log n) remote memory references lower bound on starvation-free mutual exclusion implementations that use CAS and LL/SC, "almost" equal to the log n remote memory references upper bound that uses read-write registers. In contrast, starvation-free mutual exclusion can be implemented in O(1) remote memory references by using registers that support fetch-and-increment. Jayanti [9] considers wait-free object implementations that use registers that support read/write and any of the following operations: LL/SC, swap, and move. He proves a lower bound of Ω(log n) on the worst-case latency of any such implementation for counters, stacks and queues. This is yet another indication of the weakness of LL/SC in the context of wait-free implementations.
Multi-valued objects are objects such that different operations (called by different processes) may return different, yet related, values. Consensus is the key concurrent object that is not multi-valued. The aforementioned results, viewed collectively, imply that even though CAS and LL/SC are strong in the context of solving deterministic wait-free consensus, they are relatively weak in the context of wait-free implementations of most multi-valued objects. The conclusion, therefore, is that basing multiprocessor synchronization solely on conditional synchronization primitives is probably not a good design choice.
