We define a class of operations called pseudo read-modify-write (PRMW) operations, and show that nontrivial shared data objects with such operations can be implemented in a bounded, wait-free manner from atomic registers. A PRMW operation is similar to a "true" read-modify-write (RMW) operation in that it modifies the value of a shared variable based upon the original value of that variable. However, unlike an RMW operation, a PRMW operation does not return the value of the variable that it modifies. We consider a class of shared data objects that can be read, written, or modified by an associative, commutative PRMW operation, and show that any object in this class can be implemented without waiting from atomic registers. The implementations that we present are polynomial in both space and time and thus improve on previously published ones, all of which have unbounded space complexity.
Introduction
The implementation of shared data objects is a subject that has received much attention in the concurrent programming literature. A shared data object is a data structure that is shared by a collection of processes and is accessed by means of a fixed set of operations. Traditionally, shared data objects have been implemented by using mutual exclusion, with each operation corresponding to a "critical section." Although conceptually simple, such implementations suffer from two serious shortcomings. First, they are not very resilient: if a process experiences a halting failure while accessing such a data object, then the data object may be left in a state that prevents subsequent accesses by other processes. Second, such implementations may unnecessarily restrict parallelism. This is especially undesirable if operations are time-consuming to execute.
As a result of these two shortcomings, there has been much interest recently in wait-free implementations of shared data objects. An implementation of a shared data object is wait-free iff the operations of the data object are implemented without any unbounded busy-waiting loops or idle-waiting primitives. Wait-free shared data objects are inherently resilient to halting failures: a process that halts while accessing such a data object cannot block the progress of any other process that also accesses that same data object. Wait-free shared data objects also permit maximum parallelism: such a data object can be accessed concurrently by any number of the processes that share it, since one access does not have to wait for another to complete.
One of the major objectives of researchers in this area has been to characterize those shared data objects that can be implemented without waiting in terms of single-reader, single-writer, single-bit atomic registers. An atomic register is a shared data object consisting of a single shared variable that can either be read or written in a single operation [21]. An N-reader, M-writer, L-bit atomic register consists of an L-bit variable that can be read by N processes and written by M processes. It has been shown in a series of papers that multi-reader, multi-writer, multi-bit atomic registers can be implemented without waiting in terms of single-reader, single-writer, single-bit atomic registers [6, 9, 10, 17, 18, 21, 22, 24, 26, 27, 28, 29, 30]. This work shows that, using only atomic registers of the simplest kind, it is possible to solve the classical readers-writers problem without requiring either readers or writers to wait [14].
Another shared data object of interest is the composite register, a data object that generalizes the notion of an atomic register. A composite register is an array-like shared data object that is partitioned into a number of components. As illustrated in Figure 1, an operation of such a register either writes a value to a single component, or reads the values of all components. Afek et al. [2] and Anderson [3, 4] have shown that composite registers can be implemented from atomic registers without waiting. This work shows that, using only atomic registers of the simplest kind, it is possible to implement a shared memory that can be read in its entirety in a single "snapshot" operation, without resorting to mutual exclusion.
In this paper, we consider the important question of whether there exist other nontrivial shared data objects that can be implemented from atomic registers without waiting. We define a class of operations called pseudo read-modify-write (PRMW) operations and consider a corresponding class of shared data objects called PRMW objects. This class of objects includes such fundamental objects as counters, shift registers, and multiplication registers. A PRMW object consists of a single shared variable that can be read, written, or modified by an associative, commutative PRMW operation. The PRMW operation takes its name from the classical read-modify-write (RMW) operation as defined in [19]. The RMW operation has the form "temp, X := X, f(X)," where X is a shared variable, temp is a private variable, and f is a function. Executing this operation has the effect of modifying the value of X according to f, and returning the original value of X in temp. The PRMW operation has the form "X := f(X)," and differs from the RMW operation in that the value of X is not returned.
We prove that any PRMW object can be implemented from atomic registers in a wait-free manner. We establish this result by first considering the problem of implementing a counter without waiting. A counter is a PRMW object whose value can be read, written, or incremented by an integer value. We first show that counters can be implemented from composite registers without waiting, and then show that our implementation can be generalized to apply to any PRMW object. Given the results of [2, 3, 4], this shows that any PRMW object can be implemented without waiting using only atomic registers. Our results stand in sharp contrast to those of [5, 15], where it is shown that RMW operations cannot, in general, be implemented from atomic registers without waiting.
The problem of implementing PRMW objects without waiting has been studied independently by Aspnes and Herlihy in [7]. Aspnes and Herlihy give a general, wait-free implementation that can be used to implement any PRMW object. A counter implementation, which is obtained by optimizing the general implementation, is also given. Both of these implementations have unbounded space complexity: the first uses a graph of unbounded size to represent the history of the implemented data object, and the second uses unbounded timestamps. Our counter implementation and its generalization are polynomial in space and time.
The rest of the paper is organized as follows. In the next section, we formally define the problem of implementing a counter from composite registers. The counter implementation mentioned above is described in Section 3. The correctness proof for the implementation is given in Section 4. In contrast to almost every other paper in the literature on wait-free algorithms (including some written by the first author), our proof is assertional rather than operational. Thus, as a secondary contribution, this paper serves as an example of how to apply assertional techniques in reasoning about wait-free implementations, a task that is complicated by the fact that the primary correctness condition for such implementations (i.e., linearizability [16]) refers to operational concepts such as histories, operations, and precedence relationships. In Section 5, we discuss several issues pertaining to our implementation, and show that the implementation can be generalized to implement any PRMW object. Concluding remarks appear in Section 6.
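The difference between the two forms can be illustrated with a minimal Python sketch. It models X as a plain object in a single thread, so it shows only what each operation returns and is not a wait-free construction. The names Cell, rmw, and prmw are illustrative assumptions and do not appear in the paper.

```python
from typing import Callable


class Cell:
    """Hypothetical stand-in for the shared variable X (sequential model only)."""

    def __init__(self, value: int) -> None:
        self.value = value


def rmw(x: Cell, f: Callable[[int], int]) -> int:
    """RMW form 'temp, X := X, f(X)': modify X and return its original value."""
    temp = x.value
    x.value = f(temp)
    return temp


def prmw(x: Cell, f: Callable[[int], int]) -> None:
    """PRMW form 'X := f(X)': modify X; the original value is not returned."""
    x.value = f(x.value)
```

In this sketch, an increment by 3 would be expressed as prmw(x, lambda v: v + 3); the caller learns nothing about the value of X from the call itself.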
Problem Definition
In this section, we consider the problem of implementing a counter from composite registers, and give the conditions that such an implementation must satisfy to be correct. An implementation consists of a set of N processes along with a set of variables. Each process is a sequential program comprised of atomic statements. Informally, an atomic statement is a language construct whose execution is semantically indivisible. We assume a repertoire of atomic statements that is typical of most sequential programming languages. We do not give a complete list of such statements, but do give a restriction below that limits the manner in which atomic statements may affect the variables of an implementation.
Each process of an implementation consists of a main program and three procedures, called Read, Write, and Increment. Each such procedure has the following form:
    procedure name(inputs)
        body;
        return(outputs)
    end
where name is the name of the procedure, inputs is an optional list of input parameters, outputs is an optional list of output parameters, and body is a program fragment comprised of atomic statements. The Read, Write, and Increment procedures of a process constitute its interface with the implemented counter. The Read procedure is invoked to read the value of the counter; the value read is returned as an output parameter. The Write procedure is invoked to write a new value to the counter; the value to be written is specified as an input parameter. The Increment procedure is invoked to increment the value of the counter; the value to add is given as an input parameter. A process invokes its procedures only from its main program. We leave the exact structure of each process's main program unspecified, but do assume that each process repeatedly invokes its three procedures in an arbitrary, serial manner. As an example of the syntax we use for specifying the variables and procedures of an implementation, see Figures 2 and 3.
Each variable of an implementation is either private or shared. A private variable is defined only within the scope of a single process, whereas a shared variable is defined globally and may be accessed by more than one process. For simplicity, we stipulate that each process may access shared variables only within its Read, Write, and Increment procedures, and not within its main program. The procedures and variables of each process are required to satisfy the following two restrictions.
Atomicity Restriction: Each shared variable is required to correspond to a component of some composite register. Thus, each atomic statement within a procedure may either write a single shared variable or read one or more shared variables, but not both. In the latter case, the shared variables must all correspond to the components of a single composite register.
... where $s_0$ is an initial state. We assume that in any history, the first event of each process occurs as the result of executing an atomic statement in that process's main program. (In other words, each process's program counter must be initialized so that it equals a location within the main program of that process.)
The subsequence of events in a history corresponding to a single procedure invocation is called an operation. An operation of a Read (respectively, Write or Increment) procedure is called a Read operation (respectively, Write operation or Increment operation). Each history defines an irreflexive partial order on operations: an operation p precedes another operation q in this ordering iff each event of p occurs before all events of q in the history.
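The precedence ordering just defined can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that each operation is summarized by the positions of its first and last events in the history; the Operation record and its field names are hypothetical and are not part of the paper's formalism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    first: int  # position in the history of the operation's first event
    last: int   # position in the history of the operation's last event


def precedes(p: Operation, q: Operation) -> bool:
    """p precedes q iff each event of p occurs before all events of q."""
    return p.last < q.first
```

Two operations whose event intervals overlap are unordered by this relation, which is why the ordering is partial; and since an operation's last event never occurs before its first, no operation precedes itself.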
As mentioned above, each Read procedure has an output parameter that returns the value read from the counter; the value returned by a Read operation is called the output value of that operation. As also mentioned above, each Write (Increment) procedure has an input parameter that specifies the value to be written (added) to the counter; the value written (added) to the counter by a Write (Increment) operation is called the input value of that operation. We designate one integer value as the initial value of the implemented counter.
An operation of a procedure P in a history is complete iff the last event of the operation occurs as the result of executing the return statement of P. A history is well-formed iff each operation in the history is complete.
Given this terminology, we are now in a position to define what it means for a history of an implementation to be "linearizable." Linearizability provides the illusion that each operation is executed instantaneously, despite the fact that it is actually executed as a sequence of events. It can be shown that the following definition is equivalent to the more general definition of linearizability given by Herlihy and Wing in [16], when restricted to the special case of implementing a counter.
Linearizable Histories: A well-formed history h of an implementation is linearizable iff the partial order on operations defined by h can be extended to a total order $\prec$ such that for each Read operation r in h, the following condition is satisfied.
If there exists a Write operation w such that $w \prec r \wedge \neg\langle \exists v : v \text{ is a Write operation} :: w \prec v \prec r \rangle$, then the output value of r equals that obtained by adding the input value of w to the sum of the input values of all Increment operations ordered between w and r by $\prec$.
Counter Implementation
In this section, we present our counter implementation. For now, we assume that the counter stores values ranging over the integers. Later, in Section 5, we consider the case in which the counter stores values over some bounded range. (In the latter case, overflow is a problem.)
The shared variable declarations for the implementation are given in Figure 2 and the procedures for process i, where $0 \le i < N$, are given in Figure 3. Central to the proof of the implementation is the "history variable" H, which is defined in Figure 2. H is used to totally order the operations of the implemented counter, and is one of several auxiliary variables that are used to facilitate the proof of correctness. H is a sequence of tuples of the form (op, pnum, id, val), where op ranges over {READ, WRITE, INC}, pnum ranges over $0..N-1$, and id and val are integers. Intuitively, H is a "log" of operations that have been performed on the implemented counter, and each tuple records the "effect" of a specific operation. The type of the particular operation is identified by the op field, the process invoking the operation is identified by the pnum field (pnum stands for "process number"), and the id field is used to differentiate between operations of the same type by the same process. The val field is used to record the output value of a Read operation or the input value of a Write or Increment operation. The following notational conventions regarding history variables will be used in the remainder of the paper.
(... may be no occurrences of z in X, in which case $X - z = X$.) The symbol $\emptyset$ is used to denote the empty sequence. □
According to the semantics of a counter, Write and Increment operations change the value of the implemented counter, whereas Read operations do not. This is reflected in the definition of the function Val, given next. This function gives the "value" of the implemented counter as recorded by a sequence of READ, WRITE, and INC tuples.
Definition of Val: Let i range over $0..N-1$, let n and v be integer values, let init be the initial value of the implemented counter, and let $\alpha$ be a sequence of READ, WRITE, and INC tuples. Then, the function Val is defined as follows:
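As an informal aid, the following Python sketch computes a value from a sequence of tuples in the way the surrounding prose describes: a READ tuple leaves the value unchanged, a WRITE tuple replaces it, and an INC tuple adds to it, starting from init for the empty sequence. This is an assumed reading of Val that is consistent with the text, not a transcription of the formal definition; the function name val and the tuple layout (op, pnum, id, val) as a Python tuple are illustrative choices.

```python
from typing import Iterable, Tuple

HTuple = Tuple[str, int, int, int]  # (op, pnum, id, val)


def val(history: Iterable[HTuple], init: int = 0) -> int:
    """Value of the counter recorded by a sequence of READ, WRITE, and INC tuples."""
    value = init  # the empty sequence yields the initial value
    for op, _pnum, _id, v in history:
        if op == "WRITE":
            value = v          # a Write replaces the current value
        elif op == "INC":
            value += v         # an Increment adds its input value
        elif op != "READ":     # a Read leaves the value unchanged
            raise ValueError(f"unknown op: {op}")
    return value
```

Under this reading, appending a READ tuple to a sequence never changes the value computed for it.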
The proof of correctness is based upon the following lemma.
Lemma: If the following two conditions hold, then each well-formed history of the implementation is linearizable.
Ordering: During the execution of each operation (i.e., between its first and last events), an event occurs that appends a single, unique tuple for that operation to H, and this tuple is not subsequently removed from H.
To see why this lemma holds, consider a well-formed history h. Let $\gamma$ denote the "final" value of H in h: i.e., if h is finite, then $H = \gamma$ in the final state of h, and if h is infinite, then $\gamma$ is infinite and every finite prefix of $\gamma$ is a prefix of H for some infinite sequence of states in h. Define a total order $\prec$ on the operations in h as follows: $p \prec q$ iff p's tuple occurs before q's tuple in $\gamma$. By the Ordering condition, $\prec$ extends the partial precedence ordering on operations defined by h. By the Consistency condition and the definition of Val, $\prec$ is consistent with the semantics of a counter. That is, the output value of each Read operation in h equals that obtained by adding the input value of the most recent Write operation according to $\prec$ (or the initial value of the implemented counter if there is no preceding Write operation) to the sum of the values of all intervening Increment operations according to $\prec$. This implies that h is linearizable.
We justify the correctness of the implementation below by informally arguing that the Ordering and Consistency conditions are satisfied. A formal proof of Consistency (which turns out to be the most significant proof obligation) is given in the next section. Before proceeding, several comments concerning notation are in order.
Notational Conventions for Implementations: Each initial state of the implementation is required to satisfy the initially assertion given in Figure 2. (If a given variable is not included in the initially assertion, then its initial value is arbitrary. Note that each private variable has an arbitrary initial value.) As in the definition of Val, we use init to denote the initial value of the implemented counter. To make the implementation easier to understand, the keywords read and write are used to distinguish reads and writes of (nonauxiliary) shared variables from reads and writes of private variables. To simplify the implementation, each labeled sequence of statements is assumed to be a single atomic statement. (Each such sequence can easily be implemented by a single multiple-assignment.) □
Each of the labeled atomic statements in Figure 3 satisfies the Atomicity restriction of Section 2. In particular, no statement writes more than one (nonauxiliary) shared variable, and no statement both reads and writes (nonauxiliary) shared variables. The Wait-Freedom restriction is also satisfied, since each procedure contains no unbounded loops or idle-waiting primitives. With regard to the Atomicity restriction, it should be emphasized that auxiliary variables are irrelevant. These variables are not to be implemented, but are used only to facilitate the proof of correctness: observe that no auxiliary variable's value is ever assigned to a nonauxiliary variable. To make it easier to see how auxiliary variables would be removed from the program text, we have listed each assignment that refers to such variables on a separate line.
We continue our description of the implementation by considering the shared variables as defined in Figure 2. There is only one nonauxiliary shared variable, namely the (N + 1)-component composite register Q. Each process i, where $0 \le i < N$, may read all of the components of Q, and may write components Q[i] and Q[N]. Each component of Q consists of two fields, val and tag. The val field is an integer "value," and is used to record the input value of an Increment or Write operation. The tag field consists of two fields, seq and pnum. The seq field is a "sequence number" ranging over $0..N+1$, and the pnum field is a "process number" ranging over $0..N-1$.
As mentioned previously, a number of shared auxiliary variables are also included in the implementation. The most important of the auxiliary variables is the history variable H, described above. Another shared history variable S is used to hold INC tuples for "pending" Increment operations. The role of S is explained in detail below. The shared auxiliary boolean variable Ovlap ...
In this context, "recent" and "subsequent" are interpreted with respect to the total order $\prec$, which is defined using the history variable H as explained above. The value of the counter is formally defined by the following expression.
With expression (1) in mind, we now consider how Read, Write, and Increment operations are executed in the implementation. A Read operation simply computes the sum defined by (1). Note that, because Q is a composite register, this sum can be computed by reading Q only once.
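The way a Read operation might compute its output from a single snapshot of Q can be sketched as follows. The sketch assumes one reading of expression (1) that is suggested by the description of Q and by the later discussion of Figure 4: the result is Q[N].val plus the val fields of those components Q[j], j < N, whose tag equals Q[N].tag. That reading, and the names Component and counter_value, are assumptions made for illustration rather than a transcription of the paper's expression.

```python
from typing import List, NamedTuple, Tuple


class Component(NamedTuple):
    val: int
    tag: Tuple[int, int]  # (seq, pnum)


def counter_value(snapshot: List[Component]) -> int:
    """Sum computed from one snapshot of the (N+1)-component register Q.

    Assumed form: Q[N].val plus every Q[j].val (j < N) with Q[j].tag == Q[N].tag.
    """
    *increment_slots, write_slot = snapshot  # write_slot plays the role of Q[N]
    return write_slot.val + sum(
        c.val for c in increment_slots if c.tag == write_slot.tag
    )
```

Because the whole list is obtained in one snapshot, the tags compared inside the sum all refer to the same instant, which is the property the text attributes to reading a composite register once.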
A Write operation first computes a new tag value, and then writes its input value and tag value to Q[N]. The tag value consists of a sequence number and process number. The sequence number is obtained by first reading Q, and then selecting a value differing from any sequence number appearing in the components of Q. Note that, because there are N + 1 sequence numbers appearing in the N + 1 components of Q, and because each sequence number ranges over $0..N+1$, such a value exists.
Each Increment operation of process i is executed in two phases. In both phases, Q is read and then Q[i] is written. In each phase, the tag value written to Q[i] is obtained by copying the value read from Q[N].tag. If several successive Increment operations of process i obtain the same tag value, then their input values are accumulated in Q[i].val (see the assignments to sum in statement 7). It can be shown that the value assigned to Q[i].val by an Increment operation equals the sum of the input values of all Increment operations of process i that are ordered by $\prec$ to occur after the most "recent" Write operation.
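The counting argument above can be made concrete with a short Python sketch. Given the N + 1 sequence numbers read from Q, it returns some value in 0..N+1 that does not appear among them. Picking the smallest unused value is an arbitrary choice made here for illustration (the text only requires that the value differ from those in Q), and the function name fresh_seq is hypothetical.

```python
from typing import List


def fresh_seq(seqs_in_q: List[int]) -> int:
    """Return a sequence number in 0..N+1 that differs from every entry read from Q.

    seqs_in_q holds the N + 1 sequence numbers found in the components of Q.
    """
    n = len(seqs_in_q) - 1
    used = set(seqs_in_q)
    for candidate in range(n + 2):  # the N + 2 values 0..N+1
        if candidate not in used:
            return candidate
    # N + 1 entries cannot cover all N + 2 candidate values.
    raise AssertionError("unreachable")
```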
To conclude our description of how Increment operations are executed, we informally describe why two phases are necessary. For the sake of explanation, suppose that we modify the implementation of Figure 3 by removing statements 5 and 6 of the Increment procedure. With this modification, each Increment operation consists of only one phase. Now, consider the history depicted in Figure 4. In this figure, operations are denoted by line segments with "time" running from left to right. Certain statement executions (events) within these operations have been denoted as points, each of which is labeled by the corresponding statement number. In this history, $w$, $w'$, and $w''$ are three successive Write operations of process i, p is an Increment operation of another process j, and r and s are Read operations of arbitrary processes. The scenario depicted in this history is outlined below.
... Figure 4. Because Read operation r finishes execution before $w''$ writes to Q[N], by (1), we must linearize r to precede $w''$. By the given precedence ordering, p precedes r. Hence, by transitivity, we must linearize p to precede $w''$. Now, because the expression $Q[j].val \neq 0 \wedge Q[j].tag = Q[N].tag$ holds after $w''$ finishes execution, when s computes the sum defined by (1), p's input value is included. This implies that we should linearize $w''$ to precede p, which is a contradiction.
The above problem arises because p assigns a nonzero input value to Q[j].val and an "old" tag value to Q[j].tag. To illustrate how this problem is handled in our implementation, consider again the history illustrated in Figure 4, but this time assume that p executes the two-phase Increment procedure of Figure 3. Because $w''$ chooses (15, i) as its tag value, $Q[j].tag \neq (15, i)$ holds when $w''$ reads from Q. By the precedence ordering of Figure 4, p's first phase precedes the read by $w''$ from Q. This implies that p's first phase obtains a tag value that differs from (15, i). Because p obtains different tag values in its two phases, it assigns the value 0 to Q[j].val in its second phase. Thus, p's input value is not included in the sum computed by s. This completes our description of how operations are executed in the implementation.
To formally establish the correctness of the implementation, it suffices to prove that the Ordering and Consistency conditions hold. It is straightforward to show that the Ordering condition holds. Each Read and Write operation appends a unique tuple for itself to H (statements 0 and 3) and such tuples are never removed from H. For Increment operations, the situation is slightly more complicated. When an Increment operation p of process i reads Q in its first phase (statement 5), a unique tuple for that operation is appended to the history variable S. S contains tuples for "pending" Increment operations. The tuple for p is subsequently appended to H either by p itself or by an "overlapping" Write operation. In particular, if a write to Q[N] (statement 3) occurs between the execution of statements 5 and 8 by p, then the first such write removes p's tuple from S and appends it to H. On the other hand, if no write to Q[N] occurs in this interval, then p's tuple is removed from S and appended to H when p executes statement 8. To complete the proof, note that an INC tuple can be removed from H only by the operation to which that tuple corresponds (statement 8), and in this case, the tuple is reappended. This implies that Ordering is satisfied.
We henceforth limit our attention to the Consistency condition. In the remainder of this section, we outline the proof of Consistency. The formal proof appears in the next section. To see that Consistency holds, first observe that the tuples in H may be reordered only when an INC tuple is removed and reappended (see statement 8). However, prior to being removed, such an INC tuple must be part of a sequence of INC tuples followed by a WRITE tuple (see statement 3). By the definition of Val, removing and reappending such an INC tuple does not invalidate the value of any READ tuple.
To complete the proof, we must show that each READ tuple has a valid value when first appended to H. That is, we must prove that outval = Val(H) whenever such a tuple is appended to H by the execution of statement 0 by any process. This can be established by proving the invariance of the following assertion.
As explained in Section 4, a given assertion is an invariant iff it is initially true and is never falsified. Establishing the invariance of (2) is the crux of the proof; see assertion (I23) of Section 4.
By the definition of the initial state, assertion (2) is initially true. Thus, to prove that it is an invariant, we must show that it is not falsified by the execution of any statement by any operation. Showing that (2) is not falsified by any Increment operation is the most difficult part of the proof. The main thrust of this part of the proof is to show that the following two conditions hold for each Increment operation p: first, if the execution of statement 8 by p increments the right-side of (2) by the input value of p, then it also appends an INC tuple for p to H; second, if the execution of statement 8 by p leaves the right-side of (2) ...
Showing that (2) is not falsified by the statements of Read and Write operations is somewhat simpler. The statements that must be considered are those that may modify H or Q. For Read and Write operations, there are two statements to check, namely 0 and 3.
Statement 0 does not modify Q, but appends a READ tuple to H. However, appending a READ tuple to H does not change the value of Val(H). Therefore, statement 0 does not falsify (2).
Theorem: Counters can be implemented in a bounded, wait-free manner from atomic registers.
Proof of Consistency
In this section, we formally prove that our counter implementation satisfies the Consistency condition. We find it convenient to express the implementation in the UNITY programming notation of Chandy and Misra [11]. This version of the implementation is shown in Figure 5. In UNITY, the enabling condition of each assignment is explicitly stated; this eliminates the need for introducing notation for referring to the "control" of each process. A brief description of UNITY notation is presented in an appendix. The following notational conventions will be used in the remainder of this section.
Notational Conventions: Unless otherwise specified, we assume in this section that j and k range over ...
The equivalence of the original program in Figure 3 and the UNITY program in Figure 5 can easily be established by comparison. In the UNITY program, statement (SS.j) is executed when process j takes a "snapshot" of Q. Statement (R.j) (respectively, (W.j) or (I.j)) is executed when a Read (respectively, Write or Increment) operation is performed by process j. Statement (RE.j) (respectively, (WE.j) or (IE.j)) is executed in order to "enable" a Read (respectively, Write or Increment) operation of process j. A Read operation of process j is performed by executing the sequence of assignments (RE.j); (SS.j); (R.j). A Write operation of process j is performed by executing the sequence of assignments (WE.j); (SS.j); (W.j). An Increment operation of process j is performed by executing the sequence of assignments (IE.j); (SS.j); (I.j); (SS.j); (I.j). Variables frz[j] and alt[j] are used to enforce the sequential execution of statements of process j. Variable frz[j] is set to true when process j executes statement (SS.j) to take a snapshot and is set to false when process j executes statements (R.j), (W.j), and (I.j); this has the effect of "freezing" process j from taking subsequent snapshots until the values read during this snapshot are subsequently used in one of the latter statements. Variable alt[j] is used to "alternate" between the two phases of an Increment operation of process j. Variable rd[j] (respectively, wr[j] or inc[j]) is set to true when a Read (respectively, Write or Increment) operation is performed by process j.
Several variables are introduced in the always section as a shorthand for various expressions. Variable i.tag corresponds to process i's tag value, and is assumed to be of type Tagtype (see Figure 2). Variable i.tup corresponds to process i's current tuple, and is assumed to be of type Htype. Variable i.val corresponds to the value of the implemented counter as determined by process i's last read from Q, and variable Qval gives the value of the counter as defined by the components of Q; these variables are assumed to range over the integers.
Before giving the proof of Consistency, we first introduce some terminology.
...
Proof: Initially afterw is false, and hence (I9) is true. To prove that (I9) is stable, it suffices to consider those statements that may establish afterw or falsify the right-side of the implication. The statements to consider are (W.j) and (I.j). Each effective execution of (I.j) establishes $\neg afterw$. Therefore, this statement does not falsify (I9). For statement (W.j), the following assertions hold.
...
These three assertions imply that $\{I11\}\ (I.j)\ \{I11\}$ holds.
... tag. The statements to check are (SS.j), (R.j), (W.j), (I.j), (RE.j), (WE.j), (IE.j), and (W.k), where $k \neq j$. By the axiom of assignment, each effective execution of (SS.j) establishes $x[j, N].tag = Q[N].tag$. By (I0), $\neg inc[j]$ holds prior to each effective execution of (R.j) or (W.j), and by the program text, $\neg inc[j]$ holds prior to each effective execution of (RE.j) or (WE.j); thus, by the axiom of assignment, $\neg inc[j]$ holds after each effective execution of these statements. By (I10), $\neg frz[j] \wedge alt[j] = 0$ holds after each effective execution of (IE.j). It follows, then, that these statements do not falsify (I13). The remaining statements to consider are (I.j) and (W.k). For statement (I.j), the following assertions hold.
...
The last three of these assertions imply that $\{I13\}\ (I.j)\ \{I13\}$ holds.
Finally, consider statement (W.k), $k \neq j$. For this statement, the following assertions hold.
..., by the axiom of assignment.
$\{B \wedge C \wedge I24\}\ (I.j)\ \{I24\}$, by previous two assertions.
$\{(\neg B \vee \neg C) \wedge I24\}\ (I.j)\ \{I24\}$, H not modified with this precondition.
The last two of these assertions imply that $\{I24\}\ (I.j)\ \{I24\}$ holds. □
Discussion
In the following subsections, we discuss several issues pertaining to our counter implementation.
Handling Over ows
In Section 3, we assumed that the value of the implemented counter ranges over the integers. To implement a counter that stores values over some bounded range, our implementation must be modi ed to prevent over ows. An over ow may result if an Increment operation is performed when the value of the counter is \very close" to the maximum allowed. Over ows can be dealt with in two ways: we can either modify the Increment procedure so that potential over ows are detected and avoided; or we can allow the value of the counter to \wrap around" when an over ow occurs. Incorporating the latter approach into our implementation is straightforward. For example, to implement a counter whose value ranges over 0::L ? 1, we need only modify statements 0 and 7 in Figure 3 so that when outval and sum are computed, addition is performed modulo L. Over ows can be detected and avoided by testing the value assigned to sum by the Increment procedure.
Suppose, for example, that each val field in Q ranges over $-L..L$. In this case, if $|sum| \leq L$ holds following the execution of statement 7, then $Q[i]$ is updated as before. However, if $|sum| > L$, then an error code is returned to process i and $Q[i]$ is not modified. This approach has the disadvantage that the counter does not have a single "maximum value": for a given value of the counter, an Increment operation by one process may cause an overflow error, while an Increment operation with the same input value by another process does not. This inconsistency results from the fact that overflow for process i depends only on the value of $Q[i].val$, and not on the value of any other component.
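The two overflow policies described above can be illustrated with a small sequential sketch. It is not the code of Figure 3: the function names are hypothetical, and it assumes, purely for illustration, that the counter value is the sum of the per-process components. The second half reproduces the inconsistency noted above, where the same increment on the same counter value succeeds for one process and fails for another.

```python
def increment_wraparound(total, inval, L):
    """Wrap-around policy: addition is performed modulo L, so values stay in 0..L-1."""
    return (total + inval) % L


def increment_checked(q_val, inval, L):
    """Detect-and-avoid policy applied to one process's component.

    Returns (new_value, ok). The component is left unchanged on overflow.
    """
    s = q_val + inval
    if abs(s) <= L:
        return s, True
    return q_val, False  # stands in for the error code returned to the process


# Wrap-around: a counter over 0..9 holding 8 is incremented by 5.
wrapped = increment_wraparound(8, 5, 10)

# Detect-and-avoid with L = 10 and two hypothetical components 9 and -4.
# Under the illustrative "sum of components" assumption the counter holds 5.
q = [9, -4]
result_p0 = increment_checked(q[0], 3, 10)  # |9 + 3| > 10, rejected
result_p1 = increment_checked(q[1], 3, 10)  # |-4 + 3| <= 10, accepted
```

Here the increment by 3 is rejected for the first process and accepted for the second, even though both see the same counter value.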
Complexity
The time complexity of an implementation is defined to be the number of reads and writes of composite registers required to execute an operation of the implemented counter. It is easy to see that the time complexity of each Read, Write, and Increment operation in our implementation is O(1). The space complexity of an implementation is defined to be the number of single-reader, single-writer, single-bit atomic registers required to realize the implementation. If the implemented counter stores values ranging over $\{-L, \ldots, L\}$, then by the results of [2, 3, 4], the space complexity is polynomial in L and N.
Generalizing the Implementation
In the proof of Consistency, we considered an arbitrary well-formed history h, and showed that the partial order on operations defined by h can be extended to a total order $\prec$ that is consistent with the semantics of a counter.
In particular, we proved that the output value of each Read operation r in h equals $y + z$, where y and z are defined as follows.
If there exists a Write operation w such that $w \prec r \wedge \neg\langle \exists v : v \text{ is a Write operation} :: w \prec v \prec r \rangle$, then y equals the input value of w and z equals the sum of the input values of all Increment operations ordered between w and r by the total order $\prec$.
If no such w exists, then y equals the initial value of the implemented counter, and z equals the sum of the input values of all Increment operations ordered before r by the total order $\prec$.
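The definition of y and z can be made concrete with a short sequential sketch. It models only the specification (the value a Read must return for a given total order), not the wait-free protocol itself. The history encoding and the function name `read_value` are assumptions introduced here.

```python
import itertools


def read_value(history, initial=0):
    """Value returned by a Read placed at the end of `history`.

    `history` is a list of ("write", v) or ("inc", v) pairs, already
    arranged according to the total order.
    """
    y, z = initial, 0
    for kind, v in history:
        if kind == "write":
            y, z = v, 0  # a later Write supersedes everything before it
        else:
            z += v       # Increment inputs accumulate after the last Write
    return y + z


prefix = [("inc", 7), ("write", 10)]
incs = [("inc", 3), ("inc", -1), ("inc", 5)]

# Every ordering of the Increments that follow the last Write gives the same result.
values = {read_value(prefix + list(p)) for p in itertools.permutations(incs)}

# With no Write at all, y is the initial value.
no_write = read_value([("inc", 2), ("inc", 3)], initial=4)
```

The set `values` contains a single element because addition is associative and commutative, so the relative order of the intervening Increment operations does not affect the result.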
In more general terms, the protocol followed in the implementation allows each Read operation to determine two values y and z, where y is the most recently written value according to the total order, and z is a function over the input values of all intervening Increment operations according to that order. Note that this protocol does not enable a Read operation to determine the relative ordering of the intervening Increment operations. However, this ordering is irrelevant in determining the value of the counter because Increment operations are defined in terms of addition, which is associative and commutative. This protocol can be generalized to yield the following theorem.
Theorem: Any shared register X that can either be read, written, or modified by a PRMW operation of the form "$X := X \oplus v$," where $\oplus$ is an operator that is associative and commutative and v is an integer value, can be implemented in a bounded, wait-free manner from atomic registers.
□
As an example, consider the problem of implementing a "multiplication register," i.e., one that can either be read, written, or multiplied by an integer value. We can implement such a register by defining the value of the implemented register to be the corresponding product and by modifying statement 0 of the Read procedure accordingly. The procedure used to multiply the value of the register would be similar to the Increment procedure in Figure 3, except that in statements 5 and 7, "sum := 0" would be replaced by "sum := 1," and in statement 7, "sum := $x[i].val$ + inval" would be replaced by "sum := $x[i].val \cdot$ inval" (actually, prod would be a more suitable variable name than sum in this case). The initialization of the register would be similar to that given in Figure 2, except for the requirement $\langle \forall j : 0 \leq j < N :: Q[j].val = 1 \rangle$.
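The role of the identity element (0 for addition, 1 for multiplication) can be seen in the following sketch. It is not the procedure of Figure 3; it only illustrates the algebraic fact being used: when the operator is associative and commutative, each process can fold its own inputs locally, and folding the partial results gives the same value as applying all inputs in any order. The names `combine` and `read_general` are hypothetical.

```python
from functools import reduce
import operator


def combine(op, identity, values):
    """Fold `values` with `op`, starting from the operator's identity element."""
    return reduce(op, values, identity)


def read_general(written, per_process_inputs, op, identity):
    """Return `written op z`, where z folds per-process partial results."""
    partials = [combine(op, identity, inputs) for inputs in per_process_inputs]
    return op(written, combine(op, identity, partials))


# Three hypothetical processes; the second has performed no PRMW operation,
# so its partial result is just the identity element.
inputs = [[2, 3], [], [5]]

product = read_general(4, inputs, operator.mul, 1)  # multiplication register
total = read_general(4, inputs, operator.add, 0)    # counter
```

Starting each partial result at the identity is what makes an idle process contribute nothing to the value, which matches the requirement that each val field be initialized to 1 in the multiplication case.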
Implementing More Powerful Shared Data Objects
One may wonder whether atomic registers can be used to implement even more powerful shared data objects in a wait-free manner, i.e., ones that may be modified by means of numerous PRMW operations. In order to partially address this question, we consider the problem of implementing a register that combines the operations of both counters and multiplication registers; we call such a register an accumulator register. In the remainder of this subsection, we show that the following theorem holds.
Theorem: Accumulator registers cannot be implemented from atomic registers without waiting.
□
The proof is based upon the problem of two-process consensus. In the consensus problem, two processes are required to agree on a common boolean "decision value"; trivial solutions in which both processes agree on a predetermined value are not allowed. It has been shown by Anderson and Gouda [5], by Chor, Israeli, and Li [12], by Herlihy [15], and by Loui and Abu-Amara [23] that two-process consensus cannot be solved in a wait-free manner using only atomic registers. Therefore, to prove that accumulator registers cannot be implemented from atomic registers without waiting, it suffices to prove that accumulator registers can be used to solve two-process consensus in a wait-free manner. Figure 6 depicts a program that solves two-process consensus without waiting by using a single shared accumulator register X. To see that this program solves the consensus problem, consider Figure 7. This figure depicts the possible values of X; each arrow is labeled by the statement that causes the change in value. Based on this figure, we conclude that if statement 0 is executed before statement 3, then the final value of y equals 4 or 8 and the final value of z differs from 2 and 5, in which case both processes decide on "true." On the other hand, if statement 3 is executed before statement 0, then the final value of z equals 2 or 5 and the final value of y differs from 4 and 8, in which case both processes decide on "false." Thus, this program solves the consensus problem.
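The idea behind the argument, namely that addition and multiplication do not commute and so a later read reveals which modification came first, can be checked exhaustively on a small example. The sketch below is not the program of Figure 6: the initial value 2, the operations "add 1" and "multiply by 2", and the process names A and B are all chosen here for illustration. Each process performs one atomic modification followed by one atomic read, and every interleaving of those four steps is simulated sequentially.

```python
def interleavings(a, b):
    """All merges of sequences a and b that preserve each sequence's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one schedule; return (A's decision, B's decision, actual first modifier)."""
    x = 2  # hypothetical initial value of the accumulator register
    seen = {}
    first = None
    for step in schedule:
        if step == "A.add":
            x += 1
            first = first or "A"
        elif step == "B.mul":
            x *= 2
            first = first or "B"
        elif step == "A.read":
            seen["A"] = x
        elif step == "B.read":
            seen["B"] = x
    # A reads 3 or 6 if its addition came first, and 5 otherwise.
    a_decides = "A" if seen["A"] in (3, 6) else "B"
    # B reads 4 or 5 if its multiplication came first, and 6 otherwise.
    b_decides = "B" if seen["B"] in (4, 5) else "A"
    return a_decides, b_decides, first


results = [run(s) for s in interleavings(["A.add", "A.read"], ["B.mul", "B.read"])]
```

In every interleaving both processes reach the same conclusion about which modification was applied first, and neither ever waits for the other.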
Concluding Remarks
We have shown that there exist nontrivial shared data objects with PRMW operations that can be implemented from atomic registers in a bounded, wait-free manner. In particular, we have presented an implementation that can be generalized to implement any shared data object that can either be read, written, or modified by an associative, commutative PRMW operation. Our implementation is polynomial in both space and time, and thus is an improvement over the unbounded implementations of Aspnes and Herlihy [7].
It is interesting to note that our results can be applied to extend the notion of a composite register by allowing an additional associative, commutative PRMW operation on each component. As an example, consider the problem of implementing an array of counters that can be written or incremented individually or read collectively in a single snapshot. An individual counter can be implemented by using a single composite register as described in Section 3. A set of counters that can be read atomically can be implemented in a straightforward fashion by combining the composite registers that implement the individual counters into a single composite register. This approach can be further generalized as discussed in Section 5.3 for the case of other associative, commutative PRMW operations.
The results of this paper provide yet another example of an unbounded wait-free implementation that can be made bounded. An interesting research question is whether it is possible to develop a general mechanism for converting any unbounded wait-free implementation into a bounded one, provided the data object under consideration is syntactically bounded. This question was noted previously by Afek et al. [2].
The correctness proof for our implementation is noteworthy because it is assertional, rather than operational. Most proofs of wait-free implementations that have been presented in the literature are based upon operational concepts such as histories and events. Such proofs require one to mentally execute the program at hand (e.g., "... if process i does this, then process j does that ..."), and thus are quite often error prone and difficult to understand. In our proof, auxiliary history variables are used to record the effect of each operation; these history variables serve as a basis for stating the required invariants.
The use of auxiliary history variables in correctness arguments is, of course, not new. Early references include the work of Clint [13] and also Owicki and Gries [25]. More recent references include Abadi and Lamport's work on refinement mappings [1] and Lam and Shankar's work on module specifications [20]. Our use of history variables was, in fact, motivated by the latter paper, where history variables are used to formally specify database serializability. We believe that history variables better facilitate the development of assertional proofs of wait-free implementations than do shrinking functions and their variants [8].
Appendix: UNITY Programming Notation
In this appendix, we describe the UNITY programming notation used in Figure 5. Our treatment will be necessarily brief and our descriptions operational. For a complete description of UNITY, the interested reader is referred to [11].
A UNITY program consists of four sections, namely declare, always, initially, and assign. The declare section gives variable declarations. The always section defines auxiliary variables that are functions of other variables. The initially section specifies initial conditions. The assign section gives the executable statements of the program. Operationally, a UNITY program is executed from state to state; at each state an assignment statement is selected nondeterministically and is executed. (A parallel assignment is considered to be a single assignment statement; in Figure 5, the assignment statements are labeled.) The execution of such a statement may or may not be effective. To ensure progress, statement selection is required to be fair. However, because linearizability is a safety property [16], we have no need to consider such issues.
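The operational reading of the assign section can be sketched in a few lines of Python. This is a toy interpreter, not UNITY itself and not the program of Figure 5. It assumes that an execution is "effective" exactly when the statement's guard holds and the state changes, and it uses round-robin selection as one particular fair schedule. The example program is a set of guarded swaps over an array, with `make_swap` and `execute` being hypothetical names.

```python
def make_swap(i):
    """Guarded parallel assignment: a[i], a[i+1] := a[i+1], a[i] if a[i] > a[i+1]."""
    def statement(a):
        if a[i] > a[i + 1]:
            a[i], a[i + 1] = a[i + 1], a[i]
            return True   # effective: the state changed
        return False      # not effective: the state is unchanged
    return statement


def execute(state, statements):
    """Select statements round-robin until a full pass has no effective execution."""
    effective_count = 0
    while True:
        changed = False
        for stmt in statements:
            if stmt(state):
                changed = True
                effective_count += 1
        if not changed:
            return effective_count


a = [4, 1, 3, 2]
steps = execute(a, [make_swap(i) for i in range(len(a) - 1)])
```

The loop stops once the state is a fixed point, that is, once no statement can be executed effectively; a UNITY program has no notion of termination beyond reaching such a state.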
