The construction of shared data objects is a fundamental issue in asynchronous concurrent systems, since these objects provide the means for communication and synchronization between processes. Constructions which guarantee that concurrent access to the shared object by processes is free from waiting are of particular interest, since they help to increase the amount of parallelism and to provide fault-tolerance. The problem of constructing a k-valued wait-free shared register out of binary subregisters of the same type, where each write access consists of one subwrite (constructions with one-write), is important, since it lies at the heart of studying lower bounds on the complexities of register constructions and the trade-offs between them. The first such construction was for the safe register case; it uses $k$ binary safe registers and exploits the properties of a rainbow coloring function of a hypercube graph. The best known construction for the regular (atomic) case uses $\binom{k}{2}$ binary regular (resp. atomic) registers, while if the one-write requirement is lifted, there exists a construction that uses $4(\log k + 1)$ binary registers. Here we show how rainbow coloring can be extended to simulate handshaking between the reader and the writer of the register, thus offering a wait-free solution for the atomic case with one reader, using only $3k - 2$ binary registers. The best known lower bound for such a construction is $k - 1$.
INTRODUCTION
Background
In all forms of communication in a concurrent system the problem of sharing data between multiple processes must be faced at some level. The traditional way to share data among processes which either read or write the data is to require either mutual exclusion or that a write have exclusive access to the data, thus making only concurrent reading possible [1]. The requirement that some actions happen in an exclusive manner implies waiting by some process for another. However, in an asynchronous system, where some processors may be inherently faster than others, the above approach would slow a fast process down to the speed of a slow one. A natural property to require from an implementation of a shared data object in an asynchronous concurrent system is to guarantee that any process can complete any access to the object in a finite number of steps, regardless of the execution speeds of the other processes. Such an implementation is called wait-free. Wait-free shared data objects not only help in taking advantage of the inherent parallelism in concurrent systems, but also guarantee resiliency to halting/stopping failures, since a process that crashes while accessing the object cannot block the progress of any other process intending to access the same object.
A shared variable that supports concurrent read and write operations in a wait-free manner is also called a wait-free shared register; from this point in the paper, we adopt the convention of calling it simply a register.
A register is characterized by the number of readers that may concurrently read it, the number of writers that may concurrently write it, as well as the number of values it can take on. Registers are also classified according to the consistency guarantees they provide in the presence of concurrent operations. Three kinds of consistency guarantees, namely safeness, regularity and atomicity, have been defined by Lamport in [2] and have become of fundamental importance in the study of shared registers. According to those definitions, a shared register, which can be concurrently accessed by one writer process and one or more reading processes, is called:
safe, if it guarantees only that a read which does not happen concurrently with any write always returns the most recent value written to the register. The safeness property ensures nothing about the value returned by a read which overlaps with writes; it may equal any possible value of the register;
regular, if, besides ensuring safeness, it guarantees that a read that happens concurrently with one or more writes returns a "reasonable" value, which might be either the old one or one of the values written by one of the overlapping writes;
atomic, if it guarantees that even when read and write operations overlap, there exists a way to "shrink" each one of them to an atomic grain of time which lies within its respective time duration, in such a way that the value returned by each read equals the value written by the most recent write according to the sequence of "shrunk" operations on the time axis.
These dimensions imply a hierarchy on registers. The idea is to start with simple communication primitives (such as single-writer single-reader safe registers), which can be provided directly in hardware, and successively construct more powerful multi-reader (even multi-writer) multi-valued objects [2, 3]. This procedure leads to modular system organization.
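The safe and regular guarantees can be illustrated with a small sketch that computes which values a single read is allowed to return, given the time intervals of the writes of the single writer. The names `Write` and `allowed_values` are hypothetical and introduced only for this illustration; atomicity is not covered, since it constrains several reads jointly rather than one read in isolation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    start: float
    end: float
    value: int


def allowed_values(read_start, read_end, writes, domain, kind):
    """Values one read may return under the 'safe' or 'regular' guarantee.

    Assumptions of this sketch: `writes` are the non-overlapping writes of the
    single writer, sorted by start time, and the first one is an initializing
    write that finishes before the read starts.
    """
    # Writes that finished before the read started.
    before = [w for w in writes if w.end < read_start]
    # Writes whose interval intersects the interval of the read.
    overlapping = [w for w in writes
                   if not (w.end < read_start or read_end < w.start)]
    old = before[-1].value
    if not overlapping:
        # No concurrency: both guarantees force the most recent value.
        return {old}
    if kind == "safe":
        # A safe register may return any value of its domain.
        return set(domain)
    if kind == "regular":
        # Old value or a value written by an overlapping write.
        return {old} | {w.value for w in overlapping}
    raise ValueError("kind must be 'safe' or 'regular'")
```

For example, with writes of the values 0, 3 and 1 in this order, a read that overlaps the last two writes may return 0, 3 or 1 on a regular register, but any value of the domain on a safe one.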
Contribution of this paper and related work
Despite the fact that there has been a great deal of research on implementations of stronger registers out of weaker ones [4, 5, 6, 7, 8, 9, 2, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], to the best of our knowledge, comparatively few results have appeared studying the necessary costs incurred by such implementations [21, 22, 23, 16].
Chaudhuri and Welch in [22] summarize the issues involved in the study of the intrinsic complexity of register constructions: since registers may differ in several dimensions, the inherent complexity of constructing a strong register out of weaker ones has multiple cases to be examined. They focus on two parameters of interest: the number of values and the consistency guarantees of the register. Thus, they propose the problem of studying the inherent cost of constructing multireader, single-writer, k-valued safe, regular and atomic registers out of multireader, single-writer, 2-valued (binary) safe, regular and atomic registers, respectively. Following their abbreviation, we refer to such constructions as k-valued from 2-valued safe/regular/atomic register constructions. The cost measures are the number of binary registers used and the number of subreads and subwrites performed during each read or write operation of the registers. In that paper the particular classes examined are those of k-valued from 2-valued safe and regular constructions. First they prove that, for the case in which the writer performs only one write suboperation, $k - 1$ subregisters are necessary. As a second step they give an algorithm which implements a safe register, using as an encoding function a special function that can vertex-color a $(k - 1)$-dimensional hypercube with $k$ colors, after proving that such a coloring exists if and only if $k$ is a power of 2. (Recall that an $n$-dimensional hypercube is a regular undirected graph with $2^n$ vertices labelled from 0 to $2^n - 1$, where two vertices are connected if the binary representations of their labels differ in exactly one bit; figure 1 shows a 3-dimensional hypercube.)
Kant and van Leeuwen in [24] have independently shown the same result for this special coloring of hypercubes and applied it to the file distribution problem.
The aforementioned special coloring of a $(k - 1)$-dimensional hypercube with $k$ colors is called rainbow coloring and is such that each node of the hypercube has exactly one neighbor with each one of the $k - 1$ colors other than its own. Here we show how a rainbow coloring, which has been used to construct the weakest type of registers, the safe one, can be extended to implement a powerful handshaking mechanism. The handshaking mechanism is particularly important in wait-free algorithms, as has been pointed out by Tromp in [16], by Kirousis, Spirakis and Tsigas in [25] and by Dwork et al. in [26, 27]. Roughly speaking, handshaking involves having the processes play a "hide-and-seek" game by accessing different physical locations (shared variables), so as to minimize concurrent access of the same memory locations. In section 3, the mechanism is described in more detail. Using our implementation, we get a k-valued from 2-valued atomic construction where the writer performs one write suboperation per operation and there are no overlapping reads. Our construction uses $3 \cdot 2^{\lceil \log k \rceil} - 2$ subregisters (when $k$ is a power of 2 this is $3k - 2$), while the best known lower bound for such a construction is $k - 1$. Before proceeding to the description of our construction, the following section presents more precisely the requirements of the problem.
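To make the rainbow property concrete, the following sketch colors the vertices of a $(k - 1)$-dimensional hypercube and checks that every vertex has exactly one neighbor of each other color. The specific coloring used here (the XOR of the indices of the bits that are set) is an assumption of this sketch, suggested by the way COLOR is combined with bin(i) in the correctness proof; the function names are hypothetical.

```python
def color(x: int, k: int) -> int:
    """Assumed coloring: XOR of the indices i in 1..k-1 of the bits set in x.

    Bit i of the vertex label is stored at position i - 1 of the integer x.
    """
    c = 0
    for i in range(1, k):
        if (x >> (i - 1)) & 1:
            c ^= i
    return c


def is_rainbow(k: int) -> bool:
    """Check that each vertex sees every color other than its own exactly once."""
    n = k - 1  # dimension of the hypercube
    for x in range(1 << n):
        # A neighbor of x differs from x in exactly one bit.
        neighbor_colors = [color(x ^ (1 << b), k) for b in range(n)]
        expected = set(range(k)) - {color(x, k)}
        # n = k - 1 neighbors covering k - 1 colors means one neighbor each.
        if set(neighbor_colors) != expected:
            return False
    return True


def bit_to_flip(x: int, target: int, k: int) -> int:
    """Index (1..k-1) of the single bit whose flip recolors x to `target`."""
    i = color(x, k) ^ target
    if i == 0:
        raise ValueError("x already has the target color")
    return i
```

Under this assumption a writer that wants to change the encoded value only needs `bit_to_flip`: flipping that one bit moves the configuration to the unique neighbor with the desired color, which is what makes a single subwrite per write possible.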
PRELIMINARIES
A construction of a register comprises a data structure consisting of memory cells called subregisters, a set of initial values for the subregisters, and a set of read and write procedures (a protocol) which provide the means for the processes to access the register.
When a process needs to perform either a read or a write operation on the register, it must invoke the respective procedure; we call this process either reader or writer, respectively. Each procedure execution, or operation for short, is a sequential execution of a procedure's statements (steps), which may be either read or write suboperations on the subregisters or some local computations of the procedure. To avoid confusion between operations on the constructed register and operations on the subregisters used in the construction, the term operations is used only for the former and suboperations is used for the latter.
A construction C is called wait-free if it guarantees that any operation will complete in a bounded number of steps. The wait-free condition rules out unbounded busy waiting, as well as conditional waiting.
In a global time model [2] each operation $q$ "occupies" a time interval $[s(q), f(q)]$ on one linear time axis ($s(q) < f(q)$); think of $s(q)$ and $f(q)$ as the starting and finishing time instants of $q$. During this time interval the operation is said to be pending.
A precedence relation among a set of operations (denoted by `$\to$') is a strict partial order: $q_1 \to q_2$ means that $q_1$ ends before $q_2$ starts. The precedence relation is extended to relate suboperations of operations that are related: if $q_1 \to q_2$, then for any suboperations $op_1$ and $op_2$ of $q_1$ and $q_2$, respectively, it holds that $op_1 \to op_2$.
A system execution over the register is a tuple $(A, \to)$, where $A$ is a set of operations on the register and $\to$ is the relation describing precedence among them, as above. Operations in $A$ which are incomparable under $\to$ are called overlapping. It is assumed that there exists a write operation, which initializes the register, that precedes all other operations on it.
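As a small illustration of these definitions, the precedence relation and the notion of overlapping operations can be sketched directly on time intervals. The type `Op` and the function names are hypothetical and used only for this example.

```python
from typing import NamedTuple


class Op(NamedTuple):
    name: str
    s: float  # starting time instant
    f: float  # finishing time instant


def precedes(q1: Op, q2: Op) -> bool:
    """q1 precedes q2 when q1 ends before q2 starts."""
    return q1.f < q2.s


def overlapping(q1: Op, q2: Op) -> bool:
    """Two operations overlap when neither precedes the other."""
    return not precedes(q1, q2) and not precedes(q2, q1)
```

Since every operation has a starting instant strictly before its finishing instant, `precedes` is irreflexive and transitive, i.e. a strict partial order as required.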
A sequential execution $(A, \Rightarrow)$ over the register is one whose operations are totally ordered according to the transitive irreflexive order $\Rightarrow$. A sequential execution $(A, \Rightarrow)$ satisfies the sequential specification of the register if each read operation in $A$ returns the value written by the most recent write operation according to $\Rightarrow$. A system execution $(A, \to)$ is atomic or linearizable [7, 6] if it is equivalent to a sequential execution over the register which satisfies the sequential specification of the register. A construction implements an atomic register if all its possible system executions are atomic.
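The sequential specification is simple enough to state as code. The sketch below checks a totally ordered list of operations; the tuple encoding `("write", v)` / `("read", v)` is an assumption made only for this illustration.

```python
def satisfies_sequential_spec(ops):
    """Check that each read returns the value of the most recent write.

    `ops` is a list of ("write", value) or ("read", value) pairs, already
    totally ordered. The first operation is assumed to be the initializing write.
    """
    if not ops or ops[0][0] != "write":
        return False
    current = None
    for kind, value in ops:
        if kind == "write":
            current = value
        elif kind == "read":
            if value != current:
                return False
        else:
            raise ValueError("unknown operation kind: %r" % (kind,))
    return True
```

Deciding atomicity of a system execution then amounts to asking whether its operations can be totally ordered, consistently with the precedence relation, so that this check succeeds.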
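For the proof of the atomicity of our construction we will use the following atomicity criterion for single-writer register constructions, which is due to Lamport [2]:
Lemma 2.1. A system execution $\sigma$ is atomic if there exists a function $\pi$ that assigns to each read operation $r$ of $\sigma$ a write operation $\pi(r)$ of $\sigma$, such that $r$ returns the value written by $\pi(r)$ and the following conditions hold:
No-Future: For any read operation $r$ of $\sigma$ it is not the case that $r \to \pi(r)$.
No-Past: For any read operation $r$ of $\sigma$ there is no write operation $w$ of $\sigma$ such that $\pi(r) \to w \to r$.
No-New-Old-Inversion: For any two read operations $r_1$ and $r_2$ of $\sigma$ it is not the case that ($r_1 \to r_2$ and $\pi(r_2) \to \pi(r_1)$).
Roughly speaking, the above conditions imply that each read returns a written but not overwritten value and that reads obtain correctly-ordered values. (Actually the first two conditions alone are sufficient for regularity.)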
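The three conditions of the criterion (No-Future, No-Past and No-New-Old-Inversion, as they are named in the correctness proof) can be checked mechanically on a finite execution. The sketch below assumes the usual formulation of the first two conditions, namely that a read does not precede the write it is mapped to and that no write falls strictly between that write and the read; `Op`, `precedes` and `violated_conditions` are hypothetical names.

```python
from typing import NamedTuple


class Op(NamedTuple):
    name: str
    s: float  # starting time instant
    f: float  # finishing time instant


def precedes(a: Op, b: Op) -> bool:
    return a.f < b.s


def violated_conditions(reads, writes, pi):
    """Return the set of criterion conditions violated by the reading function.

    `pi` is a dict mapping each read to the write whose value it returns.
    """
    violated = set()
    for r in reads:
        # No-Future (assumed form): a read must not precede its own write.
        if precedes(r, pi[r]):
            violated.add("No-Future")
        # No-Past (assumed form): no write lies between pi[r] and r.
        if any(precedes(pi[r], w) and precedes(w, r) for w in writes):
            violated.add("No-Past")
    for r1 in reads:
        for r2 in reads:
            # An earlier read must not return a newer value than a later read.
            if precedes(r1, r2) and precedes(pi[r2], pi[r1]):
                violated.add("No-New-Old-Inversion")
    return violated
```

An empty result means that the given reading function witnesses atomicity of the execution according to the criterion.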
Finally, the cost measures for the computation of space and time complexities of a construction C are: the number of subregisters used by C and the maximum number of suboperations on the subregisters performed during any read and any write operation in any system execution of C.
DESCRIPTION OF THE CONSTRUCTION
The protocol is given in figure 2. We adopt the convention that shared variables are denoted by upper-case letters and local variables are denoted by lower-case letters. As for notation, we use $\bar{x}$ to denote $1 - x$, where $x \in \{0, 1\}$. Also, bin(i) [...] handshaking. This mechanism implies that there are two "virtual places", also called modes, where the reader and the writer may be during each access to the register; the reader tries to be at the same place as the writer, while the latter tries to avoid it, by "moving" to the other virtual place when it sees that it has been "followed". By having disjoint sets of subregisters that can be accessed in each virtual place, the handshaking mechanism guarantees the existence of a piece of information that can be accessed by each communicating party without collision on the physical level. The controller of the game here is the writer; in each write operation it has to: (1) determine the reader's mode by reading the subregister RM, and (2) assign the new value to the register and change its virtual place (i.e. change mode) if the reader has "followed". From the particular rainbow property of the coloring function, it follows that the writer has the capability of changing the value of the register by modifying a single one of the construction's subregisters; moreover, in or- [...]
On the other hand, the reader, in each read operation, first assigns to its local variable wm the virtual place (mode) in which the writer is. It determines this from the values of the subregisters of H, using a parity function, as explained in the previous paragraph. If the writer has "moved" (changed mode) since the previous read, the reader reads the subregisters in $L_{wm}$, in order to find which was the last configuration of that set when the writer had to "move" to virtual place wm. (Note that, if the writer has not "moved" since the previous read, the information in $L_{wm}$ remains intact since it was last read.) After that, the reader updates RM in order to show the writer that it has followed it to its new virtual place (mode). Subsequently, in either case, it reads the subregisters in $L_{wm}$. At that point the reader has a complete view of a recent enough (i.e. the most recent view that is not subject to being updated concurrently with it, i.e. after it started reading) set of [...]
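The way the reader derives the writer's mode from the subregisters in H can be sketched as follows. This is only an illustration under the assumption that the parity function is the XOR of all bits in H; the names `writer_mode` and `flip_one` are hypothetical, and the sketch ignores concurrency altogether.

```python
from functools import reduce
from operator import xor


def writer_mode(h):
    """Assumed parity function: XOR of all the bits stored in H."""
    return reduce(xor, h, 0)


def flip_one(h, i):
    """Return a copy of H with the bit at position i complemented."""
    flipped = list(h)
    flipped[i] ^= 1
    return flipped
```

With a parity encoding, complementing any single subregister of H changes the mode, so a single subwrite suffices for the writer to "move" to the other virtual place.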
PROOF OF CORRECTNESS
First we prove that the encoding adopted using function COLOR is correct:
Lemma [...] such that $y' \neq y$ and $y'$ differs from $x$ only in one bit, either bit $k - 1 + i$ or bit $i$, respectively, and $\mathrm{COLOR}(y) = \mathrm{COLOR}(y')$. This is because $\mathrm{COLOR}(y) \oplus \mathrm{COLOR}(y') = 0$.
On the other hand, given again $x$ and $y$ as above, for any other $y'' \in \{0, 1\}^{2(k-1)}$ which differs from $x$ in only one bit, excluding bits $i$ and $k - 1 + i$ ($x$ and $y''$ differ either in bit $j$ or in bit $k - 1 + j$, $1 \le j \le k - 1$ and $j \neq i$), it holds that $\mathrm{COLOR}(y) \neq \mathrm{COLOR}(y'')$, because $\mathrm{COLOR}(y) \oplus \mathrm{COLOR}(y'') = \mathrm{bin}(i) \oplus \mathrm{bin}(j)$, which is not zero because $i \neq j$.
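The two properties used in this argument can be checked exhaustively for small $k$. The sketch below assumes that COLOR, on a vector of $2(k-1)$ bits, is the XOR of bin(i) over all set bits, where bit $i$ and bit $k - 1 + i$ both contribute bin(i); this definition is inferred from the proof and is not taken from the protocol figure. The function names are hypothetical.

```python
from itertools import product


def color_2n(bits, k):
    """Assumed COLOR: bit i and bit k-1+i (1-based) both contribute index i."""
    n = k - 1
    c = 0
    for pos, bit in enumerate(bits, start=1):
        if bit:
            c ^= (pos - 1) % n + 1
    return c


def flip(bits, pos):
    """Copy of `bits` with the bit at 1-based position `pos` complemented."""
    out = list(bits)
    out[pos - 1] ^= 1
    return tuple(out)


def check_encoding(k):
    """Exhaustively verify both properties for all vectors of 2(k-1) bits."""
    n = k - 1
    for x in product((0, 1), repeat=2 * n):
        for i in range(1, n + 1):
            y, y_alt = flip(x, i), flip(x, n + i)
            # Flipping bit i or bit k-1+i must lead to the same color.
            if color_2n(y, k) != color_2n(y_alt, k):
                return False
            # Flipping a bit with a different index j must give another color.
            for j in range(1, n + 1):
                if j != i and color_2n(flip(x, j), k) == color_2n(y, k):
                    return False
    return True
```

The check is exponential in $k$ and is meant only as a sanity test of the encoding for very small values such as $k = 4$.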
Next we focus on proving the atomicity of our construction. We introduce some auxiliary terminology, which helps the presentation: For a read operation $r$, let each one of its read suboperations be mapped to the most recent write operation which modified the respective subregister (according to the total order defined on the operations of the atomic subregister). We define $\rho(r)$ to be the write operation of this set such that every other operation of this set precedes it. (If the only operation in this set is the initial write operation $w_{init}$, which must naturally write (initialize) all the subregisters, then $\rho(r) = w_{init}$.) This is a well-defined function because the write operations are totally ordered, since there are no overlapping writes.
As already indicated earlier in the paper, we will use Lamport's atomicity criterion (lemma 2.1) to prove the atomicity of our construction. Consider any arbitrary system execution $\sigma$. We first prove a couple of auxiliary lemmas (lemmas 4.2 and 4.3). These are followed by three lemmas on important properties of the handshaking implementation (lemmas 4.4, 4.5 and 4.6), namely that (i) while a read $r$ scans the subregisters in H (resp. $L_{mode(r)}$), there can be at most one write that may concurrently attempt a modification in the same set and that (ii) if a read $r$ scans $L_{mode(r)}$, there is no write that may concurrently attempt to modify a register in that set (i.e. that scan is "collision-free"), while if it does not scan $L_{mode(r)}$, then it does not miss any information, since the values in $L_{mode(r)}$ could not have been modified since the last time that they were read. These properties are then used to prove (lemma 4.7) that $\rho$ can play the role of the reading function in Lamport's atomicity criterion. Then we show that the construction satisfies the three conditions in the criterion (lemma 4.9).
Lemma 4.2. For any read $r$ and for any write $w$, such that $put(r) \to get(w)$ and ($\neg\exists$ read $r'$: $put(r) \to put(r') \to get(w)$), it is $mode(r) = mode(w)$.
Proof. Since there are no overlapping reads, the lemma hypothesis implies that $r$ is the last read to modify RM before $w$ reads it. Thus $w$ will read from RM into its local variable rm the value that $r$ wrote; either this value will be complementary to the value of $w$'s local variable wm, or $w$ will complement wm. In both cases, due to the definitions of $mode(r)$ and $mode(w)$, the lemma follows.
Lemma 4.3. In any sequence of successive writes with the same mode, only the first one could write to a subregister of the set H.
Proof. From the protocol it follows that a write $w$ has a different mode compared to its directly preceding write iff it writes to a subregister of the set H.
Lemma 4.4. For any read $r$ there can be at most one write whose write suboperation occurs between the start of $r$ (i.e. $s(r)$) and $put(r)$ and modifies a subregister in the set H.
Proof. Since there are no overlapping reads, we can use induction on the number of reads that occur in any arbitrary system execution $\sigma$.
Let $r_i$ denote the $i$th read in the execution. For the induction basis, suppose, towards a contradiction, that there exist write operations $w_x \to w_{x+1} \to \ldots \to w_{x+q} \to put(r_1)$ ($q \ge 1$) whose write suboperations modify subregisters in H and occur between the first suboperation of $r_1$ and $put(r_1)$.
From the initialization conditions it follows that each write $w$ such that $get(w) \to put(r_1)$ sees that RM = 1 and its local variable wm = 0; therefore, according to the writer's protocol, $w$ will not write in H. This leads to the desired contradiction, which proves the induction basis.
For the induction hypothesis, assume that the lemma holds for all $r_i$, $1 \le i \le k$. We will show that it also holds for $r_{k+1}$.
$get(w_x) \to put(r_k)$: From the induction hypothesis we know that between $s(r_k)$ and $put(r_k)$ only one subwrite to a subregister in H occurs. We can modify execution $\sigma$ into one, $\sigma'$, where the subwrite of $w_x$ occurs between $s(r_k)$ and $put(r_k)$, but at a time such that $r_k$ "misses" it (i.e. $r_k$ reads the subregister modified by $w_x$ before the subwrite takes place).
Execution $\sigma'$ is equivalent to $\sigma$, as the only two operations involved in the modification behave in exactly the same way and do not affect the behaviour of any other operation; hence, every operation in $\sigma'$ behaves the same as its equivalent in $\sigma$. By the induction hypothesis we know that between $s(r_k)$ and $put(r_k)$ only one subwrite to a subregister in H occurs. In this case it is the subwrite by $w_x$, which is "missed" by $r_k$. Hence, if $mode(w_x) = m$ then $mode(r_k) = m$. [...]
[...] $r^-$ is the read directly preceding $r$ in the execution ($r^-$ exists since the first read in any execution scans $L_m$), there is no write which modifies a subregister in $L_m$ and whose write suboperation occurs between $put(r^-)$ and $put(r)$.
Proof. 1. Assume, towards a contradiction, that there exists such a write $w$. Since $r$ scans $L_m$, either $r$ is the first read in the execution or there exists a read $r^-$ directly preceding $r$ with $mode(r^-) = m$.
In the former case we directly come to a contradiction, since, due to the initializing conditions, any write $w$ such that $get(w) \to put(r)$ has $mode(w) = 0$ and writes to $L_0$, while $mode(r)$ is also 0.
In the latter case, it must be $get(w) \to put(r^-)$ (otherwise, by lemma 4.2 it would be $mode(w) = m$ and hence, $w$ would not write in $L_m$). Since $r$ scans $L_m$ after having scanned H and we assumed that $w$'s subwrite in $L_m$ occurs during the scan of $L_m$ by $r$, this implies that during the interval that $r$ scans the subregisters in H there is no write that modifies any of them and it holds that $\bigoplus_{i=1,\ldots,k-1} H_i = m$. But [...]
The following is the main lemma to use in order to apply Lamport's atomicity criterion. [...]
Note that it cannot be the case that $\rho(r)$ is overlapping $r$ and its subwrite is on a subregister in $L_m$.
If this was the case, then, due to the first part of lemma 4.6, the subwrite of $\rho(r)$ in $L_m$ would have happened before $r$ started to scan $L_m$. But the fact that $mode(r) = m$ would imply that there exists a write $w$ with $mode(w) = m$, modifying H, such that $\rho(r) \to w$, and $r$ "sees" the modification of $w$. This would contradict the definition of $\rho(r)$.
From the previous lemma it follows that:
Corollary 4.8. For any read $r$, $mode(r) = mode(\rho(r))$.
Lemma 4.9. The protocol satisfies lemma 2.1 (Lamport's Atomicity Criterion).
Proof. We use $\rho(r)$ as $\pi(r)$, which, by lemma 4.7, assigns to each read $r$ a write $w$, such that the value returned by $r$ (according to the read procedure invoked) is the value written by $w$. We next prove the three conditions that are sufficient to imply atomicity.
No-Future: From the definition of $\rho(r)$ it follows that the last suboperation of $\rho(r)$ occurs before the last suboperation of $r$.
No-Past: Suppose, towards a contradiction, that there exist a read $r$ and a write $w$ of $\sigma$ such that $\rho(r) \to w \to r$. Let $mode(r) = m$. If $w$ writes in H or $L_m$, we have directly a contradiction to the definition of $\rho(r)$, since a read with mode $m$ always scans H and $L_m$. Hence, the only possibility is that $w$ writes in $L_{\bar{m}}$. Note that this implies that $mode(w) = \bar{m}$. Since, due to corollary 4.8, $mode(\rho(r)) = m$ and, due to the protocol, a write has a different mode compared to its directly preceding write iff it writes to a subregister in H, we have that there must be a write $w'$ that writes to a subregister in H, and $\rho(r) \to w' \to w \to r$. Hence, either $mode(r) = \bar{m}$ (a contradiction to the assumption) or there exists another write $w''$ such that $\rho(r) \to w' \to w \to w''$ that modifies H and is "seen" by $r$ (a contradiction to the definition of $\rho(r)$).
No-New-Old-Inversion: Suppose, towards a contradiction, that there exist reads $r_1$, $r_2$ in $\sigma$ such that $r_1 \to r_2$ and $\rho(r_2) \to \rho(r_1)$. From the definition of $\rho(r_1)$ it follows that the last suboperation of $\rho(r_1)$ occurs before the last suboperation of $r_1$. This implies that $\rho(r_2) \to \rho(r_1) \to r_2$, since $r_1 \to r_2$. But this is a contradiction to the No-Past condition, which has already been shown to hold.
For the case that $k$ is not a power of 2, the protocol can use $3l - 2$ subregisters, where $l = 2^{\lceil \log k \rceil}$, i.e. $l$ is the smallest power of 2 larger than $k$. In this way the protocol will in fact implement an $l$-valued atomic register ($k < l$), which can also serve as a $k$-valued one.
Hence, we have the following:
Theorem 4.10. The presented construction correctly implements a wait-free k-valued atomic register using $3 \cdot 2^{\lceil \log k \rceil} - 2$ atomic binary subregisters. The maximum number of suboperations performed during any read $r$ is $3 \cdot 2^{\lceil \log k \rceil} - 3$, while each write $w$ performs one read and one write suboperation.
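For concreteness, the cost figures stated above can be evaluated for a given $k$ with a few lines of code. The function names and the dictionary keys are hypothetical; the formulas are the ones stated in the theorem, together with the $k - 1$ lower bound quoted earlier.

```python
def smallest_power_of_two_at_least(k: int) -> int:
    """Return 2 raised to ceil(log2(k)) for k >= 1, using integer arithmetic."""
    l = 1
    while l < k:
        l *= 2
    return l


def costs(k: int) -> dict:
    """Space and time costs of the construction for a k-valued register."""
    l = smallest_power_of_two_at_least(k)
    return {
        "subregisters": 3 * l - 2,
        "max_read_suboperations": 3 * l - 3,
        "write_suboperations": 2,  # one read and one write suboperation
        "best_known_lower_bound": k - 1,
    }
```

For instance, a 5-valued register is implemented as an 8-valued one and therefore costs the same 22 subregisters as an 8-valued register.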
