Introduction
The problem of process coordination has been extensively addressed in the context of sequential consistency. However, modern multiprocessors present a large variety of memory models that are anything but sequentially consistent. Relaxed memory consistency models increase the challenge of solving various coordination problems. The relaxation of ordering constraints on memory accesses in these models makes reasoning about concurrency a subtle and involved issue. Process coordination problems are still more difficult if the description of the way processors interact with memory is ambiguous, informal, or incomplete.
Algorithms that coordinate processes via critical sections have long been known for sequentially consistent systems [32, 23, 28, 31]. However, many weaker memory models cannot provide this coordination through only read and write operations. Therefore, multiprocessors come equipped with additional basic operations which make synchronization possible. The use of synchronization primitives, however, incurs substantial additional execution time, so we are motivated to keep the use of strong primitives to a minimum. Hence we need to determine what kinds of coordination problems can be solved on common memory models using just read and write operations.
The University of Calgary, Canada; email: kawash@cpsc.ucalgary.ca
In earlier work we defined a formal framework for specifying memory consistency models [18] . In this paper, we use that formalism to address various process coordination problems for some memory consistency models where constraints on orderings of memory accesses are weaker than those guaranteed by sequential consistency. Specifically, this paper determines the possibility or impossibility of solving the critical section problem and versions of the producer/consumer problem on SPARC multiprocessors (with memory models Total Store Order and Partial Store Order) as well as on common weak memory models from the literature (Coherence, Pipelined-RAM, Goodman's processor consistency) without using strong synchronization operations.
Lamport's seminal work beginning in 1978 [21, 22, 24] seems to have launched investigations into relaxations of sequential consistency. Since then there has been substantial research on a large selection of memory consistency models. Most closely related to our research are several papers presenting formal descriptions of specific memory models [22, 2, 1, 4, 5, 8, 14, 13, 16, 20, 27, 30] and papers that present various formalisms for describing memories and reasoning about them [3, 6, 7, 4, 11, 15, 25, 26, 17, 29, 33]. Our work has benefited from all of these papers. For an extensive bibliography on memory consistency models see the online listing at the University of Alberta (http://www.cs.ualberta.ca/rasit/dsmbiblio/node2.html).
Section 2 reviews a framework for defining memory consistency models, and then uses it to define six models: Sequential Consistency, Coherence, Pipelined RAM, Processor Consistency, and the SPARC Total Store Order and Partial Store Order models. Section 3 examines the ability of these models to support mutual exclusion. Section 4 examines the ability of these models to solve some producer/consumer problems. Our contributions are summarized in Section 5.
Modelling Multiprocessor Memories

Our goal is to precisely capture the impact of the memory model of a multiprocessor system on the possible outcomes of computations of that system. Here we briefly overview our framework for specifying this impact.
2.1 A framework for describing memory
A multiprocessor system is modelled as a collection of processes operating on a collection of shared data objects. Informally, the program of each process issues a stream of invocations to a collection of abstract objects and receives a collection of responses that, from the process's point of view, are interleaved with its stream of invocations. We do not specify how the objects are implemented, how the communication proceeds, or how the invocations are serviced. Instead we precisely define the constraints that the memory system imposes on the responses observed by each process. This is achieved by formalizing objects, processes, executions, and the constraints on these executions.
One way to define a data object is to describe the object's initial state, the operations that can be applied to the object, and the change of state that results from each applicable operation. As observed by Herlihy and Wing [17], it suffices to define a data object to be the set of all sequences of allowable operations together with their results, as follows. An action is a 4-tuple (op, obj, in, out) where "op" is an operation, "obj" is an object name, and "in" and "out" are sequences of parameters. The action (op, obj, in, out) means that the operation "op" with input parameters "in" is applied to the object "obj", yielding the output parameters "out".
A (data) object is specified by a set of sequences of actions. Given an action (op, obj, in, out), (op, obj, in) is the action-invocation and (op, obj, out) is the matching action-response. A process is a sequence of action-invocations. A (multiprocess) system, $(P, J)$, is a collection $P$ of processes and a collection $J$ of objects, such that for every process $p \in P$ the action-invocations of $p$ are applied to objects in $J$.
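To make the 4-tuple view of actions concrete, here is a minimal Python sketch. The names `Action`, `invocation`, `response`, and `is_valid_register_sequence` are hypothetical, and the read/write register is only an assumed example of an object; none of this is taken from the framework in [18].

```python
from typing import NamedTuple, Sequence, Tuple


class Action(NamedTuple):
    """An action (op, obj, in, out); 'inp' stands for 'in', a Python keyword."""
    op: str
    obj: str
    inp: Tuple = ()   # input parameters (empty tuple = empty sequence)
    out: Tuple = ()   # output parameters (empty tuple = empty sequence)


def invocation(action: Action) -> Tuple:
    """The action-invocation (op, obj, in)."""
    return (action.op, action.obj, action.inp)


def response(action: Action) -> Tuple:
    """The matching action-response (op, obj, out)."""
    return (action.op, action.obj, action.out)


def is_valid_register_sequence(actions: Sequence[Action], initial=0) -> bool:
    """Assumed example object: a register, where every read must return
    the value of the most recent write (or the initial value)."""
    value = initial
    for action in actions:
        if action.op == "write":
            value = action.inp[0]
        elif action.op == "read":
            if action.out != (value,):
                return False
        else:
            return False  # operation not defined for a register
    return True
```

In this sketch an object is described implicitly by a predicate that accepts exactly its allowable sequences, which is one way to represent "a set of sequences of actions" without enumerating it.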
A process execution is a (possibly infinite) sequence of events, such that each response event follows its matching invocation event. A (multiprocess) system execution for a system $(P, J)$ is a collection of process executions, one for each $p \in P$.
Note that an action-invocation applied to an object may not produce an explicit output. (For example, an atomic write of a register by a process may not be followed by an acknowledgment of that write, and thus the output component of the write action is empty.) We assume, however, that in a process execution each invocation event is followed by a matching response event, by augmenting the stream of response events with implicit acknowledgement events for each invocation that does not have an explicit response. We place these implicit responses immediately following the matching invocation, thus reflecting the semantics of the process: the invocation is assumed to be immediately executed.
Footnotes: (1) The complete framework appears in [18]. (2) $\lambda$ denotes the empty sequence.
A system execution gives rise to a set of actions, namely the set of actions that result from combining each action-invocation with its matching action-response.
Let $(P, J)$ be a multiprocess system, and $O$ be the set of actions that result from an execution of this system. $O|p$ denotes the set of actions $o \in O$ such that $invoc(o)$ is in the sequence of action-invocations that defines $p$ in $P$. $O|x$ is the set of actions that are applied to object $x$ in $J$.
We define a partial order $(O, \xrightarrow{prog})$ called program order on the actions of a system. Let $o_1$ and $o_2$ be actions in $O|p$ for some $p \in P$. Then $o_1$ program-precedes $o_2$, denoted $o_1 \xrightarrow{prog} o_2$, if and only if $invoc(o_2)$ follows $invoc(o_1)$ in the definition of $p$. Observe that for each process $p \in P$, the program order is a total order on $O|p$.
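The program order can be computed directly from the process definitions. The sketch below is only illustrative: the function name `program_order` and the encoding of an action as a `(process, index)` pair are assumptions made for the example.

```python
from itertools import combinations


def program_order(processes):
    """Return the program-order relation as a set of pairs.

    `processes` maps a process name to its sequence of action-invocations.
    An action is identified here by (process name, position), so two
    actions are related only if they belong to the same process and the
    first one is invoked earlier.
    """
    relation = set()
    for name, invocations in processes.items():
        for i, j in combinations(range(len(invocations)), 2):
            relation.add(((name, i), (name, j)))  # i < j by construction
    return relation
```

Actions of different processes are never related, so the result is a partial order on all actions and a total order when restricted to a single process.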
For the definition of some memory consistency models it is necessary to distinguish the actions that change (write) a shared object from those that only inspect (read) a shared object. Let $O_w$ be that subset of $O$ consisting of those actions in $O$ that update a shared object. There are also some consistency models that provide other synchronization operations. Their corresponding actions are defined when needed.
Given any set of actions $O$ on a set of objects $J$, a linearization of $O$ is a linear order $(O, <_L)$ such that for each object $x \in J$, the subsequence $(O|x, <_L)$ of $(O, <_L)$, which contains only the actions on $x$, is valid for $x$.
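A candidate linear order can be checked against this definition mechanically. The following sketch assumes, for illustration only, that every object is a read/write register and that an action is encoded as an `(op, obj, value)` triple; the function name `is_linearization` is hypothetical.

```python
def is_linearization(order, initial=0):
    """Check that a candidate linear order is valid for every object.

    `order` is a list of (op, obj, value) triples in the proposed order,
    where op is "write" (value written) or "read" (value returned).
    Tracking one current value per object is equivalent to checking each
    per-object subsequence separately, because a read only depends on
    earlier writes to the same object.
    """
    current = {}
    for op, obj, value in order:
        if op == "write":
            current[obj] = value
        elif op == "read":
            if current.get(obj, initial) != value:
                return False
        else:
            raise ValueError(f"unknown operation: {op}")
    return True
```

For example, placing a read of `x` that returns 1 before the only write of 1 to `x` is rejected, while the reverse order is accepted.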
In the following sections, we define various consistency conditions by stating restrictions on executions.
An execution E meets some consistency condition C if the execution meets all the conditions of C. A system provides memory consistency C if every execution that can arise from the system meets the consistency condition C.
Memory Model Definitions
In earlier work we defined several memory consistency models using the framework just described [18] and showed that these definitions do unambiguously capture the (sometimes less precise) informal descriptions of these models [19]. Here we repeat the definitions (without justification) of the six models investigated in this paper [22, 16, 27, 34].
Let $O$ be the set of actions that results from the execution $E$ of the multiprocess system $(P, J)$.
[Model definitions: only one fragment is legible here, the set expression $((O|p \cup O_w) \setminus (O_{invisible_p} \cup O_{memwrites_p}))$.]
Critical Section Problem
In the critical section problem (CSP), a set of processes coordinate to share a resource. We investigate minimum requirements for memory models to be able to provide a solution to CSP without the use of strong synchronization primitives.
We assume each process has the following structure: 
Proof:
Let $<_p$ be a linearization for $p \in P$ of $(O|p \cup O_w)$ that is guaranteed by P-RAM. Similarly, let $<_q$ be a linearization for $q \in P$ over $(O|q \cup O_w)$.
Since, for any object $x \in J$, there is only one processor, say $r$, that writes to $x$, and both $<_p$ and $<_q$ have all these writes to $x$ in the program order of $r$, the order of the writes to $x$ in $<_p$ is the same as the order of the writes to $x$ in $<_q$. Therefore, the definition of PCG is satisfied.
∎
We will use partial executions Ex1, Ex2, and Ex3 defined as follows.
where $\lambda$ denotes the empty sequence and $o_i^p$ denotes the $i$th operation of $p$. Let $x$ be the one shared object and let $o_j^p$ be the first write to $x$ by $p$, if any. Then the sequence $(o_1^p, \ldots, o_{j-1}^p)(o_1^q, \ldots, o_l^q)(o_j^p, \ldots, o_k^p)$ is a total order on $O$ which preserves $\xrightarrow{prog}$. By a similar argument to that in item 2, this sequence, when restricted to operations on $x$, is a linearization. However, there are no other shared objects, hence the whole sequence is a linearization. Thus Ex3 is Sequentially Consistent if there is only one shared object.
Observe that none of the arguments in Claim 3.2 depends on the Fairness property, so the systems listed there cannot solve CSP even without fairness. A second observation follows after recalling that Peterson's Algorithm [31], which uses multi-writer registers, is a correct solution for the CSP even for PCG [4]. However, Claim 3.2(1) establishes that CSP is impossible in PCG with single-writer objects.
Corollary 3.3 In a PCG system, multi-writer objects cannot be implemented from single-writer objects.
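The flavour of these impossibility results can be seen on a toy example. The sketch below exhaustively explores a naive two-process handshake ("set my flag, then enter only if the other flag is clear") under two assumed operational models: writes applied to memory immediately, and writes held in a per-process FIFO store buffer in the spirit of TSO. The store-buffer model, the protocol, and the name `both_can_enter` are assumptions made for illustration; they are not the paper's axiomatic definitions, and the protocol is not a full CSP solution since it guarantees no progress. The example shows only that a mutual exclusion argument that holds under sequential consistency can fail once writes are buffered.

```python
def both_can_enter(buffered):
    """Return True if some interleaving puts both processes in the
    critical section at once.

    State = (program counters, memory flags, per-process store buffers).
    pc: 0 = about to write own flag, 1 = about to read the other flag,
        2 = in the critical section, 3 = gave up.
    """
    start = ((0, 0), (0, 0), ((), ()))
    seen = {start}
    stack = [start]
    while stack:
        pcs, mem, bufs = stack.pop()
        if pcs == (2, 2):
            return True
        for i in (0, 1):
            successors = []
            if bufs[i]:
                # Flush the oldest buffered write of process i to memory.
                loc, val = bufs[i][0]
                new_mem = mem[:loc] + (val,) + mem[loc + 1:]
                new_bufs = bufs[:i] + (bufs[i][1:],) + bufs[i + 1:]
                successors.append((pcs, new_mem, new_bufs))
            if pcs[i] == 0:
                # Write flag[i] := 1, either buffered or directly to memory.
                new_pcs = pcs[:i] + (1,) + pcs[i + 1:]
                if buffered:
                    new_bufs = bufs[:i] + (bufs[i] + ((i, 1),),) + bufs[i + 1:]
                    successors.append((new_pcs, mem, new_bufs))
                else:
                    new_mem = mem[:i] + (1,) + mem[i + 1:]
                    successors.append((new_pcs, new_mem, bufs))
            elif pcs[i] == 1:
                # Read the other flag. It is never in i's own buffer,
                # so the read is always served from memory.
                next_pc = 2 if mem[1 - i] == 0 else 3
                new_pcs = pcs[:i] + (next_pc,) + pcs[i + 1:]
                successors.append((new_pcs, mem, bufs))
            for state in successors:
                if state not in seen:
                    seen.add(state)
                    stack.append(state)
    return False
```

With `buffered=False` no reachable state has both processes in the critical section; with `buffered=True` both writes can sit in the buffers while each process reads the other flag as 0.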
4 Producer/Consumer Problem
Producer/Consumer [10] objects are frequently used for process coordination. The producer is a process that produces items and places them in a shared structure. A consumer is a process which removes these items from the structure. We distinguish two structures whose solution requirements vary: the set structure, where the order of consumption is insignificant, and the queue structure, where items are consumed in the same order as that in which they were produced.
Producers and consumers are assumed to have the following form:
We denote the producer/consumer queue problem as PmCn-queue where m and n are respectively the number of producer and consumer processes. Similarly the producer/consumer set problem is denoted PmCn-set.
A solution to PmCn-set must satisfy the following:
• Progress: Any producer in <entry> will eventually be in <producing> and then subsequently in <exit>. Any consumer in <entry> will eventually be in <consuming> and then subsequently in <exit>, unless the set is empty.
A solution to PmCn-queue must satisfy Safety and Progress plus the following property:
• Ordered Consumption: consumers consume items in the same order as that in which the items were produced.
Claim 4.1 There is an algorithm that solves P1C1-queue in the following systems:
1. a system that is only Coherent.
2. a system that is only P-RAM.
Proof: We show that the algorithm in Figure 1 is correct.
A Coherent system:
• Safety: line (1) is the producer's only write to global memory. By coherence, this global write cannot be committed to main memory unless "queue[in] = ⊥", since these two operations are to the same object. Therefore, program order is preserved. By a similar argument, we can show that the consumer cannot consume an unproduced item.
• Progress: the writes in lines (1) and (3) will eventually reach the global memory, preventing lockouts.
• Ordered Consumption: By coherence, two instances of line (1) may reach the global memory out of order only if they are to two different locations (different queue cells). However, the consumer will not consume items out of order because it is only required to consume the cell pointed to by "out".
A P-RAM system:
• Safety: the write of line (1) cannot be committed to global memory unless "queue[in] = ⊥", because P-RAM requires $\xrightarrow{prog}$ to be maintained per process. By a similar argument, the consumer cannot consume an unproduced item.
• Progress: as in coherence.
• Ordered Consumption: The order of consumption follows that of production because P-RAM requires the consumer to see all the writes in the system in an order which adheres to $\xrightarrow{prog}$.
∎
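The proof refers to an algorithm in which the producer's only shared write fills the cell `queue[in]` once it holds the empty value, and the consumer reads the cell `queue[out]` and then resets it. The sketch below is a guess at that general shape, reconstructed from the proof text alone; it is not the authors' Figure 1. The names `producer`, `consumer`, `run`, and `EMPTY` are hypothetical, and the scheduler interleaves steps under sequential consistency, so it exercises only the queue logic and does not model a weak memory system.

```python
import random

EMPTY = None  # stands for the empty-cell value; items must not be None


def producer(queue, items):
    """Each `yield` hands control back to the scheduler."""
    in_ = 0  # private index, never written to shared memory
    for item in items:
        while queue[in_] is not EMPTY:  # wait until the cell is free
            yield
        queue[in_] = item               # the producer's only shared write
        in_ = (in_ + 1) % len(queue)
        yield


def consumer(queue, count, consumed):
    out = 0  # private index, never written to shared memory
    for _ in range(count):
        while queue[out] is EMPTY:      # wait until the cell is filled
            yield
        consumed.append(queue[out])
        queue[out] = EMPTY              # the consumer's only shared write
        out = (out + 1) % len(queue)
        yield


def run(items, cells=4, seed=0):
    """Interleave one producer and one consumer in a seeded random order."""
    rng = random.Random(seed)
    queue = [EMPTY] * cells
    consumed = []
    procs = [producer(queue, items), consumer(queue, len(items), consumed)]
    while procs:
        proc = rng.choice(procs)
        try:
            next(proc)
        except StopIteration:
            procs.remove(proc)
    return consumed
```

Because each cell is written by the producer only when empty and by the consumer only when full, the two processes never need a shared counter or a lock, which is the property the proof relies on.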
Although CSP cannot be solved in several weak memory systems (Claim 3.2), P1C1-queue can be.
TSO and PSO systems are coherent [19]. Therefore the corollary, that P1C1-queue can also be solved in TSO and PSO systems, follows from Claim 4.1.
We have now seen weak memory systems for which CSP cannot be solved, but P1C1-queue (and hence P1C1-set) can be. We now investigate more general producer/consumer problems for weak systems.
a system that is only P-RAM and
PmC1-queue: The proof for the impossibility of this problem is very similar to that for P1Cn-queue.
∎
Note that any system that can solve mutual exclusion can solve PmCn-queue as well. This can be achieved by simply protecting the queue structure in a critical section so that it is accessed by only one process at a time. Since a PCG system can solve CSP, Claim 4.4(2) is as strong as possible.
This corollary contrasts strongly with the producer/consumer set problem.
Proof:
(Sketch) Since we can solve P1C1-queue in these systems, P1Cn-set can be solved by associating a separate queue with each consumer. The producer inserts its items into these queues using any discipline, say round-robin. The correctness argument is straightforward. Similarly, PmC1-set can be solved by associating queues to producers. To solve the general PmCn-set we combine these two solutions. If $m \ge n$, let each consumer $c_i$ consume from its own group of producers, with every producer assigned to exactly one consumer, and similarly for $n \ge m$.
∎
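The construction in this sketch can be illustrated as follows. Here `collections.deque` merely stands in for a single-producer/single-consumer queue, and the names `distribute_round_robin`, `drain`, and `assign_producers` are hypothetical. The modulo rule in `assign_producers` is one possible way to give every producer exactly one consumer when there are at least as many producers as consumers; it is an assumption, not necessarily the assignment used in the paper.

```python
from collections import deque
from itertools import cycle


def distribute_round_robin(items, n_consumers):
    """One producer, n consumers: one queue per consumer, filled in turn."""
    queues = [deque() for _ in range(n_consumers)]
    for item, queue in zip(items, cycle(queues)):
        queue.append(item)
    return queues


def drain(queue):
    """A consumer removes everything currently in its own queue."""
    consumed = []
    while queue:
        consumed.append(queue.popleft())
    return consumed


def assign_producers(m, n):
    """Assumed assignment for m >= n: producer p feeds consumer p mod n."""
    return {p: p % n for p in range(m)}
```

Every item ends up in exactly one queue, so each item is consumed exactly once, but items are not consumed in production order across consumers, which is why this solves the set problem and not the queue problem.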
Corollary 4.7 There is an algorithm that solves PmCn-set for each of the systems listed in Claim 4.4.
Proof:
Each such system is either Coherent or P-RAM. So, Claim 4.6 applies.
∎
Conclusion
Sequential consistency is an easy and intuitive memory consistency model to adopt. In fact, sequential consistency has usually been assumed for work in analysis of distributed algorithms. However, current and proposed multiprocessor machines implement weaker models than sequential consistency in order to enhance performance. In this paper, we have revisited the critical section problem and the producer/consumer problem in the context of weak memory systems, that is, systems that are not sequentially consistent. Table 1 summarizes our impossibility and possibility results. We have shown that several weak memory systems cannot support a solution to the critical section problem without additional synchronization or powerful instructions. These systems include coherence, P-RAM, TSO, and PSO systems. In spite of this, we have shown that certain versions of the producer/consumer problem can be solved without any additional synchronization. In particular, even the weakest memory models (coherence and P-RAM) can support a solution to the producer/consumer queue problem for two processes: a single producer and a single consumer. Moreover, they can support a solution for any number of processes if we remove the requirement of processing the queue in order.
Weak system architectures provide additional powerful synchronization instructions. These instructions are expensive; it is useful to identify ways to avoid using these primitives and incurring the corresponding performance degradation.
