The Cache Coherent (CC) and the Distributed Shared Memory (DSM) models are standard shared memory models, and the Remote Memory Reference (RMR) complexity is considered to accurately predict the actual performance of mutual exclusion algorithms in shared memory systems. In this paper we prove a tight lower bound for the RMR complexity of deadlock-free randomized mutual exclusion algorithms in both the CC and the DSM model with atomic registers and compare&swap objects and an adaptive adversary. Our lower bound establishes that an adaptive adversary can schedule n processes in such a way that each enters the critical section once, and the total number of RMRs is Ω(n log n/ log log n) in expectation. This matches an upper bound of Hendler and Woelfel [16].
INTRODUCTION
The mutual exclusion problem, introduced by Dijkstra in 1965 [10], is a fundamental and well-studied problem in asynchronous computing. Processes coordinate their access to a shared resource by serializing the execution of a piece of code, called the critical section.
In this paper we consider the mutual exclusion problem in asynchronous shared memory models that provide atomic registers and compare&swap (CAS) objects. In such models, the number of steps a process executes can be unbounded, since processes may have to wait for other processes to leave the critical section. Therefore, the classical measure of efficiency, step complexity, is meaningless.
In shared memory systems, some of the memory is local to each process, while the rest of the memory is located in other processing units or in dedicated storage. For example, in cache-coherent (CC) systems, each processor keeps local copies of (remote) shared variables in its cache; the consistency of copies in different caches is maintained by a coherence protocol. In distributed shared-memory (DSM) systems, on the other hand, each shared variable is permanently locally accessible to a single processor and remote to all other processors.
References to remote memory (RMRs for short) are orders of magnitude slower than accesses to local memory. Hence, the performance of many algorithms for shared memory multiprocessor systems depends critically on the number of RMRs they incur [4, 22], and in particular the efficiency of mutual exclusion algorithms is usually measured in terms of the number of RMRs incurred by processes entering and exiting the critical section. Local-spin algorithms, which perform busy-waiting by repeatedly reading locally accessible shared variables, achieve bounded RMR complexity and have practical performance benefits [4]. In fact, recent research on mutual exclusion has almost entirely focused on the RMR complexity of the problem (see, e.g., [3, 2, 21, 6, 9, 18, 19, 20, 7, 15, 16]).
Using strong primitives, such as fetch&increment objects, it is possible to implement mutual exclusion so that every process incurs only a constant number of RMRs per passage through the critical section. A prominent example is the MCS lock [23] , which uses an object that allows both compare&swap and swap operations. Other examples can be found in standard textbooks, such as [17] . If the system provides only atomic registers, then the RMR complexity of the mutual exclusion problem is higher. Since objects such as compare&swap can be simulated in O(1) RMRs from atomic registers [13] , they don't affect the RMR complexity of the mutual exclusion problem.
Yang and Anderson [24] presented the first deterministic mutual exclusion algorithm for n processes (using atomic registers) in which every process that enters the critical section incurs at most O(log n) RMRs. Anderson and Kim [1] then conjectured that this is best possible. Following several lower bound proofs [8, 20, 11] , Attiya, Hendler, and Woelfel [6] finally proved this conjecture true.
More recently, randomized techniques have been employed to improve the efficiency of mutual exclusion algorithms. Hendler and Woelfel [16] presented a randomized algorithm, where each process incurs an expected number of O(log n/ log log n) RMRs per passage through the critical section. The algorithm works for the strong adaptive adversary model, where scheduling decisions can depend on all past events, including local coin flips.
Recently, Bender and Gilbert [7] presented a very different approach to solving mutual exclusion. Their algorithm employs approximate counting techniques to guarantee with high probability an amortized RMR complexity of O(log² log n) per passage through the critical section on the CC model. (However, processes can deadlock with a small probability.) This upper bound was shown for a weak, oblivious adversary model, in which the schedule is independent of the random decisions made by processes.
Our Results
In reality, the speed of operations can depend on the random decisions of processes. E.g., the location of register accesses may be decided at random, but due to the memory hierarchy and architecture, the speed of such accesses is not uniform. It would therefore be desirable to achieve a similar RMR complexity as in the algorithm of Bender and Gilbert [7] , for stronger adversaries. The strongest "reasonable" adversary is the adaptive adversary, and thus, an algorithm with low RMR complexity for this adversary would guarantee efficiency independent of the system behavior. However, in the face of the fact that the best known algorithm for the adaptive adversary [16] has an O(log n/ log log n) expected RMR complexity per passage through the critical section, Bender and Gilbert [7] noted that their "choice of a weaker adversary seems fundamental." We show that this is indeed the case, by proving that any deadlock-free mutual exclusion algorithm for the n-process CC or DSM model has an expected RMR complexity of Ω(log n/ log log n) per passage through the critical section, against an adaptive adversary. This lower bound holds even for one-time mutual exclusion algorithms. Specifically, there is an adaptive adversary that schedules n processes in such a way that every process enters the critical section once, and the expectation of the total number of RMRs is Ω(n log n/ log log n). This is the first non-trivial lower bound for the RMR complexity of randomized mutual exclusion, against an adaptive adversary; for weaker adversary models no lower bounds are known.
Techniques
We define a randomized adaptive adversary, i.e., the adversary makes random scheduling decisions but the distribution over these decisions is independent of processes' future coin flips. We prove our lower bound on the expected RMR complexity of any deterministic mutual exclusion algorithm scheduled by the above randomized adversary. Our result then follows from Yao's Principle [25] .
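The direction of Yao's Principle that is used here can be illustrated on a small finite cost matrix. The sketch below is illustrative only: the matrix `cost`, the two distributions, and the function names are hypothetical and do not come from the paper. It compares the worst-case expected cost of a randomized algorithm (a distribution over deterministic algorithms) with the best expected cost that any single deterministic algorithm achieves against a fixed distribution over adversaries; the former is never smaller than the latter.

```python
from fractions import Fraction as F

def worst_case_expected_cost(cost, alg_dist):
    """Max over adversaries of the expected cost of the randomized algorithm."""
    n_adv = len(cost[0])
    return max(
        sum(p * cost[a][j] for a, p in enumerate(alg_dist))
        for j in range(n_adv)
    )

def best_deterministic_cost(cost, adv_dist):
    """Min over deterministic algorithms of the expected cost against a random adversary."""
    return min(sum(q * row[j] for j, q in enumerate(adv_dist)) for row in cost)

# Hypothetical data: cost[a][j] = RMRs of deterministic algorithm a under adversary j.
cost = [[1, 4, 2],
        [3, 1, 5],
        [2, 2, 2]]
alg_dist = [F(1, 2), F(1, 4), F(1, 4)]   # a randomized algorithm
adv_dist = [F(1, 3), F(1, 3), F(1, 3)]   # a randomized adversary

lhs = worst_case_expected_cost(cost, alg_dist)
rhs = best_deterministic_cost(cost, adv_dist)
print(lhs, ">=", rhs)
```

A lower bound on `rhs` for one well-chosen adversary distribution therefore carries over to every randomized algorithm, which is how the randomized adversary is used in this paper.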
Our randomized adversary schedules processes in rounds, but in every round only a small, randomly chosen fraction of the processes takes steps. This way, it is difficult for processes to "find" other processes. With every process we associate a potential such that the difference between the total potential initially and after all processes have finished is Ω(n log n/ log log n), and we argue that the expected decrease in potential per round is proportional to the expected number of RMRs executed in that round.
Our lower bound proof is very different from previous lower bounds for deterministic mutual exclusion algorithms [8, 20, 11, 6]. Potential functions have been used before to show lower bounds for deterministic shared memory algorithms for various problems (see, e.g., [12]). However, we are not aware of any lower bound proofs for randomized shared memory algorithms that use potential function techniques.
MODEL
Our model is an asynchronous shared memory system where a set P of n randomized processes with unique IDs communicate by executing operations on a (possibly unbounded) set R of shared atomic registers. Each process is a probabilistic automaton (with a possibly unbounded number of states) that performs a sequence of steps. A step is an atomic read or write operation on a single shared register, followed, optionally, by a coin flip that returns a uniformly random value over some fixed, bounded domain Ω.
Note that there are implementations of linearizable compare&swap primitives from registers [13] . It follows from [14] , that these implementations can be used in a randomized adaptive adversary model in place of atomic compare&swap objects without increasing the expected RMR complexity. Therefore, our lower bound holds even when the system provides atomic compare&swap objects.
Our result holds for the cache-coherent (CC) model with a write-through cache 1 as well as for the distributed shared memory (DSM) model. In fact, we consider a hybrid of both models (similar to a NUMA with caches), where each process has its own memory segment of shared registers, which are local to the process. In addition, whenever a process reads a register r that is not in its own segment, it keeps a copy of r in its local cache memory. A coherence protocol ensures that if a write to register r occurs, all cached copies of r get invalidated. In the proof of the lower bound, we assume that even by writing to a register a process obtains a cache-copy of that register. More precisely, we say that process p has a valid cached copy of r, if p has accessed r and no process has written to r since p's last access of r. A read operation by p on register r incurs an RMR, if
• r is not in p's memory segment, and
• at the time of the operation p has no valid cached copy of r.
A write operation on r incurs an RMR, whenever r is not in p's memory segment. If p has a valid cached copy of r when some process q ≠ p writes to r, then we say that q's write invalidates p's cached copy.
1 We believe that the arguments used in our proof can be extended to hold for a write-back cache as well.
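The RMR accounting rules of the hybrid model can be sketched as a small simulator. This is a minimal illustration, not part of the paper: the class name `HybridMemory`, the `owner` map from registers to the process owning their segment, and the method names are all assumptions made for the example. It follows the rules stated above, including the proof's convention that a writer obtains a cached copy of the register it writes.

```python
class HybridMemory:
    """Counts RMRs under the hybrid CC/DSM rules (illustrative sketch)."""

    def __init__(self, owner):
        self.owner = owner   # register -> process whose segment holds it (hypothetical layout)
        self.valid = {}      # register -> processes holding a valid cached copy
        self.rmrs = {}       # process -> number of RMRs incurred so far

    def _charge(self, p):
        self.rmrs[p] = self.rmrs.get(p, 0) + 1

    def read(self, p, r):
        holders = self.valid.setdefault(r, set())
        # A read is remote iff r is outside p's segment and p has no valid copy.
        if self.owner.get(r) != p and p not in holders:
            self._charge(p)
        holders.add(p)

    def write(self, p, r):
        # A write is remote whenever r is outside p's segment.
        if self.owner.get(r) != p:
            self._charge(p)
        # The write invalidates all other copies; the writer keeps a valid copy.
        self.valid[r] = {p}

mem = HybridMemory({"a": 1})   # register "a" lives in process 1's segment
mem.read(2, "a")    # remote read by process 2: 1 RMR
mem.read(2, "a")    # served from the cache: no RMR
mem.write(1, "a")   # local write by the owner; invalidates process 2's copy
mem.read(2, "a")    # copy was invalidated: 1 RMR
mem.write(2, "a")   # remote write: 1 RMR
mem.read(2, "a")    # own write left a valid copy: no RMR
print(mem.rmrs)
```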
DEFINITIONS AND NOTATION

Executions
An execution is a (possibly infinite) sequence E = (op1, op2, . . . ) of operations, where opi identifies the process p ∈ P that invokes the operation, the type of the operation (write or read), the register r ∈ R on which it is applied, the value that this operation writes or reads, and finally, if p executes a coin flip after this write/read operation, the outcome of that coin flip. The length of the sequence E is denoted |E|.
A schedule is a sequence σ = (p1, p2, . . . ) of process IDs. For a deterministic algorithm (i.e., one involving no coin flips), a schedule σ yields a unique execution E(σ), in which processes take steps in the order determined by σ, starting from the (unique) initial configuration. (If during execution E(σ) a process reaches a final state, i.e., it cannot take any further steps, then we assume it simply passes whenever it is scheduled again.) For a randomized algorithm, in order to obtain a unique execution we also need n (sufficiently large) coin-flip vectors, a vector cp = (cp,1, cp,2, . . . ), cp,i ∈ Ω, for each p ∈ P. A schedule σ together with such a collection of coin-flip vectors {cp}p∈P yields a unique execution E(σ, {cp}), in which the i-th operation is executed by process pi, and if that operation involves the j-th coin flip by process p then the coin flip returns value cp,j.
A configuration is a description of the states of all processes and the values of all registers. The configuration after the last operation in an execution E is denoted C(E). The concatenation of two executions E1 and E2 is denoted E1 • E2.
Adaptive Adversary
We assume that schedules are generated by an adaptive adversary (see, e.g., [5, 14]), and thus can depend on the random values generated by the processes. In the adaptive adversary model, after each step the adversary decides which process takes the next step, and in order to make this decision it can take all preceding events into account, including the results of past coin flips, but not the results of any of the future coin flips. For deterministic algorithms, adaptive adversaries are clearly no more powerful than non-adaptive (oblivious) adversaries, which generate the schedule before the execution starts.
Randomized One-Time Mutual Exclusion
We prove a lower bound for every randomized one-time mutual exclusion algorithm. In such an algorithm, every process executes three pieces of code, an entry section, a critical section, and an exit section (in this order). The algorithm satisfies mutual exclusion if at any time at most one process is in its critical section.
We assume (w.l.o.g.) that in its critical section a process first reads a shared register rcrit and then it writes its ID to this register. No process accesses rcrit outside its critical section. Processes that have finished their exit section take no further steps. (Whenever they are scheduled again they simply pass.) We say that a process is active, if it has not finished its exit section; otherwise it is inactive. For an execution E, we let I(E) denote the set of processes that are inactive in configuration C(E).
Our lower bound holds for any randomized one-time mutual exclusion algorithm G that satisfies the following progress-condition, which we call expected deadlock freedom. Let C be an arbitrary reachable configuration, and let Q ⊆ P be the set of processes that have taken at least one step and are still active in C. There is an integer T = T (G, n) > 0 such that if we start from configuration C and all processes in Q take steps in a round-robin fashion (while the remaining processes take no steps), then the expected total number of steps until some process from Q finishes is at most T . We say then that the algorithm satisfies the deadlock freedom property for step-bound T .
AN RMR LOWER BOUND FOR A RANDOMIZED ADVERSARY
We describe a randomized adaptive adversary D for an arbitrary deterministic mutual exclusion algorithm. Then we prove a lower bound on the RMR complexity of any such deterministic algorithm L scheduled by the randomized adversary D, provided that L finishes in a bounded expected number of steps (Lemma 4.2). Further, we prove that any randomized mutual exclusion algorithm G that satisfies expected deadlock freedom finishes in a bounded expected number of steps when scheduled by a deterministic adversary obtained by fixing the random choices of D (Lemma 4.3).
The Randomized Adversary
W.l.o.g. we assume that whenever a process p writes some value x to a register it also writes its ID to the register, i.e., it writes the pair (x, p). We say that p is visible on register r, if the value of r is (x, p) for some arbitrary value x. Also, we assume (w.l.o.g.) that $n = k^k$ for some integer k, and we define $\varepsilon = k^{-4}$.
We now define the randomized adaptive adversary D. The adversary schedules processes in batches as follows. We describe below a randomized adversary D(P′), for any P′ ⊆ P, which schedules only a small random subset P″ of the processes in P′, until all processes in P″ finish. The expected size of P″ is ε · |P′|. Adversary D is now composed of 1/ε adversaries $D(P'_i)$, 1 ≤ i ≤ 1/ε, where
$P'_1 = P$ and $P'_{i+1} = P'_i - P''_i$ for $1 \le i < 1/\varepsilon$,
and $P''_i$ is the set of processes scheduled by $D(P'_i)$. First, processes are scheduled by $D(P'_1)$ until all processes in $P''_1$ finish, then the remaining processes are scheduled by $D(P'_2)$ until the processes in $P''_2$ finish, and so on. Finally, any processes remaining are scheduled deterministically in a round-robin fashion.
In the rest of this section we describe the randomized adversary D(P′), for an arbitrary set P′ ⊆ P. The adversary schedules processes in n² rounds; in each round zero or more processes take steps. Every round consists of three phases, an RMR Phase, a Roll-Forward Phase, and a Local Step Phase. We use a stack to keep track of the sets of processes to schedule in rounds. A stack element is a set of process IDs.
At the beginning (in round 0), a subset P″ of the processes in P′ is selected at random: each process from P′ is added to P″ with probability ε (independently). Next we schedule each process in set P″ to take steps until it is poised to execute an RMR, and we push the set P″ onto the stack. Throughout the execution, only processes in P″ will take steps.
We now describe round i, for 1 ≤ i ≤ n². We maintain the invariant that at the beginning of each round, every process in P″ that is still active (i.e., it has not finished its exit section) is poised to perform an RMR. Clearly this invariant is true at the beginning of the first round.
RMR Phase
Round i starts with an RMR phase. Suppose that the stack is not empty at the beginning of the round. (If the stack is empty, then all processes in P″ must have finished their exit section, as will become clear, and thus we can stop scheduling processes.) We pop the topmost set of processes from the stack, and let Gi be the subset of the processes from that set that are still active. If Gi = ∅, the round ends immediately, and we proceed to the next round, i + 1, if i < n². Suppose that Gi ≠ ∅. For each register r ∈ R, we flip a biased random coin with one green side and the other red. All coin flips are independent, and the coin shows the green side with probability ε and the red side with probability 1 − ε. If the coin flipped for some register r is green, then we say that register r is green in round i; otherwise we say it is red. Based on the outcome of the coin flips, we partition Gi into processes that participate in the RMR Phase of the current round, and processes that are halted in this round. A process p ∈ Gi participates in the RMR Phase, if it is poised to execute its next operation on a green register; otherwise (i.e., if poised to execute its operation on a red register), process p is halted in the current round. (Note that the adaptive adversary always knows on which register what operation a process is poised to execute.) Halted processes don't take any steps in the current round, unless they get rolled forward (see the description of the Roll-Forward Phase). All processes that participate in the RMR Phase execute exactly one step in that phase; first the processes that are poised to read (in the order of increasing process IDs), and then the processes that are poised to write (again in the order of increasing process IDs). For each register r, the last process that writes to r in this phase is called the top-writer of register r in round i.
If process q accesses (reads or writes) register r in the RMR Phase of the round i and process p ≠ q is the top-writer of r in round i, then we say that p covers q in round i.
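The mechanics of one RMR Phase can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function `rmr_phase`, the `poised` dictionary (process ID to the operation and register of its next step), and the use of Python's `random.Random` as the adversary's coin are hypothetical choices for the example, and only the registers that some process is poised to access are coloured.

```python
import random

def rmr_phase(poised, eps, rng):
    """poised: process_id -> (op, register), with op in {"read", "write"}.

    Returns (execution order of participants, halted processes, top-writers).
    """
    registers = sorted({reg for _, reg in poised.values()})
    # Each accessed register is coloured green independently with probability eps.
    green = {reg for reg in registers if rng.random() < eps}
    participants = [p for p in poised if poised[p][1] in green]
    halted = set(poised) - set(participants)
    # All reads go first, then all writes, each group by increasing process ID.
    readers = sorted(p for p in participants if poised[p][0] == "read")
    writers = sorted(p for p in participants if poised[p][0] == "write")
    top_writer = {}
    for p in writers:
        top_writer[poised[p][1]] = p   # the last writer to a register wins
    return readers + writers, halted, top_writer

poised = {3: ("write", "r"), 1: ("read", "r"), 2: ("write", "r"), 4: ("write", "s")}
order, halted, top = rmr_phase(poised, eps=1.0, rng=random.Random(0))
print(order, halted, top)
```

With `eps=1.0` every register is green, so all four processes participate; process 3 is the top-writer of `r` and therefore covers processes 1 and 2 in this round.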
Roll-Forward Phase
After every process that participates in the RMR Phase has executed its (single) step in that phase, the Roll-Forward Phase starts. During this phase we keep track of a set of processes that are being "rolled forward". For every execution E we define a roll-forward set F(E), which is a superset of the set I(E) of inactive processes. If E′ is the longest prefix of E s.t. a process p ∈ F(E) − I(E) is not in F(E′) − I(E′), then we say that p gets rolled forward in the first step in E that follows E′. (Note that a process p can be added to the roll-forward set because it becomes inactive, but in this case we don't say that p gets rolled forward.) Suppose our schedule has generated an execution E that ends during a Roll-Forward Phase. We now choose the process from F(E) − I(E) whose last step during the current Roll-Forward Phase is longest ago (giving preference to processes that have not taken any steps during the current Roll-Forward Phase). If multiple such processes exist, we choose among those the one with the smallest ID. Then we let that process take one step. This yields a longer execution E′ and a new roll-forward set F(E′). We let E = E′ and if F(E) ≠ I(E), we repeat the above to determine a new process from the new set F(E) − I(E), and we schedule that process next. This is repeated until we have obtained an execution E such that F(E) = I(E), i.e., all processes in the roll-forward set are inactive. When this happens, the Roll-Forward Phase ends.
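The selection rule used inside the Roll-Forward Phase amounts to a least-recently-scheduled choice with ties broken by process ID. A minimal sketch, assuming a hypothetical helper `next_rolled_forward` that receives the active processes of the roll-forward set and a map from each process to the index of its last step in the current phase:

```python
def next_rolled_forward(candidates, last_step):
    """Pick the next process to schedule in a Roll-Forward Phase.

    candidates: active processes in the roll-forward set, i.e. F(E) - I(E).
    last_step:  process -> index of its last step in the current phase;
                processes without an entry have not stepped in this phase.
    """
    if not candidates:
        return None   # F(E) = I(E): the Roll-Forward Phase ends
    # Processes that have not stepped yet sort first (key -1); otherwise the
    # oldest last step wins. Ties are broken by the smallest process ID.
    return min(candidates, key=lambda p: (last_step.get(p, -1), p))

print(next_rolled_forward({5, 2, 7}, {2: 10, 7: 3}))   # 5 has not stepped yet
print(next_rolled_forward({2, 7}, {2: 10, 7: 3}))      # 7 stepped longest ago
```

Scheduling by this rule is what makes the active processes of the roll-forward set advance in a round-robin fashion.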
To complete the description of the Roll-Forward Phase it remains to give the definition of F(E), which we do next. We say that process p finds a process q ≠ p on register r when one of the following two events happens:
1. p accesses r during an RMR Phase at a point when q is still active, and either r is in q's local memory segment or q was visible on r at the beginning of the RMR Phase; or
2. p accesses r during a Roll-Forward Phase at a point when q is not in the roll-forward set, and either r is in q's local memory segment or q was visible on r just before p executed its operation on r.
(Note that when multiple processes p1, . . . , pℓ access the same register r in an RMR phase and one of them finds a process q, then all processes p1, . . . , pℓ ≠ q find q.) We say that process p spoils process q on register r in round i, if p and q both top-write in the RMR Phase of round i, p top-writes to r, and q had a valid cached copy of r at the beginning of round i. We define the set F(E) by the following five rules:
(I0) If p ∈ I(E), then p ∈ F(E), i.e., all inactive processes are in the roll-forward set.
(F1) If process p finds process q ≠ p during E, then both p and q are in F(E).
(F2) If process p spoils processes q1, . . . , qt in some round during E, then p and the process in {q1, . . . , qt} with the smallest ID are both in F(E).
(F3) If p incurs at least k RMRs in E, then p ∈ F(E).
(F4) If p ∈ F(E) and process q covers p in some round during E, then q ∈ F (E).
Local Step Phase and Preparation of the Next Round
After the Roll-Forward Phase has ended, we finish the round by letting all active processes take steps as long as these steps do not incur RMRs. We do this in the order of process IDs, i.e., first we let the process with the smallest ID take steps until either it becomes inactive or it is poised to incur an RMR in its next step; then we do the same for the process with the second-smallest ID, and so on. This is the Local Step Phase of round i.
If we are not in the last round, i.e., i < n², then after the Local Step Phase is finished we prepare the next round by pushing appropriate sets of processes onto the stack. Let Ai be the subset of processes in Gi that are still active at the end of round i. We partition Ai into three sets Gi,j, j ∈ {1, 2, 3}, as follows:
• Gi,1 is the subset of processes in Ai that top-wrote to some green register r during the RMR Phase of round i (and thus are now visible on r);
• Gi,2 is the subset of processes in Ai − Gi,1 that took at least one step during round i; and
• Gi,3 = Ai − (Gi,1 ∪ Gi,2) is the subset of the remaining processes in Ai, i.e., those that took no step during round i.
Then we push onto the stack all of the sets Gi,3, Gi,2, and Gi,1 that are not empty (in this order), and proceed to round i + 1. If now i = n², our adversary becomes deterministic and it schedules all processes in P″ that are still active simply in a round-robin fashion.
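The preparation of the next round can be sketched as follows. The function `prepare_next_round` and its argument names are hypothetical. The third set is taken here to be the remainder of Ai; this is an assumption of the sketch, justified only by the requirement that the three sets partition Ai.

```python
def prepare_next_round(stack, active, top_writers, stepped):
    """Partition A_i and push the non-empty parts onto the stack.

    active:      A_i, the processes of G_i still active at the end of round i.
    top_writers: processes that top-wrote a green register in the RMR Phase.
    stepped:     processes that took at least one step during round i.
    """
    g1 = active & top_writers
    g2 = (active - g1) & stepped
    g3 = active - g1 - g2          # assumed: whatever remains of A_i
    # Push order G_{i,3}, G_{i,2}, G_{i,1}: the top-writers are popped first.
    for group in (g3, g2, g1):
        if group:
            stack.append(group)
    return g1, g2, g3

stack = []
g1, g2, g3 = prepare_next_round(stack, {1, 2, 3, 4}, {2, 9}, {2, 3})
print(stack)
```

Because the stack is last-in first-out, the set pushed last, Gi,1, is the one that the next round's RMR Phase pops.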
The RMR Lower Bound
An analysis of our randomized adversary, in Sections 6 and 7, yields the following lower bound on the RMR complexity of deterministic one-time mutual exclusion algorithms scheduled by that adversary.
Lemma 4.1. Let P′ be any subset of processes with |P′| ≥ n/2, and L be any deterministic one-time mutual exclusion algorithm scheduled by randomized adversary D(P′). If all processes that take at least one step finish in a bounded expected number of steps, then the expected number of RMRs incurred is Ω(ε log n/ log log n).
Recall that $\varepsilon = k^{-4} = \Theta\bigl((\log n/\log\log n)^{-4}\bigr)$. Using this lemma we can easily show the next result.
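The relation between the parameters can be checked numerically. The helper below is a hypothetical sketch (natural logarithms are an arbitrary choice): for n = k^k it computes ε = k^−4 and compares k with log n/ log log n, which differ only by a factor of 1 + log log k/ log k.

```python
import math

def parameters(k):
    """Return (n, eps, log n / log log n) for n = k**k and eps = k**-4."""
    n = k ** k
    eps = k ** -4                      # a float, since the exponent is negative
    ratio = math.log(n) / math.log(math.log(n))
    return n, eps, ratio

for k in (4, 16, 64):
    n, eps, ratio = parameters(k)
    print(k, eps, round(k / ratio, 3))
```

The printed quotient k / (log n/ log log n) stays between 1 and 2, consistent with k = Θ(log n/ log log n).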
Lemma 4.2. Let L be an arbitrary deterministic one-time mutual exclusion algorithm scheduled by D. If all processes finish in a bounded expected number of steps, then the expected number of RMRs incurred is Ω(log n/ log log n).
Proof. Let $R_i$, for 1 ≤ i ≤ 1/ε, be the number of RMRs incurred during the schedule by $D(P'_i)$. To prove the claim it suffices to show that $\mathrm{Exp}\bigl[\sum_{i=1}^{1/\varepsilon} R_i\bigr]$ = Ω(ε log n/ log log n). We have
= Ω(ε log n/ log log n). We now bound $\mathrm{Prob}(|P'_i| \ge n/2)$. By a simple induction argument we obtain that $\mathrm{Exp}[|P'_i|] = (1-\varepsilon)^{i-1} n$, and thus by Markov's inequality,
And since
the claim follows.
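The induction behind the expected set size used in this proof can be spelled out as follows. This is a sketch under an assumption read off the description of D: writing $P'_i$ for the set handed to the i-th sub-adversary and $P''_i$ for the random subset it selects, each process of $P'_i$ joins $P''_i$ independently with probability ε, and the next sub-adversary receives $P'_{i+1} = P'_i - P''_i$, starting from $|P'_1| = n$.

```latex
\begin{align*}
\mathrm{Exp}\bigl[\,|P'_{i+1}| \;\big|\; P'_i\,\bigr]
  &= \sum_{p \in P'_i} \Pr\bigl[p \notin P''_i\bigr]
   = (1-\varepsilon)\,|P'_i|, \\
\mathrm{Exp}\bigl[|P'_{i+1}|\bigr]
  &= (1-\varepsilon)\,\mathrm{Exp}\bigl[|P'_i|\bigr]
   = (1-\varepsilon)\cdot(1-\varepsilon)^{i-1}\, n
   = (1-\varepsilon)^{i}\, n .
\end{align*}
```

The second line applies the induction hypothesis $\mathrm{Exp}[|P'_i|] = (1-\varepsilon)^{i-1} n$, whose base case is $|P'_1| = n$.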
An Upper Bound on the Number of Steps.
The RMR bounds in Section 4.2 hold only for (deterministic) algorithms that finish in a bounded expected number of steps when scheduled by the randomized adversary. Below we show that any randomized algorithm that satisfies expected deadlock freedom is guaranteed to finish in a bounded expected number of steps, when scheduled by any deterministic adversary that is obtained by fixing the coin flips of the randomized adversary. We also argue that the randomized adversary corresponds to a distribution over a finite collection of deterministic adversaries.
Lemma 4.3. Let G be any randomized one-time mutual exclusion algorithm that satisfies expected deadlock freedom for step-bound T , and let A be any deterministic adaptive adversary obtained by fixing the random choices of D. In an execution of G scheduled by A, the total number of steps until all processes finish is upper-bounded by 18T n 3 with probabil-
Proof. Denote by $A(P'_i)$, for 1 ≤ i ≤ 1/ε, the part of adversary A that corresponds to the part $D(P'_i)$ of D. In every RMR Phase scheduled by $A(P'_i)$, each process in $P''_i$ takes at most one step, while the remaining processes take no steps. Since each $A(P'_i)$ schedules at most n² RMR Phases and the sets $P''_i$ are disjoint, the total number of steps in all RMR Phases scheduled by A is upper-bounded by $n^2 \cdot \sum_i |P''_i| \le n^3$.
To bound the steps in the rest of the execution we use the following result, which we prove below.
Claim 4.4. Let E be a (proper) prefix of an execution of G scheduled by A, such that the first step after E ends is not part of an RMR Phase. The expected number of steps from the end of E until any one of the following events occurs is at most T : (a) a process gets rolled forward; (b) a process becomes inactive; (c) a process is stopped because it is poised to incur an RMR in a Local Step Phase.
Let E denote the execution of G scheduled by A, and let S be the subset of the steps in E that are not executed during RMR Phases. We say that a process becomes inactive in step t, if it becomes inactive after executing step t − 1. We will refer to the parts of the execution that are not regular phases (and in which processes are scheduled deterministically in a round-robin fashion) as Complementary Phases. Let $t_i$, for i ≥ 1, be the i-th smallest step in S in which one of the events (a)-(c) occurs, or some phase starts (clearly, this phase is not an RMR Phase, but it can be a Complementary Phase). Further, let $t'_i > t_i$ be the smallest step in which one of the events (a)-(c) occurs; this step may or may not belong to S. We denote by $i^*$ the largest i for which $t_i$ is defined.
Then $|S| \le \sum_{i=1}^{i^*} X_i$, where $X_i = t'_i - t_i$, because at the end of every phase other than RMR Phases event (b) or (c) occurs. Now if $i \le i^*$ and we have fixed the first $t_i - 1$ steps, we can apply Claim 4.4 to bound $X_i = t'_i - t_i$: From the claim and Markov's inequality, it follows that $X_i$ is dominated by $2T \cdot Y_i$, where $Y_i$ is a geometrically distributed random variable with expectation 2. Further, we observe that $i^* \le 2n + n^3$: Each of the n processes in P is rolled forward at most once, it becomes inactive exactly once, and it participates in at most n² Local Step Phases.
By combining the above we obtain that |S| is dominated by $2T \cdot \sum_{i=1}^{2n+n^3} Y_i$, where $Y_1, \dots, Y_{2n+n^3}$ are independent geometrically distributed random variables with expectation 2. Standard Chernoff-bound arguments yield that $2T \cdot \sum_{i=1}^{2n+n^3} Y_i$ is upper-bounded by $16T(2n + n^3)$ with probability at least $1 - e^{-(2n+n^3)}$. Therefore, with this probability, the total number of steps is upper-bounded by $n^3 + 16T(2n + n^3)$. To complete the proof it remains to show Claim 4.4.
Proof of Claim 4.4: If execution E ends during a Complementary Phase, then all active processes that have taken at least one step up to that point are scheduled in a round-robin fashion, and thus the claim follows from the definition of expected deadlock freedom.
Suppose now that E ends during a Roll-Forward or Local Step Phase scheduled by $D(P'_i)$. The following invariant holds for any execution E that ends during an RMR, Roll-Forward, or Local Step Phase scheduled by $D(P'_i)$: At the end of E no process p knows of some other process q that is not in the roll-forward set F(E); or formally, process p cannot distinguish execution E from execution E | F(E) ∪ {p}, in which processes in F(E) ∪ {p} take steps in the same order as in E, and the other processes take no steps. The invariant holds because of the roll-forward rule (F1), as we explain now. Rule (F1) ensures that no process q ∉ F(E) is found during E, and this implies that no process ever accesses a register r during a phase at the beginning of which q was visible on r. It remains to show that no process q ∉ F(E) becomes visible on a register r after the beginning of a phase and subsequently r is read by a process p during the same phase. This is true for an RMR Phase because all reads precede all writes in that phase. Also it holds for a Roll-Forward Phase because any process that takes steps during that phase is already in F(E). Finally, it is true for a Local Step Phase because in order for a process q to write to some register r and for a process p ≠ q to subsequently read r at least one RMR must occur, which is impossible since processes are stopped before they incur an RMR in a Local Step Phase.
We can now finish the proof of Claim 4.4. First we consider the case in which E ends during a Roll-Forward Phase. By definition, after E ends all active processes in F(E ) are scheduled to take steps in a round-robin fashion until some of these processes becomes inactive or a new process gets rolled forward; denote by X the number of these steps. By the invariant we showed above, the execution until that point (at which a process gets rolled forward or becomes inactive) is indistinguishable to any p ∈ F(E ) from an execution in which only processes in F(E ) take steps (in the same order). In the latter execution, by expected deadlock freedom, the number Y of steps from configuration C(E | F(E )) until some process in F(E ) becomes inactive has an expectation of at most T . And since by the indistinguishability of the two executions, Y ≥ X, the claim follows.
Consider now the case where E ends during a Local Step Phase. Throughout this phase the set of rolled-forward processes is the same as the set of inactive processes: All the roll-forward events described in (F1)-(F3) require that a process incurs an RMR; hence, none of these events happens in the Local Step Phase, and thus neither does the event in (F4). Suppose now that process p is scheduled to take the next step after E ends. Then p continues to take steps solo until it either becomes inactive or is stopped because it is about to incur an RMR; again let X denote the number of these steps. By the invariant we showed earlier, from p's point of view the execution up to that point is indistinguishable from the one where only the processes in F(E ) = I(E ) and p take steps, and, by expected deadlock freedom, in the latter execution the number Y of steps from C(E | F(E ) ∪ {p}) until p becomes inactive has an expectation of at most T . And since Y ≥ X, the same bound holds for X. This completes the proof of Claim 4.4, and of Lemma 4.3.
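The tail bound invoked in the proof of Lemma 4.3 can be sanity-checked with a standard exponential-moment argument. The sketch below is an illustration, not the authors' calculation: it assumes each Y_i is geometric on {1, 2, ...} with success probability 1/2 (hence expectation 2), and the function name and the choice s = 3/2 are hypothetical. For independent such variables, Markov's inequality applied to s^(Y_1 + ... + Y_m) gives Prob(Y_1 + ... + Y_m ≥ 8m) ≤ (E[s^Y] / s^8)^m.

```python
import math

def chernoff_factor(s=1.5, threshold=8):
    """Per-variable factor c with Prob(Y_1 + ... + Y_m >= threshold * m) <= c**m.

    Assumes i.i.d. geometric Y_i on {1, 2, ...} with success probability 1/2,
    for which E[s**Y] = (s / 2) / (1 - s / 2) whenever 1 < s < 2.
    """
    if not 1 < s < 2:
        raise ValueError("s must lie strictly between 1 and 2")
    moment = (s / 2) / (1 - s / 2)
    return moment / s ** threshold

factor = chernoff_factor()
print(factor, math.exp(-1))
```

Since the factor is below 1/e, the sum of m such variables exceeds 8m with probability at most e^−m; with m = 2n + n³ this matches the bound 16T(2n + n³) on 2T times the sum stated in the proof.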
The randomized decisions that adversary D must make are that for each 1 ≤ i ≤ 1/ε, (1) it must select the random subset $P''_i$ of the processes in $P'_i$ that are scheduled by $D(P'_i)$; and (2) it must determine the colours of the shared registers in each of the n² rounds scheduled by $D(P'_i)$. At most n independent coin flips are needed for (1), one coin flip for each process in $P'_i$. But for (2) an infinite number of coin flips may be necessary to colour all registers, since R can be unbounded. Note, however, that in each round the adversary only needs to colour the registers that are accessed in the RMR Phase of that round, that is, at most $|P''_i|$ registers. Therefore, the total number of coin flips that D executes is at most $n/\varepsilon + n^3$.
Further, these coin flips are governed by the same distribution, that is, one side has probability ε and the other 1 − ε. Therefore, a vector of $n/\varepsilon + n^3$ independent coin flips yields a unique deterministic adaptive adversary. The next observation now follows.
Observation 4.5. Randomized adversary D can be described as a probability distribution over a collection of at most 2 n 3 +n/ε deterministic adaptive adversaries.
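The counting behind Observation 4.5 can be illustrated with a small sketch. The function names are hypothetical, and the sketch assumes that 1/ε is an integer and simply adds the two totals stated in the text (n/ε flips for the subsets, n^3 flips for the register colours).

```python
from fractions import Fraction

def coin_flip_budget(n: int, eps: Fraction) -> Fraction:
    """Upper bound n/eps + n^3 on the coin flips used by the adversary."""
    subset_flips = n / eps   # at most n flips for each of the 1/eps subsets
    colour_flips = n ** 3    # stated bound on register colourings over all rounds
    return subset_flips + colour_flips

def adversary_count_bound(n: int, eps: Fraction) -> int:
    """Each outcome of the flip vector fixes one deterministic adversary."""
    budget = coin_flip_budget(n, eps)
    # Assumption of this sketch: 1/eps is an integer, so the budget is integral.
    assert budget.denominator == 1
    return 2 ** int(budget)
```

For instance, with n = 2 and ε = 1/2 the budget is 4 + 8 = 12 flips, so at most 2^12 deterministic adversaries arise.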
THE MAIN THEOREM
We are now ready to prove the main result of this paper.
Theorem 5.1. For every randomized one-time mutual exclusion algorithm that satisfies expected deadlock freedom, there is an adaptive adversary that yields an execution where the total number of RMRs incurred has an expectation of Ω(n log n/ log log n).
Proof. Fix a randomized one-time mutual exclusion algorithm G that satisfies expected deadlock freedom for step bound T . We can view G as a probability distribution over a set L of deterministic algorithms obtained by fixing the coin-flip vector associated with each process. Since these vectors are infinite, L can be uncountably infinite. Consider now the randomized adaptive adversary D described in Section 4. We saw in Observation 4.5 that D can be viewed as a probability distribution over a set A of at most 2^{n^3 + n/ε} deterministic adaptive adversaries. For each deterministic adversary A ∈ A, let L(A) ⊆ L be the set of deterministic algorithms such that each algorithm in L(A) finishes in at most λ = 18T n^3 steps when scheduled by A. From Lemma 4.3, for any A ∈ A, the total number of steps in an execution of G scheduled by A is bounded by λ = 18T n^3 with probability at least 1 − e
Then every algorithm in L finishes in at most λ steps when scheduled by any adversary in A, and by the union bound,
We will now show using Yao's Principle that
(5.3)
Recall that the set A is finite, but L may be uncountably infinite (which prevents us from applying Yao's Principle directly). However, by definition, any algorithm L ∈ L finishes after at most λ steps when scheduled by any adversary A ∈ A. Thus, instead of set L we can consider the set L' of algorithms obtained if we modify each algorithm L ∈ L by forcing every process to stop after it executes its λ-th step (if it has not finished earlier); we denote by L(λ) this bounded version of L.
Clearly, TRMR(L, A) = TRMR(L(λ), A) for all L ∈ L and A ∈ A, and thus (5.2) still holds if we replace L by L'. Further, we have that L' is finite: since for every algorithm in L' each process takes at most λ steps and thus uses at most λ coin flips, the number of distinct algorithms in L' is at most |Ω|^{nλ}, where Ω is the domain of the coin flips. Therefore, by Yao's Principle, for any distribution G over L',
= Ω(n log n / log log n).
Equation (5.3) now follows by choosing
Combining (5.1) and (5.3) yields the theorem.
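The use of Yao's Principle in this proof relies on both the set of deterministic algorithms and the set of deterministic adversaries being finite. The following sketch checks the underlying inequality on a small cost matrix: for any distribution q over adversaries and any distribution p over algorithms, the best deterministic algorithm against q costs no more than the randomized algorithm p against its worst adversary. The function name and the cost matrix are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def yao_sides(cost, p_alg, q_adv):
    """cost[l][a] is the cost of deterministic algorithm l against adversary a.

    Returns (best_det, worst_adv), where
      best_det  = min over l of E_{a ~ q_adv}[cost[l][a]]
      worst_adv = max over a of E_{l ~ p_alg}[cost[l][a]]
    Yao's Principle states best_det <= worst_adv.
    """
    best_det = min(
        sum(q * cost[l][a] for a, q in enumerate(q_adv))
        for l in range(len(cost))
    )
    worst_adv = max(
        sum(p * cost[l][a] for l, p in enumerate(p_alg))
        for a in range(len(q_adv))
    )
    return best_det, worst_adv

half = Fraction(1, 2)
lower, upper = yao_sides([[1, 3], [4, 2]], [half, half], [half, half])
```

In the proof, the role of `cost` is played by TRMR(L, A), the distribution over adversaries is the randomized adversary D, and the left-hand side gives the lower bound on the expected number of RMRs.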
THE PROBABILITY OF FINDING PROCESSES
One of the main challenges in our analysis is to bound the probability that processes get rolled forward due to roll-forward rule (F1), i.e., because some process finds another one. In this section we show that in every step in which a process incurs an RMR it triggers that roll-forward event only with small probability.
As discussed earlier, we assume that processes are deterministic. Hence, an execution is uniquely determined by a schedule provided by the randomized adversary. Further, the schedule is uniquely determined by the colour (red or green) of each register in each of the first n^2 rounds. (Recall that after n^2 rounds, the adversary switches to a deterministic round-robin schedule.) A colour schedule is a binary matrix (M_{i,r})_{1≤i≤n^2, r∈R} that corresponds to the schedule constructed by our adversary: M_{i,r} = 1 if register r is green in round i, and M_{i,r} = 0 if r is red in round i. Let E(M) be the execution resulting from the colour schedule M. We sometimes say M_{i,r} is green (red) if M_{i,r} = 1 (resp., M_{i,r} = 0). We say that a colour schedule M' dominates a colour schedule M if M'_{i,r} ≥ M_{i,r} for all (i, r) ∈ {1, . . . , n^2} × R. Recall that a register is green in round i with probability ε.
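As an illustration of these definitions, the following sketch samples a colour schedule and checks dominance. It assumes a finite number of registers (the set R may be unbounded in the paper), and the function names are hypothetical.

```python
import random

def random_colour_schedule(rounds, registers, eps, rng):
    """Binary matrix: entry 1 (green) with probability eps, else 0 (red)."""
    return [
        [1 if rng.random() < eps else 0 for _ in range(registers)]
        for _ in range(rounds)
    ]

def dominates(m_prime, m):
    """True if m_prime[i][r] >= m[i][r] for every round i and register r."""
    return all(
        x >= y
        for row_prime, row in zip(m_prime, m)
        for x, y in zip(row_prime, row)
    )

rng = random.Random(0)
sample = random_colour_schedule(rounds=4, registers=3, eps=0.25, rng=rng)
all_red = [[0] * 3 for _ in range(4)]
```

Every schedule dominates the all-red schedule, and flipping entries from green to red always yields a schedule that is dominated by the original one.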
In the remainder of this section we prove the following statement, which is the core of the analysis of the randomized adversary.
Lemma 6.1. Let M be a random colour schedule. For a process b, a round number j, and an integer ζ ≥ 0, define the events E_{b,j,ζ}(M) and E^{find}_{b,j,ζ}(M) as follows:
in the RMR Phase of round j of E(M), and E^{find}_{b,j,ζ}(M) is the event that b finds a process in that phase.
• If ζ > 0, then E_{b,j,ζ}(M) is the event that in the Roll-Forward Phase of round j of E(M) process b incurs the ζ-th RMR of that phase, and E^{find}_{b,j,ζ}(M) is the event that b finds a process when it incurs the ζ-th RMR.
The idea is the following: Consider a colour schedule M that results in an execution E(M) in which a process b finds some other active process a in a step s (where s is the ζ-th RMR process b incurs in some round j). We flip some array entries of M from green to red to obtain a new colour schedule N. In the resulting execution D = E(N) process b does not find any other process in step s. We have to be careful, though, not to change E too much. In particular, we must ensure that in D there are no more roll-forward events than in E. In fact, the execution D will be the same as E for almost all processes.
We then evaluate the relative probability of the two colour schedules M and N. Since we only flip some registers from green to red in order to obtain N from M, and a register is green only with probability ε, colour schedule N is significantly more likely than colour schedule M. However, we also have to take into account the number of distinct colour schedules M that get mapped to the same N in order to avoid that process b finds some other process in step s.
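The effect of flipping entries on the probability of a colour schedule can be made concrete. The sketch below assumes that all entries are coloured independently and that each is green with probability ε. Under that assumption, turning d green entries red multiplies the probability of the schedule by ((1 − ε)/ε)^d. The function name and the example matrices are hypothetical.

```python
from fractions import Fraction

def schedule_probability(m, eps):
    """Probability of a finite colour schedule with independent entries."""
    greens = sum(sum(row) for row in m)
    reds = sum(len(row) for row in m) - greens
    return eps ** greens * (1 - eps) ** reds

eps = Fraction(1, 10)
m = [[1, 1], [0, 1]]          # three green entries
n = [[1, 0], [0, 0]]          # two of them flipped to red
ratio = schedule_probability(n, eps) / schedule_probability(m, eps)
```

Here d = 2 and (1 − ε)/ε = 9, so the ratio is 81. This gain has to be weighed against the number of schedules M that are mapped to the same N.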
Preliminaries
We start with some observations regarding the roll-forward rules. Roll-forward rule (F4) implies that there can be a chain of roll-forward events in some step: If process a covers b and b covers c, then when c gets rolled forward, a and b have to get rolled forward, too. Part (a) of the following claim implies that such chains have length at most k + 1. In part (b) of the claim, we show that event (F3) cannot trigger event (F4). Part (c) states that if in the RMR Phase of some round a process gets rolled forward due to (F3), then all processes that participate in that RMR Phase get rolled forward because they execute their k-th RMR. Finally, in part (d) we show that at most one process may become inactive in a round without getting rolled forward.
Claim 6.2. Let E be an execution obtained by the randomized adversary.
(a) If p_1, p_2, . . . , p_ℓ ∉ F(E), and for all 1 ≤ j < ℓ process p_j covers p_{j+1} at some point during E, then ℓ ≤ k.
(b) If p, q ∉ F(E) and E ends at the beginning of a round in which p will get popped from the stack, then q does not cover p during E.
(c) If processes p and q both participate in round i, then in every round j < i either both p and q participated, or neither of them did. (In particular, if a process incurs its k-th RMR in the RMR Phase of round i, then all processes that participate in this RMR Phase are rolled forward in the following Roll-Forward Phase.)
(d) In each round, at most one process becomes inactive among the processes that do not get rolled forward.
Proof. (a): For the purpose of a contradiction, suppose ℓ ≥ k + 1. Let i_1, . . . , i_{ℓ−1} be the round numbers such that p_j covers p_{j+1} in round i_j, for all 1 ≤ j < ℓ. We show that p_ℓ participates in all rounds i_1, . . . , i_{ℓ−1}, i.e., p_ℓ takes a step in the RMR Phase of each of those rounds, and thus it incurs ℓ − 1 ≥ k RMRs during E. Thus, by rule (F3) it is in F(E), contradicting the assumption.
Clearly, p_ℓ participates in round i_{ℓ−1}, as it gets covered by p_{ℓ−1} in that round. Suppose p_ℓ does not participate in some round i_j, 1 ≤ j ≤ ℓ − 2. Then either p_ℓ is still in a set on the stack when the set containing p_j and p_{j+1} was popped from the stack at the beginning of round i_j, or p_ℓ gets halted in round i_j. In either case, since p_j is a top-writer in round i_j, process p_ℓ ends up in a set on the stack that is below p_j's set. Hence, p_ℓ cannot participate again until it gets rolled forward, or until p_j has become inactive. Neither happens during E as p_ℓ ∉ F(E). It follows that p_ℓ does not participate in round i_{ℓ−1}, a contradiction.
(b): Suppose q does cover p in some round i of E. Then at the end of round i process q is added to set Gi,1 and p to set Gi,2, and first Gi,2 and then Gi,1 are pushed on the stack. Thus, p won't get popped from the stack until q has become inactive.
(c): This is immediate from the definition of the Local Step Phase: If a set S is pushed on the stack, then either all or none of the processes in S participated in the previous RMR Phase. And processes can only participate in an RMR Phase, if their set is popped from the stack.
(d): Recall that in its critical section a process first reads a shared register rcrit, and then it writes its ID to this register. No process accesses rcrit outside its critical section.
Assume by way of contradiction that two processes p and q that do not get rolled forward become inactive in round i, and assume (w.l.o.g.) that p enters its critical section before q does. First we argue that p becomes inactive before q enters its critical section. Let p' be the first process that enters its critical section after p. When p' takes its first step in the critical section, which is a read of register rcrit, p must already be inactive, since otherwise p' would find p and p would get rolled forward. Since either q = p' or q enters its critical section after p', it follows that p becomes inactive before q enters its critical section.
Next we observe that rcrit cannot be in q's memory segment: if it were, then p would have found q when it accessed rcrit in its critical section, and thus q would have been rolled forward. Therefore, q incurs two RMRs during its critical section: one for reading and one for writing register rcrit. And as we argued earlier, both RMRs are incurred after p became inactive. Since we assumed that both p and q become inactive during round i, it follows that q incurs (at least) two RMRs during this round. However, unless a process gets rolled forward, it incurs at most one RMR in each round (in the RMR Phase of the round). Therefore, q gets rolled forward, a contradiction.
Erasing Processes
wrote also in some other round i', i < i' < j. Then in the new execution, E(M'), a is halted in round i', so one of the processes that were covered by a will now become a top-writer. This may change the execution E(M') considerably, and lead to other processes getting rolled forward. In order to avoid this, we flip every matrix entry M_{i',r} from green to red if process a top-writes in round i' to register r. The resulting colour schedule is N. This way, in E(N), all processes that were previously covered by a in a round i' will now get halted in round i', and thus do not take any more steps until a becomes inactive. Note that if they were not halted but covered instead, then they would also not take any more steps until a has become inactive.
For a colour schedule M , an integer i ≥ 1, and a value λ ∈ N ∪ {0, ∞}, define E(M, i, λ) to be the prefix of E(M ) that ends
• at the beginning of the RMR Phase of round i if λ = 0,
• just before the λ-th step of the Roll-Forward Phase if the Roll-Forward Phase has at least λ steps,
• at the end of the Roll-Forward Phase if the Roll-Forward Phase has fewer than λ steps, and
• at the end of round i (i.e., after the Local Step Phase) if λ = ∞.
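The four cases of this definition can be sketched as follows. The representation is an assumption made for illustration only: an execution is a list of rounds, and each round is a triple of step lists for its RMR, Roll-Forward, and Local Step Phase. The function name `prefix` is hypothetical, and λ = ∞ is encoded as `math.inf`.

```python
import math

def prefix(rounds, i, lam):
    """Steps of the prefix E(M, i, lam); rounds[0] is round 1.

    Each round is a triple (rmr_steps, roll_forward_steps, local_steps).
    """
    steps = []
    for rmr, rf, ls in rounds[: i - 1]:
        steps += rmr + rf + ls          # all rounds before round i are complete
    if lam == 0 or i > len(rounds):
        return steps                    # ends at the beginning of round i
    rmr, rf, ls = rounds[i - 1]
    steps += rmr
    if lam == math.inf:
        return steps + rf + ls          # ends at the end of round i
    # Just before the lam-th Roll-Forward step; if the phase has fewer
    # than lam steps, the slice returns the whole phase.
    return steps + rf[: lam - 1]

example = [(["a1"], ["f1", "f2"], ["l1"]), (["a2"], ["g1"], ["m1"])]
```

With this encoding, λ = 1 yields the prefix that ends right after the RMR Phase of round i, which is the starting point of Case 1 in the proof of Lemma 6.5.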
For every process p and every integer i ∈ {1, . . . , n^2} define erase(M, p, i) to be the colour schedule N , where for (ℓ, r) ∈ 1, . . . , n
Let b be some process, r ∈ R some register, λ ∈ N ∪ {0, ∞}, and 1 ≤ i ≤ j ≤ n^2. We define a function f_{j,λ,r,b} that maps a colour schedule M, a process a, and an integer i to a new colour schedule as follows. If
(A1) process a top-writes to register r in round i of E(M),
(A2) a ∉ F(E(M, j, λ)),
(A3) b takes at least one step in round j of E(M), and
(A4) a does not cover b in round j of E(M, j, λ),
For the remainder of this section, we fix arbitrarily a process b, values λ ∈ N ∪ {0, ∞} and j ∈ {1, . . . , n 2 }, and a register r. This uniquely determines a function f j,λ,r,b .
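The following sketch illustrates the erase operation as it is described informally in this section. It is an assumption, not the formal definition: every entry (i', r) with i' ≥ i is flipped to red if the erased process top-writes register r in round i' of E(M), and all other entries are left unchanged. The top-writes of the process are passed in as data rather than computed from an execution, and all names are hypothetical.

```python
def erase(m, top_writes, i):
    """Return a copy of colour schedule m with a process's top-writes erased.

    m          : list of rows (row 0 is round 1), each a dict register -> 0/1
    top_writes : dict round -> register top-written by the erased process in E(M)
    i          : first round from which top-writes are erased (assumption)
    """
    n = [dict(row) for row in m]        # copy, so that m itself is not modified
    for rnd, reg in top_writes.items():
        if rnd >= i:
            n[rnd - 1][reg] = 0         # green entry becomes red
    return n

m = [{"x": 1, "y": 0}, {"x": 1, "y": 1}, {"x": 0, "y": 1}]
n = erase(m, top_writes={1: "x", 2: "y"}, i=2)
```

Under this reading, M always dominates erase(M, a, i), and the two schedules agree on all rows before the first round t ≥ i in which a top-writes, which is how the definition is used in the proofs of this section.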
Claim 6.3. Let a and a' be processes distinct from b, let M, M', N be colour schedules that differ in at least one of the first j rows, and let i ∈ {1, . . . , j}. If
Proof. Since M, M', and N are distinct, N = erase(M, a, i) = erase(M', a', i). Hence, by (A1), in round i of E(M) resp. E(M') process a resp. a' top-writes register r. Then M_{i,r} = M'_{i,r} = 1 (or else no process top-writes in round i on register r). From the definition of the function erase, M_{i*,r*} = N_{i*,r*} = M'_{i*,r*} for all pairs (i*, r*) ∈ ({1, . . . , i} × R) − {(i, r)}. Since in addition M_{i,r} = M'_{i,r} = 1, executions E(M) and E(M') are identical during the first i rounds. As only one process can top-write register r in the i-th round of these executions, we have a = a'. This proves (a).
Now let (i', r') be the first pair in a lexicographical ordering where M_{i',r'} ≠ M'_{i',r'}. By the assumption that M and M' differ in one of the first j rows we have i' ∈ {1, . . . , j}. Moreover, (b) follows immediately from the choice of i'. W.l.o.g. M_{i',r'} = 1 and M'_{i',r'} = 0. By definition of the function erase, M and M' dominate N, so N_{i',r'} = 0. Hence, again from the definition of erase, in round i' of E(M) process a top-writes to register r'. This proves (d1). But the first i' − 1 rounds of E(M) and E(M') are identical, so at the beginning of round i' of E(M') process a is poised to execute a step on register r'. It follows from the definition of erase that M' and N are equal in row i', and M and N are equal in all entries of that row except the one of column r'. Hence, (c) follows.
It remains to show (d2). If i' = j, then this follows immediately from the already established fact that M' and N are equal in row i'. Hence, from now on assume i' < j.
Let D = E(M, j, λ). We claim that there is a process c ≠ a that participates in the RMR Phase of round j of D and is not covered by a at any time during D: If b participates in round j, then it gets popped from the stack in that round, so by Claim 6.2 (b) and (A2) a does not cover b in execution D prior to round j. By (A4) a does not cover b in round j of execution D either. Hence, if b participates in round j, then we can choose c = b. Otherwise, b can only take steps in round j after it has been rolled forward. Hence, b ∈ F(D). In this case b cannot have been rolled forward because of rule (F3), because that would imply that b had incurred an RMR in round j before it got rolled forward. From rules (F1), (F2), and (F4), there must be at least one other process c ≠ b that participates in the RMR Phase of round j and that causes a first process to get rolled forward in that round. But then this process, c, gets rolled forward, too, so c ∈ F(D). Since a ∉ F(D), we have c ≠ a, and moreover due to roll-forward rule (F4), a never covered c during D. By (d1) process a top-writes register r' in round i' of E(M). Then c does not access register r' in the round-i' RMR Phase of D (if c wrote r' in round i', then a would cover c, and if c read r', then a would be rolled forward due to (F1)).
Due to (b) and (c), during the first i' rounds of E(M') process c performs exactly the same steps as in E(M). Now recall that process a top-writes in round i' of E(M). Since in E(M) process c participates in round j > i' at a point when a is still active, it follows that in round i' process c must have top-written, too. (Otherwise, at the end of round i' of E(M) process c would land on the stack in a set below a's set and it would never get popped from the stack before a has become inactive, contradicting that c participates in round j of E(M).) Hence, in round i' of E(M) and thus also in E(M') process c lands in a set on top of the stack. Since a gets halted in round i' of E(M'), a does not end up in the same set on the stack as c, and in particular it lands in a set below c's. Thus, the only way that a can take a step in a later round of E(M') before c is finished is if a gets rolled forward, but by the assumption (A2) a ∉ F(E(M', j, λ)). It follows that in E(M') process a does not take any steps and in particular does not top-write in any of the rounds i', i' + 1, . . . , j. Thus, from the definition of function erase, it follows that M' equals N in rows i', i' + 1, . . . , j, which completes the proof of (d2).
The next lemma bounds the number of distinct colour schedules that are mapped to the same schedule via the above function. Let j, λ, r, and b be fixed arbitrarily as above.
Lemma 6.4. For every colour schedule N, and all integers i ∈ {0, . . . , j} there are no k + 2 pairs (M_1, a_1), . . . , (M_{k+2}, a_{k+2}) such that M_1, . . . , M_{k+2} are all distinct in their first j rows, and f_{j,λ,r,b}(M_s, a_s, i) = N for all 1 ≤ s ≤ k + 2.
Proof. For the purpose of a contradiction, suppose there are pairs (M_1, a_1), . . . , (M_{k+2}, a_{k+2}) as described in the statement of the lemma. W.l.o.g. assume that N ∉ {M_1, . . . , M_{k+1}} (i.e., possibly but not necessarily N = M_{k+2}). Then we get a := a_1 = · · · = a_{k+1} from Claim 6.3 (a).
Consider arbitrary M, M' ∈ {M_1, . . . , M_{k+1}}, and let i' be the index in {1, . . . , j} that satisfies Claim 6.3, and in particular M_{i',r'} = 1 and M'_{i',r'} = 0 for some register r' ∈ R.
From part (b) of Claim 6.3 it follows that M dominates M' in rows 1, . . . , i' − 1. By the definition of erase, M dominates N. Hence, from part (d2) it follows that M dominates M' in rows i', . . . , j. Thus, for any two matrices in {M_1, . . . , M_{k+1}} it holds that one dominates the other in rows 1, . . . , j. Since the dominance relation is transitive, it follows that we can relabel the indices of M_1, . . . , M_{k+1} such that M_{s+1} dominates M_s for 1 ≤ s ≤ k. Now for every s ∈ {1, . . . , k} let i_s ∈ {1, . . . , j} be the index such that i' = i_s satisfies Claim 6.3 for the matrices M, M' ∈ {M_s, M_{s+1}}. Since M_{s+1} dominates M_s we get from parts (b), (c) and (d2) of that claim that M_{s+1} and M_s are equal in rows 1, . . . , i_s − 1 and differ in row i_s, and M_s equals N in rows i_s, . . . , j. Hence, i_1 < i_2 < · · · < i_k.
A simple induction on s = 1, . . . , k shows that process a top-writes in rounds i_1, i_2, . . . , i_s of execution E(M_{s+1}): For s = 1 this follows immediately from Claim 6.3 (d1) for M = M_2 and M' = M_1. Now suppose in execution E(M_s) process a top-writes in rounds i_1, . . . , i_{s−1}. Since M_s equals M_{s+1} in rows i_1, . . . , i_s − 1 and i_1, . . . , i_{s−1} < i_s, process a also top-writes in rounds i_1, . . . , i_{s−1} of execution E(M_{s+1}). From Claim 6.3 (d1) for M = M_{s+1} and M' = M_s it follows that in addition a top-writes in round i_s of E(M_{s+1}).
Hence, we have shown that a top-writes in rounds i_1, . . . , i_k of execution E(M_{k+1}). Hence, in its top-write in round i_k, a incurs its k-th RMR during E(M_{k+1}), so a gets rolled forward immediately after. Since i_k ≤ j we have a ∈ F(E(M_{k+1}, j, λ)). But then M_{k+1} = N by the definition of function f_{j,λ,r,b}, which contradicts the assumption that N ∉ {M_1, . . . , M_{k+1}}.
The Erasing Lemma
For an execution E and an integer i ≥ 1 let RMR_i(E), RF_i(E), and LS_i(E) denote the sub-executions of E that comprise the round-i RMR, Roll-Forward, and Local Step Phase, respectively. These sub-executions can be empty, e.g., if E has fewer than i rounds, or if E ends before the corresponding phase in round i has started. For a register r, let rmr_{i,r}(E) denote the set of processes that access register r during RMR_i(E). Further, let partcpt_i(E) denote the set of processes that participate in the round-i RMR Phase of execution E, i.e., the processes that take at least one step in RMR_i(E).
Let state(p, E) denote the state of process p at the end of execution E. We assume w.l.o.g. that processes record in their state how many steps they have taken, and the sequence of responses from all previous operations. Hence, if E|p = E'|p, then state(p, E) = state(p, E'). We also assume w.l.o.g. that processes know and record in their state whether and when one of their own steps incurred an RMR, and the round numbers in which they took steps. Let val(r, E) denote the value of register r at the end of execution E, and let local(p, E) denote the set of registers which are local to p at the end of E (i.e., they are either in p's local memory segment, or p has valid cached copies of them). We write owner(r) to denote the process in whose memory segment a register r is located. Further, let topw(p, E) denote the set of registers which process p top-writes during execution E, and let cov(p, E) denote the set containing p and all processes that get covered by p during E.
If E is the empty execution or E ends at the end of a Local Step Phase, then stack(j, E) denotes the set of processes which at the end of E is in the j-th level of the stack, where stack(0, E) is the topmost set. For any other execution E, stack(j, E) = stack(j, E'), where E' is the longest prefix of E that is either empty or ends immediately after a Local Step Phase.
The next lemma is at the core of our analysis and lists the properties of the erase function.
Lemma 6.5. Let M be some colour schedule and N = erase(M, a, t_min) for some process a and some integer t_min. Further, let i ≥ 1, λ ∈ N ∪ {0, ∞}, E' = E(M, i, λ), and D' = E(N, i, λ). If a ∉ F(E'), then all of the following are true:
(f) Suppose λ = ∞. If i < t_1, then the stacks are identical. Otherwise, there is some index j* such that a ∈ stack(j*, E') and
In order to prove the lemma, we define E = E(M) and D = E(N). Hence, E' and D' are prefixes of E and D, respectively. Let t_1, . . . , t_ℓ be the rounds of E in which process a top-writes, and let r_j, 1 ≤ j ≤ ℓ, be the register to which a top-writes in round t_j. We start with the following claim.
Claim 6.6. If a ∉ F(E'), then for every j ∈ {1, . . . , ℓ} all of the following are true:
(b) In E no process in P − {a} accesses register r_j after the round-t_j RMR Phase.
(c) In E no process p ∈ cov(a, RMR_{t_j}(E)) − {a} executes any steps in the round-t_j Roll-Forward Phase, nor does it get popped from the stack or execute any more steps after round t_j.
(d) In E no process p accesses a register r, if owner(r) ≠ p was covered by a in an earlier step.
Proof. (a): For the purpose of a contradiction, assume there is a process p ∈ cov j ∩ F(E'). Then p ≠ a because a ∉ F(E'). Hence, by the definition of cov j , p gets covered by process a in round t_j of E'. But then p ∈ F(E') together with roll-forward rule (F4) implies a ∈ F(E'), a contradiction.
(b): Assume the claim is not true. Let p ≠ a be the first process to access r_j after the round-t_j RMR Phase of E'. Since process a top-wrote r_j in that phase, a is still visible on r_j when p accesses it. Hence, p finds a and thus by (F1) a ∈ F(E'), a contradiction.
(c): Let p ∈ cov(a, RMR_{t_j}(E)) − {a}. By part (a), p, a ∉ F(E'). In round t_j process p gets covered by process a, so it does not execute any more steps during that round. By Claim 6.2 (b) p does not get popped from the stack after round t_j, and since p does not get rolled forward during E', it cannot take any steps in a round where it does not get popped from the stack.
(d): Suppose p accesses register r and q := owner(r) ≠ p. Then p finds q, so q ∈ F(E'). By part (a), q ∉ cov(a, E').
Proof of Lemma 6.5. We prove the lemma by induction on the length of E', i.e., for increasing i and λ. First suppose i < t_1, or i = t_1 and λ = 0, i.e., E' ends at the beginning of round t_1 or earlier: Since a does not top-write in any round j, where t_min ≤ j < t_1, it follows from the definition of erase that M equals erase(M, a, t_min) in the first t_1 − 1 rows. Hence, the executions D' and E' are identical. Now assume that E' ends some time during round i ≥ t_1. We consider three cases: first that E' ends right after the RMR Phase (λ = 1), second that E' ends at a later point during the Roll-Forward Phase (2 ≤ λ < ∞), and third that E' ends during the Local Step Phase (λ = ∞).
Case 1: λ = 1. Hence, E and D end immediately after the RMR Phase of round i. Assume that the induction hypothesis holds for the prefixes E and D of E and D, resp., which end at the end of round i − 1.
Let G_i(E) and G_i(D) be the sets of active processes that are popped from the stack at the beginning of round i in E and in D (and thus in E and D ), respectively. From the induction hypothesis (f) for E and D and Claim 6.6 (c) we know that
Consider a register r ∈ topw (a, E ). By Claim 6.6 (b) no process except possibly a will access register r during RMRi(E), so ∀r ∈ topw (a, E ) :
If r is red in round i of E, then by definition of erase, r is also red in round i of D, so rmr i,r (D) = ∅. Now suppose that r is green in round i of E. Then (6.2) implies that no process in Gi(E) − {a} is poised to access r at the end of E (or else it would access r during the round-i RMR Phase of E). By (6.1), Gi(E) − {a} ⊆ cov (a, E ), and by part (a) of the induction hypothesis applied to E and D every process in cov (a, E ) is in the same state at the end of E as at the end of D . It follows that also at the end of D no process in Gi(E) − {a} is poised to access register r.
Since Gi(E) − {a} = Gi(D) (by (6.1)), we get ∀r ∈ topw (a, E ) :
Now consider a register r ∈ topw (a, RMRi(E)). By construction a process executes an operation on r during the round-i RMR Phase of E if and only if it is either a or it gets covered by a during that phase. Moreover, by the definition of erase, Ni,r = 0 so in RMRi(D) no process accesses r. Thus, ∀r ∈ topw (a, RMRi(E)) : RMRi(a, E) ) ⊆ cov (a, E ) and
Now consider a register r ∉ topw (a, RMRi(E)). If r is red in round i of E, then r is red in round i of D, too, and so rmr i,r (E) = rmr i,r (D) = ∅. Now suppose r is green in round i of E. Since r ∉ topw (a, RMRi(E)), by definition of erase register r is also green in round i of D. Then rmr i,r (E) and rmr i,r (D) are the sets of processes in Gi(E) and Gi(D), respectively, which are poised to access r at the end of execution E and D , respectively. From (6.1) none of the processes in Gi(D) = Gi(E) − {a} is in cov (a, E ). Thus, from part (a) of the induction hypothesis all of the processes in Gi(D) = Gi(E) − {a} are in the same state at the end of E as at the end of D , so if a process other than a is poised to access r at the end of E it is also poised to access r at the end of D . Thus, rmr i,r (E) − {a} = rmr i,r (D). Moreover, since r ∉ topw (a, RMRi(E)), no process gets covered by a on r during RMRi(E). Since none of the processes participating in round i of E is in cov (a, E ), we have ∀r ∉ topw (a, RMRi(E)) :
Note that partcpt i (E ) is the union of all rmr i,r (E ) for all r ∈ R. Hence, (6.4) and (6.5) imply that
In the following we show for every register r ∈ R that ∀p ∈ P − {a} : Moreover, if r ∈ topw (a, E ), then val (r, E ) = val (r, D ), (6.8) and ∀p ∈ rmr i,r (D) :
Then together with the induction hypothesis, (6.7) implies (d), and (6.8) implies (b). Moreover, if p ∉ cov (a, E ) then (6.4) and (6.5) imply that either p participates neither during RMRi(E ) nor during RMRi(D ), or in both executions it accesses the same register r. In the latter case, by (6.4) that register r is not in topw (a, E ), so (6.9) applies. In either case, parts (a) and (c) follow.
We prove (6.7)-(6.9) for some fixed register r ∈ R: If Mi,r = Ni,r = 0, no process executes any steps on r during RMRi(E) or RMRi(D), so (6.7)-(6.9) are trivially true. Since M dominates N , Ni,r = 1 and Mi,r = 0 is not possible. If Mi,r = 1 and Ni,r = 0, then by construction of N , process a is the (only) top-writer during RMRi(E) on register r, but no process accesses r during RMRi(D), so (6.7) is true. Moreover, then r ∈ topw (a, E ), so there is nothing to show for (6.8) and (6.9).
Now assume Mi,r = Ni,r = 1. If a ∉ rmr i,r (E), then by (6.5) R := rmr i,r (E) = rmr i,r (D) and no process in R is in cov (a, E ). Hence, due to part (a) of the induction hypothesis all processes invoke exactly the same operation on r in RMRi(E ) as in RMRi(D ). The order of all operations on r is uniquely determined by the set of processes executing an operation on that register, so this is the same for RMRi(E ) and RMRi(D ). By part (b) of Claim 6.6 r ∉ topw (a, E ), and by part (d) owner (r) ∉ cov (a, E ). Thus, by part (b) of the induction hypothesis, r has the same value at the end of E as at the end of D . Hence, the sequence of operations (including responses) executed on r in RMRi(E ) is exactly the same as in RMRi(D ), and so (6.7)-(6.9) follow.
Now assume a ∈ rmr i,r (E). By the assumption that Ni,r = 1 and the construction of N , process a is not a top-writer in round i. Hence, by (6.5) rmr i,r (E) = rmr i,r (D) ∪ {a} and no process in rmr i,r (E) − {a} is in cov (a, E ). If no process writes r during RMRi(E), then a also reads and does not change the value of register r. In this case, (6.7)-(6.9) follow immediately. If some process writes r during RMRi(E), then the process p that top-writes r is not a, as otherwise Ni,r = 0 by the definition of erase. Moreover, either a reads during RMRi(E), or, if it writes, its write operation gets immediately overwritten. All other processes in rmr i,r (E) − {a} execute exactly the same operations in the same order during RMRi(E) as during RMRi(D). Hence, (6.7)-(6.9) follow.
This completes the proof of (a)-(d).
We now prove (e). In particular, we show that if a process p gets rolled forward in RMRi(E), then and only then it gets rolled forward in RMRi(D). This establishes
and thus from part (e) of the induction hypothesis we get F(E ) = F(D ). By Claim 6.6 (a), cov (a, E ) ∩ F (E ) = ∅, so (e) follows.
We informally say an event (Fi), 1 ≤ i ≤ 4, occurs, if there is a process p that gets rolled forward because of rule (Fi).
Suppose (F1) occurs during RMRi(E) or RMRi(D) because some process p finds a process q = p on a register r. Then a ∈ {q, p}: In RMRi(D) process a takes no steps, and in RMRi(E) process a cannot be found or find another process, because a ∈ F (E ). Therefore, r ∈ topw (a, E ), because by (6.2) only process a accesses such a register in RMRi(E ), and by (6.2) no process accesses such a register in RMRi(D ). Hence, by the induction hypothesis, the same process is visible on r at the end of E as at the end of D . Thus, if a process gets found in both, RMRi(E) and in RMRi(D), then it must be the same process q in both executions. If r ∈ topw (a, RMRi(E)), then by (6.4) rmr i,r (E) ⊆ cov (a, E ), so a process in that set finds q which contradicts Claim 6.6 (a). If r ∈ topw (a, RMRi(E)) then we get from (6.5) that rmr i,r (E) = rmr i,r (D) as a ∈ rmr i,r (E) (because otherwise a would find q). Hence, exactly the same set of processes find q in RMRi(E) as in RMRi(D). Now we consider (F2). Suppose some process p spoils some process q on a register r either during RMRi(E) or during RMRi(D). Then by Claim 6.6 (a), p, q ∈ cov (a, E ), and in particular p, q = a. From (6.2) and (6.3) it follows that no process can spoil another process on a register r ∈ topw (a, E ) during RMRi(E ) or during RMRi(D ). If r ∈ topw (a, RMRi(E)), then process a top-writes to r in RMRi(E), so no process spoils another process on r during RMRi(E). Moreover, in this case register r is red in round i of D, so no process spoils another process on r during RMRI (D). Hence, r ∈ topw (a, E ). Since p, q ∈ cov (a, E ), then from induction hypothesis (c), q has a valid cached copy of r at the beginning of round i of both executions, D and E , and as established earlier, each of p and q executes in RMRi(D ) exactly the same step as in RMRi(E ). It follows that p spoils q in both executions.
Next consider (F3). From Claim 6.2 (c), if one process gets rolled forward in RMRi(E) or in RMRi(D) because it has incurred k RMRs, then all processes in partcpt i (E) and in partcpt i (D), respectively, execute their k-th RMR in RMRi(E) and RMRi(D), respectively. From part (a) of the induction hypothesis, all processes p ∈ cov (a, E ) have incurred equally many RMRs during E as during D . According to (6.6), partcpt i (E) ∩ cov (a, E ) = partcpt i (D). Thus, if some process in partcpt i (E) gets rolled forward during RMRi(E) due to (F3), then that same process also gets rolled forward during RMRi(D). Now suppose that a process p ∈ partcpt i (D) executes its k-th RMR on some register r during RMRi(D) but does not do so during RMRi(E). But by part (a) of the induction hypothesis, at the beginning of round i of E process p is poised to access register r. Hence, r ∈ topw (a, RMRi(E)), so by (6.4) rmr i,r (D) = ∅, a contradiction.
Finally, we consider (F4). This event occurs only if a process p gets rolled forward during RMRi(E ) or RMRi(D ) and that triggers some other process q to get rolled forward (i.e., it causes q ∈ F(E )). There may be a chain of such (F4) events, i.e., a sequence of processes p1, . . . , pℓ may get rolled forward and ps triggers event (F4) for ps+1. But the first process, p1, gets rolled forward because of rules (F1)-(F3). Then from the induction hypothesis and as already established, p1 ∈ F(E ), p1 ∈ F(D ), and p1 ∈ cov (a, E ); this is the base case for an induction on s, where we show ps ∈ F (E ) ∩ F (D ). For the inductive step, note that ps+1, 1 ≤ s < ℓ, covers ps on a register r in some round j ∈ {1, . . . , i} during E or D . Since ps+1 ∈ F(E ), we have ps+1 ≠ a. Thus, by part (d) of the induction hypothesis, ps+1 top-writes in round j of E and in round j of D to the same register. Since ps ∈ F (E ) we have ps ∈ cov (a, E ) (by Claim 6.6 (a)). Hence, from part (a) of the induction hypothesis it follows that ps executes the same steps in round j of E as in round j of D . Thus, ps+1 covers ps in round j of both executions. Since ps ∈ F (E ) ∩ F(D ), rule (F4) implies that ps+1 ∈ F (E ) ∩ F (D ). This completes the proof of (e).
Since E does not end at the end of a Local Step Phase, part (f) follows trivially for E .
Case 2: 2 ≤ λ < ∞. We argue that
(6.10) Then (a)-(c) and (e) follow immediately from the induction hypothesis. Since there is nothing to show for (d) and (f), this completes the proof.
Let E = E(M, i, λ − 1) and D = D(N, i, λ − 1), and suppose that the induction hypothesis is true for E and D . (For λ = 1 this follows from Case 1 above.) If E = E then λ > |RF i(E)|, so the round-i Roll-Forward Phase of E is finished after E , i.e., F(E ) = I(E ). From parts (a) and (e) of the induction hypothesis, F(D ) = I(D ), so the round-i Roll-Forward Phase of D is also finished after D , so D = D . Hence, E = E and D = D , so (6.10) follows immediately from the induction hypothesis.
Now assume E = E • op for some operation op on a register r by process p. Since a ∉ F(E ), p ≠ a. By Claim 6.6 (c) no process in cov (a, E ) − {a} takes any steps during RF i(E), so p ∈ cov (a, E ) = cov (a, E ). From part (b) of Claim 6.6, r ∈ topw (a, E ) = topw (a, E ), and from part (d), owner (r) ∈ cov (a, E ) = cov (a, E ). Hence, due to part (b) of the induction hypothesis val (r, E ) = val (r, D ). Since by part (e) of the induction hypothesis F(E ) = F(D ) ⊆ cov (a, E ) and since by part (a) every process in F(E ) is in exactly the same state at the end of E as at the end of D , p is the next process in F(D ) to take a step in D , and its operation invocation will be exactly the same as in E . Since val (r, E ) = val (r, D ), the operation response will also be the same in both executions. Hence, D = D • op and thus D = E .
It remains to show F(E ) = F(D ) ⊆ cov (a, E ). During the Roll-Forward Phase, only rules (F1) and (F4) can cause new processes to be added to the roll-forward set. Suppose (F1) occurs when process p ∈ F(E ) executes op, i.e., it finds a process q ≠ p, q ∈ F(E ) = F(D ), on some register r. If r is in q's local memory segment, then p finds q also when it executes op in D . Otherwise, q is visible on r at the end of E . Then p = owner (r) ∈ cov (a, E ) by Claim 6.6 (d). But then, since r ∈ topw (a, E ), from part (b) of the induction hypothesis the same process q is visible on r at the end of D . Hence, the same process q gets rolled forward due to operation op. Now suppose that (F4) occurs because p gets rolled forward and q covered p. Then with exactly the same arguments as those made for the RMR Phase, either p and q get rolled forward in both E and D , or in neither of them. This shows that F(E ) = F(D ). Finally, F(E ) ⊆ cov (a, E ) follows immediately from Claim 6.6 (a).
Case 3: λ = ∞. In this case, E and D end after the round-i Local Step Phases. Let E and D be the prefixes of E and D , respectively, that end after the round-i Roll-Forward Phases.
First note that the Local Step Phase does not affect the validity of the cached copies a process has: A process p has to incur an RMR on register r to get a new valid cached copy of that register. Moreover, only a write-operation that incurs an RMR by some process q can invalidate p's cached copy of r. Hence, throughout the Local Step Phase, the valid cached copies of processes don't change, so (c) follows immediately from the induction hypothesis. Now assume that (a)-(f) are true for E and D . Let Ep and Dp be the sequence of operations executed by process p during LS i(E) and during LS i(D), respectively. We first show that properties (a) and (b) are preserved throughout executions Ep and Dp.
First consider the case p ∈ cov (a, E ) = cov (a, E ). Recall that when the Local Step Phase starts, all processes that were previously rolled-forward are inactive. Hence, we have p ∈ F (E ) = F(D ). Note that in this case p has no valid cached copies of a register in topw (a, E ) at the end of E or the end of D : By Claim 6.6 (b) process p has not accessed such a register in E after process a top-wrote to that register, and thus from induction hypothesis (a) the same is true for D . Moreover, whenever p accesses a register r then from Claim 6.6 (d) and the assumption that p ∈ cov (a, E ) we have owner (r) ∈ cov (a, E ). Thus, due to part (b) of the induction hypothesis, process p finds exactly the same information on registers it accesses during LS i(E) as during LS i(D). By part (c) of the induction hypothesis, p has exactly the same cached copies at the end of E as at the end of D . Moreover, during LS i(E) none of the registers in p's local memory segment or in p's cache can be changed by a process p′ ≠ p, as this would require p′ to incur an RMR. Hence p performs exactly the same sequence of operations in LS i(E) as in LS i(D), i.e., Ep = Dp. Now consider the case p ∈ cov (a, E ). Then (a) is trivially true for p. Moreover, during Ep or Dp, process p can only write to a register in its local memory segment, as any other write would incur an RMR. Hence, (b) follows.
It follows that executions Ep and Dp preserve properties (a) and (b). We have already argued that the sets of valid cached copies cannot change during the Local Step Phases, so (c) is preserved. Clearly, no process top-writes, so there is nothing to show for (d). Each of the roll-forward events (F1)-(F3) requires that a process incurs an RMR, so they, and thus also (F4), don't happen during the Local Step Phase. Thus, (e) is maintained.
It remains to show (f):
Recall that E and D finish at the end of the round-i Local Step Phase of E and D, respectively. Let E and D be the prefixes of E and D , respectively, that end at the beginning of the round-i RMR Phase. We use the same notation as in Case 1 (λ = 1) of the proof. Recall that Gi(E) and Gi(D) are the sets that were popped from the stack at the beginning of round i of executions E and D, respectively. Nothing changes for the sets that were below Gi(E) and Gi(D), on the stack, so we only have to show that the sets that are pushed on the stack satisfy (f). Let Mi(E ) and Mi(D ) be the sets of processes in Gi(E) and Gi(D), respectively, that are still active at the end of E resp. D , when the Local Step Phase has ended. From (6.1) and since F(E ) = F(D ), we have either
According to the rules of the Local Step Phase, M (E ) and M (D ) are partitioned into three subsets, Gi,s(E) resp. Gi,s(D) for s ∈ {1, 2, 3}, and these three sets are pushed on the stack. Sets Gi,1(E) and Gi,1(D) contain all processes in M (E ) and M (D ) that top-wrote in RMRi(E) and in RMRi(D), respectively. Since a does not top-write in RMRi(D) (but perhaps in RMRi(E)), we immediately get from (6.7) that Gi,1(E) − {a} = Gi,1(D).
(6.12)
Since Gi,1(E) respectively Gi,1(D) land on top of the stack, the top element of the stack satisfies property (f). Moreover, if a ∈ Gi,1(E), then j * = 0 and we are done. Thus, suppose a ∉ Gi,1(E). Then a does not top-write in round i of E , so from i ≥ t1 we conclude i > t1. Hence, from (6.1) a ∈ Gi(D) and thus a ∈ M (D ). Therefore (6.11) simplifies to
We show that in this case Gi,2(E) − {a} = Gi,2(D). Since the three sets that are pushed on the stack in D and E partition M (E ) and M (D ), resp., we then get from (6.13) that Gi,3(E) − {a} = Gi,3(D). Since first Gi,3(E) resp. Gi,3(D) and then Gi,2(E) resp. Gi,2(D) are pushed on the stack, the claim (f) follows. By the assumption a ∉ Gi,1(E), a is not a top-writer in RMRi(E). Then cov (a, RMRi(E)) = ∅, and so by Claim 6.6 (c) no process in cov (a, E ) − {a} participates in the round-i RMR Phase of E . Thus, from (6.5) we have
(6.14)
Now note that by definition of the Local Step Phase, Gi,2(E) respectively Gi,2(D) contain the set of processes that are covered in round i, i.e., those processes that participate but don't top-write. Formally, partcpt i (E) = Gi,1(E) ∪ Gi,2(E), and partcpt i (D) = Gi,1(D) ∪ Gi,2(D). Now, the claim, Gi,2(E) − {a} = Gi,2(D), follows immediately from (6.12) and (6.14).
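The stack update used in this argument can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `push_round_partition`, the representation of process sets as Python sets, and the choice to push a set even when it is empty are all hypothetical and not taken from the adversary's definition. The push order (third set first, top-writers last) follows the description above.

```python
def push_round_partition(stack, popped, participants, top_writers, inactive):
    """Split the still-active part of the popped set into three sets and push them.

    g1: top-writers of the round
    g2: processes that participated but did not top-write (covered)
    g3: halted processes (popped, but did not participate)
    Assumption for this sketch: empty sets are pushed as well.
    """
    active = popped - inactive                    # processes still active after the round
    g1 = active & top_writers
    g2 = (active & participants) - top_writers
    g3 = active - participants
    stack.append(g3)                              # pushed first, ends up lowest
    stack.append(g2)
    stack.append(g1)                              # pushed last, ends up on top
    return g1, g2, g3


# Two runs that differ only in a process "a" (here: 1) that top-writes in E but is absent in D.
stack_e, stack_d = [{9}], [{9}]
e = push_round_partition(stack_e, {1, 2, 3, 4, 5}, {1, 2, 3}, {1, 2}, {3})
d = push_round_partition(stack_d, {2, 3, 4, 5}, {2, 3}, {2}, {3})
```

In this toy run the sets pushed in the two executions agree once process 1 is removed, which is the shape of the statement Gi,s(E) − {a} = Gi,s(D) for s ∈ {1, 2, 3}.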
Proof of Lemma 6.1
Fix b, j, and ζ and for every colour schedule M define
. Choose a colour schedule M at random and let E = E(M ) be the execution defined by M . Suppose event E(M ) occurs. Then choose λ = 0 if ζ = 0, and otherwise choose λ such that E = E(M, j, λ) is the prefix of E that ends just before b's ζ-th RMR in the round-j Roll-Forward Phase of E. Now suppose that E f inds (M ) occurs. Then b finds a process a on a register r when it executes in E the operation that follows E . Hence, a is visible on r at the end of E . Moreover, a ∈ F(E ), so a can be visible on r at the end of E only if it previously top-wrote to r. Now let i be the earliest round whose RMR Phase completed during E such that some process a ∈ F(E ) ∪ {b} top-wrote to r in round i. (It is possible that a = a .) Now let N = f j,λ,r,b (M, a, i). From the discussion above it follows that properties (A1)-(A3) are satisfied. Moreover, (A4) is true: Suppose b is covered by a in the RMR Phase of round j of E = E(M, j, λ). Then E must end during the Roll-Forward Phase of round j, so b is rolled-forward before it finds a . But then b ∈ F(E ) and so a ∈ F(E ) due to (F4)-a contradiction.
Since ( In D no process top-writes to register r in round i or later: Suppose process p does in some round j , i ≤ j ≤ j, and p is the first process to do so. Then by construction of erase(M, a, i), p = a. But then by Lemma 6.5 (d), process p also top-writes to register r in round j of E. But when this happens, a is visible, so p finds a, contradicting a ∈ F(E ). Now recall that by our choice of i, no process in F(E ) topwrites register r in E before round i. From the definition of the mapping erase, the first i − 1 rounds of D and E are identical, and from Lemma 6.5 (d), F(D ) = F(E ). Hence, in D no process in F(E ) top-writes register r before round i, either. It follows that at the end of D either no process is visible on r, or if some process q is visible on r, then q ∈ F(D ). I.e., At the end of D no process in F(E ) is visible on r.
(6.15)
Now note that during E process b never accesses a register r that a top-writes during E . If b accessed r in a round other than the one when a top-writes r for the first time, then either a would find b or b would find a, and so a ∈ F(E ). If b accessed r in the round in which a top-writes r for the first time, then a would cover b, which cannot happen according to Claim 6.6 (c).
Hence, b ∈ cov (a, E ), so we can apply Lemma 6.5 (a), and we obtain
(6.16) Now Lemma 6.5 (e) implies that if ζ > 0, then the step that follows D in the execution E(N ) will be b's ζ-th RMR of the round-j Roll-Forward Phase. But from (6.16) we see that in this step b does not find any other process. If ζ = 0, then we get from Lemma 6.5 (f) that b will be popped from the stack at the beginning of round i of D. Since Ni,r = 1 (a does not top-write r in the round-j RMR Phase of E) b will access r. Again from (6.16) we conclude that b does not find any other process during the round-i RMR Phase of D. Therefore, we have established the following:
there exist a colour schedule N , a ∈ P, and i ≤ j:
Let Gj(D) and Gj(E) be the sets of processes popped from the stack in the round-j RMR Phases of D and E, respectively. There is at least one process c ∈ Gj(E) − {a}: If λ = 0, then c = b, and otherwise c is the first process that triggers a process to be rolled forward during the round-j RMR Phase of E. From 6.5 (f) it follows that Gj(D) = Gj(E) − {a}, so c ∈ Gj(D). Let Tj(N ) and Tj(M ) be the set of indices such that c participates during round < j of D and E, respectively. Let D and E be the prefixes of E and D, respectively, that end immediately after round j − 1. From Claim 6.6 (a) applied to E we have c ∈ cov (a, E ). Thus, from Lemma 6.5 (a) we get state(a, E ) = state(a, D ). Hence, Tj(N ) = Tj(M ). Finally, we argue that i ∈ Tj(N ) ∪ {j}: If not then i ∈ Tj(M ) and i < j, so process a top-writes in round i while c does not participate in round i. But then at the end of round i process c is in a set on the stack below a's set, so c will not get popped from the stack again until a has been rolled forward (contradicting c ∈ Gj(E) and a ∈ F (E )). Hence, (6.17) is equivalent to:
there exist a colour schedule N , a ∈ P, and i ∈ Tj(N ) ∪ {j}:
Now fix some colour schedule N . Let M(N ) be the set of triples (M, a, i) such that M = N is a colour schedule, a ∈ P, i ∈ Tj(N ) ∪ {j} and f j,λ,r,b (M, a, i) = N . Note that |Tj(N ) ∪ {j}| ≤ k, because by roll-forward rule (F3) no process can participate in more than k rounds. Hence, from Lemma 6.4 we get
Moreover, for any (M, a, i) ∈ M(N ) − {N }, the probability that N is chosen by the adversary is at most ε/(1 − ε) times the probability that M is chosen by the adversary, because M dominates N = erase(M, a, i), and there is at least one array entry that is red in N but green in M . I.e.,
(6.20)
Now let N be the set of colour schedules such that E(N ) ∧ E f ind . Then from (6.18) we conclude for random colour schedules M and N :
PROOF OF LEMMA 4.1
The proof is based on a potential function analysis. With every process p ∈ P that is active after round i ∈ {0, . . . , n^2}, we associate a potential
where Np,i is the number of active processes that are in the same set as p on the stack after round i; Xp,i is the total number of RMRs incurred by p until the end of round i; and Yp,i is the number of valid cached copies of registers that p has after round i. If process p ∈ P is not active after round i, then its potential is Φp,i = 0. We define the (total) potential after round i as Φi = Σ_{p∈P} Φp,i.
Note that the potential initially (after round 0) is Φ0 = N log_k N , where N := |P |. Since Exp[N ] = ε|P |, it follows from Jensen's inequality and the convexity of function
In Section 7.1, we prove that the expected potential decrease in a round is bounded by the expected number of RMRs incurred in that round scaled by some constant factor. Then, in Section 7.2, we use this result to derive Lemma 4.1.
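The Jensen step used for the initial potential can be checked numerically. The sketch below is an illustration only: it assumes, purely for the example, that N is binomially distributed with parameters n and ε (the text only states Exp[N] = ε|P|), and the names `f` and `expected_f_binomial` as well as the concrete values of n, ε and k are hypothetical.

```python
import math


def f(x, k):
    """x * log_k(x) for x >= 1 and 0 otherwise; this function is convex on [0, inf)."""
    return x * math.log(x, k) if x >= 1 else 0.0


def expected_f_binomial(n, eps, k):
    """Exact Exp[f(N)] for N ~ Binomial(n, eps) (the distribution is an assumption)."""
    return sum(
        math.comb(n, j) * eps**j * (1 - eps) ** (n - j) * f(j, k)
        for j in range(n + 1)
    )


n, eps, k = 200, 0.1, 4
lhs = expected_f_binomial(n, eps, k)   # Exp[N log_k N]
rhs = f(eps * n, k)                    # (eps n) log_k (eps n)
print(lhs >= rhs)                      # Jensen: Exp[f(N)] >= f(Exp[N])
```

Because f is convex, the comparison holds for any distribution of N with the same mean; the binomial choice only makes the expectation easy to compute exactly.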
Analysis of a Single Round
Let Xi, 0 ≤ i ≤ r, denote the total number of RMRs incurred until the end of round i. The following lemma is the main result of this section.
Lemma 7.1. There is a constant c > 0 such that for all rounds 1 ≤ i ≤ r,
Below we introduce some notation and give an outline of the steps of the proof. In the proof we only look at a single round, round i, and all the notation we describe is with respect to that round. Thus in the notation we do not make explicit the dependence on i.
We denote by Gpart the set of processes that participate in the RMR Phase of round i, and by G halt = Gi − Gpart the set of halted processes. We let Xrmr = |Gpart| be the number of RMRs induced in the RMR Phase, and we let X rf be the number of RMRs induced in the Roll-Forward Phase. Also, we denote by Yinv the number of cached copies that were valid at the beginning of round i and got invalidated in the RMR Phase.
The proof first bounds the decrease in the number of active processes during round i, in terms of Xrmr, X rf , and Yinv. Then it bounds the potential difference induced by this decrease, and also by the further partitioning of Gi into smaller sets.
Recall that in the RMR Phase, each process p ∈ Gpart executes a single step; in this step, p may find some other process, or it may spoil some process. Let F rmr-find be the number of processes that find other processes in the RMR Phase, and let F spoil be the number of processes that spoil other processes. In the Roll-Forward Phase following the RMR Phase, any processes that found or spoilt some processes in the RMR Phase get rolled forward, together with processes that were found or spoilt. As a result, more processes may be found and thus get rolled forward. We denote by F rf-found the number of these new processes added to the roll-forward set because they are found during the Roll-Forward Phase. We show that Exp[F rf-found ] is bounded by, roughly, εk^2 · Exp[X rf ] (Claim 7.4). Processes can also get rolled forward because they have incurred their k-th RMR, or because of covering rule (F4). Let F be the number of all the processes that get rolled forward during the round. We show that F is bounded by 2k · (F spoil + F rmr-find + F rf-found ) + I · Xrmr, where I = 1 if processes in Gpart incur their k-th RMR in the RMR Phase, and I = 0 otherwise (Claim 7.5).
Recall that the subset of processes in Gi that are still active after the end of the round is partitioned into three sets before they are pushed back onto the stack: top-writers, participating processes that did not top-write, and halted processes. We show that the expected potential difference induced by partitioning Gi into sets G * 1 , G * 2 , and G halt , where {G * 1 , G * 2 } is an arbitrary partition of Gpart, is bounded by O(1) · Exp[Xrmr] (Claim 7.6); this bound does not take into account that some processes become inactive during the round. Then, we show that the expected additional decrease in the potential that results from the removal of F + 1 processes in total from these sets is bounded, roughly, by
In the final part of the proof, we combine the above results to derive Lemma 7.1.
Bounding the Processes Rolled Forward
The first result is an upper bound on the expected number of processes that find a process in the RMR Phase.
Proof. For each process p, let Ep be the event that p participates in the RMR Phase, and E p the event that it finds a process in that phase. Then, by Lemma 6.1,
, and thus,
Next we bound the expected number of processes that spoil a process. Recall that Yinv is the number of cached copies that are valid at the beginning of the round but are invalidated during the RMR Phase.
Proof. Fix the configuration C at the beginning of the round, before the adversary decides the colours of the registers. Let Gpotw denote the set of "potential" top-writers for the round, i.e., the subset of processes p ∈ Gi that will be top-writers if they participate in the round. We denote by rp the register that p ∈ Gpotw is poised to write, and by Vp the set of processes q ∈ Gpotw − {p} that have a valid cached copy of rp in C. Since we have fixed the configuration C at the beginning of the round, the set Gpotw is also fixed, and so are the registers rp and the sets Vp, for all p ∈ Gpotw.
For each p ∈ Gpotw, let Ep be the event that p is a top-writer in this round, which is the same as the event that rp is green in this round. Thus, the events Ep, for p ∈ Gpotw, are mutually independent (since rp ≠ rq for distinct p, q ∈ Gpotw), and Prob(Ep) = ε. Let E p be the event that p spoils some process in this round. Clearly, E p = Ep ∧ q∈Vp Eq, and thus Prob E p = Prob
We now bound the expected number of processes that are found by other processes in the Roll-Forward Phase.
Proof of Claim 7.4. For 1 ≤ λ ≤ X rf , let E λ be the event that some process is found during the step in which the λ-th RMR of the Roll-Forward Phase is incurred. From Lemma 6.1 it follows that
Then, for Z_λ = 1_{E_λ}, the indicator random variable of event E_λ, we have Exp[Z_λ | λ ≤ X_rf] ≤ ε/(1 − ε) · k(k + 2). And since F rf-found = Σ_{λ=1}^{X_rf} Z_λ, the claim follows from Wald's theorem.
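The last step can be written out as follows. This is a sketch, not the paper's own derivation: it takes the sum defining F rf-found to range over the RMRs of the Roll-Forward Phase (upper limit X_rf, matching the range 1 ≤ λ ≤ X_rf used for the events E_λ), and it uses only the conditional bound on Exp[Z_λ | λ ≤ X_rf] stated above together with the identity Σ_{λ≥1} Prob(λ ≤ X_rf) = Exp[X_rf].

```latex
\begin{align*}
\mathrm{Exp}\bigl[F_{\text{rf-found}}\bigr]
  &= \mathrm{Exp}\Bigl[\sum_{\lambda=1}^{X_{\mathrm{rf}}} Z_\lambda\Bigr]
   = \sum_{\lambda \ge 1} \mathrm{Prob}(\lambda \le X_{\mathrm{rf}})
       \cdot \mathrm{Exp}\bigl[Z_\lambda \mid \lambda \le X_{\mathrm{rf}}\bigr] \\
  &\le \frac{\varepsilon}{1-\varepsilon}\, k(k+2)
       \sum_{\lambda \ge 1} \mathrm{Prob}(\lambda \le X_{\mathrm{rf}})
   = \frac{\varepsilon}{1-\varepsilon}\, k(k+2) \cdot \mathrm{Exp}\bigl[X_{\mathrm{rf}}\bigr].
\end{align*}
```

For small ε this is of order εk² · Exp[X_rf], in line with the rough bound quoted in the proof outline.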
Recall that F is the total number of processes that get rolled forward in round i. In the next claim we bound F in terms of the quantities we bounded in the previous claims. By ℓ we denote the number of rounds in which the processes in set Gpart have participated, including round i (by Claim 6.2(c), all these processes have participated in exactly the same sequence of rounds). Note that ℓ is also equal to the number of RMRs that each process in Gpart has incurred until the end of the RMR Phase in round i.
Claim 7.5.
F ≤ 2k · (F rmr-find + F spoil + F rf-found ) + 1_{ℓ=k} · Xrmr.
Proof. By Rule (F1), at most 2F rmr-find processes get rolled forward because during the RMR Phase they either find a process or are found by a process; and F rf-found processes get rolled forward because they are found by a process during the Roll-Forward Phase. By (F2), at most 2F spoil processes get rolled forward because either they spoil some process or they are spoilt by a process (not all processes that are spoilt get rolled forward). By (F3), if ℓ = k then |Gpart| = Xrmr processes get rolled forward because they incur their k-th RMR in the RMR Phase. It remains to account for the processes that get rolled forward due to (F4). From Claim 6.2(a) it follows that for every process that gets rolled forward because of (F1) or (F2), at most (k − 1) other processes get rolled forward because of (F4). And from Claim 6.2(b) it follows that if a process gets rolled forward because of (F3), then this does not cause any additional processes to get rolled forward because of (F4). Combining the above yields the claim.
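The accounting in this proof can be replayed with concrete numbers. The sketch below is illustrative only: the function names, the parameter `rounds_participated` (standing for the number of rounds in which the processes of Gpart have participated), and the sample values are hypothetical. The per-rule counts are the worst cases named in the proof, with 2·F spoil taken as the count for rule (F2).

```python
def itemized_rolled_forward(k, f_rmr_find, f_spoil, f_rf_found, x_rmr, rounds_participated):
    """Worst-case count of rolled-forward processes, rule by rule."""
    direct = 2 * f_rmr_find + f_rf_found + 2 * f_spoil   # rules (F1) and (F2)
    chained = (k - 1) * direct                            # rule (F4): at most k-1 extra per direct one
    kth_rmr = x_rmr if rounds_participated == k else 0    # rule (F3), triggers no (F4) events
    return direct + chained + kth_rmr


def claimed_bound(k, f_rmr_find, f_spoil, f_rf_found, x_rmr, rounds_participated):
    """Right-hand side of the claimed inequality."""
    indicator = 1 if rounds_participated == k else 0
    return 2 * k * (f_rmr_find + f_spoil + f_rf_found) + indicator * x_rmr


args = dict(k=3, f_rmr_find=2, f_spoil=1, f_rf_found=4, x_rmr=10, rounds_participated=3)
print(itemized_rolled_forward(**args), claimed_bound(**args))  # 40 52
```

The itemized count equals k · (2·F rmr-find + 2·F spoil + F rf-found) plus the (F3) term, which never exceeds the claimed bound because F rf-found is counted with factor 2k there.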
From the above result and Claim 6.2(d), it follows that at most F + 1 processes become inactive in round i.
Bounding the Potential Difference
First we bound the decrease ∆Φ split in the potential that results if we partition set Gi into sets G * 1 , G * 2 and G * 3 , where {G * 1 , G * 2 } is an arbitrary partition of Gpart and G * 3 = G halt . In the description of the adversary, the partitioning takes place at the end of the round. However, it is equivalent if at the beginning of the round (after the adversary has decided the colours of the register) we partition the processes in Gi into top-writers, participating processes that do not top-write, and halted processes, and then at the end of the round we remove from these sets any inactive processes.
From the definition of potential,
where f is the function with f (x) = x log_k x if x ≥ 1, and f (x) = 0 if x ≤ 1. Suppose that at the beginning of round i there are L − 2 sets of processes on the stack, thus after set Gi is popped and split into sets G * 1 , G * 2 , G * 3 , we have L sets. Let N1, . . . , NL be the number of active processes in these sets at the beginning of the round, and let N′1, . . . , N′L be the corresponding numbers at the end of the round. Clearly, N′j ≤ Nj, for 1 ≤ j ≤ L, and F ≤ Σ_{j=1}^{L} (Nj − N′j) ≤ F + 1, where the upper bound follows from Claim 6.2(d). We now bound the decrease ∆Φ shrink in the potential due to the decrease in the sizes of these sets.
Proof. We break ∆Φ shrink into two terms, ∆Φ shrink = ∆Φ1 + ∆Φ2, where ∆Φ1 accounts for the removal of F processes in total from arbitrary sets, and ∆Φ2 accounts for the removal from G * 1 or G * 2 of the (at most) one process p that becomes inactive in round i without getting rolled forward. Clearly, p must be a process in G * 1 ∪ G * 2 = Gpart, because before p becomes inactive it must perform an RMR in the RMR Phase of round i.
First we bound ∆Φ1. We write N 1 , . . . , N L to denote the set sizes after the removal of the F processes. Then 
(7.3)
The claim now follows from (7.2) and (7.3).
