Contention-Free Complexity of Shared Memory Algorithms  by Alur, Rajeev & Taubenfeld, Gadi
File: 643J 256101 . By:CV . Date:04:06:96 . Time:16:16 LOP8M. V8.0. Page 01:01
Codes: 6754 Signs: 4658 . Length: 60 pic 11 pts, 257 mm
Information and Computation  IC2561
information and computation 126, 6273 (1996)
Contention-Free Complexity of Shared Memory Algorithms*
Rajeev Alur
ATHT Bell Laboratories, 600 Mountain Avenue, Murray Hill, New Jersey 07974
E-mail: alurresearch.att.com
and
Gadi Taubenfeld
ATHT Bell Laboratories, 600 Mountain Avenue, Murray Hill, New Jersey 07974, and Israel Open University
E-mail: gadiresearch.att.com
Worst-case time complexity is a measure of the maximum time
needed to solve a problem over all runs. Contention-free time com-
plexity indicates the maximum time needed when a process executes by
itself, without competition from other processes. Since contention is
rare in well-designed systems, it is important to design algorithms
which perform well in the absence of contention. We study the conten-
tion-free time complexity of shared memory algorithms using two
measures: step complexity, which counts the number of accesses to
shared registers; and register complexity, which measures the number
of different registers accessed. Depending on the system architecture,
one of the two measures more accurately reflects the elapsed time. We
provide lower and upper bounds for the contention-free step and
register complexity of solving the mutual exclusion problem as a func-
tion of the number of processes and the size of the largest register that
can be accessed in one atomic step. We also present bounds on the
worst-case and contention-free step and register complexities of solv-
ing the naming problem. These bounds illustrate that the proposed
complexity measures are useful in differentiating among the computa-
tional powers of different primitives. ] 1996 Academic Press, Inc.
1. INTRODUCTION
1.1. Worst-Case versus Contention-Free Complexity
We consider an asynchronous distributed environment
where processes communicate via shared registers. For such
shared memory systems, there has been extensive research
aimed at understanding the time needed to solve a variety
of coordination and synchronization problems, such as
mutual exclusion, consensus, and renaming. Most of this
research has focused on the worst-case complexity, which
indicates the maximum time taken by any process to accom-
plish its task. The contention-free complexity, on the other
hand, gives an upper bound on the time used by a process
when the process runs by itself without any interference
from other processes. Since contention is rare in many
well-designed systems, it is important to design algorithms
that perform well also in the absence of contention. Thus,
between two algorithms with the same worst-case com-
plexity, the one with the lower contention-free complexity
may perform better in practice. The importance of con-
tention-free complexity was first explored by Lamport
[Lam87], who presented a mutual exclusion algorithm in
which any process accesses shared registers only a constant
number of times before entering the critical section in the
absence of contention.
Unlike the worst-case complexity, there are no known
tight bounds on the contention-free complexity of con-
current tasks. The proofs of lower bounds on the worst-case
complexity usually involve constructing runs with many
processes interfering with each other, and do not imply
lower bounds on the contention-free complexity. This
means that it may be possible to improve existing algo-
rithms by modifying them to have better contention-free
complexity. This calls for research with the goal of obtaining
tight bounds on contention-free complexity, and to under-
stand the relationship between contention-free and worst-
case complexity for a variety of problems. This paper
initiates such an investigation by presenting bounds for the
contention-free time complexity for mutual exclusion in a
shared memory with atomic registers, and for the naming
problem in a shared memory that supports different types of
readmodifywrite bits.
1.2. Measures of Time Complexity
There is no agreement on the appropriate measure of time
complexity in asynchronous shared-memory systems. Most
suggestions agree that the time it takes to access the shared
memory is the dominating factor, since this is typically
much slower then the time required for local computation.
The traditional measure of time complexity accounts for the
total number of accesses to the shared registers. We define
the step complexity of a task to be the maximum number of
article no. 0034
620890-540196 18.00
Copyright  1996 by Academic Press, Inc.
All rights of reproduction in any form reserved.
* An abbreviated version of this paper appears in the Proceedings of the
13th Annual ACM Symposium on Principles of Distributed Computing,
August 1994.
File: 643J 256102 . By:CV . Date:04:06:96 . Time:16:16 LOP8M. V8.0. Page 01:01
Codes: 6647 Signs: 6125 . Length: 56 pic 0 pts, 236 mm
times any process accesses shared registers. For each
problem, the worst-case and contention-free step com-
plexities can be defined in the obvious way. For instance, the
worst-case step complexity of an algorithm for solving con-
sensus is the maximum number of times any process
accesses shared registers, from its start until it decides.
The definition of step complexity does not always capture
the real time complexity of an algorithm [PF77]. If one
process is in a loop waiting for another process to complete
some action, then increasing the execution speed of the first
process increases its number of steps, without necessarily
degrading the total system performance. This leads to the
following alternative definition (from [PF77]). Assume
that every step of every process takes at most 1 time unit.
Thus, if two processes start at the same time and one takes
100 steps while the other takes 5 steps, then the total time
elapsed is at most 5 units. Time complexity is then defined
by counting time units instead of steps. For our investigation,
it is enough to notice that in the absence of contention, this
last complexity measure and step complexity are the same.
Recently, there have been many suggestions to measure
time complexity differently. Yang and Anderson propose to
distinguish between a remote and a local access to shared
memory [YA93]. Shared registers may be locally accessible
as a result of coherent caching, or when using distributed
shared memory where memory is physically distributed
among the processors. The definition of a remote access is
very delicate and depends on specific architectural details of
a given system (see [YA93] for one possible definition).
Assuming that no process physically owns any portion of
the shared memory, it follows that the first access to a new
shared register by a process must be a remote access (since
the process does not yet have a copy of this register in its
cache). Thus, the number of different registers accessed by a
process in a given run is a lower bound on the number of its
remote accesses. We call this measure register complexity.
As in step complexity, for each problem, we can define
worst-case register complexity and contention-free register
complexity. For instance, the contention-free register com-
plexity of a consensus algorithm is the maximum number of
different registers accessed by a process along runs in which,
while this process is executing, all other processes have
either decided, or failed, or not started. Observe that in the
contention-free runs, once a process obtains a local copy,
there is no other process executing concurrently that can
invalidate the local copy, and hence, it is reasonable to
assume that the number of different registers accessed
accurately reflects the number of remote accesses.
It is clear that register complexity is a lower bound for
step complexity. Depending on the architecture, one of the
two measures more accurately reflects the time taken. It is
also worth noting that register complexity is different from
the space complexity, which is usually defined to be the total
number of shared registers used in the algorithm.
Finally, note that the definitions of (worst-case) step and
register complexity are different from the definition of time
complexity explored by Dwork et al. [DHW93], which
accounts for the contention levelthe number of processes
that access the same shared memory simultaneously. For
example, if n processes are trying to access the same memory
location simultaneously, then according to our measures,
it will cost each one of them one step. According to
[DHW93], it will cost the i th process i steps because it
must wait for the other i&1 processes to finish.
1.3. Bounds for Mutual Exclusion
The mutual exclusion problem is to design a protocol that
guarantees mutually exclusive access to a critical section
among a number of competing processes [Dij65]. We
assume that the shared memory supports only atomic
registers. The contention-free step (register) complexity of a
mutual exclusion algorithm is measured by counting the
number of accesses to the shared registers (the number of
different shared registers accessed) in the entry code and the
exit code, when no other processes are competing. Lamport
presented a mutual exclusion algorithm for n processes
that has constant contention-free step complexity [Lam87].
However, these accesses involve reading and writing
registers of size log n bits.
An interesting question is whether it is possible to further
improve Lamport’s solution by reducing the number of
times shared bits need to be accessed. In particular, is there
an algorithm in which the number of times a process only
needs to access a constant number of shared bits before
entering its critical section, in the absence of contention?
Our first result implies that no such algorithm exists. We
show that the contention-free step complexity of every
mutual exclusion algorithm for n processes is greater than
(log n)(l&2+3 log log n), where l is the size (in terms of
bits) of the biggest register accessed by the algorithm
in one atomic step. We also prove a lower bound of
- (log n)(l+log log n) on the contention-free register
complexity. This implies that the contention-free register
complexity cannot be only a constant number of bits. For
upper bounds, we give an algorithm with contention-free
step as well as register complexity O(W(log n)lX).
In modern multiprocessors, it is possible to load or store
32-bit or even 64-bit words, so having registers of size log n
bits poses no problem. However, although our study con-
cerns registers of smaller size, and is mainly of theoretical
interest, it may be of practical value as well. Some recent
shared-memory systems support access to words of memory at
different granularities (i.e., multi-grain atomic reads and
writes). Thus, several registers of smaller size can be packed
into one word of memory, enabling reads or writes to all or a
subset of them in one atomic step. This was demonstrated by
Michael and Scott [MS93], who improve the performance
63CONTENTION-FREE COMPLEXITY
File: 643J 256103 . By:CV . Date:04:06:96 . Time:16:16 LOP8M. V8.0. Page 01:01
Codes: 6331 Signs: 5416 . Length: 56 pic 0 pts, 236 mm
of Lamport’s algorithm [Lam87] and the authors’ algorithm
[AT92] by more than 250, by exploiting the ability to
read and write atomically at both full- and half-word
granularities.
Note that in the worst-case, a process may be forced to
take an unbounded number of steps before entering its criti-
cal section; that is, the worst-case step complexity of mutual
exclusion is infinity [AT92]. The algorithm presented by
Kessels [Kes82], which is based on the idea of using a tour-
nament tree from [PF77], uses only shared bits and has
O(log n) worst-case register complexity. No lower bound is
known for the worst-case register complexity for large
registers.
Yang and Anderson independently study the time com-
plexity of mutual exclusion [YA94]. The results imply (1)
an 0(logw n) lower bound on the contention-free step com-
plexity and an 0(- logw n) lower bound on the contention-
free register complexity, where w is the maximum number of
processes that may simultaneously write the same register,
and (2) an 0(logc n) lower bound on the contention-free
register complexity, where c is the maximum number of
processes that may simultaneously access the same register.
In another related paper [BL93], it is shown that any
deadlock-free mutual exclusion algorithm for n processes in
an asynchronous system must use n shared registers.
1.4. Bounds for Naming
In Section 3, we consider the naming problem. The goal
of the naming problem is to assign unique names to a collec-
tion of initially identical processes. The processes may crash
(i.e., stopping failures), and the solution is required to be
wait-free; that is, it should guarantee that every participat-
ing process will always be able to terminate in a finite num-
ber of steps, regardless of the behavior of other processes
[PF77, Her91]. We consider a shared memory system that
supports atomic access only to individual bits, and consider
a variety of models corresponding to different combinations
of allowed operations, such as test-and-set, read, and test-
and-flip. (This last operation flips the bit and returns the old
value, and is similar to the balancer of [AHS91].)
Our results give tight lower and upper bounds on worst-
case and contention-free step and register complexities of
naming in some of these models. These results, admittedly
not of direct practical interest on their own, demonstrate the
differences between contention-free and worst-case com-
plexity and between step and register complexity. For
instance, if the model supports only the test-and-set opera-
tion, n&1 is a tight bound on all four measures. Strengthen-
ing the model with a read operation lowers both measures
of the contention-free complexity to log n. Introducing test-
and-reset to this model lowers the worst-case register com-
plexity to log n, whereas with the availability of test-and-flip
operation, log n is a tight bound on all four measures. Thus,
a consideration of all four measures of time complexity
brings out the differences in the computational powers of
different primitives.
2. MUTUAL EXCLUSION
2.1. Problem Definition
The mutual exclusion problem is to design a protocol that
guarantees mutually exclusive access to a critical section
among a number of competing processes [Dij65]. The solu-
tion to the problem should satisfy: (a) no two processes are
in their critical section at the same time (mutual exclusion),
and (b) if some process p starts executing its algorithm, then
eventually some process (possibly different from p) is in its
critical section (deadlock freedom). A weaker version of the
problem replaces the requirement of deadlock freedom by
weak deadlock freedom, which requires deadlock freedom
only in the absence of contention.
We consider the mutual exclusion problem in a shared
memory environment that supports only atomic registers;
that is, a process can only either read or write a shared
register in one atomic step. We assume that there are n com-
peting processes, where each process is assumed to have a
unique identifier from [1, ..., n]. We define the atomicity of
an algorithm to be the size (in terms of bits) of the biggest
register accessed in one atomic step. We will use l to denote
the atomicity of an algorithm.
2.2. Complexity Measures
We outline our formal model of distributed systems,
followed by definitions of the complexity measures.
Processes are modeled as (possibly infinite) state-machines
communicating by writing and reading from shared
registers.
A (global) state of the system includes the state of each
process and the values of all shared registers. An event is a
single step of some process, and is either an access to a
shared register, or simply an update of the internal state of
a process.
As in the standard interleaving semantics, a run _ of the
system is an alternating sequence s0 w
e0 s1 w
e1 } } } of states
si and events ei such that (1) the initial state s0 satisfies the
initial condition, and (2) every state si+1 is derived from the
previous state si by executing the event ei . Each event
belongs to a unique process. The notion of a run, as defined
here, captures the asynchronous nature of a system, where
no assumptions are made about the relative speeds of the
processes.
For a run _ and indices i j, the run fragment _ij denotes
the subsequence si w
ei } } } wwej&1 sj of _. The step com-
plexity of a process p in a run fragment _ij is the number of
events in _ij that involve access to shared registers and
64 ALUR AND TAUBENFELD
File: 643J 256104 . By:CV . Date:04:06:96 . Time:16:16 LOP8M. V8.0. Page 01:01
Codes: 6192 Signs: 5068 . Length: 56 pic 0 pts, 236 mm
belong to p. The register complexity of a process p in a run
fragment _ij is the number of different shared registers
accessed by p in this run fragment.
The worst-case step (register) complexity of an algorithm
for mutual exclusion gives an upper bound on the time
spent (the number of registers accessed) by the winning pro-
cess in its entry code and exit code. In the formal definition
we give below, for simplicity it is assumed that a process
does not take any steps in its critical section. The worst-case
step (register) complexity of the entry code of an algorithm
is the maximum step (register) complexity of a process p in
a run fragment _ij=si w
ei } } } wwej&1 sj such that
1. process p is in its entry code in state si , and is in its
critical section in state sj , and
2. no process is in its exit code or critical section in
states sk for ik< j.
The second condition ensures that we start counting the
number of steps (registers) of the winning process p only
after the processes previously in the critical section have
finished their exit code. The worst-case step (register) com-
plexity of the exit code of an algorithm is the maximum step
(register) complexity of a process p in a run fragment _ij
such that process p is in its critical section in state si and in
its exit code in state sj . The worst-case step (register) com-
plexity of an algorithm is the sum of the worst-case step
(register) complexity of its entry code and exit code.
The contention-free step (register) complexity of an algo-
rithm is the maximum step (register) complexity of a
process p in a run fragment _ij such that
1. process p is in its entry code in state si , and is in its
exit code in state sj , and
2. in all states sk , ik j, process p is not in its
remainder region, and all other processes p${p are in their
remainder regions.
It is known that the worst-case step complexity of mutual
exclusion is unbounded [AT92]. The worst-case register
complexity can be as low as log n, even when atomicity is
only 1 (i.e., all shared registers are bits) [Kes82], and there
is no known matching lower bound. We consider bounds
for contention-free complexity. In our results, sometimes we
make finer distinctions such as read-step complexity and
write-step complexity (or read-register, write-register) to
refer separately to the number of read and write accesses. As
usual, the complexity of a problem is defined as the com-
plexity of the best algorithm that solves it.
2.3. The Contention Detection Problem
The lower bounds are proven for a much weaker
problem, namely, the problem of detecting contention. The
contention detection problem requires that when a process is
activated, it executes its protocol and terminates with some
output value from [0, 1] such that
v in every run, at most one process terminates with out-
put value 1, and
v in a run in which only one process p is activated, p ter-
minates with output 1.
The contention detection problem can be viewed as a single-
shot mutual exclusion problem (that is, a process entering
its critical section stays there forever), where only weak
deadlock freedom is required as a liveness property. A solu-
tion to the mutual exclusion problem can be used to solve
the contention detection problem as follows. A process
entering its critical section sets some special bit to 1, and
outputs 1, and every other process that notices that the bit
is set to 1, returns 0. This implies that
Lemma 1. A lower bound on any time complexity (worst-
case or contention-free, step or register) of the contention
detection problem is also a lower bound on the corresponding
time complexity of the mutual exclusion problem.
2.4. Lower Bound on Contention-Free Step Complexity
Let A be an algorithm for detecting contention. For each
process p, let run( p) denote the (finite) sequence of accesses
to the shared registers by process p in a run where only pro-
cess p is activated. Let W( p, m) be the pair (x, v), where x
is a shared register and v is a value, if the m th write opera-
tion along run( p) is write(x, v) (W( p, m) is undefined if no
such operation exists). If W( p, m) is (x, v), we will also write
Wr( p, m)=x and W v( p, m)=v. For a process p, let R( p) be
the set of shared registers x such that run( p) contains the
operation read(x). We first prove the following lemma.
Lemma 2. Let A be an algorithm for detecting conten-
tion, and let p1 and p2 be distinct processes. Then for some m,
W( p1 , m) and W( p2 , m) are well defined, with W( p1 , m){
W( p2 , m), and either Wr( p1 , m) # R( p2) or Wr( p2 , m) #
R( p1).
Proof. Consider two distinct processes p1 and p2 .
Suppose each of the runs run( p1) and run( p2) contain w
write operations. (If the two runs do not contain the same
number of operations, then we can introduce some dummy
write operations at the end of one of the runs.) The proof
will be by contradiction. Assume that for all 1mw,
if W( p1 , m) { W( p2 , m), then W r( p1 , m)  R( p2) and
W r( p2 , m)  R( p1). We will merge the runs run( p1) and
run( p2) to obtain a run in which both p1 and p2 terminate
with output value 1, violating the safety requirement of the
problem.
65CONTENTION-FREE COMPLEXITY
File: 643J 256105 . By:CV . Date:04:06:96 . Time:16:16 LOP8M. V8.0. Page 01:01
Codes: 6258 Signs: 4995 . Length: 56 pic 0 pts, 236 mm
We will inductively construct the runs _0 , _1 , ..., _w , such
that at the end of each _i , for each process pj , for j=1, 2, the
following conditions hold: (i) the sequence of operations by
process pj in _i is precisely the prefix of run( pj) up to the i th
write operation, and (ii) for each x # R( pj), the value of x is
either its initial value or the value written by pj in its last
write operation to x. Thus at the end of each _i , p1 is
‘‘hidden’’ from p2 and vice versa. Let _0 be the empty run.
Now assume that we have a run _i that satisfies both the
conditions (i) and (ii), and we describe how to extend it to
obtain _i+1. First, let p1 execute read operations until it is
ready to execute its (i+1)th write operation. Then let p2
execute its read operations and also its (i+1)th write opera-
tion. Now let p1 execute its (i+1)th write operation. Either
both processes write the same value to the same register, or
each process writes to some register that is never read by the
other one. Then _i+1 satisfies the requirements (i) and (ii).
Thus, we can construct inductively the run _w in which both
processes terminate with output value 1, a contradiction. K
Now we prove the following lemma, which relates the
number of processes, atomicity, the number of write opera-
tions, and the number of different registers read.
Lemma 3. Let A be an algorithm for detecting conten-
tion among n processes, with atomicity l, contention-free
read-register complexity r, and contention-free write-step
complexity w. Then
w } l+w } log(w2 } r+w } r2)log n.
Proof. Let A be the given algorithm for detecting con-
tention among a set P of n processes. For each p # P, run( p)
contains at most w write operations, and the size of R( p) is
at most r. We will construct a sequence P0 , P1 , ..., each
PiP, and a sequence W0 , W1 , ..., each Wi[1, ..., w]
such that for each i,
1. n(2l } w } r } (w+r)) i } |Pi |,
2. |Wi |=i, and
3. for all p, p$ # Pi and j # Wi , W( p, j)=W( p$, j).
Initially, let P0 be P and W0=<. Now, given Pi and Wi
satisfying the above three conditions, we show how to
obtain Pi+1 and Wi+1. Let |Pi |=ni . First, consider the
following definitions.
v A register x is a read-witness for a process p # Pi if
x # R( p) and there exists a set P$Pi such that |P$|
ni (w+r), and for each p$ # P$, there exists j  Wi with
Wr( p$, j)=x.
v A register x is a write-witness for a process p # Pi , if
Wr( p, j)=x for some j  Wi , and there exists a set P$Pi
such that |P$|ni(w+r), and for each p$ # P$, x # R( p$).
From Lemma 2, and the assumption that condition 3 is
satisfied, we get that for every distinct p, p$ # Pi , there exists
j  Wi such that either Wr( p, j) # R( p$) or Wr( p$, j) # R( p).
Each process reads at most r registers and writes at most w
registers. Hence, each p # Pi has either a read-witness or a
write-witness. It follows that either some process has a read-
witness or all processes have write-witnesses. Next we prove
that in either case there exists a register x and a set P$Pi
of processes such that |P$|ni r(w+r) and for each process
p$ # P$ there exists j  Wi such that Wr( p$, j)=x.
Case 1. There exists a process p # Pi that has a read-wit-
ness. Let x be the read-witness for p. By definition, there is
a set P$Pi such that |P$|ni (w+r) and for each process
p$ # P$ there exists j  Wi such that Wr( p$, j)=x.
Case 2. Each process p # Pi has a write-witness. Each
write-witness is read by ni (w+r) number of processes.
Since each process reads at most r registers, there are at
most ni } r number of different read operations along the
(contention-free) runs of processes in Pi . This implies that
at most r } (w+r) different registers can be write-witnesses.
Since each process has a write-witness, it follows that some
register x is a write-witness for at least ni r(w+r) processes.
That is, there is a set P$Pi such that |P$|nir(w+r) and
for each process p$ # P$ there exists j  Wi such that
Wr( p$, j)=x.
Since each run has at most w write operations, it follows
that there exists j  Wi and a set P"P$ of size at least
ni rw(w+r) such that for each p$ # P", Wr( p$, j)=x. Let
Wi+1=Wi _ [ j]. Since there are only 2 l different values
that can be written to a register, there exist a value v and a
set Pi+1P" of size at least ni2lrw(w+r) such that for
each process p$ # Pi+1 , W( p$, j)=(x, v). Thus, we have the
desired sets Pi+1 and Wi+1.
Since each run contains at most w write operations, from
Lemma 2, it follows that the set Pw can contain at most one
process. This implies that n(2l } w } r } (w+r))w, and hence
log nw } l+w } log(w2 } r+w } r2). K
The proof of this lemma views a run of the processes as a
set of registers (corresponding to read operations) and a
sequence of pairs of registers and values (corresponding to
write operations). Lemma 2 states some properties of such
runs using the problem requirements, whereas the proof
of Lemma 3 is purely combinatorial: it involves only a
counting argument for runs satisfying Lemma 2.
Lemma 4. Let A be an algorithm for detecting conten-
tion among n processes, with atomicity l and contention-free
step complexity c. Then
c >
log n
l&2+3 log log n
.
66 ALUR AND TAUBENFELD
File: 643J 256106 . By:CV . Date:04:06:96 . Time:16:16 LOP8M. V8.0. Page 01:01
Codes: 5420 Signs: 3661 . Length: 56 pic 0 pts, 236 mm
Proof. Let r be the contention-free read-step com-
plexity, and let w be the contention-free write-step com-
plexity. Since c denotes the contention-free step complexity,
by definition,
cw+r. (1)
Before a process terminates, it must read (and write) at least
once. That is, r1 (and w1) and hence
c&1w. (2)
From Lemma 3,
w } l+w } log(w2 } r+w } r2)log n. (3)
Using the above inequalities, we will find a good
approximation for c. From (1), w+r=c$ for some c$c.
The expression w2 } r+w } r2 gets its maximum value when
w=r=c$2, and hence it is bounded from above when w
and r are assigned the value c2. That is, for w+rc,
c3
4
=\c2+
2
}
c
2
+
c
2
} \c2+
2
w2 } r+w } r2. (4)
From (3) and (4), it follows that,
w } l+w } log
c3
4
log n. (5)
Clearly, the left-hand side of inequality (5) has maximum
value when w is maximum. Hence by (2), we can substitute
c&1 for w to obtain
(c&1) } l+(c&1) } log
c3
4
log n, (6)
which is the same as
(c&1) } log(2l&2 } c3)log n. (7)
To solve (7), we use the following simple fact. For positive
k and m,
if c } log(k } c3)m then c >
m
log(k } m3)
. (8)
Using (8), we get from (7) that
c >
log n
l&2+3 log log n
. (9)
From Lemma 1 and Lemma 4, we get the following lower
bound for mutual exclusion:
Theorem 1. Let A be a (weak) deadlock-free mutual
exclusion algorithm for n processes, with atomicity l and
contention-free step complexity c. Then
c >
log n
l&2+3 log log n
.
One immediate consequence is that, although the conten-
tion-free step complexity may be a constant when the
atomicity is big enough, the number of times a process needs
to access shared bits cannot be a constant, regardless of the
level of atomicity. More precisely, in every algorithm with
atomicity l and contention-free step complexity c, the
number of times some process needs to access shared bits in
the absence of contention is at least l+c&1.
2.5. Lower Bound on Contention-Free Register Complexity
Now we consider the contention-free register complexity.
Since it is possible to access the same register several times,
Theorem 1 says nothing about the number of different
registers that must be accessed. However, it is possible
to prove that there is no algorithm in which only a con-
stant number of distinct shared bits are accessed by a
process before entering its critical section, in the absence of
contention.
Let A be an algorithm for detecting contention among n
processes. Let c be the contention-free register complexity
of A, and let w be the contention-free write-register com-
plexity of A. For each process p, let run( p) denote the
sequence of accesses to the shared registers by process p
when only p is activated. Let wr( p) denote the sequence of
registers written by p along run( p) such that a register x
appears before a register y along wr( p) iff along run( p), the
first write operation to x precedes the first write operation
to y. The length of wr( p) is bounded by w, and for the worst-
case analysis in our proofs, it suffices to assume that the
length of wr( p) equals w for every process p. For 1mw,
let wr( p, m) denote the m th register of the sequence wr( p).
The contention-free run run( p) can be divided in w+1
stretches as follows. Suppose wr( p) is the sequence
x1 x2 } } } xw . The 0th stretch contains only read operations.
The first stretch starts with a write operation to x1 , and con-
tains the subsequence of run( p) up to the first write to x2 .
Thus, all of its write operations involve x1 . The second
stretch starts with a write operation to x2 , and contains
writes to the registers x1 and x2 . Thus, at the beginning of
the i th stretch, p is about to write to xi , and all registers,
except x1 , ..., xi&1 , have their initial values. Observe that
the i th stretch contains arbitrarily many read and write
67CONTENTION-FREE COMPLEXITY
File: 643J 256107 . By:CV . Date:04:06:96 . Time:16:16 LOP8M. V8.0. Page 01:01
Codes: 6328 Signs: 5152 . Length: 56 pic 0 pts, 236 mm
operations, but its every write involves one of the registers
x1 , ..., xi . Let wv( p, i, j) denote the value of xj at the begin-
ning of the ith stretch. That is, wv( p, i, j) is the initial value
of xj if i j, and if j<i then it is the last value written by p
to xj before its first write to xi . For the worst-case analysis,
it suffices to assume that the i th stretch contains at least one
write operation to each of the registers x1 , ..., xi . Thus, each
process p in its i th stretch writes wv( p, i, j) to xj for j<i.
For a set P of processes, let shared(P) be the set of indices
1mw for which there are two distinct processes p, p$ # P
such that wr( p, m) is read by p$ along run( p$). Let
common(P) be the set of indices 1mw such that for
all p, p$ # P, wr( p, m)=wr( p$, m) and wv( p, m$, m)=
wv( p$, m$, m) for all 1m$w.
Lemma 5. Let P be a set of processes such that
shared(P)=common(P). Then |P|<2 } w!.
Proof. Let P be a set of n processes. For the worst-case
analysis, assume that shared(P)=common(P)=[1, ..., w].
Hence, for all p, p$ # P, wr( p$)=wr( p), let this sequence of
registers be x1 } } } xw . Also for all p, p$ # P, and 1m,
m$w, wv( p$, m, m$)=wv( p, m, m$). We will denote the
value of xj at the beginning of the i th stretch by vij . All
processes in P agree on these values by assumption.
We combine the contention-free runs of the processes
in P. We inductively construct the runs _1 , _2 , ..., _w , such
that at the end of each _i the following conditions hold:
1. there is a set Pi of ni! processes such that for every
p # Pi , the sequence of operations by p in _i is the same as
the prefix of run( p) containing the first (i&1) stretches,
2. for every j<i, there is a set Pij of ni! processes such
that each p # Pij is about to write the value v
i
j to the register
xj , and
3. _i does not contain any write operations to the
registers xi , ..., xw .
Constructing the run _1 is easy: simply let each process
p # P execute its 0th stretch one after the other. (Recall that
the 0th stretch contains only read operations.) The set P1
equals P.
Now suppose we have constructed the run _i satisfying
the above three conditions. Each process in Pi is about to
execute its i th stretch. For a process p # Pi to be able to
execute its ith stretch, it is required that the value of each
register xj , for j{i, be vij . The value of xi does not matter
because p is about to write to xi . Each process in Pij is about
to write vij to xj for j<i. For j>i, the register xj has not yet
been changed, and its value at the end of _i is same the as
its initial value, which also equals vij . Hence, we can let one
process from each of the sets Pij , j<i, execute one write
operation, and then let a process in Pi execute its i th stretch
(note that |Pi |=|Pij |, for each j<i).
We let |Pi |(i+1) processes execute their i th stretch to
completion, and this constitutes the set Pi+1 of processes
ready to execute the next stretch. For each ji, each pro-
cess writes the value vi+1j to the register xj along its i th
stretch (the last value it writes to xj along this stretch).
Hence, for each ji, we let |Pi |(i+1) processes execute
their i th stretch only partially, and suspend these processes
just before they are about to write the value vi+1j to the
register xj . This gives us the sets Pi+1j . The resulting run is
the desired _i+1.
If n2w!, then at the end of _w we have |Pw |2. Each
process is about to write a value to xw , and for each j<w,
we have two processes about to write the value vwj to xj .
Hence, as before, we can let both processes in Pw execute
their last stretch, violating the safety requirement. K
Next we use the above lemma to obtain a relationship
among the number of processes, the contention-free register
complexity, and atomicity.
Lemma 6. Let A be an algorithm for detecting conten-
tion among n processes, with atomicity l, contention-free
write-register complexity w, and contention-free register
complexity c. Then,
n<2w! } (4c } w!)c } (w } 2lw)w.
Proof. Let A be the given algorithm for detecting con-
tention among a set P of n processes. We construct a
sequence P0 , P1 , ... of processes, a sequence R0 , R1 , ... of
shared registers, and a sequence W0 , W1 , ... of sets of
indices, each Wi [1, ..., w], such that for each i,
1. n(4c } w!)i } (w } 2lw) |Wi | } |Pi |,
2. |Ri |+ |Wi |=i,
3. for all x # Ri , for each p # Pi , x is read, but not writ-
ten, by p along run( p), and
4. Wi common(Pi).
Initially, let P0 be P and R0=W0=<. Now, given Pi ,
Ri , and Wi satisfying the above conditions, we show how to
obtain Pi+1 , Ri+1 , and Wi+1. Let |Pi |=ni .
First we construct an edge-labeled (undirected) graph Gi
as follows. The graph contains one node for every process p
in Pi . Let x be a shared register such that x  Ri _
[wr( p, m) | p # Pi and m # Wi]. There is an edge between p
and p$ labeled with x if both p and p$ access x along their
contention-free runs.
Let Q be an independent set of vertices (i.e. if Gi has an
edge between p and p$ then at least one of p or p$ is not in
Q). From the construction it follows that, for 1mw, if
m # shared(Q) then m # Wi , and hence, m # common(Q).
From Lemma 5, it follows that |Q|<2w!.
Since the graph Gi does not contain an independent set of
size 2w!, it must contain a vertex p with degree at least
68 ALUR AND TAUBENFELD
File: 643J 256108 . By:CV . Date:04:06:96 . Time:16:16 LOP8M. V8.0. Page 01:01
Codes: 5558 Signs: 3637 . Length: 56 pic 0 pts, 236 mm
ni (2w!). The process p accesses at most c registers in its
contention-free run, and hence, the edges incident on p have
at most c different labels. Thus, there is a register x, and a
set Q1 of processes such that each process in Q1 accesses x
along its contention-free run, and |Q1 |n$ where n$=ni
(2c } w!). Let Q2 Q1 be the set of processes q # Q1 such
that q writes to x along run(q). There are two cases to be
considered.
v Case 1. |Q2 |<n$2. We choose Pi+1 to be the set
Q1&Q2 , Ri+1 to be the set Ri _ [x], and Wi+1=Wi . Each
process q # Pi+1 reads x without writing to x along run(q).
Thus condition 3 is satisfied for the register x. |Pi+1 |>n$2,
where n$=ni (2c } w!), implying condition 1.
v Case 2. |Q2 |n$2. For each q # Q2 , x appears along
wr(q). The length of wr(q) is w, hence, there is a subset
Q3 Q2 for which there exists 1mw such that
wr(q, m)=x for all q # Q3 , and |Q3 ||Q2 |w. For each
q # Q3 , for each j, wv(q, j, m) can have 2l different values.
Hence there is a subset Q4 Q3 of processes such that for all
q, q$ # Q4 and for all 1 jw, wv(q, j, m)=wv(q$, j, m),
and |Q4 ||Q3 |2lw. Thus, m # common(Q4). Setting
Pi+1=Q4 , Ri+1=Ri , and Wi+1=Wi _ [m] satisfies all
the conditions.
Since the contention-free register complexity is c and
contention-free write-register complexity is w, we have
|Ri _ Wi |c and |Wi |w. Thus the above procedure can
be repeated at most c times. Furthermore, from Lemma 5,
|Pc |<2w!. This implies
n<2w! } (4c } w!)c } (w } 2lw)w. K
Now the above lemma can be used to prove a lower
bound on the contention-free register complexity.
Theorem 2. Let A be an algorithm for solving the con-
tention detection problem, or for (weak) deadlock-free mutual
exclusion. Let n be the number of processes, l be the
atomicity, and c be the contention-free register complexity.
Then,
c log nl+log log n .
Proof. From Lemma 6, we get
n<2w! } (4c } w!)c } (w } 2lw)w, (1)
where w is the contention-free write-register complexity.
Since w!ww, we have
l } w2+1+2c+(c+2) } w } log w+c } log c>log n. (2)
Since cw, we can substitute c for w in (2) to obtain
l } c2+1+2c+(c+3) } c } log c>log n. (3)
It is easy to check that
1+2c+(c+3) } c } log c(c+1)2 } log(c+1)2. (4)
Thus, putting (3) and (4) together, we get that
l } c2+(c+1)2 } log(c+1)2>log n, (5)
which gives us
(c+1)2 } log(2l } (c+1)2)>log n. (6)
Using the simple fact, that for every positive x, y and z,
if x } log( y } x)z then x>
z
log( y } z)
(7)
it follows immediately from (6) that
(c+1)2 >
log n
l+log log n
. (8)
The bound of the theorem follows, using the fact that c must
be an integer. K
Even though the lower bound on contention-free register
complexity might not be tight, it implies that the total
number of different shared bits accessed by a process cannot
be constant, even in the absence of contention.
2.6. Efficient Contention-Free Algorithms for Mutual
Exclusion
As far as upper bounds are concerned, for Lamport’s fast
mutual exclusion algorithm [Lam87], the atomicity is
log n, and the contention-free complexity is a constant. For
atomicity smaller than log n, we can prove the following
result.
Theorem 3. For every 1llog n, there is a deadlock-
free mutual exclusion algorithm (and hence also an algorithm
for detecting contention) for n processes, with atomicity l,
contention-free step complexity 7 W(log n)l X, and conten-
tion-free register complexity 3 W(log n)l X.
Proof. We use Lamport’s fast algorithm as a basic
building block. As already mentioned, in this algorithm, in
the absence of contention a process needs to access the
shared memory five times in order to enter its critical section
and twice in order to exit ita total of seven accesses. Only
3 different registers are accessed. When the atomicity is l, the
algorithm can handle 2l processes.
69CONTENTION-FREE COMPLEXITY
File: 643J 256109 . By:CV . Date:04:06:96 . Time:16:16 LOP8M. V8.0. Page 01:01
Codes: 6235 Signs: 4705 . Length: 56 pic 0 pts, 236 mm
We design an algorithm for n processes by constructing
an 2l-ary tree where each node is a copy of Lamport’s solu-
tion for 2l processes, using a separate set of registers. To
enter its critical section process i starts participating in the
protocol at leaf Wi2lX (level 1) and advances towards the
root of the tree. A process advances to level i if it is a winner
at level i&1. A process can enter its critical section when it
is the winner at the root. To exit its critical section process
i simply executes the exit code in all the nodes in its path
from the leaf to the root. The depth of the tree is
Wlog(2l) nX=W(log2 n)l X, and in the absence of contention a
process needs 7 accesses to 3 different registers at each node
it visits. The result follows. K
The idea of using a binary tree for solving mutual exclu-
sion is due to Peterson and Fischer [PF77]. In the above
solution, the number of accesses differ for different values of
l. However, the number of times that bits are accessed in the
absence of contention is O(log n), for all values of l. Even
though the contention-free time complexity of mutual exclu-
sion is about the same as that of contention detection, their
worst-case time complexities are quite different. Contention
detection can be solved by an algorithm whose worst-case
step complexity is Wlog nl X. For mutual exclusion for two
or more processes, in the presence of contention it is always
possible to schedule the processes in a way that will force the
winner to busy wait before entering its critical section.
The bounds on the four complexity measures for
deadlock-free mutual exclusion for n processes, with
atomicity l, are summarized in the following table:
Bounds for mutual exclusion
Lower bound Upper bound
Contention-free register  log nl+log log n 3 log nl |
(Theorem 2) (Theorem 3)
Contention-free step
log n
l&2+3 log log n
7 log nl |
(Theorem 2) (Theorem 3)
Worst-case register  log nl+log log n O(log n) [Kes82]
(Theorem 2)
Worst-case step  [AT92] 
3. NAMING
In order to further demonstrate the differences among dif-
ferent notions of time complexity, we consider a very simple
problem. The naming problem is to assign unique names to
initially identical processes. A desired property of a solution
is that the name space from which the names are assigned is
small. A solution to the problem is required to be wait-free,
that is, it should guarantee that every participating process
will always be able to obtain a unique name and terminate
in a finite number of steps, regardless of the behavior of
other processes (such as abnormal termination).
3.1. Models
For process communication, we use the shared memory
model with shared binary registers (bits) initialized to some
known values. It is easy to see that if in one atomic step a
process can either read or write a shared register, but cannot
do both, then the naming problem is not solvable deter-
ministically, since it is not possible to break symmetry.
A probabilistic solution for the atomic read and write model
is presented in [LP90]. Since we are only interested here in
deterministic solutions, we need to make stronger atomicity
assumptions. Below we list the eight different operations a
process may apply to a single bit. Each operation is defined
in terms of how it affects the bit, and whether or not it
returns a value.
1. skip: has no effect on the value of the bit and does
not return a value.
2. read: has no effect on the value of the bit, and
returns the current value.
3. write-0: the value 0 is assigned to the bit, and no
value is returned.
4. test-and-reset: the value 0 is assigned to the
bit, and the old value is returned.
5. write-1: the value 1 is assigned to the bit, and no
value is returned.
6. test-and-set: the value 1 is assigned to the bit,
and the old value is returned.
7. flip: flips the value of the bit, and does not return
a value.
8. test-and-flip: flips the value of the bit, and
returns the old value.
A model is a subset of these eight operations, and it
defines the operations that are supported for each shared
bit. (Thus, there are 28 different models.) For example, in
the model [read, write-0, write-1], in one atomic
step a process can either read or write a shared bit, but
cannot do both. The model which includes all the eight
operations is called the readmodifywrite model in the
literature. The test-and-flip operation is sometimes
called the fetch-and-complement operation.
3.2. Complexity Measures
As in mutual exclusion, we consider the step and register
complexities in the worst and the contention-free cases. The
worst-case step (register) complexity of a naming algorithm
is the maximum step (register) complexity of a process p in
a run fragment _ij=si w
ei } } } wwej&1 sj such that both the
70 ALUR AND TAUBENFELD
File: 643J 256110 . By:CV . Date:04:06:96 . Time:16:16 LOP8M. V8.0. Page 01:01
Codes: 6256 Signs: 5390 . Length: 56 pic 0 pts, 236 mm
extreme events ei and ej&1 belong to p. For the contention-
free case, we take the maximum over run fragments _ij such
that both ei and ej&1 belong to p, and for every process p$
either p$ has terminated (or failed) in state si or p$ has not
started in state sj . Thus, in the contention-free case, there is
no interference from other processes while p executes; every
process either decided (or failed) before p starts, or starts
only after p finishes.
Observe that the operations write-0 and write-1 are
duals of each other, and similarly, the operations test-
and-reset and test-and-set are duals of each other.
Two models M and M$ are duals of each other if M is
obtained by replacing each operation of M$ by its dual
operation. For the operations skip, read, flip, test-
and-flip, each operation is its own dual. It is easy to
prove that, if a model M is the dual of M$ then, for every
measure of time complexity, any bounds applicable to M
also hold for M$.
3.3. Bounds for Naming
In order to further demonstrate the differences among dif-
ferent notions of time complexity, we will find out the con-
tention-free and worst-case, step and register complexities
of solving the naming problem in several of the 28 possible
models. Our goal is not to solve the problem in all models.
For most of the models, it is easy to find tight bounds for
algorithms that solve the naming problem. We will cover
here some of the more interesting models, and leave it as an
exercise for the reader to come up with bounds for the other
models.
In each of the algorithms below, we assume that initially
there are n identical processes, and the algorithms assign the
processes unique names from the set [1, ..., n]. Hence, they
are optimal in terms of the size of the name space. The
resulting upper bounds are summarized in the following
theorem:
Theorem 4. 1. For every model which includes test-
and-flip, there is a naming algorithm whose worst-case
step complexity is log n.
2. For every model which includes both test-and-set
and test-and-reset, there is a naming algorithm whose
worst-case register complexity is log n.
3. For every model which includes test-and-set ,
there is a naming algorithm whose worst-case step complexity
is n&1.
4. For every model which includes test-and-set and
read, there is a naming algorithm whose contention-free step
complexity is log n.
Proof. To prove the first part, we design a tree-based
algorithm. We use n&1 shared bits which are arranged as
a balanced binary tree of depth log n. It is assumed that the
leaves of the tree are numbered 1 through n2. The initial
values of the bits are immaterial. Each process starts at the
root, and follows a path to one of the leaves. At each node
it applies test-and-flip operation to the corresponding
shared bit. At an internal node, if the returned value is 0, the
process advances towards the left subtree, otherwise, it
advances towards the right subtree. At a leaf numbered l, if
the returned value is 0, then the process is assigned the name
2l&1, otherwise, it is assigned the name 2l.
To prove the second part, a similar algorithm is used.
Again, each process follows a path from the root to a leaf.
At each node, since a test-and-flip operation is not
available, the process alternately applies test-and-set
and test-and-reset operations on the shared bit until
either a test-and-set operation returns 0 or a test-
and-reset operation returns 1. The value obtained in the
last operation at a node is used, either to proceed to the next
level, or to assign a name, as in the first part.
To prove the third part, we design the following trivial
algorithm. We use n&1 bits with initial value 0, which are
numbered 1 through n&1. Each process scans the bits, in
the same order, starting with the first bit. At each step, the
process applies the test-and-set operation, and either
moves to the next bit if the returned value is 1, or stops when
the returned value is 0 or if it has already scanned all the
n&1 bits. The process is assigned the name equal to the bit
on which its (last) test-and-set operation returned 0,
or the name n, if no operation returned the value 0.
To prove the fourth part, we slightly modify the previous
algorithm. Again, n&1 bits with initial values 0 are used,
which are numbered 1 through n&1. Each process starts
with a binary search on the array of n&1 bits, trying to find
the least numbered bit whose value is still 0. Such a search
takes exactly log n steps. All the first (log n)&1 steps are
done using read operations only, while the last step of the
binary search is a test-and-set operation. If the last
operation returns 0, then the process stops. Otherwise, it
continues as in the algorithm of the previous paragraph,
moving on to the next bit when the returned value is 1, or
stopping when the returned value is 0 (or if it has already
scanned all the n&1 bits), The process is assigned the name
equal to the bit on which its (last) test-and-set opera-
tion returned 0, or the name n, if no operation returned the
value 0. K
Now we consider lower bounds in different models. The
lower bound proofs need no assumptions about the size of
the name space (it may be infinite), and hence we get
stronger results. The next theorem gives a lower bound on
all our complexity measures in all models.
Theorem 5. For every model, the contention-free
register complexity of every naming algorithm is no less than
log n.
71CONTENTION-FREE COMPLEXITY
File: 643J 256111 . By:CV . Date:04:06:96 . Time:16:16 LOP8M. V8.0. Page 01:01
Codes: 6596 Signs: 5277 . Length: 56 pic 0 pts, 236 mm
Proof. Let A be an arbitrary naming protocol. Let _ be
a run of A in which all the processes are scheduled one at
a time, one after the other, and when a process is scheduled
it runs without interruption until it terminates. Thus, in _
there is no contention, and at any point only one process is
participating in the algorithm. Let R( p, m) be the pair
(x, v), where x is a shared bit and v is a value, if the m th
operation executed by process p along _ accesses the bit x,
and returns the value v. R( p, m) is undefined if p takes less
than m steps, and let v be = if the mth operation does not
return a value. If R( p, m) is (x, v), we will also write
Rr( p, m)=x and Rv( p, m)=v. Since all processes are iden-
tical, and mth step of a process is determined by the values
returned in earlier steps, we can prove the following: For all
processes p, p$, and m, if R( p, m$)=R( p$, m$) for all m$<m,
then (a) R( p, m) and R( p$, m) are defined, (b) Rr( p, m)=
Rr( p$, m), and (c) if Rr( p, m)=Rr( p, m$) for some m$<m,
then Rv( p, m)=Rv( p$, m). From these properties, it is
straightforward to prove that if there are n processes, then
there is some p which takes at least log n steps. K
Recall that register complexity is always bounded above
by step complexity, and contention-free complexity is
always bounded above by worst-case complexity. Thus the
above two theorems imply that for models with test-
and-flip, log n is a tight bound on all complexities. The
next theorem gives lower bounds on the worst-case step
complexity when test-and-flip is not available:
Theorem 6. For every model which does not include
test-and-flip, the worst-case step complexity of every
naming algorithm is no less than n&1.
Proof. We use the following simple observation. If n
processes apply the same operation to the same bit, which
is different from test-and-flip , then the returned
values for at least n&1 processes must be the same. Now
consider the following run. First we activate all the n pro-
cesses and let each one of them take one step. Since the
processes are identical they will apply the same operation
to the same bit and hence n&1 of them will get the same
response. Thus n&1 processes are still identical (i.e., have
the same history). Next we activate these n&1 identical
processes and let each one take one step, and are guaranteed
to end up with a set of at least n&2 identical processes. We
can repeat this procedure n&1 times, and hence at least one
process takes n&1 steps. K
Note that the above lower bound does not apply to the
worst-case register complexity. In fact, from Theorem 4,
with test-and-set and test-and-reset, it can be
only log n. It is possible to prove a similar lower bound if the
model includes only one of these two operations:
Theorem 7. For the model [test-and-set], the con-
tention-free register complexity of every naming algorithm is
no less than n&1.
Proof. First we observe that with only test-and-
set, a shared bit can be changed at most once in a given
run. We will construct a contention-free run in which pro-
cesses execute one after the other. Let us order the processes
p1 , ..., pn . First let p1 execute till it chooses a name, and then
let p2 execute till completion, and so on. It is clear that if pi
does not change any bit, then the next process chooses the
same name as pi does. Hence, each pi , for i<n, changes at
least one bit. For i<n, let xi be the first bit that pi changes
(the bits x1 , ..., xn&1 are all distinct). Since all processes are
identical, it follows that every process following pi accesses
xi . It follows that pn&1 (as well as pn) access the n&1 dis-
tinct registers x1 , ..., xn&1. K
Some of our results are summarized in the following table
that shows tight bounds on various complexity measures for
some of the models. (We use c-f and w-c as abbreviations for
contention-free and worst-case, respectively.)
Tight bounds for naming
Read+ Read+
Test-and- Test-and- test-and-set+ Test-and- rmw
set set test-and-reset flip (all)
c-f register n&1 log n log n log n log n
c-f step n&1 log n log n log n log n
w-c register n&1 n&1 log n log n log n
w-c step n&1 n&1 n&1 log n log n
This table illustrates how different synchronization
primitives can be distinguished by different complexity
measures. If the model supports only test-and-set
operation, n&1 is a tight bound on all four measures.
Strengthening the model with read lowers both measures
of the contention-free complexity to log n. Introducing
test-and-reset to this model lowers the worst-case
register complexity to log n, but the worst-case step com-
plexity still remains n&1. With the availability of test-
and-flip operation, log n is a tight bound on all four
measures. In particular, the traditional measure of worst-
case step complexity fails to distinguish between the models
of the first three columns. A simultaneous consideration of
all four measures of time complexity results in more refined
distinctions among the computational powers of these dif-
ferent primitives.
4. DISCUSSION
In this paper we have focused on the notion of conten-
tion-free time complexity, which seems to be an important
complexity measure in the analysis of distributed algo-
rithms.
Reading from shared-memory can take a long time on
today’s architectures, compared to reading from a cache.
Hence, the time to access the shared memory is strongly
influenced by the level of contentionthe number of
72 ALUR AND TAUBENFELD
File: 643J 256112 . By:CV . Date:04:06:96 . Time:16:11 LOP8M. V8.0. Page 01:01
Codes: 5448 Signs: 4505 . Length: 56 pic 0 pts, 236 mm
processors that access the same memory location
simultaneously. Several formal methods of how to model
contention have been suggested in the past, which focus on
the worst-case analysis under high level of contention.
While good performance in the presence of high contention
is important, the behavior in the absence of contention is
sometimes even more important. As Lamport points out:
‘‘...the current belief among operating system designers is that
contention for a critical section is rare in a well designed
system’’ [Lam87]. Thus, ‘‘high-contention analysis’’ may be
a misleading indication of performance in practice.
Good time complexity in the absence of contention can
help achieve good performance also in the presence of high
contention, using a technique called backoff : when a pro-
cess notices contention it delays itself for some time, giving
other processes a chance to proceed. Experiments show that
in various mutual exclusion algorithms (such as Lamport’s
fast algorithm) with backoff, regardless of the level of con-
tention, the time it takes the winning process to enter its
critical section since the last time a critical section was
released, is very close to the time it takes in absence of
contention (for example, see [MS93]).
We have also shown how the size of the biggest register
that may be accessed by a process in one atomic step can
affect the time complexity of a given problem. Several recent
papers have investigated a related question, of how the
register size affects the concurrent computability of various
problems [AS93, FMRT90, FMT93].
ACKNOWLEDGMENTS
We are grateful to Michael Merritt for many helpful discussions, and for
carefully proofreading the paper.
Received December 13, 1994; final manuscript received January 17, 1996
REFERENCES
[AS93] Afek, Y., and Stupp, G. (1993), Synchronization power
depends on the register size, in ‘‘Proceedings of the 34th IEEE
Annual Symposium on Foundations of Computer Science,’’
pp. 196205.
[AHS91] Aspnes, J., Herlihy, M., and Shavit, N. (1991), Counting
networks and multi-processor coordination, in ‘‘Proceedings
of the 23rd ACM Symposium on Theory of Computing,’’
pp. 348358.
[AT92] Alur, R., and Taubenfeld, G. (1992), Results about fast
mutual exclusion, in ‘‘Proceedings of the 13th IEEE Real-
Time Systems Symposium,’’ pp. 1221.
[BL93] Burns, J., and Lynch, A. (1993), Bounds on shared memory
for mutual exclusion, Inform. and Comput. 107, 171184.
[DHW93] Dwork, C., Herlihy, M., and Waarts, O. (1993), Contention
in shared memory algorithms, in ‘‘Proceedings of the 25th
ACM Symposium on Theory of Computing,’’ pp. 174183.
[Dij65] Dijkstra, E. W. (1965), Solution of a problem in concurrent
programming control, Comm. ACM 8(9), 569.
[FMRT90] Fischer, M. J., Moran, S., Rudich, S., and Taubenfeld, G.
(1990), The wakeup problem, in ‘‘Proceedings of the 22nd
ACM Annual Symposium on Theory of Computing,’’
pp. 106116; to appear in SIAM J. Comp.
[FMT93] Fischer, M. J., Moran, S., and Taubenfeld, G. (1993), Space-
efficient asynchronous consensus without shared memory
initialization, Inform. Process. Lett. 45, 101105.
[Her91] Herlihy, M. (1991), Wait-free synchronization, ACM Trans.
Programming Languages Systems 13(1), 124149.
[Kes82] Kessels, J. L. W. (1982), Arbitration without common
modifiable variables, Acta Informat. 17, 135141.
[Lam87] Lamport, L. (1987), A fast mutual exclusion algorithm, ACM
Trans. Comput. Systems 5(1), 111.
[LP90] Lipton, R. J., and Park, A. (1990), The processor identity
problem, Inform. Process. Lett. 36, 9194.
[MS93] Michael, M. M., and Scott, M. (1993), ‘‘Fast Mutual Exclu-
sion, Even with Contention,’’ Technical Report 460, Depart-
ment of Computer Science, University of Rochester.
[PF77] Peterson, G. L., and Fischer, M. J. (1977), Economical solu-
tions for the critical section problem in a distributed system,
in ‘‘Proceedings of the 9th ACM Symposium on Theory of
Computing,’’ pp. 9197.
[YA93] Yang, J.-H., and Anderson, J. H. (1993), Fast, scalable syn-
chronization with minimal hardware support, in ‘‘Proceedings
of the 12th ACM Symposium on Principles of Distributed
Computing,’’ pp. 171182.
[YA94] Yang, J.-H., and Anderson, J. H. (1994), Time bounds for
mutual exclusion and related problems, in ‘‘Proceedings of
the 26th ACM Symposium on Theory of Computing,’’
pp. 224233.
73CONTENTION-FREE COMPLEXITY
