Long-Lived Counters with Polylogarithmic Amortized Step Complexity by Baig, Mirza Ahad et al.
Long-Lived Counters with Polylogarithmic
Amortized Step Complexity
Mirza Ahad Baig
LaBRI, Bordeaux INP, France
CNRS, ReLaX, UMI2000, Siruseri, India
Chennai Mathematical Institute, Siruseri, India
mirzabaig.cmi@gmail.com
Danny Hendler
Ben-Gurion University of the Negev, Beer-Sheva, Israel
hendlerd@cs.bgu.ac.il
Alessia Milani
LaBRI, Bordeaux INP, France
milani@labri.fr
Corentin Travers
LaBRI, Bordeaux INP, France
travers@labri.fr
Abstract
A shared-memory counter is a well-studied and widely-used concurrent object. It supports two
operations: An Inc operation that increases its value by 1 and a Read operation that returns its
current value. Jayanti, Tan and Toueg [16] proved a linear lower bound on the worst-case step
complexity of obstruction-free implementations, from read and write operations, of a large class of
shared objects that includes counters. The lower bound leaves open the question of finding counter
implementations with sub-linear amortized step complexity.
In this paper, we address this gap. We present the first wait-free n-process counter, implemented
using only read and write operations, whose amortized operation step complexity is $O(\log^2 n)$ in
all executions. This is the first non-blocking read/write counter algorithm that provides sub-linear
amortized step complexity in executions of arbitrary length. Since a logarithmic lower bound on the
amortized step complexity of obstruction-free counter implementations exists, our upper bound is
optimal up to a logarithmic factor.
2012 ACM Subject Classification Theory of computation → Shared memory algorithms; Theory of
computation → Concurrent algorithms
Keywords and phrases Shared Memory, Wait-freedom, Counter, Amortized Complexity, Concurrent
Objects
Digital Object Identifier 10.4230/LIPIcs.DISC.2019.3
Funding Mirza Ahad Baig, Alessia Milani and Corentin Travers are supported by ANR projects
Descartes and FREDDA. Mirza Ahad Baig is additionally supported by UMI Relax. Danny Hendler
is supported by the Israel Science Foundation (grant 380/18).
Acknowledgements We thank the anonymous reviewers for their many helpful comments.
1 Introduction
A shared-memory counter [18] is a well-studied and widely-used concurrent object [2, 5, 7,
10, 17]. A counter supports two operations: An Inc operation that increases its value by 1
and a Read operation that returns its current value.
A wait-free counter can be constructed easily by using an atomic snapshot [1, 3, 7] object,
allowing each process to update its own component (by invoking an Update operation) and
to obtain an atomic view of all components (by invoking a Scan operation). To increment
© Mirza Ahad Baig, Danny Hendler, Alessia Milani, and Corentin Travers;
licensed under Creative Commons License CC-BY
33rd International Symposium on Distributed Computing (DISC 2019).
Editor: Jukka Suomela; Article No. 3; pp. 3:1–3:16
Leibniz International Proceedings in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
the counter, a process p simply increments its component. To read the counter’s value, p
invokes Scan and returns the sum of all components in the view it obtains. Since wait-free
atomic snapshot can be implemented, using reads and writes only, in step complexity linear
in the number of processes n [8, 14], so can counters.
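The snapshot-based construction described above can be sketched as follows. This is a single-threaded Python illustration only: the class name `SnapshotCounter` and its method names are hypothetical, and `scan` is a plain list copy standing in for a wait-free atomic snapshot, which the sketch does not implement.

```python
class SnapshotCounter:
    """Counter built from per-process components (sequential illustration only)."""

    def __init__(self, n):
        # One component per process; component p is written only by process p.
        self.components = [0] * n

    def update(self, p, value):
        # Stand-in for the snapshot object's Update operation.
        self.components[p] = value

    def scan(self):
        # Stand-in for Scan; a real snapshot object returns an atomic view.
        return list(self.components)

    def inc(self, p):
        # Process p increments its own component.
        self.update(p, self.components[p] + 1)

    def read(self):
        # Return the sum of all components in the obtained view.
        return sum(self.scan())


counter = SnapshotCounter(n=3)
counter.inc(0)
counter.inc(2)
counter.inc(2)
print(counter.read())
```

In this sketch a Read touches all n components, which mirrors the linear step complexity mentioned above.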
Indeed, a well-known result by Jayanti, Tan and Toueg [16] proved a linear lower bound on
the worst-case step complexity of obstruction-free read/write implementations of a large class
of shared objects that includes counters. Aspnes, Attiya and Censor-Hillel [4] observed that
the lower bound holds only when numerous operations are applied to the object and does not
rule out the possibility of obtaining algorithms whose step complexity is sub-linear when the
number of operations is bounded. Leveraging this observation, they presented constructions
of several data structures for which operations’ step complexity is polylogarithmic in n as
long as the object’s value is polynomial in n. Specifically, they presented a wait-free counter
for which the step complexities of Inc and Read operations are $O(\min(\log n \log v, n))$ and
$O(\min(\log v, n))$, respectively, where v is the object’s current value. However, the worst-case
and amortized step complexities of the counter algorithm of [4] deteriorate as the number of
Inc operations increases. For executions in which the number of Inc operations is exponential
in n, both the worst-case and the amortized step complexities become the same as those of
the snapshot-based algorithm, that is, linear in n.
Our contribution. The lower bound of [16] leaves open the question of whether there exists
a counter algorithm with sub-linear amortized step complexity. In this paper, we answer
this question in the affirmative, by presenting the first wait-free read/write counter whose
amortized step complexity is polylogarithmic. This is the first non-blocking read/write
counter that provides sub-linear amortized step complexity in executions of arbitrary length.
Our counter implementation is based on the counter algorithm presented in [4]. Their counter
algorithm uses max registers, an object type they introduced and implemented. A max
register r supports a WriteMax(r,v) operation that writes a non-negative integer v to r and
a ReadMax(r) operation that returns the maximum value previously written to r.
We present a novel wait-free deterministic implementation of an unbounded max register
and “plug it” into the counter algorithm of [4], thus obtaining a counter with $O(\log^2 n)$
amortized step complexity. Aspnes et al. also presented an unbounded max register; however,
the step complexities of both ReadMax and WriteMax operations in their algorithm are
$O(\min(\log v, n))$, where v is the object’s current value. Thus, executions of arbitrary length
can have linear amortized complexity. Aspnes and Censor-Hillel [6] presented an unbounded
max register implementation for which every operation terminates in a constant number of
steps with high probability, under the assumption that the max register’s value does not
grow too quickly. Our unbounded max algorithm makes a similar assumption. The max
register algorithm of [6] is randomized since it relies on a randomized helping mechanism,
whereas ours is deterministic.
Using information-theoretic arguments, Jayanti established a logarithmic lower bound
on the worst-case operation step complexity for obstruction-free implementations of a set
of one-time objects that includes a fetch&increment object, from operations such as
load-linked/store-conditional, move and swap [15]. Attiya and Hendler [9] presented lower bounds
on the time and space complexities of obstruction-free implementations of several objects
from k-word compare-and-swap operations. Specifically, using an information-theoretic
argument as well, they proved a logarithmic lower bound on the amortized step complexity of
implementing an obstruction-free one-time fetch&increment object [9, Theorem 9]. Their
proof can be modified in a straightforward manner to establish the same result for counters,
implying that our algorithm is optimal in terms of amortized step complexity up to a
logarithmic factor.
The rest of this paper is organized as follows. We present the system model we assume
and additional required definitions in Section 2. In Section 3, we present our key technical
contribution – an unbounded max register algorithm that guarantees linearizability and
logarithmic amortized step complexity when its value is not increased “too quickly”. In
Section 4, we prove that by “plugging” our unbounded max register into the counter algorithm
of [4] (instead of using the max register algorithm of [4]) we obtain a linearizable counter with
polylogarithmic amortized step complexity. The paper is concluded with a short discussion
in Section 5.
2 Model and Preliminaries
Read/write shared memory. We consider a standard shared-memory model, where a set P
of n crash-prone asynchronous processes communicate via shared registers, supporting only
atomic read and write operations. A concurrent object implementation specifies the object’s
state representation and the algorithms processes follow when they perform operations
supported by the object. An execution is a series of steps performed by processes as they
follow their algorithms, in each of which a process applies at most a single read or write
operation to a register (possibly in addition to some local computation). In what follows, we
only consider finite executions. Roughly speaking, an implementation is linearizable [13] if
each operation appears to take effect atomically at some point between its invocation and
response; it is wait-free [11] if each process completes its operation if it performs a sufficiently
large number of steps; it is lock-free if at least one process completes its operation after
a sufficiently large number of steps is performed; it is obstruction-free [12] if each process
completes its operation if it performs a sufficiently large number of steps when running solo.
Operation Op1 precedes operation Op2 in an execution E, if Op1’s response appears in E
before Op2’s invocation.
Complexity measure. The worst-case amortized step complexity (henceforth simply amortized
step complexity) is defined as the worst-case (taken over all possible executions) average
number of steps performed by operations. It measures the performance of an implementation
as a whole rather than the performances of individual operations. Indeed, in an execution
of a lock-free implementation, some operations may never terminate and the worst-case
operation step complexity may thus be infinite. More precisely, given a finite execution E,
an operation Op appears in E if it is invoked in E. We denote by Nsteps(Op,E) the number
of steps performed by Op in E and by Ops(E) the set of operations that appear in E. The
amortized step complexity of an implementation A is then:
$$\mathrm{AmtSteps}(A) = \max_{E:\ \text{finite execution of } A} \frac{\sum_{Op \in Ops(E)} \mathrm{Nsteps}(Op, E)}{|Ops(E)|}.$$
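As a small illustration of this definition, the following Python sketch computes the per-execution average and takes the maximum over a given sample of executions. The function names and the sample data are hypothetical. Since the definition ranges over all finite executions of A, a finite sample can only yield a lower bound on AmtSteps(A).

```python
from fractions import Fraction


def average_steps(steps_per_op):
    """Average number of steps per operation in one finite execution."""
    return Fraction(sum(steps_per_op), len(steps_per_op))


def amortized_over(executions):
    """Maximum of the per-execution averages over the given sample."""
    return max(average_steps(e) for e in executions)


# Each inner list holds Nsteps(Op, E) for the operations of one execution E.
sample = [[1, 1, 10], [2, 2], [3, 5, 4, 4]]
print(amortized_over(sample))
```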
Max registers. A max register r supports a WriteMax(r,v) operation that writes a
non-negative integer v ≥ 0 to r and a ReadMax(r) operation that returns the maximum value
previously written to r. A bounded max register MaxReg_m can assume values from {0, . . . , m−1},
for some integer m. An unbounded max register UnboundedMaxReg can store any
non-negative integer.
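The sequential semantics of a max register can be sketched as follows. This is only a sequential specification written in Python, not a concurrent implementation; the class name `SequentialMaxRegister` is hypothetical, and the initial value 0 is an assumption that matches the initialization used later in Algorithm 1.

```python
class SequentialMaxRegister:
    """Sequential specification of a (bounded or unbounded) max register."""

    def __init__(self, bound=None):
        # bound=m models MaxReg_m with values in {0, ..., m-1};
        # bound=None models an unbounded max register.
        self.bound = bound
        self.value = 0  # assumed initial value

    def write_max(self, v):
        if v < 0:
            raise ValueError("values must be non-negative")
        if self.bound is not None and v >= self.bound:
            raise ValueError("value exceeds the register's bound")
        # Only the largest value written so far is retained.
        self.value = max(self.value, v)

    def read_max(self):
        return self.value


r = SequentialMaxRegister(bound=8)
r.write_max(3)
r.write_max(1)
print(r.read_max())
```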
3 Polylogarithmic Amortized Step Complexity Max Register
The pseudo-code of our unbounded max register is presented in Algorithm 1. Lines in black
font constitute a lock-free version of the algorithm, which we describe and analyze in this
section. Lines in lighter (metal) color add a helping mechanism that makes the algorithm wait-
free. For presentation simplicity, we defer the description of this mechanism to Subsection 3.3.
We proceed with a description of Algorithm 1. An UnboundedMaxReg_m object M consists of
an infinite number of shared bounded MaxReg_m max registers, denoted max_j, for all j ∈ N0.
Register max_j will be used for representing values in the range [m·j, m·(j+1)−1]. Henceforth,
the subscript m in the type UnboundedMaxReg_m refers to the bound m of the bounded max
registers used by objects of this type. Each bounded max register max_j is associated with a
shared switch_j bit. All max registers and their corresponding switches are initialized to 0.
Each process i has a variable last_i, storing the largest index j such that i accessed max_j,
initialized to 0 as well.
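The mapping between a value and the bounded register that represents it can be illustrated with a short Python sketch. The helper names `locate` and `value_range` are hypothetical and serve only to make the arithmetic concrete.

```python
def locate(v, m):
    """Return (k, residue): the index of the bounded register and the residue for v."""
    k, residue = divmod(v, m)
    return k, residue


def value_range(j, m):
    """Inclusive range of values represented by the j-th bounded register."""
    return m * j, m * (j + 1) - 1


print(locate(9, 4))
print(value_range(2, 4))
```

With m = 4, the value 9 lies in the range [8, 11] of the register with index 2 and is stored there as residue 1.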
The Write function. To write value v, process i first computes the index k of the bounded
max register to write to and the residue v′ to be written to it (lines 2–3). Next, i checks
in line 4 whether maxk is obsolete. We say that a (bounded) max register is obsolete, if its
corresponding switch is set, indicating that values were already written to higher-indexed
max registers and thus maxk should no longer be accessed. If maxk is obsolete, i does not
need to write to it, so it proceeds to line 12 for increasing its last index, if required, and
returns. Otherwise, maxk is not obsolete, so i writes to it the residue v′ (line 5). If the max
object written to is not the first (line 6), then i ensures that the previous max object is
obsolete (lines 8, 11), updates its last index (line 12), if required, and returns.
Algorithm 1 Unbounded Max Register UnboundedMaxReg_m, code for process i.
Shared variables:
    switch_j ∈ {0, 1} : a 1-bit register for each j ∈ N0, initially all 0
    max_j : a MaxReg_m object for each j ∈ N0, initially all 0
    last_i ∈ N0 : stores the largest index j such that process i accessed max_j, initially 0
    H[n][n] initially all 0 : a 2D integer array, H[i][j] used by process j to help process i
    nextToHelp_i : identifier of the next process to be helped by i
1: function Write(UnboundedMaxReg_m, v)
2:     v′ ← v mod m
3:     k ← ⌊v/m⌋
4:     if switch_k = 0 then
5:         WriteMax(max_k, v′)
6:         if k > 0 then
7:             curMax ← ReadMax(max_{k−1}) + (k − 1) · m
8:             if switch_{k−1} = 0 then
9:                 H[nextToHelp_i][i] ← curMax
10:                nextToHelp_i ← (nextToHelp_i + 1) mod n
11:                switch_{k−1} ← 1
12:    last_i ← max(k, last_i)
13: function Read(UnboundedMaxReg_m)
14:    local c initially 0
15:    while switch_{last_i} ≠ 0 do
16:        last_i ← last_i + 1, c ← c + 1
17:        if (c mod n) = 0 then
18:            if (hVal ← GetHelp(c)) > 0 then return hVal
19:    v ← ReadMax(max_{last_i})
20:    return v + (last_i · m)
The Read function. Process i scans the switches in increasing order in lines 15–16, increasing
the value of its last index in the process, until it finds the first non-obsolete bounded max
register (this might never happen). Once it does, it reads the maximum residue previously
written to that max object (line 19), adds to the residue a multiple of m corresponding to
the index of that (non-obsolete) max register and returns the sum (line 20).
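The Write and Read functions described above can be mirrored by the following single-threaded Python sketch. It is an illustration under simplifying assumptions, not the algorithm itself: operations run one at a time, each bounded max register is modeled by an integer updated with `max`, and the helping lines (7, 9, 10, 17 and 18) are omitted. The class name `UnboundedMaxRegSketch` is hypothetical.

```python
from collections import defaultdict


class UnboundedMaxRegSketch:
    """Sequential sketch of the lock-free part of Algorithm 1 (no helping)."""

    def __init__(self, n, m):
        self.n, self.m = n, m
        self.max_reg = defaultdict(int)  # max_j, initially 0
        self.switch = defaultdict(int)   # switch_j, initially 0
        self.last = [0] * n              # last_i for each process i

    def write(self, i, v):
        k, residue = divmod(v, self.m)                # lines 2-3
        if self.switch[k] == 0:                       # line 4
            # Line 5: WriteMax on the k-th bounded register.
            self.max_reg[k] = max(self.max_reg[k], residue)
            # Lines 6, 8, 11: make the previous register obsolete.
            if k > 0 and self.switch[k - 1] == 0:
                self.switch[k - 1] = 1
        self.last[i] = max(k, self.last[i])           # line 12

    def read(self, i):
        # Lines 15-16: skip obsolete registers.
        while self.switch[self.last[i]] != 0:
            self.last[i] += 1
        # Lines 19-20: residue plus the offset of the register.
        return self.max_reg[self.last[i]] + self.last[i] * self.m


reg = UnboundedMaxRegSketch(n=2, m=4)
for v in [1, 2, 3, 5, 6, 8, 9]:
    reg.write(0, v)
reg.write(1, 2)  # register 0 is already obsolete, so this write has no effect
print(reg.read(1))
```

In this run the write of 2 by process 1 finds switch 0 already set and skips the bounded register, while the read by process 1 advances its last index past the two obsolete registers before reading.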
3.1 Linearizability
The correctness of Algorithm 1 is guaranteed only in executions in which the max register’s
value is increased in bounded increments. This requirement is formalized by the following
definition.
▶ Definition 1 (ℓ-Bounded-Increment Execution). Let E be an execution and let M be an
UnboundedMaxReg object. We say that E is an ℓ-bounded-increment execution for M if for
each write operation op = Write(v) on M in E, with v > ℓ, there exists a write operation
op′ = Write(v′) on M in E that precedes op, such that v − ℓ ≤ v′ < v.
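To make the definition concrete, the following Python sketch checks the condition for a sequence of write values. It assumes a sequential execution in which every write completes before the next one is invoked, so that "precedes" coincides with the order of the list; in a concurrent execution the real-time order would have to be used instead. The function name `is_bounded_increment` is hypothetical.

```python
def is_bounded_increment(write_values, ell):
    """Check the ell-bounded-increment condition for sequentially ordered writes."""
    seen = []
    for v in write_values:
        # A write of v > ell needs an earlier write v' with v - ell <= v' < v.
        if v > ell and not any(v - ell <= earlier < v for earlier in seen):
            return False
        seen.append(v)
    return True


print(is_bounded_increment([1, 2, 3, 5, 6, 8, 9], ell=2))
print(is_bounded_increment([1, 5], ell=2))
```

The second sequence fails because the write of 5 is not preceded by a write of 3 or 4.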
In Section 4, we present an n-process unbounded counter implementation that uses
UnboundedMaxReg objects. As we prove, all the executions of that counter are
n-bounded-increment executions for all these objects. Let M be an UnboundedMaxReg_m object,
implemented by Algorithm 1, for m ≥ n, and let E be an n-bounded-increment execution for M.
We now show that M is linearizable in E. We classify every write operation W on M that
appears in E into exactly one of the following four types.
(i) W did not yet execute line 4 in E.
(ii) W executed line 4 and read switchk = 0, but its WriteMax in line 5 was not yet
linearized.
(iii) W executed line 4, read switchk = 0 and its WriteMax operation in line 5 was linearized.
We say that W is associated with that WriteMax operation.
(iv) W executed line 4 and read switchk = 1.
Similarly, we classify every read operation R on M to the following 2 types:
(i) R did not yet perform in E a ReadMax operation in line 19 that was linearized.
(ii) R read switchk = 0, for some k, and its ReadMax in line 19 was linearized. We say that
R is associated with that ReadMax operation.
We associate with each k ∈ N0 two sets of operations on M in E, denoted Downk and Futilek.
Operations on M are partitioned into these sets as follows:
Downk contains Write operations of type (iii) and Read operations of type (ii) that are
associated with WriteMax/ReadMax operations on maxk.
Futilek contains Write operations of type (iv).
Operations that were not assigned to any Down or Futile set are Write operations of types
(i) and (ii) and Read operations of type (i). All these operations did not complete in E and
will not appear in its linearization. We refer to these as removed operations. The rest of the
operations are linearized according to the following ordering rules.
1. For all pairs k, k′ such that k < k′, all the operations in Downk are ordered before all
the operations in Downk′ .
2. We order the operations within each set Downk according to the linearization order of the
WriteMax and ReadMax operations on the maxk register with which they are associated.
3. Rules 1-2 order all Down operations. Enumerate them as Dop_1, Dop_2, . . . , Dop_r. For
any futile operation Fop ∈ ⋃_k Futile_k, define the set
$$S_{Fop} = \{Dop \in \textstyle\bigcup_{k \in \mathbb{N}_0} Down_k \mid Fop \text{ precedes } Dop \text{ in } E\}.$$
If S_Fop is empty, we put Fop after Dop_r. Otherwise, let Dop_i be the least operation
in S_Fop according to the ordering on Down operations; we put Fop immediately before
Dop_i. The Futile operations that are put immediately before the same Dop_i, or
after Dop_r, are ordered among themselves according to their real-time order
in E (see footnote 1).
Rules 1–3 define a full ordering among all non-removed operations.
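The placement of Futile operations by rule 3 can be illustrated with the following Python sketch. It assumes that the Down operations are already ordered by rules 1-2 and that each operation is given as an interval with hypothetical invocation and response times; the names `Op`, `precedes` and `linearize` are not part of the paper. Futile operations that share a slot are sorted by invocation time, which is one of the possible extensions of their real-time partial order.

```python
from collections import namedtuple

# An operation with (hypothetical) invocation and response times.
Op = namedtuple("Op", ["name", "inv", "resp"])


def precedes(a, b):
    """a precedes b if a's response occurs before b's invocation."""
    return a.resp < b.inv


def linearize(down_ops, futile_ops):
    """Insert Futile operations into the ordered list of Down operations (rule 3)."""
    r = len(down_ops)
    slots = [[] for _ in range(r + 1)]  # slot i: before down_ops[i]; slot r: after all
    for fop in futile_ops:
        # Index of the least Down operation that fop precedes, or r if none exists.
        slot = next((i for i, dop in enumerate(down_ops) if precedes(fop, dop)), r)
        slots[slot].append(fop)
    order = []
    for i in range(r + 1):
        # Sorting by invocation time respects real-time order within a slot.
        order.extend(sorted(slots[i], key=lambda op: op.inv))
        if i < r:
            order.append(down_ops[i])
    return [op.name for op in order]


down = [Op("D1", 0, 1), Op("D2", 4, 5)]
futile = [Op("F2", 6, 7), Op("F1", 2, 3)]
print(linearize(down, futile))
```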
▶ Observation 2. An operation Op on M in E is associated with a ReadMax or WriteMax
operation on maxk if and only if Op ∈ Downk.
▶ Observation 3. The sets Downk and Futilek (for all values of k) are mutually exclusive
and contain all the operations on M that appeared in E except for removed operations.
▷ Claim 4. M’s switchj switches are set to 1 in E in increasing order, starting from switch0.
Proof. Follows since M is an UnboundedMaxRegm object, for m ≥ n, E is an n-bounded-increment
execution for M, and from lines 8, 11. ◀
▷ Claim 5. For all k′ ≤ k, there are no two operations Fop, Dop such that Fop ∈ Futilek,
Dop ∈ Downk′ and Fop is linearized before Dop in the ordering given by rules 1-3.
Proof. Suppose towards a contradiction that Fop is linearized before Dop. If Fop was placed
after Dopr when applying rule 3, we immediately reach a contradiction. Assume otherwise,
then, from rule 3, there exists a Down operation Dopi, such that Fop precedes Dopi, Fop
is linearized before Dopi, and no Down operation is linearized between Fop and Dopi.
Consequently, it must be that Dop is linearized after Dopi. From rules 1-2, we have that
Dopi ∈ Downk1 such that k1 ≤ k′ ≤ k. Since Dopi ∈ Downk1 , Dopi reads 0 from Switchk1 .
However, Fop reads Switchk = 1 before Dopi starts, hence, by Claim 4, Switchk1 = 1 when
Dopi starts. This is a contradiction. ◀
▶ Lemma 6. Ordering rules 1-3 define a sequential order between E’s non-removed operations
that preserves the real-time order between non-overlapping operations in E.
Proof. From ordering Rule 2, Observation 2 and the linearizability of the maxj objects, for
each j, the real-time order between all operations in Downk is preserved. From Claim 4, M ’s
switches are set to 1 in increasing order. Consequently, for any two operations Op ∈ Downk
and Op′ ∈ Downk′ , such that k < k′, Op′ does not precede Op in E. It follows that the
real-time order between each pair of Down operations is preserved by the linearization.
It remains to argue about Futile operations. Let Dop1, Dop2, . . . , Dopr be the linear
order among all Down operations, as specified by rules 1-2. Let Fop1, Fop2 be Futile
operations such that Fop1 is linearized before Fop2. There are two cases to consider. If
both operations are put after Dopr or immediately before the same operation Dopi then,
according to rule 3, their order preserves E’s real-time order. Otherwise, there exists at least
one Down operation linearized between them. Let Dop be the first Down operation ordered
after Fop1. From rule 3, Fop1 precedes Dop in real-time order. Suppose Fop2 precedes Fop1
in real-time order, then Dop ∈ SFop2 holds. Since Fop2 is linearized by rule 3 before all the
Down operations that follow it in real-time order, this is a contradiction.
Footnote 1: In general, this induces a partial order on Futile operations, which can be
extended to a full order arbitrarily.
Let Fop and Dop respectively be a Futilek and a Downk′ operation. Suppose Fop is
linearized before Dop but Dop precedes Fop in real-time order. If k′ ≤ k, then, by Claim 5,
this is a contradiction. Assume, then, that k < k′ holds and consider the application of
ordering rule 3 to Fop. If Fop is put after Dopr, then it is linearized after Dop, which is a
contradiction. Assume, then, that Fop is put by rule 3 immediately before some Dopi ∈ SFop,
so Fop precedes Dopi in real-time order. It follows that Dop precedes Dopi in real-time
order, so Dop is linearized before Dopi. Since no Down operation can be linearized between
Fop and Dopi, it follows that Dop is linearized before Fop. This is a contradiction.
Finally, suppose that Fop precedes Dop in real-time order. In this case, from rule 3, Fop
is linearized before Dop, preserving real-time order. ◀
▶ Lemma 7. The linearization defined by ordering rules 1-3 satisfies the sequential semantics
of a max register.
Proof. For an integer v ∈ N0, let v′ = v mod m and k = ⌊v/m⌋. Consider a Read operation
Opr on M in E that returns value v. Then Opr is associated with a ReadMax(maxk) operation
that returned v′. First, we prove that there is a Write operation Opw that wrote v to M and
Opw is linearized before Opr. From the linearizability of maxk, there is a WriteMax(maxk, v′)
operation that is linearized before the ReadMax(maxk) associated with Opr. Thus, there is a
Write operation Opw(v′ + k ·m) on M and, by Observation 2, it belongs to Downk, so Opw
is ordered before Opr according to ordering rule 2. To conclude the proof, we show that
there is no Write operation Op1 that writes value v1 > v to M and is linearized before Opr.
Suppose towards a contradiction that Op1 exists. The following two cases exist:
⌊v1/m⌋ = k. This implies that (v1 mod m) > (v mod m). If Op1 ∈ Downk, this
contradicts the linearizability of maxk, because the ReadMax operation associated with
Opr does not return the maximum value written to maxk before it. If Op1 ∈ Futilek, this
contradicts Claim 5.
⌊v1/m⌋ > k. In this case, either Op1 ∈ Downk′ or Op1 ∈ Futilek′, for some k′ > k. In
the first case, Op1 is linearized after Opr by rule 1. In the second case, Claim 5 ensures
that Op1 is linearized after Opr. ◀
▶ Lemma 8. Algorithm 1 (without the helping mechanism) is lock-free.
Proof. Write operations perform a single invocation of the wait-free WriteMax operation
and a constant number of additional steps, hence they are wait-free. A Read operation may
loop forever in lines 15–16, searching for a non-obsolete max register, but only if Write
operations keep making additional max registers obsolete (in line 11). If no more Write
operations complete, each Read operation is guaranteed to complete. ◀
3.2 Step Complexity Analysis
The step complexity analysis provided in this section relates to the implementation of
Algorithm 1 without the helping mechanism. In the following, we denote by Ops(E) the
set of all operations that appear in E and by OpsR(E) (resp. OpsW (E)) the set of all
Read operations (resp. all Write operations) that appear in E. For an operation Op, we let
Nsteps(Op,E) denote the number of steps performed by Op in E.
▶ Lemma 9. If $m \geq n^2$, then the UnboundedMaxRegm implementation of Algorithm 1 has
amortized step complexity of O(log m) in any n-bounded-increment execution.
Proof. Let E be an n-bounded-increment execution. We wish to bound:
$$\mathrm{AmtSteps}(E) = \frac{\sum_{op \in Ops(E)} \mathrm{Nsteps}(op, E)}{|Ops(E)|}. \tag{1}$$
Let r be the number of read operations and w be the number of write operations in Ops(E).
WriteMax and ReadMax operations on an m-bounded max register perform O(logm) steps
each. Clearly from the pseudo-code of Algorithm 1, each Write operation performs a constant
number of steps in addition to possibly invoking a single WriteMax operation, thus the step
complexity of each Write operation is O(logm).
A Read operation Op performs loopOp + O(logm) steps, where loopOp is the number
of steps performed in the while loop of lines 15–16 and O(logm) is the number of steps
performed by the invocation of ReadMax in line 19. We get:
$$\mathrm{AmtSteps}(E) = O\left(\left(\sum_{op \in Ops_W(E)} \log m + \sum_{op \in Ops_R(E)} \left(\log m + loop_{op}\right)\right) \Big/ (w + r)\right). \tag{2}$$
If r = 0, then clearly AmtSteps(E) = O(logm), so assume that r > 0. From lines 12 and 16,
for every process i, lasti is never decreased and is incremented once in every iteration of the
while loop of lines 15–16. Therefore:
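$$\sum_{op \in Ops_R(E)} loop_{op} = O\left(r + \sum_{i \in P} last_i\right). \tag{3}$$
Consequently,
$$\mathrm{AmtSteps}(E) = O\left(\frac{w \cdot \log m + r \cdot \log m + \left(r + \sum_{i \in P} last_i\right)}{w + r}\right). \tag{4}$$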
Assume that max register max_α is accessed in E. Since E is an n-bounded-increment execution
and all max_j registers are m-bounded, at least m · (α−1)/n Write operations have completed
prior to this access. Letting $L = \max_{i \in P} last_i$ denote the maximum value of all last_i
variables at the end of E, we get that $w \geq m \cdot (L - 1)/n$. Furthermore,
$\sum_{i \in P} last_i \leq n \cdot L$. Thus,
$$\begin{aligned}
\mathrm{AmtSteps}(E) &= O\left(\frac{w \log m + r \log m + (r + n \cdot L)}{w + r}\right) \\
&= O\left(\frac{(w + r)\log m}{w + r} + \frac{r}{w + r} + \frac{n \cdot L}{w + r}\right) \\
&= O\left(\log m + \frac{n \cdot L}{\frac{m}{n}(L - 1) + r}\right) \\
&= O\left(\log m + \frac{n^2}{m} \cdot \frac{L}{(L - 1) + \frac{n}{m}\, r}\right).
\end{aligned} \tag{5}$$
The lemma now follows, since r > 0 and $m \geq n^2$ hold. ◀
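As a numeric illustration of the last line of (5), the following Python sketch evaluates the second summand exactly for sample parameters. The function name `last_term` and the chosen values are hypothetical; the sketch only evaluates the expression and requires (L − 1) + (n/m) · r to be nonzero.

```python
from fractions import Fraction


def last_term(n, m, L, r):
    """Evaluate (n^2 / m) * L / ((L - 1) + (n / m) * r), the second summand in (5)."""
    return Fraction(n * n, m) * L / ((L - 1) + Fraction(n, m) * r)


print(last_term(n=4, m=16, L=3, r=8))
print(last_term(n=4, m=64, L=3, r=8))
```

For these sample values the term shrinks when m grows from n^2 = 16 to 64 while n, L and r stay fixed.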
From Lemmata 6–9, we obtain:
▶ Theorem 10. Algorithm 1 is a linearizable implementation of an unbounded max register
with amortized step complexity of O(log m) in any n-bounded-increment execution, if $m \geq n^2$.
The algorithm (without the helping mechanism) is lock-free.
Algorithm 2 The GetHelp utility function, code for process i.
Shared variables:
    HR_i[n] : an integer array, to which the i’th row in the H array is copied
    C_i[n] : an integer array, counting number of writes by each helper for process i
1: function GetHelp(c)
2:     if c = n then
3:         for j ∈ {0, . . . , n − 1} do
4:             HR_i[j] ← H[i][j], C_i[j] ← 0
5:     else
6:         for j ∈ {0, . . . , n − 1} do
7:             if HR_i[j] < H[i][j] then
8:                 HR_i[j] ← H[i][j], C_i[j]++
9:                 if C_i[j] = 2 then return HR_i[j]
10:    return 0
3.3 The Helping Mechanism
We now explain the helping mechanism that makes Algorithm 1 wait-free (presented in the
metal-colored lines of that algorithm). It uses a 2-dimensional shared array H. Entry H[i][j]
is used by process j to help process i by writing to it a (maximum) value of M that process
j was able to compute. Each process i owns variable nextToHelpi, storing the index of
the next process it should help. Helping is attempted by process i inside Write operations,
whenever i is about to make another max register obsolete. Specifically, if i is about to write
to a max register k > 0 (line 6), it reads the maximum residue written so far to maxk−1,
computes the corresponding value of M based on it and stores it to a local variable curMax
(line 7). If switchk−1 is 0 (line 8), then maxk−1 must be made obsolete. As we prove, in this
case, curMax was indeed a value of M at some point during the execution interval of i’s
Write operation, so i attempts to help process nextToHelpi by writing to the appropriate
entry of array H and increments nextToHelpi modulo n (lines 9–10).
The goal of the helping mechanism is to ensure that every Read operation eventually
completes. Every n iterations of the while loop of lines 15–18, the GetHelp utility function
is called, receiving an integer that is a multiple of n, indicating whether or not this is its
first invocation by the current Read operation (line 14, lines 17–18). If GetHelp returns
a positive value then, as we prove, this was indeed M ’s value at some point during the
execution interval of Read, so it returns this value in line 18. Otherwise, the search for a
non-obsolete max register is resumed.
The pseudo-code of GetHelp is presented by Algorithm 2, described next. In its first
invocation by Read operation R (performed by some process i), initialization is done by
copying the i’th row of the H array to array HRi and initializing all elements of a second array
Ci to 0 (lines 2–4). Both HRi and Ci are only accessed by process i. Element Ci[j] counts
the number of times in which i observed that it was helped by process j in the course of R.
In the first invocation, 0 is returned (line 10), indicating that a maximum value is not yet
available. In each subsequent invocation of GetHelp (lines 5–9), if any, i checks, for each j,
if it was helped by j since the last time it read H[i][j], in which case it updates HRi[j] and
increments Ci[j]. If i was helped by some process j at least twice since R started then, as we
prove, the maximum value computed by j for i was indeed M ’s value at some point during
R’s execution interval, so GetHelp returns it in line 9 and R then returns this value in line 18
of Algorithm 1. Otherwise, 0 is returned in line 10.
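The behavior of GetHelp described above can be mirrored by the following single-threaded Python sketch. The class name `HelpedReader` is hypothetical, the shared array H is modeled by a plain list of lists, and writes by helpers are simulated by direct assignments between calls; the sketch does not model concurrency.

```python
class HelpedReader:
    """Sequential sketch of GetHelp (Algorithm 2) for a reader process i."""

    def __init__(self, i, n, H):
        self.i, self.n, self.H = i, n, H
        self.HR = [0] * n  # local copy of row i of H
        self.C = [0] * n   # number of times each helper was observed

    def get_help(self, c):
        if c == self.n:
            # First invocation in the current Read: record row i, reset counters.
            for j in range(self.n):
                self.HR[j] = self.H[self.i][j]
                self.C[j] = 0
        else:
            for j in range(self.n):
                if self.HR[j] < self.H[self.i][j]:
                    self.HR[j] = self.H[self.i][j]
                    self.C[j] += 1
                    # Helped twice by j since the Read started: return the value.
                    if self.C[j] == 2:
                        return self.HR[j]
        return 0


n = 2
H = [[0] * n for _ in range(n)]
reader = HelpedReader(i=0, n=n, H=H)
first = reader.get_help(n)       # initialization, returns 0
H[0][1] = 5                      # simulated help by process 1
second = reader.get_help(2 * n)  # one observed help, returns 0
H[0][1] = 9                      # second simulated help by process 1
third = reader.get_help(3 * n)   # two observed helps, returns the helped value
print(first, second, third)
```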
3.4 Correctness
In this section we prove that the algorithm with the helping mechanism (henceforth the
full algorithm) is linearizable. We classify read and write operations to types as we did in
Section 3.1, except that now we have a third class of read operations: those that return in
line 18 of Algorithm 1 after being helped. We say that these are Read operations of type (iii).
Let R be a type (iii) Read operation by process i that returns value u and let k′ = ⌊u/m⌋,
then there is a Write operation W by process j, concurrent with R, that wrote u to H[i][j]
(in Line 9 of Algorithm 1) after performing a ReadMax operation on maxk′ (in Line 7 of
Algorithm 1) and R returns value u after reading it from H[i][j] (Lines 8, 9 of GetHelp). We
say that R is associated with that ReadMax operation.
As in Section 3.1, we partition the operations of E to the sets Downk and Futilek, except
that we now add each Read operation of type (iii) that is associated with a ReadMax on
maxk to Downk. We use the ordering rules defined in Section 3.1 to linearize all of E’s
non-removed operations. It is easily verified that Observations 2–3 and Claim 4 hold also
with the extended definition of the sets Downk.
▶ Observation 11. Let R be a type (iii) Read operation associated with a ReadMax operation
R′ on maxk′. Then, throughout the execution of R′, switchk′ = 0 holds.
Proof. Immediate from Claim 4 and the fact that the Write operation that invokes R′ in
line 7 of Algorithm 1 writes the value read by R′ (in line 9) to the H array only after verifying
that switchk′ = 0 holds (in line 8). ◀
Based on Observation 11, we now prove that Claim 5 holds for the full algorithm.
▷ Claim 12. In any ordering of operations for the full algorithm, for all k′ ≤ k, there are
no two operations Fop, Dop such that Fop ∈ Futilek, Dop ∈ Downk′ and Fop is linearized
before Dop in the ordering given by rules 1–3.
Proof. Suppose towards a contradiction that Fop is linearized before Dop. If Fop was placed
after Dopr when applying rule 3, then we immediately reach a contradiction. Assume
otherwise, then, from rule 3, there exists a Down operation Dopi, such that Fop precedes
Dopi, Fop is linearized before Dopi, and no Down operation is linearized between Fop and
Dopi. Consequently, it must be that Dop is linearized after Dopi. From rules 1-2, we
have that Dopi ∈ Downk1 such that k1 ≤ k′ ≤ k. Since Dopi ∈ Downk1 , either Dopi
reads 0 from Switchk1 or, otherwise, it is a type (iii) Read operation, in which case, from
Observation 11, Switchk1 = 0 holds at some point during its execution. However, Fop reads
Switchk = 1 before Dopi starts, hence, by Claim 4, Switchk1 = 1 when Dopi starts. This is
a contradiction. C
We next show that Lemma 6 holds also for the full algorithm.
▶ Lemma 13. Ordering rules 1–3 for the full algorithm define a sequential order between E’s
non-removed operations that preserves the real-time order between non-overlapping operations
in E.
Proof. In Lemma 6, the corresponding claim was proven w.r.t. the algorithm without the
helping mechanism for operations of all types, except for Read operations of type (iii). A
type (iii) operation R ∈ Downk by process i is associated with a ReadMax operation R′ on
maxk invoked from a concurrent Write operation, performed by some process j ≠ i. Since the
condition of line 9 of GetHelp was satisfied when evaluated by i, the execution interval of R′
is fully contained within that of R. It follows that R can be linearized when R′ is linearized
on maxk. Thus, R is ordered w.r.t. other operations in Downk by applying ordering rule 2
to R′, breaking ties arbitrarily, which, from Observation 2 and the linearizability of maxk,
ensures that real-time order is maintained between all operations in Downk.
Let Op ∈ Downk and Op′ ∈ Downk′ be two operations such that k < k′. From Claim 4,
M’s switches are set to 1 in increasing order. Based on this and on Observation 11 (which is
required if either Op or Op′ is a type (iii) Read operation), Op′ does not precede Op in E. It
follows that the real-time order between each pair of Down operations is preserved by the
linearization.
Let Fop and Dop respectively be a Futilek and a Downk′ operation. Suppose Fop is
linearized before Dop but Dop precedes Fop in real-time order. If k′ ≤ k, then, by Claim 12,
this is a contradiction. The rest of the proof proceeds exactly as in the proof of Lemma 6. ◀
It is easily verified that Lemma 7 holds also for the full algorithm. The only change required
in its proof is to use Claim 12 instead of Claim 5.
▷ Claim 14. If a monotonically-increasing sequence of values is written to M, then some
process performs line 9 of Algorithm 1 infinitely often.
Proof. If a monotonically-increasing sequence of values is written to M, then max registers
are made obsolete infinitely often. Since a max register is only made obsolete in line 11
of Algorithm 1, it is immediate from the code that line 9 of that algorithm is performed
infinitely often as well. Since the number of processes is finite, it follows that some process
performs that line infinitely often. ◁
▶ Lemma 15. The full Algorithm 1 is wait-free.
Proof. As proven in Lemma 8, the algorithm is lock-free and Write operations are wait-free.
It remains to show that Read operations are wait-free as well. From Claim 14, if a
monotonically-increasing sequence of values is written to M, then there is some process j
that performs line 9 of Algorithm 1 infinitely often. Thus, any Read operation, say by process
i, eventually either finds a non-obsolete max register in line 15 or increments Ci[j] twice in
line 9 of GetHelp and is therefore able to terminate.
Otherwise, there is no such sequence of monotonically-increasing values. Thus, starting
from some point in the execution, M’s value does not increase, so the set of obsolete max
registers stops growing, hence every Read operation that does not fail-stop eventually reaches a
non-obsolete max register and completes. ◀
▶ Theorem 16. If m ≥ n², then the full algorithm is a wait-free linearizable n-process
implementation of an unbounded max register with amortized step complexity of O(log m) in
any n-bounded-increment execution.
Proof. From Lemmata 7, 13 and 15 the full algorithm is linearizable and wait-free, so it
remains to argue regarding its complexity. In Algorithm 2, every iteration of the for loop at
either line 3 or line 6 incurs a constant number of steps. Thus, every invocation of GetHelp
incurs O(n) steps. In Algorithm 1, a Write operation performs at most one WriteMax and
at most one ReadMax operation, incurring a total of O(log m) steps. We note that any Read
operation invokes GetHelp once every k · n steps, for some k > 1, when c = 0 mod n. Thus,
at any point in the course of the execution, the number of steps taken by a Read operation
R inside GetHelp is O(loopR). Consequently, as in the proof of Lemma 9, we get:
$$
\mathrm{AmtSteps}(E) = O\left( \frac{\sum_{op \in Ops_W(E)} \log m \;+\; \sum_{op \in Ops_R(E)} \left( \log m + loop_{op} \right)}{w + r} \right) = O(\log m). \tag{6}
$$
◀
Algorithm 3 An n-process counter Cj, code for process i.
Shared variables:
  R: an n-process UnboundedMaxReg_{n²} object, initially 0
  If j > 1: left: a C_{⌈j/2⌉} counter object, initially 0
            right: a C_{j−⌈j/2⌉} counter object, initially 0
 1: function Inc(Cj)
 2:   if j = 1 then
 3:     v ← ReadMax(R)
 4:     WriteMax(R, v + 1)
 5:   else
 6:     if i’s C1 leaf-counter is on the left sub-tree then Inc(left) else Inc(right)
 7:     v0 ← Read(left)
 8:     v1 ← Read(right)
 9:     WriteMax(R, v0 + v1)
10: function Read(Cj)
11:   return ReadMax(R)
4 Wait-Free Counter with Polylogarithmic Amortized Step Complexity
Algorithm 3 presents a wait-free recursive construction of a linearizable counter that has
polylogarithmic amortized step complexity in all executions, regardless of their length. The
algorithm is essentially the same as the (non-recursive) counter construction of Aspnes et al.
[4], except that the latter uses the max registers of [4], whose amortized step complexity is
linear for sufficiently long executions, whereas ours uses our wait-free unbounded max registers.
Let Cj denote a counter, shared by n processes, implemented by Algorithm 3. For
simplicity and without loss of generality, assume in the following that each of n and j is an
integral power of 2. Cj’s value is stored in an n-process wait-free unbounded max register
R, which is of type UnboundedMaxReg_{n²}. If j > 1 holds, then Cj also contains two C_{j/2}
child-counters – left and right. A counter Cn serves as a root of a tree of counters and all
processes can invoke Inc operations on Cn. At the bottom layer of the tree, each process i is
associated with a single C1 leaf-counter on which only i can invoke Inc operations.
To read Cj, process i simply invokes a ReadMax operation on Cj’s R object and returns
the response (line 11). Incrementing a C1 object consists of simply reading R and writing to
it a value larger by one (lines 3–4). To increment a Cj counter, for j > 1, process i increments
either the left or the right child counter, depending on whether its C1 leaf-counter is on
the left or the right subtree of Cj, reads the values of both child counters and writes their
sum to R (lines 6–9). Observe that at most j distinct processes can invoke Inc operations
on any specific Cj counter.
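The recursive structure described above can be sketched in Python as follows. This is a sequential illustration only: the class names `Counter` and `MaxRegister`, the `lo` field used to locate a process's leaf, and the plain-integer max register standing in for UnboundedMaxReg are assumptions of this sketch rather than part of the paper, and the sketch says nothing about concurrency or step complexity.

```python
class MaxRegister:
    """Sequential stand-in for an unbounded max register (assumption of this sketch)."""

    def __init__(self) -> None:
        self._value = 0

    def read_max(self) -> int:
        return self._value

    def write_max(self, v: int) -> None:
        # A max register retains the largest value ever written to it.
        if v > self._value:
            self._value = v


class Counter:
    """Counter C_j shared by processes lo, ..., lo + j - 1, following Algorithm 3."""

    def __init__(self, lo: int, j: int) -> None:
        self.lo, self.j = lo, j
        self.R = MaxRegister()
        if j > 1:
            half = (j + 1) // 2  # ceil(j / 2)
            self.left = Counter(lo, half)
            self.right = Counter(lo + half, j - half)

    def inc(self, i: int) -> None:
        if self.j == 1:
            v = self.R.read_max()        # line 3
            self.R.write_max(v + 1)      # line 4
            return
        # Line 6: increment the child whose sub-tree holds process i's leaf.
        if i < self.right.lo:
            self.left.inc(i)
        else:
            self.right.inc(i)
        v0 = self.left.read()            # line 7
        v1 = self.right.read()           # line 8
        self.R.write_max(v0 + v1)        # line 9

    def read(self) -> int:
        return self.R.read_max()         # line 11


def demo(n: int = 4) -> int:
    """Apply six increments by processes 0..3 to a C_n counter and read it."""
    root = Counter(0, n)
    for i in [0, 3, 3, 1, 2, 0]:
        root.inc(i)
    return root.read()
```

Calling `demo()` builds a four-process tree and returns the root's value after six increments. Each internal node only ever stores the sum it read from its two children, which is why a max register suffices at every level.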
In the following proofs we let C denote a Cn object implemented by Algorithm 3 and E
be an execution of C.
▶ Lemma 17. The Cj counter implementation of Algorithm 3 is linearizable.
Proof. The proof is by induction on j.
Base Case. For j = 1, the UnboundedMaxReg object R of a C1 counter may only be
incremented by a single process. Since R’s value is always increased by exactly 1, the
execution is 1-bounded-increment for R, so the correctness of R follows from Theorem 16.
Increment operations on C1 are linearized when the WriteMax operation invoked in line 4 is
linearized and read operations on C1 are linearized when the ReadMax operation invoked in
line 11 is linearized.
Induction Hypothesis. For all k < j, Ck is a linearizable counter and the value of the max
object R it uses is never increased by more than k.
Inductive Step.
▶ Sub-Lemma 17.1. E is a j-bounded-increment execution for Cj.R.
Proof. The proof is divided into two parts. We first prove the left-hand inequality of
Definition 1. Let E′ be a prefix of E immediately after which process p is about to invoke
a WriteMax() operation Opv on Cj .R with input v (in line 9). Let I be the set of Inc
operations that have completed on Cj in E′. Observe that each operation Op ∈ I has
performed one Inc operation on either Cj .left or Cj .right. We partition I accordingly:
I = I0 ∪ I1, where for any Op ∈ I, Op ∈ I0 if Op performed an Inc operation on Cj .left
and Op ∈ I1 if Op performed an Inc operation on Cj .right.
By IH, both Cj .left and Cj .right are linearizable counters. Let Op0 ∈ I0 be the
operation whose Inc operation on Cj .left is linearized last among all Inc operations on
Cj .left performed by the operations in I0. Let c0 be the value of Cj .left immediately
after the Inc operation on that object by Op0. Op1 and c1 are defined similarly. From lines
7–9, for each r ∈ {0, 1}, after performing an Inc operation on either Cj .left or Cj .right,
Opr performs read operations on both Cj .left and Cj .right before writing the sum ur
of the values read to Cj .R. We show that v′ = max{u0, u1} ≥ c0 + c1. Indeed, assume
that Op0’s read operation on Cj .right returns a value strictly smaller than c1. Then, Op1’s
Inc operation on Cj .right is linearized after Op0’s Read operation on Cj .right. It thus
follows that Op1’s read operation on Cj .left starts after Op0’s Inc operation on Cj .left
has completed. We thus conclude that u1 ≥ c0 + c1. Otherwise, Op0 reads at least c0 from
Cj .left and at least c1 from Cj .right, so u0 ≥ c0 + c1.
As both Op0 and Op1 have completed in E′, a WriteMax operation on R of value
v′ ≥ c0 + c1 has completed in E′. If v ≤ v′ then v − j ≤ v′ and the claim holds. Otherwise,
again from lines 7–9, the operand v of the WriteMax operation Opv is the sum of the values
v0, v1 returned by the Read operations performed on the counters Cj .left and Cj .right,
respectively. v0 = c0 + δ, for δ > 0, implies that there are δ Inc operations on Cj .left that
have been linearized after the Inc operation on the same counter by Op0. From the definition
of Op0, these δ operations take place within δ Inc operations on Cj that did not complete
in E′. The same argument applies for v1. Since there are at most j processes that may
invoke Inc operations on Cj and thus at most j incomplete Inc operations on Cj after E′,
it follows that v = v0 + v1 ≤ j + c0 + c1. Hence, there is a value v′ = max{u0, u1} such that
v − v′ ≤ j and a WriteMax(v′) on R has completed before the operation Opv = WriteMax(v)
on R starts.
We next prove both inequalities of Definition 1. Let Op be a WriteMax operation on Cj .R
with input v > j. The first part above established that there exists a WriteMax operation
Op′ on Cj .R with input v′ that finishes before Op starts, such that v − j ≤ v′. Assume
that v′ ≥ v. Let O> be the set of WriteMax operations on Cj .R that (1) precede Op and (2)
whose input is larger than or equal to v. We define a partial order ≺ on the operations in
O> as follows:
∀W,W ′ ∈ O>,W ≺W ′ ⇐⇒ W precedes W ′ in E.
Let us observe that O> is non-empty and finite. The latter is because E is finite and so
only finitely many operations precede Op in E, and the former follows from the existence of
Op′. Consider any minimal element in the partially ordered set O>, that is any operation W
such that for any operation W ′ ∈ O>, W ′ does not precede W . Since O> is finite, there is
at least one such operation W . Let inW denote its input. Since W ∈ O>, we have inW ≥ v.
Also, by applying the left-hand inequality (proved in the first part of the proof) to W , there
exists an operation W ′ with input inW ′ that precedes W such that inW ′ ≥ inW − j ≥ v − j.
As W ′ ≺W , and W is chosen as a minimal element of O>, it follows that W ′ /∈ O>. Since
W ′ precedes both W and Op, we get that inW ′ < v, which concludes the proof. ◀
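The bounded-increment condition used in this proof can be checked mechanically on a finite log of WriteMax operations. The sketch below encodes one reading of it, inferred from the two inequalities established above (Definition 1 itself is stated earlier in the paper and is not reproduced here): every write of a value v > j must be preceded by a completed write of some v′ with v − j ≤ v′ < v. The function name, the tuple layout, and the use of numeric timestamps are assumptions of this illustration.

```python
from typing import List, Tuple

# One WriteMax operation: (start time, end time, value written).
Write = Tuple[float, float, int]


def is_bounded_increment(writes: List[Write], j: int) -> bool:
    """Check that each write of v > j has a completed predecessor in [v - j, v)."""
    for start, _end, v in writes:
        if v <= j:
            continue  # values up to j need no witness under this reading
        has_witness = any(
            end2 < start and v - j <= v2 < v
            for _start2, end2, v2 in writes
        )
        if not has_witness:
            return False
    return True
```

For instance, the log `[(0, 1, 1), (2, 3, 2), (4, 5, 4)]` passes the check for j = 2 but fails it for j = 1, because no earlier write has a value in [3, 4).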
From Sub-Lemma 17.1 and Theorem 16 we conclude that Cj.R is linearizable in E. Based
on this, the proof proceeds similarly to the proof of [4, Lemma 4].
From IH, Cj .left and Cj .right are linearizable counters. We associate with every
increment operation Op on Cj, performed by some process p, a value as follows. Let c0 and
c1 respectively denote the values of Cj .left and Cj .right immediately after p’s increment
of Cj’s child (corresponding to p’s identifier), in line 6, is linearized. Then we associate
with Op the value v = c0 + c1. We
linearize an Inc operation Op, associated with value v, when a value v′ ≥ v is first written
to Cj .R in line 9 (either by p or by another process). We linearize a Read operation on Cj
when it reads Cj .R in line 11.
We now prove that each linearization point lies within its operation’s execution interval.
Consider an Inc operation Op associated with value v. A value v′ ≥ v cannot be written to
Cj .R before Op starts, because, from the linearizability of Cj .left and Cj .right, before Op
starts, the sum of these two counters is less than c0 + c1. Since Op itself writes value v to
Cj .R before it terminates, the linearization point occurs before Op terminates. The fact that
the linearization point of a Read operation on Cj lies within its execution interval follows
immediately from the linearizability of Cj .R, established by Sub-lemma 17.1. Finally, the
linearization points result in a valid sequential execution, because every Read operation on
Cj that returns value v is preceded by exactly v Inc operations on Cj. ◀
▶ Lemma 18. Algorithm 3 has O(log² n) amortized operation step complexity.
Proof. From Algorithm 3 and the fact that C is shared by n processes, every operation on
C applies a constant number of ReadMax/WriteMax operations to each of O(log n) different
UnboundedMaxReg_{n²} objects, as the recursive calls in lines 7–9 and 11 unfold. Letting
COps(E) denote the number of operations on C that appear in E, the total number of
ReadMax/WriteMax operations on all the implementation’s UnboundedMaxReg_{n²} objects is
therefore O(log n · COps(E)). From Theorem 16, letting m = n², it follows that the total
number of steps performed in E is O(log² n · COps(E)). ◀
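The arithmetic behind this bound can be spelled out as follows, assuming (as the paper does) that n is a power of 2. Reading the operation count off the pseudocode of Algorithm 3, an Inc on C_n performs two ReadMax operations and one WriteMax operation at each of the log₂ n internal levels, plus the two operations of lines 3–4 at the leaf. Each of these costs O(log m) amortized steps with m = n², by Theorem 16:

```latex
\[
\underbrace{(3\log_2 n + 2)}_{\text{ReadMax/WriteMax operations per Inc}}
\cdot
\underbrace{O(\log n^2)}_{\text{amortized steps per operation}}
= O(\log n)\cdot O(2\log n)
= O(\log^2 n).
\]
```

A Read on C_n performs a single ReadMax, so it is covered by the same bound.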
▶ Theorem 19. Algorithm 3 is a wait-free linearizable n-process implementation of an
unbounded counter with amortized step complexity of O(log² n).
Proof. From Lemma 17, the algorithm is linearizable. From Lemma 15, all the
UnboundedMaxReg objects used by Algorithm 3 are wait-free; thus, as is clear from the
pseudo-code, Algorithm 3 is wait-free as well. The claimed complexity follows from Lemma 18. ◀
Attiya and Hendler proved a logarithmic lower bound on the amortized step complexity
of implementing an obstruction-free one-time fetch&increment object from read, write and
k-word compare-and-swap operations [9, Theorem 9]. Their proof can be easily adapted to
obtain the following result:
▶ Lemma 20. Any n-process obstruction-free implementation from read/write registers of
a counter object has an execution that contains Ω(n log n) steps, in which every process
performs a single Inc operation followed by a single Read operation.
Lemma 20 establishes that every non-blocking read/write counter implementation has an
execution whose amortized step complexity is at least logarithmic in the number of processes,
showing that our counter algorithm is optimal in terms of amortized step complexity up to a
logarithmic factor.
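To make the comparison explicit: the execution of Lemma 20 contains 2n operations (one Inc and one Read per process), so dividing its step count by the number of operations gives the amortized lower bound, and dividing the upper bound of Theorem 19 by that lower bound gives the remaining gap. A short worked version, under no assumptions beyond the two stated results:

```latex
\[
\frac{\Omega(n\log n)\ \text{steps}}{2n\ \text{operations}}
= \Omega(\log n)\ \text{steps per operation},
\qquad
\frac{O(\log^2 n)}{\Omega(\log n)} = O(\log n).
\]
```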
5 Discussion
In this work, we presented the first non-blocking read/write counter algorithm that provides
sub-linear amortized step complexity in all executions, regardless of their length. The
amortized operation step complexity of our algorithm is O(log² n), where n is the number
of processes sharing the implementation. This is optimal up to a logarithmic factor, since
there exists a logarithmic lower bound on the amortized step complexity of n-process one-
time counters.
It is unclear whether there exists a wait-free (or even lock-free or obstruction-free)
read/write counter implementation with o(log² n) amortized step complexity. Interestingly,
a similar gap between an O(log² n) upper bound and an Ω(log n) lower bound exists for the
worst-case step complexity of counters [4].
The space complexity of our counter is infinite, since it uses our unbounded max registers,
and each of these encapsulates an infinite number of bounded max registers. A second
question is that of finding a bounded-space read/write counter with sub-linear amortized
step complexity. These questions are left for future work.
References
1 Yehuda Afek, Hagit Attiya, Danny Dolev, Eli Gafni, Michael Merritt, and Nir Shavit. Atomic
snapshots of shared memory. Journal of the ACM (JACM), 40(4):873–890, 1993.
2 Yehuda Afek, Haim Kaplan, Boris Korenfeld, Adam Morrison, and Robert E Tarjan. The CB
tree: a practical concurrent self-adjusting search tree. Distributed computing, 27(6):393–417,
2014.
3 James Anderson. Composite registers. Distributed Computing, 6(3):141–154, 1993.
4 James Aspnes, Hagit Attiya, and Keren Censor-Hillel. Polylogarithmic concurrent data
structures from monotone circuits. J. ACM, 59(1):2:1–2:24, 2012.
5 James Aspnes and Keren Censor. Approximate shared-memory counting despite a strong
adversary. ACM Trans. Algorithms, 6(2):25:1–25:23, 2010.
6 James Aspnes and Keren Censor-Hillel. Atomic Snapshots in O(log³ n) Steps Using Randomized
Helping. In 27th International Symposium on Distributed Computing (DISC), volume 8205 of
Lecture Notes in Computer Science, pages 254–268. Springer, 2013.
7 James Aspnes and Maurice Herlihy. Fast randomized consensus using shared memory. Journal
of algorithms, 11(3):441–461, 1990.
8 Hagit Attiya and Arie Fouren. Adaptive and efficient algorithms for lattice agreement and
renaming. SIAM Journal on Computing, 31(2):642–664, 2001.
9 Hagit Attiya and Danny Hendler. Time and Space Lower Bounds for Implementations Using
k-CAS. IEEE Trans. Parallel Distrib. Syst., 21(2):162–173, 2010.
10 Michael A. Bender and Seth Gilbert. Mutual Exclusion with O(log² log n) Amortized Work.
In 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, pages 728–737.
IEEE, 2011.
11 Maurice Herlihy. Wait-free synchronization. ACM Transactions on Programming Languages
and Systems (TOPLAS), 13(1):124–149, 1991.
12 Maurice Herlihy, Victor Luchangco, and Mark Moir. Obstruction-Free Synchronization:
Double-Ended Queues as an Example. In 23rd International Conference on Distributed
Computing Systems (ICDCS), pages 522–529. IEEE Computer Society, 2003.
13 Maurice P Herlihy and Jeannette M Wing. Linearizability: A correctness condition for
concurrent objects. ACM Transactions on Programming Languages and Systems (TOPLAS),
12(3):463–492, 1990.
14 Michiko Inoue, Toshimitsu Masuzawa, Wei Chen, and Nobuki Tokura. Linear-time snapshot
using multi-writer multi-reader registers. In International Workshop on Distributed Algorithms,
pages 130–140. Springer, 1994.
15 Prasad Jayanti. A Time Complexity Lower Bound for Randomized Implementations of Some
Shared Objects. In Proceedings of the Seventeenth Annual ACM Symposium on Principles of
Distributed Computing, PODC ’98, pages 201–210, 1998.
16 Prasad Jayanti, King Tan, and Sam Toueg. Time and Space Lower Bounds for Nonblocking
Implementations. SIAM J. Comput., 30(2), 2000.
17 Shlomo Moran and Gadi Taubenfeld. A lower bound on wait-free counting. J. Algorithms,
24(1):1–19, 1997.
18 Shlomo Moran, Gadi Taubenfeld, and Irit Yadin. Concurrent Counting. J. Comput. Syst. Sci.,
53(1):61–78, 1996.
