Progressive Transactional Memory in Time and Space by Kuznetsov, Petr & Ravi, Srivatsan
Petr Kuznetsov¹ Srivatsan Ravi²
¹Télécom ParisTech
²TU Berlin
October 12, 2018
Abstract
Transactional memory (TM) allows concurrent processes to organize se-
quences of operations on shared data items into atomic transactions. A trans-
action may commit, in which case it appears to have executed sequentially or
it may abort, in which case no data item is updated.
The TM programming paradigm emerged as an alternative to conven-
tional fine-grained locking techniques, offering ease of programming and
compositionality. Though typically themselves implemented using locks,
TMs hide the inherent issues of lock-based synchronization behind a nice
transactional programming interface.
In this paper, we explore the inherent time and space complexity of lock-based
TMs, with a focus on the most popular class of progressive lock-based
TMs. We show that a progressive TM may force a read-only transaction
to perform a quadratic (in the number of data items it reads) number
of steps and to access a linear number of distinct memory locations, closing
the question of the inherent cost of read validation in TMs. We then show that
the total number of remote memory references (RMRs) that take place in
an execution of a progressive TM in which n concurrent processes perform
transactions on a single data item might reach Ω(n log n), which appears to
be the first RMR complexity lower bound for transactional memory.
1 Introduction
Transactional memory (TM) allows concurrent processes to organize sequences
of operations on shared data items into atomic transactions. A transaction may
commit, in which case it appears to have executed sequentially or it may abort, in
which case no data item is updated. The user can therefore design software having
only sequential semantics in mind and let the TM take care of handling conflicts
(concurrent reading and writing to the same data item) resulting from concurrent
executions. Another benefit of transactional memory over conventional lock-based
concurrent programming is compositionality: it allows the programmer to easily
compose multiple operations on multiple objects into atomic units, which is very
hard to achieve using locks directly. Therefore, while still typically implemented
using locks, TMs hide the inherent issues of lock-based programming behind an
easy-to-use and compositional transactional interface.
arXiv:1502.04908v2 [cs.DC] 13 Nov 2015
At a high level, a TM implementation must ensure that transactions are consis-
tent with some sequential execution. A natural consistency criterion is strict seri-
alizability [21]: all committed transactions appear to execute sequentially in some
total order respecting the timing of non-overlapping transactions. The stronger
criterion of opacity [14] guarantees that every transaction (including aborted and
incomplete ones) observes a view that is consistent with the same sequential
execution, which implies that no transaction exhibits pathological behavior
not predicted by the sequential program, such as division by zero or an infinite loop.
Notice that a TM implementation in which every transaction is aborted is triv-
ially opaque, but not very useful. Hence, the TM must satisfy some progress guar-
antee specifying the conditions under which a transaction is allowed to abort. It
is typically expected that a transaction aborts only because of data conflicts with
a concurrent one, e.g., when they are both trying to access the same data item and
at least one of the transactions is trying to update it. This progress guarantee, cap-
tured formally by the criterion of progressiveness [13], is satisfied by most TM
implementations today [6, 7, 16].
There are two design principles to which state-of-the-art TM implementations
[6–8,12,16,23] adhere: read invisibility [4, 9] and disjoint-access parallelism [5,
18]. Both are assumed to decrease the chances of a transaction encountering a data
conflict and, thus, to improve the performance of progressive TMs. Intuitively, reads
performed by a TM are invisible if they do not modify the shared memory used by the
TM implementation and, thus, do not affect other transactions. A disjoint-access
parallel (DAP) TM ensures that transactions accessing disjoint data sets do not
contend on the shared memory and, thus, may proceed independently. As was earlier
observed [14], the combination of these principles incurs some inherent costs, and
the main motivation of this paper is to explore these costs.
Intuitively, the overhead that invisible reads may incur comes from the need for
validation, i.e., ensuring that read data items have not been updated when the
transaction completes. Our first result (Section 4) is that a read-only transaction in an
opaque TM featuring weak DAP and weak invisible reads must incrementally
validate every next read operation. This results in a quadratic (in the size of the
transaction’s read set) step-complexity lower bound. Informally, weak DAP means
that two transactions encounter a memory race only if their data sets are connected
in the conflict graph, capturing data-set overlaps among all concurrent transactions.
Weak read invisibility allows read operations of a transaction T to be “visible” only
if T is concurrent with another transaction. The lower bound is derived for minimal
progressiveness, where transactions are guaranteed to commit only if they run
sequentially. Our result improves the lower bound [13, 14] derived for strict data
partitioning (a very strong version of DAP) and (strong) invisible reads.
Our second result is that, under weak DAP and weak read invisibility, a strictly
serializable TM must have a read-only transaction that accesses a linear (in the size
of the transaction’s read set) number of distinct memory locations in the course of
performing its last read operation. Naturally, this space lower bound also applies
to opaque TMs.
We then turn our focus to strongly progressive TMs [14] that, in addition to
progressiveness, ensure that not all concurrent transactions conflicting over a single
data item abort. In Section 5, we prove that in any strongly progressive strictly
serializable TM implementation that accesses the shared memory with read, write and
conditional primitives, such as compare-and-swap and load-linked/store-conditional,
the total number of remote memory references (RMRs) that take place in an
execution in which n concurrent processes perform transactions
on a single data item might reach Ω(n log n). The result is obtained via a reduction
to an analogous lower bound for mutual exclusion [3]. In the reduction, we show
that any TM with the above properties can be used to implement deadlock-free
mutual exclusion, employing transactional operations on only one data item and
incurring a constant RMR overhead. The lower bound applies to RMRs in both the
cache-coherent (CC) and distributed shared memory (DSM) models, and it appears
to be the first RMR complexity lower bound for transactional memory.
2 Model
TM interface. A transactional memory (in short, TM) supports transactions for
reading and writing on a finite set of data items, referred to as t-objects. Every
transaction Tk has a unique identifier k. We assume no bound on the size of a
t-object, i.e., on the cardinality of the set V of possible values a t-object
can have. A transaction Tk may contain the following t-operations, each being a
matching pair of an invocation and a response: readk(X) returns a value in
V (denoted readk(X) → v) or a special value Ak ∉ V (abort); writek(X, v),
for a value v ∈ V, returns ok or Ak; tryCk returns Ck ∉ V (commit) or Ak.
Implementations. We assume an asynchronous shared-memory system in which
a set of n> 1 processes p1, . . . , pn communicate by applying operations on shared
objects. An object is an instance of an abstract data type which specifies a set of
operations that provide the only means to manipulate the object. An implementa-
tion of an object type τ provides a specific data-representation of τ by applying
primitives on shared base objects, each of which is assigned an initial value and a
set of algorithms I1(τ), . . . , In(τ), one for each process. We assume that these prim-
itives are deterministic. Specifically, a TM implementation provides processes with
algorithms for implementing readk, writek and tryCk() of a transaction Tk by ap-
plying primitives from a set of shared base objects. We assume that processes issue
transactions sequentially, i.e., a process starts a new transaction only after the previ-
ous transaction is committed or aborted. A primitive is a generic read-modify-write
(RMW) procedure applied to a base object [10, 15]. It is characterized by a pair of
functions 〈g,h〉: given the current state of the base object, g is an update function
that computes its state after the primitive is applied, while h is a response function
that specifies the outcome of the primitive returned to the process. An RMW
primitive is trivial if it never changes the value of the base object to which it is applied.
Otherwise, it is nontrivial. An RMW primitive 〈g,h〉 is conditional if there exist
v, w such that g(v,w) = v and there exist v, w such that g(v,w) ≠ v [11]. For example,
compare-and-swap (CAS) and load-linked/store-conditional (LL/SC) are nontrivial
conditional RMW primitives, while fetch-and-add is an example of a nontrivial
RMW primitive that is not conditional.
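As an illustration of the 〈g,h〉 formulation, the following Python sketch models a base object and a few primitives as pairs of an update function and a response function. The names BaseObject, Primitive and apply, as well as the response conventions (for example, CAS responding with a boolean), are assumptions made for this sketch and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class BaseObject:
    value: Any  # current state of the base object


@dataclass(frozen=True)
class Primitive:
    g: Callable[[Any, Any], Any]  # update function: (state, argument) -> new state
    h: Callable[[Any, Any], Any]  # response function: (state, argument) -> response


def apply(obj: BaseObject, prim: Primitive, arg: Any = None) -> Any:
    """Apply a primitive to a base object and return its response."""
    old = obj.value
    obj.value = prim.g(old, arg)
    return prim.h(old, arg)


# Trivial primitive: never changes the state.
READ = Primitive(g=lambda v, w: v, h=lambda v, w: v)
# Nontrivial primitive: overwrites the state with the argument.
WRITE = Primitive(g=lambda v, w: w, h=lambda v, w: "ok")
# CAS takes (expected, new); the state changes only if it equals `expected`.
CAS = Primitive(g=lambda v, w: w[1] if v == w[0] else v,
                h=lambda v, w: v == w[0])
# Fetch-and-add returns the old state and adds the argument to it.
FETCH_AND_ADD = Primitive(g=lambda v, w: v + w, h=lambda v, w: v)
```

A failed CAS leaves the object untouched while a successful one updates it, which is the behaviour the notion of a conditional primitive is meant to capture.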
Executions and configurations. An event of a process pi (sometimes we say
step of pi) is an invocation or response of an operation performed by pi or an RMW
primitive 〈g,h〉 applied by pi to a base object b along with its response r (we call
it an RMW event and write (b,〈g,h〉,r, i)). A configuration specifies the value of
each base object and the state of each process. The initial configuration is the
configuration in which all base objects have their initial values and all processes
are in their initial states.
An execution fragment is a (finite or infinite) sequence of events. An execution
of an implementation I is an execution fragment where, starting from the initial
configuration, each event is issued according to I and each response of an RMW
event (b,〈g,h〉,r, i) matches the state of b resulting from all preceding events. An
execution E ·E ′, denoting the concatenation of E and E ′, is an extension of E and
we say that E ′ extends E.
Let E be an execution fragment. For every transaction identifier k, E|k denotes
the subsequence of E restricted to events of transaction Tk. If E|k is non-empty,
we say that Tk participates in E, else we say E is Tk-free. Two executions E and
E ′ are indistinguishable to a set T of transactions, if for each transaction Tk ∈ T ,
E|k = E ′|k. A TM history is the subsequence of an execution consisting of the
invocation and response events of t-operations.
The read set (resp., the write set) of a transaction Tk in an execution E, denoted
Rset(Tk) (and resp. Wset(Tk)), is the set of t-objects on which Tk invokes reads
(and resp. writes) in E. The data set of Tk is Dset(Tk) = Rset(Tk)∪Wset(Tk).
A transaction is called read-only if Wset(Tk) = ∅; write-only if Rset(Tk) = ∅; and
updating if Wset(Tk) ≠ ∅. Note that, in our TM model, the data set of a transaction
is not known a priori, i.e., at the start of the transaction; it is identifiable only
by the set of data items the transaction has invoked a read or write on in the given
execution.
Transaction orders. Let txns(E) denote the set of transactions that participate in
E. An execution E is sequential if every invocation of a t-operation is either the
last event in the history H exported by E or is immediately followed by a matching
response. We assume that executions are well-formed: no process invokes a new
operation before the previous operation returns. Specifically, we assume that for
all Tk, E|k begins with the invocation of a t-operation, is sequential and has no
events after Ak or Ck. A transaction Tk ∈ txns(E) is complete in E if E|k ends with
a response event. The execution E is complete if all transactions in txns(E) are
complete in E. A transaction Tk ∈ txns(E) is t-complete if E|k ends with Ak or Ck;
otherwise, Tk is t-incomplete. Tk is committed (resp., aborted) in E if the last event
of Tk is Ck (resp., Ak). The execution E is t-complete if all transactions in txns(E)
are t-complete.
For transactions Tk, Tm ∈ txns(E), we say that Tk precedes Tm in the real-time
order of E, denoted Tk ≺^RT_E Tm, if Tk is t-complete in E and the last event of Tk
precedes the first event of Tm in E. If neither Tk ≺^RT_E Tm nor Tm ≺^RT_E Tk, then Tk and
Tm are concurrent in E. An execution E is t-sequential if there are no concurrent
transactions in E.
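The real-time order and the notion of concurrency can be made concrete with a small Python sketch. The event encoding below (pairs of a transaction identifier and one of the tags "inv", "res", "C", "A") is a simplification assumed for illustration only; it is not the paper's formal event model.

```python
# An execution is a list of (transaction id, kind) pairs, where kind is
# "inv" (invocation), "res" (response), "C" (commit) or "A" (abort).

def span(execution, k):
    """Indices of the first and last events of transaction k."""
    idx = [i for i, (t, _) in enumerate(execution) if t == k]
    return idx[0], idx[-1]


def t_complete(execution, k):
    """Transaction k is t-complete if its last event is a commit or an abort."""
    last_kind = [kind for t, kind in execution if t == k][-1]
    return last_kind in ("C", "A")


def precedes(execution, k, m):
    """True if k precedes m in the real-time order of the execution."""
    _, last_k = span(execution, k)
    first_m, _ = span(execution, m)
    return t_complete(execution, k) and last_k < first_m


def concurrent(execution, k, m):
    """Two transactions are concurrent if neither precedes the other."""
    return not precedes(execution, k, m) and not precedes(execution, m, k)


def t_sequential(execution):
    """True if no two participating transactions are concurrent."""
    txns = sorted({t for t, _ in execution})
    return not any(concurrent(execution, a, b)
                   for i, a in enumerate(txns) for b in txns[i + 1:])
```

For example, a transaction that commits before another one takes its first step precedes it, whereas two transactions whose events interleave are concurrent.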
Contention. We say that a configuration C after an execution E is quiescent
(and resp. t-quiescent) if every transaction Tk ∈ txns(E) is complete (and resp.
t-complete) in C. If a transaction T is incomplete in an execution E, it has exactly
one enabled event, which is the next event the transaction will perform according
to the TM implementation. Events e and e′ of an execution E contend on a base
object b if they are both events on b in E and at least one of them is nontrivial (the
event is trivial (and resp. nontrivial) if it is the application of a trivial (and resp.
nontrivial) primitive). We say that a transaction T is poised to apply an event e
after E if e is the next enabled event for T in E. We say that transactions T and T ′
concurrently contend on b in E if they are each poised to apply contending events
on b after E.
We say that an execution fragment E is step contention-free for t-operation opk
if the events of E|opk are contiguous in E. We say that an execution fragment E is
step contention-free for Tk if the events of E|k are contiguous in E. We say that E is
step contention-free if E is step contention-free for all transactions that participate
in E.
3 TM classes
TM-correctness. We say that readk(X) is legal in a t-sequential execution E if it
returns the latest written value of X, and E is legal if every readk(X) in E that does
not return Ak is legal in E.
A finite history H is opaque if there is a legal t-complete t-sequential history
S, such that (1) for any two transactions Tk, Tm ∈ txns(H), if Tk ≺^RT_H Tm, then Tk
precedes Tm in S, and (2) S is equivalent to a completion of H.
A finite history H is strictly serializable if there is a legal t-complete t-sequential
history S, such that (1) for any two transactions Tk, Tm ∈ txns(H), if Tk ≺^RT_H Tm, then
Tk precedes Tm in S, and (2) S is equivalent to cseq(H̄), where H̄ is some completion
of H and cseq(H̄) is the subsequence of H̄ reduced to committed transactions
in H̄.
We refer to S as an opaque (and resp. strictly serializable) serialization of H.
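For very small histories, strict serializability can be checked by brute force. The sketch below is an illustration under simplifying assumptions that are not part of the paper: each transaction is a dictionary with hypothetical fields start, end, committed and ops; every t-object initially holds 0; a read may return the transaction's own earlier write; and only committed transactions are considered, so completions of the history are not explored.

```python
from itertools import permutations

INITIAL = 0  # assumed initial value of every t-object


def legal(order):
    """Every read returns the latest value written before it in `order`."""
    committed_state = {}
    for txn in order:
        pending = {}  # writes of the current transaction
        for kind, obj, val in txn["ops"]:
            if kind == "w":
                pending[obj] = val
            elif val != pending.get(obj, committed_state.get(obj, INITIAL)):
                return False
        committed_state.update(pending)
    return True


def respects_real_time(order):
    """No transaction is placed after one that started after it ended."""
    for i, later in enumerate(order):
        for earlier in order[:i]:
            if later["end"] < earlier["start"]:
                return False
    return True


def strictly_serializable(history):
    """Search all orders of the committed transactions for a serialization."""
    committed = [t for t in history if t["committed"]]
    return any(respects_real_time(p) and legal(p)
               for p in permutations(committed))
```

The search is factorial in the number of committed transactions, so the sketch is only meant for hand-sized examples such as a writer followed by a reader.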
TM-liveness. We say that a TM implementation M provides interval-contention
free (ICF) TM-liveness if for every finite execution E of M such that the configu-
ration after E is quiescent, and every transaction Tk that applies the invocation of
a t-operation opk immediately after E, the finite step contention-free extension for
opk contains a matching response.
TM-progress. We say that a TM implementation provides sequential TM-progress
(also called minimal progressiveness [14]) if every transaction running step contention-
free from a t-quiescent configuration commits within a finite number of steps.
We say that transactions Ti,Tj conflict in an execution E on a t-object X if
X ∈ Dset(Ti)∩Dset(Tj), and X ∈Wset(Ti)∪Wset(Tj).
A TM implementation M provides progressive TM-progress (or progressive-
ness) if for every execution E of M and every transaction Ti ∈ txns(E) that returns
Ai in E, there exists a transaction Tk ∈ txns(E) such that Tk and Ti are concurrent
and conflict in E [14].
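The conflict relation and the abort condition of progressiveness can be expressed over read and write sets. In the following Python sketch, the dictionary fields rset and wset and the function names are hypothetical; the caller is assumed to supply the transactions that are concurrent with the aborted one.

```python
def conflict_objects(t1, t2):
    """t-objects on which two transactions conflict: objects in both data
    sets that are in the write set of at least one of them."""
    shared = (t1["rset"] | t1["wset"]) & (t2["rset"] | t2["wset"])
    return shared & (t1["wset"] | t2["wset"])


def abort_allowed(txn, concurrent_txns):
    """Under progressiveness, an abort of `txn` must be justified by a
    conflict with some concurrent transaction."""
    return any(conflict_objects(txn, other) for other in concurrent_txns)
```

Two transactions that only read a common t-object do not conflict, so a progressive TM may not abort either of them on account of the other.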
Let CObj_H(Ti) denote the set of t-objects over which transaction Ti ∈ txns(H)
conflicts with any other transaction in history H, i.e., X ∈ CObj_H(Ti) iff there exist
transactions Ti and Tk that conflict on X in H. Let Q ⊆ txns(H) and
CObj_H(Q) = ⋃_{Ti ∈ Q} CObj_H(Ti).
Let CTrans(H) denote the set of non-empty subsets of txns(H) such that a set
Q is in CTrans(H) if no transaction in Q conflicts with a transaction not in Q.
Definition 1. A TM implementation M is strongly progressive if M is progressive
and for every history H of M and for every set Q ∈ CTrans(H) such that
|CObj_H(Q)| ≤ 1, some transaction in Q is not aborted in H.
Invisible reads. A TM implementation M uses invisible reads if for every execu-
tion E of M and for every read-only transaction Tk ∈ txns(E), E|k does not contain
any nontrivial events.
In this paper, we introduce a definition of weak invisible reads. For any execution
E and any t-operation πk invoked by some transaction Tk ∈ txns(E), let E|πk
denote the subsequence of E restricted to events of πk in E.
We say that a TM implementation M satisfies weak invisible reads if for any
execution E of M and every transaction Tk ∈ txns(E) with Rset(Tk) ≠ ∅ that is not
concurrent with any transaction Tm ∈ txns(E), E|πk does not contain any nontrivial
events, where πk is any t-read operation invoked by Tk in E.
Disjoint-access parallelism (DAP). Let τ_E(Ti,Tj) be the set of transactions (Ti and
Tj included) that are concurrent to at least one of Ti and Tj in E. Let G(Ti,Tj,E) be
an undirected graph whose vertex set is ⋃_{T ∈ τ_E(Ti,Tj)} Dset(T) and there is an edge
between t-objects X and Y iff there exists T ∈ τ_E(Ti,Tj) such that {X,Y} ⊆ Dset(T).
We say that Ti and Tj are disjoint-access in E if there is no path between a t-object
in Dset(Ti) and a t-object in Dset(Tj) in G(Ti,Tj,E). A TM implementation M is
weak disjoint-access parallel (weak DAP) if, for all executions E of M, transactions
Ti and Tj concurrently contend on the same base object in E only if Ti and Tj
are not disjoint-access in E or there exists a t-object X ∈ Dset(Ti) ∩ Dset(Tj) [5,22].
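The disjoint-access test amounts to a reachability query on the graph G(Ti,Tj,E). A minimal Python sketch follows; the function name and argument layout are hypothetical, the caller is assumed to have already computed the data sets of all transactions in τ_E(Ti,Tj), and a t-object shared by both data sets is treated as a trivial path, so that the two cases of the weak DAP condition are folded into one test.

```python
from collections import deque


def disjoint_access(dset_i, dset_j, concurrent_dsets):
    """Return True if no path connects Dset(Ti) to Dset(Tj).

    dset_i, dset_j: data sets (sets of t-object names) of Ti and Tj.
    concurrent_dsets: data sets of all transactions in tau_E(Ti, Tj),
    Ti and Tj included.
    """
    # Build adjacency: two t-objects are adjacent if some transaction
    # has both of them in its data set.
    adj = {}
    for dset in concurrent_dsets:
        for x in dset:
            adj.setdefault(x, set()).update(dset)

    # Breadth-first search from every t-object of Ti.
    seen = set(dset_i)
    queue = deque(dset_i)
    while queue:
        x = queue.popleft()
        if x in dset_j:
            return False  # reached a t-object of Tj
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return True
```

Under weak DAP, Ti and Tj may concurrently contend on a base object only when this function returns False, for instance when a chain of other concurrent transactions links their data sets.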
Figure 1: Executions in the proof of Lemma 2. Each panel shows Tφ performing i−1 t-reads Rφ(X1) · · · Rφ(Xi−1) followed by Rφ(Xi) → nv, and Ti performing Wi(Xi,nv) and committing.
(a) Rφ(Xi) must return nv by strict serializability.
(b) Ti does not observe any conflict with Tφ; Rφ(Xi) returns the new value nv. By weak DAP, Tφ cannot distinguish this from the execution in Figure 1a.
Lemma 1. ([5], [20]) Let M be any weak DAP TM implementation. Let α · ρ1 · ρ2
be any execution of M where ρ1 (and resp. ρ2) is the step contention-free execution
fragment of transaction T1 ∉ txns(α) (and resp. T2 ∉ txns(α)) and transactions T1,
T2 are disjoint-access in α · ρ1 · ρ2. Then, T1 and T2 do not contend on any base
object in α · ρ1 · ρ2.
4 Time and space complexity of sequential TMs
In this section, we prove that (1) a read-only transaction in an opaque TM
featuring weak DAP and weak invisible reads must incrementally validate every
next read operation, and (2) a strictly serializable TM (under weak DAP and weak
read invisibility) must have a read-only transaction that accesses a linear (in the
size of the transaction’s read set) number of distinct base objects in the course of
performing its last t-read and tryCommit operations.
We first prove the following lemma concerning strictly serializable weak DAP
TM implementations.
Lemma 2. Let M be any strictly serializable, weak DAP TM implementation that
provides sequential TM-progress. Then, for all i ∈ N, M has an execution of the
form π^{i−1} · ρ^i · α_i where,
• π^{i−1} is the complete step contention-free execution of read-only transaction
Tφ that performs (i−1) t-reads: readφ(X1) · · · readφ(Xi−1),
• ρ^i is the t-complete step contention-free execution of a transaction Ti that
writes nvi ≠ vi to Xi and commits,
• α_i is the complete step contention-free execution fragment of Tφ that
performs its ith t-read: readφ(Xi) → nvi.
Proof. By sequential TM-progress, M has an execution of the form ρ^i · π^{i−1}. Since
Dset(Tφ) ∩ Dset(Ti) = ∅ in ρ^i · π^{i−1}, by Lemma 1, transactions Tφ and Ti do not
contend on any base object in execution ρ^i · π^{i−1}. Thus, π^{i−1} · ρ^i is also an execution
of M.
By assumption of strict serializability, ρ^i · π^{i−1} · α_i is an execution of M in
which the t-read of Xi performed by Tφ must return nvi. But ρ^i · π^{i−1} · α_i is
indistinguishable to Tφ from π^{i−1} · ρ^i · α_i. Thus, M has an execution of the form
π^{i−1} · ρ^i · α_i.
Theorem 3. For every weak DAP TM implementation M that provides ICF TM-liveness,
sequential TM-progress and uses weak invisible reads,
(1) if M is opaque, for every m ∈ N, there exists an execution E of M such that
some transaction Tk ∈ txns(E) performs Ω(m²) steps, where m = |Rset(Tk)|;
(2) if M is strictly serializable, for every m ∈ N, there exists an execution E of
M such that some transaction Tk ∈ txns(E) accesses at least m−1 distinct
base objects during the executions of the mth t-read operation and tryCk(),
where m = |Rset(Tk)|.
Proof. For all i ∈ {1, . . . ,m}, let v be the initial value of t-object Xi.
(1) Suppose that M is opaque. Let π^m denote the complete step contention-free
execution of a transaction Tφ that performs m t-reads: readφ(X1) · · · readφ(Xm)
such that for all i ∈ {1, . . . ,m}, readφ(Xi) → v.
By Lemma 2, for all i ∈ {2, . . . ,m}, M has an execution of the form E^i =
π^{i−1} · ρ^i · α_i.
For each i ∈ {2, . . . ,m}, j ∈ {1,2} and ℓ ≤ (i−1), we now define an execution
of the form E^i_{jℓ} = π^{i−1} · β^ℓ · ρ^i · α^i_j as follows:
• β^ℓ is the t-complete step contention-free execution fragment of a transaction
Tℓ that writes nvℓ ≠ v to Xℓ and commits
• α^i_1 (and resp. α^i_2) is the complete step contention-free execution fragment of
readφ(Xi) → v (and resp. readφ(Xi) → Aφ).
Claim 4. For all i ∈ {2, . . . ,m} and ℓ ≤ (i−1), M has an execution of the form E^i_{1ℓ}
or E^i_{2ℓ}.
Proof. For all i ∈ {2, . . . ,m}, π^{i−1} is an execution of M. By assumption of weak
invisible reads and sequential TM-progress, Tℓ must be committed in π^{i−1} · β^ℓ and
M has an execution of the form π^{i−1} · β^ℓ. By the same reasoning, since Ti and Tℓ
have disjoint data sets, M has an execution of the form π^{i−1} · β^ℓ · ρ^i.
Since the configuration after π^{i−1} · β^ℓ · ρ^i is quiescent, by ICF TM-liveness,
π^{i−1} · β^ℓ · ρ^i extended with readφ(Xi) must return a matching response. If readφ(Xi) →
v, then clearly E^i_{1ℓ} is an execution of M with Tφ, Tℓ, Ti being a valid serialization
of transactions. If readφ(Xi) → Aφ, the same serialization justifies an opaque
execution.
Suppose by contradiction that there exists an execution of M such that π^{i−1} ·
β^ℓ · ρ^i is extended with the complete execution of readφ(Xi) → r; r ∉ {Aφ, v}. The
only plausible case to analyse is when r = nv. Since readφ(Xi) returns the value
of Xi updated by Ti, the only possible serialization for transactions is Tℓ, Ti, Tφ;
but readφ(Xℓ) performed by Tφ that returns the initial value v is not legal in this
serialization, a contradiction.
We now prove that, for all i ∈ {2, . . . ,m}, j ∈ {1,2} and ℓ ≤ (i−1), transaction Tφ
must access (i−1) different base objects during the execution of readφ(Xi) in the
execution π^{i−1} · β^ℓ · ρ^i · α^i_j.
By the assumption of weak invisible reads, the execution π^{i−1} · β^ℓ · ρ^i · α^i_j is
indistinguishable to transactions Tℓ and Ti from the execution π̃^{i−1} · β^ℓ · ρ^i · α^i_j, where
Rset(Tφ) = ∅ in π̃^{i−1}. But transactions Tℓ and Ti are disjoint-access in π̃^{i−1} · β^ℓ · ρ^i
and by Lemma 1, they cannot contend on the same base object in this execution.
Consider the (i−1) different executions: π^{i−1} · β^1 · ρ^i, . . ., π^{i−1} · β^{i−1} · ρ^i. For
all ℓ, ℓ′ ≤ (i−1); ℓ′ ≠ ℓ, M has an execution of the form π^{i−1} · β^ℓ · ρ^i · β^{ℓ′} in which
transactions Tℓ and Tℓ′ access mutually disjoint data sets. By weak invisible reads
and Lemma 1, the pairs of transactions Tℓ′, Ti and Tℓ′, Tℓ do not contend on any
base object in this execution. This implies that π^{i−1} · β^ℓ · β^{ℓ′} · ρ^i is an execution
of M in which transactions Tℓ and Tℓ′ each apply nontrivial primitives to mutually
disjoint sets of base objects in the execution fragments β^ℓ and β^{ℓ′} respectively (by
Lemma 1).
This implies that for any j ∈ {1,2}, ℓ ≤ (i−1), the configuration Ci after E^i
differs from the configurations after E^i_{jℓ} only in the states of the base objects that
are accessed in the fragment β^ℓ. Consequently, transaction Tφ must access at least
i−1 different base objects in the execution fragment α^i_j to distinguish configuration
Ci from the configurations that result after the (i−1) different executions π^{i−1} · β^1 ·
ρ^i, . . ., π^{i−1} · β^{i−1} · ρ^i respectively.
Thus, for all i ∈ {2, . . . ,m}, transaction Tφ must perform at least i−1 steps
while executing the ith t-read in α^i_j and Tφ itself must perform
∑_{i=1}^{m−1} i = m(m−1)/2 steps.
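The counting argument can be illustrated by a read-only transaction that revalidates its whole read set before every new t-read. The Python sketch below is not the construction used in the proof; it is a toy incremental-validation scheme over a hypothetical VersionedStore, with a before_read hook (also an assumption) standing in for a concurrent writer. It shows that such a transaction performs m + m(m−1)/2 loads for m reads, of which m(m−1)/2 are validation steps.

```python
class VersionedStore:
    """Toy shared memory: each t-object holds a (value, version) pair."""

    def __init__(self, names):
        self.cells = {x: (0, 0) for x in names}
        self.steps = 0  # number of loads performed by readers

    def load(self, name):
        self.steps += 1
        return self.cells[name]

    def store(self, name, value):
        _, version = self.cells[name]
        self.cells[name] = (value, version + 1)


def read_only_txn(store, names, before_read=None):
    """Read `names` in order, revalidating earlier reads before each read.

    Returns the values read, or None if validation fails (abort).
    """
    read_set = {}  # name -> version observed
    values = {}
    for i, name in enumerate(names):
        if before_read is not None:
            before_read(i)  # hook to inject a concurrent update
        for seen, version in read_set.items():
            if store.load(seen)[1] != version:
                return None  # an earlier read is stale: abort
        value, version = store.load(name)
        read_set[name] = version
        values[name] = value
    return values
```

With m = 5 the transaction performs 5 + 10 = 15 loads, and an update to an already-read t-object causes the next t-read to abort.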
(2) Suppose that M is strictly serializable, but not opaque. Since M is strictly
serializable, by Lemma 2, it has an execution of the form E = π^{m−1} · ρ^m · α_m.
For each ℓ ≤ (m−1), we prove that M has an execution of the form Eℓ = π^{m−1} ·
β^ℓ · ρ^m · ᾱ_m where ᾱ_m is the complete step contention-free execution fragment of
readφ(Xm) followed by the complete execution of tryCφ. Indeed, by weak invisible
reads, π^{m−1} does not contain any nontrivial events and the execution π^{m−1} · β^ℓ · ρ^m
is indistinguishable to transactions Tℓ and Tm from the executions π̃^{m−1} · β^ℓ and
π̃^{m−1} · β^ℓ · ρ^m respectively, where Rset(Tφ) = ∅ in π̃^{m−1}. Thus, applying Lemma 1,
transactions Tℓ and Tm do not contend on any base object in the execution π^{m−1} · β^ℓ ·
ρ^m. By ICF TM-liveness, readφ(Xm) and tryCφ must return matching responses
in the execution fragment ᾱ_m that extends π^{m−1} · β^ℓ · ρ^m. Consequently, for each
ℓ ≤ (m−1), M has an execution of the form Eℓ = π^{m−1} · β^ℓ · ρ^m · ᾱ_m such that
transactions Tℓ and Tm do not contend on any base object.
Strict serializability of M means that if readφ(Xm) → nv in the execution
fragment ᾱ_m, then tryCφ must return Aφ. Otherwise, if readφ(Xm) → v (i.e., the initial
value of Xm), then tryCφ may return Aφ or Cφ.
Thus, as with (1), in the worst case, Tφ must access at least m−1 distinct base
objects during the executions of readφ(Xm) and tryCφ to distinguish the configuration
Ci from the configurations after the m−1 different executions π^{m−1} · β^1 · ρ^m,
. . ., π^{m−1} · β^{m−1} · ρ^m respectively.
Algorithm 1 Mutual-exclusion object L from a strongly progressive, strictly serializable TM M; code for process pi; 1 ≤ i ≤ n
1: Local variables:
2:   bit facei, for each process pi
3: Shared objects:
4:   strongly progressive, strictly
5:   serializable TM M
6:   t-object X, initially ⊥
7:     storing value v ∈ {[pi, facei]} ∪ {⊥}
8:   for each tuple [pi, facei]
9:     Done[pi, facei] ∈ {true, false}
10:    Succ[pi, facei] ∈ {p1, . . . , pn} ∪ {⊥}
11:  for each pi and j ∈ {1, . . . ,n} \ {i}
12:    Lock[pi][pj] ∈ {locked, unlocked}
13: Function: func():
14:   atomic using M
15:     value := tx-read(X)
16:     tx-write(X, [pi, facei])
17:   on abort Return false
18:   Return value
19: Entry:
20:   facei := 1 − facei
21:   Done[pi, facei].write(false)
22:   Succ[pi, facei].write(⊥)
23:   while (prev ← func()) = false do
24:     no op
25:   end while
26:   if prev ≠ ⊥ then
27:     Lock[pi][prev.pid].write(locked)
28:     Succ[prev].write(pi)
29:     if Done[prev] = false then
30:       while Lock[pi][prev.pid] = locked do
31:         no op
32:       end while
33:   Return ok
34:   // Critical section
35: Exit:
36:   Done[pi, facei].write(true)
37:   Lock[Succ[pi, facei]][pi].write(unlocked)
38:   Return ok
5 RMR complexity of strongly progressive TMs
In this section, we prove that every strongly progressive strictly serializable TM that
uses only read, write and conditional RMW primitives has an execution
in which n concurrent processes perform transactions on a single data item and
incur Ω(n log n) remote memory references [2].
Remote memory references (RMR) [3]. In the cache-coherent (CC) shared memory,
each process maintains local copies of shared objects inside its cache, whose
consistency is ensured by a coherence protocol. Informally, we say that an access
to a base object b is remote to a process p and causes a remote memory reference
(RMR) if p’s cache contains a cached copy of the object that is out of date or inval-
idated; otherwise the access is local.
In the write-through (CC) protocol, to read a base object b, process p must have
a cached copy of b that has not been invalidated since its previous read. Otherwise,
p incurs an RMR. To write to b, p causes an RMR that invalidates all cached copies
of b and writes to the main memory.
In the write-back (CC) protocol, p reads a base object b without causing an RMR
if it holds a cached copy of b in shared or exclusive mode; otherwise the access of
b causes an RMR that (1) invalidates all copies of b held in exclusive mode, writing
b back to the main memory, and (2) creates a cached copy of b in shared mode.
Process p can write to b without causing an RMR if it holds a copy of b in exclusive
mode; otherwise p causes an RMR that invalidates all cached copies of b and creates
a cached copy of b in exclusive mode.
In the distributed shared memory (DSM), each register is forever assigned to a
single process and is remote to the others. Any access of a remote register causes an
RMR.
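The RMR accounting for the write-through CC and DSM models can be sketched as simple counters. The classes below are illustrative assumptions rather than part of the paper; in particular, the write-through model is read here as invalidating every cached copy on a write, including the writer's own.

```python
class WriteThroughCC:
    """RMR counter for the write-through cache-coherent model."""

    def __init__(self):
        self.valid = {}  # object -> processes holding a valid cached copy
        self.rmrs = {}   # process -> number of RMRs incurred

    def _charge(self, p):
        self.rmrs[p] = self.rmrs.get(p, 0) + 1

    def read(self, p, obj):
        holders = self.valid.setdefault(obj, set())
        if p not in holders:  # no valid cached copy: remote access
            self._charge(p)
            holders.add(p)

    def write(self, p, obj):
        self._charge(p)          # every write goes to main memory
        self.valid[obj] = set()  # and invalidates all cached copies


class DSM:
    """RMR counter for the distributed shared memory model."""

    def __init__(self, owner):
        self.owner = owner  # object -> process the register is assigned to
        self.rmrs = {}

    def access(self, p, obj):
        if self.owner[obj] != p:  # register is remote to p
            self.rmrs[p] = self.rmrs.get(p, 0) + 1
```

In the CC model, a process spinning on a cached register pays one RMR until another process writes to it, whereas in the DSM model spinning is free only on registers assigned to the spinning process.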
Mutual exclusion. The mutex object supports two operations: Enter and Exit, both
of which return the response ok. We say that a process pi is in the critical section
after an execution π if π contains the invocation of Enter by pi that returns ok, but
does not contain a subsequent invocation of Exit by pi in π.
A mutual exclusion implementation satisfies the following properties:
(Mutual-exclusion) After any execution π, there exists at most one process that
is in the critical section.
(Deadlock-freedom) Let π be any execution that contains the invocation of
Enter by process pi. Then, in every extension of π in which every process takes
infinitely many steps, some process is in the critical section.
(Finite-exit) Every process completes the Exit operation within a finite number
of steps.
5.1 Mutual exclusion from a strongly progressive TM
We describe an implementation of a mutex object L(M) from a strictly serializable,
strongly progressive TM implementation M (Algorithm 1). The algorithm is based
on the mutex implementation in [17].
Given a sequential implementation, we use a TM to execute the sequential code
in a concurrent environment by encapsulating each sequential operation within an
atomic transaction that replaces each read and write of a t-object with the
transactional read and write implementations, respectively. If the transaction commits,
then the result of the operation is returned; otherwise, if one of the transactional
operations aborts, the operation fails and can be retried (in Algorithm 1, func returns
false and the Entry code invokes it again). For instance, in Algorithm 1, we wish to
atomically read a t-object X, write a new value to it and return the old value of X prior to this write.
To achieve this, we employ a strictly serializable TM implementation M. More-
over, we assume that M is strongly progressive, i.e., in every execution, at least one
transaction successfully commits and the value of X is returned.
Shared objects. We associate each process pi with two alternating identities
[pi, facei]; facei ∈ {0,1}. The strongly progressive TM implementation M is used
to enqueue processes that attempt to enter the critical section within a single t-
object X (initially ⊥). For each [pi, facei], L(M) uses a register bit Done[pi, facei]
that indicates if this face of the process has left the critical section or is executing
the Entry operation. Additionally, we use a register Succ[pi, facei] that stores the
process expected to succeed pi in the critical section. If Succ[pi, facei] = pj, we
say that pj is the successor of pi (and pi is the predecessor of pj). Intuitively,
this means that pj is expected to enter the critical section immediately after pi.
Finally, L(M) uses a 2-dimensional bit array Lock: for each process pi, there are
n−1 registers associated with the other processes. For all j ∈ {1, . . . ,n}\{i},
the registers Lock[pi][pj] are local to pi and registers Lock[pj][pi] are remote to pi.
Process pi can only access registers in the Lock array that are local or remote to it.
Entry operation. A process pi adopts a new identity facei and writes false to
Done[pi, facei] to indicate that pi has started the Entry operation. Process pi then
initializes the successor of [pi, facei] by writing ⊥ to Succ[pi, facei]. Next, pi uses
the strongly progressive TM implementation M to atomically store its pid and
identity (i.e., facei) to t-object X and return the pid and identity of its predecessor,
say [p j, face j]. Intuitively, this suggests that [pi, facei] is scheduled to enter the critical
section immediately after [p j, face j] exits the critical section. Note that if pi reads
the initial value of t-object X , then it immediately enters the critical section.
Otherwise, it writes locked to the register Lock[pi][p j] and sets itself to be the successor of
[p j, face j] by writing pi to Succ[p j, face j]. Process pi now checks if p j has started
the Exit operation by checking if Done[p j, face j] is set. If it is, pi enters the critical
section; otherwise pi spins on the register Lock[pi][p j] until it is unlocked.
Exit operation. Process pi first indicates that it has exited the critical section by
setting Done[pi, facei], following which it unlocks the register Lock[Succ[pi, facei]][pi]
to allow pi’s successor to enter the critical section.
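The Entry and Exit operations described above can be sketched in Python as follows. This is a toy rendering under stated assumptions, not Algorithm 1 verbatim: the TM M is replaced by a try-lock-protected swap that is retried until it succeeds, the Lock array is a full n-by-n list, Exit skips the unlock when no successor has registered yet, and the sketch relies on CPython executing these list reads and writes in a sequentially consistent order. The names SketchMutex and demo are hypothetical.

```python
import threading
import time

LOCKED, UNLOCKED = True, False


class SketchMutex:
    """Toy rendering of L(M) for n threads with ids 0..n-1."""

    def __init__(self, n):
        self.x = None                                    # t-object X, initially None
        self._m = threading.Lock()                       # ASSUMPTION: stand-in for the TM M
        self.face = [0] * n                              # current identity of each process
        self.done = [[False, False] for _ in range(n)]   # Done[p_i, face]
        self.succ = [[None, None] for _ in range(n)]     # Succ[p_i, face]
        self.lock = [[UNLOCKED] * n for _ in range(n)]   # Lock[p_i][p_j]

    def _fetch_and_store(self, new):
        # Retry an abortable "transaction" on X until it commits.
        while True:
            if self._m.acquire(blocking=False):
                try:
                    old, self.x = self.x, new
                    return old
                finally:
                    self._m.release()

    def entry(self, i):
        self.face[i] = 1 - self.face[i]                  # adopt a new identity
        f = self.face[i]
        self.done[i][f] = False                          # Entry has started
        self.succ[i][f] = None                           # no successor yet
        prev = self._fetch_and_store((i, f))             # enqueue on X
        if prev is None:
            return                                       # read the initial value: enter
        j, fj = prev
        self.lock[i][j] = LOCKED
        self.succ[j][fj] = i                             # register as successor
        if self.done[j][fj]:
            return                                       # predecessor has started Exit
        while self.lock[i][j] == LOCKED:
            time.sleep(0)                                # spin, yielding to other threads

    def exit(self, i):
        f = self.face[i]
        self.done[i][f] = True                           # left the critical section
        s = self.succ[i][f]
        if s is not None:                                # ASSUMPTION: nothing to unlock otherwise
            self.lock[s][i] = UNLOCKED


def demo(n=4, rounds=50):
    """Run n threads through the mutex; return (passages completed, violations seen)."""
    mutex = SketchMutex(n)
    occupancy, completed, violations = [0], [0], []

    def worker(i):
        for _ in range(rounds):
            mutex.entry(i)
            occupancy[0] += 1
            if occupancy[0] != 1:                        # someone else is inside
                violations.append(i)
            time.sleep(0)                                # invite a context switch
            completed[0] += 1
            occupancy[0] -= 1
            mutex.exit(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed[0], violations
```

Calling demo() drives four threads through 50 passages each and reports how many passages completed and whether two threads were ever observed inside the critical section together. The two alternating faces matter here: a process that re-enters writes false to the Done bit of its other face, so a slow successor can still read the Done bit of the face it was queued behind.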
5.2 Proof of correctness
Lemma 5. The implementation L(M) (Algorithm 1) satisfies mutual exclusion.
Proof. Let E be any execution of L(M). We say that [pi, facei] is the successor of
[p j, face j] if pi reads the value of prev in Line 25 to be [p j, face j] (and [p j, face j] is
the predecessor of [pi, facei]); otherwise if pi reads the value to be ⊥, we say that
pi has no predecessor.
Suppose by contradiction that there exist processes pi and p j that are both
inside the critical section after E. Since pi is inside the critical section, either (1)
pi read prev = ⊥ in Line 23, or (2) pi read that Done[prev] is true (Line 29) or pi
read that Done[prev] is false and Lock[pi][prev.pid] is unlocked (Line 30).
(Case 1) Suppose that pi read prev =⊥ and entered the critical section. Since
in this case, pi does not have any predecessor, some other process that returns
successfully from the while loop in Line 25 must be the successor of pi in E. Since
there exists [p j, face j] also inside the critical section after E, p j reads either
[pi, facei] or some other process to be its predecessor. Observe that there must exist
some such process [pk, facek] whose predecessor is [pi, facei]. Hence, without loss
of generality, we can assume that [p j, face j] is the successor of [pi, facei]. By our
assumption, [p j, face j] is also inside the critical section. Thus, p j locked the regis-
ter Lock[p j, pi] in Line 27 and set itself to be pi’s successor in Line 28. Then, p j
read that Done[pi, facei] is true or read that Done[pi, facei] is false and waited until
Lock[p j, pi] is unlocked and then entered the critical section. But this is possible
only if pi has left the critical section and updated the registers Done[pi, facei] and
Lock[p j, pi] in Lines 36 and 37 respectively—contradiction to the assumption that
[pi, facei] is also inside the critical section after E.
(Case 2) Suppose that pi did not read prev =⊥ and entered the critical section.
Thus, pi either read that Done[prev] is true in Line 29, or read that Done[prev] is
false in Line 29 and that Lock[pi][prev.pid] is unlocked in Line 30, where prev is
the predecessor of [pi, facei]. As with case 1, without
loss of generality, we can assume that [p j, face j] is the successor of [pi, facei] or
[p j, face j] is the predecessor of [pi, facei].
Suppose that [p j, face j] is the predecessor of [pi, facei], i.e., pi writes the value
[pi, facei] to the register Succ[p j, face j] in Line 28. Since [p j, face j] is also inside
the critical section after E, process pi must read that Done[p j, face j] is false in
Line 29 and Lock[pi, p j] is locked in Line 30. But then pi could not have entered
the critical section after E—contradiction.
Suppose that [p j, face j] is the successor of [pi, facei], i.e., p j writes the value
[p j, face j] to the register Succ[pi, facei]. Since both pi and p j are inside the critical
section after E, process p j must read that Done[pi, facei] is false in Line 29 and
Lock[p j, pi] is locked in Line 30. Thus, p j must spin on the register Lock[p j, pi],
waiting for it to be unlocked by pi before entering the critical section—contradiction
to the assumption that both pi and p j are inside the critical section.
Thus, L(M) satisfies mutual exclusion.
Lemma 6. The implementation L(M) (Algorithm 1) provides deadlock-freedom.
Proof. Let E be any execution of L(M). Observe that a process may be stuck
indefinitely only in Lines 23 and 30 as it performs the while loop.
Since M is strongly progressive, in every execution E that contains an invocation
of Entry by process pi, some process returns true from the invocation of func()
in Line 23.
Now consider a process pi that returns successfully from the while loop in
Line 23. Suppose that pi is stuck indefinitely as it performs the while loop in
Line 30. Thus, no process has unlocked the register Lock[pi][prev.pid] by writing
to it in the Exit section. Recall that since [pi, facei] has reached the while loop in
Line 30, [pi, facei] necessarily has a predecessor, say [p j, face j], and has set itself
to be p j’s successor by writing pi to register Succ[p j, face j] in Line 28. Consider
the two possible cases: the predecessor of [p j, face j] is some process pk, k ≠ i, or the
predecessor of [p j, face j] is the process pi itself.
(Case 1) Since by assumption, process p j takes infinitely many steps in E, the
only reason that p j is stuck without entering the critical section is that [pk, facek] is
also stuck in the while loop in Line 30. We can apply this argument iteratively
to the case in which pk’s predecessor is a process other than pi and p j
that is also stuck in the while loop in Line 30. But then the last such process
must eventually read the corresponding Lock to be unlocked and enter the critical
section. Thus, in every extension of E in which every process takes infinitely many
steps, some process will enter the critical section.
(Case 2) Suppose that the predecessor of [p j, face j] is the process pi itself.
Thus, as [pi, facei] is stuck in the while loop waiting for Lock[pi, p j] to be unlocked
by process p j, p j leaves the critical section, unlocks Lock[pi, p j] in Line 37 and,
prior to the read of Lock[pi, p j] by pi, p j re-starts the Entry operation, writes false to
Done[p j,1− face j] and sets itself to be the successor of [pi, facei] and spins on
the register Lock[p j, pi]. However, observe that process pi, which takes infinitely
many steps by our assumption must eventually read that Lock[pi, p j] is unlocked
and enter the critical section, thus establishing deadlock-freedom.
We say that a TM implementation M accesses a single t-object if in every
execution E of M and every transaction T ∈ txns(E), |Dset(T )| ≤ 1. We can now
prove the following theorem:
Theorem 7. Any strictly serializable, strongly progressive TM implementation M
that accesses a single t-object implies a deadlock-free, finite-exit mutual exclusion
implementation L(M) such that the RMR complexity of M is within a constant
factor of the RMR complexity of L(M).
Proof. (Mutual-exclusion) Follows from Lemma 5.
(Finite-exit) The proof is immediate since the Exit operation contains no un-
bounded loops or waiting statements.
(Deadlock-freedom) Follows from Lemma 6.
(RMR complexity) First, let us consider the CC model. Observe that every
event not on M performed by a process pi as it performs the Entry or Exit
operations clearly incurs O(1) RMR cost, except possibly the while loop executed in
Line 30. During the execution of this while loop, process pi spins on the regis-
ter Lock[pi][p j], where p j is the predecessor of pi. Observe that pi’s cached copy
of Lock[pi][p j] may be invalidated only by process p j as it unlocks the register
in Line 37. Since no other process may write to this register and pi terminates
the while loop immediately after the write to Lock[pi][p j] by p j, pi incurs O(1)
RMRs. Thus, the overall RMR cost incurred by M is within a constant factor of
the RMR cost of L(M).
Now we consider the DSM model. As with the reasoning for the CC model,
every event not on M performed by a process pi as it performs the Entry or Exit
operations clearly incurs O(1) RMR cost, except possibly the while loop executed
in Line 30. During the execution of this while loop, process pi spins on the register
Lock[pi][p j], where p j is the predecessor of pi. Recall that Lock[pi][p j] is a register
that is local to pi and thus, pi does not incur any RMR cost on account of executing
this loop. It follows that pi incurs O(1) RMR cost in the DSM model. Thus, the
overall RMR cost of M is within a constant factor of the RMR cost of L(M) in the
DSM model.
Theorem 8. ([3]) Any deadlock-free, finite-exit mutual exclusion implementation
from read, write and conditional primitives has an execution whose RMR complex-
ity is Ω(n logn).
Theorems 7 and 8 imply:
Theorem 9. Any strictly serializable, strongly progressive TM implementation
from read, write and conditional primitives that accesses a single t-object has an
execution whose RMR complexity is Ω(n logn).
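The implication can be unpacked as the following chain, given here only as a sketch. The notation R_M(E) and R_{L(M)}(E) for the RMR cost incurred in an execution E, and the constants c and c', are introduced for illustration and do not appear in the statements above. We also assume, as a simplification, that E is an execution in which each of the n processes performs at most one Entry/Exit passage.

```latex
\begin{align*}
c'\, n \log n \;&\le\; R_{L(M)}(E)
  && \text{(Theorem 8, for a suitable execution } E\text{)} \\
R_{L(M)}(E) \;&\le\; R_{M}(E) + c\, n
  && \text{(proof of Theorem 7: } O(1) \text{ RMRs per process outside } M\text{)} \\
\Longrightarrow\quad R_{M}(E) \;&\ge\; c'\, n \log n - c\, n \;=\; \Omega(n \log n).
\end{align*}
```

The additive term c n is dominated by n log n, so the lower bound on the mutex carries over to M.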
6 Related work and concluding remarks
Theorem 3 improves the read-validation step-complexity lower bound [13, 14] de-
rived for strict-data partitioning (a very strong version of DAP) and (strong) invis-
ible reads. In a strict data partitioned TM, the set of base objects used by the TM
is split into disjoint sets, each storing information only about a single data item. In-
deed, every TM implementation that is strict data-partitioned satisfies weak DAP,
but not vice-versa. The definition of invisible reads assumed in [13, 14] requires
that a t-read operation does not apply nontrivial events in any execution. Theo-
rem 3 however, assumes weak invisible reads, stipulating that t-read operations of
a transaction T do not apply nontrivial events only when T is not concurrent with
any other transaction.
The notion of weak DAP used in this paper was introduced by Attiya et al. [5].
Proving a lower bound for a concurrent object by reduction to a form of mutual
exclusion has previously been used in [1, 14]. Guerraoui and Kapalka [14] proved
that it is impossible to implement strictly serializable strongly progressive TMs
that provide wait-free TM-liveness (every t-operation returns a matching response
within a finite number of steps) using only read and write primitives. Alistarh et al.
proved a lower bound on the RMR complexity of the renaming problem [1]. Our reduction
algorithm (Section 5) is inspired by the O(1) RMR mutual exclusion algorithm by
Lee [17].
To the best of our knowledge, the TM properties assumed for Theorem 3 cover
all of the TM implementations that are subject to the validation step-complexity [6,
7, 16].
It is easy to see that the lower bound of Theorem 3 is tight for both strict seri-
alizability and opacity. We refer to the TM implementation in [19] or DSTM [16]
for the matching upper bound.
Finally, we conjecture that the lower bound of Theorem 9 is tight. Proving this
remains an interesting open question.
References
[1] D. Alistarh, J. Aspnes, S. Gilbert, and R. Guerraoui. The complexity of
renaming. In IEEE 52nd Annual Symposium on Foundations of Computer
Science, FOCS 2011, Palm Springs, CA, USA, October 22-25, 2011, pages
718–727, 2011.
[2] T. E. Anderson. The performance of spin lock alternatives for shared-memory
multiprocessors. IEEE Trans. Parallel Distrib. Syst., 1(1):6–16, 1990.
[3] H. Attiya, D. Hendler, and P. Woelfel. Tight RMR lower bounds for mutual
exclusion and other problems. In Proceedings of the Twenty-seventh ACM
Symposium on Principles of Distributed Computing, PODC ’08, pages 447–
447, New York, NY, USA, 2008. ACM.
[4] H. Attiya and E. Hillel. The cost of privatization in software transactional
memory. IEEE Trans. Computers, 62(12):2531–2543, 2013.
[5] H. Attiya, E. Hillel, and A. Milani. Inherent limitations on disjoint-access
parallel implementations of transactional memory. Theory of Computing Sys-
tems, 49(4):698–719, 2011.
[6] L. Dalessandro, M. F. Spear, and M. L. Scott. NOrec: Streamlining STM by
abolishing ownership records. SIGPLAN Not., 45(5):67–78, Jan. 2010.
[7] D. Dice, O. Shalev, and N. Shavit. Transactional locking ii. In Proceedings
of the 20th International Conference on Distributed Computing, DISC’06,
pages 194–208, Berlin, Heidelberg, 2006. Springer-Verlag.
[8] D. Dice and N. Shavit. What really makes transactions fast? In Transact,
2006.
[9] D. Dice and N. Shavit. TLRW: return of the read-write lock. In SPAA, pages
284–293, 2010.
[10] F. Ellen, D. Hendler, and N. Shavit. On the inherent sequentiality of concur-
rent objects. SIAM J. Comput., 41(3):519–536, 2012.
[11] F. Fich, D. Hendler, and N. Shavit. On the inherent weakness of conditional
synchronization primitives. In Proceedings of the Twenty-third Annual ACM
Symposium on Principles of Distributed Computing, PODC ’04, pages 80–
87, New York, NY, USA, 2004. ACM.
[12] K. Fraser. Practical lock-freedom. Technical report, Cambridge University
Computer Laboratory, 2003.
[13] R. Guerraoui and M. Kapalka. The semantics of progress in lock-based trans-
actional memory. SIGPLAN Not., 44(1):404–415, Jan. 2009.
[14] R. Guerraoui and M. Kapalka. Principles of Transactional Memory, Synthesis
Lectures on Distributed Computing Theory. Morgan and Claypool, 2010.
[15] M. Herlihy. Wait-free synchronization. ACM Trans. Prog. Lang. Syst.,
13(1):123–149, 1991.
[16] M. Herlihy, V. Luchangco, M. Moir, and W. N. Scherer, III. Software trans-
actional memory for dynamic-sized data structures. In Proceedings of the
Twenty-second Annual Symposium on Principles of Distributed Computing,
PODC ’03, pages 92–101, New York, NY, USA, 2003. ACM.
[17] H. Lee. Local-spin mutual exclusion algorithms on the DSM model using
fetch-and-store objects. 2003.
[18] A. Israeli and L. Rappoport. Disjoint-access-parallel implementations of
strong shared memory primitives. In PODC, pages 151–160, 1994.
[19] P. Kuznetsov and S. Ravi. On the cost of concurrency in transactional mem-
ory. CoRR, abs/1103.1302, 2011.
[20] P. Kuznetsov and S. Ravi. On partial wait-freedom in transactional memory.
CoRR, abs/1407.6876, 2014.
[21] C. H. Papadimitriou. The serializability of concurrent database updates. J.
ACM, 26:631–653, 1979.
[22] D. Perelman, R. Fan, and I. Keidar. On maintaining multiple versions in
STM. In PODC, pages 16–25, 2010.
[23] F. Tabba, M. Moir, J. R. Goodman, A. W. Hay, and C. Wang. NZTM:
Nonblocking zero-indirection transactional memory. In Proceedings of the
Twenty-first Annual Symposium on Parallelism in Algorithms and Architec-
tures, SPAA ’09, pages 204–213, New York, NY, USA, 2009. ACM.
