On the Cost of Concurrency in Transactional Memory by Kuznetsov, Petr & Ravi, Srivatsan
On the Cost of Concurrency in Transactional Memory
Petr Kuznetsov
TU Berlin/Deutsche Telekom Laboratories
Srivatsan Ravi
TU Berlin/Deutsche Telekom Laboratories ∗
Abstract
The promise of software transactional memory (STM) is to combine an easy-to-use pro-
gramming interface with an efficient utilization of the concurrent-computing abilities pro-
vided by modern machines. But does this combination come with an inherent cost?
We evaluate the cost of concurrency by measuring the amount of expensive synchroniza-
tion that must be employed in an STM implementation that ensures positive concurrency,
i.e., allows for concurrent transaction processing in some executions. We focus on two
popular progress conditions that provide positive concurrency: progressiveness and permis-
siveness.
We show that in permissive STMs, providing a very high degree of concurrency, a trans-
action performs a linear number of expensive synchronization patterns with respect to its
read-set size. In contrast, progressive STMs provide a very small degree of concurrency
but, as we demonstrate, can be implemented using at most one expensive synchronization
pattern per transaction. However, we show that even in progressive STMs, a transaction
has to “protect” (e.g., by using locks or strong synchronization primitives) a linear amount
of data with respect to its write-set size. Our results suggest that looking for high degrees
of concurrency in STM implementations may bring a considerable synchronization cost.
Keywords: Transactional Memory, Concurrency, RAW/AWAR complexity
∗The research leading to these results has received funding from the European Union Seventh Framework
Programme (FP7/2007-2013) under grant agreement N 238639, ITN project TRANSFORM.
arXiv:1103.1302v9 [cs.DC] 27 Jun 2013
1 Introduction
The software transactional memory (STM) paradigm promises to efficiently exploit the concur-
rency provided by modern computers while offering an easy-to-use programming interface. It
allows a programmer to write a concurrent program as a sequence of transactions. A transaction
is a series of read and write operations on transactional objects (or t-objects). An STM imple-
mentation turns this series into a sequence of accesses to underlying base objects and exports
“all-or-nothing” semantics: every transaction either commits in which case all its operations are
expected to instantaneously “take effect”, or aborts in which case the transaction does not affect
any other transaction. In this paper, the default STM correctness property is opacity [13, 15]
that, informally, requires that in every execution, there is a total order on all transactions,
including aborted ones, where every read operation returns the argument of the last committed
write operation on the read t-object.
An STM implementation that aborts every transaction is trivially correct but useless. There-
fore, we need to specify a progress condition that captures the execution scenarios in which a
transaction should commit. Consider, for example, a simple non-trivial progress condition that
requires a transaction to commit if it does not overlap with any other transaction. This condi-
tion can be implemented using a single lock that is acquired at the beginning of a transaction
and released at its end. The resulting “single-lock” STM will be running one transaction at
a time, thus ignoring the potential benefits of multiprocessing. Similarly, an obstruction-free
STM [12] that only requires a transaction to commit if it eventually runs with no contention
allows for no concurrency at all. But to exploit the power of modern multiprocessor machines,
an STM implementation must allow at least some transactions to make progress concurrently.
If this is the case, we say that the implementation provides positive concurrency, in contrast to
zero concurrency provided by “single-lock” and obstruction-free STMs.
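As an illustration of the zero-concurrency baseline, the following is a minimal Python sketch of a single-lock STM. The names SingleLockSTM, run and transfer are hypothetical and chosen only for this example, and Python's threading.Lock stands in for whichever mutual exclusion algorithm an implementation would use. Writes are buffered and applied at the end of the transaction, which is one simple way to obtain the all-or-nothing semantics.

```python
import threading


class SingleLockSTM:
    """Toy single-lock STM: one global lock serializes all transactions."""

    def __init__(self):
        self._lock = threading.Lock()   # the single global lock
        self._store = {}                # t-object name -> value (initially 0)

    def run(self, body):
        """Run body(read, write) as one transaction under the global lock."""
        with self._lock:                # acquired at the beginning of the transaction
            writes = {}                 # buffered writes, applied on commit

            def read(x):
                return writes.get(x, self._store.get(x, 0))

            def write(x, v):
                writes[x] = v

            result = body(read, write)
            self._store.update(writes)  # commit: all writes take effect together
            return result               # lock is released at the end of the transaction


def transfer(read, write):
    # Hypothetical transaction body: move one unit from X to Y.
    write("X", read("X") - 1)
    write("Y", read("Y") + 1)


stm = SingleLockSTM()
threads = [threading.Thread(target=stm.run, args=(transfer,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = stm.run(lambda read, write: read("X") + read("Y"))
```

Since the lock is held for the whole transaction, at most one transaction runs at a time, which is exactly the behavior the text describes as providing no concurrency.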
In this paper, we try to understand the inherent costs of allowing multiple concurrent trans-
actions to commit. Therefore, we focus on progress conditions that provide positive concurrency:
progressiveness [14] and permissiveness [11]. Informally, a progressive STM [14] provides a very
small degree of concurrency by only enforcing a transaction T to commit if it encounters no
concurrent conflicting transaction T ′: T and T ′ conflict on a t-object X if they concurrently
access X and one of the transactions tries to update X. A stronger variant of progressiveness,
called strong progressiveness, additionally requires that in case a set of transactions conflict on
at most one t-object, at least one transaction commits. A much more demanding permissive
STM [11] stipulates that a transaction must commit, unless committing it violates correctness,
which, informally, provides the highest degree of concurrency.
To understand the inherent cost of positive concurrency in STM implementations, we first
consider the number of RAW/AWAR synchronization patterns [6] that must be performed by a
process in the course of a transaction. A read-after-write (RAW) pattern consists of a write to
a (shared) base object x followed by a read from a different base object y (without a write to
y in between). An atomic write-after-read (AWAR) pattern consists of an atomic (indivisible)
execution of a read of a base object followed by a write on (possibly the same) base object.
Accounting for RAW/AWAR patterns is important since most modern processor architectures
use relaxed memory models, where maintaining the order of operations in a RAW requires a
memory fence [21] and each AWAR is manifested as an atomic instruction such as Compare-
and-Swap (CAS). In most architectures, memory fences and atomic instructions are believed to
be considerably slower than regular shared-memory accesses [1, 19, 21, 20].
We show that every permissive and opaque STM implementation has, for any m ∈ N, an
execution in which a transaction with a read set of size m incurs Ω(m) consecutive RAW/AWAR
patterns. This contrasts with a single-lock STM that uses only one such pattern, since a
successful lock acquisition can be implemented using only one (multi-)RAW [18]¹ or AWAR [4].
We show that one RAW/AWAR is in fact optimal for single-lock STMs. Moreover, we present
implementations of progressive STMs that employ just a single RAW or AWAR pattern per
transaction. Also, we describe a strongly progressive space-bounded STM implementation that
incurs four RAWs per transaction.
These implementations suggest that the RAW/AWAR metric is too coarse-grained to evalu-
ate the complexity of progressive STMs. Therefore, we introduce a new metric called protected
data size that, intuitively, captures the amount of data that a transaction must exclusively
control at some point of its execution. All progressive STM implementations we are aware of
(see, e.g., an overview in [14]) use locks or timing assumptions to give an updating transaction
exclusive access to all objects in its write set at some point of its execution. E.g., lock-based
progressive implementations require that a transaction grabs all locks on its write set before
updating the corresponding base objects. Our results show that this is an inherent price to
pay for providing progressive concurrency: every committed transaction in a progressive and
strict disjoint-access-parallel² STM implementation must, at some point of its execution, protect
every object in its write set. Interestingly, as our progressive implementations show, the
transaction’s read set does not need to be protected.
In brief, our results imply that providing high degrees of concurrency in opaque STM im-
plementations incurs a considerable synchronization cost. Permissive STMs, while providing
the best possible concurrency in theory, require a strong synchronization primitive or a mem-
ory fence per read operation, which may result in excessively slow execution times. Progressive
STMs provide only basic concurrency but perform considerably better in this respect: we present
progressive implementations that incur constant RAW/AWAR complexity. Does this mean that
maximizing the ability of processing multiple transactions in parallel should not be an important
factor in STM design? Should we rather assume little positive concurrency provided by
progressiveness or even focus on speculative single-lock solutions à la flat combining [16]? Difficult
to say affirmatively, but our results suggest so.
The rest of the paper is organized as follows. Section 2 briefly introduces our system model
and recalls the correctness criteria in STM. Section 3 presents some useful properties of STM
implementations and Section 4 recalls the definitions of progress conditions of STM, including
progressiveness and permissiveness. Section 5 presents the definitions of RAW/AWAR complexity.
Section 6 presents a linear lower bound on the number of RAW/AWAR patterns executed
by a transaction in a permissive STM. Section 7 describes our progressive STM implementations
that perform constant RAWs or AWARs per transaction and presents a lower bound on the
amount of data to be protected by a transaction in a progressive STM. Section 8 summarizes
some related work and Section 9 concludes the paper. Detailed proofs are delegated to the
optional Appendix.
2 Model
Our STM model, while keeping the spirit of the original definitions of [13, 15], introduces some
refinements that are instrumental for our results.
Transactions. Transactional memory provides the ability to read and write a set
of transactional objects, or t-objects, using atomic transactions. A transaction is a sequence
of accesses (reads or writes) to t-objects. We assume that every transaction Tk has a unique
identifier k. Formally, STM exports the following operations (called tm-operations in the paper):
(1) readk(X) that returns a value in a set V or a special value Ak ∉ V (abort); (2) writek(X, v)
that returns okk or Ak; (3) tryCk that returns Ck ∉ V (commit) or Ak; and (4) tryAk that
returns Ak.
¹ A multi-RAW consists of a series of writes followed by a series of reads from distinct locations. Maintaining
the multi-RAW order can be achieved with a single memory fence.
² A disjoint-access-parallel STM implementation [17, 8] guarantees that transactions accessing disjoint sets of
transactional objects are executed independently of each other, i.e., without conflicting on the base objects.
A history H is a sequence of invocations and responses of tm-operations. A history H
is sequential if every invocation is either the last event in H or is immediately followed by a
matching response. H|k denotes the subsequence of H restricted to events with index k. If H|k
is non-empty we say that Tk participates in H, and parts(H) denotes the set of transactions
that participate in H. A history is well-formed if for all Tk, H|k is sequential and contains no
events that appear after Ak or Ck. Throughout this paper, we assume that all histories are well-
formed, i.e., the user of transactional memory never invokes a new operation before receiving
a response from the current one and does not invoke any operation opk after Tk has returned
Ck or Ak. A history H is complete if for every Tk ∈ parts(H), H|k ends with a response event.
A transaction Tk ∈ parts(H) is live in H if H|k does not end with Ak or Ck. Otherwise, Tk
is called complete. A history is t-complete if parts(H) contains only complete transactions. A
transaction Tk ∈ parts(H) is forcefully aborted in H if some operation opk ≠ tryAk returns Ak.
Two histories H and H ′ are equivalent if for every transaction Tk, H|k = H ′|k.
The read set (resp., the write set) of a transaction Tk ∈ parts(H), denoted Rset(Tk) (resp.,
Wset(Tk)), is the set of t-objects that Tk reads (resp., writes to) in H. Dset(Tk) = Rset(Tk) ∪
Wset(Tk) is called the data set of Tk. A transaction Tk is called read-only if Wset(Tk) = ∅,
otherwise, it is called updating.
Real-time and deferred-update orders. For Tk, Tm ∈ parts(H), we say that Tk precedes
Tm in the real-time order in H, and we write Tk ≺H Tm, if Tk is committed or aborted and the
last event of Tk precedes the first event of Tm in H. If neither Tk ≺H Tm nor Tm ≺H Tk, then we
say that Tk and Tm are concurrent in H. A transaction Tk ∈ parts(H) which is not concurrent
with any other transaction in H is called uncontended in H. A history H is t-sequential if no
two transactions are concurrent in H.
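The real-time order and the derived notions of concurrent transactions and t-sequential histories can be sketched in Python as follows. The encoding is an assumption of this example: a history is a list of (k, label) pairs, where k is the transaction identifier and the labels "C" and "A" mark the commit and abort responses. The function names are hypothetical, and both transactions passed to concurrent are assumed to participate in the history.

```python
def _indices(history, k):
    """Positions of the events of T_k in the history."""
    return [i for i, (t, _) in enumerate(history) if t == k]


def real_time_precedes(history, k, m):
    """T_k precedes T_m: T_k is committed or aborted and its last event
    comes before the first event of T_m."""
    idx_k, idx_m = _indices(history, k), _indices(history, m)
    if not idx_k or not idx_m:
        return False
    complete = history[idx_k[-1]][1] in ("C", "A")
    return complete and idx_k[-1] < idx_m[0]


def concurrent(history, k, m):
    """Neither transaction precedes the other in the real-time order."""
    return (not real_time_precedes(history, k, m)
            and not real_time_precedes(history, m, k))


def is_t_sequential(history):
    """No two participating transactions are concurrent."""
    parts = sorted({t for t, _ in history})
    return not any(concurrent(history, a, b)
                   for i, a in enumerate(parts) for b in parts[i + 1:])


# Example: T2 runs entirely inside the interval of T1; T3 starts after both.
H = [
    (1, "inv read(X)"), (1, "0"),
    (2, "inv write(X,1)"), (2, "ok"),
    (2, "inv tryC"), (2, "C"),
    (1, "inv tryC"), (1, "C"),
    (3, "inv read(X)"), (3, "1"),
]
```

In this example T1 and T2 are concurrent, both precede T3, and the history is therefore not t-sequential, while the suffix H[2:6] + H[8:] containing only T2 and T3 is.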
For Tk, Tm ∈ parts(H), we say that Tk precedes Tm in the deferred-update order, and we
write Tk ≺^DU_H Tm, if Tm has committed and there exists X ∈ Rset(Tk) ∩ Wset(Tm) such that the
response of readk(X) precedes the invocation of tryCm() in H. For Tk, Tm ∈ parts(H), we write
Tk ≺^X_H Tm if Tk has committed, X ∈ Rset(Tm) ∩ Wset(Tk), and readm(X) returns
v, the value written to X by writek(X, v).
Legal histories. Let H be a complete t-sequential history. For every operation readk(X) in
H that reads a t-object X, we define the latest written value of X as follows: (1) If Tk contains
a writek(X, v) preceding readk(X) then the latest written value of X is the value of the latest
such write. (2) Otherwise, if H contains a writem(X, v) such that m ≠ k, Tm precedes Tk,
and Tm commits in H, then the latest written value of X is the value of the latest such write
in H. (3) Otherwise, the latest written value of X is the initial value of X. Without loss of
generality, we assume that H starts with a fictitious initializing transaction T0 that writes 0 to
every t-object. We say that a complete t-sequential history H is legal if for every t-object X,
every read of X in H returns the latest written value of X.
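The three rules for the latest written value translate directly into a legality check. The sketch below assumes a simplified encoding that is not part of the model above: a complete t-sequential history is a list of transactions in order, each a dictionary with a list of operations ("r", X, returned value) or ("w", X, written value) and a committed flag. The fictitious transaction T0 is modeled by defaulting every t-object to 0.

```python
def is_legal(transactions):
    """Check that every read returns the latest written value."""
    committed = {}                       # latest committed value per t-object
    for t in transactions:
        local = {}                       # writes performed so far inside t
        for kind, x, v in t["ops"]:
            if kind == "w":
                local[x] = v
            else:
                # Rule (1): own latest write; rule (2): latest committed
                # write of a preceding transaction; rule (3): initial value 0.
                expected = local.get(x, committed.get(x, 0))
                if v != expected:
                    return False
        if t["committed"]:
            committed.update(local)      # only committed writes become visible
    return True


legal_history = [
    {"ops": [("w", "X", 1)], "committed": True},
    {"ops": [("w", "X", 2)], "committed": False},            # aborted
    {"ops": [("r", "X", 1), ("w", "X", 3), ("r", "X", 3)], "committed": True},
]
illegal_history = [
    {"ops": [("w", "X", 2)], "committed": False},            # aborted
    {"ops": [("r", "X", 2)], "committed": True},             # reads an aborted write
]
```

The second history is rejected because its read returns a value written by an aborted transaction, which none of the three rules allows.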
Opacity. Let H be any complete sequential history. Now H̄ denotes a history constructed
from H as follows: (1) for every live transaction Tk in H, we insert tryCk · Ak immediately after
the last event of Tk in H, and (2) for every aborted transaction Tk in H, we remove all write
operations in Tk with the matching responses.
Definition 1 A complete sequential history H is opaque if there exists a legal complete
t-sequential history S such that (1) H̄ and S are equivalent and (2) S respects ≺H and ≺^DU_H.
We call such a legal complete t-sequential history S a serialization of H. A weaker property,
called strict serializability [22], guarantees opacity with respect to committed transactions in
H. Obviously, every opaque history is also strictly serializable.
Implementations. We consider an asynchronous shared-memory system in which processes
p1, . . . , pN communicate by executing atomic operations on shared base objects.
An STM implementation provides the processes with algorithms for operations readk, writek,
tryCk and tryAk. Without loss of generality, we assume that base objects are accessed with
atomic read-write operations, but we allow the programmer to aggregate a sequence of op-
erations on base objects using clearly demarcated atomic sections: the operations within an
atomic section are to be executed sequentially. The atomic-section construct is general enough
to implement various strong synchronization primitives, such as test-and-set (TAS) or compare-
and-swap (CAS). We assume that atomic sections may only contain a bounded number of
base-object operations.
An execution of an implementation M is a sequence of atomic accesses to base objects
(base-object events), and invocations and responses of the TM operations (TM-events). If a
base-object event is a write or an atomic-section that contains a write (in one of its execution
paths), we say that the event is non-trivial.
A configuration of M (after some execution E) is determined by the states of all base objects
and the states of the processes. An initial state of M is determined by the initial states of base
objects and t-objects. We assume that each base object and each t-object is initialized to 0. A
history of an execution E, denoted by E|TM is the subsequence of E restricted to TM-events.
E|TM,pi denotes the subsequence of E|TM restricted to events issued by process pi.
The interval of a transaction Tk in E is the fragment of E that starts with the first event of
Tk in E and ends with the completing event of Tk (Ak or Ck) in E, or, if Tk has not completed
in E, with the last event of E. A tm-operation op1 precedes op2 in H if the invocation of op2
appears after the response of op1 in H. An execution E is well-formed if every atomic section
is executed sequentially in E, E|TM,pi is t-sequential for each pi, and no event on behalf of a
transaction Tk is taking place outside of an interval between invocation and response of some
TM-operation in Tk. We assume here that a TM implementation generates only well-formed
executions.
A completion of H is a history constructed from H by removing some pending invocations
and adding responses to the remaining pending invocations to the end of H. To account for
initial values of t-objects, we add to the beginning of H a (fictitious) transaction T0 that writes
0 to every t-object and commits.
A complete sequential history H ′ is a linearization of H if there exists a history H ′′, a
completion of H, such that (1) H ′ respects the precedence order of H, and (2) H ′ and H ′′ are
equivalent.
Definition 2 An STM implementation M is opaque if for every execution E of M , there exists
an opaque linearization of E|TM .
3 Preliminaries
In this section, we define some useful properties of STM implementations and prove some simple
facts that follow from these definitions.
Access patterns. The definition of STM allows a process to alternate reading and writing to
t-objects arbitrarily in the course of a transaction. Moreover, it allows a process to read from a
t-object that was previously written within the same transaction. We show that this flexibility
can be obtained “for free” given an implementation that only allows a user to read from a set
of t-objects and then to write to a set of t-objects within a transaction.
We say that a transaction Tk is canonic in a history H if H|k is a sequence of
reads (of distinct t-objects) followed by a sequence of writes (to distinct t-objects). A general
complexity of an STM implementation M accounts for the number of accesses to base-objects
used to implement every given transaction in every execution of M .
Lemma 3 Let M be an opaque STM implementation that can only be accessed with canonic
transactions. Then there exists an opaque STM implementation M ′ that can be accessed with
arbitrary transactions and preserves the complexity of M .
Proof. Let read^M, write^M, tryC^M and tryA^M denote the implementations of the operations
provided by M . Now M ′ is constructed as follows.
We associate every transaction Tk with a local variable Wset(Tk) which contains, at any
moment of time, the current write set of Tk with the values to be written.
When writek(X, v) is invoked, (X, v) is simply added to Wset(Tk) and all other entries of
the form (X, v′) are removed from Wset(Tk). When readk(X) is invoked, we first check if X is
in Wset(Tk) and, if so, we return the value stored in Wset(Tk). Otherwise, we invoke read^M_k(X)
and return the obtained value.
When tryCk() is invoked, we first execute write^M_k(X, v) for each (X, v) ∈ Wset(Tk) and then
invoke tryC^M_k(). Since for each X there can be at most one entry of the form (X, v), the order in
which these writes are invoked does not matter. Also, since all invocations of write^M_k follow all
invocations of read^M_k, the resulting sequence of invocations of M on behalf of Tk is a canonic
transaction. Operation tryAk() is implemented as tryA^M_k().
Since M is opaque, the resulting implementation is also opaque: just use the serialization
of the resulting history of M . Since the modifications of M involve only local variables, the
base-object complexity of M ′ is the same as that of M . □
Therefore, in the rest of the paper, we only consider canonic transactions, which simplifies the
analysis without sacrificing generality.
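The construction in the proof of Lemma 3 can be sketched in Python as a wrapper that buffers writes locally and forwards only canonic transactions to M. Everything named here is hypothetical: RecordingTM is a toy stand-in for M that never aborts and simply logs the calls it receives, and CanonicWrapper plays the role of M ′. The read cache that keeps forwarded reads on distinct t-objects is an addition of this sketch and is not part of the proof.

```python
class RecordingTM:
    """Hypothetical stand-in for M: a sequential store that logs every call."""

    def __init__(self):
        self.store = {}      # committed values, initially 0
        self.pending = {}    # k -> buffered writes of T_k
        self.log = []        # calls made on M, in order

    def read(self, k, x):
        self.log.append(("read", k, x))
        return self.store.get(x, 0)

    def write(self, k, x, v):
        self.log.append(("write", k, x))
        self.pending.setdefault(k, {})[x] = v
        return "ok"

    def try_commit(self, k):
        self.log.append(("tryC", k))
        self.store.update(self.pending.pop(k, {}))
        return "C"

    def try_abort(self, k):
        self.log.append(("tryA", k))
        self.pending.pop(k, None)
        return "A"


class CanonicWrapper:
    """M': accepts arbitrary transactions, issues only canonic ones to M."""

    def __init__(self, m):
        self.m = m
        self.wset = {}   # k -> {X: v}, the local write set of T_k
        self.seen = {}   # k -> {X: value}, read cache (assumption of this sketch)

    def write(self, k, x, v):
        self.wset.setdefault(k, {})[x] = v     # replaces any older entry (X, v')
        return "ok"

    def read(self, k, x):
        if x in self.wset.get(k, {}):
            return self.wset[k][x]             # value from the local write set
        cache = self.seen.setdefault(k, {})
        if x not in cache:
            cache[x] = self.m.read(k, x)       # forwarded to M once per t-object
        return cache[x]

    def try_commit(self, k):
        for x, v in self.wset.pop(k, {}).items():
            self.m.write(k, x, v)              # all writes follow all reads
        self.seen.pop(k, None)
        return self.m.try_commit(k)

    def try_abort(self, k):
        self.wset.pop(k, None)
        self.seen.pop(k, None)
        return self.m.try_abort(k)


m = RecordingTM()
tm = CanonicWrapper(m)
tm.write(1, "X", 5)
first = tm.read(1, "X")      # served from the local write set
second = tm.read(1, "Y")     # forwarded to M
tm.write(1, "X", 7)          # overwrites the buffered value
outcome = tm.try_commit(1)
```

Although the caller interleaves reads and writes, the log of M shows a read of Y, then a single write to X, then tryC, which is a canonic transaction. The wrapper touches only local state, which is the reason the base-object complexity is preserved.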
Disjoint-access parallelism (DAP). In STM implementations, it is considered important
to allow transactions that are not related through the data sets they access to execute
independently.
Let I be a fragment of an execution E. Following [17, 8], we first define a conflict graph
which relates transactions that are live in I. Vertices of the graph represent t-objects. The
vertices representing distinct t-objects X and Y are related with an edge if and only if there is
a transaction T such that {X,Y } ⊆ Dset(T ) and the interval of T overlaps with I in E.
Two transactions Ti and Tj are disjoint-access in E if there is no path between an item
in Dset(Ti) and an item in Dset(Tj) in the conflict graph of the minimal execution interval
containing the intervals of Ti and Tj .
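The disjoint-access test amounts to a connectivity query on the conflict graph. The sketch below assumes that the caller has already collected the data sets of the transactions whose intervals overlap the relevant execution interval, and passes them as a dictionary from a transaction name to its Dset. The function name disjoint_access and the union-find encoding are choices of this example.

```python
def disjoint_access(dsets, i, j):
    """True if no path connects an item of Dset(T_i) to an item of Dset(T_j)
    in the conflict graph induced by the data sets in `dsets`."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # Each transaction links all t-objects of its data set: joining the first
    # item with every other one yields the same connected components as
    # adding an edge for every pair {X, Y} in the data set.
    for dset in dsets.values():
        items = list(dset)
        for y in items[1:]:
            parent[find(items[0])] = find(y)

    comps_i = {find(x) for x in dsets[i]}
    comps_j = {find(x) for x in dsets[j]}
    return comps_i.isdisjoint(comps_j)


dsets = {"T1": {"X", "Y"}, "T2": {"Z", "W"}, "T3": {"Y", "Z"}}
linked = disjoint_access(dsets, "T1", "T2")       # T3 bridges Y and Z
without_t3 = {k: v for k, v in dsets.items() if k != "T3"}
separate = disjoint_access(without_t3, "T1", "T2")
```

T1 and T2 access disjoint data sets in both cases, yet they are disjoint-access only when the bridging transaction T3 is absent from the interval.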
Two transactions contend on a base-object x in an execution if both of them access x and
at least one of these accesses is non-trivial.
Two transactions concurrently contend on a base-object x in an execution if both of them
have pending events on x in the same configuration and at least one of them is non-trivial.
Definition 4 An STM implementation M is disjoint-access parallel (DAP) if, for all execu-
tions E of M , two processes executing Ti and Tj concurrently contend on the same base object
in E only if Ti and Tj are not disjoint-access.
Definition 5 An STM implementation M is strict disjoint-access parallel (SDAP) if, for all
executions E of M , two processes executing Ti and Tj contend on the same base object in E
only if the data sets of Ti and Tj are not disjoint, i.e., Dset(Ti) ∩ Dset(Tj) ≠ ∅.
Definition 6 An STM implementation M provides strict data partitioning if every t-object
X is associated with a set of base objects β(X) such that ∀X ≠ Y , β(X) ∩ β(Y ) = ∅, and a
transaction Ti can access a base object in β(X) only if X ∈ Dset(Ti).
Any STM that provides strict data partitioning is also disjoint-access parallel (but not vice
versa).
Invisible reads and single-version opacity. An STM implementation M uses invisible
reads if no execution of a tm-read operation incurs a write on a base object.
Let H be a sequential history. We say that Ti precedes Tj in H in the single-version order,
and we write Ti ≺^SV_H Tj , if there exists X ∈ Wset(Ti) ∩ Rset(Tj) such that tryCi precedes
readj(X) in H.
A sequential history H is single-version opaque if there exists a legal t-sequential history H ′
such that:
1. H̄ and H ′ are equivalent;
2. H ′ respects ≺H and ≺^DU_H ; and
3. H ′ respects ≺^SV_H .
Now an STM implementation M is single-version opaque if for every execution E of M ,
there exists a single-version opaque linearization of E|TM . Intuitively, a single-version opaque
implementation is opaque and maintains exactly one copy of a t-object’s state at any given
moment.
4 Liveness and Progress
To describe the conditions under which a TM implementation does something useful, we need
to address two orthogonal dimensions. First, we need to give a tm-liveness property [3] that
determines the conditions under which an individual tm-operation must return. Second, we
need to give a progress condition that describes the cases in which a transaction must commit.
4.1 TM-liveness properties
A TM implementation M is wait-free if in every infinite execution of M , each tm-operation
returns in a finite number of its own steps, regardless of the behavior of concurrent transactions.
In other words, a wait-free individual tm-operation (read, write, tryC or tryA) cannot be
delayed because of a concurrent operation. The property can be very beneficial if executions of
transactions are subject to unpredictable delays or failures.
In this paper, we do not assume failures: every operation is expected to take steps until it
terminates. Moreover, we are interested in deriving inherent costs of implementing non-trivial
concurrency in TM. Therefore, we assume a weaker default tm-liveness guarantee, that we
call starvation-freedom. A TM implementation M is starvation-free if in every infinite execution
of M , each tm-operation eventually returns, assuming that no concurrent tm-operation stops
indefinitely before returning. Starvation-freedom allows a tm-operation to be delayed only by
a concurrent tm-operation.
4.2 Progress conditions
A progress condition determines the scenarios in which a transaction is allowed to abort. Tech-
nically, unlike tm-liveness, a progress condition is a safety property [3], since it can be violated
in a finite execution. The simplest non-trivial progress property we consider in this paper is
single-lock progressiveness that says that a transaction can only abort if there is a concurrent
transaction. Clearly, an opaque single-lock TM can be implemented using any mutual exclusion
algorithm [24] with one critical section per transaction. Stronger progress conditions allow some
transactions to progress concurrently in some scenarios, implying positive concurrency³.
Progressiveness allows an implementation to abort a transaction only in case of a conflict.
Transactions Ti, Tj conflict in a history H on a t-object X if Ti and Tj are concurrent in H,
X ∈ Dset(Ti) ∩Dset(Tj), and X ∈Wset(Ti) ∪Wset(Tj).
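The conflict relation can be written as a small set computation. In this sketch a transaction is a dictionary with hypothetical keys "rset" and "wset", and whether the two transactions are concurrent in H is passed in as a flag, since it depends on the history rather than on the data sets.

```python
def conflict_objects(t_i, t_j, concurrent):
    """Set of t-objects on which T_i and T_j conflict."""
    if not concurrent:
        return set()                      # only concurrent transactions conflict
    dset_i = t_i["rset"] | t_i["wset"]
    dset_j = t_j["rset"] | t_j["wset"]
    # X is in both data sets and in at least one of the write sets.
    return (dset_i & dset_j) & (t_i["wset"] | t_j["wset"])


t1 = {"rset": {"X", "Y"}, "wset": {"Z"}}
t2 = {"rset": {"Y"}, "wset": {"X"}}
on_x = conflict_objects(t1, t2, concurrent=True)
```

Both transactions access X and Y, but only X is written by one of them, so they conflict on X alone: two concurrent reads of Y do not constitute a conflict.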
Definition 7 A TM implementation M is (weakly) progressive if for every history H of M
and every transaction Ti ∈ parts(H) that is forcefully aborted, there exists a prefix H ′ of H and
a transaction Tk ∈ parts(H ′) that is live in H ′, such that Tk and Ti conflict in H ′.
The strong progressiveness property [14] additionally requires that, in case a set of transactions
conflict on a single t-object, at least one transaction commits. The formal definition is inspired
by [15].
Let CObjH(Ti) denote the set of t-objects over which transaction Ti ∈ parts(H) conflicts
with any other transaction in history H, i.e., X ∈ CObjH(Ti) if there exists a transaction
Tk ∈ parts(H), k ≠ i, such that Ti conflicts with Tk on X in H. Then CObjH(Q) =
∪{CObjH(Ti) | Ti ∈ Q} denotes the union of the sets CObjH(Ti) for all transactions Ti in Q.
Let CTrans(H) denote the set of non-empty subsets of parts(H) such that a set Q is in
CTrans(H) if no transaction in Q conflicts with a transaction not in Q.
Definition 8 A TM implementation M is strongly progressive if there does not exist any
history H of M in which for every set Q ∈ CTrans(H) of transactions such that |CObjH(Q)| ≤
1, every transaction in Q is forcefully aborted in H.
Since the goal of this paper is to derive a lower bound, we consider weakly progressive
implementations (from now on, simply progressive).
Let C be any correctness property, i.e., any safety property on TM histories [3]. The
following property guarantees that no transaction is forcefully aborted if there is a chance to
commit the transaction and preserve correctness.
Definition 9 A TM implementation M is permissive with respect to C if for every history H
of M such that H ends with a response rk and replacing rk with some r′k ≠ Ak gives a history
that satisfies C, we have rk ≠ Ak.
Therefore, permissiveness does not allow a transaction to abort, unless committing it would vio-
late the execution’s correctness. In this paper, we consider TM implementations that are permis-
sive with respect to opacity. Clearly, permissiveness with respect to opacity is strictly stronger
than progressiveness: every permissive opaque implementation is also progressive opaque, but
not vice versa.
³ This does not include implementations that guarantee obstruction-freedom [12].
A transaction in a permissive opaque implementation can only be forcefully aborted if it
tries to commit:
Lemma 10 Let a TM implementation M be permissive with respect to opacity. If a transaction
Ti is forcefully aborted executing an operation opi, then opi is tryCi.
Proof. Suppose, by contradiction, that there exists a history H of M such that some opi ∈
{readi, writei} executed within a transaction Ti returns Ai. Let H0 be the shortest prefix of H
that ends just before opi returns. By definition, H0 is opaque and any history H0 · ri where
ri ≠ Ai is not opaque. Let H ′0 be a serialization of H0.
If opi is a write, then H0 · oki is also opaque: no write operation of the incomplete transaction
Ti appears in H ′0 and, thus, H ′0 is also a serialization of H0 · oki.
If opi is a read(X) for some t-object X, then we can construct a serialization of H0 · v where
v is the value of X written by the last committed transaction in H ′0 preceding Ti or the initial
value of X if there is no such transaction. It is easy to see that H ″0, obtained from H ′0 by adding
read(X) · v at the end of Ti, is a serialization of H0 · v. In both cases, there exists a non-Ai
response ri to opi that preserves opacity of H0 · ri, and, thus, the only operation that can be
forcefully aborted in an execution of M is tryC. □
Obviously, Lemma 10 implies that there does not exist a permissive single-version TM imple-
mentation.
Multi-version permissiveness. A relaxation of permissiveness, called multi-version permis-
siveness (or mv-permissiveness) [23] says that a transaction Ti can only abort if Ti is updating
and there is a concurrent conflicting updating transaction Tj , i.e., a read-only transaction cannot
be aborted.
Lemma 11 There does not exist a mv-permissive TM implementation M that guarantees
starvation-freedom (or wait-freedom) of individual tm-operations and single-version opacity.
Proof. By contradiction, suppose that there exists a single-version opaque mv-permissive
implementation M . Consider an execution of M in which transaction T1 sequentially reads X,
then transaction T2 writes to X and Y and commits. Such an execution exists, since none of
these operations can be forcefully aborted in a mv-permissive implementation. Now we extend
this history with T1 reading Y . There is no way to serialize T1 and T2 preserving single-version
opacity, unless read1(Y ) aborts. But a mv-permissive TM implementation does not allow a
read-only transaction to return abort, a contradiction. □
If we relax our tm-liveness property and allow a tm-operation to be delayed by a concurrent
conflicting transaction, then a single-version mv-permissive implementation is possible [7].
Probabilistic permissiveness. Intuitively, a probabilistically permissive TM ensures the
property of Definition 9 with a positive probability. It is conjectured in [11] that probabilistically
permissive (with respect to opacity) implementations can be considerably cheaper than
deterministic ones. This is achieved by choosing the response to a tm-operation opk by sampling
uniformly at random from the set of possible return values (including Ak).
Definition 12 A TM implementation M is probabilistically permissive with respect to C if for
every history H of M such that H ends with a response rk and replacing rk with some r′k ≠ Ak
gives a history that satisfies C, we have rk ≠ Ak with positive probability.
5 RAW/AWAR complexity
Modern CPU architectures perform reordering of memory references for better performance.
Hence, memory barriers/fences are needed to enforce ordering in synchronization primitives
whose correct operation depends on ordered memory references. Attiya et al. [6] formalized
the RAW/AWAR class of synchronization patterns and showed that a wide class of concurrent
algorithm implementations must involve these expensive patterns. We recall the definitions
below.
Let π be an execution fragment and let πi denote the i-th event in π (i = 0, . . . , |π|−1). We
say that process p performs a RAW (read-after-write) in π if ∃i, j; 0 ≤ i < j < |π| such that
• πi is a write to a base object x by process p,
• πj is a read of a base object y ≠ x by process p and
• there is no πk such that i < k < j and πk is a write to y by p.
We say that two RAWs by process p overlap in an execution E if the read event of the first
RAW occurs after the write event of the second RAW. A multi-RAW consists of a series of writes
to a set of base objects followed by a series of reads from different base objects.
We say a process p performs an AWAR (atomic-write-after-read) in π if ∃i, j, 0 ≤ i < j < |π|
such that
• πi is a read of a base-object x by process p,
• πj is a write to a base-object y by process p and
• πi and πj belong to the same atomic section.
Examples of AWAR are CAS and mCAS.
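The two definitions can be checked mechanically on a finite trace. The sketch below is only an illustration: the Event record, its field names and the integer ids of atomic sections are assumptions of this example, not notation from [6].

```python
from typing import List, NamedTuple, Optional

class Event(NamedTuple):
    proc: str                      # process issuing the event
    kind: str                      # "r" for a read, "w" for a write
    obj: str                       # base object accessed
    section: Optional[int] = None  # id of the enclosing atomic section, if any

def has_raw(frag: List[Event], p: str) -> bool:
    """True if p writes x at position i and reads y != x at position j > i,
    with no write to y by p strictly between i and j."""
    for i, wi in enumerate(frag):
        if wi.proc != p or wi.kind != "w":
            continue
        for j in range(i + 1, len(frag)):
            rj = frag[j]
            if rj.proc != p or rj.kind != "r" or rj.obj == wi.obj:
                continue
            if not any(e.proc == p and e.kind == "w" and e.obj == rj.obj
                       for e in frag[i + 1:j]):
                return True
    return False

def has_awar(frag: List[Event], p: str) -> bool:
    """True if p reads and later writes inside the same atomic section."""
    for i, ri in enumerate(frag):
        if ri.proc != p or ri.kind != "r" or ri.section is None:
            continue
        if any(e.proc == p and e.kind == "w" and e.section == ri.section
               for e in frag[i + 1:]):
            return True
    return False

# write x, then read a different object y: a RAW
store_load = [Event("p", "w", "x"), Event("p", "r", "y")]
# the read of y is preceded by p's own write to y: no RAW
own_value = [Event("p", "w", "x"), Event("p", "w", "y"), Event("p", "r", "y")]
# a CAS modeled as a read and a write in one atomic section: an AWAR
cas = [Event("p", "r", "x", section=1), Event("p", "w", "x", section=1)]
```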
6 RAW/AWAR cost of permissive STMs
In this section, we show that an execution of a transaction in a permissive STM implementation
may have to perform at least one RAW/AWAR pattern per tm-read.
Let M be a permissive, opaque TM implementation. Consider an execution E of M with
a history H consisting of transactions T1, T2, T3 as shown in Figure 1: T3 performs a read of
X1, then T2 performs a write on X1 and commits, and finally T1 performs a series of reads
from objects X1, . . . , Xm. Here, Rk(X), Wk(X, v) denote complete executions of readk(X)
and writek(X, v) respectively. Since the implementation is permissive, no transaction can be
forcefully aborted in E, and the only valid serialization of this execution is T3, T2, T1. Note
also that the execution generates a sequential history: each invocation of a tm-operation is
immediately followed by a matching response in H. Thus, since we assume starvation-freedom
as a liveness property, such an execution exists.
Imagine that we modify the execution E as follows. Immediately after R1(Xk) executed by
T1 we add W3(Xk, v) and tryC3 executed by T3 (let TC3(Xk) denote the complete execution
of W3(Xk, v) followed by tryC3). Obviously, TC3(Xk) must return abort: T3 can be serialized
neither before T1 nor after T1. On the other hand, if TC3(Xk) takes
place just before R1(Xk), then TC3(Xk) must return commit but R1(Xk) must return the value
written by T3. In other words, R1(Xk) and TC3(Xk) are strongly non-commutative [6]: both
of them see the difference when ordered differently. As a result, intuitively, R1(Xk) needs to
perform a RAW or AWAR to make sure that the order of these two “conflicting” operations is
properly maintained. A formal proof follows.
[Figure 1 (timeline diagram): T3 performs R3(X1); T2 performs W2(X1, v) and tryC2, which returns C2; T1 then performs R1(X1), . . . , R1(Xm).]
Figure 1: Execution E of a permissive, opaque STM: T2 and T3 force T1 to perform a RAW/AWAR in
each R1(Xk), 2 ≤ k ≤ m
Theorem 13 Let M be a permissive opaque STM implementation. Then, for any m ∈ N, M
has an execution in which some transaction performs m tm-reads such that the execution of
each tm-read contains at least one RAW or AWAR.
Proof. We consider R1(Xk), 2 ≤ k ≤ m in execution E.
Imagine a modification E′ of E, in which T3 performs W3(Xk) immediately after R1(Xk)
and then tries to commit. A serialization of H ′ = E′|TM should obey T3 ≺DUH′ T2 and T2 ≺H′ T1.
The execution of R1(Xk) does not modify base objects, hence, T3 does not observe R1(Xk) in
E′. Since M is permissive, T3 must commit in E′. But since T1 performs R1(Xk) before T3
commits and T3 updates Xk, we also have T1 ≺DUH′ T3. Thus, T3 cannot precede T1 in any
serialization—contradiction. Consequently, each R1(Xk) must perform a write to a base object.
Let π be a fragment of E that represents the complete execution of R1(Xk). Clearly, π
contains a write to a base object. Let πj be the first write to a base object in π and let πw
denote the shortest fragment of π that contains the atomic section to which πj belongs
(πw = πj if πj is not part of an atomic section). Thus, π can be represented as πs · πw · πf .
Suppose that π does not contain a RAW or AWAR. Since πw does not contain an AWAR,
there are no read events in πw that precede πj . Thus, πj is the first base object event in
πw. Consider the execution fragment πs · ρ, where ρ is the complete execution of TC3(Xk) by
T3. Such an execution exists since πs does not perform any base object write, hence, πs · ρ is
indistinguishable to T3 from ρ.
Since, by our assumption, πw · πf contains no RAW, any read performed in πw · πf can
only be applied to base objects previously written in πw · πf . Thus, there exists an execution
πs · ρ · πw · πf that is indistinguishable to T1 from π. In πs · ρ · πw · πf , T3 commits (as in ρ)
but T1 ignores the value written by T3 to Xk. But T3, T2, T1 is the only valid serialization for
E|TM, a contradiction. Thus, each R1(Xk), 2 ≤ k ≤ m must contain a RAW/AWAR.
Note that since all tm-reads of T1 are executed sequentially, all these RAW/AWAR patterns
are pairwise non-overlapping. □
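The serialization argument can be replayed by brute force for m = 2 and k = 2. The sketch below is a simplification made for illustration: it checks only legality of reads in a sequential order (real-time order is not modeled separately, the values read already encode it), it treats all three transactions as if they had to be serialized, and the function name legal_orders and the concrete values 0 and 1 are assumptions of this example.

```python
from itertools import permutations

def legal_orders(txs, init=0):
    """Return every sequential order in which each read sees the latest write.

    txs maps a transaction name to (reads, writes), both dicts from
    t-object to value; unwritten t-objects hold `init`.
    """
    ok = []
    for order in permutations(txs):
        mem, good = {}, True
        for name in order:
            reads, writes = txs[name]
            if any(mem.get(x, init) != v for x, v in reads.items()):
                good = False
                break
            mem.update(writes)
        if good:
            ok.append(order)
    return ok

# Execution E: T3 reads X1, T2 writes X1 and commits, T1 reads X1 and X2.
E = {"T3": ({"X1": 0}, {}),
     "T2": ({}, {"X1": 1}),
     "T1": ({"X1": 1, "X2": 0}, {})}
# E': T3 additionally writes X2 after R1(X2) returned the old value, and commits.
E_prime = dict(E, T3=({"X1": 0}, {"X2": 1}))
# TC3(X2) placed just before R1(X2): T1 must return the value written by T3.
E_before = dict(E_prime, T1=({"X1": 1, "X2": 1}, {}))
```

Under these assumptions, E admits only the order T3, T2, T1, E′ admits no order at all if T3 commits, and moving TC3(X2) before R1(X2) restores the order T3, T2, T1.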
7 RAW/AWAR cost and protected data in progressive STMs
In this section, we first describe our progressive STM implementations that perform at most
one RAW/AWAR per transaction. Then we present a lower bound on the amount of data to
be protected by a transaction in a progressive STM.
7.1 Constant RAW/AWAR implementations for progressive STM
We start with showing that even a single-lock progressive STM cannot avoid performing at least
one RAW/AWAR per transaction in some executions.
Theorem 14 Let M be a single-lock progressive opaque STM implementation. Then every
execution of M in which an uncontended transaction performs at least one read and at least one
write contains a RAW/AWAR pattern.
Proof. Consider an execution π of M in which an uncontended transaction T1 performs (among
other events) read1(X), write1(Y, v) and tryC1(). Since M is single-lock progressive, T1 must
commit in π. Clearly π must contain a write to a base object. Otherwise a subsequent
transaction reading Y would return the initial value of Y instead of the value written by T1.
Let πj be the first write to a base object in π and let πw denote the shortest fragment of
π that contains the atomic section to which πj belongs (πw = πj if πj is not part of an atomic
section). Thus, π can be represented as πs · πw · πf .
Now suppose, by contradiction, that π contains neither RAW nor AWAR patterns. Since
πw contains no AWAR, there are no read events in πw that precede πj . Since πj is the first
write event in π, it follows that πj is the first base-object event in πw.
Since πs contains no writes, the states of base objects in the initial configuration and in the
configuration after πs is performed are the same. Consider an execution πs · ρ where in ρ, a
transaction T2 performs read2(Y ), write2(X, 1), tryC2() and commits. Such an execution exists,
since ρ is indistinguishable to T2 from an execution in which T2 is uncontended and thus T2
cannot be forcefully aborted in πs · ρ.
Since πw · πf contains no RAWs, every read performed in πw · πf is applied to base objects
which were previously written in πw · πf . Thus, there exists an execution πs · ρ · πw · πf , such
that T1 cannot distinguish πs · πw · πf and πs · ρ · πw · πf . Hence, T1 commits in πs · ρ · πw · πf .
But T1 reads the initial value of X and T2 reads the initial value of Y in πs · ρ · πw · πf ,
and thus T1 and T2 cannot both be committed (at least one of the committed transactions must
read the value written by the other), a contradiction.
The proof is analogous in the case when an execution of T1 extends any execution π0 that
contains only complete transactions. □
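The schedule πs · ρ · πw · πf used in the proof can be replayed on a deliberately incorrect toy implementation. The class NaiveTx below is hypothetical and exists only for this illustration: its reads are invisible, its writes are deferred, and its commit only writes, so it performs neither a RAW nor an AWAR. The run is a sequential simulation in Python, not a model of hardware reordering.

```python
class NaiveTx:
    """Hypothetical RAW/AWAR-free transaction (incorrect on purpose)."""

    def __init__(self, mem):
        self.mem, self.reads, self.writes = mem, {}, {}

    def read(self, x):
        self.reads[x] = self.mem[x]   # base-object read only: invisible
        return self.reads[x]

    def write(self, x, v):
        self.writes[x] = v            # buffered locally, no base-object event

    def try_commit(self):
        self.mem.update(self.writes)  # writes only, nothing is read afterwards
        return True

mem = {"X": 0, "Y": 0}
t1, t2 = NaiveTx(mem), NaiveTx(mem)

# pi_s: the steps of T1 before its first base-object write
r1 = t1.read("X")
t1.write("Y", 7)
# rho: T2 runs to completion; pi_s wrote nothing, so T2 sees the initial state
r2 = t2.read("Y")
t2.write("X", 1)
c2 = t2.try_commit()
# pi_w . pi_f: T1 resumes and commits without reading any other base object
c1 = t1.try_commit()
```

Both transactions commit although each read the initial value of the object the other one wrote, so neither order T1, T2 nor T2, T1 is legal. This is the situation that a RAW or an AWAR in T1 is needed to exclude.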
Since every progressive or permissive STM implementation is also single-lock progressive, the
RAW/AWAR lower bound of Theorem 14 also holds for progressive and permissive STM im-
plementations. The lower bound is actually tight, and we sketch two progressive opaque imple-
mentations. Both implementations are strict data-partitioned [15] (split the set of base objects
used into disjoint subsets, each subset storing information of only a single t-object) and single-
version (maintain exactly one copy of a t-object’s state at a time). They also use invisible reads,
i.e., no execution of a tm-read operation performs a write to a base object.
Our first implementation employs an mCAS primitive (see footnote 4) and works, in brief, as follows. Every
t-object Xi is associated with a distinct base object vi that stores the “most recent” value of Xi
together with the id of the transaction that was the last to update Xi. Each time a transaction
Tk performs a read of a t-object Xi, it reads vi, adds Xi to its read set and checks if the t-objects
in the current read set of Tk have not been updated since Tk has read them. If this is not the
case the transaction is forcefully aborted. Otherwise, Tk returns the value read in vi. Each time
Tk performs a write to a t-object Xi, it adds Xi to its write set and returns ok.
For every updating transaction Tk, tryCk() invokes the mCAS primitive over Dset(Tk). If
the mCAS returns true, tryCk() returns Ck, otherwise it returns Ak. Clearly, if Tk is forcefully
aborted, then the execution of mCAS involved no AWAR (no write to a base object took place).
Read-only transactions simply return Ck. Consequently, the implementation incurs a single
AWAR per updating committed transaction.
4 In mCAS(V, OV, NV ) [5], executed atomically, a process reads an array V of m objects and, if
V [i] = OV [i] for each i, replaces each V [i] with NV [i] and returns true; otherwise it returns false
and leaves the objects unchanged.
Theorem 15 There exists a progressive opaque STM implementation with wait-free operations
that employs exactly one AWAR per transaction. Moreover, no AWARs are performed in read-
only or aborted transactions.
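The mCAS-based scheme sketched above can be rendered in a few lines of Python. This is a sequential-model sketch under stated assumptions, not the authors' pseudocode: the class names, the use of a lock to simulate the atomicity of mCAS, the read-your-own-write shortcut, and the snapshot taken on a blind write (so that mCAS has an expected old value for every object in the data set) are all choices made for this example.

```python
import threading

ABORT, COMMIT = "A", "C"

class MCasMemory:
    """Base objects v[x] = (value, id of the last transaction that wrote x)."""

    def __init__(self, objects, init=0):
        self.v = {x: (init, None) for x in objects}
        self._atomic = threading.Lock()  # only simulates the atomicity of mCAS

    def mcas(self, expected, new):
        """Atomically install `new` iff every object in `expected` is unchanged."""
        with self._atomic:
            if any(self.v[x] != old for x, old in expected.items()):
                return False             # nothing was written, hence no AWAR
            self.v.update(new)
            return True

class Tx:
    def __init__(self, mem, tid):
        self.mem, self.tid = mem, tid
        self.seen = {}   # x -> (value, writer) first observed; covers Dset
        self.wset = {}   # x -> value to install at commit

    def read(self, x):
        if x in self.wset:               # assumption: read-your-own-write
            return self.wset[x]
        cur = self.mem.v[x]
        self.seen.setdefault(x, cur)
        # invisible read: validate the observed versions without writing
        if any(self.mem.v[y] != s for y, s in self.seen.items()):
            return ABORT
        return cur[0]

    def write(self, x, val):
        # assumption: snapshot x so that mCAS has an expected old value for it
        self.seen.setdefault(x, self.mem.v[x])
        self.wset[x] = val
        return "ok"

    def try_commit(self):
        if not self.wset:                # read-only: commits without any AWAR
            return COMMIT
        new = {x: (val, self.tid) for x, val in self.wset.items()}
        return COMMIT if self.mem.mcas(self.seen, new) else ABORT

mem = MCasMemory(["X", "Y"])
t1, t2 = Tx(mem, 1), Tx(mem, 2)
first = t1.read("X")       # T1 reads the initial value of X
t2.write("X", 5)
c2 = t2.try_commit()       # T2 updates X with a single mCAS
second = t1.read("Y")      # validation of T1's read set fails
```

In this run T1 is aborted only because it conflicts with the concurrent T2 on X, which is what progressiveness permits.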
Even if we do not use atomic sections (and, thus, AWARs) we still can implement a progressive
opaque STM using reads and writes that incurs only a single multi-RAW (and, thus,
just a single fence) per update transaction. This implementation uses a simple multi-trylock
primitive, which in turn can be implemented with a single multi-RAW. The multi-trylock prim-
itive exports operations acquire(W ), release(W ) and isContended(X), for all sets of t-objects
W and all t-objects X. Informally, if there is no contention on the locks on objects in W , then
acquire(W ) returns true which means that exclusive locks on all objects in W are acquired.
Otherwise, acquire(W ) returns false which means that no locks on objects in W are acquired.
Operation release(W ) releases the acquired locks on objects in W and isContended(X) returns
true iff a lock on X is currently held by any process. The implementation of acquire(W ) first
writes to a series of base objects and then reads a series of base objects incurring a single
multi-RAW, while operations release(W ) and isContended(X) incur no RAW.
Implementations of reads and writes are similar to ones described above, except that each
time a transaction Tk performs a read of a t-object Xi, it additionally checks if no object in the
current read set is locked by an updating transaction. If some object in the read set has been
modified or is locked, the transaction is forcefully aborted.
For every updating transaction Tk, tryCk() invokes acquire(Wset(Tk)). If it returns false,
tryCk() returns Ak. Otherwise, Tk validates its read set once more, writes the new values to the
base objects, releases the locks and returns Ck (see Algorithm 2). Read-only transactions simply
return Ck. Consequently, the implementation incurs a single multi-RAW per updating transaction.
Theorem 16 There exists a progressive opaque STM implementation with wait-free operations
that employs a single multi-RAW per transaction. Moreover, no RAWs are performed in read-
only transactions.
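The read/write-only scheme can be sketched along the same lines. The code below is a single-threaded Python model written for illustration, not a transcription of Algorithms 1 and 2: class and method names are assumptions, registers are dictionary entries, and a real implementation would need a fence between the write phase and the read phase of acquire.

```python
ABORT, COMMIT = "A", "C"

class MultiTrylock:
    """Register r[i, x] is 1 while process i contends for (or holds) x."""

    def __init__(self, procs, objects):
        self.procs = list(procs)
        self.r = {(i, x): 0 for i in self.procs for x in objects}

    def acquire(self, i, q):
        for x in q:                      # phase 1: writes only
            self.r[i, x] = 1
        # phase 2: reads only; together with phase 1, the single multi-RAW
        if any(self.r[t, x] for x in q for t in self.procs if t != i):
            for x in q:
                self.r[i, x] = 0
            return False
        return True

    def release(self, i, q):
        for x in q:
            self.r[i, x] = 0

    def is_contended(self, i, x):
        return any(self.r[t, x] for t in self.procs if t != i)

class Tx:
    def __init__(self, mem, lock, pid, tid):
        self.mem, self.lock, self.pid, self.tid = mem, lock, pid, tid
        self.rset, self.wset = {}, {}    # x -> version read / new value

    def _abortable(self):
        # some object in the read set is locked or has been modified
        return any(self.lock.is_contended(self.pid, x) or self.mem[x] != ver
                   for x, ver in self.rset.items())

    def read(self, x):
        cur = self.mem[x]
        self.rset.setdefault(x, cur)
        return ABORT if self._abortable() else cur[0]

    def write(self, x, val):
        self.wset[x] = val
        return "ok"

    def try_commit(self):
        if not self.wset:                # read-only: no RAW at all
            return COMMIT
        q = list(self.wset)
        if not self.lock.acquire(self.pid, q):
            return ABORT
        if self._abortable():
            self.lock.release(self.pid, q)
            return ABORT
        for x, val in self.wset.items():
            self.mem[x] = (val, self.tid)
        self.lock.release(self.pid, q)
        return COMMIT

mem = {"X": (0, None), "Y": (0, None)}
lock = MultiTrylock([1, 2], mem)
t1, t2 = Tx(mem, lock, 1, 1), Tx(mem, lock, 2, 2)
r1 = t1.read("X")
t2.write("X", 5)
c2 = t2.try_commit()       # uncontended: acquires, writes, releases
t1.write("Y", 1)
c1 = t1.try_commit()       # read set of T1 was overwritten by T2
```

The only RAW in an updating transaction is the one inside acquire; reads and release perform no write followed by a read of a different register.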
We also derive a strongly progressive STM using only reads and writes that incurs at most
four RAWs per updating transaction and uses a finite number of bounded registers. Our im-
plementation uses a starvation-free multi-trylock primitive inspired by the Black-White Bakery
Algorithm [25], a bounded version of the Bakery Algorithm [18].
Informally, if no concurrent process contends infinitely long on some X ∈ W , then the
acquire(W ) operation of the starvation-free multi-trylock eventually returns true which means
that exclusive locks on all objects in W are acquired. The implementation of acquire(W ) incurs
three RAWs, while operation release(W ) performs a single RAW.
Implementations of tm-reads and tm-writes are identical to the constant RAW progressive
implementation described above. For every updating transaction Tk, tryCk() invokes the acquire
operation of the starvation-free multi-trylock over Wset(Tk). Note that this always returns true
and a transaction Tk with Rset(Tk) = ∅ eventually returns Ck. Read-only transactions simply
return Ck. Consequently, the implementation incurs four RAWs per updating transaction.
Theorem 17 There exists a strongly progressive single-version opaque STM implementation
with starvation-free operations that uses invisible reads and employs four RAWs per transaction.
Moreover, no RAWs are performed in read-only transactions.
Note that our implementation does not violate the impossibility result of Guerraoui and
Kapalka [15], who proved that a strongly progressive opaque STM cannot be implemented using
only reads and writes if tm-operations are required to be wait-free: the operations of our
implementation are starvation-free but not wait-free.
7.2 Protected data
Let M be a progressive STM implementation. Intuitively, a t-object Xj is protected at the end
of some finite execution π of M if some transaction T0 is about to atomically change the value
of Xj in its next step (e.g., by performing a CAS operation) or does not allow any concurrent
transaction to read Xj (e.g., by holding a lock on Xj).
Formally, let α · π be an execution of M such that π is an uncontended complete execution
of a transaction T0, where Wset(T0) = {X1, . . . , Xm}. Let uj (j = 1, . . . ,m) denote the value
written by T0 to t-object Xj in π. We say that π′ is a proper prefix of π if π′ is a prefix of π and
every atomic section is complete in π′. In this section, let π^t denote the t-th shortest proper
prefix of π. Let π^0 denote the empty prefix. (Recall that an atomic event is either a tm-event,
a read or write on a base object, or an atomic section.)
For any Xj ∈ Wset(T0), let Tj denote a transaction that tries to read Xj and commit. Let
E^t_j = α · π^t · ρ^t_j denote the extension of α · π^t in which Tj runs solo until it completes. Note that,
since we only require the implementation to be starvation-free, ρ^t_j can be infinite.
We say that α · π^t is (1, j)-valent if the read operation performed by Tj in α · π^t · ρ^t_j returns
uj (the value written by T0 to Xj). We say that α · π^t is (0, j)-valent if the read operation
performed by Tj in α · π^t · ρ^t_j does not abort and returns an “old” value u ≠ uj . Otherwise, if
the read operation of Tj aborts or never returns in α · π^t · ρ^t_j , we say that α · π^t is (⊥, j)-valent.
Definition 18 We say that T0 protects an object Xj in α · π^t, where π^t is the t-th shortest
proper prefix of π (t > 0) if one of the following conditions holds: (1) α · π^t is (0, j)-valent and
α · π^{t+1} is (1, j)-valent, or (2) α · π^t or α · π^{t+1} is (⊥, j)-valent.
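Definition 18 can be evaluated directly once the valence of every proper prefix is known. In the sketch below the valences of one t-object are given as a list indexed by t; the encoding of ⊥ as None, the function names, and the treatment of the last prefix (which has no successor) are assumptions of this example.

```python
BOT = None  # the ⊥ valence: the read of Tj aborts or never returns

def protects(valence, t):
    """Definition 18 for a single t-object Xj.

    valence[t] is the valence (0, 1 or BOT) of alpha . pi^t with respect to Xj.
    Assumption: the last prefix has no successor, so it counts as protecting
    only when its own valence is BOT.
    """
    if t <= 0 or t >= len(valence):
        return False
    a = valence[t]
    b = valence[t + 1] if t + 1 < len(valence) else a
    return (a == 0 and b == 1) or a is BOT or b is BOT

def protecting_prefixes(valence):
    return [t for t in range(len(valence)) if protects(valence, t)]

cas_like = [0, 0, 0, 1, 1]           # the value flips in one atomic step
lock_like = [0, 0, BOT, BOT, 1, 1]   # readers are blocked while a lock is held
```

In the first trace Xj is protected only in the prefix right before the atomic update (t = 2), matching condition (1); in the second it is protected in prefixes 1 to 3, matching condition (2).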
For strict disjoint-access parallel (SDAP) progressive STM, we show that every uncontended
transaction must protect every object in its write set at some point of its execution.
We observe that no prefix of π can be 0-valent and 1-valent at the same time (the notation is
the same as introduced above).
Lemma 19 There does not exist π^t, a proper prefix of π, and i, j ∈ {1, . . . ,m} such that α · π^t
is both (0, i)-valent and (1, j)-valent.
Proof. By contradiction, suppose that there exist i, j and α · π^t that is both (0, i)-valent and
(1, j)-valent. Since the implementation is SDAP, there exists an execution of M , E^t_ij = α · π^t · ρ^t_j · ρ^t_i
that is indistinguishable to Ti from α · π^t · ρ^t_i. In E^t_ij , the only possible serialization is T0, Tj , Ti.
But Ti returns the “old” value of Xi and, thus, the serialization is not legal, a contradiction. □
If α · π^t is (0, i)-valent (resp., (1, i)-valent) for some i, we say that it is 0-valent (resp., 1-valent).
By Lemma 19, the notions of 0-valence and 1-valence are well-defined.
Theorem 20 Let M be a progressive, opaque and strict disjoint-access-parallel STM implementation.
Let α · π be an execution of M , where π is an uncontended complete execution of
a transaction T0. Then there exists π^t, a proper prefix of π, such that T0 protects |Wset(T0)|
t-objects in α · π^t.
Proof. Let Wset(T0) = {X1, . . . , Xm}. Consider two cases:
(1) Suppose that π has a proper prefix π^t such that α · π^t is 0-valent and α · π^{t+1} is 1-valent.
By Lemma 19, there does not exist i such that α · π^t is (1, i)-valent or α · π^{t+1} is
(0, i)-valent. Thus, for every i ∈ {1, . . . ,m}, one of the following is true:
• α · π^t is (0, i)-valent and α · π^{t+1} is (1, i)-valent
• At least one of α · π^t and α · π^{t+1} is (⊥, i)-valent, i.e., the operation of Ti aborts or
never returns
In either case, T0 protects m t-objects in α · π^t.
(2) Now suppose that such π^t does not exist, i.e., there is no i ∈ {1, . . . ,m} and t ∈ {0, . . . , |π|−1}
such that E^t_i exists and returns an old value, and E^{t+1}_i exists and returns a new value.
Suppose there exist s, t, 0 < s + 1 < t, and S ⊆ {1, . . . ,m}, such that:
• α · π^s is 0-valent,
• α · π^t is 1-valent,
• for all r, s < r < t, and for all i ∈ S, α · π^r is (⊥, i)-valent.
We say that s + 1, . . . , t − 1 is a protecting fragment for t-objects {Xj | j ∈ S}.
Since M is opaque and progressive, α · π^0 = α is 0-valent and α · π is 1-valent. Thus, the
assumption of Case (2) implies that for each Xi, there exists a protecting fragment for
{Xi}. In particular, there exists a protecting fragment for {X1}.
Now we proceed by induction. Let π^{s+1}, . . . , π^{t−1} be a protecting fragment for {X1, . . . , Xu−1}
such that u ≤ m.
Now we claim that there must be a subfragment of s + 1, . . . , t − 1 that protects {X1, . . . , Xu}.
Suppose not. Thus, there exists r, s < r < t, such that α · π^r is (0, u)-valent or (1, u)-valent.
Suppose first that α · π^r is (1, u)-valent. Since α · π^s is (0, i)-valent for some i ≠ u,
by Lemma 19 and the assumption of Case (2), there must exist s′, t′, s < s′ + 1 < t′ ≤ r
such that
• α · π^{s′} is 0-valent,
• α · π^{t′} is 1-valent,
• for all r′, s′ < r′ < t′, α · π^{r′} is (⊥, u)-valent.
As a result, s′ + 1, . . . , t′ − 1 is a protecting fragment for {X1, . . . , Xu}. The case when
α · π^r is (0, u)-valent is symmetric, except that now we should consider fragment r, . . . , t
instead of s, . . . , r.
Thus, there exists a subfragment of s + 1, . . . , t − 1 that protects {X1, . . . , Xu}. By induction,
we obtain a protecting fragment s′′ + 1, . . . , t′′ − 1 for {X1, . . . , Xm}. Thus, any
prefix α · π^r, where s′′ < r < t′′ protects exactly m t-objects.
In both cases, there is a proper prefix of α · π that protects exactly m t-objects. □
The lower bound of Theorem 20 is tight: it is matched by all progressive implementations we
are aware of, including ones in Section 7.1. Note that any DAP single-lock STM implementation
automatically provides a stronger progress condition than just single-lock progressiveness. A
transaction T in a DAP single-lock STM can only be forcefully aborted if it observes a concurrent
transaction T ′ such that Dset(T )∩Dset(T ′) 6= ∅. This is not very far from progressiveness, where
T may abort only if T and T ′ experience a write-write or write-read conflict on a t-object. Thus,
in the realm of DAP STM implementations, progressiveness is very close to the weakest non-
trivial progress condition.
8 Related work
Crain et al. [9] proved that a permissive opaque TM implementation cannot maintain invisible
reads, which inspired the derivation of our lower bound on RAW/AWAR complexity in Section 6.
The RAW/AWAR complexity for concurrent implementations was recently introduced in
[6]. The proofs of Theorems 13 and 14 extend the arguments used in [6] to the STM context.
A related paper by Attiya et al. [8] showed that every permissive strictly serializable and
DAP TM in which every read-only transaction must commit in a wait-free manner has an
execution in which some read-only transaction Tk performs at least |Dset(Tk)|-1 base-object
writes. In this paper we do not assume that a read operation must be wait-free and we do not
require disjoint-access parallelism. Also, we focus on the number of RAW/AWAR patterns and
not only on base-object writes. On the other hand, we consider a stronger correctness property
(opacity). Therefore, our lower bound in Section 6 is incomparable with the one of [8].
To establish the lower bound on t-objects that must be “protected” in an opaque, progressive
TM (Section 7.2), we use the definition of disjoint-access parallelism introduced in [8]. Guerraoui
and Kapalka [15] considered a stronger version of DAP called strict data-partitioning to prove
a linear lower bound on the number of steps performed by a successful read operation in a
progressive, opaque TM that uses invisible reads. Interestingly, the constant RAW/AWAR
implementations of progressive, opaque TMs sketched in Section 7 are strict data-partitioned.
9 Concluding remarks
In this paper, we derived inherent costs of implementing STMs with non-trivial concurrency
guarantees. At a high level, our results suggest that providing high degrees of concurrency
in STM may incur considerable unavoidable costs. Our results give rise to many intriguing
questions, and we list some of them below.
In this paper, we focused on progress conditions that provide positive concurrency, progres-
siveness and permissiveness. The results do not apply to obstruction-free STMs [12] that only
guarantee that a transaction commits if it eventually runs without contention. Effectively, an
obstruction-free STM provides zero concurrency, since progress is guaranteed only when one
transaction is active at a time. However, unlike single-lock implementations, it does allow over-
lapping transactions to make progress (one at a time). Does this incur higher RAW/AWAR
complexity?
We cannot expect the lower bound of Theorem 20 (the protected-data size) to apply to non-
DAP STMs, including trivial ones that allow storing the state of the whole STM in one base
object. One way to avoid trivialities is to assume that a base object can store information only
about a constant number of t-objects (the constant-size information property in [13]) which
can potentially give asymptotically close results.
We focused on implementations that allow a tm-operation to be delayed only by concurrent
operations performed by other transactions. Does relaxing the tm-liveness property by allowing
a read operation to wait until a concurrent transaction terminates [7] improve the RAW/AWAR
complexity with respect to permissive implementations? It is easy to see that the proof of our
permissive lower bound (Theorem 13) does not work for this case. But it is unclear a priori
how this may affect the cost of progressive implementations.
Last but not least, the results of this paper assume opacity as a correctness property. Re-
cently, multiple relaxations of opacity were proposed [10, 2, 9, 8]. It would be very interesting
to understand the concurrency benefits gained by such relaxed consistency conditions.
Acknowledgements. The authors are grateful to Michel Raynal and Rachid Guerraoui for
inspiring discussions on the properties and costs of STM and Damien Imbs for valuable com-
ments on the previous drafts. The comments and suggestions of anonymous reviewers on an
earlier version of this paper are also gratefully acknowledged.
References
[1] Sarita V. Adve and Kourosh Gharachorloo. Shared memory consistency models: A tutorial. IEEE
Computer, 29(12):66–76, 1996.
[2] Yehuda Afek, Adam Morrison, and Moran Tzafrir. View transactions: Transactional model with
relaxed consistency checks. In PODC ’10: Proceedings of the 29th Annual ACM SIGACT-SIGOPS
Symposium on Principles of Distributed Computing, 2010.
[3] Bowen Alpern and Fred B. Schneider. Defining liveness. Information Processing Letters, 21(4):181–
185, October 1985.
[4] Thomas E. Anderson. The performance of spin lock alternatives for shared-memory multiprocessors.
IEEE Trans. Parallel Distrib. Syst., 1(1):6–16, 1990.
[5] H. Attiya and D. Hendler. Time and space lower bounds for implementations using k-CAS. IEEE
Transactions on Parallel and Distributed Systems, 21(2):162–173, February 2010.
[6] Hagit Attiya, Rachid Guerraoui, Danny Hendler, Petr Kuznetsov, Maged M. Michael, and Martin
Vechev. Laws of order: Expensive synchronization in concurrent algorithms cannot be
eliminated. In POPL, 2011.
[7] Hagit Attiya and Eshcar Hillel. Single-version STMs can be multi-version permissive (extended
abstract). In ICDCN, pages 83–94, 2011.
[8] Hagit Attiya, Eshcar Hillel, and Alessia Milani. Inherent limitations on disjoint-access parallel
implementations of transactional memory. In Proceedings of the twenty-first annual symposium on
Parallelism in algorithms and architectures, SPAA ’09, pages 69–78, New York, NY, USA, 2009.
ACM.
[9] Tyler Crain, Damien Imbs, and Michel Raynal. Read invisibility, virtual world consistency and
permissiveness are compatible. Research Report, ASAP - INRIA - IRISA - CNRS : UMR6074 -
INRIA - Institut National des Sciences Appliquées de Rennes - Université de Rennes I, November 2010.
[10] Pascal Felber, Vincent Gramoli, and Rachid Guerraoui. Elastic transactions. In DISC ’09: Proceedings
of the 23rd International Symposium on Distributed Computing, volume 5805 of LNCS, pages
93–107, September 2009.
[11] Rachid Guerraoui, Thomas A. Henzinger, and Vasu Singh. Permissiveness in transactional memories.
In DISC, pages 305–319, 2008.
[12] Rachid Guerraoui and Michal Kapalka. On obstruction-free transactions. In Proceedings of the
twentieth annual symposium on Parallelism in algorithms and architectures, SPAA ’08, pages 304–
313, New York, NY, USA, 2008. ACM.
[13] Rachid Guerraoui and Michal Kapalka. On the correctness of transactional memory. In PPOPP,
pages 175–184, 2008.
[14] Rachid Guerraoui and Michal Kapalka. The semantics of progress in lock-based transactional mem-
ory. In POPL, pages 404–415, 2009.
[15] Rachid Guerraoui and Michal Kapalka. Principles of Transactional Memory. Synthesis Lectures on
Distributed Computing Theory. Morgan and Claypool, 2010.
[16] Danny Hendler, Itai Incze, Nir Shavit, and Moran Tzafrir. Flat combining and the synchronization-
parallelism tradeoff. In SPAA, pages 355–364, 2010.
[17] Amos Israeli and Lihu Rappoport. Disjoint-access-parallel implementations of strong shared memory
primitives. In Proceedings of the thirteenth annual ACM symposium on Principles of distributed
computing, PODC ’94, pages 151–160, New York, NY, USA, 1994. ACM.
[18] Leslie Lamport. A New Solution of Dijkstra’s Concurrent Programming Problem. Commun. ACM,
17(8):453–455, 1974.
[19] Jaejin Lee. Compilation Techniques for Explicitly Parallel Programs. PhD thesis, Department of
Computer Science, University of Illinois at Urbana-Champaign, 1999.
[20] Paul McKenney. Concurrent code and expensive instructions. Linux Weekly News, January 2011.
http://lwn.net/Articles/423994/.
[21] Paul E. McKenney. Memory barriers: a hardware view for software hackers. Linux Technology
Center, IBM Beaverton, June 2010.
[22] Christos H. Papadimitriou. The serializability of concurrent database updates. J. ACM, 26:631–653,
October 1979.
[23] Dmitri Perelman, Rui Fan, and Idit Keidar. On maintaining multiple versions in STM. In Proceedings
of the 29th ACM SIGACT-SIGOPS symposium on Principles of distributed computing, PODC ’10,
pages 16–25, New York, NY, USA, 2010. ACM.
[24] Michel Raynal. Algorithms for Mutual Exclusion. MIT Press, 1986.
[25] Gadi Taubenfeld. The black-white bakery algorithm and related bounded-space, adaptive, local-
spinning and FIFO algorithms. In DISC ’04: Proceedings of the 18th International Symposium on
Distributed Computing, 2004.
A Constant RAW/AWAR implementations for progressive TM
This section presents the pseudo-code for single RAW and single AWAR implementations of
progressive opaque STMs and their proofs of correctness. The single RAW implementation uses
a multi-trylock primitive described below, while the single AWAR implementation uses a mCAS
primitive. Finally, we describe the read-write implementation of a strongly progressive STM
that employs at most four RAWs per transaction. In the implementations, every t-object Xi
is associated with a distinct base object vi that stores the “most recent” value of Xi together
with the id of the transaction that was the last to update Xi.
A.1 Multi-trylock
Algorithm 1 Multi-trylock invoked by process pi
1: Shared variables:
2:   rij , for each process pi and each t-object Xj
3: acquire(Q):
4:   for all Xj ∈ Q do
5:     write(rij , 1)
6:   if ∃Xj ∈ Q; t ≠ i : rtj = 1 then
7:     for all Xj ∈ Q do
8:       write(rij , 0)
9:     return false
10:   return true
11: release(Q):
12:   for all Xj ∈ Q do
13:     write(rij , 0)
14:   return ok
15: isContended(Xj):
16:   if ∃pt : rtj ≠ 0, t ≠ i then
17:     return true
18:   return false
A multi-trylock provides exclusive write-access to a set Q of t-objects. Specifically, a multi-
trylock exports the following operations
• acquire(Q) returns true or false
• release(Q) releases the lock and returns ok
• isContended(Xj), Xj ∈ Q returns true or false
We assume that processes are well-formed: they never invoke a new operation on the multi-
trylock before receiving response from the previous invocation.
We say that a process pi holds a lock on Xj after an execution π if π contains the invocation
of acquire(Q), Xj ∈ Q by pi that returned true, but does not contain a subsequent invocation
of release(Q′), Xj ∈ Q′, by pi in π. We say that Xj is locked after π by process pi if pi holds a
lock on Xj after π.
We say that Xj is contended by pi after an execution π if π contains the invocation of
acquire(Q), Xj ∈ Q, by pi but does not contain a subsequent return false or return of release(Q′),
Xj ∈ Q′, by pi in π.
Let an execution π contain the invocation iop of an operation op followed by a corresponding
response rop (we say that π contains op). We say that Xj is uncontended (resp., locked) during
the execution of op in π if Xj is uncontended (resp., locked) after every prefix of π that contains
iop but does not contain rop.
A multi-trylock implementation satisfies the following properties:
• Mutual-exclusion: For any object Xj , and any execution π, there exists at most one
process that holds a lock on Xj after π.
• Progress: Let π be any execution that contains acquire(Q) by process pi. If no object in Q
is contended during the execution of acquire(Q) by a process pk ≠ pi in π then acquire(Q)
returns true in π.
• Let π be any execution that contains isContended(Xj) invoked by pi.
– If Xj is locked by pℓ, ℓ ≠ i, during the complete execution of isContended(Xj) in π,
then isContended(Xj) returns true.
– If ∀ℓ ≠ i, Xj is never contended by pℓ during the complete execution of isContended(Xj)
in π, then isContended(Xj) returns false.
Note that if Xj is neither locked nor uncontended during the complete execution of isContended(Xj),
then either of true and false can be returned.
Theorem 21 Algorithm 1 is an implementation of a multi-trylock object in which every operation
is wait-free, every operation incurs at most one multi-RAW, and isContended involves no
base-object writes.
Proof. Denote by L the shared object implemented by Algorithm 1. The operations exported
by L are wait-free i.e. every operation returns a value to the invoking process after a finite
number of its own steps. This follows from the fact that the implementation of acquire, release
and isContended described by Algorithm 1 contains no unbounded loops or waiting statements.
Assume, by contradiction, that L does not provide mutual-exclusion: there exists an exe-
cution pi after which processes pi and pk hold a lock on the same object, say Xj . In order to
hold the lock on Xj , process pi writes 1 to register rij and then checks if any other process
pk has written 1 to rkj . Since the corresponding operation acquire(Q), Xj ∈ Q invoked by pi
returns true, pi read 0 in rkj in Line 6. But then pk also writes 1 to rkj and later finds that rij
is 1. This is because pk can write 1 to rkj only after the read of rkj returned 0 to pi which is
preceded by the write of 1 to rij . Hence, there exists an object Xj such that rij = 1; i 6= k, but
the conditional in Line 6 returns true to process pk— a contradiction.
L also ensures progress. A process pi wishing to hold a lock on Xj in an execution π
invokes acquire(Q), Xj ∈ Q, which writes 1 to register rij and then checks if any other
process pk has written to register rkj. If no other process contends on Xj during the
execution of acquire(Q), the conditional in Line 6 returns true and, consequently, acquire(Q)
returns true.
Let π be any execution that contains isContended(Xj) executed by pi. If no process contends
on Xj during the execution of isContended(Xj) in π, pi finds rtj = 0 for all t, and the conditional
in Line 27 returns false. However, if Xj is locked during the execution of isContended(Xj) in π,
then at any point of the execution there exists t such that rtj = 1. Thus, the conditional in
Line 27 returns true and, consequently, isContended(Xj) returns true.
The implementation of isContended(Xj) only reads base objects. The implementation of
acquire(Q) first writes to a series of base objects and then reads a series of base objects,
incurring a single multi-RAW. The implementation of release(Q) only writes to base objects. □
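The write-then-read structure used in the proof above can be illustrated with a small sequential model. This is only a sketch reconstructed from the proof text, not Algorithm 1 itself: the class name MultiTrylock, the explicit process index passed to each call, and the step that clears the caller's registers when acquire fails are assumptions made for illustration.

```python
class MultiTrylock:
    """Sequential model of a register-based multi-trylock (hypothetical sketch)."""

    def __init__(self, num_processes, objects):
        # r[i][x] == 1 means process i has announced interest in object x.
        self.n = num_processes
        self.r = [{x: 0 for x in objects} for _ in range(num_processes)]

    def is_contended(self, i, x):
        # Reads only: true if some other process has its register for x set.
        return any(self.r[t][x] != 0 for t in range(self.n) if t != i)

    def acquire(self, i, q):
        # Phase 1: a series of writes (announce every object in q).
        for x in q:
            self.r[i][x] = 1
        # Phase 2: a series of reads; together with phase 1, one multi-RAW.
        if any(self.is_contended(i, x) for x in q):
            # Assumption: on failure the caller withdraws its announcements.
            for x in q:
                self.r[i][x] = 0
            return False
        return True

    def release(self, i, q):
        # Writes only.
        for x in q:
            self.r[i][x] = 0


lock = MultiTrylock(num_processes=2, objects=["X1", "X2"])
first = lock.acquire(0, {"X1", "X2"})   # uncontended, so it succeeds
second = lock.acquire(1, {"X2"})        # X2 is held by process 0
seen = lock.is_contended(1, "X2")       # process 1 observes the lock
lock.release(0, {"X1", "X2"})
third = lock.acquire(1, {"X2"})         # succeeds after the release
print(first, second, seen, third)
```

In the model, acquire never waits: it either finds every requested object uncontended or gives up, which mirrors the wait-freedom argument.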
A.2 Progressive implementation with single multi-RAW
Algorithm 2 gives the implementation of the tm-operations of a progressive opaque STM that
incurs at most a single multi-RAW per transaction.
Algorithm 2 Progressive STM with one multi-RAW: the implementation of Tk executed by pi
 1: Shared variables:
 2:   vj, for each t-object Xj
 3:   L, a multi-trylock object
 4: readk(Xj):
 5:   ovj := read(vj)
 6:   Rset(Tk) := Rset(Tk) ∪ {Xj}
 7:   if isAbortable() then
 8:     return Ak
 9:   return the value of ovj
10: writek(Xj, v):
11:   if Xj ∉ Wset(Tk) then
12:     nvj := v
13:     Wset(Tk) := Wset(Tk) ∪ {Xj}
14:   return okk
15: tryAk():
16:   return Ak
17: tryCk():
18:   if Wset(Tk) = ∅ then
19:     return Ck
20:   locked := L.acquire(Wset(Tk))
21:   if not locked then
22:     return Ak
23:   if isAbortable() then
24:     L.release(Wset(Tk))
25:     return Ak
26:   for all Xj ∈ Wset(Tk) do
27:     write(vj, (nvj, k))
28:   L.release(Wset(Tk))
29:   return Ck
30: Function: isAbortable():
31:   if ∃Xj ∈ Rset(Tk) : L.isContended(Xj) then
32:     return true
33:   if isInvalid() then
34:     return true
35:   return false
36: Function: isInvalid():
37:   if ∃Xj ∈ Rset(Tk) : ovj ≠ read(vj) then
38:     return true
39:   return false
Each time a transaction Tk performs a read of a t-object Xj, it reads vj, adds Xj to its read
set, and checks that no t-object in the current read set of Tk has been updated since Tk
read it and that no object in the current read set is locked by an updating transaction. If some
object in the read set has been modified or is locked, the transaction is forcefully aborted.
Otherwise, Tk returns the value read in vj.
Each time Tk performs a write to a t-object Xj, it adds Xj to its write set and returns ok.
The implementation of tryCk() uses the multi-trylock primitive described in Section A.1.
For every updating transaction Tk, tryCk() invokes L.acquire(Wset(Tk)), where L denotes the
multi-trylock implemented by Algorithm 1. If the acquisition fails, tryCk() returns Ak.
Otherwise, tryCk() re-checks the read set: if some object in it has been modified or is locked,
it releases the lock and returns Ak; if not, it writes the new values, releases the lock and
returns Ck. Read-only transactions simply return Ck.
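The behaviour described above can be sketched in a sequential setting as follows. The sketch is illustrative only: the names Store, Txn and ABORT are hypothetical, the multi-trylock is replaced by a simplified owner table rather than the register-based construction of Section A.1, and each t-object is assumed to be written at most once per transaction.

```python
ABORT = "A"   # sentinel standing for the abort response A_k


class MultiTrylock:
    """Simplified stand-in for the multi-trylock L (assumption: owner table)."""

    def __init__(self):
        self.owner = {}                      # t-object -> id of locking transaction

    def acquire(self, k, objs):
        if any(x in self.owner for x in objs):
            return False
        for x in objs:
            self.owner[x] = k
        return True

    def release(self, k, objs):
        for x in objs:
            if self.owner.get(x) == k:
                del self.owner[x]

    def is_contended(self, k, x):
        # True only if a transaction other than k holds x.
        return self.owner.get(x, k) != k


class Store:
    def __init__(self, objs):
        self.v = {x: (0, 0) for x in objs}   # v_j = (value, writer id); T0 wrote 0
        self.lock = MultiTrylock()


class Txn:
    def __init__(self, k, store):
        self.k, self.s = k, store
        self.ov, self.nv = {}, {}            # observed versions (read set), new values

    def _abortable(self):
        locked = any(self.s.lock.is_contended(self.k, x) for x in self.ov)
        invalid = any(self.ov[x] != self.s.v[x] for x in self.ov)
        return locked or invalid

    def read(self, x):
        self.ov[x] = self.s.v[x]             # read v_j and add X_j to the read set
        if self._abortable():                # check locks, then validate the read set
            return ABORT
        return self.ov[x][0]

    def write(self, x, value):
        self.nv[x] = value                   # buffered until commit
        return "ok"

    def try_commit(self):
        wset = set(self.nv)
        if not wset:                         # read-only: commit without locking
            return "C"
        if not self.s.lock.acquire(self.k, wset):
            return ABORT                     # write-write conflict
        if self._abortable():
            self.s.lock.release(self.k, wset)
            return ABORT                     # read-write conflict
        for x in wset:
            self.s.v[x] = (self.nv[x], self.k)
        self.s.lock.release(self.k, wset)
        return "C"


s = Store(["X", "Y"])
t1, t2 = Txn(1, s), Txn(2, s)
r1 = t1.read("X")            # reads the initial value
t2.write("X", 5)
c2 = t2.try_commit()         # T2 commits its update of X
r1b = t1.read("Y")           # T1's read set {X} is now stale
r3 = Txn(3, s).read("X")     # a later transaction sees T2's value
print(r1, c2, r1b, r3)
```

In this run T1 is aborted only because X, which it has read, was overwritten by the concurrent transaction T2, which is the kind of conflict progressiveness permits as a reason for aborting.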
A.2.1 Proof of opacity
Let E be any execution of the TM implemented by Algorithm 2. Recall that we assume every
t-object was initialized by some fictitious committed transaction T0 that precedes E. Let <_E
denote a total order on events in E.
Linearization points. Let H denote a linearization of E|TM constructed by selecting
linearization points of tm-operations performed in E|TM. The linearization point of a tm-operation
op, denoted ℓ_op, is associated with a base object event or a tm-event performed during the
lifetime of op using the following procedure.
First, we obtain a completion of E|TM by removing some pending invocations and adding
responses to the remaining pending invocations involving a transaction Tk as follows:
• Every incomplete readk, writek or tryAk operation is removed from E|TM
• For every pending tryCk, if some base object vj was written (Line 27), the response Ck
is added to the end of E|TM; otherwise, Ak is added to the end of E|TM
Now a linearization H of E|TM is obtained by associating linearization points to tm-operations
in the obtained completion of E|TM as follows:
• For every tm-read opk that returns a non-Ak value, ℓ_opk is chosen as the event in Line 5
of Algorithm 2; otherwise, ℓ_opk is chosen as the invocation event of opk
• For every tm-write or tm-abort opk that returns, ℓ_opk is chosen as the invocation event of
opk
• For every opk = tryCk that returns Ck such that Wset(Tk) ≠ ∅, ℓ_opk is associated with the
successful acquisition of the lock on Wset(Tk) (at the end of Line 20); if opk returns
Ak, ℓ_opk is associated with the invocation event of opk
• For every opk = tryCk that returns Ck such that Wset(Tk) = ∅, ℓ_opk is associated with
Line 19
<_H denotes a total order on tm-operations in the complete sequential history H.
Serialization points. The serialization point of a transaction Tj, denoted δ_Tj, is associated
with the linearization point of a tm-operation performed within the lifetime of the transaction.
We obtain a t-complete history H̄ from H as follows:
• For every transaction Tk in H that is live, we insert tryCk · Ak immediately after the last
event of Tk in H.
• For every aborted transaction Tk in H, we remove each write operation in Tk together with
the matching response
H̄ is thus a t-complete sequential history that contains only updating committed transactions
and read-only transactions, since every aborted transaction is reduced to its read-prefix. A
serialization S is obtained by associating serialization points to transactions in H̄ as follows:
• If Tk is an update transaction that commits, then δ_Tk is ℓ_tryCk
• If Tk is a read-only or aborted transaction, then δ_Tk is assigned to the linearization point
of the last tm-read that returned a non-Ak value in Tk
<_S denotes a total order on transactions in the t-sequential history S.
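As an illustration of the two serialization rules, the following sketch computes serialization points from given linearization points and sorts the transactions accordingly. The dictionary layout, the field names and the integer timestamps are assumptions made for the example; a transaction with no successful tm-read is reported as an error because the rules above do not cover that case.

```python
def serialization_point(t):
    """Return the serialization point of a transaction record t (hypothetical layout)."""
    if t["committed"] and t["wset"]:
        # Committed update transaction: linearization point of its tryC.
        return t["lin_tryC"]
    if not t["read_points"]:
        raise ValueError("no successful tm-read; case not covered by the rules")
    # Read-only or aborted: linearization point of the last successful tm-read.
    # read_points is assumed to be listed in program order.
    return t["read_points"][-1]


# Linearization points are modelled as positions in the execution E.
txns = {
    "T1": {"committed": True, "wset": {"X"}, "lin_tryC": 7, "read_points": [2]},
    "T2": {"committed": True, "wset": set(), "lin_tryC": 6, "read_points": [1, 5]},
    "T3": {"committed": False, "wset": {"Y"}, "lin_tryC": 9, "read_points": [3]},
}

points = {name: serialization_point(t) for name, t in txns.items()}
order = sorted(txns, key=points.get)
print(points, order)
```

Here the aborted transaction T3 and the read-only transaction T2 are placed at their last successful reads, while the updating transaction T1 is placed at its commit.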
Lemma 22 If Ti ≺^DU_H Tj, then Ti <_S Tj.
Proof. Recall that Ti precedes Tj in the deferred-update order if there exists X ∈ Rset(Ti) ∩
Wset(Tj) such that Tj has committed and the response of readi(X) precedes the invocation of
tryCj() in H. Thus, ℓ_readi(X) <_E ℓ_tryCj.
Consider the histories depicted in Figure 2, where Ti precedes Tj in the deferred-update order
(tryCk(Xj) denotes a tryCk such that Xj ∈ Wset(Tk)).
(1) Consider the history depicted in Figure 2(A), where Ti is a read-only transaction and Tj
is an updating transaction that returns Cj. Assume, to the contrary, that Ti ≺^DU_H Tj but
Tj <_S Ti, which implies that δ_Tj <_E δ_Ti, i.e., ℓ_tryCj(X) precedes the linearization point of
the last tm-read in Ti that returns a non-Ai value (say readi(X′)). Thus, the successful lock
acquisition on X by Tj in Line 20 precedes the read of the base object associated with X′
by Ti in Line 5.
[Figure 2: Assignment of serialization points respects the deferred-update order. Panel (A)
shows ℓ_Ri(X), two candidate positions for ℓ_Ri(X′) (each marked "?"), and ℓ_tryCj(X) for
transactions Ti and Tj. Panel (B) shows ℓ_Ri(X1), two candidate positions for ℓ_tryCi(X2)
(each marked "?"), and ℓ_tryCj(X1) for transactions Ti and Tj.]
readi(X′) checks if any object in Rset(Ti) is locked by a concurrent transaction, then
performs read-validation (Line 7). Consider the following possible sequence of events: Tj
acquires the lock on X, updates X in shared memory, Ti reads the base object associated
with X′, Tj releases the lock, and finally Ti performs the check in Line 7. Then readi(X′) is
forced to return Ai because X has been invalidated.
Alternatively, suppose that Tj acquires the lock on X, updates X in shared memory, Ti reads
the base object associated with X′, Ti performs the check in Line 7, and finally Tj releases
the lock on X. Again, readi(X′) returns Ai since Tj is holding a lock on X ∈ Rset(Ti), a
contradiction.
Hence, the only possibility is that the last successful tm-read (readi(X′)) in Ti is linearized
before tryCj(X), which implies that δ_Ti <_E δ_Tj.
(2) Suppose that Ti is an updating transaction, as shown in Figure 2(B). Then ℓ_tryCi(X2) and
ℓ_tryCj(X1) are assigned to Line 20 of Algorithm 2, when the locks are acquired on X2 and
X1 respectively. Assume, to the contrary, that Ti precedes Tj in the deferred-update order but
δ_Tj <_E δ_Ti; then ℓ_tryCj <_E ℓ_tryCi. An argument similar to the one above leads to a
contradiction, since tryC performs the same sequence of checks as the tm-read (Line 23).
□
Lemma 23 If Ti ≺_H Tj, then Ti <_S Tj.
Proof. This follows from the fact that the serialization point of a transaction is chosen
within the lifetime of the transaction: if Ti ≺_H Tj, then δ_Ti <_E δ_Tj, which implies Ti <_S Tj. □
Lemma 24 If Ti ≺^X_H Tj, then Ti <_S Tj.
Proof. Assume the contrary, i.e., there exists a readj(X), X ∈ Rset(Tj) ∩ Wset(Ti), that
returns the value of X updated in writei(X, value), and Tj <_S Ti. Ti is an updating committed
transaction, hence δ_Ti = ℓ_tryCi.
Consider two cases:
(1) Suppose that Tj is a read-only transaction. Thus, δ_Tj is assigned to the last tm-read
that returns a non-Aj value (say readj(X′)), whose linearization point precedes ℓ_tryCi.
This implies that the read of the base object associated with X′ by Tj in Line 5 precedes
the successful lock acquisition on X by Ti in Line 20. Thus, the write to the base object
associated with X performed by tryCi() in Line 27 is executed after the read of the base
object performed by readj(X) in Line 5, a contradiction.
(2) Suppose that Tj is an updating transaction. Then ℓ_tryCj <_E ℓ_tryCi. Again, this implies
that the read of the base object in Line 5 executed by readj(X) precedes the write to
the base object performed by tryCi(), a contradiction.
□
Lemma 25 S is legal.
Proof. Recall that S is legal if every tm-read of an object X performed by a transaction Ti
returns the latest value written to X in S. Since we only consider canonic
transactions, the latest value written to X in S is the value written by the last transaction Tj
such that Tj commits, Tj <_S Ti and X ∈ Wset(Tj).
From Lemma 24, we have that for all Ti and Tj, if Ti ≺^X_H Tj, then Ti precedes Tj in S. Thus, to
prove that S is legal, it is enough to show that if Ti ≺^X_H Tj, then there does not exist a
transaction Tk that returns Ck, X ∈ Wset(Tk), such that Ti <_S Tk <_S Tj.
Assume, to the contrary, that
• Ti ≺^X_H Tj
• ∃Tk, X ∈ Wset(Tk), that returns Ck such that Ti <_S Tk <_S Tj
Ti and Tk are both updating transactions that commit. Thus,
(Ti <_S Tk) ⇐⇒ (δ_Ti <_E δ_Tk)
(δ_Ti <_E δ_Tk) ⇐⇒ (ℓ_tryCi <_E ℓ_tryCk)
Since Tj reads the value of X written by Ti, one of the following is true:
ℓ_tryCi <_E ℓ_tryCk <_E ℓ_readj(X) (or)
ℓ_tryCi <_E ℓ_readj(X) <_E ℓ_tryCk
If ℓ_tryCk <_E ℓ_readj(X), then the successful lock acquisition on X by Tk in Line 20 precedes
the read of the base object associated with X by Tj in Line 5.
readj(X) checks if any object in Rset(Tj) is locked by a concurrent transaction, then performs
read-validation (Line 7). Consider the following possible sequence of events: Tk acquires the
lock on X, updates X in shared memory, Tj reads the base object associated with X, Tk releases
the lock, and finally Tj performs the check in Line 7. Then readj(X) is forced to return Aj because
X ∈ Rset(Tj) (Line 6) and X has been invalidated since Tj last read its value.
Alternatively, suppose that Tk acquires the lock on X, updates X in shared memory, Tj reads the
base object associated with X, Tj performs the check in Line 7, and finally Tk releases the lock
on X. Again, readj(X) returns Aj since Tk is holding a lock on X ∈ Rset(Tj), a contradiction.
Thus, ℓ_readj(X) <_E ℓ_tryCk.
Consider two cases:
(1) Suppose that Tj is a read-only transaction. Then, δ_Tj is assigned to the last tm-read
performed by Tj that returns a non-Aj value. If readj(X) is not the last tm-read that
returned a non-Aj value, then there exists a readj(X′) such that
ℓ_readj(X) <_E ℓ_tryCk <_E ℓ_readj(X′)
(2) Suppose that Tj is an updating transaction that commits. Then δ_Tj = ℓ_tryCj, which implies
that
ℓ_readj(X) <_E ℓ_tryCk <_E ℓ_tryCj
The same argument as in the proof of Lemma 22 shows that both cases lead to a contradiction:
both readj(X′) and tryCj are forced to return Aj. □
Lemma 26 Algorithm 2 implements a progressive TM.
Proof. Every transaction Tk in a TM M whose tm-operations are defined by Algorithm 2 can
be aborted in the following scenarios:
• Read-validation failed in readk or tryCk
• readk or tryCk returned Ak because Xj ∈ Rset(Tk) is locked (belongs to the write set of a
concurrent transaction)
• L.acquire(Wset(Tk)) returned false in Line 21 of Algorithm 2
Read-validation consists of checking whether the value to be returned from a tm-read of
transaction Tk is consistent with the values returned from the previous tm-reads. Hence, if
validation of a tm-read in Tk fails, it means that the t-object is overwritten by some transaction
Ti such that Ti <_S Tk, implying a read-write conflict. A read-write conflict is also implied if
some t-object Xj ∈ Rset(Tk) is locked and the operation returns abort, since the t-object is in
the write set of a concurrent transaction.
Acquisition of the multi-trylock can return false for Ti because there exists some Xj ∈
Wset(Ti) that was being written to by a concurrent transaction Tk, implying a write-write
conflict.
Hence, for every transaction Ti ∈ H that is aborted, there exists a conflicting t-object that
is contended by a concurrent transaction. Thus, Algorithm 2 implements a progressive TM. □
Theorem 16 There exists a progressive opaque STM implementation that employs a single
multi-RAW per transaction. Moreover, no RAWs are performed by read-only transactions.
Proof. From Lemmas 22, 23, 25 and 26, Algorithm 2 implements a progressive, opaque STM.
Any process executing a transaction Tk holds the lock on Wset(Tk) only once during tryCk.
If Wset(Tk) = ∅, then the transaction simply returns Ck, incurring no RAWs. Thus, from
Theorem 21, Algorithm 2 incurs a single multi-RAW per updating transaction and no RAWs
are performed in read-only transactions. □
A.3 Progressive implementation with single mCAS
Algorithm 3 describes the implementation of a progressive, opaque TM incurring a single AWAR
per updating committed transaction. The implementations of reads and writes are similar to
the ones described in Section A.2, except that each time a transaction Tk performs a read of a
t-object Xj, it reads vj, adds Xj to its read set and checks that the t-objects in the current
read set of Tk have not been updated since Tk read them. If some t-object has been updated,
the transaction is forcefully aborted. Otherwise, Tk returns the value read in vj.
For every updating transaction Tk, tryCk() invokes the mCAS primitive over Dset(Tk). If
the mCAS returns true, tryCk() returns Ck, otherwise it returns Ak. Read-only transactions
simply return Ck.
Algorithm 3 Progressive STM with single mCAS: the implementation of transaction Tk by process pi
 1: Shared variables:
 2:   vj, for each t-object Xj
 3: readk(Xj):
 4:   ovj := read(vj)
 5:   Rset(Tk) := Rset(Tk) ∪ {Xj}
 6:   nvj := ovj
 7:   if isInvalid() then
 8:     return Ak
 9:   return the value of ovj
10: writek(Xj, v):
11:   if Xj ∉ Wset(Tk) then
12:     nvj := v
13:     Wset(Tk) := Wset(Tk) ∪ {Xj}
14:   return okk
15: tryAk():
16:   return Ak
17: tryCk():
18:   if Wset(Tk) = ∅ then
19:     return Ck
20:   for all Xj ∈ Wset(Tk) do
21:     ovj := read(vj)
22:   Let Wset(Tk) ∪ Rset(Tk) be {Xi1, ..., Xim}
23:   V = {vi1, ..., vim}
24:   OV = {ovi1, ..., ovim}
25:   NV = {nvi1, ..., nvim}
26:   if mCAS(V, OV, NV) then
27:     return Ck
28:   return Ak
29: Function: isInvalid():
30:   if ∃Xj ∈ Rset(Tk) : ovj ≠ read(vj) then
31:     return true
32:   return false
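The commit step of the pseudocode above can be sketched as follows. This is a simplified model under stated assumptions: the mCAS primitive is simulated with a mutex (the class Memory and its method mcas are hypothetical names), the old value of a written object is read at commit time only if the transaction has not read that object before, and a read does not overwrite a value the transaction has already buffered for writing.

```python
import threading

ABORT = "A"   # sentinel standing for the abort response A_k


class Memory:
    """Base objects v_j plus a simulated multi-word CAS (assumed primitive)."""

    def __init__(self, objs):
        self.v = {x: (0, 0) for x in objs}   # (value, writer id); T0 wrote 0
        self._guard = threading.Lock()       # stands in for hardware atomicity

    def mcas(self, names, old, new):
        # Atomically: if every v[x] equals old[x], install new[x] for all x.
        with self._guard:
            if any(self.v[x] != old[x] for x in names):
                return False
            for x in names:
                self.v[x] = new[x]
            return True


class Txn:
    def __init__(self, k, mem):
        self.k, self.mem = k, mem
        self.ov, self.nv = {}, {}            # observed versions, new versions
        self.wset = set()

    def read(self, x):
        self.ov[x] = self.mem.v[x]
        if x not in self.wset:
            self.nv[x] = self.ov[x]          # a pure read re-installs what it saw
        if any(self.ov[y] != self.mem.v[y] for y in self.ov):
            return ABORT                     # read set invalidated
        return self.ov[x][0]

    def write(self, x, value):
        self.nv[x] = (value, self.k)
        self.wset.add(x)
        return "ok"

    def try_commit(self):
        if not self.wset:
            return "C"                       # read-only: no mCAS at all
        for x in self.wset:
            # Assumption: keep the version seen by an earlier read, if any.
            self.ov.setdefault(x, self.mem.v[x])
        names = set(self.ov)                 # Wset ∪ Rset
        return "C" if self.mem.mcas(names, self.ov, self.nv) else ABORT


mem = Memory(["X", "Y"])
t1, t2 = Txn(1, mem), Txn(2, mem)
a = t1.read("X")
t1.write("Y", a + 1)
t2.write("X", 7)
c2 = t2.try_commit()          # T2 executes its mCAS first and commits
c1 = t1.try_commit()          # X changed under T1, so its mCAS fails
print(a, c2, c1, mem.v)
```

Because the mCAS covers the read set as well as the write set, validation and update happen in one atomic step, and the failed commit of T1 leaves Y untouched.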
A.3.1 Proof of opacity
Using the same notation as in the proof of opacity for Algorithm 2 in Section A.2.1, let E′ denote
an execution of the TM implemented by Algorithm 3 and H′ a linearization of the execution
history E′|TM. We construct H′ by assigning linearization points to tm-operations performed
in a completion of E′|TM.
The linearization point of a tm-operation opk performed by transaction Tk in a completion
of E′|TM is associated with an access of a base object or a tm-event performed during the
lifetime of the tm-operation as follows.
• For every tm-read opk that returns a non-Ak value, ℓ_opk is chosen as the event in Line 4
of Algorithm 3; otherwise, ℓ_opk is chosen as the invocation event of opk
• For every tm-write opk that returns, ℓ_opk is chosen as the invocation event of opk
• For every opk = tryCk that returns Ck such that Wset(Tk) ≠ ∅, ℓ_opk is associated with
the successful mCAS over Dset(Tk) (Line 26); if opk returns Ak, ℓ_opk is
associated with the invocation event of opk
• For every opk = tryCk that returns Ck such that Wset(Tk) = ∅, ℓ_opk is associated with
Line 19
The t-sequential history S′ is constructed in the same manner as described in Section A.2.1 from
the above assignment of linearization points. Note that the lemmas proven for Algorithm 2
are also valid for Algorithm 3.
Theorem 27 There exists a progressive opaque STM implementation that employs exactly one
AWAR per transaction. Moreover, no AWARs are performed in read-only or aborted transactions.
Proof sketch. Clearly, Algorithm 3 implements an opaque STM.
Algorithm 3 is progressive since every transaction forcefully aborts either due to
read-invalidation or because mCAS returns false, implying that there exists a conflicting t-object
contended by a concurrent transaction. Also note that, if several transactions concurrently
conflict on a single t-object, the first transaction to execute the mCAS in Line 26 receives
true and commits. Thus, the implementation guarantees that in any set of concurrent conflicting
transactions, at least one of the transactions commits, which provides a stronger
progress guarantee than progressiveness or even strong progressiveness. Indeed, a transaction
Tk can abort only if a concurrent committed transaction modifies the value of vj for some
Xj ∈ Dset(Tk).
Algorithm 3 performs a single mCAS operation on Dset(Tk) of a transaction Tk that commits
during tryCk; if Tk aborts, the mCAS only performs reads of base objects. For read-only
transactions, the transaction simply returns Ck, incurring no AWAR. □
A.4 Starvation-free multi-trylock
In this section, we define a multi-trylock object analogous to the one defined in Section A.1,
but whose operations are starvation-free. The algorithm is inspired by the Black-White Bakery
Algorithm [25] and uses a finite number of bounded registers.
The algorithm uses the following shared variables: registers rij for each process pi and object
Xj, a shared bit color ∈ {B, W}, and, for each process pi, a register LAi ∈ {0, . . . , N} that
holds a Label and a register MCi ∈ {B, W}.
We say (LAi, i) < (LAk, k) iff LAi < LAk, or LAi = LAk and i < k.
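The order on (Label, process id) pairs is the usual lexicographic order. A short sketch, with the hypothetical helper name precedes, shows that it coincides with the built-in tuple comparison in Python:

```python
def precedes(la_i, i, la_k, k):
    """(LA_i, i) < (LA_k, k): smaller label first, ties broken by process id."""
    return la_i < la_k or (la_i == la_k and i < k)


# Python tuples compare lexicographically, so sorting (label, id) pairs
# yields the same order as the definition.
pairs = [(2, 1), (1, 3), (2, 0), (1, 2)]
order = sorted(pairs)
print(order)
```

Since process identifiers are distinct, any two distinct pairs are comparable, so the order is total.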
A starvation-free multi-trylock implementation satisfies the following properties:
• Mutual-exclusion: For any object Xj and any execution π, there exists at most one
process that holds a lock on Xj after π.
• Progress: Let π be any execution that contains acquire(Q) by process pi. If no other
process pk, k ≠ i, contends infinitely long on some Xj ∈ Q, then acquire(Q) returns true
in π.
• Let π be any execution that contains isContended(Xj) invoked by pi.
– If Xj is locked by pℓ, ℓ ≠ i, during the complete execution of isContended(Xj) in π,
then isContended(Xj) returns true.
– If, for all ℓ ≠ i, Xj is never contended by pℓ during the execution of isContended(Xj) in π,
then isContended(Xj) returns false.
Lemma 28 In every execution π of Algorithm 4, if pi holds a lock on some object Xj after π,
then one of the following conditions must hold:
(1) for some k ≠ i with LAk ≠ 0, if MCk = MCi, then (LAk, k) > (LAi, i)
(2) for some k ≠ i with LAk ≠ 0, if MCk ≠ MCi, then MCi ≠ color
Proof. In order to hold the lock on Xj, a process pi writes 1 to rij, writes a value, say W,
to MCi, reads the Labels of other processes that have obtained the same color as itself, and
generates a Label greater by one than the maximum Label read (Line 11). Observe that until
the value of the color bit is changed, all processes read the same value W. The first process pi
to hold the lock on Xj changes the color bit to B when releasing the lock, and hence the value
read by all subsequent processes will be B until it is changed again. Now consider two cases:
Algorithm 4 Starvation-free multi-trylock invoked by process pi
 1: Shared variables:
 2:   LAi, for each process pi, initially 0
 3:   MCi ∈ {B, W} for each process pi, initially W
 4:   color ∈ {B, W}, initially W
 5:   rij, for each process pi and each t-object Xj, initially 0
 6: acquire(Q):
 7:   for all Xj ∈ Q do
 8:     write(rij, 1)
 9:   ci := color
10:   write(MCi, ci)
11:   write(LAi, 1 + max({LAk | MCk = MCi}))
12:   while ∃j : ∃k ≠ i : isContended(Xj) && ((LAk ≠ 0; (MCk = MCi); (LAk, k) < (LAi, i)) ||
13:       (LAk ≠ 0; (MCk ≠ MCi); MCi = color)) do
14:     no op
15:   end while
16:   return true
17: release(Q):
18:   for all Xj ∈ Q do
19:     write(rij, 0)
20:   if MCi = B then
21:     write(color, W)
22:   else
23:     write(color, B)
24:   write(LAi, 0)
25:   return ok
26: isContended(Xj):
27:   if ∃pt : rtj ≠ 0, t ≠ i then
28:     return true
29:   return false
(1) Assume that there exists a process pk, k ≠ i, LAk ≠ 0 and MCk = MCi, such that
(LAk, k) < (LAi, i), but pi holds a lock on Xj after π. Thus, isContended(Xj) returns
true to pi because pk writes to rkj (Line 8) before writing to LAk (Line 11). By assumption,
(LAk, k) < (LAi, i), LAk > 0 and MCi = MCk, but the conditional in Line 13 returned
true to pi without waiting for pk to stop contending on Xj, a contradiction.
(2) Assume that there exists a process pk, k ≠ i, LAk ≠ 0 and MCk ≠ MCi, such that
MCi = color, but pi holds a lock on Xj after π. Again, since LAk > 0, isContended(Xj)
returns true to pi, MCk ≠ MCi and MCi = color, but the conditional in Line 13 returned
true to pi without waiting for pk to stop contending on Xj, a contradiction.
□
Theorem 29 Algorithm 4 is an implementation of a multi-trylock object in which every operation
is starvation-free and incurs at most four RAWs.
Proof. Denote by L the shared object implemented by Algorithm 4.
Assume, by contradiction, that L does not provide mutual-exclusion: there exists an
execution π after which processes pi and pk, k ≠ i, hold a lock on the same object, say Xj. Since both
pi and pk have performed the write to LAi and LAk, respectively, in Line 11, LAi, LAk > 0. Consider
two cases:
(1) If MCk = MCi, then from Condition 1 of Lemma 28, we have (LAk, k) < (LAi, i) and
(LAk, k) > (LAi, i), a contradiction.
(2) If MCk ≠ MCi, then from Condition 2 of Lemma 28, we have MCi ≠ color and MCk ≠
color, which implies MCk = MCi, a contradiction.
L also ensures progress. If process pi wants to hold the lock on an object Xj, i.e., invokes
acquire(Q), Xj ∈ Q, it checks if any other process pk holds the lock on Xj. If such a process
pk exists and MCk = MCi, then isContended(Xj) returns true for pi and (LAk, k) <
(LAi, i). Thus, pi fails the conditional in Line 13 and waits until pk releases the lock on
Xj to return true. However, if pk contends infinitely long on Xj, pi is also forced to wait
indefinitely for acquire(Q) to return true. The same argument works
when MCk ≠ MCi: if pk does not contend infinitely long on Xj, then once pk stops contending
on Xj, isContended(Xj) eventually returns false for pi.
All operations performed by L are starvation-free. Each process pi that successfully holds the
lock on an object Xj in an execution π invokes acquire(Q), Xj ∈ Q, obtains a color and chooses a
value for LAi, since there is no way to be blocked while writing to LAi. The response of operation
acquire(Q) by pi is only delayed if there exists a concurrent invocation of acquire(Q′), Xj ∈ Q′,
by pk in π. In that case, process pi waits until pk invokes release(Q′) and writes 0 to rkj, and
eventually holds the lock on Xj. The implementations of release and isContended are wait-free
(and hence starvation-free) since they contain no unbounded loops or waiting
statements.
The implementation of isContended(Xj) only reads base objects. The implementation of
release(Q) writes to a series of base objects (Line 18) and then reads a base object (Line 20),
incurring a single RAW. The implementation of acquire(Q) writes to base objects (Line 8) and
reads the shared bit color (Line 9), which is one RAW; writes to a base object (Line 10) and reads
the Labels (Line 11), which is one RAW; and writes to its own Label and finally performs a
sequence of reads when evaluating the conditional in Line 13, which is one RAW.
Thus, Algorithm 4 incurs at most four RAWs. □
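The count can be reproduced on a trace of base-object accesses. The sketch below uses a deliberately simplified counting rule, which is an assumption of this example rather than the formal definition: a RAW is counted when a read hits a base object that the process has not written since the last counted RAW, while at least one write is pending. The trace builders acquire_trace and release_trace are hypothetical and follow the line numbers of Algorithm 4 for a single pass through the wait loop.

```python
def count_raws(trace):
    """Count write-then-read patterns in a list of ("w" | "r", object) events."""
    count, pending = 0, set()
    for kind, obj in trace:
        if kind == "w":
            pending.add(obj)                 # writes not yet followed by a read
        elif pending and obj not in pending:
            count += 1                       # read of a different object after a write
            pending.clear()
    return count


def acquire_trace(i, q, others):
    t = [("w", f"r[{i}][{x}]") for x in q]              # Line 8
    t.append(("r", "color"))                             # Line 9
    t.append(("w", f"MC[{i}]"))                          # Line 10
    for k in others:                                     # Line 11: read the Labels
        t += [("r", f"MC[{k}]"), ("r", f"LA[{k}]")]
    t.append(("w", f"LA[{i}]"))                          # Line 11: write own Label
    for x in q:                                          # Lines 12-13: one loop pass
        for k in others:
            t += [("r", f"r[{k}][{x}]"), ("r", f"LA[{k}]"),
                  ("r", f"MC[{k}]"), ("r", "color")]
    return t


def release_trace(i, q):
    t = [("w", f"r[{i}][{x}]") for x in q]              # Line 19
    t.append(("r", f"MC[{i}]"))                          # Line 20
    t += [("w", "color"), ("w", f"LA[{i}]")]             # Lines 21-24
    return t


acq = acquire_trace(0, ["X1", "X2"], others=[1, 2])
rel = release_trace(0, ["X1", "X2"])
print(count_raws(acq), count_raws(rel), count_raws(acq + rel))
```

Further iterations of the wait loop only append reads to the trace, so under this rule they do not add to the count.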
A.5 Strong progressive implementation with constant RAWs
Let CObj_H(Ti) denote the set of t-objects over which transaction Ti ∈ parts(H) conflicts
with any other transaction in history H, i.e., X ∈ CObj_H(Ti) if there exists a transaction
Tk ∈ parts(H), k ≠ i, such that Ti conflicts with Tk on X in H. Then, CObj_H(Q) =
⋃_{Ti ∈ Q} CObj_H(Ti) denotes the union of the sets CObj_H(Ti) for all transactions Ti in Q.
Let CTrans(H) denote the set of non-empty subsets of parts(H) such that a set Q is in
CTrans(H) if no transaction in Q conflicts with a transaction not in Q.
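Both sets can be computed by brute force on a small example. The sketch assumes a conflict rule that is not restated in this section: two transactions conflict on X if X is in the data set of both and in the write set of at least one of them, and all listed transactions are taken to be concurrent. The representation of a transaction as a pair (read set, write set) and the function names are hypothetical.

```python
from itertools import combinations


def conflicts_on(txns, i, k):
    """T-objects on which T_i and T_k conflict, under the assumed rule."""
    (ri, wi), (rk, wk) = txns[i], txns[k]
    shared = (ri | wi) & (rk | wk)
    return {x for x in shared if x in wi or x in wk}


def cobj(txns, q):
    """CObj_H(Q): union of the conflict objects of every transaction in Q."""
    out = set()
    for i in q:
        for k in txns:
            if k != i:
                out |= conflicts_on(txns, i, k)
    return out


def ctrans(txns):
    """CTrans(H): non-empty sets Q with no conflict crossing the boundary of Q."""
    ids = sorted(txns)
    closed = []
    for n in range(1, len(ids) + 1):
        for q in combinations(ids, n):
            outside = [k for k in ids if k not in q]
            if all(not conflicts_on(txns, i, k) for i in q for k in outside):
                closed.append(q)
    return closed


# T1 reads X, T2 writes X, T3 reads Y and writes Z.
txns = {1: ({"X"}, set()), 2: (set(), {"X"}), 3: ({"Y"}, {"Z"})}
print(cobj(txns, {1, 2}), ctrans(txns))
```

In the example, {T1, T2} is in CTrans(H) and conflicts on the single object X, so it is a set of the kind that the definition of strong progressiveness below constrains.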
Definition 30 A TM implementation M is strongly progressive if M is weakly progressive and,
for any history H of M, there do not exist a prefix H′ of H and a set Q ∈ CTrans(H′)
of transactions that are live in H′ such that |CObj_H′(Q)| ≤ 1 and every transaction in Q is
forcefully aborted in H.
Algorithm 5 describes the implementation of the tryC operation of a strongly progressive,
opaque TM. The only modification over the tryC implementation of Algorithm 2 is that in
Algorithm 5, every transaction with an empty read set eventually commits. The read, write,
tryA and isAbortable operations are the same as in Algorithm 2.
Theorem 31 Algorithm 5 implements a strongly progressive TM.
Algorithm 5 Strongly progressive, opaque STM: the implementation of Tk executed by pi
 1: Shared variables:
 2:   vj, for each t-object Xj
 3:   L, a starvation-free multi-trylock object
 4: tryCk():
 5:   if Wset(Tk) = ∅ then
 6:     return Ck
 7:   locked := L.acquire(Wset(Tk))
 8:   if isAbortable() then
 9:     L.release(Wset(Tk))
10:     return Ak
11:   for all Xj ∈ Wset(Tk) do
12:     write(vj, (nvj, k))
13:   L.release(Wset(Tk))
14:   return Ck
Proof. Every transaction Tk in a TM M whose tm-operations are defined by Algorithm 5 can
be aborted in the following scenarios:
• Read-validation failed in readk or tryCk
• readk or tryCk returned Ak because Xj ∈ Rset(Tk) is locked (belongs to the write set of a
concurrent transaction)
Thus, Algorithm 5 implements a weakly progressive TM (from Lemma 26).
To show that Algorithm 5 also implements a strongly progressive STM, we need to show that
for every set of transactions that concurrently contend on a single t-object, at least one of the
transactions is not aborted.
Consider transactions Ti and Tk that concurrently attempt to execute tryCi and tryCk such
that Xj ∈ Wset(Ti) ∪ Wset(Tk). Consequently, they both invoke the acquire operation of the
multi-trylock (Line 7) and thus, from Theorem 29, both Ti and Tk must commit eventually. Also, if
validation of a tm-read in Tk fails, it means that the t-object is overwritten by some transaction
Ti such that Ti precedes Tk, implying that at least one of the transactions commits. Otherwise,
some t-object Xj ∈ Rset(Tk) is locked and the operation returns abort, since the t-object is in
the write set of a concurrent transaction Ti. While it may still be possible that Ti returns Ai
after acquiring the lock on Wset(Ti), strong progressiveness only guarantees progress for
transactions that conflict on at most one t-object. Thus, in either case, for every set of
transactions that conflict on at most one t-object, at least one transaction is not forcefully
aborted. □
Theorem 17 There exists a strongly progressive single-version opaque STM implementation
with starvation-free operations that uses invisible reads and employs at most four RAWs per
transaction. Moreover, no RAWs are performed in read-only transactions.
Proof. The correctness of Algorithm 5 follows from the proof of opacity presented
in Section A.2.1 for Algorithm 2. From Theorem 31, it is also strongly progressive.
Any process executing a transaction Tk holds the lock on Wset(Tk) only once during tryCk.
If Wset(Tk) = ∅, then the transaction simply returns Ck, incurring no RAWs. Thus, from
Theorem 29, Algorithm 5 incurs at most four RAWs per updating transaction and no RAWs
are performed in read-only transactions. □
B RAW/AWAR cost of probabilistically permissive STMs
Theorem 32 Let M be a probabilistically permissive opaque STM implementation. Then, for
any m, there exists with positive probability, an execution in which a read-only transaction Ti
contains Ω(m) non-overlapping RAWs or AWARs on base objects where m = |Rset(Ti)|.
Proof. For the proof, note that we only need to show that there exists an execution of the
probabilistically permissive TM that is the same as the execution of a permissive TM. Then,
the construction and arguments used in the proof of Theorem 13 can be extended to the
probabilistic case.
Let E denote the execution depicted in Figure 1 where T3 performs a read of X1, then T2
performs a write on X1 and commits, and finally T1 performs a series of reads on X1, . . . , Xm.
We proceed by induction by considering R1(Xk), the k-th read of T1, 2 ≤ k ≤ m.
(1) Imagine an extension of E, denoted by E′, in which T3 performs a W3(Xk) immediately
after R1(Xk) and then tries to commit. A serialization of H′ = E′|TM should obey
T3 ≺^DU_H′ T2 and T2 ≺_H′ T1. The execution of R1(Xk) does not modify base objects, hence
T3 does not observe R1(Xk) in E′. In a probabilistically permissive TM, the tm-operation
W3(Xk) can return one of the values A3 or ok3. Note that this response is chosen
by sampling uniformly at random from the set of possible return values; thus, there exists
a positive probability that T3 commits successfully (when it returns ok3). But since T1
performs R1(Xk) before T3 commits and T3 updates Xk, we also have T1 ≺^DU_H′ T3. Thus,
T3 cannot precede T1 in any serialization and we establish a contradiction. Consequently,
there exists, with positive probability, an execution in which each R1(Xk), 2 ≤ k ≤ m,
performs a write to a base object.
(2) Let π be a fragment of E that represents the complete execution of R1(Xk). Clearly, there
exists, with positive probability, an execution in which π contains a write to a base object.
Let π_j be the first write to a base object in π and π_w the shortest fragment of π that
contains the atomic section to which π_j belongs; if π_j is not part of an atomic section,
π_w = π_j. Thus, π can be represented as π_s · π_w · π_f.
Suppose that π does not contain a RAW or AWAR. Since π_w does not contain an AWAR
(atomic write-after-read), there are no read events in π_w that precede π_j. Thus, π_j is
the first base-object event in π_w. Consider the execution fragment π_s · ρ, where ρ is
the complete execution of {W3(Xk), TC3} by transaction T3. By Definition 9, such an
execution, in which T3 commits, exists with positive probability. Since π_s does not perform
any base-object write, π_s · ρ is indistinguishable to T3 from ρ.
Also, by our assumption, π_w · π_f contains no RAW, i.e., any read performed in π_w · π_f can
only be applied to base objects previously written in π_w · π_f. Thus, in a probabilistically
permissive TM in which responses to tm-operations are chosen by independent coin tosses,
there exists, with positive probability, an execution π_s · ρ · π_w · π_f that is indistinguishable
to T1 from π. However, in π_s · ρ · π_w · π_f, T3 commits (as in ρ) but T1 ignores the value
written by T3 to Xk. But T3 can only be serialized before T1, a contradiction.
□
