On the Cost of Concurrency in Transactional Memory by Ravi, Srivatsan
Technische Universität Berlin
Faculty of Electrical Engineering and Computer Science
Chair for Intelligent Networks
and Management of Distributed Systems
On the Cost of Concurrency in
Transactional Memory
submitted by
Srivatsan Ravi (M.E.)
to Faculty IV – Electrical Engineering and Computer Science
of the Technische Universität Berlin
in fulfilment of the requirements for the academic degree of
Doktor der Ingenieurwissenschaften (Dr.-Ing.)
approved dissertation
Doctoral committee:
Chair: Prof. Uwe Nestmann, Ph.D., TU Berlin
Reviewer: Prof. Anja Feldmann, Ph.D., TU Berlin
Reviewer: Prof. Petr Kuznetsov, Ph.D., Télécom ParisTech
Reviewer: Prof. Hagit Attiya, Ph.D., The Technion
Reviewer: Prof. Rachid Guerraoui, Ph.D., EPFL
Reviewer: Prof. Michel Raynal, Ph.D., INRIA, Rennes
Date of the scientific defence: 18 June 2015
Berlin 2015
D 83
arXiv:1511.01779v1 [cs.DC] 5 Nov 2015

Statutory Declaration
I declare in lieu of an oath that I have written this dissertation independently and have used only the
sources and aids indicated.
Date Srivatsan Ravi (M.E.)

Abstract
Current general-purpose CPUs are multicores, offering multiple computing units within a single chip. The
performance of programs on these architectures, however, does not necessarily increase proportionally
with the number of cores. Designing concurrent programs to exploit these multicores emphasizes the need
for achieving efficient synchronization among threads of computation. When there are several threads
that conflict on the same data, the threads will need to coordinate their actions for ensuring correct
program behaviour.
Traditional techniques for synchronization are based on locking that provides threads with exclusive
access to shared data. Coarse-grained locking typically forces threads to access large amounts of data
sequentially and, thus, does not fully exploit hardware concurrency. Program-specific fine-grained locking
or non-blocking (i.e., not using locks) synchronization, on the other hand, is a dark art to most
programmers and trusted to the wisdom of a few computing experts. Thus, it is appealing to seek a middle
ground between these two extremes: a synchronization mechanism that relieves the programmer of the
overhead of reasoning about data conflicts that may arise from concurrent operations without severely
limiting the program’s performance. The Transactional Memory (TM) abstraction is proposed as such
a mechanism: it intends to combine an easy-to-use programming interface with an efficient utilization of
the concurrent-computing abilities provided by multicore architectures. TM allows the programmer to
speculatively execute sequences of shared-memory operations as atomic transactions with all-or-nothing
semantics: the transaction can either commit, in which case it appears as executed sequentially, or abort,
in which case its update operations do not take effect. Thus, the programmer can design software
having only sequential semantics in mind and let TM take care, at run-time, of resolving the conflicts in
concurrent executions.
Intuitively, we want TMs to allow for as much concurrency as possible: in the absence of severe data
conflicts, transactions should be able to progress in parallel. But what are the inherent costs associated
with providing high degrees of concurrency in TMs? This is the central question of the thesis.
To address this question, we first focus on the consistency criteria that must be satisfied by a TM
implementation. We precisely characterize what it means for a TM implementation to be safe, i.e., to
ensure that the view of every transaction could have been observed in some sequential execution. We
then present several lower and upper bounds on the complexity of three classes of safe TMs: blocking TMs
that allow transactions to delay or abort due to overlapping transactions, non-blocking TMs which adapt
to step contention by ensuring that a transaction not encountering steps of overlapping transactions must
commit, and partially non-blocking TMs that provide strong non-blocking guarantees (wait-freedom) to
only a subset of transactions. We then propose a model for hybrid TM implementations that complement
hardware transactions with software transactions. We prove that there is an inherent trade-off on the
degree of concurrency allowed between hardware and software transactions and the costs introduced
on the hardware. Finally, we show that optimistic synchronization techniques based on speculative
executions are, in a precise sense, better equipped to exploit concurrency than inherently pessimistic
techniques based on locking.
Summary (Zusammenfassung)
Current general-purpose CPUs have multiple computing cores within a single chip. However, the
performance of programs on these architectures does not necessarily increase proportionally
with the number of cores. Designing concurrent programs that exploit these multicores
requires overcoming several nontrivial challenges; the most important is achieving efficient
synchronization among the threads of computation. When several threads access the same data
simultaneously, they must coordinate their actions to ensure correct program behaviour.
The traditional method of synchronization is locking, which grants only a single thread at a time
access to shared data. With coarse-grained locking, access to a large amount of data is mostly
serial, so that hardware parallelism is not fully exploited. On the other hand, program-specific
fine-grained locking, or non-blocking (i.e., not using locks) synchronization, is a dark art to most
programmers and is entrusted to the wisdom of a few computing experts. It is therefore appropriate
to seek a middle ground between these two extremes: a synchronization mechanism that relieves
the programmer of reasoning about the data conflicts arising from concurrent operations
without impairing the program's performance too severely. The Transactional Memory
(TM) abstraction is proposed as such a mechanism: its goal is to combine an easy-to-use
programming interface with an efficient utilization of the concurrent-computing abilities
of multicore architectures. TM allows the programmer to declare sequences of operations
on the shared memory as atomic transactions with all-or-nothing semantics:
the transaction is either committed, and thus appears as executed sequentially, or aborted,
so that its operations do not take effect. This enables the programmer to design software with
only sequential semantics in mind and to leave the conflicts arising from concurrent execution
to TM.
Intuitively, TMs should allow for as much concurrency as possible: if there are no data conflicts,
all transactions should be executed in parallel. Are there costs in TMs that arise from
this high degree of concurrency? This is the central question of this thesis.
To answer this question, we first focus on the consistency criterion that
a TM implementation must satisfy. We characterize precisely what it means for a
TM implementation to be safe, i.e., to ensure that the view of every transaction
could also have been observed by a sequential transaction. We then present
several lower and upper bounds on the complexity of three classes of safe TMs: blocking
TMs that allow transactions to block or abort if they overlap,
non-blocking TMs that take step contention into account, i.e., transactions that
observe no steps of overlapping transactions must commit, and partially
non-blocking TMs that are non-blocking for only a subset of transactions. We then
propose a model for hybrid TM implementations that complements hardware transactions with
software transactions. We prove that there is an inherent trade-off between the degree of
concurrency allowed between hardware and software transactions and the costs on the hardware.
Finally, we prove that optimistic synchronization techniques based on speculative executions
are, in a precise sense, better suited to exploit concurrency than
pessimistic techniques based on locking.
Contents
1 Introduction
1.1 Concurrency and synchronization
1.1.1 Concurrent computing overview
1.1.2 Synchronization using locks
1.1.3 Non-blocking synchronization
1.2 Transactional Memory (TM)
1.3 Summary of the thesis
1.3.1 Safety for transactional memory
1.3.2 Complexity of transactional memory
1.3.3 Hybrid transactional memory
1.3.4 Optimism for boosting concurrency
1.4 Roadmap of the thesis
2 Transactional Memory model
2.1 TM interface and TM implementations
2.2 TM-correctness
2.3 TM-progress
2.4 TM-liveness
2.5 Invisible reads
2.6 Disjoint-access parallelism (DAP)
2.7 TM complexity metrics
3 Safety for transactional memory
3.1 Overview
3.2 Safety properties
3.3 Opacity and deferred-update (DU) semantics
3.4 On the safety of du-opacity
3.4.1 Du-opacity is prefix-closed
3.4.2 The limit of du-opaque histories
3.4.3 Du-opacity is limit-closed for complete histories
3.4.4 Du-opacity vs. opacity
3.5 Strict serializability with DU semantics
3.6 Du-opacity vs. other deferred-update criteria
3.6.1 Virtual-world consistency
3.6.2 Transactional memory specification (TMS)
3.7 Related work and Discussion
4 Complexity bounds for blocking TMs
4.1 Overview
4.2 Sequential TMs
4.2.1 A quadratic lower bound on step complexity
4.2.2 Expensive synchronization in Transactional memory cannot be eliminated
4.3 Progressive TMs
4.3.1 A linear lower bound on the amount of protected data
4.3.2 A constant stall and constant expensive synchronization strict DAP opaque TM
4.4 Strongly progressive TMs
4.4.1 A Ω(n log n) lower bound on remote memory references
4.4.2 A constant expensive synchronization opaque TM
4.5 On the cost of permissive opaque TMs
4.6 Related work and Discussion
5 Complexity bounds for non-blocking TMs
5.1 Overview
5.2 Impossibility of weak DAP and invisible reads
5.3 A linear lower bound on memory stall complexity
5.4 A linear lower bound on expensive synchronization for RW DAP
5.5 Algorithms for obstruction-free TMs
5.5.1 An opaque RW DAP TM implementation
5.5.2 An opaque weak DAP TM implementation
5.6 Why Transactional memory should not be obstruction-free
5.7 Related work and Discussion
6 Lower bounds for partially non-blocking TMs
6.1 Overview
6.2 The space complexity of invisible reads
6.3 Impossibility of strict DAP
6.4 A linear lower bound on expensive synchronization for weak DAP
6.5 Related work and Discussion
7 Hybrid transactional memory (HyTM)
7.1 Overview
7.2 Modelling HyTM
7.2.1 Direct and cached accesses
7.2.2 Slow-path and fast-path transactions
7.2.3 Instrumentation
7.2.4 Impossibility of uninstrumented HyTMs
7.3 A linear lower bound on instrumentation for progressive HyTMs
7.4 Instrumentation-optimal progressive HyTM
7.5 Providing partial concurrency at low cost
7.6 Related work and Discussion
8 Optimism for boosting concurrency
8.1 Overview
8.2 Concurrent implementations
8.3 Locally serializable linearizability
8.4 Pessimistic vs. optimistic synchronization
8.4.1 Concurrency analysis
8.4.2 Concurrency optimality
8.5 Related work and Discussion
9 Concluding remarks
List of Figures
List of Tables
10 Bibliography
Acknowledgements
In his wonderfully sarcastic critique of the scientific community in His Master’s Voice, the great Polish
writer Stanisław Lem refers to a specialist as a barbarian whose ignorance is not well-rounded. Writing
a Ph.D. thesis is essentially an attempt at becoming a specialist on some topic; whether this thesis on
Transactional Memory makes me one is a questionable claim, but I am culturedly not totally ignorant,
I think. The thesis itself was a long time in the making and would not have been possible without the
wonderful support and kindness I have received these past four years.
My advisors Anja Feldmann and Petr Kuznetsov guided me throughout my Ph.D. term.
A fair amount of whatever good I have learnt these past few years, both scientifically and
meta-scientifically, I owe to Petr. He taught me, by example, what it takes to achieve nontrivial scientific
results. He spent several hours schooling me when I had misunderstood some topic and as such suffered
the worst of my writing, especially in the first couple of years. He was hard on me when I did badly,
but always happy for me when I did well. Apart from being a deep thinker and a brilliant researcher,
his scientific integrity and mental discipline have indelibly made me a better human being and student
of science. At a personal level, I wish to thank him and his family for undeserved kindness shown to me
over the years.
I would like to thank Anja for the extraordinary amount of freedom she gave me to pursue my own
research and the trust she placed in me, as she does in all her students.
I am especially grateful to Robbert Van Renesse and Bryan Ford, who gave me a taste for independent
research and in many ways, helped shape the course of my graduate career.
I am of course extremely grateful to all my co-authors who allowed me to include content, written in
conjunction with them, in the thesis. So special thanks to Dan Alistarh, Hagit Attiya, Vincent Gramoli,
Sandeep Hans, Justin Kopinsky, Petr Kuznetsov and Nir Shavit.
The results in Chapter 3 were initiated during a memorable visit to the Technion, Haifa in the Spring
of ’12. I am thankful to have been hosted by Hagit Attiya and to have had the chance to work with her
and Sandeep Hans. I am also very grateful to David Sainz for taking time off and introducing me to
some beautiful parts of Israel.
Chapter 7 essentially stemmed from a visit to MIT in the summer of ’13. I am very grateful to Dan
Alistarh and Nir Shavit for hosting me. Special thanks to Justin Kopinsky, who helped keep our discussions
alive during our lengthy dry spells when we were seemingly spending all our time thinking about
the problem, but without producing any tangible results.
Chapter 8 represents the most excruciatingly painful part of the thesis purely in terms of the number
of iterations the paper based on this chapter went through. Yet, it was a procedure from which I learnt
a lot and I am very thankful to Vincent Gramoli, who initiated the topic during my visit to EPFL in
Spring ’11.
In general, I have benefitted immensely from just talking to researchers in distributed computing during
conferences, workshops and research visits. These include Yehuda Afek, Panagiota Fatourou, Rachid
Guerraoui, Maurice Herlihy, Victor Luchangco, Adam Morrison and Michel Raynal. Also, great thanks
to the anonymous reviewers of my paper submissions whose critiques and comments helped improve the
contents of the thesis.
Back here in Berlin, so many of my INET colleagues have shaped my thought processes and enriched my
experience in grad school. Thanks to Stefan Schmid, whose ability to execute several tasks concurrently
with minimal synchronization overhead never ceases to amaze anyone in this group. I am also very
grateful to Anja, Petr and Stefan for allowing me to be a Teaching Assistant in their respective courses.
Apart from being great friends, Felix Poloczek and Matthias Rost have been wonderful office mates
and indulged my random discussions about life and research. Arne Ludwig and Carlo Fürst have been
great friends; Arne, thanks for all the football discussions and Carlo, for exposing me to some social life.
Thanks to Dan Levin, who has been a great friend and always been there to motivate and give me a fillip
whenever I needed it. Ingmar Poese has been a wonderful friend as well as a constant companion to the
movies. Franziska Lichtblau, Enric Pujol, Philipp Richter and others have been willing companions in
ordering several late night dinners at the lab. Thomas Krenc and Philipp Schmidt were great co-TA’s.
Great thanks to all the other current and past members of INET: our system admin Rainer May, Marco
Canini, Damien Foucard, Juhoon Kim, Gregor Schaffrath, Julius Schulz-Zander, Georgios Smaragdakis,
Florian Streibelt, Lalith Suresh, Steve Uhlig and all the other members I have missed. Special thanks to
our group secretaries, Birgit Hohmeier-Touré and Nadine Pissors, without whom there would be absolute
chaos.
I would like to thank my flat mates of the last two years: Ingmar, Jennifer, Jose, Lily and Mathilda.
Lastly, thanks to my family and friends outside of the academic sphere for tolerating me all these years.
1 Introduction
While the performance of programs would increase proportionally with the performance of a single-core
CPU, the performance of programs on multicore CPU architectures does not necessarily
increase proportionally with the number of cores. In order to exploit these multicores, the amount of
concurrency provided by programs will need to increase as well. Designing concurrent programs that
exploit the hardware concurrency provided by modern multicore CPU architectures requires achieving
efficient synchronization among threads of computation. However, due to the asynchrony resulting from
the CPU’s context switching and scheduling policies, it is hard to specify reasonable bounds on relative
thread speeds. This makes the design of efficient and correct concurrent programs a difficult task. The
Transactional Memory (TM) abstraction [80, 117] is a synchronization mechanism proposed as a solution
to this problem: it combines an easy-to-use programming interface with an efficient utilization of
the concurrent-computing abilities provided by multicore architectures. This chapter introduces the TM
abstraction and presents an overview of the thesis.
1.1 Concurrency and synchronization
In this section, we introduce the challenges of concurrent computing and give an overview of the drawbacks
associated with traditional synchronization techniques.
1.1.1 Concurrent computing overview
Shared memory model. A process represents a thread of computation that is provided with its own
private memory which cannot be accessed by other processes. However, these independent processes
will have to synchronize their actions in an asynchronous environment in order to implement a user
application, which they do by communicating via the CPU’s shared memory.
In the shared memory model of computation, processes communicate by reading and writing to a fragment
of the shared memory, referred to as a base object, in a single atomic (i.e., indivisible) instruction. Modern
CPU architectures additionally allow processes to invoke certain powerful atomic read-modify-write (rmw)
instructions [75], which allow processes to write to a base object subject to the check of an invariant.
For example, the compare-and-swap instruction is an rmw instruction that is supported by most modern
architectures: it takes as input 〈old, new〉 and, iff the value of the base object prior to applying the
instruction is equal to old, atomically updates the value to new and returns true; otherwise it returns false.
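The semantics of compare-and-swap can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Python standard library does not expose a hardware compare-and-swap, so the hypothetical class BaseObject below simulates the atomicity of a single instruction with an internal lock, and the names BaseObject, compare_and_swap and increment are ours, not the thesis's.

```python
import threading


class BaseObject:
    """A shared memory cell. The internal lock only simulates the atomicity
    that real hardware provides for a single instruction (assumption)."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def read(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, old, new):
        """If the current value equals `old`, set it to `new` and return True;
        otherwise leave it unchanged and return False."""
        with self._guard:
            if self._value == old:
                self._value = new
                return True
            return False


def increment(cell):
    """Typical usage: read, compute, and retry until the CAS succeeds."""
    while True:
        seen = cell.read()
        if cell.compare_and_swap(seen, seen + 1):
            return seen + 1
```

The retry loop in increment shows how an invariant ("the value is still the one I read") is checked and the write applied in one indivisible step.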
Concurrent implementations. A concurrent implementation provides each process with an algorithm
to apply CPU instructions on the shared base objects for the operations of the user application. For
example, consider the problem of implementing a concurrent list-based set [81]: the set abstraction
implemented as a sorted linked list supporting the operations insert(v), remove(v) and contains(v); v ∈ Z.
The set abstraction stores a set of integer values, initially empty. The update operations, insert(v)
and remove(v), return a boolean response, true if and only if v is absent (for insert(v)) or present (for
remove(v)) in the list. After insert(v) is complete, v is present in the list, and after remove(v) is complete,
v is absent in the list. The contains(v) operation returns a boolean, true if and only if v is present in the list. A
concurrent implementation of the list-based set is simply an emulation of the set abstraction that is
realized by processes applying the available CPU instructions on the underlying base objects.
Safety and liveness. What does it mean for a concurrent implementation to be correct? Firstly,
the implementation must satisfy a safety property: there are no bad reachable states in any execution
of the implementation. Intuitively, we characterize safety for a concurrent implementation of a data
abstraction by verifying if the responses returned in the concurrent execution may have been observed
in a sequential execution of the same. For example, the safety property for a concurrent list-based set
implementation stipulates that the response of the set operations in a concurrent execution is consistent
with some sequential execution of the list-based set. However, a concurrent set implementation that does
not return any response trivially ensures safety; thus, the implementation must satisfy some liveness
property specifying the conditions under which the operations must return a response. For example, one
liveness property we may wish to impose on the concurrent list-based set is wait-freedom: every process
completes the operations it invokes irrespective of the behaviour of other processes.
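The intuition behind the safety property for the set can be made concrete with a brute-force check. The sketch below is a simplification under stated assumptions: it only asks whether the observed responses are consistent with some sequential execution of the set, it ignores the real-time order of operations (which stronger criteria such as linearizability additionally respect), and it tries every ordering, so it is usable only for tiny histories. The function names are hypothetical.

```python
from itertools import permutations


def apply_sequential(state, op, v):
    """Sequential specification of the integer set; returns the response."""
    if op == "insert":
        absent = v not in state
        state.add(v)
        return absent
    if op == "remove":
        present = v in state
        state.discard(v)
        return present
    if op == "contains":
        return v in state
    raise ValueError(f"unknown operation: {op}")


def explained_by_some_sequential_execution(observed):
    """observed: list of (op, v, response) triples taken from a concurrent run.
    Returns True iff some ordering of the operations, applied to an initially
    empty sequential set, produces exactly the observed responses."""
    for order in permutations(observed):
        state = set()
        if all(apply_sequential(state, op, v) == resp for op, v, resp in order):
            return True
    return False
```

For instance, two insert(1) operations that both return true cannot be explained by any sequential execution in which no remove(1) occurs, so a run producing them violates safety.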
As another example, consider the mutual exclusion problem [41] which involves sharing some critical data
resource among processes. The safety property for mutual exclusion stipulates that at most one process
has access to the resource in any execution, in which case, we say that the process is inside the critical
section. However, one may notice that an implementation which ensures that no process ever enters the
critical section is trivially safe, but not very useful. Thus, the mutual exclusion implementation must
satisfy some liveness property specifying the conditions under which the processes must eventually enter
the critical section. For example, we expect that the implementation is deadlock-free: if every process
is given the chance to execute its algorithm, some process will enter the critical section. In contrast to
safety, a liveness property can be violated only in an infinite execution, e.g., by no process ever entering
the critical section.
In shared memory computing, we are concerned with deriving concurrent implementations with strong
safety and liveness properties, thus emphasizing the need for efficient synchronization among processes.
1.1.2 Synchronization using locks
A lock is a concurrency abstraction that implements mutual exclusion and is the traditional solution
for achieving synchronization among processes. Processes acquire a lock prior to executing code inside
the critical section and release the lock afterwards, thereby allowing other processes to modify the data
accessed by the code within the critical section. In essence, after acquiring the lock, the code within
the critical section can be executed atomically. However, lock-based implementations suffer from some
fundamental drawbacks.
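As a minimal sketch of the acquire/release discipline described above, the following Python fragment protects a read-then-write sequence on a shared counter with a lock from the standard threading module. The names LockedCounter and run are illustrative assumptions.

```python
import threading


class LockedCounter:
    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def add(self, amount):
        # Acquire the lock before entering the critical section; the `with`
        # statement releases it on exit, even if an exception is raised.
        with self._lock:
            current = self.value           # read shared data
            self.value = current + amount  # write shared data


def run(workers, rounds):
    """Start `workers` threads that each call add(1) `rounds` times."""
    counter = LockedCounter()

    def work():
        for _ in range(rounds):
            counter.add(1)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Because at most one thread is inside the critical section at a time, no update is lost and run(workers, rounds) returns workers * rounds.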
Ease of designing lock-based programs. Ideally, to reduce the programmer’s burden, we would
like to take any sequential implementation and transform it to an efficient concurrent one with minimal
effort. Consider a simple locking protocol that works for most applications: coarse-grained locking which
typically serializes access to a large amount of data. Although trivial for the programmer to implement,
it does not exploit hardware concurrency. In contrast, fine-grained locking may exploit concurrency
better, but requires the programmer to have a good understanding of the data-flow relationships in the
application and precisely specify which locks provide exclusive access to which data.
For example, consider the problem of implementing a concurrent list-based set. The sequential implementation
of the list-based set uses a sorted linked list data structure in which each data item (except
the tail of the list) maintains a next field to provide a pointer to the successor. Every operation (insert,
remove and contains) invoked with a parameter v ∈ Z traverses the list starting from the head up to the
data item storing value v′ ≥ v. If v′ = v, then contains(v) returns true, remove(v) unlinks the corresponding
element and returns true and insert(v) returns false. Otherwise, contains(v) and remove(v) return false,
while insert(v) adds a new data item with value v to the list and returns true.
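The sequential implementation just described can be sketched as follows. The sentinel head and tail items holding −∞ and +∞ are an assumption made here so that every traversal stops at an item with value v′ ≥ v; the class and method names are likewise illustrative.

```python
import math


class Node:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


class SequentialListSet:
    def __init__(self):
        # Assumption: sentinel items bound every traversal.
        self.tail = Node(math.inf)
        self.head = Node(-math.inf, self.tail)

    def _locate(self, v):
        """Traverse from the head up to the first item with value >= v."""
        prev, curr = self.head, self.head.next
        while curr.val < v:
            prev, curr = curr, curr.next
        return prev, curr

    def contains(self, v):
        _, curr = self._locate(v)
        return curr.val == v

    def insert(self, v):
        prev, curr = self._locate(v)
        if curr.val == v:
            return False              # v already present
        prev.next = Node(v, curr)     # link the new item between prev and curr
        return True

    def remove(self, v):
        prev, curr = self._locate(v)
        if curr.val != v:
            return False              # v absent
        prev.next = curr.next         # unlink curr
        return True
```

This code is correct only when operations run one at a time; the rest of the section discusses what it takes to make such an implementation safe under concurrency.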
Given such a sequential implementation, we may derive a coarse-grained implementation of the list-based
set by having processes acquire a lock on the head of the list, thus forcing one operation to complete
before the next starts. Alternatively, a fine-grained protocol may involve acquiring locks hand-over-hand
[29]: a process holds the lock on at most two adjacent data items of the list. Yet, while such a
protocol produces a correct set implementation [26], it is not a universal strategy that applies to other
popular data abstractions like queues and stacks.
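A hand-over-hand protocol for the list-based set might look like the sketch below. It is an illustration under assumptions, not the algorithm of [29]: each data item carries its own lock, sentinel items hold −∞ and +∞, and all names are hypothetical. During traversal a thread releases the trailing item and then locks the successor while still holding the current item, so it never holds more than two locks.

```python
import math
import threading


class Node:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next
        self.lock = threading.Lock()  # one lock per data item


class HandOverHandSet:
    def __init__(self):
        # Assumption: sentinel head (-inf) and tail (+inf) items.
        self.head = Node(-math.inf, Node(math.inf))

    def _locate(self, v):
        """Return (prev, curr), both locked, with prev.val < v <= curr.val."""
        prev = self.head
        prev.lock.acquire()
        curr = prev.next
        curr.lock.acquire()
        while curr.val < v:
            prev.lock.release()           # drop the trailing item, keep curr locked
            prev, curr = curr, curr.next
            curr.lock.acquire()           # lock the successor while holding its predecessor
        return prev, curr

    def _unlock(self, prev, curr):
        curr.lock.release()
        prev.lock.release()

    def contains(self, v):
        prev, curr = self._locate(v)
        try:
            return curr.val == v
        finally:
            self._unlock(prev, curr)

    def insert(self, v):
        prev, curr = self._locate(v)
        try:
            if curr.val == v:
                return False
            prev.next = Node(v, curr)
            return True
        finally:
            self._unlock(prev, curr)

    def remove(self, v):
        prev, curr = self._locate(v)
        try:
            if curr.val != v:
                return False
            prev.next = curr.next         # unlink while both items are locked
            return True
        finally:
            self._unlock(prev, curr)
```

Operations working on disjoint parts of the list can proceed in parallel, but every traversal still pays for one lock acquisition per item visited, and the technique relies on the list's linear structure.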
Composing lock-based programs. It is hard to compose smaller atomic operations based on locks to
produce a larger atomic operation without affecting safety [81, 117]. Consider the FIFO queue abstraction
supporting the enqueue(v); v ∈ Z and dequeue operations. Suppose that we wish to solve the problem of
atomically dequeuing from a queue Q1 and enqueuing the item returned, say v, to a queue Q2. While
the individual actions of dequeuing from Q1 and enqueuing v to Q2 may be atomic, we wish to ensure
that the combined action is atomic: no process may observe v as absent from both queues or as present in
both Q1 and Q2. A possible solution to this specific problem is to force a process attempting atomic
modification of Q1 and Q2 to acquire a lock. Firstly, this requires prior knowledge of the identities of
the two queue instances. Secondly, this solution does not exploit hardware concurrency since the lock
itself becomes a concurrency bottleneck. Moreover, imagine that processes p1 and p2 need to acquire two
locks L1 and L2 in order to atomically modify a set of queue instances. Without imposing a pre-agreed
order on such lock acquisitions, there is the possibility of introducing deadlocks where processes
wait forever without completing their operations. For example, imagine the following concurrency
scenario: process p1 (resp., p2) holds the lock L1 (resp., L2) and attempts to acquire the lock L2
(resp., L1). Thus, process p1 (resp., p2) waits forever for p2 (resp., p1) to complete
its operation.
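One way to rule out the deadlock scenario above is to acquire the per-queue locks in a pre-agreed global order. The sketch below assumes each queue is given a unique rank at creation and that the two queues passed to transfer are distinct; LockedQueue, rank and transfer are hypothetical names. Any observer that must see the combined action as atomic would have to take the same locks.

```python
import threading
from collections import deque


class LockedQueue:
    _next_rank = 0
    _rank_guard = threading.Lock()

    def __init__(self, items=()):
        self.items = deque(items)
        self.lock = threading.Lock()
        with LockedQueue._rank_guard:     # unique rank defines the global lock order
            self.rank = LockedQueue._next_rank
            LockedQueue._next_rank += 1


def transfer(src, dst):
    """Atomically dequeue from src and enqueue the item to dst.
    Returns the moved item, or None if src is empty."""
    if src is dst:
        raise ValueError("src and dst must be distinct queues")
    # Always lock the lower-ranked queue first, whatever the transfer direction.
    first, second = sorted((src, dst), key=lambda q: q.rank)
    with first.lock, second.lock:
        if not src.items:
            return None
        v = src.items.popleft()
        dst.items.append(v)
        return v
```

With the ordering in place, two threads moving items in opposite directions between the same pair of queues cannot each hold one lock while waiting for the other. The cost is the one noted in the text: the identities of both queues must be known before the operation starts.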
1.1.3 Non-blocking synchronization
It is impossible to derive lock-based implementations that provide non-blocking liveness guarantees, i.e.,
some process completes its operations irrespective of the behaviour of other processes: a process that
stalls while holding a lock can prevent every other process from completing. In fact, lock-based
implementations cannot satisfy even the weak non-blocking liveness property of obstruction-freedom [16],
which requires that a process complete its operation if it eventually runs solo without interleaving
events of other processes.
Concurrent implementations providing non-blocking liveness properties are appealing in practice since
they overcome problems like deadlocks and priority inversions [30] inherent to lock-based implemen-
tations. Thus, non-blocking (without using locks) solutions using conditional rmw instructions like
compare-and-swap have been proposed as an alternative to lock-based implementations. However, as
with fine-grained locking, implementing correct non-blocking algorithms can be hard and requires hand-
crafted problem-specific strategies. For example, the state-of-the-art list-based set implementation by
Harris-Michael [70, 81, 105] is non-blocking: the insert and remove operations, as they traverse the list,
help concurrent operations to physically remove data items (using compare-and-swap) that are logically
deleted, i.e., “marked for deletion”. But one cannot employ an identical algorithmic technique for im-
plementing a non-blocking queue [106], whose semantics is orthogonal to that of the set abstraction.
Moreover, addressing the compositionality issue, as with lock-based solutions, requires ad-hoc strategies
that are not easy to realize [81].
Chapter 1 Introduction
Algorithm 1.1 Sequential implementation of remove operation of list-based set
remove(v):
    prev ← head
    curr ← read(prev.next)
    while (tval ← read(curr.val)) < v do
        prev ← curr
        curr ← read(curr.next)
    end while
    if tval = v then
        tnext ← read(curr.next)
        write(prev.next, tnext)
    end if
    Return (tval = v)

Algorithm 1.2 Using TM to implement remove operation of list-based set
remove(v):
    atomic {
        prev ← head
        curr ← tx-read(prev.next)
        while (tval ← tx-read(curr.val)) < v do
            prev ← curr
            curr ← tx-read(curr.next)
        end while
        if tval = v then
            tnext ← tx-read(curr.next)
            tx-write(prev.next, tnext)
        end if
        tryCommit()
        Return (tval = v)
    } catch {AbortException a} {
        Return ⊥
    }
Figure 1.1: Transforming a sequential implementation of the list-based set to a TM-based concurrent one
1.2 Transactional Memory (TM)
Transactional Memory (TM) [80, 117] addresses the challenge of resolving conflicts (concurrent reading
and writing to the same data) in an efficient and safe manner by offering a simple interface in which
sequences of shared memory operations on data items can be declared as optimistic transactions. The
underlying idea of TM, inspired by databases [58], is to treat each transaction as atomic: a transaction
may either commit, in which case it appears as executed sequentially, or abort, in which case none of its
update operations take effect. Thus, it enables the programmer to design software applications having
only sequential semantics in mind and let TM take care of dynamically handling the conflicts resulting
from concurrent executions at run-time.
A TM implementation provides processes with algorithms for implementing transactional operations such
as read, write, tryCommit and tryAbort on data items using base objects. TM implementations typically
ensure that all committed transactions appear to execute sequentially in some total order respecting
the timing of non-overlapping transactions. Moreover, unlike database transactions, intermediate states
witnessed by the read operations of an incomplete transaction may affect the user application. Thus,
to ensure that the TM implementation does not export any pathological executions, it is additionally
expected that every transaction (including aborted and incomplete ones) returns responses that are
consistent with some sequential execution of the TM implementation.
In general, given a sequential implementation of a data abstraction, a corresponding TM-based concur-
rent one encapsulates the sequential (high-level) operations within a transaction. Then, the TM-based
concurrent implementation of the data abstraction replaces each read and write of a data item with the
transactional read and write implementations, respectively. If the transaction commits, then the result
of the high-level operation is returned to the application. Otherwise, one of the transactional operations
may be aborted, in which case, the write operations performed by the transaction do not take effect and
the high-level operation is typically re-started with a new transaction.
To illustrate this, we refer to the sequential implementation of the remove operation of the list-based set
depicted in Algorithm 1.1 of Figure 1.1. In a TM-based concurrent implementation of the list-based set
(Algorithm 1.2), each read (and resp. write) operation performed by remove(v) on a data item X of the
list is replaced with tx-read(X) (and resp., tx-write(X, arg)). tx-read(X) returns the value of the data item
X or aborts the transaction while tx-write(X, arg) writes the value arg to X or aborts the transaction.
Finally, the process attempts to commit the transaction by invoking the tryCommit operation. If the
tryCommit is successful, the response of remove(v) is returned; otherwise a failed response (denoted ⊥)
is returned, in which case, the write operations performed by the transaction are “rolled back”.
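To make the transformation concrete, the following Python sketch runs the remove operation on top of a toy TM. It is not one of the TM implementations studied in the thesis: the names TObject, Transaction, Node, make_list and to_values are hypothetical, writes are buffered until commit, and a single global lock is assumed for validating reads and committing. The body of remove mirrors Algorithm 1.2.

```python
import math
import threading

class AbortException(Exception):
    """Raised when the toy TM aborts the current transaction."""

class TObject:
    """A data item with a version counter used for validation."""
    def __init__(self, value):
        self.value = value
        self.version = 0

class Transaction:
    _commit_lock = threading.Lock()  # assumption: one global lock for validation/commit

    def __init__(self):
        self.reads = {}    # TObject -> version observed by this transaction
        self.writes = {}   # TObject -> buffered new value (applied at commit)

    def _validate(self):
        # Abort if an item read earlier was overwritten by a committed transaction.
        if any(obj.version != seen for obj, seen in self.reads.items()):
            raise AbortException()

    def tx_read(self, obj):
        if obj in self.writes:
            return self.writes[obj]
        with Transaction._commit_lock:
            self._validate()
            self.reads[obj] = obj.version
            return obj.value

    def tx_write(self, obj, value):
        self.writes[obj] = value

    def try_commit(self):
        with Transaction._commit_lock:
            self._validate()
            for obj, value in self.writes.items():
                obj.value = value
                obj.version += 1

class Node:
    def __init__(self, val, nxt=None):
        self.val = TObject(val)
        self.next = TObject(nxt)

def make_list(values):
    """Build a sorted list with -inf head and +inf tail sentinels."""
    node = Node(math.inf)
    for v in sorted(values, reverse=True):
        node = Node(v, node)
    return Node(-math.inf, node)

def remove(head, v):
    tx = Transaction()
    try:
        prev = head
        curr = tx.tx_read(prev.next)
        while (tval := tx.tx_read(curr.val)) < v:
            prev = curr
            curr = tx.tx_read(curr.next)
        if tval == v:
            tnext = tx.tx_read(curr.next)
            tx.tx_write(prev.next, tnext)
        tx.try_commit()
        return tval == v
    except AbortException:
        return None  # the failed response, denoted ⊥ in the text

def to_values(head):
    """Non-transactional traversal, for inspection only."""
    out, node = [], head.next.value
    while node.val.value != math.inf:
        out.append(node.val.value)
        node = node.next.value
    return out
```

A caller that receives None would typically restart the operation with a new transaction.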
Intuitively, it is easy to understand how TM simplifies concurrent programming. Deriving a TM-based
concurrent implementation of the list-based set simply requires encapsulating the operations to be exe-
cuted atomically within a transaction using an atomic delimiter 1. The underlying TM implementation
endeavours to dynamically execute the transactions by resolving the conflicts that might arise from pro-
cesses reading and writing to the same data item at run-time. Intuitively, since the TM implementation
enforces a strong safety property, the resulting list-based implementation is also safe: the responses of
its operations are consistent with some sequential execution of the list-based set.
One may view TM as a universal construction [13, 28, 45, 49, 75] that accepts as input the operations
of a sequential implementation and strives to execute them concurrently. Specifically, TM is designed to
work in a dynamic environment where neither the sequence of operations nor the data items accessed by
a transaction are known a priori. Thus, the response of a read operation performed by a transaction is
returned immediately to the application, and the application determines the next data item that must
be accessed by the transaction.
TM-based implementations overcome the drawbacks of traditional synchronization techniques based on
locks and compare-and-swap. Firstly, the TM interface places minimal overhead on the programmer:
using TM only requires encapsulating the sequential operations within transactions and handling an
exception should the transaction be aborted. Secondly, the ability to execute multiple operations atom-
ically allows TM-based implementations to seamlessly compose smaller atomic operations to produce
larger ones. For example, suppose that we wish to atomically dequeue from a queue Q1 and enqueue(v)
in queue Q2, where v is the value returned by Q1.dequeue. Solving this problem using TM simply re-
quires encapsulating the sequential implementation of Q1.dequeue, followed by Q2.enqueue(v) within a
transaction.
Note that a TM implementation may internally employ locks or conditional rmw instructions like
compare-and-swap. However, TM raises the level of abstraction by exposing an easy-to-use compositional
transactional interface for the user application that is oblivious to the specifics of the implementation
and the semantics of the user application.
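The composition argument can be illustrated with a deliberately degenerate TM. The sketch below assumes a hypothetical atomic() delimiter backed by one global re-entrant lock, so transactions never run speculatively and never abort; it only shows that nested atomic blocks compose without the caller knowing how each queue operation synchronizes. The names atomic, Queue and move are introduced here for illustration.

```python
import threading
from collections import deque
from contextlib import contextmanager

_global_lock = threading.RLock()  # assumption: a single re-entrant lock behind the TM interface

@contextmanager
def atomic():
    """Hypothetical atomic delimiter; nested uses by the same thread are allowed."""
    with _global_lock:
        yield

class Queue:
    """Sequential FIFO queue whose operations are wrapped in transactions."""
    def __init__(self, items=()):
        self._items = deque(items)

    def enqueue(self, v):
        with atomic():
            self._items.append(v)

    def dequeue(self):
        with atomic():
            return self._items.popleft() if self._items else None

    def snapshot(self):
        with atomic():
            return list(self._items)

def move(q1, q2):
    """Atomically dequeue from q1 and enqueue the returned value to q2."""
    with atomic():
        v = q1.dequeue()
        if v is not None:
            q2.enqueue(v)
        return v
```

For example, after q1 = Queue([1, 2]) and q2 = Queue(), a call to move(q1, q2) returns 1, and no other transaction can observe a state in which 1 is in neither queue or in both.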
1.3 Summary of the thesis
TM allows the programmer to speculatively execute sequences of shared-memory operations as atomic
transactions: if the transaction commits, the operations appear as executed sequentially, or if the trans-
action aborts, the update operations do not take effect. The combination of speculation and the simple
programming interface provided by TM seemingly overcomes the problems associated with traditional
synchronization techniques based on locks and compare-and-swap. But are there some fundamental
drawbacks associated with the TM abstraction? Does providing high degrees of concurrency in TMs
come with inherent costs? This is the central question of the thesis. In the rest of this introductory
chapter, we provide a summary of the results in the thesis that give some answers to this question.
1.3.1 Safety for transactional memory
We first need to define the consistency criteria that must be satisfied by a TM implementation. We
formalize the semantics of a safe TM: every transaction, including aborted and incomplete ones, must
observe a view that is consistent with some sequential execution. This is important, since if the inter-
mediate view is not consistent with any sequential execution, the application may experience a fatal
irrevocable error or enter an infinite loop. Additionally, the response of a transaction’s read should not
depend on an ongoing transaction that has not started committing yet. This restriction, referred to as
deferred-update semantics, appears desirable, since the ongoing transaction may still abort, thus rendering
the read inconsistent. We define the notion of deferred-update semantics formally and apply it to
several TM consistency criteria proposed in literature. We then verify if the resulting TM consistency
criterion is a safety property [11, 100, 108] in the formal sense, i.e., the set of histories (interleavings of
invocations and responses of transactional operations) is prefix-closed and limit-closed.
1 Different compilers may use different names for the delimiter; in GCC, it is __transaction_atomic [2].
We first consider the popular consistency criterion of opacity [64]. Opacity requires the states observed
by all transactions, including uncommitted ones, to be consistent with a global serialization, i.e., a serial
execution constituted by committed transactions. Moreover, the serialization should respect the real-
time order : a transaction that completed before (in real time) another transaction started should appear
first in the serialization.
By definition, opacity reduces correctness of a TM history to correctness of all its prefixes, and thus
is prefix-closed and limit-closed. Thus, to verify that a history is opaque, one needs to verify that
each of its prefixes is consistent with some global serialization. To simplify verification and explicitly
introduce deferred-update semantics into a TM correctness criterion, we specify a general criterion of
du-opacity [18], which requires the global serial execution to respect the deferred-update property. In-
formally, a du-opaque history must be indistinguishable from a totally-ordered history, with respect to
which no transaction reads from a transaction that has not started committing. Assuming that in an
infinite history, every transaction completes each of the operations it invoked, we prove that du-opacity
is a safety property.
One may notice that the intended safety semantics does not require, as opacity does, that all transactions
observe the same serial execution. As long as committed transactions constitute a serial execution and
every transaction witnesses a consistent state, the execution can be considered “safe”: no run-time error
that cannot occur in a serial execution can happen. Two definitions in literature have adopted this
approach [43, 85]. We introduce “deferred-update” versions of these properties and discuss how the
resulting properties relate to du-opacity.
1.3.2 Complexity of transactional memory
One may observe that a TM implementation that aborts every transaction, or never commits any, is trivially
safe, but not very useful. Thus, the TM implementation must satisfy some nontrivial liveness property
specifying the conditions under which the transactional operations must return some response and a
progress property specifying the conditions under which the transaction is allowed to abort.
Two properties considered important for TM performance are read invisibility [23] and disjoint-access
parallelism [86]. Read invisibility may boost the concurrency of a TM implementation by ensuring that
no reading transaction can cause any other transaction to abort. The idea of disjoint-access parallelism
is to allow transactions that do not access the same data item to proceed independently of each other
without memory contention.
We investigate the inherent complexities in terms of time and memory resources associated with imple-
menting safe TMs that provide strong liveness and progress properties, possibly combined with attractive
requirements like read invisibility and disjoint-access parallelism. Which classes of TM implementations
are (im)possible to implement?
Blocking TMs. We begin by studying TM implementations that are blocking, in the sense that a
transaction may be delayed or aborted due to concurrent transactions.
• We prove that even inherently sequential TMs, which allow a transaction to be aborted due to
a concurrent transaction, incur significant complexity costs when combined with read invisibility
and disjoint-access parallelism.
• We prove that progressive TMs, which allow a transaction to be aborted only if it encounters a
read-write or write-write conflict with a concurrent transaction [62], may need to exclusively control a
linear number of data items at some point in the execution.
• We then turn our focus to strongly progressive TMs [64] that, in addition to progressiveness,
ensure that not all concurrent transactions conflicting over a single data item abort. We prove
that in any strongly progressive TM implementation that accesses the shared memory with read,
write and conditional primitives, such as compare-and-swap, the total number of remote memory
references [14, 22] (RMRs) that take place in an execution in which n concurrent processes perform
transactions on a single data item might reach Ω(n log n) in the worst-case.
• We show that, with respect to the amount of expensive synchronization patterns like compare-and-
swap instructions and memory barriers [17, 103], progressive implementations are asymptotically
optimal. We use this result to establish a linear (in the transaction’s data set size) separation
between the worst-case transaction expensive synchronization complexity of progressive TMs and
permissive TMs that allow a transaction to abort only if committing it would violate opacity.
Non-blocking TMs. Next, we focus on TMs that avoid using locks and rely on non-blocking synchronization:
a prematurely halted transaction cannot prevent other transactions from committing.
Possibly the weakest non-blocking progress condition is obstruction-freedom [78, 82] stipulating that
every transaction running in the absence of step contention, i.e., not encountering steps of concurrent
transactions, must commit. In fact, several early TM implementations [52, 79, 101, 117, 120] satisfied
obstruction-freedom. However, circa 2005, several papers presented the case for a shift from TMs
that provide obstruction-free TM-progress to lock-based progressive TMs [39, 40, 48]. They argued that
lock-based TMs tend to outperform obstruction-free ones by allowing for simpler algorithms with lower
complexity overheads. We prove the following lower bounds for obstruction-free TMs.
• Combining invisible reads with even weak forms of disjoint-access parallelism [24] in obstruction-free
TMs is impossible.
• A read operation in an n-process obstruction-free TM implementation incurs Ω(n) memory stalls [16,
46].
• A read-only transaction may need to perform a linear (in n) number of expensive synchronization
patterns.
We then present a progressive TM implementation that beats all of these lower bounds, thus suggesting
that the course correction from non-blocking (obstruction-free) TMs to blocking (progressive) TMs was
indeed justified.
Partially non-blocking TMs. Lastly, we explore the costs of providing non-blocking progress to only
a subset of transactions. Specifically, we require read-only transactions to commit wait-free, i.e., every
transaction commits within a finite number of its steps, but updating transactions are guaranteed to
commit only if they run in the absence of concurrency. We show that combining this kind of partial
wait-freedom with read invisibility or disjoint-access parallelism comes with inherent costs. Specifically,
we establish the following lower bounds for TMs that provide this kind of partial wait-freedom.
• This kind of partial wait-freedom equipped with invisible reads results in maintaining unbounded
sets of versions for every data item.
• It is impossible to implement a strict form of disjoint-access parallelism [60].
• Combining with the weak form of disjoint-access parallelism means that a read-only transaction
(with an arbitrarily large read set) must sometimes perform at least one expensive synchronization
pattern per read operation in some executions.
1.3.3 Hybrid transactional memory
We now turn our focus to hybrid transactional memory (HyTM) [35, 37, 88, 99]. The TM abstraction, in
its original manifestation, augmented the processor’s cache-coherence protocol and extended the CPU’s
instruction set with instructions to indicate which memory accesses must be transactional [80]. Most
popular TM designs, subsequent to the original proposal in [80] have implemented all the functionality in
software [36, 52, 79, 101, 117]. More recently, CPUs have included hardware extensions to support small
transactions [1, 107, 111]. Hardware transactions may be spuriously aborted due to several reasons: cache
capacity overflow, interrupts etc. This has led to proposals for best-effort HyTMs in which the fast, but
potentially unreliable hardware transactions are complemented with slower, but more reliable software
transactions. However, the fundamental limitations of building a HyTM with nontrivial concurrency
between hardware and software transactions are not well understood. Typically, hardware transactions
employ code instrumentation techniques to detect concurrency scenarios and abort in the case
of contention. But are there inherent instrumentation costs of implementing a HyTM, and what are
the trade-offs between these costs and the provided concurrency, i.e., the ability of the HyTM to execute
hardware and software transactions in parallel?
The thesis makes the following contributions which help determine the cost of concurrency in HyTMs.
• We propose a general model for HyTM implementations, which captures the notion of cached
accesses as performed by hardware transactions, and precisely defines instrumentation costs in a
quantifiable way.
• We derive lower and upper bounds in this model, which capture for the first time, an inherent
trade-off on the degree of concurrency allowed between hardware and software transactions and
the instrumentation overhead introduced on the hardware.
1.3.4 Optimism for boosting concurrency
Lock-based implementations are conventionally pessimistic in nature: the operations invoked by processes
are not “abortable” and return only after they are successfully completed. The TM abstraction is a
realization of optimistic concurrency control: speculatively execute transactions, abort and roll back on
dynamically detected conflicts. But are optimistic implementations fundamentally better equipped to
exploit concurrency than pessimistic ones?
We compare the amount of concurrency one can obtain by converting a sequential implementation of
a data abstraction into a concurrent one using optimistic or pessimistic synchronization techniques. To
establish fair comparison of such implementations, we introduce a new correctness criterion for concurrent
implementations, called locally serializable linearizability, defined independently of the synchronization
techniques they use.
We treat an implementation’s concurrency as its ability to accept schedules of sequential operations
from different processes. More specifically, we assume an external scheduler that defines which processes
execute which steps of the corresponding sequential implementation in a dynamic and unpredictable
fashion. This allows us to define concurrency provided by an implementation as the set of interleavings
of steps of sequential operations (or schedules) it accepts, i.e., is able to effectively process. Then, the
more schedules the implementation would accept without hampering correctness, the more concurrent it
would be.
The thesis makes the following contributions.
• We provide a framework to analytically capture the inherent concurrency provided by two broad
classes of synchronization techniques: pessimistic implementations that implement some form of
mutual exclusion and optimistic implementations based on speculative executions.
• We show that implementations based on pessimistic synchronization and “semantics-oblivious”
TMs are suboptimal, in the sense that there exist simple schedules of the list-based set
which cannot be accepted by any pessimistic or TM-based implementation. Specifically, we prove
that TM-based implementations accept schedules of the list-based set that cannot be accepted
by any pessimistic implementation. However, we also show pessimistic implementations of the
list-based set which accept schedules that cannot be accepted by any TM-based implementation.
• We show that there exists an optimistic implementation of the list-based set that is concurrency
optimal, i.e., it accepts all correct schedules.
Our results suggest that “semantics-aware” optimistic implementations may be better suited to exploiting
concurrency than their pessimistic counterparts.
1.4 Roadmap of the thesis
In Chapter 2, we define the TM model, the TM properties proposed in literature and the complexity metrics
considered in the thesis. Chapter 3 is on safety for TMs. Chapter 4 is on the complexity of blocking
TMs, Chapter 5 covers non-blocking TMs that satisfy obstruction-freedom, and Chapter 6 presents lower
bounds for partially non-blocking TMs. Chapter 7 is devoted to the study of hybrid TMs.
In Chapter 8, we compare the relative abilities of optimistic and pessimistic synchronization techniques in
exploiting concurrency. Chapter 9 presents closing comments and future directions. Viewed collectively,
the results hopefully shine light on the foundations of the TM abstraction that is widely expected to be
the Zeitgeist of the concurrent computational model.

2 Transactional Memory model
All models are wrong, but some
models are useful.
George Edward Pelham Box
In this chapter, we formalize the TM model and discuss some important TM properties proposed in
literature. In Section 2.1, we formalize the specification of TMs. In Section 2.2, we introduce the basic
TM-correctness property of strict serializability that we consider in the thesis. Sections 2.3 and 2.4
give an overview of progress and liveness properties for TMs, respectively, and identify the relations between them.
Section 2.5 defines the notion of invisible reads while Section 2.6 is on disjoint-access parallelism. Finally,
in Section 2.7, we introduce some of the complexity metrics considered in the thesis.
2.1 TM interface and TM implementations
In this section, we first describe the shared memory model of computation and then introduce the TM
abstraction.
The shared memory model. The thesis considers the standard asynchronous shared memory model
of computation in which a set of n ∈ N processes (that may fail by crashing), communicate by applying
operations on shared objects [16]. An object is an instance of an abstract data type. An abstract data
type τ is a Mealy machine that is specified as a tuple (Φ, Γ, Q, q0, δ) where Φ is a set of operations,
Γ is a set of responses, Q is a set of states, q0 ∈ Q is an initial state and δ ⊆ Q × Φ × Q × Γ is a
transition relation that determines, for each state and each operation, the set of possible resulting states
and produced responses [6]. Here, (q, π, q′, r) ∈ δ implies that when an operation π ∈ Φ is applied on an
object of type τ in state q, the object moves to state q′ and returns response r.
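As an illustration of this definition, the sketch below encodes the tuple (Φ, Γ, Q, q0, δ) directly and instantiates it for a one-bit read/write register. The class AbstractDataType, its field names and the function bit_register are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AbstractDataType:
    """The tuple (Phi, Gamma, Q, q0, delta) of an abstract data type."""
    operations: frozenset   # Phi
    responses: frozenset    # Gamma
    states: frozenset       # Q
    initial: object         # q0
    delta: frozenset        # subset of Q x Phi x Q x Gamma

    def apply(self, q, op):
        """Return the set of (next state, response) pairs allowed from state q under op."""
        return {(q2, r) for (q1, o, q2, r) in self.delta if q1 == q and o == op}

def bit_register():
    """A one-bit register: read returns the state, write(v) moves to state v and returns ok."""
    states = frozenset({0, 1})
    operations = frozenset({("read",), ("write", 0), ("write", 1)})
    responses = frozenset({0, 1, "ok"})
    delta = set()
    for q in states:
        delta.add((q, ("read",), q, q))
        for v in states:
            delta.add((q, ("write", v), v, "ok"))
    return AbstractDataType(operations, responses, states, 0, frozenset(delta))
```

Because δ is a relation rather than a function, apply returns a set; for this deterministic register type the set always contains exactly one pair.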
An implementation of an object type τ provides a specific data-representation of τ that is realized by
processes applying primitives on shared base objects, each of which is assigned an initial value. In order
to implement an object, processes are provided with an algorithm, which is a set of deterministic state-
machines, one for each process. In the thesis, we use the term primitive to refer to operations on base
objects and reserve the term operation for the object that is implemented from the base objects.
A primitive is a generic atomic read-modify-write (rmw) procedure applied to a base object [46, 75]. It
is characterized by a pair of functions 〈g, h〉: given the current state of the base object, g is an update
function that computes its state after the primitive is applied, while h is a response function that specifies
the outcome of the primitive returned to the process. An rmw primitive is trivial if it never changes the
value of the base object to which it is applied. Otherwise, it is nontrivial. An rmw primitive 〈g, h〉 is
conditional if there exist v, w such that g(v, w) = v and there exist v, w such that g(v, w) ≠ v [51].
Read is an example of a trivial rmw primitive that takes no input arguments: when applied to a base
object with value v, the update function leaves the state of the base object unchanged and the response
function returns the value v. Write is an example of a nontrivial rmw primitive that takes an input
argument v′: when applied to a base object with value v, its update function changes the value of the
base object to v′ and its response function returns ok. Compare-and-swap is an example of a nontrivial
conditional rmw primitive: its update function receives an input argument 〈old ,new〉 and changes the
value v of the base object to which it is applied iff v = old . Load-linked/store-conditional is another
example of a nontrivial conditional rmw primitive: the load-linked primitive executed by some process
pi returns the value of the base object to which it is applied and the store-conditional primitive’s update
function receives an input new and atomically changes the value of the base object to new iff the base
object has not been updated by any other process since the load-linked event by pi. Fetch-and-add is an
example of a nontrivial rmw primitive that is not conditional: its update function applied to base object
with an integer value v takes an integer w as input and changes the value of the base object to v+w.
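The primitives above can be written down as pairs 〈g, h〉 in a few lines. The following Python sketch is only an illustration: the names Primitive, BaseObject and is_trivial_on are hypothetical, atomicity is simulated with a per-object lock, and the response functions chosen for compare-and-swap (a success flag) and fetch-and-add (the previous value) are assumptions, since the text specifies only their update functions.

```python
import threading
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Primitive:
    g: Callable[[Any, Any], Any]  # update function: (current value, input) -> new value
    h: Callable[[Any, Any], Any]  # response function: (current value, input) -> response

class BaseObject:
    """A base object; the lock simulates the atomicity of each rmw primitive."""
    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()

    def apply(self, prim, arg=None):
        with self._lock:
            old = self.value
            self.value = prim.g(old, arg)
            return prim.h(old, arg)

READ = Primitive(g=lambda v, w: v, h=lambda v, w: v)
WRITE = Primitive(g=lambda v, w: w, h=lambda v, w: "ok")
# Input w is the pair (old, new); the boolean response is an assumption.
COMPARE_AND_SWAP = Primitive(g=lambda v, w: w[1] if v == w[0] else v,
                             h=lambda v, w: v == w[0])
# Returning the previous value is an assumption.
FETCH_AND_ADD = Primitive(g=lambda v, w: v + w, h=lambda v, w: v)

def is_trivial_on(prim, values, inputs):
    """Check, over a finite sample, that the primitive never changes the value."""
    return all(prim.g(v, w) == v for v in values for w in inputs)
```

Checking triviality over a finite sample is of course weaker than the definition, which quantifies over all values and inputs.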
Transactional memory (TM). Transactional memory allows a set of data items (called t-objects) to
be accessed via transactions. Every transaction Tk has a unique identifier k. We make no assumptions
on the size of a t-object, i.e., the cardinality of the set V of possible values a t-object can store. A
transaction Tk may contain the following t-operations, each being a matching pair of an invocation and
a response: readk(X) returns a value in V , denoted readk(X) → v, or a special value Ak ∉ V (abort);
writek(X, v), for a value v ∈ V , returns ok or Ak; tryCk returns Ck ∉ V (commit) or Ak. As we show in
Section 2.2, we can specify TM as an abstract data type.
Note that a TM interface may additionally provide a startk t-operation that returns ok or Ak, which is
the first t-operation transaction Tk must invoke, or a tryAk t-operation that returns Ak. However, the
actions performed inside the startk may be performed as part of the first t-operation performed by the
transaction. The tryAk t-operation allows the user application to explicitly abort a transaction and can
be useful, but since each individual t-read or t-write is allowed to abort, the tryAk t-operation
provides no additional expressive power to the TM interface. Thus, for simplicity, we do not incorporate
these t-operations in our TM specification.
TM implementations. A TM implementation provides processes with algorithms for implementing
readk, writek and tryCk of a transaction Tk by applying primitives to a set of shared base objects,
each of which is assigned an initial value. We assume that a process starts a new transaction only after
its previous transaction has committed or aborted.
In the rest of this section, we define the terms specifically in the context of TM implementations, but they
may be used analogously in the context of any concurrent implementation of an abstract data type.
Executions and configurations. An event of a process pi (sometimes we say step of pi) is an invocation
or response of an operation performed by pi or a rmw primitive 〈g, h〉 applied by pi to a base object b
along with its response r (we call it a rmw event and write (b, 〈g, h〉, r, i)).
A configuration (of an implementation) specifies the value of each base object and the state of each
process. The initial configuration is the configuration in which all base objects have their initial values
and all processes are in their initial states.
An execution fragment is a (finite or infinite) sequence of events. An execution of an implementation M
is an execution fragment where, starting from the initial configuration, each event is issued according to
M and each response of a rmw event (b, 〈g, h〉, r, i) matches the state of b resulting from all preceding
events. An execution E ·E′ denotes the concatenation of E and execution fragment E′, and we say that
E′ is an extension of E or E′ extends E.
Let E be an execution fragment. For every transaction (resp., process) identifier k, E|k denotes the
subsequence of E restricted to events of transaction Tk (resp., process pk). If E|k is non-empty, we
say that Tk (resp., pk) participates in E, else we say E is Tk-free (resp., pk-free). Two executions E
and E′ are indistinguishable to a set T of transactions, if for each transaction Tk ∈ T , E|k = E′|k.
A TM history is the subsequence of an execution consisting of the invocation and response events of
t-operations. Two histories H and H ′ are equivalent if txns(H) = txns(H ′) and for every transaction
Tk ∈ txns(H), H|k = H ′|k.
Data sets of transactions. The read set (resp., the write set) of a transaction Tk in an execution E,
denoted RsetE(Tk) (resp., WsetE(Tk)), is the set of t-objects that Tk reads (resp., writes to) in E. More
specifically, if E contains an invocation of readk(X) (resp., writek(X, v)), we say that X ∈ RsetE(Tk)
(resp., WsetE(Tk)) (for brevity, we sometimes omit the subscript E from the notation). The data set of
Tk is Dset(Tk) = Rset(Tk) ∪ Wset(Tk). A transaction is called read-only if Rset(Tk) ≠ ∅ ∧ Wset(Tk) = ∅;
write-only if Wset(Tk) ≠ ∅ ∧ Rset(Tk) = ∅ and updating if Wset(Tk) ≠ ∅. Note that, in our TM model,
the data set of a transaction is not known a priori, i.e., at the start of the transaction, and it is identifiable
only by the set of data items the transaction has invoked a read or write on in the given execution.
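The definitions of read set, write set and data set translate directly into code. In the sketch below, a history is assumed to be a list of tuples (k, kind, op, payload), where kind is "inv" or "res", the payload of a read invocation is the t-object X and the payload of a write invocation is the pair (X, v). This encoding and the function names are hypothetical.

```python
def rset(history, k):
    """Rset(Tk): t-objects on which Tk has invoked a read."""
    return {ev[3] for ev in history if ev[:3] == (k, "inv", "read")}

def wset(history, k):
    """Wset(Tk): t-objects on which Tk has invoked a write."""
    return {ev[3][0] for ev in history if ev[:3] == (k, "inv", "write")}

def dset(history, k):
    """Dset(Tk) = Rset(Tk) ∪ Wset(Tk)."""
    return rset(history, k) | wset(history, k)

def classify(history, k):
    """Return the labels (read-only, write-only, updating) that apply to Tk."""
    r, w = rset(history, k), wset(history, k)
    labels = set()
    if r and not w:
        labels.add("read-only")
    if w and not r:
        labels.add("write-only")
    if w:
        labels.add("updating")
    return labels
```

Note that, following the definitions, a write-only transaction is also an updating one, so classify may return both labels.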
Transaction orders. Let txns(E) denote the set of transactions that participate in E. In an infinite
history H, we assume that for each Tk ∈ txns(H), H|k is finite; i.e., transactions do not issue an infinite
number of t-operations. An execution E is sequential if every invocation of a t-operation is either the
last event in the history H exported by E or is immediately followed by a matching response. We
assume that executions are well-formed, i.e., for all Tk, E|k begins with the invocation of a t-operation,
is sequential and has no events after Ak or Ck. A transaction Tk ∈ txns(E) is complete in E if E|k ends
with a response event. The execution E is complete if all transactions in txns(E) are complete in E. A
transaction Tk ∈ txns(E) is t-complete if E|k ends with Ak or Ck; otherwise, Tk is t-incomplete. Tk is
committed (resp., aborted) in E if the last event of Tk is Ck (resp., Ak). The execution E is t-complete
if all transactions in txns(E) are t-complete.
For transactions Tk, Tm ∈ txns(E), we say that Tk precedes Tm in the real-time order of E, denoted
Tk ≺^RT_E Tm, if Tk is t-complete in E and the last event of Tk precedes the first event of Tm in E. If
neither Tk ≺^RT_E Tm nor Tm ≺^RT_E Tk, then Tk and Tm are concurrent in E. An execution E is t-sequential
if there are no concurrent transactions in E.
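As a minimal sketch of the real-time order, assume an execution is encoded as a list of pairs (k, label), where the labels "C" and "A" stand for the responses Ck and Ak. The encoding and the function names below are assumptions made for illustration only.

```python
def events_of(E, k):
    """Indices in E of the events of transaction k (the projection E|k)."""
    return [i for i, (t, _) in enumerate(E) if t == k]


def is_t_complete(E, k):
    """Tk is t-complete if E|k ends with Ck or Ak."""
    idx = events_of(E, k)
    return bool(idx) and E[idx[-1]][1] in ("C", "A")


def precedes_rt(E, k, m):
    """Tk precedes Tm in the real-time order of E."""
    ik, im = events_of(E, k), events_of(E, m)
    return bool(ik) and bool(im) and is_t_complete(E, k) and ik[-1] < im[0]


def concurrent(E, k, m):
    """Tk and Tm are concurrent if neither precedes the other."""
    return k != m and not precedes_rt(E, k, m) and not precedes_rt(E, m, k)


def is_t_sequential(E):
    """E is t-sequential if it contains no pair of concurrent transactions."""
    txns = sorted({t for t, _ in E})
    return not any(concurrent(E, a, b) for a in txns for b in txns if a < b)
```

Under this encoding, a t-incomplete transaction is concurrent with every transaction that starts after it, since it never precedes anything in the real-time order.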
Latest written value and legality. Let H be a t-sequential history. For every operation readk(X) in
H, we define the latest written value of X as follows: if Tk contains a writek(X, v) preceding readk(X),
then the latest written value of X is the value of the latest such write to X. Otherwise, the latest
written value of X is the value of the argument of the latest writem(X, v) that precedes readk(X) and
belongs to a committed transaction in H. (This write is well-defined since H starts with T0 writing to
all t-objects.)
We say that readk(X) is legal in a t-sequential history H if it returns the latest written value of X, and
H is legal if every readk(X) in H that does not return Ak is legal in H.
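The legality check can be sketched as follows, assuming a t-sequential history is given as a list of transactions in their order of appearance, the first of which plays the role of T0 and writes to all t-objects. Each transaction is a dictionary with a list of operations and a committed flag; this encoding and the name `is_legal` are hypothetical.

```python
def is_legal(history):
    """Check legality of a t-sequential history.

    history: list of {"ops": [...], "committed": bool}; each op is
    ("write", X, v) or ("read", X, returned), where returned == "A"
    models a read that returns Ak. The first transaction is assumed
    to be T0, committed and writing to every t-object.
    """
    committed = {}  # latest value written by a committed transaction
    for txn in history:
        local = {}  # values written so far by the current transaction
        for op, x, v in txn["ops"]:
            if op == "write":
                local[x] = v
            elif v != "A":  # reads that return Ak are exempt
                expected = local[x] if x in local else committed[x]
                if v != expected:
                    return False
        if txn["committed"]:
            committed.update(local)
    return True
```

Writes of aborted or t-incomplete transactions never reach `committed`, so later transactions cannot legally read them.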
We also assume, for simplicity, that the user application invokes a readk(X) at most once within a
transaction Tk. This assumption incurs no loss of generality, since a repeated read can be assigned to
return a previously returned value without affecting the history’s legality.
Contention. We say that a configuration C after an execution E is quiescent (resp., t-quiescent) if
every transaction Tk ∈ txns(E) is complete (resp., t-complete) in C. If a transaction T is incomplete in
an execution E, it has exactly one enabled event, which is the next event the transaction will perform
according to the TM implementation. Events e and e′ of an execution E contend on a base object b if
they are both events on b in E and at least one of them is nontrivial (the event is trivial (resp., nontrivial)
if it is the application of a trivial (resp., nontrivial) primitive).
We say that T is poised to apply an event e after E if e is the next enabled event for T in E. We say
that transactions T and T ′ concurrently contend on b in E if they are poised to apply contending events
on b after E.
Chapter 2 Transactional Memory model
We say that an execution fragment E is step contention-free for t-operation opk if the events of E|opk
are contiguous in E. We say that an execution fragment E is step contention-free for Tk if the events
of E|k are contiguous in E. We say that E is step contention-free if E is step contention-free for all
transactions that participate in E.
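Step contention-freedom is a contiguity condition, which the following sketch makes explicit. It assumes an execution fragment is abstracted to the list of transaction identifiers that own its events, one entry per event; the function names are hypothetical.

```python
def step_contention_free_for(E, k):
    """True if the events of E|k are contiguous in E (vacuously if E is Tk-free)."""
    idx = [i for i, t in enumerate(E) if t == k]
    return not idx or idx[-1] - idx[0] + 1 == len(idx)


def step_contention_free(E):
    """True if E is step contention-free for every participating transaction."""
    return all(step_contention_free_for(E, k) for k in set(E))
```

For instance, in the fragment [1, 1, 2, 2, 1] the events of T2 are contiguous, but those of T1 are interleaved with steps of T2.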
2.2 TM-correctness
Correctness for TMs is specified as a safety property on TM histories [11, 100, 108]. In this section, we
introduce the popular TM-correctness condition strict serializability [109]: all committed transactions
appear to execute sequentially in some total order respecting the real-time transaction orders. We then
explain how strict serializability is related to linearizability [83].
In the thesis, we only consider TM-correctness conditions like strict serializability and its restrictions.
We formally define strict serializability below, but other TM-correctness conditions studied in the thesis
can be found in Chapter 3.
First, we define how to derive a t-complete history from a t-incomplete one.
Definition 2.1 (Completions). Let H be a history. A completion of H, denoted H̄, is a history derived
from H as follows:
– First, for every transaction Tk ∈ txns(H) with an incomplete t-operation opk in H, if opk is a
readk or a writek, insert Ak somewhere after the invocation of opk; otherwise, if opk = tryCk(), insert
Ck or Ak somewhere after the last event of Tk.
– After all transactions are complete, for every transaction Tk that is not t-complete, insert tryCk ·Ak
after the last event of transaction Tk.
Definition 2.2 (Strict serializability). A finite history H is serializable if there is a legal t-complete
t-sequential history S and a completion H̄ of H such that S is equivalent to cseq(H̄),
where cseq(H̄) is the subsequence of H̄ reduced to committed transactions in H̄.
We refer to S as a serialization of H.
We say that H is strictly serializable if there exists a serialization S of H such that for any two
transactions Tk, Tm ∈ txns(H), if Tk ≺^RT_H Tm, then Tk precedes Tm in S.
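For small histories, Definition 2.2 can be checked by brute force. The sketch below is illustrative only and makes several simplifying assumptions: a completion has already been chosen, the input contains only its committed transactions, the real-time order is supplied as a set of pairs, and the initial values written by T0 are passed as a dictionary. All names are hypothetical.

```python
from itertools import permutations


def legal(seq, initial):
    """Legality of a t-sequential history made of committed transactions only."""
    state = dict(initial)
    for ops in seq:
        local = {}
        for op, x, v in ops:
            if op == "write":
                local[x] = v
            else:
                expected = local[x] if x in local else state[x]
                if v != expected:
                    return False
        state.update(local)
    return True


def strictly_serializable(txns, rt_pairs, initial):
    """Search for a legal total order that respects the real-time order.

    txns: dict mapping a transaction id to its list of operations,
          ("write", X, v) or ("read", X, returned_value).
    rt_pairs: set of (k, m) such that Tk precedes Tm in real time.
    initial: dict with the value T0 writes to each t-object.
    """
    for order in permutations(txns):
        pos = {k: i for i, k in enumerate(order)}
        if any(pos[a] > pos[b] for a, b in rt_pairs if a in pos and b in pos):
            continue
        if legal([txns[k] for k in order], initial):
            return True
    return False
```

Passing an empty `rt_pairs` drops the real-time requirement and therefore tests plain serializability of the same input. The search is exponential in the number of transactions and is meant only to make the definition concrete.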
In general, given a TM-correctness condition C, we say that a TM implementation M satisfies C if every
execution of M satisfies C.
Strict serializability as linearizability. We now show we can specify TM as an abstract data type.
The sequential specification of a TM is specified as follows:
1. Φ is the set of all transactions {Ti}i∈N
2. Γ is the set of incommensurate vectors {[r1, . . . , ri]}, i ∈ N, where each rj ∈ V ∪ {A, ok} for
1 ≤ j ≤ i − 1, and ri ∈ {A, C}
3. The state of TM is a vector of the state of each t-object Xm. The state of a t-object Xm is a value
vm ∈ V of Xm. Thus, Q ⊆ {[vi1, . . . , vim, . . .]}; where each vim ∈ V
4. q0 ∈ Q = [ov1, . . . , ovm, . . .], where each ovm ∈ V
5. δ is defined as follows: Let Tk be a transaction applied to the TM in state q = [v1, . . . , vm, . . .].
• For every X ∈ Rset(Tk), the response of readk(X) is defined as follows: If Tk contains a
writek(X, v) prior to readk(X), then the response is v; else the response is the current state
of X.
• For every X ∈Wset(Tk), the response of writek(X, v) is ok.
• Transaction Tk returns the response C, in which case the TM moves to state q′ defined as
follows: for every Xj ∈ Wset(Tk) to which Tk writes value nvj, q′[j] = nvj; for every Xj ∉
Wset(Tk), q′[j] is unchanged. Otherwise, Tk returns the response A, in which case q′ = q.
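The transition function δ described in item 5 can be sketched as a pure function on states. The dictionary encoding of the state and the name `apply_transaction` are assumptions made for this illustration.

```python
def apply_transaction(state, ops, outcome):
    """Apply one transaction to a TM state and return (new_state, responses).

    state: dict mapping each t-object to its current value.
    ops: list of ("read", X) or ("write", X, v), in program order.
    outcome: "C" (commit) or "A" (abort), the final response.
    """
    local = {}      # values written by this transaction so far
    responses = []
    for op in ops:
        if op[0] == "read":
            x = op[1]
            # A read returns the transaction's own earlier write, if any,
            # and the current state of X otherwise.
            responses.append(local[x] if x in local else state[x])
        else:
            _, x, v = op
            local[x] = v
            responses.append("ok")
    responses.append(outcome)
    # Only a committed transaction changes the state.
    new_state = {**state, **local} if outcome == "C" else dict(state)
    return new_state, responses
```

The returned response list has the shape of the vectors in Γ: values or ok for the first i − 1 entries and C or A for the last.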
In general, the correctness of an implementation of a data type is commonly captured by the criterion
of linearizability. In the TM context, a t-complete history H is linearizable with respect to the TM type
if there exists a t-sequential history S equivalent to H such that (1) S respects the real-time ordering of
transactions in H and (2) S is consistent with the sequential specification of TM.
The following lemma, which illustrates the similarity between strict serializability and linearizability with
respect to the TM type, is now immediate.
Lemma 2.1. Let H be any t-complete history. Then, H is strictly serializable iff H is linearizable with
respect to the TM type.
2.3 TM-progress
One may notice that a TM implementation that aborts every transaction in every execution
is trivially strictly serializable, but not very useful. A TM-progress condition specifies the conditions
under which a transaction is allowed to abort. Technically, a TM-progress condition specified this way
is a safety property since it can be violated in a finite execution (cf. Chapter 3 for details on safety
properties).
Ideally, a TM-progress condition must provide non-blocking progress, in the sense that a prematurely
halted transaction cannot prevent all other transactions from committing. Such a TM-progress condition
is also said to be lock-free since it cannot be achieved by use of locks and mutual-exclusion. A non-
blocking TM-progress condition is considered useful in asynchronous systems with process failures since
it prevents the TM implementation from deadlocking (processes wait infinitely long without committing
their transactions).
Obstruction-freedom. Perhaps, the weakest non-blocking TM-progress condition is obstruction-
freedom, which stipulates that a transaction may be aborted only if it encounters steps of a concurrent
transaction [82].
Definition 2.3 (Obstruction-free (OF) TM-progress). We say that a TM implementation M provides
obstruction-free (OF) TM-progress if for every execution E of M , if any transaction Tk ∈ txns(E)
returns Ak in E, then E is not step contention-free for Tk.
We now survey the popular blocking TM-progress properties proposed in literature. Intuitively, unlike
non-blocking TM-progress conditions that adapt to step contention, a blocking TM-progress condition
allows a transaction to be aborted due to overlap contention.
Minimal progressiveness. Intuitively, the most basic TM-progress condition is one which provides
only sequential TM-progress, i.e., a transaction may be aborted only if it is concurrent with another transaction. In
literature, this is referred to as minimal progressiveness [64].
Definition 2.4 (Minimal progressiveness). We say that a TM implementation M provides minimal pro-
gressive TM-progress (or minimal progressiveness) if for every execution E of M and every transaction
Tk ∈ txns(E) that returns Ak in E, there exists a transaction Tm ∈ txns(E) that is concurrent to Tk in
E [64].
Given TM conditions C1 and C2, if every TM implementation that satisfies C1 also satisfies C2, but the
converse is not true, we say that C2 ≪ C1.
Observation 2.2. Minimal progressiveness ≪ Obstruction-free.
Proof. Clearly, every TM implementation that satisfies obstruction-freedom also satisfies minimal pro-
gressiveness, but the converse is not true. Consider any execution of a TM implementation M in which
a transaction T runs step contention-free. If M is minimally progressive, then T may be aborted in such
an execution since T may be concurrent with another transaction. However, if M satisfies obstruction-
freedom, T cannot be aborted in such an execution.
Progressiveness. In contrast to the “single-lock” minimal progressive TM-progress condition (also
referred to as sequential TM-progress in the thesis), state-of-the-art TM implementations allow a trans-
action to abort only if it encounters a conflict on a t-object with a concurrent transaction.
Definition 2.5 (Conflicts). We say that transactions Ti, Tj conflict in an execution E on a t-object X
if Ti and Tj are concurrent in E and X ∈ Dset(Ti) ∩Dset(Tj), and X ∈Wset(Ti) ∪Wset(Tj).
Definition 2.6 (Progressiveness). A TM implementation M provides progressive TM-progress (or pro-
gressiveness) if for every execution E of M and every transaction Ti ∈ txns(E) that returns Ai in E,
there exists a transaction Tk ∈ txns(E) such that Tk and Ti conflict in E [64].
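Definitions 2.5 and 2.6 can be illustrated on a single execution as follows. The sketch assumes each transaction is summarized by its read set, write set and an aborted flag, and that concurrency is given as a set of unordered pairs; these encodings and both function names are hypothetical. Checking one execution does not establish that an implementation is progressive, since the definition quantifies over all executions.

```python
def conflict_objects(ti, tj, are_concurrent):
    """T-objects on which two transactions conflict (empty set if none)."""
    if not are_concurrent:
        return set()
    dset_i = ti["rset"] | ti["wset"]
    dset_j = tj["rset"] | tj["wset"]
    # X must be in both data sets and in at least one write set.
    return (dset_i & dset_j) & (ti["wset"] | tj["wset"])


def aborts_justified(txns, concurrent_pairs):
    """True if every aborted transaction conflicts with some other transaction."""
    for i, ti in txns.items():
        if not ti["aborted"]:
            continue
        has_conflict = any(
            conflict_objects(ti, tj, frozenset((i, j)) in concurrent_pairs)
            for j, tj in txns.items() if j != i
        )
        if not has_conflict:
            return False
    return True
```

Two concurrent transactions that only read a common t-object do not conflict, because the object is in neither write set.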
Note that progressiveness is incomparable to obstruction-freedom.
Observation 2.3. Progressiveness ≪̸ Obstruction-free and Obstruction-free ≪̸ Progressiveness.
Proof. We can show that there exists an execution exported by an obstruction-free TM, but not by any
progressive TM and vice-versa.
Consider a t-read X by a transaction T that runs step contention-free from a configuration that contains
an incomplete write to X. Progressiveness does not preclude T from being aborted in such an
execution. Obstruction-free TMs, however, must ensure that T completes its read of X without
blocking or aborting in such executions. On the other hand, progressiveness requires two
non-conflicting transactions to not be aborted even in executions that are not step contention-free; but this
is not guaranteed by obstruction-freedom.
In general, progressive TMs (including the ones described in the thesis) satisfy the following stronger
definition: for every transaction Ti ∈ txns(E) that returns Ai in an execution E, there exists prefix E′
of E and a transaction Tk ∈ txns(E′) such that Tk and Ti conflict in E. However, for the lower bound
results stated in the thesis, we stick to Definition 2.6.
Strong progressiveness. One may observe that the definition of progressiveness does not preclude
two conflicting transactions (over a single t-object) from each being aborted. Thus, we study a stronger
notion of progressiveness called strong progressiveness [64].
Let CObjE(Ti) denote the set of t-objects over which transaction Ti ∈ txns(E) conflicts with any other
transaction in an execution E, i.e., X ∈ CObjE(Ti) iff there exists a transaction Tk ∈ txns(E) such that
Ti and Tk conflict on X in E. Let Q ⊆ txns(E) and CObjE(Q) = ⋃_{Ti∈Q} CObjE(Ti).
Let CTrans(E) denote the set of non-empty subsets of txns(E) such that a set Q is in CTrans(E) if no
transaction in Q conflicts with a transaction not in Q.
Definition 2.7 (Strong progressiveness). A TM implementation M is strongly progressive if M is
progressive (Definition 2.6) and for every execution E of M and for every set Q ∈ CTrans(E) such that |CObjE(Q)| ≤ 1,
some transaction in Q is not aborted in E [64].
The above definitions imply:
Corollary 2.4. Minimal progressiveness (sequential TM-progress) ≪ Progressiveness ≪ Strong
progressiveness.
Mv-permissiveness. Perelman et al. introduced the notion of mv-permissiveness, a TM-progress
property designed to prevent read-only transactions from being aborted.
Definition 2.8 (Mv-permissiveness). A TM implementation M is mv-permissive if for every execution
E of M and for every transaction Tk ∈ txns(E) that returns Ak in E, we have that Wset(Tk) ≠ ∅ and
there exists an updating transaction Tm ∈ txns(E) such that Tk and Tm conflict in E.
We observe that mv-permissiveness is strictly stronger than progressiveness, but incomparable to strong
progressiveness.
Observation 2.5. Progressiveness ≪ Mv-permissiveness.
Proof. Since mv-permissive TMs allow a transaction to be aborted only on read-write conflicts, they
also satisfy progressiveness. But the converse is not true. Consider an execution in which a read-only
transaction Ti runs concurrently with a conflicting updating transaction Tj. By the definition of
progressiveness, both Ti and Tj may be aborted in such an execution. However, an mv-permissive TM
would not allow Ti to be aborted since it is read-only.
Observation 2.6. Strong progressiveness ≪̸ Mv-permissiveness and Mv-permissiveness ≪̸ Strong
progressiveness.
Proof. Consider an execution in which a read-only transaction Ti runs concurrently with an
updating transaction Tj such that Ti and Tj conflict on at least two t-objects. By the definition of strong
progressiveness, both Ti and Tj may be aborted in such an execution. However, an mv-permissive TM
would not allow Ti to be aborted since it is read-only.
On the other hand, consider an execution in which two updating transactions Ti and Tj conflict on
a single t-object. An mv-permissive TM allows both Ti and Tj to be aborted, but strong progressiveness
ensures that at least one of Ti or Tj is not aborted in such an execution.
2.4 TM-liveness
Observe that a TM-progress condition only specifies the conditions under which a transaction is aborted,
but does not specify the conditions under which it must commit. For instance, the OF TM-progress
condition specifies that a transaction T may be aborted only in executions that are not step contention-
free for T , but does not guarantee that T is committed in a step contention-free execution. Thus, in
addition to a progress condition, we must stipulate a liveness [11, 100] condition.
We now define the TM-liveness conditions considered in the thesis.
Definition 2.9 (Sequential TM-liveness). A TM implementation M provides sequential TM-liveness
if for every finite execution E of M , and every transaction Tk that runs t-sequentially and applies the
invocation of a t-operation opk immediately after E, the finite step contention-free extension for opk
contains a response.
Definition 2.10 (Interval contention-free (ICF) TM-liveness). A TM implementation M provides in-
terval contention-free (ICF) TM-liveness if for every finite execution E of M such that the configuration
after E is quiescent, and every transaction Tk that applies the invocation of a t-operation opk immediately
after E, the finite step contention-free extension for opk contains a response.
Definition 2.11 (Starvation-free TM-liveness). A TM implementation M provides starvation-free TM-
liveness if in every execution of M , each t-operation eventually returns a matching response, assuming
that no concurrent t-operation stops indefinitely before returning.
Definition 2.12 (Obstruction-free (OF) TM-liveness). A TM implementation M provides obstruction-
free (OF) TM-liveness if for every finite execution E of M , and every transaction Tk that applies the
invocation of a t-operation opk immediately after E, the finite step contention-free extension for opk
contains a matching response.
Definition 2.13 (Wait-free (WF) TM-liveness). A TM implementation M provides wait-free (WF)
TM-liveness if in every execution of M , every t-operation returns a response in a finite number of its
steps.
The following observations are immediate from the definitions:
Observation 2.7. Sequential TM-liveness ≪ ICF TM-liveness ≪ OF TM-liveness ≪ WF TM-liveness
and Starvation-free TM-liveness ≪ WF TM-liveness.
Since ICF TM-liveness guarantees that a t-operation returns a response if there is no other concurrent
t-operation, we have:
Observation 2.8. ICF TM-liveness ≪ Starvation-free TM-liveness.
However, we observe that OF TM-liveness and starvation-free TM-liveness are incomparable.
Observation 2.9. Starvation-free TM-liveness ≪̸ OF TM-liveness and OF TM-liveness ≪̸
Starvation-free TM-liveness.
Proof. Consider the step contention-free execution of t-operation opk concurrent with t-operation opm:
opk must return a matching response within a finite number of its steps, but this is not necessarily ensured
by starvation-free TM-liveness (opm may be delayed indefinitely). On the other hand, in executions where
two concurrent t-operations opk and opm encounter step contention, but neither stalls indefinitely, both
must return matching responses. But this is not guaranteed by OF TM-liveness.
2.5 Invisible reads
In this section, we introduce the notion of invisible reads that intuitively ensures that a reading
transaction does not cause a concurrent transaction to abort. Since most TM workloads are believed to be
read-dominated, this is considered to be an important TM property for performance [25, 65].
Invisible reads. Informally, in a TM using invisible reads, a transaction cannot reveal any information
about its read set to other transactions. Thus, given an execution E and some transaction Tk with a
non-empty read set, transactions other than Tk cannot distinguish E from an execution in which Tk’s
read set is empty. This prevents TMs from applying nontrivial primitives during t-read operations and
from announcing read sets of transactions during tryCommit. Most popular TM implementations like
TL2 [39] and NOrec [36] satisfy this property.
Definition 2.14 (Invisible reads [23]). We say that a TM implementation M uses invisible reads if for
every execution E of M :
• for every read-only transaction Tk ∈ txns(E), no event of E|k is nontrivial in E,
• for every updating transaction Tk ∈ txns(E) with RsetE(Tk) ≠ ∅, there exists an execution E′ of M
such that
– RsetE′(Tk) = ∅,
– txns(E) = txns(E′) and ∀Tm ∈ txns(E) \ {Tk}: E|m = E′|m
– for any two transactions Ti, Tj ∈ txns(E), if the last event of Ti precedes the first event of
Tj in E, then the last event of Ti precedes the first event of Tj in E′.
Weak invisible reads. We introduce the notion of weak invisible reads that prevents t-read operations
from applying nontrivial primitives only in the absence of concurrent transactions. Specifically, weak
read invisibility allows t-read operations of a transaction T to be “visible”, i.e., write to base objects,
only if T is concurrent with another transaction.
Definition 2.15 (Weak invisible reads). For any execution E and any t-operation πk invoked by some
transaction Tk ∈ txns(E), let E|πk denote the subsequence of E restricted to events of πk in E.
We say that a TM implementation M satisfies weak invisible reads if for any execution E of M and
every transaction Tk ∈ txns(E) with Rset(Tk) ≠ ∅ that is not concurrent with any transaction Tm ∈ txns(E),
E|πk does not contain any nontrivial events, where πk is any t-read operation invoked by Tk in E.
For example, the popular TM implementation DSTM [79] satisfies weak invisible reads, but not invisible
reads. Algorithm 5.1 in Chapter 4 depicts a TM implementation that is based on DSTM satisfying weak
invisible reads, but not the stronger definition of invisible reads.
2.6 Disjoint-access parallelism (DAP)
The notion of disjoint-access parallelism (DAP) [86] is considered important in the TM context since
it allows two transactions accessing unrelated t-objects to execute without memory contention. In this
section, we preview the DAP definitions proposed in literature and identify the relations between them.
Strict data-partitioning. Let E|X denote the subsequence of the execution E derived by removing all
events associated with t-object X. A TM implementation M is strict data-partitioned [64], if for every
t-object X, there exists a set of base objects BaseM (X) such that
• for any two t-objects X1, X2; BaseM (X1) ∩ BaseM (X2) = ∅,
• for every execution E of M and every transaction T ∈ txns(E), every base object accessed by T
in E is contained in BaseM (X) for some X ∈ Dset(T )
• for all executions E and E′ of M, if E|X = E′|X for some t-object X, then the configurations
after E and E′ only differ in the states of the base objects in BaseM (X).
Strict disjoint-access parallelism. A TM implementation M is strictly disjoint-access parallel (strict
DAP) if, for all executions E of M , and for all transactions Ti and Tj that participate in E, Ti and Tj
contend on a base object in E only if Dset(Ti) ∩ Dset(Tj) ≠ ∅ [64].
Proposition 2.1. Strict DAP ≪ Strict data-partitioning.
Proof. Let M be any strict data-partitioned TM implementation. Then, M is also strict DAP. Indeed,
since any two transactions accessing mutually disjoint data sets in a strict data-partitioned implemen-
tation cannot access a common base object in any execution E of M , E also ensures that any two
transactions contend on the same base object in E only if they access a common t-object.
Consider the following execution E of a strict DAP TM implementation M that begins with two
transactions T1 and T2 that access disjoint data sets in E. A strict data-partitioned TM implementation
would preclude transactions T1 and T2 from accessing the same base object, but a strict DAP TM
implementation does not preclude this possibility.
We now describe two relaxations of strict DAP. For the formal definitions, we introduce the notion of a
conflict graph which captures the dependency relation among t-objects accessed by transactions.
Read-write (RW) disjoint-access parallelism. Informally, read-write (RW) DAP means that two
transactions can contend on a common base object only if their data sets are connected in the conflict
graph, capturing write-set overlaps among all concurrent transactions.
We denote by τE(Ti, Tj), the set of transactions (Ti and Tj included) that are concurrent to at least one
of Ti and Tj in an execution E.
Let G̃(Ti, Tj, E) be an undirected graph whose vertex set is ⋃_{T∈τE(Ti,Tj)} Dset(T) and there is an edge
between t-objects X and Y iff there exists T ∈ τE(Ti, Tj) such that {X, Y} ⊆ Wset(T). We say that
Ti and Tj are read-write disjoint-access in E if there is no path between a t-object in Dset(Ti) and a
t-object in Dset(Tj) in G̃(Ti, Tj, E). A TM implementation M is read-write disjoint-access parallel (RW
DAP) if, for all executions E of M, transactions Ti and Tj contend on the same base object in E only
if Ti and Tj are not read-write disjoint-access in E or there exists a t-object X ∈ Dset(Ti) ∩ Dset(Tj).
Proposition 2.2. RW DAP ≪ Strict DAP.
Proof. From the definitions, it is immediate that every strict DAP TM implementation satisfies RW
DAP.
But the converse is not true (Algorithm 5.1 describes a TM implementation that satisfies RW and
weak DAP, but not strict DAP). Consider the following execution E of a weak DAP or RW DAP TM
implementation M that begins with the t-incomplete execution of a transaction T0 that accesses t-objects
X and Y , followed by the step contention-free executions of two transactions T1 and T2 which access X
and Y respectively. Transactions T1 and T2 may contend on a base object since there is a path between
X and Y in G(T1, T2, E). However, a strict DAP TM implementation would preclude transactions T1
and T2 from contending on the same base object since Dset(T1) ∩Dset(T2) = ∅ in E.
Weak disjoint-access parallelism. Informally, weak DAP means that two transactions can concur-
rently contend on a common base object only if their data sets are connected in the conflict graph,
capturing data-set overlaps among all concurrent transactions.
Let G(Ti, Tj, E) be an undirected graph whose vertex set is ⋃_{T∈τE(Ti,Tj)} Dset(T) and there is an edge
between t-objects X and Y iff there exists T ∈ τE(Ti, Tj) such that {X, Y} ⊆ Dset(T). We say that
Ti and Tj are disjoint-access in E if there is no path between a t-object in Dset(Ti) and a t-object in
Dset(Tj) in G(Ti, Tj , E). A TM implementation M is weak disjoint-access parallel (weak DAP) if, for all
executions E of M , transactions Ti and Tj concurrently contend on the same base object in E only if Ti
and Tj are not disjoint-access in E or there exists a t-object X ∈ Dset(Ti) ∩Dset(Tj) [24, 110].
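Both conflict graphs of this section differ only in the sets used to draw edges, so a single reachability check covers them. In the sketch below, `tau` maps each transaction of τE(Ti, Tj) to its data set, and `edge_sets` maps it to the set whose pairs become edges: the data set for G and the write set for the RW DAP graph. The function name and the encoding are assumptions made for illustration.

```python
from itertools import combinations


def disjoint_access(ti, tj, tau, edge_sets):
    """True if no path connects Dset(ti) and Dset(tj) in the conflict graph.

    tau: dict mapping each transaction id in tau_E(ti, tj) to its Dset.
    edge_sets: dict mapping the same ids to Dset (weak DAP graph G) or
               to Wset (RW DAP graph); each must be a subset of the Dset.
    """
    vertices = set().union(*tau.values())
    adj = {x: set() for x in vertices}
    for t in tau:
        # Every pair of t-objects in the edge set of one transaction is adjacent.
        for x, y in combinations(edge_sets[t], 2):
            adj[x].add(y)
            adj[y].add(x)
    # Depth-first search from all t-objects in Dset(ti).
    seen = set(tau[ti])
    stack = list(tau[ti])
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return not (seen & tau[tj])
```

With T0 reading X and writing Y, T1 writing X and T2 reading Y, the graph G has an edge between X and Y, so T1 and T2 are not disjoint-access; drawing edges from write sets only leaves X and Y unconnected, so T1 and T2 are read-write disjoint-access.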
We now prove an auxiliary lemma, inspired by [24], concerning weak DAP TM implementations that
will be useful in subsequent proofs. Intuitively, the lemma states that, two transactions that are disjoint-
access and running one after the other in an execution of a weak DAP TM cannot contend on the same
base object.
Lemma 2.10. Let M be any weak DAP TM implementation. Let α · ρ1 · ρ2 be any execution of M
where ρ1 (resp., ρ2) is the step contention-free execution fragment of transaction T1 ∉ txns(α) (resp.,
T2 ∉ txns(α)) and transactions T1, T2 are disjoint-access in α · ρ1 · ρ2. Then, T1 and T2 do not contend
on any base object in α · ρ1 · ρ2.
Proof. Suppose, by contradiction that T1 and T2 contend on the same base object in α · ρ1 · ρ2.
If in ρ1, T1 performs a nontrivial event on a base object on which they contend, let e1 be the last event in
ρ1 in which T1 performs such an event to some base object b and e2, the first event in ρ2 that accesses b.
Otherwise, T1 only performs trivial events in ρ1 to base objects on which it contends with T2 in α · ρ1 · ρ2:
let e2 be the first event in ρ2 in which T2 performs a nontrivial event to some base object b on which
they contend and e1, the last event of T1 in ρ1 that accesses b.
Let ρ′1 (and resp. ρ′2) be the longest prefix of ρ1 (and resp. ρ2) that does not include e1 (and resp. e2).
Since before accessing b, the execution is step contention-free for T1, α · ρ′1 · ρ′2 is an execution of M . By
construction, T1 and T2 are disjoint-access in α · ρ′1 · ρ′2 and α · ρ1 · ρ′2 is indistinguishable to T2 from
α ·ρ′1 ·ρ′2. Hence, T1 and T2 are poised to apply contending events e1 and e2 on b in the configuration after
α · ρ′1 · ρ′2—a contradiction since T1 and T2 cannot concurrently contend on the same base object.
We now show that weak DAP is a weaker property than RW DAP.
Proposition 2.3. Weak DAP ≪ RW DAP.
Proof. Clearly, every implementation that satisfies RW DAP also satisfies weak DAP since the conflict
graph G̃(Ti, Tj, E) (for RW DAP) is a subgraph of G(Ti, Tj, E) (for weak DAP).
However, the converse is not true (Algorithm 5.2 describes a TM implementation that satisfies weak
DAP, but not RW DAP). Consider the following execution E of a weak DAP TM implementation M
that begins with the t-incomplete execution of a transaction T0 that reads X and writes to Y , followed
by the step contention-free executions of two transactions T1 and T2 which write to X and read Y
respectively. Transactions T1 and T2 may contend on a base object since there is a path between X and
Y in G(T1, T2, E). However, a RW DAP TM implementation would preclude transactions T1 and T2 from
contending on the same base object: there is no edge between t-objects X and Y in the corresponding
conflict graph G̃(T1, T2, E) because X and Y are not both contained in the write set of T0.
Thus, the above propositions imply:
Corollary 2.11. Weak DAP ≪ RW DAP ≪ Strict DAP ≪ Strict data-partitioning.
2.7 TM complexity metrics
We now present an overview of some of the TM complexity metrics we consider in the thesis.
Step complexity. The step complexity metric is the total number of events that a process performs
on the shared memory, in the worst case, in order to complete its operation on the implementation.
RAW/AWAR patterns. Attiya et al. identified two common expensive synchronization patterns that
frequently arise in the design of concurrent algorithms: read-after-write (RAW) or atomic write-after-
read (AWAR) [17, 103] and showed that it is impossible to derive RAW/AWAR-free implementations of
a wide class of data types that include sets, queues and deadlock-free mutual exclusion.
Note the shared memory model in the thesis makes the assumption that CPU events are performed
atomically: every “read” of a base object returns the value of “latest write” to the base object. In practice
however, the CPU architecture’s memory model [3] that specifies the outcome of CPU instructions
is relaxed without enforcing a strict order among the shared memory instructions. Intuitively, RAW
(read-after-write) or AWAR (atomic-write-after-read) patterns [17] capture the amount of “expensive
synchronization”, i.e., the number of costly memory barriers or conditional primitives [3] incurred by
the implementation in relaxed CPU architectures. The metric appears to be more practically relevant
than simply counting the number of steps performed by a process, as it accounts for expensive cache-
coherence operations or instructions like compare-and-swap. Detailed coverage on memory fences and
the RAW/AWAR metric can be found in [103].
Definition 2.16 (Read-after-write metric). A RAW (read-after-write) pattern performed by a
transaction Tk in an execution π is a pair of its events e and e′, such that: (1) e is a write to a base object b
by Tk, (2) e′ is a subsequent read of a base object b′ ≠ b by Tk, and (3) no event on b by Tk takes place
between e and e′.
In the thesis, we are concerned only with non-overlapping RAWs, i.e., the read performed by one RAW
precedes the write performed by the other RAW.
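As an illustration of Definition 2.16, the following sketch counts non-overlapping RAW patterns in the sequence of events of a single transaction. It assumes each event is encoded as a pair (kind, b) with kind either "read" or "write", and it selects patterns greedily by closing a RAW at the earliest possible read; the function name and the encoding are hypothetical.

```python
def count_nonoverlapping_raws(events):
    """Greedily count non-overlapping RAW patterns of one transaction.

    events: list of ("read" | "write", base_object) in execution order.
    """
    # Base objects written since the last counted RAW, with no later
    # event of this transaction on them (condition (3) of the definition).
    pending = set()
    count = 0
    for kind, b in events:
        if kind == "read":
            if pending - {b}:
                # Some write to an object b0 != b is still pending, so this
                # read closes a RAW; later RAWs must start after this read.
                count += 1
                pending.clear()
            else:
                # The read is an event on b and ends any pending write to b.
                pending.discard(b)
        else:
            pending.add(b)
    return count
```

A write to b followed by a read of the same object b does not form a RAW, and the intervening event on b also prevents that write from pairing with a later read.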
Definition 2.17 (Atomic write-after-read metric). An AWAR (atomic-write-after-read) pattern e in an
execution π · e is a nontrivial rmw event on an object b which atomically returns the value of b (resulting
after π) and updates b with a new value.
For example, consider the execution π · e where e is the application of a compare-and-swap rmw primitive
that returns true; such an event e is an AWAR pattern.
Stall complexity. Intuitively, the stall metric captures the fact that the time a process might have to
spend before it applies a primitive on a base object can be proportional to the number of processes that
try to update the object concurrently.
Let M be any TM implementation. Let e be an event applied by process p to a base object b as it
performs a transaction T during an execution E of M . Let E = α · e1 · · · em · e · β be an execution of
M , where α and β are execution fragments and e1 · · · em is a maximal sequence of m ≥ 1 consecutive
nontrivial events by distinct processes other than p that access b. Then, we say that T incurs
m memory stalls in E on account of e. The number of memory stalls incurred by T in E is the sum of
memory stalls incurred by all events of T in E [16, 46].
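The stall count can be sketched directly from this definition. The code assumes an execution is a list of triples (process, base_object, nontrivial) and that process p executes only the transaction T of interest in E; both function names are hypothetical.

```python
def stalls_for_event(E, i):
    """Memory stalls incurred on account of the event E[i].

    Counts the maximal run of consecutive events immediately before E[i]
    that are nontrivial, access the same base object, and are applied by
    distinct processes other than the one applying E[i].
    """
    p, b, _ = E[i]
    seen = set()
    j = i - 1
    while j >= 0:
        q, bj, nontrivial = E[j]
        if q == p or bj != b or not nontrivial or q in seen:
            break
        seen.add(q)
        j -= 1
    return len(seen)


def memory_stalls(E, p):
    """Total stalls incurred by the events of process p in E."""
    return sum(stalls_for_event(E, i) for i, ev in enumerate(E) if ev[0] == p)
```

An event that is not immediately preceded by such a run incurs zero stalls, which is consistent with the requirement m ≥ 1 for a stall to be charged.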
In the thesis, we adopt the following definition of a k-stall execution from [16, 46].
Definition 2.18. An execution α · σ1 · · ·σi is a k-stall execution for t-operation op executed by process
p if
• α is p-free,
• there are distinct base objects b1, . . . , bi and disjoint sets of processes S1, . . . , Si whose union does
not include p and has cardinality k such that, for j = 1, . . . i,
– each process in Sj has an enabled nontrivial event about to access base object bj after α, and
– in σj, p applies events by itself until it is the first about to apply an event to bj, then each
of the processes in Sj applies an event that accesses bj, and finally, p applies an event that
accesses bj,
• p invokes exactly one t-operation op in the execution fragment σ1 · · ·σi,
• σ1 · · ·σi contains no events of processes not in ({p} ∪ S1 ∪ · · · ∪ Si), and
• in every ({p}∪S1∪· · ·∪Si)-free execution fragment that extends α, no process applies a nontrivial
event to any base object accessed in σ1 · · ·σi.
Observe that in a k-stall execution E for t-operation op, the number of memory stalls incurred by op in
E is k.
The following lemma will be of use in our proofs.
Lemma 2.12. Let α · σ1 · · ·σi be a k-stall execution for t-operation op executed by process p. Then,
α · σ1 · · ·σi is indistinguishable to p from a step contention-free execution [16].
Remote memory references (RMR) [22]. Modern shared memory CPU architectures employ a
memory hierarchy [73]: a hierarchy of memory devices with different capacities and costs. Some of the
memory is local to a given process while the rest of the memory is remote. Accesses to memory locations
(i.e. base objects) that are remote to a given process are often orders of magnitude slower than a local
access of the base object. Thus, the performance of concurrent implementations in the shared memory
model may depend on the number of remote memory references made to base objects [14].
In the cache-coherent (CC) shared memory, each process maintains local copies of shared base objects
inside its cache, whose consistency is ensured by a coherence protocol. Informally, we say that an access
to a base object b is remote to a process p and causes a remote memory reference (RMR) if p’s cache
contains a cached copy of the object that is out of date or invalidated ; otherwise the access is local.
In the write-through (CC) protocol, to read a base object b, process p must have a cached copy of b that
has not been invalidated since its previous read. Otherwise, p incurs a RMR. To write to b, p causes a
RMR that invalidates all cached copies of b and writes to the main memory.
In the write-back (CC) protocol, p reads a base object b without causing a RMR if it holds a cached
copy of b in shared or exclusive mode; otherwise the access of b causes a RMR that (1) invalidates all
copies of b held in exclusive mode and writes b back to the main memory, and (2) creates a cached copy of
b in shared mode. Process p can write to b without causing a RMR if it holds a copy of b in exclusive
mode; otherwise p causes a RMR that invalidates all cached copies of b and creates a cached copy of b
in exclusive mode.
In the distributed shared memory (DSM), each base object is forever assigned to a single process and it
is remote to the others. Any access of a remote register causes a RMR.
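The two cache-coherent protocols described above can be compared with a small RMR-counting sketch. The class and method names are hypothetical, and the code makes one explicit assumption beyond the text: in the write-through model a write invalidates every cached copy of the object, including the writer's own.

```python
from collections import defaultdict

class WriteThroughCC:
    def __init__(self):
        self.valid = defaultdict(set)  # obj -> processes holding a valid cached copy
        self.rmrs = defaultdict(int)   # process -> number of RMRs incurred

    def read(self, p, obj):
        if p not in self.valid[obj]:   # no valid copy: remote access
            self.rmrs[p] += 1
            self.valid[obj].add(p)

    def write(self, p, obj):
        self.rmrs[p] += 1              # every write goes to main memory
        self.valid[obj].clear()        # assumption: all copies are invalidated

class WriteBackCC:
    def __init__(self):
        self.shared = defaultdict(set)  # obj -> processes holding a shared copy
        self.exclusive = {}             # obj -> process holding the exclusive copy
        self.rmrs = defaultdict(int)

    def read(self, p, obj):
        if p in self.shared[obj] or self.exclusive.get(obj) == p:
            return                      # local read
        self.rmrs[p] += 1
        self.exclusive.pop(obj, None)   # exclusive copy invalidated and written back
        self.shared[obj].add(p)

    def write(self, p, obj):
        if self.exclusive.get(obj) == p:
            return                      # local write
        self.rmrs[p] += 1
        self.shared[obj].clear()        # invalidate all other copies
        self.exclusive[obj] = p

def run(model, trace):
    """Apply a list of (process, 'r' or 'w', object) accesses; return RMR counts."""
    for p, op, obj in trace:
        (model.read if op == "r" else model.write)(p, obj)
    return dict(model.rmrs)

repeated_writes = [("q", "w", "b")] * 3
print(run(WriteThroughCC(), repeated_writes), run(WriteBackCC(), repeated_writes))
```

Repeated writes by a single process illustrate the difference: each write is remote under write-through, while under write-back only the first one is.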
3 Safety for transactional memory
Arthur: If I asked you where the hell we were, would I regret it?
Ford: We’re safe.
Arthur: Oh good.
Ford: We’re in a small galley cabin in one of the spaceships of the Vogon Constructor Fleet.
Arthur: Ah, this is obviously some strange use of the word safe that I wasn’t previously aware of.
Douglas Adams, The Hitchhiker’s Guide to the Galaxy
3.1 Overview
In the context of transactional memory, intermediate states witnessed by the read operations of an
incomplete transaction may affect the user application through the outcome of its read operations. If
the intermediate state is not consistent with any sequential execution, the application may experience a
fatal irrecoverable error or enter an infinite loop. Thus, it is important that each transaction, including
aborted ones, observes a consistent state so that the implementation does not export any pathological
executions.
A state should be considered consistent if it could result from a serial application of transactions observed
in the current execution. In this sense, every transaction should witness a state that could have been
observed in some execution of the sequential code put by the programmer within the transactions.
Additionally, a consistent state should not depend on a transaction that has not started committing
yet (referred to as deferred-update semantics). This restriction appears desirable, since the ongoing
transaction may still abort (explicitly by the user or because of consistency reasons) and, thus, render
the read inconsistent. Further, the set of histories specified by the consistency criterion must constitute
a safety property, as defined by Owicki and Lamport [108], Alpern and Schneider [11] and refined by
Lynch [100]: it must be non-empty, prefix-closed and limit-closed.
In this chapter, we define the notion of deferred-update semantics formally, which we then apply to a
spectrum of TM consistency criteria. Additionally, we verify if the resulting TM consistency criterion is
a safety property, as defined by Lynch [100].
We begin by considering the popular criterion of opacity [64], which was the first TM consistency
criterion that was proposed to grasp this semantics formally. Opacity requires the states observed by all
transactions, including uncommitted ones, to be consistent with a global serialization, i.e., a serial
execution constituted by committed transactions. Moreover, the serialization should respect the real-time
order: a transaction that completed before (in real time) another transaction started should appear first
in the serialization.
By definition, opacity reduces correctness of a history to correctness of all its prefixes, and thus is
prefix-closed and limit-closed. Thus, to verify that a history is opaque, one needs to
verify that each of its prefixes is consistent with some global serialization. To simplify verification and
explicitly introduce deferred-update semantics into a TM correctness criterion, we specify a general
criterion of du-opacity [18], which requires the global serial execution to respect the deferred-update
property. Informally, a du-opaque history must be indistinguishable from a totally-ordered history, with
respect to which no transaction reads from a transaction that has not started committing.
We show that du-opacity is prefix-closed, that is, every prefix of a du-opaque history is also du-opaque.
We then show that extending opacity (and du-opacity) to infinite histories in a non-trivial way (i.e.,
requiring that even infinite histories should have proper serializations), does not result in a limit-closed
property. However, under certain restrictions, we show that du-opacity is limit-closed. In particular,
assuming that in an infinite history, every transaction completes each of the operations it invoked, the
limit of any sequence of ever extending du-opaque histories is also du-opaque. Therefore, under this
assumption, du-opacity is a safety property [11, 100, 108], and to prove that a TM implementation that
complies with the assumption is du-opaque, it suffices to prove that all its finite histories are du-opaque.
One may notice that the intended safety semantics does not require that all transactions observe the
same serial execution. Intuitively, to avoid pathological executions, we only need that every transaction
witnesses some consistent state, while the views of different aborted and incomplete transactions do not
have to be consistent with the same serial execution. As long as committed transactions constitute a
serial execution and every transaction witnesses a consistent state, the execution can be considered “safe”:
no run-time error that cannot occur in a serial execution can happen. Several definitions like virtual-world
consistency (VWC) [85] and Transactional Memory Specification 1 (TMS1) [43] have adopted this
approach. We introduce “deferred-update” versions of these properties and discuss how the resulting
properties relate to du-opacity.
Finally, we also study the consistency criterion Transactional Memory Specification 2 (TMS2) [43, 98],
which was proposed as a restriction of opacity, and verify if it is a safety property.
Roadmap of Chapter 3. In Section 3.2 of this chapter, we formally define safety properties. In
Section 3.3, we introduce the notion of deferred-update semantics and apply it to the correctness criteria
of opacity and strict serializability in Sections 3.4 and 3.5 respectively. Section 3.6 studies two relaxations
of opacity: VWC and TMS1 and a restriction of opacity, TMS2. Section 3.7 summarizes the relations
between the TM correctness properties proposed in the thesis and presents our concluding remarks.
3.2 Safety properties
A property P is a set of (transactional) histories. Intuitively, a safety property says that “no bad thing
ever happens”.
Definition 3.1 (Lynch [100]). A property P is a safety property if it satisfies the following two conditions:
Prefix-closure: For every history H ∈ P, every prefix H′ of H (i.e., every prefix of the sequence of the
events in H) is also in P.
Limit-closure: For every infinite sequence of finite histories H0, H1, . . . such that for every i, Hi ∈ P
and Hi is a prefix of Hi+1, the limit of the sequence is also in P.
Notice that the set of histories produced by a TM implementation M is, by construction, prefix-closed.
Therefore, every infinite history of M is the limit of an infinite sequence of ever-extending finite histories
of M . Thus, to prove that M satisfies a safety property P , it is enough to show that all finite histories
of M are in P . Indeed, limit-closure of P then implies that every infinite history of M is also in P .
3.3 Opacity and deferred-update (DU) semantics
In this section, we formalize the notion of deferred-update semantics and apply it to the TM correctness
condition of opacity [64].
Definition 3.2 (Guerraoui and Kapalka [64]). A finite history H is final-state opaque if there is a legal
t-complete t-sequential history S, such that
1. for any two transactions Tk, Tm ∈ txns(H), if Tk ≺_H^{RT} Tm, then Tk <S Tm, and
2. S is equivalent to a completion of H.
We say that S is a final-state serialization of H.
Final-state opacity is not prefix-closed. Figure 3.1 depicts a t-complete sequential history H that is
final-state opaque, with T1 · T2 being a legal t-complete t-sequential history equivalent to H. Let H′ =
write1(X, 1), read2(X) be a prefix of H in which T1 and T2 are t-incomplete. Transaction Ti (i = 1, 2)
is completed by inserting tryCi · Ai immediately after the last event of Ti in H′. Observe that neither
T1 · T2 nor T2 · T1 allows us to derive a serialization of H′ (we assume that the initial value of X is 0).
A restriction of final-state opacity, which we refer to as opacity [64], explicitly filters out histories that
are not prefix-closed.
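[Figure 3.1 timeline: T1 performs W1(X, 1) and tryC1, which returns C1; T2 performs R2(X) → 1 and tryC2, which returns C2; the prefix H′ and the full history H are marked.]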
Figure 3.1: History H is final-state opaque, while its prefix H ′ is not final-state opaque.
Definition 3.3 (Guerraoui and Kapalka [64]). A history H is opaque if and only if every finite prefix
H ′ of H (including H itself if it is finite) is final-state opaque.
It can be easily seen that opacity is prefix- and limit-closed, and, thus, it is a safety property.
We now give a formal definition of opacity with deferred-update semantics. Then we show that the
property is prefix-closed and, under certain liveness restrictions, limit-closed.
Let H be any history and let S be a legal t-complete t-sequential history that is equivalent to some
completion of H. Let <S be the total order on transactions in S.
Definition 3.4 (Local serialization). For any readk(X) that does not return Ak, let S^{k,X} be the prefix
of S up to the response of readk(X) and H^{k,X} be the prefix of H up to the response of readk(X).
S_H^{k,X}, the local serialization of readk(X) with respect to H and S, is the subsequence of S^{k,X}
derived by removing from S^{k,X} the events of all transactions Tm ∈ txns(H) \ {Tk} such that H^{k,X}
does not contain an invocation of tryCm().
We are now ready to present our correctness condition, du-opacity.
Definition 3.5 (Du-opacity). A history H is du-opaque if there is a legal t-complete t-sequential history
S such that
1. there is a completion of H that is equivalent to S, and
2. for every pair of transactions Tk, Tm ∈ txns(H), if Tk ≺_H^{RT} Tm, then Tk <S Tm, i.e., S respects
the real-time ordering of transactions in H, and
3. each readk(X) in S that does not return Ak is legal in S_H^{k,X}.
We then say that S is a (du-opaque) serialization of H.
Informally, a history H is du-opaque if there is a legal t-sequential history S that is equivalent to a completion of H,
respects the real-time ordering of transactions in H and every t-read is legal in its local serialization with
respect to H and S. The third condition reflects the implementation’s deferred-update semantics, i.e.,
the legality of a t-read in a serialization does not depend on transactions that start committing after the
response of the t-read.
For any du-opaque serialization S, seq(S) denotes the sequence of transactions in S and seq(S)[k] denotes
the kth transaction in this sequence.
3.4 On the safety of du-opacity
In this section, we examine the safety properties of du-opacity, i.e., whether it is prefix-closed and
limit-closed.
3.4.1 Du-opacity is prefix-closed
Lemma 3.1. Let H be a du-opaque history and let S be a serialization of H. For any i ∈ N, there is a
serialization Si of Hi (the prefix of H consisting of the first i events), such that seq(Si) is a subsequence
of seq(S).
Proof. Given H, S and Hi, we construct a t-complete t-sequential history Si as follows:
– for every transaction Tk that is t-complete in Hi, Si|k = S|k.
– for every transaction Tk that is complete but not t-complete in Hi, Si|k consists of the sequence
of events in Hi|k, immediately followed by tryCk() ·Ak.
– for every transaction Tk with an incomplete t-operation, opk = readk ∨ writek ∨ tryAk() in Hi, Si|k
is the sequence of events in S|k up to the invocation of opk, immediately followed by Ak.
– for every transaction Tk ∈ txns(Hi) with an incomplete t-operation, opk = tryCk(), Si|k = S|k.
By the above construction, Si is indeed a t-complete history and every transaction that appears in Si
also appears in S. We order transactions in Si so that seq(Si) is a subsequence of seq(S).
Note that Si is derived from events contained in some completion H̄ of H that is equivalent to S and
some other events to derive a completion of Si. Since Si contains events from every complete t-operation
in Hi and other events included satisfy Definition 2.1, there is a completion of Hi that is equivalent to
Si.
We now argue that Si is a serialization of Hi. First we observe that Si respects the real-time order of
Hi. Indeed, if Tj ≺_{Hi}^{RT} Tk, then Tj ≺_H^{RT} Tk and Tj <S Tk. Since seq(Si) is a subsequence of seq(S), we
have Tj <Si Tk.
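[Figure 3.2 timeline: T1 performs W1(X, 1) and invokes tryC1; T2 performs R2(X) → 1; each of T3, . . . , Ti, . . . performs a read of X that returns 0.]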
Figure 3.2: An infinite history in which tryC1 is incomplete and any two transactions are concurrent.
Each finite prefix of the history is du-opaque, but the infinite limit of the ever-extending
sequence is not du-opaque.
To show that Si is legal, suppose, by way of contradiction, that there is some readk(X) that returns
v ≠ Ak in Hi such that v is not the latest written value of X in Si. If Tk contains a writek(X, v′)
preceding readk(X) such that v ≠ v′ and v is not the latest written value for readk(X) in Si, it is also
not the latest written value for readk(X) in S, which is a contradiction. Thus, the only case to consider
is when readk(X) should return a value written by another transaction.
Since S is a serialization of H, there is a committed transaction Tm that performs the last writem(X, v)
that precedes readk(X) in Tk in S. Moreover, since readk(X) is legal in the local serialization of readk(X)
in H with respect to S, the prefix of H up to the response of readk(X) must contain an invocation of
tryCm(). Thus, readk(X) ⊀_H^{RT} tryCm() and Tm ∈ txns(Hi). By construction of Si, Tm ∈ txns(Si) and
Tm is committed in Si.
We have assumed, towards a contradiction, that v is not the latest written value for readk(X) in Si.
Hence, there is a committed transaction Tj that performs writej(X, v′); v′ ≠ v in Si such that Tm <Si
Tj <Si Tk. But this is not possible since seq(Si) is a subsequence of seq(S).
Thus, Si is a legal t-complete t-sequential history equivalent to some completion of Hi. Now, by the
construction of Si, for every readk(X) that does not return Ak in Si, we have (Si)_{Hi}^{k,X} = S_H^{k,X}.
Indeed, the transactions that appear before Tk in (Si)_{Hi}^{k,X} are those with a tryC event before the
response of readk(X) in H and are committed in S. Since seq(Si) is a subsequence of seq(S), we have
(Si)_{Hi}^{k,X} = S_H^{k,X}. Thus, readk(X) is legal in (Si)_{Hi}^{k,X}.
Lemma 3.1 implies that every prefix of a du-opaque history has a du-opaque serialization and thus:
Corollary 3.2. Du-opacity is a prefix-closed property.
3.4.2 The limit of du-opaque histories
We observe, however, that du-opacity is, in general, not limit-closed. We present an infinite history that
is not du-opaque, but each of its prefixes is.
Proposition 3.1. Du-opacity is not a limit-closed property.
Proof. Let Hj denote a finite prefix of H of length j. Consider an infinite history H that is the limit of
the histories Hj defined as follows (see Figure 3.2):
– Transaction T1 performs a write1(X, 1) and then invokes tryC1() that is incomplete in H.
– Transaction T2 performs a read2(X) that overlaps with tryC1() and returns 1.
– There are infinitely many transactions Ti, i ≥ 3, each of which performs a single readi(X) that
returns 0 such that each Ti overlaps with both T1 and T2.
We now prove that, for all j ∈ N, Hj is a du-opaque history. Clearly, H0 and H1 are du-opaque histories.
For all j > 1, we first derive a completion of Hj as follows:
1. tryC1() (if it is contained in Hj) is completed by inserting C1 immediately after its invocation,
2. for all i ≥ 2, any incomplete readi(X) that is contained in Hj is completed by inserting Ai and
tryCi · Ai immediately after its invocation, and
3. for all i ≥ 2 and every complete readi(X) that is contained in Hj, we include tryCi · Ai immediately
after the response of this readi(X).
We can now derive a t-complete t-sequential history Sj equivalent to the above derived completion of Hj
from the sequence of transactions T3, . . . , Ti, T1, T2 (depending on which of these transactions participate
in Hj), where i ≥ 3. It is easy to observe that Sj so derived is indeed a serialization of Hj .
However, there is no serialization of H. Suppose that such a serialization S exists. Since every transaction
that participates in H must participate in S, there exists n ∈ N such that seq(S)[n] = T1, so only finitely
many transactions precede T1 in S. But read2(X) returns the value 1 written by T1, so T1 is committed in S,
and each of the infinitely many transactions Ti, i ≥ 3, returns 0 and must therefore precede T1 in any
serialization (by legality), which is a contradiction.
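The construction in this proof can be replayed on small prefixes. The sketch below is an illustration under simplifying assumptions (a single t-object X with initial value 0, transactions named by strings, hypothetical helper names): it builds the order T3, . . . , Tm, T1, T2 used for the finite prefixes, checks its legality by a sequential replay, and shows that the index of T1 grows with the number of readers, which is why no fixed index for T1 can work in the limit.

```python
def is_legal(seq, committed, reads, writes, initial=0):
    """Sequential replay: each read must return the latest committed written value."""
    value = initial
    for t in seq:
        if t in reads and reads[t] != value:
            return False
        if t in writes and t in committed:
            value = writes[t]
    return True

def prefix_serialization(m):
    """Order T3, ..., Tm, T1, T2 for a prefix that contains the readers T3..Tm."""
    return [f"T{i}" for i in range(3, m + 1)] + ["T1", "T2"]

def history_description(m):
    """Values read and written on X in a prefix with readers T3..Tm."""
    reads = {"T2": 1, **{f"T{i}": 0 for i in range(3, m + 1)}}
    writes = {"T1": 1}
    return reads, writes

for m in (3, 5, 8):
    reads, writes = history_description(m)
    seq = prefix_serialization(m)
    print(m, is_legal(seq, {"T1"}, reads, writes), seq.index("T1"))
```

Placing any reader Ti, i ≥ 3, after T1 makes the replay fail, so every serialization of a prefix must put all of its readers before T1.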
Notice that all finite prefixes of the infinite history depicted in Figure 3.2 are also opaque. Thus, if we
extend the definition of opacity to cover infinite histories in a non-trivial way, i.e., by explicitly defining
opaque serializations for infinite histories, we can reformulate Proposition 3.1 for opacity.
3.4.3 Du-opacity is limit-closed for complete histories
We show now that du-opacity is limit-closed if the only infinite histories we consider are those in which
every transaction eventually completes (but not necessarily t-completes).
We first prove an auxiliary lemma on du-opaque serializations. For a transaction T ∈ txns(H), the live
set of T in H, denoted LsetH(T ) (T included), is defined as follows: every transaction T ′ ∈ txns(H)
such that neither the last event of T ′ precedes the first event of T in H nor the last event of T precedes
the first event of T′ in H is contained in LsetH(T). We say that transaction T′ ∈ txns(H) succeeds the
live set of T and we write T ≺_H^{LS} T′ if in H, for all T′′ ∈ LsetH(T), T′′ is complete and the last event
of T′′ precedes the first event of T′.
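The live set and the relation of succeeding a live set can be computed directly from the positions of the first and last events of each transaction. The following sketch assumes a hypothetical `Txn` record holding those two positions in H together with a completeness flag; it is meant only to make the definition concrete.

```python
from typing import Dict, NamedTuple, Set

class Txn(NamedTuple):
    first: int      # position in H of the transaction's first event
    last: int       # position in H of the transaction's last event
    complete: bool  # True if the transaction is complete in H

def live_set(txns: Dict[str, Txn], t: str) -> Set[str]:
    """Transactions (t included) that neither precede nor follow t entirely."""
    T = txns[t]
    return {name for name, U in txns.items()
            if not (U.last < T.first) and not (T.last < U.first)}

def succeeds_live_set(txns: Dict[str, Txn], t: str, t2: str) -> bool:
    """True if t2 succeeds the live set of t in H."""
    return all(txns[u].complete and txns[u].last < txns[t2].first
               for u in live_set(txns, t))

H = {"T1": Txn(0, 5, True), "T2": Txn(3, 8, True),
     "T3": Txn(9, 12, True), "T4": Txn(6, 7, True)}
print(sorted(live_set(H, "T1")), succeeds_live_set(H, "T1", "T3"))
```

In this example T4 follows T1 in real time but does not succeed the live set of T1, because T2, which overlaps T1, is still running when T4 starts.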
Lemma 3.3. Let H be a finite du-opaque history and assume Tk ∈ txns(H) is a complete transaction
in H, such that every transaction in LsetH(Tk) is complete in H. Then there is a serialization S of H,
such that for all Tk, Tm ∈ txns(H), if Tk ≺_H^{LS} Tm, then Tk <S Tm.
Proof. Since H is du-opaque, there is a serialization S̃ of H.
Let S be a t-complete t-sequential history such that txns(S̃) = txns(S), and ∀ Ti ∈ txns(S̃) : S|i = S̃|i. We
now perform the following procedure iteratively to derive seq(S) from seq(S̃). Initially seq(S) = seq(S̃).
For each Tk ∈ txns(H), let Tℓ ∈ txns(H) denote the earliest transaction in S̃ such that Tk ≺_H^{LS} Tℓ. If
Tℓ <S̃ Tk (implying Tk is not t-complete), then move Tk to immediately precede Tℓ in seq(S).
By construction, S is equivalent to S̃ and for all Tk, Tm ∈ txns(H), Tk ≺_H^{LS} Tm implies Tk <S Tm. We claim that
S is a serialization of H. Observe that any two transactions that are complete in H, but not t-complete
are not related by real-time order in H. By construction of S, for any transaction Tk ∈ txns(H), the set
of transactions that precede Tk in S̃, but succeed Tk in S are not related to Tk by real-time order. Since
S̃ respects the real-time order in H, this holds also for S.
We now show that S is legal. Consider any readk(X) performed by some transaction Tk that returns
v ∈ V in S and let Tℓ ∈ txns(H) be the earliest transaction in S̃ such that Tk ≺_H^{LS} Tℓ. Suppose,
by contradiction, that readk(X) is not legal in S. Thus, there is a committed transaction Tm that
performs writem(X, v) in S̃ such that Tm = Tℓ or Tℓ <S̃ Tm <S̃ Tk. Note that, by our assumption,
readk(X) ≺_H^{RT} tryCℓ(). Since readk(X) must be legal in its local serialization with respect to H and S̃,
readk(X) ⊀_H^{RT} tryCm(). Thus, Tm ∈ LsetH(Tk). Therefore Tm ≠ Tℓ. Moreover, Tm is complete, and
since it commits in S̃, it is also t-complete in H and the last event of Tm precedes the first event of Tℓ
in H, i.e., Tm ≺_H^{RT} Tℓ. Hence, Tℓ cannot precede Tm in S̃, a contradiction.
Observe also that since Tk is complete in H but not t-complete, H does not contain an invocation of
tryCk(). Thus, the legality of any other transaction is unaffected by moving Tk to precede T` in S. Thus,
S is a legal t-complete t-sequential history equivalent to some completion of H. The above arguments
also prove that every t-read in S is legal in its local serialization with respect to H and S and, thus, S
is a serialization of H.
The proof uses König’s Path Lemma [87] formulated as follows. Let G be a rooted directed graph and
let v0 be the root of G. We say that vk, a vertex of G, is reachable from v0, if there is a sequence of
vertices v0, . . . , vk such that for each i < k, there is an edge from vi to vi+1. G is connected if every vertex in
G is reachable from v0. G is finitely branching if every vertex in G has a finite out-degree. G is infinite
if it has infinitely many vertices.
Lemma 3.4 (König’s Path Lemma [87]). If G is an infinite connected finitely branching rooted directed
graph, then G contains an infinite sequence of distinct vertices v0, v1, . . ., such that v0 is the root, and
for every i ≥ 0, there is an edge from vi to vi+1.
Theorem 3.5. Under the restriction that in any infinite history H, every transaction Tk ∈ txns(H) is
complete, du-opacity is a limit-closed property.
Proof. We want to show that the limit H of an infinite sequence of finite ever-extending du-opaque
histories is du-opaque. By Corollary 3.2, we can assume the sequence of du-opaque histories to be
H0, H1, . . . Hi, Hi+1, . . . such that for all i ∈ N, Hi+1 is the one-event extension of Hi.
We construct a rooted directed graph GH as follows:
1. The root vertex of GH is (H0, S0) where S0 and H0 contain the initial transaction T0.
2. Each non-root vertex of GH is a tuple (Hi, Si), where Si is a du-opaque serialization of Hi that
satisfies the condition specified in Lemma 3.3: for all Tk, Tm ∈ txns(H); Tk ≺_{Hi}^{LS} Tm implies
Tk <Si Tm. Note that there exist several possible serializations for any Hi. For succinctness, in
the rest of this proof, when we refer to a specific Si, it is understood to be associated with the
prefix Hi of H.
3. Let cseq_i(Sj), j ≥ i, denote the subsequence of seq(Sj) restricted to transactions whose last event
in H is a response event and it is contained in Hi. For every pair of vertices v = (Hi, Si) and
v′ = (Hi+1, Si+1) in GH, there is an edge from v to v′ if cseq_i(Si) = cseq_i(Si+1).
The out-degree of a vertex v = (Hi, Si) in GH is defined by the number of possible serializations of
Hi+1, bounded by the number of possible permutations of the set txns(Si+1), implying that GH is
finitely branching.
By Lemma 3.1, given any serialization Si+1 of Hi+1, there is a serialization Si of Hi such that seq(Si)
is a subsequence of seq(Si+1). Indeed, the serialization Si of Hi also respects the restriction specified
in Lemma 3.3. Since seq(Si+1) contains every complete transaction that takes its last step in H in Hi,
cseq_i(Si) = cseq_i(Si+1). Therefore, for every vertex (Hi+1, Si+1), there is a vertex (Hi, Si) such that
cseq_i(Si) = cseq_i(Si+1). Thus, we can iteratively construct a path from (H0, S0) to every vertex (Hi, Si)
in GH, implying that GH is connected.
We now apply König’s Path Lemma (Lemma 3.4) to GH . Since GH is an infinite connected finitely
branching rooted directed graph, we can derive an infinite sequence of distinct vertices
L = (H0, S0), (H1, S1), . . . , (Hi, Si), . . .
such that, for every i, cseq_i(Si) = cseq_i(Si+1).
The rest of the proof explains how to use L to construct a serialization of H. We begin with the following
claim concerning L.
Claim 3.6. For any j > i, cseq_i(Si) = cseq_i(Sj).
Proof. Recall that cseq_i(Si) is a prefix of cseq_i(Si+1), and cseq_{i+1}(Si+1) is a prefix of cseq_{i+1}(Si+2).
Also, cseq_i(Si+1) is a subsequence of cseq_{i+1}(Si+1). Hence, cseq_i(Si) is a subsequence of cseq_{i+1}(Si+2).
But, cseq_{i+1}(Si+2) is a subsequence of cseq_{i+2}(Si+2). Thus, cseq_i(Si) is a subsequence of cseq_{i+2}(Si+2).
Inductively, for any j > i, cseq_i(Si) is a subsequence of cseq_j(Sj). But cseq_i(Sj) is the subsequence of
cseq_j(Sj) restricted to complete transactions in H whose last step is in Hi. Thus, cseq_i(Si) is indeed
equal to cseq_i(Sj).
Let f : N → txns(H) be defined as follows: f(1) = T0. For every integer k > 1, let
i_k = min{ℓ ∈ N | ∀j > ℓ : cseq_ℓ(Sℓ)[k] = cseq_j(Sj)[k]}
Then, f(k) = cseq_{i_k}(S^{i_k})[k].
Claim 3.7. The function f is total and bijective.
Proof. (Totality and surjectivity)
Since each transaction T ∈ txns(H) is complete in some prefix Hi of H, for each k ∈ N, there exists i ∈ N
such that cseq_i(Si)[k] = T. By Claim 3.6, for any j > i, cseq_i(Si) = cseq_i(Sj). Since a transaction that
is complete in Hi w.r.t. H is also complete in Hj w.r.t. H, it follows that for every j > i, cseq_j(Sj)[k′] = T,
with k′ ≥ k. By construction of GH and the assumption that each transaction is complete in H, there
exists i ∈ N such that each T′ ∈ LsetHi(T) is complete in H and its last step is in Hi, and T precedes
in Si every transaction whose first event succeeds the last event of each T′ ∈ LsetHi(T) in Hi. Indeed,
this implies that for each k ∈ N, there exists i ∈ N such that cseq_i(Si)[k] = T; ∀j > i : cseq_j(Sj)[k] = T.
This shows that for every T ∈ txns(H), there are i, k ∈ N; cseq_i(Si)[k] = T, such that for every j > i,
cseq_j(Sj)[k] = T. Thus, for every T ∈ txns(H), there is k such that f(k) = T.
(Injectivity)
If f(k) and f(m) are transactions at indices k, m of the same cseq_i(Si), then clearly f(k) = f(m) implies
k = m. Suppose f(k) is the transaction at index k in some cseq_i(Si) and f(m) is the transaction at
index m in some cseq_ℓ(Sℓ). For every ℓ > i and k < m, if cseq_i(Si)[k] = T, then cseq_ℓ(Sℓ)[m] ≠ T since
cseq_i(Si) = cseq_i(Sℓ). If ℓ > i and k > m, it follows from the definition that f(k) ≠ f(m). Similar
arguments for the case when ℓ < i prove that if f(k) = f(m), then k = m.
By Claim 3.7, F = f(1), f(2), . . . , f(i), . . . is an infinite sequence of transactions. Let S be a t-complete
t-sequential history such that seq(S) = F and for each t-complete transaction Tk in H, S|k = H|k; and
for each transaction Tk that is complete, but not t-complete in H, S|k consists of the sequence of events in H|k,
immediately followed by tryAk() · Ak. Clearly, there is a completion of H that is equivalent to S.
Let F^i be the prefix of F of length i, and Ŝi be the prefix of S such that seq(Ŝi) = F^i.
Claim 3.8. Let Ĥ_i^j be a subsequence of Hj reduced to transactions Tk ∈ txns(Ŝi) such that the last event
of Tk in H is a response event and it is contained in Hj. Then, for every i, there is j such that Ŝi is a
serialization of Ĥ_i^j.
Proof. Let Hj be the shortest prefix of H (from L) such that for each T ∈ txns(Ŝi), if seq(Sj)[k] = T,
then for every j′ > j, seq(Sj′)[k] = T. From the construction of F, such j and k exist. Also, we observe
that txns(Ŝi) ⊆ txns(Sj) and F^i is a subsequence of seq(Sj). Using arguments similar to the proof of
Lemma 3.1, it follows that Ŝi is indeed a serialization of Ĥ_i^j.
Since H is complete, there is exactly one completion of H, where each transaction Tk that is not
t-complete in H is completed with tryCk · Ak after its last event. By Claim 3.8, the limit t-sequential
t-complete history is equivalent to this completion, is legal, respects the real-time order of H, and ensures
that every read is legal in the corresponding local serialization. Thus, S is a serialization of H.
[Figure 3.3 timeline: T1 performs W1(X, 1) and tryC1, which returns A1; T2 performs R2(X) → 1; T3 performs W3(X, 1) and tryC3, which returns C3.]
Figure 3.3: A history that is opaque, but not du-opaque.
Theorem 3.5 implies the following:
Corollary 3.9. Let M be a TM implementation that ensures that in every infinite history H of M ,
every transaction T ∈ txns(H) is complete in H. Then, M is du-opaque if and only if every finite history
of M is du-opaque.
3.4.4 Du-opacity vs. opacity
We now compare our deferred-update requirement with the conventional TM correctness property of
opacity [64].
Theorem 3.10. Du-opacity ⊊ Opacity.
Proof. We first claim that every finite du-opaque history is opaque. Let H be a finite du-opaque history.
By definition, there is a final-state serialization S of H. Since du-opacity is a prefix-closed property,
every prefix of H is final-state opaque. Thus, H is opaque.
Again, since every prefix of a du-opaque history is also du-opaque, by Definition 3.3, every infinite
du-opaque history is also opaque.
To show that the inclusion is strict, we present an opaque history that is not du-opaque. Consider
the finite history H depicted in Figure 3.3: transaction T2 performs a read2(X) that returns the value
1. Observe that read2(X) → 1 is concurrent to tryC1, but precedes tryC3 in real-time order. Although
tryC1 returns A1 in H, the response of read2(X) can be justified since T3 concurrently writes 1 to
X and commits. Thus, read2(X) → 1 reads-from transaction T3 in any serialization of H, but since
read2(X) ≺_H^{RT} tryC3, H is not du-opaque even though each of its prefixes is final-state opaque.
We now formally prove that H is opaque. We proceed by examining every prefix of H.
1. Each prefix up to the invocation of read2(X) is trivially final-state opaque.
2. Consider the prefix, Hi of H where the ith event is the response of read2(X). Let Si be a t-
complete t-sequential history derived from the sequence T1, T2 by inserting C1 immediately after
the invocation of tryC1(). It is easy to see that Si is a final-state serialization of Hi.
3. Consider the t-complete t-sequential history S derived from the sequence T1, T3, T2 in which each
transaction is t-complete in H. Clearly, S is a final-state serialization of H.
Since H and every (proper) prefix of it are final-state opaque, H is opaque.
Clearly, the required final-state serialization S of H is specified by seq(S) = T1, T3, T2 in which T1 is
aborted while T3 is committed in S (the position of T1 in the serialization does not affect legality).
Consider read2(X) in S; since H^{2,X}, the prefix of H up to the response of read2(X), does not contain an
invocation of tryC3(), the local serialization of read2(X) with respect to H and S, S^{2,X}_H, is T1 · read2(X).
But read2(X) is not legal in S^{2,X}_H, which is a contradiction. Thus, H is not du-opaque.
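As an illustration only, the following Python sketch recomputes the local serialization of read2(X) for a history shaped like the one in Figure 3.3. The event encoding, the exact interleaving of events, the initial value 0 and the helper names local_serialization and legal_value are assumptions made for this sketch; they are not part of the formal model.

```python
INITIAL = 0  # assumed initial value of every t-object

# Assumed interleaving: tryC1 is invoked before read2(X) responds,
# while tryC3 is invoked only after that response.
# Each event is (transaction, kind, t-object, value).
H = [
    ("T1", "write", "X", 1),
    ("T1", "tryC_inv", None, None),
    ("T3", "write", "X", 1),
    ("T2", "read_resp", "X", 1),
    ("T1", "abort", None, None),
    ("T3", "tryC_inv", None, None),
    ("T3", "commit", None, None),
]


def local_serialization(history, seq, reader, obj):
    """Transactions that precede `reader` in `seq` and have invoked tryC
    in the prefix of `history` ending at the response of reader's read of obj."""
    idx = next(
        i for i, (t, kind, o, _) in enumerate(history)
        if t == reader and kind == "read_resp" and o == obj
    )
    invoked = {t for (t, kind, _, _) in history[: idx + 1] if kind == "tryC_inv"}
    return [t for t in seq[: seq.index(reader)] if t in invoked]


def legal_value(history, txns, obj):
    """Value a read of obj must return after the committed writes of `txns`."""
    value = INITIAL
    for txn in txns:
        if any(t == txn and kind == "commit" for (t, kind, _, _) in history):
            for (t, kind, o, v) in history:
                if t == txn and kind == "write" and o == obj:
                    value = v
    return value


seq = ["T1", "T3", "T2"]                        # seq(S) from the proof
local = local_serialization(H, seq, "T2", "X")  # only T1 survives
print(local, legal_value(H, seq[:2], "X"), legal_value(H, local, "X"))
```

Under these assumptions the read is legal in S, where the committed T3 precedes T2, but not in the local serialization, which contains only the aborted T1.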
The unique-write case We now show that du-opacity is equivalent to opacity assuming that no two
transactions write identical values to the same t-object (“unique-write” assumption).
[Figure 3.4, timeline diagram. T1: W1(X, 1), C1. T2: R2(X) → 1, R2(Y) → 1. T3: W3(X, 1), W3(Y, 1), C3.]
Figure 3.4: A sequential du-opaque history, which is not opaque by the definition of [59].
Let Opacityuw ⊆ Opacity be a property defined as follows:
1. an infinite opaque history H ∈ Opacityuw if and only if every transaction T ∈ txns(H) is complete
in H, and
2. an opaque history H ∈ Opacityuw if and only if for every pair of write operations writek(X, v) and
writem(X, v′), v ≠ v′.
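The unique-write condition can be checked mechanically. The sketch below is a minimal illustration, assuming that the writes of a history are given as (transaction, t-object, value) triples; the function name satisfies_unique_writes is hypothetical, and the sketch follows the reading in which the condition concerns writes by two different transactions.

```python
def satisfies_unique_writes(writes):
    """Return True if no two distinct transactions write the same value
    to the same t-object. `writes` holds (transaction, t-object, value)."""
    writer_of = {}
    for txn, obj, value in writes:
        # Remember the first writer of (obj, value); a different writer violates the condition.
        previous = writer_of.setdefault((obj, value), txn)
        if previous != txn:
            return False
    return True


# T1 and T3 both write 1 to X, as in the history of Figure 3.3.
print(satisfies_unique_writes([("T1", "X", 1), ("T3", "X", 1)]))
print(satisfies_unique_writes([("T1", "X", 1), ("T3", "X", 2), ("T3", "Y", 1)]))
```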
Theorem 3.11. Opacityuw = du-opacity.
Proof. We show first that every finite history H ∈ Opacityuw is also du-opaque. Let H be any finite
opaque history such that for every pair of write operations writek(X, v) and writem(X, v′), performed by
transactions Tk, Tm ∈ txns(H), respectively, v ≠ v′.
Since H is opaque, there is a final-state serialization S of H. Suppose by contradiction that H is
not du-opaque. Thus, there is a readk(X) that returns a value v ∈ V in S that is not legal in S^{k,X}_H,
the local serialization of readk(X) with respect to H and S. Let H^{k,X} and S^{k,X} denote the prefixes
of H and S, respectively, up to the response of readk(X) in H and S. Recall that S^{k,X}_H is the
subsequence of S^{k,X} that does not contain events of any transaction Ti ∈ txns(H) whose tryCi() is
not invoked in H^{k,X}. Since readk(X)
is legal in S, there is a committed transaction Tm ∈ txns(H) that performs writem(X, v) that is the
latest such write in S that precedes Tk. Thus, if readk(X) is not legal in S^{k,X}_H, the only possibility
is that readk(X) ≺RTH tryCm(). Under the assumption of unique writes, there does not exist any other
transaction Tj ∈ txns(H) that performs writej(X, v). Consequently, there does not exist any completion
H̄^{k,X} of H^{k,X} and a t-complete t-sequential history S′, such that S′ is equivalent to H̄^{k,X} and S′
contains a committed transaction that writes v to X. That is, H^{k,X} is not final-state opaque. However,
since H is opaque, every prefix of H must be final-state opaque, which is a contradiction.
By Definition 3.3, an infinite history H is opaque if every finite prefix of H is final-state opaque.
Theorem 3.5 now implies that Opacityuw ⊆ du-opacity.
Definition 3.3 and Corollary 3.2 imply that du-opacity ⊆ Opacityuw.
The sequential-history case The deferred-update semantics was mentioned by Guerraoui et al. [59]
and later adopted by Kuznetsov and Ravi [90]. In both papers, opacity is only defined for sequential
histories, where every invocation of a t-operation is immediately followed by a matching response. In
particular, these definitions require the final-state serialization to respect the read-commit order : in these
definitions, a history H is opaque if there is a final-state serialization S of H such that if a t-read of a
t-object X by a transaction Tk precedes the tryC of a transaction Tm that commits on X in H, then
Tk precedes Tm in S. As we observed in Figure 3.4, this definition is not equivalent to opacity even for
sequential histories.
The property considered in [59, 90] is strictly stronger than du-opacity: the sequential history H in
Figure 3.4 is du-opaque (and consequently opaque by Theorem 3.10): a du-opaque serialization (in fact
the only possible one) for this history is T1, T3, T2. However, in the restriction of opacity defined above, T2
must precede T3 in any serialization, since the response of read2(X) precedes the invocation of tryC3().
3.5 Strict serializability with DU semantics
In this section, we discuss the deferred-update restriction of strict serializability from Definition 2.2.
First, we remark that, just as final-state opacity, strict serializability is not prefix-closed (cf. Figure 3.1).
However, we show that the restriction of deferred-update semantics applied to strict serializability induces
a safety property.
Definition 3.6 (Strict serializability with du semantics). A finite history H is strictly serializable if
there is a legal t-complete t-sequential history S, such that
1. there is a completion H̄ of H, such that S is equivalent to cseq(H̄), where cseq(H̄) is the
subsequence of H̄ reduced to committed transactions in H̄,
2. for any two transactions Tk, Tm ∈ txns(H), if Tk ≺RTH Tm, then Tk precedes Tm in S, and
3. each readk(X) in S that does not return Ak is legal in S^{k,X}_H.
Notice that every du-opaque history is strictly serializable, but not vice-versa.
Theorem 3.12. Strict serializability is a safety property.
Proof. Observe that any strictly serializable serialization of a finite history H does not include events of
any transaction that has not invoked tryC in H.
To show prefix-closure, a proof almost identical to that of Lemma 3.1 implies that, given a strictly
serializable history H and a serialization S, there is a serialization S′ of H ′ (H ′ is some prefix of H) such
that seq(S′) is a prefix of seq(S).
To show limit-closure, consider an infinite sequence of finite strictly serializable histories
H0, . . . , Hi, Hi+1, . . . ,
where Hi+1 is a one-event extension of Hi. We prove that the infinite limit H of this ever-extending
sequence is strictly serializable. As in Theorem 3.5, we construct an infinite rooted directed graph GH:
a vertex is a tuple (Hi, Si) (note that for each i ∈ N, there may be several vertices of this form), where
Si is a serialization of Hi, and there is an edge from (Hi, Si) to (Hi+1, Si+1) if seq(Si) is a prefix of
seq(Si+1). The resulting graph is finitely branching since the out-degree of a vertex is bounded by the
number of possible serializations of a history. Observe that for every vertex (Hi+1, Si+1), there is a
vertex (Hi, Si) such that seq(Si) is a prefix of seq(Si+1). Thus, GH is connected since we can iteratively
construct a path from the root (H0, S0) to every vertex (Hi, Si) in GH. Applying König’s Path Lemma
to GH, we obtain an infinite sequence of distinct vertices, (H0, S0), (H1, S1), . . . , (Hi, Si), . . .. Then,
S = lim_{i→∞} Si gives the desired serialization of H.
3.6 Du-opacity vs. other deferred-update criteria
In this section, we first study two relaxations of opacity: Virtual-world consistency [85] and Transactional
Memory Specification 1 [43]. We then study Transactional Memory Specification 2 which is a restriction
of opacity.
[Figure 3.5, timeline diagram. T1: R1(X) → 1, R1(Y) → 0, A1. T2: W2(X, 1), C2. T3: R3(X) → 0, W3(Y, 1), C3.]
Figure 3.5: A history that is du-VWC, but not du-opaque.
3.6.1 Virtual-world consistency
Virtual World Consistency (VWC) [85] was proposed as a relaxation of opacity (in our case, du-opacity),
where each aborted transaction should be consistent with its causal past (but not necessarily with a
serialization formed by committed transactions). Intuitively, a transaction T1 causally precedes T2 if
T2 reads a value written and committed by T1. The original definition [85] required that no two write
operations are ever invoked with the same argument (the unique-writes assumption). Therefore, the
causal precedence is unambiguously identified for each transactional read. Below we give a more general
definition.
Given a t-sequential legal history S and transactions Ti, Tj ∈ txns(S), we say that Ti reads X from Tj if
(1) Ti reads v in X and (2) Tj is the last committed transaction that writes v to X and precedes Ti in
S.
Now consider a (not necessarily t-sequential) history H. We say that Ti could have read X from Tj in
H if Tj writes a value v to a t-object X, Ti reads v in X, and readi(X) ⊀RTH tryCj().
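The "could have read" relation can be sketched as a predicate over a list of events. The encoding below, the function name could_have_read and the sample history are hypothetical. The sketch also adopts the stricter reading stated later in this section, namely that the writer must already have invoked tryC when the read responds.

```python
def could_have_read(history, reader, writer, obj):
    """True if `reader` could have read `obj` from `writer` in `history`.

    Each event is (transaction, kind, t-object, value), with kind in
    {"write", "read_resp", "tryC_inv", "commit", "abort"}.
    """
    # Locate the response of the reader's t-read of obj.
    read = next(
        ((pos, val) for pos, (t, kind, o, val) in enumerate(history)
         if t == reader and kind == "read_resp" and o == obj),
        None,
    )
    if read is None:
        return False
    read_pos, read_val = read
    # The writer must write the value that was read.
    writes_value = any(
        t == writer and kind == "write" and o == obj and val == read_val
        for (t, kind, o, val) in history
    )
    # The read must not precede the invocation of the writer's tryC.
    tryc_pos = next(
        (pos for pos, (t, kind, _, _) in enumerate(history)
         if t == writer and kind == "tryC_inv"),
        None,
    )
    return writes_value and tryc_pos is not None and tryc_pos < read_pos


sample = [
    ("T1", "write", "X", 1), ("T1", "tryC_inv", None, None),
    ("T2", "read_resp", "X", 1), ("T1", "commit", None, None),
    ("T3", "write", "X", 1), ("T3", "tryC_inv", None, None),
    ("T3", "commit", None, None),
]
print(could_have_read(sample, "T2", "T1", "X"), could_have_read(sample, "T2", "T3", "X"))
```

In the sample, T2 could have read X from T1 but not from T3, because T3 invokes tryC only after the read has returned.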
Given T ⊆ txns(H), let HT denote the subsequence of H restricted to events of transactions in T .
Definition 3.7 (du-VWC). A finite history H is du-virtual-world consistent if it is strictly serializable
(with du-semantics), and for every aborted or t-incomplete transaction Ti ∈ txns(H), there is T ⊆
txns(H) including Ti and a t-sequential t-complete legal history S such that:
1. S is equivalent to a completion of HT ,
2. For all Tj , Tk ∈ txns(S), if Tj reads X from Tk in S, then Tj could have read X from Tk in H,
3. S respects the per-process order of H: if Tj and Tk are executed by the same process and Tj ≺RTH Tk,
then Tj ≺S Tk.
We refer to S as a du-VWC serialization for Ti in H.
Intuitively, with every t-read on X performed by Ti in H, the du-VWC serialization S associates some
transaction Tj from which Ti could have read the value of X. Recursively, with every read performed
by Tj , S associates some Tm from which Tj could have read, etc. Altogether, we get a “plausible” causal
past of Ti that constitutes a serial history. Notice that to ensure deferred-update semantics, we only
allow a transaction Tj to read from a transaction Tk that invoked tryCk by the time of the read operation
of Tj .
We now prove that du-VWC is a strictly weaker property than du-opacity. Since du-TMS2 is strictly
stronger than du-opacity, it follows that Du-TMS2 ⊊ du-VWC.
Theorem 3.13. Du-opacity ⊊ du-VWC.
Proof. If a history H is du-opaque, then there is a du-opaque serialization S equivalent to H̄, where H̄ is
some completion of H. By construction, S is a total order on the set of all transactions that participate
in S. Trivially, by taking T = txns(H), we derive that S is a du-VWC serialization for every aborted or
t-incomplete transaction Ti ∈ txns(H). Indeed, S respects the real-time order and, thus, the per-process
order of H. Since S respects the deferred-update order in H, every t-read in S “could have happened”
in H.
To show that the inclusion is strict, Figure 3.5 depicts a history H that is du-VWC, but not du-opaque.
Clearly, H is strictly serializable. Here T2, T1 is the required du-VWC serialization for aborted transaction
T1. However, H has no du-opaque serialization.
Theorem 3.14. Du-VWC is a safety property.
Proof. By Definition 3.7, a history H is du-VWC if and only if H is strictly serializable and there is a
du-VWC serialization for every transaction Ti ∈ txns(H) that is aborted or t-incomplete in H.
To prove prefix-closure, recall that strict serializability is a prefix-closed property (Theorem 3.12). There-
fore, any du-VWC serialization S for a transaction Ti in history H is also a du-VWC serialization S for
a transaction Ti in any prefix of H that contains events of Ti.
To prove limit-closure, consider an infinite sequence of du-VWC histories H0, H1, . . ., Hi, Hi+1 , . . .,
where each Hi+1 is the one-event extension of Hi and prove that the infinite limit, H of this sequence is
also a du-VWC history. Theorem 3.12 establishes that there is a strictly serializable serialization for H.
Since, for all i ∈ N, Hi is du-VWC, for every transaction Tj that is t-incomplete or aborted in Hi,
there is a du-VWC serialization for Tj. Consequently, there is a du-VWC serialization for every aborted or
t-incomplete transaction in H.
3.6.2 Transactional memory specification (TMS)
Transactional Memory Specification (TMS) 1 and 2 were formulated in I/O automata [43]. Following [15],
we adapt these definitions to our framework and explicitly introduce the deferred-update requirement.
TMS1. Given a history H, TMS1 requires us to justify the behavior of all committed transactions in H
by a legal t-complete t-sequential history that preserves the real-time order in H (strict serializability),
and to justify the response of each complete t-operation performed in H by a legal t-complete t-sequential
history S. The t-sequential history S used to justify a complete t-operation opk,i (the ith t-operation
performed by transaction Tk) includes Tk and a subset of transactions from H whose operations justify
opk,i. (Our description follows [15].)
Let Hk,i denote the prefix of a history H up to (and including) the response of ith t-operation opk,i of
transaction Tk. We say that a history H ′′ is a possible past of Hk,i if H ′′ is a subsequence of Hk,i and
consists of all events of transaction Tk and all events from some subset of committed transactions and
transactions that have invoked tryC in Hk,i such that if a transaction T ∈ H ′′, then for a transaction
T ′ ≺RTHk,i T , T ′ ∈ H ′′ if and only if T ′ is committed in Hk,i. Let cTMSpast(H, opk,i) denote the set of
possible pasts of Hk,i.
For any history H′′ ∈ cTMSpast(H, opk,i), let ccomp(H′′) denote the history generated from H′′ by the
following procedure: for all m ≠ k, replace every event Am by Cm and complete every incomplete tryCm
by including Cm at the end of H′′; then include tryCk · Ak at the end of H′′.
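The ccomp procedure is mechanical enough to sketch in code. The following is a minimal illustration, assuming a possible past is given as a list of (transaction, event) pairs in which the strings "tryC_inv", "commit" and "abort" mark the relevant events; this encoding and the sample data are hypothetical.

```python
def ccomp(past, k):
    """Apply the ccomp procedure to a possible past of an operation of transaction k."""
    # Step 1: for every m != k, replace each abort event A_m by C_m.
    out = [
        (txn, "commit") if txn != k and kind == "abort" else (txn, kind)
        for txn, kind in past
    ]
    # Step 2: complete every still-pending tryC_m (m != k) with C_m at the end.
    pending = []
    for txn, kind in out:
        if txn == k:
            continue
        if kind == "tryC_inv":
            pending.append(txn)
        elif kind == "commit" and txn in pending:
            pending.remove(txn)
    out.extend((txn, "commit") for txn in pending)
    # Step 3: append tryC_k followed by A_k.
    out.extend([(k, "tryC_inv"), (k, "abort")])
    return out


past = [
    ("T3", "write Z 3"), ("T3", "tryC_inv"),
    ("T2", "write X 2"), ("T2", "tryC_inv"), ("T2", "abort"),
    ("T4", "read Z->3"),
]
result = ccomp(past, "T4")
print(result)
```

In the sample, the abort of T2 becomes a commit, the pending tryC of T3 is completed with a commit, and T4 is closed with tryC followed by an abort.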
Definition 3.8 (du-TMS1). A history H satisfies du-TMS1 if
1. H is strictly serializable (with du-semantics), and
2. for each complete t-read opk,i that returns a non-Ak response in H, there exist a legal t-complete
t-sequential history S and a history H′ such that:
– H′ = ccomp(H′′), where H′′ ∈ cTMSpast(H, opk,i),
– H′ is equivalent to S, and
– for any two transactions Tk and Tm in H′, if Tk ≺RTH′ Tm then Tk <S Tm.
We refer to S as the du-TMS1 serialization for opk,i.
[Figure 3.6, timeline diagram. T1: W1(X, 1), tryC1, C1. T2: W2(X, 0), tryC2, C2. T3: R3(X) → 0, tryC3, A3.]
Figure 3.6: A history which is du-VWC but not du-TMS1.
[Figure 3.7, timeline diagram. T1: R1(X) → 0, W1(Y, 1), tryC1, C1. T2: W2(X, 2), tryC2, C2. T3: R3(X) → 0, W3(Z, 3), tryC3, A3. T4: R4(X) → 2, R4(Y) → 0, R4(Z) → 3, tryC4, A4.]
Figure 3.7: A history which is du-TMS1 but not du-VWC.
Theorem 3.15. Du-TMS1 is a safety property.
Proof. A history H is du-TMS1 if and only if H is strictly serializable and there is a du-TMS1 serialization
for every t-operation opk,i that does not return Ak in H.
To see that du-TMS1 is prefix-closed, recall that strict serializability is a prefix-closed property. Let H
be any du-TMS1 history and Hi, any prefix of H. We now need to prove that, for every t-operation
opk,i ≠ tryCk that returns a non-Ak response in Hi, there is a du-TMS1 serialization for opk,i. But this
is immediate since the du-TMS1 serialization for opk,i in H is also the required du-TMS1 serialization
for opk,i in Hi.
To see that du-TMS1 is limit-closed, consider an infinite sequence
H0, H1, . . . , Hi, Hi+1, . . .
of finite du-TMS1 histories, such that Hi+1 is a one-event extension of Hi. Let H be the corresponding
infinite limit history. We want to show that H is also du-TMS1.
Since strict serializability is a limit-closed property (Theorem 3.12), H is strictly serializable. By
assumption, for all i ∈ N, Hi is du-TMS1. Thus, for every transaction Tk that participates in Hi, there is a
du-TMS1 serialization Sk,i for each t-operation opk,i. But Sk,i is also the required du-TMS1 serialization
for opk,i in H. The claim follows.
It has been shown [98] that Opacity is a strictly stronger property than du-TMS1, that is, Opacity ⊊
du-TMS1. Since Du-Opacity ⊊ Opacity (Theorem 3.10) it follows that Du-Opacity ⊊ du-TMS1. On the
other hand, du-TMS1 is incomparable to du-VWC, as demonstrated by the following examples.
Proposition 3.2. There is a history that is du-TMS1, but not du-VWC.
Proof. Figure 3.7 depicts a history H that is du-TMS1, but not du-VWC. Observe that H is strictly
serializable. To prove that H is du-TMS1, we need to prove that there is a TMS1 serialization for each
t-read that returns a non-abort response in H. Clearly, the serialization in which only T3 participates
is the required TMS1 serialization for read3(X) → 0. Now consider the aborted transaction T4. The
TMS1 serialization for read4(X)→ 2 is T2, T4, while the TMS1 serialization that justifies the response of
read4(Y) → 0 includes just T4 itself. The only nontrivial t-read whose response needs to be justified is
read4(Z) → 3. Indeed, tryC3 overlaps with read4(Z) and thus, the response of read4(Z) can be justified by
choosing transactions in cTMSpast(H, read4(Z)) to be {T3, T2, T4} and then deriving a TMS1 serialization
S = T3, T2, T4 for read4(Z) → 3 in which tryC3 may be completed by including the commit response.
However, H is not du-VWC. Consider transaction T3 which returns A3 in H: T3 must be aborted in any
serialization equivalent to some direct causal past of T4. But read4(Z) returns the value 3 that is written
by T3. Thus, read4(Z) cannot be legal in any du-VWC serialization for T4.
[Figure 3.8, timeline diagram. T1: R1(X) → 0, W1(X, 1), tryC1, C1. T2: R2(X) → 0, W2(Y, 1), tryC2, C2.]
Figure 3.8: A history that is du-opaque, but not TMS2 [43].
Proposition 3.3. There is a history that is du-VWC, but not du-TMS1.
Proof. Figure 3.6 depicts a history H that is du-VWC, but not du-TMS1. Clearly, H is strictly serial-
izable. Observe that T3 could have read only from T1 in H (T1 writes the value 0 to X that is returned
by read3(X)). Therefore, T1, T3 is the required du-VWC serialization for aborted transaction T3.
However, H is not du-TMS1: since both transactions T1 and T2 are committed and precede T3 in real-
time order, they must be included in any du-TMS1 serialization for read3(X)→ 0. But there is no such
du-TMS1 serialization that would ensure the legality of read3(X).
TMS2. We now study the TMS2 definition which imposes an extra restriction on the opaque serializa-
tion.
Definition 3.9 (du-TMS2). A history H is du-TMS2 if there is a legal t-complete t-sequential history
S equivalent to some completion H̄ of H such that
1. for any two transactions Tk, Tm ∈ txns(H), such that Tm is a committed updating transaction, if
Ck ≺RTH tryCm or Ak ≺RTH tryCm, then Tk ≺S Tm, and
2. for any two transactions Tk, Tm ∈ txns(H), if Tk ≺RTH Tm, then Tk <S Tm, and
3. each readk(X) in S that does not return Ak is legal in S^{k,X}_H.
It has been shown [98] that TMS2 is a strictly stronger property than Opacity, i.e., TMS2 ⊊ Opacity.
We now show that du-TMS2 is strictly stronger than du-opacity. Indeed, from Definition 3.9, we observe
that every history that is du-TMS2 is also du-opaque. The following proposition completes the proof.
Proposition 3.4. There is a history that is du-opaque, but not du-TMS2.
Proof. Figure 3.8 depicts a history H that is du-opaque, but not du-TMS2. Indeed, there is a du-opaque
serialization S of H such that seq(S) = T2, T1. On the other hand, since T1 commits before T2, T1 must
precede T2 in any du-TMS2 serialization, and no such serialization ensures that every
t-read is legal. Thus, H is not du-TMS2.
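The argument can be replayed by brute force on a history shaped like Figure 3.8. The sketch below is only illustrative: the interleaving of events, the initial value 0 and the helper names are assumptions. It models legality in the sequential order and condition 1 of Definition 3.9 only; the real-time order and the local-serialization condition are not modelled (in the assumed interleaving the two transactions are concurrent).

```python
from itertools import permutations

INITIAL = 0  # assumed initial value of every t-object

# Assumed interleaving: both reads return 0, and C1 precedes the invocation of tryC2.
H = [
    ("T1", "read", "X", 0), ("T2", "read", "X", 0),
    ("T1", "write", "X", 1), ("T1", "tryC", None, None), ("T1", "commit", None, None),
    ("T2", "write", "Y", 1), ("T2", "tryC", None, None), ("T2", "commit", None, None),
]


def is_legal(history, order):
    """Replay the transactions sequentially in `order` and check every read."""
    state = {}
    for txn in order:
        local = {}
        for t, kind, obj, val in history:
            if t != txn:
                continue
            if kind == "read" and val != local.get(obj, state.get(obj, INITIAL)):
                return False
            if kind == "write":
                local[obj] = val
        if any(t == txn and kind == "commit" for t, kind, _, _ in history):
            state.update(local)  # only committed writes become visible
    return True


def respects_commit_order(history, order):
    """Condition 1: if Tk completes before tryCm of a committed updating Tm, Tk precedes Tm."""
    pos = {(t, kind): i for i, (t, kind, _, _) in enumerate(history)}
    for tk in order:
        for tm in order:
            updating = any(t == tm and kind == "write" for t, kind, _, _ in history)
            if tk == tm or (tm, "commit") not in pos or not updating:
                continue
            done = pos.get((tk, "commit"), pos.get((tk, "abort")))
            if done is not None and done < pos[(tm, "tryC")] and order.index(tk) > order.index(tm):
                return False
    return True


legal = [o for o in permutations(["T1", "T2"]) if is_legal(H, o)]
du_tms2 = [o for o in legal if respects_commit_order(H, o)]
print(legal, du_tms2)
```

Under these assumptions the only legal order is T2, T1, and it violates condition 1, so no candidate serialization remains.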
Theorem 3.16. Du-TMS2 is prefix-closed.
Proof. Let H be any du-TMS2 history. Then, H is also du-opaque. By Corollary 3.2, for every i ∈ N,
there is a du-opaque serialization Si for Hi. We now need to prove that, for any two transactions
Tk, Tm ∈ txns(Hi), such that Tm is a committed updating transaction, if Ck ≺RTHi tryCm or Ak ≺RTHi
tryCm, there is a du-opaque serialization Si with the restriction that Tk ≺Si Tm.
Suppose by contradiction that there exist transactions Tk, Tm ∈ txns(Hi), such that Tm is a committed
updating transaction and Ck ≺RTHi tryCm or Ak ≺RTHi tryCm, but Tm must precede Tk in any du-opaque
serialization Si. Since Tm ⊀RTHi Tk, the only possibility is that Tm performs writem(X, v) and there is
readk(X) → v. However, by our assumption, readk(X) ≺RTHi tryCm: thus, readk(X) is not legal in
its local serialization with respect to Hi and Si, contradicting the assumption that Si is a du-opaque
serialization of Hi. Thus, there is a du-TMS2 serialization for Hi, proving that du-TMS2 is a prefix-closed
property.
             du-opacity   du-VWC   du-TMS1   du-TMS2
du-opacity       -          ⊊        ⊊         ⊋
du-VWC           ⊋          -        ×         ⊋
du-TMS1          ⊋          ×        -         ⊋
du-TMS2          ⊊          ⊊        ⊊         -
Table 3.1: Relations between TM consistency definitions.
Proposition 3.5. Du-TMS2 is not limit-closed.
Proof. The counter-example to establish that du-opacity is not limit-closed (Figure 3.2) also shows that
du-TMS2 is not limit-closed: all histories discussed in the counter-example are in du-TMS2.
3.7 Related work and Discussion
The properties discussed in this chapter explicitly preclude reading from a transaction that has not yet
invoked tryCommit, which makes them prefix-closed and facilitates their verification. We believe that
this constructive definition is useful to TM practitioners, since it streamlines possible implementations
of t-read and tryCommit operations.
We showed that du-opacity is limit-closed under the restriction that every operation eventually termi-
nates, while du-VWC and du-TMS1 are (unconditionally) limit-closed, which makes them safety prop-
erties [100].
Table 3.1 summarizes the containment relations between the properties discussed in this chapter: opacity,
du-opacity, du-VWC, du-TMS1 and du-TMS2. For example, “du-opacity ⊊ opacity” means that the set of
du-opaque histories is a proper subset of the set of opaque histories, i.e., du-opacity is a strictly stronger
property than opacity. Incomparable (not related by containment) properties, such as du-TMS1 and
du-VWC are marked with ×.
Linearizability [27, 83], when applied to objects whose sequential specifications have finite nondeterminism
(i.e., an operation applied to a given state may produce only finitely many outcomes), is a safety
property [66, 100]. Recently, it has been shown [66] that linearizability is not limit-closed if the implemented object
may expose infinite nondeterminism, that is, an operation applied to a given state may produce
infinitely many different outcomes. The limit-closure proof (cf. Theorem 3.5), using König’s lemma,
cannot be applied with infinite nondeterminism, because the out-degree of the graph GH , constructed
for the limit infinite history H, is not finite.
In contrast, the TM abstraction is deterministic, since reads and writes behave deterministically in
serial executions, yet du-opacity is not limit-closed. It turns out that the graph GH for the counter-
example history H in Figure 3.2 is not connected. For example, one of the finite prefixes of H can be
serialized as T3, T1, T2, but no prefix has a serialization T3, T1 and, thus, the root is not connected to
the corresponding vertex of GH . Thus, the precondition of König’s lemma does not hold for GH : the
graph is in fact an infinite set of isolated vertices. This is because du-opacity requires even incomplete
reading transactions, such as T2, to appear in the serialization, which is not the case for linearizability,
where incomplete operations may be removed from the linearization.
4
Complexity bounds for blocking TMs
"I can’t believe that!" said Alice
"Can’t you?" the Queen said in a
pitying tone. "Try again: draw a
long breath, and shut your eyes."
Alice laughed. "There’s no use
trying," she said: "one can’t
believe impossible things."
"I daresay you haven’t had much
practice," said the Queen. "When
I was your age, I always did it for
half-an-hour a day. Why,
sometimes I’ve believed as many as
six impossible things before
breakfast."
Lewis Carroll-Through the
Looking-Glass
4.1 Overview
In this chapter, we present complexity bounds for TM implementations that provide no non-blocking
progress guarantees for transactions and typically allow a transaction to block (delay) or abort in con-
current executions. We refer to Section 2.7 in Chapter 2 for an overview of the complexity metrics
considered in the thesis.
Sequential TMs. We start by presenting complexity bounds for single-lock TMs that satisfy sequential
TM-progress. We show that a read-only transaction in an opaque TM featured with weak DAP, weak
invisible reads, ICF TM-liveness and sequential TM-progress must incrementally validate every next
read operation. This results in a quadratic (in the size of the transaction’s read set) step-complexity
lower bound. Secondly, we prove that if the TM-correctness property is weakened to strict serializability,
there exist executions in which the tryCommit of some transaction must access a linear (in the size of the
transaction’s read set) number of distinct base objects. We then show that expensive synchronization
in TMs cannot be eliminated: even single-lock TMs must perform a RAW (read-after-write) or AWAR
(atomic-write-after-read) pattern [17].
Progressive TMs. We turn our focus to progressive TM implementations which allow a transaction
to be aborted only due to read-write conflicts with concurrent transactions. We introduce a new metric
called protected data size that, intuitively, captures the amount of data that a transaction must ex-
clusively control at some point of its execution. All progressive TM implementations we are aware of
(see, e.g., an overview in [62]) use locks or timing assumptions to give an updating transaction exclusive
access to all objects in its write set at some point of its execution. For example, lock-based progressive
implementations like TL [40] and TL2 [39] require that a transaction grabs all locks on its write set
before updating the corresponding base objects. Our result shows that this is an inherent price to pay
for providing progressive concurrency: every committed transaction in a progressive and strict DAP TM
implementation providing starvation-free TM-liveness must, at some point of its execution, protect every
t-object in its write set.
We also present a very cheap progressive opaque strict DAP TM implementation from read-write base
objects with constant expensive synchronization and constant memory stall complexities.
Strongly progressive TMs. We then prove that in any strongly progressive strictly serializable TM
implementation that accesses the shared memory with read, write and conditional primitives, such as
compare-and-swap and load-linked/store-conditional, the total number of remote memory references
(RMRs) that take place in an execution of a progressive TM in which n concurrent processes perform
transactions on a single t-object might reach Ω(n log n). The result is obtained via a reduction to an
analogous lower bound for mutual exclusion [22]. In the reduction, we show that any TM with the
above properties can be used to implement a deadlock-free mutual exclusion, employing transactional
operations on only one t-object and incurring a constant RMR overhead. The lower bound applies to
RMRs in both the cache-coherent (CC) and distributed shared memory (DSM) models, and it appears
to be the first RMR complexity lower bound for transactional memory.
We also present a constant expensive synchronization strongly progressive TM implementation from read-
write base objects. Our implementation provides starvation-free TM-liveness, thus showing one means of
circumventing the lower bound of Guerraoui et al. [64] who proved the impossibility of implementing strongly
progressive strictly serializable TMs providing wait-free TM-liveness from read-write base objects.
Permissive TMs. We conclude our study of blocking TMs by establishing a linear (in the transac-
tion’s data set size) separation between the worst-case transaction expensive synchronization complexity
of strongly progressive TMs and permissive TMs that allow a transaction to abort only if committing
it would violate opacity. Specifically, we show that an execution of a transaction in a permissive
opaque TM implementation that provides starvation-free TM-liveness may need to perform at least
one RAW/AWAR pattern per t-read.
Roadmap of Chapter 4. Section 4.2 studies “single-lock” TMs that provide minimal progressiveness
or sequential TM-progress, Section 4.3 is devoted to progressive TMs while Section 4.4 is on strongly
progressive TMs. In Section 4.5, we study the cost of permissive TMs that allow a transaction to abort
only if committing it would violate opacity. Finally, we present related work and open questions in
Section 4.6.
4.2 Sequential TMs
We begin with “single-lock”, i.e., sequential TMs. Our first result proves that a read-only transaction
in a sequential TM featured with weak DAP and weak invisible reads must incur the cost of validating
its read set. This results in a step-complexity lower bound that is quadratic in the size of the
transaction’s read set if we assume opacity, and linear if we assume strict serializability. Secondly, we show
that expensive synchronization cannot be avoided even in such sequential TMs, i.e., a serializable TM
must perform a RAW/AWAR even when transactions are guaranteed to commit only in the absence of
any concurrency.
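To see where a quadratic cost can come from, the sketch below counts the steps of a hypothetical reader that re-validates its whole read set before every new t-read. The cost model (one step per base-object access, one base object per t-object) and the function name are assumptions made purely for illustration; the sketch is not the lower-bound argument itself.

```python
def incremental_validation_steps(m):
    """Count steps of a read-only transaction that performs m t-reads and
    re-validates every previously read t-object before each new read."""
    read_set = []
    steps = 0
    for i in range(1, m + 1):
        steps += len(read_set)  # validate the i - 1 earlier reads (assumed one step each)
        steps += 1              # access the base object holding X_i (assumed one step)
        read_set.append(i)
    return steps


for m in (1, 4, 100):
    print(m, incremental_validation_steps(m))
```

Under this cost model the total is m + m(m − 1)/2 steps, which grows quadratically in the size of the read set.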
[Figure 4.1, two timeline diagrams.
(a) Rφ(Xi) must return nv by strict serializability. Tφ: Rφ(X1) · · · Rφ(Xi−1) (i − 1 t-reads), Rφ(Xi) → nv. Ti: Wi(Xi, nv), Ti commits.
(b) Ti does not observe any conflict with Tφ. Tφ: Rφ(X1) · · · Rφ(Xi−1) (i − 1 t-reads), Rφ(Xi) → nv (new value). Ti: Wi(Xi, nv), Ti commits.]
Figure 4.1: Executions in the proof of Lemma 4.1; by weak DAP, Tφ cannot distinguish the execution in
Figure 4.1b from the execution in Figure 4.1a.
We first prove the following auxiliary lemma that will be of use in subsequent proofs.
Lemma 4.1. Let M be any strictly serializable, weak DAP TM implementation that provides sequential
TM-progress and sequential TM-liveness. Then, for all i ∈ N, M has an execution of the form πi−1 · ρi · αi
where,
• πi−1 is the complete step contention-free execution of read-only transaction Tφ that performs (i−1)
t-reads: readφ(X1) · · · readφ(Xi−1),
• ρi is the t-complete step contention-free execution of a transaction Ti that writes nv ≠ v to Xi and
commits (v is the initial value of Xi),
• αi is the complete step contention-free execution fragment of Tφ that performs its ith t-read:
readφ(Xi) → nv.
Proof. By sequential TM-progress and sequential TM-liveness, M has an execution of the form ρi · πi−1.
Since Dset(Tφ) ∩ Dset(Ti) = ∅ in ρi · πi−1, by Lemma 2.10, transactions Tφ and Ti do not contend on
any base object in execution ρi · πi−1. Thus, πi−1 · ρi is also an execution of M.
By assumption of strict serializability, ρi · πi−1 · αi is an execution of M in which the t-read of Xi
performed by Tφ must return nv. But ρi · πi−1 · αi is indistinguishable to Tφ from πi−1 · ρi · αi. Thus,
M has an execution of the form πi−1 · ρi · αi.
4.2.1 A quadratic lower bound on step complexity
In this section, we present our step complexity lower bound for sequential TMs.
Theorem 4.2. For every weak DAP TM implementation M that provides ICF TM-liveness, sequential
TM-progress and uses weak invisible reads,
(1) if M is opaque, for every m ∈ N, there exists an execution E of M such that some transaction
Tk ∈ txns(E) performs Ω(m²) steps, where m = |Rset(Tk)|, and
(2) if M is strictly serializable, for every m ∈ N, there exists an execution E of M such that some
transaction Tk ∈ txns(E) accesses at least m − 1 distinct base objects during the executions of the
mth t-read operation and tryCk(), where m = |Rset(Tk)|.
Proof. For all i ∈ {1, . . . ,m}, let v be the initial value of t-object Xi.
(1) Suppose that M is opaque. Let πm denote the complete step contention-free execution of a transaction
Tφ that performs m t-reads: readφ(X1) · · · readφ(Xm) such that for all i ∈ {1, . . . ,m}, readφ(Xi) → v.
By Lemma 4.1, for all i ∈ {2, . . . ,m}, M has an execution of the form Ei = πi−1 · ρi · αi.
For each i ∈ {2, . . . ,m}, j ∈ {1, 2} and ℓ ≤ (i − 1), we now define an execution of the form E^i_{jℓ} =
πi−1 · βℓ · ρi · α^i_j as follows:
• βℓ is the t-complete step contention-free execution fragment of a transaction Tℓ that writes nvℓ ≠ v
to Xℓ and commits,
• α^i_1 (and resp. α^i_2) is the complete step contention-free execution fragment of readφ(Xi) → v (and
resp. readφ(Xi) → Aφ).
Claim 4.3. For all i ∈ {2, . . . ,m} and ℓ ≤ (i − 1), M has an execution of the form Ei1ℓ or Ei2ℓ.
Proof. For all i ∈ {2, . . . ,m}, π^{i−1} is an execution of M. By assumption of weak invisible reads and
sequential TM-progress, Tℓ must be committed in π^{i−1} · ρℓ and M has an execution of the form π^{i−1} · βℓ.
By the same reasoning, since Ti and Tℓ have disjoint data sets, M has an execution of the form π^{i−1} · βℓ · ρi.
Since the configuration after π^{i−1} · βℓ · ρi is quiescent, by ICF TM-liveness, π^{i−1} · βℓ · ρi extended with
readφ(Xi) must return a matching response. If readφ(Xi) → v, then clearly Ei1ℓ is an execution of M
with Tφ, Tℓ, Ti being a valid serialization of transactions. If readφ(Xi) → Aφ, the same serialization
justifies an opaque execution.
Suppose by contradiction that there exists an execution of M such that π^{i−1} · βℓ · ρi is extended with
the complete execution of readφ(Xi) → r; r ∉ {Aφ, v}. The only plausible case to analyse is when
r = nv. Since readφ(Xi) returns the value of Xi updated by Ti, the only possible serialization for
transactions is Tℓ, Ti, Tφ; but readφ(Xℓ) performed by Tφ that returns the initial value v is not legal in
this serialization, a contradiction.
We now prove that, for all i ∈ {2, . . . ,m}, j ∈ {1, 2} and ℓ ≤ (i − 1), transaction Tφ must access (i − 1)
different base objects during the execution of readφ(Xi) in the execution π^{i−1} · βℓ · ρi · αij.
By the assumption of weak invisible reads, the execution π^{i−1} · βℓ · ρi · αij is indistinguishable to transactions
Tℓ and Ti from the execution π̃^{i−1} · βℓ · ρi · αij, where Rset(Tφ) = ∅ in π̃^{i−1}. But transactions Tℓ and Ti
are disjoint-access in π̃^{i−1} · βℓ · ρi and by Lemma 2.10, they cannot contend on the same base object in
this execution.
Consider the (i − 1) different executions: π^{i−1} · β1 · ρi, . . ., π^{i−1} · β_{i−1} · ρi. For all ℓ, ℓ′ ≤ (i − 1); ℓ′ ≠ ℓ, M
has an execution of the form π^{i−1} · βℓ · ρi · βℓ′ in which transactions Tℓ and Tℓ′ access mutually disjoint
data sets. By weak invisible reads and Lemma 2.10, the pairs of transactions Tℓ′, Ti and Tℓ′, Tℓ do not
contend on any base object in this execution. This implies that π^{i−1} · βℓ · βℓ′ · ρi is an execution of M in
which transactions Tℓ and Tℓ′ each apply nontrivial primitives to mutually disjoint sets of base objects
in the execution fragments βℓ and βℓ′ respectively (by Lemma 2.10).
This implies that for any j ∈ {1, 2}, ℓ ≤ (i − 1), the configuration Ci after Ei differs from the configurations
after Eijℓ only in the states of the base objects that are accessed in the fragment βℓ. Consequently,
transaction Tφ must access at least i − 1 different base objects in the execution fragment π^i_j to distinguish
configuration Ci from the configurations that result after the (i − 1) different executions π^{i−1} · β1 · ρi,
. . ., π^{i−1} · β_{i−1} · ρi respectively.
Thus, for all i ∈ {2, . . . ,m}, transaction Tφ must perform at least i − 1 steps while executing the ith
t-read in π^i_j and Tφ itself must perform ∑_{i=1}^{m−1} i = m(m−1)/2 steps.
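The arithmetic behind the quadratic bound can be checked with a short Python sketch. It sums the per-read lower bound (the ith t-read accesses at least i − 1 base objects) and compares the total with the closed form m(m − 1)/2. The function names are illustrative assumptions and do not appear in the thesis.

```python
def min_validation_steps(m: int) -> int:
    """Sum the lower bound of (i - 1) base-object accesses over the t-reads i = 2..m."""
    return sum(i - 1 for i in range(2, m + 1))


def closed_form(m: int) -> int:
    """Closed form m(m - 1)/2 of the same sum."""
    return m * (m - 1) // 2


if __name__ == "__main__":
    for m in (1, 2, 5, 100):
        assert min_validation_steps(m) == closed_form(m)
    print(min_validation_steps(5))  # 1 + 2 + 3 + 4 = 10
```

For m = 5 the reads contribute 1 + 2 + 3 + 4 = 10 accesses, which is the value of m(m − 1)/2.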
(2) Suppose that M is strictly serializable, but not opaque. Since M is strictly serializable, by Lemma 4.1,
it has an execution of the form E = π^{m−1} · ρm · αm.
For each ℓ ≤ (m − 1), we prove that M has an execution of the form Eℓ = π^{m−1} · βℓ · ρm · ᾱm where
ᾱm is the complete step contention-free execution fragment of readφ(Xm) followed by the complete
execution of tryCφ. Indeed, by weak invisible reads, π^{m−1} does not contain any nontrivial events and
the execution π^{m−1} · βℓ · ρm is indistinguishable to transactions Tℓ and Tm from the executions π̃^{m−1} · βℓ
and π̃^{m−1} · βℓ · ρm respectively, where Rset(Tφ) = ∅ in π̃^{m−1}. Thus, applying Lemma 2.10, transactions
Tℓ and Tm do not contend on any base object in the execution π^{m−1} · βℓ · ρm. By ICF TM-liveness, readφ(Xm)
and tryCφ must return matching responses in the execution fragment ᾱm that extends π^{m−1} · βℓ · ρm.
Consequently, for each ℓ ≤ (m − 1), M has an execution of the form Eℓ = π^{m−1} · βℓ · ρm · ᾱm such that
transactions Tℓ and Tm do not contend on any base object.
Strict serializability of M means that if readφ(Xm) → nv in the execution fragment ᾱm, then tryCφ must
return Aφ. Otherwise if readφ(Xm) → v (i.e. the initial value of Xm), then tryCφ may return Aφ or Cφ.
Thus, as with (1), in the worst case, Tφ must access at least m − 1 distinct base objects during the
executions of readφ(Xm) and tryCφ to distinguish the configuration Ci from the configurations after the
m − 1 different executions π^{m−1} · β1 · ρm, . . ., π^{m−1} · β_{m−1} · ρm respectively.
4.2.2 Expensive synchronization in transactional memory cannot be eliminated
In this section, we show that serializable TMs must perform a RAW/AWAR even if transactions are
guaranteed to commit only when they run in the absence of any concurrency.
Theorem 4.4. Let M be a serializable TM implementation providing sequential TM-progress and
sequential TM-liveness. Then, every execution of M in which a transaction running t-sequentially performs
at least one t-read and at least one t-write contains a RAW/AWAR pattern.
Proof. Consider an execution π of M in which a transaction T1 running t-sequentially performs (among
other events) read1(X), write1(Y, v) and tryC1(). Since M satisfies sequential TM-progress and sequential
TM-liveness, T1 must commit in π. Clearly π must contain a write to a base object. Otherwise a
subsequent transaction reading Y would return the initial value of Y instead of the value written by T1.
Let πw be the first write to a base object in π. Thus, π can be represented as πs · πw · πf.
Now suppose by contradiction that π contains neither RAW nor AWAR patterns.
Since πs contains no writes, the states of base objects in the initial configuration and in the configuration
after πs is performed are the same. Consider an execution πs · ρ where in ρ, a transaction T2 performs
read2(Y), write2(X, 1), tryC2() and commits. Such an execution exists, since ρ is indistinguishable to T2
from an execution in which T2 runs t-sequentially and thus T2 cannot be aborted in πs · ρ.
Since πw contains no AWAR, πs · ρ · πw is an execution of M.
Since πw · πf contains no RAWs, every read performed in πw · πf is applied to base objects which were
previously written in πw · πf. Thus, there exists an execution πs · ρ · πw · πf, such that T1 cannot distinguish
πs · πw · πf and πs · ρ · πw · πf. Hence, T1 commits in πs · ρ · πw · πf.
But T1 reads the initial value of X and T2 reads the initial value of Y in πs · ρ · πw · πf, and thus T1 and
T2 cannot be both committed (at least one of the committed transactions must read the value written
by the other), a contradiction.
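The last step of the proof can be replayed mechanically. The Python sketch below is an illustration under simplifying assumptions: transactions are modelled only by the values they observed and the values they write, the initial value of every t-object is taken to be 0, and the names `T1`, `T2`, `is_legal` and `serializable` are introduced here for the example. It checks every sequential order of the two committed transactions.

```python
from itertools import permutations

INITIAL = 0  # assumed initial value of every t-object

# T1 reads X and writes Y; T2 reads Y and writes X; both observed initial values.
T1 = {"reads": {"X": INITIAL}, "writes": {"Y": 1}}
T2 = {"reads": {"Y": INITIAL}, "writes": {"X": 1}}


def is_legal(order):
    """Replay transactions one after another; each read must see the latest written value."""
    state = {}
    for txn in order:
        for obj, seen in txn["reads"].items():
            if state.get(obj, INITIAL) != seen:
                return False
        state.update(txn["writes"])
    return True


def serializable(txns):
    """True if some sequential order of the committed transactions is legal."""
    return any(is_legal(order) for order in permutations(txns))


if __name__ == "__main__":
    print(serializable([T1, T2]))
```

Neither order T1, T2 nor T2, T1 is legal, which is the contradiction the proof relies on.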
4.3 Progressive TMs
We move on to the stronger (than sequential TMs) class of progressive TMs. We introduce a new
metric called protected data size that, intuitively, captures the number of t-objects that a transaction
must exclusively control at some prefix of its execution. We first prove that any strict DAP progressive
opaque TM must protect its entire write set at some point in its execution.
Secondly, we describe a constant stall, constant RAW/AWAR strict DAP opaque progressive TM that
provides invisible reads and is implemented from read-write base objects.
4.3.1 A linear lower bound on the amount of protected data
Let M be a progressive TM implementation providing starvation-free TM-liveness. Intuitively, a t-object
Xj is protected at the end of some finite execution π of M if some transaction T0 is about to atomically
change the value of Xj in its next step (e.g., by performing a compare-and-swap) or does not allow any
concurrent transaction to read Xj (e.g., by holding a “lock” on Xj).
Formally, let α · π be an execution of M such that π is a t-sequential t-complete execution of a transaction
T0, where Wset(T0) = {X1, . . . , Xm}. Let uj (j = 1, . . . ,m) denote the value written by T0 to t-object
Xj in π. In this section, let π^t denote the t-th shortest prefix of π. Let π^0 denote the empty prefix.
For any Xj ∈ Wset(T0), let Tj denote a transaction that tries to read Xj and commit. Let E^t_j = α · π^t · ρ^t_j
denote the extension of α · π^t in which Tj runs solo until it completes. Note that, since we only require
the implementation to be starvation-free, ρ^t_j can be infinite.
We say that α · π^t is (1, j)-valent if the read operation performed by Tj in α · π^t · ρ^t_j returns uj (the value
written by T0 to Xj). We say that α · π^t is (0, j)-valent if the read operation performed by Tj in α · π^t · ρ^t_j
does not abort and returns an “old” value u ≠ uj. Otherwise, if the read operation of Tj aborts or never
returns in α · π^t · ρ^t_j, we say that α · π^t is (⊥, j)-valent.
Definition 4.1. We say that T0 protects an object Xj in α · π^t, where π^t is the t-th shortest prefix of π
(t > 0) if one of the following conditions holds: (1) α · π^t is (0, j)-valent and α · π^{t+1} is (1, j)-valent, or
(2) α · π^t or α · π^{t+1} is (⊥, j)-valent.
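Definition 4.1 can be read as a predicate over a table of valences. The Python sketch below is a hypothetical illustration: the table `valence[t][j]`, the sentinel `BOTTOM` and both function names are assumptions made for the example, and the valences themselves would have to come from analysing an actual implementation.

```python
BOTTOM = None  # stands for the ⊥ valence (the read aborts or never returns)


def protects(valence, t, j):
    """Definition 4.1: does T0 protect X_j in alpha . pi^t?

    valence[t][j] is 0, 1 or BOTTOM for the prefix pi^t and the t-object X_j.
    """
    if t <= 0 or t + 1 >= len(valence):
        return False  # the definition needs t > 0 and an existing next prefix pi^{t+1}
    now, nxt = valence[t][j], valence[t + 1][j]
    flips = now == 0 and nxt == 1                   # condition (1)
    blocked = now is BOTTOM or nxt is BOTTOM        # condition (2)
    return flips or blocked


def protected_count(valence, t):
    """Number of t-objects protected by T0 in alpha . pi^t."""
    return sum(protects(valence, t, j) for j in range(len(valence[t])))


if __name__ == "__main__":
    # Four prefixes pi^0..pi^3 of a transaction writing two t-objects (made-up data).
    table = [[0, 0], [0, BOTTOM], [1, BOTTOM], [1, 1]]
    print(protected_count(table, 1))
```

In the made-up table, both t-objects are protected at the prefix π^1: the first flips from 0-valent to 1-valent, the second is ⊥-valent.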
For strict disjoint-access-parallel progressive TMs, we show that every transaction running t-sequentially
must protect every t-object in its write set at some point of its execution.
We observe that no prefix of π can be 0-valent and 1-valent at the same time.
Lemma 4.5. There does not exist π^t, a prefix of π, and i, j ∈ {1, . . . ,m} such that α · π^t is both
(0, i)-valent and (1, j)-valent.
Proof. By contradiction, suppose that there exist i, j and α · π^t that is both (0, i)-valent and (1, j)-valent.
Since the implementation is strict DAP, there exists an execution of M, E^t_{ij} = α · π^t · ρ^t_j · ρ^t_i that
is indistinguishable to Ti from α · π^t · ρ^t_i. In E^t_{ij}, the only possible serialization is T0, Tj, Ti. But Ti
returns the “old” value of Xi and, thus, the serialization is not legal, a contradiction.
If α · π^t is (0, i)-valent (resp., (1, i)-valent) for some i, we say that it is 0-valent (resp., 1-valent). By
Lemma 4.5, the notions of 0-valence and 1-valence are well-defined.
Theorem 4.6. Let M be a progressive, opaque and strict disjoint-access-parallel TM implementation
that provides starvation-free TM-liveness. Let α · π be an execution of M, where π is a t-sequential
t-complete execution of a transaction T0. Then, there exists π^t, a prefix of π, such that T0 protects
|Wset(T0)| t-objects in α · π^t.
Proof. Let Wset(T0) = {X1, . . . , Xm}. Consider two cases:
(1) Suppose that π has a prefix π^t such that α · π^t is 0-valent and α · π^{t+1} is 1-valent. By Lemma 4.5,
there does not exist i such that α · π^t is (1, i)-valent and α · π^{t+1} is (0, i)-valent. Thus, for every
i ∈ {1, . . . ,m}, one of the following is true:
• α · π^t is (0, i)-valent and α · π^{t+1} is (1, i)-valent
• At least one of α · π^t and α · π^{t+1} is (⊥, i)-valent, i.e., the t-operation of Ti aborts or never
returns
In either case, T0 protects m t-objects in α · π^t.
(2) Now suppose that such π^t does not exist, i.e., there is no i ∈ {1, . . . ,m} and t ∈ {0, . . . , |π| − 1} such
that E^t_i exists and returns an old value, and E^{t+1}_i exists and returns a new value.
Suppose there exist s, t, 0 < s + 1 < t, S ⊆ {1, . . . ,m}, such that:
• α · π^s is 0-valent,
• α · π^t is 1-valent,
• for all r, s < r < t, and for all i ∈ S, α · π^r is (⊥, i)-valent.
We say that s + 1, . . . , t − 1 is a protecting fragment for t-objects {Xj | j ∈ S}.
Since M is opaque and progressive, α · π^0 = α is 0-valent and α · π is 1-valent. Thus, the assumption
of Case (2) implies that for each Xi, there exists a protecting fragment for {Xi}. In particular,
there exists a protecting fragment for {X1}.
Now we proceed by induction. Let π^{s+1}, . . . , π^{t−1} be a protecting fragment for {X1, . . . , X_{u−1}}
such that u ≤ m.
Now we claim that there must be a subfragment of s + 1, . . . , t − 1 that protects {X1, . . . , Xu}.
Suppose not. Thus, there exists r, s < r < t, such that α · π^r is (0, u)-valent or (1, u)-valent.
Suppose first that α · π^r is (1, u)-valent. Since α · π^s is (0, i)-valent for some i ≠ u, by Lemma 4.5
and the assumption of Case (2), there must exist s′, t′, s < s′ + 1 < t′ ≤ r such that
• α · π^{s′} is 0-valent,
• α · π^{t′} is 1-valent,
• for all r′, s′ < r′ < t′, α · π^{r′} is (⊥, u)-valent.
As a result, s′ + 1, . . . , t′ − 1 is a protecting fragment for {X1, . . . , Xu}. The case when α · π^r is
(0, u)-valent is symmetric, except that now we should consider fragment r, . . . , t instead of s, . . . , r.
Thus, there exists a subfragment of s + 1, . . . , t − 1 that protects {X1, . . . , Xu}. By induction, we
obtain a protecting fragment s′′ + 1, . . . , t′′ − 1 for {X1, . . . , Xm}. Thus, any prefix α · π^r, where
s′′ < r < t′′, protects exactly m t-objects.
In both cases, there is a prefix of α · π that protects exactly m t-objects.
The lower bound of Theorem 4.6 is tight: it is matched by all progressive implementations we are aware
of, including Algorithm 4.1 described in the next section.
4.3.2 A constant stall and constant expensive synchronization strict DAP opaque TM
In this section, we describe a cheap progressive, opaque TM implementation LP (Algorithm 4.1). In
LP, every transaction performs at most a single RAW, every t-read operation incurs O(1) memory
stalls, and exactly one version of every t-object is maintained at any prefix of an execution. Moreover, the
implementation is strict DAP and uses only read-write base objects.
Base objects. For every t-object Xj , LP maintains a base object vj that stores the value of Xj .
Additionally, for each Xj , we maintain a bit Lj , which if set, indicates the presence of an updating
transaction writing to Xj . Also, for every process pi and t-object Xj , LP maintains a single-writer bit
rij to which only pi is allowed to write. Each of these base objects may be accessed only via read and
write primitives.
Read operations. The implementation first reads the value of t-object Xj from base object vj and
then reads the bit Lj to detect contention with an updating transaction. If Lj is set, the transaction is
aborted; if not, read validation is performed on the entire read set. If the validation fails, the transaction
Algorithm 4.1 Strict DAP progressive opaque TM implementation LP; code for Tk executed by process pi
 1: Shared base objects:
 2:   vj, for each t-object Xj, allows reads and writes
 3:   rij, for each process pi and t-object Xj
 4:     single-writer bit
 5:     allows reads and writes
 6:   Lj, for each t-object Xj
 7:     allows reads and writes
 8: Local variables:
 9:   Rsetk, Wsetk for every transaction Tk;
10:     dictionaries storing {Xm, vm}
11: readk(Xj):
12:   if Xj ∉ Rset(Tk) then
13:     [ovj, kj] := read(vj)
14:     Rset(Tk) := Rset(Tk) ∪ {Xj, [ovj, kj]}
15:     if read(Lj) ≠ 0 then
16:       Return Ak
17:     if validate() then
18:       Return Ak
19:     Return ovj
20:   else
21:     [ovj, ⊥] := Rset(Tk).locate(Xj)
22:     Return ovj
23: writek(Xj, v):
24:   nvj := v
25:   Wset(Tk) := Wset(Tk) ∪ {Xj}
26:   Return ok
27: tryCk():
28:   if Wset(Tk) = ∅ then
29:     Return Ck
30:   locked := acquire(Wset(Tk))
31:   if ¬ locked then
32:     Return Ak
33:   if isAbortable() then
34:     release(Wset(Tk))
35:     Return Ak
      // Exclusive write access to each vj
36:   for all Xj ∈ Wset(Tk) do
37:     write(vj, [nvj, k])
38:   release(Wset(Tk))
39:   Return Ck
40: Function: release(Q):
41:   for all Xj ∈ Q do
42:     write(Lj, 0)
43:   for all Xj ∈ Q do
44:     write(rij, 0)
45:   Return ok
46: Function: acquire(Q):
47:   for all Xj ∈ Q do
48:     write(rij, 1)
49:   if ∃Xj ∈ Q; t ≠ i : read(rtj) = 1 then
50:     for all Xj ∈ Q do
51:       write(rij, 0)
52:     Return false
      // Exclusive write access to each Lj
53:   for all Xj ∈ Q do
54:     write(Lj, 1)
55:   Return true
56: Function: isAbortable():
57:   if ∃Xj ∈ Rset(Tk) : Xj ∉ Wset(Tk) ∧ read(Lj) ≠ 0 then
58:     Return true
59:   if validate() then
60:     Return true
61:   Return false
62: Function: validate():
      // Read validation: returns true iff some t-object in the read set has changed
63:   if ∃Xj ∈ Rset(Tk) : [ovj, kj] ≠ read(vj) then
64:     Return true
65:   Return false
is aborted. Otherwise, the implementation returns the value of Xj . For a read-only transaction Tk,
tryCk simply returns the commit response.
Updating transactions. The writek(X, v) implementation by process pi simply stores the value v
locally, deferring the actual updates to tryCk. During tryCk, process pi attempts to obtain exclusive
write access to every Xj ∈ Wset(Tk). This is realized through the single-writer bits, which ensure that
no other transaction may write to base objects vj and Lj until Tk relinquishes its exclusive write access
to Wset(Tk). Specifically, process pi writes 1 to each rij , then checks that no other process pt has written
1 to any rtj by executing a series of reads (incurring a single RAW). If there exists such a process that
concurrently contends on the write set of Tk, then, for each Xj ∈ Wset(Tk), pi writes 0 to rij and aborts Tk. If
successful in obtaining exclusive write access to Wset(Tk), pi sets the bit Lj for each Xj in its write
set. Implementation of tryCk now checks if any t-object in its read set is concurrently contended by
another transaction and then validates its read set. If there is contention on the read set or validation
fails (indicating the presence of a conflicting transaction), the transaction is aborted. If not, pi writes the
values of the t-objects to shared memory and relinquishes exclusive write access to each Xj ∈Wset(Tk)
by writing 0 to each of the base objects Lj and rij .
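The read and commit paths described above can be sketched in Python. This is a sequential illustration under stated assumptions, not the thesis implementation: base objects are plain dictionary entries, each t-operation runs without interruption (so interleavings occur only between t-operations), the initial value of every t-object is assumed to be 0, and the class and method names are chosen for the example. Comments refer to the line numbers of Algorithm 4.1.

```python
ABORT, COMMIT, OK = "A", "C", "ok"


class LP:
    """Shared base objects of Algorithm 4.1 for fixed sets of t-objects and processes."""

    def __init__(self, objects, processes):
        self.processes = list(processes)
        self.v = {x: (0, None) for x in objects}   # v_j holds [value, id of last writer]
        self.L = {x: 0 for x in objects}           # L_j: set while an updater installs X_j
        self.r = {(p, x): 0 for p in self.processes for x in objects}  # single-writer r_ij


class Txn:
    """Transaction T_k run by process p; every method is one t-operation."""

    def __init__(self, tm, k, p):
        self.tm, self.k, self.p = tm, k, p
        self.rset, self.wset = {}, {}

    def _validate(self):
        # Mirrors validate(): True means some t-object read earlier has changed.
        return any(self.tm.v[x] != seen for x, seen in self.rset.items())

    def read(self, x):
        if x in self.rset:                          # Lines 20-22
            return self.rset[x][0]
        self.rset[x] = self.tm.v[x]                 # Lines 13-14
        if self.tm.L[x] != 0 or self._validate():   # Lines 15-18
            return ABORT
        return self.rset[x][0]                      # Line 19

    def write(self, x, value):                      # Lines 23-26: buffer locally
        self.wset[x] = value
        return OK

    def _acquire(self):
        for x in self.wset:                         # Line 48
            self.tm.r[(self.p, x)] = 1
        others = [q for q in self.tm.processes if q != self.p]
        if any(self.tm.r[(q, x)] == 1 for q in others for x in self.wset):  # Line 49
            for x in self.wset:                     # Lines 50-52
                self.tm.r[(self.p, x)] = 0
            return False
        for x in self.wset:                         # Lines 53-54
            self.tm.L[x] = 1
        return True

    def _release(self):                             # Lines 40-45
        for x in self.wset:
            self.tm.L[x] = 0
        for x in self.wset:
            self.tm.r[(self.p, x)] = 0

    def try_commit(self):
        if not self.wset:                           # Lines 28-29
            return COMMIT
        if not self._acquire():                     # Lines 30-32
            return ABORT
        locked = any(self.tm.L[x] != 0 for x in self.rset if x not in self.wset)
        if locked or self._validate():              # Lines 33-35 (isAbortable)
            self._release()
            return ABORT
        for x, value in self.wset.items():          # Lines 36-37
            self.tm.v[x] = (value, self.k)
        self._release()                             # Line 38
        return COMMIT
```

For instance, if T1 reads X, then T2 writes X and commits, a later t-read of Y by T1 fails validation and returns the abort response, which is the behaviour opacity requires here.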
Complexity. Read-only transactions do not apply any nontrivial primitives. Any updating transaction
performs at most a single RAW in the course of acquiring exclusive write access to the transaction’s
write set. Thus, every transaction performs O(1) non-overlapping RAWs in any execution.
Recall that a transaction may write to base objects vj and Lj only after obtaining exclusive write access
to t-object Xj , which in turn is realized via single-writer base objects. Thus, no transaction performs a
write to any base object b immediately after a write to b by another transaction, i.e., every transaction
incurs only O(1) memory stalls on account of any event it performs. The readk(Xj) implementation
reads base objects vj and Lj , followed by the validation phase in which it reads vk for each Xk in its
current read set. Note that if the first read in the validation phase incurs a stall, then readk(Xj) aborts.
It follows that each t-read incurs O(1) stalls in every execution.
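To connect the stall bound with the step complexity of t-reads, the following Python sketch counts base-object reads in the pseudocode of Algorithm 4.1 as written. It assumes that no t-read aborts, that every t-read targets a fresh t-object, and that validation reads the value object of every entry in the read set, including the one just added in Line 14. The function names are illustrative.

```python
def read_steps(i: int) -> int:
    """Base-object reads in the i-th first-time t-read: v_j, L_j, then i validation reads."""
    return 2 + i


def total_read_steps(m: int) -> int:
    """Total over m t-reads of distinct t-objects, i.e. 2m + m(m + 1)/2."""
    return sum(read_steps(i) for i in range(1, m + 1))


if __name__ == "__main__":
    for m in (1, 4, 10):
        assert total_read_steps(m) == 2 * m + m * (m + 1) // 2
    print(total_read_steps(4))  # 3 + 4 + 5 + 6 = 18
```

Under these assumptions the total grows quadratically in the size of the read set, in line with the Ω(m²) lower bound of Theorem 4.2.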
Proof of opacity. We now prove that LP implements an opaque TM.
We introduce the following technical definition: process pi holds a lock on Xj after an execution π of
Algorithm 4.1 if π contains the invocation of acquire(Q), Xj ∈ Q, by pi that returned true, but does not
contain a subsequent invocation of release(Q′), Xj ∈ Q′, by pi in π.
Lemma 4.7. For any object Xj, and any execution π of Algorithm 4.1, there exists at most one process
that holds a lock on Xj after π.
Proof. Assume, by contradiction, that there exists an execution π after which processes pi and pk hold a
lock on the same object, say Xj. In order to hold the lock on Xj, process pi writes 1 to register rij and
then checks if any other process pk has written 1 to rkj. Since the corresponding operation acquire(Q),
Xj ∈ Q invoked by pi returns true, pi read 0 in rkj in Line 49. But then pk also writes 1 to rkj and later
reads that rij is 1. This is because pk can write 1 to rkj only after the read of rkj returned 0 to pi, which
is preceded by the write of 1 to rij. Hence, there exists an object Xj such that rij = 1; i ≠ k, and yet
acquire returns true to process pk although its check in Line 49 reads rij = 1, a contradiction.
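The argument of Lemma 4.7 can be checked exhaustively for two processes and one t-object. The Python sketch below enumerates every interleaving of the two steps each process takes in acquire (write its own bit, then read the other's bit) and confirms that both never succeed. It is a simplified model under stated assumptions: the reset of the bit by a failing process (Lines 50 to 52) and the later release are omitted, and the function names are introduced here for the example.

```python
from itertools import combinations


def run(schedule):
    """Run one interleaving; schedule lists process ids (0 or 1), two steps per process."""
    r = [0, 0]                 # single-writer bits r_0j and r_1j for one t-object X_j
    pc = [0, 0]                # per-process program counter
    acquired = [False, False]
    for p in schedule:
        if pc[p] == 0:
            r[p] = 1                           # Line 48: write(r_ij, 1)
        else:
            acquired[p] = (r[1 - p] == 0)      # Line 49: read the other process's bit
        pc[p] += 1
    return acquired


def all_schedules():
    """All interleavings of two steps of process 0 with two steps of process 1."""
    for slots in combinations(range(4), 2):
        yield [0 if i in slots else 1 for i in range(4)]


if __name__ == "__main__":
    assert all(not (a and b) for a, b in map(run, all_schedules()))
    print(run([0, 0, 1, 1]))  # [True, False]
```

Each process writes before it reads, so whichever process reads second necessarily sees the other's bit set; this is the write-then-read (RAW) ordering the lemma depends on.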
Observation 4.8. Let π be any execution of Algorithm 4.1. Then, any updating transaction Tk ∈ txns(π)
executed by process pi writes to base object vj (in Line 37) for some Xj ∈ Wset(Tk) immediately after π
iff pi holds the lock on Xj after π.
Lemma 4.9. Algorithm 4.1 implements an opaque TM.
Proof. Let E be any finite execution of Algorithm 4.1. Let <E denote a total order on events in E.
Let H denote a subsequence of E constructed by selecting linearization points of t-operations performed
in E. The linearization point of a t-operation op, denoted as ℓ_{op}, is associated with a base object event
or an event performed between the invocation and response of op using the following procedure.
Completions. First, we obtain a completion of E by removing some pending invocations and adding
responses to the remaining pending invocations involving a transaction Tk as follows: every incomplete
readk, writek operation is removed from E; an incomplete tryCk is removed from E if Tk has not performed
any write to a base object during the release function in Line 38, otherwise it is completed by including
Ck after E.
Linearization points. Now a linearization H of E is obtained by associating linearization points to
t-operations in the obtained completion of E as follows:
• For every t-read opk that returns a non-Ak value, ℓ_{opk} is chosen as the event in Line 13 of
Algorithm 4.1, else, ℓ_{opk} is chosen as invocation event of opk
• For every opk = writek that returns, ℓ_{opk} is chosen as the invocation event of opk
• For every opk = tryCk that returns Ck such that Wset(Tk) ≠ ∅, ℓ_{opk} is associated with the response
of acquire in Line 30, else if opk returns Ak, ℓ_{opk} is associated with the invocation event of opk
• For every opk = tryCk that returns Ck such that Wset(Tk) = ∅, ℓ_{opk} is associated with Line 29
<H denotes a total-order on t-operations in the complete sequential history H.
Serialization points. The serialization of a transaction Tj , denoted as δTj is associated with the
linearization point of a t-operation performed within the execution of Tj .
We obtain a t-complete history H¯ from H as follows: for every transaction Tk in H that is complete,
but not t-complete, we insert tryCk ·Ak after H.
A t-complete t-sequential history S is obtained by associating serialization points to transactions in H¯
as follows:
• If Tk is an update transaction that commits, then δTk is ℓ_{tryCk}
• If Tk is a read-only or aborted transaction in H¯, δTk is assigned to the linearization point of the
last t-read that returned a non-Ak value in Tk
<S denotes a total-order on transactions in the t-sequential history S.
Claim 4.10. If Ti ≺H Tj, then Ti <S Tj
Proof. This follows from the fact that, for a given transaction, its serialization point is chosen between
the first and last event of the transaction. Hence, if Ti ≺H Tj, then δTi <E δTj, which implies Ti <S Tj.
Claim 4.11. Let Tk be any updating transaction that returns false from the invocation of isAbortable
in Line 33. Then, Tk returns Ck within a finite number of its own steps in any extension of E.
Proof. Observe that Tk performs the write to base objects vj for every Xj ∈ Wset(Tk) and then invokes
release in Lines 37 and 38 respectively. Since neither of these involves aborting the transaction or contains
unbounded loops or waiting statements, it follows that Tk will return Ck within a finite number of its
steps.
Claim 4.12. S is legal.
Proof. Observe that for every readj(Xm) → v, there exists some transaction Ti that performs writei(Xm, v)
and completes the event in Line 37 such that readj(Xm) ⊀^{RT}_H writei(Xm, v). More specifically, readj(Xm)
returns as a non-abort response, the value of the base object vm and vm can be updated only by a transaction
Ti such that Xm ∈ Wset(Ti). Since readj(Xm) returns the response v, the event in Line 13 succeeds
the event in Line 37 performed by tryCi. Consequently, by Claim 4.11 and the assignment of linearization
points, ℓ_{tryCi} <E ℓ_{readj(Xm)}. Since, for any updating committing transaction Ti, δTi = ℓ_{tryCi}, by the
assignment of serialization points, it follows that δTi <E δTj.
Thus, to prove that S is legal, it suffices to show that there does not exist a transaction Tk that returns
Ck in S and performs writek(Xm, v′); v′ ≠ v such that Ti <S Tk <S Tj. Suppose that there exists a
committed transaction Tk, Xm ∈ Wset(Tk) such that Ti <S Tk <S Tj.
Ti and Tk are both updating transactions that commit. Thus,
(Ti <S Tk) ⇐⇒ (δTi <E δTk)
(δTi <E δTk) ⇐⇒ (ℓ_{tryCi} <E ℓ_{tryCk})
Since Tj reads the value of Xm written by Ti, one of the following is true: ℓ_{tryCi} <E ℓ_{tryCk} <E ℓ_{readj(Xm)}
or ℓ_{tryCi} <E ℓ_{readj(Xm)} <E ℓ_{tryCk}. Let Ti and Tk be executed by processes pi and pk respectively.
Consider the case that ℓ_{tryCi} <E ℓ_{tryCk} <E ℓ_{readj(Xm)}.
By the assignment of linearization points, Tk returns a response from the event in Line 30 before the
read of vm by Tj in Line 13. Since Ti and Tk are both committed in E, pk returns true from the event
in Line 30 only after Ti writes 0 to rim in Line 44 (Lemma 4.7).
Recall that readj(Xm) checks if Xm is locked by a concurrent transaction (i.e., Lm ≠ 0), then performs
read-validation (Line 15) before returning a matching response. Consider the following possible sequence
of events: Tk returns true from the acquire function invocation, sets Lj to 1 for every Xj ∈ Wset(Tk)
(Line 54) and updates the value of Xm to shared-memory (Line 37). The implementation of readj(Xm)
then reads the base object vm associated with Xm after which Tk releases Xm by writing 0 to rkm
and finally Tj performs the check in Line 15. However, readj(Xm) is forced to return Aj because
Xm ∈ Rset(Tj) (Line 14) and has been invalidated since last reading its value. Otherwise suppose that
Tk acquires exclusive access to Xm by writing 1 to rkm and returns true from the invocation of acquire,
updates vm (Line 37), Tj reads vm, Tj performs the check in Line 15 and finally Tk releases Xm by
writing 0 to rkm. Again, readj(Xm) returns Aj since Tj reads that rkm is 1, a contradiction.
Thus, ℓ_{tryCi} <E ℓ_{readj(Xm)} <E ℓ_{tryCk}.
We now need to prove that δTj indeed precedes `tryCk in E.
Consider the two possible cases:
• Suppose that Tj is a read-only or aborted transaction in H¯. Then, δTj is assigned to the last
t-read performed by Tj that returns a non-Aj value. If readj(Xm) is not the last t-read performed
by Tj that returned a non-Aj value, then there exists a readj(Xz) performed by Tj such that
ℓ_{readj(Xm)} <E ℓ_{tryCk} <E ℓ_{readj(Xz)}. Now assume that ℓ_{tryCk} must precede ℓ_{readj(Xz)} to obtain a
legal S. Since Tk and Tj are concurrent in E, we are restricted to the case that Tk performs a
writek(Xz, v) and readj(Xz) returns v. However, we claim that this t-read of Xz must abort by
performing the checks in Line 15. Observe that Tk writes 1 to Lm, Lz each (Line 54) and then
writes new values to base objects vm, vz (Line 37). Since readj(Xz) returns a non-Aj response, Tk
writes 0 to Lz before the read of Lz by readj(Xz) in Line 15. Thus, the t-read of Xz would return
Aj (in Line 17, after validation of the read set) since Xm has been updated, a contradiction to the
assumption that it is the last t-read by Tj to return a non-Aj response.
• Suppose that Tj is an updating transaction that commits, then δTj = ℓ_{tryCj} which implies that
ℓ_{readj(Xm)} <E ℓ_{tryCk} <E ℓ_{tryCj}. Then, Tj must necessarily perform the checks in Line 33 and read
that Lm is 1. Thus, Tj must return Aj, a contradiction to the assumption that Tj is a committed
transaction.
The conjunction of Claims 4.10 and 4.12 establishes that Algorithm 4.1 is opaque.
We can now prove the following theorem:
Theorem 4.13. Algorithm 4.1 describes a progressive, opaque and strict DAP TM implementation LP
that provides wait-free TM-liveness, uses invisible reads, uses only read-write base objects, and for every
execution E and transaction Tk ∈ txns(E):
• Tk performs at most a single RAW, and
• every t-read operation invoked by Tk incurs O(1) memory stalls in E, and
• every complete t-read operation invoked by Tk performs O(|Rset(Tk)|) steps in E.
Proof. (TM-liveness and TM-progress) Since none of the implementations of the t-operations in Algo-
rithm 4.1 contain unbounded loops or waiting statements, every t-operation opk returns a matching
response after taking a finite number of steps in every execution. Thus, Algorithm 4.1 provides wait-free
TM-liveness.
To prove progressiveness, we proceed by enumerating the cases under which a transaction Tk may be
aborted.
• Suppose that there exists a readk(Xj) performed by Tk that returns Ak from Line 15. Thus, there
exists a process pt executing a transaction that has written 1 to rtj in Line 48, but has not yet
written 0 to rtj in Line 44 or some t-object in Rset(Tk) has been updated since its t-read by Tk. In
both cases, there exists a concurrent transaction performing a t-write to some t-object in Rset(Tk).
• Suppose that tryCk performed by Tk returns Ak from Line 31. Thus, there exists a process
pt executing a transaction that has written 1 to rtj in Line 48, but has not yet written 0 to rtj in
Line 44. Thus, Tk encounters step-contention with another transaction that concurrently attempts
to update a t-object in Wset(Tk).
• Suppose that tryCk performed by Tk returns Ak from Line 33. Since Tk returns Ak from
Line 33 for the same reason it returns Ak after Line 15, the proof follows.
(Strict disjoint-access parallelism) Consider any execution E of Algorithm 4.1 and let Ti and Tj be any
two transactions that participate in E and access the same base object b in E.
• Suppose that Ti and Tj contend on base object vj or Lj. Since for every t-object Xj, there exist
distinct base objects vj and Lj, Ti and Tj contend on vj only if Xj ∈ Dset(Ti) ∩ Dset(Tj).
• Suppose that Ti and Tj contend on base object rij . Without loss of generality, let pi be the
process executing transaction Ti; Xj ∈Wset(Ti) that writes 1 to rij in Line 48. Indeed, no other
process executing a transaction that writes to Xj can write to rij . Transaction Tj reads rij only
if Xj ∈ Dset(Tj) as evident from the accesses performed in Lines 48, 49, 44, 27.
Thus, Ti and Tj access the same base object only if they access a common t-object.
(Opacity) Follows from Lemma 4.9.
(Invisible reads) Observe that read-only transactions do not perform any nontrivial events. Secondly,
in any execution E of Algorithm 4.1, and any transaction Tk ∈ txns(E), if Xj ∈ Rset(Tk), Tk does not
write to any of the base objects associated with Xj nor write any information that reveals its read set
to other transactions.
(Complexity) Consider any execution E of Algorithm 4.1.
• For any Tk ∈ txns(E), each readk only applies trivial primitives in E while tryCk simply returns
Ck if Wset(Tk) = ∅. Thus, Algorithm 4.1 uses invisible reads.
• Any read-only transaction Tk ∈ txns(E) does not perform any RAW or AWAR. An updating transaction
Tk executed by process pi performs a sequence of writes (Line 48) to base objects {rij} : Xj ∈
Wset(Tk), followed by a sequence of reads to base objects {rtj} : t ∈ {1, . . . , n}, Xj ∈ Wset(Tk)
(Line 49), thus incurring a single multi-RAW.
• Let e be a write event performed by some transaction Tk executed by process pi in E on base
objects vj and Lj (Lines 37 and 54). Any transaction Tk performs a write to vj or Lj only after
Tk writes 1 to rij, for every Xj ∈ Wset(Tk). Thus, by Lemmata 4.7 and 4.9, it follows that events
that involve an access to either of these base objects incur O(1) stalls.
Let e be a write event on base object rij (Line 48) while writing to t-object Xj . By Algorithm 4.1,
no other process can write to rij . It follows that any transaction Tk ∈ txns(E) incurs O(1) memory
stalls on account of any event it performs in E. Observe that any t-read readk(Xj) only accesses
base objects vj , Lj and other value base objects in Rset(Tk). But as already established above,
these are O(1) stall events. Hence, every t-read operation incurs O(1)-stalls in E.
The following corollary follows from Theorems 4.13 and 4.2.
Corollary 4.14. Let M be any weak DAP progressive opaque TM implementation providing ICF TM-
liveness and weak invisible reads. Then, for every execution E and each read-only transaction Tk ∈
txns(E), Tk performs Θ(m²) steps in E, where m = |RsetE(Tk)|.
Algorithm 4.2 Strict DAP progressive strictly serializable TM implementation; code for Tk executed
by process pi
1: readk(Xj):
2:   if Xj ∉ Rset(Tk) then
3:     [ovj, kj] := read(vj)
4:     Rset(Tk) := Rset(Tk) ∪ {Xj, [ovj, kj]}
5:     if read(Lj) ≠ 0 then
6:       Return Ak
7:     Return ovj
8:   else
9:     [ovj, ⊥] := Rset(Tk).locate(Xj)
10:    Return ovj
Similarly, we can prove an almost matching upper bound for Theorem 4.2 for strictly serializable pro-
gressive TMs.
Consider Algorithm 4.2 that is a simplification of the opaque progressive TM in Algorithm 4.1: we remove
the validation performed in the implementation of a t-read, i.e., Line 17 in Algorithm 4.1; otherwise, the
two algorithms are identical. It is easy to see this results in a strictly serializable (but not opaque) TM
implementation. Thus,
Theorem 4.15. Algorithm 4.2 describes a progressive, strictly serializable and strict DAP TM imple-
mentation that provides wait-free TM-liveness, uses invisible reads, uses only read-write base objects, and
for every execution E and transaction Tk ∈ txns(E): every t-read operation invoked by Tk performs O(1)
steps and tryCk performs O(|Rset(Tk)|) steps in E.
Corollary 4.16. Let M be any weak DAP progressive strictly serializable TM implementation provid-
ing ICF TM-liveness and weak invisible reads. Then, for every execution E and each read-only trans-
action Tk ∈ txns(E), each readk performs O(1) steps and tryCk performs Θ(m) steps in E, where
m = |RsetE(Tk)|.
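The gap between Corollaries 4.14 and 4.16 can be illustrated by counting base-object accesses. The following Python sketch is a counting model only, not a TM: it assumes that an opaque TM with invisible reads re-validates every earlier entry of the read set in each t-read, whereas the strictly serializable variant validates the read set once in tryC. The function names and the constants (one step per access) are hypothetical simplifications.

```python
def opaque_read_only_steps(m):
    """Steps of a read-only transaction with m t-reads when every t-read
    re-validates all earlier reads (assumed cost model)."""
    steps = 0
    for i in range(1, m + 1):
        steps += 1        # read the value base object of the i-th t-object
        steps += i - 1    # re-validate the i-1 entries already in the read set
    return steps


def strictly_serializable_read_only_steps(m):
    """Steps when t-reads perform no validation and tryC validates once."""
    read_steps = m          # one base-object read per t-read
    try_commit_steps = m    # a single pass over the read set in tryC
    return read_steps + try_commit_steps
```

Under these assumptions the first count is m(m + 1)/2, which grows quadratically in m, while the second is 2m.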
4.4 Strongly progressive TMs
In this section, we prove that every strongly progressive strictly serializable TM that uses only read,
write and conditional primitives has an execution in which n concurrent processes perform
transactions on a single data item and incur Ω(n log n) remote memory references [14].
We then describe a constant RAW/AWAR strongly progressive TM providing starvation-free TM-liveness
from read-write base objects.
4.4.1 An Ω(n log n) lower bound on remote memory references
Our lower bound on RMR complexity of strongly progressive TMs is derived by reduction to mutual
exclusion.
Mutual exclusion. The mutex object supports two operations: Entry and Exit, both of which return
the response ok. We say that a process pi is in the critical section after an execution π if π contains the
invocation of Entry by pi that returns ok, but does not contain a subsequent invocation of Exit by pi in
π.
A mutual exclusion implementation satisfies the following properties:
• (Mutual-exclusion) After any execution π, there exists at most one process that is in the critical
section.
• (Deadlock-freedom) Let π be any execution that contains the invocation of Entry by process pi.
Then, in every extension of π in which every process takes infinitely many steps, some process is
in the critical section.
• (Finite-exit) Every process completes the Exit operation within a finite number of steps.
We describe an implementation of a mutex object L(M) from a strictly serializable, strongly progressive
TM implementation M providing wait-free TM-liveness (Algorithm 4.3). The algorithm is based on the
mutex implementation in [84].
Given a sequential implementation, we use a TM to execute the sequential code in a concurrent en-
vironment by encapsulating each sequential operation within an atomic transaction that replaces each
read and write of a t-object with the transactional read and write implementations, respectively. If the
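transaction commits, then the result of the operation is returned; otherwise, if one of the transactional
operations aborts, the abort is reported to the caller (in Algorithm 4.3, func then returns false and the
caller retries). For instance, in Algorithm 4.3, we wish to atomically read a t-object X, write a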
new value to it and return the old value of X prior to this write. To achieve this, we employ a strictly
serializable TM implementation M . Since we assume that M is strongly progressive, in every execution,
at least one transaction successfully commits and the value of X is returned.
Shared objects. We associate each process pi with two alternating identities [pi, facei]; facei ∈ {0, 1}.
The strongly progressive TM implementation M is used to enqueue processes that attempt to enter
the critical section within a single t-object X (initially ⊥). For each [pi, facei], L(M) uses a register
bit Done[pi, facei] that indicates if this face of the process has left the critical section or is executing
the Entry operation. Additionally, we use a register Succ[pi, facei] that stores the process expected to
succeed pi in the critical section. If Succ[pi, facei] = pj , we say that pj is the successor of pi (and pi is the
predecessor of pj). Intuitively, this means that pj is expected to enter the critical section immediately
after pi. Finally, L(M) uses a 2-dimensional bit array Lock : for each process pi, there are n− 1 registers
associated with the other processes. For all j ∈ {0, . . . , n− 1} \ {i}, the registers Lock [pi][pj ] are local to
pi and registers Lock [pj ][pi] are remote to pi. Process pi can only access registers in the Lock array that
are local or remote to it.
Entry operation. A process pi adopts a new identity facei and writes false to Done[pi, facei] to
indicate that pi has started the Entry operation. Process pi now initializes the successor of [pi, facei] by
writing ⊥ to Succ[pi, facei]. Now, pi uses a strongly progressive TM implementation M to atomically
store its pid and identity, i.e., facei, to t-object X and return the pid and identity of its predecessor, say
[pj, facej]. Intuitively, this suggests that [pi, facei] is scheduled to enter the critical section immediately
after [pj, facej] exits the critical section. Note that if pi reads the initial value of t-object X, then it
immediately enters the critical section. Otherwise it writes locked to the register Lock[pi][pj] and sets
itself to be the successor of [pj, facej] by writing pi to Succ[pj, facej]. Process pi now checks if pj has
started the Exit operation by checking if Done[pj, facej] is set. If it is, pi enters the critical section;
otherwise pi spins on the register Lock[pi][pj] until it is unlocked.
Exit operation. Process pi first indicates that it has exited the critical section by setting Done[pi, facei],
following which it unlocks the register Lock [Succ[pi, facei]][pi] to allow pi’s successor to enter the critical
section.
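The Entry and Exit operations described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of Algorithm 4.3: the class ToyTM is a hypothetical stand-in for M (a single t-object whose transaction aborts whenever another transaction is in progress, so at least one of any set of conflicting transactions commits), the sentinel ABORTED plays the role of the response false of func(), the Exit operation is assumed to skip the unlock when no successor has registered, and the sketch relies on CPython executing list reads and writes atomically.

```python
import threading
import time

ABORTED = object()  # stands in for the "false" returned by func() on abort


class ToyTM:
    """Hypothetical stand-in for M with a single t-object X."""

    def __init__(self):
        self._x = None                      # t-object X, initially ⊥ (None)
        self._guard = threading.Lock()

    def swap(self, new_value):
        """Atomically read X and write new_value, or abort on conflict."""
        if not self._guard.acquire(blocking=False):
            return ABORTED
        try:
            old, self._x = self._x, new_value
            return old
        finally:
            self._guard.release()


class TMMutex:
    def __init__(self, n):
        self.tm = ToyTM()
        self.face = [0] * n
        self.done = [[True, True] for _ in range(n)]    # Done[p][face]
        self.succ = [[None, None] for _ in range(n)]    # Succ[p][face]
        self.lock = [[False] * n for _ in range(n)]     # Lock[p][q]; True = locked

    def entry(self, i):
        self.face[i] = 1 - self.face[i]
        f = self.face[i]
        self.done[i][f] = False
        self.succ[i][f] = None
        prev = self.tm.swap((i, f))
        while prev is ABORTED:              # retry until the transaction commits
            time.sleep(0)
            prev = self.tm.swap((i, f))
        if prev is not None:
            j, fj = prev
            self.lock[i][j] = True
            self.succ[j][fj] = i
            if not self.done[j][fj]:
                while self.lock[i][j]:      # spin until the predecessor unlocks
                    time.sleep(0)

    def exit(self, i):
        f = self.face[i]
        self.done[i][f] = True
        successor = self.succ[i][f]
        if successor is not None:           # assumption: no-op when Succ is ⊥
            self.lock[successor][i] = False


def run_demo(n=3, rounds=50):
    """Increment a shared counter non-atomically inside the critical section."""
    mutex = TMMutex(n)
    shared = {"counter": 0, "inside": 0, "max_inside": 0}

    def worker(i):
        for _ in range(rounds):
            mutex.entry(i)
            shared["inside"] += 1
            shared["max_inside"] = max(shared["max_inside"], shared["inside"])
            value = shared["counter"]
            time.sleep(0)                   # invite a context switch
            shared["counter"] = value + 1
            shared["inside"] -= 1
            mutex.exit(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["counter"], shared["max_inside"]
```

Calling run_demo(3, 50) returns the final counter value together with the largest number of threads observed inside the critical section at once. The Lock array is allocated as a full n × n matrix for simplicity, although the text only uses entries with distinct indices.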
Lemma 4.17. The implementation L(M) (Algorithm 4.3) satisfies mutual exclusion.
Proof. Let E be any execution of L(M). We say that [pi, facei] is the successor of [pj , facej ] if pi reads
the value of prev in Line 25 to be [pj , facej ] (and [pj , facej ] is the predecessor of [pi, facei]); otherwise if
pi reads the value to be ⊥, we say that pi has no predecessor.
Suppose by contradiction that there exist processes pi and pj that are both inside the critical section
after E. Since pi is inside the critical section, either (1) pi read prev = ⊥ in Line 23, or (2) pi read
that Done[prev ] is true (Line 29) or pi reads that Done[prev ] is false and Lock [pi][prev .pid ] is unlocked
(Line 30).
Algorithm 4.3 Mutual-exclusion object L from a strongly progressive, strictly serializable TM M; code
for process pi; 1 ≤ i ≤ n
1: Local variables:
2:   bit facei, for each process pi
3: Shared objects:
4:   strongly progressive, strictly
5:   serializable TM M
6:   t-object X, initially ⊥
7:     storing value v ∈ {[pi, facei]} ∪ {⊥}
8:   for each tuple [pi, facei]
9:     Done[pi, facei] ∈ {true, false}
10:    Succ[pi, facei] ∈ {p1, . . . , pn} ∪ {⊥}
11:  for each pi and j ∈ {1, . . . , n} \ {i}
12:    Lock[pi][pj] ∈ {locked, unlocked}
13: Function: func():
14:   atomic using M
15:     value := tx-read(X)
16:     tx-write(X, [pi, facei])
17:   on abort Return false
18:   Return value
19: Entry:
20:   facei := 1 − facei
21:   Done[pi, facei].write(false)
22:   Succ[pi, facei].write(⊥)
23:   while (prev ← func) = false do
24:     no op
25:   end while
26:   if prev ≠ ⊥ then
27:     Lock[pi][prev.pid].write(locked)
28:     Succ[prev].write(pi)
29:     if Done[prev] = false then
30:       while Lock[pi][prev.pid] = locked do
31:         no op
32:       end while
33:   Return ok
34: // Critical section
35: Exit:
36:   Done[pi, facei].write(true)
37:   Lock[Succ[pi, facei]][pi].write(unlocked)
38:   Return ok
(Case 1) Suppose that pi read prev = ⊥ and entered the critical section. Since in this case, pi does not
have any predecessor, some other process that returns successfully from the while loop in Line 25 must
be the successor of pi in E. Since there exists [pj, facej] also inside the critical section after E, pj reads
either [pi, facei] or some other process to be its predecessor. Observe that there must exist some such
process [pk, facek] whose predecessor is [pi, facei]. Hence, without loss of generality, we can assume that
[pj, facej] is the successor of [pi, facei]. By our assumption, [pj, facej] is also inside the critical section.
Thus, pj locked the register Lock[pj][pi] in Line 27 and set itself to be pi’s successor in Line 28. Then,
pj read that Done[pi, facei] is true or read that Done[pi, facei] is false and waited until Lock[pj][pi] is
unlocked and then entered the critical section. But this is possible only if pi has left the critical section
and updated the registers Done[pi, facei] and Lock[pj][pi] in Lines 36 and 37 respectively, a contradiction
to the assumption that [pi, facei] is also inside the critical section after E.
(Case 2) Suppose that pi did not read prev = ⊥ and entered the critical section. Thus, pi read that
Done[prev ] is false in Line 29 and Lock [pi][prev .pid ] is unlocked in Line 30, where prev is the predecessor
of [pi, facei]. As with case 1, without loss of generality, we can assume that [pj , facej ] is the successor of
[pi, facei] or [pj , facej ] is the predecessor of [pi, facei].
Suppose that [pj, facej] is the predecessor of [pi, facei], i.e., pi writes the value [pi, facei] to the register
Succ[pj, facej] in Line 28. Since [pj, facej] is also inside the critical section after E, process pi must read
that Done[pj, facej] is false in Line 29 and Lock[pi][pj] is locked in Line 30. But then pi could not have
entered the critical section after E, a contradiction.
Suppose that [pj, facej] is the successor of [pi, facei], i.e., pj writes the value [pj, facej] to the register
Succ[pi, facei]. Since both pi and pj are inside the critical section after E, process pj must read that
Done[pi, facei] is false in Line 29 and Lock[pj][pi] is locked in Line 30. Thus, pj must spin on the register
Lock[pj][pi], waiting for it to be unlocked by pi before entering the critical section, a contradiction to the
assumption that both pi and pj are inside the critical section.
Thus, L(M) satisfies mutual-exclusion.
Lemma 4.18. The implementation L(M) (Algorithm 4.3) provides deadlock-freedom.
Proof. Let E be any execution of L(M). Observe that a process may be stuck indefinitely only in
Lines 23 and 30 as it performs the while loop.
Since M is strongly progressive, in every execution E that contains an invocation of Entry by process pi,
some process returns a value other than false from the invocation of func() in Line 23.
Now consider a process pi that returns successfully from the while loop in Line 23. Suppose that pi is
stuck indefinitely as it performs the while loop in Line 30. Thus, no process has unlocked the register
Lock [pi][prev .pid ] by writing to it in the Exit section. Recall that since [pi, facei] has reached the while
loop in Line 30, [pi, facei] necessarily has a predecessor, say [pj , facej ], and has set itself to be pj ’s
successor by writing pi to register Succ[pj, facej] in Line 28. Consider the two possible cases: the
predecessor of [pj, facej] is some process pk, k ≠ i, or the predecessor of [pj, facej] is the process pi itself.
(Case 1) Since by assumption, process pj takes infinitely many steps in E, the only reason that pj is
stuck without entering the critical section is that [pk, facek] is also stuck in the while loop in Line 30.
Note that it is possible for us to iteratively extend this execution in which pk’s predecessor is a process
that is not pi or pj that is also stuck in the while loop in Line 30. But then the last such process must
eventually read the corresponding Lock to be unlocked and enter the critical section. Thus, in every
extension of E in which every process takes infinitely many steps, some process will enter the critical
section.
(Case 2) Suppose that the predecessor of [pj, facej] is the process pi itself. Thus, as [pi, facei] is stuck in
the while loop waiting for Lock[pi][pj] to be unlocked by process pj, pj leaves the critical section, unlocks
Lock[pi][pj] in Line 37 and prior to the read of Lock[pi][pj], pj re-starts the Entry operation, writes false
to Done[pj, 1 − facej] and sets itself to be the successor of [pi, facei] and spins on the register Lock[pj][pi].
However, observe that process pi, which takes infinitely many steps by our assumption, must eventually
read that Lock[pi][pj] is unlocked and enter the critical section, thus establishing deadlock-freedom.
We say that a TM implementation M accesses a single t-object if in every execution E of M and every
transaction T ∈ txns(E), |Dset(T )| ≤ 1. We can now prove the following theorem:
Theorem 4.19. Any strictly serializable, strongly progressive TM implementation M that accesses a
single t-object implies a deadlock-free, finite exit mutual exclusion implementation L(M) such that the
RMR complexity of M is within a constant factor of the RMR complexity of L(M).
Proof. (Mutual-exclusion) Follows from Lemma 4.17.
(Finite-exit) The proof is immediate since the Exit operation contains no unbounded loops or waiting
statements.
(Deadlock-freedom) Follows from Lemma 4.18.
(RMR complexity) First, let us consider the CC model. Observe that every event not on M performed
by a process pi as it performs the Entry or Exit operations incurs O(1) RMR cost clearly, possibly barring
the while loop executed in Line 30. During the execution of this while loop, process pi spins on the
register Lock [pi][pj ], where pj is the predecessor of pi. Observe that pi’s cached copy of Lock [pi][pj ] may
be invalidated only by process pj as it unlocks the register in Line 37. Since no other process may write
to this register and pi terminates the while loop immediately after the write to Lock [pi][pj ] by pj , pi
incurs O(1) RMRs. Thus, the overall RMR cost incurred by M is within a constant factor of the RMR
cost of L(M).
Now we consider the DSM model. As with the reasoning for the CC model, every event not on M
performed by a process pi as it performs the Entry or Exit operations incurs O(1) RMR cost clearly,
possibly barring the while loop executed in Line 30. During the execution of this while loop, process pi
spins on the register Lock [pi][pj ], where pj is the predecessor of pi. Recall that Lock [pi][pj ] is a register
that is local to pi and thus, pi does not incur any RMR cost on account of executing this loop. It follows
that pi incurs O(1) RMR cost in the DSM model. Thus, the overall RMR cost of M is within a constant
factor of the RMR cost of L(M) in the DSM model.
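The RMR accounting used in the proof above can be made concrete with a small counting model. The Python sketch below is an illustration under simplifying assumptions, not part of the proof: the CC model is taken to be write-through with invalidation (every write is remote and invalidates all cached copies, a read is remote only without a valid cached copy), the DSM model charges an access only when the register lives in another process's memory segment, and all function and register names are hypothetical.

```python
def count_rmrs_dsm(trace, home):
    """trace: list of (process, op, register); home: register -> owning process.
    In the DSM model an access is remote iff the register is not local."""
    rmrs = {}
    for proc, _op, reg in trace:
        if home[reg] != proc:
            rmrs[proc] = rmrs.get(proc, 0) + 1
    return rmrs


def count_rmrs_cc(trace):
    """Simplified write-through CC model (an assumption of this sketch)."""
    rmrs = {}
    cached = set()  # pairs (process, register) holding a valid cached copy
    for proc, op, reg in trace:
        if op == "read":
            if (proc, reg) in cached:
                continue                    # cache hit: no RMR
            cached.add((proc, reg))         # cache miss: fetch and count below
        else:
            # a write invalidates every cached copy of the register
            cached = {(p, r) for (p, r) in cached if r != reg}
        rmrs[proc] = rmrs.get(proc, 0) + 1
    return rmrs


def spin_trace(spins):
    """pi spins on Lock[pi][pj], pj unlocks it, pi reads it once more."""
    reg = "Lock[pi][pj]"
    return [("pi", "read", reg)] * spins + [("pj", "write", reg), ("pi", "read", reg)]
```

In this model the number of RMRs charged to pi for the while loop in Line 30 is independent of the number of spins: two in the CC model (the first read and the read after the invalidation) and none in the DSM model when Lock[pi][pj] is local to pi.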
Theorem 4.20. ([22]) Any deadlock-free, finite-exit mutual exclusion implementation from read, write
and conditional primitives has an execution whose RMR complexity is Ω(n log n).
Theorems 4.20 and 4.19 imply:
Theorem 4.21. Any strictly serializable, strongly progressive TM implementation with wait-free TM-
liveness from read, write and conditional primitives that accesses a single t-object has an execution whose
RMR complexity is Ω(n log n).
4.4.2 A constant expensive synchronization opaque TM
In this section, we describe a strongly progressive opaque TM implementation providing starvation-free
TM-liveness from read-write base objects with constant RAW/AWAR cost. For our implementation, we
define and implement a starvation-free multi-trylock object.
Starvation-free multi-trylock. A multi-trylock provides exclusive write-access to a set Q of t-objects.
Specifically, a multi-trylock exports the following operations:
• acquire(Q) returns true or false
• release(Q) releases the lock and returns ok
• isContended(Xj), Xj ∈ Q returns true or false
We assume that processes are well-formed: they never invoke a new operation on the multi-trylock before
receiving response from the previous invocation.
We say that a process pi holds a lock on Xj after an execution π if π contains the invocation of acquire(Q),
Xj ∈ Q by pi that returned true, but does not contain a subsequent invocation of release(Q′), Xj ∈ Q′,
by pi in π. We say that Xj is locked after π by process pi if pi holds a lock on Xj after π.
We say that Xj is contended by pi after an execution π if π contains the invocation of acquire(Q), Xj ∈ Q,
by pi but does not contain a subsequent return false or return of release(Q′), Xj ∈ Q′, by pi in π.
Let an execution π contain the invocation iop of an operation op followed by a corresponding response
rop (we say that π contains op). We say that Xj is uncontended (resp., locked) during the execution of
op in π if Xj is uncontended (resp., locked) after every prefix of π that contains iop but does not contain
rop.
We implement a multi-trylock object whose operations are starvation-free. The algorithm is inspired by
the Black-White Bakery Algorithm [121] and uses a finite number of bounded registers.
A starvation-free multi-trylock implementation satisfies the following properties:
• Mutual-exclusion: For any object Xj, and any execution π, there exists at most one process that
holds a lock on Xj after π.
• Progress: Let π be any execution that contains acquire(Q) by process pi. If no other process
pk, k ≠ i contends infinitely long on some Xj ∈ Q, then acquire(Q) returns true in π.
• Let π be any execution that contains isContended(Xj) invoked by pi.
– If Xj is locked by pℓ, ℓ ≠ i during the complete execution of isContended(Xj) in π, then
isContended(Xj) returns true.
– If ∀ℓ ≠ i, Xj is never contended by pℓ during the execution of isContended(Xj) in π, then
isContended(Xj) returns false.
Our starvation-free multi-trylock in Algorithm 4.4 uses the following shared variables: registers rij for
each process pi and object Xj , a shared bit color ∈ {B,W}, registers LAi ∈ {0, . . . , N} for each pi that
denote a Label and MCi ∈ {B,W} for each pi.
We say (LAi, i) < (LAk, k) iff LAi < LAk or LAi = LAk and i < k. We now prove the following
invariant about the multi-trylock implementation.
Algorithm 4.4 Starvation-free multi-trylock invoked by process pi
1: Shared variables:
2:   LAi, for each process pi, initially 0
3:   MCi ∈ {B,W} for each process pi, initially W
4:   color ∈ {B,W}, initially W
5:   rij, for each process pi and each t-object Xj, initially 0
6: acquire(Q):
7:   for all Xj ∈ Q do
8:     write(rij, 1)
9:   ci := color
10:  write(MCi, ci)
11:  write(LAi, 1 + max({LAk | MCk = MCi}))
12:  while ∃j : ∃k ≠ i: isContended(Xj) && ((LAk ≠ 0; (MCk = MCi); (LAk, k) < (LAi, i)) ||
13:      (LAk ≠ 0; (MCk ≠ MCi); MCi = color)) do
14:    no op
15:  end while
16:  return true
17: release(Q):
18:   for all Xj ∈ Q do
19:     write(rij, 0)
20:   if MCi = B then
21:     write(color, W)
22:   else
23:     write(color, B)
24:   write(LAi, 0)
25:   return ok
26: isContended(Xj):
27:   if ∃pt : rtj ≠ 0, t ≠ i then
28:     return true
29:   return false
Lemma 4.22. In every execution π of Algorithm 4.4, if pi holds a lock on some object Xj after π, then
one of the following conditions must hold:
(1) for some k ≠ i; LAk ≠ 0, if MCk = MCi, then (LAk, k) > (LAi, i)
(2) for some k ≠ i; LAk ≠ 0, if MCk ≠ MCi, then MCi ≠ color
Proof. In order to hold the lock on Xj , some process pi writes 1 to rij , writes a value, say W to MCi
and reads the Labels of other processes that have obtained the same color as itself and generates a Label
greater by one than the maximum Label read (Line 11). Observe that until the value of the color bit is
changed, all processes read the same value W . The first process pi to hold the lock on Xj changes the
color bit to B when releasing the lock and hence the value read by all subsequent processes will be B
until it is changed again. Now consider two cases:
(1) Assume that there exists a process pk, k ≠ i, LAk ≠ 0 and MCk = MCi such that (LAk, k) <
(LAi, i), but pi holds a lock on Xj after π. Thus, isContended(Xj) returns true to pi because pk
writes to rkj (Line 8) before writing to LAk (Line 11). By assumption, (LAk, k) < (LAi, i), LAk >
0 and MCi = MCk, but the conditional in Line 13 returned true to pi without waiting for pk to
stop contending on Xj, a contradiction.
(2) Assume that there exists a process pk, k ≠ i, LAk ≠ 0 and MCk ≠ MCi such that MCi = color,
but pi holds a lock on Xj after π. Again, since LAk > 0, isContended(Xj) returns true to pi,
MCk ≠ MCi and MCi = color, but the conditional in Line 13 returned true to pi without waiting
for pk to stop contending on Xj, a contradiction.
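The waiting condition that both cases of this proof refer to (Lines 12 and 13 of Algorithm 4.4) can be written as a pure predicate. The Python sketch below is one reading of that condition, assuming that the semicolons in the listing denote conjunction; the function names and the dictionary-based representation of the registers LA and MC are hypothetical.

```python
def label_less(la_i, i, la_k, k):
    """(LA_i, i) < (LA_k, k): compare labels first, break ties by process index."""
    return (la_i, i) < (la_k, k)


def blocked_by(i, k, LA, MC, color, contended):
    """True if p_i must keep waiting because of p_k (k != i).

    LA, MC   -- dicts mapping a process index to its label and color
    color    -- current value of the shared color bit
    contended -- response of isContended(Xj) for the object in question
    """
    if not contended or LA[k] == 0:
        return False
    if MC[k] == MC[i]:
        # same color: the smaller (label, index) pair goes first
        return label_less(LA[k], k, LA[i], i)
    # different colors: the process whose color equals the shared bit waits
    return MC[i] == color
```

For example, with LA = {1: 2, 2: 1} and both processes colored W, process 1 is blocked by process 2 but not vice versa; if the processes hold different colors, only the one whose color equals the shared color bit is blocked.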
We can thus prove the following theorem:
Theorem 4.23. Algorithm 4.4 is an implementation of multi-trylock object in which every operation is
starvation-free and incurs at most four RAWs.
Proof. Denote by L the shared object implemented by Algorithm 4.4.
Assume, by contradiction, that L does not provide mutual-exclusion: there exists an execution π after
which processes pi and pk, k ≠ i hold a lock on the same object, say Xj. Since both pi and pk have
performed the write to LAi and LAk resp. in Line 11, LAi, LAk > 0. Consider two cases:
(1) If MCk = MCi, then from Condition 1 of Lemma 4.22, we have (LAk, k) < (LAi, i) and (LAk, k) >
(LAi, i), a contradiction.
(2) If MCk ≠ MCi, then from Condition 2 of Lemma 4.22, we have MCi ≠ color and MCk ≠ color
which implies MCk = MCi, a contradiction.
L also ensures progress. If process pi wants to hold the lock on an object Xj i.e. invokes acquire(Q), Xj ∈
Q, it checks if any other process pk holds the lock on Xj . If such a process pk exists and MCk = MCi,
then clearly isContended(Xj) returns true for pi and (LAk, k) < (LAi, i). Thus, pi fails the conditional
in Line 13 and waits until pk releases the lock on Xj to return true. However, if pk contends infinitely
long on Xj , pi is also forced to wait indefinitely to be returned true from the invocation of acquire(Q).
The same argument works when MCk ≠ MCi since when pk stops contending on Xj, isContended(Xj)
eventually returns false for pi if pk does not contend infinitely long on Xj .
All operations performed by L are starvation-free. Each process pi that successfully holds the lock on
an object Xj in an execution π invokes acquire(Q), Xj ∈ Q, obtains a color and chooses a value for
LAi since there is no way to be blocked while writing to LAi. The response of operation acquire(Q)
by pi is only delayed if there exists a concurrent invocation of acquire(Q′), Xj ∈ Q′ by pk in π. In that
case, process pi waits until pk invokes release(Q) and writes 0 to rkj and eventually holds the lock on
Xj. The implementations of release and isContended are wait-free (and hence starvation-free)
since they contain no unbounded loops or waiting statements.
The implementation of isContended(Xj) only reads base objects. The implementation of release(Q)
writes to a series of base objects (Line 18) and then reads a base object (Line 20) incurring a single
RAW. The implementation of acquire(Q) writes to base objects (Line 8), reads the shared bit color
(Line 9)—one RAW, writes to a base object (Line 10), reads the Labels (Line 11)—one RAW, writes to
its own Label and finally performs a sequence of reads when evaluating the conditional in Line 13—one
RAW.
Thus, Algorithm 4.4 incurs at most four RAWs.
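The count of four can be reproduced mechanically from the order of base-object accesses. The Python sketch below uses a simplified notion of a (multi-)RAW, which is an assumption of this illustration: a RAW is counted whenever a read follows one or more writes by the same process, without checking that the objects differ. The two traces list the accesses of acquire and release for a hypothetical write set of two t-objects; the register names are illustrative.

```python
def count_raws(events):
    """events: list of ("read" | "write", base_object) issued by one process.
    A run of writes followed by a run of reads counts as a single RAW."""
    raws = 0
    pending_write = False
    for op, _obj in events:
        if op == "write":
            pending_write = True
        elif pending_write:
            raws += 1
            pending_write = False
    return raws


# acquire(Q) with Q = {X1, X2}: Lines 8, 9, 10, 11 and the conditional in Line 13
ACQUIRE = [
    ("write", "r_i1"), ("write", "r_i2"),   # Line 8
    ("read", "color"),                      # Line 9
    ("write", "MC_i"),                      # Line 10
    ("read", "LA_k"), ("read", "MC_k"),     # Line 11, reading the Labels
    ("write", "LA_i"),                      # Line 11, writing the own Label
    ("read", "r_k1"), ("read", "LA_k"),     # Line 13
]

# release(Q): Lines 18 to 24
RELEASE = [
    ("write", "r_i1"), ("write", "r_i2"),   # Line 18
    ("read", "MC_i"),                       # Line 20
    ("write", "color"),                     # Line 21 or 23
    ("write", "LA_i"),                      # Line 24
]
```

Under this simplified definition, count_raws yields three for the acquire trace and one for the release trace, matching the accounting in the proof.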
Strongly progressive TM from starvation-free multi-trylock. We now use the starvation-free
multi-trylock to implement a starvation-free strongly progressive opaque TM implementation with con-
stant expensive synchronization (Algorithm 4.5). The implementation is almost identical to the progres-
sive TM implementation LP in Algorithm 4.1, except that the function calls to acquire and release the
transaction’s write set are replaced with analogous calls to a multi-trylock object.
Theorem 4.24. Algorithm 4.5 implements a strongly progressive opaque TM implementation with
starvation-free t-operations that uses invisible reads and employs at most four RAWs per transaction.
Proof. (Opacity) Since Algorithm 4.5 is similar to the opaque progressive TM implementation in Algo-
rithm 4.1, it is easy to adapt the proof of Lemma 4.9 to prove opacity for this implementation.
(TM-progress and TM-liveness) Every transaction Tk in a TM M whose t-operations are defined by
Algorithm 4.5 can be aborted in the following scenarios:
• Read-validation failed in readk or tryCk
• readk or tryCk returned Ak because Xj ∈ Rset(Tk) is locked (belongs to write set of a concurrent
transaction)
Algorithm 4.5 Strongly progressive, opaque TM: the implementation of Tk executed by pi
1: Shared variables:
2:   vj, for each t-object Xj
3:   L, a starvation-free multi-trylock object
4: Local variables:
5:   Rsetk, Wsetk for every transaction Tk;
6:   dictionaries storing {Xm, vm}
7: readk(Xj):
8:   if Xj ∉ Rset(Tk) then
9:     [ovj, kj] := read(vj)
10:    Rset(Tk) := Rset(Tk) ∪ {Xj, [ovj, kj]}
11:    if isAbortable() then
12:      Return Ak
13:    Return ovj
14:  else
15:    [ovj, ⊥] := Rset(Tk).locate(Xj)
16:    Return ovj
17: writek(Xj, v):
18:   nvj := v
19:   Wset(Tk) := Wset(Tk) ∪ {Xj}
20:   Return ok
21: tryCk():
22:   if Wset(Tk) = ∅ then
23:     Return Ck
24:   locked := L.acquire(Wset(Tk))
25:   if isAbortable() then
26:     L.release(Wset(Tk))
27:     Return Ak
28:   for all Xj ∈ Wset(Tk) do
29:     write(vj, (nvj, k))
30:   L.release(Wset(Tk))
31:   return Ck
32: Function: isAbortable():
33:   if ∃Xj ∈ Rset(Tk) : Xj ∉ Wset(Tk) ∧ L.isContended(Xj) then
34:     Return true
35:   if validate() then
36:     Return true
37:   Return false
Since in each of these cases, a transaction is aborted only because of a read-write conflict with a concurrent
transaction, it is easy to see that M is progressive.
To show Algorithm 4.5 also implements a strongly progressive TM, we need to show that for every set
of transactions that concurrently contend on a single t-object, at least one of the transactions is not
aborted.
Consider transactions Ti and Tk that concurrently attempt to execute tryCi and tryCk such that Xj ∈
Wseti ∪ Wsetk. Consequently, they both invoke the acquire operation of the multi-trylock (Line 24) and
thus, from Theorem 4.23, both Ti and Tk must commit eventually. Also, if validation of a t-read in Tk
fails, it means that the t-object is overwritten by some transaction Ti such that Ti precedes Tk, implying
at least one of the transactions commits. Otherwise, some t-object Xj ∈ Rset(Tk) is locked and the
t-operation returns abort since the t-object is in the write set of a concurrent transaction Ti. While it
may still be possible that Ti returns Ai after acquiring the lock on Wseti, strong progressiveness only
guarantees progress for transactions that conflict on at most one t-object. Thus, in either case, for every
set of transactions that conflict on at most one t-object, at least one transaction is not forcefully aborted.
Starvation-free TM-liveness follows from the fact that the multi-trylock we use in the implementation
of M provides starvation-free acquire and release operations.
(Complexity) Any process executing a transaction Tk holds the lock on Wset(Tk) only once during
tryCk. If Wset(Tk) = ∅, then the transaction simply returns Ck incurring no RAWs. Thus, from
Theorem 4.23, Algorithm 4.5 incurs at most four RAWs per updating transaction and no RAWs are
performed in read-only transactions.
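The control flow of Algorithm 4.5 can be traced with a small sequential Python sketch. It is an illustration under several assumptions, not the implementation itself: SketchLock is a hypothetical single-threaded stand-in for the multi-trylock L (it never blocks and provides none of the starvation-freedom machinery), the memory maps each t-object to a pair (value, tag of the writing transaction), and validate() is assumed to report a failure when some entry of the read set has been overwritten since it was read.

```python
ABORT, COMMIT, OK = "A_k", "C_k", "ok"


class SketchLock:
    """Sequential stand-in for the multi-trylock L (assumed interface)."""

    def __init__(self):
        self.owner = {}

    def acquire(self, tid, objects):
        for x in objects:
            self.owner[x] = tid
        return True

    def release(self, tid, objects):
        for x in objects:
            self.owner.pop(x, None)
        return OK

    def is_contended(self, tid, x):
        return x in self.owner and self.owner[x] != tid


class Transaction:
    def __init__(self, tid, memory, lock):
        self.tid, self.memory, self.lock = tid, memory, lock
        self.rset = {}   # X_j -> (value, tag) as first read
        self.wset = {}   # X_j -> new value

    def is_abortable(self):
        for x in self.rset:
            if x not in self.wset and self.lock.is_contended(self.tid, x):
                return True
        # assumed validate(): some object in the read set was overwritten
        return any(self.memory[x] != seen for x, seen in self.rset.items())

    def read(self, x):
        if x not in self.rset:
            self.rset[x] = self.memory[x]
            if self.is_abortable():
                return ABORT
        return self.rset[x][0]

    def write(self, x, value):
        self.wset[x] = value
        return OK

    def try_commit(self):
        if not self.wset:
            return COMMIT                    # read-only: no lock, no RAW
        self.lock.acquire(self.tid, self.wset)
        if self.is_abortable():
            self.lock.release(self.tid, self.wset)
            return ABORT
        for x, value in self.wset.items():
            self.memory[x] = (value, self.tid)
        self.lock.release(self.tid, self.wset)
        return COMMIT
```

In a run where T1 reads X, T2 then writes X and commits, and T1 subsequently reads Y, the second read of T1 returns the abort response because validation of X fails, which is the read-write conflict case listed in the proof.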
4.5 On the cost of permissive opaque TMs
We have shown that (strongly) progressive TMs that allow a transaction to be aborted only on read-write
conflicts have constant RAW/AWAR complexity. However, not aborting on conflicts may not necessarily
affect TM-correctness. Ideally, we would like to derive TM implementations that are permissive, in the
sense that a transaction is aborted only if committing it would violate TM-correctness.
Definition 4.2 (Permissiveness). A TM implementation M is permissive with respect to TM-correctness
C if for every history H of M such that H ends with a response rk and replacing rk with some rk ≠ Ak
gives a history that satisfies C, we have rk ≠ Ak.
Therefore, permissiveness does not allow a transaction to abort, unless committing it would violate the
execution’s correctness.
We first show that a transaction in a permissive opaque implementation can only be forcefully aborted
if it tries to commit:
Lemma 4.25. Let a TM implementation M be permissive with respect to opacity. If a transaction Ti
is forcefully aborted executing a t-operation opi, then opi is tryCi.
Proof. Suppose, by contradiction, that there exists a history H of M such that some opi ∈ {readi, writei}
executed within a transaction Ti returns Ai. Let H0 be the shortest prefix of H that ends just before
opi returns. By definition, H0 is opaque and any history H0 · ri where ri ≠ Ai is not opaque. Let H′0 be
the serialization of H0.
If opi is a write, then H0 · oki is also opaque: no write operation of the incomplete transaction Ti appears
in H′0 and, thus, H′0 is also a serialization of H0 · oki.
If opi is a read(X) for some t-object X, then we can construct a serialization of H0 · v where v is the
value of X written by the last committed transaction in H′0 preceding Ti or the initial value of X if there
is no such transaction. It is easy to see that H″0 obtained from H′0 by adding read(X) · v at the end of
Ti is a serialization of H0 · read(X). In both cases, there exists a non-Ai response ri to opi that preserves
opacity of H0 · ri, and, thus, the only t-operation that can be forcefully aborted in an execution of M is
tryC.
We now show that an execution of a transaction in a permissive opaque TM implementation (providing
starvation-free TM-liveness) may require to perform at least one RAW/AWAR pattern per t-read.
Theorem 4.26. Let M be a permissive opaque TM implementation providing starvation-free TM-
liveness. Then, for any m ∈ N, M has an execution in which some transaction performs m t-reads
such that the execution of each t-read contains at least one RAW or AWAR.
Proof. Consider an execution E of M consisting of transactions T1, T2, T3 as shown in Figure 4.2: T3
performs a t-read of X1, then T2 performs a t-write on X1 and commits, and finally T1 performs a
series of reads from objects X1, . . . , Xm. Since the implementation is permissive, no transaction can be
forcefully aborted in E, and the only valid serialization of this execution is T3, T2, T1. Note also that the
execution generates a sequential history: each invocation of a t-operation is immediately followed by a
matching response. Thus, since we assume starvation-freedom as a liveness property, such an execution
exists.
We consider read1(Xk), 2 ≤ k ≤ m in execution E. Imagine that we modify the execution E as follows.
Immediately after read1(Xk) executed by T1 we add write3(Xk, v), and tryC3 executed by T3 (let TC3(Xk)
denote the complete execution of write3(Xk, v) followed by tryC3). Obviously, TC3(Xk) must return abort:
neither T3 can be serialized before T1 nor T1 can be serialized before T3. On the other hand if TC3(Xk)
takes place just before read1(Xk), then TC3(Xk) must return commit but read1(Xk) must return the
value written by T3. In other words, read1(Xk) and TC3(Xk) are strongly non-commutative [17]: both of
them see the difference when ordered differently. As a result, intuitively, read1(Xk) needs to perform a
RAW or AWAR to make sure that the order of these two “conflicting” operations is properly maintained.
We formalize this argument below.
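Before the formal argument, the serialization claims made so far can be checked by brute force on a small instance. The Python sketch below enumerates all orders of T1, T2 and T3 and keeps those in which every read returns the latest written value; it only checks legality of complete sequential orders, which suffices for this scenario but is not a general opacity checker. The concrete values (initial value 0, nv = 1, and 5 for the value written by T3) and all function names are illustrative assumptions.

```python
from itertools import permutations

INITIAL = 0  # assumed initial value of every t-object


def legal_serializations(txns, real_time=()):
    """Return all orders of the transactions that respect the real-time
    pairs (a, b), meaning a before b, and in which every read returns the
    latest value written before it."""
    found = []
    for order in permutations(txns):
        position = {name: idx for idx, name in enumerate(order)}
        if any(position[a] > position[b] for a, b in real_time):
            continue
        state, legal = {}, True
        for name in order:
            for kind, obj, value in txns[name]:
                if kind == "write":
                    state[obj] = value
                elif state.get(obj, INITIAL) != value:
                    legal = False
        if legal:
            found.append(order)
    return found


def scenario(t3_writes, t1_sees):
    """T3 reads X1 (initial value), T2 writes X1, T1 reads X1 and Xk;
    optionally T3 also writes Xk and commits."""
    t3 = [("read", "X1", 0)] + ([("write", "Xk", 5)] if t3_writes else [])
    return {
        "T1": [("read", "X1", 1), ("read", "Xk", t1_sees)],
        "T2": [("write", "X1", 1)],
        "T3": t3,
    }
```

With the real-time constraint that T2 precedes T1, the original execution admits only the order T3, T2, T1; once T3 additionally writes Xk and commits while T1 has read the old value of Xk, no legal order remains, and if T1 instead reads the value written by T3 the order T3, T2, T1 is legal again.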
Consider a modification E′ of E, in which T3 performs write3(Xk) immediately after read1(Xk) and then
tries to commit. In any serialization of E′, T3 must precede T2 (read3(X1) returns the initial value of X1)
and T2 must precede T1 to respect the real-time order of transactions. The execution of read1(Xk) does
not modify base objects, hence, T3 does not observe read1(Xk) in E′. Since M is permissive, T3 must
commit in E′. But since T1 performs read1(Xk) before T3 commits and T3 updates Xk, we also have that
T1 must precede T3 in any serialization. Thus, T3 cannot precede T1 in any serialization, a contradiction.
Consequently, each read1(Xk) must perform a write to a base object.
Figure 4.2: Execution E of a permissive, opaque TM: T3 performs R3(X1) → v, T2 performs W2(X1, nv) and
commits (C2), and T1 performs R1(X1) → nv, . . . , R1(Xm). T2 and T3 force T1 to perform a RAW/AWAR in each
R1(Xk), 2 ≤ k ≤ m
Let π be the execution fragment that represents the complete execution of read1(Xk) and Ek, the prefix
of E up to (but excluding) the invocation of read1(Xk).
Clearly, π contains a write to a base object. Let πw be the first write to a base object in π. Thus, π can
be represented as πs · πw · πf. Suppose that π does not contain a RAW or AWAR. Consider the execution
fragment Ek · πs · ρ, where ρ is the complete execution of TC3(Xk) by T3. Such an execution of M exists
since πs does not perform any base object write, hence, Ek · πs · ρ is indistinguishable to T3 from Ek · ρ.
Since, by our assumption, πw · πf contains no RAW, any read performed in πw · πf can only be applied
to base objects previously written in πw · πf. Since πw is not an AWAR, Ek · πs · ρ · πw · πf is an
execution of M since it is indistinguishable to T1 from Ek · π. In Ek · πs · ρ · πw · πf, T3 commits (as
in ρ) but T1 ignores the value written by T3 to Xk. But there exists no serialization that justifies this
execution, a contradiction to the assumption that M is opaque. Thus, each read1(Xk), 2 ≤ k ≤ m must
contain a RAW/AWAR.
Note that since all t-reads of T1 are executed sequentially, all these RAW/AWAR patterns are pairwise
non-overlapping, which completes the proof.
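The role of the RAW pattern in the argument above can be made concrete with a small checker. The sketch below is illustrative only: it assumes a simplified reading of the RAW definition (a write to one base object followed later by a read of a different base object, with no write to that second object in between), it ignores AWAR entirely, and the event encoding and the name first_raw are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: one process's events as (kind, base_object),
# where kind is "read" or "write".
Event = Tuple[str, str]

def first_raw(events: List[Event]) -> Optional[Tuple[int, int]]:
    """Return (write_index, read_index) of the first RAW found, else None.

    Assumed definition: a write to base object x, followed later by a read
    of a different base object y, with no write to y in between.
    """
    last_write_to = {}   # base object -> index of its most recent write
    last_write_any = -1  # index of the most recent write to any object
    for i, (kind, obj) in enumerate(events):
        if kind == "write":
            last_write_to[obj] = i
            last_write_any = i
        elif kind == "read":
            # The most recent write is to another object exactly when it is
            # more recent than the latest write to obj itself.
            if last_write_any > last_write_to.get(obj, -1):
                return (last_write_any, i)
    return None

# A fragment that writes b and then only reads b contains no RAW ...
print(first_raw([("read", "a"), ("write", "b"), ("read", "b")]))
# ... while reading an unwritten object c after the write does.
print(first_raw([("write", "b"), ("read", "c")]))
```

Under this reading, a fragment with no RAW can only read objects whose latest write is also the latest write overall, which is the property the proof uses when it reorders the fragment after the first write.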
4.6 Related work and Discussion
In this section, we summarize the complexity bounds for blocking TMs presented in this chapter and
identify some open questions.
Sequential TMs. Theorem 4.2 improves the read-validation step-complexity lower bound [62, 64]
derived for strict data-partitioning (a very strong version of DAP) and invisible reads. In a strict
data-partitioned TM, the set of base objects used by the TM is split into disjoint sets, each storing information
only about a single data item. Indeed, every TM implementation that is strict data-partitioned satisfies
weak DAP, but not vice versa (cf. Section 2.6). The definition of invisible reads assumed in [62, 64]
requires that a t-read operation does not apply nontrivial events in any execution. Theorem 4.2, however,
assumes weak invisible reads, stipulating that t-read operations of a transaction T do not apply nontrivial
events only when T is not concurrent with any other transaction. We believe that the TM-progress and
TM-liveness restrictions as well as the definitions of DAP and invisible reads we consider for this result
are the weakest possible assumptions that may be made. To the best of our knowledge, these assumptions
cover every TM implementation that is subject to the validation step-complexity [36, 39, 79].
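To give some intuition for why validation cost grows quadratically in the read set, the following sketch counts validation steps under incremental validation, where each t-read re-checks every item read earlier by the same transaction. This is a generic illustration under that assumption, not one of the algorithms of this chapter, and it shows only how the cost accumulates, not the lower-bound argument; the function name is hypothetical.

```python
def validation_steps(num_reads: int) -> int:
    """Count validation steps when every t-read re-checks all items
    read earlier by the same transaction (incremental validation)."""
    steps = 0
    read_set = []
    for item in range(num_reads):
        steps += len(read_set)  # re-validate each previously read item
        read_set.append(item)   # then add the new item to the read set
    return steps

# The total is 0 + 1 + ... + (m - 1) = m(m - 1)/2, i.e. quadratic in m.
for m in (1, 4, 100):
    print(m, validation_steps(m))
```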
Progressive TMs. We summarize the known complexity bounds for progressive (and resp. strongly
progressive) TMs in Table 4.1 (and resp. Table 4.2). Some questions remain open. Can the tight bounds
on step complexity for progressive TMs in Corollaries 4.14 and 4.16 be extended to strongly progressive
TMs?
Guerraoui and Kapalka [64] proved that it is impossible to implement strictly serializable strongly
progressive TMs that provide wait-free TM-liveness (every t-operation returns a matching response within
a finite number of steps) using only read and write primitives. Algorithm 4.5 describes one means to
circumvent this impossibility result by describing an opaque strongly progressive TM implementation
from read-write base objects that provides starvation-free TM-liveness.

TM-correctness          TM-liveness      DAP     Invisible reads  Read-write  Complexity
Opacity                 ICF              weak    yes              yes         Θ(|Rset|^2) step-complexity
Strict serializability  ICF              weak    yes              yes         Θ(|Rset|) step-complexity for tryCommit
Opacity                 WF               strict  yes              yes         O(1) RAW/AWAR, O(1) stalls for t-reads
Opacity                 starvation-free  strict  -                -           Θ(|Wset|) protected data
Table 4.1: Complexity bounds for progressive TMs.

TM-correctness          TM-liveness      Invisible reads  rmw primitives           Complexity
Strict serializability  WF               -                read-write               Impossible
Strict serializability  -                -                read-write, conditional  Ω(n log n) RMRs
Opacity                 starvation-free  yes              read-write               O(1) RAW/AWAR
Table 4.2: Complexity bounds for strongly progressive TMs.
We conjecture that the lower bound of Theorem 4.21 on the RMR complexity is tight. Proving this
remains an interesting open question.
Permissive TMs. Crain et al. [34] proved that a permissive opaque TM implementation cannot main-
tain invisible reads, which inspired the derivation of our lower bound on RAW/AWAR complexity in
Section 4.5. Furthermore, [34] described a permissive VWC TM implementation that ensures that t-read
operations do not perform nontrivial primitives, but the tryCommit invoked by a read-only transaction
performs a linear (in the size of the transaction’s data set) number of RAW/AWARs. Thus, an open question
is whether there exists a linear lower bound on RAW/AWAR complexity for the weaker (than opacity)
TM-correctness properties VWC and TMS1.

Chapter 5 Complexity bounds for non-blocking TMs
5.1 Overview
In the previous chapter, we presented complexity bounds for lock-based blocking TMs. Early TM implementations
such as the popular DSTM [79], however, avoid using locks and provide non-blocking
TM-progress. In this chapter, we present several complexity bounds for non-blocking TMs exemplified
by obstruction-freedom, possibly the weakest non-blocking progress condition [78, 82].
We first establish that it is impossible to implement a strictly serializable obstruction-free TM that
provides both weak DAP and read invisibility. Indeed, popular obstruction-free TMs like DSTM [79]
and FSTM [52] are weak DAP, but use visible reads for aborting pending writing transactions. Secondly,
we show that a t-read operation in an n-process strictly serializable obstruction-free TM implementation
may incur Ω(n) stalls. Specifically, we prove that every such TM implementation has an (n − 1)-stall
execution for an invoked t-read operation. Thirdly, we prove that any RW DAP opaque obstruction-free
TM implementation has an execution in which a read-only transaction incurs Ω(n) non-overlapping
RAWs or AWARs. Finally, we show that there exists a considerable complexity gap between blocking
(i.e., progressive) and non-blocking (i.e., obstruction-free) TM implementations. We use the progressive
opaque TM implementation LP described in Algorithm 4.1 (Chapter 4) to establish a linear separation
in memory stall and RAW/AWAR complexity between blocking and non-blocking TMs.
Formally, let OF denote the class of TMs that provide OF TM-progress and OF TM-liveness.
Roadmap of Chapter 5. In Section 5.2, we show that no strictly serializable TM in OF can be
weak DAP and have invisible reads. In Section 5.3, we determine stall complexity bounds for strictly
serializable TMs in OF , and in Section 5.4, we present a linear (in n) lower bound on the RAW/AWAR
complexity for RW DAP opaque TMs in OF . In Section 5.5, we describe two obstruction-free algorithms:
a RW DAP opaque TM and a weak DAP (but not RW DAP) opaque TM. In Section 5.6, we present
complexity gaps between blocking and non-blocking TM implementations. We conclude this chapter
with a discussion on related work and open questions concerning obstruction-free TMs.
5.2 Impossibility of weak DAP and invisible reads
In this section, we prove that it is impossible to combine weak DAP and invisible reads for strictly
serializable TMs in OF .
Figure 5.1: Executions in the proof of Theorem 5.1; execution in 5.1d is not strictly serializable. In each
panel, T0 performs R0(Z) → v and W0(X, nv) and invokes tryC0, e is an event of T0, T2 performs R2(X) → v
(initial value) and T1 performs R1(X) → nv (new value); in panels (b) to (d), T3 performs W3(Z, nv).
(a) T1 returns new value of X since T2 is invisible
(b) T1 and T3 do not contend on any base object
(c) T3 does not access the base object from the nontrivial event e
(d) T3 and T2 do not contend on any base object
Here is a proof sketch: suppose, by contradiction, that such a TM implementation M exists. Consider
an execution E of M in which a transaction T0 performs a t-read of t-object Z (returning the initial
value v), writes nv (new value) to t-object X, and commits. Let E′ denote the longest prefix of E that
cannot be extended with the t-complete step contention-free execution of any transaction that reads nv
in X and commits.
Thus if T0 takes one more step, then the resulting execution E′ · e can be extended with the t-complete
step contention-free execution of a transaction T1 that reads nv in X and commits.
Since M uses invisible reads, the following execution exists: E′ can be extended with the t-complete step
contention-free execution of a transaction T2 that reads the initial value v in X and commits, followed
by the step e of T0 after which transaction T1 running step contention-free reads nv in X and commits.
Moreover, this execution is indistinguishable to T1 and T2 from an execution in which the read set of
T0 is empty. Thus, we can modify this execution by inserting the step contention-free execution of a
committed transaction T3 that writes a new value to Z after E′, but preceding T2 in real-time order.
Intuitively, by weak DAP, transactions T1 and T2 cannot distinguish this execution from the original one
in which T3 does not participate.
Thus, we can show that the following execution exists: E′ is extended with the t-complete step contention-
free execution of T3 that writes nv to Z and commits, followed by the t-complete step contention-free
execution of T2 that reads the initial value v in X and commits, followed by the step e of T0, after which
T1 reads nv in X and commits.
This execution is, however, not strictly serializable: T0 must appear in any serialization (T1 reads a value
written by T0). Transaction T2 must precede T0, since the t-read of X by T2 returns the initial value of
X. To respect real-time order, T3 must precede T2. Finally, T0 must precede T3 since the t-read of Z
returns the initial value of Z. The cycle T0 → T3 → T2 → T0 implies a contradiction.
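The non-serializability claim in this sketch can be checked mechanically: a serialization must be a total order of the four transactions that respects every precedence constraint listed above, and a brute-force search over all orders finds none. The snippet below is a minimal sketch of that check; the function name and the encoding of constraints as pairs are hypothetical.

```python
from itertools import permutations

def find_serialization(txns, must_precede):
    """Return a total order of txns satisfying every (a, b) constraint
    'a precedes b', or None if no such order exists."""
    for order in permutations(txns):
        pos = {t: i for i, t in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in must_precede):
            return list(order)
    return None

txns = ["T0", "T1", "T2", "T3"]
constraints = [
    ("T0", "T1"),  # T1 reads nv in X, which is written by T0
    ("T2", "T0"),  # T2 reads the initial value of X
    ("T3", "T2"),  # real-time order: T3 commits before T2 starts
    ("T0", "T3"),  # T0 reads the initial value of Z, which T3 overwrites
]

# The cycle T0 -> T3 -> T2 -> T0 rules out every total order.
print(find_serialization(txns, constraints))
```

Dropping any one of the three constraints on the cycle (for instance the real-time constraint between T3 and T2) makes the search succeed, which is why the construction needs all three ingredients: the read of Z by T0, invisible reads, and weak DAP.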
The formal proof follows.
Theorem 5.1. There does not exist a weak DAP strictly serializable TM implementation in OF that
uses invisible reads.
Proof. By contradiction, assume that such an implementation M ∈ OF exists. Let v be the initial value
of t-objects X and Z. Consider an execution E of M in which a transaction T0 performs read0(Z) → v
(returning v), writes nv ≠ v to X, and commits. Let E′ denote the longest prefix of E that cannot be
extended with the t-complete step contention-free execution of any transaction performing a t-read of X
that returns nv and commits.
Let e be the enabled event of transaction T0 in the configuration after E′. Without loss of generality,
assume that E′ · e can be extended with the t-complete step contention-free execution of a committed
transaction T1 that reads X and returns nv. Let E′ · e · E1 be such an execution, where E1 is the
t-complete step contention-free execution fragment of transaction T1 that performs read1(X)→ nv and
commits.
We now prove that M has an execution of the form E′ · E2 · e · E1, where E2 is the t-complete step
contention-free execution fragment of transaction T2 that performs read2(X)→ v and commits.
We observe that E′ · E2 is an execution of M. Indeed, by OF TM-progress and OF TM-liveness, T2 must
return a matching response that is not A2 in E′ · E2, and by the definition of E′, this response must be
the initial value v of X.
By the assumption of invisible reads, E2 does not contain any nontrivial events. Consequently, E′·E2·e·E1
is indistinguishable to transaction T1 from the execution E′ ·e·E1. Thus, E′ ·E2 ·e·E1 is also an execution
of M (Figure 5.1a).
Claim 5.2. M has an execution of the form E′ · E2 · E3 · e · E1 where E3 is the t-complete step contention-free
execution fragment of transaction T3 that writes nv ≠ v to Z and commits.
Proof. The proof is through a sequence of indistinguishability arguments to construct the execution.
We first claim that M has an execution of the form E′ ·E2 · e ·E1 ·E3. Indeed, by OF TM-progress and
OF TM-liveness, T3 must be committed in E′ · E2 · e · E1 · E3.
Since M uses invisible reads, the execution E′ · E2 · e · E1 · E3 is indistinguishable to transactions T1
and T3 from the execution Ê · E1 · E3, where Ê is the t-incomplete step contention-free execution of
transaction T0 with Wset_Ê(T0) = {X} and Rset_Ê(T0) = ∅ that writes nv to X.
Observe that transactions T3 and T1 are disjoint-access in the execution Ê · E1 · E3. Consequently, by
Lemma 2.10, T1 and T3 do not contend on any base object in Ê · E1 · E3. Thus, M has an execution of the form
E′ · E2 · e · E3 · E1 (Figure 5.1b).
By definition of E′, T0 applies a nontrivial primitive to some base object, say b, in event e that T1
must access in E1. Thus, the execution fragment E3 does not contain any nontrivial event on b in the
execution E′ · E2 · e · E1 · E3. In fact, since T3 is disjoint-access with T0 in the execution Ê · E3 · E1, by
Lemma 2.10, it cannot access the base object b to which T0 applies a nontrivial primitive in the event e.
Thus, transaction T3 must perform the same sequence of events E3 immediately after E′, implying that
M has an execution of the form E′ · E2 · E3 · e · E1 (Figure 5.1c).
Finally, we observe that the execution E′ ·E2 ·E3 · e ·E1 established in Claim 5.2 is indistinguishable to
transactions T2 and T3 from an execution Ẽ · E2 · E3 · e · E1, where Wset(T0) = {X} and Rset(T0) = ∅ in
Ẽ. But transactions T3 and T2 are disjoint-access in Ẽ · E2 · E3 · e · E1 and by Lemma 2.10, T2 and T3 do
not contend on any base object in this execution. Thus, M has an execution of the form E′ ·E3 ·E2 ·e ·E1
(Figure 5.1d) in which T3 precedes T2 in real-time order.
However, the execution E′ · E3 · E2 · e · E1 is not strictly serializable: T0 must be committed in any
serialization and transaction T2 must precede T0 since read2(X) returns the initial value of X. To
respect real-time order, T3 must precede T2, while T0 must precede T1 since read1(X) returns nv, the
value of X updated by T0. Finally, T0 must precede T3 since read0(Z) returns the initial value of Z. But
there exists no such serialization—contradiction.
5.3 A linear lower bound on memory stall complexity
We prove a linear (in n) lower bound for strictly serializable TM implementations in OF on the total
number of memory stalls incurred by a single t-read operation.
Inductively, for each k ≤ n − 1, we construct a specific k-stall execution [46] in which some t-read
operation by a process p incurs k stalls. In the k-stall execution, k processes are partitioned into disjoint
subsets S1, . . . , Si. The execution can be represented as α · σ1 · · · σi, where α is p-free and, in each σj,
j = 1, . . . , i, p first runs by itself, then each process in Sj applies a nontrivial event on a base object bj,
and then p applies an event on bj. Moreover, p does not detect step contention in this execution and,
thus, must return a non-abort value in its t-read and commit in the solo extension of it. Additionally, it
is guaranteed that in any extension of α by the processes other than {p} ∪ S1 ∪ S2 ∪ . . . ∪ Si, no nontrivial
primitive is applied on a base object accessed in σ1 · · · σi.
Assuming that k ≤ n− 2, we introduce a not previously used process executing an updating transaction
immediately after α, so that the subsequent t-read operation executed by p is “perturbed” (must return
another value). This will help us to construct a (k + k′)-stall execution α · α′ · σ1 · · ·σi · σi+1, where
k′ > 0.
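The shape of a k-stall execution described above can be illustrated with a small counter. The sketch below assumes a simplified notion of stalls: an event of p on base object b incurs one stall for each distinct other process whose nontrivial event on b is applied consecutively, immediately before it. Definition 2.18 and [46] give the precise definition; the schedule encoding and the name stalls_incurred are hypothetical.

```python
def stalls_incurred(schedule, p):
    """schedule: list of (process, base_object, is_nontrivial) events.

    For each event of p on base object b, count the distinct other
    processes whose nontrivial events on b immediately precede it
    (a simplified reading of memory stalls)."""
    total = 0
    for i, (proc, obj, _) in enumerate(schedule):
        if proc != p:
            continue
        seen = set()
        j = i - 1
        while j >= 0:
            q, o, nontrivial = schedule[j]
            if q == p or o != obj or not nontrivial or q in seen:
                break
            seen.add(q)
            j -= 1
        total += len(seen)
    return total

# sigma_1: p5 runs alone, S1 = {p1, p2} hit b1, then p5 accesses b1.
# sigma_2: p5 runs alone, S2 = {p3} hits b2, then p5 accesses b2.
schedule = [
    ("p5", "c", False),
    ("p1", "b1", True), ("p2", "b1", True), ("p5", "b1", False),
    ("p5", "d", False),
    ("p3", "b2", True), ("p5", "b2", False),
]
print(stalls_incurred(schedule, "p5"))  # |S1| + |S2| = 3
```

With n processes, the sets S1, . . . , Si are disjoint and exclude p, so the count obtained this way can never exceed n − 1.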
The formal proof follows:
Theorem 5.3. Every strictly serializable TM implementation M ∈ OF has an (n − 1)-stall execution E
for a t-read operation performed in E.
Proof. We proceed by induction. Observe that the empty execution is a 0-stall execution since it vacu-
ously satisfies the invariants of Definition 2.18.
Let v be the initial value of t-objects X and Z. Let α = α1 · · · αn−2 be a step contention-free execution
of a strictly serializable TM implementation M ∈ OF, defined as follows. For all j ∈ {1, . . . , n − 2}, let ᾱj
denote the t-complete step contention-free execution of committed transaction Tj (invoked by process pj)
that performs readj(Z) → v and writes value nv ≠ v to X in the execution α1 · · · αj−1 · ᾱj. Then αj is the
longest prefix of ᾱj such that
• tryCj() is incomplete in αj ,
• α1 · · ·αj cannot be extended with the t-complete step contention-free execution fragment of any
transaction Tn−1 or Tn that performs exactly one t-read of X that returns nv and commits.
Assume, inductively, that α · σ1 · · ·σi is a k-stall execution for readn(X) executed by process pn, where
0 ≤ k ≤ n − 2. By Definition 2.18, there are distinct base objects b1, . . . bi accessed by disjoint sets of
processes S1 . . . Si in the execution fragment σ1 · · ·σi, where |S1 ∪ . . .∪ Si| = k and σ1 · · ·σi contains no
events of processes not in S1 ∪ . . . ∪ Si ∪ {pn}. We will prove that there exists a (k + k′)-stall execution
for readn(X), for some k′ ≥ 1.
By Lemma 2.12, α · σ1 · · · σi is indistinguishable to Tn from a step contention-free execution. Let σ be the
finite step contention-free execution fragment that extends α · σ1 · · · σi in which Tn performs events by
itself: completes readn(X) and returns a response. By OF TM-progress and OF TM-liveness, readn(X)
and the subsequent tryCn must each return non-An responses in α · σ1 · · · σi · σ. By construction of α and
strict serializability of M, readn(X) must return the response v or nv in this execution. We prove that
there exists an execution fragment γ performed by some process pn−1 ∉ ({pn} ∪ S1 ∪ · · · ∪ Si) extending
α that contains a nontrivial event on some base object that must be accessed by readn(X) in σ1 · · · σi · σ.
Consider the case that readn(X) returns the response nv in α · σ1 · · · σi · σ. We define a step contention-free
fragment γ extending α that is the t-complete step contention-free execution of transaction Tn−1
executed by some process pn−1 ∉ ({pn} ∪ S1 ∪ · · · ∪ Si) that performs readn−1(X) → v, writes nv ≠ v
to Z and commits. By definition of α, OF TM-progress and OF TM-liveness, M has an execution of
the form α · γ. We claim that the execution fragment γ must contain a nontrivial event on some base
object that must be accessed by readn(X) in σ1 · · · σi · σ. Suppose otherwise. Then, readn(X) must
return the response nv in σ1 · · · σi · σ. But the execution α · σ1 · · · σi · σ is not strictly serializable. Since
readn(X) → nv, there exists a transaction Tq ∈ txns(α) that must be committed and must precede Tn in
any serialization. Transaction Tn−1 must precede Tn in any serialization to respect the real-time order
and Tn−1 must precede Tq in any serialization. Also, Tq must precede Tn−1 in any serialization. But
there exists no such serialization.
Consider the case that readn(X) returns the response v in α · σ1 · · · σi · σ. In this case, we define the step
contention-free fragment γ extending α as the t-complete step contention-free execution of transaction
Tn−1 executed by some process pn−1 ∉ ({pn} ∪ S1 ∪ · · · ∪ Si) that writes nv ≠ v to X and commits. By
definition of α, OF TM-progress and OF TM-liveness, M has an execution of the form α · γ. By strict
serializability of M, the execution fragment γ must contain a nontrivial event on some base object that
must be accessed by readn(X) in σ1 · · · σi · σ. Suppose otherwise. Then, σ1 · · · σi · γ · σ is an execution of
M in which readn(X) → v. But this execution is not strictly serializable: every transaction Tq ∈ txns(α)
must be aborted or must be preceded by Tn in any serialization, but committed transaction Tn−1 must
precede Tn in any serialization to respect the real-time ordering of transactions. But then readn(X) must
return the new value nv of X that is updated by Tn−1, a contradiction.
Since, by Definition 2.18, the execution fragment γ executed by some process pn−1 ∉ ({pn} ∪ S1 ∪ · · · ∪ Si)
contains no nontrivial events to any base object accessed in σ1 · · · σi, it must contain a nontrivial event
to some base object bi+1 ∉ {b1, . . . , bi} that is accessed by Tn in the execution fragment σ.
Let A denote the set of all finite ({pn} ∪ S1 ∪ . . . ∪ Si)-free execution fragments that extend α. Let
bi+1 ∉ {b1, . . . , bi} be the first base object accessed by Tn in the execution fragment σ to which some
transaction applies a nontrivial event in the execution fragment α′ ∈ A. Clearly, some such execution
α · α′ exists that contains a nontrivial event in α′ to some distinct base object bi+1 not accessed in
the execution fragment σ1 · · ·σi. We choose the execution α · α′ ∈ A that maximizes the number of
transactions that are poised to apply nontrivial events on bi+1 in the configuration after α ·α′. Let Si+1
denote the set of processes executing these transactions and k′ = |Si+1| (k′ > 0 as already proved).
We now construct a (k+ k′)-stall execution α ·α′ · σ1 · · ·σi · σi+1 for readn(X), where in σi+1, pn applies
events by itself, then each of the processes in Si+1 applies a nontrivial event on bi+1, and finally, pn
accesses bi+1.
By construction, α · α′ is pn-free. Let σi+1 be the prefix of σ not including Tn’s first access to bi+1,
concatenated with the nontrivial events on bi+1 by each of the k′ transactions executed by processes in
Si+1 followed by the access of bi+1 by Tn. Observe that Tn performs exactly one t-operation readn(X)
in the execution fragment σ1 · · ·σi+1 and σ1 · · ·σi+1 contains no events of processes not in ({pn} ∪ S1 ∪
· · · ∪ Si ∪ Si+1).
To complete the induction, we need to show that in every ({pn} ∪ S1 ∪ · · · ∪ Si ∪ Si+1)-free extension of
α · α′, no transaction applies a nontrivial event to any base object accessed in the execution fragment
σ1 · · ·σi ·σi+1. Let β be any such execution fragment that extends α ·α′. By our construction, σi+1 is the
execution fragment that consists of events by pn on base objects accessed in σ1 · · ·σi, nontrivial events
on bi+1 by transactions in Si+1 and finally, an access to bi+1 by pn. Since α · σ1 · · · σi is a k-stall execution
by our induction hypothesis, α′ · β is ({pn} ∪ S1 ∪ . . . ∪ Si)-free and thus, α′ · β does not contain nontrivial
events on any base object accessed in σ1 · · · σi. We now claim that β does not contain nontrivial events
to bi+1. Suppose otherwise. Thus, there exists some transaction T ′ that has an enabled nontrivial event
to bi+1 in the configuration after α · α′ · β′, where β′ is some prefix of β. But this contradicts the choice
of α · α′ as the extension of α that maximizes k′.
Thus, α ·α′ ·σ1 · · ·σi ·σi+1 is indeed a (k+k′)-stall execution for Tn where 1 < k < (k+k′) ≤ (n−1).
Since there are at most n processes that are concurrent at any prefix of an execution, the lower bound
of Theorem 5.3 is tight.
5.4 A linear lower bound on expensive synchronization for RW DAP
We prove that opaque, RW DAP TM implementations in OF have executions in which some read-only
transaction performs a linear (in n) number of non-overlapping RAWs or AWARs.
Prior to presenting the formal proof, we present an overview (the executions used in the proof are depicted
in Figure 5.2).
We first construct an execution of the form ρ̄1 · · · ρ̄m, where m = n − 3 and, for all j ∈ {1, . . . , m}, ρ̄j
denotes the t-complete step contention-free execution of transaction Tj that reads the initial value v in a
distinct t-object Zj, writes a new value nv to a distinct t-object Xj and commits. Observe that since any
two transactions that participate in this execution are mutually read-write disjoint-access, they cannot
contend on the same base object and, thus, the execution appears solo to each of them.
Let each of two new transactions Tn−1 and Tn perform m t-reads on objects X1, . . . , Xm. For j ∈
{1, . . . , m}, we now define ρj to be the longest prefix of ρ̄j such that ρ1 · · · ρj cannot be extended with the
complete step contention-free execution fragment of Tn−1 or Tn where the t-read of Xj returns nv
(Figure 5.2a). Let ej be the event by Tj enabled after ρ1 · · · ρj. Let us count the number of indices
j ∈ {1, . . . , m} such that Tn−1 (resp., Tn) reads the new value nv in Xj when it runs after ρ1 · · · ρj · ej.
Without loss of generality, assume that Tn−1 has at least as many such indices j as Tn. We are going to show
that, in the worst case, Tn must perform ⌈m/2⌉ non-overlapping RAW/AWARs in the course of performing
m t-reads of X1, . . . , Xm immediately after ρ1 · · · ρm.
Consider any j ∈ {1, . . . ,m} such that Tn−1, when it runs step contention-free after ρ1 · · · ρj · ej , reads
nv in Xj . We claim that, in ρ1 · · · ρm extended with the step contention-free execution of Tn performing
j t-reads readn(X1) · · · readn(Xj), the t-read of Xj must contain a RAW or an AWAR.
Suppose not. Then we are going to schedule a specific execution of Tj and Tn−1 concurrently with
readn(Xj) so that Tn cannot detect the concurrency. By the definition of ρj and the fact that the TM
is RW DAP, Tn, when it runs step contention-free after ρ1 · · · ρm, must read v (the initial value) in
Xj (Figure 5.2b). Then the following execution exists: ρ1 · · · ρm is extended with the t-complete step
contention-free execution of Tn−2 writing nv to Zj and committing, after which Tn runs step contention-
free and reads v in Xj (Figure 5.2c). Since, by the assumption, readn(Xj) contains no RAWs or AWARs,
we show that we can run Tn−1 performing j t-reads concurrently with the execution of readn(Xj) so that
Tn and Tn−1 are unaware of step contention and readn−1(Xj) still reads the value nv in Xj .
To understand why this is possible, consider the following: we take the execution depicted in Figure 5.2c,
but without the execution of readn(Xj), i.e, ρ1 · · · ρm is extended with the step contention-free execution
of committed transaction Tn−2 writing nv to Zj , after which Tn runs step contention-free performing
j−1 t-reads. This execution can be extended with the step ej by Tj , followed by the step contention-free
execution of transaction Tn−1 in which it reads nv in Xj . Indeed, by RW DAP and the definition of
ρj · ej , there exists such an execution (Figure 5.2d).
Since readn(Xj) contains no RAWs or AWARs, we can reschedule the execution fragment ej followed by
the execution of Tn−1 so that it is concurrent with the execution of readn(Xj) and neither Tn nor Tn−1
see a difference (Figure 5.2e). Therefore, in this execution, readn(Xj) still returns v, while readn−1(Xj)
returns nv.
However, the resulting execution (Figure 5.2e) is not opaque. In any serialization the following must
hold. Since Tn−1 reads the value written by Tj in Xj , Tj must be committed. Since readn(Xj) returns
the initial value v, Tn must precede Tj . The committed transaction Tn−2, which writes a new value to
Zj , must precede Tn to respect the real-time order on transactions. However, Tj must precede Tn−2 since
readj(Zj) returns the initial value and the implementation is opaque. The cycle Tj → Tn−2 → Tn → Tj
implies a contradiction.
Thus, we can show that transaction Tn must perform Ω(n) RAW/AWARs during the execution of m
t-reads immediately after ρ1 · · · ρm.
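The step from the per-index argument to the Ω(n) bound is a pigeonhole count, which can be spelled out as follows. The set names J_{n-1} and J_n are notation introduced here for illustration only: they collect the indices j for which T_{n-1} (resp., T_n) reads nv in X_j when run after ρ1 · · · ρj · ej.

```latex
\begin{align*}
  J_{n-1} \cup J_n &= \{1, \dots, m\}
    && \text{(after } \rho_1 \cdots \rho_j \cdot e_j \text{, at least one of } T_{n-1}, T_n \text{ reads } nv \text{ in } X_j\text{)} \\
  \max\bigl(|J_{n-1}|, |J_n|\bigr) &\ge \left\lceil \tfrac{m}{2} \right\rceil
    && \text{(pigeonhole)} \\
  |J_{n-1}| &\ge \left\lceil \tfrac{m}{2} \right\rceil
    && \text{(without loss of generality)} \\
  \#\,\text{RAW/AWARs of } T_n &\ge |J_{n-1}| \ge \left\lceil \tfrac{n-3}{2} \right\rceil \in \Omega(n)
    && \text{(one per } j \in J_{n-1}\text{, with } m = n - 3\text{)}
\end{align*}
```

Since the t-reads of T_n are performed one after another, the patterns counted in the last line are pairwise non-overlapping.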
Figure 5.2: Executions in the proof of Theorem 5.4; execution in 5.2e is not opaque
(a) Transactions in {T1, . . . , Tm}; m = n − 3 are mutually read-write disjoint-access and concurrent; they
are poised to apply a nontrivial primitive
(b) Tn performs m reads; each readn(Xj) returns initial value v
(c) Tn−2 commits; Tn is read-write disjoint-access with Tn−2
(d) Tn−1 is read-write disjoint-access with Tn−2; readn−1(Xj) returns the value nv
(e) Suppose readn(Xj) does not perform a RAW/AWAR, Tn and Tn−1 are unaware of step contention and Tn misses
the event of Tj, but Rn−1(Xj) returns the value of Xj that is updated by Tj
Theorem 5.4. Every RW DAP opaque TM implementation M ∈ OF has an execution E in which some
read-only transaction T ∈ txns(E) performs Ω(n) non-overlapping RAW/AWARs.
Proof. For all j ∈ {1, . . . ,m}; m = n− 3, let v be the initial value of t-objects Xj and Zj . Throughout
this proof, we assume that, for all i ∈ {1, . . . , n}, transaction Ti is invoked by process pi.
By OF TM-progress and OF TM-liveness, any opaque and RW DAP TM implementation M ∈ OF
has an execution of the form ρ̄1 · · · ρ̄m, where for all j ∈ {1, . . . , m}, ρ̄j denotes the t-complete step
contention-free execution of transaction Tj that performs readj(Zj) → v, writes value nv ≠ v to Xj and
commits.
By construction, any two transactions that participate in ρ̄1 · · · ρ̄m are mutually read-write disjoint-access
and cannot contend on the same base object. It follows that for all 1 ≤ j ≤ m, ρ̄j is an execution of M.
For all j ∈ {1, . . . , m}, we iteratively define an execution ρj of M as follows: it is the longest prefix
of ρ̄j such that ρ1 · · · ρj can be extended neither with the complete step contention-free execution fragment
of transaction Tn that performs j t-reads readn(X1) · · · readn(Xj) in which readn(Xj) → nv, nor
with the complete step contention-free execution fragment of transaction Tn−1 that performs j t-reads
readn−1(X1) · · · readn−1(Xj) in which readn−1(Xj) → nv (Figure 5.2a).
For any j ∈ {1, . . . ,m}, let ej be the event transaction Tj is poised to apply in the configuration
after ρ1 · · · ρj . Thus, the execution ρ1 · · · ρj · ej can be extended with the complete step contention-free
executions of at least one of transaction Tn or Tn−1 that performs j t-reads of X1, . . . , Xj in which the
t-read of Xj returns the new value nv. Let Tn−1 be the transaction that must return the new value for
the maximum number of Xj ’s when ρ1 · · · ρj · ej is extended with the t-reads of X1, . . . , Xj . We show
that, in the worst case, transaction Tn must perform ⌈m/2⌉ non-overlapping RAW/AWARs in the course
of performing m t-reads of X1, . . . , Xm immediately after ρ1 · · · ρm. Symmetric arguments apply for the
case when Tn must return the new value for the maximum number of Xj ’s when ρ1 · · · ρj · ej is extended
with the t-reads of X1, . . . , Xj .
Proving the RAW/AWAR lower bound. We prove that transaction Tn must perform ⌈m/2⌉ non-overlapping
RAWs or AWARs in the course of performing m t-reads of X1, . . . , Xm immediately after
the execution ρ1 · · · ρm. Specifically, we prove that Tn must perform a RAW or an AWAR during
the execution of the t-read of each Xj such that ρ1 · · · ρj · ej can be extended with the complete step
contention-free execution of Tn−1 as it performs j t-reads of X1 . . . Xj in which the t-read of Xj returns
the new value nv. Let J denote the set of all j ∈ {1, . . . , m} such that ρ1 · · · ρj · ej extended with the complete
step contention-free execution of Tn−1 performing j t-reads of X1 . . . Xj must return the new value nv
during the t-read of Xj.
We first prove that, for all j ∈ J, M has an execution of the form ρ1 · · · ρm · δj (Figures 5.2a and
5.2b), where δj is the complete step contention-free execution fragment of Tn that performs j t-reads:
readn(X1) · · · readn(Xj), each of which returns the initial value v.
By definition of ρj , OF TM-progress and OF TM-liveness, M has an execution of the form ρ1 · · · ρj · δj .
By construction, transaction Tn is read-write disjoint-access with each transaction T ∈ {Tj+1, . . . , Tm}
in ρ1 · · · ρj · · · ρm · δj . Thus, Tn cannot contend with any of the transactions in {Tj+1, . . . , Tm}, implying
that, for all j ∈ {1, . . . ,m}, M has an execution of the form ρ1 · · · ρm · δj (Figure 5.2b).
We claim that, for each j ∈ J, the t-read of Xj performed by Tn must perform a RAW or an AWAR in
the course of performing j t-reads of X1, . . . , Xj immediately after ρ1 · · · ρm. Suppose by contradiction
that readn(Xj) does not perform a RAW or an AWAR in ρ1 · · · ρm · δm.
Claim 5.5. For all j ∈ J, M has an execution of the form ρ1 · · · ρj · · · ρm · δj−1 · ej · β, where β
is the complete step contention-free execution fragment of transaction Tn−1 that performs j t-reads:
readn−1(X1) · · · readn−1(Xj−1) · readn−1(Xj) in which readn−1(Xj) returns nv.
Proof. We observe that transaction Tn is read-write disjoint-access with every transaction T ∈ {Tj , Tj+1, . . . , Tm}
in ρ1 · · · ρj · · · ρm · δj−1. By RW DAP, it follows that M has an execution of the form ρ1 · · · ρj · · · ρm · δj−1 · ej
since Tn cannot perform a nontrivial event on the base object accessed by Tj in the event ej .
By the definition of ρj , transaction Tn−1 must access the base object to which Tj applies a nontrivial
primitive in ej to return the value nv of Xj as it performs j t-reads of X1, . . . , Xj immediately after the
execution ρ1 · · · ρj · · · ρm · δj−1 · ej . Thus, M has an execution of the form ρ1 · · · ρj · δj−1 · ej · β.
By construction, transaction Tn−1 is read-write disjoint-access with every transaction T ∈ {Tj+1, . . . , Tm}
in ρ1 · · · ρj · · · ρm · δj−1 · ej · β. It follows that M has an execution of the form ρ1 · · · ρj · · · ρm · δj−1 · ej · β.
Claim 5.6. For all j ∈ {1, . . . ,m}, M has an execution of the form ρ1 · · · ρj · · · ρm ·γ · δj−1 · ej ·β, where
γ is the t-complete step contention-free execution fragment of transaction Tn−2 that writes nv ≠ v to Zj
and commits.
Proof. Observe that Tn−2 precedes transactions Tn and Tn−1 in real-time order in the above execution.
By OF TM-progress and OF TM-liveness, transaction Tn−2 must be committed in ρ1 · · · ρj · · · ρm · γ.
Since transaction Tn−1 is read-write disjoint-access with Tn−2 in ρ1 · · · ρj · · · ρm · γ · δj−1 · ej · β, Tn−1
does not contend with Tn−2 on any base object (recall that we associate an edge with t-objects in the
conflict graph only if they are both contained in the write set of some transaction). Since the execution
fragment β contains an access to the base object to which Tj performs a nontrivial primitive in the
event ej , Tn−2 cannot perform a nontrivial event on this base object in γ. It follows that M has an
execution of the form ρ1 · · · ρj · · · ρm · γ · δj−1 · ej · β since it is indistinguishable to Tn−1 from the execution
ρ1 · · · ρj · · · ρm · δj−1 · ej · β (the existence of which is already established in Claim 5.5).
Recall that transaction Tn is read-write disjoint-access with Tn−2 in ρ1 · · · ρj · · · ρm · γ · δj . Thus, M has
an execution of the form ρ1 · · · ρj · · · ρm · γ · δj (Figure 5.2c).
Deriving a contradiction. For all j ∈ {1, . . . ,m}, we represent the execution fragment δj as δj−1 · π^j,
where π^j is the complete execution fragment of the jth t-read readn(Xj) → v. By our assumption, π^j
does not contain a RAW or an AWAR.
For succinctness, let α = ρ1 · · · ρm · γ · δj−1. We now prove that if π^j does not contain a RAW or an
AWAR, we can define π^j_1 · π^j_2 = π^j to construct an execution of the form α · π^j_1 · ej · β · π^j_2 (Figure 5.2e)
such that
• no event in π^j_1 is the application of a nontrivial primitive
• α · π^j_1 · ej · β · π^j_2 is indistinguishable to Tn from the step contention-free execution α · π^j_1 · π^j_2
• α · π^j_1 · ej · β · π^j_2 is indistinguishable to Tn−1 from the step contention-free execution α · ej · β.
The following claim defines π^j_1 and π^j_2 to construct this execution.
Claim 5.7. For all j ∈ {1, . . . ,m}, M has an execution of the form α · π^j_1 · ej · β · π^j_2.
Proof. Let t be the first event containing a write to a base object in the execution fragment π^j. We
represent π^j as the execution fragment π^j_1 · t · π^j_f. Since π^j_1 does not contain nontrivial events that write
to a base object, α · π^j_1 · ej · β is indistinguishable to transaction Tn−1 from the step contention-free
execution α · ej · β (as already proven in Claim 5.6). Consequently, α · π^j_1 · ej · β is an execution of M .
Since t is not an atomic-write-after-read, M has an execution of the form α · π^j_1 · ej · β · t. Secondly,
since π^j does not contain a read-after-write, any read of a base object performed in π^j_f may only be
performed to base objects previously written in t · π^j_f. Thus, α · π^j_1 · ej · β · t · π^j_f is indistinguishable to Tn
from the step contention-free execution α · π^j_1 · t · π^j_f. But, as already proved, α · π^j is an execution of M .
Choosing π^j_2 = t · π^j_f, it follows that M has an execution of the form α · π^j_1 · ej · β · π^j_2.
We have now proved that, for all j ∈ {1, . . . ,m}, M has an execution of the form ρ1 · · · ρm · γ · δj−1 · π^j_1 ·
ej · β · π^j_2 (Figure 5.2e).
The execution in Figure 5.2e is not opaque. Indeed, in any serialization the following must hold. Since
Tn−1 reads the value written by Tj in Xj , Tj must be committed. Since readn(Xj) returns the initial
value v, Tn must precede Tj . The committed transaction Tn−2, which writes a new value to Zj , must
precede Tn to respect the real-time order on transactions. However, Tj must precede Tn−2 since readj(Zj)
returns the initial value of Zj . The cycle Tj → Tn−2 → Tn → Tj implies that no such serialization
exists.
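The cycle argument can be checked mechanically. The following Python sketch is only an illustration and not part of the proof: it enumerates all total orders of the four transactions and tests them against the precedence constraints listed above. The function name has_serialization and the string labels for the transactions are hypothetical.

```python
from itertools import permutations

def has_serialization(transactions, precedes):
    """Return True if some total order of `transactions` respects
    every pair (a, b) in `precedes`, read as "a must precede b"."""
    for order in permutations(transactions):
        position = {t: i for i, t in enumerate(order)}
        if all(position[a] < position[b] for a, b in precedes):
            return True
    return False

txns = ["Tj", "Tn-2", "Tn", "Tn-1"]
constraints = [
    ("Tj", "Tn-1"),   # Tn-1 reads the value written by Tj
    ("Tj", "Tn-2"),   # read_j(Zj) returns the initial value of Zj
    ("Tn-2", "Tn"),   # real-time order
    ("Tn", "Tj"),     # read_n(Xj) returns the initial value v
]
```

With all four constraints the search fails, which mirrors the cycle Tj → Tn−2 → Tn → Tj; dropping the last constraint makes a serialization possible.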
Thus, for each j ∈ J, transaction Tn must perform a RAW or an AWAR during the t-read of Xj in the
course of performing m t-reads of X1, . . . , Xm immediately after ρ1 · · · ρm. Since |J| ≥ ⌈(n−3)/2⌉, in the
worst-case, Tn must perform Ω(n) RAW/AWARs during the execution of m t-reads immediately after
ρ1 · · · ρm.
5.5 Algorithms for obstruction-free TMs
In this section, we present two opaque obstruction-free TM implementations: the first one satisfies RW
DAP, but not strict DAP while the second one satisfies weak DAP, but not RW DAP.
5.5.1 An opaque RW DAP TM implementation
In this section, we describe a RW DAP TM implementation in OF (based on DSTM [79]).
Every t-object Xm maintains a base object tvar [m] and every transaction Tk maintains a status[k] base
object. Both base objects support the read, write and compare-and-swap (cas) primitives.
The object tvar [m] stores a triple: the owner of Xm, which is the updating transaction that performed the
latest write to Xm, and the old value and new value of Xm, which represent the two latest versions of Xm. The base
object status[k] denotes if Tk is live (i.e. t-incomplete), committed or aborted. Intuitively, if status[k] is
committed, then other transactions can safely read the value of the t-objects updated by Tk.
Implementation of readk(Xm) first reads tvar [m] and checks if the owner of Xm is live; if so, it forcefully
aborts the owning transaction and returns the old value of Xm. Otherwise, if the owner is committed, it
returns the new value of Xm. In both cases, it only returns a non-abort value if no t-object previously
read has been updated since. The writek(Xm, v) implementation works similarly to readk(Xm); but
additionally, if the owner of Xm is live, it forcefully aborts the owning transaction, assumes ownership of
Xm, sets v as the new value of Xm and leaves the old value of Xm unchanged. Otherwise, if the owner
of Xm is a committed transaction, it updates the old value of Xm to be the value of Xm updated by its
previous owner. The tryCk implementation sets status[k] to committed if it has not been set to aborted
by a concurrent transaction, otherwise Tk is deemed aborted. Since any t-read operation performs at
most two AWARs and the tryC performs only a single AWAR, any read-only transaction T performs at
most O(|Rset(T )|) AWARs. The pseudocode is described in Algorithm 5.1.
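The description above can be made concrete with a small Python sketch. It is a simplified illustration under stated assumptions, not the thesis's Algorithm 5.1: compare-and-swap is simulated with a lock, a hypothetical transaction 0 acts as the committed initial owner of every t-object, the write set is omitted, and a transaction that owns Xm simply returns the new value it installed. All class and method names (CasCell, SketchTM, begin, try_commit) are chosen for this sketch.

```python
import threading

LIVE, ABORTED, COMMITTED = "live", "aborted", "committed"
ABORT = object()  # stands for the abort response A_k
OK = "ok"

class CasCell:
    """Base object supporting read and compare-and-swap (simulated with a lock)."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def cas(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

class SketchTM:
    def __init__(self, objects, initial=0):
        # Hypothetical transaction 0 is the committed initial owner of every t-object.
        self.status = {0: CasCell(COMMITTED)}
        self.tvar = {x: CasCell((0, initial, initial)) for x in objects}
        self.rset = {}

    def begin(self, k):
        self.status[k] = CasCell(LIVE)
        self.rset[k] = {}

    def _resolve(self, owner, oval, nval):
        """Return (True, current value), aborting a live owner if needed;
        return (False, None) if the cas on the owner's status fails."""
        s = self.status[owner].read()
        if s == COMMITTED:
            return True, nval
        if s == ABORTED or self.status[owner].cas(LIVE, ABORTED):
            return True, oval
        return False, None

    def _changed(self, k):
        # True if some t-object read earlier by T_k holds a different triple now.
        return any(self.tvar[x].read() != seen for x, seen in self.rset[k].items())

    def read(self, k, x):
        triple = self.tvar[x].read()
        owner, oval, nval = triple
        if owner == k:
            return nval  # T_k reads its own latest write
        ok, curr = self._resolve(owner, oval, nval)
        if ok and self.status[k].read() == LIVE and not self._changed(k):
            self.rset[k][x] = triple
            return curr
        return ABORT

    def write(self, k, x, v):
        triple = self.tvar[x].read()
        owner, oval, nval = triple
        if owner == k:
            new = (k, oval, v)       # keep the old value, replace the new one
        else:
            ok, curr = self._resolve(owner, oval, nval)
            if not ok:
                return ABORT
            new = (k, curr, v)       # take ownership of x
        # Simplification: status[k] is checked in both branches.
        if self.tvar[x].cas(triple, new) and self.status[k].read() == LIVE:
            return OK
        return ABORT

    def try_commit(self, k):
        if self._changed(k):
            return ABORT
        return COMMITTED if self.status[k].cas(LIVE, COMMITTED) else ABORT
```

For example, if T1 writes 5 to X and a second transaction T2 then reads X while T1 is still live, the read aborts T1 and returns the old value 0; a later tryC by T1 fails, whereas T2 commits.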
Lemma 5.8. Algorithm 5.1 implements an opaque TM.
Proof. Since opacity is a safety property, we only consider finite executions [18]. Let E be any finite
execution of Algorithm 5.1. Let <E denote a total-order on events in E.
Let H denote a subsequence of E constructed by selecting linearization points of t-operations performed
in E. The linearization point of a t-operation op, denoted as ℓop, is associated with a base object event
or an event performed during the execution of op using the following procedure.
Completions. First, we obtain a completion of E by removing some pending invocations and adding
responses to the remaining pending invocations involving a transaction Tk as follows: every incomplete
readk, writek, tryCk operation is removed from E; an incomplete writek is removed from E.
Linearization points. We now associate linearization points to t-operations in the obtained completion
of E as follows:
• For every t-read opk that returns a non-Ak value, ℓopk is chosen as the event in Line 12 of
Algorithm 5.1, else, ℓopk is chosen as invocation event of opk
• For every t-write opk that returns a non-Ak value, ℓopk is chosen as the event in Line 37 of
Algorithm 5.1, else, ℓopk is chosen as invocation event of opk
• For every opk = tryCk that returns Ck, ℓopk is associated with Line 65.
<H denotes a total-order on t-operations in the complete sequential history H.
Serialization points. The serialization of a transaction Tj , denoted as δTj is associated with the
linearization point of a t-operation performed during the execution of the transaction.
We obtain a t-complete history H¯ from H as follows: for every transaction Tk in H that is complete,
but not t-complete, we insert tryCk ·Ak after H.
H¯ is thus a t-complete sequential history. A t-complete t-sequential history S equivalent to H¯ is obtained
by associating serialization points to transactions in H¯ as follows:
• If Tk is an update transaction that commits, then δTk is ℓtryCk
Algorithm 5.1 RW DAP opaque implementation M ∈ OF ; code for Tk
1: Shared base objects:
2:   tvar[m], storing [ownerm, ovalm, nvalm]
3:     for each t-object Xm, supports read, write, cas
4:     ownerm, a transaction identifier
5:     ovalm ∈ V
6:     nvalm ∈ V
7:   status[k] ∈ {live, aborted, committed},
8:     for each Tk; supports read, write, cas
9: Local variables:
10:   Rsetk, Wsetk for every transaction Tk;
11:     dictionaries storing {Xm, tvar[m]}
12: readk(Xm):
13:   [ownerm, ovalm, nvalm] ← tvar[m].read()
14:   if ownerm ≠ k then
15:     sm ← status[ownerm].read()
16:     if sm = committed then
17:       curr = nvalm
18:     else if sm = aborted then
19:       curr = ovalm
20:     else
21:       if status[ownerm].cas(live, aborted) then
22:         curr = ovalm
23:       else
24:         Return Ak
25:     if status[k] = live ∧ ¬validate() then
26:       Rset(Tk).add({Xm, [ownerm, ovalm, nvalm]})
27:       Return curr
28:     Return Ak
29:   else
30:     Return Rset(Tk).locate(Xm)
31: Function: validate():
32:   if ∃{Xj , [ownerj , ovalj , nvalj ]} ∈ Rset(Tk):
33:       ([ownerj , ovalj , nvalj ] ≠ tvar[j].read()) then
34:     Return true
35:   Return false
36: writek(Xm, v):
37:   [ownerm, ovalm, nvalm] ← tvar[m].read()
38:   if ownerm ≠ k then
39:     sm ← status[ownerm].read()
40:     if sm = committed then
41:       curr = nvalm
42:     else if sm = aborted then
43:       curr = ovalm
44:     else
45:       if status[ownerm].cas(live, aborted) then
46:         curr = ovalm
47:       else
48:         Return Ak
49:     om ← tvar[m].cas([ownerm, ovalm, nvalm], [k, curr, v])
50:     if om ∧ status[k] = live then
51:       Wsetk.add({Xm, [k, curr, v]})
52:       Return ok
53:     else
54:       Return Ak
55:   else
56:     [ownerm, ovalm, nvalm] = Wsetk.locate(Xm)
57:     s = tvar[m].cas([ownerm, ovalm, nvalm], [k, ovalm, v])
58:     if s then
59:       Wset(Tk).add({Xm, [k, ovalm, v]})
60:       Return ok
61:     else
62:       Return Ak
63: tryCk():
64:   if validate() then
65:     Return Ak
66:   if status[k].cas(live, committed) then
67:     Return Ck
68:   Return Ak
• If Tk is an aborted or read-only transaction in H¯, then δTk is assigned to the linearization point
of the last t-read that returned a non-Ak value in Tk
<S denotes a total-order on transactions in the t-sequential history S.
Claim 5.9. If Ti ≺RTH Tj, then Ti <S Tj.
Proof. This follows from the fact that for a given transaction, its serialization point is chosen between
the first and last event of the transaction. Hence, if Ti ≺RTH Tj , then δTi <E δTj , which implies Ti <S Tj .
Claim 5.10. If transaction Ti returns Ci in E, then status[i]=committed in E.
Proof. Transaction Ti must perform the event in Line 66 before returning Ci, i.e., the cas on its own
status to change the value to committed. The proof now follows from the fact that any other transaction
may change the status of Ti only if it is live (Lines 45 and 21).
Claim 5.11. S is legal.
Proof. Observe that for every readj(X) → v, there exists some transaction Ti that performs writei(X, v)
and completes the event in Line 22 to write v as the new value of X such that readj(X) ⊀RTH writei(X, v).
For any updating committing transaction Ti, δTi = ℓtryCi . Since readj(X) returns a response v, the event
in Line 12 must succeed the event in Line 66 when Ti changes status[i] to committed. Suppose otherwise,
then readj(X) subsequently forces Ti to abort by writing aborted to status[i] and must return the old
value of X that is updated by the previous owner of X, which must be committed in E (Line 40). Since
δTi = ℓtryCi precedes the event in Line 66, it follows that δTi <E ℓreadj(X).
We now need to prove that δTi <E δTj . Consider the following cases:
• if Tj is an updating committed transaction, then δTj is assigned to ℓtryCj . But since ℓreadj(X) <E
ℓtryCj , it follows that δTi <E δTj .
• if Tj is a read-only or aborted transaction, then δTj is assigned to the last t-read that did not
abort. Again, it follows that δTi <E δTj .
To prove that S is legal, we need to show that there does not exist any transaction Tk that returns Ck in
S and performs writek(X, v′); v′ ≠ v such that Ti <S Tk <S Tj . Now, suppose by contradiction that there
exists a committed transaction Tk, X ∈ Wset(Tk) that writes v′ ≠ v to X such that Ti <S Tk <S Tj .
Since Ti and Tk are both updating transactions that commit,
(Ti <S Tk) ⇐⇒ (δTi <E δTk)
(δTi <E δTk) ⇐⇒ (ℓtryCi <E ℓtryCk)
Since Tj reads the value of X written by Ti, one of the following is true: ℓtryCi <E ℓtryCk <E ℓreadj(X)
or ℓtryCi <E ℓreadj(X) <E ℓtryCk .
If ℓtryCi <E ℓtryCk <E ℓreadj(X), then the event in Line 66 performed by Tk when it changes the status
field to committed precedes the event in Line 12 performed by Tj . Since ℓtryCi <E ℓtryCk and both Ti
and Tk are committed in E, Tk must perform the event in Line 37 after Ti changes status[i] to committed
since otherwise, Tk would perform the event in Line 45 and change status[i] to aborted, thereby forcing
Ti to return Ai. However, readj(X) observes that the owner of X is Tk and since the status of Tk is
committed at this point in the execution, readj(X) must return v′ and not v, a contradiction.
Thus, ℓtryCi <E ℓreadj(X) <E ℓtryCk . We now need to prove that δTj indeed precedes δTk = ℓtryCk in E.
Now consider two cases:
• Suppose that Tj is a read-only transaction. Then, δTj is assigned to the last t-read performed by
Tj that returns a non-Aj value. If readj(X) is not the last t-read that returned a non-Aj value,
then there exists a readj(X′) such that ℓreadj(X) <E ℓtryCk <E ℓreadj(X′). But then this t-read of
X′ must abort since the value of X has been updated by Tk since Tj first read X, a contradiction.
• Suppose that Tj is an updating transaction that commits, then δTj = ℓtryCj which implies that
ℓreadj(X) <E ℓtryCk <E ℓtryCj . Then, Tj must necessarily perform the validation of its read set in
Line 65 and return Aj , a contradiction.
Claims 5.9 and 5.11 establish that Algorithm 5.1 is opaque.
Theorem 5.12. Algorithm 5.1 describes a RW DAP, progressive opaque TM implementation M ∈ OF
such that in every execution E of M ,
• the total number of stalls incurred by a t-read operation invoked in E is O(n),
• every read-only transaction T ∈ txns(E) performs O(|Rset(T )|) AWARs in E, and
• every complete t-read operation invoked by transaction Tk ∈ txns(E) performs O(|RsetE(Tk)|) steps.
Proof. (Opacity) Follows from Lemma 5.8.
(TM-liveness and TM-progress) Since none of the implementations of the t-operations in Algorithm 5.1
contain unbounded loops or waiting statements, every t-operation opk returns a matching response after
taking a finite number of steps. Thus, Algorithm 5.1 provides wait-free TM-liveness.
To prove OF TM-progress, we proceed by enumerating the cases under which a transaction Tk may be
aborted in any execution.
• Suppose that there exists a readk(Xm) performed by Tk that returns Ak. If readk(Xm) returns
Ak in Line 28, then there exists a concurrent transaction that updated a t-object in Rset(Tk) or
changed status[k] to aborted. In both cases, Tk returns Ak only because there is step contention.
• Suppose that there exists a writek(Xm, v) performed by Tk that returns Ak in Line 54. Thus,
either a concurrent transaction has changed status[k] to aborted or the value in tvar [m] has been
updated since the event in Line 37. In both cases, Tk returns Ak only because of step contention
with another transaction.
• Suppose that a readk(Xm) or writek(Xm, v) returns Ak in Lines 21 and 45 respectively. Thus, a
concurrent transaction has taken steps by updating the status of ownerm since the
read by Tk in Lines 12 and 37 respectively.
• Suppose that tryCk() returns Ak in Line 62. This is because there exists a t-object in Rset(Tk) that
has been updated by a concurrent transaction since Tk read it, i.e., tryCk() returns Ak only on encountering
step contention.
It follows that in any step contention-free execution of a transaction Tk from a Tk-free execution, Tk
must return Ck after taking a finite number of steps.
The enumeration above also proves that M implements a progressive TM.
(Read-write disjoint-access parallelism) Consider any execution E of Algorithm 5.1 and let Ti and Tj
be any two transactions that contend on a base object b in E. We need to prove that there is a
path between a t-object in Dset(Ti) and a t-object in Dset(Tj) in G˜(Ti, Tj , E) or there exists X ∈
Dset(Ti) ∩ Dset(Tj). Recall that there exists an edge between t-objects X and Y in G˜(Ti, Tj , E) only if
there exists a transaction T ∈ txns(E) such that {X,Y } ∈Wset(T ).
• Suppose that Ti and Tj contend on base object tvar[m] belonging to t-object Xm in E. By
Algorithm 5.1, a transaction T accesses tvar[m] only if Xm is contained in Dset(T ). Thus, both Ti and
Tj must access Xm.
• Suppose that Ti and Tj contend on base object status[i] in E (the case when Ti and Tj contend
on status[j] is symmetric). Tj accesses status[i] while performing a t-read of some t-object X
in Lines 15 and 21 only if Ti is the owner of X. Also, Tj accesses status[i] while performing a
t-write to X in Lines 39 and 45 only if Ti is the owner of X. But if Ti is the owner of X, then
X ∈Wset(Ti).
• Suppose that Ti and Tj contend on base object status[m] belonging to some transaction Tm in E.
Firstly, observe that Ti or Tj access status[m] only if there exist t-objects X and Y in Dset(Ti) and
Dset(Tj) respectively such that {X,Y } ∈ Wset(Tm). This is because Ti and Tj would both read
status[m] in Lines 15 (during t-read) and 39 (during t-write) only if Tm was the previous owner
of X and Y . Secondly, one of Ti or Tj applies a nontrivial primitive to status[m] only if Ti and Tj
read status[m]=live in Lines 15 (during t-read) and 37 (during t-write). Thus, at least one of Ti or
Tj is concurrent to Tm in E. It follows that there exists a path between X and Y in G˜(Ti, Tj , E).
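The DAP condition used in this argument is a reachability question on a conflict graph. The Python sketch below illustrates it under simplifying assumptions: the caller supplies the sets that induce edges (write sets for RW DAP, data sets for weak DAP), already restricted to the relevant transactions of E. The function name may_contend and its parameters are hypothetical.

```python
from collections import deque

def may_contend(dset_i, dset_j, edge_sets):
    """Return True if Ti and Tj are allowed to contend on a base object:
    their data sets intersect, or some t-object of Ti is connected to some
    t-object of Tj in the graph where X and Y are adjacent whenever both
    belong to one of the sets in `edge_sets`."""
    if dset_i & dset_j:
        return True
    adjacency = {}
    for objects in edge_sets:
        for x in objects:
            adjacency.setdefault(x, set()).update(objects)
    seen = set(dset_i)
    queue = deque(dset_i)
    while queue:                      # breadth-first search from Dset(Ti)
        x = queue.popleft()
        if x in dset_j:
            return True
        for y in adjacency.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False
```

As an example of the difference between the two notions, suppose a third transaction reads X and writes Y. Passing its data set {X, Y} connects X and Y, so contention between a transaction on X and a transaction on Y is permitted under weak DAP; passing only its write set {Y} yields no edge, so RW DAP forbids it.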
(Complexity) Every t-read operation performs at most one AWAR in an execution E (Line 21) of
Algorithm 5.1. It follows that any read-only transaction Tk ∈ txns(E) performs at most |Rset(Tk)| AWARs
in E.
The linear step-complexity is immediate from the fact that during the t-read operations, the transaction
validates its entire read set (Line 25). All other t-operations incur O(1) step-complexity since they
involve no iteration statements like for and while loops.
Since at most n−1 transactions may be t-incomplete at any point in an execution E, it follows that E is
at most an (n−1)-stall execution for any t-read op and every T ∈ txns(E) incurs O(n) stalls on account of
any event performed in E. More specifically, consider the following execution E: for all i ∈ {1, . . . , n−1},
each transaction Ti performs writei(Xm, v) in a step contention-free execution until it is poised to apply
a nontrivial event on tvar [m] (Line 22). By OF TM-progress, we construct E such that each of the Ti
is poised to apply a nontrivial event on tvar [m] after E. Consider the execution fragment of readn(Xm)
that is poised to perform an event e that reads tvar [m] (Line 12) immediately after E. In the constructed
execution, Tn incurs O(n) stalls on account of e and thus produces the desired (n − 1)-stall execution
for readn(Xm).
5.5.2 An opaque weak DAP TM implementation
In this section, we describe a weak DAP TM implementation in OF with constant step-complexity t-read
operations.
Algorithm 5.2 Weak DAP opaque implementation M ∈ OF ; code for Tk
1: readk(Xm):
2: [ownerm, ovalm,nvalm] ← tvar [m].read()
3: if ownerm ≠ k then
4: sm ← status[ownerm].read()
5: if sm = committed then
6: curr = nvalm
7: else if sm = aborted then
8: curr = ovalm
9: else
10: if status[ownerm].cas(live, aborted) then
11: curr = ovalm
12: Return Ak
13: om ← tvar [m].cas([ownerm, ovalm,nvalm], [k, ovalm,nvalm])
14: if om ∧ status[k] = live then
15: Rset(Tk).add({Xm, [ownerm, ovalm,nvalm]})
16: Return curr
17: else
18: Return Rset(Tk).locate(Xm)
19: tryCk():
20: if status[k].cas(live, committed) then
21: Return Ck
22: Return Ak
Algorithm 5.2 describes a weak DAP implementation in OF that does not satisfy read-write DAP. The
code for the t-write operations is identical to Algorithm 5.1. During the t-read of t-object Xm by
transaction Tk, Tk becomes the owner of Xm, thus eliminating the per-read validation step-complexity
inherent to Algorithm 5.1. Similarly, tryCk does not involve performing the validation of Tk’s read
set; the implementation simply sets status[k] = committed and returns Ck.
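The effect of acquiring ownership on reads can be illustrated with a short Python sketch. It is a single-threaded simulation, so the compare-and-swap is trivially atomic, and it departs from the listing in one labeled respect: the reader installs the triple (k, curr, curr), an assumption made here so that the value of Xm stays curr whether the reader later commits or aborts. The step counter and all names (Cell, VisibleReadTM) are hypothetical and serve only to show that the cost of a t-read does not depend on the size of the read set.

```python
LIVE, ABORTED, COMMITTED = "live", "aborted", "committed"
ABORT = "A"

class Cell:
    """Base object with read and cas; counts every access (illustration only)."""
    steps = 0

    def __init__(self, value):
        self.value = value

    def read(self):
        Cell.steps += 1
        return self.value

    def cas(self, expected, new):
        Cell.steps += 1
        if self.value == expected:
            self.value = new
            return True
        return False

class VisibleReadTM:
    def __init__(self, objects, initial=0):
        # Hypothetical transaction 0 is the committed initial owner.
        self.status = {0: Cell(COMMITTED)}
        self.tvar = {x: Cell((0, initial, initial)) for x in objects}

    def begin(self, k):
        self.status[k] = Cell(LIVE)

    def read(self, k, x):
        triple = self.tvar[x].read()
        owner, oval, nval = triple
        if owner == k:
            return nval                      # already owned by T_k
        s = self.status[owner].read()
        if s == COMMITTED:
            curr = nval
        elif s == ABORTED:
            curr = oval
        elif self.status[owner].cas(LIVE, ABORTED):
            curr = oval                      # live owner forcefully aborted
        else:
            return ABORT
        # Become the owner of x; no validation of earlier reads is needed.
        if self.tvar[x].cas(triple, (k, curr, curr)) and self.status[k].read() == LIVE:
            return curr
        return ABORT

    def try_commit(self, k):
        return COMMITTED if self.status[k].cas(LIVE, COMMITTED) else ABORT
```

In this sketch every t-read touches at most five base objects. The price is that reads are visible: a later reader or writer of the same t-object aborts the earlier reader, even if both transactions are read-only.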
Theorem 5.13. Algorithm 5.2 describes a weak DAP TM implementation M ∈ OF such that in any execution
E of M , for every transaction T ∈ txns(E), T performs O(1) steps during the execution of any t-operation
in E.
Proof. The proofs of opacity, TM-liveness and TM-progress are almost identical to the analogous proofs
for Algorithm 5.1.
(Weak disjoint-access parallelism) Consider any execution E of Algorithm 5.2 and let Ti and Tj be any
two transactions that contend on a base object b in E. We need to prove that there is a path between a
t-object in Dset(Ti) and a t-object in Dset(Tj) in G˜(Ti, Tj , E) or there exists X ∈ Dset(Ti) ∩ Dset(Tj). Recall
that there exists an edge between t-objects X and Y in G(Ti, Tj , E) only if there exists a transaction
T ∈ txns(E) such that {X,Y } ∈ Dset(T ).
• Suppose that Ti and Tj contend on base object tvar[m] belonging to t-object Xm in E. By
Algorithm 5.2, a transaction T accesses tvar[m] only if Xm is contained in Dset(T ). Thus, both Ti and
Tj must access Xm.
• Suppose that Ti and Tj contend on base object status[i] in E (the case when Ti and Tj contend
on status[j] is symmetric). Tj accesses status[i] while performing a t-read of some t-object X in
Lines 4 and 10 only if Ti is the owner of X. Also, Tj accesses status[i] while performing a t-write to
X in Lines 39 and 45 only if Ti is the owner of X. But if Ti is the owner of X, then X ∈ Dset(Ti).
• Suppose that Ti and Tj contend on base object status[m] belonging to some transaction Tm in E.
Firstly, observe that Ti or Tj access status[m] only if there exist t-objects X and Y in Dset(Ti) and
Dset(Tj) respectively such that {X,Y } ∈ Dset(Tm). This is because Ti and Tj would both read
status[m] in Lines 4 (during t-read) and 39 (during t-write) only if Tm was the previous owner of
X and Y . Secondly, one of Ti or Tj applies a nontrivial primitive to status[m] only if Ti and Tj
read status[m]=live in Lines 4 (during t-read) and 37 (during t-write). Thus, at least one of Ti or
Tj is concurrent to Tm in E. It follows that there exists a path between X and Y in G˜(Ti, Tj , E).
(Complexity) Since no implementation of any of the t-operations contains any iteration statements (like
for and while loops), the proof follows.
5.6 Why Transactional memory should not be obstruction-free
Property                                          Obstruction-free TMs   Progressive TM LP
strict DAP                                        No [60]                Yes
invisible reads + weak DAP                        No                     Yes
stall complexity of t-reads                       Ω(n)                   O(1)
RAW/AWAR complexity                               Ω(n)                   O(1)
read-write base objects, wait-free TM-liveness    No [64]                Yes
Figure 5.3: Complexity gap between blocking and non-blocking TMs
As a synchronization abstraction, TM came as an alternative to conventional lock-based synchronization,
and it therefore appears natural that early TM implementations [52, 79, 101, 120] avoided using locks.
Instead, early TM designs relied on non-blocking synchronization, where a prematurely halted transaction
cannot prevent all other transactions from committing. Possibly the weakest progress condition
elucidating non-blocking TM-progress is obstruction-freedom.
However, in 2005, Ennals [48] argued that obstruction-free TMs inherently yield poor performance,
because they require transactions to forcefully abort each other. Ennals further described a lock-based
TM implementation [47] satisfying progressiveness that he claimed to outperform DSTM [79], the most
referenced obstruction-free TM implementation at the time. Inspired by [48], more recent lock-based
progressive TMs, such as TL [40], TL2 [39] and NOrec [36], demonstrate better performance than
obstruction-free TMs on most workloads.
There is a considerable amount of empirical evidence on the performance gap between non-blocking
(obstruction-free) and blocking (progressive) TM implementations but no analytical result explains it.
We present complexity lower and upper bounds that provide such an explanation.
To exhibit a complexity gap between blocking and non-blocking TMs, we go back to the progressive
opaque TM implementation LP (Algorithm 4.1) that beats the impossibility result and the lower bounds
we established for obstruction-free TMs. Recall that our implementation LP (1) uses only read-write
base objects and provides wait-free TM-liveness, (2) ensures strict DAP, (3) has invisible reads, (4) performs
O(1) non-overlapping RAWs/AWARs per transaction, and (5) incurs O(1) memory stalls for read
operations (Theorem 4.13). In contrast, from prior work and our lower bounds we know that (i) no
OF TM that provides wait-free transactional operations can be implemented using only read-write base
objects [64]; (ii) no OF TM can provide strict DAP [60]; (iii) no weak DAP OF TM has invisible reads
(Section 5.2) and (iv) no OF TM ensures a constant number of stalls incurred by a t-read operation
(Section 5.3). Finally, (v) no RW DAP opaque OF TM has constant RAW/AWAR complexity (Section 5.4).
In fact, (iv) and (v) exhibit a linear separation between blocking and non-blocking TMs w.r.t. memory
stall complexity and expensive synchronization, respectively.
Altogether, our results exhibit a considerable complexity gap between progressive and obstruction-free
TMs, as summarized in Figure 5.3, that seems to justify the shift in TM practice (circa 2005) from
non-blocking to blocking TMs.
Overcoming our lower bounds for obstruction-free TMs individually is comparatively easy. For instance, TL [40]
combines strict DAP with invisible reads, but it is not read-write, and it does not provide constant
RAW/AWAR and stall complexities.
Coming up with a single algorithm that beats all these lower bounds is quite nontrivial. Our algorithm
LP incurs the cost of incremental validation, i.e., checking that the current read set has not changed per
every new read operation. This is, however, unavoidable for invisible read algorithms (cf. Theorem 4.2),
and is, in fact, believed to yield better performance in practice than “visible” reads [36, 40, 47], and we
show that it enables constant stall and RAW/AWAR complexity.
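To make the cost of incremental validation concrete, the following Python sketch shows a generic invisible-read scheme based on per-object version numbers. It is not Algorithm 4.1 (LP); the classes VersionedStore and InvisibleReader, the version counters and the checks counter are assumptions introduced purely for illustration. The reader never writes to shared state, and before returning from each new read it re-checks every t-object it has read so far.

```python
class VersionedStore:
    """Shared memory: each t-object maps to a (version, value) pair."""
    def __init__(self, objects):
        self.data = {x: (0, 0) for x in objects}

    def write(self, x, value):
        version, _ = self.data[x]
        self.data[x] = (version + 1, value)

class InvisibleReader:
    """A reading transaction that keeps its read set in local memory only."""
    def __init__(self, store):
        self.store = store
        self.read_set = {}   # t-object -> version observed
        self.checks = 0      # number of validation checks performed so far

    def read(self, x):
        version, value = self.store.data[x]
        # Incremental validation: re-check every earlier read.
        for y, seen in self.read_set.items():
            self.checks += 1
            if self.store.data[y][0] != seen:
                return None  # abort: an earlier read is no longer current
        self.read_set[x] = version
        return value
```

In this sketch the i-th read performs i − 1 checks, so a transaction that reads m distinct t-objects performs m(m − 1)/2 checks in total; for m = 10 this gives 45. If a writer updates a t-object in the read set, the next read returns None, which stands for an abort.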
5.7 Related work and Discussion
In this section, we summarize the results presented in this chapter and identify some unresolved
questions.
Lower bounds for non-blocking TMs. Complexity of obstruction-free TMs was first studied by
Guerraoui and Kapalka [60, 64] who proved that they cannot provide strict DAP. However, as we show
in Section 5.5, it is possible to realize weaker than strict DAP variants of obstruction-free opaque TMs.
Bushkov et al. [31] improved on the impossibility result in [60] and showed that a variant of strict DAP
cannot be combined with obstruction-free TM-progress, even if a weaker (than strict serializability)
TM-correctness property is assumed. In the thesis, we do not consider relaxations of strict serializability.
Guerraoui and Kapalka [60, 64] also proved that a strictly serializable TM that provides OF TM-progress
and wait-free TM-liveness cannot be implemented using only read and write primitives. An interesting
open question is whether we can implement strictly serializable TMs in OF using only read and write
primitives.
Observe that, since there are at most n concurrent transactions, we cannot do better than (n− 1) stalls
(cf. Definition 2.18). Thus, the lower bound of Theorem 5.3 is tight.
Moreover, we conjecture that the linear (in n) lower bound of Theorem 5.4 for RW DAP opaque
obstruction-free TMs can be strengthened to be linear in the size of the transaction’s read set. Then,
Algorithm 5.1, which proves a linear upper bound in the size of the transaction’s read set, would allow
us to establish a linear tight bound (in the size of the transaction’s read set) for RW DAP opaque
obstruction-free TMs.
Blocking versus non-blocking TMs. As highlighted in [40, 48], obstruction-free TMs typically
must forcefully abort pending conflicting transactions. This observation inspires the impossibility of
invisible reads (Theorem 5.1). Typically, to detect the presence of a conflicting transaction and abort
it, the reading transaction must employ a RAW or a read-modify-write primitive like compare-and-swap,
motivating the linear lower bound on expensive synchronization (Theorem 5.4). Also, in obstruction-free
TMs, a transaction may not wait for a concurrent inactive transaction to complete and, as a result,
we may have an execution in which a transaction incurs a distinct stall due to a transaction run by
each other process, hence the linear stall complexity (Theorem 5.3). Intuitively, since transactions in
progressive TMs may abort themselves in case of conflicts, they can employ invisible reads and maintain
constant stall and RAW/AWAR complexities.
The lower bound and the proof technique in Theorem 5.3 are inspired by an analogous lower bound
on linearizable solo-terminating implementations [16, 46] of a wide class of “perturbable” objects that
include counters, compare-and-swap and single-writer snapshots [16, 46]. Informally, the definition of
solo-termination (adapted to the TM context) says that for every finite execution E, and every transaction
T that is t-incomplete in E, there is a finite step contention-free extension in which T eventually commits.
Observe that, under this definition, T is guaranteed to commit even in some executions that are not step
contention-free for T . However, the definition of OF TM-progress used in the thesis ensures that T is
guaranteed to commit only if all its events are issued in the absence of step contention. Moreover, [16]
described a single-lock (only the process holding the lock can invoke an operation) implementation of
these objects that incurs O(log n) stalls, thus establishing a separation between the worst-case operation
stall complexity of non-blocking and blocking (i.e., lock-based) implementations of these objects. In this
chapter, we presented a linear separation in memory stall complexity between obstruction-free TMs and
lock-based TMs characterized by progressiveness, which is a strictly stronger (than single-lock) progress
guarantee, thus establishing the inherent cost of non-blocking progress in the TM context.
Some benefits of obstruction-free TMs, namely their ability to make progress even if some transac-
tions prematurely fail, are not provided by progressive TMs. However, several papers [39, 40, 48] ar-
gued that lock-based TMs tend to outperform obstruction-free ones by allowing for simpler algorithms
with lower overhead, and their inherent progress issues may be resolved using timeouts and contention-
managers [115]. This chapter explains the empirically observed performance gap between blocking and
non-blocking TMs via a series of lower bounds on obstruction-free TMs and a progressive TM algorithm
that beats all of them.

6
Lower bounds for partially non-blocking TMs
6.1 Overview
It is easy to see that dynamic TMs, where the patterns in which transactions access t-objects are not
known in advance, do not allow for wait-free TM-progress [64], i.e., the requirement that every transaction
commits in a finite number of steps of the process executing it, regardless of the behavior of concurrent
processes. Suppose that a transaction T1 reads t-object X, then a concurrent transaction T2 reads t-object Y,
writes to X and commits, and finally T1 writes to Y. Since T1 has read the “old” value in X and T2 has read the
“old” value in Y, there is no way to commit T1 and order the two transactions in a sequential execution.
As this scenario can be repeated arbitrarily often, even the weaker guarantee of local progress that only
requires that each transaction eventually commits if repeated sufficiently often, cannot be ensured by
any strictly serializable TM implementation, regardless of the base objects it uses [32].1
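The scenario above can be checked mechanically. The following is a minimal sketch and not part of the thesis's formalism: it assumes that the final write to Y is issued by T1 and that both transactions commit, and the names `legal_sequential_orders`, "old" and "new" are hypothetical. It enumerates every sequential order of the two transactions and keeps those in which each t-read returns the latest value written before it.

```python
from itertools import permutations

INITIAL = "old"  # assumed initial value of every t-object

# Operations in program order: ("r", object, value returned) or ("w", object, value written).
T1 = [("r", "X", "old"), ("w", "Y", "new")]
T2 = [("r", "Y", "old"), ("w", "X", "new")]


def legal_sequential_orders(txns):
    """Return every order of the (committed) transactions in which each
    t-read returns the latest value written before it, or the initial value."""
    legal = []
    for order in permutations(txns):
        memory = {}
        ok = True
        for name in order:
            for kind, obj, value in txns[name]:
                if kind == "w":
                    memory[obj] = value
                elif memory.get(obj, INITIAL) != value:
                    ok = False
        if ok:
            legal.append(order)
    return legal


print(legal_sequential_orders({"T1": T1, "T2": T2}))  # prints []
```

Neither order is legal, so under these assumptions T1 cannot be committed once T2 has committed.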
But can we ensure that at least some transactions commit wait-free and what are the inherent costs? It is
often argued that many realistic workloads are read-dominated : the proportion of read-only transactions
is higher than that of updating ones, or read-only transactions have much larger data sets than updat-
ing ones [25, 65]. Therefore, it seems natural to require that read-only transactions commit wait-free.
Since we are interested in complexity lower bounds, we require that updating transactions provide only
sequential TM-progress.
First, we focus on strictly serializable TMs with the above TM-progress conditions that use invisible
reads. We show that this requirement results in maintaining unbounded sets of versions for every data
item, i.e., such implementations may not be practical due to their space complexity. Secondly, we
prove that strictly serializable TMs with these progress conditions cannot ensure strict DAP. Thus, two
transactions that access mutually disjoint data sets may prevent each other from committing. Thirdly,
for weak DAP TMs, we show that a read-only transaction (with an arbitrarily large read set) must
sometimes perform at least one expensive synchronization pattern [17] per t-read operation, i.e., the
expensive synchronization complexity of a read-only transaction is linear in the size of its data set.
Formally, we denote by RWF the class of partially non-blocking TMs.
Definition 6.1. (The class RWF) A TM implementation M ∈ RWF iff in its every execution:
1 Note that the counter-example would not work if we imagine that the data sets accessed by a transaction can be known
in advance. However, in the thesis, we consider the conventional dynamic TM programming model.
Figure 6.1: Executions in the proof of Theorem 6.1; the execution in 6.1a must maintain c distinct values of
every t-object.
(a) For all i ∈ {1, . . . , c − 1}, T2i−1 writes viℓ to each Xℓ; read2i(X1) must return vi1.
(b) Every read-only transaction T2i in phase i is extended with t-reads of X2, . . . , Xℓ, . . .; each read2i(Xℓ) must return viℓ.
• (wait-free TM-progress for read-only transactions) every read-only transaction commits in a finite
number of its steps, and
• (sequential TM-progress and sequential TM-liveness for updating transactions) every transaction
running step contention-free from a t-quiescent configuration commits in a finite number
of its steps.
Roadmap of Chapter 6. Section 6.2 presents a lower bound on the inherent space complexity of
TMs in RWF . Section 6.3 proves the impossibility of strict DAP TMs in RWF while in Section 6.4,
assuming weak DAP, we prove a linear, in the size of the transaction’s read set, lower bound on expensive
synchronization complexity. We conclude this chapter with a discussion of the related work and open
questions concerning TMs in RWF .
6.2 The space complexity of invisible reads
We prove that every strictly serializable TM implementation M ∈ RWF that uses invisible reads must
keep unbounded sets of values for every t-object. To do so, for every c ∈ N, we construct an execution
of M that maintains at least c distinct values for every t-object. We require the following technical
definition:
Definition 6.2. Let E be any execution of a TM implementation M . We say that E maintains c distinct
values {v1, . . . , vc} of t-object X, if there exists an execution E · E′ of M such that
• E′ contains the complete executions of c t-reads of X and,
• for all i ∈ {1, . . . , c}, the response of the ith t-read of X in E′ is vi.
Theorem 6.1. Let M be any strictly serializable TM implementation in RWF that uses invisible reads,
and X , any set of t-objects. Then, for every c ∈ N, there exists an execution E of M such that E
maintains at least c distinct values of each t-object X ∈ X .
Proof. Let v0ℓ be the initial value of t-object Xℓ ∈ X . For every c ∈ N, we iteratively construct an
execution E of M of the form depicted in Figure 6.1a. The construction of E proceeds in phases: there
are at most c − 1 phases. For all i ∈ {0, . . . , c − 1}, we denote the execution after phase i as Ei, which is
defined as follows:
• E0 is the complete step contention-free execution fragment α0 of read-only transaction T0 that
performs read0(X1)→ v01
• for all i ∈ {1, . . . , c− 1}, Ei is defined to be an execution of the form α0 · ρ1 ·α1 · · · ρi ·αi such that
for all j ∈ {1, . . . , i},
– ρj is the t-complete step contention-free execution fragment of an updating transaction T2j−1
that, for all Xℓ ∈ X , writes the value vjℓ and commits
– αj is the complete step contention-free execution fragment of a read-only transaction T2j
that performs read2j(X1)→ vj1
Since read-only transactions are invisible, for all i ∈ {0, . . . , c − 1}, the execution fragment αi does
not contain any nontrivial events. Consequently, for all i < j ≤ c − 1, the configuration after Ei is
indistinguishable to transaction T2j−1 from a t-quiescent configuration and it must be committed in ρj
(by sequential progress for updating transactions). Observe that, for all 1 ≤ j < i, T2j−1 ≺RTE T2i−1.
Strict serializability of M now stipulates that, for all i ∈ {1, . . . , c − 1}, the t-read of X1 performed by
transaction T2i in the execution fragment αi must return the value vi1 of X1 as written by transaction
T2i−1 in the execution fragment ρi (in any serialization, T2i−1 is the latest committed transaction writing
to X1 that precedes T2i). Thus, M indeed has an execution E of the form depicted in Figure 6.1a.
Consider the execution fragment E′ that extends E in which, for all i ∈ {0, . . . , c − 1}, read-only
transaction T2i is extended with the complete execution of the t-reads of every t-object Xℓ ∈ X \ {X1}
(depicted in Figure 6.1b).
We claim that, for all i ∈ {0, . . . , c − 1}, and for all Xℓ ∈ X \ {X1}, read2i(Xℓ) performed by transaction
T2i must return the value viℓ of Xℓ written by transaction T2i−1 in the execution fragment ρi (the initial
value v0ℓ when i = 0). Indeed, by wait-free progress, read2i(Xℓ) must return a non-abort response in such
an extension of E. Suppose by contradiction that read2i(Xℓ) returns a response that is not viℓ. There are
two cases:
• read2i(Xℓ) returns the value vjℓ written by transaction T2j−1; j < i. However, since for all j < i,
T2j ≺RTE T2i, the execution is not strictly serializable: a contradiction.
• read2i(Xℓ) returns the value vjℓ written by transaction T2j−1; j > i. Since read2i(X1) returns the
value vi1 and T2i ≺RTE T2j, there exists no such serialization: a contradiction.
Thus, E maintains at least c distinct values of every t-object X ∈ X .
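The construction in the proof can be replayed on a toy model. The sketch below is only an illustration under stated assumptions: `MultiVersionStore` and `run_phases` are hypothetical names, the store is one possible design (an append-only version list per t-object), and a read-only transaction is "invisible" in the sense that beginning it modifies no shared state. Replaying c phases shows that the reads issued in the extension E′ return c distinct values of every t-object, so none of the old versions could have been discarded.

```python
class MultiVersionStore:
    """Toy multi-version store; readers leave no trace in shared state."""

    def __init__(self, objects, initial):
        # versions[obj][k] is the value of obj after the k-th committed update
        self.versions = {obj: [initial] for obj in objects}
        self.commits = 0

    def update_all(self, value):
        """Updating transaction: writes `value` to every object and commits."""
        for history in self.versions.values():
            history.append(value)
        self.commits += 1

    def begin_read_only(self):
        """Invisible reader: remembers locally how many commits it has seen."""
        return self.commits

    def read(self, snapshot, obj):
        """t-read on behalf of a reader that started at `snapshot`."""
        return self.versions[obj][snapshot]


def run_phases(c, objects=("X1", "X2", "X3")):
    store = MultiVersionStore(objects, initial="v0")
    snapshots = [store.begin_read_only()]          # phase 0: T0 starts
    for i in range(1, c):
        store.update_all(f"v{i}")                  # T_{2i-1} writes and commits
        snapshots.append(store.begin_read_only())  # T_{2i} starts
    # Extension E': every pending reader now reads every object.
    return {obj: [store.read(s, obj) for s in snapshots] for obj in objects}


print(run_phases(4)["X2"])  # prints ['v0', 'v1', 'v2', 'v3']
```

Since the store cannot observe which readers are still pending, it has no safe point at which to drop a version; this is the intuition behind the unbounded space requirement.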
6.3 Impossibility of strict DAP
In this section, we prove that it is impossible to derive strictly serializable TM implementations in
RWF which ensure that any two transactions accessing pairwise disjoint data sets can execute without
contending on the same base object.
Theorem 6.2. There exists no strictly serializable strict DAP TM implementation in RWF .
Figure 6.2: Executions in the proof of Theorem 6.2; the execution in 6.2c is not strictly serializable.
(a) By strict DAP, T0 and T3 do not contend on any base object.
(b) read0(X2) must return nv.
(c) By strict DAP, T0 cannot distinguish this execution from the execution in 6.2b.
Proof. Suppose by contradiction that there exists a strictly serializable strict DAP TM implementation M ∈ RWF .
Let v be the initial value of t-objects X1, X2 and X3. Let π be the t-complete step contention-free
execution of transaction T1 that writes the value nv ≠ v to t-objects X1 and X3. By sequential progress
for updating transactions, T1 must be committed in π.
Note that any read-only transaction that runs step contention-free after some prefix of π must return a
non-abort value. Since any such transaction reading X1 or X3 must return v after the empty prefix of π
and nv when it starts from π, there exists π′, the longest prefix of π that can be extended neither with the
t-complete step contention-free execution of any transaction that performs a t-read of X1 and returns
nv nor with the t-complete step contention-free execution of any transaction that performs a t-read of
X3 and returns nv.
Consider the execution fragment π′ · α1, where α1 is the complete step contention-free execution of
transaction T0 that performs read0(X1)→ v. Indeed, by definition of π′ and wait-free progress (assumed
for read-only transactions), M has an execution of the form π′ · α1.
Let e be the enabled event of transaction T1 in the configuration after π′. Without loss of generality,
assume that π′ · e can be extended with the t-complete step contention-free execution of a transaction
that reads X3 and returns nv.
We now prove that M has an execution of the form π′ · α1 · e · β · γ, where
• β is the t-complete step contention-free execution fragment of transaction T3 that performs
read3(X3)→ nv and commits
• γ is the t-complete step contention-free execution fragment of transaction T2 that writes nv to X2
and commits.
Observe that, by definition of π′, M has an execution of the form π′ · e · β. By construction, transaction
T1 applies a nontrivial primitive to a base object, say b, in the event e, and b is accessed by transaction T3 in
the execution fragment β. Since transactions T0 and T3 access mutually disjoint data sets in π′ · α1 · e · β,
T3 does not access any base object in β to which transaction T0 applies a nontrivial primitive in the
execution fragment α1 (assumption of strict DAP). Thus, α1 does not contain a nontrivial primitive to
b and π′ · α1 · e · β is indistinguishable to T3 from the execution π′ · e · β. This proves that M has an
execution of the form π′ · α1 · e · β (depicted in Figure 6.2a).
Since transaction T2 writes only to t-object X2, i.e., Dset(T2) = {X2} and X2 ∉ Dset(T1) ∪ Dset(T0) ∪ Dset(T3),
by strict DAP, the configuration after π′ · α1 · e · β is indistinguishable to T2 from a t-quiescent configuration.
Indeed, transaction T2 does not contend with any of the transactions T1, T0 and T3 on any base object
in π′ · α1 · e · β · γ. Sequential progress of M requires that T2 must be committed in π′ · α1 · e · β · γ. Thus,
M has an execution of the form π′ · α1 · e · β · γ.
By the above arguments, the execution π′ · α1 · e · β · γ is indistinguishable to each of the transactions
T1, T0, T2 and T3 from γ · π′ · α1 · e · β in which transaction T2 precedes T1 in real-time ordering. Thus,
γ · π′ · α1 · e · β is also an execution of M .
Consider the extension of the execution γ · π′ · α1 · e · β in which transaction T0 performs read0(X2) and
commits (depicted in Figure 6.2b). Strict serializability of M stipulates that read0(X2) must return nv
since T2 (which writes nv to X2 in γ) precedes T0 in this execution.
Similarly, we now extend the execution π′ · α1 · e · β · γ with the complete step contention-free execution
fragment of the t-read of X2 by transaction T0. Since T0 is a read-only transaction, it must be committed
in this extension. However, as proved above, this execution is indistinguishable to T0 from the execution
depicted in Figure 6.2b in which read0(X2) must return nv. Thus, M has an execution of the form
π′ · α1 · e · β · γ · α2, where T0 performs read0(X2)→ nv in α2 and commits.
However, the execution π′ · α1 · e · β · γ · α2 (depicted in Figure 6.2c) is not strictly serializable. Transaction
T1 must be committed in any serialization and must precede transaction T3 since read3(X3) returns the
value of X3 written by T1. However, transaction T0 must precede T1 since read0(X1) returns the
initial value of X1. Also, transaction T2 must precede T0 since read0(X2) returns the value of X2
written by T2. But transaction T3 must precede T2 to respect real-time ordering of transactions. Thus,
T1 must precede T0 in any serialization. But there exists no such serialization: a contradiction to the
assumption that M is strictly serializable.
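The final step of the proof is a cycle in the "must precede" relation. As a minimal sketch (the dictionary encoding and the function name `find_serialization` are assumptions made for illustration), the four ordering constraints can be handed to a topological sorter, which finds no order as long as all four are present:

```python
from graphlib import CycleError, TopologicalSorter

# predecessors[t] = transactions that must be serialized before t
predecessors = {
    "T3": {"T1"},  # read3(X3) returns nv, which T1 wrote
    "T1": {"T0"},  # read0(X1) returns the initial value v
    "T0": {"T2"},  # read0(X2) returns nv, which T2 wrote
    "T2": {"T3"},  # T3 commits before T2 starts (real-time order)
}


def find_serialization(preds):
    """Return one order satisfying all constraints, or None if they are cyclic."""
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None


print(find_serialization(predecessors))  # prints None
```

Dropping the real-time constraint between T3 and T2 leaves the single order T2, T0, T1, T3, which indicates that the execution violates strict serializability specifically because that order must also respect real time.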
6.4 A linear lower bound on expensive synchronization for weak
DAP
In this section, we prove a linear lower bound (in the size of the transaction’s read set) on the number of
RAWs or AWARs for weak DAP TM implementations in RWF . To do so, we construct an execution in
which each t-read operation of an arbitrarily long read-only transaction contains a RAW or an AWAR.
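To make the metric concrete, the sketch below counts expensive synchronization patterns in the sequence of base-object events of a single process. It is an illustration under assumptions, not the thesis's definition: a RAW is taken to be a write to one base object followed by a read of a different base object with no intervening write (roughly following [17]), each write is paired with at most one read, an AWAR is represented by a single event of kind "awar" (e.g., a successful compare-and-swap), and the function and object names are hypothetical.

```python
def count_expensive_patterns(trace):
    """Count RAW and AWAR patterns in one process's base-object events.

    trace: list of (kind, base_object), kind in {"read", "write", "awar"}.
    Returns the pair (number of RAWs, number of AWARs).
    """
    raws = awars = 0
    last_write = None  # object of the most recent write not yet paired with a read
    for kind, obj in trace:
        if kind == "awar":
            awars += 1
        elif kind == "write":
            last_write = obj
        elif kind == "read" and last_write is not None and obj != last_write:
            raws += 1
            last_write = None  # pair each write with at most one read
    return raws, awars


# A (hypothetical) visible t-read: announce the reader, then read version and value.
visible_read = [("write", "reader_flag"), ("read", "version"), ("read", "value")]
# A (hypothetical) invisible t-read: trivial primitives only.
invisible_read = [("read", "version"), ("read", "value")]

print(count_expensive_patterns(visible_read * 3))    # prints (3, 0)
print(count_expensive_patterns(invisible_read * 3))  # prints (0, 0)
```

In this encoding, a read-only transaction with m visible t-reads incurs m RAWs, which is the kind of linear cost the section shows to be unavoidable.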
Theorem 6.3. Every strictly serializable weakly DAP TM implementation M ∈ RWF has, for all
m ∈ N, an execution in which some read-only transaction T0 with m = |Rset(T0)| performs Ω(m)
RAWs/AWARs.
Proof. Let v be the initial value of each of the t-objects X1, . . . , Xm. Consider the t-complete step
contention-free execution of transaction T0 that performs m t-reads read0(X1), read0(X2), . . . , read0(Xm)
and commits. We prove that each of the first m − 1 t-reads must perform a RAW or an AWAR.
For all j ∈ {1, . . . ,m − 1}, M has an execution of the form α1 · α2 · · · αj, where for all i ∈ {1, . . . , j}, αi
is the complete step contention-free execution fragment of read0(Xi)→ v. Assume inductively that each
of the first j − 1 t-reads performs a RAW or an AWAR in this execution. We prove that read0(Xj) must
perform a RAW or an AWAR in the execution fragment αj. Suppose by contradiction that αj does not
contain a RAW or an AWAR.
The following claim shows that we can schedule a committed transaction Tj that writes a new value to
Xj concurrent to read0(Xj) such that the execution is indistinguishable to both T0 and Tj from a step
contention-free execution (depicted in Figure 6.3a).
Figure 6.3: Executions in the proof of Theorem 6.3; the execution in 6.3c is not strictly serializable.
(a) read0(Xj)→ v performs no RAW/AWAR; T0 and Tj are unaware of step contention.
(b) R0(Xm) must return nv by strict serializability.
(c) By weak DAP, T0 cannot distinguish this execution from 6.3b.
Claim 6.4. For all j ∈ {1, . . . ,m − 1}, M has an execution of the form α1 · · · αj−1 · α1j · δj · α2j where,
• δj is the t-complete step contention-free execution fragment of transaction Tj that writes nv ≠ v
to Xj and commits
• α1j · α2j = αj is the complete execution fragment of the jth t-read read0(Xj)→ v such that
– α1j does not contain any nontrivial events
– α1 · · · αj−1 · α1j · δj · α2j is indistinguishable to T0 from the step contention-free execution
fragment α1 · · · αj−1 · α1j · α2j
Moreover, Tj does not access any base object to which T0 applies a nontrivial event in α1 · · · αj−1 · α1j · δj.
Proof. By wait-free progress (for read-only transactions) and strict serializability, M has an execution
of the form α1 · · ·αj−1 in which each of the t-reads performed by T0 must return the initial value of the
t-objects.
Since Tj is an updating transaction, by sequential progress, there exists an execution of M of the form
δj · α1 · · · αj−1. Since T0 and Tj are disjoint-access in δj · α1 · · · αj−1, by Lemma 2.10, T0 and Tj do
not contend on any base object in δj · α1 · · · αj−1. Thus, α1 · · · αj−1 · δj is indistinguishable to Tj from
the execution δj and α1 · · · αj−1 · δj is also an execution of M .
Let e be the first event that contains a write to a base object in αj. If there exists no such write event
to a base object in αj, then α1j = αj and α2j is empty. Otherwise, we represent the execution fragment
αj as α1j · e · αfj.
Since α1j does not contain any nontrivial events that write to a base object, α1 · · · αj−1 · α1j · δj is
indistinguishable to transaction Tj from the execution α1 · · · αj−1 · δj. Thus, α1 · · · αj−1 · α1j · δj is an
execution of M . Since e is not an atomic-write-after-read, α1 · · · αj−1 · α1j · δj · e is an execution of M .
Since αj does not contain a RAW, any read performed in αfj may only be performed to base objects
previously written in e · αfj. Thus, α1 · · · αj−1 · α1j · δj · e · αfj is indistinguishable to transaction T0 from
the step contention-free execution α1 · · · αj−1 · α1j · e · αfj in which read0(Xj)→ v.
Choosing α2j = e · αfj, it follows that M has an execution of the form α1 · · · αj−1 · α1j · δj · α2j that is
indistinguishable to Tj and T0 from a step contention-free execution. The proof follows.
We now prove that, for all j ∈ {1, . . . ,m − 1}, M has an execution of the form δm · α1 · · · αj−1 · α1j · δj · α2j
such that
• δm is the t-complete step contention-free execution of transaction Tℓ that writes nv ≠ v to Xm
and commits
• Tℓ and T0 do not contend on any base object in δm · α1 · · · αj−1 · α1j · δj · α2j
• Tℓ and Tj do not contend on any base object in δm · α1 · · · αj−1 · α1j · δj · α2j.
By sequential progress for updating transactions, Tℓ, which writes the value nv to Xm, must be committed
in δm since it is running in the absence of step contention from the initial configuration. Observe that Tℓ
and T0 are disjoint-access in δm · α1 · · · αj−1 · α1j · δj · α2j. By definition of α1j and α2j, δm · α1 · · · αj−1 · α1j · δj · α2j
is indistinguishable to T0 from δm · α1 · · · αj−1 · α1j · α2j. By Lemma 2.10, Tℓ and T0 do not contend on
any base object in δm · α1 · · · αj−1 · α1j · α2j.
By Claim 6.4, δm · α1 · · · αj−1 · α1j · δj is indistinguishable to Tj from δm · δj. But transactions Tℓ and Tj
are disjoint-access in δm · δj, and by Lemma 2.10, Tj and Tℓ do not contend on any base object in δm · δj.
Since strict serializability of M stipulates that each of the j t-reads performed by T0 return the initial
values of the respective t-objects, M has an execution of the form δm · α1 · · · αj−1 · α1j · δj · α2j.
Consider the extension of δm · α1 · · · αj−1 · α1j · δj · α2j in which T0 performs (m − j) t-reads of Xj+1, . . . , Xm
step contention-free and commits (depicted in Figure 6.3b). By wait-free progress of M and since T0
is a read-only transaction, there exists such an execution. Notice that the mth t-read, read0(Xm), must
return the value nv by strict serializability since Tℓ precedes T0 in real-time order in this execution.
Recall that neither the pair Tℓ and Tj nor the pair Tℓ and T0 contends on any base object in the
execution δm · α1 · · · αj−1 · α1j · δj · α2j. It follows that for all j ∈ {1, . . . ,m − 1}, M has an execution of
the form α1 · · · αj−1 · α1j · δj · α2j · δm in which Tj precedes Tℓ in real-time order.
Let α′ be the execution fragment that extends α1 · · · αj−1 · α1j · δj · α2j · δm in which T0 performs (m − j)
t-reads of Xj+1, . . . , Xm step contention-free and commits (depicted in Figure 6.3c). Since
α1 · · · αj−1 · α1j · δj · α2j · δm is indistinguishable to T0 from the execution δm · α1 · · · αj−1 · α1j · δj · α2j,
read0(Xm) must return the response value nv in α′.
The execution α1 · · · αj−1 · α1j · δj · α2j · δm · α′ is not strictly serializable. In any serialization, Tj must
precede Tℓ to respect the real-time ordering of transactions, while Tℓ must precede T0 since read0(Xm)
returns the value of Xm updated by Tℓ. Also, transaction T0 must precede Tj since read0(Xj) returns
the initial value of Xj. But there exists no such serialization: a contradiction to the assumption that M
is strictly serializable.
Thus, for all j ∈ {1, . . . ,m− 1}, transaction T0 must perform a RAW or an AWAR during the execution
of read0(Xj), completing the proof.
Since Theorem 6.3 implies that read-only transactions must perform nontrivial events, we have the
following corollary that was proved directly in [24].
Corollary 6.5 ([24]). There does not exist any strictly serializable weak DAP TM implementation
M ∈ RWF that uses invisible reads.
6.5 Related work and Discussion
Attiya et al. [24] showed that it is impossible to implement weak DAP strictly serializable TMs in RWF if
read-only transactions may only apply trivial primitives to base objects. Attiya et al. [24] also considered
a stronger “disjoint-access” property, called simply DAP, referring to the original definition proposed by
Israeli and Rappoport [86]. In DAP, two transactions are allowed to concurrently access (even for reading)
the same base object only if they are not disjoint-access. For an n-process DAP TM implementation, it is
shown in [24] that a read-only transaction must perform at least n − 3 writes. Our lower bound in
Theorem 6.3 is strictly stronger than the one in [24], as it assumes only weak DAP, considers a more
precise RAW/AWAR metric, and does not depend on the number of processes in the system. (Technically,
the last point follows from the fact that the execution constructed in the proof of Theorem 6.3 uses only 3
concurrent processes.) Thus, the theorem subsumes the two lower bounds of [24] within a single proof.
Perelman et al. [110] considered the closely related (to RWF) class of mv-permissive TMs: a transaction
can only be aborted if it is an updating transaction that conflicts with another updating transaction.
RWF is incomparable with the class of mv-permissive TMs. On the one hand, mv-permissiveness
guarantees that read-only transactions never abort, but does not imply that they commit in a wait-
free manner. On the other hand, RWF allows an updating transaction to abort in the presence of a
concurrent read-only transaction, which is disallowed by mv-permissive TMs. Observe that, technically,
mv-permissiveness is a blocking TM-progress condition, although when used in conjunction with wait-free
TM-liveness, it is a partially non-blocking TM-progress condition that is strictly stronger than RWF .
Assuming starvation-free TM-liveness, [110] showed that implementing a weak DAP strictly serializable
mv-permissive TM is impossible. In the thesis, we showed that strictly serializable TMs in RWF cannot
provide strict DAP, but proving the impossibility result assuming weak DAP remains an interesting open
question.
[110] also proved that mv-permissive TMs cannot be online space optimal, i.e., no mv-permissive TM can
keep the minimum number of old object versions for any TM history. Our result on the space complexity
of implementations in RWF that use invisible reads (Theorem 6.1) is different since it proves that the
implementation must maintain an unbounded number of versions of every t-object. Our proof technique
can however be used to show that mv-permissive TMs considered in [110] should also maintain an
unbounded number of versions.
7
Hybrid transactional memory (HyTM)
HAL: The 9000 series is the most
reliable computer ever made. No
9000 computer has ever made a
mistake or distorted information.
We are all, by any practical
definition of the words, foolproof
and incapable of error.
. . .
HAL: I’ve just picked up a fault
in the AE35 unit. It’s going to go
100% failure in 72 hours.
HAL: It can only be attributable
to human error.
Stanley Kubrick, 2001: A Space Odyssey
7.1 Overview
Hybrid transactional memory. The TM abstraction, in its original manifestation from the proposal
by Herlihy and Moss [80], augmented the processor’s cache-coherence protocol and extended the CPU’s
instruction set with instructions to indicate which memory accesses must be transactional [80]. Most
popular TM designs, subsequent to the original proposal in [80] have implemented all the functionality
in software [36, 52, 79, 101, 117] (cf. software TM model in Chapter 2). More recently, CPUs have
included hardware extensions to support short, small hardware transactions [1, 107, 111].
Early experience with programming hardware transactional memory (HTM), e.g., [7, 38, 44], paints an
interesting picture: if used carefully, HTM can be an extremely useful construct, and can significantly
speed up and simplify concurrent implementations. At the same time, this powerful tool is not without
its limitations: since HTMs are usually implemented on top of the cache coherence mechanism, hardware
transactions have inherent capacity constraints on the number of distinct memory locations that can be
accessed inside a single transaction. Moreover, all current proposals are best-effort, as they may abort
under imprecisely specified conditions (cache capacity overflow, interrupts, etc.). In brief, the programmer
should not rely solely on HTMs.
Several Hybrid Transactional Memory (HyTM) schemes [35, 37, 88, 99] have been proposed to comple-
ment the fast, but best-effort nature of HTM with a slow, reliable software transactional memory (STM)
backup. These proposals have explored a wide range of trade-offs between the overhead on hardware
transactions, concurrent execution of hardware and software, and the provided progress guarantees.
Early proposals for HyTM implementations [37, 88] shared some interesting features. First, transactions
that do not conflict are expected to run concurrently, regardless of their types (software or hardware).
This property is referred to as progressiveness [63] and is believed to allow for increased parallelism.
Second, in addition to changing the values of transactional objects, hardware transactions usually em-
ploy code instrumentation techniques. Intuitively, instrumentation is used by hardware transactions to
detect concurrency scenarios and abort in the case of contention. The number of instrumentation steps
performed by these implementations within a hardware transaction is usually proportional to the size of
the transaction’s data set.
Recent work by Riegel et al. [113] surveyed the various HyTM algorithms to date, focusing on techniques
to reduce instrumentation overheads in the frequently executed hardware fast-path. However, it is not
clear whether there are fundamental limitations when building a HyTM with non-trivial concurrency
between hardware and software transactions. In particular, what are the inherent instrumentation costs
of building a HyTM, and what are the trade-offs between these costs and the provided concurrency, i.e.,
the ability of the HyTM system to run software and hardware transactions in parallel?
Modelling HyTM. To address these questions, the thesis proposes the first model for hybrid TM
systems which formally captures the notion of cached accesses provided by hardware transactions, and
precisely defines instrumentation costs in a quantifiable way.
We model a hardware transaction as a series of memory accesses that operate on locally cached copies
of the variables, followed by a cache-commit operation. In case a concurrent transaction performs a
(read-write or write-write) conflicting access to a cached object, the cached copy is invalidated and the
hardware transaction aborts.
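The following sketch illustrates this view of a hardware transaction. It is a toy model under assumptions, not the formal model of the thesis: the class and method names (`SharedMemory`, `HardwareTxn`, `cache_commit`) are hypothetical, uninitialized locations are assumed to hold 0, and, as a simplification, a hardware transaction's own speculative writes invalidate other cached copies only when it commits.

```python
class Aborted(Exception):
    """Raised when a hardware transaction finds its cached copy invalidated."""


class SharedMemory:
    def __init__(self):
        self.values = {}
        self.active = set()  # hardware transactions currently holding cached copies

    def read(self, addr, reader=None):
        # Reading a location that another transaction has speculatively
        # written is a read-write conflict for that transaction.
        for txn in self.active:
            if txn is not reader and addr in txn.dirty:
                txn.invalidated = True
        return self.values.get(addr, 0)

    def write(self, addr, value, writer=None):
        # Writing a location cached by another transaction is a
        # read-write or write-write conflict for that transaction.
        self.values[addr] = value
        for txn in self.active:
            if txn is not writer and addr in txn.cache:
                txn.invalidated = True


class HardwareTxn:
    def __init__(self, memory):
        self.memory = memory
        self.cache = {}      # addr -> locally cached value
        self.dirty = set()   # addrs written speculatively
        self.invalidated = False
        memory.active.add(self)

    def _abort_if_invalidated(self):
        if self.invalidated:
            self.memory.active.discard(self)
            raise Aborted()

    def read(self, addr):
        self._abort_if_invalidated()
        if addr not in self.cache:
            self.cache[addr] = self.memory.read(addr, reader=self)
        return self.cache[addr]

    def write(self, addr, value):
        self._abort_if_invalidated()
        self.cache[addr] = value
        self.dirty.add(addr)

    def cache_commit(self):
        """Publish all speculative writes, unless a conflict occurred."""
        self._abort_if_invalidated()
        self.memory.active.discard(self)
        for addr in self.dirty:
            self.memory.write(addr, self.cache[addr], writer=self)
```

Note that the transaction itself executes no code to detect the conflict: the invalidation is a side effect of the other party's access, which is the "automatic" conflict detection that distinguishes hardware transactions from software ones.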
Our model for instrumentation is motivated by recent experimental evidence which suggests that the
overhead on hardware transactions imposed by code which detects concurrent software transactions is a
significant performance bottleneck [102]. In particular, we say that a HyTM implementation imposes a
logical partitioning of shared memory into data and metadata locations. Intuitively, metadata is used
by transactions to exchange information about contention and conflicts while data locations only store
the values of data items read and updated within transactions. We quantify instrumentation cost by
measuring the number of accesses to metadata objects which transactions perform. Our framework
captures all known HyTM proposals which combine HTMs with an STM fallback [35, 37, 88, 99, 112].
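As an illustration of this cost measure, the sketch below counts metadata accesses for two hypothetical read-only fast paths. Everything in it is an assumption made for the example: the partition of addresses into data and metadata is supplied by the caller, the per-item "lock" words, the single "global_seq" word and its parity convention are invented, and neither fast path is claimed to be a correct HyTM.

```python
class InstrumentedMemory:
    """Shared memory that records accesses to metadata locations."""

    def __init__(self, metadata_addrs):
        self.metadata_addrs = set(metadata_addrs)
        self.cells = {}
        self.metadata_accesses = 0
        self.distinct_metadata = set()

    def load(self, addr):
        if addr in self.metadata_addrs:
            self.metadata_accesses += 1
            self.distinct_metadata.add(addr)
        return self.cells.get(addr, 0)  # uninitialized cells read as 0 (assumption)


def read_only_per_item(mem, items):
    """Fast path that checks one (hypothetical) lock word per data item."""
    values = []
    for x in items:
        if mem.load(("lock", x)) != 0:
            raise RuntimeError("abort: item held by a software transaction")
        values.append(mem.load(("data", x)))
    return values


def read_only_global(mem, items):
    """Fast path that checks a single (hypothetical) global sequence word."""
    if mem.load("global_seq") % 2 != 0:  # assumed: odd means a commit is in progress
        raise RuntimeError("abort: software commit in progress")
    return [mem.load(("data", x)) for x in items]


items = list(range(8))
per_item = InstrumentedMemory({("lock", x) for x in items})
read_only_per_item(per_item, items)
single = InstrumentedMemory({"global_seq"})
read_only_global(single, items)
print(len(per_item.distinct_metadata), len(single.distinct_metadata))  # prints 8 1
```

The first fast path touches a number of distinct metadata objects equal to the size of its data set, the second a constant number; the measure itself says nothing about which concurrency guarantees each design can offer.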
The cost of instrumentation. Once this general model is in place, we derive two lower bounds
on the cost of implementing a HyTM. First, we show that some instrumentation is necessary in a
HyTM implementation even if we only intend to provide sequential progress, where a transaction is only
guaranteed to commit if it runs in the absence of concurrency.
Second, we prove that any progressive HyTM implementation providing obstruction-free liveness (every
operation running solo returns some response) has executions in which an arbitrarily long read-only
hardware transaction running in the absence of concurrency must access a number of distinct
metadata objects proportional to the size of its data set. Our proof technique is interesting in its own
right. Inductively, we start with a sequential execution in which a “large” set Sm of read-only hardware
transactions, each accessing m distinct data items and m distinct metadata memory locations, run after
an execution Em. We then construct execution Em+1, an extension of Em, which forces at least half of
the transactions in Sm to access a new metadata base object when reading a new (m+ 1)th data item,
running after Em+1. The technical challenge, and the key departure from prior work on STM lower
bounds, e.g. [24, 60, 64], is that hardware transactions practically possess “automatic” conflict detection,
aborting on contention. This is in contrast to STMs, which must take steps to detect contention on
memory locations.
We match this lower bound with a HyTM algorithm that, additionally, allows for uninstrumented writes
and invisible reads and is provably opaque [64]. To the best of our knowledge, this is the first formal
proof of correctness of a HyTM algorithm.
Low-instrumentation HyTM. The high instrumentation costs of early HyTM designs, which we show
to be inherent, stimulated more recent HyTM schemes [35, 99, 102, 113] to sacrifice progressiveness for
constant instrumentation cost (i.e., not depending on the size of the transaction). In the past two
years, Dalessandro et al. [35] and Riegel et al. [113] have proposed HyTMs based on the efficient NOrec
STM [36]. These HyTM schemes do not guarantee any parallelism among transactions; only sequential
progress is ensured. Despite this, they are among the best-performing HyTMs to date due to the limited
instrumentation in hardware transactions.
Starting from this observation, we provide a more precise upper bound for low-instrumentation HyTMs by
presenting a HyTM algorithm with invisible reads and uninstrumented hardware writes which guarantees
that a hardware transaction accesses at most one metadata object in the course of its execution. Software
transactions in this implementation remain progressive, while hardware transactions are guaranteed to
commit only if they do not run concurrently with an updating software transaction (or exceed capacity).
Therefore, the cost of avoiding the linear lower bound for progressive implementations is that hardware
transactions may be aborted by non-conflicting software ones.
Roadmap of Chapter 7. Section 7.2 introduces the model of HyTMs. Section 7.3 studies
the inherent cost of concurrency in progressive HyTMs by presenting a linear lower bound on the cost of
instrumentation, while Section 7.4 presents a matching upper bound. Section 7.5 discusses providing
partial concurrency with low instrumentation cost, and Section 7.6 elaborates on prior work related to
HyTMs.
7.2 Modelling HyTM
In this section, we introduce the model of HyTMs, extending the TM model from Chapter 2, that
intuitively captures the cache-coherence protocols employed in shared memory systems.
7.2.1 Direct and cached accesses
We now describe the operation of a Hybrid Transactional Memory (HyTM) implementation. In our
model, every base object can be accessed with two kinds of primitives, direct and cached.
In a direct access, the rmw primitive operates on the memory state: the direct-access event atomically
reads the value of the object in the shared memory and, if necessary, modifies it.
In a cached access performed by a process i, the rmw primitive operates on the cached state recorded in
process i’s tracking set τi. One can think of τi as the L1 cache of process i. A hardware transaction is a
series of cached rmw primitives performed on τi followed by a cache-commit primitive.
More precisely, τi is a set of triples (b, v,m) where b is a base object identifier, v is a value, and m ∈
{shared , exclusive} is an access mode. The triple (b, v,m) is added to the tracking set when i performs a
cached rmw access of b, where m is set to exclusive if the access is nontrivial, and to shared otherwise.
We assume that there exists some constant TS (representing the size of the L1 cache) such that the
condition |τi| ≤ TS must always hold; this condition will be enforced by our model. A base object b is
present in τi with mode m if ∃v, (b, v,m) ∈ τi.
A trivial (resp. nontrivial) cached primitive 〈g, h〉 applied to b by process i first checks the condition
|τi| = TS and if so, it sets τi = ∅ and immediately returns ⊥ (we call this event a capacity abort). We
assume that TS is large enough so that no transaction with data set of size 1 can incur a capacity abort.
If the transaction does not incur a capacity abort, the process checks whether b is present in exclusive
(resp. any) mode in τj for any j ≠ i. If so, τi is set to ∅ and the primitive returns ⊥. Otherwise, the
triple (b, v, shared) (resp. (b, g(v), exclusive)) is added to τi, where v is the most recent cached value of
b in τi (in case b was previously accessed by i within the current hardware transaction) or the value of b
in the current memory configuration, and finally h(v) is returned.
[Figure 7.1: Tracking set aborts in fast-path transactions; we denote a fast-path (and resp. slow-path)
transaction by F (and resp. S). Panel (a): (b, v, exclusive) ∈ τ2 after E, and τ2 is invalidated by
(fast-path or slow-path) transaction T1’s access of base object b. Panel (b): (b, v, shared) ∈ τ2 after E,
and τ2 is invalidated by (fast-path or slow-path) transaction T1’s write to base object b.]
A tracking set can be invalidated by a concurrent process: if, in a configuration C where (b, v, exclusive) ∈
τi (resp. (b, v, shared) ∈ τi), a process j ≠ i applies any primitive (resp. any nontrivial primitive) to b,
then τi becomes invalid and any subsequent cached primitive invoked by i sets τi to ∅ and returns ⊥.
We refer to this event as a tracking set abort.
Finally, the cache-commit primitive issued by process i with a valid τi does the following: for each base
object b such that (b, v, exclusive) ∈ τi, the value of b in C is updated to v. Finally, τi is set to ∅ and the
primitive returns commit.
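The rules above for cached accesses, tracking set invalidation and cache-commit can be made concrete with a small executable sketch. This is only an illustration under stated assumptions, not part of the formal model: the names Machine, direct, cached and cache_commit are hypothetical, unset base objects are assumed to hold 0, an object already held in exclusive mode is assumed to stay exclusive after a later trivial access, and a conflicting cached access is assumed to abort only the process that issues it, whereas direct accesses invalidate the tracking sets of other processes.

```python
ABORT = "⊥"                      # response of an aborted cached primitive
SHARED, EXCLUSIVE = "shared", "exclusive"

class Machine:
    """Toy shared memory with one tracking set per process (hypothetical names)."""

    def __init__(self, n_procs, ts):
        self.mem = {}                                   # base object -> value
        self.ts = ts                                    # capacity bound TS
        self.tau = [dict() for _ in range(n_procs)]     # b -> (value, mode)
        self.valid = [True] * n_procs

    def _reset(self, i):
        # Empty the tracking set of process i and report an abort.
        self.tau[i] = {}
        self.valid[i] = True
        return ABORT

    def direct(self, i, b, g=None, h=lambda v: v):
        """Direct rmw on memory; g=None denotes a trivial primitive."""
        v = self.mem.get(b, 0)                          # assumption: default value 0
        nontrivial = g is not None
        for j, other in enumerate(self.tau):
            # Any access invalidates an exclusive copy; a write invalidates any copy.
            if j != i and b in other and (nontrivial or other[b][1] == EXCLUSIVE):
                self.valid[j] = False
        if nontrivial:
            self.mem[b] = g(v)
        return h(v)

    def cached(self, i, b, g=None, h=lambda v: v):
        """Cached rmw on the tracking set of process i."""
        tau = self.tau[i]
        if not self.valid[i] or len(tau) == self.ts:
            return self._reset(i)                       # tracking set or capacity abort
        nontrivial = g is not None
        for j, other in enumerate(self.tau):
            if j != i and b in other and (nontrivial or other[b][1] == EXCLUSIVE):
                return self._reset(i)                   # conflict with another tracking set
        v = tau[b][0] if b in tau else self.mem.get(b, 0)
        if nontrivial:
            tau[b] = (g(v), EXCLUSIVE)
        else:
            tau[b] = (v, tau[b][1] if b in tau else SHARED)
        return h(v)

    def cache_commit(self, i):
        """Write back exclusive entries if the tracking set is still valid."""
        if not self.valid[i]:
            return self._reset(i)
        for b, (v, mode) in self.tau[i].items():
            if mode == EXCLUSIVE:
                self.mem[b] = v
        self.tau[i] = {}
        return "commit"

# Scenario of Figure 7.1(b): a shared copy is invalidated by a concurrent write.
m = Machine(n_procs=2, ts=4)
m.mem["b"] = 1
first = m.cached(0, "b")               # process 0 caches b in shared mode
m.direct(1, "b", g=lambda v: v + 1)    # process 1 writes b directly
outcome = m.cache_commit(0)            # process 0 incurs a tracking set abort
```

In this sketch the write by process 1 takes effect in memory, while the commit attempt of process 0 fails because its tracking set was invalidated in the meantime.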
Note that HTM may also abort spuriously, or because of unsupported operations [111]. The first cause
can be modelled probabilistically in the above framework, which would not however significantly affect
our claims and proofs, except for a more cumbersome presentation. Also, our lower bounds are based
exclusively on executions containing t-reads and t-writes. Therefore, in the following, we only consider
contention and capacity aborts.
7.2.2 Slow-path and fast-path transactions
In the following, we partition HyTM transactions into fast-path transactions and slow-path
transactions. Practically, two separate algorithms (a fast-path one and a slow-path one) are provided
for each t-operation.
A slow-path transaction models a regular software transaction. An event of a slow-path transaction is
either an invocation or response of a t-operation, or a rmw primitive on a base object.
A fast-path transaction essentially encapsulates a hardware transaction. An event of a fast-path
transaction is either an invocation or response of a t-operation, a cached primitive on a base object, or a
cache-commit: t-read and t-write are only allowed to contain cached primitives, and tryC consists of
invoking cache-commit. Furthermore, we assume that a fast-path transaction Tk returns Ak as soon as an
underlying cached primitive or cache-commit returns ⊥. Figure 7.1 depicts such a scenario illustrating
a tracking set abort: fast-path transaction T2 executed by process p2 accesses a base object b in shared
(and resp. exclusive) mode and it is added to its tracking set τ2. Immediately after the access of b by T2,
a concurrent transaction T1 applies a nontrivial primitive to b (and resp. accesses b). Thus, the tracking
set of p2 is invalidated and T2 must be aborted in any extension of this execution.
We provide two key observations on this model regarding the interactions of non-committed fast-path
transactions with other transactions. Let E be any execution of a HyTM implementation M in which
a fast-path transaction Tk is either t-incomplete or aborted. Then the sequence of events E′ derived by
removing all events of E|k from E is an execution of M. Moreover:
Observation 7.1. To every slow-path transaction Tm ∈ txns(E), E is indistinguishable from E′.
Observation 7.2. If a fast-path transaction Tm ∈ txns(E) \ {Tk} does not incur a tracking set abort in
E, then E is indistinguishable to Tm from E′.
[Figure 7.2: Execution E in Figure 7.2a is indistinguishable to T1 from the execution E′ in Figure 7.2b.
Panel (a): execution E, in which an aborted or incomplete fast-path transaction T2 performing W2(X, v)
is concurrent to a slow-path transaction T1 performing W1(X, v). Panel (b): execution E′, containing only
the slow-path transaction T1 performing W1(X, v).]
Intuitively, these observations say that fast-path transactions which are not yet committed are invisible
to slow-path transactions, and can communicate with other fast-path transactions only by incurring their
tracking-set aborts. Figure 7.2 illustrates Observation 7.1: a fast-path transaction T2 is concurrent to a
slow-path transaction T1 in an execution E. Since T2 is t-incomplete or aborted in this execution, E is
indistinguishable to T1 from an execution E′ derived by removing all events of T2 from E. Analogously,
to illustrate Observation 7.2, if T1 is a fast-path transaction that does not incur a tracking set abort in
E, then E is indistinguishable to T1 from E′.
7.2.3 Instrumentation
Now we define the notion of code instrumentation in fast-path transactions. Intuitively, instrumentation
characterizes the number of extra “metadata” accesses performed by a fast-path transaction.
We start with the following technical definition. An execution E of a HyTM M appears t-sequential to
a transaction Tk ∈ txns(E) if there exists an execution E′ of M such that:
• txns(E′) ⊆ txns(E) \ {Tk} and the configuration after E′ is t-quiescent,
• every transaction Tm ∈ txns(E) that precedes Tk in real-time order is included in E′ such that
E|m = E′|m,
• for every transaction Tm ∈ txns(E′), RsetE′(Tm) ⊆ RsetE(Tm) and WsetE′(Tm) ⊆ WsetE(Tm),
and
• E′ · E|k is an execution of M.
Definition 7.1 (Data and metadata base objects). Let 𝒳 be the set of t-objects operated by a HyTM
implementation M. Now we partition the set of base objects used by M into a set D of data objects and
a set M of metadata objects (D ∩ M = ∅). We further partition D into sets DX associated with each
t-object X ∈ 𝒳: D = ⋃_{X∈𝒳} DX and, for all X ≠ Y in 𝒳, DX ∩ DY = ∅, such that:
1. In every execution E, each fast-path transaction Tk ∈ txns(E) only accesses base objects in
⋃_{X∈Dset(Tk)} DX or M.
2. Let E · ρ and E · E′ · ρ′ be two t-complete executions, such that E and E · E′ are t-complete, ρ and
ρ′ are complete executions of a transaction Tk ∉ txns(E · E′), Hρ = Hρ′, and ∀Tm ∈ txns(E′),
Dset(Tm) ∩ Dset(Tk) = ∅. Then the states of the base objects ⋃_{X∈Dset(Tk)} DX in the configuration
after E · ρ and E · E′ · ρ′ are the same.
3. Let execution E appear t-sequential to a transaction Tk and let the enabled event e of Tk after E
be a primitive on a base object b ∈ D. Then, unless e returns ⊥, E · e also appears t-sequential to
Tk.
[Figure 7.3: Executions in the proof of Theorem 7.4; execution in 7.3d is not strictly serializable. In all
panels, the slow-path (S) transaction T0 performs R0(Z) → v, W0(X, nv), W0(Y, nv) and tryC0; all other
transactions are fast-path (F). Panel (a): after the event e of T0, Ty performs Ry(Y) → nv; Ty must
return the new value. Panel (b): Tz performs Wz(Z, nv); since Tz is uninstrumented, by Observation 7.3
and sequential TM-progress, Tz must commit. Panel (c): Tz performs Wz(Z, nv) and Tx performs
Rx(X) → v; since Tx does not access any metadata, by Observation 7.3, it cannot abort and must return
the initial value of X. Panel (d): Tz performs Wz(Z, nv), Tx performs Rx(X) → v and, after the event e
of T0, Ty performs Ry(Y) → nv; Ty does not contend with Tx or Tz on any base object.]
Intuitively, the first condition says that a transaction is only allowed to access data objects based on
its data set. The second condition says that transactions with disjoint data sets can communicate only
via metadata objects. Finally, the last condition means that base objects in D may only contain the
“values” of t-objects, and cannot be used to detect concurrent transactions. Note that our results will
lower bound the number of metadata objects that must be accessed under particular assumptions, thus
from a cost perspective, D should be made as large as possible.
All HyTM proposals we are aware of, such as HybridNOrec [35, 112], PhTM [99] and others [37, 88], conform
to our definition of instrumentation in fast-path transactions. For instance, HybridNOrec [35, 112]
employs a distinct base object in D for each t-object and a global sequence lock as the metadata that
is accessed by fast-path transactions to detect concurrency with slow-path transactions. Similarly, the
HyTM implementation by Damron et al. [37] also associates a distinct base object in D for each t-object
and additionally, a transaction header and ownership record as metadata base objects.
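To make the cost measure tangible, the following sketch counts the distinct metadata base objects touched by a read-only fast-path transaction under two hypothetical layouts. It is a toy illustration only: the tuple encoding of base objects and the functions is_metadata, per_object_layout and global_lock_layout are assumptions introduced here, and neither layout reproduces the actual algorithms of the cited proposals. The first layout pairs every t-object with its own ownership record, the second uses a single global sequence lock.

```python
def is_metadata(base_object):
    # Assumption: base objects are tuples whose first field names their kind;
    # everything that is not a "data" object counts as metadata.
    return base_object[0] != "data"

def metadata_cost(accesses):
    """Number of distinct metadata base objects among the accessed base objects."""
    return len({b for b in accesses if is_metadata(b)})

def per_object_layout(read_set):
    # Hypothetical design: each t-read of X checks an ownership record for X
    # and then reads the data object for X.
    accesses = []
    for x in read_set:
        accesses.append(("orec", x))
        accesses.append(("data", x))
    return accesses

def global_lock_layout(read_set):
    # Hypothetical design: one global sequence lock is read once,
    # followed by uninstrumented reads of the data objects.
    return [("seqlock",)] + [("data", x) for x in read_set]

read_set = [f"X{k}" for k in range(8)]
linear_cost = metadata_cost(per_object_layout(read_set))      # grows with the read set
constant_cost = metadata_cost(global_lock_layout(read_set))   # independent of its size
```

Under these assumptions the first layout incurs a cost equal to the size of the read set, while the second incurs a cost of one regardless of how many t-objects are read.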
Definition 7.2 (Uninstrumented HyTMs). A HyTM implementation M provides uninstrumented writes
(resp. reads) if in every execution E of M, for every write-only (resp. read-only) fast-path transaction
Tk, all primitives in E|k are performed on base objects in D. A HyTM is uninstrumented if both its reads
and writes are uninstrumented.
Observation 7.3. Consider any execution E of a HyTM implementation M which provides uninstrumented
reads (resp. writes). For any fast-path read-only (resp. write-only) transaction Tk ∉ txns(E)
that runs step contention-free after E, the execution E appears t-sequential to Tk.
7.2.4 Impossibility of uninstrumented HyTMs
In this section, we show that any strictly serializable HyTM must be instrumented, even under a very
weak progress assumption by which a transaction is guaranteed to commit only when run t-sequentially:
Definition 7.3 (Sequential TM-progress for HyTMs). A HyTM implementation M provides sequential
TM-progress for fast-path transactions (and resp. slow-path) if in every execution E of M, a fast-path
(and resp. slow-path) transaction Tk returns Ak in E only if Tk incurs a capacity abort or Tk
is concurrent to another transaction. We say that M provides sequential TM-progress if it provides
sequential TM-progress for fast-path and slow-path transactions.
Theorem 7.4. There does not exist a strictly serializable uninstrumented HyTM implementation that
ensures sequential TM-progress and TM-liveness.
Proof. Suppose by contradiction that such a HyTM M exists. For simplicity, assume that v is the initial
value of t-objects X, Y and Z. Let E be the t-complete step contention-free execution of a slow-path
transaction T0 that performs read0(Z) → v, write0(X, nv), write0(Y, nv) (nv ≠ v), and commits. Such
an execution exists since M ensures sequential TM-progress.
By Observation 7.3, any transaction that runs step contention-free starting from a prefix of E must
return a non-abort value. Moreover, any such transaction reading X or Y must return v when it starts from
the empty prefix of E and nv when it starts from E.
Thus, there exists E′, the longest prefix of E that cannot be extended with the t-complete step
contention-free execution of a fast-path transaction reading X or Y and returning nv. Let e be the enabled
event of T0 in the configuration after E′. Without loss of generality, suppose that there exists an execution
E′ · e · Ey where Ey is the t-complete step contention-free execution fragment of some fast-path transaction
Ty that reads Y and returns nv (Figure 7.3a).
Claim 7.5. M has an execution E′ · Ez · Ex, where
• Ez is the t-complete step contention-free execution fragment of a fast-path transaction Tz that
writes nv ≠ v to Z and commits,
• Ex is the t-complete step contention-free execution fragment of a fast-path transaction Tx that
performs a single t-read readx(X) → v and commits.
Proof. By Observation 7.3, the extension of E′ in which Tz writes to Z and tries to commit appears
t-sequential to Tz. By sequential TM-progress, Tz completes the write and commits. Let E′ · Ez (Figure 7.3b)
be the resulting execution of M.
Similarly, the extension of E′ in which Tx reads X and tries to commit appears t-sequential to Tx. By
sequential TM-progress, Tx commits and let E′ · Ex be the resulting execution of M. By the definition
of E′, readx(X) must return v in E′ · Ex.
Since M is uninstrumented and the data sets of Tx and Tz are disjoint, the sets of base objects accessed
in the execution fragments Ex and Ez are also disjoint. Thus, E′ · Ez · Ex is indistinguishable to Tx from
the execution E′ · Ex, which implies that E′ · Ez · Ex is an execution of M (Figure 7.3c).
Finally, we prove that the sequence of events E′ · Ez · Ex · e · Ey is an execution of M.
Since the transactions Tx, Ty, Tz have pairwise disjoint data sets in E′ · Ez · Ex · e · Ey, no base object
accessed in Ey can be accessed in Ex and Ez. The read operation on Y performed by Ty in E′ · e · Ey
returns nv and, by the definition of E′ and e, Ty must have accessed the base object b modified in the
event e by T0. Thus, b is not accessed in Ex and Ez and E′ · Ez · Ex · e is an execution of M. Summing
up, E′ · Ez · Ex · e · Ey is indistinguishable to Ty from E′ · e · Ey, which implies that E′ · Ez · Ex · e · Ey is
an execution of M (Figure 7.3d).
But the resulting execution is not strictly serializable. Indeed, suppose that a serialization exists. As
the value written by T0 is returned by a committed transaction Ty, T0 must be committed and precede
Ty in the serialization. Since Tx returns the initial value of X, Tx must precede T0. Since T0 reads the
initial value of Z, T0 must precede Tz. Finally, Tz must precede Tx to respect the real-time order. The
cycle in the serialization establishes a contradiction.
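The final step of the argument can be checked mechanically. The sketch below encodes the four precedence constraints derived in the proof and searches all orderings of the four transactions by brute force; the helper name serializations and the pair encoding of constraints are assumptions made for this illustration only.

```python
from itertools import permutations

# (a, b) means that a must precede b in any serialization.
constraints = [
    ("T0", "Ty"),  # Ty returns the value nv written by T0
    ("Tx", "T0"),  # Tx returns the initial value of X
    ("T0", "Tz"),  # T0 reads the initial value of Z
    ("Tz", "Tx"),  # real-time order: Tz precedes Tx
]

def serializations(txns, constraints):
    """Return every total order of txns that satisfies all precedence constraints."""
    result = []
    for order in permutations(txns):
        position = {t: k for k, t in enumerate(order)}
        if all(position[a] < position[b] for a, b in constraints):
            result.append(order)
    return result

txns = ["T0", "Tx", "Ty", "Tz"]
with_real_time = serializations(txns, constraints)          # empty: T0, Tz, Tx form a cycle
without_real_time = serializations(txns, constraints[:3])   # non-empty once real time is ignored
```

The search finds no admissible order because T0 must precede Tz, Tz must precede Tx, and Tx must precede T0. Dropping the real-time constraint makes the remaining constraints satisfiable, which shows that the contradiction is specific to strict serializability.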
7.3 A linear lower bound on instrumentation for progressive HyTMs
In this section, we show that giving HyTM the ability to run and commit transactions in parallel brings
considerable instrumentation costs. We focus on a natural progress condition called progressiveness [61,
62, 63] that allows a transaction to abort only if it experiences a read-write or write-write conflict with
a concurrent transaction:
Definition 7.4 (Progressiveness for HyTMs). We say that transactions Ti and Tj conflict in an execution
E on a t-object X if X ∈ Dset(Ti) ∩Dset(Tj) and X ∈Wset(Ti) ∪Wset(Tj).
A HyTM implementation M is fast-path (resp. slow-path) progressive if in every execution E of M
and for every fast-path (and resp. slow-path) transaction Ti that aborts in E, either Ai is a capacity
abort or Ti conflicts with some transaction Tj that is concurrent to Ti in E. We say M is progressive if
it is both fast-path and slow-path progressive.
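The conflict relation of Definition 7.4 is a simple set computation. As a minimal sketch, assuming that data sets and write sets are represented as Python sets of t-object names (the function name conflict is hypothetical):

```python
def conflict(dset_i, wset_i, dset_j, wset_j):
    """T-objects X with X in Dset(Ti) ∩ Dset(Tj) and X in Wset(Ti) ∪ Wset(Tj)."""
    return (dset_i & dset_j) & (wset_i | wset_j)

# Ti reads Y and writes X, Tj reads X and Z: they conflict on X only.
read_write = conflict({"X", "Y"}, {"X"}, {"X", "Z"}, set())
# Two transactions that only read X do not conflict.
read_read = conflict({"X"}, set(), {"X"}, set())
```

A progressive implementation may abort a transaction (other than by a capacity abort) only when this set is non-empty for some concurrent transaction.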
We show that for every opaque fast-path progressive HyTM that provides obstruction-free TM-liveness,
an arbitrarily long read-only transaction might access a number of distinct metadata base objects that
is linear in the size of its read set or experience a capacity abort.
The following auxiliary results will be crucial in proving our lower bound. We observe first that a fast-path
transaction in a progressive HyTM can contend on a base object only with a conflicting transaction.
Lemma 7.6. Let M be any fast-path progressive HyTM implementation. Let E · E1 · E2 be an execution
of M where E1 (and resp. E2) is the step contention-free execution fragment of transaction T1 ∉ txns(E)
(and resp. T2 ∉ txns(E)), T1 (and resp. T2) does not conflict with any transaction in E · E1 · E2, and at
least one of T1 or T2 is a fast-path transaction. Then, T1 and T2 do not contend on any base object in
E · E1 · E2.
Proof. Suppose, by contradiction, that T1 and T2 contend on the same base object in E · E1 · E2.
If in E1, T1 performs a nontrivial event on a base object on which they contend, let e1 be the last event
in E1 in which T1 performs such an event to some base object b and e2, the first event in E2 that accesses
b. Otherwise, T1 only performs trivial events in E1 to base objects on which it contends with T2 in
E · E1 · E2: let e2 be the first event in E2 in which T2 performs a nontrivial event to some base object b
on which they contend and e1, the last event of T1 in E1 that accesses b.
Let E′1 (and resp. E′2) be the longest prefix of E1 (and resp. E2) that does not include e1 (and resp.
e2). Since before accessing b, the execution is step contention-free for T1, E · E′1 · E′2 is an execution of
M. By construction, T1 and T2 do not conflict in E · E′1 · E′2. Moreover, E · E1 · E′2 is indistinguishable
to T2 from E · E′1 · E′2. Hence, T1 and T2 are poised to apply contending events e1 and e2 on b in the
execution Ẽ = E · E′1 · E′2. Recall that at least one event of e1 and e2 must be nontrivial.
Consider the execution Ẽ · e1 · e′2 where e′2 is the event of p2 in which it applies the primitive of e2 to the
configuration after Ẽ · e1. After Ẽ · e1, b is contained in the tracking set of process p1. If b is contained
in τ1 in the shared mode, then e′2 is a nontrivial primitive on b, which invalidates τ1 in Ẽ · e1 · e′2. If b
is contained in τ1 in the exclusive mode, then any subsequent access of b invalidates τ1 in Ẽ · e1 · e′2. In
both cases, τ1 is invalidated and T1 incurs a tracking set abort. Thus, transaction T1 must return A1 in
any extension of Ẽ · e1 · e′2, a contradiction to the assumption that M is progressive.
Iterative application of Lemma 7.6 implies the following:
Corollary 7.7. Let M be any fast-path progressive HyTM implementation. Let E · E1 · · · Ei · Ei+1 · · · Em
be any execution of M where for all i ∈ {1, . . . , m}, Ei is the step contention-free execution fragment of
transaction Ti ∉ txns(E) and any two transactions in E1 · · · Em do not conflict. For all i, j = 1, . . . , m,
i ≠ j, if Ti is fast-path, then Ti and Tj do not contend on a base object in E · E1 · · · Ei · · · Em.
Proof. Let Ti be a fast-path transaction. By Lemma 7.6, in E · E1 · · ·Ei · · ·Em, Ti does not contend
with Ti−1 (if i > 1) or Ti+1 (if i < m) on any base object and, thus, Ei commutes with Ei−1 and Ei+1.
Thus, E · E1 · · ·Ei−2 · Ei · Ei−1 · Ei+1 · · ·Em (if i > 1) and E · E1 · · ·Ei−1 · Ei+1 · Ei · Ei+2 · · ·Em (if
i < m) are executions of M. By iteratively applying Lemma 7.6, we derive that Ti does not contend
with any Tj, j ≠ i.
We say that execution fragments E and E′ are similar if they export equivalent histories, i.e., no process
can see the difference between them by looking at the invocations and responses of t-operations. We now
use Corollary 7.7 to show that t-operations only accessing data base objects cannot detect contention
with non-conflicting transactions.
Lemma 7.8. Let E be any t-complete execution of a progressive HyTM implementation M that provides
OF TM-liveness. For any m ∈ N, consider a set of m executions of M of the form E · Ei · γi · ρi where
Ei is the t-complete step contention-free execution fragment of a transaction Tm+i, γi is a complete step
contention-free execution fragment of a fast-path transaction Ti such that Dset(Ti) ∩ Dset(Tm+i) = ∅ in
E · Ei · γi, and ρi is the execution fragment of a t-operation by Ti that does not contain accesses to any
metadata base object. If, for all i, j ∈ {1, . . . , m}, i ≠ j, Dset(Ti) ∩ Dset(Tm+j) = ∅, Dset(Ti) ∩ Dset(Tj) =
∅ and Dset(Tm+i) ∩ Dset(Tm+j) = ∅, then there exists a t-complete step contention-free execution fragment
E′ that is similar to E1 · · · Em such that for all i ∈ {1, . . . , m}, E · E′ · γi · ρi is an execution of M.
Proof. Observe that any two transactions in the execution fragment E1 · · · Em access mutually disjoint
data sets. Since M is progressive and provides OF TM-liveness, there exists a t-sequential execution
fragment E′ = E′1 · · · E′m such that, for all i ∈ {1, . . . , m}, the execution fragments Ei and E′i are similar
and E · E′ is an execution of M. Corollary 7.7 implies that, for all i ∈ {1, . . . , m}, M has
an execution of the form E · E′1 · · · E′i · · · E′m · γi. More specifically, M has an execution of the form
E · γi · E′1 · · · E′i · · · E′m. Recall that the execution fragment ρi of fast-path transaction Ti that extends
γi contains accesses only to base objects in ⋃_{X∈Dset(Ti)} DX. Moreover, for all i, j ∈ {1, . . . , m}, i ≠ j,
Dset(Ti) ∩ Dset(Tm+j) = ∅ and Dset(Tm+i) ∩ Dset(Tm+j) = ∅.
It follows that M has an execution of the form E · γi · E′1 · · · E′i · ρi · E′i+1 · · · E′m, and the states of each
of the base objects ⋃_{X∈Dset(Ti)} DX accessed by Ti in the configuration after E · γi · E′1 · · · E′i and E · γi · Ei
are the same. But E · γi · Ei · ρi is an execution of M. Thus, for all i ∈ {1, . . . , m}, M has an execution
of the form E · E′ · γi · ρi.
Finally, we are now ready to derive our lower bound.
Theorem 7.9. Let M be any progressive, opaque HyTM implementation that provides OF TM-liveness.
For every m ∈ N, there exists an execution E in which some fast-path read-only transaction Tk ∈ txns(E)
satisfies either (1) |Dset(Tk)| ≤ m and Tk incurs a capacity abort in E or (2) |Dset(Tk)| = m and Tk accesses
Ω(m) distinct metadata base objects in E.
Here is a high-level overview of the proof technique. Let κ be the smallest integer such that some fast-
path transaction running step contention-free after a t-quiescent configuration performs κ t-reads and
incurs a capacity abort.
We prove that, for all m ≤ κ − 1, there exists a t-complete execution Em and a set Sm with |Sm| = 2^(κ−m)
of read-only fast-path transactions that access mutually disjoint data sets such that each transaction in
Sm that runs step contention-free from Em and performs t-reads of m distinct t-objects accesses at least
one distinct metadata base object within the execution of each t-read operation.
We proceed by induction. Assume that the induction statement holds for some m < κ − 1. We prove
that there exists a set Sm+1, |Sm+1| = 2^(κ−(m+1)), of fast-path transactions, each of which, running step
contention-free after the same t-complete execution Em+1, performs m + 1 t-reads of distinct t-objects so
that at least one distinct metadata base object is accessed within the execution of each t-read operation. In our
construction, we pick any two new transactions from the set Sm and show that one of them running
step contention-free from a t-complete execution that extends Em performs m + 1 t-reads of distinct
t-objects so that at least one distinct metadata base object is accessed within the execution of each
t-read operation. In this way, the set of transactions is reduced by half in each step of the induction
until one transaction remains which must have accessed a distinct metadata base object in every one of
its m+ 1 t-reads.
Intuitively, since all the transactions that we use in our construction access mutually disjoint data sets,
we can apply Lemma 7.6 to construct a t-complete execution Em+1 such that each of the fast-path
transactions in Sm+1, when running step contention-free after Em+1, performs m + 1 t-reads so that at
least one distinct metadata base object is accessed within the execution of each t-read operation.
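The halving argument can be traced numerically. The following sketch simply tabulates the set sizes |Sm| = 2^(κ−m) and the number of phases |Sm|/2 used to pass from Sm to Sm+1; the function names and the sample value κ = 4 are assumptions chosen for illustration.

```python
def set_sizes(kappa):
    """Sizes |S_m| = 2**(kappa - m) for m = 0, ..., kappa - 1."""
    return [2 ** (kappa - m) for m in range(kappa)]

def phases(kappa, m):
    """Number of phases in step m: one phase per pair of transactions in S_m."""
    return 2 ** (kappa - m) // 2

sizes = set_sizes(4)                               # [16, 8, 4, 2]
phase_counts = [phases(4, m) for m in range(3)]    # [8, 4, 2]
```

Each phase keeps one transaction out of a pair, so the number of phases in step m equals |Sm+1|, and the set is still non-empty when m reaches κ − 1.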
We now present the formal proof:
Proof. In the constructions which follow, every fast-path transaction executes at most m + 1 t-reads.
Let κ be the smallest integer such that some fast-path transaction running step contention-free after a
t-quiescent configuration performs κ t-reads and incurs a capacity abort. We proceed by induction.
Induction statement. We prove that, for all m ≤ κ − 1, there exists a t-complete execution Em and a
set Sm with |Sm| = 2^(κ−m) of read-only fast-path transactions that access mutually disjoint data sets such
that each transaction Tfi ∈ Sm that runs step contention-free from Em and performs t-reads of m distinct
t-objects accesses at least one distinct metadata base object within the execution of each t-read operation.
Let Efi be the step contention-free execution of Tfi after Em and let Dset(Tfi) = {Xi,1, . . . , Xi,m}.
The induction. Assume that the induction statement holds for some m < κ − 1. The statement is trivially
true for the base case m = 0 for every κ ∈ N.
We will prove that there exists a set Sm+1, |Sm+1| = 2^(κ−(m+1)), of fast-path transactions, each of which,
running step contention-free from the same t-quiescent configuration Em+1, performs m + 1 t-reads of distinct
t-objects so that at least one distinct metadata base object is accessed within the execution of each t-read
operation.
The construction proceeds in phases: there are exactly |Sm|/2 phases. In each phase, we pick any two new
transactions from the set Sm and show that one of them running step contention-free after a t-complete
execution that extends Em performs m + 1 t-reads of distinct t-objects so that at least one distinct
metadata base object is accessed within the execution of each t-read operation.
Throughout this proof, we will assume that any two transactions (and resp. execution fragments) with
distinct subscripts represent distinct identifiers.
For all i ∈ {0, . . . , |Sm|/2 − 1}, let X2i+1, X2i+2 ∉ ⋃_{j=0}^{|Sm|−1} {Xj,1, . . . , Xj,m} be distinct t-objects and let v
be the value of X2i+1 and X2i+2 after Em. Let Tsi denote a slow-path transaction which writes nv ≠ v
to X2i+1 and X2i+2. Let Esi be the t-complete step contention-free execution fragment of Tsi running
immediately after Em.
Let E′si be the longest prefix of the execution Esi such that Em · E′si can be extended neither with
the complete step contention-free execution fragment of transaction Tf2i+1 that performs its m t-reads
of X2i+1,1, . . . , X2i+1,m and then performs readf2i+1(X2i+1) and returns nv, nor with the complete step
contention-free execution fragment of some transaction Tf2i+2 that performs t-reads of X2i+2,1, . . . , X2i+2,m
and then performs readf2i+2(X2i+2) and returns nv. Progressiveness and OF TM-liveness of M stipulate
that such an execution exists.
Let ei be the enabled event of Tsi in the configuration after Em · E′si. By construction, the execution
Em · E′si · ei can be extended with at least one of the complete step contention-free executions of transaction
Tf2i+1 performing (m + 1) t-reads of X2i+1,1, . . . , X2i+1,m, X2i+1 such that readf2i+1(X2i+1) → nv or
transaction Tf2i+2 performing t-reads of X2i+2,1, . . . , X2i+2,m, X2i+2 such that readf2i+2(X2i+2) → nv.
Without loss of generality, suppose that Tf2i+1 reads the value of X2i+1 to be nv after Em · E′si · ei.
For any i ∈ {0, . . . , |Sm|/2 − 1}, we will denote by αi the execution fragment which we will construct in
phase i. For any i ∈ {0, . . . , |Sm|/2 − 1}, we prove that M has an execution of the form Em · αi in which
Tf2i+1 (or Tf2i+2) running step contention-free after a t-complete execution that extends Em performs
m + 1 t-reads of distinct t-objects so that at least one distinct metadata base object is accessed within
the execution of each of the first m t-read operations and Tf2i+1 (or Tf2i+2) is poised to apply an event after
Em · αi that accesses a distinct metadata base object during the (m + 1)th t-read. Furthermore, we will
show that Em · αi appears t-sequential to Tf2i+1 (or Tf2i+2).
(Construction of phase i)
Let Ef2i+1 (and resp. Ef2i+2) be the complete step contention-free execution of the t-reads of X2i+1,1, . . . , X2i+1,m
(and resp. X2i+2,1, . . . , X2i+2,m) running after Em by Tf2i+1 (and resp. Tf2i+2). By the inductive
hypothesis, transaction Tf2i+1 (and resp. Tf2i+2) accesses m distinct metadata objects in the execution
Em · Ef2i+1 (and resp. Em · Ef2i+2). Recall that transaction Tf2i+1 does not conflict with transaction Tsi.
Thus, by Corollary 7.7, M has an execution of the form Em · E′si · ei · Ef2i+1 (and resp. Em · E′si · ei · Ef2i+2).
Let Erf2i+1 be the complete step contention-free execution fragment of readf2i+1(X2i+1) that extends
E2i+1 = Em ·E′si · ei ·Ef2i+1 . By OF TM-liveness, readf2i+1(X2i+1) must return a matching response in
E2i+1 · Erf2i+1 . We now consider two cases.
Case I: Suppose Erf2i+1 accesses at least one metadata base object b not previously accessed by Tf2i+1 .
Let E′rf2i+1 be the longest prefix of Erf2i+1 which does not apply any primitives to any metadata base
object b not previously accessed by Tf2i+1. The execution Em · E′si · ei · Ef2i+1 · E′rf2i+1 appears t-sequential
to Tf2i+1 because Ef2i+1 does not contend with Tsi on any base object and any common base object
accessed in the execution fragments E′rf2i+1 and Esi by Tf2i+1 and Tsi respectively must be data objects
contained in D. Thus, we have that |Dset(Tf2i+1)| = m+ 1 and that Tf2i+1 accesses m distinct metadata
base objects within each of its first m t-read operations and is poised to access a distinct metadata base
object during the execution of the (m+ 1)th t-read. In this case, let αi = Em · E′si · ei · Ef2i+1 · E′rf2i+1 .
Case II: Suppose Erf2i+1 does not access any metadata base object not previously accessed by Tf2i+1 .
In this case, we will first prove the following:
Claim 7.10. M has an execution of the form E2i+2 = Em · E′si · ei · E¯f2i+1 · Ef2i+2 where E¯f2i+1 is
the t-complete step contention-free execution of Tf2i+1 in which readf2i+1(X2i+1) → nv, Tf2i+1 invokes
tryCf2i+1 and returns a matching response.
Proof. Since Erf2i+1 does not contain accesses to any distinct metadata base objects, the execution
Em ·E′si · ei ·Ef2i+1 ·Erf2i+1 appears t-sequential to Tf2i+1 . By definition of the event ei, readf2i+1(X2i+1)
must access the base object to which the event ei applies a nontrivial primitive and return the response
nv in E′si · ei ·Ef2i+1 ·Erf2i+1 . By OF TM-liveness, it follows that Em ·E′si · ei · E¯f2i+1 is an execution ofM.
Now recall that Em ·E′si ·ei ·Ef2i+2 is an execution ofM because transactions Tf2i+2 and Tsi do not conflict
in this execution and thus, cannot contend on any base object. Finally, because Tf2i+1 and Tf2i+2 access
disjoint data sets in Em ·E′si ·ei ·E¯f2i+1 ·Ef2i+2 , by Lemma 7.6 again, we have that Em ·E′si ·ei ·E¯f2i+1 ·Ef2i+2
is an execution ofM.
Let Erf2i+2 be the complete step contention-free execution fragment of readf2i+2(X2i+2) after Em · E′si ·
ei · E¯f2i+1 ·Ef2i+2 . By the induction hypothesis and Claim 7.10, transaction Tf2i+2 must access m distinct
metadata base objects in the execution Em · E′si · ei · E¯f2i+1 · Ef2i+2 .
If Erf2i+2 accesses some metadata base object, then by the argument given in Case I applied to transaction
Tf2i+2 , we get that Tf2i+2 accesses m distinct metadata base objects within each of the first m t-read
operations and is poised to access a distinct metadata base object during the execution of the (m+ 1)th
t-read.
Thus, suppose that Erf2i+2 does not access any metadata base object not previously accessed by Tf2i+2.
We claim that this is impossible and proceed to derive a contradiction. In particular, Erf2i+2 does not
contend with Tsi on any metadata base object. Consequently, the execution Em · E′si · ei · E¯f2i+1 · Ef2i+2
appears t-sequential to Tf2i+2 since Erf2i+2 only contends with Tsi on base objects in D. It follows that
E2i+2 · Erf2i+2 must also appear t-sequential to Tf2i+2 and so Erf2i+2 cannot abort. Recall that the
base object, say b, to which Tsi applies a nontrivial primitive in the event ei is accessed by Tf2i+1 in
Em · E′si · ei · E¯f2i+1 · Ef2i+2; thus, b ∈ DX2i+1. Since X2i+1 ∉ Dset(Tf2i+2), b cannot be accessed by Tf2i+2.
Thus, the execution Em · E′si · ei · E¯f2i+1 · Ef2i+2 · Erf2i+2 is indistinguishable to Tf2i+2 from the execution
Eˆi · E′si · Ef2i+2 · Erf2i+2 in which readf2i+2(X2i+2) must return the response v (by construction of E′si).
But we observe now that the execution Em · E′si · ei · E¯f2i+1 · Ef2i+2 · Erf2i+2 is not opaque. In any
serialization corresponding to this execution, Tsi must be committed and must precede Tf2i+1 because
Tf2i+1 read nv from X2i+1. Also, transaction Tf2i+2 must precede Tsi because Tf2i+2 read v from X2i+2.
However Tf2i+1 must precede Tf2i+2 to respect real-time ordering of transactions. Clearly, there exists
no such serialization—contradiction.
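The cyclic dependency behind this contradiction can be checked mechanically. The following Python sketch is only an illustration of the argument, not part of the proof: the short names `Ts`, `Tf1` and `Tf2` are hypothetical stand-ins for Tsi, Tf2i+1 and Tf2i+2, and the three ordering constraints are the ones stated in the paragraph above.

```python
from itertools import permutations

# Ordering constraints derived in the argument (hypothetical short names):
# "Ts" is T_{s_i}, "Tf1" is T_{f_{2i+1}}, "Tf2" is T_{f_{2i+2}}.
constraints = [
    ("Ts", "Tf1"),   # Tf1 read nv from X_{2i+1}, which Ts wrote
    ("Tf2", "Ts"),   # Tf2 read the old value v from X_{2i+2}
    ("Tf1", "Tf2"),  # real-time order: Tf1 completed before Tf2 began
]

def serializations(txns, constraints):
    """Return every total order of txns that satisfies all constraints."""
    return [order for order in permutations(txns)
            if all(order.index(a) < order.index(b) for a, b in constraints)]

print(serializations(["Ts", "Tf1", "Tf2"], constraints))  # prints []
```

Dropping the real-time constraint leaves exactly one admissible order (Tf2, Ts, Tf1), which suggests that it is the combination of the two reads with real-time ordering that rules out every serialization.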
Letting E′rf2i+2 be the longest prefix of Erf2i+2 which does not access a base object b ∈M not previously
accessed by Tf2i+2 , we can let αi = E′si · ei · E¯f2i+1 · Ef2i+2 · E′rf2i+2 in this case.
Combining Cases I and II, the following claim holds.
Claim 7.11. For each i ∈ {0, . . . , |Sm|/2 − 1}, M has an execution of the form Em · αi in which
(1) some fast-path transaction Ti ∈ txns(αi) performs t-reads of m+1 distinct t-objects so that at least
one distinct metadata base object is accessed within the execution of each of the first m t-reads, Ti
is poised to access a distinct metadata base object after Em ·αi during the execution of the (m+1)th
t-read and the execution appears t-sequential to Ti,
(2) the two fast-path transactions in the execution fragment αi do not contend on the same base object.
(Collecting the phases)
We will now describe how we can construct the set Sm+1 of fast-path transactions from these |Sm|/2 phases
and force each of them to access m + 1 distinct metadata base objects when running step contention-free
after the same t-complete execution.
For each i ∈ {0, . . . , |Sm|/2 − 1}, let βi be the subsequence of the execution αi consisting of all the events of
the fast-path transaction that is poised to access an (m + 1)th distinct metadata base object. Henceforth,
we denote by Ti the fast-path transaction that participates in βi. Then, from Claim 7.11, it follows that,
for each i ∈ {0, . . . , |Sm|/2 − 1}, M has an execution of the form Em · E′si · ei · βi in which the fast-path
transaction Ti performs t-reads of m + 1 distinct t-objects so that at least one distinct metadata base
object is accessed within the execution of each of the first m t-reads, Ti is poised to access a distinct
metadata base object after Em ·E′si ·ei ·βi during the execution of the (m+1)th t-read and the execution
appears t-sequential to Ti.
The following result is a corollary to the above claim that is obtained by applying the definition of
“appears t-sequential”. Recall that E′si ·ei is the t-incomplete execution of slow-path transaction Tsi that
accesses t-objects X2i+1 and X2i+2.
Corollary 7.12. For all i ∈ {0, . . . , |Sm|/2 − 1}, M has an execution of the form Em · Ei · βi such that
the configuration after Em · Ei is t-quiescent, txns(Ei) ⊆ {Tsi} and Dset(Tsi) ⊆ {X2i+1, X2i+2} in Ei.
We can represent the execution βi = γi · ρi where fast-path transaction Ti performs complete t-reads of
m distinct t-objects in γi and then performs an incomplete t-read of the (m + 1)th t-object in ρi in which
Ti only accesses base objects in ⋃X∈Dset(Ti) {X}. Recall that Ti and Tsi do not contend on the same base
object in the execution Em · Ei · γi. Thus, for all i ∈ {0, . . . , |Sm|/2 − 1}, M has an execution of the form
Em · γi · Ei · ρi.
Observe that the fast-path transaction Ti ∈ γi does not access any t-object that is accessed by any slow-path
transaction in the execution fragment E0 · · · E|Sm|/2−1. By Lemma 7.8, there exists a t-complete step
contention-free execution fragment E′ that is similar to E0 · · · E|Sm|/2−1 such that for all i ∈ {0, . . . , |Sm|/2 − 1},
M has an execution of the form Em · E′ · γi · ρi. By our construction, the enabled event of each
fast-path transaction Ti ∈ βi in this execution is an access to a distinct metadata base object.
Let Sm+1 denote the set of all fast-path transactions that participate in the execution fragment β0 · · · β|Sm|/2−1
and Em+1 = Em · E′. Thus, |Sm+1| fast-path transactions, each of which runs step contention-free from
the same t-quiescent configuration, perform m + 1 t-reads of distinct t-objects so that at least one distinct
metadata base object is accessed within the execution of each t-read operation. This completes the
proof.
7.4 Instrumentation-optimal progressive HyTM
We prove that the lower bound in Theorem 7.9 is tight by describing an “instrumentation-optimal” HyTM
implementation (Algorithm 7.1) that is opaque, progressive, provides wait-free TM-liveness and uses invisible
reads.
Base objects. For every t-object Xj , our implementation maintains a base object vj ∈ D that stores
the value of Xj and a metadata base object rj , which is a lock bit that stores 0 or 1.
Fast-path transactions. For a fast-path transaction Tk, the readk(Xj) implementation first reads rj
to check if Xj is locked by a concurrent updating transaction. If so, it returns Ak, else it returns the
value of Xj . Updating fast-path transactions use uninstrumented writes: write(Xj , v) simply stores the
cached state of Xj along with its value v and if the cache has not been invalidated, updates the shared
memory during tryCk by invoking the commit-cache primitive.
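The fast-path behaviour described above can be sketched in a few lines of Python. This is a minimal sequential model under stated assumptions, not the implementation itself: the names `TObject`, `FastPathTxn` and `ABORT` are hypothetical, the hardware cache and the commit-cache primitive are modelled by a local write buffer, and tracking-set aborts are not simulated.

```python
ABORT = object()  # stands in for the abort response A_k

class TObject:
    """One t-object X_j: data base object v_j plus lock bit r_j (metadata)."""
    def __init__(self, value):
        self.v = value  # data base object v_j
        self.r = 0      # metadata base object r_j (0 = unlocked, 1 = locked)

class FastPathTxn:
    def __init__(self):
        self.metadata_accesses = 0
        self.buffer = {}  # models cached, not yet committed, writes

    def read(self, x):
        value = self.buffer.get(x, x.v)  # cached read of v_j
        self.metadata_accesses += 1      # the only instrumented access: r_j
        if x.r != 0:                     # X_j is locked by a slow-path updater
            return ABORT
        return value

    def write(self, x, value):
        self.buffer[x] = value           # uninstrumented: never touches r_j

    def try_commit(self):
        for x, value in self.buffer.items():  # models commit-cache
            x.v = value
        return "C"
```

In this model a transaction that reads n t-objects performs exactly n metadata accesses and a write-only transaction performs none, which matches the instrumentation cost claimed for the algorithm.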
Slow-path read-only transactions. Any readk(Xj) invoked by a slow-path transaction first reads
the value of the object from vj , checks if rj is set and then performs value-based validation on its entire
read set to check if any of them have been modified. If either of these conditions is true, the transaction
returns Ak. Otherwise, it returns the value of Xj . A read-only transaction simply returns Ck during the
tryCommit.
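As an illustration of the slow-path read, the following Python sketch models value-based validation sequentially. The class names and the `ABORT` sentinel are hypothetical, each data base object is assumed to hold a pair [value, writer id] as in Algorithm 7.1, and concurrency is simulated only by mutating objects between calls.

```python
ABORT = object()  # stands in for the abort response A_k

class TObject:
    def __init__(self, value):
        self.v = (value, 0)  # data base object v_j: [value, writer id]
        self.r = 0           # lock bit r_j

class SlowPathReader:
    def __init__(self):
        self.rset = {}  # t-object -> [value, writer id] seen at first read

    def read(self, x):
        if x in self.rset:              # repeated read: answer locally
            return self.rset[x][0]
        self.rset[x] = x.v              # read v_j and record it in the read set
        if x.r != 0:                    # X_j is locked by an updater
            return ABORT
        if any(seen != obj.v for obj, seen in self.rset.items()):
            return ABORT                # value-based validation failed
        return self.rset[x][0]
```

Note that the reader never writes to a shared object, so its reads stay invisible; the price is that every new t-read revalidates the whole read set.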
Slow-path updating transactions. The writek(Xj, v) implementation of a slow-path transaction stores
v and the current value of Xj locally, deferring the actual update in shared memory to tryCommit.
During tryCk, an updating slow-path transaction Tk attempts to obtain exclusive write access to its
entire write set as follows: for every t-object Xj ∈ Wset(Tk), it writes 1 to the base object rj by
performing a compare-and-set (cas) primitive that checks if the value of rj is not 1 and, if so, replaces it
with 1. If the cas fails, then Tk releases the locks on all objects Xℓ it had previously acquired by writing
0 to rℓ and then returns Ak. Intuitively, if the cas fails, some concurrent transaction is performing a
t-write to a t-object in Wset(Tk). If all the locks on the write set were acquired successfully, Tk checks
if any t-object in Rset(Tk) is concurrently being updated by another transaction and then performs
value-based validation of the read set. If a conflict is detected from these checks, the transaction is
aborted. Finally, tryCk attempts to write the values of the t-objects via cas operations. If any cas on the
individual base objects fails, there must be a concurrent fast-path writer, and so Tk rolls back the state
of the base objects that were updated, releases locks on its write set and returns Ak. The rollbacks are
performed with cas operations, skipping any which fail to allow for concurrent fast-path writes to locked
locations. Note that if a concurrent read operation of a fast-path transaction Tℓ finds an “invalid” value
in vj that was written by such a transaction Tk but has not been rolled back yet, then Tℓ either incurs a
tracking set abort later because Tk has updated vj or finds rj to be 1. In both cases, the read operation
of Tℓ aborts.
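The commit procedure of an updating slow-path transaction can be sketched as follows. This is a sequential Python model for illustration only: `cas` is a non-atomic stand-in for the hardware primitive, the helper names and the dictionary layout of the read and write sets are assumptions, and "A" and "C" stand for the responses Ak and Ck.

```python
class TObject:
    def __init__(self, value):
        self.v = (value, 0)  # data base object v_j: [value, writer id]
        self.r = 0           # lock bit r_j

def cas(obj, field, expected, new):
    """Sequential stand-in for an atomic compare-and-set."""
    if getattr(obj, field) != expected:
        return False
    setattr(obj, field, new)
    return True

def release(objs):
    for x in objs:
        x.r = 0

def try_commit(k, rset, wset):
    """rset: {X: [value, id] read}; wset: {X: ([old value, id], new value)}."""
    if not wset:
        return "C"                        # read-only: nothing to install
    locked = []
    for x in wset:                        # acquire the whole write set
        if not cas(x, "r", 0, 1):
            release(locked)
            return "A"
        locked.append(x)
    lock_conflict = any(x not in wset and x.r != 0 for x in rset)
    stale_read = any(seen != x.v for x, seen in rset.items())
    if lock_conflict or stale_read:       # lock check + value-based validation
        release(locked)
        return "A"
    done = []
    for x, (old, new) in wset.items():    # install the new values
        if cas(x, "v", old, (new, k)):
            done.append(x)
        else:                             # a fast-path writer got there first
            for y in done:                # roll back, ignoring failed cas
                cas(y, "v", (wset[y][1], k), wset[y][0])
            release(locked)
            return "A"
    release(locked)
    return "C"
```

The rollback uses cas rather than a plain write so that a value installed in the meantime by a fast-path writer is not overwritten, mirroring the description above.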
The implementation uses invisible reads (no nontrivial primitives are applied by reading transactions).
Every t-operation returns a matching response within a finite number of its steps.
Algorithm 7.1 Progressive opaque HyTM implementation that provides uninstrumented writes and
invisible reads; code for process pi executing transaction Tk
1: Shared objects:
2:   vj ∈ D, for each t-object Xj
3:     allows reads, writes and cas
4:   rj ∈ M, for each t-object Xj
5:     allows reads, writes and cas
6: Local objects:
7:   Lset(Tk) ⊆ Wset(Tk), initially empty
8:   Oset(Tk) ⊆ Wset(Tk), initially empty
Code for slow-path transactions
9: readk(Xj): // slow-path
10:   if Xj ∉ Rset(Tk) then
11:     [ovj, kj] := read(vj)
12:     Rset(Tk) := Rset(Tk) ∪ {Xj, [ovj, kj]}
13:     if rj ≠ 0 then
14:       Return Ak
15:     if ∃Xj ∈ Rset(Tk): [ovj, kj] ≠ read(vj) then
16:       Return Ak
17:     Return ovj
18:   else
19:     ovj := Rset(Tk).locate(Xj)
20:     Return ovj
21: writek(Xj, v): // slow-path
22:   [ovj, kj] := read(vj)
23:   nvj := v
24:   Wset(Tk) := Wset(Tk) ∪ {Xj, [ovj, kj]}
25:   Return ok
26: tryCk(): // slow-path
27:   if Wset(Tk) = ∅ then
28:     Return Ck
29:   locked := acquire(Wset(Tk))
30:   if ¬ locked then
31:     Return Ak
32:   if isAbortable() then
33:     release(Lset(Tk))
34:     Return Ak
35:   for all Xj ∈ Wset(Tk) do
36:     if vj.cas([ovj, kj], [nvj, k]) then
37:       Oset(Tk) := Oset(Tk) ∪ {Xj}
38:     else
39:       Return undo(Oset(Tk))
40:   release(Wset(Tk))
41:   Return Ck
42: Function: acquire(Q):
43:   for all Xj ∈ Q do
44:     if rj.cas(0, 1) then
45:       Lset(Tk) := Lset(Tk) ∪ {Xj}
46:     else
47:       release(Lset(Tk))
48:       Return false
49:   Return true
50: Function: release(Q):
51:   for all Xj ∈ Q do
52:     rj.write(0)
53:   Return ok
54: Function: undo(Oset(Tk)):
55:   for all Xj ∈ Oset(Tk) do
56:     vj.cas([nvj, k], [ovj, kj])
57:   release(Wset(Tk))
58:   Return Ak
59: Function: isAbortable():
60:   if ∃Xj ∈ Rset(Tk): Xj ∉ Wset(Tk) ∧ read(rj) ≠ 0 then
61:     Return true
62:   if ∃Xj ∈ Rset(Tk): [ovj, kj] ≠ read(vj) then
63:     Return true
64:   Return false
Code for fast-path transactions
65: readk(Xj): // fast-path
66:   [ovj, kj] := read(vj) // cached read
67:   if read(rj) ≠ 0 then
68:     Return Ak
69:   Return ovj
70: writek(Xj, v): // fast-path
71:   write(vj, [nvj, k]) // cached write
72:   Return ok
73: tryCk(): // fast-path
74:   commit-cachei // returns Ck or Ak
Complexity. Every t-read operation performed by a fast-path transaction accesses a metadata base
object once (the lock bit corresponding to the t-object), which is the price to pay for detecting conflicting
updating slow-path transactions. Write operations of fast-path transactions are uninstrumented.
Lemma 7.13. Algorithm 7.1 implements an opaque TM.
Proof. Let E be any execution of Algorithm 7.1. Since opacity is a safety property, it is sufficient to
prove that every finite execution is opaque [18]. Let <E denote a total-order on events in E.
Let H denote a subsequence of E constructed by selecting linearization points of t-operations performed
in E. The linearization point of a t-operation op, denoted as ℓop, is associated with a base object event
or an event performed during the execution of op using the following procedure.
Completions. First, we obtain a completion of E by removing some pending invocations or adding
responses to the remaining pending invocations as follows:
• every incomplete readk or writek operation performed by a slow-path transaction Tk is removed from E;
an incomplete tryCk is removed from E if Tk has not performed any write to a base object rj,
Xj ∈ Wset(Tk), in Line 36; otherwise it is completed by including Ck after E.
• every incomplete readk, tryAk, writek and tryCk performed by a fast-path transaction Tk is removed
from E.
Linearization points. Now a linearization H of E is obtained by associating linearization points to
t-operations in the obtained completion of E. For all t-operations performed by a slow-path transaction Tk,
linearization points are assigned as follows:
• For every t-read opk that returns a non-Ak value, ℓopk is chosen as the event in Line 11 of Algorithm 7.1;
else, ℓopk is chosen as the invocation event of opk
• For every opk = writek that returns, ℓopk is chosen as the invocation event of opk
• For every opk = tryCk that returns Ck such that Wset(Tk) ≠ ∅, ℓopk is associated with the first
write to a base object performed by release when invoked in Line 40, else if opk returns Ak, ℓopk
is associated with the invocation event of opk
• For every opk = tryCk that returns Ck such that Wset(Tk) = ∅, ℓopk is associated with Line 28
For all t-operations performed by a fast-path transaction Tk, linearization points are assigned as follows:
• For every t-read opk that returns a non-Ak value, ℓopk is chosen as the event in Line 66 of Algorithm 7.1;
else, ℓopk is chosen as the invocation event of opk
• For every opk that is a tryCk, ℓopk is the commit-cachek primitive invoked by Tk
• For every opk that is a writek, ℓopk is the event in Line 71.
<H denotes a total-order on t-operations in the complete sequential history H.
Serialization points. The serialization point of a transaction Tj, denoted δTj, is associated with the
linearization point of a t-operation performed by the transaction.
We obtain a t-complete history H¯ from H as follows: for every transaction Tk in H that is complete, but
not t-complete, we insert tryCk · Ak immediately after the last event of Tk in H. A serialization S is then
obtained by associating serialization points to transactions in H¯ as follows:
• If Tk is an updating transaction that commits, then δTk is ℓtryCk
• If Tk is a read-only or aborted transaction, then δTk is assigned to the linearization point of the
last t-read that returned a non-Ak value in Tk
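The two serialization rules can be illustrated with a small Python sketch. The dictionary keys and the integer positions (indices of events in E) are hypothetical, and every transaction is assumed to have at least one t-read that returned a value.

```python
def serialization_point(txn):
    """Position in E of the serialization point of one transaction.

    txn is a dict with hypothetical keys:
      'updating', 'committed' : booleans
      'try_commit_lp'         : position of the tryC linearization point
      'read_lps'              : positions of t-reads that returned a value
    """
    if txn["updating"] and txn["committed"]:
        return txn["try_commit_lp"]
    return txn["read_lps"][-1]  # last t-read that did not abort

def serialize(txns):
    """Order transaction names by their serialization points."""
    return sorted(txns, key=lambda name: serialization_point(txns[name]))

txns = {
    "T1": {"updating": True, "committed": True, "try_commit_lp": 5, "read_lps": [1]},
    "T2": {"updating": False, "committed": True, "try_commit_lp": 9, "read_lps": [2, 3]},
    "T3": {"updating": True, "committed": False, "try_commit_lp": 8, "read_lps": [4, 6]},
}
print(serialize(txns))  # ['T2', 'T1', 'T3']
```

Here the committed updater T1 is placed at its tryC (position 5), while the read-only T2 and the aborted T3 are placed at their last successful t-reads (positions 3 and 6).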
<S denotes a total-order on transactions in the t-sequential history S.
Claim 7.14. If Ti ≺H Tj, then Ti <S Tj
Proof. This follows from the fact that for a given transaction, its serialization point is chosen between
the first and last event of the transaction. Hence, if Ti ≺H Tj, then δTi <E δTj, which implies Ti <S Tj.
Claim 7.15. S is legal.
Proof. We claim that for every readj(Xm) → v, there exists some slow-path transaction Ti (or resp.
fast-path) that performs writei(Xm, v) and completes the event in Line 36 (or resp. Line 71) such that
readj(Xm) ⊀RTH writei(Xm, v).
Suppose that Ti is a slow-path transaction: since readj(Xm) returns the response v, the event in Line 11
succeeds the event in Line 36 performed by tryCi. Since readj(Xm) can return a non-abort response
only after Ti writes 0 to rm in Line 52, Ti must be committed in S. Consequently, ℓtryCi <E ℓreadj(Xm).
Since, for any updating committing transaction Ti, δTi = ℓtryCi, it follows that δTi <E δTj.
Otherwise if Ti is a fast-path transaction, then clearly Ti is a committed transaction in S. Recall that
readj(Xm) can read v during the event in Line 11 only after Ti applies the commit-cache primitive. By
the assignment of linearization points, ℓtryCi <E ℓreadj(Xm) and thus, δTi <E ℓreadj(Xm).
Thus, to prove that S is legal, it suffices to show that there does not exist a transaction Tk that returns
Ck in S and performs writek(Xm, v′), v′ ≠ v, such that Ti <S Tk <S Tj.
Ti and Tk are both updating transactions that commit. Thus,
(Ti <S Tk) ⇐⇒ (δTi <E δTk)
(δTi <E δTk) ⇐⇒ (ℓtryCi <E ℓtryCk)
Since Tj reads the value of Xm written by Ti, one of the following is true: ℓtryCi <E ℓtryCk <E ℓreadj(Xm)
or ℓtryCi <E ℓreadj(Xm) <E ℓtryCk.
Suppose that ℓtryCi <E ℓtryCk <E ℓreadj(Xm).
(Case I:) Ti and Tk are slow-path transactions.
Thus, Tk returns a response from the event in Line 29 before the read of the base object associated with
Xm by Tj in Line 11. Since Ti and Tk are both committed in E, Tk returns true from the event in
Line 29 only after Ti writes 0 to rm in Line 52.
If Tj is a slow-path transaction, recall that readj(Xm) checks if Xm is locked by a concurrent transaction,
then performs read-validation (Line 13) before returning a matching response. We claim that readj(Xm)
must return Aj in any such execution.
Consider the following possible sequence of events: Tk returns true from the acquire function invocation,
updates the value of Xm in shared memory (Line 36), Tj reads the base object vm associated with
Xm, Tk releases Xm by writing 0 to rm and finally Tj performs the check in Line 13. But in this
case, readj(Xm) is forced to return the value v′ written by Tk, contradicting the assumption that
readj(Xm) returns v.
Otherwise suppose that Tk acquires exclusive access to Xm by writing 1 to rm and returns true from
the invocation of acquire, updates vm in Line 36, Tj reads vm, Tj performs the check in Line 13 and
finally Tk releases Xm by writing 0 to rm. Again, readj(Xm) must return Aj since Tj reads that rm is
1, a contradiction.
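The two interleavings considered above can be replayed in a tiny sequential model. The function below is a hypothetical sketch: the dictionary `obj` stands for the pair (vm, rm), the string "A" stands for Aj, and the validation of the rest of the read set of Tj is omitted because only Xm matters here.

```python
def run(release_before_check):
    """Replay one interleaving of T_k's commit with T_j's slow-path read."""
    obj = {"v": "v", "r": 0}   # X_m initially holds v and is unlocked
    obj["r"] = 1               # T_k: acquire, cas on r_m succeeds
    obj["v"] = "v'"            # T_k: Line 36, installs v'
    seen = obj["v"]            # T_j: Line 11, reads v_m
    if release_before_check:
        obj["r"] = 0           # T_k: releases X_m before T_j's check
    result = "A" if obj["r"] != 0 else seen   # T_j: Line 13
    obj["r"] = 0               # T_k: release (no effect if already released)
    return result

print(run(True))   # prints v'
print(run(False))  # prints A
```

In neither interleaving does the read return the old value v, which is the observation the case analysis relies on.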
A similar argument applies to the case that Tj is a fast-path transaction. Indeed, since every data base
object read by Tj is contained in its tracking set, if any concurrent transaction updates any t-object in
its read set, Tj is aborted immediately by our model (cf. Section 7.2.2).
Thus, ℓtryCi <E ℓreadj(Xm) <E ℓtryCk.
(Case II:) Ti is a slow-path transaction and Tk is a fast-path transaction. Thus, Tk returns Ck before
the read of the base object associated with Xm by Tj in Line 11, but after the response of acquire by
Ti in Line 29. Since readj(Xm) reads the value of Xm to be v and not v′, Ti performs the cas to vm
in Line 36 after Tk performs the commit-cache primitive (since if otherwise, Tk would be aborted in
E). But then the cas on vm performed by Ti would return false and Ti would return Ai, a contradiction.
(Case III:) Tk is a slow-path transaction and Ti is a fast-path transaction. This is analogous to the above
case.
(Case IV:) Ti and Tk are fast-path transactions. Thus, Tk returns Ck before the read of the base object
associated with Xm by Tj in Line 11, but after Ti returns Ci (this follows from Observations 7.1 and
7.2). Consequently, readj(Xm) must read the value of Xm to be v′ and return v′, a contradiction.
We now need to prove that δTj indeed precedes ℓtryCk in E.
Consider the two possible cases:
• Suppose that Tj is a read-only transaction. Then, δTj is assigned to the last t-read performed by
Tj that returns a non-Aj value. If readj(Xm) is not the last t-read that returned a non-Aj value,
then there exists a readj(X′) such that ℓreadj(Xm) <E ℓtryCk <E ℓreadj(X′). But then this t-read of
X′ must abort by performing the checks in Line 13 or incur a tracking set abort, a contradiction.
• Suppose that Tj is an updating transaction that commits. Then δTj = ℓtryCj, which implies that
ℓreadj(Xm) <E ℓtryCk <E ℓtryCj. Then, Tj must necessarily perform the checks in Line 32 and
return Aj or incur a tracking set abort, contradicting the assumption that Tj is a committed
transaction.
The proof follows.
The conjunction of Claims 7.14 and 7.15 establishes that Algorithm 7.1 is opaque.
Theorem 7.16. There exists an opaque HyTM implementation that provides uninstrumented writes,
invisible reads, progressiveness and wait-free TM-liveness such that in every execution E of the implementation,
every read-only fast-path transaction T ∈ txns(E) accesses O(|Rset(T)|) distinct metadata base objects.
Proof. (Opacity) Follows from Lemma 7.13.
(TM-liveness and TM-progress) Since none of the implementations of the t-operations in Algorithm 7.1
contain unbounded loops or waiting statements, Algorithm 7.1 provides wait-free TM-liveness, i.e., every
t-operation returns a matching response after taking a finite number of steps.
Consider the cases under which a slow-path transaction Tk may be aborted in any execution.
• Suppose that there exists a readk(Xj) performed by Tk that returns Ak from Line 13. Thus, there
exists a transaction that has written 1 to rj in Line 44, but has not yet written 0 to rj in Line 52
or some t-object in Rset(Tk) has been updated since its t-read by Tk. In both cases, there exists a
concurrent transaction performing a t-write to some t-object in Rset(Tk), thus forcing a read-write
conflict.
• Suppose that tryCk performed by Tk returns Ak from Line 30. Thus, there exists a transaction
that has written 1 to rj in Line 44, but has not yet written 0 to rj in Line 52. Thus, Tk encounters
a write-write conflict with another transaction that concurrently attempts to update a t-object in
Wset(Tk).
• Suppose that tryCk performed by Tk returns Ak from Line 32. Since Tk returns Ak from
Line 32 for the same reason it returns Ak after Line 13, the proof follows.
Consider the cases under which a fast-path transaction Tk may be aborted in any execution E.
• Suppose that a readk(Xm) performed by Tk returns Ak from Line 67. Thus, there exists a concurrent
slow-path transaction that is pending in its tryCommit and has written 1 to rm, but not
released the lock on Xm, i.e., Tk conflicts with another transaction in E.
• Suppose that Tk returns Ak while performing a cached access of some base object b via a trivial
(and resp. nontrivial) primitive. Indeed, this is possible only if some concurrent transaction writes
(and resp. reads or writes) to b. However, two transactions Tk and Tm may contend on b in E
only if there exists X ∈ Dset(Tk) ∩ Dset(Tm) and X ∈ Wset(Tk) ∪ Wset(Tm). The
same argument applies for the case when Tk returns Ak while performing commit-cachek in E.
(Complexity) The implementation uses uninstrumented writes since each writek(Xm) simply writes to
vm ∈ DXm and does not access any metadata base object. The complexity of each readk(Xm) is a
single access to a metadata base object rm in Line 67 that is not accessed by any other transaction Ti
unless Xm ∈ Dset(Ti), while tryCk just calls commit-cachek that returns Ck. Thus, each read-only
transaction Tk accesses O(|Rset(Tk)|) distinct metadata base objects in any execution.
7.5 Providing partial concurrency at low cost
Algorithm 7.2 Opaque HyTM implementation with progressive slow-path and sequential fast-path
TM-progress; code for Tk by process pi
1: Shared objects:
2:   vj ∈ D, for each t-object Xj
3:     allows reads, writes and cas
4:   rj ∈ M, for each t-object Xj
5:     allows reads, writes and cas
6:   fa, fetch-and-add object
Code for slow-path transactions
7: tryCk(): // slow-path
8:   if Wset(Tk) = ∅ then
9:     Return Ck
10:   locked := acquire(Wset(Tk))
11:   if ¬ locked then
12:     Return Ak
13:   fa.add(1)
14:   if isAbortable() then
15:     release(Lset(Tk))
16:     Return Ak
17:   for all Xj ∈ Wset(Tk) do
18:     if vj.cas((ovj, kj), (nvj, k)) then
19:       Oset(Tk) := Oset(Tk) ∪ {Xj}
20:     else
21:       Return undo(Oset(Tk))
22:   release(Wset(Tk))
23:   Return Ck
24: Function: release(Q):
25:   for all Xj ∈ Q do
26:     rj.write(0)
27:   fa.add(−1)
28:   Return ok
Code for fast-path transactions
29: readk(Xj): // fast-path
30:   if Rset(Tk) = ∅ then
31:     l ← read(fa) // cached read
32:     if l ≠ 0 then
33:       Return Ak
34:   (ovj, kj) := read(vj) // cached read
35:   Return ovj
36: writek(Xj, v): // fast-path
37:   vj.write(nvj, k) // cached write
38:   Return ok
39: tryCk(): // fast-path
40:   commit-cachei // returns Ck or Ak
We showed that allowing fast-path transactions to run concurrently in HyTM results in an instrumen-
tation cost that is proportional to the read-set size of a fast-path transaction. But can we run at least
some transactions concurrently with constant instrumentation cost, while still keeping invisible reads?
Algorithm 7.2 implements a slow-path progressive opaque HyTM with invisible reads and wait-free TM-
liveness. To fast-path transactions, it only provides sequential TM-progress (they are only guaranteed to
commit in the absence of concurrency), but in return the algorithm is only using a single metadata base
object fa that is read once by a fast-path transaction and accessed twice with a fetch-and-add primitive
by an updating slow-path transaction. Thus, the instrumentation cost of the algorithm is constant.
Intuitively, fa allows fast-path transactions to detect the existence of concurrent updating slow-path
transactions. Each time an updating slow-path transaction tries to commit, it increments
fa and once all writes to data base objects are completed (this part of the algorithm is identical to
Algorithm 7.1) or the transaction is aborted, it decrements fa. Therefore, fa ≠ 0 means that at least
one slow-path updating transaction is incomplete. A fast-path transaction simply checks if fa ≠ 0 in the
beginning and aborts if so; otherwise, its code is identical to that in Algorithm 7.1. Note that this way,
any update of fa automatically causes a tracking set abort of any incomplete fast-path transaction.
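The role of fa can be sketched with a small sequential Python model. All names here are hypothetical, locks and validation on the slow path are omitted, and the tracking-set abort that an update of fa would trigger in a running fast-path transaction is not simulated; the `during` hook merely lets a fast-path transaction start while a slow-path commit is in progress.

```python
ABORT = object()  # stands in for the abort response A_k

class Shared:
    def __init__(self):
        self.fa = 0  # the single metadata base object (fetch-and-add counter)

class FastPathTxn:
    def __init__(self, shared):
        self.shared = shared
        self.started = False
        self.metadata_accesses = 0

    def read(self, data, name):
        if not self.started:            # only the first t-read inspects fa
            self.started = True
            self.metadata_accesses += 1
            if self.shared.fa != 0:     # a slow-path updater is committing
                return ABORT
        return data[name]

def slow_path_commit(shared, data, updates, during=None):
    shared.fa += 1                      # fa.add(1) once the locks are held
    if during:
        during()                        # hook: a concurrent fast-path step
    data.update(updates)                # install the write set
    shared.fa -= 1                      # fa.add(-1) on release
    return "C"
```

However many t-objects a fast-path transaction reads in this model, it touches the metadata exactly once, in contrast to the one access per t-read of Algorithm 7.1.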
Theorem 7.17. There exists an opaque HyTM implementation that provides uninstrumented writes,
invisible reads, progressiveness for slow-path transactions, sequential TM-progress for fast-path transactions
and wait-free TM-liveness such that in every execution E of the implementation, every fast-path transaction accesses
at most one metadata base object.
Proof. The proof of opacity is almost identical to the analogous proof for Algorithm 7.1 in Lemma 7.13.
As with Algorithm 7.1, enumerating the cases under which a slow-path transaction Tk returns Ak proves
that Algorithm 7.2 satisfies progressiveness for slow-path transactions. Any fast-path transaction Tk with
Rset(Tk) ≠ ∅ reads the metadata base object fa and adds it to the process’s tracking set (Line 31). If
the value of fa is not 0, indicating that there exists a concurrent slow-path transaction pending in its
tryCommit, Tk returns Ak. Thus, the implementation provides sequential TM-progress for fast-path
transactions.
Also, in every execution E of M, no fast-path write-only transaction accesses any metadata base object
and a fast-path reading transaction accesses the metadata base object fa exactly once, during the first
t-read.
7.6 Related work and Discussion
HyTM model. Our HyTM model is a natural extension of the model we specified for Software Transactional
Memory (cf. Chapter 2), and has the advantage of being relatively simple. The term instrumentation
was originally used in the context of HyTMs [35, 99, 112] to indicate the overhead a hardware
transaction induces in order to detect pending software transactions. The impossibility of designing
HyTMs without any code instrumentation was intuitively suggested in [35]; we present a formal proof
in this paper.
In [23], Attiya and Hillel considered the instrumentation cost of privatization, i.e., allowing transactions
to isolate data items by making them private to a process so that no other process is allowed to modify
the privatized item. Just as we capture a tradeoff between the cost of hardware instrumentation and
the amount of concurrency allowed between hardware and software transactions, [23] captures a tradeoff
between the cost of privatization and the number of transactions guaranteed to make progress concurrently
in ℓ-progressive STMs. The model we consider is fundamentally different from [23], in that we model
hardware transactions at the level of cache coherence, and do not consider non-transactional accesses,
i.e., neither data nor meta-data base objects are private in our HyTM model. The proof techniques we
employ are also different.
Uninstrumented HTMs may be viewed as being disjoint-access parallel (DAP) [24, 86]. As such, some
of the techniques used in the proof of Theorem 7.4 resemble those used in [24, 60, 64].
We have proved that it is impossible to completely forgo instrumentation in a HyTM even if only sequen-
tial TM-progress is required, and that any opaque HyTM implementation providing non-trivial progress
either has to pay a linear number of metadata accesses, or will have to allow slow-path transactions
to abort fast-path operations. The main motivation for our definition of metadata base objects (Def-
inition 7.1) is given by experiments suggesting that the cost of concurrency detection is a significant
bottleneck for many HyTM implementations [102]. To precisely characterize the costs incurred by hard-
ware transactions, we made a distinction between the set of memory locations that store the data values
of the t-objects and the locations that store the metadata information. To the best of our knowledge, all
known HyTM proposals, such as HybridNOrec [35, 112], PhTM [99] and others [37, 88] avoid co-locating
the data and metadata within a single base object.
HyTM algorithms. Circa 2005, several papers introduced HyTM implementations [12, 37, 88] that
integrated HTMs with variants of DSTM [79]. These implementations provide nontrivial concurrency
between hardware and software transactions (progressiveness), by imposing instrumentation on hardware
transactions: every t-read operation incurs at least one extra access to a metadata base object. Our
Theorem 7.9 shows that this overhead is unavoidable. Of note, write operations of these HyTMs are also
instrumented, but our Algorithm 7.1 shows that this is not necessary.
Implementations like PhTM [99] and HybridNOrec [35] overcome the per-access instrumentation cost
of [37, 88] by realizing that if one is prepared to sacrifice progress, hardware transactions need instru-
mentation only at the boundaries of transactions to detect pending software transactions. Inspired by
this observation, our HyTM implementation described in Algorithm 7.2 overcomes the linear per-read
instrumentation cost by allowing hardware readers to abort due to a concurrent software writer, but
maintains progressiveness for software transactions, unlike [35, 99, 102].
References [69, 112] provide detailed overviews on HyTM designs and implementations. The software
component of the HyTM algorithms presented in this paper is inspired by progressive STM implementa-
tions [36, 39, 90] and is subject to the lower bounds for progressive STMs established in [23, 62, 64, 90].
8
Optimism for boosting concurrency
"The wickedness and the foolishness of no man can avail against the fond optimism of mankind."
James Branch Cabell, The Silver Stallion
8.1 Overview
In previous chapters, we were concerned with the inherent complexities of implementing TM. In this
chapter, we are concerned with using TM to derive concurrent implementations and raise a fundamental
question about the ability of the TM abstraction to transform a sequential implementation to a con-
current one. Specifically, does the optimistic nature of TM give it an inherent advantage in exploiting
concurrency that is lacking in pessimistic synchronization techniques like locking? To exploit concur-
rency, conventional lock-based synchronization pessimistically protects accesses to the shared memory
before executing them. Speculative synchronization, achieved using TMs, optimistically executes memory
operations with a risk of aborting them in the future. A programmer typically uses these synchroniza-
tion techniques as “wrappers” to allow every process (or thread) to locally run its sequential code while
ensuring that the resulting concurrent execution is globally correct.
Unfortunately, it is difficult for programmers to tell in advance which of the synchronization techniques
will establish more concurrency in their resulting programs. In this chapter, we analyze the “amount
of concurrency” one can obtain by turning a sequential program into a concurrent one. In particular,
we compare the use of optimistic and pessimistic synchronization techniques, whose prime examples are
TMs and locks respectively.
To fairly compare concurrency provided by implementations based on various techniques, one has (1) to
define what it means for a concurrent program to be correct regardless of the type of synchronization it
uses and (2) to define a metric of concurrency.
Correctness. We begin by defining a consistency criterion, namely locally-serializable linearizability.
We say that a concurrent implementation of a given sequential data type is locally serializable if it ensures
that the local execution of each operation is equivalent to some execution of its sequential implementation.
This condition is weaker than serializability since it does not require that there exists a single sequential
execution that is consistent with all local executions. It is however sufficient to guarantee that optimistic
executions do not observe an inconsistent transient state that could lead, for example, to a fatal error
like division-by-zero.
Furthermore, the implementation should “make sense” globally, given the sequential type of the data
structure we implement. The high-level history of every execution of a concurrent implementation must
be linearizable [27, 83] with respect to this sequential type. The combination of local serializability and
linearizability gives a correctness criterion that we call LS-linearizability, where LS stands for “locally
serializable”. We show that LS-linearizability is, as the original linearizability, compositional [81, 83]: a
composition of LS-linearizable implementations is also LS-linearizable.
We apply the criterion of LS-linearizability to two broad classes of pessimistic and optimistic synchro-
nization techniques. Pessimistic implementations capture what can be achieved using classic locks; in
contrast, optimistic implementations proceed speculatively and fail to return a response to the process
in the case of conflicts, e.g., relying on transactional memory.
Measuring concurrency. We characterize the amount of concurrency provided by an LS-linearizable
implementation as the set of schedules it accepts. To this end, we define a concurrency metric inspired
by the analysis of parallelism in database concurrency control [74, 126]. More specifically, we assume
an external scheduler that defines which processes execute which steps of the corresponding sequential
program in a dynamic and unpredictable fashion. This allows us to define concurrency provided by
an implementation as the set of schedules (interleavings of steps of concurrent sequential operations) it
accepts (is able to effectively process). Then, the more schedules the implementation would accept, the
more concurrent it would be.
We provide a framework to compare the concurrency one can get by choosing a particular synchronization
technique for a specific data type. For the first time, we analytically capture the inherent concurrency
provided by optimism-based and pessimism-based implementations. We illustrate
this using a popular sequential list-based set implementation [81], concurrent implementations of
which are our running examples. More precisely, we show that there exist TM-based implementations
that, for some workloads, allow for more concurrency than any pessimistic implementation, but we also
show that there exist pessimistic implementations that, for other workloads, allow for more concurrency
than any TM-based implementation.
Intuitively, an implementation based on transactions may abort an operation based on the way concurrent
steps are scheduled, while a pessimistic implementation has to proceed eagerly without knowing about
how future steps will be scheduled, sometimes over-conservatively rejecting a potentially acceptable
schedule. By contrast, pessimistic implementations designed to exploit the semantics of the data type can
supersede the “semantics-oblivious” TM-based implementations. More surprisingly, we demonstrate that
combining the benefit of pessimistic implementations, namely their semantics awareness, and the benefit
of TMs, namely their optimism, enables implementations that are strictly better-suited for exploiting
concurrency than any of them individually. We describe a generic optimistic implementation of the list-
based set that is optimal with respect to our concurrency metric: we show that, essentially, it accepts
all correct concurrent schedules.
Our results suggest that “relaxed” TM models that are designed with the semantics of the high-level
object in mind might be central to exploiting concurrency.
Roadmap of Chapter 8. In Section 8.2, we introduce the class of optimistic and pessimistic concurrent
implementations we consider in this chapter. Section 8.3 introduces the definition of locally serializable
linearizability and Section 8.4 is devoted to the concurrency analysis of optimistic and pessimistic syn-
chronization techniques in the context of the list-based set. We wrap up with concluding remarks in
Section 8.5.
8.2 Concurrent implementations
Objects and implementations. As with Chapter 2, we assume an asynchronous shared-memory
system in which a set of n > 1 processes p1, . . . , pn communicate by applying operations on shared
objects.
An object is an instance of an abstract data type which specifies a set of operations that provide the
only means to manipulate the object. Recall that an abstract data type τ is a tuple (Φ,Γ, Q, q0, δ) where
Φ is a set of operations, Γ is a set of responses, Q is a set of states, q0 ∈ Q is an initial state and
δ ⊆ Q × Φ × Q × Γ is a transition relation that determines, for each state and each operation, the set
of possible resulting states and produced responses. In this chapter, we consider only types that are
total, i.e., for every q ∈ Q, π ∈ Φ, there exist q′ ∈ Q and r ∈ Γ such that (q, π, q′, r) ∈ δ. We assume
that every type τ = (Φ,Γ, Q, q0, δ) is computable, i.e., there exists a Turing machine that, for each input
(q, π), q ∈ Q, π ∈ Φ, computes a pair (q′, r) such that (q, π, q′, r) ∈ δ.
For any type τ, each high-level object Oτ of this type has a sequential implementation IS. For each operation
π ∈ Φ, IS specifies a deterministic procedure that performs reads and writes on a collection of objects
X1, . . . , Xm that encode a state of Oτ, and returns a response r ∈ Γ.
Sequential list-based set. As a running example, we consider the sorted linked-list based implemen-
tation of the type set, commonly referred to as the list-based set [81]. Recall that the set type exports
operations insert(v), remove(v) and contains(v), with v ∈ Z. Formally, the set type is defined by the
tuple (Φ,Γ, Q, q0, δ) where:
Φ = {insert(v), remove(v), contains(v)}; v ∈ Z
Γ = {true, false}
Q is the set of all finite subsets of Z; q0 = ∅
δ is defined as follows:
(1): (q, contains(v), q, (v ∈ q))
(2): (q, insert(v), q ∪ {v}, (v ∉ q))
(3): (q, remove(v), q \ {v}, (v ∈ q))
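The transition relation δ of the set type can be sketched as a small Python function. This is only an illustration of the three rules above; the names `delta` and `State` and the string encoding of operations are assumptions made for this sketch, not notation from the text.

```python
from typing import FrozenSet, Tuple

State = FrozenSet[int]  # a state q is a finite subset of Z


def delta(q: State, op: str, v: int) -> Tuple[State, bool]:
    """Return (next state, response) for one operation of the set type."""
    if op == "contains":   # rule (1): state unchanged, response (v in q)
        return q, v in q
    if op == "insert":     # rule (2): add v, response (v not in q)
        return q | {v}, v not in q
    if op == "remove":     # rule (3): drop v, response (v in q)
        return q - {v}, v in q
    raise ValueError(f"unknown operation: {op}")


q0: State = frozenset()            # initial state: the empty set
q1, r1 = delta(q0, "insert", 3)    # first insert succeeds
q2, r2 = delta(q1, "insert", 3)    # duplicate insert fails
q3, r3 = delta(q2, "remove", 3)    # remove succeeds
```

Because the type is total, `delta` is defined for every state and every operation; it is also deterministic here, although the general definition allows a relation.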
We consider a sequential implementation LL (Algorithm 8.1) of the set type using a sorted linked list
where each element (or object) stores an integer value, val , and a pointer to its successor, next , so that
elements are sorted in the ascending order of their value.
Every operation invoked with a parameter v traverses the list starting from the head up to the element
storing value v′ ≥ v. If v′ = v, then contains(v) returns true, remove(v) unlinks the corresponding
element and returns true, and insert(v) returns false. Otherwise, contains(v) and remove(v) return false
while insert(v) adds a new element with value v to the list and returns true. The list-based set is denoted
by (LL, set).
Concurrent implementations. We tackle the problem of turning the sequential implementation IS
of type τ into a concurrent one, shared by n processes. The implementation provides the processes with
algorithms for the reads and writes on objects. We refer to the resulting implementation as a concurrent
implementation of (IS , τ). As in Chapter 2, we assume an asynchronous shared-memory system in which
the processes communicate by applying primitives on shared base objects [75]. We place no upper bounds
on the number of versions an object may maintain or on the size of this object.
Throughout this chapter, the term operation refers to some high-level operation of the type, while read-write
operations on objects are referred to simply as reads and writes.
An implemented read or write may abort by returning a special response ⊥. In this case, we say that
the corresponding high-level operation is aborted. The ⊥ event is treated both as the response event of
the read or write operation and as the response of the corresponding high-level operation.
Algorithm 8.1 Sequential implementation LL (sorted linked list) of set type
Shared variables:
    Initially head, tail,
    head.val = −∞, tail.val = +∞
    head.next = tail

insert(v):
    prev ← head // copy the address
    curr ← read(prev.next) // fetch the next element
    while (tval ← read(curr.val)) < v do
        prev ← curr
        curr ← read(curr.next) // fetch from memory
    end while
    if tval ≠ v then // tval is stored locally
        X ← new-node(v, prev.next) // v and address of curr
        write(prev.next, X) // next points to the new element
    end if
    Return (tval ≠ v)

remove(v):
    prev ← head // copy the address
    curr ← read(prev.next) // fetch next field
    while (tval ← read(curr.val)) < v do // val local copy
        prev ← curr
        curr ← read(curr.next)
    end while
    if tval = v then
        tnext ← read(curr.next) // fetch the node after curr
        write(prev.next, tnext) // delete the node
    end if
    Return (tval = v)

contains(v):
    curr ← head
    curr ← read(curr.next) // fetch the first element
    while (tval ← read(curr.val)) < v do
        curr ← read(curr.next)
    end while
    Return (tval = v)
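For readers who prefer executable code, the following is a minimal Python rendering of Algorithm 8.1. It is a sketch under assumptions: reads and writes are plain attribute accesses, and the class names `Node` and `LL`, the helper `_locate`, and the method `values` are introduced here for illustration only.

```python
import math


class Node:
    def __init__(self, val, nxt=None):
        self.val = val
        self.next = nxt


class LL:
    """Sequential sorted linked list with sentinel head (-inf) and tail (+inf)."""

    def __init__(self):
        self.tail = Node(math.inf)
        self.head = Node(-math.inf, self.tail)

    def _locate(self, v):
        # Traverse from head up to the first element storing a value >= v.
        prev, curr = self.head, self.head.next
        while curr.val < v:
            prev, curr = curr, curr.next
        return prev, curr

    def insert(self, v):
        prev, curr = self._locate(v)
        if curr.val != v:
            prev.next = Node(v, curr)  # link the new element before curr
            return True
        return False

    def remove(self, v):
        prev, curr = self._locate(v)
        if curr.val == v:
            prev.next = curr.next      # unlink curr
            return True
        return False

    def contains(self, v):
        _, curr = self._locate(v)
        return curr.val == v

    def values(self):
        # Hypothetical helper: list the stored (finite) values in order.
        out, curr = [], self.head.next
        while curr is not self.tail:
            out.append(curr.val)
            curr = curr.next
        return out
```

As in the pseudocode, all three operations share the same traversal and differ only in what they do once the element with value at least v is reached.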
Executions and histories. An execution of a concurrent implementation (of (IS , τ)) is a sequence
of invocations and responses of high-level operations of type τ , invocations and responses of read and
write operations, and primitives applied on base-objects. We assume that executions are well-formed:
no process invokes a new read or write (resp., a new high-level operation) before its previous read or write
(resp., high-level operation) returns, and no process takes steps outside the interval of its read or write operation.
Let α|pi denote the subsequence of an execution α restricted to the events of process pi. Executions α
and α′ are equivalent if for every process pi, α|pi = α′|pi. An operation π precedes another operation
π′ in an execution α, denoted π →α π′, if the response of π occurs before the invocation of π′. Two
operations are concurrent if neither precedes the other. An execution is sequential if it has no concurrent
operations. A sequential execution α is legal if for every object X, every read of X in α returns the
latest written value of X. An operation is complete in α if the invocation event is followed by a matching
(non-⊥) response or the operation is aborted; otherwise, it is incomplete in α. Execution α is complete if every operation
is complete in α.
The history exported by an execution α is the subsequence of α reduced to the invocations and responses
of operations, reads and writes, except for the reads and writes that return ⊥.
High-level histories and linearizability. A high-level history H˜ of an execution α is the subsequence
of α consisting of all invocations and responses of (high-level) operations.
Definition 8.1 (Linearizability). A complete high-level history H˜ is linearizable with respect to an object
type τ if there exists a sequential high-level history S equivalent to H˜ such that (1) →H˜⊆→S and (2) S
is consistent with the sequential specification of type τ .
Now a high-level history H˜ is linearizable if it can be completed (by adding matching responses to a
subset of incomplete operations in H˜ and removing the rest) to a linearizable high-level history [27, 83].
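For small histories of the set type, Definition 8.1 can be checked by brute force. The sketch below is not an algorithm from the text: it simply enumerates all orderings of the complete operations, keeps those that respect the real-time precedence order, and replays each against the sequential specification. The tuple encoding `(inv, resp, op, v, result)` with integer timestamps is an assumption made for this illustration.

```python
from itertools import permutations


def apply(state, op, v):
    """Sequential specification of the set type: (next state, response)."""
    if op == "insert":
        return state | {v}, v not in state
    if op == "remove":
        return state - {v}, v in state
    return state, v in state  # contains


def is_linearizable(history, initial=frozenset()):
    """history: list of (inv, resp, op, v, result) for complete operations."""
    n = len(history)
    for order in permutations(range(n)):
        pos = {idx: k for k, idx in enumerate(order)}
        # Condition (1): if a responds before b is invoked, a must precede b.
        if any(history[a][1] < history[b][0] and pos[a] > pos[b]
               for a in range(n) for b in range(n)):
            continue
        # Condition (2): the sequential history must match the specification.
        state, ok = initial, True
        for idx in order:
            _, _, op, v, result = history[idx]
            state, response = apply(state, op, v)
            if response != result:
                ok = False
                break
        if ok:
            return True
    return False


# Two overlapping inserts that both return true, followed by contains(1).
lost = [(0, 3, "insert", 1, True), (1, 4, "insert", 2, True),
        (5, 6, "contains", 1, False)]
fine = [(0, 3, "insert", 1, True), (1, 4, "insert", 2, True),
        (5, 6, "contains", 1, True)]
```

The search is exponential in the number of operations, so it is only meant for hand-sized examples.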
Obedient implementations. We only consider implementations that satisfy the following condition:
Let α be any complete sequential execution of a concurrent implementation I. Then in every execution
of I of the form α · ρ1 · · · ρk where each ρi (i = 1, . . . , k) is the complete execution of a read, every read
returns the value written by the last write that does not belong to an aborted operation.
Intuitively, this assumption restricts our scope to “obedient” implementations of reads and writes, where
no read value may depend on some future write. In particular, we filter out implementations in which the
complete execution of a high-level operation is performed within the first read or write of its sequential
implementation.
Pessimistic implementations. Informally, a concurrent implementation is pessimistic if the exported
history contains every read-write event that appears in the execution. More precisely, no execution of a
pessimistic implementation includes operations that returned ⊥.
For example, a class of pessimistic implementations are those based on locks. A lock provides shared
or exclusive access to an object X through synchronization primitives lockS(X) (shared mode), lock(X)
(exclusive mode), and unlock(X). When lockS(X) (resp. lock(X)) invoked by a process pi returns, we
say that pi holds a lock on X in shared (resp. exclusive) mode. A process releases the object it holds by
invoking unlock(X). If no process holds a shared or exclusive lock on X, then lock(X) eventually returns;
if no process holds an exclusive lock on X, then lockS(X) eventually returns; and if no process holds a
lock on X forever, then every lock(X) or lockS(X) eventually returns. Given a sequential implementation
of a data type, a corresponding lock-based concurrent one is derived by inserting the synchronization
primitives to provide read-write access to an object.
Optimistic implementations. In contrast with pessimistic ones, optimistic implementations may,
under certain conditions, abort an operation: some read or write may return ⊥, in which case the
corresponding operation also returns ⊥.
Popular classes of optimistic implementations are those based on “lazy synchronization” [71, 81] (with
the ability of returning ⊥ and re-invoking an operation) or transactional memory.
8.3 Locally serializable linearizability
We are now ready to define the correctness criterion that we impose on our concurrent implementations.
Let H be a history and let π be a high-level operation in H. Then H|π denotes the subsequence of H
consisting of the events of π, except for the last aborted read or write, if any. Let IS be a sequential
implementation of an object of type τ and let ΣIS be the set of histories of IS.
Definition 8.2 (LS-linearizability). A history H is locally serializable with respect to IS if for every
high-level operation π in H, there exists S ∈ ΣIS such that H|π = S|π. A history H is LS-linearizable
with respect to (IS , τ) (we also write H is (IS, τ)-LSL) if: (1) H is locally serializable with respect to IS
and (2) the corresponding high-level history H˜ is linearizable with respect to τ .
Observe that local serializability stipulates that every operation witnesses some sequential execution.
Two different operations (even when invoked by the same process) are not required to witness mutually
consistent sequential executions.
A concurrent implementation I is LS-linearizable with respect to (IS, τ) (we also write I is (IS , τ)-LSL)
if every history exported by I is (IS , τ)-LSL. Throughout this paper, when we refer to a concurrent
implementation of (IS , τ), we assume that it is LS-linearizable with respect to (IS , τ).
LS-linearizability is compositional. Just like linearizability, LS-linearizability is compositional [81, 83]:
a composition of LSL implementations is also LSL. We define the composition of two distinct object types
τ1 and τ2 as a type τ1 × τ2 = (Φ,Γ, Q, q0, δ) as follows: Φ = Φ1 ∪ Φ2, Γ = Γ1 ∪ Γ2, Q = Q1 × Q2,
q0 = (q01, q02), and δ ⊆ Q × Φ × Q × Γ is such that ((q1, q2), π, (q′1, q′2), r) ∈ δ if and only if for i ∈ {1, 2},
if π ∈ Φi then (qi, π, q′i, r) ∈ δi ∧ q3−i = q′3−i. (Here we treat each τi as a distinct type by adding
index i to all elements of Φi, Γi, and Qi.)
[Figure 8.1 (diagram omitted). Depicted events: insert(2) performs R(h), W(X1) and returns true; insert(5) performs R(h), R(X1), W(X4) and returns true; contains(5) performs R(h), R(X1), R(X3), R(X4), R(X5) and returns true.]
Figure 8.1: A concurrency scenario for a list-based set, initially {1, 3, 4}, where value i is stored at node Xi:
insert(2) and insert(5) can proceed concurrently with contains(5); the history is LS-linearizable but
not serializable. (We only depict important read-write events here.)
Every sequential implementation IS of an object O1×O2 of a composed type τ1×τ2 naturally induces two
sequential implementations IS1 and IS2 of objects O1 and O2, respectively. Now a correctness criterion
Ψ is compositional if for every history H on an object composition O1 × O2, if Ψ holds for H|Oi with
respect to ISi, for i ∈ {1, 2}, then Ψ holds for H with respect to IS = IS1× IS2. Here, H|Oi denotes the
subsequence of H consisting of events on Oi.
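The composed type τ1 × τ2 can be illustrated with a short sketch: an operation tagged with index i changes only the i-th component of the state. The function names `set_delta` and `compose`, and the choice of two set objects as components, are assumptions made for this example.

```python
def set_delta(q, op, v):
    """Transition function of the set type: (next state, response)."""
    if op == "insert":
        return q | {v}, v not in q
    if op == "remove":
        return q - {v}, v in q
    if op == "contains":
        return q, v in q
    raise ValueError(op)


def compose(delta1, delta2):
    """Transition function of the composed type; operations carry index i."""
    def delta(state, i, op, v):
        q1, q2 = state
        if i == 1:
            q1_next, r = delta1(q1, op, v)
            return (q1_next, q2), r   # component 2 is left unchanged
        if i == 2:
            q2_next, r = delta2(q2, op, v)
            return (q1, q2_next), r   # component 1 is left unchanged
        raise ValueError(f"unknown component: {i}")
    return delta


delta = compose(set_delta, set_delta)
state = (frozenset(), frozenset())           # q0 = (q0_1, q0_2)
state, r1 = delta(state, 1, "insert", 7)     # acts on O1 only
state, r2 = delta(state, 2, "contains", 7)   # O2 never saw the insert
```

The index tag plays the role of the footnoted convention that makes the operations of the two component types distinct.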
Theorem 8.1. LS-linearizability is compositional.
Proof. Let H be a history on O1 × O2 such that each H|Oi, i ∈ {1, 2}, is LS-linearizable with respect
to ISi. We show that H is LS-linearizable with respect to IS. Without loss of generality, we assume that H is complete (if H is
incomplete, we consider any completion of it containing LS-linearizable completions of H|O1 and H|O2).
Let H˜ be a completion of the high-level history corresponding to H such that H˜|O1 and H˜|O2 are
linearizable with respect to τ1 and τ2, respectively. Since linearizability is compositional [81, 83], H˜ is
linearizable with respect to τ1 × τ2.
Now let, for each operation π, S^1_π and S^2_π be any two sequential histories of IS1 and IS2 such that
H|π|Oj = S^j_π|π, for j ∈ {1, 2} (since H|O1 and H|O2 are LS-linearizable, such histories exist). We
construct a sequential history S_π by interleaving events of S^1_π and S^2_π so that S_π|Oj = S^j_π, j ∈ {1, 2}.
Since each S^j_π acts on a distinct component Oj of O1 × O2, every such S_π is a sequential history of IS.
We pick one S_π that respects the local history H|π, which is possible, since H|π is consistent with both
S^1_π|π and S^2_π|π.
Thus, for each π, we obtain a history of IS that agrees with H|π. Moreover, the high-level history of H
is linearizable. Thus, H is LS-linearizable with respect to IS .
LS-linearizability versus other criteria. LS-linearizability is a two-level consistency criterion which
makes it suitable to compare concurrent implementations of a sequential data structure, regardless of
synchronization techniques they use. It is quite distinct from related criteria designed for database and
software transactions, such as serializability [109, 125] and multilevel serializability [124, 125].
For example, serializability [109] prevents sequences of reads and writes from conflicting in a cyclic way,
establishing a global order of transactions. Reasoning only at the level of reads and writes may be
overly conservative: higher-level operations may commute even if their reads and writes conflict [123].
Consider an execution of a concurrent list-based set depicted in Figure 8.1. We assume here that the set
initial state is {1, 3, 4}. Operation contains(5) is concurrent, first with operation insert(2) and then with
operation insert(5). The history is not serializable: insert(5) sees the effect of insert(2) because R(X1) by
insert(5) returns the value of X1 that is updated by insert(2), and thus insert(5) should be serialized after
insert(2). However, contains(5) misses element 2 in the linked list, so it should be serialized before insert(2),
yet it must see the effect of insert(5) to perform the read of X5, i.e., the element created by insert(5).
Nevertheless, this history is LSL since each of the three local histories is consistent with some sequential history of LL.
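The cyclic dependency in this scenario can be made concrete with a conflict-graph check. Note that this is the classical conflict-serializability test, used here only as an illustration rather than the exact definition used in this chapter, and the particular interleaving in `events` is an assumed one that is consistent with the description of Figure 8.1.

```python
def conflict_edges(events):
    """Edge (a, b) if a's event precedes a conflicting event of b on the same object."""
    edges = set()
    for i, (op_a, kind_a, obj_a) in enumerate(events):
        for op_b, kind_b, obj_b in events[i + 1:]:
            if op_a != op_b and obj_a == obj_b and "W" in (kind_a, kind_b):
                edges.add((op_a, op_b))
    return edges


def has_cycle(edges):
    """Depth-first search for a directed cycle."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    visiting, done = set(), set()

    def dfs(u):
        visiting.add(u)
        for w in graph[u]:
            if w in visiting or (w not in done and dfs(w)):
                return True
        visiting.discard(u)
        done.add(u)
        return False

    return any(dfs(u) for u in list(graph) if u not in done)


# Assumed interleaving: (operation, R or W, object).
events = [
    ("contains(5)", "R", "h"), ("contains(5)", "R", "X1"),
    ("insert(2)", "R", "h"), ("insert(2)", "R", "X1"), ("insert(2)", "W", "X1"),
    ("insert(5)", "R", "h"), ("insert(5)", "R", "X1"),
    ("insert(5)", "R", "X3"), ("insert(5)", "R", "X4"), ("insert(5)", "W", "X4"),
    ("contains(5)", "R", "X3"), ("contains(5)", "R", "X4"), ("contains(5)", "R", "X5"),
]
edges = conflict_edges(events)
```

The three edges form the cycle contains(5), insert(2), insert(5), contains(5), which is why no single serial order can explain all three operations.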
Multilevel serializability [124, 125] was proposed to reason in terms of multiple semantic levels in the
same execution. LS-linearizability, being defined for two levels only, does not require a global seri-
alization of low-level operations as 2-level serializability does. LS-linearizability simply requires each
process to observe a local serialization, which can be different from one process to another. Also, to
make it more suitable for concurrency analysis of a concrete data structure, instead of semantic-based
commutativity [123], we use the sequential specification of the high-level behavior of the object [83].
Linearizability [27, 83] only accounts for high-level behavior of a data structure, so it does not imply LS-
linearizability. For example, Herlihy’s universal construction [75] provides a linearizable implementation
for any given object type, but does not guarantee that each execution locally appears sequential with
respect to any sequential implementation of the type. Local serializability, by itself, does not require any
synchronization between processes and can be trivially implemented without communication among the
processes. Therefore, the two parts of LS-linearizability indeed complement each other.
8.4 Pessimistic vs. optimistic synchronization
In this section, we compare the relative abilities of optimistic and pessimistic synchronization techniques
to exploit concurrency in the context of the list-based set.
To characterize the ability of a concurrent implementation to process arbitrary interleavings of sequential
code, we introduce the notion of a schedule. Intuitively, a schedule describes the order in which complete
high-level operations, and sequential reads and writes are invoked by the user. More precisely, a schedule
is an equivalence class of complete histories that agree on the order of invocation and response events of
reads, writes and high-level operations, but not necessarily on read values or high-level responses. Thus,
a schedule can be treated as a history, where responses of reads and operations are not specified.
We say that an implementation I accepts a schedule σ if it exports a history H such that complete(H)
exhibits the order of σ, where complete(H) is the subsequence of H that consists of the events of the
complete operations that returned a matching response. We then say that the execution (or history)
exports σ. A schedule σ is (IS , τ)-LSL if there exists an (IS , τ)-LSL history that exports σ.
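The relation between histories and schedules can be sketched as follows. The event encoding `(operation, "inv" or "resp", what, value)`, where `what` is `"op"` for high-level events and a read/write label otherwise, is purely an assumption of this example, as are the names `complete` and `schedule_of`.

```python
ABORT = "⊥"  # special response of an aborted read, write or operation


def complete(history):
    """Keep only events of operations that returned a matching (non-abort) response."""
    done = {op for (op, kind, what, value) in history
            if kind == "resp" and what == "op" and value != ABORT}
    return [event for event in history if event[0] in done]


def schedule_of(history):
    """Forget response values; only the order of invocations and responses remains."""
    return [(op, kind, what) for (op, kind, what, _value) in complete(history)]


h1 = [("insert(1)", "inv", "op", None),
      ("insert(1)", "inv", "R(h)", None),
      ("insert(1)", "resp", "R(h)", "X1"),
      ("insert(1)", "resp", "op", False)]
# Same order of events, different read value and high-level response.
h2 = [("insert(1)", "inv", "op", None),
      ("insert(1)", "inv", "R(h)", None),
      ("insert(1)", "resp", "R(h)", "X3"),
      ("insert(1)", "resp", "op", True)]
# h1 followed by an operation that aborts.
h3 = h1 + [("insert(2)", "inv", "op", None),
           ("insert(2)", "resp", "op", ABORT)]
```

Histories `h1` and `h2` disagree on values but export the same schedule, and the aborted operation in `h3` does not contribute to the exported schedule.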
A synchronization technique is a set of concurrent implementations. We define a specific optimistic
synchronization technique and then a specific pessimistic one.
The class SM. Formally, SM denotes the set of optimistic, safe-strict serializable LSL implementa-
tions.
Let α denote the execution of a concurrent implementation and ops(α), the set of operations each of
which performs at least one event in α. Let αk denote the prefix of α up to the last event of operation
πk. Let Cseq(α) denote the set of subsequences of α that consist of all the events of operations that
are complete in α. We say that α is strictly serializable if there exists a legal sequential execution α′
equivalent to a sequence σ ∈ Cseq(α) such that →σ ⊆ →α′.
We focus on optimistic implementations that are strictly serializable and, in addition, guarantee that
every operation (even aborted or incomplete) observes correct (serial) behavior. More precisely, an
execution α is safe-strict serializable if (1) α is strictly serializable, and (2) for each operation πk, there
exists a legal sequential execution α′ = π0 · · · πi · πk and σ ∈ Cseq(αk) such that {π0, . . . , πi} ⊆ ops(σ)
and ∀πm ∈ ops(α′) : α′|m = αk|m.
Similar to other relaxations of opacity [64] like TMS1 [43] and VWC [85], safe-strict serializable imple-
mentations (SM) require that every transaction (even aborted and incomplete) observes “correct” serial
behavior. Safe-strict serializability captures nicely both local serializability and linearizability. If we
transform a sequential implementation IS of a type τ into a safe-strict serializable concurrent one, we
obtain an LSL implementation of (IS , τ). Thus, the following lemma is immediate.
Lemma 8.2. Let I be a safe-strict serializable implementation of (IS, τ). Then, I is LS-linearizable with
respect to (IS, τ).
[Figure 8.2 (diagram omitted). (a) σ, initial state {1, 2, 3}: insert(1) performs R(h), R(X1) and returns false; insert(2) performs R(h), R(X1), R(X2) and returns false. (b) σ′, initial state {3}: insert(1) and insert(2) each perform R(h), R(X1), W(h) and return true.]
Figure 8.2: (a) a history exporting schedule σ, with initial state {1, 2, 3}, accepted by ILP ∈ SM; (b) a history
exporting a problematic schedule σ′, with initial state {3}, which should be accepted by any I ∈ P
if it accepts σ
Indeed, by running each operation of IS within a transaction of a safe-strict serializable TM, we make
sure that completed operations witness the same execution of IS , and every operation that returned ⊥
is consistent with some execution of IS based on previously completed operations.
The class P. This denotes the set of deadlock-free pessimistic LSL implementations: assuming that every
process takes enough steps, at least one of the concurrent operations returns a matching response [82].
Note that P includes implementations that are not necessarily safe-strict serializable.
8.4.1 Concurrency analysis
We now provide a concurrency analysis of synchronization techniques SM and P in the context of the
list-based set.
A pessimistic implementation IH ∈ P of (LL, set). We describe a pessimistic implementation of
(LL, set), IH ∈ P, that accepts non-serializable schedules: each read operation performed by contains
acquires the shared lock on the object, reads the next field of the element before releasing the shared
lock on the predecessor element in a hand-over-hand manner [29]. Update operations (insert and remove)
acquire the exclusive lock on the head during read(head) and release it at the end. Every other read
operation performed by update operations simply reads the element next field to traverse the list. The
write operation performed by an insert or a remove acquires the exclusive lock, writes the value to the
element and releases the lock. There is no real concurrency between any two update operations since
the process holds the exclusive lock on the head throughout the operation execution. Thus:
Lemma 8.3. IH is a deadlock-free LSL implementation of (LL, set).
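A rough Python sketch of the locking discipline of IH follows. It makes two simplifying assumptions that should be kept in mind: the Python standard library offers no shared/exclusive lock, so a reentrant exclusive lock (`threading.RLock`) stands in for both lockS(X) and lock(X), and the class and method names are invented for this illustration. The sketch therefore allows less concurrency among contains operations than IH itself.

```python
import math
import threading


class Node:
    def __init__(self, val, nxt=None):
        self.val = val
        self.next = nxt
        self.lock = threading.RLock()  # stand-in for lockS/lock/unlock


class HandOverHandSet:
    def __init__(self):
        self.tail = Node(math.inf)
        self.head = Node(-math.inf, self.tail)

    def contains(self, v):
        # Hand-over-hand: lock the next element before releasing its predecessor.
        prev = self.head
        prev.lock.acquire()
        curr = prev.next
        while True:
            curr.lock.acquire()
            prev.lock.release()
            if curr.val >= v:
                break
            prev, curr = curr, curr.next
        found = curr.val == v
        curr.lock.release()
        return found

    def _update(self, v, is_insert):
        with self.head.lock:           # held throughout the update operation
            prev, curr = self.head, self.head.next
            while curr.val < v:        # plain reads while traversing
                prev, curr = curr, curr.next
            present = curr.val == v
            if is_insert and not present:
                with prev.lock:        # lock only the element being written
                    prev.next = Node(v, curr)
                return True
            if not is_insert and present:
                with prev.lock:
                    prev.next = curr.next
                return True
            return False

    def insert(self, v):
        return self._update(v, True)

    def remove(self, v):
        return self._update(v, False)
```

Because every update holds the lock on the head for its whole duration, two updates never overlap, which mirrors the remark that IH offers no real concurrency between update operations.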
On the one hand, the schedule of (LL, set) depicted in Figure 8.1, which we denote by σ0, is not serializable
and must be rejected by any implementation in SM. However, there exists an execution of IH that
exports σ0 since there is no read-write conflict on any two consecutive elements accessed.
On the other hand, consider the schedule σ of (LL, set) in Figure 8.2(a). Clearly, σ is serializable and is
accepted by implementations based on most progressive TMs since there is no read-write conflict. For ex-
ample, let ILP denote an implementation of (IS , τ) based on the progressive opaque TM implementation
LP in Algorithm 4.1 (Chapter 4). Then, ILP ∈ SM and the schedule σ is accepted by ILP . However,
we prove that σ is not accepted by any implementation in P. Our proof technique is interesting in its
own right: we show that if there exists any implementation in P that accepts σ, it must also accept the
schedule σ′ depicted in Figure 8.2(b). In σ′, insert(2) overwrites the write on head performed by insert(1)
resulting in a lost update. By deadlock-freedom, there exists an extension of σ′ in which a contains(1)
returns false; but this is not a linearizable schedule.
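The lost update in σ′ can be replayed with a small simulation. The sketch runs two copies of the sequential insert of Algorithm 8.1 in lockstep on a list that initially stores only 3, with no synchronization at all; the generator-based stepping, and the names `insert_steps`, `run_lockstep` and `values`, are devices assumed for this illustration.

```python
import math


class Node:
    def __init__(self, val, nxt=None):
        self.val, self.next = val, nxt


def insert_steps(head, v):
    """Sequential insert as a generator: it pauses after every read or write."""
    prev = head
    curr = prev.next
    yield "R(prev.next)"
    tval = curr.val
    yield "R(curr.val)"
    while tval < v:
        prev = curr
        curr = curr.next
        yield "R(curr.next)"
        tval = curr.val
        yield "R(curr.val)"
    if tval != v:
        prev.next = Node(v, curr)
        yield "W(prev.next)"
    return tval != v


def run_lockstep(gens):
    """Advance each unfinished operation by one step per round; collect responses."""
    results = [None] * len(gens)
    live = list(range(len(gens)))
    while live:
        for i in list(live):
            try:
                next(gens[i])
            except StopIteration as stop:
                results[i] = stop.value
                live.remove(i)
    return results


def values(head):
    out, curr = [], head.next
    while curr.val != math.inf:
        out.append(curr.val)
        curr = curr.next
    return out


head = Node(-math.inf, Node(3, Node(math.inf)))   # initial state {3}
results = run_lockstep([insert_steps(head, 1), insert_steps(head, 2)])
final = values(head)
```

Both inserts report success, yet the final list holds only 2 and 3: the write by insert(2) overwrote the link installed by insert(1), so a later contains(1) would return false.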
Theorem 8.4. There exists a schedule σ0 of (LL, set) that is accepted by an implementation in P, but
not accepted by any implementation I ∈ SM.
Proof. Let σ0 be the schedule of (LL, set) depicted in Figure 8.1. Suppose by contradiction that σ0 is accepted
by I, where I is an implementation of (LL, set) based on any safe-strict serializable TM. Thus, there
exists an execution α of I that exports σ0. Now consider two cases: (1) Suppose that the read of X4 by
contains(5) returns the value of X4 that is updated by insert(5). Since insert(2) →α insert(5), insert(2)
must precede insert(5) in any sequential execution α′ equivalent to α. Also, since contains(5) reads X1
prior to its update by insert(2), contains(5) must precede insert(2) in α′. But then the read of X4 is not
legal in α′—a contradiction since α must be serializable. (2) Suppose that contains(5) reads the initial
value of X4, i.e., its value prior to the write to X4 by insert(5), where X4.next points to the tail of the list
(according to our sequential implementation LL). But then, according to LL, contains(5) cannot access
X5 in σ0—a contradiction.
Consider the pessimistic implementation IH ∈ P: since the contains operation traverses the list using
shared hand-over-hand locking, the process pi executing contains(5) can release the lock on element X1
prior to the acquisition of the exclusive lock on X1 by insert(2). Similarly, pi can acquire the shared lock
on X4 immediately after the release of the exclusive lock on X4 by the process executing insert(5) while
still holding the shared lock on element X3. Thus, there exists an execution of IH that exports σ0.
Theorem 8.5. There exists a schedule σ of (LL, set) that is accepted by an implementation in SM, but
not accepted by any implementation in P.
Proof. We show first that the schedule σ of (LL, set) depicted in Figure 8.2(a) is not accepted by any
implementation in P. Suppose the contrary and let σ be exported by an execution α. Here α starts with
three sequential insert operations with parameters 1, 2, and 3. The resulting “state” of the set is {1, 2, 3},
where value i ∈ {1, 2, 3} is stored in object Xi.
Suppose, by contradiction, that some I ∈ P accepts σ. We show that I then accepts the schedule σ′
depicted in Figure 8.2(b), which starts with a sequential execution of insert(3) storing value 3 in object
X1.
Let α′ be any history of I that exports σ′. Recall that we only consider obedient implementations: in
α′, the read of head by insert(2) in σ′ refers to X1 (the next element to be read by insert(2)). In α,
element X1 stores value 1, i.e., insert(1) can safely return false, while in σ′, X1 stores value 3, i.e., the
next step of insert(1) must be a write to head. Thus, no process can distinguish α and α′ before the
read operations on X1 return. Let α′′ be the prefix of α′ ending with R(X1) executed by insert(2).
Since I is deadlock-free, we have an extension of α′′ in which both insert(1) and insert(2) terminate; we
show that this extension violates linearizability. Since I is locally serializable, to respect our sequential
implementation of (LL, set), both operations should complete the write to head before returning. Let
π1 = insert(1) be the first operation to write to head in this extended execution. Let π2 = insert(2) be
the other insert operation. It is clear that π1 returns true even though π2 overwrites the update of π1 on
head and also returns true. Recall that implementations in P are deadlock-free. Thus, we can further
extend the execution with a complete contains(1) that will return false (the element inserted to the list
by π1 is lost), a contradiction since I is linearizable with respect to set. Thus, σ ∉ S(I) for any I ∈ P.
On the other hand, the schedule σ is accepted by ILP ∈ SM, since there is no conflict between the two
concurrent update operations.
8.4.2 Concurrency optimality
We now combine the benefits of semantics awareness of implementations in P and the optimism of SM
to derive a generic optimistic implementation of the list-based set that supersedes every implementation
in classes P and SM in terms of concurrency. Our implementation, denoted IRM, provides processes
with algorithms for implementing read and write operations on the elements of the list for each operation
of the list-based set (Algorithm 8.2).
Every object (or element) Xℓ is specified by the following shared variables: t-var[ℓ] stores the value
v ∈ V of Xℓ, r[ℓ] stores a boolean indicating if Xℓ is marked for deletion, L[ℓ] stores a tuple of the
version number of Xℓ and a locked flag; the latter indicates whether a concurrent process is performing
a write to Xℓ.
Algorithm 8.2 Code for process pk implementing reads and writes in implementation IRM
1: Shared variables:
2:   for each object Xℓ:
3:     t-var[ℓ], initially 0
4:     r[ℓ], initially false
5:     L[ℓ] ∈ N × {true, false}, supports read,
6:       write, cas operations, initially 〈0, false〉
7: Local variables of process pk:
8:   rbufk[i] ⊂ X × N; i ∈ {1, 2}, cyclic buffer of size 2,
9:     initially ∅
10: readk(Xℓ) executed by insert, remove, contains:
11:   〈ver1, ∗〉 ← L[ℓ].read() // get versioned lock
12:   val ← t-var[ℓ].read() // get value
13:   r ← r[ℓ].read()
14:   〈ver2, ∗〉 ← L[ℓ].read() // reget versioned lock
15:   if (ver1 ≠ ver2) ∨ r then
16:     Return ⊥
17:   rbufk.add(〈Xℓ, ver1〉) // override penultimate entry
18:   Return val
19: writek(Xℓ, v) executed by remove:
20:   let oldverℓ be such that 〈Xℓ, oldverℓ〉 ∈ rbufk
21:   ver ← oldverℓ
22:   if ¬L[ℓ].cas(〈ver, false〉, 〈ver, true〉) then
23:     Return ⊥ // grab lock or abort
24:   let Xℓ′ ≠ Xℓ be such that 〈Xℓ′, verℓ′〉 ∈ rbufk
25:   if ¬L[ℓ′].cas(〈verℓ′, false〉, 〈verℓ′, true〉) then
26:     Return ⊥ // grab lock or abort
27:   r[ℓ′].write(true) // mark element for deletion
28:   t-var[ℓ].write(v) // update memory
29:   L[ℓ].write(〈ver + 1, false〉) // release locks
30:   L[ℓ′].write(〈verℓ′ + 1, false〉)
31:   Return ok
32: writek(Xℓ, v) executed by insert:
33:   let oldverℓ be such that 〈Xℓ, oldverℓ〉 ∈ rbufk
34:   ver ← oldverℓ
35:   if ¬L[ℓ].cas(〈ver, false〉, 〈ver, true〉) then
36:     Return ⊥ // grab lock or abort
37:   t-var[ℓ].write(v) // update memory
38:   L[ℓ].write(〈ver + 1, false〉) // release locks
39:   Return ok
Any operation with input parameter v traverses the list starting from the head element up to the element
storing value v′ ≥ v without writing to shared memory. If a read operation on an element conflicts with
a write operation to the same element or if the element is marked for deletion, the operation terminates
by returning ⊥. While traversing the list, the process maintains the last two read elements and their
version numbers in the local rotating buffer rbuf . If none of the read operations performed by contains(v)
return ⊥ and if v′ = v, then contains(v) returns true; otherwise it returns false. Thus, the contains does
not write to shared memory.
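The read path and the read-only contains traversal described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation from the thesis: the names VersionedLock and Element, the choice to keep the pair (value, next) in a single tvar field, and the use of None for ⊥ are all hypothetical, and a threading.Lock merely simulates the atomicity of a base-object access.

```python
import math
import threading

ABORT = None  # stands in for the abort response ⊥ (assumption of this sketch)


class VersionedLock:
    """Hypothetical stand-in for L[l]: a (version, locked) pair."""

    def __init__(self):
        self._state = (0, False)
        self._guard = threading.Lock()  # only simulates an atomic base-object access

    def read(self):
        with self._guard:
            return self._state


class Element:
    """Hypothetical list element X_l with its three shared variables."""

    def __init__(self, value, nxt=None):
        self.tvar = (value, nxt)     # t-var[l]: assumed to hold value and next pointer
        self.removed = False         # r[l]: marked-for-deletion flag
        self.lock = VersionedLock()  # L[l]


def read(elem, rbuf):
    """Mirror of lines 11-18: read the version before and after the value."""
    ver1, _ = elem.lock.read()
    val = elem.tvar
    removed = elem.removed
    ver2, _ = elem.lock.read()
    if ver1 != ver2 or removed:
        return ABORT
    rbuf.append((elem, ver1))
    del rbuf[:-2]  # keep only the last two entries, like the cyclic buffer
    return val


def contains(head, v):
    """Traverse from head to the first element storing a value >= v."""
    rbuf = []
    elem = head
    while True:
        res = read(elem, rbuf)
        if res is ABORT:
            return ABORT
        value, nxt = res
        if value >= v:
            return value == v
        elem = nxt


# Build the list {3} by hand: head(-inf) -> x3 -> tail(+inf).
tail = Element(math.inf)
x3 = Element(3, tail)
head = Element(-math.inf, x3)

found_3 = contains(head, 3)
found_2 = contains(head, 2)
x3.removed = True               # simulate a remove that marked x3 for deletion
after_mark = contains(head, 3)  # the read of x3 now returns ABORT
```

The traversal never calls a write or cas on shared state, which is the property the paragraph above relies on.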
To perform a write operation to an element as part of an update operation (insert and remove), the process
first retrieves the version of the object that belongs to its rotating buffer. It returns ⊥ if the version has
been changed since the previous read of the element or if a concurrent process is executing a write to
the same element. Note that, technically, ⊥ is returned only if prev.next ↛ curr. If prev.next → curr,
then we attempt to lock the element with the current version and return ⊥ if there is a concurrent
process executing a write to the same element. But we avoid expanding on this step in our algorithm
pseudocode. The write operation performed by the remove operation additionally checks if the element
to be removed from the list is locked by another process; if not, it sets a flag on the element to mark it for
deletion. If none of the read or write operations performed during the insert(v) or remove(v) returned ⊥,
appropriate matching responses are returned as prescribed by the sequential implementation LL. Any
update operation of IRM uses at most two expensive synchronization patterns [17].
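As a rough illustration of the write path for insert, the following Python sketch locks an element at the version remembered in the rotating buffer, updates it, and releases the lock with an incremented version. It is a simulation under assumptions rather than the thesis's code: VersionedLock, Element, write_insert and the (value, next) layout of tvar are hypothetical names, None stands for ⊥, and a threading.Lock only emulates the atomicity of read, write and cas. The write performed by remove, which locks and marks a second element, is omitted.

```python
import math
import threading

ABORT = None  # stands in for ⊥ (assumption of this sketch)


class VersionedLock:
    """Hypothetical stand-in for L[l]: (version, locked) with an atomic cas."""

    def __init__(self):
        self._state = (0, False)
        self._guard = threading.Lock()  # simulates atomicity of read/write/cas

    def read(self):
        with self._guard:
            return self._state

    def write(self, new):
        with self._guard:
            self._state = new

    def cas(self, expected, new):
        with self._guard:
            if self._state != expected:
                return False
            self._state = new
            return True


class Element:
    def __init__(self, value, nxt=None):
        self.tvar = (value, nxt)     # assumed layout of t-var[l]
        self.removed = False         # r[l]
        self.lock = VersionedLock()  # L[l]


def read(elem, rbuf):
    ver1, _ = elem.lock.read()
    val, removed = elem.tvar, elem.removed
    ver2, _ = elem.lock.read()
    if ver1 != ver2 or removed:
        return ABORT
    rbuf.append((elem, ver1))
    del rbuf[:-2]  # remember only the last two reads
    return val


def write_insert(elem, new_tvar, rbuf):
    """Mirror of lines 33-39: lock at the remembered version, write, release."""
    ver = next(ver for e, ver in rbuf if e is elem)
    if not elem.lock.cas((ver, False), (ver, True)):
        return ABORT                   # version changed or element is locked
    elem.tvar = new_tvar
    elem.lock.write((ver + 1, False))  # release and bump the version
    return "ok"


def insert(head, v):
    rbuf, elem = [], head
    while True:
        res = read(elem, rbuf)
        if res is ABORT:
            return ABORT
        value, nxt = res
        if value >= v:
            break
        prev, prev_value = elem, value
        elem = nxt
    if value == v:
        return False                   # v is already present
    new_elem = Element(v, elem)        # create the element first, then link it
    if write_insert(prev, (prev_value, new_elem), rbuf) is ABORT:
        return ABORT
    return True


tail = Element(math.inf)
head = Element(-math.inf, tail)
first = insert(head, 3)        # succeeds; head's version becomes 1
duplicate = insert(head, 3)    # value already present

stale_rbuf = []
read(head, stale_rbuf)         # some operation reads head at version 1 ...
interleaved = insert(head, 1)  # ... then another insert updates head
stale = write_insert(head, (-math.inf, Element(2, tail)), stale_rbuf)
```

The last three lines replay, sequentially, the interleaving that would otherwise cause a lost update: the write based on the stale version fails its cas and returns ⊥ instead of overwriting the update of insert(1).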
Proof of LS-linearizability. Let α be an execution of IRM and <α denote the total-order on events in
α. For simplicity, we assume that α starts with an artificial sequential execution of an insert operation
π0 that inserts tail and sets head.next = tail. Let H be the history exported by α, where all reads and
writes are sequential. We construct H by associating a linearization point ℓ_op with each non-aborted
read or write operation op performed in α as follows:
• if op is a read performed by process pk, then ℓ_op is the base-object read in line 12;
• if op is a write within an insert operation, ℓ_op is the base-object cas in line 35;
• if op is a write within a remove operation, ℓ_op is the base-object cas in line 22.
We say that a read of an element X within an operation π is valid in H (we also say that X is valid)
if there does not exist any remove operation π1 that deallocates X (removes X from the list) such that
ℓ_π1.write(X) <α ℓ_π.read(X).
Lemma 8.6. Let π be any operation performing read(X) followed by read(Y) in H. Then (1) there
exists an insert operation that sets X.next = Y prior to π.read(X), and (2) π.read(X) and π.read(Y) are
valid in H.
Proof. Let π be any operation in IRM that performs read(X) followed by read(Y). If X and Y are head
and tail respectively, head.next = tail (by assumption). Since no remove operation deallocates the head
or tail, the reads of X and Y are valid in H.
Now, let X be the head element and suppose that π performs read(X) followed by read(Y), Y ≠ tail,
in H. Clearly, if π performs a read(Y), there exists an operation π′ = insert that has previously set
head.next = Y. More specifically, π.read(X) performs the action in line 12 after the write to shared
memory by π′ in line 37. By the assignment of linearization points to tx-operations, ℓ_π′ <α ℓ_π.read(X).
Thus, there exists an insert operation that sets X.next = Y prior to π.read(X) in H.
For the second claim, we need to prove that the read(Y) by π is valid in H. Suppose by contradiction
that Y has been deallocated by some π′′ = remove operation prior to read(Y) by π. By the rules for
linearization of read and write operations, the action in line 28 precedes the action in line 12. However,
π proceeds to perform the check in line 15 and returns ⊥ since the flag corresponding to the element Y
is previously set by π′′. Thus, H does not contain π.read(Y), a contradiction.
Inductively, by the above arguments, every non-head read by π is performed on an element previously
created by an insert operation and is valid in H.
Lemma 8.7. H is locally serializable with respect to LL.
Proof. By Lemma 8.6, every element X read within an operation π is previously created by an insert
operation and is valid in H. Moreover, if the read operation on X returns v′, then X.next stores a pointer
to another valid element that stores an integer value v′′ > v′. Note that the series of reads performed
by π terminates as soon as an element storing value v or higher is found. Thus, π performs at most
O(|v − v0|) reads, where v0 is the value of the second element read by π. Now we construct Sπ as a
sequence of insert operations that insert the values read by π, one by one, followed by π. By construction,
Sπ ∈ ΣLL.
It is sufficient for us to prove that every finite high-level history H of IRM is linearizable. First, we
obtain a completion H̃ of H as follows. The invocation of an incomplete contains operation is discarded.
The invocation of an incomplete π = insert ∨ remove operation that has not returned successfully from
the write operation is discarded; otherwise, it is completed with response true.
We obtain a sequential high-level history S̃ equivalent to H̃ by associating a linearization point ℓ_π with
each operation π as follows. For each π = insert ∨ remove that returns true in H̃, ℓ_π is associated with
the first write performed by π in H; otherwise ℓ_π is associated with the last read performed by π in H.
For π = contains that returns true, ℓ_π is associated with the last read performed in IRM; otherwise ℓ_π is
associated with the read of head. Since linearization points are chosen within the intervals of operations
of IRM, for any two operations πi and πj in H̃, if πi →H̃ πj, then πi →S̃ πj.
Lemma 8.8. S̃ is consistent with the sequential specification of type set.
Proof. Let S̃k be the prefix of S̃ consisting of the first k complete operations. We associate each S̃k
with a set qk of objects that were successfully inserted and not subsequently successfully removed in S̃k.
We show by induction on k that the sequence of state transitions in S̃k is consistent with operations’
responses in S̃k with respect to the set type.
The base case k = 1 is trivial: the tail element containing +∞ is successfully inserted. Suppose that S̃k
is consistent with the set type and let π1 with argument v ∈ Z and response rπ1 be the last operation of
S̃k+1. We want to show that (qk, π1, qk+1, rπ1) is consistent with the set type.
(1) If π1 = insert(v) returns true in S̃k+1, there does not exist any other π2 = insert(v) that returns true
in S̃k+1 such that there does not exist any remove(v) that returns true; π2 →S̃k+1 remove(v) →S̃k+1
π1. Suppose by contradiction that such a π1 and π2 exist. Every successful insert(v) operation
performs its penultimate read on an element X that stores a value v′ < v and the last read is
performed on an element that stores a value v′′ > v. Clearly, π1 also performs a write on X. By
construction of S̃, π1 is linearized at the release of the cas lock on element X. Observe that π2
must also perform a write to the element X (otherwise one of π1 or π2 would return false). By
assumption, the write to X in shared-memory by π2 (line 37) precedes the corresponding write
to X in shared-memory by π1. If ℓ_π2 <α ℓ_π1.read(X), then π1 cannot return true, a contradiction.
Otherwise, if ℓ_π1.read(X) <α ℓ_π2, then π1 reaches line 35 and returns ⊥. This is because either π1
attempts to acquire the cas lock on X while it is still held by π2 or the value of X contained in
the rbuf of the process executing π1 has changed, a contradiction.
If π1 = insert(v) returns false in S̃k+1, there exists a π2 = insert(v) that returns true in S̃k+1 such
that there does not exist any π3 = remove(v) that returns true; π2 →S̃k+1 π3 →S̃k+1 π1. Suppose
that such a π2 does not exist. Thus, π1 must perform its last read on an element that stores value
v′′ > v, perform the action in line 37 and return true, a contradiction.
It is easy to verify that the conjunction of the above two claims proves that ∀q ∈ Q; ∀v ∈ Z, S̃k+1
satisfies (q, insert(v), q ∪ {v}, (v ∉ q)).
(2) If π1 = remove(v), similar arguments as applied to insert(v) prove that ∀q ∈ Q; ∀v ∈ Z, S̃k+1
satisfies (q, remove(v), q \ {v}, (v ∈ q)).
(3) If π1 = contains(v) returns true in S̃k+1, there exists π2 = insert(v) that returns true in S̃k+1
such that there does not exist any remove(v) that returns true in S̃k+1 such that π2 →S̃k+1
remove(v) →S̃k+1 π1. The proof of this claim immediately follows from Lemma 8.6.
Now, if π1 = contains(v) returns false in S̃k+1, there does not exist a π2 = insert(v) that returns
true such that there does not exist any remove(v) that returns true; π2 →S̃k+1 remove(v) →S̃k+1
contains(v). Suppose by contradiction that such a π1 and π2 exist. Thus, the action in line 37 by the
insert(v) operation that updates some element, say X, precedes the action in line 12 by contains(v)
that is associated with its first read (the head). We claim that contains(v) must read the element
X′ newly created by insert(v) and return true, a contradiction to the initial assumption that it
returns false. The only case when this can happen is if there exists a remove operation that forces
X′ to be unreachable from head, i.e., concurrent to the write to X by insert, there exists a remove
that sets X′′.next to X.next after the action in line 35 by insert. But this is not possible since the
cas on X performed by the remove would return false.
Thus, inductively, the sequence of state transitions in S˜ satisfies the sequential specification of the set
type.
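The transitions (q, insert(v), q ∪ {v}, (v ∉ q)), (q, remove(v), q \ {v}, (v ∈ q)) and the corresponding rule for contains can be replayed mechanically. The following Python sketch checks a sequential history against the set type; the function name check_set_history and the encoding of a history as (operation, value, response) triples are assumptions made for illustration only.

```python
def check_set_history(history):
    """Replay a sequential history against the sequential specification of set.

    history is a list of (operation, value, response) triples. The function
    returns the index of the first response that is inconsistent with the
    set type, or None if every response is consistent.
    """
    q = set()  # the state q_k reached after the first k operations
    for k, (op, v, response) in enumerate(history):
        if op == "insert":
            expected, q = (v not in q), q | {v}
        elif op == "remove":
            expected, q = (v in q), q - {v}
        elif op == "contains":
            expected = v in q
        else:
            raise ValueError(f"unknown operation: {op}")
        if response != expected:
            return k
    return None


legal = [
    ("insert", 1, True),
    ("insert", 1, False),
    ("contains", 1, True),
    ("remove", 1, True),
    ("contains", 1, False),
]
# A history with a lost update: insert(1) reported success, yet 1 is missing.
lost_update = [
    ("insert", 1, True),
    ("insert", 2, True),
    ("contains", 1, False),
]

legal_result = check_set_history(legal)
lost_result = check_set_history(lost_update)
```

Here legal_result is None, while lost_result points at the contains(1) whose response cannot be explained by any state of the set.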
Lemmas 8.7 and 8.8 imply:
Theorem 8.9. IRM is LS-linearizable with respect to (LL, set).
Proof of concurrency optimality. Now we show that IRM supersedes, in terms of concurrency, any
implementation in classes P or SM. The proof is based on a more general optimality result, interesting in
its own right: any finite schedule rejected by IRM is not observably LS-linearizable (or simply observable).
We show that any finite schedule rejected by our algorithm is not observably correct.
A correct schedule σ is observably correct if by completing update operations in σ and extending, for any
v ∈ Z, the resulting schedule with a complete sequential execution contains(v), applied to the resulting
contents of the list, we obtain a correct schedule. Here the contents of the list after a given correct
schedule is determined based on the order of its write operations. For each element, we define the
resulting state of its next field based on the last write in the schedule. Since in a correct schedule,
each new element is first created and then linked to the list, we can reconstruct the state of the list by
iteratively traversing it, starting from head .
Intuitively, a schedule is observably correct if it incurs no “lost updates”. Consider, for example, a schedule
(cf. Figure 8.2(b)) in which two operations, insert(1) and insert(2), are applied to the list with state {3}.
The resulting schedule is trivially correct (both operations return true, so the schedule can come from a
complete linearizable history). However, in the schedule, one of the operations, say insert(1), overwrites
the effect of the other one. Thus, if we extend the schedule with a complete execution of contains(2),
the only possible response it may give is false, which obviously does not produce a linearizable high-level
history.
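The reconstruction of the list contents from the last write to each next field, and the lost update in the example above, can be sketched as follows. The encoding is hypothetical: a schedule is a list of ("R", operation, element) and ("W", operation, element, new_next) steps, the element names X1, Xa and Xb are invented for the sketch, and the creation of a new element is modelled as a write to its next field.

```python
def list_contents(initial_next, values, schedule):
    """Rebuild the list from the last write to each element's next field."""
    nxt = dict(initial_next)
    for step in schedule:
        if step[0] == "W":
            _, _operation, elem, new_next = step
            nxt[elem] = new_next  # a later write overrides an earlier one
    contents, elem = [], nxt["head"]
    while elem != "tail":         # traverse iteratively, starting from head
        contents.append(values[elem])
        elem = nxt[elem]
    return contents


# Initial state {3}: head -> X1 -> tail, where X1 stores 3.
values = {"X1": 3, "Xa": 1, "Xb": 2}
initial_next = {"head": "X1", "X1": "tail"}

# Both inserts read head and X1 before either of them writes head.
lost_update = [
    ("R", "insert(1)", "head"), ("R", "insert(2)", "head"),
    ("R", "insert(1)", "X1"), ("R", "insert(2)", "X1"),
    ("W", "insert(2)", "Xb", "X1"), ("W", "insert(2)", "head", "Xb"),
    ("W", "insert(1)", "Xa", "X1"), ("W", "insert(1)", "head", "Xa"),
]

# For comparison: insert(1) runs entirely after insert(2).
serial = [
    ("R", "insert(2)", "head"), ("R", "insert(2)", "X1"),
    ("W", "insert(2)", "Xb", "X1"), ("W", "insert(2)", "head", "Xb"),
    ("R", "insert(1)", "head"), ("R", "insert(1)", "Xb"),
    ("W", "insert(1)", "Xa", "Xb"), ("W", "insert(1)", "head", "Xa"),
]

lost_contents = list_contents(initial_next, values, lost_update)
serial_contents = list_contents(initial_next, values, serial)
```

In the first schedule the value 2 is unreachable from head, so a subsequent contains(2) can only return false even though insert(2) returned true; in the serial schedule all three values remain reachable.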
Theorem 8.10 (Optimality). IRM accepts all schedules that are observable with respect to (LL, set).
Proof. We prove that any schedule rejected by IRM is not observable. We go through the cases when
a read or write returns ⊥ (implying the operation fails to return a matching response) and thus the
current schedule is rejected: (1) read(Xℓ) returns ⊥ in line 16 when r[ℓ] = true or when ver1 ≠ ver2,
(2) write(Xℓ) performed by remove(v) either returns ⊥ in line 22 when the cas operation on L[ℓ] returns
false or returns ⊥ in line 25 when the cas operation on the element that stores v returns false, and (3)
write(Xℓ) performed by insert returns ⊥ in line 35 when the cas operation on L[ℓ] returns false.
Consider subcase (1a): r[ℓ] is set to true by a preceding or concurrent write(Xℓ) (line 27). The high-level
operation performing this write is a remove that marks the corresponding list element as removed. Since
no removed element can be read in a sequential execution of LL, the corresponding history is not locally
serializable. Alternatively, in subcase (1b), the version of Xℓ read previously in line 11 has changed.
Thus, an update operation has concurrently performed a write to Xℓ. However, there exist executions
that export such schedules.
In case (2), the write performed by a remove operation returns ⊥. In subcase (2a), Xℓ is currently locked.
Thus, a concurrent high-level operation has previously locked Xℓ (by successfully performing L[ℓ].cas()
in line 22) and has not yet released the lock (by writing 〈ver′, false〉 to L[ℓ] in line 29). In subcase (2b),
the current version of Xℓ (stored in L[ℓ]) differs from the version of Xℓ witnessed by a preceding read.
Thus, a concurrent high-level operation completed a write to Xℓ after the current high-level operation
π performed a read of Xℓ. In both (2a) and (2b), a concurrent high-level updating operation π′ (remove
or insert) has written or is about to perform a write to Xℓ. In subcase (2c), the cas on the element
Xℓ′ (element that stores the value v) executed by remove(v) returns false (line 25). Recall that by the
sequential implementation LL, remove(v) performs a read of Xℓ′ prior to the write(Xℓ), where Xℓ.next
refers to Xℓ′. If the cas on Xℓ′ fails, there exists a process that concurrently performed a write to Xℓ′,
but after the read of Xℓ′ by remove(v). In all cases, we observe that if we did not abort the write to Xℓ,
then the schedule extended by a complete execution of contains is not LSL.
In case (3), the write performed by an insert operation returns ⊥. Similar arguments to case (2) prove
that any schedule rejected is not observable LSL.
Theorem 8.10 implies that the schedules exported by the histories in Figures 8.1 and 8.2(a) and that are
not accepted by any I ′ ∈ SM and any I ∈ P, respectively, are indeed accepted by IRM . But it is easy
to see that implementations in SM and P can only accept observable schedules. As a result, IRM can
be shown to strictly supersede any pessimistic or TM-based implementation of the list-based set.
Corollary 8.11. IRM accepts every schedule accepted by any implementation in P and SM. Moreover,
IRM accepts schedules σ and σ′ that are rejected by any implementation in P and SM, respectively.
8.5 Related work and Discussion
Measuring concurrency. Sets of accepted schedules are commonly used as a metric of concurrency
provided by a shared memory implementation. Gramoli et al. [53] defined a concurrency metric, the
input acceptance, as the ratio of committed transactions over aborted transactions when TM executes
the given schedule. Unlike our metric, input acceptance does not apply to lock-based programs.
For static database transactions, Kung and Papadimitriou [89] use the metric to capture the parallelism
of a locking scheme. While acknowledging that the metric is theoretical, they insist that it may have
“practical significance as well, if the schedulers in question have relatively small scheduling times as
compared with waiting and execution times.” Herlihy [74] employed the metric to compare various
optimistic and pessimistic synchronization techniques using commutativity of operations constituting
high-level transactions. A synchronization technique is implicitly considered in [74] as highly concurrent,
namely “optimal”, if no other technique accepts more schedules. By contrast, we focus here on a dynamic
model where the scheduler cannot use the prior knowledge of all the shared addresses to be accessed.
Also, unlike [74, 89], the results in this chapter require all operations, including aborted ones, to observe
(locally) consistent states.
Concurrency optimality. This chapter shows that “semantics-oblivious” optimistic TM and “semantics-
aware” pessimistic locking are incomparable with respect to exploiting concurrency of the list-based set.
Yet, we have shown how to use the benefits of optimism to derive a concurrency optimal implementation
that is fine-tuned to the semantics of the list-based set. Intuitively, the ability of an implementation to
successfully process interleaving steps of concurrent threads is an appealing property that should be met
by performance gains. We believe this to be so.
In work that is not part of the thesis [56], we confirm experimentally that the concurrency optimal opti-
mistic implementation of the list-based set based on IRM outperforms the state-of-the-art implementa-
tions of the list-based set, namely, the Lazy linked list [71] and the Harris-Michael linked list [70, 105].
Does the claim also hold for other data structures? We suspect so. For example, similar but more
general data structures, such as skip-lists or tree-based dictionaries, may allow for optimizations similar
to those proposed in this chapter. Our results provide some preliminary hints in the quest for the “right”
synchronization technique to develop highly concurrent and efficient implementations of data types.
9
Concluding remarks
Everything has to come to an end, sometime.
Lyman Frank Baum, The Marvelous Land of Oz
The inclusion of hardware support for transactions in mainstream CPUs [1, 107, 111] suggests that
TM is an important concurrency abstraction. However, hardware transactions are not going to be
sufficient to support efficient concurrent programming since they may be aborted spuriously; the fast
but potentially unreliable hardware transactions must be complemented with slower, but more reliable
software transactions. Thus, understanding the inherent cost of both hardware and software transactions
is of both theoretical and practical interest.
Below, we briefly recall the outcomes of the thesis and overview the future research directions.
Safety for TMs. We formalized the semantics of a safe TM: every transaction, including aborted and
incomplete ones, must observe a view that is consistent with some sequential execution. We introduced
the notion of deferred-update semantics which explicitly precludes reading from a transaction that has not
yet invoked tryCommit. We believe that our definition is useful to TM practitioners, since it streamlines
possible implementations of t-read and tryCommit operations.
Complexity of TMs. The cost of the TM abstraction is parametrized by several properties: safety for
transactions, conditions under which transactions must terminate, conditions under which transactions
must commit/abort, bound on the number of versions that can be maintained and a multitude of other
implementation strategies like disjoint-access parallelism and invisible reads.
At a high-level, the complexity bounds presented in the thesis suggest that providing high degrees of con-
currency in software transactional memory (STM) implementations incurs a considerable synchronization
cost. As we show, permissive STMs, while providing the best possible concurrency in theory, require
a strong synchronization primitive (AWAR) or a memory fence (RAW) per read operation, which may
result in excessively slow execution times. Progressive STMs provide only basic concurrency by adapting
to data conflicts, but perform considerably better in this respect: we present progressive implementations
that incur constant RAW/AWAR complexity.
Since transactional memory was originally proposed as an alternative to locking, early STM implementations
[52, 79, 101, 117, 120] adopted optimistic concurrency control and guaranteed that a prematurely
halted transaction cannot prevent other transactions from committing. However, popular state-of-the-art
STM implementations like TL2 [39] and NOrec [36] are progressive, providing no non-blocking
progress guarantees for transactions, but perform empirically better than obstruction-free TMs. Com-
plexity lower and upper bounds presented in the thesis explain this performance gap.
Do our results mean that maximizing the ability of processing multiple transactions in parallel or provid-
ing non-blocking progress should not be an important factor in STM design? It would seem so. Should
we rather even focus on speculative “single-lock” solutions à la flat combining [72] or “pessimistic” STMs
in which transactions never abort [5]? Difficult to say affirmatively, but probably not, since our results
suggest progressive STMs incur low complexity overheads as also evidenced by their good empirical
performance on most realistic TM workloads [36, 39].
Several questions remain open on the complexity of STMs. For instance, the bounds in the thesis
were derived for the TM-correctness property of strict serializability and its restrictions. But there have
been studies of relaxations of strict serializability like snapshot isolation [24, 31]. Verifying whether the
lower bounds presented in the thesis hold under such weak TM-correctness properties and, if so, extending
the proofs presents interesting open questions. The discussion sections of Chapters 4, 5 and 6 additionally
list some unresolved questions closely related to the results in the thesis.
One problem of practical need that is not considered in the thesis concerns the interaction of transactional
code with non-transactional code, i.e., the same data item is accessed both transactionally and non-
transactionally. It is expected that code executed within a transaction behave as lock-based code within a
single “global lock” [104, 116] to avoid memory races. Techniques to ensure the safety of non-transactional
accesses have been formulated through the notion of privatization [23, 118]. Devising techniques to ensure
privatization for TMs and understanding the cost of enforcing it is an important research direction.
In the thesis, we assumed that a rmw event is an access to a single base object. However, there have been
proposals to provide implementations with the ability to invoke k-rmw; k ∈ N primitives [21, 42] that
allow accessing up to k base objects in a single atomic event. For example, the k-cas instruction performs
k cas instructions atomically on a vector 〈b1, . . . , bk〉 of base objects: it accepts as input a vector
〈old1, . . . , oldk, new1, . . . , newk〉 and, if for all i ∈ {1, . . . , k} the value of bi equals oldi, atomically
updates the value of 〈b1, . . . , bk〉 to 〈new1, . . . , newk〉 and returns true; otherwise it returns false and
leaves the base objects unchanged. However, the ability to access such k-rmw primitives does not
necessarily simplify the design or improve the performance of non-blocking implementations, nor does it
overcome the compositionality issue [21, 42, 81]. Nonetheless, verifying
if the lower bounds presented in the thesis hold in this shared memory model is an interesting problem.
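To make the k-cas primitive concrete, the following Python sketch simulates it, assuming the usual compare-and-swap semantics in which every base object bi is compared against oldi before any of them is updated. The names BaseObject and k_cas are hypothetical, and a single global threading.Lock only emulates the atomicity of one k-rmw event; it says nothing about how such a primitive would be realized in hardware.

```python
import threading


class BaseObject:
    """Hypothetical base object holding a single value."""

    def __init__(self, value):
        self.value = value


_atomic = threading.Lock()  # simulates the atomicity of one k-rmw event


def k_cas(objs, old, new):
    """Update all objs to new iff every object currently holds its old value."""
    if not (len(objs) == len(old) == len(new)):
        raise ValueError("objs, old and new must have the same length k")
    with _atomic:
        if any(b.value != o for b, o in zip(objs, old)):
            return False          # at least one comparison failed: no update
        for b, n in zip(objs, new):
            b.value = n
        return True


b1, b2 = BaseObject(0), BaseObject(5)
first = k_cas([b1, b2], [0, 5], [1, 6])   # both comparisons succeed
second = k_cas([b1, b2], [1, 5], [9, 9])  # b2 now holds 6, so nothing changes
state = (b1.value, b2.value)
```

The second call shows the all-or-nothing behaviour: although b1 matches its expected value, the mismatch on b2 leaves both objects untouched.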
HyTMs. We have introduced an analytical model for hybrid transactional memory that captures the
notion of cached accesses as performed by hardware transactions. We then derived lower and upper
bounds in this model to capture the inherent tradeoff between the degree of concurrency allowed among
hardware and software transactions and the instrumentation overhead introduced on the hardware. In
a nutshell, our results say that it is impossible to completely forgo instrumentation in a sequential
HyTM, and that any opaque HyTM implementation providing non-trivial progress either has to pay
a linear number of metadata accesses, or will have to allow slow-path transactions to abort fast-path
operations.
Our model of HTMs assumed that the hardware resources were bounded, in the sense that, a hardware
transaction may only access a bounded number of data items, exceeding which, it incurs a capacity
abort. To overcome the inherent limitations of bounded HTMs, there have been proposals for “unbounded
HTMs” that allow transactions to commit even if they exceed the hardware resources [12, 68]. The HyTM
model from Chapter 7 can be easily extended to accommodate unbounded HTM designs by disregarding
capacity aborts.
Some papers have investigated alternatives to providing HTMs with an STM fallback, such as sandbox-
ing [4, 33], or employing hardware-accelerated STM [114, 119], and the use of both direct and cached
accesses within the same hardware transaction to reduce instrumentation overhead [88, 112, 113]. An-
other approach proposed reduced hardware transactions [102], where a part of the slow-path is executed
using a short fast-path transaction, which partially eliminates instrumentation from the hardware
fast-path. Modelling and deriving complexity bounds for HyTM proposals outside the HyTM model
described in the thesis is an interesting future direction.
Relaxed transactional memory. The concurrency lower bounds derived in Chapter 8 illustrated that
a strictly serializable TM, when used as a black-box to transform a sequential implementation of the
list-based set to a concurrent one, is not concurrency-optimal. This is due to the fact that TM detects
conflicts at the level of transactional reads and writes resulting in false conflicts, in the sense that, the
read-write conflict may not affect the correctness of the implemented high-level set type. As we have
shown, we can derive a concurrency optimal optimistic (non-strictly serializable) implementation that
can process every correct schedule of the list-based set. Indeed, several papers have studied “relaxed” TMs
that are fine-tuned to the semantics of the high-level data type [50, 76, 77]. Exploring the complexity
of such relaxed TM models represents a very important future research direction.

List of Figures
1.1 Transforming a sequential implementation of the list-based set to a TM-based concurrent one
3.1 History H is final-state opaque, while its prefix H′ is not final-state opaque.
3.2 An infinite history in which tryC1 is incomplete and any two transactions are concurrent.
Each finite prefix of the history is du-opaque, but the infinite limit of the ever-extending
sequence is not du-opaque.
3.3 A history that is opaque, but not du-opaque.
3.4 A sequential du-opaque history, which is not opaque by the definition of [59].
3.5 A history that is du-VWC, but not du-opaque.
3.6 A history which is du-VWC but not du-TMS1.
3.7 A history which is du-TMS1 but not du-VWC.
3.8 A history that is du-opaque, but not TMS2 [43].
4.1 Executions in the proof of Lemma 4.1; by weak DAP, Tφ cannot distinguish this from the
execution in Figure 4.1a
4.2 Execution E of a permissive, opaque TM: T2 and T3 force T1 to perform a RAW/AWAR in each
R1(Xk), 2 ≤ k ≤ m
5.1 Executions in the proof of Theorem 5.1; execution in 5.1d is not strictly serializable
5.2 Executions in the proof of Theorem 5.4; execution in 5.2e is not opaque
5.3 Complexity gap between blocking and non-blocking TMs
6.1 Executions in the proof of Theorem 6.1; execution in 6.1a must maintain c distinct values
of every t-object
6.2 Executions in the proof of Theorem 6.2; execution in 6.2c is not strictly serializable
6.3 Executions in the proof of Theorem 6.3; execution in 6.3c is not strictly serializable
7.1 Tracking set aborts in fast-path transactions; we denote a fast-path (and resp. slow-path)
transaction by F (and resp. S)
7.2 Execution E in Figure 7.2a is indistinguishable to T1 from the execution E′ in Figure 7.2b
7.3 Executions in the proof of Theorem 7.4; execution in 7.3d is not strictly serializable
8.1 A concurrency scenario for a list-based set, initially {1, 3, 4}, where value i is stored at node Xi:
insert(2) and insert(5) can proceed concurrently with contains(5), the history is LS-linearizable
but not serializable. (We only depict important read-write events here.)
8.2 (a) a history exporting schedule σ, with initial state {1, 2, 3}, accepted by ILP ∈ SM; (b) a
history exporting a problematic schedule σ′, with initial state {3}, which should be accepted by
any I ∈ P if it accepts σ
List of Tables
3.1 Relations between TM consistency definitions.
4.1 Complexity bounds for progressive TMs.
4.2 Complexity bounds for strongly progressive TMs.

10
Bibliography
[1] Advanced Synchronization Facility Proposed Architectural Specification, March 2009.
http://developer.amd.com/wordpress/media/2013/09/45432-ASF_Spec_2.1.pdf.
[2] Transactional Memory in GCC. 2012.
[3] S. V. Adve and K. Gharachorloo. Shared memory consistency models: A tutorial. IEEE Computer,
29(12):66–76, 1996.
[4] Y. Afek, A. Levy, and A. Morrison. Software-improved hardware lock elision. In PODC. ACM,
2014.
[5] Y. Afek, A. Matveev, and N. Shavit. Pessimistic software lock-elision. In Proceedings of the 26th
International Conference on Distributed Computing, DISC’12, pages 297–311, Berlin, Heidelberg,
2012. Springer-Verlag.
[6] M. K. Aguilera, S. Frølund, V. Hadzilacos, S. L. Horn, and S. Toueg. Abortable and query-abortable
objects and their efficient implementation. In PODC, pages 23–32, 2007.
[7] D. Alistarh, P. Eugster, M. Herlihy, A. Matveev, and N. Shavit. Stacktrack: An automated
transactional approach to concurrent memory reclamation. In Proceedings of the Ninth European
Conference on Computer Systems, EuroSys ’14, pages 25:1–25:14, New York, NY, USA, 2014.
ACM.
[8] D. Alistarh, J. Kopinsky, P. Kuznetsov, S. Ravi, and N. Shavit. Inherent limitations of hybrid
transactional memory. CoRR, abs/1405.5689, 2014.
[9] D. Alistarh, J. Kopinsky, P. Kuznetsov, S. Ravi, and N. Shavit. Inherent limitations of hybrid
transactional memory. CoRR, abs/1405.5689, 2014. To appear in 29th International Symposium
on Distributed Computing (DISC’15), Japan.
[10] D. Alistarh, J. Kopinsky, P. Kuznetsov, S. Ravi, and N. Shavit. Inherent limitations of hybrid
transactional memory. 6th Workshop on the Theory of Transactional Memory, Paris, France,
2014.
[11] B. Alpern and F. B. Schneider. Defining liveness. Inf. Process. Lett., 21(4):181–185, Oct. 1985.
[12] C. S. Ananian, K. Asanovic, B. C. Kuszmaul, C. E. Leiserson, and S. Lie. Unbounded transactional
memory. In Proceedings of the 11th International Symposium on High-Performance Computer
Architecture, HPCA ’05, pages 316–327, Washington, DC, USA, 2005. IEEE Computer Society.
[13] J. H. Anderson and M. Moir. Universal constructions for multi-object operations. In Proceedings
of the Fourteenth Annual ACM Symposium on Principles of Distributed Computing, PODC ’95,
pages 184–193, New York, NY, USA, 1995. ACM.
[14] T. E. Anderson. The performance of spin lock alternatives for shared-memory multiprocessors.
IEEE Trans. Parallel Distrib. Syst., 1(1):6–16, 1990.
[15] H. Attiya, A. Gotsman, S. Hans, and N. Rinetzky. Safety of live transactions in transactional
memory: TMS is necessary and sufficient. In DISC, pages 376–390, 2014.
[16] H. Attiya, R. Guerraoui, D. Hendler, and P. Kuznetsov. The complexity of obstruction-free imple-
mentations. J. ACM, 56(4), 2009.
[17] H. Attiya, R. Guerraoui, D. Hendler, P. Kuznetsov, M. Michael, and M. Vechev. Laws of order:
Expensive synchronization in concurrent algorithms cannot be eliminated. In POPL, pages 487–
498, 2011.
[18] H. Attiya, S. Hans, P. Kuznetsov, and S. Ravi. Safety of deferred update in transactional memory.
2013 IEEE 33rd International Conference on Distributed Computing Systems, 0:601–610, 2013.
[19] H. Attiya, S. Hans, P. Kuznetsov, and S. Ravi. Safety of deferred update in transactional memory.
CoRR, abs/1301.6297, 2013.
[20] H. Attiya, S. Hans, P. Kuznetsov, and S. Ravi. Safety and deferred update in transactional mem-
ory. In R. Guerraoui and P. Romano, editors, Transactional Memory. Foundations, Algorithms,
Tools, and Applications, volume 8913 of Lecture Notes in Computer Science, pages 50–71. Springer
International Publishing, 2015.
[21] H. Attiya and D. Hendler. Time and space lower bounds for implementations using k-CAS. IEEE
Transactions on Parallel and Distributed Systems, 21(2):162–173, Feb. 2010.
[22] H. Attiya, D. Hendler, and P. Woelfel. Tight RMR lower bounds for mutual exclusion and other
problems. In Proceedings of the Twenty-seventh ACM Symposium on Principles of Distributed
Computing, PODC ’08, pages 447–447, New York, NY, USA, 2008. ACM.
[23] H. Attiya and E. Hillel. The cost of privatization in software transactional memory. IEEE Trans.
Computers, 62(12):2531–2543, 2013.
[24] H. Attiya, E. Hillel, and A. Milani. Inherent limitations on disjoint-access parallel implementations
of transactional memory. Theory of Computing Systems, 49(4):698–719, 2011.
[25] H. Attiya and A. Milani. Transactional scheduling for read-dominated workloads. In Proceedings of
the 13th International Conference on Principles of Distributed Systems, OPODIS ’09, pages 3–17,
Berlin, Heidelberg, 2009. Springer-Verlag.
[26] H. Attiya, G. Ramalingam, and N. Rinetzky. Sequential verification of serializability. In Proceedings
of the 37th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages,
pages 31–42, 2010.
[27] H. Attiya and J. Welch. Distributed Computing. Fundamentals, Simulations, and Advanced Topics.
John Wiley & Sons, 2004.
[28] G. Barnes. A method for implementing lock-free shared-data structures. In Proceedings of the Fifth
Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA ’93, pages 261–270, New
York, NY, USA, 1993. ACM.
[29] R. Bayer and M. Schkolnick. Concurrency of operations on B-trees. In Readings in database
systems, pages 129–139. Morgan Kaufmann Publishers Inc., 1988.
[30] E. Bruno. What Is Priority Inversion (And How Do You Control It)? 2011.
[31] V. Bushkov, D. Dziuma, P. Fatourou, and R. Guerraoui. The PCL theorem: Transactions cannot
be parallel, consistent and live. In SPAA, pages 178–187, 2014.
[32] V. Bushkov, R. Guerraoui, and M. Kapalka. On the liveness of transactional memory. In Proceed-
ings of the 2012 ACM Symposium on Principles of Distributed Computing, PODC ’12, pages 9–18,
New York, NY, USA, 2012. ACM.
[33] I. Calciu, T. Shpeisman, G. Pokam, and M. Herlihy. Improved single global lock fallback for
best-effort hardware transactional memory. In Transact 2014 Workshop. ACM, 2014.
[34] T. Crain, D. Imbs, and M. Raynal. Read invisibility, virtual world consistency and permissiveness
are compatible. Research Report, ASAP - INRIA - IRISA - CNRS : UMR6074 - INRIA - Institut
National des Sciences Appliquées de Rennes - Université de Rennes I, 11 2010.
[35] L. Dalessandro, F. Carouge, S. White, Y. Lev, M. Moir, M. L. Scott, and M. F. Spear. Hybrid
NOrec: a case study in the effectiveness of best effort hardware transactional memory. In R. Gupta
and T. C. Mowry, editors, ASPLOS, pages 39–52. ACM, 2011.
[36] L. Dalessandro, M. F. Spear, and M. L. Scott. NOrec: Streamlining STM by abolishing ownership
records. SIGPLAN Not., 45(5):67–78, Jan. 2010.
[37] P. Damron, A. Fedorova, Y. Lev, V. Luchangco, M. Moir, and D. Nussbaum. Hybrid transactional
memory. SIGPLAN Not., 41(11):336–346, Oct. 2006.
[38] D. Dice, Y. Lev, M. Moir, and D. Nussbaum. Early experience with a commercial hardware
transactional memory implementation. In Proceedings of the 14th International Conference on
Architectural Support for Programming Languages and Operating Systems, ASPLOS XIV, pages
157–168, New York, NY, USA, 2009. ACM.
[39] D. Dice, O. Shalev, and N. Shavit. Transactional locking II. In Proceedings of the 20th International
Conference on Distributed Computing, DISC’06, pages 194–208, Berlin, Heidelberg, 2006. Springer-
Verlag.
[40] D. Dice and N. Shavit. What really makes transactions fast? In Transact, 2006.
[41] E. W. Dijkstra. Solution of a problem in concurrent programming control. Commun. ACM,
8(9):569–, Sept. 1965.
[42] S. Doherty, D. L. Detlefs, L. Groves, C. H. Flood, V. Luchangco, P. A. Martin, M. Moir, N. Shavit,
and G. L. Steele, Jr. DCAS is not a silver bullet for nonblocking algorithm design. In Proceedings
of the Sixteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA
’04, pages 216–224, New York, NY, USA, 2004. ACM.
[43] S. Doherty, L. Groves, V. Luchangco, and M. Moir. Towards formally specifying and verifying
transactional memory. Formal Asp. Comput., 25(5):769–799, 2013.
[44] A. Dragojević, M. Herlihy, Y. Lev, and M. Moir. On the power of hardware transactional memory
to simplify memory management. In Proceedings of the 30th Annual ACM SIGACT-SIGOPS
Symposium on Principles of Distributed Computing, PODC ’11, pages 99–108, New York, NY,
USA, 2011. ACM.
[45] F. Ellen, P. Fatourou, E. Kosmas, A. Milani, and C. Travers. Universal constructions that ensure
disjoint-access parallelism and wait-freedom. In PODC, pages 115–124, 2012.
[46] F. Ellen, D. Hendler, and N. Shavit. On the inherent sequentiality of concurrent objects. SIAM
J. Comput., 41(3):519–536, 2012.
[47] R. Ennals. The lightweight transaction library. http://sourceforge.net/projects/libltx/files/.
[48] R. Ennals. Software transactional memory should not be obstruction-free. 2005.
[49] P. Fatourou and N. D. Kallimanis. A highly-efficient wait-free universal construction. In Proceedings
of the Twenty-third Annual ACM Symposium on Parallelism in Algorithms and Architectures,
SPAA ’11, pages 325–334, New York, NY, USA, 2011. ACM.
[50] P. Felber, V. Gramoli, and R. Guerraoui. Elastic transactions. In DISC, pages 93–107, 2009.
[51] F. Fich, D. Hendler, and N. Shavit. On the inherent weakness of conditional synchronization prim-
itives. In Proceedings of the Twenty-third Annual ACM Symposium on Principles of Distributed
Computing, PODC ’04, pages 80–87, New York, NY, USA, 2004. ACM.
[52] K. Fraser. Practical lock-freedom. Technical report, Cambridge University Computer Laboratory,
2003.
[53] V. Gramoli, D. Harmanci, and P. Felber. On the input acceptance of transactional memory. Parallel
Processing Letters, 20(1):31–50, 2010.
[54] V. Gramoli, P. Kuznetsov, and S. Ravi. From sequential to concurrent: correctness and relative
efficiency (ba). In Principles of Distributed Computing (PODC), pages 241–242, 2012.
[55] V. Gramoli, P. Kuznetsov, and S. Ravi. Optimism for boosting concurrency. CoRR, abs/1203.4751,
2012.
[56] V. Gramoli, P. Kuznetsov, S. Ravi, and D. Shang. A concurrency-optimal list-based set. CoRR,
abs/1502.01633, 2015.
[57] V. Gramoli, P. Kuznetsov, S. Ravi, and D. Shang. A concurrency-optimal list-based set (ba). CoRR,
abs/1502.01633, 2015. To appear in 29th International Symposium on Distributed Computing
(DISC’15).
[58] J. Gray and A. Reuter. Transaction Processing: Concepts and Techniques. Morgan Kaufmann
Publishers Inc., San Francisco, CA, USA, 1st edition, 1992.
[59] R. Guerraoui, T. A. Henzinger, and V. Singh. Permissiveness in transactional memories. In DISC,
pages 305–319, 2008.
[60] R. Guerraoui and M. Kapalka. On obstruction-free transactions. In Proceedings of the twentieth
annual symposium on Parallelism in algorithms and architectures, SPAA ’08, pages 304–313, New
York, NY, USA, 2008. ACM.
[61] R. Guerraoui and M. Kapalka. On the correctness of transactional memory. In Proceedings of the
13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’08,
pages 175–184, New York, NY, USA, 2008. ACM.
[62] R. Guerraoui and M. Kapalka. The semantics of progress in lock-based transactional memory.
SIGPLAN Not., 44(1):404–415, Jan. 2009.
[63] R. Guerraoui and M. Kapalka. Transactional memory: Glimmer of a theory. In Proceedings of
the 21st International Conference on Computer Aided Verification, CAV ’09, pages 1–15, Berlin,
Heidelberg, 2009. Springer-Verlag.
[64] R. Guerraoui and M. Kapalka. Principles of Transactional Memory, Synthesis Lectures on Dis-
tributed Computing Theory. Morgan and Claypool, 2010.
[65] R. Guerraoui, M. Kapalka, and J. Vitek. STMBench7: A benchmark for software transactional
memory. SIGOPS Oper. Syst. Rev., 41(3):315–324, Mar. 2007.
[66] R. Guerraoui and E. Ruppert. Linearizability is not always a safety property. In NETYS, pages
57–69, 2014.
[67] H. Attiya, S. Hans, P. Kuznetsov, and S. Ravi. What is safe in transactional memory. 4th Workshop
on the Theory of Transactional Memory, Madeira, Portugal, 2012.
[68] L. Hammond, V. Wong, M. Chen, B. D. Carlstrom, J. D. Davis, B. Hertzberg, M. K. Prabhu,
H. Wijaya, C. Kozyrakis, and K. Olukotun. Transactional memory coherence and consistency.
SIGARCH Comput. Archit. News, 32(2):102–, Mar. 2004.
[69] T. Harris, J. R. Larus, and R. Rajwar. Transactional Memory, 2nd edition. Synthesis Lectures on
Computer Architecture. Morgan & Claypool Publishers, 2010.
[70] T. L. Harris. A pragmatic implementation of non-blocking linked-lists. In DISC, pages 300–314,
2001.
[71] S. Heller, M. Herlihy, V. Luchangco, M. Moir, W. N. Scherer, and N. Shavit. A lazy concurrent
list-based set algorithm. In OPODIS, pages 3–16, 2006.
[72] D. Hendler, I. Incze, N. Shavit, and M. Tzafrir. Flat combining and the synchronization-parallelism
tradeoff. In SPAA, pages 355–364, 2010.
[73] J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. Morgan
Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition, 2003.
[74] M. Herlihy. Apologizing versus asking permission: optimistic concurrency control for abstract data
types. ACM Trans. Database Syst., 15(1):96–124, 1990.
[75] M. Herlihy. Wait-free synchronization. ACM Trans. Prog. Lang. Syst., 13(1):123–149, 1991.
[76] M. Herlihy and E. Koskinen. Transactional boosting: A methodology for highly-concurrent trans-
actional objects. In PPoPP, New York, NY, USA, 2008. ACM.
[77] M. Herlihy and E. Koskinen. Composable transactional objects: A position paper. In Z. Shao,
editor, Programming Languages and Systems, volume 8410 of Lecture Notes in Computer Science,
pages 1–7. Springer Berlin Heidelberg, 2014.
[78] M. Herlihy, V. Luchangco, and M. Moir. Obstruction-free synchronization: Double-ended queues
as an example. In ICDCS, pages 522–529, 2003.
[79] M. Herlihy, V. Luchangco, M. Moir, and W. N. Scherer, III. Software transactional memory
for dynamic-sized data structures. In Proceedings of the Twenty-second Annual Symposium on
Principles of Distributed Computing, PODC ’03, pages 92–101, New York, NY, USA, 2003. ACM.
[80] M. Herlihy and J. E. B. Moss. Transactional memory: architectural support for lock-free data
structures. In ISCA, pages 289–300, 1993.
[81] M. Herlihy and N. Shavit. The art of multiprocessor programming. Morgan Kaufmann, 2008.
[82] M. Herlihy and N. Shavit. On the nature of progress. In OPODIS, pages 313–328, 2011.
[83] M. Herlihy and J. M. Wing. Linearizability: A correctness condition for concurrent objects. ACM
Trans. Program. Lang. Syst., 12(3):463–492, 1990.
[84] H. Lee. Local-spin mutual exclusion algorithms on the DSM model using fetch-and-store
objects. 2003.
[85] D. Imbs and M. Raynal. Virtual world consistency: A condition for STM systems (with a versatile
protocol with invisible read operations). Theor. Comput. Sci., 444, July 2012.
[86] A. Israeli and L. Rappoport. Disjoint-access-parallel implementations of strong shared memory
primitives. In PODC, pages 151–160, 1994.
[87] D. König. Theorie der Endlichen und Unendlichen Graphen: Kombinatorische Topologie der
Streckenkomplexe. Akad. Verlag. 1936.
[88] S. Kumar, M. Chu, C. J. Hughes, P. Kundu, and A. Nguyen. Hybrid transactional memory. In
Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel
Programming, PPoPP ’06, pages 209–220, New York, NY, USA, 2006. ACM.
[89] H. T. Kung and C. H. Papadimitriou. An optimality theory of concurrency control for databases.
In SIGMOD, pages 116–126, 1979.
[90] P. Kuznetsov and S. Ravi. On the cost of concurrency in transactional memory. In OPODIS, pages
112–127, 2011.
[91] P. Kuznetsov and S. Ravi. On the cost of concurrency in transactional memory. CoRR,
abs/1103.1302, 2011.
[92] P. Kuznetsov and S. Ravi. On partial wait-freedom in transactional memory. CoRR, abs/1407.6876,
2014.
[93] P. Kuznetsov and S. Ravi. On partial wait-freedom in transactional memory. In Proceedings of
the 2015 International Conference on Distributed Computing and Networking, ICDCN 2015, Goa,
India, January 4-7, 2015, page 10, 2015.
[94] P. Kuznetsov and S. Ravi. Progressive transactional memory in time and space. CoRR,
abs/1502.04908, 2015.
[95] P. Kuznetsov and S. Ravi. Progressive transactional memory in time and space. CoRR,
abs/1502.04908, 2015. To appear in 13th International Conference on Parallel Computing Tech-
nologies, Russia.
[96] P. Kuznetsov and S. Ravi. Why transactional memory should not be obstruction-free. CoRR,
abs/1502.02725, 2015.
[97] P. Kuznetsov and S. Ravi. Why transactional memory should not be obstruction-free. CoRR,
abs/1502.02725, 2015. To appear in 29th International Symposium on Distributed Computing
(DISC’15), Japan.
[98] M. Lesani, V. Luchangco, and M. Moir. Putting opacity in its place. In WTTM, 2012.
[99] Y. Lev, M. Moir, and D. Nussbaum. PhTM: Phased transactional memory. In Workshop
on Transactional Computing (Transact), 2007.
research.sun.com/scalable/pubs/TRANSACT2007PhTM.pdf.
[100] N. A. Lynch. Distributed Algorithms. Morgan Kaufmann, 1996.
[101] V. J. Marathe, W. N. Scherer III, and M. L. Scott. Adaptive software transactional memory. In Proc.
of the 19th Intl. Symp. on Distributed Computing, pages 354–368, 2005.
[102] A. Matveev and N. Shavit. Reduced hardware transactions: a new approach to hybrid transactional
memory. In Proceedings of the 25th ACM symposium on Parallelism in algorithms and architectures,
pages 11–22. ACM, 2013.
[103] P. E. McKenney. Memory barriers: a hardware view for software hackers. Linux Technology
Center, IBM Beaverton, June 2010.
[104] V. Menon, S. Balensiefer, T. Shpeisman, A.-R. Adl-Tabatabai, R. L. Hudson, B. Saha, and A. Welc.
Single global lock semantics in a weakly atomic STM. SIGPLAN Not., 43(5):15–26, May 2008.
[105] M. M. Michael. High performance dynamic lock-free hash tables and list-based sets. In SPAA,
pages 73–82, 2002.
[106] M. M. Michael and M. L. Scott. Simple, fast, and practical non-blocking and blocking concurrent
queue algorithms. In PODC, pages 267–275, 1996.
[107] M. Ohmacht. Memory Speculation of the Blue Gene/Q Compute Chip, 2011.
http://wands.cse.lehigh.edu/IBM_BQC_PACT2011.ppt.
[108] S. S. Owicki and L. Lamport. Proving liveness properties of concurrent programs. ACM Trans.
Program. Lang. Syst., 4(3):455–495, 1982.
[109] C. H. Papadimitriou. The serializability of concurrent database updates. J. ACM, 26:631–653,
1979.
[110] D. Perelman, R. Fan, and I. Keidar. On maintaining multiple versions in STM. In PODC, pages
16–25, 2010.
[111] J. Reinders. Transactional Synchronization in Haswell, 2012.
http://software.intel.com/en-us/blogs/2012/02/07/transactional-synchronization-in-haswell/.
[112] T. Riegel. Software Transactional Memory Building Blocks. 2013.
[113] T. Riegel, P. Marlier, M. Nowack, P. Felber, and C. Fetzer. Optimizing hybrid transactional
memory: The importance of nonspeculative operations. In Proceedings of the 23rd ACM Symposium
on Parallelism in Algorithms and Architectures, pages 53–64. ACM, 2011.
[114] B. Saha, A.-R. Adl-Tabatabai, and Q. Jacobson. Architectural support for software transactional
memory. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchi-
tecture, MICRO 39, pages 185–196, Washington, DC, USA, 2006. IEEE Computer Society.
[115] W. N. Scherer, III and M. L. Scott. Advanced contention management for dynamic software
transactional memory. In Proceedings of the Twenty-fourth Annual ACM Symposium on Principles
of Distributed Computing, PODC ’05, pages 240–248, New York, NY, USA, 2005. ACM.
[116] M. L. Scott. Shared-memory Synchronization, Synthesis Lectures on Distributed Computing The-
ory. Morgan and Claypool, 2013.
[117] N. Shavit and D. Touitou. Software transactional memory. In PODC, pages 204–213, 1995.
[118] M. F. Spear, V. J. Marathe, L. Dalessandro, and M. L. Scott. Privatization techniques for software
transactional memory. In Proceedings of the Twenty-sixth Annual ACM Symposium on Principles
of Distributed Computing, PODC ’07, pages 338–339, New York, NY, USA, 2007. ACM.
[119] M. F. Spear, A. Shriraman, L. Dalessandro, S. Dwarkadas, and M. L. Scott. Nonblocking trans-
actions without indirection using alert-on-update. In Proceedings of the Nineteenth Annual ACM
Symposium on Parallel Algorithms and Architectures, SPAA ’07, pages 210–220, New York, NY,
USA, 2007. ACM.
[120] F. Tabba, M. Moir, J. R. Goodman, A. W. Hay, and C. Wang. NZTM: Nonblocking zero-indirection
transactional memory. In Proceedings of the Twenty-first Annual Symposium on Parallelism in
Algorithms and Architectures, SPAA ’09, pages 204–213, New York, NY, USA, 2009. ACM.
[121] G. Taubenfeld. The black-white bakery algorithm and related bounded-space, adaptive,
local-spinning and FIFO algorithms. In DISC ’04: Proceedings of the 23rd International Symposium on
Distributed Computing, 2004.
[122] V. Gramoli, P. Kuznetsov, and S. Ravi. Sharing a sequential data structure: correctness definition and
concurrency analysis. 4th Workshop on the Theory of Transactional Memory, Madeira, Portugal,
2012.
[123] W. E. Weihl. Commutativity-based concurrency control for abstract data types. IEEE Trans.
Comput., 37(12):1488–1505, 1988.
[124] G. Weikum. A theoretical foundation of multi-level concurrency control. In PODS, pages 31–43,
1986.
[125] G. Weikum and G. Vossen. Transactional Information Systems: Theory, Algorithms, and the
Practice of Concurrency Control and Recovery. Morgan Kaufmann, 2002.
[126] M. Yannakakis. Serializability by locking. J. ACM, 31(2):227–244, 1984.

Papers
The content of the thesis is based on the following tech reports and publications.
Tech reports
P. Kuznetsov and S. Ravi. Progressive transactional memory in time and space. CoRR,
abs/1502.04908, 2015.
P. Kuznetsov and S. Ravi. Why transactional memory should not be obstruction-free. CoRR,
abs/1502.02725, 2015.
D. Alistarh, J. Kopinsky, P. Kuznetsov, S. Ravi, and N. Shavit. Inherent limitations of
hybrid transactional memory. CoRR, abs/1405.5689, 2014.
V. Gramoli, P. Kuznetsov, and S. Ravi. Optimism for boosting concurrency. CoRR,
abs/1203.4751, 2012.
P. Kuznetsov and S. Ravi. On partial wait-freedom in transactional memory. CoRR,
abs/1407.6876, 2014.
H. Attiya, S. Hans, P. Kuznetsov, and S. Ravi. Safety of deferred update in transactional
memory. CoRR, abs/1301.6297, 2013.
P. Kuznetsov and S. Ravi. On the cost of concurrency in transactional memory. CoRR,
abs/1103.1302, 2011.
Publications
P. Kuznetsov and S. Ravi. Progressive transactional memory in time and space. CoRR,
abs/1502.04908, 2015. To appear in 13th International Conference on Parallel Computing
Technologies, Russia.
P. Kuznetsov and S. Ravi. Why transactional memory should not be obstruction-free.
CoRR, abs/1502.02725, 2015. To appear in 29th International Symposium on Distributed
Computing (DISC’15), Japan.
D. Alistarh, J. Kopinsky, P. Kuznetsov, S. Ravi, and N. Shavit. Inherent limitations of hybrid
transactional memory. CoRR, abs/1405.5689, 2014. To appear in 29th International
Symposium on Distributed Computing (DISC’15), Japan.
H. Attiya, S. Hans, P. Kuznetsov, and S. Ravi. Safety and deferred update in transactional
memory. In R. Guerraoui and P. Romano, editors, Transactional Memory. Foundations,
Algorithms, Tools, and Applications, volume 8913 of Lecture Notes in Computer Science,
pages 50–71. Springer International Publishing, 2015.
P. Kuznetsov and S. Ravi. On partial wait-freedom in transactional memory. In Proceedings
of the 2015 International Conference on Distributed Computing and Networking, ICDCN
2015, Goa, India, January 4-7, 2015, page 10, 2015.
H. Attiya, S. Hans, P. Kuznetsov, and S. Ravi. Safety of deferred update in transactional
memory. 2013 IEEE 33rd International Conference on Distributed Computing Systems,
0:601–610, 2013.
V. Gramoli, P. Kuznetsov, and S. Ravi. From sequential to concurrent: correctness and
relative efficiency (ba). In Principles of Distributed Computing (PODC), pages 241–242,
2012.
P. Kuznetsov and S. Ravi. On the cost of concurrency in transactional memory. In OPODIS,
pages 112–127, 2011.
Workshop papers
D. Alistarh, J. Kopinsky, P. Kuznetsov, S. Ravi, and N. Shavit. Inherent limitations of
hybrid transactional memory. 6th Workshop on the Theory of Transactional Memory,
Paris, France, 2014.
V. Gramoli, P. Kuznetsov, and S. Ravi. Sharing a sequential data structure: correctness defini-
tion and concurrency analysis. 4th Workshop on the Theory of Transactional Memory,
Madeira, Portugal, 2012.
H. Attiya, S. Hans, P. Kuznetsov, and S. Ravi. What is safe in transactional memory. 4th
Workshop on the Theory of Transactional Memory, Madeira, Portugal, 2012.
Concurrently, I was also involved in the following paper whose contents are not included in the thesis.
V. Gramoli, P. Kuznetsov, S. Ravi, and D. Shang. A concurrency-optimal list-based set.
CoRR, abs/1502.01633, 2015.
V. Gramoli, P. Kuznetsov, S. Ravi, and D. Shang. A concurrency-optimal list-based set
(ba). CoRR, abs/1502.01633, 2015. To appear in 29th International Symposium on
Distributed Computing (DISC’15).
