Safe Non-blocking Synchronization in Ada 202x by Blieberger, Johann & Burgstaller, Bernd
arXiv:1803.10067v2 [cs.PL] 18 Jun 2018
Safe Non-blocking Synchronization in Ada 202x
Johann Blieberger¹ and Bernd Burgstaller²
¹ Institute of Computer Engineering, Automation Systems Group, TU Wien, Austria
² Department of Computer Science, Yonsei University, Korea
Abstract. The mutual-exclusion property of locks stands in the way of
scalability of parallel programs on many-core architectures. Locks do not
allow progress guarantees, because a task may fail inside a critical section
and keep holding a lock that blocks other tasks from accessing shared
data. With non-blocking synchronization, the drawbacks of locks are
avoided by synchronizing access to shared data by atomic read-modify-
write operations.
To incorporate non-blocking synchronization in Ada 202x, programmers
must be able to reason about the behavior and performance of tasks in
the absence of protected objects and rendezvous. We therefore extend
Ada’s memory model by synchronized types, which support the expres-
sion of memory ordering operations at a sufficient level of detail. To mit-
igate the complexity associated with non-blocking synchronization, we
propose concurrent objects as a novel high-level language construct. Enti-
ties of a concurrent object execute in parallel, due to a fine-grained, opti-
mistic synchronization mechanism. Synchronization is framed by the se-
mantics of concurrent entry execution. The programmer is only required
to label shared data accesses in the code of concurrent entries. Labels
constitute memory-ordering operations expressed through attributes. To
the best of our knowledge, this is the first approach to provide a non-
blocking synchronization construct as a first-class citizen of a high-level
programming language. We illustrate the use of concurrent objects by
several examples.
1 Introduction
Mutual exclusion locks are the most common technique to synchronize multi-
ple tasks to access shared data. Ada’s protected objects (POs) implement the
monitor-lock concept [13]. Method-level locking requires a task to acquire an
exclusive lock to execute a PO’s entry or procedure. (Protected functions allow
concurrent read-access in the style of a readers–writers lock [12].) Entries and
procedures of a PO thus effectively execute one after another, which makes it
straightforward for programmers to reason about updates to the shared data
encapsulated by a PO. Informally, sequential consistency ensures that method
calls act as if they occurred in a sequential, total order that is consistent with
the program order of each participating task. I.e., for any concurrent execution,
the method calls to POs can be ordered sequentially such that they (1) are con-
sistent with program order, and (2) meet each PO’s specification (pre-condition,
side-effect, post-condition) [12].
Although the sequential consistency semantics of mutual exclusion locks
facilitate reasoning about programs, they nevertheless introduce potential
concurrency bugs such as dead-lock, live-lock and priority inversion. The
mutual-exclusion property of (highly-contended) locks stands in the way of scalability of
parallel programs on many-core architectures [20]. Locks do not allow progress
guarantees, because a task may fail inside a critical section (e.g., by entering an
endless loop), preventing other tasks from accessing shared data.
Given the disadvantages of mutual exclusion locks, it is thus desirable to
give up on method-level locking and allow method calls to overlap in time. Syn-
chronization is then performed on a finer granularity within a method’s code, via
atomic read-modify-write (RMW) operations. In the absence of mutual exclusion
locks, the possibility of task-failure inside a critical section is eliminated, because
critical sections are reduced to single atomic operations. These atomic operations
are provided either by the CPU’s instruction set architecture (ISA), or the lan-
guage run-time (with the help of the CPU’s ISA). It thus becomes possible to
provide progress guarantees, which are unattainable with locks. In particular, a
method is non-blocking, if a task’s pending invocation is never required to wait
for another task’s pending invocation to complete [12].
Non-blocking synchronization techniques are notoriously difficult to imple-
ment and the design of non-blocking data structures is an area of active research.
To enable non-blocking synchronization, a programming language must provide
a strict memory model. The purpose of a memory model is to define the set of
values a read operation in a program is allowed to return [2].
To motivate the need for a strict memory model, consider the producer-consumer
synchronization example in Fig. 1(a) (adapted from [22] and [5]). The
programmer’s intention is to communicate the value of variable Data from Task 1
to Task 2. Without explicitly requesting a sequentially consistent execution, a
compiler or CPU may break the programmer’s intended synchronization via the
Flag variable by re-ordering memory operations, which can result in reading R2
= 0 in Line 6 of Task 2. (E.g., a store–store re-ordering of the assignments in
lines 2 and 3 of Task 1 will allow this result.) In Ada 2012, such re-orderings
can be ruled out by labeling variables Data and Flag by aspect volatile. The
corresponding variable declarations are depicted in Fig. 1(b). (Note that by [9,
C.6§8/3] aspect atomic implies aspect volatile, but not vice versa.)
(a)
    1 -- Initial values:
    2 Flag := False;
    3 Data := 0;

    1 -- Task 1:
    2 Data := 1;
    3 Flag := True;

    1 -- Task 2:
    2 loop
    3    R1 := Flag;
    4    exit when R1;
    5 end loop;
    6 R2 := Data;
(b)
    1 Data : Integer with Volatile;
    2 Flag : Boolean with Atomic;
Fig. 1: (a) Producer-consumer synchronization in pseudo-code: Task 1 writes the
Data variable and then signals Task 2 by setting the Flag variable. Task 2 is
spinning on the Flag variable (lines 2 to 5) and then reads the Data variable.
(b) Labeling to enforce sequential consistency in Ada 2012.
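For readers who want to experiment with the hand-off of Fig. 1(a) today, the following is a minimal sketch in C++ (chosen because the Ada 202x syntax proposed in this paper is not compilable, and because C++11 atomics follow the memory model discussed here). The function name run_sc_handoff is hypothetical. Both variables are declared atomic and accessed with the default sequentially consistent order, which is an assumption about how the intent of Fig. 1(b) maps to C++, not a statement about Ada's semantics.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper mirroring Fig. 1(a). Default std::atomic operations
// use std::memory_order_seq_cst, so no store-store re-ordering of the two
// writes in task1 can become visible to task2.
inline int run_sc_handoff() {
    std::atomic<int> data{0};
    std::atomic<bool> flag{false};
    int r2 = -1;

    std::thread task1([&] {
        data.store(1);     // line 2 of Task 1
        flag.store(true);  // line 3 of Task 1
    });
    std::thread task2([&] {
        while (!flag.load()) {  // lines 2-5 of Task 2: spin on Flag
            std::this_thread::yield();
        }
        r2 = data.load();       // line 6 of Task 2
    });

    task1.join();
    task2.join();
    return r2;
}
```

Under sequential consistency the function can only return 1, because Task 2 leaves its loop only after observing the write to the flag, which follows the write to the data in the total order.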
The intention for volatile variables in Ada 2012 was to guarantee that all
tasks agree on the same order of updates [9, C.6§16/3]. Updates of volatile
variables are thus required to be sequentially consistent, in the sense of Lamport’s
definition [14]: “With sequential consistency (SC), any execution has a total order
over all memory writes and reads, with each read reading from the most recent
write to the same location”.
However, the Ada 2012 aspect volatile has the following shortcomings:
1. Ensuring SC for multiple tasks without atomic access is impossible. Non-
atomic volatile variables therefore should not be provided by the language.
Otherwise, the responsibility shifts from the programming language imple-
mentation to the programmer to ensure SC by pairing an atomic (implied
volatile) variable with each non-atomic volatile variable (see, e.g., Fig. 1(b)
and [21] for examples). (Note that a programming language implementation
may ensure atomicity by a mutual exclusion lock if no hardware primitives
for atomic access to a particular type of shared data are available.)
2. Requiring SC on all shared variables is costly in terms of performance on
contemporary multi-core CPUs. In Fig. 1, performance can be improved by
allowing a less strict memory order for variable Data (to be addressed in
Section 2).
3. Although Ada provides the highly abstract PO monitor-construct for block-
ing synchronization, there is currently no programming primitive available
to match this abstraction level for non-blocking synchronization.
Contemporary CPU architectures relax SC for the sake of performance [3,10,22].
It is a challenge for programming language designers to provide safe, efficient and
user-friendly non-blocking synchronization features. The original memory model
for Java contained problems and had to be revised [15]. It was later found to be
unsound with standard compiler optimizations [23]. The C++11 standard (cf.
[1,24]) has already specified a strict memory model for concurrent and parallel
computing. We think that C++11 was not entirely successful, neither in terms of
safety nor in terms of user-friendliness. In contrast, we are convinced that
these challenges can be met in the upcoming Ada 202x standard.
It has been felt since Ada 95 that it might be advantageous to have language
support for synchronization based on atomic variables. For example, we cite [11,
C.1]: “A need to access specific machine instructions arises sometimes from other
considerations as well. Examples include instructions that perform compound
operations atomically on shared memory, such as test-and-set and compare-and-
swap, and instructions that provide high-level operations, such as translate-and-
test and vector arithmetic.”
Ada is already well-positioned to provide a strict memory model in conjunc-
tion with support for non-blocking synchronization, because it provides tasks as
first-class citizens. This rules out inconsistencies that may result from thread-
functionality provided through libraries [7].
To provide safe and efficient non-blocking synchronization for Ada 202x, this
paper makes the following contributions:
1. We extend Ada’s memory model by introducing synchronized types, which
allow the expression of memory ordering operations consistently and at a
sufficient level of detail. Memory ordering operations are expressed through
aspects and attributes. Language support for spin loop synchronization via
synchronized variables is proposed.
2. We propose concurrent objects (COs) as a high-level language construct to
express non-blocking synchronization. COs are meant to encapsulate the
intricacies of non-blocking synchronization as POs do for blocking synchro-
nization. Contrary to POs, the entries and procedures of COs execute in
parallel, due to a fine-grained, optimistic synchronization mechanism.
3. We provide an alternative, low-level API on synchronized types, which pro-
vides programmers with full control over the implementation of non-blocking
synchronization semantics. Our main purpose with the low-level API is to
provoke a discussion on the trade-off between abstraction versus flexibility.
4. We illustrate the use of concurrent objects and the alternative, low-level API
by several examples.
The remainder of this paper is organized as follows. We summarize the state-
of-the-art on memory models and introduce synchronized variables in Sec. 2. We
introduce attributes for specifying memory ordering operations in Sec. 3. We
specify concurrent objects in Sec. 4 and discuss task scheduling in the presence
of COs in Sec. 5. Sec. 6 contains two CO example implementations with varying
memory consistency semantics. We discuss our low-level API in Sec. 7. Sec. 8
contains our conclusions.
This paper is an extension of work that appeared at the Ada-Europe 2018
conference [6]. Additional material is confined to two appendices: Appendix A
states the design-decisions of our proposed non-blocking synchronization mech-
anisms. Appendix B contains further examples.
2 The Memory Model
For reasons outlined in Sec. 1, we do not consider the Ada 2012 atomic and
volatile types here. Rather, we introduce synchronized types and variables. Syn-
chronized types provide atomic access. We propose aspects and attributes for
specifying a particular memory model to be employed for reading/writing syn-
chronized variables.
Modern multi-core computer architectures are equipped with a memory
hierarchy that consists of main memory, caches and registers. It is important to
distinguish between memory consistency and coherence. We cite from [22]: ‘For
a shared memory machine, the memory consistency model defines the architec-
turally visible behavior of its memory system. Consistency definitions provide
rules about loads and stores (or memory reads and writes) and how they act
upon memory. As part of supporting a memory consistency model, many
machines also provide cache coherence protocols that ensure that multiple cached
copies of data are kept up-to-date.’
The purpose of a memory consistency model (or memory model, for short)
is to define the set of values a read operation is allowed to return [2]. To fa-
cilitate programmers’ intuition, it would be ideal if all read/write operations of
a program’s tasks are sequentially consistent. However, the hardware memory
models provided by contemporary CPU architectures relax SC for the sake of
performance [3,10,22]. Enforcing SC on such architectures may incur a notice-
able performance penalty. The workable middle-ground between intuition (SC)
and performance (relaxed hardware memory models) has been established with
SC for data race-free programs (SC-for-DRF) [4]. Informally, a program has a
data race if two tasks access the same memory location, at least one of them is a
write, and there are no intervening synchronization operations that would order
the accesses. “SC-for-DRF” requires programmers to ensure that programs are
free of data races under SC. In turn, the relaxed memory model of an SC-for-DRF
CPU guarantees SC for all executions of such a program.
It has been acknowledged in the literature [2] that Ada 83 was perhaps the
first widely-used high-level programming language to provide first-class support
for shared-memory programming. The approach taken with Ada 83 and later
language revisions was to require legal programs to be without synchronization
errors, which is the approach taken with SC-for-DRF. In contrast, for the Java
memory model it was perceived that even programs with synchronization er-
rors shall have defined semantics for reasons of safety and security of Java’s
sand-boxed execution environment. (We do not consider this approach in the re-
mainder of this paper, because it does not align with Ada’s current approach to
regard the semantics of programs with synchronization errors as undefined, i.e.,
as an erroneous execution, by [9, 9.10§11].) The SC-for-DRF programming model
and two relaxations were formalized for C++11 [8]. They were later adopted for
C11, OpenCL 2.0, and for X10 [26] (without the relaxations).
To guarantee DRF at the programming language level, means for synchronization
(ordering operations) have to be provided. Ada’s POs are well-suited for
this purpose. For non-blocking synchronization, atomic operations can be used
to enforce an ordering between the memory accesses of two tasks. It is one goal
of this paper to add language features to Ada such that atomic operations can
be employed with DRF programs. To avoid ambiguity, we propose synchronized
variables and types, which support the expression of memory ordering operations
at a sufficient level of detail (see Sec. 3.1).
The purpose of synchronized variables is that they can be used to safely trans-
fer information (i.e., the value of the variables) from one task to another. ISAs
provide atomic load/store instructions only for a limited set of primitive types.
Beyond those, atomicity can only be ensured by locks. Nevertheless, computer
architectures provide memory fences (see e.g., [12]) to provide means for ordering
memory operations. A memory fence requires that all memory operations before
the fence (in program order) must be committed to the memory hierarchy before
any operation after the fence. Then, for data to be transferred from one thread to
another, the data itself no longer needs to be atomic. I.e., it is sufficient that (1) the
signaling variable is atomic, and that (2) all write operations are committed to
the memory hierarchy before setting the signaling variable. On the receiver’s
side, it must be ensured that (3) the signaling variable is read atomically, and
that (4) memory loads for the data occur after reading the signaling variable.
(Listing 1.2 provides an example.)
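The four conditions above can be sketched with explicit fences in C++. This is an illustrative sketch only: Payload and transfer_with_fences are hypothetical names, and std::atomic_thread_fence is used as a stand-in for the hardware memory fences mentioned in the text.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>

// Hypothetical non-atomic payload that is too large for an atomic access.
struct Payload {
    int id = 0;
    std::string text;
};

inline Payload transfer_with_fences() {
    Payload shared;                   // plain, non-atomic data
    std::atomic<bool> signal{false};  // (1) atomic signaling variable
    Payload received;

    std::thread sender([&] {
        shared.id = 7;
        shared.text = "hello";
        // (2) all earlier writes are ordered before the signal is set
        std::atomic_thread_fence(std::memory_order_release);
        signal.store(true, std::memory_order_relaxed);
    });
    std::thread receiver([&] {
        // (3) the signaling variable is read atomically
        while (!signal.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // (4) loads of the data are ordered after reading the signal
        std::atomic_thread_fence(std::memory_order_acquire);
        received = shared;
    });

    sender.join();
    receiver.join();
    return received;
}
```

The signaling variable itself is accessed with relaxed order here; the ordering comes entirely from the two fences, which is the situation the paragraph describes.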
In addition to synchronized variables, synchronized types and aspect
Synchronized_Components are convenient means for enhancing the usefulness
of synchronized variables.
The general idea of our proposed approach is to define non-blocking con-
current objects similar to protected objects (cf. e.g., [12]). However, entries of
concurrent objects will not block on guards; they will spin loop until the guard
evaluates to true. In addition, functions, procedures, and entries of concurrent
objects are allowed to execute and to modify the encapsulated data in parallel.
Private entries for concurrent objects are also supported. It is the responsibility
of these operations that the data provides a consistent view to the users of the concurrent object.
Concurrent objects will use synchronized types for synchronizing data access.
Several memory models are provided for doing this efficiently. It is the respon-
sibility of the programmer to ensure that the entries of a concurrent object are
free from data races (DRF). For such programs, the non-blocking semantics of
a concurrent object will provide SC in the same way as protected objects do for
blocking synchronization.
2.1 Synchronizing memory operations and enforcing ordering
For defining ordering relations on memory operations, it is helpful to first
introduce two auxiliary relations.
The synchronizes-with relation can be achieved only by use of atomic types.
Even if monitors or protected objects are used for synchronization, the runtime
implements them employing atomic types. The general idea is to equip read and
write operations on an atomic variable with information that will enforce an
ordering on the read and write operations. Our proposal is to use attributes for
specifying this ordering information. Details can be found below.
The happens-before relation is the basic relation for ordering operations in
programs. In a program consisting of only one single thread, happens-before
is straightforward. For inter-thread happens-before relations the synchronizes-
with relation becomes important. If operation X in one thread synchronizes-with
operation Y in another thread, then X happens-before Y. Note that the happens-
before relation is transitive, i.e., if X happens-before Y and Y happens-before Z,
then X happens-before Z. This is true even if X, Y, and Z are part of different
threads.
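Transitivity of happens-before across threads can be illustrated with a chain of two release/acquire pairs. The sketch below is in C++ with hypothetical names (chained_handoff, ab, bc); it assumes that a release store read by an acquire load is an instance of the synchronizes-with relation described above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Thread A writes x and signals B; B signals C; C reads x.
// A's write happens-before B's signal (synchronizes-with via ab),
// B's signal happens-before C's read (synchronizes-with via bc),
// so by transitivity A's write happens-before C's read.
inline int chained_handoff() {
    int x = 0;  // plain data, written only by thread A
    std::atomic<bool> ab{false};
    std::atomic<bool> bc{false};
    int seen = -1;

    std::thread a([&] {
        x = 42;
        ab.store(true, std::memory_order_release);
    });
    std::thread b([&] {
        while (!ab.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        bc.store(true, std::memory_order_release);
    });
    std::thread c([&] {
        while (!bc.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = x;  // never synchronized directly with thread A
    });

    a.join();
    b.join();
    c.join();
    return seen;
}
```

Thread C never communicates with thread A directly, yet it is guaranteed to read 42.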
We define different memory models. These memory models originated from
the DRF [4] and properly-labeled [10] hardware memory models. They were
formalized for the memory model of C++ [8]. The “sequentially consistent”
and “acquire-release” memory models provide SC for DRF. The models can
have varying costs on different computer architectures. The “acquire-release”
memory model is a relaxation of the “sequentially consistent” memory model.
As described in Table 1, it requires concessions from the programmer to weaken
SC in return for more flexibility for the CPU to re-order memory operations.

memory order      involved  constraints for reordering memory accesses
                  threads   (for compilers and CPUs)
----------------  --------  ------------------------------------------------
relaxed           1         no inter-thread constraints
release/acquire   2         (1) ordinary⁴ stores originally⁵ before release
                            (in program order) will happen before the
                            release fence (after compiler optimizations and
                            CPU reordering)
                            (2) ordinary loads originally after acquire (in
                            program order) will take place after the acquire
                            fence (after compiler optimizations and CPU
                            reordering)
sequentially      all       (1) all memory accesses originally before the
consistent                  sequentially consistent one (in program order)
                            will happen before the fence (after compiler
                            optimizations and CPU reordering)
                            (2) all memory accesses originally after the
                            sequentially consistent one (in program order)
                            will happen after the fence (after compiler
                            optimizations and CPU reordering)
Table 1: Memory Order and Constraints for Compilers and CPUs
Sequentially Consistent Ordering is the most stringent model and the eas-
iest one for programmers to work with. In this case all threads see the same,
total order of operations. This means that a sequentially consistent write to a
synchronized variable synchronizes-with a sequentially-consistent read of the
same variable.
Relaxed Ordering does not obey synchronizes-with relationships, but opera-
tions on the same synchronized variable within a single thread still obey
happens-before relationships. This means that although one thread may
write a synchronized variable, at a later point in time another thread may
read an earlier value of this variable.
Acquire-Release Ordering introduces some synchronization compared to relaxed
ordering. A read operation on a synchronized variable can be labeled by
acquire, and a write operation can be labeled by release. Synchronization
is pairwise between the thread that issues the release and the thread that
performs the first read-acquire after the release.³ A thread issuing a
read-acquire later may read a different value than that written by the
first thread.
3 In global time!
4 Memory accesses other than accesses to synchronized variables
5 Before optimizations performed by the compiler and before reordering done by the
CPU.
It is important to note that the semantics of the models above have to
be enforced by the compiler (for programs which are DRF). I.e., the compiler
“knows” the relaxed memory model of the hardware and inserts memory fences
in the machine-code such that the memory model of the high-level programming
language is enforced. Compiler optimizations must ensure that reordering of
operations is performed in such a way that the semantics of the memory model
are not violated. The same applies to CPUs, i.e., reordering of instructions is
done with respect to the CPU’s relaxed hardware memory model, constrained
by the ordering semantics of fences inserted by the compiler. The constraints
enforced by the memory model are summarized in Table 1.
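As a small illustration of the “relaxed” row of Table 1, the following C++ sketch (hypothetical name count_relaxed) increments a shared counter from several threads using relaxed read-modify-write operations. It assumes the C++ reading of relaxed ordering: each increment is still atomic, but it imposes no ordering on other memory accesses.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Each thread performs per_thread relaxed increments on a shared counter.
inline long count_relaxed(int threads, int per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // Atomic, but without inter-thread ordering constraints.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) {
        th.join();  // join orders all increments before the final read
    }
    return counter.load(std::memory_order_relaxed);
}
```

No increment is lost even though no fences are requested; the final value is reliable only because joining the threads establishes the required happens-before edge.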
3 Synchronization primitives
3.1 Synchronized Variables
Synchronized variables can be used as atomic variables in Ada 2012, the only
exception being that they are declared inside the lexical scope (data part) of a
concurrent object. In this case aspects and attributes used in the implementation
of the concurrent object’s operations (functions, procedures, and entries) are
employed for specifying behavior according to the memory model. Variables are
labeled by the boolean aspect Synchronized.
Read accesses to synchronized variables in the implementation of the concurrent
object’s operations may be labeled with the attribute Concurrent_Read,
write accesses with the attribute Concurrent_Write. Both attributes have a
parameter Memory_Order to specify the memory order of the access. (If the
operations are not labeled, the default values given below apply.) For read
accesses, the values allowed for parameter Memory_Order are
Sequentially_Consistent, Acquire, and Relaxed. The default value is
Sequentially_Consistent. For write accesses the values allowed are
Sequentially_Consistent, Release, and Relaxed. The default value is again
Sequentially_Consistent.
For example, assigning the value of synchronized variable Y to synchronized
variable X is written as
  X'Concurrent_Write(Memory_Order => Release) :=
    Y'Concurrent_Read(Memory_Order => Acquire);
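For comparison, a rough C++ counterpart of this labeled assignment is sketched below. The function name copy_acquire_release is hypothetical, and the mapping of the proposed Ada attributes to std::atomic member functions is our assumption for illustration.

```cpp
#include <atomic>
#include <cassert>

// Read y with acquire order and write the result to x with release order.
inline void copy_acquire_release(std::atomic<int>& x,
                                 const std::atomic<int>& y) {
    x.store(y.load(std::memory_order_acquire), std::memory_order_release);
}
```

As in the Ada notation, the memory order is attached to each individual access rather than to the variable.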
In addition we propose aspects for specifying variable-specific default values
for the attributes described above. When declaring synchronized variables, the
default values for read and write accesses can be specified via aspects
Memory_Order_Read and Memory_Order_Write. The allowed values are the same as
those given above for read and write accesses. If these memory model aspects
are given when declaring a synchronized variable, the attributes
Concurrent_Read and Concurrent_Write need not be given for actual read and
write accesses of this variable. However, these attributes may be used to
override, for an individual access, the default values specified for the
variable by the aspects.
For example
  X: Integer with Synchronized, Memory_Order_Write => Release;
  Y: Integer with Synchronized, Memory_Order_Read => Acquire;
  . . .
  X := Y;
does the same as the example above but without cluttering the assignment
statement.
Aspect Synchronized_Components relates to aspect Synchronized in the
same way as Atomic_Components relates to Atomic in Ada 2012.
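The idea of per-variable default memory orders can be approximated in C++ with a thin wrapper around std::atomic. The class template Synchronized below is a hypothetical sketch, not part of any library; its template parameters play the role of the Memory_Order_Read and Memory_Order_Write aspects, and the read/write member functions play the role of per-access attributes.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical wrapper: default memory orders are fixed at declaration,
// so plain reads and assignments need no per-access labels.
template <typename T,
          std::memory_order ReadOrder = std::memory_order_seq_cst,
          std::memory_order WriteOrder = std::memory_order_seq_cst>
class Synchronized {
    static_assert(ReadOrder != std::memory_order_release &&
                  ReadOrder != std::memory_order_acq_rel,
                  "invalid default order for a read access");
    static_assert(WriteOrder != std::memory_order_acquire &&
                  WriteOrder != std::memory_order_consume &&
                  WriteOrder != std::memory_order_acq_rel,
                  "invalid default order for a write access");

public:
    explicit Synchronized(T initial = T{}) : value_(initial) {}

    // Plain read: uses the declared default read order.
    operator T() const { return value_.load(ReadOrder); }

    // Plain assignment: uses the declared default write order.
    Synchronized& operator=(T v) {
        value_.store(v, WriteOrder);
        return *this;
    }

    // Per-access override of the defaults.
    T read(std::memory_order order) const { return value_.load(order); }
    void write(T v, std::memory_order order) { value_.store(v, order); }

private:
    std::atomic<T> value_;
};
```

With X declared as Synchronized<int, std::memory_order_seq_cst, std::memory_order_release> and Y as Synchronized<int, std::memory_order_acquire>, the statement x = y; performs an acquire read followed by a release write, much like the unlabeled Ada assignment.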
3.2 Read-Modify-Write Variables
If a variable inside the data part of a concurrent object is labeled by the aspect
Read_Modify_Write, this implies that the variable is synchronized. Write access
to a read-modify-write variable in the implementation of the concurrent object’s
operations is a read-modify-write access. The read-modify-write access is done
via the attribute Concurrent_Exchange. The two parameters of this attribute
are Memory_Order_Success and Memory_Order_Failure. The first specifies the
memory order for a successful write, the second one the memory order if the
write access fails (and a new value is assigned to the variable).
Memory_Order_Success is one of Sequentially_Consistent, Acquire,
Release, and Relaxed.
Memory_Order_Failure may be one of Sequentially_Consistent, Acquire,
and Relaxed. The default value for both is Sequentially_Consistent. For
the same read-modify-write access the memory order specified for failure must
not be stricter than that specified for success. So, if Memory_Order_Failure =>
Acquire or Memory_Order_Failure => Sequentially_Consistent is specified,
these must also be given for success.
For read access to a read-modify-write variable, attribute Concurrent_Read
has to be used. The parameter Memory_Order has to be given. Its value is one of
Sequentially_Consistent, Acquire, Relaxed. The default value is
Sequentially_Consistent.
Again, aspects for variable-specific default values for the attributes described
above may be specified when declaring a read-modify-write variable. The aspects
are Memory_Order_Read, Memory_Order_Write_Success, and
Memory_Order_Write_Failure with allowed values as above.
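The separate success and failure orders have a direct counterpart in the C++ compare-exchange operations. The sketch below uses a hypothetical helper try_replace; treating compare_exchange_strong as the analogue of Concurrent_Exchange is our assumption for illustration.

```cpp
#include <atomic>
#include <cassert>

// Attempt to replace `expected` by `desired`.
// Success uses release order; failure uses relaxed order (not stricter
// than the success order). On failure, `expected` is overwritten with the
// value currently stored in `var`.
inline bool try_replace(std::atomic<int>& var, int& expected, int desired) {
    return var.compare_exchange_strong(expected, desired,
                                       std::memory_order_release,
                                       std::memory_order_relaxed);
}
```

The update of `expected` on failure corresponds to the remark above that a new value is assigned when the write access fails.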
3.3 Synchronization Loops
As presented below, synchronization via synchronized variables is performed by
spin loops. We call these loops sync loops.
4 Concurrent Objects
4.1 Non-Blocking Synchronization
Besides the aspects and attributes proposed in Section 3 that have to be used
for implementing concurrent objects, concurrent objects are different from pro-
tected objects in the following way. All operations of concurrent objects can be
executed in parallel. Synchronized variables have to be used for synchronizing
the executing operations. Entries have Boolean-valued guards. The Boolean ex-
pressions for such guards may contain only synchronized variables declared in
the data part of the protected object and constants. Calling an entry results
either in immediate execution of the entry’s body if the guard evaluates to true,
or in spin-looping until eventually the guard evaluates to true. We call such a
spin loop sync loop.
4.2 Read-Modify-Write Synchronization
For concurrent objects with read-modify-write variables the attributes proposed
in Section 3 apply. All operations of concurrent objects can be executed in paral-
lel. Read-modify-write variables have to be used for synchronizing the executing
operations. The guards of entries have to be of the form X = X’OLD where X de-
notes a read-modify-write variable of the concurrent object. The attribute OLD
is well-known from postconditions. An example in our context can be found in
Listing 1.1.
If during the execution of an entry a read-modify-write operation is reached,
that operation might succeed immediately, in which case execution proceeds after
the operation in the normal way. If the operation fails, the execution of the
entry is restarted (implicit sync loop). More precisely, only the statements
that are data-dependent on the read-modify-write variable are re-executed.
Statements that are not data-dependent on the read-modify-write variable are
executed only on the first try.⁶ Precluding non-data-dependent statements from
re-execution is not only a matter of efficiency, it sometimes makes sense
semantically, e.g., for adding heap management to an implementation.
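The restart behavior can be mimicked by hand with a compare-exchange loop. In the C++ sketch below (hypothetical name add_scaled), the computation of delta does not depend on the shared variable and is executed once, whereas the computation of the new value is repeated on every retry. In the proposed Ada construct this loop would be implicit.

```cpp
#include <atomic>
#include <cassert>

// Adds amount * factor to `total` and returns the number of attempts.
inline int add_scaled(std::atomic<int>& total, int amount, int factor) {
    // Not data-dependent on `total`: executed only on the first try.
    const int delta = amount * factor;

    int attempts = 0;  // bookkeeping for illustration only
    int old_value = total.load(std::memory_order_relaxed);
    int new_value;
    do {
        ++attempts;
        // Data-dependent on `total`: re-executed after each failed attempt.
        // On failure, old_value has been refreshed with the current value.
        new_value = old_value + delta;
    } while (!total.compare_exchange_weak(old_value, new_value,
                                          std::memory_order_release,
                                          std::memory_order_relaxed));
    return attempts;
}
```

Because compare_exchange_weak may fail spuriously, the number of attempts is at least one but is otherwise not predictable, even without contention.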
5 Scheduling and Dispatching
We propose a new state for Ada tasks to facilitate correct scheduling and dis-
patching for threads synchronizing via synchronized or read-modify-write types.
If a thread is in a sync loop, the thread state changes to “in sync loop”. Note
that sync loops can only happen inside concurrent objects. Thus they can be
spotted easily by the compiler and cannot be confused with “normal” loops.
Note also that it makes sense for the state change not to take place during the
first iteration of the sync loop, because the synchronization may succeed
immediately. For read-modify-write loops, changing the state from the third
iteration on may be a good choice; for spin loops, changing it from the second
iteration on may be a good choice.
In this way the runtime can guarantee that not all available CPUs (cores)
are occupied by threads in state “in sync loop”. Thus we can be sure that at
least one thread makes progress and finally all synchronized or read-modify-write
variables are released (if the program’s synchronization structure is correct and
the program does not deadlock).
After leaving a sync loop, the thread state changes back to “runnable”.
6 For the case that the compiler cannot figure out which statements are
data-dependent, we propose an additional Boolean aspect
only_execute_on_first_try to tag non-data-dependent statements.
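Portable C++ offers no task state comparable to “in sync loop”, but the intent can be approximated in user code. In the sketch below, sync_loop and its parameter spin_before_yield are hypothetical, and std::this_thread::yield is only a stand-in for the proposed state change: it hints to the scheduler that another thread should run, which is weaker than the guarantee described above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Spins until guard() returns true and returns the number of failed checks.
// The first spin_before_yield failed checks spin without yielding, because
// the synchronization may succeed almost immediately.
template <typename Guard>
inline unsigned sync_loop(Guard guard, unsigned spin_before_yield = 1) {
    unsigned failed_checks = 0;
    while (!guard()) {
        ++failed_checks;
        if (failed_checks > spin_before_yield) {
            // Stand-in for entering state "in sync loop".
            std::this_thread::yield();
        }
    }
    return failed_checks;
}
```

A guard that is already true is evaluated exactly once and the function returns 0 without ever yielding.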
6 Examples
Non-blocking Stack. Listing 1.1 shows an implementation of a non-blocking
stack using our proposed new syntax for concurrent objects.
 1 subtype Data is Integer;
 2
 3 type List;
 4 type List_P is access List;
 5 type List is
 6    record
 7       D: Data;
 8       Next: List_P;
 9    end record;
10
11 Empty: exception;
12
13 concurrent Lock_Free_Stack
14 is
15    entry Push(D: Data);
16    entry Pop(D: out Data);
17 private
18    Head: List_P with Read_Modify_Write,
19       Memory_Order_Read => Relaxed,
20       Memory_Order_Write_Success => Release,
21       Memory_Order_Write_Failure => Relaxed;
22 end Lock_Free_Stack;
23
24 concurrent body Lock_Free_Stack is
25    entry Push (D: Data)
26       until Head = Head'OLD is
27       New_Node: List_P := new List;
28    begin
29       New_Node.all := (D => D, Next => Head);
30       Head := New_Node;
31    end Push;
32
33    entry Pop(D: out Data)
34       until Head = Head'OLD is
35       Old_Head: List_P;
36    begin
37       Old_Head := Head;
38       if Old_Head /= null then
39          Head := Old_Head.Next;
40          D := Old_Head.D;
41       else
42          raise Empty;
43       end if;
44    end Pop;
45 end Lock_Free_Stack;
Listing 1.1: Non-blocking Stack Implementation Using Proposed New Syntax
The implementation of entry Push (lines 25–31) behaves as follows. In Line 29 a
new element is inserted at the head of the list. Pointer Next of this element is
set to the current head. The next statement (Line 30) assigns the new value to
the head of the list. Since variable Head has aspect Read Modify Write (line 18),
this is done with RMW semantics, i.e., if the value of Head has not been changed
(since the execution of Push has started) by a different thread executing Push
or Pop (i.e., Head = Head’OLD), then the RMW operation succeeds and exe-
cution proceeds at Line 31, i.e., Push returns. If the value of Head has been
changed (Head /= Head’OLD), then the RMW operation fails and entry Push is
re-executed starting from Line 29. Line 27 is not re-executed as it is not data
dependent on Head.
Several memory order attributes apply to the RMW operation (Line 30)
which are given in lines 19–21: In case of a successful execution of the RMW, the
value of Head is released such that other threads can read its value via memory
order acquire. In the failure case the new value of Head is assigned to the “local
copy” of Head (i.e., Head’OLD) via relaxed memory order. “Relaxed” is enough
because RMW semantics will detect if the value of Head has been changed by a
different thread anyway. The same applies to “Relaxed” in Line 19.
Implementation of entry Pop (lines 33–44) follows along the same lines.
Memory management needs special consideration: In our case it is enough to
use a synchronized counter that counts the number of threads inside Pop. If the
counter equals 1, the current thread is the only one inside Pop, so no other
thread can still access the removed node and its memory can be freed. Ada’s
storage pools are a well-suited means for doing this without cluttering the code.
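For readers more familiar with C++11 atomics, the retry behavior of Push and Pop
can be approximated by an explicit compare-and-swap loop. The following is only
a sketch under stated assumptions, not the proposed Ada construct: the class
name LockFreeStack, the retire list, and the chosen memory orders are
illustrative. In particular, popped nodes are parked on a second list and freed
only in the destructor (instead of the counter-based scheme described above),
which sidesteps reclamation races and ABA at the cost of memory that grows with
the number of pops. The code requires C++17 for std::optional.

```cpp
#include <atomic>
#include <optional>
#include <utility>

// Illustrative sketch of a lock-free stack built on compare-and-swap.
template <typename T>
class LockFreeStack {
    struct Node {
        T value;
        Node* next;          // link in the live stack, fixed once the node is published
        Node* retired_next;  // link in the list of popped nodes
    };

    std::atomic<Node*> head_{nullptr};
    std::atomic<Node*> retired_{nullptr};

    // Assumption of this sketch: popped nodes are never reused while the
    // stack is alive; they are parked here and freed in the destructor.
    void retire(Node* n) {
        n->retired_next = retired_.load(std::memory_order_relaxed);
        while (!retired_.compare_exchange_weak(n->retired_next, n,
                                               std::memory_order_release,
                                               std::memory_order_relaxed)) {
        }
    }

public:
    LockFreeStack() = default;
    LockFreeStack(const LockFreeStack&) = delete;
    LockFreeStack& operator=(const LockFreeStack&) = delete;

    ~LockFreeStack() {
        for (Node* n = head_.load(); n != nullptr;) {
            Node* nx = n->next;
            delete n;
            n = nx;
        }
        for (Node* n = retired_.load(); n != nullptr;) {
            Node* nx = n->retired_next;
            delete n;
            n = nx;
        }
    }

    void push(T value) {
        Node* n = new Node{std::move(value),
                           head_.load(std::memory_order_relaxed), nullptr};
        // On failure, n->next is refreshed with the current head and the
        // exchange is retried; the allocation itself is not repeated.
        while (!head_.compare_exchange_weak(n->next, n,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    std::optional<T> pop() {
        Node* old_head = head_.load(std::memory_order_acquire);
        // Acquire is used on both paths so that old_head->next is visible
        // whenever old_head is (re)loaded.
        while (old_head != nullptr &&
               !head_.compare_exchange_weak(old_head, old_head->next,
                                            std::memory_order_acquire,
                                            std::memory_order_acquire)) {
        }
        if (old_head == nullptr) {
            return std::nullopt;  // empty stack
        }
        T value = std::move(old_head->value);
        retire(old_head);
        return value;
    }
};
```

Unlike entry Pop in Listing 1.1, this sketch reports an empty stack through an
empty std::optional rather than by raising an exception.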
This example also shows how easy it is to migrate from a (working) blocking
to a (working) non-blocking implementation of a program. Assume that a work-
ing implementation with a protected object exists, then one has to follow these
steps:
1. Replace keyword protected by keyword concurrent.
2. Replace protected operations by DRF concurrent operations, thereby adding
appropriate guards to the concurrent entries.
3. Test the non-blocking program which now has default memory order
sequentially consistent.
4. Carefully relax the memory ordering requirements: Add memory order as-
pects and/or attributes Acquire, Release, and/or Relaxed to improve per-
formance but without violating memory consistency.
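The same migration path can be illustrated with C++11 primitives. The sketch
below is an analogy only; the three class names and the "running maximum"
example are hypothetical and do not appear in the proposal. BlockingMax plays
the role of the protected object, NonBlockingMax corresponds to the state after
steps 1–3 (non-blocking, default sequentially consistent ordering), and
RelaxedMax corresponds to step 4.

```cpp
#include <atomic>
#include <mutex>

// Starting point: blocking implementation guarded by a lock.
class BlockingMax {
    std::mutex m_;
    int max_ = 0;
public:
    void update(int v) {
        std::lock_guard<std::mutex> guard(m_);
        if (v > max_) max_ = v;
    }
    int get() {
        std::lock_guard<std::mutex> guard(m_);
        return max_;
    }
};

// After steps 1-3: non-blocking, default (sequentially consistent) ordering.
class NonBlockingMax {
    std::atomic<int> max_{0};
public:
    void update(int v) {
        int old = max_.load();
        // Retry until v is stored or a value >= v is observed.
        while (v > old && !max_.compare_exchange_weak(old, v)) {
        }
    }
    int get() const { return max_.load(); }
};

// After step 4: relaxed ordering. This is only safe under the assumption
// that the stored maximum is not used to publish other shared data.
class RelaxedMax {
    std::atomic<int> max_{0};
public:
    void update(int v) {
        int old = max_.load(std::memory_order_relaxed);
        while (v > old &&
               !max_.compare_exchange_weak(old, v,
                                           std::memory_order_relaxed,
                                           std::memory_order_relaxed)) {
        }
    }
    int get() const { return max_.load(std::memory_order_relaxed); }
};
```

All three classes expose the same interface, so a test suite written for the
blocking version can be reused unchanged while the ordering is relaxed.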
Generic Release-Acquire Object. Listing 1.2 shows how release-acquire
semantics can be implemented for general data structures with the help of a
single synchronized Boolean.
generic
   type Data is private;
package Generic_Release_Acquire is

   concurrent RA
   is
      procedure Write (D: Data);
      entry Get (D: out Data);
   private
      Ready: Boolean := False with Synchronized,
         Memory_Order_Read => Acquire,
         Memory_Order_Write => Release;
      Da: Data;
   end RA;

end Generic_Release_Acquire;

package body Generic_Release_Acquire is

   concurrent body RA is

      procedure Write (D: Data) is
      begin
         Da := D;
         Ready := True;
      end Write;

      entry Get (D: out Data)
         when Ready is
         -- spin-lock until released, i.e., Ready = True;
         -- only sync. variables and constants allowed in guard expression
      begin
         D := Da;
      end Get;
   end RA;

end Generic_Release_Acquire;
Listing 1.2: Generic Release-Acquire Object
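A rough C++11 counterpart of this pattern is sketched below, assuming a single
writer that calls write once and a payload type that is default-constructible
and copyable. The class name ReleaseAcquire and the extra non-blocking probe
try_get are illustrative additions and not part of the proposal. The release
store to the flag publishes the preceding plain write to the payload, and the
acquire load that observes the flag makes that write visible to the reader.

```cpp
#include <atomic>
#include <optional>
#include <utility>

// Illustrative sketch: one-shot publication of a value via a release/acquire flag.
template <typename Data>
class ReleaseAcquire {
    std::atomic<bool> ready_{false};
    Data data_{};

public:
    // Assumed to be called once, by a single writer.
    void write(Data d) {
        data_ = std::move(d);                           // plain write
        ready_.store(true, std::memory_order_release);  // publish it
    }

    // Non-blocking probe: empty until the value has been published.
    std::optional<Data> try_get() const {
        if (!ready_.load(std::memory_order_acquire)) {
            return std::nullopt;
        }
        return data_;
    }

    // Spins until the value has been published, like the guard "when Ready".
    Data get() const {
        while (!ready_.load(std::memory_order_acquire)) {
        }
        return data_;
    }
};
```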
7 API
As already pointed out, we feel that providing concurrent objects as first-class
citizens is the right way to enhance Ada with non-blocking synchronization on an
adequate memory model. On the other hand, if the programmer needs
synchronization on a lower level than concurrent objects provide, an API-based
approach (generic function Read_Modify_Write in package Memory_Model) would be
a viable alternative. Listing 1.3 shows such a predefined package Memory_Model.
It contains the specification of generic function Read_Modify_Write, which
gives access to the read-modify-write operation of the underlying computer
hardware (see footnote 7). Exposing sync loops to the programmer makes it
necessary to introduce a new aspect Sync_Loop to let the runtime perform the
state change to “in sync loop” (cf. Section 5). Because nobody can force the
programmer to use this aspect correctly, the information transferred to the
runtime may be false or incomplete, giving rise to concurrency defects such as
deadlocks, livelocks, and other problems.
Footnote 7: An example for employing function Read_Modify_Write is given in the
Appendix in Listing 1.8. It shows an implementation of a lock-free stack using
generic function Read_Modify_Write of package Memory_Model.

package Memory_Model is

   type Memory_Order_Type is (
      Sequentially_Consistent,
      Relaxed,
      Acquire,
      Release);

   subtype Memory_Order_Success_Type is Memory_Order_Type;

   subtype Memory_Order_Failure_Type is Memory_Order_Type
      range Sequentially_Consistent .. Acquire;

   generic
      type Some_Synchronized_Type is private;
      with function Update return Some_Synchronized_Type;
      Read_Modify_Write_Variable: in out Some_Synchronized_Type
         with Read_Modify_Write;
      Memory_Order_Success: Memory_Order_Success_Type :=
         Sequentially_Consistent;
      Memory_Order_Failure: Memory_Order_Failure_Type :=
         Sequentially_Consistent;
   function Read_Modify_Write return Boolean;

end Memory_Model;
Listing 1.3: Package Memory Model
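To relate this specification to existing practice, a comparable helper can be
sketched on top of C++11's std::atomic. This is an analogy under assumptions,
not the proposed package: the function name read_modify_write and the explicit
expected parameter (which stands in for the implicit old value of the RMW
variable) are illustrative. Note that the failure subtype above excludes
Release; C++ has a similar rule that the failure ordering of a compare-exchange
must not be a release ordering.

```cpp
#include <atomic>

// Illustrative sketch: attempts to replace `expected` by `update()` in
// `variable`. Returns true on success; on failure, `expected` is overwritten
// with the value that was actually observed.
template <typename T, typename Update>
bool read_modify_write(std::atomic<T>& variable,
                       T& expected,
                       Update update,
                       std::memory_order success = std::memory_order_seq_cst,
                       std::memory_order failure = std::memory_order_seq_cst) {
    return variable.compare_exchange_strong(expected, update(),
                                            success, failure);
}
```

A caller would typically wrap the call in a loop that recomputes the update
from the refreshed expected value, which is the role of the sync loop discussed
above.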
8 Conclusion and Future Work
We have presented an approach for providing safe non-blocking synchronization
in Ada 202x. Our novel approach is based on introducing concurrent objects for
encapsulating non-blocking data structures on a high abstraction level. In addi-
tion, we have presented synchronized and read-modify-write types which support
the expression of memory ordering operations at a sufficient level of detail. Con-
current objects provide SC for programs without data races. This SC-for-DRF
memory model is well-aligned with Ada’s semantics for blocking synchronization
via protected objects, which requires legal programs to be without synchroniza-
tion errors ([9, 9.10§11]).
Although Ada 2012 provides the highly abstract protected object monitor
construct for blocking synchronization, there was previously no programming
primitive available to match this abstraction level for non-blocking
synchronization. The proposed memory model in conjunction with our concurrent
object construct for non-blocking synchronization may spare users from having
to invent ad-hoc synchronization solutions, which have been found error-prone
with blocking synchronization already [25].
Until now, all previous approaches have been based on APIs. In contrast, our
approach for Ada 202x encapsulates non-blocking synchronization inside
concurrent objects, and we have listed a number of advantages that support
making non-blocking data structures first-class language citizens. This safe
approach makes the code easy to understand. Note that concurrent objects are
not orthogonal to objects in the sense of OOP (tagged types in Ada). However,
such orthogonality can be achieved by employing the proposed API approach
(cf. Section 7). In addition, it is not difficult to migrate code from blocking
to non-blocking synchronization. Adding memory management via storage pools
integrates well with our modular approach and does not clutter the code.
A lot of work remains to be done. To name only a few issues: Non-blocking
barriers (in the sense of [9, D.10.1]) would be useful; details have to be elaborated.
Fully integrating concurrent objects into scheduling and dispatching models and
integrating with the features for parallel programming planned for Ada 202x
have to be done carefully.
9 Acknowledgments
This research was supported by the Austrian Science Fund (FWF) project
I 1035N23, and by the Next-Generation Information Computing Development
Program through the National Research Foundation of Korea (NRF), funded by
the Ministry of Science, ICT & Future Planning under grant NRF2015M3C4A-
7065522.
References
1. Working Draft, Standard for Programming Language C++. ISO/IEC N4296, 2014.
2. S. V. Adve and H.-J. Boehm. Memory models: A case for rethinking parallel
languages and hardware. Commun. ACM, 53(8):90–101, Aug. 2010.
3. S. V. Adve and K. Gharachorloo. Shared memory consistency models: A tutorial.
Computer, 29(12):66–76, Dec. 1996.
4. S. V. Adve and M. D. Hill. Weak ordering—a new definition. In Proceedings of the
17th Annual International Symposium on Computer Architecture, ISCA ’90, pages
2–14, New York, NY, USA, 1990. ACM.
5. J. Barnes. Ada 2012 Rationale: The Language – the Standard Libraries. Springer
LNCS, 2013.
6. J. Blieberger and B. Burgstaller. Safe non-blocking synchronization in Ada 202x.
In Proceeding of Ada-Europe, Springer LNCS, 2018.
7. H.-J. Boehm. Threads cannot be implemented as a library. SIGPLAN Not.,
40(6):261–268, June 2005.
8. H.-J. Boehm and S. V. Adve. Foundations of the C++ concurrency memory model.
In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language
Design and Implementation, PLDI ’08, pages 68–78, New York, NY, USA, 2008.
ACM.
9. R. L. Brukardt, editor. Annotated Ada Reference Manual, ISO/IEC 8652:2012/Cor
1:2016. 2016.
10. K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy.
Memory consistency and event ordering in scalable shared-memory multiproces-
sors. SIGARCH Comput. Archit. News, 18(2SI):15–26, May 1990.
11. L. Guerby. Ada 95 Rationale – The Language – The Standard Libraries. Springer,
1997.
12. M. Herlihy and N. Shavit. The Art of Multiprocessor Programming. Morgan Kauf-
mann Publishers Inc., San Francisco, CA, USA, 2012.
13. C. A. R. Hoare. Monitors: An operating system structuring concept. Commun.
ACM, 17(10):549–557, Oct. 1974.
14. L. Lamport. How to make a multiprocessor computer that correctly executes
multiprocess programs. IEEE Trans. Comput., 28(9):690–691, Sept. 1979.
15. J. Manson, W. Pugh, and S. V. Adve. The Java memory model. In Proceedings
of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming
Languages, POPL ’05, pages 378–391, New York, NY, USA, 2005. ACM.
16. P. E. McKenney, T. Riegel, and J. Preshing. N4036: Towards Implementation
and Use of memory_order_consume. Technical Report WG21/N4036,
JTC1/SC22/WG21 – The C++ Standards Committee – ISOCPP, May 2014.
17. P. E. McKenney, T. Riegel, J. Preshing, H. Boehm, C. Nelson, O. Giroux, L. Crowl,
J. Bastien, and M. Wong. P0462R1: Marking memory_order_consume Dependency
Chains. Technical Report WG21/P0462R1, JTC1/SC22/WG21 – The C++ Standards
Committee – ISOCPP, Feb. 2017.
18. P. E. McKenney, M. Wong, H. Boehm, J. Maurer, J. Yasskin, and J. Bastien.
P0190R4: Proposal for New memory_order_consume Definition. Technical Report
WG21/P0190R4, JTC1/SC22/WG21 – The C++ Standards Committee –
ISOCPP, July 2017.
19. J. Preshing. The purpose of memory_order_consume in C++11.
http://preshing.com/20140709/the-purpose-of-memory_order_consume-in-cpp11/.
Accessed: 2017-09-13.
20. M. L. Scott. Shared-Memory Synchronization. Synthesis Lectures on Computer
Architecture. Morgan & Claypool Publishers, 2013.
21. H. Simpson. Four-slot fully asynchronous communication mechanism. Computers
and Digital Techniques, IEE Proceedings E, 137:17–30, 02 1990.
22. D. J. Sorin, M. D. Hill, and D. A. Wood. A Primer on Memory Consistency and
Cache Coherence. Number 16 in Synthesis Lectures on Computer Architecture.
Morgan & Claypool, 2011.
23. J. Ševčík and D. Aspinall. On validity of program transformations in the Java
memory model. In Proceedings of the 22nd European Conference on Object-Oriented
Programming, ECOOP ’08, pages 27–51, Berlin, Heidelberg, 2008. Springer-Verlag.
24. A. Williams. C++ Concurrency in Action. Manning Publ. Co., Shelter Island,
NY, 2012.
25. W. Xiong, S. Park, J. Zhang, Y. Zhou, and Z. Ma. Ad hoc synchronization consid-
ered harmful. In Proceedings of the 9th USENIX Conference on Operating Systems
Design and Implementation, OSDI’10, pages 163–176, Berkeley, CA, USA, 2010.
USENIX Association.
26. A. Zwinkau. A memory model for X10. In Proceedings of the 6th ACM SIGPLAN
Workshop on X10, X10 2016, pages 7–12, New York, NY, USA, 2016. ACM.
A Rationale and comparison with C++11
We state the rationale for our proposed language features and compare them to
the C++ memory model. This section thus requires modest familiarity with the
C++11 standard [1].
A.1 C++11’s compare_exchange_weak and compare_exchange_strong
We felt that compare_exchange_weak and compare_exchange_strong are not
needed on language level. These are hardware-related details which the compiler
knows and should anticipate without intervention of the programmer.
In particular, compare_exchange_weak means that sometimes a RMW operation
fails although the value of the RMW variable has not been changed by
a different thread. In this case re-executing the whole implicit sync loop is not
necessary, only the RMW operation has to be redone. We assume that the
compiler produces machine code for this “inner” loop. Because this is only the
case on very peculiar CPUs, it is obvious that the compiler and not the
programmer should take care of this.
In addition, migrating Ada programs will be facilitated by assigning this job
to the compiler.
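The “inner” loop mentioned above can be pictured in C++11 terms as follows.
This is a minimal sketch, assuming an equality-comparable value type; the
function name compare_exchange_retrying is hypothetical. It retries only the
RMW operation when compare_exchange_weak fails although the observed value is
unchanged, and reports a genuine failure otherwise.

```cpp
#include <atomic>

// Illustrative sketch: strong compare-exchange behavior built from the weak
// variant by redoing only the RMW operation on spurious failures.
template <typename T>
bool compare_exchange_retrying(std::atomic<T>& variable,
                               T& expected,
                               T desired) {
    const T original = expected;
    while (!variable.compare_exchange_weak(expected, desired)) {
        // On failure, `expected` holds the observed value.
        if (!(expected == original)) {
            return false;  // genuine failure: another thread changed the value
        }
        // Spurious failure: the value is unchanged, so only the RMW is redone.
    }
    return true;
}
```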
A.2 C++11’s consume memory ordering
C++ introduced memory_order_consume specifically for supporting read-copy
update (RCU) used in the Linux kernel (cf. [19]). However, it turned out that
memory_order_consume as defined in the C++ standard [1] is not implemented
by compilers. Instead, all compilers map it to memory_order_acquire. The
major reason for this is that the data dependency as defined in [1] is difficult
to implement (cf., e.g., [16]). There is, however, ongoing work within ISOCPP to
make memory_order_consume viable for implementation (cf., e.g., [17,18]). In
particular, [18] proposes to restrict memory_order_consume to data dependence
chains starting with pointers because this represents the major usage scenario
in the Linux kernel.
For Ada 202x it seems reasonable not to include memory_order_consume in
the standard. Instead, compilers are encouraged to exploit features provided by
the hardware for gaining performance on weakly-ordered CPUs. The programmer
uses memory_order_release and memory_order_acquire for synchronization
and the compiler improves the performance of the program if the hardware
is weakly-ordered and it (the compiler) is willing to perform data dependency
analysis. In addition, a compiler switch might be a way for letting the
programmer decide whether she is willing to bear the optimization load
(increased compile time).
In addition, migrating Ada programs will be facilitated by not having to
replace memory_order_acquire with memory_order_consume and vice versa
depending on the employed hardware.
A.3 C++11’s acquire release memory ordering
C++11 defines acquire release memory ordering because some of C++11’s
RMW operations contain both a read and a write operation, e.g., i++ for i
being an atomic integer variable. Because Ada’s syntax does not contain such
operators, acquire release memory ordering is not needed on language level.
Compiling i := i+1 (i being an atomic integer variable), an Ada compiler is
able to employ suitable memory fences to ensure the memory model aspects
given by the programmer together with the original statement.
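For comparison, the C++11 situation referred to here looks roughly as follows.
The sketch assumes a shared ticket counter; the function name next_ticket is
hypothetical. The increment is a single RMW operation, so its read part and
its write part are ordered together with std::memory_order_acq_rel.

```cpp
#include <atomic>

// Illustrative sketch: an atomic increment is one RMW operation that both
// reads and writes, hence the combined acquire-release ordering.
inline int next_ticket(std::atomic<int>& counter) {
    // Returns the value held before the increment.
    return counter.fetch_add(1, std::memory_order_acq_rel);
}
```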
B Further Examples
Peterson’s Algorithm. Listing 1.4 shows an implementation of Peterson’s
algorithm, a lock-free method for synchronizing two tasks, under the
sequentially consistent memory model.
concurrent Peterson_Exclusion is

   procedure Task1_Critical_Section;

   procedure Task2_Critical_Section;

private

   -- Accesses to synchronized variables are atomic and have by default
   -- Sequential Consistency (i.e. the compiler must generate code that
   -- respects program order, and the adequate memory fence instructions
   -- are introduced before and after each load or store to serialize
   -- memory operations in all CPU cores, respecting ordering, visibility,
   -- and atomicity). However, it is possible to relax each read/write
   -- operation on a synchronized variable, for obtaining higher performance
   -- in those algorithms that allow these kinds of reorderings by other
   -- threads.

   Flag1 : Boolean := False with Synchronized;
   Flag2 : Boolean := False with Synchronized;
   Turn : Natural with Synchronized;

   -- Additional data can be placed here, e.g. shared data variables
   -- that need no atomic accesses (i.e. when data races are not possible
   -- because they are protected by synchronized variables)

   -- Concurrent entries also encapsulate the access to shared data
   -- (synchronized variables), but automatically use spin loops / compare
   -- and swap operations for synchronization among threads.

   entry Task1_Busy_Wait;

   entry Task2_Busy_Wait;

end Peterson_Exclusion;


concurrent body Peterson_Exclusion is

   entry Task1_Busy_Wait
      when not (Flag2 and then Turn = 2)
      -- Spin loop until the guard is True, i.e. until task 2 is not
      -- interested or has passed the turn back to task 1
   is
   begin
      null;
   end Task1_Busy_Wait;

   entry Task2_Busy_Wait
      when not (Flag1 and then Turn = 1)
      -- Spin loop until the guard is True, i.e. until task 1 is not
      -- interested or has passed the turn back to task 2
   is
   begin
      null;
   end Task2_Busy_Wait;

   procedure Task1_Critical_Section is
   begin
      Flag1 := True;
      Turn := 2;
      Task1_Busy_Wait;

      Code_For_Task1_Critical_Section;

      Flag1 := False;
   end Task1_Critical_Section;

   procedure Task2_Critical_Section is
   begin
      Flag2 := True;
      Turn := 1;
      Task2_Busy_Wait;

      Code_For_Task2_Critical_Section;

      Flag2 := False;
   end Task2_Critical_Section;

end Peterson_Exclusion;
Listing 1.4: Peterson’s Algorithm under the Sequentially Consistent Memory
Model
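For comparison, Peterson's algorithm can be sketched with C++11 atomics as
shown below. The class name PetersonLock and the 0/1 thread identifiers are
illustrative assumptions. All operations use the default sequentially
consistent ordering, matching the memory model of Listing 1.4; the sketch makes
no claim about weaker orderings.

```cpp
#include <atomic>

// Illustrative sketch of Peterson's algorithm for two threads (ids 0 and 1),
// using the default sequentially consistent ordering throughout.
class PetersonLock {
    std::atomic<bool> interested_[2];
    std::atomic<int> turn_;

public:
    PetersonLock() : turn_(0) {
        interested_[0].store(false);
        interested_[1].store(false);
    }

    void lock(int self) {
        const int other = 1 - self;
        interested_[self].store(true);  // announce interest
        turn_.store(other);             // give priority to the other thread
        // Busy-wait while the other thread is interested and has priority.
        while (interested_[other].load() && turn_.load() == other) {
        }
    }

    void unlock(int self) {
        interested_[self].store(false);
    }
};
```

A thread with id 0 or 1 brackets its critical section with lock(id) and
unlock(id).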
Listing 1.5 shows an implementation of Peterson’s algorithm under the release-
acquire memory model with default memory model specified in the declarative
part.
concurrent Peterson_Exclusion is

   procedure Task1_Critical_Section;

   procedure Task2_Critical_Section;

private

   Flag1 : Boolean := False
      with Synchronized, Memory_Order_Read => Acquire,
           Memory_Order_Write => Release;
   Flag2 : Boolean := False
      with Synchronized, Memory_Order_Read => Acquire,
           Memory_Order_Write => Release;
   Turn : Natural
      with Synchronized, Memory_Order_Read => Acquire,
           Memory_Order_Write => Release;

   entry Task1_Busy_Wait;

   entry Task2_Busy_Wait;

end Peterson_Exclusion;


concurrent body Peterson_Exclusion is

   entry Task1_Busy_Wait
      when not (Flag2 and Turn = 2)
      -- Spin loop until the guard is True
   is
   begin
      null;
   end Task1_Busy_Wait;

   entry Task2_Busy_Wait
      when not (Flag1 and Turn = 1)
      -- Spin loop until the guard is True
   is
   begin
      null;
   end Task2_Busy_Wait;

   procedure Task1_Critical_Section is
   begin
      Flag1 := True;
      Turn := 2;
      Task1_Busy_Wait;

      Code_For_Task1_Critical_Section;

      Flag1 := False;
   end Task1_Critical_Section;

   procedure Task2_Critical_Section is
   begin
      Flag2 := True;
      Turn := 1;
      Task2_Busy_Wait;

      Code_For_Task2_Critical_Section;

      Flag2 := False;
   end Task2_Critical_Section;

end Peterson_Exclusion;
Listing 1.5: Peterson’s Algorithm under the Release-Acquire Memory Model
with default memory model specified in the declarative part
Listing 1.6 shows an implementation of Peterson’s algorithm under the release-
acquire memory model with memory model explicitly specified at statements.
concurrent Peterson_Exclusion is

   procedure Task1_Critical_Section;

   procedure Task2_Critical_Section;

private

   Flag1 : Boolean'Concurrent_Write (Memory_Model => Release) := False
      with Synchronized;
   Flag2 : Boolean'Concurrent_Write (Memory_Model => Release) := False
      with Synchronized;
   Turn : Natural with Synchronized;

   entry Task1_Busy_Wait;

   entry Task2_Busy_Wait;

end Peterson_Exclusion;


concurrent body Peterson_Exclusion is

   entry Task1_Busy_Wait
      when not (Flag2'Concurrent_Read (Memory_Model => Acquire) and
                Turn'Concurrent_Read (Memory_Model => Acquire) = 2)
      -- Spin loop until the guard is True
   is
   begin
      null;
   end Task1_Busy_Wait;

   entry Task2_Busy_Wait
      when not (Flag1'Concurrent_Read (Memory_Model => Acquire) and
                Turn'Concurrent_Read (Memory_Model => Acquire) = 1)
      -- Spin loop until the guard is True
   is
   begin
      null;
   end Task2_Busy_Wait;

   procedure Task1_Critical_Section is
   begin
      Flag1'Concurrent_Write (Memory_Model => Release) := True;
      Turn'Concurrent_Write (Memory_Model => Release) := 2;
      Task1_Busy_Wait;

      Code_For_Task1_Critical_Section;

      Flag1'Concurrent_Write (Memory_Model => Release) := False;
   end Task1_Critical_Section;

   procedure Task2_Critical_Section is
   begin
      Flag2'Concurrent_Write (Memory_Model => Release) := True;
      Turn'Concurrent_Write (Memory_Model => Release) := 1;
      Task2_Busy_Wait;

      Code_For_Task2_Critical_Section;

      Flag2'Concurrent_Write (Memory_Model => Release) := False;
   end Task2_Critical_Section;

end Peterson_Exclusion;
Listing 1.6: Peterson’s Algorithm under the Release-Acquire Memory Model
with memory model explicitly specified at statements
Filter Algorithm. The filter algorithm is a non-blocking method for
synchronizing n processes, which is starvation-free and deadlock-free ([12]).
Listing 1.7 is an implementation using our proposed approach. In particular,
notice the use of a private entry family.
generic
   No_Of_Processes: Positive; -- positive number >= 2
package Filter_Algorithm is

   subtype Process_ID is Natural range 0 .. No_Of_Processes - 1;
   subtype Process_ID_With_Minus_One is Integer range
      -1 .. No_Of_Processes - 1;
   subtype Process_ID_Small is Process_ID range
      Process_ID'FIRST .. Process_ID'LAST - 1;
   type Level_Type is array (Integer range <>) of
      Process_ID_With_Minus_One;

   concurrent Access_To_Critical_Section
   is
      procedure Acquire_Lock (ID: Process_ID);
      procedure Release_Lock (ID: Process_ID);
   private
      entry Private_Lock (Process_ID); -- entry family

      Level: Level_Type (Process_ID) := (others => -1)
         with Synchronized_Components,
              Memory_Order_Read => Acquire,
              Memory_Order_Write => Release;
      Last_To_Enter: Level_Type (Process_ID_Small) := (others => -1)
         with Synchronized_Components,
              Memory_Order_Read => Acquire,
              Memory_Order_Write => Release;
      Var_L: Level_Type (Process_ID);

   end Access_To_Critical_Section;

end Filter_Algorithm;

package body Filter_Algorithm is

   concurrent body Access_To_Critical_Section is

      procedure Acquire_Lock (ID: Process_ID) is
      begin
         for L in Process_ID_Small loop
            Level (ID) := L;
            Last_To_Enter (L) := ID;
            Var_L (ID) := L;
            Private_Lock (ID);
         end loop;
      end Acquire_Lock;

      entry Private_Lock (for ID in Process_ID)
         -- Proceed if another process entered this level later, or if
         -- every other process is at a lower level
         when Last_To_Enter (Var_L (ID)) /= ID or else
              (for all K in Level'RANGE =>
                 (K = ID or else Level (K) < Var_L (ID)))
      is
      begin
         null;
      end Private_Lock;

      procedure Release_Lock (ID: Process_ID) is
      begin
         Level (ID) := -1;
      end Release_Lock;

   end Access_To_Critical_Section;

end Filter_Algorithm;
Listing 1.7: Filter Algorithm
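The structure of the filter algorithm may be easier to follow in a conventional
spin-loop formulation. The C++11 sketch below follows the textbook presentation
in [12] and is not a translation of the proposed syntax: levels are numbered
from 1 (with 0 meaning "not interested") instead of from 0 (with -1), and the
class name FilterLock and the accessor level are illustrative assumptions. All
atomics use the default sequentially consistent ordering.

```cpp
#include <atomic>
#include <vector>

// Illustrative sketch of the filter lock for n threads with ids 0 .. n-1.
class FilterLock {
    const int n_;
    std::vector<std::atomic<int>> level_;   // level_[i]: level thread i is trying to enter
    std::vector<std::atomic<int>> victim_;  // victim_[l]: last thread that entered level l

    // True if some thread other than `me` is at level `lvl` or higher.
    bool other_at_or_above(int me, int lvl) const {
        for (int k = 0; k < n_; ++k) {
            if (k != me && level_[k].load() >= lvl) {
                return true;
            }
        }
        return false;
    }

public:
    explicit FilterLock(int n) : n_(n), level_(n), victim_(n) {
        for (auto& l : level_) l.store(0);
        for (auto& v : victim_) v.store(-1);
    }

    void lock(int me) {
        for (int lvl = 1; lvl < n_; ++lvl) {
            level_[me].store(lvl);
            victim_[lvl].store(me);
            // Wait while this thread is the last to enter the level and
            // some other thread is at the same or a higher level.
            while (victim_[lvl].load() == me && other_at_or_above(me, lvl)) {
            }
        }
    }

    void unlock(int me) {
        level_[me].store(0);
    }

    // Accessor added for illustration and testing only.
    int level(int id) const {
        return level_[id].load();
    }
};
```

The condition under which the while loop exits is the logical counterpart of
the guard of Private_Lock in Listing 1.7.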
API-based non-blocking stack. Here we present how a non-blocking stack
can be implemented via the API proposed in Sec. 7.
subtype Data is Integer;

type List;
type List_P is access List;
type List is
   record
      D: Data;
      Next: List_P;
   end record;

Empty: exception;

concurrent Lock_Free_Stack
is
   procedure Push (D: Data);
   procedure Pop (D: out Data);
private
   Head: List_P with Read_Modify_Write;
end Lock_Free_Stack;

concurrent body Lock_Free_Stack is
   procedure Push (D: Data) is
      New_Node: List_P := new List;
      function Update_Head_Push return List_P is
      begin
         return New_Node;
      end Update_Head_Push;
      function RMW_Head_Push return Boolean is
         new Memory_Model.Read_Modify_Write (
            Some_Synchronized_Type => List_P,
            Update => Update_Head_Push,
            Read_Modify_Write_Variable => Head,
            Memory_Order_Success => Release,
            Memory_Order_Failure => Relaxed);
   begin
      loop with Sync_Loop
         New_Node.all := (D => D, Next => Head'Concurrent_Read (
            Memory_Order => Relaxed));
         exit when RMW_Head_Push;
         -- This is an RMW operation; so: if value of head has changed in
         -- between, the loop is reexecuted;
         -- if not, the assignment succeeds.
         -- NOTE: memory_order release initiates a happens_before
         -- relationship for the memory_order acquire in pop
      end loop;
   end Push;

   procedure Pop (D: out Data) is
      Old_Head: List_P;
      function Update_Head_Pop return List_P is
      begin
         return Old_Head.Next;
      end Update_Head_Pop;
      function RMW_Head_Pop return Boolean is
         new Memory_Model.Read_Modify_Write (
            Some_Synchronized_Type => List_P,
            Update => Update_Head_Pop,
            Read_Modify_Write_Variable => Head,
            Memory_Order_Success => Relaxed,
            Memory_Order_Failure => Relaxed);
   begin
      loop with Sync_Loop
         Old_Head := Head'Concurrent_Read (Memory_Order => Relaxed);
         if Old_Head /= null then
            if RMW_Head_Pop then
               -- This is an RMW operation; so: if value of head has changed in
               -- between, the if statement terminates and the loop body is
               -- executed once more,
               -- if not, the assignment succeeds and the then branch is
               -- executed.
               -- NOTE: memory_order acquire establishes a happens_before
               -- relationship with the memory_order release in push
               D := Old_Head.D;
               exit;
            end if;
         else
            raise Empty;
         end if;
      end loop;
   end Pop;
end Lock_Free_Stack;
Listing 1.8: Non-blocking Stack Implementation Using Generic Function
Memory_Model.Read_Modify_Write
