Relaxed memory consistency models specify effects of executions of statements among threads, which may or may not be reordered. Such reorderings may cross loop iterations. To the best of our knowledge, however, there exists no concurrent program logic which explicitly handles the reorderings across loop iterations. This paper provides concurrent program logic for relaxed memory consistency models that can represent, for example, total store ordering, partial store ordering, relaxed memory ordering, and acquire and release consistency. There are two novel aspects to our approach. First, we translate a concurrent program into a family of directed acyclic graphs with finite nodes and transitive edges called program graphs according to a memory consistency model that we adopt. These represent dependencies among statements which represent reorderings of not only statements but also visibility of their effects. Second, we introduce auxiliary variables that temporarily buffer the effects of write operations on shared memory, and explicitly describe the reflections of the buffered effects to shared memory. Specifically, we define a small-step operational semantics for the program graphs with the introduced auxiliary variables, then define sound and relatively complete logic to the semantics.
Introduction
A memory consistency model is a formal definition of the behavior of shared memory that is simultaneously accessed by multiple threads. Relaxed memory consistency models [6] allow the shared memory to behave differently than that for one sequential thread. Today, relaxed memory consistency models are widely adopted by programming languages (e.g., Refs. [13] , [22] , [23] , [24] , [29] , [34] ) and CPU architectures (e.g., Refs. [20] , [21] , [38] , [39] ). This is because recent CPUs tend to have a number of cores, and large computing systems consist of a large number of computing nodes. Thus, it is difficult to adhere to non-relaxed memory consistency models without severe performance degradation.
One big problem with relaxed memory consistency models is that programming is very error prone because, from the programmer's point of view, the shared memory behaves unexpectedly. For example, in a certain relaxed model, the effects of memory operations performed by one thread can be observed by another thread in a different order. To address this problem, program verification can be used to detect bugs caused by unexpected behavior in the relaxed memory consistency models. Recently, program verification for relaxed memory consistency models has been extensively studied [1], [2], [3], [5], [7], [8], [9], [11], [14], [25], [35], [40], [49].
1 STAIR Lab., Chiba Institute of Technology, Narashino, Chiba 275-0016, Japan
2 RIKEN AICS, Kobe, Hyogo 650-0047, Japan
a) abet@stair.center b) tosh@stair.center
One approach to program verification for ensuring the safety of programs is program logic. Program logic is based on proofs, which ensure that a given property holds on any execution of a program. Recently, concurrent program logics for relaxed memory consistency models have been proposed [17], [33], [44], [45].
However, it is not easy to construct concurrent program logic that can handle programs with loops under relaxed memory consistency models since dependencies among statements which represent reorderings of not only statements but also visibility of their effects may cross loop iterations.
For example, let us consider the concurrent program in Fig. 1 consisting of two threads. The left-side thread writes the same value to x and y on a shared memory. The values are incremented at each iteration. The right-side thread reads values from x and y.
Under relaxed memory consistency models, effects of stores to a shared memory may be reordered. For example, r2==2 && r3==0 may occur under Partial Store Ordering (PSO) [6] since an effect of a write operation to y can be overtaken by those of write operations to x at the later iterations. To the best of our knowledge, there exists no concurrent program logic that is sound and relatively complete to such semantics.
On the other hand, under Total Store Ordering (TSO) [6], r2==2 && r3==0 never occurs since TSO prohibits the effects of write operations to x from overtaking that of a write operation to y. Thus, dependencies/independencies between effects across loop iterations depend on the memory consistency model that we adopt. Since memory consistency models concern effects that occur at a low level, we cannot handle effects across loop iterations through high-level descriptions of loops. The authors learned the significance of this through the experience of detecting a similar bug in a numerical computing program for climate simulation [36], and started to construct verification theories that are parameterized by memory consistency models in consideration of dependency/independency between effects across loop iterations [4]. To the best of our knowledge, there exists no concurrent program logic which explicitly describes dependencies across loop iterations under relaxed memory consistency models.
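The difference between PSO and TSO on this example can be checked with a small exhaustive exploration. The sketch below is not the program-graph semantics of this paper; it assumes a conventional store-buffer model (one FIFO buffer under TSO, one FIFO per variable under PSO) and a hypothetical two-iteration version of the program in which the writer stores 1 and then 2 to x and y, and the reader executes r2 = x; r3 = y. The function name `outcomes` and the state encoding are illustrative assumptions.

```python
def outcomes(model):
    """Return every reachable (r2, r3) pair under a store-buffer model.

    model is "TSO" (single FIFO buffer) or "PSO" (FIFO order per variable).
    Assumed program: writer does x=1; y=1; x=2; y=2, reader does r2=x; r3=y.
    """
    stores = (("x", 1), ("y", 1), ("x", 2), ("y", 2))   # writer, program order
    loads = ("x", "y")                                  # reader, program order
    start = (0, (), (("x", 0), ("y", 0)), ())           # pc, buffer, memory, regs
    seen, stack, finished = {start}, [start], set()
    while stack:
        pc, buf, mem, regs = stack.pop()
        memory = dict(mem)
        succs = []
        if pc < len(stores):                            # buffer the next store
            succs.append((pc + 1, buf + (stores[pc],), mem, regs))
        for k, (var, val) in enumerate(buf):            # reflect a buffered store
            if model == "TSO":
                allowed = k == 0                        # only the oldest store
            else:                                       # oldest store per variable
                allowed = all(v != var for v, _ in buf[:k])
            if allowed:
                new_mem = dict(memory)
                new_mem[var] = val
                succs.append((pc, buf[:k] + buf[k + 1:],
                              tuple(sorted(new_mem.items())), regs))
        if len(regs) < len(loads):                      # reader loads from memory
            succs.append((pc, buf, mem, regs + (memory[loads[len(regs)]],)))
        else:
            finished.add(regs)
        for s in succs:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return finished


pso = outcomes("PSO")
tso = outcomes("TSO")
```

Under these assumptions, the pair (2, 0) appears in `pso` but not in `tso`, which mirrors the behavior described above.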
This paper provides concurrent program logic for standard relaxed memory consistency models that can represent, for example, TSO, PSO, Relaxed Memory Ordering (RMO) [6] , and Acquire and Release consistency (AR) [20] . There are two novel aspects to our approach. First, we translate a concurrent program into a family of directed acyclic graphs with finite nodes and transitive edges called program graphs according to a memory consistency model that we adopt. These represent the dependencies among statements, which may or may not be reordered even across loop iterations. Second, we introduce auxiliary variables that temporarily buffer the effects of write operations on shared memory, and explicitly handle the reflections of the buffered effects to shared memory. Specifically, we define a small-step operational semantics for the program graphs with the introduced auxiliary variables, then define a sound and relatively complete logic to the semantics.
The rest of this paper is organized as follows. Section 2 introduces program graphs and memory consistency models. Section 3 defines the operational semantics for the program graphs, and Section 4 explains our concurrent program logic and its relation to the operational semantics. Section 5 slightly extends the operational semantics and the concurrent program logic, then shows proofs of soundness and relative completeness of the extended logic (and soundness of the original one). Section 6 shows example derivations on the concurrent program in Fig. 1 and its variants. Section 7 discusses related work, and Section 8 concludes the paper and discusses ideas for future work.
Representations of Programs and Memory Consistency Models
In this section, we formally define our target concurrent programs, representations of programs with reordering structures, representations of memory consistency models, and present some related definitions.
Programs
Similar to the conventional Hoare logic (e.g., [19] ), sequential programs are defined as sequences of statements. Parentheses are often omitted, and operators are assumed to be left associative. Let r denote thread-local variables (which cannot be accessed by other threads), x, y, . . . denote shared variables, and e denotes thread-local expressions (thread-local variables, constant values val, arithmetic operations, and so on). A sequential program can then be defined as:
In the above definition, the superscript i represents (an identifier of) the thread on which the associated statement will be executed.
In the rest of this paper, this superscript is often omitted when the context renders it obvious. The SK statement denotes an ordinary no-effect statement (SKip). As in conventional Hoare logic, MV r e denotes an ordinary variable substitution (MoVe). The LD and ST statements denote read and write operations, respectively, for shared variables (LoaD and STore). The effect of the store statement issued by one thread may not be observed by the other threads until the FN statement is issued. The FN statement ensures that other threads can observe the effect of store statements (FeNce). The IF and WL statements denote ordinary conditional branches and iterations, respectively, where we adopt ternary conditional operators (IF-then-else-end and WhiLe-do-end). Finally, S; S denotes a sequential composition of statements. We write ϕ ∨ ψ and ϕ ∧ ψ for (¬ϕ) ⊃ ψ and ¬((¬ϕ) ∨ (¬ψ)), respectively. In the following, we assume that ¬, ∧, ∨, and ⊃ bind in decreasing order of strength. In addition to the above definition, ⊤ is defined as the tautology ∀r. r = r.
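The grammar described in this subsection can be summarized as follows. This is a reconstruction from the prose above rather than the authors' exact display; in particular, the set of atomic formulas is an assumption (only equality is evidenced by the definition of ⊤), and the concrete layout of the IF and WL forms is inferred from the ternary notation.

```latex
\begin{align*}
S^{i} &::= \mathsf{SK}^{i}
      \mid \mathsf{MV}^{i}\ r\ e
      \mid \mathsf{LD}^{i}\ r\ x
      \mid \mathsf{ST}^{i}\ x\ e
      \mid \mathsf{FN}^{i}
      \mid \mathsf{IF}^{i}\ \varphi\,?\,S^{i} : S^{i}
      \mid \mathsf{WL}^{i}\ \varphi\,?\,S^{i}
      \mid S^{i} ; S^{i} \\
\varphi &::= e = e
      \mid \neg\varphi
      \mid \varphi \supset \varphi
      \mid \forall r.\ \varphi
\end{align*}
```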
A concurrent program with N threads is defined as the composition of sequential programs by parallel connectives as follows:
In this paper, we assume that the number of threads N is fixed during program execution.
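As a sketch of the parallel composition just described, a concurrent program with N threads can be written as below. The symbol used for the parallel connective is an assumption; the thread superscripts follow the convention introduced for sequential programs.

```latex
P ::= S^{0} \parallel S^{1} \parallel \cdots \parallel S^{N-1}
```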
Program Graphs
As mentioned in Section 1, instead of directly handling the concurrent programs defined above, we translate them to a family of program graphs according to a memory consistency model. These are directed acyclic graphs with finite nodes and transitive edges that represent dependencies among statements. Specifically, the program graph G consists of nodes (called commands) and edges, where the nodes are defined as follows:
As shown in the definition, we introduce RF i x, which explicitly denotes the effect of ST i x e on a shared variable x (ReFlect). That is, the effect is not observed until RF i x is executed. Also, note that IF and WL statements do not exist, and guards ϕ i are introduced. IF and WL statements are finitely unfolded. That is, a program is represented by a (possibly infinite) family of program graphs.
Proving correctness of a program via its program graphs is discussed in Section 4. Nodes that share a common command are assumed to be identified by appropriate tags although this paper does not explicitly handle them.
In the generated program graphs, edges connect one node to another if the command represented by the latter node depends on that of the former node under a given memory consistency model. Let us consider the following three examples under TSO, where G_0 ⊎ G_1 denotes the disjoint union of G_0 and G_1 in graph theory.

The sequential composition ST^0 x r_0; LD^0 r_1 x is translated to the program graph (ST^0 x r_0 → LD^0 r_1 x) ⊎ RF^0 x, which has an edge between the nodes denoting the commands, because the two commands access the same shared variable x (and the effect of the store is explicitly handled by the reflect).

Although the sequential composition LD^0 r_2 x; ST^0 x r_3 does not appear to have such a dependency, it is translated to the program graph LD^0 r_2 x → ST^0 x r_3 → RF^0 x, which has an edge between the nodes denoting the commands, because, if no edge existed between the nodes, reordering the commands would create a dependency that does not originally exist.

In contrast, the sequential composition LD^0 r_2 x; ST^0 y r_3 is translated to the program graph LD^0 r_2 x ⊎ (ST^0 y r_3 → RF^0 y) (that is, no edges between LD and ST), because there is no dependency.
We denote the nodes, edges, root nodes, and leaf nodes of a program graph G as N(G), E(G), R(G), and L(G), respectively. The root nodes are defined as those that do not appear as the destination of any edge, and the leaf nodes are defined as nodes that do not appear as the source of any edge.
In the rest of this paper, we often treat a program graph like a set of nodes when its edges are obvious from the context. For example, G \ {C} denotes the full subgraph of G that does not have the node C. Moreover, we denote a graph that holds only one node {C} as C. Edges derived from transitivity are often omitted for readability.
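The notions of root nodes, leaf nodes, full subgraphs, and disjoint unions can be made concrete with a small data structure. The following is a minimal sketch; the class name `ProgramGraph`, its method names, and the use of strings as command tags are illustrative assumptions, and edges are stored explicitly (including transitive ones).

```python
class ProgramGraph:
    """A finite DAG whose nodes are tagged commands and whose edges are dependencies."""

    def __init__(self, nodes, edges=()):
        self.nodes = frozenset(nodes)
        self.edges = frozenset(edges)          # pairs (source, destination)

    def roots(self):
        # R(G): nodes that are not the destination of any edge
        return {n for n in self.nodes if all(dst != n for _, dst in self.edges)}

    def leaves(self):
        # L(G): nodes that are not the source of any edge
        return {n for n in self.nodes if all(src != n for src, _ in self.edges)}

    def without(self, node):
        # G \ {C}: the full subgraph that does not contain the given node
        kept = {(s, d) for s, d in self.edges if node not in (s, d)}
        return ProgramGraph(self.nodes - {node}, kept)

    def disjoint_union(self, other):
        assert not (self.nodes & other.nodes), "node tags must be distinct"
        return ProgramGraph(self.nodes | other.nodes, self.edges | other.edges)


# LD r2 x -> ST x r3 -> RF x, with the transitive edge written out
chain = ProgramGraph(
    {"LD r2 x", "ST x r3", "RF x"},
    {("LD r2 x", "ST x r3"), ("ST x r3", "RF x"), ("LD r2 x", "RF x")},
)
# LD r2 x  (+)  (ST y r3 -> RF y): no edge between the load and the store
split = ProgramGraph({"LD r2 x"}).disjoint_union(
    ProgramGraph({"ST y r3", "RF y"}, {("ST y r3", "RF y")})
)
```

In `chain` the only root is the load, whereas in `split` both the load and the store are roots, so either may be executed first.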
Memory Consistency Models: Translations
We represent a memory consistency model as a translation from programs into families of program graphs. In this section, we give four example translations that represent TSO, PSO, RMO [6] , and AR [20] .
We first decompose a parallel composition into a set of sequential programs. Next, we translate each sequential program into a program graph. Finally, we create a program graph by taking a disjoint union of the program graphs.
The translation of sequential programs consists of two steps. The first-step translation defined in Fig. 2 regards sequential compositions of statements as sequences of commands, unfolds IF and WL statements finitely, and transforms the program into sequences (of finite length) of commands. Unfolding IF ϕ?S_0:S_1 that has no nested IF and WL statement generates two sequences of commands according to S_0 and S_1. Similarly, for example, unfolding WL ϕ?S_0 generates infinitely many sequences (each of finite length) of commands. Each sequence corresponds to the number of executions of iterations of the loop (as discussed in Section 4).
Fig. 3 Graph creations with memory consistency models.
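The first step can be illustrated by a bounded unfolding function. This is a sketch under assumptions, not the definition of Fig. 2: statements are encoded as tuples, guards are encoded as ("GD", ϕ) commands, each ST is assumed to be followed by its RF command (in line with the examples of Section 2.2), and the parameter `bound` caps the number of loop iterations so that the result is finite.

```python
from itertools import product


def unfold(stmt, bound):
    """Return the command sequences of stmt with loops unfolded at most `bound` times."""
    kind = stmt[0]
    if kind == "SEQ":                                   # ("SEQ", s0, s1)
        return [a + b for a in unfold(stmt[1], bound) for b in unfold(stmt[2], bound)]
    if kind == "IF":                                    # ("IF", phi, s0, s1)
        _, phi, s0, s1 = stmt
        then_part = [(("GD", phi),) + s for s in unfold(s0, bound)]
        else_part = [(("GD", ("not", phi)),) + s for s in unfold(s1, bound)]
        return then_part + else_part
    if kind == "WL":                                    # ("WL", phi, body)
        _, phi, body = stmt
        bodies = unfold(body, bound)
        result = []
        for k in range(bound + 1):                      # k executed iterations
            for choice in product(bodies, repeat=k):
                seq = ()
                for b in choice:
                    seq += (("GD", phi),) + b
                result.append(seq + (("GD", ("not", phi)),))
        return result
    if kind == "ST":                                    # ("ST", x, e): add its reflect
        return [(stmt, ("RF", stmt[1]))]
    return [(stmt,)]                                    # SK, MV, LD, FN


loop = ("WL", "r0 < 2", ("SEQ", ("ST", "x", "r0"), ("MV", "r0", "r0 + 1")))
sequences = unfold(loop, 2)
```

With `bound = 2`, the loop yields three sequences, one for zero, one, and two iterations, each ending with the negated guard.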
The second-step translation g shown in Fig. 3 creates nodes from the sequences of commands generated by the first step, and edges between the nodes that have dependencies, where $\overrightarrow{C}$ denotes a sequence of commands. The dependency relation between C^i and C′^i is defined in Tables 1, 2, 3, and 4. Table 1 represents the dependency relation of TSO, where fv(e) denotes the variables that occur in e. The formula in each cell of Table 1 is the condition under which the dependency between C^i and C′^i holds. For example, in Table 1, the entry ⊤ for the pair (LD r x, ST x′ e) means that the dependency between LD r x and ST x′ e holds unconditionally. On the other hand, RF x never depends on LD r x′ (blank cells mean contradiction). Although we do not explain each cell in detail, Table 1 represents the key characteristics of TSO: two loads follow the program order, two stores also follow the program order, no store overtakes any load, and a load may overtake stores. Note that Table 1 contains neither the pair (ST x e, LD r x′) nor the pair (RF x, LD r x′). Under the so-called Sequential Consistency (SC) [28], these entries would be ⊤.

Table 2 The PSO dependency relation between C^i and C′^i.
Table 3 The RMO dependency relation between C^i and C′^i.
© 2017 Information Processing Society of Japan

Table 2 represents the dependency relation of PSO. The only difference between PSO and TSO is that PSO does not preserve the order of stores and reflects to different variables. This is achieved by changing the condition for the pair (RF x, RF x′) from ⊤ to x = x′ (emphasized by the rectangle in Table 2). Table 3 represents the dependency relation of RMO. The differences between RMO and PSO are that RMO preserves neither the order of two loads nor the order of a load and a store. These are achieved by strengthening the condition for the pair (LD r x, LD r′ x′) from ⊤ to r = r′, and the condition for the pair (LD r x, ST x′ e) from ⊤ to r ∈ fv(e) or x = x′ (emphasized by the rectangles in Table 3), respectively.
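The effect of the single changed cell for pairs of reflects can be sketched directly. The code below covers only the condition for two RF commands as stated in the text (⊤ under TSO, equality of the shared variables under PSO and RMO); the function names and the encoding of reflects by their variable names are illustrative assumptions, and all other cells of the tables are deliberately left out.

```python
def rf_rf_depends(model, x, x_later):
    """Condition for a later RF on x_later to depend on an earlier RF on x."""
    if model == "TSO":
        return True                     # reflects keep the program order
    if model in ("PSO", "RMO"):
        return x == x_later             # only reflects to the same variable are ordered
    raise ValueError(f"unknown model: {model}")


def reflect_edges(model, reflects):
    """Edges (i, j), i < j, between reflects listed in program order."""
    return {
        (i, j)
        for i in range(len(reflects))
        for j in range(i + 1, len(reflects))
        if rf_rf_depends(model, reflects[i], reflects[j])
    }


# Reflects produced by two loop iterations that each store to x and then to y
reflects = ["x", "y", "x", "y"]
tso_edges = reflect_edges("TSO", reflects)
pso_edges = reflect_edges("PSO", reflects)
```

Under PSO there is no edge from the first reflect of y (index 1) to the second reflect of x (index 2), so the second write to x may become visible before the first write to y, which is the cross-iteration reordering discussed in Section 1.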
Finally, let us give a translation to represent AR, which is adopted by C11/C++11 [22], [24] . Under AR, no LD statement can overtake a LD statement with attribute acquire (acq), and no ST can delay a ST with attribute release (rel).
We modify the definition of statements as follows:
where A is a set of attributes such as acq and rel. The statement LD^i r x, A with acq ∈ A denotes an LD^i r x that no load or store statement can overtake. The statement ST^i x e, A with rel ∈ A denotes an ST^i x e that cannot overtake any load or store. The dependency relation is defined in Table 4, where we omit cells except those for loads and stores. Attributes are used only to construct program graphs; they are used neither in the semantics of program graphs described in Section 3 nor in the logic described in Section 4.
Operational Semantics
In this section, we define a small-step operational semantics for the program graphs defined in Section 2.2. Specifically, the semantics is defined as a standard state transition system, where a state (written as st) is represented as a pair of substitutions $\langle\sigma, \Sigma\rangle$. The first element σ of the pair gives the values of thread-local variables and shared variables. The second element Σ represents buffers that temporarily hold the effects of store operations to shared variables. We assume that the set of values contains a special constant value udf to represent uninitialized or invalidated buffers. We define the following three operations for substitution functions:
where we write Σ i as $\Sigma_i$, f ranges over both σ and $\Sigma_i$, and g is a function from shared variables to values. Specifically, $f[v \mapsto val]$ and $\sigma[\Sigma_i]$ represent updated values via substitutions, and $\Sigma[i \mapsto g]$ represents updated substitutions. Figure 4 shows the rules of the operational semantics where $|e|_{\sigma}$ denotes the valuation of an expression e as follows:
and $\sigma \models \varphi$ means satisfiability of ϕ on σ in the standard manner as defined in Fig. 5.
A pair of a program graph and a state is called a configuration. Each rule is represented by a one-step reduction between configurations $\langle G, st\rangle \xrightarrow{c} \langle G', st'\rangle$, which indicates that a command makes $\langle G, st\rangle$ transit to $\langle G', st'\rangle$.
Specifically, rule O-MV evaluates e and updates σ. Rule O-LD evaluates x on $\Sigma_i$ if $\Sigma_i\,x$ is defined, and on σ otherwise, and updates σ. Rule O-ST evaluates e and updates $\Sigma_i$ (not σ); i.e., the rule indicates that the effect of the store operation is buffered in $\Sigma_i$. Rule O-RF denotes the reflection from a store buffer to shared memory. Rule O-FN does not change the state. The command FN is only used to represent ordering constraints among the other statements. Rule O-GD handles guard nodes ϕ by simply asserting that ϕ is satisfied under state σ. If ϕ is not satisfied under σ, we say that G gets stuck.
The operational semantics is nondeterministic because all rules have an assumption C ∈ R(G), and R(G) may consist of more than one element. This nondeterminacy in the choice of one of the root nodes enables the operational semantics to simulate various relaxed memory consistency models by choosing different translation approaches.
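The rules described above can be sketched as a single step function over states. This is a minimal illustration under assumptions, not a transcription of Fig. 4: commands are encoded as tuples whose second component is the thread identifier, thread-local expressions and guards are modeled as Python callables over σ, `None` stands for udf, and O-RF is assumed to copy the buffered value to shared memory and reset the buffer entry to udf. The choice of which root node to execute is left to the caller.

```python
UDF = None  # stands for the special value udf


def step(cmd, sigma, Sigma):
    """Execute one command; return the new (sigma, Sigma), or None if a guard gets stuck."""
    kind, i = cmd[0], cmd[1]
    sigma = dict(sigma)
    Sigma = {t: dict(b) for t, b in Sigma.items()}
    buf = Sigma.setdefault(i, {})                  # buffer of thread i
    if kind == "MV":                               # ("MV", i, r, e)
        _, _, r, e = cmd
        sigma[r] = e(sigma)
    elif kind == "LD":                             # ("LD", i, r, x)
        _, _, r, x = cmd
        pending = buf.get(x, UDF)
        sigma[r] = pending if pending is not UDF else sigma[x]
    elif kind == "ST":                             # ("ST", i, x, e): buffered only
        _, _, x, e = cmd
        buf[x] = e(sigma)
    elif kind == "RF":                             # ("RF", i, x): buffer to memory
        _, _, x = cmd
        sigma[x] = buf[x]
        buf[x] = UDF
    elif kind == "GD":                             # ("GD", i, phi)
        if not cmd[2](sigma):
            return None                            # the graph gets stuck
    elif kind != "FN":                             # FN leaves the state unchanged
        raise ValueError(f"unknown command: {kind}")
    return sigma, Sigma


state = ({"x": 0, "r0": 5, "r1": 0, "r2": 0}, {})
state = step(("ST", 0, "x", lambda s: s["r0"]), *state)   # thread 0 buffers x = 5
own_view = step(("LD", 0, "r1", "x"), *state)             # thread 0 sees its own store
other_view = step(("LD", 1, "r2", "x"), *state)           # thread 1 still sees x = 0
state = step(("RF", 0, "x"), *state)                      # reflect to shared memory
after_reflect = step(("LD", 1, "r2", "x"), *state)        # thread 1 now sees x = 5
```

The usage lines show the intended reading: a store is visible to its own thread immediately, and to other threads only after the corresponding reflect.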
Concurrent Program Logic
In this section, we define our concurrent program logic. Our assertion language is defined as follows:
where E represents a pseudo-expression denoting thread-local variables r, shared variables x, buffered variables x i , next threadlocal variables r, next shared variables x, next buffered variables x i , constant values val, arithmetic operations, and so on. The buffered variable x i represents the value written to the shared variable x by the ST on a thread with identifier i, but not yet reflected.
In addition, df(x^i) indicates that x^i is defined; i.e., there is a pending ST for x on thread i. We define [e/x^i]df(x^i) as ⊤. The next variable of v represents the value of v in the state to which the current state transits under the operational semantics. We call v a current variable.

Figure 6 shows the judgment rules. They are defined following the styles of Stølen and Xu's proof systems [42], [47], [48]. We assume that rely/guarantee-conditions are reflexive and transitive. Each judgment has the form {pre, rely} G {guar, post} (where pre and post have no next variable), which states that, if the program graph G is computed under the pre-condition pre and the rely-condition rely (which the other threads guarantee) according to the operational semantics of Section 3, then the guarantee-condition guar (on which the other threads rely) holds (for any possible nondeterministic computation), as usual in conventional rely-guarantee systems. Furthermore, if the computation terminates and does not get stuck, then the post-condition post also holds.

More specifically, rule L-MV handles the substitution of thread-local variables with expressions. This is the same as in conventional rely-guarantee proof systems. [e/v] represents a substitution of v with e. We define $\models \Phi$ as $\langle\sigma, \Sigma\rangle, \langle\sigma', \Sigma'\rangle \models \Phi$ for any $\langle\sigma, \Sigma\rangle$ and $\langle\sigma', \Sigma'\rangle$, where $\langle\sigma, \Sigma\rangle, \langle\sigma', \Sigma'\rangle \models \Phi$ is defined as an extension (for buffered variables) of the standard definition in rely-guarantee systems, as shown in Fig. 7. In the following, we often write $\langle\sigma, \Sigma\rangle \models \Phi$ when Φ has no next variable. The first and second assumptions mean that pre and post, respectively, are stable under rely, which the other threads guarantee (written pre ⊥ rely and post ⊥ rely).
The third assumption means that pre must be a sufficient condition that implies guar under an invariant condition about variables before and after an execution of C (written as C U W ) where U and W are finite sets of current non-buffered and buffered variables that occur in guar, respectively. A formula MV i r e U W is defined as r = e ∧ Inv U \ {r} ∧ Inv W, which means that the value of r becomes equal to the evaluation of e while values of variables in U \ {r} and W are invariant where
Its formal definition is shown in Fig. 8. The fourth assumption means that pre must be a sufficient condition that implies post with respect to the substitution.
Rule L-EM of Fig. 6 states that the empty graph does not affect anything. Rule L-LD handles substitutions of thread-local variables with shared variables. Note that r is substituted with the auxiliary buffered variable x^i instead of the shared variable x. Rule L-ST handles the substitution of shared variables with expressions. Note that, as in L-LD, this rule considers the buffered variable x^i instead of the shared variable x. Rule L-RF handles reflections from a store buffer to shared memory. Threads j are those that have pending stores for x in their own buffers, and threads k are those that do not. All occurrences of x^k are simultaneously replaced with x^i since they are undefined; i.e., thread k observes the effect of the store of thread i if there is no pending store for x on thread k, where $\overrightarrow{X}$ denotes multiple Xs and fv(Φ) represents the set of free variables in Φ in the standard manner. Occurrences of x^j are not replaced with x^i; i.e., thread j observes the pending store in its own buffer. Although pre-conditions appear to become huge through combinations of buffered variables, we can often construct a derivation in which buffered variables x^i occur only in the pre-condition of RF^i x by using L-PR appropriately; see the example verification in Section 6. Rule L-FN handles FN and, like L-EM, does nothing, which means that FN is only used as a landmark for linking nodes that have dependencies. Rule L-GD handles a guard ϕ. It asserts that pre and ϕ imply post. Rule L-WK is the so-called consequence rule. Rule L-SQ handles G_0 → G_1, which is considered a sequential composition of program graphs since all nodes in G_0 must be executed before any command of G_1 is executed. Rule L-PR handles program graphs that consist of G_0 and G_1 where there is no edge between G_0 and G_1. This corresponds to the rule for parallel composition of programs in a standard rely-guarantee system.
The third assumption means that G 1 's rely condition rely 1 must be guaranteed by the global rely-condition rely or G 0 's guarantee-condition guar 0 . The fourth assumption is similar. The fifth assumption means that guar must be guaranteed by either guar 0 or guar 1 . L-LN handles the nondeterministic choice of one of the leaf nodes of G. This rule focuses on every leaf node (denoted as C) in G one by one, and checks exhaustively whether {pre, rely} G \ {C} {guar, Φ} and {Φ, rely} C {guar, post} hold. It can be considered that this rule handles all the linearizations [18] of commands.
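The remark that L-LN covers all linearizations of commands can be illustrated by enumerating the topological orders of a program graph. The sketch below is an assumption-laden illustration rather than part of the proof system: it builds orders from the front by repeatedly picking a node whose predecessors are already placed, whereas L-LN peels off leaf nodes from the back; both views describe the same set of linearizations. The function name and the string encoding of commands are hypothetical.

```python
def linearizations(nodes, edges):
    """Yield every total order of `nodes` that respects the dependency `edges`."""

    def extend(prefix, remaining):
        if not remaining:
            yield tuple(prefix)
            return
        for n in sorted(remaining):
            # n may come next only if none of its predecessors is still pending
            if all(src not in remaining for src, dst in edges if dst == n):
                yield from extend(prefix + [n], remaining - {n})

    yield from extend([], frozenset(nodes))


# LD r2 x  (+)  (ST y r3 -> RF y): the load is independent of the store and reflect
orders = list(linearizations(
    {"LD r2 x", "ST y r3", "RF y"},
    {("ST y r3", "RF y")},
))
```

For this graph the load may be placed before, between, or after the store and its reflect, giving three linearizations; a chain of three dependent commands would give exactly one.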
Validity
We define computations of program graphs, and give validity for judgments. We define the set of computations Cmp(G) of G as the set of finite or infinite sequences c of configurations in which adjacent configurations are related by the transitions defined in Section 3. We define c · c′ as the concatenation of c and c′. We define $\models \{pre, rely\}\ G\ \{guar, post\}$ as Cmp(G) ∩ A(pre, rely) ⊆ C(guar, post), which means that any computation under the pre/rely-conditions satisfies the guarantee/post-conditions, as shown in Fig. 9. This kind of validity is called partial correctness [46].
Program Verification via Program Graphs
The verification of a program is carried out by deriving judgments with the rules defined in Fig. 6 for its program graphs, which consist of finitely unfolded loops. Careful readers may notice that the logic described in this section handles a single program graph, but the translation in Section 2.3 may generate an infinite family of program graphs. Formally speaking, an infinite family of derivations for the generated program graphs corresponds to the proof of the original program. That is, we have to construct a derivation for each generated program graph. Although it seems difficult to construct an infinite family of derivations, it is not always impossible. For example, we show an inductive construction of derivations in Section 6.

Careful readers may also wonder why we unfold loops, unlike the conventional program logics [19], [26]. The reason is that our work explicitly handles dependencies/independencies across loop iterations, and unfolding loops is the most straightforward way to achieve this. To the best of our knowledge, there is no concurrent program logic which handles dependencies/independencies across loop iterations.

Careful readers may also wonder if a family of program graphs contains an inadmissible behavior of a program. For example, the program in Fig. 1 generates a program graph which has ¬ r_0 = 0 as a root node, but the condition is never satisfied because r_0 is initialized to 0. Therefore, the program graph can be considered to denote an inadmissible behavior of the program. However, this is not a problem because the program graph gets stuck according to the operational semantics of Section 3, and, by the definition of validity of judgments, a derivation of a judgment for a program graph that gets stuck does not imply that the behavior of the program is valid.

In addition, careful readers may also wonder if a program graph cannot capture a behavior of a program that is represented by an infinitely unfolded loop. However, that is not a problem because non-terminating programs are out of the scope of this work, which considers the so-called partial correctness [46].
Soundness and Relative Completeness
In this section, we show our concurrent program logic is sound to the operational semantics defined in Fig. 4 . We also show soundness and relative completeness [46] between a slight extension of our concurrent program logic and its corresponding operational semantics with respect to histories of computations introduced in Refs. [47] , [48] .
Most of the soundness and relative completeness proofs in this paper follow those in Xu's PhD thesis and its journal version [47], [48]. Differences in some definitions, propositions, lemmas, and theorems stem from the fact that our semantics and logic handle not programs themselves but program graphs. Interesting differences are that our proof for completeness is simpler than the proof in Refs. [47], [48], and that our completeness theorem claims a conclusion including L-SQ- and L-PR-freeness.
To handle arbitrary program graphs, we added the L-LN rule in Section 4, which is unnecessary when handling programs only. The rule is troublesome in that it requires a huge number of judgments as assumptions and makes constructions of derivations tedious. In return, the L-LN rule is powerful enough to make the proof of the completeness theorem drastically simpler, while the proof of the completeness theorem in Refs. [47], [48] appears to be complicated since constructing judgments as assumptions of the L-PR rule (from a valid judgment) is hard. In fact, the L-SQ- and L-PR-freeness in Theorem 2 shows that L-LN subsumes L-SQ and L-PR in provability.
In order to prove relative completeness, we slightly extend the operational semantics of Section 3 and the concurrent program logic of Section 4 with a notion of a history, which informally holds a sequence of pairs of executed assignments and states, following the style of Refs. [47], [48]. History variables allow us to take snapshots of computations at arbitrary times, and formally enrich assertion languages. As high expressiveness of assertion languages is a key point in proving completeness of program logics such as Hoare logic [46], introducing history variables is a typical method to enrich assertion languages in concurrent program logics. Formally, we assume that the set of values contains histories, which consist of sequences of the form $\langle C_0, \overrightarrow{val_0}\rangle \cdots \langle C_{n-1}, \overrightarrow{val_{n-1}}\rangle$ where $C_j$ is a command or a symbol Env (meaning an environment), and introduce special variables (called history variables). Given a program graph, let $\overrightarrow{v}$ be the sequence of its current variables and their buffered variables. We extend the definition of the assignment commands MV r e, LD r x, ST x e, and RF x so that they can have additional assignments of histories to history variables. Formally, C is converted so that it also assigns to the history variables . . . , $h_{n-1}$, where atomic{ C, C′ } means that C and C′ are executed atomically (without being disturbed by the other threads). We also extend the operational semantics so that history variables are updated appropriately. Note that the effect of an assignment to h is not saved in Σ but immediately reflected to σ, i.e., history variables are shared and unbuffered. We also extend the concurrent program logic so that, on each judgment rule for an assignment C, h in a post-condition is updated to $h \cdot \langle C, \overrightarrow{v}\rangle$, and the corresponding equation for the next value of h is added to C U W as a conjunct. We treat h as a non-buffered variable, i.e., h can belong to U in C U W. Furthermore, we add the so-called auxiliary variables rule [41], [42], [48] as follows:
• C is an assignment whose left value belongs to − → v 0 ,
• no variable in − → v 0 occurs in assignments whose left values do not belong to − → v 0 , and
• no variable in − → v 0 freely occurs in guards. 
We write prefix(c, i) and postfix(c, i) as the prefix of c with length i + 1 and the sequence that is derived from c by removing prefix(c, i − 1), respectively. post 1 ) , rely ∨ guar 0 ⊃ rely 1 , rely ∨ guar 1 ⊃ rely 0 , guar 0 ∨ guar 1 ⊃ guar, and c ∈ Cmp(G 0 G 1 ) ∩ A(pre 0 ∧ pre 1 , rely). In addition, we take c 0 ∈ Cmp(G 0 ) and c 1 ∈ Cmp(G 1 ) such that c = c 0 c 1 by Prop. 1. St(c, k) , St(c, k + 1) rely ∨ guar 1 holds. By rely ∨ guar 1 ⊃ rely 0 , prefix(c 0 , i + 1) ∈ A(pre 0 , rely 0 ) holds. Since Cmp(G 0 ) ∩ A(pre 0 , rely 0 ) ⊆ C(guar 0 , post 0 ) holds, in particular, St(c, i), St(c, i + 1) guar 0 holds. This contradicts St(c, i), St(c, i + 1) guar 0 . The latter case is similar.
2. Immediate from the definition of c = c 0 c 1 and 1. 3. Immediate from 1 and guar 0 ∨ guar 1 ⊃ guar. 4. By 2, rely ∨ guar 0 ⊃ rely 1 , and rely ∨ guar 1 ⊃ rely 0 , c 0 ∈ A(pre 0 , rely 0 ) and c 1 ∈ A(pre 1 , rely 1 ) hold.
By Cmp(G 0 ) ∩ A(pre 0 , rely 0 ) ⊆ C(guar 0 , post 0 ) and Cmp(G 1 ) ∩ A(pre 1 , rely 1 ) ⊆ C(guar 1 A(pre, rely) .
σ 0 , Σ 0 pre, and σ j , Σ j , σ j+1 , Σ j+1 rely for any 0 ≤ j < n. By pre ⊥ rely, σ n , Σ n pre. By the definition, σ n , Σ n , σ n [r |e| σn ], Σ n MV i r e U W . By pre ⊃ MV i r e U W ⊃ guar, σ n , Σ n , σ n [r |e| σn ], Σ n guar, that is, σ n , Σ n , σ n+1 , Σ n+1 guar.
In addition, assume |c| < ω. By pre ⊃ [e/r]post, σ n , Σ n [e/r]post. By the definition, σ n [r |e| σn ], Σ n post, that is,
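Cases of L-EM, L-LD, L-ST, L-RF, and L-FN are similar.

Second, assume L-WK. Let c ∈ Cmp(G) ∩ A(pre, rely). By pre ⊃ pre_0 and rely ⊃ rely_0, c ∈ Cmp(G) ∩ A(pre_0, rely_0) holds. By induction hypothesis, c ∈ C(guar_0, post_0) holds. By guar_0 ⊃ guar and post_0 ⊃ post, c ∈ C(guar, post) holds.

Third, assume L-GD. Let c ∈ Cmp({ϕ^i}) ∩ A(pre, rely). There exist $\sigma_0, \Sigma_0, \ldots$ such that $\langle\sigma_{n+1}, \Sigma_{n+1}\rangle = \langle\sigma_n, \Sigma_n\rangle$, $\sigma_n \models \varphi$, $\langle\sigma_0, \Sigma_0\rangle \models pre$, and $\langle\sigma_j, \Sigma_j\rangle, \langle\sigma_{j+1}, \Sigma_{j+1}\rangle \models rely$ for any 0 ≤ j < n.
By the reflexivity of guar, $\langle\sigma_n, \Sigma_n\rangle, \langle\sigma_n, \Sigma_n\rangle \models guar$ holds, that is, $\langle\sigma_n, \Sigma_n\rangle, \langle\sigma_{n+1}, \Sigma_{n+1}\rangle \models guar$ holds.
In addition, assume |c| < ω. By pre ⊥ rely, $\langle\sigma_n, \Sigma_n\rangle \models pre$. By pre ⊃ ϕ ⊃ post and $\sigma_n \models \varphi$, $\langle\sigma_n, \Sigma_n\rangle \models post$ holds. That is, $\langle\sigma_{n+1}, \Sigma_{n+1}\rangle \models post$ holds. By post ⊥ rely, $\langle\sigma_{|c|-1}, \Sigma_{|c|-1}\rangle \models post$ holds.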
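The case that ... $A(\mathit{pre}, \mathit{rely})$. There exist $\mathit{st}_0, \delta_0, \ldots$ such that

$\mathit{st}_0 \models \mathit{pre}$, and $\mathit{st}_j, \mathit{st}_{j+1} \models \mathit{rely}$ for any $0 \le j < n$.

Let $c'$ and $c''$ be $\langle G_0, \mathit{st}_0\rangle \xrightarrow{\delta_0} \cdots \xrightarrow{\delta_{n-1}} \langle\emptyset, \mathit{st}_n\rangle$ and $\mathit{postfix}(c, n)$, respectively. Obviously, $c' \in \mathit{Cmp}(G_0) \cap A(\mathit{pre}, \mathit{rely})$ holds. By the induction hypothesis, $c' \in C(\mathit{guar}, \Phi)$ holds. By the definition, $\langle\sigma_n, \Sigma_n\rangle \models \Phi$ holds. Therefore, $c'' \in \mathit{Cmp}(G_1) \cap A(\Phi, \mathit{rely})$ holds. By the induction hypothesis, $c'' \in C(\mathit{guar}, \mathit{post})$ holds. Therefore, $c \in C(\mathit{guar}, \mathit{post})$ holds.

Fifth, assume L-PR. The claim follows by Lem. 1.3 and 1.4.

Sixth, assume L-LN. Let $c \in \mathit{Cmp}(G) \cap A(\mathit{pre}, \mathit{rely})$. There exist $\mathit{st}_0, \delta_0, \ldots$ such that

... and $\mathit{st}_j, \mathit{st}_{j+1} \models \mathit{rely}$ for any $0 \le j < n$. Let $c'$ and $c''$ be $\langle G \setminus \{C\}, \mathit{st}_0\rangle \xrightarrow{\delta_0} \cdots \xrightarrow{\delta_{n-1}} \langle\emptyset, \mathit{st}_n\rangle$ and $\mathit{postfix}(c, n)$, respectively. Obviously, $c' \in \mathit{Cmp}(G \setminus \{C\}) \cap A(\mathit{pre}_C, \mathit{rely})$ holds. By the induction hypothesis, $c' \in C(\mathit{guar}, \Phi_C)$ holds. By the definition, $\langle\sigma_n, \Sigma_n\rangle \models \Phi_C$ holds. Therefore, $c'' \in \mathit{Cmp}(C) \cap A(\Phi_C, \mathit{rely})$ holds. By the induction hypothesis, $c'' \in C(\mathit{guar}, \mathit{post})$ holds. Therefore, $c \in C(\mathit{guar}, \mathit{post})$ holds.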
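Finally, assume L-AX. Let $c \in \mathit{Cmp}(G) \cap A(\mathit{pre}, \mathit{rely})$. There exist $\sigma_0, \Sigma_0, \delta_0, \ldots$ such that

... $\mathit{pre} \land \mathit{pre}_0$, and $\langle\sigma_j, \Sigma_j\rangle, \langle\sigma_{j+1}, \Sigma_{j+1}\rangle \models \mathit{rely} \land \mathit{rely}_0$ for any $v_1 \to v_0$ and $0 \le j < n$. Therefore, $c \in \mathit{Cmp}(G) \cap A(\mathit{pre} \land \mathit{pre}_0, \mathit{rely} \land \mathit{rely}_0)$ holds. By the induction hypothesis, $c \in C(\mathit{guar}, \mathit{post})$ holds.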
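For orientation, the two rule shapes that the soundness argument leans on most can be written out from the premises the proof actually uses. The following is a reading of those premises only; the rule names L-WK and L-PR come from the text, but the exact statements in Section 4 may differ, and the postcondition $\mathit{post}_0 \land \mathit{post}_1$ in the parallel rule is an assumption in the style of standard rely-guarantee systems.

```latex
% L-WK (weakening), read off from the second case of the soundness proof.
\[
\frac{\{\mathit{pre}_0, \mathit{rely}_0\}\; G\; \{\mathit{guar}_0, \mathit{post}_0\}
      \quad \mathit{pre} \supset \mathit{pre}_0
      \quad \mathit{rely} \supset \mathit{rely}_0
      \quad \mathit{guar}_0 \supset \mathit{guar}
      \quad \mathit{post}_0 \supset \mathit{post}}
     {\{\mathit{pre}, \mathit{rely}\}\; G\; \{\mathit{guar}, \mathit{post}\}}
\]

% L-PR (parallel composition), read off from the hypotheses of Lemma 1.
% The conclusion's postcondition is an assumption, not taken from the text.
\[
\frac{\begin{array}{c}
        \{\mathit{pre}_0, \mathit{rely}_0\}\; G_0\; \{\mathit{guar}_0, \mathit{post}_0\}
        \qquad
        \{\mathit{pre}_1, \mathit{rely}_1\}\; G_1\; \{\mathit{guar}_1, \mathit{post}_1\} \\
        \mathit{rely} \lor \mathit{guar}_0 \supset \mathit{rely}_1
        \qquad
        \mathit{rely} \lor \mathit{guar}_1 \supset \mathit{rely}_0
        \qquad
        \mathit{guar}_0 \lor \mathit{guar}_1 \supset \mathit{guar}
      \end{array}}
     {\{\mathit{pre}_0 \land \mathit{pre}_1, \mathit{rely}\}\; G_0 \parallel G_1\;
      \{\mathit{guar}, \mathit{post}_0 \land \mathit{post}_1\}}
\]
```

The side conditions of the parallel rule say that each thread may rely on whatever the environment or the other thread guarantees, which is exactly what items 1 to 4 of Lemma 1 exploit.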
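Next, we show relative completeness. We define $G^h$, which has an additional assignment to $h$ for each assignment. We also define

... where $\varepsilon$ denotes the sequence with length 0. We say that $\Phi$ characterizes $G_0$, $G_n$, $\mathit{pre}$, and $\mathit{rely}$ if $\mathit{st}_n \models \Phi$ holds if and only if $\langle G_0, \mathit{st}_0\rangle \xrightarrow{\delta_0} \cdots \xrightarrow{\delta_{n-1}} \langle G_n, \mathit{st}_n\rangle \in A(\mathit{pre}, \mathit{rely})$ holds for some $\mathit{st}_j$ and $\delta_j$ ($0 \le j < n$). We say that an assertion language is expressive if, for any $G_0$, $G_n$, $\mathit{pre}$, and $\mathit{rely}$, there exists $\Phi$ that characterizes $G_0^h$, $G_n^h$, $\mathit{pre}^h$, and $\mathit{rely}^h$. For any $\mathit{pre}$, $\mathit{rely}$, and $\mathit{post}$, we define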
Lemma 2.
( 1 ) pre ⊃ pre rely .
( 2 ) post rely ⊃ post. The case that
Cases that G is ∅ or a program graph consisting of one node are similar.
Next, assume G = {ϕ i }. Let σ, Σ , st pre rely . Let c be ϕ i , σ, Σ c −→ ∅, σ, Σ ∈ A(pre rely , rely). By the definition, σ ϕ holds. By reflexivity of rely, we can define c ∈ A(pre rely , rely) as c · ∅, σ, Σ e −→ ∅, st . By Lem. 2, c ∈ C(guar, post rely ) holds. Therefore, σ, Σ , st guar holds.
In addition, by c ∈ A(pre rely , rely) ⊆ C(guar, post rely ), σ, Σ post rely holds.
Therefore, {pre, rely} G {guar, post} holds by L-GD, Lem. 2, and L-WK.
The case that
Finally, assume the other case. Let $C \in L(G)$. Let $h$ be a history variable that does not appear in $G$, $\mathit{pre}$, $\mathit{rely}$, $\mathit{guar}$, or $\mathit{post}$. By Prop. 2, $\{\mathit{pre}^h, \mathit{rely}^h\}\ G^h\ \{\mathit{guar}, \mathit{post}\}$ holds. Since the assertion language is expressive, there exists $\Phi_C$ that characterizes $G^h$, $\{C\}^h$, $\mathit{pre}^h$, and $\mathit{rely}^h$.
Example Derivations
In this section, we give example derivations in the logic in Section 4.
First, we give a simple derivation for the trivial program in Fig. 10 to show that finding an invariant is the key point.
If the precondition states that $x$ is non-negative and the buffers are empty, then $r_2$ is non-negative when the program finishes, since the writer thread only increases $x$; here we assume $0 \le r_1$ for simplicity. We can easily construct a derivation that ensures this. The program graph $G_0$ of the first (writer) thread of the program under PSO is shown in Fig. 11, where a rectangle denotes a loop iteration. A judgment is $\{I, \mathit{rely}_0\}\ G_0'\ \{\mathit{guar}_0, I\}$, where $G_0'$ is any subgraph of $G_0$, and

... where $G_1$ is a program graph of the reader thread of the program, $\{I \land 0 \le x^1, \mathit{rely}_0 \land \mathit{rely}_1\}\ G_0 \parallel G_1\ \{\mathit{guar}_0 \lor \mathit{guar}_1, 0 \le r_2\}$ is derivable. Thus, finding an invariant $I$ is the key point in constructing a derivation.
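The role of the invariant can be illustrated by brute force on a small stand-in program. The sketch below is not the program of Fig. 10, which is not reproduced here; it assumes a writer that repeats "LD r0 x; ST x (r0 + r1); RF x" a fixed number of times and a reader that executes a single "LD r2 x", with a one-slot store buffer for the writer. The function name `explore_reader_values` and the state layout are hypothetical. It enumerates all interleavings and asserts the candidate invariant (shared and buffered values of x are non-negative) in every reachable state.

```python
def explore_reader_values(x0, r1, iterations):
    """Return every value the reader's r2 can end with, over all interleavings.

    Assumed writer: repeat `iterations` times: LD r0 x; ST x (r0 + r1); RF x.
    Assumed reader: LD r2 x (its own buffer is empty, so it reads shared memory).
    """
    total_writer_steps = 3 * iterations  # each iteration is LD, ST, RF
    results = set()
    seen = set()
    # State: (writer steps done, r0, writer's buffer slot for x, shared x, r2).
    stack = [(0, 0, None, x0, None)]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pc, r0, buf, mem, r2 = state
        # Candidate invariant: shared x and any buffered value are non-negative.
        assert mem >= 0 and (buf is None or buf >= 0)
        if pc == total_writer_steps and r2 is not None:
            results.add(r2)
            continue
        if pc < total_writer_steps:
            step = pc % 3
            if step == 0:    # LD r0 x: read own buffer if set, else memory
                loaded = buf if buf is not None else mem
                stack.append((pc + 1, loaded, buf, mem, r2))
            elif step == 1:  # ST x (r0 + r1): write into the buffer only
                stack.append((pc + 1, r0, r0 + r1, mem, r2))
            else:            # RF x: reflect the buffered value to memory
                stack.append((pc + 1, r0, None, buf, r2))
        if r2 is None:       # reader step: LD r2 x
            stack.append((pc, r0, buf, mem, mem))
    return results
```

With a non-negative initial x and a non-negative r1, the assertion never fires and every returned value is non-negative, which mirrors why an invariant of the form "x and the buffered x are non-negative" is enough to conclude that $r_2$ is non-negative.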
Finally, we verify the program introduced in Section 1 using the concurrent program logic of Section 4. We prove that $r_2 \le r_3 + 1$ holds when the program terminates, given an appropriate precondition, under TSO.
We denote the family of program graphs generated by the TSO translation (Section 2) by $D_n \parallel G_1$, where $D_n$ is derived by unfolding the WL loop of the left-side thread $n$ times, and $G_1$ is $\mathrm{LD}^1\, r_2\, x \to \mathrm{LD}^1\, r_3\, y \to \mathrm{ST}^1\, z\, 1 \to \mathrm{RF}^1\, z$ on the right-side thread. For example, Fig. 12 shows $D_2$, where the two rectangles denote the two iterations. As shown in Fig. 12, there exists a dependency ...

Let us consider another view of the program graph, which enables us to use the L-SQ rule. The program graph in Fig. 13 also denotes a unit of $D_n$, although it does not exactly match an iteration of the original program. Formally, $G_0$ is defined as follows:
By the definition, we can represent D n as sequential compositions of program graphs:
where G 0 = ∅ and G n = G → G n−1 (1 ≤ n) . We prove that G 0 has an invariant I at its entry and exit, as shown in Fig. 14 where 15 shows a full derivation where
Since $D_0$ gets stuck under the pre-condition $r_0 = 0$ and a rely-condition that leaves $r_0$ unchanged, all admissible behaviors of the program have been verified.
On the other hand, under PSO we cannot construct a similar derivation, since there exists no edge from $\mathrm{RF}^0\, y$ to $\mathrm{RF}^0\, x$. Formally, no such derivation exists, by the soundness of the logic described in Section 4 and the existence of the following computation:

... where $D_1'$ and $D_2'$ are the program graphs derived from $D_1$ and $D_2$, respectively, by removing the edges between $\mathrm{RF}^0\, y$ and $\mathrm{RF}^0\, x$, $R^*$ is the reflexive and transitive closure of a relation $R$, and $\sigma$ and $\Sigma$ are the constant functions to 0 and udf, respectively.
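The contrast between TSO and PSO can be made concrete with a small exhaustive search over a textbook store-buffer model. This sketch does not use the program-graph semantics of this paper, and the program of Section 1 is not reproduced here: the writer below ("for k = 1..n: x = k; y = k") and the reader ("r2 = x; r3 = y") are hypothetical stand-ins, as is the function name `observable_pairs`. A single FIFO buffer approximates TSO, and an independent FIFO per variable approximates PSO, which corresponds to dropping the ordering between reflecting y and reflecting the next x.

```python
def observable_pairs(n, per_variable_buffers):
    """Return all (r2, r3) pairs the reader can observe.

    Assumed writer: for k = 1..n: x = k; y = k (stores enter a buffer).
    Assumed reader: r2 = x; r3 = y (loads read shared memory).
    per_variable_buffers=False: one FIFO buffer (TSO-like).
    per_variable_buffers=True: one FIFO per variable (PSO-like).
    """
    results = set()
    seen = set()
    # State: (stores issued, x-stores flushed, y-stores flushed, r2, r3).
    # Since x and y take the values 1..n in order, memory holds x = fx, y = fy.
    stack = [(0, 0, 0, None, None)]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        issued, fx, fy, r2, r3 = state
        if r3 is not None:
            results.add((r2, r3))
            continue
        if issued < 2 * n:                     # writer issues its next store
            stack.append((issued + 1, fx, fy, r2, r3))
        issued_x = (issued + 1) // 2           # stores alternate x, y, x, y, ...
        issued_y = issued // 2
        # Flush the oldest buffered x; a single FIFO also needs all older y flushed.
        if fx < issued_x and (per_variable_buffers or fx == fy):
            stack.append((issued, fx + 1, fy, r2, r3))
        # Flush the oldest buffered y; a single FIFO also needs all older x flushed.
        if fy < issued_y and (per_variable_buffers or fy + 1 == fx):
            stack.append((issued, fx, fy + 1, r2, r3))
        if r2 is None:                         # reader: r2 = x
            stack.append((issued, fx, fy, fx, r3))
        else:                                  # reader: r3 = y
            stack.append((issued, fx, fy, r2, fy))
    return results
```

Under the single-FIFO model every observable pair satisfies r2 <= r3 + 1, whereas with per-variable buffers and n = 2 the pair (2, 0) becomes observable, because both stores to x can reach memory before any store to y does.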
Related Work
Several papers present concurrent program logics for relaxed memory consistency models. However, most of them are specific to a single memory consistency model. Therefore, they cannot be compared directly with this work, whose goal is to handle various memory consistency models.
Ridge [33] gave a rely-guarantee system for the x86-TSO semantics [32], and demonstrated differences in the behavior of Simpson's four-slot algorithm [37] under SC and x86-TSO. However, the annotated pseudo-code in Ref. [33] has barriers and memory fences at the exits of loop iterations, that is, no iteration has dependencies on the other iterations. It is unclear whether his system can handle the program with dependencies across loop iterations introduced in Section 1. Also, the system is specific to x86-TSO, and he did not prove its completeness, whereas our work expresses multiple relaxed memory consistency models as translations into program graphs, and proves relative completeness.
Ferreira et al. [17] introduced a relation between commands (corresponding to statements in our paper) called subsumption, and represented memory models as binary relations between statements. By extending the judgments of conventional separation logic [12], [30], they developed a concurrent separation logic for relaxed memory models with invariants satisfied by the binary relations. However, the subsumption relation is a congruence: for example, if $c_0$ and $c_1$ are related by subsumption, then so are WL $\varphi$? $c_0$ and WL $\varphi$? $c_1$. Their concurrent separation logic therefore appears to handle loops only coarsely, and does not seem able to verify directly the program with dependencies across loop iterations introduced in Section 1. In addition, they did not show its completeness.
Vafeiadis et al. [44], [45] gave concurrent separation logics for restricted C11/C++11 memory models [22], [24], and showed some example derivations that include loops. However, all the loops consist of one expression, consist of one conditional branch with one atomic load operation, or have compare-and-swap at their exits. Therefore, as with Ref. [33], it is unclear how to handle the program with dependencies across loop iterations introduced in Section 1. Similarly, since no program in Lahav et al.'s paper [27], which performed Owicki-Gries reasoning for weak memory models, contains multiple statements except SK, it is not clear how to handle dependencies across loop iterations. In addition, they did not show completeness.
This paper provides a concurrent program logic following Stølen's and Xu et al.'s proof systems [42], [47], [48], which do not deal with relaxed memory consistency models. In other words, their systems deal with the strictest memory consistency model only.
Conclusion and Future Work
This paper provides a concurrent program logic for relaxed memory consistency models that can represent standard memory consistency models and handle dependencies across loop iterations. We introduced graph representations of programs called program graphs, gave a small-step operational semantics for them, and formalized relaxed memory consistency models as translations to graphs. Our concurrent program logic is kept theoretically simple, and is not only sound but also relatively complete with respect to the semantics.
There are four future directions for this work. The first is to support memory hierarchy. We assumed one-layer store buffers for simplicity of presentation. Therefore, our logic cannot currently verify programs that show different orders of effects to different threads, for example, Independent-Reads-Independent-Writes [10]. The authors' previous works on model checking [2], [3], [5] handled such memory consistency models by introducing a notion of visibility of effects on each thread. We think that verification of such programs is possible by introducing multilayered store buffers in the operational semantics and auxiliary variables that represent variables on the buffers in the concurrent program logic. The second is to support relaxed memory consistency models that do not assume global time (e.g., the UPC memory model [43] and the C11/C++11 memory models [22], [24]). The authors' previous works on model checking [2], [3], [5] handled such memory consistency models by introducing a notion of speculative behavior on each thread. We think that the operational semantics and the concurrent program logic can be extended in a similar way. The third is to extend our approach to support pointers, arrays, functions, dynamic creation and termination of threads, and so forth, so that it can be applied to real-world programs such as those written with OpenMP [31]. Our formulation seems compatible with some existing works (e.g., separation logic [12], [30], deny-guarantee [16], and the concurrent views framework [15]). The fourth is to develop software to construct and check derivations. As we have seen in Section 6, constructing a derivation is not easy, whereas checking whether a given derivation follows the inference rules of our logic is.
Finally, the program graphs proposed in this paper also seem usable as representations of programs in other fields, for example, in model checking parameterized by memory consistency models, or as intermediate representations in compilers parameterized by memory consistency models.
