Abstract-We address the problem of statically checking control state reachability (as in possibility of assertion violations, race conditions or runtime errors) and plain reachability (as in deadlock-freedom) of phaser programs. Phasers are a modern non-trivial synchronization construct that supports dynamic parallelism with runtime registration and deregistration of spawned tasks. They allow for collective and point-to-point synchronizations. For instance, phasers can enforce barriers or producer-consumer synchronization schemes among all or subsets of the running tasks. Implementations are found in modern languages such as Habanero Java. Phasers essentially associate phases to individual tasks and use their runtime values to restrict possible concurrent executions. Unbounded phases may result in infinite transition systems even in the case of programs only creating finite numbers of tasks and phasers. We introduce an exact gap-order based procedure that always terminates when checking control reachability for programs generating bounded numbers of coexisting tasks and phasers. We also show that verifying plain reachability is undecidable even for programs generating few tasks and phasers. We then explain how to turn our procedure into a sound analysis for checking plain reachability (including deadlock freedom). We report on preliminary experiments with our open source tool.
Index Terms-phasers, safety verification, dynamic synchronization, collective synchronization, point-to-point synchronization, model checking
I. INTRODUCTION
We focus on safety verification of programs using phasers for task synchronization [1]-[3]. This sophisticated construct dynamically unifies collective and point-to-point synchronizations. For instance, it allows for dynamic registration and deregistration of tasks, allowing for a more balanced usage of the computing resources when compared to static producer-consumer or barrier constructs [4]. The construct can be added to any parallel programming language with a shared address space. For instance, it can be found in Habanero Java [3], an extension of the Java programming language. Phasers build on the clock construct from the X10 programming language [1]. They can be created dynamically and spawned tasks may get registered or deregistered at runtime.
This work is partially supported by the CENIIT research organization.
Intuitively, each phaser associates two phases (hereafter wait and signal phases) to each registered task. Apart from creating phasers and registering each other to them, tasks can individually issue wait and signal commands to a phaser they are registered to. Intuitively, signal commands are used to inform other registered tasks that the issuing task is done with its signal phase. The command is non-blocking. It increments the signal phase associated to the issuing task on the given phaser. The wait command is instead used to check whether all registered tasks are done with (i.e., have a signal phase that is strictly larger than) the issuing task's wait phase. This command may get blocked by a task that did not yet finish the corresponding phase. Unlike classical barriers, phasers need not force registered tasks to wait for each other at each single phase. Instead they allow them to proceed with the following phases (by issuing signal commands), or even to exit the construct by deregistering from the phaser. Such dynamic behavior allows for better load balancing and performance, but comes at the price of making it easy to introduce programming mistakes such as assertion violations, race conditions, runtime errors and, in the important situation where wait and signal commands are decoupled for maximum flexibility, deadlocks. We summarize our contributions in this work:
• We propose an operational model based on [2], [3], [5].
• We show undecidability of checking deadlock-freedom for programs with fixed numbers of tasks and phasers.
• We describe an exact gap-order based symbolic verification procedure for checking control state reachability (as in assertion violations, race conditions or runtime errors) and plain reachability (as in checking deadlock freedom).
• We show termination of the procedure for control state reachability when numbers of tasks and phasers are fixed.
• We describe how to turn the procedure into a sound overapproximation for plain reachability.
• We report on our preliminary experiments with our open source tool.
Related work. We are not aware of automatic formal verification works that focus on constructs allowing for such a degree of dynamic parallelism. Unlike [6] , we focus on fully automatic verification and consider the richer and more challenging phaser construct. The work of [5] considers the dynamic verification of phaser programs and can therefore only reason about particular program inputs and runs. The work in [7] uses Java Path Finder [8] to explore several runs, but still for one concrete input at a time. The works in [9] , [10] target gap-order systems. Although phaser programs share some of their properties (larger gaps can do more), the results in [9] , [10] do not apply since gap-order systems crucially forbid exact increments.
Outline. We describe a phaser program and recall some preliminaries in Sections II and III. This is followed in Section IV by a formal description of phaser programs and of the properties we want to check. We also establish the undecidability of checking deadlock freedom. We introduce a gap-order based symbolic representation in Section V and describe in Section VI a simple verification procedure. We then show decidability of checking control state reachability and introduce a relaxation procedure for checking plain reachability. Finally, we report on our experiments and conclude the work. Descriptions of the proofs can be found in [11] .
II. MOTIVATING EXAMPLE
The program listed in Fig. (1) uses Boolean shared variables B = {a, b, done}. A main task creates two phasers (lines 5 and 6). When creating a phaser, the task gets automatically registered to it. The main task also creates three other task instances (lines 9, 10 and 11). Several tasks can be registered to several phasers. When a task t is registered to a phaser p, a pair of numbers $(\mathit{wait}^t_p, \mathit{sig}^t_p)$, each in N ∪ {+∞}, is associated to the couple (t, p). The pair represents the individual wait and signal phases of task t on phaser p.
Registration of a task to a phaser can occur in one of three modes: SIG_WAIT, WAIT and SIG. In SIG_WAIT mode, a task may issue both signal and wait commands. In WAIT mode, a task may only issue wait commands on the phaser. Finally, when registered in SIG mode, a task may only issue signal commands. Issuing a signal command by a task on a phaser results in the task incrementing its signal phase associated to the phaser. This command is non-blocking. On the other hand, issuing a wait command by a task on a phaser p will block until all tasks registered on p exhibit signal values on p that are strictly larger than the wait value of the issuing task on phaser p. In this case, the wait phase of the issuing task is incremented. Intuitively, a signal command allows the issuing task to state that other tasks need not wait for it to complete its signal phase. Conversely, a wait command allows a task to make sure all registered tasks have moved past its wait phase.
Upon creation of a phaser, wait and signal phases are initialized to 0 (except in WAIT mode where the signal phase is instead initialized to +∞ in order to not block other waiters). The only other way a task may get registered to a phaser is if an already registered task registers it in the same mode (or in WAIT or SIG if the registrar is registered in SIG_WAIT). In this case, wait and signal phases of the newly registered task are initialized to those of the registrar. Tasks are therefore dynamically registered (e.g., lines 9-11). They can also dynamically deregister themselves (e.g., lines 25-26).
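The phase bookkeeping described above can be sketched in a few lines of Python. This is only an illustrative model, not the operational semantics of the paper: the class name `Phaser` and its method names are hypothetical, a registered task is assumed to inherit both the mode and the phases of its registrar, and a blocked wait is modeled by returning False instead of suspending the task.

```python
import math

INF = math.inf  # stands for the +infinity signal phase used in WAIT mode


class Phaser:
    """Toy model of one phaser: maps each registered task to [wait, sig]."""

    def __init__(self, creator, mode="SIG_WAIT"):
        # The creator is registered automatically with phases (0, 0),
        # or (0, +inf) in WAIT mode so that it never blocks other waiters.
        self.phases = {creator: [0, INF if mode == "WAIT" else 0]}

    def register(self, registrar, task):
        # A newly registered task starts with the phases of its registrar
        # (assumption: it is registered in the same mode).
        self.phases[task] = list(self.phases[registrar])

    def drop(self, task):
        # Deregistration: the task no longer constrains other waiters.
        del self.phases[task]

    def signal(self, task):
        # Non-blocking: increments the signal phase of the issuing task.
        self.phases[task][1] += 1

    def can_wait(self, task):
        # A wait is enabled iff every registered task has a signal phase
        # strictly larger than the wait phase of the issuing task.
        wait_phase = self.phases[task][0]
        return all(sig > wait_phase for (_, sig) in self.phases.values())

    def wait(self, task):
        if not self.can_wait(task):
            return False  # the task stays blocked
        self.phases[task][0] += 1
        return True
```

For instance, after `p = Phaser("main")` and `p.register("main", "prod")`, the call `p.wait("main")` fails until both `main` and `prod` have signaled, which mirrors the barrier-like use of phasers in the example.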
In this example, two producers and one consumer are synchronized using two phasers. The consumer requires the two producers to be ahead of it (wrt. the phaser main pointed to with prod) in order for it to consume their respective products. At the same time, the consumer needs to be ahead of both producers (wrt. the phaser main pointed to with cons) in order for these to produce their pair of products. It should be clear that phasers can be used as barriers for synchronizing dynamic subsets of concurrent tasks. Observe producers need not, in general, proceed in a lock step fashion. Producers may produce many items before consumers "catch up".
We are interested in checking: (a) control state reachability as in assertions (e.g., line 44), race conditions (e.g., mutual exclusion of lines 20 and 49) or runtime errors (e.g., signaling a dropped phaser), and (b) plain reachability as in deadlocks (e.g., a producer at line 23 and a consumer at line 50 with equal phases). Intuitively, both problems concern themselves with the reachability of target sets of program configurations. The difference is that control state reachability defines the targets with the states of the tasks (their control locations and whether they are registered to some phasers). Plain reachability can, in addition, use values or relations between values of involved phases. Observe that control state reachability depends on the values of the actual phases, but these values are not used to define the target sets. For example, assertions are expressed as predicates over Boolean variables (e.g., line 44). Establishing such an assertion requires capturing the constraints imposed by the phasers on the program behaviors.
Our work proposes a sound and complete algorithm for checking control state reachability in case a bounded number of tasks and phasers are generated. The algorithm can handle arbitrarily large phases, e.g., generated using nested signaling loops. The algorithm starts from a symbolic representation of all bad configurations and successively computes sets of predecessor configurations. We show termination based on a well-quasi-ordering argument that imposes restrictions on what can be expressed with our symbolic representation. For instance putting upper bounds on differences between phases is forbidden. Deadlock configurations cannot be faithfully captured with such restricted representations. Intuitively, a deadlocked configuration will have a cycle where each involved task is waiting for the task to its right but where the wait phase of each task equals the signal phase of the task it is waiting for. We show the problem of checking deadlock freedom to be undecidable even for programs only generating a bounded number of tasks and phasers. We explain how to turn our verification algorithm into a sound but incomplete procedure for checking deadlock-freedom. Precision can then be augmented on demand to eliminate false positives.
III. PRELIMINARIES
We use N and Z for natural and integer numbers respectively. We write A ⊎ B to mean the union of disjoint sets A and B. We let Pfn(A, B) be the set of partial functions from A to B and use $\emptyset_A$ for the empty function over A, i.e., $\emptyset_A(a)$ is undefined (written $\emptyset_A(a)\uparrow$) for all a ∈ A. Given a function g ∈ Pfn(A, B), we write g(a)↓ to mean that g(a) is defined and write g \ {a} to mean the restriction of g to the domain A \ {a}. We write g[a ← b] for the function that coincides with g on A except for a, which is sent to b. We abuse notation and let, for pairwise different $a_i$, $g[\{a_i \leftarrow b_i \mid i \in I\}]$ mean the function that coincides with g on A except for each $a_i$, which is sent to the corresponding $b_i$. We sometimes write a function g as a set {a → g(a) | a ∈ A}. It is then implicitly undefined outside of A.
Fig. 1. Two producers and one consumer are synchronized using two phasers. In this construction, the consumer requires both producers to be ahead of it (wrt. the prod phaser) in order for it to consume their respective products. At the same time, the consumer needs to be ahead of both producers (wrt. the cons phaser) in order for these to be able to produce their pair of products.
Regarding the program of Fig. (1), observe that there is no a priori bound on the values of the different wait and signal phases. In this example, the difference between signal and wait phases is bounded. This is not always the case in general.
IV. LANGUAGE
A program may use a set B of shared Boolean variables and a set V of local phaser variables:
A program consists of a set of tasks T. A task is declared with task(v_1, ..., v_k) {stmt}, where v_1, ..., v_k are phaser variables that are local to the declared task. A task can also create a new phaser with v = newPhaser() and store the identifier of the phaser in a local variable v. We let V be the union of all local phaser variables. When creating a phaser, a task gets registered to it. To simplify our description, we will assume all registrations to be in SIG_WAIT mode. Including the other modes is a matter of changing the initial phase values at registration and of statically ensuring the issued commands respect the registration mode. A task can deregister itself from a phaser referenced by a variable v with v.drop(). It can also issue signal or wait commands on a phaser on which it is registered and that is referenced by v. A task can spawn another task with asynch(task, v_1, ..., v_n). The issuing task registers the spawned task to the phasers it points to with v_1, ..., v_n. The issuing task need not wait for the spawned task and may directly continue its execution.
Assume a phaser program prg = (B, V, T). We inductively define the finite set S of control sequences as follows. S is the smallest set containing: (i) suffixes of each "stmt_i" appearing in some "task_i($v_{1_i}, \ldots, v_{k_i}$) {stmt_i}"; and (ii) suffixes of "stmt_i; while(cond) {stmt_i}; stmt_j" (respectively "stmt_i; while(cond) {stmt_i}") for each "while(cond) {stmt_i}; stmt_j" (respectively "while(cond) {stmt_i}") in S; and (iii) suffixes of "stmt_i; stmt_j" (respectively "stmt_i") for each "if(cond) {stmt_i}; stmt_j" (respectively "if(cond) {stmt_i}") appearing in S. We write s to mean some control sequence in S, and hd(s) and tl(s) to respectively mean the head and the tail of the sequence s.
A. Semantics.
A configuration c of prg = (B, V, T) is a tuple (T, P, bv, pc, pv, ϕ) where:
• T is the current finite set of task identifiers. We let t, u range over the values in T.
• P is the current finite set of phaser identifiers. We let p, q range over the values in P.
• bv : B → {true, false} is a total mapping that associates a value to each b ∈ B.
• pc : T → S is a total mapping that associates tasks to their remaining sequences (i.e., control location).
• pv : T → Pfn(V, P) is a total mapping that associates, to each task identifier in T, a partial mapping from the local phaser variables V to phaser identifiers P. It captures the values of the phaser variables V of each task.
• ϕ : P → Pfn(T, N²) is a total mapping that associates to each phaser p ∈ P a partial mapping ϕ(p) that is defined exactly on the identifiers of the tasks registered to p. For such a task t, ϕ(p)(t) is the pair $(\mathit{wait}^t_p, \mathit{sig}^t_p)$ representing wait and signal values of t on p.
The set of tasks T is altered by asynch(task, v_1, ..., v_n) and exit statements (rules (asynch) and (exit) in Fig. (3)). The set of phasers P is updated upon creation of new phasers (rule (newPhaser) in Fig. (3)). The mapping pv associates values to program phaser variables. Accessing variables with undefined values, or phasers to which the task is not currently registered, leads to runtime errors (rule (runtime error)). The total mapping ϕ captures states of phasers. It associates to each phaser identifier p in P a partial mapping ϕ(p). This partial mapping is defined for a task identifier t ∈ T (i.e., ϕ(p)(t)↓) iff the task t is registered to the phaser p. In this case, ϕ(p) gives the waiting phase $\mathit{wait}^t_p$ and the signaling phase $\mathit{sig}^t_p$ of the task t on the phaser p. Initially, a unique "main" task $t_0$ starts executing its $\mathit{stmt}_{main}$ with no phasers. Accordingly, ϕ is the empty function $\emptyset_\emptyset$ over an empty domain.
After a task t executes a v := newPhaser() statement (rule (newPhaser) in Fig. (3)), a new phaser p is associated to the variable v using pv, and ϕ(p) becomes the partial function {t → (0, 0)}. The initial configuration is $c_{init} = (\{t_0\}, \{\}, \mathit{bv}_{false}, \{t_0 \to \mathit{stmt}\}, \emptyset, \emptyset)$, where a "main" task with identifier $t_0$ and code stmt is the unique initial task. No phasers are present in the initial configuration, and all Boolean variables are mapped to false.
Given two configurations c and c′ with c = (T, P, bv, pc, pv, ϕ), we write $c \xrightarrow{t} c'$ if there is a task t ∈ T such that one of the rules in Fig. (3) holds. We use $\xrightarrow{*}$ for the reflexive transitive closure of → and write $c \xrightarrow{*} c'$ to mean that c′ is reachable from c. A configuration is said to be reachable if it is reachable from the initial configuration $c_{init}$.
1) Control-state reachability: Checking the possibility of assertion violations, of runtime errors and of race conditions amounts to checking reachability of configurations respectively in badConfs race for some number of tasks n and number of phasers p. We introduce in Section V a complete procedure for checking reachability of such sets of configurations and show it to be sound for programs with fixed upper bounds on numbers of generated phasers and tasks.
2) Deadlocks as in plain reachability: We are also interested in checking the possibility of deadlocks. For this we need to define the notion of a blocked task. Assume in the following a configuration c = (T, P, bv, pc, pv, ϕ).
Definition 1 (Blocked). A task t ∈ T is blocked at phaser p ∈ P by task u ∈ T if hd(pc(t)) = v.wait() with pv(t)(v) = p and $\phi(p)(t) = (\mathit{wait}^t_p, \_)$ when $\phi(p)(u) = (\_, \mathit{sig}^u_p)$ and $\mathit{sig}^u_p \le \mathit{wait}^t_p$. Intuitively, a task t is blocked by a task u if it cannot finish its wait command on some phaser because it is waiting for task u that did not issue enough signal commands on the same phaser.
Definition 2 (Deadlock). (T, P, bv, pc, pv, ϕ) is a deadlock configuration if each task of a non-empty subset U ⊆ T is blocked by some task in U.
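Definitions 1 and 2 can be checked on a single concrete configuration with a simple fixpoint computation. The sketch below is illustrative only: the function names and the encoding are hypothetical, with `waiting` mapping each task whose next statement is a wait to the phaser it waits on, and `phi` mapping each phaser to the (wait, sig) pairs of its registered tasks. It starts from all waiting tasks and repeatedly discards those that are not blocked by a remaining task, which yields the largest set U satisfying Definition 2.

```python
def blocked_by(waiting, phi, t):
    """Tasks u that block t in the sense of Definition 1 (possibly empty)."""
    p = waiting.get(t)
    if p is None or t not in phi[p]:
        return set()
    wait_t, _ = phi[p][t]
    # u blocks t when the signal phase of u has not passed the wait phase of t.
    return {u for u, (_, sig_u) in phi[p].items() if sig_u <= wait_t}


def deadlocked_tasks(waiting, phi):
    """Largest set U in which every task is blocked by some task of U."""
    candidates = set(waiting)
    changed = True
    while changed:
        changed = False
        for t in list(candidates):
            if not (blocked_by(waiting, phi, t) & candidates):
                candidates.remove(t)
                changed = True
    return candidates


def is_deadlock(waiting, phi):
    """Definition 2: some non-empty U exists iff the largest one is non-empty."""
    return bool(deadlocked_tasks(waiting, phi))
```

With a producer P waiting on a phaser `cons` and a consumer C waiting on a phaser `prod`, each with a wait phase equal to the signal phase of the other, `deadlocked_tasks` returns `{"P", "C"}`. As soon as one of the two signal phases is incremented, the set becomes empty.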
Theorem 1 (Deadlock-Freedom). It is undecidable in general, even for programs with only three phasers and four tasks, to check for deadlock-freedom.
The idea of the proof is to encode the reachability problem of any given 3-counter reset-VAS (vector addition system with reset arcs) as the reachability problem of a configuration with a cycle involving three phasers and three tasks (in addition to the main task). Indeed, reachability of configuration $(s_F, 0, 0, 0)$ (three counters with zero values at some control location $s_F$) is undecidable for reset-VASs. The idea then is to spawn three tasks and as many phasers. The value of each counter is captured with the difference between the signal and the wait of a pair of tasks on one phaser. Resets are encoded by asking a task to drop a phaser and exit and spawning a new task. The encoding ensures that a deadlock is reached exactly when the vector addition system reaches configuration $(s_F, 0, 0, 0)$. (See [11] for more details.)
V. SYMBOLIC VERIFICATION OF PHASER PROGRAMS
We briefly introduce gap-order constraints and use them to define a symbolic representation (hereafter constraints) that we use in Section VI for checking reachability.
A. Gap-order constraints and graphs [9] , [10] , [12] , [13] .
Gap-order constraints can be regarded as a particular case of the octagons or the unit two variables per inequality (utvpi) constraints. Assume in this section that x and y are integer variables and that k is an integer constant. We use X and Y to mean finite sets of integer variables. A valuation val is a total function X → Z. Valuations are implicitly extended to preserve constants (i.e., val(k) = k for any k ∈ Z). A gap-order clause δ over X is an inequality of the form a − b ≥ k where a, b ∈ X ∪ {0}. A gap-order constraint Δ over X is a finite conjunction of gap-order clauses over the same set X. Observe that (x = y + 2 ∧ y ≤ 5) is essentially a gap-order constraint because it can be equivalently rewritten as (x − y ≥ 2 ∧ y − x ≥ −2 ∧ 0 − y ≥ −5).
Fig. 3. Transition rules of phaser programs, including rules (asynch), (exit), (newPhaser) and (runtime error).
The result of the closure procedure is a special graph $\wp_{false}$, denoting the graph without any satisfying valuation, each time a weight k = +∞ is generated. The closure of a graph can be computed in polynomial time and we get Sat(clo(℘)) = Sat(℘). We define the degree of a graph ℘ (written degreeOf(℘)) to be 0 if no edge in clo(℘) has a negative weight apart from −∞. Otherwise, degreeOf(℘) is the largest natural k ∈ N such that there is an edge in clo(℘) with weight −k. For instance, the degree of the graph resulting from (x − y ≥ 2 ∧ y − x ≥ −4) is 4. We systematically close all manipulated graphs and write G(X) for the set of closed graphs over X. Given a graph ℘, we write ℘[x/y] to mean the graph obtained by replacing the vertex x by the vertex y. We abuse notation and write $\wp[\{x_i/y_i \mid i \in I\}]$, for pairwise different elements $x_i$, to mean the simultaneous application of the individual substitutions. For a set of variables Y, we write $\wp \ominus Y$ to mean the graph obtained by removing the variables in Y from the vertices of ℘. Given two closed graphs ℘ and ℘′ over the same X, we write $\wp \sqsubseteq_G \wp'$ to mean that each directed edge in ℘ is labeled with a larger weight in ℘′. As a result, Sat(℘′) ⊆ Sat(℘). Finally, we write $\wp \otimes \wp'$ to mean the closure of the graph obtained by merging the two sets of vertices and edges. As a result, $Sat(\wp \otimes \wp') = Sat(\wp) \cap Sat(\wp')$.
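A minimal sketch of closure and degree on such graphs follows, under assumptions that are ours rather than the paper's: an edge from a to b with weight k is taken to encode the clause a − b ≥ k, missing edges carry weight −∞, unsatisfiability is detected as a cycle of positive total weight, and self-loops are ignored when computing the degree. The function names `close` and `degree` are hypothetical. The closure is a longest-path variant of the Floyd-Warshall algorithm, which runs in cubic time.

```python
import math

NEG_INF = -math.inf


def close(variables, clauses):
    """Return the closed weight map, or None if no valuation satisfies it.

    clauses is a list of (a, b, k) triples, each encoding a - b >= k.
    """
    w = {(a, b): NEG_INF for a in variables for b in variables}
    for a, b, k in clauses:
        w[(a, b)] = max(w[(a, b)], k)
    # a - c >= k1 and c - b >= k2 together imply a - b >= k1 + k2.
    for c in variables:
        for a in variables:
            for b in variables:
                w[(a, b)] = max(w[(a, b)], w[(a, c)] + w[(c, b)])
    # a - a >= k with k > 0 has no solution.
    if any(w[(a, a)] > 0 for a in variables):
        return None
    return w


def degree(w):
    """Largest k such that some edge between distinct vertices has weight -k."""
    finite_negative = [
        -k for (a, b), k in w.items() if a != b and NEG_INF < k < 0
    ]
    return max(finite_negative, default=0)
```

On the example of the text, `degree(close(["x", "y"], [("x", "y", 2), ("y", "x", -4)]))` evaluates to 4, in agreement with the degree stated above.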
B. Constraints as a symbolic representation.
A constraint φ is a tuple (T, P, bv, pc, pv, γ) where the only difference with the definition of a configuration (T, P, bv, pc, pv, ϕ) is the adoption of a gap-order constraint γ instead of ϕ. More specifically, $\gamma : P \to \bigcup_{U \subseteq T} G(\bigcup_{t \in U} \{\omega^t_p, \sigma^t_p\})$ is a total mapping that associates a gap-order graph to each phaser p ∈ P. Intuitively, we use variables $\omega^t_p$ and $\sigma^t_p$ to constrain in graph γ(p) possible values of both wait ($\mathit{wait}^t_p$) and signal ($\mathit{sig}^t_p$) phases of each task t registered to phaser p. As a result, we can check if task t is registered to phaser p according to graph ℘ = γ(p) by checking if $\{\omega^t_p, \sigma^t_p\} \subseteq \mathit{varsOf}(\wp)$. We will write Reg(p, ℘) to mean the set of tasks $\{t \mid \{\omega^t_p, \sigma^t_p\} \subseteq \mathit{varsOf}(\wp)\}$. We also write isReg(t, p, ℘) for the predicate t ∈ Reg(p, ℘). Observe that the language semantics impose that, for each phaser p and for any pair t, u of tasks in Reg(p, ℘), the predicate $0 \le \mathit{wait}^t_p \le \mathit{sig}^u_p$ is an invariant. For this reason, we always safely strengthen, in any obtained γ(p) = ℘, weights k in $\sigma^t_p \xrightarrow{k} \omega^u_p$, $\sigma^t_p \xrightarrow{k} 0$ and $\omega^t_p \xrightarrow{k} 0$ with max(k, 0). The following definition helps us characterize configurations for which our procedure terminates.
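The registration test and the strengthening step can be illustrated on a toy encoding of one graph γ(p). The encoding is an assumption made for this sketch only: a graph is a dictionary from (source, target) pairs to weights, the vertex for $\omega^t_p$ is written `("w", t)`, the vertex for $\sigma^t_p$ is written `("s", t)`, the constant vertex is `0`, and absent edges stand for weight −∞. The names `registered` and `strengthen` are hypothetical, and a full implementation would close the graph again after strengthening.

```python
import math

NEG_INF = -math.inf


def registered(graph):
    """Reg(p, g): tasks whose wait and signal variables both occur in g."""
    seen = {v for edge in graph for v in edge if v != 0}
    tasks = {t for (_, t) in seen}
    return {t for t in tasks if ("w", t) in seen and ("s", t) in seen}


def strengthen(graph):
    """Apply the invariant 0 <= wait_t <= sig_u to all registered t and u."""
    out = dict(graph)
    reg = registered(graph)

    def bump(src, dst):
        # Replace the weight k of the edge src -> dst with max(k, 0).
        out[(src, dst)] = max(out.get((src, dst), NEG_INF), 0)

    for t in reg:
        bump(("s", t), 0)  # sig_t - 0 >= 0
        bump(("w", t), 0)  # wait_t - 0 >= 0
        for u in reg:
            bump(("s", t), ("w", u))  # sig_t - wait_u >= 0
    return out
```

Edges that are not covered by the invariant, such as an edge from a wait variable to a signal variable, are left unchanged by `strengthen`.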
Definition 3 (degree and freeness of constraints). A constraint (T, P, bv, pc, pv, γ) has as degree the largest degree among all its graphs γ(p) for p ∈ P if P is not empty and 0 otherwise. Furthermore, a constraint is said to be "free" if, for any p ∈ P, the only edges in γ(p) with weights different from −∞ are edges of the forms (i) $\sigma^t_p \xrightarrow{k_{(\sigma^t_p,\omega^u_p)}} \omega^u_p$, (ii) $\sigma^t_p \xrightarrow{k_{(\sigma^t_p)}} 0$, or (iii) $\omega^t_p \xrightarrow{k_{(\omega^t_p)}} 0$, for some t, u ∈ Reg(p, γ(p)) and $k_{(\sigma^t_p,\omega^u_p)}, k_{(\sigma^t_p)}, k_{(\omega^t_p)} \in \mathbb{N}$.
Free constraints are only allowed to impose, for the same phaser, non-negative lower bounds on differences between signals and waits, between signals and 0, and between waits and 0. Like degree-0-constraints, free constraints are not allowed to put a positive upper bound on how much a signal is larger than a wait. Unlike degree-0-constraints, they are not allowed to put bounds on the differences among signal values, or among wait values. For instance, a free constraint cannot impose $\sigma^t_p - \sigma^u_p = 0$ while a degree-0-constraint can. Intuitively, freeness spares our verification procedure from maintaining exact differences when firing "signal" or "wait" instructions, which would jeopardize termination. This will be stated in Section VI.
