RVSDG: An Intermediate Representation for Optimizing Compilers by Reissmann, Nico et al.
arXiv:1912.05036v1 [cs.PL] 10 Dec 2019
RVSDG: An Intermediate Representation for Optimizing
Compilers
Nico Reissmann1, Jan Christian Meyer1, Helge Bahmann2, and Magnus Själander1
1Norwegian University of Science and Technology
2Auterion AG
Abstract
Intermediate Representations (IRs) are central to optimizing compilers as the way the program is
represented may enhance or limit analyses and transformations. Suitable IRs focus on exposing the
most relevant information and establish invariants that different compiler passes can rely on. While
control-flow centric IRs appear to be a natural fit for imperative programming languages, analyses
required by compilers have increasingly shifted to understand data dependencies and work at multiple
abstraction layers at the same time. This is partially evidenced in recent developments such as the
MLIR proposed by Google for deep learning. However, rigorous use of data flow centric IRs in general
purpose compilers has not been evaluated for feasibility and usability as previous works provide no
practical implementations.
We present the Regionalized Value State Dependence Graph (RVSDG) IR for optimizing com-
pilers. The RVSDG is a data flow centric IR where nodes represent computations, edges represent
computational dependencies, and regions capture the hierarchical structure of programs. It repre-
sents programs in demand-dependence form, implicitly supports structured control flow, and models
entire programs within a single IR. We provide a complete specification of the RVSDG, construction
and destruction methods, as well as exemplify its utility by presenting Dead Node and Common
Node Elimination optimizations. We implemented a prototype compiler and evaluate it in terms of
performance, code size, compilation time, and representational overhead. Our results indicate that
the RVSDG can serve as a competitive IR in optimizing compilers while reducing complexity.
1 Introduction
Intermediate representations (IRs) are at the heart of every modern compiler. These data structures
represent programs throughout compilation, connect individual compiler stages, and provide abstractions
to facilitate the implementation of analyses, optimizations, and program transformations. A suitable IR
highlights and exposes program properties that are important to the transformations in a specific compiler
stage. This reduces the complexity of optimizations and simplifies their implementation.
Modern computer systems have become increasingly parallel and specialized as system designers
strive to improve their computational power. In order to take full advantage of these systems, optimizing
compilers need to expose a program’s available parallelism and be able to work at multiple abstraction
layers. This has led to an emerging interest in developing more efficient IRs for exposing the necessary
information, as exemplified by MLIR [38], an IR proposed by Google for deep learning.
Data flow centric IRs, such as the Value (State) Dependence Graph (V(S)DG) [45, 18, 21], have
emerged as a promising class of IRs for optimizing compilers. These IRs are based on the observation
that many optimizations require data flow rather than control flow information, and shift the focus to
explicitly expose data instead of control flow. They represent programs in demand-dependence form,
encode structured control flow, and explicitly model data flow between operations. This raises the IR’s
abstraction level, permits simple and powerful implementations of data flow optimizations, and helps
to expose the inherent parallelism in programs [21, 18, 40]. However, the shift in focus from explicit
control flow to only structured and implicit control flow requires more sophisticated construction and
destruction methods [45, 21, 39]. In this context, Bahmann et al. [3] present the Regionalized Value
State Dependence Graph (RVSDG) and conclusively address the problem of intra-procedural control
flow recovery for demand-dependence graphs. They show that the RVSDG’s restricted control flow
constructs do not limit the complexity of the recoverable control flow.
In this work, we are concerned with the aspects of whole-program representation in the RVSDG. We
present the required RVSDG constructs, consider construction and destruction at the program level, and
show feasibility and practicality of this IR for optimizations by providing a practical compiler implemen-
tation. Specifically, we make the following contributions:
1. A complete RVSDG specification, including intra- and inter-procedural constructs.
2. A complete description of RVSDG construction and destruction, augmenting the previously pro-
posed algorithms with the construction and destruction of inter-procedural constructs, as well as
the handling of intra-procedural dependencies during construction.
3. A presentation of Dead Node Elimination (DNE) and Common Node Elimination (CNE) optimiza-
tions to demonstrate the RVSDG’s utility. DNE combines dead and unreachable code elimination,
as well as dead function removal. CNE permits the removal of redundant computations by detecting
congruent operations.
4. A publicly available [32] prototype compiler that implements the discussed concepts. It consumes
and produces LLVM IR, and is to our knowledge the first optimizing compiler that uses a demand
dependence graph as IR.
5. An evaluation of the RVSDG in terms of performance and size of the produced code, as well as
compile time and representational overhead.
Our results show that the RVSDG can produce competitive code and that it can serve as the IR
in a compiler’s optimization stage. This work paves the way for further exploration of the RVSDG’s
properties and their effect on optimizations and analyses, as well as its usability in code generation for
dataflow and parallel architectures.
2 Motivation
Contemporary optimizing compilers are predominantly based on imperative program representations in
the form of control flow graphs or variants. These representations preserve the sequential nature of the
input program and implicitly convey some of the semantics associated with the sequential execution
model (e.g., order of access through potentially aliased references). In the case of LLVM, the representation
is based on the instruction set of a virtual CPU with operation semantics closely matching that of real
CPUs. This choice of representation is somewhat at odds with the requirements of code optimization
analysis, which is often focused on data dependence instead: As Table 1 illustrates, the majority of
optimization passes executed are concerned with data flow analysis (in the form of SSA construction and
interpretation, or in-memory data structures in the form of alias analysis and/or memory SSA).
We propose the (data-)dependence centric RVSDG as an alternative.
Table 1: Thirteen most invoked LLVM 7.0.1 passes at O3.
Optimization # Invocations
1. Alias Analysis (-aa) 19
2. Basic Alias Analysis (-basicaa) 18
3. Optimization Remark Emitter (-opt-remark-emitter) 15
4. Natural Loop Information (-loops) 14
5. Lazy Branch Probability Analysis (-lazy-branch-prob) 14
6. Lazy Block Frequency Analysis (-lazy-block-freq) 14
7. Dominator Tree Construction (-domtree) 13
8. Scalar Evolution Analysis (-scalar-evolution) 10
9. CFG Simplifier (-simplifycfg) 8
10. Redundant Instruction Combinator (-instcombine) 8
11. Natural Loop Canonicalization (-loop-simplify) 8
12. Loop-Closed SSA Form (-lcssa) 7
13. Loop-Closed SSA Form Verifier (-lcssa-verification) 7
Total 155
SSA Restoration 14
While it requires considerably more effort to construct the RVSDG from an imperative program, as well as to recover the necessary control flow for code generation, we believe that this cost is more than offset by the benefits provided to the analyses and optimization stages. The following sections illustrate this hypothesis by examples.
2.1 Simplified Compilation by Strong Representation Invariants
The Control Flow Graph (CFG) in Static Single Assignment (SSA) form [10] is the predominant IR
for optimizations in modern imperative language compilers [41]. Its nodes represent a list of totally
ordered operations and its edges a program’s possible control flow paths, permitting efficient control flow
optimizations and simple code generation. The CFG’s translation to SSA form improves the efficiency of
many data flow optimizations [34, 44]. Figure 1a shows a function with a simple loop and a conditional,
and Figure 1b shows the corresponding CFG in SSA form.
This form is, however, not an intrinsic property of the CFG, but a specialized variant that needs to
be actively maintained. Various compiler passes, such as jump threading or live-range splitting, may
perform transformations that cause the CFG to no longer satisfy this form. As shown in Table 1, LLVM
requires SSA restoration [7] in 14 different passes.
Moreover, CFG-based compilers must constantly (re-)discover and canonicalize loops, or establish
various invariants besides SSA form. Table 1 shows that six of the 13 most invoked passes in LLVM
are helper passes only performing such tasks. They amount to 23% of all pass invocations. This lack of
enforced invariants complicates the implementation of optimizations and analyses, increases engineering
effort, unnecessarily prolongs compilation time, and leads to compiler bugs [22, 23, 24].
In contrast, the RVSDG is always in strict SSA form as edges connect each operand input to only one
output. It explicitly exposes desirable program structures, such as loops, in a tree structure (Section 4),
similarly to the Program Structure Tree [19]. This eliminates the need for SSA restoration and the other
helper passes from Table 1. Figure 1c shows the RVSDG corresponding to Figure 1a. It is an acyclic
demand-dependence graph where nodes represent simple operations or control flow constructs, and edges
represent the dependencies between computations (see Section 4). In Figure 1c, simple operations are
colored yellow, conditionals are green, loops are red, and functions are blue.
int
f(int a, int b, int c, int d)
{
 int li1, li2;
 int cse, epr;
 do {
   li1 = b+c;
   li2 = d-b;
   a = a*li1;
   int down = a%c;
   int dead = a+d;
   if(a > d) {
     int acopy = a;
     a = 3+down;
     cse = acopy<<b;
   } else {
     cse = a<<b;
   }
   epr = a<<b;
 } while(a > cse);
 return li2+epr;
}
(a) Code
(b) CFG in SSA form [figure omitted]
(c) Unoptimized RVSDG [figure omitted]
(d) Optimized RVSDG [figure omitted]
int
f(int* x, float* y, int k)
{
 *x = 5;
 *y = 6.0;
 int i=0;
 int f=1;
 int sum=0;
 int fac=1;
 do {
  sum += i;
  i++;
 } while(i < k);
 do {
   fac *= f; 
   f++;
 } while(f < k);
 return fac+sum;
}
(e) Code
(f) RVSDG of Code 1e [figure omitted]
Figure 1: RVSDG Examples
2.2 Unified Representation of Different Levels of Program Structures
While the CFG can represent a single procedure, representation of programs as a whole requires additional
data structures such as call graphs. The RVSDG can represent an entire program as a single data
structure where a def-use dependency of one function on another is modeled the same way as the def-use
dependency of scalar quantities. This makes it possible to apply the same program transformation at
multiple levels, resulting in a considerably smaller number of transformation passes and algorithms, e.g.,
unreachable code and dead function analysis turns out to be essentially the same as dead variable analysis
(Section 6.1).
2.3 Strongly Normalized Representation
The RVSDG program representation is much more strongly normalized than control flow representations.
Programs differing only in the ordering of (independent) operations result in the same RVSDG representation, and loops and conditionals always take a single canonical form. This normalization already simplifies the
implementation of transformations [45, 18, 21] and eliminates the need for (repeated) compiler analysis
passes such as loop detection.
Some common program optimizing transformations take a particularly simple form in the RVSDG rep-
resentation. For example, Figure 1d shows the optimized RVSDG of Figure 1c, illustrating some of these
optimizations: The inputs to the “upper left” plus operation are easily recognized as loop invariant be-
cause their “loop entry ports” connect directly to the corresponding “loop exit ports” (operations, ports,
and edges highlighted in purple). A simple push strategy allows data-dependent operations to be recursively identified as invariant and hoisted out of the loop: The addition and subtraction computing li1
and li2 are moved out of the loop (theta) as their operands, i.e. b, c, and d, are loop invariant (all
three of them connect the entry of the loop to the exit). Similarly, the shift operation common to both
conditional branches is hoisted and combined, while the division operation is moved into the conditional
as it is only used in one alternative. In contrast to CFG-based compilers, all these optimizations are
performed directly on the unoptimized RVSDG of Figure 1c and can be performed in a single regular
pass. No additional data structures or helper passes are required. See also Section 6 for further details.
2.4 Exposing Independent Computations
CFGs implicitly represent a single global machine state by sequencing all operations that could affect it.
While the RVSDG can follow the same model, it is not actually limited to this interpretation. The RVSDG
can instead model the system as consisting of multiple independent states. The code in Figure 1e is used
to illustrate this concept: The depicted function contains two non-aliasing store operations (pointing to
memory objects of incompatible types) and two independent loops.
In a CFG, both stores and loops are strictly ordered. Their mutual independence needs to be es-
tablished by explicit compiler passes (and may need to be re-established multiple times during the
compilation process, as the number of alias analysis passes in Table 1 illustrates) and represented using
auxiliary data structures and/or annotations. In contrast, the RVSDG permits the encoding of such in-
formation directly in the graph, as shown in Figure 1f. Disjoint memory regions (consisting of int-typed
and float-typed memory objects) are modeled as disjoint states, exposing the independence of affecting
operations in the representation. RVSDG can in principle go even further in representing a memory SSA
form that is not formally any different from value SSA form, enabling the same kind of optimizations to
be applied to both.
2.5 Summary
The RVSDG raises the IR abstraction level by enforcing desirable properties, such as SSA form, explicitly
encoding important structures, such as loops, and relaxing the overly strict order of the input program.
This leads to a more normalized program representation and avoids many idiosyncrasies and artifacts
from other IRs, such as the CFG, and further helps to expose parallelism in programs.
3 Related Work
A cornucopia of IRs has been presented in the literature to better expose desirable program properties for
optimizations. For the sake of brevity, we restrict our discussion to the most prominent IRs, only high-
lighting their strengths and weaknesses in comparison to the RVSDG, and refer the reader to Stanier et
al. [41] for a more complete overview.
3.1 Control (Data) Flow Graph
The Control Flow Graph (CFG) [1] exposes the intra-procedural control flow of a function. Its nodes
represent basic blocks, i.e., an ordered list of operations without branches or branch targets, and its
edges represent the possible control flow paths between these nodes. This explicit exposure of control
flow simplifies certain analyses, such as loop identification or irreducibility detection, and enables simple
target code generation. The CFG’s translation to SSA form [10], or one of its variants, such as gated
SSA [43], thinned gated SSA [14], or future gated SSA [12], additionally improves the efficiency of data
flow optimizations [44, 34]. These properties along with its simple construction from a language’s abstract
syntax tree made the CFG in SSA form the predominant IR for imperative language compilers [41],
such as LLVM [20] and GCC [9]. However, the CFG has also been criticized as an IR for optimizing
compilers [13, 17, 18, 21, 45, 47, 46]:
1. It is incapable of representing inter-procedural information. It requires additional IRs, e.g., the call
graph, to represent such information.
2. It provides no structural information about a procedure’s body. Important structures, such as loops,
and their nesting need to be constantly (re-)discovered for optimizations, as well as normalized to
make them amenable for transformations.
3. It emphasizes control dependencies, even though many optimizations are based on the flow of data.
This is somewhat mitigated by translating it to SSA form or one of its variants, but in turn requires
SSA restoration passes [7] to ensure SSA invariants.
4. It is an inherently sequential IR. The operations in basic blocks are listed in a sequential order, even
if they are not dependent on each other. Moreover, this sequentialization also exists for structures
such as loops, as two independent loops can only be represented in sequential order. Thus, the
CFG is by design incapable of explicitly encoding independent operations.
5. It provides no means to encode additional dependencies other than control and true data depen-
dencies. Other information, such as loop-carried dependencies or alias information, must regularly
be recomputed and/or memoized in addition to the CFG.
The Control Data Flow Graph (CDFG) [27] tries to mitigate the sequential nature of the CFG by
replacing the sequence of operations in basic blocks with the Data Flow Graph (DFG) [11], an acyclic
graph that represents the flow of data between operations. This relaxes the strict ordering within a
basic block, but does not expose instruction level parallelism beyond basic block boundaries or between
program structures.
3.2 Program Dependence Graph/Web
The Program Dependence Graph (PDG) [13, 15] combines control and data flow within a single rep-
resentation. It features data and control flow edges, as well as statement, predicate, and region nodes.
Statement nodes represent operations, predicate nodes represent conditional choices, and region nodes
group nodes with the same control dependency. If a region’s control dependencies are fulfilled, then its
children can be executed in parallel. Horwitz et al. [16] extended the PDG to model inter-procedural
dependencies by incorporating procedures into the graph.
The PDG improves upon the CFG by employing region nodes to relax the overly restrictive sequence
of operations. This relaxed sequence combined with the unified representation of data and control
dependencies simplifies complex optimizations, such as code vectorization [4] or the extraction of thread-
level parallelism [29, 36]. However, the unified data and control flow representation results in a large
number of edge types, five in Ferrante et al. [13] and four in Horwitz et al. [15], which need to be
maintained to ensure the graph’s invariants. The PDG suffers from aliasing and side-effect problems, as
it supports no clear distinction between data held in registers and memory. This complicates or can even
preclude its construction altogether [18]. Moreover, program structure and SSA form still need to be
discovered and maintained.
The Program Dependence Web (PDW) [28] extends the PDG and gated SSA [43] to provide a uni-
fied representation for the interpretation of programs using control-, data-, or demand-driven execution
models. This simplifies the mapping of programs written in different paradigms, such as the imperative
or functional paradigm, to different architectures, such as Von-Neumann and dataflow architectures. In
addition to the elements of the PDG, the PDW adds µ nodes to manage initial and loop-carried values
and η nodes to manage loop-exit values. Campbell et al. [5] further refined the definition of the PDW
by replacing µ nodes with β nodes and eliminating η nodes. As the PDW is based on the PDG, it
suffers from the same aliasing and side-effect problems. PDW’s additional constructs further complicate
graph maintenance, and its construction is elaborate, requiring three additional passes over a PDG, and
is limited to programs with reducible control flow.
3.3 Value (State) Dependence Graph
The Value Dependence Graph (VDG) [45] abandons the explicit representation of control flow and only
models the flow of values using ports. Its nodes represent simple operations, the selection between
values, or functions, using recursive functions to model loops. The VDG is implicitly in SSA form and
abandons the sequential order of operations from the CFG, as each node is only dependent on its values.
However, modeling only data flow between stateful computations raises a significant problem in terms
of preservation of program semantics, as the “evaluation of the VDG may terminate even if the original
program would not...” [45].
The Value State Dependence Graph (VSDG) [17, 18] addresses the VDG’s termination problem by
introducing state edges. These edges are used to model the sequential execution of stateful computations.
In addition to nodes for representing simple operations and selection, it introduces nodes to explicitly
represent loops. Like the VDG, the VSDG is implicitly in SSA form, and nodes are solely dependent
on required operands, avoiding a sequential order of operations. However, the VSDG supports no inter-
procedural constructs, and its selection operator is only capable of selecting between two values based on
a predicate. This complicates destruction, as selection nodes must be combined to express conditionals.
Even worse, the VSDG represents all nodes as a flat graph, which simplifies optimizations [18], but has a
severe effect on evaluation semantics. Operations with side-effects are no longer guarded by predicates,
and care must be taken to avoid duplicated evaluation of these operations. In fact, for graphs with
stateful computations, lazy evaluation is the only safe strategy [21]. The restoration of a program with
an eager evaluation semantics complicates destruction immensely, and requires a detour over the PDG to
arrive at a unique CFG [21]. Zaidi et al. [46, 47] adapted the VSDG to spatial hardware and sidestepped
this problem by introducing a predication-based eager/dataflow semantics. The idea is to effectively
enforce correct evaluation of operations with side-effects by using predication. While this seems to
circumvent the problem for spatial hardware, it is unclear what the performance implications would be
for conventional processors.
The RVSDG solves the VSDG’s eager evaluation problem by introducing regions. These regions
enable the modeling of control flow constructs as nested nodes, and the guarding of operations with side-
effects. This avoids any possibility of duplicated evaluation, and in turn simplifies RVSDG destruction.
Moreover, nested nodes permit the explicit encoding of a program’s hierarchical structure into the graph,
further simplifying optimizations.
4 The Regionalized Value State Dependence Graph
A Regionalized Value State Dependence Graph (RVSDG) is an acyclic hierarchical multigraph consisting
of nested regions. A region R = (A,N,E,R) represents a computation with argument tuple A, nodes N ,
edges E, and result tuple R, as illustrated in Figure 2a. A node can be either simple, i.e., it represents
a primitive operation, or structural, i.e., it contains regions. Each node n ∈ N has a tuple of inputs I
and outputs O. In case of simple nodes, they correspond to arguments and results of the represented
operation, whereas for structural nodes, they map to arguments and results of the contained regions. For
nodes n1, n2 ∈ N , an edge (g, u) ∈ E connects either output g ∈ On1 or argument g ∈ A to either input
u ∈ In2 or result u ∈ R of matching type. We refer to g as the origin of an edge, and to u as the user
of an edge. Every input or result is the user of exactly one edge, whereas outputs or arguments can be
the origins of multiple edges. All inputs or results of an origin are called its users. The corresponding
node of an origin is called its producer, whereas the corresponding node of a user is called consumer.
Correspondingly, the set of nodes of all users of an origin are referred to as its consumers. The types of
inputs and outputs are either values, representing arguments or results of computations, or states, used
to impose an order on operations with side-effects. A node's signature is the tuple of its input and output types, whereas a region's signature is the tuple of its argument and result types. Throughout this paper,
we use n, e, i, o, a, and r with sub- and superscripts to denote individual nodes, edges, inputs, outputs,
arguments, and results, respectively. We use g and u to denote an edge’s origin and user, respectively.
An edge e from origin g to user u is also denoted as e : (g, u), or short (g, u).
The RVSDG can model programs at different abstraction levels. It can represent simple data-flow
graphs such as those used in machine learning frameworks, but it can also represent programs at the
machine level as used in compiler back-ends for code generation. This flexibility makes it possible to use
the RVSDG for the entire compilation pipeline. In this paper, we target an abstraction level similar to
that of LLVM IR. This permits us to illustrate all of the RVSDG’s features without involving architecture-
specific details. The rest of this section defines the necessary constructs.
4.1 Nodes
Simple nodes model primitive operations such as addition, subtraction, load, and store. They have an
operator associated with them, and a node’s signature must correspond to the signature of its operator.
Simple nodes map their input value tuple to their output value tuple by evaluating their operator with
the inputs as arguments, and associating the results with their outputs. Figure 2b illustrates the use of
simple nodes as well as value and state edges. Solid lines represent value edges, whereas dashed lines
represent state edges. Nodes have as many value inputs and outputs as their corresponding operations
demand. The ordering of the load and store nodes is preserved by sequentializing them with the help of
a state edge.
(a) Notation (n: node, e: edge, i: input, o: output, a: argument, r: result; an edge e:(g,u) has origin g and user u) [figure omitted]
*x += 4;
*y += 5;
(b) Simple nodes [figure omitted]
switch(x){
  case 0: y=1; break;
  case 1: y=0; break;
  default: y=2; break;
}
(c) γ-node [figure omitted]
int r=1, n=1;
do {
 r=n*r;
 n++;
} while(n<5);
(d) θ-node [figure omitted]
Figure 2: Notation as well as examples for the usage of simple, γ- and θ-nodes.
Structural nodes contain regions and can model structural program behavior such as the conditional
or repeated evaluation of computations. We present six different kinds of structural nodes: γ-nodes,
which represent conditionals, θ-nodes, which represent tail-controlled loops, λ-nodes for procedures and
functions, δ-nodes for global variables, φ-nodes for mutually recursive environments, and ω-nodes for
translation units. The rest of this section discusses each structural node in detail and illustrates their
usage.
4.1.1 Gamma-Nodes
A γ-node models a decision point and contains regions R_0, ..., R_k, k > 0, of matching signature. Its
first input is a predicate, which determines the region under evaluation. It evaluates to an integer v with
0 ≤ v ≤ k. The values of all other inputs are mapped to the corresponding arguments of region R_v, R_v is evaluated, and the values of its results are mapped to the outputs of the γ-node.
γ-nodes represent conditionals with symmetric control flow splits and joins, such as if-then-else or
switch statements without fall-throughs. Figure 2c shows a γ-node. It contains three regions: one for
each case, and a default region. The map node takes the value of x as input and maps it to zero, one, or
two, determining the region under evaluation. This region is evaluated and its result is mapped to the
γ-node’s output.
We define the entry variable of a γ-node as a pair of an input and the arguments the input maps
to during evaluation, as well as the exit variable of a γ-node as a pair of an output and the results the
output could receive its value from:
Definition 1 The pair ev_l = (i_l, A_{l-1}) is the l-th entry variable of a γ-node with k regions. It consists of the l-th input and the tuple A_{l-1} = {a_{l-1}^{R_0}, ..., a_{l-1}^{R_k}} with the (l-1)-th argument from each region. We refer to the set of all entry variables as EV.

Definition 2 The pair ex_l = (R_l, o_l) is the l-th exit variable of a γ-node with k regions. It consists of a tuple R_l = {r_l^{R_0}, ..., r_l^{R_k}} of the l-th result from each region and the l-th output they would map to. We refer to the set of all exit variables as EX.
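For illustration, the evaluation semantics of a γ-node can be sketched in C over a toy value representation. This sketch is ours, not part of the paper's prototype compiler; all type and function names are hypothetical:

#include <assert.h>
#include <stddef.h>

typedef int Value;                                   /* toy value type */
typedef Value (*RegionBody)(const Value *args, size_t nargs);

typedef struct {
    size_t nregions;     /* regions R_0..R_k, i.e., nregions = k + 1 */
    RegionBody *bodies;  /* one body per region, matching signatures */
} GammaNode;

/* inputs[0] is the predicate v with 0 <= v <= k; input l maps to the
   (l-1)-th argument of the chosen region R_v (cf. Definition 1). */
Value gamma_evaluate(const GammaNode *g, const Value *inputs, size_t ninputs)
{
    size_t v = (size_t)inputs[0];
    assert(v < g->nregions);
    /* R_v's result maps to the gamma-node's output (cf. Definition 2). */
    return g->bodies[v](inputs + 1, ninputs - 1);
}

Only the selected region is evaluated, which is what later allows operations with side-effects to remain guarded by the predicate (Section 3.3).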
4.1.2 Theta-Nodes
A θ-node models a tail-controlled loop. It contains one region that represents the loop body. The length and signature of its input tuple equal those of its output tuple, as well as of the region's argument tuple. The first region
result is a predicate. Its value determines the continuation of the loop. When a θ-node is evaluated, the
values of all its inputs are mapped to the corresponding region arguments and the body is evaluated.
When the predicate is true, all other results are mapped to the corresponding arguments for the next
iteration. Otherwise, the result values are mapped to the corresponding outputs. The loop body of an
iteration is always fully evaluated before the evaluation of the next iteration. This avoids “deadlock”
problems between computations of the loop body and the predicate, and results in well-defined behavior
for non-terminating loops that update external state.
θ-nodes permit the representation of do-while loops. In combination with γ-nodes, it is possible
to model head-controlled loops, i.e., for and while loops. Thus, employing tail-controlled loops as the basic loop construct enables us to express more complex loops as a combination of basic constructs.
This normalizes the representation and reduces the complexity of optimizations as there exists only one
construct for loops. Another benefit of tail-controlled loops is that their body is guaranteed to execute
at least once, enabling the unconditional hoisting of invariant code with side-effects.
Figure 2d shows a θ-node with two loop variables, n and r, and an additional result for the predicate.
When the predicate evaluates to true, the results for n and r of the current iteration are mapped to the
region arguments to continue with the next iteration. When the predicate evaluates to false, the loop
exits and the results are mapped to the node’s outputs. We define a loop variable as a quadruple that
represents a value routed through a θ-node:
Definition 3 The quadruple lv_l = (i_l, a_l, r_{l+1}, o_l) is the l-th loop variable of a θ-node. It consists of the l-th input, argument, and output, and the (l+1)-th result of the θ-node. We refer to the set of all loop variables as LV.
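The tail-controlled evaluation semantics can likewise be sketched in C (again a hypothetical illustration over a toy value type): the body runs at least once, and as long as the first result, the predicate, is true, the remaining results are fed back as the arguments of the next iteration:

#include <stddef.h>

typedef int Value;  /* toy value type */

/* The body maps region arguments to results: it updates vars in place
   (results r_1..r_n) and returns the predicate (result r_0). */
typedef int (*ThetaBody)(Value *vars, size_t nvars);

void theta_evaluate(ThetaBody body, Value *vars, size_t nvars)
{
    /* vars holds the input values, mapped to the region arguments. The
       body of an iteration is always fully evaluated before the next. */
    while (body(vars, nvars)) {
        /* predicate true: results become next iteration's arguments */
    }
    /* predicate false: the results in vars map to the node's outputs */
}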
4.1.3 Lambda-Nodes
A λ-node models a function and contains a single region representing a function’s body. It features a
tuple of inputs and a single output. The inputs refer to external variables the λ-node depends on, and
the output represents the λ-node itself. The region has a tuple of arguments comprised of a function’s
external dependencies and its arguments, and a tuple of results corresponding to a function’s results.
An apply-node represents a function invocation. Its first input takes a λ-node’s output as origin, and
all other inputs represent the function arguments. In the rest of the paper, we refer to an apply-node’s
first input as its function input, and to all its other inputs as its argument inputs. Invocation maps the
values of a λ-node’s input k-tuple to the first k arguments of the λ-region, and the values of the function
arguments of the apply-node to the rest of the arguments of the λ-region. The function body is evaluated
and the values of the λ-region’s results are mapped to the outputs of the apply-node.
Figure 3a shows an RVSDG with two λ-nodes. Function f calls functions puts and max with the
help of apply-nodes. The function max is part of the translation unit, while puts is external and must
be imported (see the paragraph about ω-nodes for more details). We further define the context variable
of a λ-node. A context variable provides the corresponding input and argument for a variable a λ-node
depends on.
Definition 4 The pair cv_l = (i_l, a_l) is a λ-node's l-th context variable. It consists of the l-th input and argument. We refer to the set of all context variables as CV.
Definition 5 The λ-node connected to a function input is the callee of an apply-node, and an apply-node
is the caller of a λ-node. We refer to the set of all callers of a λ-node as CLL.
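A hypothetical C sketch of apply-node evaluation illustrates the argument mapping: the values of the λ-node's k context-variable inputs become the first k arguments of the λ-region, followed by the values of the apply-node's argument inputs. The names and the fixed-size buffer are illustrative only:

#include <assert.h>
#include <stddef.h>

typedef int Value;  /* toy value type */
typedef Value (*FnBody)(const Value *args, size_t nargs);

typedef struct {
    FnBody body;       /* evaluates the lambda-region */
    const Value *ctx;  /* values of the k context-variable inputs */
    size_t nctx;
} LambdaNode;

Value apply_evaluate(const LambdaNode *fn, const Value *args, size_t nargs)
{
    Value regargs[64];                     /* arguments of the lambda-region */
    assert(fn->nctx + nargs <= 64);
    for (size_t i = 0; i < fn->nctx; i++)  /* context variables first */
        regargs[i] = fn->ctx[i];
    for (size_t i = 0; i < nargs; i++)     /* then the function arguments */
        regargs[fn->nctx + i] = args[i];
    /* the lambda-region's results map to the apply-node's outputs */
    return fn->body(regargs, fn->nctx + nargs);
}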
4.1.4 Delta-Nodes
A δ-node models a global variable and contains a single region representing the constant's value. It
features a tuple of inputs and a single output. The inputs refer to the external variables the δ-node
static int
max(int x, int y){
  return x > y ? x : y;
}

int
f(int a, int b) {
  puts("max");
  return max(a,b);
}

(a) RVSDG with λ- and δ-nodes [figure omitted]
unsigned int
f(unsigned int x){
 if (1!=x)
   return x*f(x-1);
 return 1;
}

(b) RVSDG with a φ-node [figure omitted]
Figure 3: Example for the usage of λ-, δ-, and φ-nodes, as well as corresponding region trees.
depends on, and the output represents the δ-node itself. The region has a tuple of arguments representing
a global variable’s external dependencies and a single result corresponding to its right-hand side value.
Figure 3a shows an RVSDG with a δ-node. Function puts takes a string as argument that is the
right-hand side of a global variable. Similarly to λ-nodes, we define the context variable of a δ-node. It
provides the corresponding input and argument for a variable a δ-node depends on.
Definition 6 The pair cv_l = (i_l, a_l) is a δ-node's l-th context variable. It consists of the l-th input and argument. We refer to the set of all context variables as CV.
4.1.5 Phi-Nodes
A φ-node models an environment with mutually recursive functions, and contains a single region with
λ-nodes. Each single output of these λ-nodes serves as origin to a single result in the φ-region. A φ-node’s
outputs expose the individual functions to callers outside the φ-region, and must therefore have the same
arity and signature as the results of the φ-region. The first input of an apply-node from outside the
φ-region takes these outputs as origin to invoke one of the functions.
The inputs of a φ-node refer to variables that the contained functions depend on and are mapped
to corresponding arguments in the φ-region when a function is invoked. In addition, a φ-region has
arguments for each contained function. An apply-node from inside a φ-region takes these as origin to its
function input.
φ-nodes permit a program’s mutually recursive functions to be expressed in the RVSDG without the
introduction of cycles. Figure 3b shows an RVSDG with a φ-node. The function f calls itself, and
therefore needs to be in a φ-node to preserve the RVSDG’s acyclicity. The region in the φ-node has one
input, representing the declaration of f , and one output, representing the definition of f . The φ-node
has one output so that f can be called from outside the recursive environment.
We define context variables and recursion variables. Context variables provide corresponding inputs
and arguments for variables the λ-nodes from within a φ-region depend on. Recursion variables provide
the argument and output an apply-node’s function input connects to.
Definition 7 The pair cv_l = (i_l, a_l) is the l-th context variable of a φ-node. It consists of the l-th input and argument. We call the set of all context variables CV.
Definition 8 For a φ-node with n context variables, the triple rv_l = (r_l, a_{l+n}, o_l) is the l-th recursion variable. It consists of the l-th result and (l+n)-th argument of the φ-region as well as the l-th output of the φ-node. We refer to the set of all recursion variables as RV.
4.1.6 Omega-Nodes
An ω-node models a translation unit. It is the top-level node of an RVSDG and has no inputs or
outputs. It contains exactly one region. This region’s arguments represent entities that are external
to the translation unit and therefore need to be imported. Its results mark all exported entities in the
translation unit. Figure 3a and 3b illustrate the usage of ω-nodes. The ω-region in Figure 3a has one
argument, representing the import of function puts, and one result, representing the export of function f.
The ω-region in Figure 3b has only one export for function f.
4.2 Edges
Edges connect node outputs or region arguments to a node input or region result, and are either value
typed, i.e., represent the flow of data between computations, or state typed, i.e., impose an ordering on
operations with side-effects. State edges are used to preserve the observational semantics of the input
program by ordering its side-effecting operations. Such operations include memory reads and writes, as
well as exceptions.
In practice, a richer type system permits further distinction between different kinds of values or states. For example, distinct types for fixed- and floating-point values help to distinguish between these arithmetics, and a function type permits the correct specification of the output types of λ-nodes and the function input of apply-nodes.
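As an illustration, such a type system could be as simple as the following tags; these names are hypothetical and do not reflect the prototype's actual type hierarchy:

/* Illustrative port types; value types and state types stay disjoint. */
typedef enum {
    TYPE_FIXED,     /* fixed-point (integer) values                   */
    TYPE_FLOAT,     /* floating-point values                          */
    TYPE_FUNCTION,  /* lambda-node outputs and apply function inputs  */
    TYPE_STATE      /* states imposing an order on side-effecting ops */
} PortType;

Edges are then only well-formed when origin and user carry the same type tag.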
5 Construction & Destruction
RVSDG construction and destruction are responsible for generating an RVSDG from an input program
and reestablishing control flow for code generation, respectively. We present both stages with an Inter-
Procedure Graph (IPG) and a CFG as input and output. The IPG is an extension of a call graph and
captures all static dependencies between functions, incorporating not only those originating from (direct)
calls, but also those from other references within a function. In the IPG, an edge from node n1 to node n2 exists if the body of the function corresponding to n1 references the function represented by n2. The
utilization of an IPG and a CFG permits a language-independent presentation of RVSDG construction
and destruction.
5.1 Construction
RVSDG construction is responsible for mapping all constructs, concepts, and abstractions of an input
language to the RVSDG. The mapping is language-specific and depends on the language’s concrete
features. For example, languages with possibly unstructured control flow, such as C or C++, cannot be
mapped directly to the RVSDG and require the CFG as a stepping stone, while other languages, such as
Haskell, permit a direct construction [31]. In this section, we present RVSDG construction for the former
case, as it supersedes the latter. Conceptually, RVSDG construction can be split into two phases:
1. Inter-Procedural Translation (Inter-PT) translates functions and inter-procedural dependencies,
creating λ- and φ-nodes.
static int
f(){ return 3; }
static int
g(){ return 5; }
static int
sum(int(*x)(), int(*y)()) {
 return x() + y();
}
int
tot(int z) {
 return z + sum(f, g);
}
(a) Code
(b) IPG [figure omitted; nodes: tot, sum, f, g]
(c) RVSDG [figure omitted]
Figure 4: Inter-Procedural Translation
2. Intra-Procedural Translation (Intra-PT) translates intra-procedural control and data flow, creating
a λ-region from a function’s body.
Inter-PT invokes Intra-PT for each function’s body. Both phases interact with each other through
a common symbol table. This table maps function and CFG variables to the corresponding RVSDG
arguments or outputs, and every creation of a node or region triggers updates to this table. We omit
these updates in our algorithm descriptions to avoid unnecessary cluttering.
5.1.1 Inter-Procedural Translation
Inter-PT converts all functions from the Inter-Procedure Graph (IPG) of a translation unit to λ-nodes.
Figure 4b shows the IPG for the code in Figure 4a. The code consists of four functions, with function sum
performing two indirect calls. The corresponding IPG consists of four nodes and three edges. All edges
originate from node tot, as it is the only function that explicitly references other functions, i.e., sum for a direct call, and f and g to pass as arguments. No edge originates from node sum, as the corresponding
function does not explicitly reference any other functions, and the functions for the indirect calls are
provided as arguments.
The RVSDG puts two constraints on the translation from an IPG. Firstly, mutually recursive functions
are required to be created within φ-nodes to preserve the RVSDG’s acyclicity. Secondly, Inter-PT must
respect the calling dependencies of functions to ensure that λ-nodes are created before their apply-nodes.
In order to embed mutually recursive functions into φ-nodes, we need to identify the strongly connected
components (SCCs) in the IPG. We consider an SCC trivial, if it consists only of a single node with
no self-referencing edges. Otherwise, it is non-trivial. Moreover, a trivial SCC might not have a CFG
associated with it, and is therefore defined in another translation unit.
Algorithm I outlines the RVSDG construction from an IPG. It finds all SCCs and converts trivial SCCs
to individual λ-nodes, while the λ-nodes created from non-trivial SCCs are embedded in φ-nodes. This
satisfies the first constraint. The second constraint is satisfied by processing SCCs in topological order,
creating λ-nodes before their apply-nodes. The identification and ordering of SCCs can be performed in
a single step with Tarjan’s algorithm [42], which returns the identified SCCs in reverse topological order.
Algorithm I: Inter-Procedural Translation
Compute all SCCs in an IPG and process them in topological order of the directed acyclic graph formed by the SCCs
as follows:
1. Trivial SCC:
(a) With CFG: Begin a λ-node by adding all context variables, function arguments, and an additional state
argument to the λ-region. Translate the CFG with Intra-PT as explained in Section 5.1.2, and finish the
λ-node by adding the function results and the state result to the λ-region. If a function is exported, add
a result to the ω-region and connect the λ-node’s output to it.
(b) Without CFG: Add an ω-region argument for the external function.
2. Non-trivial SCC: Begin a φ-node by adding all functions as well as context variables to the φ-region. Translate
each function in the SCC according to Trivial SCC without exporting them. Finish the φ-node by adding all
function outputs as results to the φ-region. If a function is exported, add a result to the ω-region and connect
the φ-node’s output to it.
Figure 4c shows the RVSDG after the application of Algorithm I to the IPG in Figure 4b. In addition to a function's arguments, Algorithm I adds a state argument and result to λ-regions (the red dashed line in Figure 4c). This state is used to sequentialize stateful computations. Nodes representing operations with side-effects consume this state and produce a new state for the next node.¹
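A hypothetical driver loop for Algorithm I might look as follows. Since Tarjan's algorithm returns SCCs in reverse topological order, iterating over its output backwards yields the topological order that creates λ-nodes before their apply-nodes; all types and helpers here are illustrative:

#include <stddef.h>

typedef struct {
    int trivial;  /* single node without self-referencing edges */
    int has_cfg;  /* defined in this translation unit?          */
    /* ... the IPG nodes of this component ... */
} SCC;

/* sccs[0..nsccs-1] as returned by Tarjan's algorithm,
   i.e., in reverse topological order. */
void inter_procedural_translation(SCC *sccs, size_t nsccs)
{
    for (size_t i = nsccs; i-- > 0; ) {  /* i.e., topological order */
        SCC *s = &sccs[i];
        if (!s->trivial) {
            /* begin phi-node, translate each contained function with
               Intra-PT, finish phi-node, export where necessary */
        } else if (s->has_cfg) {
            /* begin lambda-node, translate the function's CFG with
               Intra-PT, finish lambda-node, export where necessary */
        } else {
            /* external function: add an omega-region argument */
        }
    }
}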
5.1.2 Intra-Procedural Translation
The RVSDG puts several constraints on the translation of intra-procedural control and data flow. Firstly,
it requires that the control flow only consists of constructs that can be translated to γ- and θ-nodes, i.e.
it can only consist of tail-controlled loops and conditionals with symmetric control flow splits and joins.
Secondly, the nesting and relation of these constructs to each other are required, as the RVSDG is a
hierarchical representation. Thirdly, it is necessary to know the data dependencies of these structures in
order to construct γ- and θ-nodes. While these constraints are beneficial for optimizations by substantially
simplifying their implementation, they render RVSDG construction non-trivial.
This section's construction algorithm enables the translation of any data and control flow, regardless of its complexity, to the RVSDG. It creates a λ-region from a function's body in four stages:
1. Control Flow Restructuring (CFR) restructures a function’s CFG to make it amenable to RVSDG
construction.
2. Structural Analysis constructs a control tree [26] from the restructured CFG, discovering the CFG’s
individual control flow regions.
3. Demand Annotation annotates the discovered control flow regions with the variables that are de-
manded by the instructions within these regions.
4. Control Tree Translation converts the annotated control tree into a λ-region.
CFR ensures the first requirement by restructuring a function's control flow into a form that enables the direct mapping of a CFG's control flow regions to the RVSDG's γ- and θ-nodes. CFR can be omitted for languages with limited
control flow structures, such as Haskell or Scheme. Structural analysis ensures the second requirement
by constructing a control tree from the CFG, exposing the control regions' nesting and their relation to
each other. Demand annotation fulfills the third requirement by annotating the control tree’s nodes with
their data dependencies. Finally, the annotated control tree can be translated to a λ-region. The rest of
this section covers the four stages in detail.
¹ See Section 5.1.3 for further information.
Control Flow Restructuring: CFR converts a CFG to a form that only contains tail-controlled loops
and conditionals with properly nested splits and joins. This stage is only necessary for languages that
support more complex control flow constructs, such as goto statements or short-circuit operators, but
can be omitted for languages with more limited control flow. CFR consists of two interlocked phases:
loop restructuring and branch restructuring. Loop restructuring transforms all loops to tail-controlled
loops, while branch restructuring ensures conditionals with symmetric control flow splits and joins.
We omit an extensive discussion of CFR as it is detailed in Bahmann et al. [3]. In contrast to node
splitting approaches [48], CFR avoids the possibility of exponential code blowup [6] by inserting additional
predicates and branches instead of cloning nodes. Moreover, it does not require a CFG in SSA form as
this form is automatically established throughout construction.
Structural Analysis: After CFR, a restructured CFG consists of three kinds of single-entry/single-exit control flow regions:
- Linear Region: A linear subgraph where the entry node and all intermediate nodes have only one
outgoing edge, and the exit node as well as all intermediate nodes have only one incoming edge.
- Branch Region: A subgraph with the entry and exit node representing the control flow split and
join, respectively, and each branch alternative consisting of a single node.
- Loop Region: A single node with a self-edge, i.e., an edge that both originates from and targets this node.
These control flow regions and their corresponding nesting structure can be exposed by performing
an interval [26] or structural [37] analysis. The analysis result is a control tree [26] with basic blocks as
leaves and abstract nodes representing the control flow regions as branches.
A linear region maps to a linear node in the control tree with the linear subgraph’s entry and exit
node as the node's leftmost and rightmost child, respectively. A branch region maps to two control tree
nodes: a branch node and a linear node. The branch node represents the region’s alternatives with the
corresponding nodes as its children. A linear node with three children can then be used to capture the
rest of the branch region. Its first child is the region’s entry node, the second child the branch node
representing the alternatives, and the third child the region’s exit node. Finally, a loop region maps to
a loop node with the region’s single node as its child.
Figure 5a shows Euclid’s algorithm as a CFG, and Figure 5b shows the same CFG after CFR,
which restructured the head-controlled loop to a tail-controlled loop. The left of Figure 5c shows the
corresponding control tree.
Demand Annotation: Structural analysis exposes the necessary control flow regions for a direct
translation to an RVSDG. A control tree's branch and loop nodes can directly be mapped to γ-
and θ-nodes, and individual instructions to simple nodes. However, a further necessity for the efficient
generation of these RVSDG nodes is the exposure of their data dependencies.
This is the task of demand annotation. It exposes these data dependencies by annotating control
tree nodes with the variables that are demanded by the instructions within control flow regions. It
accomplishes this using a read-write and demand-set annotation pass. The read-write pass annotates
each control tree node with the set of read and written variables of the corresponding control flow
region, while the demand-set pass uses these variables to annotate each control tree node with the set of
demanded variables, i.e. variables that are necessary to fulfill the dependencies of the instructions within
a control flow region.
Algorithm II shows the details of the two passes. The read-write pass annotates each node with
the read set R and write set W. It processes the tree in post-order, building up the two sets from the
innermost to the outermost nested control flow region. For linear nodes, the children are processed from
(a) CFG [figure omitted]
(b) Restructured CFG [figure omitted]
(c) Annotated control tree [figure omitted]
(d) RVSDG [figure omitted]
Figure 5: Intra-Procedural Translation
Algorithm II: Demand Annotation
1. Read-Write Annotation: Process the control tree nodes in post-order as follows:
- Basic Block: For each instruction i processed bottom-up, the read set is R = (R \ W_i) ∪ R_i. The write set is W = ⋃ W_i.
- Linear Node: For each child c processed right to left, the read set is R = (R \ W_c) ∪ R_c. The write set is W = ⋃ W_c.
- Branch Node: For each child c, the read set and write set are R = ⋃ R_c and W = ⋂ W_c, respectively.
- Loop Node: For the child c, the read set and write set are R = R_c and W = W_c, respectively.
2. Demand-Set Annotation: Process the control tree nodes with an empty demand set D_t as follows:
- Basic Block: Set D = D_t = (D_t \ W) ∪ R and continue processing.
- Linear Node: Recursively process the children right to left. Set D = D_t = (D_t \ W) ∪ R and continue processing.
- Branch Node: Set D_tmp = D_t. Recursively process each child with a copy of D_t. Set D = D_t = (D_tmp \ W) ∪ R and continue processing.
- Loop Node: Set D = D_t ∪ R. Recursively process the child with D_t = D and continue processing.
Algorithm III: Control Tree Translation
Process the control tree nodes as follows:
- Basic Block: Process the node’s operations top-down creating simple nodes in the RVSDG.
- Linear Node: Recursively process the node’s children top-down.
- Branch Node: Begin a γ-node with inputs according to the node's demand set. Create subregions by recursively processing the node's children. Finish the γ-node with outputs according to its right sibling node's demand set.
- Loop Node: Begin a θ-node with inputs according to the node’s demand set. Create its region by recursively
processing its child. Finish the θ-node with outputs according to its demand set.
right to left, i.e. bottom-up in the restructured CFG, to create the two sets. For branch nodes, a variable
is only considered to be written if it is in the write set of all the node's children, i.e., it was written in
all alternatives of a conditional.
The demand-set pass uses the read set R and write set W to construct a demand set D for each node.
The algorithm is initialized with an empty set Dt, which is used to keep track of demanded variables
during traversal. The demand-set pass traverses the tree such that it follows a bottom-up traversal of the
restructured CFG, adding and removing variables from Dt during this traversal according to each node’s
rules. For branch nodes, each child is processed with a copy of Dt, as the corresponding alternatives
of the conditional are independent from one another. For loop nodes, the θ-node's requirement that inputs
and outputs must have the same signature necessitates that R is added to Dt before the loop’s body is
processed. The right of Figure 5c shows the traversal order for the two passes along with the read, write,
and demand set for each node of the control tree on the left.
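As a concrete sketch of the read-write rules, consider the linear-node case, here with variable sets modeled as 64-bit masks (one bit per variable; an illustration of ours, not the prototype's data structures):

#include <stddef.h>
#include <stdint.h>

typedef uint64_t VarSet;  /* one bit per variable; illustrative only */

typedef struct CTNode {
    struct CTNode **children;
    size_t nchildren;
    VarSet R, W;          /* read and write sets */
} CTNode;

/* Read-write annotation for a linear node. The children are already
   annotated, since the control tree is processed in post-order. */
void annotate_linear(CTNode *n)
{
    n->R = 0;
    n->W = 0;
    for (size_t i = n->nchildren; i-- > 0; ) {  /* right to left */
        const CTNode *c = n->children[i];
        n->R = (n->R & ~c->W) | c->R;           /* R = (R \ W_c) ∪ R_c */
        n->W |= c->W;                           /* W = W ∪ W_c         */
    }
}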
Control Tree Translation: After demand annotation, each node of the control tree is annotated
with the set of variables that its instructions require, i.e. their data dependencies. Finally, the control
tree translation constructs a λ-region from the control tree along with its annotated demand sets. Al-
gorithm III shows the details. The algorithm processes each node in the control tree creating γ- and
θ-nodes for all branch and loop nodes, respectively. For the outputs of γ-nodes, the algorithm uses
the demand set of the right sibling, which corresponds to the branch region’s join node in the CFG.
Figure 5d shows the resulting RVSDG nodes for the example.
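The traversal of Algorithm III can be pictured with the following hypothetical sketch, which merely prints the RVSDG nodes it would create. It assumes, as guaranteed by the structural analysis, that every branch node has a right sibling (the join node) within its enclosing linear node:

#include <stdio.h>
#include <stddef.h>

typedef enum { BASIC_BLOCK, LINEAR, BRANCH, LOOP } CTKind;

typedef struct CTNode {
    CTKind kind;
    const char *demand;  /* demand set rendered as text, e.g. "{x,y}" */
    struct CTNode **children;
    size_t nchildren;
} CTNode;

void translate(const CTNode *n, const char *sibling_demand)
{
    switch (n->kind) {
    case BASIC_BLOCK:
        printf("create simple nodes for the block's operations\n");
        break;
    case LINEAR:  /* children top-down, passing each right sibling along */
        for (size_t i = 0; i < n->nchildren; i++)
            translate(n->children[i],
                      i + 1 < n->nchildren ? n->children[i + 1]->demand
                                           : sibling_demand);
        break;
    case BRANCH:
        printf("begin gamma-node, inputs %s\n", n->demand);
        for (size_t i = 0; i < n->nchildren; i++)
            translate(n->children[i], NULL);
        /* outputs follow the join node, i.e., the right sibling */
        printf("finish gamma-node, outputs %s\n", sibling_demand);
        break;
    case LOOP:
        printf("begin theta-node, inputs %s\n", n->demand);
        translate(n->children[0], NULL);
        printf("finish theta-node, outputs %s\n", n->demand);
        break;
    }
}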
5.1.3 Modeling Stateful Computations
Algorithm I adds an additional state argument and result to every λ-node. This state is used to sequen-
tialize all stateful computations within a function. Nodes with side-effects consume this state and produce
a new state for consumption by the next node. This single state ensures that the order of operations
with side-effects in the RVSDG is according to the total order specified in the original program, ensuring
correct observable behavior. Specifically, the use of a single state for sequentializing stateful operations
ensures that the order of these operations in the RVSDG is equivalent to the order in the restructured
CFG.
The utilization of a single state is, however, overly conservative, as different computations can have
mutually exclusive side-effects. For example, the side-effect of a non-terminating loop is unrelated to
a non-dereferenceable load. These stateful computations can be modeled independently with the help
of distinct states, as depicted in Figure 1f. This results in the explicit exposure of more concurrent
computations, as loops with no memory operations would become independent from other loops with
memory operations. Moreover, the possibility of encoding independent states can also be leveraged
by analyses and optimizations. For example, alias analysis can directly encode independent memory
operations into the RVSDG by introducing additional memory states. Pure functions could be easily
recognized and optimized, as they would contain no operations that use the added states and therefore
would only pass it through, i.e., the origin of the state result would be the λ-region’s argument.
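The state-threading discipline itself can be sketched as follows; this is an illustration of ours, not the prototype's API. Each side-effecting operation consumes the current state and yields a fresh one, so ordering falls out of ordinary dependence edges, and disjoint memory can be threaded through distinct states:

#include <stdint.h>

/* An opaque state token; it carries no data, only ordering. */
typedef struct { uint64_t epoch; } State;

/* A store consumes the current state and produces the next one; any later
   node that must observe the effect takes the new state as an operand. */
State store_i32(int32_t *addr, int32_t value, State s)
{
    *addr = value;  /* the side effect itself */
    s.epoch += 1;   /* fresh state for the next stateful node */
    return s;
}

/* Two disjoint memory objects threaded through independent states: the
   stores are unordered with respect to each other (cf. Figure 1f, where
   disjointness follows from the incompatible pointee types). */
void example(int32_t *x, int32_t *y)
{
    State sx = {0}, sy = {0};
    sx = store_i32(x, 5, sx);
    sy = store_i32(y, 6, sy);
}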
Algorithm IV: Inter-Procedural Control Flow Recovery
1. Create IPG nodes for all function arguments of the ω-region.
2. Process all nodes of the ω-region in topological order as follows:
- λ-nodes: Create an IPG node, and mark it exported if the λ-node's output has an ω-region's result as user. For every context variable cv = (i, a), add an edge from the λ-node's IPG node to the corresponding IPG node of the producer of i. Create a CFG from the λ-node's subregion and attach it to the IPG node.
- φ-nodes: For every argument of the φ-region, create an IPG node for the corresponding λ-node and
add IPG edges from this node to the corresponding IPG nodes of the context variables. Create a CFG
from every λ-node’s subregion and attach it to the IPG node. Mark the IPG node as exported if the
corresponding φ-node’s output has a ω-region’s result as user.
5.2 Destruction
The destruction stage reestablishes control flow by extracting an IPG from an RVSDG as well as gen-
erating CFGs from individual λ-regions. Inter-Procedural Control Flow Recovery (Inter-PCFR) creates
an IPG from λ-nodes, while Intra-Procedural Control Flow Recovery (Intra-PCFR) extracts control flow
from γ- and θ-nodes and generates basic blocks with corresponding operations for primitive nodes. A
λ-region without γ- and θ-nodes is trivially transformed into a linear CFG, whereas λ-regions with these
nodes require the construction of branches and/or loops. The rest of this section discusses Inter-PCFR
in detail. We refrain from an in-depth discussion of Intra-PCFR as it is covered in Bahmann et al. [3].
5.2.1 Inter-Procedural Control Flow Recovery
Inter-PCFR recovers an IPG from an RVSDG. IPG nodes are created for λ-nodes as well as arguments of
the ω-region, while IPG edges are inserted to capture the dependencies between λ-nodes. Algorithm IV
starts by creating IPG nodes for all arguments of the ω-region, i.e., all external functions. It continues
by recursively traversing the region tree, creating IPG nodes for encountered λ-nodes and IPG edges for
their dependencies. For the region of every λ-node, it invokes Intra-PCFR to create a CFG.
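The core of Algorithm IV can be sketched as follows; OmegaRegion, Lambda, and IpgNode are simplified, hypothetical stand-ins (φ-nodes and the attachment of per-λ CFGs are elided):

    #include <map>
    #include <vector>

    struct IpgNode {
      std::vector<IpgNode*> deps;   // edges to producers of context variables
      bool exported = false;
    };

    struct Lambda {
      bool exported;                      // output used by an ω-result?
      std::vector<Lambda*> ctx_producers; // producer of i for each cv = (i, a)
    };

    struct OmegaRegion {
      std::vector<Lambda*> external_fns;  // ω-region arguments
      std::vector<Lambda*> lambdas;       // λ-nodes in topological order
    };

    // IPG nodes for all ω-arguments and λ-nodes; IPG edges capture the
    // dependencies expressed by context variables. Topological order
    // guarantees that every producer already has an IPG node.
    std::map<Lambda*, IpgNode*> recover_ipg(OmegaRegion& omega) {
      std::map<Lambda*, IpgNode*> ipg;
      for (Lambda* f : omega.external_fns)
        ipg[f] = new IpgNode();
      for (Lambda* l : omega.lambdas) {
        IpgNode* n = new IpgNode();
        n->exported = l->exported;
        for (Lambda* dep : l->ctx_producers)
          n->deps.push_back(ipg[dep]);
        ipg[l] = n;
        // intra_pcfr(l) would create the λ-region's CFG and attach it here.
      }
      return ipg;
    }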
5.2.2 Intra-Procedural Control Flow Recovery
Bahmann et al. [3] explored two different approaches for CFG generation: Structured Control Flow
Recovery (SCFR) and Predicative Control Flow Recovery (PCFR). SCFR uses the region hierarchy within
a λ-region to recover control flow, while PCFR generates branches for predicate producers and follows the
predicate consumers to the eventual destination. Both schemes reestablish evaluation-equivalent CFGs,
but differ in the recoverable control flow. SCFR recovers only control flow that resembles the structural
nodes in λ-regions, i.e., control flow equivalent to if-then-else, switch, and do-while statements,
while PCFR can recover arbitrarily complex control flow, i.e., control flow that is not restricted to RVSDG constructs. PCFR reduces the number of static branches in the resulting control flow [3], but might also result in undesirable control flow for certain architectures, such as graphics processing units [33]. For the
sake of brevity, we omit a discussion of SCFR and PCFR as the algorithms are extensively described by
Bahmann et al. [3].
6 Optimizations
The properties of the RVSDG make it an appealing IR for optimizing compilers. Many optimizations
can be expressed as simple graph traversals, where subgraphs are rewritten, nodes are moved between
regions, nodes or edges are marked, or edges are diverted. In this section, we present Dead and Com-
mon Node Elimination optimizations that exploit the RVSDG’s properties to unify traditionally distinct
transformations.
int y = 6;
if (99 > x) {
  z = ((x*x)-(y*y))
    / ((y*y)+(x*x));
  w = -y;
} else {
  do {
    x++;
  } while (50 < x);
  z = x;
  w = y;
}
(a) Code
[Graph panels omitted: (b) RVSDG, (c) After CNE, (d) After DNE mark.]
Figure 6: Dead and Common Node Elimination
6.1 Dead Node Elimination
Dead Node Elimination (DNE) is a combination of dead and unreachable code elimination, and removes
all nodes that do not contribute to the result of a computation. Dead nodes are generated by unreachable
and dead code from the input program, as well as by other optimizations such as Common Node Elimi-
nation. An operation is considered dead code when its results are either unused or used only by other dead operations. Thus, an output of a node is dead if it has no users or if all its users are dead. We consider a node to be dead if all its outputs are dead. It follows that a node's inputs are dead if the node itself is dead. We call all inputs, outputs, and nodes that are not dead alive.
The implementation of DNE consists of two phases: mark and sweep. The mark phase identifies all
outputs and arguments that are alive, while the sweep phase removes all dead entities. The mark phase
traverses RVSDG edges according to the rules in Algorithm V. If a structural node is dead, the mark phase skips the traversal of its subregions along with all of their contained computations, as marking never reaches them in the first place. The mark phase is invoked for all result origins of the ω-region.
The sweep phase performs a simple bottom-up traversal of an RVSDG, recursively processing subre-
gions of structural nodes as long as these nodes are alive. A dead structural node is removed with all
its contained computations. The RVSDG’s uniform representation of all computations as nodes permits
DNE to not only remove simple computations, but also compound computations such as conditionals,
loops, or even entire functions. Moreover, its nested structure avoids the processing of entire branches of
the region tree if they are dead.
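A minimal sketch of both phases for simple nodes is given below; the Node/Output types are hypothetical stand-ins, and the structural-node rules of Algorithm V are elided:

    #include <set>
    #include <vector>

    struct Output;
    struct Node {
      std::vector<Output*> origins;  // origins of this node's inputs
      std::vector<Output*> outputs;
    };
    struct Output { Node* producer = nullptr; };  // null for region arguments

    // Mark: an output is alive, and so are the origins of all of its
    // producer's inputs (invoked for all result origins of the ω-region).
    void mark(Output* o, std::set<Output*>& alive) {
      if (!alive.insert(o).second) return;  // already marked alive
      if (Node* n = o->producer)
        for (Output* origin : n->origins)
          mark(origin, alive);
    }

    // Sweep: in reverse topological order, remove nodes whose outputs are
    // all dead; alive structural nodes would be processed recursively.
    void sweep(std::vector<Node*>& reverse_topo, const std::set<Output*>& alive) {
      for (Node*& n : reverse_topo) {
        bool dead = true;
        for (Output* o : n->outputs)
          if (alive.count(o)) { dead = false; break; }
        if (dead) { delete n; n = nullptr; }  // stands in for region surgery
      }
    }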
Figure 6d shows the RVSDG from Figure 6c after the mark phase; grey-colored entities are dead. The mark phase traverses the graph's edges, marking the γ-node's leftmost output alive. This renders the corresponding result origins of the γ-regions alive, then the leftmost output of the θ-node, and so forth. Once the mark phase has annotated all outputs and arguments as alive, the sweep phase removes all dead entities.
6.2 Common Node Elimination
Common Node Elimination (CNE) permits the removal of redundant computations by detecting con-
gruent nodes. These nodes always produce the same results, enabling the redirection of their result
edges to a single node. This renders the other nodes dead, permitting DNE to remove them. CNE
is similar to common subexpression elimination and value numbering [2] in that it detects equivalent
computations, but since the RVSDG represents all computations uniformly as nodes, it can be extended to conditionals [35], loops, and functions.
Algorithm V: Dead Node Elimination
1. Mark: Mark output or argument as alive and continue as follows:
- ω-region argument: Stop marking.
- φ-node output: Mark the result origin of the corresponding recursion variable.
- φ-region argument: Mark the input origin if the argument belongs to a context variable. Otherwise,
mark the output of the corresponding recursion variable.
- λ-node output: Mark all result origins of the λ-region.
- λ-region argument: Mark the input origin if the argument is a dependency.
- θ-node output: Mark the θ-node’s predicate origin as well as the result and input origin of the corre-
sponding loop variable.
- θ-region argument: Mark the input origin and output of the corresponding loop variable.
- γ-node output: Mark the γ-node’s predicate origin as well as the origins of all results of the corresponding
exit variable.
- γ-region argument: Mark the input origin of the corresponding entry variable.
- Simple node output: Mark the origin of all inputs.
2. Sweep: Process all nodes in reverse topological order and remove them if they are dead. Otherwise, process
them as follows:
- ω-node: Recursively process the ω-region. Remove all dead arguments.
- γ-node: For all exit variables (R, o) ∈ EX where o is dead, remove o and all r ∈ R. Recursively process
the γ-regions. For all entry variables (i, A) ∈ EV where all a ∈ A are dead, remove all a ∈ A and i.
- θ-node: For all loop variables (i, a, r, o) ∈ LV where a and o are dead, remove o and r. Recursively
process the θ-region. Remove i and a.
- λ-node: Recursively process the λ-region. For all context variables (i, a) ∈ CV where a is dead, remove
a and i.
- φ-node: For all recursion variables (r, a, o) ∈ RV where a and o are dead, remove o and r. Recursively
process the φ-region. Remove a. For all context variables (i, a) ∈ CV where a is dead, remove a and i.
We consider two simple nodes $n_1$ and $n_2$ congruent, or $n_1 \cong n_2$, if they represent the same computation, have the same number of inputs, i.e., $|I_{n_1}| = |I_{n_2}|$, and the inputs $i^k_{n_1}$ and $i^k_{n_2}$ are congruent, or $i^k_{n_1} \cong i^k_{n_2}$, for all $k \in [0..|I_{n_1}|]$. Two inputs are congruent if their respective origins $g^k_{n_1}$ and $g^k_{n_2}$ are congruent, i.e., $g^k_{n_1} \cong g^k_{n_2}$. By definition, the origins of inputs are either outputs of simple or structural nodes, or arguments of regions. Origins from simple nodes are only equivalent when their respective producers are computationally equivalent, whereas for the other cases, it must be guaranteed that they always receive the same value.
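Operationally, the pairwise test for simple nodes can be sketched as below; Node, Output, and the congruence set are hypothetical stand-ins:

    #include <cstddef>
    #include <set>
    #include <string>
    #include <utility>
    #include <vector>

    struct Output;
    struct Node {
      std::string op;                 // the operation this node represents
      std::vector<Output*> origins;   // origin of each input
    };

    using CongSet = std::set<std::pair<const Output*, const Output*>>;

    // n1 ≅ n2: same operation, same arity, and pairwise congruent input
    // origins; an origin is trivially congruent to itself.
    bool congruent(const Node& n1, const Node& n2, const CongSet& cong) {
      if (n1.op != n2.op || n1.origins.size() != n2.origins.size())
        return false;
      for (std::size_t k = 0; k < n1.origins.size(); ++k) {
        const Output* g1 = n1.origins[k];
        const Output* g2 = n2.origins[k];
        if (g1 != g2 && !cong.count({g1, g2}))
          return false;
      }
      return true;
    }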
The implementation of CNE consists of two phases: mark and divert. The mark phase identifies
congruent simple nodes, while the divert phase diverts all edges of their origins to a single node, rendering
all other nodes dead. Both phases of Algorithm VI perform a simple top-down traversal, recursively processing the subregions of structural nodes and marking inputs, outputs, arguments, results, and simple nodes as congruent. For γ-nodes, the algorithm marks only computations within a single region as congruent and performs no analysis between regions. In the case of θ-nodes, computations are only congruent when they are congruent before and after loop execution, i.e., the inputs and results of two loop variables must be congruent.
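The divert phase for simple nodes then reduces to redirecting edges, as in the following sketch with hypothetical Input/Output/Node stand-ins:

    #include <cstddef>
    #include <vector>

    struct Output;
    struct Input  { Output* origin = nullptr; };
    struct Output { std::vector<Input*> users; };
    struct Node   { std::vector<Output> outputs; };

    // Redirect every user of a congruent node's outputs to the surviving
    // node's corresponding outputs; the congruent nodes become dead and
    // are later removed by DNE.
    void divert(Node& survivor, std::vector<Node*>& congruent_nodes) {
      for (Node* m : congruent_nodes) {
        for (std::size_t k = 0; k < m->outputs.size(); ++k) {
          Output& from = m->outputs[k];
          Output& to   = survivor.outputs[k];
          for (Input* user : from.users) {
            user->origin = &to;           // divert the edge
            to.users.push_back(user);
          }
          from.users.clear();             // m's output now has no users
        }
      }
    }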
Figure 6b shows the RVSDG for the code in Figure 6a, and Figure 6c the RVSDG after CNE. Two of the four multiplications take the same inputs and are therefore congruent to each other. Thus, their result edges are redirected and they become dead. DNE can then remove both multiplications, as shown in Figure 6d.
For simple nodes, the algorithm marks all nodes within a region that are congruent to a node n. In order to avoid costly traversals of all nodes for every node n, the mark phase takes its candidates from the users of the origin of n's first input. If among these users there is an input of a simple node n′ with the same operation and number of inputs, the remaining inputs of both nodes can be compared for congruence. Moreover, a region must store constant nodes, i.e., nodes without inputs, separately from other nodes so that candidate nodes for constants are available. For commutative simple nodes, the inputs should be sorted before their comparison.
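A sketch of this candidate selection is shown below (hypothetical stand-ins; the separate constant table is elided):

    #include <string>
    #include <vector>

    struct Node;
    struct Input  { Node* node = nullptr; };       // the input's owning node
    struct Output { std::vector<Input*> users; };
    struct Node   { std::string op; std::vector<Output*> origins; };

    // Candidates congruent to n are sought only among the users of the
    // origin of n's first input, not among all nodes of the region.
    std::vector<Node*> candidates(const Node& n) {
      std::vector<Node*> result;
      if (n.origins.empty())
        return result;                   // constants: use the region's table
      for (Input* user : n.origins[0]->users) {
        Node* m = user->node;
        if (m && m != &n && m->op == n.op &&
            m->origins.size() == n.origins.size())
          result.push_back(m);
      }
      return result;
    }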
Algorithm VI: Common Node Elimination
1. Mark: Process all nodes in topological order as follows:
- Simple nodes: Denote this node as n. Mark n as congruent to all nodes n′ which represent the same operation and where $|I_n| = |I_{n'}|$ and $i^k_n \cong i^k_{n'}$ for all $k \in [0..|I_n|]$. Mark all outputs $o^k_n \cong o^k_{n'}$ for all $k \in [0..|O_n|]$.
- γ-node: For all entry variables $ev_1, ev_2 \in EV$ where $i_{ev_1} \cong i_{ev_2}$, mark $a^k_{ev_1} \cong a^k_{ev_2}$ for all $k \in [0..|A_{ev_1}|]$. Recursively process the γ-regions. For all exit variables $ex_1, ex_2 \in EX$ where $r^k_{ex_1} \cong r^k_{ex_2}$ for all $k \in [0..|R_{ex_1}|]$, mark $o_{ex_1} \cong o_{ex_2}$.
- θ-node: For all loop variables $lv_1, lv_2 \in LV$ where $i_{lv_1} \cong i_{lv_2} \wedge r_{lv_1} \cong r_{lv_2}$, mark $a_{lv_1} \cong a_{lv_2}$ and $o_{lv_1} \cong o_{lv_2}$. Recursively process the θ-region.
- λ-node: For all context variables $cv_1, cv_2 \in CV$ where $i_{cv_1} \cong i_{cv_2}$, mark $a_{cv_1} \cong a_{cv_2}$. Recursively process the λ-region.
- φ-node: For all context variables $cv_1, cv_2 \in CV$ where $i_{cv_1} \cong i_{cv_2}$, mark $a_{cv_1} \cong a_{cv_2}$. Recursively process the φ-region.
- ω-node: Recursively process the ω-region.
2. Divert: Process all nodes in topological order as follows:
- Simple nodes: Denote this node as n. For all nodes n′ which are congruent to n, divert all outputs $o^k_{n'}$ to $o^k_n$ for all $k \in [0..|O_n|]$.
- γ-node: For all entry variables $ev_1, ev_2 \in EV$ where $i_{ev_1} \cong i_{ev_2}$, divert all edges from $a^k_{ev_2}$ to $a^k_{ev_1}$ for all $k \in [0..|A_{ev_1}|]$. Recursively process the γ-regions. For all exit variables $ex_1, ex_2 \in EX$ where $r^k_{ex_1} \cong r^k_{ex_2}$ for all $k \in [0..|R_{ex_1}|]$, divert all edges from $o_{ex_2}$ to $o_{ex_1}$.
- θ-node: For all loop variables $lv_1, lv_2 \in LV$ where $a_{lv_1} \cong a_{lv_2} \wedge o_{lv_1} \cong o_{lv_2}$, divert all edges from $a_{lv_2}$ to $a_{lv_1}$ and from $o_{lv_2}$ to $o_{lv_1}$. Recursively process the θ-region.
- λ-node: For all context variables $cv_1, cv_2 \in CV$ where $i_{cv_1} \cong i_{cv_2}$, divert all edges from $a_{cv_2}$ to $a_{cv_1}$. Recursively process the λ-region.
- φ-node: For all context variables $cv_1, cv_2 \in CV$ where $i_{cv_1} \cong i_{cv_2}$, divert all edges from $a_{cv_2}$ to $a_{cv_1}$. Recursively process the φ-region.
- ω-node: Recursively process the ω-region.
The presented algorithm only detects simple nodes as congruent within a region. For γ-nodes, con-
gruence can also exist between nodes of different γ-regions and extending the algorithm would eliminate
these redundancies. Another extension would be to permit congruence detection for structural nodes to
implement conditional fusion [35] and loop fusion [25]. In the case of γ-nodes, it is sufficient to ensure that
two nodes have congruent predicates, whereas for θ-nodes it would be necessary to permit congruence
detection between different θ-regions to ensure that their predicates are the same.
7 Implementation and Evaluation
This section’s goal is to demonstrate that the RVSDG has no inherent impediment that prevents it from
producing competitive code and that it can serve as the IR in a compiler’s optimization stage. The goal is
not to outperform mature compilers, such as LLVM or GCC. This would require a significant engineering
effort, which is outside the scope of this article. In light of this goal, we evaluate the RVSDG in terms of
performance and size of produced code, as well as compilation time and representational overhead.
7.1 Implementation
We have implemented jlm, a publicly available [32] prototype compiler that uses the RVSDG for opti-
mizations. Its compilation pipeline is outlined in Figure 7a. Jlm takes LLVM IR as input, constructs an
RVSDG, transforms and optimizes this RVSDG, and destructs it again to LLVM IR. The SSA form of
the input is destructed before RVSDG construction proceeds with Inter- and Intra-PT. This additional
step is required due to the control flow restructuring phase of Intra-PT. Destruction discovers control flow by employing SCFR before it constructs SSA form to output LLVM IR.
[Diagram omitted: *.ll → SSA Destruction → Inter- and Intra-PT (Construction) → RVSDG → Optimizations → Control Flow Recovery → SSA Construction (Destruction) → *.ll]
(a) Jlm compilation pipeline.
[Diagram omitted: *.c → clang -O0 → opt -mem2reg → {opt -O? | jlm} → llc -O3 → *.o]
(b) Evaluation setup.
Figure 7: Jlm’s compilation pipeline and evaluation setup.
Jlm supports LLVM IR function, integer, floating point, pointer, array, structure, and vector types, as well as their corresponding operations. The current implementation does not support exceptions and intrinsic functions.
The compiler uses two distinct states to model operations with side-effects: one for modeling memory
accesses and one for non-terminating loops. We implemented the following optimizations in addition to
DNE (Section 6.1) and CNE (Section 6.2):
- Inlining (ILN): Simple function inlining.
- Invariant Value Redirection (INV): Redirects invariant values from θ- and γ-nodes (see the sketch after this list). For γ-nodes, the users of an exit variable's output can be redirected to the origin of an entry variable's input if the origins of all the exit variable's results are the corresponding arguments of that entry variable. For θ-nodes, a loop variable is invariant if the origin of its result is the argument of the loop variable.
- Node Push Out (PSH): Moves all invariant nodes out of γ- and θ-regions. For γ-nodes, all nodes
without side-effects are moved, exposing them to other optimizations such as CNE. For θ-nodes, a
node is invariant if all its operands are invariant.
- Node Pull In (PLL): Moves all nodes that are only used in one γ-region into the γ-node. This
ensures their conditional execution, while avoiding code bloat.
- Node Reduction (RED): Performs simplifications, such as constant folding or strength reduction, similarly to LLVM's redundant instruction combinator (-instcombine), albeit with far fewer of them.
- Loop Unrolling (URL): Unrolls all inner loops by a factor of four. Higher factors gave no significant
performance improvements in return for the increased code size.
- θ-γ Inversion (IVT): Inverts γ- and θ-nodes where both nodes have the same predicate origin. This replaces a loop containing a conditional with a conditional that has a loop in its then-case.
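For illustration, a minimal sketch of INV for θ-nodes follows; LoopVar and the divert_users helper are hypothetical stand-ins, not jlm's actual internals:

    #include <vector>

    struct Input;
    struct Output { std::vector<Input*> users; };
    struct Input  { Output* origin = nullptr; };

    struct LoopVar {   // a θ-node loop variable (i, a, r, o)
      Input*  i;       // input at the θ-node
      Output* a;       // argument inside the θ-region
      Input*  r;       // result inside the θ-region
      Output* o;       // output at the θ-node
    };

    static void divert_users(Output* from, Output* to) {
      for (Input* u : from->users) { u->origin = to; to->users.push_back(u); }
      from->users.clear();
    }

    // A loop variable is invariant if its result's origin is its own
    // argument; users of the θ-node's output can then be redirected to
    // the origin of the corresponding input.
    void redirect_invariant(std::vector<LoopVar>& loop_vars) {
      for (LoopVar& lv : loop_vars)
        if (lv.r->origin == lv.a)
          divert_users(lv.o, lv.i->origin);
    }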
We use the following optimization order: ILN, INV, RED, DNE, IVT, INV, DNE, PSH, INV, DNE, URL, INV, RED, CNE, DNE, PLL, INV, DNE.
7.2 Evaluation Setup
Figure 7b outlines our evaluation setup. We use clang 7.0.1 [8] to convert C files to LLVM IR, pre-optimize
the IR with LLVM’s opt, and then optimize it either with jlm, or opt using different optimization levels.
The optimized output is converted to an object file with LLVM’s llc. The pre-optimization step is
necessary to avoid a re-implementation of LLVM’s mem2reg pass, since clang allocates all values on the
stack by default.
[Plot omitted: speedup over O0 per polybench benchmark, plus the geometric mean (gmean); series: opt -O1, opt -O3, opt -O3-no-vec, opt -O3-no-vec-stripped, jlm; y-axis: Speedup.]
Figure 8: Speedup relative to O0 at different optimization levels.
We use the polybench 4.2.1 beta benchmark suite [30] to evaluate the RVSDG’s usability and efficacy.
This benchmark suite provides structurally small benchmarks, and therefore reduces the implementation
effort for the construction and destruction phases, as well as the number and complexity of optimizations.
The experiments are performed on an Intel Xeon E5-2695v4 running CentOS 7.4. The core frequency
is pinned to 2.0 GHz to avoid performance variations and thermal throttling effects. All outputs of the
benchmark runs are verified to equal the corresponding outputs of the executables produced by clang.
7.3 Performance
Figure 8 shows the speedup at five different optimization levels. The O0 optimization level serves as
baseline. The O3-no-vec optimization level is the same as O3, but without slp- and loop-vectorization.
Optimization level O3-no-vec-stripped is the same as O3-no-vec, but the IR is stripped of named
metadata and attribute groups before invoking llc. Since jlm does not support metadata and attributes
yet, this optimization level permits us to compare the pure optimized IR against jlm without the optimizer
providing hints to llc. We omit optimization level O2 as its results were very similar to those of O3. The gmean column
in Figure 8 shows the geometric mean of all benchmarks.
The results show that the executables produced by jlm (gmean 2.58) are faster than O1 (gmean 2.49),
but slower than O3 (gmean 3.21), O3-no-vec (gmean 2.95), and O3-no-vec-stripped (gmean 2.92).
Optimization level O3 tries to vectorize twenty benchmarks, but only produces measurable improvements
for eight of them, namely atax, durbin, fdtd-2d, gemm, gemver, heat-3d, jacobi-1d, and jacobi-2d. Jlm
would require a vectorizer to achieve similar speedups.
Disabling vectorization with O3-no-vec and O3-no-vec-stripped shows that jlm achieves similar speedups for fdtd-2d, gemm, heat-3d, jacobi-1d, and jacobi-2d. The metadata transferred between the optimizer and llc only makes a significant difference for durbin, floyd-warshall, gesummv, jacobi-1d, and nussinov. In the case of gesummv and jacobi-1d, performance drops below jlm.
Jlm is outperformed by optimization level O1 on six benchmarks: adi, durbin, floyd-warshall, nussinov, seidel-2d, and syrk. We inspected the output files and found the following causes:
- adi : Jlm fails to eliminate load instructions from the two innermost loops. These loads have loop-
carried dependences with a distance of one to store instructions in the same loop, and can be
eliminated by propagating the stored value to the users of the load’s output. The LLVM pass that
performs this optimization is loop load elimination (-loop-load-elim). If this transformation is
performed by hand on the two loops, then jlm achieves the same performance as O1.
- durbin: Jlm fails to transform a loop that copies values between arrays into a memcpy intrinsic. This prevents LLVM's code generator from producing better code. The LLVM pass responsible for this transformation is the loop-idiom pass (-loop-idiom). If the loop is replaced with a call to memcpy, then jlm performs better than O1.
[Plot omitted: .text section size in kB per polybench benchmark, plus the arithmetic mean (amean); series: opt -O3, opt -O3-no-vec, opt -Os, jlm, jlm -no-unroll.]
Figure 9: Code size at different optimization levels.
- floyd-warshall: Jlm fails to move instructions out of the innermost loop due to loads and stores impeding their hoisting. Currently, all loads and stores are sequentialized using a single state. As a consequence, invariant loads and stores might not appear as invariant because their state edge originates from another non-invariant load or store. This in turn renders other instructions non-hoistable, as they might depend on one of these non-hoistable loads. An alias analysis pass would resolve this problem, as it would render loop-invariant loads and stores independent of non-invariant ones.
- nussinov: Similarly to floyd-warshall, the overly strict sequentialization of load and store instructions impedes further optimizations, in this case the application of CNE. Loads from the same address are not detected as congruent due to different state edge origins. Again, this has a cascading effect on other instructions, and an alias analysis pass would resolve the problem.
- seidel-2d : Similarly to adi, jlm fails to eliminate load instructions from the innermost loop. If the
load elimination is performed by hand, then jlm achieves the same performance as O1.
- syrk : Similarly to nussinov, jlm fails to satisfactorily apply CNE due to an overly strict sequential-
ization of load and store instructions.
Figure 8 shows that it is feasible to produce competitive code using the RVSDG, but also that more
optimizations and analyses are required in order to reliably do so. The differences in performance are not
due to inherent characteristics of the RVSDG, but can be attributed to missing analyses, optimizations,
and heuristics for their application. Specifically, jlm requires more complex analyses as well as more
optimizations exploiting the results of these analyses in order to compete with mature compilers at more
complex benchmarks. In particular, an alias analysis pass is required as the results above and the number
of LLVM pass invocations from Table 1 indicate.
7.4 Code Size
Figure 9 shows the code size for O3, O3-no-vec, Os, and for jlm with and without loop unrolling. The
amean column shows the arithmetic mean of all benchmarks.
Optimization level O3 produces on average text sections that are 11% bigger than O3-no-vec. Vec-
torization often requires loop transformations, such as loop unrolling, to make loops amenable to the vectorizer, and the insertion of pre- and post-loop code.
[Plots omitted: (a) Representational overhead — number of RVSDG nodes versus number of LLVM instructions; (b) Compilation times — time in ms versus number of LLVM instructions.]
Figure 10: Compilation overhead of jlm.
This affects code size negatively, but can result in better performance. The results also show that Os consistently produces smaller text sections
than O3-no-vec. This is due to more conservative optimization heuristics and the omission of other
optimizations, e.g., aggressive instruction combination (-aggressive-instcombine) or the promotion of
by-reference arguments to scalars (-argpromotion).
In comparison to Os, jlm produces ca. 39% bigger text sections. The experiments without loop
unrolling show that this can be attributed to the naive heuristic used for this optimization. Jlm does not take code size into account and unconditionally unrolls every inner loop by a factor of four, leading to excessive code expansion. Avoiding unrolling completely results in text sections whose sizes lie on average between those of O3-no-vec and Os. This indicates that the excessive code size is due to naive heuristics and shortcomings in the implementation, rather than to inherent characteristics of the RVSDG.
7.5 Compilation Overhead
Figure 10 shows the overhead in terms of IR size and time for the RVSDG. Figure 10a shows the
representational overhead by relating the number of instructions in the LLVM module to the number of
RVSDG nodes after construction, whereas Figure 10b relates the number of instructions in the LLVM
module to the time spent on RVSDG construction and optimizations.
Figure 10a shows a clear linear relationship for all cases, confirming the observations by Bahmann et
al. [3] that the RVSDG is feasible in terms of space requirements. Figure 10b also indicates a linear
dependency, but with larger variations for similar input sizes. This variation can be attributed to the fact that construction and optimization times also depend on the structure of the input. Structural differences in the inter-procedure and control flow graphs lead to runtime variations in RVSDG construction and different runtimes for optimizations. For example, the presence of loops in a translation unit determines
whether loop unrolling is performed, while their absence avoids the runtime overhead for this optimization
completely. Overall, Figure 10 suggests that the RVSDG is feasible as an IR for optimizing compilers in
terms of compilation overhead.
8 Conclusion
This paper presents a complete specification for representing entire programs in the RVSDG IR for an
optimizing compiler. We provide construction and destruction algorithms, and show the RVSDG’s efficacy
as an IR for analyses and optimizations by presenting Dead Node and Common Node Elimination. We
implemented jlm, a publicly available [32] compiler that uses the RVSDG for optimizations, and evaluate it
in terms of performance, code size, compilation time, and representational overhead. The results suggest
that the RVSDG combines the abstractions of data centric IRs with the CFG’s advantages to optimize
and generate efficient control flow. This makes the RVSDG an appealing IR for optimizing compilers. A
natural direction for future work is to explore how features such as exceptions can be efficiently mapped
to the RVSDG. Another research direction would be to extend the number of optimizations and their
heuristics in jlm to a competitive level with CFG-based compilers. This would provide further information
about the number of necessary optimizations, their complexity, and consequently the required engineering
effort.
References
[1] Frances E. Allen. Control flow analysis. ACM SIGPLAN Notices, 5(7):1–19, 1970.
[2] B. Alpern, M. N. Wegman, and F. K. Zadeck. Detecting equality of variables in programs. In
Proceedings of the ACM SIGPLAN Symposium on Principles of Programming Languages, pages
1–11. ACM, 1988.
[3] Helge Bahmann, Nico Reissmann, Magnus Jahre, and Jan Christian Meyer. Perfect reconstructabil-
ity of control flow from demand dependence graphs. ACM Transactions on Architecture and Code
Optimization, 11(4):66:1–66:25, 2015.
[4] W. Baxter and H. R. Bauer, III. The program dependence graph and vectorization. In Proceedings
of the ACM SIGPLAN Symposium on Principles of Programming Languages, pages 1–11. ACM,
1989.
[5] Philip L Campbell, Ksheerabdhi Krishna, and Robert A Ballance. Refining and Defining the Program
Dependence Web. Technical report, University of New Mexico, 1993.
[6] Larry Carter, Jeanne Ferrante, and Clark D. Thomborson. Folklore confirmed: reducible flow graphs are exponentially larger. In Proceedings of the ACM SIGPLAN Symposium on Principles of Programming Languages, pages 106–114. ACM, 2003.
[7] Jong-Deok Choi, Vivek Sarkar, and Edith Schonberg. Incremental computation of static single
assignment form. In Proceedings of the International Conference on Compiler Construction, pages
223–237. Springer-Verlag, 1996.
[8] Clang. Clang: A C language family frontend for LLVM. https://clang.llvm.org, 2017. Accessed: 2019-10-30.
[9] GNU Compiler Collection. https://gcc.gnu.org/, 2018. Accessed: 2019-08-05.
[10] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. K. Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, 13(4):451–490, 1991.
[11] Jack Bonnell Dennis. Data flow supercomputers. Computer, 13(11):48–56, 1980.
[12] Shuhan Ding, John Earnest, and Soner Önder. Single Assignment Compiler, Single Assignment Architecture: Future Gated Single Assignment Form. In Proceedings of the International Symposium on Code Generation and Optimization. ACM, 2014.
[13] Jeanne Ferrante, Karl J. Ottenstein, and Joe D. Warren. The program dependence graph and its use
in optimization. ACM Transactions on Programming Languages and Systems, 9(3):319–349, 1987.
[14] Paul Havlak. Construction of Thinned Gated Single-Assignment Form. In Proceedings of the In-
ternational Workshop on Languages and Compilers for Parallel Computing-Revised Papers, pages
477–499. Springer, 1993.
[15] S. Horwitz, J. Prins, and T. Reps. On the adequacy of program dependence graphs for represent-
ing programs. In Proceedings of the ACM SIGPLAN Symposium on Principles of Programming
Languages, pages 146–157. ACM, 1988.
[16] S. Horwitz, T. Reps, and D. Binkley. Interprocedural slicing using dependence graphs. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 35–46. ACM, 1988.
[17] Neil Johnson and Alan Mycroft. Combined code motion and register allocation using the value state
dependence graph. In Proceedings of the International Conference on Compiler Construction, pages
1–16. Springer-Verlag, 2003.
[18] Neil E. Johnson. Code size optimization for embedded processors. Technical report, University of
Cambridge, Computer Laboratory, 2004.
[19] Richard Johnson, David Pearson, and Keshav Pingali. The program structure tree: Computing control regions in linear time. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 171–185. ACM, 1994.
[20] Chris Lattner and Vikram Adve. LLVM: A Compilation Framework for Lifelong Program Anal-
ysis & Transformation. In Proceedings of the International Symposium on Code Generation and
Optimization, 2004.
[21] Alan C. Lawrence. Optimizing compilation with the Value State Dependence Graph. Technical
report, University of Cambridge, Computer Laboratory, 2007.
[22] LLVM. https://bugs.llvm.org/show_bug.cgi?id=31851, 2018. Accessed: 2018-05-07.
[23] LLVM. https://bugs.llvm.org/show_bug.cgi?id=37202, 2018. Accessed: 2018-05-07.
[24] LLVM. https://bugs.llvm.org/show_bug.cgi?id=31183, 2018. Accessed: 2018-05-07.
[25] Naraig Manjikian and Tarek S Abdelrahman. Fusion of loops for parallelism and locality. IEEE
Transactions on Parallel and Distributed Systems, 8(2):193–209, 1997.
[26] Steven S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann, 1997.
[27] R. Namballa, N. Ranganathan, and A. Ejnioui. Control and Data Flow Graph Extraction for High-
Level Synthesis. In IEEE Computer Society Annual Symposium on VLSI, pages 187–192, 2004.
[28] Karl J. Ottenstein, Robert A. Ballance, and Arthur B. MacCabe. The Program Dependence Web:
A Representation Supporting Control-, Data-, and Demand-driven Interpretation of Imperative
Languages. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design
and Implementation, pages 257–271. ACM, 1990.
[29] Guilherme Ottoni, Ram Rangan, Adam Stoler, and David I. August. Automatic Thread Extraction
with Decoupled Software Pipelining. In Proceedings of the ACM/IEEE International Symposium on
Microarchitecture, pages 105–118. IEEE, 2005.
[30] Louis-Noël Pouchet. PolyBench/C 4.2. http://web.cse.ohio-state.edu/~pouchet.2/software/polybench/, 2017. Accessed: 2019-11-11.
[31] Nico Reissmann. Utilizing the value state dependence graph for Haskell. 2012.
[32] Nico Reissmann. jlm. https://github.com/phate/jlm, 2017. Accessed: 2017-12-13.
[33] Nico Reissmann, Thomas L. Falch, Benjamin A. Bjørnseth, Helge Bahmann, Jan Christian Meyer,
and Magnus Jahre. Efficient control flow restructuring for GPUs. In Proceedings of the International
Conference on High Performance Computing and Simulation, pages 48–57, 2016.
[34] B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Global value numbers and redundant computations.
In Proceedings of the ACM SIGPLAN Symposium on Principles of Programming Languages, pages
12–27. ACM, 1988.
[35] Radu Rugina and Martin C. Rinard. Recursion unrolling for divide and conquer programs. In
Proceedings of the International Workshop on Languages and Compilers for Parallel Computing-
Revised Papers, pages 34–48. Springer-Verlag, 2001.
[36] V. Sarkar. Automatic Partitioning of a Program Dependence Graph into Parallel Tasks. IBM
Journal of Research and Development, 35(5-6):779–804, 1991.
[37] M. Sharir. Structural analysis: A new approach to flow analysis in optimizing compilers. Computer
Languages, 5(3-4):141–153, 1980.
[38] Tatiana Shpeisman and Chris Lattner. MLIR: Multi-level intermediate representation for compiler infrastructure. Keynote, European LLVM Developers Meeting, 2019.
[39] James Stanier. Removing and Restoring Control Flow with the Value State Dependence Graph. PhD
thesis, University of Sussex, 2012.
[40] James Stanier and Alan Lawrence. The value state dependence graph revisited. In Proceedings of
the Workshop on Intermediate Representations, pages 53–60, 2011.
[41] James Stanier and Des Watson. Intermediate representations in imperative compilers: A survey.
ACM Computing Surveys (CSUR), 45(3):26:1–26:27, 2013.
[42] R. Tarjan. Depth-first search and linear graph algorithms. SIAM Journal on Computing, 1(2):146–
160, 1972.
[43] Peng Tu and David Padua. Efficient Building and Placing of Gating Functions. In Proceedings of the
ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 47–55.
ACM, 1995.
[44] Mark N. Wegman and F. Kenneth Zadeck. Constant propagation with conditional branches. ACM
Transactions on Programming Languages and Systems, 13(2):181–210, 1991.
[45] Daniel Weise, Roger F. Crew, Michael Ernst, and Bjarne Steensgaard. Value dependence graphs:
Representation without taxation. In Proceedings of the ACM SIGPLAN Symposium on Principles
of Programming Languages, pages 297–310. ACM, 1994.
[46] Ali Mustafa Zaidi. Accelerating control-flow intensive code in spatial hardware. Technical report,
University of Cambridge, 2015.
[47] Ali Mustafa Zaidi and David Greaves. Value state flow graph: A dataflow compiler ir for accelerating
control-intensive code in spatial hardware. ACM Transactions on Reconfigurable Technology and
Systems, 9(2):14:1–14:22, 2015.
[48] F. Zhang and E.H. D’Hollander. Using hammock graphs to structure programs. IEEE Transactions
on Software Engineering, 30(4):231–245, 2004.