Gemini: A Functional Programming Language for Hardware Description by Srinivasan, Aditya & Hilton, Andrew D.
arXiv:1911.03926v1 [cs.PL] 10 Nov 2019
GEMINI: A FUNCTIONAL PROGRAMMING LANGUAGE FOR
HARDWARE DESCRIPTION
A PREPRINT
Aditya Srinivasan
Pratt School of Engineering
Duke University ’18
Durham, NC 27708
aditya.srinivasan@alumni.duke.edu
Andrew D. Hilton
Pratt School of Engineering
Duke University
Durham, NC 27708
adhilton@ee.duke.edu
November 12, 2019
ABSTRACT
This paper presents GEMINI, a functional programming language for hardware description that provides features such as parametric polymorphism, recursive datatypes, higher-order functions, and type inference, offering greater expressivity than modern hardware description languages. GEMINI demonstrates the theory and implementation of novel type-theoretical concepts through its unique type system, which consists of multiple atomic kinds and dependent types; this allows the language to model both software and hardware constructs safely and to perform type inference through multi-staged compilation. The primary technical results of this paper are formalizations of the GEMINI grammar, typing rules, and evaluation rules; a proof of safety of GEMINI’s type system; and a prototype implementation of the compiler’s semantic analysis phase.
Keywords Dependent types · Type inference · Multi-stage compilation · HDL
1 Introduction
Over the latter half of the 20th century, methods in electronic circuit design evolved significantly to cope with growing complexity and scale, driven in part by Moore’s Law. Traditional manual design at the transistor level became impractical at scale, and gate-level descriptions (also known as netlists) were adopted as a higher-level abstraction. A netlist is an enumeration of the electronic components in a circuit and their connections to other nodes in the circuit; place-and-route algorithms translate it into a physical transistor-level implementation. Eventually, direct specification of netlists became infeasible because of implementation differences across target hardware. Hardware description languages (HDLs), which specify designs at the register-transfer level (RTL) of abstraction, were therefore created [8, 1].
As very-large-scale integration (VLSI) became widespread, HDLs grew in importance, and the first modern HDL, Verilog, was introduced between 1983 and 1984 [2]. At approximately the same time, the Department of Defense began developing a new standard named VHDL [3, 4]. These languages and their descendants have enabled electronic circuit designers to specify highly complex structure and behavior at a high level of abstraction, and to use synthesis tools that translate the RTL specification into an optimized netlist for a specific target, much as a compiler generates optimized low-level code for a specific target machine. Tools also exist to verify and simulate digital logic circuits written in HDLs.
As HDLs have evolved, successive iterations have introduced increasingly powerful features borrowed from popular software programming languages, such as datatypes and strong type systems. However, the development of software languages has outpaced that of HDLs, and software languages continue to improve the programmer’s ability to express complex programs concisely.
A logical next step is to merge advanced ideas from software programming languages with existing hardware description languages, providing another level of abstraction that equips electronic circuit designers to represent increasingly complex designs. The GEMINI language does so by incorporating features such as parametric polymorphism, type inference, higher-order functions, recursive types, and recursion.
The idea of using software programming languages to generate Verilog or another HDL is not novel. Several existing projects build on popular programming languages: Clash and Lava are based on Haskell [9, 11], Chisel on Scala [10], and HML on SML [12]. While these languages share similarities with GEMINI, GEMINI models a wider range of software constructs. GEMINI also offers abstractions such as the hardware record type, which compiles to a bit vector but lets the programmer name and access fields as if the vector were a software record.
Most notably, the design and implementation of GEMINI demonstrate novel concepts in type theory through its type system, which includes a kinding system with multiple atomic kinds as well as dependent types. Even so, the compiler can still perform type inference by means of multi-staged compilation.
In Section 2, we provide a full specification of the GEMINI language, formalizing the type system, the software and hardware grammars, the typing rules, and the evaluation rules. In Section 3, we prove a desirable property of our type system, safety, by way of proofs of progress and preservation. In Section 4, we turn from language design to compiler implementation: we describe each phase of the GEMINI compiler and present a prototype implementation of the semantic analysis phase. In Section 5, we provide examples of GEMINI programs and the compiled Verilog output. Finally, in Section 6, we discuss the scope for future extensions and improvements to the GEMINI project.
2 The GEMINI Language
We first present a specification of the GEMINI language. GEMINI is conceptually inspired by languages in the ML family, and its syntax in particular is derived mainly from SML. We begin with an informal overview of the key components of the language, namely the type system, values, expressions, declarations, and the core library. We then formalize these through grammars, typing rules, and evaluation rules; such formalizations are necessary to support our metatheory proofs in Section 3.
2.1 Types and kinds in GEMINI
In conventional type systems, there are two primitives from which all kinds are produced: a single atomic kind *¹ and the constructor ⇒². Proper types, such as int, real, and string, belong to kind *. Type constructors, such as list and ref, belong to kind * ⇒ *. The latter are not types in their own right; instead, they act as functions that accept a type as an argument and produce a type, such as int list or string list ref [5].
In comparison, GEMINI has a type system consisting of three atomic kinds:
1. *S: software type
2. *H: hardware type
3. *M: module type
The separation of kinds is an important and distinctive aspect of GEMINI. Its purpose is to enforce a trichotomy between software, hardware, and module types in the type system. Figure 1 illustrates the three atomic kinds and their constituent types.
Types in *S
int
real
string
*S list
{l1: *S, ..., ln: *S}
*S ref
*H sw
Ci *S
*S → *S
Types in *H
bit
*H [n]
*H @ n
#{l1: *H, ..., ln: *H}
Types in *M
*H  *H
Figure 1: GEMINI kinds
Further, the constructor ⇒ is separated into three
distinct variations:
1. S ⇒S : software-to-software
2. H ⇒S : hardware-to-software
3. H ⇒H : hardware-to-hardware
The subscript to the left of the constructor denotes the kind accepted as an argument, and the subscript to the right denotes the kind produced. For example, the type constructors list and ref are constructors of kind S ⇒S, as they construct a software type from another software type. The constructor sw, on the other hand, is a constructor of kind H ⇒S, as it constructs a software type from a provided hardware type.
¹ Pronounced "type".
² Pronounced "to".
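The kind discipline for constructors can be sketched as a lookup table from each constructor to its (argument kind, result kind) pair. This is a minimal illustration under stated assumptions, not the GEMINI compiler's representation: the `Kind` enum, the table, the name `delay` for the temporal constructor, and `apply_constructor` are all hypothetical.

```python
from enum import Enum
from typing import Optional


class Kind(Enum):
    S = "software"
    H = "hardware"
    M = "module"


# Hypothetical table: constructor -> (kind of its argument, kind of its result).
# No entry has the shape (Kind.S, Kind.H), mirroring the absence of S => H.
CONSTRUCTOR_KINDS = {
    "list": (Kind.S, Kind.S),   # S => S
    "ref": (Kind.S, Kind.S),    # S => S
    "sw": (Kind.H, Kind.S),     # H => S, the only hardware-to-software constructor
    "array": (Kind.H, Kind.H),  # H => H, written t[n]
    "delay": (Kind.H, Kind.H),  # H => H, the temporal type t @ n (name is hypothetical)
}


def apply_constructor(name: str, arg_kind: Kind) -> Optional[Kind]:
    """Kind of the type built by applying `name` to a type of `arg_kind`, or None if ill-kinded."""
    expected, result = CONSTRUCTOR_KINDS[name]
    return result if arg_kind is expected else None


# bit[8] sw: wrapping a hardware array yields a software type that can sit in a list.
wrapped = apply_constructor("sw", apply_constructor("array", Kind.H))
```

Returning None rather than raising keeps the sketch small; a real checker would report which constructor and argument disagreed.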
There are several interesting aspects of the design of
this type system that warrant further elaboration.
Type descriptions. First, we explicitly describe each of the types listed above. In *S, the primitive types are int, real, and string, which represent integers, real numbers, and sequences of characters, respectively. The type constructor list of kind S ⇒S accepts a software type and produces a software type representing an ordered collection of elements of that type. The software record is a type constructor of kind S ⇒S that accepts an arbitrary number of software type parameters and produces a record with named fields that can be used to access each value. It is worth noting that tuples and records are syntactically distinct, as will be seen in the software grammar in Section 2.6, but are represented identically in the type system; beyond the lexical analysis phase, a tuple is treated as a record whose field labels are numerical indices. The type constructor ref of kind S ⇒S accepts a software type and produces a reference container whose inner value can be mutated, as in typical imperative programming languages. The type constructor sw is the only type constructor of kind H ⇒S: it accepts a hardware type and produces a software type. The resulting value acts as a wrapper around a hardware value, which enables the programmer to use it in software-typed constructs such as lists or functions. The wrapper is opaque, in that the underlying hardware value can never be directly accessed or read in software constructs, since the exact value the hardware will take is not known. However, it may be unwrapped to expose the hardware value, which may then be used in hardware-typed constructs. The type constructor Ci of kind S ⇒S is a variant in some discriminated union type C. Lastly, the type constructor →, pronounced "function", of kind S ⇒S, forms the type of functions that accept a software-typed argument and produce a software-typed result. Because both the argument and result types range over *S, which itself contains function types, this recursive definition allows for higher-order functions.
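The tuple-record equivalence can be illustrated by desugaring a tuple type into a record type whose labels are 1, 2, ..., n. In this sketch, types are plain strings and records are dictionaries purely for illustration; `tuple_type_as_record` is a hypothetical helper, not part of the GEMINI implementation.

```python
from typing import Dict


def tuple_type_as_record(*component_types: str) -> Dict[int, str]:
    """Represent the tuple type t1 * ... * tn as the record type {1: t1, ..., n: tn}."""
    return {label: ty for label, ty in enumerate(component_types, start=1)}


# int * string is represented exactly like the record type {1: int, 2: string}.
pair = tuple_type_as_record("int", "string")
# The empty tuple type (unit) becomes the empty record type {}.
unit = tuple_type_as_record()
```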
In *H, the only primitive type is bit, which represents a single bit value. Three type constructors of kind H ⇒H form the remaining types in *H. The first is the array type constructor, which accepts a hardware type and produces another hardware type representing a fixed-length ordered array of elements of that type. It is important to note that the array type is only partially defined by the element type; the type is completely defined by the combination of the element type and the array size, denoted as n in Figure 1. This point is elaborated upon later in this section. The second is the temporal type constructor, which accepts a hardware type and produces another hardware type representing a time-delayed value. As with the array type, the temporal type is completely defined by the combination of the input type and the delay amount, also denoted as n in Figure 1. Lastly, the hardware record is a type constructor that accepts an arbitrary number of hardware type parameters and produces a record with named fields that can be used to access each value. The hardware-typed record is similar to the software-typed record, the sole distinction being the kind to which the inner values must belong. Further, the tuple-record duality holds for hardware-typed records as well.
The kind *M is simple, as it contains a single type constructor, pronounced "module", of kind H ⇒H, which accepts a hardware type as the argument and produces another hardware type. Placing the module type constructor in a separate kind is intentional; the motivation is explained later in this section.
Asymmetric constructors. It is important to note that of the four possible combinations of argument and result kinds, only three are realized; the last, S ⇒H, does not exist. This is because not all software-typed values can be converted into hardware, so such a constructor would have to be restricted to operating on certain software types as arguments. A simpler alternative is to define functions that convert certain software types into hardware types, or more accurately into software-wrapped hardware values. For example, such a utility function could convert an integer to its 32-bit signed representation as a bit array.
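As an illustration of such a conversion function, the following sketch computes the 32 bits of an integer. It makes two assumptions the text does not fix: the signed representation is two's complement, and bit 0 (the least significant bit) comes first. In GEMINI the result would be a software-wrapped bit array; here it is simply a list of 0s and 1s, and the name `int_to_bits32` is hypothetical.

```python
from typing import List


def int_to_bits32(value: int) -> List[int]:
    """Two's-complement bits of a 32-bit signed integer, least significant bit first."""
    if not -(2 ** 31) <= value <= 2 ** 31 - 1:
        raise ValueError(f"{value} does not fit in 32 signed bits")
    unsigned = value & 0xFFFFFFFF  # reinterpret negative values modulo 2**32
    return [(unsigned >> i) & 1 for i in range(32)]


five = int_to_bits32(5)        # 1, 0, 1, then 29 zeros
minus_one = int_to_bits32(-1)  # all ones
```

The range check illustrates why a general S ⇒H constructor is impractical: a bounded integer has a fixed-width hardware encoding, whereas values such as strings or functions have none.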
Restrained higher-order types. As alluded to earlier, placing the module type constructor into a third kind is intentional and necessary to enforce the expected semantics of hardware. If the module type constructor were defined in *H, this would give rise to higher-order modules, in which a module could be the input to another module. Such semantics have no meaning in hardware, so the kinds were divided to limit modules to the first order; in fact, the module type cannot be passed as an argument to any other type constructor.
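The effect of giving modules their own kind can be sketched as a small kind checker over type terms. The dataclasses, the string kind tags, and `kind_of` are hypothetical; array lengths are omitted for brevity; and, following Figure 1, a module maps hardware to hardware while its own type lives in *M.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Prim:
    name: str  # "int", "real", "string" (software) or "bit" (hardware)


@dataclass(frozen=True)
class App:
    ctor: str    # "list", "ref", "sw" or "array" (array length omitted)
    arg: "Type"


@dataclass(frozen=True)
class Module:
    source: "Type"
    target: "Type"


Type = Union[Prim, App, Module]

PRIM_KINDS = {"int": "S", "real": "S", "string": "S", "bit": "H"}
# constructor -> (argument kind, result kind); no constructor accepts kind "M".
CTOR_KINDS = {"list": ("S", "S"), "ref": ("S", "S"), "sw": ("H", "S"), "array": ("H", "H")}


def kind_of(t: Type) -> Optional[str]:
    """Return "S", "H" or "M" for a well-kinded type, or None if it is ill-kinded."""
    if isinstance(t, Prim):
        return PRIM_KINDS.get(t.name)
    if isinstance(t, Module):
        # A module maps hardware to hardware, but the module type itself is in *M.
        ok = kind_of(t.source) == "H" and kind_of(t.target) == "H"
        return "M" if ok else None
    expected, result = CTOR_KINDS[t.ctor]
    return result if kind_of(t.arg) == expected else None


bit = Prim("bit")
mod = Module(App("array", bit), App("array", bit))  # bit[n] to bit[n], lengths elided
```

Because every constructor expects an argument of kind S or H, any term that places `mod` in argument position, including another module, is rejected.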
Dependent types. As mentioned previously, an instance of the array or temporal type is defined by the combination of a hardware type and an integer value. The hardware type is a type argument that parameterizes these types, and the integer value can similarly be considered a value argument. As one would expect, the type of an array of 8 bits (bit[8]) is distinct from the type of an array of 8 two-entry bit-tuples ((bit * bit)[8]), since recursive comparison of the types reveals that the element types differ. Following the same principle, the type of an array of 8 bits (bit[8]) is distinct from the type of an array of 16 bits (bit[16]), since array types depend on their length; recursive comparison of the types reveals that the lengths differ.
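The recursive comparison described above can be mimicked with frozen dataclasses, whose generated `__eq__` compares fields recursively, so the length takes part in type identity just like the element type. The class names are hypothetical stand-ins, not the compiler's actual type representation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Bit:
    """The primitive hardware type bit."""


@dataclass(frozen=True)
class HwTuple:
    components: Tuple[object, ...]  # e.g. bit * bit


@dataclass(frozen=True)
class Array:
    element: object  # the type argument
    length: int      # the value argument, part of the type's identity


bit8 = Array(Bit(), 8)
pairs8 = Array(HwTuple((Bit(), Bit())), 8)  # (bit * bit)[8]
bit16 = Array(Bit(), 16)
```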
An important detail to explain is how semantic analysis can be performed on a type system with dependent types. To carry out type-checking, it is necessary to know the types of all values and expressions. Usually this is not a problem, since determining a type requires no evaluation; with dependent types, however, the value arguments must be known in order to determine the complete instantiation of a type constructor. Consider the following snippet of GEMINI.
1 let
2 val size = (* ... *)
3 module my_mod (a: bit[8]) = a[:7:]
4 val arg = #[size; gen i => 'b:0]
5 val h = my_mod arg
6 in
7 (* ... *)
8 end
We will not dwell on the details of syntax here; as an overview, however, this snippet makes a few declarations. First, on line 2, size is assigned a value by some expression. On line 3, a module my_mod is declared that accepts an 8-bit array and returns its last bit. On line 4, arg is declared using an array generator with size as its size and the zero bit as the initial value of each element. Lastly, on line 5, h is declared as the result of passing arg to the module my_mod. To determine whether this program is well-typed, it is necessary to know the value of size; the program is well-typed if and only if size is equal to 8.
The GEMINI compiler makes this kind of analysis possible through multi-staged compilation, an approach also taken by compilers for some other languages, such as MetaML [13]. Typically, the compiler first performs semantic analysis and then, if the program is well-typed, evaluates it. In GEMINI, this pair of phases is performed twice: first for software-typed values and expressions, and then for hardware-typed values. After the first pass, assuming the program’s software values and expressions are well-typed, the resulting program consists solely of hardware values whose types are fully known, since all software values have already been evaluated. This enables the subsequent semantic analysis of the hardware values and ensures that the module produced by a GEMINI program is also well-typed with respect to the hardware.
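The two passes can be sketched for the snippet above: the software stage reduces the expression bound to size to an integer, after which the hardware check of my_mod arg is a plain type comparison. Everything here is a hypothetical model of that single snippet, with the elided size expression passed in as a Python lambda.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BitArray:
    length: int


def software_stage(size_expr: Callable[[], int]) -> int:
    """Stage 1: evaluate software expressions (here, the expression bound to size)."""
    return size_expr()


def hardware_stage(param: BitArray, arg: BitArray) -> bool:
    """Stage 2: every length is now a concrete integer, so type equality decides the check."""
    return param == arg


def snippet_is_well_typed(size_expr: Callable[[], int]) -> bool:
    size = software_stage(size_expr)              # val size = (* ... *)
    arg_type = BitArray(size)                     # val arg = #[size; gen i => 'b:0]
    return hardware_stage(BitArray(8), arg_type)  # val h = my_mod arg, where a: bit[8]
```

For instance, `snippet_is_well_typed(lambda: 2 * 4)` succeeds while `snippet_is_well_typed(lambda: 3 + 4)` fails, matching the "if and only if size is equal to 8" condition.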
Another interesting consequence of dependent types is that functions can be parametrically polymorphic over values rather than types. For example, consider the following snippet of GEMINI.
fun negate b = sw !(unsw b)
module map_module a = HW.map negate a
In Section 2.2 we elaborate on the role of sw and unsw and explain how HW.map is actually written; for now, it is sufficient to know that negate is a function that unwraps a bit, negates it, and rewraps it, and that map_module is a module that takes a bit array a and applies the negating function to each bit. Here, map_module is parametrically polymorphic in the size of the array: its type signature is the module type from bit[n] to bit[n], so a bit array of any size may be supplied as an argument. It is important to note that the module returned by a GEMINI program cannot be parametrically polymorphic, since explicit sizes must be known in order to produce the appropriate Verilog. Thus, the only way map_module can be used is by applying it to an array of known size somewhere in the program.
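Size polymorphism and the monomorphic-output rule can be sketched as follows: a module signature may carry a size variable such as n, applying it to a concretely sized argument binds that variable, and only signatures with all lengths concrete are emittable. The classes and the functions `instantiate` and `emittable` are hypothetical illustrations, not the GEMINI compiler's API.

```python
from dataclasses import dataclass
from typing import Optional, Union

Length = Union[int, str]  # a concrete length, or a size variable such as "n"


@dataclass(frozen=True)
class BitArray:
    length: Length


@dataclass(frozen=True)
class ModuleType:
    source: BitArray
    target: BitArray


def instantiate(sig: ModuleType, arg: BitArray) -> Optional[ModuleType]:
    """Apply a module signature to a concrete argument, binding its size variable if any."""
    binding = {}
    if isinstance(sig.source.length, str):
        binding[sig.source.length] = arg.length
    elif sig.source.length != arg.length:
        return None  # concrete lengths disagree: ill-typed application

    def subst(t: BitArray) -> BitArray:
        return BitArray(binding.get(t.length, t.length))

    return ModuleType(subst(sig.source), subst(sig.target))


def emittable(sig: ModuleType) -> bool:
    """Only modules whose lengths are all concrete can be turned into Verilog."""
    return isinstance(sig.source.length, int) and isinstance(sig.target.length, int)


map_module = ModuleType(BitArray("n"), BitArray("n"))  # bit[n] to bit[n]
applied = instantiate(map_module, BitArray(8))
```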
Hardware abstractions. The hardware record is an example of how GEMINI makes it easier for the programmer to work with hardware values by introducing a higher level of abstraction. Verilog has no records, since all data is expressed in terms of bits and arrays. Therefore, in the final phase of compilation, hardware records are encoded as arrays by concatenating the values of their fields. Similarly, when a field of a record is accessed, the correct indices are computed from the sizes of the preceding fields in order to retrieve the appropriate value. As a result, the output Verilog is expressed correctly in terms of native types, and the GEMINI compiler handles the translation to and from this higher level of abstraction.
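The index computation for record fields amounts to a running sum of field widths. The sketch below assumes, hypothetically, that the first field occupies the lowest bits; the paper does not state the concatenation order, and `field_slices` is an illustrative name.

```python
from typing import Dict, List, Tuple


def field_slices(fields: List[Tuple[str, int]]) -> Dict[str, Tuple[int, int]]:
    """Map each field to the (low, high) bit indices it occupies in the packed vector.

    Assumes, hypothetically, that the first field is placed in the lowest bits.
    """
    slices: Dict[str, Tuple[int, int]] = {}
    offset = 0
    for name, width in fields:
        slices[name] = (offset, offset + width - 1)
        offset += width  # the next field starts after all preceding fields
    return slices


# Hypothetical record #{valid: bit, data: bit[8], tag: bit[3]}: 12 bits in total.
layout = field_slices([("valid", 1), ("data", 8), ("tag", 3)])
```

Under this layout, an access such as #data r on the packed 12-bit vector r would correspond to the Verilog part-select r[8:1].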
2.2 GEMINI expressions
Having introduced the GEMINI type system, we now turn our attention to the expressions one may use in a program. Values are atomic units of data that cannot be evaluated any further; an expression, by contrast, can be reduced by the rules of evaluation to other subexpressions or to terminal values. In Section 2.6 we formalize the expression grammar, and in Section 2.9 we formalize the rules of evaluation; here we limit ourselves to an overview of the expressions available in the language.
Function application. As described in Section 2.1, a function accepts a software-typed argument and produces a software-typed result. Function application is the invocation of a given function with a given argument to produce a result. For parametrically polymorphic functions, the type of the result may depend on the type of the argument. For example, consider the following GEMINI function, which constructs a singleton list from an element.
fun singleton x = [x]
The type signature of this function is 'a -> 'a list, where 'a is a type variable representing any type. As a result, the type of the result depends on the argument supplied to singleton.
singleton 1 (* [1]: int list *)
singleton "a" (* ["a"]: string list *)
Since GEMINI also supports higher-order functions, partially applying a curried function results in another function.
fun add a b = a + b
val add3 = add 3
In the snippet above, add has type signature int → int → int, whereas add3 has type signature int → int and returns the result of adding 3 to the integer to which it is applied. This also demonstrates the first-class status of functions as values, which allows them to be passed as arguments to other functions. A common example is the map function, which has the type signature ('a -> 'b) -> 'a list -> 'b list.
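Currying, partial application, and a curried map can be modelled with closures. This Python sketch only mirrors the GEMINI snippet; `add` and `map_list` are illustrative names, and Python's type hints stand in for the 'a and 'b type variables.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def add(a: int) -> Callable[[int], int]:
    """Curried addition, int -> int -> int: each call consumes one argument."""
    return lambda b: a + b


def map_list(f: Callable[[A], B]) -> Callable[[List[A]], List[B]]:
    """Curried map, ('a -> 'b) -> 'a list -> 'b list."""
    return lambda xs: [f(x) for x in xs]


add3 = add(3)                        # partial application: int -> int
shifted = map_list(add3)([1, 2, 3])  # add3 passed as a first-class value
```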
Operators. Operators are analogous to functions, except that their application is infix and their arguments are the operands. In GEMINI, operator overloading is very rare, as in some other strongly typed languages. For example, the addition operator for integers is different from that for reals³. This has two primary benefits. First, it enables the exact determination of types during inference without the need for explicit type annotations. Second, it eliminates implicit type conversions, which reduces the potential for certain classes of program bugs and improves performance by avoiding redundant and potentially expensive conversions. For an exhaustive list of the operators and their semantics, see Appendix A.1 through A.5.
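The first benefit can be made concrete: when every operator has exactly one signature, an operand of unknown type takes its type directly from the operator. The sketch below includes only the two addition operators named in the footnote; the table and `infer_binop` are hypothetical and not the compiler's inference engine.

```python
from typing import Optional, Tuple

# Hypothetical excerpt of the operator table: operator -> (operand type, result type).
# Only the two addition operators are included; Appendix A lists the rest.
OPERATORS = {
    "+": ("int", "int"),
    "+.": ("real", "real"),
}


def infer_binop(op: str, left: Optional[str], right: Optional[str]) -> Optional[Tuple[str, str]]:
    """Infer (operand type, result type) for `left op right`.

    None for an operand means its type is not yet known, e.g. an unannotated
    function argument. Each operator has a single signature, so the unknown type
    is read off the operator, and mismatched operands are rejected, never coerced.
    """
    operand, result = OPERATORS[op]
    if any(ty is not None and ty != operand for ty in (left, right)):
        return None
    return operand, result
```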
Accesses. Several types contain inner values or expressions that may need to be accessed: software records (and, by extension, software tuples), hardware records (and, by extension, hardware tuples), arrays, and references. Records contain values named by labels, or fields, and inner values can be accessed with the syntax #f r, where f is the field name and r is the record to access. As previously mentioned, a tuple is a special case of a record with consecutively numbered fields beginning at 1, so tuple accesses are similarly made with the syntax #n t, where n is an integer literal corresponding to the index and t is the tuple to access. Elements in an array can be accessed with the syntax a[:i:], where a is the array and i is the index, which may be an expression or a value. Since the size of the array and the index are both fully evaluated at the software evaluation stage, out-of-range accesses can be detected at compile time. Lastly, references can be accessed, or dereferenced, with the syntax $r, where r is the reference; the result is the current value inside the reference container.
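Because the software stage reduces both the array length and the index to integers, the bounds check is an ordinary comparison performed before any Verilog is produced. The sketch assumes zero-based indexing, consistent with the my_mod example in Section 2.1 where a[:7:] selects the last bit of a bit[8]; `check_array_access` is a hypothetical name.

```python
from typing import Optional


def check_array_access(length: int, index: int) -> Optional[str]:
    """Compile-time check of a[:i:], run once both integers are known.

    Returns None when the access is in range, otherwise a diagnostic message.
    Indexing is assumed to be zero-based.
    """
    if 0 <= index < length:
        return None
    return f"index {index} is out of range for an array of length {length}"
```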
Conditionals. GEMINI has two forms of conditional expressions for control flow: if-then-else and if-then. The former is written with the syntax if e1 then e2 else e3; its result is e2 if the guard e1, which must have an integer type, is nonzero, and e3 otherwise. The latter is a special case of the former in which the else clause implicitly returns the empty tuple, also known as unit. In both forms, the types of the expressions in the then and else clauses must match. Since the latter form always has the unit type in the else clause, the expression in its then clause must also have the unit type. The result type of a conditional expression must always be in *S.
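The typing constraints on conditionals can be collected into one small check. Types are plain strings and the set of software types is a tiny hypothetical stand-in for *S; `type_if` is an illustrative helper, not GEMINI's typing judgment.

```python
from typing import Optional

# A tiny, hypothetical stand-in for the software types in *S.
SOFTWARE_TYPES = {"int", "real", "string", "unit"}


def type_if(guard: str, then_type: str, else_type: str = "unit") -> Optional[str]:
    """Type of `if e1 then e2 else e3`; leaving else_type as unit models `if e1 then e2`."""
    if guard != "int":
        return None  # the guard must have an integer type
    if then_type != else_type:
        return None  # both branches must have the same type
    if then_type not in SOFTWARE_TYPES:
        return None  # the result type must be in *S
    return then_type
```

For example, `type_if("int", "int")` is rejected because the implicit else branch has type unit, which is exactly why the then clause of an if-then must itself be unit.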
Assignments. While dereferencing is used to access the inner value of a reference, assignments are used to mutate that inner value. The syntax is r := e, where r is the reference and e is the expression whose value is assigned to it. The type of an assignment expression is unit, as is the convention for side-effecting expressions.
Sequences. Sequence expressions allow multiple expressions to be evaluated for their side effects, with the final one serving as the result. The syntax for a sequence is a semicolon-separated list of expressions of arbitrary length surrounded by parentheses.
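References, assignment, and sequencing can be modelled together. In this hypothetical sketch, each sequence element is a zero-argument function, because Python evaluates call arguments eagerly and the point is to show left-to-right evaluation in which only the last result is kept. `Ref`, `deref`, `assign`, and `sequence` are illustrative names.

```python
class Ref:
    """A mutable cell standing in for a GEMINI reference (hypothetical model)."""

    def __init__(self, value):
        self.value = value


def deref(r: Ref):
    """Models $r."""
    return r.value


def assign(r: Ref, value):
    """Models r := e; like GEMINI assignments, it evaluates to unit, here ()."""
    r.value = value
    return ()


def sequence(*steps):
    """Models (e1; ...; en): evaluate each step in order and return the last result."""
    result = ()
    for step in steps:
        result = step()
    return result


counter = Ref(0)
final = sequence(
    lambda: assign(counter, deref(counter) + 1),
    lambda: assign(counter, deref(counter) + 1),
    lambda: deref(counter),
)
```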
Pattern-match. Pattern matching is ubiquitous in strongly typed functional languages and serves as another, more general form of control flow. A pattern-match expression consists of a test expression e and an ordered set of match-result pairs (m1, r1), ..., (mn, rn). The value of e is compared to each match in order, and the ri corresponding to the first matching mi is returned. Pattern matching can be more expressive than simple conditionals because patterns can inspect the structure of the test expression, such as whether it is a particular variant of a discriminated union type, or how many elements it has if it is a list. The match cases should also be exhaustive; otherwise, the compiler issues a warning. This protects the programmer from certain classes of run-time errors by requiring each case to be handled explicitly.
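First-match semantics and a simple exhaustiveness check over the variants of a discriminated union can be sketched as follows. Datatype values are modelled as (variant name, payload) pairs and None plays the role of a wildcard; the shape datatype, `match`, and `missing_variants` are all hypothetical, and real GEMINI patterns can also inspect list structure, which this sketch omits.

```python
from typing import Any, Callable, List, Optional, Tuple

# A case pairs a pattern (a variant name, or None for the wildcard _) with a result function.
Case = Tuple[Optional[str], Callable[[Any], Any]]


def match(value: Tuple[str, Any], cases: List[Case]) -> Any:
    """Return the result of the first case whose pattern matches `value`."""
    variant, payload = value
    for pattern, result in cases:
        if pattern is None or pattern == variant:
            return result(payload)
    raise RuntimeError(f"no case matches {variant}")  # what exhaustiveness checks guard against


def missing_variants(variants: List[str], cases: List[Case]) -> List[str]:
    """Variants no case covers; a non-empty list would trigger the compiler's warning."""
    if any(pattern is None for pattern, _ in cases):
        return []
    covered = {pattern for pattern, _ in cases}
    return [v for v in variants if v not in covered]


SHAPE = ["Circle", "Square"]  # hypothetical: datatype shape = Circle of int | Square of int
area = [("Circle", lambda r: 3 * r * r), ("Square", lambda s: s * s)]
```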
Let-bindings. Let-bindings bind identifiers to values or types and then evaluate an expression in the augmented lexical scope. If an identifier bound outside a let-binding is bound again inside it, the inner binding shadows the outer one within its scope. In general, an identifier refers to the value from its most recent declaration.
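Shadowing falls out of representing the environment as a stack of scopes that is searched innermost first. The `Env` class below is a hypothetical model, with comments showing the GEMINI bindings each call mirrors.

```python
class Env:
    """Hypothetical lexical environment: a stack of scopes searched innermost first."""

    def __init__(self):
        self.scopes = [{}]

    def bind(self, name, value):
        self.scopes[-1][name] = value

    def lookup(self, name):
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"unbound identifier {name}")

    def enter(self):
        self.scopes.append({})

    def leave(self):
        self.scopes.pop()


env = Env()
env.bind("x", 1)         # val x = 1
env.enter()              # let
env.bind("x", 2)         #   val x = 2 shadows the outer x
inner = env.lookup("x")
env.leave()              # end
outer = env.lookup("x")
```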
³ For integers the operator is "+", whereas for reals it is "+.".
Software-wrapper. As mentioned in Section 2.1, the type constructor sw is the sole member of H ⇒S. The syntax to wrap a hardware value is sw h, where h is a hardware value. The resulting value can be used in software constructs such as functions or lists and eventually unwrapped with the function unsw, which has type signature 'a sw → 'a. The code snippet below demonstrates a particularly useful way of using sw and unsw.
unsw (Array.fromList (
  List.map hwMapFn (
    Array.toList (sw array)
  )
))
The HW.map function introduced in Section 2.1 is constructed as shown above, in expanded form. We assume here that hwMapFn is a function with type signature 'a sw → 'b sw and that array is an array with type 'a[n] (note that the element type must match the argument type of hwMapFn, and that the array size is parametrically polymorphic). The use of sw allows us to wrap the array and pass it to other functions, which can unwrap it and its elements in order to map them to new values. At no point is the programmer able to read or use the specific hardware values, though they can apply logical operations to them, which are later translated into Verilog.
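The opacity of sw can be illustrated by treating hardware values as symbolic signal expressions rather than bits: software code can combine them into new logic but never observe a value. This hypothetical sketch mirrors the expanded HW.map above; `Sw`, `array_to_list`, `array_from_list`, and `hw_map` are illustrative names, and the Verilog-style `~` for negation is an assumption about the eventual output.

```python
from typing import Callable, List


class Sw:
    """Hypothetical opaque wrapper around a hardware value.

    The wrapped object is a symbolic signal expression (a string or a list of
    them), never a concrete bit, so software code can build new logic from it
    but cannot observe what value the hardware will take.
    """

    def __init__(self, hw):
        self._hw = hw


def unsw(w: Sw):
    return w._hw


def array_to_list(w: Sw) -> List[Sw]:
    """Models Array.toList: a wrapped array becomes a list of wrapped elements."""
    return [Sw(element) for element in unsw(w)]


def array_from_list(ws: List[Sw]) -> Sw:
    """Models Array.fromList: the inverse of array_to_list."""
    return Sw([unsw(w) for w in ws])


def negate(b: Sw) -> Sw:
    """Models fun negate b = sw !(unsw b), emitting a Verilog-style NOT expression."""
    return Sw(f"~{unsw(b)}")


def hw_map(f: Callable[[Sw], Sw], array: list) -> list:
    """Models unsw (Array.fromList (List.map f (Array.toList (sw array))))."""
    return unsw(array_from_list([f(x) for x in array_to_list(Sw(array))]))


a = ["a[0]", "a[1]", "a[2]"]  # symbolic names for the bits of a 3-bit input
negated = hw_map(negate, a)
```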
Array generation. A special expression called array generation allows for the initialization of hardware arrays. The expression consists of two clauses: the array size and an inline generator function. The array size is specified as an integer-typed expression or value; recall that, owing to the multi-staged compilation of GEMINI programs, expressions are permitted because they are fully evaluated before hardware type-checking, so the value argument to the array type constructor will be known. The inline generator function specifies the value of each element of the array; within its body, an index variable can be used to determine the element value. For example, the following GEMINI snippet initializes a bit array of size 8 with alternating 0s and 1s.
#[8; gen i =>
  if (i % 2) = 0
  then 'b:0
  else 'b:1
]
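Semantically, array generation evaluates the generator body once per index. The sketch below assumes indices run from 0 to size − 1 (zero-based, as in the my_mod example) and uses the integers 0 and 1 as stand-ins for the bit literals 'b:0 and 'b:1; `generate` is a hypothetical name.

```python
from typing import Callable, List


def generate(length: int, gen: Callable[[int], int]) -> List[int]:
    """Models #[length; gen i => body]: evaluate the body once per index."""
    return [gen(i) for i in range(length)]


# The alternating example; 0 and 1 stand in for the bit literals 'b:0 and 'b:1.
alternating = generate(8, lambda i: 0 if i % 2 == 0 else 1)
```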
2.3 GEMINI declarations
As described previously, let-bindings bind values or types to identifiers that can be referenced in other expressions within the lexical scope. Five forms of declaration can be made. The specific syntax for each is formalized in Section 2.6; we first provide an overview.
Values. Value declarations bind an identifier to an expression or value. Since the GEMINI compiler performs type inference, value declarations can be made either implicitly, with the type omitted, or explicitly. The latter is useful when type inference would infer a parametrically polymorphic type but the programmer wants to restrict the type.
Functions. Function declarations bind an identifier to a function by specifying one or more arguments and the expression to evaluate when the function is invoked. The argument identifiers are lexically scoped in the body of the function; if an argument has the same name as an identifier bound elsewhere, the argument takes precedence within the body. The function’s return type and the types of its arguments may also optionally be given explicitly.
Types. Declarations of types bind an identifier to
a type. Parametrically polymorphic type constructors
may also be declared by preceding the identifier with a
type variable that is referenced by the type.
Datatypes. Datatype declarations bind an identifier to a discriminated union type and specify each of its variants and their type arguments. Datatype declarations may be parametrically polymorphic, as with type declarations. Further, software datatype declarations are syntactically distinct from hardware datatype declarations, in order to enforce that the type arguments of all variants belong to the appropriate kind.
Modules. Module declarations bind an identifier to a module by specifying a single argument and the body of the module. The module’s return type and argument type may also optionally be given explicitly. Modules can also be dependently typed in the declaration by specifying an integer-typed identifier that can be referenced throughout the module body. When the module is invoked with a given integer value, that value is substituted for the identifier throughout the module. This is the mechanism by which polymorphic modules can be created; however, as noted in our discussion of dependent types in Section 2.1, a polymorphic module cannot be the result of a GEMINI program.
2.4 GEMINI core library
The GEMINI core library provides useful built-in functions and modules for operating on software- and hardware-typed terms, respectively. A complete list can be found in Appendix A.6.
2.5 Derived terms
As discussed in previous sections, certain GEMINI expressions are merely special cases of other, more general expressions. For example, the expression
if e1 then e2
is a special case of the more general if-then-else term and can be rewritten as the following equivalent expression:
if e1 then e2 else ()
In fact, since unit is the empty tuple, and a tuple is a special case of a record with consecutively numbered fields beginning at 1, unit is the empty record; the expression above can therefore be expressed equivalently as the following:
if e1 then e2 else {}
These special cases are called derived terms, as they are derived from more general expressions. Beyond the lexical analysis phase of compilation, they are treated identically to the expressions from which they are derived, which simplifies the implementation of later phases. The complete list of derived terms can be found in Appendix B.1.
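Rewriting derived terms into their general forms is a simple recursive pass over the syntax tree. This hypothetical sketch covers the two derivations discussed above, a missing else branch and tuples as records; the node classes and `desugar` are illustrative, and leaf expressions are plain strings.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class Record:
    fields: Tuple[Tuple[Any, Any], ...]  # ((label, expr), ...)


@dataclass(frozen=True)
class TupleExpr:
    items: Tuple[Any, ...]


@dataclass(frozen=True)
class If:
    guard: Any
    then: Any
    orelse: Optional[Any] = None  # None models the derived form `if e1 then e2`


UNIT = Record(())  # the empty record {}


def desugar(expr: Any) -> Any:
    """Rewrite derived terms into the general forms they abbreviate."""
    if isinstance(expr, TupleExpr):
        # A tuple becomes a record labelled 1..n; in particular () becomes {}.
        return Record(tuple((i, desugar(e)) for i, e in enumerate(expr.items, start=1)))
    if isinstance(expr, If):
        orelse = UNIT if expr.orelse is None else desugar(expr.orelse)
        return If(desugar(expr.guard), desugar(expr.then), orelse)
    return expr  # leaves (plain strings here) are already in general form
```

Both `if e1 then e2` and `if e1 then e2 else ()` desugar to the same node, `If("e1", "e2", UNIT)`, which is what lets later phases handle a single form.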
2.6 Software grammar
Having provided an informal overview of GEMINI, we now formalize the language, beginning with its grammars. In this section and in Section 2.7, we provide context-free grammars (CFGs) for software and hardware terms as sets of production rules, each of which specifies how some non-terminal can be replaced by a sequence of non-terminals and terminals.
2.6.1 Software value grammar
The first grammar represents software values; these are non-reducible (or terminal) software-typed terms. This grammar does not represent the syntax; instead, it represents each value abstractly, in a mathematical sense. Figure 2 depicts an excerpt of the software value grammar in Backus-Naur form (BNF).
〈swv〉 ::= 〈integer〉
| 〈real〉
| 〈string〉
| 〈list〉
| 〈software record〉
| 〈sw〉
| 〈ref 〉
| 〈variant〉
| 〈function〉
〈integer〉 ::= $i \in \mathbb{Z} \cap [-2^{31}, 2^{31} - 1]$
〈function〉 ::= $\lambda x : T_S .\, e$
Figure 2: GEMINI software value grammar in BNF (excerpt)
The excerpt shown defines three non-terminals. The first represents a software value, which can be replaced by any of the non-terminals on the right-hand side; two of these are defined in the excerpt. The first represents a 32-bit integer value, which is some $i$ in the set of integers $\mathbb{Z}$ between $-2^{31}$ and $2^{31} - 1$, inclusive. The second represents a function expressed as an abstraction in the notation of the lambda calculus [5]. In this notation, $x$ is the argument, whose type $T_S$ is some type of kind *S, and $e$ is the body expression. The complete software value grammar can be found in Appendix B.2.
2.6.2 Software term grammar
We now turn to the grammar for software terms, which includes both expressions and values. Unlike the previous grammar, this one reflects the syntax used in writing GEMINI programs. Figure 3 depicts an excerpt of the software term grammar in BNF.
〈exp〉 ::= 〈literal〉
| 〈access〉
| 〈let binding〉
| 〈conditional〉
| 〈operation〉
| 〈assignment〉
| 〈pattern match〉
| 〈sequence〉
| 〈application〉
〈literal〉 ::= 〈identifier〉
| 〈integer literal〉
| 〈real literal〉
| 〈string literal〉
| 〈list literal〉
| 〈software record literal〉
| 〈ref literal〉
| 〈sw literal〉
〈list literal〉 ::= [ 〈list-body〉 ]
| nil
〈list-body〉 ::= 〈exp〉 〈exp comma tail〉
| ε
〈exp comma tail〉 ::= , 〈exp〉 〈exp comma tail〉
| ε
〈ref literal〉 ::= ref 〈exp〉
〈sw literal〉 ::= sw 〈hwv〉
Figure 3: GEMINI software term grammar in BNF (ex-
cerpt)
The non-terminal 〈exp〉 represents all possible ex-
pressions of software terms. The first such choice is
the literal declaration of a value, denoted by the non-terminal 〈literal〉. Each alternative of 〈literal〉 is another non-terminal corresponding to some software type; the exception is the first alternative, 〈identifier〉, which refers to an expression or value by the name to which it is bound.
Figure 3 shows the rules for three literals. The first is the literal declaration of a list, which is either a comma-separated (possibly empty) sequence of software terms surrounded by square brackets, or the terminal nil, which represents the empty list. The definition of the non-terminal 〈list-body〉 refers back to 〈exp〉, so the two are mutually recursive, and the definition of the non-terminal 〈exp comma tail〉 is recursive in reference to itself. The second literal declaration is that of the reference, which consists of the keyword ref followed by another 〈exp〉. The third literal declaration is that of the software wrapper, which consists of the keyword sw followed by a hardware value represented by the non-terminal 〈hwv〉. We will see the full hardware value grammar containing this definition in Section 2.7, though here we note that the software term grammar and the hardware value grammar must coexist so that each can refer to the other; the separation presented in this paper merely compartmentalizes the syntax according to the kind to which term types belong. The complete software term grammar can be found in B.3 of the Appendix.
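The list-literal productions of Figure 3 translate directly into a recursive-descent recognizer with one function per non-terminal. The sketch below is hypothetical: it is written in Python rather than as the ML-Yacc specification used by the compiler, it restricts 〈exp〉 to integers and nested list literals, and its function names are our own.

```python
import re
from typing import List, Tuple, Union

# Hypothetical stand-in for <exp>: integers or nested list literals only.
Exp = Union[int, list]


def tokenize(src: str) -> List[str]:
    # Toy tokenizer: brackets, commas, the keyword nil, and integers.
    return re.findall(r"\[|\]|,|nil|-?\d+", src)


def parse_exp(toks: List[str], pos: int) -> Tuple[Exp, int]:
    if pos >= len(toks):
        raise SyntaxError("unexpected end of input")
    if toks[pos] in ("[", "nil"):
        return parse_list_literal(toks, pos)
    return int(toks[pos]), pos + 1


def parse_list_literal(toks: List[str], pos: int) -> Tuple[list, int]:
    if pos < len(toks) and toks[pos] == "nil":       # <list literal> ::= nil
        return [], pos + 1
    if pos >= len(toks) or toks[pos] != "[":
        raise SyntaxError(f"expected '[' or nil at token {pos}")
    items, pos = parse_list_body(toks, pos + 1)      # [ <list-body> ]
    if pos >= len(toks) or toks[pos] != "]":
        raise SyntaxError(f"expected ']' at token {pos}")
    return items, pos + 1


def parse_list_body(toks: List[str], pos: int) -> Tuple[list, int]:
    if pos < len(toks) and toks[pos] == "]":         # <list-body> ::= ε
        return [], pos
    first, pos = parse_exp(toks, pos)                # <exp> <exp comma tail>
    rest, pos = parse_exp_comma_tail(toks, pos)
    return [first] + rest, pos


def parse_exp_comma_tail(toks: List[str], pos: int) -> Tuple[list, int]:
    if pos < len(toks) and toks[pos] == ",":         # , <exp> <exp comma tail>
        item, pos = parse_exp(toks, pos + 1)
        rest, pos = parse_exp_comma_tail(toks, pos)
        return [item] + rest, pos
    return [], pos                                   # ε


def parse(src: str) -> list:
    toks = tokenize(src)
    value, pos = parse_list_literal(toks, 0)
    if pos != len(toks):
        raise SyntaxError("trailing tokens after list literal")
    return value
```

Note how the two ε-productions become the base cases of the recursion: 〈list-body〉 stops at a closing bracket, and 〈exp comma tail〉 stops at anything other than a comma.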
2.7 Hardware grammar
Another reason for the dichotomy of software and hardware in the design of GEMINI is that, while there are both software-typed expressions, which are reducible, and software-typed values, which are not, there exist only hardware-typed values. This is because the semantics of hardware-typed expressions are ill-defined: no hardware circuit can be reduced any further than the structure it takes.⁴
2.7.1 Hardware value grammar
We first present the hardware value grammar, which
represents hardware values abstractly instead of syntac-
tically. The full hardware value grammar is shown in
Figure 4.
We note a few interesting aspects of this value grammar. First, the sole terminal values are 0 and 1, representing the binary values that a bit may take. This supports the idea that the atomic units of hardware circuits are bits, which are combined using a variety of constructors represented by the remaining alternations of the grammar.
The next four alternations are logic gates: AND, OR, XOR, and NOT. We note that compound logic gates such as NAND and NOR are not represented as first-class entities in this grammar and are instead constructed by sequencing the appropriate gates.⁵ The inputs to these logic gates are recursively defined in terms of the non-terminal 〈hwv〉 and may thus themselves be the outputs of other logic gates.
The penultimate alternation is the array, which is represented as a collection of zero or more hardware values.⁶
We also note the deliberate omission of hardware
records from our grammar definition. This is because
hardware records are translated into bit arrays at compi-
lation and therefore never exist as hardware values in a
true sense; they are merely a convenient abstraction for
programmers.
The final alternation is the delayed hardware value, denoted by the application of the delay function δ. This value is temporally typed, and the specific value parameterizing its type depends on the number of clock cycles by which the value is delayed.
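As an illustration of the hardware value grammar, the following Python sketch encodes its alternations as immutable records, builds NAND by sequencing an AND gate and a NOT gate as in footnote 5, and folds subcircuits made only of literal bits in the spirit of footnote 4. The class and function names are hypothetical, and treating δ as opaque to folding is our own assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from functools import reduce
from typing import Tuple, Union


@dataclass(frozen=True)
class Bit:
    b: int                      # 0 or 1: the only terminal hardware values


@dataclass(frozen=True)
class Gate:
    op: str                     # "AND", "OR" or "XOR"
    inputs: Tuple[HWV, ...]     # each input is itself a <hwv>


@dataclass(frozen=True)
class Not:
    x: HWV


@dataclass(frozen=True)
class Array:
    elems: Tuple[HWV, ...]      # #[ <hwv>_i ], zero or more elements


@dataclass(frozen=True)
class Delay:
    x: HWV                      # δ(<hwv>): a delayed hardware value


HWV = Union[Bit, Gate, Not, Array, Delay]

_OPS = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b, "XOR": lambda a, b: a ^ b}


def nand(a: HWV, b: HWV) -> HWV:
    # NAND is not first-class: it is an AND gate followed by a NOT gate.
    return Not(Gate("AND", (a, b)))


def fold(v: HWV) -> HWV:
    """Evaluate subcircuits built purely from literal bits; keep delays intact."""
    if isinstance(v, Gate):
        ins = tuple(fold(i) for i in v.inputs)
        if ins and all(isinstance(i, Bit) for i in ins):
            return Bit(reduce(_OPS[v.op], (i.b for i in ins)))
        return Gate(v.op, ins)
    if isinstance(v, Not):
        x = fold(v.x)
        return Bit(1 - x.b) if isinstance(x, Bit) else Not(x)
    if isinstance(v, Array):
        return Array(tuple(fold(e) for e in v.elems))
    if isinstance(v, Delay):
        return Delay(fold(v.x))
    return v
```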
〈hwv〉 ::= 0
| 1
| AND(〈hwv〉₁, ..., 〈hwv〉ₙ)
| OR(〈hwv〉₁, ..., 〈hwv〉ₙ)
| XOR(〈hwv〉₁, ..., 〈hwv〉ₙ)
| NOT(〈hwv〉)
| #[ 〈hwv〉ᵢ (i ∈ 0..n−1) ]
| δ( 〈hwv〉 )
Figure 4: GEMINI hardware value grammar in BNF
⁴ The exceptions are hardware subcircuits defined in terms of literal bit values, which can be evaluated at compile-time; the compiler treats this as an optimization.
⁵ In the case of NAND, this is an AND gate followed by a NOT gate.
⁶ The element values must all have the same type; this is not represented in the grammar but is enforced in the typing rules.
2.7.2 Hardware syntax grammar
We now shift our attention to the grammar representing the syntax for declaring hardware values in GEMINI programs. Figure 5 presents an excerpt of the hardware syntax grammar in BNF.
〈exp〉 ::= 〈literal〉
| 〈access〉
| 〈let binding〉
| 〈operation〉
| 〈parameterization〉
〈literal〉 ::= 〈bit literal〉
| 〈array literal〉
| 〈hardware record literal〉
〈bit literal〉 ::= ’b: 〈binary-digit〉
Figure 5: GEMINI hardware syntax grammar in BNF
(excerpt)
As mentioned previously, the grammar here is in-
tended to augment the grammar of software terms
shown in Figure 3. Certain non-terminals, such as 〈exp〉,
are repeated and any new rules appearing here should be
appended to those from earlier. The complete hardware
syntax grammar can be found in B.4 of the Appendix.
We also reiterate that the grammar itself is not re-
sponsible for enforcing typing rules. This allows us to
augment our existing software grammar with the hard-
ware productions. The typing rules will be formalized
in the next section.
2.8 Typing rules
With these grammars, it is now possible to verify whether a given program is grammatically valid GEMINI, and they additionally allow us to construct an abstract syntax tree from such a program. However, given that GEMINI is strongly and statically typed, grammatical validity alone is not sufficient: we must also verify at compile-time that the program is well-typed. We defer discussing the implementation details until Section 4.3. At this point, we define the formal tools for encoding what it means for a program to be well-typed, which are the basis not only for the implementation but also for our proofs of metatheory in Section 3.
A typing rule is a theorem consisting of a set of propositions, or hypotheses, and a conclusion. Each rule is illustrated diagrammatically as a horizontal line, with the antecedent clause written above the line and the consequent clause written below it. An excerpt of the typing rules is shown in Figure 6.
The antecedent clause is true iff each of the propo-
sitions is true. Further, if the antecedent clause is true,
then the consequent clause is true. As an example, the
rule T-IFELSE is pronounced "if t1 has type int and
t2 has type T and t3 has type T, then if t1 then t2
else t3 has type T".
The typing rules are shown in full in B.5 of the Ap-
pendix. While these rules enable us to verify that expres-
sions are well-typed, they rely on knowing the types of
subexpressions. Since GEMINI allows implicit typing of
expressions and identifiers, the compiler must perform
type inference in order to determine the types of these
subexpressions. The typing rules in their current form
will not enable us to do so; for this, we must rely on
the inversion of the typing relation which we will see in
Section 3.
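$$\frac{x : T \in \Gamma}{\Gamma \vdash x : T}\ (\text{T-VAR})$$
$$\frac{\Gamma, x : T_1 \vdash t_2 : T_2}{\Gamma \vdash \lambda x : T_1.\, t_2 : T_1 \to T_2}\ (\text{T-ABS})$$
$$\frac{\Gamma \vdash t_1 : T_1 \to T_2 \qquad \Gamma \vdash t_2 : T_1}{\Gamma \vdash t_1\, t_2 : T_2}\ (\text{T-APP})$$
$$\frac{t_1 : \mathtt{int} \qquad t_2 : \mathtt{int}}{t_1 + t_2 : \mathtt{int}}\ (\text{T-INT-ADD})$$
$$\frac{t_1 : T_H \qquad t_2 : T_H}{t_1 \,\&\, t_2 : T_H}\ (\text{T-AND})$$
$$\frac{t_1 : \mathtt{int} \qquad t_2 : T \qquad t_3 : T}{\mathtt{if}\ t_1\ \mathtt{then}\ t_2\ \mathtt{else}\ t_3 : T}\ (\text{T-IFELSE})$$
Figure 6: GEMINI typing rules (excerpt)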
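Read as an algorithm, each rule in Figure 6 becomes one case of a recursive type checker. The Python sketch below covers only the rules in the excerpt; it assumes bit is the only hardware type T_H and that types are plain strings or ("->", T1, T2) tuples. All names are hypothetical, and the sketch performs checking only, not the inference that GEMINI requires for implicitly typed code.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union

# Types: "int", "bit" (the only hardware type modelled), or ("->", T1, T2).
Type = Union[str, tuple]
HARDWARE_TYPES = {"bit"}


class GeminiTypeError(Exception):
    pass


@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    param_type: Type
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

@dataclass(frozen=True)
class IntLit:
    n: int

@dataclass(frozen=True)
class BitLit:
    b: int

@dataclass(frozen=True)
class Add:
    left: object
    right: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class If:
    cond: object
    then: object
    orelse: object


def type_of(t, ctx: Dict[str, Type]) -> Type:
    if isinstance(t, Var):                                   # T-VAR
        if t.name not in ctx:
            raise GeminiTypeError(f"unbound variable {t.name}")
        return ctx[t.name]
    if isinstance(t, Lam):                                   # T-ABS
        body_type = type_of(t.body, {**ctx, t.param: t.param_type})
        return ("->", t.param_type, body_type)
    if isinstance(t, App):                                   # T-APP
        fn_type, arg_type = type_of(t.fn, ctx), type_of(t.arg, ctx)
        if isinstance(fn_type, tuple) and fn_type[1] == arg_type:
            return fn_type[2]
        raise GeminiTypeError(f"cannot apply {fn_type} to {arg_type}")
    if isinstance(t, IntLit):
        return "int"
    if isinstance(t, BitLit):
        return "bit"
    if isinstance(t, Add):                                   # T-INT-ADD
        if type_of(t.left, ctx) == type_of(t.right, ctx) == "int":
            return "int"
        raise GeminiTypeError("+ expects int operands")
    if isinstance(t, And):                                   # T-AND
        lt, rt = type_of(t.left, ctx), type_of(t.right, ctx)
        if lt == rt and lt in HARDWARE_TYPES:
            return lt
        raise GeminiTypeError("& expects two operands of the same hardware type")
    if isinstance(t, If):                                    # T-IFELSE
        if type_of(t.cond, ctx) != "int":
            raise GeminiTypeError("guard must have type int")
        tt, et = type_of(t.then, ctx), type_of(t.orelse, ctx)
        if tt != et:
            raise GeminiTypeError(f"branches disagree: {tt} vs {et}")
        return tt
    raise GeminiTypeError(f"unknown term {t!r}")


def well_typed(t, ctx: Optional[Dict[str, Type]] = None) -> bool:
    try:
        type_of(t, ctx or {})
        return True
    except GeminiTypeError:
        return False
```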
2.9 Evaluation rules
With the syntax and typing rules of our language rigorously formulated in Sections 2.6, 2.7, and 2.8, we must now define precisely how expressions are evaluated. In defining the semantics of our language, we must choose among three approaches: operational semantics, denotational semantics, and axiomatic semantics [5]. In this paper, we define the semantics in terms of operational semantics for its simplicity and flexibility.
Operational semantics define an abstract state ma-
chine, where each state is an expression. The machine’s
behavior is defined by a transition function that either
yields the next state by performing a step of computa-
tion, or declares that the machine has halted by reaching
some terminal value. Operational semantics can be fur-
ther partitioned into small-step and big-step semantics.
Small-step semantics, or structural operational seman-
tics, consider how evaluation takes place one step at a
time. Big-step semantics, or natural semantics, instead
describe the final value to which some expression evalu-
ates [5].
In general, big-step semantics are less verbose since
intermediate expression states need not be encoded in
the machine behavior. However, small-step semantics
are more precise and readily translatable for implemen-
tation, and since our goal is to develop a compiler we
elect to define our evaluation rules in terms of small-step
semantics.
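$$\frac{t_1 \longrightarrow t_1'}{t_1\, t_2 \longrightarrow t_1'\, t_2}\ (\text{E-APP1})$$
$$\frac{t_2 \longrightarrow t_2'}{v_1\, t_2 \longrightarrow v_1\, t_2'}\ (\text{E-APP2})$$
$$(\lambda x.\, t_1)\, v_1 \longrightarrow [x \mapsto v_1]\, t_1\ (\text{E-APPABS})$$
$$\frac{t_1 \longrightarrow t_1'}{\mathtt{if}\ t_1\ \mathtt{then}\ t_2\ \mathtt{else}\ t_3 \longrightarrow \mathtt{if}\ t_1'\ \mathtt{then}\ t_2\ \mathtt{else}\ t_3}\ (\text{E-IFELSE})$$
$$\frac{v_1 : \mathtt{int} \qquad v_1 \neq 0}{\mathtt{if}\ v_1\ \mathtt{then}\ t_2\ \mathtt{else}\ t_3 \longrightarrow t_2}\ (\text{E-IFELSE-T})$$
$$\mathtt{if}\ 0\ \mathtt{then}\ t_2\ \mathtt{else}\ t_3 \longrightarrow t_3\ (\text{E-IFELSE-F})$$
Figure 7: GEMINI evaluation rules (excerpt)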
Like typing rules, each evaluation rule is a theorem. An excerpt of the set of evaluation rules is shown in Figure 7. In these rules, the metavariable t denotes a term that may be further reduced by some evaluation rule, and v denotes a terminal value that cannot be evaluated any further.
As an example, the rule E-IFELSE-T is pronounced
"if v1 has type int and v1 is not equal to 0, then the
expression if v1 then t2 else t3 evaluates to t2".
Next, we would evaluate t2 by the appropriate rule un-
til the result is a terminal value. The complete set of
evaluation rules is shown in B.6 of the Appendix.
The evaluation rules together define a precise and unambiguous evaluation strategy. For example, consider the expression if x then (if 1 then "a" else "b") else "c". Under the evaluation rules of our language, it is not possible for this expression to step to if x then "a" else "c", even though that term would evaluate to an equivalent value. We must first evaluate the guard of the outer conditional by rule E-IFELSE; once it is a terminal value, we pick either the then-clause or the else-clause by rule E-IFELSE-T or E-IFELSE-F. A useful property is the determinacy of one-step evaluation: if $t \longrightarrow t'$ and $t \longrightarrow t''$, then $t' = t''$. This ensures that evaluation is a deterministic process.
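The small-step rules of Figure 7 can be sketched as a single step function that returns the next state, or None when no rule applies. Because step is an ordinary function, each term has at most one successor, which mirrors the determinacy property. This is a hypothetical Python illustration: the guard of the example above is replaced by a closed application so that evaluation does not get stuck on the free variable x, and substitution assumes the substituted value is closed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

@dataclass(frozen=True)
class IntLit:
    n: int

@dataclass(frozen=True)
class StrLit:
    s: str

@dataclass(frozen=True)
class If:
    cond: object
    then: object
    orelse: object


def is_value(t) -> bool:
    return isinstance(t, (Lam, IntLit, StrLit))


def subst(x: str, v, t):
    """[x ↦ v]t, assuming v is closed (so no variable capture can occur)."""
    if isinstance(t, Var):
        return v if t.name == x else t
    if isinstance(t, Lam):
        return t if t.param == x else Lam(t.param, subst(x, v, t.body))
    if isinstance(t, App):
        return App(subst(x, v, t.fn), subst(x, v, t.arg))
    if isinstance(t, If):
        return If(subst(x, v, t.cond), subst(x, v, t.then), subst(x, v, t.orelse))
    return t


def step(t) -> Optional[object]:
    """One small step; None means no rule applies (t is in normal form)."""
    if isinstance(t, App):
        if not is_value(t.fn):                                # E-APP1
            fn = step(t.fn)
            return None if fn is None else App(fn, t.arg)
        if not is_value(t.arg):                               # E-APP2
            arg = step(t.arg)
            return None if arg is None else App(t.fn, arg)
        if isinstance(t.fn, Lam):                             # E-APPABS
            return subst(t.fn.param, t.arg, t.fn.body)
        return None
    if isinstance(t, If):
        if not is_value(t.cond):                              # E-IFELSE
            cond = step(t.cond)
            return None if cond is None else If(cond, t.then, t.orelse)
        if isinstance(t.cond, IntLit):                        # E-IFELSE-T / E-IFELSE-F
            return t.then if t.cond.n != 0 else t.orelse
        return None
    return None


def is_stuck(t) -> bool:
    return step(t) is None and not is_value(t)


def trace(t) -> List[object]:
    """Every intermediate state until a normal form is reached."""
    states = [t]
    nxt = step(t)
    while nxt is not None:
        states.append(nxt)
        nxt = step(nxt)
    return states
```

Tracing if ((λy. y) 1) then (if 1 then "a" else "b") else "c" with this sketch visits exactly four states, and the shortcut state if ((λy. y) 1) then "a" else "c" never appears among them.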
3 GEMINI Metatheory
In this section, we use the formalizations of the grammar, typing rules, and evaluation rules made in the previous section to prove desirable properties of GEMINI. First, we recall some definitions.
A term t is in normal form if no evaluation rule can
be applied to it. A term t is in a stuck state if it is in
normal form but it is not a value. We aim to prove a ba-
sic property of GEMINI’s type system: safety. We will
have achieved this if we prove that a well-typed term
can never reach a stuck state during evaluation.
It is important to prove that a type system possesses
safety in order to ensure that well-typed programs can
be compiled and executed without entering a stuck state.
It is especially critical in the case of the GEMINI lan-
guage since software evaluation occurs as an intermedi-
ary step of compilation, and failure to be type-safe could
lead to a non-terminating process at compile-time.
In order to build to our proof of safety, we must
first establish supporting proofs of two other properties:
progress and preservation.
3.1 Proof of progress
Recall that a type system has the property of progress if a well-typed term is never in a stuck state: either it is a value or it can take a step according to some evaluation rule.
In order to prove that our type system has this prop-
erty, we must first prove two supporting lemmas.
Lemma 1 (Inversion of Typing Relation). The follow-
ing are true, and constitute the inversion of the typing
relation:
1. If Γ ⊢ x : R, then x : R ∈ Γ.
2. If Γ ⊢ λx : T1.t2 : R, then R = T1 → R2 for
some R2 with Γ, x : T1 ⊢ t2 : R2.
3. If Γ ⊢ t1 t2 : R then there is some type
T11 such that Γ ⊢ t1 : T11 → R and that
Γ ⊢ t2 : T11.
4. If 〈integer〉: R, then R = int.
The remainder of the cases are omitted here and are
shown in full in C.1 of the Appendix.
Proof. Immediate from the definition of the typing
rules.
Lemma 2 (Canonical Forms). The following are true,
and constitute the canonical forms:
1. If v is a value of type int, then v is an integer
value according to the software value grammar.
2. If v is a value of type real, then v is a real
value according to the software value grammar.
3. If v is a value of type string, then v is a string
value according to the software value grammar.
4. If v is a value of type bit, then v is either 0 or
1.
5. If v is a value of type T1 → T2, then v =
λx:T1.t2.
6. If v is a value of type Ts ref, then v is a loca-
tion in store µ.
7. If v is a value of type $\{l_i : T_i^{\,i \in 1..n}\}$, then v is a value with the form $\{l_i = v_i^{\,i \in 1..n}\}$.
8. If C is a constructor of datatype D accepting
type T1 and v is a value of type T1, then C v
is a value of type D with form 〈C=v〉.
9. If v is a value of type TH[n], then v is an array
value according to the hardware value gram-
mar.
10. If v is a value of type $\#\{l_i : T_i^{\,i \in 1..n}\}$, then v is a value with the form $\#\{l_i = v_i^{\,i \in 1..n}\}$.
11. If v is a value of type TH sw, then v is a value
with the form ω(vH) for some vH of type TH .
Proof. We proceed through each case of the canonical
forms and refer to the inversion from Lemma 1.
Case 1: Values in this language can take several forms.
The case of an integer gives us our desired re-
sult immediately. All other forms cannot oc-
cur since we assumed that v has type int and
among the cases in consideration from Lemma
1, only case 4 tells us that the value has type
int.
The remaining cases are similar.
We are now equipped to prove the theorem of
progress.
Theorem 3 (PROGRESS). Suppose t is a closed, well-
typed term (⊢t:T for some type T). Then either t is a
value or else there is some t’ such that t −→ t’.
Proof. By structural induction on a derivation of t:T.
Case T-INT, T-REAL, T-STRING, T-BIT, T-NIL:
Immediate since t is a value.
Case T-APP:
t = t1 t2
⊢ t1 : T11 → T12
⊢ t2 : T11
By the induction hypothesis, either t1 is a value or else there is some t1’ for which t1 −→ t1’, and likewise for t2. If t1 −→ t1’, then by E-APP1, t −→ t1’ t2. On the other hand, if t1 is a value and t2 −→ t2’, then by E-APP2, t −→ t1 t2’. Finally, if both t1 and t2 are values, then case 5 of the canonical forms lemma tells us that t1 has the form λx:T11.t12, and so by E-APPABS, t −→ [x ↦ t2]t12.
The remaining cases are shown in full in C.2 of the Ap-
pendix.
3.2 Proof of preservation
We are also equipped to prove the theorem of preser-
vation, which states that performing one step of evalua-
tion preserves the type of the original term.
Theorem 4 (PRESERVATION). If Γ ⊢ t:T and t −→ t’, then Γ ⊢ t’:T.
Proof. By structural induction on a derivation of Γ ⊢ t:T. At each step of the induction, we assume that the desired property holds for all subderivations⁷ and proceed by case analysis on the final rule in the derivation.
Case T-VAR:
t = x
x:T ∈ Γ
If the last rule in the derivation is T-VAR, then we know from the form of this rule that t must be a variable of type T. No evaluation rule has a variable on its left-hand side, so it cannot be the case that t −→ t’ for any t’, and the requirements of the theorem are vacuously satisfied.
Case T-APP:
t = t1 t2
Γ ⊢ t1:T11 → T12
Γ ⊢ t2:T11
T = T12
Looking at the evaluation rules with applica-
tion on the left-hand side, we find that there
are three rules by which t −→ t’ can be de-
rived: E-APP1, E-APP2, and E-APPABS. We
consider each case separately.
Subcase E-APP1:
t1 −→ t1’
t’ = t1’ t2
From the assumptions of the T-APP case, we have a subderivation of the original typing derivation whose conclusion is Γ ⊢ t1:T11 → T12. We can apply the induction hypothesis to this subderivation, obtaining Γ ⊢ t1’:T11 → T12. Combining this with the fact that Γ ⊢ t2:T11, we can apply rule T-APP to conclude that Γ ⊢ t’:T.
Subcase E-APP2:
Similar to E-APP1.
⁷ This means that if s:S and s −→ s’, then s’:S whenever s:S is proved by a subderivation of the present one.
Subcase E-APPABS:
t1 = λx:T11.t12
t2 = v2
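t’ = [x ↦ v2]t12
Using Lemma 1, we can deconstruct the typing derivation for λx:T11.t12, yielding Γ, x:T11 ⊢ t12 : T12. Combining this with Γ ⊢ v2 : T11 via the standard substitution lemma (if Γ, x:S ⊢ t : T and Γ ⊢ s : S, then Γ ⊢ [x ↦ s]t : T), we obtain Γ ⊢ t’:T12.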
The remaining cases are shown in full in C.3 of the Ap-
pendix.
3.3 Proof of safety
Having proved the theorems of progress and preser-
vation, we are now ready to prove that the GEMINI type
system has the desired property of safety.
Theorem 5 (SAFETY). A well-typed term can never
reach a stuck state in evaluation.
Proof. Theorem 3 demonstrates that a well-typed term is not stuck, and Theorem 4 demonstrates that if a well-typed term takes a step of evaluation, then the resulting term is also well-typed. Combining the two by induction on the number of evaluation steps, every term reachable from a well-typed term is itself well-typed and therefore not stuck, which establishes safety.
4 The GEMINI Compiler
At this stage, we have formalized the GEMINI lan-
guage grammars, typing rules, and evaluation rules, and
proved the desirable property of safety for our type sys-
tem. This positions us to implement our compiler.
The compiler discussed in this paper accepts a GEMINI program as input and produces Verilog as output. We have written the compiler in Standard ML using the SML/NJ implementation. In order to explain the implementation of the compiler, we decompose it into five sequential phases.
4.1 Lexer
The first phase of compilation is lexical analysis, performed by the lexer module. In this phase, the program is scanned to produce syntactic units called lexemes, which are then classified into particular token classes. The lexer is specified by an ordered set of patterns that match against certain sequences of characters within the language’s alphabet. The lexer scans linearly until it encounters a sequence of characters that matches a defined pattern. We denote the sequence of characters from position $i$ to $j$ as $C_{i,j}$.
The lexer behaves deterministically by following two priority rules:
1. Rule of longest match: if the lexer has encountered a valid sequence $C_{i,j}$, it continues to check whether a longer sequence $C_{i,k}$ with $k > j$ is also valid; it disregards $C_{i,j}$ in favor of the longest valid sequence and tokenizes that sequence.
2. Rule of earliest pattern: if two lexer rules match the same sequence $C_{i,j}$, the lexer tokenizes it according to the pattern that appears earliest in the list.
We provide illustrative examples of both rules. The tokens >, =, and >= all exist in the GEMINI language as operators. By the rule of longest match, when the lexer encounters the character sequence >= it tokenizes it as the single token >= instead of the token > followed by the token =.
Further, GEMINI has the keyword if and defines identifiers as alphanumeric sequences of characters.⁸ When the character sequence if is encountered, both patterns match it, so the tokenization depends on the relative ordering of the patterns. Since we want the sequence to be tokenized as the keyword if as opposed to the identifier if, we must specify the pattern for the keyword before the pattern for identifiers, so that the rule of earliest pattern yields a keyword. If the order were reversed, the lexer would never tokenize the keyword.⁹
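The two priority rules can be sketched as follows: at each position every pattern is tried, the longest match wins, and ties go to the pattern listed first. This Python sketch only mimics the behavior described for the ML-Lex specification; the pattern list and token names are hypothetical and cover a tiny subset of GEMINI.

```python
import re
from typing import List, Tuple

# Ordered (token class, regex) pairs; keywords are listed before identifiers.
PATTERNS = [
    ("IF", r"if"),
    ("THEN", r"then"),
    ("ELSE", r"else"),
    ("ID", r"[A-Za-z][A-Za-z0-9_]*"),
    ("GE", r">="),
    ("GT", r">"),
    ("EQ", r"="),
    ("INT", r"[0-9]+"),
    ("WS", r"\s+"),
]
COMPILED = [(name, re.compile(rx)) for name, rx in PATTERNS]


def tokenize(src: str) -> List[Tuple[str, str]]:
    tokens, pos = [], 0
    while pos < len(src):
        best = None  # (length, token class, lexeme)
        for name, rx in COMPILED:
            m = rx.match(src, pos)
            if m and m.end() > pos:
                length = m.end() - pos
                # Longest match wins; on a tie the earlier pattern is kept,
                # because only a strictly longer match replaces it.
                if best is None or length > best[0]:
                    best = (length, name, m.group())
        if best is None:
            raise SyntaxError(f"illegal character {src[pos]!r} at offset {pos}")
        length, name, lexeme = best
        if name != "WS":
            tokens.append((name, lexeme))
        pos += length
    return tokens
```

With this ordering, if becomes a keyword, iffy remains an identifier because its identifier match is longer, and >= is a single token.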
In some cases, the lexer needs to perform some ba-
sic computation to attach values to tokens. For example,
GEMINI allows integers to be declared in various bases.
When the lexer encounters a hexadecimal representation
of an integer, such as #’h:beef, it computes the integer
value in base-10 and tokenizes it as an integer value, the
same way it would have treated the equivalent base-10
integer 48879. This allows various representations to
be treated identically past lexical analysis, thereby sim-
plifying the implementation of later phases by reducing
the number of cases to consider.
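The conversion of based integer literals might look like the sketch below, after which the lexer would emit an ordinary integer token carrying the computed value. Only the hexadecimal form #’h:beef appears in the text; the 'b and 'o prefixes, the use of an ASCII apostrophe, and the function name lex_based_int are assumptions made for illustration.

```python
import re

# Hypothetical pattern for based integer literals such as #'h:beef.
BASED_INT = re.compile(r"#'([bho]):([0-9a-fA-F]+)")
BASES = {"b": 2, "o": 8, "h": 16}


def lex_based_int(lexeme: str) -> int:
    """Return the integer value of a based literal, e.g. #'h:beef -> 48879."""
    m = BASED_INT.fullmatch(lexeme)
    if m is None:
        raise ValueError(f"not a based integer literal: {lexeme!r}")
    # int() raises ValueError if a digit is invalid for the chosen base.
    return int(m.group(2), BASES[m.group(1)])
```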
In addition to performing tokenization, the GEMINI
lexer is responsible for ensuring that comments are bal-
anced and that string quotes are closed before the end of
the program. It does so by maintaining global state of
nested comment depth and whether a quote has been left
open, and ensuring that when the end-of-file is encoun-
tered that both are the appropriate values (0 and false,
respectively).
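The end-of-file checks can be sketched as a single scan that tracks the two pieces of state described above. The comment delimiters (* and *) are an assumption borrowed from SML, since GEMINI's actual delimiters are not given in this section, and unlike the ML-Lex implementation the state here is local to one function rather than global.

```python
def check_balanced(src: str) -> None:
    """Raise SyntaxError unless comments are balanced and strings are closed."""
    depth = 0            # current nesting depth of (* ... *) comments
    in_string = False    # True while inside an open "..." literal
    i = 0
    while i < len(src):
        pair = src[i:i + 2]
        if in_string:
            if src[i] == "\\":          # skip the escaped character
                i += 2
                continue
            if src[i] == '"':
                in_string = False
        elif pair == "(*":
            depth += 1
            i += 2
            continue
        elif pair == "*)":
            if depth == 0:
                raise SyntaxError(f"unmatched comment close at offset {i}")
            depth -= 1
            i += 2
            continue
        elif depth == 0 and src[i] == '"':
            in_string = True
        i += 1
    # At end-of-file the state must be (0, False).
    if depth != 0:
        raise SyntaxError("unterminated comment at end of file")
    if in_string:
        raise SyntaxError("unterminated string at end of file")


def is_balanced(src: str) -> bool:
    try:
        check_balanced(src)
        return True
    except SyntaxError:
        return False
```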
The lexer was written using the ML-Lex tool developed by Andrew Appel. Each pattern is specified by a regular expression to match against and an action to execute; the action may be to report an error, generate a token, perform some side effect, or any combination of the three. These patterns are written in a .lex file, which is compiled to generate the SML code that performs lexing [6].
⁸ Identifiers are actually defined slightly more restrictively, though this definition suffices for our example; a more precise definition can be found in the software grammar of B.3 of the Appendix.
⁹ For this reason, the pattern for identifiers appears after the patterns for all keywords.
4.2 Parser
The responsibility of the lexer is to enforce syntactical correctness of the input program. However, syntactically correct programs are not necessarily grammatically correct. For example, consider the following grammatically incorrect program:
if if if
The lexer would process this program successfully, though it is clearly ill-formed. We need to additionally enforce correctness of the structure of programs. This is the responsibility of the parser.
Once the program passes through the lexer, we obtain a linear stream of tokens. The parser provides structure to the tokens by constructing an abstract syntax tree (AST). Separating the lexer and parser modules is a useful design choice, as it allows us to modify the concrete syntax of our language in the lexer (for example, by replacing the equality operator = with ==) without requiring any changes to the parser, which only sees the lexer's tokenized output as something such as EqualityOperator.¹⁰
The tool ML-Yacc was utilized to specify the CFG
for GEMINI programs. We define a set of production
rules in terms of the tokens produced by the lexer as well
as a set of non-terminals. Further, each production rule
is accompanied by a semantic action to specify some re-
turn value. As the program is processed by a look-ahead
LR parser, the AST is constructed from the expressions
returned by the semantic actions [7]. The production
rules can easily be translated from the definitions of our
syntax grammars from Sections 2.6 and 2.7.
ML-Yacc further allows for the specification of
precedence rules which dictate the order of precedence
for terminals. This affects the order in which the AST
is constructed. These are important to specify correctly
in order to enforce the expected semantics of certain ex-
pressions such as arithmetic ones that must follow the
correct order of operations. The order of precedence for
operators in GEMINI is shown in A.7 of the Appendix.
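To illustrate how precedence shapes the AST, the sketch below uses precedence climbing, a hand-written technique swapped in here for ML-Yacc's precedence declarations; for a given precedence table both produce the same grouping. The operator table is hypothetical and much smaller than the one in A.7 of the Appendix.

```python
import re
from typing import List, Tuple

# Hypothetical precedence levels (higher binds tighter), all left-associative.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}


def tokenize(src: str) -> List[str]:
    return re.findall(r"\d+|[-+*/()]", src)


def parse_atom(toks: List[str], pos: int) -> Tuple[object, int]:
    if pos >= len(toks):
        raise SyntaxError("unexpected end of input")
    if toks[pos] == "(":
        expr, pos = parse_expr(toks, pos + 1, 1)
        if pos >= len(toks) or toks[pos] != ")":
            raise SyntaxError("expected ')'")
        return expr, pos + 1
    return int(toks[pos]), pos + 1


def parse_expr(toks: List[str], pos: int, min_prec: int) -> Tuple[object, int]:
    """Precedence climbing: consume operators whose level is at least min_prec."""
    lhs, pos = parse_atom(toks, pos)
    while pos < len(toks) and toks[pos] in PREC and PREC[toks[pos]] >= min_prec:
        op = toks[pos]
        # The right operand may only contain strictly tighter operators,
        # which makes chains of equal precedence group to the left.
        rhs, pos = parse_expr(toks, pos + 1, PREC[op] + 1)
        lhs = (op, lhs, rhs)
    return lhs, pos


def parse(src: str) -> object:
    toks = tokenize(src)
    ast, pos = parse_expr(toks, 0, 1)
    if pos != len(toks):
        raise SyntaxError("trailing tokens")
    return ast
```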
4.3 Semantic analysis
The responsibility of the parser is to enforce gram-
matical correctness of a syntactically correct program.
We have made progress, since our previously grammat-
ically incorrect program would be caught at the parsing
phase. However, the parser does not enforce semantic
correctness of a program. Consider the following syn-
tactically and grammatically correct, yet semantically
incorrect program:
42 * "a"
According to the definition of our typing rules in
Section 2.8, this program is not well-typed; the multipli-
cation operator * can only operate on operands of type
int, yet the second operand has type string. We need
to enforce semantic correctness of programs. This is the
responsibility of the semantic analysis phase.
In this phase, we recurse over the AST produced by
the parser and verify that the semantics of the program
are valid. However, a prohibitive issue is that not all
types are known yet, since declarations of values, func-
tions, and modules may be implicitly typed. We must
first infer the actual types of expressions in a program.
This is possible to do given that GEMINI’s type system
can be classified as a Hindley-Milner type system [5].
Thus, we further divide semantic analysis into three sub-
phases: decoration, inference, and type-checking.
4.3.1 Type decoration
In the first subphase, the GEMINI program is trans-
formed into an intermediate language we will refer to
as EXPLICITGEMINI. In this language, all implicitly-
typed identifiers are decorated with explicit types, as
demonstrated in Figure 8.
fun foo(x, y, s: string) =
  (print(s); x * y)
(a)
fun foo(x: ‘a, y: ‘b, s: string): ‘c =
  (print(s); x * y)
(b)
Figure 8: A function written in (a) GEMINI and (b) EX-
PLICITGEMINI
We begin by decorating each implicitly-typed identifier with a type variable, or metavariable. Metavariables in GEMINI differ from those in conventional programming languages in that there is a need to differentiate between metavariables of different kinds: a software metavariable may later be substituted by some software type, but not by a hardware type, and vice versa.
Each node in the AST is associated with information about its type, which is some variant of the discriminated union type Absyn.ty shown in full in D.2 of the Appendix. During the parsing phase, explicitly typed identifiers are associated with the given type, such as Absyn.IntTy for integer-typed identifiers, while implicitly typed identifiers are associated with a placeholder type Absyn.PlaceholderTy.
¹⁰ This is the reason the AST is considered abstract, since references to the concrete syntax of the language have been shed.
In the decoration phase, the AST is reconstructed and each identifier is newly associated with the variant Absyn.ExplicitTy, which is constructed using a variant of the discriminated union type Types.ty, shown in full in D.1 of the Appendix. Thus, an identifier that was explicitly typed and associated with Absyn.IntTy during parsing would now have the type Absyn.ExplicitTy(Types.S_TY(T.INT)). Further, an identifier that had the type Absyn.PlaceholderTy is given a fresh metavariable using either the Types.S_META or Types.H_META constructor based on its kind. In some cases, the kind to which a metavariable belongs cannot be known yet, in which case it is temporarily given a fresh metavariable of type Types.META and its kind is inferred later.
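The decoration step can be sketched as a function that leaves explicit annotations alone and replaces each placeholder with a fresh metavariable of the appropriate kind. The classes below are hypothetical Python stand-ins for the Types.S_META, Types.H_META, and Types.META constructors, and the kind argument models whatever contextual information the compiler has available at decoration time.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SMeta:
    id: int          # software metavariable: may only become a software type

@dataclass(frozen=True)
class HMeta:
    id: int          # hardware metavariable: may only become a hardware type

@dataclass(frozen=True)
class Meta:
    id: int          # kind not yet known; resolved during inference

@dataclass(frozen=True)
class Explicit:
    ty: tuple        # e.g. ("S_TY", "INT"), mimicking Types.S_TY(T.INT)


PLACEHOLDER = None   # stands in for Absyn.PlaceholderTy
_fresh_ids = itertools.count()


def decorate(annotation: Optional[tuple], kind: Optional[str]):
    """Turn a parsed annotation into an explicit type or a fresh metavariable.

    kind is "S", "H", or None when the kind cannot be determined yet.
    """
    if annotation is not PLACEHOLDER:
        return Explicit(annotation)
    fresh = next(_fresh_ids)      # every metavariable is created freshly
    if kind == "S":
        return SMeta(fresh)
    if kind == "H":
        return HMeta(fresh)
    return Meta(fresh)
```

Applied to the parameters of foo in Figure 8, x and y would receive two distinct unkinded metavariables, while s keeps its explicit string type.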
4.3.2 Type inference
After all identifiers are decorated with explicit types,
it is time to perform type inference. Also called type
reconstruction, this is the most complex phase of the
GEMINI compiler. There are two primary algorithms
underlying the inference phase: unification and substitu-
tion.
The goal of the unification algorithm is to com-
pute the smallest possible substitution mapping σ from
metavariables to types. The unification algorithm is
summarized in Figure 9.
The subroutines UnifyHWType and UnifySWType are omitted from Figure 9 for the sake of brevity. Both operate by structural recursion over the variants of each discriminated union. If the two types share the same outermost type constructor, then the appropriate subroutine is recursively called on the inner types. The recursion terminates in three cases: (1) the terminal types are known and match, (2) the terminal types are known and do not match, or (3) a metavariable is being unified with some other type. In the first case, nothing happens and unification continues. In the second case, there is a type mismatch and an error is raised. In the third case, a mapping is made from the metavariable to the other type.
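A runnable approximation of Figure 9, with the omitted structural subroutines filled in, is sketched below in Python. Types are encoded as tagged tuples (an assumption of this sketch), σ is threaded through as a dictionary, and two choices go beyond the figure: a metavariable of the matching kind is accepted wherever a concrete type of that kind is, and an occurs check rejects infinite types.

```python
from typing import Dict, Tuple

# Hypothetical encoding: ("S", ctor, *args) or ("H", ctor, *args) for software and
# hardware types, and ("META" | "S_META" | "H_META", id) for metavariables.
Ty = Tuple
META_TAGS = {"META", "S_META", "H_META"}


class KindError(Exception):
    pass


class UnifyError(Exception):
    pass


def is_meta(t: Ty) -> bool:
    return t[0] in META_TAGS

def is_sw(t: Ty) -> bool:
    return t[0] in ("S", "S_META")

def is_hw(t: Ty) -> bool:
    return t[0] in ("H", "H_META")


def walk(t: Ty, sigma: Dict[Ty, Ty]) -> Ty:
    # Follow bindings already recorded for metavariables.
    while is_meta(t) and t in sigma:
        t = sigma[t]
    return t


def occurs(m: Ty, t: Ty, sigma: Dict[Ty, Ty]) -> bool:
    t = walk(t, sigma)
    if t == m:
        return True
    return not is_meta(t) and any(
        occurs(m, a, sigma) for a in t[2:] if isinstance(a, tuple))


def bind(m: Ty, t: Ty, sigma: Dict[Ty, Ty]) -> Dict[Ty, Ty]:
    if t == m:
        return sigma
    if t[0] == "META" and m[0] != "META":
        m, t = t, m                      # bind the unkinded variable instead
    if m[0] == "S_META" and not is_sw(t):
        raise KindError(f"software metavariable {m} cannot become {t}")
    if m[0] == "H_META" and not is_hw(t):
        raise KindError(f"hardware metavariable {m} cannot become {t}")
    if occurs(m, t, sigma):
        raise UnifyError(f"{m} occurs in {t}")
    return {**sigma, m: t}


def unify(t1: Ty, t2: Ty, sigma: Dict[Ty, Ty]) -> Dict[Ty, Ty]:
    t1, t2 = walk(t1, sigma), walk(t2, sigma)
    if is_meta(t1):
        return bind(t1, t2, sigma)
    if is_meta(t2):
        return bind(t2, t1, sigma)
    if t1[0] != t2[0]:                   # software vs hardware
        raise KindError(f"cannot unify {t1} with {t2}")
    if t1[1] != t2[1] or len(t1) != len(t2):
        raise UnifyError(f"type mismatch: {t1} vs {t2}")
    for a, b in zip(t1[2:], t2[2:]):     # structural recursion on inner types
        if isinstance(a, tuple):
            sigma = unify(a, b, sigma)
        elif a != b:                     # non-type arguments such as widths
            raise UnifyError(f"type mismatch: {a} vs {b}")
    return sigma


def unifiable(t1: Ty, t2: Ty) -> bool:
    try:
        unify(t1, t2, {})
        return True
    except (KindError, UnifyError):
        return False
```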
The result of the unification algorithm is a mapping σ, which is in turn used to augment a global substitution environment Σ. Since each metavariable is created freshly, it is safe to maintain a global environment: no two metavariables correspond to the same element in the domain, and each metavariable maps to only a single element in the range; that is, Σ is a well-defined (single-valued) function.
In addition to Σ, two more environments are maintained, each local to its enclosing lexical scope: the type environment τ and the variable environment Γ. Within a let-binding, declarations bind symbols to their types in these environments. Values, functions, and modules are bound in Γ, whereas types and datatypes are bound in τ. When processing a function or module, the parameters are added to the environment Γ and only exist within the scope of the body. Since SML is a functional programming language, the implementation lends itself to discarding the augmented environments once the body has been processed, which correctly emulates the behavior of lexical scoping. As the AST is traversed, the mappings in Σ are applied to both τ and Γ in order to persist the results of unification. This constitutes the substitution algorithm, the main idea of which is summarized in Figure 10.
The definitions of the subroutines SubHW and SubMod are omitted but are similar to that of SubSW. Some cases from SubstituteType and SubSW are also omitted, but the representative ones are shown. In substituting a metavariable, we first determine whether it is a bound variable. If it is, then we must not substitute it. If it is not, we look up the mapping in Σ and return the mapped type if it exists. We must also make sure to let the iteration algorithm know whether any substitution has occurred, so that it continues iterating until it reaches a fixed point.
Types like INT cannot be substituted any further and are
therefore returned as is. For type constructors that pos-
sess inner types, such as ARROW, the SubSW routine is
called recursively on the type arguments. The two most
interesting cases are S_POLY and S_MU, which we will
discuss further.
The S_POLY type is inferred whenever a function is parametrically polymorphic in its arguments. It represents the mathematical concept of the universal quantifier ∀. The set PolyVars denotes the metavariables that are bound by the quantifier; as such, they must not be substituted when performing substitution. Only upon function application does the S_POLY type become instantiated, at which point the universal quantifier is shed and each metavariable in PolyVars is substituted uniformly with whichever type is supplied.
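The substitution pass and the instantiation of S_POLY can be sketched as follows. This hypothetical Python version uses a tagged-tuple encoding of types, reports whether anything changed so that resolve can iterate to a fixed point, and never substitutes a metavariable bound by PolyVars; instantiate then replaces those variables with fresh metavariables, which would subsequently be unified with the supplied argument type.

```python
import itertools
from typing import Dict, FrozenSet, Tuple

# Hypothetical tuple encoding; S_POLY is ("S_POLY", frozenset_of_bound_vars, body).
Ty = Tuple
META_TAGS = {"META", "S_META", "H_META"}


def substitute(t: Ty, sigma: Dict[Ty, Ty],
               bound: FrozenSet[Ty] = frozenset()) -> Tuple[Ty, bool]:
    """One pass of Σ over t. Returns (new type, whether anything changed)."""
    if t[0] in META_TAGS:
        if t in bound or t not in sigma:     # bound variables are never substituted
            return t, False
        return sigma[t], True
    if t[0] == "S_POLY":
        _, poly_vars, body = t
        new_body, changed = substitute(body, sigma, bound | poly_vars)
        return ("S_POLY", poly_vars, new_body), changed
    changed, args = False, []
    for a in t[2:]:                          # e.g. ARROW's argument and result types
        if isinstance(a, tuple):
            a, c = substitute(a, sigma, bound)
            changed = changed or c
        args.append(a)
    return (t[0], t[1], *args), changed


def resolve(t: Ty, sigma: Dict[Ty, Ty]) -> Ty:
    """Iterate substitution to a fixed point (assumes Σ contains no cycles)."""
    changed = True
    while changed:
        t, changed = substitute(t, sigma)
    return t


_fresh_ids = itertools.count(1000)


def instantiate(t: Ty) -> Ty:
    """Shed ∀: replace each variable in PolyVars by a fresh metavariable."""
    if t[0] != "S_POLY":
        return t
    _, poly_vars, body = t
    fresh = {v: (v[0], next(_fresh_ids)) for v in poly_vars}
    return resolve(body, fresh)
```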
Before discussing S_MU, we must briefly draw attention to a special consideration made during the decoration phase. Since datatypes may be recursive, it is necessary to decorate their types uniquely when they are declared. The reason for this is twofold. First, while processing the body of the datatype there must exist some reference to the datatype itself, since a constructor may be self-referential. In decorating datatype d, a temporary fresh metavariable m is generated and the type environment τ is augmented with the mapping {d ↦ m}. Then, the datatype constructors are decorated with any recursive reference to d replaced by the metavariable m. Once the entire datatype has been processed, the true discriminated union type T_d can be determined, and Σ is augmented with the mapping {m ↦ T_d}. Second, we wish to prevent infinite substitution from occurring in the inference phase if the datatype is recursive, and as such we wrap the type with the operator µ, commonly known as µ-recursion [7]. In substitution, whenever we encounter the µ operator, we refrain from substituting any variables it binds. Only when constructors are instantiated do we expand the recursive definition once, preventing infinite expansion.
IsHWType(t)
  case t of
      H_TY(_) => true
    | _ => false
IsSWType(t)
  case t of
      S_TY(_) => true
    | _ => false
Unify(t1, t2)
  case t1 of
      META(m) => {m ↦ t2}
    | H_META(hm) => if IsHWType(t2) then {hm ↦ t2} else raise KindError
    | S_META(sm) => if IsSWType(t2) then {sm ↦ t2} else raise KindError
    | _ => case t2 of
               META(m) => {m ↦ t1}
             | H_META(hm) => if IsHWType(t1) then {hm ↦ t1} else raise KindError
             | S_META(sm) => if IsSWType(t1) then {sm ↦ t1} else raise KindError
             | _ => if IsHWType(t1) and IsHWType(t2)
                      then UnifyHWType(t1, t2)
                    else if IsSWType(t1) and IsSWType(t2)
                      then UnifySWType(t1, t2)
                    else raise KindError
Figure 9: Unification algorithm and subroutines (pseudocode)
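A hedged Python sketch of the kind-aware unification step follows. The class names mirror the constructors in Figure 9, but they are simplified stand-ins rather than the compiler's datatypes. Two choices here are assumptions of the sketch, not details from the paper: a kinded metavariable may also be bound to another metavariable of the same kind, and two ground types of the same kind are compared by plain equality instead of the UnifyHWType and UnifySWType routines, which Figure 9 does not show.

```python
from dataclasses import dataclass


class KindError(Exception):
    """Raised when a software type meets a hardware type."""


class TypeMismatch(Exception):
    """Raised when two types of the same kind cannot be unified."""


@dataclass(frozen=True)
class Meta:   # META: a metavariable of not-yet-known kind
    name: str


@dataclass(frozen=True)
class HMeta:  # H_META: may only stand for a hardware type
    name: str


@dataclass(frozen=True)
class SMeta:  # S_META: may only stand for a software type
    name: str


@dataclass(frozen=True)
class HTy:    # H_TY: a ground hardware type, e.g. HTy("bit")
    name: str


@dataclass(frozen=True)
class STy:    # S_TY: a ground software type, e.g. STy("int")
    name: str


def unify(t1, t2):
    """Return a substitution (a dict) that makes t1 and t2 equal."""
    if t1 == t2:
        return {}
    # An unkinded metavariable can stand for any type.
    if isinstance(t1, Meta):
        return {t1.name: t2}
    if isinstance(t2, Meta):
        return {t2.name: t1}
    # Kinded metavariables only bind to types (or metas) of their own kind.
    for var, other in ((t1, t2), (t2, t1)):
        if isinstance(var, HMeta):
            if isinstance(other, (HTy, HMeta)):
                return {var.name: other}
            raise KindError(f"{other} is not a hardware type")
        if isinstance(var, SMeta):
            if isinstance(other, (STy, SMeta)):
                return {var.name: other}
            raise KindError(f"{other} is not a software type")
    # Two ground types: the kinds must agree, then the types must match.
    if type(t1) is not type(t2):
        raise KindError(f"cannot unify {t1} with {t2}")
    raise TypeMismatch(f"{t1} does not match {t2}")
```

For example, `unify(HTy("bit"), HMeta("h"))` returns `{"h": HTy("bit")}`, while `unify(HMeta("h"), STy("int"))` raises `KindError`, which is the behaviour the kinding discipline relies on.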
In inferring recursive functions, an approach similar to the handling of recursive datatypes is taken. Namely, the environment Γ is augmented with a mapping from the function name to a function type whose parameter and return types are fresh metavariables. When the body is processed, any application of the recursive function can then be unified against this preliminary type, since its metavariables are not yet constrained.
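The following Python sketch is a rough model of this step, with hypothetical names throughout. The type constraints for the example function `fun count n = if n = 0 then 0 else 1 + count (n - 1)` are written out by hand rather than generated from an AST; the point is that `count` is entered into Γ with the fresh metavariables `a -> r` before its body is examined, so the recursive call unifies against that placeholder. The unifier omits an occurs check, which a real implementation would need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meta:
    name: str


@dataclass(frozen=True)
class Con:    # a proper type such as int
    name: str


@dataclass(frozen=True)
class Arrow:
    arg: object
    res: object


INT = Con("int")


def resolve(t, subst):
    """Follow the substitution until no bound metavariable remains."""
    if isinstance(t, Meta) and t.name in subst:
        return resolve(subst[t.name], subst)
    if isinstance(t, Arrow):
        return Arrow(resolve(t.arg, subst), resolve(t.res, subst))
    return t


def unify(t1, t2, subst):
    """Extend `subst` so that t1 and t2 become equal (no occurs check)."""
    t1, t2 = resolve(t1, subst), resolve(t2, subst)
    if t1 == t2:
        return
    if isinstance(t1, Meta):
        subst[t1.name] = t2
    elif isinstance(t2, Meta):
        subst[t2.name] = t1
    elif isinstance(t1, Arrow) and isinstance(t2, Arrow):
        unify(t1.arg, t2.arg, subst)
        unify(t1.res, t2.res, subst)
    else:
        raise TypeError(f"cannot unify {t1} with {t2}")


def infer_count():
    """Infer: fun count n = if n = 0 then 0 else 1 + count (n - 1)."""
    subst = {}
    a, r = Meta("a"), Meta("r")
    gamma = {"count": Arrow(a, r), "n": a}        # preliminary entry for count
    unify(gamma["n"], INT, subst)                 # n = 0 compares n with an int
    unify(r, INT, subst)                          # then-branch 0 : int
    b = Meta("b")                                 # result of the recursive call
    unify(gamma["count"], Arrow(INT, b), subst)   # count (n - 1), n - 1 : int
    unify(b, INT, subst)                          # 1 + count (n - 1) : int
    return resolve(gamma["count"], subst)
```

Calling `infer_count()` resolves the placeholder `a -> r` to `int -> int`.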
Type inference enables parametric polymorphism by constraining types as loosely as possible. This allows a single part of a program to be used with different types. As an example, consider the following snippet of a language STRICTGEMINI, which has neither type inference nor parametric polymorphism.
fun concatInt
  (x: int) (y: int list): int list = x::y

fun concatString
  (x: string) (y: string list): string list
  = x::y
The bodies of both functions are identical, yet we must declare them separately in order to be able to concatenate integers and strings. In the lambda calculus, concatInt is the term λx : int.λy : int list.x :: y, of type int → int list → int list, and concatString is the term λx : string.λy : string list.x :: y, of type string → string list → string list. For each further type, a separate concatenation function would need to be written. In GEMINI, type inference enables us to instead define just the following.
fun concat x y = x::y
Not only is the GEMINI code less verbose, but it can be used to concatenate an element of any type to a list of the same type. By case 44 of the inversion of the typing relation, we know that x has some type T_S and y has type T_S list, where T_S is the same in both cases. Since there are no further restrictions on the type T_S, the inference algorithm finds the loosest possible type, which is some metavariable that we will call 'a. Therefore, our parametrically polymorphic function concat has the type signature 'a → 'a list → 'a list. In the notation of the lambda calculus, the function is represented as ∀'a.λx : 'a.λy : 'a list.x :: y. When the function is applied, the quantifier is removed and the metavariable is substituted uniformly with the concrete type of the arguments.
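To make the final generalization step concrete, here is a hedged Python sketch. It assumes inference has already produced the type `m7 -> m7 list -> m7 list` for `concat` (the metavariable name `m7` is arbitrary), and it quantifies every remaining metavariable, renaming them 'a, 'b, ... in order of appearance. A full implementation would only quantify metavariables that are not free in the surrounding environment; this sketch assumes that environment is empty, and its printer only handles the three constructors defined here.

```python
from dataclasses import dataclass
from string import ascii_lowercase


@dataclass(frozen=True)
class Meta:
    name: str


@dataclass(frozen=True)
class ListOf:
    elem: object


@dataclass(frozen=True)
class Arrow:
    arg: object
    res: object


def metas_in_order(ty, acc):
    """Collect metavariable names, left to right, without duplicates."""
    if isinstance(ty, Meta):
        if ty.name not in acc:
            acc.append(ty.name)
    elif isinstance(ty, ListOf):
        metas_in_order(ty.elem, acc)
    elif isinstance(ty, Arrow):
        metas_in_order(ty.arg, acc)
        metas_in_order(ty.res, acc)
    return acc


def show(ty, names):
    """Pretty-print a type in ML style; arrows associate to the right."""
    if isinstance(ty, Meta):
        return names[ty.name]
    if isinstance(ty, ListOf):
        return f"{show(ty.elem, names)} list"
    arg = show(ty.arg, names)
    if isinstance(ty.arg, Arrow):
        arg = f"({arg})"
    return f"{arg} -> {show(ty.res, names)}"


def generalize(ty):
    """Quantify all metavariables, naming them 'a, 'b, ... ."""
    metas = metas_in_order(ty, [])
    names = {m: "'" + ascii_lowercase[i] for i, m in enumerate(metas)}
    body = show(ty, names)
    if not metas:
        return body
    return "forall " + " ".join(names[m] for m in metas) + ". " + body


m = Meta("m7")
print(generalize(Arrow(m, Arrow(ListOf(m), ListOf(m)))))
# prints: forall 'a. 'a -> 'a list -> 'a list
```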
An optimization of the compiler is to gracefully handle type mismatches at this stage in order to allow the rest of the program to be type-checked without propagating the error forward. This is done using the TOP and BOTTOM types, and their software- and hardware-typed equivalents, listed in D.1 of the Appendix. These act as the top and bottom types of the type system and are assigned to identifiers that fail to be unified during type inference; this prevents a single type mismatch from causing many errors if the identifier is used in many other expressions.
Substitute(Σ, env)
  env′ ← env
  repeat
    hasChanged ← false
    for (name, type) in env′ do
      env′ ← env′[name ↦ SubstituteType(type, Σ, hasChanged)]
  until not hasChanged
  return env′
SubstituteType(type, Σ, hasChanged) (excerpt)
  case type of
      S_TY(stype) => SubSW(type, Σ, hasChanged, ∅)
    | H_TY(htype) => SubHW(type, Σ, hasChanged, ∅)
    | M_TY(mtype) => SubMod(type, Σ, hasChanged, ∅)
    ...
SubSW(type, Σ, hasChanged, BV) (excerpt)
  case type of
      S_META(sm) => if sm ∈ BV
                      then type
                    else if sm ∈ dom(Σ)
                      then case Σ(sm) of
                             S_TY(type′) => (case type′ of
                                                 S_META(sm′) => if sm ≠ sm′
                                                                  then hasChanged ← true
                                               | _ => hasChanged ← true;
                                             type′)
                           | _ => type′
                    else type
    | INT => INT
    | ARROW(stype1, stype2) =>
        ARROW(SubSW(stype1, Σ, hasChanged, BV), SubSW(stype2, Σ, hasChanged, BV))
    | S_POLY(PolyVars, ty′) =>
        S_POLY(PolyVars, SubSW(ty′, Σ, hasChanged, BV ∪ PolyVars))
    | S_MU(MuVars, ty′) => S_MU(MuVars, SubSW(ty′, Σ, hasChanged, BV ∪ MuVars))
    ...
Figure 10: Substitution algorithm and subroutines (pseudocode)
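A minimal Python sketch of the fixed-point substitution follows, under stated assumptions: `Binder` is a hypothetical stand-in for both S_POLY and S_MU, Σ is assumed to contain no cycles (so the loop terminates), and bound metavariable names are assumed to be fresh, so no capture-avoiding renaming is done. Each pass reports whether anything changed, and passes repeat over the whole environment until none does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meta:
    name: str


@dataclass(frozen=True)
class Int:
    pass


@dataclass(frozen=True)
class Arrow:
    arg: object
    res: object


@dataclass(frozen=True)
class Binder:   # stands in for both S_POLY and S_MU
    bound: frozenset
    body: object


def sub_once(ty, sigma, bound=frozenset()):
    """One substitution pass; returns (new_type, changed)."""
    if isinstance(ty, Meta):
        if ty.name in bound or ty.name not in sigma:
            return ty, False
        target = sigma[ty.name]
        return target, target != ty   # m -> m is not a change
    if isinstance(ty, Arrow):
        arg, c1 = sub_once(ty.arg, sigma, bound)
        res, c2 = sub_once(ty.res, sigma, bound)
        return Arrow(arg, res), c1 or c2
    if isinstance(ty, Binder):
        # Variables bound by the quantifier or mu must not be substituted.
        body, changed = sub_once(ty.body, sigma, bound | ty.bound)
        return Binder(ty.bound, body), changed
    return ty, False


def substitute(sigma, env):
    """Repeat passes over the whole environment until nothing changes."""
    env = dict(env)
    changed = True
    while changed:
        changed = False
        for name, ty in list(env.items()):
            env[name], c = sub_once(ty, sigma)
            changed = changed or c
    return env


SIGMA = {"m1": Meta("m2"), "m2": Int(), "a": Int()}
ENV = {"f": Arrow(Meta("m1"), Meta("m1")),
       "g": Binder(frozenset({"a"}), Arrow(Meta("a"), Meta("m1")))}
```

With these inputs, `f` needs two passes (m1 to m2, then m2 to int) to reach `int -> int`, and inside `g` the bound `a` is left untouched even though Σ happens to map it.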
4.3.3 Type checking
With all types decorated and subsequently inferred,
we are able to perform type-checking. The typing rules
from Section 2.8 are directly utilized in semantic verifi-
cation. This is done in a recursive manner, since proposi-
tions of typing rules make statements about the types of
subexpressions in order to verify the semantics of the en-
tire expression. The recursion terminates once we reach
a typing rule with no propositions, which indicates it is
an axiom of our system.
In our implementation, type checking and type inference are actually performed concurrently. This optimization avoids an additional traversal of the AST, since the typing rules can be enforced as soon as the types of subexpressions have been inferred. To do so, the unification algorithm from Figure 9 is augmented to determine whether any two types can be unified, even if neither is a metavariable, by comparing the structure of the two types. If both types are applications of type constructors, the unification algorithm recurses on the type arguments. The recursion terminates when both types are proper types. If the types differ at any point, there is a mismatch and an error is reported.
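The structural comparison described above can be sketched in Python as follows, assuming all metavariables have already been resolved. `Proper`, `Ctor`, and `first_mismatch` are hypothetical names; the function recurses through matching constructors, stops at proper types, and returns a message locating the first difference instead of raising, which is one possible way to report the error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proper:
    name: str            # a proper type such as int or bit


@dataclass(frozen=True)
class Ctor:
    name: str            # a type constructor such as "->" or "list"
    args: tuple


def first_mismatch(t1, t2, path="type"):
    """Compare two fully inferred types structurally.

    Returns None when they agree, otherwise a message locating the first
    position (depth-first, left to right) where they differ.
    """
    if isinstance(t1, Proper) and isinstance(t2, Proper):
        return None if t1 == t2 else f"{path}: {t1.name} vs {t2.name}"
    if isinstance(t1, Ctor) and isinstance(t2, Ctor):
        if t1.name != t2.name or len(t1.args) != len(t2.args):
            return f"{path}: {t1.name} vs {t2.name}"
        for i, (a, b) in enumerate(zip(t1.args, t2.args)):
            error = first_mismatch(a, b, f"{path}.{t1.name}[{i}]")
            if error is not None:
                return error
        return None
    # One side is a proper type and the other a constructor application.
    return f"{path}: {t1.name} vs {t2.name}"
```

For instance, comparing `int -> int list` with `int -> bit list` reports `type.->[1].list[0]: int vs bit`.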
4.4 Software evaluation
Once semantic analysis has been performed, we can safely begin the software evaluation phase, knowing we have a program that is well-typed with regard to software expressions¹¹. Typically, compilers do not evaluate the program they are compiling; instead, they produce code in some target language that is functionally equivalent to the source program. However, the GEMINI compiler needs to evaluate software terms for two reasons. First, we wish to produce Verilog code, which does not support certain software primitives. Second, we must evaluate software terms in order to perform hardware type checking. As a result, this phase of compilation evaluates all software-typed expressions to generate an intermediate representation (IR) tree that consists solely of hardware-typed values, which we later use to generate Verilog code.
In our implementation, we evaluate according to the rules from Section 2.9. Because we intentionally defined these rules using small-step semantics, translating them into an implementation is straightforward. For example, Figure 11 demonstrates how conditional expressions are evaluated in this phase¹².
As can be seen on line 3 of Figure 11, the guard is first evaluated recursively until a value is obtained. On line 5 we retrieve the integer constant for that value and compare it to 0: if it is non-zero, we evaluate the then-clause; otherwise, we evaluate the else-clause. Compare this evaluation strategy with the rules E-IFTHEN, E-IFTHEN-T, and E-IFTHEN-F, and notice how closely they align.
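The correspondence with the rules can be made explicit with a small-step sketch in Python. This is an illustration under assumptions, not the compiler's code: `IntLit`, `Sub`, and `IfExp` are hypothetical node classes (`Sub` only exists so the guard has something to reduce), and the branch of `step` that reduces the guard presumably corresponds to E-IFTHEN, while the two branch-selecting cases presumably correspond to E-IFTHEN-T and E-IFTHEN-F. Unlike Figure 11, which reduces the guard in one recursive call, this version takes exactly one step at a time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntLit:
    value: int


@dataclass(frozen=True)
class Sub:
    left: object
    right: object


@dataclass(frozen=True)
class IfExp:
    guard: object
    then: object
    else_: object


def is_value(e):
    return isinstance(e, IntLit)


def step(e):
    """Perform exactly one small-step reduction of e."""
    if isinstance(e, Sub):
        if not is_value(e.left):
            return Sub(step(e.left), e.right)
        if not is_value(e.right):
            return Sub(e.left, step(e.right))
        return IntLit(e.left.value - e.right.value)
    if isinstance(e, IfExp):
        if not is_value(e.guard):          # reduce the guard first
            return IfExp(step(e.guard), e.then, e.else_)
        # Guard is a value: non-zero selects the then-clause, zero the else.
        return e.then if e.guard.value != 0 else e.else_
    raise ValueError(f"no rule applies to {e!r}")


def evaluate(e):
    """Step repeatedly until a value is reached."""
    while not is_value(e):
        e = step(e)
    return e.value
```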
While performing evaluation, a value store is maintained that maps symbols to their values, using the Value.value datatype shown in full in D.3 of the Appendix. Whenever an identifier is referenced, its value is looked up in the value store. This naturally enables lexical scoping, since the value store of an inner scope is discarded once that scope is exited.
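One way to picture this scoping discipline is Python's `collections.ChainMap`, whose `new_child()` layers a fresh scope over the outer one. This is a sketch under assumptions: `eval_let` is a hypothetical helper, and its bindings and body are plain Python functions of the store standing in for GEMINI expressions.

```python
from collections import ChainMap


def eval_let(store, bindings, body):
    """Evaluate `let <bindings> in <body> end` in a child scope.

    `bindings` is a list of (name, compute) pairs and `body` a function of
    the store; both stand in for expressions in this sketch.
    """
    inner = store.new_child()          # new scope layered over the outer one
    for name, compute in bindings:
        inner[name] = compute(inner)   # later bindings see earlier ones
    return body(inner)                 # `inner` is dropped when we return


outer = ChainMap({"x": 1})
result = eval_let(outer,
                  [("x", lambda s: 10), ("y", lambda s: s["x"] + 1)],
                  lambda s: s["x"] + s["y"])
```

Here `result` is 21, while `outer` still maps `x` to 1 and has no `y`: the inner bindings shadow the outer ones only while the inner scope exists.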
Evaluating function and module declarations warrants special discussion. When a function is declared, its name is bound in the value store to a Value.value → Value.value function. When this function is called, the argument Value.value is bound to the function parameter name and the body is evaluated with the augmented value store. We bind the name to a function, rather than simply to the function body, because the values of the arguments are only known upon function application. SML's closure rules enable the value function to remember the state of the value store at declaration and to process the body correctly after augmenting the store with the supplied parameters.
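The same idea can be sketched with Python closures standing in for SML ones. `declare_function` is a hypothetical helper, and `body` is a Python function of the store standing in for evaluation of the body AST. The closure captures a snapshot of the store at declaration time, and the function's own name is added to that snapshot so recursive calls resolve.

```python
def declare_function(store, name, param, body):
    """Return a closure standing in for a Value.value -> Value.value function."""
    captured = dict(store)               # snapshot of the store at declaration

    def fn(arg):
        # Augment the *captured* store with the argument, then run the body.
        return body({**captured, param: arg})

    captured[name] = fn                  # let the body call itself by name
    return fn


store = {"k": 5}
add_k = declare_function(store, "add_k", "x", lambda s: s["x"] + s["k"])
store["k"] = 100                         # later changes are not observed
fact = declare_function(
    {}, "fact", "n",
    lambda s: 1 if s["n"] == 0 else s["n"] * s["fact"](s["n"] - 1))
```

`add_k(1)` still returns 6 after `k` is rebound, and `fact(5)` returns 120 through the self-reference.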
Module declarations are complicated by one more consideration. If modules are instantiated in the program itself, a similar approach can be taken to expand the module body inline. However, the top-level module that is returned is not instantiated, yet it needs to be expanded in order to generate the final Verilog code. This is done by capturing the argument names when the module is declared and storing them as part of the module value, as exhibited by the Value.ModuleVal variant of the datatype in D.3. At the top level, the argument names are applied to the module function to expand the module body, with all appearances of the argument variables replaced with Value.NamedVal, which represents an input pin of the output Verilog module.
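A hypothetical Python rendering of this mechanism is shown below. `NamedVal`, `Gate`, and `ModuleVal` are illustrative stand-ins (the real Value.value datatype is in D.3 and is not reproduced here), and `expand_top_level` and `instantiate` are assumed helper names. A module value keeps the argument names captured at declaration; an instantiated module is expanded on its actual arguments, whereas the top-level module is expanded on pins named after its own parameters.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class NamedVal:   # an input pin of the emitted Verilog module
    name: str


@dataclass(frozen=True)
class Gate:
    op: str
    operands: tuple


@dataclass(frozen=True)
class ModuleVal:
    arg_names: tuple             # captured when the module is declared
    body: Callable[..., object]  # argument values -> hardware tree


def instantiate(module, *args):
    """An instantiated module is simply expanded inline on its arguments."""
    return module.body(*args)


def expand_top_level(module):
    """Expand an uninstantiated top-level module by feeding it its own pins."""
    return module.body(*(NamedVal(n) for n in module.arg_names))


# Shaped like mycircuit in Figure 13: !(c ^ (a & b)).
mycircuit = ModuleVal(
    ("a", "b", "c"),
    lambda a, b, c: Gate("!", (Gate("^", (c, Gate("&", (a, b)))),)),
)
```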
4.5 Code generation
At the final phase of compilation, we are left with a tree of hardware values representing a circuit composed only of bits, gates, arrays, records, and pins.
First, an additional pass of hardware type checking must be performed, in a manner similar to software type checking. As an optimization, this is done concurrently with the production of Verilog rather than in an extra traversal.
Verilog is produced in a manner similar to the previous phases: we recurse over the tree and build the output from the results of its subtrees. The base case of the recursion is an input pin or a constant bit value, for which the appropriate symbol or constant is returned. At each node of the hardware tree, the subexpressions are first evaluated to determine the wires that hold their values, and then the node itself is evaluated. A fresh variable name is generated for the node's result and returned so that enclosing expressions can use it in computing their own values. For each generated instruction, the type of the wire is noted and an appropriate declaration is made at the top of the module. The result is a series of declarations and instructions that are emitted in order to produce the output.
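The recursion can be sketched in Python for 1-bit circuits, as an illustration only. `Pin`, `Const`, `Gate`, and `generate` are hypothetical names; the sketch emits blocking assignments and maps GEMINI's bitwise `!` to Verilog's `~`, which are choices of this sketch (the compiler's actual output in Figure 15 uses nonblocking assignments). The instructions would be placed inside an `always @(*)` block after the declarations.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Pin:
    name: str


@dataclass(frozen=True)
class Const:
    bit: int


@dataclass(frozen=True)
class Gate:
    op: str          # "&", "|", "^" (binary) or "!" (unary)
    operands: tuple


def generate(tree):
    """Return (declarations, instructions, result_wire) for a 1-bit tree."""
    decls, instrs, fresh = [], [], count(1)

    def go(node):
        if isinstance(node, Pin):                 # base case: input pin
            return node.name
        if isinstance(node, Const):               # base case: constant bit
            return f"1'b{node.bit}"
        wires = [go(x) for x in node.operands]    # children first
        wire = f"r{next(fresh)}"                  # fresh name for this node
        decls.append(f"reg {wire};")              # every node here is 1 bit
        if node.op == "!":
            instrs.append(f"{wire} = ~{wires[0]};")
        else:
            instrs.append(f"{wire} = {wires[0]} {node.op} {wires[1]};")
        return wire

    return decls, instrs, go(tree)
```

For `c ^ (a & b)` this yields the declarations `reg r1;` and `reg r2;`, the instructions `r1 = a & b;` and `r2 = c ^ r1;`, and `r2` as the wire holding the result.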
Hardware records are treated specially during this phase¹³. Verilog does not support record types, since all data must be expressed in terms of bits and arrays. As a result, records are encoded into arrays by concatenating the values of their fields. Similarly, when a record field is accessed, the appropriate range of bits is determined and the array-encoded record is decoded to retrieve that field.
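The bookkeeping behind this encoding can be sketched in Python. The helper names are hypothetical, and the field order is an assumption of the sketch: the first field occupies the most significant bits, which matches how a Verilog concatenation `{f1, f2}` places `f1` above `f2`. Field values are modelled as non-negative integers.

```python
def field_ranges(fields):
    """Map each field to its (msb, lsb) range in the concatenated array.

    `fields` is a list of (name, width) pairs; the first field is placed in
    the most significant bits.
    """
    hi = sum(width for _, width in fields) - 1
    ranges = {}
    for name, width in fields:
        ranges[name] = (hi, hi - width + 1)
        hi -= width
    return ranges


def encode(record, fields):
    """Concatenate the field values into one integer bit vector."""
    value = 0
    for name, width in fields:
        value = (value << width) | (record[name] & ((1 << width) - 1))
    return value


def decode(value, fields, name):
    """Retrieve one field by slicing its bit range out of the encoding."""
    msb, lsb = field_ranges(fields)[name]
    return (value >> lsb) & ((1 << (msb - lsb + 1)) - 1)


def verilog_slice(wire, fields, name):
    """The Verilog part-select that reads one field of an encoded record."""
    msb, lsb = field_ranges(fields)[name]
    return f"{wire}[{msb}:{lsb}]"


FIELDS = [("op", 2), ("data", 4)]
```

With these fields, `op` occupies bits 5 to 4 and `data` bits 3 to 0, so reading `op` from a wire named `rec` becomes `rec[5:4]`.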
¹¹ Recall that we cannot always claim with certainty that the hardware terms are well-typed at this point, since their types may rely on the evaluation of software terms.
¹² The code here is slightly modified for readability.
¹³ Tuples are treated in the same way, but since they are derived from records, they are not mentioned separately here.
1 evalExp(Absyn.IfExp{guard, then', else', pos}) =
2   let
3     val guardVal = evalExp(guard)
4   in
5     if (getInt(guardVal)) <> 0
6     then evalExp(then')
7     else evalExp(else')
8   end
Figure 11: Evaluating conditional expressions
5 Examples
We will now present some examples of GEMINI programs to reinforce some of the concepts described throughout this paper. The examples become increasingly practical in demonstrating how GEMINI can be used effectively to generate Verilog code, and we will see the output of a GEMINI program in the final example.
5.1 Canonical Functions
The first example shows that the expressive power of GEMINI is comparable to that of modern programming languages. In Figure 12, we see GEMINI code for the canonical list functions of the functional programming paradigm. Each of these is defined using pattern matching with structural decomposition and recursion. We also see how discriminated union types can be used to create the parametrically polymorphic option datatype.
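For readers more familiar with mainstream languages, the Figure 12 functions can be sketched recursively in Python. This is a hedged analogue, not generated from GEMINI: the names are illustrative, `foldl` and `foldr` take the function, the accumulator, and the list in the same order as Figure 12 with `f` applied to `(element, accumulator)`, and `None` plays the role of NONE in `map_partial`, so this sketch cannot represent `SOME None`.

```python
def map_list(f, xs):
    return [] if not xs else [f(xs[0])] + map_list(f, xs[1:])


def filter_list(f, xs):
    if not xs:
        return []
    rest = filter_list(f, xs[1:])
    return [xs[0]] + rest if f(xs[0]) else rest   # f may return an int


def foldl(f, acc, xs):
    return acc if not xs else foldl(f, f(xs[0], acc), xs[1:])


def foldr(f, acc, xs):
    return acc if not xs else f(xs[0], foldr(f, acc, xs[1:]))


def map_partial(f, xs):
    """None plays the role of NONE; any other result is treated as SOME v."""
    if not xs:
        return []
    v = f(xs[0])
    rest = map_partial(f, xs[1:])
    return rest if v is None else [v] + rest
```

Folding with list consing shows the difference in direction: `foldl` over `[1, 2, 3]` builds `[3, 2, 1]`, while `foldr` rebuilds `[1, 2, 3]`.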
5.2 Explicit Logic
While software functions like the ones shown in Fig-
ure 12 are useful, they are not the main point of GEM-
INI. Software constructs are a means to an end used
to make it easier to describe hardware, but ultimately a
GEMINI program must return some circuit that can be
represented in Verilog.
In this example, we showcase a GEMINI program that makes use of helpful features such as pattern matching, recursion, and discriminated union types in order to build a system for making explicit logic declarations. Figure 13 shows this program.
It is worth paying attention to the interplay between
software- and hardware-typed values in this example,
and how sw and unsw are strategically used to convert
from one kind to the other. This program exemplifies
the way in which hardware values can be passed around
in software constructs, such as functions and datatypes;
their values are never read or accessed, but operations
may be applied to them.
One may imagine lines 2 through 22 being defined in a library, with toHW exposed for use by other programs. The function toHW takes an expression of type explicitLogic and returns a value of type bit sw. It does so by matching the expression against the variants, each time recursively applying itself to the inner values except in the case of the base variant INP. We can see on line 21, for example, that if the expression is a NOT variant, toHW first calls itself recursively on the inner value, which it then unwraps, applies the bitwise-not operator to, and rewraps. The unwrapping and rewrapping are necessary to adhere to the kinding system, since the bitwise-not operator can only be applied to values of type bit, which is in *H. In the production of Verilog, we collapse the wrap operators and apply the operators as specified.
The module defined on line 24 takes three arguments, each of type bit¹⁴. A logic expression is then explicitly written using the variants of explicitLogic, and the result is converted to the equivalent hardware circuit using toHW and unwrapped to reveal the hardware value. As the comment on line 26 explains, this module definition is equivalent to writing !(c ^ (a & b)) directly. However, the point of this example is to demonstrate how we can introduce greater flexibility in a few ways. First, we allow an arbitrary number of inputs to be passed to the operators as a list. This is useful if we wish to construct a hardware circuit from some external source, such as a text file or command-line arguments. We can imagine writing interpreters for certain specifications that read in files and generate circuits by leveraging our explicit logic generator. Further, we are able to define composite operators from our primitive ones. We have done this for NAND and NOR on lines 8 and 9, though we may define more complex operators as well.
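The same design can be mirrored in Python to show why list-valued operators and composite operators are convenient. This is an illustrative analogue with hypothetical names, not part of GEMINI: n-ary operators are folded with `functools.reduce` (as the reductions `&->`, `|->`, `^->` do in Figure 13), `NAND` and `NOR` are built from the primitives, and `to_verilog` renders a tree as a Verilog expression over 1-bit inputs using `~` for negation. An empty operand list is not handled.

```python
import operator
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class And:
    args: tuple


@dataclass(frozen=True)
class Or:
    args: tuple


@dataclass(frozen=True)
class Xor:
    args: tuple


@dataclass(frozen=True)
class Not:
    arg: object


@dataclass(frozen=True)
class Inp:
    name: str


def NAND(*args):   # composite operators, like lines 8-9 of Figure 13
    return Not(And(args))


def NOR(*args):
    return Not(Or(args))


OPS = {And: ("&", operator.and_), Or: ("|", operator.or_),
       Xor: ("^", operator.xor)}


def to_verilog(e):
    """Render the tree as a Verilog expression over 1-bit inputs."""
    if isinstance(e, Inp):
        return e.name
    if isinstance(e, Not):
        return f"~({to_verilog(e.arg)})"
    symbol, _ = OPS[type(e)]
    return "(" + f" {symbol} ".join(to_verilog(a) for a in e.args) + ")"


def evaluate(e, env):
    """Evaluate on 0/1 inputs; n-ary operators reduce over their operands."""
    if isinstance(e, Inp):
        return env[e.name]
    if isinstance(e, Not):
        return 1 - evaluate(e.arg, env)
    _, fn = OPS[type(e)]
    return reduce(fn, (evaluate(a, env) for a in e.args))
```

The circuit from line 27 of Figure 13, `Not(Xor((Inp("c"), And((Inp("a"), Inp("b"))))))`, renders as `~((c ^ (a & b)))` and agrees with !(c ^ (a & b)) on all eight inputs.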
5.3 N-bit RCA
The final example demonstrates the implementation of an n-bit ripple carry adder (RCA) circuit as a GEMINI program. A difficulty of Verilog is that modules cannot be parametrically polymorphic in the size of their inputs. As a result, a circuit that requires a certain module, such as an RCA, to be instantiated on inputs of different sizes must duplicate the definition of that module for each desired input size. From the perspective of software design, there are several problems with this approach. First, the same circuit definition must be written repeatedly, which increases the overall size of the code and hurts readability and maintainability. Second, any change to the general circuit algorithm must be made to every definition of the module. Third, as the input sizes increase, the complexity of the circuit may also increase exponentially, making it infeasible to write by hand without risking subtle errors.
¹⁴ This is not explicitly specified, but it can be inferred, since each argument is wrapped with sw and passed to the INP constructor. Since the INP constructor accepts a bit sw, it follows from our unification algorithm that each argument has the type bit.
fun map f x = case x of
    [] => []
 |: a::rest => (f a)::(map f rest)

fun filter f x = case x of
    [] => []
 |: a::rest => if (f a)
               then a::(filter f rest)
               else (filter f rest)

fun foldl f acc init = case init of
    [] => acc
 |: (x::rest) => (foldl f (f(x, acc)) rest)

fun foldr f acc init = case init of
    [] => acc
 |: (x::rest) => f(x, (foldr f acc rest))

sdatatype 'a option = SOME of 'a
                   |: NONE

fun mapPartial f x = case x of
    [] => []
 |: (a::rest) => (case (f a) of
                    NONE => mapPartial f rest
                 |: SOME v => v::(mapPartial f rest))
Figure 12: Canonical list functions in GEMINI
The n-bit RCA shown in the program in Figure 14 demonstrates how GEMINI can be used to define an RCA module that is parametrically polymorphic in input size.
On line 2, we define the desired size of the input arrays to be added. This can be defined as a constant, the result of some expression, or an input from a text file or the command line. The module rca defined on line 13 is parameterized by the symbol n, an integer that is referred to in the body, namely to define the size of the array in the generation expression on line 15. While we cannot return a parametrically polymorphic module, we can instantiate it for use by another module, which is what we have done on line 23. The result is an n-bit RCA module named n_bit_rca that takes two bit arrays of equal size and produces their sum using the ripple carry method. We can see the Verilog produced by compiling this GEMINI program with numbits = 2 in Figure 15¹⁵.
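As a behavioural cross-check of the circuit in Figure 14, the following Python sketch uses the same full-adder equations as rca_helper and ripples the carry from bit 0 upward. The function names are hypothetical, bit lists are little-endian (index 0 is the least significant bit, matching `r1[0] <= r9` in Figure 15), and, as in Figure 14, only the n sum bits are returned while the final carry-out is dropped.

```python
def full_adder(ai, bi, cin):
    """Same equations as rca_helper in Figure 14; returns (cout, sum)."""
    total = ai ^ bi ^ cin
    cout = (ai & bi) | (ai & cin) | (bi & cin)
    return cout, total


def ripple_carry_add(a, b):
    """Add two equal-length little-endian bit lists (index 0 is the LSB).

    Like Figure 14, only the n sum bits are returned; the final carry-out
    is dropped.
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same width")
    carry, sums = 0, []
    for ai, bi in zip(a, b):
        carry, s = full_adder(ai, bi, carry)   # carry ripples to the next bit
        sums.append(s)
    return sums


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))
```

Exhaustively comparing `from_bits(ripple_carry_add(to_bits(x, n), to_bits(y, n)))` with `(x + y) % 2**n` for small n confirms the adder computes addition modulo 2ⁿ.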
6 Future Work
We conclude with some remarks on the various ways in which to improve upon and extend the work presented in this paper.
6.1 Improvements
While the GEMINI compiler presented in this paper makes a few optimizations to reduce compilation time, there is additional scope for improving the output Verilog code. We list some of these improvements in this section.
Dataflow optimizations. Dataflow analysis is commonly used in compiler optimization to determine how information flows through a program. In particular, available expression analysis is a useful kind of dataflow analysis in the context of GEMINI, since it can reduce the number of logic gates used in the final program. The analysis of available expressions can be used to carry out common subexpression elimination, which identifies expressions that need not be recomputed and reuses their results after the initial computation instead of computing them again. This would also reduce the number of intermediate wires declared for a given module.
¹⁵ The Verilog module name is taken from the name of the GEMINI file from which it is compiled.
1  let
2    sdatatype explicitLogic = AND of explicitLogic list
3                           |: OR of explicitLogic list
4                           |: XOR of explicitLogic list
5                           |: NOT of explicitLogic
6                           |: INP of bit sw
7
8    fun NAND lst = NOT(AND lst)
9    fun NOR lst = NOT(OR lst)
10
11   fun toHW expl =
12     let
13       fun listToArray elist =
14         sw #[List.length elist; gen i => (unsw (toHW (List.nth(elist, i))))]
15     in
16       case expl of
17           INP(b) => b
18        |: AND(lst) => sw (&-> (unsw (listToArray lst)))
19        |: OR(lst) => sw (|-> (unsw (listToArray lst)))
20        |: XOR(lst) => sw (^-> (unsw (listToArray lst)))
21        |: NOT(el) => sw (! (unsw (toHW el)))
22     end
23
24   module mycircuit #(a, b, c) =
25     let
26       (* equivalent to if we had written !(c ^ (a & b)) *)
27       val temp = NOT(XOR[INP(sw c), AND [INP(sw a), INP(sw b)]])
28     in
29       unsw (toHW temp)
30     end
31 in
32   mycircuit
33 end
Figure 13: Explicit Logic in GEMINI
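As a rough illustration of the common subexpression elimination proposed under Dataflow optimizations, the following Python sketch applies a simplified, local form of it (essentially value numbering over straight-line code) rather than a full available-expressions dataflow analysis. It assumes each wire is assigned exactly once and every operation is side-effect free, which holds for the combinational output in Figure 15; the instruction encoding and the function name are hypothetical.

```python
def eliminate_common_subexpressions(instrs):
    """Local CSE over straight-line, side-effect-free wire assignments.

    `instrs` is a list of (dest, op, operands) in dependency order. Returns
    the reduced list and a dict mapping each removed wire to the wire reused
    in its place. Commutative operands are not normalised, so `a & b` and
    `b & a` are treated as different expressions.
    """
    alias, available, kept = {}, {}, []
    for dest, op, operands in instrs:
        operands = tuple(alias.get(x, x) for x in operands)  # follow reuse
        key = (op, operands)
        if key in available:
            alias[dest] = available[key]     # already computed: reuse it
        else:
            available[key] = dest
            kept.append((dest, op, operands))
    return kept, alias


# An excerpt of Figure 15, where a[0] and b[0] are each read twice.
FIG15_EXCERPT = [
    ("r7", "[0]", ("a",)), ("r8", "[0]", ("b",)), ("r6", "&", ("r7", "r8")),
    ("r11", "[0]", ("a",)), ("r12", "[0]", ("b",)),
    ("r10", "^", ("r11", "r12")),
]
```

On this excerpt the pass removes `r11` and `r12` in favour of `r7` and `r8`, and `r10` becomes `r7 ^ r8`, saving two wires.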
Constraining resources finitely. Along similar lines, the circuit being designed may realistically face specific limits on the number of available resources, such as wires and logic gates. The GEMINI compiler could therefore be instructed to constrain resource allocation to a certain amount. Dataflow optimizations such as common subexpression elimination are one way to reduce resource consumption, though other methods, such as register allocation, could be leveraged to deal with a finite number of available resources.
Improved legibility. Currently, a compiled GEMINI program produces a single Verilog module. Any intermediate modules defined within a GEMINI program are effectively expanded inline in the final module that is produced. For large and complex circuits, this can hurt modularity and readability by producing monolithic modules that are hard to test and debug. A useful improvement to the GEMINI compiler would be to enable multiple modules to be produced from a single GEMINI program. One possible mechanism is to mark certain modules as "persistent", which would prevent the compiler from expanding them inline and instead emit them as separate modules that can be referred to from the top-level module in the resulting Verilog file. Another legibility issue is that wire names are generated automatically, which makes it difficult to trace a particular logical operation back to the source GEMINI code. A valuable improvement would be to allow programmers to optionally name the results of certain logical operations so that the produced Verilog code is more understandable.
6.2 Extensions
Testbench generation. An important aspect of designing electronic circuits is the ability to test their correctness. Popular programmable logic design software such as Quartus supports the simulation of Verilog testbenches. A testbench is a Verilog file that defines a simulation of a given module against certain inputs, intended to verify its behavior. There is therefore scope for an extension to the GEMINI compiler in which a programmer can write GEMINI testbenches or individual test cases that are used to produce a Verilog testbench to accompany the produced module. Both can then be used in software such as Quartus to verify that the module behaves as expected.
1  let
2    val numbits = (* ... *)
3    module rca_helper #(ai : bit, bi : bit, cin : bit) =
4      let
5        val sum = ai ^ bi ^ cin
6        val cout = (ai & bi) | (ai & cin) | (bi & cin)
7      in
8        #(cout, sum)
9      end
10
11   fun getSecond x = (sw #2(unsw x))
12
13   module rca <:n:> #(a, b) =
14     let
15       val couts = #[n; gen i =>
16         if i = 0 then
17           rca_helper #(a[:i:], b[:i:], 'b:0)
18         else
19           rca_helper #(a[:i:], b[:i:], #1(couts[:i - 1:]))]
20     in
21       unsw Array.fromList(List.map getSecond (Array.toList(sw couts)))
22     end
23   module n_bit_rca #(a : bit[numbits], b : bit[numbits]) = rca <:numbits:> #(a, b)
24 in
25   n_bit_rca
26 end
Figure 14: N-bit RCA in GEMINI
module adder(input [1:0] a, input [1:0] b, output reg [1:0] out);
  reg r9, r13, r10, r12, r11, r2, r6, r8, r7, r3, r5, r4;
  reg [1:0] r1;

  always @(*) begin
    r4 <= a[1];
    r5 <= b[1];
    r3 <= r4 ^ r5;
    r7 <= a[0];
    r8 <= b[0];
    r6 <= r7 & r8;
    r2 <= r3 ^ r6;
    r11 <= a[0];
    r12 <= b[0];
    r10 <= r11 ^ r12;
    r13 <= 1'b0;
    r9 <= r10 ^ r13;
    r1[1] <= r2;
    r1[0] <= r9;
    out <= r1;
  end
endmodule
Figure 15: 2-bit RCA in Verilog, compiled from GEMINI
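As a hedged sketch of what the proposed testbench extension might emit, the Python function below turns a list of test cases into a self-checking Verilog testbench. Everything here is an assumption for illustration: the function name, the fixed port names `a`, `b`, and `out` (which match the adder in Figure 15), and the check style using `!==` and `$display`. A real extension would derive the ports from the compiled module.

```python
def make_testbench(module, width, cases):
    """Return Verilog source for a self-checking testbench.

    Assumes the module has two `width`-bit inputs `a`, `b` and one
    `width`-bit output `out`, as the adder in Figure 15 does. `cases` is a
    list of (a, b, expected_out) triples.
    """
    msb = width - 1
    lines = [
        f"module {module}_tb;",
        f"  reg [{msb}:0] a, b;",
        f"  wire [{msb}:0] out;",
        f"  {module} dut(.a(a), .b(b), .out(out));",
        "  initial begin",
    ]
    for x, y, expected in cases:
        lines.append(f"    a = {width}'d{x}; b = {width}'d{y}; #1;")
        lines.append(
            f"    if (out !== {width}'d{expected}) "
            f'$display("FAIL: a=%d b=%d out=%d", a, b, out);'
        )
    lines += ["    $finish;", "  end", "endmodule"]
    return "\n".join(lines)


TB = make_testbench("adder", 2, [(1, 2, 3), (3, 1, 0)])
```

The expected values are computed modulo 4 because the 2-bit adder in Figure 15 drops the final carry, so 3 + 1 is expected to produce 0.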
GEMINI signature files. The concept of interfaces is borrowed from popular modern programming languages; in SML and OCaml, interfaces are called signatures. They limit the ways in which other programs can interact with a given program, keeping some of its implementation details private. In general, this would not significantly affect the quality of the produced Verilog module, though it can lead to better software design practices in large-scale projects.
Backends for other HDLs. As mentioned in Section 4.4, the software evaluation phase of compilation produces an intermediate representation of our program consisting solely of hardware-typed values. The compiler was intentionally designed this way to improve modularity; by generating an IR, the frontend and backend of the compiler are largely independent. In the compiler presented in this paper, the IR is used to generate a Verilog program. However, additional backends may be developed to produce modules in other popular HDLs, such as VHDL. These backends can reuse the IR produced by the existing frontend, so that only a new backend needs to be written to compile to a new target language.
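To illustrate the frontend/backend split, here is a minimal Python sketch in which one hypothetical IR feeds two emitters. The flat instruction format and function names are assumptions rather than the compiler's IR, and wire and signal declarations are omitted; the point is that adding a target only requires a new function over the same IR.

```python
# Hypothetical frontend output: (destination, operator, left, right).
IR = [("r1", "&", "a", "b"), ("r2", "^", "c", "r1")]

VHDL_OPS = {"&": "and", "|": "or", "^": "xor"}


def verilog_backend(ir):
    """Emit Verilog continuous assignments (wire declarations omitted)."""
    return [f"assign {d} = {x} {op} {y};" for d, op, x, y in ir]


def vhdl_backend(ir):
    """Emit VHDL concurrent signal assignments (declarations omitted)."""
    return [f"{d} <= {x} {VHDL_OPS[op]} {y};" for d, op, x, y in ir]


BACKENDS = {"verilog": verilog_backend, "vhdl": vhdl_backend}
```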
References
[1] Barbacci, M. "A comparison of register transfer languages for describing computers and digital systems," Carnegie-Mellon University, Dept. of Computer Science, March 1973.
[2] "Verilog's inventor nabs EDA's Kaufman award," EE Times, November 7, 2005.
[3] Department of Defense (1992). Military Standard, Standard general requirements for electronic equipment. Retrieved November 15, 2017.
[4] Barbacci, M., Grout, S., Lindstrom, G., Maloney, M.P. "Ada as a hardware description language: an initial report," Carnegie-Mellon Univ., Dept. of Computer Science, 1984.
[5] Pierce, Benjamin C. Types and Programming Languages. MIT Press, 2002.
[6] Appel, Andrew W., et al. "A Lexical Analyzer Generator for Standard ML." Version 1.6.0, Princeton University, October 1994, www.smlnj.org/doc/ML-Lex/manual.html.
[7] Tarditi, David R., and Andrew W. Appel. "ML-Yacc User's Manual Version 2.3." Princeton University, The Trustees of Princeton University, 6 Oct. 1994, www.cs.princeton.edu/~appel/modern/ml/ml-yacc/manual.html.
[8] Ciletti, Michael D. (2010). Advanced Digital Design with Verilog HDL. Prentice Hall.
[9] C. Baaij, M. Kooijman, J. Kuper, A. Boeijink, and M. Gerards, "CλaSH: Structural Descriptions of Synchronous Hardware Using Haskell," 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools, 2010.
[10] J. Bachrach, H. Vo, B. Richards, Y. Lee, A. Waterman, R. Avižienis, J. Wawrzynek, and K. Asanović, "Chisel," Proceedings of the 49th Annual Design Automation Conference (DAC '12), 2012.
[11] P. Bjesse, K. Claessen, M. Sheeran, and S. Singh, "Lava: hardware design in Haskell," ACM SIGPLAN Notices, vol. 34, no. 1, pp. 174–184, Jan. 1999.
[12] Y. Li and M. Leeser, "HML: an innovative hardware description language and its translation to VHDL," Proceedings of ASP-DAC'95/CHDL'95/VLSI'95 with EDA Technofair, Feb. 2000.
[13] W. Taha and T. Sheard, "MetaML and multi-stage programming with explicit annotations," Theoretical Computer Science, vol. 248, no. 1-2, pp. 211–242, 2000.
A Language Specification
A.1 Hardware Operators
Operator  Syntax      Semantic Result
&         e1 & e2     Bitwise logical "and" of composing bits¹
|         e1 | e2     Bitwise logical "or" of composing bits¹
^         e1 ^ e2     Bitwise logical "xor" of composing bits¹
!         !e          Negation of bit operand
<<        e1 << e2    Shifts the left bit-array operand to the left by the amount specified (as an unsigned integer) by the right bit-array operand
>>        e1 >> e2    Shifts the left bit-array operand to the right by the amount specified (as an unsigned integer) by the right bit-array operand
>>>       e1 >>> e2   Shifts the left bit-array operand to the right by the amount specified (as an unsigned integer) by the right bit-array operand, filling with the most significant bit value
&->       &->e1       Bitwise and-reduction of bit-array operand
|->       |->e1       Bitwise or-reduction of bit-array operand
^->       ^->e1       Bitwise xor-reduction of bit-array operand
&&        e1 && e2    Or-reduction of both bit-array operands, followed by bitwise logical "and" of resulting bits
||        e1 || e2    Or-reduction of both bit-array operands, followed by bitwise logical "or" of resulting bits
^^        e1 ^^ e2    Or-reduction of both bit-array operands, followed by bitwise logical "xor" of resulting bits
¹ These operators recurse through subexpressions to perform the bitwise operation on the composing bits while retaining the overall structure.
A.2 Arithmetic Operators
Operator Syntax Semantic Result
~ ~e Negation of integer operand
+ e1 + e2 Addition of integer operands
- e1 - e2 Subtraction of right integer operand from left integer operand
/ e1 / e2 Division of left integer operand by right integer operand, rounded towards negative infinity
* e1 * e2 Multiplication of integer operands
% e1 % e2 Modulo of dividend left integer operand with divisor right integer operand
+. e1 +. e2 Addition of real operands
-. e1 -. e2 Subtraction of right real operand from left real operand
/. e1 /. e2 Division of left real operand by right real operand
*. e1 *. e2 Multiplication of real operands
A.3 Conditional Operators
Operator Syntax Semantic Result
andalso e1 andalso e2 Logical conjunction of both integer operands
orelse e1 orelse e2 Logical disjunction of both integer operands
not not e Logical complementation of integer operand
A.4 Comparison Operators
Operator Syntax Semantic Result
= e1 = e2 Equality of operands
<> e1 <> e2 Non-equality of operands
> e1 > e2 Left operand has a strictly greater order than right operand
< e1 < e2 Left operand has a strictly lesser order than right operand
>= e1 >= e2 Left operand has a greater or equal order compared to right operand
<= e1 <= e2 Left operand has a lesser or equal order compared to right operand
A.5 List Operators
Operator Syntax Semantic Result
:: e1::e2 Concatenation of left element operand to the beginning of right list operand
A.6 Library Functions

| Structure | Function | Type | Semantic Result |
|---|---|---|---|
| Core | `print` | `string -> unit` | Write a string to the standard output |
| Core | `read` | `string -> string` | Read the contents of a file |
| List | `nth` | `('a list * int) -> 'a` | Return the element of a list at a given index; raises an exception if the index is out of bounds |
| List | `length` | `'a list -> int` | Return the length of a list |
| List | `rev` | `'a list -> 'a list` | Return the reversed list |
| List | `map` | `('a -> 'b) -> 'a list -> 'b list` | Apply a mapping function to each element of a list and return the resulting list |
| List | `filter` | `('a -> int) -> 'a list -> 'a list` | Return a list containing only the elements that satisfy the predicate function |
| List | `foldl` | `('a * 'b -> 'b) -> 'b -> 'a list -> 'b` | Accumulate a value by iterating over a list from left to right |
| List | `foldr` | `('a * 'b -> 'b) -> 'b -> 'a list -> 'b` | Same as `foldl`, except that iteration is from right to left |
| Int | `toString` | `int -> string` | Return a string representation of an int |
| String | `size` | `string -> int` | Return the number of characters in a string |
| String | `substring` | `(string * int * int) -> string` | Return the substring between a start and an end location of a string; raises an exception if either index is out of bounds |
| String | `concat` | `string list -> string` | Return the concatenation of all strings in a list |
| String | `split` | `string -> string -> string list` | Return the list of strings obtained by splitting a string on a delimiter |
| Real | `floor` | `real -> int` | Round a real toward negative infinity |
| Real | `ceil` | `real -> int` | Round a real toward positive infinity |
| Real | `round` | `real -> int` | Round a real to the nearest integer |
| Real | `fromInt` | `int -> real` | Convert an int to a real |
| Real | `toString` | `real -> string` | Return a string representation of a real |
| Array | `toList` | `'a[n] sw -> 'a sw list` | Return a list of software-wrapped hardware values from a software-wrapped hardware array |
| Array | `fromList` | `'a sw list -> 'a[n] sw` | Return a software-wrapped hardware array from a list of software-wrapped hardware values |
| BitArray | `twosComp` | `bit[n] -> bit[n]` | Return a circuit computing the two's complement of a bit array |
| HW | `dff` | `'a @ n -> 'a @ (n + 1)` | Return a D flip-flop (DFF) circuit from a given hardware input |
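To make the iteration order of `foldl` and `foldr` concrete, the Python sketch below mirrors their listed types: both are curried in the initial accumulator and the list, and the combining function takes an (element, accumulator) pair as in `'a * 'b -> 'b`. These functions are our own stand-ins for illustration, not Gemini's library implementation.

```python
from typing import Callable, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def foldl(f: Callable[[Tuple[A, B]], B]) -> Callable[[B], Callable[[List[A]], B]]:
    """Curried left fold: combine elements from the first to the last."""
    def with_init(init: B) -> Callable[[List[A]], B]:
        def over(xs: List[A]) -> B:
            acc = init
            for x in xs:              # left to right
                acc = f((x, acc))
            return acc
        return over
    return with_init


def foldr(f: Callable[[Tuple[A, B]], B]) -> Callable[[B], Callable[[List[A]], B]]:
    """Curried right fold: same as foldl, but starts from the last element."""
    def with_init(init: B) -> Callable[[List[A]], B]:
        def over(xs: List[A]) -> B:
            acc = init
            for x in reversed(xs):    # right to left
                acc = f((x, acc))
            return acc
        return over
    return with_init


# Prepending each element (like `::`) exposes the iteration order.
cons = lambda pair: [pair[0]] + pair[1]
```

With `cons`, `foldl(cons)([])([1, 2, 3])` yields `[3, 2, 1]` while `foldr(cons)([])([1, 2, 3])` rebuilds `[1, 2, 3]`; for an associative and commutative function such as addition both folds agree.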
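A.7 Operator Order of Precedence

| Operator(s) | Associativity |
|---|---|
| `~`, `!`, `\|->`, `&->`, `^->` | N/A |
| `$` | N/A |
| `#f` | N/A |
| `[:i:]` | N/A |
| `/.`, `*.`, `/`, `*`, `&`, `%` | left |
| `-.`, `+.`, `-`, `+`, `^`, `\|` | left |
| `&&` | left |
| `\|\|`, `^^` | left |
| `::` | right |
| `>`, `<`, `>=`, `<=` | left |
| `=`, `<>` | left |
| `<<`, `>>`, `>>>` | left |
| `andalso` | left |
| `orelse` | left |
| `:=` | right |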
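Reading the table as ordered from highest to lowest precedence (an assumption based on the order of its rows; the appendix does not state it explicitly), its binary rows can drive a precedence-climbing parser directly. The Python sketch below covers only binary operators, treats every operand as a single token, and returns nested `(op, lhs, rhs)` tuples; `BINARY` and `parse` are hypothetical names.

```python
# Binary rows of the precedence table, highest precedence first.
BINARY = [
    ({"/.", "*.", "/", "*", "&", "%"}, "left"),
    ({"-.", "+.", "-", "+", "^", "|"}, "left"),
    ({"&&"}, "left"),
    ({"||", "^^"}, "left"),
    ({"::"}, "right"),
    ({">", "<", ">=", "<="}, "left"),
    ({"=", "<>"}, "left"),
    ({"<<", ">>", ">>>"}, "left"),
    ({"andalso"}, "left"),
    ({"orelse"}, "left"),
    ({":="}, "right"),
]

# Larger number means the operator binds more tightly.
PREC = {}
ASSOC = {}
for level, (ops, assoc) in enumerate(BINARY):
    for op in ops:
        PREC[op] = len(BINARY) - level
        ASSOC[op] = assoc


def parse(tokens):
    """Parse a flat list of operand and operator tokens into a tree."""
    pos = 0

    def climb(min_prec):
        nonlocal pos
        lhs = tokens[pos]                 # operands are single tokens here
        pos += 1
        while pos < len(tokens) and PREC[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            # A left-associative operator only lets tighter operators
            # into its right operand; a right-associative one also
            # admits operators of its own level.
            next_min = PREC[op] + 1 if ASSOC[op] == "left" else PREC[op]
            lhs = (op, lhs, climb(next_min))
        return lhs

    tree = climb(1)
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return tree
```

For example, `x :: y :: zs` groups to the right as `("::", "x", ("::", "y", "zs"))`, while `a - b - c` groups to the left.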
B Formalizations
B.1 Derived Terms
| Name | Derived form | Equivalent term |
|---|---|---|
| tuple | `(e_i)^{i∈1..n}` | `{i = e_i}^{i∈1..n}` |
| unit | `()` | `{}` |
| logical and | `e1 andalso e2` | `if e1 then e2 else 0` |
| logical or | `e1 orelse e2` | `if e1 then 1 else e2` |
| logical not | `not e` | `if e then 0 else 1` |
| and-reduction | `&->#[e_i]^{i∈1..n}` | `e1 & ... & en` |
| or-reduction | `\|->#[e_i]^{i∈1..n}` | `e1 \| ... \| en` |
| xor-reduction | `^->#[e_i]^{i∈1..n}` | `e1 ^ ... ^ en` |
| and-collapse | `e1 && e2` | `(\|->e1) & (\|->e2)` |
| or-collapse | `e1 \|\| e2` | `(\|->e1) \| (\|->e2)` |
| xor-collapse | `e1 ^^ e2` | `(\|->e1) ^ (\|->e2)` |
| if-then | `if e1 then e2` | `if e1 then e2 else {}` |
| sequence | `(e1; e2)` | `(λx : T. e2) e1`, where `x ∉ FreeVar(e2)` |
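Because each derived form is a purely syntactic abbreviation, a front end can eliminate them before type checking. The Python sketch below applies four of the equivalences (`andalso`, `orelse`, `not`, and the one-armed `if`) bottom-up over a hypothetical tuple-based AST, and includes a tiny evaluator for the resulting core terms that treats any nonzero int as true, following the `int`-valued conditions used throughout this appendix.

```python
# Hypothetical AST: ("int", n), ("var", name), ("unit",), ("if", c, t, e),
# ("andalso", a, b), ("orelse", a, b), ("not", a), ("if1", c, t)
def desugar(e):
    """Rewrite derived forms into core if-then-else terms, bottom-up."""
    tag = e[0]
    if tag in ("int", "var", "unit"):
        return e
    kids = [desugar(k) for k in e[1:]]
    if tag == "andalso":            # e1 andalso e2 == if e1 then e2 else 0
        a, b = kids
        return ("if", a, b, ("int", 0))
    if tag == "orelse":             # e1 orelse e2 == if e1 then 1 else e2
        a, b = kids
        return ("if", a, ("int", 1), b)
    if tag == "not":                # not e == if e then 0 else 1
        (a,) = kids
        return ("if", a, ("int", 0), ("int", 1))
    if tag == "if1":                # if e1 then e2 == if e1 then e2 else {}
        c, t = kids
        return ("if", c, t, ("unit",))
    if tag == "if":
        return ("if", *kids)
    raise ValueError(f"unknown node {tag!r}")


def evaluate(e, env):
    """Evaluate a desugared core term; a nonzero condition selects the then-branch."""
    tag = e[0]
    if tag == "int":
        return e[1]
    if tag == "var":
        return env[e[1]]
    if tag == "unit":
        return ()
    if tag == "if":
        return evaluate(e[2] if evaluate(e[1], env) != 0 else e[3], env)
    raise ValueError(f"not a core term: {tag!r}")
```

Desugaring first keeps the type checker and evaluator small: only `if`-`then`-`else` needs rules, which matches how the inversion lemma in Section C.1 states the `andalso`, `orelse` and `not` cases in terms of `int`.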
B.2 Software Value Grammar
〈swv〉 ::= 〈integer〉
| 〈real〉
| 〈string〉
| 〈list〉
| 〈software record〉
| 〈sw〉
| 〈ref〉
| 〈variant〉
| 〈function〉
〈integer〉 ::= $i \in \mathbb{Z} \cap [-2^{31}, 2^{31} - 1]$
〈real〉 ::= $r \in \mathbb{R} \cap \left([2^{-1074}, (2 - 2^{-52}) \times 2^{1023}] \cup [-(2 - 2^{-52}) \times 2^{1023}, -2^{-1074}]\right) \cap \{\text{numbers expressible as IEEE FP}\}$
〈string〉 ::= $s \in \bigcup_{i=0}^{2^{63}-1} A^i$, where $A$ is the ASCII alphabet and $A^i$ denotes the sequences of $i$ characters from $A$
〈list〉 ::= $[\langle swv \rangle_i^{\,i \in 0..n-1}]$
〈software record〉 ::= $\{l_i = \langle swv \rangle_i^{\,i \in 1..n}\}$
〈sw〉 ::= $\omega(\langle hwv \rangle)$
〈ref〉 ::= $\mu[l \mapsto \langle swv \rangle]$
〈variant〉 ::= $C_i\ \langle swv \rangle$
〈function〉 ::= $\lambda x : T_S.\ e$
B.3 Software Term Grammar
〈exp〉 ::= 〈literal〉
| 〈access〉
| 〈let binding〉
| 〈conditional〉
| 〈operation〉
| 〈assignment〉
| 〈pattern match〉
| 〈sequence〉
| 〈application〉
〈literal〉 ::= 〈identifier〉
| 〈integer literal〉
| 〈real literal〉
| 〈string literal〉
| 〈list literal〉
| 〈software record literal〉
| 〈ref literal〉
| 〈sw literal〉
〈identifier〉 ::= 〈id-start〉 〈id-tail〉
〈id-start〉 ::= {any alphabetic character or underscore}
〈id-tail〉 ::= {any alphanumeric character or underscore} 〈id-tail〉
| ε
〈integer literal〉 ::= 〈binary-integer〉
| 〈octal-integer〉
| 〈decimal-integer〉
| 〈hex-integer〉
〈binary-integer〉 ::= #'b: 〈binary-digits〉
〈octal-integer〉 ::= #'o: 〈octal-digits〉
〈decimal-integer〉 ::= 〈decimal-digits〉
| 〈sign〉 〈decimal-digits〉
〈hex-integer〉 ::= #'x: 〈hex-digits〉
〈real literal〉 ::= 〈real-tail〉
| 〈sign〉 〈real-tail〉
〈real-tail〉 ::= 〈decimal-digits〉 . 〈decimal-digits-or-empty〉 〈exponent-or-empty〉
| 〈decimal-digits-or-empty〉 . 〈decimal-digits〉 〈exponent-or-empty〉
| 〈decimal-digits〉 〈exponent〉
〈decimal-digits-or-empty〉 ::= 〈decimal-digits〉
| ε
〈exponent〉 ::= E 〈decimal-digits〉
| E 〈sign〉 〈decimal-digits〉
| e 〈decimal-digits〉
| e 〈sign〉 〈decimal-digits〉
〈exponent-or-empty〉 ::= 〈exponent〉
| ε
〈binary-digit〉 ::= any of {0, 1}
〈octal-digit〉 ::= any of {0, 1, 2, 3, 4, 5, 6, 7}
〈decimal-digit〉 ::= any of {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
〈hex-digit〉 ::= any of {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f, A, B, C, D, E, F}
〈binary-digits〉 ::= 〈binary-digits〉 〈binary-digit〉
| 〈binary-digit〉
〈octal-digits〉 ::= 〈octal-digits〉 〈octal-digit〉
| 〈octal-digit〉
〈decimal-digits〉 ::= 〈decimal-digits〉 〈decimal-digit〉
| 〈decimal-digit〉
〈hex-digits〉 ::= 〈hex-digits〉 〈hex-digit〉
| 〈hex-digit〉
〈sign〉 ::= ~
〈string literal〉 ::= " 〈string-body〉 "
〈string-body〉 ::= 〈string-body〉 〈string-character〉
| ε
〈string-character〉 ::= {any printable character, including space, except for double-quotes (") and backslash (\)}
| \ 〈escape-character〉
〈escape-character〉 ::= any of {\, ', ", a, b, e, f, n, r, t, 0}
| 〈hex-digits〉
〈list literal〉 ::= [ 〈list-body〉 ]
| nil
〈list-body〉 ::= 〈exp〉 〈exp comma tail〉
| ε
〈exp comma tail〉 ::= , 〈exp〉 〈exp comma tail〉
| ε
〈software record literal〉 ::= { 〈record-body〉 }
〈record-body〉 ::= 〈identifier〉 = 〈exp〉 〈record-tail〉
| ε
〈record-tail〉 ::= , 〈identifier〉 = 〈exp〉 〈record-tail〉
| ε
〈ref literal〉 ::= ref 〈exp〉
〈sw literal〉 ::= sw 〈hwv〉
〈access〉 ::= 〈struct-access〉
| 〈record-access〉
| 〈ref-access〉
| 〈tuple-access〉
〈struct-access〉 ::= 〈identifier〉 . 〈identifier〉
〈record-access〉 ::= # 〈identifier〉 〈exp〉
〈ref-access〉 ::= $ 〈exp〉
〈tuple-access〉 ::= # 〈decimal-integer〉 〈exp〉
〈let binding〉 ::= let 〈decs〉 in 〈exp〉 end
〈decs〉 ::= 〈val-dec〉 〈decs〉
| 〈ty-dec〉 〈decs〉
| 〈sdataty-dec〉 〈decs〉
| 〈fun-dec〉 〈decs〉
| ε
〈val-dec〉 ::= val 〈identifier〉 = 〈exp〉
| val 〈identifier〉 : 〈ty〉 = 〈exp〉
〈ty-dec〉 ::= type 〈tyvars〉 〈identifier〉 = 〈ty〉
〈tyvars〉 ::= 〈tyvar〉
| ( 〈tyvar〉 〈tyvars tail〉 )
| ε
〈tyvars tail〉 ::= , 〈tyvar〉 〈tyvars tail〉
| ε
〈ty〉 ::= 〈tyvar〉
| 〈identifier〉
| { 〈ty-fields〉 }
| 〈ty〉 list
| 〈ty〉 ref
| 〈ty〉 sw
| 〈ty〉 -> 〈ty〉
| ( 〈ty〉 )
〈tyvar〉 ::= ' 〈identifier〉
〈ty-fields〉 ::= 〈ty-fields-tail〉
| ε
〈ty-fields-tail〉 ::= 〈identifier〉 : 〈ty〉
| 〈ty-fields-tail〉 , 〈identifier〉 : 〈ty〉
〈sdataty-dec〉 ::= sdatatype 〈tyvars〉 〈identifier〉 = 〈identifier〉 of 〈ty〉 〈dataty-tail〉
〈dataty-tail〉 ::= |: 〈identifier〉 of 〈ty〉 〈dataty-tail〉
| ε
〈fun-dec〉 ::= fun 〈identifier〉 〈fun-params〉 = 〈exp〉
| fun 〈identifier〉 〈fun-params〉 : 〈ty〉 = 〈exp〉
〈fun-params〉 ::= 〈fun-params〉 〈fun-param〉
| 〈fun-param〉
〈fun-param〉 ::= 〈identifier〉
| ( 〈fun-param-list〉 )
| { 〈fun-param-list〉 }
〈fun-param-list〉 ::= 〈fun-param-elem〉 〈fun-param-list-tail〉
〈fun-param-elem〉 ::= 〈identifier〉
| 〈identifier〉 : 〈ty〉
〈fun-param-list-tail〉 ::= , 〈fun-param-elem〉 〈fun-param-list-tail〉
| ε
〈conditional〉 ::= if 〈exp〉 then 〈exp〉 else 〈exp〉
| if 〈exp〉 then 〈exp〉
〈operation〉 ::= 〈arith-op〉
| 〈compare-op〉
| 〈list-op〉
〈arith-op〉 ::= 〈int-op〉
| 〈real-op〉
〈int-op〉 ::= ~ 〈exp〉
| 〈exp〉 + 〈exp〉
| 〈exp〉 - 〈exp〉
| 〈exp〉 * 〈exp〉
| 〈exp〉 / 〈exp〉
| 〈exp〉 % 〈exp〉
〈real-op〉 ::= 〈exp〉 +. 〈exp〉
| 〈exp〉 -. 〈exp〉
| 〈exp〉 *. 〈exp〉
| 〈exp〉 /. 〈exp〉
〈compare-op〉 ::= 〈exp〉 = 〈exp〉
| 〈exp〉 <> 〈exp〉
| 〈exp〉 > 〈exp〉
| 〈exp〉 < 〈exp〉
| 〈exp〉 >= 〈exp〉
| 〈exp〉 <= 〈exp〉
〈list-op〉 ::= 〈exp〉 :: 〈exp〉
〈assignment〉 ::= 〈exp〉 := 〈exp〉
〈pattern match〉 ::= case 〈exp〉 of 〈pattern〉 => 〈exp〉 〈matches-tail〉
〈pattern〉 ::= 〈integer literal〉
| 〈string literal〉
| 〈real literal〉
| 〈identifier〉
| 〈identifier〉 〈pattern〉
| ( 〈opt-patterns〉 )
| { 〈record-patterns〉 }
| 〈pattern〉 :: 〈pattern〉
〈opt-patterns〉 ::= 〈pattern〉 〈opt-patterns-tail〉
| ε
〈opt-patterns-tail〉 ::= , 〈pattern〉 〈opt-patterns-tail〉
| ε
〈record-patterns〉 ::= 〈identifier〉 = 〈pattern〉 〈record-patterns-tail〉
〈record-patterns-tail〉 ::= , 〈identifier〉 = 〈pattern〉 〈record-patterns-tail〉
| ε
〈matches-tail〉 ::= 〈matches-tail〉 |: 〈pattern〉 => 〈exp〉
| ε
〈sequence〉 ::= ( 〈exp〉 )
| ( 〈exp〉 ; 〈exp〉 〈sequence-tail〉 )
〈sequence-tail〉 ::= ; 〈exp〉 〈sequence-tail〉
| ε
〈application〉 ::= 〈exp〉 〈exp〉
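The 〈integer literal〉 productions combine a radix prefix (`#'b:`, `#'o:`, `#'x:`) or a plain decimal form, where only the decimal form may carry the `~` sign, and the value must lie in the 〈integer〉 domain of Section B.2. The Python sketch below reads one such token; it rejects out-of-range values instead of wrapping them, which is our assumption, since the grammar fixes only the domain. `read_int_literal` is a hypothetical name.

```python
import re

INT_MIN, INT_MAX = -2**31, 2**31 - 1   # <integer> domain from Section B.2

# Radix prefixes of the <binary/octal/hex-integer> productions.
_RADIX = {"b": (2, "01"), "o": (8, "01234567"), "x": (16, "0123456789abcdefABCDEF")}
_DECIMAL = re.compile(r"(~?)([0-9]+)")  # [<sign>] <decimal-digits>


def read_int_literal(tok: str) -> int:
    """Return the value of one <integer literal> token, or raise ValueError."""
    if tok.startswith("#'") and len(tok) > 4 and tok[3] == ":":
        base, digits = _RADIX.get(tok[2], (None, ""))
        body = tok[4:]
        if base is None or any(c not in digits for c in body):
            raise ValueError(f"malformed radix literal {tok!r}")
        value = int(body, base)
    else:
        m = _DECIMAL.fullmatch(tok)
        if m is None:
            raise ValueError(f"malformed decimal literal {tok!r}")
        value = int(m.group(2))
        if m.group(1) == "~":           # <sign> ::= ~
            value = -value
    if not INT_MIN <= value <= INT_MAX:
        raise ValueError(f"{tok!r} is outside [-2^31, 2^31 - 1]")
    return value
```

Under this reading `~2147483648` is accepted (it is exactly $-2^{31}$) while `2147483648` is rejected, and a radix literal such as `#'x:FFFFFFFF` is rejected too, because its unsigned value exceeds $2^{31} - 1$.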
B.4 Hardware Syntax Grammar
〈exp〉 ::= 〈literal〉
| 〈access〉
| 〈let binding〉
| 〈operation〉
| 〈parameterization〉
〈literal〉 ::= 〈bit literal〉
| 〈array literal〉
| 〈hardware record literal〉
〈bit literal〉 ::= 'b: 〈binary-digit〉
〈array literal〉 ::= #[ 〈list-body〉 ]
| 〈gen-array〉
| 〈bit-array〉
〈gen-array〉 ::= #[ 〈exp〉 ; gen 〈identifier〉 => 〈exp〉 ]
〈bit-array〉 ::= 〈exp〉 's: 〈exp〉
| 〈exp〉 'u: 〈exp〉
| 〈exp〉 'r: 〈exp〉
〈hardware record literal〉 ::= #{ 〈record-body〉 }
〈access〉 ::= 〈array-access〉
〈array-access〉 ::= 〈exp〉 [: 〈exp〉 :]
〈let binding〉 ::= let 〈decs〉 in 〈exp〉 end
〈decs〉 ::= 〈hdataty-dec〉 〈decs〉
| 〈module-dec〉 〈decs〉
| ε
〈ty〉 ::= #{ 〈ty-fields〉 }
| 〈ty〉 #* 〈ty〉
| 〈ty〉 [ 〈decimal-integer〉 ]
| 〈ty〉 @ 〈decimal-integer〉
〈hdataty-dec〉 ::= hdatatype 〈tyvars〉 〈identifier〉 = 〈identifier〉 of 〈ty〉 〈dataty-tail〉
〈module-dec〉 ::= module 〈identifier〉 〈mod-sw-param〉 〈mod-param〉 = 〈exp〉
| module 〈identifier〉 〈mod-sw-param〉 〈mod-param〉 : 〈ty〉 = 〈exp〉
〈mod-sw-param〉 ::= <: 〈fun-param〉 >:
| ε
〈mod-param〉 ::= 〈identifier〉
| #( 〈fun-param-list〉 )
| #{ 〈fun-param-list〉 }
〈operation〉 ::= 〈bit-op〉
| 〈unsw〉
〈bit-op〉 ::= ! 〈exp〉
| |-> 〈exp〉
| &-> 〈exp〉
| ^-> 〈exp〉
| 〈exp〉 && 〈exp〉
| 〈exp〉 || 〈exp〉
| 〈exp〉 ^^ 〈exp〉
| 〈exp〉 & 〈exp〉
| 〈exp〉 | 〈exp〉
| 〈exp〉 ^ 〈exp〉
| 〈exp〉 << 〈exp〉
| 〈exp〉 >> 〈exp〉
| 〈exp〉 >>> 〈exp〉
〈unsw〉 ::= unsw 〈exp〉
〈parameterization〉 ::= 〈exp〉 <: 〈exp〉 >:
B.5 Typing Rules
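$$\frac{x : T \in \Gamma}{\Gamma \vdash x : T}\;(\text{T-VAR})$$

$$\frac{\Gamma, x : T_1 \vdash t_2 : T_2}{\Gamma \vdash \lambda x : T_1.\, t_2 : T_1 \to T_2}\;(\text{T-ABS})$$

$$\frac{\Gamma \vdash t_1 : T_1 \to T_2 \qquad \Gamma \vdash t_2 : T_1}{\Gamma \vdash t_1\, t_2 : T_2}\;(\text{T-APP})$$

$$\langle \mathit{int} \rangle : \mathtt{int}\;(\text{T-INT}) \qquad \langle \mathit{real} \rangle : \mathtt{real}\;(\text{T-REAL}) \qquad \langle \mathit{string} \rangle : \mathtt{string}\;(\text{T-STRING})$$

$$\langle \mathit{bit} \rangle : \mathtt{bit}\;(\text{T-BIT}) \qquad \mathtt{nil} : T_S\ \mathtt{list}\;(\text{T-NIL})$$

$$\frac{t_1 : \mathtt{int}}{{\sim}t_1 : \mathtt{int}}\;(\text{T-INT-NEG})$$

$$\frac{t_1 : \mathtt{int} \qquad t_2 : \mathtt{int}}{t_1 + t_2 : \mathtt{int}}\;(\text{T-INT-ADD}) \qquad \frac{t_1 : \mathtt{int} \qquad t_2 : \mathtt{int}}{t_1 - t_2 : \mathtt{int}}\;(\text{T-INT-SUB})$$

$$\frac{t_1 : \mathtt{int} \qquad t_2 : \mathtt{int}}{t_1 * t_2 : \mathtt{int}}\;(\text{T-INT-MUL}) \qquad \frac{t_1 : \mathtt{int} \qquad t_2 : \mathtt{int}}{t_1 / t_2 : \mathtt{int}}\;(\text{T-INT-DIV})$$

$$\frac{t_1 : \mathtt{int} \qquad t_2 : \mathtt{int}}{t_1 \mathbin{\%} t_2 : \mathtt{int}}\;(\text{T-INT-MOD})$$

$$\frac{t_1 : \mathtt{real}}{{\sim}t_1 : \mathtt{real}}\;(\text{T-REAL-NEG})$$

$$\frac{t_1 : \mathtt{real} \qquad t_2 : \mathtt{real}}{t_1 \mathbin{+.} t_2 : \mathtt{real}}\;(\text{T-REAL-ADD}) \qquad \frac{t_1 : \mathtt{real} \qquad t_2 : \mathtt{real}}{t_1 \mathbin{-.} t_2 : \mathtt{real}}\;(\text{T-REAL-SUB})$$

$$\frac{t_1 : \mathtt{real} \qquad t_2 : \mathtt{real}}{t_1 \mathbin{*.} t_2 : \mathtt{real}}\;(\text{T-REAL-MUL}) \qquad \frac{t_1 : \mathtt{real} \qquad t_2 : \mathtt{real}}{t_1 \mathbin{/.} t_2 : \mathtt{real}}\;(\text{T-REAL-DIV})$$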
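The arithmetic rules keep `int` and `real` strictly apart: the dotted operators accept only `real` operands, the undotted ones only `int`, and `~` is the only operator with a rule for each type (T-INT-NEG and T-REAL-NEG). A minimal Python sketch of these rules over a hypothetical tuple AST follows; it returns the type in the conclusion of the matching rule, or raises `TypeError` when no rule applies.

```python
INT_OPS = {"+", "-", "*", "/", "%"}      # T-INT-ADD/SUB/MUL/DIV/MOD
REAL_OPS = {"+.", "-.", "*.", "/."}      # T-REAL-ADD/SUB/MUL/DIV


def type_of(e):
    """Type a term built from ("int", n), ("real", r), ("~", e) and (op, e1, e2)."""
    tag = e[0]
    if tag == "int":
        return "int"                     # T-INT
    if tag == "real":
        return "real"                    # T-REAL
    if tag == "~":                       # T-INT-NEG or T-REAL-NEG
        t1 = type_of(e[1])
        if t1 in ("int", "real"):
            return t1
        raise TypeError(f"cannot negate a {t1}")
    if tag in INT_OPS or tag in REAL_OPS:
        want = "int" if tag in INT_OPS else "real"
        t1, t2 = type_of(e[1]), type_of(e[2])
        if t1 == t2 == want:             # both premises must have the operator's type
            return want
        raise TypeError(f"{tag} expects {want} * {want}, got {t1} * {t2}")
    raise TypeError(f"no typing rule for {tag!r}")
```

There is no implicit conversion: mixing an `int` with a `real` needs an explicit call such as `fromInt` from the `Real` structure in Section A.6.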
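$$\frac{t_1 : T_H}{{!}t_1 : T_H}\;(\text{T-BIT-NEG})$$

$$\frac{t_1 : T_H \qquad t_2 : T_H}{t_1 \mathbin{\&} t_2 : T_H}\;(\text{T-AND}) \qquad \frac{t_1 : T_H \qquad t_2 : T_H}{t_1 \mid t_2 : T_H}\;(\text{T-OR}) \qquad \frac{t_1 : T_H \qquad t_2 : T_H}{t_1 \mathbin{\hat{}} t_2 : T_H}\;(\text{T-XOR})$$

$$\frac{t_1 : \mathtt{bit}[n]}{\mathtt{\&{-}{>}}\,t_1 : \mathtt{bit}}\;(\text{T-AND-RED}) \qquad \frac{t_1 : \mathtt{bit}[n]}{\mathtt{|{-}{>}}\,t_1 : \mathtt{bit}}\;(\text{T-OR-RED}) \qquad \frac{t_1 : \mathtt{bit}[n]}{\mathtt{\hat{}{-}{>}}\,t_1 : \mathtt{bit}}\;(\text{T-XOR-RED})$$

$$\frac{t_1 : \mathtt{bit}[n] \qquad t_2 : \mathtt{bit}[n]}{t_1 \mathbin{\&\&} t_2 : \mathtt{bit}}\;(\text{T-LOG-AND}) \qquad \frac{t_1 : \mathtt{bit}[n] \qquad t_2 : \mathtt{bit}[n]}{t_1 \mathbin{||} t_2 : \mathtt{bit}}\;(\text{T-LOG-OR})$$

$$\frac{t_1 : \mathtt{bit}[n] \qquad t_2 : \mathtt{bit}[n]}{t_1 \mathbin{\hat{}\hat{}} t_2 : \mathtt{bit}}\;(\text{T-LOG-XOR})$$

$$\frac{t_1 : \mathtt{bit}[n] \qquad t_2 : \mathtt{bit}[m]}{t_1 \mathbin{{<}{<}} t_2 : \mathtt{bit}[n]}\;(\text{T-SLL}) \qquad \frac{t_1 : \mathtt{bit}[n] \qquad t_2 : \mathtt{bit}[m]}{t_1 \mathbin{{>}{>}} t_2 : \mathtt{bit}[n]}\;(\text{T-SRL})$$

$$\frac{t_1 : \mathtt{bit}[n] \qquad t_2 : \mathtt{bit}[m]}{t_1 \mathbin{{>}{>}{>}} t_2 : \mathtt{bit}[n]}\;(\text{T-SRA})$$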
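The hardware rules separate shape-preserving operators (`!`, `&`, `|` and `^` keep the operand type $T_H$; shifts keep the left operand's width $n$ whatever the width $m$ of the shift amount) from operators that produce a single `bit` (the reductions and `&&`, `||`, `^^`). The Python sketch below checks these rules for bits and bit arrays only (hardware records are left out) and encodes `bit` as `None` and `bit[n]` as the int `n`; that encoding is a simplifying assumption.

```python
from typing import Optional

# Hypothetical encoding: None stands for `bit`, an int n stands for `bit[n]`.
BitTy = Optional[int]

SAME_SHAPE = {"&", "|", "^"}           # T-AND, T-OR, T-XOR
REDUCTIONS = {"&->", "|->", "^->"}     # T-AND-RED, T-OR-RED, T-XOR-RED
LOGICAL = {"&&", "||", "^^"}           # T-LOG-AND, T-LOG-OR, T-LOG-XOR
SHIFTS = {"<<", ">>", ">>>"}           # T-SLL, T-SRL, T-SRA


def show(t: BitTy) -> str:
    return "bit" if t is None else f"bit[{t}]"


def unary_type(op: str, t1: BitTy) -> BitTy:
    if op == "!":                      # T-BIT-NEG: same type out as in
        return t1
    if op in REDUCTIONS and t1 is not None:
        return None                    # bit[n] -> bit
    raise TypeError(f"{op} not defined on {show(t1)}")


def binary_type(op: str, t1: BitTy, t2: BitTy) -> BitTy:
    if op in SAME_SHAPE and t1 == t2:
        return t1                      # both operands share T_H
    if op in LOGICAL and t1 is not None and t1 == t2:
        return None                    # bit[n] * bit[n] -> bit
    if op in SHIFTS and t1 is not None and t2 is not None:
        return t1                      # bit[n] shifted by bit[m] stays bit[n]
    raise TypeError(f"{op} not defined on {show(t1)} * {show(t2)}")
```

Under this encoding, for example, `&&` on two `bit[4]` values yields `None` (a single `bit`), while `|` on `bit[8]` and `bit[4]` is rejected because T-OR requires both operands to share one type.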
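$$\frac{t_1 : T_S \qquad t_2 : T_S}{t_1 = t_2 : \mathtt{int}}\;(\text{T-EQ}) \qquad \frac{t_1 : T_S \qquad t_2 : T_S}{t_1 \mathrel{{<}{>}} t_2 : \mathtt{int}}\;(\text{T-NEQ})$$

$$\frac{t_1 : T_S \qquad t_2 : T_S}{t_1 \mathrel{{>}{=}} t_2 : \mathtt{int}}\;(\text{T-GEQ}) \qquad \frac{t_1 : T_S \qquad t_2 : T_S}{t_1 > t_2 : \mathtt{int}}\;(\text{T-GT})$$

$$\frac{t_1 : T_S \qquad t_2 : T_S}{t_1 \mathrel{{<}{=}} t_2 : \mathtt{int}}\;(\text{T-LEQ}) \qquad \frac{t_1 : T_S \qquad t_2 : T_S}{t_1 < t_2 : \mathtt{int}}\;(\text{T-LT})$$

$$\frac{t_1 : T_S \qquad t_2 : T_S\ \mathtt{list}}{t_1 \mathbin{::} t_2 : T_S\ \mathtt{list}}\;(\text{T-CONS})$$

$$\frac{t_1 : \mathtt{int} \qquad t_2 : T \qquad t_3 : T}{\mathtt{if}\ t_1\ \mathtt{then}\ t_2\ \mathtt{else}\ t_3 : T}\;(\text{T-IFELSE})$$

$$\frac{t_1 : T_S}{\mathtt{ref}\ t_1 : T_S\ \mathtt{ref}}\;(\text{T-REF}) \qquad \frac{t_1 : T_H}{\mathtt{sw}\ t_1 : T_H\ \mathtt{sw}}\;(\text{T-SW}) \qquad \frac{t_1 : T_H\ \mathtt{sw}}{\mathtt{unsw}\ t_1 : T_H}\;(\text{T-UNSW})$$

$$\frac{t_1 : T_S\ \mathtt{ref} \qquad t_2 : T_S}{t_1 \mathrel{:=} t_2 : \mathtt{unit}}\;(\text{T-ASSIGN})$$

$$\frac{t_1 : T_H[n] \qquad t_2 : \mathtt{int}}{t_1[t_2] : T_H}\;(\text{T-ARR-ACC}) \qquad \frac{t_1 : T_S\ \mathtt{ref}}{\$t_1 : T_S}\;(\text{T-DEREF})$$

$$\frac{\text{for each } i \quad \Gamma \vdash t_i : T_i}{\Gamma \vdash \{l_i = t_i\}^{i \in 1..n} : \{l_i : T_i\}^{i \in 1..n}}\;(\text{T-RCD})$$

$$\frac{\Gamma \vdash t_1 : \{l_i : T_i\}^{i \in 1..n}}{\Gamma \vdash \#l_j\ t_1 : T_j}\;(\text{T-PROJ})$$

$$\frac{\Gamma, \tau \vdash t_1 : T_1 \qquad \Gamma, \tau, x_1 : T_1 \vdash \mathtt{let}\ (x_i = t_i)^{i \in 2..n}\ \mathtt{in}\ t_0\ \mathtt{end} : T_0}{\Gamma, \tau \vdash \mathtt{let}\ (x_i = t_i)^{i \in 1..n}\ \mathtt{in}\ t_0\ \mathtt{end} : T_0}\;(\text{T-LET})$$

$$\frac{D : \langle C_i : T_i \rangle^{i \in 1..n} \in \tau \qquad \Gamma \vdash t : T_j}{\Gamma, \tau \vdash C_j\ t : D}\;(\text{T-DATATY})$$

$$\frac{\Gamma \vdash t_0 : \langle C_i : T_i \rangle^{i \in 1..n} \qquad \text{for each } i \quad \Gamma, x_i : T_i \vdash t_i : T}{\Gamma \vdash \mathtt{case}\ t_0\ \mathtt{of}\ (C_i\ x_i \Rightarrow t_i)^{i \in 1..n} : T}\;(\text{T-CASE})$$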
B.6 Evaluation Rules
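$$\frac{t_1 \longrightarrow t_1'}{t_1\, t_2 \longrightarrow t_1'\, t_2}\;(\text{E-APP1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1\, t_2 \longrightarrow v_1\, t_2'}\;(\text{E-APP2})$$

$$(\lambda x.\, t_1)\, v_1 \longrightarrow [x \mapsto v_1]\, t_1\;(\text{E-APPABS})$$

$$\frac{t_1 \longrightarrow t_1'}{\mathtt{if}\ t_1\ \mathtt{then}\ t_2\ \mathtt{else}\ t_3 \longrightarrow \mathtt{if}\ t_1'\ \mathtt{then}\ t_2\ \mathtt{else}\ t_3}\;(\text{E-IFELSE})$$

$$\frac{v_1 : \mathtt{int} \qquad v_1 \neq 0}{\mathtt{if}\ v_1\ \mathtt{then}\ t_2\ \mathtt{else}\ t_3 \longrightarrow t_2}\;(\text{E-IFELSE-T}) \qquad \mathtt{if}\ 0\ \mathtt{then}\ t_2\ \mathtt{else}\ t_3 \longrightarrow t_3\;(\text{E-IFELSE-F})$$

$$\frac{t_1 \longrightarrow t_1'}{{\sim}t_1 \longrightarrow {\sim}t_1'}\;(\text{E-NEG})$$

$$\frac{t_1 \longrightarrow t_1'}{t_1 + t_2 \longrightarrow t_1' + t_2}\;(\text{E-INT-ADD1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 + t_2 \longrightarrow v_1 + t_2'}\;(\text{E-INT-ADD2})$$

$$\frac{t_1 \longrightarrow t_1'}{t_1 - t_2 \longrightarrow t_1' - t_2}\;(\text{E-INT-SUB1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 - t_2 \longrightarrow v_1 - t_2'}\;(\text{E-INT-SUB2})$$

$$\frac{t_1 \longrightarrow t_1'}{t_1 * t_2 \longrightarrow t_1' * t_2}\;(\text{E-INT-MUL1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 * t_2 \longrightarrow v_1 * t_2'}\;(\text{E-INT-MUL2})$$

$$\frac{t_1 \longrightarrow t_1'}{t_1 / t_2 \longrightarrow t_1' / t_2}\;(\text{E-INT-DIV1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 / t_2 \longrightarrow v_1 / t_2'}\;(\text{E-INT-DIV2})$$

$$\frac{t_1 \longrightarrow t_1'}{t_1 \mathbin{\%} t_2 \longrightarrow t_1' \mathbin{\%} t_2}\;(\text{E-INT-MOD1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 \mathbin{\%} t_2 \longrightarrow v_1 \mathbin{\%} t_2'}\;(\text{E-INT-MOD2})$$

$$\frac{t_1 \longrightarrow t_1'}{t_1 \mathbin{+.} t_2 \longrightarrow t_1' \mathbin{+.} t_2}\;(\text{E-REAL-ADD1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 \mathbin{+.} t_2 \longrightarrow v_1 \mathbin{+.} t_2'}\;(\text{E-REAL-ADD2})$$

$$\frac{t_1 \longrightarrow t_1'}{t_1 \mathbin{-.} t_2 \longrightarrow t_1' \mathbin{-.} t_2}\;(\text{E-REAL-SUB1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 \mathbin{-.} t_2 \longrightarrow v_1 \mathbin{-.} t_2'}\;(\text{E-REAL-SUB2})$$

$$\frac{t_1 \longrightarrow t_1'}{t_1 \mathbin{*.} t_2 \longrightarrow t_1' \mathbin{*.} t_2}\;(\text{E-REAL-MUL1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 \mathbin{*.} t_2 \longrightarrow v_1 \mathbin{*.} t_2'}\;(\text{E-REAL-MUL2})$$

$$\frac{t_1 \longrightarrow t_1'}{t_1 \mathbin{/.} t_2 \longrightarrow t_1' \mathbin{/.} t_2}\;(\text{E-REAL-DIV1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 \mathbin{/.} t_2 \longrightarrow v_1 \mathbin{/.} t_2'}\;(\text{E-REAL-DIV2})$$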
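The paired congruence rules (E-…1 and E-…2) fix a left-to-right evaluation order: the left operand is reduced to a value before the right operand is touched. The Python sketch below implements one step of this relation for integer `+`, `-`, `*` and `~` over a hypothetical AST in which integer values are plain Python ints. The final step on two values, which the rules leave to the semantics of `int`, simply uses Python arithmetic here; `/` and `%` are omitted because the rules shown do not say what happens on division by zero.

```python
def is_value(t):
    return isinstance(t, int)


def step(t):
    """One small step of the relation; t must not already be a value."""
    if t[0] == "~":
        if is_value(t[1]):
            return -t[1]                     # primitive negation of a value
        return ("~", step(t[1]))             # E-NEG
    op, t1, t2 = t
    if not is_value(t1):
        return (op, step(t1), t2)            # E-INT-*1: reduce the left operand
    if not is_value(t2):
        return (op, t1, step(t2))            # E-INT-*2: left is v1, reduce the right
    return {"+": t1 + t2, "-": t1 - t2, "*": t1 * t2}[op]


def trace(t):
    """All intermediate terms from t down to a value."""
    steps = [t]
    while not is_value(t):
        t = step(t)
        steps.append(t)
    return steps
```

Tracing `(2 * 3) + ~(10 - 4)` shows the order: the left product becomes `6` first, then the right operand is reduced through `~6` to `-6`, and only then does the addition fire.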
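$$\frac{t_j \longrightarrow t_j'}{\#[v_i^{\,i \in 1..j-1},\ t_j,\ t_k^{\,k \in j+1..n}] \longrightarrow \#[v_i^{\,i \in 1..j-1},\ t_j',\ t_k^{\,k \in j+1..n}]}\;(\text{E-ARR})$$

$$\frac{t_1 \longrightarrow t_1'}{{!}t_1 \longrightarrow {!}t_1'}\;(\text{E-BIT-NEG1}) \qquad \frac{v : T_H[n]}{{!}v \longrightarrow \#[{!}v[i]]^{i \in 0..n-1}}\;(\text{E-BIT-NEG2})$$

$$\frac{v : \#\{l_i : T_{H_i}\}^{i \in 1..n}}{{!}v \longrightarrow \#\{l_i = {!}\#l_i\ v\}^{i \in 1..n}}\;(\text{E-BIT-NEG3}) \qquad \frac{v : \mathtt{bit}}{{!}v \longrightarrow \overline{v}}\;(\text{E-BIT-NEG})$$

where $\overline{v}$ denotes the output of a NOT gate driven by $v$.

$$\frac{t_1 \longrightarrow t_1'}{t_1 \mathbin{\&} t_2 \longrightarrow t_1' \mathbin{\&} t_2}\;(\text{E-AND1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 \mathbin{\&} t_2 \longrightarrow v_1 \mathbin{\&} t_2'}\;(\text{E-AND2})$$

$$\frac{v_1 : \mathtt{bit}[n] \qquad v_2 : \mathtt{bit}[n]}{v_1 \mathbin{\&} v_2 \longrightarrow \#[v_{1,i} \mathbin{\&} v_{2,i}]^{i \in 0..n-1}}\;(\text{E-AND3})$$

$$\frac{v_1 : \#\{l_i : T_{H_i}\}^{i \in 1..n} \qquad v_2 : \#\{l_i : T_{H_i}\}^{i \in 1..n}}{v_1 \mathbin{\&} v_2 \longrightarrow \#\{l_i = \#l_i\ v_1 \mathbin{\&} \#l_i\ v_2\}^{i \in 1..n}}\;(\text{E-AND4})$$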
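E-AND3 and E-AND4 push `&` through the structure of a hardware value, pairing array elements by index and record fields by label. The Python sketch below models hardware values as lists (arrays) and dicts (records) with the bits 0 and 1 at the leaves, and computes the resulting value directly instead of building a circuit; treating `&` on two single bits as Boolean AND is our simplifying assumption. `hw_and` is a hypothetical name.

```python
def hw_and(v1, v2):
    """Pointwise & over hardware values: lists are arrays, dicts are records, 0/1 are bits."""
    if isinstance(v1, list) and isinstance(v2, list):      # E-AND3: #[v1,i & v2,i]
        if len(v1) != len(v2):
            raise TypeError("array widths differ")
        return [hw_and(a, b) for a, b in zip(v1, v2)]
    if isinstance(v1, dict) and isinstance(v2, dict):      # E-AND4: #{l_i = #l_i v1 & #l_i v2}
        if v1.keys() != v2.keys():
            raise TypeError("record labels differ")
        return {label: hw_and(v1[label], v2[label]) for label in v1}
    if v1 in (0, 1) and v2 in (0, 1):                      # single bits (assumed Boolean AND)
        return v1 & v2
    raise TypeError("operands do not share a hardware type")
```

The same recursion with `|` or `^` in place of `&` gives the behaviour of E-OR3/E-OR4 and E-XOR3/E-XOR4.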
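$$\frac{t_1 \longrightarrow t_1'}{t_1 \mid t_2 \longrightarrow t_1' \mid t_2}\;(\text{E-OR1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 \mid t_2 \longrightarrow v_1 \mid t_2'}\;(\text{E-OR2})$$

$$\frac{v_1 : \mathtt{bit}[n] \qquad v_2 : \mathtt{bit}[n]}{v_1 \mid v_2 \longrightarrow \#[v_{1,i} \mid v_{2,i}]^{i \in 0..n-1}}\;(\text{E-OR3})$$

$$\frac{v_1 : \#\{l_i : T_{H_i}\}^{i \in 1..n} \qquad v_2 : \#\{l_i : T_{H_i}\}^{i \in 1..n}}{v_1 \mid v_2 \longrightarrow \#\{l_i = \#l_i\ v_1 \mid \#l_i\ v_2\}^{i \in 1..n}}\;(\text{E-OR4})$$

$$\frac{t_1 \longrightarrow t_1'}{t_1 \mathbin{\hat{}} t_2 \longrightarrow t_1' \mathbin{\hat{}} t_2}\;(\text{E-XOR1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 \mathbin{\hat{}} t_2 \longrightarrow v_1 \mathbin{\hat{}} t_2'}\;(\text{E-XOR2})$$

$$\frac{v_1 : \mathtt{bit}[n] \qquad v_2 : \mathtt{bit}[n]}{v_1 \mathbin{\hat{}} v_2 \longrightarrow \#[v_{1,i} \mathbin{\hat{}} v_{2,i}]^{i \in 0..n-1}}\;(\text{E-XOR3})$$

$$\frac{v_1 : \#\{l_i : T_{H_i}\}^{i \in 1..n} \qquad v_2 : \#\{l_i : T_{H_i}\}^{i \in 1..n}}{v_1 \mathbin{\hat{}} v_2 \longrightarrow \#\{l_i = \#l_i\ v_1 \mathbin{\hat{}} \#l_i\ v_2\}^{i \in 1..n}}\;(\text{E-XOR4})$$

$$\frac{t_1 \longrightarrow t_1'}{\mathtt{\&{-}{>}}\,t_1 \longrightarrow \mathtt{\&{-}{>}}\,t_1'}\;(\text{E-AND-RED1}) \qquad \frac{v : \mathtt{bit}[n]}{\mathtt{\&{-}{>}}\,v \longrightarrow \mathrm{AND}(v[0], \ldots, v[n-1])}\;(\text{E-AND-RED})$$

$$\frac{t_1 \longrightarrow t_1'}{\mathtt{|{-}{>}}\,t_1 \longrightarrow \mathtt{|{-}{>}}\,t_1'}\;(\text{E-OR-RED1}) \qquad \frac{v : \mathtt{bit}[n]}{\mathtt{|{-}{>}}\,v \longrightarrow \mathrm{OR}(v[0], \ldots, v[n-1])}\;(\text{E-OR-RED})$$

$$\frac{t_1 \longrightarrow t_1'}{\mathtt{\hat{}{-}{>}}\,t_1 \longrightarrow \mathtt{\hat{}{-}{>}}\,t_1'}\;(\text{E-XOR-RED1}) \qquad \frac{v : \mathtt{bit}[n]}{\mathtt{\hat{}{-}{>}}\,v \longrightarrow \mathrm{XOR}(v[0], \ldots, v[n-1])}\;(\text{E-XOR-RED})$$

where AND, OR, and XOR denote $n$-input logic gates whose inputs are the bits $v[0], \ldots, v[n-1]$.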
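The reduction rules collapse a `bit[n]` value into a single bit through an $n$-input gate over `v[0]` to `v[n-1]`, and the collapse operators of Section B.1 are defined by or-reducing each operand and then combining the two resulting bits. Modelling bits as Python ints, the value these gates would output can be sketched with `functools.reduce`; this computes the result rather than the circuit, and requiring $n \geq 1$ is our assumption.

```python
from functools import reduce
from operator import and_, or_, xor


def reduce_bits(op: str, v: list) -> int:
    """&->, |-> and ^-> applied to a bit array v : bit[n] (n >= 1 assumed)."""
    if not v:
        raise ValueError("reductions need at least one input bit here")
    gate = {"&->": and_, "|->": or_, "^->": xor}[op]
    return reduce(gate, v)


def collapse(op: str, a: list, b: list) -> int:
    """e1 && e2 == (|->e1) & (|->e2); likewise || with | and ^^ with ^."""
    outer = {"&&": and_, "||": or_, "^^": xor}[op]
    return outer(reduce_bits("|->", a), reduce_bits("|->", b))
```

Note that `^->` yields the parity of its input (1 when an odd number of bits are set), and that `&&` tests whether both operands are nonzero as a whole rather than combining them bit by bit.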
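$$\frac{t_1 \longrightarrow t_1'}{t_1 \mathbin{{<}{<}} t_2 \longrightarrow t_1' \mathbin{{<}{<}} t_2}\;(\text{E-SLL1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 \mathbin{{<}{<}} t_2 \longrightarrow v_1 \mathbin{{<}{<}} t_2'}\;(\text{E-SLL2})$$

$$\frac{t_1 \longrightarrow t_1'}{t_1 \mathbin{{>}{>}} t_2 \longrightarrow t_1' \mathbin{{>}{>}} t_2}\;(\text{E-SRL1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 \mathbin{{>}{>}} t_2 \longrightarrow v_1 \mathbin{{>}{>}} t_2'}\;(\text{E-SRL2})$$

$$\frac{t_1 \longrightarrow t_1'}{t_1 \mathbin{{>}{>}{>}} t_2 \longrightarrow t_1' \mathbin{{>}{>}{>}} t_2}\;(\text{E-SRA1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 \mathbin{{>}{>}{>}} t_2 \longrightarrow v_1 \mathbin{{>}{>}{>}} t_2'}\;(\text{E-SRA2})$$

$$\frac{t_1 \longrightarrow t_1'}{t_1 = t_2 \longrightarrow t_1' = t_2}\;(\text{E-EQ1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 = t_2 \longrightarrow v_1 = t_2'}\;(\text{E-EQ2})$$

$$v_1 = v_2 \longrightarrow \text{(subject to the semantics of the type)}\;(\text{E-EQ})$$

$$\frac{t_1 \longrightarrow t_1'}{t_1 \mathrel{{<}{>}} t_2 \longrightarrow t_1' \mathrel{{<}{>}} t_2}\;(\text{E-NEQ1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 \mathrel{{<}{>}} t_2 \longrightarrow v_1 \mathrel{{<}{>}} t_2'}\;(\text{E-NEQ2})$$

$$v_1 \mathrel{{<}{>}} v_2 \longrightarrow \text{(subject to the semantics of the type)}\;(\text{E-NEQ})$$

$$\frac{t_1 \longrightarrow t_1'}{t_1 < t_2 \longrightarrow t_1' < t_2}\;(\text{E-LT1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 < t_2 \longrightarrow v_1 < t_2'}\;(\text{E-LT2})$$

$$v_1 < v_2 \longrightarrow \text{(subject to the semantics of the type)}\;(\text{E-LT})$$

$$\frac{t_1 \longrightarrow t_1'}{t_1 \mathrel{{<}{=}} t_2 \longrightarrow t_1' \mathrel{{<}{=}} t_2}\;(\text{E-LEQ1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 \mathrel{{<}{=}} t_2 \longrightarrow v_1 \mathrel{{<}{=}} t_2'}\;(\text{E-LEQ2})$$

$$v_1 \mathrel{{<}{=}} v_2 \longrightarrow \text{(subject to the semantics of the type)}\;(\text{E-LEQ})$$

$$\frac{t_1 \longrightarrow t_1'}{t_1 > t_2 \longrightarrow t_1' > t_2}\;(\text{E-GT1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 > t_2 \longrightarrow v_1 > t_2'}\;(\text{E-GT2})$$

$$v_1 > v_2 \longrightarrow \text{(subject to the semantics of the type)}\;(\text{E-GT})$$

$$\frac{t_1 \longrightarrow t_1'}{t_1 \mathrel{{>}{=}} t_2 \longrightarrow t_1' \mathrel{{>}{=}} t_2}\;(\text{E-GEQ1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 \mathrel{{>}{=}} t_2 \longrightarrow v_1 \mathrel{{>}{=}} t_2'}\;(\text{E-GEQ2})$$

$$v_1 \mathrel{{>}{=}} v_2 \longrightarrow \text{(subject to the semantics of the type)}\;(\text{E-GEQ})$$

$$\frac{t_1 \longrightarrow t_1'}{t_1 \mathbin{::} t_2 \longrightarrow t_1' \mathbin{::} t_2}\;(\text{E-CONS1}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1 \mathbin{::} t_2 \longrightarrow v_1 \mathbin{::} t_2'}\;(\text{E-CONS2})$$

$$\#l_j\ \{l_i = v_i\}^{i \in 1..n} \longrightarrow v_j\;(\text{E-PROJ-RCD}) \qquad \frac{t_1 \longrightarrow t_1'}{\#l\ t_1 \longrightarrow \#l\ t_1'}\;(\text{E-PROJ})$$

$$\frac{t_j \longrightarrow t_j'}{\{l_i = v_i^{\,i \in 1..j-1},\ l_j = t_j,\ l_k = t_k^{\,k \in j+1..n}\} \longrightarrow \{l_i = v_i^{\,i \in 1..j-1},\ l_j = t_j',\ l_k = t_k^{\,k \in j+1..n}\}}\;(\text{E-RCD})$$

$$\{l_i = v_i^{\,i \in 1..j-1},\ l_j = v_j,\ l_k = t_k^{\,k \in j+1..n}\} \longrightarrow \{l_i = v_i^{\,i \in 1..j},\ l_k = t_k^{\,k \in j+1..n}\}\;(\text{E-RCDV})$$

$$\frac{t_1 \mid \mu \longrightarrow t_1' \mid \mu'}{\$t_1 \mid \mu \longrightarrow \$t_1' \mid \mu'}\;(\text{E-DEREF}) \qquad \frac{\mu(l) = v}{\$l \mid \mu \longrightarrow v \mid \mu}\;(\text{E-DEREFLOC})$$

$$\frac{t_1 \mid \mu \longrightarrow t_1' \mid \mu'}{t_1 \mathrel{:=} t_2 \mid \mu \longrightarrow t_1' \mathrel{:=} t_2 \mid \mu'}\;(\text{E-ASSIGN1}) \qquad \frac{t_2 \mid \mu \longrightarrow t_2' \mid \mu'}{l \mathrel{:=} t_2 \mid \mu \longrightarrow l \mathrel{:=} t_2' \mid \mu'}\;(\text{E-ASSIGN2})$$

$$l \mathrel{:=} v_1 \mid \mu \longrightarrow \mathtt{unit} \mid [l \mapsto v_1]\mu\;(\text{E-ASSIGN})$$

$$\frac{t_1 \mid \mu \longrightarrow t_1' \mid \mu'}{\mathtt{ref}\ t_1 \mid \mu \longrightarrow \mathtt{ref}\ t_1' \mid \mu'}\;(\text{E-REF}) \qquad \frac{l \notin \mathrm{dom}(\mu)}{\mathtt{ref}\ v_1 \mid \mu \longrightarrow l \mid (\mu, l \mapsto v_1)}\;(\text{E-REFV})$$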
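E-REFV, E-DEREFLOC and E-ASSIGN thread a store $\mu$, a finite map from locations to values, through evaluation. The Python sketch below models $\mu$ as a dict and locations as consecutive integers, which is one valid way to meet the side condition $l \notin \mathrm{dom}(\mu)$ of E-REFV; the `Store` class and its method names are hypothetical.

```python
class Store:
    """The store mu: a finite map from locations to values."""

    def __init__(self):
        self.mu = {}

    def ref(self, v):
        """E-REFV: ref v | mu --> l | (mu, l -> v) for a fresh l."""
        loc = len(self.mu)   # locations 0..len-1 are taken and never freed, so this one is fresh
        self.mu[loc] = v
        return loc

    def deref(self, loc):
        """E-DEREFLOC: $l | mu --> mu(l) | mu (the store is unchanged)."""
        return self.mu[loc]

    def assign(self, loc, v):
        """E-ASSIGN: l := v | mu --> unit | [l -> v]mu."""
        if loc not in self.mu:
            raise KeyError(f"dangling location {loc}")
        self.mu[loc] = v
        return ()                        # the unit value
```

Only E-REFV grows the domain of $\mu$; E-ASSIGN overwrites an existing binding and E-DEREFLOC leaves the store untouched, which is why the congruence rules E-DEREF and E-ASSIGN1/2 must carry $\mu'$ along.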
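$$\mathtt{let}\ x_1 = v_1\ (x_i = t_i)^{i \in 2..n}\ \mathtt{in}\ t_0\ \mathtt{end} \mid (\Gamma, \tau) \longrightarrow \mathtt{let}\ x_2 = t_2\ (x_i = t_i)^{i \in 3..n}\ \mathtt{in}\ t_0\ \mathtt{end} \mid (\Gamma, \tau, x_1 \mapsto v_1)\;(\text{E-LETV1})$$

$$\mathtt{let}\ x = v\ \mathtt{in}\ t\ \mathtt{end} \longrightarrow [x \mapsto v]\, t\;(\text{E-LETV2})$$

$$\frac{t_1 \longrightarrow t_1'}{\mathtt{let}\ x_1 = t_1\ (x_i = t_i)^{i \in 2..n}\ \mathtt{in}\ t_0\ \mathtt{end} \longrightarrow \mathtt{let}\ x_1 = t_1'\ (x_i = t_i)^{i \in 2..n}\ \mathtt{in}\ t_0\ \mathtt{end}}\;(\text{E-LET})$$

$$\frac{t_i \longrightarrow t_i'}{C_i\ t_i \longrightarrow C_i\ t_i'}\;(\text{E-DATATY})$$

$$\frac{t_0 \longrightarrow t_0'}{\mathtt{case}\ t_0\ \mathtt{of}\ (C_i\ x_i \Rightarrow t_i)^{i \in 1..n} \longrightarrow \mathtt{case}\ t_0'\ \mathtt{of}\ (C_i\ x_i \Rightarrow t_i)^{i \in 1..n}}\;(\text{E-CASE})$$

$$\mathtt{case}\ C_j\ v_j\ \mathtt{of}\ (C_i\ x_i \Rightarrow t_i)^{i \in 1..n} \longrightarrow [x_j \mapsto v_j]\, t_j\;(\text{E-CASE-TY})$$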
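Once E-CASE has reduced the scrutinee to a value, E-CASE-TY selects the branch whose constructor matches and substitutes the carried value for that branch's bound variable. The Python sketch below represents a datatype value $C_j\ v_j$ as a `(constructor, value)` pair and each branch as `(constructor, variable, body)`, with bodies written as Python functions of an environment as a stand-in for syntactic substitution; both encodings, and raising an error when no branch matches, are assumptions made to keep the example short.

```python
def eval_case(scrutinee, branches):
    """case C_j v_j of (C_i x_i => t_i) --> [x_j |-> v_j] t_j   (E-CASE-TY)."""
    ctor, value = scrutinee
    for c, var, body in branches:
        if c == ctor:
            return body({var: value})    # environment binding stands in for substitution
    raise ValueError(f"no branch for constructor {ctor}")


# A hypothetical option-like datatype with constructors None and Some.
branches = [
    ("None", "_", lambda env: 0),
    ("Some", "x", lambda env: env["x"] + 1),
]
```

Here `eval_case(("Some", 41), branches)` binds `x` to `41` and evaluates the `Some` branch, while `("None", ())` selects the first branch and ignores its payload.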
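$$\frac{v_1 \equiv \#[v_{1,i}]^{i \in 0..n-1}}{v_1[v_2] \longrightarrow v_{1,v_2}}\;(\text{E-ARR-ACC}) \qquad \frac{t_2 \longrightarrow t_2'}{v_1[t_2] \longrightarrow v_1[t_2']}\;(\text{E-ARR-ACC1})$$

$$\frac{t_1 \mid \omega \longrightarrow t_1' \mid \omega'}{\mathtt{sw}\ t_1 \mid \omega \longrightarrow \mathtt{sw}\ t_1' \mid \omega'}\;(\text{E-SW}) \qquad \frac{w \notin \mathrm{dom}(\omega)}{\mathtt{sw}\ v_1 \mid \omega \longrightarrow w \mid (\omega, w \mapsto v_1)}\;(\text{E-SWV})$$

$$\frac{t_1 \mid \omega \longrightarrow t_1' \mid \omega'}{\mathtt{unsw}\ t_1 \mid \omega \longrightarrow \mathtt{unsw}\ t_1' \mid \omega'}\;(\text{E-UNSW}) \qquad \frac{\omega(w) = v}{\mathtt{unsw}\ w \mid \omega \longrightarrow v \mid \omega}\;(\text{E-UNSWWRAP})$$

C Proofs
C.1 Inversion of Typing Relation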
C Proofs
C.1 Inversion of Typing Relation
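1. If $\Gamma \vdash x : R$, then $x : R \in \Gamma$.
2. If $\Gamma \vdash \lambda x : T_1.\, t_2 : R$, then $R = T_1 \to R_2$ for some $R_2$ with $\Gamma, x : T_1 \vdash t_2 : R_2$.
3. If $\Gamma \vdash t_1\, t_2 : R$, then there is some type $T_{11}$ such that $\Gamma \vdash t_1 : T_{11} \to R$ and $\Gamma \vdash t_2 : T_{11}$.
4. If $\langle \mathit{integer} \rangle : R$, then $R = \mathtt{int}$.
5. If $\langle \mathit{real} \rangle : R$, then $R = \mathtt{real}$.
6. If $\langle \mathit{string} \rangle : R$, then $R = \mathtt{string}$.
7. If `'b:0` $: R$, then $R = \mathtt{bit}$.
8. If `'b:1` $: R$, then $R = \mathtt{bit}$.
9. If $\mathtt{nil} : R$, then $R = T_S\ \mathtt{list}$.
10. If $() : R$, then $R = \mathtt{unit}$.
11. If ${\sim}t_1 : \mathtt{int}$, then $t_1 : \mathtt{int}$.
12. If $t_1 + t_2 : R$, then $R = \mathtt{int}$, $t_1 : \mathtt{int}$, and $t_2 : \mathtt{int}$.
13. If $t_1 - t_2 : R$, then $R = \mathtt{int}$, $t_1 : \mathtt{int}$, and $t_2 : \mathtt{int}$.
14. If $t_1 * t_2 : R$, then $R = \mathtt{int}$, $t_1 : \mathtt{int}$, and $t_2 : \mathtt{int}$.
15. If $t_1 / t_2 : R$, then $R = \mathtt{int}$, $t_1 : \mathtt{int}$, and $t_2 : \mathtt{int}$.
16. If $t_1 \mathbin{\%} t_2 : R$, then $R = \mathtt{int}$, $t_1 : \mathtt{int}$, and $t_2 : \mathtt{int}$.
17. If ${\sim}t_1 : \mathtt{real}$, then $t_1 : \mathtt{real}$.
18. If $t_1 \mathbin{+.} t_2 : R$, then $R = \mathtt{real}$, $t_1 : \mathtt{real}$, and $t_2 : \mathtt{real}$.
19. If $t_1 \mathbin{-.} t_2 : R$, then $R = \mathtt{real}$, $t_1 : \mathtt{real}$, and $t_2 : \mathtt{real}$.
20. If $t_1 \mathbin{*.} t_2 : R$, then $R = \mathtt{real}$, $t_1 : \mathtt{real}$, and $t_2 : \mathtt{real}$.
21. If $t_1 \mathbin{/.} t_2 : R$, then $R = \mathtt{real}$, $t_1 : \mathtt{real}$, and $t_2 : \mathtt{real}$.
22. If ${!}t_1 : R$, then $R = T_H$ and $t_1 : T_H$.
23. If $t_1 \mathbin{\&} t_2 : R$, then $R = T_H$, $t_1 : T_H$, and $t_2 : T_H$.
24. If $t_1 \mid t_2 : R$, then $R = T_H$, $t_1 : T_H$, and $t_2 : T_H$.
25. If $t_1 \mathbin{\hat{}} t_2 : R$, then $R = T_H$, $t_1 : T_H$, and $t_2 : T_H$.
26. If $\mathtt{\&{-}{>}}\,t_1 : R$, then $R = \mathtt{bit}$ and $t_1 : \mathtt{bit}[n]$.
27. If $\mathtt{|{-}{>}}\,t_1 : R$, then $R = \mathtt{bit}$ and $t_1 : \mathtt{bit}[n]$.
28. If $\mathtt{\hat{}{-}{>}}\,t_1 : R$, then $R = \mathtt{bit}$ and $t_1 : \mathtt{bit}[n]$.
29. If $t_1 \mathbin{\&\&} t_2 : R$, then $R = \mathtt{bit}$, $t_1 : \mathtt{bit}[n]$, and $t_2 : \mathtt{bit}[n]$.
30. If $t_1 \mathbin{||} t_2 : R$, then $R = \mathtt{bit}$, $t_1 : \mathtt{bit}[n]$, and $t_2 : \mathtt{bit}[n]$.
31. If $t_1 \mathbin{\hat{}\hat{}} t_2 : R$, then $R = \mathtt{bit}$, $t_1 : \mathtt{bit}[n]$, and $t_2 : \mathtt{bit}[n]$.
32. If $t_1 \mathbin{{<}{<}} t_2 : R$, then $R = \mathtt{bit}[n]$, $t_1 : \mathtt{bit}[n]$, and $t_2 : \mathtt{bit}[m]$.
33. If $t_1 \mathbin{{>}{>}} t_2 : R$, then $R = \mathtt{bit}[n]$, $t_1 : \mathtt{bit}[n]$, and $t_2 : \mathtt{bit}[m]$.
34. If $t_1 \mathbin{{>}{>}{>}} t_2 : R$, then $R = \mathtt{bit}[n]$, $t_1 : \mathtt{bit}[n]$, and $t_2 : \mathtt{bit}[m]$.
35. If $t_1 = t_2 : R$, then $R = \mathtt{int}$, $t_1 : T_S$, and $t_2 : T_S$.
36. If $t_1 \mathrel{{<}{>}} t_2 : R$, then $R = \mathtt{int}$, $t_1 : T_S$, and $t_2 : T_S$.
37. If $t_1 < t_2 : R$, then $R = \mathtt{int}$, $t_1 : T_S$, and $t_2 : T_S$.
38. If $t_1 \mathrel{{<}{=}} t_2 : R$, then $R = \mathtt{int}$, $t_1 : T_S$, and $t_2 : T_S$.
39. If $t_1 > t_2 : R$, then $R = \mathtt{int}$, $t_1 : T_S$, and $t_2 : T_S$.
40. If $t_1 \mathrel{{>}{=}} t_2 : R$, then $R = \mathtt{int}$, $t_1 : T_S$, and $t_2 : T_S$.
41. If $t_1\ \mathtt{andalso}\ t_2 : R$, then $R = \mathtt{int}$, $t_1 : \mathtt{int}$, and $t_2 : \mathtt{int}$.
42. If $t_1\ \mathtt{orelse}\ t_2 : R$, then $R = \mathtt{int}$, $t_1 : \mathtt{int}$, and $t_2 : \mathtt{int}$.
43. If $\mathtt{not}\ t_1 : R$, then $R = \mathtt{int}$ and $t_1 : \mathtt{int}$.
44. If $t_1 \mathbin{::} t_2 : R$, then $R = T_S\ \mathtt{list}$, $t_1 : T_S$, and $t_2 : T_S\ \mathtt{list}$.
45. If $\mathtt{if}\ t_1\ \mathtt{then}\ t_2\ \mathtt{else}\ t_3 : R$, then $t_1 : \mathtt{int}$, $t_2 : R$, and $t_3 : R$.
46. If $\mathtt{if}\ t_1\ \mathtt{then}\ t_2 : R$, then $R = \mathtt{unit}$, $t_1 : \mathtt{int}$, and $t_2 : \mathtt{unit}$.
47. If $\mathtt{ref}\ t_1 : R$, then $R = T_S\ \mathtt{ref}$ and $t_1 : T_S$.
48. If $t_1 \mathrel{:=} t_2 : R$, then $R = \mathtt{unit}$, $t_1 : T_S\ \mathtt{ref}$, and $t_2 : T_S$.
49. If $\$t_1 : R$, then $R = T_S$ and $t_1 : T_S\ \mathtt{ref}$.
50. If $\mathtt{sw}\ t_1 : R$, then $R = T_H\ \mathtt{sw}$ and $t_1 : T_H$.
51. If $t_1[t_2] : R$, then $R = T_H$, $t_1 : T_H[n]$, and $t_2 : \mathtt{int}$.
52. If $\{l_i = t_i\}^{i \in 1..n} : R$, then $R = \{l_i : T_i\}^{i \in 1..n}$ and $t_i : T_i$ for each $i$.
53. If $\#l_j\ t_1 : R$, then $R = T_j$ and $t_1 : \{l_i : T_i\}^{i \in 1..n}$.
54. If $\mathtt{let}\ x = t_1\ \mathtt{in}\ t_2\ \mathtt{end} : R$, then $R = T_2$, $\Gamma, \tau \vdash t_1 : T_1$, and $\Gamma, \tau, x : T_1 \vdash t_2 : T_2$.
55. If $C_j\ t : R$, then $R = \langle C_i : T_i \rangle^{i \in 1..n}$ and $t : T_j$.
56. If $\mathtt{case}\ t_0\ \mathtt{of}\ (C_i\ x_i \Rightarrow t_i)^{i \in 1..n} : R$, then $R = T$, $t_0 : \langle C_i : T_i \rangle^{i \in 1..n}$, and $\Gamma, x_i : T_i \vdash t_i : T$ for each $i$.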
C.2 Proof of Progress
Proof. By induction on a derivation of $t : T$.

Case T-INT, T-REAL, T-STRING, T-BIT, T-NIL:
Immediate, since $t$ is a value.

Case T-VAR:
Cannot occur, since $t$ must be closed by hypothesis.

Case T-ABS:
Immediate, since $t$ is a value.

Case T-APP:
$t = t_1\, t_2$, $\vdash t_1 : T_{11} \to T_{12}$, $\vdash t_2 : T_{11}$.
By the induction hypothesis, either $t_1$ is a value or else there is some $t_1'$ for which $t_1 \longrightarrow t_1'$, and likewise for $t_2$. If $t_1 \longrightarrow t_1'$, then by E-APP1, $t \longrightarrow t_1'\, t_2$. On the other hand, if $t_1$ is a value and $t_2 \longrightarrow t_2'$, then by E-APP2, $t \longrightarrow t_1\, t_2'$. Finally, if both $t_1$ and $t_2$ are values, then case 5 of the canonical forms lemma tells us that $t_1$ has the form $\lambda x : T_{11}.\, t_{12}$, and so by E-APPABS, $t \longrightarrow [x \mapsto t_2]\, t_{12}$.
Case T-INT-NEG:
$t = {\sim}t_1$, $t_1 : \mathtt{int}$.
By the induction hypothesis, either $t_1$ is a value or else there is some $t_1'$ such that $t_1 \longrightarrow t_1'$. If $t_1$ is a value, then case 1 of the canonical forms lemma assures us that it is an integer value as described in Section B.2, for which negation has a valid semantic meaning, yielding a value. On the other hand, if $t_1 \longrightarrow t_1'$, then by E-NEG, $t \longrightarrow {\sim}t_1'$.

Case T-INT-ADD:
$t = t_1 + t_2$, $t_1 : \mathtt{int}$, $t_2 : \mathtt{int}$.
By the induction hypothesis, either $t_1$ is a value or else there is some $t_1'$ such that $t_1 \longrightarrow t_1'$. If $t_1 \longrightarrow t_1'$, then by E-INT-ADD1, $t \longrightarrow t_1' + t_2$. On the other hand, if $t_1$ is a value $v_1$, then case 1 of the canonical forms lemma assures us that it is an integer value as described in Section B.2. Further, in this case, by the induction hypothesis, either $t_2$ is a value or else there is some $t_2'$ such that $t_2 \longrightarrow t_2'$. If $t_2 \longrightarrow t_2'$, then by E-INT-ADD2, $t \longrightarrow v_1 + t_2'$. On the other hand, if $t_2$ is a value, then case 1 of the canonical forms lemma assures us that it is an integer value as described in Section B.2. Thus both $t_1$ and $t_2$ are integer values, and addition has a well-defined semantic meaning, yielding a value.

Case T-INT-(SUB/MUL/DIV/MOD):
Similar to T-INT-ADD.

Case T-REAL-NEG:
$t = {\sim}t_1$, $t_1 : \mathtt{real}$.
By the induction hypothesis, either $t_1$ is a value or else there is some $t_1'$ such that $t_1 \longrightarrow t_1'$. If $t_1$ is a value, then case 2 of the canonical forms lemma assures us that it is a value in the domain of real numbers as described in Section B.2, for which negation has a valid semantic meaning, yielding a value. On the other hand, if $t_1 \longrightarrow t_1'$, then by E-NEG, $t \longrightarrow {\sim}t_1'$.

Case T-REAL-ADD:
$t = t_1 \mathbin{+.} t_2$, $t_1 : \mathtt{real}$, $t_2 : \mathtt{real}$.
By the induction hypothesis, either $t_1$ is a value or else there is some $t_1'$ such that $t_1 \longrightarrow t_1'$. If $t_1 \longrightarrow t_1'$, then by E-REAL-ADD1, $t \longrightarrow t_1' \mathbin{+.} t_2$. On the other hand, if $t_1$ is a value $v_1$, then case 2 of the canonical forms lemma assures us that it is a value in the domain of real numbers as described in Section B.2. Further, in this case, by the induction hypothesis, either $t_2$ is a value or else there is some $t_2'$ such that $t_2 \longrightarrow t_2'$. If $t_2 \longrightarrow t_2'$, then by E-REAL-ADD2, $t \longrightarrow v_1 \mathbin{+.} t_2'$. On the other hand, if $t_2$ is a value, then case 2 of the canonical forms lemma assures us that it is a value in the domain of real numbers as described in Section B.2. Thus both $t_1$ and $t_2$ are real values, and addition has a well-defined semantic meaning, yielding a value.

Case T-REAL-(SUB/MUL/DIV):
Similar to T-REAL-ADD.
Case T-BIT-NEG:
$t = {!}t_1$, $t_1 : T_H$.
By the induction hypothesis, either $t_1$ is a value or else there is some $t_1'$ such that $t_1 \longrightarrow t_1'$. If $t_1 \longrightarrow t_1'$, then by E-BIT-NEG1, $t \longrightarrow {!}t_1'$. On the other hand, if $t_1$ is a value, then it is a hardware value, and we can perform structural induction on its possible types. If it is a bit, then by E-BIT-NEG we produce a logical NOT gate. If it is a hardware array, then by E-BIT-NEG2 we move the negation inside and apply it to each element. Similarly, if it is a hardware record, then by E-BIT-NEG3 we move the negation inside and apply it to each field.

Case T-AND:
$t = t_1 \mathbin{\&} t_2$, $t_1 : T_H$, $t_2 : T_H$.
By the induction hypothesis, either $t_1$ is a value or else there is some $t_1'$ such that $t_1 \longrightarrow t_1'$. If $t_1 \longrightarrow t_1'$, then by E-AND1, $t \longrightarrow t_1' \mathbin{\&} t_2$. On the other hand, if $t_1$ is a value $v_1$, then it is a hardware value. Further, in this case, by the induction hypothesis, either $t_2$ is a value or else there is some $t_2'$ such that $t_2 \longrightarrow t_2'$. If $t_2 \longrightarrow t_2'$, then by E-AND2, $t \longrightarrow v_1 \mathbin{\&} t_2'$. On the other hand, if $t_2$ is a value, then it is a hardware value. If both $t_1$ and $t_2$ are hardware values, then we can perform structural induction on the possible types. If $T_H = \mathtt{bit}$, then by the derived-term definition, $t \longrightarrow \mathtt{\&{-}{>}}\,\#[t_1, t_2]$. If it is a hardware array, then by E-AND3 we move the operation inside and apply it to each pair of elements. Similarly, if it is a hardware record, then by E-AND4 we move the operation inside and apply it to each pair of fields.

Case T-(OR/XOR):
Similar to T-AND.

Case T-AND-RED:
$t = \mathtt{\&{-}{>}}\,t_1$, $t_1 : \mathtt{bit}[n]$.
By the induction hypothesis, either $t_1$ is a value or else there is some $t_1'$ such that $t_1 \longrightarrow t_1'$. If $t_1 \longrightarrow t_1'$, then by E-AND-RED1, $t \longrightarrow \mathtt{\&{-}{>}}\,t_1'$. On the other hand, if $t_1$ is a value, then it is a bit array and is evaluated by E-AND-RED.

Case T-(OR/XOR)-RED:
Similar to T-AND-RED.
Case T-SLL:
$t = t_1 \mathbin{{<}{<}} t_2$, $t_1 : \mathtt{bit}[n]$, $t_2 : \mathtt{bit}[m]$.
By the induction hypothesis, either $t_1$ is a value or else there is some $t_1'$ such that $t_1 \longrightarrow t_1'$. If $t_1 \longrightarrow t_1'$, then by E-SLL1, $t \longrightarrow t_1' \mathbin{{<}{<}} t_2$. On the other hand, if $t_1$ is a value $v_1$, then either $t_2$ is a value or else there is some $t_2'$ such that $t_2 \longrightarrow t_2'$. If $t_2 \longrightarrow t_2'$, then by E-SLL2, $t \longrightarrow v_1 \mathbin{{<}{<}} t_2'$. On the other hand, if $t_2$ is a value, then both $t_1$ and $t_2$ are values, and by the definition in Section ??, $t$ is a value.

Case T-(SRL/SRA):
Similar to T-SLL.

Case T-EQ:
$t = (t_1 = t_2)$, $t_1 : T_S$, $t_2 : T_S$.
By the induction hypothesis, either $t_1$ is a value or else there is some $t_1'$ such that $t_1 \longrightarrow t_1'$. If $t_1 \longrightarrow t_1'$, then by E-EQ1, $t \longrightarrow t_1' = t_2$. On the other hand, if $t_1$ is a value $v_1$, then it may be of any software type. Further, in this case, by the induction hypothesis, either $t_2$ is a value or else there is some $t_2'$ such that $t_2 \longrightarrow t_2'$. If $t_2 \longrightarrow t_2'$, then by E-EQ2, $t \longrightarrow v_1 = t_2'$. On the other hand, if $t_2$ is a value, then it may be of any software type. Both $t_1$ and $t_2$ are then values, and equality has a semantic meaning defined by the type, yielding a value.

Case T-(NEQ/LT/LEQ/GT/GEQ):
Similar to T-EQ.

Case T-NOT:
$t = \mathtt{not}\ t_1$, $t_1 : \mathtt{int}$.
By the derived-term definition in Section B.1, $t$ stands for $\mathtt{if}\ t_1\ \mathtt{then}\ 0\ \mathtt{else}\ 1$, which is typed by T-IFELSE since $t_1 : \mathtt{int}$ and both branches have type $\mathtt{int}$. Whether $t_1$ is a value or $t_1 \longrightarrow t_1'$, progress therefore follows as in the T-IFELSE case.

Case T-CONS:
$t = t_1 \mathbin{::} t_2$, $t_1 : T_S$, $t_2 : T_S\ \mathtt{list}$.
By the induction hypothesis, either $t_1$ is a value or else there is some $t_1'$ such that $t_1 \longrightarrow t_1'$. If $t_1 \longrightarrow t_1'$, then by E-CONS1, $t \longrightarrow t_1' \mathbin{::} t_2$. On the other hand, if $t_1$ is a value $v_1$, then either $t_2$ is a value or else there is some $t_2'$ such that $t_2 \longrightarrow t_2'$. If $t_2 \longrightarrow t_2'$, then by E-CONS2, $t \longrightarrow v_1 \mathbin{::} t_2'$. On the other hand, if $t_2$ is a value, then both $t_1$ and $t_2$ are values, and prepending an element to a list has well-defined semantics, yielding a value.
Case T-IFELSE:
$t = \mathtt{if}\ t_1\ \mathtt{then}\ t_2\ \mathtt{else}\ t_3$, $t_1 : \mathtt{int}$, $t_2 : T$, $t_3 : T$.
By the induction hypothesis, either $t_1$ is a value or else there is some $t_1'$ such that $t_1 \longrightarrow t_1'$. If $t_1 \longrightarrow t_1'$, then by E-IFELSE, $t \longrightarrow \mathtt{if}\ t_1'\ \mathtt{then}\ t_2\ \mathtt{else}\ t_3$. On the other hand, if $t_1$ is a value, then case 1 of the canonical forms lemma assures us that it is an integer value as described in Section B.2. Further, it is either zero or nonzero, so $t$ can be evaluated by E-IFELSE-F or E-IFELSE-T, respectively.

Case T-REF:
$t = \mathtt{ref}\ t_1$, $t_1 : T_S$.
By the induction hypothesis, either $t_1$ is a value or else there is some $t_1'$ such that $t_1 \longrightarrow t_1'$. If $t_1 \longrightarrow t_1'$, then by E-REF, $t \longrightarrow \mathtt{ref}\ t_1'$. On the other hand, if $t_1$ is a value $v_1$, then by E-REFV, $t \mid \mu \longrightarrow l \mid (\mu, l \mapsto v_1)$ for some $l \notin \mathrm{dom}(\mu)$.

Case T-SW:
$t = \mathtt{sw}\ t_1$, $t_1 : T_H$.
By the induction hypothesis, either $t_1$ is a value or else there is some $t_1'$ such that $t_1 \longrightarrow t_1'$. If $t_1 \longrightarrow t_1'$, then by E-SW, $t \longrightarrow \mathtt{sw}\ t_1'$. On the other hand, if $t_1$ is a value $v_1$, then by E-SWV, $t \mid \omega \longrightarrow w \mid (\omega, w \mapsto v_1)$ for some $w \notin \mathrm{dom}(\omega)$.

Case T-UNSW:
$t = \mathtt{unsw}\ t_1$, $t_1 : T_H\ \mathtt{sw}$.
By the induction hypothesis, either $t_1$ is a value or else there is some $t_1'$ such that $t_1 \longrightarrow t_1'$. If $t_1 \longrightarrow t_1'$, then by E-UNSW, $t \longrightarrow \mathtt{unsw}\ t_1'$. On the other hand, if $t_1$ is a value, then it is a wrapper $w$ with $\omega(w) = v$, and by E-UNSWWRAP, $t \mid \omega \longrightarrow v \mid \omega$.

Case T-ASSIGN:
$t = t_1 \mathrel{:=} t_2$, $t_1 : T_S\ \mathtt{ref}$, $t_2 : T_S$.
By the induction hypothesis, either $t_1$ is a value or else there is some $t_1'$ and $\mu'$ such that $t_1 \mid \mu \longrightarrow t_1' \mid \mu'$. If $t_1 \mid \mu \longrightarrow t_1' \mid \mu'$, then by E-ASSIGN1, $t \mid \mu \longrightarrow t_1' \mathrel{:=} t_2 \mid \mu'$. On the other hand, if $t_1$ is a value, then case 6 of the canonical forms lemma assures us that it is a location $l$, and by the induction hypothesis, either $t_2$ is a value or else there is some $t_2'$ and $\mu'$ such that $t_2 \mid \mu \longrightarrow t_2' \mid \mu'$. If $t_2 \mid \mu \longrightarrow t_2' \mid \mu'$, then by E-ASSIGN2, $t \mid \mu \longrightarrow l \mathrel{:=} t_2' \mid \mu'$. On the other hand, if $t_2$ is a value $v_2$, then both operands are values, and so by E-ASSIGN, $t \mid \mu \longrightarrow \mathtt{unit} \mid [l \mapsto v_2]\mu$.

Case T-DEREF:
$t = \$t_1$, $t_1 : T_S\ \mathtt{ref}$.
By the induction hypothesis, either $t_1$ is a value or else there is some $t_1'$ and $\mu'$ such that $t_1 \mid \mu \longrightarrow t_1' \mid \mu'$. If $t_1 \mid \mu \longrightarrow t_1' \mid \mu'$, then by E-DEREF, $t \mid \mu \longrightarrow \$t_1' \mid \mu'$. On the other hand, if $t_1$ is a value, then case 6 of the canonical forms lemma assures us that it is a location $l$ in the store $\mu$, and so, assuming that $\mu(l) = v$, by E-DEREFLOC, $t \mid \mu \longrightarrow v \mid \mu$.

Case T-ARR-ACC:
$t = t_1[t_2]$, $t_1 : T_H[n]$, $t_2 : \mathtt{int}$.
By the induction hypothesis, either $t_1$ is a value or else there is some $t_1'$ such that $t_1 \longrightarrow t_1'$. Since $t_1$ is of hardware type, it is a value by definition. By the induction hypothesis, either $t_2$ is a value or else there is some $t_2'$ such that $t_2 \longrightarrow t_2'$. If $t_2 \longrightarrow t_2'$, then by E-ARR-ACC1, $t \longrightarrow t_1[t_2']$. On the other hand, if $t_2$ is a value, then both $t_1$ and $t_2$ are values, and by E-ARR-ACC, $t \longrightarrow v_{1,v_2}$.
Case T-RCD:
$t = \{l_i = t_i\}^{i \in 1..n}$, and $t_i : T_i$ for each $i$.
By the induction hypothesis, for all $j$, either $t_j$ is a value or else there is some $t_j'$ such that $t_j \longrightarrow t_j'$. If $t_j \longrightarrow t_j'$, then by E-RCD, $t \longrightarrow \{l_i = v_i^{\,i \in 1..j-1},\ l_j = t_j',\ l_k = t_k^{\,k \in j+1..n}\}$. On the other hand, if $t_j$ is a value, then by E-RCDV, $t \longrightarrow \{l_i = v_i^{\,i \in 1..j},\ l_k = t_k^{\,k \in j+1..n}\}$.

Case T-PROJ:
$t = \#l_j\ t_1$, $t_1 : \{l_i : T_i\}^{i \in 1..n}$.
By the induction hypothesis, either $t_1$ is a value or else there is some $t_1'$ such that $t_1 \longrightarrow t_1'$. If $t_1 \longrightarrow t_1'$, then by E-PROJ, $t \longrightarrow \#l_j\ t_1'$. On the other hand, if $t_1$ is a value, then case 7 of the canonical forms lemma assures us that it is a record value of the form $\{l_i = v_i\}^{i \in 1..n}$, and so by E-PROJ-RCD, $t \longrightarrow v_j$.

Case T-LET:
$t = \mathtt{let}\ x_1 = t_1\ (x_i = t_i)^{i \in 2..n}\ \mathtt{in}\ t_0\ \mathtt{end}$, $t_1 : T_1$.
By the induction hypothesis, $t_1$ is either a value or else there is some $t_1'$ such that $t_1 \longrightarrow t_1'$. If $t_1 \longrightarrow t_1'$, then by E-LET, $t \longrightarrow \mathtt{let}\ x_1 = t_1'\ (x_i = t_i)^{i \in 2..n}\ \mathtt{in}\ t_0\ \mathtt{end}$. On the other hand, if $t_1$ is a value $v_1$, then by the definition of let-bindings $n \geq 1$. If $n = 1$, then by E-LETV2, $t \longrightarrow [x_1 \mapsto v_1]\, t_0$. On the other hand, if $n > 1$, then by E-LETV1, $t \longrightarrow \mathtt{let}\ x_2 = t_2\ (x_i = t_i)^{i \in 3..n}\ \mathtt{in}\ t_0\ \mathtt{end}$.

Case T-DATATY:
$t = C_j\ t_1$, $t_1 : T_j$.
By the induction hypothesis, $t_1$ is either a value or else there is some $t_1'$ such that $t_1 \longrightarrow t_1'$. If $t_1 \longrightarrow t_1'$, then by E-DATATY, $t \longrightarrow C_j\ t_1'$. On the other hand, if $t_1$ is a value, then case 8 of the canonical forms lemma assures us that $t$ is a datatype value of the form $\langle C_j = t_1 \rangle$.

Case T-CASE:
$t = \mathtt{case}\ t_0\ \mathtt{of}\ (C_i\ x_i \Rightarrow t_i)^{i \in 1..n}$, $t_0 : \langle C_i : T_i \rangle^{i \in 1..n}$.
By the induction hypothesis, $t_0$ is either a value or else there is some $t_0'$ such that $t_0 \longrightarrow t_0'$. If $t_0 \longrightarrow t_0'$, then by E-CASE, $t \longrightarrow \mathtt{case}\ t_0'\ \mathtt{of}\ (C_i\ x_i \Rightarrow t_i)^{i \in 1..n}$. On the other hand, if $t_0$ is a value, then case 8 of the canonical forms lemma assures us that it is a datatype value of the form $\langle C_j = v_j \rangle$, and so by E-CASE-TY, $t \longrightarrow [x_j \mapsto v_j]\, t_j$.
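Progress and preservation can also be spot-checked mechanically on a small fragment. The Python sketch below covers only integer and real literals, `+`, `+.` and `if` (rules T-INT, T-REAL, T-INT-ADD, T-REAL-ADD, T-IFELSE and their evaluation rules), enumerates all closed terms up to a given nesting depth, and asserts for every well-typed one that it is a value or can step, and that each step keeps its type. It is a bounded test under those assumptions, not a proof, and all names in it are hypothetical.

```python
from itertools import product


def typeof(t):
    """T-INT, T-REAL, T-INT-ADD, T-REAL-ADD and T-IFELSE; None means ill-typed."""
    if isinstance(t, int):
        return "int"
    if isinstance(t, float):
        return "real"
    if t[0] in ("+", "+."):
        want = "int" if t[0] == "+" else "real"
        return want if typeof(t[1]) == want and typeof(t[2]) == want else None
    if t[0] == "if":
        branch = typeof(t[2])
        ok = typeof(t[1]) == "int" and branch is not None and branch == typeof(t[3])
        return branch if ok else None
    return None


def is_value(t):
    return isinstance(t, (int, float))


def step(t):
    """Left-to-right small steps (E-*1, E-*2, E-IFELSE, E-IFELSE-T/F)."""
    if t[0] in ("+", "+."):
        if not is_value(t[1]):
            return (t[0], step(t[1]), t[2])
        if not is_value(t[2]):
            return (t[0], t[1], step(t[2]))
        return t[1] + t[2]
    if t[0] == "if":
        if not is_value(t[1]):
            return ("if", step(t[1]), t[2], t[3])
        return t[2] if t[1] != 0 else t[3]
    raise RuntimeError(f"stuck: {t!r}")


def terms(depth):
    """All terms of nesting depth <= depth over the literals 0, 1 and 1.5."""
    if depth == 0:
        return [0, 1, 1.5]
    smaller = terms(depth - 1)
    out = list(smaller)
    for a, b in product(smaller, repeat=2):
        out += [("+", a, b), ("+.", a, b)]
    for c, a, b in product(smaller, repeat=3):
        out.append(("if", c, a, b))
    return out


def check(depth):
    """Assert progress and preservation for every well-typed term up to depth."""
    checked = 0
    for t in terms(depth):
        ty = typeof(t)
        if ty is None:
            continue                       # both theorems speak only of well-typed terms
        while not is_value(t):             # progress: step() must not get stuck
            t2 = step(t)
            assert typeof(t2) == ty, (t, t2)   # preservation
            t = t2
        checked += 1
    return checked
```

`check(1)` exercises the 18 well-typed terms of depth at most one; `check(2)` enumerates roughly 115,000 terms and takes correspondingly longer. Ill-typed terms such as `("+", 1, 1.5)` are skipped, since neither theorem makes a claim about them.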
C.3 Proof of Preservation
Proof. By induction on a derivation of t : T. At each
step of the induction, we assume that the desired prop-
erty holds for all subderivations (i.e. that if s : S and
s −→ s′, then s′ : S, whenever s : S is proved by a sub-
derivation of the present one) and proceed by case anal-
ysis on the final rule in the derivation.
Case T-VAR:
t = x
x : T ∈ Γ
If the last rule in the derivation is T-VAR, then
we know from the form of this rule that t must
be a variable of type T. Thus, t is a value, so it
cannot be the case that t −→ t′ for any t′, and
the requirements of the theorem are vacuously
satisfied.
Case T-ABS:
t = λx : T1.t2
If the last rule in the derivation is T-ABS, then
we know from the form of this rule that t must
be an abstraction. Thus, t is a value, so it can-
not be the case that t −→ t′ for any t′, and
the requirements of the theorem are vacuously
satisfied.
Case T-APP:
t = t1 t2
Γ ⊢ t1 : T11 → T12
Γ ⊢ t2 : T11
T = T12
Looking at the evaluation rules with applica-
tions on the left-hand side, we find that there
are three rules by which t −→ t′ can be de-
rived: E-APP1, E-APP2, and E-APPABS. We
consider each case separately.
Subcase E-APP1:
t1 −→ t′1
t′ = t′1 t2
From the assumptions of the T-APP case, we have a subderivation of the original typing derivation whose conclusion is Γ ⊢ t1 : T11 → T12. We can apply the induction hypothesis to this subderivation, obtaining Γ ⊢ t′1 : T11 → T12. Combining this with the fact that Γ ⊢ t2 : T11, we can apply rule T-APP to conclude that Γ ⊢ t′ : T.
Subcase E-APP2:
Similar to E-APP1.
Subcase E-APPABS:
t1 = λx : T11.t12
t2 = v2
t′ = [x ↦ v2]t12
Using the inversion lemma, we can deconstruct the typing derivation for λx : T11.t12, yielding Γ, x : T11 ⊢ t12 : T12. Since Γ ⊢ v2 : T11, the substitution lemma then gives Γ ⊢ [x ↦ v2]t12 : T12, that is, Γ ⊢ t′ : T12.
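To make the T-APP case concrete, the following Standard ML sketch (SML being the implementation language of Appendix D) encodes a hypothetical call-by-value fragment with integers, abstractions, and applications, and checks after every step that a closed term keeps its type. The datatypes and the names typeOf, subst, step, and runChecked are illustrative assumptions, not part of the GEMINI compiler. Substitution here does not rename bound variables, which is safe only because the values substituted during closed evaluation are themselves closed.

```sml
(* Hypothetical mini-language (not GEMINI's own AST): a call-by-value
   lambda calculus with integers, used to illustrate the T-APP case. *)
datatype ty = TInt | TArrow of ty * ty

datatype term =
    Var of string
  | Lit of int
  | Lam of string * ty * term          (* \x : T11. t12 *)
  | App of term * term                 (* t1 t2 *)

fun isValue (Lit _) = true
  | isValue (Lam _) = true
  | isValue _ = false

(* typeOf ctx t = SOME T exactly when ctx |- t : T *)
fun typeOf ctx (Var x) =
      (case List.find (fn (y, _) => y = x) ctx of
           SOME (_, tx) => SOME tx
         | NONE => NONE)
  | typeOf _ (Lit _) = SOME TInt
  | typeOf ctx (Lam (x, t1, body)) =
      (case typeOf ((x, t1) :: ctx) body of
           SOME t2 => SOME (TArrow (t1, t2))
         | NONE => NONE)
  | typeOf ctx (App (f, a)) =
      (case (typeOf ctx f, typeOf ctx a) of
           (SOME (TArrow (t11, t12)), SOME t2) =>
             if t11 = t2 then SOME t12 else NONE
         | _ => NONE)

(* [x |-> v]t; v is closed during closed evaluation, so no capture occurs *)
fun subst x v (Var y) = if x = y then v else Var y
  | subst _ _ (Lit n) = Lit n
  | subst x v (Lam (y, ty1, body)) =
      if x = y then Lam (y, ty1, body) else Lam (y, ty1, subst x v body)
  | subst x v (App (f, a)) = App (subst x v f, subst x v a)

(* One evaluation step; NONE when t is a value or stuck *)
fun step (App (f, a)) =
      if not (isValue f) then
        Option.map (fn f' => App (f', a)) (step f)      (* E-APP1 *)
      else if not (isValue a) then
        Option.map (fn a' => App (f, a')) (step a)      (* E-APP2 *)
      else
        (case f of
             Lam (x, _, body) => SOME (subst x a body)  (* E-APPABS *)
           | _ => NONE)
  | step _ = NONE

(* Evaluate to a normal form, raising Fail if any step changes the type *)
fun runChecked t =
  let val ty0 = typeOf [] t
  in
    case step t of
        NONE => t
      | SOME t' =>
          if typeOf [] t' = ty0 then runChecked t'
          else raise Fail "preservation violated"
  end

val idInt = Lam ("x", TInt, Var "x")
val konst = Lam ("y", TInt, Lam ("z", TInt, Var "y"))
val example = App (App (konst, Lit 1), App (idInt, Lit 2))
val result = runChecked example     (* Lit 1 *)
```

Evaluating example takes three steps whose outermost rules are E-APP1, E-APP2, and finally E-APPABS; every intermediate term has type TInt, as the T-APP case predicts.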
Case T-INT:
t = 〈integer〉
T = int
If the last rule in the derivation is T-INT, then we know from the form of this rule that t must be an integer value as described in Section B.2 and that T is int. Thus, t is a value, so it cannot be the case that t −→ t′ for any t′, and the requirements of the theorem are vacuously satisfied.
Case T-REAL, T-STRING, T-BIT, T-NIL:
Similar to T-INT.
Case T-INT-NEG:
t = ~t1
T = int
If the last rule in the derivation is T-INT-NEG, then we know from the form of this rule that t must have the form ~t1 and that T is int. We must further have a subderivation with conclusion t1 : int. Looking at the evaluation rules with integer negation on the left-hand side, there is one rule by which t −→ t′ can be derived: E-NEG.
Subcase E-NEG:
t1 −→ t′1
t′ = ~t′1
From the assumptions of the T-INT-NEG case, we have a subderivation of the original typing derivation with conclusion t1 : int. We may apply the induction hypothesis to this subderivation, obtaining t′1 : int. Then, by T-INT-NEG, ~t′1 : int and so t′ : int.
Case T-INT-ADD:
t = t1 + t2
T = int
If the last rule in the derivation is T-INT-ADD, then we know from the form of this rule that t must have the form t1 + t2 and that T is int. We must further have subderivations with conclusions t1 : int and t2 : int. Looking at the evaluation rules with integer addition on the left-hand side, there are two rules by which t −→ t′ can be derived: E-INT-ADD1 and E-INT-ADD2.
Subcase E-INT-ADD1:
t1 −→ t′1
t′ = t′1 + t2
From the assumptions of the T-INT-ADD case, we have a subderivation of the original typing derivation with conclusion t1 : int. We may apply the induction hypothesis to this subderivation, obtaining t′1 : int. We also have a subderivation with conclusion t2 : int. Thus, by T-INT-ADD, t′1 + t2 : int and so t′ : int.
Subcase E-INT-ADD2:
t2 −→ t′2
t′ = v1 + t′2
From the assumptions of the T-INT-ADD case, we have a subderivation of the original typing derivation with conclusion t2 : int. We may apply the induction hypothesis to this subderivation, obtaining t′2 : int. We also have a subderivation with conclusion t1 : int, and since v1 is the value form denotation of t1, it also holds that v1 : int. Thus, by T-INT-ADD, v1 + t′2 : int and so t′ : int.
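The T-INT-NEG and T-INT-ADD cases can be checked mechanically on a small hypothetical fragment. In this Standard ML sketch the congruence steps mirror E-NEG, E-INT-ADD1, and E-INT-ADD2; the rules that actually compute ~n and n1 + n2 are not named in this section, so they appear below as assumptions. A string literal is included only so that the type checker has something to reject; all names are illustrative.

```sml
(* Hypothetical fragment with integer negation and addition *)
datatype ty = TInt | TString

datatype term =
    IntLit of int
  | StrLit of string
  | Neg of term              (* ~t1 *)
  | Add of term * term       (* t1 + t2 *)

fun typeOf (IntLit _) = SOME TInt
  | typeOf (StrLit _) = SOME TString
  | typeOf (Neg t1) =                                    (* T-INT-NEG *)
      if typeOf t1 = SOME TInt then SOME TInt else NONE
  | typeOf (Add (t1, t2)) =                              (* T-INT-ADD *)
      if typeOf t1 = SOME TInt andalso typeOf t2 = SOME TInt
      then SOME TInt else NONE

fun isValue (IntLit _) = true
  | isValue (StrLit _) = true
  | isValue _ = false

fun step (Neg (IntLit n)) = SOME (IntLit (~n))           (* assumed compute rule *)
  | step (Neg t1) = Option.map Neg (step t1)             (* E-NEG *)
  | step (Add (IntLit n1, IntLit n2)) =
      SOME (IntLit (n1 + n2))                            (* assumed compute rule *)
  | step (Add (t1, t2)) =
      if not (isValue t1)
      then Option.map (fn t1' => Add (t1', t2)) (step t1)    (* E-INT-ADD1 *)
      else Option.map (fn t2' => Add (t1, t2')) (step t2)    (* E-INT-ADD2 *)
  | step _ = NONE

(* Evaluate to a normal form, raising Fail if any step changes the type *)
fun runChecked t =
  let val ty0 = typeOf t
  in
    case step t of
        NONE => t
      | SOME t' =>
          if typeOf t' = ty0 then runChecked t'
          else raise Fail "preservation violated"
  end

val example = Add (Neg (IntLit 3), Add (IntLit 4, IntLit 5))
val result = runChecked example      (* IntLit 6 *)
```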
Case T-INT-(SUB/MUL/DIV/MOD):
Similar to T-INT-ADD.
Case T-REAL-(ADD/SUB/MUL/DIV):
Similar to T-INT-ADD.
Case T-BIT-NEG:
t = !t1
T = TH
If the last rule in the derivation is T-BIT-NEG, then we know from the form of this rule that t must have the form !t1 and that T is TH. We must further have a subderivation with conclusion t1 : TH. Looking at the evaluation rules, there is only one rule by which t −→ t′ can be derived: E-BIT-NEG1. We may apply the induction hypothesis to this subderivation, obtaining t′1 : TH. Then, by T-BIT-NEG, !t′1 : TH and so t′ : TH.
Case T-AND:
t = t1 & t2
T = TH
If the last rule in the derivation is T-AND, then we know from the form of this rule that t must have the form t1 & t2 and that T is TH. We must further have subderivations with conclusions t1 : TH and t2 : TH. Looking at the evaluation rules with & on the left-hand side, there are two rules by which t −→ t′ can be derived: E-AND1 and E-AND2.
Subcase E-AND1:
t1 −→ t′1
t′ = t′1 & t2
From the assumptions of the T-AND case, we have a subderivation of the original typing derivation with conclusion t1 : TH. We may apply the induction hypothesis to this subderivation, obtaining t′1 : TH. We also have a subderivation with conclusion t2 : TH. Thus, by T-AND, t′1 & t2 : TH and so t′ : TH.
Subcase E-AND2:
t2 −→ t′2
t′ = v1 & t′2
From the assumptions of the T-AND case, we have a subderivation of the original typing derivation with conclusion t2 : TH. We may apply the induction hypothesis to this subderivation, obtaining t′2 : TH. We also have a subderivation with conclusion t1 : TH, and since v1 is the value form denotation of t1, it also holds that v1 : TH. Thus, by T-AND, v1 & t′2 : TH and so t′ : TH.
Case T-(OR/XOR):
Similar to T-AND.
Case T-AND-RED:
t = &->t1
T = bit[n]
If the last rule in the derivation is T-AND-RED, then we know from the form of this rule that t must have the form &->t1 and that T is bit[n]. Looking at the evaluation rules, there is only one rule by which t −→ t′ can be derived: E-AND-RED1. We may apply the induction hypothesis to the subderivation with conclusion t1 : TH, obtaining t′1 : TH. Then, by T-AND-RED, &->t′1 : bit[n] and so t′ : bit[n].
Case T-(OR/XOR)-RED:
Similar to T-AND-RED.
Case T-SLL:
t = t1 << t2
T = bit[n]
If the last rule in the derivation is T-SLL, then we know from the form of this rule that t must have the form t1 << t2 and that T is bit[n]. We must further have subderivations with conclusions t1 : bit[n] and t2 : bit[m]. Looking at the evaluation rules with left-shifting on the left-hand side, there are two rules by which t −→ t′ can be derived: E-SLL1 and E-SLL2.
Subcase E-SLL1:
t1 −→ t′1
t′ = t′1 << t2
From the assumptions of the T-SLL case, we have a subderivation of the original typing derivation with conclusion t1 : bit[n]. We may apply the induction hypothesis to this subderivation, obtaining t′1 : bit[n]. We also have a subderivation with conclusion t2 : bit[m]. Thus, by T-SLL, t′1 << t2 : bit[n] and so t′ : bit[n].
Subcase E-SLL2:
t2 −→ t′2
t′ = v1 << t′2
From the assumptions of the T-SLL case, we have a subderivation of the original typing derivation with conclusion t2 : bit[m]. We may apply the induction hypothesis to this subderivation, obtaining t′2 : bit[m]. We also have a subderivation with conclusion t1 : bit[n], and since v1 is the value form denotation of t1, it also holds that v1 : bit[n]. Thus, by T-SLL, v1 << t′2 : bit[n] and so t′ : bit[n].
Case T-(SRL/SRA):
Similar to T-SLL.
Case T-EQ:
t = t1 = t2
T = int
If the last rule in the derivation is T-EQ, then we know from the form of this rule that t must have the form t1 = t2 and that T is int. We must further have subderivations with conclusions t1 : TS and t2 : TS for some TS. Looking at the evaluation rules with equality on the left-hand side, there are two rules by which t −→ t′ can be derived: E-EQ1 and E-EQ2.
Subcase E-EQ1:
t1 −→ t′1
t′ = t′1 = t2
From the assumptions of the T-EQ case, we have a subderivation of the original typing derivation with conclusion t1 : TS. We may apply the induction hypothesis to this subderivation, obtaining t′1 : TS. We also have a subderivation with conclusion t2 : TS. Thus, by T-EQ, t′1 = t2 : int and so t′ : int.
Subcase E-EQ2:
t2 −→ t′2
t′ = v1 = t′2
From the assumptions of the T-EQ case, we have a subderivation of the original typing derivation with conclusion t2 : TS. We may apply the induction hypothesis to this subderivation, obtaining t′2 : TS. We also have a subderivation with conclusion t1 : TS, and since v1 is the value form denotation of t1, it also holds that v1 : TS. Thus, by T-EQ, v1 = t′2 : int and so t′ : int.
Case T-(NEQ/LT/LEQ/GT/GEQ):
Similar to T-EQ.
Case T-CONS:
t = t1::t2
T = TS list
If the last rule in the derivation is T-CONS, then we know from the form of this rule that t must have the form t1::t2 and that T is TS list. We must further have subderivations with conclusions t1 : TS and t2 : TS list for some TS. Looking at the evaluation rules with the cons operator on the left-hand side, there are two rules by which t −→ t′ can be derived: E-CONS1 and E-CONS2.
Subcase E-CONS1:
t1 −→ t′1
t′ = t′1::t2
From the assumptions of the T-CONS case, we have a subderivation of the original typing derivation with conclusion t1 : TS. We may apply the induction hypothesis to this subderivation, obtaining t′1 : TS. We also have a subderivation with conclusion t2 : TS list. Thus, by T-CONS, t′1::t2 : TS list and so t′ : TS list.
Subcase E-CONS2:
t2 −→ t′2
t′ = v1::t′2
From the assumptions of the T-CONS case, we have a subderivation of the original typing derivation with conclusion t2 : TS list. We may apply the induction hypothesis to this subderivation, obtaining t′2 : TS list. We also have a subderivation with conclusion t1 : TS, and since v1 is the value form denotation of t1, it also holds that v1 : TS. Thus, by T-CONS, v1::t′2 : TS list and so t′ : TS list.
Case T-IFELSE:
t = if t1 then t2 else t3
T = T0
If the last rule in the derivation is T-IFELSE, then we know from the form of this rule that t must have the form if t1 then t2 else t3 and that T is T0 for some T0. We must further have subderivations with conclusions t1 : int, t2 : T0 and t3 : T0. Looking at the evaluation rules with a conditional on the left-hand side, there are three rules by which t −→ t′ can be derived: E-IFELSE, E-IFELSE-T, and E-IFELSE-F.
Subcase E-IFELSE:
t1 −→ t′1
t′ = if t′1 then t2 else t3
From the assumptions of the T-IFELSE case, we have a subderivation of the original typing derivation with conclusion t1 : int. We may apply the induction hypothesis to this subderivation, obtaining t′1 : int. We also have subderivations with conclusions t2 : T0 and t3 : T0. Thus, by T-IFELSE, if t′1 then t2 else t3 : T0 and so t′ : T0.
Subcase E-IFELSE-T:
t1 ≠ 0
t′ = t2
If t −→ t′ is derived using E-IFELSE-T, then from the form of this rule we see that t1 ≠ 0 and the resulting term t′ is the second subexpression t2. This means we are finished, since we know (by the assumptions of the T-IFELSE case) that t2 : T0, which is what we need.
Subcase E-IFELSE-F:
Similar to E-IFELSE-T.
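The T-EQ and T-IFELSE cases admit the same kind of executable check. The hypothetical Standard ML fragment below treats any nonzero int condition as true, as in E-IFELSE-T. It additionally assumes that a comparison evaluates to the integer 1 when its operands are equal and to 0 otherwise; that encoding is a guess consistent with T-EQ assigning the type int, not a rule quoted from the paper.

```sml
(* Hypothetical fragment with equality and if-then-else *)
datatype ty = TInt | TString

datatype term =
    IntLit of int
  | StrLit of string
  | Eq of term * term                 (* t1 = t2 *)
  | If of term * term * term          (* if t1 then t2 else t3 *)

fun typeOf (IntLit _) = SOME TInt
  | typeOf (StrLit _) = SOME TString
  | typeOf (Eq (t1, t2)) =                                   (* T-EQ *)
      (case (typeOf t1, typeOf t2) of
           (SOME a, SOME b) => if a = b then SOME TInt else NONE
         | _ => NONE)
  | typeOf (If (t1, t2, t3)) =                               (* T-IFELSE *)
      (case (typeOf t1, typeOf t2, typeOf t3) of
           (SOME TInt, SOME a, SOME b) => if a = b then SOME a else NONE
         | _ => NONE)

fun isValue (IntLit _) = true
  | isValue (StrLit _) = true
  | isValue _ = false

fun step (Eq (t1, t2)) =
      if not (isValue t1)
      then Option.map (fn t1' => Eq (t1', t2)) (step t1)          (* E-EQ1 *)
      else if not (isValue t2)
      then Option.map (fn t2' => Eq (t1, t2')) (step t2)          (* E-EQ2 *)
      else SOME (IntLit (if t1 = t2 then 1 else 0))   (* assumed: 1 = true, 0 = false *)
  | step (If (IntLit n, t2, t3)) =
      SOME (if n <> 0 then t2 else t3)                (* E-IFELSE-T / E-IFELSE-F *)
  | step (If (t1, t2, t3)) =
      Option.map (fn t1' => If (t1', t2, t3)) (step t1)           (* E-IFELSE *)
  | step _ = NONE

(* Evaluate to a normal form, raising Fail if any step changes the type *)
fun runChecked t =
  let val ty0 = typeOf t
  in
    case step t of
        NONE => t
      | SOME t' =>
          if typeOf t' = ty0 then runChecked t'
          else raise Fail "preservation violated"
  end

val pick = If (Eq (IntLit 2, IntLit 2), StrLit "yes", StrLit "no")
val other = If (Eq (StrLit "a", StrLit "b"), IntLit 10, IntLit 20)
val r1 = runChecked pick      (* StrLit yes *)
val r2 = runChecked other     (* IntLit 20 *)
```

Note that the discarded branch never needs to be evaluated: its type is only consulted by typeOf, which is exactly why the E-IFELSE-T subcase closes immediately.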
Case T-REF:
t = ref t1 | µ
T = TS ref
If the last rule in the derivation is T-REF, then we know from the form of this rule that t must have the form ref t1 and that T is TS ref for some TS. We must further have a subderivation with conclusion t1 : TS. Looking at the evaluation rules with ref on the left-hand side, there are two rules by which t −→ t′ can be derived: E-REF and E-REFV.
Subcase E-REF:
t1 | µ −→ t′1 | µ′
t′ = ref t′1 | µ′
From the assumptions of the T-REF case, we have a subderivation of the original typing derivation with conclusion t1 : TS. We may apply the induction hypothesis to this subderivation, obtaining t′1 : TS. Thus, by T-REF, ref t′1 | µ′ : TS ref and so t′ : TS ref.
Subcase E-REFV:
t1 = v1
t′ = l | (µ, l ↦ v1)
If t −→ t′ is derived using E-REFV, then from the form of this rule we see that t1 = v1 and the resulting term t′ is a location l in the store µ augmented with a mapping between the location and v1. We know that locations in store µ are of type T ref, where T is the type of the value to which they are mapped. Since v1 : TS, the location l is of type TS ref, which is what we need.
Case T-ASSIGN:
t = t1 := t2
T = unit
If the last rule in the derivation is T-ASSIGN, then we know from the form of this rule that t must have the form t1 := t2 and that T is unit. We must further have subderivations with conclusions t1 : TS ref and t2 : TS for some TS. Looking at the evaluation rules with the assignment operator on the left-hand side, there are three rules by which t −→ t′ can be derived: E-ASSIGN1, E-ASSIGN2, and E-ASSIGN.
Subcase E-ASSIGN1:
t1 | µ −→ t′1 | µ′
t′ = t′1 := t2 | µ′
From the assumptions of the T-ASSIGN case, we have subderivations of the original typing derivation with conclusions t1 : TS ref and t2 : TS. We may apply the induction hypothesis to the first subderivation, obtaining t′1 : TS ref. Thus, by T-ASSIGN, t′1 := t2 : unit and so t′ : unit.
Subcase E-ASSIGN2:
t2 | µ −→ t′2 | µ′
t′ = v1 := t′2 | µ′
From the assumptions of the T-ASSIGN case, we have subderivations of the original typing derivation with conclusions v1 : TS ref and t2 : TS. We may apply the induction hypothesis to the second subderivation, obtaining t′2 : TS. Thus, by T-ASSIGN, v1 := t′2 : unit and so t′ : unit.
Subcase E-ASSIGN:
t1 = l
t2 = v2
t′ = unit | [l ↦ v2]µ
Immediate, since t′ = unit. The updated store also still agrees with the type of l, because v2 : TS and l : TS ref.
Case T-DEREF:
t = $t1
T = TS
If the last rule in the derivation is T-DEREF, then we know from the form of this rule that t must have the form $t1 and that T is TS. We must further have a subderivation with conclusion t1 : TS ref for some TS. Looking at the evaluation rules with the dereferencing operator on the left-hand side, there are two rules by which t −→ t′ can be derived: E-DEREF and E-DEREFLOC.
Subcase E-DEREF:
t1 | µ −→ t′1 | µ′
t′ = $t′1 | µ′
From the assumptions of the T-DEREF case, we have a subderivation of the original typing derivation with conclusion t1 : TS ref. We may apply the induction hypothesis, obtaining t′1 : TS ref. Thus, by T-DEREF, $t′1 : TS and so t′ : TS.
Subcase E-DEREFLOC:
t1 = l | µ
t′ = $l | µ
From the assumptions of the T-DEREF case, we have a subderivation of the original typing derivation with conclusion t1 : TS ref. In this subcase t1 is the location value l, so l has type TS ref. Thus, by T-DEREF, $l : TS and so t′ : TS.
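The store-based cases (T-REF, T-ASSIGN, T-DEREF) differ from the others because a step may also change µ. In the hypothetical Standard ML sketch below, the store is a list indexed by location and, following the E-REFV subcase, the type of a location is read off the value stored there. Dereferencing a location is assumed to return µ(l); the E-DEREFLOC premises above are not detailed enough to confirm that form. All names are illustrative.

```sml
(* Hypothetical fragment with ref, :=, and $ over an explicit store *)
datatype ty = TInt | TUnit | TRef of ty

datatype term =
    IntLit of int
  | Unit
  | Loc of int                (* store location l *)
  | Ref of term               (* ref t1 *)
  | Assign of term * term     (* t1 := t2 *)
  | Deref of term             (* $t1 *)

type store = term list        (* location l holds List.nth (mu, l) *)

fun isValue (IntLit _) = true
  | isValue Unit = true
  | isValue (Loc _) = true
  | isValue _ = false

(* A location has type T ref, where T is the type of its stored value *)
fun typeOf (mu : store) t =
  case t of
      IntLit _ => SOME TInt
    | Unit => SOME TUnit
    | Loc l => Option.map TRef (typeOf mu (List.nth (mu, l)))
    | Ref t1 => Option.map TRef (typeOf mu t1)                       (* T-REF *)
    | Assign (t1, t2) =>                                             (* T-ASSIGN *)
        (case (typeOf mu t1, typeOf mu t2) of
             (SOME (TRef a), SOME b) => if a = b then SOME TUnit else NONE
           | _ => NONE)
    | Deref t1 =>                                                    (* T-DEREF *)
        (case typeOf mu t1 of SOME (TRef a) => SOME a | _ => NONE)

fun update (mu, l, v) = List.take (mu, l) @ [v] @ List.drop (mu, l + 1)

fun step (t, mu : store) =
  case t of
      Ref t1 =>
        if isValue t1 then SOME (Loc (length mu), mu @ [t1])          (* E-REFV, fresh l *)
        else Option.map (fn (t1', mu') => (Ref t1', mu')) (step (t1, mu))       (* E-REF *)
    | Assign (t1, t2) =>
        if not (isValue t1) then
          Option.map (fn (t1', mu') => (Assign (t1', t2), mu')) (step (t1, mu)) (* E-ASSIGN1 *)
        else if not (isValue t2) then
          Option.map (fn (t2', mu') => (Assign (t1, t2'), mu')) (step (t2, mu)) (* E-ASSIGN2 *)
        else
          (case t1 of
               Loc l => SOME (Unit, update (mu, l, t2))                 (* E-ASSIGN *)
             | _ => NONE)
    | Deref (Loc l) => SOME (List.nth (mu, l), mu)          (* assumed: read mu(l) *)
    | Deref t1 =>
        if isValue t1 then NONE
        else Option.map (fn (t1', mu') => (Deref t1', mu')) (step (t1, mu))     (* E-DEREF *)
    | _ => NONE

(* Run to a normal form, checking the type against the current store *)
fun run (t, mu) =
  let val ty0 = typeOf mu t
  in
    case step (t, mu) of
        NONE => (t, mu)
      | SOME (t', mu') =>
          if typeOf mu' t' = ty0 then run (t', mu')
          else raise Fail "preservation violated"
  end

val write = Assign (Ref (IntLit 1), IntLit 5)
val read = Deref (Ref (IntLit 7))
val (w, wStore) = run (write, [])     (* Unit with store [IntLit 5] *)
val (r, rStore) = run (read, [])      (* IntLit 7 with store [IntLit 7] *)
```

Because typeOf is always evaluated against the store produced by the same step, the check corresponds to preservation under a store that grows by E-REFV and is updated in place by E-ASSIGN.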
Case T-SW:
t = sw t1
T = TH sw
If the last rule in the derivation is T-SW, then we know from the form of this rule that t must have the form sw t1 and that T is TH sw for some TH. We must further have a subderivation with conclusion t1 : TH. Looking at the evaluation rules with sw on the left-hand side, there is one rule by which t −→ t′ can be derived: E-SW. By the induction hypothesis, t′1 : TH and so by T-SW, sw t′1 : TH sw. Therefore, t′ : TH sw as needed.
Case T-UNSW:
t = unsw t1
T = TH
If the last rule in the derivation is T-UNSW, then we know from the form of this rule that t must have the form unsw t1 and that T is TH for some TH. We must further have a subderivation with conclusion t1 : TH sw. Looking at the evaluation rules with unsw on the left-hand side, there is one rule by which t −→ t′ can be derived: E-UNSW. By the induction hypothesis, t′1 : TH sw and so by T-UNSW, unsw t′1 : TH. Therefore, t′ : TH as needed.
Case T-ARR-ACC:
t = t1[t2]
T = TH
If the last rule in the derivation is T-ARR-ACC, then we know from the form of this rule that t must have the form t1[t2] and that T is TH. We must further have subderivations with conclusions t1 : TH[n] and t2 : int. Looking at the evaluation rules with array access on the left-hand side, there are two rules by which t −→ t′ can be derived: E-ARR-ACC and E-ARR-ACC1.
Subcase E-ARR-ACC:
t1 = v1
t2 = v2
t′ = v_{1,v2}, the v2-th element of v1
From the assumptions of the T-ARR-ACC case, we have subderivations of the original typing derivation with conclusions t1 : TH[n] and t2 : int. Thus, the v2-th element of v1 is of type TH, and so t′ : TH.
Subcase E-ARR-ACC1:
t2 −→ t′2
t′ = v1[t′2]
From the assumptions of the T-ARR-ACC case, we have subderivations of the original typing derivation with conclusions t1 : TH[n] and t2 : int. By the induction hypothesis, t′2 : int. Therefore, by T-ARR-ACC, v1[t′2] : TH and so t′ : TH.
Case T-RCD:
t = {li = ti}^{i∈1..n}
T = {li : Ti}^{i∈1..n}
If the last rule in the derivation is T-RCD, then we know from the form of this rule that t must have the form {li = ti}^{i∈1..n} and that T is {li : Ti}^{i∈1..n}. We must further have a subderivation for each i with conclusion ti : Ti. Looking at the evaluation rules with record creation on the left-hand side, there are two rules by which t −→ t′ can be derived: E-RCD and E-RCDV.
Subcase E-RCD:
tj −→ t′j
t′ = {(li = vi)^{i∈1..j−1}, lj = t′j, (lk = tk)^{k∈j+1..n}}
From the assumptions of the T-RCD case, we have for each i a subderivation of the original typing derivation with conclusion ti : Ti; in particular, tj : Tj. We may apply the induction hypothesis, obtaining t′j : Tj. Thus, by T-RCD, we obtain {(li = vi)^{i∈1..j−1}, lj = t′j, (lk = tk)^{k∈j+1..n}} : {li : Ti}^{i∈1..n}, and so t′ : {li : Ti}^{i∈1..n}.
Subcase E-RCDV:
tj = vj
t′ = {(li = vi)^{i∈1..j}, (lk = tk)^{k∈j+1..n}}
If t −→ t′ is derived using E-RCDV, then from the form of this rule we see that tj = vj and the resulting term t′ continues the evaluation on the remaining fields of the record. Each field of t′ is either unchanged or is the value vj = tj, so by the assumptions of the T-RCD case every field still has its type Ti, and by T-RCD, t′ : {li : Ti}^{i∈1..n}, which is what we need.
Case T-PROJ:
t = #lj t1
T = Tj
If the last rule in the derivation is T-PROJ, then we know from the form of this rule that t must have the form #lj t1 and that T is Tj. We must further have a subderivation with conclusion t1 : {li : Ti}^{i∈1..n}. Looking at the evaluation rules with projection on the left-hand side, there are two rules by which t −→ t′ can be derived: E-PROJ-RCD and E-PROJ.
Subcase E-PROJ-RCD:
t1 = {li = vi}^{i∈1..n}
t′ = vj
From the assumptions of the T-PROJ case, we have a subderivation of the original typing derivation with conclusion t1 : {li : Ti}^{i∈1..n}. Thus, extracting the value corresponding to field lj must yield vj, which is of type Tj, and so t′ : Tj.
Subcase E-PROJ:
t1 −→ t′1
t′ = #lj t′1
From the assumptions of the T-PROJ case, we have a subderivation of the original typing derivation with conclusion t1 : {li : Ti}^{i∈1..n}. We may apply the induction hypothesis, obtaining t′1 : {li : Ti}^{i∈1..n}. Thus, by T-PROJ, #lj t′1 : Tj and so t′ : Tj.
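Records and projection follow the same pattern. In the hypothetical Standard ML fragment below, a record steps its leftmost non-value field (playing the role of E-RCD) and a projection from a fully evaluated record selects the named field (E-PROJ-RCD); E-RCDV is not modeled as a separate rule. The helper names are illustrative.

```sml
(* Hypothetical fragment with records and projection *)
datatype ty = TInt | TString | TRecord of (string * ty) list

datatype term =
    IntLit of int
  | StrLit of string
  | Record of (string * term) list       (* {li = ti} *)
  | Proj of string * term                (* #lj t1 *)

fun lookup l fields =
  case List.find (fn (l', _) => l' = l) fields of
      SOME (_, x) => SOME x
    | NONE => NONE

fun isValue (IntLit _) = true
  | isValue (StrLit _) = true
  | isValue (Record fields) = List.all (fn (_, t) => isValue t) fields
  | isValue _ = false

fun typeOf (IntLit _) = SOME TInt
  | typeOf (StrLit _) = SOME TString
  | typeOf (Record fields) =                                     (* T-RCD *)
      let val tys = map (fn (l, t) => (l, typeOf t)) fields
      in
        if List.all (fn (_, ot) => isSome ot) tys
        then SOME (TRecord (map (fn (l, ot) => (l, valOf ot)) tys))
        else NONE
      end
  | typeOf (Proj (l, t1)) =                                      (* T-PROJ *)
      (case typeOf t1 of
           SOME (TRecord ftys) => lookup l ftys
         | _ => NONE)

(* Step the leftmost field that is not yet a value (cf. E-RCD) *)
fun stepFields [] = NONE
  | stepFields ((l, t) :: rest) =
      if isValue t
      then Option.map (fn rest' => (l, t) :: rest') (stepFields rest)
      else Option.map (fn t' => (l, t') :: rest) (step t)
and step (Record fields) = Option.map Record (stepFields fields)
  | step (Proj (l, t1)) =
      if isValue t1 then
        (case t1 of
             Record fields => lookup l fields                   (* E-PROJ-RCD *)
           | _ => NONE)
      else Option.map (fn t1' => Proj (l, t1')) (step t1)       (* E-PROJ *)
  | step _ = NONE

(* Evaluate to a normal form, raising Fail if any step changes the type *)
fun runChecked t =
  let val ty0 = typeOf t
  in
    case step t of
        NONE => t
      | SOME t' =>
          if typeOf t' = ty0 then runChecked t'
          else raise Fail "preservation violated"
  end

val inner = Record [("x", IntLit 1)]
val example = Proj ("b", Record [("a", Proj ("x", inner)), ("b", StrLit "hi")])
val result = runChecked example      (* StrLit hi *)
```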
Case T-LET:
t = let (xi = ti)^{i∈1..n} in t0 end
T = T0
If the last rule in the derivation is T-LET, then we know from the form of this rule that t must have the form let (xi = ti)^{i∈1..n} in t0 end and that T is T0. We must further have subderivations with conclusions t1 : T1 and let (xi = ti)^{i∈2..n} in t0 end : T0. Looking at the evaluation rules with let-bindings on the left-hand side, there are three rules by which t −→ t′ can be derived: E-LET, E-LETV1, and E-LETV2.
Subcase E-LET:
t1 −→ t′1
t′ = let x1 = t′1 (xi = ti)^{i∈2..n} in t0 end
From the assumptions of the T-LET case, we have a subderivation of the original typing derivation with conclusion t1 : T1. We may apply the induction hypothesis, obtaining t′1 : T1. Thus, by T-LET, let x1 = t′1 (xi = ti)^{i∈2..n} in t0 end : T0 and so t′ : T0.
Subcase E-LETV1:
t1 = v1
t′ = let x2 = t2 (xi = ti)^{i∈3..n} in t0 end
If t1 is v1, we simply augment the variable and type binding stores with the mapping x1 ↦ v1. The term being evaluated in the body of the let-binding does not change, and as such t′ : T0.
Subcase E-LETV2:
Similar to E-LETV1.
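The T-LET case can be sketched the same way. One deliberate deviation: where E-LETV1 extends the variable and type binding stores with x1 ↦ v1, this hypothetical Standard ML fragment substitutes v1 for x1 in the remaining bindings and the body. It also assumes that a let with no bindings left steps to its body; how E-LETV2 handles that situation is not spelled out in this section.

```sml
(* Hypothetical fragment with multi-binding lets:
   let (x1 = t1) ... (xn = tn) in t0 end *)
datatype ty = TInt | TString

datatype term =
    IntLit of int
  | StrLit of string
  | Var of string
  | Add of term * term
  | Let of (string * term) list * term

fun isValue (IntLit _) = true
  | isValue (StrLit _) = true
  | isValue _ = false

(* T-LET: each binding is typed in the context extended by earlier ones *)
fun typeOf ctx t =
  case t of
      IntLit _ => SOME TInt
    | StrLit _ => SOME TString
    | Var x =>
        (case List.find (fn (y, _) => y = x) ctx of
             SOME (_, tx) => SOME tx
           | NONE => NONE)
    | Add (t1, t2) =>
        if typeOf ctx t1 = SOME TInt andalso typeOf ctx t2 = SOME TInt
        then SOME TInt else NONE
    | Let ([], t0) => typeOf ctx t0
    | Let ((x, t1) :: rest, t0) =>
        (case typeOf ctx t1 of
             SOME ty1 => typeOf ((x, ty1) :: ctx) (Let (rest, t0))
           | NONE => NONE)

(* [x |-> v]t; v is a closed value, so no capture occurs *)
fun subst x v t =
  case t of
      Var y => if x = y then v else t
    | Add (t1, t2) => Add (subst x v t1, subst x v t2)
    | Let (binds, t0) => Let (substBinds x v binds t0)
    | _ => t
(* Substitute through bindings and body, stopping where x is rebound *)
and substBinds x v [] t0 = ([], subst x v t0)
  | substBinds x v ((y, t1) :: rest) t0 =
      if x = y then ((y, subst x v t1) :: rest, t0)
      else
        let val (rest', t0') = substBinds x v rest t0
        in ((y, subst x v t1) :: rest', t0') end

fun step t =
  case t of
      Add (IntLit a, IntLit b) => SOME (IntLit (a + b))
    | Add (t1, t2) =>
        if not (isValue t1) then Option.map (fn t1' => Add (t1', t2)) (step t1)
        else Option.map (fn t2' => Add (t1, t2')) (step t2)
    | Let ([], t0) => SOME t0                         (* assumed: no bindings left *)
    | Let ((x, t1) :: rest, t0) =>
        if isValue t1 then SOME (Let (substBinds x t1 rest t0))  (* cf. E-LETV1 *)
        else Option.map (fn t1' => Let ((x, t1') :: rest, t0)) (step t1)   (* E-LET *)
    | _ => NONE

(* Evaluate to a normal form, raising Fail if any step changes the type *)
fun runChecked t =
  let val ty0 = typeOf [] t
  in
    case step t of
        NONE => t
      | SOME t' =>
          if typeOf [] t' = ty0 then runChecked t'
          else raise Fail "preservation violated"
  end

val example =
  Let ([("x", Add (IntLit 1, IntLit 2)), ("y", Add (Var "x", IntLit 10))],
       Add (Var "x", Var "y"))
val result = runChecked example      (* IntLit 16 *)
```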
Case T-DATATY:
t = Cj t0
T = 〈Ci : Ti〉^{i∈1..n}
If the last rule in the derivation is T-DATATY, then we know from the form of this rule that t must have the form Cj t0 and that T is 〈Ci : Ti〉^{i∈1..n}. We must further have a subderivation with conclusion t0 : Tj. Looking at the evaluation rules with datatype construction on the left-hand side, there is one rule by which t −→ t′ can be derived: E-DATATY, whose premise is t0 −→ t′0. By the induction hypothesis, t′0 : Tj. Then, by T-DATATY, it follows that Cj t′0 : 〈Ci : Ti〉^{i∈1..n}. Therefore, t′ : 〈Ci : Ti〉^{i∈1..n}.
Case T-CASE:
t = case t0 of (Ci xi => ti)^{i∈1..n}
T = T′
If the last rule in the derivation is T-CASE, then we know from the form of this rule that t must have the form case t0 of (Ci xi => ti)^{i∈1..n} and that T is T′ for some T′. We must further have subderivations with conclusions t0 : 〈Ci : Ti〉^{i∈1..n} and, for each i, ti : T′ under the assumption xi : Ti. Looking at the evaluation rules with case expressions on the left-hand side, there are two rules by which t −→ t′ can be derived: E-CASE and E-CASE-TY.
Subcase E-CASE:
t0 −→ t′0
t′ = case t′0 of (Ci xi => ti)^{i∈1..n}
From the assumptions of the T-CASE case, we have a subderivation of the original typing derivation with conclusion t0 : 〈Ci : Ti〉^{i∈1..n}. We may apply the induction hypothesis, obtaining t′0 : 〈Ci : Ti〉^{i∈1..n}. Thus, by T-CASE, case t′0 of (Ci xi => ti)^{i∈1..n} : T′, and so t′ : T′.
Subcase E-CASE-TY:
t0 = Cj vj
t′ = [xj ↦ vj] tj
As we assumed earlier, ti : T′ for all ti under xi : Ti, so in particular tj : T′ under xj : Tj. Since Cj vj : 〈Ci : Ti〉^{i∈1..n}, inverting T-DATATY gives vj : Tj, and the substitution lemma then yields [xj ↦ vj] tj : T′.
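Finally, the T-DATATY and T-CASE cases, including the substitution in E-CASE-TY, can be exercised in a hypothetical Standard ML fragment. To avoid a separate declaration environment, each constructor application here carries its datatype 〈Ci : Ti〉 as an annotation; that is an assumption of the sketch, not GEMINI's surface syntax, and the names are illustrative.

```sml
(* Hypothetical fragment with datatype construction (Cj t) and case *)
datatype ty = TInt | TData of (string * ty) list       (* <Ci : Ti> *)

datatype term =
    IntLit of int
  | Var of string
  | Add of term * term
  | Con of string * term * ty                      (* Cj t, annotated *)
  | Case of term * (string * string * term) list   (* case t0 of Ci xi => ti *)

fun lookup k xs =
  case List.find (fn (k', _) => k' = k) xs of
      SOME (_, x) => SOME x
    | NONE => NONE

fun typeOf ctx t =
  case t of
      IntLit _ => SOME TInt
    | Var x => lookup x ctx
    | Add (t1, t2) =>
        if typeOf ctx t1 = SOME TInt andalso typeOf ctx t2 = SOME TInt
        then SOME TInt else NONE
    | Con (c, t0, dt as TData alts) =>                          (* T-DATATY *)
        (case lookup c alts of
             SOME tj => if typeOf ctx t0 = SOME tj then SOME dt else NONE
           | NONE => NONE)
    | Con _ => NONE
    | Case (t0, branches) =>                                    (* T-CASE *)
        (case typeOf ctx t0 of
             SOME (TData alts) =>
               let
                 (* each branch body ti is typed with xi : Ti added *)
                 fun branchTy (c, x, ti) =
                   case lookup c alts of
                       SOME tc => typeOf ((x, tc) :: ctx) ti
                     | NONE => NONE
               in
                 case map branchTy branches of
                     SOME t' :: rest =>
                       if List.all (fn ot => ot = SOME t') rest then SOME t' else NONE
                   | _ => NONE
               end
           | _ => NONE)

fun isValue (IntLit _) = true
  | isValue (Con (_, v, _)) = isValue v
  | isValue _ = false

fun subst x v t =
  case t of
      Var y => if x = y then v else t
    | Add (t1, t2) => Add (subst x v t1, subst x v t2)
    | Con (c, t0, dt) => Con (c, subst x v t0, dt)
    | Case (t0, bs) =>
        Case (subst x v t0,
              map (fn (c, y, ti) => (c, y, if x = y then ti else subst x v ti)) bs)
    | _ => t

fun step t =
  case t of
      Add (IntLit a, IntLit b) => SOME (IntLit (a + b))
    | Add (t1, t2) =>
        if not (isValue t1) then Option.map (fn t1' => Add (t1', t2)) (step t1)
        else Option.map (fn t2' => Add (t1, t2')) (step t2)
    | Con (c, t0, dt) => Option.map (fn t0' => Con (c, t0', dt)) (step t0)  (* E-DATATY *)
    | Case (t0, bs) =>
        if not (isValue t0) then Option.map (fn t0' => Case (t0', bs)) (step t0) (* E-CASE *)
        else
          (case t0 of
               Con (c, v, _) =>                                          (* E-CASE-TY *)
                 (case List.find (fn (c', _, _) => c' = c) bs of
                      SOME (_, x, tj) => SOME (subst x v tj)
                    | NONE => NONE)
             | _ => NONE)
    | _ => NONE

(* Evaluate to a normal form, raising Fail if any step changes the type *)
fun runChecked t =
  let val ty0 = typeOf [] t
  in
    case step t of
        NONE => t
      | SOME t' =>
          if typeOf [] t' = ty0 then runChecked t'
          else raise Fail "preservation violated"
  end

val sizeTy = TData [("Small", TInt), ("Big", TInt)]
val example =
  Case (Con ("Big", Add (IntLit 40, IntLit 2), sizeTy),
        [("Small", "n", Var "n"), ("Big", "m", Add (Var "m", IntLit 1))])
val result = runChecked example      (* IntLit 43 *)
```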
D Implementation
D.1 Types.ty datatype
datatype ty = H_TY of h_ty
            | S_TY of s_ty
            | M_TY of m_ty
            | META of tyvar
            | TOP
            | BOTTOM

and h_ty = BIT
         | ARRAY of {ty: h_ty, size: int ref}
         | TEMPORAL of {ty: h_ty, time: int ref}
         | H_RECORD of (tyvar * h_ty) list
         | H_DATATYPE of (tyvar * h_ty option) list * unit ref
         | H_POLY of tyvar list * h_ty
         | H_META of tyvar
         | H_TOP
         | H_BOTTOM

and s_ty = INT | REAL | STRING
         | ARROW of (s_ty * s_ty)
         | LIST of s_ty
         | SW of h_ty
         | S_RECORD of (tyvar * s_ty) list
         | REF of s_ty
         | S_DATATYPE of (tyvar * s_ty option) list * unit ref
         | S_MU of tyvar list * s_ty
         | S_POLY of tyvar list * s_ty
         | S_META of tyvar
         | S_TOP
         | S_BOTTOM

and m_ty = MODULE of h_ty * h_ty
         | PARAMETERIZED_MODULE of s_ty * h_ty * h_ty
         | M_POLY of tyvar list * m_ty
         | M_BOTTOM
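The listing above splits types by kind: hardware types (h_ty), software types (s_ty), and module types (m_ty). The following sketch re-declares a small hypothetical subset of these constructors so that it compiles on its own (it assumes tyvar is a string, since the compiler's own definition is not shown) and prints types in the notation of the typing rules, such as bit[n] sw. It also illustrates one plausible reading of the int ref in ARRAY: the width can be filled in after the type value has been built, which would suit size inference across compilation stages. That reading is a guess, not something the listing states.

```sml
(* Hypothetical, trimmed copy of a few Types.ty constructors *)
type tyvar = string   (* assumption: the real tyvar type is not shown *)

datatype ty = H_TY of h_ty
            | S_TY of s_ty
and h_ty = BIT
         | ARRAY of {ty: h_ty, size: int ref}
         | H_META of tyvar
and s_ty = INT
         | STRING
         | LIST of s_ty
         | REF of s_ty
         | SW of h_ty

(* Render a type in the surface notation used in the typing rules *)
fun hToString BIT = "bit"
  | hToString (ARRAY {ty, size}) = hToString ty ^ "[" ^ Int.toString (!size) ^ "]"
  | hToString (H_META v) = "'" ^ v

fun sToString INT = "int"
  | sToString STRING = "string"
  | sToString (LIST s) = sToString s ^ " list"
  | sToString (REF s) = sToString s ^ " ref"
  | sToString (SW h) = hToString h ^ " sw"

fun toString (H_TY h) = hToString h
  | toString (S_TY s) = sToString s

(* The width cell is shared, so updating it changes the rendered type *)
val width = ref 0
val busTy = S_TY (SW (ARRAY {ty = BIT, size = width}))
val beforeUpdate = toString busTy      (* bit[0] sw *)
val () = width := 8
val afterUpdate = toString busTy       (* bit[8] sw *)
```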
D.2 Absyn.ty datatype
datatype ty = NameTy of symbol * pos
            | ParameterizedTy of symbol * (ty list) * pos
            | TyVar of symbol * pos
            | SWRecordTy of field list * pos
            | HWRecordTy of field list * pos
            | ArrayTy of ty * int * pos
            | ListTy of ty * pos
            | TemporalTy of ty * int * pos
            | RefTy of ty * pos
            | SWTy of ty * pos
            | FunTy of ty * ty * pos
            | PlaceholderTy of unit ref
            | ExplicitTy of Types.ty
D.3 Value.value datatype
datatype value
  = IntVal of int
  | StringVal of string
  | RealVal of real
  | ListVal of value list
  | RefVal of value ref
  | SWVal of value
  | RecordVal of (symbol * value) list
  | FunVal of (value -> value) ref
  | DatatypeVal of (symbol * unit ref * value)
  | NamedVal of symbol * Types.ty
  | BitVal of GeminiBit.bit
  | ArrayVal of value vector
  | HWRecordVal of (symbol * value) list
  | BinOpVal of {left: value, oper: binop, right: value}
  | UnOpVal of {value: value, oper: unop}
  | ArrayAccVal of {arr: value, index: int}
  | DFFVal of int
  | PreParamModuleVal of (value -> value -> value) * value
  | ModuleVal of (value -> value) * value
