Compiling the $\pi$-calculus into a Multithreaded Typed Assembly Language

Tiago Cogumbreiro Francisco Martins Vasco T. Vasconcelos

DI-FCUL

TR-08-13

May 2008

Departamento de Informática Faculdade de Ciências da Universidade de Lisboa Campo Grande, 1749–016 Lisboa Portugal

Technical reports are available at http://www.di.fc.ul.pt/tech-reports. The files are stored in PDF, with the report number as filename. Alternatively, reports are available by post from the above address.


#### Abstract

Current trends in hardware have made multi-core CPU systems available to ordinary users, challenging researchers to devise new techniques to bring software into the multi-core world. However, shaping software for multi-cores is more involved than simply balancing the workload among cores. In the near future (in less than a decade), Intel plans to manufacture and ship 80-core processors [8]; programmers must shift from sequential to concurrent programming and produce software adapted to multi-core platforms.

In the last decade, proposals have been made to compile formal concurrent and functional languages, notably the $\pi$-calculus [21], typed concurrent objects [12], and the $\lambda$-calculus [18], into assembly languages. The last of these goes a step further and presents a series of type-preserving compilation steps leading from System F [6] to a typed assembly language. Nevertheless, all these works target sequential architectures.

This paper proposes a type-preserving translation from the $\pi$-calculus into MIL, a multithreaded typed assembly language for multi-core/multiprocessor architectures [26]. We start from a simple asynchronous typed version of the $\pi$-calculus [2, 9, 17] and translate it into MIL code that is then linked to a run-time library (written in MIL) supporting the implementation of the $\pi$-calculus primitives (*e.g.*, queueing messages and processes). In short, we implement a message-passing paradigm on a shared-memory architecture.

## Contents

| Chapter | Title                                   | Sections                                                                          |
|---------|-----------------------------------------|-----------------------------------------------------------------------------------|
| 1       | Introduction                            |                                                                                   |
| 2       | The $\pi$-Calculus                      | 2.1 Syntax; 2.2 Semantics                                                         |
| 3       | MIL: Multithreaded TAL                  | 3.1 Architecture; 3.2 Syntax; 3.3 Operational Semantics; 3.4 Type Discipline      |
| 4       | The $\pi$-Calculus Run-time             | 4.1 Channels; Queues                                                              |
| 5       | Translating the $\pi$-calculus into MIL |                                                                                   |
| 6       | Conclusion                              |                                                                                   |
| A       | Queues                                  |                                                                                   |

# Chapter 1

## Introduction

Physical and electrical constraints are limiting further increases in the clock frequency of each processing unit, so its top speed is not expected to grow much in the near future. Instead, manufacturers are increasing the number of processing units per processor (multi-core processors) to keep delivering performance gains. The industry is investing heavily in projects such as RAMP [23] and BEE2 [1], which enable the emulation of multi-core architectures, showing its interest in laying the foundations for software research that targets these architectures.

To take advantage of multi-core architectures, parallel and concurrent programming must be mastered [20]. With parallel hardware now widely available (from embedded systems to supercomputers), programmers must make a paradigm shift from sequential to parallel programming and produce, from scratch, software adapted to multi-core platforms.

The MDA (Model-Driven Architecture) and MDE (Model-Driven Engineering) methodologies are widely used for software system development [19, 10]. However, their specification languages are informal and lack semantic foundations. Results from concurrency theory (*e.g.*, operational or axiomatic semantics) might strengthen these methodologies [5].

In the last decade, several proposals have been made to compile concurrent and functional languages, notably the $\pi$-calculus [25], typed concurrent objects [12], and the $\lambda$-calculus [18], into assembly languages. The last of these goes a step further and presents a series of type-preserving compilation steps leading from System F [6] to a typed assembly language [18]. Yet all these works target single-core CPU architectures. We propose a type-preserving translation from the $\pi$-calculus into MIL, a multithreaded typed assembly language aimed at multi-core/multiprocessor architectures. We start from a simple asynchronous typed version of the $\pi$-calculus [2, 9, 17] and translate it into MIL, relying on a run-time library (written in MIL) that supports the implementation of the $\pi$-calculus primitives (*e.g.*, queueing of messages and processes).

The run-time library defines channels and the operations on channels that send and receive messages. A message is buffered in the channel it is sent to until some process requests a message from that channel. Symmetrically, a process requesting a message from a channel with no pending messages blocks until another process sends one; meanwhile, the blocked process is kept in the channel, represented by its state and its code.
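The buffering discipline just described can be illustrated with a small Python model. This is a sketch under our own assumptions, not the MIL run-time library: the class `Channel`, its methods `send` and `receive`, and the FIFO order of both queues are hypothetical, and a blocked process is modelled as a callback (its code) that closes over its state. The real run-time would resume a MIL thread rather than call the continuation directly.

```python
from collections import deque
from typing import Callable, Deque, Tuple

# Hypothetical model: a blocked reader is a continuation receiving the message.
Continuation = Callable[[Tuple], None]


class Channel:
    """Buffers pending messages or blocked readers (never both at once)."""

    def __init__(self) -> None:
        self.messages: Deque[Tuple] = deque()        # sent, not yet received
        self.readers: Deque[Continuation] = deque()  # blocked, waiting for a message

    def send(self, *values) -> None:
        if self.readers:
            # A reader is blocked on this channel: hand it the message.
            self.readers.popleft()(values)
        else:
            # Nobody is waiting: buffer the message in the channel.
            self.messages.append(values)

    def receive(self, continuation: Continuation) -> None:
        if self.messages:
            # A message is already buffered: consume it immediately.
            continuation(self.messages.popleft())
        else:
            # Block: keep the reader's code and state in the channel.
            self.readers.append(continuation)


# Usage: the echo interaction, with a single-use (non-replicated) server.
echo, reply = Channel(), Channel()
received = []
echo.receive(lambda msg: msg[1].send(msg[0]))   # server: echo(msg, reply).reply<msg>
reply.receive(received.append)                  # client blocks on its reply channel
echo.send("hello world!", reply)                # client: echo<'hello world!', reply>
```

Either side may arrive first: whichever finds the other queue empty is the one that gets stored in the channel.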

The run-time library and the translation function have an intertwined design, although there is a clear separation of concerns between them. The concept of a process cuts across both parts and is defined at a different level of abstraction in each.

Concurrency is also preserved, limited only by the number of available processors: $\pi$ processes are represented by MIL threads of execution. The concurrent architecture of the target language is therefore extensively exploited, resulting in highly parallel programs free of race conditions.

This paper is divided into six chapters and an appendix. Chapter 2 describes the $\pi$-calculus, the source language. Chapter 3 presents our target language, MIL. Chapters 4 and 5 discuss the translation from the $\pi$-calculus into MIL. Finally, in Chapter 6, we summarize our work and hint at future directions.

# Chapter 2

## The $\pi$-Calculus

The $\pi$-calculus, developed by Robin Milner, Joachim Parrow, and David Walker [17], is a process algebra for describing *mobility*. It models a network of interconnected processes that interact through connection links (ports) by sending and receiving links themselves, which allows the network to be dynamically reconfigured.

As an example, consider a server that echoes every message it receives on a port. Figure 2.1 depicts this interaction: the client sends a message msg together with a reply channel on which the server should echo the message; the server then sends msg back to the client on that channel.

In this chapter we present an overview of the  $\pi$ -calculus syntax and semantics.



Figure 2.1: The server echoing the received message back to the client. The dashed line represents communication from the client to the server. The full line represents communication from the server to the client.



The syntax of types, T, is given in Figure 2.3.

Figure 2.2: Process syntax

#### 2.1 Syntax

**Processes.** The adopted $\pi$-calculus syntax is based on [16], with the extensions presented in [24]: it is asynchronous, polyadic, and typed.

The syntax, depicted in Figure 2.2, is divided into two categories: names and processes. Names are ranged over by lower-case letters x and y. Values, v, stand for either names or primitive values. A vector above a symbol abbreviates a possibly empty sequence of these symbols; for example, $\vec{x}$ stands for the sequence of names $x_1 \dots x_n$ with $n \ge 0$. Processes, denoted by upper-case letters P and Q, are the following. The nil process, $\mathbf{0}$, is the inactive process. The output process, $\overline{x}\langle \vec{v} \rangle$, sends the values $\vec{v}$ through channel x. The input process, $x(\vec{y}).P$, receives a sequence of values via channel x and continues as P, with the received values substituted for the names $\vec{y}$. The parallel composition, $P \mid Q$, represents two active processes running concurrently. The restriction, $(\nu x: (\vec{T}))\, P$, creates a new channel x, of type $(\vec{T})$, local to process P. Finally, the replicated process, $!P$, represents an infinite number of copies of P running in parallel.
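To make the grammar concrete, here is a minimal Python encoding of processes together with the usual free-name function fn, which the side condition of scope extrusion relies on. It is an illustrative sketch: the class names are ours, a bare `str` stands for a name while `Lit` wraps a primitive value, and the type on a restriction is kept only as an opaque annotation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple, Union


@dataclass(frozen=True)
class Lit:
    value: Union[int, str]          # primitive value (integer or string)


Value = Union[str, Lit]             # a bare str is a name


@dataclass(frozen=True)
class Nil:                          # 0
    pass


@dataclass(frozen=True)
class Out:                          # x<v1 ... vn>
    chan: str
    args: Tuple[Value, ...]


@dataclass(frozen=True)
class In:                           # x(y1 ... yn).P
    chan: str
    params: Tuple[str, ...]
    body: Process


@dataclass(frozen=True)
class Par:                          # P | Q
    left: Process
    right: Process


@dataclass(frozen=True)
class Res:                          # (nu x: T) P
    name: str
    body: Process
    typ: Optional[object] = None    # opaque type annotation


@dataclass(frozen=True)
class Rep:                          # !P
    body: Process


Process = Union[Nil, Out, In, Par, Res, Rep]


def free_names(p: Process) -> FrozenSet[str]:
    """fn(P): names not bound by an input prefix or a restriction."""
    if isinstance(p, Nil):
        return frozenset()
    if isinstance(p, Out):
        return frozenset([p.chan]) | {a for a in p.args if isinstance(a, str)}
    if isinstance(p, In):
        return frozenset([p.chan]) | (free_names(p.body) - set(p.params))
    if isinstance(p, Par):
        return free_names(p.left) | free_names(p.right)
    if isinstance(p, Res):
        return free_names(p.body) - {p.name}
    if isinstance(p, Rep):
        return free_names(p.body)
    raise TypeError(f"not a process: {p!r}")
```

For instance, the echo server `Rep(In("echo", ("msg", "reply"), Out("reply", ("msg",))))` has `echo` as its only free name.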

The following example is a possible implementation of the echo server depicted in Figure 2.1.

$$!echo(msg, reply).\overline{reply}\langle msg\rangle \tag{2.1}$$

This process is ready to receive a message *msg* and a communication channel *reply* through channel *echo*. After receiving the values, it outputs the message through channel *reply*. The process is replicated because it must be able to communicate with multiple clients.

$$\begin{array}{llll} \textit{Types} & & \textit{Basic value types} & \\ T, S ::= B & \text{basic value type} & B ::= \mathsf{int} & \text{integer type} \\ \qquad\ \mid (\vec{T}) & \text{link type} & \qquad\ \mid \mathsf{str} & \text{string type} \end{array}$$

Figure 2.3: Type syntax

**Types.** Types are assigned to channels and to basic values. A basic value type is either a string, str, or an integer, int; the link type $(\vec{T})$ describes the types $\vec{T}$ of the values communicated through the channel. For example, a possible type for the *echo* channel of Process 2.1 is (str, (str)): the first argument is the message and the second is a reply channel carrying a string.

#### 2.2 Semantics

The semantics of the $\pi$-calculus formally expresses the behaviour of processes. With a rigorous semantics we can determine whether two processes have the same structure, observe how a process evolves as it interacts, and analyse how links move from one process to another.

For the sake of clarity, we sometimes omit the type from the restriction operator.

**Structural Congruence.** The structural congruence relation, $\equiv$, is the smallest congruence relation on processes closed under the rules given in Figure 2.4. Structural congruence identifies processes that have the same behavioural structure, and it can be used to reshape a process so that it can reduce. The rules are straightforward. Rule S1 allows for alpha-conversion; Rule S2 comprises the standard commutative monoid laws for parallel composition, with $\mathbf{0}$ as neutral element; Rule S3 allows for scope extrusion; Rule S4 garbage-collects unused names; Rule S5 states that the order of restrictions is irrelevant; and finally Rule S6 allows replication to unfold.

$$\begin{array}{lll}
(\text{S1}) & \text{change of bound names} & \\
(\text{S2}) & P \mid \mathbf{0} \equiv P, \quad P \mid Q \equiv Q \mid P, \quad P \mid (Q \mid R) \equiv (P \mid Q) \mid R & \\
(\text{S3}) & (\nu\, x \colon (\vec{T}))\, (P \mid Q) \equiv P \mid (\nu\, x \colon (\vec{T}))\, Q & \text{if } x \notin \text{fn}(P) \\
(\text{S4}) & (\nu\, x \colon (\vec{T}))\, \mathbf{0} \equiv \mathbf{0} & \\
(\text{S5}) & (\nu\, x \colon (\vec{T}))\, (\nu\, y \colon (\vec{S}))\, P \equiv (\nu\, y \colon (\vec{S}))\, (\nu\, x \colon (\vec{T}))\, P & \text{if } x \neq y \\
(\text{S6}) & !P \equiv P \mid !P &
\end{array}$$

Figure 2.4: Structural congruence



$$\begin{array}{c}
\dfrac{}{x(\vec{y}).P \mid \overline{x}\langle \vec{v} \rangle \mid Q \to P\{\vec{v}/\vec{y}\} \mid Q}\ \text{REACT}
\qquad
\dfrac{P \to P'}{P \mid Q \to P' \mid Q}\ \text{PAR}
\\[3ex]
\dfrac{P \to P'}{(\nu\, x \colon (\vec{T}))\, P \to (\nu\, x \colon (\vec{T}))\, P'}\ \text{RES}
\qquad
\dfrac{Q \equiv P \quad P \to P' \quad P' \equiv Q'}{Q \to Q'}\ \text{STRUCT}
\end{array}$$

Figure 2.5: Reaction Rules

Bound and free names are defined as usual in the  $\pi$ -calculus, so we omit their formal definitions.

**Reduction.** The reduction relation $\rightarrow$, defined over processes in Figure 2.5, describes how a single computational step transforms a process [15]. The formula $P \rightarrow Q$ means that process P can interact and evolve (reduce) to process Q.

The axiom REACT is the core of the reaction rules, representing communication along a channel [14]. An output process, $\overline{x}\langle \vec{v} \rangle$, can interact with an input process, $x(\vec{y}).P$, if both use the same channel name, x. The output message, $\vec{v}$, travels along channel x to process P and replaces the input parameters $\vec{y}$, yielding $P\{\vec{v}/\vec{y}\}$. The term $P\{\vec{v}/\vec{y}\}$ denotes process P with the names $\vec{y}$ replaced by the values $\vec{v}$.
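Read operationally, REACT (together with PAR and STRUCT) says: in a parallel composition, pick an input and an output on the same channel with matching arity, and replace both by the input's continuation under the substitution. The Python sketch below implements one such step over a flat list of parallel components. It is our own illustration with hypothetical names: it covers only inputs and outputs (no restriction or replication), represents names and primitive values alike as strings, and performs naive substitution, so it assumes that bound names have already been renamed apart, as Rule S1 permits.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Out:                      # x<v1 ... vn>
    chan: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class In:                       # x(y1 ... yn).P
    chan: str
    params: Tuple[str, ...]
    body: Proc


Proc = Union[Out, In]


def subst(p: Proc, s: Dict[str, str]) -> Proc:
    """P{v/y}: replace free occurrences of the keys of s by their values."""
    if isinstance(p, Out):
        return Out(s.get(p.chan, p.chan), tuple(s.get(a, a) for a in p.args))
    # Names bound by this input prefix shadow the substitution in its body.
    inner = {k: v for k, v in s.items() if k not in p.params}
    return In(s.get(p.chan, p.chan), p.params, subst(p.body, inner))


def react(soup: List[Proc]) -> Optional[List[Proc]]:
    """One REACT step on the components of a parallel composition, or None."""
    for i, inp in enumerate(soup):
        if not isinstance(inp, In):
            continue
        for j, out in enumerate(soup):
            if (isinstance(out, Out) and out.chan == inp.chan
                    and len(out.args) == len(inp.params)):
                rest = [q for k, q in enumerate(soup) if k not in (i, j)]
                return [subst(inp.body, dict(zip(inp.params, out.args)))] + rest
    return None
```

Applied to the server `In("echo", ("msg", "reply"), Out("reply", ("msg",)))` in parallel with `Out("echo", ("hello world!", "r"))`, one step leaves the single output `Out("r", ("hello world!",))`.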

$$\begin{array}{ll} \dfrac{baseval \in B}{\Gamma \vdash baseval \colon B}\ \text{Tv-Base} & \dfrac{}{\Gamma, x \colon T \vdash x \colon T}\ \text{Tv-Name} \\[3ex] \dfrac{}{\Gamma \vdash \mathbf{0}}\ \text{Tv-Nil} & \dfrac{\Gamma \vdash P}{\Gamma \vdash\ !P}\ \text{Tv-Rep} \\[3ex] \dfrac{\Gamma \vdash x \colon (T_0 \ldots T_i) \quad \Gamma, y_0 \colon T_0, \ldots, y_i \colon T_i \vdash P}{\Gamma \vdash x(\vec{y}).P}\ \text{Tv-In} & \dfrac{\Gamma \vdash x \colon (\vec{T}) \quad \Gamma \vdash v_i \colon T_i \quad \forall i \in I}{\Gamma \vdash \overline{x}\langle \vec{v} \rangle}\ \text{Tv-Out} \\[3ex] \dfrac{\Gamma \vdash P \quad \Gamma \vdash Q}{\Gamma \vdash P \mid Q}\ \text{Tv-Par} & \dfrac{\Gamma, x \colon (\vec{T}) \vdash P}{\Gamma \vdash (\nu\, x \colon (\vec{T}))\, P}\ \text{Tv-Res} \end{array}$$

Figure 2.6: Typing rules for the  $\pi$ -calculus

Rule PAR allows reduction to take place inside a component of a parallel composition. Rule RES governs reduction inside the restriction operator. Rule STRUCT brings the structural congruence rules into the reduction relation.

Process

$$!echo(msg, reply).\overline{reply}\langle msg\rangle \mid (\nu \, r) \, \overline{echo} \langle \text{'hello world!'}, r \rangle$$

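represents the echo server running concurrently with a client that creates a new name r and sends it, along with a message, through channel *echo* to the server. Writing $E$ for $echo(msg, reply).\overline{reply}\langle msg\rangle$, the following steps describe the reaction between both processes:

$$\begin{array}{rll}
 & !E \mid (\nu r)\,\overline{echo}\langle \text{'hello world!'}, r\rangle & \\
\equiv & (\nu r)\,(!E \mid \overline{echo}\langle \text{'hello world!'}, r\rangle) & \text{(S3, as } r \notin \text{fn}(!E)) \\
\equiv & (\nu r)\,(E \mid \overline{echo}\langle \text{'hello world!'}, r\rangle \mid !E) & \text{(S6, S2)} \\
\to & (\nu r)\,(\overline{r}\langle \text{'hello world!'}\rangle \mid !E) & \text{(REACT, RES, STRUCT)} \\
\equiv & !echo(msg, reply).\overline{reply}\langle msg \rangle \mid (\nu r)\,\overline{r}\langle \text{'hello world!'}\rangle & \text{(S2, S3)}
\end{array}$$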

**Type system.** Figure 2.6 presents a standard type system for the  $\pi$ -calculus.

Rule Tv-BASE states that primitive values (strings and integers) are well typed. Rule Tv-NAME states that a name is well typed if it is declared in the type environment, with the type given by its declaration. The inactive process $\mathbf{0}$ is always well typed, rule Tv-NIL. The process $(\nu x: (\vec{T}))\, P$ is well typed if the contained process P is well typed once the association between name x and type $(\vec{T})$ is added to $\Gamma$, rule Tv-RES. Rule Tv-IN states that the input process $x(\vec{y}).P$ is well typed if the input channel x has a link type and if the contained process P is well typed once each name in $\vec{y}$ is mapped to the corresponding type in the type of x. The output process $\overline{x}\langle \vec{v} \rangle$ is well typed if x has a link type and its arguments have the corresponding types, rule Tv-OUT. A replicated process is well typed if the process being replicated is, rule Tv-REP. A parallel composition is well typed if each of its components is, rule Tv-PAR.
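The rules of Figure 2.6 are syntax-directed, so they translate directly into a recursive checker. The following Python sketch is our own illustration (all class and function names are hypothetical): basic types are the strings `"int"` and `"str"`, link types are `Link` values, and `TypeError` is raised when no rule applies. Checking the process whose derivation is worked out by hand below, $(\nu\, echo\colon(str, (str)))\, echo(msg, reply).\overline{reply}\langle msg\rangle$, succeeds.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Link:                      # link type (T1 ... Tn)
    args: Tuple[Type, ...]


Type = Union[str, Link]          # "int" | "str" | Link(...)


@dataclass(frozen=True)
class Lit:                       # primitive value
    value: Union[int, str]


@dataclass(frozen=True)
class Nil:
    pass


@dataclass(frozen=True)
class Out:
    chan: str
    args: Tuple[Union[str, Lit], ...]   # a bare str is a name


@dataclass(frozen=True)
class In:
    chan: str
    params: Tuple[str, ...]
    body: Process


@dataclass(frozen=True)
class Par:
    left: Process
    right: Process


@dataclass(frozen=True)
class Res:
    name: str
    typ: Link
    body: Process


@dataclass(frozen=True)
class Rep:
    body: Process


Process = Union[Nil, Out, In, Par, Res, Rep]
Env = Dict[str, Type]


def value_type(env: Env, v: Union[str, Lit]) -> Type:
    if isinstance(v, Lit):                       # Tv-Base
        return "int" if isinstance(v.value, int) else "str"
    if v in env:                                 # Tv-Name
        return env[v]
    raise TypeError(f"unbound name {v!r}")


def link_of(env: Env, x: str, arity: int) -> Link:
    t = value_type(env, x)
    if not isinstance(t, Link) or len(t.args) != arity:
        raise TypeError(f"{x} is not a link of arity {arity}")
    return t


def check(env: Env, p: Process) -> None:
    """Raise TypeError unless env |- p by the rules of Figure 2.6."""
    if isinstance(p, Nil):                       # Tv-Nil
        return
    if isinstance(p, Out):                       # Tv-Out
        t = link_of(env, p.chan, len(p.args))
        for v, expected in zip(p.args, t.args):
            if value_type(env, v) != expected:
                raise TypeError(f"{v!r} sent on {p.chan} is not of type {expected}")
    elif isinstance(p, In):                      # Tv-In
        t = link_of(env, p.chan, len(p.params))
        check({**env, **dict(zip(p.params, t.args))}, p.body)
    elif isinstance(p, Par):                     # Tv-Par
        check(env, p.left)
        check(env, p.right)
    elif isinstance(p, Res):                     # Tv-Res
        check({**env, p.name: p.typ}, p.body)
    elif isinstance(p, Rep):                     # Tv-Rep
        check(env, p.body)
    else:
        raise TypeError(f"not a process: {p!r}")
```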

Now, we show that process

$$(\nu \ echo: (str, (str))) \ echo(msg, reply).\overline{reply}\langle msg \rangle$$

is well typed. Using rule TV-RES we derive

$$\frac{\emptyset, echo: (str, (str)) \vdash echo(msg, reply).\overline{reply} \langle msg \rangle}{\emptyset \vdash (\nu \ echo: (str, (str))) \ echo(msg, reply).\overline{reply} \langle msg \rangle} \ \text{Tv-Res}$$

Let $\Gamma' \stackrel{\text{def}}{=} \emptyset, echo\colon (str, (str))$. We must show that, under the new typing environment $\Gamma'$, the following process is well typed:

$$echo(msg, reply).\overline{reply}\langle msg \rangle$$

Applying rule TV-IN:

$$\frac{\Gamma', msg: str, reply: (str) \vdash \overline{reply} \langle msg \rangle \quad \Gamma' \vdash echo: (str, (str))}{\Gamma' \vdash echo(msg, reply).\overline{reply} \langle msg \rangle} \text{ Tv-In}$$

Rule TV-NAME ensures that  $\Gamma' \vdash echo: (str, (str))$  holds. Now, let

$$\Gamma'' \stackrel{\text{def}}{=} \Gamma', msg: str, reply: (str)$$

We are left with the remaining premise, which also holds:

$$\dfrac{\dfrac{}{\Gamma'' \vdash reply \colon (str)}\ \text{Tv-Name} \qquad \dfrac{}{\Gamma'' \vdash msg \colon str}\ \text{Tv-Name}}{\Gamma'' \vdash \overline{reply} \langle msg \rangle}\ \text{Tv-Out}$$

# Chapter 3

## MIL: Multithreaded Typed Assembly Language

MIL [26] combines a typed assembly language (TAL) with multithreaded programming, aiming at "executing trusted code safely and efficiently" [18]. Types ensure that pointers cannot be fabricated or forged and that jumps can only target checked code, allowing untrusted compilers to generate typed assembly code that can be compiled with a single trusted compiler.

Multithreaded programming at the assembly level helps to structure inter-thread synchronisation. The type system we provide for the language enforces the absence of race conditions.

#### 3.1 Architecture

MIL envisages an abstract multiprocessor with a shared main memory. Each processor consists of registers and an instruction cache. The main memory is divided into a heap (for storing data and code blocks) and a run pool. Data blocks are represented by tuples and are protected by locks. Code blocks declare the registers they need (including the type of each register), the locks they require, and a sequence of instructions. The run pool contains suspended threads waiting for execution; there may be more threads ready to run than available processors. Figure 3.1 summarizes the MIL architecture.



Figure 3.1: The MIL architecture.

#### 3.2 Syntax

The syntax of our language is generated by the grammar in Figures 3.2, 3.3, and 3.9. We defer the presentation of types to Section 3.4. We rely on a set of *heap labels* ranged over by l, and a disjoint set of *type variables* ranged over by $\alpha$ and $\beta$.

Most of the proposed instructions, shown in Figure 3.2, are standard in assembly languages. Instructions are organised in sequences, each ending in a jump or in a yield. Instruction yield frees the processor to execute another thread from the thread pool. Our threads are cooperative, meaning that each thread must explicitly release the processor (using the yield instruction).

The *abstract machine*, depicted in Figure 3.3, is parametric in the number of processors available (N) and in the number of registers (R).

An abstract machine can be in two possible states: halted or running. A running machine comprises a heap, a thread pool, and an array of processors. Heaps are maps from labels into *heap values* that may be tuples or code blocks. *Tuples* are vectors of values protected by some lock. Code blocks comprise a signature and a body. The signature of a code block describes the type of each register used in the body, and the locks held by the processor when jumping to the code block. The body is a sequence of instructions to be executed by a processor.

$$\begin{array}{lrl}
\text{registers} & r ::= & \mathsf{r}_1 \mid \ldots \mid \mathsf{r}_{\mathsf{R}} \\
\text{integer values} & n ::= & \ldots \mid -1 \mid 0 \mid 1 \mid \ldots \\
\text{lock values} & b ::= & -1 \mid 0 \mid 1 \mid \ldots \\
\text{values} & v ::= & r \mid n \mid b \mid l \mid \mathsf{pack}\ \tau, v\ \mathsf{as}\ \tau \mid \mathsf{packL}\ \alpha, v\ \mathsf{as}\ \tau \mid v[\tau] \mid ?\tau \\
\text{instructions} & \iota ::= & \\
\quad \text{control flow} & & r := v \mid r := r + v \mid \mathsf{if}\ r = v\ \mathsf{jump}\ v \mid \\
\quad \text{memory} & & r := \mathsf{malloc}\ [\vec{\tau}]\ \mathsf{guarded\ by}\ \alpha \mid r := v[n] \mid r[n] := v \mid \\
\quad \text{unpack} & & \alpha, r := \mathsf{unpack}\ v \mid \\
\quad \text{lock} & & \alpha, r := \mathsf{newLock}\ b \mid \alpha := \mathsf{newLockLinear} \mid r := \mathsf{tslE}\ v \mid r := \mathsf{tslS}\ v \mid \mathsf{unlockE}\ v \mid \mathsf{unlockS}\ v \mid \\
\quad \text{fork} & & \mathsf{fork}\ v \\
\text{inst. sequences} & I ::= & \iota; I \mid \mathsf{jump}\ v \mid \mathsf{yield}
\end{array}$$

Figure 3.2: Instructions

A thread pool is a multiset of pairs, each of which contains a pointer (i.e., a label) to a code block and a register file. A processor array contains N processors, each composed of a register file, a permission (a triple of lock sets), and a sequence of instructions.

#### 3.3 Operational Semantics

Thread pools are managed by the rules illustrated in Figure 3.4. Rule R-HALT stops the machine, changing its state to halt, when the thread pool is empty and all processors are idle. Otherwise, if there is an idle processor and a thread waiting in the pool, Rule R-SCHEDULE assigns the thread to the idle processor. Rule R-FORK places a new thread in the pool, transferring to it the ownership of the locks required by the forked code block.

| Syntactic category | Definition                                                              |
|--------------------|-------------------------------------------------------------------------|
| states             | $S ::= \langle H;T;P \rangle \mid$ halt                                 |
| heaps              | $H ::= \{l_1: h_1, \dots, l_n: h_n\}$                                   |
| heap values        | $h ::= \langle v_1 \dots v_n \rangle^{\alpha} \mid \tau\{I\}$           |
| thread pool        | $T ::= \{ \langle l_1, R_1 \rangle, \dots, \langle l_n, R_n \rangle \}$ |
| processor array    | $P ::= \{1: p_1, \ldots, N: p_N\}$                                      |
| processor          | $p ::= \langle R; \Lambda; I \rangle$                                   |
| register files     | $R ::= \{r_1 \colon v_1, \ldots, r_{R} \colon v_{R}\}$                  |
| permissions        | $\Lambda ::= (\lambda, \lambda, \lambda)$                               |
| lock sets          | $\lambda ::= \alpha_1, \ldots, \alpha_n$                                |

Figure 3.3: Abstract machine
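The three thread-pool rules can be mimicked by a small Python model of the machine state of Figure 3.3. This is a sketch under simplifying assumptions of our own: each code block is reduced to the permission $\Lambda$ it requires, a processor whose instruction sequence is yield is marked idle by an empty label, the scheduler takes threads from the pool in FIFO order (R-SCHEDULE allows any choice), and a fork lacking the required locks raises an error instead of getting stuck. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

Locks = FrozenSet[str]
Perm = Tuple[Locks, Locks, Locks]      # (exclusive, shared, linear)
EMPTY: Perm = (frozenset(), frozenset(), frozenset())
Regs = Dict[str, object]


@dataclass
class Processor:
    regs: Regs = field(default_factory=dict)
    perm: Perm = EMPTY
    label: str = ""                    # "" means idle (instruction `yield`)


@dataclass
class Machine:
    requires: Dict[str, Perm]          # code block label -> required permission
    pool: List[Tuple[str, Regs]]       # thread pool of <label, register file>
    procs: List[Processor]
    halted: bool = False


def r_halt(m: Machine) -> bool:
    """R-HALT: empty thread pool and every processor idle."""
    if not m.pool and all(p.label == "" for p in m.procs):
        m.halted = True
    return m.halted


def r_schedule(m: Machine, i: int) -> bool:
    """R-SCHEDULE: run a pooled thread on idle processor i."""
    if m.procs[i].label or not m.pool:
        return False
    label, regs = m.pool.pop(0)        # FIFO: an assumption of this sketch
    m.procs[i] = Processor(dict(regs), m.requires[label], label)
    return True


def r_fork(m: Machine, i: int, label: str) -> None:
    """R-FORK: move the locks required by `label` to a new pooled thread."""
    p = m.procs[i]
    need = m.requires[label]
    if not all(n <= have for n, have in zip(need, p.perm)):
        raise RuntimeError(f"processor {i} lacks the locks required by {label}")
    p.perm = tuple(have - n for n, have in zip(need, p.perm))
    m.pool.append((label, dict(p.regs)))


def r_yield(m: Machine, i: int) -> None:
    """Processor i reaches `yield` and becomes idle."""
    m.procs[i] = Processor()
```

In a run where a thread holding lock `a` forks a block that requires `a`, the lock leaves the forking processor and reappears on whichever processor later schedules the forked thread.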

The operational semantics of the lock instructions is depicted in Figures 3.5 and 3.6. The instruction **newLock** creates a new lock in one of three states, according to its parameter: locked exclusively (parameter -1), locked shared (parameter 1), or unlocked (parameter 0). The scope of $\alpha$ is the rest of the code block. A tuple holding the value of the parameter is allocated in the heap and register r is made to point to it. For example, a new lock in the unlocked state allocates the tuple $\langle \mathbf{0} \rangle^{\beta}$. When the lock is created in the exclusive state, the new lock variable $\beta$ is added to the set of exclusive locks held by the processor. Similarly, when the lock is created in the shared state, $\beta$ is added to the set of shared locks held by the processor, and the tuple records a single reader.

Linear locks are created by newLockLinear: the new lock variable $\beta$ is added to the set of linear locks, and no tuple is allocated in the heap.
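The effect of the four lock-creation rules of Figure 3.5 on the heap and on the processor's lock sets fits in a few lines of Python. This sketch is ours, not part of MIL: fresh labels and lock variables are drawn from a counter, registers simply hold labels, and the names `new_lock` and `new_lock_linear` are hypothetical.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Set

_fresh = itertools.count()             # source of fresh labels and lock variables


@dataclass
class HeapTuple:
    fields: List[int]
    guard: str                         # the lock variable beta in <...>^beta


@dataclass
class Proc:
    exclusive: Set[str] = field(default_factory=set)   # lambda_E
    shared: Set[str] = field(default_factory=set)      # lambda_S
    linear: Set[str] = field(default_factory=set)      # lambda_L
    regs: Dict[str, str] = field(default_factory=dict)


def new_lock(heap: Dict[str, HeapTuple], p: Proc, r: str, b: int) -> str:
    """alpha, r := newLock b; returns the fresh lock variable beta."""
    if b not in (-1, 0, 1):
        raise ValueError("newLock takes -1, 0 or 1")
    n = next(_fresh)
    beta, label = f"beta{n}", f"l{n}"
    heap[label] = HeapTuple([b], beta)   # allocate <b>^beta
    p.regs[r] = label                    # r points to the lock tuple
    if b == -1:
        p.exclusive.add(beta)            # R-NEW-LOCK -1
    elif b == 1:
        p.shared.add(beta)               # R-NEW-LOCK 1 (one reader)
    return beta                          # b == 0: R-NEW-LOCK 0, nothing held


def new_lock_linear(p: Proc) -> str:
    """alpha := newLockLinear: a new linear lock, no heap allocation."""
    beta = f"beta{next(_fresh)}"
    p.linear.add(beta)
    return beta
```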

The test-and-set-lock instruction, present in many machines designed for multiprocessing, is an atomic operation that loads the contents of a word into a register and then stores another value in that word. There are two variants of the test-and-set-lock instruction in our language: tslE and tslS. When tslE is applied to an unlocked lock, the type variable $\alpha$ is added to the set of exclusive locks and the value becomes $\langle -1 \rangle^{\alpha}$. Several threads may read values from a tuple locked in the shared state; hence, when tslS is applied to a shared or to an unlocked lock, the value contained in the tuple representing the lock is incremented, reflecting the number of readers holding the shared lock, and the type variable $\alpha$ is added to the set of held shared locks. When tslS is applied to a lock in the exclusive state, it places -1 in the target register and the lock is not acquired by the thread issuing the operation.

$$\begin{array}{c}
\dfrac{\forall i.\, P(i) = \langle \_; \_; \mathsf{yield} \rangle}{\langle \_; \emptyset; P \rangle \to \mathsf{halt}}\ (\text{R-HALT})
\\[3ex]
\dfrac{H(l) = \forall[\_]\,\_ \text{ requires } \Lambda\{I\}}{\langle H; T \uplus \{\langle l, R \rangle\}; P\{i: \langle \_; \_; \mathsf{yield} \rangle\} \rangle \to \langle H; T; P\{i: \langle R; \Lambda; I \rangle\} \rangle}\ (\text{R-SCHEDULE})
\\[3ex]
\dfrac{\hat{R}(v) = l \quad H(l) = \forall[\_]\,\_ \text{ requires } \Lambda\{\_\}}{\langle H; T; P\{i: \langle R; \Lambda \uplus \Lambda'; (\mathsf{fork}\ v; I) \rangle\} \rangle \to \langle H; T \cup \{\langle l, R \rangle\}; P\{i: \langle R; \Lambda'; I \rangle\} \rangle}\ (\text{R-FORK})
\end{array}$$

Figure 3.4: Operational semantics (thread pool)

Shared locks are released with unlockS, which decrements the number of readers; the running processor must hold the shared lock. Exclusive locks are released with unlockE, which likewise requires the running processor to hold the exclusive lock.
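The acquire and release protocol just described can be simulated on a single lock cell. In this hypothetical Python sketch each thread's lock sets are a `Perm` object; the ownership checks on unlocking, which MIL guarantees statically through its type system, are performed dynamically here. The value returned by a failed tslE (the current, non-zero lock value) is an assumption of the sketch, consistent with test-and-set loading the contents of the word.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Lock:
    """A lock tuple <b>^alpha: -1 exclusive, 0 unlocked, n > 0 readers."""
    name: str = "alpha"
    value: int = 0


@dataclass
class Perm:
    exclusive: Set[str] = field(default_factory=set)
    shared: Set[str] = field(default_factory=set)


def tsl_e(lock: Lock, perm: Perm) -> int:
    """r := tslE v; returns the value put in r (0 means acquired)."""
    if lock.value == 0:
        lock.value = -1
        perm.exclusive.add(lock.name)
        return 0
    return lock.value          # assumption: a failed tslE loads the current value


def tsl_s(lock: Lock, perm: Perm) -> int:
    """r := tslS v; acquires a shared or unlocked lock as one more reader."""
    if lock.value >= 0:
        lock.value += 1
        perm.shared.add(lock.name)
        return 0
    return -1                  # held exclusively: not acquired


def unlock_s(lock: Lock, perm: Perm) -> None:
    if lock.name not in perm.shared:
        raise RuntimeError("unlockS on a shared lock that is not held")
    lock.value -= 1            # one reader fewer
    perm.shared.discard(lock.name)


def unlock_e(lock: Lock, perm: Perm) -> None:
    if lock.name not in perm.exclusive:
        raise RuntimeError("unlockE on an exclusive lock that is not held")
    lock.value = 0             # back to the unlocked state
    perm.exclusive.discard(lock.name)
```

A writer cannot acquire the lock while readers hold it, and readers are refused (register set to -1) while the writer holds it.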

The rules for memory instructions are illustrated in Figure 3.7. A value can be stored in a tuple when the lock that guards the tuple is held by the processor in its set of exclusive locks or in its set of linear locks. A value can be loaded from a tuple if the lock guarding the tuple is held by the processor in any of its lock sets. The rule for malloc allocates a new tuple in the heap and makes r point to it; the tuple has one field per type in $[\vec{\tau}]$, each initially holding an uninitialised value.
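A dynamic rendering of these access conditions may help. In the Python sketch below (our own, with hypothetical names), a sentinel object stands for uninitialised values, fields are numbered from 1 as in Figure 3.7, and violations raise `PermissionError`; in MIL these conditions are guaranteed statically by the type system rather than checked at run time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

UNINIT = object()                      # stands for an uninitialised value ?tau


@dataclass
class HeapTuple:
    guard: str                         # lock alpha guarding the tuple
    fields: List[object]


@dataclass
class Perm:
    exclusive: Set[str] = field(default_factory=set)
    shared: Set[str] = field(default_factory=set)
    linear: Set[str] = field(default_factory=set)


Heap = Dict[str, HeapTuple]


def malloc(heap: Heap, label: str, n_fields: int, guard: str) -> None:
    """r := malloc [tau_1 .. tau_n] guarded by alpha (label chosen by the caller)."""
    heap[label] = HeapTuple(guard, [UNINIT] * n_fields)


def load(heap: Heap, perm: Perm, label: str, n: int) -> object:
    """r := v[n]: allowed if the guard is in any of the lock sets."""
    t = heap[label]
    if t.guard not in perm.exclusive | perm.shared | perm.linear:
        raise PermissionError(f"lock {t.guard} not held")
    return t.fields[n - 1]             # fields are numbered from 1


def store(heap: Heap, perm: Perm, label: str, n: int, value: object) -> None:
    """r[n] := v: allowed only under an exclusive or a linear lock."""
    t = heap[label]
    if t.guard not in perm.exclusive | perm.linear:
        raise PermissionError(f"lock {t.guard} not held exclusively or linearly")
    t.fields[n - 1] = value
```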

The transition rules for the control-flow instructions, illustrated in Figure 3.8, are straightforward [22]. They rely on function $\hat{R}$, which resolves a value by looking up registers, including registers occurring inside packs and universal concretions.

$$\begin{array}{c}
\dfrac{P(i) = \langle R; \Lambda; (\alpha, r := \mathsf{newLock}\ \mathbf{0}; I) \rangle \quad l \notin \operatorname{dom}(H) \quad \beta \notin \Lambda}{\langle H; T; P \rangle \to \langle H\{l: \langle \mathbf{0} \rangle^{\beta}\}; T; P\{i: \langle R\{r: l\}; \Lambda; I[\beta/\alpha] \rangle\} \rangle}\ (\text{R-NEW-LOCK } \mathbf{0})
\\[3ex]
\dfrac{P(i) = \langle R; \Lambda; (\alpha, r := \mathsf{newLock}\ \mathbf{1}; I) \rangle \quad l \notin \operatorname{dom}(H) \quad \beta \notin \Lambda}{\langle H; T; P \rangle \to \langle H\{l: \langle \mathbf{1} \rangle^{\beta}\}; T; P\{i: \langle R\{r: l\}; (\lambda_E, \lambda_S \uplus \{\beta\}, \lambda_L); I[\beta/\alpha] \rangle\} \rangle}\ (\text{R-NEW-LOCK } \mathbf{1})
\\[3ex]
\dfrac{P(i) = \langle R; \Lambda; (\alpha, r := \mathsf{newLock}\ \mathbf{-1}; I) \rangle \quad l \notin \operatorname{dom}(H) \quad \beta \notin \Lambda}{\langle H; T; P \rangle \to \langle H\{l: \langle \mathbf{-1} \rangle^{\beta}\}; T; P\{i: \langle R\{r: l\}; (\lambda_E \uplus \{\beta\}, \lambda_S, \lambda_L); I[\beta/\alpha] \rangle\} \rangle}\ (\text{R-NEW-LOCK } \mathbf{-1})
\\[3ex]
\dfrac{P(i) = \langle R; \Lambda; (\alpha := \mathsf{newLockLinear}; I) \rangle \quad \beta \notin \Lambda}{\langle H; T; P \rangle \to \langle H; T; P\{i: \langle R; (\lambda_E, \lambda_S, \lambda_L \uplus \{\beta\}); I[\beta/\alpha] \rangle\} \rangle}\ (\text{R-NEW-LOCKL})
\end{array}$$

Figure 3.5: Operational semantics (lock creation)

$$\hat{\mathbf{R}}(v) = \begin{cases} R(v) & \text{if } v \text{ is a register} \\ \mathsf{pack} \ \tau, \hat{\mathbf{R}}(v') \text{ as } \tau' & \text{if } v \text{ is pack } \tau, v' \text{ as } \tau' \\ \mathsf{packL} \ \alpha, \hat{\mathbf{R}}(v') \text{ as } \tau & \text{if } v \text{ is packL} \ \alpha, v' \text{ as } \tau \\ \hat{\mathbf{R}}(v')[\tau] & \text{if } v \text{ is } v'[\tau] \\ v & \text{otherwise} \end{cases}$$
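Function $\hat{R}$ is a simple structural recursion. A Python transcription, assuming our own representation of values (registers as `Reg`, heap labels as strings, integers as `int`, and types kept as opaque strings; all names hypothetical), could read:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Reg:                 # a register r
    name: str


@dataclass(frozen=True)
class Pack:                # pack tau, v as tau'
    witness: str
    value: Value
    as_type: str


@dataclass(frozen=True)
class PackL:               # packL alpha, v as tau
    lock: str
    value: Value
    as_type: str


@dataclass(frozen=True)
class TyApp:               # v[tau], a universal concretion
    value: Value
    ty: str


Value = Union[Reg, Pack, PackL, TyApp, int, str]   # str: a heap label


def r_hat(regs: Dict[str, Value], v: Value) -> Value:
    """Resolve registers in v, also inside packs and concretions."""
    if isinstance(v, Reg):
        return regs[v.name]
    if isinstance(v, Pack):
        return Pack(v.witness, r_hat(regs, v.value), v.as_type)
    if isinstance(v, PackL):
        return PackL(v.lock, r_hat(regs, v.value), v.as_type)
    if isinstance(v, TyApp):
        return TyApp(r_hat(regs, v.value), v.ty)
    return v                                        # integers, labels, ...
```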

#### 3.4 Type Discipline

The syntax of types is given in Figure 3.9. A type of the form $\langle \vec{\sigma} \rangle^{\alpha}$ describes a tuple that is protected by lock $\alpha$. Each type in $\vec{\sigma}$ is either initialised ($\tau$) or uninitialised ($?\tau$). A type of the form $\forall [\vec{\alpha}]\Gamma$ requires $\Lambda$ describes a code block: a thread jumping into such a block must instantiate all the universal variables $\vec{\alpha}$, and it must also hold a register file of type $\Gamma$ as well as the locks in $\Lambda$. The singleton lock type, $\mathsf{lock}(\alpha)$, is the type of a lock value in the heap. The type $\exists \alpha.\tau$ is the conventional existential type. Following [4], the type $\exists^{\mathrm{L}}\alpha.\tau$ quantifies existentially over lock types. Finally, the recursive type $\mu\alpha.\tau$ describes a type that may itself occur among the types it is composed of.

$$\begin{array}{c}
\dfrac{P(i) = \langle R; \Lambda; (r := \mathsf{tslS}\ v; I) \rangle \quad \hat{R}(v) = l \quad H(l) = \langle b \rangle^{\alpha} \quad b \geq \mathbf{0}}{\langle H; T; P \rangle \to \langle H\{l: \langle b + \mathbf{1} \rangle^{\alpha}\}; T; P\{i: \langle R\{r: \mathbf{0}\}; (\lambda_E, \lambda_S \uplus \{\alpha\}, \lambda_L); I \rangle\} \rangle}\ (\text{R-TSLS-ACQ})
\\[3ex]
\dfrac{P(i) = \langle R; \Lambda; (r := \mathsf{tslS}\ v; I) \rangle \quad H(\hat{R}(v)) = \langle \mathbf{-1} \rangle^{\alpha}}{\langle H; T; P \rangle \to \langle H; T; P\{i: \langle R\{r: \mathbf{-1}\}; \Lambda; I \rangle\} \rangle}\ (\text{R-TSLS-FAIL})
\\[3ex]
\dfrac{P(i) = \langle R; \Lambda; (r := \mathsf{tslE}\ v; I) \rangle \quad \hat{R}(v) = l \quad H(l) = \langle \mathbf{0} \rangle^{\alpha}}{\langle H; T; P \rangle \to \langle H\{l: \langle \mathbf{-1} \rangle^{\alpha}\}; T; P\{i: \langle R\{r: \mathbf{0}\}; (\lambda_E \uplus \{\alpha\}, \lambda_S, \lambda_L); I \rangle\} \rangle}\ (\text{R-TSLE-ACQ})
\\[3ex]
\dfrac{P(i) = \langle R; \Lambda; (r := \mathsf{tslE}\ v; I) \rangle \quad H(\hat{R}(v)) = \langle b \rangle^{\alpha} \quad b \neq \mathbf{0}}{\langle H; T; P \rangle \to \langle H; T; P\{i: \langle R\{r: b\}; \Lambda; I \rangle\} \rangle}\ (\text{R-TSLE-FAIL})
\\[3ex]
\dfrac{P(i) = \langle R; (\lambda_E, \lambda_S \uplus \{\alpha\}, \lambda_L); (\mathsf{unlockS}\ v; I) \rangle \quad \hat{R}(v) = l \quad H(l) = \langle b \rangle^{\alpha}}{\langle H; T; P \rangle \to \langle H\{l: \langle b - \mathbf{1} \rangle^{\alpha}\}; T; P\{i: \langle R; (\lambda_E, \lambda_S, \lambda_L); I \rangle\} \rangle}\ (\text{R-UNLOCKS})
\\[3ex]
\dfrac{P(i) = \langle R; (\lambda_E \uplus \{\alpha\}, \lambda_S, \lambda_L); (\mathsf{unlockE}\ v; I) \rangle \quad \hat{R}(v) = l \quad H(l) = \langle b \rangle^{\alpha}}{\langle H; T; P \rangle \to \langle H\{l: \langle \mathbf{0} \rangle^{\alpha}\}; T; P\{i: \langle R; (\lambda_E, \lambda_S, \lambda_L); I \rangle\} \rangle}\ (\text{R-UNLOCKE})
\end{array}$$

Figure 3.6: Operational semantics (lock manipulation)

The type system is presented in Figures 3.10 to 3.13. Typing for values is illustrated in Figure 3.10. Heap values are distinguished from operands (which also include registers) by the form of the sequent. The uninitialised value $?\tau$ has type $?\tau$; we use the same syntax for an uninitialised value (to the left of the colon) and for its type (to the right of the colon). The subtyping relation $\sigma <: \sigma'$ allows initialisations to be "forgotten", so that an initialised type $\tau$ may be used where $?\tau$ is expected.
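Forgetting initialisations amounts to a very small relation on field types. The Python sketch below is our own illustration with hypothetical names; types are opaque strings, `Uninit` marks $?\tau$, and the pointwise extension to tuple fields in `tuple_subtype` is an assumption made for the example rather than a rule quoted from the type system.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class Uninit:
    """?tau: the uninitialised version of tau."""
    ty: str


Field = Union[str, Uninit]             # tau (an opaque type name) or ?tau


def field_subtype(s: Field, t: Field) -> bool:
    """sigma <: sigma': reflexive, and tau <: ?tau forgets an initialisation."""
    if s == t:
        return True
    return isinstance(t, Uninit) and s == t.ty


def tuple_subtype(ss: Sequence[Field], ts: Sequence[Field]) -> bool:
    """Assumed pointwise extension to the field types of a tuple."""
    return len(ss) == len(ts) and all(field_subtype(s, t) for s, t in zip(ss, ts))
```

The relation only goes one way: an uninitialised field can never be treated as initialised.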

Instructions are checked against a typing environment $\Psi$ (mapping labels to types, and type variables to the kind Lock, the kind of singleton lock types), a register file type $\Gamma$ holding the current types of the registers, and a triple $\Lambda$ of sets of lock variables (the *permission* of the code block), namely the exclusive, the shared, and the linear locks.

$$\begin{array}{c}
\dfrac{P(i) = \langle R; \Lambda; (r := \mathsf{malloc}\ [\vec{\tau}]\ \mathsf{guarded\ by}\ \alpha; I) \rangle \quad l \notin \operatorname{dom}(H)}{\langle H; T; P \rangle \to \langle H\{l: \langle \vec{?\tau} \rangle^{\alpha}\}; T; P\{i: \langle R\{r: l\}; \Lambda; I \rangle\} \rangle}\ (\text{R-MALLOC})
\\[3ex]
\dfrac{P(i) = \langle R; \Lambda; (r := v[n]; I) \rangle \quad H(\hat{R}(v)) = \langle v_1 .. v_n .. v_{n+m} \rangle^{\alpha}}{\langle H; T; P \rangle \to \langle H; T; P\{i: \langle R\{r: v_n\}; \Lambda; I \rangle\} \rangle}\ (\text{R-LOAD})
\\[3ex]
\dfrac{P(i) = \langle R; \Lambda; (r[n] := v; I) \rangle \quad R(r) = l \quad H(l) = \langle v_1 .. v_n .. v_{n+m} \rangle^{\alpha}}{\langle H; T; P \rangle \to \langle H\{l: \langle v_1 .. \hat{R}(v) .. v_{n+m} \rangle^{\alpha}\}; T; P\{i: \langle R; \Lambda; I \rangle\} \rangle}\ (\text{R-STORE})
\end{array}$$

Figure 3.7: Operational semantics (memory)

Rule T-YIELD requires that all shared and exclusive locks have been released before the thread ends. Rule T-FORK splits the permission of the current thread into two tuples,  $\Lambda$  and  $\Lambda'$ : the former, which matches the permission required by the target code block, is transferred to the forked thread; the latter remains with the current thread.

Rules T-NEW-LOCK1, T-NEW-LOCK-1, and T-NEW-LOCKL each add the new lock variable to the respective set of locks (shared, exclusive, and linear). Rules T-NEW-LOCK0, T-NEW-LOCK1, and T-NEW-LOCK-1 assign a lock type to the register. Rules T-TSLE and T-TSLS require the value under test to be a lock, and disallow testing a lock already held by the thread. Rules T-UNLOCKE and T-UNLOCKS make sure that only held locks are unlocked. Finally, rules T-CRITICALE and T-CRITICALS ensure that the current thread holds exactly the exclusive and shared locks required by the target code block, once the lock under test is added to the respective set of locks of the thread. A thread is thus guaranteed to hold the lock only after (conditionally) jumping to a critical region: a preceding test-and-set-lock instruction may already have obtained the lock, but as far as the type system is concerned, the thread holds the lock only after the conditional jump.
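To make the lock discipline concrete, the following Python sketch models a single heap lock word as in Figure 3.6 (0 free, -1 held exclusively, b > 0 held by b readers) together with one thread's permission triple. The names `LockWord`, `Thread`, `tsl_s`, `tsl_e`, `unlock_s`, `unlock_e`, and `LockPermissionError` are hypothetical; the run-time errors only mimic conditions that, in MIL, the type system rules out statically.

```python
from dataclasses import dataclass, field


class LockPermissionError(Exception):
    """Raised where the MIL type system would reject the program."""


@dataclass
class LockWord:
    """Heap lock value: 0 = free, -1 = held exclusively, b > 0 = b shared holders."""
    value: int = 0


@dataclass
class Thread:
    """A thread's permission triple Lambda = (exclusive, shared, linear)."""
    exclusive: set = field(default_factory=set)
    shared: set = field(default_factory=set)
    linear: set = field(default_factory=set)

    def _check_not_held(self, name: str) -> None:
        # T-TSLS / T-TSLE: a lock already held may not be tested again.
        if name in self.exclusive or name in self.shared or name in self.linear:
            raise LockPermissionError(name)

    def tsl_s(self, name: str, lock: LockWord) -> int:
        """r := tslS v: join the readers unless a writer holds the lock."""
        self._check_not_held(name)
        if lock.value >= 0:             # R-TSLS-ACQ
            lock.value += 1
            self.shared.add(name)
            return 0
        return -1                       # R-TSLS-FAIL

    def tsl_e(self, name: str, lock: LockWord) -> int:
        """r := tslE v: take the lock exclusively only if it is free."""
        self._check_not_held(name)
        if lock.value == 0:             # R-TSLE-ACQ
            lock.value = -1
            self.exclusive.add(name)
            return 0
        return lock.value               # R-TSLE-FAIL: report the current lock word

    def unlock_s(self, name: str, lock: LockWord) -> None:
        if name not in self.shared:     # ruled out statically by T-UNLOCKS
            raise LockPermissionError(name)
        self.shared.remove(name)
        lock.value -= 1                 # R-UNLOCKS

    def unlock_e(self, name: str, lock: LockWord) -> None:
        if name not in self.exclusive:  # ruled out statically by T-UNLOCKE
            raise LockPermissionError(name)
        self.exclusive.remove(name)
        lock.value = 0                  # R-UNLOCKE
```

For example, two threads calling `tsl_s` on a fresh `LockWord` both succeed and leave the word at 2; a third thread's `tsl_e` then fails and returns 2 until both readers have called `unlock_s`.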

$$\begin{array}{c}
\dfrac{P(i) = \langle R; \Lambda; \mathsf{jump}\ v \rangle \qquad H(\hat{R}(v)) = \_\{I\}}{\langle H; T; P \rangle \rightarrow \langle H; T; P\{i: \langle R; \Lambda; I \rangle\} \rangle} \qquad (\text{R-JUMP}) \\[2ex]
\dfrac{P(i) = \langle R; \Lambda; (r := v; I) \rangle}{\langle H; T; P \rangle \rightarrow \langle H; T; P\{i: \langle R\{r : \hat{R}(v)\}; \Lambda; I \rangle\} \rangle} \qquad (\text{R-MOVE}) \\[2ex]
\dfrac{P(i) = \langle R; \Lambda; (r := r' + v; I) \rangle}{\langle H; T; P \rangle \rightarrow \langle H; T; P\{i: \langle R\{r : R(r') + \hat{R}(v)\}; \Lambda; I \rangle\} \rangle} \qquad (\text{R-ARITH}) \\[2ex]
\dfrac{P(i) = \langle R; \Lambda; (\text{if } r = v \text{ jump } v'; \_) \rangle \qquad R(r) = v \qquad H(\hat{R}(v')) = \_\{I\}}{\langle H; T; P \rangle \rightarrow \langle H; T; P\{i: \langle R; \Lambda; I \rangle\} \rangle} \qquad (\text{R-BRANCHT}) \\[2ex]
\dfrac{P(i) = \langle R; \Lambda; (\text{if } r = v \text{ jump } \_; I) \rangle \qquad R(r) \neq v}{\langle H; T; P \rangle \rightarrow \langle H; T; P\{i: \langle R; \Lambda; I \rangle\} \rangle} \qquad (\text{R-BRANCHF}) \\[2ex]
\dfrac{P(i) = \langle R; \Lambda; (\alpha, r := \mathsf{unpack}\ v; I) \rangle \qquad \hat{R}(v) = \mathsf{pack}\ \tau, v' \text{ as } \_}{\langle H; T; P \rangle \rightarrow \langle H; T; P\{i: \langle R\{r : v'\}; \Lambda; I[\tau/\alpha] \rangle\} \rangle} \qquad (\text{R-UNPACK}) \\[2ex]
\dfrac{P(i) = \langle R; \Lambda; (\alpha, r := \mathsf{unpack}\ v; I) \rangle \qquad \hat{R}(v) = \mathsf{packL}\ \beta, v' \text{ as } \_}{\langle H; T; P \rangle \rightarrow \langle H; T; P\{i: \langle R\{r : v'\}; \Lambda; I[\beta/\alpha] \rangle\} \rangle} \qquad (\text{R-UNPACKL})
\end{array}$$

Figure 3.8: Operational semantics (control flow)

The typing rules for memory and control flow are depicted in Figure 3.12. The rule for malloc ensures that allocated memory is protected by a lock ( $\alpha$ ) present in the scope. The lock guarding a tuple determines which operations are permitted on it: holding the lock in any mode permits loading a value from the tuple, whereas only exclusive and linear locks permit storing a value into it.
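The load/store permission rule can be sketched as a small Python check over a tuple guarded by a named lock. `Permission`, `GuardedTuple`, `load`, and `store` are hypothetical names, and `None` is an assumed encoding of an uninitialised slot; in MIL these checks are performed statically by the type system rather than at run time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    """A thread's permission triple Lambda = (exclusive, shared, linear)."""
    exclusive: frozenset = frozenset()
    shared: frozenset = frozenset()
    linear: frozenset = frozenset()

    def may_read(self, lock: str) -> bool:
        # A lock of any kind permits loading from the tuple it guards.
        return lock in self.exclusive or lock in self.shared or lock in self.linear

    def may_write(self, lock: str) -> bool:
        # Only exclusive and linear locks permit storing into the tuple.
        return lock in self.exclusive or lock in self.linear


class GuardedTuple:
    """A heap tuple <v_1, ..., v_n>^alpha; None marks an uninitialised slot."""

    def __init__(self, guard: str, size: int):
        self.guard = guard
        self.slots = [None] * size      # malloc yields uninitialised slots

    def load(self, perm: Permission, n: int):
        """r := v[n]"""
        if not perm.may_read(self.guard):
            raise PermissionError(f"load requires a lock on {self.guard}")
        return self.slots[n]

    def store(self, perm: Permission, n: int, value) -> None:
        """r[n] := v"""
        if not perm.may_write(self.guard):
            raise PermissionError(f"store requires {self.guard} exclusive or linear")
        self.slots[n] = value
```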

The rules for typing machine states are illustrated in Figure 3.13. They are straightforward; the only point worth noting concerns heap tuples, for which we make sure that all locks protecting the tuples are in the domain of the typing environment.

#### 3.5 Examples

We select a typical case of inter-process communication: mutual exclusion. We create a tuple and then start two threads that try to write to the tuple concurrently. Since lock  $\alpha$  is not in the scope of the forked threads, a reference to it is transferred to each new thread by instantiating the universal value.

| Syntactic category  | Metavariable  | Productions |
|---------------------|---------------|-------------|
| types               | $\tau ::=$    | $\mathsf{int} \mid \langle \vec{\sigma} \rangle^{\alpha} \mid \forall [\vec{\alpha}] \Gamma \text{ requires } \Lambda \mid \mathsf{lock}(\alpha) \mid$ |
|                     |               | $\mathsf{lockE}(\alpha) \mid \mathsf{lockS}(\alpha) \mid \exists \alpha.\tau \mid \exists^{\mathtt{L}} \alpha.\tau \mid \mu\alpha.\tau \mid \alpha$ |
| init types          | $\sigma ::=$  | $\tau \mid ?\tau$ |
| register file types | $\Gamma ::=$  | $r_1 \colon \tau_1, \ldots, r_n \colon \tau_n$ |
| typing environment  | $\Psi ::=$    | $\emptyset \mid \Psi, l \colon \tau \mid \Psi, \alpha \colon \mathsf{Lock}$ |

Figure 3.9: Types

```
main() {
  α, r1 := newLock -1
  r2 := malloc [int] guarded by α
  r2[0] := 0
  unlockE r1
  fork thread1[α]
  fork thread2[α]
  yield
}
```

The two threads compete for lock  $\alpha$  using different strategies. In the first thread (thread1) we use a technique called *spin lock*: we loop actively, without releasing the processor, until we eventually acquire the lock exclusively. We then jump to the critical region.

```
thread1 ∀[α](r1: ⟨lock(α)⟩^α, r2: ⟨int⟩^α) {
  r3 := tslE r1                     -- exclusive because we want to write
  if r3 = 0 jump criticalRegion[α]
  jump thread1[α]
}
```

In the critical region, the value contained in the tuple is incremented and the lock is released.

```
criticalRegion ∀[α](r1: ⟨lock(α)⟩^α, r2: ⟨int⟩^α) requires (α;;) {
  r3 := r2[0]
  r3 := r3 + 1
  r2[0] := r3
  unlockE r1
  yield
}
```

$$\begin{array}{c}
\vdash \langle \sigma_{1}, \dots, \tau_{n}, \dots, \sigma_{n+m} \rangle^{\alpha} <: \langle \sigma_{1}, \dots, ?\tau_{n}, \dots, \sigma_{n+m} \rangle^{\alpha} \qquad (\text{S-UNINIT}) \\[2ex]
\dfrac{n \leq m}{\vdash r_{0} : \tau_{0}, \dots, r_{m} : \tau_{m} <: r_{0} : \tau_{0}, \dots, r_{n} : \tau_{n}} \qquad (\text{S-REG-FILE}) \\[2ex]
\vdash \sigma <: \sigma \qquad \dfrac{\vdash \sigma <: \sigma' \qquad \vdash \sigma' <: \sigma''}{\vdash \sigma <: \sigma''} \qquad (\text{S-REF, S-TRANS}) \\[2ex]
\dfrac{\vdash \tau' <: \tau}{\Psi, l : \tau' \vdash l : \tau} \qquad \Psi \vdash n : \mathsf{int} \qquad \Psi \vdash b : \mathsf{lock}(\alpha) \qquad \Psi \vdash ?\tau : ?\tau \qquad (\text{T-LABEL, T-INT, T-LOCK, T-UNINIT}) \\[2ex]
\dfrac{\Psi \vdash v : \tau'[\tau/\alpha] \qquad \alpha \notin \tau, \Psi}{\Psi \vdash \mathsf{pack}\ \tau, v \text{ as } \exists \alpha. \tau' : \exists \alpha. \tau'} \qquad \dfrac{\Psi \vdash v : \tau[\beta/\alpha] \qquad \alpha \notin \beta, \Psi}{\Psi \vdash \mathsf{packL}\ \beta, v \text{ as } \exists^{\mathtt{L}} \alpha. \tau : \exists^{\mathtt{L}} \alpha. \tau} \qquad (\text{T-PACK, T-PACKL}) \\[2ex]
\Psi; \Gamma \vdash r : \Gamma(r) \qquad \dfrac{\Psi \vdash v : \tau}{\Psi; \Gamma \vdash v : \tau} \qquad (\text{T-REG, T-VAL}) \\[2ex]
\dfrac{\Psi; \Gamma \vdash v : \forall [\alpha \vec{\beta}] \Gamma' \text{ requires } \Lambda}{\Psi; \Gamma \vdash v [\tau] : \forall [\vec{\beta}] \Gamma'[\tau/\alpha] \text{ requires } \Lambda[\tau/\alpha]} \qquad (\text{T-VAL-APP})
\end{array}$$

Figure 3.10: Typing rules for values  $\Psi \vdash v : \sigma$  and for operands  $\Psi; \Gamma \vdash v : \sigma$

In the second thread (thread2), we opt for a different technique called *sleep lock*. This strategy is cooperative towards other threads, since the thread does not keep the processor while it waits for the lock. We try to acquire the lock exclusively; if we succeed, we jump to the critical region. Otherwise, we fork a copy of this thread, which will try again later, and yield.

```
thread2 ∀[α](r1: ⟨lock(α)⟩^α, r2: ⟨int⟩^α) {
  r3 := tslE r1                     -- exclusive because we want to write
  if r3 = 0 jump criticalRegion[α]
  fork thread2[α]
  yield
}
```

Each technique has its advantages. A spin lock is faster, whereas a sleep lock is fairer to other threads. When there is a reasonable expectation that the lock will become available (with exclusive access) within a short period of time, a spin lock is more appropriate, because the sleep lock technique performs context switching, an expensive operation that degrades performance. A shortcoming of the spin lock on machines with only one processor is demonstrated in this example:
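Outside MIL, the two strategies can be sketched with Python's `threading.Lock`, whose non-blocking `acquire(blocking=False)` plays the role of a test-and-set (`tslE`). The spin version retries immediately; the sleep version gives up the processor between attempts with `time.sleep(0)`. This only approximates the MIL sleep lock, which instead forks a fresh copy of the thread and yields. The helper names are hypothetical.

```python
import threading
import time


def spin_acquire(lock: threading.Lock) -> None:
    """Spin lock: retry the test-and-set without giving up the processor."""
    while not lock.acquire(blocking=False):
        pass


def sleep_acquire(lock: threading.Lock) -> None:
    """Sleep-lock approximation: on failure, yield the processor and retry later."""
    while not lock.acquire(blocking=False):
        time.sleep(0)


def increment(cell: list, lock: threading.Lock, acquire) -> None:
    """The critical region of the example: read, add one, write back."""
    acquire(lock)
    try:
        cell[0] = cell[0] + 1
    finally:
        lock.release()                  # unlockE


def run_example() -> int:
    cell = [0]                          # plays the role of the tuple <int>^alpha
    lock = threading.Lock()             # lock alpha, initially free
    t1 = threading.Thread(target=increment, args=(cell, lock, spin_acquire))
    t2 = threading.Thread(target=increment, args=(cell, lock, sleep_acquire))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return cell[0]
```

Because CPython preempts running threads periodically, the spinning thread here cannot monopolise the interpreter indefinitely.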

```
main() {
  α, r1 := newLock -1
  fork release[α]
  jump spinLock[α]
}
release ∀[α](r1: ⟨lock(α)⟩^α) requires (α;;) {
  unlockE r1
  yield
}
spinLock ∀[α](r1: ⟨lock(α)⟩^α) {
  r2 := tslE r1
  if r2 = 0 jump someComputation[α]   -- will never happen
  jump spinLock[α]
}
```

The permission of lock  $\alpha$  is given to the forked thread, which will execute the code block release once a processor is available. The main thread then spins to obtain lock  $\alpha$ . But because the sole processor is busy trying to acquire lock  $\alpha$ , the scheduled thread that could release it is never executed.
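The starvation scenario can be reproduced with a small, hypothetical Python model of a single non-preemptive processor: each thread is a generator that yields once per instruction, a thread keeps the processor until it finishes (MIL `yield`), and forked threads wait in a pool. The step bound `max_steps` is an assumption of this sketch, used only to stop the otherwise endless spin; `sleep_lock` shows the cooperative alternative for comparison.

```python
from collections import deque


def run_uniprocessor(main, max_steps=1000):
    """Run threads one at a time; a thread is descheduled only when it ends."""
    state = {"lock": 0, "released": False, "acquired": False}
    pool = deque([main])
    steps = 0
    while pool:
        thread = pool.popleft()
        for _ in thread(state, pool.append):   # one iteration = one instruction
            steps += 1
            if steps >= max_steps:
                return state                   # give up: the processor never freed
    return state


def release(state, fork):
    state["lock"] = 0                          # unlockE r1
    state["released"] = True
    yield


def spin_lock(state, fork):
    while True:
        yield                                  # r2 := tslE r1
        if state["lock"] == 0:
            state["lock"] = -1
            state["acquired"] = True           # would jump to someComputation
            return
        # otherwise jump spinLock[alpha], still holding the processor


def sleep_lock(state, fork):
    yield                                      # r2 := tslE r1
    if state["lock"] == 0:
        state["lock"] = -1
        state["acquired"] = True
    else:
        fork(sleep_lock)                       # fork a copy that retries later, then end


def make_main(strategy):
    def main(state, fork):
        state["lock"] = -1                     # alpha, r1 := newLock -1
        yield
        fork(release)                          # fork release[alpha]
        yield
        yield from strategy(state, fork)       # jump to the waiting strategy
    return main
```

With `spin_lock`, the model exhausts its step budget without ever running `release`; with `sleep_lock`, the waiting thread ends, `release` runs, and the retry acquires the lock.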

The continuation passing style [7] suits programs written in MIL, since they are executed by a stack-less machine. In this programming model the *user* passes a continuation (a label) to a code block which, once it is done, proceeds at the continuation label (either by forking or by jumping). It may be useful to pass *user data* to the continuation code (as one of its parameters). Existential types make it possible to abstract the type of the user data. Let *UserContinuation* stand for

 $\forall [\alpha](r_1: \langle ?int \rangle^{\alpha}) \text{ requires } (;; \alpha)$ 

Let *PackedUserData* stand for

 $\exists \beta. \langle \forall [\gamma](r_1:\beta) \text{ requires } (;;\gamma), \beta \rangle^{\alpha}$ 

The type *PackedUserData* is a pair consisting of the continuation, of type

 $\forall [\gamma](r_1:\beta) \text{ requires } (;;\gamma)$ 

and the user data, of type  $\beta$ . The user data thus has the same type as register  $r_1$  in the continuation.

The pattern of usage for a value of type PackedUserData is to unpack the continuation and the user data, and "apply" the former to the latter, by moving the user data into register  $r_1$  and then jumping to the continuation. The user data is an *opaque* value to the code block manipulating the pair. Opaque user data has a twofold advantage. First, the user data cannot be altered by the manipulator. Second, the code using the pair does not depend on the type of the user data.

Next is an example of how to make a call to a code block that uses the continuation passing style:

```
main() {
  α := newLockLinear
  r2 := malloc [int] guarded by α
  r1 := malloc [UserContinuation, ⟨?int⟩^α] guarded by α
  r1[0] := continuation
  r1[1] := r2
  r1 := pack ⟨?int⟩^α, r1 as PackedUserData
  jump library[α]
}
library ∀[α](r1: PackedUserData) requires (;;α) {
  -- do some computation…
  x, r1 := unpack r1            -- we do not need the packed type here
  r2 := r1[0]                   -- the continuation
  r1 := r1[1]                   -- the user data
  jump r2[α]
}
continuation UserContinuation {
  -- do some work
}
```

The code block *main* allocates the user data of type  $\langle ?int \rangle^{\alpha}$ . Afterwards, the user data is stored into a tuple, along with the label pointing to the continuation. The tuple is then packed and passed to the library, which eventually calls the continuation by unpacking the packed data and jumping to the stored label.
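The same pattern can be sketched in Python, with a generic `Packed` pair standing in for PackedUserData: the library only ever applies the stored continuation to the stored data, so it never depends on the data's type. Python's type hints cannot express an existential quantifier, so `Packed[Any]` in `library` only approximates the abstraction; `Packed`, `library`, `continuation`, and `main` are hypothetical names mirroring the MIL labels.

```python
from dataclasses import dataclass
from typing import Any, Callable, Generic, TypeVar

B = TypeVar("B")  # the abstracted user-data type (beta)


@dataclass(frozen=True)
class Packed(Generic[B]):
    """A pair <continuation, user data> whose data type is hidden from its consumer."""
    continuation: Callable[[B], Any]
    user_data: B


def library(packed: Packed[Any]) -> Any:
    # ... do some computation that never inspects packed.user_data ...
    k = packed.continuation             # r2 := r1[0]  (the continuation)
    data = packed.user_data             # r1 := r1[1]  (the user data)
    return k(data)                      # jump r2[alpha]


def continuation(cell: list) -> list:
    """The user continuation: initialise the cell it receives."""
    cell.append(42)
    return cell


def main() -> list:
    cell: list = []                     # user data, initially "uninitialised"
    return library(Packed(continuation, cell))
```

The library can be reused with data of any type, for instance `library(Packed(len, "abc"))`.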

$$\begin{array}{c}
\Psi; \Gamma; (\emptyset, \emptyset, \lambda_L) \vdash \mathsf{yield} \qquad (\text{T-YIELD}) \\[2ex]
\dfrac{\Psi; \Gamma \vdash v \colon \forall []\Gamma' \text{ requires } \Lambda \qquad \Psi; \Gamma; \Lambda' \vdash I \qquad \vdash \Gamma <: \Gamma'}{\Psi; \Gamma; \Lambda \uplus \Lambda' \vdash \mathsf{fork}\ v; I} \qquad (\text{T-FORK}) \\[2ex]
\dfrac{\Psi, \alpha :: \mathsf{Lock}; \Gamma\{r : \langle \mathsf{lock}(\alpha) \rangle^{\alpha}\}; \Lambda \vdash I \qquad \alpha \notin \Psi, \Gamma, \Lambda}{\Psi; \Gamma; \Lambda \vdash \alpha, r := \mathsf{newLock}\ 0; I} \qquad (\text{T-NEW-LOCK0}) \\[2ex]
\dfrac{\Psi, \alpha :: \mathsf{Lock}; \Gamma\{r : \langle \mathsf{lock}(\alpha) \rangle^{\alpha}\}; (\lambda_E, \lambda_S \uplus \{\alpha\}, \lambda_L) \vdash I \qquad \alpha \notin \Psi, \Gamma, \Lambda}{\Psi; \Gamma; \Lambda \vdash \alpha, r := \mathsf{newLock}\ 1; I} \qquad (\text{T-NEW-LOCK1}) \\[2ex]
\dfrac{\Psi, \alpha :: \mathsf{Lock}; \Gamma\{r : \langle \mathsf{lock}(\alpha) \rangle^{\alpha}\}; (\lambda_E \uplus \{\alpha\}, \lambda_S, \lambda_L) \vdash I \qquad \alpha \notin \Psi, \Gamma, \Lambda}{\Psi; \Gamma; \Lambda \vdash \alpha, r := \mathsf{newLock}\ {-1}; I} \qquad (\text{T-NEW-LOCK-1}) \\[2ex]
\dfrac{\Psi, \alpha :: \mathsf{Lock}; \Gamma; (\lambda_E, \lambda_S, \lambda_L \uplus \{\alpha\}) \vdash I \qquad \alpha \notin \Psi, \Gamma, \Lambda}{\Psi; \Gamma; \Lambda \vdash \alpha := \mathsf{newLockLinear}; I} \qquad (\text{T-NEW-LOCKL}) \\[2ex]
\dfrac{\Psi; \Gamma \vdash v : \langle \mathsf{lock}(\alpha) \rangle^{\alpha} \qquad \Psi; \Gamma\{r : \mathsf{lockS}(\alpha)\}; \Lambda \vdash I \qquad \alpha \notin \Lambda}{\Psi; \Gamma; \Lambda \vdash r := \mathsf{tslS}\ v; I} \qquad (\text{T-TSLS}) \\[2ex]
\dfrac{\Psi; \Gamma \vdash v : \langle \mathsf{lock}(\alpha) \rangle^{\alpha} \qquad \Psi; \Gamma\{r : \mathsf{lockE}(\alpha)\}; \Lambda \vdash I \qquad \alpha \notin \Lambda}{\Psi; \Gamma; \Lambda \vdash r := \mathsf{tslE}\ v; I} \qquad (\text{T-TSLE}) \\[2ex]
\dfrac{\Psi; \Gamma \vdash v : \langle \mathsf{lock}(\alpha) \rangle^{\alpha} \qquad \alpha \in \lambda_S \qquad \Psi; \Gamma; (\lambda_E, \lambda_S \setminus \{\alpha\}, \lambda_L) \vdash I}{\Psi; \Gamma; (\lambda_E, \lambda_S, \lambda_L) \vdash \mathsf{unlockS}\ v; I} \qquad (\text{T-UNLOCKS}) \\[2ex]
\dfrac{\Psi; \Gamma \vdash v : \langle \mathsf{lock}(\alpha) \rangle^{\alpha} \qquad \alpha \in \lambda_E \qquad \Psi; \Gamma; (\lambda_E \setminus \{\alpha\}, \lambda_S, \lambda_L) \vdash I}{\Psi; \Gamma; (\lambda_E, \lambda_S, \lambda_L) \vdash \mathsf{unlockE}\ v; I} \qquad (\text{T-UNLOCKE}) \\[2ex]
\dfrac{\begin{array}{c} \Psi; \Gamma \vdash r : \mathsf{lockS}(\alpha) \qquad \Psi; \Gamma \vdash v : \forall [] \Gamma' \text{ requires } (\lambda_E, \lambda_S \uplus \{\alpha\}, \lambda'_L) \\ \Psi; \Gamma; (\lambda_E, \lambda_S, \lambda_L) \vdash I \qquad \vdash \Gamma <: \Gamma' \qquad \lambda'_L \subseteq \lambda_L \end{array}}{\Psi; \Gamma; (\lambda_E, \lambda_S, \lambda_L) \vdash \text{if } r = 0 \text{ jump } v; I} \qquad (\text{T-CRITICALS}) \\[2ex]
\dfrac{\begin{array}{c} \Psi; \Gamma \vdash r : \mathsf{lockE}(\alpha) \qquad \Psi; \Gamma \vdash v : \forall [] \Gamma' \text{ requires } (\lambda_E \uplus \{\alpha\}, \lambda_S, \lambda'_L) \\ \Psi; \Gamma; (\lambda_E, \lambda_S, \lambda_L) \vdash I \qquad \vdash \Gamma <: \Gamma' \qquad \lambda'_L \subseteq \lambda_L \end{array}}{\Psi; \Gamma; (\lambda_E, \lambda_S, \lambda_L) \vdash \text{if } r = 0 \text{ jump } v; I} \qquad (\text{T-CRITICALE})
\end{array}$$

Figure 3.11: Typing rules for instructions (thread pool and locks)  $\fbox{\Psi;\Gamma;\Lambda\vdash I}$ 

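$$\dfrac{\Psi; \Gamma \vdash r : \mathsf{int} \qquad \Psi; \Gamma \vdash v : \forall []\Gamma' \text{ requires } (\lambda_{E}, \lambda_{S}, \lambda'_{L}) \qquad \Psi; \Gamma; (\lambda_{E}, \lambda_{S}, \lambda_{L}) \vdash I \qquad \vdash \Gamma <: \Gamma' \qquad \lambda'_{L} \subseteq \lambda_{L}}{\Psi; \Gamma; (\lambda_{E}, \lambda_{S}, \lambda_{L}) \vdash \text{if } r = 0 \text{ jump } v; I} \qquad (\text{T-BRANCH})$$

$$\dfrac{\Psi; \Gamma \vdash v: \forall []\Gamma' \text{ requires } (\lambda_{E}, \lambda_{S}, \lambda'_{L}) \qquad \vdash \Gamma <: \Gamma' \qquad \lambda'_{L} \subseteq \lambda_{L}}{\Psi; \Gamma; (\lambda_{E}, \lambda_{S}, \lambda_{L}) \vdash \mathsf{jump}\ v} \qquad (\text{T-JUMP})$$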

where  $\operatorname{type}(\tau) = \operatorname{type}(?\tau) = \tau$ .

Figure 3.12: Typing rules for instructions (memory and control flow)  $\fbox{\Psi;\Gamma;\Lambda\vdash I}$ 



Figure 3.13: Typing rules for machine states

## Chapter 4

## The $\pi$ -Calculus Run-time

In this chapter we describe a library, written in MIL, that implements the primitives of the  $\pi$ -calculus used to support the generated code, following the design of Lopes et al. [13].

We implement a message-passing paradigm in a shared-memory architecture. Because communication in the version of the  $\pi$ -calculus we use is asynchronous, an output process may be represented in this run-time by the transmitted message itself, which is buffered in the representation of the channel until it is delivered. We also define a mechanism to *schedule* a process that waits for a message: its execution is blocked, and then resumed when a message arrives. This mechanism is used to represent both the input process and the replicated input process.

The *closure* of a process consists of an *environment* (the variables known by the process) and a *continuation* (a pointer to a code block that embodies the process). The closures of processes waiting for a message are stored in the target channel. When a message is delivered, we recover the state of the process, by applying the continuation to the message and the environment, and resume its execution.
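A minimal Python sketch of this idea, assuming a closure is simply a (continuation, environment) pair: a receiver finding no message is stored in the channel, and a later send resumes it by calling the continuation with the message and the environment. `Closure` and `SketchChannel` are hypothetical names; locking, replication, and the actual MIL data layout are deliberately left out.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Closure:
    continuation: Callable[[Any, Any], None]   # code block embodying the process
    environment: Any                           # variables known by the process


class SketchChannel:
    def __init__(self):
        self.messages = deque()                # buffered outputs (asynchronous send)
        self.waiting = deque()                 # suspended input processes

    def send(self, message) -> None:
        if self.waiting:                       # resume a suspended receiver
            closure = self.waiting.popleft()
            closure.continuation(message, closure.environment)
        else:                                  # the output is just the buffered message
            self.messages.append(message)

    def receive(self, closure: Closure) -> None:
        if self.messages:                      # a message is already available
            closure.continuation(self.messages.popleft(), closure.environment)
        else:                                  # block: store the closure in the channel
            self.waiting.append(closure)
```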

We present a set of macros that abstract the definition of types and operations. For each data structure we introduce, we define macros describing its type, allocating it, and accessing the values that compose it. We build our library on top of the queues defined in Appendix A. In Section 4.1 we implement channel queues. Next, in Section 4.2, we describe the operations to send through, and to receive from, a channel. Lastly, we extend the definition of channels, associating a channel with its lock in a tuple (Section 4.3).



Figure 4.1: The closure of a process

#### 4.1 Channel Queues

In this section, we define the type of a process and the type of a channel queue. A *channel queue* is used for asynchronous message passing between processes and consists of two pools, one storing messages and the other storing processes. The continuation of a process is the code block that represents the process; it expects the message being delivered, the environment of the process, and a lock protecting the environment. A process is, therefore, composed of the continuation, the environment, and the environment's lock.

The continuation of a process may be defined by the parametrised type:

 $\mathsf{ContType}(\tau_m, \tau_e) \stackrel{\mathsf{def}}{=} \forall[\alpha](r_1 \colon \tau_m, r_2 \colon \tau_e, r_3 \colon \langle \mathsf{lock}(\alpha) \rangle^{\alpha}) \text{ requires } (;\alpha;)$ 

Register  $r_1$  (of type  $\tau_m$ ) holds the received message, register  $r_2$  holds the environment of the process (of type  $\tau_e$ ), and register  $r_3$  holds the lock (of type  $\langle \text{lock}(\alpha) \rangle^{\alpha}$ ) that may protect the environment. Lock  $\alpha$  is abstracted by the universal operator. Notice that the code block holds  $\alpha$  only as a shared lock, and therefore only has permission to read values from the environment; this suffices because the environment of a process is an immutable structure, as can be observed in Chapter 5.

The type of the closure of a process, sketched in Figure 4.1, may be defined by:

 $\mathsf{ClosureType}(\tau, \alpha) \stackrel{\mathsf{def}}{=} \exists \tau_e. \langle \mathsf{ContType}(\tau, \tau_e), \tau_e, \mathsf{PLock} \rangle^{\alpha}$ 

The existential type abstracts the type  $\tau_e$  of the environment of the process (allowing environments of different types) and packs a tuple that consists of the continuation, the environment, and the lock that protects the environment. The continuation has type ContType( $\tau, \tau_e$ ): it receives messages of type  $\tau$  and an environment of type  $\tau_e$ . The lock of the environment is defined by:



Figure 4.2: A process

PLock  $\stackrel{\text{def}}{=} \exists^{\mathrm{L}}\beta.\langle \mathsf{lock}(\beta) \rangle^{\beta}$ 

and is an existential lock type abstracting lock  $\beta$ . Observe that we have extended the definition of closure by adding the lock of the environment; this lock is what grants access to the data stored inside the environment. The macros for handling closures are:

 $\begin{aligned} &\operatorname{ClosureAlloc}(\tau_m, \tau_e, \alpha) \stackrel{\text{def}}{=} \textbf{malloc}[\operatorname{ContType}(\tau_m, \tau_e), \tau_e, \operatorname{PLock}] \text{ guarded by } \alpha \\ &\operatorname{ClosureCont}(r) \stackrel{\text{def}}{=} r[0] \\ &\operatorname{ClosureEnv}(r) \stackrel{\text{def}}{=} r[1] \\ &\operatorname{ClosureLock}(r) \stackrel{\text{def}}{=} r[2] \end{aligned}$ 

We may define a process (illustrated by Figure 4.2) as a pair containing the closure and the *keep in channel* flag (of type int):

 $\mathsf{ProcType}(\tau,\alpha) \stackrel{\mathsf{def}}{=} \langle \mathsf{ClosureType}(\tau,\alpha), \mathsf{int} \rangle^{\alpha}$ 

The flag is necessary for replicated input processes: after reduction, these processes remain in the channel queue, waiting for more messages. Macros to manipulate processes are:

 $\begin{aligned} &\operatorname{ProcAlloc}(\tau, \alpha) \stackrel{\text{def}}{=} \operatorname{\textbf{malloc}}[\operatorname{ClosureType}(\tau, \alpha), \text{ int}] \text{ guarded by } \alpha \\ &\operatorname{ProcClosure}(r) \stackrel{\text{def}}{=} r[0] \\ &\operatorname{ProcKeep}(r) \stackrel{\text{def}}{=} r[1] \end{aligned}$ 

The creation of a process is illustrated in the following example, where c has type ContType(int,  $\langle \rangle^{\beta}$ ) and  $r_1$  has type  $\langle \mathsf{lock}(\beta) \rangle^{\beta}$ :

```
r3 := malloc [] guarded by β          -- an empty environment is created
r2 := ClosureAlloc(int, ⟨⟩^β, α)      -- alloc the closure of a process
ClosureCont(r2) := c                  -- set the continuation to 'c'
```

In order to create a queue of processes, we need to define a sentinel process. Consider sink:

```
sink ∀[τm, τe, α](r1: τm, r2: τe, r3: ⟨lock(α)⟩^α) requires (;α;) {
  unlockS r3
  yield
}
```

Instantiating the universal value sink with  $\tau_m$  and  $\tau_e$  (sink $[\tau_m][\tau_e]$) results in a value of type ContType( $\tau_m, \tau_e$ ). We may use this continuation in sentinel processes.

The creation of a sentinel process may be defined by the macro ProcCreateSentinel( $r, r_s, \tau, \alpha$ ), where:

- $r$  is the register that will refer to the process, of type  $\mathsf{ProcType}(\tau, \alpha)$ ;
- $r_s$  is the register that will refer to the closure, of type  $\mathsf{ClosureType}(\tau, \alpha)$ ;
- $\tau$  is the type of the messages being transmitted;
- $\alpha$  is the lock protecting the channel.

It is defined by:

```
ProcCreateSentinel(r, r_s, τ, α) ≝
  δ, r := newLock 0                          -- create a dummy lock
  r := packL δ, r as PLock                   -- abstract the lock
  r_s := ClosureAlloc(τ, int, α)             -- alloc the closure of the process
  ClosureCont(r_s) := sink[τ][int]           -- set the continuation to 'sink'
  ClosureEnv(r_s) := 0                       -- set value 0 as the environment
  ClosureLock(r_s) := r                      -- set the lock of the environment as r
  r_s := pack int, r_s as ClosureType(τ, α)  -- abstract the environment
  r := ProcAlloc(τ, α)                       -- alloc the sentinel process
  ProcClosure(r) := r_s                      -- set the closure as the one in r_s
  ProcKeep(r) := 0                           -- do not keep this process in the queue
```

Channel queues (Figure 4.3) may be defined by:



Figure 4.3: A channel queue

 $\mathsf{ChannelQueueType}(\tau,\alpha) \stackrel{\text{def}}{=} \langle \mathsf{int}, \mathsf{QueueType}(\tau,\alpha), \mathsf{QueueType}(\mathsf{ProcType}(\tau,\alpha),\alpha) \rangle^{\alpha}$ 

This type represents a tuple, protected by lock  $\alpha$ , that holds a flag (of type int) marking the kind of contents of the channel queue, a queue of messages (of type QueueType $(\tau,\alpha)$ ), and a queue of processes (of type QueueType(ProcType $(\tau,\alpha),\alpha$ )). The flag can assume one of three values: 0 indicates that the channel queue is empty; 1 indicates that at least one message is enqueued; and 2 indicates that at least one process is enqueued. The macros for manipulating channel queues are:

$$\mathsf{ChannelQueueAlloc}(\tau, \alpha) \stackrel{\text{def}}{=} \mathsf{malloc}[\mathsf{int}, \mathsf{QueueType}(\tau, \alpha), \mathsf{QueueType}(\mathsf{ProcType}(\tau, \alpha), \alpha)] \text{ guarded by } \alpha$$

| Macro                    | Expands to |
|--------------------------|------------|
| ChannelQueueState(r)     | $r[0]$     |
| ChannelQueueMsgs(r)      | $r[1]$     |
| ChannelQueueProcs(r)     | $r[2]$     |
| CHANNEL_QUEUE_EMPTY      | $0$        |
| CHANNEL_QUEUE_WITH_MSGS  | $1$        |
| CHANNEL_QUEUE_WITH_PROCS | $2$        |
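As a reading aid, the Python sketch below models the channel queue record: a state flag plus two FIFO queues. It is an illustration, not part of the run-time: `collections.deque` stands in for `QueueType` (sentinel elements are ignored), and the method `invariant_holds` is a hypothetical helper encoding our reading of the flag, namely that the flag names the only queue that may be non-empty.

```python
from collections import deque
from dataclasses import dataclass, field

# Flag values mirroring the CHANNEL_QUEUE_* macros.
CHANNEL_QUEUE_EMPTY = 0
CHANNEL_QUEUE_WITH_MSGS = 1
CHANNEL_QUEUE_WITH_PROCS = 2


@dataclass
class ChannelQueue:
    """Stand-in for <int, QueueType(tau, alpha), QueueType(ProcType(tau, alpha), alpha)>."""
    state: int = CHANNEL_QUEUE_EMPTY             # ChannelQueueState(r), i.e. r[0]
    msgs: deque = field(default_factory=deque)   # ChannelQueueMsgs(r), i.e. r[1]
    procs: deque = field(default_factory=deque)  # ChannelQueueProcs(r), i.e. r[2]

    def invariant_holds(self) -> bool:
        """Assumed invariant: the flag names the only queue that may be non-empty."""
        if self.state == CHANNEL_QUEUE_EMPTY:
            return not self.msgs and not self.procs
        if self.state == CHANNEL_QUEUE_WITH_MSGS:
            return bool(self.msgs) and not self.procs
        if self.state == CHANNEL_QUEUE_WITH_PROCS:
            return bool(self.procs) and not self.msgs
        return False  # any other flag value is ill-formed
```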

The creation of a channel queue, with register $r_1$ of type ProcType(int, $\alpha$) (as in the result of the previous example), is:

The last definition in this section, $\mathsf{ChannelQueueCreate}(r, r_q, r_e, v, \tau, \alpha)$, creates a channel queue, where:

- $r$ is the register that will refer to the new empty channel queue, of type $\mathsf{ChannelQueueType}(\tau, \alpha)$;
- $r_q$ is the register that will hold the queue of processes, of type $\mathsf{QueueType}(\mathsf{ProcType}(\tau,\alpha),\alpha)$;
- $r_e$  is the register that will point to the sentinel element of the queue of processes;
- v is a sentinel message of type  $\tau$ ;
- $\tau$  is the type of the transmitted messages;
- $\alpha$  is the lock protecting the channel.

Notice that this macro leaves value $v$ unaltered. Furthermore, $v$ may only be register $r_q$.

#### 4.2 Communication

In this section we list the two operations responsible for communication over channels: sendMessage and receiveMessage. The former sends a message through a channel. The latter expects a continuation, where the message will be received, and an environment (user data that is available in the continuation). In either case, when a thread jumps to one of these operations, there is no guarantee as to whether it will yield the processor or continue executing. Both operations require exclusive access to the lock of the channel, since they alter its internal state; the channel's lock is therefore a point of contention.

**Send message.** We define an operation to send a message through a channel queue (depicted in Figure 4.4):

```
sendMessage[α, τ]                  -- the protecting lock and the message's type are abstracted
    (r_1: τ,                       -- the message being sent
     r_2: ChannelQueueType(τ, α),  -- the target channel
     r_3: ⟨lock(α)⟩^α)             -- the channel's lock
    requires (α;;) {               -- exclusive access to the channel
  r_4 := ChannelQueueState(r_2)    -- get the state of the channel
  -- verify if there are processes waiting for a message
  if r_4 = CHANNEL_QUEUE_WITH_PROCS
    -- when there are, deliver the message:
    jump sendMessageReduce[τ][α]
  -- else there are no processes in the channel: enqueue the message
  -- flag the channel as containing messages:
  ChannelQueueState(r_2) := CHANNEL_QUEUE_WITH_MSGS
  r_2 := ChannelQueueMsgs(r_2)     -- get the queue of messages
  QueueAdd(r_2, r_1, r_4, r_1, τ, α)  -- put the message (r_1) in the queue
  unlockE r_3                      -- unlock the channel's lock
  yield                            -- release control of the processor
}
```

This operation either jumps to code block sendMessageReduce, when there are processes waiting for a message to arrive, or enqueues the message in the channel and then yields control of the processor.



Figure 4.4: The activity diagram outlines the execution of sendMessage, where activities in italic represent code blocks (identified by their labels).

In the following, we list code block sendMessageReduce, which assumes that at least one process is waiting for a message to arrive.

```
sendMessageReduce[α, τ]            -- the protecting lock and the message's type are abstracted
    (r_1: τ,                       -- the message being sent
     r_2: ChannelQueueType(τ, α),  -- the target channel
     r_3: ⟨lock(α)⟩^α)             -- the channel's lock
    requires (α;;) {               -- exclusive access to α
  r_6 := ChannelQueueProcs(r_2)    -- move the queue of processes to r_6
  -- remove a process from the queue and move it into r_4;
  -- r_5 holds the number of processes left in the queue:
  QueueRemove(r_6, r_5, r_4)
  r_7 := ProcKeep(r_4)             -- check if the process is to be kept
  if r_7 = 0
    -- do not keep it in the channel; deliver the message
    jump execProcCheckEmpty[τ][α]
  -- keep the process in the channel after delivery
  r_7 := r_4                       -- copy the process into r_7
  -- add the process back into the queue:
  QueueAdd(r_6, r_7, r_8, r_4, ProcType(τ, α), α)
  -- deliver the message to the process (in r_4);
  -- the state of the channel is unaltered (it still holds processes)
  unlockE r_3                      -- no exclusive access needed
  jump execProc[τ][α]              -- deliver the message to the process
}
```

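We begin by removing a process from the channel and then check its keep-in-channel flag. If the flag is not set, we jump to execProcCheckEmpty. Otherwise, we put the process back in the channel, release the channel's lock, and jump directly to execProc, since the channel still holds processes and its state need not change.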

Next comes code block execProcCheckEmpty:

```
execProcCheckEmpty[α, τ]           -- the protecting lock and the message's type are abstracted
    (r_1: τ,                       -- the message being sent
     r_2: ChannelQueueType(τ, α),  -- the target channel
     r_3: ⟨lock(α)⟩^α,             -- the channel's lock
     r_4: ProcType(τ, α),          -- the receiving process
     r_5: int)                     -- the number of elements left in the queue
    requires (α;;) {               -- exclusive access to α
  -- verify if the channel became empty:
  if r_5 = 0
    -- the channel became empty, mark it and then reduce:
    jump execProcFirstEmpty[τ][α]  -- mark it as empty and continue delivery
  -- no need to change the channel's state, continue with the reduction;
  -- we don't need access to the channel to deliver the message
  unlockE r_3
  jump execProc[τ][α]
}
```

The code block checks whether the channel needs to be marked as empty before delivering the message to the process (in code block execProc). We depict code block execProcFirstEmpty:

```
execProcFirstEmpty[α, τ]           -- the protecting lock and the message's type are abstracted
    (r_1: τ,                       -- the message being sent
     r_2: ChannelQueueType(τ, α),  -- the target channel
     r_3: ⟨lock(α)⟩^α,             -- the channel's lock
     r_4: ProcType(τ, α))          -- the receiving process
    requires (α;;) {               -- exclusive access to the channel
  -- mark the channel as empty:
  ChannelQueueState(r_2) := CHANNEL_QUEUE_EMPTY
  unlockE r_3                      -- we don't need access to the channel to deliver the message
  jump execProc[τ][α]
}
```

This code block marks the channel's state as empty and then continues the delivery.
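The send path described so far (sendMessage, sendMessageReduce, execProcCheckEmpty, and execProcFirstEmpty) can be summarised by the following Python sketch. It is a simplification under our own assumptions: a `threading.Lock` plays the role of the channel's lock α, a process is modelled as a hypothetical `(continuation, keep)` pair, and the spinning for the channel's and the environment's locks (execProc onwards) is collapsed into a plain call made after the channel's lock is released.

```python
import threading
from collections import deque

EMPTY, WITH_MSGS, WITH_PROCS = 0, 1, 2  # the CHANNEL_QUEUE_* flags


class ChannelQueue:
    def __init__(self):
        self.lock = threading.Lock()  # stands in for the channel's lock alpha
        self.state = EMPTY
        self.msgs = deque()           # queued messages
        self.procs = deque()          # queued (continuation, keep) pairs


def send_message(chan, msg):
    """Sketch of sendMessage together with the code blocks it jumps to."""
    with chan.lock:                          # requires (alpha;;): exclusive access
        if chan.state != WITH_PROCS:
            chan.state = WITH_MSGS           # flag the channel as containing messages
            chan.msgs.append(msg)            # QueueAdd
            return                           # unlockE r_3; yield
        cont, keep = chan.procs.popleft()    # sendMessageReduce: QueueRemove
        if keep:
            chan.procs.append((cont, keep))  # put the process back; state unchanged
        elif not chan.procs:
            chan.state = EMPTY               # execProcCheckEmpty -> execProcFirstEmpty
    cont(msg)                                # execProc onwards, after the lock is released
```

A replicated process (keep set) thus stays at the back of the process queue and receives every subsequent message.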

We present code block execProc:

```
execProc[α, τ]                     -- the protecting lock and the message's type are abstracted
    (r_1: τ,                       -- the message being sent
     r_2: ChannelQueueType(τ, α),  -- the target channel
     r_3: ⟨lock(α)⟩^α,             -- the channel's lock
     r_4: ProcType(τ, α)) {        -- the receiving process
  -- spin lock to acquire the channel's lock:
  r_5 := tslS r_3                  -- we only need read access to the channel
  if r_5 = 0
    -- we have access, continue with the delivery:
    jump execContinue[τ][α]
  -- try again:
  jump execProc[τ][α]
}
```

The code block spins to get shared access to the channel's lock. Upon success it jumps to execContinue.

The following is the listing of code block execContinue:

```
execContinue[α, τ]                 -- the protecting lock and the message's type are abstracted
    (r_1: τ,                       -- the message being sent
     r_2: ChannelQueueType(τ, α),  -- the target channel
     r_3: ⟨lock(α)⟩^α,             -- the channel's lock
     r_4: ProcType(τ, α))          -- the receiving process
    requires (;α;) {               -- shared access to the channel
  r_4 := ProcClosure(r_4)          -- get the closure of the process
  τ_e, r_4 := unpack r_4           -- unpack it and get the type of the environment (τ_e)
  r_5 := r_3                       -- move the channel's lock into r_5
  r_3 := ClosureLock(r_4)          -- get the packed lock of the environment
  β, r_3 := unpack r_3             -- unpack the lock of the environment
  r_2 := ClosureEnv(r_4)           -- move the environment into r_2
  r_4 := ClosureCont(r_4)          -- move the continuation into r_4
  unlockS r_5                      -- we don't need more access to the channel's lock
  -- try to get the lock of the environment, then continue delivery:
  jump execProcGrabLock[τ_e][τ][β]
}
```

The code block unpacks the closure and its associated lock, and releases the channel's lock. It then jumps to execProcGrabLock to acquire the lock of the environment.

We show code block execProcGrabLock:

```
execProcGrabLock[α, τ_m, τ_e]      -- the protecting lock, the message's type, and the
                                   -- type of the environment are abstracted
    (r_1: τ_m,                     -- the message
     r_2: τ_e,                     -- the environment
     r_3: ⟨lock(α)⟩^α,             -- the environment's lock
     r_4: ContType(τ_m, τ_e)) {    -- the continuation
  -- spin lock to get shared access:
  r_5 := tslS r_3
  if r_5 = 0
    -- we have shared access, jump to the continuation:
    jump r_4[α]
  -- try again:
  jump execProcGrabLock[τ_e][τ_m][α]
}
```

This code block is a spin lock (on a shared lock) that jumps to the continuation once it succeeds.
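The spin loops of execProc and execProcGrabLock rely on test-and-set instructions for shared (tslS) and exclusive (tslE) access. The Python model below is a sketch under assumptions we state explicitly: the lock value is 0 when free, -1 when held exclusively, and counts the holders when held in shared mode (an encoding we infer from the `newLock 0`, `newLock -1`, and `newLock 1` instructions in this report); success is reported as 0, matching the `if r = 0` tests; the nonzero value returned on failure is our own choice. The class `LockWord` and the function `spin_shared` are hypothetical names.

```python
import threading


class LockWord:
    """Hypothetical model of a lock value: 0 = free, -1 = held exclusively,
    n > 0 = held in shared mode by n threads."""

    def __init__(self, value=0):
        self._value = value
        self._atomic = threading.Lock()  # makes each test-and-set step atomic

    def tsl_s(self):
        """Shared test-and-set: succeeds (returns 0) unless held exclusively."""
        with self._atomic:
            if self._value >= 0:
                self._value += 1
                return 0
            return self._value  # -1: an exclusive holder is present

    def tsl_e(self):
        """Exclusive test-and-set: succeeds (returns 0) only if the lock is free."""
        with self._atomic:
            if self._value == 0:
                self._value = -1
                return 0
            return self._value  # nonzero: someone holds the lock

    def unlock_s(self):
        with self._atomic:
            self._value -= 1

    def unlock_e(self):
        with self._atomic:
            self._value = 0


def spin_shared(lock):
    """Busy-wait in the style of execProc and execProcGrabLock."""
    while lock.tsl_s() != 0:
        pass  # "try again": jump back to the start of the block
```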

**Receive a message.** We now describe receiveMessage, the other core operation, which schedules a process to receive a message. As outlined in Figure 4.5, the process is placed in the channel queue if there are no messages to be delivered; otherwise, the process is executed. When a process has its keep-in-channel flag set, it consumes all the messages in the channel queue.

Figure 4.5: The activity diagram outlines the execution of receiveMessage, where activities in italic represent code blocks (identified by their labels).

```
receiveMessage[α,                  -- the lock of the channel,
               β,                  -- the lock of the environment,
               τ_m,                -- the type of the message,
               τ_e]                -- and the type of the environment, are abstracted
    (r_1: ContType(τ_m, τ_e),      -- the continuation
     r_2: ChannelQueueType(τ_m, α),  -- the target channel
     r_3: ⟨lock(α)⟩^α,             -- the channel's lock
     r_4: τ_e,                     -- the environment
     r_5: ⟨lock(β)⟩^β,             -- the environment's lock
     r_6: int)                     -- the keep-in-channel flag
    requires (α;;) {
  r_5 := packL β, r_5 as PLock     -- pack the environment's lock
  r_7 := ClosureAlloc(τ_m, τ_e, α) -- alloc the closure
  ClosureCont(r_7) := r_1          -- set the continuation
  ClosureEnv(r_7) := r_4           -- set the environment
  ClosureLock(r_7) := r_5          -- set the packed lock
  r_7 := pack τ_e, r_7 as ClosureType(τ_m, α)  -- abstract the environment of the closure
  r_4 := ProcAlloc(τ_m, α)         -- alloc a process
  ProcClosure(r_4) := r_7          -- set the packed closure of the process
  ProcKeep(r_4) := r_6             -- set the keep-in-channel flag
  -- test the state of the channel:
  r_1 := ChannelQueueState(r_2)
  if r_1 = CHANNEL_QUEUE_WITH_MSGS
    -- when there are messages, consume the message:
    jump receiveMessageReduce[τ_m][α]
  -- otherwise enqueue the process
  -- mark the channel with procs:
  ChannelQueueState(r_2) := CHANNEL_QUEUE_WITH_PROCS
  r_1 := ChannelQueueProcs(r_2)    -- get the queue of processes
  -- place the process in the queue:
  QueueAdd(r_1, r_4, r_5, r_4, ProcType(τ_m, α), α)
  unlockE r_3                      -- we don't need to alter the channel anymore
  yield                            -- give the control of the processor back
}
```

Next, we present the code block receiveMessageReduce. In Figure 4.6 we show a flow-chart that depicts the execution of the following block of code.

```
receiveMessageReduce[α, τ]         -- the protecting lock and the message's type are abstracted
    (r_2: ChannelQueueType(τ, α),  -- the target channel
     r_3: ⟨lock(α)⟩^α,             -- the channel's lock
     r_4: ProcType(τ, α))          -- the receiving process
    requires (α;;) {               -- exclusive access
  -- first, we check if the process stays in the channel after delivery:
  r_1 := ProcKeep(r_4)
  if r_1 = 1
    -- if so, consume all messages in the channel:
    jump receiveMessageConsume[τ][α]
  -- otherwise remove a message from the channel and proceed with delivery
  -- get the queue of messages:
  r_6 := ChannelQueueMsgs(r_2)
  -- remove one message from the queue, keeping it in r_1:
  QueueRemove(r_6, r_5, r_1)
  -- verify if the channel is empty and then reduce:
  jump execProcCheckEmpty[τ][α]
}
```

Next, we depict the code block receiveMessageConsume. This code block just prepares the registers for code block receiveAllMessages.

Figure 4.6: The activity diagram outlines the execution of receiveMessageConsume, where activities in italic represent code blocks (identified by their labels).

```
receiveMessageConsume[α, τ]        -- the protecting lock and the message's type are abstracted
    (r_2: ChannelQueueType(τ, α),  -- the target channel
     r_3: ⟨lock(α)⟩^α,             -- the channel's lock
     r_4: ProcType(τ, α))          -- the receiving process
    requires (α;;) {               -- exclusive access
  r_5 := ChannelQueueMsgs(r_2)     -- move the queue of messages into r_5
  -- start the consuming loop:
  jump receiveAllMessages[τ][α]
}
```

Next, we show code block receiveAllMessages, which consumes every message in the queue, forking one delivery per message.

```
receiveAllMessages[α, τ]           -- the protecting lock and the message's type are abstracted
    (r_2: ChannelQueueType(τ, α),  -- the target channel
     r_3: ⟨lock(α)⟩^α,             -- the channel's lock
     r_4: ProcType(τ, α),          -- the receiving process
     r_5: QueueType(τ, α))         -- the queue of messages
    requires (α;;) {               -- exclusive access
  QueueRemove(r_5, r_6, r_1)       -- remove one message, moving it to r_1
  -- deliver the removed message:
  fork execProc[τ][α]              -- start the process in a different thread
  if r_6 = 0                       -- check if the channel is empty
    -- channel is empty, finish loop:
    jump receiveAllMessagesFinish[τ][α]
  -- continue looping:
  jump receiveAllMessages[τ][α]
}
```

Finally, we describe the code block receiveAllMessagesFinish that places the process in the channel and updates its state.

```
receiveAllMessagesFinish[α, τ]     -- the protecting lock and the message's type are abstracted
    (r_2: ChannelQueueType(τ, α),  -- the target channel
     r_3: ⟨lock(α)⟩^α,             -- the channel's lock
     r_4: ProcType(τ, α))          -- the replicated process
    requires (α;;) {               -- exclusive access
  -- mark the channel with processes:
  ChannelQueueState(r_2) := CHANNEL_QUEUE_WITH_PROCS
  r_1 := ChannelQueueProcs(r_2)    -- get the queue of processes
  QueueAdd(r_1, r_4, r_2, r_4, ProcType(τ, α), α)  -- add the process to the queue
  unlockE r_3                      -- unlock the channel's lock
  yield                            -- stop execution
}
```
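The receive path, from receiveMessage down to receiveAllMessagesFinish, can be condensed into a Python sketch. The names are hypothetical and the model is simplified: a `threading.Lock` stands in for the channel's lock, a process is a closure over the continuation and its environment paired with the keep flag, and the messages consumed by a replicated process are delivered sequentially, whereas the run-time forks one thread per delivery.

```python
import threading
from collections import deque

EMPTY, WITH_MSGS, WITH_PROCS = 0, 1, 2  # the CHANNEL_QUEUE_* flags


class ChannelQueue:
    def __init__(self):
        self.lock = threading.Lock()  # stands in for the channel's lock alpha
        self.state = EMPTY
        self.msgs = deque()
        self.procs = deque()          # queued (process, keep) pairs


def receive_message(chan, cont, env, keep):
    """Sketch of receiveMessage and the blocks it jumps to; returns the
    messages delivered during this call."""
    def proc(msg):                           # closure: continuation plus environment
        cont(msg, env)

    with chan.lock:                          # requires (alpha;;): exclusive access
        if chan.state != WITH_MSGS:
            chan.state = WITH_PROCS          # mark the channel with processes
            chan.procs.append((proc, keep))  # QueueAdd; then unlockE and yield
            return []
        if keep:                             # receiveMessageConsume / receiveAllMessages
            batch = list(chan.msgs)
            chan.msgs.clear()
            chan.state = WITH_PROCS          # receiveAllMessagesFinish
            chan.procs.append((proc, keep))
        else:                                # receiveMessageReduce
            batch = [chan.msgs.popleft()]
            if not chan.msgs:
                chan.state = EMPTY           # execProcCheckEmpty -> execProcFirstEmpty
    for msg in batch:                        # the run-time forks a thread per message
        proc(msg)                            # when keep is set; here calls are sequential
    return batch
```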

#### 4.3 Channels

We extend our run-time to include support for channels with *private* locks (*i.e.*, one lock per channel). We define one more type that embodies this concept, as well as two operations analogous to sendMessage and receiveMessage that operate on channels with a private lock. The idea is to store a channel queue and its protecting lock in a tuple, while abstracting the lock's name with an existential lock type. These new data structures, one per channel in the system, are all protected by the same *global lock*.

We distinguish channel queues from channels with private locks: code that targets the former must know the lock that protects it, whereas code that targets the latter does not know the lock that protects the channel, only the global lock. Several channel queues may share the same lock; channels, however, do not, since each channel has a private lock, which reduces contention.

We may define a *channel* type as

 $\mathsf{ChannelType}(\tau,\alpha) \stackrel{\mathsf{def}}{=} \exists^{\mathsf{L}}\beta.\langle\mathsf{ChannelQueueType}(\tau,\beta),\langle\mathsf{lock}(\beta)\rangle^{\beta}\rangle^{\alpha}$ 

a tuple that includes a channel queue and the lock that protects it. Parameter $\tau$ is the type of the messages being transmitted. Parameter $\alpha$ is the global lock that protects the tuple holding the channel queue and its lock.

We define macros for handling packed channels:

$$\mathsf{ChannelAlloc}(\tau, \beta, \alpha) \stackrel{\text{def}}{=} \mathsf{malloc}[\mathsf{ChannelQueueType}(\tau, \beta), \langle \mathsf{lock}(\beta) \rangle^{\beta}] \text{ guarded by } \alpha$$

ChannelChannelQueue(r)  $\stackrel{\text{def}}{=} r[0]$ 

 $\mathsf{ChannelLock}(r) \stackrel{\mathsf{def}}{=} r[1]$ 

The initialisation of this structure may be defined by macro $\mathsf{ChannelCreateEmpty}(r, r_l, r_c, r_q, r_e, v, \tau, \alpha)$, where:

- $r$ is the register that will refer to the channel, of type $\mathsf{ChannelType}(\tau, \alpha)$;
- $r_l$ is the register that will refer to the abstracted lock protecting the channel queue, of type $\exists^{\mathsf{L}}\beta.\langle\mathsf{lock}(\beta)\rangle^{\beta}$;
- $r_c$ is the register that will point to the channel queue, of type $\mathsf{ChannelQueueType}(\tau, \beta)$;
- $r_q$ is the register that will hold the queue of processes, of type $\mathsf{QueueType}(\mathsf{ProcType}(\tau, \beta), \beta)$;
- $r_e$ is the register that will point to the sentinel of the queue of processes;
- $v$ is the sentinel message of type $\tau$;
- $\tau$ is the type of the transmitted messages;
- $\alpha$ is the global lock.

```
ChannelCreateEmpty(r, r_l, r_c, r_q, r_e, v, τ, α) ≝
    β, r_l := newLock -1                        -- create the channel's lock
    ChannelQueueCreate(r_c, r_q, r_e, v, τ, β)  -- create the channel queue, protected by β
    unlockE r_l                                 -- we don't need access to the channel
    r := ChannelAlloc(τ, β, α)                  -- alloc the channel
    ChannelChannelQueue(r) := r_c               -- set the channel queue
    ChannelLock(r) := r_l                       -- set the private lock
    r := packL β, r as ChannelType(τ, α)        -- abstract the private lock
```

Keep in mind that this macro requires exclusive access to the global lock $\alpha$. Also, the expansion of this macro leaves $v$ unaltered, and $v$ may only be register $r$ or $r_q$.
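The following Python sketch mirrors the order of operations in ChannelCreateEmpty. It assumes `threading.Lock` objects for both the global lock α and the private lock β; since Python has no existential types, the `Channel` record simply stores the lock object, where MIL hides the name β behind packL. The names `GLOBAL_LOCK`, `Channel`, and `channel_create_empty` are hypothetical, and the check that the global lock is held only verifies that *some* thread holds it.

```python
import threading
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ChannelQueue:
    state: int = 0                               # CHANNEL_QUEUE_EMPTY
    msgs: deque = field(default_factory=deque)
    procs: deque = field(default_factory=deque)


@dataclass
class Channel:
    """Pairs a channel queue with its private lock beta. MIL hides beta behind
    an existential (packL); Python simply stores the lock object."""
    queue: ChannelQueue
    lock: threading.Lock


GLOBAL_LOCK = threading.Lock()  # stands in for the global lock alpha


def channel_create_empty():
    """Sketch of ChannelCreateEmpty; the caller must already hold GLOBAL_LOCK."""
    if not GLOBAL_LOCK.locked():
        raise RuntimeError("exclusive access to the global lock is required")
    private = threading.Lock()
    private.acquire()                # newLock -1: the new lock starts exclusively held
    queue = ChannelQueue()           # ChannelQueueCreate, while holding beta
    private.release()                # unlockE r_l
    return Channel(queue, private)   # ChannelAlloc and field initialisation
```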

We extend the two operations sendMessage and receiveMessage as follows: the extended operation acquires shared access to the global lock in order to unpack the channel and, after releasing the global lock, acquires exclusive access to the channel queue's lock. Finally, it jumps to the original operation. We only list the extension of the operation to send a message, because the extension of receiveMessage is very similar (only the registers used for temporary values and the labels of the code blocks change).

We list the code block send that tries to acquire the global lock and then jumps to sendUnpack:

```
send[α, τ]                         -- the global lock and the type of the message are abstracted
    (r_1: τ,                       -- the message being sent
     r_4: ChannelType(τ, α),       -- the target channel
     r_5: ⟨lock(α)⟩^α) {           -- the global lock
  -- spin lock to acquire the global lock α:
  r_2 := tslS r_5
  if r_2 = 0
    -- acquired the lock, unpack the channel:
    jump sendUnpack[τ][α]
  -- try again:
  jump send[τ][α]
}
```

Next, we show code block sendUnpack that unpacks the channel and loads the channel queue and its lock and then jumps to sendMessageGrabLock:

```
sendUnpack[α, τ]                   -- the global lock and the type of the message are abstracted
    (r_1: τ,                       -- the message being sent
     r_4: ChannelType(τ, α),       -- the target channel
     r_5: ⟨lock(α)⟩^α)             -- the global lock
    requires (;α;) {               -- shared access to the global lock
  β, r_4 := unpack r_4             -- unpack the channel
  r_2 := ChannelChannelQueue(r_4)  -- move the channel queue to r_2
  r_3 := ChannelLock(r_4)          -- move the channel queue's lock to r_3
  unlockS r_5                      -- unlock the global lock
  jump sendMessageGrabLock[τ][β]   -- acquire the channel's lock
}
```

Finally, we depict code block sendMessageGrabLock, which spins to get exclusive access to the lock protecting the channel and, once it has acquired the lock, jumps to sendMessage:

```
sendMessageGrabLock[α, τ]          -- the lock and the type of the message are abstracted
    (r_1: τ,                       -- the message
     r_2: ChannelQueueType(τ, α),  -- the channel queue
     r_3: ⟨lock(α)⟩^α) {           -- the channel's lock
  -- spin lock for exclusive access:
  r_4 := tslE r_3
  if r_4 = 0
    -- send the message:
    jump sendMessage[τ][α]
  -- try again:
  jump sendMessageGrabLock[τ][α]
}
```
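The chain send, sendUnpack, sendMessageGrabLock can be sketched in Python as follows. The sketch shows that the global lock is held in shared mode only while the two fields of the channel tuple are read, and that the channel's private lock is acquired after the global lock is released. It assumes a small readers-writer lock (`SharedLock`, a hypothetical helper that blocks on a `threading.Condition` instead of spinning) and replaces sendMessage proper with appending to a list; `ChannelTuple` is likewise hypothetical.

```python
import threading


class SharedLock:
    """Minimal readers-writer lock: many shared holders or one exclusive holder.
    It blocks on a condition variable instead of spinning as MIL code does."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class ChannelTuple:
    """The tuple <channel queue, private lock beta>, itself guarded by the global lock."""

    def __init__(self):
        self.queue = []               # stand-in for the channel queue
        self.lock = threading.Lock()  # the channel's private lock beta


def send(global_lock, chan, msg):
    global_lock.acquire_shared()            # send: shared access to the global lock
    queue, private = chan.queue, chan.lock  # sendUnpack: read both fields
    global_lock.release_shared()            # unlockS: global lock no longer needed
    with private:                           # sendMessageGrabLock: exclusive access to beta
        queue.append(msg)                   # stand-in for sendMessage proper
```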

Concerning the extension of the operation to receive a message, we show the type of code block receive:

```
receive[α,                         -- the global lock,
        β,                         -- the environment's lock,
        τ_m,                       -- the type of the message,
        τ_e]                       -- and the type of the environment are abstracted
    (r_1: ContType(τ_m, τ_e),      -- the continuation
     r_2: τ_e,                     -- the environment
     r_3: ⟨lock(β)⟩^β,             -- the environment's lock
     r_4: ChannelType(τ_m, α),     -- the target channel
     r_5: ⟨lock(α)⟩^α,             -- the global lock
     r_6: int)                     -- the keep-in-channel flag
```

By convention, the code block that unpacks the channel is labelled receiveUnpack, and the code block that grabs the lock of the channel queue is labelled receiveMessageGrabLock. The implementation of these three code blocks (receive, receiveUnpack, and receiveMessageGrabLock) is straightforward.

#### Chapter 5

# Translating the $\pi$-calculus into MIL

The translation from the $\pi$-calculus into MIL comprises three parts: the translation of types with function $\mathcal{T}[\![\cdot]\!](\gamma)$, the translation of values with function $\mathcal{V}[\![\cdot]\!](\vec{x}, r)$, and the translation of processes with function $\mathcal{P}[\![\cdot]\!](\Gamma)$. Our translation functions depend on the $\pi$-calculus run-time (Chapter 4).

We begin by defining the function that translates types:

$$\mathcal{T}\llbracket int \rrbracket(\gamma) \stackrel{\text{def}}{=} \text{int} \qquad \mathcal{T}\llbracket(T)\rrbracket(\gamma) \stackrel{\text{def}}{=} \text{ChannelType}(\mathcal{T}\llbracket T \rrbracket(\gamma), \gamma)$$

Parameter $\gamma$ is the global lock that protects the tuples pairing each channel queue with its private lock. Recall that $\gamma$ protects only the structure that holds the channel queues; it does not affect the operations on the channels themselves.

Nevertheless, because of the global lock, two processes running in parallel that each create a channel at the same time are serialised, as are two processes that want to read values from different channels at the same time. This contention is not critical, however, since channel creation is less frequent than the other operations on channels.
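The translation of types is a structural recursion over $\pi$-calculus types. A minimal Python rendering over a hypothetical two-constructor syntax tree (`IntT` for int, `ChanT` for channel types) could look as follows; it produces MIL types as strings, a representation chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IntT:
    """The base type int."""


@dataclass(frozen=True)
class ChanT:
    """A channel type (T) whose messages have type `carried`."""
    carried: "PiType"


PiType = Union[IntT, ChanT]


def translate_type(t: PiType, gamma: str = "gamma") -> str:
    """T[[t]](gamma), rendered as a MIL type expression."""
    if isinstance(t, IntT):
        return "int"
    if isinstance(t, ChanT):
        return f"ChannelType({translate_type(t.carried, gamma)}, {gamma})"
    raise TypeError(f"not a pi-calculus type: {t!r}")
```

For instance, a channel carrying channels of integers, translated with global lock `g`, becomes `ChannelType(ChannelType(int, g), g)`.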

For simplicity, we create a new environment whenever a new name is defined. The motivation is twofold. First, immutable environments may be shared among threads *without* contention. Second, performance may improve, since processors can exploit the locality of frames [3]. This is also why, in the run-time library, continuations of processes require shared access to environments (reflecting how environments are used in the translation). When we create a new environment, we only copy the free names of the process (Figure 5.1), thereby reducing memory usage (with respect to the values that must be copied).

Figure 5.1: Example of the creation of a new environment, based on an old one.
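The environment-creation strategy can be sketched in Python. Under our own assumptions (an environment is a tuple whose i-th slot holds the value of the i-th name, and a freshly bound name, if free in the continuation, is appended at the end), the hypothetical helper `new_environment` copies only the names that occur free in the continuation process. Tuples reflect that environments are immutable once built.

```python
def new_environment(names, env, free_names, new_name=None, new_value=None):
    """Sketch: build a fresh immutable environment for a continuation process.

    Only the names free in the continuation are copied from the old environment
    (names[i] is stored at env[i]); a freshly bound name, if any, is appended.
    """
    kept = [i for i, n in enumerate(names) if n in free_names and n != new_name]
    new_names = tuple(names[i] for i in kept)
    new_env = tuple(env[i] for i in kept)
    if new_name is not None and new_name in free_names:
        new_names += (new_name,)
        new_env += (new_value,)
    return new_names, new_env
```

For example, with old names `("x", "y", "z")` bound to `(1, 2, 3)`, a continuation whose free names are `x`, `z`, and a new name `w` bound to 9 gets the environment `("x", "z", "w")` with values `(1, 3, 9)`; the value of `y` is not copied.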

The macros related to environments may be defined as:

$$\mathsf{EnvType}(\vec{x}, \Gamma, \gamma, \alpha) \stackrel{\text{def}}{=} \langle \mathcal{T}\llbracket \Gamma(x_0) \rrbracket(\gamma), \cdots, \mathcal{T}\llbracket \Gamma(x_n) \rrbracket(\gamma) \rangle^{\alpha}$$
$$\mathsf{EnvAlloc}(\vec{x}, \Gamma, \gamma, \alpha) \stackrel{\text{def}}{=} \mathsf{malloc} \ [\mathcal{T}\llbracket \Gamma(x_0) \rrbracket(\gamma), \cdots, \mathcal{T}\llbracket \Gamma(x_n) \rrbracket(\gamma)] \text{ guarded by } \alpha$$

The type of each name is translated into MIL.

The translation of values is straightforward:

$$\mathcal{V}\llbracket v \rrbracket (\vec{x}, r) \stackrel{\text{def}}{=} \begin{cases} v & \text{if } v \text{ is of type } baseval \\ r[i] & \text{if } v = x_i \text{ where } \vec{x} = x_0 \cdots x_i \cdots x_n \end{cases}$$

Since literals correspond one-to-one between the source and target languages, a literal value is used as is. A variable $x_i$, instead, must be fetched from the environment held in register $r$, at offset $i$.
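The value translation amounts to looking up a name's position in the environment. The Python sketch below assumes that base values are integers and renders MIL operands as strings; `translate_value` is a hypothetical name.

```python
def translate_value(v, names, env_reg="r"):
    """V[[v]](names, r): a literal stays as is; variable names[i] becomes r[i]."""
    if isinstance(v, int):             # assumption: base values are integers
        return str(v)
    if v in names:
        return f"{env_reg}[{names.index(v)}]"
    raise KeyError(f"unbound name: {v!r}")
```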

We are now ready to define the translation of processes. The translation becomes simpler because the run-time library supports the channel communication. Yet, we must still maintain the environments of processes.

We start by defining the top-level translation function, which defines code blocks main and grabLock and then translates process $P$:

```
P⟦P⟧(Γ) ≝
  ChannelsLock = ⟨0⟩^γ : ⟨lock(γ)⟩^γ
  continuationType: ∀[α, τ](r_2: τ, r_3: ⟨lock(α)⟩^α) requires (;α;)

  grabLock ∀[α, τ](r_2: τ,
                   r_3: ⟨lock(α)⟩^α,
                   r_4: continuationType) {
    r_1 := tslS r_3
    if r_1 = 0
      jump r_4[τ][α]
    jump grabLock[τ][α]
  }

  main () requires (γ;;) {
    α, r_3 := newLock 1
    r_2 := [] guarded by α
    jump l
  }

  P⟦P⟧(x⃗, l, Γ, ChannelsLock, γ)

where τ_e = EnvType(x⃗, Γ, γ, α), l is fresh
```

Code block main creates the base (empty) environment and jumps to the translated process at label l. Code block grabLock is a helper that acquires shared access to an environment: it repeatedly applies tslS to the lock in $r_3$ and, once it succeeds, jumps to the continuation in $r_4$. The actual translation then starts, by providing a fresh label l to the translation function of processes.
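The spinning behaviour of grabLock can be modelled in Python as a sketch, under stated assumptions. We assume (this is not taken from the MIL definition) that a lock value of $-1$ means "held exclusively" and $n \geq 0$ means "held in shared mode by $n$ threads", and that tslS returns 0 on success. The class `MilLock` and its methods are hypothetical, and an ordinary `threading.Lock` stands in for the atomicity of the test-and-set instruction.

```python
import threading
from typing import Callable, TypeVar

R = TypeVar("R")


class MilLock:
    """Hypothetical model of a MIL lock: -1 means held exclusively, n >= 0 means n sharers."""

    def __init__(self, value: int = 0) -> None:
        self.value = value
        self._atomic = threading.Lock()  # stands in for the atomicity of tslS

    def tsl_shared(self) -> int:
        """Model of tslS: take a shared permission and return 0, or return -1 if held exclusively."""
        with self._atomic:
            if self.value >= 0:
                self.value += 1
                return 0
            return -1

    def unlock_exclusive(self) -> None:
        """Model of unlockE: release an exclusively held lock."""
        with self._atomic:
            self.value = 0


def grab_lock(lock: MilLock, env: object, cont: Callable[[object, MilLock], R]) -> R:
    """Spin until shared access to env is obtained, then run the continuation (r_4)."""
    while lock.tsl_shared() != 0:
        pass  # busy-wait, as grabLock jumps back to itself
    return cont(env, lock)


# A lock created held exclusively, released by another thread after 50 ms.
lock = MilLock(-1)
releaser = threading.Timer(0.05, lock.unlock_exclusive)
releaser.start()
print(grab_lock(lock, ("chan",), lambda env, _lock: env[0]))  # chan
releaser.join()
print(lock.value)  # 1: one shared permission is now held
```

In the translation itself, grabLock is only reached for locks that no thread holds exclusively, so the loop is expected to succeed on its first iteration; the timer above merely exercises the spinning path.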

All translated processes share the same type, parametrised by the typing environment $\Gamma$, the environment $\vec{x}$ of the translated process, and the global lock $\gamma$. The register file of the code blocks holds the environment in $r_2$ and the lock of the environment in $r_3$:

$$\mathsf{ProcBlock}(\Gamma, \vec{x}, \gamma) \stackrel{\text{def}}{=} \forall [\alpha](r_2 \colon \mathsf{EnvType}(\vec{x}, \Gamma, \gamma, \alpha), r_3 \colon \langle \mathsf{lock}(\alpha) \rangle^{\alpha}) \text{ requires } (; \alpha;) \tag{5.1}$$

The translation of the inactive process is straightforward: we release the shared lock on the environment and yield the processor.

$$\begin{array}{l} \mathcal{P}\llbracket \mathbf{0} \rrbracket (\vec{x}, l, \Gamma, g, \gamma) \stackrel{\text{def}}{=} \\ l \operatorname{ProcBlock}(\Gamma, \vec{x}, \gamma) \ \{ \\ \quad \mathsf{unlockS}\ r_3 \\ \quad \mathsf{yield} \\ \} \end{array}$$

When translating the output process, we use:

$$\begin{array}{l} \mathcal{P}[\![\overline{x_i}\langle v \rangle]\!](\vec{x}, l, \Gamma, g, \gamma) \stackrel{\text{def}}{=} \\ l \operatorname{ProcBlock}(\Gamma, \vec{x}, \gamma) \ \{ \\ \quad r_1 := \mathcal{V}[\![v]\!](\vec{x}, r_2) \\ \quad r_4 := r_2[i] \\ \quad \mathsf{unlockS}\ r_3 \\ \quad r_5 := g \\ \quad \mathsf{jump}\ \mathsf{send}[\tau][\gamma] \\ \} \\ \text{where } \tau = \mathcal{T}[\![\Gamma(v)]\!](\gamma), \ \vec{x} = x_0 \cdots x_i \cdots x_n \end{array}$$

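We prepare the registers as expected by code block send: the translated message is moved into register $r_1$, the channel $x_i$ is fetched from the environment into register $r_4$, and the global lock $g$ is moved into register $r_5$. The shared lock on the environment is released before jumping to send.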
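To make the scheme concrete, here is a small Python code generator that emits the output block as text, offered as a sketch rather than as the compiler's implementation. The function name, its parameters, and the textual rendering of MIL (`r1` for $r_1$, `gamma` for $\gamma$) are our own assumptions; literals are modelled as integers and names as strings.

```python
from typing import List, Sequence, Union


def translate_output(label: str, env: Sequence[str], chan: str,
                     msg: Union[int, str], g: str, tau: str) -> List[str]:
    """Emit, as text, the MIL block for the output chan<msg> (scheme above)."""
    names = list(env)
    i = names.index(chan)  # the channel is x_i
    # V[[msg]]: a literal is used as is, a name is read from the environment r2
    operand = str(msg) if isinstance(msg, int) else f"r2[{names.index(msg)}]"
    return [
        f"{label} ProcBlock {{",
        f"  r1 := {operand}",           # the message
        f"  r4 := r2[{i}]",             # the channel
        "  unlockS r3",                 # release shared access to the environment
        f"  r5 := {g}",                 # the global lock
        f"  jump send[{tau}][gamma]",   # hand over to the run-time library
        "}",
    ]


print("\n".join(translate_output("l", ["x", "y"], "x", "y", "ChannelsLock", "tau")))
```

Note that both environment reads happen before unlockS, since the shared permission on the environment is needed to access it.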

The input is translated by preparing the registers expected by code block receive, to which we also pass the environment of the translated process. The continuation $l_1$ proceeds to the code produced by macro CreateEnvAndTranslate: if the received name $y$ is free in P, a new environment is created for P, holding the free names of P (copied from the old environment) and the value received for $y$; otherwise, the environment of the translated process is reused. In either case, we then proceed with the translation of P. (The macro is listed in detail later, in (5.2).)

$$\begin{array}{l} \mathcal{P}[\![x(y).P]\!](\vec{x},l,\Gamma,g,\gamma) \stackrel{\text{def}}{=} \\ l \operatorname{ProcBlock}(\Gamma,\vec{x},\gamma) \ \{ \\ \quad r_4 := r_2[i] \\ \quad \mathsf{unlockS}\ r_3 \\ \quad r_5 := g \\ \quad r_1 := l_1 \\ \quad r_6 := 0 \\ \quad \mathsf{jump}\ \mathsf{receive}[\tau_e][\tau][\alpha][\gamma] \\ \} \\ l_1 \operatorname{ContType}(\tau,\tau_e) \ \{ \\ \quad \mathsf{jump}\ l_2[\alpha] \\ \} \\ \mathsf{CreateEnvAndTranslate}(P,y,\tau,\tau_e,\vec{x},l_2,\Gamma,g,\gamma) \\ \text{where } \tau = \mathcal{T}[\![\Gamma(y)]\!](\gamma), \ \tau_e = \operatorname{EnvType}(\vec{x},\Gamma,\gamma,\alpha), \ l_1 \text{ and } l_2 \text{ are fresh}, \ \vec{x} = x_0 \cdots x_i \cdots x_n \end{array}$$

When translating the parallel composition, we fork the execution of the translated process on the left. Since the forking thread loses its permission on the environment, it then reacquires the environment lock and continues executing the translated process on the right. Acquiring this lock involves no contention, since all threads only require shared access.

The translation of restriction is similar to that of input, since it also creates an environment. If the new name $x$ is free in P, we first acquire the global lock $g$ exclusively and create the new channel, then create a new environment, and continue with the translation of P; otherwise, we translate P directly.

```
\mathcal{P}\llbracket(\nu x: (T)) P \rrbracket(\vec{x}, l, \Gamma, g, \gamma) \stackrel{\text{def}}{=}
If x \in \operatorname{fn}(P):
      l \operatorname{ProcBlock}(\Gamma, \vec{x}, \gamma) {
           r_1 := \mathsf{tslE} g
           if r_1 = 0
               jump l_1[\alpha]
           jump l[\alpha]
      }
      l_1 \operatorname{\mathsf{ProcBlockCont}}(\tau_e, \gamma) {
           ValueInit(r_1, (T), \{r_2, r_3\})
           r_2[i] := r_1
           unlockE ChannelsLock
           jump l_2[\alpha]
      }
      CreateEnvAndTranslate(P, x, \tau, \tau_e, \vec{x}, l_2, \Gamma, g, \gamma)
     where \tau = \mathcal{T}\llbracket \Gamma(x) \rrbracket(\gamma), \ \tau_e = \mathsf{EnvType}(\vec{x}, \Gamma, \gamma, \alpha), \ l_1 \text{ and } l_2 \text{ are fresh}
```

Otherwise:

$\mathcal{P}\llbracket P \rrbracket (\vec{x}, l, \Gamma, g, \gamma)$

Where ProcBlockCont is defined by:

 $\mathsf{ProcBlockCont}(\tau,\gamma) \stackrel{\text{def}}{=} \forall [\alpha](r_2:\tau, r_3: \langle \mathsf{lock}(\alpha) \rangle^{\alpha}) \text{ requires } (\gamma;\alpha;)$ 

The translation of replicated input is almost the same as that of input; the only difference is that register $r_6$, the flag telling receive to keep the process in the channel, is set to 1.

$$\begin{array}{l} \mathcal{P}[\![!x(y).P]\!](\vec{x},l,\Gamma,g,\gamma) \stackrel{\text{def}}{=} \\ l \operatorname{ProcBlock}(\Gamma,\vec{x},\gamma) \ \{ \\ \quad r_4 := r_2[i] \\ \quad \mathsf{unlockS}\ r_3 \\ \quad r_5 := g \\ \quad r_1 := l_1 \\ \quad r_6 := 1 \\ \quad \mathsf{jump}\ \mathsf{receive}[\tau_e][\tau][\alpha][\gamma] \\ \} \\ l_1 \operatorname{ContType}(\tau,\tau_e) \ \{ \\ \quad \mathsf{jump}\ l_2[\alpha] \\ \} \\ \mathsf{CreateEnvAndTranslate}(P,y,\tau,\tau_e,\vec{x},l_2,\Gamma,g,\gamma) \\ \text{where } \tau = \mathcal{T}[\![\Gamma(y)]\!](\gamma), \ \tau_e = \operatorname{EnvType}(\vec{x},\Gamma,\gamma,\alpha), \ l_1 \text{ and } l_2 \text{ are fresh}, \ \vec{x} = x_0 \cdots x_i \cdots x_n \end{array}$$
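Since input and replicated input differ only in the value placed in $r_6$, a single generator can cover both. The Python sketch below is illustrative only: the function `translate_input`, its parameters, and the textual rendering of MIL are hypothetical, and only the first code block of each scheme is emitted (the continuation and CreateEnvAndTranslate are left out).

```python
from typing import List, Sequence


def translate_input(label: str, cont: str, env: Sequence[str], chan: str,
                    tau_e: str, tau: str, g: str, replicated: bool) -> List[str]:
    """Emit the MIL block for x(y).P, or for !x(y).P when replicated is True."""
    i = list(env).index(chan)
    return [
        f"{label} ProcBlock {{",
        f"  r4 := r2[{i}]",                   # the channel x_i
        "  unlockS r3",                       # release shared access to the environment
        f"  r5 := {g}",                       # the global lock
        f"  r1 := {cont}",                    # continuation l_1, run once a message arrives
        f"  r6 := {1 if replicated else 0}",  # 1: keep the process in the channel
        f"  jump receive[{tau_e}][{tau}][alpha][gamma]",
        "}",
    ]


print("\n".join(translate_input("l", "l1", ["x"], "x", "tau_e", "tau", "g", True)))
```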

We present the macro to create new environments and then translate a given process:

```
\mathsf{CreateEnvAndTranslate}(P, x_i, \tau, \tau_e, \vec{x}, l, \Gamma, g, \gamma) \stackrel{\mathrm{def}}{=}          (5.2)
If x_i \in \vec{y}:
      l \operatorname{ContType}(\tau, \tau_e) \{
           \beta, r_5 := \text{newLock -1}
           r_4 := \mathsf{EnvAlloc}(\vec{y}, \Gamma, \gamma, \beta)
           \forall y_j \in \vec{y} \setminus \{x_i\} \begin{cases} r_6 := r_2[k] \\ r_4[j] := r_6, \text{ where } y_j = x_k \text{ and } x_k \in \vec{x} \end{cases}
           unlockS r_3
           r_3 := r_5
           r_2 := r_4
           r_2[i] := r_1
           unlockE r_3
           r_4 := l_1
           jump grabLock [\tau'_e][\beta]
      \}
      where \tau'_e = \mathsf{EnvType}(\vec{y}, \Gamma, \gamma, \beta)
      \mathcal{P}\llbracket P \rrbracket (\vec{y}, l_1, \Gamma, g, \gamma)
Otherwise:
      \mathcal{P}\llbracket P \rrbracket (\vec{x}, l, \Gamma, g, \gamma)
where \vec{y} = \text{fn}(P), l_1 is fresh
```

Macro CreateEnvAndTranslate has two possible expansions. When $x_i$ is not free in P, we skip the environment creation and pass the given label on to the translation of P. When $x_i$ is a free name of P, we create a new environment and then translate P.
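Choosing between the two expansions amounts to computing the free names of P. As an illustration, here is a Python sketch using the standard definition of free names for the fragment of the asynchronous $\pi$-calculus translated in this chapter (inactive process, output, input, replicated input, parallel composition, restriction). The process representation and the helper `needs_new_env` are hypothetical; types on restricted names are omitted.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Nil:            # 0
    pass


@dataclass(frozen=True)
class Out:            # x<v>, where v is a name (str) or a base value (int)
    chan: str
    val: Union[int, str]


@dataclass(frozen=True)
class In:             # x(y).P, or !x(y).P when replicated
    chan: str
    var: str
    body: "Proc"
    replicated: bool = False


@dataclass(frozen=True)
class Par:            # P | Q
    left: "Proc"
    right: "Proc"


@dataclass(frozen=True)
class New:            # (nu x) P
    name: str
    body: "Proc"


Proc = Union[Nil, Out, In, Par, New]


def fn(p: Proc) -> FrozenSet[str]:
    """Free names of p: input and restriction bind names, nothing else does."""
    if isinstance(p, Nil):
        return frozenset()
    if isinstance(p, Out):
        return frozenset({p.chan} | ({p.val} if isinstance(p.val, str) else set()))
    if isinstance(p, In):
        return frozenset({p.chan}) | (fn(p.body) - {p.var})
    if isinstance(p, Par):
        return fn(p.left) | fn(p.right)
    if isinstance(p, New):
        return fn(p.body) - {p.name}
    raise TypeError(f"not a process: {p!r}")


def needs_new_env(body: Proc, name: str) -> bool:
    """True when CreateEnvAndTranslate must build a new environment for name."""
    return name in fn(body)


print(needs_new_env(Out("y", 3), "y"))  # True: y is used, so a new environment is built
print(needs_new_env(Nil(), "y"))        # False: the old environment is reused
```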

To create the new environment, we allocate a tuple and copy into it, from the old environment $\vec{x}$, the value of each free name of P other than $x_i$, whose value is taken from $r_1$. The new environment is protected by a new lock, so that acquiring access to an environment never blocks. Translation then proceeds with the fresh environment. Macro ValueInit, used in the translation of restriction, initialises a value of a given type and is recursively defined by:

$$\begin{array}{l} \mathsf{ValueInit}(r,T,R) \stackrel{\mathrm{def}}{=} \\ \text{If } T \text{ is } int: \\ \quad r := 0 \\ \text{Otherwise, considering that } T \text{ is } (T'), \text{ with } r_i \notin R \text{ and } r_j \notin R \cup \{r_i\}: \\ \quad \mathsf{ValueInit}(r_i, T', R \cup \{r_i\}) \\ \quad \mathsf{ChannelCreateEmpty}(r, r_i, r_j, \mathcal{T}\llbracket T' \rrbracket(\gamma), \gamma) \end{array}$$

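If $T$ is the integer type, we move the value 0 into the target register $r$. Otherwise, $T$ is a channel type $(T')$: we first initialise, recursively, a value of type $T'$ in a register $r_i$ not in $R$, and then create an empty channel in $r$, using $r_i$ and the auxiliary register $r_j$. The value in $r_i$ is required because the queues of a channel start with a sentinel element, which must hold a value of the content type (Appendix A).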
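The recursion of ValueInit can be sketched in Python as a generator of instruction text, assuming that types are represented as the string "int" or as a pair ("chan", T') for the channel type (T'), and that registers are chosen by a hypothetical lowest-numbered-first allocator `fresh`. ChannelCreateEmpty is only emitted by name, since its expansion is part of the run-time library.

```python
from typing import List, Set, Tuple, Union

# A source type is "int" or ("chan", T'), standing for the channel type (T').
Type = Union[str, Tuple[str, "Type"]]


def show(t: Type) -> str:
    """Print a type as in the text: int, (int), ((int)), ..."""
    return t if isinstance(t, str) else f"({show(t[1])})"


def fresh(avoid: Set[str], limit: int = 32) -> str:
    """Lowest-numbered register r1, r2, ... not in avoid (hypothetical allocator)."""
    for k in range(1, limit + 1):
        if f"r{k}" not in avoid:
            return f"r{k}"
    raise RuntimeError("no free register")


def value_init(r: str, t: Type, R: Set[str]) -> List[str]:
    """Expand ValueInit(r, T, R); R holds the registers that must not be clobbered."""
    if t == "int":
        return [f"{r} := 0"]
    inner = t[1]
    ri = fresh(R)              # r_i not in R
    rj = fresh(R | {ri})       # r_j not in R and different from r_i
    code = value_init(ri, inner, R | {ri})   # a value of type T' in r_i
    code.append(f"ChannelCreateEmpty({r}, {ri}, {rj}, T[[{show(inner)}]](gamma), gamma)")
    return code


for line in value_init("r1", ("chan", ("chan", "int")), {"r2", "r3"}):
    print(line)
```

With $R = \{r_2, r_3\}$, as in the translation of restriction, the register $r_i$ chosen at the outermost level may coincide with $r$ itself; this seems consistent with the remark in Appendix A that the sentinel value of QueueCreateEmpty may be held in the same register as the queue.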

## Chapter 6

### Conclusion

In this work we present a type-preserving compiler that translates the $\pi$-calculus into MIL. The translation also aims at preserving the semantics, by taking advantage of the multithreaded architecture of the target language. In MIL there is a finite number of processors, each executing one $\pi$ process; hence, a reduction between an active process (running in a processor) and an inactive one (waiting in the thread pool) is not possible.

As related work, we consider Pict, a compiler from the $\pi$-calculus into C; a compiler that targets a typed assembly language; and TyCO, a framework for compiling process calculi. The main difference between Pict [25] and our compiler is the target architecture: the former targets a sequential machine, whereas the latter targets a multithreaded machine. Thus, Pict does not have to address concurrency in the generated code. Variable binding is also very different: Pict relies on the variable binding of C and, since C does not support closures, the environment of a process must be created explicitly; MIL, in turn, binds variables to registers. Pict has no run-time library: the full communication code is expanded at each use, which results in more generated code. The $\pi$-calculus version of Pict is richer than the one we use, supporting recursive types, polymorphism, and type inference. Finally, Pict must deal with memory usage concerns, which MIL abstracts away.

The compiler by Greg Morrisett et al. [18] translates System F into TAL (a typed assembly language) in five compilation stages. The first stage is conversion to continuation-passing style (CPS). It does not apply to the $\pi$-calculus, whose message-passing style is already CPS-friendly. The second stage makes the environments of functions explicit. Both compilers use existential types to abstract the type of environments. In their work, environments are packed by the translation itself; we pack environments in the run-time library, so less code is generated. The third stage, hoisting, defines heap values that consist of code blocks (much like MIL's code blocks). The fourth stage makes memory allocation explicit. The final stage is not relevant here, since it is mostly a syntactic translation to TAL. The main difference between the two works is that our source and target languages are concurrent.

Lopes et al. [11] present a framework for compiling process calculi. The abstract machine that runs their target language is sequential, and thus suffers from the same limitations as Pict. Moreover, unlike MIL, the target language of that work has no typing rules.

Further work includes extending MIL and simplifying the run-time library. We are adding support for read-only tuples to MIL, which reduces contention and removes locks from the translation of processes. We are also simplifying the run-time library by letting a channel hold messages and processes in the same queue (instead of two separate queues). Furthermore, we are pursuing a wait-free implementation of the $\pi$-calculus, by equipping MIL with compare-and-swap instead of locks.

#### Appendix A

#### Queues

We use a double-ended queue, rather than a pool, to store the messages and the scheduled processes of channels. The implementation uses a singly linked list of elements (nodes). The FIFO order ensures fairness, and keeping references to both ends of the list allows fast insertion (at the tail) and fast removal (at the head).

The type of an element (Figure A.1) may be parametrised by:

$$\mathsf{ElementType}(\tau, \alpha) \stackrel{\text{def}}{=} \mu\, \beta. \langle \tau, \beta \rangle^{\alpha}$$

An element is a tuple, protected by lock $\alpha$, that holds two values: the contents of the element (of type $\tau$) and a reference to the next element (of the recursive type $\beta$). The macros for accessing this data structure are:

$$\mathsf{ElementAlloc}(\tau, \alpha) \stackrel{\text{def}}{=} \mathsf{malloc}\,[\tau, \mathsf{ElementType}(\tau, \alpha)] \text{ guarded by } \alpha$$
$$\mathsf{ElementValue}(r) \stackrel{\text{def}}{=} r[0] \qquad \mathsf{ElementNext}(r) \stackrel{\text{def}}{=} r[1]$$

The following example creates two connected elements:

```
r_1 := ElementAlloc(int, α)   -- alloc element 1
r_2 := ElementAlloc(int, α)   -- alloc element 2
ElementValue(r_1) := 3        -- set the value of element 1 as 3
ElementNext(r_1) := r_2       -- link element 1 to element 2
ElementValue(r_2) := 0        -- set the value of element 2 as 0
ElementNext(r_2) := r_2       -- link element 2 to itself
```

Figure A.1: An element.

Figure A.2: Two elements connected.

Figure A.3: A queue with n elements.

Figure A.2 shows the two new elements, which are pointed to by registers $r_1$ and $r_2$.

We define a queue as

$$\mathsf{QueueType}(\tau, \alpha) \stackrel{\text{def}}{=} \langle \text{int}, \mathsf{ElementType}(\tau, \alpha), \mathsf{ElementType}(\tau, \alpha) \rangle^{\alpha}$$

The tuple comprises the number of elements in the queue, the first element in the queue, and the last element in the queue, as portrayed by Figure A.3. The associated macros are:

$$\mathsf{QueueAlloc}(\tau, \alpha) \stackrel{\text{def}}{=} \mathsf{malloc}\,[\text{int}, \mathsf{ElementType}(\tau, \alpha), \mathsf{ElementType}(\tau, \alpha)] \text{ guarded by } \alpha$$
$$\mathsf{QueueLen}(r) \stackrel{\text{def}}{=} r[0] \qquad \mathsf{QueueFirst}(r) \stackrel{\text{def}}{=} r[1] \qquad \mathsf{QueueLast}(r) \stackrel{\text{def}}{=} r[2]$$

Figure A.4: A queue with one element.

Consider the elements of the previous example, referred to by registers $r_1$ and $r_2$. The following example, illustrated by Figure A.4, creates a queue holding the number 3:

```
r_3 := QueueAlloc(int, α)   -- alloc the queue
QueueLen(r_3) := 1          -- set the number of valid elements in the queue
QueueFirst(r_3) := r_1      -- point to the first element (the head of the queue)
QueueLast(r_3) := r_2       -- point to the sentinel (the tail of the queue)
```

Our queues have a sentinel element, which ensures that every element has a next element. Thus, even though the queue above has two elements, the second one, pointed to by the tail of the queue, does not count as a valid element.
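The layout can be mirrored with Python objects, as a sketch only: the classes `Element` and `Queue` below are our own stand-ins for the heap tuples of ElementType and QueueType, and the locks that guard them in MIL are omitted. The example rebuilds the queue holding the number 3 and checks that the sentinel is not counted.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class Element(Generic[T]):
    """Mirrors ElementType: a value and a reference to the next element."""

    def __init__(self, value: T) -> None:
        self.value = value
        self.next: "Element[T]" = self  # linked to itself until told otherwise


class Queue(Generic[T]):
    """Mirrors QueueType: number of valid elements, head, and tail (the sentinel)."""

    def __init__(self, length: int, first: Element[T], last: Element[T]) -> None:
        self.length = length
        self.first = first
        self.last = last

    def valid_values(self) -> List[T]:
        """Values of the valid elements, from the head; the sentinel is skipped."""
        values, e = [], self.first
        for _ in range(self.length):
            values.append(e.value)
            e = e.next
        return values


# The two elements of the earlier example, then the queue holding the number 3.
e1, e2 = Element(3), Element(0)
e1.next = e2                  # ElementNext(r_1) := r_2; e2 already links to itself
q = Queue(1, e1, e2)
print(q.valid_values())       # [3]
print(q.last.next is q.last)  # True: the sentinel always has a next element
```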

The initialisation of empty queues (with a sentinel), depicted by Figure A.5, is so common that we also define macro QueueCreateEmpty($r, r_e, v, \tau, \alpha$), where:

- $r$ is the register that will refer to the new queue;
- $r_e$ is the register that will point to the sentinel element;
- $v$ is the value held by the sentinel element (of type $\tau$);
- $\tau$ is the type of the contents of the queue (and consequently of $v$ as well);
- $\alpha$ is the lock protecting the queue.

Figure A.5: An empty queue.

Defined by:

Notice that, because of the order of the instructions, $v$ can be the same register as $r$, but not the same as $r_e$. It is also important to realise that $v$ is not altered by the expansion of this macro.

We define macro QueueAdd$(r, r_l, r_e, v, \tau, \alpha)$ to add an element to the queue (illustrated by Figure A.6), where:

- $r$ is the register that refers to the queue;
- $r_l$ is the register that will store the number of elements of the queue;
- $r_e$ is the register that will refer to the sentinel element;
- $v$ is the value to be added to the queue (of type $\tau$);
- $\tau$ is the type of the contents of the queue (and consequently of $v$ as well);
- $\alpha$ is the lock protecting the queue.

Figure A.6: Adding the *n*-th value to a queue.

Defined by:

```
QueueAdd(r, r_l, r_e, v, τ, α) def=
  r_e := QueueLast(r)         -- the sentinel will become the last valid element
  ElementValue(r_e) := v      -- set the value of the last element
  r_e := ElementAlloc(τ, α)   -- create the new sentinel
  ElementValue(r_e) := v      -- copy the value to the sentinel as well
  ElementNext(r_e) := r_e     -- link new element to itself
  r_l := QueueLast(r)         -- get the last element again
  ElementNext(r_l) := r_e     -- link it to the sentinel
  QueueLast(r) := r_e         -- point the tail of the queue to the sentinel
  r_l := QueueLen(r)
  r_l := r_l + 1
  QueueLen(r) := r_l          -- increment the count of elements
```

Notice that, because of the order of the instructions in the macro, the value $v$ may be held in register $r_l$, but not in registers $r$ or $r_e$.
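A Python transcription of QueueAdd, one statement per macro step, may help follow the pointer manipulation. It is a sketch under our own assumptions: Python objects stand in for heap tuples, locks are omitted, and an empty queue is assumed to have its head and tail both pointing to a single sentinel with length 0 (the expansion of QueueCreateEmpty is not shown above).

```python
class Element:
    """A value and a reference to the next element (ElementType)."""

    def __init__(self, value):
        self.value = value
        self.next = self  # ElementNext(r_e) := r_e


class Queue:
    """Length, head and tail (QueueType); starts as an empty queue with a sentinel."""

    def __init__(self, sentinel_value):
        sentinel = Element(sentinel_value)
        self.length, self.first, self.last = 0, sentinel, sentinel


def queue_add(q, v):
    """Follow the instruction order of QueueAdd, one line per macro step."""
    e = q.last               # r_e := QueueLast(r): the sentinel becomes a valid element
    e.value = v              # ElementValue(r_e) := v
    e = Element(v)           # new sentinel, with a copy of v, linked to itself
    q.last.next = e          # ElementNext(QueueLast(r)) := r_e
    q.last = e               # QueueLast(r) := r_e
    q.length += 1            # QueueLen(r) := QueueLen(r) + 1


q = Queue(0)
for v in (1, 2):
    queue_add(q, v)
print(q.length, q.first.value, q.first.next.value)  # 2 1 2
```

Reusing the old sentinel for the new value is what keeps insertion to a constant number of steps: only the new sentinel is allocated, and no traversal of the list is needed.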

We now define macro QueueRemove $(r, r_l, r_v)$ , illustrated by Figure A.7, to remove a value from a queue, while retaining it in a register. The interface is:

- $r$ is the register that refers to the queue;
- $r_l$ is the register that will store the length of the queue;
- $r_v$ is the register that will hold the removed value.

The macro is defined by:



Figure A.7: Removing the first element of a queue.
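As an illustration only, the following Python sketch removes the first element of a queue (Figure A.7). It assumes that removal reads the value of the head element, advances the head to the next element, and decrements the length, and that $r_l$ receives the new length; these are our assumptions, as is the choice of rejecting removal from an empty queue with an exception. Locks are omitted, and `queue_remove` is a hypothetical name.

```python
from typing import Optional, Tuple


class Element:
    """A value and a reference to the next element (ElementType)."""

    def __init__(self, value, nxt: Optional["Element"] = None):
        self.value = value
        self.next = self if nxt is None else nxt


class Queue:
    """Length, head and tail (QueueType)."""

    def __init__(self, length: int, first: Element, last: Element):
        self.length, self.first, self.last = length, first, last


def queue_remove(q: Queue) -> Tuple[int, object]:
    """Remove the head element; return the new length (r_l) and its value (r_v)."""
    if q.length == 0:
        raise ValueError("empty queue: only the sentinel is left")
    head = q.first
    q.first = head.next      # the next element, possibly the sentinel, becomes the head
    q.length -= 1
    return q.length, head.value


# A queue holding 3 and 5, followed by a sentinel holding 0.
sentinel = Element(0)
q = Queue(2, Element(3, Element(5, sentinel)), sentinel)
print(queue_remove(q))    # (1, 3)
print(queue_remove(q))    # (0, 5)
print(q.first is q.last)  # True: back to an empty queue
```

Because the tail always points to the sentinel, removing the last valid element leaves head and tail pointing to the same node, so no special case is needed for the tail.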

#### Bibliography

- [1] BEE2 (Berkeley Emulation Engine 2). http://bee2.eecs.berkeley.edu/.
- [2] Gérard Boudol. Asynchrony and the  $\pi$ -calculus (note). Rapport de Recherche 1702, INRIA Sophia-Antipolis, 1992.
- [3] Peter J. Denning and Stuart C. Schwartz. Properties of the working-set model. Commun. ACM, 15(3):191–198, 1972.
- [4] Cormac Flanagan and Martín Abadi. Types for Safe Locking. In Proceedings of ESOP '99, volume 1576 of LNCS, pages 91–108. Springer, 1999.
- [5] Hubert Garavel. Reflections on the Future of Concurrency Theory in General and Process Calculi in Particular. *Electronic Notes in Theoretical Computer Science*, 209:149–164, 2008.
- [6] Jean-Yves Girard. The system F of variable types, fifteen years later. Theoretical Computer Science, 45(2):159–192, 1986.
- [7] Guy Lewis Steele Jr. RABBIT: A Compiler for SCHEME. Master's thesis, MIT AI Lab, 1978.
- [8] Jim Held, Jerry Bautista, and Sean Koehl. From a few cores to many: A tera-scale computing research overview. White paper, 2006.
- [9] Kohei Honda and Mario Tokoro. An object calculus for asynchronous communication. In Pierre America, editor, *Proceedings of ECOOP '91*, volume 512 of *LNCS*, pages 133–147. Springer, 1991.
- [10] Stuart Kent. Model Driven Engineering. In Proceedings of IFM '02, pages 286–298. Springer, 2002.

- [11] Luís Lopes, Fernando Silva, and Vasco Vasconcelos. Compiling Object Calculi. Technical Report DCC-98-3, University of Oporto, 1998.
- [12] Luís Lopes, Fernando Silva, and Vasco T. Vasconcelos. A Virtual Machine for the TyCO Process Calculus. In *Proceedings of PPDP '99*, volume 1702 of *LNCS*, pages 244–260. Springer, 1999.
- [13] Luís Lopes, Vasco T. Vasconcelos, and Fernando Silva. Fine Grained Multithreading with Process Calculi. *IEEE Transactions on Computers*, 50(9):229–233, 2001.
- [14] Francisco Martins. Controlling Security Policies in a Distributed Environment. PhD thesis, Faculty of Sciences, University of Lisbon, 2005.
- [15] Robin Milner. The polyadic π-calculus: A tutorial. In Friedrich L. Bauer, Wilfried Brauer, and Helmut Schwichtenberg, editors, *Logic and Algebra of Specification*, volume 94 of *Series F.* NATO ASI, Springer, 1993. Available as Technical Report ECS-LFCS-91-180, University of Edinburgh, 1991.
- [16] Robin Milner. Communicating and Mobile Systems: the  $\pi$ -Calculus. Cambridge University Press, 1999.
- [17] Robin Milner, Joachim Parrow, and David Walker. A calculus of mobile processes, part I/II. Journal of Information and Computation, 100:1–77, 1992.
- [18] Greg Morrisett, David Walker, Karl Crary, and Neal Glew. From System F to Typed Assembly Language. ACM Transactions on Programming Languages and Systems, 21(3):527–568, 1999.
- [19] Jishnu Mukerji and Joaquin Miller eds. Model Driven Architecture. Object Management Group, 2001. http://www.omg.org/cgi-bin/doc?ormsc/2001-07-01.
- [20] Kunle Olukotun and Lance Hammond. The future of microprocessors. Queue, 3(7):26–29, 2005.
- [21] Benjamin Pierce and David Turner. Pict: A Programming Language Based on the Pi-Calculus. In Gordon Plotkin, Colin Stirling, and Mads Tofte, editors, *Proof, Language and Interaction: Essays in Honour of Robin Milner*, Foundations of Computing. MIT Press, May 2000.

- [22] Benjamin C. Pierce. Advanced Topics In Types And Programming Languages. MIT Press, 2004.
- [23] The RAMP (Research Accelerator for Multiprocessors) project. http://ramp.eecs.berkeley.edu/.
- [24] Davide Sangiorgi and David Walker. The  $\pi$ -calculus: a Theory of Mobile Processes. Cambridge University Press, 2001.
- [25] David Turner. The Polymorphic Pi-Calculus: Theory and Implementation. PhD thesis, LFCS, University of Edinburgh, 1996. CST-126-96 (also published as ECS-LFCS-96-345).
- [26] Vasco T. Vasconcelos and Francisco Martins. A multithreaded typed assembly language. In *Proceedings of TV '06*, 2006.