Abstract. Current and emerging trends such as cloud computing, fog computing, and more recently, multi-access edge computing (MEC) increase the interest in finding solutions to the verifiable computation problem. Furthermore, the number of computationally weak devices has increased drastically in recent years due to the ongoing realization of the Internet of Things. This work proposes a solution which enjoys the following two desirable properties: (1) the cost of input preparation and verification is very low (low enough to allow verifiable outsourcing of computations by resource-constrained devices on constrained networks); (2) the running time of the verifiable computation is RAM-like.
Introduction
Verifiable outsourcing of computations involves a possibly computationally weak outsourcing party (outsourcer), and one or more worker parties (evaluators) who are possibly untrusted by the outsourcer. The outsourcer sends the inputs for the computation to the evaluator, and the evaluator sends back the result of the computation along with some additional information which enables the outsourcer to verify the received result. How much the outsourcer benefits from outsourcing depends on how much lower the cost of verification is than the cost of performing the computation, $Cost_C$. Obviously, if the cost of verification is greater than or equal to $Cost_C$, the outsourcer would rather perform the computation itself. It is also desirable that the cost of the verifiable computation to the evaluator is as close as possible to $Cost_C$.
Solutions to the verifiable computation problem based on Yao's Garbled Circuit (GC) construction enjoy the non-interactivity and inherent verifiability of secure 2-party computations using GCs, but they have to overcome two major challenges before they can be of practical value: the single-use nature of the garbled circuit, and the inflation of size and running time due to the conversion to a Boolean circuit. Simply converting a RAM program to a circuit, and then garbling and evaluating it, leads to solutions with circuit-like running times, which are significantly worse than the running time of the original RAM program. The solution presented in this work does not address the inflation of size, but it does achieve RAM-like running time. The verifiable RAM (VRAM) construction which underlies the solution sits somewhere between the simple conversion to circuit and the intricate GRAM constructions [3]. The design of VRAM is based on RAM concepts, but unlike GRAM, all the construction work takes place at compile time at a cost similar to circuit construction. The construction underlying the solution is not oblivious, and the solution does not provide privacy.
The rest of this paper is organized as follows. Section 2 provides necessary background information on the random-access machine and garbled circuits. Section 3 develops the necessary concepts for describing VRAM, and provides an informal description of it. Section 4 describes the algorithms which define the VRAM scheme, and Section 5 puts these algorithms together within a protocol, which serves as the formal description of the proposed solution. Section 6 concludes the paper and discusses future work.
Background

Random-Access Machine (RAM)
The random-access machine (RAM) models the essential features of the traditional serial computer [4]. The RAM model of computation resembles the operation of modern computers much more closely than logic circuits do. The random-access machine consists of a central processing unit (CPU) and a random-access memory, which are connected to each other and interact (see Fig. 1). The CPU has a small (compared to the random-access memory) internal memory comprised of special-purpose memory locations called registers, and (for efficiency reasons) all CPU operations are performed on data stored in these registers. The random-access memory is modeled as a collection of m w-bit words, each of which is identified by a memory address. The random-access memory stores both data and collections of CPU instructions called programs. The CPU repeatedly reads an instruction from the random-access memory and executes it, modifying data in the process. The set of all instructions comprises the instruction set (IS). A typical IS includes memory load and store instructions for moving data between memory locations and registers, jump instructions, arithmetic and logical instructions, as well as input and output instructions, and a HALT instruction. Branching and loops in high-level languages correspond to conditional jumps and conditional backward jumps, respectively. In a conditional jump, the CPU either reads the next instruction in the forward direction, or 'jumps' to an instruction out of sequence and reads that one, depending on the result of a comparison. Without loss of generality, the random-access memory can be considered as the union of five disjoint memory regions R, P, X, Y and D. The registers will be considered as part of the memory, for the sake of simplifying notation. The read-only X and P are the regions where the input to the program and the program itself are loaded, respectively.
Y is the region where the computation result is written at the end of the computation: without loss of generality, and for reasons that will become clear later, we assume that the last thing a program does is to write the computation result into Y. Everything else (e.g. local and global variables) is stored in D. Then, a RAM computation can be expressed as
$$Y = P^{D}(X)$$
where $P^{D}$ denotes that the program P can read the initial memory contents of D, as well as reading from locations in D having written to those locations itself. While the latter class of actions treats D as merely temporary storage, the ability of the programs to read the initial contents of D qualifies it as persistent memory which persists between executions of several possibly different programs. R on the other hand is not persistent, and a program P should read a location in R only if it has written to it. Compared to the most efficient equivalent circuit, a RAM program has significantly better average running time, as the circuit evaluation involves (1) evaluating both branches for each branching, and (2) running each loop the maximum possible number of times it can run. On the other hand, RAM execution is not oblivious (to the inputs) while the circuit evaluation is. However, while obliviousness is a desirable property for private computations, that is not necessarily the case for verifiable computations.
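To make the model concrete, the following is a minimal sketch of a RAM interpreter, written in Python purely for illustration. The instruction subset, the single register `r` standing in for region R, the dictionary-based regions, and the name `run_ram` are all assumptions of this sketch and do not come from the construction described in this paper. It shows a computation of the form Y = P^D(X) whose step count depends on the input, which is exactly the property a circuit evaluation lacks.

```python
def run_ram(program, X, D):
    """Run a toy RAM program and return (Y, number of executed instructions).

    program: list of (opcode, argument) pairs, playing the role of region P.
    X: read-only input region; D: persistent region (initial contents readable).
    """
    D = dict(D)   # work on a copy so the caller's persistent memory is untouched
    Y = {}        # output region, written at the end of the computation
    r = 0         # a single register, standing in for region R
    pc = 0        # index of the next instruction
    steps = 0
    while True:
        op, arg = program[pc]
        steps += 1
        if op == "HALT":
            return Y, steps
        if op == "LOAD":                      # r <- region[addr]
            region, addr = arg
            r = X[addr] if region == "X" else D[addr]
        elif op == "ADD":                     # r <- r + region[addr]
            region, addr = arg
            r += X[addr] if region == "X" else D[addr]
        elif op == "STORE":                   # region[addr] <- r
            region, addr = arg
            (Y if region == "Y" else D)[addr] = r
        elif op == "JMPZ":                    # conditional jump on r == 0
            if r == 0:
                pc = arg
                continue
        elif op == "JMP":
            pc = arg
            continue
        pc += 1


# Hypothetical program: if X[0] == 0 then Y[0] = 0 else Y[0] = X[0] + D[0].
PROGRAM = [
    ("LOAD", ("X", 0)),   # 0
    ("JMPZ", 5),          # 1
    ("ADD", ("D", 0)),    # 2
    ("STORE", ("Y", 0)),  # 3
    ("HALT", None),       # 4
    ("STORE", ("Y", 0)),  # 5
    ("HALT", None),       # 6
]
```

An equivalent circuit would have to evaluate both sides of the conditional jump for every input, whereas the interpreter above executes only the instructions on the path actually taken.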
Garbled Circuits (GC)
Yao's Garbled Circuit (GC) construction [5, 6] has given rise to numerous research papers, mostly in the area of secure computation. In later years, the original idea has been formalized under the name garbling schemes [2]. A garbling scheme comprises five algorithms Gb, En, Ev, De, ev such that: (1) $(F, e, d) \leftarrow Gb(1^{k}, f)$, where Gb is given a security parameter k, and the function f which is to be computed. Gb yields F, e, and d which describe the garbled function, the encoding function, and the decoding function, respectively. (2) X = En(e, x), where x is the input, and X is the garbled input.
Yao's original construction can be described using the syntax above for garbling schemes as follows. In case of Yao's original construction, Gb garbles a circuit representing a function f, and ev is the usual circuit evaluation. Gb starts by assigning to each wire in the circuit two keys $k^0$ and $k^1$, corresponding to the two possible wire values 0 and 1, respectively. For each gate $g_r$, the keys $k_r^0$ and $k_r^1$ of its output wire are doubly encrypted under a key $k_s$ of one input wire and a key $k_t$ of the other, where each one of s and t is either a gate index for a gate whose output wire is connected to an input wire of $g_r$, or an index of an input wire for the circuit. The two encryption keys and the key to be encrypted are chosen respecting the structure of the truth table (TT), so that the evaluation of the garbled circuit with the garbled inputs mimics the in-the-clear evaluation with the corresponding non-garbled inputs. This step is closely related to the correctness condition for the garbling schemes: De(d, Ev(F, En(e, x))) = ev(f, x). The process yields the encrypted truth table (ETT) for the gate (see Fig. 2). Finally, the rows of the ETTs are shuffled, so that the values on a gate's input wires cannot be inferred from the index of the row opened during the evaluation. In this case, Ev resembles the usual circuit evaluation in terms of the processing order of the gates; however, gate evaluations involve undoing the double encryptions, rather than doing simple look-ups in the TTs. Selection of the row to be decrypted may be carried out via trial and error (possible if authenticated encryption is used), or via the point-and-permute technique [1]. En and De are as simple as following the mappings between the bit values 0 and 1, and the corresponding key values.
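The garbling of a single gate described above can be sketched as follows. This is an illustrative toy, not the scheme used in this paper: the hash-derived one-time pad plus MAC tag standing in for authenticated double encryption, the 16-byte key length, and the names `garble_gate` and `eval_gate` are all assumptions made for the example. Row selection is done by trial and error, relying on the tag to reject wrong rows.

```python
import hashlib
import hmac
import os
import random

K = 16  # key length in bytes (an assumed value for this sketch)


def _pad(k_s, k_t, n):
    # Pad derived from both input keys; each key pair encrypts one row only.
    return hashlib.sha256(b"pad" + k_s + k_t).digest()[:n]


def _tag(k_s, k_t, ct):
    return hmac.new(k_s + k_t, ct, hashlib.sha256).digest()[:K]


def enc(k_s, k_t, msg):
    """Toy 'double encryption' of msg under the two input-wire keys."""
    ct = bytes(a ^ b for a, b in zip(msg, _pad(k_s, k_t, len(msg))))
    return ct + _tag(k_s, k_t, ct)


def dec(k_s, k_t, row):
    """Return the plaintext, or None if the tag shows these are the wrong keys."""
    ct, tag = row[:-K], row[-K:]
    if not hmac.compare_digest(tag, _tag(k_s, k_t, ct)):
        return None
    return bytes(a ^ b for a, b in zip(ct, _pad(k_s, k_t, len(ct))))


def garble_gate(truth_table):
    """Build the shuffled ETT of a 2-input gate with input wires s, t and output r."""
    keys = {w: (os.urandom(K), os.urandom(K)) for w in ("s", "t", "r")}
    rows = [
        enc(keys["s"][a], keys["t"][b], keys["r"][truth_table[(a, b)]])
        for a in (0, 1)
        for b in (0, 1)
    ]
    random.shuffle(rows)  # hide which truth-table row is opened
    return rows, keys


def eval_gate(rows, k_s, k_t):
    """Trial-and-error evaluation: exactly one row opens under the given keys."""
    for row in rows:
        out = dec(k_s, k_t, row)
        if out is not None:
            return out
    raise ValueError("no row could be opened with the given keys")
```

For an AND gate, evaluating with the keys for input bits a and b returns the output-wire key for a AND b, which is the gate-level instance of the correctness condition De(d, Ev(F, En(e, x))) = ev(f, x).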
The Verifiable RAM (VRAM)
This section describes the verifiable RAM (VRAM). In order to achieve this, the necessary concepts for describing VRAM will be introduced, and examples will be given for providing context.
A VRAM construction allows one-time verifiable computation. It is built on the principle that the execution of a VRAM program can advance so long as the memory access pattern of the VRAM program 'mimics' the memory access pattern of the (non-verifiable) RAM program from which the VRAM program was built. Otherwise, the execution shall not advance. The memory access pattern involves not just the locations accessed, but also the type of access (read or write) and the value read or written.
VRAM Random-Access Memory
The random-access memory of the VRAM will be referred to as encoded memory, and will be denoted by $M_v$. $M_v$ holds encodings of the bits of data manipulated by the program, but not the program itself. Using the notation from Section 2.1, only the memory regions R, X, Y, and D are encoded, and the VRAM program is never loaded into the encoded memory. $R_v$, $X_v$, $Y_v$, and $D_v$ will denote the encoded twins of R, X, Y, and D, respectively. In order to keep a one-to-one correspondence between the regions and simplify the descriptions, region P of M will be omitted in the rest of this work: $M = R \cup X \cup Y \cup D$ and $M_v = R_v \cup X_v \cup Y_v \cup D_v$.
In case of persistent memory, X and D may both be read as input, affecting the path of execution. The reason for defining a separate region X becomes clear in the context of verifiable computations. It is the region X that stores the inputs of the outsourcing party, whereas D might be some large database whose contents may have been altered by previous computations, and may affect the outcome of the current computation, just as the contents of X do.
If the word length of RAM memory M is W, then the word length of $M_v$ is W · K, where K is the key length, which is the sole security parameter for the VRAM construction. Locations in M and $M_v$ are denoted by x and $x_v$, respectively. Each bit value stored at location x maps to a key whose first bit is stored at location $x_v = x \cdot K$ of $M_v$. This mapping from bit values to keys is time-dependent. Time dependency of the mapping is a must because a RAM program may write the same value to a location at different times during execution, but the verifiable twin VRAM relies on garbled circuits for its verifiability property, and garbled circuits require fresh un-exposed keys as inputs. A time-like variable, VRAM time, denoted by t, is incremented by 1 each time a word in M is written. t also increases due to branching, as will be explained in the next subsection. The last write times $t_w[x]$ are kept separately for each memory location x, to be used during the construction of the VRAM program. $t_w[x] = 0$ for all x at t = 0, and when some memory location $x'$ is written to at $t = t'$, $t_w[x']$ is set to $t'$, whether or not the old and new bit values are different. The crucial feature of the encoded memory to keep in mind is that memory writes to M are reflected in the VRAM as time-translation of keys, which takes place even when the value in M remains unchanged.
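The time-dependent mapping from bit values to keys can be illustrated with a short sketch. It is a compile-time bookkeeping model only, under several assumptions that are not taken from this paper: keys are derived with HMAC-SHA256 from a hypothetical seed, words are one bit wide, and t is incremented before the write time is recorded. The class and function names are made up for the example.

```python
import hashlib
import hmac

K = 16  # key length in bytes (assumed)


def derive_key(seed, x, t, b):
    """Key encoding bit value b at location x as of VRAM time t (HMAC as a PRF)."""
    msg = f"{x}|{t}|{b}".encode()
    return hmac.new(seed, msg, hashlib.sha256).digest()[:K]


class EncodedMemoryModel:
    """Tracks VRAM time t and the last write times t_w[x] (1-bit words for brevity)."""

    def __init__(self, seed):
        self.seed = seed
        self.t = 0
        self.t_w = {}  # a missing entry means t_w[x] = 0

    def write(self, x):
        # Every write advances VRAM time, even if the stored bit is unchanged.
        self.t += 1
        self.t_w[x] = self.t

    def key(self, x, b):
        # The key currently encoding bit value b at location x.
        return derive_key(self.seed, x, self.t_w.get(x, 0), b)
```

Writing the same bit value to a location twice therefore yields two different keys, which is what lets each garbled circuit consume a fresh, unexposed input key.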
VRAM CPU and VRAM Programs
It was mentioned in Section 2.1 that a RAM computation can be expressed as $Y = P^{D}(X)$. Our goal is to obtain a verifiable version of the same computation, which yields $Y_v = P_v^{D_v}(X_v)$. The previous subsection described how memory is encoded. This subsection describes how the VRAM program $P_v$ can be built from P. Defining a separate VRAM CPU entity is not necessary, as the VRAM program will cover the functionality associated with both the CPU and the RAM program P.
A VRAM program consists of several garbled circuits, each belonging to one of the three categories B, T, or I. Type B (branch) and type T (time-merge) circuits together model a conditional jump, and type I (instruction) circuits model any instruction which alters memory. Type B circuits guarantee that only a single branch (the correct one for the given inputs) can be followed, and type T circuits are needed for merging branches, and more generally, for handling input-dependent program behaviour. Type I circuits may be further categorized into sub-types which closely resemble the operations in instruction sets such as x86 and x86-64, and they guarantee that $M_v$ is altered in a way that is consistent with its twin M at each time step, i.e. the memory access pattern is mimicked. Before going any further, we define a few concepts which are relevant to both RAM programs and VRAM programs:
Segment: A segment is an ordered, maximal-length sequence of instructions which are always executed in sequence, independent of initial memory contents. The sequence order reflects the order in which the instructions are executed.
Branch: Either a conditional jump or a HALT instruction marks the end of a segment. In case of a conditional jump, two new segments $s_1$ and $s_2$ are created, such that at least one of them has non-zero length. The created segments are called branches. Let the VRAM times associated with the first and the last instructions in either $s_1$ or $s_2$ be $t_1$ and $t_2$, respectively. $t_1 - 1$ (resp. $t_2 + 1$) is defined as the time of split (resp. time of merge), and is denoted by $t_s$ (resp. $t_m$).
Path of Execution: A path of execution, or an execution path, is an ordered sequence of segments visited during a single program execution. The sequence order reflects the order in which the segments are visited.
The VRAM time runs from t = 0 to t = τ during an execution, where τ is an input-independent value. Clearly, the input-independent τ is not a measure of the running time of the RAM, or the VRAM. We define another variable $t_{cost}$, which is more relevant for running time measurements, and use it for imposing a limit on the size of the VRAM program.
The following example aims to clarify these definitions. First, part of a program P written in an assembly language is given (see Listing 1.1). Equivalent code written in a high-level language is given in Listing 1.2. Finally, the VRAM program $P_v$ built from P is depicted in Fig. 3.
Listing 1.1. Part of a program P in assembly language (surviving fragment): ;... LOAD y
In Fig. 3, the axis below shows the VRAM time t. Dots indicate garbled circuits. Type B and T circuits are marked with the respective letter, and all unmarked dots correspond to type I circuits. A HALT instruction is marked with a square. In case of type T circuits, a single dot is used to represent possibly several T circuits. In other cases, a single dot represents a single circuit. Segments are denoted by s, and branches are denoted by br. Branch br1 and the 0-length branch br2 (which contains only T circuit(s)) both start at t = 5. Time of split is t = 4 and time of merge is t = 8. The square brackets around the T circuits are included to emphasize the fact that the existence of T circuits at the end of branches depends on the instructions in both branches. A VRAM time value and a branch index together define a unique circuit position within the structure of a VRAM program. We adopt the convention that $I_{t,b}$ stands for the type I circuit associated with VRAM time t, and the upper (resp. lower) branch if b = 0 (resp. b = 1). If a circuit is not associated with any branches, b is omitted. The same convention is also used for type B and type T circuits. I and B circuits on the same branch, as well as those that do not belong to any branch, are drawn at the same height. All T circuits are depicted on a vertical line of their own.
A challenge in building a VRAM program is the input dependency of the execution path. Consider the garbled circuit $I_8$ in Fig. 3. $I_8$ takes as input the encoding keys associated with $(\bar{x}, t_w[\bar{x}])$ for all locations $\bar{x}$ in which bits of the program variable x are stored. These input keys have to be known at compile time so the circuit $I_8$ can be constructed. The variable x is written at t = 2, and then at t = 7 in only one of the branches, which would mean that $t_w[\bar{x}]$, and consequently the input keys, depend on the path of execution, which is unknown at compile time. But this is not the case. While building the VRAM program, we make sure that $t_w[\bar{x}]$ is input independent, by fast-forwarding keys. Recall that memory writes are modeled by time translation of keys. Fast-forwarding is time translation of keys in order to compensate for time discrepancies due to branching, apart from the normal time translations due to memory writes. Fast-forwards happen in two ways: (1) explicitly via T circuits; (2) implicitly in certain I circuits. T circuit(s) are added to the very end of a branch br when there are memory location(s) $\bar{x}$ that are modified in the other branch, but not in br, explicitly fast-forwarding all such $\bar{x}$ to the time of merge. The implicit case occurs when a memory location $\bar{x}$ is written by one or more I circuits on a branch. The very last time some $\bar{x}$ is written on a branch, the I circuit which does the writing does not use the VRAM time associated with it to determine the output keys, but instead uses the time of merge, possibly fast-forwarding $\bar{x}$. There is one other case where explicit fast-forwards occur. A HALT instruction does not alter RAM memory, so it has no corresponding I circuit. It is represented in the VRAM program merely with a marker. These marked positions indicate the end of each possible path of execution (with possibly different running times) at compile time, and program termination at runtime. At these positions are T circuits which fast-forward the whole $Y_v$ region to t = τ, making possible the verification of the computation result using a single key pair per location.
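The two kinds of fast-forwarding can be summarised in a small planning routine. This is only a sketch of the bookkeeping described above: the representation of a branch as a list of (time, location) writes, the function name, and the example write pattern loosely modelled on Fig. 3 are assumptions made for illustration.

```python
def plan_fast_forwards(writes_up, writes_low, t_merge):
    """Decide output times of I circuits and the targets of T circuits per branch.

    writes_up / writes_low: lists of (t, location), one entry per write on the branch.
    Returns, per branch, the I-circuit writes as (t, location, output_time) and the
    sorted list of locations that need an explicit T circuit at the end of the branch.
    """
    plan = {}
    for name, own, other in (
        ("up", writes_up, writes_low),
        ("low", writes_low, writes_up),
    ):
        # Time of the last write to each location on this branch.
        last = {loc: t for t, loc in own}
        # Implicit fast-forward: the last write to a location targets t_merge.
        i_circuits = [(t, loc, t_merge if last[loc] == t else t) for t, loc in own]
        # Explicit fast-forward: locations written only on the other branch.
        t_targets = sorted({loc for _, loc in other} - set(last))
        plan[name] = {"I": i_circuits, "T": t_targets}
    return plan


# Hypothetical write pattern: the upper branch writes register r twice and
# variable x once; the lower branch has zero length. Time of merge is 8.
example = plan_fast_forwards([(5, "r"), (6, "r"), (7, "x")], [], t_merge=8)
```

In this example the upper branch needs no T circuits, because its last writes to r and x already land at the time of merge, while the zero-length lower branch needs T circuits for both r and x. Either way, every location touched by either branch ends up with the same last write time after the merge, which is what makes $t_w$ input independent.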
While building the VRAM program, we have to ensure that the computing party can follow only the correct path of execution while executing the VRAM program. This is achieved by replacing each conditional jump in the RAM program with a circuit which evaluates the condition (e.g. 'is zero?' for JMPZ), and outputs one of the two branch keys depending on the result. Each garbled circuit on a branch, regardless of its type, is encrypted with the corresponding branch key. Below, we present two more examples before taking a closer look at the garbled circuits involved. In order to save space, we only give the high-level language code. We won't be precise about segment lengths and t values, and will concentrate on the VRAM program structure instead.
Listing 1.3 contains a typical if-else statement. One thing to note in Fig. 4 is that the first branching is already merged before the second one takes place. In some sense, building a VRAM program involves flattening the associated RAM program into two-branch thickness, by considering the expanded VRAM time instead of the regular running time of a RAM program. In general, a VRAM program handles at most two branches at each VRAM time t. Another thing to note in this example is that the RAM program includes two return statements (i.e. HALT instructions), and both $T_{O,0}$ and $T_{\tau}$ fast-forward $Y_v$ to t = τ.
Building the I, B, T garbled circuits
Circuits of each type have quite simple structure, so we will provide only one example of each. We will assume that (1) IS = {LOAD, STORE, ADD, SUB, MUL, DIV, JMP, JMPZ, JMPN, HALT}; (2) the instruction LOAD loads its parameter into register r; (3) the instruction JMPZ makes the comparison with the value in register r. The keys for encoding memory are generated from a pseudo-random function $F_k(x, t, b)$, and the branch keys used by B circuits are generated from a PRF $F_{br}(t, br)$. x is a memory location in M, t is the VRAM time, and b, br ∈ {0, 1} are a bit value and a branch index, respectively. First, we construct $I_3$, which corresponds to the instruction LOAD z in Listing 1.1. $I_3$ is an I circuit with sub-type LOAD. Suppose that the location of the word holding variable z is $x_z$, and the location of the register r in R is $x_r$. In this case, the circuit being built is simply all the circuits for loading individual bits, bundled together.
So we consider only the circuit $I_3^0$ responsible for loading the first bit of z at $x_z$. $I_3^0$ is a gate with a single input wire and a single output wire, whose two ETT rows $R_0$, $R_1$ are:
$$R_b = E_{F_k(x_z,\, t_w[x_z],\, b)}\big(F_k(x_r, 3, b)\big), \quad b \in \{0, 1\}$$
where $E_k(\cdot)$ denotes encryption under the key k.
A T circuit is almost identical to an I circuit with sub-type LOAD, except that the memory locations read and written are the same, so the T circuit merely time-translates a single word. Again, a circuit which operates on a word can be thought of as W circuits, each time-translating a single bit, bundled together. The ETT rows $R_0$, $R_1$ of $T^i_{5,1}$, which time-translates the i-th bit, have the same shape as those of $I_3^0$, with a single location serving as both input and output, and the time of merge as the output time. Unlike the circuits considered in the previous examples, $T_{5,1}$ is on a branch. What is added to the VRAM program is not the circuit itself, but the ciphertext resulting from its encryption using the branch key $k_{up} = F_{br}(4, 0)$.
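The single-bit LOAD and T gates, and the wrapping of an on-branch circuit under a branch key, can be sketched as below. Everything concrete here is an assumption for illustration: HMAC-SHA256 plays the role of the PRFs, a SHAKE-derived pad plus MAC tag plays the role of the encryption, the `label` argument separates uses of the same key, and the seeds, location names and times in the usage lines are hypothetical.

```python
import hashlib
import hmac
import random

K = 16  # key length in bytes (assumed)


def prf(seed, *fields):
    """Stand-in PRF used for both memory keys and branch keys."""
    msg = "|".join(str(f) for f in fields).encode()
    return hmac.new(seed, msg, hashlib.sha256).digest()[:K]


def enc(key, label, msg):
    """Toy authenticated encryption; label keeps pads distinct per circuit."""
    pad = hashlib.shake_256(key + label).digest(len(msg))
    ct = bytes(a ^ b for a, b in zip(msg, pad))
    return ct + hmac.new(key, label + ct, hashlib.sha256).digest()[:K]


def dec(key, label, blob):
    """Return the plaintext, or None if the key or label is wrong."""
    ct, tag = blob[:-K], blob[-K:]
    expected = hmac.new(key, label + ct, hashlib.sha256).digest()[:K]
    if not hmac.compare_digest(tag, expected):
        return None
    pad = hashlib.shake_256(key + label).digest(len(ct))
    return bytes(a ^ b for a, b in zip(ct, pad))


def one_bit_gate(seed, x_in, t_in, x_out, t_out, label):
    """Shuffled ETT mapping the key for (x_in, t_in, b) to the key for (x_out, t_out, b)."""
    rows = [
        enc(prf(seed, x_in, t_in, b), label, prf(seed, x_out, t_out, b))
        for b in (0, 1)
    ]
    random.shuffle(rows)
    return rows


def open_gate(rows, in_key, label):
    """Trial decryption: returns the output key, or None if no row opens."""
    for row in rows:
        out = dec(in_key, label, row)
        if out is not None:
            return out
    return None


# LOAD-style gate: reads location "xz" (last written at time 0), writes "xr" at time 3.
load_rows = one_bit_gate(b"mem-seed", "xz", 0, "xr", 3, b"I3.0")
# T-style gate: same location in and out, translated from time 2 to merge time 8.
t_rows = one_bit_gate(b"mem-seed", "x", 2, "x", 8, b"T5,1.0")
# An on-branch circuit is stored only as a ciphertext under its branch key.
branch_key = prf(b"br-seed", 4, 0)
wrapped = enc(branch_key, b"T5,1", b"".join(t_rows))
```

An evaluator holding a stale key for a location, or the key of the branch that was not taken, cannot open anything, which is how the construction forces the memory access pattern of the original RAM program to be followed.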
The VRAM Scheme
The VRAM scheme is comprised of the following four algorithms: $A_{PROG}$, $A_{INPUT}$, $A_{VERIFY}$, and $A_{EXEC}$. These describe construction of a VRAM program, encoding of inputs, verification of a computation result, and execution of a VRAM program, respectively.
Description: Construction of a VRAM program.
1. Prepare for a new computation by running $A_{INIT}(\tau_{prev})$.
2. Systematically follow every path of execution for the given RAM program P. We will assume the same IS from Section 3.3. For each path, do:
-Set $t_{cost} = 0$.
-While $t_{cost} < MAX_{cost}$ and no HALT instruction is encountered, do:
• Pick the next instruction inst on the execution path.
• Increment $t_{cost}$ by 1. (a)
• If inst ∈ {LOAD, STORE, ADD, SUB, MUL, DIV}, then build an I circuit with the corresponding sub-type, and update $t_w$ with current time t for the memory location written. However, if inst is on a branch, and the target location is written for the last time on this branch (implicit fast-forward), then (1) build I s.t. it time-translates to the time of merge instead of current time; (2) update $t_w$ with time of merge for the memory location written. If inst is the final instruction on the branch, build T circuits which emulate the writes that happen only in the other branch (explicit fast-forward), and set t to the time of merge. Otherwise, increment t by 1.
• If inst ∈ {JMPZ, JMPN}, then build a B circuit. Increment t by 1.
• If inst is a HALT instruction, then build T circuits which fast-forward each key in $Y_v$ to τ. This action is delayed until all paths are exhausted, and τ is fixed. (b) Add a dummy element to $P_v$ with type field in the circuit label set to HALT.
• For each circuit built during the processing of inst, prepare wire labels and a circuit label. Wire labels associate with each wire the memory location read or written. (c) The circuit label indicates the VRAM time when the circuit was built, the type (I, B, T), and also the branch index (0 for 'upper' branch, 1 for 'lower' branch) if the circuit is on a branch. We adopt the convention that the branch that is followed when the condition evaluates to true is the upper branch. If a circuit is not on a branch, label it and add it to $P_v$. Otherwise, encrypt the circuit with the corresponding branch key and add the labeled ciphertext to $P_v$.
-Let the largest t value observed on the path be $t_{max}$. If $t_{max} > \tau$, then set $\tau = t_{max}$.
(a) We make the simplifying assumption that every instruction has the same cost.
(b) One other possibility is to use a predetermined τ value which is guaranteed to exceed all $t_{max}$. Using a τ value which is larger than necessary has no drawbacks.
(c) An exception is the output wires of a type B circuit.
Description: Input preparation.
1. Let X(x) = b, where b ∈ {0, 1}. For each location x in X, set the key at the corresponding location xv in Xv to F
Description: Verification.
For each location y in Y :
-Let the key at the corresponding location $y_v$ in $Y_v$ be $k_y$. Let
Description: Execution.
1. Set variable br = −1. br holds the current branch index at any time during execution. Value −1 stands for no branch.
2. Set variable $k_{br}$ = null. $k_{br}$ holds the branch key for the current branch.
3. Set variable halt = false.
4. Execution involves iterating over and processing the elements in the collection $P_v$. Most likely, only a small portion of the elements, namely those associated with a single path of execution, have to be processed. Order of processing is determined by the labels. The element which is picked next for processing is the one with the smallest VRAM time in its label. If br ≠ −1, only elements with the same branch index br in their labels can be picked. An element is only picked once. If more than one element is eligible for picking (i.e. they are labeled with the same VRAM time), type I and B circuits are picked before type T circuits. Elements with type set to HALT are picked last. In other cases, e.g. among types I and B, the pick can be made arbitrarily. While there is an element eligible for picking:
-Pick the next element.
-If br ≠ −1, decrypt the circuit using $k_{br}$.
-For each input wire, read from $M_v$ the key associated with the location in the wire label. Assign each key read to the corresponding input wire. Evaluate the garbled circuit.
-Read circuit type from circuit label.
-If type is I or T, then for each output wire, read from the wire label the write location, and write the key assigned to the output wire (i.e. the evaluation result) to the corresponding location in $M_v$. If type is T and there are no more eligible elements labeled with the same t, then set br = −1 and $k_{br}$ = null.
-If type is B, then assign the key on the left output wire to $k_{br}$, and the value on the right output wire to br.
-If type is HALT, then set halt = true.
Note that extra work has to be done for a branch br that is not executed. The extra work is proportional to the number of distinct memory locations accessed exclusively in br, and is independent of the running time of br. The extra work that has to be done for a loop is proportional to the number of times it is executed. The verifiable RAM program terminates at exactly the same point along the path of execution as its non-verifiable counterpart. The running time of a VRAM program is RAM-like.
Protocol for Outsourcing VRAM Programs
This section presents a protocol for verifiable outsourcing of computations on persistent memory. The protocol consists of a preprocessing phase and an online phase, and works in a three-party setting. The parties involved are the outsourcer (a possibly computationally weak party who outsources the computations and verifies the results), the evaluator (a computationally capable untrusted party who performs the computations), and the constructor (a computationally capable trusted party who builds the verifiable programs corresponding to the outsourced computations). 
Preprocessing Phase
The constructor prepares all the preprocessing material without the involvement of the outsourcer and the evaluator, who may receive their share of the preprocessing material anytime before the first outsourced computation begins, and possibly at different times.
Conclusion and Future Work
This work proposed a solution to the verifiable computation problem which accepts resource-constrained devices as outsourcers, and offers RAM-like running times to evaluators. The other side of the coin is that the computational and memory costs of building VRAM programs incurred on the constructor, the cost incurred on the network due to the size of the VRAM programs, or the memory cost of storing VRAM programs incurred on the evaluator might not be tolerable. Moreover, a VRAM program can be used only once. However, there is also reason to be hopeful. First of all, the possibly intolerable costs mentioned above all concern the preprocessing phase of the protocol, and the online phase of the protocol is efficient. Secondly, it seems possible that the memory and communication costs associated with the constructor and evaluator can be made amortizable over several computations.
